| Summary | Top |
| Clair::Algorithm::LSI |
| Package variables | Top |
| No package variables defined. |
| Included modules | Top |
| Clair::Document |
| Lingua::Stem |
| Lingua::Stem::En |
| PDL::Basic |
| PDL::IO::Storable |
| PDL::Lite |
| PDL::Matrix |
| PDL::MatrixOps |
| PDL::Ufunc |
| Storable |
| Synopsis | Top |
| Provides latent semantic indexing that interfaces with Clair::Cluster. The interface supports construction of the index, which consists of the singular value decomposition of the document-term matrix underlying the cluster, and the mapping and ranking of terms, documents, and queries in latent semantic space. Further functionality, refinements, and optimizations are envisioned for this package. |
| Description | Top |
| Methods | Top |
| build_index | Description | Code |
| new | Description | Code |
| build_index | code | next | Top |
| $index->build_index(); (public) Constructs the latent semantic index, which is defined by the singular value decomposition of the associated cluster's document-term matrix. Sets the initial approximation to full rank (K = N). |
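For orientation, the decomposition and its rank-K truncation can be written as below. This is the textbook form of the singular value decomposition, not something taken from the module's code; the symbols A, U, Σ, V and the choice of documents as rows and terms as columns are assumptions made for illustration.

```latex
A = U \Sigma V^{\mathsf{T}}, \qquad
A_K = U_K \Sigma_K V_K^{\mathsf{T}}, \qquad 1 \le K \le N
```

Here U_K and V_K keep the first K columns of U and V, and Σ_K keeps the K largest singular values. With K = N nothing is discarded, so A_K = A, which matches the full-rank starting point described above.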
| new | code | prev | next | Top |
| $index = Clair::Algorithm::LSI->new(type => "stem") (public) Instantiates a new latent semantic index (LSI), either from an existing Clair::Cluster object or by loading a previously saved index from file. When loading from file, the originally associated cluster may or may not have been saved together with the index; if it was not, the user may specify an existing cluster to be (re-)associated with the index. file: A path to a file containing a previously saved index. cluster: A reference to a cluster to be newly associated with the index to be built, or reassociated with the existing index being loaded. type: (optional) The type of index (stemmed is the default). |
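The two ways of obtaining an index described above might be wrapped as follows. This is a minimal sketch: index_from_cluster and index_from_file are hypothetical helper names, the cluster is assumed to be populated already, and the explicit build_index call assumes that new does not build the index itself.

```perl
use strict;
use warnings;

# index_from_cluster is a hypothetical helper, not part of Clair.
# It builds a fresh index from an existing, already populated cluster.
sub index_from_cluster {
    my ($cluster) = @_;
    # Loaded at run time so this file still compiles where Clair is absent.
    require Clair::Algorithm::LSI;
    my $index = Clair::Algorithm::LSI->new(cluster => $cluster, type => "stem");
    # ASSUMPTION: new() does not build the index on its own.
    $index->build_index();
    return $index;
}

# index_from_file is a hypothetical helper, not part of Clair.
# It reloads a saved index and passes a cluster only if one is supplied.
sub index_from_file {
    my ($path, $cluster) = @_;
    require Clair::Algorithm::LSI;
    my %args = (file => $path);
    $args{cluster} = $cluster if defined $cluster;
    return Clair::Algorithm::LSI->new(%args);
}
```

Passing the cluster only when it is defined keeps the "saved with cluster" and "saved without cluster" cases on the same code path.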
| build_index | description | prev | next | Top |
sub build_index
{
    my $self = shift;
    my $c = $self->{cluster};
    my $type = $self->{type};
    # Avoid unnecessarily rebuilding index
} |
| new | description | prev | next | Top |
sub new
{
    my $class = shift;
    my %params = @_;
    my $file = $params{file};
    my $cluster = $params{cluster};
    my $hashref;
    # If file specified...
} |
| get_approx_rank | Top |
| $approx_rank = $index->get_approx_rank(); (public) Returns the rank K of the current approximation, where K <= N and N is the full rank. |
| set_approx_rank | Top |
| $index->set_approx_rank($K); (public) Sets the rank K of the current approximation, where K <= N and N is the full rank. If a value K > N is specified, the rank keeps its previous value. K: The desired rank of the approximation. |
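Because an out-of-range K is ignored rather than reported, a caller may want to read the rank back after setting it. The sketch below uses only the two documented accessors; set_rank_checked is a hypothetical helper name, not part of Clair.

```perl
use strict;
use warnings;

# set_rank_checked is a hypothetical helper, not part of Clair.
# It requests rank $k, reads the rank back, and warns if the request
# was not applied (for example because $k exceeded the full rank N).
sub set_rank_checked {
    my ($index, $k) = @_;
    $index->set_approx_rank($k);
    my $actual = $index->get_approx_rank();
    warn "requested rank $k was not applied; rank stays at $actual\n"
        if $actual != $k;
    return $actual;
}
```

The return value is always the rank actually in effect, so it can be used directly in later calls.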
| term_to_latent_space | Top |
| $v = $index->term_to_latent_space($term); (public) Maps the specified term to its position vector in latent semantic space. term: The (unstemmed) term. |
| query_to_latent_space | Top |
| $v = $index->query_to_latent_space($querystring); (public) Maps the specified query to its position vector in latent semantic space. The query is treated exactly like a document whose text is the query string. query: The (unstemmed) query string. |
| doc_to_latent_space | Top |
| $v = $index->doc_to_latent_space($docref); (public) Maps the specified document to its position vector in latent semantic space. docref: A reference to a Clair::Document object. |
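The standard way of placing a term or a document in a rank-K latent space is the "folding-in" projection sketched below. Whether this module uses exactly this scaling is not stated in the documentation, so treat it as the textbook form only. The notation is an assumption: A = U Σ Vᵀ is the document-term matrix with documents as rows, d is a document's row vector of term weights, and a_t is a term's column vector of per-document weights.

```latex
\hat{d} = d \, V_K \, \Sigma_K^{-1}, \qquad
\hat{t} = a_t^{\mathsf{T}} \, U_K \, \Sigma_K^{-1}
```

Both follow from A V = U Σ and Aᵀ U = V Σ. A query handled "exactly like a document" would take the first form, with d built from the query text.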
| rank_terms | Top |
| %term_distances = $index->rank_terms($origin_term); (public) Computes the distance, in latent semantic space, from the origin term to each of the specified terms. If no terms besides the origin term are specified, the distance to each term occurring in the underlying cluster is computed. origin_term: The "origin term" (from which distances are computed). terms: (optional) A list of (unstemmed) terms. |
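Since the result comes back as a hash, which has no order, a caller usually needs to sort it before use. The sketch below assumes the hash maps each term to a numeric distance and that a smaller distance means a closer term; neither point is spelled out in the documentation. nearest_terms is a hypothetical helper name, not part of Clair.

```perl
use strict;
use warnings;

# nearest_terms is a hypothetical helper, not part of Clair.
# Given a count $n and a term => distance hash, it returns up to $n terms
# ordered from smallest to largest distance (ties broken alphabetically).
# ASSUMPTION: values are numeric distances and smaller means closer.
sub nearest_terms {
    my ($n, %distance) = @_;
    my @ordered = sort { $distance{$a} <=> $distance{$b} or $a cmp $b }
                  keys %distance;
    $n = @ordered if $n > @ordered;   # never ask for more than exist
    return @ordered[0 .. $n - 1];
}

# Intended use, given a built index:
#   my %term_distances = $index->rank_terms($origin_term);
#   print "$_\n" for nearest_terms(5, %term_distances);
```

Call the helper in list context, since it returns an array slice.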
| rank_queries | Top |
| %query_distances = $index->rank_queries($origin_query); (public) Computes the distance, in latent semantic space, from the origin query to each of the specified queries. origin_query: The "origin query" (from which distances are computed). queries: A list of query strings. |
| rank_docs | Top |
| (public) Computes the distance, in latent semantic space, from the origin document to each of the specified documents. If no documents besides the origin document are specified, the distance to each document in the underlying cluster is computed. origin_docref: A reference to the "origin document" (from which distances are computed). docrefs: (optional) A list of other document references. |
| save_to_file | Top |
| (public) Dumps the latent semantic index to file as a Storable object. The associated cluster is dumped as well only if the user so specifies. file: Path where the index is to be saved. savecluster: (optional) 1 if the associated cluster is to be dumped together with the index; 0 if not. |
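A save-and-reload round trip might look like the sketch below. The documentation names the parameters file and savecluster but does not show how they are passed, so the named-parameter call here is an assumption modelled on new. save_and_reload is a hypothetical helper name, not part of Clair.

```perl
use strict;
use warnings;

# save_and_reload is a hypothetical helper, not part of Clair.
# ASSUMPTION: save_to_file takes named parameters (file, savecluster),
# mirroring new(); the source lists the names but not the calling style.
sub save_and_reload {
    my ($index, $path) = @_;
    # savecluster => 1 asks for the cluster to be stored with the index,
    # so the reloaded index needs no separate cluster argument.
    $index->save_to_file(file => $path, savecluster => 1);
    # Reload through the documented constructor, using the class of $index.
    return ref($index)->new(file => $path);
}
```

If the cluster is large, passing savecluster => 0 and supplying the cluster again when calling new is the alternative the constructor documentation describes.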