| Summary | Top |
| Clair::LinkPolicy::LinkPolicyBase - Base class for creating corpora from collections |
| Package variables | Top |
| No package variables defined. |
| Included modules | Top |
| Carp |
| Clair::Utils::CorpusDownload |
| Synopsis | Top |
| Description | Top |
| Base class for document linking. METHODS IMPLEMENTED BY THIS CLASS: new_linker (base class constructor). METHODS REQUIRED BY SUBCLASSES: create_corpus (creates a corpus using this link policy). new_linker is the generic object constructor for all link policies and should only be called by subclass constructors. REQUIRED PARAMETERS: base_collection => $collection_object |
| Methods | Top |
| create_corpus | Description | Code |
| create_html_no_anchors | Description | Code |
| create_html_with_anchors | Description | Code |
| new_linker | No description | Code |
| pop_target | Description | Code |
| pop_term | Description | Code |
| prepare_directories | No description | Code |
| read_links_no_anchors | No description | Code |
| read_links_with_anchors | No description | Code |
| textfile2html_no_anchors | Description | Code |
| textfile2html_with_anchors | Description | Code |
| uniq_terms | Description | Code |
| create_corpus | code | next | Top |
| Generates a corpus using this link policy and the given base collection. Because this method is policy-specific, it must be implemented by child classes. |
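The division of labour between new_linker and create_corpus can be sketched as below. This is a minimal illustration, not the Clair code: Sketch::LinkPolicyBase is a stand-in for the real base class, Sketch::EchoPolicy is a hypothetical child policy, and the check on base_collection is an assumption about what "Verify parameters" does.

```perl
use strict;
use warnings;

# Stand-in for Clair::LinkPolicy::LinkPolicyBase so the sketch runs on its own.
package Sketch::LinkPolicyBase;
use Carp;

sub new_linker {
    my $class = shift;
    my $self  = bless { @_ }, $class;
    # Assumed parameter check: base_collection is the documented required parameter.
    croak "base_collection is required" unless defined $self->{base_collection};
    return $self;
}

# Policy-specific: each child class supplies its own version.
sub create_corpus {
    my $self = shift;
    croak ref($self) . " must implement create_corpus";
}

# Hypothetical child policy; the name and the returned string are illustrative.
package Sketch::EchoPolicy;
our @ISA = ('Sketch::LinkPolicyBase');

sub new {
    my ($class, %params) = @_;
    # Child constructors delegate to the base class constructor.
    return $class->new_linker(%params);
}

sub create_corpus {
    my ($self, $corpus_name) = @_;
    return "would build corpus '$corpus_name' from $self->{base_collection}";
}

package main;

my $policy = Sketch::EchoPolicy->new(base_collection => 'my_collection');
print $policy->create_corpus('demo'), "\n";
```

A real child class would call the HTML-generation helpers from inside its own create_corpus rather than returning a string.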
| create_html_no_anchors | code | prev | next | Top |
| This method should be called by the child class's "create_corpus" method - it reads in the .links file and creates the appropriate HTML documents. Use this method if the anchor text of the links is irrelevant. |
| create_html_with_anchors | code | prev | next | Top |
| This method should be called by the child class's "create_corpus" method. It reads in the .links file and creates the appropriate HTML documents. This method assumes that there is a third column in the .links file, which is the anchor text to be used in linking.
REQUIRED PARAMETERS:
src_doc_dir => directory where source documents are located
html_dir => directory where the HTML docs will go
links_file => file with the links specification
base_url => base URL to use in HTML hyperlinks |
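To make the three-column .links layout concrete, here is a minimal parsing sketch. The exact file format is an assumption (whitespace-delimited "source target anchor text", with the anchor taking the rest of the line), and the %model structure is hypothetical rather than the one read_links_with_anchors builds.

```perl
use strict;
use warnings;

# Assumed layout: "source target anchor text...", whitespace-delimited.
my $links_text = <<'END';
doc1.txt doc2.txt information retrieval
doc1.txt doc3.txt graph
doc2.txt doc3.txt graph
END

my %model;    # source doc -> anchor text -> list of target docs
open(my $fh, '<', \$links_text) or die "Can't open in-memory links: $!\n";
while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ /\S/;                 # skip blank lines
    # Limit of 3 keeps multi-word anchor text together in the third field.
    my ($from, $to, $anchor) = split ' ', $line, 3;
    next unless defined $anchor;               # skip lines without an anchor column
    push @{ $model{$from}{$anchor} }, $to;
}
close $fh;

for my $from (sort keys %model) {
    for my $anchor (sort keys %{ $model{$from} }) {
        print "$from: '$anchor' -> @{ $model{$from}{$anchor} }\n";
    }
}
```

The in-memory filehandle stands in for the file that would normally be named by links_file.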
| pop_target | code | prev | next | Top |
| Takes a string of whitespace-delimited targets, and removes the last element. Returns the new string and the removed target. |
| pop_term | code | prev | next | Top |
| Takes a string of whitespace-delimited terms, and removes the last element. Returns the new string and the removed term. |
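A short usage sketch for pop_term (pop_target behaves the same way). The sub is repeated here only so the example runs on its own; the input string is illustrative.

```perl
use strict;
use warnings;

# Same logic as pop_term / pop_target in this class.
sub pop_term {
    my $str  = shift;
    my @line = split /\s+/, $str;
    my $term = pop @line;
    return ("@line", $term);
}

my ($rest, $last) = pop_term("alpha beta gamma");
print "rest='$rest' last='$last'\n";    # rest='alpha beta' last='gamma'
```

Both return values come back as a list, so the call should be made in list context.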
| textfile2html_no_anchors | code | prev | next | Top |
| Returns HTML text as a term array, built from the given raw text file name and link model. |
| textfile2html_with_anchors | code | prev | next | Top |
| Returns HTML text as a term array, built from the given raw text file name and link model. |
| uniq_terms | code | prev | next | Top |
| Takes a string and removes repeated occurrences of terms. Each run of whitespace is replaced by a single space. |
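The described behaviour can be sketched as follows. This is not the module's own code (the listing on this page is truncated): uniq_terms_sketch is a hypothetical name, it assumes the result is returned as a single space-joined string, and it uses split ' ' so that leading whitespace does not produce an empty term.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the documented behaviour.
sub uniq_terms_sketch {
    my $str = shift;
    my (@uniq, %seen);
    for my $term (split ' ', $str) {
        next if $seen{$term}++;    # skip terms that have already appeared
        push @uniq, $term;         # keep the first occurrence, in order
    }
    return join ' ', @uniq;
}

print uniq_terms_sketch("to be  or not\tto be"), "\n";    # to be or not
```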
| create_corpus | description | prev | next | Top |
sub create_corpus
{my $self = shift; my $corpus = shift; # save current directory} |
| create_html_no_anchors | description | prev | next | Top |
sub create_html_no_anchors
{my $self = shift; my %params = @_; # Verify params} |
| create_html_with_anchors | description | prev | next | Top |
sub create_html_with_anchors
{my $self = shift; my %params = @_; # Verify params} |
| new_linker | description | prev | next | Top |
sub new_linker
{ my $class = shift;
my $self = bless { @_ }, $class;
# Verify parameters} |
| pop_target | description | prev | next | Top |
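sub pop_target
{ my $str = shift;
  # Split on runs of whitespace, then remove the last element.
  my @line = split (/\s+/, $str);
  my $term = pop (@line);
  # Return the remaining targets as one space-joined string, plus the removed target.
  return ("@line", $term);} |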
| pop_term | description | prev | next | Top |
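sub pop_term
{ my $str = shift;
  # Split on runs of whitespace, then remove the last element.
  my @line = split (/\s+/, $str);
  my $term = pop (@line);
  # Return the remaining terms as one space-joined string, plus the removed term.
  return ("@line", $term);} |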
| prepare_directories | description | prev | next | Top |
sub prepare_directories
{ my $self = shift;
  my $corpus_name = shift;
  # Each corpus gets its own subdirectory under the three base paths.
  my $download_dir = $self->{download_base} . "/" . $corpus_name;
  my $corpus_dir = $self->{corpus_data} . "/" . $corpus_name;
  my $corpora_dir = $self->{corpora_base} . "/" . $corpus_name;
  # Create any directory that does not exist yet; report the OS error on failure.
  unless (-d $corpus_dir) {
    mkdir ($corpus_dir, 0775) ||
      croak "Could not create directory $corpus_dir: $!\n";
  }
  unless (-d $download_dir) {
    mkdir ($download_dir, 0775) ||
      croak "Could not create directory $download_dir: $!\n";
  }
  unless (-d $corpora_dir) {
    mkdir ($corpora_dir, 0775) ||
      croak "Could not create directory $corpora_dir: $!\n";
  }} |
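The listing above uses mkdir, which fails when a parent directory is missing. As an alternative sketch (not part of this class), File::Path's make_path creates intermediate directories as well. The helper name ensure_directories and the directory names under the temporary base are hypothetical.

```perl
use strict;
use warnings;
use Carp;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: creates each directory, including missing parents.
sub ensure_directories {
    my @dirs = @_;
    for my $dir (@dirs) {
        next if -d $dir;
        # Collect errors instead of letting make_path report them itself.
        make_path($dir, { mode => 0775, error => \my $err });
        if ($err && @$err) {
            my ($file, $message) = %{ $err->[0] };
            croak "Could not create directory $dir: $message";
        }
    }
    return scalar @dirs;
}

# Stand-ins for download_base, corpus_data and corpora_base.
my $base        = tempdir(CLEANUP => 1);
my $corpus_name = 'demo';
ensure_directories(map { "$base/$_/$corpus_name" } qw(download corpus-data corpora));
print "created\n" if -d "$base/corpora/$corpus_name";
```

The requested mode is still subject to the process umask, as with mkdir.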
| read_links_no_anchors | description | prev | next | Top |
sub read_links_no_anchors
{my $infile = shift; my %model; my ($from, $to); open (LF, $infile) || die "Can't open $infile\n"; # Grab links, add them to our model} |
| read_links_with_anchors | description | prev | next | Top |
sub read_links_with_anchors
{my $infile = shift; my %model; my ($from, $to, $anchor); open (LF, $infile) || die "Can't open $infile\n"; # Grab links, add them to our model} |
| textfile2html_no_anchors | description | prev | next | Top |
sub textfile2html_no_anchors
{my ($url, $src_dir, $src_file, $linkmodel) = @_; my ($target, $anchor); my @line; my $remaining; my %anchor2targets; # Maps anchors to target docs} |
| textfile2html_with_anchors | description | prev | next | Top |
sub textfile2html_with_anchors
{my ($url, $src_dir, $src_file, $linkmodel) = @_; my ($target, $anchor); my @line; my $remaining; my %anchor2targets; # Maps anchors to target docs} |
| uniq_terms | description | prev | next | Top |
sub uniq_terms
{ my $str = shift;
my @uniq;
my %seen;
foreach my $term (split /\s+/, $str) {
unless (exists $seen{$term}) {
# This is the first we've seen this term.} |
| Prepare corpus directories | Top |