| Summary | Top |
| package Clair::StringManip: implements most of the string manipulation routines required by other packages. |
| Package variables | Top |
| No package variables defined. |
| Included modules | Top |
| Clair::Debug |
| Data::Dumper |
| Lingua::Stem |
| Synopsis | Top |
| The string manipulations needed elsewhere, such as stripping meta characters and word stemming, are implemented here. To see how it behaves on an arbitrary string: use Clair::StringManip;
my $strmanip = new Clair::StringManip();
my $return = $strmanip->stem("operational operations operator");
print $return . "\n"; |
| Description | Top |
| Other string-related functions will be added here. Each subroutine should accept either a SCALAR or an ARRAY ref as its input parameter, and should return either a SCALAR or an ARRAY ref as requested by the caller. |
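The SCALAR-or-ARRAY-ref convention described above could be sketched roughly as follows. The helper name apply_to_items and its callback argument are hypothetical and are not part of Clair::StringManip; this only illustrates one way to normalize the input and choose the return shape from a trailing flag.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Clair::StringManip): accepts a string
# or an arrayref, applies a transform to each item, and returns either a
# string or an arrayref depending on the last argument.
sub apply_to_items {
    my ($items, $transform, $return_array) = @_;

    # Normalize the input to a plain list.
    my @list = ref($items) eq 'ARRAY' ? @$items : ($items);

    my @out = map { $transform->($_) } @list;

    # Arrayref when requested, otherwise a space-joined string.
    return $return_array ? \@out : join(' ', @out);
}

print apply_to_items('Foo Bar', sub { lc shift }), "\n";
my $ref = apply_to_items(['Foo', 'Bar'], sub { lc shift }, 1);
print join(',', @$ref), "\n";
```

Joining with a single space when a string is requested is an assumption; the module may join differently.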
| Methods | Top |
| lowercase | Description | Code |
| new | Description | Code |
| normalize_input | Description | Code |
| stem | Description | Code |
| strip | Description | Code |
| tokenize | Description | Code |
| lowercase | code | next | Top |
| Lowercases the string. |
| new | code | prev | next | Top |
| The constructor. As with other modules, pass the DEBUG flag to get standardized debug printing; if it is omitted, the MYDEBUG environment variable is used instead: my $obj = new Clair::StringManip(DEBUG => $DEBUG); |
| normalize_input | code | prev | next | Top |
| Used for user query string processing. It parses and tokenizes the query string into appropriate segments. |
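As a rough illustration of the tokenizing step, the sketch below reuses the pattern that appears in the normalize_input code, written with the equivalent `!?` in place of `!{0,1}`. The function name split_query is hypothetical, and reading the leading `!` as a marker for a negated term is an assumption; the source does not say what it means.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the tokenizing step only.
sub split_query {
    my ($input) = @_;

    # A token is a bare word or a double-quoted phrase, each optionally
    # prefixed with '!'.
    my @tokens = $input =~ m/(!?\w+|!?"[\w\s]+")/g;

    s/["']//g     for @tokens;   # drop the quote characters
    s/^\s+|\s+$//g for @tokens;  # trim surrounding whitespace

    return @tokens;
}

print join('|', split_query('apple !banana "red fruit" !"green apple"')), "\n";
# prints: apple|!banana|red fruit|!green apple
```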
| stem | code | prev | next | Top |
| Takes either a string or an arrayref and stems the tokens (words) using the Lingua::Stem module. The return value is either a string or an arrayref, depending on the last parameter. |
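A minimal sketch of stemming with Lingua::Stem, assuming the module is installed and that an English locale is wanted. The function name stem_items is hypothetical and the actual stem method may differ, for example in how it splits or joins the tokens.

```perl
use strict;
use warnings;
use Lingua::Stem;

# Hypothetical stand-in for the stem method, shown as a plain function.
sub stem_items {
    my ($items, $return_array) = @_;

    # Accept an arrayref of words or a whitespace-separated string.
    my @words = ref($items) eq 'ARRAY' ? @$items : split(' ', $items);

    # Lingua::Stem's stem() takes a list of words and returns an arrayref.
    my $stemmer = Lingua::Stem->new(-locale => 'EN');
    my $stems   = $stemmer->stem(@words);

    return $return_array ? $stems : join(' ', @$stems);
}

print stem_items('operational operations operator'), "\n";
```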
| strip | code | prev | next | Top |
| Strips meta characters from the string. |
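Going by the comment in the strip code ("anything other than alpha-numeric or spaces"), the behavior might look like the sketch below. The function name strip_meta is hypothetical, and deleting the characters outright, rather than replacing them with spaces, is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for strip: removes every character that is not
# an ASCII letter, a digit, or whitespace.
sub strip_meta {
    my ($string) = @_;
    (my $clean = $string) =~ s/[^A-Za-z0-9\s]//g;
    return $clean;
}

print strip_meta('Hello, world! (v2.0)'), "\n";   # prints: Hello world v20
```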
| tokenize | code | prev | next | Top |
| Tokenizes the words, effectively removing all extra whitespace. The return value is either a string or an arrayref, depending on the last input parameter. |
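One way to get this behavior is Perl's special-case split on a single space, which skips leading whitespace and splits on runs of whitespace. The function name tokenize_words is hypothetical; this is a sketch of the described behavior, not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for tokenize.
sub tokenize_words {
    my ($string, $return_array) = @_;

    # split ' ' ignores leading whitespace and treats any run of
    # whitespace as a single separator.
    my @words = split ' ', $string;

    return $return_array ? \@words : join(' ', @words);
}

print tokenize_words("  the   quick\tbrown  fox "), "\n";   # prints: the quick brown fox
```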
| lowercase | description | prev | next | Top |
sub lowercase
{my ($self, $string) = @_; return lc $string;} |
| new | description | prev | next | Top |
sub new
{
my ($proto, %args) = @_;
my $class = ref $proto || $proto;
my $self = bless {}, $class;
$DEBUG = $args{DEBUG} || $ENV{MYDEBUG};
$self->{lowercase} = 1;
$self->{tokenize} = 1;
$self->{stem} = 1;
# overrides (not shown in the source)
return $self;
} |
| normalize_input | description | prev | next | Top |
sub normalize_input
{
my ($self, $input, $no_stem) = @_;
my @tokens = $input =~ m/(!{0,1}\w+|!{0,1}"[\w\s]+")/gs;
$_ =~ s/["']//g for @tokens;
$_ =~ s/^\s*|\s*$//g for @tokens;
# parse the query and then stem
} |
| stem | description | prev | next | Top |
sub stem
{
my ($self, $items, $return_array) = @_;
# stem the words
} |
| strip | description | prev | next | Top |
sub strip
{
my ($self, $string) = @_; # strip all special chars: anything other than alphanumeric characters or spaces
} |
| tokenize | description | prev | next | Top |
sub tokenize
{
my ($self, $string, $return_array) = @_; # tokenize all the words: split on whitespace
} |
| AUTHOR | Top |
| JB Kim jbremnant@gmail.com 20070407 |
| TODOS | Top |
Migrate the input normalizing function from Info::Query into this module. |