| Summary | Top |
| package Clair::Learn: implements various learning algorithms. The default algorithm is Perceptron. |
| Package variables | Top |
| No package variables defined. |
| Included modules | Top |
| Clair::Debug |
| Clair::Features |
| Data::Dumper |
| File::Path |
| Synopsis | Top |
| Use the training data produced by Clair::Features (in svm_light format) to train the classifier. The underlying algorithm can be either Naive Bayes or Perceptron. The "train" parameter is required in the constructor.
use Clair::Learn;
my $lea = Clair::Learn->new(DEBUG => $DEBUG, train => "train.dat", model => "model.file");
$lea->learn($algo); |
| Description | Top |
| The module is intended to let the caller choose between different classifier algorithms. When no algorithm is specified, it defaults to Perceptron for learning. |
| Methods | Top |
| _learn_perceptron | Description | Code |
| dot_product | Description | Code |
| learn | Description | Code |
| new | Description | Code |
| read_model | Description | Code |
| _learn_perceptron | code | next | Top |
Implementation of the perceptron algorithm. From the book
"Modeling the Internet and the Web":
  Perceptron(D)
    W <- 0
    w0 <- 0
    repeat
      e <- 0
      for i <- 1 .. n
        do s <- sgn(y_i * (W' * X_i + w0))
           if s < 0
             then W <- W + y_i * X_i
                  w0 <- w0 + y_i
                  e <- e + 1
    until e = 0
    return (W, w0)
From the lecture notes:
  W_0 = 0, k = 0
  for i = 1 to n
    if y_i * (W_k * X_i) <= 0   // mistake
      W_k+1 = W_k + eta * y_i * X_i
      k = k + 1
    end
  end
Some notes:
  n = number of documents
  X_i = feature vector for the i-th doc
  W = weight vector
  y_i = class identifier (-1 or +1) for the i-th doc
  eta = learning rate |
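The pseudocode above can be sketched as a small stand-alone Perl script. This is an illustration under stated assumptions, not the module's actual code: the names `dot` and `train_perceptron`, the `$max_epochs` cap, and the toy data are all hypothetical. It combines the book's repeat-until-no-errors loop with the lecture notes' `<= 0` mistake test and `eta` step size, and applies `eta` to the bias as well, which the pseudocode does not.

```perl
use strict;
use warnings;

# Sparse dot product over the keys the two hashes share.
sub dot {
    my ($x, $w) = @_;
    my $sum = 0;
    for my $k (keys %$x) {
        $sum += $x->{$k} * $w->{$k} if exists $w->{$k};
    }
    return $sum;
}

# Train until an epoch makes no mistakes, or until $max_epochs is reached.
# The cap is an assumption added so non-separable data still terminates.
sub train_perceptron {
    my ($data, $eta, $max_epochs) = @_;
    $eta        = 1   unless defined $eta;
    $max_epochs = 100 unless defined $max_epochs;
    my %w;
    my $w0 = 0;
    for my $epoch (1 .. $max_epochs) {
        my $errors = 0;
        for my $d (@$data) {
            my ($y, $x) = @{$d}{qw(class features)};
            if ($y * (dot($x, \%w) + $w0) <= 0) {    # mistake
                $w{$_} += $eta * $y * $x->{$_} for keys %$x;
                $w0    += $eta * $y;
                $errors++;
            }
        }
        last if $errors == 0;
    }
    return (\%w, $w0);
}

# Toy, linearly separable data: feature 1 votes +1, feature 2 votes -1.
my @data = (
    { class =>  1, features => { 1 => 1.0 } },
    { class => -1, features => { 2 => 1.0 } },
    { class =>  1, features => { 1 => 2.0, 2 => 0.5 } },
);
my ($w, $w0) = train_perceptron(\@data, 1);
for my $d (@data) {
    my $score = dot($d->{features}, $w) + $w0;
    printf "class %2d  score %.2f\n", $d->{class}, $score;
}
```

Each document uses the same `{ class => ..., features => { ... } }` shape that `_learn_perceptron` reads from `$self->{train_data}`, so a score with the same sign as `class` means the document is classified correctly.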
| dot_product | code | prev | next | Top |
| Compute the dot product of two sparse vectors, each stored as a hash. Only keys present in both hashes contribute to the sum. |
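A minimal sketch of the hash-based dot product described above, assuming each hash maps a feature id to a numeric value. The function name `sparse_dot` is hypothetical, and unlike the module's method it does not take `$self` as its first argument.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version; the module's own method takes $self first.
sub sparse_dot {
    my ($u, $v) = @_;
    # Iterate over the smaller hash and look keys up in the larger one.
    ($u, $v) = ($v, $u) if keys(%$u) > keys(%$v);
    my $sum = 0;
    for my $k (keys %$u) {
        $sum += $u->{$k} * $v->{$k} if exists $v->{$k};
    }
    return $sum;
}

my %x = (1 => 0.5, 3 => 2.0, 7 => 1.0);
my %w = (3 => 4.0, 7 => -1.0, 9 => 10.0);
print sparse_dot(\%x, \%w), "\n";   # 2.0*4.0 + 1.0*(-1.0) = 7
```

Keys 1 and 9 appear in only one hash, so they contribute nothing, which is the same as treating a missing entry as zero.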
| learn | code | prev | next | Top |
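| A wrapper function for the underlying algorithms. It takes an optional method name ($algo, defaulting to "_learn_perceptron") and an optional learning rate ($eta), and reports an error if no training data was loaded by the constructor. |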
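The wrapper pattern can be illustrated with a toy class that calls a method whose name is held in a string. Everything here is hypothetical: the package `Toy::Learner`, the `_learn_naive_bayes` method, and the `can` check are assumptions for illustration and are not taken from Clair::Learn.

```perl
use strict;
use warnings;

package Toy::Learner;   # hypothetical class, for illustration only

sub new { my ($class, %args) = @_; return bless { eta => 1, %args }, $class }

# Stand-ins for real training routines; they only report what was called.
sub _learn_perceptron  { my ($self, $eta) = @_; return "perceptron(eta=$eta)" }
sub _learn_naive_bayes { my ($self)       = @_; return "naive_bayes" }

# Dispatch on a method name held in a string, defaulting to the perceptron.
sub learn {
    my ($self, $algo, $eta) = @_;
    $algo = "_learn_perceptron" unless $algo;
    $eta  = $self->{eta}        unless defined $eta;
    die "unknown algorithm '$algo'\n" unless $self->can($algo);
    return $self->$algo($eta);
}

package main;

my $learner = Toy::Learner->new;
print $learner->learn(), "\n";                        # default algorithm
print $learner->learn("_learn_naive_bayes"), "\n";    # explicit choice
```

Checking `can` before dispatching turns a misspelled algorithm name into a clear error rather than Perl's generic "Can't locate object method" failure.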
| new | code | prev | next | Top |
| The constructor. Initializes several container hashes for later use. It also instantiates a Clair::Features object, because that module has the routines to read the svm_light-formatted training data and convert it into the required hash structure. |
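To make the conversion concrete, here is a rough sketch of turning one svm_light line (`<label> <id>:<value> ... # comment`) into a hash with `class` and `features` keys, the two fields that `_learn_perceptron` reads from each training document. The function `parse_svm_light_line` is hypothetical, and the real routines in the features module may behave differently.

```perl
use strict;
use warnings;

# Parse one svm_light line, e.g. "+1 3:0.5 7:1 # doc42".
sub parse_svm_light_line {
    my ($line) = @_;
    $line =~ s/#.*$//;                       # drop a trailing comment
    my ($label, @pairs) = split ' ', $line;
    return unless defined $label;            # blank or comment-only line
    my %features;
    for my $pair (@pairs) {
        my ($id, $value) = split /:/, $pair, 2;
        # keep only well-formed "integer:value" tokens
        next unless defined $value && $id =~ /^\d+$/;
        $features{$id} = $value + 0;
    }
    return { class => $label + 0, features => \%features };
}

my $doc = parse_svm_light_line("+1 3:0.5 7:1 # doc42");
print "class=$doc->{class} ",
      join(",", map { "$_=$doc->{features}{$_}" }
                sort { $a <=> $b } keys %{ $doc->{features} }),
      "\n";
```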
| read_model | code | prev | next | Top |
| A simple function that reads key-value pairs from a model file generated by Learn.pm. The model file should contain the estimated coefficients/weights from the default (perceptron) algorithm. |
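As an illustration of the key-value layout, the following sketch writes a small set of weights to a temporary file and reads them back. The layout of one whitespace-separated `key value` pair per line is inferred from how `read_model` splits each line. The function names and the `w0` key for the bias are hypothetical and may not match what Learn.pm actually writes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write weights as "key value" lines (an assumed layout, see above).
sub write_weights {
    my ($path, $weights) = @_;
    open my $fh, '>', $path or die "cannot write '$path': $!";
    printf {$fh} "%s %s\n", $_, $weights->{$_} for sort keys %$weights;
    close $fh or die "cannot close '$path': $!";
}

# Read them back into a hash reference, skipping blank lines.
sub read_weights {
    my ($path) = @_;
    open my $fh, '<', $path or die "cannot open '$path': $!";
    my %weights;
    while (my $line = <$fh>) {
        chomp $line;
        next unless $line =~ /\S/;
        my ($key, $val) = split ' ', $line, 2;
        $weights{$key} = $val;
    }
    close $fh;
    return \%weights;
}

my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close $tmp_fh;
write_weights($tmp_path, { w0 => -1, 3 => 0.5, 7 => 2 });
my $model = read_weights($tmp_path);
print "$_ => $model->{$_}\n" for sort keys %$model;
```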
| _learn_perceptron | description | prev | next | Top |
sub _learn_perceptron
{
    my ($self, $eta) = @_;
    my $w = {};
    my $w0 = 0;
    $eta = $self->{eta} unless($eta);
    for my $d (@{$self->{train_data}})
    {
        my $y = $d->{class};
        my $x = $d->{features};
        my $sum = $self->dot_product($x, $w);
        my $s = $y * ( $sum + $w0 ); # linear equation
        # ... (remainder not shown in this listing)
    }
} |
| dot_product | description | prev | next | Top |
sub dot_product
{
    my ($self, $a, $b) = @_;
    # we are only interested in multiplying the keys that intersect,
    # ... (remainder not shown in this listing)
} |
| learn | description | prev | next | Top |
sub learn
{
    my ($self, $algo, $eta) = @_;
    $self->errmsg("the 'train' parameter is required in the constructor for learn() method", 1)
        unless($self->{train_data});
    # fall back to the perceptron when no algorithm is named
    $algo = "_learn_perceptron" unless($algo);
    $self->debugmsg("running \$self->$algo()", 2);
    return $self->$algo($eta);
} |
| new | description | prev | next | Top |
sub new
{
    my ($proto, %args) = @_;
    my $class = ref $proto || $proto;
    my $self = bless {}, $class;
    $DEBUG = $args{DEBUG} || $ENV{MYDEBUG};
    $self->{train} = "output.train";
    $self->{model} = "model";
    $self->{eta} = 1;
    # overrides
    # ... (remainder not shown in this listing)
} |
| read_model | description | prev | next | Top |
sub read_model
{
    my ($self, $modelfile) = @_;
    $self->errmsg("file '$modelfile' does not exist", 1) unless(-f $modelfile);
    # three-argument open with a lexical filehandle
    open(my $mf, '<', $modelfile) or $self->errmsg("cannot open '$modelfile': $!", 1);
    my @lines = <$mf>;
    close $mf;
    chomp @lines;
    my %hash = ();
    for my $l (@lines)
    {
        # each line holds a key and a value separated by whitespace
        my ($key, $val) = split /\s+/, $l;
        $hash{$key} = $val;
    }
    return \%hash;
} |
| AUTHOR | Top |
| JB Kim
jbremnant@gmail.com 20070407 |