| Summary |
| Clair::Interface::Weka |
| Package variables |
| $BUFFER_LEN = 1048576 |
| Included modules |
| Clair::Config |
| Hash::Flatten qw ( flatten ) |
| Synopsis |
| Provides an interface between Clair::Cluster and the Java machine learning toolkit Weka. The interface writes document feature vectors to ARFF files automatically and offers simple, low-overhead use of Weka classifiers for training and testing. In the future, this package is intended to provide a seamless interface between Clairlib and Weka for machine learning tasks whose tools are implemented in Weka. |
| Methods |
| test_classifier |
| train_classifier |
| write_ARFF |
| test_classifier (description) |
| Clair::Interface::Weka::test_classifier(mem => $mem, classifier => $classifier, modelfile => $modelfile, testfile => $testfile, predfile => $predfile, logfile => $logfile); (public) |
| Evaluates a Weka classifier of the specified class, given a previously trained model and a test file in ARFF format. The parameters below control the evaluation and the output it produces. |
| mem (optional): A number giving the heap size, in megabytes, to be allocated by the Java VM. The argument actually passed to the VM is "-mx${mem}m". |
| classifier: The full package name of the Weka classifier to be used, e.g. "weka.classifiers.rules.ZeroR". If omitted, "weka.classifiers.rules.ZeroR" is used. |
| modelfile: A path to a file containing a previously trained model of the specified class of classifier. |
| testfile: A path to an existing ARFF file to be used for evaluating the trained classifier. |
| predfile (optional): A path to where the classifier's predictions for each feature vector in the test data are to be written. |
| logfile (optional): A path to where a log of the classifier's output is to be written. |
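To make the parameter mapping concrete, here is a minimal sketch of how such arguments could be turned into a Weka command line. It is not the module's own execution template, which is not shown in this listing. The helper name `build_test_command` is hypothetical, and the choice of the Weka options `-l` (load model), `-T` (test file) and `-p 0` (print predictions) is an assumption about what an evaluation run would need.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Clair::Interface::Weka): assemble a
# Weka evaluation command string from named parameters.
sub build_test_command {
    my %params = @_;

    # A bare number of megabytes becomes a JVM heap option, e.g. "-mx512m".
    my $mem        = defined $params{mem} ? "-mx$params{mem}m" : "";
    my $classifier = $params{classifier} || "weka.classifiers.rules.ZeroR";

    # Drop empty or undefined pieces so no stray spaces appear.
    my @cmd = grep { length } (
        "java", $mem, $classifier,
        "-l", $params{modelfile},    # load a previously trained model
        "-T", $params{testfile},     # evaluate on this ARFF file
    );

    # Assumption: with a prediction file, ask Weka for per-instance
    # predictions and redirect them; otherwise redirect to the log file.
    if (defined $params{predfile}) {
        push @cmd, "-p", "0", ">", $params{predfile};
    }
    elsif (defined $params{logfile}) {
        push @cmd, ">", $params{logfile};
    }
    return join " ", @cmd;
}

print build_test_command(
    mem        => 512,
    classifier => "weka.classifiers.bayes.NaiveBayes",
    modelfile  => "model.bin",
    testfile   => "test.arff",
    predfile   => "predictions.txt",
), "\n";
```

The sketch joins everything into one shell string, so paths containing spaces would need quoting before the string is passed to a shell.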
| train_classifier (description) |
| Clair::Interface::Weka::train_classifier(mem => $mem, classifier => $classifier, trainfile => $trainfile, modelfile => $modelfile, testfile => $testfile, logfile => $logfile); (public) |
| Trains a Weka classifier of the specified class, given a training file and, optionally, a test file in ARFF format. The parameters below control the training and the output it produces. |
| mem (optional): A number giving the heap size, in megabytes, to be allocated by the Java VM. The argument actually passed to the VM is "-mx${mem}m". |
| classifier: The full package name of the Weka classifier to be used, e.g. "weka.classifiers.rules.ZeroR". |
| trainfile: A path to an ARFF file containing the training data. |
| modelfile: A path to where the classifier model is to be written. |
| testfile (optional): A path to an existing ARFF file to be used for testing the trained classifier. If none is specified, tenfold cross-validation on the training data is used instead. |
| logfile (optional): A path to where a log of the classifier's output is to be written. |
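The choice between a held-out test file and cross-validation can be illustrated with a short sketch. The helper `build_train_command` is hypothetical, and the use of Weka's `-t` (training file) and `-d` (save model) options is an assumption, since the module's execution template is not shown here. Only the `-T` fragment appears in the listed code. When `-T` is absent, Weka falls back to tenfold cross-validation on the training data.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Clair::Interface::Weka): assemble a
# Weka training command string from named parameters.
sub build_train_command {
    my %params = @_;

    my $mem = defined $params{mem} ? "-mx$params{mem}m" : "";

    my @cmd = grep { length } (
        "java", $mem, $params{classifier},
        "-t", $params{trainfile},    # ARFF training data
        "-d", $params{modelfile},    # where the trained model is saved
    );

    # With a test file, evaluate on it; without one, nothing is added and
    # Weka cross-validates on the training data.
    push @cmd, "-T", $params{testfile} if defined $params{testfile};
    push @cmd, ">",  $params{logfile}  if defined $params{logfile};

    return join " ", @cmd;
}

print build_train_command(
    mem        => 256,
    classifier => "weka.classifiers.trees.J48",
    trainfile  => "train.arff",
    modelfile  => "model.bin",
    logfile    => "train.log",
), "\n";
```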
| write_ARFF (description) |
| Clair::Interface::Weka::write_ARFF($c, $outfile, $header); (public) |
| Writes feature vectors for all the documents in the specified cluster to a Weka ARFF (Attribute-Relation File Format) file. |
| c: A reference to a Clair::Cluster object. |
| outfile: A path to where the ARFF file is to be written. |
| header: A string of header text to be prepended, as comments, to the ARFF file. |
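For readers unfamiliar with ARFF, the sketch below shows the general shape of such a file: "%" comment lines, an @RELATION line, @ATTRIBUTE declarations, and an @DATA section with one row per document. The helper `arff_text` and its argument layout are hypothetical. How write_ARFF itself chooses attributes and class labels from a Clair::Cluster is not shown in this listing, so plain Perl data structures stand in for the cluster here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Clair::Interface::Weka): render feature
# vectors as ARFF text.
#   $relation - relation name
#   $header   - free text, emitted as "%" comment lines
#   $attrs    - array ref of numeric attribute names
#   $classes  - array ref of class labels
#   $rows     - array ref of [ \%features, $class_label ] pairs
sub arff_text {
    my ($relation, $header, $attrs, $classes, $rows) = @_;
    my @out;

    # Header text is prepended as comments, one "%" line per input line.
    push @out, map { "% $_" } split /\n/, ($header // "");
    push @out, "\@RELATION $relation", "";
    push @out, map { "\@ATTRIBUTE $_ NUMERIC" } @$attrs;
    push @out, "\@ATTRIBUTE class {" . join(",", @$classes) . "}", "";
    push @out, "\@DATA";

    for my $row (@$rows) {
        my ($features, $label) = @$row;
        # Features absent from a document are written as 0.
        push @out, join ",", (map { $features->{$_} // 0 } @$attrs), $label;
    }
    return join("\n", @out) . "\n";
}

print arff_text(
    "demo_cluster",
    "Generated for illustration\nTwo documents, two features",
    [ "apple", "banana" ],
    [ "fruit", "other" ],
    [
        [ { apple => 2, banana => 1 }, "fruit" ],
        [ { apple => 1 },              "other" ],
    ],
);
```

Attribute names containing spaces or commas would need quoting in a real ARFF file, which this sketch does not handle.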
| test_classifier (code) |
sub test_classifier
{
    my %params = @_;
    my $mem = (defined $params{mem} ? "-mx$params{mem}m" : "");
    my $classifier = $params{classifier} || "weka.classifiers.rules.ZeroR ";
    my $modelfile = $params{modelfile};
    my $testfile = $params{testfile};
    my $predfile = (defined $params{predfile} ? "> $params{predfile}" : "");
    my $logfile = (defined $params{logfile} ? "> $params{logfile}" : "");
    # Execution templates (depending on whether $predfile is defined):
} |
| train_classifier (code) |
sub train_classifier
{
    my %params = @_;
    my $mem = (defined $params{mem} ? "-mx$params{mem}m" : "");
    my $classifier = $params{classifier};
    my $trainfile = $params{trainfile};
    my $modelfile = $params{modelfile};
    my $testfile = (defined $params{testfile} ? "-T $params{testfile}" : "");
    my $logfile = (defined $params{logfile} ? "> $params{logfile}" : "");
    # Execution template:
} |
| write_ARFF (code) |
sub write_ARFF
{
    my $c = shift;
    my $outfile = shift;
    my $header = shift;
    # Include the OS error ($!) so a failed open is easier to diagnose
    open(local *FH, '>', $outfile)
        or die "write_ARFF() - unable to open $outfile for output: $!";
    # Take relation name from cluster id, or '?' if not defined
} |