Development
Developer Mailing List
For questions about Clairlib, to report bugs, to suggest new or revised features, or to find out how you can contribute to Clairlib, email clairlib-dev.
Changes
====1.04B June 2008====
- Added -no-duplicated-edges in convert_network.pl
- Added largest connected component in cos_to_stats.pl
- Added full average shortest path in print_network.pl
- Fixed divide-by-zero error in Network.pm, Betweeness.pm
====1.04A April 2008====
- Added Clair::Network::GirvanNewman algorithm to do hierarchical clustering
- Added Clair::Network::KernighanLin algorithm to do graph partition
====1.04 February 2008====
- Added Clair::Network::AdamicAdar to compute the Adamic/Adar value for a given network corpus
- Added Clair::ChisqIndependent to compute the p-value and degrees of freedom for the chi-square test
====1.03 August 2007====
- Added functionality to perform community finding within weighted, undirected networks
- Added util/chunk_document.pl to break documents into smaller files by word number
- Added option to retain punctuation for idf and tf queries
- Added option to print out full lists of idf and tf values for a corpus
- LexRank moved from Clair::Network to Clair::Network::Centrality::LexRank
- LexRank usage now follows the same pattern as the other centrality modules
====1.02 July 2007====
- Distribution reorganized in standard format
- Improved and expanded installation documentation (INSTALL)
- Improved POD (inline) documentation
- Additional examples
- Updated PDF documentation
====1.01 May 2007====
- Added Phrase-based Retrieval and Fuzzy OR Queries
- Extended Clairlib-ext with interfaces for the Cluster class and the Document class to the Weka machine learning toolkit
- Added LSI functionality
- Extended parsing of strings / files into Documents
- Added perceptron learning and classification for documents
====1.0 RC1 April 2007====
- Moved all Clair modules beneath the Clair::* namespace, updated documentation
- Improved Network Analysis, added Clustering Coefficients code
- Added Network Generation and Statistics modules
====0.955 March 2007====
- Made it possible to ship Clairlib as two distributions, one containing core code and another containing code that may depend on other resources
- Cleaned up unit tests
====0.953 February 2007====
- Fixed bugs in Clair::Cluster, Clair::Document involving stemming
- Cleaned up t/ and test/ directories
- Created util/ directory
- Added scripts to util/ directory to:
  - Run a Google query and save the returned URLs to a file
  - Download files from a URL and build a corpus
  - Segment a document into sentences and build a corpus of the sentences
  - Take all documents in a directory and create a corpus
  - Index the corpus (compute TF*IDF, etc.)
  - Compute cosine similarity measures between all documents in a corpus
  - Generate networks corresponding to various cosine thresholds
  - Print network statistics about a network file
  - Generate plots of degree distribution and cosine transitions
- New methods in Clair::Network:
  - print_network_info
  - get_network_info_as_string
  - get_cumulative_distribution
  - cumulative_power_law_exponent
  - find_components
  - newman_clustering_coefficient
  - linear_regression

