Clair

Network


SummaryPackage variablesSynopsisGeneral documentationMethods

SummaryTop
Clair::Network - Network Class for the CLAIR Library
head1 VERSION
Version 0.01

Package variablesTop
No package variables defined.

Included modulesTop
BerkeleyDB
Clair::Config
Clair::Document
Clair::GraphWrapper
Clair::Network::Reader::Edgelist
Clair::Network::Reader::Pajek
Clair::Network::Writer::Edgelist
Clair::Network::Writer::Pajek
Clair::Statistics::Distributions::TDist
Clair::Util
File::Temp qw /tempfile tempdir/
Graph::Directed
Graph::Undirected
MEAD::SimRoutines
MLDBM qw ( DB_File Storable )
Math::MatrixReal
Storable qw /dclone/
lib " $MEAD_HOME /lib "

SynopsisTop
The Network Class is one of the core modules for the CLAIR library. The Network described a structure
of relationships between nodes, and has operations for performing many typical graph functions, such as
finding the diameter of the graph, adding and removing normal and external nodes, and creating edges.

DescriptionTop
No description!
MethodsTop
DESTROYNo descriptionCode
Watts_Strogatz_clus_coeffDescriptionCode
Watts_Strogatz_local_clus_coeffDescriptionCode
add_edgeDescriptionCode
add_nodeDescriptionCode
add_weighted_edgeDescriptionCode
average_cosinesDescriptionCode
average_shortest_pathDescriptionCode
avg_in_degreeDescriptionCode
avg_out_degreeDescriptionCode
avg_total_degreeDescriptionCode
compute_cohesionDescriptionCode
compute_in_link_histogramDescriptionCode
compute_out_link_histogramDescriptionCode
compute_pagerankDescriptionCode
compute_random_walk_stepNo descriptionCode
compute_rank_resultNo descriptionCode
compute_rank_stepNo descriptionCode
compute_stationary_distributionDescriptionCode
compute_total_link_histogramDescriptionCode
cosine_histogramsDescriptionCode
count_linesNo descriptionCode
create_cluster_from_lexrankDescriptionCode
create_cosine_dat_filesDescriptionCode
create_network_from_lexrankDescriptionCode
create_subset_networkDescriptionCode
create_subset_network_from_fileDescriptionCode
create_uniform_vectorNo descriptionCode
dfs_visit_1DescriptionCode
dfs_visit_2DescriptionCode
diameterDescriptionCode
export_to_PajekDescriptionCode
find_largest_componentDescriptionCode
find_largest_component_sizeNo descriptionCode
find_pathDescriptionCode
find_sccDescriptionCode
get_current_probability_distributionDescriptionCode
get_current_probability_matrixNo descriptionCode
get_dat_statsDescriptionCode
get_edge_attributeDescriptionCode
get_edge_weightDescriptionCode
get_edgesDescriptionCode
get_histogram_as_stringNo descriptionCode
get_indexDescriptionCode
get_node_weightDescriptionCode
get_predecessor_matrixDescriptionCode
get_property_hashNo descriptionCode
get_property_matrixNo descriptionCode
get_property_vectorNo descriptionCode
get_sccNo descriptionCode
get_shortest_pathDescriptionCode
get_transition_probability_matrixNo descriptionCode
get_undirected_graphDescriptionCode
get_vertex_attributeDescriptionCode
get_verticesDescriptionCode
harmonic_mean_geodesic_distanceDescriptionCode
has_edgeDescriptionCode
has_edge_attributeDescriptionCode
has_nodeDescriptionCode
has_vertex_attributeDescriptionCode
is_vector_change_within_toleranceNo descriptionCode
iterative_dfs_visit_1DescriptionCode
iterative_dfs_visit_1_v2DescriptionCode
iterative_dfs_visit_2DescriptionCode
iterative_dfs_visit_2_v2DescriptionCode
make_matrix_stochasticNo descriptionCode
make_transitions_stochasticNo descriptionCode
newDescriptionCode
new_average_shorest_pathNo descriptionCode
new_hyperlink_networkNo descriptionCode
newman_power_law_exponentDescriptionCode
num_documentsDescriptionCode
num_linksDescriptionCode
num_nodesDescriptionCode
num_pairsDescriptionCode
perform_next_random_walk_stepNo descriptionCode
perform_next_rank_stepNo descriptionCode
power_law_exponentDescriptionCode
power_law_in_link_distributionDescriptionCode
power_law_out_link_distributionDescriptionCode
power_law_total_link_distributionDescriptionCode
print_current_lexrank_distributionDescriptionCode
print_current_pagerank_distributionDescriptionCode
print_current_probability_distributionDescriptionCode
print_dbNo descriptionCode
print_edges_with_propertyNo descriptionCode
print_hyperlink_edgesDescriptionCode
print_property_distributionNo descriptionCode
read_initial_probability_distributionDescriptionCode
read_matrix_property_from_fileNo descriptionCode
read_pagerank_initial_distributionDescriptionCode
read_pagerank_personalizationDescriptionCode
read_pagerank_probabilities_from_fileDescriptionCode
read_property_from_fileNo descriptionCode
read_transition_probabilities_from_fileNo descriptionCode
remove_edgeDescriptionCode
remove_nodeDescriptionCode
save_current_pagerank_distributionDescriptionCode
save_current_probability_distributionNo descriptionCode
save_edges_with_property_to_fileNo descriptionCode
save_hyperlink_edges_to_fileDescriptionCode
save_matrix_property_to_fileNo descriptionCode
save_pagerank_probabilities_to_fileDescriptionCode
save_property_distributionNo descriptionCode
save_transition_probabilities_to_fileNo descriptionCode
scale_to_unit_intervalNo descriptionCode
set_current_probability_matrixNo descriptionCode
set_edge_attributeDescriptionCode
set_edge_weightDescriptionCode
set_node_weightDescriptionCode
set_properties_from_matrixNo descriptionCode
set_property_matrixNo descriptionCode
set_transition_probability_from_matrixNo descriptionCode
set_vertex_attributeDescriptionCode
write_dbDescriptionCode
write_histogram_matlabDescriptionCode
write_link_distDescriptionCode
write_link_matlabDescriptionCode
write_linksDescriptionCode
write_nodesDescriptionCode

Methods description


Watts_Strogatz_clus_coeffcode    nextTop
Watts_Strogatz_clus_coeff(filename => $filename)
Computes the Watts Strogatz clustering coefficient. If a
filename is provided, intermediate output is written to
the file.

Watts_Strogatz_local_clus_coeffcodeprevnextTop
Get the local clustering coefficient for each vertex
This is only defined for vertices with > 2 edges

add_edgecodeprevnextTop
add_edge($id1, $id2);
Creates an edge between the two vertices specified. If a vertex is not already part
of the graph, it is automatically added.

add_nodecodeprevnextTop
add_node($id, $text)
Adds a vertex to the graph. Vertex has attribute text set to $text

add_weighted_edgecodeprevnextTop
add_weighted_edge($id1, $id2, $w);
Creates an edge between the two vertices specified. If a vertex is
not already part of the graph, it is automatically added.

average_cosinescodeprevnextTop
($linked_avg, $not_linked_avg) = average_cosines($cosine_matrix_reference)
Returns the average of the cosines between documents that are connected
in the matrix and between documents that are not connected. The averages
are returned in an array.

average_shortest_pathcodeprevnextTop
average_shortest_path()
Finds the average shortest path of a graph. The average shortest path is the
average of all of the shortest paths between pairs of vertices. To compute
this, we loop through each vertex, computing the shortest paths to all vertices
that vertex reaches. The average of that vertex is then computed. This is
repeated for all vertices with greater than zero out-degree in the graph, and
the average of that is returned.

avg_in_degreecodeprevnextTop
avg_in_degree()
Returns the average number of inlinks per node in the network

avg_out_degreecodeprevnextTop
avg_out_degree()
Returns the average number of outlinks per node in the network

avg_total_degreecodeprevnextTop
avg_total_degree()
Returns the average number of links (both out and in) per node in the network

compute_cohesioncodeprevnextTop
compute_cohesion
Computes the cohesion of the documents in the network.

compute_in_link_histogramcodeprevnextTop
compute_in_link_histogram()
Returns a histogram of the number of inlinks per node in the network

compute_out_link_histogramcodeprevnextTop
compute_out_link_histogram()
Returns a histogram of the number of outlinks per node in the network

compute_pagerankcodeprevnextTop
compute_pagerank(pagerank_value => 'pagerank_value', pagerank_transition => 'pagerank_transition',
pagerank_bias => 'pagerank_bias', jump => 0.15, tolerance => 0.0001, max_iterations => 200)
Computes the pagerank for the network. The property given for pagerank_value is used for the
initial value, and for pagerank_transition for the transition probabilities. The pagerank_bias
property is used to set the bias. If the network does not have any values for that property
(or they are all zero) then the unbiased pagerank is computed.
All parameters are optional, the defaults for the properties are given. Passing zero for any
numerical parameter (or not specifying that parameter) will cause the default value to be used.
The result is saved as the pagerank_value property of each node.

compute_stationary_distributioncodeprevnextTop
compute_stationary_distribution
Computes the stationary distribution from a random walk. This uses the values from the
probability distribution and the transition probabilities.

compute_total_link_histogramcodeprevnextTop
compute_total_link_histogram()
Returns a histogram of the number of links (both out and in) per node in the network

cosine_histogramscodeprevnextTop
cosine_histograms($cosine_matrix_reference)
Returns a histograms for cosines that are linked in the
graph and for cosines that are not.

create_cluster_from_lexrankcodeprevnextTop
create_cluster_from_lexrank($threshold, attribute_name => 'document', parent_document => 0)
Creates a cluster with any documents that currently have a lexrank value above the threshold.
The optional attribute_name parameter specifies what attribute of the node contains the
document. 'document', the default, is the attribute that will be used if the network
was created from a cluster. Setting the optional parent_document parameter to 1 will
create the cluster out of the parent document of each document, rather than the document
itself.

create_cosine_dat_filescodeprevnextTop
create_cosine_dat_files($domain, $cosine_matrix_reference, directory => $directory)
Creates dat files with information from the cosine matrix, based on
randomly selected cosines

create_network_from_lexrankcodeprevnextTop
create_network_from_lexrank
Creates a network with any nodes that currently have a lexrank value above the threshold.

create_subset_networkcodeprevnextTop
create_subset_network($@subset_vertices);
Creates a network with just the nodes in the array provided as the first parameter. Edges
from the original network are carried across to the network if they are between two
nodes that are in the new network.

create_subset_network_from_filecodeprevnextTop
create_subset_network_from_file($filename)
Creates a network with just the nodes listed in the file, one per each line. Edges from
the original network are carried across to the new network if they are between two
nodes that are in the new network.

dfs_visit_1codeprevnextTop
An internal function used by find_scc

dfs_visit_2codeprevnextTop
An internal function used by find_scc

diametercodeprevnextTop
diameter(filename => $filename, directed => 1, max => 0)
Returns the diameter of the network. If the parameter 'directed' is 1
or not specified, this is the diameter of the directed network. If it
is 0 or the parameter 'undirected' is 1, then this is the diameter of
the undirected network. If max is 1 or not specified, then this is the
maximum diameter. If max is 0 or avg is 1, then this is the average diameter.
A filename may also be specified to produce debugging information while
the diameter is being determined.

export_to_PajekcodeprevnextTop
export_to_Pajek($networkName, $filename)
Write the network to the file $filename in Pajek form, giving it the specified network name.

find_largest_componentcodeprevnextTop
find_largest_component($type)
type is the type of component, either "weakly" or "strongly"
Finds the largest component in a graph, returning a network made up of that
component.

find_pathcodeprevnextTop
@path = find_path($s, $v)
Finds the shortest path from $s to $v using the Floyed Warshall algorithm.

find_scccodeprevnextTop
find_scc($dbfile, $xpfile, $finfile)
$dbfile should be the filename of a db file of the links
that will be used by find_scc (the file can be produced
with write_db)
$xpfile should be the filename of a db file of the links
tranposed
$finfile is the location where a temporary file should go
that will be used by find_scc and the helper functions
find_scc finds a strongly connected subgraph from the graph
of the network. It needs to input files, a db file of the
links and a db file of the transposed links.

get_current_probability_distributioncodeprevnextTop
get_current_probability_distribution
Returns a hash with the current probability values (the values used for the random walk)

get_dat_statscodeprevnextTop
get_dat_stats($domain, $links_file, $cosine_file)
Returns a string with statistics obtained from the analyzing the dat
files created by create_cosine_dat_files

get_edge_attributecodeprevnextTop
get_edge_attribute($u, $v, $attribute_name)
Returns the value of the attribute on the given edge

get_edge_weightcodeprevnextTop
get_edge_weight($u, $v)
Returns the weight of the given edge.

get_edgescodeprevnextTop
get_edges
Returns the edges of the network

get_indexcodeprevnextTop
An internal function used by cosine_histograms. Used to determine
what bin a cosine value should go into.

get_node_weightcodeprevnextTop
get_node_weight($id)
Returns the weight of the specified vertex.

get_predecessor_matrixcodeprevnextTop
$matrix = get_predecessor_matrix()
Get the shortest path matrix from the network, using BFS algorithm.
The content of the matrix is the predecessor of the current node in the shortest path matrix.
e.g. : $matrix->{$i}->{$j} notes the predecessor of node $j in the shortest path from $i to $j
to get the shortest path from $i to $j, you can use function get_shortest_path

get_shortest_pathcodeprevnextTop
$path = get_shortest_path($start, $end)
Get the shortest path from $start to $end.

get_undirected_graphcodeprevnextTop
get_undirected_graph($graph)
Takes a graph and returns its undirected equivalent. This maintains the weight
on each edge and vertex.

get_vertex_attributecodeprevnextTop
get_vertex_attribute($u, $attribute_name)
Returns the value of the attribute on the given vertex

get_verticescodeprevnextTop
get_vertices
Returns the array of vertices (nodes) in the network

harmonic_mean_geodesic_distancecodeprevnextTop
Compute the harmonic mean geodesic distance

has_edgecodeprevnextTop
has_edge($u, $v)
Returns true if an edge exists in the network, false otherwise

has_edge_attributecodeprevnextTop
has_edge_attribute($u, $v, $attribute_name)
Returns true if the attribute has been set on the given edge and false otherwise.

has_nodecodeprevnextTop
has_node($u)
Returns true if the node is in the network

has_vertex_attributecodeprevnextTop
has_vertex_attribute($u, $attribute_name)
Returns true if the attribute has been set on the given vertex and false otherwise.

iterative_dfs_visit_1codeprevnextTop
An internal function used by find_scc

iterative_dfs_visit_1_v2codeprevnextTop
An internal function used by find_scc

iterative_dfs_visit_2codeprevnextTop
An internal function used by find_scc

iterative_dfs_visit_2_v2codeprevnextTop
An internal function used by find_scc

newcodeprevnextTop
$network = new Clair::Network();
Creates a new, empty network

newman_power_law_exponentcodeprevnextTop
newman_power_law_exponent($histogram_reference, $x_cutoff)
Computes the power law exponent on the histogram passed in by reference.
This uses the method described in Newman\'s "Power laws, Pareto distributions
and Zipf's law", formula 5 and 6.
Return value is an array containing two items, the power law exponent, and a
measure of the statistical error

num_documentscodeprevnextTop
num_documents
Returns the number of documents in the network

num_linkscodeprevnextTop
num_links
Returns the number of links (edges) in the network. If the parameter external is specified
and set to 1 (or internal is specified and set to 0), the number of external links is returned,
otherwise the number of internal links is given. If the parameter unique is specified and equal
to 1, only unique links will be counted.

num_nodescodeprevnextTop
num_nodes
Returns the number of nodes in the network

num_pairscodeprevnextTop
num_pairs
Returns the number of pairs of documents, defined as nd * (nd - 1) / 2 where nd is the number
of documents.

power_law_exponentcodeprevnextTop
power_law_exponent($histogram_reference)
Computes the power law coefficient on the histogram passed in by reference.
This uses linear regression on the logs of the data points to find both the
coefficient and the exponent.
Retun value is a string of the form "y = a x^b" where a is the coefficient and
b is the exponent.

power_law_in_link_distributioncodeprevnextTop
power_law_in_link_distribution()
Returns the power law formula from the in link distribution

power_law_out_link_distributioncodeprevnextTop
power_law_out_link_distribution()
Returns the power law formula from the out link distribution

power_law_total_link_distributioncodeprevnextTop
power_law_total_link_distribution()
Returns the power law formula from the total link distribution
(both in and out links)

print_current_lexrank_distributioncodeprevnextTop
print_current_lexrank_distribution
Prints the current lexrank values. If the lexrank has been calculated, these are the
results, otherwise this may be the initial or intermediate values.

print_current_pagerank_distributioncodeprevnextTop
print_current_pagerank_distribution
Prints the current pagerank values. If the pagerank has been calculated, these are the
results, otherwise this may be the initial or intermediate values.

print_current_probability_distributioncodeprevnextTop
print_current_probability_distribution
Prints the current probability values from the random walk. If the stationary distribution
has been calculated, these are the results, otherwise these may be the initial or
intermediate values

print_hyperlink_edgescodeprevnextTop
print_hyperlink_edges
Prints all edges with the 'pagerank_transition' property set. In the case of networks
built from hyperlinks from clusters, these edges are the edges that had a hyperlink
between them.
The edges are listed as source, then destination.

read_initial_probability_distributioncodeprevnextTop
read_initial_probability_distribution($filename)
Reads the initial probabilities for the random walk from the specified file.

read_pagerank_initial_distributioncodeprevnextTop
read_pagerank_initial_distribution($filename)
Reads the initial pagerank values from the specified file

read_pagerank_personalizationcodeprevnextTop
read_pagerank_personalization($filename)
Reads the pagerank personalization values (bias) from the specified file

read_pagerank_probabilities_from_filecodeprevnextTop
read_pagerank_probabilities_from_file($filename)
Read the pagerank transition probabilities from the specified file

remove_edgecodeprevnextTop
remove_edge($u, $v)
Removes the edge from $u to $v from the graph.

remove_nodecodeprevnextTop
remove_node($id)
Removes the vertex with id $id from the graph

save_current_pagerank_distributioncodeprevnextTop
save_current_pagerank_distribution($filename)
Saves the current pagerank values to a file. If pagerank has been calculated, then these
are the results, otherwise these could be initial or intermediate values.

save_hyperlink_edges_to_filecodeprevnextTop
save_hyperlink_edges_to_file($filename)
Saves all edges with the 'pagerank_transition' property set to the specified file.
In the case of networks built from hyperlinks from clusters, these edges are the
edges that had a hyperlink between them.
The edges are listed as source, then destination.

save_pagerank_probabilities_to_filecodeprevnextTop
save_pagerank_probabilities_to_file
Saves the transition probabilities used in pagerank to the specified file.

set_edge_attributecodeprevnextTop
set_edge_attribute($u, $v, $attribute_name, $value)
Sets the attribute for the given edge to the given value

set_edge_weightcodeprevnextTop
set_edge_weight($u, $v, $weight)
Sets the weight of the given edge.

set_node_weightcodeprevnextTop
set_node_weight($id, $weight)
Set the weight of node $id.

set_vertex_attributecodeprevnextTop
set_vertex_attribute($u, $attribute_name, $value)
Sets the attribute for the vertex to the given value

write_dbcodeprevnextTop
write_db($filename, transpose => 1)
Writes the graph's links to a db file. Links are written
transposed if the parameter transpose is provided and
equal to 1.

write_histogram_matlabcodeprevnextTop
write_histogram_matlab($linked_histogram_reference, $not_linked_histogram_reference, $filename_base)
Writes matlab files for linked, linked cumulative, and not linked
histograms based on the histogram distributions given.

write_link_distcodeprevnextTop
write_link_dist($histogram_reference, $filename)
Writes a link distribution file for the histogram that is passed in
by reference

write_link_matlabcodeprevnextTop
write_link_matlab($histogram_reference, $filename, $dependency)
Writes a Matlab for the histogram. $histogram_reference should
be a reference to the histogram that should be written to the
matlab file. $dependency is the names of any dependencies that
the Matlab file should have

write_linkscodeprevnextTop
write_links($filename, skip_duplicates => 1, transpose => 1, weights => 0)
Writes the network links to a file. If the parameter skip_duplicates
is specified as 1, duplicate edges are skipped. If the parameter
transpose is 1, the links are written transposed.

write_nodescodeprevnextTop
write_nodes($filename)
Writes the list of nodes in the network to a file.

Methods code


DESTROYdescriptionprevnextTop
sub DESTROY {
  my $self = shift;

  if (defined $self->{adjacency_matrix}) {
    untie %{$self->{adjacency_matrix}};
  }
  if (defined $self->{adjacency_matrix_file}) {
    unlink $self->{adjacency_matrix_file} or die "Couldn't delete " .
      $self->{adjacency_matrix_file} . ": $!\n";
  }

  if (defined $self->{path_length_matrix}) {
    untie %{$self->{path_length_matrix}};
  }
  if (defined $self->{path_length_matrix_filename}) {
    unlink $self->{path_length_matrix_filename} or die "Couldn't delete " .
      $self->{path_length_matrix_filename} . ": $!\n";
  }
}

Watts_Strogatz_clus_coeffdescriptionprevnextTop
sub Watts_Strogatz_clus_coeff {
  my $self = shift;
  my $graph = $self->{graph};

  my %parameters = @_;

  my $write_to_file = 0;
  my $filename = "";
  if (exists $parameters{filename}) {
    $write_to_file = 1;
    $filename = $parameters{filename};
    open(WATT, "> $filename") or die("Could not open file: $filename\n");
  }

  my %link_hash;

  if ($write_to_file == 1) {
    print WATT "reading the input...\n";
  }

  %link_hash = $self->get_adjacency_matrix(undirected => 1);

  if ($write_to_file == 1) {
    print WATT "done!\n";
  }

  my $sum = 0;
  my $count = 0;
  my $skipped = 0;

  foreach my $v (keys %link_hash) {
    my $c = 0;
    my $connected = 0;
    my %nn;
    my @neighbors;

    if (exists $link_hash{$v}) {
      @neighbors = keys %{$link_hash{$v}};
      if (@neighbors > 1) {
        if (@neighbors > 5000) {
          $skipped++;
          next;
        }
        foreach my $n1 (0..$#neighbors) {
          foreach my $n2 ($n1+1..$#neighbors) {
            if (exists $link_hash{$neighbors[$n1]}{$neighbors[$n2]}) {
              $connected++;
            }
          }
        }
        $c = 2 * $connected / (@neighbors * (@neighbors-1));
} } $count++; if ($write_to_file == 1) { print WATT "clustering coefficient for $v is $c\n"; } $sum += $c; } if ($write_to_file == 1) { print WATT "\nnumber of nodes skipped (with degree > 5000): $skipped\n"; } close WATT; if ($count == 0) { return 0; } else { return $sum/$count;
}
}

Watts_Strogatz_local_clus_coeffdescriptionprevnextTop
sub Watts_Strogatz_local_clus_coeff {
  my $self = shift;
  my $graph = $self->{graph};

  my %parameters = @_;

  my $write_to_file = 0;
  my $filename = "";
  if (exists $parameters{filename}) {
    $write_to_file = 1;
    $filename = $parameters{filename};
    open(WATT, "> $filename") or die("Could not open file: $filename\n");
  }

  my %link_hash;

  if ($write_to_file == 1) {
    print WATT "reading the input...\n";
  }

  %link_hash = $self->get_adjacency_matrix();

  my $skipped = 0;
  my %local_cc = ();

  foreach my $v (keys %link_hash) {
    my $c = 0;
    my $connected = 0;
    my %nn;
    my @neighbors;

    if (exists $link_hash{$v}) {
      @neighbors = keys %{$link_hash{$v}};
      if (@neighbors > 1) {
        if (@neighbors > 5000) {
          $skipped++;
          next;
        }
        foreach my $n1 (0..$#neighbors) {
          foreach my $n2 ($n1+1..$#neighbors) {
            if (exists $link_hash{$neighbors[$n1]}{$neighbors[$n2]}) {
              $connected++;
            }
          }
        }
        $c = 2 * $connected / (@neighbors * (@neighbors-1));
} } if ($write_to_file == 1) { print WATT "clustering coefficient for $v is $c\n"; } $local_cc{$v} = $c; } if ($write_to_file == 1) { print WATT "\nnumber of nodes skipped (with degree > 5000): $skipped\n"; } close WATT; return %local_cc;
}

add_edgedescriptionprevnextTop
sub add_edge {
        my $self = shift;

        my $u = shift;
        my $v = shift;

        my $graph = $self->{graph};

        $graph->add_edge($u, $v);
        $self->clear_cache();
}

add_nodedescriptionprevnextTop
sub add_node {
        my $self = shift;

        my $node = shift;
        my @remaining_args = @_;

        # my $text = shift;
my %parameters = @_; my $graph = $self->{graph}; $graph->add_vertex($node); foreach $key (keys %parameters) { $graph->set_vertex_attribute($node, $key, $parameters{$key}); } $self->clear_cache();
}

add_weighted_edgedescriptionprevnextTop
sub add_weighted_edge {
  my $self = shift;

  my $u = shift;
  my $v = shift;
  my $w = shift;

  my $graph = $self->{graph};

  $graph->add_weighted_edge($u, $v, $w);
  $self->clear_cache();
}

average_cosinesdescriptionprevnextTop
sub average_cosines {
        my $self = shift;
        my $graph = $self->{graph};

        my $cm = shift;
        my %cos_matrix = %$cm;

        my $tot_link_cos = 0;
        my $link_count = 0;

        my $tot_nl_cos = 0;
        my $nl_count = 0;

        foreach my $doc1 (keys %cos_matrix) {
                foreach my $doc2 (keys %{ $cos_matrix{$doc1} }) {
                        if ($graph->has_edge($doc1, $doc2)) {
                                $tot_link_cos += $cos_matrix{$doc1}{$doc2};
                                $link_count++;
                        } else {
                                $tot_nl_cos += $cos_matrix{$doc1}{$doc2};
                                $nl_count++;
                        }
                }
        }

        my $link_avg = 0;
        if ($link_count > 0) {
                $link_avg = $tot_link_cos/$link_count;
} my $nl_avg = 0; if ($nl_count > 0) { $nl_avg = $tot_nl_cos/$nl_count;
} return ($link_avg, $nl_avg);
}

average_shortest_pathdescriptionprevnextTop
sub average_shortest_path {
  my $self = shift;
  my $graph = $self->{graph};

  my $directed = $self->{directed};

  my $asp_matrix = $self->get_shortest_path_matrix(directed => $directed);

  my %avg = ();
  my $total_cnt = 0;
  foreach my $v1 (keys %{$asp_matrix}) {
    my $sum = 0;
    my $cnt = 0;
    foreach my $v2 (keys %{$asp_matrix->{$v1}}) {
      my $len = $asp_matrix->{$v1}{$v2};
      if ($len >= 0) {
        $sum += $len;
        $cnt++;
      }
    }
    if ($cnt > 1) {
      $avg{$v1} = $sum / $cnt;
$total_cnt++; } } my $sum = 0; foreach my $k (keys %avg) { $sum += $avg{$k}; } if (int(scalar(keys %avg)) == 0) { return 0; } else { return $sum / int(scalar(keys %avg));
}
}

avg_in_degreedescriptionprevnextTop
sub avg_in_degree {
        my $self = shift;
        my %histogram = $self->compute_in_link_histogram();

        my $total_in = 0;
        my $num_nodes = scalar $self->get_vertices();
        foreach my $value (keys %histogram)
        {
                #skip nodes that have no links
next if $value == 0; my $num = $histogram{$value}; $total_in += $value * $num; } return $total_in/$num_nodes;
}

avg_out_degreedescriptionprevnextTop
sub avg_out_degree {
        my $self = shift;
        my %histogram = $self->compute_out_link_histogram();

        my $total_out = 0;
        my $num_nodes = scalar $self->get_vertices();
        foreach my $value (keys %histogram)
        {
                #skip nodes that have no links
next if $value == 0; my $num = $histogram{$value}; $total_out += $value * $num; } return $total_out/$num_nodes;
}

avg_total_degreedescriptionprevnextTop
sub avg_total_degree {
        my $self = shift;
        my %histogram = $self->compute_total_link_histogram();

        if (!$self->{directed}) {
          return $self->{graph}->average_degree();
        }
        my $total = 0;
        my $num_nodes = 0;
        foreach my $value (keys %histogram)
        {
                #skip nodes that have no links
next if $value == 0; my $num = $histogram{$value}; $total += $value * $num; $num_nodes += $num; } my $deg = 0; if ($self->{directed}) { $deg = $total / $num_nodes;
} else { $deg = ($total / $num_nodes) / 2; } return $deg;
}

compute_cohesiondescriptionprevnextTop
sub compute_cohesion {
        my $self = shift;
        my %parameters = @_;

        my $text_of = $self->{text_of};

        my %cosine_of;

        my $graph = $self->{graph};

        foreach $u ($graph->vertices) {
                my %cosines_of_u;

                foreach $v ($graph->vertices) {
                        if ($u < $v) {
                                my $text_of_u = $text_of->{$u};
                                my $text_of_v = $text_of->{$v};

                                my $cosine = GetLexSim($text_of_u, $text_of_v);
                                print "$u $v, $cosine\n";
                                $cosines_of_u{$v} = $cosine;
                        }
                }

                if (scalar(keys(%u_cosine_of)) > 0) {
                        $cosine_of{$u} =\% cosines_of_u;
                }
        }

        return %cosine_of;
}

compute_in_link_histogramdescriptionprevnextTop
sub compute_in_link_histogram {
        my $self = shift;
        my $graph = $self->{graph};

        my %histogram = ();

        foreach my $v ($graph->vertices)
        {
                my $num_in = $graph->predecessors($v);
                if (not exists $histogram{$num_in} )
                {
                        $histogram{$num_in} = 1;
                } else {
                        $histogram{$num_in}++;
                }
        }

        return %histogram;
}

compute_out_link_histogramdescriptionprevnextTop
sub compute_out_link_histogram {
        my $self = shift;
        my $graph = $self->{graph};

        my %histogram = ();

        foreach my $v ($graph->vertices)
        {
                my $num_out = $graph->successors($v);
                if (not exists $histogram{$num_out})
                {
                        $histogram{$num_out} = 1;
                } else {
                        $histogram{$num_out}++;
                }
        }

        return %histogram;
}

compute_pagerankdescriptionprevnextTop
sub compute_pagerank {
        my $self = shift;
        my %params = @_;

        my $pagerank_value = 'pagerank_value';
        if (exists $params{pagerank_value}) {
                $pagerank_value = $params{pagerank_value};
        }

        my $pagerank_transition = 'pagerank_transition';
        if (exists $params{pagerank_transition}) {
                $pagerank_transition = $params{pagerank_transition};
        }

        my $pagerank_bias = 'pagerank_bias';
        if (exists $param{pagerank_bias}) {
                $pagerank_bias = $params{pagerank_bias};
        }

        my $jump = 0;
        if (exists $params{jump}) {
                $jump = $params{jump};
        }

        my $tolerance = 0;
        if (exists $params{tolerance}) {
                $tolerance = $params{tolerance};
        }

        my $max_iterations = 0;
        if (exists $params{max_iterations}) {
                $max_iterations = $params{max_iterations};
        }

        return $self->compute_rank_result($pagerank_value, $pagerank_transition, $jump,
                                          $pagerank_bias, tolerance => $tolerance,
                                                                                                                                                max_iterations => $max_iterations);
}

compute_random_walk_stepdescriptionprevnextTop
sub compute_random_walk_step {
        my $self = shift;
        my $graph = $self->{graph};

        my $cur_prob = shift;
        my $trans_matrix = shift;

        my $result = $trans_matrix->multiply($cur_prob)<