Modularity

🚨 This page is a work in progress.

Motivation

We are often faced with networks whose underlying community structure is not known in advance, yet still, we need a measure to evaluate how good our proposed partition of the network into communities is. This is especially important when dealing with algorithms requiring an objective function to maximize (e.g. genetic algorithms). Newman and Girvan proposed a measure called modularity in 2003, which was adopted broadly and successfully in the network research community. Modularity is maximized for divisions of a graph when many edges fall within the proposed communities (intra-community edges) as compared to edges between communities (inter-community edges).

The following figure depicts a contrived network with different vertex-community assignments. The intuitive grouping of vertices into communities in (a) leads to the maximal achievable modularity value of for this particular graph. This grouping is characterized by only sparse connections between communities which are in themselves densely connected. In contrast, figure (b) results in a lower value of just . We will discuss the possible range of modularity values later on.

Different qualities for different vertex-community assignments
Simple network with different proposed vertex-community assignments. (a) shows the optimal grouping of vertices into three communities, while (b) depicts a bad grouping into five communities. The blue community even induces a disconnected subgraph. Image inspired by this paper.

Configuration model & modularity formula

In the following, we assume a graph with vertices and edges. The edge set is determined by the adjacency matrix1 with entries:

Of all the edges, a high fraction of edges that connect vertices in the same community should give a good partition. For a specific community , this fraction is given by

where we iterate over all edges with both ends in community . We divide by to retrieve the fraction of all edges. The factor comes from the assumption that we only deal with undirected edges, so every edge occurs twice for every order of arguments ( and ). We call this term in accordance with here. Aiming for

with being the set of all proposed communities, makes sure a high fraction of edges is within communities (and not between). However, the trivial case with all vertices in the same community yields a maximum value of . Therefore, we cannot use this measure alone, but instead compare this fraction of intra-community edges with the expected number of intra-community edges if edges were distributed at random in our graph. Note that β€œrandom” does not mean any arbitrary graph, instead, the following properties should be preserved to allow for a fair comparison:

  1. The number of nodes
  2. The number of edges
  3. The vertex degree for all vertices .

We employ the configuration model as null model (as seen here). A new, β€œempty” graph with nodes is constructed and edges are randomly inserted between vertices whilst ignoring their (potentially known) community structure.

In order to analyze the expected fraction of edges between any nodes and in a community, we cut every edge into two halves, leaving only edge stubs on the vertices as shown in the next figure. Since we have edges in our graph, we end up with edge stubs. There are ways to connect one edge stub of vertex (marked in red) to any edge stub of (marked in blue). Randomly, we could connect this one edge stub of to any of the edge stubs in the entire graph 2. Hence, the probability that one edge stub of connects to any edge stub of in our random graph is given by . As vertex has edge stubs, the probability that vertex connects to vertex , i. e. any edge stub of connects to any edge stub of , is given by

Configuration model
Configuration model with and . One possible edge between stubs is shown as dashed line.

This yields the equation for the expected number of edges that fall within a community if edges were distributed randomly in the graph while preserving node degrees:

The same reasoning as before applies here for the term . Since we are dealing with big networks where , it is reasonable to drop the aforementioned β€œβ€ in the denominator, which is common in literature (see here and here). Again, to show the connection to the original definition of modularity by Newman and Girvan, we refer to this term as , where for any community , we define as:

This denotes the fraction of edge stubs (out of all edge stubs in the graph) that are attached to vertices in community . One could also refer to this as the total vertex degree of vertices in community divided by twice the number of edges in the graph .

By combining with and by iterating over all communities , we get:

This gives us the definition of modularity that is often found in literature. We can use this measure as quality function for algorithms, rendering the problem of community detection into that of modularity optimization. Note that the variable stands for β€œquality” 3.

While we only consider undirected (and possibly weighted) graphs here, modularity can be extended to directed graphs as well, as seen here and here.

Alternative formulation

An alternative formulation of modularity is used here. Both variants are equal as we can transform them into each other. We will use the following identity to get from the first to the second line:

with and . Here, is the sum of the weights of edges incident to vertices in (including self-loops), while is the sum of the weights of edges inside the community4. This is the actual formula used in the code.

β€œThis reveals an inherent trade-off: To maximize the first term [inside the parantheses], many edges should be contained in clusters, whereas the minimization of the second term is achieved by splitting the graph into many clusters with small total degrees each.” ~ from here

🎈 Task: We motivated the introduction of a negative term in the modularity formula by this observation: β€œHowever, the trivial case with all vertices in the same community yields a maximum value of 1.” Make sure you understand why this is the case. Take a look at the definition of .


1

Note that this is just the representation of the graph for this book, not for the actual Rust code.

2

The β€œβ€ is introduced because an edge stub cannot connect to itself in the configuration model. However, this still allows for self-loops as one edge stub could connect to another one on the same vertex.

3

Modularity is just one quality function (out of many others) that we will discuss here as it has been widely adopted in literature.

4

For undirected graphs, this is actually twice the sum of the weights of edges except for self-loops since they are counted only once (they are on the main diagonal of the adjacency matrix).