Introduction

🚨 The documentation is currently in progress. Meanwhile, you can see the source code take shape here.

Louvain social preview

Modularity

🚨 This page is a work in progress.

Motivation

We are often faced with networks whose underlying community structure is not known in advance, yet still, we need a measure to evaluate how good our proposed partition of the network into communities is. This is especially important when dealing with algorithms requiring an objective function to maximize (e.g. genetic algorithms). Newman and Girvan proposed a measure called modularity in 2003, which was adopted broadly and successfully in the network research community. Modularity is maximized for divisions of a graph when many edges fall within the proposed communities (intra-community edges) as compared to edges between communities (inter-community edges).

The following figure depicts a contrived network with different vertex-community assignments. The intuitive grouping of vertices into communities in (a) leads to the maximal achievable modularity value of for this particular graph. This grouping is characterized by only sparse connections between communities which are in themselves densely connected. In contrast, figure (b) results in a lower value of just . We will discuss the possible range of modularity values later on.

Different qualities for different vertex-community assignments
Simple network with different proposed vertex-community assignments. (a) shows the optimal grouping of vertices into three communities, while (b) depicts a bad grouping into five communities. The blue community even induces a disconnected subgraph. Image inspired by this paper.

Configuration model & modularity formula

In the following, we assume a graph with vertices and edges. The edge set is determined by the adjacency matrix1 with entries:

Of all the edges, a high fraction of edges that connect vertices in the same community should give a good partition. For a specific community , this fraction is given by

where we iterate over all edges with both ends in community . We divide by to retrieve the fraction of all edges. The factor comes from the assumption that we only deal with undirected edges, so every edge occurs twice for every order of arguments ( and ). We call this term in accordance with here. Aiming for

with being the set of all proposed communities, makes sure a high fraction of edges is within communities (and not between). However, the trivial case with all vertices in the same community yields a maximum value of . Therefore, we cannot use this measure alone, but instead compare this fraction of intra-community edges with the expected number of intra-community edges if edges were distributed at random in our graph. Note that β€œrandom” does not mean any arbitrary graph, instead, the following properties should be preserved to allow for a fair comparison:

  1. The number of nodes
  2. The number of edges
  3. The vertex degree for all vertices .

We employ the configuration model as null model (as seen here). A new, β€œempty” graph with nodes is constructed and edges are randomly inserted between vertices whilst ignoring their (potentially known) community structure.

In order to analyze the expected fraction of edges between any nodes and in a community, we cut every edge into two halves, leaving only edge stubs on the vertices as shown in the next figure. Since we have edges in our graph, we end up with edge stubs. There are ways to connect one edge stub of vertex (marked in red) to any edge stub of (marked in blue). Randomly, we could connect this one edge stub of to any of the edge stubs in the entire graph 2. Hence, the probability that one edge stub of connects to any edge stub of in our random graph is given by . As vertex has edge stubs, the probability that vertex connects to vertex , i. e. any edge stub of connects to any edge stub of , is given by

Configuration model
Configuration model with and . One possible edge between stubs is shown as dashed line.

This yields the equation for the expected number of edges that fall within a community if edges were distributed randomly in the graph while preserving node degrees:

The same reasoning as before applies here for the term . Since we are dealing with big networks where , it is reasonable to drop the aforementioned β€œβ€ in the denominator, which is common in literature (see here and here). Again, to show the connection to the original definition of modularity by Newman and Girvan, we refer to this term as , where for any community , we define as:

This denotes the fraction of edge stubs (out of all edge stubs in the graph) that are attached to vertices in community . One could also refer to this as the total vertex degree of vertices in community divided by twice the number of edges in the graph .

By combining with and by iterating over all communities , we get:

This gives us the definition of modularity that is often found in literature. We can use this measure as quality function for algorithms, rendering the problem of community detection into that of modularity optimization. Note that the variable stands for β€œquality” 3.

While we only consider undirected (and possibly weighted) graphs here, modularity can be extended to directed graphs as well, as seen here and here.

Alternative formulation

An alternative formulation of modularity is used here. Both variants are equal as we can transform them into each other. We will use the following identity to get from the first to the second line:

with and . Here, is the sum of the weights of edges incident to vertices in (including self-loops), while is the sum of the weights of edges inside the community4. This is the actual formula used in the code.

β€œThis reveals an inherent trade-off: To maximize the first term [inside the parantheses], many edges should be contained in clusters, whereas the minimization of the second term is achieved by splitting the graph into many clusters with small total degrees each.” ~ from here

🎈 Task: We motivated the introduction of a negative term in the modularity formula by this observation: β€œHowever, the trivial case with all vertices in the same community yields a maximum value of 1.” Make sure you understand why this is the case. Take a look at the definition of .


1

Note that this is just the representation of the graph for this book, not for the actual Rust code.

2

The β€œβ€ is introduced because an edge stub cannot connect to itself in the configuration model. However, this still allows for self-loops as one edge stub could connect to another one on the same vertex.

3

Modularity is just one quality function (out of many others) that we will discuss here as it has been widely adopted in literature.

4

For undirected graphs, this is actually twice the sum of the weights of edges except for self-loops since they are counted only once (they are on the main diagonal of the adjacency matrix).

Modularity calculation examples

🚨 This page is a work in progress.

We’ve given an intuition for the quality function modularity that is defined as follows with two equivalent formulas:

Also recall the definition of and :

Let’s see it in action with some examples and go through the calculations step by step. You can also verify the results in Desmos.

Weighted graph with singleton communities

Consider the following graph

Weighted test graph
A simple weighted test graph

Its adjacency matrix is given by:

Modularity evalutes to:

This is a pretty bad modularity and stems from the bad community assignment (every vertex in its own community).

We can also use the second equivalent formulation to get to the same result:

The difference between the two formulations is not apparent in this example as we only consider singleton communities here (every vertex is in its own community).

Weighted graph with other communities

Weighted test graph
The same weighted test graph as before with a different vertex-community assignment

The adjacency matrix is the same as above and we also have . But the vertex-community assignment differs. The partitioning shown in the figure is given by:

For the modularity, we therefore calculate:

In step , we’ve used the result from the weighted graph of the previous section. Note that while modularity is still not good, it has slightly improved.

With the equivalent formulation, we end up with – surprise, surprise – the same result:

If we somehow already know the values for and , i. e. the sum of the weights of edges inside the community and respectively the sum of the weights of edges incident to community vertices (aka vertex degrees), the second formula is the way to go as we can just plug our values in and are done in no time.

🎈 Task: Come up with your own graphs and calculate modularity by hand for those. Guess a good vertex-community assignment and see if modularity increases compared to a vertex-community assignment you feel is bad, e.g. when putting all vertices in one big community.




Here are some more examples of graphs used to test the code. The respective modularity values were calculated by hand in the same manner seen above.

Weighted graph 2 with singleton communities

TODO: Insert image of graph

Original Louvain paper graph

TODO: Insert image of graph

The graph is undirected, so we only need to consider the upper diagonal matrix of . We can also omit the diagonal since there are no self-loops in the graph (there are only s on the diagonal). Therefore, the sum of all entries in the upper diagonal matrix of corresponds to the total number of edges in the graph:

For the original node-community assignment where every node is in its own community, we have this clustering:

The initial modularity is therefore given by:

In the next level of the hierarchy, Louvain locally optimizes modularity. The new clustering is given by:

Finally, the second pass merges two communities together, so that we are left with a clustering into two communities in the end:

Notice the four edges , , , were inter-community edges, but are now intra-community edges as the two communities got merged together. The modularity increases to:

In the 3rd pass of Louvain, we find that we cannot locally increase modularity anymore. Therefore, this is the final assignment that a full Louvain run might return. Note that due to the randomness in the algorithm, different runs can return different assignments with different modularity values. For example, another run finds that is the highest modularity it can achieve.

🎈 Task: What is the corresponding clustering to this modularity value? Go to the project page, download the code and run it on the graph multiple times until you encounter a modularity value of in the final run. Use the hierarchy command to display the node-to-community assignment for the different levels. What is the reason for this behavior? Hint: Take a look at node .

Modularity value range

🚨 This page is a work in progress.

πŸ•‘ TL;DR: Modularity .

Going through the examples, you may have wondered what a β€œgood” or β€œbad” modularity value really is. Here, we will discuss the value range of the modularity measure.

Modularity is zero if the fraction of edges within communities is no different from the respective fraction we would expect in a random graph. We get a positive value if this fraction is higher than expected on the basis of chance. This directly results from our definition of modularity. In the simple network presented here, was around which is still slightly better than at random but nevertheless worse than the best result of for that particular graph.

Upper limit

In theory, the maximum possible modularity value is for a graph with infinitely many communities, such that . We show this property for a graph with communities, each forming a graph, i. e. two vertices sharing one edge.

Graph for upper limit of modularity
Infinite graph and community assignment yielding the upper limit of modularity:

This graph has communities as well as edges. When approaches infinity, we obtain:

As and for every graph partitioning, is indeed the upper limit of modularity.

Lower limit

The lower limit of modularity is which is achieved using any bipartite graph with the clustering (see here), that is, each cluster encompasses all the vertices of one bipartition of (the vertex sets or respectively ). The following figure shows the bipartite graph with , and . The given community assignment yields a modularity of , while the maximum modularity for this particular graph is .

Graph for lower limit of modularity
Bipartite graph and community assignment yielding the lower limit of modularity:

Common values

A rigorous proof that holds true for any undirected and unweighted graph and any community assignment (clustering) is presented here. Note however, that for a given graph, the maximum achievable modularity is oftentimes not . Newman and Girvan claim that a typical range for modularity values is from to (see here) with values above indicating a β€œsignificant community structure in a network” (see here). Whether values above really indicate a good community structure depends on the application and the particular graph.

Modularity resolution limit

🚨 This page is a work in progress.

Fortunato and BarthΓ©lemy have shown that modularity suffers from a resolution limit, implying that optimizing for modularity does not necessarily reveal the actual community structure of the network (see here). Two communities might get combined into a larger one, yielding a higher modularity. This is the case for communities where , i.e. the number of edges inside community is too small for the grouping of vertices to be recognized as one community.

Note that the notion of β€œcommunity” is defined in the β€œweak” sense as in this paper: β€œIn a weak community the sum of all degrees within [the subgraph] is larger than the sum of all degrees toward the rest of the network”. In the mathematical definition of modularity, however, what is considered a community is dependent on the size of the whole network (number of edges ) – which is not ideal.

Fortunato and BarthΓ©lemy point out the importance of further investigation of the modules found by an algorithm that optimizes for modularity since one does not know if a claimed β€œmodule” is really just one community or a combination of multiple communities. Special care is required for communities with

as they are likely to be composed of two or more smaller1 ones (this number results from merging the two biggest, but still unrecognizable modules together). Yet, even bigger modules could suffer from this constraint of modularity if they are – as the authors call it – β€œfuzzy”, meaning that they have an increasing number of external edges (edges to other modules), making the difference between internal and external edges smaller.

Graph illustrating the resolution limit of modularity
Network consisting of two and one clique. The former are not correctly identified as individual communities when optimizing for modularity. ( cliques identified as single communities), ( cliques grouped together as a pair).

The above figure illustrates the resolution limit in an extreme example. The two cliques (complete graphs ) on the right are interconnected by one single edge (β€œbridge”) and share just one edge with the rest of the network, yet they are not recognized as two individual communities; optimal modularity assigns all eight vertices to the same community. The clique may be an arbitrary graph with edges, so that the whole network consists of edges. This ensures that for the cliques (composed of edges each) the condition is met. A complete graph holds edges, thus, our network consists of edges. , yet still, which happens to be sufficiently close to 2.

1

The terms β€œsmall” and β€œbig” are used here not to indicate the number of nodes, but the number of intra-edges in the respective communities. How nodes are distributed in the network does not have any impact on modularity, i.e. whether a community has more or less nodes is irrelevant.

2

Fortunato and BarthΓ©lemy point out that they calculate as if the variables were continuous, which they are not. Thus, for the given formula of the resolution limit, one has to test the closest integer values.

Louvain algorithm

🚨 This page is a work in progress.

The Louvain method – named after the University of Louvain where Blondel et al. developed the algorithm – finds communities by optimizing modularity locally for every node’s neighborhood, then consolidating the vertices of newly found communities to super vertices and repeating the steps on the new, smaller graph (see original paper). This multi-step algorithm ends when modularity does not improve in one pass anymore. It has been shown to yield state-of-the-art results with very good performance which is linear in the number of edges in the graph (see here).

We demonstrate and explain the Louvain algorithm with the following undirected and unweighted graph. The source code can deal with weighted graphs as well.

Graph with communities
Initial singleton communities

We assume we somehow know the communities in the graph a priori. This known vertex-community assignment is oftentimes called the β€œground truth”. In our sampel graph, we have two communities that the Louvain algorithm should find in the end since it optimizes modularity.

The two ground-truth communities are denoted by the shape of the vertices (either a circle or a square). In the background, we draw in gray areas around the vertices to indicate the communities that the algorithm found in each step.

Initially each vertex gets assigned to its own community that we will call β€œsingleton community” (mathematically speaking a set consisting of just one element, e.g. ). Therefore, we have this partition:

🎈 Task: Calculate the modularity for the initial partition and assume a weight of 1 for every edge. Obviously, shouldn’t be very high.

I. Modularity optimization

After the initial assignment to singleton communities, the first step of one Louvain pass is essentially to apply Newman’s agglomerative hierarchical clustering method presented here and improved by Clauset, Newman, and Moore here:

β€œ[S]tarting with each vertex being the sole member of a community of one, we repeatedly join together the two communities whose amalgamation produces the largest increase in .” ~ from here

In the Louvain method, this is done by iterating over all vertices (in randomized order): a vertex is then β€œmoved” to the community of one of its neighbors , so that the modularity gain is maximized (this is why Louvain is a greedy algorithm). If no positive gain is possible, remains in its singleton community.

In the following figure, some vertices were already gather to communities, e.g. vertices , , and on the left and vertices and on the right belong together accoridng to the current state of the algorithm.

Delta modularity calculation for vertex 4
Louvain optimization phase in first pass: Delta modularity calculation for vertex 4.

At this point, we consider vertex and its neighbors . The highest increase is when the community of vertices and accommodates vertex , hence is β€œmoved” to this community. An efficient calculation of is derived in the next section. We consider vertices multiple times until modularity cannot be increased anymore which corresponds to a local maximum. The following figure illustrates the community assignments as result of the first local optimization phase.

Result of local optimization in first pass
Louvain optimization phase in first pass: Result of local optimization.

II. Community aggregation (consolidation)

In the second phase, the communities that were found in the first phase are replaced by super vertices. This results in a new graph where vertices are the communities of phase I. Consider the middle layer of the following figure showing the outcome of the first pass after phase I and II.

Resulting Louvain hierarchy for the sample graph
Resulting Louvain hierarchy for the sample graph. Communities of the previous pass become the new super vertices.

Edges now have a weight assigned to them, even though we started with an unweighted graph:

  • Edges between communities (inter-community edges) are condensed to one edge between the communities with edge weight equal to the sum of the weight of all previous edges between the communities.
  • Edges within communities (intra-community edges) result in self-loops with a weight of value two in order to account for the undirectedness (see calculations in modularity section).

Termination

These two phases make one pass of the Louvain algorithm. The first phase works directly on the graph that emerges from the second phase (the new super vertices are considered to lie in their own singleton communities). Passes are executed repeatedly while modularity is always calculated with respect to the original graph. The algorithm ends when modularity cannot be increased anymore (hopefully corresponding to a global maximum), which is often the case after a handful of passes, even for large networks (see here).

Capturing the resulting community-vertex assignment after each pass produces a hierarchy (that we’ve already seen) since the second step creates super-vertices based on the previous vertices of a community and thus the local optimization phase essentially finds communities of communities. Blondel et al. remark that this is favorable in order to reveal hierarchical structures present in many large networks. They also point out that the resolution limit is mitigated as vertices are moved one after another, yielding a low probability of merging two distinct communities. Even if the super vertices of the respective communities are merged in later passes, the resulting hierarchy helps in identifying on which β€œorganization level” this merge took place (see here).

Delta modularity calculation (singleton)

🚨 This page is a work in progress. TODO: Maybe add a graphic here to explain the various variables.

In the modularity optimization phase, we only rely on local information. Recalculating the global modularity value for every possible neighbor of each vertex would significantly degrade the performance of the Louvain algorithm. This is why Blondel et al. employ a delta modularity formula that can be applied to millions of nodes. In this section, we will explain and derive the formula and show how it can be simplified for usage in a program.

We have seen here that the modularity for a partition is given by:

We are only intrested in the contribution of a single community to the overall modularity. With a slight abuse of notation, let denote modularity of community 1, such that . Therefore, for the modularity of a community , we get:


Next, we look closely at the process of β€œmoving” a vertex to the community of one of its neighbors . We assume the vertex to be located in a singleton community. The formula Blondel et al. presented is for exactly this case, where we remove from its singleton community and then insert it into the neighbor’s community. This is actually only applicable for the first iteration of the modularity optimization phase where in the beginning of every new pass we deal with singleton communities. The authors state that they use a similar formula to calculate the modularity change when is removed from any community (also those with more than one vertex in it), but do not reveal the expression used. We will therefore derive the generalized version in the next section.

For now, let us consider the case where we move a vertex from its singleton community that we will denote by to any other community . The modularity change (delta) is then given by

where is the quality of community before the merge (and hence without vertex in that community) and is the quality of after the merge (and hence after vertex was integrated into community ). As the singleton community does not exist anymore after the merge, we also subtract the modularity of the singleton community . In the following, let the prime symbol always denote the state of a variable after the merge.

With the above formula, this gives us:

which is the equation presented in the original Louvain paper. If we consider that after the merging process, the sum of the weights of the edges between node and community – that we will denote2 by – now adds to the community accommodating vertex , we obtain

Likewise, we find that the weighted vertex degree adds to the total weighted vertex degree of after the merge, thus:

With this, we can simplify our expression for the modularity change. Note that we use , as there are no edges inside a singleton community since we do not allow self-loops in the original graph.

Finally, we have

We can ignore the constant since we only compare different modularity increases with each other and therefore merely require a relative measure. This saves us one division in the algorithm. After one complete pass, the new global modularity is calculated using the formula from here3.


Our short formula is used in the algorithm to efficiently calculate the delta modularity gain . To remove a vertex from its previous community or to insert it into a new community , only and have to be updated. is adjusted for the global modularity calculation and not for the delta modularity . Note that we can precalculate the sum of the weights of edges between and community or ( and ) before calling the remove- or insert-function. This is crucial for the speed of Louvain as has to be computed frequently. As stated above, we also dropped the factor to save one division. For the calculation of global modularity, we do not omit the factor in order to obtain the absolute global modularity .

1

We can distinguish the two functions by looking at the argument, which is either a community or a partition , i.e. a set of communities .

2

This is done in conformity with these notes on modularity. The arrow does not indicate a direction; we still deal with an undirected graph.

3

In newer versions of the algorithm, this is not the case anymore and we only use the relative modularity calculation. TODO

Delta modularity calculation (generalized)

🚨 This page is a work in progress. TODO: explain why this is needed.

Here we generalize the delta modularity calculation presented in the previous section for the case where we move a vertex from any community (not just a singleton community) to any other community . We proceed by removing from the old community and putting it into its own singleton community. The latter is done, so that the derived Louvain modularity difference formula can be reused, which is only applicable to the case where one moves a vertex from a singleton community to any other community. With this more generalized equation, we fill the gap caused by the absence of this formula in the original Louvain paper. While we do not employ this generalized formula in the actual implementation, it still might be useful for other applications.

First, we calculate the modularity difference when removing a vertex from its old community (not required to be a singleton community) and inserting it into its own singleton community . As before, we prohibit self-loops in the original graph, hence . We won’t be as elaborate this time due to the rationale being very similar:

We bring together this equation with the one from the previous section and obtain the following generalized formula for the modularity difference when moving a vertex from its previous community to any other community . This is done by first moving from into its own singleton community , then moving it from there into the target community :

Finally, for the generalized formula, we have


The delta modularity calculation for removing from a singleton community can be derived from this formula by assuming that is a singleton community. As we do not allow self-loops, . Moreover, the sum of the weights of edges incident to vertices in is equal to the vertex degree: , which gives us:

which is indeed the formula derived here.