Louvain algorithm
🚨 This page is a work in progress.
The Louvain method – named after the University of Louvain where Blondel et al. developed the algorithm – finds communities by optimizing modularity locally for every node’s neighborhood, then consolidating the vertices of newly found communities to super vertices and repeating the steps on the new, smaller graph (see original paper). This multi-step algorithm ends when modularity does not improve in one pass anymore. It has been shown to yield state-of-the-art results with very good performance which is linear in the number of edges in the graph (see here).
We demonstrate and explain the Louvain algorithm with the following undirected and unweighted graph. The source code can deal with weighted graphs as well.
We assume we somehow know the communities in the graph a priori. This known vertex-community assignment is oftentimes called the “ground truth”. In our sampel graph, we have two communities that the Louvain algorithm should find in the end since it optimizes modularity.
The two ground-truth communities are denoted by the shape of the vertices (either a circle or a square). In the background, we draw in gray areas around the vertices to indicate the communities that the algorithm found in each step.
Initially each vertex gets assigned to its own community that we will call “singleton community” (mathematically speaking a set consisting of just one element, e.g. ). Therefore, we have this partition:
🎈 Task: Calculate the modularity for the initial partition and assume a weight of 1 for every edge. Obviously, shouldn’t be very high.
I. Modularity optimization
After the initial assignment to singleton communities, the first step of one Louvain pass is essentially to apply Newman’s agglomerative hierarchical clustering method presented here and improved by Clauset, Newman, and Moore here:
“[S]tarting with each vertex being the sole member of a community of one, we repeatedly join together the two communities whose amalgamation produces the largest increase in .” ~ from here
In the Louvain method, this is done by iterating over all vertices (in randomized order): a vertex is then “moved” to the community of one of its neighbors , so that the modularity gain is maximized (this is why Louvain is a greedy algorithm). If no positive gain is possible, remains in its singleton community.
In the following figure, some vertices were already gather to communities, e.g. vertices , , and on the left and vertices and on the right belong together accoridng to the current state of the algorithm.
At this point, we consider vertex and its neighbors . The highest increase is when the community of vertices and accommodates vertex , hence is “moved” to this community. An efficient calculation of is derived in the next section. We consider vertices multiple times until modularity cannot be increased anymore which corresponds to a local maximum. The following figure illustrates the community assignments as result of the first local optimization phase.
II. Community aggregation (consolidation)
In the second phase, the communities that were found in the first phase are replaced by super vertices. This results in a new graph where vertices are the communities of phase I. Consider the middle layer of the following figure showing the outcome of the first pass after phase I and II.
Edges now have a weight assigned to them, even though we started with an unweighted graph:
- Edges between communities (inter-community edges) are condensed to one edge between the communities with edge weight equal to the sum of the weight of all previous edges between the communities.
- Edges within communities (intra-community edges) result in self-loops with a weight of value two in order to account for the undirectedness (see calculations in modularity section).
Termination
These two phases make one pass of the Louvain algorithm. The first phase works directly on the graph that emerges from the second phase (the new super vertices are considered to lie in their own singleton communities). Passes are executed repeatedly while modularity is always calculated with respect to the original graph. The algorithm ends when modularity cannot be increased anymore (hopefully corresponding to a global maximum), which is often the case after a handful of passes, even for large networks (see here).
Capturing the resulting community-vertex assignment after each pass produces a hierarchy (that we’ve already seen) since the second step creates super-vertices based on the previous vertices of a community and thus the local optimization phase essentially finds communities of communities. Blondel et al. remark that this is favorable in order to reveal hierarchical structures present in many large networks. They also point out that the resolution limit is mitigated as vertices are moved one after another, yielding a low probability of merging two distinct communities. Even if the super vertices of the respective communities are merged in later passes, the resulting hierarchy helps in identifying on which “organization level” this merge took place (see here).