Introduction

🚨 The documentation is currently in progress. Meanwhile, you can see the source code take shape here.

Modularity

🚨 This page is a work in progress.

Motivation

We are often faced with networks whose underlying community structure is not known in advance, yet still, we need a measure to evaluate how good our proposed partition of the network into communities is. This is especially important when dealing with algorithms requiring an objective function to maximize (e.g. genetic algorithms). Newman and Girvan proposed a measure called modularity in 2003, which was adopted broadly and successfully in the network research community. Modularity $Q$ is maximized for divisions of a graph when many edges fall within the proposed communities (intra-community edges) as compared to edges between communities (inter-community edges).

The following figure depicts a contrived network with different vertex-community assignments. The intuitive grouping of vertices into communities in (a) leads to the maximal achievable modularity value of $Q \approx 0.46$ for this particular graph. This grouping is characterized by only sparse connections between communities which are in themselves densely connected. In contrast, figure (b) results in a lower value of just $Q \approx 0.13$ . We will discuss the possible range of modularity values later on.

Different qualities for different vertex-community assignments — Simple network with different proposed vertex-community assignments. (a) shows the optimal grouping of vertices into three communities, while (b) depicts a bad grouping into five communities. The blue community even induces a disconnected subgraph. Image inspired by this paper.

Configuration model & modularity formula

In the following, we assume a graph $G$ with $|V(G)| = n$ vertices and $|E(G)| = m$ edges. The edge set $E(G)$ is determined by the adjacency matrix¹ $A$ with entries:

$A_{uv} = \begin{cases} 1, & \text{if } u \text{ and } v \text{ are neighbors} \\ 0, & \text{otherwise} \end{cases}$

Of all the edges, a high fraction of edges that connect vertices in the same community should give a good partition. For a specific community $c$ , this fraction is given by

$e_{c} = \frac{1}{2m} \sum_{u \in c} \sum_{v \in c} A_{uv}$

where we sum over all ordered pairs of vertices $(u, v)$ with both ends in community $c$. We divide by $m$ to retrieve the fraction of all edges. The factor $2$ comes from the assumption that we only deal with undirected edges, so every edge occurs twice, once for each order of arguments ($A_{uv}$ and $A_{vu}$). We call this term $e_{c}$ in accordance with here. Aiming for

$\max \sum_{c \in C} e_{c}$

with $C$ being the set of all proposed communities, makes sure a high fraction of edges is within communities (and not between). However, the trivial case with all vertices in the same community yields a maximum value of $1$. Therefore, we cannot use this measure alone. Instead, we compare this fraction of intra-community edges with the expected fraction of intra-community edges if edges were distributed at random in our graph. Note that “random” does not mean any arbitrary graph; the following properties should be preserved to allow for a fair comparison:

The number of nodes $n$
The number of edges $m$
The vertex degree $k_{v}$ for all vertices $v \in V (G)$ .

We employ the configuration model as null model (as seen here). A new, “empty” graph with $n$ nodes is constructed and $m$ edges are randomly inserted between vertices whilst ignoring their (potentially known) community structure.

In order to analyze the expected fraction of edges between any nodes $u$ and $v$ in a community, we cut every edge into two halves, leaving only edge stubs on the vertices as shown in the next figure. Since we have $m$ edges in our graph, we end up with $2 m$ edge stubs. There are $k_{v}$ ways to connect one edge stub of vertex $u$ (marked in red) to any edge stub of $v$ (marked in blue). Randomly, we could connect this one edge stub of $u$ to any of the $2 m - 1$ edge stubs in the entire graph ². Hence, the probability that one edge stub of $u$ connects to any edge stub of $v$ in our random graph is given by $\frac{k _{v}}{2 m - 1}$ . As vertex $u$ has $k_{u}$ edge stubs, the probability that vertex $u$ connects to vertex $v$ , i. e. any edge stub of $u$ connects to any edge stub of $v$ , is given by

$p (u, v) = \frac{k _{u} k _{v}}{2 m - 1}$

Configuration model with $k_{u} = 6$ and $k_{v} = 5$ . One possible edge between stubs is shown as dashed line.
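To get a feeling for the stub-matching argument, the following minimal Python sketch (an illustration, not the project's Rust code) shuffles all $2m$ edge stubs, pairs them up and counts how often a stub of $u$ is matched with a stub of $v$. The degree sequence, the function name `simulate_stub_matching` and the number of trials are assumptions chosen for this example. Strictly speaking, the sketch estimates the expected number of edges between $u$ and $v$, which is the quantity that $\frac{k_{u} k_{v}}{2m - 1}$ describes.

```python
import random

def simulate_stub_matching(degrees, u, v, trials=20000, seed=0):
    """Estimate the expected number of u-v edges under uniformly random stub matching."""
    rng = random.Random(seed)
    # One list entry per edge stub: vertex w appears degrees[w] times.
    stubs = [w for w, k in enumerate(degrees) for _ in range(k)]
    hits = 0
    for _ in range(trials):
        rng.shuffle(stubs)
        # Pair consecutive stubs: (stubs[0], stubs[1]), (stubs[2], stubs[3]), ...
        for a, b in zip(stubs[0::2], stubs[1::2]):
            if {a, b} == {u, v}:
                hits += 1
    return hits / trials

degrees = [3, 2, 2, 1]        # hypothetical degree sequence with 2m = 8 edge stubs
two_m = sum(degrees)
estimate = simulate_stub_matching(degrees, u=0, v=1)
formula = degrees[0] * degrees[1] / (two_m - 1)
print(f"simulated: {estimate:.3f}   k_u * k_v / (2m - 1): {formula:.3f}")
```

The two numbers should agree up to sampling noise. For large $m$, replacing $2m - 1$ by $2m$ changes the value only marginally.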

This yields the equation for the expected fraction of edges that fall within a community $c$ if edges were distributed randomly in the graph while preserving node degrees:

$a_{c}^{2} := \frac{1}{2m} \sum_{u \in c} \sum_{v \in c} \frac{k_{u} k_{v}}{2m}$

The same reasoning as before applies here for the term $\frac{1}{2 m}$ . Since we are dealing with big networks where $2 m ≫ 1$ , it is reasonable to drop the aforementioned “ $- 1$ ” in the denominator, which is common in literature (see here and here). Again, to show the connection to the original definition of modularity by Newman and Girvan, we refer to this term as $a_{c}^{2}$ , where for any community $c$ , we define $a_{c}$ as:

$a_{c} = \frac{1}{2m} \sum_{v \in c} k_{v}$

This denotes the fraction of edge stubs (out of all $\approx 2 m$ edge stubs in the graph) that are attached to vertices in community $c$ . One could also refer to this as the total vertex degree of vertices in community $c$ divided by twice the number of edges in the graph $2 m$ .

By combining $e_{c}$ with $a_{c}^{2}$ and by iterating over all communities $c \in C$, we get:

$\begin{aligned} Q(C) &:= (\text{fraction of edges that fall within the proposed communities}) - (\text{expected fraction of such edges in a random graph}) \\ &= \sum_{c \in C} \left( e_{c} - a_{c}^{2} \right) \\ &= \sum_{c \in C} \left( \frac{1}{2m} \sum_{u \in c} \sum_{v \in c} A_{uv} - \frac{1}{2m} \sum_{u \in c} \sum_{v \in c} \frac{k_{u} k_{v}}{2m} \right) \end{aligned}$

This gives us the definition of modularity that is often found in literature. We can use this measure as quality function for algorithms, rendering the problem of community detection into that of modularity optimization. Note that the variable $Q$ stands for “quality” ³.

$Q(C) = \frac{1}{2m} \sum_{c \in C} \sum_{u \in c} \sum_{v \in c} \left( A_{uv} - \frac{k_{u} k_{v}}{2m} \right)$
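The triple sum translates almost literally into code. The following Python sketch is only an illustration (it is not the project's Rust implementation). The helper names `adjacency_from_edges` and `modularity` as well as the example graph, two triangles joined by a single edge, are assumptions made for this example. Exact fractions are used so that the result can be compared with a calculation by hand.

```python
from fractions import Fraction

def adjacency_from_edges(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected, unweighted graph."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] += 1
        A[v][u] += 1
    return A

def modularity(A, communities):
    """Evaluate Q(C) with the triple sum over communities and vertex pairs."""
    degree = [sum(row) for row in A]      # k_v is the row sum of A
    two_m = sum(degree)                   # 2m is the sum of all entries of A
    total = Fraction(0)
    for c in communities:
        for u in c:
            for v in c:
                total += A[u][v] - Fraction(degree[u] * degree[v], two_m)
    return total / two_m

# Hypothetical example graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = adjacency_from_edges(6, edges)
q_triangles = modularity(A, [[0, 1, 2], [3, 4, 5]])
q_one_community = modularity(A, [[0, 1, 2, 3, 4, 5]])
print(q_triangles, q_one_community)   # prints: 5/14 0
```

Putting every vertex into one community gives $Q = 0$ here, while the two triangles reach $Q = \frac{5}{14} \approx 0.357$.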

While we only consider undirected (and possibly weighted) graphs here, modularity can be extended to directed graphs as well, as seen here and here.

Alternative formulation

An alternative formulation of modularity is used here. Both variants are equal as we can transform them into each other. We will use the following identity to get from the first to the second line:

$\sum_{i=1}^{n} \sum_{j=1}^{n} x_{i} x_{j} = \left( \sum_{k=1}^{n} x_{k} \right)^{2}$

$\begin{aligned} Q(C) &= \sum_{c \in C} \left( \sum_{u \in c} \sum_{v \in c} \frac{A_{uv}}{2m} - \sum_{u \in c} \sum_{v \in c} \frac{k_{u} k_{v}}{2m} \frac{1}{2m} \right) \\ &= \sum_{c \in C} \left( \sum_{u \in c} \sum_{v \in c} \frac{A_{uv}}{2m} - \left( \frac{\sum_{v \in c} k_{v}}{2m} \right)^{2} \right) \\ &= \sum_{c \in C} \left( \frac{\Sigma_{c}}{2m} - \left( \frac{\Sigma_{\hat{c}}}{2m} \right)^{2} \right) \\ &= \frac{1}{2m} \sum_{c \in C} \left( \Sigma_{c} - \frac{(\Sigma_{\hat{c}})^{2}}{2m} \right) \end{aligned}$

$Q(C) = \frac{1}{2m} \sum_{c \in C} \left( \Sigma_{c} - \frac{(\Sigma_{\hat{c}})^{2}}{2m} \right)$

with $\Sigma_{c} = \sum_{u \in c} \sum_{v \in c} A_{uv}$ and $\Sigma_{\hat{c}} = \sum_{v \in c} k_{v}$. Here, $\Sigma_{\hat{c}}$ is the sum of the weights of edges incident to vertices in $c$ (including self-loops), while $\Sigma_{c}$ is the sum of the weights of edges inside the community⁴. This is the actual formula used in the code.
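As a quick plausibility check of the equivalence, the following Python sketch (illustrative only, not the project's Rust code) evaluates both variants on a small path graph $0 - 1 - 2 - 3$. The function names and the example graph are assumptions made for this sketch.

```python
from fractions import Fraction

def modularity_pairwise(A, communities):
    """First variant: sum of (A_uv - k_u * k_v / (2m)) over all pairs inside a community."""
    degree = [sum(row) for row in A]
    two_m = sum(degree)
    total = sum(A[u][v] - Fraction(degree[u] * degree[v], two_m)
                for c in communities for u in c for v in c)
    return total / two_m

def modularity_from_sums(A, communities):
    """Second variant: one pair of sums (Sigma_c, Sigma_c-hat) per community."""
    degree = [sum(row) for row in A]
    two_m = sum(degree)
    q = Fraction(0)
    for c in communities:
        sigma_in = sum(A[u][v] for u in c for v in c)   # Sigma_c
        sigma_tot = sum(degree[v] for v in c)           # Sigma_c-hat
        q += sigma_in - Fraction(sigma_tot ** 2, two_m)
    return q / two_m

# Hypothetical example: the path graph 0 - 1 - 2 - 3 split in the middle.
A = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
partition = [[0, 1], [2, 3]]
q_pairwise = modularity_pairwise(A, partition)
q_sums = modularity_from_sums(A, partition)
print(q_pairwise, q_sums)   # prints: 1/6 1/6
```

The second variant needs only two numbers per community, which is convenient when these sums are maintained incrementally.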

“This reveals an inherent trade-off: To maximize the first term [inside the parentheses], many edges should be contained in clusters, whereas the minimization of the second term is achieved by splitting the graph into many clusters with small total degrees each.” ~ from here

🎈 Task: We motivated the introduction of a negative term in the modularity formula by this observation: “However, the trivial case with all vertices in the same community yields a maximum value of 1.” Make sure you understand why this is the case. Take a look at the definition of $e_{c}$ .

¹ Note that this is just the representation of the graph for this book, not for the actual Rust code.

² The “ $- 1$ ” is introduced because an edge stub cannot connect to itself in the configuration model. However, this still allows for self-loops as one edge stub could connect to another one on the same vertex.

³ Modularity is just one quality function (out of many others) that we will discuss here as it has been widely adopted in literature.

⁴ For undirected graphs, this is actually twice the sum of the weights of edges except for self-loops since they are counted only once (they are on the main diagonal of the adjacency matrix).

Modularity calculation examples

We’ve given an intuition for the quality function modularity that is defined as follows with two equivalent formulas:

$Q(C) := \frac{1}{2m} \sum_{c \in C} \sum_{u \in c} \sum_{v \in c} \left( A_{uv} - \frac{k_{u} k_{v}}{2m} \right) = \frac{1}{2m} \sum_{c \in C} \left( \Sigma_{c} - \frac{(\Sigma_{\hat{c}})^{2}}{2m} \right)$

Also recall the definition of $\Sigma_{c}$ and $\Sigma_{\hat{c}}$:

$\Sigma_{c} = \sum_{u \in c} \sum_{v \in c} A_{uv}, \qquad \Sigma_{\hat{c}} = \sum_{v \in c} k_{v}$

Let’s see it in action with some examples and go through the calculations step by step. You can also verify the results in Desmos.

Weighted graph with singleton communities

Consider the following graph

Weighted test graph — A simple weighted test graph

Its adjacency matrix is given by:

$A = \begin{pmatrix} 3 & 1 & 0 & 0 \\ 1 & 0 & 5 & 7 \\ 0 & 5 & 0 & 2.5 \\ 0 & 7 & 2.5 & 1 \end{pmatrix}$

$C = \{c_{0}, c_{1}, c_{2}, c_{3}\} = \{\{0\}, \{1\}, \{2\}, \{3\}\}$

$m = \frac{1}{2} \sum_{u, v} A_{uv} = \frac{1}{2} (3 + 2 + 10 + 14 + 5 + 1) = 17.5$

Modularity evaluates to:

$\begin{aligned} Q(C) &:= \frac{1}{2m} \sum_{c \in C} \sum_{u \in c} \sum_{v \in c} \underbrace{\left( A_{uv} - \frac{k_{u} k_{v}}{2m} \right)}_{=: \phi} \\ &= \frac{1}{2m} \left[ \sum_{u, v \in c_{0}} \phi + \sum_{u, v \in c_{1}} \phi + \sum_{u, v \in c_{2}} \phi + \sum_{u, v \in c_{3}} \phi \right] \\ &= \frac{1}{2m} \left[ \left( A_{00} - \frac{k_{0} k_{0}}{2m} \right) + \left( A_{11} - \frac{k_{1} k_{1}}{2m} \right) + \left( A_{22} - \frac{k_{2} k_{2}}{2m} \right) + \left( A_{33} - \frac{k_{3} k_{3}}{2m} \right) \right] \\ &= \frac{1}{35} \cdot \left[ \left( 3 - \frac{4^{2}}{35} \right) + \left( 0 - \frac{13^{2}}{35} \right) + \left( 0 - \frac{7.5^{2}}{35} \right) + \left( 1 - \frac{10.5^{2}}{35} \right) \right] \\ &= -\frac{423}{2450} \approx -0.17265 \end{aligned}$

This is a pretty bad modularity and stems from the bad community assignment (every vertex in its own community).

We can also use the second equivalent formulation to get to the same result:

$Q(C) = \frac{1}{2m} \sum_{c \in C} \left( \Sigma_{c} - \frac{(\Sigma_{\hat{c}})^{2}}{2m} \right) = \frac{1}{35} \left[ \left( 3 - \frac{4^{2}}{35} \right) + \left( 0 - \frac{13^{2}}{35} \right) + \left( 0 - \frac{7.5^{2}}{35} \right) + \left( 1 - \frac{10.5^{2}}{35} \right) \right] = -\frac{423}{2450} \approx -0.17265$

The difference between the two formulations is not apparent in this example as we only consider singleton communities here (every vertex is in its own community).
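The hand calculation can be double-checked with a few lines of Python. This is only a sketch for verification (not the project's Rust code). The variable names are assumptions, and the weight $2.5$ is written as the exact fraction $\frac{5}{2}$ to avoid rounding.

```python
from fractions import Fraction

# Adjacency matrix of the weighted test graph (2.5 written as the exact fraction 5/2).
A = [
    [3, 1, 0, 0],
    [1, 0, 5, 7],
    [0, 5, 0, Fraction(5, 2)],
    [0, 7, Fraction(5, 2), 1],
]
degree = [sum(row, Fraction(0)) for row in A]   # k_0, ..., k_3 = 4, 13, 15/2, 21/2
two_m = sum(degree)                              # 2m = 35

# With singleton communities, only the pair (v, v) lies inside community {v}.
terms = [A[v][v] - degree[v] ** 2 / two_m for v in range(len(A))]
q_singletons = sum(terms) / two_m
print(q_singletons, round(float(q_singletons), 5))
```

The exact result is $-\frac{423}{2450}$, in line with the calculation above.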

Weighted graph with other communities

The adjacency matrix is the same as above and we also have $m = 17.5$. But the vertex-community assignment differs. The partitioning $C$ shown in the figure is given by: $C = \{c_{0}, c_{1}, c_{2}\} = \{\{0\}, \{1, 3\}, \{2\}\}$

For the modularity, we therefore calculate:

$\begin{aligned} Q(C) &:= \frac{1}{2m} \sum_{c \in C} \sum_{u \in c} \sum_{v \in c} \underbrace{\left( A_{uv} - \frac{k_{u} k_{v}}{2m} \right)}_{=: \phi} \\ &= \frac{1}{2m} \left[ \sum_{u, v \in c_{0}} \phi + \sum_{u, v \in c_{1}} \phi + \sum_{u, v \in c_{2}} \phi \right] \\ &= \frac{1}{2m} \left[ \left( A_{00} - \frac{k_{0} k_{0}}{2m} \right) + \left( A_{11} - \frac{k_{1} k_{1}}{2m} \right) + \left( A_{22} - \frac{k_{2} k_{2}}{2m} \right) + \left( A_{33} - \frac{k_{3} k_{3}}{2m} \right) \right] + \frac{1}{2m} \left[ \left( A_{13} - \frac{k_{1} k_{3}}{2m} \right) + \left( A_{31} - \frac{k_{3} k_{1}}{2m} \right) \right] \\ &\overset{I}{=} -\frac{423}{2450} + \frac{1}{35} \cdot \left[ \left( 7 - \frac{13 \cdot 10.5}{35} \right) + \left( 7 - \frac{10.5 \cdot 13}{35} \right) \right] \\ &= -\frac{423}{2450} + \frac{31}{175} = \frac{11}{2450} \approx 0.00449 \end{aligned}$

In step $I$ , we’ve used the result from the weighted graph of the previous section. Note that while modularity is still not good, it has slightly improved.

With the equivalent formulation, we end up with – surprise, surprise – the same result:

$\begin{aligned} Q(C) &= \frac{1}{2m} \sum_{c \in C} \left( \Sigma_{c} - \frac{(\Sigma_{\hat{c}})^{2}}{2m} \right) \\ &= \frac{1}{2m} \left[ \left( A_{00} - \frac{k_{0}^{2}}{2m} \right) + \left( (A_{11} + A_{13} + A_{31} + A_{33}) - \frac{(k_{1} + k_{3})^{2}}{2m} \right) + \left( A_{22} - \frac{k_{2}^{2}}{2m} \right) \right] \\ &= \frac{1}{35} \left[ \left( 3 - \frac{4^{2}}{35} \right) + \left( (0 + 7 + 7 + 1) - \frac{(13 + 10.5)^{2}}{35} \right) + \left( 0 - \frac{7.5^{2}}{35} \right) \right] \\ &= \frac{1}{35} \left[ \left( 3 - \frac{4^{2}}{35} \right) + \left( 15 - \frac{23.5^{2}}{35} \right) - \frac{7.5^{2}}{35} \right] \\ &= \frac{11}{2450} \approx 0.00449 \end{aligned}$

If we somehow already know the values for $\Sigma_{c}$ and $\Sigma_{\hat{c}}$, i.e. the sum of the weights of edges inside the community and the sum of the weights of edges incident to community vertices (the vertex degrees), respectively, the second formula is the way to go: we can just plug our values in and are done in no time.
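The following Python sketch illustrates this “plug in the sums” workflow for the partition $\{\{0\}, \{1, 3\}, \{2\}\}$. It is an illustration only (not the project's Rust code), and the helper names `community_sums` and `modularity_from_community_sums` are assumptions.

```python
from fractions import Fraction

# Adjacency matrix of the weighted test graph (2.5 written as the exact fraction 5/2).
A = [
    [3, 1, 0, 0],
    [1, 0, 5, 7],
    [0, 5, 0, Fraction(5, 2)],
    [0, 7, Fraction(5, 2), 1],
]
degree = [sum(row, Fraction(0)) for row in A]
two_m = sum(degree)

def community_sums(A, degree, community):
    """Return the pair (Sigma_c, Sigma_c-hat) for one community."""
    sigma_in = sum(A[u][v] for u in community for v in community)
    sigma_tot = sum(degree[v] for v in community)
    return sigma_in, sigma_tot

def modularity_from_community_sums(sums, two_m):
    """Plug precomputed (Sigma_c, Sigma_c-hat) pairs into the second formula."""
    return sum(s_in - Fraction(s_tot) ** 2 / two_m for s_in, s_tot in sums) / two_m

partition = [[0], [1, 3], [2]]
sums = [community_sums(A, degree, c) for c in partition]
q = modularity_from_community_sums(sums, two_m)
print(sums)
print(q)   # prints: 11/2450
```

For community $\{1, 3\}$ the sketch yields $\Sigma_{c} = 15$ and $\Sigma_{\hat{c}} = 23.5$, the same values that appear in the calculation above.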

🎈 Task: Come up with your own graphs and calculate modularity by hand for those. Guess a good vertex-community assignment and see if modularity increases compared to a vertex-community assignment you feel is bad, e.g. when putting all vertices in one big community.

Here are some more examples of graphs used to test the code. The respective modularity values were calculated by hand in the same manner seen above.

Weighted graph 2 with singleton communities

TODO: Insert image of graph

$A = \begin{pmatrix} 42 & 5 & 1 & 1 \\ 5 & 0 & 3 & 2 \\ 1 & 3 & 7 & 2 \\ 1 & 2 & 2 & 0 \end{pmatrix}$

$m = \frac{1}{2} \sum_{u, v} A_{uv} = \frac{1}{2} (42 + 10 + 2 + 2 + 6 + 4 + 7 + 4) = 38.5$

$C = \{\{0\}, \{1\}, \{2\}, \{3\}\}$

$Q (C) = \frac{2}{11} \approx 0.181818$

Original Louvain paper graph

TODO: Insert image of graph

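$A = \left( \begin{array}{cccccccccccccccc} 0 & 0 & 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 1 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right)$

The graph is undirected, so $A$ is symmetric and we only need to consider its upper triangular part. We can also omit the diagonal since there are no self-loops in the graph (there are only $0$ s on the diagonal). Therefore, the sum of all entries above the diagonal of $A$ corresponds to the total number of edges in the graph:

$m = \frac{1}{2} \sum_{u, v} A_{uv} = \frac{1}{2} \sum_{u < v} 2 A_{uv} = \sum_{u < v} A_{uv} = 28$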

For the original node-community assignment where every node is in its own community, we have this clustering:

$C = \{\{0\}, \{1\}, \{2\}, \{3\}, \{4\}, \{5\}, \{6\}, \{7\}, \{8\}, \{9\}, \{10\}, \{11\}, \{12\}, \{13\}, \{14\}, \{15\}\}$

The initial modularity is therefore given by:

$Q(C) = \frac{1}{2m} \sum_{c \in C} \left( \Sigma_{c} - \frac{(\Sigma_{\hat{c}})^{2}}{2m} \right) = \frac{1}{56} \left[ \left( 0 - \frac{4^{2}}{56} \right) + \left( 0 - \frac{5^{2}}{56} \right) + \left( 0 - \frac{2^{2}}{56} \right) + \dots \right] = -\frac{1}{14} \approx -0.07143$

In the next level of the hierarchy, Louvain locally optimizes modularity. The new clustering is given by:

$C = \{\{0, 1, 2, 4, 5\}, \{3, 6, 7\}, \{11, 13\}, \{8, 9, 10, 12, 14, 15\}\}$

$Q(C) = \frac{1}{2m} \sum_{c \in C} \left( \Sigma_{c} - \frac{(\Sigma_{\hat{c}})^{2}}{2m} \right) = \frac{1}{56} \left[ \left( 2 \cdot 7 - \frac{20^{2}}{56} \right) + \left( 2 \cdot 2 - \frac{9^{2}}{56} \right) + \left( 2 \cdot 1 - \frac{7^{2}}{56} \right) + \left( 2 \cdot 8 - \frac{20^{2}}{56} \right) \right] = \frac{543}{1568} \approx 0.34630$

Finally, the second pass merges two communities together, so that we are left with a clustering into two communities in the end:

$C = \{\{0, 1, 2, 3, 4, 5, 6, 7\}, \{8, 9, 10, 11, 12, 13, 14, 15\}\}$

Notice the four edges $(0, 3)$, $(1, 7)$, $(2, 6)$, $(5, 7)$ were inter-community edges, but are now intra-community edges as the two communities got merged together. The modularity increases to:

$Q(C) = \frac{1}{2m} \sum_{c \in C} \left( \Sigma_{c} - \frac{(\Sigma_{\hat{c}})^{2}}{2m} \right) = \frac{1}{56} \left[ \left( 2 \cdot (7 + 2 + 4) - \frac{29^{2}}{56} \right) + \left( 2 \cdot (8 + 1 + 3) - \frac{27^{2}}{56} \right) \right] = \frac{615}{1568} \approx 0.39222$

In the 3rd pass of Louvain, we find that we cannot locally increase modularity anymore. Therefore, this is the final assignment that a full Louvain run might return. Note that due to the randomness in the algorithm, different runs can return different assignments with different modularity values. For example, another run finds that $0.375$ is the highest modularity it can achieve.
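The three modularity values of this section can be reproduced with a short Python sketch (for verification only, not the project's Rust code). The edge list `EDGES` is an assumption: it is the edge list we read off the graph, chosen to be consistent with $m = 28$ and the degree sums used in the calculations above.

```python
from fractions import Fraction

# Assumed edge list of the 16-vertex example graph (28 undirected edges).
EDGES = [
    (0, 2), (0, 3), (0, 4), (0, 5), (1, 2), (1, 4), (1, 7), (2, 4), (2, 5), (2, 6),
    (3, 7), (4, 10), (5, 7), (5, 11), (6, 7), (6, 11), (8, 9), (8, 10), (8, 11),
    (8, 14), (8, 15), (9, 12), (9, 14), (10, 11), (10, 12), (10, 13), (10, 14), (11, 13),
]

def modularity(edges, communities):
    """Modularity of an unweighted, undirected graph without self-loops."""
    two_m = 2 * len(edges)
    community_of = {v: i for i, c in enumerate(communities) for v in c}
    sigma_in = [0] * len(communities)    # Sigma_c: twice the number of internal edges
    sigma_tot = [0] * len(communities)   # Sigma_c-hat: sum of the vertex degrees
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        sigma_tot[cu] += 1
        sigma_tot[cv] += 1
        if cu == cv:
            sigma_in[cu] += 2
    return sum(Fraction(s_in, two_m) - Fraction(s_tot, two_m) ** 2
               for s_in, s_tot in zip(sigma_in, sigma_tot))

level_0 = [[v] for v in range(16)]
level_1 = [[0, 1, 2, 4, 5], [3, 6, 7], [11, 13], [8, 9, 10, 12, 14, 15]]
level_2 = [[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15]]
for level in (level_0, level_1, level_2):
    print(modularity(EDGES, level))
```

The printed fractions are $-\frac{1}{14}$, $\frac{543}{1568}$ and $\frac{615}{1568}$.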

🎈 Task: What is the corresponding clustering to this modularity value? Go to the project page, download the code and run it on the graph multiple times until you encounter a modularity value of $0.375$ in the final run. Use the hierarchy command to display the node-to-community assignment for the different levels. What is the reason for this behavior? Hint: Take a look at node $6$ .

Modularity value range

🕑 TL;DR: Modularity $Q \in [- 0.5, 1]$ .

Going through the examples, you may have wondered what a “good” or “bad” modularity value really is. Here, we will discuss the value range of the modularity measure.

Modularity is zero if the fraction of edges within communities is no different from the respective fraction we would expect in a random graph. We get a positive value if this fraction is higher than expected on the basis of chance. This directly results from our definition of modularity. In the simple network presented here, $Q$ was around $0.13$ which is still slightly better than at random but nevertheless worse than the best result of $Q \approx 0.46$ for that particular graph.

Upper limit $1$

In theory, the maximum possible modularity value is $1$ for a graph with infinitely many communities, such that $a_{c}^{2} \to 0$. We show this property for a graph with $t := |C|$ communities, each forming a $K_{2}$ graph, i.e. two vertices sharing one edge.

Graph for upper limit of modularity — Infinite graph and community assignment yielding the upper limit of modularity: $1$

This graph has $t$ communities as well as $m = t$ edges. When $t$ approaches infinity, we obtain:

$\begin{aligned} Q &= \lim_{t \to \infty} \frac{1}{2t} \sum_{c=1}^{t} \sum_{u \in c} \sum_{v \in c} \left( A_{uv} - \frac{k_{u} k_{v}}{2t} \right) \\ &= \lim_{t \to \infty} \frac{1}{2t} \sum_{c=1}^{t} \left( 2 \cdot 1 - 4 \cdot \frac{1 \cdot 1}{2t} \right) \\ &= \lim_{t \to \infty} \frac{1}{2t} \sum_{c=1}^{t} 2 \\ &= \lim_{t \to \infty} \frac{2t}{2t} = 1 \end{aligned}$

As $\sum_{c \in C} \sum_{u \in c} \sum_{v \in c} A_{uv} \leq 2m$ and $\sum_{c \in C} \sum_{u \in c} \sum_{v \in c} \frac{k_{u} k_{v}}{2m} \geq 0$ for every graph partitioning, $Q = 1$ is indeed the upper limit of modularity.
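For finite $t$, the same calculation gives $Q = 1 - \frac{1}{t}$, which approaches $1$ from below. The following Python sketch (illustrative only; the function name `modularity_of_k2_pairs` is an assumption) evaluates this for a few values of $t$.

```python
from fractions import Fraction

def modularity_of_k2_pairs(t):
    """Modularity of t disjoint K_2 graphs where every K_2 forms its own community."""
    two_m = 2 * t                     # t edges in total
    # Per community: Sigma_c = 2 (A_uv and A_vu), Sigma_c-hat = 1 + 1 = 2.
    per_community = Fraction(2, two_m) - Fraction(2, two_m) ** 2
    return t * per_community

for t in (1, 2, 10, 1000):
    print(t, modularity_of_k2_pairs(t))
```

For $t = 1000$ the value is already $\frac{999}{1000}$.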

Lower limit $- \frac{1}{2}$

The lower limit of modularity is $- \frac{1}{2}$ which is achieved using any bipartite graph $G(U, V, E)$ with the clustering $C = \{c_{U}, c_{V}\}$ (see here), that is, each cluster encompasses all the vertices of one bipartition of $G$ (the vertex sets $U$ or respectively $V$). The following figure shows the bipartite graph $G(U, V, E)$ with $U = \{0, 1\}$, $V = \{2, 3, 4, 5\}$ and $E = \{\{0, 2\}, \{0, 3\}, \{1, 4\}\}$. The given community assignment yields a modularity of $- \frac{1}{2}$, while the maximum modularity for this particular graph is $Q \approx 0.44$.

Graph for lower limit of modularity — Bipartite graph and community assignment yielding the lower limit of modularity: $- 0.5$
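A short Python sketch (illustrative only, not the project's Rust code) confirms the value $- \frac{1}{2}$ for the graph above and for a few complete bipartite graphs. The function name `bipartition_modularity` and the choice of complete bipartite graphs as additional test cases are assumptions.

```python
from fractions import Fraction

def bipartition_modularity(edges, side_u):
    """Modularity when the vertex set side_u and its complement are the two communities."""
    two_m = 2 * len(edges)
    # Sum of degrees on each side: every edge end that lies in side_u counts once.
    deg_u = sum((a in side_u) + (b in side_u) for a, b in edges)
    deg_v = two_m - deg_u
    # Edges with both ends on the same side; zero for a proper bipartition.
    inside = sum(1 for a, b in edges if (a in side_u) == (b in side_u))
    return Fraction(2 * inside, two_m) - Fraction(deg_u, two_m) ** 2 - Fraction(deg_v, two_m) ** 2

# The bipartite graph from the text: U = {0, 1}, V = {2, 3, 4, 5}.
edges = [(0, 2), (0, 3), (1, 4)]
q_text_graph = bipartition_modularity(edges, {0, 1})
print(q_text_graph)   # prints: -1/2

# Complete bipartite graphs K_{a,b} behave the same way.
for a, b in [(1, 1), (2, 3), (4, 7)]:
    k_ab = [(i, a + j) for i in range(a) for j in range(b)]
    print(a, b, bipartition_modularity(k_ab, set(range(a))))
```

No edge lies inside a community and each side carries exactly half of all edge stubs, so $Q = 0 - \frac{1}{4} - \frac{1}{4} = - \frac{1}{2}$.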

Common values

A rigorous proof that $- 0.5 \leq Q (C) \leq 1$ holds true for any undirected and unweighted graph $G$ and any community assignment (clustering) $C$ is presented here. Note however, that for a given graph, the maximum achievable modularity is oftentimes not $1$ . Newman and Girvan claim that a typical range for modularity values is from $0.3$ to $0.7$ (see here) with values above $0.3$ indicating a “significant community structure in a network” (see here). Whether values above $0.3$ really indicate a good community structure depends on the application and the particular graph.

Modularity resolution limit

Fortunato and Barthélemy have shown that modularity suffers from a resolution limit, implying that optimizing for modularity does not necessarily reveal the actual community structure of the network (see here). Two communities might get combined into a larger one, yielding a higher modularity. This is the case for communities where $m_{c} < \sqrt{m/2}$, i.e. the number of edges inside community $c$ is too small for the grouping of vertices to be recognized as one community.

Note that the notion of “community” is defined in the “weak” sense as in this paper: “In a weak community the sum of all degrees within [the subgraph] is larger than the sum of all degrees toward the rest of the network”. In the mathematical definition of modularity, however, what is considered a community is dependent on the size of the whole network (number of edges $m$ ) – which is not ideal.

Fortunato and Barthélemy point out the importance of further investigation of the modules found by an algorithm that optimizes for modularity since one does not know if a claimed “module” is really just one community or a combination of multiple communities. Special care is required for communities with

$m_{c} < \sqrt{m/2}$

as they are likely to be composed of two or more smaller¹ ones (this number results from merging the two biggest, but still unrecognizable modules together). Yet, even bigger modules could suffer from this constraint of modularity if they are – as the authors call it – “fuzzy”, meaning that they have an increasing number of external edges (edges to other modules), making the difference between internal and external edges smaller.

Graph illustrating the resolution limit of modularity — Network consisting of two $K_{4}$ and one $K_{13}$ clique. The former are not correctly identified as individual communities when optimizing for modularity. $Q_{single} \approx 0.239$ ( $K_{4}$ cliques identified as single communities), $Q_{pair} \approx 0.240$ ( $K_{4}$ cliques grouped together as a pair).

The above figure illustrates the resolution limit in an extreme example. The two cliques (complete graphs $K_{4}$) on the right are interconnected by one single edge (“bridge”) and share just one edge with the rest of the network, yet they are not recognized as two individual communities; optimal modularity assigns all eight vertices to the same community. The clique $K_{13}$ may be an arbitrary graph with $\geq 84$ edges, so that the whole network consists of $\geq 98$ edges. This ensures that for the $K_{4}$ cliques (composed of $6$ edges each) the condition $m_{c} < \sqrt{m/2} = \sqrt{98/2} = 7$ $⟹ m_{c} = 6 < 7$ is met. A complete graph $K_{n}$ holds $n(n - 1)/2$ edges, thus, our network consists of $\frac{13 \cdot 12}{2} + 2 \cdot \frac{4 \cdot 3}{2} + 2 = 92$ edges, where the final $2$ accounts for the bridge and the edge to the rest of the network. $92 < 98$, yet still, $\sqrt{92/2} \approx 6.78$ which happens to be sufficiently close to $7.0$ ².
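The two modularity values from the figure caption can be reproduced with the following Python sketch (illustrative only, not the project's Rust code). The vertex numbering and the endpoints of the two connecting edges are assumptions. Modularity depends only on the per-community sums here, so the choice of endpoints does not affect the result.

```python
from fractions import Fraction
from itertools import combinations

def modularity(edges, communities):
    """Modularity of an unweighted, undirected graph without self-loops."""
    two_m = 2 * len(edges)
    community_of = {v: i for i, c in enumerate(communities) for v in c}
    sigma_in = [0] * len(communities)    # Sigma_c: twice the number of internal edges
    sigma_tot = [0] * len(communities)   # Sigma_c-hat: sum of the vertex degrees
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        sigma_tot[cu] += 1
        sigma_tot[cv] += 1
        if cu == cv:
            sigma_in[cu] += 2
    return sum(Fraction(s_in, two_m) - Fraction(s_tot, two_m) ** 2
               for s_in, s_tot in zip(sigma_in, sigma_tot))

k13 = list(range(0, 13))      # vertices of the K_13 clique
k4_a = list(range(13, 17))    # vertices of the first K_4 clique
k4_b = list(range(17, 21))    # vertices of the second K_4 clique
edges = [e for clique in (k13, k4_a, k4_b) for e in combinations(clique, 2)]
edges.append((16, 17))        # bridge between the two K_4 cliques
edges.append((12, 13))        # single edge to the rest of the network (assumed endpoints)

q_single = modularity(edges, [k13, k4_a, k4_b])
q_pair = modularity(edges, [k13, k4_a + k4_b])
print(len(edges), round(float(q_single), 3), round(float(q_pair), 3))
```

Merging the two $K_{4}$ cliques increases modularity slightly, although they are connected by a single edge only.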

¹ The terms “small” and “big” are used here not to indicate the number of nodes, but the number of intra-edges in the respective communities. How nodes are distributed in the network does not have any impact on modularity, i.e. whether a community has more or fewer nodes is irrelevant.

² Fortunato and Barthélemy point out that they calculate as if the variables were continuous, which they are not. Thus, for the given formula of the resolution limit, one has to test the closest integer values.

Louvain algorithm

The Louvain method – named after the University of Louvain where Blondel et al. developed the algorithm – finds communities by optimizing modularity locally for every node’s neighborhood, then consolidating the vertices of newly found communities to super vertices and repeating the steps on the new, smaller graph (see original paper). This multi-step algorithm ends when modularity does not improve in one pass anymore. It has been shown to yield state-of-the-art results with very good performance which is linear in the number of edges $m$ in the graph (see here).

We demonstrate and explain the Louvain algorithm with the following undirected and unweighted graph. The source code can deal with weighted graphs as well.

Graph with communities — Initial singleton communities

We assume we somehow know the communities in the graph a priori. This known vertex-community assignment is oftentimes called the “ground truth”. In our sample graph, we have two communities that the Louvain algorithm should find in the end since it optimizes modularity.

$C_{\text{ground truth}} = \{c_{0}, c_{1}\} = \{\{0, 1, 2, 3, 4\}, \{5, 6, 7, 8\}\}$

The two ground-truth communities are denoted by the shape of the vertices (either a circle or a square). In the background, we draw in gray areas around the vertices to indicate the communities that the algorithm found in each step.

Initially, each vertex gets assigned to its own community that we will call “singleton community” (mathematically speaking a set consisting of just one element, e.g. $\{4\}$). Therefore, we have this partition:

$C = \{\{0\}, \{1\}, \{2\}, \{3\}, \{4\}, \{5\}, \{6\}, \{7\}, \{8\}\}$

🎈 Task: Calculate the modularity $Q$ for the initial partition and assume a weight of 1 for every edge. Obviously, $Q$ shouldn’t be very high.

I. Modularity optimization

After the initial assignment to singleton communities, the first step of one Louvain pass is essentially to apply Newman’s agglomerative hierarchical clustering method presented here and improved by Clauset, Newman, and Moore here:

“[S]tarting with each vertex being the sole member of a community of one, we repeatedly join together the two communities whose amalgamation produces the largest increase in $Q$ .” ~ from here

In the Louvain method, this is done by iterating over all vertices (in randomized order): a vertex $u$ is then “moved” to the community of one of its neighbors $v \in N (u)$ , so that the modularity gain $Δ Q$ is maximized (this is why Louvain is a greedy algorithm). If no positive gain is possible, $u$ remains in its singleton community.

In the following figure, some vertices have already been gathered into communities, e.g. vertices $5$, $6$, $7$ and $8$ on the left and vertices $0$ and $1$ on the right belong together according to the current state of the algorithm.

Louvain optimization phase in first pass: Delta modularity calculation for vertex 4.

At this point, we consider vertex $u = 4$ and its neighbors $N(4) = \{0, 1, 3, 5\}$. The highest increase is $\Delta Q \approx 0.097$ when the community of vertices $0$ and $1$ accommodates vertex $4$, hence $4$ is “moved” to this community. An efficient calculation of $\Delta Q$ is derived in the next section. We consider vertices multiple times until modularity cannot be increased anymore which corresponds to a local maximum. The following figure illustrates the community assignments as result of the first local optimization phase.

Result of local optimization in first pass — Louvain optimization phase in first pass: Result of local optimization.
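The local optimization phase can be sketched in a few lines of Python. This is a deliberately naive illustration and not the project's Rust implementation: it recomputes the global modularity for every candidate move instead of using an efficient $\Delta Q$, and it visits vertices in a fixed order rather than a randomized one. The function names and the example graph (two triangles joined by one edge) are assumptions.

```python
from fractions import Fraction

def modularity(A, community_of):
    """Q for an assignment given as a list: community_of[v] is the community label of v."""
    degree = [sum(row) for row in A]
    two_m = sum(degree)
    n = len(A)
    q = Fraction(0)
    for u in range(n):
        for v in range(n):
            if community_of[u] == community_of[v]:
                q += A[u][v] - Fraction(degree[u] * degree[v], two_m)
    return q / two_m

def local_moving(A):
    """Greedy phase I: move vertices to neighboring communities while Q increases."""
    n = len(A)
    community_of = list(range(n))              # start with singleton communities
    improved = True
    while improved:
        improved = False
        for u in range(n):                     # fixed order; Louvain randomizes it
            best_q = modularity(A, community_of)
            best_c = community_of[u]
            for v in range(n):
                if v == u or A[u][v] == 0:
                    continue                   # only communities of neighbors are candidates
                candidate = community_of[:]
                candidate[u] = community_of[v]
                q = modularity(A, candidate)
                if q > best_q:                 # accept strictly positive gains only
                    best_q, best_c = q, community_of[v]
            if best_c != community_of[u]:
                community_of[u] = best_c
                improved = True
    return community_of

# Hypothetical example graph: two triangles joined by the single edge (2, 3).
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
assignment = local_moving(A)
print(assignment, modularity(A, assignment))
```

On this small graph, the sketch ends with the two triangles as communities. Every accepted move strictly increases $Q$, so the loop terminates in a local maximum.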

II. Community aggregation (consolidation)

In the second phase, the communities that were found in the first phase are replaced by super vertices. This results in a new graph where vertices are the communities of phase I. Consider the middle layer of the following figure showing the outcome of the first pass after phase I and II.

Resulting Louvain hierarchy for the sample graph. Communities of the previous pass become the new super vertices.

Edges now have a weight assigned to them, even though we started with an unweighted graph:

Edges between communities (inter-community edges) are condensed to one edge between the communities with edge weight equal to the sum of the weight of all previous edges between the communities.
Edges within communities (intra-community edges) result in self-loops, to which every such edge contributes twice its weight (i.e. a value of two per edge in an unweighted graph) in order to account for the undirectedness (see calculations in modularity section).
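The two rules above can be sketched as a single pass over the adjacency matrix. The following Python snippet is an illustration under the conventions of this book (symmetric adjacency matrix, self-loops on the diagonal); it is not the project's Rust code, and the function name `aggregate` and the example graph are assumptions.

```python
def aggregate(A, communities):
    """Collapse every community into one super vertex and return the new weighted matrix."""
    community_of = {v: i for i, c in enumerate(communities) for v in c}
    size = len(communities)
    super_A = [[0] * size for _ in range(size)]
    for u, row in enumerate(A):
        for v, weight in enumerate(row):
            # A_uv and A_vu are both visited, so an internal edge adds twice its
            # weight to the self-loop entry on the diagonal of the new matrix.
            super_A[community_of[u]][community_of[v]] += weight
    return super_A

# Hypothetical example graph: two triangles joined by the single edge (2, 3).
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
super_A = aggregate(A, [[0, 1, 2], [3, 4, 5]])
print(super_A)   # prints: [[6, 1], [1, 6]]
```

Each triangle has three internal edges and therefore becomes a self-loop of weight $6$, while the single connecting edge keeps its weight of $1$. The row sums of the new matrix equal the total degrees $\Sigma_{\hat{c}}$ of the original communities.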

Termination

These two phases make one pass of the Louvain algorithm. The first phase works directly on the graph that emerges from the second phase (the new super vertices are considered to lie in their own singleton communities). Passes are executed repeatedly while modularity is always calculated with respect to the original graph. The algorithm ends when modularity cannot be increased anymore (hopefully corresponding to a global maximum), which is often the case after a handful of passes, even for large networks (see here).

Capturing the resulting community-vertex assignment after each pass produces a hierarchy (that we’ve already seen) since the second step creates super-vertices based on the previous vertices of a community and thus the local optimization phase essentially finds communities of communities. Blondel et al. remark that this is favorable in order to reveal hierarchical structures present in many large networks. They also point out that the resolution limit is mitigated as vertices are moved one after another, yielding a low probability of merging two distinct communities. Even if the super vertices of the respective communities are merged in later passes, the resulting hierarchy helps in identifying on which “organization level” this merge took place (see here).

Delta modularity calculation (singleton)

🚨 This page is a work in progress. TODO: Maybe add a graphic here to explain the various variables.

In the modularity optimization phase, we only rely on local information. Recalculating the global modularity value for every possible neighbor of each vertex would significantly degrade the performance of the Louvain algorithm. This is why Blondel et al. employ a delta modularity formula that can be applied to millions of nodes. In this section, we will explain and derive the formula and show how it can be simplified for usage in a program.

We have seen here that the modularity $Q (C)$ for a partition $C$ is given by:

$Q(C) = \frac{1}{2m} \sum_{c \in C} \left( \Sigma_{c} - \frac{(\Sigma_{\hat{c}})^{2}}{2m} \right)$

We are only interested in the contribution of a single community $c$ to the overall modularity. With a slight abuse of notation, let $Q(c)$ denote modularity of community $c$ ¹, such that $Q(C) = \sum_{c \in C} Q(c)$. Therefore, for the modularity of a community $c$, we get:

$Q(c) = \frac{\Sigma_{c}}{2m} - \left( \frac{\Sigma_{\hat{c}}}{2m} \right)^{2}$

Next, we look closely at the process of “moving” a vertex $u$ to the community of one of its neighbors $N (u)$ . We assume the vertex to be located in a singleton community. The formula Blondel et al. presented is for exactly this case, where we remove $u$ from its singleton community and then insert it into the neighbor’s community. This is actually only applicable for the first iteration of the modularity optimization phase where in the beginning of every new pass we deal with singleton communities. The authors state that they use a similar formula to calculate the modularity change when $u$ is removed from any community (also those with more than one vertex in it), but do not reveal the expression used. We will therefore derive the generalized version in the next section.

For now, let us consider the case where we move a vertex $u$ from its singleton community that we will denote by $\mathring{c}_{u}$ to any other community $c$. The modularity change (delta) is then given by

$\Delta Q(u, \mathring{c}_{u}, c) = Q^{'}(c) - Q(c) - Q(\mathring{c}_{u})$

where $Q(c)$ is the quality of community $c$ before the merge (and hence without vertex $u$ in that community) and $Q^{'}(c)$ is the quality of $c$ after the merge (and hence after vertex $u$ was integrated into community $c$). As the singleton community does not exist anymore after the merge, we also subtract the modularity of the singleton community $Q(\mathring{c}_{u})$. In the following, let the prime symbol always denote the state of a variable after the merge.

With the above formula, this gives us:

$Δ Q (u, \overset{c}{˚}_{u}, c) = Q^{'} (c) - Q (c) - Q (\overset{c}{˚}_{u}) = [\frac{Σ _{c}^{'}}{2 m} - (\frac{Σ _{\overset{c}{^}}^{'}}{2 m})^{2}] - [\frac{Σ _{c}}{2 m} - (\frac{Σ _{\overset{c}{^}}}{2 m})^{2}] - [\frac{Σ _{\overset{c}{˚}_{u}}}{2 m} - (\frac{k _{u}}{2 m})^{2}]$

which is the equation presented in the original Louvain paper. If we consider that after the merging process, the sum of the weights of the edges between node $u$ and community $c$ – that we will denote² by $k_{u}^{\to c}$ – now adds to the community $c$ accommodating vertex $u$ , we obtain

$Σ_{c}^{'} = Σ_{c} + 2 k_{u}^{\to c}$

Likewise, we find that the weighted vertex degree $k_{u}$ adds to the total weighted vertex degree of $c$ after the merge, thus:

$Σ_{\overset{c}{^}}^{'} = Σ_{\overset{c}{^}} + k_{u}$
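
As a quick sanity check of these two update rules, consider a hypothetical unweighted graph (an assumption for illustration, not taken from the text above) made of two triangles $\{0, 1, 2\}$ and $\{3, 4, 5\}$ joined by the edge $(2, 3)$ . Take $c = \{0, 1\}$ and the singleton vertex $u = 2$ , so that $Σ_{c} = 2$ , $Σ_{\overset{c}{^}} = 4$ , $k_{u} = 3$ and $k_{u}^{\to c} = 2$ . Then

$Σ_{c}^{'} = 2 + 2 \cdot 2 = 6, \quad Σ_{\overset{c}{^}}^{'} = 4 + 3 = 7$

which matches a direct count on the merged community $\{0, 1, 2\}$ : three internal edges, each counted twice, and vertex degrees $2 + 2 + 3$ .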

With this, we can simplify our expression for the modularity change. Note that we use $Σ_{\overset{c}{˚}_{u}} = 0$ , as there are no edges inside a singleton community $\overset{c}{˚}_{u}$ since we do not allow self-loops in the original graph.

$Δ Q (u, \overset{c}{˚}_{u}, c) = [\frac{Σ _{c} + 2 k _{u}^{\to c}}{2 m} - (\frac{Σ _{\overset{c}{^}} + k _{u}}{2 m})^{2}] - [\frac{Σ _{c}}{2 m} - (\frac{Σ _{\overset{c}{^}}}{2 m})^{2}] - [\frac{Σ _{\overset{c}{˚}_{u}}}{2 m} - (\frac{k _{u}}{2 m})^{2}] = [\frac{Σ _{c} + 2 k _{u}^{\to c}}{2 m} - (\frac{Σ _{\overset{c}{^}} + k _{u}}{2 m})^{2}] - [\frac{Σ _{c}}{2 m} - (\frac{Σ _{\overset{c}{^}}}{2 m})^{2} - (\frac{k _{u}}{2 m})^{2}] = \frac{2 k _{u}^{\to c}}{2 m} - (\frac{Σ _{\overset{c}{^}}}{2 m})^{2} - 2 \frac{Σ _{\overset{c}{^}}}{2 m} \frac{k _{u}}{2 m} - (\frac{k _{u}}{2 m})^{2} + (\frac{Σ _{\overset{c}{^}}}{2 m})^{2} + (\frac{k _{u}}{2 m})^{2} = \frac{k _{u}^{\to c}}{m} - \frac{Σ _{\overset{c}{^}} k _{u}}{m \cdot 2 m} = \frac{1}{m} (k_{u}^{\to c} - \frac{Σ _{\overset{c}{^}} k _{u}}{2 m})$

Finally, we have

$Δ Q (u, \overset{c}{˚}_{u}, c) \propto k_{u}^{\to c} - \frac{Σ _{\overset{c}{^}} k _{u}}{2 m}$

We can ignore the constant $\frac{1}{m}$ since we only compare different modularity increases $Δ Q (u, \overset{c}{˚}_{u}, c)$ with each other and therefore merely require a relative measure. This saves us one division in the algorithm. After one complete pass, the new global modularity is calculated using the global modularity formula introduced earlier³.
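
The closed form can be checked numerically against the definition $Q^{'} (c) - Q (c) - Q (\overset{c}{˚}_{u})$ . The following is a minimal sketch, not the implementation described on this page: the example graph (two triangles joined by one edge) and all function names are assumptions made for illustration, and exact fractions are used so that the comparison does not depend on floating-point rounding.

```python
from fractions import Fraction

# Hypothetical example graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]


def build_adjacency(edges):
    """Return a dict-of-dicts adjacency structure with unit edge weights."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, {})[v] = 1
        adj.setdefault(v, {})[u] = 1
    return adj


def degree(adj, u):
    """Weighted degree k_u."""
    return sum(adj[u].values())


def community_modularity(adj, m, community):
    """Q(c): internal weight (each edge counted twice) and total degree of c."""
    sigma_in = sum(w for u in community
                   for v, w in adj[u].items() if v in community)
    sigma_tot = sum(degree(adj, u) for u in community)
    return Fraction(sigma_in, 2 * m) - Fraction(sigma_tot, 2 * m) ** 2


def delta_from_singleton(adj, m, u, community):
    """Closed-form delta for moving singleton vertex u into `community`."""
    k_u = degree(adj, u)
    k_u_to_c = sum(w for v, w in adj[u].items() if v in community)
    sigma_tot = sum(degree(adj, v) for v in community)
    return Fraction(1, m) * (k_u_to_c - Fraction(sigma_tot * k_u, 2 * m))


adj = build_adjacency(EDGES)
m = len(EDGES)
u, c = 2, {0, 1}

closed_form = delta_from_singleton(adj, m, u, c)
brute_force = (community_modularity(adj, m, c | {u})
               - community_modularity(adj, m, c)
               - community_modularity(adj, m, {u}))
print(closed_form, brute_force)  # prints: 8/49 8/49
```

Dropping the factor `Fraction(1, m)` rescales every candidate by the same positive constant, so the ranking of target communities is unchanged.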

Our short formula is used in the algorithm to efficiently calculate the delta modularity gain $Δ Q$ . To remove a vertex $u$ from its previous community $c_{u}$ or to insert it into a new community $c$ , only $Σ_{c}$ and $Σ_{\overset{c}{^}}$ have to be updated. $Σ_{c}$ is adjusted for the global modularity calculation and not for the delta modularity $Δ Q$ . Note that we can precalculate the sum of the weights of edges between $u$ and community $c_{u}$ or $c$ ( $k_{u}^{\to c_{u}}$ and $k_{u}^{\to c}$ ) before calling the remove- or insert-function. This is crucial for the speed of Louvain as $Δ Q$ has to be computed frequently. As stated above, we also dropped the factor $\frac{1}{m}$ to save one division. For the calculation of global modularity, we do not omit the factor $\frac{1}{2 m}$ in order to obtain the absolute global modularity $Q$ .
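
The bookkeeping described above can be sketched as follows. This is an illustrative assumption about how such remove and insert functions might look, not the code of the actual implementation: the class name `CommunitySums`, its methods, and the example graph are all hypothetical. The point is that a move only touches $Σ_{c}$ and $Σ_{\overset{c}{^}}$ of the two affected communities.

```python
class CommunitySums:
    """Keeps sigma_in (internal weight, edges counted twice) and sigma_tot
    (sum of weighted degrees) per community. Hypothetical helper."""

    def __init__(self, adj, assignment):
        self.adj = adj
        self.assignment = dict(assignment)
        self.sigma_in = {}
        self.sigma_tot = {}
        for u, nbrs in adj.items():
            c = self.assignment[u]
            self.sigma_tot[c] = self.sigma_tot.get(c, 0) + sum(nbrs.values())
            inside = sum(w for v, w in nbrs.items() if self.assignment[v] == c)
            self.sigma_in[c] = self.sigma_in.get(c, 0) + inside

    def weight_to(self, u, c):
        """k_u^{->c}: weight of edges between u and the other vertices of c."""
        return sum(w for v, w in self.adj[u].items()
                   if v != u and self.assignment[v] == c)

    def remove(self, u, k_u_to_cu):
        """Take u out of its community; k_u_to_cu is precomputed by the caller."""
        c = self.assignment[u]
        self.sigma_in[c] -= 2 * k_u_to_cu
        self.sigma_tot[c] -= sum(self.adj[u].values())
        self.assignment[u] = None

    def insert(self, u, c, k_u_to_c):
        """Put u into community c; k_u_to_c is precomputed by the caller."""
        self.sigma_in[c] = self.sigma_in.get(c, 0) + 2 * k_u_to_c
        self.sigma_tot[c] = self.sigma_tot.get(c, 0) + sum(self.adj[u].values())
        self.assignment[u] = c


# Hypothetical example: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, {})[b] = 1
    adj.setdefault(b, {})[a] = 1

sums = CommunitySums(adj, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"})

# Precompute both edge-weight sums before touching the bookkeeping.
k_to_old = sums.weight_to(2, "a")
k_to_new = sums.weight_to(2, "b")
sums.remove(2, k_to_old)
sums.insert(2, "b", k_to_new)

# Rebuilding from scratch with the new assignment gives the same sums.
fresh = CommunitySums(adj, sums.assignment)
print(sums.sigma_in == fresh.sigma_in, sums.sigma_tot == fresh.sigma_tot)
```

Both updates cost constant time once the two edge-weight sums are known, whereas rebuilding the sums from scratch walks over every edge.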

¹ We can distinguish the two functions by looking at the argument, which is either a community $c$ or a partition $C$ , i.e. a set of communities $C = \{c_{1}, \dots, c_{k}\}$ .

² This is done in conformity with these notes on modularity. The arrow does not indicate a direction; we still deal with an undirected graph.

³ In newer versions of the algorithm, this is not the case anymore and we only use the relative modularity calculation. TODO

Delta modularity calculation (generalized)

🚨 This page is a work in progress. TODO: explain why this is needed.

Here we generalize the delta modularity calculation presented in the previous section to the case where we move a vertex $u$ from any community (not just a singleton community) to any other community $c$ . We proceed by first removing $u$ from its old community and placing it into its own singleton community. This lets us reuse the Louvain modularity difference formula derived before, which only applies when a vertex moves from a singleton community into another community. With this more general equation, we fill the gap left by the absence of this formula in the original Louvain paper. While we do not employ the generalized formula in the actual implementation, it might still be useful for other applications.

First, we calculate the modularity difference when removing a vertex $u$ from its old community $c_{u}$ (not required to be a singleton community) and inserting it into its own singleton community $\overset{c}{˚}_{u}$ . As before, we prohibit self-loops in the original graph, hence $Σ_{\overset{c}{˚}_{u}} = 0$ . Here, $k_{u}^{\to c_{u}}$ denotes the sum of the weights of the edges between $u$ and the other vertices of $c_{u}$ . We won’t be as elaborate this time, as the rationale is very similar:

$Δ Q (u, c_{u}, \overset{c}{˚}_{u}) = Q^{'} (c_{u}) - Q (c_{u}) + Q (\overset{c}{˚}_{u}) = [\frac{Σ _{c_{u}}^{'}}{2 m} - (\frac{Σ _{\overset{c}{^}_{u}}^{'}}{2 m})^{2}] - [\frac{Σ _{c_{u}}}{2 m} - (\frac{Σ _{\overset{c}{^}_{u}}}{2 m})^{2}] + [\frac{Σ _{\overset{c}{˚}_{u}}}{2 m} - (\frac{k _{u}}{2 m})^{2}] = [\frac{Σ _{c_{u}} - 2 k _{u}^{\to c_{u}}}{2 m} - (\frac{Σ _{\overset{c}{^}_{u}} - k _{u}}{2 m})^{2}] - [\frac{Σ _{c_{u}}}{2 m} - (\frac{Σ _{\overset{c}{^}_{u}}}{2 m})^{2}] - (\frac{k _{u}}{2 m})^{2} = - \frac{k _{u}^{\to c_{u}}}{m} - (\frac{Σ _{\overset{c}{^}_{u}}}{2 m})^{2} + 2 \frac{Σ _{\overset{c}{^}_{u}}}{2 m} \frac{k _{u}}{2 m} - (\frac{k _{u}}{2 m})^{2} + (\frac{Σ _{\overset{c}{^}_{u}}}{2 m})^{2} - (\frac{k _{u}}{2 m})^{2} = - \frac{k _{u}^{\to c_{u}}}{m} + \frac{2 k _{u} Σ _{\overset{c}{^}_{u}}}{(2 m)^{2}} - \frac{2 k _{u}^{2}}{(2 m)^{2}} = \frac{1}{m} (- k_{u}^{\to c_{u}} + \frac{k _{u} ( Σ _{\overset{c}{^}_{u}} - k _{u} )}{2 m})$

We bring together this equation with the one from the previous section and obtain the following generalized formula for the modularity difference when moving a vertex $u$ from its previous community $c_{u}$ to any other community $c$ . This is done by first moving $u$ from $c_{u}$ into its own singleton community $\overset{c}{˚}_{u}$ , then moving it from there into the target community $c$ :

$Δ Q (u, c_{u}, c) = Δ Q (u, c_{u}, \overset{c}{˚}_{u}) + Δ Q (u, \overset{c}{˚}_{u}, c) = \frac{1}{m} (- k_{u}^{\to c_{u}} + \frac{k _{u} ( Σ _{\overset{c}{^}_{u}} - k _{u} )}{2 m}) + \frac{1}{m} (k_{u}^{\to c} - \frac{Σ _{\overset{c}{^}} k _{u}}{2 m}) = \frac{1}{m} (k_{u}^{\to c} - k_{u}^{\to c_{u}} + \frac{k _{u} ( Σ _{\overset{c}{^}_{u}} - k _{u} ) - Σ _{\overset{c}{^}} k _{u}}{2 m}) = \frac{1}{m} (k_{u}^{\to c} - k_{u}^{\to c_{u}} + \frac{k _{u} ( Σ _{\overset{c}{^}_{u}} - Σ _{\overset{c}{^}} - k _{u} )}{2 m})$

Finally, for the generalized formula, we have

$Δ Q (u, c_{u}, c) \propto (k_{u}^{\to c} - k_{u}^{\to c_{u}}) + \frac{k _{u} ( Σ _{\overset{c}{^}_{u}} - Σ _{\overset{c}{^}} - k _{u} )}{2 m}$
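
The generalized formula can be verified in the same spirit as a sketch under stated assumptions: the example graph (two triangles joined by one edge) and the names `delta_move` and `community_modularity` are hypothetical and not part of the implementation. The closed form, including the factor $\frac{1}{m}$ , is compared with the change in $Q$ of the two affected communities computed directly from the definition, using exact fractions.

```python
from fractions import Fraction

# Hypothetical example graph: triangles {0,1,2} and {3,4,5} joined by edge (2,3).
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

adj = {}
for a, b in EDGES:
    adj.setdefault(a, {})[b] = 1
    adj.setdefault(b, {})[a] = 1
m = len(EDGES)


def degree(u):
    """Weighted degree k_u."""
    return sum(adj[u].values())


def community_modularity(community):
    """Q(c) from the definition; an empty community contributes 0."""
    sigma_in = sum(w for u in community
                   for v, w in adj[u].items() if v in community)
    sigma_tot = sum(degree(u) for u in community)
    return Fraction(sigma_in, 2 * m) - Fraction(sigma_tot, 2 * m) ** 2


def delta_move(u, old, new):
    """Generalized delta for moving u from `old` (contains u) to `new`."""
    k_u = degree(u)
    k_u_to_new = sum(w for v, w in adj[u].items() if v in new)
    k_u_to_old = sum(w for v, w in adj[u].items() if v in old and v != u)
    sigma_tot_old = sum(degree(v) for v in old)
    sigma_tot_new = sum(degree(v) for v in new)
    correction = Fraction(k_u * (sigma_tot_old - sigma_tot_new - k_u), 2 * m)
    return Fraction(1, m) * (k_u_to_new - k_u_to_old + correction)


def brute_force_delta(u, old, new):
    """Difference of Q over the two affected communities, after minus before."""
    before = community_modularity(old) + community_modularity(new)
    after = community_modularity(old - {u}) + community_modularity(new | {u})
    return after - before


# Move vertex 2 out of its triangle into the other triangle.
general = delta_move(2, {0, 1, 2}, {3, 4, 5})
print(general, brute_force_delta(2, {0, 1, 2}, {3, 4, 5}))  # prints: -23/98 -23/98

# Singleton special case: old community is just {2}.
singleton = delta_move(2, {2}, {0, 1})
print(singleton, brute_force_delta(2, {2}, {0, 1}))  # prints: 8/49 8/49
```

The negative value in the first case reflects that pulling vertex 2 out of its own triangle makes the partition worse.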

The delta modularity calculation for moving a vertex out of a singleton community can be derived from this formula by assuming that $c_{u}$ is a singleton community. As we do not allow self-loops, $k_{u}^{\to c_{u}} = 0$ . Moreover, the sum of the weights of edges incident to vertices in $c_{u}$ is equal to the vertex degree: $Σ_{\overset{c}{^}_{u}} = k_{u}$ , which gives us:

$Δ Q (u, \overset{c}{˚}_{u}, c) \propto (k_{u}^{\to c} - 0) + \frac{k _{u} ( k _{u} - Σ _{\overset{c}{^}} - k _{u} )}{2 m} = k_{u}^{\to c} - \frac{Σ _{\overset{c}{^}} k _{u}}{2 m}$

which is indeed the formula derived in the previous section.

Fast Louvain