Louvain modularity
Network science | ||||
---|---|---|---|---|
Network types | ||||
Graphs | ||||
|
||||
Models | ||||
|
||||
| ||||
The Louvain method for community detection is a method to extract communities from large networks created by Blondel et al.[1] from the University of Louvain (affiliation of authors has given the method its name). The method is a greedy optimization method that appears to run in time [math]\displaystyle{ O(n \cdot log^2 n) }[/math] in the number of nodes in the network.[2]
Modularity optimization
The inspiration for this method of community detection is the optimization of modularity as the algorithm progresses. Modularity is a scale value between −0.5 (non-modular clustering) and 1 (fully modular clustering) that measures the relative density of edges inside communities with respect to edges outside communities. Optimizing this value theoretically results in the best possible grouping of the nodes of a given network, however going through all possible iterations of the nodes into groups is impractical so heuristic algorithms are used.
In the Louvain Method of community detection, first small communities are found by optimizing modularity locally on all nodes, then each small community is grouped into one node and the first step is repeated. The method is similar to the earlier method by Clauset, Newman and Moore[3] that connects communities whose amalgamation produces the largest increase in modularity.
Algorithm
The value to be optimized is modularity, defined as a value in the range [math]\displaystyle{ [-1/2,1] }[/math] that measures the density of links inside communities compared to links between communities.[1] For a weighted graph, modularity is defined as:
[math]\displaystyle{ Q = \frac{1}{2m}\sum\limits_{ij}\bigg[A_{ij} - \frac{k_i k_j}{2m}\bigg]\delta (c_i,c_j), }[/math]
where
- [math]\displaystyle{ A_{ij} }[/math] represents the edge weight between nodes [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math];
- [math]\displaystyle{ k_i }[/math] and [math]\displaystyle{ k_j }[/math] are the sum of the weights of the edges attached to nodes [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math], respectively;
- [math]\displaystyle{ m }[/math] is the sum of all of the edge weights in the graph;
- [math]\displaystyle{ c_i }[/math] and [math]\displaystyle{ c_j }[/math] are the communities of the nodes; and
- [math]\displaystyle{ \delta }[/math] is Kronecker delta function ([math]\displaystyle{ \delta (x,y)= 1 }[/math] if [math]\displaystyle{ x=y }[/math], [math]\displaystyle{ 0 }[/math] otherwise).
In order to maximize this value efficiently, the Louvain Method has two phases that are repeated iteratively.
First, each node in the network is assigned to its own community. Then for each node [math]\displaystyle{ i }[/math], the change in modularity is calculated for removing [math]\displaystyle{ i }[/math] from its own community and moving it into the community of each neighbor [math]\displaystyle{ j }[/math] of [math]\displaystyle{ i }[/math]. This value is easily calculated by two steps: (1) removing [math]\displaystyle{ i }[/math] from its original community, and (2) inserting [math]\displaystyle{ i }[/math] to the community of [math]\displaystyle{ j }[/math]. The two equations are quite similar, and the equation for step (2) is:[1]
[math]\displaystyle{ \Delta Q = \bigg[ \frac{\Sigma_{in} + 2k_{i,in}}{2m} - \bigg(\frac{\Sigma_{tot} + k_i}{2m}\bigg)^2 \bigg]-\bigg[\frac{\Sigma_{in}}{2m} - \bigg(\frac{\Sigma_{tot}}{2m}\bigg)^2-\bigg(\frac{k_i}{2m}\bigg)^2\bigg] }[/math]
Where [math]\displaystyle{ \Sigma_{in} }[/math] is sum of all the weights of the links inside the community [math]\displaystyle{ i }[/math] is moving into, [math]\displaystyle{ \Sigma_{tot} }[/math] is the sum of all the weights of the links to nodes in the community [math]\displaystyle{ i }[/math] is moving into, [math]\displaystyle{ k_i }[/math] is the weighted degree of [math]\displaystyle{ i }[/math], [math]\displaystyle{ k_{i,in} }[/math] is the sum of the weights of the links between [math]\displaystyle{ i }[/math] and other nodes in the community that [math]\displaystyle{ i }[/math] is moving into, and [math]\displaystyle{ m }[/math] is the sum of the weights of all links in the network. Then, once this value is calculated for all communities [math]\displaystyle{ i }[/math] is connected to, [math]\displaystyle{ i }[/math] is placed into the community that resulted in the greatest modularity increase. If no increase is possible, [math]\displaystyle{ i }[/math] remains in its original community. This process is applied repeatedly and sequentially to all nodes until no modularity increase can occur. Once this local maximum of modularity is hit, the first phase has ended.
In the second phase of the algorithm, it groups all of the nodes in the same community and builds a new network where nodes are the communities from the previous phase. Any links between nodes of the same community are now represented by self-loops on the new community node and links from multiple nodes in the same community to a node in a different community are represented by weighted edges between communities. Once the new network is created, the second phase has ended and the first phase can be re-applied to the new network.
Previous uses
- Twitter social Network (2.4 Million nodes, 38 million links) by Josep Pujol, Vijay Erramilli, and Pablo Rodriguez:[4] The authors explore the problem of partitioning Online Social Networks onto different machines.
- Mobile phone Network (4 Million nodes, 100 Million links) by Derek Greene, Donal Doyle, and Padraig Cunningham:[5] Community-tracking strategies for identifying dynamic communities of different dynamic social networks.
- Detecting species in network-based dynamical model.[6]
Comparison to other methods
When comparing modularity optimization methods, the two measures of importance are the speed and the resulting modularity value. A higher speed is better as it shows a method is more efficient than others and a higher modularity value is desirable as it points to having better-defined communities. The compared methods are, the algorithm of Clauset, Newman, and Moore,[3] Pons and Latapy,[7] and Wakita and Tsurumi.[8]
Karate | Arxiv | Internet | Web nd.edu | Phone | Web uk-2005 | Web WebBase 2001 | |
---|---|---|---|---|---|---|---|
Nodes/links | 34/77 | 9k/24k | 70k/351k | 325k/1M | 2.6M/6.3M | 39M/783M | 118M/1B |
Clauset, Newman, and Moore | .38/0s | .772/3.6s | .692/799s | .927/5034s | -/- | -/- | -/- |
Pons and Latapy | .42/0s | .757/3.3s | .729/575s | .895/6666s | -/- | -/- | -/- |
Wakita and Tsurumi | .42/0s | .761/0.7s | .667/62s | .898/248s | .56/464s | -/- | -/- |
Louvain Method | .42/0s | .813/0s | .781/1s | .935/3s | .769/134s | .979/738s | .984/152mn |
-/- in the table refers to a method that took over 24hrs to run. This table (from [1][10]) shows that the Louvain method outperforms many similar modularity optimization methods in both the modularity and the time categories.
Directed graphs
To deal with directed graphs [math]\displaystyle{ G=(V,A) }[/math], Leicht and Newman [11] designed the directed modularity.
[math]\displaystyle{ Q_d = \frac{1}{m}\sum\limits_{ij}\bigg[A_{ij} - \frac{k^{in}_i k^{out}_j}{m}\bigg]\delta (c_i,c_j), }[/math]
where [math]\displaystyle{ k^{in}_i }[/math] resp. [math]\displaystyle{ k^{out}_j }[/math] are the sum of the weights of the arcs that go to [math]\displaystyle{ i }[/math] resp. comes from [math]\displaystyle{ j }[/math].
Dugué and Perez used the Louvain algorithm to optimize the directed modularity [math]\displaystyle{ Q_d }[/math] to emphasize the performance of the algorithm and show the relevance of this modularity version on directed graphs[12].
See also
References
- ↑ 1.0 1.1 1.2 1.3 Blondel, Vincent D; Guillaume, Jean-Loup; Lambiotte, Renaud; Lefebvre, Etienne (9 October 2008). "Fast unfolding of communities in large networks". Journal of Statistical Mechanics: Theory and Experiment 2008 (10): P10008. doi:10.1088/1742-5468/2008/10/P10008. Bibcode: 2008JSMTE..10..008B.
- ↑ Lancichinetti, Andrea; Fortunato, Santo (2009-11-30). "Community detection algorithms: A comparative analysis". Physical Review E 80 (5): 056117. doi:10.1103/physreve.80.056117. ISSN 1539-3755. PMID 20365053. Bibcode: 2009PhRvE..80e6117L.
- ↑ 3.0 3.1 Clauset, Aaron; Newman, M. E. J.; Moore, Cristopher (2004-12-06). "Finding community structure in very large networks". Physical Review E 70 (6): 066111. doi:10.1103/PhysRevE.70.066111. ISSN 1539-3755. PMID 15697438. Bibcode: 2004PhRvE..70f6111C.
- ↑ Pujol, Josep M.; Erramilli, Vijay; Rodriguez, Pablo (2009). "Divide and Conquer: Partitioning Online Social Networks". arXiv:0905.4918v1 [cs.NI].
- ↑ Greene, Derek; Doyle, Dónal; Cunningham, Pádraig (May 2011). Tracking the Evolution of Communities in Dynamic Social Networks (PDF) (Technical report). University College Dublin. UCD-CSI-2011-06. Archived from the original (PDF) on 2013-05-12. Retrieved 2014-11-20.
- ↑ Markovitch, Omer; Krasnogor, Natalio (2018). "Predicting species emergence in simulated complex pre-biotic networks". PLOS ONE 13 (2): e0192871. doi:10.1371/journal.pone.0192871. PMID 29447212. Bibcode: 2018PLoSO..1392871M.
- ↑ Pons, Pascal; Latapy, Matthieu (2006). "Computing Communities in Large Networks Using Random Walks". Journal of Graph Algorithms and Applications 10 (2): 191–218. doi:10.7155/jgaa.00124. http://jgaa.info/accepted/2006/PonsLatapy2006.10.2.pdf.
- ↑ Wakita, Ken; Tsurumi, Toshiyuki (2007). "Finding Community Structure in Mega-scale Social Networks". arXiv:cs/0702048.
- ↑ Blondel, Vincent D.; Guillaume, Jean-Loup; Lambiotte, Renaud; Lefebvre, Etienne (2008). "Fast unfolding of communities in large networks". Journal of Statistical Mechanics: Theory and Experiment 2008 (10): P10008. doi:10.1088/1742-5468/2008/10/P10008. Bibcode: 2008JSMTE..10..008B.
- ↑ Aynaud, Thomas; Blondel, Vincent D.; Guillaume, Jean-Loup; Lambiotte, Renaud (2011). "Multilevel local optimization of modularity". Graph Partitioning (John Wiley & Sons): 315–345.
- ↑ Leicht, E. A.; Newman, M. E. J. (2008). "Community Structure in Directed Networks". Physical Review Letters 100 (11): 118703. doi:10.1103/PhysRevLett.100.118703. PMID 18517839. Bibcode: 2008PhRvL.100k8703L.
- ↑ Dugué, Nicolas; Perez, A. (2015). Directed Louvain: maximizing modularity in directed networks. (Report). https://hal.archives-ouvertes.fr/hal-01231784/document.
- "The Louvain method for community detection in large networks" Vincent Blondel http://perso.uclouvain.be/vincent.blondel/research/louvain.html
- "The directed Louvain version" https://github.com/nicolasdugue/DirectedLouvain/