Closeness centrality

From HandWiki
Revision as of 14:43, 6 February 2024 by Wincert (talk | contribs) (add)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

In a connected graph, closeness centrality (or closeness) of a node is a measure of centrality in a network, calculated as the reciprocal of the sum of the length of the shortest paths between the node and all other nodes in the graph. Thus, the more central a node is, the closer it is to all other nodes.

Distance and shortest path in a simple graph.
The number next to each node is the distance from that node to the square red node as measured by the length of the shortest path. The green edges illustrate one of the two shortest paths between the red square node and the red circle node. The closeness of the red square node is therefore 5/(1+1+1+2+2) = 5/7.

Closeness was defined by Bavelas (1950) as the reciprocal of the farness,[1][2] that is:

[math]\displaystyle{ C_B(x)= \frac{1}{\sum_y d(y,x)}, }[/math]

where [math]\displaystyle{ d(y,x) }[/math] is the distance (length of the shortest path) between vertices [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math]. This unnormalised version of closeness is sometimes known as status.[3][4][5] When speaking of closeness centrality, people usually refer to its normalized form which represents the average length of the shortest paths instead of their sum. It is generally given by the previous formula multiplied by [math]\displaystyle{ N-1 }[/math], where [math]\displaystyle{ N }[/math] is the number of nodes in the graph resulting in:

[math]\displaystyle{ C(x)= \frac{N-1}{\sum_y d(y,x)}. }[/math]

The normalization of closeness simplifies the comparison of nodes in graphs of different sizes. For large graphs, the minus one in the normalisation becomes inconsequential and it is often dropped.

As one of the oldest centrality measures, closeness is often given in general discussions of network centrality measures in introductory texts[6][7][8] or in articles comparing different centrality measures.[9][10][11][12] The values produced by many centrality measures can be highly correlated.[9][13][11] In particular, closeness and degree have been shown[12] to be related in many networks through an approximate relationship

[math]\displaystyle{ \frac{1}{C(x)} \approx \frac{-1}{\ln(z-1)} \ln(k_x) + \beta }[/math]

where [math]\displaystyle{ k_x }[/math] is the degree of vertex [math]\displaystyle{ x }[/math] while [math]\displaystyle{ z }[/math] and β are parameters found by fitting closeness and degree to this formula. The z parameter represents the branching factor, the average degree of nodes (excluding the root node and leaves) of the shortest-path trees used to approximate networks when demonstrating this relationship.[12] This is never an exact relationship but it captures a trend seen in many real-world networks.

Closeness is related to other length scales used in network science. For instance, the average shortest path length [math]\displaystyle{ \langle \ell \rangle }[/math], the average distance between vertices in a network, is simply the average of the inverse closeness values

[math]\displaystyle{ \langle \ell \rangle = \frac{1}{N} \sum_x \frac{1}{C(x)} = \frac{1}{N(N-1)} \sum_x \sum_y d(y,x) }[/math] .

Taking distances from or to all other nodes is irrelevant in undirected graphs, whereas it can produce totally different results in directed graphs (e.g. a website can have a high closeness centrality from outgoing links, but low closeness centrality from incoming links).

Applications

Closeness is used in many different contexts. In bibliometrics closeness has been used to look at the way academics choose their journals and bibliographies in different fields[14] or to measure the impact of an author on a field and their social capital.[15] When used to select potential leads in customer data, closeness has been seen to lead to a significant gain in the success rate.[16] The closeness of a city in an air transport network has been shown to be highly correlated with socio-economic indicators such as gross regional domestic product.[17] Closeness has also been applied to biological networks[5] where, for instance, this was used to identify more than 50% of the global regulators within the top 2% of the ranked genes[18] or essential genes were found to have higher closeness than nonessential genes in protein-interaction networks.[19] In a metabolic network the closeness of nodes can identify the most important metabolites.[20]

In disconnected graphs

When a graph is not strongly connected, Beauchamp introduced in 1965 the idea of using the sum of reciprocal of distances,[21] instead of the reciprocal of the sum of distances, with the convention [math]\displaystyle{ 1/\infty=0 }[/math]:

[math]\displaystyle{ H(x)= \sum_{y \neq x}\frac{n-1}{d(y,x)}. }[/math]

Beauchamp's modification follows the (much later in time) general principle proposed by Marchiori and Latora (2000)[22] that in graphs with infinite distances the harmonic mean behaves better than the arithmetic mean. Indeed, Bavelas's closeness can be described as the denormalized reciprocal of the arithmetic mean of distances, whereas Beauchamp's centrality is the reciprocal of the harmonic mean of distances.

This idea has resurfaced several time in the literature, often without the normalization factor [math]\displaystyle{ n-1 }[/math]: for undirected graphs under the name valued centrality by Dekker (2005)[23] and under the name harmonic centrality by Rochat (2009);[24] if was axiomatized by Garg (2009)[25] and proposed once again later by Opsahl (2010).[26] It was studied on general directed graphs by Boldi and Vigna (2014).[27] This idea is also quite similar to market potential proposed in Harris (1954)[28] which now often goes by the term market access.[29]

Variants

Dangalchev (2006),[30] in a work on network vulnerability proposes for undirected graphs a different definition:

[math]\displaystyle{ D(x)=\sum_{y\neq x}\frac{1}{2^{d(y,x)}}. }[/math]

This definition is used effectively for disconnected graphs and allows to create convenient formulae for graph operations. For example:

If graph [math]\displaystyle{ G_1 + G_2 }[/math] is created by linking node [math]\displaystyle{ p }[/math] of graph [math]\displaystyle{ G_1 }[/math] to node [math]\displaystyle{ q }[/math] of graph [math]\displaystyle{ G_2 }[/math] then the combined closeness is:

[math]\displaystyle{ D(G_1+G_2) = D(G_1) + D(G_2) + (1+D(p))(1+D(q)); }[/math]

if graph [math]\displaystyle{ G_1 + G_2 }[/math] is created by collapsing node [math]\displaystyle{ p }[/math] of graph [math]\displaystyle{ G_1 }[/math] and node [math]\displaystyle{ q }[/math] of graph [math]\displaystyle{ G_2 }[/math] into one node then the closeness is:[31]

[math]\displaystyle{ D(G_1+G_2) = D(G_1) + D(G_2) + 2 D(p) D(q). }[/math]

If graph [math]\displaystyle{ S(G) }[/math] is the shadow graph of graph [math]\displaystyle{ G }[/math], which has [math]\displaystyle{ n }[/math] nodes, then [math]\displaystyle{ S(G) }[/math] closeness is: [32]

[math]\displaystyle{ D(S(G)) = 4 D(G) + \frac{n}{2}. }[/math]

If graph [math]\displaystyle{ T(G) }[/math] is the thorn graph of graph [math]\displaystyle{ G }[/math], which has [math]\displaystyle{ n }[/math] nodes, then [math]\displaystyle{ T(G) }[/math] closeness is:[33]

[math]\displaystyle{ D(T(G)) = \frac{9}{4} D(G) + n. }[/math]

The natural generalization of this definition is:[34]

[math]\displaystyle{ D(x)=\sum_{y\neq x}\ {\alpha^{d(y,x)}}, }[/math]

where [math]\displaystyle{ \alpha }[/math] belongs to (0,1). As [math]\displaystyle{ \alpha }[/math] increases from 0 to 1, the generalized closeness changes from local characteristic (degree) to global (number of connected nodes).

The information centrality of Stephenson and Zelen (1989) is another closeness measure, which computes the harmonic mean of the resistance distances towards a vertex x, which is smaller if x has many paths of small resistance connecting it to other vertices.[35]

In the classic definition of the closeness centrality, the spread of information is modeled by the use of shortest paths. This model might not be the most realistic for all types of communication scenarios. Thus, related definitions have been discussed to measure closeness, like the random walk closeness centrality introduced by Noh and Rieger (2004). It measures the speed with which randomly walking messages reach a vertex from elsewhere in the graph.[36] Hierarchical closeness of Tran and Kwon (2014)[37] is an extended closeness centrality to deal still in another way with the limitation of closeness in graphs that are not strongly connected. The hierarchical closeness explicitly includes information about the range of other nodes that can be affected by the given node.

See also

References

  1. Bavelas, Alex (1950). "Communication Patterns in Task‐Oriented Groups". The Journal of the Acoustical Society of America 22 (6): 725–730. doi:10.1121/1.1906679. Bibcode1950ASAJ...22..725B. 
  2. Sabidussi, G (1966). "The centrality index of a graph". Psychometrika 31 (4): 581–603. doi:10.1007/bf02289527. PMID 5232444. 
  3. Harary, Frank (1959). "Status and Contrastatus". Sociometry 22 (1): 23–43. doi:10.2307/2785610. https://www.jstor.org/stable/2785610. 
  4. Hage, Per; Harary, Frank (1995). "Eccentricity and centrality in networks" (in en). Social Networks 17 (1): 57–63. doi:10.1016/0378-8733(94)00248-9. https://linkinghub.elsevier.com/retrieve/pii/0378873394002489. 
  5. 5.0 5.1 Wuchty, Stefan; Stadler, Peter F. (2003). "Centers of complex networks" (in en). Journal of Theoretical Biology 223 (1): 45–53. doi:10.1016/S0022-5193(03)00071-7. PMID 12782116. Bibcode2003JThBi.223...45W. https://linkinghub.elsevier.com/retrieve/pii/S0022519303000717. 
  6. Newman, M. E. J. (2010). Networks : an introduction. Oxford: Oxford University Press. ISBN 978-0-19-920665-0. OCLC 456837194. https://www.worldcat.org/oclc/456837194. 
  7. Latora, Vito (2017). Complex Networks : principles, methods and applications. Vincenzo Nicosia, Giovanni Russo. Cambridge, United Kingdom. ISBN 978-1-316-21600-2. OCLC 1004620089. https://www.worldcat.org/oclc/1004620089. 
  8. Cosia, Michele (2021). The Atlas for the Aspiring Network Scientist. ISBN 9788797282403. 
  9. 9.0 9.1 Bolland, John M (1988). "Sorting out centrality: An analysis of the performance of four centrality models in real and simulated networks". Social Networks 10 (3): 233–253. doi:10.1016/0378-8733(88)90014-7. 
  10. Brandes, Ulrik; Hildenbrand, Jan (2014). "Smallest graphs with distinct singleton centers" (in en). Network Science 2 (3): 416–418. doi:10.1017/nws.2014.25. ISSN 2050-1242. https://www.cambridge.org/core/product/identifier/S2050124214000253/type/journal_article. 
  11. 11.0 11.1 Schoch, David; Valente, Thomas W.; Brandes, Ulrik (2017). "Correlations among centrality indices and a class of uniquely ranked graphs" (in en). Social Networks 50: 46–54. doi:10.1016/j.socnet.2017.03.010. https://linkinghub.elsevier.com/retrieve/pii/S0378873316303690. 
  12. 12.0 12.1 12.2 Evans, Tim S.; Chen, Bingsheng (2022). "Linking the network centrality measures closeness and degree" (in en). Communications Physics 5 (1): 172. doi:10.1038/s42005-022-00949-5. ISSN 2399-3650. Bibcode2022CmPhy...5..172E. https://www.nature.com/articles/s42005-022-00949-5. 
  13. Valente, Thomas W.; Coronges, Kathryn; Lakon, Cynthia; Costenbader, Elizabeth (2008-01-01). "How Correlated Are Network Centrality Measures?". Connections (Toronto, Ont.) 28 (1): 16–26. ISSN 0226-1766. PMID 20505784. 
  14. Ni, Chaoqun; Sugimoto, Cassidy; Jiang, Jiepu (2011). Noyons, ED; Ngulube, Patrick; Leta, Jacqueline. eds. Degree, Closeness, and Betweenness: Application of group centrality measurements to explore macro-disciplinary evolution diachronically. pp. 605. https://www.issi-society.org/proceedings/issi_2011/ISSI_2011_Proceedings_Vol2_07.pdf. 
  15. Yan, Erjia; Ding, Ying (2009). "Applying centrality measures to impact analysis: A coauthorship network analysis" (in en). Journal of the American Society for Information Science and Technology 60 (10): 2107–2118. doi:10.1002/asi.21128. https://onlinelibrary.wiley.com/doi/10.1002/asi.21128. 
  16. Kiss, Christine; Bichler, Martin (2008). "Identification of influencers — Measuring influence in customer networks" (in en). Decision Support Systems 46 (1): 233–253. doi:10.1016/j.dss.2008.06.007. https://linkinghub.elsevier.com/retrieve/pii/S0167923608001231. 
  17. Wang, Jiaoe; Mo, Huihui; Wang, Fahui; Jin, Fengjun (2011). "Exploring the network structure and nodal centrality of China's air transport network: A complex network approach" (in en). Journal of Transport Geography 19 (4): 712–721. doi:10.1016/j.jtrangeo.2010.08.012. https://linkinghub.elsevier.com/retrieve/pii/S0966692310001328. 
  18. Koschützki, Dirk; Schreiber, Falk (2008). "Centrality Analysis Methods for Biological Networks and Their Application to Gene Regulatory Networks" (in en). Gene Regulation and Systems Biology 2: 193–201. doi:10.4137/GRSB.S702. ISSN 1177-6250. PMID 19787083. 
  19. Hahn, Matthew W.; Kern, Andrew D. (2005). "Comparative Genomics of Centrality and Essentiality in Three Eukaryotic Protein-Interaction Networks" (in en). Molecular Biology and Evolution 22 (4): 803–806. doi:10.1093/molbev/msi072. ISSN 1537-1719. PMID 15616139. https://academic.oup.com/mbe/article-lookup/doi/10.1093/molbev/msi072. 
  20. Ma, H.-W.; Zeng, A.-P. (2003-07-22). "The connectivity structure, giant strong component and centrality of metabolic networks" (in en). Bioinformatics 19 (11): 1423–1430. doi:10.1093/bioinformatics/btg177. ISSN 1367-4803. PMID 12874056. 
  21. Beauchamp, Murray (1965). "An Improved Index of Centrality". Behavioral Science 10 (2): 161–163. doi:10.1002/bs.3830100205. PMID 14284290. 
  22. Marchiori, Massimo; Latora, Vito (2000), "Harmony in the small-world", Physica A 285 (3–4): 539–546, doi:10.1016/s0378-4371(00)00311-3, Bibcode2000PhyA..285..539M 
  23. Dekker, Anthony (2005). "Conceptual Distance in Social Network Analysis". Journal of Social Structure 6 (3). http://www.cmu.edu/joss/content/articles/volume6/dekker/index.html. 
  24. Yannick Rochat. "Closeness centrality extended to unconnected graphs: The harmonic centrality index". Applications of Social Network Analysis, ASNA 2009. http://infoscience.epfl.ch/record/200525/files/%5bEN%5dASNA09.pdf. 
  25. Manuj Garg (2009), Axiomatic Foundations of Centrality in Networks, doi:10.2139/ssrn.1372441 
  26. Tore Opsahl (2010-03-20). "Closeness centrality in networks with disconnected components". http://toreopsahl.com/2010/03/20/closeness-centrality-in-networks-with-disconnected-components/. 
  27. Boldi, Paolo; Vigna, Sebastiano (2014), "Axioms for Centrality", Internet Mathematics 10 (3–4): 222–262, doi:10.1080/15427951.2013.865686 
  28. Harris, Chauncy D. (1954). "The Market as a Factor in the Localization of Industry in the United States". Annals of the Association of American Geographers 44 (4): 315–348. doi:10.2307/2561395. 
  29. Gutberlet, Theresa. Cheap Coal versus Market Access: The Role of Natural Resources and Demand in Germany's Industrialization. Working Paper. 2014.
  30. Ch, Dangalchev (2006). "Residual Closeness in Networks". Physica A 365 (2): 556. doi:10.1016/j.physa.2005.12.020. Bibcode2006PhyA..365..556D. 
  31. Ch, Dangalchev (2020). "Additional Closeness and Networks Growth". Fundamenta Informaticae 176 (1): 1–15. doi:10.3233/FI-2020-1960. 
  32. Dangalchev, Chavdar (2023). "Closeness of Some Graph Operations". arXiv:2308.14491 [cs.DM].
  33. Ch, Dangalchev (2018). "Residual Closeness of Generalized Thorn Graphs". Fundamenta Informaticae 162 (1): 1–15. doi:10.3233/FI-2018-1710. 
  34. Ch, Dangalchev (2011). "Residual Closeness and Generalized Closeness". International Journal of Foundations of Computer Science 22 (8): 1939–1948. doi:10.1142/s0129054111009136. 
  35. Stephenson, K. A.; Zelen, M. (1989). "Rethinking centrality: Methods and examples". Social Networks 11: 1–37. doi:10.1016/0378-8733(89)90016-6. 
  36. Noh, J. D.; Rieger, H. (2004). "Random Walks on Complex Networks". Phys. Rev. Lett. 92 (11): 118701. doi:10.1103/physrevlett.92.118701. PMID 15089179. Bibcode2004PhRvL..92k8701N. 
  37. Tran, Tien-Dzung; Kwon, Yung-Keun (2014). "Hierarchical closeness efficiently predicts disease genes in a directed signaling network". Computational Biology and Chemistry 53: 191–197. doi:10.1016/j.compbiolchem.2014.08.023. PMID 25462327.