Exponential random graph models
Network science | ||||
---|---|---|---|---|
Network types | ||||
Graphs | ||||
|
||||
Models | ||||
|
||||
| ||||
Exponential random graph models (ERGMs) are a family of statistical models for analyzing data about social and other networks.[1] Examples of networks examined using ERGM include knowledge networks,[2] organizational networks,[3] colleague networks,[4] social media networks, networks of scientific development,[5] and others.
Background
Many metrics exist to describe the structural features of an observed network such as the density, centrality, or assortativity.[6][7] However, these metrics describe the observed network which is only one instance of a large number of possible alternative networks. This set of alternative networks may have similar or dissimilar structural features. To support statistical inference on the processes influencing the formation of network structure, a statistical model should consider the set of all possible alternative networks weighted on their similarity to an observed network. However because network data is inherently relational, it violates the assumptions of independence and identical distribution of standard statistical models like linear regression.[8][9] Alternative statistical models should reflect the uncertainty associated with a given observation, permit inference about the relative frequency about network substructures of theoretical interest, disambiguating the influence of confounding processes, efficiently representing complex structures, and linking local-level processes to global-level properties.[10] Degree-preserving randomization, for example, is a specific way in which an observed network could be considered in terms of multiple alternative networks.
Definition
The Exponential family is a broad family of models for covering many types of data, not just networks. An ERGM is a model from this family which describes networks.
Formally a random graph [math]\displaystyle{ Y \in \mathcal{Y} }[/math] consists of a set of [math]\displaystyle{ n }[/math] nodes and [math]\displaystyle{ m }[/math] dyads (edges) [math]\displaystyle{ \{ Y_{ij}: i=1,\dots,n; j=1,\dots,n\} }[/math] where [math]\displaystyle{ Y_{ij}=1 }[/math] if the nodes [math]\displaystyle{ (i,j) }[/math] are connected and [math]\displaystyle{ Y_{ij}=0 }[/math] otherwise.
The basic assumption of these models is that the structure in an observed graph [math]\displaystyle{ y }[/math] can be explained by a given vector of sufficient statistics [math]\displaystyle{ s(y) }[/math] which are a function of the observed network and, in some cases, nodal attributes. This way, it is possible to describe any kind of dependence between the undyadic variables:
[math]\displaystyle{ P(Y = y | \theta) = \frac{\exp(\theta^{T} s(y))}{c(\theta)},\quad\forall y\in\mathcal{Y} }[/math]
where [math]\displaystyle{ \theta }[/math] is a vector of model parameters associated with [math]\displaystyle{ s(y) }[/math] and [math]\displaystyle{ c(\theta) = \sum_{y'\in\mathcal{Y}}\exp(\theta^{T} s(y')) }[/math] is a normalising constant.
These models represent a probability distribution on each possible network on [math]\displaystyle{ n }[/math] nodes. However, the size of the set of possible networks for an undirected network (simple graph) of size [math]\displaystyle{ n }[/math] is [math]\displaystyle{ 2^{n(n-1)/2} }[/math]. Because the number of possible networks in the set vastly outnumbers the number of parameters which can constrain the model, the ideal probability distribution is the one which maximizes the Gibbs entropy.[11]
References
- ↑ Harris, Jenine K (2014) (in English). An introduction to exponential random graph modeling. ISBN 9781452220802. OCLC 870698788.
- ↑ Brennecke, Julia; Rank, Olaf (2017-05-01). "The firm's knowledge network and the transfer of advice among corporate inventors—A multilevel network study". Research Policy 46 (4): 768–783. doi:10.1016/j.respol.2017.02.002. ISSN 0048-7333.
- ↑ Harris, Jenine K (2013). "Communication Ties Across the National Network of Local Health Departments" (in English). AMEPRE American Journal of Preventive Medicine 44 (3): 247–253. doi:10.1016/j.amepre.2012.10.028. ISSN 0749-3797. OCLC 4937103196. PMID 23415121.
- ↑ Brennecke, Julia (2019). "Dissonant Ties in Intraorganizational Networks: Why Individuals Seek Problem-Solving Assistance from Difficult Colleagues" (in English). AMJ Academy of Management Journal. ISSN 0001-4273. OCLC 8163488129.
- ↑ Harris, Jenine K; Luke, Douglas A; Shelton, Sarah C; Zuckerman, Rachael B (2009). "Forty Years of Secondhand Smoke Research. The Gap Between Discovery and Delivery". American Journal of Preventive Medicine 36 (6): 538–548. doi:10.1016/j.amepre.2009.01.039. ISSN 0749-3797. OCLC 6980180781. PMID 19372026.
- ↑ Wasserman, Stanley; Faust, Katherine (1994). Social Network Analysis: Methods and Applications. ISBN 978-0-521-38707-1.
- ↑ Newman, M.E.J. (2003). "The Structure and Function of Complex Networks". SIAM Review 45 (2): 167–256. doi:10.1137/S003614450342480. Bibcode: 2003SIAMR..45..167N.
- ↑ Contractor, Noshir; Wasserman, Stanley; Faust, Katherine (2006). "Testing Multitheoretical, Multilevel Hypotheses About Organizational Networks: An Analytic Framework and Empirical Example". Academy of Management Review 31 (3): 681–703. doi:10.5465/AMR.2006.21318925. https://pdfs.semanticscholar.org/aff3/68453aa6258c34d5aaeba0fd4a49a864b889.pdf.
- ↑ Harris, Jenine K (2014) (in English). An introduction to exponential random graph modeling. ISBN 9781452220802. OCLC 870698788.
- ↑ Robins, G.; Pattison, P.; Kalish, Y.; Lusher, D. (2007). "An introduction to exponential random graph models for social networks". Social Networks 29 (2): 173–191. doi:10.1016/j.socnet.2006.08.002.
- ↑ Newman, M.E.J. (2010-03-25). "Other Network Models". Networks. pp. 565–585. ISBN 978-0-19-920665-0.
Further reading
- Byshkin, M.; Stivala, A.; Mira, A.; Robins, G.; Lomi, A. (2018). "Fast Maximum Likelihood Estimation via Equilibrium Expectation for Large Network Data". Scientific Reports 8 (1): 11509. doi:10.1038/s41598-018-29725-8. PMID 30065311. Bibcode: 2018NatSR...811509B.
- Caimo, A.; Friel, N (2011). "Bayesian inference for exponential random graph models". Social Networks 33: 41–55. doi:10.1016/j.socnet.2010.09.004.
- Erdős, P.; Rényi, A (1959). "On random graphs". Publicationes Mathematicae 6: 290–297.
- Fienberg, S. E.; Wasserman, S. (1981). "Discussion of An Exponential Family of Probability Distributions for Directed Graphs by Holland and Leinhardt". Journal of the American Statistical Association 76 (373): 54–57. doi:10.1080/01621459.1981.10477600.
- Frank, O.; Strauss, D (1986). "Markov Graphs". Journal of the American Statistical Association 81 (395): 832–842. doi:10.2307/2289017.
- Handcock, M. S.; Hunter, D. R.; Butts, C. T.; Goodreau, S. M.; Morris, M. (2008). "statnet: Software Tools for the Representation, Visualization, Analysis and Simulation of Network Data". Journal of Statistical Software 24 (1): 1–11. doi:10.18637/jss.v024.i01. PMID 18618019.
- Harris, Jenine K (2014). An introduction to exponential random graph modeling. Sage.[1]
- Hunter, D. R.; Goodreau, S. M.; Handcock, M. S. (2008). "Goodness of Fit of Social Network Models". Journal of the American Statistical Association 103 (481): 248–258. doi:10.1198/016214507000000446.
- Hunter, D. R; Handcock, M. S. (2006). "Inference in curved exponential family models for networks". Journal of Computational and Graphical Statistics 15 (3): 565–583. doi:10.1198/106186006X133069.
- Hunter, D. R.; Handcock, M. S.; Butts, C. T.; Goodreau, S. M.; Morris, M. (2008). "ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks". Journal of Statistical Software 24 (3): 1–29. doi:10.18637/jss.v024.i03.
- Jin, I.H.; Liang, F. (2012). "Fitting social networks models using varying truncation stochastic approximation MCMC algorithm". Journal of Computational and Graphical Statistics 22 (4): 927–952. doi:10.1080/10618600.2012.680851.
- Koskinen, J. H.; Robins, G. L.; Pattison, P. E. (2010). "Analysing exponential random graph (p-star) models with missing data using Bayesian data augmentation". Statistical Methodology 7 (3): 366–384. doi:10.1016/j.stamet.2009.09.007. https://www.research.manchester.ac.uk/portal/en/publications/analysing-exponential-random-graph-pstar-models-with-missing-data-using-bayesian-data-augmentation(36299696-ce3a-4b8d-9048-295da632fc49).html.
- Morris, M.; Handcock, M. S.; Hunter, D. R. (2008). "Specification of Exponential-Family Random Graph Models: Terms and Computational Aspects". Journal of Statistical Software 24 (4): 1548–7660. doi:10.18637/jss.v024.i04. PMID 18650964.
- Rinaldo, A.; Fienberg, S. E.; Zhou, Y. (2009). "On the geometry of descrete exponential random families with application to exponential random graph models". Electronic Journal of Statistics 3: 446–484. doi:10.1214/08-EJS350.
- Robins, G.; Snijders, T.; Wang, P.; Handcock, M.; Pattison, P (2007). "Recent developments in exponential random graph (p*) models for social networks". Social Networks 29 (2): 192–215. doi:10.1016/j.socnet.2006.08.003. https://pure.rug.nl/ws/files/10249122/RobinsGL-Recent-2007.pdf.
- Schweinberger, Michael (2011). "Instability, sensitivity, and degeneracy of discrete exponential families". Journal of the American Statistical Association 106 (496): 1361–1370. doi:10.1198/jasa.2011.tm10747. PMID 22844170.
- Schweinberger, Michael; Handcock, Mark (2015). "Local dependence in random graph models: characterization, properties and statistical inference". Journal of the Royal Statistical Society, Series B 77 (3): 647–676. doi:10.1111/rssb.12081. PMID 26560142.
- Schweinberger, Michael; Stewart, Jonathan (2020). "Concentration and consistency results for canonical and curved exponential-family models of random graphs". The Annals of Statistics 48 (1): 374–396. doi:10.1214/19-AOS1810.
- Snijders, T. A. B. (2002). "Markov chain Monte Carlo estimation of exponential random graph models". Journal of Social Structure 3. http://www.cmu.edu/joss/content/articles/volume3/Snijders.pdf.
- Snijders, T. A. B.; Pattison, P. E.; Robins, G. L.; Handcock, M. S. (2006). "New specifications for exponential random graph models". Sociological Methodology 36: 99–153. doi:10.1111/j.1467-9531.2006.00176.x.
- Strauss, D; Ikeda, M (1990). "Pseudolikelihood estimation for social networks". Journal of the American Statistical Association 5 (409): 204–212. doi:10.2307/2289546. https://zenodo.org/record/1235101.
- van Duijn, M. A.; Snijders, T. A. B.; Zijlstra, B. H. (2004). "p2: a random effects model with covariates for directed graphs". Statistica Neerlandica 58 (2): 234–254. doi:10.1046/j.0039-0402.2003.00258.x.
- van Duijn, M. A. J.; Gile, K. J.; Handcock, M. S. (2009). "A framework for the comparison of maximum pseudo-likelihood and maximum likelihood estimation of exponential family random graph models". Social Networks 31 (1): 52–62. doi:10.1016/j.socnet.2008.10.003. PMID 23170041.
- ↑ Harris, Jenine K (2014) (in English). An introduction to exponential random graph modeling. ISBN 9781452220802. OCLC 870698788.