Webgraph

From HandWiki

A webgraph is a set of directed links between pages of the World Wide Web. A graph, in general, consists of several vertices, some pairs connected by edges. In a directed graph, edges are directed lines or arcs. The webgraph is a directed graph, whose vertices correspond to the pages of the WWW, and a directed edge connects page X to page Y if there exists a hyperlink on page X, referring to page Y.[1]

Properties

  • The degree distribution of the webgraph strongly differs from the degree distribution of the classical random graph model, the Erdős–Rényi model:[2] in the Erdős–Rényi model, there are very few large degree nodes, relative to the webgraph's degree distribution. The precise distribution is unclear,[3] however: it is relatively well described by a lognormal distribution, as well as the Barabási–Albert model for power laws.[4][5]
  • The webgraph is an example of a scale-free network.

Applications

The webgraph is used for:

  • computing the PageRank[6] of the world wide web's pages;
  • computing the personalized PageRank;[7]
  • detecting webpages of similar topics, through graph-theoretical properties only, like co-citation;[8]
  • and identifying hubs and authorities in the web for HITS algorithm.

References

  1. Manning, Christopher D.; Raghavan, Prabhakar; Schütze, Hinrich (2008). "The web graph". Introduction to Information Retrieval. Cambridge University Press. https://nlp.stanford.edu/IR-book/html/htmledition/the-web-graph-1.html. 
  2. Erdős, Paul; Rényi, Alfréd (1960). "On the evolution of random graphs". Publication of the Mathematical Institute of the Hungarian Academy of Sciences 5: 17-61. https://static.renyi.hu/~p_erdos/1960-10.pdf. 
  3. Meusel, R.; Vigna, S.; Lehmberg, O.; Bizer, C. (2015). "The Graph Structure in the Web - Analyzed on Different Aggregation Levels". Journal of Web Science 1 (1): 33–47. doi:10.1561/106.00000003. https://air.unimi.it/bitstream/2434/372411/2/Vigna_JWebScience_2015.pdf. 
  4. Clauset, A.; Shalizi, C. R.; Newman, M. E. J. (2009). "Power-law distributions in empirical data". SIAM Rev. 51 (4): 661–703. doi:10.1137/070710111. Bibcode2009SIAMR..51..661C. 
  5. Barabási, Albert-László; Albert, Réka (October 1999). "Emergence of scaling in random networks". Science 286 (5439): 509–512. doi:10.1126/science.286.5439.509. PMID 10521342. Bibcode1999Sci...286..509B. http://www.nd.edu/~networks/Publication%20Categories/03%20Journal%20Articles/Physics/EmergenceRandom_Science%20286,%20509-512%20(1999).pdf. .
  6. Brin, Sergey; Page, Lawrence (1998-04-01). "The anatomy of a large-scale hypertextual Web search engine". Computer Networks and ISDN Systems. Proceedings of the Seventh International World Wide Web Conference 30 (1): 107–117. doi:10.1016/S0169-7552(98)00110-X. ISSN 0169-7552. http://infolab.stanford.edu/~backrub/google.html. 
  7. Glen Jeh and Jennifer Widom. 2003. Scaling personalized web search. In Proceedings of the 12th international conference on World Wide Web (WWW '03). ACM, New York, NY, USA, 271–279. doi:10.1145/775152.775191
  8. Kumar, Ravi; Raghavan, Prabhakar; Rajagopalan, Sridhar; Tomkins, Andrew (1999). "Trawling the Web for emerging cyber-communities". Computer Networks 31 (11–16): 1481–1493. doi:10.1016/S1389-1286(99)00040-7.