Data Commons

From HandWiki
Short description: Knowledge repository integrating open datasets
Image: Results for a query in Data Commons
Founder(s): Ramanathan V. Guha
Key people: Prem Ramaswami (Head of Data Commons)
Parent: Google
Website: datacommons.org
Launched: May 2018

Data Commons is an open-source platform[1] created by Google[2] that provides an open knowledge graph, combining economic, scientific and other public datasets into a unified view.[3] Ramanathan V. Guha, a creator of web standards including RDF,[4] RSS, and Schema.org,[5] founded the project,[6] which is now led by Prem Ramaswami.[7]

The Data Commons website was launched in May 2018. Its initial dataset consisted of fact-checking data published in the Schema.org "ClaimReview" format by several fact checkers from the International Fact-Checking Network. In 2020 the service improved its coverage of non-US datasets and expanded its bioinformatics and COVID-19 data.[8] In 2023 it relaunched with a natural-language front end powered by a large language model.[2] That year it also became the back end of the United Nations data portal for Sustainable Development Goals data.[9]

Features

Data Commons places more emphasis on statistical data than is common for linked data and knowledge graph initiatives. Its categories include geographical, demographic, weather and real estate data.[3] It describes states, congressional districts and cities in the United States, as well as biological specimens, power plants, and elements of the human genome drawn from the Encyclopedia of DNA Elements (ENCODE) project.[10] It represents data as semantic triples, each of which can have its own provenance.[3] Its focus is the entity-oriented integration of statistical observations from a variety of public datasets. In addition to supporting a subset of the W3C SPARQL query language,[11] its APIs[12] include tools oriented toward data science, statistics and data visualization, such as a pandas DataFrame interface.
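To make the idea of triples with per-fact provenance and entity-oriented statistical observations more concrete, here is a minimal Python sketch. It is not the Data Commons API or schema. The `Triple` and `Observation` classes, the entity IDs such as `geo/A`, the `population` variable and the source labels are all hypothetical. The sketch only shows facts stored as triples tagged with their source, and observations grouped by the entity they describe.

```python
from dataclasses import dataclass

# Hypothetical, simplified model (not the Data Commons schema or API).
# Each fact is a (subject, predicate, object) triple tagged with the
# dataset it came from, i.e. its provenance.
@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    provenance: str

# A hypothetical statistical observation: one value of one variable,
# for one entity, on one date, from one source.
@dataclass(frozen=True)
class Observation:
    entity: str
    variable: str
    date: str
    value: float
    provenance: str

triples = [
    Triple("geo/A", "typeOf", "State", "source-1"),
    Triple("geo/A", "name", "Example State", "source-1"),
    Triple("geo/B", "typeOf", "City", "source-2"),
    Triple("geo/B", "containedInPlace", "geo/A", "source-2"),
]

observations = [
    Observation("geo/A", "population", "2019", 1000.0, "source-1"),
    Observation("geo/A", "population", "2020", 1050.0, "source-3"),
    Observation("geo/B", "population", "2020", 300.0, "source-2"),
]

def describe(entity):
    """Return (predicate, object, provenance) for every triple about an entity."""
    return [(t.predicate, t.obj, t.provenance) for t in triples if t.subject == entity]

def time_series(entity, variable):
    """Return (date, value, provenance) rows for one entity and variable, oldest first."""
    rows = [
        (o.date, o.value, o.provenance)
        for o in observations
        if o.entity == entity and o.variable == variable
    ]
    return sorted(rows)

print(time_series("geo/A", "population"))
# [('2019', 1000.0, 'source-1'), ('2020', 1050.0, 'source-3')]
```

The two `geo/A` values come from different sources, and each row keeps its source label. This is the practical benefit of attaching provenance to individual facts rather than to a whole dataset.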

Data Commons is integrative, meaning that it does not provide a hosting platform for different datasets, but rather attempts to consolidate much of the information provided by the datasets into a single data graph.
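The difference between hosting datasets side by side and integrating them into one graph can be illustrated with a small Python sketch. It rests on stated assumptions. Two hypothetical datasets describe the same places under different column names, and both use a shared entity identifier. A hand-written mapping (`MAPPINGS`, also hypothetical) translates their columns into one common vocabulary. The rows are then folded into a single graph keyed by entity instead of being kept as separate tables.

```python
from collections import defaultdict

# Two hypothetical source datasets describing the same places with their
# own column names; both are assumed to share an entity identifier.
dataset_a = [
    {"place_id": "geo/A", "median_age": 38.5},
    {"place_id": "geo/B", "median_age": 41.0},
]
dataset_b = [
    {"id": "geo/A", "unemployment_rate": 4.2},
]

# Hypothetical mapping from each dataset's columns onto a shared vocabulary.
MAPPINGS = {
    "dataset_a": {"entity": "place_id", "properties": {"median_age": "medianAge"}},
    "dataset_b": {"entity": "id", "properties": {"unemployment_rate": "unemploymentRate"}},
}

def integrate(sources):
    """Fold rows from several datasets into one graph: entity -> property -> (value, source)."""
    graph = defaultdict(dict)
    for name, rows in sources.items():
        mapping = MAPPINGS[name]
        for row in rows:
            entity = row[mapping["entity"]]
            for column, prop in mapping["properties"].items():
                if column in row:
                    # A later source silently overwrites an earlier value here.
                    graph[entity][prop] = (row[column], name)
    return dict(graph)

graph = integrate({"dataset_a": dataset_a, "dataset_b": dataset_b})
print(graph["geo/A"])
# {'medianAge': (38.5, 'dataset_a'), 'unemploymentRate': (4.2, 'dataset_b')}
```

After integration, the `geo/A` node carries properties from both datasets. In this sketch a later source simply overwrites an earlier value for the same property. A real integration pipeline would also need to reconcile conflicting values and align entity identifiers across sources, which the sketch does not show.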

Technology

Data Commons is built on a graph data model. The graph can be accessed through a browser interface and several APIs.[3][10] It is extended by loading data, typically CSV files accompanied by templates written in MCF (Meta Content Framework).[13] The graph can also be queried in natural language through Google Search.[14] The vocabulary used to define the datacommons.org graph is based on Schema.org.[3] In particular, the terms StatisticalPopulation[15] and Observation[16] were proposed to Schema.org to support use cases like those of Data Commons.[17]
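The loading step can be illustrated with a small Python parser for an MCF-like text format. It is a sketch under a simplifying assumption and does not reproduce the exact MCF syntax or the project's own tooling. The assumed format has one node per blank-line-separated block. Each block starts with `Node: <id>`, and each following line is `<property>: <value>`. The parser turns the blocks into (subject, property, value) triples. The node IDs in the sample are hypothetical, and the CSV and template side of loading is not shown.

```python
import re

def parse_mcf_like(text):
    """Parse blank-line-separated 'property: value' blocks into triples.

    Assumed, simplified syntax (not the full MCF grammar): the first line of
    each block is 'Node: <id>' and every later line is '<property>: <value>'.
    Surrounding double quotes on a value are stripped; comments and
    multi-valued properties are not handled.
    """
    triples = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        key, _, node_id = lines[0].partition(":")
        if key.strip() != "Node":
            raise ValueError(f"block must start with 'Node:', got {lines[0]!r}")
        subject = node_id.strip()
        for line in lines[1:]:
            # Split on the first colon only, so values such as 'prefix:id' survive.
            prop, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"expected 'property: value', got {line!r}")
            triples.append((subject, prop.strip(), value.strip().strip('"')))
    return triples

sample = """
Node: example/StateA
typeOf: State
name: "Example State"

Node: example/CityB
typeOf: City
containedInPlace: example/StateA
"""

for triple in parse_mcf_like(sample):
    print(triple)
```

The second block links `example/CityB` to `example/StateA` through `containedInPlace`. Once parsed, that reference becomes an edge between two nodes of the graph rather than a plain string.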

Software from the project is available on GitHub under the Apache License 2.0.[18]

References

  1. "Custom Data Commons". https://docs.datacommons.org/custom_dc/. 
  2. "Data Commons is using AI to make the world's public data more accessible and helpful". Google. 13 September 2023. https://blog.google/technology/ai/google-data-commons-ai/. 
  3. Fensel, Dieter; Şimşek, Umutcan; Angele, Kevin; Huaman, Elwin; Kärle, Elias; Panasiuk, Oleksandra; Toma, Ioan; Umbrich, Jürgen et al. (2020). "Introduction: What Is a Knowledge Graph?". Knowledge Graphs. Cham: Springer International Publishing. pp. 1–10. doi:10.1007/978-3-030-37439-6_1. ISBN 978-3-030-37438-9. http://link.springer.com/10.1007/978-3-030-37439-6_1. Retrieved 2020-10-16. 
  4. Guns, Raf (2013). "Tracing the origins of the semantic web". Journal of the American Society for Information Science and Technology 64 (10): 2173–2181. doi:10.1002/asi.22907. 
  5. Funke, Daniel (7 December 2017). "This website helps you find related fact checks - and it was built by a 17-year-old". Poynter. https://www.poynter.org/fact-checking/2017/this-website-helps-you-find-related-fact-checks-%C2%97-and-it-was-built-by-a-17-year-old/. 
  6. Guha, Ramanathan V. (15 October 2020). "Data Commons, now accessible on Google Search". https://docs.datacommons.org/2020/10/15/search_launch.html. 
  7. O'Donnell, James (12 September 2024). "Google's new tool lets large language models fact-check their responses". MIT Technology Review. https://www.technologyreview.com/2024/09/12/1103926/googles-new-tool-lets-large-language-models-fact-check-their-responses/. 
  8. Ramasubramanian, Sowmya (21 September 2020). "Google's open source data to study impact of COVID-19". The Hindu. https://www.thehindu.com/sci-tech/technology/googles-open-source-data-to-study-impact-of-covid-19/article32660642.ece. 
  9. Manyika, James (19 September 2023). "Using data and AI to track progress toward the UN Global Goals". Google. https://blog.google/technology/ai/google-ai-data-un-global-goals/. 
  10. (Citation text unavailable.)
  11. "Query the Data Commons Knowledge Graph using SPARQL". https://docs.datacommons.org/api/python/query.html. 
  12. "Overview". https://docs.datacommons.org/api/. 
  13. "Contributing to Data Commons – Adding datasets". Data Commons. https://docs.datacommons.org/contributing/adding_datasets.html. 
  14. Guha, Ramanathan V. (15 October 2020). "Data Commons, now accessible on Google Search". https://docs.datacommons.org/2020/10/15/search_launch.html. 
  15. "StatisticalPopulation type at Schema.org". https://schema.org/StatisticalPopulation. 
  16. "Observation type at Schema.org". https://schema.org/Observation. 
  17. "Proposal for representing Aggregate Statistical Data". 25 June 2019. https://github.com/schemaorg/schemaorg/issues/2291. 
  18. "datacommons.org GitHub". https://github.com/datacommonsorg/.