Relationship extraction

From HandWiki

A relationship extraction task requires the detection and classification of semantic relationship mentions within a set of artifacts, typically from text or XML documents. The task is very similar to that of information extraction (IE), but IE additionally requires the removal of repeated relations (disambiguation) and generally refers to the extraction of many different relationships.

Concept and applications

The concept of relationship extraction was first introduced at the 7th Message Understanding Conference (MUC-7) in 1998.[1] Relationship extraction involves identifying relations between entities and usually focuses on extracting binary relations.[2] Application domains where relationship extraction is useful include gene-disease relationships[3] and protein-protein interactions.[4]

Current relationship extraction studies typically use machine learning techniques that treat relationship extraction as a classification problem.[1] Never-Ending Language Learning (NELL) is a semantic machine learning system, developed by a research team at Carnegie Mellon University, that extracts relationships from the open web.
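To make the classification framing concrete, here is a minimal sketch that is not taken from the cited studies: each candidate pair of entity mentions in a sentence is reduced to a feature set (here, simply the words between the two mentions), and a classifier assigns the pair a relation label. The choice of a multinomial Naive Bayes classifier, the labels `capital_of` and `born_in`, and the toy training sentences are all illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def between_words(tokens, head_idx, tail_idx):
    """Lowercased tokens strictly between two single-token entity mentions."""
    lo, hi = sorted((head_idx, tail_idx))
    return [tok.lower() for tok in tokens[lo + 1:hi]]


class NaiveBayesRelationClassifier:
    """Multinomial Naive Bayes over 'words between the entities' features.

    Illustrative stand-in only; not the method of any cited study.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant
        self.label_counts = Counter()             # label -> number of training pairs
        self.word_counts = defaultdict(Counter)   # label -> word frequencies
        self.vocab = set()

    def fit(self, examples):
        # examples: iterable of (tokens, head_idx, tail_idx, label)
        for tokens, head, tail, label in examples:
            words = between_words(tokens, head, tail)
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, tokens, head, tail):
        words = between_words(tokens, head, tail)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / total)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for word in words:
                score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data: (tokens, head index, tail index, relation label)
train = [
    ("Paris is the capital of France".split(), 0, 5, "capital_of"),
    ("Berlin is the capital of Germany".split(), 0, 5, "capital_of"),
    ("Einstein was born in Ulm".split(), 0, 4, "born_in"),
    ("Mozart was born in Salzburg".split(), 0, 4, "born_in"),
]
clf = NaiveBayesRelationClassifier().fit(train)
print(clf.predict("Rome is the capital of Italy".split(), 0, 5))  # capital_of
print(clf.predict("Bach was born in Eisenach".split(), 0, 4))     # born_in
```

Practical systems typically use richer features (entity types, syntactic paths) or neural encoders; the words-between heuristic only shows where the classifier sits in the pipeline.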

Approaches

There are several methods for extracting relationships, including text-based relationship extraction. These methods either rely on pretrained relationship structure information or learn that structure in order to reveal relationships.[5] Another approach uses domain ontologies.[6][7] A further approach involves the visual detection of meaningful relationships in the parametric values of objects listed in a data table; the objects shift positions as the table is permuted automatically under the control of the software user.

Structured resources such as semantic lexicons (e.g. WordNet, UMLS) and domain ontologies (e.g. the Gene Ontology) suffer from poor coverage, rarity and high development cost. These limitations have given rise to new approaches based on broad, dynamic background knowledge on the Web. For instance, the ARCHILES technique[8] uses only Wikipedia and search engine page counts to acquire coarse-grained relations for constructing lightweight ontologies.
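As an illustration of text-based extraction that relies on a predefined relationship structure, the sketch below applies a single lexico-syntactic pattern in the style of Hearst patterns ("X such as Y, Z and W") to extract is-a relations. This is a swapped-in classic technique, not ARCHILES or any method from the cited works; the pattern, the `is_a` label and the example sentence are assumptions.

```python
import re

# One hypothetical Hearst-style pattern:
#   "<hypernym> such as <hyponym>(, <hyponym>)* [,] and|or <hyponym>"
# Only single-word (\w+) terms are handled in this sketch.
SEPARATOR = r"(?:,?\s+(?:and|or)\s+|,\s+)"
SUCH_AS = re.compile(r"(\w+)\s+such\s+as\s+(\w+(?:" + SEPARATOR + r"\w+)*)")


def extract_is_a(text):
    """Return (hyponym, 'is_a', hypernym) triples matched by the pattern."""
    triples = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        # Split the matched list on commas and "and"/"or" connectives.
        for hyponym in re.split(SEPARATOR, match.group(2)):
            triples.append((hyponym, "is_a", hypernym))
    return triples


print(extract_is_a("Mutations in genes such as BRCA1, TP53, and EGFR are studied."))
# [('BRCA1', 'is_a', 'genes'), ('TP53', 'is_a', 'genes'), ('EGFR', 'is_a', 'genes')]
```

Hand-written patterns like this are precise but have low coverage, which is part of why the learned and Web-based approaches above were developed.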

Extracted relationships can be represented using a variety of formalisms and languages. One such representation language for data on the Web is RDF (Resource Description Framework).
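As a minimal sketch, assuming relations have already been extracted as (subject, predicate, object) tuples, they can be serialized as RDF in the line-based N-Triples format. The namespace `http://example.org/` and the predicate name `bornIn` are placeholders chosen for illustration, not part of any standard vocabulary.

```python
from urllib.parse import quote

# Placeholder namespace; a real application would typically reuse an established vocabulary.
BASE = "http://example.org/"


def to_iri(name):
    """Turn a label into an absolute IRI in N-Triples angle-bracket syntax."""
    # Spaces become underscores; other unsafe characters are percent-encoded.
    return "<" + BASE + quote(name.replace(" ", "_"), safe="_") + ">"


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one N-Triples statement per line."""
    return "\n".join(f"{to_iri(s)} {to_iri(p)} {to_iri(o)} ." for s, p, o in triples)


print(to_ntriples([("Marie Curie", "bornIn", "Warsaw")]))
# <http://example.org/Marie_Curie> <http://example.org/bornIn> <http://example.org/Warsaw> .
```

Objects here are always IRIs; a fuller serializer would also emit literal values (strings, dates) where the relation calls for them.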

More recently, end-to-end systems that jointly learn to extract entity mentions and their semantic relations have been proposed and show strong potential for high performance.[9]
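End-to-end models differ in architecture; the system in [9], for example, uses deep biaffine attention for the relation component. The sketch below shows only a generic biaffine relation scorer of the form s_r(head, tail) = head^T U_r tail + w_r · [head; tail] + b_r, applied to given entity representations. It performs neither entity extraction nor training, and the hand-set parameters and label names are hypothetical, not values from that paper.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def biaffine_score(h_head, h_tail, U, w, b):
    """s = h_head^T U h_tail + w . [h_head; h_tail] + b for one relation label.

    h_head, h_tail: d-dim vectors; U: d x d matrix (list of rows);
    w: 2d-dim vector; b: scalar bias.
    """
    bilinear = sum(x * dot(row, h_tail) for x, row in zip(h_head, U))
    linear = dot(w, h_head + h_tail)  # list concatenation gives [h_head; h_tail]
    return bilinear + linear + b


def softmax(scores):
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}


def classify_pair(h_head, h_tail, params):
    """Score every relation label for one entity pair; return (best label, probabilities)."""
    scores = {label: biaffine_score(h_head, h_tail, *p) for label, p in params.items()}
    probs = softmax(scores)
    return max(probs, key=probs.get), probs


# Hypothetical hand-set parameters for d = 2 (a trained model would learn these).
params = {
    "capital_of": ([[0.0, 2.0], [0.0, 0.0]], [0.0, 0.0, 0.0, 0.0], 0.0),
    "no_relation": ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0, 0.0, 0.0], 0.5),
}
label, probs = classify_pair([1.0, 0.0], [0.0, 1.0], params)
print(label)  # capital_of
```

The bilinear term lets the score depend on interactions between head and tail features, while the linear term and bias capture label preferences that depend on each entity alone.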

Most reported systems have been demonstrated on English datasets. However, data and systems have also been described for other languages, e.g., Russian[10] and Vietnamese.[11]

Datasets

Researchers have constructed multiple datasets for benchmarking relationship extraction methods.[12] One such dataset is DocRED, a document-level relationship extraction dataset released in 2019 that uses relations from Wikidata and text from the English Wikipedia.[12] The dataset has been used by other researchers, and a prediction competition has been set up on CodaLab.[13][14]
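Benchmarks such as DocRED are scored by comparing predicted relation facts with gold annotations. The sketch below computes micro-averaged precision, recall and F1 over such facts. It is not the official DocRED or CodaLab scorer, and the record layout it reads (a `title` and a `labels` list with `h`, `t` and `r` keys) is an assumption modeled on DocRED-style JSON; the example document is hypothetical.

```python
def facts_from_doc(doc):
    """Collect (title, head entity index, tail entity index, relation ID) facts.

    Assumes a DocRED-like record with a "title" and a "labels" list whose items
    have "h", "t" and "r" keys; treat this layout as an assumption.
    """
    return {(doc["title"], lab["h"], lab["t"], lab["r"]) for lab in doc["labels"]}


def relation_f1(gold, predicted):
    """Micro-averaged precision, recall and F1 over sets of relation facts."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold document; P17 and P131 are Wikidata property IDs.
gold_doc = {
    "title": "Example",
    "labels": [{"h": 0, "t": 1, "r": "P17"}, {"h": 2, "t": 1, "r": "P131"}],
}
predicted = {("Example", 0, 1, "P17"), ("Example", 2, 0, "P17")}
print(relation_f1(facts_from_doc(gold_doc), predicted))  # (0.5, 0.5, 0.5)
```

Counting facts across all documents before dividing (micro-averaging) weights each fact equally, so documents with many relations influence the score more than short ones.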

References

  1. Ning, Huansheng (2019). Cyberspace Data and Intelligence, and Cyber-Living, Syndrome, and Health: International 2019 Cyberspace Congress, CyberDI and CyberLife, Beijing, China, December 16–18, 2019, Proceedings, Part II. Singapore: Springer Nature. p. 260. ISBN 978-981-15-1924-6.
  2. Nasar, Zara; Jaffry, Syed Waqar; Malik, Muhammad Kamran (2021-02-11). "Named Entity Recognition and Relation Extraction: State-of-the-Art". ACM Computing Surveys 54 (1): 20:1–20:39. doi:10.1145/3445965. ISSN 0360-0300.
  3. Chun, Hong-Woo; Tsuruoka, Yoshimasa; Kim, Jin-Dong; Shiba, Rie; Nagata, Naoki; Hishiki, Teruyoshi; Tsujii, Jun-ichi (2006). "Pacific Symposium on Biocomputing".
  4. Huang, Minlie; Zhu, Xiaoyan; Hao, Yu; Payan, Donald G.; Qu, Kunbin; Li, Ming (2004). "Discovering patterns to extract protein-protein interactions from full texts". Bioinformatics 20 (18): 3604–3612. doi:10.1093/bioinformatics/bth451. PMID 15284092.
  5. Tickoo, Omesh; Iyer, Ravi (2016). Making Sense of Sensors: End-to-End Algorithms and Infrastructure Design from Wearable-Devices to Data Centers. Portland: Apress. p. 68. ISBN 978-1-4302-6592-4.
  6. Rindflesch, T. C.; Tanabe, L.; Weinstein, J. N.; Hunter, L. (2000). "EDGAR: Extraction of drugs, genes, and relations from the biomedical literature". pp. 514–525.
  7. Ramakrishnan, C.; Kochut, K. J.; Sheth, A. P. (2006). "A Framework for Schema-Driven Relationship Discovery from Unstructured Text". pp. 583–596. http://knoesis.wright.edu/library/resource.php?id=00116.
  8. Wong, W.; Liu, W.; Bennamoun, M. (2009). "Acquiring Semantic Relations using the Web for Constructing Lightweight Ontologies". doi:10.1007/978-3-642-01307-2_26.
  9. Nguyen, Dat Quoc; Verspoor, Karin (2019). "End-to-end neural relation extraction using deep biaffine attention". doi:10.1007/978-3-030-15712-8_47.
  10. Wikidata Q104419957.
  11. Wikidata Q104418048.
  12. Wikidata Q104419388.
  13. Wikidata Q104417795.
  14. "DocRED. Competition. CodaLab". https://competitions.codalab.org/competitions/20717.