Confusion network

Short description: Natural language processing method

A confusion network (sometimes called a word confusion network, or informally a sausage) is a natural language processing method that combines outputs from multiple automatic speech recognition or machine translation systems.[1][2] A confusion network is a simple linear directed acyclic graph with the property that every path from the start node to the end node passes through all the other nodes. The set of words represented by the edges between two consecutive nodes is called a confusion set. In machine translation, the defining characteristic of confusion networks is that they allow multiple ambiguous inputs, deferring commitment to a translation decision until later stages of processing.[3][4] This approach is used in the open source machine translation software Moses[5] and in the proprietary speech-to-text API in IBM Bluemix Watson.[6]
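
The structure described above can be illustrated with a minimal Python sketch. It assumes a representation that is not taken from any particular toolkit: the network is a list of confusion sets, each mapping a candidate word to a confidence score, with the empty string standing for an arc that emits no word. The words, the scores, and the function names `best_path` and `count_paths` are hypothetical and chosen only for illustration.

```python
from math import prod

# A confusion network as an ordered list of confusion sets, from the
# start node to the end node. Each set maps a candidate word to an
# illustrative confidence score; "" marks an arc that emits no word.
# All words and numbers here are invented for the example.
network = [
    {"I": 0.9, "eye": 0.1},
    {"have": 0.6, "move": 0.4},
    {"it": 0.5, "": 0.3, "a": 0.2},
    {"very": 0.7, "veal": 0.3},
]


def best_path(cn):
    """Pick the highest-scoring word in every confusion set.

    Returns the chosen words (empty arcs dropped) and the product of
    the chosen scores.
    """
    words = [max(cs, key=cs.get) for cs in cn]
    score = prod(cs[w] for cs, w in zip(cn, words))
    return [w for w in words if w], score


def count_paths(cn):
    """Number of distinct start-to-end paths through the network."""
    return prod(len(cs) for cs in cn)


hypothesis, score = best_path(network)
print(" ".join(hypothesis), round(score, 3), count_paths(network))
```

Because every path visits every node, the number of paths is simply the product of the confusion set sizes, and the best path can be found by choosing independently within each set. A downstream system such as a translation decoder could instead keep all alternatives and defer the choice, as the article describes.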

Figure: Example of a confusion network

References

  1. Rosti, Antti-Veikko I.; Zhang, Bing; Matsoukas, Spyros; Schwartz, Richard (2008). "Incremental Hypothesis Alignment for Building Confusion Networks with Application to Machine Translation System Combination". Proceedings of the Third Workshop on Statistical Machine Translation. StatMT '08 (Stroudsburg, PA, USA: Association for Computational Linguistics): 183–186. ISBN 9781932432091. http://dl.acm.org/citation.cfm?id=1626394.1626423. 
  2. Matusov, Evgeny; Ueffing, Nicola; Ney, Hermann (2006). "Computing consensus translation from multiple machine translation systems using enhanced hypotheses alignment". In Proc. EACL. 
  3. Hoang, Hieu (2007). "Factored translation models". In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL): 868–876. 
  4. Koehn, Philipp; Hoang, Hieu; Birch, Alexandra; Callison-Burch, Chris; Federico, Marcello; Bertoldi, Nicola; Cowan, Brooke; Shen, Wade et al. (2007). "Moses: Open Source Toolkit for Statistical Machine Translation". Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions. ACL '07 (Stroudsburg, PA, USA: Association for Computational Linguistics): 177–180. doi:10.3115/1557769.1557821. http://dl.acm.org/citation.cfm?id=1557769.1557821. 
  5. "Moses - Moses/ConfusionNetworks". http://www.statmt.org/moses/?n=Moses.ConfusionNetworks. 
  6. "IBM® Speech to Text service provides an API Reference | IBM Watson Developer Cloud" (in en). https://www.ibm.com/watson/developercloud/speech-to-text/api/v1/. "A confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as "Confusion Networks"). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if you omit the parameter."