MedSLT

MedSLT is a medium-ranged open source spoken language translator developed by the University of Geneva. It is funded by the Swiss National Science Foundation. The system has been designed for the medical domain. It currently covers the doctor-patient diagnosis dialogues for the domains of headache, chest and abdominal pain in English, French, Japanese, Spanish, Catalan and Arabic. The vocabulary used ranges from 350 to 1000 words depending on the domain and language pair.[1]

Motivation for creating MedSLT

With more than 6000 languages spoken worldwide, language barriers are a growing problem for healthcare. The lack of medical interpreters can lead to disastrous consequences, ranging from prolonged hospital stays to incorrect diagnoses and medication. A study found that only about half of the 23 million people with limited proficiency in English in the United States had been provided with a medical interpreter. Millions of refugees and immigrants worldwide face similar problems, although not always as severe. Speech translation systems might close the gap between the need for language services and their availability.[2]

Challenges

The biggest challenge has always been that an ideal system cannot be built at present. Such a system would fit the needs of doctors and patients alike, and would provide accurate and flexible translation. An ideal translation tool cannot be realised without support for unrestricted language and a large vocabulary.

Medical professionals demand high reliability from translation. This favours rule-based architectures over data-driven ones. Data-driven architectures are more suitable for inexperienced users, whereas rule-based architectures achieve higher accuracy, especially when used by experts.

A bidirectional system that supports a two-way dialogue and concentrates on patient-centred communication is highly desirable, but patients will find such a system difficult to use, because most of them have no experience with it. As a result, translation in the patient-to-doctor direction is less reliable. To overcome this, the system needs to provide either easy access or an integrated help tool that guides users through the process.

Although controlled rule-based systems achieve good results, they are brittle. To receive good translations the user needs to be familiar with the system and has to know what is covered by the grammar.

Covering different sub-domains (headache, chest and abdominal pain) and language pairs presents additional problems. A shared structure and grammar for all sub-domains and language pairs minimises development and maintenance costs. The integration of new doctor and patient languages is also a key challenge. Adding new languages should be quick and fairly simple, because the system has to be used in many countries and so must cover multiple language pairs. Direct translation from source to target language proves to be rather difficult. Using an interlingua for unidirectional translation instead of a bidirectional approach helps to simplify the translation process.

On top of this, the system has to run on different platforms, because mobility is a key issue for many attending physicians. A portable version addresses these issues, but has to deal with the heavy load of the translation process.[2][3][4][5][6]

The MedSLT system

The system's speech recognition is based on the Nuance 8.5 platform that supports grammar-based language models. All grammars used for recognition, analysis and generation are compiled from a small set of unification grammars.

These core grammars are created by the open-source Regulus Grammar Compiler and are automatically specialised using corpus-driven methods. The specialisation considers both the task (recognition, analysis and generation) and the sub-domain (headache, chest and abdominal pain).

The specialisation uses the explanation-based learning algorithm to create a treebank from the training corpus. These examples are divided into sets of subtrees by using domain- and grammar-specific rules (also known as "operationality criteria" in machine translation).

The rules in each subtree are combined into a single rule, creating a specialised unification grammar. This grammar is then compiled into an executable form: for analysis and generation, a form that can be used by a parser or generator; for recognition, a context-free grammar (CFG), which the Nuance engine requires.
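The cut-and-flatten idea behind this specialisation step can be illustrated with a toy sketch. The tree encoding, the category names and the `is_operational` criterion below are hypothetical and are not taken from Regulus; the sketch only shows how a parse tree might be cut at chosen categories, with each piece collapsed into one flat rule.

```python
# Toy illustration of explanation-based grammar specialisation.
# A tree is (category, [children]); a leaf word is a plain string.
# All names are hypothetical; Regulus uses its own formats.

def is_operational(category):
    # Assumed operationality criterion: cut the tree at these categories.
    return category in {"s", "np"}

def frontier(tree):
    """Flatten a subtree down to words and operational categories."""
    if isinstance(tree, str):
        return [tree]
    category, children = tree
    if is_operational(category):
        return [category]
    symbols = []
    for child in children:
        symbols.extend(frontier(child))
    return symbols

def specialise(tree, rules=None):
    """Collect one flattened rule per operational node in the tree."""
    if rules is None:
        rules = []
    if isinstance(tree, str):
        return rules
    category, children = tree
    if is_operational(category):
        rhs = []
        for child in children:
            rhs.extend(frontier(child))
        rules.append((category, tuple(rhs)))
    for child in children:
        specialise(child, rules)
    return rules

# One treebank entry for "you have headaches" under a general grammar.
tree = ("s", [("np", [("pron", ["you"])]),
              ("vp", [("v", ["have"]),
                      ("np", [("n", ["headaches"])])])])
rules = specialise(tree)
```

In this sketch the intermediate `vp`, `v`, `pron` and `n` levels disappear from the learned rules, which is one way a specialised grammar can end up flatter than the general grammar it was derived from.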

Compilation by Nuance-specific criteria turns the grammar into speech recognition packages. The final step uses the training corpus again for statistical tuning of the language model.

MedSLT translation processes are based on a rule-based interlingua. The interlingua is treated as an actual language (it is a very simple version of English) and is specified by a Regulus grammar. This grammar does not take account of complex surface syntax phenomena of real languages like movement or agreement. A set of rules is the base for translating the source language semantic representation to interlingua.

Another set of rules covers the translation from interlingua to the target language. The semantic representations are converted to surface words using a target language grammar.
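As a rough illustration of this two-stage design, the sketch below maps a source representation to interlingua and then to a target representation using two independent rule tables. The feature names, values and rule tables are invented for the example and are not MedSLT's actual rules; surface generation with a target language grammar is left out.

```python
# Hypothetical rule tables: each maps one feature-value pair to another.
FRENCH_TO_INTERLINGUA = {
    ("utterance_type", "question_oui_non"): ("utterance_type", "ynq"),
    ("symptome", "mal_de_tete"): ("symptom", "headache"),
    ("temps", "present"): ("tense", "present"),
}
INTERLINGUA_TO_SPANISH = {
    ("utterance_type", "ynq"): ("tipo", "pregunta_si_no"),
    ("symptom", "headache"): ("sintoma", "dolor_de_cabeza"),
    ("tense", "present"): ("tiempo", "presente"),
}

def apply_rules(representation, rules):
    """Map every feature-value pair through a rule table.

    An uncovered pair raises KeyError; this sketch simply treats it
    as out of coverage.
    """
    return [rules[pair] for pair in representation]

def translate(source_representation, to_interlingua, from_interlingua):
    """Return the interlingua form and the target representation."""
    interlingua = apply_rules(source_representation, to_interlingua)
    return interlingua, apply_rules(interlingua, from_interlingua)

source = [("utterance_type", "question_oui_non"),
          ("symptome", "mal_de_tete"),
          ("temps", "present")]
interlingua, target = translate(
    source, FRENCH_TO_INTERLINGUA, INTERLINGUA_TO_SPANISH)
```

Because the two tables never refer to each other, a new source language in this sketch needs only a new table into interlingua, and a new target language only a new table out of it.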

Defining semantics for a specific domain enables the developers to specify the interlingua with a small, tightly constrained semantic grammar. The translations based on interlingua match direct translations almost perfectly, because the development shifts to a decoupled monolingual architecture.

A set of combined interlingua corpora, with one corpus per sub-domain, is the core of this architecture. All source language development corpora are translated to interlingua. These are sorted and grouped together with the corresponding source language examples.

The interlingua forms are then translated into each target language, and the results are attached together. This organisation improves the translation process. There is no duplicated effort for multilingual regression testing, because each parsing and generation step is performed once. This allows more frequent testing.
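The organisation of a combined corpus can be sketched as follows. The function name, the data layout and the stand-in analysis and generation tables are assumptions made for illustration; the point is only that source examples are grouped by interlingua form and that each form is generated once per target language.

```python
from collections import defaultdict

def build_combined_corpus(source_examples, to_interlingua, generators):
    """Group source examples by interlingua form and attach target output.

    source_examples: iterable of (language, sentence) pairs
    to_interlingua:  function (language, sentence) -> interlingua string
    generators:      dict mapping target language -> function(interlingua)
    """
    grouped = defaultdict(list)
    for language, sentence in source_examples:
        grouped[to_interlingua(language, sentence)].append((language, sentence))

    corpus = {}
    for form in sorted(grouped):
        # Each interlingua form is generated once per target language,
        # however many source sentences map to it.
        targets = {lang: generate(form) for lang, generate in generators.items()}
        corpus[form] = {"sources": grouped[form], "targets": targets}
    return corpus

# Stand-in analysis and generation tables (invented for the example).
ANALYSIS = {
    ("en", "do you have headaches"): "YNQ you have headache",
    ("fr", "avez-vous des maux de tête"): "YNQ you have headache",
    ("en", "is the pain severe"): "YNQ pain be severe",
}
GENERATION_ES = {
    "YNQ you have headache": "tiene dolores de cabeza",
    "YNQ pain be severe": "es fuerte el dolor",
}
generation_calls = []

def generate_es(form):
    generation_calls.append(form)  # record how often generation runs
    return GENERATION_ES[form]

corpus = build_combined_corpus(
    ANALYSIS.keys(),
    lambda language, sentence: ANALYSIS[(language, sentence)],
    {"es": generate_es},
)
```

Here three source sentences yield two interlingua forms, so generation runs twice rather than three times.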

The representation language used for all forms is Almost Flat Functional (AFF) semantics. AFF is derived from the Spoken Language Translator (SLT), the precursor of MedSLT.

SLT uses Quasi Logical Form (QLF), a logic-based representation language. QLF is an expressive yet very complex language, causing high development and maintenance costs.

A minimal solution was planned for the medical translator. Early versions of the system utilised a language using simple feature-value lists. These lists were supplemented with an optional level of nesting to represent subordinate clauses (i.e. embedded clauses).

Determiners were not included, because they are hard to translate and it is difficult to reliably distinguish and recognise them. This way, translation rules became a lot simpler, because only a list of feature-value pairs had to be mapped to another list of pairs. The language turned out to be underconstrained.

Adding natural sortal constraints to the grammar solved this problem, but also returned the language to a more expressive formalism. The newly created AFF combines elements of QLF and the feature-value list semantics. This version of flat semantics is enhanced with additional functional markings. This together with a relatively small vocabulary solved the ambiguity problem of the original flat representation language without creating overly complex rules.
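The difference between a purely flat list and a representation with functional markings can be sketched with a small example. The encodings, role names and sortal constraints below are hypothetical and do not reproduce the actual AFF notation; the flat list is also assumed to be unordered.

```python
# Hypothetical encodings, not the actual MedSLT data format.
# AFF-style element: (functional marking, feature, value).
pain_causes_nausea = [("subj", "symptom", "pain"),
                      ("null", "event", "cause"),
                      ("obj", "symptom", "nausea")]
nausea_causes_pain = [("subj", "symptom", "nausea"),
                      ("null", "event", "cause"),
                      ("obj", "symptom", "pain")]

def to_flat(aff):
    """Drop the functional markings, leaving an unordered feature-value list."""
    return sorted((feature, value) for _, feature, value in aff)

# Assumed sortal constraints: which feature types may fill which role.
SORTAL_CONSTRAINTS = {"subj": {"symptom", "pronoun"},
                      "obj": {"symptom", "body_part"},
                      "null": {"event", "tense", "utterance_type"}}

def is_well_sorted(aff):
    """Check every element against the constraint for its role."""
    return all(feature in SORTAL_CONSTRAINTS.get(role, set())
               for role, feature, _ in aff)

# The two readings collapse once the markings are removed ...
same_when_flat = to_flat(pain_causes_nausea) == to_flat(nausea_causes_pain)
# ... but stay distinct while the markings are kept.
distinct_in_aff = sorted(pain_causes_nausea) != sorted(nausea_causes_pain)
```

Under these assumptions, the functional markings are what keep "pain causes nausea" apart from "nausea causes pain", while the constraint table rejects combinations such as a tense filling the subject role.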

In addition, the treatment of syntactic structure is a compromise between the linguistic and engineering traditions.

The grammars are in fact derived from a linguistically motivated resource, using corpus-based methods driven by small sets of examples. This results in simpler and flatter domain-specific grammars.

The semantics are less sophisticated and represent a minimal approach in the engineering tradition. Each lexical item contributes a set of feature-value pairs.

This leads to translation rules that are simple to write, because they only map lists of feature-value pairs to other lists of feature-value pairs. However, as a result the machine translation channel model becomes underspecified and is weakened, whereas the target language model is strengthened.

An intelligent help module is integrated into the system to support users in utilising the full coverage of the grammars. This tool provides the user with examples that are as close as possible to the user's original utterance.

The output is based on a library. Each sub-domain and language pair has its own library. The contents are extracted from the combined interlingua corpora. The help module scans the corpus for the tagged source language form mapped with the corresponding target language form.

Additionally, a second, statistical recogniser is used as a backup. Its results are used to select similar examples from the library.
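A minimal sketch of such example selection is given below. The similarity measure (overlap of words and word bigrams) and the function names are assumptions chosen for illustration; the source does not specify how MedSLT measures closeness between the backup recognition result and the library examples.

```python
def word_ngrams(sentence):
    """Return the set of words and adjacent word pairs in a sentence."""
    words = sentence.lower().split()
    return set(words) | set(zip(words, words[1:]))

def similarity(a, b):
    """Jaccard overlap of word and bigram sets (an assumed measure)."""
    grams_a, grams_b = word_ngrams(a), word_ngrams(b)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)

def suggest(recognised, library, n=3):
    """Return the n library examples closest to the recognised string."""
    ranked = sorted(library,
                    key=lambda example: similarity(recognised, example),
                    reverse=True)
    return ranked[:n]

# Invented library of in-coverage examples for one sub-domain.
library = [
    "do you have headaches in the morning",
    "is the pain severe",
    "does the pain spread to the jaw",
]
# Hypothetical output of the backup statistical recogniser.
backup_result = "do you get headaches every morning"
suggestions = suggest(backup_result, library, n=1)
```

The user could then pick a suggested sentence that is known to be within the grammar's coverage instead of rephrasing by trial and error.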

According to the generation preferences, one of the derived strings is picked and the target language string is realised as spoken language.

Some statistical corpus-based methods are used to tune the system further.[1][2][3][4][5][6]

MedSLT on a PDA

As a response to demands from healthcare professionals, a mobile version of MedSLT was developed. The hand-held platform uses the same architecture as the normal one.

The heavy processing necessary for translation is done on a remote machine. Apart from a wireless connection, a good microphone is needed to reach the same results as the original version.[1][2]

References

  1. "MedSLT - the medical speech translator". http://www.issco.unige.ch/en/research/projects/medslt/about.html.
  2. "Many-to-Many Multilingual Medical Speech Translation on a PDA" by P. Bouillon, G. Flores, M. Georgescul, S. Halimi, B. A. Hockey, H. Isahara, K. Kanzaki, Y. Nakao, M. Rayner, M. Santaholma, M. Starlander, N. Tsourakis in The Eighth Conference of the Association for Machine Translation in the Americas. Waikiki, Hawaii. 2008.
  3. "Multilingual Grammar Resources in Multilingual Application Development" by M. Santaholma in Proceedings of Workshop on Grammar Engineering Across Frameworks, GEAF. Manchester, UK. 2008.
  4. "A Small-Vocabulary Shared Task for Medical Speech Translation" by M. Rayner, P. Bouillon, G. Flores, F. Ehsani, M. Starlander, B. A. Hockey, J. Brotanek and L. Biewald in Proceedings of Coling 2008 Workshop on Speech Processing for Safety Critical Translation and Pervasive Applications. Manchester, UK. 2008.
  5. "The 2008 MedSLT System" by M. Rayner, P. Bouillon, J. Brotanek, G. Flores, S. Halimi, B. A. Hockey, H. Isahara, K. Kanzaki, E. Kron, Y. Nakao, M. Santaholma, M. Starlander, N. Tsourakis in Proceedings of Coling 2008 Workshop on Speech Processing for Safety Critical Translation and Pervasive Applications. Manchester, UK. 2008.
  6. "Almost Flat Functional Semantics for Speech Translation" by M. Rayner, P. Bouillon, B. A. Hockey and Y. Nakao in Proceedings of Coling 2008. Manchester, UK. 2008.

External links