TraMOOC Project

From HandWiki

TraMOOC (Translation for Massive Open Online Courses) is a Horizon 2020 collaborative project that developed the first machine translation (MT) service for Massive Open Online Courses (MOOCs). The project's main result is an online translation platform. It uses a wide set of linguistic infrastructure tools and resources to provide accurate and coherent translation of the multi-genre, heterogeneous course material found in MOOCs. Translation runs from English into eleven European and BRIC languages (BG, CS, DE, EL, HR, IT, NL, PL, PT, RU, ZH). These target languages are strong use cases because they are difficult to translate into and have little existing MT support.[1]

Overview

According to 2013 statistics, more than 200 universities around the globe were involved in creating Massive Open Online Courses (MOOCs), with more than 1,300 instructors, more than 1,200 courses on offer and around 10 million actively enrolled users.[2] However, the vast majority of these courses are offered in English, which makes them inaccessible to people who do not speak the language. The main aim of the TraMOOC project was to remove this language barrier by delivering a machine translation (MT) service for the various types of educational MOOC content, such as assignments, tests, presentations, lecture subtitles and forum discussions. The service translates from English into eleven European and BRIC languages (BG, CS, DE, EL, HR, IT, NL, PL, PT, RU, ZH). The core of the TraMOOC service is open source, giving citizens of Europe and the rest of the world access to educational content that was previously unavailable to them because of language barriers.[3][4][5] The project also introduced novel translation-evaluation schemata that add value to existing tools and resources in linguistics, natural language processing, text analytics, data mining and the MT research community.[6]

Supported Language Pairs

The TraMOOC language pairs were selected because the existing MT infrastructure for their target languages was weak or fragmentary.

  • English → Bulgarian (Български)
  • English → Chinese (漢語, 汉语)
  • English → Croatian (Hrvatski)
  • English → Czech (Čeština)
  • English → Dutch (Nederlands)
  • English → German (Deutsch)
  • English → Greek (Ελληνικά)
  • English → Italian (Italiano)
  • English → Polish (Polszczyzna)
  • English → Portuguese (Português)
  • English → Russian (Русский)

The Science of TraMOOC

The platform developed during the project uses a state-of-the-art neural translation architecture and new domain adaptation techniques to deliver machine translation adapted to the MOOC domain. The TraMOOC platform is the first free and open translation service to use neural machine translation (NMT) models for educational content. These models substantially improve the fluency and accuracy of the translation output compared with conventional phrase-based statistical machine translation (PBSMT) systems. TraMOOC handles a wide range of file formats, including HTML, subtitles and Microsoft Office documents, in both synchronous and asynchronous translation modes. It returns the translated material in the format in which it was submitted, except that all supported subtitle formats are output as WebVTT. The supported file formats are:

  • SRT, WebVTT, SCC, DFXP/TTML, SAMI → WebVTT
  • Microsoft DOCX, XLS, PPT → Microsoft DOCX, XLS, PPT
  • XML, JSON, HTML → XML, JSON, HTML

The service was integrated into, operated on and successfully field-tested with a live MOOC platform. Further innovations of the TraMOOC project include bootstrapping resources for low-resource language pairs, state-of-the-art evaluation schemata and metrics for translation quality, and the use of crowdsourcing for data collection and evaluation.

Status

The Horizon 2020 TraMOOC project began in February 2015 and ran for three years, concluding in January 2018. The total budget of the project was approximately €3 million, partially funded by the European Commission under Grant Agreement number 644333.[7] The project brought together nine partners from six European countries in a consortium of researchers, industrial organizations and user partners, and was coordinated from Germany.

References

External links