Corpora in Translation Studies

From HandWiki

The translator's workplace has changed gradually over the last ten years. Personal computers can now process information more easily and quickly than ever before, so the computer can be considered an important, even essential, tool in translation. Problems nevertheless arise in the use of computers in translation: the computer is no substitute for traditional tools such as monolingual and bilingual dictionaries, terminologies and encyclopaedias, whether on paper or in digital format, and although a large amount of information is easily accessible, the translator still needs to find information that is relevant and reliable.

Here corpora and concordancing software play an important role, since they give access to information about language, content and translation practices that was hardly available to translators before the present stage of ICT development.

Corpus-based Machine Translation

Corpus-based machine translation relies on the analysis of real text samples together with their translations. The approaches that make use of corpora include statistical methods and example-based methods.

Statistical Methods

The main objective of statistical machine translation is to generate translations using statistical methods based on corpora of bilingual texts. For instance, the minutes of the European Parliament are written in all the official languages of the EU (European Union). If more corpora of this kind were available, texts on those subjects could be translated with excellent results. The first statistical machine translation program was CANDIDE, by IBM.
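To make the idea of learning translations from a bilingual corpus concrete, the following is a minimal sketch in the style of IBM Model 1, a simple word-translation model trained with expectation-maximization. It is not a description of how CANDIDE itself works; the toy German-English corpus, the function names and the number of iterations are all assumptions chosen for illustration, and the sketch omits refinements such as a NULL source word.

```python
from collections import defaultdict

def train_ibm_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) from aligned sentence pairs."""
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    # Start from a uniform probability for every co-occurring word pair.
    t = {(f, e): 1.0 / len(tgt_vocab)
         for src, tgt in pairs for e in src for f in tgt}
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of (f, e) links
        total = defaultdict(float)   # expected count of links from e
        for src, tgt in pairs:
            for f in tgt:
                # E-step: share one unit of "alignment mass" for f
                # among the source words of the same sentence pair.
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalize the expected counts into probabilities.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t

def best_translation(t, source_word):
    """Return the most probable target word for a source word."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)

# Invented toy corpus: tokenized (source, target) sentence pairs.
toy_corpus = [
    ("das Haus".split(), "the house".split()),
    ("das Buch".split(), "the book".split()),
    ("ein Buch".split(), "a book".split()),
]
model = train_ibm_model1(toy_corpus)
for word in ["das", "Haus", "Buch", "ein"]:
    print(word, "->", best_translation(model, word))
```

Even with three sentence pairs, the co-occurrence statistics are enough to separate "das" from "Buch"; with a real corpus such as parliamentary minutes, the same principle applies at a much larger scale.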

Example-based Methods

Example-based machine translation is characterised by its use of a bilingual corpus as its main source of knowledge. It is essentially translation by analogy and can be seen as an application of case-based reasoning, as used in machine learning, which consists of solving a problem on the basis of the solutions to other, similar problems.
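The retrieval step of this analogy-based approach can be sketched as follows. The example pairs are invented, the function name `closest_example` is hypothetical, and character-level similarity from Python's standard `difflib` module stands in for whatever matching a real system would use. A full example-based system would also adapt the retrieved translation (here, replacing the word for "red"), which this sketch leaves out.

```python
from difflib import SequenceMatcher

# Invented bilingual examples: (source sentence, stored translation).
examples = [
    ("Press the red button.", "Pulse el botón rojo."),
    ("Close the door.", "Cierre la puerta."),
    ("Open the window.", "Abra la ventana."),
]

def closest_example(sentence, examples):
    """Return the stored pair whose source side is most similar to sentence."""
    def score(pair):
        return SequenceMatcher(None, sentence.lower(), pair[0].lower()).ratio()
    best = max(examples, key=score)
    return best, score(best)

(match_src, match_tgt), similarity = closest_example(
    "Press the green button.", examples)
print(f"Closest example: {match_src!r} (similarity {similarity:.2f})")
print(f"Stored translation to adapt: {match_tgt!r}")
```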

Corpora and Translation

Corpus typology

According to EAGLES, a general distinction can be made between Monolingual and Multilingual corpora. Within multilingual corpora, we can further distinguish between Comparable corpora, which are compiled using similar design criteria but are not translations of one another, and Parallel corpora.

Parallel or Translation corpora are texts in one language aligned with their translations in another. Several variables have to be taken into account, such as the directness of translation and the number of languages.
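As an illustration of what "aligned" means in practice, a parallel corpus can be pictured as a list of sentence pairs that is searched on the source side and read on the target side. The sentences below are invented, the name `bilingual_lookup` is hypothetical, and plain substring matching is a deliberate simplification of what alignment and search tools actually do.

```python
# Invented sentence-aligned English-Spanish corpus: (source, target) pairs.
parallel_corpus = [
    ("The committee approved the report.", "La comisión aprobó el informe."),
    ("The report was published yesterday.", "El informe se publicó ayer."),
    ("The session is closed.", "Se levanta la sesión."),
]

def bilingual_lookup(corpus, term):
    """Return every aligned pair whose source sentence contains term."""
    term = term.lower()
    return [(src, tgt) for src, tgt in corpus if term in src.lower()]

for src, tgt in bilingual_lookup(parallel_corpus, "report"):
    print(f"{src}  |  {tgt}")
```

Reading the target side of each hit shows how the term has been rendered in context, which a translator can then compare across examples.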

There are also Monolingual Comparable Corpora: corpora made up of two sub-sections, one of texts originally written in a language and the other of texts translated into that same language. They are useful for translation theorists and researchers, whereas professional technical translators use translation memories.

Defining Translation memories

A translation memory (TM) is a very specific type of parallel corpus in that:

  1. It is “proprietary”: TMs are created individually or collectively around specific translation projects.
  2. TMs tend to be closed: they are standardized and have a restricted range of linguistic options.

Translation workbenches and TMs could be considered the most successful translation tools; however, their use is restricted to specific text types.

Corpora as Aids in Translation

The kinds of corpora described above can be combined with other tools, such as dictionaries. Corpora can themselves function as general or specialized dictionaries: a Comparable corpus can be seen as a monolingual dictionary, and a Parallel corpus can be compared to a bilingual dictionary.
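Using a corpus like a dictionary usually means looking a word up in a concordance, which lists every occurrence of the word with its surrounding context. The following is a minimal keyword-in-context (KWIC) sketch; the sample text, the function name `kwic` and the context width are assumptions for illustration, not features of any particular concordancer.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Pad both sides so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Invented sample text standing in for a corpus file.
sample = ("A corpus is a collection of texts. Translators query a corpus "
          "to see how a term is used. A parallel corpus adds translations.")
for line in kwic(sample, "corpus"):
    print(line)
```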

Corpus resources for Translators

Not all dictionaries are the same, and neither are all corpora. Apart from translation memories, corpus resources of potential use to professional translators can be classified on a scale from “robust” to “virtual”.

Examples of corpora include the BNC (British National Corpus), the Spanish corpus CREA and the Italian corpus CORIS.

Corpus linguistics draws an important distinction between corpora and archives of electronic texts: the latter are merely repositories of electronic texts. Building a corpus of web pages therefore involves an information retrieval operation in order to locate relevant and reliable documents.

In many translation classes, students have built their own DIY (do-it-yourself) corpora. The main benefits of DIY corpora may be summarized as follows:

  • They are easy to make.
  • They are a great resource for content information.
  • They are a great resource for terminology and phraseology.

Their main drawbacks are the following:

  • Not all topics, text types and languages are available.
  • The relevance and reliability of documents need to be carefully assessed.
  • Existing concordance software isn’t well equipped for HTML or XML files.
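One way around the last point is to strip the markup from downloaded pages before loading them into a concordancer. The sketch below uses only Python's standard `html.parser` module; the class name `TextExtractor`, the helper `html_to_text` and the sample page are hypothetical, and real web pages would typically need further cleaning (navigation menus, boilerplate, character encodings).

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of script and style tags."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def html_to_text(html):
    """Return the visible text of an HTML string as one plain-text line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

# Invented page standing in for a downloaded document.
page = ("<html><head><style>p {color: red}</style></head><body>"
        "<h1>Glossary</h1><p>A <b>corpus</b> is a collection of texts.</p>"
        "</body></html>")
print(html_to_text(page))
```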

Finally, “robust” corpora have the following advantages over “virtual” corpora:

  • They are usually more reliable.
  • They are usually larger.
  • They may be enriched with linguistic and contextual information.

References

Baker, M. (1993). "Corpus linguistics and translation studies: Implications and applications". In M. Baker, G. Francis & E. Tognini-Bonelli (eds.), Text and Technology. Philadelphia/Amsterdam: John Benjamins, 232–252.

Scott, M. (1996). WordSmith Tools. Oxford: Oxford University Press.

Zanettin, F. (2002). "Corpora in Translation Practice". In E. Yuste-Rodrigo (ed.), Language Resources for Translation Work and Research, LREC 2002 Workshop Proceedings, University of Las Palmas de Gran Canaria, ELRA, 10–14.
