SFB1102 - Information Density and Linguistic Encoding (IDeaL)


Deutsche Forschungsgemeinschaft DFG

Translated texts often exhibit characteristic features that distinguish them from originally authored text. In Translation Studies, this is sometimes referred to as translationese. Some aspects of translationese are due to the source language from which the translation was prepared, and vary with that source language. Other aspects of translationese are deemed universal, including the tendency of translated texts to be simpler than originally authored text and the tendency of translated texts to be more explicit. A growing body of research has confirmed the existence of different aspects of translationese in human translation, using a number of methodologies from empirical translation studies (Gellerstam 1986, Baker 1993, Laviosa 1998, Hansen 2003, Teich 2003) and, more recently, from machine-learning-based text categorisation and computational stylometrics (Baroni and Bernardini 2006, Ilisei et al. 2010, Koppel and Ordan 2011). However, to date there is no common methodological framework for characterising translationese, and the study of translationese has concentrated on artefacts of human translation. Our research will explore to what extent information density can be used as a methodological framework to capture important aspects of translationese in both human and machine translation (MT). We will investigate the use of information density measures as additional features in MT models, as well as in MT evaluation (against a reference) and MT quality estimation (without access to a reference).
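To make the notion of an information density measure concrete, the sketch below computes one simple variant: the mean surprisal (in bits per token) of a text under a unigram model estimated from a reference corpus. This is an illustrative assumption, not the project's method; the text above does not specify which measures are used, and realistic work would rely on stronger language models. The function names `unigram_model` and `mean_surprisal` and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def unigram_model(corpus_tokens, alpha=1.0):
    """Hypothetical helper: return a function giving add-alpha smoothed
    unigram probabilities estimated from a reference corpus."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def prob(token):
        return (counts.get(token, 0) + alpha) / (total + alpha * vocab_size)

    return prob


def mean_surprisal(tokens, prob):
    """Average surprisal -log2 P(w) in bits per token (one simple density proxy)."""
    if not tokens:
        raise ValueError("cannot compute surprisal of an empty text")
    return sum(-math.log2(prob(t)) for t in tokens) / len(tokens)


# Toy usage: a text of frequent words is less information-dense than one of rare words.
reference = "the cat sat on the mat the dog sat on the rug".split()
prob = unigram_model(reference)
simple = "the cat sat on the mat".split()
rare = "a zebra grazed near the river".split()
print(mean_surprisal(simple, prob) < mean_surprisal(rare, prob))  # prints True
```

Under this assumption, a translation that systematically prefers frequent, predictable words would show a lower mean surprisal than comparable original text, which is one way simplification could surface as a density difference.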

Dr. Raphaël Rubino, Prof. Dr. Josef van Genabith

Contact

Saarland University (Universität des Saarlandes)
Department of Language Science
and Technology (formerly FR 4.6)
Building A2 2, Room 1.13
Campus
D-66123 Saarbrücken

Phone: +49 681 302-2931
Fax: +49 681 302-64375
Email: josef.vangenabith(at)uni-saarland.de