The Department of Language Science and Technology is one of the leading institutions in language and speech research in Europe. The current flagship project is the Collaborative Research Center on Information Density and Linguistic Encoding (SFB 1102) funded by the German Research Foundation (DFG).
Since 1992 our current department and/or its predecessors (“Translation and Interpreting” & “Computational Linguistics and Phonetics”) have been continuously involved in major collaborative projects:
- GRK Cognitive Science (1992-2000)
- CRC 378 Resource-Adaptive Cognitive Processes (1996-2007)
- International GRK 715 Language Technology & Cognitive Systems (2001-2010)
- Cluster of Excellence on Multimodal Computing and Interaction (2007-2018)
- CRC 1102 Information Density and Linguistic Encoding (2014-2022)
At present, the department's research groups are involved in a significant number of European and nationally funded projects. All ongoing and completed research projects can be found on the chair web pages. The research groups are interested in a range of exciting research topics.
How is it that people understand language in real time, mapping from the linguistic signal, word by word, into a mental representation of what is being communicated? To address this question, we use advanced, high-resolution experimental methods such as eye-tracking and measurements of brain activity in the EEG signal, which can reveal subtle variations in cognitive effort on each word, and even index different stages of the language comprehension process. In our experiments we examine, for example, the interaction of meaning with perception: how people anticipate what’s coming next based on what they’ve heard, what they know about the world, and information in the visual environment.
These experimental findings are then brought together to inform the development of computational theories of human language understanding. We build artificial neural network models that not only incrementally determine the meaning of utterances, but also use this meaning to guide the unfolding perception of the incoming signal, just as people do. In addition, our computational models aim to reflect the organisation of the brain's language comprehension network, as revealed by neurophysiological evidence.
Our research group aims to bring together work on natural language processing systems, psycholinguistics and cognitive modelling. Current research foci are on the comprehension of discourse-level phenomena (such as coherence relations, pragmatic inferences and the integration between knowledge and the linguistic signal) and the generation of coherent texts, which can be tailored to a variety of different users.
Machine translation is the automated process of converting input in one language into another. Our research focuses on:
- machine translation of text and sign language
- document and dialog translation
- multilingual embeddings and translation
- low- and rich-resource scenarios
- multimodal post-editing interfaces
- automatically detecting Translationese (i.e. characteristic linguistic aspects of translations)
MLT carries out basic, applied and contract research and development. MLT has four research groups:
- Machine Translation (MT)
- Question-Answering and Information Extraction (QAIE)
- Talking Robots (TR)
- Data and Resources (D&R)
Machine learning has long been the paradigm behind many natural language processing systems. In recent years, deep neural networks have become the dominant approach. A deeper understanding of how neural networks work is essential to move the field ahead. In addition, there are many specific questions, such as:
- low resource language processing: can neural networks be trained on tiny amounts of training data?
- neural networks & privacy: how do neural networks memorize information, and how can this be used or prevented?
- sentiment analysis and hate speech detection
- vision & language: how can real world information improve language processing systems?
- dialog systems: multimodal dialog, interaction analysis, dialog management
Our research group is interested in a number of topics in computational linguistics, ranging from syntactic and semantic parsing through natural language generation to dialogue systems. The group developed a fast and accurate semantic parser that works across different styles of semantic representations, as well as parsing and generation algorithms that generalize over many different grammar formalisms. The group has experience with the large-scale evaluation of interactive NLG systems and maintains the DialogOS system for rapidly developing spoken dialogue systems. At a methodological level, topics of interest include how to best divide the labor between symbolic and neural methods to obtain accurate, robust, and efficient systems that still respect linguistic principles.
Research in phonetics and speech science at Saarland University covers both linguistic phonetics and speech technology. Active research areas include speech production and perception, prosody, phonetic characteristics of discourse and dialog, and speech synthesis. A central theme of our research is to integrate phonetic knowledge in speech technology. Our group has worked extensively on text-to-speech synthesis, a wonderful framework for implementing and testing computational models of linguistic and phonetic processes. Recent work has focused on experimental methods and computational models to study various aspects of speech production, perception, and second language acquisition.
Modeling linguistic variation
Language varies according to a number of variables, such as social group, gender, discourse purpose, medium or time. For instance, we may be interested in how people expressed stance and opinions in the 19th century compared to modern times; or in the way scientific terminology formed in the Late Modern period; or in how men or women dominate linguistic usage in private vs. public settings in contemporary language. From the point of view of linguistic theory, insights into linguistic usage shed light on social norms and identities as reflected in differences in language use, but also on the common concern in all communication: to get our message across. We study linguistic variation and change on the basis of representative corpora using computational methods, ranging from n-gram models and topic models to word embeddings, and continuously adapt new methods to our analysis needs. This also includes visualization of language model outputs, e.g. diachronic word embeddings or surprisal in text.
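To make the notion of surprisal from an n-gram model concrete, here is a minimal sketch in Python. It is an illustration only, not the group's actual tooling: the toy sentence, the function names, and the choice of a bigram model with add-one smoothing are all assumptions made for this example.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Collect context counts, bigram counts and the vocabulary."""
    contexts = Counter(tokens[:-1])             # words that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))  # adjacent word pairs
    vocab = set(tokens)
    return contexts, bigrams, vocab

def surprisal(prev, word, contexts, bigrams, vocab):
    """Surprisal in bits, -log2 P(word | prev), with add-one smoothing."""
    p = (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
    return -math.log2(p)

# Toy data, invented for illustration (not drawn from any real corpus).
tokens = "the cat sat on the mat and the cat slept".split()
contexts, bigrams, vocab = train_bigram(tokens)

seen = surprisal("the", "cat", contexts, bigrams, vocab)      # attested bigram
unseen = surprisal("the", "slept", contexts, bigrams, vocab)  # unattested bigram
print(f"the -> cat: {seen:.2f} bits, the -> slept: {unseen:.2f} bits")
```

A word that frequently follows its context receives low surprisal, while an unattested continuation receives high surprisal. Plotting such values word by word is one simple way to visualize surprisal in text.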
Computational translation studies
The study of human translation is a multi-faceted endeavour that covers the analysis of translation as product and process. Focusing on translation as product, our research examines variation in translation using parallel and cross-lingually comparable corpora. Dimensions of variation we investigate include translation mode (written translation vs. interpreting), language pair, translation direction and translation expertise (learner vs. professional). Analysis along these dimensions typically includes comparison with special regard to the specific properties of translations ("translationese"). We develop comparative methods using state-of-the-art computational language models enhanced with statistical and information-theoretic measures to assess differences and commonalities in translation for application in translation studies as well as machine translation and related technologies.
Comparing languages is central to understanding them. How to make sound comparisons, however, is another question. We work at the intersection of linguistic typology, corpus linguistics and computational linguistics, using state-of-the-art methodology to shed light on topics such as word order, negation, and nominal classification. Two main foci are corpus-based typology and the use of phylogenetic comparative methods for typology. This has led us to build a parallel corpus (CIEP, Corpus of Indo-European Prose) and use information-theoretic measures such as entropy and surprisal to describe cross-linguistic variability. The focus of the group is not limited to Indo-European; we also work on world-wide samples, Austronesian, and Bantu.
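As a rough illustration of how entropy can describe variability in a feature such as word order, the Python sketch below computes Shannon entropy over the relative frequencies of two orderings. The counts are invented for the example and do not come from CIEP or any other corpus; the function name and the two "languages" are hypothetical.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of the distribution given by raw counts."""
    total = sum(counts.values())
    return -sum(
        (c / total) * math.log2(c / total)
        for c in counts.values()
        if c > 0
    )

# Hypothetical counts of verb-object orders (invented numbers).
rigid = Counter({"VO": 98, "OV": 2})      # one order strongly preferred
flexible = Counter({"VO": 50, "OV": 50})  # both orders equally frequent

h_rigid = entropy(rigid)
h_flexible = entropy(flexible)
print(f"rigid: {h_rigid:.3f} bits, flexible: {h_flexible:.3f} bits")
```

Under this measure, a language with nearly fixed order has entropy close to 0 bits, while one that uses both orders equally often reaches the maximum of 1 bit for a two-way choice.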