Research

The Department of Language Science and Technology is one of the leading institutions in language and speech research in Europe. The current flagship project is the Collaborative Research Center on Information Density and Linguistic Encoding (SFB 1102) funded by the German Research Foundation (DFG).

Since 1992 our current department and/or its predecessors (“Translation and Interpreting” & “Computational Linguistics and Phonetics”) have been continuously involved in major collaborative projects:

RTG Neuroexplicit Models (2023-2028)
CRC 1102 Information Density & Linguistic Encoding (2014-2026)
Cluster of Excellence on Multimodal Computing and Interaction (2007-2018)
International GRK 715 Language Technology & Cognitive Systems (2001-2010)
CRC 378 Resource-Adaptive Cognitive Processes (1996-2007)
GRK Cognitive Science (1992-2000)

At present, the department's research groups are involved in a significant number of European and nationally funded projects. All ongoing and completed research projects can be found on the chair web pages. The research groups pursue a wide range of research topics.

How is it that people understand language in real time, mapping the linguistic signal, word by word, into a mental representation of what is being communicated? To address this question, we use advanced, high-resolution experimental methods such as eye-tracking and measurements of brain activity in the EEG signal, which can reveal subtle variations in cognitive effort on each word, and even index different stages of the language comprehension process. In our experiments we examine, for example, the interaction of meaning with perception: how people anticipate what’s coming next based on what they’ve heard, what they know about the world, and information in the visual environment.

These experimental findings are then brought together to inform the development of computational theories of human language understanding. We build artificial neural network models that not only incrementally determine the meaning of utterances, but also use this meaning to guide the unfolding perception of the incoming signal, just as people do. In addition, our computational models aim to reflect the organisation of the brain's language comprehension network, as revealed by neurophysiological evidence.

Our research group aims to bring together work on natural language processing systems, psycholinguistics and cognitive modelling. Current research foci are on the comprehension of discourse-level phenomena (such as coherence relations, pragmatic inferences and the integration between knowledge and the linguistic signal) and the generation of coherent texts, which can be tailored to a variety of different users.

Machine translation is the automated process of converting input in one language into another. Our research focuses on:

machine translation of text and sign language
document and dialog translation
multilingual embeddings and translation
low- and rich-resource scenarios
multimodal post-editing interfaces
automatically detecting Translationese (i.e. characteristic linguistic aspects of translations)

MLT carries out basic, applied and contract research and development. MLT has four research groups:

Machine Translation (MT)
Question-Answering and Information Extraction (QAIE)
Talking Robots (TR)
Data and Resources (D&R)

Our group studies information processing in humans and machines, with a focus on language. Specifically, we investigate:

Foundations of Machine Learning: We study the abilities, limitations, and inner workings of machine learning models underlying LLMs and other AI systems.
Computational Cognition and Neuroscience: We investigate how the human mind processes information in language, vision, and other domains.

Machine learning has long been the paradigm behind many natural language processing systems. In recent years, deep neural networks have become the dominant approach. A deeper understanding of how neural networks work is essential to advance the field. In addition, there are many specific questions, such as:

low resource language processing: can neural networks be trained on tiny amounts of training data?
neural networks & privacy: how do neural networks memorize information? How can this be used or prevented?
sentiment analysis and hate speech detection
vision & language: how can real world information improve language processing systems?
dialog systems: multimodal dialog, interaction analysis, dialog management

At the interface between the humanities and natural sciences, our research group deals with all areas of spoken-language communication. The group's research focus is on laboratory phonology, sound change, first and second language acquisition, regional variation, and prosody. Using signal and experimental phonetic methods, which are applied to both controlled laboratory data and large speech corpora, we investigate sounds and their combinations from speech production (articulatory phonetics) to the transmission of the resulting acoustic speech signals (acoustic phonetics) and to speech perception and cognitive processing. The focus is not only on the individual speech sounds of the world and their systemic relevance to the language system (phonology), but also on meaningful speech melodies, speech rhythm, as well as synchronic variation (e.g. co-articulatory, speaker-idiosyncratic, cross-generational) and diachronic change. We model these dynamics using computer-based simulations, among other things.

Speech Science at Saarland University!

Our research group is interested in a number of topics in computational linguistics, ranging from syntactic and semantic parsing through natural language generation to dialogue systems. The group developed a fast and accurate semantic parser that works across different styles of semantic representations, as well as parsing and generation algorithms that generalize over many different grammar formalisms. The group has experience with the large-scale evaluation of interactive NLG systems and maintains the DialogOS system for rapidly developing spoken dialogue systems. At a methodological level, topics of interest include how best to divide the labor between symbolic and neural methods to obtain accurate, robust, and efficient systems that still respect linguistic principles.

A central theme of my research is the integration of phonetic knowledge in speech technology. I have worked extensively on text-to-speech synthesis, a wonderful framework for implementing and testing computational models of linguistic and phonetic processes - until the advent of end-to-end systems, that is. Another recurring topic in my research is the analysis and modeling of speech prosody. My recent work has focused on experimental methods and computational simulations to study aspects of speech production, perception, and acquisition.

Modeling linguistic variation
Language varies according to a number of variables, such as social group, gender, discourse purpose, medium or time. For instance, we may be interested in how people expressed stance and opinions in the 19th century compared to modern times; or in the way scientific terminology formed in the Late Modern period; or in how men or women dominate linguistic usage in private vs. public settings in contemporary language. From the point of view of linguistic theory, insights into linguistic usage shed light on social norms and identities as reflected in differences in language use, but also on the common concern in all communication: to get our message across. We study linguistic variation and change on the basis of representative corpora using computational methods, ranging from n-gram models and topic models to word embeddings, and continuously adapt new methods to our analysis needs. This also includes visualization of language model outputs, e.g. diachronic word embeddings or surprisal in text.

Computational translation studies
The study of human translation is a multi-faceted endeavour that covers the analysis of translation as product and process. Focusing on translation as product, our research examines variation in translation using parallel and cross-lingually comparable corpora. Dimensions of variation we investigate include translation mode (written translation vs. interpreting), language pair, translation direction and translation expertise (learner vs. professional). Analysis along these dimensions typically includes comparison with special regard to the specific properties of translations ("translationese"). We develop comparative methods using state-of-the-art computational language models enhanced with statistical and information-theoretic measures to assess differences and commonalities in translation for application in translation studies as well as machine translation and related technologies.

Comparing languages is central to understanding them. How to make sound comparisons, however, is another question. We work at the intersection of linguistic typology, corpus linguistics and computational linguistics, using state-of-the-art methodology to shed light on topics such as word order, negation, and nominal classification. Two main foci are corpus-based typology and the use of phylogenetic comparative methods for typology. This has led us to build a parallel corpus (CIEP, Corpus of Indo-European Prose) and use information-theoretic measures such as entropy and surprisal to describe cross-linguistic variability. The focus of the group is not limited to Indo-European; we also work on world-wide samples, Austronesian, and Bantu.
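
For readers unfamiliar with the two measures named above, the following is a minimal sketch of how surprisal and entropy can be computed from word frequencies. It assumes a simple unigram model over a toy sentence; the function names and the example data are hypothetical and are not taken from the group's own tools or from the CIEP corpus.

```python
import math
from collections import Counter


def unigram_probs(tokens):
    """Estimate relative-frequency (unigram) probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def surprisal(word, probs):
    """Surprisal in bits: -log2 P(word). Rarer words carry more information."""
    return -math.log2(probs[word])


def entropy(probs):
    """Shannon entropy in bits: the expected surprisal over the distribution."""
    return -sum(p * math.log2(p) for p in probs.values())


# Toy example (hypothetical data): "the" occurs twice, all other words once.
tokens = "the dog saw the cat".split()
probs = unigram_probs(tokens)

print(f"surprisal('the') = {surprisal('the', probs):.3f} bits")
print(f"surprisal('dog') = {surprisal('dog', probs):.3f} bits")
print(f"entropy          = {entropy(probs):.3f} bits")
```

In corpus-based work the probabilities would typically come from a context-sensitive language model rather than raw unigram counts, but the definitions of the two measures stay the same.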
