I am a linguistic typologist working with corpus-based and phylogenetic methods for studying language diversity. My interests range broadly across Bayesian statistics, diachronic explanations for universals, information-theoretic modelling, linguistic complexity and its socio-linguistic correlates, numerals, phylogenetic inference, word order variation, and a whole lot more. A central theme, if there is one, would be the integration of diachronic and functional explanations for morphosyntactic diversity found in the world's languages. 

SFB 1102 - C7 "Cross-linguistic information-theoretic modelling of communicative efficiency"

In this project, part of the SFB 1102 IDeaL project, we aim to incorporate information status into information-theoretic modelling of language use in an explicit and cross-linguistic fashion, in order to investigate communicative efficiency in terms of dependency locality and the memory-surprisal tradeoff. The first findings from this work, based on our corpus CIEP+, can be found:

SocioBaGS - "Macro- and micro-variation in Bantu grammatical gender systems and their sociolinguistic correlates"

In this new project, Francesca Di Garbo and I, together with a host of collaborators, aim to discover the limits of variation in Bantu grammatical gender systems and explain this variation in sociolinguistic terms, mostly in terms of Bantu-Bantu and Bantu-non-Bantu language contact. This project is a continuation of earlier work:

  • see here for our typology of northwestern Bantu gender systems;
  • for our study investigating the above-mentioned typology in terms of geography and sociology, see here;
  • the proposal for our new project can be found here.

Other research

My research program at large spans from work on inferring phylogenetic relationships (phylogenetic trees)…

  • for example, on the Dravidian language family, see here;
  • for work on accounting for reticulation when inferring phylogenies, see here;

…to using these trees to do quantitative typology using phylogenetic comparative methods. I have worked on a variety of topics using these methods, among others:

  • on motion event encoding, see my PhD thesis;
  • on numeral typology, see here; I am also involved in Numeralbank, a work in progress containing information on the numeral systems of thousands of languages;
  • on the ecology of language diversity, joining Chris Bentz, see here and here;
  • on Indo-European negative existential constructions, with Shahar Shirtz, see here and here.

But I have worked on other topics too:

  • work with Ewelina Wnuk and colleagues, demonstrating that color technology is not necessary for rich and efficient color language;
  • see here for the presentation of Steve Moran's BDPROTO, a database comprising phonological inventory data from 257 ancient and reconstructed languages, including a case study on rates of change of consonantal and vocalic systems;
  • see here for a paper with Mark Pagel and colleagues on modeling vocabulary using approximate Bayesian computation, a methodology usually employed in population genetics;
  • and see here for a study on source and goal marking in various European languages.

Most of these works are open access, and some can be found on one of the outlets below. If you need a copy of something and can’t find it, please write to me.


You can find some videos by me here:


