Crowdsourcing linguistic annotations
Due to the current situation, the meetings of this seminar may take place virtually. Additional virtual sessions may be arranged depending on the number of enrolled students.
Interested students should register on Moodle before 4 May so that we can plan accordingly.
Seminar (in English):
held by: Dr. Frances Yung
time: 10:15 - 11:45 hrs, on Mondays
location: building C7.3, seminar room 1.12
start date: not known until further notice; planned: 04 May 2020
suitable for: MSc in Language Science and Technology / BSc in Computational Linguistics
Important Information
All information regarding the schedule, location and topics can be found on Moodle.
Before the first lecture: sign up on Moodle and enrol in the course.
Annotated data is essential for training any supervised NLP system. In particular, modern deep neural network models require large amounts of training data, which are costly and time-consuming to obtain through traditional expert annotation. Crowdsourcing is increasingly used as an alternative way to collect linguistic annotations for tasks such as sentiment analysis, textual entailment, coreference resolution, and lexical semantics.
Crowdsourcing research is interdisciplinary. How do we build statistical models to aggregate the multiple labels collected for each sample? Or should we aggregate the "crowd wisdom" into a single gold label at all? In terms of human-computer interaction, how do we design intuitive and motivating tasks (e.g. Games with a Purpose)? From a cognitive point of view, what is the effect of individual differences among crowd workers? In addition, linguistic annotation is particularly complex because the items generally depend on each other, and the set of valid labels may not always be the same. How do we design a task to cope with that?
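To make the aggregation question concrete, the simplest baseline is majority voting over the labels collected for each item. The sketch below is purely illustrative (the items and labels are invented), not a method prescribed in the seminar:

```python
from collections import Counter

def majority_vote(labels):
    """Aggregate multiple crowd labels for one item by majority vote.

    Ties are broken in favour of the label seen first, since
    Counter.most_common preserves insertion order among equal counts.
    """
    return Counter(labels).most_common(1)[0][0]

# Hypothetical example: three workers label the sentiment of two sentences.
item_labels = {
    "sentence_1": ["positive", "positive", "negative"],
    "sentence_2": ["neutral", "neutral", "positive"],
}
gold = {item: majority_vote(lbls) for item, lbls in item_labels.items()}
```

More sophisticated aggregation models (e.g. ones that estimate per-worker reliability) are exactly the kind of approach discussed in the seminar readings.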
In this seminar, we will discuss papers on the aggregation of crowdsourced labels, agreement and disagreement in crowdsourced data, and the design of crowdsourcing tasks and corpora.
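Agreement between annotators is commonly quantified with chance-corrected measures such as Cohen's kappa. A minimal two-annotator sketch, with invented labels for illustration only:

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators: (p_o - p_e) / (1 - p_e),
    where p_o is observed agreement and p_e is the agreement
    expected by chance from each annotator's label distribution."""
    n = len(labels_a)
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    categories = set(labels_a) | set(labels_b)
    p_e = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n)
        for c in categories
    )
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations of four items by two workers.
kappa = cohens_kappa(["pos", "pos", "neg", "neg"],
                     ["pos", "neg", "neg", "neg"])
```

Note that kappa assumes exactly two annotators and a fixed label set; crowdsourced data with many workers and variable label sets, as discussed above, calls for measures such as Fleiss' kappa or Krippendorff's alpha.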