German Research Center for Artificial Intelligence

Keynote Title: Who is Afraid of Spurious Correlation in Classification?


In NLP we are often interested in training classifiers for tasks where the signal we are trying to capture is very subtle and weak, competing with many other signals in the data. Cases in point include stylometry, authorship and gender prediction, bias detection, etc. Although neural classifiers in particular often achieve surprisingly strong results in such applications, it is frequently an open research question (and a worry) whether, and if so to what extent, spurious correlations in the training data are responsible for some of these high classification accuracies. In this talk I focus on ways of quantifying the possible impact of spurious correlations in the training data on translationese classification. Translationese is a cover term for the subtle but systematic linguistic differences between texts produced by (high-quality, professional) translation and texts of the same genre and style originally authored in the target language of the translations.

My talk is based on joint work with Angana Borah, Daria Pylypenko and Cristina España-Bonet:
Measuring Spurious Correlation in Classification: "Clever Hans" in Translationese. Angana Borah, Daria Pylypenko, Cristina España-Bonet and Josef van Genabith. Recent Advances in Natural Language Processing (RANLP 2023), Bulgaria, pp. 196-206. arXiv:2308.13170 [cs.CL].


Josef van Genabith is one of the Scientific Directors of the German Research Center for Artificial Intelligence (DFKI), where he heads the Multilingual Language Technologies (MLT) Lab. He is a Professor at Saarland University, where he holds the Chair of Translation Oriented Language Technologies. He was the founding Director of CNGL, the Centre for Next Generation Localisation (now ADAPT), Director of the National Center for Language Technology (NCLT), and an Associate Professor, Senior Lecturer and Lecturer in the School of Computing at Dublin City University (DCU), Ireland. He worked as a postdoctoral researcher at IMS, University of Stuttgart, Germany, and obtained his PhD and MA from the University of Essex, U.K. His first degree, in Electronic Engineering and English, is from RWTH Aachen, Germany.

