COURAGE @ EVALITA 2020
A key concern of the COURAGE project is to identify and tackle hate speech in social media. Like many problems in Natural Language Processing (NLP), this can be framed as a binary classification task: a posting is either considered to express hate speech or it is not. Despite some amazing progress in a wide range of NLP applications in recent years, hate speech detection remains a very challenging task.
Before looking at how this task might be addressed, let us step back and take stock of what has been happening in NLP over the last two years or so. The field has undergone a revolutionary shift away from the traditional statistical methods that used to represent the state of the art in most problem areas and towards neural network-based approaches. These have raised performance on more or less all common NLP tasks, not just by a bit but by substantial margins. Perhaps the single most important milestone on this journey was the release of BERT, a contextual language model trained on billions of words that captures regularities of language(s) and serves as the backbone for fine-tuning on individual NLP tasks.
Against this backdrop, Julia Hoffmann, an MA student in Information Science at the University of Regensburg, developed an ensemble classifier that taps into contextual embeddings from a multilingual BERT model. It serves as a reference architecture on our way to robust hate speech detection across languages (in COURAGE we are concerned with Italian, Spanish, German and English).
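This post does not say how the ensemble combines its members, so the following is only a rough illustration under our own assumptions, not the submitted system. It assumes soft voting: each member (in practice perhaps a classifier built on multilingual BERT embeddings, stubbed here with fixed scores) returns a probability that a posting is hate speech. The probabilities are averaged, and the average is thresholded at an assumed 0.5 to give the binary label described earlier. The function name `ensemble_predict` and the threshold are hypothetical.

```python
from statistics import mean
from typing import Callable, Sequence

# A member maps a posting to an estimated P(hate speech) in [0, 1].
# Hypothetical stand-in for, e.g., a classifier on multilingual BERT embeddings.
Member = Callable[[str], float]


def ensemble_predict(members: Sequence[Member], posting: str,
                     threshold: float = 0.5) -> int:
    """Soft-voting ensemble (an illustrative assumption, not the paper's method).

    Averages the members' probabilities and returns 1 (hate speech)
    if the average reaches the assumed threshold, otherwise 0.
    """
    if not members:
        raise ValueError("the ensemble needs at least one member")
    avg_prob = mean(member(posting) for member in members)
    return 1 if avg_prob >= threshold else 0
```

For example, three stub members scoring a posting at 0.8, 0.3 and 0.7 average to about 0.6, so the ensemble labels it as hate speech:

```python
members = [lambda posting: 0.8, lambda posting: 0.3, lambda posting: 0.7]
print(ensemble_predict(members, "an example posting"))  # prints 1
```

Averaging probabilities lets a confident member outweigh two uncertain ones. Hard majority voting over 0/1 labels would be an equally plausible alternative.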
The perfect testbed for this work turned out to be EVALITA 2020, the seventh edition of an NLP evaluation campaign for Italian that goes back to 2007. In 2018, EVALITA introduced a hate speech detection challenge for Italian social media. Its success led to a second edition of the challenge in 2020, now called HaSpeeDe 2, and that's what we signed up for.
The ensemble classifier we submitted proved robust: it not only scored well on social media data, but when tested on news headlines it was also ranked 6th out of 27 submissions. We are looking forward to the workshop itself, which will be held online this year. Put the dates in your diary: it takes place on 16-17 December, and it's a chance to catch up with Julia.
Julia Hoffmann and Udo Kruschwitz (2020). "UR NLP @ HaSpeeDe 2 at EVALITA 2020: Towards Robust Hate Speech Detection with Contextual Embeddings". In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR-WS.org.