On the Normalization of the Experimental Design of Multilingual Toxic Content Detection

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "On the Normalization of the Experimental Design of Multilingual Toxic 
Content Detection"

By

Miss Nedjma Djouhra OUSIDHOUM


Abstract

With the expanding use of social media platforms such as Twitter and the 
growing amount of text data generated online, hate speech has been shown 
to negatively affect individuals in general, and marginalized communities 
in particular. In order to improve the online moderation process, there 
has been an increasing need for accurate detection tools that not only 
flag bad words but also help filter out toxic content in a more nuanced 
fashion. Hence, a problem of central importance is to acquire 
higher-quality data on which to train toxic content detection models. 
However, the absence of a universal definition of hate speech makes the 
collection process hard and the resulting training corpora sparse, 
imbalanced, and challenging for current machine learning techniques. In 
this thesis, we address the problem of automatic toxic content detection 
along three main axes: (1) the construction of resources that robust 
toxic language and hate speech detection systems currently lack, (2) the 
study of bias in hate speech and toxic language classifiers, and (3) the 
assessment of inherent harmful biases within NLP systems by looking into 
the Large Pre-trained Language Models (PTLMs) at the core of these 
systems.

In order to train a multi-cultural, fine-grained hate speech and toxic 
content detection system, we have built a new multi-aspect hate speech 
dataset in English, French, and Arabic. We also provide a detailed 
annotation scheme that indicates (a) whether a tweet is direct or 
indirect; (b) whether it is offensive, disrespectful, hateful, fearful 
out of ignorance, abusive, or normal; (c) the attribute based on which it 
discriminates against an individual or a group of people; (d) the name of 
this group; and (e) how annotators feel about this tweet on a range of 
negative to neutral sentiments. We define a classification task for each 
labeled aspect and use multi-task learning to investigate how such a 
paradigm can improve the detection process.
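
As an illustration, a multi-task setup of this kind can be sketched as a 
shared text encoder with one classification head per annotated aspect. 
The outline below is a hypothetical PyTorch sketch, not the thesis's 
actual implementation; the encoder, task names, and label counts are 
placeholders.

import torch.nn as nn

class MultiTaskToxicityClassifier(nn.Module):
    """Shared encoder with one linear head per annotated aspect
    (e.g. directness, hostility type, target attribute, target group,
    annotator sentiment). Illustrative only."""

    def __init__(self, encoder, hidden_dim, labels_per_task):
        super().__init__()
        self.encoder = encoder  # any module mapping tokens -> (batch, hidden_dim)
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden_dim, n_labels)
            for task, n_labels in labels_per_task.items()
        })

    def forward(self, inputs):
        hidden = self.encoder(inputs)  # shared representation
        # one prediction per task; per-task losses are summed during training
        return {task: head(hidden) for task, head in self.heads.items()}

Summing the per-task cross-entropy losses lets the shared encoder learn 
from all annotated aspects at once, which is the intuition behind using 
multi-task learning here.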

Unsurprisingly, when testing the detection system, imbalanced data, 
implicit toxic content, and misleading instances result in false 
positives and false negatives. We examine misclassification instances due 
to the frequently neglected yet deep-rooted selection bias caused by the 
data collection process. In contrast to work on bias that typically 
focuses on classification performance, we investigate another source of 
bias and present two language- and label-agnostic evaluation metrics, 
based on topic models and semantic similarity measures, to evaluate the 
extent of this problem across various datasets.
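
One plausible instantiation of such a dataset-level comparison (purely 
illustrative; the thesis's actual metrics are not reproduced here) is to 
fit a topic model on each corpus and measure how much their topics 
overlap. The sketch below uses LDA and Jaccard overlap; an 
embedding-based similarity could replace Jaccard for a more semantic 
view, and the English stop-word list is a simplification.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def top_topic_words(docs, n_topics=10, n_words=15):
    # Fit LDA on one corpus and return the top words of each topic.
    vec = CountVectorizer(max_features=5000, stop_words="english")
    counts = vec.fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(counts)
    vocab = np.array(vec.get_feature_names_out())
    return [set(vocab[topic.argsort()[-n_words:]]) for topic in lda.components_]

def topic_overlap(docs_a, docs_b):
    # Mean best-match Jaccard overlap between the two corpora's topics:
    # a rough, label-agnostic proxy for how much their sampled content
    # overlaps, i.e. how similarly the two datasets were collected.
    topics_a = top_topic_words(docs_a)
    topics_b = top_topic_words(docs_b)
    return float(np.mean([
        max(len(a & b) / len(a | b) for b in topics_b) for a in topics_a
    ]))
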
Furthermore, since research generally focuses on English and overlooks 
other languages, we notice a gap in content moderation across languages 
and cultures, especially in low-resource settings. Hence, we leverage the 
observed differences and correlations across languages, datasets, and 
annotation schemes to carry out a study of multilingual toxic language 
data and how people react to it.

Finally, despite their incontestable usefulness and effectiveness, Large 
Pre-trained Language Models (PTLMs), which are at the center of all major 
NLP systems nowadays, have been shown to carry and reproduce harmful 
biases, due among other reasons to the sources of their training data. We 
propose a methodology to probe the potentially harmful content that they 
convey with respect to a set of templates, and report how often they 
enable toxicity towards specific communities in English, French, and 
Arabic.
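
As a rough illustration of template-based probing (a hypothetical 
sketch, not the thesis's protocol), one can fill cloze templates that 
mention a community and count how often a masked language model's top 
completions fall into a toxic word list. The model, template, and tiny 
lexicon below are all assumptions chosen for brevity.

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Hypothetical template and toy toxic lexicon, for illustration only.
TEMPLATES = ["People who are {group} are so [MASK]."]
TOXIC_WORDS = {"stupid", "dangerous", "dirty", "violent"}

def toxicity_rate(group, top_k=20):
    # Fraction of top-k completions that land in the toxic word list.
    hits, total = 0, 0
    for template in TEMPLATES:
        predictions = fill_mask(template.format(group=group), top_k=top_k)
        hits += sum(p["token_str"].strip() in TOXIC_WORDS for p in predictions)
        total += top_k
    return hits / total

print(toxicity_rate("immigrants"))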

The results presented in this thesis show that, despite the complexity of 
such tasks, there are promising paths to explore in order to improve the 
automatic detection, evaluation, and eventual mitigation of toxic content 
in NLP.


Date:			Monday, 5 July 2021

Time:			4:00pm - 6:00pm

Zoom Meeting: 
https://hkust.zoom.us/j/94560105386?pwd=MUFBWTR3VTU2eSs4R3hhWXNiY2R4dz09

Chairperson:		Prof. Hongtao ZHANG (ISOM)

Committee Members:	Prof. Yangqiu SONG (Supervisor)
 			Prof. Dit-Yan YEUNG (Supervisor)
 			Prof. Brian MAK
 			Prof. Nevin ZHANG
 			Prof. Pascale FUNG (ECE)
 			Prof. Preslav NAKOV (Hamad Bin Khalifa University and QCRI)


**** ALL are Welcome ****