Table 2 Challenges of online hate detection

From: Developing an online hate classifier for multiple social media platforms

False positive problem: False positives occur when a model labels a non-threatening expression as hateful because of the presence of certain words or phrases used as features. For example, a tweet such as “Bill aims to fix sex-offender list’s inequity toward gay men” can be labeled as hateful, whereas in reality it is not an offensive expression but a simple statement
False negative problem: False negatives occur when the model classifies a threatening expression as non-threatening. For example, a keyword detector could correctly detect “I fucking hate Donald Trump” but ignore “Donald Trump is a rat”. In reality, both of these expressions can be considered hateful
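Both failure modes above can be reproduced with a toy keyword-based detector. The sketch below is illustrative only: the lexicon and the matching rule are assumptions, not the method used in the paper.

```python
# Toy keyword-based hate detector. The lexicon below is an invented
# assumption chosen to reproduce the failure modes described in the table.
HATE_KEYWORDS = {"hate", "sex-offender"}

def keyword_detect(text: str) -> bool:
    """Flag text as hateful if any lexicon entry occurs as a substring."""
    lowered = text.lower()
    return any(kw in lowered for kw in HATE_KEYWORDS)

# False positive: a benign statement is flagged because it contains a keyword.
print(keyword_detect("Bill aims to fix sex-offender list's inequity toward gay men"))  # True

# Correct detection: the slur-adjacent keyword "hate" is present.
print(keyword_detect("I fucking hate Donald Trump"))  # True

# False negative: a dehumanizing comparison with no lexicon word is missed.
print(keyword_detect("Donald Trump is a rat"))  # False
```

The errors come from the representation, not the classifier: any model that relies only on surface keywords inherits both failure modes, which is why context-aware features are needed.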
Subjectivity: The datasets can involve subjectivity arising from several sources. Crowd raters may not understand the context or may fail to follow instructions. There can be high disagreement about what constitutes hate, and various biases, such as racial bias [66, 110], can occur when constructing ground-truth datasets. Sarcasm and humor further exacerbate the problem, as individuals’ ability to interpret these types of language varies greatly
Polysemy: Polysemy, i.e., the same word or phrase having a different meaning in different contexts (e.g., social media communities or platforms), can greatly complicate the detection of online hate, as it introduces contextuality that the model should be aware of