Table 2 Challenges of online hate detection

From: Developing an online hate classifier for multiple social media platforms

False positive problem: False positives occur when a model labels a non-threatening expression as hateful because of the presence of certain words or phrases used as features. For example, a tweet such as “Bill aims to fix sex-offender list’s inequity toward gay men” can be labeled as hateful, whereas in reality it is not an offensive expression but a simple statement
False negative problem: False negatives occur when the model classifies a threatening expression as non-threatening. For example, a keyword detector could correctly detect “I fucking hate Donald Trump” but ignore “Donald Trump is a rat”. In reality, both of these expressions can be considered hateful
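Both failure modes above can be reproduced with a toy keyword-based detector. The sketch below is illustrative only: the lexicon and the matching rule are assumptions, not the method used in the paper.

```python
# Toy keyword-based hate detector. The lexicon below is an invented
# assumption chosen to reproduce the failure modes described in the table.
HATE_KEYWORDS = {"hate", "sex-offender"}

def keyword_detect(text: str) -> bool:
    """Flag text as hateful if any lexicon entry occurs as a substring."""
    lowered = text.lower()
    return any(kw in lowered for kw in HATE_KEYWORDS)

# False positive: a benign statement is flagged because it contains a keyword.
print(keyword_detect("Bill aims to fix sex-offender list's inequity toward gay men"))  # True

# Correct detection: the slur-adjacent keyword "hate" is present.
print(keyword_detect("I fucking hate Donald Trump"))  # True

# False negative: a dehumanizing comparison with no lexicon word is missed.
print(keyword_detect("Donald Trump is a rat"))  # False
```

The errors come from the representation, not the classifier: any model that relies only on surface keywords inherits both failure modes, which is why context-aware features are needed.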
Subjectivity: The datasets can involve subjectivity arising from several sources. Crowd raters may not understand the context or may fail to follow instructions. There can be high disagreement about what constitutes hate, and various biases, such as racial bias [66, 110], can occur when constructing ground-truth datasets. Sarcasm and humor further exacerbate the problem, as individuals’ ability to interpret these types of language varies greatly
Polysemy: Polysemy, i.e., the same word or phrase having a different meaning in different contexts (e.g., social media communities or platforms), can greatly complicate the detection of online hate, as it introduces contextuality that the model should be aware of