Abusive Language Classifier

Abusive language classifiers automatically identify hateful or offensive language online; research on them focuses primarily on improving accuracy and fairness. Current work emphasizes enhancing model generalizability across diverse demographics and domains, mitigating biases that disproportionately affect certain groups, and developing methods to explain classifier decisions and detect spurious correlations. These efforts are crucial for reducing the harms of online abuse and promoting safer online environments, shaping both content moderation practices and the broader understanding of bias in machine learning.
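To make the task concrete, below is a minimal sketch of such a classifier using a standard TF-IDF plus logistic-regression baseline in scikit-learn. The tiny inline dataset and the specific feature settings are hypothetical choices for illustration; published systems are trained on benchmark corpora and evaluated for the fairness and generalization properties described above.

```python
# Minimal sketch of an abusive-language classifier:
# a TF-IDF + logistic-regression baseline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy examples: 1 = abusive, 0 = not abusive.
texts = [
    "you are a wonderful person",
    "I hate you, get lost",
    "thanks for the helpful answer",
    "nobody wants your garbage opinion",
]
labels = [0, 1, 0, 1]

# Character n-grams within word boundaries are a common choice for
# abuse detection because they tolerate obfuscated spellings
# (e.g. "h@te"); this is an illustrative setting, not a prescription.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(),
)
model.fit(texts, labels)

print(model.predict(["have a great day", "I hate you so much"]))
```

A baseline like this is exactly where the problems discussed above arise: it can latch onto spurious lexical correlations (for example, identity terms that co-occur with abuse in the training data), which is why current research pairs such models with bias audits and explanation methods.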

Papers