Moderation Model
Content moderation models aim to automatically identify and remove harmful or inappropriate content from online platforms, mitigating risks to users and reducing the burden on human moderators. Current research focuses on improving model robustness against adversarial attacks and bias, often through data augmentation, fusion models that combine different algorithms (e.g., KNN with LLMs), and the incorporation of community rules or platform guidelines directly into model training. These advances are central to building safer online environments and are improving the accuracy and efficiency of moderation systems across platforms and content types (text, images, multimedia).
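To make the fusion idea above concrete, here is a minimal sketch of a hybrid moderation scorer: a TF-IDF + KNN classifier whose probability is averaged with a simple rule-based score derived from keyword triggers. The training examples, keyword list, and function names are hypothetical placeholders, and the rule score stands in for the guideline- or LLM-based component a real system would use.

```python
"""Toy fusion-style moderation scorer: KNN over TF-IDF features,
averaged with a keyword rule score. All data and rules are illustrative."""

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical labeled examples (1 = violates guidelines, 0 = acceptable).
train_texts = [
    "I will hurt you if you post that again",
    "buy followers cheap, click this link now",
    "thanks for sharing, this was really helpful",
    "does anyone know a good tutorial for this?",
]
train_labels = [1, 1, 0, 0]

# Hypothetical community-rule keywords; a real system would encode
# platform guidelines far more carefully (or query an LLM instead).
rule_keywords = {"hurt you", "kill", "click this link"}

vectorizer = TfidfVectorizer()
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(vectorizer.fit_transform(train_texts), train_labels)


def rule_score(text: str) -> float:
    """Return 1.0 if any guideline keyword appears in the text, else 0.0."""
    lowered = text.lower()
    return 1.0 if any(kw in lowered for kw in rule_keywords) else 0.0


def moderation_score(text: str) -> float:
    """Average the KNN probability of the 'violation' class with the rule score."""
    knn_prob = knn.predict_proba(vectorizer.transform([text]))[0][1]
    return (knn_prob + rule_score(text)) / 2


if __name__ == "__main__":
    for post in ["click this link for free stuff", "great discussion, thank you"]:
        print(f"{post!r} -> {moderation_score(post):.2f}")
```

Averaging the two signals is only one possible fusion rule; a weighted combination or a learned meta-classifier over both scores would fit the same pattern.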