Content Moderation Software
Content moderation software aims to automatically identify and filter harmful online content, such as hate speech and misinformation, across media types including text, images, and video. Current research focuses on mitigating biases against marginalized groups, improving detection of subtle or disguised toxic content (e.g., implicit hate speech or text embedded in images), and developing more robust defenses against "jailbreaking" of large language models. These advances are central to safer online environments and fairer algorithmic decision-making, shaping both the design of more equitable AI systems and the day-to-day management of online platforms.
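At its core, the text side of such a system is a classifier applied behind a decision threshold. The sketch below illustrates that pattern; it is a minimal example assuming the Hugging Face transformers library and a publicly available toxicity model. The model name ("unitary/toxic-bert"), the 0.5 threshold, and the "toxic" label are illustrative assumptions, not details taken from any specific paper.

# Minimal sketch of a threshold-based text moderation filter.
# Assumes the Hugging Face "transformers" library; the model name and
# its output labels are illustrative and vary by model.
from transformers import pipeline

classifier = pipeline("text-classification", model="unitary/toxic-bert")

def moderate(texts, threshold=0.5):
    """Return the subset of texts whose predicted toxicity exceeds the threshold."""
    flagged = []
    for text, result in zip(texts, classifier(texts)):
        # Each result is a dict like {"label": "toxic", "score": 0.98};
        # the label set depends on the chosen model.
        if result["label"].lower() == "toxic" and result["score"] >= threshold:
            flagged.append(text)
    return flagged

if __name__ == "__main__":
    print(moderate(["Have a great day!", "I hate all of you."]))

The threshold is the main operational knob: raising it reduces false positives at the cost of letting more borderline content through, which is the same coverage-versus-fairness trade-off that the bias-mitigation research described above must navigate.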