Content Moderation
Content moderation aims to identify and remove harmful or inappropriate content from online platforms, balancing freedom of expression against the need for safe online environments. Current research focuses on large language models (LLMs) and transformer-based architectures, often incorporating multimodal data (text, images, video, audio) and contextual signals to improve the accuracy and fairness of detecting and mitigating harmful content such as hate speech, misinformation, and material that is unsuitable for children. The field is central to maintaining healthy online communities and is driving advances in AI, particularly in bias detection, explainable AI, and efficient model deployment in resource-constrained environments.
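As a concrete illustration of the transformer-based detection approach described above, the sketch below runs comments through an off-the-shelf toxicity classifier. It is a minimal example assuming the publicly available unitary/toxic-bert checkpoint and Hugging Face's transformers text-classification pipeline; it is not the method of either paper listed below, and the 0.5 threshold and flagging policy are illustrative choices.

```python
# Minimal sketch: transformer-based harmful-content detection.
# Assumes the publicly available "unitary/toxic-bert" checkpoint from the
# Hugging Face Hub; any fine-tuned moderation classifier could be substituted.
from transformers import pipeline

# Load a text-classification pipeline backed by a BERT-style encoder
# fine-tuned for toxicity detection.
classifier = pipeline("text-classification", model="unitary/toxic-bert")

def moderate(comment: str, threshold: float = 0.5) -> bool:
    """Flag a comment for review if its top toxicity score exceeds the threshold.

    The model emits only toxicity-type labels (e.g. 'toxic', 'insult'),
    so a low top score indicates benign text. Threshold is illustrative.
    """
    result = classifier(comment)[0]  # e.g. {'label': 'toxic', 'score': 0.97}
    return result["score"] >= threshold

if __name__ == "__main__":
    for text in ["Have a great day!", "You are a worthless idiot."]:
        print(f"{text!r} -> flagged={moderate(text)}")
```

In practice, production systems add context (conversation history, user metadata), multimodal signals, and fairness audits on top of such a classifier, which is where much of the research summarized above concentrates.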
Papers
SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
Josef Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang
Watching the Watchers: A Comparative Fairness Audit of Cloud-based Content Moderation Services
David Hartmann, Amin Oueslati, Dimitri Staufer