Content Moderation
Content moderation aims to identify and remove harmful or inappropriate content from online platforms, balancing freedom of expression with the need for safe online environments. Current research focuses on leveraging large language models (LLMs) and transformer-based architectures, often incorporating multimodal data (text, images, video, audio) and contextual information, to improve the accuracy and fairness of detecting and mitigating harmful content such as hate speech, misinformation, and material inappropriate for children. This field is crucial for maintaining healthy online communities and is driving advancements in AI, particularly in areas like bias detection, explainable AI, and efficient model deployment for resource-constrained environments.
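As a rough illustration of the text-detection task described above, the following minimal sketch uses an off-the-shelf zero-shot classifier from Hugging Face transformers to map a piece of text onto policy labels. The model checkpoint, label set, and threshold are illustrative assumptions for demonstration only, not the method of either paper listed below.

```python
# Minimal sketch: transformer-based harmful-content detection via
# zero-shot classification. Model choice and policy labels are
# illustrative assumptions, not taken from the papers listed below.
from transformers import pipeline

# NLI-based zero-shot classifier; any compatible checkpoint works.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

POLICY_LABELS = [
    "hate speech",
    "misinformation",
    "content inappropriate for children",
    "benign",
]

def moderate(text: str, threshold: float = 0.5) -> dict:
    """Return the most likely policy label and whether the text should be flagged."""
    result = classifier(text, candidate_labels=POLICY_LABELS)
    top_label, top_score = result["labels"][0], result["scores"][0]
    return {
        "label": top_label,
        "score": top_score,
        "flagged": top_label != "benign" and top_score >= threshold,
    }

if __name__ == "__main__":
    print(moderate("You people are subhuman and should disappear."))
```

In practice, LLM-based moderators are typically prompted with the platform's policy text and the item to review, but the classification structure (item in, policy verdict out) is the same as in this sketch.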
Papers
Socio-Culturally Aware Evaluation Framework for LLM-Based Content Moderation
Shanu Kumar, Gauri Kholkar, Saish Mendke, Anubhav Sadana, Parag Agrawal, Sandipan Dandapat
Zero-Shot Image Moderation in Google Ads with LLM-Assisted Textual Descriptions and Cross-modal Co-embeddings
Enming Luo, Wei Qiao, Katie Warren, Jingxiang Li, Eric Xiao, Krishna Viswanathan, Yuan Wang, Yintao Liu, Jimin Li, Ariel Fuxman
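To make the "cross-modal co-embedding" idea named in the title above concrete, here is a hypothetical sketch that scores an image against textual policy descriptions in a shared embedding space, using an open CLIP checkpoint as a stand-in. The checkpoint, policy descriptions, and file name are assumptions; this is not the Google Ads production pipeline described in the paper.

```python
# Hypothetical sketch of zero-shot image moderation via cross-modal
# co-embeddings: embed policy descriptions and the image jointly, then
# pick the best-matching description. Open CLIP checkpoint used as a
# stand-in; not the system from the paper above.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Illustrative policy descriptions; a real system might generate richer
# descriptions with an LLM, as the paper's title suggests.
POLICY_TEXTS = [
    "an advertisement depicting graphic violence",
    "an advertisement containing adult or sexual content",
    "a benign, policy-compliant advertisement",
]

def moderate_image(path: str) -> dict:
    """Score an ad image against each policy description and return the best match."""
    image = Image.open(path).convert("RGB")
    inputs = processor(text=POLICY_TEXTS, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Softmax over image-text similarity logits gives per-description scores.
    probs = outputs.logits_per_image.softmax(dim=-1)[0]
    best = int(probs.argmax())
    return {"label": POLICY_TEXTS[best], "score": float(probs[best])}

if __name__ == "__main__":
    print(moderate_image("ad_creative.jpg"))  # hypothetical input file
```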