Non-Toxic
Research on "non-toxic" language focuses on detecting and mitigating harmful content generated by large language models (LLMs), particularly toxic, biased, and offensive language. Current efforts concentrate on building robust detection models, typically on top of transformer encoders such as BERT or on LLMs themselves; on methods that reduce toxicity during model training and prompting; and on comprehensive benchmark datasets covering diverse languages and cultural contexts. This work is crucial for the safe and ethical deployment of LLMs: it mitigates the risk of harmful content generation and supports responsible AI development.
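To make the detection side concrete, the sketch below runs a fine-tuned BERT-style toxicity classifier over candidate texts. It assumes the Hugging Face transformers library is installed; the checkpoint name "unitary/toxic-bert" is one publicly available example used for illustration, not a model prescribed by the papers listed here.

    # Minimal sketch of transformer-based toxicity detection.
    # Assumes: `pip install transformers torch`; the checkpoint
    # "unitary/toxic-bert" is an illustrative public example.
    from transformers import pipeline

    detector = pipeline("text-classification", model="unitary/toxic-bert")

    for text in ["Have a great day!", "You people make me sick."]:
        result = detector(text)[0]
        # The pipeline returns a label plus a confidence score; a real
        # moderation system would threshold the score before flagging.
        print(f"{text!r} -> {result['label']} ({result['score']:.2f})")

The same pattern extends to the multilingual benchmarks mentioned above by swapping in a multilingual checkpoint and evaluation data.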
Papers
Twelve papers, dated from November 15, 2021 to October 6, 2022 (titles and links not preserved).