Self-Restraint
Self-restraint in artificial intelligence refers to methods for controlling and regulating the behavior of large language models (LLMs) so that they avoid undesirable outputs such as hallucinations or harmful content. Current research explores self-reflection and iterative self-evaluation, in which a model assesses its own responses and revises them accordingly, as well as gradient-based control mechanisms that steer generation toward desired behaviors without extensive human annotation. These advances are important for the safe and responsible deployment of LLMs, improving their reliability and trustworthiness across applications.
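To make the iterative self-evaluation idea concrete, the sketch below shows a minimal generate-critique-revise loop. It is a hedged illustration, not any specific paper's method: the `llm` callable, the `self_refine` name, the prompt wording, and the `max_rounds` cap are all assumptions introduced for this example.

```python
from typing import Callable


def self_refine(
    llm: Callable[[str], str],  # hypothetical text-in/text-out model interface
    prompt: str,
    max_rounds: int = 3,
) -> str:
    """Ask the model to critique and revise its own answer for a few rounds."""
    answer = llm(prompt)
    for _ in range(max_rounds):
        # The model evaluates its own draft for errors or harmful content.
        critique = llm(
            "Review the answer below for factual errors, unsupported claims, "
            f"or harmful content.\n\nQuestion: {prompt}\nAnswer: {answer}\n\n"
            "Reply with 'OK' if the answer is acceptable; otherwise list the problems."
        )
        if critique.strip().upper().startswith("OK"):
            break  # the model judges its own output acceptable; stop refining
        # Otherwise, the model rewrites the answer using its own critique.
        answer = llm(
            f"Question: {prompt}\nDraft answer: {answer}\n"
            f"Problems found: {critique}\n"
            "Rewrite the answer, fixing the listed problems."
        )
    return answer
```

In practice, the `llm` argument would wrap whatever model interface is in use; keeping the critique and revision as separate calls is what lets the loop stop early once the model's self-assessment no longer flags problems.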