Instruction-Response Pairs
Instruction-response pairs are foundational to training large language models (LLMs) to follow instructions, a crucial step in building helpful and safe AI assistants. Current research focuses on improving the efficiency and quality of instruction tuning, exploring methods such as data augmentation, mixup regularization, and the use of unstructured text to generate high-quality training sets, often via techniques like response tuning or instruction pre-training. These advances aim to reduce reliance on expensive human annotation while improving model performance and safety, shaping both the development of more capable LLMs and their responsible deployment across applications.
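As a concrete illustration, the sketch below shows one common way an instruction-response pair can be represented and serialized into a supervised fine-tuning example. The field names, the Alpaca-style prompt template, and the `to_training_example` helper are illustrative assumptions for this sketch, not the exact format used by any of the papers listed here.

```python
# A minimal sketch of how instruction-response pairs are often structured
# and rendered into (prompt, completion) examples for instruction tuning.
# The field names and prompt template below are assumptions chosen for
# illustration, not the format of any specific dataset or paper.

from dataclasses import dataclass


@dataclass
class InstructionResponsePair:
    instruction: str   # the task the model should perform
    response: str      # the desired model output
    input: str = ""    # optional extra context for the task


PROMPT_TEMPLATE = (
    "### Instruction:\n{instruction}\n\n"
    "{maybe_input}"
    "### Response:\n"
)


def to_training_example(pair: InstructionResponsePair) -> dict:
    """Render a pair into the (prompt, completion) form used for
    supervised fine-tuning; the loss is typically computed only
    on the completion tokens."""
    maybe_input = f"### Input:\n{pair.input}\n\n" if pair.input else ""
    prompt = PROMPT_TEMPLATE.format(
        instruction=pair.instruction, maybe_input=maybe_input
    )
    return {"prompt": prompt, "completion": pair.response}


if __name__ == "__main__":
    pair = InstructionResponsePair(
        instruction="Summarize the following paragraph in one sentence.",
        input="Instruction-response pairs teach LLMs to follow instructions...",
        response="Instruction-response pairs are training examples that "
                 "teach LLMs to follow user instructions.",
    )
    example = to_training_example(pair)
    print(example["prompt"] + example["completion"])
```

Methods surveyed above differ mainly in where such pairs come from: data augmentation and constraint back-translation synthesize new pairs from existing ones, while approaches that leverage unstructured text mine or generate pairs from raw corpora to reduce dependence on human annotation.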
Papers
SelfCodeAlign: Self-Alignment for Code Generation
Yuxiang Wei, Federico Cassano, Jiawei Liu, Yifeng Ding, Naman Jain, Zachary Mueller, Harm de Vries, Leandro von Werra, Arjun Guha, Lingming Zhang
Constraint Back-translation Improves Complex Instruction Following of Large Language Models
Yunjia Qi, Hao Peng, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li