Weak-to-Strong
Weak-to-strong (W2S) generalization studies how a powerful model (the "strong student") can learn effectively from a less capable model (the "weak teacher"), even when the teacher's guidance is imperfect or incomplete. Current research focuses on improving W2S learning for large language models (LLMs) and other machine learning architectures, often employing ensemble methods, self-training, and adaptive weighting schemes to mitigate the impact of noisy or limited teacher supervision (a minimal sketch of one such recipe appears below). This research area is significant because it addresses the challenge of efficiently training and aligning increasingly complex AI systems, particularly where high-quality labeled data is difficult or expensive to obtain, and it has implications for the reliability and safety of advanced AI.
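To make the self-training and weighting ideas concrete, here is a minimal, illustrative PyTorch sketch (in the spirit of the auxiliary confidence loss of Burns et al., 2023, not any one paper's exact method): the strong student is trained on the weak teacher's pseudo-labels, blended with its own hardened predictions so a confident student can override noisy supervision. The model sizes, synthetic data, and the alpha weighting are all stand-in assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-ins for the two models: in real W2S experiments the teacher is a
# small model fine-tuned on ground-truth labels and the student is a much
# larger pretrained model. The sizes here are illustrative only.
weak_teacher = nn.Linear(16, 2)
strong_student = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(strong_student.parameters(), lr=1e-3)

def w2s_loss(student_logits, weak_labels, alpha=0.5):
    """Blend weak supervision with self-training (alpha is illustrative).

    (1 - alpha) * CE against the teacher's hard pseudo-labels, plus
    alpha * CE against the student's own hardened predictions, so a
    confident student is not forced to imitate noisy teacher labels.
    """
    teacher_term = F.cross_entropy(student_logits, weak_labels)
    self_targets = student_logits.argmax(dim=-1).detach()  # hardened self-labels
    self_term = F.cross_entropy(student_logits, self_targets)
    return (1 - alpha) * teacher_term + alpha * self_term

for step in range(100):
    x = torch.randn(32, 16)  # unlabeled inputs (synthetic for this sketch)
    with torch.no_grad():
        weak_labels = weak_teacher(x).argmax(dim=-1)  # teacher pseudo-labels
    loss = w2s_loss(strong_student(x), weak_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Ensembling and adaptive weighting fit the same template: average pseudo-labels from several weak teachers, or scale each example's teacher_term by the teacher's confidence, before taking the gradient step.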
Papers
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
M. Emrullah Ildiz, Halil Alperen Gozeten, Ege Onur Taga, Marco Mondelli, Samet Oymak
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang, Pengfei Liu, Rui Wang
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
Wenkai Yang, Shiqi Shen, Guangyao Shen, Wei Yao, Yong Liu, Zhi Gong, Yankai Lin, Ji-Rong Wen
On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion
Chenghao Fan, Zhenyi Lu, Wei Wei, Jie Tian, Xiaoye Qu, Dangyang Chen, Yu Cheng