Arithmetic Task
Task arithmetic modifies pre-trained models by arithmetically combining the weights obtained from fine-tuning on different tasks, with the aim of improving model efficiency and performance. Current work centers on understanding the mechanisms that make task arithmetic work, particularly weight disentanglement, and on applying it to tasks such as speech translation, arithmetic reasoning, and speech recognition, often with transformer-based models. These studies aim to improve generalization, mitigate catastrophic forgetting and the synthetic-to-real gap, and reduce the cost of adapting large language models to new domains. The findings bear on both the theoretical understanding of model behavior and the practical development of more efficient, adaptable AI systems.
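The weight arithmetic described above can be illustrated with a minimal sketch. It assumes the common formulation in which a "task vector" is the element-wise difference between fine-tuned and pre-trained weights, and a merged model is the pre-trained weights plus a scaled sum of task vectors. The function names `task_vector` and `apply_task_vectors`, the scaling coefficient `alpha`, and the toy weights are illustrative assumptions, not taken from any of the listed papers; weights are plain dicts of float lists rather than a real framework's tensors.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat weights.
Weights = Dict[str, List[float]]


def task_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Element-wise difference: fine-tuned weights minus pre-trained weights."""
    return {
        name: [f - p for p, f in zip(pretrained[name], finetuned[name])]
        for name in pretrained
    }


def apply_task_vectors(
    pretrained: Weights, vectors: List[Weights], alpha: float = 1.0
) -> Weights:
    """Add each task vector, scaled by alpha, to a copy of the pre-trained weights.

    A negative alpha subtracts the task vector instead of adding it.
    """
    merged = {name: list(values) for name, values in pretrained.items()}
    for vec in vectors:
        for name, delta in vec.items():
            merged[name] = [w + alpha * d for w, d in zip(merged[name], delta)]
    return merged


# Toy weights (assumed values, chosen only for illustration).
pretrained = {"layer.weight": [1.0, 2.0, 3.0]}
finetuned_a = {"layer.weight": [1.5, 2.0, 3.0]}  # fine-tuned on task A
finetuned_b = {"layer.weight": [1.0, 2.0, 2.0]}  # fine-tuned on task B

tau_a = task_vector(pretrained, finetuned_a)
tau_b = task_vector(pretrained, finetuned_b)

# Combine both tasks into one set of weights.
merged = apply_task_vectors(pretrained, [tau_a, tau_b], alpha=1.0)

# Subtract task A's vector from the pre-trained weights.
negated = apply_task_vectors(pretrained, [tau_a], alpha=-1.0)
```

In this toy case the two task vectors touch different coordinates, so adding them does not interfere. Real fine-tuned models rarely separate this cleanly, which is why the weight-disentanglement question mentioned above matters in practice.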
Papers
Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
Guhao Feng, Bohang Zhang, Yuntian Gu, Haotian Ye, Di He, Liwei Wang
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
Alessandro Stolfo, Yonatan Belinkov, Mrinmaya Sachan