Paper ID: 2411.17426

CLOVer: Cross-Layer Orthonormal Vectors Adaption

Fanxu Meng, Muhan Zhang

To adapt a well-trained large model to downstream tasks, we propose constraining learning within its original latent space by leveraging linear combinations of its basis vectors. This approach ensures stable training without compromising the model's capabilities. Traditionally, constructing orthonormal bases from a matrix requires a transfer matrix, which significantly increases storage and computational overhead for parameters and feature maps. In this paper, we introduce Cross-Layer Orthonormal Vectors in the Q, K, V, and O matrices, enabling their orthogonalization without the need for transfer matrices. Furthermore, the CLOVer operation eliminates redundant vectors, reducing the encoder attention parameters of Whisper-large-v3 by 46.42% without requiring additional training. For parameter-efficient and stable fine-tuning, we orthonormalize Q, K, V, and O and fine-tune only the singular values, allowing efficient adaptation while constraining changes to the original latent space. When fine-tuning LLaMA-2-7B on eight commonsense reasoning datasets, our method outperforms LoRA by 5.4% and DoRA by 3.7%. CLOVer also forgets less previously learned knowledge when learning new knowledge.
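As an illustration of the cross-layer idea, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single attention head with no positional encoding applied between the query and key projections, so that the product of the two projections can be merged and factored by SVD. The function names (`orthogonalize_head`, `prune_head`), the toy dimensions, and the pruning tolerance are hypothetical choices made for this example.

```python
import numpy as np

def orthogonalize_head(w_q, w_k):
    """Factor w_q @ w_k.T as u @ diag(s) @ v.T with orthonormal columns in u and v.

    Assumption: w_q and w_k are the (D, d) projections of one head, so the
    merged (D, D) matrix has rank at most d and only d components are kept.
    """
    d = w_q.shape[1]
    merged = w_q @ w_k.T
    u, s, vt = np.linalg.svd(merged, full_matrices=False)
    return u[:, :d], s[:d], vt[:d, :].T

def prune_head(u, s, v, tol=1e-8):
    """Drop directions whose singular value is negligible relative to the largest."""
    keep = s > tol * s[0]
    return u[:, keep], s[keep], v[:, keep]

rng = np.random.default_rng(0)
D, d = 16, 4  # toy model width and head dimension (hypothetical values)
w_q = rng.standard_normal((D, d))
w_k = rng.standard_normal((D, d))
w_k[:, 3] = w_k[:, 2]  # make one key direction redundant on purpose

u, s, v = orthogonalize_head(w_q, w_k)
u_p, s_p, v_p = prune_head(u, s, v)

# Attention logits before and after the factorization, for random inputs.
x = rng.standard_normal((5, D))
scores_orig = x @ w_q @ w_k.T @ x.T
scores_new = (x @ u_p * s_p) @ (x @ v_p).T

print("kept directions:", s_p.shape[0], "of", d)
print("max logit difference:", np.abs(scores_orig - scores_new).max())
```

In this sketch the orthonormal factors `u_p` and `v_p` would stay frozen, and only the vector `s_p` would be treated as trainable, which mirrors the abstract's description of fine-tuning only the singular values. The pruning step shows how a redundant direction can be removed without changing the attention logits.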

Submitted: Nov 26, 2024