Memory-Efficient Fine-Tuning

Memory-efficient fine-tuning focuses on adapting large pre-trained language and vision models to specific downstream tasks while minimizing the computational resources and memory required. Current research emphasizes techniques like low-rank adaptation (LoRA), quantization (e.g., 2-bit, 4-bit), and selective parameter updates (e.g., freezing layers, using adapters), often combined with strategies like reversible networks or approximate backpropagation. These advancements are crucial for deploying large models on resource-constrained devices and making advanced AI accessible to a wider range of users and applications, reducing both the financial and environmental costs of training and inference.
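
As a rough illustration of the low-rank adaptation idea mentioned above, the sketch below wraps a frozen linear layer with a small trainable low-rank update, so only a fraction of the parameters receive gradients. This is a minimal PyTorch-style example; the class name, rank, and scaling values are illustrative and not taken from any particular library.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (alpha/r) * B A x."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # freeze the pre-trained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)    # freeze the bias as well
        # Low-rank factors: A projects down to rank r, B projects back up.
        # B starts at zero so the adapted layer initially matches the frozen one.
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus scaled low-rank correction; only A and B are trainable.
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Example: adapt a single projection layer and compare trainable vs. total parameters.
layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable params: {trainable} / {total}")
```

In this sketch only the two rank-8 factor matrices are updated during fine-tuning, which is why such methods cut optimizer-state and gradient memory so sharply; quantizing the frozen base weights (e.g., to 4-bit) reduces memory further along the same lines.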

Papers