Capacity Gap

The capacity gap is the disparity in capability between a large, complex model (the teacher) and a smaller, more efficient model (the student), and it is a central challenge in machine learning. Current research focuses on bridging this gap through knowledge distillation, in which knowledge is transferred from teacher to student; recent work augments distillation with methods such as Monte Carlo Tree Search or modules like Mixture of Experts to optimize resource allocation and improve student performance. Closing the capacity gap is crucial for deploying powerful models in resource-constrained environments and for improving efficiency across applications ranging from image segmentation and language modeling to resource allocation in operational settings.
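
To make the core mechanism concrete, below is a minimal sketch of the classic soft-target distillation loss (Hinton et al., 2015), a widely used baseline for transferring knowledge from teacher to student; it is not necessarily the specific method of any paper listed here, and the `temperature` and `alpha` values are illustrative defaults, not recommendations.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Soft-target knowledge distillation loss (Hinton et al., 2015).

    Blends standard cross-entropy on the hard labels with a KL term
    that pulls the student's softened output distribution toward the
    teacher's. `temperature` and `alpha` are illustrative defaults.
    """
    # Hard-label loss: ordinary supervised cross-entropy.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label loss: KL divergence between temperature-softened
    # teacher and student distributions, scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return alpha * ce + (1.0 - alpha) * kl

# Example: one batch of 10-class logits from a (hypothetical)
# teacher and student.
student_logits = torch.randn(32, 10, requires_grad=True)
teacher_logits = torch.randn(32, 10)
labels = torch.randint(0, 10, (32,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```

A larger capacity gap typically calls for a higher temperature or intermediate "teacher assistant" models, since a student far smaller than its teacher may be unable to match the teacher's full output distribution directly.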

Papers