Early Exiting

Early exiting is a technique for accelerating inference in large models: instead of always processing an input through the full model depth, computation can stop at an intermediate layer once an exit criterion is met. Current research focuses on efficient ways to decide where to exit, including approaches based on retrieval augmentation, adaptive threshold estimation using mixture models, and even simple hash functions that assign instances to specific exit layers. The strategy is particularly relevant for resource-intensive models such as diffusion models and large language models, where it can substantially improve speed and efficiency with little loss in output quality across tasks including image generation, text processing, and noise suppression.
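A minimal sketch of the core mechanism follows. It assumes a classifier-style model with one lightweight exit head per layer and a fixed confidence threshold. This is the simplest possible exit rule, not one of the specific methods named above. `confidence_early_exit` stops at the first layer whose top-class softmax probability reaches the threshold. If no exit fires, it falls back to the final layer. The hypothetical `hash_exit_layer` helper illustrates the hash-based alternative, in which an instance is mapped to a fixed exit layer without computing any confidence. The choice of SHA-256 and of a string key per instance are illustrative assumptions.

```python
import hashlib
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_early_exit(
    x: Vector,
    layers: Sequence[Callable[[Vector], Vector]],
    exit_heads: Sequence[Callable[[Vector], Vector]],
    threshold: float,
) -> Tuple[int, int]:
    """Run layers in order and query each layer's exit head (an assumed
    per-layer classifier). Return (predicted_class, exit_layer_index)."""
    if not layers or len(layers) != len(exit_heads):
        raise ValueError("need at least one layer and one exit head per layer")
    h = x
    for i, (layer, head) in enumerate(zip(layers, exit_heads)):
        h = layer(h)
        probs = softmax(head(h))
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            return best, i  # confident enough: skip the remaining layers
    return best, len(layers) - 1  # no exit fired: use the final layer's prediction


def hash_exit_layer(instance_key: str, num_layers: int) -> int:
    """Hypothetical hash-based assignment: deterministically map an instance
    to an exit layer in [0, num_layers) without any confidence computation."""
    digest = hashlib.sha256(instance_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_layers
```

As a toy usage example, consider four layers that each add 1.0 to a one-dimensional hidden state, with heads that return `[h[0], 0.0]`. Confidence in class 0 then grows with depth, so a threshold of 0.9 exits at layer index 2. A threshold of 0.99 is never reached and falls back to index 3. Raising the threshold trades speed for caution. This is the knob that adaptive threshold methods try to set automatically.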

Papers