Partial Execution
Partial execution is the strategy of running a computation on only a portion of the data or model rather than on the whole. It is being applied in several fields. In large language model (LLM) serving, current work reduces latency by executing tool calls concurrently with token decoding instead of waiting for decoding to finish. In deep learning for resource-constrained devices, it improves memory efficiency and makes on-device inference feasible. The technique also shows promise for mitigating backdoor attacks on neural networks, and for making Bayesian optimization more efficient by selectively evaluating parts of a function network instead of the entire network.
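To make the LLM-serving case concrete, here is a minimal sketch of overlapping tool execution with decoding. Each tool is launched as soon as its call appears in the token stream, and decoding continues while it runs. The sketch rests on stated assumptions. `fake_decode`, `fake_tool`, and the `CALL:` token marker are hypothetical stand-ins, not the API of any real serving system, and `asyncio.sleep` only simulates decode and tool latency.

```python
import asyncio

async def fake_decode(tokens, delay=0.01):
    """Hypothetical stand-in for an LLM decoder that streams one token at a time."""
    for tok in tokens:
        await asyncio.sleep(delay)  # simulated per-token decode latency
        yield tok

async def fake_tool(name, delay=0.05):
    """Hypothetical tool call; the sleep simulates I/O latency."""
    await asyncio.sleep(delay)
    return f"{name}-result"

async def serve(tokens):
    pending = []
    async for tok in fake_decode(tokens):
        # Assumption: a token "CALL:<name>" marks a complete tool call.
        # Start the tool now instead of after decoding finishes.
        if tok.startswith("CALL:"):
            pending.append(asyncio.create_task(fake_tool(tok[len("CALL:"):])))
    # Decoding is done. Collect tool results (some may already be finished);
    # gather returns them in launch order.
    return await asyncio.gather(*pending)

results = asyncio.run(serve(["Let", "me", "CALL:search", "and", "CALL:calc", "."]))
print(results)  # ['search-result', 'calc-result']
```

Because each tool starts when its call is decoded, tool latency overlaps with the remaining decoding time instead of being added after it. A real system would also need to detect tool-call boundaries in partially decoded output robustly and handle tool failures.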
Papers
August 8, 2024
May 29, 2024
May 26, 2024
November 3, 2023
June 29, 2023
March 16, 2023