Hybrid Inference
Hybrid inference combines the strengths of different inference methods to improve efficiency, accuracy, or security across a range of applications. Current research focuses on three directions: optimizing resource allocation for large language models (LLMs) by routing queries between smaller, cheaper models and powerful but costly cloud-based LLMs; developing improved algorithms for probabilistic programming and particle filtering; and designing secure hybrid architectures that protect intellectual property in edge-deployed models. These advances matter because they reduce computational costs, improve the performance of complex systems, and enable secure deployment of sensitive models in resource-constrained environments.
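The cost-routing idea above can be sketched concretely. The snippet below is a minimal, hypothetical illustration (not any specific paper's method): a router first queries a small local model, which returns an answer plus a confidence score, and escalates to an expensive large model only when confidence falls below a threshold. The model callables and the confidence heuristic are stand-in assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class HybridRouter:
    """Route each query to a cheap small model or a costly large model.

    small_model: returns (answer, confidence in [0, 1]) -- an assumed interface.
    large_model: returns an answer; invoked only on low-confidence queries.
    """
    small_model: Callable[[str], Tuple[str, float]]
    large_model: Callable[[str], str]
    threshold: float = 0.8  # escalate when the small model is less confident than this

    def infer(self, query: str) -> Tuple[str, str]:
        answer, confidence = self.small_model(query)
        if confidence >= self.threshold:
            return answer, "small"          # cheap path: small model is confident
        return self.large_model(query), "large"  # expensive fallback


# Demo with stub models (hypothetical stand-ins for real LLM calls):
# the stub small model is "confident" only on short queries.
small = lambda q: ("quick answer", 0.9) if len(q) < 20 else ("unsure", 0.3)
large = lambda q: "detailed answer"

router = HybridRouter(small_model=small, large_model=large)
print(router.infer("hi"))                                  # served by the small model
print(router.infer("a long, complicated multi-part query"))  # escalated to the large model
```

In practice the confidence signal might come from token log-probabilities or a learned classifier rather than query length; the routing skeleton stays the same either way.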