Efficient LLM
Efficient Large Language Model (LLM) research focuses on reducing the computational cost and memory footprint of LLMs while maintaining, or even improving, task performance. Current efforts concentrate on three fronts: optimizing model architectures (e.g., Mixture of Experts and novel attention mechanisms), improving inference serving systems through dynamic scheduling and efficient memory management (such as virtual tensor management), and developing more efficient training and fine-tuning strategies. These advancements are crucial for broadening LLM accessibility, enabling deployment on resource-constrained devices, and reducing the environmental impact of large-scale AI.
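To make the architectural angle concrete, the sketch below shows the core idea behind a Mixture of Experts (MoE) layer: a router scores each token against all experts, but only the top-k experts actually run, so compute per token stays roughly constant as the total parameter count grows. This is a minimal illustrative sketch, not any specific paper's implementation; the class name, dimensions, and linear-map experts are hypothetical simplifications.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

class MixtureOfExperts:
    """Toy top-k MoE layer (illustrative only): each token is routed to the
    k experts with the highest gate scores, so only k of n_experts execute."""

    def __init__(self, d_model, n_experts, top_k, seed=0):
        rng = np.random.default_rng(seed)
        self.top_k = top_k
        # Router weights: score each token against every expert.
        self.w_gate = rng.standard_normal((d_model, n_experts)) * 0.02
        # Each "expert" here is just a linear map for simplicity.
        self.experts = [rng.standard_normal((d_model, d_model)) * 0.02
                        for _ in range(n_experts)]

    def forward(self, x):
        """x: (d_model,) vector for one token. Returns (d_model,) output."""
        scores = x @ self.w_gate                 # (n_experts,) gate scores
        top = np.argsort(scores)[-self.top_k:]   # indices of the chosen experts
        weights = softmax(scores[top])           # renormalize over chosen only
        out = np.zeros_like(x)
        for w, idx in zip(weights, top):
            out += w * (x @ self.experts[idx])   # only top_k experts run
        return out

moe = MixtureOfExperts(d_model=8, n_experts=4, top_k=2)
y = moe.forward(np.ones(8))
```

The efficiency win is the sparsity of the loop: with, say, 4 experts and top_k=2, only half the expert parameters touch any given token, which is the mechanism that lets MoE models scale total capacity without scaling per-token FLOPs.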