Billion-Parameter Models
Research on billion-parameter models focuses on developing and optimizing extremely large language models (LLMs) and other deep learning architectures for improved performance and efficiency. Current efforts concentrate on efficient training methods (like optimized parallelism and zeroth-order optimization), model compression techniques (reducing parameter count without significant performance loss), and innovative architectures (including MatMul-free designs and Mixture-of-Experts). These advancements are crucial for enabling the deployment of powerful AI models across diverse applications, from scientific discovery to mobile devices, while addressing challenges related to computational cost and memory limitations.
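As a concrete illustration of the zeroth-order optimization mentioned above (popularized for LLM fine-tuning by MeZO-style methods), the sketch below estimates the gradient from two forward passes along a shared random direction, so no backward pass or optimizer state is needed. It assumes a PyTorch model and a user-supplied `loss_fn(model, batch)`; names and hyperparameters are illustrative, not the exact procedure of any specific paper listed here.

```python
import torch

def zeroth_order_step(model, loss_fn, batch, lr=1e-6, eps=1e-3, seed=0):
    """One MeZO-style zeroth-order update (illustrative sketch).

    The gradient is estimated from two forward passes at theta +/- eps*z,
    where z is a random direction regenerated from `seed`, so memory use
    stays close to inference-only. `loss_fn(model, batch)` is assumed to
    return a scalar loss.
    """
    params = [p for p in model.parameters() if p.requires_grad]

    def perturb(scale):
        # Re-seeding reproduces the same direction z for the +eps pass,
        # the -eps pass, and the final update, without storing z.
        torch.manual_seed(seed)
        for p in params:
            z = torch.randn_like(p)
            p.data.add_(scale * z)

    with torch.no_grad():
        perturb(+eps)                       # theta + eps * z
        loss_plus = loss_fn(model, batch)
        perturb(-2 * eps)                   # theta - eps * z
        loss_minus = loss_fn(model, batch)
        perturb(+eps)                       # restore theta

        # Finite-difference estimate of the directional derivative along z.
        grad_est = (loss_plus - loss_minus) / (2 * eps)

        # Update: theta <- theta - lr * grad_est * z, regenerating z from the seed.
        torch.manual_seed(seed)
        for p in params:
            z = torch.randn_like(p)
            p.data.add_(-lr * grad_est * z)

    return float(loss_plus)
```

Because only forward passes are required, this kind of update can fine-tune models whose backward-pass activations and optimizer states would not fit in memory, which is the efficiency argument behind zeroth-order training of billion-parameter models.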
15 papers
Papers
December 10, 2024
October 10, 2024
July 3, 2024
June 28, 2024
June 4, 2024
February 12, 2024
February 5, 2024
November 11, 2022
October 27, 2022