Billion-Parameter Models
Research on billion-parameter models focuses on developing and optimizing extremely large language models (LLMs) and other deep learning architectures to improve performance and efficiency. Current efforts concentrate on efficient training methods (such as optimized parallelism and zeroth-order optimization), model compression techniques that reduce parameter count without significant performance loss, and innovative architectures (including MatMul-free designs and Mixture-of-Experts). These advances are crucial for deploying powerful AI models across diverse applications, from scientific discovery to mobile devices, while addressing computational cost and memory limitations; a sketch of one such training method appears below.
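To make the zeroth-order optimization idea concrete, here is a minimal sketch of an SPSA-style estimator in the spirit of MeZO-like methods: the gradient is approximated from two forward passes sharing a random perturbation, so no backward pass or activation storage is needed. This is an illustrative assumption-laden toy (the quadratic loss, function names, and hyperparameters are invented for the example), not the implementation from any of the papers referenced on this page.

```python
"""Toy sketch of zeroth-order (SPSA-style) optimization using only forward passes."""
import numpy as np


def spsa_gradient_estimate(loss_fn, params, eps=1e-3, rng=None):
    """Estimate the gradient of loss_fn at params from two forward evaluations.

    A shared random perturbation z is applied in the + and - directions; the
    scaled loss difference approximates the directional derivative along z.
    """
    rng = rng or np.random.default_rng()
    z = rng.standard_normal(params.shape)           # shared perturbation
    loss_plus = loss_fn(params + eps * z)           # forward pass 1
    loss_minus = loss_fn(params - eps * z)          # forward pass 2
    projected = (loss_plus - loss_minus) / (2.0 * eps)
    return projected * z                            # gradient estimate


def zeroth_order_sgd(loss_fn, params, lr=1e-2, steps=500, seed=0):
    """Plain SGD loop driven only by forward evaluations of loss_fn."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        grad_est = spsa_gradient_estimate(loss_fn, params, rng=rng)
        params = params - lr * grad_est
    return params


if __name__ == "__main__":
    # Toy quadratic stand-in for a model's training loss (assumption for demo).
    target = np.array([1.0, -2.0, 0.5])
    loss = lambda w: float(np.sum((w - target) ** 2))
    w_final = zeroth_order_sgd(loss, np.zeros(3))
    print("final params:", w_final, "loss:", loss(w_final))
```

The appeal for billion-parameter models is that each step costs only forward passes, so optimizer state and backpropagation memory are avoided; the trade-off is noisier gradient estimates and slower convergence.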
Papers
Fifteen papers on this topic, dated from October 27, 2022 to October 10, 2024.