Resource-Intensive LLMs
Resource-intensive large language models (LLMs) pose a significant challenge because of their high computational demands during both training and inference. Current research focuses on improving efficiency through techniques such as retrieval-augmented generation (RAG), linear-scaling architectures for streaming applications, and hybrid systems that invoke a large LLM only when a smaller, cheaper model is insufficient. These efforts aim to make LLMs more accessible and sustainable: broader experimentation becomes feasible in scientific research, while practical deployments benefit from lower costs and better user experience.
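The hybrid small/large pairing mentioned above can be made concrete as a confidence-gated cascade: answer with the cheap model when it is confident, and escalate to the large LLM otherwise. The sketch below is a minimal illustration under assumed interfaces; the `small_model`, `large_model`, and `cascade` functions and the 0.8 threshold are hypothetical stand-ins, not taken from any of the listed papers.

```python
# Minimal sketch of a confidence-gated small/large model cascade.
# All model functions here are hypothetical placeholders.

from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    confidence: float  # in [0, 1]; higher means more certain


def small_model(prompt: str) -> Answer:
    # Stand-in for a cheap, fast model (e.g., a small local LLM).
    return Answer(text=f"small-model answer to: {prompt}", confidence=0.62)


def large_model(prompt: str) -> Answer:
    # Stand-in for an expensive large-LLM call.
    return Answer(text=f"large-model answer to: {prompt}", confidence=0.95)


def cascade(prompt: str, threshold: float = 0.8) -> Answer:
    """Route to the large model only when the small model is unsure."""
    first = small_model(prompt)
    if first.confidence >= threshold:
        return first  # cheap path: small model was confident enough
    return large_model(prompt)  # fallback: escalate to the large LLM


if __name__ == "__main__":
    print(cascade("Summarize the trade-offs of streaming LLM inference."))
```

In a real system the confidence signal might come from token-level log-probabilities or a learned verifier; the threshold then trades answer quality against the fraction of queries that hit the expensive model.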
Papers
Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models
Shayekh Bin Islam, Md Asib Rahman, K S M Tozammel Hossain, Enamul Hoque, Shafiq Joty, Md Rizwan Parvez
Efficient Streaming LLM for Speech Recognition
Junteng Jia, Gil Keren, Wei Zhou, Egor Lakomkin, Xiaohui Zhang, Chunyang Wu, Frank Seide, Jay Mahadeokar, Ozlem Kalinli