LLM Deployment
Deploying large language models (LLMs) efficiently and responsibly is a major focus of current research, driven by the models' high computational cost and their potential for errors. Key areas of investigation include improving energy efficiency during inference, compressing models and accelerating inference, and making models more accountable and transparent, for example through metacognitive approaches that detect and correct their own errors. Together, these efforts aim to make LLMs more accessible, sustainable, and trustworthy across both scientific and practical applications.
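As a concrete illustration of the model-compression direction, the sketch below shows post-training symmetric int8 weight quantization of a single weight matrix. This is a generic, minimal example and not the method of any particular paper in this area; the function names (quantize_weights_int8, dequantize) and the per-output-channel scaling scheme are assumptions chosen for clarity.

import numpy as np

def quantize_weights_int8(w: np.ndarray):
    """Per-output-channel symmetric int8 quantization of a weight matrix.

    Returns the int8 weights and the per-channel scales needed to
    approximately recover the original values (w ~= q * scale).
    """
    # The largest magnitude in each output row sets the scale so that the
    # int8 range [-127, 127] covers every value in that row.
    max_abs = np.abs(w).max(axis=1, keepdims=True)
    scale = max_abs / 127.0
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero rows
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float32 weight matrix from int8 weights."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(4, 8)).astype(np.float32)  # toy "layer" weights
    q, scale = quantize_weights_int8(w)
    w_hat = dequantize(q, scale)
    # Quantization roughly quarters the memory footprint (float32 -> int8)
    # at the cost of a small reconstruction error.
    print("max abs error:", np.abs(w - w_hat).max())

In practice, LLM inference engines apply this kind of quantization across all linear layers (often with finer-grained grouping and activation handling), trading a small accuracy loss for reduced memory traffic and faster inference.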