Model Placement

Model placement is the allocation of machine learning models, or parts of them, across computing resources so that training and inference run quickly and efficiently within each device's memory limits. Current research develops efficient algorithms for placing large models, particularly deep learning architectures such as convolutional neural networks and large language models, across distributed systems ranging from edge networks to GPU clusters. Techniques include parameter sharing, which improves storage efficiency, and adaptive placement strategies that handle the varying computational demands of different model components, such as those in Reinforcement Learning from Human Feedback (RLHF) pipelines. Better placement directly affects how scalable and practical it is to train and deploy increasingly complex AI models.
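As a rough illustration of the memory-constrained placement problem described above, the sketch below assigns consecutive model layers to devices in order and moves to the next device once the current one is full. It is a simple first-fit heuristic and not the method of any paper listed here; the `Device` class, the `place_layers` function, and the layer sizes are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """A hypothetical accelerator with a fixed memory budget (in GB)."""
    name: str
    memory_gb: float
    assigned: list = field(default_factory=list)  # indices of layers placed here
    used_gb: float = 0.0


def place_layers(layer_sizes_gb, devices):
    """Place layers on devices in order, keeping consecutive layers together.

    Moves to the next device when the current one cannot hold the next layer.
    Raises MemoryError if the devices run out before all layers are placed.
    """
    idx = 0  # index of the device currently being filled
    for layer, size in enumerate(layer_sizes_gb):
        # Skip ahead until a device has room for this layer.
        while idx < len(devices) and devices[idx].used_gb + size > devices[idx].memory_gb:
            idx += 1
        if idx == len(devices):
            raise MemoryError(f"layer {layer} ({size} GB) does not fit on any remaining device")
        devices[idx].assigned.append(layer)
        devices[idx].used_gb += size
    return {d.name: d.assigned for d in devices}


if __name__ == "__main__":
    # Illustrative sizes only: five layers split across two 10 GB devices.
    gpus = [Device("gpu0", 10.0), Device("gpu1", 10.0)]
    print(place_layers([4, 4, 4, 4, 2], gpus))
```

A heuristic like this respects memory limits but ignores communication cost and compute balance, which are the aspects the placement algorithms in the papers below are designed to handle.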

Papers

December 16, 2024

Priority-Aware Model-Distributed Inference at Edge Networks
Teng Li, Hulya Seferoglu
Edge Network, Distributed Inference, Model Placement

October 14, 2024

SplitLLM: Collaborative Inference of LLMs for Model Placement and Throughput Optimization
Akrit Mudvari, Yuang Jiang, Leandros Tassiulas
Large Language Model, Chatbot Response, LLM Inference, Worst Case, User Throughput, Collaborative Inference, Inference Engine, Model Placement

May 7, 2024

TrimCaching: Parameter-sharing AI Model Caching in Wireless Edge Networks
Guanqiao Qu, Zheng Lin, Fangming Liu, Xianhao Chen, Kaibin Huang
AI Model, Parameter Sharing, Wireless Edge, Forward Caching, Model Placement

December 19, 2023

An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training
Youshao Xiao, Zhenglei Zhou, Fagui Mao, Weichang Wu, Shangchun Zhao, Lin Ju, Lei Liang, Xiaolu Zhang, Jun Zhou
Large Language Model, Reinforcement Learning From Human Feedback, Complex Instruction, Model Placement

January 20, 2023

Baechi: Fast Device Placement of Machine Learning Graphs
Beomyeol Jeon, Linda Cai, Chirag Shetty, Pallavi Srivastava, Jintao Jiang, Xiaolan Ke, Yitao Meng, Cong Xie, Indranil Gupta
Graph Machine Learning, Training Graph, Model Parallelism, Device Placement, Model Placement
