Contextual Sparsity

Contextual sparsity aims to improve the efficiency of large language models (LLMs) by activating only the parts of the network that a given input context actually needs, reducing computational cost without significant accuracy loss. Current research focuses on building accurate, lightweight predictors of these per-input sparsity patterns, for example small neural-network predictors or activation functions that encourage sparse activations, and on designing algorithms and hardware implementations that turn the predicted sparsity into real speedups. This approach holds significant promise for deploying LLMs on resource-constrained devices and for accelerating inference, improving both the scalability of AI applications and the accessibility of advanced language models. A minimal sketch of the core idea appears below.
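
The following is a minimal PyTorch sketch of contextual sparsity applied to a transformer MLP block, under the assumption that a small low-rank predictor scores the feed-forward neurons from the current hidden state and only the top-k highest-scoring neurons are computed. All class, parameter, and dimension names here (SparseMLP, d_pred, top_k, and so on) are illustrative choices for this sketch, not taken from any specific paper or library.

```python
import torch
import torch.nn as nn


class SparseMLP(nn.Module):
    """Feed-forward block that computes only the neurons a cheap predictor selects."""

    def __init__(self, d_model=1024, d_ff=4096, d_pred=128, top_k=512):
        super().__init__()
        # Dense FFN weights, trained as usual; sparsity is applied at inference time.
        self.up = nn.Linear(d_model, d_ff)
        self.down = nn.Linear(d_ff, d_model)
        # Low-rank predictor: far cheaper to evaluate than the full FFN.
        self.predictor = nn.Sequential(
            nn.Linear(d_model, d_pred), nn.ReLU(), nn.Linear(d_pred, d_ff)
        )
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, d_model), one token position per batch element.
        scores = self.predictor(x)                        # predicted neuron importance, (batch, d_ff)
        idx = scores.topk(self.top_k, dim=-1).indices     # indices of predicted-active neurons, (batch, top_k)

        # Gather only the selected rows/columns of the dense FFN weights.
        w_up = self.up.weight[idx]                        # (batch, top_k, d_model)
        b_up = self.up.bias[idx]                          # (batch, top_k)
        w_down = self.down.weight.t()[idx]                # (batch, top_k, d_model)

        # Compute the FFN using only the selected neurons.
        h = torch.relu(torch.einsum("bkd,bd->bk", w_up, x) + b_up)
        return torch.einsum("bk,bkd->bd", h, w_down) + self.down.bias


mlp = SparseMLP()
out = mlp(torch.randn(2, 1024))   # evaluates only 512 of the 4096 FFN neurons per token
```

In practice the gathered-weight computation would be fused into sparse kernels rather than materialized as dense slices, and the predictor would be trained against the model's true activation patterns; this sketch only illustrates the prediction-then-selective-compute structure.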

Papers