Fast Inference

Fast inference in machine learning aims to accelerate the process of obtaining predictions from complex models, addressing the computational bottleneck hindering the deployment of powerful models like large language models and vision transformers. Current research focuses on techniques such as speculative decoding, model compression (including pruning and quantization), and architectural innovations like mixture-of-experts and hierarchical attention mechanisms to achieve speedups. These advancements are crucial for deploying sophisticated AI models in resource-constrained environments and real-time applications, impacting fields ranging from natural language processing and computer vision to astrophysics and robotics.

Papers

January 19, 2023

Fast Inference in Denoising Diffusion Models via MMD Finetuning
Emanuele Aiello, Diego Valsesia, Enrico Magli
Diffusion Model Denoising Diffusion Model Fast Inference Fast Sampling Maximum Mean Discrepancy Accelerated Sampling

December 15, 2022

FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference
Michiel de Jong, Yury Zemlyanskiy, Joshua Ainslie, Nicholas FitzGerald, Sumit Sanghai, Fei Sha, William Cohen
Retrieval Augmented Fast Inference Retrieval Augmented Language Model Knowledge Intensive

November 30, 2022

Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan, Matan Kalman, Yossi Matias
Transformer Megatron Decepticons Autoregressive Model Speculative Decoding Fast Inference Efficient Model Speculative Execution

June 29, 2022

Extreme compression of sentence-transformer ranker models: faster inference, longer battery life, and less storage on edge devices
Amit Chaulwar, Lukas Malik, Maciej Krajewski, Felix Reichel, Leif-Nissen Lundbæk, Michael Huth, Bartlomiej Matejczyk
Knowledge Distillation Fast Inference High Compression Vocabulary Size Sentence Level Distillation Large Ranking Model

June 22, 2022

FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search
Patrick H. Chen, Chang Wei-cheng, Yu Hsiang-fu, Inderjit S. Dhillon, Hsieh Cho-jui
Fast Inference Graph Search Graph Based Approximate Nearest Neighbor K Nearest Neighbor Search X Ray Style Distance Graph Entropy Distance Computation

May 31, 2022

SymFormer: End-to-end symbolic regression using transformer-based architecture
Martin Vastl, Jonáš Kulhánek, Jiří Kubalík, Erik Derner, Robert Babuška
Gradient Descent Symbolic Regression Transformer Based Approach Fast Inference

May 26, 2022

Consistent and fast inference in compartmental models of epidemics using Poisson Approximate Likelihoods
Michael Whitehouse, Nick Whiteley, Lorenzo Rimella
Fast Inference Disease Outbreak Compartmental Model Approximate Likelihood

May 25, 2022

Fast Inference and Transfer of Compositional Task Structures for Few-shot Task Generalization
Sungryull Sohn, Hyunjae Woo, Jongwook Choi, lyubing qiang, Izzeddin Gur, Aleksandra Faust, Honglak Lee
Formality Transfer Hierarchical Reinforcement Learning Meta Reinforcement Learning Fast Inference Task Inference Task Graph Shot Generalization Task Structure

January 21, 2022

Unity Smoothing for Handling Inconsistent Evidence in Bayesian Networks and Unity Propagation for Faster Inference
Mads Lindskou, Torben Tvedebrink, Poul Svante Eriksen, Søren Højsgaard, Niels Morling
Bayesian Network Fast Inference Tree Based Inconsistency Detection Laplacian Learning Hierarchical Propagation

January 8, 2022

Low-Rank Constraints for Fast Inference in Structured Models
Justin T. Chiu, Yuntian Deng, Alexander M. Rush
Neural Network Participation Constraint Fast Inference Probabilistic Context Free Grammar Structured Distribution Probabilistic Representation Structured Model

November 11, 2021

On Efficient Uncertainty Estimation for Resource-Constrained Mobile Applications
Johanna Rock, Tiago Azevedo, René de Jong, Daniel Ruiz-Muñoz, Partha Maji
Uncertainty Estimation Deep Ensemble Predictive Uncertainty Fast Inference Efficient Uncertainty Estimation Robust Uncertainty