Model Inference
Model inference, the process of using a trained machine learning model to make predictions, is an active research area focused on improving efficiency, accuracy, and robustness. Current efforts concentrate on mitigating hallucinations in large language and vision-language models, optimizing resource allocation for inference and retraining (especially in edge computing scenarios), and strengthening privacy and security during inference. These advances are essential for deploying machine learning models effectively in applications ranging from real-time IoT systems to large-scale data analysis, while managing computational cost, data heterogeneity, and limited model interpretability.
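To make the core concept concrete, below is a minimal sketch of the inference step itself, assuming PyTorch as the framework; the model architecture, input shapes, and data here are hypothetical placeholders, not drawn from any paper in this digest.

```python
# Minimal inference sketch (hypothetical model and inputs).
import torch
import torch.nn as nn

# Stand-in for a trained model: a small classifier over 16 features.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

model.eval()  # switch layers like dropout/batch-norm to inference behavior
with torch.no_grad():  # skip gradient tracking to save memory and compute
    x = torch.randn(8, 16)        # a batch of 8 hypothetical inputs
    logits = model(x)             # forward pass: raw class scores
    preds = logits.argmax(dim=1)  # predicted class per input
print(preds)
```

The `eval()` and `no_grad()` pair is the standard way to separate inference from training: it disables training-only layer behavior and gradient bookkeeping, which is where much of the efficiency concern above originates.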
Papers
MLink: Linking Black-Box Models from Multiple Domains for Collaborative Inference
Mu Yuan, Lan Zhang, Zimu Zheng, Yi-Nan Zhang, Xiang-Yang Li
InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference
Mu Yuan, Lan Zhang, Fengxiang He, Xueting Tong, Miao-Hui Song, Zhengyuan Xu, Xiang-Yang Li
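The InFi paper above studies learned input filtering for resource-efficient inference. As a generic, hedged illustration of that idea only (not the paper's actual method), a cheap filter network can score incoming inputs and let the expensive model run on just the ones worth processing; the filter architecture and threshold below are hypothetical.

```python
# Generic input-filtering sketch (illustrative only; not InFi's method).
import torch
import torch.nn as nn

full_model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
filter_net = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 1))

THRESHOLD = 0.5  # hypothetical cutoff, tuned on validation data in practice

with torch.no_grad():
    x = torch.randn(32, 16)                      # incoming batch
    keep_score = torch.sigmoid(filter_net(x))    # cheap per-input score
    kept = x[keep_score.squeeze(1) > THRESHOLD]  # drop low-value inputs
    outputs = full_model(kept)                   # expensive model runs on fewer inputs
print(f"filtered {len(x) - len(kept)} of {len(x)} inputs")
```

The design point is that the filter must be much cheaper than the full model, so the saved forward passes outweigh the filtering overhead on resource-constrained (e.g., mobile) devices.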