Scientific Inference
Scientific inference, the process of drawing conclusions from data, is a core challenge across many scientific fields, and current research focuses on improving its efficiency and accuracy. This work includes developing new algorithms and architectures, such as Bayesian methods, diffusion transformers, and autoregressive models, to optimize inference in settings ranging from large language models to image processing. These advances matter for accelerating scientific discovery and for real-world applications such as personalized medicine, legal technology, and industrial automation, where efficient and reliable inference is paramount. Much of the emphasis is on removing computational bottlenecks and improving the reliability of inferences, particularly when data are limited or models are complex.
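Several of the papers listed below concern speculative inference for large language models, where a cheap draft model proposes several tokens and a more expensive target model verifies them. The sketch below is a minimal, toy illustration of that draft-and-verify loop, not the method of any specific paper: the model functions, vocabulary size, and greedy acceptance rule are simplifying assumptions made only for this example.

```python
# Minimal, self-contained sketch of the draft-and-verify loop behind
# speculative decoding. The "models" are toy stand-in functions, not
# real LLMs; names and the greedy acceptance rule are illustrative
# assumptions, not the approach of any particular paper listed here.
import numpy as np

VOCAB = 16          # toy vocabulary size
DRAFT_LEN = 4       # tokens proposed per speculation step

rng = np.random.default_rng(0)

def draft_model(context):
    """Cheap proposal distribution over the next token (toy stand-in)."""
    logits = np.sin(np.arange(VOCAB) + len(context))
    return np.exp(logits) / np.exp(logits).sum()

def target_model(context):
    """Expensive reference distribution over the next token (toy stand-in)."""
    logits = np.sin(np.arange(VOCAB) + len(context)) + 0.1 * rng.standard_normal(VOCAB)
    return np.exp(logits) / np.exp(logits).sum()

def speculative_step(context):
    """Propose DRAFT_LEN tokens with the draft model, then verify them
    against the target model, keeping the longest accepted prefix."""
    proposal = []
    ctx = list(context)
    for _ in range(DRAFT_LEN):
        tok = int(np.argmax(draft_model(ctx)))   # greedy draft token
        proposal.append(tok)
        ctx.append(tok)

    # In a real system the target model scores all drafted positions in a
    # single batched forward pass; that is where the latency savings come from.
    accepted = []
    ctx = list(context)
    for tok in proposal:
        if int(np.argmax(target_model(ctx))) == tok:
            accepted.append(tok)           # target agrees: keep the draft token
            ctx.append(tok)
        else:
            # On the first disagreement, emit the target's own token and stop.
            accepted.append(int(np.argmax(target_model(ctx))))
            break
    return accepted

if __name__ == "__main__":
    context = [1, 2, 3]
    for _ in range(3):
        new_tokens = speculative_step(context)
        context.extend(new_tokens)
        print("accepted", new_tokens, "-> context length", len(context))
```

The papers below extend this basic idea in different directions, for example with distributed drafting across devices and with combined token/embedding speculators.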
Papers
PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference
Jiarui Fang, Jinzhe Pan, Jiannan Wang, Aoyu Li, Xibo Sun
Distributed Speculative Inference of Large Language Models is Provably Faster
Nadav Timor, Jonathan Mamou, Daniel Korat, Moshe Berchansky, Oren Pereg, Moshe Wasserblat, Tomer Galanti, Michal Gordon, David Harel
Bayesian Prediction-Powered Inference
R. Alex Hofer, Joshua Maynez, Bhuwan Dhingra, Adam Fisch, Amir Globerson, William W. Cohen
Experimental Pragmatics with Machines: Testing LLM Predictions for the Inferences of Plain and Embedded Disjunctions
Polina Tsvilodub, Paul Marty, Sonia Ramotowska, Jacopo Romoli, Michael Franke
Accelerating Production LLMs with Combined Token/Embedding Speculators
Davis Wertheimer, Joshua Rosenkranz, Thomas Parnell, Sahil Suneja, Pavithra Ranganathan, Raghu Ganti, Mudhakar Srivatsa
HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis
Andy He, Darren Key, Mason Bulling, Andrew Chang, Skyler Shapiro, Everett Lee
Evaluating the Effectiveness of Video Anomaly Detection in the Wild: Online Learning and Inference for Real-world Deployment
Shanle Yao, Ghazal Alinezhad Noghre, Armin Danesh Pazho, Hamed Tabkhi
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu, Hao Zhang, Jonathan May, Luke Zettlemoyer, Omer Levy, Chunting Zhou
AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees
William Fleshman, Aleem Khan, Marc Marone, Benjamin Van Durme