Streaming Inference
Streaming inference focuses on processing data efficiently in real time, as it arrives, rather than in batches. Current research emphasizes methods for handling long sequences and non-stationary data across a range of models, including Transformers, Convolutional Neural Networks (CNNs), and probabilistic models such as Dirichlet Process Mixture Models (DPMMs), often incorporating techniques like attention mechanisms and Kalman filtering to improve efficiency and accuracy. The field is crucial for deploying large models on resource-constrained devices and for enabling real-time applications in areas such as autonomous driving, robotics, and personalized medicine, where continuous data streams must be processed immediately.
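To make the contrast with batch processing concrete, here is a minimal sketch of streaming inference using one of the techniques mentioned above, Kalman filtering. It assumes a simple scalar signal observed with Gaussian noise; the function name and the noise parameters (`q`, `r`) are illustrative choices, not taken from any specific paper.

```python
import random

def kalman_stream(observations, q=1e-3, r=0.25):
    """Run a 1-D Kalman filter over a data stream, yielding one
    estimate per observation -- each value is processed as it
    arrives, with constant memory, rather than in a batch."""
    x, p = 0.0, 1.0          # state estimate and its variance
    for z in observations:
        p += q               # predict: variance grows by process noise
        k = p / (p + r)      # Kalman gain: trust in the new observation
        x += k * (z - x)     # update: blend prediction with new datum
        p *= (1.0 - k)       # shrink uncertainty after the update
        yield x

# Simulated stream: noisy readings of a constant signal of 1.0.
random.seed(0)
stream = (1.0 + random.gauss(0.0, 0.5) for _ in range(200))
estimates = list(kalman_stream(stream))
```

Because `kalman_stream` is a generator consuming another generator, no more than one observation is ever held in memory, which is the defining property that makes such filters suitable for the resource-constrained, real-time settings described above.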
Papers