Low Latency
Low latency, the minimization of delay in information processing, is a critical objective across diverse fields, driving research into efficient algorithms and hardware architectures. Current efforts focus on optimizing large language models (LLMs) for faster inference through techniques such as speculative decoding and efficient GPU resource allocation, and on developing low-latency solutions for speech processing, image recognition, and other real-time tasks using spiking neural networks and specialized hardware such as FPGAs. Achieving low latency is crucial for real-time responsiveness in applications ranging from autonomous vehicles and interactive virtual reality to hearing aids and industrial IoT systems, where it directly affects performance and user experience.
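The paragraph above names speculative decoding as one latency-reduction technique for LLM inference. As a rough illustration only, the sketch below shows the core idea with toy stand-in models (draft_model, target_model, VOCAB, and speculative_decode_step are illustrative placeholders, not code from any listed paper): a cheap draft model proposes a short block of tokens, and the expensive target model verifies them, accepting each proposal with probability min(1, p_target/p_draft) and resampling from the residual distribution at the first rejection, so several tokens can be emitted per expensive verification round.

```python
import numpy as np

VOCAB = 32  # toy vocabulary size (illustrative)

def _dist(prefix, temperature):
    """Deterministic toy next-token distribution derived from the prefix."""
    seed = hash(tuple(prefix)) % (2**32)
    logits = np.random.default_rng(seed).standard_normal(VOCAB) / temperature
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def draft_model(prefix):
    # Cheap, lower-quality proposer (stand-in for a small LM).
    return _dist(prefix, temperature=2.0)

def target_model(prefix):
    # Expensive, higher-quality verifier (stand-in for the large LM).
    return _dist(prefix, temperature=1.0)

def speculative_decode_step(prefix, k=4, seed=0):
    """One round of speculative decoding:
    1. The draft model proposes k tokens autoregressively.
    2. The target model scores each proposed position (a single batched
       forward pass in a real implementation).
    3. Each proposal is accepted with prob min(1, p_target/p_draft); at the
       first rejection, a corrected token is sampled from the residual.
    (The full algorithm also samples one extra target token when all k
    proposals are accepted; omitted here for brevity.)
    """
    rng = np.random.default_rng(seed)

    # Step 1: draft k tokens cheaply.
    proposals, draft_probs = [], []
    ctx = list(prefix)
    for _ in range(k):
        q = draft_model(ctx)
        tok = rng.choice(VOCAB, p=q)
        proposals.append(tok)
        draft_probs.append(q[tok])
        ctx.append(tok)

    # Steps 2-3: verify with the target model.
    accepted = []
    ctx = list(prefix)
    for tok, q_tok in zip(proposals, draft_probs):
        p = target_model(ctx)
        if rng.random() < min(1.0, p[tok] / q_tok):
            accepted.append(tok)              # target agrees: keep the draft token
            ctx.append(tok)
        else:
            q = draft_model(ctx)
            residual = np.maximum(p - q, 0.0)  # correct the distribution mismatch
            residual /= residual.sum()
            accepted.append(int(rng.choice(VOCAB, p=residual)))
            break                              # stop at the first rejection
    return accepted

if __name__ == "__main__":
    print("tokens emitted this round:", speculative_decode_step(prefix=[1, 2, 3], k=4))
```

Because every accepted draft token is one fewer sequential call to the large model, the expected latency per generated token drops whenever the draft model's proposals are accepted often enough to offset its own (much smaller) cost.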
Papers
Towards Interactively Improving ML Data Preparation Code via "Shadow Pipelines"
Stefan Grafberger, Paul Groth, Sebastian Schelter
Deep low-latency joint speech transmission and enhancement over a Gaussian channel
Mohammad Bokaei, Jesper Jensen, Simon Doclo, Jan Østergaard
PEFSL: A deployment Pipeline for Embedded Few-Shot Learning on a FPGA SoC
Lucas Grativol Ribeiro, Lubin Gauthier, Mathieu Leonardon, Jérémy Morlier, Antoine Lavrard-Meyer, Guillaume Muller, Virginie Fresse, Matthieu Arzel