Low Latency
Low latency, the minimization of delay in information processing, is a critical objective across diverse fields and drives research into efficient algorithms and hardware architectures. Current efforts focus on faster inference for large language models (LLMs), through techniques such as speculative decoding and efficient GPU resource allocation, and on low-latency solutions for speech processing, image recognition, and other real-time applications using spiking neural networks and specialized hardware such as FPGAs. Low latency enables real-time responsiveness in applications ranging from autonomous vehicles and interactive virtual reality to hearing aids and industrial IoT systems, where delay directly affects performance and user experience.
Papers
Momentum-based Distributed Resource Scheduling Optimization Subject to Sector-Bound Nonlinearity and Latency
Mohammadreza Doostmohammadian, Zulfiya R. Gabidullina, Hamid R. Rabiee
Adaptive UAV-Assisted Hierarchical Federated Learning: Optimizing Energy, Latency, and Resilience for Dynamic Smart IoT
Xiaohong Yang, Minghui Liwang, Liqun Fu, Yuhan Su, Seyyedali Hosseinalipour, Xianbin Wang, Yiguang Hong
Xiamen University ● Tongji University ● University at Buffalo-SUNY ● Western University