Forward Caching
Forward caching improves efficiency by proactively storing data closer to where it will be consumed, reducing both latency and bandwidth usage. Current research focuses on optimizing caching strategies for several applications: large language models (e.g., techniques like RazorAttention for Key-Value cache compression), diffusion transformers (exploiting the strong similarity of features across successive denoising steps), and wireless edge networks (using parameter sharing and online gradient-based methods). These advances matter for speeding up AI inference, enabling real-time applications, and improving the efficiency of data-intensive systems across diverse fields.
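To make the KV-cache compression idea concrete, here is a toy sketch in Python/NumPy. It is not RazorAttention's actual algorithm; it only illustrates the general pattern of keeping recent tokens verbatim while folding older entries into a single averaged "compensation" entry (the function name and parameters are illustrative).

```python
import numpy as np

def compress_kv_cache(keys, values, keep_recent=64):
    """Toy KV-cache compression: keep the most recent tokens verbatim
    and fold all older tokens into one mean "compensation" entry.
    Illustrative only -- not RazorAttention's actual method."""
    seq_len = keys.shape[0]
    if seq_len <= keep_recent:
        return keys, values  # nothing to compress

    old_k, old_v = keys[:-keep_recent], values[:-keep_recent]
    # Summarize the dropped span with a single averaged key/value pair.
    comp_k = old_k.mean(axis=0, keepdims=True)
    comp_v = old_v.mean(axis=0, keepdims=True)

    new_k = np.concatenate([comp_k, keys[-keep_recent:]], axis=0)
    new_v = np.concatenate([comp_v, values[-keep_recent:]], axis=0)
    return new_k, new_v

# Example: a 256-token cache with 32-dim heads shrinks to 65 entries.
k = np.random.randn(256, 32)
v = np.random.randn(256, 32)
ck, cv = compress_kv_cache(k, v, keep_recent=64)
print(ck.shape)  # (65, 32)
```

The trade-off is the usual one for cache compression: memory drops roughly linearly with the number of evicted tokens, at the cost of losing fine-grained attention to the summarized span.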
Papers
(Paper listing: 19 entries dated between May 11, 2022 and November 8, 2024; titles and links were not preserved in this extract.)