Processing Unit
Processing units, particularly neural processing units (NPUs), are central to accelerating machine learning (ML) inference, both on resource-constrained edge devices and in large-scale cloud platforms. Current research focuses on optimizing NPU architectures and algorithms for efficient parallel processing of diverse ML workloads, including large language models and convolutional neural networks. Key techniques include model compression, heterogeneous computing, and efficient resource allocation, all aimed at minimizing latency and power consumption. These advances are crucial for enabling real-time AI applications across sectors such as autonomous vehicles and mobile computing, improving both performance and energy efficiency.
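As a concrete illustration of the model-compression techniques mentioned above, the sketch below shows symmetric per-tensor int8 post-training quantization, one common way to shrink weights and activations for NPU deployment. This is a minimal NumPy example with illustrative function names and a toy weight matrix; it is not drawn from any specific paper or NPU toolchain.

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: w ≈ scale * q.

    Illustrative sketch: real NPU toolchains typically use per-channel
    scales and calibration data rather than a single per-tensor scale.
    """
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 values."""
    return q.astype(np.float32) * scale

# Toy example: quantize a random weight matrix, then measure
# the reconstruction error and the memory savings.
w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.max(np.abs(w - w_hat)))
print("memory: %d -> %d bytes" % (w.nbytes, q.nbytes))
```

Storing int8 values cuts weight memory by 4x relative to float32, which reduces both memory traffic and power draw; NPUs then execute matrix multiplies directly on the integer values using dedicated low-precision arithmetic units.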