Quantization Search
Quantization search aims to improve the efficiency of deep neural networks (DNNs) by reducing the precision of their numerical representations, decreasing both computational cost and memory footprint. Current research focuses on algorithms that automatically find good quantization strategies, often borrowing neural architecture search techniques and exploring mixed-precision (integer and floating-point) assignments across architectures such as convolutional neural networks and transformers. These advances matter because they enable the deployment of high-performing DNNs on resource-constrained devices, with applications in mobile computing, embedded systems, and edge AI.
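To make the idea concrete, here is a minimal sketch of a mixed-precision quantization search. It assumes uniform symmetric quantization and a simple greedy strategy that lowers per-layer bit-widths until an average bit budget is met; all names (`quantize`, `search_bitwidths`) and the candidate bit-widths are illustrative, not from any particular paper.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.round(x / scale) * scale

def search_bitwidths(layers, candidates=(2, 4, 8), budget=4.0):
    """Greedy mixed-precision search (illustrative): start every layer at
    the widest candidate bit-width, then repeatedly lower the layer whose
    reduction adds the least quantization error, until the average
    bit-width fits the budget."""
    bits = {name: max(candidates) for name in layers}

    def err(name, b):
        # Mean squared error between a layer's weights and their quantized form.
        w = layers[name]
        return float(np.mean((w - quantize(w, b)) ** 2))

    while sum(bits.values()) / len(bits) > budget:
        best = None
        for name in layers:
            lower = [b for b in candidates if b < bits[name]]
            if not lower:
                continue  # layer already at minimum precision
            b = max(lower)
            cost = err(name, b) - err(name, bits[name])
            if best is None or cost < best[2]:
                best = (name, b, cost)
        if best is None:
            break  # budget unreachable with these candidates
        bits[best[0]] = best[1]
    return bits
```

A real quantization search would score candidate precisions by validation accuracy or a hardware cost model rather than weight MSE, and would typically fine-tune the network after assigning bit-widths; this sketch only illustrates the search loop itself.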