Quantization Technique
Quantization techniques aim to reduce the memory footprint and computational cost of deep learning models by representing their weights and activations using fewer bits, thereby accelerating inference and enabling deployment on resource-constrained devices. Current research focuses on developing novel quantization algorithms for various architectures, including large language models (LLMs), diffusion models, and vision transformers, often employing strategies like post-training quantization (PTQ) and quantization-aware training (QAT) to minimize accuracy loss. This area is crucial for advancing the practical applicability of increasingly complex deep learning models across diverse fields, from natural language processing and image generation to speech recognition and computer vision.
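To make the core idea concrete, below is a minimal sketch (not drawn from any of the listed papers) of symmetric uniform post-training quantization, assuming NumPy: a float weight tensor is mapped to 8-bit integers with a single per-tensor scale, and the round-trip error shows the accuracy cost that PTQ and QAT methods try to minimize.

```python
import numpy as np

def quantize_symmetric(weights: np.ndarray, num_bits: int = 8):
    """Symmetric uniform post-training quantization of a weight tensor.

    Maps float weights to signed integers in [-(2^(b-1) - 1), 2^(b-1) - 1]
    using one per-tensor scale, and returns the integer tensor together
    with the scale needed to dequantize.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    scale = np.max(np.abs(weights)) / qmax  # per-tensor scale factor
    if scale == 0:                          # all-zero tensor edge case
        scale = 1.0
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from integers and the scale."""
    return q.astype(np.float32) * scale

# Example: quantize a random weight matrix and measure the error introduced.
w = np.random.randn(64, 64).astype(np.float32)
q, scale = quantize_symmetric(w, num_bits=8)
w_hat = dequantize(q, scale)
print("max abs quantization error:", np.max(np.abs(w - w_hat)))
```

In this sketch the storage cost drops from 32 bits to 8 bits per weight; sub-8-bit schemes, per-channel scales, and quantization-aware training refine the same basic mapping to keep accuracy loss small.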
Papers
Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets
Lu Zeng, Sree Hari Krishnan Parthasarathi, Yuzong Liu, Alex Escott, Santosh Kumar Cheekatmalla, Nikko Strom, Shiv Vitaladevuni
DiverGet: A Search-Based Software Testing Approach for Deep Neural Network Quantization Assessment
Ahmed Haj Yahmed, Houssem Ben Braiek, Foutse Khomh, Sonia Bouzidi, Rania Zaatour
Learning Representations for CSI Adaptive Quantization and Feedback
Valentina Rizzello, Matteo Nerini, Michael Joham, Bruno Clerckx, Wolfgang Utschick