Chip Memory

Chip memory optimization is crucial for accelerating deep learning inference and training, particularly for large models such as LLMs and CNNs; the goal is to minimize costly off-chip memory accesses and maximize on-chip storage efficiency. Current research focuses on techniques such as optimized scheduling algorithms, configurable memory hierarchies, and data compression methods (e.g., block floating point quantization, arithmetic coding) that reduce memory footprint and improve energy efficiency. These advances enable faster, more power-efficient AI applications across domains ranging from mobile devices to high-performance scientific computing.
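
To make the compression idea concrete, below is a minimal sketch of block floating point quantization in Python/NumPy. It is an illustrative example, not code from any of the listed papers: values are grouped into fixed-size blocks, each block shares a single power-of-two exponent, and the per-value mantissas are rounded to a small signed integer. Hardware implementations would store the integer mantissas and the shared exponent instead of dequantizing; the function name, block size, and mantissa width here are arbitrary choices for the demo.

```python
import numpy as np

def bfp_quantize(x, block_size=16, mantissa_bits=8):
    """Illustrative block floating point quantization (not from a specific paper).

    Each block of `block_size` values shares one power-of-two exponent;
    mantissas are rounded to `mantissa_bits`-bit signed integers.
    Returns the dequantized array so the rounding error can be inspected.
    """
    x = np.asarray(x, dtype=np.float32)
    pad = (-len(x)) % block_size
    blocks = np.pad(x, (0, pad)).reshape(-1, block_size)

    # Shared exponent per block, chosen so the block's largest magnitude
    # fits in the mantissa range after scaling.
    max_abs = np.max(np.abs(blocks), axis=1, keepdims=True)
    exp = np.where(max_abs > 0,
                   np.ceil(np.log2(np.maximum(max_abs, 1e-38))),
                   0.0)

    # Scale mantissas into the signed integer range and round.
    scale = 2.0 ** (exp - (mantissa_bits - 1))
    lo, hi = -(2 ** (mantissa_bits - 1)), 2 ** (mantissa_bits - 1) - 1
    q = np.clip(np.round(blocks / scale), lo, hi)

    # Dequantize for verification; real hardware would keep q and exp.
    return (q * scale).reshape(-1)[:len(x)]

if __name__ == "__main__":
    x = np.random.randn(64).astype(np.float32)
    xq = bfp_quantize(x)
    print("max abs error:", float(np.max(np.abs(x - xq))))
```

Sharing one exponent per block is the key memory saving: an 8-bit mantissa plus an amortized shared exponent costs far less than 32-bit floats, while blocks keep the dynamic range loss local to a handful of values.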

Papers