Processing in Memory

Processing-in-memory (PIM) aims to accelerate computation by performing operations directly within memory, reducing the data movement between memory and processor that limits the performance of modern computing architectures. Current research focuses on optimizing PIM for machine learning models, including deep learning recommendation models (DLRMs), large language models (LLMs), convolutional neural networks (CNNs), and graph neural networks (GNNs), often in combination with techniques such as mixture-of-experts and specialized data partitioning strategies. The approach promises better energy efficiency and speed for data-intensive applications, from recommendation systems and natural language processing to database management and scientific computing.

Papers