Near Memory

Near-memory computing (NMC) aims to accelerate machine learning by performing computations directly within or adjacent to memory, minimizing data movement and thereby reducing both latency and energy consumption. Current research focuses on applying NMC to large language models (LLMs), including transformers and mixture-of-experts architectures, as well as other deep learning models such as neural radiance fields and convolutional neural networks. Because data movement, rather than arithmetic, often dominates the cost of these workloads, NMC promises faster and more efficient training and inference for resource-intensive applications in natural language processing and computer vision, with implications for both cloud and edge deployments.
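The energy argument above can be made concrete with a back-of-envelope sketch comparing a conventional design, which moves weights from off-chip DRAM to the processor, against a near-memory design, which computes where the weights reside. All per-operation energy constants below are assumed round numbers for illustration only, not measurements of any specific hardware.

```python
# Illustrative energy-cost comparison for a matrix-vector product,
# the core operation in LLM inference. Constants are ASSUMED values
# chosen only to show the order-of-magnitude effect of data movement.

DRAM_TRANSFER_PJ_PER_BYTE = 100.0  # assumed: off-chip DRAM read energy
NMC_READ_PJ_PER_BYTE = 5.0         # assumed: local near-memory read energy
MAC_PJ = 1.0                       # assumed: one multiply-accumulate

def energy_pj(rows: int, cols: int, bytes_per_weight: int,
              read_pj_per_byte: float) -> float:
    """Energy to read a rows x cols weight matrix and perform all MACs."""
    weight_bytes = rows * cols * bytes_per_weight
    return weight_bytes * read_pj_per_byte + rows * cols * MAC_PJ

# A single 4096 x 4096 FP16 weight matrix, typical of a mid-sized LLM layer.
conventional = energy_pj(4096, 4096, 2, DRAM_TRANSFER_PJ_PER_BYTE)
near_memory = energy_pj(4096, 4096, 2, NMC_READ_PJ_PER_BYTE)
print(f"conventional: {conventional / 1e6:.1f} uJ")
print(f"near-memory:  {near_memory / 1e6:.1f} uJ")
print(f"reduction:    {conventional / near_memory:.1f}x")
```

Under these assumed constants, memory reads dwarf the arithmetic itself, so shrinking the per-byte read cost yields an order-of-magnitude energy reduction even though the number of MACs is unchanged.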

Papers