Paper ID: 2402.04032

ProactivePIM: Accelerating Weight-Sharing Embedding Layer with PIM for Scalable Recommendation System

Youngsuk Kim, Junghwan Lim, Hyuk-Jae Lee, Chae Eun Rhee

The continuous growth in the size of personalized recommendation systems poses new challenges for model inference. Weight-sharing algorithms have been proposed to reduce embedding table capacity, but they increase the number of memory accesses. Recent advances in processing-in-memory (PIM) have improved the throughput of recommendation systems by exploiting memory parallelism. However, our analysis shows that weight-sharing algorithms introduce CPU-PIM communication overhead into prior PIM systems, which reduces PIM throughput. We propose ProactivePIM, a specialized memory architecture with integrated PIM technology, tailored to accelerate weight-sharing algorithms. ProactivePIM integrates an SRAM cache within the PIM and an efficient prefetching scheme that exploits a locality pattern unique to these algorithms and eliminates CPU-PIM communication.
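To make the capacity-versus-access trade-off concrete, here is a minimal sketch of one well-known weight-sharing scheme, the quotient-remainder (QR) trick. This is an assumption for illustration only: the abstract does not say which weight-sharing algorithms ProactivePIM targets. The class name `QREmbedding`, the multiplicative combine step, and the placeholder weights are all hypothetical. The sketch shows that storage shrinks from one row per category to two small tables, while every lookup now reads two rows instead of one. The small, frequently reused remainder table also hints at the kind of locality an on-PIM SRAM cache could exploit, though the paper's actual caching and prefetching scheme is not reproduced here.

```python
from typing import List


class QREmbedding:
    """Hypothetical quotient-remainder (QR) weight-sharing embedding.

    Instead of storing one row per category (num_categories x dim), two
    smaller tables are kept: a quotient table with ceil(num_categories / m)
    rows and a remainder table with m rows. Category i is represented by
    combining Q[i // m] and R[i % m], here via element-wise multiplication.
    """

    def __init__(self, num_categories: int, m: int, dim: int) -> None:
        self.m = m
        num_q = -(-num_categories // m)  # ceiling division
        # Deterministic placeholder weights so the sketch runs without numpy;
        # a real model would learn these values.
        self.q_table = [[float(r + 1)] * dim for r in range(num_q)]
        self.r_table = [[float(c + 1) / 10] * dim for c in range(m)]
        self.row_reads = 0  # counts embedding-row reads from memory

    def lookup(self, idx: int) -> List[float]:
        q_row = self.q_table[idx // self.m]
        r_row = self.r_table[idx % self.m]
        # Weight sharing trades capacity for accesses: two row reads
        # per lookup instead of one.
        self.row_reads += 2
        return [a * b for a, b in zip(q_row, r_row)]

    def stored_rows(self) -> int:
        return len(self.q_table) + len(self.r_table)


if __name__ == "__main__":
    emb = QREmbedding(num_categories=1_000_000, m=1000, dim=4)
    print(emb.stored_rows())  # 2000 rows instead of 1,000,000
    for i in (1234, 5, 999_999):
        emb.lookup(i)
    print(emb.row_reads)  # 6 row reads for 3 lookups
```

With one million categories and `m = 1000`, the two tables hold 2,000 rows in total, but three lookups already cost six row reads. This doubling of accesses is the overhead that the abstract argues hurts prior PIM designs.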

Submitted: Feb 6, 2024