Semantic ID

Semantic IDs are compact, meaningful representations of data objects (e.g., items, nodes in a graph, documents) that aim to capture their semantic content within a discrete identifier. Current research focuses on generating these IDs using techniques like vector quantization of embeddings from neural networks (including graph neural networks and language models), often employing autoregressive decoding or generative models to create sequential representations. This work is driven by the need to improve the efficiency and generalization capabilities of systems like recommender systems and search engines, addressing limitations of traditional random hashing methods and enhancing performance on long-tail items and unseen data.

Papers