Graph Tokenization

Graph tokenization aims to represent graph data, such as social networks or molecules, as sequences of tokens suitable for processing by machine learning models, particularly large language models. Current research focuses on tokenization methods that capture the hierarchical structure of graphs and encode semantic information about nodes and relationships, often using techniques such as vector quantization and hierarchical transformers. These advances improve the efficiency and performance of graph neural networks on tasks such as node classification, graph classification, and combinatorial optimization, and help bridge the gap between the graph and language modalities.
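
As a concrete illustration of the vector-quantization idea, the minimal sketch below (assuming PyTorch; the class name `VQGraphTokenizer` and its layers are illustrative, not taken from any specific paper) aggregates node features over the adjacency matrix, projects them to embeddings, and maps each node embedding to the index of its nearest codebook vector, yielding one discrete token per node.

```python
# Minimal sketch of vector-quantized graph tokenization (assumes PyTorch).
# Node features are mean-aggregated over the adjacency matrix (a stand-in for
# a GNN layer), then each node embedding is snapped to its nearest codebook
# vector; the codebook index serves as the node's token.
import torch
import torch.nn as nn


class VQGraphTokenizer(nn.Module):
    def __init__(self, in_dim: int, embed_dim: int, codebook_size: int):
        super().__init__()
        self.encoder = nn.Linear(in_dim, embed_dim)              # node encoder
        self.codebook = nn.Embedding(codebook_size, embed_dim)   # discrete token vocabulary

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x:   (num_nodes, in_dim) node feature matrix
        # adj: (num_nodes, num_nodes) adjacency matrix with self-loops
        deg = adj.sum(dim=-1, keepdim=True).clamp(min=1.0)
        h = self.encoder(adj @ x / deg)                          # mean-aggregate neighbours, then project
        dists = torch.cdist(h, self.codebook.weight)             # (num_nodes, codebook_size)
        return dists.argmin(dim=-1)                              # one token id per node


# Toy usage: a 4-node path graph with random features.
tokenizer = VQGraphTokenizer(in_dim=8, embed_dim=16, codebook_size=256)
x = torch.randn(4, 8)
adj = torch.eye(4)
adj[0, 1] = adj[1, 0] = adj[1, 2] = adj[2, 1] = adj[2, 3] = adj[3, 2] = 1.0
tokens = tokenizer(x, adj)   # tensor of 4 token ids (values depend on random init)
print(tokens)
```

In a full system the codebook would be trained (e.g. with a vector-quantization loss on a graph reconstruction or self-supervised objective) so that the resulting token sequences can be fed to a language model; hierarchical variants apply the same quantization at multiple levels of graph coarsening.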

Papers