Semantic Tokens
Semantic tokens are discrete, meaningful units of information extracted from data modalities such as images, audio, and text; representing data this way aims to improve the efficiency and quality of downstream AI tasks. Current research focuses on novel tokenization methods, often integrated into transformer architectures, that capture richer semantic content and improve model performance in areas such as image generation, speech synthesis, and recommendation systems. This work matters because effective semantic tokenization enhances model interpretability, reduces computational cost, and improves the accuracy and robustness of AI systems across diverse applications.
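To make the idea concrete, below is a minimal sketch of one common approach to semantic tokenization: vector quantization against a learned codebook, where each continuous embedding (an image patch, audio frame, or text span representation) is replaced by the index of its nearest codebook entry. The `SemanticTokenizer` class, its codebook, and the dimensions are hypothetical illustrations, not the method of any particular paper listed here.

```python
import numpy as np

# Illustrative VQ-style semantic tokenizer (assumed setup, not a specific
# paper's method): each continuous feature vector is mapped to the index
# of its nearest codebook entry, yielding a discrete "semantic token".
class SemanticTokenizer:
    def __init__(self, codebook: np.ndarray):
        # codebook: (num_tokens, dim) array of learned embedding centroids
        self.codebook = codebook

    def tokenize(self, features: np.ndarray) -> np.ndarray:
        # features: (n, dim) continuous embeddings -> (n,) integer token ids
        # Squared L2 distance from every feature to every codebook entry.
        dists = ((features[:, None, :] - self.codebook[None, :, :]) ** 2).sum(-1)
        return dists.argmin(axis=1)

    def detokenize(self, token_ids: np.ndarray) -> np.ndarray:
        # Map token ids back to their codebook vectors (lossy reconstruction).
        return self.codebook[token_ids]

# Usage: quantize 4 random 8-dim embeddings against a 16-entry codebook.
rng = np.random.default_rng(0)
tokenizer = SemanticTokenizer(codebook=rng.normal(size=(16, 8)))
ids = tokenizer.tokenize(rng.normal(size=(4, 8)))
print(ids)  # four integers in [0, 16), usable as input tokens for a transformer
```

The resulting token ids form a compact discrete sequence that a transformer can consume like text tokens, which is one reason this style of tokenization can reduce computational cost relative to processing raw continuous inputs.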
Papers
DemoCraft: Using In-Context Learning to Improve Code Generation in Large Language Models
Nirmal Joshua Kapu, Mihit Sreejith
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem, Yongqin Xian, Jan Eric Lenssen, Liwei Wang, Federico Tombari, Bernt Schiele