Image Tokenizer
Image tokenization is the process of converting images into discrete representations, or tokens, that can be processed by machine learning models, particularly those based on transformer architectures. Current research focuses on tokenizers that make image generation and understanding more efficient and effective, exploring approaches such as Byte-Pair Encoding, superpixel segmentation, and wavelet transformations, often within the context of Vision Transformers and multimodal large language models. These advances are crucial to the performance and scalability of applications such as image generation, object detection, and autonomous driving, because they enable more efficient and semantically meaningful processing of visual data.
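To make the idea concrete, the sketch below shows a minimal, VQ-style discrete tokenizer: the image is cut into fixed-size patches and each flattened patch is assigned the index of its nearest codebook vector, producing a sequence of integer tokens. This is only an illustration of the general technique, not the method of any paper listed below; the patch size, codebook size, and random codebook are assumptions, and real tokenizers (e.g., VQ-VAE/VQGAN-style models) learn an encoder and codebook jointly rather than matching raw pixels.

```python
# Minimal sketch of discrete image tokenization (VQ-style nearest-neighbor lookup).
# Patch size, codebook size, and the random codebook are illustrative assumptions.
import numpy as np

def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into (num_patches, patch_size * patch_size * C) rows."""
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must divide evenly into patches"
    return (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
             .transpose(0, 2, 1, 3, 4)
             .reshape(-1, patch_size * patch_size * c)
    )

def tokenize(image: np.ndarray, codebook: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Map each patch to the index of its nearest codebook entry (a discrete token id)."""
    patches = patchify(image, patch_size)  # (N, D)
    # Squared Euclidean distances to all codebook vectors, computed without
    # materializing the full (N, K, D) difference tensor.
    dists = (
        (patches ** 2).sum(axis=1, keepdims=True)
        - 2.0 * patches @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )  # (N, K)
    return dists.argmin(axis=1)  # (N,) integer token ids

# Usage: a 224x224 RGB image with 16x16 patches becomes 196 discrete tokens.
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3)).astype(np.float32)
codebook = rng.random((1024, 16 * 16 * 3)).astype(np.float32)  # vocabulary of 1024 "visual words"
tokens = tokenize(image, codebook)
print(tokens.shape)  # (196,)
```

The resulting token sequence can then be fed to a transformer exactly like a sequence of text tokens, which is what makes this representation convenient for the generation and multimodal models discussed above.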
Papers
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
Wenzhao Zheng, Weiliang Chen, Yuanhui Huang, Borui Zhang, Yueqi Duan, Jiwen Lu
Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation
Yuhui Zhang, Brandon McKinzie, Zhe Gan, Vaishaal Shankar, Alexander Toshev