Text Encoder
Text encoders transform text into numerical representations (embeddings) that machine learning models can process, which makes them a core component of many AI systems. Current research aims to improve their performance across applications, particularly text-to-image generation, where recent work fine-tunes pre-trained models such as CLIP or integrates large language models (LLMs) to improve accuracy and controllability. These improvements matter because text encoders bridge human language and machine understanding, with impact on fields ranging from image generation and retrieval to personalized healthcare and legal technology.
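To make "numerical representation" concrete, here is a minimal sketch of a toy text encoder. It is not CLIP or any learned model: it assumes a hashed bag-of-words scheme chosen only for illustration, and the names `encode`, `cosine`, and `dim` are hypothetical. Real encoders learn their embeddings from data, but the interface is the same: text goes in, a fixed-length vector comes out.

```python
import hashlib
import math


def encode(text: str, dim: int = 64) -> list[float]:
    """Toy encoder (illustrative assumption): hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # Use a stable hash so a given token always lands in the same bucket.
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # Empty input yields the all-zero vector, which is left unnormalized.
    return [v / norm for v in vec] if norm > 0 else vec


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


cat = encode("a photo of a cat")
dog = encode("a photo of a dog")
print(len(cat), cosine(cat, dog))
```

Texts that share words end up with overlapping non-zero buckets and therefore a positive similarity. Learned encoders go further and place texts with related meanings close together even when they share no words.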
Papers
KKLIP: Knowledge Distillation Exploiting K-means Clustering for Language-Image Pre-Training
Kuei-Chun Kao
Implicit Priors Editing in Stable Diffusion via Targeted Token Adjustment
Feng He, Chao Zhang, Zhixue Zhao
Mimir: Improving Video Diffusion Models for Precise Text Understanding
Shuai Tan, Biao Gong, Yutong Feng, Kecheng Zheng, Dandan Zheng, Shuwei Shi, Yujun Shen, Jingdong Chen, Ming Yang
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
Enze Xie, Junsong Chen, Junyu Chen, Han Cai, Haotian Tang, Yujun Lin, Zhekai Zhang, Muyang Li, Ligeng Zhu, Yao Lu, Song Han
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective
Xiangru Zhu, Penglei Sun, Yaoxian Song, Yanghua Xiao, Zhixu Li, Chengyu Wang, Jun Huang, Bei Yang, Xiaoxiao Xu
Fast-and-Frugal Text-Graph Transformers are Effective Link Predictors
Andrei C. Coman, Christos Theodoropoulos, Marie-Francine Moens, James Henderson
MV-DETR: Multi-modality indoor object detection by Multi-View DEtecton TRansformers
Zichao Dong, Yilin Zhang, Xufeng Huang, Hang Ji, Zhan Shi, Xin Zhan, Junbo Chen