Visual Concept
Visual concept research studies how machines can learn the fundamental building blocks of visual information and use them to interpret and generate images. Current work emphasizes disentangling and composing visual concepts with deep learning architectures such as diffusion models, autoencoders, and vision-language models (VLMs), often combined with techniques like prompt engineering and concept-based nearest neighbors to improve interpretability and robustness. These advances matter for applications such as image generation, object recognition, and medical image analysis, where grounding predictions in recognizable visual concepts is crucial for reliable and explainable performance.
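As a rough illustration of the concept-based nearest-neighbor idea mentioned above, the sketch below scores an image embedding against a bank of labeled concept embeddings by cosine similarity and returns the closest concepts. The concept names, embedding dimension, and random vectors are placeholders for this example only; in practice the embeddings would come from a vision encoder (e.g., a CLIP-style image tower), and none of this is taken from the specific papers listed.

```python
import numpy as np

def nearest_concepts(query_emb, concept_bank, concept_names, k=3):
    """Return the k concepts whose embeddings are most cosine-similar
    to the query embedding."""
    # Normalize so dot products equal cosine similarities.
    q = query_emb / np.linalg.norm(query_emb)
    bank = concept_bank / np.linalg.norm(concept_bank, axis=1, keepdims=True)
    sims = bank @ q
    top = np.argsort(-sims)[:k]
    return [(concept_names[i], float(sims[i])) for i in top]

# Hypothetical stand-ins: random vectors in place of real encoder outputs.
rng = np.random.default_rng(0)
concept_names = ["striped", "furry", "metallic", "round"]
concept_bank = rng.normal(size=(len(concept_names), 512))
query_emb = rng.normal(size=512)

print(nearest_concepts(query_emb, concept_bank, concept_names, k=2))
```

Explaining a prediction via its nearest labeled concepts, as here, is one simple way such methods expose which visual building blocks a model is relying on.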
Papers
MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
Donghao Zhou, Jiancheng Huang, Jinbin Bai, Jiaze Wang, Hao Chen, Guangyong Chen, Xiaowei Hu, Pheng-Ann Heng
Retrieval-Augmented Personalization for Multimodal Large Language Models
Haoran Hao, Jiaming Han, Changsheng Li, Yu-Feng Li, Xiangyu Yue