Zero Shot
Zero-shot learning aims to let models perform tasks on data or categories they were never explicitly trained on, relying on pre-trained knowledge rather than task-specific training to generalize to new situations. Current research focuses on improving zero-shot capabilities across modalities (vision, language, audio) using large language models (LLMs), vision-language models (VLMs), and diffusion models, often combined with techniques such as chain-of-thought prompting, knowledge retrieval, and prompt engineering to improve performance and interpretability. The field matters because removing the need for per-task training data promises more efficient and adaptable AI systems, with applications ranging from image editing and medical diagnosis to robotics and natural language processing.
Papers
FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
Andrea Caraffa, Davide Boscaini, Amir Hamza, Fabio Poiesi
Towards Generalizable Zero-Shot Manipulation via Translating Human Interaction Plans
Homanga Bharadhwaj, Abhinav Gupta, Vikash Kumar, Shubham Tulsiani
SubZero: Subspace Zero-Shot MRI Reconstruction
Heng Yu, Yamin Arefeen, Berkin Bilgic
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Zeyu Han, Fangrui Zhu, Qianru Lao, Huaizu Jiang
Goal-conditioned Offline Planning from Curious Exploration
Marco Bagatella, Georg Martius
MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation
Sitong Su, Litao Guo, Lianli Gao, Hengtao Shen, Jingkuan Song
InterControl: Zero-shot Human Interaction Generation by Controlling Every Joint
Zhenzhi Wang, Jingbo Wang, Yixuan Li, Dahua Lin, Bo Dai
FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax
Yu Lu, Linchao Zhu, Hehe Fan, Yi Yang
GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition?
Wenhao Wu, Huanjin Yao, Mengxi Zhang, Yuxin Song, Wanli Ouyang, Jingdong Wang
SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation
Jiehong Lin, Lihua Liu, Dekun Lu, Kui Jia
Improving Adaptability and Generalizability of Efficient Transfer Learning for Vision-Language Models
Yongjin Yang, Jongwoo Ko, Se-Young Yun