Zero-Shot
Zero-shot learning aims to enable models to perform tasks on unseen data without any task-specific training, leveraging pre-trained knowledge to generalize to new situations. Current research focuses on improving zero-shot capabilities across diverse modalities (vision, language, audio) using large language models (LLMs), vision-language models (VLMs), and diffusion models, often incorporating techniques like chain-of-thought prompting, knowledge retrieval, and prompt engineering to enhance performance and interpretability. This field is significant because it promises more efficient and adaptable AI systems, impacting various applications from image editing and medical diagnosis to robotics and natural language processing.
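To make the core idea concrete, here is a minimal sketch of zero-shot inference with a pre-trained model, in the spirit of the summary above: a model trained on a generic objective scores candidate labels it was never explicitly trained to predict. It assumes the Hugging Face transformers library; the facebook/bart-large-mnli checkpoint, the input text, and the candidate labels are illustrative choices, not drawn from any paper listed below.

```python
# Minimal zero-shot text classification sketch using a pre-trained NLI model.
# Assumes the `transformers` library is installed; the checkpoint, input text,
# and labels are illustrative and not tied to the papers in this section.
from transformers import pipeline

# The pipeline reuses a model trained on natural language inference to score
# arbitrary candidate labels without any task-specific fine-tuning.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "The patient reports persistent chest pain and shortness of breath.",
    candidate_labels=["cardiology", "dermatology", "orthopedics"],
)
print(result["labels"][0], result["scores"][0])  # highest-scoring label and its score
```

The same pattern extends to other modalities, e.g. vision-language models that score an image against text prompts describing classes never seen during training.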
Papers
Track Anything Behind Everything: Zero-Shot Amodal Video Object Segmentation
Finlay G. C. Hudson, William A. P. Smith
Relation-Aware Meta-Learning for Zero-shot Sketch-Based Image Retrieval
Yang Liu, Jiale Du, Xinbo Gao, Jungong Han
CoDiff-VC: A Codec-Assisted Diffusion Model for Zero-shot Voice Conversion
Yuke Li, Xinfa Zhu, Hanzhao Li, JiXun Yao, WenJie Tian, YunLin Chen, Zhifei Li, Lei Xie
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation
Harsh Singh, Rocktim Jyoti Das, Mingfei Han, Preslav Nakov, Ivan Laptev
CoA: Chain-of-Action for Generative Semantic Labels
Meng Wei, Zhongnian Li, Peng Ying, Xinzheng Xu
Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors
Zhengfei Kuang, Tianyuan Zhang, Kai Zhang, Hao Tan, Sai Bi, Yiwei Hu, Zexiang Xu, Milos Hasan, Gordon Wetzstein, Fujun Luan
Solving Zero-Shot 3D Visual Grounding as Constraint Satisfaction Problems
Qihao Yuan, Jiaming Zhang, Kailai Li, Rainer Stiefelhagen
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Yiming Zhang, Zhuokai Zhao, Zhaorun Chen, Zenghui Ding, Xianjun Yang, Yining Sun
StereoCrafter-Zero: Zero-Shot Stereo Video Generation with Noisy Restart
Jian Shi, Qian Wang, Zhenyu Li, Peter Wonka
Unlocking Historical Clinical Trial Data with ALIGN: A Compositional Large Language Model System for Medical Coding
Nabeel Seedat, Caterina Tozzi, Andrea Hita Ardiaca, Mihaela van der Schaar, James Weatherall, Adam Taylor
Hard-Synth: Synthesizing Diverse Hard Samples for ASR using Zero-Shot TTS and LLM
Jiawei Yu, Yuang Li, Xiaosong Qiao, Huan Zhao, Xiaofeng Zhao, Wei Tang, Min Zhang, Hao Yang, Jinsong Su
TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models
Xin Wang, Kai Chen, Jiaming Zhang, Jingjing Chen, Xingjun Ma