Zero Shot
Zero-shot learning aims to let models handle tasks, classes, or data they were never explicitly trained on, generalizing from pre-trained knowledge instead of task-specific training. Current research focuses on improving zero-shot capabilities across modalities (vision, language, audio) using large language models (LLMs), vision-language models (VLMs), and diffusion models, often combined with techniques such as chain-of-thought prompting, knowledge retrieval, and prompt engineering to improve performance and interpretability. The field matters because avoiding task-specific training promises more efficient and adaptable AI systems, with applications ranging from image editing and medical diagnosis to robotics and natural language processing.
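As a rough illustration of the idea, one common zero-shot classification recipe with vision-language models scores an input embedding against embeddings of text prompts describing each candidate class and picks the closest match. The sketch below assumes that recipe. The toy 3-dimensional vectors, the prompt strings, and the function names `cosine` and `zero_shot_classify` are hypothetical stand-ins for a real pre-trained encoder and are not taken from any of the papers listed here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(input_embedding, label_embeddings):
    """Return the best-matching label and the similarity score of every label."""
    scores = {
        label: cosine(input_embedding, emb)
        for label, emb in label_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical embeddings: in practice these would come from a pre-trained
# text encoder (for the prompts) and image encoder (for the input).
label_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.1, 0.9, 0.0],
    "a photo of a car": [0.0, 0.1, 0.9],
}
image_embedding = [0.8, 0.3, 0.1]

predicted, scores = zero_shot_classify(image_embedding, label_embeddings)
print(predicted)
```

No classifier is trained on the candidate labels here: supporting a new class only requires embedding one more prompt.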
Papers
From Zero to Hero: Cold-Start Anomaly Detection
Tal Reiss, George Kour, Naama Zwerdling, Ateret Anaby-Tavor, Yedid Hoshen
Streaming Video Diffusion: Online Video Editing with Diffusion Models
Feng Chen, Zhen Yang, Bohan Zhuang, Qi Wu
View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields
Haodi He, Colton Stearns, Adam W. Harley, Leonidas J. Guibas
TIMA: Text-Image Mutual Awareness for Balancing Zero-Shot Adversarial Robustness and Generalization Ability
Fengji Ma, Li Liu, Hei Victor Cheng
Listenable Maps for Zero-Shot Audio Classifiers
Francesco Paissan, Luca Della Libera, Mirco Ravanelli, Cem Subakan
CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale
ZeMing Gong, Austin T. Wang, Xiaoliang Huo, Joakim Bruslund Haurum, Scott C. Lowe, Graham W. Taylor, Angel X. Chang
What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models
Abdelrahman Abdelhamed, Mahmoud Afifi, Alec Go
Open-Vocabulary SAM3D: Towards Training-free Open-Vocabulary 3D Scene Understanding
Hanchen Tai, Qingdong He, Jiangning Zhang, Yijie Qian, Zhenyu Zhang, Xiaobin Hu, Xiangtai Li, Yabiao Wang, Yong Liu
Blaze3DM: Marry Triplane Representation with Diffusion for 3D Medical Inverse Problem Solving
Jia He, Bonan Li, Ge Yang, Ziwen Liu
Pre-Trained Vision-Language Models as Partial Annotators
Qian-Wei Wang, Yuqiu Xie, Letian Zhang, Zimo Liu, Shu-Tao Xia
CLIPScope: Enhancing Zero-Shot OOD Detection with Bayesian Scoring
Hao Fu, Naman Patel, Prashanth Krishnamurthy, Farshad Khorrami
Time-FFM: Towards LM-Empowered Federated Foundation Model for Time Series Forecasting
Qingxiang Liu, Xu Liu, Chenghao Liu, Qingsong Wen, Yuxuan Liang