Zero-Shot
Zero-shot learning aims to let models perform tasks on unseen data without any task-specific training, leveraging knowledge acquired during pre-training to generalize to new situations. Current research focuses on improving zero-shot capabilities across modalities (vision, language, audio) using large language models (LLMs), vision-language models (VLMs), and diffusion models, often incorporating techniques such as chain-of-thought prompting, knowledge retrieval, and prompt engineering to improve performance and interpretability. The field matters because it promises more efficient and adaptable AI systems, with applications ranging from image editing and medical diagnosis to robotics and natural language processing.
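To make the core idea concrete, here is a minimal sketch of zero-shot image classification with a pre-trained vision-language model (CLIP via the Hugging Face transformers library, not tied to any paper below). The model is never trained on the candidate labels; it scores them by comparing image and text embeddings. The model checkpoint is the publicly available openai/clip-vit-base-patch32; the image path and label set are placeholders for illustration.

```python
# Minimal zero-shot image classification sketch with CLIP.
# Assumes: pip install transformers torch pillow
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder path

# Prompt engineering: wrapping labels in a natural-language template
# ("a photo of ...") typically improves zero-shot accuracy over bare labels.
labels = ["a photo of a cat", "a photo of a dog", "a photo of a robot arm"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them
# into a probability distribution over labels the model never trained on.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

The same pattern extends to other modalities: zero-shot text classification swaps the image encoder for an entailment model, and zero-shot audio classification (as in the prompt-template paper below) compares audio embeddings against text descriptions of sound classes.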
Papers
Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation
Homanga Bharadhwaj, Debidatta Dwibedi, Abhinav Gupta, Shubham Tulsiani, Carl Doersch, Ted Xiao, Dhruv Shah, Fei Xia, Dorsa Sadigh, Sean Kirmani
StyleSinger 2: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
Yu Zhang, Ziyue Jiang, Ruiqi Li, Changhao Pan, Jinzheng He, Rongjie Huang, Chuxin Wang, Zhou Zhao
In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Understanding
Moucheng Xu, Evangelos Chatzaroulas, Luc McCutcheon, Abdul Ahad, Hamzah Azeem, Janusz Marecki, Ammar Anwar
Zero-shot forecasting of chaotic systems
Yuanzhao Zhang, William Gilpin
StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis
Zhiyong Chen, Xinnuo Li, Zhiqi Ai, Shugong Xu
M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning
Taowen Wang, Yiyang Liu, James Chenhao Liang, Junhan Zhao, Yiming Cui, Yuning Mao, Shaoliang Nie, Jiahao Liu, Fuli Feng, Zenglin Xu, Cheng Han, Lifu Huang, Qifan Wang, Dongfang Liu
XMoP: Whole-Body Control Policy for Zero-shot Cross-Embodiment Neural Motion Planning
Prabin Kumar Rath, Nakul Gopalan
Using Similarity to Evaluate Factual Consistency in Summaries
Yuxuan Ye, Edwin Simpson, Raul Santos-Rodriguez
Exploring Fine-grained Retail Product Discrimination with Zero-shot Object Classification Using Vision-Language Models
Anil Osman Tur, Alessandro Conti, Cigdem Beyan, Davide Boscaini, Roberto Larcher, Stefano Messelodi, Fabio Poiesi, Elisa Ricci
Assessing the Zero-Shot Capabilities of LLMs for Action Evaluation in RL
Eduardo Pignatelli, Johan Ferret, Tim Rocktäschel, Edward Grefenstette, Davide Paglieri, Samuel Coward, Laura Toni
A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification
Michel Olvera (S2A, LTCI, IDS), Paraskevas Stamatiadis (S2A, LTCI, IDS), Slim Essid (IDS, S2A, LTCI)
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models
EverestAI: Sijin Chen, Yuan Feng, Laipeng He, Tianwei He, Wendi He, Yanni Hu, Bin Lin, Yiting Lin, Pengfei Tan, Chengwei Tian, Chen Wang, Zhicheng Wang, Ruoye Xie, Jingjing Yin, Jianhao Ye, Jixun Yao, Quanlei Yan, Yuguang Yang
IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition
Rui Liu, Zahiruddin Mahammad, Amisha Bhaskar, Pratap Tokekar
One Map to Find Them All: Real-time Open-Vocabulary Mapping for Zero-shot Multi-Object Navigation
Finn Lukas Busch, Timon Homberger, Jesús Ortega-Peimbert, Quantao Yang, Olov Andersson
Prompt-and-Transfer: Dynamic Class-aware Enhancement for Few-shot Segmentation
Hanbo Bi, Yingchao Feng, Wenhui Diao, Peijin Wang, Yongqiang Mao, Kun Fu, Hongqi Wang, Xian Sun
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
Yinghao Aaron Li, Xilin Jiang, Cong Han, Nima Mesgarani