Zero Shot
Zero-shot learning aims to enable models to perform tasks, or recognize classes, for which they have received no task-specific training examples, relying on pre-trained knowledge to generalize to new situations. Current research focuses on improving zero-shot capabilities across modalities (vision, language, audio) using large language models (LLMs), vision-language models (VLMs), and diffusion models, often combined with techniques such as chain-of-thought prompting, knowledge retrieval, and prompt engineering to improve performance and interpretability. The field matters because it promises more efficient and adaptable AI systems, with applications ranging from image editing and medical diagnosis to robotics and natural language processing.
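As a rough illustration of the zero-shot idea described above, the sketch below assigns an input to one of several labels purely by comparing embeddings in a shared space, with no training on those labels. It is a minimal sketch, not the method of any paper listed here: the function names, the three-dimensional vectors, and the label set are all hypothetical, and a real system would obtain the embeddings from a pre-trained encoder such as a vision-language model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(input_embedding, label_embeddings):
    """Pick the label whose embedding is most similar to the input.

    No classifier is trained on these labels; the decision relies only on
    the geometry of the (assumed pre-trained) shared embedding space.
    """
    scores = {
        label: cosine(input_embedding, emb)
        for label, emb in label_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy embeddings standing in for encoder outputs.
label_embeddings = {
    "zebra": [0.9, 0.1, 0.8],
    "horse": [0.9, 0.1, 0.0],
    "fish": [0.0, 0.9, 0.1],
}
image_embedding = [0.8, 0.2, 0.7]  # assumed embedding of an unseen image

label, scores = zero_shot_classify(image_embedding, label_embeddings)
print(label)
```

Adding a new label here only requires adding its embedding to the dictionary, which is what makes the approach zero-shot with respect to that label.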
Papers
Crafting Narrative Closures: Zero-Shot Learning with SSM Mamba for Short Story Ending Generation
Divyam Sharma, Divya Santhanam
Zero-Shot Fact Verification via Natural Logic and Large Language Models
Marek Strong, Rami Aly, Andreas Vlachos
MultiVerse: Efficient and Expressive Zero-Shot Multi-Task Text-to-Speech
Taejun Bak, Youngsik Eom, SeungJae Choi, Young-Sun Joo
Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers
Shijie Chen, Bernal Jiménez Gutiérrez, Yu Su
Plots Unlock Time-Series Understanding in Multimodal Models
Mayank Daswani, Mathias M.J. Bellaiche, Marc Wilson, Desislav Ivanov, Mikhail Papkov, Eva Schnider, Jing Tang, Kay Lamerigts, Gabriela Botea, Michael A. Sanchez, Yojan Patel, Shruthi Prabhakara, Shravya Shetty, Umesh Telang
FreeMask: Rethinking the Importance of Attention Masks for Zero-Shot Video Editing
Lingling Cai, Kang Zhao, Hangjie Yuan, Yingya Zhang, Shiwei Zhang, Kejie Huang
VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs
Ruotong Liao, Max Erler, Huiyu Wang, Guangyao Zhai, Gengyuan Zhang, Yunpu Ma, Volker Tresp
Analysing Zero-Shot Readability-Controlled Sentence Simplification
Abdullah Barayan, Jose Camacho-Collados, Fernando Alva-Manchego
TROPE: TRaining-Free Object-Part Enhancement for Seamlessly Improving Fine-Grained Zero-Shot Image Captioning
Joshua Feinglass, Yezhou Yang
PALM: Few-Shot Prompt Learning for Audio Language Models
Asif Hanif, Maha Tufail Agro, Mohammad Areeb Qazi, Hanan Aldarmaki
MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation
Lijian Xu, Hao Sun, Ziyu Ni, Hongsheng Li, Shaoting Zhang
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Jing He, Haodong Li, Wei Yin, Yixun Liang, Leheng Li, Kaiqiang Zhou, Hongbo Zhang, Bingbing Liu, Ying-Cong Chen
FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction
Runze He, Kai Ma, Linjiang Huang, Shaofei Huang, Jialin Gao, Xiaoming Wei, Jiao Dai, Jizhong Han, Si Liu
IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning
Soeun Lee, Si-Woo Kim, Taewhan Kim, Dong-Jin Kim
Few-shot Pairwise Rank Prompting: An Effective Non-Parametric Retrieval Model
Nilanjan Sinhababu, Andrew Parry, Debasis Ganguly, Debasis Samanta, Pabitra Mitra
AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status
Jinghao Zhang, Wen Qian, Hao Luo, Fan Wang, Feng Zhao
T3: A Novel Zero-shot Transfer Learning Framework Iteratively Training on an Assistant Task for a Target Task
Xindi Tong, Yujin Zhu, Shijian Fan, Liang Xu
Leveraging Semantic and Geometric Information for Zero-Shot Robot-to-Human Handover
Jiangshan Liu, Wenlong Dong, Jiankun Wang, Max Q.-H. Meng