Zero-Shot
Zero-shot learning aims to enable models to perform tasks on unseen data without any task-specific training, leveraging knowledge acquired during pre-training to generalize to new situations. Current research focuses on improving zero-shot capabilities across modalities (vision, language, audio) using large language models (LLMs), vision-language models (VLMs), and diffusion models, often incorporating techniques such as chain-of-thought prompting, knowledge retrieval, and prompt engineering to improve performance and interpretability. The field is significant because it promises more efficient and adaptable AI systems, with applications ranging from image editing and medical diagnosis to robotics and natural language processing.
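To make the zero-shot setup concrete, below is a minimal sketch of zero-shot image classification with a pre-trained vision-language model (CLIP, accessed through the Hugging Face transformers library). The checkpoint name, image path, and candidate labels are illustrative placeholders, not drawn from any of the papers listed here. The key point is that no task-specific training occurs: natural-language label prompts stand in for a trained classifier head.

```python
# Minimal sketch: zero-shot image classification with CLIP.
# Checkpoint, image path, and labels are illustrative assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder input image
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

# Embed the image and the label prompts in a shared space.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Image-text similarity scores, turned into class probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
for label, p in zip(labels, probs[0].tolist()):
    print(f"{label}: {p:.3f}")
```

A common design choice, following the original CLIP work, is to wrap class names in prompt templates like "a photo of a {label}" rather than using bare class names, which typically improves zero-shot accuracy.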
Papers
PRISM: A Multi-Modal Generative Foundation Model for Slide-Level Histopathology
George Shaikovski, Adam Casson, Kristen Severson, Eric Zimmermann, Yi Kan Wang, Jeremy D. Kunz, Juan A. Retamero, Gerard Oakley, David Klimstra, Christopher Kanan, Matthew Hanna, Michal Zelechowski, Julian Viret, Neil Tenenholtz, James Hall, Nicolo Fusi, Razik Yousfi, Peter Hamilton, William A. Moye, Eugene Vorontsov, Siqi Liu, Thomas J. Fuchs
Distilling Implicit Multimodal Knowledge into LLMs for Zero-Resource Dialogue Generation
Bo Zhang, Hui Ma, Jian Ding, Jian Wang, Bo Xu, Hongfei Lin
LaT-PFN: A Joint Embedding Predictive Architecture for In-context Time-series Forecasting
Stijn Verdenius, Andrea Zerio, Roy L. M. Wang
Revisiting Deep Audio-Text Retrieval Through the Lens of Transportation
Manh Luong, Khai Nguyen, Nhat Ho, Reza Haf, Dinh Phung, Lizhen Qu
Zero-Shot Hierarchical Classification on the Common Procurement Vocabulary Taxonomy
Federico Moiraghi, Matteo Palmonari, Davide Allavena, Federico Morando
A Minimalist Prompt for Zero-Shot Policy Learning
Meng Song, Xuezhi Wang, Tanay Biradar, Yao Qin, Manmohan Chandraker
RoboHop: Segment-based Topological Map Representation for Open-World Visual Navigation
Sourav Garg, Krishan Rana, Mehdi Hosseinzadeh, Lachlan Mares, Niko Sünderhauf, Feras Dayoub, Ian Reid
LangCell: Language-Cell Pre-training for Cell Identity Understanding
Suyuan Zhao, Jiahuan Zhang, Yushuai Wu, Yizhen Luo, Zaiqing Nie
Joint semi-supervised and contrastive learning enables zero-shot domain-adaptation and multi-domain segmentation
Alvaro Gomariz, Yusuke Kikuchi, Yun Yvonna Li, Thomas Albrecht, Andreas Maunz, Daniela Ferrara, Huanxiang Lu, Orcun Goksel
Zero-shot LLM-guided Counterfactual Generation for Text
Amrita Bhattacharjee, Raha Moraffah, Joshua Garland, Huan Liu
CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification
Sankalp Sinha, Muhammad Saif Ullah Khan, Talha Uddin Sheikh, Didier Stricker, Muhammad Zeshan Afzal
Tilt your Head: Activating the Hidden Spatial-Invariance of Classifiers
Johann Schmidt, Sebastian Stober
Quantifying the Capabilities of LLMs across Scale and Precision
Sher Badshah, Hassan Sajjad
Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation
Homanga Bharadhwaj, Roozbeh Mottaghi, Abhinav Gupta, Shubham Tulsiani
Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores
Kiyoon Jeong, Woojun Lee, Woongchan Nam, Minjeong Ma, Pilsung Kang