Universal Image

Universal image embedding research aims to create single models capable of representing and processing images across diverse domains and tasks, overcoming the limitations of domain-specific models. Current efforts focus on developing robust and efficient embedding models, often leveraging large language models (LLMs) and contrastive learning frameworks, to achieve high performance on various downstream applications like image retrieval, segmentation, and generation. This pursuit of universality is significant because it promises more efficient and adaptable AI systems, impacting fields ranging from medical image analysis to large-scale visual search.

Papers

July 17, 2024

E5-V: Universal Embeddings with Multimodal Large Language Models
Ting Jiang, Minghui Song, Zihan Zhang, Haizhen Huang, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang
Multimodal Large Language Model Multimodal Input Universal Image Multimodal Embeddings Unified Multimodal

July 11, 2024

Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending
Delong Wu, Hao Zhu, Qi Zhang, You Li, Zhan Ma, Xun Cao
Implicit Neural Representation Universal Image Continuous Time Neural Solver Poisson Surface Reconstruction Blending Method Target Preserving Blending

June 18, 2024

Universal Score-based Speech Enhancement with High Content Preservation
Robin Scheibler, Yusuke Fujita, Yuma Shirahata, Tatsuya Komatsu
Universal Image Enhanced Speech Enhancement Model Content Preservation Universal Speech Enhancement

June 14, 2024

Universal randomised signatures for generative time series modelling
Francesca Biagini, Lukas Gonon, Niklas Walter
Reservoir Computing Universal Image Neural Stochastic Differential Equation Adversarial Generation Handwritten Signature Path Signature Generative Time Series Randomized Signature

June 13, 2024

OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning
Tairan He, Zhengyi Luo, Xialin He, Wenli Xiao, Chong Zhang, Weinan Zhang, Kris Kitani, Changliu Liu, Guanya Shi
LeArning Abstract Robot Teleoperation Universal Image Humanoid Teleoperation Humanoid Motion

June 7, 2024

USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation
Xiaoqi Wang, Wenbin He, Xiwei Xuan, Clint Sebastian, Jorge Piazentin Ono, Xin Li, Sima Behpour, Thang Doan, Liang Gou, Han Wei Shen, Liu Ren
Semantic Segmentation Greater Public Use Universal Image Open Vocabulary Segmentation Open Vocabulary Image Segmentation

June 6, 2024

VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
Junjie Zhou, Zheng Liu, Shitao Xiao, Bo Zhao, Yongping Xiong
Text Modality Multi Modal Visual Token Universal Image Spatial Attention Multi Modal Representation Multimodal Retrieval

June 3, 2024

Unseen Visual Anomaly Generation
Han Sun, Yunkang Cao, Hao Dong, Olga Fink
Anomaly Detection Universal Image Visual Anomaly Detection Cut and Approximate

May 30, 2024

AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided Optimization
Jiawei Chen, Xiao Yang, Zhengwei Fang, Yu Tian, Yinpeng Dong, Zhaoxia Yin, Hang Su
Large Language Model High Efficiency Optimization Purpose Jailbreak Attack Universal Image

May 28, 2024

April 24, 2024

URL: Universal Referential Knowledge Linking via Task-instructed Representation Compression
Zhuoqun Li, Hongyu Lin, Tianshu Wang, Boxi Cao, Yaojie Lu, Weixiang Zhou, Hao Wang, Zhenyu Zeng, Le Sun, Xianpei Han
Universal Image Semantic Understanding Semantic Matching Coreference Information

April 11, 2024

April 9, 2024

Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
New Framework Audio Representation Universal Image Speech Domain Mask Pair Audio Pre Training

April 4, 2024

PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments
Kairui Ding, Boyuan Chen, Ruihai Wu, Yuyang Li, Zongzheng Zhang, Huan-ang Gao, Siqi Li, Guyue Zhou, Yixin Zhu, Hao Dong, Hao Zhao
Arbitrary Object Robotic Manipulation Affordance Learning Environment Feature Universal Image Pre Grasp

February 24, 2024

Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models
Chaoya Jiang, Hongrui Jia, Wei Ye, Mengfan Dong, Haiyang Xu, Ming Yan, Ji Zhang, Shikun Zhang
Large Vision Language Model Universal Image Fine Grained Hallucination Level Hallucination

February 5, 2024

Less is KEN: a Universal and Simple Non-Parametric Pruning Algorithm for Large Language Models
Michele Mastromattei, Fabio Massimo Zanzotto
Edge Pruning Universal Image Kernel Density Neural Network Pruning Unstructured Pruning Kernel Based Entropic Novelty

January 30, 2024

EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain
Wei Zhang, Miaoxin Cai, Tong Zhang, Yin Zhuang, Xuerui Mao
Remote Sensing Remote Sensing Image Multi Modal Large Language Model Universal Image Cross Modal Understanding Multi Modal Remote Sensing

December 19, 2023

ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval
Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen
Domain Shift Universal Image Prompt Framework Cross Domain Retrieval Generalized Knowledge

Universal Image

Papers

E5-V: Universal Embeddings with Multimodal Large Language Models

Neural Poisson Solver: A Universal and Continuous Framework for Natural Signal Blending

Universal Score-based Speech Enhancement with High Content Preservation

Universal randomised signatures for generative time series modelling

OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning

USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation

VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval

Unseen Visual Anomaly Generation

AutoBreach: Universal and Adaptive Jailbreaking with Efficient Wordplay-Guided Optimization

Universal and Extensible Language-Vision Models for Organ Segmentation and Tumor Detection from Abdominal Computed Tomography

FlashST: A Simple and Universal Prompt-Tuning Framework for Traffic Prediction

URL: Universal Referential Knowledge Linking via Task-instructed Representation Compression

AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs

Learning Chemotherapy Drug Action via Universal Physics-Informed Neural Networks

Masked Modeling Duo: Towards a Universal Audio Pre-training Framework

PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments

Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models

Less is KEN: a Universal and Simple Non-Parametric Pruning Algorithm for Large Language Models

EarthGPT: A Universal Multi-modal Large Language Model for Multi-sensor Image Comprehension in Remote Sensing Domain

ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval