Unimodal Model

Unimodal models, focusing on single data modalities (e.g., text or images), are being increasingly leveraged to build and improve multimodal models that integrate information from multiple sources. Current research emphasizes efficient methods for aligning unimodal representations, often using contrastive learning, projection layers, or Mixture of Experts (MoE) architectures, to create effective multimodal systems. This work is significant because it allows researchers to build powerful multimodal models by leveraging the strengths of existing, well-trained unimodal architectures, reducing computational costs and data requirements while improving performance on tasks like sentiment analysis, activity recognition, and image retrieval.

Papers

June 24, 2024

UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos
Yuting Mei, Linli Yao, Qin Jin
Unified Framework Gameplay Video Unimodal Model Video Summarization

May 28, 2024

MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
Yake Wei, Di Hu
Multimodal Learning Unimodal Model Unimodal Encoders Unimodal Learning Multimodal Loss

May 24, 2024

Composed Image Retrieval for Remote Sensing
Bill Psomas, Ioannis Kakogeorgiou, Nikos Efthymiadis, Giorgos Tolias, Ondrej Chum, Yannis Avrithis, Konstantinos Karantzalos
Vision Language Model Remote Sensing Image to Image Unimodal Model Composed Image Retrieval

May 13, 2024

Improving Multimodal Learning with Multi-Loss Gradient Modulation
Konstantinos Kontras, Christos Chatzichristos, Matthew Blaschko, Maarten De Vos
Audio Visual Multimodal Learning Multi Modality Different Modality Unimodal Model ResNet Based Gradient Modulation

April 30, 2024

Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta, Anna Deichler, Jim O'Regan, Birger Moëll, Jonas Beskow, Gustav Eje Henter, Simon Alexanderson
Synthetic Data Multimodal Phenomenon Multimodal Model Unimodal Model Data Scarcity Speech Driven Gesture Co Speech 3D Gesture

April 29, 2024

Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model
Seonhee Cho, Choonghan Kim, Jiho Lee, Chetan Chilkunda, Sujin Choi, Joo Heung Yoon
Domain Specific Large Multimodal Model Multimodal Data Unimodal Model Radiology Imaging Multimodal Problem Domain LLM

April 24, 2024

Unimodal and Multimodal Sensor Fusion for Wearable Activity Recognition
Hymalai Bello
Activity Recognition Sensor Fusion Gesture Recognition Wearable Device Unimodal Model Sensor Modality

April 13, 2024

MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild
Kateryna Chumachenko, Alexandros Iosifidis, Moncef Gabbouj
Multimodal Emotion Recognition Unimodal Model Dynamic Facial Expression Recognition Modal Adapter Modality Adaptation

April 2, 2024

March 28, 2024

MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models
Yanting Wang, Hongye Fu, Wei Zou, Jinyuan Jia
Adversarial Attack Multi Modal Model Unimodal Model Provable Defense

March 19, 2024

TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer
Eunjee Choi, Jong-Kook Kim
Transformer Based Fake News Fake News Detection Multimodal Information Multimodal Transformer Unimodal Model Bit Level Information Preserving

February 22, 2024

Introducing GenCeption for Multimodal LLM Benchmarking: You May Bypass Annotations
Lele Cao, Valentin Buchner, Zineb Senane, Fangkai Yang
Multimodal Large Language Model Multimodal LLM Unimodal Model Multimodal Benchmark MLLM Training Selective Annotation

February 1, 2024

In-Bed Pose Estimation: A Review
Ziya Ata Yazıcı, Sara Colantonio, Hazım Kemal Ekenel
Narrative Review Unimodal Model Multimodal Approach Bed Human

January 16, 2024

Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination
Syeda Nahida Akter, Aman Madaan, Sangwu Lee, Yiming Yang, Eric Nyberg
Vision Language Model Visual Representation Multimodal Model Reasoning Task Unimodal Model

December 28, 2023

Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution
Ying Wang, Tim G. J. Rudner, Andrew Gordon Wilson
Vision Language Model Latent Representation Information Bottleneck Unimodal Model Visual Explanation Image Text Representation

December 18, 2023

UniForCE: The Unimodality Forest Method for Clustering and Estimation of the Number of Clusters
Georgios Vardakas, Argyris Kalogeratos, Aristidis Likas
Estimation Task Different Cluster Unimodal Model Unimodal Classifier Cluster Shape

November 28, 2023

A Multivariate Unimodality Test Harnessing the Dip Statistic of Mahalanobis Distances Over Random Projections
Prodromos Kolyvakis, Aristidis Likas
Unimodal Model P Value Unimodal Distribution

November 27, 2023

LMM-Assisted Breast Cancer Treatment Target Segmentation with Consistency Embedding
Kwanyoung Kim, Yujin Oh, Sangjoon Park, Hwa Kyung Byun, Jin Sung Kim, Yong Bae Kim, Jong Chul Ye
Large Multimodal Model Strong Consistency Unimodal Model Clinical Note Summarization Breast Tumor Segmentation Multimodal Segmentation

November 16, 2023

Improving Unimodal Inference with Multimodal Transformers
Kateryna Chumachenko, Alexandros Iosifidis, Moncef Gabbouj
Multimodal Transformer Unimodal Model Multi Modal Training Large Scale Multimodal

Unimodal Model

Papers

UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos

MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance

Composed Image Retrieval for Remote Sensing

Improving Multimodal Learning with Multi-Loss Gradient Modulation

Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis

Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model

Unimodal and Multimodal Sensor Fusion for Wearable Activity Recognition

MMA-DFER: MultiModal Adaptation of unimodal models for Dynamic Facial Expression Recognition in-the-wild

Attribution Regularization for Multimodal Paradigms

MESEN: Exploit Multimodal Data to Design Unimodal Human Activity Recognition with Few Labels

MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models

TT-BLIP: Enhancing Fake News Detection Using BLIP and Tri-Transformer

Introducing GenCeption for Multimodal LLM Benchmarking: You May Bypass Annotations

In-Bed Pose Estimation: A Review

Self-Imagine: Effective Unimodal Reasoning with Multimodal Models using Self-Imagination

Visual Explanations of Image-Text Representations via Multi-Modal Information Bottleneck Attribution

UniForCE: The Unimodality Forest Method for Clustering and Estimation of the Number of Clusters

A Multivariate Unimodality Test Harnessing the Dip Statistic of Mahalanobis Distances Over Random Projections

LMM-Assisted Breast Cancer Treatment Target Segmentation with Consistency Embedding

Improving Unimodal Inference with Multimodal Transformers