Unimodal Model
Unimodal models, which process a single data modality (e.g., text or images), are increasingly being leveraged to build and improve multimodal models that integrate information from multiple sources. Current research emphasizes efficient methods for aligning unimodal representations, often via contrastive learning, projection layers, or Mixture-of-Experts (MoE) architectures, to create effective multimodal systems. This line of work matters because it lets researchers build powerful multimodal models from existing, well-trained unimodal components, reducing computational cost and data requirements while improving performance on tasks such as sentiment analysis, activity recognition, and image retrieval.
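To make the alignment idea concrete, here is a minimal sketch of the common contrastive recipe (as popularized by CLIP-style training): frozen unimodal encoders produce features for each modality, learned projection layers map them into a shared embedding space, and a symmetric InfoNCE loss pulls matched pairs together. All function names, shapes, and the temperature value below are illustrative assumptions, not taken from any specific paper on this page; the projections are plain NumPy matrices standing in for trainable layers.

```python
import numpy as np

def project(features, W):
    """Map unimodal features into the shared space and L2-normalize.

    `W` stands in for a trainable projection layer on top of a
    frozen unimodal encoder (an assumption for this sketch).
    """
    z = features @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def contrastive_alignment_loss(feats_a, feats_b, W_a, W_b, temperature=0.07):
    """Symmetric InfoNCE loss between two unimodal feature batches.

    Row i of `feats_a` and row i of `feats_b` are assumed to be a
    matched pair (e.g., an image and its caption).
    """
    z_a = project(feats_a, W_a)
    z_b = project(feats_b, W_b)
    # Cosine similarities of every cross-modal pair; matched pairs
    # sit on the diagonal.
    logits = (z_a @ z_b.T) / temperature
    n = len(logits)
    idx = np.arange(n)

    def cross_entropy(l):
        # Numerically stable log-softmax over each row, then take
        # the negative log-probability of the matched (diagonal) pair.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # Average the modality-a-to-b and b-to-a directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

In practice only the projection matrices (and optionally the temperature) would be trained, which is what keeps this approach cheap relative to training a multimodal model from scratch: correctly paired batches should yield a lower loss than mismatched ones, and gradient descent on `W_a` and `W_b` exploits exactly that gap.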
Papers