Input Modality

Input modality research investigates how the form in which information is presented (e.g., text, images, audio) affects machine learning model performance and human-computer interaction. Current research focuses on multimodal learning: how to combine different input modalities effectively, using architectures such as transformers and techniques such as contrastive learning and optimal transport for data alignment and fusion. The field matters for artificial intelligence applications, particularly in robotics and healthcare, where systems must process and interpret information from diverse sources robustly and efficiently. Understanding modality effects also informs the design of more effective and user-friendly interfaces.
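
As a rough illustration of what contrastive alignment between two modalities can look like, the sketch below computes a symmetric InfoNCE-style loss over a batch of paired embeddings. It is a minimal example under stated assumptions, not the method of any particular paper: the function names, the temperature of 0.07, and the synthetic "text" and "image" embeddings are all hypothetical, and real systems would obtain the embeddings from trained encoders.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(emb_a, emb_b, temperature=0.07):
    """Symmetric InfoNCE-style loss for a batch of paired embeddings.

    Row i of emb_a and row i of emb_b are assumed to describe the same item
    in two modalities; every other row in the batch serves as a negative.
    The temperature value is an illustrative assumption.
    """
    a = l2_normalize(emb_a)
    b = l2_normalize(emb_b)
    logits = a @ b.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(a))                 # matching pairs lie on the diagonal
    loss_a_to_b = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_b_to_a = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_a_to_b + loss_b_to_a)


# Synthetic stand-ins for encoder outputs (hypothetical data).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(8, 16))
image_emb_aligned = text_emb + 0.01 * rng.normal(size=(8, 16))  # nearly matched pairs
image_emb_random = rng.normal(size=(8, 16))                     # unrelated pairs

aligned_loss = symmetric_info_nce(text_emb, image_emb_aligned)
random_loss = symmetric_info_nce(text_emb, image_emb_random)
print(f"aligned pairs: {aligned_loss:.4f}, unrelated pairs: {random_loss:.4f}")
```

The loss is lower when paired embeddings from the two modalities sit close together relative to the other items in the batch, which is the property a contrastive objective rewards during training.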

Papers