Modality Specific
Modality-specific research focuses on effectively integrating information from diverse data sources (e.g., text, images, audio, video) in machine learning models, aiming to leverage the unique strengths of each modality while mitigating its individual limitations. Current research emphasizes advanced fusion techniques, including mixture-of-experts models and attention mechanisms, to build robust multimodal representations and improve performance on tasks such as classification, generation, and object tracking. This line of work is crucial for advancing artificial intelligence in applications requiring nuanced understanding of complex real-world scenarios, such as medical diagnosis, autonomous driving, and affective computing, and efficient modality-specific methods are driving progress toward more accurate and robust AI systems across these domains.
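To make the fusion idea concrete, here is a minimal sketch (not the method of any paper listed below) of attention-based fusion over modality-specific representations: each modality gets its own projection into a shared space, and a learned scoring vector produces attention weights for combining them. All names, dimensions, and the random initialization are illustrative assumptions; in a real model the projections and scoring vector would be trained.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d_text, d_image, d_model = 16, 32, 8  # illustrative dimensions

# Hypothetical modality-specific projection matrices and a shared
# attention scoring vector (randomly initialized here; learned in practice).
W_text = rng.normal(size=(d_text, d_model))
W_image = rng.normal(size=(d_image, d_model))
w_score = rng.normal(size=(d_model,))

def fuse(text_feat, image_feat):
    """Project each modality into the shared space, then return an
    attention-weighted sum of the projected features."""
    h = np.stack([text_feat @ W_text, image_feat @ W_image])  # (2, d_model)
    scores = h @ w_score                                      # one score per modality
    alpha = softmax(scores)                                   # attention weights, sum to 1
    return alpha @ h, alpha                                   # fused vector, weights

text_feat = rng.normal(size=d_text)
image_feat = rng.normal(size=d_image)
fused, alpha = fuse(text_feat, image_feat)
print(fused.shape, alpha)
```

Mixture-of-experts fusion follows the same shape: the softmax weights would instead gate modality-specific expert networks rather than raw projections.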
Papers
Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs
Anirudh Phukan, Divyansh, Harshit Kumar Morj, Vaishnavi, Apoorv Saxena, Koustava Goswami
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
Siqi Kou, Jiachun Jin, Chang Liu, Ye Ma, Jian Jia, Quan Chen, Peng Jiang, Zhijie Deng