Multimodal Sensing

Multimodal sensing integrates data from multiple sensor types (e.g., vision, touch, inertial measurement units) to achieve a more comprehensive understanding of a system or environment than any single modality could provide. Current research focuses on robust methods for fusing these diverse data streams, often using deep learning architectures such as Vision Transformers (ViTs) together with techniques like contrastive learning and variational inference to disentangle shared and modality-specific information. The field is crucial for advancing applications ranging from robotic manipulation and autonomous navigation to healthcare monitoring and urban change detection, offering improved accuracy, resilience, and efficiency in data analysis and decision-making.
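
To make the contrastive-alignment idea concrete, below is a minimal PyTorch sketch of CLIP-style symmetric InfoNCE alignment between two modalities (here, vision and touch). The class name `ContrastiveFusion`, the linear projection heads, and all dimensions are illustrative assumptions, not the method of any particular paper; real systems would replace the projections with full modality-specific encoders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveFusion(nn.Module):
    """Projects two modality embeddings into a shared space and aligns
    them with a symmetric InfoNCE loss over a batch of paired samples.
    (Hypothetical sketch; names and dimensions are illustrative.)"""

    def __init__(self, vision_dim: int, touch_dim: int,
                 shared_dim: int = 128, temperature: float = 0.07):
        super().__init__()
        # Linear heads stand in for modality-specific encoders (e.g., a ViT).
        self.vision_proj = nn.Linear(vision_dim, shared_dim)
        self.touch_proj = nn.Linear(touch_dim, shared_dim)
        self.temperature = temperature

    def forward(self, vision_feats: torch.Tensor, touch_feats: torch.Tensor):
        # L2-normalize so dot products are cosine similarities.
        v = F.normalize(self.vision_proj(vision_feats), dim=-1)
        t = F.normalize(self.touch_proj(touch_feats), dim=-1)
        # Pairwise similarity matrix; diagonal entries are matched pairs.
        logits = v @ t.T / self.temperature
        targets = torch.arange(v.size(0), device=v.device)
        # Symmetric loss: vision-to-touch and touch-to-vision retrieval.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.T, targets)) / 2

# Usage with random stand-in features for 32 paired observations.
model = ContrastiveFusion(vision_dim=512, touch_dim=64)
vision_batch = torch.randn(32, 512)  # e.g., ViT [CLS] embeddings
touch_batch = torch.randn(32, 64)    # e.g., tactile sensor features
loss = model(vision_batch, touch_batch)
loss.backward()
```

Pulling matched pairs together while pushing apart mismatched ones encourages the shared space to capture information common to both modalities; complementary approaches (e.g., variational methods) additionally model the modality-specific residual.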

Papers