Multimodal Representation Learning using Adaptive Graph Construction [2410.06395]