Dual Encoder

Dual encoder models use two separate encoders to map different data types (e.g., images and text) into a shared embedding space, where the similarity between any two inputs can be computed cheaply, typically as a dot product or cosine similarity between their embeddings. Because candidate embeddings can be computed once and indexed ahead of time, these models are primarily used in retrieval tasks. Current research focuses on improving their accuracy. A common approach is knowledge distillation from cross-encoders, which are more accurate but less efficient because they must jointly process every query-candidate pair. Researchers are also exploring architectural variations such as asymmetric designs and the incorporation of frequency or multimodal information. These advances matter because they enable faster, more scalable solutions for applications including image-text retrieval, question answering, and even robotic trajectory planning, while addressing the models' limitations in accuracy and generalization.
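The following is a minimal, dependency-free sketch of the two ideas described above, under stated assumptions. First, two independent encoders map inputs into one shared space, candidates are indexed once, and a query is scored against the index with a dot product of L2-normalized vectors (cosine similarity). Second, a distillation loss pulls the dual encoder's (student's) scores toward a cross-encoder's (teacher's) scores. The "encoders" here are hypothetical fixed linear projections standing in for real image and text networks. The loss is one common formulation, a KL divergence between temperature-scaled softmax distributions over the same candidate list, and is not necessarily the method used in any particular paper. All names (linear_encoder, build_index, retrieve, distillation_kl) are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def linear_encoder(weights: Sequence[Sequence[float]]) -> Callable[[Sequence[float]], Vector]:
    """Hypothetical toy encoder: a fixed linear projection into the shared space,
    followed by L2 normalization. Stands in for a real image or text tower."""
    def encode(features: Sequence[float]) -> Vector:
        projected = [sum(w * x for w, x in zip(row, features)) for row in weights]
        return l2_normalize(projected)
    return encode


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_index(encoder: Callable[[Sequence[float]], Vector], items) -> List[Vector]:
    """Encode every candidate once, offline; queries never re-encode candidates."""
    return [encoder(x) for x in items]


def retrieve(query_vec: Vector, index: List[Vector], k: int) -> List[Tuple[int, float]]:
    """Score the query against all indexed candidates and return the top-k (id, score)."""
    scores = [(i, dot(query_vec, v)) for i, v in enumerate(index)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


def log_softmax(scores: Sequence[float], temperature: float) -> List[float]:
    """Numerically stable log-softmax of temperature-scaled scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def distillation_kl(student_scores: Sequence[float], teacher_scores: Sequence[float],
                    temperature: float = 1.0) -> float:
    """KL(teacher || student) over one query's candidate list (assumed formulation)."""
    log_p_t = log_softmax(teacher_scores, temperature)
    log_p_s = log_softmax(student_scores, temperature)
    return sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))


# Usage: 3-dim "image" features and 4-dim "text" features, shared 2-dim space.
image_encoder = linear_encoder([[1, 0, 0], [0, 1, 0]])
text_encoder = linear_encoder([[0, 0, 1, 0], [0, 0, 0, 1]])

images = [[1, 0, 5], [0, 1, 5], [1, 1, 0]]
index = build_index(image_encoder, images)

query = text_encoder([9, 9, 0.9, 0.1])
results = retrieve(query, index, k=2)  # image 0 ranks first, image 2 second

# Distillation: the student ranks candidates in the reverse order of the teacher.
loss = distillation_kl(student_scores=[3.0, 2.0, 1.0], teacher_scores=[1.0, 2.0, 3.0])
```

The split between build_index and retrieve reflects why dual encoders scale: candidate embeddings do not depend on the query, so only the query is encoded at search time. A cross-encoder cannot separate the two steps this way, which is why it is used as a teacher rather than for large-scale search.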

Papers