Dense Retriever
Dense retrieval aims to efficiently find relevant information (e.g., documents, passages) in large collections by representing both queries and items as dense vectors, so that relevance can be estimated through fast similarity comparisons. Current research emphasizes improving the accuracy and efficiency of these methods, exploring techniques such as contrastive learning, knowledge distillation, and the integration of large language models (LLMs), particularly in low-resource or zero-shot scenarios. These advances matter for applications such as question answering, conversational search, and biomedical literature search, where they enable faster and more accurate information access.
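To make the core idea concrete, here is a minimal sketch of dense retrieval with NumPy. The embeddings are toy vectors standing in for the output of a learned encoder (in practice, a neural model maps each query and passage to a high-dimensional vector); the function and variable names are illustrative, not from any specific paper listed below.

```python
import numpy as np

def cosine_scores(query_vec, doc_matrix):
    """Score every passage against the query by cosine similarity."""
    # Normalize both sides so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    return d @ q

# Toy corpus: each row is a (hypothetical) dense embedding of a passage.
doc_embeddings = np.array([
    [0.9, 0.1, 0.0],   # passage 0
    [0.1, 0.8, 0.3],   # passage 1
    [0.0, 0.2, 0.9],   # passage 2
])
# A (hypothetical) dense embedding of the user's query.
query_embedding = np.array([0.85, 0.15, 0.05])

scores = cosine_scores(query_embedding, doc_embeddings)
top_k = np.argsort(-scores)[:2]  # indices of the 2 most similar passages
```

Because scoring reduces to vector dot products, large corpora can be searched quickly with approximate nearest-neighbor indexes (e.g., FAISS) rather than brute-force comparison.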
Papers
What Are You Token About? Dense Retrieval as Distributions Over the Vocabulary
Ori Ram, Liat Bezalel, Adi Zicher, Yonatan Belinkov, Jonathan Berant, Amir Globerson
Adam: Dense Retrieval Distillation with Adaptive Dark Examples
Chongyang Tao, Chang Liu, Tao Shen, Can Xu, Xiubo Geng, Binxing Jiao, Daxin Jiang
MASTER: Multi-task Pre-trained Bottlenecked Masked Autoencoders are Better Dense Retrievers
Kun Zhou, Xiao Liu, Yeyun Gong, Wayne Xin Zhao, Daxin Jiang, Nan Duan, Ji-Rong Wen
Retrieval-based Disentangled Representation Learning with Natural Language Supervision
Jiawei Zhou, Xiaoguang Li, Lifeng Shang, Xin Jiang, Qun Liu, Lei Chen
Boosted Dense Retriever
Patrick Lewis, Barlas Oğuz, Wenhan Xiong, Fabio Petroni, Wen-tau Yih, Sebastian Riedel
Learning to Retrieve Passages without Supervision
Ori Ram, Gal Shachaf, Omer Levy, Jonathan Berant, Amir Globerson
GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
Kexin Wang, Nandan Thakur, Nils Reimers, Iryna Gurevych