Duplicate Detection
Duplicate detection aims to identify identical or near-identical items across diverse data types, including text, images, software code, and medical scans. Current research focuses on robust algorithms and models, such as Siamese networks, transformers, and locality-sensitive hashing, that handle multiple data modalities and cope with fuzzy or near-duplicates produced by subtle transformations. These advances matter for improving data quality, speeding up search, protecting intellectual property, and automating tasks in fields ranging from software engineering and customer relationship management to medical imaging and copyright enforcement.
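As a rough illustration of the hashing-based family mentioned above, the sketch below estimates text similarity with MinHash signatures over word shingles, the building block that locality-sensitive hashing schemes typically band into buckets. It is a minimal sketch under stated assumptions, not a method taken from any particular paper: the function names (`shingles`, `minhash_signature`, `near_duplicate_pairs`), the shingle size, the number of hash functions, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import hashlib
from itertools import combinations


def shingles(text, k=3):
    """Return the set of k-word shingles of a lowercased text."""
    words = text.lower().split()
    if len(words) < k:
        # Short texts collapse to a single shingle so the set is never empty.
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def _seeded_hash(seed, item):
    """64-bit hash of an item; the seed stands in for a random permutation."""
    digest = hashlib.sha1(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(items, num_hashes=64):
    """Keep the minimum hash value of the set under each seeded hash."""
    return [min(_seeded_hash(seed, item) for item in items)
            for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots approximates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def near_duplicate_pairs(docs, threshold=0.5, num_hashes=64):
    """Return name pairs whose estimated similarity meets the threshold.

    Assumption: all pairs are compared directly, which is quadratic in the
    number of documents. A full LSH index would instead band the signatures
    so that only likely candidates are compared.
    """
    sigs = {name: minhash_signature(shingles(text), num_hashes)
            for name, text in docs.items()}
    return [(a, b) for a, b in combinations(sorted(sigs), 2)
            if estimated_jaccard(sigs[a], sigs[b]) >= threshold]
```

Calling `near_duplicate_pairs({"a": text_a, "b": text_b})` returns the pairs of document names judged similar. Raising `num_hashes` tightens the estimate at the cost of more hashing, and learned approaches such as Siamese networks replace the shingle sets with embeddings compared by distance.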