Near Duplicate

Near-duplicate detection focuses on identifying highly similar items, whether images, videos, text, or code, across vast datasets. Current research emphasizes developing robust algorithms and model architectures, such as Siamese networks and vision transformers, to effectively capture subtle semantic similarities beyond exact matches, often incorporating techniques like embedding refinement and graph-theoretic approaches. This field is crucial for managing large datasets, mitigating copyright infringement, improving search and recommendation systems, and ensuring fair evaluation in machine learning, with applications ranging from biometric security to software development and online learning platforms.

Papers