Clone Detection

Clone detection aims to identify duplicated or highly similar code, data, or even neural network models, addressing issues ranging from copyright infringement to data management inefficiencies. Current research focuses on developing robust algorithms, including those leveraging value similarity for tabular data, graph embeddings for code, and contrastive learning techniques for semantic clone detection in code and images. These advancements are crucial for improving software auditing, enhancing data governance, and furthering our understanding of model lineage and transferability in machine learning.

Papers