Clone Detection

Clone detection aims to identify duplicated or highly similar code, data, or even neural network models, addressing issues ranging from copyright infringement to data management inefficiencies. Current research focuses on robust detection algorithms, including value similarity for tabular data, graph embeddings for code, and contrastive learning for semantic clone detection in code and images. These advances support software auditing, data governance, and the study of model lineage and transferability in machine learning.

Papers

December 20, 2024

Towards Secure AI-driven Industrial Metaverse with NFT Digital Twins
Ravi Prakash, Tony Thomas
Digital Twin Non Fungible Token Tamper Detection Clone Detection

December 19, 2024

On the Use of Deep Learning Models for Semantic Clone Detection
Subroto Nag Pinku, Debajyoti Mondal, Chanchal K. Roy
Deep Learning Model Greater Public Use Code Clone Detection Clone Detection

November 2, 2024

Cloned Identity Detection in Social-Sensor Clouds based on Incomplete Profiles
Ahmed Alharbi, Hai Dong, Xun Yi, Prabath Abeysekara
Sensitive Attribute Uncertainty Induced Incomplete Clone Detection

June 24, 2024

SimClone: Detecting Tabular Data Clones using Value Similarity
Xu Yang, Gopi Krishnan Rajbahadur, Dayi Lin, Shaowei Wang, Zhen Ming (Jack) Jiang
Tabular Datasets Similarity Learning Clone Detection Data Analysis Replication

June 17, 2024

Neural Lineage
Runpeng Yu, Xinchao Wang
Single Neuron Level Clone Detection

February 14, 2024

Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code
Vahid Majdinasab, Amin Nikanjam, Foutse Khomh
Language Model Real World Code Code Clone Detection Informed Consent Clone Detection Code Detection

June 28, 2023

A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges
Morteza Zakeri-Nasrabadi, Saeed Parsa, Mohammad Ramezani, Chanchal Roy, Masoud Ekhtiarzadeh
Financial Application Barzilai Borwein Technique Systematic Literature Review Software Engineering Code Similarity Clone Detection

May 19, 2023

CCT-Code: Cross-Consistency Training for Multilingual Clone Detection and Code Search
Anton Tikhonov, Nikita Sorokin, Dmitry Abulkhanov, Irina Piontkovskaya, Sergey Nikolenko, Valentin Malykh
Source Code Consistency Training Code Search Code Clone Clone Detection

December 16, 2022

Fake it till you make it: Learning transferable representations from synthetic ImageNet clones
Mert Bulent Sariyildiz, Karteek Alahari, Diane Larlus, Yannis Kalantidis
Supervised ImageNet Synthetic Image Image Generation Model Transferable Representation Fake Speech ImageNet Classification Clone Detection

August 24, 2022

Tracking by weakly-supervised learning and graph optimization for whole-embryo C. elegans lineages
Peter Hirsch, Caroline Malin-Mayor, Anthony Santella, Stephan Preibisch, Dagmar Kainmueller, Jan Funke
Web Tracking Weakly Supervised Learning Graph Optimization Mitosis Detection Human Embryo Clone Detection Cell Tracking Challenge

August 17, 2022

ASTRO: An AST-Assisted Approach for Generalizable Neural Clone Detection
Yifan Zhang, Junwen Yang, Haoyu Dong, Qingchen Wang, Huajie Shao, Kevin Leach, Yu Huang
Astronomical Data Abstract Syntax Tree Code Clone Detection Clone Detection

June 17, 2022

Evaluation of Contrastive Learning with Various Code Representations for Code Clone Detection
Maksim Zubkov, Egor Spirin, Egor Bogomolov, Timofey Bryksin
Contrastive Learning Global Evaluation Code Representation Code Clone Detection Code Clone Clone Detection
