Binary Code Similarity

Binary code similarity detection aims to identify functionally similar code segments in compiled programs, even when those segments differ because of compiler optimizations, target architectures, or obfuscation techniques. Current research relies heavily on deep learning, particularly transformer-based models and graph neural networks. These models often incorporate intermediate representations or use contrastive learning to produce embeddings of binary functions that remain stable across such variations. The field underpins applications including malware analysis, vulnerability detection, and software plagiarism detection, and it helps improve the security and reliability of software supply chains.
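The sketch below illustrates how contrastive learning over function embeddings can work. It assumes an encoder (a transformer or GNN, not shown) has already mapped each binary function to a fixed-length vector. Similarity is scored with cosine similarity, and an InfoNCE-style loss rewards each function for being closest to its own counterpart built another way, for example at a different optimization level, and farther from every other function in the batch. InfoNCE is only one common contrastive objective, and individual papers use their own variants. The names `cosine_similarity` and `info_nce_loss`, along with the toy vectors, are hypothetical illustrations rather than any particular paper's method.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE-style contrastive loss (an assumed, illustrative objective).

    anchors[i] and positives[i] are embeddings of the same source function
    built two different ways; every positives[j] with j != i acts as a negative.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Temperature-scaled similarity of this anchor to every candidate.
        logits = [cosine_similarity(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching pair (index i) as the target.
        total += log_denom - logits[i]
    return total / len(anchors)

# Hypothetical toy embeddings: row i in both lists stands for the same
# function, e.g. compiled at -O0 versus -O2.
o0 = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
o2 = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.0, 0.1, 0.9]]

aligned = info_nce_loss(o0, o2)
mismatched = info_nce_loss(o0, o2[1:] + o2[:1])  # pairs rotated out of order
print(aligned < mismatched)  # True
```

When the true pairs are aligned, the loss is small. Rotating the pairs out of order raises it sharply, and this is the signal training uses to pull equivalent functions together in embedding space.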

Papers