Semantic Code

Semantic code search aims to retrieve code snippets that match natural language descriptions, bridging the semantic gap between programming languages and human language. Current research focuses on improving search accuracy with techniques such as Retrieval-Augmented Generation (RAG) with large language models (LLMs), graph neural networks (GNNs) that exploit code structure (e.g., call graphs), and training methods that learn from both similar and dissimilar code examples. These advances can make code reuse and discovery faster, which improves software development productivity, and they are also relevant to tasks such as malware analysis and automated machine learning.
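
The retrieval and training ideas described above can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical `embed` function: a sparse bag-of-tokens vector standing in for the learned neural encoders (LLM- or GNN-based) that real systems use. `search` ranks snippets by cosine similarity to the query. `info_nce` computes an InfoNCE-style contrastive loss over one matching snippet and several dissimilar ones. This is one common way to train on similar and dissimilar examples, assumed here for illustration rather than taken from any specific paper. All function and variable names are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Split camelCase and snake_case identifiers into lowercase word pieces
    # so that code tokens and English query words can overlap.
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def embed(text: str) -> Counter:
    # Hypothetical stand-in for a learned encoder: a bag-of-tokens count vector.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, corpus: list[str], k: int = 3) -> list[tuple[float, str]]:
    # Rank every snippet by similarity to the natural language query; keep the top k.
    q = embed(query)
    scored = [(cosine(q, embed(code)), code) for code in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def info_nce(query: str, positive: str, negatives: list[str],
             temperature: float = 0.1) -> float:
    # Contrastive loss: small when the query scores the matching (positive)
    # snippet well above the dissimilar (negative) snippets.
    q = embed(query)
    logits = [cosine(q, embed(c)) / temperature for c in [positive, *negatives]]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]


CORPUS = [
    "def read_file(path):\n    with open(path) as f:\n        return f.read()",
    "def sort_list(items):\n    return sorted(items)",
    "def http_get(url):\n    import urllib.request\n    return urllib.request.urlopen(url).read()",
]
QUERY = "read the contents of a file"

if __name__ == "__main__":
    for score, code in search(QUERY, CORPUS, k=2):
        print(f"{score:.3f}  {code.splitlines()[0]}")
    # Loss with the correct snippet as positive vs. a mismatched one as positive.
    print(info_nce(QUERY, CORPUS[0], CORPUS[1:]))
    print(info_nce(QUERY, CORPUS[1], [CORPUS[0], CORPUS[2]]))
```

In a trained system, minimizing this loss over many (query, matching snippet, dissimilar snippets) triples would pull an encoder's matching pairs together and push mismatched pairs apart. The toy `embed` here has no parameters, so the loss is only computed, not optimized.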
