Code Search
Code search retrieves relevant code snippets from a large corpus in response to natural language queries, with the goal of making software development more efficient. Current research focuses on improving semantic understanding through retrieval-augmented generation (RAG) with large language models (LLMs), contrastive learning, and graph neural networks (GNNs). These techniques aim to capture code structure and semantics more faithfully and to address problems such as modality misalignment between natural language and code and bias in search results. Better datasets, with more realistic queries and multiple valid code matches per query, are another key focus. Together, these advances support developer productivity and software engineering more broadly by enabling faster code reuse and easier code understanding.
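As a rough illustration of the retrieval task described above, the sketch below ranks a tiny in-memory corpus against a natural language query. It is a purely lexical baseline using cosine similarity over token counts, not any of the learned approaches mentioned here; a neural system would replace the token-count vectors with embeddings from a trained encoder. The names `tokenize`, `cosine`, `search`, and `CORPUS` are hypothetical and chosen only for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit,
    # so snake_case identifiers such as sort_list become separate words.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def cosine(a, b):
    # Cosine similarity between two sparse token-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, corpus, top_k=1):
    # Score every snippet against the query and return the best-matching names.
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(code))), name)
              for name, code in corpus.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy corpus (assumed for illustration): snippet name -> source code.
CORPUS = {
    "read_file": "def read_file(path):\n    with open(path) as f:\n        return f.read()",
    "sort_list": "def sort_list(items):\n    return sorted(items)",
    "http_get": "def http_get(url):\n    import urllib.request\n    return urllib.request.urlopen(url).read()",
}

if __name__ == "__main__":
    print(search("sort a list of items", CORPUS))
    print(search("read a file from a path", CORPUS))
```

A lexical matcher like this only succeeds when the query and the code share vocabulary. That limitation is one way to see why the gap between natural language and code motivates learned representations.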
Papers
CCT-Code: Cross-Consistency Training for Multilingual Clone Detection and Code Search
Nikita Sorokin, Dmitry Abulkhanov, Sergey Nikolenko, Valentin Malykh
Searching by Code: a New SearchBySnippet Dataset and SnippeR Retrieval Model for Searching by Code Snippets
Ivan Sedykh, Dmitry Abulkhanov, Nikita Sorokin, Sergey Nikolenko, Valentin Malykh