Code Corpus
Code corpora are large collections of source code used to train and evaluate machine learning models for code understanding and generation. Current research focuses on learning effective code representations, often with graph-based structures or contrastive learning to capture semantic relationships across large codebases, and on applying large language models (LLMs) to tasks such as code generation, code search, and refactoring. The goal is to make software development more efficient and reliable by automating code analysis, synthesis, and comprehension, with impact on both software engineering research and the practical use of AI in software development.