Code Representation

Code representation research studies how to encode source code as numerical vectors (embeddings) that machine learning models can use for software engineering tasks. Current work concentrates on transformer-based models and contrastive learning, often augmented with structural information from abstract syntax trees (ASTs) or control-flow graphs, to produce embeddings that are robust and capture program semantics. Better representations improve downstream tasks such as code search, clone detection, vulnerability analysis, and automated program repair. The field is also exploring multilingual code representation, to cover diverse programming languages, and efficient fine-tuning strategies, to cope with limited computational resources.
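
To make the idea of a structure-aware numerical representation concrete, here is a minimal sketch in Python. It is deliberately not one of the transformer or contrastive models described above: as a simple stand-in, it counts abstract syntax tree node types with the standard-library `ast` module and compares two snippets by cosine similarity, in the spirit of clone detection. The helper names `ast_node_counts` and `cosine_similarity` and the three sample snippets are illustrative assumptions, not taken from any particular paper.

```python
import ast
import math
from collections import Counter


def ast_node_counts(source: str) -> Counter:
    """Count AST node types in a Python snippet (a crude structural 'embedding')."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical inputs: two functions that differ only in identifier names
# (a renamed clone) and one structurally different snippet.
snippet_a = "def add(x, y):\n    return x + y\n"
snippet_b = "def total(first, second):\n    return first + second\n"
snippet_c = "for i in range(10):\n    print(i)\n"

vec_a, vec_b, vec_c = map(ast_node_counts, (snippet_a, snippet_b, snippet_c))
sim_clone = cosine_similarity(vec_a, vec_b)  # same tree shape, different names
sim_other = cosine_similarity(vec_a, vec_c)  # different tree shape
```

Because the vector ignores identifier names, the renamed clone scores as essentially identical to the original, while the loop snippet scores lower. Learned embeddings aim for the same kind of invariance while also capturing semantics that simple node counts miss.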

Papers