Program Synthesis Benchmark

Program synthesis benchmarks evaluate how well artificial intelligence models generate code from natural language descriptions or other specifications. Current research focuses on building more robust and comprehensive benchmarks that assess generalization, coverage of different programming paradigms, and the ability to synthesize code for complex, multi-step tasks. Such benchmarks are central to measuring progress across program synthesis approaches, including large language models, genetic programming, and multi-agent systems, and to guiding the development of more effective and efficient code generation techniques for software engineering and automation.
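
To make the evaluation loop concrete, below is a minimal sketch of how an execution-based benchmark might score model outputs, assuming a HumanEval-style task format (a natural-language prompt, sampled candidate completions, and hidden unit tests). The task data, candidate completions, and helper names here are illustrative assumptions, not any specific benchmark's official harness; the pass@k computation follows the standard unbiased estimator.

```python
import math

# Hypothetical benchmark task: a natural-language prompt, an entry point,
# and hidden unit tests that synthesized code must pass.
TASK = {
    "prompt": "Write a function `add(a, b)` that returns the sum of two integers.",
    "entry_point": "add",
    "tests": [((1, 2), 3), ((-4, 4), 0), ((10, 32), 42)],
}

# Candidate completions, e.g. sampled from a code-generation model.
CANDIDATES = [
    "def add(a, b):\n    return a + b",        # correct
    "def add(a, b):\n    return a - b",        # wrong
    "def add(a, b):\n    return sum([a, b])",  # correct
]


def run_candidate(code: str, task: dict) -> bool:
    """Execute a candidate and check it against the task's unit tests."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # real harnesses sandbox this step
        fn = namespace[task["entry_point"]]
        return all(fn(*args) == expected for args, expected in task["tests"])
    except Exception:
        return False


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    from n candidates (c of which are correct) passes the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - math.prod((n - c - i) / (n - i) for i in range(k))


results = [run_candidate(code, TASK) for code in CANDIDATES]
n, c = len(results), sum(results)
print(f"{c}/{n} candidates passed; pass@1 = {pass_at_k(n, c, 1):.2f}")
```

Execution-based scoring of this kind is what most current benchmarks report; the main design choices are how tests are hidden from the model, how untrusted code is sandboxed, and how many samples per task are drawn when estimating pass@k.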

Papers