Planning Benchmark

Planning benchmarks evaluate how well artificial intelligence models, particularly large language models (LLMs), generate and execute plans in domains ranging from household tasks to autonomous driving. Current research focuses on benchmarks that assess both the accuracy and the robustness of plans, often incorporating multi-modal inputs (e.g., images and text) and real-world complexities such as multi-agent interactions. Such benchmarks help advance AI planning capabilities and inform the development of more reliable and adaptable autonomous systems.

Papers