Decision-Making Benchmarks
Decision-making benchmarks evaluate the ability of large language models (LLMs) to carry out complex, sequential decision-making tasks, often simulating real-world scenarios such as resource management or interactive problem-solving. Current research focuses on building new benchmarks that assess LLM performance in diverse contexts, including tasks that require data analysis or interaction with external environments, and on improving decision-making capability through techniques such as offline data-driven learning and prompt engineering. These efforts are important for making LLMs more reliable and applicable across domains, from automated systems to support for human decision-making.
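To make the evaluation setup concrete, the following is a minimal sketch of a sequential decision-making benchmark harness: an LLM-backed agent observes a toy resource-management environment, picks an action each step, and receives a normalized score at the end of the episode. The `ResourceEnv` environment, the scoring rule, and the `query_llm` stub are illustrative assumptions, not taken from any specific benchmark discussed here.

```python
"""Minimal sketch of a sequential decision-making benchmark harness.

The environment, scoring rule, and `query_llm` stub are illustrative
assumptions, not the protocol of any particular published benchmark.
"""
from dataclasses import dataclass


@dataclass
class ResourceEnv:
    """Toy resource-management task: keep inventory positive for a fixed horizon."""
    inventory: int = 10
    step_count: int = 0
    horizon: int = 5

    def observe(self) -> str:
        """Text observation handed to the model as (part of) its prompt."""
        return f"Step {self.step_count}: inventory={self.inventory}. Actions: buy, sell, hold."

    def step(self, action: str) -> bool:
        """Apply an action; return True while the episode is still running."""
        delta = {"buy": +3, "sell": -4, "hold": -1}.get(action.strip().lower(), -1)
        self.inventory += delta
        self.step_count += 1
        return self.step_count < self.horizon and self.inventory > 0


def query_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an API client); a fixed baseline policy stands in here."""
    return "hold"


def run_episode() -> float:
    """Roll out one episode and return a normalized score in [0, 1]."""
    env = ResourceEnv()
    running = True
    while running:
        action = query_llm(env.observe())
        running = env.step(action)
    # Score: fraction of the horizon survived with positive inventory.
    return min(env.step_count, env.horizon) / env.horizon


if __name__ == "__main__":
    print(f"episode score: {run_episode():.2f}")
```

In a real benchmark, `query_llm` would call the model under evaluation, the environment would encode the task of interest (e.g., data analysis or tool use), and scores would be averaged over many episodes and seeds.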