Paper ID: 2502.15771 • Published Feb 16, 2025
Learning to Reason from Feedback at Test-Time
Solving complex tasks in a single attempt is challenging for large language
models (LLMs). Iterative interaction with the environment and feedback is often
required to achieve success, making effective feedback utilization a critical
topic. Existing approaches either struggle with length generalization or rely
on naive retries without leveraging prior information. In this paper, we
introduce FTTT, a novel paradigm that formulates feedback utilization as an
optimization problem at test time. Additionally, we propose a learnable
test-time optimizer, OpTune, to effectively exploit feedback. Experiments on
two LLMs across four reasoning datasets demonstrate that FTTT and OpTune
achieve superior scalability and performance.