Paper ID: 2408.05873
Defining Boundaries: A Spectrum of Task Feasibility for Large Language Models
Wenbo Zhang, Zihang Xu, Hengrui Cai
Large language models (LLMs) have shown remarkable performance on various tasks but often fail to handle queries that exceed their knowledge and capabilities, leading to incorrect or fabricated responses. This paper addresses the need for LLMs to recognize and refuse infeasible tasks, i.e., tasks whose required skills surpass their capabilities. We first conceptualize infeasible tasks for LLMs and provide a categorization that covers a spectrum of related hallucinations discussed in the existing literature. We then construct a new dataset comprising diverse infeasible and feasible tasks and use it to benchmark multiple LLMs' abilities to reject infeasible tasks. Furthermore, we explore the potential of improving LLMs' refusal capabilities through fine-tuning. Experiments validate the effectiveness of our trained models, offering promising directions for refining the operational boundaries of LLMs in real-world applications.
Submitted: Aug 11, 2024