Paper ID: 2407.03525

UnSeenTimeQA: Time-Sensitive Question-Answering Beyond LLMs' Memorization

Md Nayem Uddin, Amir Saeidi, Divij Handa, Agastya Seth, Tran Cao Son, Eduardo Blanco, Steven R. Corman, Chitta Baral

This paper introduces UnSeenTimeQA, a novel data-contamination-free time-sensitive question-answering (TSQA) benchmark. It differs from existing TSQA benchmarks by avoiding web-searchable queries grounded in the real world. We present a series of time-sensitive event scenarios based on synthetically generated facts. These scenarios require large language models (LLMs) to engage in genuine temporal reasoning without depending on factual knowledge acquired during pre-training. We design three types of time-sensitive questions to test LLMs' temporal reasoning abilities over sequential and parallel event occurrences. Our evaluation of five LLMs shows that they perform worse on synthetic fact-based TSQA than on real-world fact-based TSQA. Further analysis of LLM-generated reasoning chains indicates difficulty in capturing long-range event dependencies and the effect of interlinked events in synthetic scenarios.

Submitted: Jul 3, 2024