Video Haystack

"Video Haystack" research focuses on evaluating and improving the ability of large language models (LLMs), particularly multimodal LLMs, to accurately retrieve and reason with information embedded within extensive visual and textual contexts – analogous to finding a "needle in a haystack." Current research emphasizes benchmarking model performance using synthetic datasets and novel evaluation tasks, such as summarization and multi-image question answering, to assess capabilities like long-context retrieval and reasoning across various modalities. These advancements are crucial for improving the robustness and reliability of LLMs in real-world applications requiring processing of large, complex datasets, such as visual search, video understanding, and information retrieval.

Papers