Frame Selection

Frame selection identifies the most informative subset of frames in a video or audio sequence for a downstream task, so that a model processes fewer frames without losing the content it needs. Current research explores methods based on learned importance scores, graph attention networks, and heuristic search algorithms, often integrated into larger architectures for tasks such as text-to-video retrieval, action recognition, and speech synthesis. These methods matter because they reduce the computational cost of processing large multimedia datasets while maintaining, and in some cases improving, accuracy in applications ranging from video understanding to personalized voice generation.
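As a rough illustration of score-based selection, the sketch below contrasts a uniform-sampling baseline with picking the top-k frames by a per-frame importance score. It is not the method of any paper listed here: the function names `uniform_indices` and `top_k_indices` are hypothetical, and the scores are assumed to come from some upstream model (for example, a learned scorer or a query-frame similarity).

```python
from typing import List, Sequence


def uniform_indices(num_frames: int, k: int) -> List[int]:
    """Baseline: pick k roughly evenly spaced frame indices."""
    if k <= 0 or num_frames <= 0:
        return []
    k = min(k, num_frames)
    # Take the midpoint of each of k equal-width temporal segments.
    return [int((i + 0.5) * num_frames / k) for i in range(k)]


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Pick the k highest-scoring frames, returned in temporal order."""
    k = max(0, min(k, len(scores)))
    # Rank frame indices by score, highest first (ties keep earlier frames first).
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Restore temporal order so downstream models see frames in sequence.
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Hypothetical importance scores for a 6-frame clip.
    scores = [0.1, 0.9, 0.8, 0.2, 0.3, 0.7]
    print(top_k_indices(scores, 3))   # [1, 2, 5]
    print(uniform_indices(10, 3))     # [1, 5, 8]
```

Top-k selection scores each frame independently, so it can return near-duplicate neighboring frames; list-wise or search-based approaches are one way to address that, at extra cost.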

Papers

January 6, 2025

MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs
Hui Sun, Shiyin Lu, Huanyu Wang, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Ming Li
Training Free, Video LLM, Frame Selection

November 22, 2024

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Songhao Han, Wei Huang, Hairong Shi, Le Zhuo, Xiu Su, Shifeng Zhang, Xu Zhou, Xiaojuan Qi, Yue Liao, Si Liu
Fine Grained, Chain of Thought, Multiple Choice VideoQA, VideoQA Model, Video Reasoning, Frame Selection

August 30, 2024

SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
Ismail Rasim Ulgen, Shreeram Suresh Chandra, Junchen Lu, Berrak Sisman
Human Voice, Multi Speaker, Text to Speech, Frame Selection

June 25, 2024

Burst Image Super-Resolution with Base Frame Selection
Sanghyun Kim, Min Jung Lee, Woohyeok Kim, Deunsol Jung, Jaesung Rim, Sunghyun Cho, Minsu Cho
Super Resolution, Burst Super Resolution, Burst Image, Frame Selection

November 1, 2023

An Empirical Study of Frame Selection for Text-to-Video Retrieval
Mengxia Wu, Min Cao, Yang Bai, Ziyin Zeng, Chen Chen, Liqiang Nie, Min Zhang
Empirical Study, Text to Video Retrieval, Video Context, Frame Selection

April 20, 2023

Search-Map-Search: A Frame Selection Paradigm for Action Recognition
Mingjun Zhao, Yakun Yu, Xiaoli Wang, Lei Yang, Di Niu
Action Recognition, Local Search, Video Understanding Task, Action Recognition Model, Frame Level Importance, Frame Selection

January 18, 2023

Gated-ViGAT: Efficient Bottom-Up Event Recognition and Explanation Using a New Frame Selection Policy and Gating Mechanism
Nikolaos Gkalelis, Dimitrios Daskalakis, Vasileios Mezaris
Line by Line Explanation, Graph Attention Network, Gating Mechanism, Event Based Object Recognition, Video Event, Gated Camera, Frame Selection