Egocentric Video
Egocentric video, captured from a first-person perspective, is transforming computer vision by enabling the analysis of human activities and interactions in their natural context. Current research focuses heavily on developing robust multimodal models, often built on transformer architectures and large language models, that understand egocentric video and generate information from it, addressing challenges such as motion estimation, action recognition, and affordance prediction. The field is significant for advancing artificial intelligence, particularly embodied AI and human-computer interaction, with applications ranging from assistive technologies and virtual reality to robotics and the study of human behavior. Progress is also being driven by the development of large-scale datasets and standardized evaluation metrics.
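To make the multimodal pattern described above concrete, the sketch below shows one common recipe: project per-frame visual features and narration tokens into a shared space, fuse them with a transformer encoder, and predict an action label. This is a minimal illustration, not the method of any paper listed below; the class name `EgocentricVideoTextFusion`, all dimensions, and the action vocabulary size are assumptions chosen for the example.

```python
# Minimal sketch of video-text fusion for egocentric action recognition.
# All module names, dimensions, and the action vocabulary are illustrative
# assumptions, not taken from any specific paper in this list.
import torch
import torch.nn as nn


class EgocentricVideoTextFusion(nn.Module):
    def __init__(self, frame_feat_dim=512, text_vocab=1000, d_model=256, num_actions=3):
        super().__init__()
        # Project per-frame visual features (e.g. from a frozen image backbone)
        # and narration token embeddings into a shared model dimension.
        self.frame_proj = nn.Linear(frame_feat_dim, d_model)
        self.token_emb = nn.Embedding(text_vocab, d_model)
        self.modality_emb = nn.Embedding(2, d_model)  # 0 = video, 1 = text
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=512, batch_first=True
        )
        self.fusion = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.action_head = nn.Linear(d_model, num_actions)

    def forward(self, frame_feats, text_tokens):
        # frame_feats: (B, T, frame_feat_dim); text_tokens: (B, L) int64 ids
        v = self.frame_proj(frame_feats) + self.modality_emb.weight[0]
        t = self.token_emb(text_tokens) + self.modality_emb.weight[1]
        fused = self.fusion(torch.cat([v, t], dim=1))  # joint video-text tokens
        pooled = fused.mean(dim=1)                     # simple mean pooling
        return self.action_head(pooled)                # action logits


if __name__ == "__main__":
    model = EgocentricVideoTextFusion()
    frames = torch.randn(2, 8, 512)            # 8 frames of precomputed features
    tokens = torch.randint(0, 1000, (2, 12))   # 12 narration token ids
    print(model(frames, tokens).shape)         # torch.Size([2, 3])
```

Real systems differ mainly in scale and in how the fusion is sparsified or conditioned on a language model, but the encode-project-fuse-predict structure above is the recurring skeleton.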
Papers
The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences
Bria Long, Violet Xiang, Stefan Stojanov, Robert Z. Sparks, Zi Yin, Grace E. Keene, Alvin W. M. Tan, Steven Y. Feng, Chengxu Zhuang, Virginia A. Marchman, Daniel L. K. Yamins, Michael C. Frank
PARSE-Ego4D: Personal Action Recommendation Suggestions for Egocentric Videos
Steven Abreu, Tiffany D. Do, Karan Ahuja, Eric J. Gonzalez, Lee Payne, Daniel McDuff, Mar Gonzalez-Franco
CARLOR @ Ego4D Step Grounding Challenge: Bayesian temporal-order priors for test time refinement
Carlos Plou, Lorenzo Mur-Labadia, Ruben Martinez-Cantin, Ana C. Murillo
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
Changan Chen, Puyuan Peng, Ami Baid, Zihui Xue, Wei-Ning Hsu, David Harwath, Kristen Grauman
SViTT-Ego: A Sparse Video-Text Transformer for Egocentric Video
Hector A. Valdez, Kyle Min, Subarna Tripathi