Egocentric Video
Egocentric video, captured from a first-person perspective, is reshaping computer vision by enabling the analysis of human activities and interactions in their natural context. Current research focuses on robust multimodal models, often built on transformer architectures and large language models, that understand and generate information from egocentric video, addressing challenges such as motion estimation, action recognition, and affordance prediction. The field matters for embodied AI and human-computer interaction, with applications ranging from assistive technologies and virtual reality to robotics and the study of human behavior. Progress is further driven by large-scale datasets and standardized evaluation metrics.
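To make the multimodal pretraining idea concrete: video-language pretraining of the kind listed below typically aligns video clips with their text narrations via a contrastive objective. The sketch that follows is a minimal, generic InfoNCE-style formulation, not the exact objective of any paper here; the encoder dimensions, class name, and temperature value are illustrative assumptions.

```python
# Minimal sketch of video-text contrastive pretraining (InfoNCE-style).
# Assumes pre-extracted per-clip video features and per-narration text
# features; all names and dimensions are illustrative, not from the papers.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VideoTextContrastive(nn.Module):
    def __init__(self, video_dim=768, text_dim=512, embed_dim=256, temperature=0.07):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.video_proj = nn.Linear(video_dim, embed_dim)
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.temperature = temperature

    def forward(self, video_feats, text_feats):
        # L2-normalize so dot products are cosine similarities.
        v = F.normalize(self.video_proj(video_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        logits = v @ t.T / self.temperature        # (batch, batch) similarities
        targets = torch.arange(len(v), device=v.device)  # matched pairs on the diagonal
        # Symmetric cross-entropy: video-to-text and text-to-video retrieval.
        return (F.cross_entropy(logits, targets)
                + F.cross_entropy(logits.T, targets)) / 2

# Usage with random stand-in features for a batch of 8 clip-narration pairs.
model = VideoTextContrastive()
loss = model(torch.randn(8, 768), torch.randn(8, 512))
loss.backward()
```

Each clip is pulled toward its own narration and pushed away from the other narrations in the batch, which is what makes the learned embeddings useful for downstream retrieval and recognition tasks.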
Papers
Egocentric Video-Language Pretraining @ Ego4D Challenge 2022
Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rongcheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou
Egocentric Video-Language Pretraining @ EPIC-KITCHENS-100 Multi-Instance Retrieval Challenge 2022
Kevin Qinghong Lin, Alex Jinpeng Wang, Rui Yan, Eric Zhongcong Xu, Rongcheng Tu, Yanru Zhu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Wei Liu, Mike Zheng Shou