Egocentric Video
Egocentric video, captured from a first-person perspective, is reshaping computer vision by enabling the analysis of human activities and interactions in their natural context. Current research centers on robust multimodal models, often built on transformer architectures and large language models, that understand and generate information from egocentric footage, tackling challenges such as motion estimation, action recognition, and affordance prediction. The field matters for advancing artificial intelligence, particularly embodied AI and human-computer interaction, with applications spanning assistive technologies, virtual reality, robotics, and the study of human behavior. Progress is also driven by the development of large-scale datasets and standardized evaluation metrics.
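To make the "multimodal models for egocentric video" theme concrete, the sketch below shows a minimal dual-encoder video-language setup of the kind used in egocentric video-language pre-training: a transformer pools per-frame features into a clip embedding, a text encoder embeds the paired narration, and a symmetric contrastive loss aligns the two. This is an illustrative toy, not the method of any paper listed here; all class names, dimensions, and hyperparameters are hypothetical, and real systems use much larger backbones and richer cross-modal fusion.

```python
# Illustrative sketch only: a tiny dual-encoder video-language model with a
# contrastive (InfoNCE-style) objective. All names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class VideoEncoder(nn.Module):
    """Pools a clip of per-frame features into one normalized embedding."""

    def __init__(self, frame_dim=512, embed_dim=256, num_layers=2, num_heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=frame_dim, nhead=num_heads, batch_first=True
        )
        self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.proj = nn.Linear(frame_dim, embed_dim)

    def forward(self, frame_feats):            # (B, T, frame_dim)
        x = self.temporal(frame_feats)         # temporal self-attention over frames
        x = x.mean(dim=1)                      # average-pool over time
        return F.normalize(self.proj(x), dim=-1)


class TextEncoder(nn.Module):
    """Embeds a tokenized narration into one normalized embedding."""

    def __init__(self, vocab_size=10000, token_dim=512, embed_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, token_dim)
        self.proj = nn.Linear(token_dim, embed_dim)

    def forward(self, token_ids):              # (B, L)
        x = self.embed(token_ids).mean(dim=1)  # simple mean pooling over tokens
        return F.normalize(self.proj(x), dim=-1)


def contrastive_loss(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss pairing each clip with its own narration."""
    logits = video_emb @ text_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(len(logits))               # matching pairs on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2


if __name__ == "__main__":
    videos, texts = VideoEncoder(), TextEncoder()
    frame_feats = torch.randn(8, 16, 512)            # 8 clips, 16 frames each
    token_ids = torch.randint(0, 10000, (8, 12))     # 8 narrations, 12 tokens each
    loss = contrastive_loss(videos(frame_feats), texts(token_ids))
    print(f"contrastive loss: {loss.item():.4f}")
```

The dual-encoder layout keeps video and text embeddings independently computable, which is why it is popular for retrieval-style pre-training; approaches that fuse modalities inside the backbone trade that efficiency for tighter cross-modal interaction.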
Papers
Eliciting In-Context Learning in Vision-Language Models for Videos Through Curated Data Distributional Properties
Keunwoo Peter Yu, Zheyuan Zhang, Fengyuan Hu, Shane Storks, Joyce Chai
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
Takehiko Ohkawa, Takuma Yagi, Taichi Nishimura, Ryosuke Furuta, Atsushi Hashimoto, Yoshitaka Ushiku, Yoichi Sato
EgoAdapt: A multi-stream evaluation study of adaptation to real-world egocentric user video
Matthias De Lange, Hamid Eghbalzadeh, Reuben Tan, Michael Iuzzolino, Franziska Meier, Karl Ridgeway
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
Shraman Pramanick, Yale Song, Sayan Nag, Kevin Qinghong Lin, Hardik Shah, Mike Zheng Shou, Rama Chellappa, Pengchuan Zhang