Egocentric Video
Egocentric video, which captures the world from a first-person perspective, is reshaping computer vision by enabling the analysis of human activities and interactions in their natural context. Current research focuses on robust multimodal models, often built on transformer architectures and large language models, that understand and generate information from egocentric footage, addressing challenges such as motion estimation, action recognition, and affordance prediction. The field matters for embodied AI and human-computer interaction, with applications ranging from assistive technologies and virtual reality to robotics and the study of human behavior. Progress is further driven by large-scale datasets and standardized evaluation metrics.
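To make the transformer-based action-recognition pipeline mentioned above concrete, here is a minimal sketch of one common design: per-frame features fed through a temporal transformer encoder, then pooled and classified. All names, dimensions, and the toy convolutional backbone are illustrative assumptions, not the architecture of any paper listed below; real systems use pretrained video backbones and far larger models.

```python
import torch
import torch.nn as nn

class EgoActionClassifier(nn.Module):
    """Illustrative sketch: per-frame features + temporal transformer.

    Hypothetical architecture for egocentric action recognition;
    dimensions and layers are placeholder assumptions.
    """

    def __init__(self, feat_dim=512, num_classes=10, num_frames=16):
        super().__init__()
        # Toy per-frame feature extractor; in practice a pretrained
        # image or video backbone would be used here.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Learned positional embedding over the frame axis.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_frames, feat_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=8, batch_first=True
        )
        self.temporal = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, clip):
        # clip: (batch, frames, 3, H, W)
        b, t = clip.shape[:2]
        # Encode each frame independently, then restore the time axis.
        feats = self.frame_encoder(clip.flatten(0, 1)).view(b, t, -1)
        # Model temporal structure across frames with self-attention.
        feats = self.temporal(feats + self.pos_embed)
        # Mean-pool over time and classify the clip.
        return self.head(feats.mean(dim=1))


if __name__ == "__main__":
    model = EgoActionClassifier()
    dummy_clip = torch.randn(2, 16, 3, 112, 112)  # two 16-frame clips
    logits = model(dummy_clip)
    print(logits.shape)  # torch.Size([2, 10])
```

Frame-then-time factorization like this keeps memory costs manageable; multimodal variants add parallel streams (e.g., audio or gaze) fused before the classification head.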
Papers
Look Ma, No Hands! Agent-Environment Factorization of Egocentric Videos
Matthew Chang, Aditya Prakash, Saurabh Gupta
Guided Attention for Next Active Object @ EGO4D STA Challenge
Sanket Thakur, Cigdem Beyan, Pietro Morerio, Vittorio Murino, Alessio Del Bue
Cross-view Action Recognition Understanding From Exocentric to Egocentric Perspective
Thanh-Dat Truong, Khoa Luu