Ego4D AudioVisual
Ego4D AudioVisual research focuses on understanding human actions and interactions from first-person (egocentric) video and audio, with the aim of building more robust, context-aware AI systems. Current efforts center on models that fuse audio and visual data, typically built on transformer or recurrent neural network architectures, to tackle tasks such as pose estimation, action recognition, and human-object interaction understanding in complex, dynamic environments. This work matters for augmented and virtual reality, human-computer interaction, and embodied AI, where it enables more natural and intuitive interaction between humans and machines.
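To make the fusion idea concrete, below is a minimal PyTorch sketch of one common pattern: projecting pre-extracted visual and audio features into a shared embedding space, concatenating the two token sequences, and letting a transformer encoder attend jointly over both modalities before a clip-level prediction head. All names, dimensions, and hyperparameters here are illustrative assumptions, not the architecture of any specific Ego4D paper.

```python
import torch
import torch.nn as nn

class AudioVisualFusion(nn.Module):
    """Sketch of early-fusion audio-visual encoding with a transformer.

    Assumes per-frame visual features and per-window audio features have
    already been extracted (e.g., by a video backbone and a spectrogram
    encoder); all dimensions are hypothetical.
    """

    def __init__(self, visual_dim=768, audio_dim=128, d_model=256, num_classes=10):
        super().__init__()
        # Project both modalities into a shared embedding space.
        self.visual_proj = nn.Linear(visual_dim, d_model)
        self.audio_proj = nn.Linear(audio_dim, d_model)
        # Learned modality embeddings so the encoder can tell tokens apart.
        self.modality_emb = nn.Embedding(2, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)  # e.g., action classes

    def forward(self, visual_feats, audio_feats):
        # visual_feats: (batch, T_v, visual_dim); audio_feats: (batch, T_a, audio_dim)
        v = self.visual_proj(visual_feats) + self.modality_emb.weight[0]
        a = self.audio_proj(audio_feats) + self.modality_emb.weight[1]
        # Concatenate token sequences so self-attention can mix audio and
        # visual evidence within a single encoder.
        fused = self.encoder(torch.cat([v, a], dim=1))
        # Mean-pool over all tokens for a clip-level prediction.
        return self.head(fused.mean(dim=1))

# Example: a batch of 2 clips, each with 8 visual frames and 20 audio windows.
model = AudioVisualFusion()
logits = model(torch.randn(2, 8, 768), torch.randn(2, 20, 128))
print(logits.shape)  # torch.Size([2, 10])
```

The modality embeddings stand in for the lighter-weight end of a design spectrum; many published systems instead use cross-attention between modality streams or late fusion of per-modality predictions, and the choice depends on the task and compute budget.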
Papers
Nineteen papers, published between August 2022 and December 2024.