AVA ActiveSpeaker
AVA ActiveSpeaker is a benchmark dataset and task for automatically identifying who is currently speaking in a video, using audio-visual cues. Research on the benchmark aims to improve model accuracy with techniques such as contrastive learning, the use of speaker-specific information, and the modeling of long- and short-term context with graph neural networks or recurrent architectures. This work advances audio-visual scene understanding, with applications in speaker diarization, video editing, and human-computer interaction, particularly in scenes with multiple speakers or challenging audio-visual conditions.
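To make the contrastive-learning idea more concrete, the sketch below shows one common formulation, a symmetric InfoNCE loss between paired audio and visual embeddings. It is an illustration only and not the method of any particular paper on this benchmark. The function name `info_nce_loss`, the `temperature` value, and the assumption that each row of the two arrays comes from the same face track and time window are all hypothetical choices made for this example.

```python
import numpy as np


def info_nce_loss(audio_emb, visual_emb, temperature=0.1):
    """Symmetric InfoNCE loss for a batch of paired embeddings.

    Assumption: row i of audio_emb and row i of visual_emb describe the
    same clip (a positive pair); all other rows act as negatives.
    """
    # L2-normalize so the dot product is a cosine similarity.
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    v = visual_emb / np.linalg.norm(visual_emb, axis=1, keepdims=True)

    # Pairwise similarity matrix; the diagonal holds the matched pairs.
    logits = (a @ v.T) / temperature

    def cross_entropy_on_diagonal(z):
        # Numerically stable log-softmax over each row.
        z = z - z.max(axis=1, keepdims=True)
        log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        # The correct "class" for row i is column i.
        return -np.mean(np.diag(log_probs))

    # Average the audio-to-visual and visual-to-audio directions.
    return 0.5 * (
        cross_entropy_on_diagonal(logits)
        + cross_entropy_on_diagonal(logits.T)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 16))
    # Matched pairs should score a lower loss than shuffled pairs.
    print(info_nce_loss(emb, emb))
    print(info_nce_loss(emb, np.roll(emb, 1, axis=0)))
```

In this sketch a low loss means that each audio embedding is closest to the visual embedding of its own clip, which is the kind of audio-visual agreement an active speaker model can exploit.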
Papers
September 21, 2023
May 22, 2023
March 9, 2023
March 8, 2023
January 19, 2023
September 24, 2022
July 27, 2022
July 15, 2022
June 22, 2022