AVA ActiveSpeaker
AVA ActiveSpeaker is a benchmark dataset and task for automatically identifying which visible person in a video is speaking at a given moment, using both audio and visual cues. Research on it focuses on improving model accuracy through techniques such as contrastive learning, speaker-specific information, and modeling of long- and short-term context with graph neural networks or recurrent architectures. This work advances audio-visual scene understanding, with applications in speaker diarization, video editing, and human-computer interaction, particularly in scenes with multiple speakers or difficult audio-visual conditions.
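To make the contrastive-learning idea concrete, here is a minimal sketch of one common formulation, an InfoNCE-style loss that pulls matching audio and face embeddings together and pushes mismatched pairs apart. It is an illustration only, not the method of any particular paper on this page; the function names, the toy 2-D embeddings, and the temperature value are all assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def info_nce(audio_embs, visual_embs, temperature=0.1):
    """Mean audio-to-visual InfoNCE loss.

    Hypothetical helper: audio_embs[i] and visual_embs[i] are assumed to come
    from the same speaking segment (a positive pair); every other visual
    embedding in the batch is treated as a negative.
    """
    audio = [l2_normalize(a) for a in audio_embs]
    visual = [l2_normalize(v) for v in visual_embs]
    total = 0.0
    for i, a in enumerate(audio):
        # Cosine similarity of this audio clip to every face embedding.
        logits = [sum(x * y for x, y in zip(a, v)) / temperature for v in visual]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching pair as the target class.
        total += log_denom - logits[i]
    return total / len(audio)


# Toy check: correctly paired embeddings should give a much lower loss
# than the same embeddings with the pairing swapped.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < shuffled)
```

In a real system the embeddings would come from learned audio and face encoders, and the loss would typically be computed in both directions (audio-to-visual and visual-to-audio) with a tensor library rather than plain Python lists.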
Papers
December 11, 2024
December 6, 2024
September 21, 2023
May 22, 2023
March 9, 2023
March 8, 2023
January 19, 2023
September 24, 2022
July 27, 2022
July 15, 2022
June 22, 2022