Audio Captioning
Audio captioning aims to automatically generate natural language descriptions of audio content, bridging the audio and text modalities. Current research focuses on improving caption quality, diversity, and efficiency through model architectures such as diffusion models and transformers, often incorporating large language models for better semantic understanding and evaluation. The field matters for audio understanding and multimedia applications, and ongoing work addresses data scarcity, the limitations of existing evaluation metrics, and the need for more robust and generalizable models.
Papers
June 4, 2024
May 21, 2024
March 27, 2024
March 7, 2024
February 27, 2024
January 31, 2024
January 10, 2024
November 27, 2023
November 14, 2023
October 25, 2023
September 29, 2023
September 21, 2023
September 20, 2023
September 18, 2023
September 15, 2023
September 14, 2023
September 7, 2023