Audio Caption
Audio captioning is the task of automatically generating natural-language descriptions of audio content, bridging the gap between sound and text. Current research focuses on improving the accuracy and semantic richness of these captions, often using transformer-based architectures and large language models for stronger reasoning and contextual understanding. The field matters for multimodal understanding in AI, with applications ranging from accessibility for people who are deaf or hard of hearing to more effective audio search and retrieval. Researchers are also exploring evaluation metrics that align more closely with human judgments of what a sound means.