Audio Text Pair
Research on audio-text pairs develops models that link audio and textual representations, enabling tasks such as audio captioning, speech recognition, and cross-modal retrieval. Current work aims to improve model performance through contrastive learning, the use of large language models for data augmentation and prompt engineering, and hierarchical modeling of interactions between audio segments and textual phrases. These advances support multimodal understanding in AI, with applications ranging from better accessibility for people with hearing impairments to more accurate and efficient transcription and information retrieval systems.
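To make the contrastive-learning idea concrete, here is a minimal sketch of a symmetric contrastive (InfoNCE-style) objective over a batch of paired audio and text embeddings, plus a cosine-similarity ranking for cross-modal retrieval. It does not reproduce the method of any particular paper listed here. The function names, the temperature value of 0.07, and the assumption that row i of each matrix belongs to the same audio-text pair are all illustrative choices; in practice the embeddings would come from trained audio and text encoders.

```python
import numpy as np


def _l2_normalize(x):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def symmetric_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Illustrative symmetric contrastive loss for paired embeddings.

    Assumes audio_emb[i] and text_emb[i] form a matching pair; every other
    row in the batch is treated as a negative. The temperature is a
    hypothetical default, not a value taken from the source.
    """
    a = _l2_normalize(np.asarray(audio_emb, dtype=float))
    t = _l2_normalize(np.asarray(text_emb, dtype=float))
    logits = a @ t.T / temperature  # shape: (batch, batch)

    def diagonal_cross_entropy(lg):
        # Numerically stable log-softmax over each row, then take the
        # log-probability assigned to the matching (diagonal) item.
        lg = lg - lg.max(axis=1, keepdims=True)
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Average the audio-to-text and text-to-audio directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


def retrieve_top_k(query_emb, candidate_embs, k=3):
    """Return indices of the k candidates most similar to the query (cosine)."""
    q = _l2_normalize(np.asarray(query_emb, dtype=float).reshape(1, -1))
    c = _l2_normalize(np.asarray(candidate_embs, dtype=float))
    scores = (q @ c.T).ravel()
    return np.argsort(-scores)[:k]
```

Minimizing this loss pulls matching audio and text embeddings together and pushes mismatched pairs apart, which is what lets a nearest-neighbor lookup such as `retrieve_top_k` serve text-to-audio or audio-to-text retrieval.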
Papers
October 14, 2024
September 20, 2024
August 26, 2024
August 17, 2024
July 25, 2024
June 10, 2024
May 23, 2024
March 7, 2024
January 10, 2024
November 1, 2023
September 18, 2023
September 14, 2023
July 28, 2023
May 3, 2023
February 16, 2023
February 12, 2023
January 30, 2023
November 14, 2022
September 28, 2022