Acoustic Representation

Acoustic representation is concerned with transforming raw audio signals into numerical representations that capture the information relevant to speech and audio processing tasks. Current research emphasizes robust and efficient representations learned with deep learning models such as transformers and generative adversarial networks (GANs). These models often combine multiple time scales or multiple modalities so that both acoustic and linguistic features can be exploited. Such representations underpin applications ranging from speech recognition and synthesis to speaker identification and emotion analysis, where better representations translate into more accurate and more versatile systems.
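As a minimal illustration of turning raw audio into numbers, the sketch below computes a log-power spectrogram, a classical hand-crafted representation that many learned models take as input. It is not the transformer- or GAN-based approach described above. The frame length, hop size, and `eps` floor are illustrative assumptions, and a naive DFT is used only to keep the example dependency-free; practical code would use an FFT such as `numpy.fft.rfft`.

```python
import cmath
import math


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hann(n):
    """Periodic Hann window of length n (tapers frame edges to reduce spectral leakage)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def log_power_spectrogram(signal, frame_len=256, hop=128, eps=1e-10):
    """Return one row per frame, each holding log power for DFT bins 0..frame_len // 2.

    Parameter defaults are illustrative assumptions, not values from any cited paper.
    """
    window = hann(frame_len)
    spec = []
    for frame in frame_signal(signal, frame_len, hop):
        windowed = [x * w for x, w in zip(frame, window)]
        row = []
        # Naive O(N^2) DFT over the non-negative frequency bins (a real system would use an FFT).
        for k in range(frame_len // 2 + 1):
            acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / frame_len)
                      for t in range(frame_len))
            # eps avoids log(0) for bins with no energy.
            row.append(math.log(abs(acc) ** 2 + eps))
        spec.append(row)
    return spec


if __name__ == "__main__":
    sr = 8000  # assumed sample rate in Hz
    tone = [math.sin(2 * math.pi * 1000 * n / sr) for n in range(2000)]
    spec = log_power_spectrogram(tone)
    peak = max(range(len(spec[0])), key=spec[0].__getitem__)
    print(len(spec), len(spec[0]), peak)
```

With an 8 kHz sample rate and 256-sample frames, each bin spans 31.25 Hz, so a 1 kHz tone concentrates its energy in bin 32 of every frame. Learned front ends replace or build on this fixed transform with representations trained from data.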

Papers

August 21, 2022

CMSBERT-CLR: Context-driven Modality Shifting BERT with Contrastive Learning for linguistic, visual, acoustic Representations
Junghun Kim, Jihie Kim
Contrastive Learning, Multimodal Sentiment Analysis, Multimodal Content, Acoustic Representation, Multimodal BERT

August 6, 2022

SSDPT: Self-Supervised Dual-Path Transformer for Anomalous Sound Detection in Machine Condition Monitoring
Jisheng Bai, Jianfeng Chen, Mou Wang, Muhammad Saad Ayub, Qingli Yan
Acoustic Feature, Anomalous Sound Detection, Acoustic Representation, Machine Condition Monitoring, Dual Path Transformer, Anomalous Sound

July 14, 2022

Two-Pass Low Latency End-to-End Spoken Language Understanding
Siddhant Arora, Siddharth Dalmia, Xuankai Chang, Brian Yan, Alan Black, Shinji Watanabe
Language Model, Spoken Language Understanding, Acoustic Representation, End to End Spoken Language, End to End Speech Recognition

April 1, 2022

WavFT: Acoustic model finetuning with labelled and unlabelled data
Utkarsh Chauhan, Vikas Joshi, Rupesh R. Mehta
Fine Tuning, Acoustic Model, Self Supervised Learning Method, Acoustic Representation, Acoustic Context

March 31, 2022

Impact of Environmental Noise on Alzheimer's Disease Detection from Speech: Should You Let a Baby Cry?
Jekaterina Novikova
Speech Analysis, Alzheimer's Disease, Speech Processing, Disease Detection, Acoustic Feature, Acoustic Representation, Environmental Noise, Infant Cry

December 14, 2021

Improving Hybrid CTC/Attention End-to-end Speech Recognition with Pretrained Acoustic and Language Model
Keqi Deng, Songjun Cao, Yike Zhang, Long Ma
Language Model, Automatic Speech Recognition, Speech Recognition, Human Attention, CTC Based, Acoustic Representation, Audio Pre Training
