Paper ID: 2409.06456

Attention-Based Beamformer For Multi-Channel Speech Enhancement

Jinglin Bai, Hao Li, Xueliang Zhang, Fei Chen

Minimum Variance Distortionless Response (MVDR) is a classical adaptive beamformer that theoretically ensures the distortionless transmission of signals in the target direction. Its performance in noise reduction actually depends on the accuracy of the noise spatial covariance matrix (SCM) estimate. Although recent deep learning has shown remarkable performance in multi-channel speech enhancement, the property of distortionless response still makes MVDR highly popular in real applications. In this paper, we propose an attention-based mechanism to calculate the speech and noise SCM and then apply MVDR to obtain the enhanced speech. Moreover, a deep learning architecture using the inplace convolution operator and frequency-independent LSTM has proven effective in facilitating SCM estimation. The model is optimized in an end-to-end manner. Experimental results indicate that the proposed method is extremely effective in tracking moving or stationary speakers under non-causal and causal conditions, outperforming other baselines. It is worth mentioning that our model has only 0.35 million parameters, making it easy to be deployed on edge devices.

Submitted: Sep 10, 2024