Challenge Task
Challenge tasks in computer vision, audio processing, and natural language processing drive progress by focusing research effort on specific, well-defined problems. Current research emphasizes robust and efficient models, often built on deep learning architectures such as transformers, convolutional neural networks, and variational autoencoders, with the aim of improving accuracy, efficiency, and generalization across diverse datasets and conditions. These challenges yield benchmark datasets and new methods with applications in medical imaging, video enhancement, speech technology, and AI safety.
Papers
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Shilong Wu, Chenxi Wang, Hang Chen, Yusheng Dai, Chenyue Zhang, Ruoyu Wang, Hongbo Lan, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Zhong-Qiu Wang, Jia Pan, Jianqing Gao
Fine-tune the pretrained ATST model for sound event detection
Nian Shao, Xian Li, Xiaofei Li