Word Segmentation
Word segmentation, the task of dividing continuous text or speech into individual words, is a core step in many natural language processing applications, particularly for morphologically rich languages and for languages written without explicit word delimiters, such as Chinese and Japanese. Current research emphasizes unsupervised methods that combine self-supervised speech models (such as HuBERT and wav2vec 2.0) with dynamic programming algorithms to discover word boundaries in audio, often incorporating contextual information and visual grounding to improve accuracy. These advances improve performance in low-resource scenarios and support downstream applications such as speech recognition, machine translation, and sentiment analysis across diverse languages.
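To make the dynamic programming idea concrete, here is a minimal sketch on unsegmented text rather than audio. It is not the method of any particular paper: it assumes a simple unigram scoring model, and the names `segment`, `toy_logprob`, `max_len`, and `unk_penalty` are hypothetical, chosen only for illustration. Speech-based systems typically apply the same kind of search over frame-level features from models such as HuBERT instead of over characters.

```python
import math


def segment(text, logprob, max_len=6, unk_penalty=-10.0):
    """Return the highest-scoring split of `text` under a unigram model.

    `logprob` maps known words to log-probabilities (toy values here).
    Unknown pieces are charged `unk_penalty` per character, which is an
    assumption of this sketch, not a standard value.
    """
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best score of any split of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in that split
    best[0] = 0.0

    for end in range(1, n + 1):
        # Only consider candidate words of at most `max_len` characters.
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            score = logprob.get(piece, unk_penalty * len(piece))
            if best[start] + score > best[end]:
                best[end] = best[start] + score
                back[end] = start

    # Follow the back-pointers from the end of the string to recover the words.
    words = []
    i = n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1]


# Hypothetical toy lexicon with made-up log-probabilities.
toy_logprob = {
    "the": -1.0, "cat": -2.0, "sat": -2.0, "on": -1.5,
    "mat": -2.5, "at": -2.0, "a": -1.5,
}

print(segment("thecatsatonthemat", toy_logprob))
# → ['the', 'cat', 'sat', 'on', 'the', 'mat']
```

The table `best` stores the best score for every prefix, so each boundary decision reuses earlier results instead of enumerating all possible splits. With a maximum word length of `max_len`, the search costs on the order of `len(text) * max_len` lookups.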