Grapheme-Based Encoding
Grapheme-based encoding represents words as sequences of graphemes, the smallest functional units of a writing system (letters, or user-perceived characters such as a base letter together with its combining marks), offering an alternative to subword-based tokenization in natural language processing. Current research focuses on improving the robustness and fairness of grapheme encoding across diverse languages, particularly those with complex writing systems. This work often employs neural network architectures such as CNNs and transformers, and explores techniques such as grapheme pair encoding (GPE) to improve performance. The approach is promising for language modeling, speech recognition, and optical character recognition, especially for low-resource languages and those with significant orthographic variation.
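To make the idea concrete, here is a minimal Python sketch of grapheme-level segmentation followed by one BPE-style merge over grapheme pairs. It rests on several assumptions: the function names (`split_graphemes`, `most_frequent_pair`, `merge_pair`) are hypothetical, the segmentation is a simplification that only attaches combining marks to the preceding base character rather than applying the full Unicode segmentation rules, and the merge step is only roughly what the name grapheme pair encoding suggests, so published GPE procedures may differ.

```python
import unicodedata
from collections import Counter


def split_graphemes(text):
    """Approximate grapheme segmentation (simplified assumption).

    Any code point in a Unicode mark category (Mn, Mc, Me) is attached to
    the preceding base character. Full grapheme cluster rules (UAX #29)
    cover more cases, e.g. emoji sequences and Hangul jamo.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # extend the current cluster
        else:
            clusters.append(ch)  # start a new cluster
    return clusters


def most_frequent_pair(sequences):
    """Return the most frequent adjacent grapheme pair, or None if no pairs."""
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


# Illustrative toy corpus, not real data.
corpus = [split_graphemes(w) for w in ["cafe\u0301", "cafe\u0301s", "cab"]]
best = most_frequent_pair(corpus)
corpus = [merge_pair(seq, best) for seq in corpus]
```

Working at the grapheme level means a base letter and its diacritics or vowel signs are never split apart, which is one reason the approach is attractive for scripts where a single visible character spans several code points. A production system would more likely rely on a full Unicode grapheme cluster implementation than on this category check.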