Chinese Corpus
Chinese corpora are collections of Chinese-language text that serve as core resources for natural language processing (NLP). Current research focuses on developing and evaluating large language models (LLMs) trained on massive Chinese corpora, on addressing the challenges of understanding classical Chinese, and on improving tasks such as machine translation, speaker verification, and bias mitigation. This work matters because it strengthens NLP systems for a language with a long written history and a vast user base, with applications ranging from historical research to financial technology.