Czech Dataset
Research on Czech datasets focuses on developing and improving natural language processing (NLP) and speech resources and models for the Czech language. Current efforts concentrate on creating larger, more diverse datasets for tasks such as machine translation, sentiment analysis, and fact verification, often paired with transformer-based models such as BERT for text and Wav2Vec 2.0 for speech. These resources enable more accurate and efficient Czech-language applications in areas such as search engines, speech recognition, and news analysis, and they also support cross-lingual NLP research. Many of the datasets and models are open-sourced, which fosters collaboration and reproducibility within the research community.
Papers
June 18, 2024
April 10, 2024
November 23, 2023
July 20, 2023
June 7, 2023
April 17, 2023
December 16, 2022
August 27, 2022
June 15, 2022
April 29, 2022
January 26, 2022