Chinese Language Model
Chinese language models (CLMs) are large language models trained on massive corpora of Chinese text, aiming to achieve human-level understanding and generation of the Chinese language. Current research focuses on improving CLM performance across diverse tasks, including question answering, mathematical reasoning, and dialogue generation, often leveraging transfer learning from English models and incorporating Chinese-specific features such as pinyin. This research is crucial for advancing natural language processing in Chinese, with applications ranging from improved machine translation and chatbots to more nuanced analysis of social media and mental-health data. Ongoing work also addresses biases and limitations in existing CLMs, striving for more reliable and ethically sound models.
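As a minimal sketch of the pinyin-incorporation idea mentioned above, the snippet below fuses a character embedding with an embedding of the character's pinyin ID. The vocabulary sizes, the example IDs, and the additive fusion are illustrative assumptions, not any particular model's implementation (models such as ChineseBERT use a more elaborate fusion layer).

```python
# Sketch: fusing character and pinyin embeddings for Chinese text.
# All sizes and IDs below are hypothetical, chosen only for illustration.
import torch
import torch.nn as nn

class CharPinyinEmbedding(nn.Module):
    """Sum a character embedding with an embedding of its pinyin ID."""
    def __init__(self, num_chars=21128, num_pinyin=1500, dim=768):
        super().__init__()
        self.char_embed = nn.Embedding(num_chars, dim)
        self.pinyin_embed = nn.Embedding(num_pinyin, dim)

    def forward(self, char_ids, pinyin_ids):
        # Fuse the two views of each token by addition (one common choice;
        # concatenation followed by a projection is another).
        return self.char_embed(char_ids) + self.pinyin_embed(pinyin_ids)

embed = CharPinyinEmbedding()
chars = torch.tensor([[101, 102, 103]])   # hypothetical character IDs
pinyin = torch.tensor([[7, 42, 9]])       # hypothetical pinyin IDs
print(embed(chars, pinyin).shape)         # torch.Size([1, 3, 768])
```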
Papers
Multi-Granularity Tibetan Textual Adversarial Attack Method Based on Masked Language Model
Xi Cao, Nuo Qun, Quzong Gesang, Yulei Zhu, Trashi Nyima
Pay Attention to the Robustness of Chinese Minority Language Models! Syllable-level Textual Adversarial Attack on Tibetan Script
Xi Cao, Dolma Dawa, Nuo Qun, Trashi Nyima
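The two papers above study adversarial attacks that use a masked language model to propose perturbations. As a generic sketch of that idea (not the authors' exact method, and using the publicly available `bert-base-chinese` checkpoint rather than a Tibetan model for simplicity), the snippet below masks one character of a sentence and takes the MLM's top predictions as substitution candidates; the example sentence and masked position are illustrative assumptions.

```python
# Generic sketch of MLM-based substitution-candidate generation for textual
# adversarial attacks; not the specific method of the papers listed above.
from transformers import pipeline

# Fill-mask pipeline backed by a Chinese BERT masked language model.
fill_mask = pipeline("fill-mask", model="bert-base-chinese")

def substitution_candidates(tokens, position, top_k=5):
    """Mask tokens[position] and return the MLM's top-k replacement tokens."""
    masked = tokens.copy()
    masked[position] = "[MASK]"
    predictions = fill_mask("".join(masked), top_k=top_k)
    # Drop the original token so only true substitutions remain.
    return [p["token_str"] for p in predictions
            if p["token_str"] != tokens[position]]

sentence = list("这部电影非常好看")  # "This movie is very good"
print(substitution_candidates(sentence, position=6))  # candidates for "好"
```

An attacker would then rank these fluent, context-preserving candidates by how much each one degrades a victim model's prediction, which is the general recipe such attacks follow.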