Hungarian Corpus
Research on Hungarian corpora aims to build robust, accessible natural language processing (NLP) tools for this morphologically rich language. Current work concentrates on high-performing pipelines that cover lemmatization (often with hybrid neural and rule-based approaches), part-of-speech tagging, and named entity recognition, frequently built on frameworks such as spaCy. Together with new corpora for machine translation and automatic speech recognition (ASR), these tools help close the resource gap for less-resourced languages like Hungarian and make NLP technology available to a wider range of research and applications.
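To make the idea of a hybrid lemmatizer concrete, here is a minimal pure-Python sketch. It is not the method of any particular paper or of spaCy: the names `EXCEPTIONS`, `SUFFIXES`, `toy_model`, and `lemmatize` are hypothetical, the suffix list is a tiny illustrative sample, and `toy_model` is only a stand-in for a trained neural component. The one point it shows is the control flow: a rule-based exception lexicon is consulted first, and a model's guess is used otherwise.

```python
from typing import Callable, Optional

# Rule-based component (assumption: a tiny hand-written lexicon of
# irregular plural forms mapped to their lemmas).
EXCEPTIONS = {"lovak": "ló", "tavak": "tó"}

# Toy sample of plural/case endings, longest first; far from complete.
SUFFIXES = ("ekben", "okban", "ban", "ben", "ak", "ek", "ok")


def toy_model(token: str) -> Optional[str]:
    """Stand-in for a neural lemmatizer: strip one known suffix.

    Returns None when no suffix applies or the remaining stem would be
    shorter than two characters.
    """
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return None


def lemmatize(
    token: str,
    model: Callable[[str], Optional[str]] = toy_model,
) -> str:
    """Hybrid lookup: exception lexicon first, then the model's guess."""
    lowered = token.lower()  # simplification: proper-noun casing is lost
    if lowered in EXCEPTIONS:
        return EXCEPTIONS[lowered]
    guess = model(lowered)
    # Fall back to the lowercased surface form if the model abstains.
    return guess if guess is not None else lowered


if __name__ == "__main__":
    for word in ["házak", "Házban", "kertekben", "lovak", "sok"]:
        print(word, "->", lemmatize(word))
```

In a real pipeline the `model` argument would be replaced by a trained lemmatizer, and the exception lexicon would typically be derived from an annotated corpus rather than written by hand.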
Papers
April 4, 2024
August 24, 2023
June 13, 2023
February 1, 2022
January 18, 2022