Entity Corpus
Entity corpora are collections of text annotated with named entities (people, places, organizations, and so on) across multiple languages. They serve as training data for natural language processing (NLP) tasks such as named entity recognition (NER). Current research focuses on building larger, more linguistically diverse corpora, often by leveraging knowledge graphs such as Wikidata, and on applying transformer-based neural architectures to improve NER performance, particularly in low-resource languages. These corpora support multilingual NLP and downstream applications such as machine translation, information retrieval, and question answering.
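To make "text annotated with named entities" concrete, here is a minimal sketch of how one annotated sentence might be represented and converted to the BIO tagging scheme commonly used to train NER models. The class names (`EntitySpan`, `AnnotatedSentence`), the helper `to_bio`, the label set, and the example sentence are all illustrative assumptions, not the format of any particular corpus.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EntitySpan:
    """Hypothetical entity annotation over token positions."""
    start: int  # index of the first token in the entity (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str  # entity type, e.g. "PER", "LOC", "ORG" (assumed label set)


@dataclass
class AnnotatedSentence:
    """Hypothetical corpus record: one tokenized sentence plus its entities."""
    lang: str
    tokens: List[str]
    entities: List[EntitySpan]


def to_bio(sentence: AnnotatedSentence) -> List[str]:
    """Convert span annotations to one BIO tag per token."""
    tags = ["O"] * len(sentence.tokens)  # "O" marks tokens outside any entity
    for span in sentence.entities:
        tags[span.start] = f"B-{span.label}"      # first token of the entity
        for i in range(span.start + 1, span.end):
            tags[i] = f"I-{span.label}"           # remaining tokens of the entity
    return tags


# Illustrative record; real corpora differ in tokenization and label sets.
example = AnnotatedSentence(
    lang="en",
    tokens=["Marie", "Curie", "worked", "in", "Paris", "."],
    entities=[EntitySpan(0, 2, "PER"), EntitySpan(4, 5, "LOC")],
)

for token, tag in zip(example.tokens, to_bio(example)):
    print(f"{token}\t{tag}")
```

Storing spans rather than per-token tags keeps the record independent of the tagging scheme, and the `lang` field is one simple way to keep sentences from different languages in a single multilingual collection.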
Papers
May 15, 2024
May 8, 2024
March 30, 2024
August 15, 2023
November 4, 2022
October 22, 2022