Open Corpus

Open corpora are large, publicly available collections of text and other data that support research across many fields. Current work focuses on building and improving these corpora, for example by creating benchmarks for evaluating multi-object tracking and by developing models that extract information such as character and emotion from narratives or mathematical concepts from scientific texts. By giving researchers standardized, accessible datasets for training and evaluating algorithms, this work supports progress in natural language processing, knowledge graph construction, and other areas, and can lead to more robust and reliable models.

Papers