Discourse Treebank

Discourse treebanks are corpora annotated with hierarchical structures that represent the relations between clauses and sentences, with the aim of modeling how a text coheres and is organized. Current research focuses on robust methods for generating these structures automatically. Common techniques include multi-label classification, dependency parsing, and distant supervision from tasks such as sentiment analysis and topic segmentation, often built on transformer-based models and graph-based representations. This work is driven by the need for larger, more diverse datasets that improve the accuracy and generalizability of discourse parsers, which in turn support downstream natural language processing applications such as text summarization and question answering. Better treebanks and parsing methods also advance our understanding of how language creates meaning beyond the sentence level.
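
As a rough illustration of the hierarchical structures described above, the sketch below represents one annotated sentence as a small tree in Python. The `DiscourseNode` class, its field names, the relation labels ("Contrast", "Cause"), and the example sentence are all assumptions made for illustration; they do not follow the format or label set of any particular treebank.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiscourseNode:
    """Node in a toy discourse tree (hypothetical schema, not a real treebank format)."""
    relation: Optional[str] = None  # relation label, set on internal nodes only
    text: Optional[str] = None      # clause text, set on leaves only
    children: List["DiscourseNode"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children


def leaves(node: DiscourseNode) -> List[str]:
    """Return the clause texts in left-to-right order."""
    if node.is_leaf():
        return [node.text]
    out: List[str] = []
    for child in node.children:
        out.extend(leaves(child))
    return out


def relations(node: DiscourseNode) -> List[str]:
    """Collect the relation labels of internal nodes in pre-order."""
    if node.is_leaf():
        return []
    out = [node.relation]
    for child in node.children:
        out.extend(relations(child))
    return out


def depth(node: DiscourseNode) -> int:
    """Number of edges on the longest path from this node down to a leaf."""
    if node.is_leaf():
        return 0
    return 1 + max(depth(child) for child in node.children)


# Illustrative annotation: three clauses joined by two made-up relation labels.
tree = DiscourseNode(
    relation="Contrast",
    children=[
        DiscourseNode(text="The parser works well on news text,"),
        DiscourseNode(
            relation="Cause",
            children=[
                DiscourseNode(text="but it fails on dialogue"),
                DiscourseNode(text="because the training data is small."),
            ],
        ),
    ],
)

print(leaves(tree))
print(relations(tree))
print(depth(tree))
```

A discourse parser, in this simplified picture, is a model that takes the flat sequence returned by `leaves` and predicts the tree shape and the labels returned by `relations`.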

Papers