Causal News Corpus

The Causal News Corpus (CNC) is a benchmark dataset for natural language processing (NLP) research on automatically identifying causal relationships in news text. Current research focuses on developing and evaluating models, often based on pre-trained transformers, that detect whether a sentence expresses causality, identify the cause and effect spans within it, and even infer causality from correlational statements. This work matters because accurate causality identification underpins applications such as information extraction and event understanding, and may also help improve the reasoning capabilities of large language models.

Papers