Clickbait Corpus

Clickbait corpora are collections of online content assembled to study and combat clickbait: deceptive headlines written to lure users into clicking. Research on these corpora focuses on automated clickbait detection, often using techniques such as text summarization, contrastive learning with pretrained models like BERT, and multi-modal analysis that incorporates metadata and user engagement data. The goal is to improve the accuracy of clickbait identification across languages and platforms, reducing the harm clickbait does to users and contributing to a healthier online information ecosystem. Diverse, multilingual corpora, such as those in Spanish, Bangla, and Romanian, are crucial for advancing this research.

Papers