Machine-Generated Text
Machine-generated text detection aims to distinguish text produced by language models from human-written text, addressing concerns about misinformation and authenticity. Current research focuses on robust, model-agnostic detectors that can identify text generated by a variety of large language models (LLMs). These detectors often rely on zero-shot learning, ensemble methods, and contrastive learning with transformer-based architectures. Detection helps preserve the integrity of information sources in domains ranging from news and social media to education and scientific publishing, and ongoing work aims to improve both the accuracy and the generalizability of detection methods.
Papers
k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text
Abe Bohan Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, Tianxing He
M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection
Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohanned Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov
ToBlend: Token-Level Blending With an Ensemble of LLMs to Attack AI-Generated Text Detection
Fan Huang, Haewoon Kwak, Jisun An