Machine-Generated Text Detection
Machine-generated text detection focuses on distinguishing computer-generated content from human-written text, a task made urgent by the increasing sophistication of large language models (LLMs). Current research emphasizes robust and generalizable detection methods, often built on transformer-based architectures, and explores techniques such as watermarking, rewriting analysis, and multi-modal approaches that combine text, image, and audio signals. This field is crucial for mitigating misinformation, plagiarism, and other malicious uses of LLMs, with impact across journalism, education, and online content moderation.
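To make the watermarking idea concrete, here is a minimal, self-contained sketch of a token-level "green-list" watermark detector. This is an illustrative simplification, not the method of any paper listed below (k-SemStamp, for instance, operates at the semantic sentence level via clustering): the functions `green_list` and `watermark_z_score` are hypothetical names, and the hash-based vocabulary partition stands in for the pseudo-random partition a real watermarking scheme would derive from a secret key.

```python
import hashlib
import math

def green_list(prev_token: str, vocab: list[str], fraction: float = 0.5) -> set[str]:
    # Hypothetical green-list construction: hash the previous token together
    # with each vocabulary word to induce a deterministic pseudo-random
    # partition, then keep the top `fraction` of words as the "green" set.
    ranked = sorted(
        vocab,
        key=lambda w: hashlib.sha256((prev_token + "|" + w).encode()).hexdigest(),
    )
    return set(ranked[: int(len(ranked) * fraction)])

def watermark_z_score(tokens: list[str], vocab: list[str], fraction: float = 0.5) -> float:
    # Count how many tokens fall in the green list induced by their
    # predecessor, then compare against the binomial null hypothesis:
    # unwatermarked (human) text hits the green list at rate `fraction`
    # purely by chance, so a large positive z-score suggests a watermark.
    hits = sum(
        1
        for prev, cur in zip(tokens, tokens[1:])
        if cur in green_list(prev, vocab, fraction)
    )
    n = len(tokens) - 1
    if n <= 0:
        return 0.0
    return (hits - fraction * n) / math.sqrt(n * fraction * (1 - fraction))
```

A generator that deliberately samples each next token from the current green list will produce text whose z-score grows with length, while natural text stays near zero; detection then reduces to thresholding the z-score.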
Papers
k-SemStamp: A Clustering-Based Semantic Watermark for Detection of Machine-Generated Text
Abe Bohan Hou, Jingyu Zhang, Yichen Wang, Daniel Khashabi, Tianxing He
M4GT-Bench: Evaluation Benchmark for Black-Box Machine-Generated Text Detection
Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Osama Mohanned Afzal, Tarek Mahmoud, Giovanni Puccetti, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov