Text Degeneration

Text degeneration, the tendency of neural language models to produce repetitive, bland, or incoherent text, is a significant obstacle to building high-quality natural language generation systems. Current research focuses on understanding the root causes of the problem, often tracing it to repetition in the training data and to limitations of the standard maximum-likelihood (cross-entropy) training objective. Proposed remedies include improved training methods, such as contrastive learning, as well as modified decoding strategies, such as truncated sampling schemes like nucleus (top-p) sampling (see the sketch below). Addressing text degeneration is crucial for advancing the capabilities of language models and improving their applicability in domains such as dialogue systems and text-to-speech.
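As an illustration of the decoding-side remedies mentioned above, the following is a minimal NumPy sketch of nucleus (top-p) sampling: the next-token distribution is truncated to the smallest set of tokens whose cumulative probability reaches p, renormalized, and sampled. The function name, toy logits, and p = 0.9 threshold are illustrative choices, not taken from any specific paper's code.

```python
import numpy as np

def nucleus_sample(logits: np.ndarray, p: float = 0.9, rng=None) -> int:
    """Sample one token id via nucleus (top-p) sampling.

    Keeps the smallest set of tokens whose cumulative probability
    reaches p, renormalizes over that set, and samples from it.
    """
    rng = rng or np.random.default_rng()
    # Softmax over the logits (max-subtracted for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort token indices by probability, highest first.
    order = np.argsort(probs)[::-1]
    # Smallest prefix of the sorted tokens whose cumulative mass reaches p.
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()
    return int(rng.choice(nucleus, p=nucleus_probs))

# Toy example: a peaked 5-token vocabulary.
logits = np.array([4.0, 2.0, 1.0, 0.5, -1.0])
print(nucleus_sample(logits, p=0.9))
```

Truncating the low-probability tail this way blocks the unreliable tokens whose sampling tends to compound into incoherent continuations, while still permitting variation among the plausible candidates; this is the intuition behind most sampling-side fixes for degeneration.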

Papers