Topic Modeling
Topic modeling is a machine learning technique for discovering the underlying themes (topics) in large collections of text, with the aim of giving a structured, interpretable summary of the content. Current research focuses on improving topic coherence and interpretability, particularly for short texts and multilingual data. Alongside traditional methods such as latent Dirichlet allocation (LDA) and non-negative matrix factorization (NMF), recent work employs BERT and other transformer-based architectures, variational autoencoders, and graph neural networks. These advances are extending the use of topic modeling across diverse fields, from social media analysis and fake news detection to scientific literature review and legal document organization, where they aim to provide more accurate thematic representations of complex textual data. Large language models are also increasingly being integrated into topic labeling and evaluation.
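To make "topic coherence" concrete, here is a minimal sketch of one common way to score it: the average normalized pointwise mutual information (NPMI) between pairs of a topic's top words, estimated from document-level co-occurrence. The function name npmi_coherence, the toy corpus, and the conventions for the edge cases (words that never co-occur, words that appear in every document) are assumptions made for illustration, not taken from any of the papers listed here.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all word pairs of a topic (hypothetical helper).

    Probabilities are estimated as the fraction of documents that
    contain a word (or both words of a pair).
    """
    if len(topic_words) < 2:
        raise ValueError("need at least two topic words")

    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def count(*words):
        # Number of documents containing all of the given words.
        return sum(all(w in d for w in words) for d in doc_sets)

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        c1, c2, c12 = count(w1), count(w2), count(w1, w2)
        if c12 == 0:
            # Assumed convention: words that never co-occur get the minimum score.
            scores.append(-1.0)
        elif c12 == n_docs:
            # Assumed convention: both words occur everywhere, so NPMI is 0/0; use 0.
            scores.append(0.0)
        else:
            p1, p2, p12 = c1 / n_docs, c2 / n_docs, c12 / n_docs
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


# Toy corpus: two documents about pets, two about finance.
docs = [
    ["cat", "dog", "pet"],
    ["dog", "pet", "vet"],
    ["stock", "market", "trade"],
    ["market", "trade", "bank"],
]

coherent_score = npmi_coherence(["dog", "pet"], docs)
mixed_score = npmi_coherence(["dog", "market"], docs)
```

On this toy corpus, a topic whose words always appear together ("dog", "pet") scores higher than one that mixes unrelated words ("dog", "market"). Published coherence measures typically use sliding-window counts over a large reference corpus rather than whole-document co-occurrence, so this should be read as an illustration of the idea only.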
Papers
Beyond Text-to-Text: An Overview of Multimodal and Generative Artificial Intelligence for Education Using Topic Modeling
Ville Heilala, Roberto Araya, Raija Hämäläinen
Beats of Bias: Analyzing Lyrics with Topic Modeling and Gender Bias Measurements
Danqing Chen, Adithi Satish, Rasul Khanbayov, Carolin M. Schuster, Georg Groh
Qualitative Insights Tool (QualIT): LLM Enhanced Topic Modeling
Satya Kapoor, Alex Gil, Sreyoshi Bhaduri, Anshul Mittal, Rutu Mulkar