Text Data
Text data analysis is a rapidly evolving field focused on extracting meaningful information from textual sources. Current research emphasizes developing and refining methods for topic modeling, sentiment analysis, and causal inference from text, often leveraging transformer-based large language models (LLMs) such as BERT and GPT variants, along with other techniques such as graph-based word embeddings. These advances matter for applications in domains such as healthcare, finance, and the social sciences, where they enable more accurate and efficient processing of the large volumes of text generated daily. Ongoing work also addresses bias detection and mitigation in LLMs and the development of robust methods for handling code-mixed and noisy text.
Papers
Figuring out Figures: Using Textual References to Caption Scientific Figures
Stanley Cao, Kevin Liu
This Paper Had the Smartest Reviewers -- Flattery Detection Utilising an Audio-Textual Transformer-Based Approach
Lukas Christ, Shahin Amiriparian, Friederike Hawighorst, Ann-Kathrin Schill, Angelo Boutalikakis, Lorenz Graf-Vlachy, Andreas König, Björn W. Schuller
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale
Guilherme Penedo, Hynek Kydlíček, Loubna Ben allal, Anton Lozhkov, Margaret Mitchell, Colin Raffel, Leandro Von Werra, Thomas Wolf