Unstructured Data
Unstructured data, such as text, images, and other non-tabular formats, is difficult to analyze and mine for knowledge because it lacks a predefined schema. Current research applies large language models (LLMs) and other deep learning architectures, including transformer-based models and graph neural networks, to extract information, match entities, and support efficient querying and summarization across these data types. The work matters for fields such as healthcare, finance, and scientific research, where large volumes of unstructured data remain underutilized. Robust, scalable methods for handling unstructured data are changing how these sectors process information.
Papers
SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata
Mark Díaz, Sunipa Dev, Emily Reif, Emily Denton, Vinodkumar Prabhakaran
General-Purpose vs. Domain-Adapted Large Language Models for Extraction of Structured Data from Chest Radiology Reports
Ali H. Dhanaliwala, Rikhiya Ghosh, Sanjeev Kumar Karn, Poikavila Ullaskrishnan, Oladimeji Farri, Dorin Comaniciu, Charles E. Kahn
Automatic Coding at Scale: Design and Deployment of a Nationwide System for Normalizing Referrals in the Chilean Public Healthcare System
Fabián Villena, Matías Rojas, Felipe Arias, Jorge Pacheco, Paulina Vera, Jocelyn Dunstan
A Personalized Reinforcement Learning Summarization Service for Learning Structure from Unstructured Data
Samira Ghodratnama, Amin Beheshti, Mehrdad Zakershahrak