Unstructured Data
Unstructured data, encompassing text, images, and other non-tabular formats, presents significant challenges for analysis and knowledge extraction. Current research focuses on using large language models (LLMs) and related deep learning architectures, such as transformers and graph neural networks, to extract meaningful information, perform entity matching, and support efficient querying and summarization across these diverse data types. This work matters for fields such as healthcare, finance, and scientific research, where massive volumes of unstructured data remain largely underused. Robust and scalable methods for handling such data are changing how information is processed across many sectors.
Papers
Learning Multi-view Molecular Representations with Structured and Unstructured Knowledge
Yizhen Luo, Kai Yang, Massimo Hong, Xing Yi Liu, Zikun Nie, Hao Zhou, Zaiqing Nie
TabularFM: An Open Framework For Tabular Foundational Models
Quan M. Tran, Suong N. Hoang, Lam M. Nguyen, Dzung Phan, Hoang Thanh Lam
How In-Context Learning Emerges from Training on Unstructured Data: On the Role of Co-Occurrence, Positional Information, and Noise Structures
Kevin Christian Wibisono, Yixin Wang
Leveraging Large Language Models for Entity Matching
Qianyu Huang, Tongfang Zhao
GAMedX: Generative AI-based Medical Entity Data Extractor Using Large Language Models
Mohammed-Khalil Ghali, Abdelrahman Farrag, Hajar Sakai, Hicham El Baz, Yu Jin, Sarah Lam