Unstructured Data
Unstructured data, which includes text, images, and other non-tabular formats, is difficult to analyze and to mine for knowledge. Current research applies large language models (LLMs) and other deep learning architectures, such as transformers and graph neural networks, to extract information, match entities, and support efficient querying and summarization across these data types. The work matters for fields such as healthcare, finance, and scientific research, where large volumes of unstructured data remain underutilized. Robust, scalable methods for handling unstructured data are changing how information is processed in these sectors.
Papers
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction
Qintong Zhang, Victor Shea-Jay Huang, Bin Wang, Junyuan Zhang, Zhengren Wang, Hao Liang, Shawn Wang, Matthieu Lin, Wentao Zhang, Conghui He
SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents
Qi Zhang, Zhijia Chen, Huitong Pan, Cornelia Caragea, Longin Jan Latecki, Eduard Dragut