Structured Document
Structured document research focuses on efficiently extracting and representing information from diverse document formats, with the aim of converting unstructured sources (such as PDFs and images) into structured, machine-readable representations. Current research emphasizes robust models, including multimodal approaches, graph convolutional networks, and large language models, that can handle complex layouts and mixed content such as text, tables, and images. This work matters for improving information retrieval, enabling analytics across domains, and building more reliable and efficient AI systems that can process and understand complex documents.
Papers
Tree of Problems: Improving structured problem solving with compositionality
Armel Zebaze, Benoît Sagot, Rachel Bawden
SEGMENT+: Long Text Processing with Short-Context Language Models
Wei Shi, Shuang Li, Kerun Yu, Jinglei Chen, Zujie Liang, Xinhui Wu, Yuxi Qian, Feng Wei, Bo Zheng, Jiaqing Liang, Jiangjie Chen, Yanghua Xiao