Semi-Structured
Semi-structured data, such as tables, databases, and partially structured documents, poses distinct challenges for information extraction and processing. Current research applies large language models (LLMs) and graph-based methods to information retrieval, question answering, and document editing over these sources, often combining them with knowledge graphs, triplet-based prefiltering, and multi-agent systems to improve accuracy and efficiency. Handling semi-structured data well matters for applications such as legal reasoning, medical diagnosis, and e-commerce, and it motivates the development of more robust and adaptable AI systems. New benchmarks and datasets are a further focus, since they make it possible to compare approaches directly.
Papers
Wikidata as a seed for Web Extraction
Kunpeng Guo, Dennis Diefenbach, Antoine Gourru, Christophe Gravier
TAROT: A Hierarchical Framework with Multitask Co-Pretraining on Semi-Structured Data towards Effective Person-Job Fit
Yihan Cao, Xu Chen, Lun Du, Hao Chen, Qiang Fu, Shi Han, Yushu Du, Yanbin Kang, Guangming Lu, Zi Li
Semi-Structured Chain-of-Thought: Integrating Multiple Sources of Knowledge for Improved Language Model Reasoning
Xin Su, Tiep Le, Steven Bethard, Phillip Howard
TempTabQA: Temporal Question Answering for Semi-Structured Tables
Vivek Gupta, Pranshu Kandoi, Mahek Bhavesh Vora, Shuo Zhang, Yujie He, Ridho Reinanda, Vivek Srikumar