Document Processing
Document processing research focuses on extracting information efficiently and accurately from diverse document types, including scanned images and digital formats. Current efforts center on multimodal models that integrate text, layout, and visual information, often using transformer-based architectures together with techniques such as state space models to handle long documents and reduce computational cost. These advances matter for automating tasks in sectors such as banking and finance, where they improve efficiency and lower operating expenses while addressing challenges like document forgery detection. The field is also exploring cost-optimization strategies for large language model usage in document processing applications.
Papers
Mitigating Hallucination with ZeroG: An Advanced Knowledge Management Engine
Anantha Sharma, Sheeba Elizabeth John, Fatemeh Rezapoor Nikroo, Krupali Bhatt, Mrunal Zambre, Aditi Wikhe
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding
Jaeyoo Park, Jin Young Choi, Jeonghyung Park, Bohyung Han