Paper ID: 2310.01206

appjsonify: An Academic Paper PDF-to-JSON Conversion Toolkit

Atsuki Yamaguchi, Terufumi Morishita

We present appjsonify, a Python-based PDF-to-JSON conversion toolkit for academic papers. It parses a PDF file by combining several vision-based document layout analysis models with rule-based text processing. appjsonify is flexible: users can easily configure the processing pipeline to handle the specific format of the papers they wish to process. We are publicly releasing appjsonify as an easy-to-install toolkit, available via PyPI and GitHub.
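The idea of a configurable pipeline can be illustrated with a minimal sketch. It assumes that a layout analysis model has already produced labeled text blocks, and that each rule-based step is a function from a list of blocks to a new list of blocks. The names `Block`, `drop_page_furniture`, `join_hyphenation` and `run_pipeline` are hypothetical illustrations, not appjsonify's actual API.

```python
import json
from dataclasses import dataclass, asdict
from typing import Callable, List


# Hypothetical stand-in for a region detected by a layout analysis model.
@dataclass
class Block:
    label: str  # e.g. "title", "paragraph", "header", "footer"
    text: str
    page: int


# Hypothetical step type: every processing step maps blocks to blocks,
# so steps can be added, removed or reordered freely.
Step = Callable[[List[Block]], List[Block]]


def drop_page_furniture(blocks: List[Block]) -> List[Block]:
    """Hypothetical rule: discard running headers and footers."""
    return [b for b in blocks if b.label not in ("header", "footer")]


def join_hyphenation(blocks: List[Block]) -> List[Block]:
    """Hypothetical rule: rejoin words split across line breaks.

    Naive on purpose: it also merges genuine compounds that happen to
    break at a hyphen, which a real rule would need to handle.
    """
    return [
        Block(b.label, b.text.replace("-\n", "").replace("\n", " "), b.page)
        for b in blocks
    ]


def run_pipeline(blocks: List[Block], steps: List[Step]) -> str:
    """Apply the configured steps in order and serialize the result to JSON."""
    for step in steps:
        blocks = step(blocks)
    return json.dumps([asdict(b) for b in blocks], ensure_ascii=False)


if __name__ == "__main__":
    sample = [
        Block("header", "Preprint", 1),
        Block("title", "appjsonify", 1),
        Block("paragraph", "PDF-to-JSON conver-\nsion toolkit", 1),
    ]
    print(run_pipeline(sample, [drop_page_furniture, join_hyphenation]))
```

Because the pipeline is just an ordered list of functions, adapting it to a different paper format in this sketch amounts to passing a different list of steps, which mirrors the kind of per-format configurability the abstract describes.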

Submitted: Oct 2, 2023