Paper ID: 2305.06148

A semi-automatic method for document classification in the shipping industry

Narayanan Arvind

In the shipping industry, document classification plays a crucial role in ensuring that the necessary documents are properly identified and processed for customs clearance. OCR technology is being used to automate the process of document classification, which involves identifying important documents such as Commercial Invoices, Packing Lists, Export/Import Customs Declarations, Bills of Lading, Sea Waybills, Certificates, Air or Rail Waybills, Arrival Notices, Certificate of Origin, Importer Security Filings, and Letters of Credit. By using OCR technology, the shipping industry can improve accuracy and efficiency in document classification and streamline the customs clearance process. The aim of this study is to build a robust document classification system based on keyword frequencies. The research is carried out by analyzing Contract-Breach law documents available with IN-D. The documents were collected by scraping the Singapore Government Judiciary website. The database developed has 250 Contract-Breach documents. These documents are splitted to generate 200 training documents and 50 test documents. A semi-automatic approach is used to select keyword vectors for document classification. The accuracy of the reported model is 92.00 %.

Submitted: Mar 29, 2023