Paper ID: 2311.02460
Extracting Network Structures from Corporate Organization Charts Using Heuristic Image Processing
Hiroki Sayama, Junichi Yamanoi
The organizational structure of corporations can have implications for the dynamics and performance of corporate operations. However, this subject has remained unexplored because of the lack of readily available organization network datasets. To fill this gap, we developed a new heuristic image-processing method to extract and reconstruct organization network data from published organization charts. Our method analyzes a PDF file of a corporate organization chart and detects text labels, boxes, connecting lines, and other objects through multiple steps of heuristically implemented image processing. The detected components are then assembled into a Python NetworkX Graph object for visualization, validation, and further network analysis. We applied the method to the organization charts of all listed firms in Japan shown in the "Organization Chart/System Diagram Handbook" published by Diamond, Inc., from 2008 to 2011. Out of the 10,008 organization chart PDF files, our method was able to reconstruct 4,606 organization networks (data acquisition success rate: 46%). For each reconstructed organization network, we measured several network diagnostics, which will be used in further statistical analysis to investigate their potential correlations with corporate behavior and performance.
Submitted: Nov 4, 2023
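
As a rough illustration of the final assembly step described in the abstract (detected boxes and connecting lines combined into a NetworkX Graph object), the following minimal sketch may help. It is not the authors' implementation. The input format (`boxes`, `segments`), the helper names `touches` and `build_org_graph`, the tolerance `tol`, and the rule that the topmost box on a connector is the parent are all assumptions made for this example, and the diagnostics printed at the end are illustrative choices, since the abstract does not name the ones actually measured.

```python
import networkx as nx

# Hypothetical detection output: label -> bounding box (x0, y0, x1, y1),
# in image coordinates where y grows downward.
boxes = {
    "President": (40, 0, 60, 10),
    "Sales": (0, 30, 20, 40),
    "R&D": (40, 30, 60, 40),
    "Finance": (80, 30, 100, 40),
}

# Hypothetical connector segments, assumed to be already split at junctions.
segments = [
    ((50, 10), (50, 20)),  # trunk below "President"
    ((10, 20), (50, 20)),  # left half of the horizontal bus
    ((50, 20), (90, 20)),  # right half of the horizontal bus
    ((10, 20), (10, 30)),  # drop to "Sales"
    ((50, 20), (50, 30)),  # drop to "R&D"
    ((90, 20), (90, 30)),  # drop to "Finance"
]


def touches(point, box, tol=1.0):
    """Return True if the point lies inside the box, allowing a small tolerance."""
    x, y = point
    x0, y0, x1, y1 = box
    return x0 - tol <= x <= x1 + tol and y0 - tol <= y <= y1 + tol


def build_org_graph(boxes, segments, tol=1.0):
    """Link boxes that are attached to the same connected set of segments."""
    # Geometry graph: nodes are segment endpoints, edges are segments.
    wires = nx.Graph()
    wires.add_edges_from(segments)

    org = nx.Graph()
    org.add_nodes_from(boxes)
    for component in nx.connected_components(wires):
        attached = [
            label
            for label, box in boxes.items()
            if any(touches(p, box, tol) for p in component)
        ]
        if len(attached) < 2:
            continue  # a connector that joins fewer than two boxes adds no edge
        # Assumption: the topmost box (smallest y0) is the parent unit.
        parent = min(attached, key=lambda label: boxes[label][1])
        org.add_edges_from((parent, child) for child in attached if child != parent)
    return org


org = build_org_graph(boxes, segments)

# Illustrative diagnostics only.
diagnostics = {
    "nodes": org.number_of_nodes(),
    "edges": org.number_of_edges(),
    "density": nx.density(org),
    "is_tree": nx.is_tree(org),
}
print(diagnostics)
```

Treating the connector geometry as its own graph and taking its connected components keeps the box-linking logic independent of how many segments make up a single connector. A real chart would likely need extra handling for segments that cross without joining and for boxes that touch more than one connector.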