Paper ID: 2407.12851

ISPO: An Integrated Ontology of Symptom Phenotypes for Semantic Integration of Traditional Chinese Medical Data

Zixin Shu, Rui Hua, Dengying Yan, Chenxia Lu, Ning Xu, Jun Li, Hui Zhu, Jia Zhang, Dan Zhao, Chenyang Hui, Junqiu Ye, Chu Liao, Qi Hao, Wen Ye, Cheng Luo, Xinyan Wang, Chuang Cheng, Xiaodong Li, Baoyan Liu, Xiaji Zhou, Runshun Zhang, Min Xu, Xuezhong Zhou

Symptom phenotypes are one of the key types of manifestations for diagnosis and treatment of various disease conditions. However, the diversity of symptom terminologies is one of the major obstacles hindering the analysis and knowledge sharing of various types of symptom-related medical data particularly in the fields of Traditional Chinese Medicine (TCM). Objective: This study aimed to construct an Integrated Ontology of symptom phenotypes (ISPO) to support the data mining of Chinese EMRs and real-world study in TCM field. Methods: To construct an integrated ontology of symptom phenotypes (ISPO), we manually annotated classical TCM textbooks and large-scale Chinese electronic medical records (EMRs) to collect symptom terms with support from a medical text annotation system. Furthermore, to facilitate the semantic interoperability between different terminologies, we incorporated public available biomedical vocabularies by manual mapping between Chinese terms and English terms with cross-references to source vocabularies. In addition, we evaluated the ISPO using independent clinical EMRs to provide a high-usable medical ontology for clinical data analysis. Results: By integrating 78,696 inpatient cases of EMRs, 5 biomedical vocabularies, 21 TCM books and dictionaries, ISPO provides 3,147 concepts, 23,475 terms, and 55,552 definition or contextual texts. Adhering to the taxonomical structure of the related anatomical systems of symptom phenotypes, ISPO provides 12 top-level categories and 79 middle-level sub-categories. The validation of data analysis showed the ISPO has a coverage rate of 95.35%, 98.53% and 92.66% for symptom terms with occurrence rates of 0.5% in additional three independent curated clinical datasets, which can demonstrate the significant value of ISPO in mapping clinical terms to ontologies.

Submitted: Jul 8, 2024