Paper ID: 2408.13038

Improving the Classification Effect of Clinical Images of Diseases for Multi-Source Privacy Protection

Tian Bowen, Xu Zhengyang, Yin Zhihao, Wang Jingying, Yue Yutao

Privacy protection requirements in the medical field make data sharing difficult, which limits the ability to integrate data across hospitals for training high-precision auxiliary diagnostic models. Traditional centralized training is hard to apply because it violates privacy protection principles. Federated learning, a distributed machine learning framework, helps address this issue, but it requires multiple hospitals to participate in training simultaneously, which is hard to achieve in practice. To address these challenges, we propose a training framework for private medical data based on data vectors. In this framework, each hospital fine-tunes a pre-trained model on its private data and calculates a data vector, which represents the optimization direction of the model parameters in the solution space. The data vectors are then summed to generate synthetic weights that integrate model information from multiple hospitals. This approach improves model performance without exchanging private data or requiring synchronous training. Experimental results demonstrate that the method effectively uses dispersed private data resources while protecting patient privacy. The auxiliary diagnostic model trained with this approach significantly outperforms models trained independently by a single hospital, offering a new perspective on resolving the conflict between medical data privacy protection and model training and advancing the development of medical intelligence.
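
The abstract does not give the exact formula for a data vector, so the following is only a minimal sketch under stated assumptions: a data vector is taken to be the element-wise difference between a hospital's fine-tuned weights and the shared pre-trained weights, and the synthetic weights are taken to be the pre-trained weights plus the (optionally scaled) sum of all data vectors. The names `data_vector`, `merge`, and `scale`, and the toy weight values, are hypothetical and not from the paper.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def data_vector(pretrained: Weights, finetuned: Weights) -> Weights:
    """Assumed definition: fine-tuned weights minus pre-trained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def merge(pretrained: Weights, vectors: List[Weights], scale: float = 1.0) -> Weights:
    """Add the scaled sum of all data vectors onto the pre-trained weights.

    `scale` is a hypothetical knob; the abstract only says the vectors are summed.
    """
    merged: Weights = {}
    for name, base in pretrained.items():
        total = [sum(v[name][i] for v in vectors) for i in range(len(base))]
        merged[name] = [b + scale * t for b, t in zip(base, total)]
    return merged


# Each hospital fine-tunes locally and shares only its data vector,
# never its private images; no synchronous training round is needed.
pretrained = {"layer.weight": [0.0, 1.0, 2.0]}
hospital_a = {"layer.weight": [0.5, 1.0, 2.0]}
hospital_b = {"layer.weight": [0.0, 1.0, 1.5]}

vectors = [data_vector(pretrained, w) for w in (hospital_a, hospital_b)]
synthetic = merge(pretrained, vectors)
```

In this sketch, each hospital can compute its vector at any time, and a new hospital's vector can be added later without retraining the others, which is consistent with the asynchronous property described above.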

Submitted: Aug 23, 2024