Partitioned Data
Partitioned data refers to datasets that are divided into subsets for training or analysis. Research in this area addresses challenges in large-scale machine learning and data privacy. Current work focuses on efficient partitioning strategies, including novel clustering algorithms and optimized indexing structures, that improve training speed, accuracy, and generalizability across diverse datasets and model architectures such as LLMs and GNNs. These advances help handle the growing volume and complexity of data in fields ranging from IoT applications and graph embeddings to federated learning and biomedical image analysis, with the aim of making machine learning systems more reliable and scalable.
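As a rough illustration of what dividing a dataset into subsets can mean, the sketch below contrasts the two layouts named in the paper titles on this page: horizontal partitioning (each party holds different records with the same features) and vertical partitioning (each party holds the same records with different features). The toy records, the column names, and the helper functions `horizontal_partition` and `vertical_partition` are all hypothetical and chosen only for illustration; they are not taken from any of the listed papers.

```python
# Toy dataset: each dict is one record, each key one feature (hypothetical data).
records = [
    {"id": 1, "age": 34, "bp": 120, "label": 0},
    {"id": 2, "age": 51, "bp": 140, "label": 1},
    {"id": 3, "age": 29, "bp": 115, "label": 0},
    {"id": 4, "age": 62, "bp": 150, "label": 1},
]


def horizontal_partition(rows, n_parts):
    """Split by rows: each part keeps every column but only some records."""
    # Round-robin assignment: record i goes to part i % n_parts.
    return [rows[i::n_parts] for i in range(n_parts)]


def vertical_partition(rows, column_groups, key="id"):
    """Split by columns: each part keeps every record but only some features.

    The shared `key` column is copied into every part so that records can
    be aligned across parts later.
    """
    return [
        [{key: row[key], **{col: row[col] for col in cols}} for row in rows]
        for cols in column_groups
    ]


h_parts = horizontal_partition(records, 2)
v_parts = vertical_partition(records, [["age"], ["bp", "label"]])

for i, part in enumerate(h_parts):
    print("horizontal part", i, [row["id"] for row in part])
for i, part in enumerate(v_parts):
    print("vertical part", i, sorted(part[0].keys()))
```

In the horizontal case every part can train the same model on its own records, while in the vertical case the parts must be joined on the shared key (here `id`) before any single record is complete. This difference is one reason the two settings are usually treated separately.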
Papers
Communication-Efficient Hybrid Federated Learning for E-health with Horizontal and Vertical Data Partitioning
Chong Yu, Shuaiqi Shen, Shiqiang Wang, Kuan Zhang, Hai Zhao
VFLGAN: Vertical Federated Learning-based Generative Adversarial Network for Vertically Partitioned Data Publication
Xun Yuan, Yang Yang, Prosanta Gope, Aryan Pasikhani, Biplab Sikdar