Data Distribution
Data distribution research studies how data is spread across datasets and within individual datasets, and how that spread can be manipulated. Current work emphasizes data that is not independent and identically distributed (non-IID), particularly in federated learning, where methods such as personalized weight aggregation and new clustering approaches are being developed to address data heterogeneity across distributed clients. This work matters for the robustness, efficiency, and fairness of machine learning models trained on diverse and often limited datasets, with applications ranging from healthcare to autonomous systems. A further line of research explores how public data can be used to improve the privacy and accuracy of private learning algorithms.
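To make the notion of non-IID client data concrete, the following is a minimal sketch of one common way to simulate label skew in federated learning experiments: for each class, client shares are drawn from a Dirichlet distribution. This technique is not named in the summary above; it is an illustration only, and the function name `dirichlet_partition`, the parameter `alpha`, and the toy labels are all assumptions. Smaller `alpha` values tend to produce more heterogeneous clients, while larger values approach an even split.

```python
import random
from collections import Counter


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet label skew.

    Hypothetical helper for illustration, not taken from any library.
    Returns a list of index lists, one per client.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)

        # A Dirichlet(alpha, ..., alpha) sample is a set of independent
        # Gamma(alpha, 1) draws normalized to sum to one.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws) or 1.0  # guard against numerical underflow

        # Turn the proportions into cut points over this class's indices.
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += draws[c] / total
            if c == num_clients - 1:
                end = len(idxs)  # last client takes the remainder
            else:
                end = min(len(idxs), round(cumulative * len(idxs)))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


# Toy data (assumed): 1000 samples spread evenly over 10 classes.
toy_labels = [i % 10 for i in range(1000)]
parts = dirichlet_partition(toy_labels, num_clients=5, alpha=0.5)
for client_id, idxs in enumerate(parts):
    histogram = Counter(toy_labels[i] for i in idxs)
    print(client_id, len(idxs), sorted(histogram.items()))
```

Printing the per-client label histograms shows how unevenly classes are spread across clients, which is the kind of heterogeneity that personalized aggregation and clustering methods aim to handle.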