Data Collaboration

Data collaboration focuses on enabling joint analysis of data held by multiple institutions without compromising privacy or requiring the raw data to be shared directly. Current research emphasizes methods such as federated learning and data collaboration analysis. Data collaboration analysis, in particular, typically combines dimensionality reduction with techniques based on the generalized eigenvalue problem or on matrix manifolds to construct intermediate representations that institutions share in place of their raw data, enabling collaborative model training and inference. These methods aim to improve the accuracy and generalizability of machine learning models while addressing ethical and legal concerns about data privacy, with applications ranging from recommender systems to causal inference and drug discovery.
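The sketch below illustrates the intermediate-representation idea behind data collaboration analysis. It rests on several assumptions. Each party uses a secret random linear projection as its dimensionality reduction. All parties also project a public set of "anchor" samples. The analyst aligns the parties with an SVD of the stacked anchor representations followed by a least-squares fit. The function names (`private_map`, `collaboration_representations`), the anchor-data setup, and the noiseless low-rank toy data are hypothetical choices for illustration, not a specific published method. The SVD-based alignment is a simple stand-in for the generalized-eigenvalue and matrix-manifold techniques mentioned above. The sketch requires NumPy.

```python
import numpy as np


def private_map(X, anchor, F):
    """Party-side step (hypothetical): apply the party's secret linear map F
    to its own data and to the shared anchor data. Only these outputs are
    sent to the analyst; X and F stay with the party."""
    return X @ F, anchor @ F


def collaboration_representations(intermediates, anchor_reps, dim):
    """Analyst-side step (hypothetical): map each party's intermediate
    representation into a common space learned from the anchor data."""
    # Stack every party's anchor representation side by side.
    stacked = np.hstack(anchor_reps)
    # The leading left singular vectors serve as a common target Z.
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    Z = U[:, :dim]
    aligned = []
    for X_tilde, A_tilde in zip(intermediates, anchor_reps):
        # Least-squares G minimizing ||A_tilde @ G - Z||_F.
        G = np.linalg.pinv(A_tilde) @ Z
        aligned.append(X_tilde @ G)
    return aligned


rng = np.random.default_rng(0)
n_features, rank = 10, 3
# Toy assumption: all samples lie in one shared rank-3 subspace of R^10.
B = rng.standard_normal((rank, n_features))
anchor = rng.standard_normal((50, rank)) @ B   # public anchor data
X1 = rng.standard_normal((40, rank)) @ B       # party 1 private data
X2 = rng.standard_normal((30, rank)) @ B       # party 2 private data
query = rng.standard_normal((1, rank)) @ B     # one sample both parties map

# Each party draws its own secret projection; the analyst never sees F1 or F2.
F1 = rng.standard_normal((n_features, rank))
F2 = rng.standard_normal((n_features, rank))
X1_t, A1_t = private_map(np.vstack([X1, query]), anchor, F1)
X2_t, A2_t = private_map(np.vstack([X2, query]), anchor, F2)

X1_hat, X2_hat = collaboration_representations([X1_t, X2_t], [A1_t, A2_t], rank)
# The shared query sample (last row) lands at the same point for both parties.
print(np.allclose(X1_hat[-1], X2_hat[-1]))
```

In this noiseless toy setup, each party's projection is invertible on the shared subspace. As a result, a sample projected by both parties receives the same collaboration representation up to floating-point error. This is what allows a model trained on the pooled aligned data to be applied to any party's data. With real data, the alignment is only approximate.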

Papers