Data Discovery

Data discovery is the task of finding and using relevant data in massive, heterogeneous repositories such as data lakes. Current research focuses on developing and benchmarking neural models, particularly those built on tabular data representations and sketch-based approaches, to find related tables: tables that can be joined, tables that can be unioned, or tables that are subsets of one another. This work supports data analysis and machine learning in large-scale data environments, where navigating increasingly complex data landscapes and extracting value from them remains a significant challenge.
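To make the idea of a sketch-based approach concrete, here is a minimal sketch of one well-known technique, MinHash, used to estimate how much two columns overlap without comparing every value. It is an illustration only and not the method of any particular paper listed here. The function names (`minhash_sketch`, `estimate_jaccard`, `exact_jaccard`), the column contents, and the choice of 256 hash functions are all assumptions made for the example.

```python
import hashlib


def _hash(value: str, seed: int) -> int:
    """Stable 64-bit hash of a value; the seed selects one hash function."""
    digest = hashlib.blake2b(
        value.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_sketch(values, num_hashes=256):
    """Summarize a column as the minimum hash of its distinct values,
    taken once per hash function (hypothetical helper)."""
    distinct = {str(v) for v in values}
    if not distinct:
        raise ValueError("cannot sketch an empty column")
    return [min(_hash(v, seed) for v in distinct) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must use the same number of hash functions")
    matches = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_jaccard(values_a, values_b):
    """Exact Jaccard similarity of two columns, for comparison only."""
    set_a = {str(v) for v in values_a}
    set_b = {str(v) for v in values_b}
    return len(set_a & set_b) / len(set_a | set_b)


# Two made-up key columns that share half of their values.
orders_customer_id = range(0, 1000)
customers_id = range(500, 1500)

sketch_orders = minhash_sketch(orders_customer_id)
sketch_customers = minhash_sketch(customers_id)

estimated = estimate_jaccard(sketch_orders, sketch_customers)
exact = exact_jaccard(orders_customer_id, customers_id)  # 500 / 1500
print(f"estimated Jaccard: {estimated:.3f}, exact Jaccard: {exact:.3f}")
```

In a data lake setting, the sketches would be computed once per column and stored, so that candidate joinable tables can be ranked by comparing short fixed-size sketches instead of full columns. MinHash estimates Jaccard similarity; systems aimed at joinability often care about containment instead, which needs a different estimator.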

Papers