Data Lake

Data lakes are large repositories that store diverse, heterogeneous data for advanced analytics. They aim to overcome the limitations of traditional data management systems. Current research focuses on improving data discovery and organization within these lakes. Techniques include Formal Concept Analysis for building unified schemas, neural models (e.g., Swin-Unet, TabSketchFM) for efficient data search and table augmentation, and large language models for extracting knowledge and improving metadata management. These advances matter because, as datasets grow, locating and combining the relevant data becomes a bottleneck. Better discovery and organization enable more efficient analysis in fields such as transportation, environmental monitoring, and machine learning model management.
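To make the Formal Concept Analysis idea concrete, the sketch below treats each table in a lake as an FCA object and each column name as an attribute. Every formal concept then pairs a group of tables (the extent) with exactly the columns they all share (the intent). This is a minimal illustration under those assumptions, not the method of any particular system or paper. The table names and columns are hypothetical, and the enumeration simply collects all intersections of the tables' column sets, which can grow exponentially with the number of tables.

```python
from itertools import chain


def formal_concepts(context: dict[str, set[str]]) -> list[tuple[frozenset[str], frozenset[str]]]:
    """Return all formal concepts (extent, intent) of a binary context.

    Assumption for this sketch: `context` maps each object (a table name)
    to its attribute set (the table's column names).
    """
    all_attrs = frozenset(chain.from_iterable(context.values()))
    # The intents of a formal context are exactly the intersections of the
    # objects' attribute sets, plus the full attribute set (the intent of
    # the empty extent). Build them by intersecting each row with every
    # intent collected so far.
    intents = {all_attrs}
    for attrs in context.values():
        row = frozenset(attrs)
        intents |= {row & existing for existing in intents}

    concepts = []
    for intent in intents:
        # The extent is every object whose attributes include the intent.
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Order from most general (most tables) to most specific.
    concepts.sort(key=lambda c: (-len(c[0]), sorted(c[1])))
    return concepts


# Hypothetical example lake: table names and columns are illustrative only.
example_lake = {
    "trips": {"trip_id", "station_id", "timestamp"},
    "stations": {"station_id", "lat", "lon"},
    "sensors": {"sensor_id", "lat", "lon", "timestamp"},
}

if __name__ == "__main__":
    for extent, intent in formal_concepts(example_lake):
        # Concepts shared by several tables hint at common schema attributes.
        if len(extent) >= 2 and intent:
            print(sorted(extent), "share", sorted(intent))
```

On this example, the concepts covering two or more tables with a non-empty intent surface the column groups shared across tables (`lat`/`lon`, `station_id`, `timestamp`), which are natural candidates for attributes of a unified schema. Matching purely on column names is a simplification: columns that mean the same thing under different names would not be grouped.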
