Tabular Data
Tabular data is ubiquitous across many fields and presents distinct challenges for machine learning because of its structured nature and mixed data types, such as numerical and categorical columns in the same table. Current research focuses on improving model performance through self-supervised learning (e.g., joint-embedding predictive architectures, JEPA), generative models (e.g., GANs, VAEs, and diffusion models) for data augmentation and synthesis, and the integration of large language models (LLMs) for feature extraction and data generation. These advances aim to address the limitations of established methods such as gradient boosted decision trees, and to improve accuracy, efficiency, and robustness in applications ranging from medical diagnosis to anomaly detection and scientific simulation.
Papers
A Method for Discovering Novel Classes in Tabular Data
Colin Troisemaine, Joachim Flocon-Cholet, Stéphane Gosselin, Sandrine Vaton, Alexandre Reiffers-Masson, Vincent Lemaire
AnaMeta: A Table Understanding Dataset of Field Metadata Knowledge Shared by Multi-dimensional Data Analysis Tasks
Xinyi He, Mengyu Zhou, Mingjie Zhou, Jialiang Xu, Xiao Lv, Tianle Li, Yijia Shao, Shi Han, Zejian Yuan, Dongmei Zhang