Raw Data
Raw data, the foundation of machine learning, is the subject of intense research focused on improving its quality, accessibility, and utility. Current efforts center on addressing data heterogeneity and sparsity through techniques such as data augmentation, federated learning, and novel clustering algorithms, often employing variational autoencoders, large language models, and various neural network architectures for data processing and model training. These advances are crucial for improving the accuracy, fairness, and efficiency of machine learning models across diverse applications, from speech recognition and medical diagnostics to environmental monitoring and social science research. The ultimate goal is to extract meaningful insights and build robust, reliable models from often imperfect or incomplete datasets.
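As a rough illustration of one of the directions mentioned above, the sketch below shows a minimal data-augmentation pass that jitters a small tabular dataset so a downstream model sees more varied examples. The dataset, noise scale, and function name are illustrative assumptions, not taken from any of the papers listed here.

```python
# Minimal sketch of data augmentation for a scarce tabular dataset.
# All names and parameters are illustrative assumptions.
import numpy as np

def augment_with_jitter(X, y, copies=3, noise_scale=0.05, seed=0):
    """Return the original samples plus `copies` noisy replicas of each row.

    X : (n_samples, n_features) array of raw features
    y : (n_samples,) array of labels, repeated for each replica
    """
    rng = np.random.default_rng(seed)
    # Scale the noise per feature so jitter respects each feature's spread.
    feature_std = X.std(axis=0, keepdims=True) + 1e-12
    replicas, labels = [X], [y]
    for _ in range(copies):
        noise = rng.normal(0.0, noise_scale, size=X.shape) * feature_std
        replicas.append(X + noise)
        labels.append(y)
    return np.concatenate(replicas, axis=0), np.concatenate(labels, axis=0)

if __name__ == "__main__":
    # Tiny synthetic dataset standing in for scarce raw data.
    X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]])
    y = np.array([0, 1, 1])
    X_aug, y_aug = augment_with_jitter(X, y)
    print(X_aug.shape, y_aug.shape)  # (12, 2) (12,)
```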
Papers
Random Heterogeneous Neurochaos Learning Architecture for Data Classification
Remya Ajai A S, Nithin Nagaraj
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists
Michał Pietruszka, Łukasz Borchmann, Aleksander Jędrosz, Paweł Morawiecki
Federated Learning under Periodic Client Participation and Heterogeneous Data: A New Communication-Efficient Algorithm and Analysis
Michael Crawshaw, Mingrui Liu
Augmenting Polish Automatic Speech Recognition System With Synthetic Data
Łukasz Bondaruk, Jakub Kubiak, Mateusz Czyżnikiewicz
Data subsampling for Poisson regression with pth-root-link
Han Cheng Lie, Alexander Munteanu
ReMix: Training Generalized Person Re-identification on a Mixture of Data
Timur Mamedov, Anton Konushin, Vadim Konushin
Learning Infinitesimal Generators of Continuous Symmetries from Data
Gyeonghoon Ko, Hyunsu Kim, Juho Lee
On the Statistical Complexity of Estimating VENDI Scores from Empirical Data
Azim Ospanov, Farzan Farnia
Not All LLM-Generated Data Are Equal: Rethinking Data Weighting in Text Classification
Hsun-Yu Kuo, Yin-Hsiang Liao, Yu-Chieh Chao, Wei-Yun Ma, Pu-Jen Cheng
Federated Time Series Generation on Feature and Temporally Misaligned Data
Chenrui Fan, Zhi Wen Soi, Aditya Shankar, Abele Mălan, Lydia Y. Chen
Reference-Free Formula Drift with Reinforcement Learning: From Driving Data to Tire Energy-Inspired, Real-World Policies
Franck Djeumou, Michael Thompson, Makoto Suminaka, John Subosits
Learning Variational Inequalities from Data: Fast Generalization Rates under Strong Monotonicity
Eric Zhao, Tatjana Chavdarova, Michael Jordan
Federated Transformer: Multi-Party Vertical Federated Learning on Practical Fuzzily Linked Data
Zhaomin Wu, Junyi Hou, Yiqun Diao, Bingsheng He
Deep learning for model correction of dynamical systems with data scarcity
Caroline Tatsuoka, Dongbin Xiu
Escaping the Forest: Sparse Interpretable Neural Networks for Tabular Data
Salvatore Raieli, Abdulrahman Altahhan, Nathalie Jeanray, Stéphane Gerart, Sebastien Vachenc