Raw Data
Raw data is the foundation of machine learning, and a large body of research aims to improve its quality, accessibility, and utility. Current efforts address data heterogeneity and sparsity through techniques such as data augmentation, federated learning, and novel clustering algorithms, often using variational autoencoders, large language models, and other neural network architectures for data processing and model training. These advances matter for the accuracy, fairness, and efficiency of machine learning models across applications ranging from speech recognition and medical diagnostics to environmental monitoring and social science research. The overall goal is to extract meaningful insights and build robust, reliable models from datasets that are often imperfect or incomplete.
Papers
Inductive Learning of Robot Task Knowledge from Raw Data and Online Expert Feedback
Daniele Meli, Paolo Fiorini
Data and System Perspectives of Sustainable Artificial Intelligence
Tao Xie, David Harel, Dezhi Ran, Zhenwen Li, Maoliang Li, Zhi Yang, Leye Wang, Xiang Chen, Ying Zhang, Wentao Zhang, Meng Li, Chen Zhang, Linyi Li, Assaf Marron
Synthesis and Analysis of Data as Probability Measures with Entropy-Regularized Optimal Transport
Brendan Mallery, James M. Murphy, Shuchin Aeron
On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena
Tarek Naous, Wei Xu
Towards resilient cities: A hybrid simulation framework for risk mitigation through data driven decision making
David Carraminana, Ana M. Bernardos, Juan A. Besada, Jose R. Casar
Decoding EEG Speech Perception with Transformers and VAE-based Data Augmentation
Terrance Yu-Hao Chen, Yulin Chen, Pontus Soederhaell, Sadrishya Agrawal, Kateryna Shapovalenko
Method of data forward generation with partial differential equations for machine learning modeling in fluid mechanics
Ruilin Chen, Xiaowei Jin, Nikolaus A. Adams, Hui Li
A Bayesian Approach for Discovering Time-Delayed Differential Equation from Data
Debangshu Chowdhury, Souvik Chakraborty
Tree-based RAG-Agent Recommendation System: A Case Study in Medical Test Data
Yahe Yang, Chengyue Huang