Raw Data
Raw data is the foundation of machine learning, and much current research aims to improve its quality, accessibility, and utility. Efforts center on data heterogeneity and sparsity, using techniques such as data augmentation, federated learning, and novel clustering algorithms; these often rely on variational autoencoders, large language models, and other neural network architectures for data processing and model training. Such advances matter for the accuracy, fairness, and efficiency of machine learning models in applications ranging from speech recognition and medical diagnostics to environmental monitoring and social science research. The goal is to extract meaningful insights and build robust, reliable models from datasets that are often imperfect or incomplete.
Papers
Robust Reinforcement Learning under Diffusion Models for Data with Jumps
Chenyang Jiang, Donggyu Kim, Alejandra Quintos, Yazhen Wang
Superpixel-informed Implicit Neural Representation for Multi-Dimensional Data
Jiayi Li, Xile Zhao, Jianli Wang, Chao Wang, Min Wang
Data Driven Automatic Electrical Machine Preliminary Design with Artificial Intelligence Expert Guidance
Yiwei Wang, Tao Yang, Hailin Huang, Tianjie Zou, Jincai Li, Nuo Chen, Zhuoran Zhang
Drone Detection using Deep Neural Networks Trained on Pure Synthetic Data
Mariusz Wisniewski, Zeeshan A. Rana, Ivan Petrunin, Alan Holt, Stephen Harman
Optimal Transport-Based Displacement Interpolation with Data Augmentation for Reduced Order Modeling of Nonlinear Dynamical Systems
Moaad Khamlich, Federico Pichi, Michele Girfoglio, Annalisa Quaini, Gianluigi Rozza
An Information Theoretic Approach to Operationalize Right to Data Protection
Abhinav Java, Simra Shahid, Chirag Agarwal
Data-Driven Predictive Control of Nonholonomic Robots Based on a Bilinear Koopman Realization: Data Does Not Replace Geometry
Mario Rosenfelder, Lea Bold, Hannes Eschmann, Peter Eberhard, Karl Worthmann, Henrik Ebel
Data-driven discovery of mechanical models directly from MRI spectral data
D.G.J. Heesterbeek, M.H.C. van Riel, T. van Leeuwen, C.A.T. van den Berg, A. Sbrizzi
Mixed Effects Deep Learning Autoencoder for interpretable analysis of single cell RNA Sequencing data
Aixa X. Andrade, Son Nguyen, Albert Montillo
A Similarity-Based Oversampling Method for Multi-label Imbalanced Text Data
Ismail Hakki Karaman, Gulser Koksal, Levent Eriskin, Salih Salihoglu
Cross-modal semantic segmentation for indoor environmental perception using single-chip millimeter-wave radar raw data
Hairuo Hu, Haiyong Cong, Zhuyu Shao, Yubo Bi, Jinghao Liu