Low Sample Size

High-dimensional, low sample size (HDLSS) data, in which the number of features far exceeds the number of observations, presents a significant challenge for machine learning. Current research focuses on classification methods tailored to HDLSS datasets. These include generative adversarial networks (GANs) for augmenting small tabular datasets, kernel methods built on random forest similarities, feature selection with genetic algorithms, and classifiers based on data-adaptive energy distances. These advances matter in fields such as medicine and genomics, where scarce data often limits how effective a predictive model can be.
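The random forest similarity idea can be sketched as follows. A fitted forest partitions the feature space into leaves. Two samples are treated as similar in proportion to the fraction of trees in which they land in the same leaf, and the resulting matrix can be passed to an SVM as a precomputed kernel. This is a minimal sketch of the generic random forest proximity kernel using scikit-learn on synthetic data. The helper `rf_proximity_kernel` and the data setup are hypothetical, and the sketch is not necessarily the exact kernel proposed by Cavalheiro et al.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC


def rf_proximity_kernel(leaves_a, leaves_b):
    """Hypothetical helper: fraction of trees where each pair shares a leaf.

    leaves_a: (n_a, n_trees) leaf indices, e.g. from RandomForestClassifier.apply
    leaves_b: (n_b, n_trees) leaf indices from the same fitted forest
    Returns an (n_a, n_b) similarity matrix with entries in [0, 1].
    """
    # Broadcast to (n_a, n_b, n_trees), then average the leaf-match indicator over trees.
    same_leaf = leaves_a[:, None, :] == leaves_b[None, :, :]
    return same_leaf.mean(axis=2)


# Synthetic HDLSS setting (assumed data): 60 samples, 2000 features,
# only the first 5 features carry the class signal.
rng = np.random.default_rng(0)
n_train, n_test, p = 40, 20, 2000
X = rng.normal(size=(n_train + n_test, p))
y = (X[:, :5].sum(axis=1) > 0).astype(int)
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# Fit a forest only to obtain its partition of the feature space.
forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_train, y_train)
leaves_train = forest.apply(X_train)  # shape (n_train, n_estimators)
leaves_test = forest.apply(X_test)    # shape (n_test, n_estimators)

# Train-vs-train kernel for fitting; test-vs-train kernel for prediction.
K_train = rf_proximity_kernel(leaves_train, leaves_train)
K_test = rf_proximity_kernel(leaves_test, leaves_train)

# Plug the forest-derived similarity into a kernel SVM.
svm = SVC(kernel="precomputed").fit(K_train, y_train)
accuracy = (svm.predict(K_test) == y_test).mean()
print(f"test accuracy: {accuracy:.2f}")
```

Each tree's leaf-match indicator is itself a valid kernel, so their average is positive semidefinite, which is what makes it safe to hand to `SVC(kernel="precomputed")`. Test rows are compared against training rows, so `K_test` has shape (n_test, n_train).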

Papers

July 22, 2024

Distance-based mutual congestion feature selection with genetic algorithm for high-dimensional medical datasets
Hossein Nematzadeh, Joseph Mani, Zahra Nematzadeh, Ebrahim Akbari, Radziah Mohamad

July 17, 2024

A Novel GAN Approach to Augment Limited Tabular Data for Short-Term Substance Use Prediction
Nguyen Thach, Patrick Habecker, Bergen Johnston, Lillianna Cervantes, Anika Eisenbraun, Alex Mason, Kimberly Tyler, Bilal Khan, Hau Chan

October 23, 2023

Random Forest Kernel for High-Dimension Low Sample Size Classification
Lucca Portes Cavalheiro, Simon Bernard, Jean Paul Barddal, Laurent Heutte

June 24, 2023

Robust Classification of High-Dimensional Data using Data-Adaptive Energy Distance
Jyotishka Ray Choudhury, Aytijhya Saha, Sarbojit Roy, Subhajit Dutta