Dataset Level

Dataset-level analysis studies how inherent properties of a dataset, such as class imbalance, the number of samples per class, and label quality, influence the performance and privacy of machine learning models. Current research examines how these properties affect model accuracy, robustness, and vulnerability to membership inference attacks. It often relies on statistical metrics such as Kappa and builds predictive models that guide the choice of resampling strategy for imbalanced datasets. This work helps improve model reliability, mitigate biases, and ensure responsible data usage in applications ranging from computer vision to educational technology.
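Class imbalance explains why chance-corrected metrics such as Kappa are used. On a skewed dataset, plain accuracy can look high even for a useless model. The sketch below assumes that "Kappa" means Cohen's kappa, although the summary does not say which variant the research uses. It also assumes imbalance is measured as the ratio of the largest class count to the smallest. The helper names `imbalance_ratio` and `cohens_kappa` are hypothetical, and the labels are toy data made up for illustration.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Hypothetical helper: largest class count divided by smallest class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def cohens_kappa(y_true, y_pred):
    """Hypothetical helper: Cohen's kappa, i.e. agreement corrected for chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement p_o: fraction of positions where the two labelings match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected chance agreement p_e, computed from each labeling's class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)  # Counter returns 0 for classes never predicted
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Both labelings use one identical class, so kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy imbalanced labels (illustrative only): 90 negatives, 10 positives.
    y_true = [0] * 90 + [1] * 10
    # A trivial classifier that always predicts the majority class.
    y_majority = [0] * 100
    accuracy = sum(t == p for t, p in zip(y_true, y_majority)) / len(y_true)
    print(f"imbalance ratio: {imbalance_ratio(y_true):.1f}")
    print(f"accuracy: {accuracy:.2f}")
    print(f"kappa: {cohens_kappa(y_true, y_majority):.2f}")
```

Running the script prints `imbalance ratio: 9.0`, `accuracy: 0.90`, and `kappa: 0.00`. The majority-class predictor matches the labels 90% of the time, but its observed agreement equals the agreement expected by chance, so kappa shows it has learned nothing about the minority class.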

Papers