Data Normalization

Data normalization is a crucial preprocessing step in machine learning that standardizes feature scales to improve model performance and fairness. Current research emphasizes how normalization choices affect model explainability, particularly in sensitive domains such as medicine, and explores optimal normalization strategies for various data types, including text and biomedical data, often comparing distance functions and their effectiveness across contexts. These efforts matter because appropriate normalization is essential for reliable model training, accurate evaluation (e.g., avoiding biases in metrics such as nDCG), and fair, generalizable results across diverse datasets.
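
As a concrete illustration of what standardizing feature scales means in practice, the sketch below applies two common schemes, min-max scaling and z-score standardization, to a toy feature matrix. The function names, example data, and NumPy-only implementation are illustrative assumptions, not drawn from the papers summarized here.

```python
import numpy as np

def min_max_normalize(X: np.ndarray) -> np.ndarray:
    """Rescale each feature (column) to the [0, 1] range."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # Guard against constant features to avoid division by zero.
    span = np.where(x_max - x_min == 0, 1.0, x_max - x_min)
    return (X - x_min) / span

def z_score_normalize(X: np.ndarray) -> np.ndarray:
    """Standardize each feature (column) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma = np.where(sigma == 0, 1.0, sigma)
    return (X - mu) / sigma

if __name__ == "__main__":
    # Illustrative feature matrix: rows are samples, columns are features
    # on very different scales (e.g., age in years vs. income in dollars).
    X = np.array([[25, 40_000.0],
                  [32, 85_000.0],
                  [47, 120_000.0]])
    print(min_max_normalize(X))
    print(z_score_normalize(X))
```

In a real pipeline, the equivalent scikit-learn transformers (MinMaxScaler, StandardScaler) are typically preferred because their fit/transform interface lets the statistics estimated on the training split be reused unchanged on held-out data.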

Papers