Categorical Encoding

Categorical encoding transforms categorical variables into numerical representations that machine learning algorithms can use. Current research develops and compares encoding methods such as one-hot encoding, target encoding, and learned embeddings, often evaluating them with a particular model family, such as tree-based models or neural networks. Benchmarking studies show that the choice of encoding can substantially affect both predictive performance and fairness, especially when categorical variables have high cardinality (many distinct values) or when datasets are imbalanced. These findings help practitioners choose encodings that improve the accuracy and reliability of models in fields such as fraud detection and cybersecurity.
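To make the difference between two of the methods named above concrete, here is a minimal standard-library sketch of one-hot encoding and target encoding. The function names `one_hot_encode` and `target_encode` and the `smoothing` parameter are hypothetical, chosen for illustration. The smoothing scheme (shrinking each category's target mean toward the global mean) is one common variant, assumed here, and is not necessarily the method used in any particular benchmark.

```python
from collections import Counter, defaultdict


def one_hot_encode(values):
    """Map each category to a 0/1 indicator vector, one column per distinct category."""
    categories = sorted(set(values))  # fix a deterministic column order
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one column is "hot" per row
        rows.append(row)
    return categories, rows


def target_encode(values, targets, smoothing=1.0):
    """Replace each category with a smoothed mean of the target variable.

    Hypothetical variant: the per-category mean is shrunk toward the global
    mean with weight `smoothing`, which stabilises estimates for rare
    categories (a common concern with high-cardinality variables).
    """
    global_mean = sum(targets) / len(targets)
    counts = Counter(values)
    sums = defaultdict(float)
    for v, y in zip(values, targets):
        sums[v] += y
    mapping = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    # One numeric column regardless of how many categories exist.
    return [mapping[v] for v in values], mapping
```

One-hot encoding adds one column per category, so its width grows with cardinality, while target encoding always yields a single column. For example, with `colors = ["red", "green", "red", "blue", "green", "red"]` and binary labels `[1, 0, 1, 0, 0, 0]`, "red" maps to (2 + 1/3) / 4 = 7/12. Because target encoding uses the labels, it should be fit on training data only (or with cross-fitting) to avoid target leakage. scikit-learn's `OneHotEncoder` and `TargetEncoder` offer library implementations.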

Papers