Categorical Encoding
Categorical encoding transforms categorical data into numerical representations that machine learning algorithms can consume. Current research develops and compares encoding methods, including one-hot encoding, target encoding, and embedding techniques, often in the context of specific model families such as tree-based models or neural networks. Benchmarking studies show that the choice of encoding significantly affects model performance and fairness, particularly for high-cardinality categorical variables and imbalanced datasets. These findings matter for the accuracy and reliability of machine learning models in fields such as fraud detection and cybersecurity.
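To make two of the methods named above concrete, here is a minimal sketch of one-hot encoding and smoothed target encoding in plain Python. The function names, the `smoothing` parameter, and the toy fraud data are illustrative assumptions and are not taken from any of the papers listed on this page; libraries implement these ideas with more options and safeguards.

```python
from collections import defaultdict


def one_hot_encode(values):
    """Map each category to a binary indicator vector (one column per category)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return categories, rows


def target_encode(values, targets, smoothing=1.0):
    """Replace each category with a smoothed mean of the target.

    encoded = (category_sum + smoothing * global_mean) / (count + smoothing)
    The smoothing term (an assumed, simple additive form) pulls rare
    categories toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    mapping = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return [mapping[v] for v in values], mapping


# Hypothetical toy data: merchant category and a binary fraud label.
merchants = ["grocery", "travel", "grocery", "online", "travel", "grocery"]
is_fraud = [0, 1, 0, 1, 1, 0]

categories, one_hot = one_hot_encode(merchants)
encoded, mapping = target_encode(merchants, is_fraud, smoothing=1.0)
```

One-hot encoding adds one column per category, which becomes costly for high-cardinality variables; target encoding keeps a single column but uses the label, so in practice the mapping would be fit on training data only to avoid leaking the target into evaluation.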