Categorical Feature
Categorical features represent discrete values such as colors or product categories. They are ubiquitous in real-world datasets but pose a challenge for machine learning models, most of which require numerical input. Current research focuses on encoding techniques, comparing methods such as one-hot, target, and binary encoding, often in the context of neural networks (including Transformers) and gradient-boosted decision trees. This work aims to improve model accuracy and fairness while handling high cardinality and sparsity, which in turn affects performance and interpretability in applications ranging from spam detection to actuarial modeling and cybersecurity.
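To make the three encodings named above concrete, here is a minimal sketch in plain Python. It is illustrative only and is not taken from any particular paper or library. The function names, the `smoothing` parameter, and the toy `colors`/`clicked` data are all assumptions made for this example. The binary encoder assigns integer codes in sorted category order starting at 0, which is one possible convention among several.

```python
from collections import defaultdict


def one_hot_encode(values):
    """One 0/1 column per distinct category (columns in sorted order)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = [[1 if index[v] == i else 0 for i in range(len(categories))]
            for v in values]
    return rows, categories


def binary_encode(values):
    """Write each category's integer code in base 2, one column per bit.

    Needs about log2(n) columns instead of n, which helps with high
    cardinality. Codes start at 0 here (an assumption of this sketch).
    """
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    width = max(1, (len(categories) - 1).bit_length())
    return [[int(bit) for bit in format(index[v], f"0{width}b")]
            for v in values]


def target_encode(values, targets, smoothing=0.0):
    """Replace each category with its mean target value.

    `smoothing` (hypothetical parameter) pulls the estimate for rare
    categories toward the global mean:
        (category_sum + smoothing * global_mean) / (category_count + smoothing)
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    mapping = {v: (sums[v] + smoothing * global_mean) / (counts[v] + smoothing)
               for v in counts}
    return [mapping[v] for v in values]


# Toy data (made up for illustration).
colors = ["red", "green", "blue", "green", "red"]
clicked = [1, 0, 1, 0, 0]

one_hot_rows, one_hot_columns = one_hot_encode(colors)
binary_rows = binary_encode(colors)
target_values = target_encode(colors, clicked)
```

With three categories, one-hot encoding produces three columns and binary encoding two, while target encoding always produces a single numeric column. In practice the target-encoding statistics should be computed on training data only (or out-of-fold), since computing them on the rows being scored leaks the label into the feature.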