One Hot Encoding

One-hot encoding is a common technique for representing categorical data as numerical vectors, crucial for machine learning algorithms that require numerical input. Recent research focuses on addressing its limitations, such as high dimensionality and inability to capture semantic relationships between categories, exploring alternatives like hierarchical encodings, dense vector representations, and incorporating label uncertainty. These efforts aim to improve model performance, particularly in tasks involving complex categorical data or out-of-vocabulary items, and enhance robustness and generalization capabilities across various applications.

Papers