Proxy Label
Proxy labels are surrogate variables used in machine learning in place of true labels that are unavailable, expensive, or difficult to obtain, so that models can be trained and evaluated on readily accessible data. Because a proxy only approximates the quantity of interest, it can introduce bias. Current research therefore focuses on mitigating that bias, particularly when demographic information is limited or incomplete, and on improving the accuracy and fairness of models trained with proxies. Techniques in this area include multi-label classification, unsupervised attribute generation, and contrastive learning, often combined with uncertainty quantification to improve robustness. Better proxy label methods matter for fields such as environmental science, healthcare, and the social sciences, where they make it possible to analyze large, complex datasets that lack reliable ground-truth labels.
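To show how a proxy can bias a fairness measurement when true demographic labels are missing, here is a minimal toy sketch. It assumes demographic parity (the difference in positive-prediction rates between two groups) as the example fairness metric. The source does not name a specific metric, and the data, group names, and the functions `positive_rate` and `parity_gap` are all hypothetical. The sketch compares the gap computed with true group labels to the gap computed with proxy group labels that are wrong for 2 of 8 individuals.

```python
from typing import Sequence


def positive_rate(preds: Sequence[int], groups: Sequence[str], group: str) -> float:
    """Fraction of positive predictions among individuals assigned to `group`."""
    members = [p for p, g in zip(preds, groups) if g == group]
    if not members:
        raise ValueError(f"no individuals assigned to group {group!r}")
    return sum(members) / len(members)


def parity_gap(preds: Sequence[int], groups: Sequence[str], a: str = "A", b: str = "B") -> float:
    """Demographic parity gap: positive rate of group `a` minus that of group `b`."""
    return positive_rate(preds, groups, a) - positive_rate(preds, groups, b)


# Hypothetical toy data: model predictions (1 = positive outcome) for 8 people.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
# True demographic groups, assumed unavailable in practice.
true_groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
# Hypothetical proxy groups inferred from other features; indices 0 and 5 are wrong.
proxy_groups = ["B", "A", "A", "A", "B", "A", "B", "B"]

agreement = sum(t == p for t, p in zip(true_groups, proxy_groups)) / len(true_groups)
true_gap = parity_gap(preds, true_groups)
proxy_gap = parity_gap(preds, proxy_groups)

print(f"proxy agreement: {agreement:.2f}")  # 0.75
print(f"true gap: {true_gap:.2f}")          # 0.50
print(f"proxy gap: {proxy_gap:.2f}")        # 0.00
```

In this constructed case, a proxy that agrees with the true labels 75% of the time hides the disparity completely. How large the distortion is in practice depends on how proxy errors correlate with group membership and model outcomes. This kind of distortion is what the bias-mitigation work described above aims to correct.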