Paper ID: 2411.02557

A Directional Rockafellar-Uryasev Regression

Alberto Arletti

Most Big Data datasets suffer from selection bias. For example, X (Twitter) training observations differ substantially from offline testing observations, as individuals on Twitter are generally more educated, democratic or left-leaning. One major obstacle to reliable estimation is therefore the difference between training and testing data. How can researchers make use of such data even in the presence of non-ignorable selection mechanisms? A number of methods have been developed to address this issue, such as distributionally robust optimization (DRO) or fairness-aware learning. A possible avenue for reducing the effect of bias is meta-information. Researchers, being field experts, might have prior information on the form and extent of the selection bias affecting their dataset, and on the direction in which selection might shift the estimate, e.g. over- or underestimation. Yet there is currently no direct way to leverage this type of information in learning. I propose a loss function that takes into account two types of meta-information supplied by the researcher: the quantity and the direction (under- or oversampling) of bias in the training set. Estimation with the proposed loss function is then implemented through a neural network, the directional Rockafellar-Uryasev (dRU) regression model. I test the dRU model on a biased training dataset, a Big Data electoral poll collected online. I apply the proposed model using meta-information consistent with the political and sampling information obtained from previous studies. The results show that including meta-information improves the prediction of electoral results compared to a model that does not include it.
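The abstract does not state the loss function itself. For orientation only, the sketch below shows the classical Rockafellar-Uryasev objective, eta + E[max(loss - eta, 0)] / alpha, which the model's name presumably refers to, together with a purely hypothetical directional variant that keeps only residuals whose sign matches an assumed bias direction. The function names, the `direction` argument, and the masking rule are illustrative assumptions and should not be read as the dRU loss defined in the paper.

```python
import numpy as np

def ru_objective(losses, eta, alpha):
    """Classical Rockafellar-Uryasev objective for a fixed threshold eta."""
    losses = np.asarray(losses, dtype=float)
    return eta + np.mean(np.maximum(losses - eta, 0.0)) / alpha

def ru_cvar(losses, alpha):
    """Minimize the objective over eta.

    The objective is convex and piecewise linear in eta, so a minimizer
    lies at one of the observed loss values; a grid over them suffices.
    """
    losses = np.asarray(losses, dtype=float)
    return min(ru_objective(losses, eta, alpha) for eta in losses)

def directional_losses(y_true, y_pred, direction):
    """Hypothetical directional masking (an assumption, not the paper's rule).

    direction = +1 keeps squared residuals where y_true > y_pred,
    direction = -1 keeps those where y_true < y_pred; the rest are zero.
    """
    residual = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return np.where(direction * residual > 0, residual ** 2, 0.0)

def directional_ru_loss(y_true, y_pred, alpha, direction):
    """Illustrative combination: RU objective over direction-masked losses."""
    return ru_cvar(directional_losses(y_true, y_pred, direction), alpha)
```

In this sketch, `alpha` plays the role of the "quantity" of bias (smaller values weight the worst-case tail more heavily) and `direction` encodes whether the researcher expects over- or underestimation; both mappings are assumptions made for illustration.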

Submitted: Nov 4, 2024