Paper ID: 2211.14297 • Published Nov 25, 2022

Doubly robust nearest neighbors in factor models

Raaz Dwivedi, Katherine Tian, Sabina Tomkins, Predrag Klasnja, Susan Murphy, Devavrat Shah

TL;DR

Get AI-generated summaries with premium

We introduce and analyze an improved variant of nearest neighbors (NN) for estimation with missing data in latent factor models. We consider a matrix completion problem with missing data, where the (i, t)-th entry, when observed, is given by its mean f(u_i, v_t) plus mean-zero noise for an unknown function f and latent factors u_i and v_t. Prior NN strategies, like unit-unit NN, for estimating the mean f(u_i, v_t) relies on existence of other rows j with u_j \approx u_i. Similarly, time-time NN strategy relies on existence of columns t' with v_{t'} \approx v_t. These strategies provide poor performance respectively when similar rows or similar columns are not available. Our estimate is doubly robust to this deficit in two ways: (1) As long as there exist either good row or good column neighbors, our estimate provides a consistent estimate. (2) Furthermore, if both good row and good column neighbors exist, it provides a (near-)quadratic improvement in the non-asymptotic error and admits a significantly narrower asymptotic confidence interval when compared to both unit-unit or time-time NN.

arXiv PDF

Topics

Factor Model Latent Factor Matrix Completion Nearest Neighbor