Paper ID: 2407.03262

Nearly Linear Sparsification of $\ell_p$ Subspace Approximation

David P. Woodruff, Taisuke Yasuda

The $\ell_p$ subspace approximation problem is an NP-hard low rank approximation problem that generalizes the median hyperplane problem ($p = 1$), principal component analysis ($p = 2$), and the center hyperplane problem ($p = \infty$). A popular approach to cope with the NP-hardness of this problem is to compute a strong coreset, which is a small weighted subset of the input points which simultaneously approximates the cost of every $k$-dimensional subspace, typically to $(1+\varepsilon)$ relative error for a small constant $\varepsilon$. We obtain the first algorithm for constructing a strong coreset for $\ell_p$ subspace approximation with a nearly optimal dependence on the rank parameter $k$, obtaining a nearly linear bound of $\tilde O(k)\mathrm{poly}(\varepsilon^{-1})$ for $p<2$ and $\tilde O(k^{p/2})\mathrm{poly}(\varepsilon^{-1})$ for $p>2$. Prior constructions either achieved a similar size bound but produced a coreset with a modification of the original points [SW18, FKW21], or produced a coreset of the original points but lost $\mathrm{poly}(k)$ factors in the coreset size [HV20, WY23]. Our techniques also lead to the first nearly optimal online strong coresets for $\ell_p$ subspace approximation with similar bounds as the offline setting, resolving a problem of [WY23]. All prior approaches lose $\mathrm{poly}(k)$ factors in this setting, even when allowed to modify the original points.

Submitted: Jul 3, 2024