Paper ID: 2410.21426
Sum-of-squares lower bounds for Non-Gaussian Component Analysis
Ilias Diakonikolas, Sushrut Karmalkar, Shuo Pang, Aaron Potechin
Non-Gaussian Component Analysis (NGCA) is the statistical task of finding a non-Gaussian direction in a high-dimensional dataset. Specifically, given i.i.d.\ samples from a distribution $P^A_{v}$ on $\mathbb{R}^n$ that behaves like a known distribution $A$ in a hidden direction $v$ and like a standard Gaussian in the orthogonal complement, the goal is to approximate the hidden direction. The standard formulation posits that the first $k-1$ moments of $A$ match those of the standard Gaussian and the $k$-th moment differs. Under mild assumptions, this problem has sample complexity $O(n)$. On the other hand, all known efficient algorithms require $\Omega(n^{k/2})$ samples. Prior work developed sharp Statistical Query and low-degree testing lower bounds suggesting an information-computation tradeoff for this problem. Here we study the complexity of NGCA in the Sum-of-Squares (SoS) framework. Our main contribution is the first super-constant degree SoS lower bound for NGCA. Specifically, we show that if the non-Gaussian distribution $A$ matches the first $(k-1)$ moments of $\mathcal{N}(0, 1)$ and satisfies other mild conditions, then with fewer than $n^{(1 - \varepsilon)k/2}$ many samples from the normal distribution, with high probability, degree $(\log n)^{{1\over 2}-o_n(1)}$ SoS fails to refute the existence of such a direction $v$. Our result significantly strengthens prior work by establishing a super-polynomial information-computation tradeoff against a broader family of algorithms. As corollaries, we obtain SoS lower bounds for several problems in robust statistics and the learning of mixture models. Our SoS lower bound proof introduces a novel technique, that we believe may be of broader interest, and a number of refinements over existing methods.
Submitted: Oct 28, 2024