On the least-squares problem with outliers, matrix decomposition and related concentration inequalities
Statistics Seminar
27th May 2022, 2:30 pm – 3:30 pm
Fry Building, 2.41
Estimation and prediction with "contaminated" data is a classical problem, considered in Huber's seminal work in the 1960s. This set-up has recently regained a lot of attention in the modern literature on robust statistics and machine learning. Optimality in this setting concerns not only the sample size and effective dimension but also the contamination fraction and, sometimes, the confidence level. In this talk we consider the least-squares regression problem over a linear subgaussian class, including the setting where a fraction of the label sample is contaminated by adversarial outliers. Our contributions to this problem span several perspectives.

First, we study outlier-robust least squares in the p >> n regime when the parameter is sparse/low-rank and establish tractable optimality, adaptively to the sparsity/low-rankness and the contamination fraction. Besides improving prior guarantees for this problem, our bounds draw an important distinction between label contamination and label-feature contamination: unlike the latter, the label-contamination model has a breakdown point independent of the covariance's condition number and can be tuned adaptively to the contamination fraction and the confidence level.

Second, the literature on least squares in the p >> n regime typically assumes the noise is independent of the features; we avoid this assumption by improving the analysis of the Multiplier Process.

A third concern is optimality with respect to the confidence level. Most rates for this problem are optimal only "on average". We present new adaptive optimal rates that hold uniformly over any confidence level. Our estimator is based on a new "Sorted Huber loss", which our numerical experiments show can significantly outperform classical Huber regression.

Lastly, we consider the problem of "trace regression with matrix decomposition". While this problem is natural in high-dimensional statistics and compressed sensing, to the best of our knowledge no optimality theory has been presented for it before. Our statistical theory relies on improved generic-chaining concentration inequalities, with respect to the confidence level, for the Multiplier Process and Product Processes, which could be useful elsewhere. Chevet's inequality is an essential ingredient for optimality in the label-contamination model.
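The abstract does not spell out the exact form of the Sorted Huber loss, so the following Python sketch is only an illustration of one standard way a "sorted Huber"-type estimator can be formulated: least squares augmented with per-sample corruption variables penalized by a sorted-l1 (SLOPE-type) norm, fitted by alternating minimization. Partially minimizing over the corruption variables yields a Huber-like loss whose thresholds depend on the rank of each residual. The function names, the penalty sequence lam, and the alternating scheme are illustrative assumptions, not the speaker's construction.

import numpy as np

def _isotonic_nonincreasing(z):
    # Project z onto nonincreasing sequences (pool-adjacent-violators).
    sums, counts = [], []
    for zi in z:
        sums.append(zi); counts.append(1)
        # Merge adjacent blocks while their averages violate the order.
        while len(sums) > 1 and sums[-2] * counts[-1] <= sums[-1] * counts[-2]:
            s, c = sums.pop(), counts.pop()
            sums[-1] += s; counts[-1] += c
    out, pos = np.empty(len(z)), 0
    for s, c in zip(sums, counts):
        out[pos:pos + c] = s / c
        pos += c
    return out

def slope_prox(v, lam):
    # Prox of the sorted-l1 norm theta -> sum_i lam_i * |theta|_(i),
    # with lam nonincreasing (Bogdan et al., 2015).
    order = np.argsort(-np.abs(v))
    z = _isotonic_nonincreasing(np.abs(v)[order] - lam)
    out = np.empty_like(v)
    out[order] = np.maximum(z, 0.0)
    return np.sign(v) * out

def sorted_huber_ls(X, y, lam, n_iter=200):
    # Alternating minimization of
    #   0.5 * ||y - X b - theta||^2 + sum_i lam_i * |theta|_(i),
    # jointly convex in (b, theta); theta absorbs label outliers.
    X_pinv = np.linalg.pinv(X)   # reused for each exact least-squares step
    theta = np.zeros(len(y))
    for _ in range(n_iter):
        b = X_pinv @ (y - theta)            # refit on corruption-adjusted labels
        theta = slope_prox(y - X @ b, lam)  # rank-dependent soft-thresholding
    return b, theta

# Toy check: 10% of labels adversarially shifted.
rng = np.random.default_rng(0)
n, p, eps = 200, 20, 0.10
X = rng.standard_normal((n, p))
b_true = rng.standard_normal(p)
y = X @ b_true + 0.5 * rng.standard_normal(n)
y[: int(eps * n)] += 10.0                                 # label contamination
lam = 2.0 * np.sqrt(np.log(2 * n / np.arange(1, n + 1)))  # illustrative choice
b_hat, _ = sorted_huber_ls(X, y, lam)
print(np.linalg.norm(b_hat - b_true),
      np.linalg.norm(np.linalg.pinv(X) @ y - b_true))

On such data the corruption variables soak up the shifted labels, so the least-squares step effectively refits on the clean samples; the plain least-squares error printed alongside is typically much larger.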