Robust Learning of Multivariate Gaussians and Mixtures: A Computational Discrete Optimization Approach

In preparation with Rahul Mazumder.

Traditional statistics estimators are often adversarially affected by the presence of a single outlier in a dataset. Robust statistics suggests estimators that are agnostic to the presence of arbitrary corruptions in the data. We consider the problem of estimating parameters of a class of multivariate Gaussian distribution and a mixture of Gaussians from a sample of observations contaminated with possibly arbitrary corruptions. To this end, we present two estimators: one based on a trimmed version of the maximum likelihood estimator, and another based on a robust version of a Kolmogorov-Smirnov goodness of fit measure. We show that these estimators can be cast as mixed integer optimization problems. Exploiting problem-specific structure, we develop specialized algorithms and demonstrate that they can solve instances of these problems well beyond the capabilities of existing off-the-shelf commercial solvers. The estimators available from our approach outperforms existing heuristic approaches.