My primary research interests lie at the interface of Optimization and Machine Learning with applications in Healthcare and Finance. Specifically, I work on designing novel Optimization algorithms for Machine Learning problems using tools from Robust and Discrete Optimization.
In this exploratory study, we analyze immunization records from 137,037 individuals who received SARS-CoV-2 PCR tests. We find that polio, Hemophilus influenzae type-B (HIB), measles-mumps-rubella (MMR), varicella, pneumococcal conjugate (PCV13), geriatric flu, and hepatitis A / hepatitis B (HepA-HepB) vaccines administered in the past 1, 2, and 5 years are associated with decreased SARS-CoV-2 infection rates, even after adjusting for geographic SARS-CoV-2 incidence and testing rates, demographics, comorbidities, and number of other vaccinations.
We propose a holistic framework using Tensor completion and Robust Optimization to prescribe influenza vaccine compositions that are specific to a region or a country, based on historical data on the circulation rates of predominant viruses. Through numerical experiments, we show that our proposed vaccine compositions could potentially lower morbidity by 11-14% and mortality by 8-11% relative to the vaccine compositions proposed by the World Health Organization (WHO) for the Northern Hemisphere.
We present two estimators, one based on a trimmed version of the maximum likelihood estimator and another based on a robust version of a Kolmogorov-Smirnov goodness-of-fit measure, for the problem of estimating the parameters of a class of multivariate Gaussian distributions and mixtures of Gaussians from a sample of observations contaminated with possibly arbitrary corruptions. Exploiting problem-specific structure, we develop specialized algorithms and demonstrate that they can solve instances of these problems well beyond the capabilities of existing off-the-shelf commercial solvers.
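To make the idea of a trimmed likelihood concrete, here is a minimal univariate sketch. It is not the specialized algorithm described above: it swaps in a simple alternating heuristic (refit on the kept points, then re-select the points with the highest likelihood), and the function name `trimmed_gaussian_fit`, the `keep` parameter, and the toy data are all assumptions made for illustration.

```python
import statistics


def trimmed_gaussian_fit(samples, keep, iters=50):
    """Heuristic trimmed MLE for a univariate Gaussian (illustrative only).

    Keeps the `keep` samples with the highest likelihood under the current
    fit and refits on them until the kept set stops changing.
    """
    # Robust starting point: the sample median.
    mu = statistics.median(samples)
    kept = sorted(samples, key=lambda x: abs(x - mu))[:keep]
    for _ in range(iters):
        mu = statistics.fmean(kept)
        # In one dimension the Gaussian likelihood decreases with |x - mu|,
        # so the most likely points are simply those closest to the mean.
        new_kept = sorted(samples, key=lambda x: abs(x - mu))[:keep]
        if sorted(new_kept) == sorted(kept):
            break
        kept = new_kept
    mu = statistics.fmean(kept)
    sigma = statistics.pstdev(kept)  # MLE of the standard deviation on kept points
    return mu, sigma, kept


# Toy data: six inliers near 0 and two gross outliers.
data = [0.1, -0.2, 0.05, 0.0, -0.1, 0.15, 50.0, 60.0]
mu_hat, sigma_hat, kept_points = trimmed_gaussian_fit(data, keep=6)
```

On this toy sample the two gross outliers are discarded, whereas the untrimmed sample mean would be pulled far from zero. Note that a standard deviation computed on a trimmed subset tends to underestimate the true scale unless a correction factor is applied.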
We develop specialized proximal-gradient-based first-order algorithms for the problem of estimating a nonparametric function under a variety of smoothness and shape constraints, such as monotonicity, convexity, unimodality, and Lipschitz smoothness, whenever some prior knowledge about the relationship between the independent and dependent variables is given.
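As an illustration of one building block such a method needs, the sketch below computes the Euclidean projection onto non-decreasing sequences using the classical pool-adjacent-violators algorithm. This projection is the proximal operator of the monotonicity constraint; the sketch covers only that single constraint and is not the algorithm developed in the paper. The function name `project_monotone` is an assumption made for illustration.

```python
def project_monotone(values):
    """Project a sequence onto the set of non-decreasing sequences (PAVA).

    Returns the non-decreasing sequence closest to `values` in the
    least-squares sense.
    """
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge adjacent blocks while their means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)  # every point in a block gets the block mean
    return fitted


fitted = project_monotone([1.0, 3.0, 2.0, 4.0])
# The violating pair (3.0, 2.0) is pooled into its mean: [1.0, 2.5, 2.5, 4.0]
```

For squared loss with one fitted value per observation, a single proximal gradient step with unit step size reduces to exactly this projection; other constraints such as convexity or Lipschitz bounds would need different proximal operators.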
We develop a novel framework for designing statistical hypothesis tests when given access to i.i.d. samples drawn under each hypothesis. We model the uncertainty in a sample using uncertainty sets based on the Wasserstein distance with respect to the empirical distribution and design tests that maximally separate the two hypotheses using an affine combination of statistics by solving a sample robust optimization problem.
We develop a discrete optimization formulation to learn a multivariate Gaussian mixture model (GMM) given access to n samples that are believed to have come from a mixture of multiple subpopulations. The formulation optimally recovers the parameters of a GMM by minimizing a discrepancy measure (either the Kolmogorov–Smirnov or the total variation distance) between the empirical distribution function and the distribution function of the GMM whenever the mixture component weights are known.
We obtain a robust-optimization-based explanation for why regularized linear regression methods perform well in the face of noise, even when these methods do not produce reliably sparse solutions. We derive tight regularized regression bounds for the corresponding robust problems with convex, positive homogeneous loss functions and Fenchel convex loss functions on Frobenius norm bounded uncertainty sets. Based on these bounds, we propose a principled way to choose the regularization parameter λ to balance the bias-variance trade-off for the regularized linear regression problem.
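A well-known special case of this robust/regularized connection can be checked numerically: for the Euclidean-norm loss, the worst-case residual over all data perturbations Δ with Frobenius norm at most λ equals ‖y − Xβ‖₂ + λ‖β‖₂. The sketch below constructs a worst-case rank-one perturbation and compares the two quantities. It covers only this special case, not the general bounds described above, and the toy data and function names are assumptions made for illustration.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(x * x for x in v))


def residual(X, y, beta):
    """Residual vector y - X beta."""
    return [yi - sum(xij * bj for xij, bj in zip(row, beta))
            for row, yi in zip(X, y)]


def worst_case_perturbation(X, y, beta, lam):
    """Rank-one perturbation with Frobenius norm lam that maximizes the loss.

    Assumes the residual and beta are both nonzero.
    """
    r = residual(X, y, beta)
    nr, nb = norm(r), norm(beta)
    # Delta = -lam * (r / ||r||) (beta / ||beta||)^T pushes the residual
    # further along its own direction.
    return [[-lam * (ri / nr) * (bj / nb) for bj in beta] for ri in r]


def perturbed_loss(X, y, beta, delta):
    """Euclidean-norm loss evaluated on the perturbed design X + delta."""
    Xp = [[xij + dij for xij, dij in zip(rx, rd)] for rx, rd in zip(X, delta)]
    return norm(residual(Xp, y, beta))


# Toy data (illustrative only).
X = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
y = [1.0, 0.0, 2.0]
beta = [0.3, -0.2]
lam = 0.5

delta_star = worst_case_perturbation(X, y, beta, lam)
robust_loss = perturbed_loss(X, y, beta, delta_star)
regularized_loss = norm(residual(X, y, beta)) + lam * norm(beta)
```

Here `robust_loss` and `regularized_loss` agree up to floating-point error, which illustrates how the size of the uncertainty set plays the role of the regularization parameter λ.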