## Lagrangian rejection and fitting

Fitting a mathematical model to a set of samples (e.g. fitting a sphere to a set of 3D Cartesian points supposed to lie on its surface) is not an easy task.

The samples usually come from measurements and, even assuming that systematic errors are absent, they are still affected by errors due to the finite accuracy and precision of the measuring process.

It can be shown that, at least for linear models, least squares fitting gives the maximum likelihood estimate of the model parameters if the error distribution is Gaussian, an assumption which is usually considered acceptable.

Unfortunately it is very common to have samples in good agreement with a Gaussian error distribution except for a certain, supposedly small, number of outliers: samples with exceptional errors, due to unforeseeable conditions, that do not fit the Gaussian distribution.

Even a single outlier may severely hurt the result of a least squares fit; outliers must therefore be recognized and excluded from the fitting.

The statistical methods for outlier rejection I know of (RANSAC, LMedS) are binary; that is, they select a subset of the available samples on which the fitting is to be performed. A sample is either accepted or rejected; therefore such methods have no closed form solution.

Typically, if the model being fitted needs a minimum of $n$ samples to determine its parameters (for a geometric shape, e.g. four points for a sphere), a large number of minimal subsets is prepared by random choice; each subset determines a model, which is tested against the whole set of samples by computing the distance between each sample and the model.

According to RANSAC, the best model is the one having the maximum number of samples within a specified distance; according to LMedS, it is the one having the least value of the median of squared distances.

Then all the samples not agreeing with the best model found are discarded; the remaining ones are used for a least squares fit, giving the ‘accredited’ model.
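As an illustration of this scheme, here is a minimal Python sketch of the RANSAC variant, with toy choices of my own (a 2D line as the model, so the minimal subset has $n = 2$ points, and arbitrary `trials` and `threshold` parameters); the final least squares refit on the surviving inliers is left out:

```python
import random
import math

def line_through(p1, p2):
    # Line through two points, returned as (a, b, c) with
    # a*x + b*y + c = 0 and a^2 + b^2 = 1, so that a*x + b*y + c
    # is the signed distance of (x, y) from the line.
    (x1, y1), (x2, y2) = p1, p2
    a, b = y2 - y1, x1 - x2
    norm = math.hypot(a, b)
    return a / norm, b / norm, -(a * x1 + b * y1) / norm

def ransac_line(samples, trials=200, threshold=0.1, seed=0):
    # Classic RANSAC: draw random minimal subsets (two points for a
    # line) and keep the model with the largest consensus set.
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(trials):
        p1, p2 = rng.sample(samples, 2)
        if p1 == p2:
            continue  # coincident points: no unique line
        a, b, c = line_through(p1, p2)
        inliers = [(x, y) for (x, y) in samples
                   if abs(a * x + b * y + c) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b, c), inliers
    # LMedS would instead score each candidate by the median of the
    # squared distances and keep the model with the smallest score.
    return best_model, best_inliers
```

The number of trials needed grows with the expected outlier fraction, since at least one of the sampled minimal subsets must be outlier-free for the method to succeed.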

The above procedure is often summarized as

$\min_p \sum_i w_i d (p, S_i)$

where $p$ is the set of model parameters, $S_i$ are the samples and $d(p, S_i)$ is the (squared, or anyway unsigned) distance between the sample $S_i$ and the model having parameters $p$.

$w_i$ is formally a weight, but it can attain only the values $1$ for accepted samples or $0$ for rejected samples.

Would it be possible to consider the $w_i$ as true real-valued weights, to be computed together with the parameters $p$? Something like

$\min_{p,w} \sum_i w_i d (p, S_i)$

Of course it cannot work as written, because the expression has no lower bound with respect to $w$. We could use squared weights (as we don’t want negative weights) and add a constraint to avoid trivial minima; e.g.

$\min_{p,w} \sum_i w_i^2 d (p, S_i)$   subject to   $\sum_i w_i^2 = 1$

It is easy to see that this isn’t sufficient, because the expression still has many unwanted minima; for example, when all the $w_i$ are zero except one, say $w_k$, and the parameters $p$ are chosen so that $d(p, S_k) = 0$.

It turns out that a good choice is

$\min_{p,w} \sum_i (1+w_i)^2 d (p, S_i)$   subject to   $\sum_i w_i^2 = q$

where $q$ is not less than the expected number of outliers. To see the effect, introduce a multiplier $\lambda$ for the constraint and set the derivative of the Lagrangian with respect to $w_i$ to zero: with $d_i = d(p, S_i)$ this gives $2(1+w_i)\,d_i + 2\lambda w_i = 0$, that is

$w_i = -\dfrac{d_i}{d_i + \lambda}$

For $\lambda > 0$ every weight lies in $(-1, 0]$, and $w_i$ approaches $-1$ only when $d_i \gg \lambda$. Since $\sum_i w_i^2 = q$, at most $q$ weights can get close to $-1$; such weights cancel the contribution of the corresponding samples, which are precisely the outliers.
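How could this joint minimization be carried out in practice? One plausible route (a sketch of my own, with invented names and a deliberately trivial model) is alternating minimization. For fixed $p$, the stationarity condition of the Lagrangian gives the closed form $w_i = -d_i/(d_i+\lambda)$ with $d_i = d(p, S_i)$, where the multiplier $\lambda$ can be found by bisection on the constraint $\sum_i w_i^2 = q$; for fixed $w$, minimizing over $p$ is an ordinary weighted least squares problem with weights $(1+w_i)^2$. The toy model below fits a single constant $c$ to scalar samples, so $d(c, s_i) = (s_i - c)^2$:

```python
def solve_weights(d, q, iters=100):
    # For fixed model parameters (hence fixed squared distances d_i),
    # minimize sum_i (1 + w_i)^2 * d_i subject to sum_i w_i^2 = q.
    # Stationarity of the Lagrangian gives w_i = -d_i / (d_i + lam);
    # the multiplier lam > 0 is found by bisection, since sum_i w_i^2
    # decreases as lam grows.  Assumes q is smaller than the number of
    # samples with d_i > 0.
    if all(di == 0.0 for di in d):
        return [0.0] * len(d)

    def wsq(lam):
        return sum((di / (di + lam)) ** 2 for di in d)

    lo, hi = 1e-15, max(d)
    while wsq(hi) > q:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if wsq(mid) > q:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [-di / (di + lam) for di in d]

def robust_mean(samples, q, rounds=50):
    # Deliberately trivial model: a single constant c fitted to scalar
    # samples, with d(c, s_i) = (s_i - c)^2.  Alternate the weight step
    # above with a weighted least squares step for c, whose solution is
    # the weighted mean with effective weights (1 + w_i)^2.
    c = sum(samples) / len(samples)
    for _ in range(rounds):
        d = [(s - c) ** 2 for s in samples]
        w = solve_weights(d, q)
        u = [(1.0 + wi) ** 2 for wi in w]
        c = sum(ui * si for ui, si in zip(u, samples)) / sum(u)
    d = [(s - c) ** 2 for s in samples]
    return c, solve_weights(d, q)
```

For instance, with five samples near $0$, two gross outliers and $q = 2$, the outlier weights are driven close to $-1$ while $c$ stays near $0$. Each half-step decreases the objective for the other variable held fixed, but as with most alternating schemes only convergence to a local minimum can be expected.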