An alternative derivation of the DLT for homographies

What if in the linear system A \mathbf{x} = \mathbf{b} the unknowns are not \mathbf{x} but the matrix A? Of course, if we have a sufficient number N of such systems A \mathbf{x}^{(1)} = \mathbf{b}^{(1)}A \mathbf{x}^{(2)} = \mathbf{b}^{(2)}, … with the same matrix A and independent second hands, we can stack them, and solve

x_1^{(1)} A_{11}+x_2^{(1)} A_{12}+ \cdots + x_n^{(1)} A_{1n}= b_1^{(1)}

x_1^{(N)} A_{m1}+x_2^{(N)} A_{m2}+ \cdots + x_n^{(N)} A_{mn}= b_m^{(N)}

for the m \times n unknowns A_{11} \cdots A_{mn} .  A similar problem arises when looking for a planar homography:

\begin{bmatrix} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & H_{33} \end{bmatrix} \begin{bmatrix} u \\ v \\ w \end{bmatrix} \propto \begin{bmatrix} u' \\ v' \\ w' \end{bmatrix}                                                    (1)

given a sufficient number of couples of homogeneous points ((u^{(i)}, v^{(i)}, w^{(i)}), (u'^{(i)}, v'^{(i)}, w'^{(i)})), i \in 1 \cdots M, M \geq 4 . It is to be noted that the problem is not identical as we have here \propto meaning proportionality, not equality; this because, with homogeneous coordinates, (u, v, w) and \lambda (u, v, w), with \lambda \neq 0, do represent the same point, and as a consequence the homography matrix H , while being full rank for regular homographies, is also defined up to a scale factor.

The correct method to solve this problem (eg Hartley&Zisserman, Multiple View Geometry in Computer Vision, §4.1) is to transform the system H \mathbf{u} \propto \mathbf{u'}  by applying a cross product with \mathbf{u}' to both hands, so that the second hand vanishes and we are left with the set of homogeneous linear equations \mathbf{u'} \times H \mathbf{u}=0 , containing for each sample (\mathbf{u}, \mathbf{u'}) three equations in the unknown entries of H. Only two of such equations are linearly independent, so one can be discarded.

This method is the so called DLT, short for Direct Linear Transformation. It works fine, albeit the cross product smells of deus ex machina math trick. One can get the same result in a way that, in my opinion, is more elegant if one recognizes that the real problem is not the homogeneity of the involved quantities in itself, but the related fact that u, v, w, u', v', w' are not observable: the observables are the ratios \frac{u}{w}, \frac{v}{w}, \frac{u'}{w'}, \frac{v'}{w'} instead (provided that we have no point at infinity). So one should manipulate the equations (1) to make these ratios emerge. Start with

\dfrac{u'_i}{w'_i} = \dfrac{H_{11} u_i + H_{12} v_i + H_{13} w_i}{H_{31} u_i + H_{32} v_i + H_{33} w_i}

\dfrac{v'_i}{w'_i} = \dfrac{H_{21} u_i + H_{22} v_i + H_{23} w_i}{H_{31} u_i + H_{32} v_i + H_{33} w_i}


H_{11} u_i + H_{12} v_i + H_{13} w_i - \dfrac{u'_i}{w'_i} H_{31} u_i - \dfrac{u'_i}{w'_i} H_{32} v_i - \dfrac{u'_i}{w'_i} H_{33} w_i = 0

H_{21} u_i + H_{22} v_i + H_{23} w_i - \dfrac{v'_i}{w'_i} H_{31} u_i - \dfrac{v'_i}{w'_i} H_{32} v_i - \dfrac{v'_i}{w'_i} H_{33} w_i = 0

Last, divide by w_i :

H_{11} \dfrac{u_i}{w_i} + H_{12} \dfrac{v_i}{w_i} + H_{13} - \dfrac{u'_i}{w'_i} H_{31} \dfrac{u_i}{w_i} - \dfrac{u'_i}{w'_i} H_{32} \dfrac{v_i}{w_i} - \dfrac{u'_i}{w'_i} H_{33} = 0

H_{21} \dfrac{u_i}{w_i} + H_{22} \dfrac{v_i}{w_i} + H_{23} - \dfrac{v'_i}{w'_i} H_{31} \dfrac{u_i}{w_i} - \dfrac{v'_i}{w'_i} H_{32} \dfrac{v_i}{w_i} - \dfrac{v'_i}{w'_i} H_{33} = 0

and you are left with the two DLT equations.

Posted in Uncategorized | Tagged , , , , | 3 Comments

In defense of H C Longuet-Higgins

Not that the late Hugh Cristopher Longuet-Higgins need any defense from me, of course. I choose this startling headline to express my surprise in finding out, during an internet search, a paper titled Mathematical Flaws in the Essential Matrix Theory, by a Tayeb Basta.

I am using the essential matrix in a visual metrology application and it works just fine, but nonetheless this title caused me some uneasiness.

What Longuet-Higgins did in his seminal paper A Computer Algorithm for Reconstructing a Scene from Two Projections was to introduce a new, linear method for computing the essential matrix which, not with this name, was already known (and used, I suppose) since years in photogrammetry. There was not much exchange between the computer vision and photogrammetry communities in that period, so Longuet-Higgins must be credited also with introducing this concept to computer vision for the first time. And, moreover, he experimented a lot with his algorithm, finding and pointing out precisely a number of circumstances where it fails – this cannot be considered a flaw anyway, and the circumstances of failure do not occur easily in practical cases; rather, they must be intentionally arranged.

Later, Richard Hartley and Olivier Faugeras elaborated on the concept of essential matrix introducing the fundamental matrix. Even this derived entity is being used, therefore tested, extensively in computer vision.

So how much is the claim of a newly found mathematical flaw reliable? Maybe it is a kind of numerical instability or inaccuracy, similar to what Hartley cured with normalization? Or is it something springing up in very special situations? Maybe the author has found more breakdown circumstances similar to the ones already pointed out by Longuet-Higgins himself? One just has to read the paper linked above to find out.

It turns out that it is nothing of the kind. To summarize: the main property of the essential matrix is that \mathbf{X}'^T E \mathbf{X}=0, where E is the essential matrix and \mathbf{X}=[X_1, X_2, X_3], \mathbf{X}'=[X'_1,X'_2,X'_3] are the coordinates of the same 3D point in the reference frames associated with the two cameras. The alleged flaw consists in that the expression \mathbf{X}'^T E \mathbf{X} mixes vectors from two different reference frames (this is true), and therefore it must be meaningless (this is false).

The author substantially claims that \mathbf{X}' is a vector in a certain reference frame, so that \mathbf{V}^T=\mathbf{X}'^T E must be a different vector in the same frame, while \mathbf{X} is a vector in a different frame, so that the scalar product \mathbf{V}^T \mathbf{X} of two vectors in two different frames is meaningless. So, by the way, this would not be a subtle flaw causing instability or inaccuracy or springing up in very special situations: it would be a paramount, gigantic, trivial flaw completely wrecking the whole algorithm and the concept of essential matrix itself in any situation. Luckily the essential matrix did not read the paper, so it continues to exist, and to work fine.

Why on earth should \mathbf{V}^T=\mathbf{X}'^T \cdot E be a vector in the same frame of \mathbf{X}'? Multiplying a vector by a rotation matrix does change its reference frame, and the essential matrix does contain a rotation matrix. So at last the two vectors \mathbf{V} and \mathbf{X} are in the same reference frame and their scalar product is meaningful.

I will now derive here the equation \mathbf{X}'^T E \mathbf{X}=0 using a notation different from that of Longuet-Higgins, which is a little bit cumbersome (but nonetheless correct). I am speaking of two cameras looking at the same 3D point \mathbf{P}. In the pinhole camera model,  the cameras have optical centres \mathbf{C}, \mathbf{C}' and each image of \mathbf{P} is the intersection with the retinal (sensor) plane of the straight line (optical ray) through \mathbf{P} and through the optical centre.

Three points define a plane, so the mixed product (\mathbf{P}-\mathbf{C}') \cdot (\mathbf{C}'-\mathbf{C}) \times (\mathbf{P}-\mathbf{C}) must vanish; this fact is known as epipolar constraint.

Each camera has a reference frame associated with it, having origin in the optical centre. Let the transformation between the two frames be \mathbf{X}'=R \mathbf{X}+\mathbf{t} or equivalently \mathbf{X}=R^T \mathbf{X}'-R^T \mathbf{t} = R^T \mathbf{X}' + \mathbf{T}. Here follows a table of what some entities read in the two reference frames:

Entity Unprimed frame Primed frame
\mathbf{C} [0, 0, 0]^T \mathbf{t}=[t_1, t_2, t_3]^T
\mathbf{C}' \mathbf{T}=[T_1, T_2, T_3]^T [0, 0, 0]^T
\mathbf{P} \mathbf{X}=[X_1, X_2, X_3]^T \mathbf{X}'=[X'_1, X'_2, X'_3]^T
\mathbf{P}-\mathbf{C} \mathbf{X}=R^T (\mathbf{X}'-\mathbf{t}) \mathbf{X}'-\mathbf{t}=R \mathbf{X}
\mathbf{P}-\mathbf{C}' \mathbf{X}-\mathbf{T} = R^T \mathbf{X}' \mathbf{X}' = R (\mathbf{X}-\mathbf{T})
\mathbf{C}'-\mathbf{C} \mathbf{T} = - R^T \mathbf{t} -\mathbf{t} = R \mathbf{T}

By the above table I can express the equation (\mathbf{P}-\mathbf{C}') \cdot (\mathbf{C}'-\mathbf{C}) \times (\mathbf{P}-\mathbf{C}) = 0 in either frame. In the unprimed frame it reads (R^T \mathbf{X}')^T \cdot \mathbf{T} \times \mathbf{X} = \mathbf{X}'^T R [\mathbf{T}]_{\times} \mathbf{X}=0 while in the primed frame it reads - \mathbf{X}'^T \cdot \mathbf{t} \times R \mathbf{X} = -\mathbf{X}'^T [\mathbf{t}]_{\times} R \mathbf{X}=0, where [\mathbf{a}]_{\times} is short for the skew symmetric matrix \begin{bmatrix}0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0\end{bmatrix}.

So R [\mathbf{T}]_{\times} and -[\mathbf{t}]_{\times} R are two equivalent expressions for the essential matrix; the first is the one used by Longuet-Higgins in his paper. In both cases, all the quantities involved are expressed in the same reference frame and the equations are meaningful.

Posted in Uncategorized | Tagged , , , , , , , , , , , , , | 3 Comments

Rigid plate on four supports

Consider a rigid rectangular planar plate of width W and height H lying horizontal on four supports at its corners in S_1=\left[0 \hspace{1 mm} 0 \right], S_2=\left[W \hspace{1 mm} 0 \right], S_3=\left[W \hspace{1 mm} H \right], S_4=\left[0 \hspace{1 mm} H \right] and with a concentrated vertical load F at \left[x \hspace{1 mm} y \right], with 0 \leq x \leq W, 0 \leq y \leq H.

How can I determine the reactions R_1, R_2, R_3, R_4?

This is a hyperstatic problem (the object has redundant constraints) and as such the equilibrium equations make an underdetermined system.

Force equilibrium along Z: R_1 + R_2 + R_3 + R_4+F=0

Torque equilibrium around X at y=0: H R_3 + H R_4 + y F = 0

Torque equilibrium around Y at x=0: W R_2 + W R_3 + x F = 0

Any other equilibrium equation I could write wouldn’t increase the rank of the system.

Such problems are usually solved by considering the deformation of the object and requiring a null displacement at the constraints.

I said that the plate is rigid, so it will stay planar even when the force F is applied. How can I use this information to add a fourth equation?

Suppose the supports are elastic with a large constant k, so we do have small displacements z_1 = - R_1/k, z_2 = - R_2/k, z_3=-R_3/k, z_4=-R_4/k. If the corners lie on a plane, and neglecting for simplicity the slant of the plane, which is small if the displacements are small, it must be

\left| \begin{array}{cccc} x_1 & y_1 & z_1 & 1 \\ x_2 & y_2 & z_2 & 1 \\ x_3 & y_3 & z_3 & 1 \\ x_4 & y_4 & z_4 & 1 \end{array} \right| = 0

that is

\left| \begin{array}{cccc} 0 & 0 & - R_1/k & 1 \\ W  & 0 & - R_2/k & 1 \\ W & H & - R_3/k & 1 \\ 0  & H & - R_4/k & 1 \end{array} \right| = 0

that is

R_1 - R_2 + R_3 - R_4=0

This is my fourth equation, whence the solution

R_1 = -F \dfrac{3 H W - 2 H x - 2 W y}{4 H W}

R_2 = -F \dfrac{H W + 2 H x - 2 W y}{4 H W}

R_3 = -F \dfrac{- H W + 2 H x + 2 W y}{4 H W}

R_4 = -F \dfrac{H W - 2 H x + 2 W y}{4 H W}

The reverse problem is easier. If I know R_1, R_2, R_3, R_4, I can directly obtain F from the force equilibrium, x from the equilibrium around Y and y from the equilibrium around X:

F = -R_1-R_2-R_3-R_4

x = -W \dfrac{R_2+R_3}{F}

y=-H \dfrac{R_3+R_4}{F}

Posted in Uncategorized | Tagged , , | Leave a comment

Orthogonal least squares fitting of a sphere/2

As said in my previous post, to obtain an orthogonal least squares fitting of a sphere to a cloud of points P_i = \left[ x_i \hspace{1 mm} y_i \hspace{1 mm} z_i \right], i=0 \cdots m-1 one should minimize the function

E(x_c, y_c, z_c, r) = \sum_{i=0}^{m-1} e_i^2 = \\ \\ \hspace{5 mm} \sum_{i=0}^{m-1} (\sqrt{(x_i - x_c)^2+(y_i - y_c)^2 + (z_i - z_c)^2} - r)^2

Dave Eberly calls this the energy function, probably as a metaphorical reference to the minimum total potential energy principle.

Setting L_i (x_c, y_c, z_c) = \sqrt{(x_i - x_c)^2 + (y_i - y_c)^2 + (z_i - z_c)^2}

and considering that

\dfrac{\partial L_i}{\partial x_c} = \dfrac{x_c - x_i}{L_i}

\dfrac{\partial L_i}{\partial y_c} = \dfrac{y_c - y_i}{L_i}

\dfrac{\partial L_i}{\partial z_c} = \dfrac{z_c - z_i}{L_i}

\dfrac{\partial L_i}{\partial r} = 0

one can write the components of the gradient as

\dfrac{\partial E}{\partial x_c} = 2 m x_c - 2 \sum_{i=0}^{m-1} x_i - 2 r \sum_{i=0}^{m-1} \dfrac{x_c - x_i}{L_i}

\dfrac{\partial E}{\partial y_c} = 2 m y_c - 2 \sum_{i=0}^{m-1} y_i - 2 r \sum_{i=0}^{m-1} \dfrac{y_c - y_i}{L_i}

\dfrac{\partial E}{\partial z_c} = 2 m z_c - 2 \sum_{i=0}^{m-1} z_i - 2 r \sum_{i=0}^{m-1} \dfrac{z_c - z_i}{L_i}

\dfrac{\partial E}{\partial r} = 2 m r - 2 \sum_{i=0}^{m-1} L_i

Each L_i in the denominators approaches zero when the centre gets close to  P_i, what should not happen if a sufficiently good initial approximation is chosen.

In the minimum point one expects a null gradient. Eberly suggests to solve the equations \dfrac{\partial E}{\partial x_c} = 0, \dfrac{\partial E}{\partial y_c} = 0, \dfrac{\partial E}{\partial z_c} = 0 and \dfrac{\partial E}{\partial r} = 0 with a fixed point iteration, and says:

Warning. I have not analyzed the convergence properties of this algorithm. In a few experiments it seems to converge just fine

In my experience, fixed point iteration resulted very slow. I had better results with Levenberg-Marquardt.

Posted in Uncategorized | Tagged , , , , , , , , | Leave a comment

Orthogonal least squares fitting of a sphere

Recently I presented a linear method to obtain centre and radius of a sphere given four or more points on its surface, not all beginning to the same plane.

This method is not completely satisfactory when working with more than four points if the points are affected by errors (that is always, if the points come from measurement). The least squares solution is the set of values \alpha, \beta, \gamma, \delta which minimizes the (squared) 2-norm of the residual (algebraic fitting):

\sum_{i=1}^m (\alpha x_i + \beta y_i + \gamma z_i - \delta)^2

that is

\sum_{i=1}^m ((x_i - x_c)^2 + (y_i - y_c)^2 + (z_i - z_c)^2 - r^2)^2

This quantity is not very significant. Yes, it is related to the goodness of fit: it is zero if all the points lie on the sphere and increases as they depart, but its geometric meaning is awkward, as each term of the sum is proportional to the squared area of the annulus between a circumference centered in \left[ x_c \hspace{1 mm} y_c \hspace{1 mm} z_c \right] and passing through \left[ x_i \hspace{1 mm} y_i \hspace{1 mm} z_i \hspace{1 mm} \right], and a circumference with same centre and of radius r.

The squared residuals of the equations for other shapes will in general have a different geometric meaning, so that what one minimizes in this way depends on the shape.

Even more important, if the points are not evenly distributed on the sphere but are lumped in a small zone, this method will exhibit a significant bias, finding a centre too close to the lump and a radius too small. This because the area of the annulus decreases much more by moving the centre toward the lump and correspondingly decreasing the radius, than by moving the centre sideways to adjust the thickness of the annulus.

What we really should minimize is the sum of the unsigned distances from each point to the sphere; but this quantity has a discontinuous gradient in its minima, so that it is easier to minimize the sum of the squared distances from each point to the sphere (*):

\sum_{i=1}^m (\sqrt{(x_i - x_c)^2 + (y_i - y_c)^2 + (z_i - z_c)^2} - r)^2

This is said geometric fitting, or orthogonal fitting, or orthogonal regression. When fitting a function, the term total least squares is often found in literature, because the method takes into account not only errors on the ordinates but on the abscissas too.

Unfortunately, this problem is nonlinear, and harder to solve than algebraic fitting.

In my next post, I will describe a little variant to the method presented by Dave Eberly.

(*) Addendum 120920. I later learned that Giuseppe Calafiore of Politecnico di Torino devised a method for n-dimensional spherical and ellipsoidal fitting based on SDP (Semi-Definite Programming) which can easily deal with the L_1 norm too.

Posted in Uncategorized | Tagged , , , , , , , , , , , , | 1 Comment

Linear least squares fitting of a circumference in 3D

In my last post I described how it is possible to find the centre and radius of a sphere given four or more points on its surface and not belonging to the same plane.

What happens if the points do belong to the same plane?

Let me assume that the points belong to a circumference on the sphere (the intersection of the sphere with the plane). Be the points \left[ x_i y_i z_i \right], i=1 \cdots m, with m \geq 3. The linear system

(1) \alpha x_i + \beta y_i + \gamma z_i + \delta = \epsilon_i

in the unknowns \alpha, \beta, \gamma, \delta (see my previous post) is underdetermined (rank 3) and does not individuate one single sphere, but rather a pencil of spheres sharing the common circumference.

So in this conditions the method is not useful for finding a sphere, but it can be used to find the circumference in 3D containing the points: here is how.

The m x 4 matrix of the system, A=\left[ \begin{array}{cccc} x_1 & y_1 & z_1 & 1 \\ \cdots & \cdots & \cdots & \cdots \\ x_m & y_m & z_m & 1\end{array} \right], can be decomposed in singular values as

A = U \cdot W \cdot V^T,

where U (m x 4) is the matrix having the left singular vectors as columns, W (diagonal 4 x 4) is the matrix of the singular values w_{1 1}, \cdots w_{4 4} and V (4 x 4) is the matrix having the right singular vectors as columns.

Being 3 the rank of the matrix, one of the singular values is zero (within machine precision) and the corresponding right singular vector spans the nullspace of A.

As all the points lie on the circumference, the second hand vector \left[ \epsilon_1 \cdots \epsilon_m\right] ^T lies in the range of A and a solution can be found as

(2) \left[ \alpha^{(0)} \hspace{1 mm} \beta^{(0)} \hspace{1 mm} \gamma^{(0)} \hspace{1 mm} \delta^{(0)}\right] ^T = V W' U^T \left[ \epsilon_1 \cdots \epsilon_m\right] ^T

remembering that V^{-1} = V^T and U^{-1} = U^T; W' is the diagonal matrix having w'_{i i} = \dfrac{1}{w_{i i}} if w_{i i} > 0 and w'_{i i} = 0 if w_{i i} = 0.

As always with underdetermined system, there is an infinity of solutions, obtained by adding to (2) any vector in the nullspace. In this case, the nullspace is spanned by the right singular vector corresponding to the null singular value, say v = \left[ v_{\alpha} \hspace{1 mm} v_{\beta} \hspace{1 mm} v_{\gamma} \hspace{1 mm} v_{\delta} \right]^T.

So \left[ \alpha \hspace{1 mm} \beta \hspace{1 mm} \gamma \hspace{1 mm} \delta \right]^T = \left[ \alpha^{(0)} \hspace{1 mm} \beta^{(0)} \hspace{1 mm} \gamma^{(0)} \hspace{1 mm} \delta^{(0)}\right]^T + \tau \left[ v_{\alpha} \hspace{1 mm} v_{\beta} \hspace{1 mm} v_{\gamma} \hspace{1 mm} v_{\delta}  \right]^T is the parametric equation of the pencil of spheres sharing the circumference one is looking for.

Now, the circumference will be an equator of the smallest sphere in the pencil. So, remembering from my previous post that \alpha = - 2 x_c\beta = - 2 y_c, \gamma = - 2 z_c and \delta = x_c^2 + y_c^2 + z_c^2 - r^2,

\min_{\tau} r^2 = \min_{\tau} x_c^2 + y_c^2 + z_c^2 - \delta = \min_{\tau} \dfrac{1}{4} (\alpha^2 + \beta^2 + \gamma^2) - \delta

= \min_{\tau} \dfrac{1}{4} ((\alpha^{(0)} + \tau v_{\alpha})^2 + (\beta^{(0)} + \tau v_{\beta})^2 + (\gamma^{(0)} + \tau v_{\gamma})^2) - \delta^{(0)} - \tau v_{\delta}

By equating to zero the derivative of the expression to be minimized one gets:

\tau = \dfrac{2 v_{\delta} - \alpha^{(0)} v_{\alpha} - \beta^{(0)} v_{\beta} - \gamma^{(0)} v_{\gamma}}{v_{\alpha}^2 + v_{\beta}^2 + v_{\gamma}^2}

whence the coordinates x_c, y_c, z_c of the centre and the radius r of the circumference can be obtained, while the the plane of the circumference is orthogonal to \left[ v_{\alpha} \hspace{1 mm} v_{\beta} \hspace{1 mm} v_{\gamma} \hspace{1 mm} \right]^T.

Posted in Uncategorized | Tagged , , , , , | Leave a comment

Linear least squares fitting of a sphere

The equation of a sphere with centre in \left[ x_c, y_c, z_c \right] and radius r is

(x - x_c)^2 + (y - y_c)^2 + (z - z_c)^2 = r^2


x^2 - 2 x x_c + x_c^2 + y^2 - 2 y y_c + y_c^2 + z^2 - 2 z z_c + z_c^2 = r^2

that is

(1)  \alpha x + \beta y + \gamma z + \delta = \epsilon


\alpha = - 2 x_c

\beta = - 2 y_c

\gamma = - 2 z_c

\delta = x_c^2 + y_c^2 + z_c^2 - r^2

\epsilon = - x^2 - y^2 - z^2

Writing (1) for four points on the sphere, \left[ x_i \hspace{1 mm} y_i \hspace{1 mm} z_i \right], i=1 \cdots 4, not belonging to the same plane, one gets a linear system of rank 4 in the four unknowns \alpha, \beta, \gamma, \delta; then of course

x_c = - \dfrac{1}{2} \alpha

y_c = - \dfrac{1}{2} \beta

z_c = - \dfrac{1}{2} \gamma

r = \sqrt{x_c^2 + y_c^2 + z_c^2 - \delta}

If more than four points on the sphere are given, and they don’t belong all to the same plane, one gets an overdetermined linear system of rank 4 whose solution, eg with the Singular Value Decomposition method (SVD), is an approximation of the sphere in the least squares sense.

Posted in Uncategorized | Tagged , , , , | Leave a comment