In defense of H C Longuet-Higgins

Not that the late Hugh Christopher Longuet-Higgins needs any defense from me, of course. I chose this startling headline to express my surprise at finding, during an internet search, a paper titled Mathematical Flaws in the Essential Matrix Theory, by one Tayeb Basta.

I am using the essential matrix in a visual metrology application and it works just fine, but nonetheless this title caused me some uneasiness.

What Longuet-Higgins did in his seminal paper A Computer Algorithm for Reconstructing a Scene from Two Projections was to introduce a new, linear method for computing the essential matrix, which had already been known in photogrammetry for years, though not under this name (and had been used there, I suppose). There was not much exchange between the computer vision and photogrammetry communities in that period, so Longuet-Higgins must also be credited with introducing the concept to computer vision. Moreover, he experimented a lot with his algorithm, finding and pointing out precisely a number of circumstances in which it fails. These cannot be considered flaws, and they do not occur easily in practical cases; rather, they must be intentionally arranged.

Later, Richard Hartley and Olivier Faugeras elaborated on the concept of the essential matrix, introducing the fundamental matrix. This derived entity, too, is used extensively in computer vision, and has therefore been tested extensively.

So how reliable is the claim of a newly found mathematical flaw? Maybe it is a kind of numerical instability or inaccuracy, similar to what Hartley cured with normalization? Or is it something that springs up only in very special situations? Maybe the author has found more breakdown circumstances similar to the ones already pointed out by Longuet-Higgins himself? One just has to read the paper linked above to find out.

It turns out that it is nothing of the kind. To summarize: the main property of the essential matrix is that \mathbf{X}'^T E \mathbf{X}=0, where E is the essential matrix and \mathbf{X}=[X_1, X_2, X_3]^T, \mathbf{X}'=[X'_1,X'_2,X'_3]^T are the coordinates of the same 3D point in the reference frames associated with the two cameras. The alleged flaw is that the expression \mathbf{X}'^T E \mathbf{X} mixes vectors from two different reference frames (this is true), and that it must therefore be meaningless (this is false).

The author essentially claims that \mathbf{X}' is a vector in a certain reference frame, so that \mathbf{V}^T=\mathbf{X}'^T E must be a different vector in the same frame, while \mathbf{X} is a vector in a different frame, so that the scalar product \mathbf{V}^T \mathbf{X} of two vectors in two different frames is meaningless. So, by the way, this would not be a subtle flaw causing instability or inaccuracy, or one springing up only in very special situations: it would be a paramount, gigantic, trivial flaw completely wrecking the whole algorithm, and the concept of the essential matrix itself, in any situation. Luckily the essential matrix did not read the paper, so it continues to exist, and to work fine.

Why on earth should \mathbf{V}^T=\mathbf{X}'^T E be a vector in the same frame as \mathbf{X}'? Multiplying a vector by a rotation matrix does change its reference frame, and the essential matrix does contain a rotation matrix. So in the end the two vectors \mathbf{V} and \mathbf{X} are in the same reference frame and their scalar product is meaningful.
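To make this concrete, here is a minimal numerical sketch in plain Python (standard library only). It assumes the convention \mathbf{X}'=R \mathbf{X}+\mathbf{t} and the form E=-[\mathbf{t}]_{\times} R of the essential matrix; the rotation, the translation, the point and all helper names are arbitrary choices made for illustration. It suggests that \mathbf{V} is just the primed-frame vector \mathbf{t} \times \mathbf{X}' rotated into the unprimed frame by R^T, which is why its scalar product with \mathbf{X} makes sense.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def skew(a):
    # Matrix [a]x such that [a]x b equals the cross product a x b.
    return [[0.0, -a[2], a[1]],
            [a[2], 0.0, -a[0]],
            [-a[1], a[0], 0.0]]

def matvec(M, v):
    return [dot(row, v) for row in M]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def rotation(axis, angle):
    # Rodrigues' formula: R = cos(a) I + sin(a) [k]x + (1 - cos(a)) k k^T.
    norm = math.sqrt(dot(axis, axis))
    k = [x / norm for x in axis]
    K = skew(k)
    c, s = math.cos(angle), math.sin(angle)
    return [[c * (i == j) + s * K[i][j] + (1 - c) * k[i] * k[j]
             for j in range(3)] for i in range(3)]

# Arbitrary illustrative values (assumptions, not taken from any paper).
R = rotation([1.0, 2.0, -1.0], 0.7)
t = [0.3, -0.2, 1.0]
X = [0.5, -1.0, 4.0]                            # point in the unprimed frame
Xp = [a + b for a, b in zip(matvec(R, X), t)]   # same point, primed frame

# Assumed form of the essential matrix: E = -[t]x R.
E = [[-e for e in row] for row in matmul(skew(t), R)]

# V^T = X'^T E, i.e. V = E^T X'.
V = matvec(transpose(E), Xp)

# The same vector built step by step: t x X' lives in the primed frame,
# and R^T rotates it into the unprimed frame.
V_explicit = matvec(transpose(R), cross(t, Xp))

print("V          =", V)
print("R^T (t x X') =", V_explicit)
print("V . X      =", dot(V, X))   # zero up to rounding
```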

I will now derive the equation \mathbf{X}'^T E \mathbf{X}=0 using a notation different from that of Longuet-Higgins, whose notation is a little cumbersome (but nonetheless correct). I am speaking of two cameras looking at the same 3D point \mathbf{P}. In the pinhole camera model, the cameras have optical centres \mathbf{C}, \mathbf{C}' and each image of \mathbf{P} is the intersection of the retinal (sensor) plane with the straight line (optical ray) through \mathbf{P} and the optical centre.

The three points \mathbf{P}, \mathbf{C} and \mathbf{C}' define a plane, so the three vectors joining them are coplanar and their mixed product (\mathbf{P}-\mathbf{C}') \cdot (\mathbf{C}'-\mathbf{C}) \times (\mathbf{P}-\mathbf{C}) must vanish; this fact is known as the epipolar constraint.
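A small sketch in plain Python may help to see what the constraint says. All coordinates below are arbitrary illustrative values expressed in one common frame, and the helper names are hypothetical. The mixed product vanishes when both rays go to the same point \mathbf{P}; when the ray from the second camera goes to a different point (here called Q, an assumption added for contrast), it generally does not.

```python
def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def mixed(a, b, c):
    # Mixed (scalar triple) product a . (b x c).
    return dot(a, cross(b, c))

# Arbitrary illustrative values, all in the same reference frame.
C = [0.0, 0.0, 0.0]     # optical centre of the first camera
Cp = [1.0, 0.2, -0.1]   # optical centre of the second camera
P = [0.5, -1.0, 4.0]    # the observed 3D point
Q = [0.5, 0.5, 4.0]     # a different 3D point, used only for contrast

# Both rays go to P: the three vectors are coplanar.
same_point = mixed(sub(P, Cp), sub(Cp, C), sub(P, C))

# Ray from C' goes to Q, ray from C goes to P: not coplanar in general.
different_points = mixed(sub(Q, Cp), sub(Cp, C), sub(P, C))

print("same point:      ", same_point)
print("different points:", different_points)
```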

Each camera has a reference frame associated with it, with its origin at the optical centre. Let the transformation between the two frames be \mathbf{X}'=R \mathbf{X}+\mathbf{t} or equivalently \mathbf{X}=R^T \mathbf{X}'-R^T \mathbf{t} = R^T \mathbf{X}' + \mathbf{T}, where \mathbf{T}=-R^T \mathbf{t}. The following table lists how some entities read in the two reference frames:

Entity | Unprimed frame | Primed frame
\mathbf{C} | [0, 0, 0]^T | \mathbf{t}=[t_1, t_2, t_3]^T
\mathbf{C}' | \mathbf{T}=[T_1, T_2, T_3]^T | [0, 0, 0]^T
\mathbf{P} | \mathbf{X}=[X_1, X_2, X_3]^T | \mathbf{X}'=[X'_1, X'_2, X'_3]^T
\mathbf{P}-\mathbf{C} | \mathbf{X}=R^T (\mathbf{X}'-\mathbf{t}) | \mathbf{X}'-\mathbf{t}=R \mathbf{X}
\mathbf{P}-\mathbf{C}' | \mathbf{X}-\mathbf{T} = R^T \mathbf{X}' | \mathbf{X}' = R (\mathbf{X}-\mathbf{T})
\mathbf{C}'-\mathbf{C} | \mathbf{T} = - R^T \mathbf{t} | -\mathbf{t} = R \mathbf{T}
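The entries of the table can be checked numerically. The following is a minimal sketch in plain Python, assuming the convention \mathbf{X}'=R \mathbf{X}+\mathbf{t} and \mathbf{T}=-R^T \mathbf{t} stated above; the numerical values and the helper names are arbitrary and purely illustrative.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def matvec(M, v):
    return [dot(row, v) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def rotation(axis, angle):
    # Rodrigues' formula: R = cos(a) I + sin(a) [k]x + (1 - cos(a)) k k^T.
    norm = math.sqrt(dot(axis, axis))
    k = [x / norm for x in axis]
    K = [[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]]
    c, s = math.cos(angle), math.sin(angle)
    return [[c * (i == j) + s * K[i][j] + (1 - c) * k[i] * k[j]
             for j in range(3)] for i in range(3)]

def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for x, y in zip(a, b))

# Arbitrary illustrative values.
R = rotation([0.2, -1.0, 0.5], 1.1)
Rt = transpose(R)
t = [0.4, 0.1, -0.8]
X = [1.5, 0.3, 6.0]

Xp = add(matvec(R, X), t)           # X' = R X + t
T = [-x for x in matvec(Rt, t)]     # T = -R^T t
zero = [0.0, 0.0, 0.0]

checks = {
    "C' maps to the primed origin": close(add(matvec(R, T), t), zero),
    "P - C,  unprimed: X = R^T (X' - t)": close(X, matvec(Rt, sub(Xp, t))),
    "P - C,  primed:   X' - t = R X": close(sub(Xp, t), matvec(R, X)),
    "P - C', unprimed: X - T = R^T X'": close(sub(X, T), matvec(Rt, Xp)),
    "P - C', primed:   X' = R (X - T)": close(Xp, matvec(R, sub(X, T))),
    "C' - C, primed:   -t = R T": close([-x for x in t], matvec(R, T)),
}

for name, ok in checks.items():
    print(name, "->", ok)
```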

Using the table above, I can express the equation (\mathbf{P}-\mathbf{C}') \cdot (\mathbf{C}'-\mathbf{C}) \times (\mathbf{P}-\mathbf{C}) = 0 in either frame. In the unprimed frame it reads (R^T \mathbf{X}')^T \cdot \mathbf{T} \times \mathbf{X} = \mathbf{X}'^T R [\mathbf{T}]_{\times} \mathbf{X}=0, while in the primed frame it reads - \mathbf{X}'^T \cdot \mathbf{t} \times R \mathbf{X} = -\mathbf{X}'^T [\mathbf{t}]_{\times} R \mathbf{X}=0, where [\mathbf{a}]_{\times} is short for the skew-symmetric matrix \begin{bmatrix}0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0\end{bmatrix}, which satisfies [\mathbf{a}]_{\times} \mathbf{b} = \mathbf{a} \times \mathbf{b}.

So R [\mathbf{T}]_{\times} and -[\mathbf{t}]_{\times} R are two equivalent expressions for the essential matrix; the first is the one used by Longuet-Higgins in his paper. In both cases, all the quantities involved are expressed in the same reference frame and the equations are meaningful.
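As a sanity check, the sketch below (plain Python, standard library only) builds both expressions for an arbitrary rotation and translation, compares them entry by entry, and evaluates \mathbf{X}'^T E \mathbf{X} for a few points. The numerical values and the helper names are illustrative assumptions; the convention is the one used above, \mathbf{X}'=R \mathbf{X}+\mathbf{t} with \mathbf{T}=-R^T \mathbf{t}.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def skew(a):
    # Matrix [a]x such that [a]x b equals the cross product a x b.
    return [[0.0, -a[2], a[1]],
            [a[2], 0.0, -a[0]],
            [-a[1], a[0], 0.0]]

def matvec(M, v):
    return [dot(row, v) for row in M]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def rotation(axis, angle):
    # Rodrigues' formula: R = cos(a) I + sin(a) [k]x + (1 - cos(a)) k k^T.
    norm = math.sqrt(dot(axis, axis))
    k = [x / norm for x in axis]
    K = skew(k)
    c, s = math.cos(angle), math.sin(angle)
    return [[c * (i == j) + s * K[i][j] + (1 - c) * k[i] * k[j]
             for j in range(3)] for i in range(3)]

# Arbitrary illustrative values.
R = rotation([0.3, 1.0, 0.4], -0.9)
t = [1.0, -0.3, 0.2]
T = [-x for x in matvec(transpose(R), t)]   # T = -R^T t

# The two expressions for the essential matrix.
E_unprimed = matmul(R, skew(T))                                # R [T]x
E_primed = [[-e for e in row] for row in matmul(skew(t), R)]   # -[t]x R

max_diff = max(abs(E_unprimed[i][j] - E_primed[i][j])
               for i in range(3) for j in range(3))
print("largest entry difference:", max_diff)

# X'^T E X for a few arbitrary 3D points given in the unprimed frame.
points = [[0.5, -1.0, 4.0], [-2.0, 0.7, 9.0], [3.0, 3.0, 5.5]]
residuals = []
for X in points:
    Xp = [a + b for a, b in zip(matvec(R, X), t)]   # X' = R X + t
    residuals.append(dot(Xp, matvec(E_unprimed, X)))
print("residuals X'^T E X:", residuals)
```

Both printed quantities should be zero up to floating-point rounding.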


3 Responses to In defense of H C Longuet-Higgins

  1. Tayeb Basta says:

    Hello there,

    Thank you very much for defending the essential matrix theory. I recognize that I did not realize what is reported in the above blog: “Why on earth should V=X'E be a vector in the same frame of X'? Multiplying a vector by a rotation matrix does change its reference frame, and the essential matrix does contain a rotation matrix. So at last the two vectors V and X are in the same reference frame and their scalar product is meaningful.”
    You are completely right and the expression X'EX is well defined.
    Tayeb Basta

