Gaze tracking as a novel input method

Smartphones and tablets usually have a camera on their back, to take photographs, and a frontal camera for videoconferencing.

smartphone5

 

 

 

In a recent model (Samsung Galaxy S4) the frontal camera can be used as an input device too: it can suspend the current operation if the user looks away or disappears, and it can react to gestures by scrolling or clicking. This is probably the best that can be done with one single frontal camera.

 

 

 

 

smartphone6

 

 

With two frontal cameras, a gaze tracking function could be added, precise enough to point at specific locations on the screen, thus complementing or replacing a touch screen. (The expressions gaze tracking and eye tracking are interchangeable in common use but, to be precise, eye tracking means tracking the movements of the eye with respect to the head, while gaze tracking means tracking them with respect to a fixed reference frame. So here I am speaking specifically of gaze tracking, with respect to a reference frame fixed to the device.)

 

 

 

smartphone7

 

 

 

 

The two cameras could be positioned at the top left and top right corners or, to maximize their distance (baseline), at two diagonally opposite corners.

The cameras should be attached to a common rigid frame for stability.

 

 

 

 

(Meta-note: Keeping text and images aligned in WordPress is a nightmare. Even when everything looks fine, there is no guarantee that it will stay that way, probably because of updates to the WordPress platform. Should the post become unreadable due to bad alignment, please tell me in a comment and I will try to fix it.)

Eye/gaze tracking has been used in psychological research since the beginning of the last century. Leaving aside intrusive techniques that require special contact lenses or electrodes, there are now a number of commercially available devices based on digital processing of the image of the eye; Nuia, SMI, Tobii and LC Technologies are the principal vendors. I also know of an open source application based on OpenCV. Applications range from gaming to disability support. All these products, to my knowledge, need special illumination to create corneal reflections and, in most cases, a calibration is necessary to adapt the system to a new user. The need for special illumination makes gaze tracking hard to apply to small, portable devices.

I have developed a method for the visual measurement of holes in a metal sheet; it relies on detecting the edge of the hole, and is suitable for an iris too. No corneal reflection is needed. The geometric calibration of the two cameras (internal and external parameters and distortion correction coefficients) has to be done only once and is valid for all users; a new user would only be asked to look at a couple of points on the display, in order to determine the dominant eye and the angle between the gaze direction and the normal to the iris.
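The per-user angle mentioned above could be estimated along these lines. This is only a minimal sketch, not the procedure used in the experiments: it assumes that the 3D iris centre and normal are already known in the device frame, and that the user is looking at a calibration target whose position is known in the same frame. The function name `gaze_offset_deg` and all the numbers are hypothetical.

```python
import math

def gaze_offset_deg(iris_centre, iris_normal, target):
    """Angle in degrees between the iris normal and the line from the iris centre to the target."""
    to_target = [t - c for t, c in zip(target, iris_centre)]
    dot = sum(a * b for a, b in zip(to_target, iris_normal))
    norms = math.hypot(*to_target) * math.hypot(*iris_normal)
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norms))))

# Hypothetical numbers (mm): the iris is 100 mm in front of the device and its normal
# points straight at it; the target lies 100 mm to the side, so the angle is about 45 degrees.
print(gaze_offset_deg((0.0, 0.0, 100.0), (0.0, 0.0, -1.0), (100.0, 0.0, 0.0)))
```

Repeating the measurement for a couple of targets and averaging would reduce the effect of noise in the reconstructed normal.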

I made some experiments with what I had at hand, that is, a measuring rig I was using for metal sheets. It has two high-resolution (5 MP) cameras with a baseline of about 500 mm. That is not exactly the size of a smartphone, but there is no reason why the method shouldn’t work there too.

First, the irides must be found. Several algorithms are available for this purpose. As this is not a central point of this discussion, here I just select by hand the ROIs (Regions Of Interest) around the irides:

ROI

Next, edgels are extracted inside the ROIs:

edgels 

Edgels correspond to abrupt transitions from dark to light, and their position can be computed with sub-pixel accuracy by exploiting the information carried by the grey levels. Here we find many edgels on the border of the irides, which are what we were looking for, but we also find many spurious edgels on the eyelids, on the eyelashes and around reflections inside the irides. The edgels of interest lie along an almost continuous arc around the irides, while the spurious edgels are organized in short segments; this makes it possible to prune most of them:

pruned
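The two ideas in the paragraph above can be illustrated with a minimal sketch. It is not the code used for the images in this post, and it rests on simplifying assumptions: the sub-pixel position is found along a single scan line by fitting a parabola to the gradient magnitude, and edgels are grouped into chains simply by proximity. The function names and the thresholds `max_gap` and `min_length` are hypothetical.

```python
import math

def subpixel_edge(profile):
    """Locate the strongest edge along a 1D grey-level profile with sub-pixel precision.

    The gradient-magnitude peak is refined by fitting a parabola through the peak
    and its two neighbours.
    """
    # Central-difference gradient magnitude at the interior samples 1 .. n-2.
    grad = [abs(profile[i + 1] - profile[i - 1]) / 2.0 for i in range(1, len(profile) - 1)]
    k = max(range(len(grad)), key=grad.__getitem__)
    offset = 0.0
    if 0 < k < len(grad) - 1:
        denom = grad[k - 1] - 2.0 * grad[k] + grad[k + 1]
        if denom != 0.0:
            offset = 0.5 * (grad[k - 1] - grad[k + 1]) / denom
    return (k + 1) + offset  # +1 because grad[0] belongs to profile[1]

def prune_short_chains(edgels, max_gap=2.5, min_length=10):
    """Group edgels (x, y) into chains of neighbours and drop the chains that are too short."""
    parent = list(range(len(edgels)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two edgels belong to the same chain if they are closer than max_gap pixels.
    for i, (xi, yi) in enumerate(edgels):
        for j in range(i + 1, len(edgels)):
            xj, yj = edgels[j]
            if math.hypot(xi - xj, yi - yj) <= max_gap:
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(len(edgels)):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return [p for i, p in enumerate(edgels) if sizes[find(i)] >= min_length]
```

In a real image the grey-level profile would be sampled across the edge, along the local gradient direction, rather than along a fixed scan line.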

The next step is to fit ellipses to the surviving edgels. The fitting is preceded by a statistical selection with RANSAC to reject the few outliers (spurious edgels) which are still there.

fitted
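As an illustration of the RANSAC selection, here is a minimal sketch under stated assumptions; it is not the implementation behind the images. Each hypothesis is the conic through five randomly chosen edgels, written as A x² + B xy + C y² + D x + E y = 1 in coordinates shifted to the centroid of the sample. Hypotheses that are not ellipses are discarded, and inliers are counted with the Sampson (first-order geometric) distance. The function names, the tolerance `tol` and the iteration count are hypothetical, and the final least-squares fit on the inliers is not shown.

```python
import math
import random

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting; None if singular."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def conic_through(points):
    """Coefficients (A, B, C, D, E) of A x^2 + B xy + C y^2 + D x + E y = 1 through 5 points."""
    rows = [[x * x, x * y, y * y, x, y] for x, y in points]
    return solve_linear(rows, [1.0] * 5)

def sampson_distance(conic, x, y):
    """First-order approximation of the distance from (x, y) to the conic."""
    A, B, C, D, E = conic
    f = A * x * x + B * x * y + C * y * y + D * x + E * y - 1.0
    g = math.hypot(2 * A * x + B * y + D, B * x + 2 * C * y + E)
    return abs(f) / g if g > 0 else float("inf")

def ransac_ellipse(edgels, tol=0.5, iterations=200, seed=0):
    """Return (origin, conic, inliers) for the ellipse hypothesis with the most inliers."""
    rng = random.Random(seed)
    best = ((0.0, 0.0), None, [])
    for _ in range(iterations):
        sample = rng.sample(edgels, 5)
        # Shift to the sample centroid so that the conic does not pass through the origin.
        cx = sum(p[0] for p in sample) / 5.0
        cy = sum(p[1] for p in sample) / 5.0
        conic = conic_through([(x - cx, y - cy) for x, y in sample])
        if conic is None:
            continue
        A, B, C = conic[:3]
        if B * B - 4 * A * C >= 0:
            continue  # parabola or hyperbola: not an ellipse
        inliers = [p for p in edgels if sampson_distance(conic, p[0] - cx, p[1] - cy) < tol]
        if len(inliers) > len(best[2]):
            best = ((cx, cy), conic, inliers)
    return best
```

The returned conic is expressed relative to the returned origin; the inliers are the edgels that would be passed on to the final fit.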

Note that the ellipses fit well even though edgels are missing in the portions of their perimeters hidden by the eyelids.

Each iris ellipse is the directrix of a cone with its vertex at the optical center of the camera; the two cones built on the left and right images of the same iris intersect in 3D space, and this intersection contains the 3D iris ellipse:

coni1
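In homogeneous coordinates such a cone has a compact expression: if C is the symmetric 3×3 matrix of the image ellipse and P is the 3×4 projection matrix of the camera, the cone is the quadric Q = PᵀCP, a standard result of projective geometry. The sketch below builds Q for a made-up camera and a made-up image conic and checks that a 3D point projecting onto the conic lies on the cone. The helper names and all the numbers are hypothetical, and the reconstruction of the 3D ellipse from the two cones is not implemented here.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def backprojection_cone(P, C):
    """4x4 quadric of the cone that back-projects the image conic C through the camera P."""
    return matmul(transpose(P), matmul(C, P))

def quadric_value(Q, X):
    """X^T Q X for a homogeneous 3D point X; zero when X lies on the quadric."""
    return sum(X[i] * Q[i][j] * X[j] for i in range(4) for j in range(4))

# Hypothetical camera at the origin: focal length 1000 px, principal point (320, 240).
P = [[1000, 0, 320, 0],
     [0, 1000, 240, 0],
     [0, 0, 1, 0]]

# Hypothetical image conic: circle of radius 50 px centred on the principal point,
# (x - 320)^2 + (y - 240)^2 - 50^2 = 0, written as a symmetric matrix.
C = [[1, 0, -320],
     [0, 1, -240],
     [-320, -240, 320 ** 2 + 240 ** 2 - 50 ** 2]]

Q = backprojection_cone(P, C)

# The 3D point (25, 0, 500) projects to pixel (370, 240), which is on the circle.
print(quadric_value(Q, [25, 0, 500, 1]))  # prints 0
# The camera centre is the vertex of the cone: Q times the centre is the zero vector.
print([sum(Q[i][j] * v for j, v in enumerate([0, 0, 0, 1])) for i in range(4)])  # prints [0, 0, 0, 0]
```

With two calibrated cameras there are two such quadrics, one per image, and the 3D iris ellipse lies on both.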

The algorithm to reconstruct a 3D conic from two images was first described by Long Quan in his paper Conic Reconstruction and Correspondence from Two Views, IEEE-PAMI vol. 18, no. 2, Feb. 1996. Cordelia Schmid and Andrew Zisserman, in The Geometry and Matching of Lines and Curves over Multiple Views, present a substantially similar but formally more elegant approach.

To make sure that the reconstruction of the 3D ellipses has been successful, I reproject them back onto the images:

backprojected

They fit perfectly.
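A reprojection check of this kind can be sketched as follows, assuming an ideal pinhole camera without lens distortion. Points are sampled on the 3D ellipse and projected with a 3×4 camera matrix; in practice they would then be compared with the edgels in each image. The function names, the camera matrix and the ellipse are hypothetical, and `major_dir` (the direction of the major axis) is an extra input that must not be parallel to the normal.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]]

def ellipse_points_3d(centre, normal, major_dir, a, b, n=72):
    """Sample n points on a 3D ellipse given its centre, plane normal and semi-axes a, b."""
    n_hat = normalize(normal)
    # Remove from major_dir its component along the normal, then complete the basis.
    d = sum(m * k for m, k in zip(major_dir, n_hat))
    u = normalize([m - d * k for m, k in zip(major_dir, n_hat)])
    v = cross(n_hat, u)
    points = []
    for i in range(n):
        t = 2.0 * math.pi * i / n
        points.append(tuple(c + a * math.cos(t) * ui + b * math.sin(t) * vi
                            for c, ui, vi in zip(centre, u, v)))
    return points

def project(P, X):
    """Project a 3D point with a 3x4 pinhole matrix; returns pixel coordinates (u, v)."""
    Xh = (X[0], X[1], X[2], 1.0)
    x, y, w = (sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3))
    return x / w, y / w

# Hypothetical camera: focal length 1000 px, principal point (320, 240), at the origin.
P = [[1000.0, 0.0, 320.0, 0.0],
     [0.0, 1000.0, 240.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]

# Hypothetical iris: a circle of radius 6 mm facing the camera at a distance of 500 mm.
iris = ellipse_points_3d((0.0, 0.0, 500.0), (0.0, 0.0, -1.0), (1.0, 0.0, 0.0), 6.0, 6.0)
pixels = [project(P, X) for X in iris]
# Every projected point should lie 1000 * 6 / 500 = 12 px from the principal point.
print(max(abs(math.hypot(u - 320.0, v - 240.0) - 12.0) for u, v in pixels))
```

The printed value is the largest deviation from the expected image circle and should be at the level of floating-point rounding.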

To make the three-dimensionality evident, here is an animation of the reconstructed irides in a reference frame fixed to the measuring rig, with its origin in front of the nose, the X axis pointing left, Y upwards and Z towards the face:

movie

| (units: mm)      | Left iris            | Right iris           |
| Centre X; Y; Z   | -28,79; 6,51; 126,37 | 33,14; 7,00; 128,80  |
| Normal X; Y; Z   | 0,088; 0,044; -0,995 | 0,063; -0,005; -0,998 |
| Major semiaxis   | 6,46                 | 6,47                 |
| Minor semiaxis   | 5,84                 | 5,98                 |
| Average diameter | 12,30                | 12,44                |

The distance between the centers of the two irides is 61,98 mm.
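The distance quoted above follows directly from the two centres listed in the table. A short check, with the decimal commas written as points:

```python
import math

# Iris centres from the table above, in mm.
left_centre = (-28.79, 6.51, 126.37)
right_centre = (33.14, 7.00, 128.80)

# Euclidean distance between the two 3D points.
distance = math.dist(left_centre, right_centre)
print(f"{distance:.2f} mm")  # prints "61.98 mm"
```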

It remains to be noted that all the algorithms dealing with edgels (extraction, grouping, pruning) are well suited to highly parallel computation; for example, each pixel can be examined independently to determine whether it contains an edgel. RANSAC involves randomly choosing a number of minimal subsets of edgels and fitting an ellipse to each of them; here too a high degree of parallelism can be attained, because each subset can be dealt with independently. Lastly, each of the two object ellipses can be reconstructed independently. So the whole process described here is highly parallelizable.
