Soumil Chugh

Papers from this author

Detection and Correspondence Matching of Corneal Reflections for Eye Tracking Using Deep Learning

Soumil Chugh, Braiden Brousseau, Jonathan Rose, Moshe Eizenman

Auto-TLDR: A Fully Convolutional Neural Network for Corneal Reflection Detection and Matching in Extended Reality Eye Tracking Systems

Eye tracking systems that estimate the point-of-gaze are essential in extended reality (XR) systems, as they enable new interaction paradigms and technological improvements. These systems must maintain accuracy when the headset moves relative to the head (known as device slippage) because of head movements or user adjustments. One of the most accurate eye tracking techniques, which is also insensitive to shifts of the system relative to the head, uses two or more infrared (IR) light-emitting diodes (LEDs) to illuminate the eye and an IR camera to capture images of the eye. An essential step in estimating the point-of-gaze in these systems is the precise determination of the locations of two or more corneal reflections (virtual images of the IR LEDs that illuminate the eye) in the eye images. Eye trackers tend to use multiple light sources to ensure that at least one pair of corneal reflections is visible at each gaze position. The use of multiple light sources, however, introduces a difficult problem: each corneal reflection must be matched with its corresponding light source over the range of expected eye movements. Corneal reflection detection and matching often fail in XR systems because the camera is close to the eye and the light sources illuminate the eye at steep angles. The failures are caused by corneal reflections that vary in shape and intensity or disappear as the eye rotates, and by the presence of spurious reflections. We have developed a fully convolutional neural network, based on the U-Net architecture, that solves the detection and matching problem in the presence of spurious and missing reflections. Eye images of 25 people were collected in a virtual reality (VR) headset using a binocular eye tracking module with five IR light sources per eye. A set of 4,000 eye images was manually labelled for each of the corneal reflections, and data augmentation was used to expand it into a dataset of 40,000 images.
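The abstract does not say how the network encodes its output. One plausible encoding, assumed here purely for illustration, is a stack of heatmaps with one channel per IR light source: a reflection's channel index then identifies the LED that produced it, and a channel without a strong peak marks a missing reflection. The sketch below decodes such a stack with NumPy. The function name `decode_reflections`, the 0.5 threshold, and the 3×3 weighted-centroid refinement are hypothetical choices, not details of the published method.

```python
import numpy as np


def decode_reflections(heatmaps: np.ndarray, threshold: float = 0.5, radius: int = 1):
    """Convert per-LED heatmaps into matched corneal-reflection centres.

    ASSUMPTION (illustrative, not from the paper): `heatmaps` has shape
    (num_leds, H, W) with non-negative scores in [0, 1], and channel k
    responds only to the reflection of light source k.

    Returns a dict mapping LED index -> (row, col) in sub-pixel image
    coordinates, or None when that LED's reflection is judged missing.
    """
    matches = {}
    for led, channel in enumerate(heatmaps):
        if channel.max() < threshold:
            # No confident peak: treat this reflection as absent
            # (e.g. it left the cornea as the eye rotated).
            matches[led] = None
            continue
        # Coarse location: the highest-scoring pixel in this LED's channel.
        row, col = (int(i) for i in np.unravel_index(np.argmax(channel), channel.shape))
        # Refine with a weighted centroid over a small window, clipped at the borders.
        h, w = channel.shape
        r0, r1 = max(row - radius, 0), min(row + radius + 1, h)
        c0, c1 = max(col - radius, 0), min(col + radius + 1, w)
        patch = channel[r0:r1, c0:c1]
        rows, cols = np.mgrid[r0:r1, c0:c1]
        total = patch.sum()  # > 0 when threshold > 0 and scores are non-negative
        matches[led] = (float((rows * patch).sum() / total),
                        float((cols * patch).sum() / total))
    return matches
```

Under this assumed encoding, matching reduces to reading off channel indices, so one forward pass yields both detection and correspondence; a spurious reflection that raises no peak in any LED's channel is simply never reported.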
The network correctly identifies and matches 91% of the corneal reflections present in the test set. This is comparable to a state-of-the-art deep learning system, but our approach requires 33 times less memory and executes 10 times faster. When used in an eye tracker in a VR system, the proposed algorithm achieved an average mean absolute gaze error of 1°. This is a significant improvement over state-of-the-art learning-based XR eye tracking systems, which have reported gaze errors of 2-3°.
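The abstract reports a match rate and a mean absolute gaze error without defining how either is computed. As an assumed illustration only, the sketch below measures gaze error as the angle between estimated and true 3-D gaze direction vectors, averaged over samples, and counts a labelled reflection as correctly identified and matched when the prediction for the same LED lies within a pixel tolerance of the label. The helper names, the dict-based reflection format, and the 2-pixel tolerance are hypothetical.

```python
import numpy as np


def angular_errors_deg(estimated, true):
    """Angle (degrees) between paired gaze direction vectors, each of shape (N, 3)."""
    est = np.asarray(estimated, dtype=float)
    tru = np.asarray(true, dtype=float)
    est = est / np.linalg.norm(est, axis=1, keepdims=True)
    tru = tru / np.linalg.norm(tru, axis=1, keepdims=True)
    # Clip guards against rounding pushing the dot product just outside [-1, 1].
    cos = np.clip(np.sum(est * tru, axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))


def mean_absolute_gaze_error(estimated, true) -> float:
    """Mean of the per-sample angular errors (angles are already non-negative)."""
    return float(np.mean(angular_errors_deg(estimated, true)))


def match_rate(predicted, labelled, tol_px: float = 2.0) -> float:
    """Fraction of labelled reflections found at the correct LED within tol_px.

    HYPOTHETICAL scoring rule: both arguments map LED index -> (row, col) or None.
    Only reflections actually present in the label are counted.
    """
    present = [led for led, xy in labelled.items() if xy is not None]
    if not present:
        return float("nan")
    hits = 0
    for led in present:
        guess = predicted.get(led)
        if guess is None:
            continue  # missed reflection
        dr = guess[0] - labelled[led][0]
        dc = guess[1] - labelled[led][1]
        if np.hypot(dr, dc) <= tol_px:
            hits += 1
    return hits / len(present)
```

Because the reflection dicts are keyed by LED index, a reflection detected at the right place but assigned to the wrong light source counts as a miss, which is the behaviour a combined detection-and-matching score needs.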