Starburst: A hybrid algorithm for video-based eye tracking combining feature-based and model-based approaches
Dongheng Li, David Winfield, Derrick J. Parkhurst
Human Computer Interaction Program
Iowa State University, Ames, Iowa, 50010
Abstract
Knowing the user's point of gaze has significant potential to enhance current human-computer interfaces, given that eye movements can be used as an indicator of the attentional state of a user. The primary obstacle to integrating eye movements into today's interfaces is the availability of a reliable, low-cost open-source eye-tracking system. Towards making such a system available to interface designers, we have developed a hybrid eye-tracking algorithm that integrates feature-based and model-based approaches and made it available in an open-source package. We refer to this algorithm as "starburst" because of the novel way in which pupil features are detected. This starburst algorithm is more accurate than pure feature-based approaches yet is significantly less time consuming than pure model-based approaches. The current implementation is tailored to tracking eye movements in infrared video obtained from an inexpensive head-mounted eye-tracking system. A validation study was conducted and showed that the technique can reliably estimate eye position with an accuracy of approximately one degree of visual angle.
1. Introduction
The use of eye tracking has significant potential to enhance the quality of everyday human-computer interfaces. Two types of human-computer interfaces utilize eye-movement measures: active and passive interfaces. Active interfaces allow users to explicitly control the interface through the use of eye movements [8]. For example, eye typing has users look at keys on a virtual keyboard to type instead of manually depressing keys as on a traditional keyboard [9]. Such active interfaces have been quite effective at helping users with movement disabilities interact with computers. These techniques may also be useful for normal interface usage given that when users intend to select an icon in a graphical user interface, they typically first look at the icon, and thus selection can potentially be speeded with eye tracking [16]. On the other hand, passive interfaces monitor the user's eye movements and automatically adapt themselves to the user. For example, in video transmission and virtual reality applications, gaze-contingent variable-resolution display techniques actively track the viewer's eyes and present a high level of detail at the point of gaze while sacrificing level of detail in the periphery where it is not distracting [13, 14].

Figure 1: (a) & (b) Head-mounted eye tracker. (c) Image of a scene obtained by the eye tracker. (d) Image of the user's right eye illuminated with infrared light. Note the clearly defined dark pupil and the specular reflection of the infrared LED. Also note the degree of line noise present in (c) & (d) due to the low-cost construction based on consumer-grade off-the-shelf parts.
While eye tracking has been deployed in a number of research systems and, to some smaller degree, consumer products, eye tracking has not reached its full potential. Importantly, eye-tracking technology has been available for many years using a variety of methods (e.g., Purkinje-reflection based, contact-lens based eye coil systems, electro-oculography; see [19] for a survey of classical eye-tracking technology). The primary obstacle to integrating these techniques into human-computer interfaces is that they have been either too invasive or too expensive for routine use. Recently, the invasiveness of eye tracking has been significantly reduced with advances in the miniaturization of head-mounted video-based eye trackers [15, 1]. Remote video-based eye-tracking techniques also minimize intrusiveness [6, 10]; however, they can suffer from reduced accuracy with respect to head-mounted systems. Given these advances, the most significant remaining obstacle is the cost. Currently, a number of eye trackers are available on the market and their prices range from approximately 5,000 to 40,000 US dollars. Notably, the bulk of this cost is not due to hardware, as the price of high-quality digital camera technology has dropped precipitously over the last ten years. Rather, the costs are associated with custom software implementations, sometimes integrated with specialized digital processors, to obtain high-speed performance.
This analysis clearly indicates that in order to integrate eye tracking into everyday human-computer interfaces, widely available, reliable, and high-speed eye-tracking algorithms that run on general-purpose computing hardware need to be developed. Towards this goal, we have developed a hybrid eye-tracking algorithm that integrates feature-based and model-based approaches and made its implementation available for distribution in an open-source package. In combination with low-cost head-mounted eye-tracking systems [18], there is a significant potential that eye tracking will be successfully incorporated into the next generation of human-computer interfaces.

2. Problem statement
As mentioned above, eye-tracking systems can be divided into remote and head-mounted systems. Each type of system has its respective advantages. For example, remote systems are not as intrusive but are not as accurate or flexible as head-mounted systems. In other work, we have developed a low-cost head-mounted eye tracker [18]. This eye tracker consists of two consumer-grade CCD cameras that are mounted on a pair of safety glasses (see Figure 1). One camera captures an image of the eye while the other captures an image of the scene. The two cameras are synchronized and operate at 30 Hz, each capturing 640×480 pixels. In this paper we develop an eye-tracking algorithm applicable for use with images captured from this type of head-mounted system. However, the proposed algorithm could also be applied to video captured with a remote system.
Two types of imaging processes are commonly used in eye tracking: visible and infrared spectrum imaging [5]. Visible spectrum imaging is a passive approach that captures ambient light reflected from the eye. In these images, the best feature to track is often the contour between the iris and the sclera, known as the limbus. The three most relevant features of the eye are the pupil, the aperture that lets light into the eye; the iris, the colored muscle group that controls the diameter of the pupil; and the sclera, the white protective tissue that covers the remainder of the eye. Visible spectrum eye tracking is complicated by the fact that uncontrolled ambient light is used as the source, which can contain multiple specular and diffuse components. Infrared imaging eliminates uncontrolled specular reflection by actively illuminating the eye with a uniform and controlled infrared light not perceivable by the user. A further benefit of infrared imaging is that the pupil, rather than the limbus, is the strongest feature contour in the image (see Figure 1d); both the sclera and the iris strongly reflect infrared light, while only the sclera strongly reflects visible light. Tracking the pupil contour is preferable given that the pupil contour is smaller and more sharply defined than the limbus. Furthermore, due to its size, the pupil is less likely to be occluded by the eye lids. The primary disadvantage of infrared imaging techniques is that they cannot be used outdoors during daytime due to the ambient infrared illumination. In this paper, we focus our algorithm development on infrared spectrum imaging techniques but aim to extend these techniques to visible spectrum imaging as well.
Infrared eye tracking typically utilizes either bright-pupil or dark-pupil techniques (however, see [10] for the combined use of both bright-pupil and dark-pupil techniques). Bright-pupil techniques illuminate the eye with a source that is on or very near the axis of the camera. The result of such illumination is that the pupil is clearly demarcated as a bright region due to the photoreflective nature of the back of the eye. Dark-pupil techniques illuminate the eye with an off-axis source such that the pupil is the darkest region in the image, while the sclera, iris, and eye lids all reflect relatively more illumination. In either method, the first-surface specular reflection of the illumination source off of the cornea (the outer-most optical element of the eye) is also visible. The vector between the pupil center and the corneal reflection is typically used as the dependent measure rather than the pupil center alone. This is because the vector difference is insensitive to slippage of the head gear: both the camera and the source move simultaneously (see the results of our validation study, below). In this paper we focus on algorithm development for dark-pupil techniques; however, our algorithm could be readily applied to bright-pupil techniques.
3. Related Work
Eye-tracking algorithms can be classified into two approaches: feature-based and model-based. Feature-based approaches detect and localize image features related to the position of the eye. A commonality among feature-based approaches is that a criterion (e.g., a threshold) is needed to decide when a feature is present or absent. The determination of an appropriate threshold is typically left as a free parameter that is adjusted by the user. The tracked features vary widely across algorithms but most often rely on intensity levels or intensity gradients. For example, in infrared images created with the dark-pupil technique, an appropriately set intensity threshold can be used to extract the region corresponding to the pupil. The pupil center can be taken as the geometric center of this identified region. The intensity gradient can be used to detect the limbus in visible spectrum images [21] or the pupil contour in infrared spectrum images [12]. An ellipse can then be fitted to the feature points.
On the other hand, model-based approaches do not explicitly detect features but rather find the best-fitting model that is consistent with the image. For example, integro-differential operators can be used to find the best-fitting circle [3] or ellipse [11] for the limbus and pupil contour. This approach requires an iterative search of the model parameter space that maximizes the integral of the derivative along the contour of the circle or ellipse. The model-based approach can provide a more precise estimate of the pupil center than a feature-based approach given that a feature-defining criterion is not applied to the image data. However, this approach requires searching a complex parameter space that can be fraught with local minima. Thus gradient techniques cannot be used without a good initial guess for the model parameters. The gain in accuracy of a model-based approach is therefore obtained at a significant cost in terms of computational speed and flexibility. Notably, however, the use of multi-scale image-processing methods [2] in combination with a model-based approach holds promise for real-time performance [5].
4. Starburst Algorithm
Presented in this section is an eye-tracking algorithm that combines feature-based and model-based approaches to achieve a good tradeoff between run-time performance and accuracy for dark-pupil infrared illumination. The goal of the algorithm is to extract the location of the pupil center and the corneal reflection so as to relate the vector difference between these measures to coordinates in the scene image. The algorithm begins by locating and removing the corneal reflection from the image. Then the pupil edge points are located using an iterative feature-based technique. An ellipse is fitted to a subset of the detected edge points using the Random Sample Consensus (RANSAC) paradigm [4]. The best-fitting parameters from this feature-based approach are then used to initialize a local model-based search for the ellipse parameters that maximize the fit to the image data.
4.1. Noise Reduction
Due to the use of a low-cost head-mounted eye tracker described in Section 2, we need to begin by reducing the noise present in the images. There are two types of noise: shot noise and line noise. We reduce the shot noise by applying a 5×5 Gaussian filter with a standard deviation of 2 pixels. The line noise is spurious, and a normalization factor can be applied line by line to shift the mean intensity of the line to the running average derived from previous frames. This factor C for each line l in frame i is

C(i, l) = β Ī(i, l) + (1 − β) C(i − 1, l)    (1)

where Ī(i, l) is the average line intensity and β = 0.2. Note that this noise-reduction technique is optional and can be eliminated when the algorithm is used in combination with an eye tracker capable of capturing less noisy images.
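As a concrete illustration, a minimal sketch of this noise-reduction step follows, assuming NumPy and OpenCV. The function name and the additive per-line shift are our interpretation (the paper does not specify the exact form of the normalization), not the openEyes implementation itself.

    import cv2
    import numpy as np

    def reduce_noise(frame, prev_line_avg=None, beta=0.2):
        # Shot noise: 5x5 Gaussian filter with a standard deviation of 2 pixels.
        smoothed = cv2.GaussianBlur(frame, (5, 5), 2).astype(np.float64)

        # Line noise: shift each line's mean toward the running average C(i, l).
        line_mean = smoothed.mean(axis=1)          # average intensity of each line
        if prev_line_avg is None:
            c = line_mean                          # first frame seeds the running average
        else:
            c = beta * line_mean + (1 - beta) * prev_line_avg   # Equation (1)
        normalized = smoothed + (c - line_mean)[:, np.newaxis]  # additive per-line shift
        return np.clip(normalized, 0, 255).astype(np.uint8), c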
4.2. Corneal reflection detection, localization, and removal
The corneal reflection corresponds to one of the brightest regions in the eye image. Thus the corneal reflection can be obtained through thresholding. However, a constant threshold across observers, and even within observers, is not optimal. Therefore we use an adaptive thresholding technique in each frame to localize the corneal reflection. Note that because the cornea extends approximately to the limbus, we can limit our search for the corneal reflection to a square region of interest with a half width of h = 150 pixels (see the Discussion section regarding parameter values). To begin, the maximum threshold is used to produce a binary image in which only values above this threshold are taken as corneal reflection candidates. It is likely that the largest candidate region is attributable to the corneal reflection, as other specular reflections tend to be quite small and located off the cornea as well as near the corner of the image where the eye lids meet. The ratio between the area of the largest candidate and the average area of the other regions is calculated as the threshold is lowered. At first, the ratio will increase because the corneal reflection will grow in size faster than the other regions. Note that the intensity of the corneal reflection monotonically decreases towards its edges, explaining this growth. A lower threshold will, in general, also induce an increase in false candidates. The ratio will begin to drop as the false candidates become more prominent and the size of the corneal reflection region becomes large. We take the threshold that generates the highest ratio as optimal. The location of the corneal reflection is then given by the geometric center (x_c, y_c) of the largest region in the image using the adaptively determined threshold.
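The following sketch illustrates this adaptive search, assuming NumPy and SciPy's ndimage for connected-component labeling; the function name, the ROI handling, and the 60-step scan depth are our assumptions rather than details from the paper.

    import numpy as np
    from scipy import ndimage

    def locate_corneal_reflection(eye, cx, cy, h=150):
        # Restrict the search to a square ROI with half width h around (cx, cy).
        y0, x0 = max(cy - h, 0), max(cx - h, 0)
        roi = eye[y0:cy + h, x0:cx + h]
        best, best_ratio = None, -np.inf
        top = int(roi.max())
        for t in range(top - 1, max(top - 60, 0), -1):   # lower the threshold stepwise
            labels, n = ndimage.label(roi >= t)
            if n < 2:
                continue                                  # the ratio needs competing regions
            areas = np.asarray(ndimage.sum(np.ones_like(roi), labels,
                                           index=range(1, n + 1)))
            largest = int(np.argmax(areas))
            ratio = areas[largest] / np.delete(areas, largest).mean()
            if ratio > best_ratio:                        # keep the highest-ratio threshold
                yc, xc = ndimage.center_of_mass(labels == largest + 1)
                best, best_ratio = (t, x0 + xc, y0 + yc, areas[largest]), ratio
        return best    # (threshold, x_c, y_c, region area) or None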
Given its small size, the corneal reflection is approximately a circle in the image. While the approximate size of the corneal reflection can be derived using the thresholded region from the localization step, this region does not typically include the entire profile of the corneal reflection. To determine the full extent of the corneal reflection, we assume that the intensity profile of the corneal reflection follows a bivariate Gaussian distribution. If we find the radius r where the average decline in intensity is maximal and relate it to the radius with maximal decline for a Gaussian (i.e., a radius of one standard deviation), we can take the full extent of the corneal reflection as 2.5r to capture 99% of the corneal reflection profile. We find r through a gradient descent search that minimizes

∫ I(r + δ, x_c, y_c, θ) dθ / ∫ I(r − δ, x_c, y_c, θ) dθ    (2)

where δ = 1, and I(r, x, y, θ) is the pixel intensity at angle θ on the contour of a circle defined by the parameters r, x, and y. The search is initialized with r = √(area/π), where area is the number of pixels in the thresholded region. The search converges rapidly.
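A sketch of this search follows, assuming NumPy; for simplicity it evaluates Equation (2) over a one-dimensional grid of radii instead of running a literal gradient descent, and the sampling density and search range are our choices.

    import numpy as np

    def circle_mean_intensity(eye, xc, yc, r, n=72):
        # Average intensity at n nearest-pixel samples on a circle of radius r.
        theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
        xs = np.clip(np.round(xc + r * np.cos(theta)).astype(int), 0, eye.shape[1] - 1)
        ys = np.clip(np.round(yc + r * np.sin(theta)).astype(int), 0, eye.shape[0] - 1)
        return float(eye[ys, xs].mean())

    def reflection_extent(eye, xc, yc, area, delta=1.0):
        r0 = np.sqrt(area / np.pi)                 # initialization from the thresholded area
        radii = np.linspace(max(r0 - 5, delta + 1), r0 + 10, 60)
        ratios = [circle_mean_intensity(eye, xc, yc, r + delta) /
                  circle_mean_intensity(eye, xc, yc, r - delta) for r in radii]
        r = radii[int(np.argmin(ratios))]          # minimizer of Equation (2)
        return 2.5 * r                             # full extent of the reflection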
Input: Eye image with corneal reflection removed, best guess of pupil center
Output: Set of feature points
Procedure:
Iterate
  Stage 1:
    Follow rays extending from the starting point
    Calculate intensity derivative at each point
    If derivative > threshold then
      Place feature point
      Halt marching along ray
  Stage 2:
    For each feature point detected in Stage 1
      March along rays returning towards the start point
      Calculate intensity derivative at each point
      If derivative > threshold then
        Place feature point
        Halt marching along ray
  Starting point = geometric center of feature points
Until starting point converges

Figure 2: Feature-point detection method
Radial interpolation is then used to remove the corneal reflection. First, the central pixel of the identified corneal reflection region is set to the average of the intensities along the contour of the region. Then, for each pixel between the center and the contour, the pixel intensity is determined via linear interpolation. An example of this process can be seen in Figure 5 (compare a and b).
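A vectorized sketch of this removal step follows, assuming NumPy; interpolating toward the contour intensity in each pixel's own direction is our reading of "radial interpolation", and the nearest-pixel contour lookup is a simplification.

    import numpy as np

    def remove_reflection(eye, xc, yc, radius):
        out = eye.astype(np.float64).copy()
        h, w = out.shape
        ys, xs = np.ogrid[:h, :w]
        dx, dy = xs - xc, ys - yc
        dist = np.hypot(dx, dy)
        inside = dist < radius
        # Intensity at the contour point lying in each pixel's direction.
        ang = np.arctan2(dy, dx)
        cx = np.clip(np.round(xc + radius * np.cos(ang)).astype(int), 0, w - 1)
        cy = np.clip(np.round(yc + radius * np.sin(ang)).astype(int), 0, h - 1)
        contour = out[cy, cx]
        center_val = contour[inside].mean()   # center pixel = average contour intensity
        frac = dist / radius                  # 0 at the center, 1 at the contour
        out[inside] = (1 - frac[inside]) * center_val + frac[inside] * contour[inside]
        return out.astype(eye.dtype)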
4.3. Pupil contour detection
We have developed a novel feature-based method to detect the pupil contour. The pseudocode describing the algorithm is shown in Figure 2. While other feature-based approaches apply edge detection to the entire eye image or to a region of interest around the estimated pupil location, these approaches can be computationally wasteful as the pupil contour frequently occupies very little of the image. We instead detect edges along a limited number of rays that extend from a central best guess of the pupil center. These rays can be seen in Figure 3a. This method takes advantage of the high-contrast elliptical profile of the pupil contour present in images taken with infrared illumination using the dark-pupil technique.
For each frame, a location is chosen that represents the best guess of the pupil center in the frame. For the first frame this can be manually determined or taken as the center of the image. For subsequent frames, the location of the pupil center from the previous frame is used. Next, the derivatives ∆ along N = 18 rays, extending radially away from this starting point, are independently evaluated pixel by pixel until a threshold φ = 20 is exceeded. Given that we are using the dark-pupil technique, only positive derivatives (increasing intensity as the ray extends) are considered. When this threshold is exceeded, a feature point is defined at that location and the processing along the ray is halted. If the ray extends to the border of the image, no feature point is defined. An example set of candidate feature points is shown in Figure 3a.

Figure 3: Feature detection. (a) Pupil contour edge candidates are detected along the length of a series of rays extending from a best guess of the pupil center. Pupil contour candidates are marked with crosses. Note that two contour candidates are incorrect; one ray reaches the border and does not generate a candidate. (b) For each pupil contour candidate, another set of rays is generated that creates a second set of pupil contour candidates. (c) Pupil contour candidates not on the pupil contour can lead to additional feature points not on the contour; however, these are typically not consistent with any single ellipse.
For each of the candidate feature points, the feature-detection process described above is repeated. However, these rays are limited to γ = ±50 degrees around the ray that originally generated the feature point. The motivation for limiting the return rays in this way is that if the candidate feature point is indeed on the pupil contour (as shown in Figure 3b), the returning rays will generate additional feature points on the opposite side of the pupil such that they are all consistent with a single ellipse (i.e., the pupil contour). However, if the candidate is not on the pupil (see, for example, Figure 3c), this process will generate additional candidate feature points that are not necessarily consistent with any single ellipse. Thus, this procedure tends to increase the ratio of the number of feature points on the pupil contour to the number of feature points not on the pupil contour. Given that feature points defined by a large ∆ are more likely to be located on the pupil contour (as this is the strongest image contour), the number of rays returned is set to 5∆/φ. Note that the minimum number of rays is 5 because, by definition, a feature point is determined by ∆ ≥ φ.
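A compact sketch of the two-stage detection follows, assuming NumPy; the marching step size, the maximum ray length, and the function names are our assumptions.

    import numpy as np

    def shoot_rays(eye, x0, y0, angles, phi=20, max_len=300):
        # March pixel by pixel along each ray; place a feature point where the
        # positive intensity derivative first exceeds phi, else place nothing.
        points = []
        for a in angles:
            dx, dy, prev = np.cos(a), np.sin(a), None
            for step in range(1, max_len):
                x = int(round(x0 + step * dx))
                y = int(round(y0 + step * dy))
                if not (0 <= x < eye.shape[1] and 0 <= y < eye.shape[0]):
                    break                                   # ray left the image
                cur = float(eye[y, x])
                if prev is not None and cur - prev > phi:   # dark-to-light edge
                    points.append((x, y, cur - prev, a))
                    break
                prev = cur
        return points

    def starburst_features(eye, start, n_rays=18, phi=20, gamma=np.radians(50)):
        # Stage 1: rays radiate from the starting point in all directions.
        x0, y0 = start
        stage1 = shoot_rays(eye, x0, y0,
                            np.linspace(0, 2 * np.pi, n_rays, endpoint=False), phi)
        feats = [(x, y) for x, y, _, _ in stage1]
        # Stage 2: each candidate returns rays toward the start, within +/- gamma.
        for x, y, mag, a in stage1:
            n_ret = max(5, int(5 * mag / phi))              # stronger edges return more rays
            back = a + np.pi
            angles = back + np.linspace(-gamma, gamma, n_ret)
            feats += [(rx, ry) for rx, ry, _, _ in shoot_rays(eye, x, y, angles, phi)]
        return feats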
The two-stage feature-detection process improves the robustness of the method to a poor initial guess for the starting point. This is a problem when an eye movement is made, as the eye can rapidly change position from frame to frame. This is especially true for images obtained at low frame rates. For example, such a case is shown in Figure 4a. While the initial set of rays only detects three feature points on the pupil contour, the return rays from these three points detect many more points on the contour (see Figure 4b). The combined set of feature points is shown in Figure 4d, and the number of points on the contour well exceeds those off of the contour. However, the feature points are biased to the side of the pupil contour nearest the initialization point. Although another iteration of the ray process would minimize this bias, the computational burden grows exponentially with each iteration and thus would be an inefficient strategy.

At this point an ellipse could be fitted to the candidate points; however, the bias would induce a significant error into the fit. To eliminate this bias, the two-stage feature-detection process described above is iterated. For each iteration after the first, the average location of all the candidate feature points from the last iteration is taken as the next starting location. The red circle in Figure 4d shows the starting point for the second iteration. The detected feature locations for the second iteration are shown in Figure 4e. Note the absence of a strong bias. Figure 4f shows how the central locations rapidly converge to the actual pupil center. The iteration is halted when the center of the detected feature points changes less than d = 10 pixels. When the initial guess is a good estimate of the pupil center, for example during eye fixations, which occupy the majority of the frames, only a single iteration is required. When the initial estimate is not good, typically only a few iterations (<5) are required for convergence. If convergence is not reached within i = 10 iterations, as occurs sometimes during a blink when no pupil is visible, the algorithm halts and begins processing the next frame.
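The convergence loop can then be wrapped around starburst_features from the previous sketch; again, the names are ours.

    import numpy as np

    def detect_pupil_features(eye, guess, d=10, max_iter=10):
        # Re-seed the two-stage detector with the mean feature location until
        # the starting point moves less than d pixels, or give up (e.g., blinks).
        start = np.asarray(guess, dtype=float)
        for _ in range(max_iter):
            feats = starburst_features(eye, start)
            if not feats:
                return []
            center = np.mean(feats, axis=0)
            if np.hypot(*(center - start)) < d:
                return feats                       # converged
            start = center
        return []                                  # no convergence within i iterations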
4.4. Ellipse fitting
Given a set of candidate feature points, the next step of the algorithm is to find the best-fitting ellipse. While other algorithms commonly use least-squares fitting of an ellipse to all the feature points (e.g., [20]), gross errors made in the feature-detection stage can strongly influence the accuracy of the results. Consider the detected feature points shown in Figure 5c and the resulting best-fit ellipse using the least-squares technique shown in Figure 5d. Notice that a few feature points not on the pupil contour dramatically reduce the quality of the fit to an unacceptable level.
To address this issue, we apply the Random Sample Consensus (RANSAC) paradigm for model fitting [4]. To our knowledge, ours is the first application of RANSAC in the context of eye tracking; however, RANSAC is frequently applied to other computer-vision problems (e.g., see [7]). RANSAC is an effective technique for model fitting in the presence of a large but unknown percentage of outliers in a measurement sample. An inlier is a sample in the data attributable to the mechanism being modeled, whereas an outlier is a sample generated through error and attributable to another mechanism not under consideration. In our application, inliers are all of those detected feature points that correspond to the pupil contour and outliers are feature points that correspond to other contours, such as that between the eye lid and the eye. Least-squares methods use all available data to fit a model because it is assumed that all of the samples are inliers and that any error is attributable exclusively to measurement error. On the other hand, RANSAC admits the possibility of outliers and only uses a subset of the data to fit the model. In detail, RANSAC is an iterative procedure that selects many small but random subsets of the data, uses each subset to fit a model, and finds the model that has the most agreement with the data set as a whole. The subset of data consistent with this model is the consensus set.

Figure 4: Feature detection. (a) The original start point (yellow circle) shoots rays (blue) to generate candidate pupil points (green crosses). (b & c) The candidate pupil points shoot rays back towards the start point to detect more candidate pupil points. (d) All the candidate pupil points are shown. The average of these locations is shown as a red circle. This location seeds the next iteration. (e) The results of the second iteration. (f) The starting locations from all iterations show a rapid convergence.
In some cases, our two-stage feature-detection process results in very few outliers (e.g., see Figure 5e), while in other cases outliers are much more prevalent (e.g., see Figure 5f). Therefore it is important that we use the RANSAC paradigm to find the ellipse that best fits the pupil contour. The following procedure is repeated R times. First, five samples are randomly chosen from the detected feature set, given that this is the minimum sample size required to determine all the parameters of an ellipse. Singular Value Decomposition (SVD) on the conic constraint matrix generated with normalized feature-point coordinates [7] is used to find the parameters of the ellipse that perfectly fit the five points.
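The five-point fit can be sketched as follows, assuming NumPy; the isotropic coordinate normalization follows the standard recipe in [7], and the helper names are ours.

    import numpy as np

    def fit_conic_5pts(pts):
        # Conic: A x^2 + B xy + C y^2 + D x + E y + F = 0, through five points.
        pts = np.asarray(pts, dtype=np.float64)
        mean = pts.mean(axis=0)                       # normalize for numerical stability
        scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
        x, y = ((pts - mean) * scale).T
        M = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
        _, _, vt = np.linalg.svd(M)
        conic = vt[-1]                                # null vector = conic coefficients
        return conic, (mean, scale)

    def is_real_ellipse(conic):
        A, B, C = conic[:3]
        return B * B - 4 * A * C < 0                  # discriminant test for an ellipse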
If the parameters of the ellipse are imaginary, the ellipse center is outside of the image, or the major axis is greater than two times the minor axis, five different points are randomly chosen until this is no longer the case. Then, the number of candidate feature points in the data set that agree with this model (i.e., the inliers) is counted. Inliers are those sample points for which the algebraic distance to the ellipse is less than some threshold t. This threshold is derived from a probabilistic model of the error expected based on the nature of our feature detector. It is assumed that the average error variance of our feature detector is approximately one pixel and that this error is distributed as a Gaussian with zero mean. Thus, to obtain a 95% probability that a sample is correctly classified as an inlier, the threshold should be derived from a χ² distribution with one degree of freedom [7]. This results in a threshold distance of 1.98 pixels. After R repetitions, the model with the largest consensus set is used. Because it is often computationally infeasible to evaluate all possible feature-point combinations, the number of random subsets to try must be determined such that it is assured that at least one of the randomly selected subsets contains only inliers. This can be guaranteed with probability p = 0.99 if
R = log(1 − p) / log(1 − w⁵)    (3)
where w is the proportion of inliers in the sample. Although w is not known a priori, its lower bound is given by the maximum number of inliers found for any model in the iteration, and thus R can initially be set very large and lowered using Equation 3 as the iteration proceeds. After the necessary number of iterations, an ellipse is fit to the largest consensus set (e.g., see Figure 5g).
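Putting the pieces together, a RANSAC loop with the adaptive trial count of Equation (3) might look like the following sketch (NumPy assumed). Treating the algebraic residual in normalized coordinates as an approximate pixel distance, and the initial trial cap, are our simplifications.

    import numpy as np

    def ransac_ellipse(pts, t=1.98, p=0.99, max_trials=10000):
        pts = np.asarray(pts, dtype=np.float64)
        rng = np.random.default_rng()
        best_inliers, R, trials = None, max_trials, 0
        while trials < min(R, max_trials):
            trials += 1
            sample = pts[rng.choice(len(pts), 5, replace=False)]
            conic, (mean, scale) = fit_conic_5pts(sample)     # from the previous sketch
            if not is_real_ellipse(conic):
                continue                                      # reject degenerate fits
            x, y = ((pts - mean) * scale).T
            M = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
            inliers = np.abs(M @ conic) < t                   # algebraic-distance test
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best_inliers = inliers
                w = inliers.sum() / len(pts)                  # lower bound on inlier ratio
                if 0 < w < 1:
                    R = np.log(1 - p) / np.log(1 - w ** 5)    # Equation (3)
        return best_inliers   # a final least-squares fit to this consensus set follows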
4.5. Model-based optimization
Although the accuracy of the RANSAC fit may be sufficient for many eye-tracking applications, the result of ellipse fitting can be improved through a model-based optimization that does not rely on feature detection. To find the parameters a, b, x, y, α of the best-fitting ellipse, we minimize

− ∫ I(a + δ, b + δ, α, x, y, θ) dθ / ∫ I(a − δ, b − δ, α, x, y, θ) dθ    (4)

where δ = 1 and I(a, b, α, x, y, θ) is the pixel intensity at angle θ on the contour of an ellipse defined by the parameters a, b, x, y, and α. The search is initialized with the best-fitting ellipse parameters as determined by the RANSAC fit.
Table 1: Accuracy results of the validation study (degrees of visual angle)

Model-based, Pupil Center         1st     2nd     3rd
  Narrow FOV                      0.507   6.572   10.362
  Wide FOV                        0.591   7.527   12.316
Model-based, Vector Difference    1st     2nd     3rd
  Narrow FOV                      0.471   0.981   1.204
  Wide FOV                        0.515   1.203   1.565

4.6. Homographic mapping and calibration
In order to calculate the point of gaze of the user in the scene image, a mapping between locations in the scene image and an eye-position measure (i.e., the vector difference between the pupil center and the corneal reflection) must be determined. The typical procedure in eye-tracking methodology is to measure this relationship through a calibration procedure [17]. During calibration, the user is required to look at a number of scene points for which the positions in the scene image are known. While the user is fixating each scene point s = (x_s, y_s, 1), the eye position e = (x_e, y_e, 1) is measured (note the homogeneous coordinates). We generate the mapping between these two sets of points using a linear homographic mapping. This mapping H is a 3×3 matrix with eight degrees of freedom. To determine the entries of H, a constraint matrix is generated using measured point correspondences. Each correspondence generates two constraints, and thus four correspondences are sufficient to solve for H up to scale [7]. The null space of the constraint matrix can be determined through SVD and provides H. Once this mapping is determined, the user's point of gaze in the scene for any frame can be established as s = H e. Note that we use a 3×3 grid of calibration points distributed uniformly in the scene image to assure an accurate prediction of eye movements. In this case, there are more constraints than unknowns, and SVD produces the mapping H that minimizes the algebraic error distance.
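A sketch of the calibration step, assuming NumPy; the DLT constraint-row construction follows the standard formulation in [7], and the function names are ours.

    import numpy as np

    def fit_homography(eye_pts, scene_pts):
        # Each correspondence (e -> s) contributes two rows of the constraint
        # matrix; the SVD null vector gives H up to scale and, when the system
        # is overdetermined (nine points), minimizes the algebraic error.
        rows = []
        for (xe, ye), (xs, ys) in zip(eye_pts, scene_pts):
            rows.append([-xe, -ye, -1, 0, 0, 0, xs * xe, xs * ye, xs])
            rows.append([0, 0, 0, -xe, -ye, -1, ys * xe, ys * ye, ys])
        _, _, vt = np.linalg.svd(np.asarray(rows, dtype=np.float64))
        return vt[-1].reshape(3, 3)

    def gaze_in_scene(H, eye_vec):
        s = H @ np.array([eye_vec[0], eye_vec[1], 1.0])   # s = H e
        return s[:2] / s[2]                               # dehomogenize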
5. Algorithm Validation

An eye-tracking evaluation was conducted in order to validate the performance of the algorithm. Video was recorded from the head-mounted eye tracker described in Section 2 while each of the three authors viewed two movie trailers presented on a laptop computer. Prior to and after the viewing of each trailer, the user placed their head in a chin rest and fixated a series of nine calibration marks on a white board positioned approximately 60 cm away. The evaluation was conducted twice for each user. During the second evaluation, the narrow field of view lens (56° Field of View (FOV)) used on the scene camera was replaced with a wide field of view lens (111° FOV, with significant radial distortion) to evaluate the decrease in eye-tracking quality attributable to the non-linear distortion of the lens. The video captured during the evaluation is available for viewing at hcvl.hci.iastate.edu/openEyes.