3D Fingertip and Palm Tracking in Depth Image Sequences

Hui Liang
Institute for Media Innovation & School of EEE
Nanyang Technological University, Singapore
u.edu.sg

Junsong Yuan
School of EEE
Nanyang Technological University
50 Nanyang Avenue, Singapore 639798
jsyuan@ntu.edu.sg

Daniel Thalmann
Institute for Media Innovation
Nanyang Technological University
50 Nanyang Drive, Singapore 637553
danielthalmann@ntu.edu.sg
ABSTRACT

We present a vision-based approach for robust 3D fingertip and palm tracking on depth images using a single Kinect sensor. First the hand is segmented in the depth images by applying depth and morphological constraints. The palm is located by performing distance transform to the hand contour and tracked with a Kalman filter. The fingertips are detected by combining three depth-based features and tracked with a particle filter over successive frames. Quantitative results on synthetic depth sequences show the proposed scheme can track the fingertips quite accurately. Besides, its capabilities are further demonstrated through a real-life human-computer interaction application.
Categories and Subject Descriptors

H.1.2 [Models and Principles]: User/Machine Systems—Human information processing; I.4.8 [Image Processing and Computer Vision]: Scene Analysis—Depth cues, Tracking
Keywords
Fingertip Tracking, Human-Computer Interaction, Kinect Sensor, Geodesic Distance
1. INTRODUCTION

The human hand is an essential body part for human-computer interaction due to its various usages in gesture recognition, animation synthesis and virtual object manipulation [7, 10, 11]. As important features of the hand, the positions of tracked fingertips have a variety of applications. They can be used in combination with an inverse kinematics solver for hand pose estimation [1]. Their trajectories can be used for gesture recognition [6, 4] or for manipulative purposes in multi-touch systems [5]. Much work has been done on vision-based fingertip tracking, but many previous methods only focus on extracting 2D fingertips and cannot track fingertips robustly for a freely moving hand [6, 4, 5, 9].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM'12, October 29–November 2, 2012, Nara, Japan.
Copyright 2012 ACM 978-1-4503-1089-5/12/10 ...$15.00.

Figure 1: Difficulty in fingertip tracking and solutions. (a) Side-by-side fingers. (b) Bending fingers. (c) Nearby fingertips: fingertips of the thumb and index fingers are too close to be labeled correctly. (d) Rectangle feature. (e) Geodesic shortest path.
Research in 3D fingertip localization and tracking is still very limited, and the performance of existing methods is far from satisfactory for real-life applications [2, 3]. The difficulty of accurate 3D fingertip tracking mainly lies in three aspects. First, the fingertips of several side-by-side fingers are hard to distinguish. Second, traditional contour-based methods cannot locate the fingertips of bending fingers. Third, it is challenging to label each detected fingertip correctly. Fig. 1(a-c) illustrates these problems.
Most previous fingertip tracking schemes are based on contour analysis of the extracted hand region [6, 4, 5, 2] and usually can track the fingertips of stretched fingers only. In [6] fingertips are tracked in infrared image sequences using a template matching strategy to detect the fingertip locations. The correspondence of the fingertips between successive frames is built by minimizing the sum of distances between the predicted locations given by a Kalman filter and the detected fingertips. In [4] fingertips are located within the hand region by first propagating a set of particles from the hand center to the hand contour and then choosing the particles where the transitions between skin and non-skin areas meet certain requirements. Stereoscopic vision is adopted in [2] to track the 3D position of the fingertip of a single pointing finger. The fingertip is located by finding the two points which maximize the distance to the center of gravity of the hand region and the boundary curvature on the silhouette of the hand in both input images. The 3D position of the fingertip is then found using stereovision and tracked with a Kalman filter. In [9] the Kinect sensor is utilized for 3D fingertip and palm center detection for two hands. The palm center is detected by applying distance transform to the inverted binary image of the hand regions. The finger regions are segmented from the palm region, and fingertip locations are found by assuming they are the closest to the camera in each finger region. In [3] a more discriminative circular image feature is adopted for fingertip detection, which can handle more complex hand motion such as grasping. The fingertips are tracked by combining particle filtering and mean-shift tracking. However, none of these methods is capable of extracting the 3D positions of all five fingertips during natural hand motions, such as in Fig. 1(b-c).
In this paper we present a robust fingertip and palm tracking scheme that takes as input depth images captured by a single Kinect sensor. The hand region is segmented from the depth frame by applying depth and morphological constraints, and the palm circle is then identified. The 3D positions of the fingertips are tracked using a particle filter through successive frames, relying on three depth-based features to differentiate fingertip and non-fingertip points. Quantitative test results on six synthetic sequences show that the proposed scheme tracks the 3D fingertip positions quite accurately. In addition, we develop an application based on the fingertip tracking results, in which the 3D fingertip positions are used with an inverse kinematics solver to drive a hand model to manipulate virtual objects.
2. HAND AND PALM DETECTION

We utilize the morphology of the hand for hand segmentation in the depth image and make several assumptions on hand motion. First, we assume the hand is the nearest object to the camera and constrain global hand rotation by

$$-15^{\circ} \le \theta_x \le 15^{\circ}, \quad -15^{\circ} \le \theta_y \le 15^{\circ}, \quad -90^{\circ} \le \theta_z \le 90^{\circ}, \tag{2.1}$$

where (θ_x, θ_y, θ_z) is the global rotation angle of the hand. Second, the depth value differences within the forearm and hand region are less than a threshold z_D = 0.2 m. Third, based on the morphology of the hand, we assume that the palm forms the globally largest blob in the hand and forearm region in the depth image when θ_x ≈ θ_y ≈ 0°, and a locally largest blob when the hand rotates within the ranges defined in (2.1). The palm region can thus be approximated with a circle C_p = (p_p, r_p), where p_p is the palm center and r_p is the radius. The proposed hand and palm detection scheme consists of three steps: foreground segmentation, palm localization and hand segmentation. It starts by thresholding the depth frame to obtain the foreground F, given by

$$F = \{\, p \mid z(p) < z_0 + z_D \,\}, \tag{2.2}$$

where (p, z(p)) denotes a pixel in the depth image at coordinate p with depth value z(p), and z_0 is the minimum depth value. This ensures that both hand and forearm regions are extracted from the depth frame. C_p then equals the largest inscribed circle of the contour of F. To reduce the computational complexity of palm localization, the center of C_p is tracked with a 2D Kalman filter. Finally, the hand and forearm regions are separated by a line which is both tangent to C_p and perpendicular to the orientation of the forearm. We approximate the orientation of the forearm using the eigenvector that corresponds to the largest eigenvalue of the covariance matrix of the contour pixel coordinates of F. Let the extracted hand region in the depth frame be F_D. We further process F_D to get a 3D point cloud F_V by calculating the 3D world position for each point in F_D using the projection parameters of the Kinect sensor.
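To make the foreground-thresholding and palm-localization steps concrete, here is a minimal Python sketch, assuming the depth frame arrives as a metric numpy array and OpenCV is available. The function name `locate_palm` and constant `Z_D` are illustrative names of ours, and the 2D Kalman smoothing of the palm center is omitted for brevity.

```python
import cv2
import numpy as np

Z_D = 0.2  # depth-range constraint z_D from Eq. (2.2), in meters

def locate_palm(depth_m):
    """Foreground thresholding (Eq. 2.2) followed by palm localization:
    the palm circle C_p is the largest circle inscribed in the foreground,
    found as the maximum of the distance transform."""
    z0 = depth_m[depth_m > 0].min()                  # nearest valid depth z_0
    fg = ((depth_m > 0) & (depth_m < z0 + Z_D)).astype(np.uint8)
    # Each foreground pixel gets its distance to the nearest background
    # pixel; the maximum location is the inscribed-circle center.
    dist = cv2.distanceTransform(fg, cv2.DIST_L2, 5)
    _, r_p, _, p_p = cv2.minMaxLoc(dist)             # radius and center of C_p
    return p_p, r_p                                  # (x, y) in pixels, pixels
```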
3. FINGERTIP DETECTION & TRACKING

Inspired by the concept of Accumulative Geodesic Extrema (AGEX) [8], we define the fingertip position as the point that maximizes the geodesic distance from the palm center within each finger. However, due to self-occlusion of the hand, the geodesic distances may not be correctly estimated for all points within the hand region. In addition, the AGEX extraction algorithm [8] cannot robustly detect the fingertip positions when multiple fingers are side-by-side, as it requires the AGEX interest points to be sparsely located, and its computational complexity is high since Dijkstra's algorithm needs to be performed every time an AGEX point is extracted. We address these issues by imposing more constraints on possible fingertip locations. First, we assume fingertips can only be detected where depth is discontinuous in F_V, see Fig. 1(a-c), and denote this set of points as the border point set U_B. Second, the relative depth differences between one point and its neighborhood are important to differentiate fingertip and non-fingertip points, see Fig. 1(d), and we design a rectangle local feature to take advantage of this fact. Third, we utilize the 3D geodesic shortest path (GSP) to differentiate nearby fingertips, which is more robust than the fingertip position alone, see Fig. 1(e). Overall, the proposed fingertip detection and tracking scheme consists of two stages, namely the initialization and reinitialization stage and the fingertip tracking stage. In the first stage, the user is requested to pose the hand so that the fingers are not side-by-side. The fingertip positions are detected using three depth-based features. Each fingertip is then given a label l ∈ L_f = {T, I, M, R, P} using a GSP-based voting strategy. The labels in L_f correspond to the thumb, index, middle, ring and pinky fingers. The second stage starts only when all five fingertips are detected in the first stage. In the second stage, each of the fingertips detected in the first stage is tracked with a single particle filter. Note that the first stage can be performed not only at the first frame: whenever five fingertips are detected, the fingertip tracking process can also be automatically reinitialized.
3.1 Initialization and Re-initialization
The task of this stage is to detect all five fingertips in the depth image F_V based on three depth-based features: the geodesic distance, the local rectangle feature and the GSP points. To estimate the geodesic distance for each point, we first build a graph G_h = (V_h, E_h) using the point cloud F_V as in [8]. V_h consists of all points within F_V. For each pair of vertices (p, q) ∈ V_h, there is an edge between p and q if and only if they are in the 8-neighborhood of each other and their 3D distance d(p, q) = ||p − q||_2 is within a threshold τ. To ensure the resulting graph is connected, we search for the set of connected components in G_h using the union-find algorithm. The connected component containing the palm center is identified, and each remaining component is connected to it by finding its nearest vertex pair and adding an edge with weight equal to the 3D Euclidean distance. We then perform Dijkstra graph search on G_h to calculate the geodesic distance from the palm center p_p for each vertex p ∈ V_h. Let the geodesic distance of each vertex p be d_g(p). The GSP point set U_G(p) for p is defined as the set of vertices on the shortest path from p_p to p. A rectangle local feature RL(p) is used to describe the neighborhood of a point p in F_V, defined as a square of size S centered at p. Each pixel q within RL(p) is binarized according to the following
rule:

$$I(q) = \begin{cases} 1 & \text{if } |z(p) - z(q)| \le z_T \\ 0 & \text{otherwise,} \end{cases} \tag{3.1}$$
where z_T is a threshold value of about 1 cm. We define η(p) as the ratio of the number of points with nonzero values in RL(p) to the size of RL(p). For a stretched hand, a fingertip can only be located where d_g is locally maximized in F_V, and the points around a fingertip take much smaller values of η than other points, say η ≤ 0.4. Based on these observations, we detect the fingertips using Algorithm 1, which consumes the two per-point features; a sketch of how they might be computed is given below, followed by the listing.
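The following is a sketch of how the two per-point features could be computed, assuming numpy/scipy, an HxWx3 array `points3d` of 3D world positions and a boolean hand mask. The function names and the default edge threshold `tau` are our assumptions (the paper does not state τ), and the union-find bridging of disconnected components is omitted here, as noted in the comments.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_distance_map(points3d, mask, palm_rc, tau=0.02):
    """d_g(p): Dijkstra geodesic distance from the palm center over a graph
    whose edges link 8-neighbors with 3D distance below tau (meters).
    The paper additionally bridges disconnected components to the palm
    component via union-find; that step is omitted in this sketch, so
    unreachable pixels simply stay at infinity."""
    h, w = mask.shape
    ids = -np.ones((h, w), dtype=np.int64)
    ids[mask] = np.arange(int(mask.sum()))
    rows, cols, wts = [], [], []
    for dy, dx in [(0, 1), (1, 0), (1, 1), (1, -1)]:     # half 8-neighborhood
        s = (slice(max(-dy, 0), h - max(dy, 0)), slice(max(-dx, 0), w - max(dx, 0)))
        t = (slice(max(dy, 0), h - max(-dy, 0)), slice(max(dx, 0), w - max(-dx, 0)))
        both = mask[s] & mask[t]
        d = np.linalg.norm(points3d[s][both] - points3d[t][both], axis=1)
        keep = d < tau                                   # cut across depth jumps
        rows.append(ids[s][both][keep])
        cols.append(ids[t][both][keep])
        wts.append(d[keep])
    n = int(mask.sum())
    graph = csr_matrix((np.concatenate(wts),
                        (np.concatenate(rows), np.concatenate(cols))),
                       shape=(n, n))
    d_g = np.full((h, w), np.inf)
    d_g[mask] = dijkstra(graph, directed=False, indices=int(ids[palm_rc]))
    return d_g

def eta(depth_m, p, S=24, z_T=0.01):
    """Rectangle local feature ratio eta(p) from Eq. (3.1): the fraction of
    the SxS window around p whose depth lies within z_T of the depth at p.
    S = 24 is an assumed window size; z_T ~ 1 cm follows the text."""
    y, x = p
    win = depth_m[max(y - S // 2, 0):y + S // 2, max(x - S // 2, 0):x + S // 2]
    return float(np.mean(np.abs(win - depth_m[y, x]) <= z_T))
```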
Algorithm 1: Fingertip Detection
Input: The border point set, U_B;
Output: The detected fingertip positions, p_f^i;
1: Preprocess: D_C = {p | p ∈ U_B, d_g(p) > d_T, η(p) < η_T};
2: Label connected components: D_C = D_1 ∪ D_2 ∪ ... ∪ D_M;
3: Sort the components according to size and ignore small ones to get D_B^i, i = 1, 2, ..., M_B;
4: Get the number of fingertips M_F: if M_B < 5, M_F = M_B; otherwise M_F = 5;
5: p_f^i = argmax_{p ∈ D_B^i} d_g(p), i = 1, 2, ..., M_F;
6: return p_f^i;
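A transcription of Algorithm 1 in the same vein might look as follows; `d_T` and `min_size` are assumed values, since the paper specifies only η_T ≈ 0.4, and the maps come from the feature sketch above.

```python
import numpy as np
from scipy.ndimage import label

def detect_fingertips(d_g, eta_map, border_mask, d_T=0.08, eta_T=0.4,
                      min_size=20):
    """Direct reading of Algorithm 1 over per-pixel feature maps."""
    # Step 1: candidate border points with large geodesic distance and
    # small rectangle-feature ratio.
    cand = border_mask & np.isfinite(d_g) & (d_g > d_T) & (eta_map < eta_T)
    labels, m = label(cand)                         # step 2: D_1 ... D_M
    if m == 0:
        return []
    sizes = np.bincount(labels.ravel())[1:]         # component sizes
    order = np.argsort(sizes)[::-1]                 # step 3: sort by size
    keep = [i + 1 for i in order[:5] if sizes[i] >= min_size]  # step 4: M_F
    tips = []
    for comp in keep:                               # step 5: farthest point
        ys, xs = np.nonzero(labels == comp)
        k = int(np.argmax(d_g[ys, xs]))
        tips.append((ys[k], xs[k]))
    return tips
```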
If all five fingertips are detected by Algorithm 1, each of them is given a label l ∈ L_f using a GSP-based voting strategy. We do not use the positions of the fingertips themselves for labeling, as they show great uncertainty and are not robust in cases like Fig. 1(c). Instead, we use the GSP point set of each detected fingertip p_f^i to vote for its label l_i. This voting strategy is inspired by the fact that the 2D relative positions of points near p_p on different GSPs remain stable against finger bending and global hand transformation. Let the GSP point sets of the fingertips be U_G^i = U_G(p_f^i) = {p^{i,k} | k = 0, 1, ..., N_i}, i = 1, 2, ..., 5. For each fingertip p_f^i, a five-element counter array Γ_{i,j}, j = 1, 2, ..., 5 is maintained to estimate the probability that p_f^i has the label l_j. Note that a right hand is assumed in our system. The fingertips are labeled using Algorithm 2.

Algorithm 2: GSP-based Voting for Fingertip Labeling
Input: The GSP point sets of the five fingertips, U_G^i;
Output: The label for each fingertip, l_i;
1: Γ_{i,j} = {0}, N_max = max{N_i}, k = 0;
2: Extract five points p^{i,k_r}, where k_r = k × N_i / N_max;
3: Sort the five points by arranging the five vectors v_d^i = p^{i,k_r} − p_p clockwise;
4: Let the order number of p^{i,k_r} be j; Γ_{i,j} = Γ_{i,j} + 1;
5: k = k + 1. If k ≥ N_max, go to 6; otherwise go to 2;
6: l_i = argmax_j Γ_{i,j};
7: return l_i;

In Fig. 2 we present labeling results for several samples. We can see this labeling scheme is quite robust to hand articulation.

Figure 2: Fingertip labeling. GSP points (left). Labeling results (right).
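A sketch of this voting scheme follows, under the assumption that each GSP is available as a 2D point array ordered from the palm center outward. Whether ascending angle corresponds to "clockwise" depends on the image coordinate convention, so the sort direction here is an assumption.

```python
import numpy as np

LABELS = ['T', 'I', 'M', 'R', 'P']    # thumb, index, middle, ring, pinky

def label_fingertips(gsps, p_p):
    """GSP-based voting of Algorithm 2. gsps is a list of five (N_i, 2)
    arrays of 2D GSP points ordered from the palm center outward; p_p is
    the palm center. Assumes a right hand, as in the paper."""
    votes = np.zeros((5, 5), dtype=int)              # counter array Gamma[i, j]
    n_max = max(len(g) for g in gsps)
    for k in range(n_max):
        # Step 2: one point per GSP, indices rescaled (k_r = k * N_i / N_max)
        # so all five paths are traversed in lockstep regardless of length.
        pts = [g[min(k * len(g) // n_max, len(g) - 1)] for g in gsps]
        # Step 3: sort the palm-to-point vectors by angle; the direction of
        # "clockwise" in image coordinates is a convention assumed here.
        ang = [np.arctan2(p[1] - p_p[1], p[0] - p_p[0]) for p in pts]
        for j, i in enumerate(np.argsort(ang)):      # step 4: rank j votes
            votes[i, j] += 1
    return [LABELS[j] for j in votes.argmax(axis=1)]  # step 6: argmax_j
```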
3.2 Fingertip Tracking
After the initial positions of the five fingertips are detected, we build a particle filter for each fingertip to track its position through successive frames. Let (x, ω) denote a particle, where the state hypothesis x is its 2D position in F_V and ω is the particle weight. The basic idea is to constrain the positions of the particles to the border point set U_B to reduce the search space, instead of choosing arbitrary positions within the 2D space. Let f(y_k|x_k) be the likelihood function, with y_k representing the current observation and k the frame number of the fingertip tracking stage. Each fingertip is tracked in frame k using Algorithm 3.
Algorithm 3: Particle Filter based Fingertip Tracking
Input: The initially detected fingertip position, p_f^i;
Output: The updated position of the fingertip, p_f^i;
1: If k = 0, generate N random particles at positions x_k^j, j = 1, 2, ..., N around p_f^i, with ω_{k−1}^j = 1/N;
2: Diffuse the positions of the particles by finding their nearest neighbors in U_B;
3: Estimate f(y_k|x_k^j) and update ω_k^j;
4: Update the fingertip position: p_f^i = x_k^{j*}, where j* = argmax_j ω_k^j;
5: return p_f^i;
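One tracking step could be sketched as below, with `likelihood` standing in for Eq. (3.3) defined next; the Gaussian diffusion model and its scale `sigma` are our assumptions, as the paper does not specify the noise model.

```python
import numpy as np

def track_step(particles, border_pts, likelihood, sigma=5.0, rng=None):
    """One frame of Algorithm 3 for a single fingertip. border_pts is U_B
    for the current frame as an (M, 2) array; particles is (N, 2)."""
    rng = rng if rng is not None else np.random.default_rng()
    # Step 2: diffuse, then snap every hypothesis to its nearest border
    # point, so particles live only on depth discontinuities.
    prop = particles + rng.normal(0.0, sigma, particles.shape)
    d2 = ((prop[:, None, :] - border_pts[None, :, :]) ** 2).sum(axis=2)
    snapped = border_pts[d2.argmin(axis=1)]
    # Steps 3-4: weight by the likelihood and return the best hypothesis.
    w = np.array([likelihood(x) for x in snapped])
    return snapped, snapped[int(w.argmax())]
```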
We now define the likelihood function f(y_k|x_k) based on the geodesic distance d_g, the rectangle local feature RL and the GSP point set U_G:

$$f(y_k \mid x_k) = f(d_g, RL, U_G \mid x_k) = f(d_g \mid x_k)\, f(RL \mid x_k)\, f(U_G \mid x_k), \tag{3.2}$$

where we assume d_g, RL and U_G are conditionally independent. The three terms in f(y_k|x_k) all take the form of an exponential function of a certain distance metric. In f(d_g|x_k), the distance metric is defined as the difference between D_g^i and d_g(x_k), where D_g^i is a pre-defined geodesic distance value for the fingertip with label l_i. A temporal reference is used for estimating f(RL|x_k) and f(U_G|x_k), as they change with the finger motions. Let the reference fingertip position be p_ref. f(U_G|x_k) is defined based on the Hausdorff distance D_H(U_G(x_k), U_G(p_ref)). f(RL|x_k) is defined based on the feature distance D_RL between RL(x_k) and RL(p_ref), which is defined as the ratio of the number of points with the same values to the size of the rectangle. f(y_k|x_k) is given by:

$$f(y_k \mid x_k) = \exp\!\left( -\lambda_g \left| d_g - D_g^i \right| - \lambda_h D_H - \lambda_{rl} D_{RL} \right). \tag{3.3}$$
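Read literally, Eq. (3.3) might be implemented as follows. The λ weights are placeholders, and note that we treat D_RL as the fraction of *differing* binarized pixels so that it behaves as a distance; the text's wording ("same values") would reward dissimilarity instead, so this reading is an assumption.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def likelihood(d_g_x, D_g_i, gsp_x, gsp_ref, rl_x, rl_ref,
               lam_g=5.0, lam_h=0.1, lam_rl=2.0):
    """Eq. (3.3) under stated assumptions. gsp_x/gsp_ref are (N, 2) GSP
    point arrays; rl_x/rl_ref are binarized rectangle-feature windows."""
    # Symmetric Hausdorff distance D_H between the two GSP point sets.
    d_h = max(directed_hausdorff(gsp_x, gsp_ref)[0],
              directed_hausdorff(gsp_ref, gsp_x)[0])
    d_rl = float(np.mean(rl_x != rl_ref))   # distance reading of D_RL
    return float(np.exp(-lam_g * abs(d_g_x - D_g_i)
                        - lam_h * d_h - lam_rl * d_rl))
```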
Table 1: Quantitative results on synthetic sequences

Seq. No.   Average Error (cm)
           Thumb   Index   Middle   Ring   Pinky
Seq. 1     2.51    1.53    1.50     1.27   0.77
Seq. 2     1.63    0.93    0.78     0.74   0.69
Seq. 3     1.34    0.88    0.65     0.84   0.89
Seq. 4     2.11    1.15    1.16     0.84   0.81
Seq. 5     1.20    0.82    0.75     0.52   0.59
Seq. 6     1.44    0.93    0.89     0.77   0.86

4. EXPERIMENTS
4.1 Fingertip Tracking Accuracy

We quantitatively evaluate the fingertip tracking accuracy on six synthetic sequences in terms of the Euclidean distance between the tracked fingertips and the ground truth. As it is difficult to define the fingertip locations on the skin surface, we define the ground truth using the phalanx end point of each finger. Table 1 shows the average localization errors in centimeters on all six sequences, with Seq. 1 for grasping motion, Seq. 2 for adduction/abduction motion, Seq. 3 for successive single-finger motion, Seq. 4 for flexion motion of two fingers, Seq. 5 for global rotation and Seq. 6 for a combination of grasping and global rotation. Note that the localization error partly results from the fact that the fingertips are detected on the skin surface rather than on the hand skeleton.
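For reference, the numbers in Table 1 correspond to a mean Euclidean error of the following form (the array layout is our assumption):

```python
import numpy as np

def mean_error_cm(tracked_xyz, truth_xyz):
    """Per-finger average error as reported in Table 1: mean Euclidean
    distance (cm) between tracked tips and phalanx-end ground truth, both
    given as (num_frames, 3) arrays in meters."""
    return 100.0 * float(np.mean(np.linalg.norm(tracked_xyz - truth_xyz,
                                                axis=1)))
```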
4.2 Virtual Object Manipulation

We combine the 3D positions of the fingertips and palm center with an inverse kinematics solver to drive a 3D hand model to manipulate virtual objects. Each finger is modeled as a kinematic chain, and the cyclic coordinate descent algorithm [12] is used for inverse kinematics estimation of the finger pose. Besides, we build a virtual environment using the Nvidia PhysX SDK, which contains a 3D hand model and some virtual objects like boxes and spheres. Users can use their bare hands to perform manipulative tasks such as moving, pushing and grasping. A sequence of snapshots of virtual object manipulation is shown in Fig. 3.
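A planar sketch of one cyclic-coordinate-descent sweep [12] conveys the idea; the actual system works on 3D finger chains, and the joint layout here is hypothetical.

```python
import numpy as np

def fk(angles, lengths):
    """Forward kinematics: joint positions of a planar chain from its base,
    with each joint angle relative to the previous link."""
    pts, a, p = [np.zeros(2)], 0.0, np.zeros(2)
    for th, l in zip(angles, lengths):
        a += th
        p = p + l * np.array([np.cos(a), np.sin(a)])
        pts.append(p)
    return pts

def ccd_step(angles, lengths, target):
    """One CCD sweep: rotate each joint, last to first, so the current end
    effector swings toward the tracked fingertip position `target`."""
    for i in reversed(range(len(angles))):
        pts = fk(angles, lengths)
        to_end = pts[-1] - pts[i]
        to_tgt = target - pts[i]
        angles[i] += (np.arctan2(to_tgt[1], to_tgt[0])
                      - np.arctan2(to_end[1], to_end[0]))
    return angles
```

Iterating `ccd_step` until the end-effector error is small gives the per-finger pose; joint-limit clamping, which a hand model would need, is omitted from this sketch.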
5. CONCLUSION

Fingertip and palm positions are important features for human-computer interaction. Most previous approaches cannot track the 3D positions of fingertips robustly due to the high flexibility of finger motion. In this paper, we address these issues by using multiple depth-based features for accurate fingertip localization and adopting a particle filter to track the fingertips over successive frames. The palm is located by performing distance transform to the segmented hand contour and tracked with a Kalman filter. Quantitative results on synthetic depth sequences and a real-life human-computer interaction application show that the proposed scheme can track the fingertips accurately and has great potential for extension to other HCI applications.
Figure 3: Virtual object grasping.
6. ACKNOWLEDGMENTS

This research, carried out at the BeingThere Centre, is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office.
7. REFERENCES

[1] C.-S. Chua, H. Guan, and Y.-K. Ho. Model-based 3D hand posture estimation from a single 2D image. Image and Vision Computing, 20(3):191–202, 2002.
[2] S. Conseil, S. Bourennane, and L. Martin. Three dimensional fingertip tracking in stereovision. In Proc. of the 7th Int'l Conf. on Advanced Concepts for Intelligent Vision Systems, 2005.
[3] M. Do, T. Asfour, and R. Dillmann. Particle filter-based fingertip tracking with circular Hough transform features. In Proc. of the 12th Conf. on Machine Vision Applications, 2011.
[4] K. Hsiao, T. Chen, and S. Chien. Fast fingertip positioning by combining particle filtering with particle random diffusion. In Proc. IEEE Int'l Conf. on Multimedia and Expo, 2008.
[5] I. Katz, K. Gabayan, and H. Aghajan. A multi-touch surface using multiple cameras. In Proc. of the 9th Int'l Conf. on Advanced Concepts for Intelligent Vision Systems, 2007.
[6] K. Oka, Y. Sato, and H. Koike. Real-time tracking of multiple fingertips and gesture recognition for augmented desk interface systems. In Proc. IEEE Int'l Conf. on Automatic Face and Gesture Recognition, 2002.
[7] V. I. Pavlovic, R. Sharma, and T. S. Huang. Visual interpretation of hand gestures for human-computer interaction: A review. IEEE Trans. PAMI, 19(7):677–695, 1997.
[8] C. Plagemann, V. Ganapathi, D. Koller, and S. Thrun. Real-time identification and localization of body parts from depth images. In Proc. IEEE Int'l Conf. on Robotics and Automation, 2010.
[9] J. L. Raheja, A. Chaudhary, and K. Singal. Tracking of fingertips and centres of palm using Kinect. In Proc. of the 3rd Int'l Conf. on Computational Intelligence, Modelling and Simulation, 2011.
[10] Z. Ren, J. Meng, J. Yuan, and Z. Zhang. Robust hand gesture recognition with Kinect sensor. In Proc. of ACM Multimedia, 2011.
[11] Z. Ren, J. Yuan, and Z. Zhang. Robust hand gesture recognition based on finger-earth mover's distance with a commodity depth camera. In Proc. of ACM Multimedia, 2011.
[12] L. C. T. Wang and C. C. Chen. A combined optimization method for solving the inverse kinematics problem of mechanical manipulators. IEEE Trans. Robotics and Automation, 7(4):489–499, 1991.