3D Fingertip and Palm Tracking in Depth Image Sequences

Hui Liang
Institute for Media Innovation & School of EEE
Nanyang Technological University, Singapore
u.edu.sg

Junsong Yuan
School of EEE
Nanyang Technological University
50 Nanyang Avenue, Singapore 639798
jsyuan@ntu.edu.sg

Daniel Thalmann
Institute for Media Innovation
Nanyang Technological University
50 Nanyang Drive, Singapore 637553
danielthalmann@ntu.edu.sg
ABSTRACT

We present a vision-based approach for robust 3D fingertip and palm tracking on depth images using a single Kinect sensor. First the hand is segmented in the depth images by applying depth and morphological constraints. The palm is located by performing distance transform to the hand contour and tracked with a Kalman filter. The fingertips are detected by combining three depth-based features and tracked with a particle filter over successive frames. Quantitative results on synthetic depth sequences show the proposed scheme can track the fingertips quite accurately. Besides, its capabilities are further demonstrated through a real-life human-computer interaction application.
Categories and Subject Descriptors

H.1.2 [Models and Principles]: User/Machine Systems—Human information processing; I.4.8 [Image Processing and Computer Vision]: Scene Analysis—Depth cues, Tracking
Keywords
Fingertip Tracking, Human-Computer Interaction, Kinect Sensor, Geodesic Distance
1. INTRODUCTION

The human hand is an essential body part for human-computer interaction due to its various usages in gesture recognition, animation synthesis and virtual object manipulation [7, 10, 11]. As important features of the hand, the positions of tracked fingertips have a variety of applications. They can be used in combination with an inverse kinematics solver for hand pose estimation [1]. Their trajectories can be used for gesture recognition [6, 4] or for manipulative purposes in multi-touch systems [5]. Much work has been done on vision-based fingertip tracking, but many previous methods only focus on extracting 2D fingertips and cannot track fingertips robustly for a freely moving hand [6, 4, 5, 9].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM'12, October 29–November 2, 2012, Nara, Japan.
Copyright 2012 ACM 978-1-4503-1089-5/12/10 ...$15.00.

Figure 1: Difficulty in fingertip tracking and solutions. (a) Side-by-side fingers. (b) Bending fingers. (c) Nearby fingertips: fingertips of the thumb and index fingers are too close to be labeled correctly. (d) Rectangle feature. (e) Geodesic shortest path.
Research in 3D fingertip localization and tracking is still very limited, and the performance of existing methods is far from satisfactory for real-life applications [2, 3]. The difficulty of accurate 3D fingertip tracking mainly lies in three aspects. First, the fingertips of several side-by-side fingers are hard to distinguish. Second, traditional contour-based methods cannot locate the fingertips of bending fingers. Third, it is challenging to label each detected fingertip correctly. Fig. 1(a-c) illustrates these problems.
Most previous fingertip tracking schemes are based on contour analysis of the extracted hand region [6, 4, 5, 2] and usually can track the fingertips of stretched fingers only. In [6] fingertips are tracked in infrared image sequences using a template matching strategy to detect the fingertip locations. The correspondence of the fingertips between successive frames is built by minimizing the sum of distances between the predicted locations given by a Kalman filter and the detected fingertips. In [4] fingertips are located within the hand region by first propagating a set of particles from the hand center to the hand contour and then choosing the particles where the transitions between skin and non-skin areas meet certain requirements. Stereoscopic vision is adopted in [2] to track the 3D position of the fingertip of a single pointing finger. The fingertip is located by finding the two points which maximize the distance to the center of gravity of the hand region and the boundary curvature on the silhouette of the hand in both input images. The 3D position of the fingertip is then found using stereovision and tracked with a Kalman filter. In [9] the Kinect sensor is utilized for 3D fingertip and palm center detection for two hands. The palm center is detected by applying distance transform to the inverted binary image of the hand regions. The finger regions are segmented from the palm region, and fingertip locations are found by assuming they are the closest to the camera in each finger region. In [3] a more discriminative circular image feature is adopted for fingertip detection, which can handle more complex hand motion such as grasping. The fingertips are tracked by combining particle filtering and mean-shift tracking. However, none of these methods is capable of extracting the 3D positions of all five fingertips during natural hand motions, such as in Fig. 1(b-c).
In this paper we present a robust fingertip and palm tracking scheme that takes as input depth images captured by a single Kinect sensor. The hand region is segmented from the depth frame by applying depth and morphological constraints, and the palm circle is then identified. The 3D positions of the fingertips are tracked using a particle filter through successive frames, relying on three depth-based features to differentiate fingertip and non-fingertip points. Quantitative test results on six synthetic sequences show that the proposed scheme tracks the 3D fingertip positions quite accurately. In addition, we develop an application based on the fingertip tracking results, in which the 3D fingertip positions are used with an inverse kinematics solver to drive a hand model to manipulate virtual objects.
2. HAND AND PALM DETECTION

We utilize the morphology of the hand for hand segmentation in the depth image and make several assumptions on hand motion. First, we assume the hand is the nearest object to the camera and constrain global hand rotation by

$$-15^{\circ} \le \theta_x \le 15^{\circ}, \quad -15^{\circ} \le \theta_y \le 15^{\circ}, \quad -90^{\circ} \le \theta_z \le 90^{\circ}, \tag{2.1}$$

where (θ_x, θ_y, θ_z) is the global rotation angle of the hand. Second, the depth value differences within the forearm and hand region are less than a threshold z_D = 0.2 m. Third, based on the morphology of the hand, we assume that the palm forms the globally largest blob in the hand and forearm region in the depth image when θ_x ≈ θ_y ≈ 0°, and a locally largest blob when the hand rotates within the ranges defined in (2.1). The palm region can thus be approximated with a circle C_p = (p_p, r_p), where p_p is the palm center and r_p is the radius. The proposed hand and palm detection scheme consists of three steps: foreground segmentation, palm localization and hand segmentation. It starts by thresholding the depth frame to obtain the foreground F, given by

$$F = \{\, p \mid z(p) < z_0 + z_D \,\}, \tag{2.2}$$

where (p, z(p)) denotes a pixel in the depth image at coordinate p with depth value z(p), and z_0 is the minimum depth value. This ensures that both hand and forearm regions are extracted from the depth frame. C_p then equals the largest inscribed circle of the contour of F. To reduce the computational complexity of palm localization, the center of C_p is tracked with a 2D Kalman filter. Finally, the hand and forearm regions are separated by a line which is both tangent to C_p and perpendicular to the orientation of the forearm. We approximate the orientation of the forearm using the eigenvector that corresponds to the largest eigenvalue of the covariance matrix of the contour pixel coordinates of F. Let the extracted hand region in the depth frame be F_D. We further process F_D to get a 3D point cloud F_V by calculating the 3D world position for each point in F_D using the projection parameters of the Kinect sensor.
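To make the foreground-thresholding and palm-localization steps concrete, here is a minimal Python sketch, assuming the depth frame arrives as a metric numpy array and OpenCV is available. The function name `locate_palm` and constant `Z_D` are illustrative names of ours, and the 2D Kalman smoothing of the palm center is omitted for brevity.

```python
import cv2
import numpy as np

Z_D = 0.2  # depth-range constraint z_D from Eq. (2.2), in meters

def locate_palm(depth_m):
    """Foreground thresholding (Eq. 2.2) followed by palm localization:
    the palm circle C_p is the largest circle inscribed in the foreground,
    found as the maximum of the distance transform."""
    z0 = depth_m[depth_m > 0].min()                  # nearest valid depth z_0
    fg = ((depth_m > 0) & (depth_m < z0 + Z_D)).astype(np.uint8)
    # Each foreground pixel gets its distance to the nearest background
    # pixel; the maximum location is the inscribed-circle center.
    dist = cv2.distanceTransform(fg, cv2.DIST_L2, 5)
    _, r_p, _, p_p = cv2.minMaxLoc(dist)             # radius and center of C_p
    return p_p, r_p                                  # (x, y) in pixels, pixels
```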
3. FINGERTIP DETECTION & TRACKING

Inspired by the concept of Accumulative Geodesic Extrema (AGEX) [8], we define the fingertip position as the point that maximizes the geodesic distance from the palm center within each finger. However, due to self-occlusion of the hand, the geodesic distances may not be correctly estimated for all points within the hand region. In addition, the AGEX extraction algorithm [8] cannot robustly detect the fingertip positions when multiple fingers are side-by-side, as it requires the AGEX interest points to be sparsely located, and its computational complexity is high since Dijkstra's algorithm needs to be performed every time an AGEX point is extracted. We address these issues by imposing more constraints on possible fingertip locations. First, we assume fingertips can only be detected where depth is discontinuous in F_V, see Fig. 1(a-c), and denote this set of points as the border point set U_B. Second, the relative depth differences between one point and its neighborhood are important to differentiate fingertip and non-fingertip points, see Fig. 1(d), and we design a rectangle local feature to take advantage of this fact. Third, we utilize the 3D geodesic shortest path (GSP) to differentiate nearby fingertips, which is more robust than the fingertip position alone, see Fig. 1(e). Overall, the proposed fingertip detection and tracking scheme consists of two stages, namely the initialization and reinitialization stage and the fingertip tracking stage. In the first stage, the user is requested to pose the hand so that the fingers are not side-by-side. The fingertip positions are detected using three depth-based features. Each fingertip is then given a label l ∈ L_f = {T, I, M, R, P} using a GSP-based voting strategy. The labels in L_f correspond to the thumb, index, middle, ring and pinky fingers. The second stage starts only when all five fingertips are detected in the first stage. In the second stage, each of the fingertips detected in the first stage is tracked with a single particle filter. Note that the first stage can be performed not only at the first frame: whenever five fingertips are detected, the fingertip tracking process can also be automatically reinitialized.
3.1 Initialization and Re-initialization
The task of this stage is to detect all five fingertips in the depth image F_V based on three depth-based features: the geodesic distance, the local rectangle feature and the GSP points. To estimate the geodesic distance for each point, we first build a graph G_h = (V_h, E_h) using the point cloud F_V as in [8]. V_h consists of all points within F_V. For each pair of vertices (p, q) ∈ V_h, there is an edge between p and q if and only if they are in the 8-neighborhood of each other and their 3D distance d(p, q) = ||p − q||_2 is within a threshold τ. To ensure the resulting graph is connected, we search for the set of connected components in G_h using the union-find algorithm. The connected component containing the palm center is identified, and each remaining component is connected to it by finding its nearest vertex pair and adding an edge with weight equal to the 3D Euclidean distance. We then perform Dijkstra graph search on G_h to calculate the geodesic distance from the palm center p_p for each vertex p ∈ V_h. Let the geodesic distance of each vertex p be d_g(p). The GSP point set U_G(p) for p is defined as the set of vertices on the shortest path from p_p to p. A rectangle local feature RL(p) is used to describe the neighborhood of a point p in F_V, defined as a square of size S centered at p. Each pixel q within RL(p) is binarized according to the following
rule:

$$I(q) = \begin{cases} 1 & \text{if } |z(p) - z(q)| \le z_T \\ 0 & \text{otherwise,} \end{cases} \tag{3.1}$$
where z_T is a threshold value of about 1 cm. We define η(p) as the ratio of the number of points with nonzero values in RL(p) to the size of RL(p). For a stretched hand, a fingertip can only be located where d_g is locally maximized in F_V, and the points around a fingertip take much smaller values of η than other points, say η ≤ 0.4. Based on these observations, we detect the fingertips using Algorithm 1, which consumes the two per-point features; a sketch of how they might be computed is given below, followed by the listing.
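The following is a sketch of how the two per-point features could be computed, assuming numpy/scipy, an HxWx3 array `points3d` of 3D world positions and a boolean hand mask. The function names and the default edge threshold `tau` are our assumptions (the paper does not state τ), and the union-find bridging of disconnected components is omitted here, as noted in the comments.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def geodesic_distance_map(points3d, mask, palm_rc, tau=0.02):
    """d_g(p): Dijkstra geodesic distance from the palm center over a graph
    whose edges link 8-neighbors with 3D distance below tau (meters).
    The paper additionally bridges disconnected components to the palm
    component via union-find; that step is omitted in this sketch, so
    unreachable pixels simply stay at infinity."""
    h, w = mask.shape
    ids = -np.ones((h, w), dtype=np.int64)
    ids[mask] = np.arange(int(mask.sum()))
    rows, cols, wts = [], [], []
    for dy, dx in [(0, 1), (1, 0), (1, 1), (1, -1)]:     # half 8-neighborhood
        s = (slice(max(-dy, 0), h - max(dy, 0)), slice(max(-dx, 0), w - max(dx, 0)))
        t = (slice(max(dy, 0), h - max(-dy, 0)), slice(max(dx, 0), w - max(-dx, 0)))
        both = mask[s] & mask[t]
        d = np.linalg.norm(points3d[s][both] - points3d[t][both], axis=1)
        keep = d < tau                                   # cut across depth jumps
        rows.append(ids[s][both][keep])
        cols.append(ids[t][both][keep])
        wts.append(d[keep])
    n = int(mask.sum())
    graph = csr_matrix((np.concatenate(wts),
                        (np.concatenate(rows), np.concatenate(cols))),
                       shape=(n, n))
    d_g = np.full((h, w), np.inf)
    d_g[mask] = dijkstra(graph, directed=False, indices=int(ids[palm_rc]))
    return d_g

def eta(depth_m, p, S=24, z_T=0.01):
    """Rectangle local feature ratio eta(p) from Eq. (3.1): the fraction of
    the SxS window around p whose depth lies within z_T of the depth at p.
    S = 24 is an assumed window size; z_T ~ 1 cm follows the text."""
    y, x = p
    win = depth_m[max(y - S // 2, 0):y + S // 2, max(x - S // 2, 0):x + S // 2]
    return float(np.mean(np.abs(win - depth_m[y, x]) <= z_T))
```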
Algorithm 1: Fingertip Detection
Input: The border point set, U_B;
Output: The detected fingertip positions, p_f^i;
1: Preprocess: D_C = {p | p ∈ U_B, d_g(p) > d_T, η(p) < η_T};
2: Label connected components: D_C = D_1 ∪ D_2 ∪ ... ∪ D_M;
3: Sort the components according to size and ignore small ones to get D_B^i, i = 1, 2, ..., M_B;
4: Get the number of fingertips M_F: if M_B < 5, M_F = M_B; otherwise M_F = 5;
5: p_f^i = argmax_{p ∈ D_B^i} d_g(p), i = 1, 2, ..., M_F;
6: return p_f^i;
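A transcription of Algorithm 1 in the same vein might look as follows; `d_T` and `min_size` are assumed values, since the paper specifies only η_T ≈ 0.4, and the maps come from the feature sketch above.

```python
import numpy as np
from scipy.ndimage import label

def detect_fingertips(d_g, eta_map, border_mask, d_T=0.08, eta_T=0.4,
                      min_size=20):
    """Direct reading of Algorithm 1 over per-pixel feature maps."""
    # Step 1: candidate border points with large geodesic distance and
    # small rectangle-feature ratio.
    cand = border_mask & np.isfinite(d_g) & (d_g > d_T) & (eta_map < eta_T)
    labels, m = label(cand)                         # step 2: D_1 ... D_M
    if m == 0:
        return []
    sizes = np.bincount(labels.ravel())[1:]         # component sizes
    order = np.argsort(sizes)[::-1]                 # step 3: sort by size
    keep = [i + 1 for i in order[:5] if sizes[i] >= min_size]  # step 4: M_F
    tips = []
    for comp in keep:                               # step 5: farthest point
        ys, xs = np.nonzero(labels == comp)
        k = int(np.argmax(d_g[ys, xs]))
        tips.append((ys[k], xs[k]))
    return tips
```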
If all five fingertips are detected by Algorithm 1, each of them is given a label l ∈ L_f using a GSP-based voting strategy. We do not use the positions of the fingertips themselves for labeling, as they show great uncertainty and are not robust in cases like Fig. 1(c). Instead, we use the GSP point set of each detected fingertip p_f^i to vote for its label l_i. This voting strategy is inspired by the fact that the 2D relative positions of points near p_p on different GSPs remain stable against finger bending and global hand transformation. Let the GSP point sets of the fingertips be U_G^i = U_G(p_f^i) = {p^{i,k} | k = 0, 1, ..., N_i}, i = 1, 2, ..., 5. For each fingertip p_f^i, a five-element counter array Γ_{i,j}, j = 1, 2, ..., 5 is maintained to estimate the probability that p_f^i has the label l_j. Note that a right hand is assumed in our system. The fingertips are labeled using Algorithm 2.

Algorithm 2: GSP-based Voting for Fingertip Labeling
Input: The GSP point sets of the five fingertips, U_G^i;
Output: The label for each fingertip, l_i;
1: Γ_{i,j} = {0}, N_max = max{N_i}, k = 0;
2: Extract five points p^{i,k_r}, where k_r = k × N_i / N_max;
3: Sort the five points by arranging the five vectors v_d^i = p^{i,k_r} − p_p clockwise;
4: Let the order number of p^{i,k_r} be j; Γ_{i,j} = Γ_{i,j} + 1;
5: k = k + 1. If k ≥ N_max, go to 6; otherwise go to 2;
6: l_i = argmax_j Γ_{i,j};
7: return l_i;

In Fig. 2 we present labeling results for several samples. We can see this labeling scheme is quite robust to hand articulation.

Figure 2: Fingertip labeling. GSP points (left). Labeling results (right).
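A sketch of this voting scheme follows, under the assumption that each GSP is available as a 2D point array ordered from the palm center outward. Whether ascending angle corresponds to "clockwise" depends on the image coordinate convention, so the sort direction here is an assumption.

```python
import numpy as np

LABELS = ['T', 'I', 'M', 'R', 'P']    # thumb, index, middle, ring, pinky

def label_fingertips(gsps, p_p):
    """GSP-based voting of Algorithm 2. gsps is a list of five (N_i, 2)
    arrays of 2D GSP points ordered from the palm center outward; p_p is
    the palm center. Assumes a right hand, as in the paper."""
    votes = np.zeros((5, 5), dtype=int)              # counter array Gamma[i, j]
    n_max = max(len(g) for g in gsps)
    for k in range(n_max):
        # Step 2: one point per GSP, indices rescaled (k_r = k * N_i / N_max)
        # so all five paths are traversed in lockstep regardless of length.
        pts = [g[min(k * len(g) // n_max, len(g) - 1)] for g in gsps]
        # Step 3: sort the palm-to-point vectors by angle; the direction of
        # "clockwise" in image coordinates is a convention assumed here.
        ang = [np.arctan2(p[1] - p_p[1], p[0] - p_p[0]) for p in pts]
        for j, i in enumerate(np.argsort(ang)):      # step 4: rank j votes
            votes[i, j] += 1
    return [LABELS[j] for j in votes.argmax(axis=1)]  # step 6: argmax_j
```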
3.2 Fingertip Tracking
After the initial positions of the five fingertips are detected, we build a particle filter for each fingertip to track its position through successive frames. Let (x, ω) denote a particle, where the state hypothesis x is its 2D position in F_V and ω is the particle weight. The basic idea is to constrain the positions of the particles to the border point set U_B to reduce the search space, instead of choosing arbitrary positions within the 2D space. Let f(y_k|x_k) be the likelihood function, with y_k representing the current observation and k the frame number of the fingertip tracking stage. Each fingertip is tracked in frame k using Algorithm 3.
Algorithm 3: Particle Filter based Fingertip Tracking
Input: The initially detected fingertip position, p_f^i;
Output: The updated position of the fingertip, p_f^i;
1: If k = 0, generate N random particles at positions x_k^j, j = 1, 2, ..., N around p_f^i, with ω_{k−1}^j = 1/N;
2: Diffuse the positions of the particles by finding their nearest neighbors in U_B;
3: Estimate f(y_k|x_k^j) and update ω_k^j;
4: Update the fingertip position: p_f^i = x_k^{j*}, where j* = argmax_j ω_k^j;
5: return p_f^i;
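One tracking step could be sketched as below, with `likelihood` standing in for Eq. (3.3) defined next; the Gaussian diffusion model and its scale `sigma` are our assumptions, as the paper does not specify the noise model.

```python
import numpy as np

def track_step(particles, border_pts, likelihood, sigma=5.0, rng=None):
    """One frame of Algorithm 3 for a single fingertip. border_pts is U_B
    for the current frame as an (M, 2) array; particles is (N, 2)."""
    rng = rng if rng is not None else np.random.default_rng()
    # Step 2: diffuse, then snap every hypothesis to its nearest border
    # point, so particles live only on depth discontinuities.
    prop = particles + rng.normal(0.0, sigma, particles.shape)
    d2 = ((prop[:, None, :] - border_pts[None, :, :]) ** 2).sum(axis=2)
    snapped = border_pts[d2.argmin(axis=1)]
    # Steps 3-4: weight by the likelihood and return the best hypothesis.
    w = np.array([likelihood(x) for x in snapped])
    return snapped, snapped[int(w.argmax())]
```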
We now define the likelihood function f(y_k|x_k) based on the geodesic distance d_g, the rectangle local feature RL and the GSP point set U_G:

$$f(y_k \mid x_k) = f(d_g, RL, U_G \mid x_k) = f(d_g \mid x_k)\, f(RL \mid x_k)\, f(U_G \mid x_k), \tag{3.2}$$

where we assume d_g, RL and U_G are conditionally independent. The three terms in f(y_k|x_k) all take the form of an exponential function of a certain distance metric. In f(d_g|x_k), the distance metric is defined as the difference between D_g^i and d_g(x_k), where D_g^i is a pre-defined geodesic distance value for the fingertip with label l_i. A temporal reference is used for estimating f(RL|x_k) and f(U_G|x_k), as they change with the finger motions. Let the reference fingertip position be p_ref. f(U_G|x_k) is defined based on the Hausdorff distance D_H(U_G(x_k), U_G(p_ref)). f(RL|x_k) is defined based on the feature distance D_RL between RL(x_k) and RL(p_ref), which is defined as the ratio of the number of points with the same values to the size of the rectangle. f(y_k|x_k) is given by:

$$f(y_k \mid x_k) = \exp\!\left( -\lambda_g \left| d_g - D_g^i \right| - \lambda_h D_H - \lambda_{rl} D_{RL} \right). \tag{3.3}$$
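Read literally, Eq. (3.3) might be implemented as follows. The λ weights are placeholders, and note that we treat D_RL as the fraction of *differing* binarized pixels so that it behaves as a distance; the text's wording ("same values") would reward dissimilarity instead, so this reading is an assumption.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def likelihood(d_g_x, D_g_i, gsp_x, gsp_ref, rl_x, rl_ref,
               lam_g=5.0, lam_h=0.1, lam_rl=2.0):
    """Eq. (3.3) under stated assumptions. gsp_x/gsp_ref are (N, 2) GSP
    point arrays; rl_x/rl_ref are binarized rectangle-feature windows."""
    # Symmetric Hausdorff distance D_H between the two GSP point sets.
    d_h = max(directed_hausdorff(gsp_x, gsp_ref)[0],
              directed_hausdorff(gsp_ref, gsp_x)[0])
    d_rl = float(np.mean(rl_x != rl_ref))   # distance reading of D_RL
    return float(np.exp(-lam_g * abs(d_g_x - D_g_i)
                        - lam_h * d_h - lam_rl * d_rl))
```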
Table 1: Quantitative results on synthetic sequences

Seq. No.   Average Error (cm)
           Thumb   Index   Middle   Ring   Pinky
Seq. 1     2.51    1.53    1.50     1.27   0.77
Seq. 2     1.63    0.93    0.78     0.74   0.69
Seq. 3     1.34    0.88    0.65     0.84   0.89
Seq. 4     2.11    1.15    1.16     0.84   0.81
Seq. 5     1.20    0.82    0.75     0.52   0.59
Seq. 6     1.44    0.93    0.89     0.77   0.86

4. EXPERIMENTS
4.1 Fingertip Tracking Accuracy

We quantitatively evaluate the fingertip tracking accuracy on six synthetic sequences in terms of the Euclidean distance between the tracked fingertips and the ground truth. As it is difficult to define the fingertip locations on the skin surface, we define the ground truth using the phalanx end point of each finger. Table 1 shows the average localization errors in centimeters on all six sequences, with Seq. 1 for grasping motion, Seq. 2 for adduction/abduction motion, Seq. 3 for successive single-finger motion, Seq. 4 for flexion motion of two fingers, Seq. 5 for global rotation and Seq. 6 for a combination of grasping and global rotation. Note that the localization error partly results from the fact that the fingertips are detected on the skin surface rather than on the hand skeleton.
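For reference, the numbers in Table 1 correspond to a mean Euclidean error of the following form (the array layout is our assumption):

```python
import numpy as np

def mean_error_cm(tracked_xyz, truth_xyz):
    """Per-finger average error as reported in Table 1: mean Euclidean
    distance (cm) between tracked tips and phalanx-end ground truth, both
    given as (num_frames, 3) arrays in meters."""
    return 100.0 * float(np.mean(np.linalg.norm(tracked_xyz - truth_xyz,
                                                axis=1)))
```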
4.2 Virtual Object Manipulation

We combine the 3D positions of the fingertips and palm center with an inverse kinematics solver to drive a 3D hand model to manipulate virtual objects. Each finger is modeled as a kinematic chain, and the cyclic coordinate descent algorithm [12] is used for inverse kinematics estimation of the finger pose. Besides, we build a virtual environment using the Nvidia PhysX SDK, which contains a 3D hand model and some virtual objects like boxes and spheres. Users can use their bare hands to perform manipulative tasks such as moving, pushing and grasping. A sequence of snapshots of virtual object manipulation is shown in Fig. 3.
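A planar sketch of one cyclic-coordinate-descent sweep [12] conveys the idea; the actual system works on 3D finger chains, and the joint layout here is hypothetical.

```python
import numpy as np

def fk(angles, lengths):
    """Forward kinematics: joint positions of a planar chain from its base,
    with each joint angle relative to the previous link."""
    pts, a, p = [np.zeros(2)], 0.0, np.zeros(2)
    for th, l in zip(angles, lengths):
        a += th
        p = p + l * np.array([np.cos(a), np.sin(a)])
        pts.append(p)
    return pts

def ccd_step(angles, lengths, target):
    """One CCD sweep: rotate each joint, last to first, so the current end
    effector swings toward the tracked fingertip position `target`."""
    for i in reversed(range(len(angles))):
        pts = fk(angles, lengths)
        to_end = pts[-1] - pts[i]
        to_tgt = target - pts[i]
        angles[i] += (np.arctan2(to_tgt[1], to_tgt[0])
                      - np.arctan2(to_end[1], to_end[0]))
    return angles
```

Iterating `ccd_step` until the end-effector error is small gives the per-finger pose; joint-limit clamping, which a hand model would need, is omitted from this sketch.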
5. CONCLUSION

Fingertip and palm positions are important features for human-computer interaction. Most previous approaches cannot track the 3D positions of fingertips robustly due to the high flexibility of finger motion. In this paper, we address these issues by using multiple depth-based features for accurate fingertip localization and adopting a particle filter to track the fingertips over successive frames. The palm is located by performing distance transform to the segmented hand contour and tracked with a Kalman filter. Quantitative results on synthetic depth sequences and a real-life human-computer interaction application show that the proposed scheme can track the fingertips accurately and has great potential for extension to other HCI applications.
Figure 3: Virtual object grasping.
6. ACKNOWLEDGMENTS

This research, carried out at the BeingThere Centre, is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office.
7. REFERENCES

[1] C.-S. Chua, H. Guan, and Y.-K. Ho. Model-based 3D hand posture estimation from a single 2D image. Image and Vision Computing, 20(3):191–202, 2002.
[2] S. Conseil, S. Bourennane, and L. Martin. Three dimensional fingertip tracking in stereovision. In Proc. of the 7th Int'l Conf. on Advanced Concepts for Intelligent Vision Systems, 2005.
[3] M. Do, T. Asfour, and R. Dillmann. Particle filter-based fingertip tracking with circular Hough transform features. In Proc. of the 12th Conf. on Machine Vision Applications, 2011.
[4] K. Hsiao, T. Chen, and S. Chien. Fast fingertip positioning by combining particle filtering with particle random diffusion. In Proc. IEEE Int'l Conf. on Multimedia and Expo, 2008.
[5] I. Katz, K. Gabayan, and H. Aghajan. A multi-touch surface using multiple cameras. In Proc. of the 9th Int'l Conf. on Advanced Concepts for Intelligent Vision Systems, 2007.
[6] K. Oka, Y. Sato, and H. Koike. Real-time tracking of multiple fingertips and gesture recognition for augmented desk interface systems. In Proc. IEEE Int'l Conf. on Automatic Face and Gesture Recognition, 2002.
[7] V. I. Pavlovic, R. Sharma, and T. S. Huang. Visual interpretation of hand gestures for human-computer interaction: A review. IEEE Trans. PAMI, 19(7):677–695, 1997.
[8] C. Plagemann, V. Ganapathi, D. Koller, and S. Thrun. Real-time identification and localization of body parts from depth images. In Proc. IEEE Int'l Conf. on Robotics and Automation, 2010.
[9] J. L. Raheja, A. Chaudhary, and K. Singal. Tracking of fingertips and centres of palm using Kinect. In Proc. of the 3rd Int'l Conf. on Computational Intelligence, Modelling and Simulation, 2011.
[10] Z. Ren, J. Meng, J. Yuan, and Z. Zhang. Robust hand gesture recognition with Kinect sensor. In Proc. of ACM Multimedia, 2011.
[11] Z. Ren, J. Yuan, and Z. Zhang. Robust hand gesture recognition based on finger-earth mover's distance with a commodity depth camera. In Proc. of ACM Multimedia, 2011.
[12] L. C. T. Wang and C. C. Chen. A combined optimization method for solving the inverse kinematics problem of mechanical manipulators. IEEE Trans. Robotics and Automation, 7(4):489–499, 1991.