Real Time Head Pose Estimation from Consumer Depth Cameras


Gabriele Fanelli¹, Thibaut Weise², Juergen Gall¹ and Luc Van Gool¹,³

¹ETH Zurich, Switzerland ²EPFL Lausanne, Switzerland ³KU Leuven, Belgium
{fanelli,gall,vangool}@hz.ch, thibaut.weise@epfl.ch

Abstract. We present a system for estimating the location and orientation of a person's head from depth data acquired by a low-quality device. Our approach is based on discriminative random regression forests: ensembles of random trees trained by splitting each node so as to simultaneously reduce the entropy of the class label distribution and the variance of the head position and orientation. We evaluate three different approaches to jointly take classification and regression performance into account during training. For evaluation, we acquired a new dataset and propose a method for its automatic annotation.

1 Introduction

Head pose estimation is a key element of human behavior analysis. For this reason, many applications would benefit from automatic and robust head pose estimation systems. While 2D video presents ambiguities that are hard to resolve in real time, systems relying on 3D data have shown very good results [5,10]. Such approaches, however, use bulky 3D scanners like [22] and are not suitable for consumer products or mobile applications like robots. Today, cheap depth cameras exist, even though they provide much lower quality data.

We present an approach for real time 3D head pose estimation that is robust to the poor signal-to-noise ratio of current consumer depth cameras. The method is inspired by the recent work of [10], which uses random regression forests [9] to estimate the 3D head pose in real time from high quality depth data. It learns a mapping between simple depth features and real-valued parameters such as the 3D head position and rotation angles. That system achieves very good performance and is robust to occlusions, but it assumes that the face is the sole object in the field of view. We extend the regression forests so that they discriminate depth patches that belong to a head (classification) and use only those patches to predict the pose (regression), jointly solving the classification and regression problems. In our experiments, we evaluate several schemes that can be used to optimize both the discriminative power and the regression accuracy of such a random forest. To deal with the characteristic noise level of the sensor, we cannot rely on synthetic data as in [10]; instead, we have to acquire real training data, i.e., faces captured with a similar sensor. We therefore recorded several subjects and their head movements, annotating the data by tracking each sequence with a personalized template.


Our system works on a frame-by-frame basis, needs no initialization, and runs in real time. In our experiments, we show that it can handle large pose changes, as well as variations such as facial hair and partial occlusions.

2 Related Work

The literature contains several works on head pose estimation, which can be conveniently divided according to whether they use 2D images or depth data.

Among the algorithms based on 2D images, we can further distinguish between appearance-based methods, which analyze the whole face region, and feature-based methods, which rely on the localization of specific facial features, e.g., the eyes. Examples of appearance-based methods are [13] and [17], where the head pose space is discretized and separate detectors are learned for each segment. Statistical generative models, e.g., active appearance models [8] and their variations [7,19,2], are very popular in the face analysis field, but are rarely employed for head pose estimation. Feature-based methods are limited by their need either to have the same facial features visible across different poses or to define pose-dependent features [24,16]. In general, all 2D image-based methods suffer from several problems, in particular changes in illumination and identity, and the rather textureless regions of the face.

With the recent increasing availability of depth-sensing technologies, a few notable works have shown the usefulness of depth for solving the problem of head pose estimation, either as the only cue [5,10] or in combination with 2D image data [6,20]. Breitenstein et al. [5] developed a real time system capable of handling large head pose variations. Using high quality depth data, the method relies on the assumption that the nose is visible. Real time performance is achieved by using the parallel processing power of a GPU. The approach proposed in [10] also relies on high quality depth data, but uses random regression forests [9] to estimate the head pose, reaching real time performance without the aid of parallel computations on the GPU and without assuming any particular facial feature to be visible. While both [10] and [5] consider the case where the head is the only object present in the field of view, we deal with depth images where other parts of the body might be visible, and therefore need to discriminate which image patches belong to the head and which do not.

Random forests [4] and their variants are very popular in computer vision [18,11,9,14,12] for their ability to handle large training sets, their fast execution, and their high generalization power. In [18,11], random forests have been combined with the concept of the Hough transform for object detection and action recognition. These methods use two objective functions for optimizing the classification and the Hough voting properties of the random forests. While Gall et al. [11] randomly select which measure to optimize at each node of the trees, Okada [18] proposes a joint objective function defined as a weighted sum of the classification and regression measures. In this work, we evaluate several schemes for integrating two different objective functions, including linear weighting [18] and random selection [11].


[Fig. 1, panels (a) and (b)]

Fig. 1. Simple example of a Discriminative Regression Forest (a): a patch is sent down two trees, ending up in a non-head leaf in the first case, thus not producing a vote, and in a head leaf in the second case, extracting the multivariate Gaussian distribution stored at the leaf. In (b), one training depth image is shown. The blue bounding box enclosing the head specifies where to sample positive (green, inside) and negative (red, outside) patches.

3 Discriminative Random Regression Forests for Head Pose Estimation

Decision trees [3] are powerful tools capable of splitting a hard problem into simpler ones, solvable with trivial predictors, and thus achieving highly non-linear mappings. Each node in a tree performs a test, the result of which directs a data sample towards one of the child nodes. The tests at the nodes are chosen so as to cluster the training data in a way that allows good predictions using simple models. Such models are computed and stored at the leaves, based on the clusters of annotated data that reach them during training.

Forests of randomly trained trees generalize much better and are less sensitive to overfitting than decision trees taken separately [4]. Randomness is introduced in the training process, either in the set of training examples provided to each tree, in the set of tests available for optimization at each node, or in both.
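As a minimal sketch of the first source of randomness mentioned above (a different set of training examples per tree), the snippet below draws an independent random subset of example indices for each tree. The function name `random_tree_subsets`, the subset size, and the choice of sampling without replacement are illustrative assumptions, not details taken from this paper.

```python
import random


def random_tree_subsets(num_examples, num_trees, subset_size, seed=0):
    """Draw a different random subset of training-example indices for each tree.

    Sampling without replacement with a fixed subset size is an illustrative
    choice; bootstrap sampling (with replacement) is a common alternative.
    """
    rng = random.Random(seed)
    return [sorted(rng.sample(range(num_examples), subset_size))
            for _ in range(num_trees)]


# Each of the 3 trees is trained on its own 7 of the 10 training examples.
subsets = random_tree_subsets(num_examples=10, num_trees=3, subset_size=7)
```

The second source of randomness, a random pool of candidate tests per node, is part of the training procedure described in Section 3.1.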

When the task at hand involves both classification and regression, we call Discriminative Random Regression Forest (DRRF) an ensemble of trees that simultaneously separates test data according to whether it represents part of the object of interest and, only in the positive cases, votes for the desired real-valued variables. A simple DRRF is shown in Figure 1(a): the tests at the nodes lead a sample to a leaf, where it is classified. Only if it is classified positively does the sample retrieve the Gaussian distribution computed at training time and stored at the leaf, which is used to cast a vote in a multidimensional continuous space.
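The inference path of Fig. 1(a) can be sketched as follows. This is a hypothetical, simplified model: `Leaf`, `Split`, `reach_leaf`, and `cast_votes` are illustrative names, a leaf's vote is reduced to the mean of its stored Gaussian, and a leaf counts as a head leaf only when all of its training patches were positive. That last rule mirrors the condition $p(c=1 \mid P) = 1$ given in Section 3.2, with the additional variance check omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union


@dataclass
class Leaf:
    p_head: float               # p(c=1|P): fraction of training patches here that were head patches
    mean_pose: Sequence[float]  # mean of the Gaussian over pose parameters, used as the vote


@dataclass
class Split:
    test: Callable[[object], bool]  # binary test; True sends the patch to the right child
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def reach_leaf(node: Node, patch) -> Leaf:
    """Apply the binary tests from the root down until a leaf is reached."""
    while isinstance(node, Split):
        node = node.right if node.test(patch) else node.left
    return node


def cast_votes(forest: List[Node], patch, min_head_ratio: float = 1.0):
    """Return one pose vote per tree whose leaf classifies the patch as head."""
    votes = []
    for tree in forest:
        leaf = reach_leaf(tree, patch)
        if leaf.p_head >= min_head_ratio:  # non-head leaves produce no vote
            votes.append(leaf.mean_pose)
    return votes


# Toy forest in the spirit of Fig. 1(a); a "patch" is reduced to one mean-depth value.
forest = [
    Split(test=lambda depth: depth > 0.5,
          left=Leaf(0.0, (0.0,) * 6), right=Leaf(1.0, (0.0,) * 6)),
    Split(test=lambda depth: depth < 0.5,
          left=Leaf(0.2, (0.0,) * 6), right=Leaf(1.0, (10.0, 0.0, 5.0, 15.0, -5.0, 0.0))),
]
votes = cast_votes(forest, patch=0.3)
```

In the toy forest, the patch reaches a non-head leaf in the first tree and a head leaf in the second, so exactly one vote is cast, as in Fig. 1(a).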

Our goal is to estimate the 3D position of a head and its orientation from low-quality depth images acquired using a commercial, low-cost sensor. Unlike in [10], the head is not the only part of the person visible in the image, hence the need to classify image patches before letting them vote for the head pose.


3.1 Training

Assuming a set of depth images is available, together with labels indicating head locations and orientations, we randomly select patches of fixed size from the region of the image containing the head as positive samples, and from outside the head region as negatives. Figure 1(b) shows one of the training images we used (acquisition and annotation are explained in Section 4), with the head region marked in blue, and examples of a positive and a negative patch drawn in green and red, respectively.
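A possible way to draw the fixed-size positive and negative patches from one training image is sketched below. It assumes the head region is given as an axis-aligned bounding box and labels a patch by whether its center falls inside that box. The function name `sample_patches`, the 640×480 image size, the box, and the 100-pixel patch size in the example are placeholders, not values from the paper.

```python
import random


def sample_patches(width, height, box, size, n_pos, n_neg, seed=0, max_tries=100_000):
    """Return (x, y, label) tuples giving the top-left corners of size x size patches.

    box = (x0, y0, x1, y1) is the head bounding box, exclusive on x1 and y1.
    Assumed labelling rule: label 1 if the patch center lies inside the box, else 0.
    """
    rng = random.Random(seed)
    x0, y0, x1, y1 = box
    pos, neg = [], []
    for _ in range(max_tries):
        x = rng.randrange(width - size + 1)   # patch stays fully inside the image
        y = rng.randrange(height - size + 1)
        cx, cy = x + size // 2, y + size // 2
        inside = x0 <= cx < x1 and y0 <= cy < y1
        if inside and len(pos) < n_pos:
            pos.append((x, y, 1))
        elif not inside and len(neg) < n_neg:
            neg.append((x, y, 0))
        if len(pos) == n_pos and len(neg) == n_neg:
            return pos + neg
    raise RuntimeError("could not draw enough positive and negative patches")


patches = sample_patches(640, 480, box=(250, 100, 370, 240), size=100, n_pos=5, n_neg=5)
```

Rejection sampling keeps the sketch short; with a very small or very large head box, a bounded number of tries prevents an endless loop.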

A tree $T$ in the forest $\mathcal{T} = \{T_t\}$ is constructed from the set of patches $\{P_i = (I_i, c_i, \theta_i)\}$ sampled from the training images. $I_i$ are the depth patches and $c_i \in \{0,1\}$ are the class labels. The vector $\theta_i = \{\theta_x, \theta_y, \theta_z, \theta_{ya}, \theta_{pi}, \theta_{ro}\}$ contains the offset between the 3D point falling on the patch's center and the head center location, and the Euler rotation angles describing the head orientation.
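The training tuple $P_i = (I_i, c_i, \theta_i)$ maps naturally onto a small record type. In the hypothetical sketch below, the offset is taken as head center minus patch-center point, so that a patch votes for the head center by adding its offset. The text does not fix this sign convention, so treat it as an assumption, along with the names `TrainingPatch` and `make_theta`.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class TrainingPatch:
    depth: List[List[float]]  # I_i: the depth values of the patch
    label: int                # c_i in {0, 1}; 1 means the patch belongs to the head
    theta: Tuple[float, ...]  # (θx, θy, θz, θya, θpi, θro); meaningful only when label == 1


def make_theta(patch_point: Sequence[float], head_center: Sequence[float],
               yaw: float, pitch: float, roll: float) -> Tuple[float, ...]:
    """Offset from the patch-center 3D point to the head center, then the Euler angles.

    The sign convention (head center minus patch point) is an assumption.
    """
    offset = tuple(h - p for h, p in zip(head_center, patch_point))
    return offset + (yaw, pitch, roll)


theta = make_theta((0.0, 0.0, 800.0), (10.0, -20.0, 850.0), yaw=15.0, pitch=-5.0, roll=0.0)
sample = TrainingPatch(depth=[[800.0, 801.0], [799.0, 800.0]], label=1, theta=theta)
```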

As in [10], we define the binary test at a non-leaf node as $t_{F_1,F_2,\tau}(I)$:

$$|F_1|^{-1} \sum_{q \in F_1} I(q) - |F_2|^{-1} \sum_{q \in F_2} I(q) > \tau, \tag{1}$$

where $F_1$ and $F_2$ are rectangular, asymmetric regions defined within the patch and $\tau$ is a threshold. Such tests can be efficiently evaluated using integral images.
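To illustrate why the test of Eq. (1) is cheap, the sketch below builds an integral image and evaluates the mean depth of any rectangle with four lookups. Rectangles are given as hypothetical `(x, y, w, h)` tuples in patch coordinates, and the integral image is built over the patch only; in practice it would likely be computed once per depth image, with the rectangles shifted by the patch position.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (zero-padded first row/column)."""
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def region_mean(ii, rect):
    """Mean of the pixels in rect = (x, y, w, h), using four integral-image lookups."""
    x, y, w, h = rect
    total = ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]
    return total / (w * h)


def binary_test(ii, f1, f2, tau):
    """Eq. (1): mean depth over F1 minus mean depth over F2, compared with tau."""
    return region_mean(ii, f1) - region_mean(ii, f2) > tau


patch = [[1, 1, 5, 5]] * 4  # 4x4 toy patch: left half at depth 1, right half at depth 5
ii = integral_image(patch)
goes_right = binary_test(ii, f1=(2, 0, 2, 4), f2=(0, 0, 2, 2), tau=3.0)  # 5 - 1 > 3
```

The cost of each test is constant regardless of the size of $F_1$ and $F_2$, which is what makes evaluating large pools of random tests affordable.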

During training, for each non-leaf node starting from the root, we generate a large pool of binary tests $t^k$ by randomly choosing $F_1$, $F_2$, and $\tau$. The test that maximizes a specific optimization function is picked; the data is then split using the selected test, and the process iterates until a leaf is created, which happens when either the maximum tree depth is reached or fewer than a certain number of patches are left. Leaves store two kinds of information: the ratio of positive patches that reached them during training, $p(c=1 \mid P)$, and the multivariate Gaussian distribution computed from the pose parameters of the positive patches.
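The greedy node-splitting step and the leaf statistics described above might look roughly like this. The default maximum depth and minimum patch count are placeholder values, `objective` stands for whichever split measure is being maximized (Eqs. (2) to (6)), and the Gaussian uses the maximum-likelihood covariance (dividing by $n$). None of these details are specified in the text, and all function names are hypothetical.

```python
def should_stop(depth, num_patches, max_depth=15, min_patches=20):
    """Leaf criterion from the text; the default limits are placeholders."""
    return depth >= max_depth or num_patches < min_patches


def choose_test(samples, candidate_tests, objective):
    """Pick the randomly generated test whose split maximizes objective(left, right)."""
    def split_score(test):
        left = [s for s in samples if not test(s)]
        right = [s for s in samples if test(s)]
        return objective(left, right)
    return max(candidate_tests, key=split_score)


def mean_and_cov(vectors):
    """Mean and maximum-likelihood covariance of a list of equal-length vectors."""
    n, dim = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    cov = [[sum((v[a] - mean[a]) * (v[b] - mean[b]) for v in vectors) / n
            for b in range(dim)] for a in range(dim)]
    return mean, cov


def make_leaf(labels, thetas):
    """Store p(c=1|P) and a Gaussian fitted to the pose labels of the positive patches."""
    positives = [t for c, t in zip(labels, thetas) if c == 1]
    p_head = len(positives) / len(labels)
    if not positives:
        return p_head, None, None  # no head patches: nothing to vote with
    mean, cov = mean_and_cov(positives)
    return p_head, mean, cov


# Toy objective that prefers balanced splits; real trees would use Eqs. (2) to (6).
candidates = [lambda s, t=t: s > t for t in (1, 3, 5)]
chosen = choose_test([1, 2, 3, 4, 5, 6], candidates,
                     objective=lambda left, right: -abs(len(left) - len(right)))
leaf = make_leaf([1, 1, 0, 1], [(0.0, 0.0), (2.0, 2.0), (9.0, 9.0), (4.0, 4.0)])
```

Here the threshold test `s > 3` wins because it yields two children of equal size, and the leaf stores $p(c=1 \mid P) = 0.75$ together with a Gaussian fitted to the three positive labels only.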

For the problem at hand, we need trees that can both classify a patch as belonging to a head or not and cast precise votes into the spaces spanned by 3D head locations and orientations. This is the main difference from [10], where the face is assumed to cover most of the image and thus only a regression measure is used. We therefore evaluate the goodness of a split using a classification measure $U_C(P \mid t^k)$ and a regression measure $U_R(P \mid t^k)$: the former tends to separate the patches at each node so as to maximize the discriminative power of the tree, while the latter favors regression accuracy.

Similar to [11], we employ a classification measure which, when maximized, tends to separate the patches so that class uncertainty for a split is minimized:

$$U_C(P \mid t^k) = \frac{|P_L| \cdot \sum_c p(c \mid P_L) \ln p(c \mid P_L) + |P_R| \cdot \sum_c p(c \mid P_R) \ln p(c \mid P_R)}{|P_L| + |P_R|}, \tag{2}$$

where $p(c \mid P)$ is the ratio of patches belonging to class $c \in \{0,1\}$ in the set $P$.
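A direct transcription of Eq. (2) is given below as a sketch. Labels are assumed to be 0/1 integers, and an empty child is assumed to contribute zero, a case the text does not address; the function names are illustrative.

```python
import math


def neg_class_entropy(labels):
    """sum_c p(c|P) ln p(c|P) for c in {0, 1}; zero-probability classes and empty sets add 0."""
    if not labels:
        return 0.0
    total = 0.0
    for c in (0, 1):
        p = sum(1 for label in labels if label == c) / len(labels)
        if p > 0:
            total += p * math.log(p)
    return total


def classification_measure(left_labels, right_labels):
    """U_C of Eq. (2): size-weighted negative class entropy of the two children."""
    n_left, n_right = len(left_labels), len(right_labels)
    return (n_left * neg_class_entropy(left_labels)
            + n_right * neg_class_entropy(right_labels)) / (n_left + n_right)


pure = classification_measure([1, 1], [0, 0])   # perfectly separated classes
mixed = classification_measure([1, 0], [0, 1])  # both children are 50/50
```

The perfectly separated split reaches the maximum value 0, while the 50/50 split scores $-\ln 2$, so maximizing $U_C$ indeed favors low class uncertainty.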

For regression, we use the information gain defined by [9]:

$$U_R(P \mid t^k) = H(P) - \left( w_L H(P_L) + w_R H(P_R) \right), \tag{3}$$


where $H(P)$ is the differential entropy of the set $P$ and $w_i$, $i \in \{L,R\}$, is the ratio of patches sent to each child node.

Our labels (the vectors $\theta$) are modeled as realizations of a multivariate Gaussian, i.e., $p(\theta \mid L) = \mathcal{N}(\theta; \bar{\theta}, \Sigma)$. Moreover, as in [10], we assume the covariance matrix to be block-diagonal, i.e., we allow covariance only among offset vectors and among head rotation angles, but not between the two. For these reasons, we can rewrite Eq. (3) as:

$$U_R(P \mid t^k) = \log\left(|\Sigma^v| + |\Sigma^a|\right) - \sum_{i \in \{L,R\}} w_i \log\left(|\Sigma^v_i| + |\Sigma^a_i|\right), \tag{4}$$

where $\Sigma^v$ and $\Sigma^a$ are the covariance matrices of the offsets and rotation angles (the two diagonal blocks in $\Sigma$). Maximizing Eq. (4) minimizes the determinants of the children's covariance matrices, thus decreasing regression uncertainty.

The two measures (2) and (4) can be combined in different ways, and we investigate three different approaches. While the method of [11] randomly chooses between classification and regression at each node, the method of [18] uses a weighted sum of the two measures, defined as:

$$\arg\max_k \left( U_C + \alpha \cdot \max\left(p(c=1 \mid P) - t_p,\ 0\right) \cdot U_R \right). \tag{5}$$

In the above equation, $p(c=1 \mid P)$ represents the ratio of positive samples contained in the set, or purity, $t_p$ is an activation threshold, and $\alpha$ is a constant weight. When maximizing (5), the optimization is steered by the classification term alone until the purity of positive patches reaches the threshold $t_p$. From that point on, the regression term plays an increasingly important role.
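Both existing combination schemes can be sketched in a few lines. `linear_weighted_score` follows Eq. (5) for a single candidate test, with placeholder default values for `alpha` and `t_p`. `choose_measure_for_node` illustrates the random selection of [11] only as it is summarized above, assuming that one measure is drawn per node with equal probability; both names are hypothetical.

```python
import random


def linear_weighted_score(u_c, u_r, purity, alpha=1.0, t_p=0.8):
    """Objective of Eq. (5) for one candidate test; alpha and t_p defaults are placeholders."""
    return u_c + alpha * max(purity - t_p, 0.0) * u_r


def choose_measure_for_node(rng):
    """Random selection in the style of [11]: one measure per node, equal odds assumed."""
    return rng.choice(("classification", "regression"))


below = linear_weighted_score(u_c=-0.5, u_r=2.0, purity=0.5)  # purity < t_p: U_C only
above = linear_weighted_score(u_c=-0.5, u_r=2.0, purity=1.0)  # U_R now contributes
measure = choose_measure_for_node(random.Random(0))
```

Below the activation threshold the score reduces to $U_C$; above it, $U_R$ is weighted by how far the purity exceeds $t_p$.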

We propose a third way to combine the two measures, removing the activation threshold from (5) and using an exponential function as the weight:

$$\arg\max_k \left( U_C + \left(1.0 - e^{-d/\lambda}\right) U_R \right), \tag{6}$$

where $d$ is the depth of the node. In this way, the regression measure is given increasingly higher weight as we descend towards the leaves, with the parameter $\lambda$ specifying the steepness of the change.
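The depth-dependent weight of Eq. (6) is easy to inspect numerically. In this sketch the root is assumed to be at depth $d = 0$, where the weight is zero and the split is chosen by classification alone; the value $\lambda = 5$ used in the example is a placeholder, and the function names are illustrative.

```python
import math


def depth_weight(d, lam):
    """Regression weight of Eq. (6): 1 - exp(-d / lambda)."""
    return 1.0 - math.exp(-d / lam)


def depth_weighted_score(u_c, u_r, d, lam):
    """Objective of Eq. (6) for one candidate test at a node of depth d."""
    return u_c + depth_weight(d, lam) * u_r


weights = [depth_weight(d, lam=5.0) for d in range(0, 16, 5)]  # depths 0, 5, 10, 15
```

With $\lambda = 5$, the regression weight reaches $1 - e^{-1} \approx 0.63$ at depth 5 and keeps approaching 1 further down, so smaller values of $\lambda$ shift the emphasis to regression earlier.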

3.2 Head pose estimation

To estimate the head pose from a depth image, we densely extract patches from the image and pass them through the forest. The tests at the nodes guide each patch all the way to a leaf $L$, but not all leaves are considered for regression: the Gaussian $p(\theta)$ is taken into account only if $p(c=1 \mid P) = 1$ and $\operatorname{trace}(\Sigma) < \mathrm{max}_v$, with $\mathrm{max}_v$ an empirical value for the maximum allowed variance. As in [10], a stride in the sampling of the patches can be introduced in order to find the desired compromise between speed and accuracy of the estimate. To be able to handle multiple heads and remove outliers, we perform a
