Kernel-Based Object Tracking
Dorin Comaniciu Visvanathan Ramesh Peter Meer
Real-Time Vision and Modeling Department
Siemens Corporate Research
755 College Road East, Princeton, NJ 08540
Electrical and Computer Engineering Department
Rutgers University
94 Brett Road, Piscataway, NJ 08854-8058
Abstract
A new approach toward target representation and localization, the central component in visual tracking of non-rigid objects, is proposed. The feature histogram based target representations are regularized by spatial masking with an isotropic kernel. The masking induces spatially-smooth similarity functions suitable for gradient-based optimization, hence the target localization problem can be formulated using the basin of attraction of the local maxima. We employ a metric derived from the Bhattacharyya coefficient as similarity measure, and use the mean shift procedure to perform the optimization. In the presented tracking examples the new method successfully coped with camera motion, partial occlusions, clutter, and target scale variations. Integration with motion filters and data association techniques is also discussed. We describe only a few of the potential applications: exploitation of background information, Kalman tracking using motion models, and face tracking.
Keywords: non-rigid object tracking; target localization and representation; spatially-smooth similarity function; Bhattacharyya coefficient; face tracking.
1 Introduction
Real-time object tracking is the critical task in many computer vision applications such as surveillance [44, 16, 32], perceptual user interfaces [10], augmented reality [26], smart rooms [39, 75, 47], object-based video compression [11], and driver assistance [34, 4].
Two major components can be distinguished in a typical visual tracker. Target Representation and Localization is mostly a bottom-up process which also has to cope with changes in the appearance of the target. Filtering and Data Association is mostly a top-down process dealing with the dynamics of the tracked object, learning of scene priors, and evaluation of different hypotheses. The way the two components are combined and weighted is application dependent and plays a decisive role in the robustness and efficiency of the tracker. For example, face tracking in a crowded scene relies more on target representation than on target dynamics [21], while in aerial video surveillance, e.g., [74], the target motion and the ego-motion of the camera are the more important components. In real-time applications only a small percentage of the system resources can be allocated for tracking, the rest being required for the preprocessing stages or for high-level tasks such as recognition, trajectory interpretation, and reasoning. Therefore, it is desirable to keep the computational complexity of a tracker as low as possible.
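The most abstract formulation of the filtering and data association process is through the state space approach for modeling discrete-time dynamic systems [5]. The information characterizing the target is defined by the state sequence $\{\mathbf{x}_k\}_{k=0,1,\ldots}$, whose evolution in time is specified by the dynamic equation $\mathbf{x}_k = \mathbf{f}_k(\mathbf{x}_{k-1}, \mathbf{v}_k)$. The available measurements $\{\mathbf{z}_k\}_{k=1,\ldots}$ are related to the corresponding states through the measurement equation $\mathbf{z}_k = \mathbf{h}_k(\mathbf{x}_k, \mathbf{n}_k)$. In general, both $\mathbf{f}_k$ and $\mathbf{h}_k$ are vector-valued, nonlinear and time-varying functions. Each of the noise sequences, $\{\mathbf{v}_k\}_{k=1,\ldots}$ and $\{\mathbf{n}_k\}_{k=1,\ldots}$, is assumed to be independent and identically distributed (i.i.d.).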
The objective of tracking is to estimate the state $\mathbf{x}_k$ given all the measurements $\mathbf{z}_{1:k}$ up to that moment, or equivalently to construct the probability density function (pdf) $p(\mathbf{x}_k | \mathbf{z}_{1:k})$. The theoretically optimal solution is provided by the recursive Bayesian filter which solves the problem in two steps. The prediction step uses the dynamic equation and the already computed pdf of the state at time $t = k-1$, $p(\mathbf{x}_{k-1} | \mathbf{z}_{1:k-1})$, to derive the prior pdf of the current state, $p(\mathbf{x}_k | \mathbf{z}_{1:k-1})$. Then, the update step employs the likelihood function $p(\mathbf{z}_k | \mathbf{x}_k)$ of the current measurement to compute the posterior pdf $p(\mathbf{x}_k | \mathbf{z}_{1:k})$.
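The two-step recursion is easiest to see on a small discrete state space, where both pdfs become probability vectors. The following is a minimal sketch and not part of the original method: the function names `predict` and `update`, the three-state transition matrix, and the likelihood values are hypothetical and chosen only for illustration.

```python
def predict(posterior_prev, transition):
    """Prediction step: prior[j] = sum_i transition[i][j] * posterior_prev[i]."""
    n = len(posterior_prev)
    return [sum(transition[i][j] * posterior_prev[i] for i in range(n))
            for j in range(n)]


def update(prior, likelihood):
    """Update step: posterior[j] is proportional to likelihood[j] * prior[j]."""
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


# Hypothetical example: three discrete states, the target tends to stay put.
transition = [[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]]
posterior_prev = [1.0, 0.0, 0.0]   # p(x_{k-1} | z_{1:k-1})
likelihood = [0.1, 0.7, 0.2]       # p(z_k | x_k) evaluated at the observed z_k

prior = predict(posterior_prev, transition)   # p(x_k | z_{1:k-1})
posterior = update(prior, likelihood)         # p(x_k | z_{1:k})
```

In this example the measurement favors the second state strongly enough that the posterior mass moves there, even though the prior still favors the first state.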
When the noise sequences are Gaussian and $\mathbf{f}_k$ and $\mathbf{h}_k$ are linear functions, the optimal solution is provided by the Kalman filter [5, p. 56], which yields a posterior that is also Gaussian. (We will return to this topic in Section 6.2.) When the functions $\mathbf{f}_k$ and $\mathbf{h}_k$ are nonlinear, linearization yields the Extended Kalman Filter (EKF) [5, p. 106], the posterior density being still modeled as Gaussian. A recent alternative to the EKF is the Unscented Kalman Filter (UKF) [42], which uses a set of discretely sampled points to parameterize the mean and covariance of the posterior density. When the state space is discrete and consists of a finite number of states, Hidden Markov Models (HMM) filters [60] can be applied for tracking. The most general class of filters is represented by particle filters [45], also called bootstrap filters [31], which are based on Monte Carlo integration methods. The current density of the state is represented by a set of random samples with associated weights, and the new density is computed based on these samples and weights (see [23, 3] for reviews). The UKF can be employed to generate proposal distributions for particle filters, in which case the filter is called Unscented Particle Filter (UPF) [54].
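For the linear-Gaussian case, the prediction and update steps reduce to closed-form updates of a mean and a variance. The sketch below shows one scalar Kalman step under an assumed model $x_k = a\,x_{k-1} + v_k$, $z_k = c\,x_k + n_k$; the function name `kalman_step`, the parameter names, and all numeric values are illustrative assumptions, not taken from the paper.

```python
def kalman_step(mean, var, z, a=1.0, c=1.0, q=0.01, r=0.25):
    """One predict/update cycle of a scalar Kalman filter.

    a, c : assumed linear dynamics and measurement coefficients
    q, r : assumed variances of the process and measurement noise
    """
    # Prediction: propagate the Gaussian through the linear dynamics.
    mean_pred = a * mean
    var_pred = a * a * var + q
    # Update: blend prediction and measurement using the Kalman gain.
    gain = var_pred * c / (c * c * var_pred + r)
    mean_new = mean_pred + gain * (z - c * mean_pred)
    var_new = (1.0 - gain * c) * var_pred
    return mean_new, var_new


# Hypothetical noisy measurements of a quantity close to 1.0.
mean, var = 0.0, 1.0
for z in [1.1, 0.9, 1.05, 0.95]:
    mean, var = kalman_step(mean, var, z)
```

The posterior stays Gaussian at every step, so only `mean` and `var` need to be carried forward; the variance shrinks as measurements accumulate.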
When the tracking is performed in a cluttered environment where multiple targets can be present [52], problems related to the validation and association of the measurements arise [5, p. 150]. Gating techniques are used to validate only measurements whose predicted probability of appearance is high. After validation, a strategy is needed to associate the measurements with the current targets. In addition to the Nearest Neighbor Filter, which selects the closest measurement, techniques such as the Probabilistic Data Association Filter (PDAF) are available for the single target case. The underlying assumption of the PDAF is that for any given target only one measurement is valid, and the other measurements are modeled as random interference, that is, i.i.d. uniformly distributed random variables. The Joint Probabilistic Data Association Filter (JPDAF) [5, p. 222], on the other hand, calculates the measurement-to-target association probabilities jointly across all the targets. A different strategy is represented by the Multiple Hypothesis Filter (MHF) [63, 20], [5, p. 106], which evaluates the probability that a given target gave rise to a certain measurement sequence. The MHF formulation can be adapted to track the modes of the state density [13]. The data association problem for multiple target particle filtering is presented in [62, 38].
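Gating followed by nearest-neighbor association can be sketched in a few lines. This is a simplified illustration only: practical gates are usually defined through the innovation covariance (a Mahalanobis distance), whereas the sketch assumes a plain Euclidean gate, and the names `gate`, `nearest_neighbor`, and `gate_radius` are hypothetical.

```python
import math


def gate(predicted, measurements, gate_radius):
    """Validation: keep only measurements within gate_radius of the prediction."""
    return [z for z in measurements if math.dist(predicted, z) <= gate_radius]


def nearest_neighbor(predicted, measurements, gate_radius):
    """Association: pick the validated measurement closest to the prediction.

    Returns None when no measurement falls inside the gate.
    """
    validated = gate(predicted, measurements, gate_radius)
    if not validated:
        return None
    return min(validated, key=lambda z: math.dist(predicted, z))


# Hypothetical predicted target position and detections in the current frame.
predicted = (10.0, 5.0)
measurements = [(10.5, 5.2), (14.0, 9.0), (9.0, 4.0)]
chosen = nearest_neighbor(predicted, measurements, gate_radius=2.0)
```

Here the second detection is rejected by the gate and the first is selected as the closest validated one.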
The filtering and association techniques discussed above were applied in computer vision for various tracking scenarios. Boykov and Huttenlocher [9] employed the Kalman filter to track vehicles in an adaptive framework. Rosales and Sclaroff [65] used the Extended Kalman Filter to estimate a 3D object trajectory from 2D image motion. Particle filtering was first introduced in vision as the Condensation algorithm by Isard and Blake [40]. Probabilistic exclusion for tracking multiple objects was discussed in [51]. Wu and Huang developed an algorithm to integrate multiple target clues [76]. Li and Chellappa [48] proposed simultaneous tracking and verification based on particle filters applied to vehicles and faces. Chen et al. [15] used the Hidden Markov Model formulation for tracking combined with JPDAF data association. Rui and Chen proposed to track the face contour based on the unscented particle filter [66]. Cham and Rehg [13] applied a variant of MHF for figure tracking.
The emphasis in this paper is on the other component of tracking: target representation and localization. While filtering and data association have their roots in control theory, algorithms for target representation and localization are specific to images and related to registration methods [72, 64, 56]. Both target localization and registration maximize a likelihood type function. The difference is that in tracking, as opposed to registration, only small changes are assumed in the location and appearance of the target in two consecutive frames. This property can be exploited to develop efficient, gradient based localization schemes using the normalized correlation criterion [6]. Since the correlation is sensitive to illumination, Hager and Belhumeur [33] explicitly modeled the geometry and illumination changes. The method was improved by Sclaroff and Isidoro [67] using robust M-estimators. Learning of appearance models by employing a mixture of stable image structure, motion information and an outlier process was discussed in [41]. In a different approach, Ferrari et al. [26] presented an affine tracker based on planar regions and anchor points. Tracking people, which raises many challenges due to the presence of large 3D, non-rigid motion, was extensively analyzed in [36, 1, 30, 73]. Explicit tracking approaches of people [69] are time-consuming, and often the simpler blob model [75] or adaptive mixture models [53] are also employed.
The main contribution of the paper is to introduce a new framework for efficient tracking of non-rigid objects. We show that by spatially masking the target with an isotropic kernel, a spatially-smooth similarity function can be defined and the target localization problem is then reduced to a search in the basin of attraction of this function. The smoothness of the similarity function allows application of a gradient optimization method which yields much faster target localization compared with the (optimized) exhaustive search. The similarity between the target model and the target candidates in the next frame is measured using the metric derived from the Bhattacharyya coefficient. In our case the Bhattacharyya coefficient has the meaning of a correlation score. The new target representation and localization method can be integrated with various motion filters and data association techniques. We present tracking experiments in which our method successfully coped with complex camera motion, partial occlusion of the target, presence of significant clutter and large variations in target scale and appearance. We also discuss the integration of background information and Kalman filter based tracking.
The paper is organized as follows. Section 2 discusses issues of target representation and the importance of a spatially-smooth similarity function. Section 3 introduces the metric derived from the Bhattacharyya coefficient. The optimization algorithm is described in Section 4. Experimental results are shown in Section 5. Section 6 presents extensions of the basic algorithm and the new approach is put in the context of computer vision literature in Section 7.
2 Target Representation
To characterize the target, first a feature space is chosen. The reference target model is represented by its pdf $q$ in the feature space. For example, the reference model can be chosen to be the color pdf of the target. Without loss of generality the target model can be considered as centered at the spatial location $\mathbf{0}$. In the subsequent frame a target candidate is defined at location $\mathbf{y}$, and is characterized by the pdf $p(\mathbf{y})$. Both pdfs are to be estimated from the data. To satisfy the low computational cost imposed by real-time processing, discrete densities, i.e., $m$-bin histograms, should be used. Thus we have
target model: $\hat{\mathbf{q}} = \{\hat{q}_u\}_{u=1 \ldots m}$, $\sum_{u=1}^{m} \hat{q}_u = 1$
target candidate: $\hat{\mathbf{p}}(\mathbf{y}) = \{\hat{p}_u(\mathbf{y})\}_{u=1 \ldots m}$, $\sum_{u=1}^{m} \hat{p}_u = 1$
The histogram is not the best nonparametric density estimate [68], but it suffices for our purposes. Other discrete density estimates can also be employed.
We will denote by
$\hat{\rho}(\mathbf{y}) \equiv \rho[\hat{\mathbf{p}}(\mathbf{y}), \hat{\mathbf{q}}]$    (1)
a similarity function between $\hat{\mathbf{p}}$ and $\hat{\mathbf{q}}$. The function $\hat{\rho}(\mathbf{y})$ plays the role of a likelihood and its local maxima in the image indicate the presence of objects in the second frame having representations similar to $\hat{\mathbf{q}}$ defined in the first frame. If only spectral information is used to characterize the target, the similarity function can have large variations for adjacent locations on the image lattice and the spatial information is lost. To find the maxima of such functions, gradient-based optimization procedures are difficult to apply and only an expensive exhaustive search can be used. We regularize the similarity function by masking the objects with an isotropic kernel in the spatial domain. When the kernel weights, carrying continuous spatial information, are used in defining the feature space representations, $\hat{\rho}(\mathbf{y})$ becomes a smooth function in $\mathbf{y}$.
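As a concrete instance of a similarity function between two $m$-bin histograms, the sketch below computes the sample Bhattacharyya coefficient $\sum_u \sqrt{p_u q_u}$ and the distance $\sqrt{1 - \rho}$ derived from it. The section itself leaves $\rho$ generic and defers the actual metric to Section 3, so the choice made here is an assumption based on the standard definition; the function names and the example histograms are hypothetical.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Sample Bhattacharyya coefficient between two discrete distributions."""
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def bhattacharyya_distance(p, q):
    """Distance derived from the coefficient; max() guards against rounding."""
    rho = bhattacharyya_coefficient(p, q)
    return math.sqrt(max(0.0, 1.0 - rho))


# Hypothetical 4-bin histograms: one identical to the model, one disjoint.
q_model = [0.5, 0.5, 0.0, 0.0]
p_same = [0.5, 0.5, 0.0, 0.0]
p_disjoint = [0.0, 0.0, 0.5, 0.5]

rho_same = bhattacharyya_coefficient(p_same, q_model)
rho_disjoint = bhattacharyya_coefficient(p_disjoint, q_model)
```

Identical histograms give a coefficient of one (distance zero), and histograms with no common bins give a coefficient of zero (distance one).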
2.1 Target Model
A target is represented by an ellipsoidal region in the image. To eliminate the influence of different target dimensions, all targets are first normalized to a unit circle. This is achieved by independently rescaling the row and column dimensions with $h_x$ and $h_y$.
Let $\{\mathbf{x}_i^*\}_{i=1 \ldots n}$ be the normalized pixel locations in the region defined as the target model. The region is centered at $\mathbf{0}$. An isotropic kernel, with a convex and monotonic decreasing kernel profile $k(x)$ (see footnote 1), assigns smaller weights to pixels farther from the center. Using these weights increases the robustness of the density estimation since the peripheral pixels are the least reliable, being often affected by occlusions (clutter) or interference from the background.
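The function $b : R^2 \to \{1 \ldots m\}$ associates to the pixel at location $\mathbf{x}_i^*$ the index $b(\mathbf{x}_i^*)$ of its bin in the quantized feature space. The probability of the feature $u = 1 \ldots m$ in the target model is then computed as
$\hat{q}_u = C \sum_{i=1}^{n} k(\|\mathbf{x}_i^*\|^2) \, \delta[b(\mathbf{x}_i^*) - u]$    (2)
where $\delta$ is the Kronecker delta function. The normalization constant $C$ is derived by imposing the condition $\sum_{u=1}^{m} \hat{q}_u = 1$, from where
$C = \frac{1}{\sum_{i=1}^{n} k(\|\mathbf{x}_i^*\|^2)}$    (3)
since the summation of delta functions for $u = 1 \ldots m$ is equal to one.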
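Equations (2) and (3) amount to a histogram in which each pixel votes with its kernel weight instead of a unit count. The sketch below is one possible reading under stated assumptions: it uses the profile $k(x) = 1 - x$ on $[0, 1]$ (an Epanechnikov-type profile, up to a constant), a hypothetical grayscale binning function in place of $b$, and 0-based bin indices rather than $1 \ldots m$. The names `epanechnikov_profile` and `target_model` are illustrative.

```python
def epanechnikov_profile(x):
    """Assumed kernel profile: k(x) = 1 - x on [0, 1], zero elsewhere."""
    return 1.0 - x if x <= 1.0 else 0.0


def target_model(pixels, bin_index, m, profile=epanechnikov_profile):
    """Kernel-weighted m-bin histogram following equations (2) and (3).

    pixels    : list of ((x, y), feature), locations normalized and centered at 0
    bin_index : maps a feature to a bin in 0..m-1 (plays the role of b)
    Assumes at least one pixel lies inside the unit circle, so the weights
    do not all vanish.
    """
    weights = [profile(x * x + y * y) for (x, y), _ in pixels]
    c = 1.0 / sum(weights)                  # normalization constant, eq. (3)
    q = [0.0] * m
    for w, (_, feature) in zip(weights, pixels):
        q[bin_index(feature)] += c * w      # kernel-weighted vote, eq. (2)
    return q


# Hypothetical target: dark pixels near the center, bright ones at the rim.
pixels = [((0.0, 0.0), 10), ((0.5, 0.0), 10), ((0.0, 0.5), 200), ((0.9, 0.0), 200)]
q_hat = target_model(pixels, bin_index=lambda g: g * 4 // 256, m=4)
```

Both gray levels occur twice, yet the dark bin receives more mass because its pixels sit closer to the center, which is the down-weighting of peripheral pixels described above.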
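2.2 Target Candidates
Let $\{\mathbf{x}_i\}_{i=1 \ldots n_h}$ be the normalized pixel locations of the target candidate, centered at $\mathbf{y}$ in the current frame. The normalization is inherited from the frame containing the target model. Using the same kernel profile $k(x)$, but with bandwidth $h$, the probability of the feature $u = 1 \ldots m$ in the target candidate is given by
$\hat{p}_u(\mathbf{y}) = C_h \sum_{i=1}^{n_h} k\left(\left\|\frac{\mathbf{y} - \mathbf{x}_i}{h}\right\|^2\right) \delta[b(\mathbf{x}_i) - u]$    (4)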
1 The profile of a kernel $K$ is defined as a function $k : [0, \infty) \to R$ such that $K(\mathbf{x}) = k(\|\mathbf{x}\|^2)$.
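The candidate histogram of equation (4) differs from the target model only in that the kernel is centered at $\mathbf{y}$ and scaled by the bandwidth $h$. The following sketch assumes that $C_h$ is chosen so that the bins sum to one, by analogy with equation (3); the profile, the binning function, the 0-based bin indices, and the name `target_candidate` are illustrative assumptions.

```python
def epanechnikov_profile(x):
    """Assumed kernel profile: k(x) = 1 - x on [0, 1], zero elsewhere."""
    return 1.0 - x if x <= 1.0 else 0.0


def target_candidate(pixels, center, h, bin_index, m,
                     profile=epanechnikov_profile):
    """Kernel-weighted m-bin histogram at location `center`, equation (4).

    pixels : list of ((x, y), feature) in normalized image coordinates
    h      : kernel bandwidth; pixels farther than h from center get weight 0
    Assumes at least one pixel lies within distance h of the center.
    """
    cx, cy = center
    weights = []
    for (x, y), _ in pixels:
        dx, dy = (cx - x) / h, (cy - y) / h
        weights.append(profile(dx * dx + dy * dy))
    c_h = 1.0 / sum(weights)                # assumed normalization, cf. eq. (3)
    p = [0.0] * m
    for w, (_, feature) in zip(weights, pixels):
        p[bin_index(feature)] += c_h * w
    return p


# Hypothetical candidate centered at (2, 2) with bandwidth 2.
pixels = [((2.0, 2.0), 10), ((3.0, 2.0), 10), ((2.0, 3.5), 200), ((5.0, 5.0), 200)]
p_hat = target_candidate(pixels, center=(2.0, 2.0), h=2.0,
                         bin_index=lambda g: g * 4 // 256, m=4)
```

Because the weights depend continuously on the center location, shifting `center` slightly changes `p_hat` only slightly, which is what makes the similarity function smooth in $\mathbf{y}$.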