Speeded-Up Robust Features (SURF)

Herbert Bay (a), Andreas Ess (a,*), Tinne Tuytelaars (b), Luc Van Gool (a,b)
(a) ETH Zurich, BIWI, Sternwartstrasse 7, CH-8092 Zurich, Switzerland
(b) K.U. Leuven, ESAT-PSI, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium
Received 31 October 2006; accepted 5 September 2007
Available online 15 December 2007
Abstract
This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.
This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (specifically, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps.
The paper encompasses a detailed description of the detector and descriptor and then explores the effects of the most important parameters. We conclude the article with SURF's application to two challenging, yet converse goals: camera calibration as a special case of image registration, and object recognition. Our experiments underline SURF's usefulness in a broad range of topics in computer vision. © 2007 Elsevier Inc. All rights reserved.
Keywords: Interest points; Local features; Feature description; Camera calibration; Object recognition
1. Introduction
The task of finding point correspondences between two images of the same scene or object is part of many computer vision applications. Image registration, camera calibration, object recognition, and image retrieval are just a few.
The search for discrete image point correspondences can be divided into three main steps. First, 'interest points' are selected at distinctive locations in the image, such as corners, blobs, and T-junctions. The most valuable property of an interest point detector is its repeatability, which expresses the reliability of a detector for finding the same physical interest points under different viewing conditions. Next, the neighbourhood of every interest point is represented by a feature vector. This descriptor has to be distinctive and at the same time robust to noise, detection displacements, and geometric and photometric deformations. Finally, the descriptor vectors are matched between different images. The matching is based on a distance between the vectors, e.g. the Mahalanobis or Euclidean distance. The dimension of the descriptor has a direct impact on the time this takes, and fewer dimensions are desirable for fast interest point matching. However, lower-dimensional feature vectors are in general less distinctive than their high-dimensional counterparts.
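The sketch below illustrates the matching step in its simplest form. It pairs each descriptor from one image with its nearest neighbour in the other image under the Euclidean distance. It is brute force: for n and m descriptors of dimension d it performs n·m distance evaluations, each costing O(d). This is why the descriptor dimension directly affects matching time. The function names and toy descriptors are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_nearest(descs_a, descs_b):
    """For each descriptor in descs_a, return (index of nearest descriptor in descs_b, distance).

    Brute force: len(descs_a) * len(descs_b) distance evaluations, each O(d).
    """
    matches = []
    for a in descs_a:
        dists = [euclidean(a, b) for b in descs_b]
        best = min(range(len(dists)), key=dists.__getitem__)
        matches.append((best, dists[best]))
    return matches

# Toy 3D descriptors from two images.
image1 = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1]]
image2 = [[0.7, 0.2, 0.1], [0.0, 1.0, 0.0]]
print([i for i, _ in match_nearest(image1, image2)])  # [1, 0]
```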
It has been our goal to develop both a detector and descriptor that, in comparison to the state-of-the-art, are fast to compute while not sacrificing performance. In order to succeed, one has to strike a balance between the above requirements, for example by simplifying the detection scheme while keeping it accurate, and by reducing the descriptor's size while keeping it sufficiently distinctive.
A wide variety of detectors and descriptors have already been proposed in the literature (e.g. [21,24,27,39,25]). Also, detailed comparisons and evaluations on benchmarking datasets have been performed [28,30,31]. Our fast detector and descriptor, called SURF (Speeded-Up Robust Features), was introduced in [4]. It is built on the insights gained from this previous work. In our experiments on the benchmarking datasets, SURF's detector and descriptor are not only faster, but the former is also more repeatable and the latter more distinctive.

* Corresponding author. E-mail address: hz.ch (A. Ess).
Computer Vision and Image Understanding 110 (2008) 346–359
1077-3142/$ - see front matter © 2007 Elsevier Inc. All rights reserved.
doi:10.1016/j.cviu.2007.09.014
We focus on scale and in-plane rotation-invariant detectors and descriptors. These seem to offer a good compromise between feature complexity and robustness to commonly occurring photometric deformations. Skew, anisotropic scaling, and perspective effects are assumed to be second-order effects that are covered to some degree by the overall robustness of the descriptor. Note that the descriptor can be extended towards affine-invariant regions using affine normalisation of the ellipse (cf. [31]), although this will have an impact on the computation time. Extending the detector, on the other hand, is less straightforward. Concerning the photometric deformations, we assume a simple linear model with a bias (offset) and contrast change (scale factor). Neither detector nor descriptor uses colour information.
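The linear photometric model can be made concrete with a toy case. Under I' = aI + b with a > 0, differences between neighbouring pixels cancel the bias b. Scaling the resulting vector to unit length then cancels the contrast factor a. The toy `gradient_descriptor` below is a hypothetical illustration of that principle only (pixel differences followed by L2 normalisation). It is not the SURF descriptor itself.

```python
import math

def gradient_descriptor(patch):
    """Hypothetical toy descriptor: horizontal and vertical neighbouring-pixel
    differences, normalised to unit Euclidean length (patch must not be constant)."""
    vec = []
    rows, cols = len(patch), len(patch[0])
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                vec.append(patch[r][c + 1] - patch[r][c])  # the bias b cancels here
            if r + 1 < rows:
                vec.append(patch[r + 1][c] - patch[r][c])
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]  # the contrast factor a (> 0) cancels here

def apply_linear_model(patch, a, b):
    """Simulate a contrast change a and a bias (offset) b: I' = a * I + b."""
    return [[a * p + b for p in row] for row in patch]

patch = [[10, 20, 15], [30, 25, 5], [0, 40, 35]]
d1 = gradient_descriptor(patch)
d2 = gradient_descriptor(apply_linear_model(patch, a=2.5, b=-7.0))
print(max(abs(p - q) for p, q in zip(d1, d2)))  # only floating-point rounding remains
```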
The article is structured as follows. In Section 2, we review previous work in interest point detection and description. In Section 3, we describe the strategy applied for fast and robust interest point detection. The input image is analysed at different scales in order to guarantee invariance to scale changes. The detected interest points are provided with a rotation- and scale-invariant descriptor in Section 4. Furthermore, a simple and efficient first-line indexing technique, based on the contrast of the interest point with its surrounding, is proposed.
In Section 5, some of the available parameters and their effects are discussed, including the benefits of an upright version (not invariant to image rotation). We also investigate SURF's performance in two important application scenarios. First, we consider a special case of image registration, namely the problem of camera calibration for 3D reconstruction. Second, we explore SURF's application to an object recognition experiment. Both applications highlight SURF's benefits in terms of speed and robustness as opposed to other strategies. The article is concluded in Section 6.
2. Related work
2.1. Interest point detection
The most widely used detector is probably the Harris corner detector [15], proposed back in 1988. It is based on the eigenvalues of the second moment matrix. However, Harris corners are not scale invariant. Lindeberg [21] introduced the concept of automatic scale selection. This allows one to detect interest points in an image, each with their own characteristic scale. He experimented with both the determinant of the Hessian matrix as well as the Laplacian (which corresponds to the trace of the Hessian matrix) to detect blob-like structures. Mikolajczyk and Schmid [26] refined this method, creating robust and scale-invariant feature detectors with high repeatability, which they coined Harris-Laplace and Hessian-Laplace. They used a (scale-adapted) Harris measure or the determinant of the Hessian matrix to select the location, and the Laplacian to select the scale. Focusing on speed, Lowe [23] proposed to approximate the Laplacian of Gaussians (LoG) by a Difference of Gaussians (DoG) filter.
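A DoG can stand in for a LoG because the Gaussian satisfies the diffusion equation ∂G/∂σ = σ∇²G. For a scale ratio k close to 1, this gives G(x, y, kσ) − G(x, y, σ) ≈ (k − 1)σ²∇²G. The factor (k − 1) is the same at every scale, so it does not move the extrema. The sketch below checks the approximation numerically at a single point of a 2D Gaussian. The sample point, σ = 1.6 and k = 1.01 are arbitrary choices made for illustration.

```python
import math

def gaussian(x, y, sigma):
    """2D isotropic Gaussian kernel value at (x, y)."""
    return math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)

def laplacian_of_gaussian(x, y, sigma):
    """Closed-form Laplacian of the 2D Gaussian: G * (r^2 - 2 sigma^2) / sigma^4."""
    r2 = x * x + y * y
    return gaussian(x, y, sigma) * (r2 - 2 * sigma * sigma) / sigma ** 4

def difference_of_gaussians(x, y, sigma, k):
    """DoG between the neighbouring scales k * sigma and sigma."""
    return gaussian(x, y, k * sigma) - gaussian(x, y, sigma)

x, y, sigma, k = 1.0, 0.5, 1.6, 1.01
dog = difference_of_gaussians(x, y, sigma, k)
approx = (k - 1) * sigma ** 2 * laplacian_of_gaussian(x, y, sigma)
rel_error = abs(dog - approx) / abs(approx)
print(f"DoG = {dog:.6e}, (k-1) sigma^2 LoG = {approx:.6e}, relative error = {rel_error:.3%}")
```

The relative error shrinks roughly in proportion to k − 1. Larger ratios trade some accuracy for fewer scale levels.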
Several other scale-invariant interest point detectors have been proposed. Examples are the salient region detector, proposed by Kadir and Brady [17], which maximises the entropy within the region, and the edge-based region detector proposed by Jurie and Schmid [16]. They seem less amenable to acceleration, though. Also, several affine-invariant feature detectors have been proposed that can cope with wider viewpoint changes. However, these fall outside the scope of this article.
From studying the existing detectors and from published comparisons [29,30], we can conclude that Hessian-based detectors are more stable and repeatable than their Harris-based counterparts. Moreover, using the determinant of the Hessian matrix rather than its trace (the Laplacian) seems advantageous, as it fires less on elongated, ill-localised structures. We also observed that approximations like the DoG can bring speed at a low cost in terms of lost accuracy.
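The Hessian's eigenvalues λ1 and λ2 show why the determinant fires less on elongated structures. The determinant is λ1λ2 and the trace is λ1 + λ2. A blob curves in both directions, so both eigenvalues are large. A ridge curves in only one direction, so one eigenvalue is near zero. That drives the determinant, but not the trace, towards zero. The Hessian entries below are made-up values chosen to illustrate this.

```python
def det_and_trace(lxx, lyy, lxy):
    """Determinant and trace of the symmetric Hessian [[lxx, lxy], [lxy, lyy]]."""
    return lxx * lyy - lxy * lxy, lxx + lyy

# Hypothetical second-derivative responses at the centre of bright structures.
blob = det_and_trace(lxx=-1.0, lyy=-1.0, lxy=0.0)            # curved along x and y
ridge = det_and_trace(lxx=-2.0, lyy=-0.05, lxy=0.0)          # vertical line: curved along x only
diagonal = det_and_trace(lxx=-1.025, lyy=-1.025, lxy=0.975)  # the same line rotated by 45 degrees

for name, (det, trace) in [("blob", blob), ("ridge", ridge), ("diagonal ridge", diagonal)]:
    print(f"{name:>14}: det = {det:.3f}, trace = {trace:.3f}")
```

The traces of the blob and the ridge are almost equal, while their determinants differ by a factor of ten. Both measures are rotation invariant, as the last two rows show.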
2.2. Interest point description
An even larger variety of feature descriptors has been proposed, like Gaussian derivatives [11], moment invariants [32], complex features [1], steerable filters [12], phase-based local features [6], and descriptors representing the distribution of smaller-scale features within the interest point neighbourhood. The latter, introduced by Lowe [24], have been shown to outperform the others [28]. This can be explained by the fact that they capture a substantial amount of information about the spatial intensity patterns, while at the same time being robust to small deformations or localisation errors. The descriptor in [24], called SIFT for short, computes a histogram of local oriented gradients around the interest point and stores the bins in a 128D vector (8 orientation bins for each of 4×4 location bins).
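The 128 = 4 × 4 × 8 layout can be sketched as follows. A 16 × 16 patch of gradients is split into a 4 × 4 grid of 4 × 4-pixel cells. Each cell accumulates a magnitude-weighted histogram over 8 orientation bins. This is a deliberately simplified sketch that assumes precomputed gradients `gx`, `gy`. Real SIFT additionally uses Gaussian weighting, trilinear interpolation between bins, rotation to a dominant orientation, and clipping, all of which are omitted here.

```python
import math

def sift_like_descriptor(gx, gy, grid=4, bins=8):
    """Simplified SIFT-style descriptor from a square patch of gradients.

    gx, gy: size x size lists of x/y gradients, with size divisible by grid.
    Returns grid * grid * bins values (128 for the defaults), L2-normalised.
    """
    size = len(gx)
    cell = size // grid
    hist = [0.0] * (grid * grid * bins)
    for r in range(size):
        for c in range(size):
            mag = math.hypot(gx[r][c], gy[r][c])
            angle = math.atan2(gy[r][c], gx[r][c]) % (2 * math.pi)
            o = int(angle / (2 * math.pi) * bins) % bins   # orientation bin
            idx = ((r // cell) * grid + (c // cell)) * bins + o  # location bin, then orientation
            hist[idx] += mag
    norm = math.sqrt(sum(h * h for h in hist)) or 1.0
    return [h / norm for h in hist]

gx = [[1.0] * 16 for _ in range(16)]  # every gradient points along +x
gy = [[0.0] * 16 for _ in range(16)]
desc = sift_like_descriptor(gx, gy)
print(len(desc))  # 128
```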
Various refinements on this basic scheme have been proposed. Ke and Sukthankar [18] applied PCA on the gradient image around the detected interest point. This PCA-SIFT yields a 36D descriptor which is fast for matching, but proved to be less distinctive than SIFT in a second comparative study by Mikolajczyk and Schmid [30]; moreover, applying PCA slows down feature computation. In the same paper [30], the authors proposed a variant of SIFT, called GLOH, which proved to be even more distinctive with the same number of dimensions. However, GLOH is computationally more expensive, as it again uses PCA for data compression.
The SIFT descriptor still seems the most appealing descriptor for practical uses, and hence also the most widely used nowadays. It is distinctive and relatively fast, which is crucial for on-line applications. Recently, Se et al. [37] implemented SIFT on a Field Programmable Gate Array (FPGA) and improved its speed by an order of magnitude. Meanwhile, Grabner et al. [14] also used integral images to approximate SIFT. Their detection step is based on difference-of-mean (without interpolation), their description step on integral histograms. They achieve about the same speed as we do (though the description step is constant in speed), but at the cost of reduced quality compared to SIFT. Generally, the high dimensionality of the descriptor is a drawback of SIFT at the matching step. For on-line applications relying only on a regular PC, each one of the three steps (detection, description, matching) has to be fast.
An entire body of work is available on speeding up the matching step. All of these methods come at the expense of getting an approximate matching. Methods include the best-bin-first proposed by Lowe [24], balltrees [35], vocabulary trees [34], locality sensitive hashing [9], or redundant bit vectors [13].
Complementary to this, we suggest the use of the Hessian matrix's trace to significantly increase the matching speed. Together with the descriptor's low dimensionality, any matching algorithm is bound to perform faster.
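One way to use the trace for speed is to store its sign with each interest point. This sketch rests on our own assumptions. Bright blobs on dark backgrounds and dark blobs on bright backgrounds have traces of opposite sign. Candidates with a different sign can therefore be skipped before any descriptor distance is computed. The record layout `(laplacian_sign, descriptor)` and the function name `match_with_sign` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_with_sign(points_a, points_b):
    """Nearest-neighbour matching that only compares points with equal trace sign.

    points_a, points_b: lists of (laplacian_sign, descriptor) tuples (hypothetical layout).
    Returns (best index in points_b or None for each point in points_a,
             number of descriptor distances actually evaluated).
    """
    by_sign = {}
    for j, (sign, _) in enumerate(points_b):
        by_sign.setdefault(sign, []).append(j)  # index the candidates by sign once
    matches, evaluations = [], 0
    for sign, desc in points_a:
        best, best_dist = None, math.inf
        for j in by_sign.get(sign, []):  # opposite-contrast points are never compared
            d = euclidean(desc, points_b[j][1])
            evaluations += 1
            if d < best_dist:
                best, best_dist = j, d
        matches.append(best)
    return matches, evaluations

points_a = [(+1, [0.0, 1.0]), (-1, [1.0, 0.0])]
points_b = [(-1, [0.9, 0.1]), (+1, [0.1, 0.9]), (-1, [0.0, 1.0])]
matches, evaluations = match_with_sign(points_a, points_b)
print(matches, evaluations)  # [1, 0] 3  (brute force would need 6)
```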
3. Interest point detection
Our approach for interest point detection uses a very basic Hessian matrix approximation. This lends itself to the use of integral images as made popular by Viola and Jones [41], which reduces the computation time drastically. Integral images fit in the more general framework of boxlets, as proposed by Simard et al. [38].
3.1. Integral images
In order to make the article more self-contained, we briefly discuss the concept of integral images. They allow for fast computation of box type convolution filters. The entry of an integral image $I_{\Sigma}(\mathbf{x})$ at a location $\mathbf{x} = (x, y)^T$ represents the sum of all pixels in the input image $I$ within a rectangular region formed by the origin and $\mathbf{x}$.

$$I_{\Sigma}(\mathbf{x}) = \sum_{i=0}^{i \le x} \sum_{j=0}^{j \le y} I(i, j) \qquad (1)$$

Once the integral image has been computed, it takes three additions to calculate the sum of the intensities over any upright, rectangular area (see Fig. 1). Hence, the calculation time is independent of its size. This is important in our approach, as we use big filter sizes.
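Here is a minimal sketch of Eq. (1) and of the four-lookup box sum from Fig. 1. It assumes row-major `img[y][x]` indexing and treats positions outside the image as zero. The function names are our own.

```python
def integral_image(img):
    """ii[y][x] = sum of img[j][i] for all i <= x and j <= y (Eq. (1), row-major indexing)."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def box_sum(ii, x0, y0, x1, y1):
    """Sum of the pixels in the inclusive rectangle [x0, x1] x [y0, y1].

    Four memory accesses combined with three additions/subtractions,
    regardless of how large the rectangle is.
    """
    def at(x, y):
        return ii[y][x] if x >= 0 and y >= 0 else 0  # outside the image counts as 0
    return at(x1, y1) - at(x0 - 1, y1) - at(x1, y0 - 1) + at(x0 - 1, y0 - 1)

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 2, 2))  # 5 + 6 + 8 + 9 = 28
```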
3.2. Hessian matrix-based interest points
We base our detector on the Hessian matrix because of its good performance in accuracy. More precisely, we detect blob-like structures at locations where the determinant is maximum. In contrast to the Hessian-Laplace detector by Mikolajczyk and Schmid [26], we rely on the determinant of the Hessian also for the scale selection, as done by Lindeberg [21].
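Given a point $\mathbf{x} = (x, y)$ in an image $I$, the Hessian matrix $\mathcal{H}(\mathbf{x}, \sigma)$ in $\mathbf{x}$ at scale $\sigma$ is defined as follows

$$\mathcal{H}(\mathbf{x}, \sigma) = \begin{bmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{bmatrix}, \qquad (2)$$

where $L_{xx}(\mathbf{x}, \sigma)$ is the convolution of the Gaussian second order derivative $\frac{\partial^2}{\partial x^2} g(\sigma)$ with the image $I$ in point $\mathbf{x}$, and similarly for $L_{xy}(\mathbf{x}, \sigma)$ and $L_{yy}(\mathbf{x}, \sigma)$.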
Gaussians are optimal for scale-space analysis [19,20], but in practice they have to be discretised and cropped (Fig. 2, left half). This leads to a loss in repeatability under image rotations around odd multiples of π/4. This weakness holds for Hessian-based detectors in general. Fig. 3 shows the repeatability rate of two detectors based on the Hessian matrix for pure image rotation. The repeatability attains a maximum around multiples of π/2. This is due to the square shape of the filter. Nevertheless, the detectors still perform well, and the slight decrease in performance does not outweigh the advantage of fast convolutions brought by the discretisation and cropping. As real filters are non-ideal in any case, and given Lowe's success with his LoG approximations, we push the approximation for the Hessian matrix even further with box filters (in the right half of Fig. 2). These filters approximate the second order Gaussian derivatives and can be evaluated at very low computational cost using integral images. The calculation time is therefore independent of the filter size. As shown in Section 5 and Fig. 3, the performance is comparable to or better than with the discretised and cropped Gaussians.

Fig. 1. Using integral images, it takes only three additions and four memory accesses to calculate the sum of intensities inside a rectangular region of any size.
Fig. 2. Left to right: the (discretised and cropped) Gaussian second order partial derivative in y- ($L_{yy}$) and xy-direction ($L_{xy}$), respectively; our approximation for the second order Gaussian partial derivative in y- ($D_{yy}$) and xy-direction ($D_{xy}$). The grey regions are equal to zero.
The 9×9 box filters in Fig. 2 are approximations of a Gaussian with $\sigma = 1.2$ and represent the lowest scale (i.e. highest spatial resolution) for computing the blob response maps. We will denote them by $D_{xx}$, $D_{yy}$, and $D_{xy}$. The weights applied to the rectangular regions are kept simple for computational efficiency. This yields

$$\det(\mathcal{H}_{\text{approx}}) = D_{xx} D_{yy} - (w D_{xy})^2. \qquad (3)$$

The relative weight $w$ of the filter responses is used to balance the expression for the Hessian's determinant. This is needed for the energy conservation between the Gaussian kernels and the approximated Gaussian kernels,

$$w = \frac{|L_{xy}(1.2)|_F \, |D_{yy}(9)|_F}{|L_{yy}(1.2)|_F \, |D_{xy}(9)|_F} = 0.912\ldots \simeq 0.9, \qquad (4)$$

where $|x|_F$ is the Frobenius norm. Notice that for theoretical correctness, the weighting changes depending on the scale. In practice, we keep this factor constant, as this did not have a significant impact on the results in our experiments.
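The value in Eq. (4) can be reproduced with a short calculation that rests on two assumptions of ours. First, the 9×9 filters follow a commonly used layout. For $D_{yy}$, this is a central band 5 pixels wide split into three 3-pixel lobes weighted 1, −2, 1. For $D_{xy}$, it is four 3×3 lobes weighted ±1 around a one-pixel cross of zeros. Second, the Gaussian norms are taken in the continuous domain, where $|L_{xy}(\sigma)|_F / |L_{yy}(\sigma)|_F = 1/\sqrt{3}$ for every σ. Under these assumptions w = √(5/6) ≈ 0.9129.

```python
import math

def box_filters_9x9():
    """Hypothetical 9x9 layouts of D_yy and D_xy (lobe length l0 = 3)."""
    n, l0 = 9, 3
    dyy = [[0] * n for _ in range(n)]
    for r in range(n):
        for c in range(2, 7):  # central band, 2 * l0 - 1 = 5 pixels wide
            dyy[r][c] = -2 if l0 <= r < 2 * l0 else 1  # lobes weighted 1, -2, 1
    dxy = [[0] * n for _ in range(n)]
    for r in range(1, 8):
        for c in range(1, 8):
            if r == 4 or c == 4:
                continue  # one-pixel cross of zeros through the centre
            dxy[r][c] = 1 if (r < 4) == (c < 4) else -1  # +1 on one diagonal, -1 on the other
    return dyy, dxy

def frobenius(kernel):
    """Frobenius norm of a kernel given as nested lists."""
    return math.sqrt(sum(v * v for row in kernel for v in row))

dyy, dxy = box_filters_9x9()
gaussian_ratio = 1 / math.sqrt(3)  # continuous |L_xy|_F / |L_yy|_F, independent of sigma
w = gaussian_ratio * frobenius(dyy) / frobenius(dxy)
print(f"|D_yy|_F^2 = {frobenius(dyy) ** 2:.0f}, |D_xy|_F^2 = {frobenius(dxy) ** 2:.0f}, w = {w:.4f}")
# |D_yy|_F^2 = 90, |D_xy|_F^2 = 36, w = 0.9129
```

Both box filters sum to zero, so, like the Gaussian derivatives they replace, they give no response on a constant image.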
Furthermore, the filter responses are normalised with respect to their size. This guarantees a constant Frobenius norm for any filter size, an important aspect for the scale space analysis as discussed in the next section.
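Eq. (3) and the size normalisation can be combined into a blob response at a single pixel, as sketched below. Several choices here are assumptions. The sketch uses one common layout for the 9×9 filters: three 1/−2/1 lobes in a 5-pixel band for $D_{yy}$, and four ±1 squares for $D_{xy}$. It takes $D_{xx}$ as the transpose of $D_{yy}$. It implements the size normalisation by dividing each response by the filter area. To stay short, it evaluates the filters by direct summation rather than through an integral image. All function names are hypothetical.

```python
def filters_9x9():
    """Hypothetical 9x9 layouts; D_xx is D_yy rotated by 90 degrees (transposed)."""
    dyy = [[0] * 9 for _ in range(9)]
    dxy = [[0] * 9 for _ in range(9)]
    for r in range(9):
        for c in range(9):
            if 2 <= c <= 6:
                dyy[r][c] = -2 if 3 <= r <= 5 else 1
            if 1 <= r <= 7 and 1 <= c <= 7 and r != 4 and c != 4:
                dxy[r][c] = 1 if (r < 4) == (c < 4) else -1
    dxx = [list(col) for col in zip(*dyy)]
    return dxx, dyy, dxy

def response(image, kernel, row, col):
    """Correlate a 9x9 kernel with the image, centred on (row, col)."""
    return sum(kernel[i][j] * image[row + i - 4][col + j - 4]
               for i in range(9) for j in range(9) if kernel[i][j])

def det_hessian_approx(image, row, col, w=0.9):
    """Eq. (3), with every filter response divided by the filter area (81 pixels)."""
    dxx, dyy, dxy = filters_9x9()
    area = 81.0
    d_xx = response(image, dxx, row, col) / area
    d_yy = response(image, dyy, row, col) / area
    d_xy = response(image, dxy, row, col) / area
    return d_xx * d_yy - (w * d_xy) ** 2

size = 21
blob = [[1.0 if 9 <= r <= 11 and 9 <= c <= 11 else 0.0 for c in range(size)] for r in range(size)]
edge = [[1.0 if c >= 11 else 0.0 for c in range(size)] for r in range(size)]
print(f"blob: {det_hessian_approx(blob, 10, 10):.4f}, edge: {det_hessian_approx(edge, 10, 10):.4f}")
# blob: 0.0494, edge: 0.0000
```

A small bright square yields a clearly positive determinant. Next to a straight vertical edge, $D_{yy}$ vanishes and the determinant is zero even though $D_{xx}$ responds.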
The approximated determinant of the Hessian represents the blob response in the image at location x. The responses are stored in a blob response map over different scales, and local maxima are detected as explained in Section 3.4.
3.3. Scale space representation
Interest points need to be found at different scales, not least because the search of correspondences often requires their comparison in images where they are seen at different scales. Scale spaces are usually implemented as an image pyramid. The images are repeatedly smoothed with a Gaussian and then sub-sampled in order to achieve a higher level of the pyramid. Lowe [24] subtracts these pyramid layers in order to get the DoG (Difference of Gaussians) images where edges and blobs can be found.
Due to the use of box filters and integral images, we do not have to iteratively apply the same filter to the output of a previously filtered layer. Instead, we can apply box filters of any size at exactly the same speed directly on the original image, and even in parallel (although the latter is not exploited here). Therefore, the scale space is analysed by up-scaling the filter size rather than iteratively reducing the image size (Fig. 4). The output of the 9×9 filter, introduced in the previous section, is considered as the initial scale layer, to which we will refer as scale s = 1.2 (approximating Gaussian derivatives with σ = 1.2). The following layers are obtained by filtering the image with gradually bigger masks, taking into account the discrete nature of integral images and the specific structure of our filters.
Note that our main motivation for this type of sampling is its computational efficiency. Furthermore, as we do not have to downsample the image, there is no aliasing. On the downside, box filters preserve high-frequency components that can get lost in zoomed-out variants of the same scene, which can limit scale invariance. This was, however, not noticeable in our experiments.
The scale space is divided into octaves. An octave represents a series of filter response maps obtained by convolving the same input image with a filter of increasing size. In total, an octave encompasses a scaling factor of 2 (which implies that one needs to more than double the filter size, see below). Each octave is subdivided into a constant number of scale levels. Due to the discrete nature of integral images, the minimum scale difference between two subsequent scales depends on the length $l_0$ of the positive or negative lobes of the partial second order derivative in the direction of derivation (x or y), which is set to a third of the filter size length. For the 9×9 filter, this length $l_0$ is 3. For two successive levels, we must increase this size by a minimum of 2 pixels (1 pixel on every side) in order to keep the size odd and thus ensure the presence of the central pixel. This results in a total increase of the mask size by 6 pixels (see Fig. 5). Note that for dimensions different from $l_0$ (e.g. the width of the central band for the vertical filter in Fig. 5), rescaling the mask introduces rounding-off errors. However, since these errors are typically much smaller than $l_0$, this is an acceptable approximation.

Fig. 3. Top: Repeatability score for image rotation of up to 180°. Hessian-based detectors have in general a lower repeatability score for angles
Fig. 4. Instead of iteratively reducing the image size (left), the use of integral images allows the up-scaling of the filter at constant cost (right).
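The size progression within the first octave follows directly from the lobe rule. Each of the three lobes grows by 2 pixels, so the filter grows by 6 pixels and its side length stays odd. The helpers below have hypothetical names. They enumerate the first-octave sizes and map each size to a scale via s = 1.2 × size / 9, the same linear relation the text uses for interpolated scales (e.g. 1.6 = 1.2 × 12/9).

```python
def first_octave_sizes(first_size=9, levels=4):
    """Filter side lengths in the first octave: each lobe (l0 = size / 3) grows by
    2 pixels, one on each side, so the whole mask grows by 3 * 2 = 6 pixels."""
    sizes = [first_size]
    for _ in range(levels - 1):
        lobe = sizes[-1] // 3
        assert lobe % 2 == 1, "an odd lobe length keeps a central pixel"
        sizes.append(3 * (lobe + 2))
    return sizes

def scale_of(size):
    """Approximate Gaussian scale of a size x size filter (9 x 9 corresponds to 1.2)."""
    return 1.2 * size / 9

sizes = first_octave_sizes()
print(sizes)                                   # [9, 15, 21, 27]
print([round(scale_of(s), 2) for s in sizes])  # [1.2, 2.0, 2.8, 3.6]
```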
The construction of the scale space starts with the 9×9 filter, which calculates the blob response of the image for the smallest scale. Then, filters with sizes 15×15, 21×21, and 27×27 are applied, by which even more than a scale change of two has been achieved. But this is needed, as a 3D non-maximum suppression is applied both spatially and over the neighbouring scales. Hence, the first and last Hessian response maps in the stack cannot contain such maxima themselves, as they are used for reasons of comparison only. Therefore, after interpolation (see Section 3.4), the smallest possible scale is $\sigma = 1.6 = 1.2 \cdot \frac{12}{9}$, corresponding to a filter size of 12×12, and the highest is $\sigma = 3.2 = 1.2 \cdot \frac{24}{9}$. For more details, we refer to [2]. Similar considerations hold for the other octaves. For each new octave, the filter size increase is doubled (going from 6 to 12 to 24 to 48). At the same time, the sampling intervals for the extraction of the interest points can be doubled as well for every new octave. This reduces the computation time, and the loss in accuracy is comparable to the image sub-sampling of the traditional approaches. The filter sizes for the second octave are 15, 27, 39, 51. A third octave is computed with the filter sizes 27, 51, 75, 99. If the original image size is still larger than the corresponding filter sizes, the scale space analysis is performed for a fourth octave, using the filter sizes 51, 99, 147, and 195. Fig. 6 gives an overview of the filter sizes for the first three octaves. Further octaves can be computed in a similar way. In typical scale-space analysis, however, the number of detected interest points per octave decays very quickly, cf. Fig. 7.
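The octave layout listed above can be generated programmatically. In the sketch below (hypothetical function name), octave o (0-based) uses a size increment of 6·2^o. Consistent with the listed sizes, it starts at the second filter size of the previous octave. The sampling interval doubles with each octave. The starting interval `base_step` and the `max_size` cut-off, which stands in for the check that the image is still larger than the filters, are our own assumptions.

```python
def octave_filter_sizes(num_octaves=4, levels=4, base_step=1, max_size=None):
    """Filter side lengths per octave, plus the interest-point sampling step.

    Octave o (0-based) increments sizes by 6 * 2**o, starts at the second size of
    the previous octave, and samples every base_step * 2**o pixels.
    """
    octaves = []
    first = 9
    for o in range(num_octaves):
        increment = 6 * 2 ** o
        sizes = [first + k * increment for k in range(levels)]
        if max_size is not None and sizes[-1] > max_size:
            break  # hypothetical check: the image is too small for this octave
        octaves.append({"sizes": sizes, "sampling_step": base_step * 2 ** o})
        first = sizes[1]
    return octaves

for octave in octave_filter_sizes():
    print(octave["sizes"], "sampling step", octave["sampling_step"])
# [9, 15, 21, 27] sampling step 1
# [15, 27, 39, 51] sampling step 2
# [27, 51, 75, 99] sampling step 4
# [51, 99, 147, 195] sampling step 8
```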
The large scale changes, especially between the first filters within the octaves (from 9 to 15 is a change of 1.7), render the sampling of scales quite crude. Therefore, we have also implemented a scale space with a finer sampling of the scales. This computes the integral image on the image up-scaled by a factor of 2, and then starts the first octave by filtering with a filter of size 15. Additional filter sizes are 21, 27, 33, and 39. Then a second octave starts, again using filters which now increase their sizes by 12 pixels, after which a third and fourth octave follow. Now the scale change between the first two filters is only 1.4 (21/15). The lowest scale for the accurate version that can be detected through quadratic interpolation is $s = (1.2 \cdot \frac{18}{9}) / 2 = 1.2$. As the Frobenius norm remains constant for our filters at any size, they are already scale normalised, and no further weighting of the filter response is required; for more information on that topic, see [22].
3.4. Interest point localisation
In order to localise interest points in the image and over scales, a non-maximum suppression in a 3×3×3 neighbourhood is applied. Specifically, we use a fast variant introduced by Neubeck and Van Gool [33]. The maxima of the determinant of the Hessian matrix are then interpolated in scale and image space with the method proposed by Brown and Lowe [5].
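The two localisation steps can be sketched on a small response stack indexed as `stack[scale][y][x]`. `is_local_max` is the naive 26-neighbour comparison, not the faster Neubeck and Van Gool variant cited above. `refine_offset` fits a 3D quadratic via finite differences and solves for its extremum, in the spirit of the Taylor-expansion approach of Brown and Lowe. Function names and the toy data are our own. A practical implementation would typically also discard or re-centre points whose offset exceeds half a sample in any dimension.

```python
import itertools

def is_local_max(stack, s, y, x):
    """Naive 3x3x3 non-maximum suppression: stack[s][y][x] must exceed all 26 neighbours."""
    centre = stack[s][y][x]
    for ds, dy, dx in itertools.product((-1, 0, 1), repeat=3):
        if (ds, dy, dx) != (0, 0, 0) and stack[s + ds][y + dy][x + dx] >= centre:
            return False
    return True

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def refine_offset(stack, s, y, x):
    """Offset (dx, dy, ds) of the extremum of a 3D quadratic fitted around (s, y, x).

    Finite differences give the gradient g and Hessian h of the response;
    h * offset = -g is solved with Cramer's rule.
    """
    def f(ds, dy, dx):
        return stack[s + ds][y + dy][x + dx]

    c = f(0, 0, 0)
    g = [(f(0, 0, 1) - f(0, 0, -1)) / 2,
         (f(0, 1, 0) - f(0, -1, 0)) / 2,
         (f(1, 0, 0) - f(-1, 0, 0)) / 2]
    dxx = f(0, 0, 1) - 2 * c + f(0, 0, -1)
    dyy = f(0, 1, 0) - 2 * c + f(0, -1, 0)
    dss = f(1, 0, 0) - 2 * c + f(-1, 0, 0)
    dxy = (f(0, 1, 1) - f(0, 1, -1) - f(0, -1, 1) + f(0, -1, -1)) / 4
    dxs = (f(1, 0, 1) - f(1, 0, -1) - f(-1, 0, 1) + f(-1, 0, -1)) / 4
    dys = (f(1, 1, 0) - f(1, -1, 0) - f(-1, 1, 0) + f(-1, -1, 0)) / 4
    h = [[dxx, dxy, dxs], [dxy, dyy, dys], [dxs, dys, dss]]
    denom = det3(h)
    offset = []
    for col in range(3):
        m = [row[:] for row in h]
        for r in range(3):
            m[r][col] = -g[r]  # replace one column by -g (Cramer's rule)
        offset.append(det3(m) / denom)
    return offset

# Toy response stack (3 scales x 5 x 5) whose true peak lies at x = 2.3, y = 1.8, s = 1.1.
stack = [[[-((x - 2.3) ** 2 + (y - 1.8) ** 2 + (s - 1.1) ** 2) for x in range(5)]
          for y in range(5)] for s in range(3)]
print(is_local_max(stack, 1, 2, 2))   # True
print(refine_offset(stack, 1, 2, 2))  # approximately [0.3, -0.2, 0.1]
```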
Scale space interpolation is especially important in our case, as the difference in scale between the first layers of every octave is relatively large. Fig. 8 shows an example of the detected interest points using our 'Fast-Hessian' detector.
4. Interest point description and matching
Our descriptor describes the distribution of the intensity content within the interest point neighbourhood, similar to

Fig. 5. Filters $D_{yy}$ (top) and $D_{xy}$ (bottom) for two successive scale levels (9×9 and 15×15). The length of the dark lobe can only be increased by an even number of pixels in order to guarantee the presence of a central pixel (top).
Fig. 6. Graphical representation of the filter side lengths for three different octaves. The logarithmic horizontal axis represents the scales. Note that the octaves are overlapping in order to cover all possible scales seamlessly.
