Speeded-Up Robust Features (SURF)

Herbert Bay a, Andreas Ess a,*, Tinne Tuytelaars b, Luc Van Gool a,b

a ETH Zurich, BIWI, Sternwartstrasse 7, CH-8092 Zurich, Switzerland

b K.U. Leuven, ESAT-PSI, Kasteelpark Arenberg 10, B-3001 Leuven, Belgium

Received 31 October 2006; accepted 5 September 2007

Available online 15 December 2007

Abstract

This article presents a novel scale- and rotation-invariant detector and descriptor, coined SURF (Speeded-Up Robust Features). SURF approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.

This is achieved by relying on integral images for image convolutions; by building on the strengths of the leading existing detectors and descriptors (specifically, using a Hessian matrix-based measure for the detector, and a distribution-based descriptor); and by simplifying these methods to the essential. This leads to a combination of novel detection, description, and matching steps.

The paper encompasses a detailed description of the detector and descriptor and then explores the effects of the most important parameters. We conclude the article with SURF’s application to two challenging, yet converse goals: camera calibration as a special case of image registration, and object recognition. Our experiments underline SURF’s usefulness in a broad range of topics in computer vision. © 2007 Elsevier Inc. All rights reserved.

Keywords: Interest points; Local features; Feature description; Camera calibration; Object recognition

1. Introduction

The task of finding point correspondences between two images of the same scene or object is part of many computer vision applications. Image registration, camera calibration, object recognition, and image retrieval are just a few.

The search for discrete image point correspondences can be divided into three main steps. First, ‘interest points’ are selected at distinctive locations in the image, such as corners, blobs, and T-junctions. The most valuable property of an interest point detector is its repeatability. The repeatability expresses the reliability of a detector for finding the same physical interest points under different viewing conditions. Next, the neighbourhood of every interest point is represented by a feature vector. This descriptor has to be distinctive and at the same time robust to noise, detection displacements and geometric and photometric deformations. Finally, the descriptor vectors are matched between different images. The matching is based on a distance between the vectors, e.g. the Mahalanobis or Euclidean distance. The dimension of the descriptor has a direct impact on the time this takes, and fewer dimensions are desirable for fast interest point matching. However, lower dimensional feature vectors are in general less distinctive than their high-dimensional counterparts.
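The matching step can be illustrated with a brute-force nearest-neighbour search under the Euclidean distance. This is only a sketch with hypothetical function names, not the matching strategy developed later in the article; it mainly shows that the cost of each comparison grows linearly with the descriptor dimension.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def match_descriptors(query, reference):
    """For each query descriptor, return (index of nearest reference, distance).

    Brute force: every query is compared with every reference descriptor,
    and each comparison touches every dimension once.
    """
    matches = []
    for q in query:
        distances = [euclidean(q, r) for r in reference]
        best = min(range(len(reference)), key=distances.__getitem__)
        matches.append((best, distances[best]))
    return matches
```

Halving the descriptor length roughly halves the work done inside `euclidean`, which is the trade-off between speed and distinctiveness described above.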

It has been our goal to develop both a detector and descriptor that, in comparison to the state-of-the-art, are fast to compute while not sacrificing performance. In order to succeed, one has to strike a balance between the above requirements, like simplifying the detection scheme while keeping it accurate, and reducing the descriptor’s size while keeping it sufficiently distinctive.

A wide variety of detectors and descriptors have already been proposed in the literature (e.g. [21,24,27,39,25]). Also, detailed comparisons and evaluations on benchmarking datasets have been performed [28,30,31]. Our fast detector and descriptor, called SURF (Speeded-Up Robust Features), was introduced in [4]. It is built on the insights gained from this previous work. In our experiments on these benchmarking datasets, SURF’s detector and descriptor are not only faster, but the former is also more repeatable and the latter more distinctive.

Computer Vision and Image Understanding 110 (2008) 346–359

* Corresponding author. E-mail address: hz.ch (A. Ess).

1077-3142/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.cviu.2007.09.014

We focus on scale and in-plane rotation-invariant detectors and descriptors. These seem to offer a good compromise between feature complexity and robustness to commonly occurring photometric deformations. Skew, anisotropic scaling, and perspective effects are assumed to be second order effects that are covered to some degree by the overall robustness of the descriptor. Note that the descriptor can be extended towards affine-invariant regions using affine normalisation of the ellipse (cf. [31]), although this will have an impact on the computation time. Extending the detector, on the other hand, is less straightforward. Concerning the photometric deformations, we assume a simple linear model with a bias (offset) and contrast change (scale factor). Neither detector nor descriptor use colour information.

The article is structured as follows. In Section 2, we give a review over previous work in interest point detection and description. In Section 3, we describe the strategy applied for fast and robust interest point detection. The input image is analysed at different scales in order to guarantee invariance to scale changes. The detected interest points are provided with a rotation and scale-invariant descriptor in Section 4. Furthermore, a simple and efficient first-line indexing technique, based on the contrast of the interest point with its surrounding, is proposed.

In Section 5, some of the available parameters and their effects are discussed, including the benefits of an upright version (not invariant to image rotation). We also investigate SURF’s performance in two important application scenarios. First, we consider a special case of image registration, namely the problem of camera calibration for 3D reconstruction. Second, we will explore SURF’s application to an object recognition experiment. Both applications highlight SURF’s benefits in terms of speed and robustness as opposed to other strategies. The article is concluded in Section 6.

2. Related work

2.1. Interest point detection

The most widely used detector is probably the Harris corner detector [15], proposed back in 1988. It is based on the eigenvalues of the second moment matrix. However, Harris corners are not scale invariant. Lindeberg [21] introduced the concept of automatic scale selection. This allows interest points to be detected in an image, each with their own characteristic scale. He experimented with both the determinant of the Hessian matrix as well as the Laplacian (which corresponds to the trace of the Hessian matrix) to detect blob-like structures. Mikolajczyk and Schmid [26] refined this method, creating robust and scale-invariant feature detectors with high repeatability, which they coined Harris-Laplace and Hessian-Laplace. They used a (scale-adapted) Harris measure or the determinant of the Hessian matrix to select the location, and the Laplacian to select the scale. Focusing on speed, Lowe [23] proposed to approximate the Laplacian of Gaussians (LoG) by a Difference of Gaussians (DoG) filter.

Several other scale-invariant interest point detectors have been proposed. Examples are the salient region detector, proposed by Kadir and Brady [17], which maximises the entropy within the region, and the edge-based region detector proposed by Jurie and Schmid [16]. They seem less amenable to acceleration though. Also several affine-invariant feature detectors have been proposed that can cope with wider viewpoint changes. However, these fall outside the scope of this article.

From studying the existing detectors and from published comparisons [29,30], we can conclude that Hessian-based detectors are more stable and repeatable than their Harris-based counterparts. Moreover, using the determinant of the Hessian matrix rather than its trace (the Laplacian) seems advantageous, as it fires less on elongated, ill-localised structures. We also observed that approximations like the DoG can bring speed at a low cost in terms of lost accuracy.

2.2. Interest point description

An even larger variety of feature descriptors has been proposed, like Gaussian derivatives [11], moment invariants [32], complex features [1], steerable filters [12], phase-based local features [6], and descriptors representing the distribution of smaller-scale features within the interest point neighbourhood. The latter, introduced by Lowe [24], have been shown to outperform the others [28]. This can be explained by the fact that they capture a substantial amount of information about the spatial intensity patterns, while at the same time being robust to small deformations or localisation errors. The descriptor in [24], called SIFT for short, computes a histogram of local oriented gradients around the interest point and stores the bins in a 128D vector (8 orientation bins for each of 4 × 4 location bins).

Various refinements on this basic scheme have been proposed. Ke and Sukthankar [18] applied PCA on the gradient image around the detected interest point. This PCA-SIFT yields a 36D descriptor which is fast for matching, but proved to be less distinctive than SIFT in a second comparative study by Mikolajczyk and Schmid [30]; and applying PCA slows down feature computation. In the same paper [30], the authors proposed a variant of SIFT, called GLOH, which proved to be even more distinctive with the same number of dimensions. However, GLOH is computationally more expensive as it again uses PCA for data compression.

The SIFT descriptor still seems the most appealing descriptor for practical uses, and hence also the most widely used nowadays. It is distinctive and relatively fast, which is crucial for on-line applications. Recently, Se et al. [37] implemented SIFT on a Field Programmable Gate Array (FPGA) and improved its speed by an order of magnitude. Meanwhile, Grabner et al. [14] also used integral images to approximate SIFT. Their detection step is based on difference-of-mean (without interpolation), their description step on integral histograms. They achieve about the same speed as we do (though the description step is constant in speed), but at the cost of reduced quality compared to SIFT. Generally, the high dimensionality of the descriptor is a drawback of SIFT at the matching step. For on-line applications relying only on a regular PC, each one of the three steps (detection, description, matching) has to be fast.

An entire body of work is available on speeding up the matching step. All of them come at the expense of getting an approximative matching. Methods include the best-bin-first proposed by Lowe [24], balltrees [35], vocabulary trees [34], locality sensitive hashing [9], or redundant bit vectors [13].

Complementary to this, we suggest the use of the Hessian matrix’s trace to significantly increase the matching speed. Together with the descriptor’s low dimensionality, any matching algorithm is bound to perform faster.

3. Interest point detection

Our approach for interest point detection uses a very basic Hessian matrix approximation. This lends itself to the use of integral images as made popular by Viola and Jones [41], which reduces the computation time drastically. Integral images fit in the more general framework of boxlets, as proposed by Simard et al. [38].

3.1. Integral images

In order to make the article more self-contained, we briefly discuss the concept of integral images. They allow for fast computation of box type convolution filters. The entry of an integral image $I_\Sigma(\mathbf{x})$ at a location $\mathbf{x} = (x, y)^T$ represents the sum of all pixels in the input image $I$ within a rectangular region formed by the origin and $\mathbf{x}$.

$$I_\Sigma(\mathbf{x}) = \sum_{i=0}^{i \le x} \sum_{j=0}^{j \le y} I(i, j) \tag{1}$$

Once the integral image has been computed, it takes three additions to calculate the sum of the intensities over any upright, rectangular area (see Fig. 1). Hence, the calculation time is independent of its size. This is important in our approach, as we use big filter sizes.
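As an illustration of Eq. (1) and of the constant-time box sum, the following is a minimal Python sketch, not the authors' implementation. The function names `integral_image` and `box_sum` are hypothetical, and the table is padded with one row and one column of zeros (an assumption made here for convenience), so that entry `ii[y + 1][x + 1]` holds the sum defined in Eq. (1).

```python
def integral_image(img):
    """Return the summed-area table of a 2D list of numbers (rows of equal length)."""
    h, w = len(img), len(img[0])
    # One extra row and column of zeros so that box sums need no boundary checks.
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above this row, plus the running sum of this row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, height, width):
    """Sum of the pixels in rows top..top+height-1 and columns left..left+width-1."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]
```

Whatever the box size, `box_sum` performs four look-ups and three additions or subtractions, which is why large filters cost the same as small ones.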

3.2. Hessian matrix-based interest points

We base our detector on the Hessian matrix because of its good performance in accuracy. More precisely, we detect blob-like structures at locations where the determinant is maximum. In contrast to the Hessian-Laplace detector by Mikolajczyk and Schmid [26], we rely on the determinant of the Hessian also for the scale selection, as done by Lindeberg [21].

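Given a point $\mathbf{x} = (x, y)$ in an image $I$, the Hessian matrix $\mathcal{H}(\mathbf{x}, \sigma)$ in $\mathbf{x}$ at scale $\sigma$ is defined as follows

$$\mathcal{H}(\mathbf{x}, \sigma) = \begin{bmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{bmatrix}, \tag{2}$$

where $L_{xx}(\mathbf{x}, \sigma)$ is the convolution of the Gaussian second order derivative $\frac{\partial^2}{\partial x^2} g(\sigma)$ with the image $I$ in point $\mathbf{x}$, and similarly for $L_{xy}(\mathbf{x}, \sigma)$ and $L_{yy}(\mathbf{x}, \sigma)$.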

Gaussians are optimal for scale-space analysis [19,20], but in practice they have to be discretised and cropped (Fig. 2, left half). This leads to a loss in repeatability under image rotations around odd multiples of $\pi/4$. This weakness holds for Hessian-based detectors in general. Fig. 3 shows the repeatability rate of two detectors based on the Hessian matrix for pure image rotation. The repeatability attains a maximum around multiples of $\pi/2$. This is due to the square shape of the filter. Nevertheless, the detectors still perform well, and the slight decrease in performance does not outweigh the advantage of fast convolutions brought by the discretisation and cropping. As real filters are non-ideal in any case, and given Lowe’s success with his LoG approximations, we push the approximation for the Hessian matrix even further with box filters (in the right half of Fig. 2). These approximate second order Gaussian derivatives and can be evaluated at a very low computational cost using integral images. The calculation time therefore is independent of the filter size. As shown in Section 5 and Fig. 3, the performance is comparable or better than with the discretised and cropped Gaussians.

Fig. 1. Using integral images, it takes only three additions and four memory accesses to calculate the sum of intensities inside a rectangular region of any size.

Fig. 2. Left to right: The (discretised and cropped) Gaussian second order partial derivative in y- ($L_{yy}$) and xy-direction ($L_{xy}$), respectively; our approximation for the second order Gaussian partial derivative in y- ($D_{yy}$) and xy-direction ($D_{xy}$). The grey regions are equal to zero.

The 9 × 9 box filters in Fig. 2 are approximations of a Gaussian with $\sigma = 1.2$ and represent the lowest scale (i.e. highest spatial resolution) for computing the blob response maps. We will denote them by $D_{xx}$, $D_{yy}$, and $D_{xy}$. The weights applied to the rectangular regions are kept simple for computational efficiency. This yields

$$\det(\mathcal{H}_{\text{approx}}) = D_{xx} D_{yy} - (w D_{xy})^2. \tag{3}$$

The relative weight $w$ of the filter responses is used to balance the expression for the Hessian’s determinant. This is needed for the energy conservation between the Gaussian kernels and the approximated Gaussian kernels,

$$w = \frac{|L_{xy}(1.2)|_F \, |D_{yy}(9)|_F}{|L_{yy}(1.2)|_F \, |D_{xy}(9)|_F} = 0.912\ldots \simeq 0.9, \tag{4}$$

where $|x|_F$ is the Frobenius norm. Notice that for theoretical correctness, the weighting changes depending on the scale. In practice, we keep this factor constant, as this did not have a significant impact on the results in our experiments.
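Eq. (3) translates directly into code. The sketch below is a hedged illustration: it assumes the three box-filter responses have already been computed and normalised by filter size, and the function names are hypothetical. The default `w=0.9` is the constant from Eq. (4).

```python
def det_hessian_approx(d_xx, d_yy, d_xy, w=0.9):
    """Eq. (3): approximate Hessian determinant from three box-filter responses."""
    return d_xx * d_yy - (w * d_xy) ** 2


def blob_response_map(dxx_map, dyy_map, dxy_map, w=0.9):
    """Apply Eq. (3) element-wise to three equally sized 2D response maps."""
    return [
        [det_hessian_approx(a, b, c, w) for a, b, c in zip(row_xx, row_yy, row_xy)]
        for row_xx, row_yy, row_xy in zip(dxx_map, dyy_map, dxy_map)
    ]
```

A blob-like structure gives second derivatives of the same sign in x and y, so the product term dominates and the response is positive; a saddle-like structure gives opposite signs and a negative response.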

Furthermore, the filter responses are normalised with respect to their size. This guarantees a constant Frobenius norm for any filter size, an important aspect for the scale space analysis as discussed in the next section.

The approximated determinant of the Hessian represents the blob response in the image at location $\mathbf{x}$. These responses are stored in a blob response map over different scales, and local maxima are detected as explained in Section 3.4.

3.3. Scale space representation

Interest points need to be found at different scales, not least because the search of correspondences often requires their comparison in images where they are seen at different scales. Scale spaces are usually implemented as an image pyramid. The images are repeatedly smoothed with a Gaussian and then sub-sampled in order to achieve a higher level of the pyramid. Lowe [24] subtracts these pyramid layers in order to get the DoG (Difference of Gaussians) images where edges and blobs can be found.

Due to the use of box filters and integral images, we do not have to iteratively apply the same filter to the output of a previously filtered layer, but instead can apply box filters of any size at exactly the same speed directly on the original image and even in parallel (although the latter is not exploited here). Therefore, the scale space is analysed by up-scaling the filter size rather than iteratively reducing the image size, Fig. 4. The output of the 9 × 9 filter, introduced in the previous section, is considered as the initial scale layer, to which we will refer as scale $s = 1.2$ (approximating Gaussian derivatives with $\sigma = 1.2$). The following layers are obtained by filtering the image with gradually bigger masks, taking into account the discrete nature of integral images and the specific structure of our filters.

Note that our main motivation for this type of sampling is its computational efficiency. Furthermore, as we do not have to downsample the image, there is no aliasing. On the downside, box filters preserve high-frequency components that can get lost in zoomed-out variants of the same scene, which can limit scale-invariance. This was however not noticeable in our experiments.

The scale space is divided into octaves. An octave represents a series of filter response maps obtained by convolving the same input image with a filter of increasing size. In total, an octave encompasses a scaling factor of 2 (which implies that one needs to more than double the filter size, see below). Each octave is subdivided into a constant number of scale levels. Due to the discrete nature of integral images, the minimum scale difference between two subsequent scales depends on the length $l_0$ of the positive or negative lobes of the partial second order derivative in the direction of derivation (x or y), which is set to a third of the filter size length. For the 9 × 9 filter, this length $l_0$ is 3. For two successive levels, we must increase this size by a minimum of 2 pixels (1 pixel on every side) in order to keep the size uneven and thus ensure the presence of the central pixel. This results in a total increase of the mask size by 6 pixels (see Fig. 5). Note that for dimensions different from $l_0$ (e.g. the width of the central band for the vertical filter in Fig. 5), rescaling the mask introduces rounding-off errors. However, since these errors are typically much smaller than $l_0$, this is an acceptable approximation.

Fig. 3. Top: Repeatability score for image rotation of up to 180°. Hessian-based detectors have in general a lower repeatability score for angles

Fig. 4. Instead of iteratively reducing the image size (left), the use of integral images allows the up-scaling of the filter at constant cost (right).

The construction of the scale space starts with the 9 × 9 filter, which calculates the blob response of the image for the smallest scale. Then, filters with sizes 15 × 15, 21 × 21, and 27 × 27 are applied, by which even more than a scale change of two has been achieved. But this is needed, as a 3D non-maximum suppression is applied both spatially and over the neighbouring scales. Hence, the first and last Hessian response maps in the stack cannot contain such maxima themselves, as they are used for reasons of comparison only. Therefore, after interpolation, see Section 3.4, the smallest possible scale is $\sigma = 1.6 = 1.2 \cdot \frac{12}{9}$, corresponding to a filter size of 12 × 12, and the highest to $\sigma = 3.2 = 1.2 \cdot \frac{24}{9}$. For more details, we refer to [2].

Similar considerations hold for the other octaves. For each new octave, the filter size increase is doubled (going from 6 to 12 to 24 to 48). At the same time, the sampling intervals for the extraction of the interest points can be doubled as well for every new octave. This reduces the computation time and the loss in accuracy is comparable to the image sub-sampling of the traditional approaches. The filter sizes for the second octave are 15, 27, 39, 51. A third octave is computed with the filter sizes 27, 51, 75, 99 and, if the original image size is still larger than the corresponding filter sizes, the scale space analysis is performed for a fourth octave, using the filter sizes 51, 99, 147, and 195. Fig. 6 gives an overview of the filter sizes for the first three octaves. Further octaves can be computed in a similar way. In typical scale-space analysis however, the number of detected interest points per octave decays very quickly, cf. Fig. 7.
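The filter sizes listed above follow a simple pattern, sketched below in Python as an illustration only. The function names and parameters are hypothetical. The rule that each octave starts at the second filter size of the previous one is read off the listed sizes (9, 15, 27, 51) rather than stated in the text, and the scale assigned to a filter is assumed to grow linearly with its size, σ = 1.2 · size / 9, consistent with the values 1.6 and 3.2 quoted for sizes 12 and 24.

```python
def octave_filter_sizes(num_octaves=4, levels=4, base_size=9, base_step=6):
    """Return one list of filter side lengths per octave."""
    octaves = []
    first, step = base_size, base_step
    for _ in range(num_octaves):
        sizes = [first + k * step for k in range(levels)]
        octaves.append(sizes)
        first = sizes[1]  # assumption: next octave starts at this octave's second filter
        step *= 2         # the size increase doubles for each new octave
    return octaves


def filter_scale(size, base_size=9, base_scale=1.2):
    """Scale associated with a filter side length (assumed linear in the size)."""
    return base_scale * size / base_size
```

Calling `octave_filter_sizes()` with the defaults reproduces the four octaves given in the text, and every size stays odd and divisible by three, so each filter keeps a central pixel and a lobe length of one third of its side.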

The large scale changes, especially between the first filters within these octaves (from 9 to 15 is a change of 1.7), render the sampling of scales quite crude. Therefore, we have also implemented a scale space with a finer sampling of the scales. This computes the integral image on the image up-scaled by a factor of 2, and then starts the first octave by filtering with a filter of size 15. Additional filter sizes are 21, 27, 33, and 39. Then a second octave starts, again using filters which now increase their sizes by 12 pixels, after which a third and fourth octave follow. Now the scale change between the first two filters is only 1.4 (21/15). The lowest scale for the accurate version that can be detected through quadratic interpolation is $s = (1.2 \cdot \frac{18}{9})/2 = 1.2$.

As the Frobenius norm remains constant for our filters at any size, they are already scale normalised, and no further weighting of the filter response is required; for more information on that topic, see [22].

3.4. Interest point localisation

In order to localise interest points in the image and over scales, a non-maximum suppression in a 3 × 3 × 3 neighbourhood is applied. Specifically, we use a fast variant introduced by Neubeck and Van Gool [33]. The maxima of the determinant of the Hessian matrix are then interpolated in scale and image space with the method proposed by Brown and Lowe [5].
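To make the 3 × 3 × 3 test concrete, here is a straightforward Python sketch of non-maximum suppression over a stack of response maps. It is the naive exhaustive check, not the fast variant of Neubeck and Van Gool [33], and it omits the sub-pixel interpolation of Brown and Lowe [5]. The name `local_maxima_3d` and the `threshold` parameter are assumptions introduced for illustration.

```python
def local_maxima_3d(stack, threshold=0.0):
    """Return (scale, y, x) indices of strict maxima in a 3 x 3 x 3 neighbourhood.

    stack[s][y][x] holds the blob response at scale level s and pixel (x, y).
    """
    maxima = []
    n_scales, h, w = len(stack), len(stack[0]), len(stack[0][0])
    # The first and last layers, and the image border, only serve for comparison.
    for s in range(1, n_scales - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = stack[s][y][x]
                if v <= threshold:
                    continue
                neighbours = (
                    stack[s + ds][y + dy][x + dx]
                    for ds in (-1, 0, 1)
                    for dy in (-1, 0, 1)
                    for dx in (-1, 0, 1)
                    if (ds, dy, dx) != (0, 0, 0)
                )
                if all(v > n for n in neighbours):
                    maxima.append((s, y, x))
    return maxima
```

Each candidate is compared with its 26 neighbours, so the cost of this version grows with the full volume of the stack; the cited fast variant exists precisely to avoid most of these comparisons.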

Scale space interpolation is especially important in our case, as the difference in scale between the first layers of every octave is relatively large. Fig. 8 shows an example of the detected interest points using our ‘Fast-Hessian’ detector.

4. Interest point description and matching

Our descriptor describes the distribution of the intensity content within the interest point neighbourhood, similar

Fig. 5. Filters $D_{yy}$ (top) and $D_{xy}$ (bottom) for two successive scale levels (9 × 9 and 15 × 15). The length of the dark lobe can only be increased by an even number of pixels in order to guarantee the presence of a central pixel (top).

Fig. 6. Graphical representation of the filter side lengths for three different octaves. The logarithmic horizontal axis represents the scales. Note that the octaves are overlapping in order to cover all possible scales seamlessly.
