
Distinctive Image Features
from Scale-Invariant Keypoints
David G. Lowe
Computer Science Department
University of British Columbia
Vancouver, B.C., Canada
lowe@cs.ubc.ca
January 5, 2004
Abstract
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination.
The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

Accepted for publication in the International Journal of Computer Vision, 2004.
1 Introduction
Image matching is a fundamental aspect of many problems in computer vision, including object or scene recognition, solving for 3D structure from multiple images, stereo correspondence, and motion tracking. This paper describes image features that have many properties that make them suitable for matching differing images of an object or scene. The features are invariant to image scaling and rotation, and partially invariant to change in illumination and 3D camera viewpoint. They are well localized in both the spatial and frequency domains, reducing the probability of disruption by occlusion, clutter, or noise. Large numbers of features can be extracted from typical images with efficient algorithms. In addition, the features are highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition.
The cost of extracting these features is minimized by taking a cascade filtering approach, in which the more expensive operations are applied only at locations that pass an initial test. Following are the major stages of computation used to generate the set of image features:
1. Scale-space extrema detection: The first stage of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation.
2. Keypoint localization: At each candidate location, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability.
3. Orientation assignment: One or more orientations are assigned to each keypoint location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.
4. Keypoint descriptor: The local image gradients are measured at the selected scale in the region around each keypoint. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination.
This approach has been named the Scale Invariant Feature Transform (SIFT), as it transforms image data into scale-invariant coordinates relative to local features.
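As a concrete point of reference, the sketch below runs these four stages end-to-end through OpenCV's built-in SIFT implementation, assuming opencv-python 4.4 or later and a hypothetical input file "scene.png"; the resulting keypoints carry the location, scale, and orientation assigned in stages 1-3, and each descriptor row is the vector produced in stage 4.

```python
# Minimal sketch: running the four SIFT stages end-to-end via OpenCV's
# built-in implementation (assumes opencv-python >= 4.4 and a hypothetical
# input file "scene.png").
import cv2

image = cv2.imread("scene.png")                  # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # SIFT operates on intensity values

sift = cv2.SIFT_create()                         # stages 1-4 are run internally
keypoints, descriptors = sift.detectAndCompute(gray, None)

# Each keypoint carries location, scale, and orientation (stages 1-3);
# each row of `descriptors` is the feature vector from stage 4.
print(len(keypoints), descriptors.shape)
```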
An important aspect of this approach is that it generates large numbers of features that densely cover the image over the full range of scales and locations. A typical image of size 500×500 pixels will give rise to about 2000 stable features (although this number depends on both image content and choices for various parameters). The quantity of features is particularly important for object recognition, where the ability to detect small objects in cluttered backgrounds requires that at least 3 features be correctly matched from each object for reliable identification.
For image matching and recognition, SIFT features are first extracted from a set of reference images and stored in a database. A new image is matched by individually comparing each feature from the new image to this previous database and finding candidate matching features based on Euclidean distance of their feature vectors. This paper will discuss fast nearest-neighbor algorithms that can perform this computation rapidly against large databases.
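As a minimal illustration of this matching criterion, the sketch below performs the comparison by brute force with NumPy; the fast approximate nearest-neighbor search actually used is described in a later section, and the array names here are hypothetical.

```python
import numpy as np

def match_features(query_desc, db_desc):
    """Brute-force nearest-neighbor matching by Euclidean distance.

    query_desc: (M, 128) array of descriptors from the new image.
    db_desc:    (N, 128) array of descriptors from the reference database.
    Returns the index of the nearest database descriptor for each query
    descriptor, together with the corresponding Euclidean distance.
    """
    # Pairwise squared distances via |a - b|^2 = |a|^2 - 2 a.b + |b|^2.
    d2 = (np.sum(query_desc ** 2, axis=1, keepdims=True)
          - 2.0 * query_desc @ db_desc.T
          + np.sum(db_desc ** 2, axis=1))
    nearest = np.argmin(d2, axis=1)
    dist = np.sqrt(np.maximum(d2[np.arange(len(query_desc)), nearest], 0.0))
    return nearest, dist
```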
The keypoint descriptors are highly distinctive, which allows a single feature to find its correct match with good probability in a large database of features. However, in a cluttered image, many features from the background will not have any correct match in the database, giving rise to many false matches in addition to the correct ones. The correct matches can be filtered from the full set of matches by identifying subsets of keypoints that agree on the object and its location, scale, and orientation in the new image. The probability that several features will agree on these parameters by chance is much lower than the probability that any individual feature match will be in error. The determination of these consistent clusters can be performed rapidly by using an efficient hash table implementation of the generalized Hough transform.
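The following is a simplified sketch of that voting scheme, assuming hypothetical match records that carry each feature's location, scale, and orientation in both the model and the new image; the bin widths are illustrative, and the full implementation additionally votes into neighboring bins and uses the stored training geometry to predict the object location.

```python
import math
from collections import defaultdict

def hough_clusters(matches, ori_bin=30.0, scale_bin=2.0, loc_bin=64.0):
    """Coarse Hough voting in a hash table: each match votes for the pose
    (orientation change, scale ratio, translation) implied by its pair of
    keypoints, and clusters of agreeing votes are kept.

    `matches` is a list of dicts with hypothetical keys model_x, model_y,
    model_scale, model_ori, img_x, img_y, img_scale, img_ori.
    """
    votes = defaultdict(list)
    for m in matches:
        d_ori = (m["img_ori"] - m["model_ori"]) % 360.0   # orientation change
        s = m["img_scale"] / m["model_scale"]             # scale ratio
        # Implied translation of the model origin (simplified: the full method
        # also rotates the model offset by d_ori before predicting location).
        tx = m["img_x"] - s * m["model_x"]
        ty = m["img_y"] - s * m["model_y"]
        key = (int(d_ori // ori_bin),
               int(round(math.log(s, scale_bin))),
               int(tx // loc_bin),
               int(ty // loc_bin))
        votes[key].append(m)
    # Keep only clusters of 3 or more matches that agree on a pose.
    return [cluster for cluster in votes.values() if len(cluster) >= 3]
```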
Each cluster of 3 or more features that agree on an object and its pose is then subject to further detailed verification. First, a least-squares estimate is made for an affine approximation to the object pose. Any other image features consistent with this pose are identified, and outliers are discarded. Finally, a detailed computation is made of the probability that a particular set of features indicates the presence of an object, given the accuracy of fit and number of probable false matches. Object matches that pass all these tests can be identified as correct with high confidence.
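A minimal sketch of the least-squares step follows, assuming NumPy and at least three point correspondences; the parameterization is the standard linear system for an affine map, while the full verification additionally iterates the fit while discarding outliers and applies the probabilistic decision described later.

```python
import numpy as np

def fit_affine_pose(model_pts, image_pts):
    """Least-squares affine transform [u v]^T = M [x y]^T + t mapping model
    point locations to image locations (model_pts, image_pts: (N, 2), N >= 3)."""
    model_pts = np.asarray(model_pts, dtype=float)
    b = np.asarray(image_pts, dtype=float).reshape(-1)   # [u1, v1, u2, v2, ...]
    n = len(model_pts)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = model_pts   # u rows: m1*x + m2*y + tx
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = model_pts   # v rows: m3*x + m4*y + ty
    A[1::2, 5] = 1.0
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    M = params[:4].reshape(2, 2)   # linear part of the affine map
    t = params[4:]                 # translation
    return M, t

def pose_residuals(M, t, model_pts, image_pts):
    """Distance between each projected model point and its matched image
    point; matches with large residuals would be discarded as outliers."""
    proj = np.asarray(model_pts, dtype=float) @ M.T + t
    return np.linalg.norm(proj - np.asarray(image_pts, dtype=float), axis=1)
```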
2 Related research
The development of image matching by using a set of local interest points can be traced back to the work of Moravec (1981) on stereo matching using a corner detector. The Moravec detector was improved by Harris and Stephens (1988) to make it more repeatable under small image variations and near edges. Harris also showed its value for efficient motion tracking and 3D structure from motion recovery (Harris, 1992), and the Harris corner detector has since been widely used for many other image matching tasks. While these feature detectors are usually called corner detectors, they are not selecting just corners, but rather any image location that has large gradients in all directions at a predetermined scale.
The initial applications were to stereo and short-range motion tracking, but the approach was later extended to more difficult problems. Zhang et al. (1995) showed that it was possible to match Harris corners over a large image range by using a correlation window around each corner to select likely matches. Outliers were then removed by solving for a fundamental matrix describing the geometric constraints between the two views of a rigid scene and removing matches that did not agree with the majority solution. At the same time, a similar approach was developed by Torr (1995) for long-range motion matching, in which geometric constraints were used to remove outliers for rigid objects moving within an image.
The ground-breaking work of Schmid and Mohr (1997) showed that invariant local feature matching could be extended to general image recognition problems in which a feature was matched against a large database of images. They also used Harris corners to select interest points, but rather than matching with a correlation window, they used a rotationally invariant descriptor of the local image region. This allowed features to be matched under arbitrary orientation change between the two images. Furthermore, they demonstrated that multiple feature matches could accomplish general recognition under occlusion and clutter by identifying consistent clusters of matched features.
The Harris corner detector is very sensitive to changes in image scale, so it does not provide a good basis for matching images of different sizes. Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance. This work also described a new local descriptor that provided more distinctive features while being less sensitive to local image distortions such as 3D viewpoint change. This current paper provides a more in-depth development and analysis of this earlier work, while also presenting a number of improvements in stability and feature invariance.
There is a considerable body of previous research on identifying representations that are stable under scale change. Some of the first work in this area was by Crowley and Parker (1984), who developed a representation that identified peaks and ridges in scale space and linked these into a tree structure. The tree structure could then be matched between images with arbitrary scale change. More recent work on graph-based matching by Shokoufandeh, Marsic and Dickinson (1999) provides more distinctive feature descriptors using wavelet coefficients. The problem of identifying an appropriate and consistent scale for feature detection has been studied in depth by Lindeberg (1993, 1994). He describes this as a problem of scale selection, and we make use of his results below.
Recently, there has been an impressive body of work on extending local features to be invariant to full affine transformations (Baumberg, 2000; Tuytelaars and Van Gool, 2000; Mikolajczyk and Schmid, 2002; Schaffalitzky and Zisserman, 2002; Brown and Lowe, 2002). This allows for invariant matching to features on a planar surface under changes in orthographic 3D projection, in most cases by resampling the image in a local affine frame. However, none of these approaches are yet fully affine invariant, as they start with initial feature scales and locations selected in a non-affine-invariant manner due to the prohibitive cost of exploring the full affine space. The affine frames are also more sensitive to noise than those of the scale-invariant features, so in practice the affine features have lower repeatability than the scale-invariant features unless the affine distortion is greater than about a 40 degree tilt of a planar surface (Mikolajczyk, 2002). Wider affine invariance may not be important for many applications, as training views are best taken at least every 30 degrees rotation in viewpoint (meaning that recognition is within 15 degrees of the closest training view) in order to capture non-planar changes and occlusion effects for 3D objects.
While the method to be presented in this paper is not fully affine invariant, a different approach is used in which the local descriptor allows relative feature positions to shift significantly with only small changes in the descriptor. This approach not only allows the descriptors to be reliably matched across a considerable range of affine distortion, but it also makes the features more robust against changes in 3D viewpoint for non-planar surfaces. Other advantages include much more efficient feature extraction and the ability to identify larger numbers of features. On the other hand, affine invariance is a valuable property for matching planar surfaces under very large view changes, and further research should be performed on the best ways to combine this with non-planar 3D viewpoint invariance in an efficient and stable manner.
Many other feature types have been proposed for use in recognition, some of which could be used in addition to the features described in this paper to provide further matches under differing circumstances. One class of features are those that make use of image contours or region boundaries, which should make them less likely to be disrupted by cluttered backgrounds near object boundaries. Matas et al. (2002) have shown that their maximally-stable extremal regions can produce large numbers of matching features with good stability. Mikolajczyk et al. (2003) have developed a new descriptor that uses local edges while ignoring unrelated nearby edges, providing the ability to find stable features even near the boundaries of narrow shapes superimposed on background clutter. Nelson and Selinger (1998) have shown good results with local features based on groupings of image contours. Similarly, Pope and Lowe (2000) used features based on the hierarchical grouping of image contours, which are particularly useful for objects lacking detailed texture.
The history of research on visual recognition contains work on a diverse set of other image properties that can be used as feature measurements. Carneiro and Jepson (2002) describe phase-based local features that represent the phase rather than the magnitude of local spatial frequencies, which is likely to provide improved invariance to illumination. Schiele and Crowley (2000) have proposed the use of multidimensional histograms summarizing the distribution of measurements within image regions. This type of feature may be particularly useful for recognition of textured objects with deformable shapes. Basri and Jacobs (1997) have demonstrated the value of extracting local region boundaries for recognition. Other useful properties to incorporate include color, motion, figure-ground discrimination, region shape descriptors, and stereo depth cues. The local feature approach can easily incorporate novel feature types because extra features contribute to robustness when they provide correct matches, but otherwise do little harm other than their cost of computation. Therefore, future systems are likely to combine many feature types.
3 Detection of scale-space extrema
As described in the introduction, we will detect keypoints using a cascade filtering approach that uses efficient algorithms to identify candidate locations that are then examined in further detail. The first stage of keypoint detection is to identify locations and scales that can be repeatably assigned under differing views of the same object. Detecting locations that are invariant to scale change of the image can be accomplished by searching for stable features across all possible scales, using a continuous function of scale known as scale space (Witkin, 1983).
It has been shown by Koenderink (1984) and Lindeberg (1994) that under a variety of reasonable assumptions the only possible scale-space kernel is the Gaussian function. Therefore, the scale space of an image is defined as a function, L(x, y, σ), that is produced from the convolution of a variable-scale Gaussian, G(x, y, σ), with an input image, I(x, y):
L(x, y, σ) = G(x, y, σ) ∗ I(x, y),
where ∗ is the convolution operation in x and y, and
G(x, y, σ) = (1 / 2πσ²) e^(−(x² + y²) / 2σ²).
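As a small illustration, the sketch below builds the kernel G(x, y, σ) explicitly and computes L(x, y, σ) with SciPy's gaussian_filter, which applies the same (separable, truncated) convolution; the function names are ours, not the paper's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_kernel(sigma, radius):
    """Explicit 2-D Gaussian G(x, y, sigma) sampled on a (2*radius+1)^2 grid."""
    ax = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(ax, ax)
    return np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)

def scale_space_level(image, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y); gaussian_filter applies the
    same convolution separably, with the kernel truncated at a few sigma."""
    return gaussian_filter(np.asarray(image, dtype=float), sigma)
```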
To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scale-space extrema in the difference-of-Gaussian function convolved with the image, D(x, y, σ), which can be computed from the difference of two nearby scales separated by a constant multiplicative factor k:
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I(x, y)
           = L(x, y, kσ) − L(x, y, σ).    (1)

There are a number of reasons for choosing this function. First, it is a particularly efficient function to compute, as the smoothed images, L, need to be computed in any case for scale space feature description, and D can therefore be computed by simple image subtraction.
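The sketch below computes a stack of D(x, y, σ) images in exactly this way, by subtracting Gaussian-smoothed images whose scales differ by the constant factor k; the particular values of σ, k, and the number of levels are placeholders, with the paper's actual choices discussed in the sections that follow.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(image, sigma0=1.6, k=2 ** 0.5, levels=5):
    """Stack of D(x, y, sigma) images for one octave: successive Gaussian
    smoothings whose scales differ by the constant factor k, subtracted
    pairwise (placeholder values for sigma0, k, and levels)."""
    img = np.asarray(image, dtype=float)
    L = [gaussian_filter(img, sigma0 * k ** i) for i in range(levels)]
    return [L[i + 1] - L[i] for i in range(levels - 1)]
```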
