
Distinctive Image Features
from Scale-Invariant Keypoints
David G. Lowe
Computer Science Department
University of British Columbia
Vancouver, B.C., Canada
lowe@cs.ubc.ca
January 5, 2004
Abstract
This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination.
The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.

Accepted for publication in the International Journal of Computer Vision, 2004.
1 Introduction
Image matching is a fundamental aspect of many problems in computer vision, including object or scene recognition, solving for 3D structure from multiple images, stereo correspondence, and motion tracking. This paper describes image features that have many properties that make them suitable for matching differing images of an object or scene. The features are invariant to image scaling and rotation, and partially invariant to change in illumination and 3D camera viewpoint. They are well localized in both the spatial and frequency domains, reducing the probability of disruption by occlusion, clutter, or noise. Large numbers of features can be extracted from typical images with efficient algorithms. In addition, the features are highly distinctive, which allows a single feature to be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition.
The cost of extracting these features is minimized by taking a cascade filtering approach, in which the more expensive operations are applied only at locations that pass an initial test. Following are the major stages of computation used to generate the set of image features:
1. Scale-space extrema detection: The first stage of computation searches over all scales and image locations. It is implemented efficiently by using a difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation.
2. Keypoint localization: At each candidate location, a detailed model is fit to determine location and scale. Keypoints are selected based on measures of their stability.
3. Orientation assignment: One or more orientations are assigned to each keypoint location based on local image gradient directions. All future operations are performed on image data that has been transformed relative to the assigned orientation, scale, and location for each feature, thereby providing invariance to these transformations.
4. Keypoint descriptor: The local image gradients are measured at the selected scale in the region around each keypoint. These are transformed into a representation that allows for significant levels of local shape distortion and change in illumination.
This approach has been named the Scale Invariant Feature Transform (SIFT), as it transforms image data into scale-invariant coordinates relative to local features.
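As a concrete point of reference, the sketch below runs these four stages end-to-end through OpenCV's built-in SIFT implementation, assuming opencv-python 4.4 or later and a hypothetical input file "scene.png"; the resulting keypoints carry the location, scale, and orientation assigned in stages 1-3, and each descriptor row is the vector produced in stage 4.

```python
# Minimal sketch: running the four SIFT stages end-to-end via OpenCV's
# built-in implementation (assumes opencv-python >= 4.4 and a hypothetical
# input file "scene.png").
import cv2

image = cv2.imread("scene.png")                  # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # SIFT operates on intensity values

sift = cv2.SIFT_create()                         # stages 1-4 are run internally
keypoints, descriptors = sift.detectAndCompute(gray, None)

# Each keypoint carries location, scale, and orientation (stages 1-3);
# each row of `descriptors` is the feature vector from stage 4.
print(len(keypoints), descriptors.shape)
```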
An important aspect of this approach is that it generates large numbers of features that densely cover the image over the full range of scales and locations. A typical image of size 500×500 pixels will give rise to about 2000 stable features (although this number depends on both image content and choices for various parameters). The quantity of features is particularly important for object recognition, where the ability to detect small objects in cluttered backgrounds requires that at least 3 features be correctly matched from each object for reliable identification.
For image matching and recognition, SIFT features are first extracted from a set of reference images and stored in a database. A new image is matched by individually comparing each feature from the new image to this previous database and finding candidate matching features based on Euclidean distance of their feature vectors. This paper will discuss fast nearest-neighbor algorithms that can perform this computation rapidly against large databases.
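As a minimal illustration of this matching criterion, the sketch below performs the comparison by brute force with NumPy; the fast approximate nearest-neighbor search actually used is described in a later section, and the array names here are hypothetical.

```python
import numpy as np

def match_features(query_desc, db_desc):
    """Brute-force nearest-neighbor matching by Euclidean distance.

    query_desc: (M, 128) array of descriptors from the new image.
    db_desc:    (N, 128) array of descriptors from the reference database.
    Returns the index of the nearest database descriptor for each query
    descriptor, together with the corresponding Euclidean distance.
    """
    # Pairwise squared distances via |a - b|^2 = |a|^2 - 2 a.b + |b|^2.
    d2 = (np.sum(query_desc ** 2, axis=1, keepdims=True)
          - 2.0 * query_desc @ db_desc.T
          + np.sum(db_desc ** 2, axis=1))
    nearest = np.argmin(d2, axis=1)
    dist = np.sqrt(np.maximum(d2[np.arange(len(query_desc)), nearest], 0.0))
    return nearest, dist
```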
The keypoint descriptors are highly distinctive, which allows a single feature to find its correct match with good probability in a large database of features. However, in a cluttered image, many features from the background will not have any correct match in the database, giving rise to many false matches in addition to the correct ones. The correct matches can be filtered from the full set of matches by identifying subsets of keypoints that agree on the object and its location, scale, and orientation in the new image. The probability that several features will agree on these parameters by chance is much lower than the probability that any individual feature match will be in error. The determination of these consistent clusters can be performed rapidly by using an efficient hash table implementation of the generalized Hough transform.
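The following is a simplified sketch of that voting scheme, assuming hypothetical match records that carry each feature's location, scale, and orientation in both the model and the new image; the bin widths are illustrative, and the full implementation additionally votes into neighboring bins and uses the stored training geometry to predict the object location.

```python
import math
from collections import defaultdict

def hough_clusters(matches, ori_bin=30.0, scale_bin=2.0, loc_bin=64.0):
    """Coarse Hough voting in a hash table: each match votes for the pose
    (orientation change, scale ratio, translation) implied by its pair of
    keypoints, and clusters of agreeing votes are kept.

    `matches` is a list of dicts with hypothetical keys model_x, model_y,
    model_scale, model_ori, img_x, img_y, img_scale, img_ori.
    """
    votes = defaultdict(list)
    for m in matches:
        d_ori = (m["img_ori"] - m["model_ori"]) % 360.0   # orientation change
        s = m["img_scale"] / m["model_scale"]             # scale ratio
        # Implied translation of the model origin (simplified: the full method
        # also rotates the model offset by d_ori before predicting location).
        tx = m["img_x"] - s * m["model_x"]
        ty = m["img_y"] - s * m["model_y"]
        key = (int(d_ori // ori_bin),
               int(round(math.log(s, scale_bin))),
               int(tx // loc_bin),
               int(ty // loc_bin))
        votes[key].append(m)
    # Keep only clusters of 3 or more matches that agree on a pose.
    return [cluster for cluster in votes.values() if len(cluster) >= 3]
```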
Each cluster of 3 or more features that agree on an object and its pose is then subject to further detailed verification. First, a least-squares estimate is made for an affine approximation to the object pose. Any other image features consistent with this pose are identified, and outliers are discarded. Finally, a detailed computation is made of the probability that a particular set of features indicates the presence of an object, given the accuracy of fit and number of probable false matches. Object matches that pass all these tests can be identified as correct with high confidence.
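A minimal sketch of the least-squares step follows, assuming NumPy and at least three point correspondences; the parameterization is the standard linear system for an affine map, while the full verification additionally iterates the fit while discarding outliers and applies the probabilistic decision described later.

```python
import numpy as np

def fit_affine_pose(model_pts, image_pts):
    """Least-squares affine transform [u v]^T = M [x y]^T + t mapping model
    point locations to image locations (model_pts, image_pts: (N, 2), N >= 3)."""
    model_pts = np.asarray(model_pts, dtype=float)
    b = np.asarray(image_pts, dtype=float).reshape(-1)   # [u1, v1, u2, v2, ...]
    n = len(model_pts)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = model_pts   # u rows: m1*x + m2*y + tx
    A[0::2, 4] = 1.0
    A[1::2, 2:4] = model_pts   # v rows: m3*x + m4*y + ty
    A[1::2, 5] = 1.0
    params, *_ = np.linalg.lstsq(A, b, rcond=None)
    M = params[:4].reshape(2, 2)   # linear part of the affine map
    t = params[4:]                 # translation
    return M, t

def pose_residuals(M, t, model_pts, image_pts):
    """Distance between each projected model point and its matched image
    point; matches with large residuals would be discarded as outliers."""
    proj = np.asarray(model_pts, dtype=float) @ M.T + t
    return np.linalg.norm(proj - np.asarray(image_pts, dtype=float), axis=1)
```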
2 Related research
The development of image matching by using a set of local interest points can be traced back to the work of Moravec (1981) on stereo matching using a corner detector. The Moravec detector was improved by Harris and Stephens (1988) to make it more repeatable under small image variations and near edges. Harris also showed its value for efficient motion tracking and 3D structure from motion recovery (Harris, 1992), and the Harris corner detector has since been widely used for many other image matching tasks. While these feature detectors are usually called corner detectors, they are not selecting just corners, but rather any image location that has large gradients in all directions at a predetermined scale.
The initial applications were to stereo and short-range motion tracking, but the approach was later extended to more difficult problems. Zhang et al. (1995) showed that it was possible to match Harris corners over a large image range by using a correlation window around each corner to select likely matches. Outliers were then removed by solving for a fundamental matrix describing the geometric constraints between the two views of a rigid scene and removing matches that did not agree with the majority solution. At the same time, a similar approach was developed by Torr (1995) for long-range motion matching, in which geometric constraints were used to remove outliers for rigid objects moving within an image.
The ground-breaking work of Schmid and Mohr (1997) showed that invariant local feature matching could be extended to general image recognition problems in which a feature was matched against a large database of images. They also used Harris corners to select interest points, but rather than matching with a correlation window, they used a rotationally invariant descriptor of the local image region. This allowed features to be matched under arbitrary orientation change between the two images. Furthermore, they demonstrated that multiple feature matches could accomplish general recognition under occlusion and clutter by identifying consistent clusters of matched features.
The Harris corner detector is very sensitive to changes in image scale, so it does not provide a good basis for matching images of different sizes. Earlier work by the author (Lowe, 1999) extended the local feature approach to achieve scale invariance. This work also described a new local descriptor that provided more distinctive features while being less sensitive to local image distortions such as 3D viewpoint change. This current paper provides a more in-depth development and analysis of this earlier work, while also presenting a number of improvements in stability and feature invariance.
There is a considerable body of previous research on identifying representations that are stable under scale change. Some of the first work in this area was by Crowley and Parker (1984), who developed a representation that identified peaks and ridges in scale space and linked these into a tree structure. The tree structure could then be matched between images with arbitrary scale change. More recent work on graph-based matching by Shokoufandeh, Marsic and Dickinson (1999) provides more distinctive feature descriptors using wavelet coefficients. The problem of identifying an appropriate and consistent scale for feature detection has been studied in depth by Lindeberg (1993, 1994). He describes this as a problem of scale selection, and we make use of his results below.
Recently, there has been an impressive body of work on extending local features to be invariant to full affine transformations (Baumberg, 2000; Tuytelaars and Van Gool, 2000; Mikolajczyk and Schmid, 2002; Schaffalitzky and Zisserman, 2002; Brown and Lowe, 2002). This allows for invariant matching to features on a planar surface under changes in orthographic 3D projection, in most cases by resampling the image in a local affine frame. However, none of these approaches are yet fully affine invariant, as they start with initial feature scales and locations selected in a non-affine-invariant manner due to the prohibitive cost of exploring the full affine space. The affine frames are also more sensitive to noise than those of the scale-invariant features, so in practice the affine features have lower repeatability than the scale-invariant features unless the affine distortion is greater than about a 40 degree tilt of a planar surface (Mikolajczyk, 2002). Wider affine invariance may not be important for many applications, as training views are best taken at least every 30 degrees rotation in viewpoint (meaning that recognition is within 15 degrees of the closest training view) in order to capture non-planar changes and occlusion effects for 3D objects.
While the method to be presented in this paper is not fully affine invariant, a different approach is used in which the local descriptor allows relative feature positions to shift significantly with only small changes in the descriptor. This approach not only allows the descriptors to be reliably matched across a considerable range of affine distortion, but it also makes the features more robust against changes in 3D viewpoint for non-planar surfaces. Other advantages include much more efficient feature extraction and the ability to identify larger numbers of features. On the other hand, affine invariance is a valuable property for matching planar surfaces under very large view changes, and further research should be performed on the best ways to combine this with non-planar 3D viewpoint invariance in an efficient and stable manner.
Many other feature types have been proposed for use in recognition, some of which could be used in addition to the features described in this paper to provide further matches under differing circumstances. One class of features are those that make use of image contours or region boundaries, which should make them less likely to be disrupted by cluttered backgrounds near object boundaries. Matas et al. (2002) have shown that their maximally-stable extremal regions can produce large numbers of matching features with good stability. Mikolajczyk et al. (2003) have developed a new descriptor that uses local edges while ignoring unrelated nearby edges, providing the ability to find stable features even near the boundaries of narrow shapes superimposed on background clutter. Nelson and Selinger (1998) have shown good results with local features based on groupings of image contours. Similarly, Pope and Lowe (2000) used features based on the hierarchical grouping of image contours, which are particularly useful for objects lacking detailed texture.
The history of research on visual recognition contains work on a diverse set of other image properties that can be used as feature measurements. Carneiro and Jepson (2002) describe phase-based local features that represent the phase rather than the magnitude of local spatial frequencies, which is likely to provide improved invariance to illumination. Schiele and Crowley (2000) have proposed the use of multidimensional histograms summarizing the distribution of measurements within image regions. This type of feature may be particularly useful for recognition of textured objects with deformable shapes. Basri and Jacobs (1997) have demonstrated the value of extracting local region boundaries for recognition. Other useful properties to incorporate include color, motion, figure-ground discrimination, region shape descriptors, and stereo depth cues. The local feature approach can easily incorporate novel feature types because extra features contribute to robustness when they provide correct matches, but otherwise do little harm other than their cost of computation. Therefore, future systems are likely to combine many feature types.
3 Detection of scale-space extrema
As described in the introduction, we will detect keypoints using a cascade filtering approach that uses efficient algorithms to identify candidate locations that are then examined in further detail. The first stage of keypoint detection is to identify locations and scales that can be repeatably assigned under differing views of the same object. Detecting locations that are invariant to scale change of the image can be accomplished by searching for stable features across all possible scales, using a continuous function of scale known as scale space (Witkin, 1983).
It has been shown by Koenderink (1984) and Lindeberg (1994) that under a variety of reasonable assumptions the only possible scale-space kernel is the Gaussian function. Therefore, the scale space of an image is defined as a function, L(x, y, σ), that is produced from the convolution of a variable-scale Gaussian, G(x, y, σ), with an input image, I(x, y):
L(x, y, σ) = G(x, y, σ) ∗ I(x, y),
where ∗ is the convolution operation in x and y, and
G(x, y, σ) = (1 / 2πσ²) e^(−(x² + y²) / 2σ²).
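As a small illustration, the sketch below builds the kernel G(x, y, σ) explicitly and computes L(x, y, σ) with SciPy's gaussian_filter, which applies the same (separable, truncated) convolution; the function names are ours, not the paper's.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_kernel(sigma, radius):
    """Explicit 2-D Gaussian G(x, y, sigma) sampled on a (2*radius+1)^2 grid."""
    ax = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(ax, ax)
    return np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)

def scale_space_level(image, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y); gaussian_filter applies the
    same convolution separably, with the kernel truncated at a few sigma."""
    return gaussian_filter(np.asarray(image, dtype=float), sigma)
```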
To efficiently detect stable keypoint locations in scale space, we have proposed (Lowe, 1999) using scale-space extrema in the difference-of-Gaussian function convolved with the image, D(x, y, σ), which can be computed from the difference of two nearby scales separated by a constant multiplicative factor k:
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ I(x, y)
           = L(x, y, kσ) − L(x, y, σ).    (1)

There are a number of reasons for choosing this function. First, it is a particularly efficient function to compute, as the smoothed images, L, need to be computed in any case for scale space feature description, and D can therefore be computed by simple image subtraction.
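The sketch below computes a stack of D(x, y, σ) images in exactly this way, by subtracting Gaussian-smoothed images whose scales differ by the constant factor k; the particular values of σ, k, and the number of levels are placeholders, with the paper's actual choices discussed in the sections that follow.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def difference_of_gaussians(image, sigma0=1.6, k=2 ** 0.5, levels=5):
    """Stack of D(x, y, sigma) images for one octave: successive Gaussian
    smoothings whose scales differ by the constant factor k, subtracted
    pairwise (placeholder values for sigma0, k, and levels)."""
    img = np.asarray(image, dtype=float)
    L = [gaussian_filter(img, sigma0 * k ** i) for i in range(levels)]
    return [L[i + 1] - L[i] for i in range(levels - 1)]
```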
