Fast SIFT Design for Real-Time Visual

更新时间:2023-06-07 11:37:47 阅读: 评论:0

Fast SIFT Design for Real-Time Visual
Feature Extraction
Liang-Chi Chiu,Tian-Sheuan Chang,Senior Member,IEEE,Jiun-Yen Chen,and Nelson Yen-Chung Chang
Abstract—Visual feature extraction with scale invariant feature transform(SIFT)is widely ud for object recognition. However,its real-time implementation suffers from long latency, heavy computation,and high memory storage becau of its frame level computation with iterated Gaussian blur operations. Thus,this paper propos a layer parallel SIFT(LPSIFT)with integral image,and its parallel hardware design with an on-the-fly feature extractionflow for real-time application needs.Com-pared with the original SIFT algorithm,the propod approach reduces the computational amount by90%and memory usage by95%.Thefinal implementation us580-K gate count with90-nm CMOS technology,and offers6000feature points/frame for VGA images at30frames/s and∼2000feature points/frame for 1920×1080images at30frames/s at the clock rate of100MHz.
Index Terms—Feature extraction,SIFT,VLSI design.
I.I NTRODUCTION
V ISUAL feature extraction is a key technology of com-puter vision for intelligent video processing.Among extraction techniques,scale invariant feature transform (SIFT)[1]is the most widely adopted approach that provides stable visual feature points for reliable object detection.SIFT detects and describes local feature points in images for object detection in different scenes or different angles.The fea-ture points describe the strength and direction of the object, which resists the frame distortion such as zooming,rotation, movement,perspective change,shading and noi.To provide the stable points,SIFT computation involves the image convolution with Gaussianfilters at different scales,and local maximum or minimum of the difference of Gaussians(DoG) blurred images at multiple scales.However,the real time implementation of this algorithm faces the challenges of heavy computation,large memory storage and long computational latency becau of its frame level computation with iterated Gaussian blur operations on images and the frame difference operations on blurred images for feature extraction.Thus,a fast algorithm and its VLSI design are demanded. Manuscript received November5,2012;revid February19,2013; accepted April19,2013.Date of publication April24,2013;date of current version June4,2013.This work was supported in part by the Industrial Technology Rearch Institute under Grant A301AR3710.The associate editor coordinating the review of this manuscript and approving it for publication was Prof.Mark H.-Y.Liao.
L.-C.Chiu is with PixelArt Technology,Hsinchu78229,Taiwan(e-mail: oboe.).
bisT.-S.Chang is with the Dept.Electronics Engineering,National Chiao Tung University,Hsinchu300,Taiwan(e-mail:u.edu.tw). J.Y.Chen and N.Y.  C.Chang are with the Industrial Technology Rearch Institute,Hsinchu31040,Taiwan(e-mail:itri990075@itri.tw; NelsonChang@itri.tw).
Color versions of one or more of thefigures in this paper are available online at ieeexplore.ieee.
Digital Object Identifier10.1109/TIP.2013.2259841
Previous implementations of SIFT can be classified into following categories:multi-core,GPU,or FPGA bad approaches.The multi-core[3],[4]or GPU approaches[5]–[8] ud massive parallel computing resource to speed up the com-putation.Works[9]–[12]bad on the FPGA platform ud parallel FPGA resources to speed up the Gaussian blurring in SIFT.However,the corresponding feature point calculation was still computed by the processor,which results in heavy data bus congestion and reduces the performance.Beyond above implementations,the fast algorithm like SURF[2]ud the integral image concept to speed up the Gaussian blur computation,which was follo
wed by veral works[13]–[15]. Integral image reduces the computation significantly.However, all the approaches still need iterations of Gaussian blur, which hinders their real time applications on larger frame size and increasing number of features points.
To solve above issues,this paper propod a layer parallel SIFT(LPSIFT)with integral image and its parallel hardware design for real time application needs.First,to avoid the long latency due to the frame level computation,we adopted the layer parallel restructured box kernel to replace iterated Gaussian blur operations.The computation of box kernel was further simplified by the integral image approach with reu of sub-kernel sum.With this,the latency of thefirst feature point was reduced to veral image lines(depending on the location of that feature point)from veral frames.For the keypoint localization,we simplified the complex low contrast analysis to be a low brightness test.For hardware design,we adopted the on-the-fly feature extractionflow so that only partial temporal results have to be stored.Furthermore,the costly inver square root and divider in keypoint localization were implemented by a low cost universal operation unit with precision equivalent cycles(PECs)to reduce the gate count. With above methods,SIFT can be implemented with low hardware cost for real time high definition videos.
The rest of the paper is organized as follows.In Section II, we will briefly describe the operation of SIill的反义词
FT.In Section III, we will prent the propod algorithm.Its corresponding hardware design is shown in Section IV.Then,in Section IV, we will show the implementation results and comparison. Finally,a conclusion will be made in Section V.
II.O VERVIEW OF SIFTnon woven fabric
Fig.1shows the overall algorithmflow of SIFT[1], which consists of scale space extrema detection,accurate keypoint localization,orientation assignment and the local image descriptor.
1057-7149/$31.00©2013IEEE
Fig.1.SIFT algorithm:(1)scale-space extrema detection,(2)accurate
keypoint localization,(3)orientation assignment,and (4)the local image descriptor.
The first step,scale space extrema detection,identifies keypoint candidates.It first convolves the image S k with Gaussian filters G F (x ,y ,k σ)at different scale k σto build a pyramid of Gaussian blurred images.In which,
S k =S (x ,y ,k σ)=G F (x ,y ,k σ)∗S k −1
(1)
where for k =1,S k −1=S 0=I (x ,y )is the input image,and
G F (x ,y ,k σ)= 12π(k σ)2 e −
(x 2+y 2)2(k σ)2.(2)The convolved images are grouped by octave (an octave corresponds to doubling the value of σ),and the value of k is lected for a fixed number of convolved images per octave.The frame at the highest scale per octave is downsampled by four as the input frame for next octave,as depicted in Fig.1.In this paper,we denote the frame in a pyramid of Gaussian blurred imag
es by its octave and scale,G [Octave ,Scale ].With this,the Gaussian pyramid can be defined as follows:G [0,0]=I (x ,y ):the original frame (3)G [0,1]=G F (x ,y ,σ)∗G [0,0](4)
bxlG [m ,k ]=G F (x ,y ,k σ)∗G [m ,k −1]
for m ,k ≥0(5)
where G [m ,0]is obtained by downsampling G [m −1,maximum k in a octave]by four.
Then,we can compute the DoG blurred images by
DoG [m ,k ]=G [m ,k +1]−G [m ,k ]
(6)
expresswaywith DoG,we can find the local maximum or minimum in a 3×3window as the keypoint candidates by comparing the pixels at neighboring scales.
Among the candidates,stable keypoints are localized by rejecting some bad keypoints in the keypoint localization step
with a low contrast and edge detection test.The low contrast test rejects the unstable extrema location,defined by
ˆx =−
∂2
DoG −1
wind万点
pbo∂x 2  ∂DoG ∂x  (7)with a low contrast threshold derived from the Taylor expan-sion of DoG [m ,k ]
DoG  ˆx  =DoG +
12 ∂DoG T
∂x
ˆx .(8)Edge detection is decided by
Tr (H )2Det (H )<(r +1)2r
(9)
where r is a coefficient to adapt the sharpness of object edge,and Tr (H )and Det (H )are the trace and determinant of 2×2Hessian matrix
H =    DoG x x DoG xy DoG xy DoG yy
(10)Tr (H )=DoG x x +DoG yy =α+β(11)
Det (H )=DoG x x DoG yy −DoG xy 2=αβ.
(12)
Bad on above test,if the candidate is verified to be a feature point,SIFT begins to calculate its orientation assignment.In the orientation assignment step,each keypoint is assigned one or more orientations bad on local image gradient directions to achieve invariance to rotation.In which,SIFT lects the largest vector,S -vector,to reprent the orientation assignment.The steps ensure invariance to image location,scale and rotation.Finally,in the local image detector step,a descriptor vector for each keypoint is computed such that it is highly distinctive and partially invariant to other variations.III.SIFT A NALYSIS AND P ROPOSED LPSIFT D ESIGN A.Design Analysis of SIFT
Table I shows the design analysis of SIFT for parameters [octave,scale]=[2,4].In this table,A is the nu
mber of pixels in a frame and its first level down sampled frame defined as A =W H +1
4W H =114
W H ,
(13)where W and H are width and height of the original frame.B reprents the number of feature points.
From this table,we can find that the scale-space extrema detection and the keypoint localization contribute to most of overall computation,and require a lot of memory to save inter-mediate data frames.In addition,the keypoint localization,orientation assignment,and the local frame descriptor calcu-lation u a lot of dividers to compute Taylor expansion and normalization.This caus hardware implementation difficult.Thus,how to solve the problems is an important issue for SIFT hardware design.
Another problem for SIFT hardware is its highly data depen-dent computational structure.In the data flow of SIFT,a new scale image has to wait for the completion of the previous scale image in the Gaussian pyramid.Thus,different scale images are computed quentially.This results in long latency and prohibits the following DoG computation in parallel that needs
ninja是什么意思Fig.2.Algorithmflow of the propod LPSIFT.
at least three scale images at the same time to calculate the candidates of features points.Thus,the intermediate scale images have to be stored until completing DoG computations, which results in large memory storage.
B.Propod LPSIFT Algorithm
Fig.2shows the propod LPSIFT algorithm,which is bad on SIFT with three major modifications:fast scale space extreme detection with layer parallel Gaussian pyramid and integral image,and simplified keypoint localization with a brightness threshold.
In thisflow,we propo a layer parallel algorithm to solve the data dependency problem,which not only calculate Gaussian pyramid and DoG pyramid at the same time but also calculate keypoint candidates simultaneously.This par-allel computation reduces storage and latency from multiple whole images to veral image rows.To assist this parallel computation while reduce the computational complexity,we u the integral image approach to create a box kernel with the respon similar to that of the Gaussianfilter so as to keep the feature matching performance as clo as that of SIFT but without complex Gaussianfilters.By adopting the integral image approach for multiple box kernels in each scale, we reduce computational complexity significantly and enable parallel computation for all scales.
Another problem is the high complexity of Taylor expan-sion.We propo a low brightness method instead of the low contrast method to reduce complexity.
1)Fast Scale Space Extreme Detection With Layer Parallel Gaussian Pyramid and Integral Image:Fig.3shows the concept of the layer parallel Gaussian pyramid.In the original Gaussian pyramid,each new pyramid level depends on
its
Fig.3.Concept of the layer parallel Gaussian pyramid. previous level.To eliminate such dependency,we can fully expand the pyramid level and merge the multiple kernels into one
G[m,k]=G F(x,y,kσ)∗G[m,k−1]
=(G F(x,y,kσ)∗G F(x,y,(k−1)σ)···
∗G F(x,y,σ))∗G[m,0]
=G F (x,y,k)∗G[m,0]
for positive integer m,and k.(14) With this independent Gaussianfiltering,we can compute all scale layers in parallel and generate the DoG images on demand.
However,the merged kernel will become larger with more kernels merged as shown in Fig.3,which will demand high computational complexity and storage.To solve this,we u the integral image concept that can compute this large kernel with constant time.By using integral image,no matter what the kernel size is,we need only four points of information and three additions,and reduce a lot of computation.With this simple computation,we can compute all scale layers in parallel without worry about the heavy complexity and storage due to layer parallel computation.
The integral image works best for the box kernel,but the box kernel results in inferior matching performance.Thus,we u restructured box kernels to approach the performance by the Gaussianfilter.Take thefirst scale5×5box kernel in Fig.4(c)as an example.We modify this simple kernel to be the kernel in Fig.4(d)to approach the performance of Gaussian filter.Its computation with integral image can be very simple as shown in Fig.4(b).In this paper,we lect the size of box kernels as5×5,7×7,and3×3to construct thefilters for different scales with appropriate modifications as the example illustrated in Fig.4.
2)Simplified Keypoint Localization With a Brightness Threshold:SIFT us Taylor expansion to exclude the low contrast candidates,but Taylor expansion needs complex com-putation.In practical,the low contrast candidates almost have low brightness,too.Therefore,low brightness candidates can be excluded according to this threshold
threshold new=(2precision of DoG−1)∗coefficient bright.
In which,Coefficient bright is a ratio of the gray level luminance in the DoG image with the default value0.04.This simplifi-cation has similar performance as that by Taylor expansion according to our simulation,but this method saves a lot of calculation and is more suitable for hardware computation.
C OMPUTATIONAL C OMPLEXITY AN
D M EMORY S TORAG
E A NALYSIS O
compound是什么意思
F SIFT
Memory (byte)
Multiplication
Division Addition Smoothing A 81A 080A Scale-space extrema 4A 2916A 02933A Keypoint localization 3A 59A 50A 132B Orientation assignment 0450B 225B 673B The local frame descriptor 0450B 225B 705B Total
8A
3056A +900B
50A +450B
3013A +1510
B
Fig.4.Mask for 5×5box kernel,its integral image reprentation of (a)original one (b)and modified one,and the corresponding pixel domain of (c)original one,(d)and modified one.
Table II shows the complexity analysis of our propod algorithm for parameters [octave,scale]=[2,4],with the comparison of VGA and full HD images in Table III.In Table II,C is the number of frame width in a group of down sample twice (C =W +(1/4)W =1(1/4)W ),and W is the width of the original frame.In the first three steps,computation with integral image only us additions to eliminate all mul-tiplications and data dependent computation.In the keypoint localization,divisions to compute Taylor expansion are also eliminated.Beyond above computation reduction,we only need a few rows of the integral image for the box kernel bad computation instead the whole integral image.In summary,th
e propod algorithm can reduce a lot of computation and memory usage,which are uful for the following hardware implementation.
IV.P ROPOSED A RCHITECTURE
Fig.5shows the propod architecture that consists of two stages:scale-space extrema detection and keypoint localization in the stage one,and orientation assignment and the local frame descriptor in the stage two.In the stage one,we adopt the propod LPSIFT with integral image to reduce the com-putation significantly,and further reduce the required睥睨什么意思
integral
Fig.5.Propod architecture
diagram.
Fig.6.Scheduling for a VGA image.
image buffer by the on-the-fly computation scheduling.In the stage two,we adopt the propod brightness threshold to simplify computation and implement the normalization in keypoint localization with a low cost universal operation unit with precision equivalent cycles (PEC).
Fig.6shows the propod scheduling for a VGA image,operated row by row.Our propod design specification is [octave,scale]=[2,4],and supports more than 1000feature points in one VGA frame with 30frames per cond at 100MHz.Each row spends 19cycles to calculate the initial information in the beginning.After that we find the feature point candidates of the stage one.If a candidate is a feature point,the stage two starts to compute the required information of that feature point.When the former step completes,oper-ation is back to the stage one.In summary,the computation time of the stage one is not fixed until a feature point is found,while the stage 2takes 520cycles.
C OMPUTATION C OMPLEXITY OF P ROPOSE
D A LGORITHM
Memory (byte)
Multiplication
Division Addition Smoothing 7C 0058A Integral image 80C 003A Scale-space extrema 89300383A Keypoint localization 2727A 012A Orientation assignment 256450B 225B 673B The local frame descriptor 0450B 225B 705B Total
87C +1176
27A +900B
450B
456A +1378B
TABLE III
C OMPARISON OF P ROPOSE
D AND SIFT A LGORITHM VGA Size Full HD Size
Mem.(byte)
Mul.Div.Add.Mem.(byte)Mul.Div.Add.SIFT    3.07E+06  1.17E+09  1.97E+07  1.16E+09  2.07E+077.92E+09  1.30E+087.81E+09Propod 7.08E+04  1.13E+07  4.50E+05  1.76E+08  2.10E+057.09E+07  4.50E+05  1.18E+09Saving (%)
97.7
99.0
97.7
84.8
99.0
99.1
99.7
84.9
Fig.7.Architecture of the stage one.
A.Stage One With On-the-Fly Computation Scheduling Fig.7shows the architecture of the stage one,which
applies the noi smoothing to the input image and builds the integral image for following filtering operations.With enough rows of integral image,we begin to compute scale images and corresponding DoG images to find feature point candidates at the same time as shown in Fig.8,called on-the-fly computation.In which,when we calculate the last frame in the first layer (G [0,3]),we will downsample this frame to the first frame of next layer (G [1,0])simultaneously.Therefore,we not only compute different scales in an octave at the same time but also compute different octaves at the same time.With this on-the-fly computation,we need only a few rows of buffer cost instead of multiple frames due to the Gaussian pyramid and DoG pyramid.
B.Architecture of the Stage Two
Fig.9shows the design of the keypoint localization after obtaining the DoG.We can compute the feature point can-didate by examining conditions such as the local maximum and minimum value,low brightness and edge detection.Since all the conditions are independent to each other,we can compute each condition simultaneously with a flag bit to denote if it matches the feature point criteria.If all flag bits are zeros,that indicates the candidate is approved to be a new feature
point.
Fig.8.Parallel computation of the Gaussian pyramid,and DoG pyramid to find feature
points.
Fig.9.Hardware architecture of the keypoint localization.
Fig.10shows the design of the orientation assignment.In this design,19×19pixels of integral image will be fetched again to re-compute 15×15information of scale 1since no scale image is stored.In this recomputation step,we will

本文发布于:2023-06-07 11:37:47,感谢您对本站的认可!

本文链接:https://www.wtabcd.cn/fanwen/fan/78/893547.html

版权声明:本站内容均来自互联网,仅供演示用,请勿用于商业和其他非法用途。如果侵犯了您的权益请与我们联系,我们将在24小时内删除。

标签:
相关文章
留言与评论(共有 0 条评论)
   
验证码:
推荐文章
排行榜
Copyright ©2019-2022 Comsenz Inc.Powered by © 专利检索| 网站地图