Fast SIFT Design for Real-Time Visual

更新时间:2023-05-28 07:17:07 阅读：评论：0

瞄的组词Fast SIFT Design for Real-Time Visual

Feature Extraction

Liang-Chi Chiu,Tian-Sheuan Chang,Senior Member,IEEE,Jiun-Yen Chen,and Nelson Yen-Chung Chang

Abstract—Visual feature extraction with scale invariant feature transform(SIFT)is widely ud for object recognition. However,its real-time implementation suffers from long latency, heavy computation,and high memory storage becau of its frame level computation with iterated Gaussian blur operations. Thus,this paper propos a layer parallel SIFT(LPSIFT)with integral image,and its parallel hardware design with an on-the-ﬂy feature extractionﬂow for real-time application needs.Com-pared with the original SIFT algorithm,the propod approach reduces the computational amount by90%and memory usage by95%.Theﬁnal implementation us580-K gate count with90-nm CMOS technology,and offers6000feature points/frame for VGA images at30frames/s and∼2000feature points/frame for 1920×1080images at30frames/s at the clock rate of100MHz.

Index Terms—Feature extraction,SIFT,VLSI design.

I.I NTRODUCTION

V ISUAL feature extraction is a key technology of com-puter vision for intelligent video processing.Among extraction techniques,scale invariant feature transform (SIFT)[1]is the most widely adopted approach that provides stable visual feature points for reliable object detection.SIFT detects and describes local feature points in images for object detection in different scenes or different angles.The fea-ture points describe the strength and direction of the object, which resists the frame distortion such as zooming,rotation, movement,perspective change,shading and noi.To provide the stable points,SIFT computation involves the image convolution with Gaussianﬁlters at different scales,and local maximum or minimum of the difference of Gaussians(DoG) blurred images at multiple scales.However,the real time implementation of this algorithm faces the challenges of heavy computation,large memory storage and long computational latency becau of its frame level computation with iterated Gaussian blur operations on images and the frame difference operations on blurred images for feature extraction.Thus,a fast algorithm and its VLSI design are demanded. Manuscript received November5,2012;revid February19,2013; accepted April19,2013.Date of publication April24,2013;date of current version June4,2013.This work was supported in part by the Industrial Technology Rearch Institute under Grant A301AR3710.The associate editor coordinating the review of this manuscript and approving it for publication was Prof.Mark H.-Y.Liao.

苹果id登陆不了L.-C.Chiu is with PixelArt Technology,Hsinchu78229,Taiwan(e-mail: oboe.).

T.-S.Chang is with the Dept.Electronics Engineering,National Chiao Tung University,Hsinchu300,Taiwan(e-mail:u.edu.tw). J.Y.Chen and N.Y. C.Chang are with the Industrial Technology Rearch Institute,Hsinchu31040,Taiwan(e-mail:w; w).

Color versions of one or more of theﬁgures in this paper are available online at

Digital Object Identiﬁer10.1109/TIP.2013.2259841

宁波石浦

收拢的反义词Previous implementations of SIFT can be classiﬁed into following categories:multi-core,GPU,or FPGA bad approaches.The multi-core[3],[4]or GPU approaches[5]–[8] ud massive parallel computing resource to speed up the com-putation.Works[9]–[12]bad on the FPGA platform ud parallel FPGA resources to speed up the Gaussian blurring in SIFT.However,the corresponding feature point calculation was still computed by the processor,which results in heavy data bus congestion and reduces the performance.Beyond above implementations,the fast algorithm like SURF[2]ud the integral image concept to speed up the Gaussian blur computation,which was followed by veral works[13]–[15]. Integral image reduces the computation signiﬁcantly.However, all

the approaches still need iterations of Gaussian blur, which hinders their real time applications on larger frame size and increasing number of features points.

To solve above issues,this paper propod a layer parallel SIFT(LPSIFT)with integral image and its parallel hardware design for real time application needs.First,to avoid the long latency due to the frame level computation,we adopted the layer parallel restructured box kernel to replace iterated Gaussian blur operations.The computation of box kernel was further simpliﬁed by the integral image approach with reu of sub-kernel sum.With this,the latency of theﬁrst feature point was reduced to veral image lines(depending on the location of that feature point)from veral frames.For the keypoint localization,we simpliﬁed the complex low contrast analysis to be a low brightness test.For hardware design,we adopted the on-the-ﬂy feature extractionﬂow so that only partial temporal results have to be stored.Furthermore,the costly inver square root and divider in keypoint localization were implemented by a low cost universal operation unit with precision equivalent cycles(PECs)to reduce the gate count. With above methods,SIFT can be implemented with low hardware cost for real time high deﬁnition videos.

The rest of the paper is organized as follows.In Section II, we will brieﬂy describe the operation of SIFT.In Section III, we will prent the propod algorithm.Its corresponding hardware design is sho

wn in Section IV.Then,in Section IV, we will show the implementation results and comparison. Finally,a conclusion will be made in Section V.

II.O VERVIEW OF SIFT

Fig.1shows the overall algorithmﬂow of SIFT[1], which consists of scale space extrema detection,accurate keypoint localization,orientation assignment and the local image descriptor.

Fig.1.SIFT algorithm:(1)scale-space extrema detection,(2)accurate

keypoint localization,(3)orientation assignment,and (4)the local image descriptor.

The ﬁrst step,scale space extrema detection,identiﬁes keypoint candidates.It ﬁrst convolves the image S k with Gaussian ﬁlters G F (x ,y ,k σ)at different scale k σto build a pyramid of Gaussian blurred images.In which,

S k =S (x ,y ,k σ)=G F (x ,y ,k σ)∗S k −1

(1)

where for k =1,S k −1=S 0=I (x ,y )is the input image,and

G F (x ,y ,k σ)= 12π(k σ)2 e −

(x 2+y 2)2(k σ)2.(2)The convolved images are grouped by octave (an octave corresponds to doubling the value of σ),and the value of k is lected for a ﬁxed number of convolved images per octave.The frame at the highest scale per octave is downsampled by four as the input frame for next octave,as depicted in Fig.1.In this paper,we denote the frame in a pyramid of Gaussian blurred imag

es by its octave and scale,G [Octave ,Scale ].With this,the Gaussian pyramid can be deﬁned as follows:G [0,0]=I (x ,y ):the original frame (3)G [0,1]=G F (x ,y ,σ)∗G [0,0](4)发音训练方法

G [m ,k ]=G F (x ,y ,k σ)∗G [m ,k −1]

for m ,k ≥0(5)

where G [m ,0]is obtained by downsampling G [m −1,maximum k in a octave]by four.

Then,we can compute the DoG blurred images by

DoG [m ,k ]=G [m ,k +1]−G [m ,k ]

(6)

with DoG,we can ﬁnd the local maximum or minimum in a 3×3window as the keypoint candidates by comparing the pixels at neighboring scales.

Among the candidates,stable keypoints are localized by rejecting some bad keypoints in the keypoint localization step

with a low contrast and edge detection test.The low contrast test rejects the unstable extrema location,deﬁned by

ˆx =−

∂2

DoG −1

∂x 2 ∂DoG ∂x (7)with a low contrast threshold derived from the Taylor expan-sion of DoG [m ,k ]

DoG ˆx =DoG +

12 ∂DoG T

∂x

ˆx .(8)Edge detection is decided by

Tr (H )2Det (H )<(r +1)2r

(9)

where r is a coefﬁcient to adapt the sharpness of object edge,and Tr (H )and Det (H )are the trace and determinant of 2×2Hessian matrix

H = DoG x x DoG xy DoG xy DoG yy

(10)Tr (H )=DoG x x +DoG yy =α+β(11)

Det (H )=DoG x x DoG yy −DoG xy 2=αβ.

(12)

Bad on above test,if the candidate is veriﬁed to be a feature point,SIFT begins to calculate its orientation assignment.In the orientation assignment step,each keypoint is assigned one or more orientations bad on local image gradient directions to achieve invariance to rotation.In which,SIFT lects the largest vector,S -vector,to reprent the orientation assignment.The steps ensure invariance to image location,scale and rotation.Finally,in the local image detector step,a descriptor vector for each keypoint is computed such that it is highly distinctive and partially invariant to other variations.III.SIFT A NALYSIS AND P ROPOSED LPSIFT D ESIGN A.Design Analysis of SIFT

Table I shows the design analysis of SIFT for parameters [octave,scale]=[2,4].In this table,A is the nu

mber of pixels in a frame and its ﬁrst level down sampled frame deﬁned as A =W H +1

4W H =114

W H ,

(13)where W and H are width and height of the original frame.B reprents the number of feature points.

From this table,we can ﬁnd that the scale-space extrema detection and the keypoint localization contribute to most of overall computation,and require a lot of memory to save inter-mediate data frames.In addition,the keypoint localization,orientation assignment,and the local frame descriptor calcu-lation u a lot of dividers to compute Taylor expansion and normalization.This caus hardware implementation difﬁcult.Thus,how to solve the problems is an important issue for SIFT hardware design.

Another problem for SIFT hardware is its highly data depen-dent computational structure.In the data ﬂow of SIFT,a new scale image has to wait for the completion of the previous scale image in the Gaussian pyramid.Thus,different scale images are computed quentially.This results in long latency and prohibits the following DoG computation in parallel that needs

Fig.2.Algorithmﬂow of the propod LPSIFT.

at least three scale images at the same time to calculate the candidates of features points.Thus,the intermediate scale images have to be stored until completing DoG computations, which results in large memory storage.

B.Propod LPSIFT Algorithm

Fig.2shows the propod LPSIFT algorithm,which is bad on SIFT with three major modiﬁcations:fast scale space extreme detection with layer parallel Gaussian pyramid and integral image,and simpliﬁed keypoint localization with a brightness threshold.

In thisﬂow,we propo a layer parallel algorithm to solve the data dependency problem,which not only calculate Gaussian pyramid and DoG pyramid at the same time but also calculate keypoint candidates simultaneously.This par-allel computation reduces storage and latency from multiple whole images to veral image rows.To assist this parallel computation while reduce the computational complexity,we u the integral image approach to create a box kernel with the respon similar to that of the Gaussianﬁlter so as to keep the feature matching performance as clo as that of SIFT but without complex Gaussianﬁlters.By adopting the integral image approach for multiple box kernels in each scale, we reduce computational complexity signiﬁcantly and enable parallel computation for all scales.

Another problem is the high complexity of Taylor expan-sion.We propo a low brightness method instead of the low contrast method to reduce complexity.

1)Fast Scale Space Extreme Detection With Layer Parallel Gaussian Pyramid and Integral Image:Fig.3shows the concept of the layer parallel Gaussian pyramid.In the original Gaussian pyramid,each new pyramid level depends on

郝经

its

Fig.3.Concept of the layer parallel Gaussian pyramid. previous level.To eliminate such dependency,we can fully expand the pyramid level and merge the multiple kernels into one

G[m,k]=G F(x,y,kσ)∗G[m,k−1]

=(G F(x,y,kσ)∗G F(x,y,(k−1)σ)···

∗G F(x,y,σ))∗G[m,0]

=G F (x,y,k)∗G[m,0]

for positive integer m,and k.(14) With this independent Gaussianﬁltering,we can compute all scale layers in parallel and generate the DoG images on demand.

However,the merged kernel will become larger with more kernels merged as shown in Fig.3,which will demand high computational complexity and storage.To solve this,we u the integral image concept that can compute this large kernel with constant time.By using integral image,no matter what the kernel size is,we need only four points of information and three additions,and reduce a lot of computation.With this simple computation,we can compute all scale layers in parallel without worry about the heavy complexity and storage due to layer parallel computation.

The integral image works best for the box kernel,but the box kernel results in inferior matching performance.Thus,we u restructured box kernels to approach the performance by the Gaussianﬁlter.Take theﬁrst scale5×5box kernel in Fig.4(c)as an example.We modify this simple kernel to be the kernel in Fig.4(d)to approach the performance of Gaussian ﬁlter.Its computation with integral image can be very simple as shown in Fig.4(b).In this paper,we lect the size of box kernels as5×5,7×7,and3×3to construct theﬁlters for different scales with appropriate modiﬁcations as the example illustrated in Fig.4.

2)Simpliﬁed Keypoint Localization With a Brightness Threshold:SIFT us Taylor expansion to exclude the low contrast candidates,but Taylor expansion needs complex com-putation.In practical,the low contrast candidates almost have low brightness,too.Therefore,low brightness candidates can be excluded according to this threshold

threshold new=(2precision of DoG−1)∗coefﬁcient bright.

In which,Coefﬁcient bright is a ratio of the gray level luminance in the DoG image with the default value0.04.This simpliﬁ-cation has similar performance as that by Taylor expansion according to our simulation,but this method saves a lot of calculation and is more suitable for hardware computation.

C OMPUTATIONAL C OMPLEXITY AN

D M EMORY S TORAG

E A NALYSIS O

F SIFT

Memory (byte)

Multiplication

Division Addition Smoothing A 81A 080A Scale-space extrema 4A 2916A 02933A Keypoint localization 3A 59A 50A 132B Orientation assignment 0450B 225B 673B The local frame descriptor 0450B 225B 705B Total

3056A +900B

50A +450B

3013A +1510

Fig.4.Mask for 5×5box kernel,its integral image reprentation of (a)original one (b)and modiﬁed one,and the corresponding pixel domain of (c)original one,(d)and modiﬁed one.

Table II shows the complexity analysis of our propod algorithm for parameters [octave,scale]=[2,4],with the comparison of VGA and full HD images in Table III.In Table II,C is the number of frame width in a group of down sample twice (C =W +(1/4)W =1(1/4)W ),and W is the width of the original frame.In the ﬁrst three steps,computation with integral image only us additions to eliminate all mul-tiplications and data dependent computation.In the keypoint localization,divisions to compute Taylor expansion are also eliminated.Beyond above computation reduction,we only need a few rows of the integral image for the box kernel bad computation instead the whole integral image.In summary,th

e propod algorithm can reduce a lot of computation and memory usage,which are uful for the following hardware implementation.

IV.P ROPOSED A RCHITECTURE

Fig.5shows the propod architecture that consists of two stages:scale-space extrema detection and keypoint localization in the stage one,and orientation assignment and the local frame descriptor in the stage two.In the stage one,we adopt the propod LPSIFT with integral image to reduce the com-putation signiﬁcantly,and further reduce the required

integral

Fig.5.Propod architecture

diagram.

Fig.6.Scheduling for a VGA image.

image buffer by the on-the-ﬂy computation scheduling.In the stage two,we adopt the propod brightness threshold to simplify computation and implement the normalization in keypoint localization with a low cost universal operation unit with precision equivalent cycles (PEC).

Fig.6shows the propod scheduling for a VGA image,operated row by row.Our propod design speciﬁcation is [octave,scale]=[2,4],and supports more than 1000feature points in one VGA frame with 30frames per cond at 100MHz.Each row spends 19cycles to calculate the initial information in the beginning.After that we ﬁnd the feature point candidates of the stage one.If a candidate is a feature point,the stage two starts to compute the required information of that feature point.When the former step completes,oper-ation is back to the stage one.In summary,the computation time of the stage one is not ﬁxed until a feature point is found,while the stage 2takes 520cycles.

C OMPUTATION C OMPLEXITY OF P ROPOSE

D A LGORITHM

Memory (byte)

Multiplication

Division Addition Smoothing 7C 0058A Integral image 80C 003A Scale-space extrema 89300383A Keypoint localization 2727A 012A Orientation assignment 256450B 225B 673B The local frame descriptor 0450B 225B 705B Total

87C +1176

27A +900B

450B

456A +1378B

TABLE III

C OMPARISON OF P ROPOSE

D AND SIFT A LGORITHM VGA Size Full HD Size

Mem.(byte)

Mul.Div.Add.Mem.(byte)Mul.Div.Add.SIFT 3.07E+06 1.17E+09 1.97E+07 1.16E+09 2.07E+077.92E+09 1.30E+087.81E+09Propod 7.08E+04 1.13E+07 4.50E+05 1.76E+08 2.10E+057.09E+07 4.50E+05 1.18E+09Saving (%)

97.7

微波炉可以做蛋挞吗99.0

97.7

84.8

99.0

99.1

99.7

84.9

Fig.7.Architecture of the stage one.

A.Stage One With On-the-Fly Computation Scheduling Fig.7shows the architecture of the stage one,which

applies the noi smoothing to the input image and builds the integral image for following ﬁltering operations.With enough rows of integral image,we begin to compute scale images and corresponding DoG images to ﬁnd feature point candidates at the same time as shown in Fig.8,called on-the-ﬂy computation.In which,when we calculate the last frame in the ﬁrst layer (G [0,3]),we will downsample this frame to the ﬁrst frame of next layer (G [1,0])simultaneously.Therefore,we not only compute different scales in an octave at the same time but also compute different octaves at the same time.With this on-the-ﬂy computation,we need only a few rows of buffer cost instead of multiple frames due to the Gaussian pyramid and DoG pyramid.

B.Architecture of the Stage Two

龙和蛇配婚姻好不好

Fig.9shows the design of the keypoint localization after obtaining the DoG.We can compute the feature point can-didate by examining conditions such as the local maximum and minimum value,low brightness and edge detection.Since all the conditions are independent to each other,we can compute each condition simultaneously with a ﬂag bit to denote if it matches the feature point criteria.If all ﬂag bits are zeros,that indicates the candidate is approved to be a new feature

point.

Fig.8.Parallel computation of the Gaussian pyramid,and DoG pyramid to ﬁnd feature

points.

Fig.9.Hardware architecture of the keypoint localization.

Fig.10shows the design of the orientation assignment.In this design,19×19pixels of integral image will be fetched again to re-compute 15×15information of scale 1since no scale image is stored.In this recomputation step,we will

本文发布于:2023-05-28 07:17:07，感谢您对本站的认可！

本文链接：https://www.wtabcd.cn/fanwen/fan/89/941106.html

上一篇：1+X智能网联汽车考前练习题与答案

下一篇：声发射源的定位方法

标签：苹果石浦收拢训练方法登陆宁波

留言与评论（共有 0 条评论）