E. H. Adelson | C. H. Anderson | J. R. Bergen | P. J. Burt | J. M. Ogden
Pyramid methods in image processing
The image pyramid offers a flexible, convenient multiresolution format that mirrors the multiple scales of processing in the human visual system.
Digital image processing is being used in many domains today. In image enhancement, for example, a variety of methods now exist for removing image degradations and emphasizing important image information, and in computer graphics, digital images can be generated, modified, and combined for a wide variety of visual effects. In data compression, images may be efficiently stored and transmitted if translated into a compact digital code. In machine vision, automatic inspection systems and robots can make simple decisions based on the digitized input from a television camera.
But digital image processing is still in a developing state. In all of the areas just mentioned, many important problems remain to be solved. Perhaps this is most obvious in the case of machine vision: we still do not know how to build machines that can perform most of the routine visual tasks that humans do effortlessly.
Abstract: The data structure used to represent image information can be critical to the successful completion of an image processing task. One structure that has attracted considerable attention is the image pyramid. This consists of a set of lowpass or bandpass copies of an image, each representing pattern information of a different scale. Here we describe a variety of pyramid methods that we have developed for image data compression, enhancement, analysis and graphics.
©1984 RCA Corporation
Final manuscript received November 12, 1984
Reprint Re-29-6-5
It is becoming increasingly clear that the format used to represent image data can be as critical in image processing as the algorithms applied to the data. A digital image is initially encoded as an array of pixel intensities, but this raw format is not suited to most tasks. Alternatively, an image may be represented by its Fourier transform, with operations applied to the transform coefficients rather than to the original pixel values. This is appropriate for some data compression and image enhancement tasks, but inappropriate for others. The transform representation is particularly unsuited for machine vision and computer graphics, where the spatial location of pattern elements is critical.
Recently there has been a great deal of interest in representations that retain spatial localization as well as localization in the spatial-frequency domain. This is achieved by decomposing the image into a set of spatial-frequency bandpass component images. Individual samples of a component image represent image pattern information that is appropriately localized, while the bandpassed image as a whole represents information about a particular fineness of detail or scale. There is evidence that the human visual system uses such a representation,1 and multiresolution schemes are becoming increasingly popular in machine vision and in image processing in general.
The importance of analyzing images at many scales arises from the nature of images themselves. Scenes in the world contain objects of many sizes, and these objects contain features of many sizes. Moreover, objects can be at various distances from the viewer. As a result, any analysis procedure that is applied only at a single scale may miss information at other scales. The solution is to carry out analyses at all scales simultaneously.
Convolution is the basic operation of most image analysis systems, and convolution with large weighting functions is a notoriously expensive computation. In a multiresolution system one wishes to perform convolutions with kernels of many sizes, ranging from very small to very large, and the computational problems appear forbidding. Therefore one of the main problems in working with multiresolution representations is to develop fast and efficient techniques.
Members of the Advanced Image Processing Research Group have been actively involved in the development of multiresolution techniques for some time. Most of the work revolves around a representation known as a "pyramid," which is versatile, convenient, and efficient to use. We have applied pyramid-based methods to some fundamental problems in image analysis, data compression, and image manipulation.
Image pyramids
The task of detecting a target pattern that may appear at any scale can be approached in several ways. Two of these, which involve only simple convolutions, are illustrated in Fig. 1. Several copies of the pattern can be constructed at increasing scales, then each is convolved with the image. Alternatively, a pattern of fixed size can be convolved with several copies of the image represented at correspondingly reduced resolutions. The two approaches yield equivalent results, provided critical information in the target pattern is adequately represented. However, the second approach is much more efficient: a given convolution with the target pattern expanded in scale by a factor s will require s^4 times as many arithmetic operations as the corresponding convolution with the image reduced in scale by a factor of s. This can be substantial for scale factors in the range 2 to 32, a commonly used range in image analysis.
Fig. 1. Two methods of searching for a target pattern over many scales. In the first approach, (a), copies of the target pattern are constructed at several expanded scales, and each is convolved with the original image. In the second approach, (b), a single copy of the target is convolved with copies of the image reduced in scale. The target should be just large enough to resolve critical details. The two approaches should give equivalent results, but the second is more efficient by the fourth power of the scale factor (image convolutions are represented by 'O').
RCA Engineer • 29-6 • Nov/Dec 1984 33
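The fourth-power factor can be checked with a short operation count. The sketch below assumes direct (non-FFT) convolution costing one multiply-add per kernel tap per output sample, and ignores border effects; the function names, the 512-sample image side and the 5-tap target are illustrative choices, not values from the article.

```python
def conv_ops(image_side: int, kernel_side: int) -> int:
    """Multiply-adds for directly convolving a square image with a square kernel."""
    return image_side ** 2 * kernel_side ** 2


def cost_ratio(image_side: int, kernel_side: int, s: int) -> float:
    """Cost of expanding the target by s, relative to reducing the image by s."""
    expanded_target = conv_ops(image_side, kernel_side * s)   # approach (a)
    reduced_image = conv_ops(image_side // s, kernel_side)    # approach (b)
    return expanded_target / reduced_image


if __name__ == "__main__":
    for s in (2, 4, 8, 16, 32):
        # Each ratio equals s**4: s**2 from the larger kernel, s**2 from the larger image.
        print(s, cost_ratio(512, 5, s))
```

Expanding the target multiplies the kernel area by s^2, while reducing the image divides the number of output samples by s^2, so the two approaches differ by s^4 in total.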
The image pyramid is a data structure designed to support efficient scaled convolution through reduced image representation. It consists of a sequence of copies of an original image in which both sample density and resolution are decreased in regular steps. An example is shown in Fig. 2a. These reduced resolution levels of the pyramid are themselves obtained through a highly efficient iterative algorithm. The bottom, or zero level of the pyramid, $G_0$, is equal to the original image. This is lowpass-filtered and subsampled by a factor of two to obtain the next pyramid level, $G_1$. $G_1$ is then filtered in the same way and subsampled to obtain $G_2$. Further repetitions of the filter/subsample steps generate the remaining pyramid levels. To be precise, the levels of the pyramid are obtained iteratively as follows. For 0 < l < N:
$$G_l(i,j) = \sum_m \sum_n w(m,n)\, G_{l-1}(2i+m,\, 2j+n) \tag{1}$$
However, it is convenient to refer to this process as a standard REDUCE operation, and simply write
$$G_l = \mathrm{REDUCE}[G_{l-1}].$$
We call the weighting function $w(m,n)$ the "generating kernel." For reasons of computational efficiency this should be small and separable. A five-tap filter was used to generate the pyramid in Fig. 2a. Pyramid construction is equivalent to convolving the original image with a set of Gaussian-like weighting functions. These "equivalent weighting functions" for three successive pyramid levels are shown in Fig. 3a. Note that the functions double in width with each level. The convolution acts as a lowpass filter with the band limit reduced correspondingly by one octave with each level. Because of this resemblance to the Gaussian density function we refer to the pyramid of lowpass images as the "Gaussian pyramid."
Fig. 3. Equivalent weighting functions. The process of constructing the Gaussian (lowpass) pyramid is equivalent to convolving the original image with a set of Gaussian-like weighting functions, then subsampling, as shown in (a). The weighting functions double in size with each increase in l. The corresponding functions for the Laplacian pyramid resemble the difference of two Gaussians, as shown in (b).
Bandpass, rather than lowpass, images are required for many purposes. These may be obtained by subtracting each Gaussian (lowpass) pyramid level from the next-lower level in the pyramid. Because these levels differ in their sample density it is necessary to interpolate new sample values between those in a given level before that level is subtracted from the next-lower level. Interpolation can be achieved by reversing the REDUCE process. We call this an EXPAND operation. Let $G_{l,k}$ be the image obtained by expanding $G_l$ k times. Then $G_{l,k} = \mathrm{EXPAND}[G_{l,k-1}]$ or, to be precise, $G_{l,0} = G_l$, and for k > 0,
$$G_{l,k}(i,j) = 4 \sum_m \sum_n w(m,n)\, G_{l,k-1}\!\left(\frac{i-m}{2},\, \frac{j-n}{2}\right) \tag{2}$$
Here only terms for which $(i-m)/2$ and $(j-n)/2$ are integers contribute to the sum. The EXPAND operation doubles the size of the image with each iteration, so that $G_{l,1}$ is the size of $G_{l-1}$, and $G_{l,l}$ is the same size as the original image. Examples of expanded Gaussian pyramid levels are shown in Fig. 2b.
Fig. 2b. Levels of the Gaussian pyramid expanded to the size of the original image. The effects of lowpass filtering are now clearly apparent.
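The REDUCE and EXPAND operations can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the binomial taps 1/16 [1, 4, 6, 4, 1], the edge-replicating border rule, and the names `reduce_`, `expand` and `gaussian_pyramid` are all assumptions, since the text states only that a small, separable, five-tap kernel was used. EXPAND is written as the transpose of REDUCE, so output sample (i, j) draws on input sample ((i - m)/2, (j - n)/2) whenever both indices are integers.

```python
# Assumed 5-tap generating kernel (binomial weights); the article does not list the taps.
W = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def _clamp(i, n):
    """Clamp index i to [0, n - 1] (assumed border rule: replicate edge samples)."""
    return max(0, min(n - 1, i))


def reduce_(g):
    """REDUCE: lowpass filter with w(m, n) = W[m] * W[n], then keep every second sample."""
    h, w = len(g), len(g[0])
    out = [[0.0] * ((w + 1) // 2) for _ in range((h + 1) // 2)]
    for i in range(len(out)):
        for j in range(len(out[0])):
            acc = 0.0
            for m in range(-2, 3):
                for n in range(-2, 3):
                    sample = g[_clamp(2 * i + m, h)][_clamp(2 * j + n, w)]
                    acc += W[m + 2] * W[n + 2] * sample
            out[i][j] = acc
    return out


def expand(g, out_h, out_w):
    """EXPAND: interpolate g up to out_h x out_w samples (normally twice its size)."""
    h, w = len(g), len(g[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0.0
            for m in range(-2, 3):
                for n in range(-2, 3):
                    # Only terms with integer (i - m)/2 and (j - n)/2 contribute.
                    if (i - m) % 2 == 0 and (j - n) % 2 == 0:
                        sample = g[_clamp((i - m) // 2, h)][_clamp((j - n) // 2, w)]
                        acc += W[m + 2] * W[n + 2] * sample
            out[i][j] = 4.0 * acc
    return out


def gaussian_pyramid(image, n_levels):
    """Return [G_0, ..., G_N] with N = n_levels - 1, built by repeated REDUCE."""
    pyramid = [image]
    for _ in range(n_levels - 1):
        pyramid.append(reduce_(pyramid[-1]))
    return pyramid


if __name__ == "__main__":
    ramp = [[float(i + j) for j in range(16)] for i in range(16)]
    for level, g in enumerate(gaussian_pyramid(ramp, 4)):
        print("G_%d: %d x %d" % (level, len(g), len(g[0])))
```

With these taps the weights that contribute to any one output sample of EXPAND sum to 1/4, which is why the factor of 4 is needed to preserve the mean intensity.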
The levels of the bandpass pyramid, $L_0, L_1, \ldots, L_N$, may now be specified in terms of the lowpass pyramid levels as follows:
$$L_l = G_l - \mathrm{EXPAND}[G_{l+1}] = G_l - G_{l+1,1}. \tag{3}$$
The first four levels are shown in Fig. 4a. Just as the value of each node in the Gaussian pyramid could have been obtained directly by convolving a Gaussian-like equivalent weighting function with the original image, each value of this bandpass pyramid could be obtained by convolving a difference of two Gaussians with the original image. These functions closely resemble the Laplacian operators commonly used in image processing (Fig. 3b). For this reason we refer to the bandpass pyramid as a "Laplacian pyramid."
An important property of the Laplacian pyramid is that it is a complete image representation: the steps used to construct the pyramid may be reversed to recover the original image exactly. The top pyramid level, $L_N$, is first expanded and added to $L_{N-1}$ to form $G_{N-1}$; then this array is expanded and added to $L_{N-2}$ to recover $G_{N-2}$, and so on. Alternatively, we may write
$$G_0 = \sum_l L_{l,l} \tag{4}$$
where $L_{l,l}$ is $L_l$ expanded l times.
The pyramid has been introduced here as a data structure for supporting scaled image analysis. The same structure is well suited for a variety of other image processing tasks. Applications in data compression and graphics, as well as in image analysis, will be described in the following sections. It can be shown that the pyramid-building procedures described here have significant advantages over other approaches to scaled analysis in terms of both computation cost and complexity. The pyramid levels are obtained with fewer steps through repeated REDUCE and EXPAND operations than is possible with the standard FFT. Furthermore, direct convolution with large equivalent weighting functions requires 20- to 30-bit arithmetic to maintain the same accuracy as the cascade of convolutions with the small generating kernel using just 8-bit arithmetic.
Fig. 4b. Levels of the Laplacian pyramid expanded to the size of the original image. Note that edge and bar features are enhanced and segregated by size.
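The completeness property can be illustrated with a one-dimensional analogue, which keeps the code short. The sketch below is only an illustration under stated assumptions: the 5-tap binomial kernel, the edge-replicating border rule and all function names are hypothetical, the EXPAND gain is 2 rather than 4 because there is a single dimension, and the top level is taken to be $L_N = G_N$, as the reconstruction procedure implies.

```python
W = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]  # assumed 5-tap generating kernel


def _at(g, i):
    """Sample g with the index clamped to the valid range (assumed border rule)."""
    return g[max(0, min(len(g) - 1, i))]


def reduce_1d(g):
    """1-D REDUCE: lowpass filter, then keep every second sample."""
    return [sum(W[m + 2] * _at(g, 2 * i + m) for m in range(-2, 3))
            for i in range((len(g) + 1) // 2)]


def expand_1d(g, out_len):
    """1-D EXPAND: interpolate to out_len samples (gain 2 in 1-D, 4 in 2-D)."""
    return [2.0 * sum(W[m + 2] * _at(g, (i - m) // 2)
                      for m in range(-2, 3) if (i - m) % 2 == 0)
            for i in range(out_len)]


def laplacian_pyramid(signal, n_levels):
    """Return [L_0, ..., L_N] with L_l = G_l - EXPAND[G_{l+1}] and L_N = G_N."""
    gauss = [list(signal)]
    for _ in range(n_levels - 1):
        gauss.append(reduce_1d(gauss[-1]))
    lap = [[a - b for a, b in zip(g, expand_1d(g_next, len(g)))]
           for g, g_next in zip(gauss, gauss[1:])]
    lap.append(gauss[-1])
    return lap


def reconstruct(lap):
    """Invert the pyramid: expand the running result and add the next level down."""
    g = lap[-1]
    for level in reversed(lap[:-1]):
        g = [a + b for a, b in zip(level, expand_1d(g, len(level)))]
    return g


if __name__ == "__main__":
    signal = [3, 7, 1, 8, 2, 9, 4, 6, 5, 0, 7, 3, 8, 1, 6, 2]
    pyramid = laplacian_pyramid(signal, 4)
    restored = reconstruct(pyramid)
    print("level sizes:", [len(level) for level in pyramid])
    print("max reconstruction error:",
          max(abs(a - b) for a, b in zip(signal, restored)))
```

Reconstruction is exact (up to floating-point rounding) for any choice of kernel, because each step adds back precisely the expanded array that was subtracted during construction.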
A compact code
The Laplacian pyramid has been described as a data structure composed of bandpass copies of an image that is well suited for scaled-image analysis. But the pyramid may also be viewed as an image transformation, or code. The pyramid nodes are then considered code elements, and the equivalent weighting functions are sampling functions that give node values when convolved with the image. Since the original image can be exactly reconstructed from its pyramid representation (Eq. 4), the pyramid code is complete.
There are two reasons for transforming an image from one representation to another: the transformation may isolate critical components of the image pattern so they are more directly accessible to analysis, or the transformation may place the data in a more compact form so that they can be stored and transmitted more efficiently. The Laplacian pyramid serves both of these objectives. As a bandpass filter, pyramid construction tends to enhance image features, such as edges, which are important for interpretation. These features are segregated by scale in the various pyramid levels, as shown in Fig. 4. As with the Fourier transform, pyramid code elements represent pattern components that are restricted in the spatial-frequency domain. But unlike the Fourier transform, pyramid code elements are also restricted to local regions in the spatial domain. Spatial as well as spatial-frequency localization can be critical in the analysis of images that contain multiple objects so that code elements will tend to represent characteristics of single objects rather than confound the characteristics of many objects.
The pyramid representation also permits data compression.3 Although it has one third more sample elements than the original image, the values of these samples tend to be near zero, and therefore can be represented with a small number of bits. Further data compression can be obtained through quantization: the number of distinct values taken by samples is reduced by binning the existing values. This results in some degradation when the image is reconstructed, but if the quantization bins are carefully chosen, the degradation will not be detectable by human observers and will not affect the performance of analysis algorithms.
Fig. 5. Pyramid data compression. The original image represented at 8 bits per pixel is shown in (a). The node values of the Laplacian pyramid representation of this image were quantized to obtain effective data rates of 1 b/p and 1/2 b/p. Reconstructed images (b) and (c) show relatively little degradation.
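Quantization by binning can be illustrated with a uniform quantizer. This is only a sketch: the article does not say how its bins were chosen, so the uniform bin width, the `quantize` name and the sample values below are all hypothetical.

```python
def quantize(values, bin_width):
    """Replace each value by the centre of its uniform bin (bins centred on zero)."""
    return [bin_width * round(v / bin_width) for v in values]


if __name__ == "__main__":
    # Hypothetical Laplacian-level samples: mostly near zero, as the text describes.
    samples = [0.4, -1.2, 0.0, 2.7, -0.3, 9.6, -7.9, 0.8, 1.1, -0.6]
    coarse = quantize(samples, bin_width=4.0)
    print(coarse)
    print(len(set(samples)), "distinct values before,", len(set(coarse)), "after")
    print("largest error:", max(abs(a - b) for a, b in zip(samples, coarse)))
```

Because most samples fall in the bin around zero, few distinct codes remain after binning, and the error introduced at any node is at most half the bin width.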
Figure 5 illustrates an application of the pyramid to data compression for image transmission. The original image is shown in Fig. 5a. A Laplacian pyramid representation was constructed for this image, then the values were quantized to reduce the effective data rate to just one bit per pixel, then to one-half bit per pixel. Images reconstructed from the quantized data are shown in Figs. 5b and 5c. Humans tend to be more sensitive to errors in low-frequency image components than in high-frequency components. Thus in pyramid compression, nodes at level zero can be quantized more coarsely than those in higher levels. This is fortuitous for compression since three-quarters of the pyramid samples are in the zero level.
Data compression through quantization may also be important in image analysis to reduce the number of bits of precision carried in arithmetic operations. For example, in a study of pyramid-based image motion analysis it was found that data could be reduced to just three bits per sample without noticeably degrading the computed flow field.4
These examples suggest that the pyramid is a particularly effective way of representing image information both for transmission and analysis. Salient information is enhanced for analysis, and to the extent that quantization does not degrade analysis, the representation is both compact and robust.
Image analysis
Pyramid methods may be applied to analysis in several ways. Three of these will be outlined here. The first concerns pattern matching and has already been mentioned: to locate a particular target pattern that may occur at any scale within an image, the pattern is convolved with each level of the image pyramid. All levels of the pyramid combined contain just one third more nodes than there are pixels in the original image. Thus the cost of searching for a pattern at many scales is just one third more than that of searching the original image alone.
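The "one third more" figure follows from a geometric series. Each level is subsampled by two in both dimensions, so it holds one quarter as many nodes as the level below it. Measured in units of the original pixel count, a pyramid with levels 0 through N therefore contains:

```latex
\[
\sum_{l=0}^{N} \frac{1}{4^{l}}
  \;=\; \frac{4}{3}\left(1 - \frac{1}{4^{N+1}}\right)
  \;<\; \frac{4}{3}
\]
```

The total never exceeds 4/3 of the original image, and level zero alone accounts for roughly 1/(4/3) = 3/4 of all pyramid nodes.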
The complexity of the patterns that may be found in this way is limited by the fact that not all image scales are represented in the pyramid. As defined here, pyramid levels differ in scale by powers of two, or by octave steps in the frequency domain. Power-of-two steps are adequate when the patterns to be located are simple, but complex patterns require a closer match between the scale of the pattern as defined in the target array, and the scale of the pattern as it appears in the image. Variants on the pyramid can easily be defined with square-root-of-two and smaller steps. However, these not only have more levels, but many more samples, and the computational cost of image processing based on such pyramids is correspondingly increased.
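The growth in levels and samples can be estimated with a short calculation. The sketch assumes that each level of a variant pyramid is resampled so that its linear sample density falls by the step factor, which is an assumption made here for illustration and is not specified in the article; the function names and the scale range of 32 are likewise illustrative.

```python
import math


def levels_to_span(step: float, max_scale: float) -> int:
    """Number of levels needed to cover scale factors from 1 up to max_scale."""
    return round(math.log(max_scale) / math.log(step)) + 1


def relative_sample_count(step: float, n_levels: int) -> float:
    """Total pyramid samples divided by the pixel count of the original image.

    Assumes the sample count of each level falls by step**2 relative to the
    level below it (linear density reduced by `step` in both dimensions).
    """
    return sum(step ** (-2 * level) for level in range(n_levels))


if __name__ == "__main__":
    for name, step in (("octave steps", 2.0), ("sqrt(2) steps", math.sqrt(2.0))):
        n = levels_to_span(step, 32.0)
        print(name, "levels:", n, "relative samples:", relative_sample_count(step, n))
```

Under this assumption, covering scale factors up to 32 takes 6 levels and about 4/3 of the original samples with octave steps, against 11 levels and about twice the original samples with square-root-of-two steps.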
A second class of operations concerns the estimation of integrated properties within local image regions. For example, a texture may often be characterized by local density or energy measures. Reliable estimates of image motion also require the integration of point estimates of displacement within regions of uniform motion. In such cases early analysis can often be formulated as a three-stage sequence of standard operations. First, an appropriate pattern is convolved with the image (or images, in the case of motion analysis). This selects a particular pattern attribute to be examined in the remaining two stages. Second, a nonlinear intensity transformation is performed on each sample value. Operations may include a simple threshold to detect the presence of the target pattern, a power function to be used in computing texture energy measures, or the product of corresponding samples in two images used in forming correlation measures for motion analysis. Finally the transformed sample values are integrated within local windows to obtain the desired local property measures.
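The three stages can be sketched for a one-dimensional signal as a simple texture-energy measure. This is an illustrative sketch only: the second-difference pattern, the squaring nonlinearity, the box-shaped integration window, the zero-padded borders and the function names are all assumptions chosen for brevity.

```python
def filter_same(signal, kernel):
    """Correlate signal with an odd-length kernel, zero-padding the borders.

    For the symmetric kernels used here this equals convolution.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += weight * signal[j]
        out.append(acc)
    return out


def local_energy(signal, pattern, window):
    """Three-stage local measure: filter, square, then average in a local window."""
    filtered = filter_same(signal, pattern)        # stage 1: select a pattern attribute
    energy = [v * v for v in filtered]             # stage 2: pointwise nonlinearity
    box = [1.0 / window] * window                  # window should be odd
    return filter_same(energy, box)                # stage 3: local integration


if __name__ == "__main__":
    # Flat region followed by a rapidly alternating (textured) region.
    signal = [5.0] * 8 + [0.0, 10.0] * 4
    measure = local_energy(signal, pattern=[-1.0, 2.0, -1.0], window=5)
    print("flat region:", measure[3], "textured region:", measure[12])
```

The measure is zero well inside the flat region and large inside the alternating region, so a threshold on it would separate the two textures. Changing the pattern size or the window size changes the scale at which the property is measured.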
Pattern scale is an important parameter of both the convolution and integration stages. Pyramid-based processing may be employed at each of these stages to facilitate scale selection and to support efficient computation. A flow diagram for this three-stage analysis is given in Fig. 6. Analysis begins with the construction of the pyramid representation of the image. A feature pattern is then convolved with each level of the pyramid (Stage 1), and the resulting correlation values may be passed through