(1984) Pyramid Methods in Image Processing


E. H. Adelson | C. H. Anderson | J. R. Bergen | P. J. Burt | J. M. Ogden

The image pyramid offers a flexible, convenient multiresolution format that mirrors the multiple scales of processing in the human visual system.

Digital image processing is being used in many domains today. In image enhancement, for example, a variety of methods now exist for removing image degradations and emphasizing important image information, and in computer graphics, digital images can be generated, modified, and combined for a wide variety of visual effects. In data compression, images may be efficiently stored and transmitted if translated into a compact digital code. In machine vision, automatic inspection systems and robots can make simple decisions based on the digitized input from a television camera.

But digital image processing is still in a developing state. In all of the areas just mentioned, many important problems remain to be solved. Perhaps this is most obvious in the case of machine vision: we still do not know how to build machines that can perform most of the routine visual tasks that humans do effortlessly.

Abstract: The data structure used to represent image information can be critical to the successful completion of an image processing task. One structure that has attracted considerable attention is the image pyramid. This consists of a set of lowpass or bandpass copies of an image, each representing pattern information of a different scale. Here we describe a variety of pyramid methods that we have developed for image data compression, enhancement, analysis and graphics.

Final manuscript received November 12, 1984

Reprint RE-29-6-5

It is becoming increasingly clear that the format used to represent image data can be as critical in image processing as the algorithms applied to the data. A digital image is initially encoded as an array of pixel intensities, but this raw format is not suited to most tasks. Alternatively, an image may be represented by its Fourier transform, with operations applied to the transform coefficients rather than to the original pixel values. This is appropriate for some data compression and image enhancement tasks, but inappropriate for others. The transform representation is particularly unsuited for machine vision and computer graphics, where the spatial location of pattern elements is critical.

Recently there has been a great deal of interest in representations that retain spatial localization as well as localization in the spatial-frequency domain. This is achieved by decomposing the image into a set of spatial frequency bandpass component images. Individual samples of a component image represent image pattern information that is appropriately localized, while the bandpassed image as a whole represents information about a particular fineness of detail or scale. There is evidence that the human visual system uses such a representation,¹ and multiresolution schemes are becoming increasingly popular in machine vision and in image processing in general.

The importance of analyzing images at many scales arises from the nature of images themselves. Scenes in the world contain objects of many sizes, and these objects contain features of many sizes. Moreover, objects can be at various distances from the viewer. As a result, any analysis procedure that is applied only at a single scale may miss information at other scales. The solution is to carry out analyses at all scales simultaneously.

Convolution is the basic operation of most image analysis systems, and convolution with large weighting functions is a notoriously expensive computation. In a multiresolution system one wishes to perform convolutions with kernels of many sizes, ranging from very small to very large, and the computational problems appear forbidding. Therefore one of the main problems in working with multiresolution representations is to develop fast and efficient techniques.

Members of the Advanced Image Processing Research Group have been actively involved in the development of multiresolution techniques for some time. Most of the work revolves around a representation known as a "pyramid," which is versatile, convenient, and efficient to use. We have applied pyramid-based methods to some fundamental problems in image analysis, data compression, and image manipulation.

Image pyramids

Fig. 1. Two methods of searching for a target pattern over many scales. In the first approach, (a), copies of the target pattern are constructed at several expanded scales, and each is convolved with the original image. In the second approach, (b), a single copy of the target is convolved with copies of the image reduced in scale. The target should be just large enough to resolve critical details. The two approaches should give equivalent results, but the second is more efficient by the fourth power of the scale factor (image convolutions are represented by 'O').

RCA Engineer • 29-6 • Nov/Dec 1984 33

The task of detecting a target pattern that may appear at any scale can be approached in several ways. Two of these, which involve only simple convolutions, are illustrated in Fig. 1. Several copies of the pattern can be constructed at increasing scales, then each is convolved with the image. Alternatively, a pattern of fixed size can be convolved with several copies of the image represented at correspondingly reduced resolutions. The two approaches yield equivalent results, provided critical information in the target pattern is adequately represented. However, the second approach is much more efficient: a given convolution with the target pattern expanded in scale by a factor s will require s^4 more arithmetic operations than the corresponding convolution with the image reduced in scale by a factor of s. This can be substantial for scale factors in the range 2 to 32, a commonly used range in image analysis.
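The fourth-power figure can be checked by counting multiply-adds. The sketch below is only an illustration under assumptions the article does not state: a square n-by-n image, a square k-by-k target, one multiply-add per kernel tap per output sample, and no border effects. The function names and the sizes used are hypothetical.

```python
def ops_expanded_target(n, k, s):
    """Multiply-adds when the k-by-k target is expanded by s and convolved
    with the full n-by-n image (approach (a) in Fig. 1)."""
    return (n * n) * (k * s) ** 2


def ops_reduced_image(n, k, s):
    """Multiply-adds when the k-by-k target is kept fixed and convolved with
    the image reduced by s in each dimension (approach (b) in Fig. 1)."""
    return (n // s) ** 2 * (k * k)


if __name__ == "__main__":
    n, k = 512, 5  # illustrative sizes only
    for s in (2, 4, 8, 16, 32):
        ratio = ops_expanded_target(n, k, s) // ops_reduced_image(n, k, s)
        print(f"s={s:2d}  cost ratio={ratio}  s**4={s ** 4}")
```

One factor of s squared comes from the larger kernel and another from the larger number of output samples, which together give the s^4 ratio.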

The image pyramid is a data structure designed to support efficient scaled convolution through reduced image representation. It consists of a sequence of copies of an original image in which both sample density and resolution are decreased in regular steps. An example is shown in Fig. 2a. The reduced resolution levels of the pyramid are themselves obtained through a highly efficient iterative algorithm. The bottom, or zero level of the pyramid, $G_0$, is equal to the original image. This is lowpass-filtered and subsampled by a factor of two to obtain the next pyramid level, $G_1$. $G_1$ is then filtered in the same way and subsampled to obtain $G_2$. Further repetitions of the filter/subsample steps generate the remaining pyramid levels. To be precise, the levels of the pyramid are obtained iteratively as follows. For 0 < l < N:

$$G_l(i,j) = \sum_m \sum_n w(m,n)\, G_{l-1}(2i+m,\, 2j+n) \tag{1}$$

However, it is convenient to refer to this process as a standard REDUCE operation, and simply write

$$G_l = \mathrm{REDUCE}[G_{l-1}].$$

We call the weighting function $w(m,n)$ the "generating kernel." For reasons of computational efficiency this should be small and separable. A five-tap filter was used to generate the pyramid in Fig. 2a.

Fig. 2b. Levels of the Gaussian pyramid expanded to the size of the original image. The effects of lowpass filtering are now clearly apparent.

Fig. 3. Equivalent weighting functions. The process of constructing the Gaussian (lowpass) pyramid is equivalent to convolving the original image with a set of Gaussian-like weighting functions, then subsampling, as shown in (a). The weighting functions double in size with each increase in l. The corresponding functions for the Laplacian pyramid resemble the difference of two Gaussians, as shown in (b).

Pyramid construction is equivalent to convolving the original image with a set of Gaussian-like weighting functions. The "equivalent weighting functions" for three successive pyramid levels are shown in Fig. 3a. Note that the functions double in width with each level. The convolution acts as a lowpass filter with the band limit reduced correspondingly by one octave with each level. Because of this resemblance to the Gaussian density function we refer to the pyramid of lowpass images as the "Gaussian pyramid."

Bandpass, rather than lowpass, images are required for many purposes. These may be obtained by subtracting each Gaussian (lowpass) pyramid level from the next-lower level in the pyramid. Because these levels differ in their sample density it is necessary to interpolate new sample values between those in a given level before that level is subtracted from the next-lower level. Interpolation can be achieved by reversing the REDUCE process. We call this an EXPAND operation. Let $G_{l,k}$ be the image obtained by expanding $G_l$ k times. Then

$$G_{l,k} = \mathrm{EXPAND}[G_{l,k-1}]$$

or, to be precise, $G_{l,0} = G_l$, and for $k > 0$,

$$G_{l,k}(i,j) = 4 \sum_m \sum_n w(m,n)\, G_{l,k-1}\!\left(\frac{2i+m}{2},\, \frac{2j+n}{2}\right). \tag{2}$$

Here only terms for which $(2i+m)/2$ and $(2j+n)/2$ are integers contribute to the sum. The expand operation doubles the size of the image with each iteration, so that $G_{l,1}$ is the size of $G_{l-1}$, and $G_{l,l}$ is the same size as the original image. Examples of expanded Gaussian pyramid levels are shown in Fig. 2b.
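The REDUCE and EXPAND operations can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. Several choices are assumptions: the binomial weights 1-4-6-4-1 (the article says only "five-tap"), replication of edge samples at the borders, and the zero-insertion form of EXPAND, in which output sample i draws on source sample (i - m)/2 whenever that index is an integer. All function names are hypothetical.

```python
# Assumed 5-tap generating kernel (binomial weights); the article does not
# list the weights it used.
W = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def _clamp(i, n):
    """Replicate border samples (an assumed boundary rule)."""
    return max(0, min(n - 1, i))


def reduce_1d(x):
    """Lowpass filter with W and keep every second sample (one axis of Eq. 1)."""
    n = len(x)
    return [sum(W[m + 2] * x[_clamp(2 * i + m, n)] for m in range(-2, 3))
            for i in range((n + 1) // 2)]


def expand_1d(x, out_len):
    """Interpolate to out_len samples; the gain of 2 per axis gives the
    factor 4 of the two-dimensional EXPAND."""
    n = len(x)
    out = []
    for i in range(out_len):
        acc = 0.0
        for m in range(-2, 3):
            if (i - m) % 2 == 0:  # only integer source positions contribute
                acc += W[m + 2] * x[_clamp((i - m) // 2, n)]
        out.append(2 * acc)
    return out


def reduce2d(img):
    """Separable REDUCE: filter and subsample rows, then columns."""
    rows = [reduce_1d(r) for r in img]
    cols = [reduce_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def expand2d(img, height, width):
    """Separable EXPAND to a height-by-width image."""
    rows = [expand_1d(r, width) for r in img]
    cols = [expand_1d(list(c), height) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def gaussian_pyramid(img, levels):
    """Return [G_0, G_1, ..., G_levels] by repeated REDUCE."""
    pyr = [img]
    for _ in range(levels):
        pyr.append(reduce2d(pyr[-1]))
    return pyr


if __name__ == "__main__":
    ramp = [[float(r + c) for c in range(8)] for r in range(8)]
    for level, g in enumerate(gaussian_pyramid(ramp, 3)):
        print(f"G_{level}: {len(g)} x {len(g[0])}")
```

Because the kernel is separable, each level costs two passes of five multiply-adds per sample, regardless of how wide the equivalent weighting function has become.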

The levels of the bandpass pyramid, $L_0, L_1, \ldots, L_N$, may now be specified in terms of the lowpass pyramid levels as follows:

$$L_l = G_l - \mathrm{EXPAND}[G_{l+1}] = G_l - G_{l+1,1}. \tag{3}$$

The first four levels are shown in Fig. 4a. Just as the value of each node in the Gaussian pyramid could have been obtained directly by convolving a Gaussian-like equivalent weighting function with the original image, each value of this bandpass pyramid could be obtained by convolving a difference of two Gaussians with the original image. These functions closely resemble the Laplacian operators commonly used in image processing (Fig. 3b). For this reason we refer to the bandpass pyramid as a "Laplacian pyramid."

Fig. 4b. Levels of the Laplacian pyramid expanded to the size of the original image. Note that edge and bar features are enhanced and segregated by size.

An important property of the Laplacian pyramid is that it is a complete image representation: the steps used to construct the pyramid may be reversed to recover the original image exactly. The top pyramid level, $L_N$, is first expanded and added to $L_{N-1}$ to form $G_{N-1}$; then this array is expanded and added to $L_{N-2}$ to recover $G_{N-2}$, and so on. Alternatively, we may write

$$G_0 = \sum_{l=0}^{N} L_{l,l}. \tag{4}$$

The pyramid has been introduced here as a data structure for supporting scaled image analysis. The same structure is well suited for a variety of other image processing tasks. Applications in data compression and graphics, as well as in image analysis, will be described in the following sections. It can be shown that the pyramid-building procedures described here have significant advantages over other approaches to scaled analysis in terms of both computation cost and complexity. The pyramid levels are obtained with fewer steps through repeated REDUCE and EXPAND operations than is possible with the standard FFT. Furthermore, direct convolution with large equivalent weighting functions requires 20- to 30-bit arithmetic to maintain the same accuracy as the cascade of convolutions with the small generating kernel using just 8-bit arithmetic.
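The completeness property can be demonstrated with a short, self-contained sketch that builds a Laplacian pyramid and then reverses the construction. As before, the binomial kernel, edge replication, and zero-insertion indexing of EXPAND are assumptions of this illustration, and the function names are hypothetical. The top level is stored as $L_N = G_N$, which is what the reconstruction procedure described above implies.

```python
W = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]  # assumed generating kernel


def _at(x, i):
    return x[max(0, min(len(x) - 1, i))]  # assumed border rule: replicate edges


def _reduce_1d(x):
    return [sum(W[m + 2] * _at(x, 2 * i + m) for m in range(-2, 3))
            for i in range((len(x) + 1) // 2)]


def _expand_1d(x, out_len):
    return [2 * sum(W[m + 2] * _at(x, (i - m) // 2)
                    for m in range(-2, 3) if (i - m) % 2 == 0)
            for i in range(out_len)]


def _rows_then_cols(img, f_row, f_col):
    rows = [f_row(r) for r in img]
    cols = [f_col(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def reduce2d(img):
    return _rows_then_cols(img, _reduce_1d, _reduce_1d)


def expand2d(img, h, w):
    return _rows_then_cols(img, lambda r: _expand_1d(r, w),
                           lambda c: _expand_1d(c, h))


def laplacian_pyramid(img, levels):
    """Return [L_0, ..., L_N] with L_l = G_l - EXPAND[G_{l+1}] and L_N = G_N."""
    pyr, g = [], img
    for _ in range(levels):
        g_next = reduce2d(g)
        up = expand2d(g_next, len(g), len(g[0]))
        pyr.append([[a - b for a, b in zip(ra, rb)] for ra, rb in zip(g, up)])
        g = g_next
    pyr.append(g)
    return pyr


def reconstruct(pyr):
    """Expand the top level and add it to the level below, down to G_0."""
    g = pyr[-1]
    for lap in reversed(pyr[:-1]):
        up = expand2d(g, len(lap), len(lap[0]))
        g = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(lap, up)]
    return g


if __name__ == "__main__":
    img = [[float((3 * r + 5 * c) % 17) for c in range(8)] for r in range(8)]
    rec = reconstruct(laplacian_pyramid(img, 3))
    err = max(abs(a - b) for ra, rb in zip(rec, img) for a, b in zip(ra, rb))
    print("largest reconstruction error:", err)
```

Reconstruction is exact up to floating-point rounding for any choice of kernel, because each level stores precisely the difference that the same EXPAND call adds back.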

A compact code

The Laplacian pyramid has been described as a data structure composed of bandpass copies of an image that is well suited for scaled-image analysis. But the pyramid may also be viewed as an image transformation, or code. The pyramid nodes are then considered code elements, and the equivalent weighting functions are sampling functions that give node values when convolved with the image. Since the original image can be exactly reconstructed from its pyramid representation (Eq. 4), the pyramid code is complete.

There are two reasons for transforming an image from one representation to another: the transformation may isolate critical components of the image pattern so they are more directly accessible to analysis, or the transformation may place the data in a more compact form so that they can be stored and transmitted more efficiently. The Laplacian pyramid serves both of these objectives. As a bandpass filter, pyramid construction tends to enhance image features, such as edges, which are important for interpretation. These features are segregated by scale in the various pyramid levels, as shown in Fig. 4. As with the Fourier transform, pyramid code elements represent pattern components that are restricted in the spatial-frequency domain. But unlike the Fourier transform, pyramid code elements are also restricted to local regions in the spatial domain. Spatial as well as spatial-frequency localization can be critical in the analysis of images that contain multiple objects so that code elements will tend to represent characteristics of single objects rather than confound the characteristics of many objects.

Fig. 5. Pyramid data compression. The original image represented at 8 bits per pixel is shown in (a). The node values of the Laplacian pyramid representation of this image were quantized to obtain effective data rates of 1 b/p and 1/2 b/p. Reconstructed images (b) and (c) show relatively little degradation.

The pyramid representation also permits data compression.³ Although it has one third more sample elements than the original image, the values of the samples tend to be near zero, and therefore can be represented with a small number of bits. Further data compression can be obtained through quantization: the number of distinct values taken by samples is reduced by binning the existing values. This results in some degradation when the image is reconstructed, but if the quantization bins are carefully chosen, the degradation will not be detectable by human observers and will not affect the performance of analysis algorithms.
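Quantization by binning can be sketched as follows. The article does not describe the quantizer it used, so the uniform bins, the bin width, and the sample values below are all hypothetical; the point is only that fewer distinct values remain and that the error per sample is bounded by half a bin width.

```python
def quantize(values, bin_width):
    """Uniform quantizer (an assumed design): replace each value with the
    centre of the bin it falls in."""
    return [bin_width * round(v / bin_width) for v in values]


if __name__ == "__main__":
    # Made-up Laplacian-like samples: mostly near zero, a few larger.
    samples = [-7.2, -0.4, 0.1, 0.3, 2.6, 9.9]
    coded = quantize(samples, 4.0)
    worst = max(abs(a - b) for a, b in zip(samples, coded))
    print("quantized values:", coded)
    print("distinct values:", len(set(coded)), "of", len(set(samples)))
    print("largest error:", worst)
```

Because most Laplacian samples cluster near zero, many of them fall into the same central bin, which is what makes a short code possible.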

Figure 5 illustrates an application of the pyramid to data compression for image transmission. The original image is shown in Fig. 5a. A Laplacian pyramid representation was constructed for this image, then the values were quantized to reduce the effective data rate to just one bit per pixel, then to one-half bit per pixel. Images reconstructed from the quantized data are shown in Figs. 5b and 5c. Humans tend to be more sensitive to errors in low-frequency image components than in high-frequency components. Thus in pyramid compression, nodes at level zero can be quantized more coarsely than those in higher levels. This is fortuitous for compression since three-quarters of the pyramid samples are in the zero level.

Data compression through quantization may also be important in image analysis to reduce the number of bits of precision carried in arithmetic operations. For example, in a study of pyramid-based image motion analysis it was found that data could be reduced to just three bits per sample without noticeably degrading the computed flow field.⁴

These examples suggest that the pyramid is a particularly effective way of representing image information both for transmission and analysis. Salient information is enhanced for analysis, and to the extent that quantization does not degrade analysis, the representation is both compact and robust.

Image analysis

Pyramid methods may be applied to analysis in several ways. Three of these will be outlined here. The first concerns pattern matching and has already been mentioned: to locate a particular target pattern that may occur at any scale within an image, the pattern is convolved with each level of the image pyramid. All levels of the pyramid combined contain just one third more nodes than there are pixels in the original image. Thus the cost of searching for a pattern at many scales is just one third more than that of searching the original image alone.

The complexity of the patterns that may be found in this way is limited by the fact that not all image scales are represented in the pyramid. As defined here, pyramid levels differ in scale by powers of two, or by octave steps in the frequency domain. Power-of-two steps are adequate when the patterns to be located are simple, but complex patterns require a closer match between the scale of the pattern as defined in the target array, and the scale of the pattern as it appears in the image. Variants on the pyramid can easily be defined with square-root-of-two and smaller steps. However, these not only have more levels, but many more samples, and the computational cost of image processing based on such pyramids is correspondingly increased.
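The sample counts behind these statements follow from a geometric series. The sketch below assumes that each level shrinks both image dimensions by a fixed step factor and ignores rounding of level sizes to whole samples; the function name is hypothetical.

```python
def relative_sample_count(step, levels):
    """Total samples in levels 0..levels, relative to the original image,
    when each level shrinks both image dimensions by `step`."""
    return sum(step ** (-2 * l) for l in range(levels + 1))


if __name__ == "__main__":
    # Power-of-two steps: 1 + 1/4 + 1/16 + ... approaches 4/3.
    print("step 2:       ", relative_sample_count(2, 5))
    # Square-root-of-two steps over the same scale range need twice as many
    # levels, and the series 1 + 1/2 + 1/4 + ... approaches 2.
    print("step sqrt(2): ", relative_sample_count(2 ** 0.5, 10))
```

In the limit the total is 1 / (1 - step^-2), which gives one third more samples than the original for step 2 and twice as many for step sqrt(2).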

A second class of operations concerns the estimation of integrated properties within local image regions. For example, a texture may often be characterized by local density or energy measures. Reliable estimates of image motion also require the integration of point estimates of displacement within regions of uniform motion. In such cases early analysis can often be formulated as a three-stage sequence of standard operations. First, an appropriate pattern is convolved with the image (or images, in the case of motion analysis). This selects a particular pattern attribute to be examined in the remaining two stages. Second, a nonlinear intensity transformation is performed on each sample value. Operations may include a simple threshold to detect the presence of the target pattern, a power function to be used in computing texture energy measures, or the product of corresponding samples in two images used in forming correlation measures for motion analysis. Finally the transformed sample values are integrated within local windows to obtain the desired local property measures.
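The three stages can be sketched for a simple texture energy measure at a single scale. Everything specific here is an assumption made for illustration: the horizontal difference pattern, squaring as the power function, a box average as the integration window, and processing only the region where the pattern fits entirely inside the image. The pattern is slid over the image without flipping, which matches convolution only up to a reversal of the pattern. Function names are hypothetical.

```python
def correlate_valid(img, kernel):
    """Stage 1: slide a small pattern over the image (no flipping), keeping
    only positions where the pattern fits entirely inside the image."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[a][b] * img[r + a][c + b]
                 for a in range(kh) for b in range(kw))
             for c in range(len(img[0]) - kw + 1)]
            for r in range(len(img) - kh + 1)]


def pointwise(img, f):
    """Stage 2: apply a nonlinear intensity transformation to every sample."""
    return [[f(v) for v in row] for row in img]


def integrate(img, size):
    """Stage 3: average the transformed samples in size-by-size windows."""
    box = [[1.0 / (size * size)] * size for _ in range(size)]
    return correlate_valid(img, box)


def texture_energy(img, pattern, window):
    """Pattern response, squared, then averaged locally."""
    response = correlate_valid(img, pattern)
    return integrate(pointwise(response, lambda v: v * v), window)


if __name__ == "__main__":
    # Left part flat, right part striped: energy should rise to the right.
    img = [[0, 0, 0, 0, 0, 9, 0, 9] for _ in range(6)]
    energy = texture_energy(img, [[-1, 1]], 3)
    print([round(v, 3) for v in energy[0]])
```

To examine several scales, the same three calls could be applied to each level of an image pyramid in turn instead of enlarging the pattern or the window.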

Pattern scale is an important parameter of both the convolution and integration stages. Pyramid-based processing may be employed at each of these stages to facilitate scale selection and to support efficient computation. A flow diagram for this three-stage analysis is given in Fig. 6. Analysis begins with the construction of the pyramid representation of the image. A feature pattern is then convolved with each level of the pyramid (Stage 1), and the resulting correlation values may be passed through
