首页 > 美文鉴赏

A uniform transform domain video codec bad on dual tree complex wavelet transform

更新时间:2023-07-26 14:46:11 阅读：评论：0

A UNIFORM TRANSFORM DOMAIN VIDEO CODEC BASED ON DUAL

TREE COMPLEX W A VELET TRANSFORM

Kamakshi Sivaramakrishnan†and Truong Nguyen‡

Boston University

Electrical and Computer Engineering Department

Boston MA02215

†kamakshi@bu.edu,‡nguyent@engc.bu.edu

ABSTRACT

This paper describes a uniform transform domain Video

Codec where the motion estimation/compensation(ME)is performed in the transform domain.The estimation tech-nique discusd here is a subpixel transform domain ME bad on the Dual Tree Complex Wavelet Transform(DT CWT)and a maximum pha correlation technique.The DT CWT is a

multiresolutionﬁne-to-coar bandpassﬁl-tered decomposition of each still frame and has desirable properties of shiftablility,directional lectivity and per-fect reconstruction(PR).Estimation isﬁrst performed at theﬁnest resolution and successively proceeds to the coar resolutions using aﬁne-to-coar strategy.This gives mul-tiresolution motion estimates enabling estimation of current frame transform coeﬃcients from the corresponding ones in previous frame.The key diﬀerence of this approach is the transform domain error frames-a uniform transform do-main Video Codec.This further simpliﬁes the encoder and decoder resulting in computational savings with compara-ble performance to the standards.

1.INTRODUCTION

The intensity-bad block matching(BKM)ME techniques are a good approximation of the underlying3-D motion. However the ambiguity of the very measure of motion(pat-tern of intensity changes)may result in an erroneous rep-rentation of the2-D projected motion.The conventional hybrid video coding techniques like the MPEG and H.263 compri of a spatial domain ME(SD-ME)block which in-states a heavily loaded feedback loop in the encoder shown in Fig.1(a).The Inver transform(T−1),is solely for the purpo of spatial domain ME and has been long recog-nized as a major bottleneck of the video coding system for high speed real-time video networks.A po

ssible solution to this problem is to perform the motion estimation in the transform domain(TD-ME),thus moving the transform (T)block out of the loop and removing the(T−1)block. This results in a uniform transform-domain Video codec cir-cumventing the shortcomings in the conventional codec.In this paper,we propo a modiﬁed transform domain Video codec as shown in the Fig.1(b).

课题研究方案

An approach to the problem of transform domain ME is the pha-matching technique,which is feasible if local dis-placements in the spatial domain can be approximated by

Q-1

T-1

SD-ME Buffer

Heavily loaded feedback loop Entropy

Encode

WDR

I/P

Spatial domain

frames

error

CWT

Simplified feedback loop

Q-1

Buffer

TD-ME

Transform

domain

error

Entropy

Encode

WDR

I/P

Hybrid Video Codec Propod Video Codec

(a)(b)

Fig.1.Block diagram of(a)conventional and(b)Propod Video encoder

the linear pha-shifts in the transform domain.Adelson and Bergen et.al[1]designed a bank of spatio-temporal ﬁlters that decompod the signal into channels that are tuned to diﬀerent speeds and orientations and hence repre-ntative of the underlying motion components in the sig-nal.This gives ri to the pha-bad opticalﬂow con-straint

(∇φi)T v+

∂φi

∂t

=0(1) whereφi is the output(transform coeﬃcients)of the channel i and v is the corresponding component of the mo-tion(displacement)vector.This idea was exploited by Ma-gaurey et.al.[2]in a multiresolution coar-to-ﬁne strat-egy to increa the range of estimation.Their technique however suﬀers from the deﬁciencies of a coar-to-ﬁne ME strategy,viz:-

•Inaccurate ME at the coarst resolution,due to the lack of detail and aliasing eﬀects.

•ME performed using coded reference frames involves quantization noi which aﬀects coar resolutions the

most.

•Finite extent subbandﬁlters introduce aliasing com-ponents at coarr resolutions preventing resolution

scaling of translational motion.

•Not applicable for a complete uniform Video Codec as the error frame of the estimation is still in the

spatial domain.

Also,the transform ud in[2],is a complex-valued transform and hence not compatible with standards.The problems have been addresd in theﬁne-to-coar DT CWT-bad ME technique that we describe in this paper.

2.DUAL TREE COMPLEX W A VELET

TRANSFORM(DT CWT)

In order to perform estimation in the transform domain,we need a“shift-invariant”transform.This eliminates the real Discrete Wavelet Transform(DWT)which does not exhibit shift-invariance as it violates the Nyquist sampling rate[3] and information moves from one subband to another un-der translation.It has been shown by Simoncelli et.al[3] that there must be a relaxation of the critical sampling con-dition to achieve approximate shift-invariance resulting in abandoning orthogonality.Kingsbury et.al in[4],has de-signed an approximately shift-invariant implementation of t

he wavelet transform,viz:-Dual Tree Complex Wavelet Transform.It has the added advantage of perfect recon-struction(PR)and directional lectivity.The PR prop-erty is very important for the ME technique suggested in this paper.The transform and its shiftability is brieﬂy dis-cusd in this ction,the details of which can be found in [4],[3].

Approximate shift invariance is possible with a real DWT by“doubling”the sampling rate at each level of the tree.“Doubling”can be achieved by eliminating the downsam-pling operator after theﬁrst level of decomposition provided the samples are uniformly spaced.This is equivalent to two fully-decimated trees(two real DWTs)provided theﬁlters in the two trees meet the delay criterion to maintain uni-form intervals between samples.The dual tree of realﬁlters is shown in the Fig.2.

Fig.2.Dual tree of realﬁlters for the DT CWT,ﬁlters of even and odd length alternate at successive levels The uniform sampling condition alongwith linear pha requirements impo conditions[4]on the length of theﬁl-ters in the two trees.Greater symmetry considerations re-sult in the alternating even and odd lengthﬁlters in DT CWT.Theﬁlter respons are Gausssian-shaped rembling the real and imaginary parts of the complexﬁlters in[2]and are shown in

平行四边形的概念Fig.3

05101520253035

−0.3

−0.2

−0.1

0.1

0.2

0.3

__ real

.... Imag

−− Abs

Fig.3.Impul respons at level4of the DT CWT scaling and wavelet function.

The transform is complex if the output from tree a is considered to be the real part and that from tree b,the imaginary part of the transform coeﬃcient.Alternatively it could be considered to be a limited redundancy oversampled real transform.

2.1.Shiftability of CWT

Simoncelli et.al in[3]have shown that a transform is shiftable if and only if there exists a t of interpolation functions that interpolate the“non-translated”transform coeﬃcients to give the translated

coeﬃcients for any arbitrary transla-tion x0of the spatial domain input signal.This condition is summarized here for a periodic input signal,f(x).The sig-nal f(x)when transformed using the projection functions corresponding to shifted copies(basis functions)of the ker-nel,h(x):

{h(n−∆x−x)|n=0,1,...,N−1}

over a period[0,2π]and sampling interval,∆x=2π

gives transform coeﬃcients,C(n).Thus,

C(n)= 2π0dxh(n∆x−x)f(x),n∈0,1,...,N−1(2)

Simoncelli et.al in[3],have shown that the above trans-formation is shiftable if

h(x0−x)=ΣN−1

n=0

b n(x0)h(n∆x−x)(3)

where x0is the arbitrary shift,b n(x0)is the interpolant.

That is,the arbitrary shifted kernel h(x0−x)can be expresd as a linear combination of the basis functions h(x−n∆x).This implies that the sampled basis t spans the entire subspace of all translations of the kernel.Equiv-alently,the linear subspace spanned by the sample basis is invariant to translation.

Using equation(2)in equation(3),we get

C(n0−n)=ΣN−1

k=0

b k(n0∗

2π

)C(k−n)(4) where n0=x0∗N is the corresponding shift in the transform domain.

If a solution to the above equation exists,then the trans-form is shiftable.In an aliad transform (real DWT),the respon power depends on the signal position:translation of the input signal generally results in a re-distribution of the power content amongst the various frequency subbands.The shiftability constraint is equal to the fact that power of the transform coeﬃcients in the subband is prerved when the input signal is shifted in position [3].This is demon-strated by the DT CWT while the real DWT exhibits a periodic variation in the power content of the transform co-eﬃcients with translations of the input signal and is shown in the Fig.4(a)and (b).This constraint is ud in the ap-proximation of the spatial motion to a linear pha change,giving a clod form expression for the motion vector esti-mate.

2030405060

−0.03

−0.02−0.01

0.010.020.03Translations of I/P signal

P o w e r c o n t e n t o f s u b b a n d 1

国际篮球规则

Level 4剖腹产多久能怀二胎

(a)(b)

Fig.4.(a)Constant power of DT CWT coeﬃcients (b)Pe-riodic power variation of Real DWT coeﬃcients v/s trans-lation of the spatial domain signal

Thus,the DT CWT gives us a “real-valued”transform that is approximately shift-invariant !

3.FINE-TO-COARSE MOTION ESTIMATION The solution to equation (4)is the key to the problem of ME in transform domain and more importantly,it is necessary that we look for a computationally inexpensive interpolant,b n (x 0).The zero-order interpolation technique discusd by Magaurey et.al [2]describes a simple single subpel interpo-lation technique which is good only for small range motion vectors,n 0.Here,we describe a higher order interpolation technique that is more robust and it averages out the noi eﬀects in the transform coeﬃcients.The error measure is the subband squared diﬀerence (SD ),similar to that in [2]SD (n,m )(n ,n 0)=|C (n,m )

(n +n 0)−C (n,m )

(n 0)|2

(5)

where C (n,m )

1is the subpel of the reference frame (un-shifted)at a level of decomposition m and subband n cen-tered around the position n =(n 1,n 2)and C (n,m )

2is the subpel of the current frame (shifted)with similar attributes.Here,n 0is the translation in the transform domain corre-sponding to a motion in the spatial domain.

From equations (3)and (2)

C (n,m )(n +n 0)=Σk B (n,m )

n 0

(k )C (n,m )(n +k )(6)

where B (n,m )

f is the interpolant at subband n and reso-lution m Becau of the Gaussian-shaped impul respon

of the CWT ﬁlters,Magaurey et.al showed that the inter-polant can be expresd as

B (n,m )

n 0

(k )=K n 0(k )e j 2m

[Ω(n,m )]T (n 0−k )墨洛温王朝

(7)

K n 0(k )is the interpolating kernel and Ω(n,m )is the

center frequency [2],along which the pha is a constant and equal to the orientation of the subband.If we weight the surrounding subpels by their pha contribution (measured by the inner product of the center frequency and position in the subband),we get a simple higher order and robust interpolation ,

K (n,m )n 0

(k )=

e j 2

[Ω(n,m )]T (k )

Now,

B (n,m )

n 0

(k )=e j 2m

(Ω(n,m ))T k

(k )e j 2

[Ω(n,m )]T (n 0−k )

(8)

Equation (6)now becomes

C (n,m )(n +n 0)≈e j 2

施工准备工作m

(Ω(n,m ))T (n 0)

Σk C (n,m )(n +k )

(9)

The above interpolation formula and the constant subband power (ction 2.1)is ud in locating the minimum of the SSD surface (equation 5),SD (m )(n ,n 0).Expanding equation(5)it can be shown that minimizing the SSD is equivalent to maximizing the pha correlation of the com-plex coeﬃcients.Using the model for the interpolated pha (equation(9)),the motion estimate,n 0is expresd as a lin-ear relation to the interpolated pha,θ(n,m )(n )

2m (Ω(n,m ))T n 0=

θ(n,m )(n )(10)

where,θ(n,m )(n )

=C (n,m )

(n )

Σk C (n,m )

1(n +k )

(11)

In 2-D,since we have six subbands from the DT CWT

decomposition,the motion estimate n 0corresponding to the subpel n at any resolution is obtained by averaging over all the six subbands and is given by

v =arg min n

Σ6n =1SD

(n,m )

(n ,n 0)(12)

Using the linear pha model,a clod-form expression

for the minimizer of SD (n,m )is obtained [5],[2].The ad-vantage of the pha model in equation (10)is that is not limited by the range of the motion vector n 0.Hence we need not perform any further

reﬁnement to obtain a true minimum.

3.1.Algorithm Structure

The block diagram of the transform domain ME is shown in Fig.5which is lf-explanatory in itlf.The motion com-pensation is done using a wavelet bad lowpass/bandpass interpolation [6]technique.As shown in Fig.5,the mo-tion compensation is performed using the motion vectors and the reference frame transform coeﬃcients to generate the predicted frame transform coeﬃcients.The transform

)

Fig.5.Block Diagram of the propod ME technique

domain error frames thus generated are coded using the Wavelet Diﬀerence Reduction[7]algorithm.Thus,we have transform domain error frames and motion vectors(MV)of transform coeﬃcients!

4.RESULTS

The result of the propod ME algorithm applied to the test quence of’Miss America’is shown in Fig.6as a motion ﬁeld reprentation.The comparison of the propod ME technique with standard BKM techniques and coar-to-ﬁne complex multiresolution ME(CDWT2)in[2]is shown in Fig.7(a)and Fig.7(b)show the PSNR of the recon-structed frames for various bit allocations of coding the er-ror frame.The signiﬁcant improvement in the SNRs of the reconstructed frames is attributed to the improved pha model(averaged contribution of the spatial neighbouring coeﬃcients),theﬁne-to-coar ME strategy and the higher order wavelet-bad interpolation technique.Also,theﬁne-to-coars

e ME strategy engenders a better estimation accu-racy.The grouping of coeﬃcients in the estimation of the motion vectors into blocks gives higher computational speed in addition to averaging out the noi eﬀects in estimation. The wavelet-bad bandpass interpolation technique gives a high subpixel accuracy in the order of

dyadic fractions which is very uful in estimating fractional displacements of motion.Additionally,since TD-ME results in a true rep-rentation of the underlying motion which is independent of the intensity,light eﬀects and the acquisition methods of capture of the video quence.Thus,we conclude that we have an eﬃcient uniform transform domain Video Codec with higher accuracy in reduction of temporal redundancy.

5.REFERENCES

[1] E.H.Adelson and J.R.Bergen,“Spatiotemporal en-

ergy models for the perception of motion,”J.Opt.Soc.

Amer.A.,vol.2,pp.284–299,1985.

Fig.6.Motionﬁeld of the”Miss America”quence

0510152025苹果手机怎样开机

frame number

(

)

__.__ Propod

____ H.263

−−−− MPEG

........... CDWT2

0510152025

frames −>

0.5 b/s

1.5 b/s

2.5 b/s

5 b/s

Carphone quence

Fig.7.(a)Comparison of the PSNR of the standards with propod technique(b)PSNR of the reconstructed quence at various bpp of coding the error frames

[2]Julian Magaurey and Nick Kingsbury,“Motion estima-

tion using a complex-valued wavelet transform,”IEEE Trans on Signal Processing,vol.46,no.4,pp.1069–84, April1998.

[3] E.P.Simoncelli,W.T.Freeman, E.H.Adelson,and

David J.Heeger,“Shiftable multiscale transforms,”

Information Theory,vol.38,no.2,pp.

587–607,March1992.

[4]Nick Kingsbury,“Image processing using complex

wavelets,”Phil.Trans.Royal Society London A, September1999.

[5]Kamakshi Sivaramakrishnan,“A wavelet-bad trans-

form domain motion estimation technique for video compression,”Masters Thesis,Dept.of Electrical and Computer Engg.,Boston University,August2000. [6]R.A.Gopinath and C.S.Burrus,“Wavelet-bad low-

pass/bandpass interpolation,”Proc.of ICASSP’92, vol.4,pp.384–388,March1992.

[7]James S.Walker and Truong Nguyen,,”Wavelet-Bad

七年级下册语文书古诗

Image Compression,Chapter of CRC Press book:CRC Handbook of Transforms and Data Compression(to be published).

本文发布于:2023-07-26 14:46:11，感谢您对本站的认可！

本文链接：https://www.wtabcd.cn/fanwen/fan/89/1097491.html

上一篇：医学分子生物学重点名词解释问答（2）

下一篇：Overview of bacterial expression systems for heterologous protein production from molecular and bioc