A UNIFORM TRANSFORM DOMAIN VIDEO CODEC BASED ON DUAL
TREE COMPLEX W A VELET TRANSFORM
Kamakshi Sivaramakrishnan†and Truong Nguyen‡
Boston University
Electrical and Computer Engineering Department
Boston MA02215
†kamakshi@bu.edu,‡nguyent@engc.bu.edu
ABSTRACT
This paper describes a uniform transform domain Video
Codec where the motion estimation/compensation(ME)is performed in the transform domain.The estimation tech-nique discusd here is a subpixel transform domain ME bad on the Dual Tree Complex Wavelet Transform(DT CWT)and a maximum pha correlation technique.The DT CWT is a
multiresolutionfine-to-coar bandpassfil-tered decomposition of each still frame and has desirable properties of shiftablility,directional lectivity and per-fect reconstruction(PR).Estimation isfirst performed at thefinest resolution and successively proceeds to the coar resolutions using afine-to-coar strategy.This gives mul-tiresolution motion estimates enabling estimation of current frame transform coefficients from the corresponding ones in previous frame.The key difference of this approach is the transform domain error frames-a uniform transform do-main Video Codec.This further simplifies the encoder and decoder resulting in computational savings with compara-ble performance to the standards.
1.INTRODUCTION
The intensity-bad block matching(BKM)ME techniques are a good approximation of the underlying3-D motion. However the ambiguity of the very measure of motion(pat-tern of intensity changes)may result in an erroneous rep-rentation of the2-D projected motion.The conventional hybrid video coding techniques like the MPEG and H.263 compri of a spatial domain ME(SD-ME)block which in-states a heavily loaded feedback loop in the encoder shown in Fig.1(a).The Inver transform(T−1),is solely for the purpo of spatial domain ME and has been long recog-nized as a major bottleneck of the video coding system for high speed real-time video networks.A po
ssible solution to this problem is to perform the motion estimation in the transform domain(TD-ME),thus moving the transform (T)block out of the loop and removing the(T−1)block. This results in a uniform transform-domain Video codec cir-cumventing the shortcomings in the conventional codec.In this paper,we propo a modified transform domain Video codec as shown in the Fig.1(b).
课题研究方案
An approach to the problem of transform domain ME is the pha-matching technique,which is feasible if local dis-placements in the spatial domain can be approximated by
-
+
+
Q
T
Q-1
T-1
+
SD-ME Buffer
Heavily loaded feedback loop Entropy
Encode
WDR
I/P
Spatial domain
frames
error
CWT
Simplified feedback loop
Q
DT
+
-
Q-1
+
+
Buffer
TD-ME
Transform
domain
error
Entropy
Encode
WDR
I/P
Hybrid Video Codec Propod Video Codec
(a)(b)
Fig.1.Block diagram of(a)conventional and(b)Propod Video encoder
the linear pha-shifts in the transform domain.Adelson and Bergen et.al[1]designed a bank of spatio-temporal filters that decompod the signal into channels that are tuned to different speeds and orientations and hence repre-ntative of the underlying motion components in the sig-nal.This gives ri to the pha-bad opticalflow con-straint
(∇φi)T v+
∂φi
∂t
=0(1) whereφi is the output(transform coefficients)of the channel i and v is the corresponding component of the mo-tion(displacement)vector.This idea was exploited by Ma-gaurey et.al.[2]in a multiresolution coar-to-fine strat-egy to increa the range of estimation.Their technique however suffers from the deficiencies of a coar-to-fine ME strategy,viz:-
•Inaccurate ME at the coarst resolution,due to the lack of detail and aliasing effects.
•ME performed using coded reference frames involves quantization noi which affects coar resolutions the
most.
•Finite extent subbandfilters introduce aliasing com-ponents at coarr resolutions preventing resolution
scaling of translational motion.
•Not applicable for a complete uniform Video Codec as the error frame of the estimation is still in the
spatial domain.
Also,the transform ud in[2],is a complex-valued transform and hence not compatible with standards.The problems have been addresd in thefine-to-coar DT CWT-bad ME technique that we describe in this paper.
2.DUAL TREE COMPLEX W A VELET
TRANSFORM(DT CWT)
In order to perform estimation in the transform domain,we need a“shift-invariant”transform.This eliminates the real Discrete Wavelet Transform(DWT)which does not exhibit shift-invariance as it violates the Nyquist sampling rate[3] and information moves from one subband to another un-der translation.It has been shown by Simoncelli et.al[3] that there must be a relaxation of the critical sampling con-dition to achieve approximate shift-invariance resulting in abandoning orthogonality.Kingsbury et.al in[4],has de-signed an approximately shift-invariant implementation of t
he wavelet transform,viz:-Dual Tree Complex Wavelet Transform.It has the added advantage of perfect recon-struction(PR)and directional lectivity.The PR prop-erty is very important for the ME technique suggested in this paper.The transform and its shiftability is briefly dis-cusd in this ction,the details of which can be found in [4],[3].
Approximate shift invariance is possible with a real DWT by“doubling”the sampling rate at each level of the tree.“Doubling”can be achieved by eliminating the downsam-pling operator after thefirst level of decomposition provided the samples are uniformly spaced.This is equivalent to two fully-decimated trees(two real DWTs)provided thefilters in the two trees meet the delay criterion to maintain uni-form intervals between samples.The dual tree of realfilters is shown in the Fig.2.
Fig.2.Dual tree of realfilters for the DT CWT,filters of even and odd length alternate at successive levels The uniform sampling condition alongwith linear pha requirements impo conditions[4]on the length of thefil-ters in the two trees.Greater symmetry considerations re-sult in the alternating even and odd lengthfilters in DT CWT.Thefilter respons are Gausssian-shaped rembling the real and imaginary parts of the complexfilters in[2]and are shown in
平行四边形的概念Fig.3
05101520253035
−0.3
−0.2
−0.1
0.1
0.2
0.3
__ real
.... Imag
−− Abs
Fig.3.Impul respons at level4of the DT CWT scaling and wavelet function.
The transform is complex if the output from tree a is considered to be the real part and that from tree b,the imaginary part of the transform coefficient.Alternatively it could be considered to be a limited redundancy oversampled real transform.
2.1.Shiftability of CWT
Simoncelli et.al in[3]have shown that a transform is shiftable if and only if there exists a t of interpolation functions that interpolate the“non-translated”transform coefficients to give the translated
coefficients for any arbitrary transla-tion x0of the spatial domain input signal.This condition is summarized here for a periodic input signal,f(x).The sig-nal f(x)when transformed using the projection functions corresponding to shifted copies(basis functions)of the ker-nel,h(x):
{h(n−∆x−x)|n=0,1,...,N−1}
over a period[0,2π]and sampling interval,∆x=2π
N
gives transform coefficients,C(n).Thus,
C(n)= 2π0dxh(n∆x−x)f(x),n∈0,1,...,N−1(2)
Simoncelli et.al in[3],have shown that the above trans-formation is shiftable if
h(x0−x)=ΣN−1
n=0
b n(x0)h(n∆x−x)(3)
where x0is the arbitrary shift,b n(x0)is the interpolant.
That is,the arbitrary shifted kernel h(x0−x)can be expresd as a linear combination of the basis functions h(x−n∆x).This implies that the sampled basis t spans the entire subspace of all translations of the kernel.Equiv-alently,the linear subspace spanned by the sample basis is invariant to translation.
Using equation(2)in equation(3),we get
C(n0−n)=ΣN−1
k=0
b k(n0∗
2π
N
)C(k−n)(4) where n0=x0∗N is the corresponding shift in the transform domain.
If a solution to the above equation exists,then the trans-form is shiftable.In an aliad transform (real DWT),the respon power depends on the signal position:translation of the input signal generally results in a re-distribution of the power content amongst the various frequency subbands.The shiftability constraint is equal to the fact that power of the transform coefficients in the subband is prerved when the input signal is shifted in position [3].This is demon-strated by the DT CWT while the real DWT exhibits a periodic variation in the power content of the transform co-efficients with translations of the input signal and is shown in the Fig.4(a)and (b).This constraint is ud in the ap-proximation of the spatial motion to a linear pha change,giving a clod form expression for the motion vector esti-mate.
10
2030405060
−0.03
−0.02−0.01
0.010.020.03Translations of I/P signal
P o w e r c o n t e n t o f s u b b a n d 1
国际篮球规则
Level 4剖腹产多久能怀二胎
(a)(b)
Fig.4.(a)Constant power of DT CWT coefficients (b)Pe-riodic power variation of Real DWT coefficients v/s trans-lation of the spatial domain signal
Thus,the DT CWT gives us a “real-valued”transform that is approximately shift-invariant !
3.FINE-TO-COARSE MOTION ESTIMATION The solution to equation (4)is the key to the problem of ME in transform domain and more importantly,it is necessary that we look for a computationally inexpensive interpolant,b n (x 0).The zero-order interpolation technique discusd by Magaurey et.al [2]describes a simple single subpel interpo-lation technique which is good only for small range motion vectors,n 0.Here,we describe a higher order interpolation technique that is more robust and it averages out the noi effects in the transform coefficients.The error measure is the subband squared difference (SD ),similar to that in [2]SD (n,m )(n ,n 0)=|C (n,m )
1
(n +n 0)−C (n,m )
2
(n 0)|2
(5)
where C (n,m )
1is the subpel of the reference frame (un-shifted)at a level of decomposition m and subband n cen-tered around the position n =(n 1,n 2)and C (n,m )
2is the subpel of the current frame (shifted)with similar attributes.Here,n 0is the translation in the transform domain corre-sponding to a motion in the spatial domain.
From equations (3)and (2)
C (n,m )(n +n 0)=Σk B (n,m )
n 0
(k )C (n,m )(n +k )(6)
where B (n,m )
f is the interpolant at subband n and reso-lution m Becau of the Gaussian-shaped impul respon
of the CWT filters,Magaurey et.al showed that the inter-polant can be expresd as
B (n,m )
n 0
(k )=K n 0(k )e j 2m
[Ω(n,m )]T (n 0−k )墨洛温王朝
(7)
K n 0(k )is the interpolating kernel and Ω(n,m )is the
center frequency [2],along which the pha is a constant and equal to the orientation of the subband.If we weight the surrounding subpels by their pha contribution (measured by the inner product of the center frequency and position in the subband),we get a simple higher order and robust interpolation ,
K (n,m )n 0
(k )=
e j 2
m
[Ω(n,m )]T (k )
Now,
B (n,m )
n 0
(k )=e j 2m
(Ω(n,m ))T k
(k )e j 2
m
[Ω(n,m )]T (n 0−k )
(8)
Equation (6)now becomes
C (n,m )(n +n 0)≈e j 2
施工准备工作m
(Ω(n,m ))T (n 0)
Σk C (n,m )(n +k )
(9)
The above interpolation formula and the constant subband power (ction 2.1)is ud in locating the minimum of the SSD surface (equation 5),SD (m )(n ,n 0).Expanding equation(5)it can be shown that minimizing the SSD is equivalent to maximizing the pha correlation of the com-plex coefficients.Using the model for the interpolated pha (equation(9)),the motion estimate,n 0is expresd as a lin-ear relation to the interpolated pha,θ(n,m )(n )
2m (Ω(n,m ))T n 0=
θ(n,m )(n )(10)
where,θ(n,m )(n )
=C (n,m )
2
(n )
Σk C (n,m )
1(n +k )
(11)
In 2-D,since we have six subbands from the DT CWT
decomposition,the motion estimate n 0corresponding to the subpel n at any resolution is obtained by averaging over all the six subbands and is given by
v =arg min n
Σ6n =1SD
(n,m )
(n ,n 0)(12)
Using the linear pha model,a clod-form expression
for the minimizer of SD (n,m )is obtained [5],[2].The ad-vantage of the pha model in equation (10)is that is not limited by the range of the motion vector n 0.Hence we need not perform any further
refinement to obtain a true minimum.
3.1.Algorithm Structure
The block diagram of the transform domain ME is shown in Fig.5which is lf-explanatory in itlf.The motion com-pensation is done using a wavelet bad lowpass/bandpass interpolation [6]technique.As shown in Fig.5,the mo-tion compensation is performed using the motion vectors and the reference frame transform coefficients to generate the predicted frame transform coefficients.The transform
1
)
Fig.5.Block Diagram of the propod ME technique
domain error frames thus generated are coded using the Wavelet Difference Reduction[7]algorithm.Thus,we have transform domain error frames and motion vectors(MV)of transform coefficients!
4.RESULTS
The result of the propod ME algorithm applied to the test quence of’Miss America’is shown in Fig.6as a motion field reprentation.The comparison of the propod ME technique with standard BKM techniques and coar-to-fine complex multiresolution ME(CDWT2)in[2]is shown in Fig.7(a)and Fig.7(b)show the PSNR of the recon-structed frames for various bit allocations of coding the er-ror frame.The significant improvement in the SNRs of the reconstructed frames is attributed to the improved pha model(averaged contribution of the spatial neighbouring coefficients),thefine-to-coar ME strategy and the higher order wavelet-bad interpolation technique.Also,thefine-to-coars
e ME strategy engenders a better estimation accu-racy.The grouping of coefficients in the estimation of the motion vectors into blocks gives higher computational speed in addition to averaging out the noi effects in estimation. The wavelet-bad bandpass interpolation technique gives a high subpixel accuracy in the order of
dyadic fractions which is very uful in estimating fractional displacements of motion.Additionally,since TD-ME results in a true rep-rentation of the underlying motion which is independent of the intensity,light effects and the acquisition methods of capture of the video quence.Thus,we conclude that we have an efficient uniform transform domain Video Codec with higher accuracy in reduction of temporal redundancy.
5.REFERENCES
[1] E.H.Adelson and J.R.Bergen,“Spatiotemporal en-
ergy models for the perception of motion,”J.Opt.Soc.
Amer.A.,vol.2,pp.284–299,1985.
Fig.6.Motionfield of the”Miss America”quence
0510152025苹果手机怎样开机
25
30
35
40
45
50
55
frame number
P
S
N
R
(
d
B
)
__.__ Propod
ME
____ H.263
−−−− MPEG
........... CDWT2
0510152025
38
40
42
44
46
48
50
52
54
frames −>
P
S
N
R(
d
B)
0.5 b/s
1.5 b/s
2.5 b/s
5 b/s
Carphone quence
Fig.7.(a)Comparison of the PSNR of the standards with propod technique(b)PSNR of the reconstructed quence at various bpp of coding the error frames
[2]Julian Magaurey and Nick Kingsbury,“Motion estima-
tion using a complex-valued wavelet transform,”IEEE Trans on Signal Processing,vol.46,no.4,pp.1069–84, April1998.
[3] E.P.Simoncelli,W.T.Freeman, E.H.Adelson,and
David J.Heeger,“Shiftable multiscale transforms,”
Information Theory,vol.38,no.2,pp.
587–607,March1992.
[4]Nick Kingsbury,“Image processing using complex
wavelets,”Phil.Trans.Royal Society London A, September1999.
[5]Kamakshi Sivaramakrishnan,“A wavelet-bad trans-
form domain motion estimation technique for video compression,”Masters Thesis,Dept.of Electrical and Computer Engg.,Boston University,August2000. [6]R.A.Gopinath and C.S.Burrus,“Wavelet-bad low-
pass/bandpass interpolation,”Proc.of ICASSP’92, vol.4,pp.384–388,March1992.
[7]James S.Walker and Truong Nguyen,,”Wavelet-Bad
七年级下册语文书古诗
Image Compression,Chapter of CRC Press book:CRC Handbook of Transforms and Data Compression(to be published).