Removing Rolling Shutter Wobble
Simon Baker, Eric Bennett, Sing Bing Kang, and Richard Szeliski
Microsoft Corporation
One Microsoft Way
Redmond, WA 98052
Abstract
We present an algorithm to remove wobble artifacts from a video captured with a rolling shutter camera undergoing large accelerations or jitter. We show how estimating the rapid motion of the camera can be posed as a temporal super-resolution problem. The low-frequency measurements are the motions of pixels from one frame to the next. These measurements are modeled as temporal integrals of the underlying high-frequency jitter of the camera. The estimated high-frequency motion of the camera is then used to re-render the sequence as though all the pixels in each frame were imaged at the same time. We also present an auto-calibration algorithm that can estimate the time between the capture of subsequent rows in the camera.
1. Introduction
Most digital still cameras, cellphone cameras, and webcams use CMOS sensors. CMOS video cameras are also becoming increasingly popular, from the low-end Flip camera [12] to the high-end Red camera [13]. To maximize the fill factor, CMOS sensors are commonly read out line-by-line and use a "rolling shutter"; i.e. the capture time of each row is slightly after the capture time of the previous row.
Rolling shutter cameras suffer from three main artifacts: (1) shear, (2) partial exposure, and (3) wobble. Shearing occurs when the camera undergoes a constant (or smoothly varying) motion. Shearing can be corrected by computing the global motion and then warping the frames appropriately [11, 7]. In the presence of independently moving (but slowly accelerating) objects, a full optical flow field can be used to perform the correction [5]. Partial exposure occurs when a rolling shutter is used to image fast-changing illumination such as a flash, a strobe light, or lightning [5].
Wobble occurs when there are large accelerations or the motion is at a higher frequency than the frame rate of the camera. Wobble is particularly pronounced for cameras mounted on helicopters, cars, and motorbikes. Wobble artifacts are often largely imperceptible in single images, even in cases where the temporal artifacts in the video are quite dramatic. See the videos in the supplemental material (pausing at various points) for examples. About the only prior work that has come close to addressing rolling shutter wobble is [8]. This paper uses camera motion and content-preserving warps for video stabilization. Empirically, the authors noted that their algorithm tends to reduce rolling shutter wobble. However, the algorithm does not model the high-frequency temporal motion [10, 1] necessary for general-purpose rolling shutter correction.
In this paper, we present an algorithm to remove rolling shutter wobble in video. In particular, we show how estimating the high-frequency jitter of the camera can be posed as a temporal super-resolution problem [3, 14]. The temporal low-frequency measurements (analogous to the low-resolution pixels) are the motions of pixels from one frame to the next. These measurements are modeled as temporal integrals of the underlying high-frequency jitter of the camera. The estimated high-frequency motion of the camera is then used to re-render the video as though all the pixels in each frame were imaged at the same time.
We begin in Section 2.1 by deriving our algorithm for a per-row translational jitter model, analogous to the one in [6]. It is straightforward to generalize this model to a per-row affine model. See [2] for the details. In Section 2.2 we generalize the model to include independently moving objects. In particular, we model the motion of each pixel as the combination of a low-frequency independent motion and a high-frequency camera jitter.
Our algorithm has a single calibration parameter, the time between the capture of two subsequent rows as a fraction of the time between two subsequent frames (i.e. one time unit is the time between subsequent frames). In Section 2.3 we investigate the calibration of this parameter. We first derive a closed-form expression relating the solution of the super-resolution constraints with the correct parameter to the solution with another setting (for the translational model). This result is important because it implies that the performance of our algorithm should be robust to the setting of the calibration parameter, a result which we empirically validate. Second, we present an auto-calibration algorithm that can estimate the calibration parameter from a short segment of a video containing jitter.
2. Theory and Algorithms
Assume that the rolling shutter camera captures a video:

    I^RS_T(X, Y)  for  T = 0, 1, 2, ...    (1)

Assume that the Y-th row in image I^RS_T(X, Y) is captured at time T + τY, where we define the capture time of each row to be the mid-point of the exposure period for that row. See Figure 1 for an illustration. The non-zero exposure period means motion blur may be present. The shift over time in the mid-point of the exposure period still causes "rolling shutter" artifacts even in the presence of motion blur. Note that in this paper we do not address motion blur removal.
For now, we assume that τ has been calibrated and so is known. See Section 2.3 for a method to calibrate τ. We wish to correct the rolling shutter images I^RS_T(X, Y) to generate a sequence I^GS_T(X, Y) that might have been captured by a camera with a global shutter. We are free to choose the time T + β at which the global shutter image I^GS_T(X, Y) would have been captured. A natural choice is:

    β = τ × (M − 1) / 2    (2)

where M is the number of rows in the image, because this choice minimizes the maximum correction and means that the center of the image will require the least correction. In this paper, we always use the value of β in Equation (2).
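As a quick numeric illustration of this choice (the sensor size and τ below are made-up values, not from the paper), the capture time of row Y in frame T is T + τY, and β from Equation (2) sits at the middle of the readout:

```python
# Illustrative values (not from the paper): a 480-row sensor whose readout
# spans 90% of the frame period; tau is in frame periods, as in the paper.
M = 480
tau = 0.9 / (M - 1)              # inter-row delay
beta = tau * (M - 1) / 2         # Eq. (2): the global shutter reference time

# Row Y of frame T is captured at T + tau*Y; the correction needed for row Y
# grows with |tau*Y - beta|: smallest at the center row, largest at the ends.
offsets = [abs(tau * Y - beta) for Y in range(M)]
```

The offsets confirm that the center row needs the least correction while the first and last rows need the most.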
2.1. High-Frequency Translational Camera Jitter
We begin by describing our algorithm using a per-row translational model of the camera jitter. In this model, the motions of all the pixels in each row are assumed to be the same. The motion of each subsequent row can be different, however.

2.1.1 Motion Model
Denote the temporal trajectory of the projected location of a scene point x(t) = (x(t), y(t)). We use lower case t, x, y to denote continuous variables and upper case T, X, Y to denote integer frame, column, and row numbers. Note that we model the continuous path of the point x(t) even through time periods during which it is not imaged. If the camera is jittering, x(t) will vary rapidly between two subsequent frames T and T + 1. We assume that this high-frequency variation can be described using the following differential equation:
    dx/dt = m^hf(x; p(t)).    (3)

The parameters p(t) are a function of continuous time t. At any given time t, m^hf(x; p(t)) describes a low-parameter spatial motion model. For example, m^hf could be a translation. In this case, the parameter vector p(t) = (p_1(t), p_2(t)) has two components, and:

    m^hf(x; p(t)) = (p_1(t), p_2(t)).    (4)

In the remainder of this section, we use this translational model. See [2] for the derivation of our algorithm for a per-row affine model. In the translational model, at any given time t, all the points in the image are moving with the same motion. However, over the duration of a frame from T to T + 1, the translation may vary temporally. In the context of image deblurring with a global shutter camera, this translational model can result in arbitrarily complex blur kernels [6]. The blur kernels are the same at each pixel, however. In our case of a rolling shutter sequence, the low-frequency (frame-to-frame) motion of each row in the image can be different because each row is imaged at a different time. The temporally high-frequency translational model can result in complex, non-rigid image deformations, i.e. rolling shutter wobble.
Equation (3) defines a differential equation for x(t). To proceed, this equation must be solved. In the case of translation, the continuous analytical solution is:

    x(t) = x(t_0) + ∫_{t_0}^{t} p(s) ds.    (5)

For more complicated motion models, deriving an analytic solution of Equation (3) may be impossible. An approximate or numerical solution can be used instead. See Section 2.2 for the approximation used in the case of independent motion and [2] for the affine case.
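Since Section 2.1.3 represents p(t) by uniform samples with piecewise-constant interpolation, the integral in Equation (5) reduces to summing sample values weighted by segment lengths. A minimal sketch of this evaluation (function and variable names are ours):

```python
import numpy as np

def integrate_p(p_samples, dt, t0, t1):
    """Integral of piecewise-constant motion p(s) from t0 to t1 (t0 <= t1).

    p_samples: (N, 2) array with p(s) = p_samples[k] for s in [k*dt, (k+1)*dt),
    matching the piecewise-constant representation used in Section 2.1.3.
    """
    total = np.zeros(2)
    k = int(np.floor(t0 / dt))
    t = t0
    while t1 - t > 1e-12:                        # walk the overlapped samples
        seg_end = min((k + 1) * dt, t1)          # stay inside sample k
        total += (seg_end - t) * p_samples[k]    # p is constant on this segment
        t, k = seg_end, k + 1
    return total

# x(t) = x(t0) + integral of p (Eq. 5), e.g. over most of one frame interval:
p = np.tile([[0.5, -0.25]], (30, 1))             # 30 samples per unit time
disp = integrate_p(p, dt=1 / 30, t0=0.2, t1=1.0) # constant motion over 0.8 units
```

The same routine can evaluate the integrals appearing in the measurement constraints and in the correction process below.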
2.1.2 Measurement Constraints
We assume that measurements of the motion are available in the form of point correspondences. In this paper, all correspondences are obtained using the Black and Anandan optical flow algorithm [4]. Note that rolling shutter distortions do not affect the extent to which brightness constancy holds. We subsample the flow fields, as described in detail in Section 3, to obtain a discrete set of correspondences. An alternative approach would be to use a feature detection and matching algorithm such as [9]. Note, however, that care should be taken as non-rigid image deformations do affect the values of most feature descriptors to some extent.

Figure 2. Measurement Constraints: Each constraint in Equation (7) specifies a known value for the integral of the unknown higher-resolution temporally varying motion parameters over a known interval. The constraints from points in nearby rows closely overlap each other, analogously to how sub-pixel shifts generate overlapping constraints in image super-resolution [3].
We assume each correspondence takes the form:

    Corr_i = (T_i, T_i + K_i, x_{T_i}, x_{T_i+K_i}).    (6)

This correspondence means that a point at x_{T_i} = (x_{T_i}, y_{T_i}) in image I^RS_{T_i} was matched, tracked, or flowed to the point x_{T_i+K_i} = (x_{T_i+K_i}, y_{T_i+K_i}) in the second image I^RS_{T_i+K_i}. The times T_i and T_i + K_i are integers. Although in many cases the first location x_{T_i} is integer valued, the second location x_{T_i+K_i} should be estimated with sub-pixel accuracy. For generality we denote both as real-valued.
Each correspondence can be substituted into Equation (5) to generate a measurement constraint:

    MC(Corr_i) = x_{T_i+K_i} − x_{T_i} − ∫_{T_i+τ y_{T_i}}^{T_i+K_i+τ y_{T_i+K_i}} p(s) ds    (7)

where ideally MC(Corr_i) = 0. Note that the integral is from the time that the point was imaged in the first image, T_i + τ y_{T_i}, to the time at which it was imaged in the second image, T_i + K_i + τ y_{T_i+K_i}; i.e. the length of the interval is not exactly K_i. Also note that the constraints in Equation (7) are temporal analogs of the constraints in image super-resolution [3]. Each constraint specifies a value for the integral of the unknown higher-resolution temporally varying motion parameters over a known interval. See Figure 2 for an illustration. The constraints from points in neighboring rows closely overlap each other, analogously to how sub-pixel shifts create overlapping constraints in [3].
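Because p(t) is represented by piecewise-constant samples (Section 2.1.3), each constraint in Equation (7) is linear in the unknown samples: the integral becomes a weighted sum whose weights are the overlap of each sample interval with the integration interval. A small sketch under that representation (names and numbers are illustrative):

```python
import numpy as np

def constraint_row(t_start, t_end, n_samples, dt):
    """Weights w such that  integral_{t_start}^{t_end} p(s) ds  =  w @ p_samples
    for piecewise-constant samples p(s) = p_samples[k] on [k*dt, (k+1)*dt)."""
    w = np.zeros(n_samples)
    for k in range(n_samples):
        lo, hi = k * dt, (k + 1) * dt
        w[k] = max(0.0, min(hi, t_end) - max(lo, t_start))  # overlap length
    return w

# One constraint per correspondence (Eq. 7): the measured flow x_{T+K} - x_T
# equals the integral of p over [T + tau*y_T,  T + K + tau*y_{T+K}].
tau, dt = 0.9 / 479, 1 / 30               # illustrative calibration and sampling
T, K, y0, y1 = 0, 1, 100, 104
w = constraint_row(T + tau * y0, T + K + tau * y1, n_samples=60, dt=dt)
```

Constraints from nearby rows shift these weights slightly, giving the overlapping, super-resolution-style system described above.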
One important difference is that the integral in Equation (7) is 1D (albeit of a 2D vector quantity), whereas in image super-resolution it is a 2D integral (of 1-3 band images). Image super-resolution is known to be relatively poorly conditioned [3]. Obtaining resolution improvements beyond a factor of 4-6 or so is difficult. In [14], however, it was shown that 1D super-resolution problems are far better conditioned. Roughly speaking, the condition number in 1D is the square-root of the condition number in the corresponding 2D case. Consistent with this analysis, we encountered diminishing returns when attempting to enhance the temporal resolution by more than a factor of 30 or so.
2.1.3 Regularization and Optimization
We regularize the problem using a standard first-order smoothness term that encourages the temporal derivative of the motion p to be small. We use L1 norms to measure errors in both the measurement constraints and the regularization. We used the following global energy function:

    Σ_{Corr_i} |MC(Corr_i)| + λ Σ_{j=1,2} ∫ |dp_j/ds| ds.    (8)
The measurement constraints are likely to contain a number of outliers, both due to independently moving objects and gross errors in the flow field. An L1 norm is therefore preferable to an L2 norm. We could also use an even more robust energy function, but such a choice would make the optimization more complex. We use an L1 norm rather than an L2 norm for the regularization term, as it is reasonable to expect the motion to be piecewise smooth, with discontinuities during rapid accelerations.
We represent the continuous motion parameters with a uniform sampling across time. The exact number of samples used in each experiment is reported in Section 3. As in [3], we use a piecewise-constant interpolation of the samples when estimating the integral in the measurement constraints. With this representation, both the measurement constraints and the regularization term are linear in the unknown motion parameters. We solved the resulting convex L1 optimization using linear programming.
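Both the constraints and the regularizer are linear in the unknowns, so the L1 objective in Equation (8) can be minimized by an off-the-shelf LP solver via the standard slack-variable reduction. The paper does not spell out its LP formulation, so the sketch below is one reasonable reconstruction (our notation):

```python
import numpy as np
from scipy.optimize import linprog

def solve_l1(A, b, D, lam):
    """min_p ||A p - b||_1 + lam * ||D p||_1 via a standard LP reduction:
    introduce slacks u >= |A p - b| and v >= |D p| elementwise."""
    m, n = A.shape
    r = D.shape[0]
    # Variables z = [p (n), u (m), v (r)]; minimize sum(u) + lam * sum(v).
    c = np.concatenate([np.zeros(n), np.ones(m), lam * np.ones(r)])
    # Encode  +-(A p - b) <= u  and  +-(D p) <= v  as inequality rows.
    G = np.block([
        [A,  -np.eye(m), np.zeros((m, r))],
        [-A, -np.eye(m), np.zeros((m, r))],
        [D,  np.zeros((r, m)), -np.eye(r)],
        [-D, np.zeros((r, m)), -np.eye(r)],
    ])
    h = np.concatenate([b, -b, np.zeros(2 * r)])
    res = linprog(c, A_ub=G, b_ub=h,
                  bounds=[(None, None)] * n + [(0, None)] * (m + r))
    return res.x[:n]

# Toy problem: 4 motion samples, identity measurements, first-difference D.
A = np.eye(4)
b = np.array([0.0, 1.0, 1.0, 0.0])
D = np.diff(np.eye(4), axis=0)        # rows compute p[k+1] - p[k]
p_hat = solve_l1(A, b, D, lam=0.1)    # stays close to b: data term dominates
```

Stacking the overlap weights of Equation (7) into A and first differences of the samples into D recovers the sampled motion parameters in the same way.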
2.1.4 Correction Process
We wish to estimate the global shutter pixels I^GS_T(X, Y) using the rolling shutter pixels I^RS_T(x, y). We assume that X, Y, and T are known integer values, whereas x and y are unknown subpixel locations. Once we know x and y, we can (bicubically) interpolate the rolling shutter image. To estimate x and y, it helps to also estimate the time t at which this rolling shutter pixel was captured. Figure 3 contains a visualization of a 2D (y, t) slice through 3D (x, y, t) space. We project out the x variable and only show one pixel in each row of the image. Under the translational model, the motion of each pixel in a row is identical.
Figure 3. The correction process solves for the intersection between the rolling shutter plane and the global shutter path; the rolling shutter image is then interpolated at this point.
The rolling shutter pixels I^RS_T(x, y) lie on the plane:

    t = T + τy.    (9)

Compensating for the estimated motion, the global shutter pixel I^GS_T(X, Y) starts at 3D location (X, Y, T + β) and moves along the path:

    (x, y) = (X, Y) + ( ∫_{T+β}^{t} p_1(s) ds,  ∫_{T+β}^{t} p_2(s) ds ).    (10)

The correction process begins by solving the pair of simultaneous Equations (9) and (10). Plugging Equation (9) into the y row of Equation (10) gives:

    (t − T) / τ = Y + ∫_{T+β}^{t} p_2(s) ds.    (11)
The solution of this equation for t is independent of X and so this process only needs to be applied once per row. The solution can be obtained by stepping through the representation of the motion parameters p(t), considering each pair of samples in turn and approximating the integral in Equation (11). For the time interval between each pair of motion samples, Equation (11) is linear in the unknown t. It is therefore easy to check whether there is a solution in this interval. Note that, assuming the absolute value of the vertical motion p_2(t) is bounded above by 1/τ − ε for some ε > 0, the solution of Equation (11) is unique. A single pass can therefore be made through each neighboring pair of motion samples, with an early termination if a solution is found. If no solution is found, the pixel must have moved outside the image. Once the solution of Equation (11) has been computed for t, the corresponding (x, y) location in the rolling shutter image can be computed using Equation (10).
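Under the piecewise-constant representation, this per-row solve can be sketched as follows (our variable names; for clarity the sketch scans all sample intervals rather than literally resuming a single pass across rows):

```python
import numpy as np

def row_capture_time(Y, T, tau, beta, p2, dt):
    """Solve Eq. (11) for t:  (t - T)/tau = Y + integral_{T+beta}^{t} p2(s) ds,
    with piecewise-constant vertical motion p2(s) = p2[k] on [k*dt, (k+1)*dt).
    The solution is unique when |p2| is bounded by 1/tau - eps (see the text)."""
    C = np.concatenate([[0.0], np.cumsum(p2) * dt])   # integral of p2 from 0
    t_ref = T + beta
    k_ref = int(np.floor(t_ref / dt))
    C_ref = C[k_ref] + p2[k_ref] * (t_ref - k_ref * dt)   # integral up to T+beta
    for k in range(len(p2)):
        # On [k*dt, (k+1)*dt], Eq. (11) is linear in t: solve, then check range.
        t = (Y + C[k] - C_ref - p2[k] * k * dt + T / tau) / (1.0 / tau - p2[k])
        if k * dt - 1e-12 <= t <= (k + 1) * dt + 1e-12:
            return t                                  # early exit once found
    return None   # no solution: the pixel moved outside the image

# Sanity check: with zero jitter, row Y is captured at t = T + tau * Y.
tau = 0.9 / 479
t_row = row_capture_time(Y=120, T=0, tau=tau, beta=0.45,
                         p2=np.zeros(60), dt=1 / 30)
```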
2.2. Adding Low-Frequency Independent Motion
The L1-based energy function in Equation (8) is relatively robust to outliers such as independently moving objects. The correction applied to independently moving objects, however, will ignore their independent motion. Independently moving objects may still have a residual shear. We now extend our algorithm to model independently moving objects and correct this residual shear. We use a low-frequency model of the independently moving objects for two reasons: (1) In most cases, independently moving objects undergo relatively slow acceleration. There are exceptions, of course, such as rotating helicopter blades. (2) Modeling independently moving objects with a high-frequency model would be extremely challenging and ambiguous.
We generalize the motion model in Equation (3) to:

    dx/dt = m^hf(x; p(t)) + m^lf_t(x)    (12)

where m^lf_0, m^lf_1, ... is a low-frequency motion (constant within each frame), but spatially the variation is dense. The low-frequency model m^lf_t(x) can be thought of as a per-pixel flow field, where each pixel flows with a temporally constant velocity between each pair of frames.
The low-frequency term m^lf_t(x) makes analytically solving Equation (12) hard, as the dependence on x is essentially arbitrary. To obtain an approximate solution, we assume that the spatial variation in m^lf_t(x) is small and treat this term as a constant. Using the translational model of Equation (4) for the high-frequency term, the approximate solution of Equation (12) is:

    x(t) ≈ x(t_0) + ∫_{t_0}^{t} p(s) ds + (t − t_0) m^lf_t(x_{t_0})    (13)

which yields the measurement constraints:

    MC(Corr_i) = x_{T_i+K_i} − x_{T_i} − ∫_{T_i+τ y_{T_i}}^{T_i+K_i+τ y_{T_i+K_i}} p(s) ds − (K_i + τ y_{T_i+K_i} − τ y_{T_i}) m^lf_{T_i}(x_{T_i}).    (14)
We regularize the low-frequency model by adding the following two terms to the global energy function:

    γ Σ_T ∫ ‖∇m^lf_T(x)‖_1 dx + δ Σ_T ∫ ‖m^lf_T(x)‖_1 dx.    (15)

The first term encourages the low-frequency model to vary smoothly across the image. We also spatially subsample m^lf_T(x) to reduce the number of unknowns. See Section 3 for the details. The second term is needed to resolve an ambiguity between the low-frequency and high-frequency models. We favor the high-frequency model by adding a (very small) penalty to non-zero independent motion.
During the correction process, the path of the global shutter pixel in Equation (10) becomes:

    x = X + ∫_{T+β}^{t} p(s) ds + (t − T − β) m^lf_T(X).    (16)
Note that the time of intersection of this path with the plane of rolling shutter pixels in Equation (9) is no longer independent of X. The intersection therefore needs to be performed for each pixel, rather than just once for each row. Note that this process can be sped up by solving the intersection on a subsampled mesh and then upsampling.
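That speed-up can be sketched as follows; `solve_intersection` here is a hypothetical placeholder for the actual per-pixel solve of Equations (9) and (16):

```python
import numpy as np
from scipy.ndimage import zoom

H, W, step = 480, 640, 16   # illustrative frame size and mesh subsampling

def solve_intersection(X, Y):
    # Placeholder: the real pipeline solves Eqs. (9) and (16) per pixel here.
    return X + 0.01 * Y, Y - 0.02 * X

# Solve only on a coarse mesh of pixels...
Xc, Yc = np.meshgrid(np.arange(0, W, step), np.arange(0, H, step))
xc, yc = solve_intersection(Xc.astype(float), Yc.astype(float))

# ...then bilinearly upsample the (x, y) sampling maps to full resolution.
x_full = zoom(xc, step, order=1)[:H, :W]
y_full = zoom(yc, step, order=1)[:H, :W]
```

The full-resolution maps are then used to (bicubically) interpolate the rolling shutter image, as in Section 2.1.4.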
2.3. Calibrating τ
The only image formation parameter in our model is τ, the time between the capture of neighboring rows (see Figure 1). In some cases, it is possible to calibrate τ for a camera in the lab [5]. In many cases, however, all we have is a video obtained from an unknown source. Two key questions are: (1) how sensitive is our algorithm to an erroneous setting of τ, and (2) can we auto-calibrate τ? In Section 2.3.1, we address the first question by deriving a closed-form expression relating two solutions of the measurement constraints with different τs. This result indicates that the performance of our algorithm should be robust to the setting of τ, a result which we empirically validate in Section 3.4. In Section 2.3.2, we derive an auto-calibration algorithm to estimate τ from a short segment of the video.

2.3.1 Analyzing the Effect of Incorrect Calibration

We first introduce some notation. Suppose that the images have M rows. Denote the duty cycle d = (M − 1)τ. The camera is active capturing image I^RS_T(X, Y) between times T and T + d. Between T + d and T + 1, the camera is inactive in the sense that no new rows are imaged.
Now consider two solutions to the measurement constraints in Equation (7). Suppose the first solution uses the correct τ = τ_1, with duty cycle d_1 = (M − 1)τ_1, and the second solution uses an incorrect τ = τ_2, with duty cycle d_2 = (M − 1)τ_2. Let r = d_1/d_2 = τ_1/τ_2. Also, split the solutions into their active and inactive parts:

    p_i(t) = { p^act_i(t)  if t − ⌊t⌋ ≤ d_i
             { p^ina_i(t)  if t − ⌊t⌋ > d_i ,     i = 1, 2    (17)

where p^act_i(t) is the active part of the solution and p^ina_i(t) is the inactive part.
Below we show that if all the correspondences have K = 1 and so take the form (T, T + 1, x_T, x_{T+1}), and:

    p^act_2(t) ≈ r p^act_1( r(t − ⌊t⌋) + ⌊t⌋ ) + c_⌊t⌋    (18)

where:

    c_⌊t⌋ = (1/d_2) [ ∫_{⌊t⌋+d_1}^{⌊t⌋+1} p^ina_1(s) ds − ∫_{⌊t⌋+d_2}^{⌊t⌋+1} p^ina_2(s) ds ]    (19)

then the integrals in Equation (7) are the same:

    ∫_{T+τ_2 y_T}^{T+1+τ_2 y_{T+1}} p_2(s) ds = ∫_{T+τ_1 y_T}^{T+1+τ_1 y_{T+1}} p_1(s) ds.    (20)
For a correspondence (T, T + 1, x_T, x_{T+1}) with K = 1, the left hand side of Equation (20) is:

    ∫_{T+τ_2 y_T}^{T+1+τ_2 y_{T+1}} p_2(s) ds = ∫_{T+τ_2 y_T}^{T+d_2} p^act_2(s) ds + ∫_{T+d_2}^{T+1} p^ina_2(s) ds + ∫_{T+1}^{T+1+τ_2 y_{T+1}} p^act_2(s) ds.    (21)

Plugging Equation (18) into the first term on the right hand side gives:

    ∫_{T+τ_2 y_T}^{T+d_2} [ r p^act_1( r(s − ⌊s⌋) + ⌊s⌋ ) + c_⌊s⌋ ] ds    (22)

which after substituting s′ = r(s − T) + T simplifies to:

    ∫_{T+τ_1 y_T}^{T+d_1} p^act_1(s′) ds′ + c_T (d_2 − τ_2 y_T).    (23)

Similarly the third term simplifies to:

    ∫_{T+1}^{T+1+τ_1 y_{T+1}} p^act_1(s′) ds′ + c_T τ_2 y_{T+1}.    (24)

Assuming that y_{T+1} ≈ y_T, plugging these expressions into Equation (21) and substituting the expression for c_⌊t⌋ from Equation (19) yields Equation (20).
Our derivation makes one approximating assumption, that y_T ≈ y_{T+1}. This assumption is reasonable because the vertical motion between two consecutive frames is generally only a small fraction of the frame.

Equation (18) provides a relationship between two solutions of the measurement constraints for different τ. Due to regularization and discretization, the final solution obtained by our algorithm will not exactly match Equation (18). It can be expected to hold approximately, however.
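The exactness of this relationship under the y_T ≈ y_{T+1} assumption can be checked numerically. In the sketch below (our construction; the jitter, calibration values, and the frame-periodic inactive parts are made up so that c_⌊t⌋ is the same constant for every frame), p_2 is built from p_1 via Equations (18) and (19), and the two integrals of Equation (20) agree for a correspondence with y_T = y_{T+1}:

```python
import numpy as np
from scipy.integrate import quad

M = 480
tau1 = 0.9 / (M - 1)            # "correct" inter-row delay (duty cycle d1 = 0.9)
tau2 = 0.75 / (M - 1)           # mis-calibrated delay      (duty cycle d2 = 0.75)
d1, d2 = (M - 1) * tau1, (M - 1) * tau2
r = d1 / d2

def p1_act(t):                  # active part of the correct solution (arbitrary)
    u = t - np.floor(t)
    return np.sin(7 * u) + 0.3 * np.cos(19 * u)

p1_ina = lambda t: 0.2          # inactive parts: frame-periodic constants, so
p2_ina = lambda t: -0.1         # c (Eq. 19) is the same constant every frame

def p1(t):
    return p1_act(t) if t - np.floor(t) <= d1 else p1_ina(t)

c = (quad(p1_ina, d1, 1.0)[0] - quad(p2_ina, d2, 1.0)[0]) / d2    # Eq. (19)

def p2(t):
    u = t - np.floor(t)
    if u <= d2:
        return r * p1_act(r * u + np.floor(t)) + c                # Eq. (18)
    return p2_ina(t)

# Eq. (20) for a correspondence (T, T+1, x_T, x_{T+1}) with y_T = y_{T+1} = y.
T, y = 0, 200
lhs = quad(p2, T + tau2 * y, T + 1 + tau2 * y, points=[T + d2, T + 1])[0]
rhs = quad(p1, T + tau1 * y, T + 1 + tau1 * y, points=[T + d1, T + 1])[0]
# lhs and rhs agree to integration accuracy
```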
What does Equation (18) mean in terms of the final correction applied? First, note that only the active part of the solution is used in the correction process. See Figure 3. Second, note that if c_t = 0 then Equation (18) would mean that exactly the same correction is applied for the two different values of τ. The proof of this fact follows the same argument as above. The difference in the two corrections is due to c_t, a constant motion for each frame. The two corrections will therefore approximately differ by a global affine warp. In general, c_t will vary from frame to frame, as c_t is related to the motion in the inactive period.
In summary, with a slightly incorrect value of τ, theory shows that the final corrections will approximately differ by a slightly different affine warp for each frame. Although estimating τ wrongly may add a little jitter to the output, our algorithm can be expected to be robust in the sense that there is little danger of gross artifacts being added.