MOTION SALIENCY DETECTION USING LOW-RANK AND SPARSE DECOMPOSITION
⋆Yawen Xue§,†, ⋆Xiaojie Guo§ and Xiaochun Cao§
{yxue,xguo,xcao}@tju.edu
§School of Computer Science and Technology, Tianjin University, Tianjin 300072, China
†School of Computer Software, Tianjin University, Tianjin 300072, China
ABSTRACT
Motion saliency detection has an important impact on further video processing tasks, such as video segmentation, object recognition and adaptive compression. Different from image saliency, in videos, moving regions (objects) catch human beings' attention much more easily than static ones. Based on this observation, we propose a novel method of motion saliency detection, which makes use of the low-rank and sparse decomposition on video slices along the X-T and Y-T planes to separate foreground moving objects from backgrounds. In addition, we adopt the spatial information to preserve the completeness of the detected motion objects. By virtue of adaptive threshold selection and efficient noise elimination, the proposed approach is suitable for different video scenes, and robust to low-resolution and noisy cases. The experiments demonstrate the performance of our method compared with the state-of-the-art.
Index Terms—Motion Saliency Detection, Low-rank and Sparse Decomposition, Video Analysis
1. INTRODUCTION
Visual attention analysis provides an intuitive methodology for semantic content understanding and important information capture in both images and videos. Most primates, including humans, can divert their minds subconsciously to the salient objects in images or to the moving objects in videos. Such a remarkable ability enables them to sample the most “interesting” features and interpret complex scenes while spending only limited processing. In other words, visual saliency makes a distinguishing region stand out and thus catch special attention quickly, which provides an alternative solution to many tasks, such as video segmentation [1], adaptive content delivery [2] and adaptive compression [3].
Fig. 2. Illustration of the main stages of the proposed method: (a) input video, (b) X-T plane slice, (c) Y-T plane slice, (d) X-T saliency, (e) Y-T saliency, (f) saliency map, (g) output video.
When it comes to videos, however, the saliency detection techniques for images (mentioned above) are not available. The reason is that, different from image saliency detection, moving regions (objects) catch more of human beings' attention than static ones, even though the latter may have large contrast to their neighbors in static images. That is to say, the focal point changes from the regions with large contrast to their neighbors for images to those with motion discrimination for videos. Therefore, the contrast-based methods can hardly be applied to videos directly. An exception exists in [4], which extends the spectral residual analysis [9] in images to videos.
Actually, the goal of separating moving objects from the background is the same as that of motion saliency detection. A few solutions for separating foreground moving objects from backgrounds have been proposed, such as the Gaussian Mixture Model [5]. In this work, we introduce a novel method to detect motion saliency by using low-rank and sparse decomposition. Prior to detailing the stages of our proposed method, we first present an example of the performance comparison between our method and the state-of-the-art in Fig. 1. As can be seen, the result obtained by our method is significantly better than the others. (More experiments and results can be found in Sec. 4.)
2. LOW-RANK AND SPARSE DECOMPOSITION
Suppose we have a data matrix A ∈ R^{n×m}, and know that it can be decomposed as A = D + E, where D has low rank and E is sparse. Both D and E may be of arbitrary magnitude.
Recall that Classical Principal Component Analysis (CPCA) seeks the rank-k estimate of D to approximate A, or, in other words, to reduce the dimensionality of the observations in A, by optimally solving:

min ||A − D||,  s.t.  rank(D) ≤ k,    (1)

where ||·|| stands for the spectral norm, i.e., the largest singular value of the matrix. Note that CPCA is under the assumption that the observations in A are polluted by small noise, i.e., the absolute values of the elements in E are small. Otherwise, the solution of CPCA is far away from the optimal D. However, in real-world problems, data pollution is ubiquitous and arbitrary. As a consequence, CPCA may lose much of its power to deal with many real-world problems.
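To make Eq. (1) concrete, the rank-k estimate can be computed by truncating the singular value decomposition of A. The short sketch below (Python/NumPy, with a toy matrix of our own; it is not part of the paper) illustrates this, and also hints at why a few large corruptions would break the estimate.

```python
import numpy as np

def cpca_low_rank(A, k):
    """Rank-k estimate of A (Eq. 1): keep only the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Toy example: a rank-2 matrix with small dense noise is approximated well;
# a handful of large (sparse) corruptions would instead bias every entry.
rng = np.random.default_rng(0)
A_true = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 40))
A_obs = A_true + 0.01 * rng.standard_normal(A_true.shape)
D_hat = cpca_low_rank(A_obs, k=2)
print(np.linalg.norm(A_true - D_hat) / np.linalg.norm(A_true))  # small relative error
```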
To overcome the drawback of CPCA, Robust Principal Component Analysis (RPCA) was proposed by Candès et al. [12]. The goal of RPCA is to optimize the problem:

min rank(D) + λ||E||_0,  s.t.  A = D + E,    (2)

where ||·||_0 denotes the ℓ0-norm. But such a problem is intractable in polynomial time. Instead, one can solve its convex relaxation as follows:

min ||D||_* + λ||E||_1,  s.t.  A = D + E,    (3)

where λ is the coefficient controlling the weight of the sparse matrix E, and ||·||_* and ||·||_1 represent the nuclear norm and the ℓ1-norm of a matrix, respectively. This formulation performs well in practice, and recovers the true low-rank solution even when up to a third of the observations are grossly corrupted. More details about the proof of the low-rank and sparse decomposition using RPCA can be found in [12].
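The paper does not spell out which RPCA solver is used; as a non-authoritative sketch, the following implements the widely used inexact augmented Lagrange multiplier (IALM) scheme for Eq. (3), alternating singular value thresholding for D with soft shrinkage for E. The default λ = 1/√max(n, m) and the μ schedule are common choices from the RPCA literature, not settings taken from this paper.

```python
import numpy as np

def shrink(X, tau):
    """Soft thresholding: proximal operator of the l1-norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def svd_threshold(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt

def rpca_ialm(A, lam=None, tol=1e-7, max_iter=500):
    """Inexact ALM for  min ||D||_* + lam*||E||_1  s.t.  A = D + E  (Eq. 3)."""
    n, m = A.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n, m))            # common default weight
    norm_two = np.linalg.norm(A, 2)               # largest singular value
    mu, rho = 1.25 / (norm_two + 1e-12), 1.5      # penalty and its growth rate
    Y = A / max(norm_two, np.abs(A).max() / lam)  # scaled dual initialization
    E = np.zeros_like(A)
    norm_A = np.linalg.norm(A)
    for _ in range(max_iter):
        D = svd_threshold(A - E + Y / mu, 1.0 / mu)  # low-rank update
        E = shrink(A - D + Y / mu, lam / mu)         # sparse update
        R = A - D - E                                # constraint residual
        Y = Y + mu * R
        mu = rho * mu
        if np.linalg.norm(R) / (norm_A + 1e-12) < tol:
            break
    return D, E
```

In the slice-based setting of the next section, D plays the role of the background and E of the moving foreground.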
3. OUR METHOD
In this section, we first formulate the problem this work intends to solve, and then detail the stages of our proposed method (Fig. 2).
Problem Formulation. Different from static image saliency detection, in a video the motion regions intensively attract humans' attention, rather than the regions with large contrast in every single image. Due to the correlation between frames, the motion regions in the video can be identified from the background by low-rank and sparse decomposition. Note that foreground motion objects, such as cars and pedestrians, usually occupy only a fraction of the image pixels and hence can be treated as sparse errors. In this work, we stack the temporal slices along X-T and Y-T as the matrices S. Naturally, the low-rank component B corresponds to the background and the sparse component M captures the motion objects in the foreground. Figure 3(a) shows a temporal slice that confirms our observation. As shown in Fig. 3(b) and (c), our method significantly outperforms TSR in terms of capturing the motion. Moreover, we take adaptive threshold selection and refinement to reduce the effect of noise and missing pixels (caused by ignoring spatial information).

Fig. 3. Visual comparison of the intermediate results for motion saliency detection on a temporal slice. (a) shows a Y-T plane slice. (b) and (c) are the results of detecting motion saliency on (a) using TSR [4] and our method, respectively.
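For concreteness, the temporal slices described above are plain index slices of the video volume. The snippet below assumes a (frames × height × width) array layout and dummy data of our own choosing; only the indexing pattern matters.

```python
import numpy as np

# Assumed layout: V[t, y, x] is the intensity of pixel (x, y) in frame t.
V = np.random.rand(120, 90, 160)   # dummy 120-frame video of 160x90 pixels
T, H, W = V.shape

y0 = H // 2
slice_xt = V[:, y0, :]   # X-T slice at row y0: time along rows, x along columns
x0 = W // 2
slice_yt = V[:, :, x0]   # Y-T slice at column x0: time along rows, y along columns

# Each such slice is one matrix S to be decomposed as in Eq. (4) below.
```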
Decomposition. Based on the problem formulation, each X-T and Y-T slice S is decomposed, similarly to Eq. (3), as:

min ||B||_* + λ||M||_1,  s.t.  S = B + M.    (4)
Then, the motion saliency slices abs(M) obtained from the X-T (Y-T) slices are integrated together along X-Y-T as S_cubeXT (S_cubeYT). We then use norm(S_cubeXT .* S_cubeYT) to form the initial saliency map S_cube, where .* is the element-wise product operator and norm(·) represents normalization processing. In our experiments, the size of T equals the length of the video; it can also be defined as the size of a sub-video.

Refinement. To reduce the effect of missing pixels on the motion objects and to refine the results, we further take into account the spatial information. Intuitively, the pixels belonging to the same motion object are always locally coherent. This indicates that a pixel p_{i,j,k} is very likely to be a missing salient pixel when its neighbors in frame k are motion salient. Inspired by this observation, we use a Gaussian function to recall the missing pixels as follows (we omit the subscript k for short):

S_cube(i, j) = Σ_{||p_{x,y} − p_{i,j}||_2 < τ} S_cube(x, y) · f(||p_{x,y} − p_{i,j}||_2),    (5)

where τ is the radius of the support region centered at p_{i,j}, ||·||_2 is the ℓ2-norm, and f is a Gaussian function: f(d) = (1/(2πσ)) exp(−d²/(2σ²)).

To see more results, please visit: cs.tju.edu/orgs/vision/msd/results.htm
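Putting Eqs. (4) and (5) together, a minimal end-to-end sketch of the slice-wise decomposition, the X-Y-T fusion and the Gaussian refinement is given below. It reuses the rpca_ialm helper sketched in Sec. 2; the array layout, the per-frame refinement loop, and the values of λ, τ and σ are illustrative assumptions, and the adaptive threshold selection step is omitted since its exact rule is not given here.

```python
import numpy as np
# assumes rpca_ialm(...) from the Sec. 2 sketch is available in scope

def norm01(S):
    """norm(.): rescale a saliency volume to [0, 1]."""
    S = S - S.min()
    return S / (S.max() + 1e-12)

def initial_saliency(V, lam=None):
    """V[t, y, x] grey-level video -> initial saliency volume S_cube (Eq. 4)."""
    T, H, W = V.shape
    S_xt = np.zeros(V.shape)
    S_yt = np.zeros(V.shape)
    for y in range(H):                        # decompose every X-T slice
        _, M = rpca_ialm(V[:, y, :], lam)
        S_xt[:, y, :] = np.abs(M)
    for x in range(W):                        # decompose every Y-T slice
        _, M = rpca_ialm(V[:, :, x], lam)
        S_yt[:, :, x] = np.abs(M)
    return norm01(S_xt * S_yt)                # element-wise product, then norm(.)

def refine_frame(S, tau=5, sigma=2.0):
    """Eq. (5): Gaussian-weighted recall of missing pixels within radius tau."""
    H, W = S.shape
    ys, xs = np.mgrid[-tau:tau + 1, -tau:tau + 1]
    d = np.sqrt(ys ** 2 + xs ** 2)
    g = np.exp(-d ** 2 / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma)
    g[d >= tau] = 0.0                         # keep only the support region
    P = np.pad(S, tau)                        # zero padding at the borders
    out = np.empty_like(S)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(P[i:i + 2 * tau + 1, j:j + 2 * tau + 1] * g)
    return out

# Usage sketch: refine the initial saliency map frame by frame.
# S_cube = initial_saliency(V)
# S_refined = np.stack([refine_frame(S_cube[t]) for t in range(S_cube.shape[0])])
```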
Fig. 4. Experiment results. (a) original frames (OF) from different video scenes and types. (b)-(d) are the results of using frame difference (FD), GMM [5] and TSR [4], respectively. (e) shows the raw saliency maps obtained by our method. The final results of our method are shown in (f). For the analysis of the performance differences, please see the text.

4. EXPERIMENTS

Figure 4(e) shows the raw saliency maps obtained without executing the refinement and the adaptive threshold selection, while the final results of our method are displayed in Fig. 4(f); the difference reflects the effect of our adaptive threshold selection and refinement.
5. CONCLUSION
In this work, we proposed a novel motion saliency detection method based on low-rank and sparse decomposition, which provides many video processing tasks, such as video segmentation and adaptive content delivery, with a powerful video pre-processing technique. The proposed method is able to distinguish foreground motion objects from backgrounds without any background modeling procedure. Thanks to the spatial consideration, we further reduce the effect of incompleteness. In addition, by employing adaptive threshold selection and noise elimination, the method can automatically and robustly accomplish the task. The experiments carried out on different video qualities and scenes demonstrate that our proposed method outperforms the state-of-the-art.
6. REFERENCES
[1] K. Fukuchi, K. Miyazato, A. Kimura, S. Takagi, and J. Yamato, “Saliency-based video segmentation with graph cuts and sequentially updated priors,” in IEEE ICME, 2009, pp. 638–641.
[2] Y. Ma and H. Zhang, “Contrast-based image attention analysis by using fuzzy growing,” in ACM MM, 2003, pp. 374–381.
[3] C. Christopoulos, A. Skodras, A. Koike, and T. Ebrahimi, “The JPEG2000 still image coding system: An overview,” IEEE Trans. Consumer Electronics, vol. 46, no. 4, pp. 1103–1127, 2000.
[4] X. Cui, Q. Liu, and D. Metaxas, “Temporal spectral residual: Fast motion saliency detection,” in ACM MM, 2009, pp. 617–620.
[5] Z. Zivkovic, “Improved adaptive Gaussian mixture model for background subtraction,” in ICPR, 2004, pp. 28–31.
[6] L. Itti, C. Koch, and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis,” IEEE Trans. PAMI, vol. 20, no. 11, pp. 1254–1259, 1998.
[7] R. Achanta, S. Hemami, F. Estrada, and S. Süsstrunk, “Frequency-tuned salient region detection,” in IEEE CVPR, 2009, pp. 1597–1604.
[8] J. Harel, C. Koch, and P. Perona, “Graph-based visual saliency,” in NIPS, 2007, pp. 545–552.
[9] X. Hou and L. Zhang, “Saliency detection: A spectral residual approach,” in IEEE CVPR, 2007, pp. 1–8.
[10] Z. Wang and B. Li, “A two-stage approach to saliency detection in images,” in IEEE ICASSP, 2008, pp. 965–968.
[11] M. Cheng, G. Zhang, N. Mitra, X. Huang, and S. Hu, “Global contrast based salient region detection,” in IEEE CVPR, 2011, pp. 409–416.
[12] E. Candès, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?,” Journal of the ACM, vol. 58, no. 3, pp. 1–37, 2011.