See All by Looking at A Few:
Sparse Modeling for Finding Representative Objects
Ehsan Elhamifar
Johns Hopkins University
Guillermo Sapiro
University of Minnesota
René Vidal
Johns Hopkins University
Abstract
We consider the problem of finding a few representatives for a dataset, i.e., a subset of data points that efficiently describes the entire dataset. We assume that each data point can be expressed as a linear combination of the representatives and formulate the problem of finding the representatives as a sparse multiple measurement vector problem. In our formulation, both the dictionary and the measurements are given by the data matrix, and the unknown sparse codes select the representatives via convex optimization. In general, we do not assume that the data are low-rank or distributed around cluster centers. When the data do come from a collection of low-rank models, we show that our method automatically selects a few representatives from each low-rank model. We also analyze the geometry of the representatives and discuss their relationship to the vertices of the convex hull of the data. We show that our framework can be extended to detect and reject outliers in datasets, and to efficiently deal with new observations and large datasets. The proposed framework and theoretical foundations are illustrated with examples in video summarization and image classification using representatives.
1. Introduction

In many areas of machine learning, computer vision, signal/image processing, and information retrieval, one needs to deal with massive collections of data, such as databases of images, videos, and text documents. This has motivated a lot of work in the area of dimensionality reduction, whose goal is to find compact representations of the data that can save memory and computational time and also improve the performance of algorithms that deal with the data. Moreover, dimensionality reduction can also improve our understanding and interpretation of the data.
Because datasets consist of high-dimensional data, most dimensionality reduction methods aim at reducing the feature-space dimension for all the data, e.g., PCA [25], LLE [34], Isomap [36], Diffusion Maps [7], etc. However, another important problem related to large datasets is to find a subset of the data that appropriately represents the whole dataset, thereby reducing the object-space dimension. This is of particular importance in summarizing and visualizing large datasets of natural scenes, objects, faces, hyperspectral data, videos, and text. In addition, this summarization helps to remove outliers from the data, as they are not true representatives of the dataset. Finally, the memory requirements and computational time of classification and clustering algorithms improve by working on a reduced number of representative data points as opposed to the full dataset.

Prior Work. To reduce the dimension of the data in the object space and find representative points, several methods have been proposed [19, 21, 26, 27, 38]. However, most algorithms assume that the data are either distributed around centers or lie in a low-dimensional space. Kmedoids [26], which can be considered as a variant of Kmeans, assumes that the data are distributed around several cluster centers, called medoids, which are selected from the data. Kmedoids, similar to Kmeans, is an iterative algorithm that strongly depends on the initialization. When similarities/dissimilarities between pairs of data points are given and there is a natural clustering based on the similarities, Affinity Propagation [19], similar to Kmedoids, tries to find a data center for each cluster using a message-passing algorithm. When the collection of data points is low-rank, the Rank Revealing QR (RRQR) algorithm [5, 6] tries to select a few data points by finding a permutation of the data that gives the best-conditioned submatrix. The algorithm has suboptimal properties, as it is not guaranteed to find the globally optimal solution in polynomial time, and it also relies on the low-rankness assumption. In addition, randomized algorithms for selecting a few columns from a low-rank matrix have been proposed [38]. For a low-rank matrix with missing entries, [2] proposes a greedy algorithm to select a subset of the columns. For a data matrix with nonnegative entries, [17] proposes a nonnegative matrix factorization using an $\ell_1/\ell_\infty$ optimization to select some of the columns of the data matrix for one of the factors.
Figure 1. Some frames of the Society Raffles video and the automatically computed representatives of the whole video sequence using our algorithm. The representatives summarize the video as follows: 1) there is a nicely decorated living room, with a door stage left and a settee in front of an open window in the foreground; 2) a man in the room is talking to someone across the window; 3) a couple enter the room: a man, and a woman who is wearing a white gown and a jeweled tiara. Someone, probably the first man, is standing on the other side of the room; 4) the man who entered with the woman is talking to her and bowing; probably he wants to leave; 5) the first man is sitting with the woman and is reaching for her tiara; 6) the first man is leaving the room; a person is standing across the window and examining the tiara; 7) the woman is entering back into the living room, so she had followed the first man to the door; 8) the woman is clutching her head upon seeing the bandit across the window; 9) the woman is fainting on the sofa and the bandit has disappeared.

Figure 2. Some frames of a tennis match video, which consists of multiple shots, and the automatically computed representatives of the whole video sequence using our algorithm. Depending on the amount of activity in each shot of the video, we obtained one or a few representatives for that shot.

Paper Contributions. In this work, we study the problem of finding data representatives using dimensionality reduction in the object space. We assume that there is a subset of data points, called representatives, such that each point in the dataset can be described as a linear combination of a few of the representative points. More specifically, collecting the N data points of a dataset in $\mathbb{R}^m$ as columns of a data matrix $Y \in \mathbb{R}^{m \times N}$, we consider the optimization problem

$\min_C \; \|Y - YC\|_F^2 \quad \text{s.t.} \quad \|C\|_{\mathrm{row},0} \le k, \;\; \mathbf{1}^\top C = \mathbf{1}^\top, \qquad (1)$

where $C \in \mathbb{R}^{N \times N}$ is the coefficient matrix and $\|C\|_{\mathrm{row},0}$ counts the number of nonzero rows of $C$ [24, 37]. In other words, we wish to find at most $k \ll N$ representatives that best reconstruct the data collection. This can be viewed as a sparse dictionary learning scheme [1, 30, 33] where the atoms of the dictionary are chosen from the data points and, instead of letting the support of the sparse codes be arbitrary, we enforce them to have a common support.
The self-expressiveness property, $Y = YC$, has been studied for subspace clustering using sparse representation [11, 15] and low-rank representation [18, 29]. However, these algorithms are not targeted at finding representatives because of the norms they use for $C$. A framework similar to that in (1), with a nonnegativity constraint on $C$ and without the affine constraint, has been used in nonnegative matrix factorization for the problem of hyperspectral-imaging endmember identification [17], without an analysis of the selected columns. In the context of dictionary learning, [4] and [31] use $\|C\|_{\mathrm{row},0}$ to design compact dictionaries and to select similar patches in an image, respectively.
In this work, we propose an algorithm for solving a convex relaxation of (1) and provide an analysis of the theoretical guarantees of the algorithm. Our work has the following contributions with respect to the state of the art:
– Unlike prior work, we do not assume that the data are low-rank or distributed around cluster centers. We only require the total number of representatives to be much smaller than the number of actual points in the dataset.
– When the data come from a collection of low-rank models, we show that our method automatically selects a few data points from each model.
– We analyze the geometry of the representatives and show that they correspond to vertices of the convex hull of the data.
– We propose a framework to detect and reject outliers from the dataset using the solution of the proposed optimization program. We also show how to deal with new observations and large datasets efficiently.
– We demonstrate the proposed framework in applications to video summarization (Figs. 1-2) and classification using representatives.
2. Problem Formulation

Consider a set of points in $\mathbb{R}^m$ arranged as the columns of the data matrix $Y = [\, y_1 \;\cdots\; y_N \,]$. In this section, we formulate the problem of finding representative objects from the collection of data points.
2.1. Learning Compact Dictionaries
Finding compact dictionaries to represent data has been well studied in the literature [1, 16, 25, 30, 33]. More specifically, in dictionary learning problems, one tries to simultaneously learn a compact dictionary $D = [\, d_1 \;\cdots\; d_k \,] \in \mathbb{R}^{m \times k}$ and coefficients $X = [\, x_1 \;\cdots\; x_N \,] \in \mathbb{R}^{k \times N}$ that can efficiently represent the collection of data points. The best representation of the data is typically obtained by minimizing the objective function

$\sum_{i=1}^{N} \|y_i - D x_i\|_2^2 = \|Y - DX\|_F^2 \qquad (2)$
with respect to the dictionary $D$ and the coefficient matrix $X$, subject to appropriate constraints. When the dictionary $D$ is constrained to have orthonormal columns and $X$ is unconstrained, the optimal solution for $D$ is given by the $k$ leading singular vectors of $Y$ [25]. On the other hand, in the sparse dictionary learning framework [1, 16, 30, 33], one requires the coefficient matrix $X$ to be sparse by solving the optimization program
$\min_{D,X} \; \|Y - DX\|_F^2 \quad \text{s.t.} \quad \|x_i\|_0 \le s, \;\; \|d_j\|_2 \le 1, \;\; \forall\, i, j, \qquad (3)$

where $\|x_i\|_0$ indicates the number of nonzero elements of $x_i$ (its convex surrogate can be used as well). In other words, one simultaneously learns a dictionary and coefficients such that each data point $y_i$ is written as a linear combination of at most $s$ atoms of the dictionary. Besides being NP-hard due to the use of the $\ell_0$ norm, this problem is nonconvex because of the product of the two unknown and constrained matrices $D$ and $X$. As a result, iterative procedures are employed to find each unknown matrix by fixing the other, which often converge to a local minimizer [1, 16].
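To make the alternation concrete, the following is a minimal sketch of such a procedure (not the method of any of the cited works): it alternates OMP-based sparse coding with a least-squares dictionary update. The helper name, initialization, and iteration count are illustrative choices.

```python
# Minimal sketch of alternating dictionary learning for (3):
# fix D and solve for sparse codes X via OMP, then fix X and update D
# by least squares with column normalization. All parameters are
# illustrative, not values from the paper.
import numpy as np
from sklearn.linear_model import orthogonal_mp

def learn_dictionary(Y, n_atoms, s, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    N = Y.shape[1]
    # Initialize the atoms with randomly chosen, normalized data points.
    D = Y[:, rng.choice(N, size=n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iters):
        # Sparse coding step: each x_i has at most s nonzeros (||x_i||_0 <= s).
        X = orthogonal_mp(D, Y, n_nonzero_coefs=s)
        # Dictionary update step: least-squares fit, then renormalize
        # columns so that ||d_j||_2 <= 1.
        D = Y @ np.linalg.pinv(X)
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D, X
```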
2.2. Finding Representative Data
The learned atoms of the dictionary almost never coincide with the original data [30, 31, 33]; hence, they cannot be considered as good representatives for the collection of data points. To find representative points that coincide with some of the actual data points, we consider a modification to the dictionary learning framework, which first addresses the problem of local minima due to the product of two unknown matrices, i.e., the dictionary and the coefficient matrix, and second, enforces selecting representatives from the actual data points. To do that, we set the dictionary to be the matrix of data points $Y$ and minimize the expression

$\sum_{i=1}^{N} \|y_i - Y c_i\|_2^2 = \|Y - YC\|_F^2 \qquad (4)$

with respect to the coefficient matrix $C \triangleq [\, c_1 \;\cdots\; c_N \,] \in \mathbb{R}^{N \times N}$, subject to additional constraints that we describe next. In other words, we minimize the reconstruction error of each data point as a linear combination of all the data. To choose $k \ll N$ representatives, which take part in the linear reconstruction of all the data in (4), we enforce
$\|C\|_{0,q} \le k, \qquad (5)$

where the mixed $\ell_0/\ell_q$ norm is defined as $\|C\|_{0,q} \triangleq \sum_{i=1}^{N} I(\|c^i\|_q > 0)$, with $c^i$ denoting the $i$-th row of $C$ and $I(\cdot)$ the indicator function. In other words, $\|C\|_{0,q}$ counts the number of nonzero rows of $C$. The indices of the nonzero rows of $C$ correspond to the indices of the columns of $Y$ which are chosen as the data representatives. Similar to other dimensionality reduction methods, we want the selection of representatives to be invariant with respect to a global translation of the data. We thus enforce the affine constraint $\mathbf{1}^\top C = \mathbf{1}^\top$. This comes from the fact that if $y_i$ is represented as $y_i = Y c_i$, then for a global translation $T \in \mathbb{R}^m$ of the data we want to have $y_i - T = [\, y_1 - T \;\cdots\; y_N - T \,]\, c_i$, which holds for the same $c_i$ exactly when the entries of $c_i$ sum to one.
As a result, to find $k \ll N$ representatives such that each point in the dataset can be represented as an affine combination of a subset of the $k$ representatives, we solve

$\min_C \; \|Y - YC\|_F^2 \quad \text{s.t.} \quad \|C\|_{0,q} \le k, \;\; \mathbf{1}^\top C = \mathbf{1}^\top. \qquad (6)$

This is an NP-hard problem, as it requires searching over every subset of $k$ columns of $Y$. A standard $\ell_1$ relaxation of this optimization is obtained as

$\min_C \; \|Y - YC\|_F^2 \quad \text{s.t.} \quad \|C\|_{1,q} \le \tau, \;\; \mathbf{1}^\top C = \mathbf{1}^\top, \qquad (7)$

where $\|C\|_{1,q} \triangleq \sum_{i=1}^{N} \|c^i\|_q$ is the sum of the $\ell_q$ norms of the rows of $C$, and $\tau > 0$ is an appropriately chosen parameter.¹ We also choose $q > 1$, for which the optimization program in (7) is convex.²

¹We use $\tau$ instead of $k$ since, for the $k$ optimal representatives, $\|C\|_{1,q}$ is not necessarily bounded by $k$.
²We do not consider $q = 1$ since $\|\cdot\|_{1,1}$ treats the rows and columns equally and does not necessarily favor selecting a few nonzero rows.
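As a concrete illustration, here is a minimal sketch of (7) in cvxpy, assuming $q = 2$ (so the row penalty is a sum of row $\ell_2$ norms); the function name and the row-norm threshold are illustrative, and the solver is left at its defaults.

```python
# Sketch of the relaxed program (7) with q = 2.
import cvxpy as cp
import numpy as np

def find_representatives(Y, tau):
    N = Y.shape[1]
    C = cp.Variable((N, N))
    objective = cp.Minimize(cp.sum_squares(Y - Y @ C))
    constraints = [cp.sum(cp.norm(C, 2, axis=1)) <= tau,  # ||C||_{1,2} <= tau
                   cp.sum(C, axis=0) == 1]                # 1^T C = 1^T
    cp.Problem(objective, constraints).solve()
    # Representatives = rows of C with non-negligible norm; the threshold
    # is a free choice.
    row_norms = np.linalg.norm(C.value, axis=1)
    return np.nonzero(row_norms > 1e-3 * row_norms.max())[0]
```

For the variant (10) introduced below, the residual term and the row-norm term simply trade places between the objective and the constraint.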
The solution of the optimization program (7) not only indicates the representatives as the nonzero rows of $C$, but also provides information about the ranking, i.e., the relative importance, of the representatives for describing the dataset. More precisely, a representative that has a higher ranking takes part in the reconstruction of many points in the dataset; hence, its corresponding row in the optimal coefficient matrix $C$ has many nonzero elements with large values. On the other hand, a representative with a lower ranking takes part in the reconstruction of fewer points in the dataset; hence, its corresponding row in $C$ has a few nonzero elements with smaller values. Thus, we can rank $k$ representatives $y_{i_1}, \ldots, y_{i_k}$ as $i_1 \ge i_2 \ge \cdots \ge i_k$, i.e., $y_{i_1}$ has the highest rank and $y_{i_k}$ has the lowest rank, whenever for the corresponding rows of $C$ we have

$\|c^{i_1}\|_q \ge \|c^{i_2}\|_q \ge \cdots \ge \|c^{i_k}\|_q. \qquad (8)$
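For instance, a small helper along these lines (again assuming $q = 2$, with a hypothetical selection threshold) recovers the selected rows and orders them according to (8):

```python
# Rank the selected representatives by the l2-norms of their rows of C,
# per (8). q = 2 and the threshold are assumptions.
import numpy as np

def rank_representatives(C, thresh=1e-3):
    norms = np.linalg.norm(C, axis=1)                  # ||c^i||_2 per row
    reps = np.nonzero(norms > thresh * norms.max())[0]
    return reps[np.argsort(-norms[reps])]              # highest-ranked first
```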
Another optimization formulation, which is closely related to (6), is

$\min_C \; \|C\|_{0,q} \quad \text{s.t.} \quad \|Y - YC\|_F \le \varepsilon, \;\; \mathbf{1}^\top C = \mathbf{1}^\top, \qquad (9)$

which minimizes the number of representatives that can reconstruct the collection of data points up to an $\varepsilon$ error. An $\ell_1$ relaxation of it is given by

$\min_C \; \|C\|_{1,q} \quad \text{s.t.} \quad \|Y - YC\|_F \le \varepsilon, \;\; \mathbf{1}^\top C = \mathbf{1}^\top. \qquad (10)$

This optimization problem can also be viewed in a compression scheme where we want to choose a few representatives that can reconstruct the data up to an $\varepsilon$ error.
3. Geometry of Representatives
We now study the geometry of the representative points obtained from the proposed convex optimization programs. We consider the optimization program (10) where we set the error tolerance $\varepsilon$ to zero. First, we show that (10), with a natural additional nonnegativity constraint on $C$, finds the vertices of the convex hull of the dataset. This is, on its own, an interesting result for computing convex hulls using sparse representation methods and convex optimization. In addition, the robust versions of the optimization program, e.g., $\varepsilon > 0$, offer robust approaches for selecting convex hull vertices when the data are perturbed by noise. More precisely, for the optimization program

$\min_C \; \|C\|_{1,q} \quad \text{s.t.} \quad Y = YC, \;\; \mathbf{1}^\top C = \mathbf{1}^\top, \;\; C \ge 0, \qquad (11)$

we have the following result [10].
Theorem 1. Let $H$ be the convex hull of the columns of $Y$, and let $k$ be the number of vertices of $H$. The nonzero rows of the solution of the optimization program (11), for $1 < q \le \infty$, correspond to the $k$ vertices of $H$. More precisely, the optimal solution $C^*$ has the following form

$C^* = \Gamma \begin{bmatrix} I_k & \Delta \\ 0 & 0 \end{bmatrix}, \qquad (12)$

where $I_k$ is the $k$-dimensional identity matrix, the elements of $\Delta$ lie in $[0, 1)$, and $\Gamma$ is a permutation matrix.

Theorem 1 implies that, if the coefficient matrix is nonnegative, the representatives are the vertices of the convex hull of the data, $H$.³ Without the nonnegativity constraint, one would expect to choose a subset of the vertices of $H$ as the representatives. In addition, when the data lie in a $(k-1)$-dimensional subspace and are enclosed by $k$ data points, i.e., $H$ has $k$ vertices, then we can find exactly $k$ representatives, given by the vertices of $H$. More precisely, we show the following result [10].
Theorem 2. Let $H$ be the convex hull of the columns of $Y$, and let $k$ be the number of vertices of $H$. Consider the optimization program (10) for $1 < q \le \infty$ and $\varepsilon = 0$. Then the nonzero rows of a solution correspond to a subset of the vertices of $H$ that span the affine subspace containing the data. Moreover, if the columns of $Y$ lie in a $(k-1)$-dimensional affine subspace of $\mathbb{R}^m$, a solution is of the form

$C^* = \Gamma \begin{bmatrix} I_k & \Delta \\ 0 & 0 \end{bmatrix}, \qquad (13)$

where $\Gamma$ is a permutation matrix and the $k$ nonzero rows of $C^*$ correspond to the $k$ vertices of $H$.
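Theorem 1 is easy to probe numerically. The following demo, a sketch under the assumptions $q = 2$ and exact self-expression $Y = YC$, solves (11) on random planar data and checks the selected rows against the hull vertices reported by scipy; the sizes and threshold are illustrative.

```python
# Demo of Theorem 1: with C >= 0, program (11) should select exactly the
# convex hull vertices of the data.
import cvxpy as cp
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
Y = rng.standard_normal((2, 30))                     # 30 points in the plane
N = Y.shape[1]

C = cp.Variable((N, N), nonneg=True)
cp.Problem(cp.Minimize(cp.sum(cp.norm(C, 2, axis=1))),    # ||C||_{1,2}
           [Y @ C == Y, cp.sum(C, axis=0) == 1]).solve()  # Y = YC, 1^T C = 1^T

norms = np.linalg.norm(C.value, axis=1)
selected = set(np.nonzero(norms > 1e-4 * norms.max())[0])
print(selected == set(ConvexHull(Y.T).vertices))     # expected: True
```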
4. Representatives of Subspaces
We now show that when the data come from a collection of low-rank models, the representatives provide information about the underlying models. More specifically, we assume that the data lie in a union of affine subspaces $S_1, \ldots, S_n$ of $\mathbb{R}^m$ and consider the optimization program

$\min_C \; \|C\|_{1,q} \quad \text{s.t.} \quad Y = YC, \;\; \mathbf{1}^\top C = \mathbf{1}^\top. \qquad (14)$

In this case, the solution of (14) selects representatives from every subspace (left plot of Figure 3), where the number of representatives from each subspace is greater than or equal to its dimension. More precisely, we have the following result [10].

Theorem 3. If the data points are drawn from a union of independent subspaces, i.e., if the subspaces are such that $\dim(\oplus_i S_i) = \sum_i \dim(S_i)$, then the solution of (14) finds at least $\dim(S_i)$ representatives from each subspace $S_i$. In addition, each data point is perfectly reconstructed by the combination of the representatives from its own subspace.

Since the dimension of the collection of representatives in each subspace $S_i$ is equal to $\dim(S_i)$, the dimension of the collection of representatives from all subspaces can be as large as the dimension of the ambient space $m$, by the fact that $\sum_i \dim(S_i) = \dim(\oplus_i S_i) \le m$.

³Note that the solution of the $\ell_1$ minimization without the affine and nonnegativity constraints is known to choose a few of the vertices of the symmetrized convex hull of the data [8]. Our result is different, as we place a general mixed $\ell_1/\ell_q$ norm on the rows of $C$ and show that for any $q > 1$ the solution of (11) finds all vertices of the convex hull of the data.
Figure 3. Left: representatives (obtained from the solution of (14)) of data points lying in a union of subspaces. Right: representatives of a dataset with outliers. The last set of points corresponds to outliers.
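Theorem 3 can be checked numerically in the same way. The sketch below draws points from two independent random subspaces of $\mathbb{R}^{10}$ (dimensions and sample sizes are illustrative), solves (14) with $q = 2$, and counts the representatives selected from each subspace.

```python
# Demo of Theorem 3: data from two independent subspaces of R^10.
# Program (14) should select at least dim(S_i) representatives from each.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
m, d1, d2, n_pts = 10, 2, 3, 40
B1 = np.linalg.qr(rng.standard_normal((m, d1)))[0]  # orthonormal basis of S_1
B2 = np.linalg.qr(rng.standard_normal((m, d2)))[0]  # orthonormal basis of S_2
Y = np.hstack([B1 @ rng.standard_normal((d1, n_pts)),
               B2 @ rng.standard_normal((d2, n_pts))])

N = Y.shape[1]
C = cp.Variable((N, N))
cp.Problem(cp.Minimize(cp.sum(cp.norm(C, 2, axis=1))),   # ||C||_{1,2}
           [Y @ C == Y, cp.sum(C, axis=0) == 1]).solve()

norms = np.linalg.norm(C.value, axis=1)
reps = np.nonzero(norms > 1e-4 * norms.max())[0]
print("representatives from S_1:", int(np.sum(reps < n_pts)))
print("representatives from S_2:", int(np.sum(reps >= n_pts)))
```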
The optimization program (14) can also address the connectivity issues [32] of subspace clustering algorithms based on sparse representation [11, 15, 35] or low-rank representation [18, 29]. More precisely, as discussed in [15], adding a regularizer of the form $\|C\|_{1,2}$ to the sparse [11] or low-rank [29] objective function improves the connectivity of the points in each subspace, preventing the points in a subspace from being divided into multiple components of the similarity graph.
5. Practical Considerations and Extensions

We now discuss some of the practical problems related to finding representative points of real datasets.

5.1. Dealing with Outliers
In many real-world problems, the collection of data includes outliers. For example, a dataset of natural scenes, objects, or faces collected from the internet can contain images that do not belong to the target category. A method that robustly finds true representatives for the dataset is of particular importance, as it reduces the redundancy of the data and removes points that do not really belong to the dataset. In this section, we discuss how our method can directly deal with outliers and robustly find representatives for datasets. We use the fact that outliers are often incoherent with respect to the collection of the true data. Hence, an outlier prefers to write itself as an affine combination of itself, while true data points choose points among themselves as representatives, as they are more coherent with each other. In other words, if we denote the inliers by $Y$ and the outliers by $Y_o \in \mathbb{R}^{m \times N_o}$, then for the optimization program

$\min_C \; \|C\|_{1,q} \quad \text{s.t.} \quad [\, Y \;\; Y_o \,] = [\, Y \;\; Y_o \,]\, C, \;\; \mathbf{1}^\top C = \mathbf{1}^\top, \qquad (15)$

we expect the solution to have the structure

$C^* = \begin{bmatrix} \Delta & 0 \\ 0 & I_{N_o} \end{bmatrix}. \qquad (16)$
In other words, each outlier is a representative of itself, as shown in the right plot of Figure 3. We can therefore identify the outliers by analyzing the row-sparsity of the solution. Among the rows of the coefficient matrix that correspond to the representatives, the ones that have many nonzero elements correspond to the true data, and the ones that have just one nonzero element correspond to outliers. In practice, $C^*$ might not have exactly the form of (16). However, we still expect that an outlier takes part in the representation of only a few other outliers or true data points. Hence, the rows of $C^*$ corresponding to outliers should have very few nonzero entries. To detect and reject outliers, we define the row-sparsity index of each candidate representative as

$\mathrm{rsi}(i) = \frac{N \|c^i\|_\infty - \|c^i\|_1}{(N-1)\, \|c^i\|_1} \in [0, 1].$⁴ $\qquad (17)$

For a row corresponding to an outlier, which has one or a few nonzero elements, the rsi value is close to 1, while for a row corresponding to a true representative the rsi is close to zero. Hence, we can reject outliers by discarding representatives whose rsi value is larger than a threshold $\delta$.
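A direct numpy transcription of the rejection rule based on (17) follows; the threshold $\delta$ is an assumed value, not one prescribed by the paper.

```python
# Row-sparsity index (17) and an outlier-rejection rule based on it.
import numpy as np

def row_sparsity_index(c):
    # rsi for a single nonzero row c^i of the coefficient matrix, per (17).
    c = np.abs(np.asarray(c))
    N = c.size
    return (N * c.max() - c.sum()) / ((N - 1) * c.sum())

def reject_outliers(C, rep_idx, delta=0.9):
    # Rows that are nearly 1-sparse (rsi close to 1) are flagged as
    # outliers; the rest are kept as true representatives.
    return [i for i in rep_idx if row_sparsity_index(C[i]) <= delta]
```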
5.2. Dealing with New Observations

An important problem in finding representatives is to update the set of representative points when new data are added to the dataset. Let $Y$ be the collection of points already in the dataset, and let $Y_{\mathrm{new}}$ be the new points that are added to the dataset. In order to find the representatives for the whole dataset, including the old and the new data, one has to solve the optimization program

$\min_C \; \|C\|_{1,q} \quad \text{s.t.} \quad [\, Y \;\; Y_{\mathrm{new}} \,] = [\, Y \;\; Y_{\mathrm{new}} \,]\, C, \;\; \mathbf{1}^\top C = \mathbf{1}^\top. \qquad (18)$

However, note that we have already found the representatives of $Y$, denoted by $Y_{\mathrm{rep}}$, which can efficiently describe the collection of data in $Y$. Thus, it is sufficient to see if the elements of $Y_{\mathrm{rep}}$ are good representatives of the new data $Y_{\mathrm{new}}$, or, equivalently, to update the representatives so that they can well describe the elements of $Y_{\mathrm{rep}}$ as well as $Y_{\mathrm{new}}$. Thus, we can solve the optimization program

$\min_C \; \|C\|_{1,q} \quad \text{s.t.} \quad [\, Y_{\mathrm{rep}} \;\; Y_{\mathrm{new}} \,] = [\, Y_{\mathrm{rep}} \;\; Y_{\mathrm{new}} \,]\, C, \;\; \mathbf{1}^\top C = \mathbf{1}^\top, \qquad (19)$

on the reduced dataset $[\, Y_{\mathrm{rep}} \;\; Y_{\mathrm{new}} \,]$, which is typically of much smaller size than $[\, Y \;\; Y_{\mathrm{new}} \,]$; hence, it can be solved more efficiently.⁵
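The incremental update in (19) then reduces to re-solving on the smaller matrix. The sketch below reuses, for illustration, the hypothetical find_representatives helper from the Section 2.2 sketch, which solves the $\tau$-constrained variant (7) rather than the exact form of (19); the same row-selection idea applies.

```python
# Sketch of the incremental update (19): re-solve only on [Y_rep, Y_new].
# Relies on the illustrative find_representatives helper defined earlier;
# tau is an assumed parameter.
import numpy as np

def update_representatives(Y_rep, Y_new, tau):
    Y_small = np.hstack([Y_rep, Y_new])   # reduced dataset [Y_rep  Y_new]
    idx = find_representatives(Y_small, tau)
    return Y_small[:, idx]                # updated set of representatives
```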
Using similar ideas, we can also deal with large datasets using a hierarchical framework. More specifically, we can
⁴We use the fact that for $c \in \mathbb{R}^N$ we have $\|c\|_1 / N \le \|c\|_\infty \le \|c\|_1$.
⁵In general, we can minimize $\|QC\|_{1,q}$ for a diagonal nonnegative matrix $Q$, which gives relative weights to keeping the old representatives and selecting new representatives.