See All by Looking at A Few:
Sparse Modeling for Finding Representative Objects
Ehsan Elhamifar
Johns Hopkins University
Guillermo Sapiro
University of Minnesota
René Vidal
Johns Hopkins University
Abstract
We consider the problem of finding a few representatives for a dataset, i.e., a subset of data points that efficiently describes the entire dataset. We assume that each data point can be expressed as a linear combination of the representatives and formulate the problem of finding the representatives as a sparse multiple measurement vector problem. In our formulation, both the dictionary and the measurements are given by the data matrix, and the unknown sparse codes select the representatives via convex optimization. In general, we do not assume that the data are low-rank or distributed around cluster centers. When the data do come from a collection of low-rank models, we show that our method automatically selects a few representatives from each low-rank model. We also analyze the geometry of the representatives and discuss their relationship to the vertices of the convex hull of the data. We show that our framework can be extended to detect and reject outliers in datasets, and to efficiently deal with new observations and large datasets. The proposed framework and theoretical foundations are illustrated with examples in video summarization and image classification using representatives.
1. Introduction

In many areas of machine learning, computer vision, signal/image processing, and information retrieval, one needs to deal with massive collections of data, such as databases of images, videos, and text documents. This has motivated a lot of work in the area of dimensionality reduction, whose goal is to find compact representations of the data that can save memory and computational time and also improve the performance of algorithms that deal with the data. Moreover, dimensionality reduction can also improve our understanding and interpretation of the data.
Because datasets consist of high-dimensional data, most dimensionality reduction methods aim at reducing the feature-space dimension for all the data, e.g., PCA [25], LLE [34], Isomap [36], Diffusion Maps [7], etc. However, another important problem related to large datasets is to find a subset of the data that appropriately represents the whole dataset, thereby reducing the object-space dimension. This is of particular importance in summarizing and visualizing large datasets of natural scenes, objects, faces, hyperspectral data, videos, and text. In addition, this summarization helps to remove outliers from the data, as they are not true representatives of the dataset. Finally, the memory requirements and computational time of classification and clustering algorithms improve by working on a reduced number of representative data points as opposed to the full dataset.

Prior Work. To reduce the dimension of the data in the object space and find representative points, several methods have been proposed [19, 21, 26, 27, 38]. However, most algorithms assume that the data are either distributed around centers or lie in a low-dimensional space. Kmedoids [26], which can be considered as a variant of Kmeans, assumes that the data are distributed around several cluster centers, called medoids, which are selected from the data. Kmedoids, similar to Kmeans, is an iterative algorithm that strongly depends on the initialization. When similarities/dissimilarities between pairs of data points are given and there is a natural clustering based on the similarities, Affinity Propagation [19], similar to Kmedoids, tries to find a data center for each cluster using a message-passing algorithm. When the collection of data points is low-rank, the Rank Revealing QR (RRQR) algorithm [5, 6] tries to select a few data points by finding a permutation of the data that gives the best-conditioned submatrix. The algorithm has suboptimal properties, as it is not guaranteed to find the globally optimal solution in polynomial time, and it also relies on the low-rankness assumption. In addition, randomized algorithms for selecting a few columns from a low-rank matrix have been proposed [38]. For a low-rank matrix with missing entries, [2] proposes a greedy algorithm to select a subset of the columns. For a data matrix with nonnegative entries, [17] proposes a nonnegative matrix factorization using an $\ell_1/\ell_\infty$ optimization to select some of the columns of the data matrix for one of the factors.
Figure 1. Some frames of the Society Raffles video and the automatically computed representatives of the whole video sequence using our algorithm. The representatives summarize the video as follows: 1) there is a nicely decorated living room, with a door stage left and a settee in front of an open window in the foreground; 2) a man in the room is talking to someone across the window; 3) a couple enter the room: a man, and a woman who is wearing a white gown and a jeweled tiara. Someone, probably the first man, is standing on the other side of the room; 4) the man who entered with the woman is talking to her and bowing; probably he wants to leave; 5) the first man is sitting with the woman and is reaching for her tiara; 6) the first man is leaving the room; a person is standing across the window and examining the tiara; 7) the woman is entering back into the living room, so she had followed the first man to the door; 8) the woman is clutching her head upon seeing the bandit across the window; 9) the woman is fainting on the sofa and the bandit has disappeared.

Figure 2. Some frames of a tennis match video, which consists of multiple shots, and the automatically computed representatives of the whole video sequence using our algorithm. Depending on the amount of activity in each shot of the video, we obtained one or a few representatives for that shot.

Paper Contributions. In this work, we study the problem of finding data representatives using dimensionality reduction in the object space. We assume that there is a subset of data points, called representatives, such that each point in the dataset can be described as a linear combination of a few of the representative points. More specifically, collecting the N data points of a dataset in $\mathbb{R}^m$ as columns of a data matrix $Y \in \mathbb{R}^{m \times N}$, we consider the optimization problem

$\min_C \; \|Y - YC\|_F^2 \quad \text{s.t.} \quad \|C\|_{\mathrm{row},0} \le k, \;\; \mathbf{1}^\top C = \mathbf{1}^\top, \qquad (1)$

where $C \in \mathbb{R}^{N \times N}$ is the coefficient matrix and $\|C\|_{\mathrm{row},0}$ counts the number of nonzero rows of $C$ [24, 37]. In other words, we wish to find at most $k \ll N$ representatives that best reconstruct the data collection. This can be viewed as a sparse dictionary learning scheme [1, 30, 33] where the atoms of the dictionary are chosen from the data points and, instead of letting the support of the sparse codes be arbitrary, we enforce them to have a common support.
The self-expressiveness property, $Y = YC$, has been studied for subspace clustering using sparse representation [11, 15] and low-rank representation [18, 29]. However, these algorithms are not targeted at finding representatives because of the norms they use for $C$. A framework similar to that in (1), with a nonnegativity constraint on $C$ and without the affine constraint, has been used in nonnegative matrix factorization for the problem of hyperspectral-imaging endmember identification [17], without an analysis of the selected columns. In the context of dictionary learning, [4] and [31] use $\|C\|_{\mathrm{row},0}$ to design compact dictionaries and to select similar patches in an image, respectively.
In this work, we propose an algorithm for solving a convex relaxation of (1) and provide an analysis of the theoretical guarantees of the algorithm. Our work has the following contributions with respect to the state of the art:
– Unlike prior work, we do not assume that the data are low-rank or distributed around cluster centers. We only require the total number of representatives to be much smaller than the number of actual points in the dataset.
– When the data come from a collection of low-rank models, we show that our method automatically selects a few data points from each model.
– We analyze the geometry of the representatives and show that they correspond to vertices of the convex hull of the data.
– We propose a framework to detect and reject outliers from the dataset using the solution of the proposed optimization program. We also show how to deal with new observations and large datasets efficiently.
– We demonstrate the proposed framework in applications to video summarization (Figs. 1-2) and classification using representatives.
2. Problem Formulation

Consider a set of points in $\mathbb{R}^m$ arranged as the columns of the data matrix $Y = [\, y_1 \;\cdots\; y_N \,]$. In this section, we formulate the problem of finding representative objects from the collection of data points.
2.1. Learning Compact Dictionaries
Finding compact dictionaries to represent data has been well studied in the literature [1, 16, 25, 30, 33]. More specifically, in dictionary learning problems, one tries to simultaneously learn a compact dictionary $D = [\, d_1 \;\cdots\; d_k \,] \in \mathbb{R}^{m \times k}$ and coefficients $X = [\, x_1 \;\cdots\; x_N \,] \in \mathbb{R}^{k \times N}$ that can efficiently represent the collection of data points. The best representation of the data is typically obtained by minimizing the objective function

$\sum_{i=1}^{N} \|y_i - D x_i\|_2^2 = \|Y - DX\|_F^2 \qquad (2)$
with respect to the dictionary $D$ and the coefficient matrix $X$, subject to appropriate constraints. When the dictionary $D$ is constrained to have orthonormal columns and $X$ is unconstrained, the optimal solution for $D$ is given by the $k$ leading singular vectors of $Y$ [25]. On the other hand, in the sparse dictionary learning framework [1, 16, 30, 33], one requires the coefficient matrix $X$ to be sparse by solving the optimization program
$\min_{D,X} \; \|Y - DX\|_F^2 \quad \text{s.t.} \quad \|x_i\|_0 \le s, \;\; \|d_j\|_2 \le 1, \;\; \forall\, i, j, \qquad (3)$

where $\|x_i\|_0$ indicates the number of nonzero elements of $x_i$ (its convex surrogate can be used as well). In other words, one simultaneously learns a dictionary and coefficients such that each data point $y_i$ is written as a linear combination of at most $s$ atoms of the dictionary. Besides being NP-hard due to the use of the $\ell_0$ norm, this problem is nonconvex because of the product of the two unknown and constrained matrices $D$ and $X$. As a result, iterative procedures are employed to find each unknown matrix by fixing the other, which often converge to a local minimizer [1, 16].
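To make the alternation concrete, the following is a minimal sketch of such a procedure (not the method of any of the cited works): it alternates OMP-based sparse coding with a least-squares dictionary update. The helper name, initialization, and iteration count are illustrative choices.

```python
# Minimal sketch of alternating dictionary learning for (3):
# fix D and solve for sparse codes X via OMP, then fix X and update D
# by least squares with column normalization. All parameters are
# illustrative, not values from the paper.
import numpy as np
from sklearn.linear_model import orthogonal_mp

def learn_dictionary(Y, n_atoms, s, n_iters=20, seed=0):
    rng = np.random.default_rng(seed)
    N = Y.shape[1]
    # Initialize the atoms with randomly chosen, normalized data points.
    D = Y[:, rng.choice(N, size=n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_iters):
        # Sparse coding step: each x_i has at most s nonzeros (||x_i||_0 <= s).
        X = orthogonal_mp(D, Y, n_nonzero_coefs=s)
        # Dictionary update step: least-squares fit, then renormalize
        # columns so that ||d_j||_2 <= 1.
        D = Y @ np.linalg.pinv(X)
        D /= np.maximum(np.linalg.norm(D, axis=0, keepdims=True), 1e-12)
    return D, X
```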
2.2. Finding Representative Data
The learned atoms of the dictionary almost never coincide with the original data [30, 31, 33]; hence, they cannot be considered as good representatives for the collection of data points. To find representative points that coincide with some of the actual data points, we consider a modification to the dictionary learning framework, which first addresses the problem of local minima due to the product of two unknown matrices, i.e., the dictionary and the coefficient matrix, and second, enforces selecting representatives from the actual data points. To do that, we set the dictionary to be the matrix of data points $Y$ and minimize the expression

$\sum_{i=1}^{N} \|y_i - Y c_i\|_2^2 = \|Y - YC\|_F^2 \qquad (4)$

with respect to the coefficient matrix $C \triangleq [\, c_1 \;\cdots\; c_N \,] \in \mathbb{R}^{N \times N}$, subject to additional constraints that we describe next. In other words, we minimize the reconstruction error of each data point as a linear combination of all the data. To choose $k \ll N$ representatives, which take part in the linear reconstruction of all the data in (4), we enforce
$\|C\|_{0,q} \le k, \qquad (5)$

where the mixed $\ell_0/\ell_q$ norm is defined as $\|C\|_{0,q} \triangleq \sum_{i=1}^{N} I(\|c^i\|_q > 0)$, with $c^i$ denoting the $i$-th row of $C$ and $I(\cdot)$ the indicator function. In other words, $\|C\|_{0,q}$ counts the number of nonzero rows of $C$. The indices of the nonzero rows of $C$ correspond to the indices of the columns of $Y$ which are chosen as the data representatives. Similar to other dimensionality reduction methods, we want the selection of representatives to be invariant with respect to a global translation of the data. We thus enforce the affine constraint $\mathbf{1}^\top C = \mathbf{1}^\top$. This comes from the fact that if $y_i$ is represented as $y_i = Y c_i$, then for a global translation $T \in \mathbb{R}^m$ of the data we want to have $y_i - T = [\, y_1 - T \;\cdots\; y_N - T \,]\, c_i$, which holds for the same $c_i$ exactly when the entries of $c_i$ sum to one.
As a result, to find $k \ll N$ representatives such that each point in the dataset can be represented as an affine combination of a subset of the $k$ representatives, we solve

$\min_C \; \|Y - YC\|_F^2 \quad \text{s.t.} \quad \|C\|_{0,q} \le k, \;\; \mathbf{1}^\top C = \mathbf{1}^\top. \qquad (6)$

This is an NP-hard problem, as it requires searching over every subset of $k$ columns of $Y$. A standard $\ell_1$ relaxation of this optimization is obtained as

$\min_C \; \|Y - YC\|_F^2 \quad \text{s.t.} \quad \|C\|_{1,q} \le \tau, \;\; \mathbf{1}^\top C = \mathbf{1}^\top, \qquad (7)$

where $\|C\|_{1,q} \triangleq \sum_{i=1}^{N} \|c^i\|_q$ is the sum of the $\ell_q$ norms of the rows of $C$, and $\tau > 0$ is an appropriately chosen parameter.¹ We also choose $q > 1$, for which the optimization program in (7) is convex.²

¹We use $\tau$ instead of $k$ since, for the $k$ optimal representatives, $\|C\|_{1,q}$ is not necessarily bounded by $k$.
²We do not consider $q = 1$ since $\|\cdot\|_{1,1}$ treats the rows and columns equally and does not necessarily favor selecting a few nonzero rows.
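As a concrete illustration, here is a minimal sketch of (7) in cvxpy, assuming $q = 2$ (so the row penalty is a sum of row $\ell_2$ norms); the function name and the row-norm threshold are illustrative, and the solver is left at its defaults.

```python
# Sketch of the relaxed program (7) with q = 2.
import cvxpy as cp
import numpy as np

def find_representatives(Y, tau):
    N = Y.shape[1]
    C = cp.Variable((N, N))
    objective = cp.Minimize(cp.sum_squares(Y - Y @ C))
    constraints = [cp.sum(cp.norm(C, 2, axis=1)) <= tau,  # ||C||_{1,2} <= tau
                   cp.sum(C, axis=0) == 1]                # 1^T C = 1^T
    cp.Problem(objective, constraints).solve()
    # Representatives = rows of C with non-negligible norm; the threshold
    # is a free choice.
    row_norms = np.linalg.norm(C.value, axis=1)
    return np.nonzero(row_norms > 1e-3 * row_norms.max())[0]
```

For the variant (10) introduced below, the residual term and the row-norm term simply trade places between the objective and the constraint.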
The solution of the optimization program (7) not only indicates the representatives as the nonzero rows of $C$, but also provides information about the ranking, i.e., the relative importance, of the representatives for describing the dataset. More precisely, a representative that has a higher ranking takes part in the reconstruction of many points in the dataset; hence, its corresponding row in the optimal coefficient matrix $C$ has many nonzero elements with large values. On the other hand, a representative with a lower ranking takes part in the reconstruction of fewer points in the dataset; hence, its corresponding row in $C$ has a few nonzero elements with smaller values. Thus, we can rank $k$ representatives $y_{i_1}, \ldots, y_{i_k}$ as $i_1 \ge i_2 \ge \cdots \ge i_k$, i.e., $y_{i_1}$ has the highest rank and $y_{i_k}$ has the lowest rank, whenever for the corresponding rows of $C$ we have

$\|c^{i_1}\|_q \ge \|c^{i_2}\|_q \ge \cdots \ge \|c^{i_k}\|_q. \qquad (8)$
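For instance, a small helper along these lines (again assuming $q = 2$, with a hypothetical selection threshold) recovers the selected rows and orders them according to (8):

```python
# Rank the selected representatives by the l2-norms of their rows of C,
# per (8). q = 2 and the threshold are assumptions.
import numpy as np

def rank_representatives(C, thresh=1e-3):
    norms = np.linalg.norm(C, axis=1)                  # ||c^i||_2 per row
    reps = np.nonzero(norms > thresh * norms.max())[0]
    return reps[np.argsort(-norms[reps])]              # highest-ranked first
```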
Another optimization formulation, which is closely related to (6), is

$\min_C \; \|C\|_{0,q} \quad \text{s.t.} \quad \|Y - YC\|_F \le \varepsilon, \;\; \mathbf{1}^\top C = \mathbf{1}^\top, \qquad (9)$

which minimizes the number of representatives that can reconstruct the collection of data points up to an $\varepsilon$ error. An $\ell_1$ relaxation of it is given by

$\min_C \; \|C\|_{1,q} \quad \text{s.t.} \quad \|Y - YC\|_F \le \varepsilon, \;\; \mathbf{1}^\top C = \mathbf{1}^\top. \qquad (10)$

This optimization problem can also be viewed in a compression scheme where we want to choose a few representatives that can reconstruct the data up to an $\varepsilon$ error.
3. Geometry of Representatives
We now study the geometry of the representative points obtained from the proposed convex optimization programs. We consider the optimization program (10) where we set the error tolerance $\varepsilon$ to zero. First, we show that (10), with a natural additional nonnegativity constraint on $C$, finds the vertices of the convex hull of the dataset. This is, on its own, an interesting result for computing convex hulls using sparse representation methods and convex optimization. In addition, the robust versions of the optimization program, e.g., $\varepsilon > 0$, offer robust approaches for selecting convex hull vertices when the data are perturbed by noise. More precisely, for the optimization program

$\min_C \; \|C\|_{1,q} \quad \text{s.t.} \quad Y = YC, \;\; \mathbf{1}^\top C = \mathbf{1}^\top, \;\; C \ge 0, \qquad (11)$

we have the following result [10].
Theorem 1. Let $H$ be the convex hull of the columns of $Y$, and let $k$ be the number of vertices of $H$. The nonzero rows of the solution of the optimization program (11), for $1 < q \le \infty$, correspond to the $k$ vertices of $H$. More precisely, the optimal solution $C^*$ has the following form

$C^* = \Gamma \begin{bmatrix} I_k & \Delta \\ 0 & 0 \end{bmatrix}, \qquad (12)$

where $I_k$ is the $k$-dimensional identity matrix, the elements of $\Delta$ lie in $[0, 1)$, and $\Gamma$ is a permutation matrix.

Theorem 1 implies that, if the coefficient matrix is nonnegative, the representatives are the vertices of the convex hull of the data, $H$.³ Without the nonnegativity constraint, one would expect to choose a subset of the vertices of $H$ as the representatives. In addition, when the data lie in a $(k-1)$-dimensional subspace and are enclosed by $k$ data points, i.e., $H$ has $k$ vertices, then we can find exactly $k$ representatives, given by the vertices of $H$. More precisely, we show the following result [10].
Theorem 2. Let $H$ be the convex hull of the columns of $Y$, and let $k$ be the number of vertices of $H$. Consider the optimization program (10) for $1 < q \le \infty$ and $\varepsilon = 0$. Then the nonzero rows of a solution correspond to a subset of the vertices of $H$ that span the affine subspace containing the data. Moreover, if the columns of $Y$ lie in a $(k-1)$-dimensional affine subspace of $\mathbb{R}^m$, a solution is of the form

$C^* = \Gamma \begin{bmatrix} I_k & \Delta \\ 0 & 0 \end{bmatrix}, \qquad (13)$

where $\Gamma$ is a permutation matrix and the $k$ nonzero rows of $C^*$ correspond to the $k$ vertices of $H$.
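Theorem 1 is easy to probe numerically. The following demo, a sketch under the assumptions $q = 2$ and exact self-expression $Y = YC$, solves (11) on random planar data and checks the selected rows against the hull vertices reported by scipy; the sizes and threshold are illustrative.

```python
# Demo of Theorem 1: with C >= 0, program (11) should select exactly the
# convex hull vertices of the data.
import cvxpy as cp
import numpy as np
from scipy.spatial import ConvexHull

rng = np.random.default_rng(0)
Y = rng.standard_normal((2, 30))                     # 30 points in the plane
N = Y.shape[1]

C = cp.Variable((N, N), nonneg=True)
cp.Problem(cp.Minimize(cp.sum(cp.norm(C, 2, axis=1))),    # ||C||_{1,2}
           [Y @ C == Y, cp.sum(C, axis=0) == 1]).solve()  # Y = YC, 1^T C = 1^T

norms = np.linalg.norm(C.value, axis=1)
selected = set(np.nonzero(norms > 1e-4 * norms.max())[0])
print(selected == set(ConvexHull(Y.T).vertices))     # expected: True
```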
4. Representatives of Subspaces
We now show that when the data come from a collection of low-rank models, the representatives provide information about the underlying models. More specifically, we assume that the data lie in a union of affine subspaces $S_1, \ldots, S_n$ of $\mathbb{R}^m$ and consider the optimization program

$\min_C \; \|C\|_{1,q} \quad \text{s.t.} \quad Y = YC, \;\; \mathbf{1}^\top C = \mathbf{1}^\top. \qquad (14)$

In this case, the solution of (14) selects representatives from every subspace (left plot of Figure 3), where the number of representatives from each subspace is greater than or equal to its dimension. More precisely, we have the following result [10].

Theorem 3. If the data points are drawn from a union of independent subspaces, i.e., if the subspaces are such that $\dim(\oplus_i S_i) = \sum_i \dim(S_i)$, then the solution of (14) finds at least $\dim(S_i)$ representatives from each subspace $S_i$. In addition, each data point is perfectly reconstructed by the combination of the representatives from its own subspace.

Since the dimension of the collection of representatives in each subspace $S_i$ is equal to $\dim(S_i)$, the dimension of the collection of representatives from all subspaces can be as large as the dimension of the ambient space $m$, by the fact that $\sum_i \dim(S_i) = \dim(\oplus_i S_i) \le m$.

³Note that the solution of the $\ell_1$ minimization without the affine and nonnegativity constraints is known to choose a few of the vertices of the symmetrized convex hull of the data [8]. Our result is different, as we place a general mixed $\ell_1/\ell_q$ norm on the rows of $C$ and show that for any $q > 1$ the solution of (11) finds all vertices of the convex hull of the data.
Figure 3. Left: representatives (obtained from the solution of (14)) of data points lying in a union of subspaces. Right: representatives of a dataset with outliers. The last set of points corresponds to outliers.
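Theorem 3 can be checked numerically in the same way. The sketch below draws points from two independent random subspaces of $\mathbb{R}^{10}$ (dimensions and sample sizes are illustrative), solves (14) with $q = 2$, and counts the representatives selected from each subspace.

```python
# Demo of Theorem 3: data from two independent subspaces of R^10.
# Program (14) should select at least dim(S_i) representatives from each.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(1)
m, d1, d2, n_pts = 10, 2, 3, 40
B1 = np.linalg.qr(rng.standard_normal((m, d1)))[0]  # orthonormal basis of S_1
B2 = np.linalg.qr(rng.standard_normal((m, d2)))[0]  # orthonormal basis of S_2
Y = np.hstack([B1 @ rng.standard_normal((d1, n_pts)),
               B2 @ rng.standard_normal((d2, n_pts))])

N = Y.shape[1]
C = cp.Variable((N, N))
cp.Problem(cp.Minimize(cp.sum(cp.norm(C, 2, axis=1))),   # ||C||_{1,2}
           [Y @ C == Y, cp.sum(C, axis=0) == 1]).solve()

norms = np.linalg.norm(C.value, axis=1)
reps = np.nonzero(norms > 1e-4 * norms.max())[0]
print("representatives from S_1:", int(np.sum(reps < n_pts)))
print("representatives from S_2:", int(np.sum(reps >= n_pts)))
```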
The optimization program (14) can also address the connectivity issues [32] of subspace clustering algorithms based on sparse representation [11, 15, 35] or low-rank representation [18, 29]. More precisely, as discussed in [15], adding a regularizer of the form $\|C\|_{1,2}$ to the sparse [11] or low-rank [29] objective function improves the connectivity of the points in each subspace, preventing the points in a subspace from being divided into multiple components of the similarity graph.
5. Practical Considerations and Extensions

We now discuss some of the practical problems related to finding representative points of real datasets.

5.1. Dealing with Outliers
In many real-world problems, the collection of data includes outliers. For example, a dataset of natural scenes, objects, or faces collected from the internet can contain images that do not belong to the target category. A method that robustly finds true representatives for the dataset is of particular importance, as it reduces the redundancy of the data and removes points that do not really belong to the dataset. In this section, we discuss how our method can directly deal with outliers and robustly find representatives for datasets. We use the fact that outliers are often incoherent with respect to the collection of the true data. Hence, an outlier prefers to write itself as an affine combination of itself, while true data points choose points among themselves as representatives, as they are more coherent with each other. In other words, if we denote the inliers by $Y$ and the outliers by $Y_o \in \mathbb{R}^{m \times N_o}$, then for the optimization program

$\min_C \; \|C\|_{1,q} \quad \text{s.t.} \quad [\, Y \;\; Y_o \,] = [\, Y \;\; Y_o \,]\, C, \;\; \mathbf{1}^\top C = \mathbf{1}^\top, \qquad (15)$

we expect the solution to have the structure

$C^* = \begin{bmatrix} \Delta & 0 \\ 0 & I_{N_o} \end{bmatrix}. \qquad (16)$
In other words, each outlier is a representative of itself, as shown in the right plot of Figure 3. We can therefore identify the outliers by analyzing the row-sparsity of the solution. Among the rows of the coefficient matrix that correspond to the representatives, the ones that have many nonzero elements correspond to the true data, and the ones that have just one nonzero element correspond to outliers. In practice, $C^*$ might not have exactly the form of (16). However, we still expect that an outlier takes part in the representation of only a few other outliers or true data points. Hence, the rows of $C^*$ corresponding to outliers should have very few nonzero entries. To detect and reject outliers, we define the row-sparsity index of each candidate representative as

$\mathrm{rsi}(i) = \frac{N \|c^i\|_\infty - \|c^i\|_1}{(N-1)\, \|c^i\|_1} \in [0, 1].$⁴ $\qquad (17)$

For a row corresponding to an outlier, which has one or a few nonzero elements, the rsi value is close to 1, while for a row corresponding to a true representative the rsi is close to zero. Hence, we can reject outliers by discarding representatives whose rsi value is larger than a threshold $\delta$.
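A direct numpy transcription of the rejection rule based on (17) follows; the threshold $\delta$ is an assumed value, not one prescribed by the paper.

```python
# Row-sparsity index (17) and an outlier-rejection rule based on it.
import numpy as np

def row_sparsity_index(c):
    # rsi for a single nonzero row c^i of the coefficient matrix, per (17).
    c = np.abs(np.asarray(c))
    N = c.size
    return (N * c.max() - c.sum()) / ((N - 1) * c.sum())

def reject_outliers(C, rep_idx, delta=0.9):
    # Rows that are nearly 1-sparse (rsi close to 1) are flagged as
    # outliers; the rest are kept as true representatives.
    return [i for i in rep_idx if row_sparsity_index(C[i]) <= delta]
```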
5.2. Dealing with New Observations

An important problem in finding representatives is to update the set of representative points when new data are added to the dataset. Let $Y$ be the collection of points already in the dataset, and let $Y_{\mathrm{new}}$ be the new points that are added to the dataset. In order to find the representatives for the whole dataset, including the old and the new data, one has to solve the optimization program

$\min_C \; \|C\|_{1,q} \quad \text{s.t.} \quad [\, Y \;\; Y_{\mathrm{new}} \,] = [\, Y \;\; Y_{\mathrm{new}} \,]\, C, \;\; \mathbf{1}^\top C = \mathbf{1}^\top. \qquad (18)$

However, note that we have already found the representatives of $Y$, denoted by $Y_{\mathrm{rep}}$, which can efficiently describe the collection of data in $Y$. Thus, it is sufficient to see if the elements of $Y_{\mathrm{rep}}$ are good representatives of the new data $Y_{\mathrm{new}}$, or, equivalently, to update the representatives so that they can well describe the elements of $Y_{\mathrm{rep}}$ as well as $Y_{\mathrm{new}}$. Thus, we can solve the optimization program

$\min_C \; \|C\|_{1,q} \quad \text{s.t.} \quad [\, Y_{\mathrm{rep}} \;\; Y_{\mathrm{new}} \,] = [\, Y_{\mathrm{rep}} \;\; Y_{\mathrm{new}} \,]\, C, \;\; \mathbf{1}^\top C = \mathbf{1}^\top, \qquad (19)$

on the reduced dataset $[\, Y_{\mathrm{rep}} \;\; Y_{\mathrm{new}} \,]$, which is typically of much smaller size than $[\, Y \;\; Y_{\mathrm{new}} \,]$; hence, it can be solved more efficiently.⁵
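The incremental update in (19) then reduces to re-solving on the smaller matrix. The sketch below reuses, for illustration, the hypothetical find_representatives helper from the Section 2.2 sketch, which solves the $\tau$-constrained variant (7) rather than the exact form of (19); the same row-selection idea applies.

```python
# Sketch of the incremental update (19): re-solve only on [Y_rep, Y_new].
# Relies on the illustrative find_representatives helper defined earlier;
# tau is an assumed parameter.
import numpy as np

def update_representatives(Y_rep, Y_new, tau):
    Y_small = np.hstack([Y_rep, Y_new])   # reduced dataset [Y_rep  Y_new]
    idx = find_representatives(Y_small, tau)
    return Y_small[:, idx]                # updated set of representatives
```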
Using similar ideas, we can also deal with large datasets using a hierarchical framework. More specifically, we can
⁴We use the fact that for $c \in \mathbb{R}^N$ we have $\|c\|_1 / N \le \|c\|_\infty \le \|c\|_1$.
⁵In general, we can minimize $\|QC\|_{1,q}$ for a diagonal nonnegative matrix $Q$, which gives relative weights to keeping the old representatives and selecting new representatives.