Minimax embeddings
Matthew Brand
Mitsubishi Electric Research Labs
Cambridge MA 02139 USA
Abstract
Spectral methods for nonlinear dimensionality reduction (NLDR) impose a neighborhood graph on point data and compute eigenfunctions of a quadratic form generated from the graph. We introduce a more general and more robust formulation of NLDR based on the singular value decomposition (SVD). In this framework, most spectral NLDR principles can be recovered by taking a subset of the constraints in a quadratic form built from local nullspaces on the manifold. The minimax formulation also opens up an interesting class of methods in which the graph is "decorated" with information at the vertices, offering discrete or continuous maps, reduced computational complexity, and immunity to some solution instabilities of eigenfunction approaches. Apropos, we show almost all NLDR methods based on eigenvalue decompositions (EVD) have a solution instability that increases faster than problem size. This pathology can be observed (and corrected via the minimax formulation) in problems as small as N < 100 points.
1 Nonlinear dimensionality reduction (NLDR)
Spectral NLDR methods are graph embedding problems where a set of N points X ≐ [x_1, ···, x_N] ∈ R^{D×N} sampled from a low-dimensional manifold in an ambient space R^D is reparameterized by imposing a neighborhood graph G on X and embedding the graph with minimal distortion in a "parameterization" space R^d, d < D. Typically the graph is sparse and local, with edges connecting points to their immediate neighbors. The embedding must keep these edges short or preserve their length (for isometry) or angles (for conformality).

The graph-embedding problem was first introduced as a least-squares problem by Tutte [1], and as an eigenvalue problem by Fiedler [2]. The use of sparse graphs to generate metrics for least-squares problems has been studied intensely in the following three decades (see [3]). Modern NLDR methods use graph constraints to generate a metric in a space of embeddings R^N. Eigenvalue decomposition (EVD) gives the directions of least or greatest variance under this metric. Typically a subset of d extremal eigenvectors gives the embedding of N points in R^d parameterization space. This includes the IsoMap family [4], the locally linear embedding (LLE) family [5, 6], and Laplacian methods [7, 8]. Using similar methods, the Automatic Alignment [6] and Charting [9] algorithms embed local subspaces instead of points, and by combining subspace projections thus obtain continuous maps between R^D and R^d.
This paper introduces a general algebraic framework for computing optimal embeddings directly from graph constraints. The aforementioned methods can be recovered as special cases. The framework also suggests some new methods with very attractive properties, including continuous maps, reduced computational complexity, and control over the degree of conformality/isometry in the desired map. It also eliminates a solution instability that is intrinsic to EVD-based approaches. A perturbational analysis quantifies the instability.
2 Minimax theorem for graph embeddings
We begin with a neighborhood graph specified by a nondiagonal weighted adjacency matrix M ∈ R^{N×N} that has the data-reproducing property XM = X (this can be relaxed to XM ≈ X in practice). The graph-embedding and NLDR literatures offer various constructions of M, each appropriate to different sets of assumptions about the original embedding and its sampling (e.g., isometry, local linearity, noiseless samples, regular sampling, etc.). Typically M_{ij} ≠ 0 if points i, j are nearby on the intrinsic manifold and |M_{ij}| is small or zero otherwise. Each point is taken to be a linear or convex combination of its neighbors, and thus M specifies manifold connectivity in the sense that any nondegenerate embedding Y that satisfies YM ≈ Y with small residual ‖YM − Y‖_F will preserve this connectivity and the structure of local neighborhoods. For example, in barycentric embeddings, each point is the average of its neighbors and thus M_{ij} = 1/k if vertex i is connected to vertex j (of degree k). We will also consider three optional constraints on the embedding:
1. A null-space restriction, where the solution must lie outside the column-space of C ∈ R^{N×M}, M < N. For example, it is common to stipulate that the solution Y be centered, i.e., YC = 0 for C = 1, the constant vector.
2. A basis restriction, where the solution must be a linear combination of the rows of a basis Z ∈ R^{K×N}, K ≤ N. This can be thought of as information placed at the vertices of the graph that serves as example inputs for a target NLDR function. We will use this to construct dimension-reducing radial basis function networks.
3. A metric Σ ∈ R^{N×N} that determines how error is distributed over the points. For example, it might be important that boundary points have less error. We assume that Σ is symmetric positive definite and has factorization Σ = AA⊤ (e.g., A could be a Cholesky factor of Σ).
In most settings, the optional matrices will default to the identity matrix.
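For concreteness, the following sketch builds the barycentric constraint matrix M described above from a k-nearest-neighbor graph. It is an illustration, not code from the paper; the function name, the use of NumPy, and the neighbor rule are our own assumptions.

```python
import numpy as np

def barycentric_M(X, k=8):
    """Hedged sketch: barycentric constraint matrix for data X (D x N).
    Each point is modeled as the average of its k nearest neighbors, so the
    column for point i has entries 1/k at its neighbors and X @ M is
    approximately X on densely sampled, locally flat regions."""
    D, N = X.shape
    M = np.zeros((N, N))
    # pairwise squared Euclidean distances, sq[j, i] = ||x_j - x_i||^2
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    for i in range(N):
        nbrs = np.argsort(sq[:, i])[1:k + 1]   # skip the point itself
        M[nbrs, i] = 1.0 / k                   # column i reconstructs x_i as a neighbor average
    return M
```

With Z, C, and A left at their defaults, such an M can be passed directly to the minimax solution given below.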
In this context, we define the per-dimension embedding error of row-vector y_i ∈ rows(Y) to be

E_M(y_i) ≐ max_{y_i ∈ range(Z), D ∈ R^{M×N}} ‖(y_i(M + CD) − y_i)A‖ / ‖y_i A‖    (1)

where D is a matrix constructed by an adversary to maximize the error. The optimizing y_i is a vector inside the subspace spanned by the rows of Z and outside the subspace spanned by the columns of C, for which the reconstruction residual y_i M − y_i has smallest norm w.r.t. the metric Σ. The following theorem identifies the optimal embedding Y for any choice of M, Z, C, Σ:
Minimax solution: Let Q ∈ S^{K×P} be a column-orthonormal basis of the null-space of (ZC)⊤, with P = K − rank(C). Let B ∈ R^{P×P} be a square factor satisfying B⊤B = Q⊤ZΣZ⊤Q, e.g., a Cholesky factor, or the "R" factor in a QR-decomposition of (Q⊤ZA)⊤. Compute the left singular vectors U ∈ S^{P×P} of U diag(s)V⊤ = B^{−⊤}Q⊤Z(I − M)A, with singular values s ≐ [s_1, ···, s_P]⊤ ordered s_1 ≤ s_2 ≤ ··· ≤ s_P. Using the leading columns U_{1:d} of U, set Y = U⊤_{1:d}B^{−⊤}Q⊤Z.
Theorem 1. Y is the optimal (minimax) embedding in R^d with error ‖[s_1, ···, s_d]‖_2:

Y ≐ U⊤_{1:d}B^{−⊤}Q⊤Z = arg min_{Y ∈ R^{d×N}} ∑_{y_i ∈ rows(Y)} E_M(y_i)²,  with E_M(y_i) = s_i.    (2)
Appendix A develops the proof and other error measures that are minimized.
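As a computational summary of the recipe above, the following sketch renders the construction of Q, B, the SVD, and the embedding Y in NumPy/SciPy. It is our own hedged transcription with illustrative defaults, not reference code from the paper.

```python
import numpy as np
from scipy.linalg import null_space, qr, svd

def minimax_embedding(M, d, Z=None, C=None, A=None):
    """Hedged sketch of the minimax solution (Theorem 1).
    M: N x N constraint matrix with Y M ~= Y; d: target dimension;
    Z: K x N basis restriction; C: N x m null-space restriction;
    A: N x N factor of the metric, Sigma = A A^T. Defaults are identities."""
    N = M.shape[0]
    Z = np.eye(N) if Z is None else Z
    A = np.eye(N) if A is None else A
    K = Z.shape[0]
    if C is None:
        Q = np.eye(K)                                # no null-space restriction
    else:
        Q = null_space((Z @ C).T)                    # columns orthogonal to the columns of ZC
    # Square factor B with B^T B = Q^T Z Sigma Z^T Q, obtained as the R factor
    # of a QR decomposition of (Q^T Z A)^T, as described in the text.
    _, B = qr((Q.T @ Z @ A).T, mode='economic')
    G = np.linalg.solve(B.T, Q.T @ Z @ (np.eye(N) - M) @ A)   # B^{-T} Q^T Z (I - M) A
    U, s, _ = svd(G, full_matrices=False)
    order = np.argsort(s)                            # ascending singular values s_1 <= ... <= s_P
    U, s = U[:, order], s[order]
    T = U[:, :d].T @ np.linalg.solve(B.T, Q.T)       # mixing matrix U_{1:d}^T B^{-T} Q^T
    Y = T @ Z                                        # d x N embedding
    return Y, s[:d], T
```

With Z = A = I and C omitted, this reduces to the special cases discussed next.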
Local NLDR techniques are easily expressed in this framework. When Z = A = I, C = [], and M reproduces X through linear combinations with M⊤1 = 1, we recover LLE [5]. When Z = I, C = [], I − M is the normalized graph Laplacian, and A is a diagonal matrix of vertex degrees, we recover Laplacian eigenmaps [7]. When further Z = X we recover locality preserving projections [8].
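To make the Laplacian-eigenmaps case concrete, here is a hedged sketch of how the corresponding M and A might be assembled from an affinity matrix W and passed to the minimax_embedding sketch above; the helper name and the affinity-to-Laplacian details are illustrative assumptions.

```python
import numpy as np

def laplacian_constraints(W):
    """Hedged sketch of the special case above: Z = I, C = [],
    I - M = normalized graph Laplacian, A = diag(vertex degrees)."""
    deg = W.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L_norm = np.eye(len(deg)) - d_inv_sqrt @ W @ d_inv_sqrt   # normalized Laplacian
    M = np.eye(len(deg)) - L_norm                             # so that I - M = L_norm
    A = np.diag(deg)                                          # metric factor, per the text
    return M, A

# Illustrative usage (assumes the minimax_embedding sketch above); passing C = 1
# suppresses the translational DOF, as recommended in section 4:
#   M, A = laplacian_constraints(W)
#   Y, s, T = minimax_embedding(M, d=2, A=A, C=np.ones((W.shape[0], 1)))
```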
3 Analysis and generalization of charting
The minimax construction of charting [9] takes some development, but offers an interesting insight into the above-mentioned methods. Recall that charting first solves for a set of local affine subspace axes S_1 ∈ R^{D×d}, S_2, ··· at offsets µ_1 ∈ R^D, µ_2, ··· that best cover the data and vary smoothly over the manifold. Each subspace offers a chart, a local parameterization of the data by projection onto the local axes. Charting then constructs a weighted mixture of affine projections that merges the charts into a global parameterization. If the data manifold is curved, each projection will assign a point a slightly different embedding, so the error is measured as the variance of the proposed embeddings about their mean. This maximizes consistency and tends to produce isometric embeddings; [9] discusses ways to explicitly optimize the isometry of the embedding.
Under the assumption of isometry, the charting error is equivalent to the sum-squared displacements of an embedded point relative to its immediate neighbors (summed over all neighborhoods). To construct the same error criterion in the minimax setting, let x_{i−k}, ···, x_i, ···, x_{i+k} denote points in the i-th neighborhood and let the columns of V_i ∈ R^{(2k+1)×d} be an orthonormal basis of the rows of the local parameterization S⊤_i[x_{i−k}, ···, x_i, ···, x_{i+k}]. Then a nonzero reparameterization will satisfy [y_{i−k}, ···, y_i, ···, y_{i+k}]V_iV⊤_i = [y_{i−k}, ···, y_i, ···, y_{i+k}] if and only if it preserves the relative position of the points in the local parameterization. Conversely, any relative displacements of the points are isolated by the formula [y_{i−k}, ···, y_i, ···, y_{i+k}](I − V_iV⊤_i). Minimizing the Frobenius norm of this expression is thus equivalent to minimizing the local error in charting. We sum the constraints over all neighborhoods to obtain the constraint matrix M = I − ∑_i F_i(I − V_iV⊤_i)F⊤_i, where (F_i)_{kj} = 1 iff the j-th point of the i-th neighborhood is the k-th point of the dataset. Because V_iV⊤_i and (I − V_iV⊤_i) are complementary, it follows that the error criterion of any local NLDR method (e.g., LLE, Laplacian eigenmaps, etc.) must measure the projection of the embedding onto some subspace of (I − V_iV⊤_i).
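The constraint matrix just described can be assembled directly from local subspace fits. The following sketch is an illustration under simplifying assumptions (local PCA axes as stand-ins for the smoothly varying charted axes S_i, a fixed neighborhood size); it is not the paper's exact charting procedure.

```python
import numpy as np

def charting_constraints(X, k=4, d=2):
    """Hedged sketch of M = I - sum_i F_i (I - V_i V_i^T) F_i^T.
    X is D x N; each neighborhood holds a point and its 2k nearest neighbors;
    the local axes are taken as the top-d principal directions (an assumption
    standing in for the charted axes S_i of [9])."""
    D, N = X.shape
    M = np.eye(N)
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)   # pairwise squared distances
    for i in range(N):
        idx = np.argsort(sq[:, i])[:2 * k + 1]                # point i and its 2k neighbors
        Xi = X[:, idx]
        S_i, _, _ = np.linalg.svd(Xi - Xi.mean(axis=1, keepdims=True), full_matrices=False)
        S_i = S_i[:, :d]                                      # local axes, D x d
        V_i, _ = np.linalg.qr((S_i.T @ Xi).T)                 # orthonormal basis of the rows
                                                              # of the local parameterization
        P_perp = np.eye(2 * k + 1) - V_i @ V_i.T              # isolates relative displacements
        M[np.ix_(idx, idx)] -= P_perp                         # scatter F_i P_perp F_i^T into M
    return M
```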
To construct a continuous map, charting uses an overcomplete radial basis function (RBF) representation Z = [z(x_1), z(x_2), ··· z(x_N)], where z(x) is a vector that stacks z_1(x), z_2(x), etc., with

z_m(x) ≐ [K⊤_m(x − µ_m); 1] · p_m(x) / ∑_{m′} p_{m′}(x),    (3)

p_m(x) ≐ N(x | µ_m, Σ_m) ∝ e^{−(x−µ_m)⊤Σ^{−1}_m(x−µ_m)/2},    (4)

and K_m is any local linear dimensionality reducer, typically S_m itself. Each column of Z contains many "views" of the same point that are combined to give its low-dimensional embedding.
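A direct transcription of eqs. (3)-(4) might look like the following sketch; the function name and argument conventions are assumptions, and K_m is passed in explicitly (e.g., the chart axes S_m).

```python
import numpy as np

def rbf_features(x, mus, Sigmas, Ks):
    """Hedged sketch of the stacked RBF representation z(x) of eqs. (3)-(4).
    mus: list of centers mu_m (D,); Sigmas: list of covariances (D, D);
    Ks: list of local linear reducers K_m (D, d)."""
    # unnormalized Gaussian responsibilities p_m(x), eq. (4)
    p = np.array([np.exp(-0.5 * (x - mu) @ np.linalg.solve(Sig, x - mu))
                  for mu, Sig in zip(mus, Sigmas)])
    p = p / p.sum()
    # each block stacks the local projection and a constant 1, weighted by p_m(x), eq. (3)
    blocks = [np.concatenate([K.T @ (x - mu), [1.0]]) * pm
              for K, mu, pm in zip(Ks, mus, p)]
    return np.concatenate(blocks)           # z(x), length M*(d+1)
```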
Finally, we set C = 1, which forces the embedding of the full data to be centered. Applying the minimax solution to these constraints yields the RBF network mixing matrix, f(x) ≐ U⊤_{1:d}B^{−⊤}Q⊤z(x). Theorem 1 guarantees that the resulting embedding is least-squares optimal w.r.t. Z, M, C, A at the datapoints f(x_i), and because f(·) is an affine transform of z(·) it smoothly interpolates the embedding between points.
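Putting the pieces together, a hedged end-to-end usage sketch, reusing the illustrative minimax_embedding and rbf_features sketches above (all names assumed), would be:

```python
import numpy as np

# points: list of N samples x_i in R^D; mus, Sigmas, Ks: RBF parameters as above;
# M: N x N constraint matrix derived from the neighborhood graph.
Z = np.column_stack([rbf_features(x, mus, Sigmas, Ks) for x in points])
C = np.ones((len(points), 1))                       # forces the embedding to be centered
Y, s, T = minimax_embedding(M, d=2, Z=Z, C=C)       # T = U_{1:d}^T B^{-T} Q^T
f = lambda x: T @ rbf_features(x, mus, Sigmas, Ks)  # continuous map f(x), affine in z(x)
```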
Fig. 1. Minimax and generalized EVD solutions for a kernel eigenmap of a non-developable swiss roll. Points are connected into a grid which ideally should be regular. The EVD solution shows substantial degradation. Insets detail corners where the EVD solution crosses itself repeatedly. The border compression is characteristic of Laplacian constraints.

There are some interesting variants:
One-shot charting: If we set the local dimensionality reducers to the identity matrix (all K_m = I), then the minimax method jointly optimizes the local dimensionality reduction to charts and the global coordination of the charts (under any choice of M). This requires that rows(Z) ≤ N for a fully determined solution.

Discrete isometric charting: If Z = I then we directly obtain a discrete isometric embedding of the data, rather than a continuous map, making this a local equivalent of IsoMap.

Reduced basis charting: Let Z be constructed using just a small number of kernels randomly placed on the data manifold, such that rows(Z) ≪ N. Then the size of the SVD problem is substantially reduced.
4 Numerical advantage of minimax method
Note that the minimax method projects the constraint matrix M into a subspace derived from C and Z and decomposes it there. This suppresses unwanted degrees of freedom (DOFs) admitted by the problem constraints, for example the trivial R^0 embedding where all points are mapped to a single point y_i = N^{−1/2}. The R^0 embedding serves as a translational DOF in the solution. LLE- and eigenmap-based methods construct M to have a constant null-space so that the translational DOF will be isolated in the EVD as a null eigenvalue paired to a constant eigenvector, which is then discarded. However, section 4.1 shows that this construction makes the EVD increasingly unstable as problem size grows and/or the data becomes increasingly amenable to low-residual embeddings, ultimately causing solution collapse. As the next paragraph demonstrates, the problem is exacerbated when a basis Z is used (via the equivalent generalized eigenproblem), partly because the eigenvector associated with the unwanted DOF can have arbitrary structure. In all cases the problem can be averted by using the minimax formulation with C = 1 to suppress the DOF.
A 2D plane was embedded in 3D with a curl, a twist, and 2.5% Gaussian noise, then regularly sampled at 900 points. We computed a kernelized Laplacian eigenmap using 70 random points as RBF centers, i.e., a continuous map using M derived from the graph Laplacian and Z constructed as above. The map was computed both via the minimax (SVD) method and via the equivalent generalized eigenproblem, where the translational degree of freedom must be removed by discarding an eigenvector from the solution. The two solutions are algebraically equivalent in every other regard. A variety of eigensolvers were tried; we took the best result.
Fig. 2. Excess energy in the eigenspectrum indicates that the translational DOF has contaminated many eigenvectors. (Left panel: eigenvalue vs. excess energy, comparing the eigenspectrum to the minimax spectrum; right panel: eigenvalue vs. point deviation.) If the EVD had successfully isolated the unwanted DOF, then its remaining eigenvalues should be identical to those derived from the minimax solution. The graph at left shows the difference in the eigenspectra. The graph at right shows the EVD solution's deviation from the translational vector y_0 = 1 · N^{−1/2} ≈ .03333. If the numerics were perfect the line would be flat, but in practice the deviation is significant enough (roughly 1% of the diameter of the embedding) to noticeably perturb points in figure 1.

Figure 1 shows that the EVD solution exhibits many defects, particularly a folding-over of the manifold at the top and bottom edges and at the corners. Figure 2 shows that the noisiness of the EVD solution is due largely to mutual contamination of numerically unstable eigenvectors.
4.1 Numerical instability of eigen-methods
The following theorem uses tools of matrix perturbation theory to show that as the problem size increases, the desired and unwanted eigenvectors become increasingly wobbly and gradually contaminate each other, leading to degraded solutions. More precisely, the low-order eigenvalues are ill-conditioned and exhibit multiplicities that may be true (due to noiseless samples from low-curvature manifolds) or false (due to numerical noise). Although in many cases some post-hoc algebra can "filter" the unwanted components out of the contaminated eigensolution, it is not hard to construct cases where the eigenvectors cannot be cleanly separated. The minimax formulation is immune to this problem because it explicitly suppresses the gratuitous component(s) before matrix decomposition.
Theorem 2. For any finite numerical precision, as the number of points N increases, the Frobenius norm of numerical noise in the null eigenvector v_0 can grow as O(N^{3/2}), and the eigenvalue problem can approach a false multiplicity at a rate as fast as O(N^{3/2}), at which point the eigenvectors of interest (embedding and translational) are mutually contaminated and/or have an indeterminate eigenvalue ordering.
Please see appendix B for the proof. This theorem essentially lower-bounds an upper-bound on error; examples can be constructed in which the problem is worse. For example, it can be shown analytically that when embedding points drawn from the simple curve x_i = [a, cos πa]⊤, a ∈ [0, 1] with K = 2 neighbors, instabilities cannot be bounded better than O(N^{5/2}); empirically we see eigenvector mixing with N < 100 points and we see it grow at the rate ≈ O(N^4) in many different eigensolvers. At very large scales, more pernicious instabilities set in; e.g., by N = 20000 points, the solution begins to fold over. Although algebraic multiplicity and instability of the eigenproblem is conceptually a minor oversight in the algorithmic realizations of eigenfunction embeddings, as theorem 2 shows, the consequences are eventually fatal.
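For readers who want to probe this behavior empirically, the following sketch (an assumption about the test setup, not the paper's exact protocol) generates the test curve named above; one can then build M with K = 2 neighbors, e.g. via the barycentric_M sketch from section 2, and compare a generalized EVD solution against the minimax (SVD) solution as N grows.

```python
import numpy as np

def curve_dataset(N):
    """Hedged sketch: regular samples of the curve x_i = [a, cos(pi a)]^T, a in [0, 1]."""
    a = np.linspace(0.0, 1.0, N)
    return np.vstack([a, np.cos(np.pi * a)])   # 2 x N data matrix
```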
5 Summary

One of the most appealing aspects of the spectral NLDR literature is that algorithms are usually motivated from analyses of linear operators on smooth differentiable manifolds, e.g., [7]. Understandably, these analyses rely on assumptions (e.g., smoothness or isometry or noiseless sampling) that make it difficult to predict what algorithmic realizations will do when real, noisy data violates the assumptions. The minimax embedding theorem provides a complete algebraic characterization of this discrete NLDR problem, and provides a solution that recovers numerically robustified versions of almost all known algorithms. It offers a principled way of constructing new algorithms with clear optimality properties and good numerical conditioning, notably the construction of a continuous NLDR map (an RBF network) in a one-shot optimization (SVD). We have also shown how to cast several local NLDR principles in this framework, and upgrade these methods to give continuous maps. Working in the opposite direction, we sketched the minimax formulation of isometric charting and showed that its constraint matrix contains a superset of all the algebraic constraints used in local NLDR techniques.
References
1. W. T. Tutte. How to draw a graph. Proc. London Mathematical Society, 13:743-768, 1963.
2. Miroslav Fiedler. A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory. Czech. Math. Journal, 25:619-633, 1975.
3. Fan R. K. Chung. Spectral graph theory, volume 92 of CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1997.
4. Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, December 22, 2000.
5. Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, December 22, 2000.
6. Yee Whye Teh and Sam T. Roweis. Automatic alignment of hidden representations. In Proc. NIPS-15, 2003.
7. Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Volume 14 of Advances in Neural Information Processing Systems, 2002.
8. Xiaofei He and Partha Niyogi. Locality preserving projections. Technical Report TR-2002-09, University of Chicago Computer Science, October 2002.
9. Matthew Brand. Charting a manifold. Volume 15 of Advances in Neural Information Processing Systems, 2003.
10. G. W. Stewart and Ji-Guang Sun. Matrix perturbation theory. Academic Press, 1990.

A Proof of minimax embedding theorem (1)
The burden of this proof is carried by supporting lemmas, below. To emphasize the proof strategy, we give the proof first; supporting lemmas follow.
Proof. Setting y_i = l⊤_i Z, we will solve for l_i ∈ columns(L). Writing the error in terms of l_i,

E_M(l_i) = max_{K ∈ R^{M×N}} ‖l⊤_i Z(I − M − CK)A‖ / ‖l⊤_i ZA‖ = max_{K ∈ R^{M×N}} ‖l⊤_i Z(I − M)A − l⊤_i ZCKA‖ / ‖l⊤_i ZA‖.    (5)
The term l⊤_i ZCKA produces infinite error unless l⊤_i ZC = 0, so we accept this as a constraint and seek

min_{l⊤_i ZC = 0} ‖l⊤_i Z(I − M)A‖ / ‖l⊤_i ZA‖.    (6)

By lemma 1, that orthogonality is satisfied by solving the problem in the space orthogonal to ZC; the basis for this space is given by the columns of Q ≐ null((ZC)⊤).
By lemma 2, the denominator of the error specifies the metric in solution space to be ZAA⊤Z⊤; when the problem is projected into the space orthogonal to ZC it becomes Q⊤(ZAA⊤Z⊤)Q. Nesting the "orthogonally-constrained-SVD" construction of lemma 1