Minimax embeddings
Matthew Brand
Mitsubishi Electric Research Labs
Cambridge MA 02139 USA
Abstract
Spectral methods for nonlinear dimensionality reduction (NLDR) impose a neighborhood graph on point data and compute eigenfunctions of a quadratic form generated from the graph. We introduce a more general and more robust formulation of NLDR based on the singular value decomposition (SVD). In this framework, most spectral NLDR principles can be recovered by taking a subset of the constraints in a quadratic form built from local nullspaces on the manifold. The minimax formulation also opens up an interesting class of methods in which the graph is "decorated" with information at the vertices, offering discrete or continuous maps, reduced computational complexity, and immunity to some solution instabilities of eigenfunction approaches. Apropos, we show almost all NLDR methods based on eigenvalue decompositions (EVD) have a solution instability that increases faster than problem size. This pathology can be observed (and corrected via the minimax formulation) in problems as small as N < 100 points.
1 Nonlinear dimensionality reduction (NLDR)
Spectral NLDR methods are graph embedding problems where a set of N points X ≐ [x_1, ···, x_N] ∈ R^{D×N} sampled from a low-dimensional manifold in an ambient space R^D is reparameterized by imposing a neighborhood graph G on X and embedding the graph with minimal distortion in a "parameterization" space R^d, d < D. Typically the graph is sparse and local, with edges connecting points to their immediate neighbors. The embedding must keep these edges short or preserve their length (for isometry) or angles (for conformality).

The graph-embedding problem was first introduced as a least-squares problem by Tutte [1], and as an eigenvalue problem by Fiedler [2]. The use of sparse graphs to generate metrics for least-squares problems has been studied intensely in the following three decades (see [3]). Modern NLDR methods use graph constraints to generate a metric in a space of embeddings R^N. Eigenvalue decomposition (EVD) gives the directions of least or greatest variance under this metric. Typically a subset of d extremal eigenvectors gives the embedding of N points in R^d parameterization space. This includes the IsoMap family [4], the locally linear embedding (LLE) family [5, 6], and Laplacian methods [7, 8]. Using similar methods, the Automatic Alignment [6] and Charting [9] algorithms embed local subspaces instead of points, and by combining subspace projections thus obtain continuous maps between R^D and R^d.
This paper introduces a general algebraic framework for computing optimal embeddings directly from graph constraints. The aforementioned methods can be recovered as special cases. The framework also suggests some new methods with very attractive properties, including continuous maps, reduced computational complexity, and control over the degree of conformality/isometry in the desired map. It also eliminates a solution instability that is intrinsic to EVD-based approaches. A perturbational analysis quantifies the instability.
2 Minimax theorem for graph embeddings
We begin with a neighborhood graph specified by a nondiagonal weighted adjacency matrix M ∈ R^{N×N} that has the data-reproducing property XM = X (this can be relaxed to XM ≈ X in practice). The graph-embedding and NLDR literatures offer various constructions of M, each appropriate to different sets of assumptions about the original embedding and its sampling (e.g., isometry, local linearity, noiseless samples, regular sampling, etc.). Typically M_{ij} ≠ 0 if points i, j are nearby on the intrinsic manifold and |M_{ij}| is small or zero otherwise. Each point is taken to be a linear or convex combination of its neighbors, and thus M specifies manifold connectivity in the sense that any nondegenerate embedding Y that satisfies YM ≈ Y with small residual ‖YM − Y‖_F will preserve this connectivity and the structure of local neighborhoods. For example, in barycentric embeddings, each point is the average of its neighbors and thus M_{ij} = 1/k if vertex i is connected to vertex j (of degree k). We will also consider three optional constraints on the embedding:
1. A null-space restriction, where the solution must lie outside the column-space of C ∈ R^{N×M}, M < N. For example, it is common to stipulate that the solution Y be centered, i.e., YC = 0 for C = 1, the constant vector.
2. A basis restriction, where the solution must be a linear combination of the rows of a basis Z ∈ R^{K×N}, K ≤ N. This can be thought of as information placed at the vertices of the graph that serves as example inputs for a target NLDR function. We will use this to construct dimension-reducing radial basis function networks.
3. A metric Σ ∈ R^{N×N} that determines how error is distributed over the points. For example, it might be important that boundary points have less error. We assume that Σ is symmetric positive definite and has factorization Σ = AA⊤ (e.g., A could be a Cholesky factor of Σ).
In most settings, the optional matrices will default to the identity matrix.
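For concreteness, the following sketch builds the barycentric constraint matrix M described above from a k-nearest-neighbor graph. It is an illustration, not code from the paper; the function name, the use of NumPy, and the neighbor rule are our own assumptions.

```python
import numpy as np

def barycentric_M(X, k=8):
    """Hedged sketch: barycentric constraint matrix for data X (D x N).
    Each point is modeled as the average of its k nearest neighbors, so the
    column for point i has entries 1/k at its neighbors and X @ M is
    approximately X on densely sampled, locally flat regions."""
    D, N = X.shape
    M = np.zeros((N, N))
    # pairwise squared Euclidean distances, sq[j, i] = ||x_j - x_i||^2
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    for i in range(N):
        nbrs = np.argsort(sq[:, i])[1:k + 1]   # skip the point itself
        M[nbrs, i] = 1.0 / k                   # column i reconstructs x_i as a neighbor average
    return M
```

With Z, C, and A left at their defaults, such an M can be passed directly to the minimax solution given below.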
In this context, we define the per-dimension embedding error of row-vector y_i ∈ rows(Y) to be

E_M(y_i) ≐ max_{y_i ∈ range(Z), D ∈ R^{M×N}} ‖(y_i(M + CD) − y_i)A‖ / ‖y_i A‖    (1)

where D is a matrix constructed by an adversary to maximize the error. The optimizing y_i is a vector inside the subspace spanned by the rows of Z and outside the subspace spanned by the columns of C, for which the reconstruction residual y_i M − y_i has smallest norm w.r.t. the metric Σ. The following theorem identifies the optimal embedding Y for any choice of M, Z, C, Σ:
Minimax solution: Let Q ∈ S^{K×P} be a column-orthonormal basis of the null-space of (ZC)⊤, with P = K − rank(C). Let B ∈ R^{P×P} be a square factor satisfying B⊤B = Q⊤ZΣZ⊤Q, e.g., a Cholesky factor, or the "R" factor in a QR-decomposition of (Q⊤ZA)⊤. Compute the left singular vectors U ∈ S^{P×P} of U diag(s)V⊤ = B^{−⊤}Q⊤Z(I − M)A, with singular values s ≐ [s_1, ···, s_P]⊤ ordered s_1 ≤ s_2 ≤ ··· ≤ s_P. Using the leading columns U_{1:d} of U, set Y = U⊤_{1:d}B^{−⊤}Q⊤Z.
Theorem 1. Y is the optimal (minimax) embedding in R^d with error ‖[s_1, ···, s_d]‖_2:

Y ≐ U⊤_{1:d}B^{−⊤}Q⊤Z = arg min_{Y ∈ R^{d×N}} ∑_{y_i ∈ rows(Y)} E_M(y_i)²,  with E_M(y_i) = s_i.    (2)
Appendix A develops the proof and other error measures that are minimized.
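As a computational summary of the recipe above, the following sketch renders the construction of Q, B, the SVD, and the embedding Y in NumPy/SciPy. It is our own hedged transcription with illustrative defaults, not reference code from the paper.

```python
import numpy as np
from scipy.linalg import null_space, qr, svd

def minimax_embedding(M, d, Z=None, C=None, A=None):
    """Hedged sketch of the minimax solution (Theorem 1).
    M: N x N constraint matrix with Y M ~= Y; d: target dimension;
    Z: K x N basis restriction; C: N x m null-space restriction;
    A: N x N factor of the metric, Sigma = A A^T. Defaults are identities."""
    N = M.shape[0]
    Z = np.eye(N) if Z is None else Z
    A = np.eye(N) if A is None else A
    K = Z.shape[0]
    if C is None:
        Q = np.eye(K)                                # no null-space restriction
    else:
        Q = null_space((Z @ C).T)                    # columns orthogonal to the columns of ZC
    # Square factor B with B^T B = Q^T Z Sigma Z^T Q, obtained as the R factor
    # of a QR decomposition of (Q^T Z A)^T, as described in the text.
    _, B = qr((Q.T @ Z @ A).T, mode='economic')
    G = np.linalg.solve(B.T, Q.T @ Z @ (np.eye(N) - M) @ A)   # B^{-T} Q^T Z (I - M) A
    U, s, _ = svd(G, full_matrices=False)
    order = np.argsort(s)                            # ascending singular values s_1 <= ... <= s_P
    U, s = U[:, order], s[order]
    T = U[:, :d].T @ np.linalg.solve(B.T, Q.T)       # mixing matrix U_{1:d}^T B^{-T} Q^T
    Y = T @ Z                                        # d x N embedding
    return Y, s[:d], T
```

With Z = A = I and C omitted, this reduces to the special cases discussed next.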
Local NLDR techniques are easily expressed in this framework. When Z = A = I, C = [], and M reproduces X through linear combinations with M⊤1 = 1, we recover LLE [5]. When Z = I, C = [], I − M is the normalized graph Laplacian, and A is a diagonal matrix of vertex degrees, we recover Laplacian eigenmaps [7]. When further Z = X we recover locality preserving projections [8].
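To make the Laplacian-eigenmaps case concrete, here is a hedged sketch of how the corresponding M and A might be assembled from an affinity matrix W and passed to the minimax_embedding sketch above; the helper name and the affinity-to-Laplacian details are illustrative assumptions.

```python
import numpy as np

def laplacian_constraints(W):
    """Hedged sketch of the special case above: Z = I, C = [],
    I - M = normalized graph Laplacian, A = diag(vertex degrees)."""
    deg = W.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L_norm = np.eye(len(deg)) - d_inv_sqrt @ W @ d_inv_sqrt   # normalized Laplacian
    M = np.eye(len(deg)) - L_norm                             # so that I - M = L_norm
    A = np.diag(deg)                                          # metric factor, per the text
    return M, A

# Illustrative usage (assumes the minimax_embedding sketch above); passing C = 1
# suppresses the translational DOF, as recommended in section 4:
#   M, A = laplacian_constraints(W)
#   Y, s, T = minimax_embedding(M, d=2, A=A, C=np.ones((W.shape[0], 1)))
```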
3 Analysis and generalization of charting
The minimax construction of charting [9] takes some development, but offers an interesting insight into the above-mentioned methods. Recall that charting first solves for a set of local affine subspace axes S_1 ∈ R^{D×d}, S_2, ··· at offsets µ_1 ∈ R^D, µ_2, ··· that best cover the data and vary smoothly over the manifold. Each subspace offers a chart, a local parameterization of the data by projection onto the local axes. Charting then constructs a weighted mixture of affine projections that merges the charts into a global parameterization. If the data manifold is curved, each projection will assign a point a slightly different embedding, so the error is measured as the variance of the proposed embeddings about their mean. This maximizes consistency and tends to produce isometric embeddings; [9] discusses ways to explicitly optimize the isometry of the embedding.
Under the assumption of isometry, the charting error is equivalent to the sum-squared displacements of an embedded point relative to its immediate neighbors (summed over all neighborhoods). To construct the same error criterion in the minimax setting, let x_{i−k}, ···, x_i, ···, x_{i+k} denote points in the i-th neighborhood and let the columns of V_i ∈ R^{(2k+1)×d} be an orthonormal basis of the rows of the local parameterization S⊤_i[x_{i−k}, ···, x_i, ···, x_{i+k}]. Then a nonzero reparameterization will satisfy [y_{i−k}, ···, y_i, ···, y_{i+k}]V_iV⊤_i = [y_{i−k}, ···, y_i, ···, y_{i+k}] if and only if it preserves the relative position of the points in the local parameterization. Conversely, any relative displacements of the points are isolated by the formula [y_{i−k}, ···, y_i, ···, y_{i+k}](I − V_iV⊤_i). Minimizing the Frobenius norm of this expression is thus equivalent to minimizing the local error in charting. We sum the constraints over all neighborhoods to obtain the constraint matrix M = I − ∑_i F_i(I − V_iV⊤_i)F⊤_i, where (F_i)_{kj} = 1 iff the j-th point of the i-th neighborhood is the k-th point of the dataset. Because V_iV⊤_i and (I − V_iV⊤_i) are complementary, it follows that the error criterion of any local NLDR method (e.g., LLE, Laplacian eigenmaps, etc.) must measure the projection of the embedding onto some subspace of (I − V_iV⊤_i).
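The constraint matrix just described can be assembled directly from local subspace fits. The following sketch is an illustration under simplifying assumptions (local PCA axes as stand-ins for the smoothly varying charted axes S_i, a fixed neighborhood size); it is not the paper's exact charting procedure.

```python
import numpy as np

def charting_constraints(X, k=4, d=2):
    """Hedged sketch of M = I - sum_i F_i (I - V_i V_i^T) F_i^T.
    X is D x N; each neighborhood holds a point and its 2k nearest neighbors;
    the local axes are taken as the top-d principal directions (an assumption
    standing in for the charted axes S_i of [9])."""
    D, N = X.shape
    M = np.eye(N)
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)   # pairwise squared distances
    for i in range(N):
        idx = np.argsort(sq[:, i])[:2 * k + 1]                # point i and its 2k neighbors
        Xi = X[:, idx]
        S_i, _, _ = np.linalg.svd(Xi - Xi.mean(axis=1, keepdims=True), full_matrices=False)
        S_i = S_i[:, :d]                                      # local axes, D x d
        V_i, _ = np.linalg.qr((S_i.T @ Xi).T)                 # orthonormal basis of the rows
                                                              # of the local parameterization
        P_perp = np.eye(2 * k + 1) - V_i @ V_i.T              # isolates relative displacements
        M[np.ix_(idx, idx)] -= P_perp                         # scatter F_i P_perp F_i^T into M
    return M
```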
To construct a continuous map, charting uses an overcomplete radial basis function (RBF) representation Z = [z(x_1), z(x_2), ··· z(x_N)], where z(x) is a vector that stacks z_1(x), z_2(x), etc., with

z_m(x) ≐ [K⊤_m(x − µ_m); 1] · p_m(x) / ∑_{m′} p_{m′}(x),    (3)

p_m(x) ≐ N(x | µ_m, Σ_m) ∝ e^{−(x−µ_m)⊤Σ^{−1}_m(x−µ_m)/2},    (4)

and K_m is any local linear dimensionality reducer, typically S_m itself. Each column of Z contains many "views" of the same point that are combined to give its low-dimensional embedding.
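A direct transcription of eqs. (3)-(4) might look like the following sketch; the function name and argument conventions are assumptions, and K_m is passed in explicitly (e.g., the chart axes S_m).

```python
import numpy as np

def rbf_features(x, mus, Sigmas, Ks):
    """Hedged sketch of the stacked RBF representation z(x) of eqs. (3)-(4).
    mus: list of centers mu_m (D,); Sigmas: list of covariances (D, D);
    Ks: list of local linear reducers K_m (D, d)."""
    # unnormalized Gaussian responsibilities p_m(x), eq. (4)
    p = np.array([np.exp(-0.5 * (x - mu) @ np.linalg.solve(Sig, x - mu))
                  for mu, Sig in zip(mus, Sigmas)])
    p = p / p.sum()
    # each block stacks the local projection and a constant 1, weighted by p_m(x), eq. (3)
    blocks = [np.concatenate([K.T @ (x - mu), [1.0]]) * pm
              for K, mu, pm in zip(Ks, mus, p)]
    return np.concatenate(blocks)           # z(x), length M*(d+1)
```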
Finally, we set C = 1, which forces the embedding of the full data to be centered. Applying the minimax solution to these constraints yields the RBF network mixing matrix, f(x) ≐ U⊤_{1:d}B^{−⊤}Q⊤z(x). Theorem 1 guarantees that the resulting embedding is least-squares optimal w.r.t. Z, M, C, A at the datapoints f(x_i), and because f(·) is an affine transform of z(·) it smoothly interpolates the embedding between points.
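Putting the pieces together, a hedged end-to-end usage sketch, reusing the illustrative minimax_embedding and rbf_features sketches above (all names assumed), would be:

```python
import numpy as np

# points: list of N samples x_i in R^D; mus, Sigmas, Ks: RBF parameters as above;
# M: N x N constraint matrix derived from the neighborhood graph.
Z = np.column_stack([rbf_features(x, mus, Sigmas, Ks) for x in points])
C = np.ones((len(points), 1))                       # forces the embedding to be centered
Y, s, T = minimax_embedding(M, d=2, Z=Z, C=C)       # T = U_{1:d}^T B^{-T} Q^T
f = lambda x: T @ rbf_features(x, mus, Sigmas, Ks)  # continuous map f(x), affine in z(x)
```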
Fig. 1. Minimax and generalized EVD solutions for a kernel eigenmap of a non-developable swiss roll. Points are connected into a grid which ideally should be regular. The EVD solution shows substantial degradation. Insets detail corners where the EVD solution crosses itself repeatedly. The border compression is characteristic of Laplacian constraints.

There are some interesting variants:
One-shot charting: If we set the local dimensionality reducers to the identity matrix (all K_m = I), then the minimax method jointly optimizes the local dimensionality reduction to charts and the global coordination of the charts (under any choice of M). This requires that rows(Z) ≤ N for a fully determined solution.

Discrete isometric charting: If Z = I then we directly obtain a discrete isometric embedding of the data, rather than a continuous map, making this a local equivalent of IsoMap.

Reduced basis charting: Let Z be constructed using just a small number of kernels randomly placed on the data manifold, such that rows(Z) ≪ N. Then the size of the SVD problem is substantially reduced.
4 Numerical advantage of minimax method
Note that the minimax method projects the constraint matrix M into a subspace derived from C and Z and decomposes it there. This suppresses unwanted degrees of freedom (DOFs) admitted by the problem constraints, for example the trivial R^0 embedding where all points are mapped to a single point y_i = N^{−1/2}. The R^0 embedding serves as a translational DOF in the solution. LLE- and eigenmap-based methods construct M to have a constant null-space so that the translational DOF will be isolated in the EVD as a null eigenvalue paired to a constant eigenvector, which is then discarded. However, section 4.1 shows that this construction makes the EVD increasingly unstable as problem size grows and/or the data becomes increasingly amenable to low-residual embeddings, ultimately causing solution collapse. As the next paragraph demonstrates, the problem is exacerbated when a basis Z is used (via the equivalent generalized eigenproblem), partly because the eigenvector associated with the unwanted DOF can have arbitrary structure. In all cases the problem can be averted by using the minimax formulation with C = 1 to suppress the DOF.
A 2D plane was embedded in 3D with a curl, a twist, and 2.5% Gaussian noise, then regularly sampled at 900 points. We computed a kernelized Laplacian eigenmap using 70 random points as RBF centers, i.e., a continuous map using M derived from the graph Laplacian and Z constructed as above. The map was computed both via the minimax (SVD) method and via the equivalent generalized eigenproblem, where the translational degree of freedom must be removed by discarding an eigenvector from the solution. The two solutions are algebraically equivalent in every other regard. A variety of eigensolvers were tried; we took the best result.
Fig. 2. Excess energy in the eigenspectrum indicates that the translational DOF has contaminated many eigenvectors. (Left panel: eigenvalue vs. excess energy, comparing the eigenspectrum to the minimax spectrum; right panel: eigenvalue vs. point deviation.) If the EVD had successfully isolated the unwanted DOF, then its remaining eigenvalues should be identical to those derived from the minimax solution. The graph at left shows the difference in the eigenspectra. The graph at right shows the EVD solution's deviation from the translational vector y_0 = 1 · N^{−1/2} ≈ .03333. If the numerics were perfect the line would be flat, but in practice the deviation is significant enough (roughly 1% of the diameter of the embedding) to noticeably perturb points in figure 1.

Figure 1 shows that the EVD solution exhibits many defects, particularly a folding-over of the manifold at the top and bottom edges and at the corners. Figure 2 shows that the noisiness of the EVD solution is due largely to mutual contamination of numerically unstable eigenvectors.
4.1 Numerical instability of eigen-methods
The following theorem uses tools of matrix perturbation theory to show that as the problem size increases, the desired and unwanted eigenvectors become increasingly wobbly and gradually contaminate each other, leading to degraded solutions. More precisely, the low-order eigenvalues are ill-conditioned and exhibit multiplicities that may be true (due to noiseless samples from low-curvature manifolds) or false (due to numerical noise). Although in many cases some post-hoc algebra can "filter" the unwanted components out of the contaminated eigensolution, it is not hard to construct cases where the eigenvectors cannot be cleanly separated. The minimax formulation is immune to this problem because it explicitly suppresses the gratuitous component(s) before matrix decomposition.
Theorem 2. For any finite numerical precision, as the number of points N increases, the Frobenius norm of numerical noise in the null eigenvector v_0 can grow as O(N^{3/2}), and the eigenvalue problem can approach a false multiplicity at a rate as fast as O(N^{3/2}), at which point the eigenvectors of interest (embedding and translational) are mutually contaminated and/or have an indeterminate eigenvalue ordering.
Please see appendix B for the proof. This theorem essentially lower-bounds an upper-bound on error; examples can be constructed in which the problem is worse. For example, it can be shown analytically that when embedding points drawn from the simple curve x_i = [a, cos πa]⊤, a ∈ [0, 1] with K = 2 neighbors, instabilities cannot be bounded better than O(N^{5/2}); empirically we see eigenvector mixing with N < 100 points and we see it grow at the rate ≈ O(N^4) in many different eigensolvers. At very large scales, more pernicious instabilities set in; e.g., by N = 20000 points, the solution begins to fold over. Although algebraic multiplicity and instability of the eigenproblem is conceptually a minor oversight in the algorithmic realizations of eigenfunction embeddings, as theorem 2 shows, the consequences are eventually fatal.
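For readers who want to probe this behavior empirically, the following sketch (an assumption about the test setup, not the paper's exact protocol) generates the test curve named above; one can then build M with K = 2 neighbors, e.g. via the barycentric_M sketch from section 2, and compare a generalized EVD solution against the minimax (SVD) solution as N grows.

```python
import numpy as np

def curve_dataset(N):
    """Hedged sketch: regular samples of the curve x_i = [a, cos(pi a)]^T, a in [0, 1]."""
    a = np.linspace(0.0, 1.0, N)
    return np.vstack([a, np.cos(np.pi * a)])   # 2 x N data matrix
```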
5 Summary

One of the most appealing aspects of the spectral NLDR literature is that algorithms are usually motivated from analyses of linear operators on smooth differentiable manifolds, e.g., [7]. Understandably, these analyses rely on assumptions (e.g., smoothness or isometry or noiseless sampling) that make it difficult to predict what algorithmic realizations will do when real, noisy data violates the assumptions. The minimax embedding theorem provides a complete algebraic characterization of this discrete NLDR problem, and provides a solution that recovers numerically robustified versions of almost all known algorithms. It offers a principled way of constructing new algorithms with clear optimality properties and good numerical conditioning, notably the construction of a continuous NLDR map (an RBF network) in a one-shot optimization (SVD). We have also shown how to cast several local NLDR principles in this framework, and upgrade these methods to give continuous maps. Working in the opposite direction, we sketched the minimax formulation of isometric charting and showed that its constraint matrix contains a superset of all the algebraic constraints used in local NLDR techniques.
References
1. W. T. Tutte. How to draw a graph. Proc. London Mathematical Society, 13:743-768, 1963.
2. Miroslav Fiedler. A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory. Czech. Math. Journal, 25:619-633, 1975.
3. Fan R. K. Chung. Spectral graph theory, volume 92 of CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1997.
4. Joshua B. Tenenbaum, Vin de Silva, and John C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290:2319-2323, December 22, 2000.
5. Sam T. Roweis and Lawrence K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290:2323-2326, December 22, 2000.
6. Yee Whye Teh and Sam T. Roweis. Automatic alignment of hidden representations. In Proc. NIPS-15, 2003.
7. Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Volume 14 of Advances in Neural Information Processing Systems, 2002.
8. Xiaofei He and Partha Niyogi. Locality preserving projections. Technical Report TR-2002-09, University of Chicago Computer Science, October 2002.
9. Matthew Brand. Charting a manifold. Volume 15 of Advances in Neural Information Processing Systems, 2003.
10. G. W. Stewart and Ji-Guang Sun. Matrix perturbation theory. Academic Press, 1990.

A Proof of minimax embedding theorem (1)
The burden of this proof is carried by supporting lemmas, below. To emphasize the proof strategy, we give the proof first; supporting lemmas follow.
Proof. Setting y_i = l⊤_i Z, we will solve for l_i ∈ columns(L). Writing the error in terms of l_i,

E_M(l_i) = max_{K ∈ R^{M×N}} ‖l⊤_i Z(I − M − CK)A‖ / ‖l⊤_i ZA‖ = max_{K ∈ R^{M×N}} ‖l⊤_i Z(I − M)A − l⊤_i ZCKA‖ / ‖l⊤_i ZA‖.    (5)
The term l⊤_i ZCKA produces infinite error unless l⊤_i ZC = 0, so we accept this as a constraint and seek

min_{l⊤_i ZC = 0} ‖l⊤_i Z(I − M)A‖ / ‖l⊤_i ZA‖.    (6)

By lemma 1, that orthogonality is satisfied by solving the problem in the space orthogonal to ZC; the basis for this space is given by the columns of Q ≐ null((ZC)⊤).
By lemma 2, the denominator of the error specifies the metric in solution space to be ZAA⊤Z⊤; when the problem is projected into the space orthogonal to ZC it becomes Q⊤(ZAA⊤Z⊤)Q. Nesting the "orthogonally-constrained-SVD" construction of lemma 1