
Correspondence Analysis UTDallas


Abdi, H., & Béra, M. (to appear 2014). Correspondence Analysis. In R. Alhajj and J. Rokne (Eds.), "Encyclopedia of Social Networks and Mining." New York: Springer Verlag.

Title: Correspondence Analysis

Name: Hervé Abdi¹, Michel Béra²

Affil./Addr. 1: School of Behavioral and Brain Sciences

The University of Texas at Dallas

Richardson, TX 75080, USA

herve@utdallas.edu

Affil./Addr. 2: Centre d'Étude et de Recherche en Informatique et Communications

Conservatoire National des Arts et Métiers

F-75141 Paris Cedex 03, France

michel.bera@cnam.fr

Correspondence Analysis

Synonyms

Dual scaling, optimal scaling, homogeneity analysis.

Glossary


CA: Correspondence analysis

component: A linear combination of the variables of a data table

dimension: see component

factor: see component

GSVD: Generalized singular value decomposition

PCA: Principal component analysis

SVD: Singular value decomposition

Introduction

Correspondence analysis (CA; see [11; 22; 21; 13; 16; 17; 7; 20; 1; 9]) is an extension of principal component analysis (PCA; for details, see [8]) tailored to handle nominal variables. Originally, CA was developed to analyze contingency tables in which a sample of observations is described by two nominal variables, but it was rapidly extended to the analysis of any data matrix with non-negative entries. The origin of CA can be traced to the early work of Pearson ([24]) or Fisher, but the modern version of correspondence analysis and its geometric interpretation come from 1960s France: it is associated with the French school of "data analysis" (analyse des données) and was developed under the leadership of Jean-Paul Benzécri. As a technique, it was often discovered (and rediscovered), and so variations of CA can be found under several different names such as "dual scaling," "optimal scaling," "homogeneity analysis," or "reciprocal averaging." The multiple identities of correspondence analysis are a consequence of its large number of properties: correspondence analysis can be defined as an optimal solution for many apparently different problems.

Key Points

CA transforms a data table into two sets of new variables called factor scores (obtained as linear combinations of, respectively, the rows and the columns): one set for the rows and one set for the columns. The factor scores give the best representation of the similarity structure of, respectively, the rows and the columns of the table. In addition, the factor scores can be plotted as maps that optimally display the information in the original table. In these maps, rows and columns are represented as points whose coordinates are the factor scores and whose dimensions are also called factors, components (by analogy with PCA), or simply dimensions. Interestingly, the factor scores of the rows and the columns have the same variance and, therefore, rows and columns can be conveniently represented in a single map.

In correspondence analysis, the total variance (often called inertia) of the factor scores is proportional to the independence chi-square statistic of this table, and therefore the factor scores in CA decompose this χ² into orthogonal components.
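The proportionality between inertia and the chi-square statistic can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming a small made-up contingency table; the table and all variable names are illustrative assumptions, not part of the original article. Under these assumptions the total inertia equals the chi-square statistic divided by the grand total of the table.

```python
import numpy as np

# Hypothetical 3x3 contingency table (illustrative counts, not from the article).
X = np.array([[20.0, 10.0, 5.0],
              [10.0, 15.0, 10.0],
              [5.0, 10.0, 25.0]])

N = X.sum()            # grand total of the table
Z = X / N              # probability matrix
r = Z.sum(axis=1)      # row totals of Z
c = Z.sum(axis=0)      # column totals of Z

# Pearson chi-square statistic of independence, computed from the counts.
expected = N * np.outer(r, c)
chi2 = ((X - expected) ** 2 / expected).sum()

# Total inertia: sum of the squared singular values of the
# standardized residuals D_r^{-1/2} (Z - r c^T) D_c^{-1/2}.
S = (Z - np.outer(r, c)) / np.sqrt(np.outer(r, c))
eigenvalues = np.linalg.svd(S, compute_uv=False) ** 2
total_inertia = eigenvalues.sum()

# The two quantities should agree up to floating-point error.
print(np.isclose(total_inertia, chi2 / N))
```

Each entry of `eigenvalues` is the share of the chi-square statistic (divided by N) carried by one orthogonal component.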

Correspondence Analysis:Theory and Practice

Notations

Matrices are denoted in upper case bold letters, vectors are denoted in lower case bold, and their elements are denoted in lower case italic. Matrices, vectors, and elements from the same matrix all use the same letter (e.g., $\mathbf{A}$, $\mathbf{a}$, $a$). The transpose operation is denoted by the superscript $\mathsf{T}$, and the inverse operation is denoted by $^{-1}$. The identity matrix is denoted $\mathbf{I}$, vectors or matrices of ones are denoted $\mathbf{1}$, and matrices or vectors of zeros are denoted $\mathbf{0}$. When provided with a square matrix, the diag operator gives a vector with the diagonal elements of this matrix. When provided with a vector, the diag operator gives a diagonal matrix with the elements of the vector as the diagonal elements of this matrix. When provided with a square matrix, the trace operator gives the sum of the diagonal elements of this matrix.
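As an illustration of the diag and trace operators, here is a short sketch using NumPy, whose `np.diag` happens to follow the same two-way convention. The matrix values are arbitrary and chosen only for the example.

```python
import numpy as np

# Arbitrary square matrix used only to illustrate the operators.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

a = np.diag(A)      # square matrix -> vector of its diagonal elements
D = np.diag(a)      # vector -> diagonal matrix built from that vector
t = np.trace(A)     # sum of the diagonal elements
ones = np.ones(2)   # a conformable vector of ones

print(a, D, t, A @ ones, sep="\n")
```

The product `A @ ones` gives the row totals of `A`, which is how the row totals of the probability matrix are written in the computations that follow.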

The data table to be analyzed by CA is a contingency table (or at least a data table with non-negative entries) with $I$ rows and $J$ columns. It is represented by the $I \times J$ matrix $\mathbf{X}$, whose generic element $x_{i,j}$ gives the number of observations that belong to the $i$th level of the first nominal variable (i.e., the rows) and the $j$th level of the second nominal variable (i.e., the columns). The grand total of the table is noted $N$.

Computations

The first step of the analysis is to transform the data matrix into a probability matrix (i.e., a matrix comprising non-negative numbers whose sum is equal to one) denoted $\mathbf{Z}$ and computed as $\mathbf{Z} = N^{-1}\mathbf{X}$. We denote $\mathbf{r}$ the vector of the row totals of $\mathbf{Z}$ (i.e., $\mathbf{r} = \mathbf{Z}\mathbf{1}$, with $\mathbf{1}$ being a conformable vector of 1's), $\mathbf{c}$ the vector of the column totals (i.e., $\mathbf{c} = \mathbf{Z}^{\mathsf{T}}\mathbf{1}$), and $\mathbf{D}_c = \operatorname{diag}\{\mathbf{c}\}$, $\mathbf{D}_r = \operatorname{diag}\{\mathbf{r}\}$. The factor scores are obtained from the following generalized singular value decomposition (GSVD; for details on the singular value decomposition, see [2; 3; 15; 27; 18; 12; 26]):

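$$\left(\mathbf{Z} - \mathbf{r}\mathbf{c}^{\mathsf{T}}\right) = \mathbf{P}\boldsymbol{\Delta}\mathbf{Q}^{\mathsf{T}} \quad\text{with}\quad \mathbf{P}^{\mathsf{T}}\mathbf{D}_r^{-1}\mathbf{P} = \mathbf{Q}^{\mathsf{T}}\mathbf{D}_c^{-1}\mathbf{Q} = \mathbf{I}. \tag{1}$$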
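One common way to obtain this GSVD is through an ordinary SVD of a rescaled matrix. The sketch below takes that route, which is an assumption of this example and not necessarily the procedure used by the authors. It relies on a made-up contingency table, and the names `R`, `S`, `U`, and `Vt` are introduced here for illustration only.

```python
import numpy as np

# Hypothetical contingency table (illustrative counts only).
X = np.array([[20.0, 10.0, 5.0],
              [10.0, 15.0, 10.0],
              [5.0, 10.0, 25.0]])

Z = X / X.sum()                  # probability matrix Z = X / N
r = Z.sum(axis=1)                # row totals, r = Z 1
c = Z.sum(axis=0)                # column totals, c = Z^T 1
R = Z - np.outer(r, c)           # doubly centered matrix Z - r c^T

# Ordinary SVD of D_r^{-1/2} (Z - r c^T) D_c^{-1/2}.
S = R / np.sqrt(np.outer(r, c))
U, delta, Vt = np.linalg.svd(S, full_matrices=False)

# Rescale the ordinary singular vectors into generalized singular vectors.
P = np.sqrt(r)[:, None] * U      # P = D_r^{1/2} U
Q = np.sqrt(c)[:, None] * Vt.T   # Q = D_c^{1/2} V
Delta = np.diag(delta)

# Check the decomposition and the two constraints of Equation 1.
identity = np.eye(len(delta))
print(np.allclose(P @ Delta @ Q.T, R))
print(np.allclose(P.T @ np.diag(1 / r) @ P, identity))
print(np.allclose(Q.T @ np.diag(1 / c) @ Q, identity))
```

The rescaling works because $\mathbf{U}$ and $\mathbf{V}$ are orthonormal, so multiplying them by $\mathbf{D}_r^{1/2}$ and $\mathbf{D}_c^{1/2}$ turns ordinary orthonormality into orthonormality under $\mathbf{D}_r^{-1}$ and $\mathbf{D}_c^{-1}$.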

Note that the subtraction of the matrix $\mathbf{r}\mathbf{c}^{\mathsf{T}}$ from $\mathbf{Z}$ is equivalent to a double centering of the matrix ([5; 6]). The matrix $\mathbf{P}$ (respectively $\mathbf{Q}$) contains the left (respectively right) generalized singular vectors of $\left(\mathbf{Z} - \mathbf{r}\mathbf{c}^{\mathsf{T}}\right)$, and the diagonal elements of the diagonal matrix $\boldsymbol{\Delta}$ give its singular values. The squared singular values, which are called eigenvalues, are denoted $\lambda_\ell$ and stored into the diagonal matrix $\boldsymbol{\Lambda}$. Eigenvalues express the variance extracted by the corresponding factor, and their sum is called the total inertia (denoted $\mathcal{I}$) of the data matrix. With the so-called "triplet notation" ([14]) that is sometimes used as a general framework to formalize multivariate techniques, CA is equivalent to the analysis of the triplet $\left(\left(\mathbf{Z} - \mathbf{r}\mathbf{c}^{\mathsf{T}}\right), \mathbf{D}_c^{-1}, \mathbf{D}_r^{-1}\right)$.

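From the GSVD, the row and (respectively) column factor scores are obtained as

$$\mathbf{F} = \mathbf{D}_r^{-1}\mathbf{P}\boldsymbol{\Delta} \quad\text{and}\quad \mathbf{G} = \mathbf{D}_c^{-1}\mathbf{Q}\boldsymbol{\Delta}. \tag{2}$$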
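Equation 2 can be sketched as a small helper. The function name `ca_factor_scores` and the example table are hypothetical, and the GSVD is again obtained through an ordinary SVD of the rescaled matrix, which is an assumption of this sketch.

```python
import numpy as np

def ca_factor_scores(X):
    """Return row scores F, column scores G, singular values, and the masses r, c.

    Hypothetical helper: the GSVD is obtained from an ordinary SVD of
    D_r^{-1/2} (Z - r c^T) D_c^{-1/2}, so that P = D_r^{1/2} U and Q = D_c^{1/2} V.
    """
    X = np.asarray(X, dtype=float)
    Z = X / X.sum()
    r = Z.sum(axis=1)
    c = Z.sum(axis=0)
    S = (Z - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, delta, Vt = np.linalg.svd(S, full_matrices=False)
    # F = D_r^{-1} P Delta simplifies to D_r^{-1/2} U Delta (same for G with c, V).
    F = (U / np.sqrt(r)[:, None]) * delta
    G = (Vt.T / np.sqrt(c)[:, None]) * delta
    return F, G, delta, r, c

# Illustrative 3x4 table of counts (made up for this example).
X = [[12, 7, 3, 8],
     [5, 14, 9, 2],
     [4, 6, 15, 10]]
F, G, delta, r, c = ca_factor_scores(X)
print(F.shape, G.shape)
```

Each row of `F` holds the coordinates of one row of the table, and each row of `G` holds the coordinates of one column, so the first two columns of `F` and `G` can be plotted together as a two-dimensional map.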

Note that the factor scores of a given set (i.e., the rows or the columns) are pairwise orthogonal when they describe different dimensions and that the variance of the factor scores for a given dimension is equal to the eigenvalue associated with this dimension. So, for example, the variance of the row factor scores is computed as:

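$$\mathbf{F}^{\mathsf{T}}\mathbf{D}_r\mathbf{F} = \boldsymbol{\Delta}\mathbf{P}^{\mathsf{T}}\mathbf{D}_r^{-1}\mathbf{D}_r\mathbf{D}_r^{-1}\mathbf{P}\boldsymbol{\Delta} = \boldsymbol{\Delta}\mathbf{P}^{\mathsf{T}}\mathbf{D}_r^{-1}\mathbf{P}\boldsymbol{\Delta} = \boldsymbol{\Delta}^{2} = \boldsymbol{\Lambda}. \tag{3}$$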
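The identity in Equation 3 can be verified numerically. The following sketch assumes a made-up table and computes the factor scores through an ordinary SVD of the rescaled matrix; both choices are assumptions of this example. It also checks the analogous identity for the column factor scores, which follows from the same argument with $\mathbf{Q}$ and $\mathbf{D}_c$.

```python
import numpy as np

# Hypothetical contingency table (illustrative counts only).
X = np.array([[20.0, 10.0, 5.0],
              [10.0, 15.0, 10.0],
              [5.0, 10.0, 25.0]])

Z = X / X.sum()
r = Z.sum(axis=1)
c = Z.sum(axis=0)
S = (Z - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, delta, Vt = np.linalg.svd(S, full_matrices=False)

F = (U / np.sqrt(r)[:, None]) * delta      # row factor scores
G = (Vt.T / np.sqrt(c)[:, None]) * delta   # column factor scores
Lambda = np.diag(delta ** 2)               # eigenvalues on the diagonal

row_variance = F.T @ np.diag(r) @ F        # F^T D_r F
col_variance = G.T @ np.diag(c) @ G        # G^T D_c G

# Both weighted cross-product matrices should equal Lambda:
# diagonal entries are the eigenvalues, off-diagonal entries are zero.
print(np.allclose(row_variance, Lambda))
print(np.allclose(col_variance, Lambda))
```

The zero off-diagonal entries are the pairwise orthogonality of the factor scores, and the equal diagonals are the reason rows and columns can share one map.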


What does correspondence analysis optimize?

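In CA, the criterion that is maximized is the variance of the factor scores (see [21; 16]). For example, the first row factor $\mathbf{f}_1$ is obtained as a linear combination of the columns of the matrix $\left(\mathbf{Z} - \mathbf{r}\mathbf{c}^{\mathsf{T}}\right)$, taking into account the constraints imposed by the matrices $\mathbf{D}_r^{-1}$ and $\mathbf{D}_c^{-1}$. Specifically, this means that we are searching for the vector $\mathbf{q}_1$ containing the weights of the linear combination such that $\mathbf{f}_1$ is obtained as

$$\mathbf{f}_1 = \mathbf{D}_r^{-1}\left(\mathbf{Z} - \mathbf{r}\mathbf{c}^{\mathsf{T}}\right)\mathbf{D}_c^{-1}\mathbf{q}_1, \tag{4}$$

such that

$$\mathbf{f}_1 = \arg\max_{\mathbf{f}} \; \mathbf{f}^{\mathsf{T}}\mathbf{D}_r\mathbf{f}, \tag{5}$$

under the constraint that

$$\mathbf{q}_1^{\mathsf{T}}\mathbf{D}_c^{-1}\mathbf{q}_1 = 1. \tag{6}$$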

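The subsequent row factor scores maximize the residual variance under the orthogonality constraint imposed by the matrix $\mathbf{D}_r^{-1}$ on the singular vectors (i.e., $\mathbf{p}_2^{\mathsf{T}}\mathbf{D}_r^{-1}\mathbf{p}_1 = 0$, which by Equation 3 amounts to $\mathbf{f}_2^{\mathsf{T}}\mathbf{D}_r\mathbf{f}_1 = 0$).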
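The maximization in Equations 4 to 6 can be illustrated with a small numerical experiment. In this sketch, which assumes a made-up table and a fixed random seed, many random weight vectors satisfying the constraint of Equation 6 are drawn, and the weighted variance of the scores they produce is compared with the one given by the first right generalized singular vector. The helper names `row_scores` and `weighted_variance` are hypothetical.

```python
import numpy as np

# Hypothetical contingency table (illustrative counts only).
X = np.array([[20.0, 10.0, 5.0],
              [10.0, 15.0, 10.0],
              [5.0, 10.0, 25.0]])

Z = X / X.sum()
r = Z.sum(axis=1)
c = Z.sum(axis=0)
R = Z - np.outer(r, c)
S = R / np.sqrt(np.outer(r, c))
U, delta, Vt = np.linalg.svd(S, full_matrices=False)
q1 = np.sqrt(c) * Vt[0]           # first right generalized singular vector

def row_scores(q):
    # Equation 4: f = D_r^{-1} (Z - r c^T) D_c^{-1} q
    return (R @ (q / c)) / r

def weighted_variance(f):
    # Criterion of Equation 5: f^T D_r f
    return f @ (r * f)

rng = np.random.default_rng(0)
trial_variances = []
for _ in range(1000):
    q = rng.normal(size=c.size)
    q /= np.sqrt(q @ (q / c))     # enforce q^T D_c^{-1} q = 1 (Equation 6)
    trial_variances.append(weighted_variance(row_scores(q)))

best_random = max(trial_variances)
optimum = weighted_variance(row_scores(q1))

# No random candidate should beat the singular-vector solution,
# whose variance is the first eigenvalue.
print(best_random <= optimum, np.isclose(optimum, delta[0] ** 2))
```

A random search like this is only a sanity check, not a way to compute the solution; the decomposition gives the maximizer directly.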

How to identify the elements important for a factor

In CA, the rows and the columns of the table have a similar role (and variance), and therefore we can use the same statistics to identify the rows and the columns important for a given dimension. Because the variance extracted by a factor (i.e., its eigenvalue) is obtained as the weighted sum of the squared factor scores for this factor of either the rows or the columns of the table, the importance of a row (respectively a column) is reflected by the ratio of its squared factor score to the eigenvalue of this factor. This ratio is called the contribution of the row (respectively column) to the factor. Specifically, the contributions of row $i$ to component $\ell$ and of column $j$ to component $\ell$ are obtained respectively as:
