Correspondence Analysis UTDallas


Abdi, H., & Béra, M. (to appear 2014). Correspondence Analysis. In R. Alhajj and J. Rokne (Eds.), "Encyclopedia of Social Networks and Mining." New York: Springer Verlag.
Title: Correspondence Analysis
Name: Hervé Abdi1, Michel Béra2
Affil./Addr. 1: School of Behavioral and Brain Sciences
The University of Texas at Dallas
Richardson, TX 75080, USA
herve@utdallas.edu
Affil./Addr. 2: Centre d'Étude et de Recherche en Informatique et Communications
Conservatoire National des Arts et Métiers
F-75141 Paris Cedex 03, France
michel.bera@cnam.fr
Correspondence Analysis
Synonyms
Dual scaling, optimal scaling, homogeneity analysis.
Glossary
CA: Correspondence analysis
component: A linear combination of the variables of a data table
dimension: see component
factor: see component
GSVD: Generalized singular value decomposition
PCA: Principal component analysis
SVD: Singular value decomposition
Introduction
Correspondence analysis [CA; see 11; 22; 21; 13; 16; 17; 7; 20; 1; 9] is an extension of principal component analysis (PCA; for details, see [8]) tailored to handle nominal variables. Originally, CA was developed to analyze contingency tables in which a sample of observations is described by two nominal variables, but it was rapidly extended to the analysis of any data matrices with non-negative entries. The origin of CA can be traced to early work of Pearson ([24]) or Fisher, but the modern version of correspondence analysis and its geometric interpretation comes from the 1960s in France and is associated with the French school of "data analysis" (analyse des données) and was developed under the leadership of Jean-Paul Benzécri. As a technique, it was often discovered (and re-discovered) and so variations of CA can be found under several different names such as "dual scaling," "optimal scaling," "homogeneity analysis," or "reciprocal averaging." The multiple identities of correspondence analysis are a consequence of its large number of properties: correspondence analysis can be defined as an optimal solution for many apparently different problems.
Key Points
CA transforms a data table into two sets of new variables called factor scores (obtained as linear combinations of, respectively, the rows and columns): one set for the rows and one set for the columns. The factor scores give the best representation of the similarity structure of, respectively, the rows and the columns of the table. In addition, the factor scores can be plotted as maps that optimally display the information in the original table. In these maps, rows and columns are represented as points whose coordinates are the factor scores and where the dimensions are also called factors, components (by analogy with PCA), or simply dimensions. Interestingly, the factor scores of the rows and the columns have the same variance and, therefore, rows and columns can be conveniently represented in one single map.
In correspondence analysis, the total variance (often called inertia) of the factor scores is proportional to the independence chi-square statistic of this table and therefore the factor scores in CA decompose this χ² into orthogonal components.
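The proportionality between inertia and chi-square can be checked numerically. The sketch below is only an illustration: the 3 × 3 table `X` is made up, and the names `Z`, `r`, and `c` anticipate the probability matrix and the row and column totals defined in the Computations section. Under these assumptions the total inertia equals the chi-square statistic divided by the grand total N.

```python
import numpy as np

# Hypothetical 3 x 3 contingency table (illustrative counts, not from the text).
X = np.array([[10., 5., 3.],
              [4., 8., 6.],
              [2., 3., 12.]])

N = X.sum()              # grand total
Z = X / N                # probability matrix
r = Z.sum(axis=1)        # row totals of Z
c = Z.sum(axis=0)        # column totals of Z

# Pearson chi-square statistic of independence for the table.
expected = N * np.outer(r, c)
chi2 = ((X - expected) ** 2 / expected).sum()

# Total inertia: sum of the squared standardized residuals of Z.
S = (Z - np.outer(r, c)) / np.sqrt(np.outer(r, c))
inertia = (S ** 2).sum()

print(inertia, chi2 / N)  # the two values agree up to rounding
```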
Correspondence Analysis:Theory and Practice
Notations
Matrices are denoted in upper case bold letters, vectors are denoted in lower case bold, and their elements are denoted in lower case italic. Matrices, vectors, and elements from the same matrix all use the same letter (e.g., $\mathbf{A}$, $\mathbf{a}$, $a$). The transpose operation is denoted by the superscript $T$, the inverse operation is denoted by $-1$. The identity matrix is denoted $\mathbf{I}$, vectors or matrices of ones are denoted $\mathbf{1}$, matrices or vectors of zeros are denoted $\mathbf{0}$. When provided with a square matrix, the diag operator gives a vector with the diagonal elements of this matrix. When provided with a vector, the diag operator gives a diagonal matrix with the elements of the vector as the diagonal elements of this matrix. When provided with a square matrix, the trace operator gives the sum of the diagonal elements of this matrix.
The data table to be analyzed by CA is a contingency table (or at least a data table with non-negative entries) with $I$ rows and $J$ columns. It is represented by the $I \times J$ matrix $\mathbf{X}$, whose generic element $x_{i,j}$ gives the number of observations that belong to the $i$th level of the first nominal variable (i.e., the rows) and the $j$th level of the second nominal variable (i.e., the columns). The grand total of the table is noted $N$.
Computations
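The first step of the analysis is to transform the data matrix into a probability matrix (i.e., a matrix comprising non-negative numbers and whose sum is equal to one) denoted $\mathbf{Z}$ and computed as $\mathbf{Z} = N^{-1}\mathbf{X}$. We denote $\mathbf{r}$ the vector of the row totals of $\mathbf{Z}$ (i.e., $\mathbf{r} = \mathbf{Z}\mathbf{1}$, with $\mathbf{1}$ being a conformable vector of 1's), $\mathbf{c}$ the vector of the column totals (i.e., $\mathbf{c} = \mathbf{Z}^{T}\mathbf{1}$), and $\mathbf{D}_{c} = \mathrm{diag}\{\mathbf{c}\}$, $\mathbf{D}_{r} = \mathrm{diag}\{\mathbf{r}\}$. The factor scores are obtained from the following generalized singular value decomposition (GSVD; for details on the singular value decomposition see [2; 3; 15; 27; 18; 12; 26]):
$$\mathbf{Z} - \mathbf{r}\mathbf{c}^{T} = \mathbf{P}\boldsymbol{\Delta}\mathbf{Q}^{T} \quad\text{with}\quad \mathbf{P}^{T}\mathbf{D}_{r}^{-1}\mathbf{P} = \mathbf{Q}^{T}\mathbf{D}_{c}^{-1}\mathbf{Q} = \mathbf{I}. \tag{1}$$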
Note that the subtraction of the matrix $\mathbf{r}\mathbf{c}^{T}$ from $\mathbf{Z}$ is equivalent to a double centering of the matrix ([5; 6]). The matrix $\mathbf{P}$ (respectively $\mathbf{Q}$) contains the left (respectively right) generalized singular vectors of $\mathbf{Z} - \mathbf{r}\mathbf{c}^{T}$, and the diagonal elements of the diagonal matrix $\boldsymbol{\Delta}$ give its singular values. The squared singular values, which are called eigenvalues, are denoted $\lambda_{\ell}$ and stored into the diagonal matrix $\boldsymbol{\Lambda}$. Eigenvalues express the variance extracted by the corresponding factor and their sum is called the total inertia (denoted $\mathcal{I}$) of the data matrix. With the so-called "triplet notation" ([14]) that is sometimes used as a general framework to formalize multivariate techniques, CA is equivalent to the analysis of the triplet $\left(\mathbf{Z} - \mathbf{r}\mathbf{c}^{T},\ \mathbf{D}_{c}^{-1},\ \mathbf{D}_{r}^{-1}\right)$.
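One common way to obtain a GSVD of this kind is to rescale the matrix, run a plain SVD, and rescale the singular vectors back. The sketch below takes that route. The function name `ca_gsvd`, the example table, and the tolerance used to drop null dimensions are all illustrative assumptions, not part of the original text.

```python
import numpy as np

def ca_gsvd(X, tol=1e-12):
    """Sketch of the GSVD of Z - r c^T under the constraints of Equation 1."""
    Z = X / X.sum()                       # probability matrix
    r, c = Z.sum(axis=1), Z.sum(axis=0)   # row and column totals
    A = Z - np.outer(r, c)                # double-centered matrix
    # Plain SVD of D_r^{-1/2} A D_c^{-1/2} (element-wise rescaling).
    U, delta, Vt = np.linalg.svd(A / np.sqrt(np.outer(r, c)),
                                 full_matrices=False)
    keep = delta > tol                    # drop dimensions with null singular value
    U, delta, Vt = U[:, keep], delta[keep], Vt[keep, :]
    P = np.sqrt(r)[:, None] * U           # P = D_r^{1/2} U
    Q = np.sqrt(c)[:, None] * Vt.T        # Q = D_c^{1/2} V
    return Z, r, c, P, delta, Q

# Hypothetical contingency table (illustrative counts).
X = np.array([[10., 5., 3.],
              [4., 8., 6.],
              [2., 3., 12.]])
Z, r, c, P, delta, Q = ca_gsvd(X)

eigenvalues = delta ** 2                  # diagonal of Lambda
total_inertia = eigenvalues.sum()
```

With this rescaling, `P.T @ np.diag(1 / r) @ P` and `Q.T @ np.diag(1 / c) @ Q` are identity matrices, and `P @ np.diag(delta) @ Q.T` reproduces the double-centered matrix.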
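From the GSVD, the row and (respectively) column factor scores are obtained as
$$\mathbf{F} = \mathbf{D}_{r}^{-1}\mathbf{P}\boldsymbol{\Delta} \quad\text{and}\quad \mathbf{G} = \mathbf{D}_{c}^{-1}\mathbf{Q}\boldsymbol{\Delta}. \tag{2}$$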
Note that the factor scores of a given set (i.e., the rows or the columns) are pairwise orthogonal when they describe different dimensions and that the variance of the factor scores for a given dimension is equal to the eigenvalue associated with this dimension. So, for example, the variance of the row factor scores is computed as:
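$$\mathbf{F}^{T}\mathbf{D}_{r}\mathbf{F} = \boldsymbol{\Delta}\mathbf{P}^{T}\mathbf{D}_{r}^{-1}\mathbf{D}_{r}\mathbf{D}_{r}^{-1}\mathbf{P}\boldsymbol{\Delta} = \boldsymbol{\Delta}\mathbf{P}^{T}\mathbf{D}_{r}^{-1}\mathbf{P}\boldsymbol{\Delta} = \boldsymbol{\Delta}^{2} = \boldsymbol{\Lambda}. \tag{3}$$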
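Equations 2 and 3 can be verified on a small example. The following is a minimal sketch on a made-up table: it assumes the table has no null rows or columns and keeps `min(I, J) - 1` dimensions, which holds when the table is otherwise of full rank. The GSVD is obtained through a plain SVD of the rescaled matrix, which is one possible route rather than the only one.

```python
import numpy as np

# Hypothetical contingency table (illustrative counts).
X = np.array([[10., 5., 3.],
              [4., 8., 6.],
              [2., 3., 12.]])

Z = X / X.sum()
r, c = Z.sum(axis=1), Z.sum(axis=0)

# GSVD through a plain SVD of D_r^{-1/2} (Z - r c^T) D_c^{-1/2}.
U, delta, Vt = np.linalg.svd((Z - np.outer(r, c)) / np.sqrt(np.outer(r, c)),
                             full_matrices=False)
k = min(X.shape) - 1                 # assumed number of non-null dimensions
U, delta, Vt = U[:, :k], delta[:k], Vt[:k, :]
P = np.sqrt(r)[:, None] * U
Q = np.sqrt(c)[:, None] * Vt.T

# Equation 2: factor scores for the rows and the columns.
F = (P / r[:, None]) * delta         # D_r^{-1} P Delta
G = (Q / c[:, None]) * delta         # D_c^{-1} Q Delta

# Equation 3: the weighted cross-products give the diagonal matrix Lambda.
Lambda = np.diag(delta ** 2)
row_var = F.T @ np.diag(r) @ F
col_var = G.T @ np.diag(c) @ G
```

Both `row_var` and `col_var` match `Lambda`, which is why rows and columns can share a single map: their factor scores have the same variance on every dimension.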
What does correspondence analysis optimize?
In CA the criterion that is maximized is the variance of the factor scores (see [21; 16]). For example, the first row factor $\mathbf{f}_{1}$ is obtained as a linear combination of the columns of the matrix $\mathbf{Z} - \mathbf{r}\mathbf{c}^{T}$, taking into account the constraints imposed by the matrices $\mathbf{D}_{r}^{-1}$ and $\mathbf{D}_{c}^{-1}$. Specifically, this means that we are searching for the vector $\mathbf{q}_{1}$ containing the weights of the linear combination such that $\mathbf{f}_{1}$ is obtained as
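$$\mathbf{f}_{1} = \mathbf{D}_{r}^{-1}\left(\mathbf{Z} - \mathbf{r}\mathbf{c}^{T}\right)\mathbf{D}_{c}^{-1}\mathbf{q}_{1}, \tag{4}$$
such that
$$\mathbf{f}_{1} = \arg\max_{\mathbf{f}}\ \mathbf{f}^{T}\mathbf{D}_{r}\mathbf{f}, \tag{5}$$
under the constraint that
$$\mathbf{q}_{1}^{T}\mathbf{D}_{c}^{-1}\mathbf{q}_{1} = 1. \tag{6}$$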
The subsequent row factor scores will maximize the residual variance under the orthogonality constraint imposed by the matrix $\mathbf{D}_{r}$ (i.e., $\mathbf{f}_{2}^{T}\mathbf{D}_{r}\mathbf{f}_{1} = 0$), in agreement with Equation 3.
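The maximization in Equations 4 to 6 can be illustrated numerically. The sketch below uses a made-up table and the hypothetical helpers `row_scores` and `variance`. It compares the variance reached with the first generalized right singular vector against the variance reached with random weight vectors rescaled to satisfy the constraint of Equation 6. It is a spot check under these assumptions, not a proof.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical contingency table (illustrative counts).
X = np.array([[10., 5., 3.],
              [4., 8., 6.],
              [2., 3., 12.]])
Z = X / X.sum()
r, c = Z.sum(axis=1), Z.sum(axis=0)
A = Z - np.outer(r, c)

# First generalized right singular vector q1 = D_c^{1/2} v1.
U, delta, Vt = np.linalg.svd(A / np.sqrt(np.outer(r, c)), full_matrices=False)
q1 = np.sqrt(c) * Vt[0]

def row_scores(q):
    """Equation 4: f = D_r^{-1} (Z - r c^T) D_c^{-1} q."""
    return (A @ (q / c)) / r

def variance(f):
    """Equation 5 criterion: f^T D_r f."""
    return f @ (r * f)

best = variance(row_scores(q1))

# Random candidates rescaled so that q^T D_c^{-1} q = 1 (Equation 6).
trials = []
for _ in range(1000):
    q = rng.standard_normal(len(c))
    q /= np.sqrt(q @ (q / c))
    trials.append(variance(row_scores(q)))
```

Here `best` equals the first eigenvalue `delta[0] ** 2`, and none of the random candidates exceeds it.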
How to identify the elements important for a factor
In CA, the rows and the columns of the table have a similar role (and variance) and therefore we can use the same statistics to identify the rows and the columns important for a given dimension. Because the variance extracted by a factor (i.e., its eigenvalue) is obtained as the weighted sum of the squared factor scores for this factor of either the rows or the columns of the table, the importance of a row (respectively a column) is reflected by the ratio of its squared factor score to the eigenvalue of this factor. This ratio is called the contribution of the row (respectively column) to the factor. Specifically, the contributions of row $i$ to component $\ell$ and of column $j$ to component $\ell$ are obtained respectively as:
