A tutorial on Principal Components Analysis


Lindsay I Smith

February 26, 2002

Chapter 1

Introduction

This tutorial is designed to give the reader an understanding of Principal Components Analysis (PCA). PCA is a useful statistical technique that has found application in fields such as face recognition and image compression, and is a common technique for finding patterns in data of high dimension.

Before getting to a description of PCA, this tutorial first introduces mathematical concepts that will be used in PCA. It covers standard deviation, covariance, eigenvectors and eigenvalues. This background knowledge is meant to make the PCA section very straightforward, but can be skipped if the concepts are already familiar.

There are examples all the way through this tutorial that are meant to illustrate the concepts being discussed. If further information is required, the mathematics textbook “Elementary Linear Algebra 5e” by Howard Anton, Publisher John Wiley & Sons Inc, ISBN 0-471-85223-6 is a good source of information regarding the mathematical background.

Chapter 2

Background Mathematics

This section will attempt to give some elementary background mathematical skills that will be required to understand the process of Principal Components Analysis. The topics are covered independently of each other, and examples are given. It is less important to remember the exact mechanics of a mathematical technique than it is to understand the reason why such a technique may be used, and what the result of the operation tells us about our data. Not all of the techniques are used in PCA, but the ones that are not explicitly required do provide the grounding on which the most important techniques are based.

I have included a section on Statistics which looks at distribution measurements, or how the data is spread out. The other section is on Matrix Algebra and looks at eigenvectors and eigenvalues, important properties of matrices that are fundamental to PCA.

2.1 Statistics

The entire subject of statistics is based around the idea that you have this big set of data, and you want to analyse that set in terms of the relationships between the individual points in that data set. I am going to look at a few of the measures you can do on a set of data, and what they tell you about the data itself.

2.1.1 Standard Deviation

To understand standard deviation, we need a data set. Statisticians are usually concerned with taking a sample of a population. To use election polls as an example, the population is all the people in the country, whereas a sample is a subset of the population that the statisticians measure. The great thing about statistics is that by only measuring (in this case by doing a phone survey or similar) a sample of the population, you can work out what is most likely to be the measurement if you used the entire population. In this statistics section, I am going to assume that our data sets are samples of some bigger population. There is a reference later in this section pointing to more information about samples and populations.

Here’s an example set:

I could simply use the symbol $X$ to refer to this entire set of numbers. If I want to refer to an individual number in this data set, I will use subscripts on the symbol $X$ to indicate a specific number. E.g. $X_3$ refers to the 3rd number in $X$, namely the number 4. Note that $X_1$ is the first number in the sequence, not $X_0$ like you may see in some textbooks. Also, the symbol $n$ will be used to refer to the number of elements in the set $X$.
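The tutorial's subscripts start at 1, whereas Python lists start at 0. The minimal sketch below shows the mapping, assuming a hypothetical four-element set whose third value is 4 (only that value comes from the text; the other numbers are placeholders). The helper `element` is likewise a hypothetical name used only for illustration.

```python
# Hypothetical data set: only the third value (4) is taken from the text.
X = [1, 2, 4, 6]

def element(data, i):
    """Hypothetical helper: return X_i using the 1-based convention (X_1 is the first element)."""
    if not 1 <= i <= len(data):
        raise IndexError(f"index {i} is outside 1..{len(data)}")
    return data[i - 1]  # Python lists are 0-based, so shift the subscript down by one

n = len(X)  # n is the number of elements in the set
print(element(X, 3))  # 4
print(n)              # 4
```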

There are a number of things that we can calculate about a data set. For example, we can calculate the mean of the sample. I assume that the reader understands what the mean of a sample is, and will only give the formula:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
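A minimal Python sketch of the sample mean: the sum of the $n$ elements divided by $n$. The helper name `mean` is illustrative only, and the values 8, 9, 11 and 12 are the second data set whose rows appear in Table 2.1. The standard library's `statistics.mean` does the same job.

```python
def mean(data):
    """Sample mean: add up the n elements and divide by n."""
    if not data:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(data) / len(data)

print(mean([8, 9, 11, 12]))  # 10.0
```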

…difference between each of the denominators. It also discusses the difference between samples and populations.

So, for our two data sets above, the calculations of standard deviation are in Table 2.1.

Set 1:

Total: 208
Divided by (n-1): 69.333
Square Root: 8.3266

Set 2:

$X_i$ | $(X_i - \bar{X})$ | $(X_i - \bar{X})^2$
8 | -2 | 4
9 | -1 | 1
11 | 1 | 1
12 | 2 | 4
Total: 10
Divided by (n-1): 3.333
Square Root: 1.8257

Table 2.1: Calculation of standard deviation

And so, as expected, the first set has a much larger standard deviation due to the fact that the data is much more spread out from the mean. Just as another example, the data set:

also has a mean of 10, but its standard deviation is 0, because all the numbers are the same. None of them deviate from the mean.
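The calculation in Table 2.1 can be sketched in Python as below. This is a minimal illustration, not the tutorial's own code: `sample_std` is a hypothetical helper name, the values 8, 9, 11 and 12 are the second data set as read from the table rows, and the all-equal set `[10, 10, 10, 10]` assumes four elements, since the text does not say how many there are.

```python
import math

def sample_std(data):
    """Sample standard deviation: sum the squared deviations, divide by (n - 1), take the square root."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(data) / n
    squared_devs = [(x - mean) ** 2 for x in data]  # the (X_i - mean)^2 column of Table 2.1
    return math.sqrt(sum(squared_devs) / (n - 1))

set2 = [8, 9, 11, 12]
print(round(sample_std(set2), 4))    # 1.8257
print(sample_std([10, 10, 10, 10]))  # 0.0, since no value deviates from the mean
```

Python's `statistics.stdev` uses the same (n - 1) denominator, so it returns the same value for `set2`.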

2.1.2 Variance

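Variance is another measure of the spread of data in a data set. In fact it is almost identical to the standard deviation. The formula is this:

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$$

This is simply the standard deviation $s$ squared.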
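A minimal Python sketch of the sample variance, reusing the second data set from Table 2.1 (8, 9, 11, 12). The helper `sample_variance` is a hypothetical name; the comparison uses the standard library's `statistics.variance`, which divides by (n - 1), and `statistics.pvariance`, which divides by n (the population version), to show the effect of the two denominators.

```python
import math
import statistics

def sample_variance(data):
    """Sample variance: squared deviations from the mean, summed, divided by (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

set2 = [8, 9, 11, 12]
var = sample_variance(set2)
print(round(var, 3))             # 3.333, the "Divided by (n-1)" row for set 2
print(round(math.sqrt(var), 4))  # 1.8257, the standard deviation
print(math.isclose(var, statistics.variance(set2)))  # True: same (n - 1) convention
print(statistics.pvariance(set2))                    # 2.5: divides by n instead
```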
