Pearson product-moment correlation coefficient

In statistics, the Pearson product-moment correlation coefficient (sometimes referred to as the PMCC, and typically denoted by r) is a measure of the correlation (linear dependence) between two variables X and Y, giving a value between +1 and −1 inclusive. It is widely used in the sciences as a measure of the strength of linear dependence between two variables. It was developed by Karl Pearson from a similar but slightly different idea introduced by Francis Galton in the 1880s.[1][2] The correlation coefficient is sometimes called "Pearson's r."

[Figure: Several sets of (x, y) points, with the correlation coefficient of x and y for each set. Note that the correlation reflects the noisiness and direction of a linear relationship (top row), but not the slope of that relationship (middle), nor many aspects of nonlinear relationships (bottom). N.B.: the figure in the center has a slope of 0, but in that case the correlation coefficient is undefined because the variance of Y is zero.]
Definition
Pearson's correlation coefficient between two variables is defined as the covariance of the two variables divided by the product of their standard deviations:

    ρ_{X,Y} = cov(X, Y) / (σ_X σ_Y) = E[(X − μ_X)(Y − μ_Y)] / (σ_X σ_Y)

The above formula defines the population correlation coefficient, commonly represented by the Greek letter ρ (rho). Substituting estimates of the covariances and variances based on a sample gives the sample correlation coefficient, commonly denoted r:

    r = Σ (X_i − X̄)(Y_i − Ȳ) / ( √(Σ (X_i − X̄)²) · √(Σ (Y_i − Ȳ)²) )
An equivalent expression gives the correlation coefficient as the mean of the products of the standard scores. Based on a sample of paired data (X_i, Y_i), the sample Pearson correlation coefficient is

    r = (1 / (n − 1)) Σ ((X_i − X̄) / s_X) ((Y_i − Ȳ) / s_Y)

where (X_i − X̄)/s_X, X̄, and s_X are the standard score, sample mean, and sample standard deviation, respectively.
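The standard-score formulation can be computed directly. This is a minimal sketch (the function name `pearson_r` is illustrative, not from the article); the data reuse the GNP/poverty example given later in the article, for which r should be exactly 1:

```python
import math

def pearson_r(x, y):
    """Sample Pearson r as the mean (over n - 1) of products of standard scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Sample standard deviations (n - 1 in the denominator)
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
    # Sum of products of standard scores, divided by n - 1
    return sum(((xi - mx) / sx) * ((yi - my) / sy)
               for xi, yi in zip(x, y)) / (n - 1)

x = [1, 2, 3, 5, 8]
y = [0.11, 0.12, 0.13, 0.15, 0.18]   # y = 0.10 + 0.01 x, perfectly linear
print(pearson_r(x, y))               # ≈ 1.0, up to floating-point rounding
```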
Mathematical properties
The absolute values of both the sample and population Pearson correlation coefficients are less than or equal to 1. Correlations equal to 1 or −1 correspond to data points lying exactly on a line (in the case of the sample correlation), or to a bivariate distribution entirely supported on a line (in the case of the population correlation). The Pearson correlation coefficient is symmetric: corr(X, Y) = corr(Y, X).
A key mathematical property of the Pearson correlation coefficient is that it is invariant to separate changes in location and scale in the two variables. That is, we may transform X to a + bX and transform Y to c + dY, where a, b, c, and d are constants, without changing the correlation coefficient (this fact holds for both the population and sample Pearson correlation coefficients). Note that more general linear transformations do change the correlation: see a later section for an application of this.
The Pearson correlation can be expressed in terms of uncentered moments. Since μ_X = E(X), σ_X² = E[(X − E(X))²] = E(X²) − [E(X)]², and likewise for Y, and since

    E[(X − μ_X)(Y − μ_Y)] = E(XY) − E(X)E(Y),

the correlation can also be written as

    ρ_{X,Y} = (E(XY) − E(X)E(Y)) / ( √(E(X²) − [E(X)]²) · √(E(Y²) − [E(Y)]²) )
Alternative formulae for the sample Pearson correlation coefficient are also available:

    r = (n Σ x_i y_i − Σ x_i Σ y_i) / ( √(n Σ x_i² − (Σ x_i)²) · √(n Σ y_i² − (Σ y_i)²) )
The above formula conveniently suggests a single-pass algorithm for calculating sample correlations, but, depending on the numbers involved, it can sometimes be numerically unstable.
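A sketch of that single-pass accumulation (the function name is illustrative); the subtractions of large, nearly equal sums at the end are where precision can be lost:

```python
import math

def pearson_single_pass(x, y):
    # Accumulate the five running sums in one pass over the data.
    n = 0
    sx = sy = sxx = syy = sxy = 0.0
    for xi, yi in zip(x, y):
        n += 1
        sx += xi
        sy += yi
        sxx += xi * xi
        syy += yi * yi
        sxy += xi * yi
    # These differences can suffer catastrophic cancellation when the
    # means are large relative to the spread of the data.
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    return num / den

print(pearson_single_pass([1, 2, 3, 5, 8],
                          [0.11, 0.12, 0.13, 0.15, 0.18]))  # ≈ 1.0
```

A numerically safer alternative is a two-pass computation that first finds the means and then accumulates the centered sums.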
Interpretation
The correlation coefficient ranges from −1 to 1. A value of 1 implies that a linear equation describes the relationship between X and Y perfectly, with all data points lying on a line for which Y increases as X increases. A value of −1 implies that all data points lie on a line for which Y decreases as X increases. A value of 0 implies that there is no linear correlation between the variables.
More generally, note that (X_i − X̄)(Y_i − Ȳ) is positive if and only if X_i and Y_i lie on the same side of their respective means. Thus the correlation coefficient is positive if X_i and Y_i tend to be simultaneously greater than, or simultaneously less than, their respective means. The correlation coefficient is negative if X_i and Y_i tend to lie on opposite sides of their respective means.
Geometric interpretation
[Figure: Regression lines for y = g_x(x) [red] and x = g_y(y) [blue]. For uncentered data, the correlation coefficient corresponds with the cosine of the angle between both possible regression lines y = g_x(x) and x = g_y(y).]
For centered data (i.e., data which have been shifted by the sample mean so as to have an average of zero), the correlation coefficient can also be viewed as the cosine of the angle between the two vectors of samples drawn from the two random variables (see below).

Some practitioners prefer an uncentered (non-Pearson-compliant) correlation coefficient. See the example below for a comparison.
As an example, suppose five countries are found to have gross national products of 1, 2, 3, 5, and 8 billion dollars, respectively. Suppose the same five countries (in the same order) are found to have 11%, 12%, 13%, 15%, and 18% poverty. Then let x and y be ordered 5-element vectors containing the above data: x = (1, 2, 3, 5, 8) and y = (0.11, 0.12, 0.13, 0.15, 0.18).

By the usual procedure for finding the angle between two vectors (see dot product), the uncentered correlation coefficient is:

    cos θ = (x · y) / (‖x‖ ‖y‖) = 2.93 / (√103 · √0.0983) ≈ 0.920814711

Note that the above data were deliberately chosen to be perfectly correlated: y = 0.10 + 0.01 x. The Pearson correlation coefficient must therefore be exactly one. Centering the data (shifting x by E(x) = 3.8 and y by E(y) = 0.138) yields x = (−2.8, −1.8, −0.8, 1.2, 4.2) and y = (−0.028, −0.018, −0.008, 0.012, 0.042), from which

    cos θ = 0.308 / (√30.8 · √0.00308) = 1 = r,

as expected.
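The uncentered and centered cosines from the example above can be reproduced numerically (a minimal sketch; `cos_angle` is an illustrative name):

```python
import math

def cos_angle(u, v):
    """Cosine of the angle between two vectors: u.v / (|u| |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

x = [1, 2, 3, 5, 8]
y = [0.11, 0.12, 0.13, 0.15, 0.18]

uncentered = cos_angle(x, y)          # ≈ 0.9208, not 1
mx, my = sum(x) / len(x), sum(y) / len(y)
centered = cos_angle([v - mx for v in x], [v - my for v in y])
print(uncentered, centered)           # the centered cosine equals Pearson r = 1
```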
Interpretation of the size of a correlation:

    Correlation   Negative         Positive
    None          −0.09 to 0.0     0.0 to 0.09
    Small         −0.3 to −0.1     0.1 to 0.3
    Medium        −0.5 to −0.3     0.3 to 0.5
    Large         −1.0 to −0.5     0.5 to 1.0
Several authors[3] have offered guidelines for the interpretation of a correlation coefficient. Cohen (1988)[3] has observed, however, that all such criteria are in some ways arbitrary and should not be observed too strictly. The interpretation of a correlation coefficient depends on the context and purpose. A correlation of 0.9 may be very low if one is verifying a physical law using high-quality instruments, but may be regarded as very high in the social sciences, where there may be a greater contribution from complicating factors.
Inference
[Figure: A graph showing the minimum value of Pearson's correlation coefficient that is significantly different from zero at the 0.05 level, for a given sample size.]

Statistical inference based on Pearson's correlation coefficient often focuses on one of the following two aims. One aim is to test the null hypothesis that the true correlation coefficient is ρ, based on the value of the sample correlation coefficient r. The other aim is to construct a confidence interval around r that has a given probability of containing ρ.
Randomization approaches
Permutation tests provide a direct approach to performing hypothesis tests and constructing confidence intervals. A permutation test for Pearson's correlation coefficient involves the following two steps:

(i) Using the original paired data (x_i, y_i), randomly redefine the pairs to create a new data set (x_i, y_{i′}), where the i′ are a permutation of the set {1, ..., n}. The permutation i′ is selected randomly, with equal probabilities placed on all n! possible permutations. This is equivalent to drawing the i′ randomly "without replacement" from the set {1, ..., n}. A closely related and equally justified (bootstrapping) approach is to separately draw the i and the i′ "with replacement" from {1, ..., n}.

(ii) Construct a correlation coefficient r from the randomized data.

To perform the permutation test, repeat (i) and (ii) a large number of times. The p-value for the permutation test is the proportion of the r values generated in step (ii) that are larger than the Pearson correlation coefficient that was calculated from the original data. Here "larger" can mean either that the value is larger in magnitude, or larger in signed value, depending on whether a two-sided or one-sided test is desired.
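The two steps can be sketched as follows (a minimal one-sided version; the function names and the permutation count are illustrative choices, not prescribed by the article):

```python
import random

def pearson_r(a, b):
    """Sample Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = (sum((ai - ma) ** 2 for ai in a)
           * sum((bi - mb) ** 2 for bi in b)) ** 0.5
    return num / den

def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """One-sided p-value: proportion of permuted r values >= the observed r."""
    rng = random.Random(seed)
    r_obs = pearson_r(x, y)
    yp = list(y)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(yp)                # step (i): randomly re-pair the data
        if pearson_r(x, yp) >= r_obs:  # step (ii): recompute r
            count += 1
    return count / n_perm

p = permutation_pvalue([1, 2, 3, 5, 8], [0.11, 0.12, 0.13, 0.15, 0.18])
print(p)   # small, since the original pairing is perfectly correlated
```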
The bootstrap can be used to construct confidence intervals for Pearson's correlation coefficient. In the "non-parametric" bootstrap, n pairs (x_i, y_i) are resampled "with replacement" from the observed set of n pairs, and the correlation coefficient r is calculated based on the resampled data. This process is repeated a large number of times, and the empirical distribution of the resampled r values is used to approximate the sampling distribution of the statistic. A 95% confidence interval for ρ can be defined as the interval spanning from the 2.5th to the 97.5th percentile of the resampled r values.
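A sketch of the percentile bootstrap interval just described (function names are illustrative; degenerate resamples with zero variance are simply skipped):

```python
import random

def pearson_r(a, b):
    """Sample Pearson correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    den = (sum((ai - ma) ** 2 for ai in a)
           * sum((bi - mb) ** 2 for bi in b)) ** 0.5
    return num / den

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample the pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    rs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            rs.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
        except ZeroDivisionError:
            continue  # degenerate resample (e.g. the same pair drawn n times)
    rs.sort()
    lo = rs[int((alpha / 2) * len(rs))]
    hi = rs[int((1 - alpha / 2) * len(rs)) - 1]
    return lo, hi

lo, hi = bootstrap_ci([1, 2, 3, 5, 8], [0.11, 0.12, 0.13, 0.15, 0.18])
print(lo, hi)   # both ≈ 1 here: every resample of perfectly linear data has r = 1
```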
Approaches based on mathematical approximations
For approximately Gaussian data, the sampling distribution of Pearson's correlation coefficient approximately follows Student's t-distribution with n − 2 degrees of freedom. Specifically, if the underlying variables have a bivariate normal distribution, the variable

    t = r √((n − 2) / (1 − r²))

has a Student's t-distribution in the null case (zero correlation).[4] This also holds approximately even if the observed values are non-normal, provided sample sizes are not very small.[5] For constructing confidence intervals and performing power analyses, the inverse of this transformation is also needed:

    r = t / √(n − 2 + t²)
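The transformation between r and t and its inverse amount to the following (a minimal sketch; function names are illustrative):

```python
import math

def t_from_r(r, n):
    # t statistic with n - 2 degrees of freedom under the null of zero correlation
    return r * math.sqrt((n - 2) / (1 - r * r))

def r_from_t(t, n):
    # Inverse transformation, needed for confidence intervals and power analyses
    return t / math.sqrt(n - 2 + t * t)

t = t_from_r(0.3, 50)
print(t)                 # ≈ 2.179
print(r_from_t(t, 50))   # round-trips back to 0.3
```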
Alternatively, large-sample approaches can be used.
Early work on the distribution of the sample correlation coefficient was carried out by R. A. Fisher[6][7] and A. K. Gayen.[8] Another early paper[9] provides graphs and tables for general values of ρ, for small sample sizes, and discusses computational approaches.
Fisher Transformation
In practice, confidence intervals and hypothesis tests relating to ρ are usually carried out using the Fisher transformation:

    F(r) = ½ ln((1 + r) / (1 − r)) = artanh(r)

If F(r) is the Fisher transformation of r, and n is the sample size, then F(r) approximately follows a normal distribution with

    mean F(ρ) = artanh(ρ)

and standard error

    SE = 1 / √(n − 3).

Thus, a z-score is

    z = (F(r) − F(ρ₀)) √(n − 3)

under the null hypothesis that ρ = ρ₀, given the assumption that the sample pairs are independent and identically distributed and follow a bivariate normal distribution. Thus an approximate p-value can be obtained from a normal probability table. For example, if z = 2.2 is observed and a two-sided p-value is desired to test the null hypothesis that ρ = 0, the p-value is 2·Φ(−2.2) = 0.028, where Φ is the standard normal cumulative distribution function.
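The z-test based on the Fisher transformation needs only the standard library (`fisher_z_pvalue` is an illustrative name; Φ(−|z|) is obtained from the complementary error function):

```python
import math

def fisher_z_pvalue(r, n, rho0=0.0):
    """Two-sided p-value for H0: rho = rho0, via the Fisher transformation."""
    z = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    # 2 * Phi(-|z|) = erfc(|z| / sqrt(2)), where Phi is the standard normal CDF
    return math.erfc(abs(z) / math.sqrt(2))

# The article's example: z = 2.2 gives a two-sided p-value of about 0.028
print(math.erfc(2.2 / math.sqrt(2)))
# A full computation from r and n:
print(fisher_z_pvalue(0.3, 50))   # z ≈ 2.12, p ≈ 0.034
```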
Confidence Intervals
To obtain a confidence interval for ρ, we first compute a confidence interval for F(ρ):

    artanh(r) ± z_{α/2} / √(n − 3)

The inverse Fisher transformation brings the interval back to the correlation scale.
For example, suppose we observe r = 0.3 with a sample size of n = 50, and we wish to obtain a 95% confidence interval for ρ. The transformed value is artanh(r) = 0.30952, so the confidence interval on the transformed scale is 0.30952 ± 1.96/√47, or (0.023624, 0.595415). Converting back to the correlation scale yields (0.024, 0.534).
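The worked example can be checked directly (`fisher_ci` is an illustrative name; 1.96 is the two-sided 95% normal critical value):

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """CI for rho: Fisher-transform, add/subtract z * SE, transform back."""
    center = math.atanh(r)             # artanh(r), the Fisher transform
    half = z_crit / math.sqrt(n - 3)   # standard error is 1 / sqrt(n - 3)
    return math.tanh(center - half), math.tanh(center + half)

lo, hi = fisher_ci(0.3, 50)
print(round(lo, 3), round(hi, 3))   # → 0.024 0.534, matching the example
```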