Stat Papers
DOI 10.1007/s00362-011-0423-0
REGULAR ARTICLE
A new rank correlation measure
Claudio Giovanni Borroni
Received: 15 April 2011 / Revised: 10 November 2011
© Springer-Verlag 2011
Abstract A new rank correlation measure $\beta_n$ is proposed, so as to develop a nonparametric test of independence for two variables. $\beta_n$ is shown to be the symmetrized version of a measure earlier proposed by Borroni and Zenga (Stat Methods Appl 16:289–308, 2007). More specifically, $\beta_n$ is built so that it takes the opposite sign, without changing its absolute value, when the ranking of one variable is reversed. Further, the meaning of the population equivalent of $\beta_n$ is discussed. It is pointed out that this latter association measure vanishes not only at independence but, more generally, at indifference, that is, when the two variables do not show any "tendency" to positive or negative dependence. The null distribution of $\beta_n$ needs an independent study: hence, the finite null variance and a table of critical values are determined. Moreover, the asymptotic null distribution of $\beta_n$ is derived. Finally, the performance of the test based on $\beta_n$ is evaluated by simulation. $\beta_n$ is shown to be a good competitor of some classical tests for the same problem.
Keywords Nonparametrics · Rank correlation · Association measures · Indifference · Gini's gamma
Mathematics Subject Classification (2000) 62G10 · 62H20
C. G. Borroni (✉)
Department of Quantitative Methods for Economics, University of Milano-Bicocca,
via Bicocca degli Arcimboldi, 8, 20126 Milan, Italy
e-mail: claudio.borroni@unimib.it
1 Introduction
Rank correlation measures are often used as test statistics for the independence of two sorting "criteria" or of two continuous variables $X$ and $Y$ whose joint distribution is unspecified. Denote by $(R_{1h}, \ldots, R_{nh})$ the set of ranks of a sample of $n$ units according to the $h$-th sorting criterion ($h = 1, 2$); the best known rank correlation measure is probably Spearman's rho (see Kendall and Gibbons 1990):
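\[
\rho_n := \frac{12}{n^3 - n} \sum_{i=1}^{n} \left( R_{i1} - \frac{n+1}{2} \right) \left( R_{i2} - \frac{n+1}{2} \right) = 1 - \frac{6 \sum_{i=1}^{n} (R_{i1} - R_{i2})^2}{n^3 - n}. \tag{1}
\]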
The right-hand side of (1) shows that $\rho_n$ is based on the squared differences between the pairs of corresponding ranks. Genest et al. (2010) point out that, in recent years, there has been renewed interest in the $L_1$-alternative to such a logic, that is, in the so-called Spearman's footrule:
\[
\varphi_n := 1 - \frac{2}{\lfloor n^2/2 \rfloor} \sum_{i=1}^{n} |R_{i1} - R_{i2}| \tag{2}
\]
(where $\lfloor m \rfloor$ denotes the integer part of $m$), due to its interpretation as a Manhattan distance between the two sets of ranks. Notice that Spearman's footrule is reported here in its form ranging in $[-1, 1]$ (see Dinneen and Blakesley 1982).
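
As a quick numerical illustration of (1) and (2), the following Python sketch computes Spearman's rho in both of its forms and the footrule from two rank vectors. The function names and the sample ranks are assumptions made for illustration only, and ties are assumed to be absent.

```python
def spearman_rho_product(r1, r2):
    """Spearman's rho, product form of (1)."""
    n = len(r1)
    c = (n + 1) / 2  # common mean of the ranks 1..n
    return 12 / (n**3 - n) * sum((a - c) * (b - c) for a, b in zip(r1, r2))


def spearman_rho_squared(r1, r2):
    """Spearman's rho, squared-difference form of (1)."""
    n = len(r1)
    return 1 - 6 * sum((a - b) ** 2 for a, b in zip(r1, r2)) / (n**3 - n)


def footrule(r1, r2):
    """Spearman's footrule (2), scaled to range in [-1, 1]."""
    n = len(r1)
    # n * n // 2 is the integer part of n^2 / 2
    return 1 - 2 * sum(abs(a - b) for a, b in zip(r1, r2)) / (n * n // 2)


# Hypothetical ranks of n = 5 units under two sorting criteria.
r1 = [1, 2, 3, 4, 5]
r2 = [2, 1, 4, 3, 5]
print(spearman_rho_product(r1, r2), spearman_rho_squared(r1, r2), footrule(r1, r2))
```

The two forms of rho agree because both rank vectors are permutations of 1, ..., n, while the footrule replaces squared differences with absolute ones.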
After Spearman introduced it in the early twentieth century, $\varphi_n$ was neglected mainly because it lacks some statistical properties considered useful in applications. In particular, it can easily be seen that, contrary to $\rho_n$, $\varphi_n$ does not possess the following property of symmetry: if a sample $(x_1, y_1), \ldots, (x_n, y_n)$ drawn from $(X, Y)$ is considered, then a rank measure should assign the opposite degree of association to the sample $(-x_1, y_1), \ldots, (-x_n, y_n)$. In other words, the value of the rank correlation measure should not change, apart from its sign, when one set of ranks, say $R_{i1}$ ($i = 1, \ldots, n$), is replaced by its reversed ranks $n + 1 - R_{i1}$ ($i = 1, \ldots, n$). Salama and Quade (2001) proposed to symmetrize Spearman's footrule by the following simple argument: if $\mu_n^*$ denotes the expression of a rank correlation measure $\mu_n$ obtained by replacing one set of ranks with its reversed ranks, then a symmetric measure is $\tilde{\mu}_n = \frac{1}{2}(\mu_n - \mu_n^*)$ (notice that, if $\mu_n$ is already symmetric, then $\tilde{\mu}_n = \mu_n$). By applying such a logic to Spearman's footrule, one gets
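\[
\gamma_n := \frac{1}{2} (\varphi_n - \varphi_n^*) = \frac{1}{\lfloor n^2/2 \rfloor} \sum_{i=1}^{n} \left[ |n + 1 - R_{i1} - R_{i2}| - |R_{i1} - R_{i2}| \right]. \tag{3}
\]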
Nelsen and Úbeda-Flores (2004) recognized that the symmetric footrule in (3) is in fact the indice di cograduazione semplice introduced by the Italian statistician Corrado Gini (see Gini 1954 and, for a recent review, Genest et al. 2010), also known as Gini's rank association coefficient or simply as Gini's gamma. Besides solving the lack of symmetry of (2), $\gamma_n$ proves to possess good sample properties, as shown in Cifarelli and Regazzini (1977) and in Conti and Nikitin (1999). Moreover, a multivariate version of Gini's gamma has recently been proposed by Behboodian et al. (2007).
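
The symmetrization argument can be sketched in a few lines of Python. The code below builds Gini's gamma from the footrule and its reversed-rank counterpart, compares it with the closed form in (3), and shows on one example that the footrule itself does not simply change sign under rank reversal. All names and the data are illustrative assumptions.

```python
def footrule(r1, r2):
    """Spearman's footrule, scaled to range in [-1, 1]."""
    n = len(r1)
    return 1 - 2 * sum(abs(a - b) for a, b in zip(r1, r2)) / (n * n // 2)


def reverse_ranks(r):
    """Replace every rank R by n + 1 - R."""
    n = len(r)
    return [n + 1 - a for a in r]


def gini_gamma(r1, r2):
    """Gini's gamma as half the difference between the footrule and its reversed version."""
    return 0.5 * (footrule(r1, r2) - footrule(reverse_ranks(r1), r2))


def gini_gamma_direct(r1, r2):
    """Gini's gamma from the closed form in (3)."""
    n = len(r1)
    return sum(abs(n + 1 - a - b) - abs(a - b) for a, b in zip(r1, r2)) / (n * n // 2)


# Hypothetical ranks of n = 5 units.
r1 = [1, 2, 3, 4, 5]
r2 = [2, 1, 4, 3, 5]
print(footrule(r1, r2), footrule(reverse_ranks(r1), r2))      # not opposite values
print(gini_gamma(r1, r2), gini_gamma(reverse_ranks(r1), r2))  # opposite values
```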
In analogy with the above discussion about $\varphi_n$ and $\gamma_n$, this work addresses the problem of the symmetrization of a rank correlation measure recently proposed by Borroni
and Zenga (2007). In this latter paper, instead of measuring the extent of association by the differences $R_{i1} - R_{i2}$ ($i = 1, \ldots, n$), as for $\rho_n$, $\varphi_n$ and $\gamma_n$, the sums $T_i := R_{i1} + R_{i2}$ ($i = 1, \ldots, n$) were considered. It is easily seen that, when the two sequences of ranks are perfectly discordant (i.e. $R_{i1} = n + 1 - R_{i2}$, $i = 1, \ldots, n$), the total ranks $T_i$ are constantly equal to $n + 1$; conversely, if the two sequences are perfectly concordant (i.e. $R_{i1} = R_{i2}$, $i = 1, \ldots, n$), such totals show the maximum variability, being equal to the set of even numbers $\{2, 4, \ldots, 2n\}$. Hence, any index of variability of the total ranks can be used as a sample measure of the association between the two variables $X$ and $Y$. When the variance is chosen, $\rho_n$ is re-obtained, up to a linear transformation. Borroni and Zenga (2007) proposed instead to use a variability measure based on the $L_1$-norm, the mean difference, once more introduced by Corrado Gini (1912). One essentially has to compute the sum of the $n^2$ absolute differences $|T_i - T_j|$ ($i, j = 1, \ldots, n$) between each pair of total ranks; after normalizing the index, so that it ranges in $[-1, 1]$, the following rank correlation measure is obtained:
\[
\delta_n := \frac{3}{n^3 - n} \sum_{i=1}^{n} \sum_{j=1}^{n} |R_{i1} + R_{i2} - R_{j1} - R_{j2}| - 1. \tag{4}
\]
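
A minimal Python sketch of (4) follows. It computes the measure from the total ranks and then evaluates it again after reversing one set of ranks, so that the two values can be compared. The function names and the data are assumptions made for illustration.

```python
def delta_n(r1, r2):
    """Rank correlation (4), based on the mean difference of the total ranks."""
    n = len(r1)
    totals = [a + b for a, b in zip(r1, r2)]  # T_i = R_i1 + R_i2
    spread = sum(abs(ti - tj) for ti in totals for tj in totals)
    return 3 * spread / (n**3 - n) - 1


def reverse_ranks(r):
    """Replace every rank R by n + 1 - R."""
    n = len(r)
    return [n + 1 - a for a in r]


# Hypothetical ranks of n = 5 units.
r1 = [1, 2, 3, 4, 5]
r2 = [2, 1, 4, 3, 5]
print(delta_n(r1, r1))                 # perfect concordance
print(delta_n(r1, reverse_ranks(r1)))  # perfect discordance
print(delta_n(r1, r2), delta_n(reverse_ranks(r1), r2))
```

In this example the last two values differ in absolute value, which illustrates why a symmetrized version is of interest.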
It is easily seen that (4) does not possess the property of symmetry discussed above for Spearman's footrule. On the applied side, this fact was also emphasized in Borroni and Zenga (2007), who showed that the test based on $\delta_n$ does not lead to comparable powers when positive or negative dependence are considered as alternative hypotheses. However, a symmetric version of $\delta_n$ can be built by applying the method used in Salama and Quade (2001). One then gets the new statistic:
\[
\beta_n := \frac{1}{2} (\delta_n - \delta_n^*) = \frac{3}{2(n^3 - n)} \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ |R_{i1} + R_{i2} - R_{j1} - R_{j2}| - |R_{i1} - R_{i2} - R_{j1} + R_{j2}| \right]. \tag{5}
\]
The modification leading to $\beta_n$ can also be justified by considering a related property of symmetry needed for the population version of $\delta_n$. It is useful to consider all rank correlation measures as estimators of corresponding parameters characterizing the joint cdf $H(x, y)$ of $(X, Y)$, whose marginal cdf's are respectively denoted as $F(x)$ and $G(y)$ and assumed to be continuous. For instance, $\gamma_n$ can be regarded as an estimator of the association measure of $(X, Y)$
\[
G(H) := 2 \int_{\mathbb{R}^2} \left[ |1 - F(x) - G(y)| - |F(x) - G(y)| \right] dH(x, y). \tag{6}
\]
Technically speaking, (6) is a functional defined on the Fréchet class of all cdf's with the given marginal distributions $F$ and $G$. Actually, $G(H)$ belongs to a wide class of association measures studied by Cifarelli et al. (1996), which in fact includes the population version of $\rho_n$ as well. These measures share the property of vanishing for the elements of the Fréchet class named (again after Corrado Gini) as indifferent, that is, when
\[
H(x, y) = F(x) - H\left(x, G^{-1}(1 - G(y))\right) = G(y) - H\left(F^{-1}(1 - F(x)), y\right).
\]
Being zero at indifference essentially means that the association measure must treat positive and negative dependence symmetrically. A joint cdf is indeed indifferent when it does not show any "tendency" to positive or negative dependence. Independence ($H(x, y) = F(x) G(y)$) is then just a particular case of indifference. A simple example of indifference without independence is instead $H(x, y) = \frac{1}{2} [H^-(x, y) + H^+(x, y)]$, where $H^-(x, y) := \max\{0, F(x) + G(y) - 1\}$ and $H^+(x, y) := \min\{F(x), G(y)\}$ denote the extreme elements of the Fréchet class (for further details, see also Cifarelli and Regazzini 1990 and Conti 1993).
The population version of $\delta_n$ is
\[
D(H) := 3 \int_{\mathbb{R}^4} |F(x_1) + G(y_1) - F(x_2) - G(y_2)| \, dH(x_1, y_1) \, dH(x_2, y_2) - 1. \tag{7}
\]
It can easily be shown that $D(H)$ does not vanish in all cases of indifference (for instance, in the example reported above). This lack of "symmetry" can then be regarded as a drawback to be corrected. Cifarelli et al. (1996) show that a functional vanishing at indifference can be defined by subtracting a corresponding discordance measure from a given concordance measure (see Cifarelli and Regazzini 1990 for further details). Roughly speaking, a concordance measure is a distance of a given cdf from the minimum element of the Fréchet class. More specifically, it is a suitable mean of the random vertical distance of the point $(F(X), G(Y))$ from the line $S_2 \equiv \{(x, y) : y = 1 - x\}$. The line $S_1 \equiv \{(x, y) : y = x\}$ is instead to be considered to define a corresponding discordance measure, which is the distance of the cdf from the maximum element of the Fréchet class. As $D(H)$ is a concordance measure, a corresponding discordance measure can then be defined as
\[
\bar{D}(H) := 3 \int_{\mathbb{R}^4} |F(x_1) - G(y_1) - F(x_2) + G(y_2)| \, dH(x_1, y_1) \, dH(x_2, y_2) - 1.
\]
Hence, a suitable functional on the Fréchet class, measuring association and vanishing at indifference, is
\[
B(H) := \frac{1}{2} \left[ D(H) - \bar{D}(H) \right] = \frac{3}{2} \int_{\mathbb{R}^4} \left[ |F(x_1) + G(y_1) - F(x_2) - G(y_2)| - |F(x_1) - G(y_1) - F(x_2) + G(y_2)| \right] dH(x_1, y_1) \, dH(x_2, y_2). \tag{8}
\]
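
As a hedged check of these claims, consider the indifferent cdf $H = \frac{1}{2}(H^- + H^+)$. Writing $U = F(X)$ and $V = G(Y)$, this cdf corresponds to $(U, V) = (U, U)$ or $(U, V) = (U, 1 - U)$, each with probability $1/2$, with $U$ uniform on $[0, 1]$. The worked sketch below is added for illustration; it uses only $E|U_1 - U_2| = 1/3$ and $E|2U_1 - 1| = 1/2$ for independent uniform variables, and conditions on whether each of the two independent copies is of the first or of the second type.

```latex
\begin{align*}
E\,|(U_1 + V_1) - (U_2 + V_2)| &= \tfrac14\, E|2U_1 - 2U_2| + \tfrac12\, E|2U_1 - 1| + \tfrac14 \cdot 0
  = \tfrac16 + \tfrac14 = \tfrac{5}{12}, \\
E\,|(U_1 - V_1) - (U_2 - V_2)| &= \tfrac14 \cdot 0 + \tfrac12\, E|2U_1 - 1| + \tfrac14\, E|2U_1 - 2U_2|
  = \tfrac{5}{12}, \\
D(H) = \bar{D}(H) &= 3 \cdot \tfrac{5}{12} - 1 = \tfrac14, \qquad
B(H) = \tfrac12 \left[ D(H) - \bar{D}(H) \right] = 0.
\end{align*}
```

Under these assumptions the concordance measure alone equals $1/4$ at this indifferent cdf, while the difference of the concordance and discordance measures vanishes.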
Notice that $B(H)$ is exactly the population version of $\beta_n$, defined in (5). The above discussion about association measures as differences between a concordance and a discordance measure is found in Nelsen (1998) in terms of copulas. In effect, the functional (8) can also be defined in terms of the copula of the continuous random vector $(X, Y)$, i.e. by considering the unique function $C : [0, 1]^2 \to [0, 1]$ such that $H(x, y) = C(F(x), G(y))$ for every $(x, y) \in \mathbb{R}^2$. It is trivial to show that (8) equals
\[
\frac{3}{2} \int_{[0,1]^4} \left[ |u_1 + v_1 - u_2 - v_2| - |u_1 - v_1 - u_2 + v_2| \right] dC(u_1, v_1) \, dC(u_2, v_2). \tag{9}
\]
The above expression is useful to emphasize that the population version of $\beta_n$, depending only on the copula of $(X, Y)$, is margin-free. In the following we will then equivalently indicate such a functional as $B(H)$, $B(C)$ or $B(X, Y)$, as it is clearly well-defined for every continuous bivariate random vector $(X, Y)$.
A final remark concerning $B(\cdot)$ can be made. By applying Corollary 3.1 of Tchen (1980), Borroni and Zenga (2007) showed that (7), the population version of $\delta_n$, agrees with the so-called concordance ordering, that is, $D(H_1) \le D(H_2)$ whenever two bivariate cdf's are considered so that $H_1(x, y) \le H_2(x, y)$ for every $(x, y) \in \mathbb{R}^2$.
A similar argument can easily be used to prove that the same property, which is alternatively formulated in terms of copulas, holds for $B(\cdot)$. Moreover, it is trivial to show that: (i) $B(X, Y) = B(Y, X)$; (ii) $B(X, X) = 1$, $B(X, -X) = -1$ and $-1 \le B(X, Y) \le 1$; (iii) $B(X, -Y) = B(-Y, X) = -B(X, Y)$. Finally, notice that the function $|u_1 + v_1 - u_2 - v_2| - |u_1 - v_1 - u_2 + v_2|$ in (9) is bounded and continuous for every $(u_1, u_2, v_1, v_2) \in [0, 1]^4$; hence, it immediately follows that $\lim_{n \to \infty} B(C_n) = B(C)$ whenever the sequence of copulas $\{C_n\}$ converges pointwise to the copula $C$. Now, by recalling that $B(\cdot)$ is built to vanish at indifference, so that $B(X, Y) = 0$ if $X$ and $Y$ are independent, one can claim that the population version of $\beta_n$ is a measure of concordance in the sense of Scarsini (1984) (see also Nelsen 2006).
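
The extreme values in property (ii) can be checked directly from (9). A brief sketch, added here for illustration: if $Y = X$ then $V = U$, and if $Y = -X$ then $V = 1 - U$, where $U = F(X)$ is uniform on $[0, 1]$ and $E|U_1 - U_2| = 1/3$ for two independent copies.

```latex
\begin{align*}
B(X, X)  &= \tfrac32\, E\left[ |2U_1 - 2U_2| - 0 \right] = 3\, E|U_1 - U_2| = 1, \\
B(X, -X) &= \tfrac32\, E\left[ 0 - |2U_1 - 2U_2| \right] = -3\, E|U_1 - U_2| = -1.
\end{align*}
```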
This article aims at studying the sample properties of $\beta_n$ and the performance of the related nonparametric test. Unfortunately, only a minor part of the sample properties of $\beta_n$ can be straightforwardly derived from those of $\delta_n$. Hence, an independent study is to be conducted. Moreover, the performance of $\beta_n$ as a rank correlation test is to be compared to that of $\delta_n$, so as to verify that the newly proposed test can overcome the deficiencies of the former one. Of course, $\beta_n$ is also to be compared with its classical competitors, such as Spearman's rho and Gini's gamma. Section 2 deals with the null distribution of the test statistic $\beta_n$: the exact null expectation and variance of $\beta_n$ are derived and a table of critical values for finite sample sizes is reported. Moreover, the null asymptotic normality of $\beta_n$ is proved. Section 3 deals with the power of the test based on $\beta_n$: some results of a simulation study are reported to show in which situations the proposed test performs better. Section 4 summarizes and concludes.
2 Null distribution of $\beta_n$
To implement a test of independence based on $\beta_n$, its null distribution has to be studied. As for other rank correlation measures, $\beta_n$ can be computed by sorting one sample, say $y_1, \ldots, y_n$, in ascending order and by arranging accordingly the corresponding values of the second sample, denoted now as $x_{(1)}, \ldots, x_{(n)}$. If $R_{(1)}, \ldots, R_{(n)}$ denote the ranks of this latter re-arranged sample, $\beta_n$ can be computed as
\[
\beta_n = \frac{3}{2(n^3 - n)} \sum_{i=1}^{n} \sum_{j=1}^{n} \left[ |R_{(i)} + i - R_{(j)} - j| - |R_{(i)} - i - R_{(j)} + j| \right].
\]
Table 1 Critical values of the two-sided $2\alpha$-level test based on $\beta_n$
n \ α    0.005    0.01     0.025    0.05     0.1
5        1.0000   0.7500   0.7500   0.6000   0.5500
6        0.8286   0.7143   0.6857   0.6000   0.4571
7        0.7679   0.6964   0.6071   0.5179   0.4107
8        0.6905   0.6429   0.5476   0.4762   0.3810
9        0.6500   0.6000   0.5167   0.4417   0.3500
10       0.6182   0.5636   0.4848   0.4121   0.3273
11       0.5864   0.5364   0.4636   0.3909   0.3091
12       0.5594   0.5140   0.4406   0.3741   0.2937
13       0.5385   0.4918   0.4203   0.3571   0.2802
14 a     0.5187   0.4725   0.4044   0.3429   0.2703
15 a     0.5000   0.4554   0.3893   0.3304   0.2589
16 a     0.4838   0.4412   0.3765   0.3191   0.2500
17 a     0.4694   0.4277   0.3640   0.3088   0.2414
18 a     0.4551   0.4149   0.3529   0.2993   0.2343
19 a     0.4430   0.4035   0.3430   0.2904   0.2281
20 a     0.4316   0.3925   0.3338   0.2827   0.2218
a Simulated
Under the null hypothesis $H_0$, $(R_{(1)}, \ldots, R_{(n)})$ takes every permutation of the set $\{1, \ldots, n\}$ with the same probability $1/n!$. The null distribution can then be computed by enumerating such permutations and by recording the frequency of every distinct value of $\beta_n$. This procedure quickly becomes impractical as $n$ increases. However, one can notice that the null distribution of $\beta_n$ is already very close to normality for sample sizes as low as $n = 10$. The asymptotic null distribution of $\beta_n$ will be derived later in Theorem 2.
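
The enumeration procedure just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function names are hypothetical, exact rational arithmetic is used so that equal values are merged reliably, and full enumeration is only practical for small $n$.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def beta_n(ranks):
    """beta_n for ranks = (R_(1), ..., R_(n)), the x-ranks listed in ascending order of y."""
    n = len(ranks)
    sums = [r + i for i, r in enumerate(ranks, start=1)]   # R_(i) + i
    diffs = [r - i for i, r in enumerate(ranks, start=1)]  # R_(i) - i
    total = sum(
        abs(sums[i] - sums[j]) - abs(diffs[i] - diffs[j])
        for i in range(n)
        for j in range(n)
    )
    return Fraction(3 * total, 2 * (n**3 - n))  # exact rational value


def exact_null_distribution(n):
    """Map each distinct value of beta_n to its probability when all n! permutations are equally likely."""
    counts = Counter(beta_n(p) for p in permutations(range(1, n + 1)))
    n_perms = sum(counts.values())
    return {value: Fraction(c, n_perms) for value, c in sorted(counts.items())}


dist = exact_null_distribution(5)
for value, prob in dist.items():
    print(float(value), float(prob))
```

The cost of this sketch grows like $n! \cdot n^2$, which is consistent with the remark that enumeration quickly becomes impractical.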
Based on the null distribution of $\beta_n$, the critical values of the corresponding test of independence can be determined. If a two-sided test is implemented, the null hypothesis $H_0$ is rejected whenever $|\beta_n| > b_\alpha$, where the critical value $b_\alpha$ is such that $\Pr\{|\beta_n| > b_\alpha \mid H_0\} \le 2\alpha$ and $2\alpha$ is the significance level of the test. Table 1 reports the critical values $b_\alpha$ for some selected values of $\alpha$ and for $n = 5, \ldots, 20$. When the distribution of $\beta_n$ could not be exactly determined, the critical values were simulated by randomly drawing $10^{10}$ permutations of the set $\{1, \ldots, n\}$.
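
The rejection rule can be turned into a small Python sketch that derives a critical value by exact enumeration. It assumes that $b_\alpha$ is taken as the smallest attainable value of $|\beta_n|$ satisfying the inequality above; this convention, like the function names, is an assumption of the sketch and is not stated in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def beta_n(ranks):
    """beta_n for ranks = (R_(1), ..., R_(n)), as an exact fraction."""
    n = len(ranks)
    sums = [r + i for i, r in enumerate(ranks, start=1)]
    diffs = [r - i for i, r in enumerate(ranks, start=1)]
    total = sum(
        abs(sums[i] - sums[j]) - abs(diffs[i] - diffs[j])
        for i in range(n)
        for j in range(n)
    )
    return Fraction(3 * total, 2 * (n**3 - n))


def critical_value(n, alpha):
    """Smallest attainable b with Pr{|beta_n| > b | H0} <= 2 * alpha (assumed convention)."""
    counts = Counter(abs(beta_n(p)) for p in permutations(range(1, n + 1)))
    n_perms = sum(counts.values())
    exceed = 0   # number of permutations with |beta_n| strictly above the candidate value
    best = None
    for value in sorted(counts, reverse=True):
        if Fraction(exceed, n_perms) > 2 * alpha:
            break
        best = value
        exceed += counts[value]
    return best


# Levels passed as exact fractions to avoid rounding at the boundary.
for alpha in (Fraction(1, 200), Fraction(1, 100)):
    print(float(alpha), float(critical_value(5, alpha)))
```

For $n = 5$ the printed values can be compared with the first row of Table 1; for larger $n$ the enumeration would have to be replaced by simulation, as done in the text.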
An important issue in the study of the null distribution of $\beta_n$ is the determination of its expectation and variance as functions of the sample size $n$. Notice that, under the null hypothesis, the statistics $\delta_n$ and $\delta_n^*$ in (5) are equally distributed. Hence the null expectation of $\beta_n$ is easily seen to be zero. Unfortunately, the null variance of