Randindex (兰德指数)原理以及numpy 和pytorch 实现
什么是Rand 指数
关于Rand指数的定义我发现上总结得到位,我也就不再进⾏赘述,为了本⽂的完整性和以防国内打不开维基百科,我这⾥就当⼀次搬运⼯,当然有条件的还是建议去维基百科上去看原⽂~~
Rand Index
The Rand index or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. A form of the Rand index may be defined that is adjusted for the chance grouping of elements, this is the adjusted Rand index. From a mathematical standpoint, Rand index is related to the accuracy, but is applicable even when class labels are not ud.
Definition
Given a t of elements and two partitions of to compare, , a partition of into subts, and , a partition of into subts, define the following:
a, the number of pairs of elements in that are in the same subt in and in the same subt in b, the number of pairs of elements in that are in different subts in and in different subts in c, the number of pairs of elements in that are in the same subt in and in different subts in d, the number of pairs of elements in that are in different subts in and in the same subt in The Rand index, R, is:
Intuitively, can be considered as the number of agreements between and , and as the number of disagreements between and .
Since the denominator is the total number of pairs, the Rand index reprents the frequency of occurrence of agreements over the total pairs, or the probability that and will agree on a randomly chon pair, e.g., .
Similarly, one can also view the Rand index as a measure of the percentage of correct decisions made by the algorithm. It can be computed using the following formula:
where is the number of true positives, is the number of true negatives, is the number of fal positives, and
is the number of fal negatives.
庄周打野出装
Properties
The Rand index has a value between 0 and 1, with 0 indicating that the two data clusterings do not agree on any pair of points and 1 indicating that the data clusterings are exactly the same.In mathematical terms, a, b, c, d are defined as follows:
n S ={o ,...,o }1n S X ={X ,...,X }1r S r Y ={Y ,...,Y }1s S s S X Y S X Y S X Y S X Y R ==a +b +c +d a +b
=C n 2
a +
b n (n −1)/2
a +
b a +b X Y
c +
d X Y X Y C =n 2
n (n −1)/2RI =TP +FP +FN +TN
TP +TN
TP TN FP FN
, where , where , where , where for some , , , , , Relationship with classification accuracy
The Rand index can also be viewed through the prism of binary classification accuracy over the pairs of elements in . The two class labels are " and are in the same subt in and " and " and are in different subts in and ".In that tting, is the number of pairs correctly labeled as belonging to the same subt (true positives), and is the number of pairs correctly labeled as belonging to different subts (true negativess).
The contingency table
Given a t of elements, and two groupings or partitions (e.g. clusterings) of the elements, namely and , the overlap between and can be summarized in a contingency table where each entry denotes the number of objects in common between and : .
\ …Sums
………
…
…
……
…
…Sums
…
Adjusted Rand index
The adjusted Rand index is the corrected-for-chance version of the Rand index. Such a correction for chance establishes a baline by using the expected similarity of all pair-wi comparisons between clusterings specified by a random model.Traditionally, the Rand Index was corrected using twifi万能钥匙怎么用
he Permutation Model for clusterings (the number and size of clusters within a clustering are fixed, and all random clusterings are generated by shuffling the elements between the fixed clusters).However, the premis of the permutation model are frequently violated; in many clustering scenarios, either the number of clusters or the size distribution of tho clusters vary drastically. For example, consider that in K-means the number of clusters is fixed by the practitioner, but the sizes of tho clusters are inferred from the data. Variations of the adjusted Rand Index account for different models of random clusterings.
Though the Rand Index may only yield a value between 0 and +1, the adjusted Rand index can yield negative values if the index is less than the expected index.
上⾯全是上的内容,当了⼀个搬运⼯,还是那句话,有条件的去维基百科去查看原⽂~~
接下来就是⾃⼰对Rand index的“⽩话⽂”解释了,希望能对⼤家有⼀点点的帮助,如有错误,也希望⼤家能及时指出,谢谢
⼀个⼆分类的例⼦
a =∣S ∣∗S =∗{(o ,o )∣o ,o ∈i j i j X ,o ,o ∈k i j Y }
l b =∣S ∣∗S =∗{(o ,o )∣o ∈i j i X ,o =k 1j X ,o ∈k 2i Y ,o ∈l 1j Y }l 2c =∣S ∣∗S =∗{(o ,o )∣o ,o ∈i j i j X ,o ∈k i Y ,o ∈l 1j Y }
l 2d =∣S ∣∗S =∗{(o ,o )∣o ∈i j i X ,o =k 1j X ,o ,o ∈k 2i j Y }l 1≤i ,j ≤n i = j 1≤k ,k ,k ≤12r k =1 k 21≤l ,l ,l ≤12s l =1 l 2
S o i o j X Y o i o j X Y a b S n X =X ,X ,...,X 12r Y =Y ,Y ,...,Y 12s X Y [n ]ij n ij X i Y j n =ij ∣X Y ∣i ⋂j X Y Y 1Y 2Y s X 1n 11n 12n 1s a 1X 2
n 21
n 22
n 2s
a 2
X r
n r 1n r 2n rs a r b 1
b 2
b s
N
如上图,⽤⼀个最简单的例⼦来解释Rand index的代码实现过程。我们假设就是预测的结果:, , 为全集
是groundtruth (GT)结果:, .
是左边这个圆,是右边这个圆,我们称和为前景,和为背景。
现在将两者重叠放在⼀起,假设出现上⾯的情况,即前景只有部分重叠。整个全集被分成四个部分,.所以可得The contingency table (T)为:
\ Sums
Sums
箭头表⽰pair对:
对于四个⼦集,共有10种pair连接⽅式:我们⼤致将这10种pair对分成两类,即和两类,分别⽤蓝⾊和红⾊表⽰
以红⾊的(1)为例: 在中,两个端点属于同⼀类 (都属于),⽽在中却不是,左端点属于,右端点属于,不是同⼀类。所以对于红⾊的(1)pair对应该属于中的情况。其他的情况不再⼀⼀列举,是⼀样的意思从图中也可以看出计算⽐要容易⼀些,所以我们⼀般将Rand index的计算改为:
⽽:
招聘方案模板
这⾥,代表的是The contingency table,上⾯的化简是为了得到最后⼀步的矩阵运算,我们本可以直接使⽤第⼆个等号后⾯的⽅法计算
的,但当不是⼆分类的时候,该等式的计算⽅式是⾮常低效的(避免不了要使⽤for循环),但如果我们化简为最后⼀步的⽅式时,不
再需要循环运算,全部依赖矩阵运算(从代码的⾓度上来说就是⼀⾏的事),是⾮常简洁且⾼效的
小学英语词汇大全numpy 实现
X X ={X ,X }12X +1X =2U U Y Y ={Y ,Y }12Y +1Y =2U X 1Y 1X 1Y 1X 2Y 2U A ,B ,C ,D A +B +C +D =U X Y Y 1Y 2X 1A =n 11B =n 12a 1X 2
C =n 21
D =n 22
a 2
b 1
b 2
N米饭煮多久
邯郸教育考试院
A ,
B ,
C ,
D 4+C =42
10
a +
b
c +
d X X 1Y Y 2Y 1c +d c +d a +b R ==n (n −1)/2a +b 1−n (n −1)/2
c +d
c +
d =(1)+(2)+(3)+(4)
=n ∗n +n ∗n +n ∗n +n ∗n 1112112112222122
=[C −C −C ]+[C −C −C ]+[C −C −C ]+[C −C −C ]n +n 11122n 112
n 122圭亚那粉趾
n +n 11212
n 112
n 212n +n 12222n 122n 222n +n 21222n 212n 222
=[C −C −C ]+[C −C −C ]+[C −C −C ]+[C −C −C ]a 12n 112n 122
b 12
n 112
n 212
b 22
n 122
n 222
a 22
n 212
n 222
=[C +C +C +C ]−2∗[C +C +C +C ]
a 12
a 22
b 12
b 22
n 112
n 122
n 212
n 222=[(a )/2+(a )/2+(b )/2+(b )/2]−2∗[(n )/2+(n )/2+(n )/2+(n )/2]12221222112122212222=[(a )+(a )+(
b )+(b )]/2−[(n )+(n )+(n )+(n )]12221222112122212222=[T .sum (1).pow (2).sum ()+T .sum (0).pow (2).sum ()]/2−T .pow (2).sum ()
T c +d
import numpy as np
def Rand_index_numpy(predMasks, gtMasks):
'''
predMasks: Numpy-array, Predcition result; shape: [r, H, W], (r>=1)
gtMasks: Numpy-array, Groundtruth; shape: [s, H, W], (s>=1)
'''
gtMasks = np.concatenate([gtMasks, np.clip(1- np.sum(gtMasks, axis=0, keepdims=True), a_min=0, a_max=1)], axis=0)
# 在GT上扩充⼀个类别,即除去所有前景(s类),剩下的背景归为⼀类
predMasks = np.concatenate([predMasks, np.clip(1- np.sum(predMasks, axis=0, keepdims=True), a_min=0, a_max=1)], axis=0) # 在prediction上扩充⼀个类别,即除去所有前景(r类),剩下的背景归为⼀类
T =(np.expand_dims(gtMasks, axis=1)* predMasks).sum(-1).sum(-1).astype(np.float32)
# The contingency table
N = T.sum()
# 所有的像素数量
RI =1-((np.power(T.sum(0),2).sum()+ np.power(T.sum(1),2).sum())/2- np.power(T,2).sum())/(N *(N -1)/2)
return RI
pytorch实现
import torch
def Rand_index_torch(predMasks, gtMasks):
'''
predMasks: Tensor, Predcition result; shape: [r, H, W], (r>=1)
gtMasks: Tensor, Groundtruth; shape: [s, H, W], (s>=1)
'''
gtMasks = torch.cat([gtMasks, torch.clamp(1- gtMasks.sum(0, keepdim=True),min=0)], dim=0)
# 在GT上扩充⼀个类别,即除去所有前景(s类),剩下的背景归为⼀类
predMasks = torch.cat([predMasks, torch.clamp(1- predMasks.sum(0, keepdim=True),min=0)], dim=0)
# 在prediction上扩充⼀个类别,即除去所有前景(r类),剩下的背景归为⼀类
大气磅礴的音乐
T =(gtMasks.unsqueeze(1)* predMasks).sum(-1).sum(-1).float()
# The contingency table
N = T.sum()
# 所有的像素数量
RI =1-((T.sum(0).pow(2).sum()+ T.sum(1).pow(2).sum())/2- T.pow(2).sum())/(N *(N -1)/2)
return RI