
Python sampling without replacement: implementing a weighted random selection function without replacement

Updated: 2023-05-12 00:30:35


Requirements

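A lottery application needs to draw K winners from all participating users (K = the number of prizes), weighting each user by the number of lottery codes they hold.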

For example, suppose there are three users with weights A(1), B(1) and C(2). We want A to be drawn with probability 25%, B with 25%, and C with 50%.
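A quick check of that arithmetic: on a single draw, each user's probability is their weight divided by the total weight (1 + 1 + 2 = 4). A minimal sketch using exact fractions; the dictionary layout is just for illustration:

```python
from fractions import Fraction

# Weights from the example: number of lottery codes each user holds
weights = {'A': 1, 'B': 1, 'C': 2}
total = sum(weights.values())  # 4

# Single-draw probability of each user = weight / total weight
probs = {user: Fraction(w, total) for user, w in weights.items()}

for user, p in probs.items():
    print(user, p, f"{float(p):.0%}")  # A 1/4 25%, B 1/4 25%, C 1/2 50%
```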

Analysis

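The most intuitive approach is to put C into the list twice and draw from it, e.g. [A, B, C, C], using Python's built-in random.choice(['A', 'B', 'C', 'C']); C is then drawn with probability 50%.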
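A minimal sketch of this list-expansion approach, assuming integer weights (a weight of 2 becomes two slots in the list); the empirical counts at the end are only a rough sanity check, not a proof:

```python
import random
from collections import Counter

users = [('A', 1), ('B', 1), ('C', 2)]

# Expand each user into as many list slots as their weight: ['A', 'B', 'C', 'C']
expanded = [name for name, weight in users for _ in range(weight)]

winner = random.choice(expanded)

# Rough empirical check: C should win about half of the draws
counts = Counter(random.choice(expanded) for _ in range(10_000))
print(winner, counts)
```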

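The problem with this approach is that it wastes memory when the weights are large, since every user occupies as many list slots as their weight.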

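A more general approach is to sum all the weights (4 here) and pick a random value from the interval [0, 4), letting A, B and C occupy intervals of different sizes: [0, 1) is A, [1, 2) is B, and [2, 4) is C.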

Either random.randint(0, 3) or int(random.random()*4) produces a random integer R from 0 to 3. The interval that R falls into determines which user is selected.
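To make the interval mapping concrete, the sketch below enumerates every possible integer R and reports the user whose half-open interval contains it. The helper owner_of is hypothetical; note that random.randint includes both endpoints, so randint(0, total - 1) covers exactly 0..3:

```python
import random

users = [('A', 1), ('B', 1), ('C', 2)]
total = sum(w for _, w in users)  # 4

def owner_of(r):
    """Return the user whose half-open interval [start, start + weight) contains r."""
    start = 0
    for name, weight in users:
        if start <= r < start + weight:
            return name
        start += weight
    raise ValueError(f"{r} is outside [0, {total})")

# Every possible integer R and the user it selects
mapping = {r: owner_of(r) for r in range(total)}
print(mapping)  # {0: 'A', 1: 'B', 2: 'C', 3: 'C'}

# randint is inclusive at both ends, so this yields 0..3
r = random.randint(0, total - 1)
print(r, owner_of(r))
```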

Next we need a way to find which interval the random number falls into.

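One way is to traverse the list in order, keeping a running sum S of the weights of the elements visited so far; as soon as S exceeds R, return the current element.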

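```python
import random
from operator import itemgetter

users = [('A', 1), ('B', 1), ('C', 2)]

def pick_linear(users):
    # Wrapped in a function so that `return` is valid
    total = sum(map(itemgetter(1), users))
    rnd = int(random.random() * total)  # 0~3
    s = 0  # running sum of the weights visited so far
    for u, w in users:
        s += w
        if s > rnd:
            return u
```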

However, this method is O(N), since it may have to traverse all of users.

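Another approach is to first build the list of cumulative weights in order and then binary-search it, which brings the lookup down to O(log N) (excluding other processing, such as building the cumulative-weight list, which is itself O(N)).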

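```python
import bisect
import itertools
import random
from operator import itemgetter

users = [('A', 1), ('B', 1), ('C', 2)]

def pick_bisect(users):
    cum_weights = list(itertools.accumulate(map(itemgetter(1), users)))  # [1, 2, 4]
    total = cum_weights[-1]
    rnd = int(random.random() * total)  # 0~3
    hi = len(cum_weights) - 1
    # bisect.bisect (bisect_right) gives the first index whose cumulative weight exceeds rnd
    index = bisect.bisect(cum_weights, rnd, 0, hi)
    return users[index][0]
```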
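A subtle point in the binary-search version is the choice of bisect.bisect (an alias of bisect_right) rather than bisect_left. With integer R, a value equal to a cumulative boundary must belong to the next user, because the intervals are half-open ([1, 2) is B). The sketch below compares both functions over every R in 0..3:

```python
import bisect
import itertools

users = [('A', 1), ('B', 1), ('C', 2)]
cum_weights = list(itertools.accumulate(w for _, w in users))  # [1, 2, 4]

# bisect_right: an R equal to a boundary goes to the next user
right = [users[bisect.bisect(cum_weights, r)][0] for r in range(4)]
# bisect_left: the boundary lands in the previous user's interval instead
left = [users[bisect.bisect_left(cum_weights, r)][0] for r in range(4)]

print(right)  # ['A', 'B', 'C', 'C']
print(left)   # ['A', 'A', 'B', 'C']
```

Only the bisect_right mapping reproduces the 1:1:2 split; bisect_left would give A twice as many slots as intended.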

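The choices function in Python's built-in random module (available since Python 3.6) is implemented exactly this way. Its signature is random.choices(population, weights=None, *, cum_weights=None, k=1): population is the list to choose from, weights are the relative weights, cum_weights are optional precomputed cumulative weights (specify one or the other), and k is the number of selections, made with replacement. The source (from an older CPython release; newer versions differ slightly) is as follows: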

```python
# Method of class Random in CPython's Lib/random.py;
# _itertools and _bisect are that module's imports of itertools and bisect
def choices(self, population, weights=None, *, cum_weights=None, k=1):
    """Return a k sized list of population elements chosen with replacement.

    If the relative weights or cumulative weights are not specified,
    the selections are made with equal probability.

    """
    random = self.random
    if cum_weights is None:
        if weights is None:
            _int = int
            total = len(population)
            return [population[_int(random() * total)] for i in range(k)]
        cum_weights = list(_itertools.accumulate(weights))
    elif weights is not None:
        raise TypeError('Cannot specify both weights and cumulative weights')
    if len(cum_weights) != len(population):
        raise ValueError('The number of weights does not match the population')
    bisect = _bisect.bisect
    total = cum_weights[-1]
    hi = len(cum_weights) - 1
    return [population[bisect(cum_weights, random() * total, 0, hi)]
            for i in range(k)]
```
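A short usage sketch of random.choices with the example weights. Because it samples with replacement, the same user can appear several times in one call:

```python
import itertools
import random

users = ['A', 'B', 'C']
weights = [1, 1, 2]

# Five draws with replacement: duplicates are allowed
draws = random.choices(users, weights=weights, k=5)

# Precomputed cumulative weights are equivalent and skip the accumulate step
cum = list(itertools.accumulate(weights))  # [1, 2, 4]
draws_cum = random.choices(users, cum_weights=cum, k=5)

# Supplying both raises TypeError
error = None
try:
    random.choices(users, weights=weights, cum_weights=cum)
except TypeError as exc:
    error = str(exc)

print(draws, draws_cum, error)
```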

Going further

Python's built-in random.choices samples with replacement. The function for sampling without replacement is random.sample (random.sample(population, k)), but it cannot select according to weights.
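A quick illustration of both points: random.sample returns distinct elements, and it has no weights parameter, so passing one raises TypeError. The population here is made up for the example:

```python
import random

population = ['A', 'B', 'C', 'D', 'E']
original = list(population)

# Three distinct elements; the input list is left unchanged
picked = random.sample(population, 3)

# Unlike random.choices, random.sample does not accept weights
try:
    random.sample(population, 3, weights=[1, 1, 2, 1, 1])
    weights_supported = True
except TypeError:
    weights_supported = False

print(picked, weights_supported)
```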

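The built-in random.sample can select multiple elements without modifying the original list. Internally it uses two algorithms so that it performs well in all cases (source: random.sample).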

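The first is a partial shuffle that returns as soon as K elements have been picked. Its time complexity is O(N) because it must copy the original sequence, which also increases memory usage.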

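```python
import random

def sample_partial_shuffle(population, k):
    n = len(population)
    result = [None] * k
    pool = list(population)  # copy, so the original sequence is not changed
    for i in range(k):
        # pool[0:n-i] holds the elements not selected yet
        j = int(random.random() * (n - i))
        result[i] = pool[j]
        pool[j] = pool[n - i - 1]  # move the selected element out; an unselected one from the end fills its slot
    return result
```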

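The second keeps a set of already-selected indices and draws repeatedly; if the drawn element is already in the set, it simply draws again, so no copy of the sequence is needed. random.sample uses this algorithm when k is small relative to n, because the chance of drawing a repeat is then small.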

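```python
import random

def sample_with_set(population, k):
    n = len(population)
    result = [None] * k
    selected = set()
    selected_add = selected.add  # bind the method once to speed up access
    for i in range(k):
        j = int(random.random() * n)
        while j in selected:  # already chosen: draw again
            j = int(random.random() * n)
        selected_add(j)
        result[i] = population[j]
    return result
```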
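The two strategies can be combined into one function that chooses a method based on how large k is relative to n. This is a sketch, not CPython's code: the switch rule k * 3 <= n is a hypothetical simplification (CPython uses its own size estimate), and random.randrange(m) stands in for int(random.random() * m):

```python
import random

def sample_without_replacement(population, k):
    """Uniform sampling without replacement using one of the two strategies.

    The switch point (k * 3 <= n picks the set-based method) is a
    hypothetical rule chosen for illustration only.
    """
    n = len(population)
    if not 0 <= k <= n:
        raise ValueError("Sample larger than population or is negative")
    result = [None] * k
    if k * 3 <= n:
        # Set-based retry: k is small, repeats are rare, no copy needed
        selected = set()
        for i in range(k):
            j = random.randrange(n)
            while j in selected:
                j = random.randrange(n)
            selected.add(j)
            result[i] = population[j]
    else:
        # Partial shuffle on a copy: retries would be frequent for large k
        pool = list(population)
        for i in range(k):
            j = random.randrange(n - i)
            result[i] = pool[j]
            pool[j] = pool[n - i - 1]
    return result

print(sample_without_replacement(list(range(100)), 5))
```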

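The lottery application needs weighted random sampling without replacement, so we combine the implementations of random.choices and random.sample into a function, weighted_sample.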
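It helps to pin down what "weighted without replacement" means here: winners are drawn one at a time, each draw proportional to the weights of the users not yet chosen. The hypothetical helper below enumerates every draw order (brute force, so only suitable for tiny inputs) and computes each user's exact probability of ending up in the sample under that model:

```python
from fractions import Fraction

def inclusion_probabilities(weights, k):
    """Exact probability that each item is in a size-k sample drawn one at a
    time without replacement, each draw proportional to the remaining weights.
    Assumes at least k items have positive weight."""
    probs = [Fraction(0)] * len(weights)

    def walk(chosen, p):
        if len(chosen) == k:
            for j in chosen:
                probs[j] += p
            return
        remaining = sum(w for j, w in enumerate(weights) if j not in chosen)
        for j, w in enumerate(weights):
            if j not in chosen and w > 0:
                walk(chosen | {j}, p * Fraction(w, remaining))

    walk(frozenset(), Fraction(1))
    return probs

# Users A(1), B(1), C(2), two prizes
print(inclusion_probabilities([1, 1, 2], 2))
# → [Fraction(7, 12), Fraction(7, 12), Fraction(5, 6)]
```

With two prizes, C's chance of winning is 5/6 rather than twice A's 7/12, so the weights set the per-draw odds, not the final winning odds.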

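Since a lottery usually has far more participants than prizes, the second random.sample method (retrying on repeats) is a reasonable choice for the without-replacement step, though it can certainly be optimized further.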

The code is as follows:

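```python
import bisect
import itertools
import random

def weighted_sample(population, weights, k=1):
    """Like random.sample, but with weights."""
    n = len(population)
    if n == 0:
        return []
    if not 0 <= k <= n:
        raise ValueError("Sample larger than population or is negative")
    if len(weights) != n:
        raise ValueError('The number of weights does not match the population')
    cum_weights = list(itertools.accumulate(weights))
    total = cum_weights[-1]
    if total <= 0:  # guard against invalid weights: fall back to unweighted sampling
        return random.sample(population, k=k)
    # Zero-weight elements are never drawn, so the retry loop would never
    # finish if k exceeded the number of positively weighted elements
    if sum(1 for w in weights if w > 0) < k:
        raise ValueError("Sample larger than the number of positive weights")
    hi = len(cum_weights) - 1
    selected = set()
    _bisect = bisect.bisect
    _random = random.random
    selected_add = selected.add
    result = [None] * k
    for i in range(k):
        # weighted draw, as in random.choices
        j = _bisect(cum_weights, _random() * total, 0, hi)
        while j in selected:  # already drawn: retry, as in random.sample
            j = _bisect(cum_weights, _random() * total, 0, hi)
        selected_add(j)
        result[i] = population[j]
    return result
```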
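The retry loop slows down when k approaches the number of positively weighted users, since most draws then hit users already chosen. One well-known alternative, not from the original article, is the Efraimidis-Spirakis key method: give each item the key u ** (1 / w), with u uniform in (0, 1], and keep the k items with the largest keys. It produces the same distribution as drawing one winner at a time proportional to the remaining weights, and runs in about O(N log k). The function name weighted_sample_es is hypothetical:

```python
import heapq
import random

def weighted_sample_es(population, weights, k=1):
    """Weighted sampling without replacement via Efraimidis-Spirakis keys.

    Each item with weight w > 0 gets key u ** (1 / w), u uniform in (0, 1];
    the k items with the largest keys form the sample. Items with
    non-positive weight are never selected.
    """
    if len(weights) != len(population):
        raise ValueError('The number of weights does not match the population')
    keyed = [
        ((1.0 - random.random()) ** (1.0 / w), item)  # 1 - random() lies in (0, 1]
        for item, w in zip(population, weights)
        if w > 0
    ]
    if not 0 <= k <= len(keyed):
        raise ValueError("k must be between 0 and the number of positive weights")
    # Compare by key only, so items themselves never need to be comparable
    top = heapq.nlargest(k, keyed, key=lambda pair: pair[0])
    return [item for _, item in top]

print(weighted_sample_es(['A', 'B', 'C'], [1, 1, 2], k=2))
```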


Published: 2023-05-12 00:30:35

Article link: https://www.wtabcd.cn/fanwen/fan/89/885515.html
