Computing LDA Topic Model Perplexity with gensim
Perplexity is an information-theoretic measure: the perplexity of b is defined as 2 raised to the entropy of b (where b can be a probability distribution or a probability model), and it is commonly used to compare probabilistic models.
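For intuition (a toy example of my own, not taken from gensim), the perplexity of a small discrete distribution can be computed directly from its entropy:

import numpy as np

# A toy distribution over four outcomes (hypothetical values).
p = np.array([0.5, 0.25, 0.125, 0.125])

# Entropy in bits: H(p) = -sum(p * log2(p)).
entropy = -np.sum(p * np.log2(p))   # 1.75

# Perplexity is 2 raised to the entropy.
perplexity = 2 ** entropy           # about 3.36
print(entropy, perplexity)

A uniform distribution over k outcomes has entropy log2(k) and therefore perplexity k, which is why perplexity is often read as an "effective number of choices".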
The resources I could find all compute perplexity with hand-written code, but gensim actually ships with a perplexity calculation of its own; a small modification is enough to make it return the model's perplexity.
My understanding of perplexity is still quite limited, so this post may be updated later.
Import gensim
from gensim.models import LdaModel
First, import the LdaModel class from gensim.models.
Then open the source file gensim\models\ldamodel.py and search for perplexity.
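To locate that file on your own machine (a small helper of mine, not part of the original post), you can print the module's path:

import gensim.models.ldamodel
# Prints the full path of the installed ldamodel.py.
print(gensim.models.ldamodel.__file__)

The method in question looks like this: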
def log_perplexity(self, chunk, total_docs=None):
    """Calculate and return per-word likelihood bound, using a chunk of documents as evaluation corpus.

    Also output the calculated statistics, including the perplexity=2^(-bound), to log at INFO level.

    Parameters
    ----------
    chunk : {list of list of (int, float), scipy.sparse.csc}
        The corpus chunk on which the inference step will be performed.
    total_docs : int, optional
        Number of docs used for evaluation of the perplexity.

    Returns
    -------
    numpy.ndarray
        The variational bound score calculated for each word.

    """
    if total_docs is None:
        total_docs = len(chunk)
    corpus_words = sum(cnt for document in chunk for _, cnt in document)
    subsample_ratio = 1.0 * total_docs / len(chunk)
    perwordbound = self.bound(chunk, subsample_ratio=subsample_ratio) / (subsample_ratio * corpus_words)
    logger.info(
        "%.3f per-word bound, %.1f perplexity estimate based on a held-out corpus of %i documents with %i words",
        perwordbound, np.exp2(-perwordbound), len(chunk), corpus_words
    )
    return perwordbound
As you can see, the perplexity is actually computed inside this method (and written to the log); it just is not returned.
Modify the end of the method so that, instead of returning only the per-word bound, it also returns the perplexity:

# add a perplexity variable so the perplexity is returned as well
perplexity = np.exp2(-perwordbound)
return perwordbound, perplexity

That is all it takes for log_perplexity to return the perplexity.
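For reference, with those two lines in place the end of log_perplexity would look roughly like this (a sketch of the patched tail only, not a standalone snippet):

    logger.info(
        "%.3f per-word bound, %.1f perplexity estimate based on a held-out corpus of %i documents with %i words",
        perwordbound, np.exp2(-perwordbound), len(chunk), corpus_words
    )
    # Added: convert the per-word bound to a perplexity and return both values.
    perplexity = np.exp2(-perwordbound)
    return perwordbound, perplexity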
Computing the perplexity
lda = LdaModel(common_corpus, num_topics=num_topic, id2word=dic, alpha='auto', chunksize=len(texts_all), iterations=20000)
_, perplexity = lda.log_perplexity(common_corpus)
The returned perplexity value is the perplexity of the LDA model.
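If you prefer not to edit the installed library, the same number can be obtained from the unmodified method, because log_perplexity returns the per-word bound and, as its docstring says, perplexity = 2^(-bound). Below is a self-contained sketch using gensim's bundled toy corpus (common_corpus and common_dictionary from gensim.test.utils stand in for the corpus, dictionary and texts used above; the training parameters are illustrative):

import numpy as np
from gensim.models import LdaModel
from gensim.test.utils import common_corpus, common_dictionary

# Train a small LDA model on gensim's toy corpus
# (stand-in for the post's common_corpus / dic / texts_all).
lda = LdaModel(
    common_corpus,
    num_topics=5,
    id2word=common_dictionary,
    alpha='auto',
    chunksize=len(common_corpus),
    iterations=100,
)

# The unmodified log_perplexity returns the per-word variational bound;
# converting it with 2 ** (-bound) gives the perplexity estimate.
perwordbound = lda.log_perplexity(common_corpus)
perplexity = np.exp2(-perwordbound)
print(perwordbound, perplexity)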