Entity types are a significant component of a knowledge graph and play a vital role in knowledge graph applications. However, missing and incomplete entity types are a common quality problem in knowledge graphs. A general approach to this problem is entity type prediction based on machine-learning classification. Classification-based entity type prediction generates a dataset from the knowledge graph by constructing features for each entity, uses the dataset to train a classifier whose labels are entity types, and applies the classifier to predict entity types.
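The classification-based pipeline described above (features from the graph, a trained classifier, a predicted type) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy entities, the bag-of-words features, and the small Naive Bayes classifier are not the models used in this thesis.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy knowledge graph: entity -> (description text, known type or None).
ENTITIES = {
    "Berlin": ("capital city of Germany", "City"),
    "Paris": ("capital city of France", "City"),
    "Einstein": ("physicist born in Germany", "Person"),
    "Curie": ("physicist and chemist born in Poland", "Person"),
    "Rome": ("capital city of Italy", None),  # type is missing
}


def features(text):
    """Bag-of-words features: lowercase token counts."""
    return Counter(text.lower().split())


def train(examples):
    """Fit a multinomial Naive Bayes model from (features, type) pairs."""
    type_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for feats, label in examples:
        type_counts[label] += 1
        token_counts[label].update(feats)
        vocab.update(feats)
    return type_counts, token_counts, vocab


def predict(model, feats):
    """Return the type with the highest Laplace-smoothed log-probability."""
    type_counts, token_counts, vocab = model
    total = sum(type_counts.values())
    best_type, best_score = None, -math.inf
    for label, count in type_counts.items():
        score = math.log(count / total)
        denom = sum(token_counts[label].values()) + len(vocab)
        for token, n in feats.items():
            score += n * math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_type, best_score = label, score
    return best_type


# Train on entities whose type is known, then predict the missing one.
labeled = [(features(d), t) for d, t in ENTITIES.values() if t is not None]
model = train(labeled)
predicted = predict(model, features(ENTITIES["Rome"][0]))
```

Note that this sketch is single-label: each entity receives exactly one type, which is the limitation the thesis sets out to address.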
Existing classification-based entity typing research has several limitations: the classifiers are simple single-label classifiers; the methods rely on specific knowledge graphs to extract entity information, so they generalize poorly; they cannot use multi-dimensional features; and they do not exploit the fact that the ontology of a knowledge graph defines a hierarchy of entity types. To overcome these shortcomings, this thesis designs a multi-label classification-based entity typing approach, which extracts text features and link features from the knowledge graph, uses these features to classify entities, and exploits the hierarchy of entity types.
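To make the idea of multi-label output under a type hierarchy concrete, here is a minimal sketch. The hierarchy, the scores, the threshold, and the helper names are hypothetical; closing the predicted set under ancestors is one simple way to respect a hierarchy and is not necessarily the correction procedure used in this thesis.

```python
# Hypothetical type hierarchy: child type -> parent type (None marks a root).
PARENT = {
    "Thing": None,
    "Agent": "Thing",
    "Person": "Agent",
    "Politician": "Person",
    "Place": "Thing",
    "City": "Place",
}


def ancestors(type_name):
    """Yield every strict ancestor of type_name, nearest first."""
    parent = PARENT[type_name]
    while parent is not None:
        yield parent
        parent = PARENT[parent]


def select_types(scores, threshold=0.5):
    """Multi-label decision: keep every type whose score reaches the threshold."""
    return {t for t, s in scores.items() if s >= threshold}


def close_under_hierarchy(predicted):
    """Add all ancestors so the label set is consistent with the hierarchy."""
    closed = set(predicted)
    for t in predicted:
        closed.update(ancestors(t))
    return closed


# Hypothetical per-type scores for one entity, as a multi-label model might emit.
scores = {"Politician": 0.8, "Person": 0.4, "Agent": 0.7, "City": 0.1}
raw = select_types(scores)
consistent = close_under_hierarchy(raw)
```

In this example the raw prediction contains Politician but not Person, which contradicts the hierarchy; the closure step adds the missing ancestors.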
The type prediction task in this thesis is accomplished in three phases: data processing, model training, and model prediction. In the data processing phase, knowledge graphs are divided into text-rich and link-rich knowledge graphs according to the differences in their text and link information, and different methods are used to construct solid features for each kind. In the model training phase, three multi-label classification models are used: a multi-label text classification model called TTPE, a multi-feature multi-label classification model called MFTPE, and a hierarchical multi-label classification model called HTPE. In the model prediction phase, the trained model is used to generate type assertions that fill in missing and incomplete entity types.
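The prediction phase can be pictured as turning model output into new type assertions. The sketch below is illustrative only: the `ex:` entity identifiers, the predicted type sets, and the `type_assertions` helper are assumptions, and the triples are plain Python tuples, not the output format of the system described in this thesis.

```python
RDF_TYPE = "rdf:type"


def type_assertions(entity, predicted_types, known_types):
    """Return new (entity, rdf:type, type) triples not already in the graph."""
    new_types = sorted(set(predicted_types) - set(known_types))
    return [(entity, RDF_TYPE, t) for t in new_types]


# Hypothetical model output and existing graph content.
predictions = {
    "ex:Rome": {"City", "Place"},            # entity with no type at all
    "ex:Einstein": {"Person", "Scientist"},  # entity with an incomplete type set
}
existing = {"ex:Rome": set(), "ex:Einstein": {"Person"}}

assertions = []
for entity, predicted in predictions.items():
    assertions.extend(type_assertions(entity, predicted, existing[entity]))
```

The same function covers both cases named above: an entity with no type receives all predicted types, while an entity with an incomplete type set receives only the types it lacks.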
This thesis runs experiments on three kinds of real-world knowledge graphs (single-type, multi-type, and hierarchical-type knowledge graphs) to verify the scalability and versatility of the method. In addition, this thesis implements a multi-label classification-based entity typing system.
Key words: Entity Typing, Knowledge Graph Refinement, Multi-Label Classification, Hierarchical Multi-Label Classification
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Status
1.3 Research Content
1.4 Thesis Structure
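Chapter 2 Related Work and Techniques
2.1 Knowledge Graph Concepts
2.2 Entity Type Prediction
2.2.1 Classification-Based Entity Type Prediction
2.2.2 Entity Type Prediction Based on Probabilistic and Statistical Methods
2.2.3 NLP-Based Entity Type Prediction
2.2.4 Logical Reasoning-Based Methods
2.2.5 Summary of Research on Entity Type Prediction
2.3 Multi-Label Learning
2.3.1 Multi-Label Learning Based on Problem Transformation
2.3.2 Multi-Label Learning Based on Algorithm Adaptation
2.3.3 Summary of Research on Multi-Label Learning
2.4 Hierarchical Multi-Label Learning
2.4.1 The HMC-LMLP Algorithm
2.4.2 The HMCN-F/R Algorithms
2.4.3 Summary of Research on Hierarchical Multi-Label Learning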
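Chapter 3 Entity Type Prediction Method
3.1 Overall Workflow
3.2 Data Processing Phase
3.2.1 Knowledge Graph Parsing
3.2.2 Entity Descriptions and Graph Descriptions
3.2.3 Dataset Generation
3.3 Model Training Phase
3.3.1 The TTPE Model
3.3.2 The MFTPE Model
3.3.3 The HTPE Model
3.3.4 Evaluation Method
3.4 Model Prediction Phase
3.4.1 Prediction for Entities with Missing Types
3.4.2 Completion for Entities with Incomplete Types
3.5 Experiments
3.5.1 Experimental Datasets
3.5.2 Separate Experiments
3.5.3 Comparison with Related Work
3.6 Chapter Summary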
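Chapter 4 System Design and Implementation
4.1 System Requirements Analysis
4.1.1 Functional Requirements Analysis
4.1.2 Performance Requirements Analysis
4.2 System Framework Design
4.2.1 System Development Environment
4.2.2 Overall System Architecture
4.3 Implementation of the System Model Layer
4.3.1 Data Processing Phase
4.3.2 Model Training Phase
4.3.3 Model Prediction Phase
4.4 Implementation of the System View and Controller Layers
4.5 System Testing
4.6 Chapter Summary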
Chapter 5 Conclusion and Future Work
5.1 Summary of This Thesis
5.2 Future Work
Acknowledgements
References
Abbreviations
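Figure 1.1 Role of entity types in KBQA
Figure 1.2 Types of the DBpedia entity Donald Trump
Figure 2.1 A multi-label example
Figure 2.2 Two hierarchical structures of labels
Figure 2.3 Example of tree-structured labels
Figure 2.4 Three variants of the local HMC method
Figure 2.5 Structure of the HMC-LMLP model with a three-level class hierarchy
Figure 2.6 Structure of the HMCN-F model
Figure 2.7 Structure of the HMCN-R model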
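Figure 3.1 Overall workflow of the entity type prediction task
Figure 3.2 Example of an entity in DBpedia
Figure 3.3 VSM representation of type documents
Figure 3.4 Structure of the FastText model based on type documents
Figure 3.5 Three multimodal fusion approaches
Figure 3.6 Structure of the MFTPE model
Figure 3.7 Type vector of the entity Donald Trump
Figure 3.8 Structure of the HTPE model
Figure 3.9 Structure of the HTPE-L model
Figure 3.10 Example of hierarchical constraint correction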
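Figure 4.1 Overall framework of the entity type prediction system
Figure 4.2 Main interface of the system
Figure 4.3 Data processing interface
Figure 4.4 Model training interface
Figure 4.5 Model prediction interface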