Knowledge Graphs: Datasets
DBpedia
URL:
Overview:
DBpedia is a notable example of a Semantic Web application: it extracts structured data from Wikipedia entries to enhance Wikipedia search and to link other datasets to Wikipedia. This semantic layer has enabled many novel and interesting uses of Wikipedia's vast content, such as mobile versions, map integration, faceted search, relational queries, and document classification and annotation. DBpedia is also one of the world's largest multi-domain ontologies and part of the Linked Data cloud; the US technology site ReadWriteWeb named DBpedia the best Semantic Web application of 2009.
The DBpedia 2014 release describes more than 4.58 million things, including 1,445,000 persons, 735,000 places, 123,000 music albums, 87,000 films, 19,000 video games, 241,000 organizations, 251,000 species, and 6,000 diseases. Its data is used by the BBC, Reuters, and The New York Times, and is indexed by search engines such as Google and Yahoo.
The 2016 release contains 9.5 billion RDF triples, of which 1.3 billion were extracted from the English edition of Wikipedia, 5.0 billion came from other language editions, and 3.2 billion from DBpedia Commons and Wikidata.
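The RDF data model behind these figures is simply a set of (subject, predicate, object) triples. The following minimal Python sketch shows how such triples can be stored and queried locally; the three triples and the dbr:/dbo: abbreviations are hand-written examples, not data fetched from DBpedia.

```python
# RDF-style (subject, predicate, object) triples, abbreviated with the
# dbr:/dbo: prefix convention DBpedia uses. Hand-written illustrative data.
triples = [
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Berlin", "dbo:populationTotal", "3644826"),
    ("dbr:Hamburg", "dbo:country", "dbr:Germany"),
]

def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is a triple."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects("dbr:Berlin", "dbo:country"))  # ['dbr:Germany']
```

In practice the same lookup would be phrased as a SPARQL query against DBpedia's public endpoint rather than a Python list comprehension.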
Reference:
@article{DBLP:journals/ws/BizerLKABCH09,
author = {Christian Bizer and
Jens Lehmann and
Georgi Kobilarov and
S{\"{o}}ren Auer and
Christian Becker and
Richard Cyganiak and
Sebastian Hellmann},
title = {DBpedia - {A} crystallization point for the Web of Data},
journal = {J. Web Semant.},
volume = {7},
number = {3},
pages = {154--165},
year = {2009},
url = {https://doi.org/10.1016/j.websem.2009.07.002},
doi = {10.1016/j.websem.2009.07.002},
timestamp = {Fri, 27 Dec 2019 21:12:44 +0100},
biburl = {https://dblp.org/rec/journals/ws/BizerLKABCH09.bib},
bibsource = {dblp computer science bibliography, dblp}
}
Yago
URL:
Overview:
YAGO is an open-source dataset whose contents are automatically extracted from multiple sources, including Wikipedia, WordNet, and GeoNames. As of 2012 it already contained more than 10 million entities and 120 million facts.
English overview:
YAGO (Yet Another Great Ontology) is an open source knowledge base developed at the Max Planck Institute for Computer Science in Saarbrücken. It is automatically extracted from Wikipedia and other sources.
As of 2012, YAGO3 has knowledge of more than 10 million entities and contains more than 120 million facts about these entities. The information in YAGO is extracted from Wikipedia (e.g., categories, redirects, infoboxes), WordNet (e.g., synsets, hyponymy), and GeoNames. The accuracy of YAGO was manually evaluated to be above 95% on a sample of facts. To integrate it into the linked data cloud, YAGO has been linked to the DBpedia ontology and to the SUMO ontology.
YAGO3 is provided in Turtle and TSV formats. Dumps of the whole database are available, as well as thematic and specialized dumps. It can also be queried through various online browsers and through a SPARQL endpoint hosted by OpenLink Software. The source code of YAGO3 is available on GitHub.
YAGO has been used in the Watson artificial intelligence system.
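YAGO's Turtle dumps encode one fact per line. As a rough illustration, a line of the simplest `subject predicate object .` shape can be split as below; this is a toy parser written under that assumption, and real Turtle (prefix declarations, quoted literals containing spaces, etc.) needs a proper parser such as rdflib. The sample fact is written by hand in YAGO's general style.

```python
def parse_turtle_line(line):
    # Toy parser: handles only the simplest "subject predicate object ."
    # pattern. Real Turtle (prefixes, quoted literals with spaces, etc.)
    # requires a full parser such as rdflib.
    s, p, o = line.rstrip(" .").split(None, 2)
    return s, p, o

fact = parse_turtle_line("yago:Albert_Einstein yago:wasBornIn yago:Ulm .")
print(fact)  # ('yago:Albert_Einstein', 'yago:wasBornIn', 'yago:Ulm')
```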
Reference:
@inproceedings{DBLP:conf/www/SuchanekKW07,
author = {Fabian M. Suchanek and
Gjergji Kasneci and
Gerhard Weikum},
editor = {Carey L. Williamson and
Mary Ellen Zurko and
Peter F. Patel{-}Schneider and
Prashant J. Shenoy},
title = {Yago: a core of semantic knowledge},
booktitle = {Proceedings of the 16th International Conference on World Wide Web,
{WWW} 2007, Banff, Alberta, Canada, May 8-12, 2007},
pages = {697--706},
publisher = {{ACM}},
year = {2007},
url = {https://doi.org/10.1145/1242572.1242667},
doi = {10.1145/1242572.1242667},
timestamp = {Wed, 14 Nov 2018 10:55:41 +0100},
biburl = {https://dblp.org/rec/conf/www/SuchanekKW07.bib},
bibsource = {dblp computer science bibliography, dblp}
}
Freebase
URL:
Overview:
Much like Wikipedia, Freebase's content is structured knowledge contributed by community members. Besides manual input, Freebase also actively imports structured knowledge from sources such as Wikipedia.
It has since been acquired by Google.
Papers commonly use its subset FB13, see:
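FB13, like other Freebase-derived benchmark splits, is distributed as plain-text triple files. The sketch below loads such a file, assuming the common tab-separated (head, relation, tail) layout; the two sample lines are invented for illustration.

```python
import io

# FB13-style input: one tab-separated (head, relation, tail) triple per line.
# The two lines below are invented samples; real FB13 files (train/valid/test
# splits) are assumed to follow the same layout.
sample = io.StringIO(
    "albert_einstein\tprofession\tphysicist\n"
    "albert_einstein\tnationality\tswitzerland\n"
)

def load_triples(fh):
    """Parse tab-separated triples from a file-like object, skipping blank lines."""
    return [tuple(line.rstrip("\n").split("\t")) for line in fh if line.strip()]

triples = load_triples(sample)
print(triples[0])  # ('albert_einstein', 'profession', 'physicist')
```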
Reference:
@inproceedings{DBLP:conf/sigmod/BollackerEPST08,
author = {Kurt D. Bollacker and
Colin Evans and
Praveen Paritosh and
Tim Sturge and
Jamie Taylor},
editor = {Jason Tsong{-}Li Wang},
title = {Freebase: a collaboratively created graph database for structuring
human knowledge},
booktitle = {Proceedings of the {ACM} {SIGMOD} International Conference on Management
of Data, {SIGMOD} 2008, Vancouver, BC, Canada, June 10-12, 2008},
pages = {1247--1250},
publisher = {{ACM}},
year = {2008},
url = {https://doi.org/10.1145/1376616.1376746},
doi = {10.1145/1376616.1376746},
timestamp = {Tue, 27 Nov 2018 10:40:37 +0100},
biburl = {https://dblp.org/rec/conf/sigmod/BollackerEPST08.bib},
bibsource = {dblp computer science bibliography, dblp}
}
WordNet
URL:
Overview:
WordNet is a large lexical database of English. Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms called synsets, each representing a distinct concept. Synsets are interlinked through conceptual-semantic and lexical relations. WordNet is a common tool in computational linguistics and natural language processing.
For Chinese, a similar resource is HowNet (知网).
Papers commonly use its subset WN11, see:
as well as WN18, see:
English overview:
WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet's structure makes it a useful tool for computational linguistics and natural language processing.
WordNet superficially resembles a thesaurus, in that it groups words together based on their meanings. However, there are some important distinctions. First, WordNet interlinks not just word forms—strings of letters—but specific senses of words. As a result, words that are found in close proximity to one another in the network are semantically disambiguated. Second, WordNet labels the semantic relations among words, whereas the groupings of words in a thesaurus do not follow any explicit pattern other than meaning similarity.
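The synset-and-relation structure described above can be sketched as a tiny graph. The synset names below mimic WordNet's lemma.pos.number naming convention but are written by hand; the real database is available programmatically, for example through NLTK's wordnet corpus reader.

```python
# A toy fragment of WordNet-style structure: synsets linked by hypernym
# ("is-a") relations. Hand-written illustrative data, not the real database.
hypernym = {
    "dog.n.01": "canine.n.02",
    "canine.n.02": "carnivore.n.01",
    "carnivore.n.01": "mammal.n.01",
}

def hypernym_path(synset):
    """Walk the is-a chain from a synset up to its most general ancestor."""
    path = [synset]
    while path[-1] in hypernym:
        path.append(hypernym[path[-1]])
    return path

print(hypernym_path("dog.n.01"))
# ['dog.n.01', 'canine.n.02', 'carnivore.n.01', 'mammal.n.01']
```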
Reference:
@article{DBLP:journals/cacm/Miller95,
author = {George A. Miller},
title = {WordNet: {A} Lexical Database for English},
journal = {Commun. {ACM}},
volume = {38},
number = {11},
pages = {39--41},
year = {1995},
url = {https://doi.org/10.1145/219717.219748},
doi = {10.1145/219717.219748},
timestamp = {Wed, 14 Nov 2018 10:22:30 +0100},
biburl = {https://dblp.org/rec/journals/cacm/Miller95.bib},
bibsource = {dblp computer science bibliography, dblp}
}
PDD
URL:
Overview:
PDD, short for Patient-Disease-Drug, is a medical dataset that captures the connections among patients, diseases, and drugs.
English overview:
What is PDD Graph (Patient-Disease-Drug Graph):
Electronic medical records contain multi-format electronic medical data comprising an abundance of medical knowledge. Faced with patients' symptoms, experienced caregivers make the right medical decisions based on professional knowledge that accurately grasps the relationships between symptoms, diagnoses, and treatments. We aim to capture these relationships by constructing a large, high-quality heterogeneous graph linking patients, diseases, and drugs (PDD) in EMRs.
Specifically, we extract important medical entities from MIMIC-III (Medical Information Mart for Intensive Care III) and automatically link them with existing biomedical knowledge graphs, including the ICD-9 ontology and DrugBank. The PDD graph presented is accessible on the Web via a SPARQL endpoint, and provides a pathway for medical discovery and applications, such as effective treatment recommendations.
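The patient-disease-drug structure can be pictured as a heterogeneous graph of typed edges. Below is a minimal sketch with invented identifiers (the real PDD graph uses MIMIC-III admissions, ICD-9 codes, and DrugBank entries), answering the kind of treatment-lookup query the dataset is meant to support.

```python
# Heterogeneous PDD-style graph: typed edges linking patients, diseases,
# and drugs. All identifiers here are invented for illustration.
edges = [
    ("patient:1", "diagnosed_with", "disease:icd9_4280"),  # heart failure
    ("patient:1", "prescribed", "drug:furosemide"),
    ("patient:2", "diagnosed_with", "disease:icd9_4280"),
    ("patient:2", "prescribed", "drug:metoprolol"),
]

def drugs_for_disease(disease):
    """Drugs prescribed to any patient diagnosed with the given disease."""
    patients = {s for s, r, o in edges if r == "diagnosed_with" and o == disease}
    return sorted({o for s, r, o in edges if r == "prescribed" and s in patients})

print(drugs_for_disease("disease:icd9_4280"))
# ['drug:furosemide', 'drug:metoprolol']
```

On the real graph the same question would be posed as a SPARQL query against the PDD endpoint.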
Reference:
@inproceedings{DBLP:conf/semweb/WangZLHWLL17,
author = {Meng Wang and
Jiaheng Zhang and
Jun Liu and
Wei Hu and
Sen Wang and
Xue Li and
Wenqiang Liu},
editor = {Claudia d'Amato and
Miriam Fern{\'{a}}ndez and
Valentina A. M. Tamma and
Freddy L{\'{e}}cu{\'{e}} and
Philippe Cudr{\'{e}}{-}Mauroux and
Juan F. Sequeda and
Christoph Lange and
Jeff Heflin},
title = {{PDD} Graph: Bridging Electronic Medical Records and Biomedical Knowledge
Graphs via Entity Linking},
booktitle = {The Semantic Web - {ISWC} 2017 - 16th International Semantic Web Conference,
Vienna, Austria, October 21-25, 2017, Proceedings, Part {II}},
series = {Lecture Notes in Computer Science},
volume = {10588},
pages = {219--227},
publisher = {Springer},
year = {2017},
url = {https://doi.org/10.1007/978-3-319-68204-4\_23},
doi = {10.1007/978-3-319-68204-4\_23},
timestamp = {Tue, 14 May 2019 10:00:53 +0200},
biburl = {https://dblp.org/rec/conf/semweb/WangZLHWLL17.bib},
bibsource = {dblp computer science bibliography, dblp}
}
In recent years, knowledge graphs centered on Chinese have also appeared, such as Tsinghua University's XLore, Shanghai Jiao Tong University's Zhishi.me, and Fudan University's CN-DBpedia.
Tsinghua University's XLore
URL:
Overview:
XLORE is a multilingual knowledge graph that structures and cross-lingually links encyclopedic knowledge from the Chinese and English editions of Wikipedia, the French edition of Wikipedia, and Baidu Baike; among large-scale multilingual knowledge graphs it has a comparatively balanced amount of Chinese and English knowledge. XLORE contains 16,284,901 instances, 2,466,956 concepts, 446,236 properties, and rich semantic relations.
Reference:
@inproceedings{DBLP:conf/semweb/WangLWLLZSLZT13,
author = {Zhigang Wang and
Juanzi Li and
Zhichun Wang and
Shuangjie Li and
Mingyang Li and
Dongsheng Zhang and
Yao Shi and
Yongbin Liu and
Peng Zhang and
Jie Tang},
editor = {Eva Blomqvist and
Tudor Groza},
title = {XLore: {A} Large-scale English-Chine Bilingual Knowledge Graph},
booktitle = {Proceedings of the {ISWC} 2013 Posters {\&} Demonstrations Track,
Sydney, Australia, October 23, 2013},
ries = {{CEUR} Workshop Proceedings},
volume = {1035},
pages = {121--124},
publisher = {CEUR-WS},
year = {2013},
url = {http://ceur-ws.org/Vol-1035/iswc2013\_demo\_31.pdf},
timestamp = {Wed, 12 Feb 2020 16:44:51 +0100},
biburl = {https://dblp.org/rec/conf/semweb/WangLWLLZSLZT13.bib},
bibsource = {dblp computer science bibliography, dblp}
}
Shanghai Jiao Tong University's Zhishi.me
URL: none
Overview:
Zhishi.me extracts structured data from open encyclopedias and was the first attempt at building a general-purpose Chinese knowledge graph. It has so far integrated data from the three major Chinese encyclopedias: Baidu Baike, Hudong Baike, and the Chinese Wikipedia.
Reference: