Journal of Library Science in China, Vol. 45, No. 244
DOI: 10.13530/j.cnki.jlis.190046
Classification number: G251 TP393
Application of Knowledge Graph in Digital Humanities
CHEN Tao, LIU Wei, SHAN Rongrong & ZHU Qinghua
ABSTRACT
Knowledge graph is a technique that uses computers to store, manage, and present concepts and their relationships. It became a research hotspot in industry and academia as soon as it was proposed. However, the concept of the knowledge graph remains rather confused in this field: people often conflate Knowledge Map (KM), Knowledge Graph (KG), and Graph Database (GD). A knowledge map is better regarded as a bibliometric method, so it is not discussed in detail in this paper. According to the storage method, knowledge graphs can be divided into semantic knowledge graphs (also called linked data, based on RDF storage) and generalized knowledge graphs (based on graph databases). Linked data focuses on the publication and linking of knowledge, while the generalized knowledge graph focuses more on the mining and computation of knowledge. The two have both commonalities and differences. This paper analyzes the similarities and differences between the two techniques from the conceptual and technical perspectives, and points out that linked data is the continuation and development of Google's knowledge graph.

In addition, this paper proposes a system framework for applying knowledge graphs to digital humanities research. We also point out that digital generation, textual conversion, data extraction, and intelligent construction are the main stages of research and development in the humanities field. Compared with most humanities research abroad, which is in the textual stage, much humanities research in China is still in the digital stage, far from the smart data stage.

Based on the theoretical foundations of smart data research in digital humanities, this paper builds a linked data platform (CBDBLD) for the China Biographical Database (CBDB). The seven-step method adopted in building the platform is representative and has been used in many digital humanities research projects, so it can guide the semantic construction of domestic digital humanities research. The platform contains more than 420,000 biographical records, about 22.7 million triples, and is linked to related open datasets such as the Shanghai Library Authority Name Files and VIAF (Virtual International Authority File). The CBDBLD dataset contains nearly 500 kinds of social relations in ten categories. Further, the platform uses the concept of the knowledge graph and visualization technology to show the rich kinship and social relations between persons. It forms a distinctive social network and improves both the user experience and the platform's dynamic interactivity.

Knowledge computing and knowledge reasoning are the core technologies involved in applying knowledge graphs, and they are widely studied in applications of generalized knowledge graphs. However, little such research has been done on linked data and digital humanities. Most digital humanities research in China uses linked data technology to publish and display metadata, which can be regarded as the basis of knowledge graph applications but does not represent the whole of knowledge graph research. In this paper, the CBDBLD platform uses a general rule reasoner to support user-defined rule-based reasoning, which implements the mining and presentation of implicit relationships between persons. Although the current reasoning is relatively simple, it provides a new direction for digital humanities research. The abundant graph mining and graph computing algorithms developed for generalized knowledge graphs can be applied to linked data, which is also the direction of the authors' future research and practice.

Both semantic knowledge graphs and generalized knowledge graphs can promote innovation in digital humanities research methods. The combination of the two techniques will become the next hotspot in the field of digital humanities and open a new era of digital humanities research. 10 figs. 2 tabs. 25 refs.

KEY WORDS
Digital humanities. Knowledge graph. Linked data. Knowledge inference. China Biographical Database (CBDB).

* This article is an outcome of the project "The Study of Semantic Construction of Image Resources and Open Knowledge Graph in Digital Humanities" (No. 19BTQ024) supported by National Social Science Foundation of China.
Correspondence should be addressed to LIU Wei, Email: wliu@libnet.sh.cn, ORCID: 0000-0003-2663-7539

November 2019
0 Introduction
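With the rapid development of the Internet, the volume of data on the Web has grown explosively. At the same time, the large scale, heterogeneity, and loosely organized structure of Web content make it challenging for people to obtain information and knowledge effectively. Knowledge graphs, with their powerful semantic processing and open organization capabilities, lay the foundation for knowledge-based organization and intelligent applications in the Internet era[1]. A knowledge graph not only expresses information on the Internet in a form closer to the way humans perceive the world, but also provides a better way to organize, manage, and use massive amounts of information. Its development has benefited from results in many research fields; it is the product of the intersection and fusion of knowledge bases, natural language processing, Semantic Web technology, machine learning, data mining, and other fields. As one of the most important forms of knowledge representation in the age of artificial intelligence, the knowledge graph can break down data silos across different scenarios and provides basic support for applications such as search, recommendation, question answering, explanation, and decision-making. However, the academic community's understanding of knowledge graphs is currently rather confused. Three notions, "Knowledge Map (KM)", "Knowledge Graph (KG)", and "Graph Database (GD)", are often conflated.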
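A knowledge map (KM) mainly refers to a bibliometric method that, applied to large volumes of scientific literature and drawing on statistics, graph theory, computer technology, and other means, visualizes the internal structure of scientific disciplines (topic co-occurrence, collaborating teams, citation relationships, etc.) along with their characteristics, research fronts and hotspots, and development trends[2]. Strictly speaking, a knowledge map is only a bibliometric method and cannot be called a knowledge graph.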
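In 2012, Google proposed a scheme for encoding and linking fragmented knowledge units on the World Wide Web. The scheme is essentially a semantic network of interconnected knowledge points, used mainly to improve search engine performance: by describing real-world entities and their relationships, it lets users discover new information and knowledge faster and more easily[3]. The knowledge graph (KG) requires that "entity-attribute" pairs and attribute values be expressed as statements in the RDF triple model. It recommends describing all kinds of things (people, places, events, etc.) with a standardized vocabulary schema (namely Schema.org①) and encoding the triples in Microdata, RDFa, JSON-LD, or similar formats, so that the relevant semantic information can be embedded in web pages and interlinked, and so that search engines can perform knowledge discovery, indexing, and visual presentation. Before Google released its knowledge graph, Tim Berners-Lee had already proposed the concept of "linked data" in 2006, a method for creating semantic links on the World Wide Web. Linked data aims to make knowledge machine-readable through URIs and ontologies, to promote open data, and to establish links between data so as to form a Web of Data[4]. Linked data describes a way of linking all kinds of resources on the Web by publishing linkable URIs; it is thus clear that the knowledge graph was in fact proposed and developed on the basis of linked data. Because the knowledge graph uses the RDF triple model and supports machine-readable semantic description, it can be regarded as a semantics-based knowledge graph, which in academia is often called "Linked Data". Strictly speaking, only this kind of graph can be called a knowledge graph. Linked data is usually stored in RDF databases (triplestores), and the knowledge graph discussed in this paper mainly refers to the semantic knowledge graph.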
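The encoding described above can be illustrated with a minimal sketch. It takes a small JSON-LD description that uses Schema.org terms (`Person`, `Place`, `name`, `birthPlace`) and flattens it into (subject, predicate, object) triples. The `example.org` URIs and the `to_triples` helper are assumptions made for illustration, and the flattening is deliberately simplified; it is not a full JSON-LD processor and is not taken from the CBDBLD platform.

```python
import json

SCHEMA = "https://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical JSON-LD snippet of the kind that could be embedded in a web page.
doc = json.loads("""
{
  "@context": "https://schema.org",
  "@id": "http://example.org/person/su-shi",
  "@type": "Person",
  "name": "Su Shi",
  "birthPlace": {
    "@id": "http://example.org/place/meishan",
    "@type": "Place",
    "name": "Meishan"
  }
}
""")


def to_triples(node):
    """Flatten a nested JSON-LD-like dict into (subject, predicate, object) triples.

    Simplified on purpose: only @id, @type, string literals and nested nodes
    are handled, and every plain key is assumed to be a Schema.org term.
    """
    subject = node["@id"]
    triples = [(subject, RDF_TYPE, SCHEMA + node["@type"])]
    for key, value in node.items():
        if key.startswith("@"):
            continue  # keywords such as @context, @id, @type are not properties
        predicate = SCHEMA + key
        if isinstance(value, dict):
            # A nested node: link to it by URI, then emit its own triples.
            triples.append((subject, predicate, value["@id"]))
            triples.extend(to_triples(value))
        else:
            triples.append((subject, predicate, value))
    return triples


triples = to_triples(doc)
for s, p, o in triples:
    print(s, p, o)
```

Each printed line is one statement: the person and the place are identified by URIs, so the `birthPlace` statement links two resources, while the `name` statements carry literal values.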
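A graph database is a database that represents nodes, properties, and relationships in graph form, stores them, and provides management functions; examples include Neo4j and ArangoDB. It is one type of NoSQL database (the other three being key-value stores, column-oriented databases, and document databases). As an important supporting technology for big data, it offers a full-featured graph query language and a rich set of graph mining algorithms. The structural definition of a graph database is more general than that of an RDF database, and it can store generic (S, P, O) triple data; most of the knowledge graphs currently discussed in industry fall into this category. When academia and industry use the term "knowledge graph", they often do not distinguish strictly between the two storage schemes and lump them together under that single name. Knowledge graphs built on graph databases can therefore be regarded as knowledge graphs in the generalized sense.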
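How generic (S, P, O) triples might map onto the node, property, and relationship structure of a graph database can be sketched with a toy in-memory model. This is only an illustration under stated assumptions: it does not use the API of Neo4j, ArangoDB, or any other product, the identifiers and predicate names are hypothetical, and the rule "objects that start with `http://` are resources, everything else is a literal" is a simplification chosen for the example.

```python
def load_triples(triples):
    """Load (S, P, O) triples into a toy property graph.

    Assumption for this sketch: an object that starts with "http://" is a
    resource and becomes an edge target; any other object is a literal and
    becomes a property stored on the subject node.
    """
    nodes = {}   # node id -> {property name: literal value}
    edges = []   # (source id, relationship label, target id)
    for s, p, o in triples:
        nodes.setdefault(s, {})
        if isinstance(o, str) and o.startswith("http://"):
            nodes.setdefault(o, {})
            edges.append((s, p, o))
        else:
            nodes[s][p] = o
    return nodes, edges


def outgoing(edges, node_id):
    """Return (label, target) pairs for every edge leaving node_id."""
    return [(label, dst) for src, label, dst in edges if src == node_id]


# Hypothetical identifiers and predicate names, for illustration only.
SU_SHI = "http://example.org/person/su-shi"
MEISHAN = "http://example.org/place/meishan"
triples = [
    (SU_SHI, "name", "Su Shi"),
    (SU_SHI, "birthPlace", MEISHAN),
    (MEISHAN, "name", "Meishan"),
]

nodes, edges = load_triples(triples)
print(nodes[SU_SHI])             # literal-valued statements stored as node properties
print(outgoing(edges, SU_SHI))   # resource-valued statements stored as relationships
```

The design choice shown here, folding literal-valued statements into node properties and keeping only resource-to-resource statements as edges, is one common way to move triple data into a property graph; other mappings are possible.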
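As soon as it was proposed, the knowledge graph became a research hotspot in industry and academia, and a large number of knowledge graph applications and knowledge bases have emerged. At present, Microsoft and Google own the world's largest general-purpose knowledge graphs, Facebook owns the world's largest social knowledge graph, Alibaba and Amazon have each built enormous product knowledge graphs, Baidu is committed to building the largest and most comprehensive Chinese-language knowledge graph, and the Meituan NLP Center is building "Meituan Brain", the world's largest knowledge graph for dining and entertainment. In addition, large-scale linked datasets (knowledge graphs) such as DBpedia, Freebase, and Yago have become the preferred linking targets of many knowledge bases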
① https://schema.org