Ontology Engineering for Big Data. Kouji Kozaki, The Institute of Scientific and Industrial Research (I.S.I.R), Osaka University, Japan. 2013/09/03. Ontology and Semantic Web for Big Data (ONSD2013) Workshop in the 2013 International Computer Science and Engineering Conference (ICSEC2013), Bangkok, Thailand, 5th Sep. 2013. ONSD2013@ICEC2013

Ontology Engineering for Big Data

Jan 20, 2015


Kouji Kozaki

For efficient and innovative use of big data, it is important to integrate multiple databases across domains. For example, various public databases have been developed in the life sciences, and finding novel scientific results using them is an essential technique. In social and business areas, open data strategies in many countries promote the diversity of public data, and how to combine big data and open data is a big challenge. That is, the diversity of datasets is a problem to be solved for big data.
Ontology gives systematized knowledge for integrating multiple datasets across domains together with their semantics. Linked Data also provides techniques to interlink datasets based on Semantic Web technologies. We consider that combinations of ontology and Linked Data based on ontological engineering can contribute to solving the diversity problem in big data.
In this talk, I discuss how ontological engineering could be applied to big data, with some trial examples.
Transcript
Page 1: Ontology Engineering for Big Data

Ontology Engineering for Big Data

Kouji Kozaki

The Institute of Scientific and Industrial Research (I.S.I.R), Osaka University, Japan

Ontology and Semantic Web for Big Data (ONSD2013) Workshop in the 2013 International Computer Science and Engineering Conference (ICSEC2013), Bangkok, Thailand, 5th Sep. 2013

Page 2: Ontology Engineering for Big Data

Self introduction: Kouji KOZAKI

Brief biography
• 2002 Received Ph.D. from Graduate School of Engineering, Osaka University.
• 2002- Assistant Professor, 2008- Associate Professor in ISIR, Osaka University.

Specialty
• Ontological Engineering

Main research topics
• Fundamental theories of ontological engineering

Page 3: Ontology Engineering for Big Data

Ontological topics

Some examples of topics which I work on:

Definition of disease
• What is a "disease"? What is a "causal chain"? Is it an object or a process?

Role theory
• What is the ontological difference among the following concepts? Person, Teacher, Walker, Murderer, Mother, ….
• Natural type
• Role (dependent concept)

Page 4: Ontology Engineering for Big Data

Self introduction: Kouji KOZAKI

Brief biography
• 2002 Received Ph.D. from Graduate School of Engineering, Osaka University.
• 2002- Assistant Professor, 2008- Associate Professor in ISIR, Osaka University.

Specialty
• Ontological Engineering

Main research topics
• Fundamental theories of ontological engineering
• Ontology development tool based on the ontological theories
• Ontology development in several domains and ontology-based applications

Hozo (法造): an environment for ontology building/using (1996- )
• Software to support ontology (= 法) building (= 造) and use
• Available at http://www.hozo.jp as free software
• Registered users: 3,500 (June 2012)
• Java API for application development is provided.
• Supported formats: original format, RDF(S), OWL.
• Linked Data publishing support is coming soon.

Page 5: Ontology Engineering for Big Data

My history on Ontology Building

2002-2007 Nano technology ontology Supported by NEDO(New Energy and Industrial Technology Development Organization)

2006- Clinical Medical ontology Supported by Ministry of Health, Labour and Welfare, Japan Cooperated with: Graduate School of Medicine, The University of Tokyo.

2007-2009 Sustainable Science ontology Cooperated with: Research Institute for Sustainability Science, Osaka Univ.

2007-2010  IBMD(Integrated Bio Medical Database) Supported by MEXT through "Integrated Database Project". Cooperated with: Tokyo Medical and Dental University, Graduate School of Medicine, Osaka U.

2008-2012 Protein Experiment Protocol ontology Cooperated with: Institute for Protein Research, Osaka Univ.

2008-2010 Bio Fuel ontology Supported by the Ministry of Environment, Japan.

2009-2012 Disaster Risk ontology Cooperated with: NIED (National Research Institute for Earth Science and Disaster Prevention)

2012- Bio mimetic ontology Supported by JSPS KAKENHI Grant-in-Aid for Scientific Research on Innovative Areas

2012- Ontology of User Action on Web Cooperated with: Consumer first Corp.

2013- Information Literacy ontology Supported by JSPS KAKENHI

Page 6: Ontology Engineering for Big Data

Agenda

(1) Motivation
• Ontology vs. Big Data
• How can we use ontology for big data?

(2) Case Studies towards Ontology Engineering for Big Data
• Ontology exploration according to the users' viewpoints
• A Disease Ontology developed in the Japanese Medical Ontology Project

(3) Concluding Remarks

Page 7: Ontology Engineering for Big Data

Ontology vs. Big Data

Question: Is ontology useful for big data?

My answer: (I believe) Yes. The combination of ontology and big data could provide new solutions for many problems.

Ontology
• Not so big (though some are big).
• Built by hand.
• Used based on semantics, by reasoning.

Big Data
• Very big.
• Collected automatically.
• Used without semantics, by machine learning or data mining.

Page 8: Ontology Engineering for Big Data

How to combine Ontology and Big Data

Basic technology
• Mapping ontology to database
  – Mapping classes (concepts) defined in the ontology to the database schema
  – Mapping classes/instances defined in the ontology to data in the DB
• Adding metadata to data using vocabulary defined in the ontology
  – e.g. annotation of documents such as web pages, papers, etc.
• Converting a database (e.g. RDB) to an ontology-based (RDF) database
  – e.g. linked data such as DBpedia, some bioinformatics DBs, etc.

You can choose among these technologies according to your purpose.
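As a rough illustration of the third technique (converting an RDB to an ontology-based RDF database), the following sketch maps one relational row to RDF triples and serializes them as N-Triples. The namespace, the table, the column names, and the class and property IRIs are all hypothetical and do not come from the talk; literal escaping and datatypes are omitted for brevity.

```python
# Hypothetical namespace and vocabulary, chosen only for illustration.
BASE = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed mapping from database schema to ontology vocabulary.
TABLE_TO_CLASS = {"patient": BASE + "ontology/Patient"}
COLUMN_TO_PROPERTY = {
    "name": BASE + "ontology/hasName",
    "blood_pressure": BASE + "ontology/hasBloodPressure",
}


def row_to_triples(table, key, row):
    """Convert one relational row into (subject, predicate, object) triples."""
    subject = f"{BASE}{table}/{row[key]}"
    # The table maps to a class; each mapped column maps to a property.
    triples = [(subject, RDF_TYPE, TABLE_TO_CLASS[table])]
    for column, value in row.items():
        if column in COLUMN_TO_PROPERTY:
            triples.append((subject, COLUMN_TO_PROPERTY[column], value))
    return triples


def to_ntriples(triple):
    """Serialize a triple as one N-Triples line (simplified: no escaping)."""
    s, p, o = triple
    is_iri = isinstance(o, str) and o.startswith("http")
    obj = f"<{o}>" if is_iri else f'"{o}"'
    return f"<{s}> <{p}> {obj} ."


rows = [{"id": 1, "name": "A", "blood_pressure": 180}]
triples = [t for r in rows for t in row_to_triples("patient", "id", r)]
lines = [to_ntriples(t) for t in triples]
```

In practice a mapping language or an existing RDB-to-RDF tool would normally take the place of the two hand-written dictionaries; the sketch only shows where the ontology vocabulary enters the conversion.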

Page 9: Ontology Engineering for Big Data


Case Study: A method for mapping the Abnormality Ontology (in the medical domain) to a medical database

Page 10: Ontology Engineering for Big Data

Classification of Abnormality Representations 1

Various types of abnormality representations are used in the medical domain:
• blood pressure 200 mmHg / blood pressure is high / hypertension
• blood glucose level 150 mg/dL / blood glucose level is high / hyperglycemia

Page 11: Ontology Engineering for Big Data

Classification of Abnormality Representations 2

※ Based on the quality and quantity ontologies in the upper ontology "YAMATO".

Quantitative representation
• blood pressure 200 mmHg
• blood glucose level 150 mg/dL

Qualitative representation
• blood pressure is high
• blood glucose level is high

Property representation
• hypertension
• hyperglycemia

☑ Diagnosis: identify a concrete value for each patient in clinical tests
☑ Definition of disease

[Figure: Mapping between the Abnormality Ontology and the Medical Database]

Page 12: Ontology Engineering for Big Data

Is-a hierarchy of Abnormality Ontology

[Figure: is-a hierarchy in three layers. Node labels are listed per layer; the individual is-a links are drawn in the figure.]

Layer 1: Generic Abnormal States (object-independent)
• Structural abnormality, Material abnormality, Size abnormality, Formational abnormality, Conformational abnormality, Topological abnormality, ……
• Small in size, Large in size
• Small in line, Small in area, Small in volume

Layer 2: Object-dependent Abnormal States
• Narrowing tube, Narrowing of valve, …
• Vascular stenosis, Gastrointestinal tract stenosis
• Arterial stenosis, …, Intestinal stenosis, Esophageal stenosis, Coronary stenosis

Layer 3: Specific context-dependent Abnormal States
• Coronary stenosis in Angina pectoris
• Coronary stenosis in Arteriosclerosis
• Intestinal stenosis in Ileus
• Esophageal stenosis in Esophagitis

Dependency labels in the figure: tube-dependent, blood vessel-dependent, disease-dependent

Page 13: Ontology Engineering for Big Data

How can we deal with clinical test data?

• In hospitals, a huge volume of diagnostic/clinical test data has been accumulated.
• Most are quantitative data, e.g., blood pressure 180 mmHg, blood cross-sectional area 40 mm².

Quantitative value → Qualitative value: 180 mmHg (Vqt) → high (Vql)

[Figure: the quantitative blood pressure value 180 mmHg is compared with a threshold value (e.g., 140 mmHg) and classified as "high".]
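The conversion from a quantitative value (Vqt) to a qualitative value (Vql) can be sketched as a threshold comparison. This is a minimal sketch, not the method used in the project: the function name, the strict ">" comparison, and the label "not high" are assumptions; only the 180 mmHg value and the 140 mmHg example threshold come from the slide.

```python
# Example threshold from the slide; real thresholds depend on the attribute.
BLOOD_PRESSURE_THRESHOLD_MMHG = 140


def to_qualitative(value, threshold):
    """Map a quantitative value to a qualitative one.

    Assumption: values strictly above the threshold count as "high";
    everything else is labeled "not high" (a placeholder label).
    """
    return "high" if value > threshold else "not high"


vql = to_qualitative(180, BLOOD_PRESSURE_THRESHOLD_MMHG)
```

With the slide's numbers, 180 mmHg against a threshold of 140 mmHg yields the qualitative value "high".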

Page 14: Ontology Engineering for Big Data

Basic policy for definition of abnormal states

A property is decomposed into a tuple <Attribute (A), Attribute Value (V)> in a qualitative form.

Example: hypertension (Property, P) = <blood pressure (Attribute, A), high (Value, V)>

A qualitative representation can be converted into a property representation.
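The decomposition of a property into an <Attribute, Value> tuple can be sketched with a small lookup table. The two entries follow the examples on the slides (hypertension, hyperglycemia); the type name, the table, and the function names are hypothetical and stand in for whatever the ontology actually stores.

```python
from typing import NamedTuple, Optional


class Qualitative(NamedTuple):
    """A qualitative representation: <Attribute (A), Attribute Value (V)>."""
    attribute: str
    value: str


# Property (P) -> <A, V>, using the examples given on the slides.
PROPERTY_DEFINITIONS = {
    "hypertension": Qualitative("blood pressure", "high"),
    "hyperglycemia": Qualitative("blood glucose level", "high"),
}


def decompose(prop: str) -> Qualitative:
    """Decompose a property into its <A, V> tuple."""
    return PROPERTY_DEFINITIONS[prop]


def to_property(q: Qualitative) -> Optional[str]:
    """Convert a qualitative representation into a property, if one is defined."""
    for name, definition in PROPERTY_DEFINITIONS.items():
        if definition == q:
            return name
    return None


p = to_property(Qualitative("blood pressure", "high"))
```

Keeping both directions explicit makes it possible to go from a clinical observation to a property name and from a disease definition back to the attribute that has to be measured.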

Page 15: Ontology Engineering for Big Data

[Figure: three levels of representation, from clinical test data to abnormality knowledge]
• Quantity (clinical test data): blood pressure 180 mmHg; cross-section area xx cm²
• Quality: blood pressure high; cross-section area small
• Property (abnormality knowledge): Hypertension; Narrowing

Our model enables "interoperability" from clinical test data to conceptual knowledge about abnormal states.

Quantitative data can be converted, through a qualitative representation, into a property representation.

Page 16: Ontology Engineering for Big Data


Case Study: Annotation on web browsing history of users based on the Web User Action Ontology

Page 17: Ontology Engineering for Big Data

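[Figure: 会議毎の利用タイプの推移 (Transition of usage types by conference). Legend: (1) Common Vocabulary, (2) Search, (3) Index, (4) Data Schema, (5) Knowledge Sharing, (6) Semantic Analysis, (7) Information Extraction, (8) Knowledge Modeling, (9) Knowledge Systematization. Vertical axis: the number of types of usage (0 to 40). The number of papers surveyed in each conference: 9, 19, 18, 24, 25, 11, 23, 26, 17, 18.]

[Table: annotation items. Columns: item, description, correspondence with the Web User Action Ontology]

• Title: the title of the website indicated by the URL. Correspondence: -
• Website category (or instance): the type of website (e.g., accommodation portal, individual site of an accommodation facility, news site, blog, etc.) or, for a major site, the instance name (e.g., Rakuten Travel, Yahoo!).
• Action sequence name: the name of the partial action sequence within the CV (conversion) task → one of: CV trigger action sequence, pre-CV action sequence, CV goal sequence, post-CV action sequence.
• RH constituent action: the role of the partial action within the action sequence.
• CC constituent action: the "kind of action" that serves as the partial action within the action sequence.
• Partial action of the constituent action: when the constituent action is a composite action (currently only information-gathering actions), its role within that composite action → currently one of: acquiring an information source, entering conditions, browsing a list of candidate information, browsing individual information.
• Target object: the name of the instance that the website describes. ※ To be considered (the "target" of the web page).
• Target category: the kind of thing that the website describes. ※ To be considered (add a class?).
• Target information category: the kind of information targeted by the action (mainly information-gathering actions are assumed).
• Place name (prefecture): geographic information about what the website describes (prefecture-name level is assumed). ※ To be considered?
• Landmark: what serves as a landmark in the geographic information about what the website describes (names of tourist spots, facilities, etc.). ※ To be considered?
• CV conditions (out of scope this time; uses the results of URL analysis): the conditions set when making the conversion (stay dates, price, type of facility, etc.). ※ Assumed to use the results of URL analysis. Correspondence: -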


Annotation on web browsing history of users based on ontology

• Web browsing history (access logs) of users: list of all URLs the users accessed, for 130M users × 2 years
• Web User Action Ontology
• Analysis of consumption behavior

This is collaborative work with Consumer first, Inc.

Page 18: Ontology Engineering for Big Data

Basic Idea

The format of the access logs (web browsing history) of users provided by Consumer first, Inc.
• User id, access date and time, URL, …

Problem
• A URL is a meaningless string to humans, although one may guess its contents if it is a famous site.
• Access logs are diverse. In order to analyze them, we need consistent meaning.

Annotations on the access log
• We tried to add metadata that presents a human-understandable meaning of each URL.
• We also developed a prototype of automatic annotation.
  – Its recall and relevance rate are roughly 0.7 to 0.9.
  – We think this result is not bad for statistical analysis.
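The annotation idea can be sketched as a lookup from the host name of each URL to a website category, followed by a simple evaluation against hand-made labels. This is only a sketch under stated assumptions: the host names, categories, and gold labels are invented, the rule-based lookup is not the prototype described in the talk, and "precision" is used here on the assumption that it corresponds to the slide's "relevance rate".

```python
from urllib.parse import urlparse

# Hypothetical host-to-category table (website category annotation).
SITE_CATEGORIES = {
    "travel.example.com": "accommodation portal",
    "news.example.org": "news site",
}


def annotate(url):
    """Return the website category for a URL, or None if it is unknown."""
    host = urlparse(url).hostname
    return SITE_CATEGORIES.get(host)


def precision_recall(predicted, gold):
    """Compare predicted labels with gold labels (both: dict of url -> label)."""
    answered = [u for u, label in predicted.items() if label is not None]
    correct = [u for u in answered if gold.get(u) == predicted[u]]
    relevant = [u for u, label in gold.items() if label is not None]
    precision = len(correct) / len(answered) if answered else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall


log = [
    "http://travel.example.com/hotel/1",
    "http://news.example.org/a",
    "http://unknown.example.net/",
]
# Invented gold labels, used only to exercise the evaluation function.
gold = {log[0]: "accommodation portal", log[1]: "blog", log[2]: "blog"}
predicted = {u: annotate(u) for u in log}
precision, recall = precision_recall(predicted, gold)
```

Once every URL carries a category drawn from the ontology, logs from different sites can be aggregated by category rather than by raw URL string.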

Page 19: Ontology Engineering for Big Data

Ontology Engineering for Big Data

Basic technology = How to combine Ontology and Big Data
• Mapping ontology to database
• Adding metadata to data using vocabulary defined in the ontology
• Converting a database (e.g. RDB) to an ontology-based (RDF) database

How to use combinations of Ontology and Big Data
• Ontology can provide semantics to add to raw data.
• Generalized concepts in an ontology can connect data at various concept levels across domains.
• We can use ontology as given (and authorized) knowledge to analyze big data.

Page 20: Ontology Engineering for Big Data

Ontology Engineering for Big Data

Features of ontology at the class level
• It reflects an understanding of the target world.
• Well-organized ontologies have generalized, rich knowledge based on consistent semantics.
• Ontologies are systematized knowledge of domains.

Combination of ontology and big data
• Ontology can provide semantics to add to raw data.
• Generalized concepts in an ontology can connect data at various concept levels across domains.
• We can use ontology as given (and authorized) knowledge to analyze big data.

Page 21: Ontology Engineering for Big Data

Two possible ways to use ontology for big data

• Use ontology to bridge datasets across domains
• Use ontology to combine deep domain knowledge and raw data

[Figure labels: Metadata, LOD (Linked Open Data), Ontology, Big Data]

Page 22: Ontology Engineering for Big Data

Case studies

Use ontology to bridge datasets across domains
• Understanding an Ontology through Divergent Exploration (presented at ESWC2011)

Use ontology to combine deep domain knowledge and raw data
• Japanese Medical Ontology project: Disease ontology and Ontology of Abnormal State (presented at ICBO (International Conference on Biomedical Ontology) 2011, 2012 and 2013)

Page 23: Ontology Engineering for Big Data

Use ontology to bridge datasets across domains

Basic technology
• Terms (classes/instances) defined in the ontology are used as a common vocabulary for searching data.
• If the ontology has mappings to multiple DBs, the user can search across them.

Motivation and Issue
• Combinations of multiple datasets could be valuable for big data analysis, e.g. climate and agriculture, healthcare and life science, etc.
• However, obtaining all combinations across multiple big datasets is not realistic because of their size.
• Requests from users also differ greatly according to their interests.
• It is important to consider an efficient method to obtain meaningful combinations.

[Figure: the ontology serves as a common vocabulary for searching across multiple DBs of documents and raw data.]
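Searching across multiple DBs through a common vocabulary can be sketched as follows. Each ontology term is mapped to the field and local value used in each database, and a search on the term fans out to every mapped database. The database names, fields, records, and the term "rice" are all hypothetical; in-memory lists stand in for real databases.

```python
# Hypothetical mapping: ontology term -> {database: (field, local value)}.
TERM_MAPPINGS = {
    "rice": {
        "climate_db": ("crop", "rice"),
        "agri_db": ("product_name", "Rice (paddy)"),
    },
}

# Toy in-memory "databases" with different schemas and local vocabularies.
DATABASES = {
    "climate_db": [
        {"crop": "rice", "region": "X", "rainfall_mm": 1200},
        {"crop": "wheat", "region": "Y", "rainfall_mm": 600},
    ],
    "agri_db": [
        {"product_name": "Rice (paddy)", "yield_t_per_ha": 5.1},
    ],
}


def search(term):
    """Search every database that has a mapping for the ontology term."""
    results = {}
    for db_name, (field, local_value) in TERM_MAPPINGS.get(term, {}).items():
        records = DATABASES[db_name]
        results[db_name] = [r for r in records if r.get(field) == local_value]
    return results


hits = search("rice")
```

The user only needs to know the ontology term; differences in schema and local naming are absorbed by the mapping table.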

Page 24: Ontology Engineering for Big Data

A method to obtain meaningful combinations using ontology exploration

• An ontology presents an explicit, essential understanding of the target world. It provides base knowledge to be shared among the users.
• Users explore the ontology according to their viewpoints and generate conceptual maps as the result. These maps represent understanding from their own viewpoints.
• They can use the maps as viewpoints (combinations) to get data from multiple DBs.

[Figure labels: Layer 0, Layer 1, Layer 2, Layer 3, Layer 4; Ontology-based Information Retrieval; Divergent Exploration; Context Based Convergence; Comparison and Convergence of multiple Maps; Map Generation Depending on Viewpoints; Contents Management using the Metadata; Problem Setting; Problem Solution; Innovation]

Page 25: Ontology Engineering for Big Data

(Divergent) Ontology exploration tool

1) Exploration of multi-perspective conceptual chains
2) Visualizations of conceptual chains

• Exploration of an ontology ("Hozo" – Ontology Editor)
• Multi-perspective conceptual chains represent the explorer's understanding of the ontology from a specific viewpoint.
• Conceptual maps: visualizations as conceptual maps from different viewpoints

Page 26: Ontology Engineering for Big Data

[Figure: an ontology displayed as a graph, with the following callouts]
• A node represents a concept (= rdfs:Class).
• A slot represents a relationship (= rdf:Property).
• Is-a (sub-class-of) relationship
• Referring to another concept

Page 27: Ontology Engineering for Big Data


Page 28: Ontology Engineering for Big Data

[Figure: screenshot of the ontology exploration tool, with the following callouts]
• Aspect dialog
• Kinds of aspects
• Option settings for exploration
• property names
• constriction tracing classes
• Conceptual map visualizer
• Selected relationships are traced and shown as links in the conceptual map.

Page 29: Ontology Engineering for Big Data

Explore the focused (selected) path.

Page 30: Ontology Engineering for Big Data

Functions for ontology exploration

• Exploration using the aspect dialog: divergent exploration from one concept, using the aspect dialog at each step.
• Search path: exploration of paths from a starting point to ending points.
• Post-hoc editing: the tool allows users to edit the map afterwards to extract only the interesting portions.
• Change view: the tool has a function to highlight specified paths of conceptual chains on the generated map according to given viewpoints.
• Comparison of maps: the system can compare generated maps and show the conceptual chains common to both maps.

[Slide labels: Manual exploration; Machine exploration]
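Divergent exploration and map comparison can be sketched on a toy ontology stored as (source, relation, target) links. An "aspect" is approximated here as the set of relation types the user chooses to trace, and a conceptual map as the set of links reached. The concepts, relation names, and functions are hypothetical and do not reflect the Hozo API; comparing maps by set intersection of links is an assumption about what "common conceptual chains" means.

```python
# Toy ontology as (source, relation, target) links; contents are invented.
ONTOLOGY = [
    ("Biofuel production", "input", "Sugarcane"),
    ("Biofuel production", "causes", "Deforestation"),
    ("Deforestation", "causes", "Biodiversity loss"),
    ("Sugarcane", "requires", "Farmland"),
]


def explore(start, aspects, max_depth=3):
    """Trace only the selected relation types outward from the start concept."""
    conceptual_map = set()
    frontier = {start}
    visited = {start}
    for _ in range(max_depth):
        next_frontier = set()
        for source, relation, target in ONTOLOGY:
            if source in frontier and relation in aspects:
                conceptual_map.add((source, relation, target))
                if target not in visited:
                    visited.add(target)
                    next_frontier.add(target)
        frontier = next_frontier
    return conceptual_map


# Two maps generated from different viewpoints (sets of traced relations).
map_env = explore("Biofuel production", {"causes"})
map_all = explore("Biofuel production", {"causes", "input", "requires"})

# Comparison of maps: links that appear in both maps.
common = map_env & map_all
```

Here the environmental viewpoint traces only "causes" links, so its map is a subset of the map that traces every relation, and the common part equals the smaller map.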

Page 31: Ontology Engineering for Big Data

Search Path

• Selecting the ending points
• Finding all possible paths from the starting point to the ending points

[Figure labels: Starting point; Ending point (1); Ending point (2); Ending point (3)]
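Finding all possible paths from a starting point to a set of ending points can be sketched as a depth-first search that never revisits a concept on the current path. The graph below is a hypothetical adjacency list; the tool's actual search strategy is not described in the slides, so this is only one straightforward way to do it.

```python
# Hypothetical concept graph: concept -> concepts reachable by one link.
GRAPH = {
    "Problem": ["Method A", "Method B"],
    "Method A": ["Goal 1"],
    "Method B": ["Goal 1", "Goal 2"],
    "Goal 1": [],
    "Goal 2": [],
}


def find_paths(graph, start, endings):
    """Return every cycle-free path from start to any of the ending points."""
    paths = []

    def walk(node, path):
        if node in endings:
            paths.append(path)
        for neighbor in graph.get(node, []):
            if neighbor not in path:  # avoid cycles
                walk(neighbor, path + [neighbor])

    walk(start, [start])
    return paths


paths = find_paths(GRAPH, "Problem", {"Goal 1", "Goal 2"})
```

Each returned path is one conceptual chain; on a large ontology the number of paths can grow quickly, so a depth limit would likely be needed in practice.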

Page 32: Ontology Engineering for Big Data

Search Path

Selected ending points

Page 33: Ontology Engineering for Big Data

What does the result mean?

[Figure labels: Selected ending points; Problem; Kinds of method to solve the problem; Possible combination of them]

Page 34: Ontology Engineering for Big Data

DEMO: Ontology Exploration


Page 35: Ontology Engineering for Big Data

Usage and evaluation of ontology exploration tool

Step 1: Usage for knowledge structuring in sustainability science

Step 2: Verification of the exploring abilities of the ontology exploration tool

Step 3: Experiments for evaluating the ontology exploration tool

Page 36: Ontology Engineering for Big Data

Sustainability Science

Sustainability Science probes interactions between global, social, and human systems, the complex mechanisms that lead to degradation of these systems, and concomitant risks to human well-being.

The journal provides a platform for building sustainability science as a new academic discipline.

These include endeavors to simultaneously understand phenomena and solve problems, uncertainty and application of the precautionary principle, the co-evolution of knowledge and recognition of problems, and trade-offs between global and local problem solving.

Volume 1 / 2006 - Volume 8 / 2013
Editor-in-Chief: Kazuhiko Takeuchi
Managing Editor: Osamu Saito
ISSN: 1862-4065 (print version)
ISSN: 1862-4057 (electronic version)

Page 37: Ontology Engineering for Big Data

Knowledge Structuring in Sustainability Science

Sustainability Science (SS)
– We aimed at establishing a new interdisciplinary scheme that serves as a basis for constructing a vision that will lead global society to a sustainable one.
– An integrated understanding of the entire field is required instead of domain-wise knowledge structuring.

Sustainability science ontology
– Developed in collaboration with domain experts at the Osaka University Research Institute for Sustainability Science (RISS).
– Number of concepts: 649, number of slots: 1,075

Usage of the ontology exploration tool
– It was confirmed that the exploration was fun for them and that the tool had a certain utility for achieving knowledge structuring in sustainability science. [Kumazawa 2009]

Sustainability Science: http://en.ir3s.u-tokyo.ac.jp/about_sus

Page 38: Ontology Engineering for Big Data

Biofuel Use Strategies for Sustainable Development (BforSD, FY2008-FY2010)

One of the sub-themes: development of an ontology-based mapping system which creates comprehensive views of problems and policy measures on biofuel

(1) Structuring biofuel problems: develop the biofuel ontology, which explicitly conceptualizes biofuel problems through literature review and interviews.

(2) Develop an ontology exploration tool which interactively generates conceptual maps with paths between concepts in the biofuel ontology.

(3) In collaboration with other sub-themes, develop an application method of this map tool for policy-making support to find, frame and prioritize relevant problems and policy measures.

(source) US DOE

Page 39: Ontology Engineering for Big Data

Usage and evaluation of ontology exploration tool

Step 1: Usage for knowledge structuring in sustainability science

Step 2: Verification of the exploring abilities of the ontology exploration tool

Step 3: Experiments for evaluating the ontology exploration tool

Page 40: Ontology Engineering for Big Data

Verification of Ontology Exploration Tool

Verification methods

1) Enrichment of SS ontology
We enriched the SS ontology on the basis of 29 typical scenarios (cases) structured by domain experts in biofuel through literature review and interviews.

[Figure: 29 scenarios (cases) → 27 conceptual maps]

Page 41: Ontology Engineering for Big Data

Positive and negative effects of biofuel

1) Energy services for the poor

(+/−) Competition of biomass energy systems with the present use of biomass resources (such as agricultural residues) in applications such as animal feed and bedding, fertilizer, and construction materials [1]

(−) In many developing countries, small-scale biomass energy projects face challenges obtaining finance from traditional financing institutions [1]

(−) Liquid biofuels are likely to replace only a small share of global energy supplies and cannot alone eliminate our dependence on fossil fuels [2]

2) Agro-industrial development and job creation

(+) Biofuel is powering new small- and large-scale agro-industrial development and spawning new industries in industrialized and developing countries [1]

(+/−) In the short-to-medium term, bioenergy use will depend heavily on feedstock costs and reliability of supply, cost and availability of competing energy sources, and government policy decisions [1]

(+) In the longer term, the economics of biofuel will probably improve as agricultural productivity and agro-industrial efficiency improve, more supportive agricultural and energy policies are adopted, carbon markets mature and expand, and new methodologies for carbon sequestration accounting are developed [1]

(+) In the longer term, expanded demand and increased prices for agricultural commodities may represent opportunities for agricultural and rural development [2]

(+) Biofuel industries create jobs, including highly skilled science, engineering, and business-related employment; medium-level technical staff; low-skill industrial plant jobs; and unskilled agricultural labor [1]

(+/−) Small-scale and labor-intensive production often leads to trade-offs between production efficiency and economic competitiveness [1]

3) Health and gender

(−) Market opportunities cannot overcome existing social and institutional barriers to equitable growth, with exclusion factors such as gender, ethnicity, and political powerlessness, and may even worsen them [2]

(−) Forest burning for development of feedstock plantation and sugarcane burning to facilitate manual harvesting result in air pollution, higher surface water runoff, soil erosion, and unintended forest fires [3,4]

(−) Exploitation of cheap labor (plantation and migrant workers) [4]

(−) Increased use of pesticides could create health hazards for laborers and communities living near areas of feedstock production [1,3]

4) Agricultural structure

(−) The demand for land to grow biofuel crops could put pressure on competing land usage for food crops, resulting in an increase in food prices [1,2]

(+/−) Significant economies of scale can be gained from processing and distributing biofuels on a large scale. The transition to liquid biofuels can be harmful to farmers who do not own their own land, and to the rural and urban poor who are net buyers of food [1]

(−) While global market forces could lead to new and stable income streams, they could also increase marginalization of poor and indigenous people and affect traditional ways of living if they end up driving small farmers without clear titles from their land and destroying their livelihood [1]

(+): Positive effects, (−): Negative effects, (+/−): Both positive and negative effects
(Source) [1]: UN-Energy (2007), [2]: FAO (2008), [3]: CBD (2008), [4]: Martinelli et al. (2008)

Page 42: Ontology Engineering for Big Data

5) Food security (−) Demand for agricultural feedstock for liquid biofuels will be a significant factor for agricultural markets and world agriculture over the next decade and perhaps beyond2

(−) Rapid growing demand for biofuel feedstock has contributed to higher food prices, which poses an immediate threat to the food security of poor net food buyers in both urban and rural areas2

(+/−) The effect of biofuels on food security is context-specific, depending on the particular technology and country characteristics involved1

6) Government budget

(−) Because ethanol is used largely as a substitute for gasoline, providing a large tax reduction for blending ethanol and gasoline reduces government revenue from this tax, mainly targeting the non-poor1

(−) Production of biofuels in many countries, except sugarcane-based ethanol production in Brazil, is not currently economically viable without subsidies, given existing agricultural production and biofuel-processing technologies and recent relative prices of commodity feedstock and crude oil2

(−) Policy intervention, especially in the form of subsidies and mandated blending of biofuels with fossil fuels, are driving the rush to liquid biofuels, which leads to high economic, social, and environmental costs in both developed and developing countries 2

7) Trade, foreign exchange balance, and energy security

(+) Diversifying global fuel supplies could have beneficial effects on the global oil market and many developing countries because fossil fuel dependence has become a major risk for many developing economies 1

(+/−) Rapidly rising demand for ethanol has had an impact on the price of sugar and maize in recent years, bringing substantial rewards to farmers not only in Brazil and the United States but around the world1,2

(−) Linking of agricultural prices to the vicissitudes of the world oil market clearly presents risks; however, it is an essential transition to the development of a biofuel industry that does not rely on major food commodity crops 1

8) Biodiversity and natural resource management

(+/−) Depending on the types of crop grown, what they replaced, and the methods of cultivation and harvesting, biofuels can have negative and positive effects on land use, soil and water quality, and biodiversity 1,3

(−) Problems with water availability and use may represent a limitation on agricultural biofuel production 1,3

(−) Introduction of criteria, standards, and certification schemes for biofuels may generate indirect negative environmental and biodiversity effects, passively in other countries3

(−) If the production of biofuel feedstock requires increased fertilizer and pesticide use, there could be additional detrimental effects such as increase in GHGs emission and eutrophicating nutrients and biodiversity loss 3

(−) Wild biodiversity is threatened by loss of habitat when the area under crop production is expanded, whereas agricultural biodiversity is vulnerable in the case of large-scale monocropping, which is based on a narrow pool of genetic material, and can also lead to reduced use of traditional varieties2,3

(+) If crops are grown on degraded or abandoned land, such as previously deforested areas or degraded crop- and grasslands, and if soil disturbances are minimized, feedstock production for biofuels can have a positive impact on biodiversity by restoring or conserving habitat and ecosystem function3

9) Climate change

(+/−) Full lifecycle GHG emissions of biofuel vary widely based on land use changes, choice of feedstock, agricultural practices, refining or conversion processes, and end-use practices1,2

(−) Land use change associated with production of biofuel feedstock can affect GHG emissions; draining wetlands and clearing land with fire are detrimental with regard to GHG emissions and air quality2,3

(−) The greatest potential for reducing GHG emission comes from replacement of coal rather than petroleum fuels 1

(+) Biofuels offer the only realistic near-term renewable option for displacing and supplementing liquid transport fuels 1

(+): Positive effects, (−): Negative effects, (+/−): Both positive and negative effects
(Source) 1: UN-Energy (2007), 2: FAO (2008), 3: CBD (2008), 4: Martinelli et al. (2008)

Page 43: Ontology Engineering for Big Data

Verification of Ontology Exploration Tool

burn agriculture = (deforestation, soil deterioration caused by farmland development for biofuel crops) ⇒ harvest sugarcanes (air pollution caused by intentional burn), disruption of ecosystem caused by deforestation (water pollution)

The concepts appearing in these scenarios were extracted, generalized, and added to the ontology.

Example: Air pollution, forest fires, soil deterioration, and water pollution are attributed to intentional burning when forest is logged or sugarcane is harvested during farmland development for biofuel crops.

43

Page 44: Ontology Engineering for Big Data

Verification of Ontology Exploration Tool

Verification methods

1) Enrichment of SS ontology

We enriched the SS ontology on the basis of 29 typical scenarios (cases) structured by domain experts in biofuel through literature review and interviews.

2) Verification of scenario reproducing operations

We verified whether the ontology exploration tool could generate conceptual maps that represent the original scenarios.

Result: 93% (27/29) of the scenarios were successfully reproduced as conceptual maps.

[Figure: 29 scenarios (cases) → 27 conceptual maps]

44

Page 45: Ontology Engineering for Big Data

Usage and evaluation of ontology exploration tool

Step 1: Usage for knowledge structuring in sustainability science

Step 2: Verification of the exploring abilities of the ontology exploration tool

Step 3: Experiments for evaluating the ontology exploration tool

1) Whether meaningful maps for domain experts were obtained.

2) Whether meaningful maps other than anticipated maps were obtained.

Anticipated maps: maps that represent the contents of the scenarios anticipated by the ontology developers at the time of ontology construction.

Note: the subjects do not know which scenarios are anticipated.

Page 46: Ontology Engineering for Big Data

Experiment for evaluating ontology exploration tool

Experimental method

1) The four experts generated conceptual maps with the tool in accordance with the condition settings of the given tasks.

2) They removed paths that were apparently inappropriate from the paths of conceptual chains included in the generated maps.

3) They selected paths according to their interests and entered a four-level general evaluation with free comments.

2013/09/03 46

The subjects: 4 experts in different fields.
A: Agricultural economics
B: Social science (stakeholder analysis)
C: Risk analysis
D: Metropolitan environmental planning

Four-level general evaluation:
A: Interesting
B: Important but ordinary
C: Neither good nor poor
D: Obviously wrong

Page 47: Ontology Engineering for Big Data

Experimental results (1)

Path distribution based on general evaluation (per subject: number of selected paths, followed by the non-empty counts for levels A to D in the order they appear in the source table)

Task 1
Expert A: 2 selected; 2
Expert A (second time): 1 selected; 1
Expert B: 7 selected; 4, 1, 2
Expert B (second time): 6 selected; 3, 3
Expert C: 8 selected; 1, 5, 2
Expert D: 3 selected; 1, 1, 1

Task 2
Expert A: 1 selected; 1
Expert B: 6 selected; 5, 1
Expert C: 7 selected; 2, 4, 1
Expert D: 5 selected; 3, 1, 1

Task 3
Expert B: 8 selected; 4, 2, 2
Expert C: 4 selected; 2, 2
Expert D: 3 selected; 3

Total: 61 selected; A 30, B 22, C 8, D 1

Fig. 7 The rate of paths.
(N) Nodes and links included in the paths of anticipated maps
(M) Nodes and links included in the paths generated and selected by the experts
N only: 50, N∩M: 50, M only: 150
The area of each circle represents the number of nodes and links included in the paths. Note that the numbers in the circles are not actual counts but the ratios between the sets of paths.

Page 48: Ontology Engineering for Big Data

Experimental results (1)


Number of maps generated: 13

Number of paths evaluated: 61
A: Interesting 30 (49%)
B: Important but ordinary 22 (36%)
C: Neither good nor poor 8 (13%)
D: Obviously wrong 1 (2%)

A and B together: 52 (85%)

We can conclude that the tool could generate maps or paths sufficiently meaningful for experts.

ONSD2013@ICEC2013
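The percentages on this slide can be recomputed from the reported counts. The following is a minimal sketch, assuming the four counts sum to the 61 evaluated paths and that the 85% figure is the combined share of levels A and B; the variable names are illustrative.

```python
# Counts per evaluation level as reported on the slide.
counts = {
    "A: Interesting": 30,
    "B: Important but ordinary": 22,
    "C: Neither good nor poor": 8,
    "D: Obviously wrong": 1,
}

total = sum(counts.values())

# Share of each level, rounded to whole percent as on the slide.
shares = {label: round(100 * n / total) for label, n in counts.items()}

# Assumption: the slide's 85% is the combined share of levels A and B.
positive = counts["A: Interesting"] + counts["B: Important but ordinary"]
positive_share = round(100 * positive / total)

for label, share in shares.items():
    print(f"{label}: {counts[label]} ({share}%)")
print(f"A and B together: {positive} ({positive_share}%)")
```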

Page 49: Ontology Engineering for Big Data

Experimental results (2)

Quantitative comparison of the anticipated maps with the maps generated by the subjects

2013/09/03 49

(N) Nodes and links included in the paths of anticipated maps
(M) Nodes and links included in the paths generated and selected by the experts
N only: 50, N∩M: 50, M only: 150

About 75% of the paths in the generated maps are new paths that were not anticipated from the typical scenarios.

This is meaningful enough to claim positive support for the developed tool. It suggests that the tool has a sufficient possibility of presenting unexpected contents and stimulating conception by the user.

About half (50%) of the paths included in the anticipated maps were included in the maps generated by the experts.

ONSD2013@ICEC2013
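The two rates quoted on this slide follow from the relative sizes in the figure. Below is a small sketch, assuming the figure's three numbers mean 50 for N only, 50 for N∩M, and 150 for M only; the integer sets are placeholders standing in for nodes and links, since the slide gives ratios rather than actual counts.

```python
def overlap_rates(anticipated, generated):
    """Return (share of generated items that are new, share of anticipated items covered)."""
    shared = anticipated & generated
    new_rate = len(generated - anticipated) / len(generated)
    coverage = len(shared) / len(anticipated)
    return new_rate, coverage


# Placeholder elements sized to match the figure's ratios.
anticipated = set(range(0, 100))   # N: 50 only in N + 50 shared
generated = set(range(50, 250))    # M: 50 shared + 150 only in M

new_rate, coverage = overlap_rates(anticipated, generated)
print(f"new paths in generated maps: {new_rate:.0%}")
print(f"anticipated paths reproduced: {coverage:.0%}")
```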

Page 50: Ontology Engineering for Big Data

Summary: Use ontology to bridge datasets across domains

Basic technology
• Terms (classes/instances) defined in the ontology are used as a common vocabulary for searching data.
• If the ontology has mappings to multiple DBs, the user can search across them.

Motivation and issue
• Combinations of multiple datasets could be valuable for Big Data analysis.
• However, obtaining all combinations across multiple Big Data sets is not realistic because of their size.
• Users' requests differ greatly according to their interests.

Ontology Engineering for Big Data to solve the issue
• Ontology exploration contributes to obtaining meaningful combinations (= viewpoints) according to the users' interests.

2013/09/03 ONSD2013@ICEC2013 50
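The basic technology summarized on this slide, searching several databases through one ontology term, can be sketched as follows. This is a toy illustration only: the term, the database names, the local keywords, and the records are all hypothetical and are not taken from the talk.

```python
# Hypothetical mapping: ontology term -> (database name, local keyword) pairs.
TERM_MAPPINGS = {
    "Biofuel": [("energy_db", "biofuel"), ("agri_db", "bio-fuel crop")],
}

# Hypothetical databases, each using its own local vocabulary.
DATABASES = {
    "energy_db": [
        {"keyword": "biofuel", "title": "Ethanol production statistics"},
        {"keyword": "coal", "title": "Coal consumption statistics"},
    ],
    "agri_db": [
        {"keyword": "bio-fuel crop", "title": "Sugarcane planted area"},
    ],
}


def search_across(term):
    """Collect records from every database that the ontology term is mapped to."""
    hits = []
    for db_name, local_keyword in TERM_MAPPINGS.get(term, []):
        for row in DATABASES[db_name]:
            if row["keyword"] == local_keyword:
                hits.append((db_name, row["title"]))
    return hits


print(search_across("Biofuel"))
```

The user only needs the ontology term; the mapping table hides the fact that each database names the concept differently.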

Page 51: Ontology Engineering for Big Data

Case studies

• Use ontology to bridge datasets across domains
  Understanding an Ontology through Divergent Exploration
  Presented at ESWC2011

• Use ontology to combine deep domain knowledge and raw data
  Japanese Medical Ontology project
  Disease ontology and Ontology of Abnormal State
  Presented at ICBO (International Conference on Biomedical Ontology) 2011, 2012 and 2013

2013/09/03 52ONSD2013@ICEC2013

Page 52: Ontology Engineering for Big Data

Medical ontology project in Japan

Developed ontologies
• Disease ontology: definitions of diseases as causal chains of abnormal states. 6,000+ diseases
• Anatomy ontology: connections between blood vessels, nerves, and bones. 10,000+

It is based on ontological frameworks (an upper-level ontology) that can be applied to other domains:
• Models for causal chains
• Abnormal state ontology for data integration
• General framework to define complicated structures

2013/09/03 53ONSD2013@ICEC2013

Page 53: Ontology Engineering for Big Data

Disease Ontology

Definition of the disease ontology

How to connect the disease ontology to medical database

2013/09/03 54ONSD2013@ICEC2013

Page 54: Ontology Engineering for Big Data

An example of the causal chains that constitute diabetes.

[Figure: causal chains of disorders. Legend: disorder (node), causal relationship (link), core causal chain of a disease (each color represents a disease). Nodes shown: Long-term steroid treatment, Destruction of pancreatic beta cells, Lack of insulin in the blood, Deficiency of insulin, Elevated level of glucose in the blood, Loss of sight, and further possible causes and effects. Diseases shown: Diabetes, Type I diabetes, Steroid diabetes, Diabetes-related blindness.]

Is-a relation between diseases using chain-inclusion relationship between causal chains

ONSD2013@ICEC2013
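The chain-inclusion idea on this slide can be illustrated with a short sketch: a disease is treated as a subtype of another when its causal chain strictly contains the other's core chain. The node names come from the slide, but the exact links between them are assumptions made for illustration, since the figure itself is not recoverable from the transcript.

```python
# Each disease is a set of (cause, effect) links. The links are illustrative assumptions.
CORE = ("Deficiency of insulin", "Elevated level of glucose in the blood")

CHAINS = {
    "Diabetes": {CORE},
    "Type I diabetes": {
        ("Destruction of pancreatic beta cells", "Lack of insulin in the blood"),
        ("Lack of insulin in the blood", "Deficiency of insulin"),
        CORE,
    },
    "Steroid diabetes": {
        ("Long-term steroid treatment", "Deficiency of insulin"),
        CORE,
    },
    "Diabetes-related blindness": {
        CORE,
        ("Elevated level of glucose in the blood", "Loss of sight"),
    },
}


def is_a(sub, sup):
    """True if the chain of `sub` strictly includes the chain of `sup`."""
    return CHAINS[sup] < CHAINS[sub]


def subtypes(disease):
    """All diseases whose chains strictly include the chain of `disease`."""
    return sorted(d for d in CHAINS if is_a(d, disease))


print(subtypes("Diabetes"))
```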

Page 55: Ontology Engineering for Big Data

Is-a hierarchy of Abnormality Ontology

[Figure: three-layer is-a hierarchy of abnormal states.
Layer 1: Generic Abnormal States (object-independent): Structural abnormality, Material abnormality, Size abnormality, Formational abnormality, Conformational abnormality, Topological abnormality, Small in size, Large in size, Small in line, Small in area, Small in volume, ...
Layer 2: Object-dependent Abnormal States: Narrowing tube (tube-dependent), Narrowing of valve, Vascular stenosis (blood vessel dependent), Gastrointestinal tract stenosis, Arterial stenosis, Intestinal stenosis, Esophageal stenosis, Coronary stenosis, ...
Layer 3: Specific context-dependent (disease-dependent) Abnormal States: Coronary stenosis in Angina pectoris, Coronary stenosis in Arteriosclerosis, Intestinal stenosis in Ileus, Esophageal stenosis in Esophagitis]

ONSD2013@ICEC2013

Page 56: Ontology Engineering for Big Data

Medical Department                 No. of Abnormal states   No. of Diseases
Allergy and Rheumatoid             1,195                    87
Cardiovascular Medicine            3,052                    546
Diabetes and Metabolic Diseases    1,989                    445
Orthopedic Surgery                 1,883                    198
Nephrology and Endocrinology       1,706                    198
Neurology                          2,960                    396
Digestive Medicine                 1,125                    233
Respiratory Medicine               1,739                    788
Ophthalmology                      1,306                    561
Hematology and Oncology            354                      415
Dermatology                        908                      1,086
Pediatrics                         2,334                    879
Otorhinolaryngology                1,118                    470
Total                              21,669                   6,302

Disease chains Graphical Tool

Hozo-Ontology Editor

Clinicians from 13 medical departments describe causal chains of diseases:
• 6,302 diseases
• 21,669 abnormal states

ONSD2013@ICEC2013

Page 57: Ontology Engineering for Big Data


Each clinician defines diseases in terms of causal chains at his/her division.

[Figure: causal chains of abnormal states connected by causal relationships; example disease: Myocardial Infarction]

Page 58: Ontology Engineering for Big Data


• Using the three-layer model of the abnormality ontology

• Combining causal chains including the same or related abnormal states by consulting is-a hierarchy

⇒Generic causal chains can be generated.
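One way to read the two bullets above is as follows: lift each context-dependent abnormal state to its parent in the is-a hierarchy, then merge the chains that now share a state. The sketch below assumes that reading. The two Layer 3 state names appear elsewhere in the slides, while the other states and all causal links are hypothetical examples, not data from the disease ontology.

```python
# Hypothetical is-a links: Layer 3 (context-dependent) state -> Layer 2 parent.
IS_A = {
    "Coronary stenosis in arteriosclerosis": "Coronary stenosis",
    "Coronary stenosis in angina pectoris": "Coronary stenosis",
}


def generalize(state):
    """Replace a context-dependent state by its parent, if one is known."""
    return IS_A.get(state, state)


def generic_chains(department_chains):
    """Merge causal links from all departments after generalizing their states."""
    links = set()
    for chain in department_chains:
        for cause, effect in chain:
            links.add((generalize(cause), generalize(effect)))
    return links


# Hypothetical chains written independently in two departments.
chain_a = [("Accumulation of cholesterol", "Coronary stenosis in arteriosclerosis")]
chain_b = [("Coronary stenosis in angina pectoris", "Myocardial ischemia")]

merged = generic_chains([chain_a, chain_b])
print(sorted(merged))
```

After generalization both chains meet at "Coronary stenosis", so a longer generic chain emerges that neither department described on its own.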

[Figure: causal chains placed on Layer 3, Layer 2, and Layer 1 of the abnormality ontology]

Page 59: Ontology Engineering for Big Data

Each clinician describes the definition of a disease (its causal chains) at a particular department.

Across the 13 medical divisions, all of the approximately 21,000 abnormal states can be visualized with their possible causal relationships.

Page 60: Ontology Engineering for Big Data

Knowledge provided by the Disease Ontology

Definition of disease
It can answer questions such as the following:
• Which abnormal states could be a cause of which diseases?
• What conditions may occur in a patient with the disease?

That is, it can provide base knowledge for analyzing big data related to diseases.

2013/09/03 ONSD2013@ICEC2013 61
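The two questions on this slide can be phrased as lookups over disease chains. The following is a minimal sketch under the assumption that each disease is stored as an ordered list of (cause, effect) links; the state names appear in the diabetes example of this talk, but the links between them are illustrative.

```python
# Illustrative disease chains: disease -> ordered list of (cause, effect) links.
DISEASE_CHAINS = {
    "Type I diabetes": [
        ("Destruction of pancreatic beta cells", "Deficiency of insulin"),
        ("Deficiency of insulin", "Elevated level of glucose in the blood"),
    ],
    "Steroid diabetes": [
        ("Long-term steroid treatment", "Deficiency of insulin"),
        ("Deficiency of insulin", "Elevated level of glucose in the blood"),
    ],
}


def diseases_possibly_caused_by(state):
    """Diseases whose chain contains `state` in a cause position."""
    return sorted(
        disease
        for disease, chain in DISEASE_CHAINS.items()
        if any(cause == state for cause, _ in chain)
    )


def possible_conditions(disease):
    """Abnormal states that may occur in a patient with `disease`, in chain order."""
    states = []
    for cause, effect in DISEASE_CHAINS[disease]:
        for state in (cause, effect):
            if state not in states:
                states.append(state)
    return states


print(diseases_possibly_caused_by("Deficiency of insulin"))
print(possible_conditions("Steroid diabetes"))
```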

Page 61: Ontology Engineering for Big Data

DEMO

• Visualization of the abnormal state ontology with possible causal relationships
  Java client application developed with the HOZO API.

• Disease Chain LOD
  Linked Open Data converted from the disease ontology.
  SPARQL endpoint (web API for queries) and Visualization Tool of Disease Chains in HTML5.
  http://lodc.med-ontology.jp/

2013/09/03 62ONSD2013@ICEC2013

Page 62: Ontology Engineering for Big Data

SPARQL Endpoint

(a) The user can make simple SPARQL queries by selecting a property and an object from lists.

(b) When the user selects a resource shown as a query result, triples connected to the resource are visualized.

(c) The user can also browse connected triples by clicking the rectangles that represent the objects.

2013/09/03 63ONSD2013@ICEC2013
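Step (a) amounts to filling a property and an object into a fixed query pattern. The sketch below shows one plausible form of such a query; it is not the endpoint's actual implementation, and the two example.org URIs are hypothetical placeholders rather than the vocabulary used by the Disease Chain LOD.

```python
def build_select_query(property_uri, object_uri, limit=100):
    """Build a simple SPARQL SELECT for all subjects with the given property and object."""
    for uri in (property_uri, object_uri):
        # Reject characters that would break out of the <...> IRI syntax.
        if any(ch in uri for ch in '<>" {}'):
            raise ValueError(f"not a plain URI: {uri!r}")
    return f"SELECT ?s WHERE {{ ?s <{property_uri}> <{object_uri}> . }} LIMIT {limit}"


# Hypothetical property and object chosen from the lists.
query = build_select_query(
    "http://example.org/prop/hasCause",
    "http://example.org/state/Deficiency_of_insulin",
)
print(query)
```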

Page 63: Ontology Engineering for Big Data

2013/09/03 64ONSD2013@ICEC2013

Page 64: Ontology Engineering for Big Data

[Figure: labels include Abnormal state, Is-a hierarchy, Anomaly representation, Abnormal states layers, Generic chains, Disease chains, knowledge, data, Clinical DB, and attribute ⇔ property interoperability.]

Page 65: Ontology Engineering for Big Data

Summary (2): Disease Ontology

• Disease Ontology
  Provides domain knowledge described by medical experts.

• Medical DB (Big Data)
  Provides evidential data from medical information systems such as electronic medical records.

It could be a good example of combining Ontology and Big Data.

[Figure labels: Existing Knowledge; Evidence / New Knowledge]

ONSD2013@ICEC2013

Page 66: Ontology Engineering for Big Data

Concluding Remarks

Ontology Engineering for Big Data
• Combining them is good!

Basic technology: how to combine ontology with big data
• Mapping the ontology to a database
• Adding metadata to data using vocabulary defined in the ontology
• Converting a database (e.g. RDB) to an ontology-based (RDF) database

How to use combinations of Ontology and Big Data: two possible approaches
• Use ontology to bridge datasets across domains
  Ontology exploration method to obtain meaningful combinations (= viewpoints)
• Use ontology to combine deep domain knowledge and raw data

Future plan
• Generalizing our approaches and feeding them back as new functions of Hozo

2013/09/03 67ONSD2013@ICEC2013
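Converting a relational database to RDF, one of the basic technologies listed above, can be sketched with the Python standard library alone. This is a naive row-to-triple mapping, not the conversion used in the projects described in this talk: the namespace, the table, and the column names are hypothetical, every value is emitted as a plain string literal, and the table name is assumed to come from trusted code.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace for generated URIs


def rows_to_ntriples(conn, table, key_column):
    """Emit one N-Triples line per non-key, non-null cell of the table."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    columns = [d[0] for d in cur.description]
    triples = []
    for row in cur:
        record = dict(zip(columns, row))
        subject = f"<{BASE}{table}/{record[key_column]}>"
        for column, value in record.items():
            if column == key_column or value is None:
                continue
            # Escape backslashes and double quotes inside the literal.
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{BASE}{table}#{column}> "{literal}" .')
    return triples


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE disease (id INTEGER PRIMARY KEY, name TEXT, department TEXT)")
conn.execute(
    "INSERT INTO disease VALUES (1, 'Type I diabetes', 'Diabetes and Metabolic Diseases')"
)

triples = rows_to_ntriples(conn, "disease", "id")
for line in triples:
    print(line)
```

A mapping to an ontology would replace the generated column URIs with properties defined in that ontology; the sketch keeps the column names to stay self-contained.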

Page 67: Ontology Engineering for Big Data

Acknowledgements

A part of this work was supported by JSPS KAKENHI Grant Numbers 24120002 and 22240011.

A part of research on medical ontology is supported by the Ministry of Health, Labor and Welfare, Japan, through its “Research and development of medical knowledge base databases for medical information systems” and by the Japan Society for the Promotion of Science (JSPS) through its “Funding Program for World-Leading Innovative R&D on Science and Technology (FIRST Program)”.

I’m also grateful to all collaborators in each study.

2013/09/03 ONSD2013@ICEC2013 68

Page 68: Ontology Engineering for Big Data

Acknowledgement

2013/09/03

Thank you for your attention!

Hozo Support Site: http://www.hozo.jp/
Contact: [email protected]

69ONSD2013@ICEC2013