Information Artifact Ontology: General Background
Barry Smith

Feb 23, 2016

Page 1: Information Artifact Ontology:  General Background


Information Artifact Ontology: General Background

Barry Smith

Page 2

Slides

http://ncorwiki.buffalo.edu/index.php/STIDS_2013

Page 3

Barry Smith – who am I?

Director: National Center for Ontological Research (Buffalo)
Founder: Ontology for the Intelligence Community (OIC, now STIDS) conference series

Ontology work for:
• NextGen (Next Generation) Air Transportation System
• National Nuclear Security Administration, DoE
• Joint Forces Command Joint Warfighting Center
• Army Net-Centric Data Strategy Center of Excellence
• Army Intelligence and Information Warfare Directorate (I2WD)

and for many national and international biomedical research and healthcare agencies

Page 4

I2WD Ontology Team
• Ron Rudnicki, CUBRC, University at Buffalo
• Dr. Tatiana Malyuta, NY City College of Technology of CUNY; Data Tactics Corp.
• David Salmen, Data Tactics Corp.
• LCOL Dr. William Mandrick, Data Tactics Corp.

Page 5

In the olden days

people measured lengths using inches, ulnas, perches, king’s feet, Swiss feet, leagues of Paris, etc., etc.

Page 6

On June 22, 1799, in Paris, everything changed

Page 7

International System of Units (SI)

Page 8

Making data (re-)usable through standard terminologies
• Standards provide
  – common structure and terminology
  – a single data source for review (less redundant data)
• Standards allow
  – use of common tools and techniques
  – common training
  – single validation of data

Page 9

One successful part of the solution to this problem = Ontologies

controlled vocabularies (nomenclatures) plus definitions of terms in a logical language

Standardized (logically defined) terms in an ontology are the equivalent of standardized units in the SI

Page 10

Ontologies
• are computer-tractable representations of types in specific areas of reality
• are more and less general (upper and lower ontologies)
  – upper = organizing ontologies
  – lower = domain ontology modules

Page 11

Linked Open Data are not enough

Page 12

Links are inconsistently defined; ontologies are full of redundancies

Page 13

Towards coordination of modular non-redundant ontologies

Page 14

[Figure: coordinated ontology modules, arranged by relation to time (continuant, independent or dependent; occurrent) and by granularity]
• Organ and organism: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO); Organ Function (FMP, CPRO); Phenotypic Quality (PaTO); Biological Process (GO)
• Cell and cellular component: Cell (CL); Cellular Component (FMA, GO); Cellular Function (GO)
• Molecule: Molecule (ChEBI, SO, RnaO, PrO); Molecular Function (GO); Molecular Process (GO)
• Environments: Environment Ontology (EnvO)

Page 15

OBO Foundry approach extended into other domains
• NIF Standard: Neuroscience Information Framework
• IDO Consortium: Infectious Disease Ontology
• cROP: Common Reference Ontologies for Plants
• MilPortal.org: Military Ontology
• AIRS Ontology Suite: Intelligence Ontology Suite


Page 19

Horizontal Integration of Big Intelligence Data
The Role of Ontology in the Era of Big Data

T. Malyuta, Ph.D., New York City College of Technology, NY, NY
B. Smith, Ph.D., University at Buffalo, Buffalo, NY
R. Rudnicki, CUBRC, Buffalo, NY

Page 21

Big Data Problem
• Wikipedia defines Big Data as "…a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools."
• Gartner defines Big Data with three 'V's:
  – Volume
  – Velocity (of production and analysis)
  – Variety
  – Recently a fourth 'V', Veracity, was added
• This means that Big Data are beyond our control, unlike large, complex systems with diverse and changing data whose complexity is known

Page 22

Big Data Solution – Agility
• Dimensions of agility
  – Storage paradigms that accommodate massive volumes of heterogeneous data
  – Data processing paradigms that can deal with the massive volumes of heterogeneous data coming onstream
  – Dynamic data stores that can easily accommodate diverse and a priori unknown data types and semantics
  – Methods and tools that leverage dynamic and diverse content

Page 23

The Problem of Horizontal Integration of Big Intelligence Data
• HI =def. the ability to exploit multiple data sources as if they were one
• Recognized issues for HI with existing approaches
  – Data silos
  – Lexicon/semantics silos
• Requirement for HI of Big Intelligence Data: Agile Semantic Interoperability
  – A strategy for HI must be agile in the sense that it can be quickly extended to new zones of emerging data according to need
  – Ontology allows an incremental approach, delivering a big bang already from the very first buck (as we showed in the project described below)
  – Ontology can provide the needed agility

Page 24

Agile Semantic Interoperability
• A good solution has to be
  – Able to grow incrementally
  – Able to be developed in a distributed manner
  – Without losing consistency
  – Independent of particular implementations, and of data producers and consumers
  – Applicable to data in an agile manner
• We call our solution 'semantic enhancement' (SE) of data

Page 25

Explication vs. Annotation
• Explication of general terms used in source intelligence artifacts and in data models, terminologies, and doctrinal publications, which provide typologies of intelligence-related information artifacts (IAs), to semantically enhance data in a way that enables computational integration and reasoning
• Annotation of the instance-level information captured by such IAs, to aid retrieval of information about specific persons, groups, events, documents, images, and so forth

Page 26

SE
• SE is realized with the help of ontologies that are used to explicate data models and annotate data instances
  – The vocabulary of the ontologies used for explications and annotations provides agile horizontal integration
  – Ontologies, by virtue of their nature and organization, provide semantic enhancement of data

[Figure: a source table explicated by an ontology fragment]
PersonID | Name | Description
111 | Java | Programming
222 | SQL | Database
Ontology types shown: Skill, ComputerSkill, ProgrammingSkill, SQL, Java, C++; Education, TechnicalEducation
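The explication/annotation split on this slide can be sketched in a few lines of Python (3.9+). This is a hypothetical illustration, not the SE tooling itself: `rows` mirrors the slide's table, while the parent links in `IS_A` (SQL, Java and C++ under ProgrammingSkill, under ComputerSkill, under Skill), the `EXPLICATIONS` entry, and the assumption that a cell's text equals an ontology label are all choices made for the example.

```python
# Hypothetical sketch: explication and annotation kept apart from the source data.

# Source rows, copied from the slide's table; SE never modifies them.
rows = [
    {"PersonID": 111, "Name": "Java", "Description": "Programming"},
    {"PersonID": 222, "Name": "SQL", "Description": "Database"},
]

# Assumed ontology fragment, as child type -> parent type.
IS_A = {
    "SQL": "ProgrammingSkill",
    "Java": "ProgrammingSkill",
    "C++": "ProgrammingSkill",
    "ProgrammingSkill": "ComputerSkill",
    "ComputerSkill": "Skill",
    "TechnicalEducation": "Education",
}

# Explication: a schema-level statement of what a field's values are.
EXPLICATIONS = {"Name": "Skill"}


def ancestors(term: str) -> list[str]:
    """Return every type above `term` in IS_A, nearest first."""
    chain = []
    while term in IS_A:
        term = IS_A[term]
        chain.append(term)
    return chain


def annotate(rows: list[dict], field: str) -> dict[int, str]:
    """Annotation: link each row's value to an ontology term, keeping only
    terms that fall under the type the field was explicated as."""
    expected = EXPLICATIONS[field]
    links = {}
    for row in rows:
        term = row[field]  # assumes the cell text equals the ontology label
        if term == expected or expected in ancestors(term):
            links[row["PersonID"]] = term
    return links


annotations = annotate(rows, "Name")
print(annotations)  # {111: 'Java', 222: 'SQL'}
```

Because the explication and the annotations live in separate structures, the table itself is never edited, which is the arm's-length property the slides emphasize.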

Page 27

The Meaning of 'Enhancement'
• Semantic enhancement/enrichment of data = arm's-length approach (no change to data)
  – Through simple explication, we associate an entire knowledge system with a database field
  – This enables analytics to process data, e.g. about computer skills, "vertically" along the Skill hierarchy, as well as "horizontally" via relations between Skill and Education
  – And further: while the data in the database do not change, their analysis can become richer and richer as our understanding of reality changes
• For this richness to be leveraged by different communities, persons, and applications, the approach must have the properties mentioned above and be constructed in accordance with the principles of SE
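A minimal sketch of the "vertical" and "horizontal" analysis described above, assuming annotations like those for the slide's table have already been made. Person 333 with a `Welding` skill, the `IS_A` edges, and the `IMPARTED_BY` relation between Skill and Education types are hypothetical, chosen only to make the two directions of traversal visible.

```python
# Hypothetical sketch of "vertical" and "horizontal" analysis over annotated
# data. All names, edges, and the relation are illustrative only.

# Annotations produced by SE: person id -> skill term (source data unchanged).
SKILL_OF = {111: "Java", 222: "SQL", 333: "Welding"}

# Assumed Skill hierarchy (child -> parent) and Education hierarchy.
IS_A = {
    "Java": "ProgrammingSkill",
    "SQL": "ProgrammingSkill",
    "ProgrammingSkill": "ComputerSkill",
    "ComputerSkill": "Skill",
    "Welding": "Skill",
    "TechnicalEducation": "Education",
}

# Assumed cross-hierarchy relation: which education type imparts which skill type.
IMPARTED_BY = {"ComputerSkill": "TechnicalEducation"}


def subsumed_by(term: str, ancestor: str) -> bool:
    """True if `term` is `ancestor` or lies below it in IS_A."""
    while True:
        if term == ancestor:
            return True
        if term not in IS_A:
            return False
        term = IS_A[term]


def people_with(skill_type: str) -> list[int]:
    """Vertical: everyone whose annotated skill falls under `skill_type`."""
    return sorted(p for p, s in SKILL_OF.items() if subsumed_by(s, skill_type))


def related_education(person: int) -> set[str]:
    """Horizontal: follow IMPARTED_BY from any type above the person's skill."""
    return {edu for skill, edu in IMPARTED_BY.items()
            if subsumed_by(SKILL_OF[person], skill)}


print(people_with("ComputerSkill"))  # [111, 222]
print(related_education(111))        # {'TechnicalEducation'}
print(related_education(333))        # set()
```

Since the hierarchy and the relation sit outside the database, refining them (for instance, adding a new subtype of ProgrammingSkill) changes what these queries return without touching `SKILL_OF`, which matches the slide's point that analysis can grow richer while the data stay the same.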

Page 28

SE Principles
– Create a Shared Semantic Resource (SSR) of ontologies to be used for explication and annotation
– Establish an agile strategy for building ontologies within this SSR, and apply and extend these ontologies to explicate and annotate new source data as they come onstream
– Problem: Given the immense and growing variety of data sources, the development methodology must be applied by multiple different groups
– How to manage collaboration?

Page 29

Achieving the Goal
• Methodology of incremental distributed ontology development
• A common ontology architecture incorporating a common, domain-neutral, upper-level ontology (BFO, Basic Formal Ontology)
• A shared governance and change management process
• A simple, repeatable process for ontology development
• An ontology registry
• A process of intelligence data capture through explication of source data models

Page 30

Main Methodological Points
• Ontological realism*
  – Based on doctrine / science
  – Involves SMEs (subject matter experts) in label selection and definition
  – Thoroughly tested in many projects
• Arm's-length process, with minimal disturbance to existing data and data semantics
• Reference ontologies
  – capture generic content and are designed for aggressive reuse in multiple different types of context
  – a single reference ontology for each domain of interest
• Application ontologies
  – are tied to specific local applications
  – An application ontology is created by combining local content with generic content taken from relevant reference ontologies
  – Still interoperable, because based on a common set of reference ontologies

* Barry Smith and Werner Ceusters, "Ontological Realism as a Methodology for Coordinated Evolution of Scientific Ontologies", Applied Ontology, 5 (2010), 139–188.

Page 31

Arm's-length Process
• Focusing on the terms (labels, acronyms, codes) used in our source data
• Where multiple distinct terms {t1, …, tn} are used in separate data sources with one and the same meaning, they are associated with a single preferred label drawn from a standard set of such labels
• All the separate data items associated with {t1, …, tn} are thereby linked together through the corresponding preferred label
• Preferred labels form the basis for the ontologies we build

[Figure: SE ontology labels linking heterogeneous contents (ABC, KLM, XYZ)]
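The preferred-label step can be sketched as a mapping from (source, local term) pairs to a single preferred label, after which data items from all sources group together. The names ABC, KLM and XYZ come from the slide's figure; every local term, record identifier and preferred label below is invented for illustration, and this is not a description of the actual SE tooling.

```python
# Hypothetical sketch of the arm's-length labeling step. Local terms,
# record ids, and preferred labels are invented for illustration.
from collections import defaultdict

# Each source keeps its own terms; SE only records how they map.
SOURCE_DATA = {
    "ABC": [("veh", "record-1"), ("pers", "record-2")],
    "KLM": [("VEHICLE", "doc-17"), ("org", "doc-18")],
    "XYZ": [("auto", "msg-9"), ("individual", "msg-4")],
}

# Distinct terms {t1, ..., tn} with one and the same meaning -> one preferred label.
PREFERRED_LABEL = {
    ("ABC", "veh"): "vehicle",
    ("KLM", "VEHICLE"): "vehicle",
    ("XYZ", "auto"): "vehicle",
    ("ABC", "pers"): "person",
    ("XYZ", "individual"): "person",
}


def link_by_preferred_label(source_data, mapping):
    """Group data items from all sources under their preferred label.
    Items whose local term has no mapping yet are reported separately."""
    linked, unmapped = defaultdict(list), []
    for source, items in source_data.items():
        for term, item in items:
            label = mapping.get((source, term))
            if label is None:
                unmapped.append((source, term))
            else:
                linked[label].append((source, item))
    return dict(linked), unmapped


linked, unmapped = link_by_preferred_label(SOURCE_DATA, PREFERRED_LABEL)
print(linked["vehicle"])  # [('ABC', 'record-1'), ('KLM', 'doc-17'), ('XYZ', 'msg-9')]
print(unmapped)           # [('KLM', 'org')]
```

The mapping is keyed by (source, term) rather than by the bare string, since the same string may carry different meanings in different sources; unmapped terms surface as the work item for the next increment.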

Page 32

Reference and Application Ontologies

Reference Ontology
vehicle =def. an object used for transporting people or goods
tractor =def. a vehicle that is used for towing
crane =def. a vehicle that is used for lifting and moving heavy objects
vehicle platform =def. a means of providing mobility to a vehicle
wheeled platform =def. a vehicle platform that provides mobility through the use of wheels
tracked platform =def. a vehicle platform that provides mobility through the use of continuous tracks

Application Definitions
artillery vehicle =def. a vehicle designed for the transport of one or more artillery weapons
wheeled tractor =def. a tractor that has a wheeled platform
tracked tractor =def. a tractor that has a tracked platform
artillery tractor =def. an artillery vehicle that is a tractor
wheeled artillery tractor =def. an artillery tractor that has a wheeled platform
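The genus-differentia pattern in these definitions ("a G that D") can be made executable: a thing instantiates a type if it instantiates the genus and satisfies the differentia. The sketch below (Python 3.9+) encodes the slide's definitions that way. The attribute names on an individual (`used_for`, `platform`, and so on) are assumptions made for illustration, not part of any published ontology, and the platform types are folded into simple attribute values.

```python
# Hypothetical sketch: the slide's genus-differentia definitions encoded as
# (genus, test) pairs and used to classify an individual. Attribute names
# on the individual are illustrative assumptions.
from typing import Callable, Optional

Individual = dict
Definition = tuple[Optional[str], Callable[[Individual], bool]]

DEFINITIONS: dict[str, Definition] = {
    # reference ontology
    "vehicle": (None, lambda x: x.get("transports_people_or_goods", False)),
    "tractor": ("vehicle", lambda x: "towing" in x.get("used_for", ())),
    "crane": ("vehicle", lambda x: "lifting" in x.get("used_for", ())),
    # application definitions
    "artillery vehicle": ("vehicle",
                          lambda x: "artillery weapon" in x.get("designed_to_transport", ())),
    "wheeled tractor": ("tractor", lambda x: x.get("platform") == "wheeled platform"),
    "tracked tractor": ("tractor", lambda x: x.get("platform") == "tracked platform"),
    "artillery tractor": ("artillery vehicle", lambda x: instance_of(x, "tractor")),
    "wheeled artillery tractor": ("artillery tractor",
                                  lambda x: x.get("platform") == "wheeled platform"),
}


def instance_of(x: Individual, type_name: str) -> bool:
    """x instantiates a type if it satisfies both the genus and the differentia."""
    genus, differentia = DEFINITIONS[type_name]
    return (genus is None or instance_of(x, genus)) and differentia(x)


def classify(x: Individual) -> list[str]:
    """Every defined type that x instantiates, in definition order."""
    return [t for t in DEFINITIONS if instance_of(x, t)]


gun_tractor = {
    "transports_people_or_goods": True,
    "used_for": {"towing"},
    "designed_to_transport": {"artillery weapon"},
    "platform": "wheeled platform",
}
print(classify(gun_tractor))
# ['vehicle', 'tractor', 'artillery vehicle', 'wheeled tractor',
#  'artillery tractor', 'wheeled artillery tractor']
```

Note that the application definitions mention only reference terms (vehicle, tractor, platform types), which is what keeps two application ontologies built this way interoperable.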

Page 33

Illustration of Ontology Types (Toy Example)
[Figure: hierarchy of Vehicle, Tractor, Artillery Vehicle, Wheeled Tractor, Artillery Tractor, and Wheeled Artillery Tractor]
Black – reference ontologies
Red – application ontologies
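Read as a graph, the toy example is a hierarchy with multiple inheritance. The sketch below assumes parent links derived from the preceding slide's definitions. The link from Wheeled Artillery Tractor to Wheeled Tractor is inferred: an artillery tractor is a tractor, and this one has a wheeled platform. Because the figure's colours are lost, the reference/application split also follows the preceding slide's grouping, which is an assumption.

```python
# Hypothetical sketch of the toy hierarchy as a directed acyclic graph.
# Parent links are read off the preceding slide's definitions; the
# reference/application split follows that slide's two groups.

PARENTS = {
    "Vehicle": [],
    "Tractor": ["Vehicle"],
    "Artillery Vehicle": ["Vehicle"],
    "Wheeled Tractor": ["Tractor"],
    "Artillery Tractor": ["Artillery Vehicle", "Tractor"],
    "Wheeled Artillery Tractor": ["Artillery Tractor", "Wheeled Tractor"],
}
REFERENCE = {"Vehicle", "Tractor"}


def ancestors(node: str) -> set[str]:
    """All types reachable upward from `node` (multiple parents allowed)."""
    seen, stack = set(), list(PARENTS[node])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen


def layer(node: str) -> str:
    return "reference" if node in REFERENCE else "application"


# Every application type should ultimately rest on reference content.
assert all(ancestors(n) & REFERENCE for n in PARENTS if layer(n) == "application")

for node in PARENTS:
    refs = sorted(ancestors(node) & REFERENCE)
    print(f"{node:<26} {layer(node):<12} rests on {refs}")
```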

Page 34

Role of Reference Ontologies
• Normalized
  – Maintains a set of consistent ontologies
  – Eliminates redundancy
• Modular
  – A set of plug-and-play ontology modules
  – Enables distributed, consistent development
• Surveyable
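A normalized, modular set of reference ontologies lends itself to mechanical checks. This hypothetical check (Python 3.9+) treats each module as the set of terms it defines plus the modules it imports. It flags any term defined in more than one module (redundancy) and any import of an unknown module. Module names and contents are invented, and the duplicate `tractor` is planted to show a hit.

```python
# Hypothetical sketch of a "normalized" check across plug-and-play modules:
# every term should be defined in exactly one module and reused elsewhere by
# import. Module names and contents are invented for illustration.
from collections import defaultdict

MODULES = {
    "vehicle-ro": {"defines": {"vehicle", "tractor", "crane"}, "imports": set()},
    "platform-ro": {"defines": {"vehicle platform", "wheeled platform"}, "imports": set()},
    "artillery-app": {
        # "tractor" is redefined here instead of being imported: a redundancy
        "defines": {"artillery vehicle", "artillery tractor", "tractor"},
        "imports": {"vehicle-ro", "platform-ro"},
    },
}


def redundant_terms(modules: dict) -> dict[str, list[str]]:
    """Map each term defined by more than one module to those modules."""
    owners = defaultdict(list)
    for name, module in modules.items():
        for term in module["defines"]:
            owners[term].append(name)
    return {t: sorted(m) for t, m in owners.items() if len(m) > 1}


def missing_imports(modules: dict) -> list[str]:
    """Imports that point at no known module."""
    return sorted(i for m in modules.values() for i in m["imports"] if i not in modules)


print(redundant_terms(MODULES))  # {'tractor': ['artillery-app', 'vehicle-ro']}
print(missing_imports(MODULES))  # []
```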

Page 35

SE Architecture
• The Upper Level Ontology (ULO) in the SE hierarchy must be maximally general (no overlap with domain ontologies)
• The Mid-Level Ontologies (MLOs) introduce successively less general and more detailed representations of types that arise in successively narrower domains, until we reach the Lowest Level Ontologies (LLOs)
• The LLOs are maximally specific representations of the entities in a particular one-dimensional domain
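One consequence of this layering can be checked automatically. A term may specialize only terms at its own level or at a more general one, so nothing in the ULO should sit beneath a domain term. The sketch below is hypothetical: the level assignments are invented, the ULO entries simply borrow BFO class names, and the misplaced `armored vehicle` is planted to show a violation.

```python
# Hypothetical sketch of the ULO / MLO / LLO layering rule: an is_a link may
# only point to a parent at the same or a more general level. Terms and
# level assignments are illustrative; ULO entries borrow BFO class names.

LEVEL = {"ULO": 0, "MLO": 1, "LLO": 2}  # smaller number = more general

TERMS = {
    # term: (level, parent)
    "entity": ("ULO", None),
    "continuant": ("ULO", "entity"),
    "material entity": ("ULO", "continuant"),
    "vehicle": ("MLO", "material entity"),
    "tractor": ("MLO", "vehicle"),
    "artillery tractor": ("LLO", "tractor"),
    "armored vehicle": ("ULO", "vehicle"),  # misplaced: domain term in the upper level
}


def layering_violations(terms: dict) -> list[str]:
    """Report every is_a link whose parent sits at a more specific level."""
    problems = []
    for term, (level, parent) in terms.items():
        if parent is None:
            continue
        parent_level = terms[parent][0]
        if LEVEL[parent_level] > LEVEL[level]:
            problems.append(f"{term} ({level}) is_a {parent} ({parent_level})")
    return problems


print(layering_violations(TERMS))
# ['armored vehicle (ULO) is_a vehicle (MLO)']
```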

Page 36

Challenges to HI
• Too many lexicons
• The scope of the domain: signal, sensor, image, … intelligence about … the whole world
• Difficult to conduct governance and management of ontology development to ensure consistent evolution
• Lack of expertise
• Complexity of the ontology development and application process

Page 37

Preventing Failure
• The method we use offers solutions to some of the common reasons for failure
• Lack of Consensus
  – Realism offers an objective standard for settling disputes over terminology. Ontology development becomes an empirical science instead of an exercise in the publication of dialects
  – Governance helps to resolve conflicts and achieve consensus
• High Maintenance
  – Arm's-length implementation places no additional overhead onto applications
• Parochialism
  – Architecture and methodology prevent development of vocabularies that apply only to a single perspective
• Poor Quality
  – Experience prevents common mistakes in vocabularies that cause downstream problems with search and analytics

Page 38

Preventing Failure (cont.)
• Agile ontology development
  – Methodology and architecture
  – Growing SSR
• Agile ontology application
  – Incremental
  – Semi-automated where possible
  – Even if not as fast as some want it to be
• It is still faster than creating a physical store, which would be just another silo and would still need to be integrated with the rest of the data
• Once a data collection is semantically enhanced, it is integrated with all data that have been and will be semantically enhanced, without any additional effort

Page 39

What is Next…
– IAO-Intel: An Information Artifact Ontology for the Intelligence Community (B. Smith)
– A Survey of DCGS-A Ontology Work and Explicating and Annotating Processes (R. Rudnicki)
– Email Ontology: illustration of the methodology of ontology design and of the IAO-Intel (D. Salmen and W. Mandrick)

Page 40

References
• Barry Smith, Tatiana Malyuta, William S. Mandrick, Chia Fu, Kesny Parent, Milan Patel, "Horizontal Integration of Warfighter Intelligence Data: A Shared Semantic Resource for the Intelligence Community", STIDS Conference, 2012.
• Barry Smith, Tatiana Malyuta, David Salmen, William Mandrick, Kesny Parent, Shouvik Bardhan, Jamie Johnson, "Ontology for the Intelligence Analyst", Crosstalk: The Journal of Defense Software Engineering, 2012.
• David Salmen, Tatiana Malyuta, Alan Hansen, Shaun Cronen, Barry Smith, "Integration of Intelligence Data through Semantic Enhancement", STIDS Conference, 2011.