Information Artifact Ontology: General Background
Barry Smith
Feb 23, 2016
1
2
Slides
http://ncorwiki.buffalo.edu/index.php/STIDS_2013
3
Barry Smith – who am I?
Director: National Center for Ontological Research (Buffalo)
Founder: Ontology for the Intelligence Community (OIC, now STIDS) conference series
Ontology work for
NextGen (Next Generation) Air Transportation System
National Nuclear Security Administration, DoE
Joint-Forces Command Joint Warfighting Center
Army Net-Centric Data Strategy Center of Excellence
Army Intelligence and Information Warfare Directorate (I2WD)
and for many national and international biomedical research and healthcare agencies
4
I2WD Ontology Team
Ron Rudnicki, CUBRC, University at Buffalo
Dr. Tatiana Malyuta, NY City College of Technology of CUNY, Data Tactics Corp.
David Salmen, Data Tactics Corp.
LCOL Dr. William Mandrick, Data Tactics Corp.
5
In the olden days
people measured lengths using inches, ulnas, perches, king’s feet, Swiss feet, leagues of Paris, etc., etc.
6
On June 22, 1799, in Paris, everything changed
7
International System of Units (SI)
8
Making data (re-)usable through standard terminologies
• Standards provide
– common structure and terminology
– single data source for review (less redundant data)
• Standards allow
– use of common tools and techniques
– common training
– single validation of data
9
One successful part of the solution to this problem = Ontologies
controlled vocabularies (nomenclatures) plus definitions of terms in a logical language
Standardized (logically defined) terms in an ontology are the equivalent of standardized units in the SI
10
Ontologies
• are computer-tractable representations of types in specific areas of reality
• are more and less general (upper and lower ontologies)
– upper = organizing ontologies
– lower = domain ontology modules
11
Linked Open Data are not enough
12
Links are inconsistently defined; ontologies are full of redundancies
13
Towards coordination of modular non-redundant ontologies
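RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT
GRANULARITY: ORGAN AND ORGANISM
– Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– Occurrent: Biological Process (GO)
GRANULARITY: CELL AND CELLULAR COMPONENT
– Independent continuant: Cell (CL); Cellular Component (FMA, GO)
– Dependent continuant: Cellular Function (GO)
GRANULARITY: MOLECULE
– Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– Dependent continuant: Molecular Function (GO)
– Occurrent: Molecular Process (GO)
Environments: Environment Ontology (EnvO)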
16
OBO Foundry approach extended into other domains
17
NIF Standard: Neuroscience Information Framework
IDO Consortium: Infectious Disease Ontology
cROP: Common Reference Ontologies for Plants
MilPortal.org: Military Ontology
AIRS Ontology Suite: Intelligence Ontology Suite
18
19
20
Horizontal Integration of Big Intelligence Data
The Role of Ontology in the Era of Big Data
T. Malyuta, Ph.D., New York City College of Technology, NY, NY
B. Smith, Ph.D., University at Buffalo, Buffalo, NY
R. Rudnicki, CUBRC, Buffalo, NY
23
http://ncorwiki.buffalo.edu/index.php/Main_Page#Documents
Big Data Problem
• Wikipedia defines Big Data as “…a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools.”
• Gartner defines Big Data with three ‘V’s:
– Volume
– Velocity (of production and analysis)
– Variety
– Recently a fourth ‘V’, Veracity, was added
• This means that Big Data are beyond our control (as opposed to those complex and big systems with diverse and changing data where the complexity is known)
24
Big Data Solution – Agility
• Dimensions of agility
– Storage paradigms that accommodate massive volumes of heterogeneous data
– Data processing paradigms that can deal with the massive volumes of heterogeneous data coming onstream
– Dynamic data stores that can easily accommodate diverse and a priori unknown data types and semantics
– Methods and tools that leverage dynamic and diverse content
25
The Problem of Horizontal Integration of Big Intelligence Data
• HI =Def. the ability to exploit multiple data sources as if they were one
• Recognized issues for HI with existing approaches
– Data silos
– Lexicon/semantics silos
• Requirement for HI of Big Intelligence Data: Agile Semantic Interoperability
– A strategy for HI must be agile in the sense that it can be quickly extended to new zones of emerging data according to need
– Ontology allows an incremental approach, with a big bang for the buck from the very start (as we showed on the project described below)
– Ontology can provide the needed agility
28
Agile Semantic Interoperability
• A good solution has to be
– Able to grow incrementally
– Able to be developed in a distributed manner
– Without losing consistency
– Independent of particular implementations, and of data producers and consumers
– Applicable to data in an agile manner
• We call our solution: ‘semantic enhancement’ (SE) of data
29
Explication vs. Annotation
• Explication of general terms used in source intelligence artifacts and in data models, terminologies and doctrinal publications which provide typologies of intelligence-related information artifacts (IAs), to semantically enhance data in a way that enables computational integration and reasoning
• Annotation of the instance-level information captured by such IAs to aid retrieval of information about specific persons, groups, events, documents, images, and so forth
SE
• SE is realized with the help of ontologies that are used to explicate data models and annotate data instances
– Vocabulary of ontologies used for explications and annotations provides agile horizontal integration
– Ontologies, by virtue of their nature and organization, provide semantic enhancement of data
PersonID   Name   Description
111        Java   Programming
222        SQL    Database
[Figure: ontology fragment used to explicate the table, with nodes SQL, Java, C++; ProgrammingSkill; ComputerSkill; Skill; Education; TechnicalEducation]
32
The Meaning of ‘Enhancement’
• Semantic enhancement/enrichment of data = arm’s length approach (no change to data)
– through simple explication we associate an entire knowledge system with a database field
– enables analytics to process data, e.g. about computer skills, “vertically” along the Skill hierarchy, as well as “horizontally” via relations between Skill and Education
– and further: while data in the database do not change, their analysis can become richer and richer as our understanding of reality changes
• For this richness to be leveraged by different communities, persons, and applications it needs to have the properties mentioned above and be constructed in accordance with the principles of SE
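The arm's length idea can be illustrated with a minimal Python sketch. It reuses the toy table from the SE slide; the parent links in `IS_A`, the `ACQUIRED_THROUGH` relation, and all function names are assumptions made for illustration and are not taken from the project.

```python
# Source rows stay untouched: this mirrors the toy table on the SE slide.
ROWS = [
    {"PersonID": 111, "Name": "Java", "Description": "Programming"},
    {"PersonID": 222, "Name": "SQL", "Description": "Database"},
]

# Hypothetical is_a hierarchy; the parent links are assumed for illustration.
IS_A = {
    "Java": "ProgrammingSkill",
    "SQL": "ProgrammingSkill",
    "C++": "ProgrammingSkill",
    "ProgrammingSkill": "ComputerSkill",
    "ComputerSkill": "Skill",
    "TechnicalEducation": "Education",
}

# Hypothetical "horizontal" relation between Skill types and Education types.
ACQUIRED_THROUGH = {"ComputerSkill": "TechnicalEducation"}

# Explication: a single statement associating a database field with a type.
EXPLICATION = {"Name": "Skill"}


def ancestors(term):
    """Return the term followed by all of its is_a ancestors."""
    chain = [term]
    while chain[-1] in IS_A:
        chain.append(IS_A[chain[-1]])
    return chain


def persons_with(skill_type, rows=ROWS):
    """Vertical query: persons whose explicated field falls under skill_type."""
    fields = [f for f, root in EXPLICATION.items() if root in ancestors(skill_type)]
    return [r["PersonID"] for r in rows for f in fields
            if skill_type in ancestors(r[f])]


def related_education(value):
    """Horizontal query: follow the assumed relation from a skill to education."""
    for term in ancestors(value):
        if term in ACQUIRED_THROUGH:
            return ACQUIRED_THROUGH[term]
    return None
```

Calling `persons_with("ComputerSkill")` finds both persons even though neither row mentions that term, and the rows themselves are never modified; extending `IS_A` later enriches the analysis without touching the data.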
33
SE Principles
– Create a Shared Semantic Resource (SSR) of ontologies to be used for explication and annotation
– Establish an agile strategy for building ontologies within this SSR, and apply and extend these ontologies to explicate and annotate new source data as they come onstream
– Problem: Given the immense and growing variety of data sources, the development methodology must be applied by multiple different groups
– How to manage collaboration?
34
Achieving the Goal
• Methodology of incremental distributed ontology development
• A common ontology architecture incorporating a common, domain-neutral, upper-level ontology (BFO)
• A shared governance and change management process
• A simple, repeatable process for ontology development
• An ontology registry
• A process of intelligence data capture through explication of source data models
35
Main Methodological Points
• Ontological realism*
– Based on Doctrine / Science
– Involves SMEs in label selection and definition
– Thoroughly tested in many projects
• Arms-length process, with minimal disturbance to existing data and data semantics
• Reference ontologies capture generic content and are designed for aggressive reuse in multiple different types of context
– Single reference ontology for each domain of interest
• Application ontologies are tied to specific local applications
– An application ontology is created by combining local content with generic content taken from relevant reference ontologies
– Still interoperable because based on a common set of reference ontologies
* Barry Smith and Werner Ceusters, “Ontological Realism as a Methodology for Coordinated Evolution of Scientific Ontologies”, Applied Ontology, 5 (2010), 139–188.
36
Arms-length Process
• Focusing on the terms (labels, acronyms, codes) used in our source data
• Where multiple distinct terms {t1, …, tn} are used in separate data sources with one and the same meaning, they are associated with a single preferred label drawn from a standard set of such labels
• All the separate data items associated with {t1, …, tn} are thereby linked together through the corresponding preferred label
• Preferred labels form the basis for the ontologies we build
[Figure: heterogeneous contents (ABC, KLM, XYZ) associated with SE ontology labels]
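The preferred-label idea above can be sketched in a few lines of Python. The source names, local terms, record ids, and the `link_by_preferred_label` helper are all hypothetical and serve only to show how items from separate sources become linked without being changed.

```python
from collections import defaultdict

# Hypothetical local terms from three sources, each mapped to one preferred label.
PREFERRED_LABEL = {
    ("source_abc", "veh"): "vehicle",
    ("source_klm", "VEHICLE_CD"): "vehicle",
    ("source_xyz", "Fahrzeug"): "vehicle",
    ("source_abc", "pers"): "person",
}

# Hypothetical data items: (source, local term, record id).
DATA_ITEMS = [
    ("source_abc", "veh", "abc-1"),
    ("source_klm", "VEHICLE_CD", "klm-7"),
    ("source_xyz", "Fahrzeug", "xyz-3"),
    ("source_abc", "pers", "abc-2"),
]


def link_by_preferred_label(items, mapping=PREFERRED_LABEL):
    """Group record ids under the preferred label of their local term."""
    linked = defaultdict(list)
    for source, term, record_id in items:
        label = mapping.get((source, term))
        if label is not None:  # unmapped terms are skipped, not guessed
            linked[label].append(record_id)
    return dict(linked)
```

Here the mapping lives entirely outside the sources, so adding a new source means adding entries to `PREFERRED_LABEL` rather than altering any existing data.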
37
Reference and Application Ontologies
Reference Ontology
vehicle =def. an object used for transporting people or goods
tractor =def. a vehicle that is used for towing
crane =def. a vehicle that is used for lifting and moving heavy objects
vehicle platform =def. means of providing mobility to a vehicle
wheeled platform =def. a vehicle platform that provides mobility through the use of wheels
tracked platform =def. a vehicle platform that provides mobility through the use of continuous tracks
Application Definitions
artillery vehicle =def. a vehicle designed for the transport of one or more artillery weapons
wheeled tractor =def. a tractor that has a wheeled platform
tracked tractor =def. a tractor that has a tracked platform
artillery tractor =def. an artillery vehicle that is a tractor
wheeled artillery tractor =def. an artillery tractor that has a wheeled platform
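The way application definitions are composed from reference terms can be made concrete with a toy Python sketch. It encodes the vehicle definitions above as parent terms plus differentia and derives subsumption by simple set inclusion. This is an assumed simplification for illustration, not a description logic reasoner; the feature strings are paraphrases, and the split into `REFERENCE_TERMS` follows the grouping of the definitions.

```python
# term -> (parent terms, differentia); feature strings are paraphrases.
DEFINITIONS = {
    "vehicle": ([], {"transports people or goods"}),
    "tractor": (["vehicle"], {"used for towing"}),
    "crane": (["vehicle"], {"used for lifting and moving heavy objects"}),
    "artillery vehicle": (["vehicle"], {"transports artillery weapons"}),
    "wheeled tractor": (["tractor"], {"has wheeled platform"}),
    "tracked tractor": (["tractor"], {"has tracked platform"}),
    "artillery tractor": (["artillery vehicle", "tractor"], set()),
    "wheeled artillery tractor": (["artillery tractor"], {"has wheeled platform"}),
}

# Terms assumed to belong to the reference ontology; the rest are application terms.
REFERENCE_TERMS = {"vehicle", "tractor", "crane"}


def features(term):
    """Collect the differentia of a term and of everything it is defined from."""
    parents, own = DEFINITIONS[term]
    collected = set(own)
    for parent in parents:
        collected |= features(parent)
    return collected


def subsumers(term):
    """Return every other term whose features are all present in `term`."""
    mine = features(term)
    return sorted(t for t in DEFINITIONS if t != term and features(t) <= mine)


def reference_anchors(term):
    """Return the reference ontology terms that subsume `term`."""
    return [t for t in subsumers(term) if t in REFERENCE_TERMS]
```

Note that `subsumers("wheeled artillery tractor")` includes "wheeled tractor" even though the definition never mentions it: the relation follows from the shared differentia. `reference_anchors` shows how an application term stays tied to the common reference terms.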
38
Illustration of Ontology Types (Toy Example)
[Figure: hierarchy with nodes Vehicle, Tractor, Wheeled Tractor, Artillery Tractor, Wheeled Artillery Tractor, Artillery Vehicle; reference ontology terms are shown in black, application ontology terms in red]
39
Role of Reference Ontologies
• Normalized
– Maintains a set of consistent ontologies
– Eliminates redundancy
• Modular
– A set of plug-and-play ontology modules
– Enables distributed consistent development
• Surveyable
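One way to picture the non-redundancy requirement is a check that no term is defined in more than one module. The sketch below is a minimal illustration under that assumption; the module names and the `redundant_terms` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical modules; "tractor" is deliberately defined twice to show a violation.
MODULES = {
    "vehicle_ontology": {"vehicle", "tractor", "crane"},
    "platform_ontology": {"vehicle platform", "wheeled platform", "tracked platform"},
    "artillery_application": {"artillery vehicle", "artillery tractor", "tractor"},
}


def redundant_terms(modules):
    """Return each term defined in more than one module, with its owning modules."""
    owners = defaultdict(list)
    for name, terms in modules.items():
        for term in terms:
            owners[term].append(name)
    return {term: sorted(names) for term, names in owners.items() if len(names) > 1}
```

In a normalized set of modules the result is empty; here the duplicate would be resolved by having the application module reuse "tractor" from the reference module instead of redefining it.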
40
SE Architecture
• The Upper Level Ontology (ULO) in the SE hierarchy must be maximally general (no overlap with domain ontologies)
• The Mid-Level Ontologies (MLOs) introduce successively less general and more detailed representations of types which arise in successively narrower domains until we reach the Lowest Level Ontologies (LLOs)
• The LLOs are maximally specific representations of the entities in a particular one-dimensional domain
41
Challenges to HI
• Too many lexicons
• The scope of the domain: signal, sensor, image, … intelligence about … the whole world
• Difficult to conduct governance and management of ontology development to ensure consistent evolution
• Lack of expertise
• Complexity of the ontology development and application process
43
Preventing Failure
• The method we use offers solutions to some of the common reasons for failure
• Lack of Consensus
– Realism offers an objective standard for settling disputes over terminology. Ontology development becomes an empirical science instead of an exercise in the publication of dialects
– Governance helps to resolve conflicts and achieve consensus
• High Maintenance
– Arm’s length implementation places no additional overhead onto applications
• Parochialism
– Architecture and methodology prevent development of vocabularies that apply only to a single perspective
• Poor Quality
– Experience prevents common mistakes in vocabularies that cause downstream problems with search and analytics
44
Preventing Failure (cont.)
• Agile ontology development
– Methodology and architecture
– Growing SSR
• Agile ontology application
– Incremental
– Semi-automated where possible
– Even if not as fast as some want it to be
• It is still faster than creating a physical store, which will be just another silo and will still need to be integrated with the rest of the data
• Once a data collection is semantically enhanced, it is integrated, without any additional effort, with all data that have been or will be semantically enhanced
45
What is Next…
– IAO-Intel: An Information Artifact Ontology for the Intelligence Community (BS)
– A Survey of DSGS-A Ontology Work and Explicating and Annotating Processes (R. Rudnicki)
– Email Ontology – illustration of the methodology of ontology design and of the IAO-Intel (D. Salmen and W. Mandrick)
46