Ontology-Based Computing Kenneth Baclawski Northeastern University and Jarg
Mar 26, 2015
The Onslaught
Increasingly large amounts of information are becoming accessible electronically.
The information sources are increasingly complicated.
The diversity of types of information sources is also increasing.
Technologies are emerging to cope with this onslaught: ontology-based computing.
Ontologies
Shared understanding within a community of people
Declarative specification of entities and their relationships with each other
Constraints and rules that permit reasoning within the ontology
Behavior associated with stated or inferred facts
Relational Database Schemas
Well-established technique for specifying the structure of shared data, not for communication between people or agents
Declarative specification, but of tables, not of entities and relationships
Some constraints are expressible, but no significant rules (such as inheritance)
No explicit behavior
Standard language is SQL.
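To make the point about constraints concrete, here is a minimal sketch using Python's standard sqlite3 module. The table and column names (chemical, reaction_input) and the helper functions are hypothetical, chosen only for illustration. The schema can express key, CHECK, and foreign-key constraints, but it describes tables and has no rule such as inheritance.

```python
import sqlite3


def build_schema() -> sqlite3.Connection:
    """Create an in-memory database with two illustrative tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        """
        CREATE TABLE chemical (
            name    TEXT PRIMARY KEY,
            formula TEXT NOT NULL,
            weight  REAL CHECK (weight > 0)
        )
        """
    )
    conn.execute(
        """
        CREATE TABLE reaction_input (
            reaction TEXT NOT NULL,
            chemical TEXT NOT NULL REFERENCES chemical(name)
        )
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, sql: str, params: tuple) -> bool:
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False
```

A row with a negative weight, or a reaction_input row naming a chemical that does not exist, is rejected by the declared constraints. Nothing in the schema says what a chemical is or how it relates to other kinds of entity.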
Object-Oriented Schemas
Emerging technology for communication between software components
Declarative specifications
Constraints and some rules
Several ways to specify behavior
The Unified Modeling Language (UML) is the standard OO modeling language.
[Figure: UML class diagram. Classes: Enzyme (sequence : string), Chemical (name : string, formula : string, weight : number), Reaction (description : string), Pathway (name : string). Associations: catalyzed by, input, output, consists of. Multiplicity labels on the association ends include 0..1, 1..1, 0..*, 1..* and 2..*.]
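The classes and attributes named in the class diagram (Enzyme, Chemical, Reaction, Pathway) can be sketched as Python dataclasses. This is an illustration under stated assumptions, not a transcription of the diagram. The direction of each association, the two multiplicity checks (at least one input and output per reaction, at least two reactions per pathway), and the helper violations() are all assumed here, because the diagram layout did not survive.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chemical:
    name: str
    formula: str
    weight: float


@dataclass
class Enzyme:
    sequence: str


@dataclass
class Reaction:
    description: str
    inputs: List[Chemical] = field(default_factory=list)    # "input" association
    outputs: List[Chemical] = field(default_factory=list)   # "output" association
    catalyzed_by: Optional[Enzyme] = None                   # "catalyzed by" association


@dataclass
class Pathway:
    name: str
    reactions: List[Reaction] = field(default_factory=list)  # "consists of" association


def violations(pathway: Pathway) -> List[str]:
    """Check the assumed multiplicities and report each one that is broken."""
    problems = []
    if len(pathway.reactions) < 2:  # assumed 2..* on "consists of"
        problems.append(f"pathway {pathway.name!r} has fewer than 2 reactions")
    for reaction in pathway.reactions:
        if not reaction.inputs:     # assumed 1..* on "input"
            problems.append(f"reaction {reaction.description!r} has no input")
        if not reaction.outputs:    # assumed 1..* on "output"
            problems.append(f"reaction {reaction.description!r} has no output")
    return problems
```

The attribute declarations correspond to the declarative part of the schema. The multiplicity checks correspond to the constraints, which an object-oriented language does not enforce unless code like violations() is written for it.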
Logic
Very expressive but very difficult to use. Not designed for communication.
Most logical languages are not based on entities and relationships.
Very powerful inferencing capabilities.
Do not usually have any associated behavior.
Many examples: Prolog, KIF, Slang, ...
XML DTDs and XML Schema
Defines a hierarchical document type. XML Schema defines data types. Designed for communication over the Web.
Good support for entities and hierarchical relationships; awkward for others.
Constraints can be imposed on the hierarchical structure and on data types.
Behavior can be specified procedurally.
Knowledge Representations
Very well developed branch of AI. Many tools, but mostly academic. Not yet used for communication over the Web.
Powerful language for specifying entities and their relationships.
Most are linked with inference engines.
Behavior is typically handled in an ad hoc manner.
RDF and DAML
The Resource Description Framework (RDF) is a knowledge representation language serialized in XML. It is a World Wide Web Consortium (W3C) Recommendation.
The DARPA Agent Markup Language (DAML) is an extension of RDF to serve as the basis for ontology-based computing over the Web: the Semantic Web.
Ontological Reasoning in RDF
[Figure: RDF graph. Nodes: Class, Property, Person, Fish, owns, Wanda, Wendy. Edge labels: type, owns, domain, range.]
Type constraint violation: The range of owns is Fish.
OR There is no inconsistency: Wanda is a fish!
Mermaid?
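The two readings of the owns example can be sketched with a plain set of triples. The exact edges are an assumption reconstructed from the node and edge labels (in particular, that Wendy owns Wanda and that both are typed as Person), and the function names are hypothetical. One function treats range as a constraint to be checked, the other treats it as a rule that infers a new type.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

# Assumed reconstruction of the example graph.
TRIPLES: Set[Triple] = {
    ("Person", "type", "Class"),
    ("Fish", "type", "Class"),
    ("owns", "type", "Property"),
    ("owns", "domain", "Person"),
    ("owns", "range", "Fish"),
    ("Wendy", "type", "Person"),
    ("Wanda", "type", "Person"),
    ("Wendy", "owns", "Wanda"),
}


def range_violations(triples: Set[Triple]) -> List[Triple]:
    """Constraint reading: the object must already be typed with the range class."""
    ranges = {(s, o) for (s, p, o) in triples if p == "range"}
    problems = []
    for (s, p, o) in sorted(triples):
        for (prop, cls) in ranges:
            if p == prop and (o, "type", cls) not in triples:
                problems.append((s, p, o))
    return problems


def infer_range_types(triples: Set[Triple]) -> Set[Triple]:
    """Inference reading: using the property implies the object's type."""
    inferred = set(triples)
    for (prop, p, cls) in triples:
        if p != "range":
            continue
        for (s, q, o) in triples:
            if q == prop:
                inferred.add((o, "type", cls))
    return inferred
```

Under the constraint reading, range_violations(TRIPLES) reports the Wendy owns Wanda statement. Under the inference reading, infer_range_types(TRIPLES) adds the statement that Wanda is a Fish, after which no violation remains and Wanda is typed as both Person and Fish.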
[Figure: DAML graph. Nodes: Class, Property, Restriction, College, Student, majors, Arts & Sciences, Engineering, George. Edge labels: type, domain, range, subClassOf, onProperty, maxCardinality (value 1), majors, equivalentTo.]
Cardinality constraint violation: George can’t have two majors
OR There is no inconsistency: Engineering = Arts & Sciences
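The majors example admits the same two readings, sketched below with hypothetical helper functions. The triples are an assumed reconstruction: George has two majors statements, and majors has a maximum cardinality of 1. If distinct names denote distinct things, the limit is violated. If they need not, the limit implies that the two colleges are the same.

```python
from itertools import combinations
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Assumed reconstruction of the example statements about George.
TRIPLES: Set[Triple] = {
    ("George", "type", "Student"),
    ("George", "majors", "Engineering"),
    ("George", "majors", "Arts & Sciences"),
}
MAX_MAJORS = 1  # the maxCardinality restriction on the majors property


def cardinality_violation(triples: Set[Triple], subject: str, prop: str, limit: int) -> bool:
    """Constraint reading: distinct names are distinct values, so count them."""
    values = {o for (s, p, o) in triples if s == subject and p == prop}
    return len(values) > limit


def infer_equivalences(triples: Set[Triple], subject: str, prop: str) -> Set[Triple]:
    """Inference reading for a limit of 1: all values must denote the same thing."""
    values = sorted({o for (s, p, o) in triples if s == subject and p == prop})
    return {(a, "equivalentTo", b) for a, b in combinations(values, 2)}
```

cardinality_violation(TRIPLES, "George", "majors", MAX_MAJORS) returns True, while infer_equivalences(TRIPLES, "George", "majors") yields a single equivalentTo statement between Arts & Sciences and Engineering. Which outcome is wanted depends on whether the ontology language assumes unique names.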
Representing information
Relational database: records
OO database: objects and links
Logic: facts
XML: documents
Knowledge Representations: annotations
All of these are graph structures: entities related to other entities by relationships.
Where is the meaning?
Databases: select-project-join queries
Logic: rules determined by unification
XML: XSLT patterns
Knowledge Representations: templates
All of these are forms of graph matching. The units of meaning are small connected subgraphs that I call motifs.
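Graph matching against a motif can be sketched as follows. The graph is a list of triples, a motif is a small list of triple patterns, and terms beginning with "?" are variables. The data, the "?" convention, and the function name match_motif are assumptions made for illustration. The search is a naive backtracking matcher, not an optimized one.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

Triple = Tuple[str, str, str]

# Hypothetical example graph.
GRAPH: List[Triple] = [
    ("r1", "catalyzed_by", "e1"),
    ("r1", "input", "glucose"),
    ("r2", "catalyzed_by", "e2"),
    ("r2", "input", "glucose"),
    ("r3", "input", "fructose"),
]


def match_motif(motif: Sequence[Triple], graph: Sequence[Triple]) -> List[Dict[str, str]]:
    """Return every variable binding under which all motif triples occur in the graph."""

    def extend(binding: Dict[str, str], remaining: Sequence[Triple]) -> Iterator[Dict[str, str]]:
        if not remaining:
            yield dict(binding)
            return
        first, rest = remaining[0], remaining[1:]
        for triple in graph:
            candidate = dict(binding)
            consistent = True
            for term, value in zip(first, triple):
                if term.startswith("?"):
                    # Bind the variable, or check it against its earlier binding.
                    if candidate.setdefault(term, value) != value:
                        consistent = False
                        break
                elif term != value:
                    consistent = False
                    break
            if consistent:
                yield from extend(candidate, rest)

    return list(extend({}, list(motif)))
```

For example, the motif [("?r", "catalyzed_by", "?e"), ("?r", "input", "glucose")] matches r1 with e1 and r2 with e2. The shared variable ?r plays the role of a join in a database query or of unification in a logic rule.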
Ontology Infrastructure
Simply introducing a language is not enough. There must be an infrastructure to support ontology-based computing, including:
Ontology development tools
Content creation systems
Storage and retrieval systems
Ontology reasoning, mediation, ...
Integration with applications
Ontology Development
Ontologies can be developed using graphical tools specifically for ontologies or by adapting existing tools such as CASE tools.
Testing ontologies is not easy because they include constraints and inference rules.
Ontology testing is analogous to type checking in programming languages.
Content Creation
Databases: Data warehousing technology
Text: Natural Language Processing (NLP)
Image processing
Direct creation of content
No matter how the content is created, it must be tested using consistency checking.
Storage and Retrieval
Scaling up will require high-performance, distributed storage and indexing technology.
The natural units for indexing are the motifs (precomputed joins), but the number of motifs is large.
Jarg Corporation has developed a scalable, high-performance indexing technology for ontology-based knowledge representations.
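The idea of indexing by small graph units can be illustrated with a toy inverted index. This is a single-process sketch under simplifying assumptions and does not describe Jarg's technology. Here each fragment is a single triple, and the class and function names (MotifIndex, fragments) are hypothetical. A real system would index larger motifs and distribute the postings across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def fragments(triples: Iterable[Triple]) -> Set[Triple]:
    """Fragment a knowledge representation into indexable units (here, single triples)."""
    return set(triples)


class MotifIndex:
    """Toy inverted index from fragment to the documents that contain it."""

    def __init__(self) -> None:
        self._postings: Dict[Triple, Set[str]] = defaultdict(set)

    def add(self, doc_id: str, triples: Iterable[Triple]) -> None:
        """Index every fragment of one document's knowledge representation."""
        for fragment in fragments(triples):
            self._postings[fragment].add(doc_id)

    def search(self, query_triples: Iterable[Triple]) -> List[str]:
        """Rank documents by how many fragments they share with the query."""
        scores: Dict[str, int] = defaultdict(int)
        for fragment in fragments(query_triples):
            for doc_id in self._postings.get(fragment, ()):
                scores[doc_id] += 1
        # Highest score first; ties broken by document id for a stable order.
        return sorted(scores, key=lambda d: (-scores[d], d))
```

Because each lookup is a direct key access, the cost of a query depends on the number of fragments in the query rather than on the size of the collection. The price is the size of the index, which grows with the number of distinct fragments.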
Jarg Architecture
[Figure: architecture diagram. Document path: Document, NLP, Knowledge Representation, fragmentation, Knowledge Fragments. Query path: Query, NLP, Knowledge Representation, fragmentation, Knowledge Motifs. Other components: Distributed Index Engine, Matching Documents.]
Conclusion
Ontology-based computing is emerging as a natural evolution of existing technologies to cope with the information onslaught.
Ontology-based technology must be scalable if it is to contribute to the solution rather than add to the problem.
Consistency checking is important for the development of ontologies and content.
Bibliography
Semantic Web: www.w3.org/2001/sw
Ontologies: www.ontology.org
Unified Modeling Language: www.omg.org/uml
Knowledge Interchange Format: logic.stanford.edu/kif
Specware and Slang: www.kestrel.edu
XML and XML Schema: www.w3.org/xml
RDF and RDFS: www.w3.org/rdf
DAML: www.daml.org
Notation 3: www.w3.org/DesignIssues/Notation3.html
Consistency checking: vis.home.mindspring.com
Jarg Knowledge Engine: www.jarg.com