An Aspect of the NSF CDI Initiative
CDI: Cyber-Enabled Discovery and Innovation
From Data to Knowledge: Leveraging Ontology, Epistemology, and Logic
  Definitions
  A picture of the landscape of interest
  A workbench with toolkits (for “enhancing human cognition and generating new knowledge from [the] wealth of heterogeneous digital data”)
  Intellectual merit
  Broader impact
Definitions: “From Data to Knowledge”
  Progression of terms: symbols, data, conceptualized data, knowledge
    Symbols: characters and character-string instances
    Data: symbols as values in attribute-value pairs
    Conceptualized data: data in the framework of a conceptual model
    Knowledge: conceptualized data with a degree of certainty or community agreement
  From Data to Knowledge
    Recognize symbols
    Classify symbols with respect to meta-data attributes
    Embed attribute-value pairs into a conceptual framework of concepts, relationships, and constraints
    Present for community approval or integrate into community-approved conceptualizations
Examples: From Data to Knowledge
  Car Ads
    Symbols: $, 12k, ford, 4-Door
    Data: price(12k), mileage(12k), make(ford)
    Conceptualized data:
      Car(C123) has Price($12,000)
      Car(C123) has Mileage(12,000)
      Car(C123) has Make(Ford)
      BodyType isa Feature
      Car(C123) has Feature(Sedan)
    Knowledge
      Community agreement that the ontology is “correct”
      Community agreement that the facts in the ontology are “correct”
  Appointments
  Biology
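The car-ad progression above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's extractor: the ad text, the regular expressions, the body-type lookup, and the helper names `classify`, `to_number`, and `conceptualize` are all hypothetical. The final step, community agreement, is a social process and is deliberately left out.

```python
import re

# Hypothetical recognizers: each attribute name maps to a pattern whose
# first group captures the symbol that becomes the attribute's value.
RECOGNIZERS = {
    "price": re.compile(r"\$\s*(\d+k?)", re.IGNORECASE),
    "mileage": re.compile(r"(\d+k?)\s*miles", re.IGNORECASE),
    "make": re.compile(r"\b(ford|nissan|honda)\b", re.IGNORECASE),
    "body_type": re.compile(r"\b(4-door|2-door)\b", re.IGNORECASE),
}

# Hypothetical lookup from a body-type symbol to a Feature value.
BODY_TYPE_FEATURES = {"4-door": "Sedan", "2-door": "Coupe"}


def classify(text):
    """Symbols -> data: attribute-value pairs recognized in the text."""
    data = {}
    for attribute, pattern in RECOGNIZERS.items():
        match = pattern.search(text)
        if match:
            data[attribute] = match.group(1)
    return data


def to_number(symbol):
    """Expand a symbol such as '12k' into the integer 12000."""
    symbol = symbol.lower()
    return int(symbol[:-1]) * 1000 if symbol.endswith("k") else int(symbol)


def conceptualize(car_id, data):
    """Data -> conceptualized data: (object, relationship, value) facts."""
    facts = []
    if "price" in data:
        facts.append((car_id, "has Price", to_number(data["price"])))
    if "mileage" in data:
        facts.append((car_id, "has Mileage", to_number(data["mileage"])))
    if "make" in data:
        facts.append((car_id, "has Make", data["make"].capitalize()))
    if "body_type" in data:
        # BodyType isa Feature, so the body type is recorded as a Feature.
        feature = BODY_TYPE_FEATURES[data["body_type"].lower()]
        facts.append((car_id, "has Feature", feature))
    return facts


ad_text = "ford 4-Door $12k, 12k miles"  # hypothetical ad
data = classify(ad_text)
facts = conceptualize("C123", data)
```

Note that the same symbol, 12k, becomes a price in one context and a mileage in another; the surrounding symbols ($ and "miles") decide which attribute it is classified under.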
Examples: From Data to Knowledge
  Appointments
  Biology
  [Figure: Appointment ontology diagram. Concepts: Appointment, Date, Time, Place, Address, Duration, Cost, Person, Name, Insurance, Service Provider, Service Description, Medical Service Provider, Doctor, Pediatrician, Dermatologist, Auto Service Provider, Auto Mechanic. Relationship labels: has, is at, is on, is with, is for, provides, accepts. Example values: "IHC", "DMBA".]
Examples: From Data to Knowledge
Biology
Definitions: “Ontology,” “Epistemology,” and “Logic”
  Ontology
    Existence: answers “What exists?”
    For us, it answers: what concepts, relationships, and constraints exist and how they are interrelated.
  Epistemology
    The nature of knowledge: answers “What is knowledge?”, “How is knowledge acquired?”, “What do people know?”
    For us, it answers: what knowledge is (conceptualized data with community agreement), how data becomes conceptualized and how conceptualized data becomes knowledge, and how someone’s conceptualized data corresponds with community-agreed-upon conceptualized data.
  Logic
    Principles of valid inference: answers “What can be inferred?”
    For us, it answers: what can be inferred (in a formal sense) from conceptualized data.
Examples: “For-Us Answers”
  Ontology: What exists?
    In Car Ads: Car, Make, Model, Car has Make, Engine isa Feature
    In Appointments: Service Provider, Date, Appointment with Doctor
    In Biology: Protein Activity, Molecular Weight, Chromosome Location is aggregate of Chromosome Number and Start and End and Orientation
  Epistemology
    What is knowledge?
      A fact-filled Biology ontology
      Chromosome Number(21) starts at Start(29,350,518) and ends at End(29,367,889) with Orientation(minus)
    How is it acquired?
      Creation of a fact-filled Biology ontology obtained from a reliable source
      Provenance: Was the source from which the Biology ontology was created reliable?
    What do people know?
      Does my knowledge that I have an appointment with Dr. Jones on Thursday align with the appointment ontology as established by the doctor’s office?
      I view the world with my car ads ontology; how does it align with the community standard ontology?
  Logic: Principles of valid inference
    In Car Ads: find red Nissans later than a 2002 with less than 100k miles
    In Appointments: can reason that a dermatologist is a medical service provider
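The two logic examples above can be illustrated with a small sketch. The isa links below are an assumption (the slides only state the conclusion that a dermatologist is a medical service provider, not the intermediate links), and the car records are made-up sample data. A real system would reason over a populated ontology rather than Python dictionaries.

```python
# Assumed isa hierarchy for the Appointments example (child -> parent).
ISA = {
    "Dermatologist": "Doctor",
    "Pediatrician": "Doctor",
    "Doctor": "Medical Service Provider",
    "Medical Service Provider": "Service Provider",
    "Auto Mechanic": "Auto Service Provider",
    "Auto Service Provider": "Service Provider",
}


def is_a(concept, ancestor):
    """Follow isa links upward; True if the ancestor is reached."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ISA.get(concept)
    return False


# Made-up car facts for the query example.
CARS = [
    {"id": "C1", "make": "Nissan", "color": "red", "year": 2004, "miles": 62000},
    {"id": "C2", "make": "Nissan", "color": "red", "year": 2001, "miles": 40000},
    {"id": "C3", "make": "Nissan", "color": "blue", "year": 2005, "miles": 30000},
    {"id": "C4", "make": "Ford", "color": "red", "year": 2006, "miles": 20000},
    {"id": "C5", "make": "Nissan", "color": "red", "year": 2003, "miles": 120000},
]


def red_nissans(cars):
    """Red Nissans later than a 2002 with less than 100k miles."""
    return [
        car["id"]
        for car in cars
        if car["make"] == "Nissan"
        and car["color"] == "red"
        and car["year"] > 2002
        and car["miles"] < 100_000
    ]


implied = is_a("Dermatologist", "Medical Service Provider")
answers = red_nissans(CARS)
```

The first result is an implied fact (it is never stored directly), while the second is a query over recorded facts; the slides treat both as retrieval over a fact-filled ontology.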
Landscape of Data and Knowledge
  Includes:
    The creation of ontologies with community agreement
      Declaration of conceptual models (via ontology editor, forms, …)
      Recognition of meta-data in semi-structured text
    The conversion of heterogeneous digital data into knowledge under an ontology
      Ontology-based/layout-based information extraction/annotation
      Data integration within an ontological context
    The ability to match isolated ontologies with community ontologies
      (Semi-)automatic schema matching
      Traceability from symbols in a page of text to symbols as ontological components of knowledge
    The ability to reason over ontologies to retrieve information both given and implied
      Ontologies as first-order logic theories (potentially modal logics too)
      Query (through both formal query languages and informal search) over populated ontologies for facts (both recorded and implied)
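As a rough illustration of (semi-)automatic schema matching, the sketch below proposes a community concept for each concept of an isolated ontology using nothing but name similarity from Python's standard `difflib` module. Name similarity is a stand-in chosen for brevity; the slides do not say which matching technique is intended, and practical matchers also draw on structure, data values, and user confirmation. The concept names and the 0.5 threshold are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_concepts(local, community, threshold=0.5):
    """Map each local concept to its most similar community concept.

    A concept whose best candidate scores below the threshold maps to
    None, leaving it for a person to resolve (the "semi" in semi-automatic).
    """
    mapping = {}
    for name in local:
        best = max(community, key=lambda candidate: similarity(name, candidate))
        mapping[name] = best if similarity(name, best) >= threshold else None
    return mapping


# Hypothetical concept names for a homegrown and a community car-ads ontology.
local_concepts = ["Make", "Price", "Mileage", "Feature"]
community_concepts = ["CarMake", "AskingPrice", "Miles", "Color"]
proposed = match_concepts(local_concepts, community_concepts)
```

Here `Feature` finds no acceptable partner and is left unmatched, which is the case a user would be asked to resolve by hand.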
A Workbench for Knowledge Engineering
  Unified framework with a toolkit supporting:
    Ontology creation
    Data to knowledge conversion
    Knowledge solidification
    Community usability
  Usable by knowledge workers of varying degrees of sophistication
Ontology Creation
  Objective: Determine what concepts, relationships, and constraints exist and how they are interrelated
  Contributing Solutions (what we have done or have in progress)
    TANGO (creation, augmentation, adjustment)
    Forms to conceptual models (CT’s work)
    Table interpretation through forms to conceptual models (CT’s work)
  Open Problems (what we need, and believe we can do)
    Reverse engineer XML documents to an XML schema and then to C-XML (built on RA’s work)
    Extract a specific ontology from a more general ontology (like YD’s MS work)
    Merge ontologies (built on ZL’s work + LX’s work)
    Convert regular patterns in documents to conceptual models
      Named regular expressions over patterns (based on 598R work)
      Generation of layout patterns converted to named patterns (based on YD’s work)
    … more ??
Data to Knowledge Conversion
  Objective: Find ways to capture facts ontologically.
  Contributing solutions
    Ontology-based information extraction
      Semantic annotation (YD’s work)
      Synergistic ontology-based/layout-based extractors (YD’s work)
    Data frames as data-to-knowledge converters
  Open problems
    User-directed annotation (like YD’s ASpaces work)
    User-directed conversion tools
      Named regular-expression extractors with respect to RDF, Named Graphs, OWL ontologies, OSM ontologies (like 598R work)
      Generation of named regular-expression extractors from marked source documents (598R++ work)
    Storage structures?
    … more ??
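One way to picture a named regular-expression extractor is a pattern whose group names double as ontology concept names, so that every match yields subject-predicate-object triples. The sketch below is an assumption about what such an extractor might look like, not the 598R design: the pattern, the `has<Concept>` predicate naming, and the sample ad are all hypothetical, and the triples are plain tuples rather than real RDF.

```python
import re

# Hypothetical named pattern: each group name is an ontology concept.
CAR_AD = re.compile(
    r"(?P<Year>(?:19|20)\d{2})\s+"
    r"(?P<Make>Ford|Nissan|Honda)\s+"
    r"(?P<Model>\w+)"
)


def extract_triples(car_id, text):
    """Return (subject, predicate, object) triples for the first match."""
    match = CAR_AD.search(text)
    if match is None:
        return []
    return [
        (car_id, f"has{concept}", value)
        for concept, value in match.groupdict().items()
    ]


triples = extract_triples("C123", "Clean 2004 Nissan Altima, one owner")
```

Because the group names carry the semantics, the same extraction loop works unchanged for any named pattern; only the pattern itself is specific to car ads.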
A Semantic-Web page consists of
• the human-readable page (ordinary HTML, XML, …)
• one or more annotation attachments
  • a reference to the ontology used for annotation
  • RDF triples of extracted information
  • pointers into the original source for every item
  • highlighting possibilities for extracted data
  • hover possibilities to connect to the ontology
• direct query of the annotation attachment
  • SPARQL
  • SerFR
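The annotation attachment described above can be pictured as a small data structure: a reference to the ontology, plus extracted items that each keep a pointer (character offsets) into the source page. The sketch below is illustrative only. The class and field names, the `car-ads-ontology` identifier, and the `[[...]]` highlight markers are assumptions, and a real attachment would hold RDF triples queried with SPARQL rather than Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    subject: str
    predicate: str
    value: str
    start: int  # offset of the value in the source page
    end: int    # offset just past the value


@dataclass
class AnnotationAttachment:
    ontology_ref: str  # hypothetical identifier of the ontology used
    annotations: list = field(default_factory=list)


def annotate(page, attachment, subject, predicate, value):
    """Record an extracted value together with its position in the page."""
    start = page.find(value)
    if start == -1:
        raise ValueError(f"{value!r} does not occur in the page")
    attachment.annotations.append(
        Annotation(subject, predicate, value, start, start + len(value))
    )


def highlight(page, attachment):
    """Wrap every annotated span in [[...]] markers.

    Spans are processed right to left so that earlier offsets stay valid.
    """
    out = page
    ordered = sorted(attachment.annotations, key=lambda a: a.start, reverse=True)
    for a in ordered:
        out = out[:a.start] + "[[" + out[a.start:a.end] + "]]" + out[a.end:]
    return out


page = "2004 Nissan Altima, $12,000"  # hypothetical page text
attachment = AnnotationAttachment(ontology_ref="car-ads-ontology")
annotate(page, attachment, "C123", "hasMake", "Nissan")
annotate(page, attachment, "C123", "hasPrice", "$12,000")
marked = highlight(page, attachment)
```

Keeping offsets with every item is what makes highlighting and hover behavior possible without altering the human-readable page itself.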
Knowledge Solidification
  Objective: Obtain community agreement for fact-filled ontologies.
  Contributing solutions
    Provide for recording provenance for individual facts (HC’s work for genealogical data)
    TANGO: assume published tables have community agreement and therefore fact-filled ontologies grown from tables have community agreement.
    Generally, assume published semi-structured data has community agreement and therefore ontologies and facts extracted from this semi-structured data have community agreement (CT’s work)
  Open problems
    How do we solidify knowledge captured only as conceptualized data (i.e., data extracted with respect to somebody’s “homegrown” ontology)? (Do we need to worry about this?)
    Can we link identical facts in different sites? (begun with HC’s work)
    Can we (should we) find ways to attach provenance to the ontology itself (not just to the facts)?
    Tool for community development of ontologies
    … more ??
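Recording provenance per fact and linking identical facts across sites can be sketched as follows. This is a simplified illustration, not HC's method: the `Fact` and `SourcedFact` classes, the site labels, and the genealogical sample facts are hypothetical, and "identical" here means exact equality of subject, predicate, and value, which sidesteps the hard part of deciding when two differently written facts are the same.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    value: str


@dataclass(frozen=True)
class SourcedFact:
    fact: Fact
    source: str  # provenance: where the fact was extracted from


def link_identical(sourced_facts):
    """Group sources by fact and keep facts reported by more than one source."""
    sources = defaultdict(set)
    for sourced in sourced_facts:
        sources[sourced.fact].add(sourced.source)
    return {fact: found_in for fact, found_in in sources.items() if len(found_in) > 1}


# Hypothetical genealogical facts harvested from two sites.
birth = Fact("Person(P1)", "was born in", "1850")
death = Fact("Person(P1)", "died in", "1921")
harvested = [
    SourcedFact(birth, "site-A"),
    SourcedFact(birth, "site-B"),
    SourcedFact(death, "site-A"),
]
linked = link_identical(harvested)
```

A fact corroborated by several independent sources is a natural candidate for solidification, while a fact with a single source stays conceptualized data until its provenance is judged reliable.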
Community Usability
  Objective: Provide (easy) access to knowledge, both ontological knowledge and facts.
  Contributing solutions
    Ordinary query processing, including free-form queries and service requests (MA’s work)
    Information harvesting (CT’s work)
    Form/Table query processing (built on CT’s work and RPI’s query-by-table)
    Information linkage (HC’s work, SI’s work)
  Open problems
    Agents (e.g., in YD’s ASpaces work)
    Learning and self-adjustment of individual knowledge (How does my knowledge align with community knowledge?), for the sake of
      Gaining encyclopedic knowledge
      Discovering gaps in knowledge
      Discovering potential adjustments and augmentations to community knowledge and solidifying community knowledge
      Seeing knowledge objects from a different point of view
    Orchestrating ontology-based services (MA’s future work)
    Practicalities?
    … more ??
• Ease of Use
  • Free-form queries (+ linguistics)
  • Form-based queries (graphical?)
• Scalability
  • Semantic indexes
  • Caching (on the scale of Google)
• System Development
  • Demos
  • Open source tools
• How do we sell the idea?
Intellectual Merit
  Achievable intellectual objectives of this research:
    Provides an answer to the question of how to turn syntactic symbols into semantic knowledge
    Shows how to create a web of data
    Shows how to establish a workbench with toolkits to convert heterogeneous digital data into knowledge under the auspices of an ontology
    Explores the synergistic interplay among ontology, epistemology, and logic for the advancement of knowledge
      New ways to think about
        What knowledge is
        How knowledge is acquired
        What individuals know
        Community knowledge
      Query and reasoning over fact-filled ontologies
Broader Impact
  Worthwhile implications of this research:
    Harvests and makes available facts from the wealth of available heterogeneous digital data
    Harnesses and manages community knowledge with the objective of enhancing human cognition
    Makes facts on the web (rather than pages) easily searchable by the general public
    Makes fact creation and maintenance easily attainable by fact providers
    Facilitates community agreement on ontologically specified knowledge
    Provides a practical set of tools for knowledge management
    Involves students, researchers, and knowledge workers from various disciplines in a community-wide effort to convert data into knowledge