An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation

Jan 19, 2016

Transcript
Page 1: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

An Aspect of the NSF CDI Initiative
CDI: Cyber-Enabled Discovery and Innovation

Page 2: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

From Data to Knowledge: Leveraging Ontology, Epistemology, and Logic
• Definitions
• A picture of the landscape of interest
• A workbench with toolkits (for “enhancing human cognition and generating new knowledge from [the] wealth of heterogeneous digital data”)
• Intellectual merit
• Broader impact

Page 3: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

Definitions: “From Data to Knowledge”
• Progression of terms: symbols, data, conceptualized data, knowledge
  • Symbols: characters and character-string instances
  • Data: symbols as values in attribute-value pairs
  • Conceptualized data: data in the framework of a conceptual model
  • Knowledge: conceptualized data with a degree of certainty or community agreement
• From Data to Knowledge
  • Recognize symbols
  • Classify symbols with respect to meta-data attributes
  • Embed attribute-value pairs into a conceptual framework of concepts, relationships, and constraints
  • Present for community approval or integrate into community-approved conceptualizations

Page 4: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

Examples: From Data to Knowledge
• Car Ads
  • Symbols: $, 12k, ford, 4-Door
  • Data: price(12k), mileage(12k), make(ford)
  • Conceptualized data:
    • Car(C123) has Price($12,000)
    • Car(C123) has Mileage(12,000)
    • Car(C123) has Make(Ford)
    • BodyType isa Feature
    • Car(C123) has Feature(Sedan)
  • Knowledge
    • Community agreement that the ontology is “correct”
    • Community agreement that the facts in the ontology are “correct”
• Appointments
• Biology
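The car-ad progression above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' extraction system: the ad text, the recognizer patterns, and the mapping from "4-Door" to Sedan are all assumptions made up for the example.

```python
import re

AD_TEXT = "ford 4-Door $12k, 12k miles"  # hypothetical ad text


def recognize_symbols(text):
    """Step 1: split the raw text into candidate symbols."""
    return re.findall(r"[^\s,]+", text)


# Step 2: hypothetical recognizers that classify symbols by attribute.
# Surrounding context decides whether "12k" is a price or a mileage.
RECOGNIZERS = {
    "price": re.compile(r"\$(\d+k)\b"),
    "mileage": re.compile(r"\b(\d+k) miles\b"),
    "make": re.compile(r"\b(ford|nissan)\b", re.IGNORECASE),
    "body": re.compile(r"\b(4-door)\b", re.IGNORECASE),
}


def classify(text):
    """Return attribute-value pairs (the 'data' level)."""
    data = {}
    for attribute, pattern in RECOGNIZERS.items():
        match = pattern.search(text)
        if match:
            data[attribute] = match.group(1)
    return data


BODY_TO_FEATURE = {"4-door": "Sedan"}  # assumed mapping, for illustration


def thousands(value):
    """Expand a value such as '12k' to 12000."""
    return int(value.rstrip("kK")) * 1000


def conceptualize(car_id, data):
    """Step 3: embed attribute-value pairs into a conceptual model."""
    car = f"Car({car_id})"
    facts = [("BodyType", "isa", "Feature")]
    if "price" in data:
        facts.append((car, "has", f"Price(${thousands(data['price']):,})"))
    if "mileage" in data:
        facts.append((car, "has", f"Mileage({thousands(data['mileage']):,})"))
    if "make" in data:
        facts.append((car, "has", f"Make({data['make'].capitalize()})"))
    feature = BODY_TO_FEATURE.get(data.get("body", "").lower())
    if feature:
        facts.append((car, "has", f"Feature({feature})"))
    return facts


if __name__ == "__main__":
    for fact in conceptualize("C123", classify(AD_TEXT)):
        print(fact)
```

The fourth step, community agreement, is deliberately absent: it is a social process rather than a computation, which is why the slides treat knowledge as a separate level.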

Page 5: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

Examples: From Data to Knowledge

• Appointments
• Biology

[Figure: Appointment ontology diagram. Object sets: Appointment, Date, Time, Place, Address, Duration, Cost, Person, Name, Insurance, Service Provider, Service Description, Medical Service Provider, Doctor, Pediatrician, Dermatologist, Auto Service Provider, Auto Mechanic. Relationship labels: has, is at, is on, is with, is for, provides, accepts. Sample values: "IHC", "DMBA".]

Page 6: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

Examples: From Data to Knowledge

Biology

Page 7: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

Definitions: “Ontology,” “Epistemology,” and “Logic”
• Ontology
  • Existence: answers “What exists?”
  • For us, it answers: what concepts, relationships, and constraints exist and how they are interrelated.
• Epistemology
  • The nature of knowledge: answers “What is knowledge?”, “How is knowledge acquired?”, “What do people know?”
  • For us, it answers: what is knowledge (conceptualized data with community agreement), how data becomes conceptualized and how conceptualized data becomes knowledge, and how someone’s conceptualized data corresponds with community-agreed-upon conceptualized data.
• Logic
  • Principles of valid inference: answers “What can be inferred?”
  • For us, it answers: what can be inferred (in a formal sense) from conceptualized data.

Page 8: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

Examples: “For-Us Answers”
• Ontology: What exists?
  • In Car Ads: Car, Make, Model, Car has Make, Engine isa Feature
  • In Appointments: Service Provider, Date, Appointment with Doctor
  • In Biology: Protein Activity, Molecular Weight, Chromosome Location is aggregate of Chromosome Number and Start and End and Orientation
• Epistemology
  • What is knowledge?
    • A fact-filled Biology ontology
    • Chromosome Number (21) starts at Start (29,350,518) and ends at End (29,367,889) with Orientation (minus)
  • How is it acquired?
    • Creation of a fact-filled Biology ontology obtained from a reliable source
    • Provenance: Was the source from which the Biology ontology was created reliable?
  • What do people know?
    • Does my knowledge that I have an appointment with Dr. Jones on Thursday align with the appointment ontology as established by the doctor’s office?
    • I view the world with my car ads ontology; how does it align with the community standard ontology?
• Logic: Principles of valid inference
  • Find red Nissans newer than 2002 with less than 100k miles
  • In Appointments: can reason that a dermatologist is a medical service provider
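The two logic examples above can be illustrated with a small Python sketch. The is-a edges below are an assumed reading of the Appointment diagram, and the car records are made-up sample facts; neither comes from the slides.

```python
# Assumed is-a edges (child -> parent); the exact edges in the
# original Appointment diagram are not recoverable from the transcript.
ISA = {
    "Dermatologist": "Doctor",
    "Pediatrician": "Doctor",
    "Doctor": "Medical Service Provider",
    "Medical Service Provider": "Service Provider",
    "Auto Mechanic": "Auto Service Provider",
    "Auto Service Provider": "Service Provider",
}


def is_a(concept, ancestor):
    """Follow is-a edges upward; an implied fact holds if we reach ancestor."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ISA.get(concept)
    return False


# Hypothetical populated car-ads ontology, flattened to records.
CARS = [
    {"id": "C1", "make": "Nissan", "color": "red", "year": 2004, "mileage": 82000},
    {"id": "C2", "make": "Nissan", "color": "red", "year": 2001, "mileage": 60000},
    {"id": "C3", "make": "Ford", "color": "red", "year": 2005, "mileage": 30000},
    {"id": "C4", "make": "Nissan", "color": "red", "year": 2006, "mileage": 120000},
]


def red_nissans(cars, after_year=2002, max_mileage=100_000):
    """Red Nissans newer than after_year with less than max_mileage miles."""
    return [
        car["id"]
        for car in cars
        if car["make"] == "Nissan"
        and car["color"] == "red"
        and car["year"] > after_year
        and car["mileage"] < max_mileage
    ]


if __name__ == "__main__":
    print(is_a("Dermatologist", "Medical Service Provider"))
    print(red_nissans(CARS))
```

The first function retrieves an implied fact (nothing records directly that a dermatologist is a medical service provider), while the second retrieves recorded facts that satisfy a set of constraints.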

Page 9: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

Landscape of Data and Knowledge

Includes:
• The creation of ontologies with community agreement
  • Declaration of conceptual models (via ontology editor, forms, …)
  • Recognition of meta-data in semi-structured text
• The conversion of heterogeneous digital data into knowledge under an ontology
  • Ontology-based/layout-based information extraction/annotation
  • Data integration within an ontological context
• The ability to match isolated ontologies with community ontologies
  • (Semi-)automatic schema matching
  • Traceability from symbols in a page of text to symbols as ontological components of knowledge
• The ability to reason over ontologies to retrieve information both given and implied
  • Ontologies as first-order logic theories, potentially modal logics too
  • Query (through both formal query languages and informal search) over populated ontologies for facts (both recorded and implied)

Page 10: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

A Workbench for Knowledge Engineering
• Unified framework with a toolkit supporting:
  • Ontology creation
  • Data to knowledge conversion
  • Knowledge solidification
  • Community usability
• Usable by knowledge workers of varying degrees of sophistication

Page 11: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

Ontology Creation
• Objective: Determine what concepts, relationships, and constraints exist and how they are interrelated
• Contributing Solutions (what we have done or have in progress)
  • TANGO (creation, augmentation, adjustment)
  • Forms to conceptual models (CT’s work)
  • Table interpretation through forms to conceptual models (CT’s work)
• Open Problems (what we need, and believe we can do)
  • Reverse engineer XML documents to an XML schema and then to C-XML (built on RA’s work)
  • Extract a specific ontology from a more general ontology (like YD’s MS work)
  • Merge ontologies (built on ZL’s work + LX’s work)
  • Convert regular patterns in documents to conceptual models
    • Named regular expressions over patterns (based on 598R work)
    • Generation of layout patterns converted to named patterns (based on YD’s work)
  • … more ??

Page 12: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

Data to Knowledge Conversion
• Objective: Find ways to capture facts ontologically.
• Contributing solutions
  • Ontology-based information extraction
  • Semantic annotation (YD’s work)
  • Synergistic ontology-based/layout-based extractors (YD’s work)
  • Data frames as data-to-knowledge converters
• Open problems
  • User-directed annotation (like YD’s ASpaces work)
  • User-directed conversion tools
    • Named regular-expression extractors wrt RDF, Named Graphs, OWL ontologies, OSM ontologies (like 598R work)
    • Generation of named regular-expression extractors from marked source documents (598R++ work)
  • Storage structures?
  • … more ??
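One way to picture a named regular-expression extractor is with Python's named capture groups, where each group name stands for a concept in the target ontology. This is a rough sketch under that assumption, not the 598R work itself; the pattern, the group names, and the "has…" predicate names are hypothetical.

```python
import re

# Each named group is assumed to correspond to an ontology concept.
CAR_AD = re.compile(
    r"(?P<Year>(?:19|20)\d{2})\s+"
    r"(?P<Make>Ford|Nissan|Honda)\s+"
    r"(?P<Model>\w+)"
    r".*?\$(?P<Price>\d{1,3}(?:,\d{3})*)"
)


def extract(text, subject):
    """Turn one matching ad into (subject, predicate, value) triples."""
    match = CAR_AD.search(text)
    if match is None:
        return []
    return [
        (subject, "has" + concept, value)
        for concept, value in match.groupdict().items()
    ]


if __name__ == "__main__":
    for triple in extract("2004 Nissan Altima, red, $8,500 obo", "Car1"):
        print(triple)
```

Because the group names carry the semantics, the same triples could in principle be serialized to RDF or loaded into an OWL or OSM ontology; how that mapping should work is exactly the open problem the slide names.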

A Semantic-Web page consists of
• the human-readable page (ordinary HTML, XML, …)
• one or more annotation attachments
  • a reference to the ontology used for annotation
  • RDF triples of extracted information
  • pointers into the original source for every item
  • highlighting possibilities for extracted data
  • hover possibilities to connect to the ontology
• direct queries to the annotation attachment
  • SPARQL
  • SerFR
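The annotation attachment described above might be modeled roughly as follows. This is a minimal sketch under stated assumptions: the class names, the ontology URL, the character-offset pointers, and the `[[...]]` highlight markers are all invented for illustration and do not reflect an actual attachment format.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    subject: str
    predicate: str
    value: str
    start: int  # pointer into the source page (character offset)
    end: int


@dataclass
class AnnotationAttachment:
    ontology_ref: str  # reference to the ontology used for annotation
    annotations: list = field(default_factory=list)

    def annotate(self, page, subject, predicate, value):
        """Record a triple together with a pointer to its source text."""
        start = page.find(value)
        if start < 0:
            raise ValueError(f"{value!r} does not occur in the page")
        self.annotations.append(
            Annotation(subject, predicate, value, start, start + len(value))
        )

    def triples(self):
        return [(a.subject, a.predicate, a.value) for a in self.annotations]

    def highlight(self, page):
        """Mark every annotated span; assumes spans do not overlap."""
        for a in sorted(self.annotations, key=lambda a: a.start, reverse=True):
            page = page[:a.start] + "[[" + page[a.start:a.end] + "]]" + page[a.end:]
        return page


if __name__ == "__main__":
    page = "2004 Nissan Altima, red, $8,500"
    attachment = AnnotationAttachment("http://example.org/ontologies/car-ads")
    attachment.annotate(page, "Car1", "hasMake", "Nissan")
    attachment.annotate(page, "Car1", "hasPrice", "$8,500")
    print(attachment.triples())
    print(attachment.highlight(page))
```

Keeping the offsets next to each triple is what makes highlighting and traceability back to the source page possible without modifying the human-readable page itself.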

Page 13: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

Knowledge Solidification
• Objective: Obtain community agreement for fact-filled ontologies.
• Contributing solutions
  • Provide for recording provenance for individual facts (HC’s work for genealogical data)
  • TANGO: assume published tables have community agreement and therefore fact-filled ontologies grown from tables have community agreement.
  • Generally, assume published semi-structured data has community agreement and therefore ontologies and facts extracted from this semi-structured data have community agreement (CT’s work)
• Open problems
  • How do we solidify knowledge captured only as conceptualized data (i.e., data extracted with respect to somebody’s “homegrown” ontology)? (Do we need to worry about this?)
  • Can we link identical facts in different sites? (begun with HC’s work)
  • Can we (should we) find ways to attach provenance to the ontology itself (not just to the facts)?
  • Tool for community development of ontologies
  • … more ??
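Recording provenance per fact and linking identical facts across sites could look something like the sketch below. The genealogical facts and source names are invented, and counting independent sources is only a crude, assumed stand-in for community agreement, not a method the slides propose.

```python
from collections import defaultdict

# Hypothetical recorded facts, each paired with the source it came from.
RECORDED = [
    (("Person(P1)", "bornIn", "Year(1850)"), "site-a.example/census"),
    (("Person(P1)", "bornIn", "Year(1850)"), "site-b.example/parish"),
    (("Person(P1)", "bornIn", "Year(1851)"), "site-c.example/tree"),
]


def link_identical_facts(recorded):
    """Group sources by fact so identical facts from different sites are linked."""
    sources = defaultdict(set)
    for fact, source in recorded:
        sources[fact].add(source)
    return dict(sources)


def corroborated(recorded, min_sources=2):
    """Facts attested by at least min_sources distinct sources (assumed threshold)."""
    linked = link_identical_facts(recorded)
    return sorted(
        fact for fact, srcs in linked.items() if len(srcs) >= min_sources
    )


if __name__ == "__main__":
    for fact, srcs in link_identical_facts(RECORDED).items():
        print(fact, sorted(srcs))
    print(corroborated(RECORDED))
```

Conflicting facts (here, the two birth years) remain visible with their provenance, so a reader or a community process can still judge which source to trust.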

Page 14: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

Community Usability
• Objective: Provide (easy) access to knowledge, both ontological knowledge and facts.
• Contributing solutions
  • Ordinary query processing including servicing requests via free-form queries and service requests (MA’s work)
  • Information harvesting (CT’s work)
  • Form/Table query processing (built on CT’s work and RPI’s query-by-table)
  • Information linkage (HC’s work, SI’s work)
• Open problems
  • Agents (e.g., in YD’s ASpaces work)
  • Learning and self-adjustment of individual knowledge (How does my knowledge align with community knowledge?), for the sake of
    • Gaining encyclopedic knowledge
    • Discovering gaps in knowledge
    • Discovering potential adjustments and augmentations to community knowledge and solidifying community knowledge
    • Seeing knowledge objects from a different point of view
  • Orchestrating ontology-based services (MA’s future work)
  • Practicalities?
  • … more ??

• Ease of Use
  • Free-form queries (+ linguistics)
  • Form-based queries (graphical?)
• Scalability
  • Semantic indexes
  • Caching (on the scale of Google)
• System Development
  • Demos
  • Open source tools
• How do we sell the idea?

Page 15: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

Intellectual Merit

Achievable intellectual objectives of this research:
• Provides an answer to the question about how to turn syntactic symbols into semantic knowledge
• Shows how to create a web of data
• Shows how to establish a workbench with toolkits to convert heterogeneous digital data into knowledge under the auspices of an ontology
• Explores the synergistic interplay among ontology, epistemology, and logic for the advancement of knowledge
  • New ways to think about
    • What knowledge is
    • How knowledge is acquired
    • What individuals know
    • Community knowledge
  • Query and reasoning over fact-filled ontologies

Page 16: An Aspect of the NSF CDI Initiative CDI: Cyber-Enabled Discovery and Innovation.

Broader Impact

Worthwhile implications of this research:
• Harvests and makes available facts from the wealth of available heterogeneous digital data
• Harnesses and manages community knowledge with the objective of enhancing human cognition
• Makes facts on the web (rather than pages) easily searchable by the general public
• Makes fact creation and maintenance easily attainable by fact providers
• Facilitates community agreement of ontologically specified knowledge
• Provides a practical set of tools for knowledge management
• Involves students, researchers, and knowledge workers from various disciplines in a community-wide effort to convert data into knowledge