Heterogeneous Information Management

Page 1: Heterogeneous   Information Management

04/19/23 Gio - CERN 1

Heterogeneous Information Management

June 2000

Gio Wiederhold, Stanford University

prepared for CERN seminar, June 2000

Page 2: Heterogeneous   Information Management

04/19/23 Gio - CERN 2

Abstract

Information is created by applying knowledge (encoded as programs or rules) to collected data and messages received.

Data and computation resources are provided by a variety of suppliers, public and private.

The autonomy of the suppliers causes heterogeneity and inconsistencies. The number of potential suppliers and their autonomy also create information overload.

To cope with these issues novel intermediate services are needed, opening up new opportunities. Many traditional relationships among consumers and vendors will change.

We will present the concepts and status of such services. Collaboration, security, and payment schemes are some of the considerations.

Page 3: Heterogeneous   Information Management

04/19/23 Gio - CERN 3

Outline

• Background for Mediated Systems
• Motivation and Functions needed
• Architecture
• Current Status
• Resolving Semantic Heterogeneity
• Research Directions
• Background
  – Maintenance
  – Research Projects
  – Integration of Simulation Information

Page 4: Heterogeneous   Information Management

04/19/23 Gio - CERN 4

Evolution of mediation

[Figure: evolution of mediation in five stages, a. through e. Elements shown: data sources (D1-D6), wrappers (W1-W3), mediators (M1, M2), network, integrators (I1, I2), applications (A1-A6).]

Page 5: Heterogeneous   Information Management

04/19/23 Gio - CERN 5

Transforming Data to Information

Application Layer: users at workstations

Mediation Layer: value-added services

Foundation Layer: data and simulation resources

Page 6: Heterogeneous   Information Management

04/19/23 Gio - CERN 6

Data and Knowledge

Information is created at the confluence of data (the state) and knowledge (the ability to select and project the state into the future).

[Figure: two cycles, the Data Loop and the Knowledge Loop. Labels: Education, Recording, Action, Storage, Selection, Integration, Summarization, Decision-making, State changes, Abstraction, Experience.]

Page 7: Heterogeneous   Information Management

04/19/23 Gio - CERN 7

Definition*

A mediator is a software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications.

It should be small and simple, so that it can be maintained by one expert or, at most, a small and coherent group of experts.

* Wiederhold: IEEE Computer March 1992

Page 8: Heterogeneous   Information Management

04/19/23 Gio - CERN 8

Information overload Data starvation

• More databases
  – public & corporate
• Faster communication
  – digital
  – packeting: TCP-IP, ATM
• World-wide connectivity
  – Internet & Intranets
  – world-wide web
• Disintermediation
  – ubiquitous publishing

Page 9: Heterogeneous   Information Management

04/19/23 Gio - CERN 9

Change in Supply vs Demand

What information consumes is rather obvious: it consumes the attention of its recipients.

Hence a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it.

[Herbert Simon]

Page 10: Heterogeneous   Information Management

04/19/23 Gio - CERN 10

Function of Mediation

Apply Domain-specific Specialist Knowledge to add value

• to locate data sources
• to convert for consistency
• to integrate from diverse sources
• to describe data for processing
• to abstract for insight / models
• to extrapolate to new situations
• to summarize for presentation

INFORMATION
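
Three of the functions listed above (convert, integrate, summarize) can be sketched as a small pipeline. This is a minimal illustration and not code from the talk: the two source layouts, the field names, and the Fahrenheit-to-Celsius conversion are all hypothetical.

```python
from statistics import mean

# Hypothetical wrapped sources: each reports temperatures in its own convention.
SOURCE_A = [{"site": "alpha", "temp_c": 20.0}, {"site": "beta", "temp_c": 30.0}]
SOURCE_B = [{"station": "alpha", "temp_f": 71.6}]


def convert(record):
    """Convert for consistency: one naming scheme, one unit (Celsius)."""
    if "temp_f" in record:
        celsius = (record["temp_f"] - 32.0) * 5.0 / 9.0
        return {"site": record["station"], "temp_c": celsius}
    return {"site": record["site"], "temp_c": record["temp_c"]}


def integrate(*sources):
    """Integrate from diverse sources: group converted readings by site."""
    merged = {}
    for source in sources:
        for record in map(convert, source):
            merged.setdefault(record["site"], []).append(record["temp_c"])
    return merged


def summarize(merged):
    """Summarize for presentation: one mean value per site."""
    return {site: round(mean(values), 1) for site, values in merged.items()}


def mediate(*sources):
    """The mediator: applications see only the summarized result."""
    return summarize(integrate(*sources))
```

An application would call `mediate(SOURCE_A, SOURCE_B)` and never touch the source-specific field names, which is the point of placing the domain knowledge in the mediation layer.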

Page 11: Heterogeneous   Information Management

04/19/23 Gio - CERN 11

Interfaces

[Figure: layered interfaces around MEDIATION. Interfaces: user interface, service interface, resource access interface, real-world interface. Layers: human-computer interaction, application-specific code, domain-specific code, source-specific code.]

Page 12: Heterogeneous   Information Management

04/19/23 Gio - CERN 12

Making data relevant

• Data reduction
• Data abstraction
  – Level changing
  – Summarization
  – Exception search
  – Level change to integrate with other data sources
• Follow Customer Model: hierarchical, divide-and-conquer, a common paradigm

Page 13: Heterogeneous   Information Management

04/19/23 Gio - CERN 13

Functions inside Mediation

[Figure: functions inside mediation. Labels: Selection, Summarize, Transform, Integration, articulation, heterogeneous resources.]

Page 14: Heterogeneous   Information Management

04/19/23 Gio - CERN 14

Status of Mediation Technology

Today
• Handcrafted
• Expert consults with programmer
• Programmer codes the knowledge needed
• Resource changes require advice, program update

Future
• Generated from models
• Domain Expert maintains models
• Specification determines functions
• Resource changes trigger regeneration

Page 15: Heterogeneous   Information Management

04/19/23 Gio - CERN 15

Coverage of Current DARPA I3 Efforts

[Figure: coverage chart. Each area is rated with an emoticon on the scale: good progress / active research / related work / poor coverage.]

• Databases / Web / Text / Simulation
• Facilitation (auto linking)
• Maintenance (rule technology?)
• Discovery (web, schema searching)
• Wrapping (syntactical heterogeneity)
• Integration over sources
• Abstraction for relevance to customer
• Mediators for multiple domains
• Caching / History
• Security for cooperation :-(

Page 16: Heterogeneous   Information Management

04/19/23 Gio - CERN 16

Mediator Design Principle

Transform Data into Information

Match Customer Model (hierarchical) to Resource Model (general network)

(and maintain models)

Page 17: Heterogeneous   Information Management

04/19/23 Gio - CERN

Heterogeneity among Domains

If interoperation involves distinct domains, mismatch ensues
• Autonomy conflicts with consistency
  – Local Needs have Priority
  – Outside uses are a Byproduct

Heterogeneity must be addressed
• Platform and Operating Systems
• Representation and Access Conventions
• Naming and Ontology

Page 18: Heterogeneous   Information Management

04/19/23 Gio - CERN 18

Unsolved problem in Interoperation

Common assumption in assembling and integrating distributed information resources

• The language used by the resources is the same
• Sublanguages used by the resources are subsets of a globally consistent language

This assumption is provably false.

Working towards the goal of global consistency is

1. naïve -- the goal cannot be achieved

2. inefficient -- languages are efficient in local contexts

Page 19: Heterogeneous   Information Management

04/19/23 Gio - CERN 19

Ontology: components .

We represent the contents and structure of a language by its ontology:

• a set of well-defined terms, which delimit the domain of discourse

• relationships among those terms, chosen from a limited set

a formalizable subset of expert knowledge

Page 20: Heterogeneous   Information Management

04/19/23 Gio - CERN 20

SKC’s grounded definition .

• Ontology: a set of terms and their relationships
• Term: a reference to real-world and abstract objects
• Relationship: a named and typed set of links between objects
• Reference: a label that names objects
• Real-world object: an entity instance with a physical manifestation
• Abstract object: a concept which refers to other objects
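
These definitions map naturally onto a small data structure. The sketch below is one possible encoding, not SKC's actual representation: the class names, the `abstract` flag, and the shoe-store example are assumptions, and a relationship is simplified to a single typed link rather than a set of links.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Term:
    """A reference (label) to a real-world or abstract object."""
    label: str
    abstract: bool = False  # True for a concept, False for a physical entity


@dataclass(frozen=True)
class Relationship:
    """A named link between two terms (one link, for simplicity)."""
    name: str
    source: Term
    target: Term


@dataclass
class Ontology:
    """A set of terms and the relationships among them."""
    name: str
    terms: set = field(default_factory=set)
    relationships: set = field(default_factory=set)

    def relate(self, name, source, target):
        # Adding a relationship also registers both of its terms.
        self.terms.update({source, target})
        rel = Relationship(name, source, target)
        self.relationships.add(rel)
        return rel


# Hypothetical example: a shoe has the attribute "size".
store = Ontology("shoe store")
shoe = Term("shoe")
size = Term("size", abstract=True)
store.relate("has-attribute", shoe, size)
```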

Page 21: Heterogeneous   Information Management

04/19/23 Gio - CERN 21

Where are Ontologies found?

Ontologies allow communication among partners in enterprises (rarely in machine-readable form)

Relationships determine meaning - parent, school, company

Variable and Class names in Software

Databases use ontologies during design in their E-R diagrams (implicitly) and to represent the leaf nodes in their schemas.

Knowledge-bases use term ontologies (often explicitly), adding class definitions (to hold instances), constraints, and operations among the terms.

Page 22: Heterogeneous   Information Management

04/19/23 Gio - CERN 22

Establishing Ontologies

Top-down
– Commonly acceptable UPPER layers

Domain-specific
– Analysis and Sharing tools
– Model and Object-type based

Bottom-up
– Wordlist creation from task-specific collections
– Database models, schemas, and contents

Page 23: Heterogeneous   Information Management

04/19/23 Gio - CERN 23

Large Ontologies: good or bad?

Have all the Knowledge together
+ simple for customers of KBs
– hard for owners of KBs, must synchronize with many others
– in the limit -- everybody must be globally consistent

Large KB will cover multiple / all domains
  created by a committee -- slow
  maintained by a committee -- costly

Differences in level of abstraction -- efficiency
  homeowner: nail
  carpenter: sinker, brad, box nail, . . .

Page 24: Heterogeneous   Information Management

04/19/23 Gio - CERN 24

Domain ontology assumption .

• a domain will contain known objects
• the object configuration is consistent
• within a domain all terms are consistent &
• relationships among objects are consistent

• context is implicit in use
• explicit context is needed for external use

No committee is needed to forge compromises * within a domain

* Compromises hide valuable details

Domain Ontology

Page 25: Heterogeneous   Information Management

04/19/23 Gio - CERN 25

SKC Objective

Provide for Maintainable Ontologies

• devolve maintenance onto many domain-specific experts / authorities

• provide an algebra to compute composed ontologies that are limited to their articulation terms

• enable interpretation within the source contexts

SKC

Page 26: Heterogeneous   Information Management

04/19/23 Gio - CERN 26

Conservative assumption !

When dealing with multiple ontologies one can never be sure that identically or similarly spelled words mean the same thing, i.e., refer to exactly the same set of real-world objects under all current and future conditions.

• Common, optimistic assumption: Meaning is identical
  – Gets worse when terms are stemmed
• SKC, conservative or pessimistic assumption: Meaning never matches, unless there is a match rule
  – number of matching rules is reduced by focusing on the articulation
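
The difference between the two assumptions can be made concrete. In this sketch, which is an illustration and not SKC code, every term is qualified by its ontology, and two terms are treated as equivalent only when an explicit rule links them. The ontology names, the rule set, and the function names are hypothetical, and the color rule is simplified to a plain equivalence.

```python
# Each term is a (ontology, label) pair, so "nail" in two ontologies
# stays two different terms until a rule says otherwise.
MATCH_RULES = {
    frozenset({("store", "size"), ("factory", "size")}),
    frozenset({("store", "color"), ("factory", "colcode")}),
}


def same_meaning(term_a, term_b, rules=MATCH_RULES):
    """Conservative test: True only if an explicit rule links the two terms."""
    return frozenset({term_a, term_b}) in rules


def optimistic_same_meaning(term_a, term_b):
    """The common assumption: identical spelling implies identical meaning."""
    return term_a[1] == term_b[1]
```

Under the optimistic test, `("anatomy", "nail")` and `("hardware", "nail")` would be merged; under the conservative test they stay apart because no rule mentions them.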

Page 27: Heterogeneous   Information Management

04/19/23 Gio - CERN 27

An Ontology Algebra

A knowledge-based algebra for ontologies

The Articulation Ontology (AO) consists of matching rules that link domain ontologies

Intersection: create a subset ontology; keep sharable entries

Union: create a joint ontology; merge entries

Difference: create a distinct ontology; remove shared entries
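
The three operations can be sketched over plain sets. This is a simplified illustration under stated assumptions: an ontology is reduced to a set of term labels, the articulation is a list of `(term_in_first, term_in_second)` matching rules, and the store and factory contents are hypothetical.

```python
def intersection(rules):
    """Subset ontology: only the entries that a matching rule makes sharable."""
    return set(rules)


def union(onto1, onto2, rules):
    """Joint ontology: every entry, with rule-linked pairs merged into one."""
    matched1 = {a for a, _ in rules}
    matched2 = {b for _, b in rules}
    return (
        set(rules)
        | {(term, None) for term in onto1 - matched1}   # only in the first
        | {(None, term) for term in onto2 - matched2}   # only in the second
    )


def difference(onto1, rules):
    """Distinct ontology: entries of onto1 that no rule shares."""
    return onto1 - {a for a, _ in rules}


STORE = {"shoes", "customers", "employees", "size"}
FACTORY = {"shoes", "machinery", "employees", "size"}
# "employees" is deliberately left unmatched: same spelling, different people.
RULES = [("shoes", "shoes"), ("size", "size")]
```

Note that `union` keeps the two `employees` entries separate, because no rule links them.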

Page 28: Heterogeneous   Information Management

04/19/23 Gio - CERN 28

Sample Operation: INTERSECTION

Source Domain 1: Owned and maintained by Store

Result contains shared terms

Source Domain 2: Owned and maintained by Factory

Terms useful for purchasing

Page 29: Heterogeneous   Information Management

04/19/23 Gio - CERN 29

INTERSECTION support

Store Ontology

Articulation ontology

Matching rules that use terms from the 2 source domains

Factory Ontology

Terms useful for purchasing

Page 30: Heterogeneous   Information Management

04/19/23 Gio - CERN 30

Sample Intersections

Shoe Store
• Shoes { . . . }
• Customers { . . . }
• Employees { . . . }

Shoe Factory
• Material inventory { . . . }
• Employees { . . . }
• Machinery { . . . }
• Processes { . . . }
• Shoes { . . . }

Matching rules shown between Shoe Store and Shoe Factory:
  size = size
  color = table(colcode)
  style = style

Other ontologies shown: Anatomy { . . . }, Hardware, Department Store

Articulation ontology matching rules:
  foot = foot
  Employees   Employees
  Nail (toe, foot)   Nail (fastener)
  . . .

Page 31: Heterogeneous   Information Management

04/19/23 Gio - CERN 31

Other Basic Operations

UNION: merging entire ontologies

DIFFERENCE: material fully under local control

typically prior intersections

Articulation ontology

Page 32: Heterogeneous   Information Management

04/19/23 Gio - CERN 32

Features of an algebra

Operations can be composed

Operations can be rearranged

Alternate arrangements can be evaluated

Optimization is enabled

The record of past operations can be kept and reused

Page 33: Heterogeneous   Information Management

04/19/23 Gio - CERN 33

Knowledge Composition

[Figure: knowledge resources A, B, C, D, and E, linked pairwise by articulation knowledge for (A ∩ B), (B ∩ C), (C ∩ D), and (C ∩ E). The composed knowledge for applications using A, B, C, E is the articulation knowledge for (A ∩ B) ∪ (B ∩ C) ∪ (C ∩ E). Legend: ∪ union, ∩ intersection.]

Page 34: Heterogeneous   Information Management

04/19/23 Gio - CERN 34

Sample Processing in HPKB

• What is the most recent year an OPEC member nation was on the UN security council?
  – Related to DARPA HPKB Challenge Problem
  – SKC resolves 3 Sources
    • CIA Factbook ‘96 (nation)
    • OPEC (members, dates)
    • UN (SC members, years)
  – SKC obtains the Correct Answer
    • 1996 (Indonesia)
  – Other groups obtained more, but factually wrong answers
  – Problems resolved by SKC
    * Factbook has out of date OPEC & UN SC lists
      – Indonesia not listed
      – Gabon (left OPEC 1994)
    * different country names
      – Gambia => The Gambia
    * historical country names
      – Yugoslavia
    • UN lists future security council members
      – Gabon 1999
    • intent of original question
      – Temporal variants
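
The kinds of problems listed above (stale membership, name variants, future entries) can be shown on toy data. The sketch below is not the SKC implementation; the tables are invented for illustration and shaped only to mirror the issues on the slide, and all function and variable names are hypothetical.

```python
# Toy data for illustration only; not copied from the actual sources.
OPEC_MEMBERSHIP = {            # country -> (joined, left or None)
    "Indonesia": (1962, None),
    "Gabon": (1975, 1994),
}
UN_SC_YEARS = {                # country -> years on the Security Council
    "Indonesia": [1995, 1996],
    "Gabon": [1998, 1999],     # a term that lies after the cutoff year
    "Gambia": [1998, 1999],
}
NAME_RULES = {"Gambia": "The Gambia"}  # articulation rule for a name variant


def canonical(name):
    """Apply name-matching rules so all sources use one spelling."""
    return NAME_RULES.get(name, name)


def was_opec_member(country, year):
    """Membership is temporal: a country counts only while it belongs."""
    if country not in OPEC_MEMBERSHIP:
        return False
    joined, left = OPEC_MEMBERSHIP[country]
    return joined <= year and (left is None or year <= left)


def most_recent_opec_year_on_sc(as_of):
    """Latest year <= as_of in which an OPEC member sat on the council."""
    hits = [
        (year, canonical(country))
        for country, years in UN_SC_YEARS.items()
        for year in years
        if year <= as_of and was_opec_member(canonical(country), year)
    ]
    return max(hits) if hits else None
```

With a cutoff of 1996 the toy data yields Indonesia in 1996; the later Gabon entries are rejected both by the cutoff and by the membership interval.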

Page 35: Heterogeneous   Information Management

04/19/23 Gio - CERN 35

Tools to create articulations

[Figure: a graph matcher for the articulation-creating expert. Inputs: Vehicle ontology and Transport ontology. Output: suggestions for articulations.]

Page 36: Heterogeneous   Information Management

04/19/23 Gio - CERN 36

continue from initial point

Also suggest similar terms for further articulation:

• by spelling similarity
• by graph position
• by term match repository

Expert response:
1. Okay
2. False
3. Irrelevant to this articulation

All results are recorded

Okay’s are converted into articulation rules
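
The suggest-and-confirm loop can be sketched with Python's standard `difflib` module. This covers only the spelling-similarity heuristic; the threshold, the verdict strings, and the function names are assumptions, and the real tool also uses graph position and the match repository.

```python
from difflib import SequenceMatcher


def suggest(terms_a, terms_b, threshold=0.8):
    """Propose candidate term pairs whose spellings are similar."""
    pairs = []
    for a in terms_a:
        for b in terms_b:
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    # Strongest candidates first, so the expert sees them at the top.
    return sorted(pairs, key=lambda pair: -pair[2])


def record_responses(suggestions, responses):
    """Keep every verdict in a log; turn only the 'okay' ones into rules."""
    log = [(a, b, responses[(a, b)]) for a, b, _ in suggestions]
    rules = [(a, b) for a, b, verdict in log if verdict == "okay"]
    return log, rules
```

A session might call `suggest(vehicle_terms, transport_terms)`, collect the expert's verdict (`"okay"`, `"false"`, or `"irrelevant"`) for each pair, and pass both to `record_responses`.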

Page 37: Heterogeneous   Information Management

04/19/23 Gio - CERN 37

Candidate Match Repository

Term linkages automatically extracted from the 1912 Webster’s dictionary *

* free; other sources have been processed.

Based on processing headwords definitions using algebra primitives

Notice presence of 2 domains: chemistry, transport

Page 38: Heterogeneous   Information Management

04/19/23 Gio - CERN 38

Using the match repository

Page 39: Heterogeneous   Information Management

04/19/23 Gio - CERN 39

Navigating the match repository

Page 40: Heterogeneous   Information Management

04/19/23 Gio - CERN 40

Primitive Operations

Unary
• Summarize - structure up
• Glossarize - list terms
• Filter - reduce instances
• Extract - circumscription

Binary
• Match - data corroboration
• Difference - distance measure
• Intersect - schema discovery
• Blend - schema extension

Constructors
• create object
• create set

Connectors
• match object
• match set

Editors
• insert value
• edit value
• move value
• delete value

Converters
• object - value
• object indirection
• reference indirection

Model and Instance

Page 41: Heterogeneous   Information Management

04/19/23 Gio - CERN 41

Future: exploiting the result

Processing & query evaluation is best performed within Source Domains & by their engines

Result has links to source

Avoid the n^2 problem of interpreter mapping, as stated by Swartout as an issue in HPKB year 1

Page 42: Heterogeneous   Information Management

04/19/23 Gio - CERN 42

SKC Synopsis

• Research: Reliable query answers from heterogeneous, imperfect data sources

• Sources:
  – General: CIA World Factbook ‘96, UN www, OPEC www, Webster’s Dictionary, Thesaurus, Oxford English Dictionary
  – Topical: OPEC, BattleSpace Sensors, Logistics Servers
• Client: DARPA High Performance Knowledge Base (HPKB) project
• Theory: Rule-based algebra
  – Translation & Composition primitives

Page 43: Heterogeneous   Information Management

04/19/23 Gio - CERN 43

Innovation in SKC

• No need to harmonize full ontologies
• Focus on what is critical for interoperation
• Rules specific for articulation
• Potentially many sets of articulation rules
• Maintenance is distributed
  – to n sources
  – to m articulation agents

Is m < n^2? That depends on architecture density: a research question
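
A rough count shows why density matters. This is a back-of-the-envelope sketch, assuming at most one articulation per unordered pair of sources (an assumption the slide does not make, since it allows many sets of articulation rules) and n ≥ 1:

```latex
m \;\le\; \binom{n}{2} \;=\; \frac{n(n-1)}{2} \;<\; n^{2}
\qquad \text{e.g. } n = 10:\quad
m \le 45 \ \text{if every pair is articulated},\quad
m = n - 1 = 9 \ \text{if the sources form a chain.}
```

Under that assumption m stays below n^2 even in the densest case, and a sparse architecture brings it close to linear in n.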

Page 44: Heterogeneous   Information Management

04/19/23 Gio - CERN 44

Domain Specialization

• Knowledge Acquisition (20% effort) &
• Knowledge Maintenance (80% effort *)

to be performed by
• Domain specialists
• Professional organizations
• Field teams of modest size

Empowerment: autonomously maintainable

* based on experience with software

Page 45: Heterogeneous   Information Management

04/19/23 Gio - CERN 45

SKC Summary .

• Algebra enables Interoperation by
  – dealing explicitly with differences by knowledge
  – identifying maintenance domains
  – keeping sources autonomous
• Assumes domain has a common ontology
  – composing domain ontologies requires the algebra to manage the linkages where articulation occurs
  – processes are best executed within the domains
• Knowledge about articulation is disjoint
  – allows integration specialists to work independently
  – supports multiple intersections and views
• Maintenance is structured and partitioned

Page 46: Heterogeneous   Information Management

04/19/23 Gio - CERN 46

Current SKC Directions

• Experience with real world (imperfect) data confirms validity of our approach

– Expert sources are better maintained than general sources
– Rules applied to multiple sources provide more reliable and accurate query results
– Component architecture enables scalable, maintainable knowledge base development

• Porting the concepts to the DARPA Markup Language (DAML) setting

Page 47: Heterogeneous   Information Management

04/19/23 Gio - CERN 47

Mediation Research Topics

• Mediator management and maintenance
• Representation of knowledge and customer models
• Balancing dynamic and warehouse solutions
• Formalization of semantic heterogeneities
  – many levels and types
  – roles for wrappers vs. mediators vs. applications
  – scalability by partitioning -- make it simple!
  – Domain Ontologies --- tools, validation, . . .
• Effect of object paradigm and method-based access
• Service and business models
• New types of information systems

Page 48: Heterogeneous   Information Management

04/19/23 Gio - CERN 48

Long Range Science Vision

[Figure: Integration Science (Integration Methods) surrounded by contributing fields.]

Artificial Intelligence: knowledge mgmt, domain expertise, uncertainty

Systems Engineering: analysis, documentation, costing

Databases: access, storage, algebras

GIS: Spatial is special.

Page 49: Heterogeneous   Information Management

04/19/23 Gio - CERN 49

Background Material:

• Technology Sources
• Maintenance
• Projects
• Information about the Future

Page 50: Heterogeneous   Information Management

04/19/23 Gio - CERN 50

Interfaces

Application / Mediator: {OQL, KQML, ...}

Mediator / Data sources: {SQL, TQL, XML, … }

Data / real world: {sensors, clerks, … }

Human / Computer: {x-widgets, HTML}

Page 51: Heterogeneous   Information Management

04/19/23 Gio - CERN 51

Support for KB-Algebra

• Ontolingua [Gruber, Fikes @ Stanford KSL]: Repository for Domain Terminologies

Used for mechanical design, bibliographies, catalogs

• LOOM [MacGregor @ USC ISI]: Classification-based Expert System

Helps in structuring and processing ontologies

• PROTÉGÉ [Musen @ Stanford MIS]: Reuse

• Penguin [Barsalou, Keller @ Stanford MIS, CIFE]: Object manipulation based on Relational Algebra

Used for genetics laboratory, building design

Page 52: Heterogeneous   Information Management

04/19/23 Gio - CERN 52

Getting there: Available Technology/Science

• Caching
• Uncertainty algebras
• GIS
• Temporal Algebras
• Active Databases
• Agents
• Web Search Tools
• Security Filters
• Object Bases
• Knobots
• Wrappers
• DB Views
• High Perf. Comm.
• Simulation Access
• Database Models
• Internet Billing
• Customer Models
• Constraint Management
• Case-based Reasoning
• Distributed Storage Systems
• Multimedia Interfaces
• Circumscription
• Communication Standards
• Domain Ontologies
• Text & Speech Processing
• Public Databases

Page 53: Heterogeneous   Information Management

04/19/23 Gio - CERN 53

Fat versus thin mediators

[Figure: mediators placed on two axes, domain scope and service scope, with a "Just right" region.]

• Too broad: hard to maintain, needs a committee
• Too narrow: few customers
• Too fat: hard to compose
• Too thin: insufficient added value

Page 54: Heterogeneous   Information Management

04/19/23 Gio - CERN 54

Maintenance is good for you

[Figure: chart of relative annual maintenance cost (0 to 100%) and lifetime in years (1 to 13) for automobile, hardware, and software. Axis note: depreciation = 1 / lifetime.]

Page 55: Heterogeneous   Information Management

04/19/23 Gio - CERN 55

Client-Server Architecture

Client system

data and simulation resources

Fast build of clients by resource reuse

Changes (x) are difficult, can affect many clients

Page 56: Heterogeneous   Information Management

04/19/23 Gio - CERN 56

Systems with Mediators

Applications . . . .

Mediators . . . . . .

Data Resources . . .

Gio Wiederhold. 1995

Page 57: Heterogeneous   Information Management

04/19/23 Gio - CERN 57

Growth through Reuse

New Application

Prior & Revised Mediators

Extended Data Resources

Gio Wiederhold. 1995

Page 58: Heterogeneous   Information Management

04/19/23 Gio - CERN 58

Linear O(n) Cost of Growth -- now O(n^2)

• Data changes only affect some mediators; only in their domain
• Mediators can
  1. supply old information to n-1 prior applications
  2. provide better information to the new application
  3. be partially or completely reused
• New applications, using the new data, can be developed and inserted dynamically

Page 59: Heterogeneous   Information Management

04/19/23 Gio - CERN 59

A mediator is not just static software: Knowledge ages

Application Interface

Resource Interfaces

Owner / Creator, Maintainer, Lessor - Seller, Advertiser

Changes of user needs

Domain changes

Resource changes

Models, programs, rules, caches, . . .

Software & People

Page 60: Heterogeneous   Information Management

04/19/23 Gio - CERN 60

Roles

Computer Scientists
• Provide tools
  – adaptation
  – integration
  – matching
  – composing
• Assess Standards
• Assure scalability

Domain Experts
• Learn to use the tools
• Select resources
• Assess their value
• Rank their quality
• Resolve semantics
• Get client feedback
• Provide feedback

Page 61: Heterogeneous   Information Management

04/19/23 Gio - CERN 61

Assigning maintenance responsibility

a. Source data quality – supplier database, files, or web pages

b. Interface to the source – wrapper, supplier or vendor for supplier

c. Source selection – expert specialist in mediator

d. Source quality assessment – customer input to mediator

e. Semantic interoperation – specialist group providing input to the mediator

f. Consistency and metadata information – mediator service operation or warehouse

g. Informal, pragmatic integration – client services with customer input

h. User presentation formats – client services with customer input

Services

Sources

Customers

Page 62: Heterogeneous   Information Management

04/19/23 Gio - CERN 62

Sample projects

• TSIMMIS at Stanford
• E-Commerce in Digital Libraries
• INEEL: information integration for environmental restoration
• MIFT: feedback for training
• Civil Engineering and Architecture
• F-22
• SimQL
• Security

Page 63: Heterogeneous   Information Management

04/19/23 Gio - CERN 63

Projects at Stanford DB group

Research areas:
• Data Mining
• Mediator & Wrapper Generation
• Warehousing
• Security Mediators
• Megaprogramming
• Simulation Access
• Changes, Consistency, and Configurations

Projects: TSIMMIS, CHAIMS, SimQL, TIHI, C3, MIDAS, WHIPS

Page 64: Heterogeneous   Information Management

04/19/23 Gio - CERN 64

The TSIMMIS Project
Ramana Yerneni, Yannis Papakonstantinou, ...

• Objective: Support mediation technology
  – integrated access to distributed, autonomous, heterogeneous data sources, using object fusion
  – wrapper toolkit to rapidly create wrappers, based on source specification, a uniform interface to heterogeneous sources
  – mediator toolkit to rapidly construct mediators, based on a mediator specification, to integrate data from a set of wrappers

Page 65: Heterogeneous   Information Management

04/19/23 Gio - CERN 65

Investors Need to Fuse Information from Multiple Sources

• group together information about the same real-world entity
• remove redundancies
• resolve conflicts

[Figure: sources reached over the network: WWW, ticker tape, personal database.]
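
The three fusion steps (group, remove redundancies, resolve conflicts) can be sketched as follows. This is an illustration and not TSIMMIS code: the records, the field names, and the conflict policy (the source passed first wins) are all assumptions.

```python
# Hypothetical records about the same company from three wrapped sources.
WWW = [{"ticker": "XYZ", "name": "XYZ Corp.", "price": 10.0}]
TICKER_TAPE = [{"ticker": "xyz", "price": 10.5}]
PERSONAL_DB = [{"ticker": "XYZ", "name": "XYZ Corp.", "shares_held": 100}]


def fuse(*sources):
    """Fuse records that refer to the same real-world entity.

    Sources are given in order of decreasing trust: on a conflict,
    the value from the earlier source is kept.
    """
    fused = {}
    for source in sources:
        for record in source:
            key = record["ticker"].upper()          # group by entity identity
            entity = fused.setdefault(key, {"ticker": key})
            for attr, value in record.items():
                if attr == "ticker":
                    continue
                # Redundant values collapse into one; conflicting values
                # are resolved in favor of the earlier source.
                entity.setdefault(attr, value)
    return fused
```

Calling `fuse(TICKER_TAPE, WWW, PERSONAL_DB)` treats the ticker tape as the most trusted source for price, while the name and holdings come from the other two.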

Page 66: Heterogeneous   Information Management

04/19/23 Gio - CERN 66

An Integration Architecture

[Figure: integration architecture with a client application, a mediator, and wrappers around the sources TickerTape and Dialog. Data labels: business reports, portfolios for each company, stock market prices.]

Page 67: Heterogeneous   Information Management

04/19/23 Gio - CERN 67

Additional Challenge: Sources Without a Well-Structured Schema

• semistructured
  – irregular
  – deeply nested
• incomplete schema knowledge
  – autonomous
  – dynamic

Examples
• World Wide Web
• SGML documents
• genome, chemical structures
• bibliographic information
• files

Page 68: Heterogeneous   Information Management

04/19/23 Gio - CERN 68

Wrappers & Mediators from High-Level Specifications

[Figure: a client sits above a mediator, which sits above wrappers around the sources. A Mediator Specification Interpreter takes a declarative mediator specification; a Wrapper Specification Interpreter takes declarative source specifications.]

Page 69: Heterogeneous   Information Management

04/19/23 Gio - CERN 69

E-money

Services must be paid for
• Incentive for creation and improvement
• price proportional to value added, often small
• profit = f (cost, market, price, overhead)
• price low per item, so overhead must be low

Simple payment (no credit accounts, checks)

Enabled through secure signatures

Page 70: Heterogeneous   Information Management

04/19/23 Gio - CERN 70

E-Commerce in the Digital Library

Steven Ketchpel & DL Economics Group

Major Integration Problem

Delivery: Cryptolope, DigiBox, HTTP, E-mail

Payment: CyberCash, DigiCash, First Virtual, SET

Shopping Models: Pay-per-view, Subscription, Session, Shareware, Auctions, Site License, Gift Certificate, Layaway, Pre-paid vouchers, …

Page 71: Heterogeneous   Information Management

04/19/23 Gio - CERN 71

Shopping model: merchant-independent logic controlling flow of business model

Example shopping models:
• Order, Pay, (Deliver 52 times)
• (1 month; Order, Deliver) Pay Bill

[Figure: customer and merchant applications connected through event handlers to payment, delivery, and other services. The shopping model keeps state information and moves through numbered steps (1 to 4) labeled Start, Order Complete, Transfer $, and Payment Complete.]

Proxy event handlers translate from native applications to shopping model defined protocols

Abstract API allows application to interact with many different services in a consistent way

Page 72: Heterogeneous   Information Management

04/19/23 Gio - CERN 72

TSIMMIS Status

• Mediator Specification Interpreter running on Ultrix, AIX, OSF.
• 9000 lines of C/C++ code
• 4000 C++ lines of Server/Client Support Libraries
• Integration of three disparate bibliographic sources
  – legacy system
  – flat BibTeX files
  – relational DB
  – wwWeb files

Page 73: Heterogeneous   Information Management

04/19/23 Gio - CERN 73

Mediator Specification Interpreter Architecture

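[Figure: the query and the mediator specification enter the Query Rewriter, which emits a logical datamerge program; the Cost-Based Optimizer turns it into a plan; the Datamerge Engine executes the plan, sending queries to wrappers and combining their results into the result.]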

Page 74: Heterogeneous   Information Management

04/19/23 Gio - CERN 74

Environmental Restoration at INEL: Undoing 50 years of messes

Idaho National Engineering Laboratory
LOCKHEED MARTIN, ISX - Stanford Univ.

Many projects, many sources

[Figure: a mediator and other mediators connected over CORBA, above wrappers around sources including ERIS and IEDMS. Query languages shown: MQL [ISX], MSL [Stanford], OQL [ODMG], …. Links in the diagram are labeled QEM and OEM.]

Page 75: Heterogeneous   Information Management

04/19/23 Gio - CERN 75

Mediation to Implement Feedback in Training (MIFT)
David Maluf, Priya Panchapagesan, Ted Linden

Abstraction to match levels of granularity

Abstraction: another task of mediators, prior to integration

Page 76: Heterogeneous   Information Management

04/19/23 Gio - CERN 76

Mediation Feedback: Playback or Graph

[Figure: three layers. Application layer: user interface (UI in Java) for trainees, observers, commanders, training developers, and analysts. Mediation layers: mediators with rules in CLIPS, standards in KQML. Wrapped simulation resources: Janus and SimNet, with wrappers in C/C++. Other labels: I.D.A, Stanford, Objectives, Tasks.]

Page 77: Heterogeneous   Information Management

04/19/23 Gio - CERN 77

MIFT . Result .

Analyses:
• Force ratio
• Losses
• Area gain

[Figure labels: Exercise, Simulator Type]

Page 78: Heterogeneous   Information Management

04/19/23 Gio - CERN 78

Control Valve Sizing, Future

• Interpretation
  – Programmatic
• Analysis
  – Integrated
• Evaluation
  – Integrated
• Transformation
  – Automated

From Andrew Arnold: Civ. Eng. Qualification Exam

Page 79: Heterogeneous   Information Management

04/19/23 Gio - CERN 79

F-22 IWSDB Phase 6

[Figure: F-22 IWSDB architecture with columns for user interfaces, integration services, wrappers, and databases. Labels: Provisioner, Engineer, Application PRIDE, IWSDB client GUI, Domain Model, Matchmaker, Domain Matching, Change Notification, Query Re-formulation, SQL, PD, DS, WAIS server, Index, Suppliers, Sybase.]

Page 80: Heterogeneous   Information Management

04/19/23 Gio - CERN 80

Simulation services

1. Continuously executing: weather prediction
   – SimQL result reports best match samples
2. Execution specific to query: what-if assessment, spreadsheets
   – may require HPC power for adequate response
3. Complement base data: materials data, assembly
   – performs inter- or extra-polations to match query parameters
4. Combinations of 2. and 3.: top layer simulation using stored partial lower level results: weapon performance in setting
5. Human-in-the-loop (mediated by an agent program): SAFs

Note
• A simulation service program can be written in any language
• A simulation service must be compliant to the interface
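
Case 3 and the interface requirement can be sketched together. The interface below is hypothetical and not the SimQL interface; it only illustrates a service that answers a query by linear interpolation between stored results. Unlike the slide, this sketch refuses to extrapolate and raises an error outside the stored range.

```python
from bisect import bisect_left
from typing import Protocol


class SimulationService(Protocol):
    """Hypothetical interface that every simulation service would satisfy."""

    def query(self, parameter: float) -> float: ...


class StoredResultsService:
    """Case 3: complements stored base data by linear interpolation."""

    def __init__(self, samples):
        # samples: (parameter, result) pairs from earlier simulation runs
        self.samples = sorted(samples)

    def query(self, parameter: float) -> float:
        xs = [x for x, _ in self.samples]
        i = bisect_left(xs, parameter)
        if i < len(xs) and xs[i] == parameter:
            return self.samples[i][1]            # exact stored result
        if i == 0 or i == len(xs):
            raise ValueError("parameter outside stored range")
        (x0, y0), (x1, y1) = self.samples[i - 1], self.samples[i]
        return y0 + (y1 - y0) * (parameter - x0) / (x1 - x0)
```

Because the caller depends only on `query`, a continuously running simulation or a human-in-the-loop agent could sit behind the same interface.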

Page 81: Heterogeneous   Information Management

04/19/23 Gio - CERN 81

SimQL: Simulation Access Service

Decision-making requires dealing with the future, as well as the past

• Databases deal well with the past

• Sensors can provide current status

• Spreadsheets, simulations deal with the likely futures

Information systems should be able to combine all three

[Timeline: past (SQL), now, future (SimQL)]

Information Systems should also deal with the Future
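
Combining the three kinds of resources can be sketched as a single dispatch on time. This is an illustration of the idea only; the function name, the callables, and the sample values are hypothetical and do not reflect the SimQL design.

```python
def value_at(t, now, history, sensor, simulate):
    """Answer one question about time t from the appropriate resource.

    history:  dict of recorded values, as a database would hold
    sensor:   callable returning the current reading
    simulate: callable predicting the value at a future time
    """
    if t < now:
        return ("database", history[t])
    if t == now:
        return ("sensor", sensor())
    return ("simulation", simulate(t))


# Hypothetical resources: two recorded values, a sensor reading 14.0,
# and a simulation that projects a rise of 2.0 per time step.
HISTORY = {1: 10.0, 2: 12.0}


def read_sensor():
    return 14.0


def project(t, now=3):
    return 14.0 + 2.0 * (t - now)
```

A decision-support application could then ask the same question for any point in time and let the dispatch choose between stored data, current status, and a projected future.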