
Data dirtroad infocosm-1995

Oct 17, 2014


This talk, given in 1995, introduced InfoHarness (research 1993; commercial product 1995 from Bellcore/Telcordia), which supported browser-based (Mozilla) faceted, attribute-based, and keyword-based search. It also introduced InfoQuilt and OBSERVER's access to heterogeneous data sources based on multiple domain ontologies.
Transcript
Page 1: Data dirtroad infocosm-1995

Infocosm (Amit Sheth) 1

From Data Dirt Roads To Infocosm

Amit Sheth
Large Scale Distributed Information Systems Lab
University of Georgia
http://www.cs.uga.edu/LSDIS

[email protected]@cs.uga.edu

Special Thanks: Vipul Kashyap, Srilekha Mudumbai

Invited Talk, 7th Intl. Conf. on Management of Data, Pune, India, Dec. 29, 1995. [Some parts of the talk emphasize issues and perspectives of particular interest to developing countries.]

Page 2: Data dirtroad infocosm-1995

Outline

• Infrastructure: computing and communication to support the information society in the next century

• State of the art: Internet, WWW, Electronic Commerce; unique opportunities due to Network Computing

• New Challenges in Information Management

Page 3: Data dirtroad infocosm-1995

Our Journey to the Information Society

• Information (Data) Superhighway (aka Infobahn)
  - material/physical object -- geography, distance
  - infrastructure, not services or applications*
  - high cost of broadband fiber-optic networks
  - promoted the TV (not the computer) as the user device; the computer has been found to be a better starting point
  - the promise of applications envisaged earlier has fizzled: 500-channel TV, VOD and interactive TV

• What went wrong?

* For related discussion, see also: The Road Ahead, Bill Gates.

Page 4: Data dirtroad infocosm-1995

Journey and Destination

• “The Information superhighway is an impoverished metaphor -- it describes only a means of transportation. We need a description of the destination, of the Infocosm.” [Ferguson]

Glover Ferguson, Computer World, Vol. 1, Iss. 6, July 17 1995.

Page 5: Data dirtroad infocosm-1995

Destination Infocosm

• a society whose members (“organisms”) can have more effective decision-making capability using information that is available whenever needed, at any place, and in (m)any form(s) [Sheth 93]
• a world where people will work, learn and play, unconstrained by time, place and form [Ferguson 95]

Sheth and Kashyap, Information Brokering - A Key Challenge in the Emerging Infocosm, December 1993.

Glover Ferguson, Computer World, Vol. 1, Iss. 6, July 17 1995.

Page 6: Data dirtroad infocosm-1995

Related Themes

• Telecosm (George Gilder)
  - focus on communication and data
  - need to add computing and information
• Telepresence
• “Information at Fingertips”*

Page 7: Data dirtroad infocosm-1995

Components

Information, Computing, Communication

Page 8: Data dirtroad infocosm-1995

Computing and Communication

• The last decade belonged to computing; the PC is a $120B business.
• The next decade will belong to communication.
• Future computing will be network oriented (analogy of neural nets: “the intelligence will be in the network”).
• We will sell and use information, not data.
• Opportunity for developing countries!

Page 9: Data dirtroad infocosm-1995

Wireless communication

• Stratospheric market estimates are the norm, but prior expectations have been exceeded
• Estimates of wireless communication devices: 224M by the year 2000, 300M by the year 2002
• Growth rate of 40-50% => economy of scale
• Tremendous number of alternatives
• India and developing countries will have a larger share of wireless than most expect

Page 10: Data dirtroad infocosm-1995


Wireless Mela?

EBB, February 1995.

Page 11: Data dirtroad infocosm-1995

Wireless communication and computing

[Chart: U.S. installed base (millions), scale 0-60, years 1994-2006; series: PCS, Paging, Cellular]

EBB, February 1995

Page 12: Data dirtroad infocosm-1995

Telecommunications -- a few observations about India

ISDN trials were announced for 8/95 in Madras (probably delayed?)
  $500 deposit (handset) to $1550 (PBX)

Internet access announced
  e-mail available in several cities; WWW in the few largest ones
  possible significant use of satellite

64 Kbps dedicated lines are routinely used by large companies for international communication

Expect a big take-off for paging and possibly for cellular

Privatization, while delayed, will have the biggest impact

Page 13: Data dirtroad infocosm-1995

The Economist, November 18, 1995

Possible Temporary Setback

Page 14: Data dirtroad infocosm-1995

CYBERSPACE COMPONENTS

END-USER SERVICES & APPLICATIONS

INFRASTRUCTURE SERVICES: PUBLIC & PRIVATE/COMMERCIAL NETWORKS

SOFTWARE PROTOCOLS & “STANDARDS”

COMMUNICATION INFRASTRUCTURE & PROTOCOLS

NODES & REPOSITORIES [TEXT, AUDIO, IMAGE, VIDEO, STRUCTURED DATABASES]

Page 15: Data dirtroad infocosm-1995

CYBERSPACE COMPONENTS

End-User Services & Applications
• Electronic Commerce
• Information Commerce / Mall
• Digital Library
• Video on demand, 500 channels
• Edutainment
• Virtual Corporation

Infrastructure Services
• WWW
• White Board, Groupware (Notes)
• Payment / Billing / Collection
• Security
• Authentication

Software Protocols & Standards
• EDI, PDES, HTML, ...
• http, telnet, rlogin
• ftp, X.400

Communication Infrastructure
• Periodic connection; on-line connection
• X.25
• Wireless: paging, cellular, satellite
• Wired: copper, cable, fiber
• One way; two way

Page 16: Data dirtroad infocosm-1995

Internet -- a few personal experiences

• organizing an international trip
• e-mail exchange with a friend in Ahmedabad
• ftp and WWW for book project management
• use of WWW for
  - paper distribution
  - complete workshop management
  - publicity for the lab (4000+ accesses per week)
  - directions to my home

Page 17: Data dirtroad infocosm-1995

Internet: CommerceNet-Nielsen Survey

• 11% (24M) use the Internet in the USA and Canada; 17% (37M) have access; 8% (18M) use the Web

• Use in the last 24 hours: access the Web (72%); send e-mail (65%); non-interactive discussion (36%); download software (31%); use another computer (31%); interactive discussion (21%); real-time audio or video (19%)

• Average use: 5.5 hr per week!

Page 18: Data dirtroad infocosm-1995

WWW

Also from the CommerceNet-Nielsen survey -- use of the Web for:
• browsing or exploring (90%)
• searching for other information (73%)
• searching for information on companies/organizations (60%)
• searching for information on products/services (55%)
• purchasing products or services (14%; 2.5M)

Web users are upscale, with an annual income of more than $80K (or $50K, depending on the survey).

Winning the whole world? Weird, wacky, and wow!

Page 19: Data dirtroad infocosm-1995

Internet Multimedia Milestones

1995
  Transport: Rapid adoption of 56 kbit/s access by small businesses and branch offices via ISDN, frame relay and leased lines
  Multicasting: Mbone backbone supports limited multicasting
  World Wide Web: Object technology for Web browsers: Hot Java, Open Doc, OLE, VRML

1996
  Transport: ATM backbone deployment by Internet service providers; Regional Bell operating companies bundle ISDN and Internet access for the small business market
  Multicasting: Real-time audio and video servers based on pseudo-multicasting
  World Wide Web: Enhanced real-time Internet products ship: 24-bit Color CU-SeeMe, FM Real Audio Player, Internet Phone, Netphone, Multimedia Netscape Navigator

1997
  Transport: Projected ISDN installed base: 1.6 million lines (U.S.), 7.68 million lines (global); projected installed base of V.34 modems: 5.4 million (global); telcos bundle ISDN and Internet access for the home market
  Multicasting: Widespread adoption of IP next generation protocol with support for broadcasting
  World Wide Web: Widespread use of Web-centric real-time Internet tools for entertainment, distance learning, and conferencing

OEM Magazine September 1995

Page 20: Data dirtroad infocosm-1995
Page 21: Data dirtroad infocosm-1995

Page 22: Data dirtroad infocosm-1995

Page 23: Data dirtroad infocosm-1995

Page 24: Data dirtroad infocosm-1995

Wired Oct. 1995

Page 25: Data dirtroad infocosm-1995

Netguide June 1996

Page 26: Data dirtroad infocosm-1995

Web Week December 1995

Page 27: Data dirtroad infocosm-1995

WWW: Conquering the Business World
Example of use in a Workflow Application

SDOH and CHREF maintain databases, support EDI transactions

Hospitals and clinics update central databases after encounters

Health providers can obtain up-to-date clinical and eligibility information

State and HMOs can update patient’s eligibility data

Health agencies can use reports generated to track population’s needs

TRACKING SUBSYSTEM

Generates:
• alerts to identify patient’s needs
• contraindications to caution providers

Reminders to parents

Reports to state

CT

Hospitals and case workers can reach out to the population

HMOs can keep track of performance

CLINICAL SUBSYSTEM

Healthcare Info Infra. Tech. project: UGA and CHREF

Page 28: Data dirtroad infocosm-1995

Web Browsers: Conquering the Business World
Example of use in a Workflow Application

Interface for a Physician

List of Overdue Vaccinations

Link to contraindication information obtained from the Internet

Clinical Aspects

Page 29: Data dirtroad infocosm-1995

Electronic commerce -- statistics

Year    # of Companies on the WWW    Sales on the Internet
1994    29,000                       100K
1995    152,000                      75,000K
1996                                 553,000K (projected)

Table: NBC News

[Chart: years 90-95, scale 100-500; series: Financial services, Publishing, Law]

Chart: The Economist

Page 30: Data dirtroad infocosm-1995


Page 31: Data dirtroad infocosm-1995

Saturn Corp. (automaker, http://www.saturncars.com)
Traffic: 84,000 people a month reading 27,000 pages
Vision: Use the Web to build an image as an innovative company; build customer relationships
What you can do: View 1996 models, find a retailer, read Saturn magazine, order a brochure, locate and write other owners via bulletin board
Payoff: 25% of brochures requested via the Web

Fidelity Investments ($14 billion mutual funds investor, http://www.fid-inv.com)
Vision: Use the Web as a new distribution and sales channel
What you can do: Review and select from 160 mutual funds, plan college and retirement savings, download a software demo, participate in a survey and the “Guess The Dow” contest
Payoffs: Undisclosed savings in mailing, handling & printing from electronically delivered prospectuses

W. W. Grainger, Inc. ($3 billion wholesaler and distributor, http://www.grainger.com/index.html)
Traffic: 3,000 pages downloaded weekly
Vision: Create a low-cost way to expand sales reach; lower acquisition costs for customers
What you can do: Search product databases, review new products, locate branches worldwide, send e-mail, order a catalog
Payoff: Detailed customer demographics and feedback help set direction

COMPUTERWORLD November 20, 1995

Page 32: Data dirtroad infocosm-1995

World PC aka Information Appliance aka Browser Boy

• $500-$700 PC supporting network computing
  - no hard disk

• $50 LSI Logic “superchip” that incorporates a microprocessor, memory, high-speed modem, and audio and video processor
  - Java and servers complete the computing paradigm
  - can reach a much larger population: many more would have $500 disposable compared to $2000; more appealing to less technical users

Page 33: Data dirtroad infocosm-1995

Java

• New programming language (subset of C++) especially suitable to run on a network (from Sun)

• Applets (small, efficient programs) delivered over the network

• Microsoft licensed it (a first for Microsoft); IBM too

• Already hundreds of applications, including spreadsheets, word processors and games

Page 34: Data dirtroad infocosm-1995

Java

[Diagram: Browser, Network and Server timelines. The user asks for an object; the browser sends a request and the reply carries the object. The browser doesn't understand the object type, so a second request/reply pair fetches the Java code to support the object. The object is then displayed.]

• Java: C++ minus typedefs, preprocessor, ..., functions, multiple inheritance, operator overloading, pointers; plus multithreading, ...

• Server site: Java source -> compiler -> byte codes

• Client site: class loader -> byte code verifier -> interpreter -> run-time
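The request/reply exchange in this diagram can be sketched in a few lines. This is only an illustration in Python, not Java itself: the class names, the "budget" object, and the "spreadsheet" type are hypothetical, and a plain function stands in for the byte codes that a real client would load and verify.

```python
# Hypothetical sketch of the two request/reply pairs in the diagram.
# All class, method, and object names are illustrative assumptions.


class Server:
    """Holds objects plus the code needed to display each object type."""

    def __init__(self):
        self.objects = {"budget": ("spreadsheet", "A1=3;A2=4")}
        # Stand-in for Java byte codes shipped to the client on demand.
        self.handlers = {"spreadsheet": lambda payload: "table[" + payload + "]"}

    def get_object(self, name):
        return self.objects[name]

    def get_handler(self, object_type):
        return self.handlers[object_type]


class Browser:
    """Knows a few object types and fetches support code for the rest."""

    def __init__(self, server):
        self.server = server
        self.handlers = {"text": lambda payload: payload}
        self.log = []

    def show(self, name):
        # First request/reply: fetch the object itself.
        self.log.append("request object")
        object_type, payload = self.server.get_object(name)
        if object_type not in self.handlers:
            # Second request/reply: fetch code that supports this type.
            self.log.append("request code")
            self.handlers[object_type] = self.server.get_handler(object_type)
        return self.handlers[object_type](payload)
```

Once the support code has been fetched, a second `show("budget")` call needs only the first request, which is the point of delivering code over the network on demand.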

Page 35: Data dirtroad infocosm-1995


Browser Boy vs Bill Gates*

* Richard Shaffer, Forbes, December 4, 1995.

The Economist, Oct. 14, 1995.

Java

Page 36: Data dirtroad infocosm-1995

Network computing -- the Equalizer

Impact on marketing
• Marketing a software product will no longer involve a huge investment; the WWW provides a level playing field -- it is (almost) as easy for a small company to have a presence on the Web; level playing field in delivery, sales, payment

Page 37: Data dirtroad infocosm-1995

An opportunity for Developing Countries

• New Communication Infrastructure Alternatives
  - potential for fast catch-up

• New Computing Paradigm leading to
  - diminishing importance of geographic separation and distance
  - new marketing, sales, support alternatives
  - new ways to interact with clients and customers
  - new ways to develop software, new software marketplace

• New commodity to sell -- information

Page 38: Data dirtroad infocosm-1995

Focus on information

• Exponential growth in the capability of computing (Moore’s law) and communication bandwidth is well documented.

• Our ability to represent information and knowledge (from numbers and letters to objects and relationships, from syntax to semantics, from transactions to workflows, from data to information, ...) has received less attention, is harder to address, and has lagged.

Page 39: Data dirtroad infocosm-1995

Data vs Information

Data
• Set of facts
• Measurements about the real world obtained from human/machine sensors
• Interoperability => transformation across different forms, representations and query languages

Information
• Application of facts and knowledge of “when” to use facts
• Derivation from facts using cognitive and perceptual processes
• Interoperability => transformation of knowledge to make it suitable for application of different facts in a different environment

Data + Knowledge about meaning of data + Knowledge of when to apply it = Information

Information can be used for decision making based on data

Page 40: Data dirtroad infocosm-1995

Technical Challenges in Global Information Systems

Difficulties in information access:
  cosmic Easter egg hunt problem -- hard to locate and access pertinent information
  write-only database problem -- easy to create, hard to maintain

Scale: needle in the haystack problem
  vast amount of information; large number of autonomous sites

Heterogeneity: tower of Babel problem
  information represented in different ways

Query expressiveness: the Pidgin problem
  query language not expressive enough to specify the user’s interest

Information Overload:
  too much junk (less relevant) information on the network

Page 41: Data dirtroad infocosm-1995

Some approaches ....

User-centered approach:
  menu-based browsing
  hypertext browsing

Syntactic/structural approach:
  information retrieval, indexing techniques
  name and attribute-based search, pattern matching

Descriptive (symbolic) semantics-based approach:
  making design assumptions explicit
  capturing the semantics of the query

Cognitive (sub-symbolic) semantics-based approach:
  pattern/speech recognition algorithms
  neural networks

Page 42: Data dirtroad infocosm-1995

Challenges with current techniques for Information Resource Discovery

Unattractiveness of navigation and browsing:
  users tend to give up if the number of links is more than 3 or 4
  need to annotate links with contextual information in order to help reduce the “link-chasing”

Scalability problems in indexing information:
  cannot index all the information on the Internet!
  difficult to index heterogeneous but related information
  combining results obtained by using independent/different indices

Hard to maintain pre-determined relationships:
  a file update might make some hyperlinks meaningless!
  hierarchical organizations might prove expensive to search if the user-specified criteria for search differ from the criteria of organization

Page 43: Data dirtroad infocosm-1995

Infocosm via InfoHarness and InfoQuilt

• InfoHarness
  - access, scale, heterogeneity

• InfoQuilt
  - query expressiveness, semantics
  - correlation of heterogeneous media

InfoHarness is a trademark of Bellcore. Adapt/X Harness is a commercial product based on the InfoHarness system (see http://www.bellcore.com/features/index.html).

Page 44: Data dirtroad infocosm-1995

InfoHarness: Business Need Example

Req., Design, .... documents (Framemaker)

Source code (C functions), man pages (Unix files)

Figures (PostScript files)

Third-party tools

Where? How to access?

A Software Business House

* Leon Shklar, Satish Thatte

Page 45: Data dirtroad infocosm-1995

InfoHarness: Business Need Example

Req., Design, .... documents (Framemaker)

Source code (C functions), man pages (Unix files)

Figures (PostScript files)

Third-party tools

A Software Business House*

Now I know ...

InfoHarness
- Uniform access
- Integrated view of heterogeneous information

* Leon Shklar, Satish Thatte

Page 46: Data dirtroad infocosm-1995

InfoHarness

Dealing with Data Heterogeneity: Use of Domain-Independent Metadata

1. Information Unit
   1.1 Type
   1.2 Location
   1.3 Other Attributes
2. List of Collections that include this IHO (InfoHarness object)

Physical data: text file (or its portion), bitmap, email message, man page, directory of man pages

• Logical structuring of the information space without restructuring, reformatting or relocating
• Accessing information via logical units
• Utilizing third-party indexing tools to search for information
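The metadata layout on this slide maps naturally onto a small record type. The following is a minimal sketch, assuming Python dataclasses; the field names, the `group_by_collection` helper, and any paths used with it are hypothetical and are not InfoHarness's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class InformationUnit:
    """Item 1 on the slide: what the physical data is and where it lives."""
    unit_type: str                                    # 1.1 Type, e.g. "manpage"
    location: str                                     # 1.2 Location (left in place)
    attributes: dict = field(default_factory=dict)    # 1.3 Other attributes


@dataclass
class InfoHarnessObject:
    """Metadata wrapper: the data itself is never copied or reformatted."""
    unit: InformationUnit
    collections: list = field(default_factory=list)   # Item 2 on the slide


def group_by_collection(objects):
    """Build the logical view: collection name -> locations of its members."""
    grouped = {}
    for obj in objects:
        for name in obj.collections:
            grouped.setdefault(name, []).append(obj.unit.location)
    return grouped
```

Because an object only records a type, a location, and its collection memberships, the same file can appear in several logical collections without being restructured or relocated.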

Page 47: Data dirtroad infocosm-1995

Results of the WAIS query. Let’s look at the 2nd article.

Keyword-based Access

Page 48: Data dirtroad infocosm-1995

Kilpatrick is not the author! He is referenced.

We can use keywords to query the WAIS collection, but we cannot provide the semantics “author” with the keyword “kilpatrick”

Page 49: Data dirtroad infocosm-1995

An attribute-based access method allows the specification of semantics like “author” and “date”. The typed attribute “date” allows data access not supported by keyword-based methods

Attribute-based Access

Page 50: Data dirtroad infocosm-1995

The results will only contain those articles authored by Kilpatrick that were posted after July 1, 1995.

Keyword-based and Attribute-based access are complementary
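The difference between the two access methods can be shown on a toy collection. This is a sketch under stated assumptions: the three articles, their field names, and both search functions are invented for illustration and do not reproduce WAIS or InfoHarness behavior.

```python
from datetime import date

# Hypothetical articles; only the author name and cutoff date come from the slides.
ARTICLES = [
    {"id": 1, "author": "kilpatrick", "date": date(1995, 7, 15), "body": "workflow notes"},
    {"id": 2, "author": "smith", "date": date(1995, 8, 2), "body": "as Kilpatrick noted earlier"},
    {"id": 3, "author": "kilpatrick", "date": date(1995, 5, 1), "body": "metadata survey"},
]


def keyword_search(articles, keyword):
    """Match the keyword anywhere: it cannot tell an author from a mention."""
    kw = keyword.lower()
    return [a for a in articles
            if kw in a["author"].lower() or kw in a["body"].lower()]


def attribute_search(articles, author, posted_after):
    """Match typed attributes: author by name, date by comparison."""
    return [a for a in articles
            if a["author"] == author and a["date"] > posted_after]
```

Here `keyword_search(ARTICLES, "kilpatrick")` also returns the article that merely references Kilpatrick, while `attribute_search(ARTICLES, "kilpatrick", date(1995, 7, 1))` keeps only the article he authored after the cutoff. The date comparison is the part a keyword index cannot express.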

Page 51: Data dirtroad infocosm-1995

InfoHarness Project: Scalability

[Diagram: the data is split into Partition 1 (database of textual objects), Partition 2 (database of textual objects), ..., Partition n (database of objects with textual and image components). Each partition has a metadata object (M.O.) holding attribute metadata and its indices (Index11, Index12; Index21; Indexn1, Indexn2). A query processor takes the query and combines the partial results into the result.]

http://www.cs.uga.edu/LSDIS/infoharness
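The partitioned design in this diagram can be sketched as independent per-partition indices whose partial results are merged by a query processor. The word-level inverted index, the `Partition` class, and the `query` function below are assumptions made for illustration, not the InfoHarness implementation.

```python
def build_index(docs):
    """Build a word -> document-id index for one partition."""
    index = {}
    for doc_id, text in docs.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(doc_id)
    return index


class Partition:
    """One partition with its own index, searchable on its own."""

    def __init__(self, name, docs):
        self.name = name
        self.index = build_index(docs)

    def search(self, word):
        # Tag hits with the partition name so merged results stay unambiguous.
        return {(self.name, doc_id) for doc_id in self.index.get(word.lower(), set())}


def query(partitions, word):
    """Query processor: ask every partition, then combine the partial results."""
    results = set()
    for partition in partitions:
        results |= partition.search(word)
    return sorted(results)
```

Each partition can be indexed and updated separately, so adding a repository means adding a partition rather than rebuilding one global index.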

Page 52: Data dirtroad infocosm-1995

INFORMATION COMMERCE
A proposed Architecture

[Diagram with three layers: INFORMATION CONSUMERS, INFORMATION BROKERING (with Ontologies/User Models), and INFORMATION PROVIDERS (Information System IS 1, Information System IS 2, ..., Information System IS m); arrows labeled “Information Request” connect the layers.]

Page 53: Data dirtroad infocosm-1995

Anatomy of Information Brokering Tasks

• Information Resource Discovery
  - identification of the information sources relevant to a given query or information need

• Query Processing
  - Information Focusing: identification of the subset of information in a given information source relevant to a given query
  - Information Correlation: combining the relevant information from different

Page 54: Data dirtroad infocosm-1995

Challenges in Information Brokering

New Challenges and Research Directions

• Semantics -- key to information
  - What do you want? What is available?
  - Relationship between structure and semantics
• Context, context, context
• Uncertainty, partial information, inconsistency

Page 55: Data dirtroad infocosm-1995

Semantics? What? Where?

Vocabulary

Ontology (domain specific)

Content

Metadata (domain-specific metadata)

Content-descriptive

Data (abstract structure)

content-based (indices)

content-independent

Structure + relationships

What is semantics? Where is semantics?

Page 56: Data dirtroad infocosm-1995

CONTEXT

Query to White House server asking for documents on “India”.

25629 Office-of-Navajo-and-Hopi-Indian-Relocation
25654 National-Commission-on-American-Indian,-Alaska-Native,-and-Native-Hawaiian-Housing
25668 Institute-of-American-Indian-and-Alaska-Native-Culture-and-Arts-Development
148626 National-Indian-Gaming-Commission
148632 Bureau-of-Indian-Affairs
153622 Public-and-Indian-Housing-Programs
158930 Indian-Health-Services
133206 1994-04-29-Babbitt-and-Deer-Briefing-on-Indian-Affairs
133305 1994-04-29-President-in-Meeting-with-Indian-Tribal-Leaders
30837 1994-05-19-President-and-India-PM-RAO-in-Press-Availability
30925 1994-05-11-President-Names-Frank-Wisner-as-Ambassador-to-India
81960 1994-08-02-Eight-Named-National-Advisors-on-Indian-Education
81962 1994-08-03-Four-on-American-Indian-Culture-Development-Board
179395 Aid t I di 10 01 93
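The result list shows a query for “India” pulling in documents about American Indian affairs. The sketch below reproduces that effect on four of the listed titles. How the actual server matched terms is not stated on the slide, so the substring matcher is an assumption, and the whole-word matcher is only a syntactic stand-in for the context the slide calls for.

```python
import re

# Four titles taken from the result list above.
TITLES = [
    "Bureau-of-Indian-Affairs",
    "Indian-Health-Services",
    "1994-05-19-President-and-India-PM-RAO-in-Press-Availability",
    "1994-05-11-President-Names-Frank-Wisner-as-Ambassador-to-India",
]


def substring_match(titles, term):
    """Context-free matching: 'India' also hits every 'Indian'."""
    return [t for t in titles if term.lower() in t.lower()]


def whole_word_match(titles, term):
    """Match the term only when it is not part of a longer word."""
    pattern = re.compile(
        r"(?<![A-Za-z])" + re.escape(term) + r"(?![A-Za-z])", re.IGNORECASE
    )
    return [t for t in titles if pattern.search(t)]
```

Whole-word matching removes the false hits in this small case, but it would still confuse other senses of a word. Resolving that requires knowing the context of the query, which is the argument of this slide.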

Page 57: Data dirtroad infocosm-1995

Enabling Infocosm

Using metadata PatchQuilt and user models/ontologies to support information requests over globally distributed heterogeneous media repositories

InfoQuilt Project:

http://www.cs.uga.edu/LSDIS/infoquilt

Page 58: Data dirtroad infocosm-1995

InfoQuilt

Semantic Relationships between Metadata

Query: Get me regions (blocks, counties) having a population greater than 500 and area greater than 50 sq feet having an urban land cover and moderate relief

The query represents semantic relationships between the metadata:

Page 59: Data dirtroad infocosm-1995

InfoQuilt

Metadata: Population, Area, Land Cover, Relief

Sources:
• Structured data, US Census Bureau: SQL queries return blocks, counties
• Structured data, TIGER/Line: SQL queries return boundaries of blocks, counties
• Image data (land cover, elevation): IP functions compute regions' land cover and relief; SQL queries return blocks with land cover, relief

Correlation of blocks satisfying various constraints in different databases!

Extraction of Domain-Specific Metadata
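The correlation step can be sketched as an intersection of the block sets that each source returns for its part of the query. All block identifiers and values below are made up, and the in-memory dictionaries stand in for the SQL queries and image-processing functions named on the slide. Which source supplies which attribute is also an assumption.

```python
# Hypothetical per-source results, keyed by block identifier.
CENSUS = {"b1": 1200, "b2": 300, "b3": 800}      # block -> population
BOUNDARIES = {"b1": 75.0, "b2": 120.0, "b3": 40.0}  # block -> area
IMAGERY = {                                      # block -> (land cover, relief)
    "b1": ("urban", "moderate"),
    "b2": ("urban", "moderate"),
    "b3": ("forest", "steep"),
}


def blocks_where(table, predicate):
    """Return the blocks in one source that satisfy one constraint."""
    return {block for block, value in table.items() if predicate(value)}


def correlate(*block_sets):
    """Keep only the blocks that satisfy the constraints in every source."""
    result = set(block_sets[0])
    for blocks in block_sets[1:]:
        result &= blocks
    return sorted(result)


def answer_query():
    """Population > 500, area > 50, urban land cover, moderate relief."""
    return correlate(
        blocks_where(CENSUS, lambda population: population > 500),
        blocks_where(BOUNDARIES, lambda area: area > 50),
        blocks_where(IMAGERY, lambda pair: pair == ("urban", "moderate")),
    )
```

With this sample data only block `b1` satisfies all four constraints. Each source answers the part of the query it understands, and the shared block identifier does the correlating.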

Page 60: Data dirtroad infocosm-1995

InfoQuilt: Multimedia Correlation

Page 61: Data dirtroad infocosm-1995

InfoQuilt: Multimedia Correlation

Page 62: Data dirtroad infocosm-1995

InfoQuilt/OBSERVER: Vocabulary Sharing

Ontology-Based System Enhanced with Relationships for Vocabulary hEterogeneity Resolution

Page 63: Data dirtroad infocosm-1995

InfoQuilt

Using descriptive and content-based metadata approach

• Top-down processing:
  - Capture the context of the user query
  - Construct and display ontologies for specific domains

• Bottom-up processing:
  - Extract metadata from the information source
  - Generate mappings between metadata and information
  - Construct information resource context from the metadata

[Diagram labels: Domain Ontology, Query Context, COMPARE, Information Resource Context, Metadata]

Page 64: Data dirtroad infocosm-1995

A Message

• Emerging network computing, the increasing importance of communication, and the available alternatives are tearing up traditional geographical and market boundaries, and giving developing countries a once-in-a-lifetime opportunity to catch up

• Use information to be happy; know how to supply information to be wealthy

• But make no mistake -- data is not information

Page 65: Data dirtroad infocosm-1995

Memorable and Interesting Quotes

“By means of electricity, the world of matter has become a great nerve, vibrating thousands of miles in a breathless point of time. The round globe is a vast ... brain, instinct with intelligence!” [Al Gore’s quotation of Nathaniel Hawthorne, 1851]

Zooming is when you overcome your fears and trust the universe to make things right. You fly and float and hum and weave and sing. Opportunity knocks. Hello! I like playing with people who zoom. Win-win deals all the time. It’s cooool ..... On really cool days I zoom. On reallllllly cooooooooool days I zooooooooom.

[Dave Winer <[email protected]>, DaveNet, 9/22/95]

Page 66: Data dirtroad infocosm-1995

About the Speaker

Dr. Amit Sheth directs the Large Scale Distributed Information Systems (LSDIS) Lab, is an Associate Professor of Computer Science at the University of Georgia, and an Adj. Assoc. Professor in the College of Computing at the Georgia Institute of Technology. Earlier he worked for nine years in the R&D labs at Bellcore, Unisys, and Honeywell. His primary current research interests include workflow automation (project METEOR), management of heterogeneous digital data and semantic issues in global information systems (projects InfoHarness and InfoQuilt), and electronic/information commerce.

Prof. Sheth has led projects on heterogeneous DBMS, factory information systems, integration of AI-database systems (project/system BrAID), transactional workflows (PROMPT and METEOR), federated database tools (BERDI and TAILOR), multidatabase consistency, and data quality (Q-Data). The LSDIS lab (http://www.cs.uga.edu/LSDIS) maintains active collaboration with industry, and has won significant funded projects in the areas of interoperable and global information systems. Prof. Sheth has published over 80 papers in the areas of federated databases, workflow management, multidatabase consistency, metadata and information modeling, and data consistency and semantics. He has participated in over 30 program/organization committees for conferences, given over 45 invited and colloquia talks and 14 tutorials, and led two international conferences and a workshop as a General/Program (Co-)Chair. Currently he is a General (Co-)Chair of the Intl. Conference on Cooperative Information Systems, the Program Chair of the NSF Workshop on Workflow and Process Automation, and is on the editorial board of five journals. He has also served twice as an ACM Lecturer.

Page 67: Data dirtroad infocosm-1995

Partial Bibliography

Besides the articles referred to in the presentation, there is considerable information in popular and technical literature on the topics of information superhighway, metadata, electronic commerce and related topics. I suggest using any of the existing Web tools for a search (for example, most Internet tools will return more than one hundred URLs for any of these topics). Because the list is too large, below is a partial list of research publications with which the speaker has been associated. These and other LSDIS publications can be obtained from http://www.cs.uga.edu/LSDIS.

A. Sheth and L. Kalinichenko, "Information Modeling in Multidatabase Systems: Beyond Data Modeling" (invited paper), Proc. of the 1st International Conference on Information and Knowledge Management (CIKM), Baltimore, November 1992.

A. Sheth and V. Kashyap, "So Far (Schematically) yet So Near (Semantically)" (invited paper), Proc. of the DS-5 Semantics of Interoperable Database Systems, Lorne, Australia, November 1992; in IFIP Transactions A-25, North-Holland, 1993.

V. Kashyap and A. Sheth, "Semantics-based Information Brokering: A step towards realizing Infocosm", Technical Report DCS-TR-307, Dept. of Computer Science, Rutgers University, March 1994 (Position Paper, December 1993).

V. Kashyap and A. Sheth, "Semantic based Information Brokering", Proceedings of the 3rd Intl. Conf. on Information and Knowledge Systems, November 1994.

W. Klas and A. Sheth, Eds., "Metadata for Digital Media", Special issue of SIGMOD Record, December 1994.

L. Shklar, A. Sheth, V. Kashyap, and K. Shah, "InfoHarness: Use of Automatically Generated Metadata for Search and Retrieval of Heterogeneous Information", Proceedings of CAiSE-95, June 1995.

V. Kashyap and A. Sheth, "Schematic and Semantic Similarities between Database Objects: A Context-based Approach", to appear in the VLDB Journal.

V. Kashyap, K. Shah, and A. Sheth, "Metadata for building the MultiMedia Patch Quilt", (to appear in) Multimedia Database Systems: Issues and Research Directions, S. Jajodia and V.S. Subrahmanian, Eds., Springer-Verlag, 1995.

A. Sheth, V. Kashyap and W. LeBlanc, “Attribute-based Access of Heterogeneous Digital Data,” Proceedings of the Workshop on Providing Web Access to Legacy Data, the 4th International World Wide Web Conference, December 1995.

A. Sheth, "Data Semantics: What, Where and How?", to appear in Database Application Semantics, Proceedings of the 6th IFIP Working Conference on Data Semantics (DS-6), R. Meersman and L. Mark (Eds.), Chapman and Hall, London, UK, 1996.