1 David Maier PetDB A Petabyte in Your Pocket David Maier Oregon Graduate Institute with help from D. DeWitt, J. Naughton, L. Delcambre, K. Tufte, V. Papadimos, P. Tucker

Dec 22, 2015

Page 1

PetDB

A Petabyte in Your Pocket

David Maier
Oregon Graduate Institute

with help from D. DeWitt, J. Naughton, L. Delcambre, K. Tufte, V. Papadimos, P. Tucker

Page 2

Your PetDB

It’s 2015.

For $300 a year, you can have a personal petabyte database (PetDB).

You can talk to it from anywhere.

Organizes any kind of digital data.
– Doesn’t lose structure, can restructure
– Queryable
– Handles streams
– Organized by type, content, associations, multiple categorizations and groupings

Locate items by
– How or where you encountered them
– What you’ve done with them
– Where you were when you accessed them

Page 3

What Would I Put in a Petabyte?

A lot.

Fill my office floor to ceiling with books: 100 GB

What do I do with 10,000 times as much?

Many possibilities:
– Contents of every book and magazine I read
– Every web page I visit
– All email I send or receive
– Every TV program I watch
– Every version of every piece of software I use
– Maps of everywhere I go
– Notes from every class or seminar I attend
– All the telephone calls I make
– My “Lifestream” (Freeman and Gelernter)
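The scale on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes); the variable names are illustrative and do not come from the talk.

```python
# Assumption: decimal (SI) units for gigabyte and petabyte.
GB = 10**9
PB = 10**15

office_of_books = 100 * GB   # the slide's estimate for an office full of books
scale_factor = 10_000        # "10,000 times as much"

total_bytes = office_of_books * scale_factor
print(total_bytes == PB)     # True: 100 GB x 10,000 is one petabyte
```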

Page 4

Streams and Restructuring

Can incorporate streamed data on the fly.
– MD: Vital signs from patients in ICU
– Factory supervisor: status, output rate of all machines; finished products; rejects

Can restructure data if desired.
– Combined list of conferences in my area
– Info sheets on autos I’m considering buying
– Comparable salaries of faculty at my rank in similar departments

Page 5

Anything I Might Want to Refer Back to

Personally indexed for me.

Can be located in a thousand different ways.

What is the company in Massachusetts I read about in the article on factory tours when I was on the plane to the sales meeting in Atlanta last spring?

Page 6

Or Things I Might Want in the Future

• Histories of news groups and mailing lists

• Parts of the web I might want to browse, including past snapshots

• Descriptions and prices for any item I might want to buy

• Papers I’ve been meaning to read

• Historical data on stocks I’m interested in

Functions as a personal web portal

Page 7

“Database” Not Completely Apt

• Didn’t have to define a scheme for it

• Doesn’t need to know the datatypes I want to store in advance

• Doesn’t chop data into rows and columns
– Unless I ask

• Can query over information streams

• Don’t need to write and run applications to add data
– Anything I’ve touched is there
– Or expressed an interest in

• Not on a particular computer

• Doesn’t have an “outside”

Page 8

My PetDB is Good to Me

• I don’t move data between environments
– I’m never on the “wrong” machine

• Never go back to my office to grab a paper, never have the wrong folder at a meeting

• Don’t worry a lot about filing systems
– PetDB organizes itself by ways I like to look for information

• Anticipates what data I’ll be using

Page 9

How to Do This?

On $300/year

Plan A: Pack my office floor to ceiling with disk drives.

About $1 million.

Plan B: Be clever.
– Share
– Stage
– Reconstitute

Page 10

Share

Most of the information in my PetDB isn’t unique to me: magazine article, web page, stock quote.

Store one copy.

Information Paradox: What’s too expensive for one may be affordable for all.

[Diagram labels: My PetDB; Others’ PetDBs]
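The "store one copy" idea can be illustrated with a content-addressed store, in which identical items from different users map to the same key. This is only a sketch under assumptions: the `SharedStore` class, its methods, and the user names are hypothetical, and hashing with SHA-256 is a technique swapped in for illustration, not something the talk specifies.

```python
import hashlib

class SharedStore:
    """Hypothetical single-copy store: identical content is kept once."""

    def __init__(self):
        self.blobs = {}    # content hash -> bytes, stored once
        self.owners = {}   # content hash -> set of users referencing it

    def put(self, user, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(key, data)              # keep the first copy only
        self.owners.setdefault(key, set()).add(user)  # record who refers to it
        return key

    def get(self, key: str) -> bytes:
        return self.blobs[key]

store = SharedStore()
article = b"Factory tours in Massachusetts"
k1 = store.put("user-a", article)
k2 = store.put("user-b", article)
print(k1 == k2, len(store.blobs))   # True 1
```

Each user's PetDB then holds only the key, while the shared servers hold the single copy.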

Page 11

Stage

Not all data has to be at my current point of connection.

Mainly resides in shared and private servers on the Internet.

Staged to me on a series of data managers.

Access time depends on context, likely use
– Current itinerary: 1 second
– Upcoming trips: 5 seconds
– Past trips: 30 seconds
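One way to read the access-time targets above is as a mapping from an item's context to a latency budget, which a stager could use to decide how close to the user the item must be kept. The sketch below assumes trips are described by start and end dates; the function names and the `TARGETS` table are illustrative, and only the three latency figures come from the slide.

```python
from datetime import date

# Latency targets in seconds, taken from the slide.
TARGETS = {"current": 1, "upcoming": 5, "past": 30}

def classify_trip(start: date, end: date, today: date) -> str:
    """Classify a trip relative to today (hypothetical helper)."""
    if start <= today <= end:
        return "current"
    return "upcoming" if start > today else "past"

def access_target_seconds(start: date, end: date, today: date) -> int:
    """Return the access-time budget for a trip's itinerary data."""
    return TARGETS[classify_trip(start, end, today)]

today = date(2015, 6, 15)
print(access_target_seconds(date(2015, 6, 14), date(2015, 6, 18), today))  # 1
print(access_target_seconds(date(2015, 7, 1), date(2015, 7, 5), today))    # 5
print(access_target_seconds(date(2015, 3, 1), date(2015, 3, 4), today))    # 30
```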

Page 12

Reconstitute

“If I found it once, PetDB can find it again”

Remember what procedure or search constructed or located data originally.

Use the same method to get it again.

Need to ensure base data is archived.

Plus a small amount of unique content

– Stuff I’ve created

– Foreground information that superimposes my personal perspective: selections, annotations, responses, manipulations, groupings
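Reconstitution can be sketched as storing the recipe that produced an item rather than the item itself, then replaying the recipe against archived base data. The `Reconstitutor` class, the `search` procedure, and the sample archive are all hypothetical; the sketch also assumes the base data has not changed between the two runs, which is why the slide insists on archiving it.

```python
class Reconstitutor:
    """Hypothetical: keep the recipe that produced an item, not the item."""

    def __init__(self, archive):
        self.archive = archive   # base data, assumed to be archived elsewhere
        self.recipes = {}        # item name -> (procedure, arguments)

    def remember(self, name, procedure, *args):
        """Run a procedure once and record how the result was obtained."""
        self.recipes[name] = (procedure, args)
        return procedure(self.archive, *args)

    def reconstitute(self, name):
        """Replay the recorded procedure to obtain the item again."""
        procedure, args = self.recipes[name]
        return procedure(self.archive, *args)

def search(archive, keyword):
    # Illustrative search: every archived document containing the keyword.
    return sorted(doc for doc in archive if keyword in doc)

archive = ["conference list 2015", "auto info sheet", "conference budget"]
r = Reconstitutor(archive)
first = r.remember("my-conferences", search, "conference")
again = r.reconstitute("my-conferences")
print(first == again)   # True
```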

Page 13

What Infrastructure Do I Need?

Net Data Managers

• Network-centric vs. disk-centric
– Data movement vs. data storage

• Work on live streams as well as stored data

• Deal with data of arbitrary types

• Run queries over thousands of sites

• Locate data by external contexts as well as internal content

• Large-scale monitoring

Page 14

Data Management Space

            Disk Centric    Network Centric
No Query    File System     Web Servers
Query       DBMS            Net Data Managers (NDMs)

Page 15

Why Net Data Managers?

File systems won’t work
– No queries, disk centric

Web Servers won’t work
– No structural query, no combining of data
– No support for optimization and execution of high-level queries spanning 1000s of sites
– No support for triggers
– In reality, nothing more than “page servers”

Page 16

Limitations of Current DBMSs

• Schema-first

• Load then query

• Data in the box

• Scale

• Search by content, not by context

Page 17

Key Elements of NDM

• Self-describing data (e.g., XML)

• NetQueries

• Algebraic basis

• Stream-processing components
– Oil refinery vs. book-order warehouse

Want to do for net-centric, data-intensive applications what relational DBs did for business data processing:

Reduce the coding effort to produce such applications, while improving performance, scalability and reliability.

Page 18

Codd’s Contribution

What’s the most important aspect of the relational model?

– Calculus?
– Algebra?
– Equivalence?

My opinion: Observing that BDP programs only do about 6-7 different things:
– scan files
– select records
– combine records
– concatenate files
– remove fields
– remove duplicates
– [aggregate records]

What are the building blocks of net data management?
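To make the list of business data processing operations concrete, each one can be written as a small composable function over records. This is an illustrative sketch, not the relational algebra as Codd defined it: records are assumed to be Python dictionaries, "combine" is assumed to mean an equi-join on one field, and all function names and sample data are hypothetical.

```python
from collections import Counter
from itertools import chain

def scan(rows):                        # scan files
    yield from rows

def select(rows, pred):                # select records
    return (r for r in rows if pred(r))

def project(rows, fields):             # remove fields
    return ({f: r[f] for f in fields} for r in rows)

def distinct(rows):                    # remove duplicates
    seen = set()
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            yield r

def combine(left, right, field):       # combine records (equi-join, assumed)
    right = list(right)
    return ({**l, **r} for l in left for r in right if l[field] == r[field])

def concatenate(*files):               # concatenate files
    return chain(*files)

def aggregate(rows, field):            # aggregate records (count per value)
    return Counter(r[field] for r in rows)

orders_a = [{"cust": 1, "item": "pen"}, {"cust": 2, "item": "ink"}]
orders_b = [{"cust": 1, "item": "pen"}]
customers = [{"cust": 1, "city": "Portland"}, {"cust": 2, "city": "Madison"}]

rows = distinct(concatenate(scan(orders_a), scan(orders_b)))
rows = combine(rows, customers, "cust")
rows = select(rows, lambda r: r["city"] == "Portland")
result = list(project(rows, ["item", "city"]))
counts = aggregate(orders_a + orders_b, "cust")
print(result)   # [{'item': 'pen', 'city': 'Portland'}]
```

The point of the slide is that a handful of such operators covers most programs; the open question it poses is what the equivalent set looks like for net data management.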

Page 19

Without NDMs

[Diagram: Data Sources and Users, with components labeled Format Conversion (three), Accumulator + Query Eng., Alert Service, Profiles, Parameter File, Data Product Generation, Algorithm, and Browser / Push Receiver (three). Legend: Custom Software, Generic Component.]

Page 20

With NDMs

[Diagram: Sources and Users, with components labeled Accumulator + Query Eng., Alert Service, Profiles, Parameter File, Algorithm, Data Product Generation, Browser / Push Receiver (three), and Format Conversion (three). Legend: Custom Software, Generic Component.]

Page 21

Kinds of Components

• Stream-based query processors

• Alerters

• Accumulators

• Remote monitoring/indexing

• Semantic Routers

• Replicators: lazy, eager, just-in-time

• Semantic caches

• Splitters

• Access-mode adapters

• Partial evaluators

Page 22

Alerting vs. Querying

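Data Centric (DBMS): Stream of queries past a store of data

Net Centric (Alerter): Stream of data past a store of queries

[Diagram: in the DBMS panel, queries (?) flow past stored data (D) and outputs (!) emerge; in the Alerter panel, data (D) flows past stored queries (?) and outputs (!) emerge.]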
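The alerter side of this contrast can be sketched as a set of standing queries held in memory while data items stream past. The `Alerter` class is hypothetical, standing queries are assumed to be simple predicates, and the pulse thresholds are made-up values used only to give the ICU vital-signs example from earlier in the talk something to match.

```python
class Alerter:
    """Hypothetical alerter: queries are stored, data streams past them."""

    def __init__(self):
        self.queries = {}   # query name -> predicate over one data item

    def register(self, name, predicate):
        self.queries[name] = predicate

    def process(self, stream):
        """Yield (query name, item) for every stored query an item satisfies."""
        for item in stream:
            for name, predicate in self.queries.items():
                if predicate(item):
                    yield (name, item)

alerter = Alerter()
# Thresholds below are illustrative assumptions, not clinical guidance.
alerter.register("high-pulse", lambda v: v["pulse"] > 120)
alerter.register("low-pulse", lambda v: v["pulse"] < 40)

vitals = [
    {"patient": "A", "pulse": 80},
    {"patient": "B", "pulse": 130},
    {"patient": "C", "pulse": 35},
]
alerts = list(alerter.process(vitals))
print([(name, item["patient"]) for name, item in alerts])
# [('high-pulse', 'B'), ('low-pulse', 'C')]
```

A DBMS inverts the two roles: the data is stored and indexed, and each query arrives, runs once, and leaves.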

Page 23

Access Modes: Who Decides

[Diagram: grid of four access modes (Post, Pull, Poll, Push), arranged by who decides What Data Moves (Producer or Consumer) and who decides When Data Moves (Producer or Consumer).]

Page 24

Assembling Applications from Components

Akamai FreeFlow (see NASDAQ site)

Splitting + Replication + Merge + Adapters

[Diagram labels: Web Content, Split, Graphics, Text, Push, Replicate, Base Server, Field Server (three), Pull (two), Merge, Browser.]

Page 25

NIAGARA Project

Initial investigation of NDM based on XML

University of Wisconsin and OGI

• Stream-oriented XML-QL evaluator

• “Text-in-context” search

• NiagaraCQ

• Merge operator (and rest of algebra)

• XML Firehose

Page 26

Use of NDM for PetDB

• NetQueries encode procedures for reconstituting data

• Monitoring sources of interest

• Replication, splitting, push, accumulators, semantic routing for staging data

• NetQuery to inform an archive server what to save

• Archives, semantic caches express what they already hold with a NetQuery

Page 27

Building the PetDB System

[Diagram labels: PetDB, Indexer, Context Mgr., Petster, DataKennel, BackQuote, WebSnap, IP Server, Task Analyzer, Profiler, Public Archives, Private Archive, Stream Processor, Internet Monitor, Replicate Server, Secure Local Cache, Stager (three).]

Page 28

What Else is Needed?

• Superimposed Information
– Much of my unique content is an organizational overlay on base data

• Small-footprint data managers

• Presentation model of stream data

• Authorization and Authentication

• QoS control, content scaling

• Intelligent prediction, learning

• Secure staging areas