Database Activities and Trends Jim Gray Microsoft Research 2 June 2006, Microsoft, TechNet, London Outline • What I have been doing 15 minutes • BIG Changes in DBs 15 minutes • Q&A 20 minutes


Database Activities and Trends

Jim Gray
Microsoft Research
2 June 2006, Microsoft, TechNet, London

Outline

• What I have been doing 15 minutes

• BIG Changes in DBs 15 minutes

• Q&A 20 minutes


Scalability Projects

• TerraServer: Geospatial data online
  – Now part of Virtual Earth http://local.live.com/

• SATA disk evaluation
  – Copy 1.5 Petabytes (count types of errors) MSR-TR-2005-166

• Disk and Network performance
  – Move 1 GB/s from CERN to Pasadena MSR-TR-2004-62

• Bricks
  – BI-Bricks: cheap boxes/disks for BI
  – Server Bricks: TerraServer Bricks: MSR-TR-2004-107


DB Projects

• Spatial data access inside SQL
  – Gives a good example of using CLR to extend SQL
  – Sample is part of SQL 2005 programming samples.
  – Many papers …, MSR-TR-2005-122, MSR-TR-2006-52

• To Blob or NOT to Blob?
  – Explored what is the break-even point of Blobs vs Files.
  – Guess what! Almost all files should be blobs. MSR-TR-2006-45

• GPU TeraSort:
  – You have been hearing about Many-Core from Intel
  – Nvidia & ATI give you 100 cores today (2x next year)
      10x the operations per second of the CPU
      10x the memory bandwidth of the CPU
  – How to program them?
  – Sort represents IO, memory, processing.
  – GPU TeraSort demos this MSR-TR-2005-183
  – Accelerator: C# extension is a GPU compiler. MSR-TR-2005-184
      Not me, but very cool!


eScience Projects

• SkyServer: Astronomy data online http://skyserver.sdss.org/
  – A real Data Grid app
  – Web services are popular
  – SkyQuery and CasJobs use web services. http://casjobs.sdss.org/CasJobs/
  – Spatial access built as SQL 2005 C# extensions.

• Doing Finite Element Analysis with a DB and Vis tools
  – "Supporting Finite Element Analysis with a Relational Database Backend; Part I: There is Life beyond Files" MSR-TR-2005-49

• Ecological sensors (soil, water, ocean, …)
  – Only public thing so far: http://lifeunderyourfeet.org/
  – Many papers coming

• Starting BioInfo efforts (Portable PubMed Central, …)


Portable PubMedCentral

• “Information at your fingertips”
• Helping build PortablePubMedCentral
• Deployed US, China, England, Italy, South Africa, (Japan soon).
• Each site can accept documents
• Archives replicated
• Federate thru web services
• Working to integrate Word/Excel/… with PubmedCentral – e.g. WordML, XSD
• To be clear: NCBI is doing 99% of the work, but it is very cool and very significant.


Outline

• What I have been doing 15 minutes

• BIG Changes in DBs 15 minutes

• Q&A 20 minutes


DB System Architecture

• The classic DBMS model
  [Figure: layered stack labelled os, records, sets, utilities]
  Worked, but applications wanted to query other data types

• Added:
  + Text, Time, Space
  + Cubes, Data mining
  + XML, XQuery
  + Programming Languages
  + Triggers and queues
  + Replication, Pub/sub
  + Extract-Transform-Load
  + Many more extensions coming
  [Figure: the same stack (os, records, sets, utilities) surrounded by Replication, ETL, Text, Cubes, Data Mine, Time, Space, Notification, Procedures, Queues, XML]

• A Mess?


DB Systems evolved to be containers for information services
a development, deployment, and execution environment

• Classic ++
  – + Programming Languages
  – + Triggers and queues
  – + Replication, Pub/sub
  – + Extract-Transform-Load
  – + Text, Time, Space
  – + Cubes, Data mining
  – + XML, XQuery
  – + Many more extensions coming

• DBMS is an ecosystem
  OO is the key structuring strategy:
  – Everything is a class
  – Database is a complex object
  – Core object is DataSet
  – Classes publish/consume them
  – Depends on strong Object Model

[Figure: stack labelled os, records, sets, utilities, with DataSet as the core object]


The Object-Relational World
marry programming languages and DBMSs

• Stored procedures evolve to “real” languages: Java, C#, … with real object models.
• Data encapsulated: a class with methods
• Classes may be persistent
• Tables are enumerable & index-able record sets with foreign keys
• Records are vectors of objects
• Opaque or transparent types
• Set operators on transparent classes
• Transactions:
  – Preserve invariants
  – A composition strategy
  – An exception strategy
• Ends Inside-DB Outside-DB dichotomy

[Figure: Business Objects]


Ask not “How to add objects to databases?”,
Ask “What kind of object is a database?”

Q: Given an object model, what is a DB?
A: DataSet class and methods (nested relation with metadata)

This is the basis for the ecosystem:
  Distributed DB
  Extensible DB
  Interoperable DB
  …

This was implicit in ODBC but is now explicit within the DBMS ecosystem

Input: Command (any language)
Output: Dataset

[Figure: Question → Tables or Text or cube or … → Dataset]

Entity Set in ADO.NET 3.0
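
The "Input: Command, Output: Dataset" contract can be pictured with a small sketch. It is only an illustration, written with Python's standard sqlite3 module rather than ADO.NET; the DataSet class, the run_command helper, and the galaxy table are hypothetical names introduced for this example and do not appear in the talk.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class DataSet:
    """Hypothetical stand-in for a DataSet: rows plus the metadata that describes them."""
    columns: list  # column names taken from the cursor metadata
    rows: list     # one tuple per result row


def run_command(con, command, params=()):
    """Input: a command string. Output: a DataSet (rows and column metadata)."""
    cur = con.execute(command, params)
    # cursor.description is None for statements that return no rows.
    columns = [d[0] for d in cur.description] if cur.description else []
    return DataSet(columns=columns, rows=cur.fetchall())


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE galaxy (name TEXT, distance REAL)")  # made-up sample table
con.executemany("INSERT INTO galaxy VALUES (?, ?)", [("A", 1.5), ("B", 0.7)])

result = run_command(con, "SELECT name, distance FROM galaxy ORDER BY distance")
print(result.columns)
print(result.rows)
```

The caller never sees how the answer was produced; it only receives rows together with their metadata, which is the property the slide attributes to the DataSet.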


Queues & Workflows
SODA (Service Oriented Data Architecture)

"Service Oriented Database Architecture: App Server-Lite?" MSR-TR-2005-129

• Apps are loosely connected via queued messages
• Queues are databases.
• Basis for workflow
• Queues: the first class to add to an OR DBMS
• Queues fire triggers. Active databases
• Synergy with DBMS: security, naming, persistence, types, query, …

Workflow: Script, Execute, Administer & Expedite, all built on queues
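
A minimal sketch of "queues are databases" and "queues fire triggers", assuming nothing more than a table used as a FIFO queue. It uses Python's standard sqlite3 module purely for illustration; the queue and audit tables and the enqueue/dequeue helpers are hypothetical names, not part of any product discussed in the talk.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)")
con.execute("CREATE TABLE audit (body TEXT)")
# "Queues fire triggers": every enqueue is copied into an audit table.
con.execute("""
    CREATE TRIGGER on_enqueue AFTER INSERT ON queue
    BEGIN
        INSERT INTO audit (body) VALUES (NEW.body);
    END
""")


def enqueue(con, body):
    """Add a message; the connection context commits, or rolls back on error."""
    with con:
        con.execute("INSERT INTO queue (body) VALUES (?)", (body,))


def dequeue(con):
    """Remove and return the oldest message in one transaction, or None if empty."""
    with con:
        row = con.execute("SELECT id, body FROM queue ORDER BY id LIMIT 1").fetchone()
        if row is None:
            return None
        con.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        return row[1]


enqueue(con, "order-1")
enqueue(con, "order-2")
first = dequeue(con)
second = dequeue(con)
empty = dequeue(con)
audited = [r[0] for r in con.execute("SELECT body FROM audit ORDER BY rowid")]
print(first, second, empty, audited)
```

Because the queue is an ordinary table, it inherits the persistence, transactions, and query facilities listed under "Synergy with DBMS" without extra machinery.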


Text, Temporal, and Spatial Data Access

• Q: What comes after queues?
  A: Basic types: text, time, space, …
• Great application of OR technology
• Key idea: table valued functions == indices
  An index is a table, organized differently
  Query executor uses index to map: Key → set (aka sequence of rows)
• Table valued function can do this map
  Optimizer can use it.
• + extras: cost function, cardinality, …
• BIG DEAL: Approximate answers: Rank and Support

  select Title, Abstract, Rank
  from Books join FreeTextTable(Title, Abstract, 'XML semistructured') T
       on BookID = T.Key

  select store, holiday, sum(sales)
  from Sales join HolidayDates(2004) T
       on Sales.day = T.day
  group by store, holiday

  select galaxy, distance
  from GetNearbyObjEQ(22,37)
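
The "table valued functions == indices" idea can be sketched outside SQL. The Python below mimics the HolidayDates(2004) join from the queries above: a function yields rows, and an index (a table organized by key) maps each key to a set of rows. The function holiday_dates, its two holidays, and the sales rows are all made up for illustration.

```python
from collections import defaultdict
from datetime import date


def holiday_dates(year):
    """Hypothetical table-valued function: yields (day, holiday) rows for a year."""
    yield date(year, 1, 1), "New Year"
    yield date(year, 12, 25), "Christmas"


# Made-up Sales rows: (store, day, sales)
sales = [
    ("London", date(2004, 1, 1), 100),
    ("London", date(2004, 12, 25), 250),
    ("Leeds", date(2004, 12, 25), 75),
    ("Leeds", date(2004, 6, 1), 40),  # not a holiday, so it drops out of the join
]


def build_index(rows, key):
    """An index is a table organized differently: key -> list of rows."""
    index = defaultdict(list)
    for row in rows:
        index[key(row)].append(row)
    return index


def holiday_sales(year):
    """Join Sales with holiday_dates(year) and sum sales by (store, holiday)."""
    sales_by_day = build_index(sales, key=lambda r: r[1])
    totals = defaultdict(int)
    for day, holiday in holiday_dates(year):                # rows from the function
        for store, _, amount in sales_by_day.get(day, []):  # index lookup: key -> rows
            totals[(store, holiday)] += amount
    return dict(totals)


print(holiday_sales(2004))
```

Both sides of the join have the same shape, a mapping from a key to a sequence of rows, which is why an executor can treat a function and an index interchangeably.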


What’s new here?

• DBMSs have tight integration with language classes (Java, C#, VB, …)
• The DB is a class
• You can add classes to DB.
• Adding indices is “easy”
  If you have a new idea.
• Now have solid Queue systems
  Adding workflow is “easy”
  If you have a new idea.
• This is a vehicle for publishing data on the Web.

[Figure: Question → Tables or Text or cube or … → Dataset]


Cubes

• Data cubes now standard
• MDX is very powerful (Multi-Dimensional eXpressions)
• Dimension, Measure, Operator concepts highly evolved beyond snowflake schema
• Cube stores cohabit with row stores
  ROLAP + MOLAP + (x xOLAP)
  (relational + multidimensional online analytic processing)
• Very sophisticated algorithms
• A big part of the ecosystem

[Figure: data cube with axis labels CHEVY, FORD; 1990, 1991, 1992, 1993; RED, WHITE, BLUE]

SELECT <axis_spec> FROM <cube_spec> WHERE <slicer_spec>
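
To make the cube idea concrete, here is a small sketch in plain Python (not MDX). It sums a measure for every combination of kept and rolled-up dimensions, using the labels from the figure. The dimension names (make, year, color), the unit counts, and the ALL marker are assumptions chosen for this example.

```python
from collections import defaultdict
from itertools import product

# Made-up fact rows: (make, year, color, units)
facts = [
    ("CHEVY", 1990, "RED", 5),
    ("CHEVY", 1990, "BLUE", 3),
    ("CHEVY", 1991, "RED", 4),
    ("FORD", 1990, "RED", 2),
    ("FORD", 1991, "WHITE", 6),
]

ALL = "ALL"  # marker meaning "aggregated over this dimension"


def build_cube(rows):
    """Sum the measure for every combination of kept and rolled-up dimensions."""
    cube = defaultdict(int)
    for make, year, color, units in rows:
        # Each fact contributes to 2^3 cells: each dimension is kept or rolled up.
        for cell in product((make, ALL), (year, ALL), (color, ALL)):
            cube[cell] += units
    return dict(cube)


cube = build_cube(facts)
print(cube[("CHEVY", ALL, ALL)])  # all CHEVY units
print(cube[(ALL, 1990, "RED")])   # red units in 1990, any make
print(cube[(ALL, ALL, ALL)])      # grand total
```

Picking a cell such as (ALL, 1990, "RED") plays roughly the role of the slicer in the SELECT form above; a real cube store would precompute or index these cells rather than enumerate them per fact.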


Semi-Structured Data

• “Everyone starts with the same schema: <stuff/>. Then they refine it.” J. Widom
• We are a “strong schema” community
• That has pros-and-cons.
• Files <stuff/> and XML <<foo/> <bar/>> are here to stay. Get over it!
• File directories are becoming databases;
  – Pivot on any attribute
  – Folders are standing queries.
  – Freetext+schema search (better precision/recall)
• XSD (xml schema) and xQuery are transitional;
  But we have to do them to get to the real answer.
• Cohabit with row-stores.
• Challenge: figure out what comes after XSD+xQuery


Data Mining and Machine Learning

• Tasks: classification, association, prediction
• Tools: Decision trees, Bayes, A Priori, clustering, regression, Neural net, …
• Now unified with DBs
  – Create table T (x,y,z,u,v,w)
    Learn “x,y,z” from “u,v,w” using <algorithm>
  – Train T with data.
  – Then can ask:
    • Probability x,y,z,u,v,w
    • What are the u,v,w probabilities given x,y,z
  – Example: Learn height from age.
• Anyone with a data mining algorithm has full access to the DBMS infrastructure.
• Challenge: Better learning algorithms.


Create the model:
  CREATE MINING MODEL HeightFromAgeSex (
    ID long key,
    Gender text discrete,
    Age long continuous,
    Height long continuous PREDICT)
  USING Decision_Trees

Train the data mining model:
  INSERT INTO HeightFromAgeSex
  SELECT ID, Gender, Age, Height FROM People

Predict height from model:
  SELECT height, PredictProbability(height)
  FROM HeightFromAgeSex PREDICTION JOIN New
    ON New.Gender = HeightFromAgeSex.Gender
   AND New.Age = HeightFromAgeSex.Age

DM – DB Synergy

Probabilistic Reasoning

DB verbs to drive Modeler

learn height from Gender + Age
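
The create / train / predict cycle in the statements above can be imitated with a deliberately simple stand-in. The sketch below is not a decision tree and not DMX: it stores the mean height per (gender, age) cell and reports how many training cases support each prediction. The HeightModel class and the sample people are hypothetical.

```python
from collections import defaultdict


class HeightModel:
    """Toy model: mean height per (gender, age) cell. This is NOT a decision tree."""

    def __init__(self):
        self._sum = defaultdict(float)
        self._count = defaultdict(int)

    def train(self, rows):
        """rows: iterable of (gender, age, height) training cases."""
        for gender, age, height in rows:
            self._sum[(gender, age)] += height
            self._count[(gender, age)] += 1

    def predict(self, gender, age):
        """Return (predicted_height, support), or (None, 0) for an unseen cell."""
        n = self._count.get((gender, age), 0)
        if n == 0:
            return None, 0
        return self._sum[(gender, age)] / n, n


people = [("F", 10, 138.0), ("F", 10, 142.0), ("M", 10, 139.0)]  # made-up cases
model = HeightModel()
model.train(people)
print(model.predict("F", 10))
print(model.predict("M", 12))
```

The support count is a crude analogue of the probability that a real mining model attaches to a prediction; a learned model would also generalize to unseen ages instead of returning nothing.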


Notification, Stream Processing, and Sensor Processing

• Traditionally: Query billions of facts
• Streams: millions of queries, one new fact
  – New protein compare to all DNA
  – Change in price or time
• Implications
  – New aggregation operators (extension)
  – New programming style
  – Streams in products:
    • Queries represented as records
    • New query optimizations.
• Sensor networks
  – Push queries out to sensors.
  – Simpler programming model
  – Optimizes power & bandwidth

[Figure: traditional model: one query (Q?) over stored facts returns an answer (A!); stream model: many stored queries (Q) meet arriving facts (fact, fact, fact…) and produce a Notification]
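
"Queries represented as records" inverts the usual roles: the queries are stored and each arriving fact is matched against them. A minimal Python sketch follows; the StandingQuery fields, the subscribers, and the price facts are all hypothetical, and a real system would index the stored queries rather than scan them.

```python
from dataclasses import dataclass


@dataclass
class StandingQuery:
    """A query stored as a record: who asked, which symbol, and a price threshold."""
    subscriber: str
    symbol: str
    min_price: float


# The "table" of stored queries.
queries = [
    StandingQuery("alice", "MSFT", 25.0),
    StandingQuery("bob", "MSFT", 30.0),
    StandingQuery("carol", "IBM", 80.0),
]


def notify(fact, queries):
    """Match one arriving fact against every stored query; return who to notify."""
    symbol, price = fact
    return [q.subscriber for q in queries
            if q.symbol == symbol and price >= q.min_price]


print(notify(("MSFT", 27.5), queries))
print(notify(("IBM", 79.0), queries))
```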


Restatement: DB Systems evolved to be containers for information services
a development, deployment, and execution environment

• DBMS is an ecosystem
  Key structuring strategy:
  – Everything is a class
  – Database is a complex object
  – Core object is DataSet
• The architecture lets you add your new ideas.

[Figure: the classic stack (os, records, sets, utilities) next to the same stack with DataSet as the core object]


Language + DB Integration
(the Microsoft contribution)

• LINQ is a BIG deal (SQL and XML) http://msdn.microsoft.com/data

• Entity Sets are next step in Data Sets
  ADO.NET V3 automates entities


Data access today

  void EmpsByDate(DateTime date) {
    using (SqlConnection con = new SqlConnection(
             Settings.Default.AdventureWorksSQL)) {
      con.Open();
      SqlCommand cmd = con.CreateCommand();
      cmd.CommandText = @"
        SELECT SalesPersonID, FirstName, HireDate
        FROM SalesPerson sp
          INNER JOIN Employee e ON sp.SalesPersonID = e.EmployeeID
          INNER JOIN Contact c ON e.EmployeeID = c.ContactID
        WHERE e.HireDate < @date";
      cmd.Parameters.AddWithValue("@date", date);
      DbDataReader r = cmd.ExecuteReader();
      while (r.Read()) {
        Console.WriteLine("{0:d}:\t{1}", r["HireDate"], r["FirstName"]);
      }
    }
  }

Callouts on the code:
• Explicit DB connections
• Opaque command text
• Untyped resultsets
• Entities ≠ Rows

[Figure: Connection, Command, and DataReader return Rows from the Relational Engine (Customer, SalesPerson)]


Data access tomorrow

  public partial class AdventureWorksDB : System.Data.Objects.ObjectContext {
    public System.Data.Objects.Query<SalesOrder> SalesOrders { … }
    public System.Data.Objects.Query<SalesPerson> SalesPeople { … }
  }

  void EmpsByDate(DateTime date) {
    using (AdventureWorksDB aw = new AdventureWorksDB()) {
      var people = from p in aw.SalesPeople
                   where p.HireDate < date
                   select p;
      foreach (SalesPerson p in people) {
        Console.WriteLine("{0:d}\t{1}", p.HireDate, p.FirstName);
      }
    }
  }

Callouts on the code:
• No explicit connections
• Strongly typed commands
• Strongly typed results

[Figure: layered stack. Relational Engine (Customer, SalesPerson) at the bottom; Connection, Command, DataReader exchange Rows; MapConnection, MapCommand, MapDataReader exchange Entities; ObjectContext and Query<T> exchange Objects; Domain Objects (SalesData, Order) on top are Auto-Gen classes]
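
The contrast between untyped rows and typed entities can be sketched in a few lines. This is an analogy only, written in Python with the standard sqlite3 module; it is neither LINQ nor the ADO.NET entity framework, and unlike a LINQ query the date filter here runs in Python instead of being translated to SQL. The SalesPerson dataclass, the sample rows, and the helper names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from datetime import date


@dataclass
class SalesPerson:
    """Typed entity: fields have names and types instead of row["..."] lookups."""
    first_name: str
    hire_date: date


def sales_people(con):
    """Map raw rows to typed entities (the role played by the mapping layer)."""
    query = "SELECT FirstName, HireDate FROM SalesPerson"
    for first_name, hire_date in con.execute(query):
        yield SalesPerson(first_name, date.fromisoformat(hire_date))


def emps_by_date(con, cutoff):
    """Entities hired before the cutoff; the comparison is on a real date field."""
    return [p for p in sales_people(con) if p.hire_date < cutoff]


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE SalesPerson (FirstName TEXT, HireDate TEXT)")  # made-up data
con.executemany("INSERT INTO SalesPerson VALUES (?, ?)",
                [("Ann", "2001-03-15"), ("Raj", "2005-09-01")])

hired_early = emps_by_date(con, date(2003, 1, 1))
print(hired_early)
```

A misspelled field such as p.hire_dat fails immediately as an attribute error on a known class, whereas a misspelled column name inside opaque command text is only discovered when the database rejects the statement.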


Outline

• What I have been doing 15 minutes

• BIG Changes in DBs 15 minutes

• Q&A 20 minutes
