Microsoft Research and Big Databases: Information at your fingertips. Jim Gray & Tom Barclay (gray@microsoft.com & TBarclay@Microsoft.com), Microsoft Research. Presentation to US Dept. Homeland Security, 7 April 2004


1

Microsoft Research and

Big Databases

Information at your fingertips

Jim Gray & Tom Barclay
gray@microsoft.com & TBarclay@Microsoft.com
Microsoft Research
Presentation to US Dept. Homeland Security, 7 April 2004


2

Outline

• Overview of Microsoft Research

• Big-Database Research

• TerraServer: Geospatial app

• SkyServer: data mining app

• Q&A


4

Most R&D Is D

How to Do Basic Research in Industry?

Critical questions (from Rick Rashid)

• How can I create and maintain a world class research organization in an industrial setting?

• How do I keep the lines of communication open between product teams and researchers?

• How do I get new technology into products quickly?


5

Approach

Adapt the Academic Model

• Organizational goal: Advance state of the art
• University organizational model
– Flat structure, critical mass groups
• Open research environment
– Aggressive publication in peer-reviewed literature
– Frequent visitors, daily seminars
• Strong ties to University Research
– Nearly 15% of basic research budget directly invested in Universities (lab grants, research grants, fellowships, etc.)
– Hundreds of interns and visitors


6

Microsoft Research

• Founded in 1991
• Staff of over 700 in over 55 areas
• Internationally recognized research teams
• Lab locations:
– Redmond, Washington, USA 75%
– Cambridge, United Kingdom 10%
– Beijing, People’s Republic of China 10%
– Mountain View, California, USA 5%
– San Francisco, California, USA 1%


7

Microsoft Research

Expanding the State of the Art

• Thousands of peer-reviewed publications
– 10%…30% of papers at our focus conferences: graphics, programming, systems, data management…
• Community leadership
– Professional societies
– Journals
– Conferences
• Mentoring Interns
• Hosting academic summers and sabbaticals
• Special workshops


15

BARC’s Research Agenda

• Scalable Servers
– TerraServer – US map online
– SkyServer – All astronomy data online
• Databases
– Advancing Databases and data storage
• Media Management
– Organizing your digital shoebox


16

How Can HLS & MSR Cooperate?

• Lots of research at MSR on HLS-relevant areas
– Data mining and visualization
– Distributed systems
– Cryptography, security, …
– Etc.
• Invite MS Researchers to HLS
– workshops
– study groups

• HLS visiting scientists at MSR?


17

Outline

• Overview of Microsoft Research

• Big-Database Research

• TerraServer: Geospatial app

• SkyServer: data mining app

• Q&A


18

Numbers

Terabytes and Gigabytes are BIG!

• Mega – a house in California

• Giga – a very rich person (billionaire)

• Tera – ~ The national debt

• Peta – more than all the money in the world

• A Gigabyte: the Human Genome

• A Terabyte: 150 mile long shelf of books.


19

How much information is there?

• Soon everything can be recorded and indexed

• Most bytes will never be seen by humans.

• Data summarization, trend detection, and anomaly detection are key technologies

See Mike Lesk: How much information is there: http://www.lesk.com/mlesk/ksg97/ksg.html

See Lyman & Varian: How much information: http://www.sims.berkeley.edu/research/projects/how-much-info/

[Figure: information scale running Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta, with markers for A Book, A Photo, Movie, All books (words), All Books MultiMedia, and Everything Recorded]

Small prefixes (negative powers of ten): 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli
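The prefix ladder on this slide corresponds to powers of ten. A minimal sketch in Python, assuming decimal (SI) prefixes rather than binary ones; the function names `to_bytes` and `describe` are illustrative and do not come from the talk:

```python
# Decimal SI prefixes, smallest to largest (powers of ten, not powers of two).
PREFIX_EXPONENT = {
    "kilo": 3, "mega": 6, "giga": 9, "tera": 12,
    "peta": 15, "exa": 18, "zetta": 21, "yotta": 24,
}

def to_bytes(value, prefix):
    """Convert e.g. (20, 'tera') to a plain byte count."""
    return value * 10 ** PREFIX_EXPONENT[prefix]

def describe(n_bytes):
    """Scale a byte count to the largest prefix that keeps the number >= 1."""
    name, exp = "", 0
    for candidate, candidate_exp in PREFIX_EXPONENT.items():
        if n_bytes >= 10 ** candidate_exp:
            name, exp = candidate, candidate_exp
    return f"{n_bytes / 10 ** exp:g} {name}bytes"

if __name__ == "__main__":
    # The 20 TB figure is the TerraServer data source size quoted later in the talk.
    print(describe(to_bytes(20, "tera")))
```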


20

e-Science Has BIG DATA

• Data captured by instruments, or data generated by simulator
• Processed by software
• Placed in a file or database
• Scientist analyzes files / database
• Virtual laboratories
– Networks connecting e-Scientists
– Strong support from funding agencies
• Better use of resources
– Primitive today


21

The eScience Big Picture

[Diagram labels: Experiments & Instruments, Simulations, Literature, Other Archives (each marked “facts”); questions; answers]

The Big Problems
• Data ingest
• Managing a petabyte
• Common schema
• How to organize it?
• How to reorganize it
• How to coexist with others
• Query and Vis tools
• Support/training
• Performance
– Execute queries in a minute
– Batch query scheduling


22

e-Science is Data Mining

• There are LOTS of data
– People cannot examine most of it.
– Need computers to do analysis.
• Manual or Automatic Exploration
– Manual: person suggests hypothesis, computer checks hypothesis
– Automatic: computer suggests hypothesis, person evaluates significance
• Given an arbitrary parameter space:
– Data Clusters
– Points between Data Clusters
– Isolated Data Clusters
– Isolated Data Groups
– Holes in Data Clusters
– Isolated Points

Nichol et al. 2001
Slide courtesy of and adapted from Robert Brunner @ CalTech.


23

Data Analysis

• Looking for
– Needles in haystacks – the Higgs particle
– Haystacks: Dark matter, Dark energy
• Needles are easier than haystacks
• Global statistics have poor scaling
– Correlation functions are N^2, likelihood techniques N^3
• As data and computers grow at same rate, we can only keep up with N log N
• A way out?
– Discard notion of optimal (data is fuzzy, answers are approximate)
– Don’t assume infinite computational resources or memory

• Requires combination of statistics & computer science
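The scaling argument on this slide can be checked with a little arithmetic. If the data size N and the available compute both grow by the same factor k, an algorithm with cost f(N) sees its runtime change by f(kN) / (k f(N)). The sketch below is illustrative only; the values of N and k are assumptions chosen for the example, not figures from the talk:

```python
import math

def runtime_growth(cost, n, k):
    """Factor by which wall-clock time changes when both the data size n
    and the available compute grow by the same factor k."""
    return cost(k * n) / (k * cost(n))

# Cost models named on the slide.
COSTS = {
    "N log N": lambda n: n * math.log(n),
    "N^2": lambda n: n ** 2,
    "N^3": lambda n: n ** 3,
}

if __name__ == "__main__":
    n, k = 1e9, 10.0  # assumed: a billion objects, tenfold growth
    for name, cost in COSTS.items():
        print(f"{name:8s} runtime grows by x{runtime_growth(cost, n, k):.2f}")
```

With these assumed values the N log N runtime stays within a small constant of where it started, while the N^2 and N^3 runtimes grow by k and k^2, which is the sense in which only N log N methods "keep up".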


24

Outline

• Overview of Microsoft Research

• Big-Database Research

• TerraServer: Geospatial app

• SkyServer: data mining app

• Q&A


25

TerraServer/TerraService

http://terraService.Net/
http://TerraServer-USA.com/

• US Geological Survey Photo (DOQ) & Topo (DRG) images
• On Internet since June 1998
• Operated by Microsoft
• Cross Indexed with
– Demographics,
• A web service
• 20 TB data source
• 10 M web hits/day


26

USGS Image Data

• Digital OrthoQuads
– 15 TB, 280,000 files uncompressed
– Digitized aerial imagery
– 96% coverage conterminous US
– 1 meter resolution
– < 15 years old
• Digital Raster Graphics
– 1 TB compressed TIFF, 65,000 files
– Scanned topo maps
– 100% U.S. coverage
– 1:24,000, 1:100,000 and 1:250,000 scale maps
– Maps vary in age
• Urban Area
– 1 foot resolution
– Natural Color
– 133 major U.S. cities
– 30 available 2004
– 2001 or later
– Produced by NIMA for Homeland Security


27

Image Coverage

• 100% U.S., Topo Maps (light green), 2m to 1024m resolution
• 96% 48 Conterminous States (dark green), Ortho Imagery, 1m to 1024m resolution

Urban Area Cities
Seattle, Portland, Stockton, Modesto, Fresno, Sacramento, Chicago, Orlando, Atlanta, Amarillo, Houston, Lubbock, Springfield, Birmingham, Dallas, Albuquerque, Oklahoma City, El Paso, Lincoln, Lexington, Tampa, Washington DC, Mobile, Ft Wayne, Colorado Springs, Baton Rouge, …


28

User Interface Concept

Display Imagery:
• 316 M 200 x 200 pixel images
• 7 level image pyramid
• Resolution 1 meter/pixel to 64 meter/pixel

Navigation Tools:
• 1.5 M place names
• “Click-on” Coverage map
• Longitude and Latitude search
• U.S. Address Search

External Geo-Spatial Links to:
• USGS On-line Stream Gauges
• Home Advisor Demographics
• Home Advisor Real Estate
• Encarta Articles
• Stream flow gauges

Concept: User navigates an ‘almost seamless’ image of earth
• Buttons to pan NW, N, NE, W, E, SW, S, SE
• Click on image to zoom in
• Links to switch between Topo, Imagery, and Relief data
• Links to Print, Download and view meta-data information
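The pyramid figures on this slide (7 levels, 1 to 64 meters per pixel, 200 x 200 pixel tiles) are consistent with the resolution doubling at each level. The sketch below works through that arithmetic. The doubling rule is inferred from the slide's numbers, and the tile-addressing function is a hypothetical scheme on a flat projected grid, not TerraServer's actual tile ID layout:

```python
TILE_PIXELS = 200        # tile edge in pixels, from the slide
BASE_RESOLUTION_M = 1.0  # finest level, meters per pixel
LEVELS = 7               # 1, 2, 4, ..., 64 meters per pixel

def pyramid_levels(base=BASE_RESOLUTION_M, levels=LEVELS):
    """Meters per pixel at each pyramid level, assuming a doubling per level."""
    return [base * 2 ** i for i in range(levels)]

def tile_index(x_m, y_m, resolution_m, tile_pixels=TILE_PIXELS):
    """Column and row of the tile covering ground point (x_m, y_m).

    Hypothetical addressing: a flat grid in meters with its origin at (0, 0).
    """
    span = resolution_m * tile_pixels  # ground distance covered by one tile edge
    return int(x_m // span), int(y_m // span)

if __name__ == "__main__":
    for res in pyramid_levels():
        print(f"{res:5.0f} m/pixel: one tile covers {res * TILE_PIXELS:.0f} m per side")
```

Zooming out one level quarters the number of tiles needed to cover the same ground, which is what keeps every page view down to a handful of small image fetches.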


29

New “Urban Area” Data

“Redundant Bunch 1”

Microsoft Campus at 4 meter resolution

Ball field at .25 meter resolution


30

Software Architecture

[Diagram: three server boxes; labels are listed in the order they were extracted from the slide]

Database Server
• IIS 6.0
• Windows 2003 Server
• .NET Framework 1.1
• SQL Server 2000
• TerraServer Stored Procedures (T-SQL)
• ADO.NET 1.1

Load Programs
• .NET Framework 1.1
• Windows 2003 Server
• WinForm App, C# Classes
• ADO.NET 1.1

Web Server
• IIS 6.0
• .NET Framework 1.1
• ASP.NET 1.1
• TerraServer Web Pages, Services, Classes (C#)
• Windows 2003 Server


31

TerraServer Becomes a Web Service

TerraServer.net -> TerraService.Net

• Web server is for people.
• Web Service is for programs
– The end of screen scraping
– No faking a URL: pass real parameters.
– No parsing the answer: data formatted into your address space.
• Hundreds of users but a specific example:
– US Department of Agriculture Lighthouse app.
– USDA has internal TerraServer
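The contrast between a web server and a web service can be sketched in a few lines. Everything below is hypothetical and runs without a network: the HTML snippet, the `PlaceFacts` type, and the `get_place_facts` signature are stand-ins invented for illustration and do not reproduce the TerraService interface.

```python
import re
from dataclasses import dataclass

# Hypothetical HTML that a "web server for people" might return.
HTML_PAGE = "<html><body><b>Seattle</b> lat: 47.61 lon: -122.33</body></html>"

def scrape_place(html):
    """Screen scraping: recover fields by pattern-matching presentation markup.

    This breaks as soon as the page layout changes.
    """
    match = re.search(r"<b>(.+?)</b> lat: ([-\d.]+) lon: ([-\d.]+)", html)
    if match is None:
        raise ValueError("page layout changed")
    return match.group(1), float(match.group(2)), float(match.group(3))

@dataclass
class PlaceFacts:
    name: str
    lat: float
    lon: float

def get_place_facts(name):
    """Stand-in for a web service stub: typed parameters in, typed object out.

    A real stub would be generated from the service description and make a
    network call; this one looks up a local table so the example is runnable.
    """
    table = {"Seattle": (47.61, -122.33)}
    lat, lon = table[name]
    return PlaceFacts(name, lat, lon)

if __name__ == "__main__":
    print(scrape_place(HTML_PAGE))     # fragile: depends on markup
    print(get_place_facts("Seattle"))  # robust: depends on a declared interface
```

In the second call the caller never sees markup at all; the answer arrives as an object in its own address space, which is the point the slide makes.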


32

Web Service Methods

• Place Search
– GetPlaceFacts
– GetPlaceList
– GetPlaceListInRect
– CountPlacesInRect
• Projection
– ConvertLonLatPtToUtmPt
– ConvertUtmPtToLonLatPt
– ConvertLonLatToNearestPlace
– GetTheme
– GetLatLonMetrics
• Tile
– GetAreaFromPt
– GetAreaFromRect
– GetAreaFromTileId
– GetTileMetaFromLonLatPt
– GetTileMetaFromTileId
– GetTile (Image)
• Landmark
– GetLandmarkTypes
– CountOfLandmarkPointsByRect
– GetLandmarkPointsByRect
– CountOfLandmarkShapesByRect
– GetLandmarkShapesByRect

http://terraservice.net
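The Projection methods convert between longitude/latitude and UTM coordinates. A full conversion is lengthy, but its first step, choosing the UTM zone, is simple enough to sketch. This is the standard 6-degree zone rule and is not taken from the service's own code; it ignores the special zones around Norway and Svalbard, and the function names are illustrative:

```python
def utm_zone(lon_deg):
    """Standard 6-degree UTM zone number (1..60) for a longitude in [-180, 180].

    Ignores the special-case zones around Norway and Svalbard.
    """
    if not -180.0 <= lon_deg <= 180.0:
        raise ValueError("longitude out of range")
    # Longitude 180 would otherwise land in a nonexistent zone 61.
    return min(int((lon_deg + 180.0) // 6) + 1, 60)

def central_meridian(zone):
    """Longitude of the zone's central meridian, in degrees."""
    return zone * 6 - 183

if __name__ == "__main__":
    zone = utm_zone(-122.33)  # approximate longitude of Seattle
    print(zone, central_meridian(zone))
```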


33

TerraServer Web Services

Terra-Tile-Service
• Get image meta-data
• Query TS Gazetteer
• Retrieve TS ImageTiles
• Projection conversions

Landmark-Service
• Geo-coded data of well-known objects (points), e.g. Schools, Golf Courses, Hospitals, etc.
• Polygons of well-known objects (shapes), e.g. Zip Codes, Cities, etc.

Sample Apps
• Web Map Client
– OpenGIS “like”
– Landmarks layered on TerraServer imagery
• Fat Map Client
– Visual Basic / C# Windows Form
– Access Web Services for all data

http://terraservice.net


37

Hardware Evolution

• 1998 – 2000: DEC Alpha 8400, StorageWorks DAS
– 1 x 8 x 440 MHz RISC processor, 2 GB RAM
– 2.5 TB RAID-5, 9 GB SCSI drives, 7 racks
– $2.1m (World’s Largest PC)
– “Single Server Scale Up”
• 2000 – 2003: 4-node Compaq Windows 2000 DataCenter Cluster, StorageWorks SAN
– 4 x 8 x 700 MHz Intel (Xeon) Processor, 4 GB RAM each
– 18 TB RAID-10 (triple mirrored) 73 GB drives, 4 racks
– $1.6m
– “High Availability Large Scale Cluster”
• 2004 - …: “White-box Storage Bricks”
– Low Cost Availability
– 4 copies of the data: RAID1 SATA Mirroring, 2 redundant “Bunches”
– Spare brick to repair failed brick, 2N+1 design
– Web Application “bunch aware”: load balances between redundant databases, fails over to surviving database on failure
– ~$100K capital expense.
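The “bunch aware” behavior described above, load balancing across redundant databases and failing over to the survivor, can be sketched as a small selection policy. This is a minimal illustration under assumed names (`BunchAwareClient`, `mark_down`, `pick`); the talk does not show the web application's actual logic, and round-robin is an assumption about how the load is balanced:

```python
import itertools

class BunchAwareClient:
    """Rotate requests across redundant database bunches, skipping any marked down."""

    def __init__(self, bunches):
        self.bunches = list(bunches)
        self.down = set()
        self._counter = itertools.count()

    def mark_down(self, bunch):
        """Record a failure so later requests avoid this bunch."""
        self.down.add(bunch)

    def mark_up(self, bunch):
        """Return a repaired bunch to the rotation."""
        self.down.discard(bunch)

    def pick(self):
        """Choose the bunch for the next request (round-robin over survivors)."""
        alive = [b for b in self.bunches if b not in self.down]
        if not alive:
            raise RuntimeError("no surviving database bunch")
        return alive[next(self._counter) % len(alive)]

if __name__ == "__main__":
    client = BunchAwareClient(["bunch1", "bunch2"])
    print([client.pick() for _ in range(4)])  # alternates while both are healthy
    client.mark_down("bunch1")
    print([client.pick() for _ in range(2)])  # all traffic goes to the survivor
```

Because each bunch holds a full mirrored copy of the data, any surviving bunch can answer any request, so failover needs no data movement.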

KVM / IP


38

Outline

• Overview of Microsoft Research

• Big-Database Research

• TerraServer: Geospatial app

• SkyServer: data mining app

• Q&A


39

Virtual Observatory

http://www.astro.caltech.edu/nvoconf/
http://www.voforum.org/

• Premise: Most data is (or could be) online
• So, the Internet is the world’s best telescope:
– It has data on every part of the sky
– In every measured spectral band: optical, x-ray, radio, ...
– As deep as the best instruments (2 years ago).
– It is up when you are up. The “seeing” is always great (no working at night, no clouds, no moons, no ...).
– It’s a smart telescope: links objects and data to literature on them.


40

Why Astronomy Data?

• It has no commercial value
– No privacy concerns
– Can freely share results with others
– Great for experimenting with algorithms
• It is real and well documented
– High-dimensional data (with confidence intervals)
– Spatial data
– Temporal data
• Many different instruments from many different places and many different times
• Federation is a goal
• The questions are interesting
– How did the universe form?
• There is a lot of it (petabytes)

[Figure panels: IRAS 100, ROSAT ~keV, DSS Optical, 2MASS 2, IRAS 25, NVSS 20cm, WENSS 92cm, GB 6cm]


41

Time and Spectral Dimensions

The Multiwavelength Crab Nebula

X-ray, optical, infrared, and radio views of the nearby Crab Nebula, which is now in a state of chaotic expansion after a supernova explosion first sighted in 1054 A.D. by Chinese Astronomers.

Slide courtesy of Robert Brunner @ CalTech.

Crab star 1053 AD


42

SkyServer.SDSS.org

• A modern archive
– Raw Pixel data lives in file servers
– Catalog data (derived objects) lives in Database
– Online query to any and all
• Also used for education
– 150 hours of online Astronomy
– Implicitly teaches data analysis
• Interesting things
– Spatial data search
– Client query interface via Java Applet
– Query interface via Emacs
– Popular -- 1% of Terraserver
– Cloned by other surveys (a template design)
– Web services are core of it.


43

Demo of SkyServer

• Shows standard web server

• Pixel/image data

• Point and click

• Explore one object

• Explore sets of objects (data mining)


44

Federation

Data Federations of Web Services

• Massive datasets live near their owners:
– Near the instrument’s software pipeline
– Near the applications
– Near data knowledge and curation
– Super Computer centers become Super Data Centers
• Each Archive publishes a web service
– Schema: documents the data
– Methods on objects (queries)
• Scientists get “personalized” extracts
• Uniform access to multiple Archives
– A common global schema


45

Federation: SkyQuery.Net

• Combine 4 archives initially

• Just added 10 more

• Send query to portal, portal joins data from archives.

• Problem: want to do multi-step data analysis (not just single query).

• Solution: Allow personal databases on portal

• Problem: some queries are monsters

• Solution: “batch schedule” on portal server, which deposits the answer in a personal database.


46

SkyQuery Structure

[Diagram labels: 2MASS, INT, SDSS, FIRST, SkyQuery Portal, Image Cutout]

• Each SkyNode publishes
– Schema Web Service
– Database Web Service
• Portal
– Plans query (2 phase)
– Integrates answers
– Is itself a web service


47

SkyQuery: http://skyquery.net/

• Distributed Query tool using a set of web services
• Four astronomy archives from Pasadena, Chicago, Baltimore, Cambridge (England).
• Feasibility study, built in 6 weeks
– Tanu Malik (JHU CS grad student)
– Tamas Budavari (JHU astro postdoc)
– With help from Szalay, Thakar, Gray
• Implemented in C# and .NET
• Allows queries like:

SELECT o.objId, o.r, o.type, t.objId
FROM SDSS:PhotoPrimary o,
     TWOMASS:PhotoPrimary t
WHERE XMATCH(o,t)<3.5
  AND AREA(181.3,-0.76,6.5)
  AND o.type=3 and (o.I - t.m_j)>2
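The query joins objects from two archives by sky position. The slide does not define XMATCH, so the sketch below substitutes a deliberately simple rule: pair two objects when their angular separation is under a tolerance given in arcseconds. That rule, the brute-force loop, and the sample coordinates are all assumptions for illustration; SkyQuery's real matching may be statistical and is certainly not implemented this way.

```python
import math

def unit_vector(ra_deg, dec_deg):
    """Cartesian unit vector for a sky position given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))

def separation_arcsec(a, b):
    """Angle between two (ra, dec) positions, in arcseconds.

    Uses the chord length between unit vectors, which stays accurate
    for the very small angles typical of cross-matching.
    """
    chord = math.dist(unit_vector(*a), unit_vector(*b))
    return math.degrees(2 * math.asin(min(1.0, chord / 2))) * 3600

def cross_match(left, right, tol_arcsec):
    """Brute-force positional join: every left object against every right object."""
    pairs = []
    for left_id, left_pos in left.items():
        for right_id, right_pos in right.items():
            if separation_arcsec(left_pos, right_pos) < tol_arcsec:
                pairs.append((left_id, right_id))
    return pairs

if __name__ == "__main__":
    # Made-up positions near the query's AREA center, as (ra, dec) in degrees.
    sdss = {"o1": (181.3, -0.76), "o2": (181.5, -0.70)}
    twomass = {"t1": (181.3005, -0.76), "t2": (182.0, -0.50)}
    print(cross_match(sdss, twomass, tol_arcsec=3.5))
```

The double loop costs N^2 comparisons, the poor scaling noted on the Data Analysis slide; a production matcher would first bucket objects by a spatial index so that only nearby candidates are compared.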


48

SkyNode Basic Web Services

• Metadata information about resources
– Waveband
– Sky coverage
– Translation of names to universal dictionary (UCD)
• Simple search patterns on the resources
– Cone Search
– Image mosaic
– Unit conversions
• Simple filtering, counting, histogramming
• On-the-fly recalibrations
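A cone search returns every catalog object within a given angular radius of a sky position. The sketch below shows the idea over an in-memory list, using the haversine formula for great-circle separation. The catalog rows, the field names, and the choice of degrees for the radius are assumptions for the example; a real SkyNode would run this as a database query behind its web service.

```python
import math

def angular_distance_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(h)))

def cone_search(catalog, ra, dec, radius_deg):
    """Return the rows of catalog that lie within radius_deg of (ra, dec)."""
    return [row for row in catalog
            if angular_distance_deg(ra, dec, row["ra"], row["dec"]) <= radius_deg]

if __name__ == "__main__":
    # Made-up catalog rows for illustration.
    catalog = [
        {"id": 1, "ra": 181.30, "dec": -0.76},
        {"id": 2, "ra": 181.35, "dec": -0.76},
        {"id": 3, "ra": 185.00, "dec": 2.00},
    ]
    print([row["id"] for row in cone_search(catalog, 181.3, -0.76, 0.1)])
```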


49

Portals: Higher Level Services

• Built on Atomic Services
• Perform more complex tasks
• Examples
– Automated resource discovery
– Cross-identifications
– Photometric redshifts
– Outlier detections
– Visualization facilities
• Goal:
– Build custom portals in days from existing building blocks (like today in IRAF or IDL)


50

MyDB added to SkyQuery

[Diagram labels: 2MASS, INT, SDSS, FIRST, SkyQuery Portal, Image Cutout, MyDB]

• Let users add personal DB, 1 GB for now.
• Use it as a workbook.
• Online and batch queries.
• Moves analysis to the data
• Users can cooperate (share MyDB)
• Still exploring this


51

The Big Picture

[Diagram labels: Experiments & Instruments, Simulations, Literature, Other Archives (each marked “facts”); questions; answers]

The Big Problems
• Data ingest
• Managing a petabyte
• Common schema
• How to organize it?
• How to reorganize it
• How to coexist with others
• Query and Vis tools
• Support/training
• Performance
– Execute queries in a minute
– Batch query scheduling


52

Outline

• Overview of Microsoft Research

• Big-Database Research

• TerraServer: Geospatial app

• SkyServer: data mining app

• Q&A


53

Grid and Web Services Synergy

• I believe the Grid will be many web services that share data (computrons are free)
• IETF standards provide
– Naming
– Authorization / Security / Privacy
– Distributed Objects: Discovery, Definition, Invocation, Object Model
– Higher level services: workflow, transactions, DB, ...
• Synergy: commercial Internet & Grid tools