Scaleable Computing Jim Gray Researcher US-WAT MSR San Francisco Microsoft Corporation Gray@Microsoft.com

Jan 02, 2016


Valentine Glenn
Transcript
Page 1: Scaleable Computing Jim Gray Researcher US-WAT MSR San Francisco Microsoft Corporation Gray@Microsoft.com ™


Page 2: Outline

Why scaleable servers?
Problems and solutions for scaleable servers
  How Internet Information Server revolutionizes OLTP
  "Wolfpack" Windows NT® clusters for scaleability, availability, manageability
  ActiveX™ object model as structuring principle
  OLE DB (DAO) for data sources
  MTX as a new programming paradigm
  MTX as a server
  Distributed transactions to coordinate components
  "Falcon" queues for asynchronous processing

Page 3: Kinds Of Information Processing

                Point-to-point         Broadcast
Immediate       Conversation, Money    Lecture, Concert    (Network)
Time-shifted    Mail                   Book, Newspaper     (Database)

It's ALL going electronic
Immediate is being stored for analysis (so ALL database)
Analysis and automatic processing are being added

Page 4: Why Put Everything In Cyberspace?

Low rent: min $/byte
Shrinks time: now or later
Shrinks space: here or there
Automate processing: knowbots

[Figure: Network (point-to-point OR broadcast) and Database (immediate OR time-delayed); operations: Locate, Process, Analyze, Summarize]

Page 5: Magnetic Storage Cheaper Than Paper

File cabinet:
  cabinet (four drawer)      250$
  paper (24,000 sheets)      250$
  space (2x3 @ 10$/ft²)      180$
  total                      700$
  3¢/sheet
Disk:
  disk (4 GB)                800$
  ASCII: 2 mil pages         0.04¢/sheet (80x cheaper)
  Image: 200,000 pages       0.4¢/sheet (8x cheaper)

Store everything on disk
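The per-sheet figures on this slide come from dividing each total cost by the number of pages it stores. A minimal Python sketch of that arithmetic, using only the slide's numbers; the function name cents_per_sheet is made up for illustration:

```python
def cents_per_sheet(total_dollars: float, sheets: int) -> float:
    """Cost of storing one sheet, in cents."""
    return total_dollars * 100 / sheets

# Figures as stated on the slide.
paper = cents_per_sheet(700, 24_000)          # file cabinet, quoted as 3 cents/sheet
disk_ascii = cents_per_sheet(800, 2_000_000)  # 4 GB disk holding ASCII pages
disk_image = cents_per_sheet(800, 200_000)    # 4 GB disk holding page images

print(f"paper:        {paper:.2f} cents/sheet")
print(f"disk (ASCII): {disk_ascii:.2f} cents/sheet, {paper / disk_ascii:.0f}x cheaper")
print(f"disk (image): {disk_image:.2f} cents/sheet, {paper / disk_image:.0f}x cheaper")
```

The slide's "80x" and "8x" appear to be rounded from the quoted 3¢ figure (3 / 0.04 = 75 and 3 / 0.4 = 7.5); the unrounded ratios printed here come out slightly lower.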

Page 6: Databases: Information at Your Fingertips™

Information Network™
Knowledge Navigator™

All information will be in an online database (somewhere)
You might record everything you
  Read: 10MB/day, 400 GB/lifetime (eight tapes today)
  Hear: 400MB/day, 16 TB/lifetime (three tapes/year today)
  See: 1MB/s, 40GB/day, 1.6 PB/lifetime (maybe someday)
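The lifetime totals are consistent with roughly 40,000 recording days (400 GB divided by 10 MB/day). That day count is inferred from the slide's numbers, not stated on it, and the sketch below assumes decimal units (1 GB = 10^9 bytes):

```python
MB, GB, TB, PB = 10**6, 10**9, 10**12, 10**15

# Assumption: inferred from 400 GB / (10 MB/day); roughly 110 years.
LIFETIME_DAYS = 40_000

def lifetime_bytes(bytes_per_day: int) -> int:
    """Total bytes recorded over the assumed lifetime."""
    return bytes_per_day * LIFETIME_DAYS

read = lifetime_bytes(10 * MB)   # everything you read
hear = lifetime_bytes(400 * MB)  # everything you hear
see = lifetime_bytes(40 * GB)    # everything you see

# At 1 MB/s, 40 GB/day corresponds to this many seconds of video per day.
seconds_of_video_per_day = (40 * GB) // (1 * MB)

print(read / GB, "GB read")
print(hear / TB, "TB heard")
print(see / PB, "PB seen")
print(seconds_of_video_per_day / 3600, "hours of video per day")
```

Under these assumptions the "See" row implies about 11 hours of video per day rather than a full 24.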

Page 7: Database Store ALL Data Types

The new world:
  Billions of objects
  Big objects (1 MB)
  Objects have behavior (methods)
The old world:
  Millions of objects
  100-byte objects

[Figure: old-world People table with columns Name, Address; new-world People table with columns Name, Address, Papers, Picture, Voice. Names shown: Mike, Won, David. Addresses shown: NY, Berk, Austin]

Paperless office
Library of Congress online
All information online
  Entertainment
  Publishing
  Business
WWW and Internet

Page 8: Billions Of Clients

Every device will be "intelligent"
Doors, rooms, cars…
Computing will be ubiquitous

Page 9: Billions Of Clients Need Millions Of Servers

All clients networked to servers
  May be nomadic or on-demand
Fast clients want faster servers
Servers provide
  Shared Data
  Control
  Coordination
  Communication

[Figure: Clients (mobile clients, fixed clients) connected to Servers (server, super server)]

Page 10: Conclusion

Commodity hardware allows new applications
New applications need huge servers
Ideally, clients and servers are built of the same "stuff"
Servers should be built from
  Commodity software and
  Commodity hardware
Servers should be able to
  Scale up (grow by adding CPUs, disks, networks)
  Scale down (can start small)

Page 11: Scaleable Systems: BOTH SMP And Cluster

Grow up with SMP; 4xP6 is now standard
Grow out with cluster
Cluster has inexpensive parts

[Figure: Personal system, Departmental server, SMP super server, Cluster of PCs]

Page 12: SMPs Have Advantages

Single system image: easier to manage, easier to program threads in shared memory, disk, Net
4x SMP is commodity
Software capable of 16x
Problems:
  >4 not commodity
  Scale-down problem (starter systems expensive)
  There is a BIGGEST one

[Figure: Personal system, Departmental server, SMP super server]

Page 13: The TPC-C Revolution Shows How Far SMPs Have Come

Performance is amazing:
  2,000 users is the min!
  30,000 users on a 4x12 alpha cluster (Oracle)
Prices dropping fast

[Figure: "Vendor's tpmC and $/tpmC" scatter plot. x-axis: Performance tpmC (0 to 35,000); y-axis: Price $/tpmC ($0 to $500); series: DB2, Informix, Microsoft, Oracle, Sybase; annotations: "UNIX Dis-Economy of Scale", "Better", "Informix on NT"]

Page 14: TPC-C Web-Based Benchmarks

SQL Server executes, returns via ODBC
Web server builds HTML page
Sends it to client via HTTP
6750 transactions/minute C on 4xP6
Net: Internet server performance is GREAT!

[Figure: client connected by HTTP to IIS (= Web), which is connected by ODBC to SQL]

Page 15: What Happens To Prices?

No expensive UNIX front end (20$/tpmC)
No expensive TP monitor software (10$/tpmC)
=> 81$/tpmC

[Figure: "TPC Price/tpmC" bar chart. y-axis: 0 to 100; cost components: processor, disk, software, net; systems: Informix on SNI, Oracle on DEC Unix, Oracle on Compaq/NT, Sybase on Compaq/NT, Microsoft on Compaq with Visigenic, Microsoft on HP with Visigenic, Microsoft on Intergraph with IIS, Microsoft on Compaq with IIS]

Page 16: Scaleable Systems: Clusters Scale Beyond Largest SMP

[Figure: Personal system, Departmental server, SMP super server, Cluster of PCs]

Page 17: Clusters Have Advantages

Clients and servers made from the same stuff
Inexpensive: built with commodity components
Fault tolerance: spare modules mask failures
Modular growth: grow by adding small modules
Unlimited growth: no biggest one

Page 18: Parallelism: The OTHER aspect of clusters

Clusters of machines allow two kinds of parallelism
  Many little jobs: online transaction processing (TPC-A, B, C…)
  A few big jobs: data search and analysis (TPC-D, DSS, OLAP)
Both give automatic parallelism

Page 19: Thesis: Many little beat few big

Smoking, hairy golf ball
How to connect the many little parts?
How to program the many little parts?
Fault tolerance?

[Figure: machine classes by price: Mainframe ($1 million), Mini ($100 K), Micro ($10 K), Nano; disk form factors: 14", 9", 5.25", 3.5", 2.5", 1.8"; Pico Processor: 1 M SPECmarks, 1 TFLOP, 10^6 clocks to bulk ram, event-horizon on chip, VM reincarnated, multiprogram cache, on-chip SMP, 1 MM³; storage levels: 10 pico-second ram, 10 nano-second ram, 10 microsecond ram, 10 millisecond disc, 10 second tape archive; capacities shown: 1 MB, 100 MB, 10 GB, 1 TB, 100 TB]

Page 20: Future Super Server: 4T Machine

Array of 1,000 4B machines
  1 bps processors
  1 BB DRAM
  10 BB disks
  1 Bbps comm lines
  1 TB tape robot
A few megabucks
Challenge:
  Manageability
  Programmability
  Security
  Availability
  Scaleability
  Affordability
As easy as a single system

Future servers are CLUSTERS of processors, discs
Distributed database techniques make clusters work

[Figure: Cyber Brick, a 4B machine: CPU, 5 GB RAM, 50 GB Disc]

Page 21: The Hardware Is In Place… And then a miracle occurs

?
SNAP: scaleable network and platforms
Commodity-distributed OS built on:
  Commodity platforms
  Commodity network interconnect
Enables parallel applications

Page 22: Two Scaleability Projects: 1-TB DB and 1 billion TPD

Grow UP and grow OUT

[Figure: Personal system, Departmental server, SMP super server; growing up to a 1 Terabyte DB and out to 1 billion transactions per day]

Page 23: Building The Biggest Node

There is a biggest node (size grows over time)
Today, with Windows NT, it is probably 1TB
We are building it (with help from DEC and SPOT)
  1 TB GeoSpatial SQL Server database
  (1.4 TB of disks = 280 drives)
  30K BTU, 8 KVA, 1.5 metric tons
We plan to put it on the Web as a demonstration application
It will hold satellite images of the entire planet
  One pixel per 10 meters
  Better resolution in U.S. (courtesy of USGS)

Page 24: What's A TeraByte?

1 Terabyte
  1,000,000,000 business letters     150 miles of book shelf
  100,000,000 book pages             15 miles of book shelf
  50,000,000 FAX images              7 miles of book shelf
  10,000,000 TV pictures (mpeg)      10 days of video
  4,000 LandSat images               16 earth images (100m)

Library of Congress (in ASCII) is 25 TB
1980: $200 million of disc, 10,000 discs
      $5 million of tape silo, 10,000 tapes
1996: $200,000 of magnetic disc, 120 discs
      $50,000 nearline tape, 50 tapes

Terror Byte!

Page 25: User Interface

Page 26: What The 1-Billion TPD Project Is Doing

Building a 20-node Windows NT Cluster (with help from Intel)
All commodity parts
Using SQL Server & DTC distributed transactions
Each node has 1/20th of the DB
Each node does 1/20th of the work
15% of the transactions are "distributed"
Uses the "Viper" distributed transaction coordinator
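One way to picture "each node has 1/20th of the DB" is to route every record key to one of 20 nodes and to call a transaction "distributed" when its records land on more than one node. The sketch below is illustrative only: the slide does not say how the data is partitioned, so modulo partitioning and the function names are assumptions.

```python
NODES = 20  # cluster size from the slide

def node_for(key: int) -> int:
    """Map a record key to the node holding it (modulo partitioning, an assumed scheme)."""
    return key % NODES

def nodes_touched(keys: list[int]) -> set[int]:
    """Set of nodes that hold at least one of the given records."""
    return {node_for(k) for k in keys}

def is_distributed(keys: list[int]) -> bool:
    """A transaction is 'distributed' when its records live on more than one node."""
    return len(nodes_touched(keys)) > 1

print(is_distributed([7, 27, 47]))  # all three keys map to node 7
print(is_distributed([7, 8]))       # keys map to nodes 7 and 8
```

Only transactions for which is_distributed is true would need the distributed transaction coordinator; the rest can commit locally on a single node.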

Page 27: How Much Is 1 Billion Transactions Per Day?

1 Btpd = 11,574 tps (transactions per second)
       ~ 700,000 tpm (transactions/minute)
AT&T
  185 million calls (peak day worldwide)
Visa ~20 M tpd
  400 M customers
  250,000 ATMs worldwide
  7 billion transactions / year (card+cheque) in 1994

[Figure: bar chart "Millions of transactions per day" (Mtpd), log scale from 0.1 to 1,000; bars: 1 Btpd, Visa, AT&T, BofA, NYSE]
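The rate conversions on this slide are plain unit arithmetic. A short Python sketch, using only the slide's daily volumes and assuming load is spread evenly over the day (the function names are illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60

def per_second(transactions_per_day: float) -> float:
    """Average transactions per second for a given daily volume."""
    return transactions_per_day / SECONDS_PER_DAY

def per_minute(transactions_per_day: float) -> float:
    """Average transactions per minute for a given daily volume."""
    return transactions_per_day / MINUTES_PER_DAY

for name, tpd in [("1 Btpd", 1_000_000_000), ("AT&T peak day", 185_000_000), ("Visa", 20_000_000)]:
    print(f"{name}: {per_second(tpd):,.0f} tps, {per_minute(tpd):,.0f} tpm")
```

These are averages; real workloads peak well above the daily mean, so the sustained rate a system must handle is higher than the figure computed here.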

Page 28: Outline

Why scaleable servers?
Problems and solutions for scaleable servers
  How Internet Information Server revolutionizes OLTP
  "Wolfpack" Windows NT clusters for scaleability, availability, manageability
  ActiveX object model as structuring principle
  OLE DB (DAO) for data sources
  MTX as a new programming paradigm
  MTX as a server
  Distributed transactions to coordinate components
  "Falcon" queues for asynchronous processing

Page 29: "Wolfpack" Windows NT Clusters: The great hope

Tandem, Teradata, VAX clusters are proprietary
Microsoft & 60 vendors defining Windows NT Clusters
  Code name "Wolfpack"
  Almost all big hardware and software vendors involved
No special hardware needed, but it may help

Page 30: "Wolfpack" clusters

Key goals:
  Easy: to install, manage, program
  Reliable: more reliable than single node
  Scaleable: added parts add throughput
Initial "Wolfpack" is two-node failover
  Each node can be 4x (or more) SMP
  File, print, Internet, mail, DB, other services
  Easy to manage
Next (NT5) "Wolfpack" is modest size cluster
  About 16 nodes (so 64 to 128 CPUs)
  No hard limit, algorithms designed to go further

Page 31: Outline

Why scaleable servers?
Problems and solutions for scaleable servers
  How Internet Information Server revolutionizes OLTP
  "Wolfpack" Windows NT clusters for scaleability, availability, manageability
  ActiveX object model as structuring principle
  OLE DB (DAO) for data sources
  MTX as a new programming paradigm
  MTX as a server
  Distributed transactions to coordinate components
  "Falcon" queues for asynchronous processing

Page 32: The BIG Picture: Components and transactions

Software modules are objects
Object Request Broker (a.k.a., Transaction Processing Monitor) connects objects (clients to servers)
Standard interfaces allow software plug-ins
Transaction ties execution of a "job" into an atomic unit: all-or-nothing, durable, isolated

[Figure: objects connected through the Object Request Broker]

Page 33: Component Object Model

COM is Microsoft model, engine inside OLE
ALL Microsoft software is based on COM (ActiveX)
CORBA + OpenDoc is equivalent
Heated debate over which is best
Both share same key goals:
  Encapsulation: hide implementation
  Polymorphism: generic operations, key to GUI and reuse
  Versioning: allow upgrades
  Transparency: local/remote
  Security: invocation can be remote
  Shrink-wrap: minimal inheritance
  Automation: easy
COM now managed by the Open Group

Page 34: OLE DB: Objects Meet Databases

The basis for universal data servers, access, & integration

OLE DB: object-oriented (COM oriented) programming interface to data
Breaks DBMS into components
Anything can be a data source
Optimization/navigation "on top of" other data sources
A way to componentize a DBMS
Makes an RDBMS an O-R DBMS (assumes optimizer understands objects)

[Figure: DBMS engine on top of data sources: Database, Spreadsheet, Photos, Mail, Map, Document]

Page 35: Transactions Coordinate Components (ACID)

Programmer's view: bracket a collection of actions
A simple failure model
Only two outcomes:

  Success!   Begin()  action  action  action  action  Commit()
  Failure!   Begin()  action  action  action  Rollback()
  Failure!   Begin()  action  action  action  Fail!  (Rollback())
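The Begin / Commit / Rollback bracket can be tried out with any transactional store. The sketch below uses Python's built-in sqlite3 module purely as a stand-in (the deck itself is about SQL Server and MTX); the accounts table and the transfer function are hypothetical examples, not anything from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])

def transfer(src: str, dst: str, amount: int) -> bool:
    """Move money between accounts; True means committed, False means rolled back."""
    try:
        with conn:  # commits if the block finishes, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # the "Fail!" case
        return True
    except ValueError:
        return False

def balances() -> dict:
    """Current balance of every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer("alice", "bob", 30) commits both updates together, while transfer("alice", "bob", 500) fails partway through and leaves both balances exactly as they were: the two outcomes the slide describes.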

Page 36: Distributed Transactions Enable Huge Throughput

Each node capable of 7 KtpmC (7,000 active users!)
Can add nodes to cluster (to support 100,000 users)
Transactions coordinate nodes
ORB / TP monitor spreads work among nodes
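A back-of-the-envelope sizing for the numbers above, assuming throughput scales linearly with node count (an idealization; the slide gives no figure for the overhead of coordinating distributed transactions). The function name is illustrative.

```python
import math

def nodes_needed(total_users: int, users_per_node: int = 7_000) -> int:
    """Smallest number of nodes that covers total_users, assuming linear scale-out."""
    return math.ceil(total_users / users_per_node)

print(nodes_needed(100_000))
```

With 7,000 active users per node, 100,000 users need 15 nodes under this assumption, since 14 nodes would cover only 98,000.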

Page 37: Distributed Transactions Enable Huge DBs

Distributed database technology spreads data among nodes
Transaction processing technology manages nodes

Page 38: Microsoft Transaction Service: A new programming paradigm

Develop your ActiveX object on the desktop
Better yet: download them from the Net
Script your work flows as invocations of ActiveX objects
All on desktop
Then, move work flows and objects to server(s)
Gives desktop development, three-tier deployment

[Figure: the same four layers (Presentation layer, Workflow layer, Application objects, Database layer) shown in two phases. Design and development phase: all layers on the Client. Deployment phase: layers split between Client and Server(s), with the server side running in the MTX execution environment]

Page 39: MTX Provides Server-Side Execution Environment

Accepts ActiveX objects
Manages bindings (it's an ORB)
Efficient (pre-bound servers)
Manages thread pools
Manages security
Includes transaction services
Provides operator interface
GUI administrative interface

[Figure: "Structure of a scaleable server". Components: Clients, Network, Receiver, Queue, Connections, Context, Security, Thread Pool, Service logic, Synchronization, Shared Data, Configuration, Management. Concerns: deadlocks and starvation; scheduling and load balancing; object handles; national language; authentication; directory registration, congestion and flow control]
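A few of the boxes in the server-structure figure (thread pool, service logic, shared data, synchronization) can be sketched with Python's standard library. This is a generic illustration of the pattern, not MTX's implementation; the request format and the names service_logic and serve are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Shared data, guarded by a lock (the "Synchronization" box in the figure).
shared_counts: dict[str, int] = {}
lock = threading.Lock()

def service_logic(request: str) -> str:
    """Handle one request: count it in the shared table and return a reply."""
    with lock:
        shared_counts[request] = shared_counts.get(request, 0) + 1
    return f"done:{request}"

def serve(requests: list[str], workers: int = 4) -> list[str]:
    """Run requests on a fixed pool of threads; replies come back in request order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(service_logic, requests))
```

The pool bounds the number of threads regardless of how many clients connect, which is the point of letting the execution environment, rather than each application object, manage threads.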

Page 40: MTX Also Coordinates And Interoperates

Coordinates distributed transactions

  Begin dist tran:
    Update sales
    Update inventory
    Update warranty
  Commit

[Figure: Client application calling three Windows NT Server nodes, each running DTC and a Server: Sales (SQL Server), Inventory (Other DBMS), Warranty (SQL Server)]
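The sales / inventory / warranty update can be sketched as a toy two-phase commit: every participant first votes on whether it can apply its update, and the updates become visible only if all vote yes. This is a simplified in-memory illustration of the general technique; the Participant class and run_distributed_transaction are hypothetical and are not the DTC API.

```python
class Participant:
    """One resource manager (for example the sales, inventory, or warranty database)."""

    def __init__(self, name: str, can_prepare: bool = True):
        self.name = name
        self.can_prepare = can_prepare  # set False to simulate a failing node
        self.data: dict[str, int] = {}
        self.pending = None

    def prepare(self, update: dict[str, int]) -> bool:
        """Phase 1: stage the update and vote yes or no."""
        if not self.can_prepare:
            return False
        self.pending = dict(update)
        return True

    def commit(self) -> None:
        """Phase 2: make the staged update visible."""
        self.data.update(self.pending or {})
        self.pending = None

    def rollback(self) -> None:
        """Phase 2 on failure: discard the staged update."""
        self.pending = None


def run_distributed_transaction(work: list[tuple[Participant, dict[str, int]]]) -> bool:
    """Commit every update or none of them; returns True on commit."""
    prepared = []
    for participant, update in work:
        if participant.prepare(update):
            prepared.append(participant)
        else:
            for p in prepared:
                p.rollback()
            return False
    for p in prepared:
        p.commit()
    return True
```

A real coordinator also logs its decision durably so that it can finish the protocol after a crash; that part is omitted here.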

Page 41: MTX Also Coordinates And Interoperates

Interoperates with Internet and with legacy systems

[Figure: Browser/client connects over HTTP to Windows NT Server 4.0 running Internet Information Server, ActiveX Components, and MTx; DCOM links; SNA Server; protocols OLETX, XA, LU6.2; back ends: SQL Server, Other DBMS, CICS/MVS]

Page 42: "Falcon" Queue Management: Asynchronous transaction processing

Many tasks are time-shifted
"Falcon" gives a QUEUE mechanism
Message-oriented middleware
Decouples client from server
Server works on priority queues

                Point-to-point         Broadcast
Immediate       conversation, money    lecture, concert    (Network)
Time-shifted    mail                   book, newspaper     (Database)

[Figure: Client sends requests through a queue to the Server]
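The idea of decoupling client from server through a priority queue can be sketched with Python's standard queue module. This is a single-process illustration of the pattern, not the "Falcon" API; client_send, server_drain, and the convention that a lower number means higher priority are all assumptions made for the example.

```python
import itertools
import queue

requests: queue.PriorityQueue = queue.PriorityQueue()
_sequence = itertools.count()  # tie-breaker so equal priorities stay first-in, first-out

def client_send(priority: int, message: str) -> None:
    """Enqueue a request and return at once; the client does not wait for the server."""
    requests.put((priority, next(_sequence), message))

def server_drain() -> list[str]:
    """Process everything currently queued, most urgent (lowest number) first."""
    handled = []
    while not requests.empty():
        _, _, message = requests.get()
        handled.append(message)
    return handled
```

Because the client only enqueues, it can keep working while the server is slow or offline, and the server decides the order in which to work through the backlog.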

Page 43: Outline

Why scaleable servers?
Problems and solutions for scaleable servers
  How Internet Information Server revolutionizes OLTP
  "Wolfpack" Windows NT clusters for scaleability, availability, manageability
  ActiveX object model as structuring principle
  OLE DB (DAO) for data sources
  MTX as a new programming paradigm
  MTX as a server
  Distributed transactions to coordinate components
  "Falcon" queues for asynchronous processing
