Page 1: Programming Distributed Systems with High Level Abstractions

Programming Distributed Systems
with High Level Abstractions

Douglas Thain
University of Notre Dame
23 October 2008

Page 2: Programming Distributed Systems with High Level Abstractions

Distributed Systems

Scale: 2 – 100s – 1000s – millions
Domains: Single or Multi
Users: 1 – 10 – 100 – 1000 – 10000
Naming: Direct, Virtual
Scheduling: Timesharing / Space Sharing
Interface: Allocate CPU / Execute Job
Security: None / IP / PKI / KRB …
Storage: Embedded / External

Page 3: Programming Distributed Systems with High Level Abstractions

Cloud Computing?

Scale: 2 – 100s – 1000s – 10000s
Domains: Single or Multi
Users: 1 – 10 – 100 – 1000 – 10000
Naming: Direct, Virtual
Scheduling: Timesharing / Space Sharing
Interface: Allocate CPU / Execute Job
Security: None / IP / PKI / KRB …
Storage: Embedded / External

Page 4: Programming Distributed Systems with High Level Abstractions

Grid Computing?

Scale: 2 – 100s – 1000s – 10000s
Domains: Single or Multi
Users: 1 – 10 – 100 – 1000 – 10000
Naming: Direct, Virtual
Scheduling: Timesharing / Space Sharing
Interface: Allocate CPU / Execute Job
Security: None / IP / PKI / KRB …
Storage: Embedded / External

Page 5: Programming Distributed Systems with High Level Abstractions

An Assembly Language of Distributed Computing

Fundamental Operations:
– TransferFile( source, destination )
– ExecuteJob( host, exe, input, output )
– AllocateVM( cpu, mem, disk, opsys )

Semantics of Assembly are Subtle:
– When do instructions commit?
– Delay slots before control transfers?
– What exceptions are valid for each opcode?
– Precise or imprecise exceptions?
– What is the cost of each instruction?

Page 6: Programming Distributed Systems with High Level Abstractions

Programming in Assembly Stinks

You know the problems:
– Stack management.
– Garbage collection.
– Type checking.
– Co-location of data and computation.
– Query optimizations.
– Function shipping or data shipping?
– How many nodes should I harness?

Page 7: Programming Distributed Systems with High Level Abstractions

Abstractions for Distributed Computing

Abstraction: a declarative specification of the computation and data of a workload.

A restricted pattern, not meant to be a general purpose programming language.

Avoid the really terrible cases.

Provide users with a bright path.

Data structures instead of file systems.

Page 8: Programming Distributed Systems with High Level Abstractions

All-Pairs Abstraction

AllPairs( set A, set B, function F )
returns matrix M where
M[i][j] = F( A[i], B[j] ) for all i,j
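A sequential Python sketch of this semantics; the point of the abstraction is that a real implementation is free to schedule the F calls across many CPUs while preserving this meaning.

```python
# Reference semantics of AllPairs: apply F to every pair (A[i], B[j]).
# Sequential sketch only; real implementations distribute the F calls.

def all_pairs(A, B, F):
    return [[F(a, b) for b in B] for a in A]

# Toy usage with a stand-in comparison function:
A = ["a1", "a2", "a3"]
B = ["b1", "b2", "b3"]
M = all_pairs(A, B, lambda a, b: float(a[-1] == b[-1]))
for row in M:
    print(row)
```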

[Figure: F is applied to every pair drawn from A1…An and B1…Bn, filling the result matrix of AllPairs(A,B,F).]

Moretti, Bulosan, Flynn, Thain, AllPairs: An Abstraction… IPDPS 2008.

Page 9: Programming Distributed Systems with High Level Abstractions

Example Application

Goal: Design a robust face comparison function.

[Figure: F compares two face images and returns a similarity score, e.g. 0.05 for one pair and 0.97 for another.]

Page 10: Programming Distributed Systems with High Level Abstractions

Similarity Matrix Construction

Example similarity matrix (upper triangle shown):

1   .8  .1  0   0   .1
    1   0   .1  .1  0
        1   0   .1  .3
            1   0   0
                1   .1
                    1

Current Workload: 4000 images, 256 KB each, 10s per F (five days)

Future Workload: 60000 images, 1 MB each, 1s per F (three months)

Page 11: Programming Distributed Systems with High Level Abstractions

http://www.cse.nd.edu/~ccl/viz

Page 12: Programming Distributed Systems with High Level Abstractions

Non-Expert User Using 500 CPUs

Try 1: Each F is a batch job.
Failure: Dispatch latency >> F runtime.

Try 2: Each row is a batch job.
Failure: Too many small ops on FS.

Try 3: Bundle all files into one package.
Failure: Everyone loads 1GB at once.

Try 4: User gives up and attempts to solve an easier or smaller problem.

[Figure: each attempt shows a head node (HN) dispatching F tasks to a pool of CPUs.]

Page 13: Programming Distributed Systems with High Level Abstractions

All-Pairs Abstraction

AllPairs( set A, set B, function F )
returns matrix M where
M[i][j] = F( A[i], B[j] ) for all i,j

[Figure: as before, F applied to every pair drawn from A1…An and B1…Bn fills the result matrix of AllPairs(A,B,F).]

Page 14: Programming Distributed Systems with High Level Abstractions
Page 15: Programming Distributed Systems with High Level Abstractions
Page 16: Programming Distributed Systems with High Level Abstractions
Page 17: Programming Distributed Systems with High Level Abstractions

What is the right metric?

Speedup?
– Seq Runtime / Parallel Runtime

Parallel Efficiency?
– Speedup / N CPUs?

Neither works, because the number of CPUs varies over time and between runs.

Cost Efficiency:
– Work Completed / Resources Consumed
– Person-Miles / Gallon
– Results / CPU-hours
– Results / $$$
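Because the pool grows and shrinks during a run, the denominator has to be accumulated per worker rather than computed as N CPUs times wall time. A small sketch of that bookkeeping; the (start, end) lease intervals are a hypothetical format, not any real tool's log.

```python
# Cost efficiency = work completed / resources consumed.
# CPU-hours are summed per worker, because the pool size changes during a run.
# The (start_hour, end_hour) lease intervals below are hypothetical.

def cost_efficiency(results_completed, worker_leases):
    """worker_leases: (start_hour, end_hour) for each CPU's time in the pool."""
    cpu_hours = sum(end - start for start, end in worker_leases)
    return results_completed / cpu_hours

leases = [(0.0, 5.0), (1.0, 5.5), (2.0, 3.0)]   # three CPUs with different lifetimes
print(cost_efficiency(results_completed=9000, worker_leases=leases))   # results per CPU-hour
```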

Page 18: Programming Distributed Systems with High Level Abstractions

All-Pairs Abstraction

Page 19: Programming Distributed Systems with High Level Abstractions

Classify Abstraction

Classify( T, R, N, P, F )

T = testing set
R = training set
N = # of partitions
F = classifier
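A minimal sequential sketch of the pattern as drawn on the slide: P splits the work into N partitions, F runs once per partition, and the partial results V1…Vn are combined. The slide does not define P or the combiner, so their roles here, and all helper names, are assumptions for illustration only.

```python
# Sketch of the Classify pattern: partition, run the classifier F per partition,
# then combine the partial results. Which set is partitioned and how results are
# merged are assumptions; the slide defines only T, R, N, and F.

def classify(T, R, N, P, F, combine):
    parts = P(T, N)                          # P produces N partitions of the work
    votes = [F(part, R) for part in parts]   # F runs once per partition -> V1..Vn
    return combine(votes)                    # merge the partial results into V

# Toy usage with hypothetical helpers:
split_evenly = lambda xs, n: [xs[i::n] for i in range(n)]
label_all    = lambda part, R: [(x, "label") for x in part]   # stand-in classifier
merge        = lambda votes: [v for vs in votes for v in vs]

result = classify(T=list(range(10)), R=["training data"], N=3,
                  P=split_evenly, F=label_all, combine=merge)
print(result)
```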

[Figure: P produces partitions T1, T2, T3; F runs once per partition, producing results V1, V2, V3, which C combines into a final result V.]

Moretti, Steinhauser, Thain, Chawla, Scaling up Classifiers to Cloud Computers, ICDM 2008.

Page 20: Programming Distributed Systems with High Level Abstractions
Page 21: Programming Distributed Systems with High Level Abstractions

BXGrid Abstractions

[Figure: repository records S1, S2, S3 tagged with attributes such as eye color (L/R, brown/blue); Select pulls the matching records, Transform applies F to each, AllPairs compares every pair, and the resulting matrix feeds an ROC curve.]

S = Select( color=“brown” )

B = Transform( S, F )

M = AllPairs( A, B, F )
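A sketch of how the three operations chain together in code; the repository format and the select, transform, and all_pairs helpers are hypothetical stand-ins, not the actual BXGrid API.

```python
# Hypothetical sketch of chaining the BXGrid operations shown above.
# select / transform / all_pairs are illustrative stand-ins, not the real API.

def select(repository, **attrs):
    """Return the records whose attributes match, e.g. color='brown'."""
    return [r for r in repository if all(r.get(k) == v for k, v in attrs.items())]

def transform(S, F):
    """Apply F to every selected record (e.g. extract a comparable template)."""
    return [F(r) for r in S]

def all_pairs(A, B, F):
    """Compare every element of A against every element of B."""
    return [[F(a, b) for b in B] for a in A]

repository = [{"id": 1, "color": "brown", "data": "img1"},
              {"id": 2, "color": "blue",  "data": "img2"},
              {"id": 3, "color": "brown", "data": "img3"}]

S = select(repository, color="brown")
B = transform(S, lambda r: r["data"].upper())      # stand-in feature extraction
M = all_pairs(B, B, lambda a, b: float(a == b))    # compare the set against itself
print(M)
```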

Bui, Thomas, Kelly, Lyon, Flynn, Thain, BXGrid: A Repository and Experimental Abstraction… in review 2008.

Page 22: Programming Distributed Systems with High Level Abstractions

Implementing Abstractions

S = Select( color=“brown” )

B = Transform( S, F )

M = AllPairs( A, B, F )

[Figure: the three operations map onto the hardware: Select is served by a relational database (2x), Transform by an active storage cluster (16x), and AllPairs by a Condor pool (500x).]

Page 23: Programming Distributed Systems with High Level Abstractions
Page 24: Programming Distributed Systems with High Level Abstractions

Compatibility of Abstractions?

[Figure: Map-Reduce, All-Pairs, and Classify each sit directly on top of the assembly language.]

Page 25: Programming Distributed Systems with High Level Abstractions

Compatibility of Abstractions?

[Figure: could All-Pairs and Classify be layered on top of Map-Reduce instead? Marked "???".]

Mismatch: MR relies on data partition; AP relies on data re-use.

Mismatch: Classify partitions logically; MR partitions physically.

Page 26: Programming Distributed Systems with High Level Abstractions

Compatibility of Abstractions?

[Figure: Swift and Dryad join Map-Reduce, All-Pairs, and Classify above the assembly language: more general, less optimized?]

Page 27: Programming Distributed Systems with High Level Abstractions

From Clouds to Multicore

Next Step: An All-Pairs implementation that runs well on a single CPU, multicore, cloud, or cloud of multicores.
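As a small illustration of the single-machine end of that spectrum, the same AllPairs semantics can be spread across local cores with a process pool; this is only a sketch of the idea, not the lab's actual implementation.

```python
# Multicore sketch of AllPairs using a local process pool.
# Illustrates the idea only; not the actual All-Pairs implementation.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def F(pair):
    a, b = pair
    return float(a == b)          # stand-in comparison function

def all_pairs_multicore(A, B, workers=4):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        flat = list(pool.map(F, product(A, B), chunksize=64))
    n = len(B)
    return [flat[i * n:(i + 1) * n] for i in range(len(A))]

if __name__ == "__main__":
    A = [f"a{i}" for i in range(100)]
    B = [f"b{j}" for j in range(100)]
    M = all_pairs_multicore(A, B)
    print(len(M), len(M[0]))
```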

[Figure: the same abstraction stack (Map-Reduce, All-Pairs, Classify, Swift, Dryad over the assembly language) targets both a multicore machine (CPUs sharing RAM) and a cloud of machines, each CPU with a cost ($$$).]

Page 28: Programming Distributed Systems with High Level Abstractions

Acknowledgments

Cooperative Computing Lab
– http://www.cse.nd.edu/~ccl

Grad Students:
– Chris Moretti
– Hoang Bui
– Michael Albrecht
– Li Yu

Undergraduate Students:
– Mike Kelly
– Rory Carmichael
– Mark Pasquier
– Christopher Lyon
– Jared Bulosan

NSF Grants CCF-0621434, CNS-0643229