Science in the Clouds: History, Challenges, and Opportunities
Douglas Thain, University of Notre Dame
GeoClouds Workshop, 17 September 2009


Feb 23, 2016

Page 1: Science in the Clouds: History, Challenges, and Opportunities

Science in the Clouds: History, Challenges, and Opportunities

Douglas Thain
University of Notre Dame

GeoClouds Workshop
17 September 2009

Page 2: Science in the Clouds: History, Challenges, and Opportunities

http://www.cse.nd.edu/~ccl

Page 3: Science in the Clouds: History, Challenges, and Opportunities

The Cooperative Computing Lab

We collaborate with people who have large-scale computing problems.
We build new software and systems to help them achieve meaningful goals.
We run a production computing system used by people at ND and elsewhere.
We conduct computer science research, informed by real-world experience, with an impact upon problems that matter.

Page 4: Science in the Clouds: History, Challenges, and Opportunities

Clouds in the Hype Cycle

Gartner Hype Cycle Report, 2009

Page 5: Science in the Clouds: History, Challenges, and Opportunities

What is cloud computing?

A cloud provides rapid, metered access to a virtually unlimited set of resources.

Two significant impacts on users:
– End users must have an economic model for the work that they want to accomplish.
– Apps must be flexible enough to work with an arbitrary number and kind of resources.

Page 6: Science in the Clouds: History, Challenges, and Opportunities

Example: Amazon EC2, Sep 2009
(simplified slightly for discussion)

Small: 1 core, 1.7GB RAM, 160GB disk
– 10 cents/hour
Large: 2 cores, 7.5GB RAM, 850GB disk
– 40 cents/hour
Extra Large: 4 cores, 15GB RAM, 1690GB disk
– 80 cents/hour

And the Simple Storage Service:
– 15 cents per GB-month stored
– 17 cents per GB transferred (outside of EC2)
– 1 cent per 1000 write operations
– 1 cent per 10000 read operations
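
As a rough illustration of the economic model a user needs, the simplified September 2009 prices above can be turned into a small calculator. This is only a sketch: the function names are hypothetical, prices are held in integer cents to avoid rounding, and it assumes flat rates with whole hours billed.

```python
# Simplified Sep 2009 prices from the slide, in cents (assumed flat rates).
EC2_CENTS_PER_HOUR = {"small": 10, "large": 40, "xlarge": 80}
S3_CENTS_PER_GB_MONTH = 15
S3_CENTS_PER_GB_TRANSFERRED = 17


def compute_cost_cents(instance, instances, hours):
    """Cost of running `instances` machines for `hours` whole hours each."""
    return EC2_CENTS_PER_HOUR[instance] * instances * hours


def storage_cost_cents(gb, months, gb_transferred_out=0):
    """Cost of storing `gb` for `months`, plus transfer outside EC2."""
    return (S3_CENTS_PER_GB_MONTH * gb * months
            + S3_CENTS_PER_GB_TRANSFERRED * gb_transferred_out)


# Example: 100 small instances for one day, plus 60 GB stored for a month.
compute = compute_cost_cents("small", 100, 24)
storage = storage_cost_cents(60, 1)
print(f"compute ${compute / 100:.2f}, storage ${storage / 100:.2f}")
```

Request-count charges (the per-1000 write and per-10000 read fees) are left out to keep the sketch short; they would be added the same way.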

Page 7: Science in the Clouds: History, Challenges, and Opportunities

Is Cloud Computing New?

Not entirely: it is a combination of the old ideas of utility computing and distributed computing.
– 1960 – MULTICS
– 1980 – The Cambridge Ring
– 1987 – Condor Distributed Batch System
– 1990s – Clusters, Beowulf, MPI, NOW
– 1995 – Globus, Grid Computing
– 1999 – SETI@home
– 2001 – TeraGrid
– 2004 – Sun Rents CPUs at $1/hour
– 2006 – Amazon EC2 and S3

Page 8: Science in the Clouds: History, Challenges, and Opportunities

Clouds Trade CapEx for OpEx

[Figure: cost versus time. Labels: Capital Expense of Ownership; OpEx of Ownership; OpEx of Cloud Computing; 2X.]

Page 9: Science in the Clouds: History, Challenges, and Opportunities

What about grid computing?

A vision much like clouds:
– A worldwide framework that would make massive-scale computing as easy to use as an electrical socket.
The more modest realization:
– A means for accessing remote computing facilities in their native form, usually for CPU-intensive tasks.
The social context:
– Large collaborative efforts between computer scientists and computer-savvy fields, particularly physics and astronomy.

Page 10: Science in the Clouds: History, Challenges, and Opportunities

Clouds vs Grids

Grids provide a job execution interface:
– Run program P on input A, return the output.
– Allows the system to maximize utilization and hide failures, but provides few performance guarantees and inaccurate metering.
Clouds provide resource allocation:
– Create a VM with 2GB of RAM for 7 days.
– Gives predictable performance and accurate metering, but exposes problems to the user.
– Can be used to build interactive services.
– How do I run 1M jobs on 100 servers?

Page 11: Science in the Clouds: History, Challenges, and Opportunities

[Figure: a grid computing layer that provides job execution, on top of a cloud computing layer that provides resource allocation. Labels: Submit 1M Jobs; Allocate 100 CPUs; Dispatch Jobs; Manage Load.]

Page 12: Science in the Clouds: History, Challenges, and Opportunities

[Figure: Create a Condor Pool with 100 Nodes. Labels: Allocate 100 Cores; Run 1M Jobs.]
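
One way to picture the layering: resource allocation hands back a fixed set of workers, and a job-execution layer on top feeds a long list of jobs through them. The sketch below simulates this on one machine with threads from Python's standard library. It is not Condor; `run_jobs`, `job`, and the small sizes are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def job(x):
    """Stand-in for 'run program P on input x'."""
    return x * x


def run_jobs(inputs, workers):
    """Allocate a fixed pool of workers, then dispatch every job to it."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() queues all jobs; idle workers pull the next one.
        return list(pool.map(job, inputs))


results = run_jobs(range(1000), workers=10)
print(len(results), results[:5])
```

The caller only says how many workers to hold and which jobs to run; the pool decides which worker runs which job, which is the division of labor the two layers suggest.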

Page 13: Science in the Clouds: History, Challenges, and Opportunities

Clouds Solve Some Grid Problems

Application compatibility is simplified.
– You provide a VM for Linux 2.3.4.1.2.
Performance is reasonably predictable.
– 10% variations rather than orders of magnitude.
Fewer administrative headaches for the lone user.
– A credit card swipe instead of a certificate.

Page 14: Science in the Clouds: History, Challenges, and Opportunities

But, Problems New and Old:

How do I reliably execute 1M jobs?
Can I share resources and data with others in the cloud?
How do I authenticate others in the cloud?
Unfortunately, location still matters.
Can we make applications efficiently span multiple cloud providers?
Can we join existing centers with clouds?
(These are all problems contemplated by grid.)

Page 15: Science in the Clouds: History, Challenges, and Opportunities

More Open Questions

Can I afford to move my data into the cloud?
Can I afford to get it out?
Do I trust the cloud to secure my data?
How do I go about constructing an economic model for my research?
Are there social/technical dangers in putting too many eggs in one basket?
Is pay-as-you-go the proper model for research?
Should universities get out of the data center business?

Page 16: Science in the Clouds: History, Challenges, and Opportunities

Clusters, clouds, and grids give us access to unlimited CPUs.

How do we write programs that can run effectively in large systems?

Page 17: Science in the Clouds: History, Challenges, and Opportunities

MapReduce( S, M, R )

[Figure: each element of Set S passes through M, which emits (K,V) pairs; the values V are grouped under Key0, Key1, ... KeyN; R is applied to each group to produce outputs O0, O1, O2.]
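
The data flow in the MapReduce figure can be sketched in a few lines. This is a sequential, single-process illustration only, not a distributed implementation; the `(key, values)` signature for R and the word-count example are assumptions chosen for brevity.

```python
from collections import defaultdict


def map_reduce(S, M, R):
    """Apply M to each item of S, group values by key, reduce each group."""
    groups = defaultdict(list)
    for item in S:
        for key, value in M(item):      # M emits (K, V) pairs
            groups[key].append(value)
    return {key: R(key, values) for key, values in groups.items()}


def count_words(line):
    """Example M: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.split()]


def total(key, values):
    """Example R: sum the values collected for one key."""
    return sum(values)


counts = map_reduce(["a b a", "b c"], count_words, total)
print(counts)
```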

Page 18: Science in the Clouds: History, Challenges, and Opportunities

Of course, not all science fits into the Map-Reduce model!

Page 20: Science in the Clouds: History, Challenges, and Opportunities

Similarity Matrix Construction

1.0 0.8 0.1 0.0 0.0 0.1
    1.0 0.0 0.1 0.1 0.0
        1.0 0.0 0.1 0.3
            1.0 0.0 0.0
                1.0 0.1
                    1.0

Challenge Workload:
60,000 iris images
1MB each
.02s per F
833 CPU-days
600 TB of I/O
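
The 833 CPU-days figure follows from the other numbers if F is evaluated for every cell of the full 60,000 by 60,000 matrix. That is an assumption made here for the arithmetic; the slide does not say whether symmetry is exploited.

```python
IMAGES = 60_000
MS_PER_F = 20                          # .02 s per comparison, in milliseconds
MS_PER_DAY = 24 * 60 * 60 * 1000

comparisons = IMAGES * IMAGES          # full 60,000 x 60,000 matrix
cpu_days = comparisons * MS_PER_F / MS_PER_DAY
print(f"{comparisons:,} comparisons, about {cpu_days:.0f} CPU-days")
```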

Page 21: Science in the Clouds: History, Challenges, and Opportunities

I have 60,000 iris images acquired in my research lab. I want to reduce each one to a feature space, and then compare all of them to each other. I want to spend my time doing science, not struggling with computers.

I have a laptop.

I own a few machines.

I can buy time from Amazon or TeraGrid.

Now What?

Page 22: Science in the Clouds: History, Challenges, and Opportunities


Page 23: Science in the Clouds: History, Challenges, and Opportunities


Page 24: Science in the Clouds: History, Challenges, and Opportunities


Page 25: Science in the Clouds: History, Challenges, and Opportunities

Non-Expert User Using 500 CPUs

Try 1: Each F is a batch job.
Failure: Dispatch latency >> F runtime.

Try 2: Each row is a batch job.
Failure: Too many small ops on FS.

Try 3: Bundle all files into one package.
Failure: Everyone loads 1GB at once.

Try 4: User gives up and attempts to solve an easier or smaller problem.

[Figure, repeated for Tries 1 to 3: HN connected to a row of CPUs, each running one or more F.]

Page 26: Science in the Clouds: History, Challenges, and Opportunities

Observation

In a given field of study, many people repeat the same pattern of work many times, making slight changes to the data and algorithms.
If the system knows the overall pattern in advance, then it can do a better job of executing it reliably and efficiently.
If the user knows in advance what patterns are allowed, then they have a better idea of how to construct their workloads.

Page 27: Science in the Clouds: History, Challenges, and Opportunities

Abstractions for Distributed Computing

Abstraction: a declarative specification of the computation and data of a workload.
A restricted pattern, not meant to be a general purpose programming language.
Uses data structures instead of files.
Provides users with a bright path.
Regular structure makes it tractable to model and predict performance.

Page 28: Science in the Clouds: History, Challenges, and Opportunities

Working with Abstractions

[Figure: a compact data structure (sets A and B and function F) is passed to AllPairs( A, B, F ), which a custom workflow engine runs on a cloud or grid.]

Page 29: Science in the Clouds: History, Challenges, and Opportunities

All-Pairs Abstraction

AllPairs( set A, set B, function F )
returns matrix M where
M[i][j] = F( A[i], B[j] ) for all i,j

[Figure: F applied to every pairing of A1, A2, A3 with B1, B2, B3. AllPairs(A,B,F) takes A1 ... An, B1 ... Bn, and F.]

allpairs A B F.exe
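
The definition of AllPairs translates directly into code. The version below is a sequential sketch for reading the definition, not the distributed implementation discussed in the talk; `shared_chars` is a toy stand-in for a real comparison function.

```python
def all_pairs(A, B, F):
    """Return M with M[i][j] = F(A[i], B[j]) for all i, j."""
    return [[F(a, b) for b in B] for a in A]


def shared_chars(a, b):
    """Toy comparison: how many distinct characters two strings share."""
    return len(set(a) & set(b))


M = all_pairs(["abc", "bcd"], ["abc", "cde", "xyz"], shared_chars)
print(M)
```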

Page 30: Science in the Clouds: History, Challenges, and Opportunities

How Does the Abstraction Help?

The custom workflow engine:
– Chooses the right data transfer strategy.
– Chooses the right number of resources.
– Chooses blocking of functions into jobs.
– Recovers from a larger number of failures.
– Predicts overall runtime accurately.

All of these tasks are nearly impossible for arbitrary workloads, but are tractable (not trivial) to solve for a specific abstraction.
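
To make "blocking of functions into jobs" concrete, here is one hypothetical rule of thumb: put enough calls to F in each job that compute time dominates dispatch latency. The slides do not describe the engine's actual model, so the rule and the ratio of 10 are assumptions. The inputs are the .02 s per F and 30 s dispatch latency figures quoted elsewhere in the talk, held in integer milliseconds.

```python
def block_size(dispatch_ms, f_ms, ratio=10):
    """Smallest number of F calls per job so that compute time is at
    least `ratio` times the dispatch latency (hypothetical rule)."""
    needed_ms = ratio * dispatch_ms
    return -(-needed_ms // f_ms)        # ceiling division on integers


def job_count(total_calls, per_job):
    """Number of jobs needed to cover every F call."""
    return -(-total_calls // per_job)


per_job = block_size(dispatch_ms=30_000, f_ms=20)
jobs = job_count(60_000 * 60_000, per_job)
print(per_job, jobs)
```

Because the workload is a known pattern, a rule like this needs only two measurements; for an arbitrary workload there is no comparable handle.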

Page 31: Science in the Clouds: History, Challenges, and Opportunities


Page 32: Science in the Clouds: History, Challenges, and Opportunities

Choose the Right # of CPUs

Page 33: Science in the Clouds: History, Challenges, and Opportunities

Resources Consumed

Page 34: Science in the Clouds: History, Challenges, and Opportunities

All-Pairs in Production

Our All-Pairs implementation has provided over 57 CPU-years of computation to the ND biometrics research group over the last year.
Largest run so far: 58,396 irises from the Face Recognition Grand Challenge. The largest experiment ever run on publicly available data.
Competing biometric research relies on samples of 100-1000 images, which can miss important population effects.
Reduced computation time from 833 days to 10 days, making it feasible to repeat multiple times for a graduate thesis. (We can go faster yet.)

Page 35: Science in the Clouds: History, Challenges, and Opportunities


Page 36: Science in the Clouds: History, Challenges, and Opportunities

Are there other abstractions?

Page 37: Science in the Clouds: History, Challenges, and Opportunities

Wavefront( matrix M, function F(x,y,d) )
returns matrix M such that
M[i,j] = F( M[i-1,j], M[i,j-1], M[i-1,j-1] )

[Figure: a 5x5 grid. The cells M[0,0] ... M[4,0] and M[0,1] ... M[0,4] are given; every other cell is computed by F from its neighbors x and y and its diagonal neighbor d, progressing toward M[4,4]. Wavefront(M,F) takes M and F.]
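
The recurrence can be sketched sequentially as follows. The sketch assumes, as the figure suggests, that every cell with i = 0 or j = 0 is supplied as a boundary value; the example F is arbitrary.

```python
def wavefront(M, F):
    """Fill M in place; row 0 and column 0 must already hold the boundary."""
    for i in range(1, len(M)):
        for j in range(1, len(M[i])):
            M[i][j] = F(M[i - 1][j], M[i][j - 1], M[i - 1][j - 1])
    return M


n = 3
M = [[1 if i == 0 or j == 0 else None for j in range(n)] for i in range(n)]
wavefront(M, lambda x, y, d: x + y + d)
print(M)
```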

Page 38: Science in the Clouds: History, Challenges, and Opportunities

Applications of Wavefront

Bioinformatics:
– Compute the alignment of two large DNA strings in order to find similarities between species. Existing tools do not scale up to complete DNA strings.
Economics:
– Simulate the interaction between two competing firms, each of which has an effect on resource consumption and market price. E.g. When will we run out of oil?
Applies to any kind of optimization problem solvable with dynamic programming.

Page 39: Science in the Clouds: History, Challenges, and Opportunities

Problem: Dispatch Latency

Even with an infinite number of CPUs, dispatch latency controls the total execution time: O(n) in the best case.
However, job dispatch latency in an unloaded grid is about 30 seconds, which may outweigh the runtime of F.
Things get worse when queues are long!
Solution: Build a lightweight task dispatch system. (Idea from Falkon@UC)
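
The O(n) bound comes from the dependency structure: cells on the same anti-diagonal are independent, but the 2n - 1 anti-diagonals must run one after another. The sketch below assumes every cell of an n by n grid is a separately dispatched task and ignores the runtime of F; the function names are illustrative.

```python
def waves(n):
    """Yield, for an n x n grid, the cells that can run concurrently.

    Cell (i, j) needs (i-1, j), (i, j-1), and (i-1, j-1), so all cells
    with the same i + j are independent of one another.
    """
    for s in range(2 * n - 1):
        yield [(i, s - i) for i in range(max(0, s - n + 1), min(s, n - 1) + 1)]


def latency_floor_seconds(n, dispatch_seconds):
    """Lower bound on runtime if each wave pays one dispatch latency."""
    return (2 * n - 1) * dispatch_seconds


print(list(waves(3)))
print(latency_floor_seconds(500, 30))
```

Under these assumptions a 500x500 problem with 30-second dispatch spends over eight hours on latency alone, which is the motivation for a lighter dispatch system.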

Page 40: Science in the Clouds: History, Challenges, and Opportunities

[Figure: a wavefront engine queues tasks to a work queue and collects tasks done. The work queue drives 1000s of workers dispatched to the cloud. Each task involves the files F, in.txt, and out.txt and is sent to a worker as:]

put F.exe
put in.txt
exec F.exe <in.txt >out.txt
get out.txt
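
The four-step exchange between the work queue and a worker (put, put, exec, get) can be mimicked in-process. The `Worker` class here is a hypothetical toy, not the API of the lab's work queue software: files are dictionary entries and the "program" is a Python callable.

```python
class Worker:
    """Toy in-process worker: holds named files and runs named programs."""

    def __init__(self):
        self.files = {}

    def put(self, name, content):
        self.files[name] = content

    def exec(self, program, infile, outfile):
        # The "program" is a Python callable stored like any other file.
        self.files[outfile] = self.files[program](self.files[infile])

    def get(self, name):
        return self.files[name]


def run_task(worker, F, data):
    """Mirror the four steps on the slide: put, put, exec, get."""
    worker.put("F.exe", F)
    worker.put("in.txt", data)
    worker.exec("F.exe", "in.txt", "out.txt")
    return worker.get("out.txt")


print(run_task(Worker(), str.upper, "acgt"))
```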

Page 41: Science in the Clouds: History, Challenges, and Opportunities

Problem: Performance Variation

Tasks can be delayed for many reasons:
– Heterogeneous hardware.
– Interference with disk/network.
– Policy based suspension.
Any delayed task in Wavefront has a cascading effect on the rest of the workload.
Solution - Fast Abort: Keep statistics on task runtimes, and abort those that lie significantly outside the mean. Prefer to assign jobs to machines with a fast history.
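
A minimal sketch of the Fast Abort test follows. The slide says only "significantly outside the mean"; the threshold of mean plus three standard deviations and the minimum sample count are assumptions made for illustration.

```python
from statistics import mean, stdev


def should_abort(elapsed, completed_runtimes, k=3.0, min_samples=5):
    """True if a running task has exceeded mean + k * stdev of history."""
    if len(completed_runtimes) < min_samples:
        return False                     # not enough history to judge
    limit = mean(completed_runtimes) + k * stdev(completed_runtimes)
    return elapsed > limit


history = [10.0, 11.0, 9.0, 10.0, 10.0]
print(should_abort(12.0, history), should_abort(60.0, history))
```

An aborted task would then be resubmitted, preferably to a machine whose past tasks finished quickly.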

Page 42: Science in the Clouds: History, Challenges, and Opportunities

500x500 Wavefront on ~200 CPUs

Page 43: Science in the Clouds: History, Challenges, and Opportunities

Wavefront on a 200-CPU Cluster

Page 44: Science in the Clouds: History, Challenges, and Opportunities

Wavefront on a 32-Core CPU

Page 45: Science in the Clouds: History, Challenges, and Opportunities

The Genome Assembly Problem

AGTCGATCGATCGATAATCGATCCTAGCTAGCTACGA

Chemical Sequencing

AGTCGATCGATCGAT
TCGATAATCGATCCTAGCTA
AGCTAGCTACGA

Millions of “reads”, 100s of bytes long.

Computational Assembly

Page 46: Science in the Clouds: History, Challenges, and Opportunities

Sample Genomes

Genome                 Reads   Data    Pairs   Sequential Time
A. gambiae scaffold    101K    80MB    738K    12 hours
A. gambiae complete    180K    1.4GB   12M     6 days
S. bicolor simulated   7.9M    5.7GB   84M     30 days

Page 47: Science in the Clouds: History, Challenges, and Opportunities

Some-Pairs Abstraction

SomePairs( set A, list (i,j), function F(x,y) )
returns list of F( A[i], A[j] )

[Figure: F applied only at selected cells of the grid of A1, A2, A3 against A1, A2, A3. SomePairs(A,L,F) takes A1 ... An, F, and the pair list L:]

(1,2)
(2,1)
(2,3)
(3,3)
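
A sequential sketch of SomePairs follows, using the pair list from the figure and the three reads from the genome assembly slide. The `overlap` function is a toy stand-in chosen for illustration, not the alignment step used by the assembler, and the pairs are treated as 1-based to match the slide.

```python
def some_pairs(A, pairs, F):
    """Return [F(A[i], A[j])] for each (i, j) in pairs (1-based indices)."""
    return [F(A[i - 1], A[j - 1]) for i, j in pairs]


def overlap(x, y):
    """Toy F: length of the longest suffix of x that is a prefix of y."""
    for k in range(min(len(x), len(y)), 0, -1):
        if x.endswith(y[:k]):
            return k
    return 0


reads = ["AGTCGATCGATCGAT", "TCGATAATCGATCCTAGCTA", "AGCTAGCTACGA"]
print(some_pairs(reads, [(1, 2), (2, 1), (2, 3), (3, 3)], overlap))
```

Unlike AllPairs, only the listed cells are computed, which matters when the candidate pairs are a small fraction of all possible pairs.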

Page 48: Science in the Clouds: History, Challenges, and Opportunities

Distributed Genome Assembly

[Figure: a somepairs master, given A1 ... An, F, and the pair list (1,2) (2,1) (2,3) (3,3), queues tasks to a work queue and collects tasks done. The work queue drives 100s of workers dispatched to Notre Dame, Purdue, and Wisconsin.]

Detail of a single worker (files F, in.txt, out.txt):

put align.exe
put in.txt
exec F.exe <in.txt >out.txt
get out.txt

Page 49: Science in the Clouds: History, Challenges, and Opportunities

Small Genome (101K reads)

Page 50: Science in the Clouds: History, Challenges, and Opportunities

Medium Genome (180K reads)

Page 51: Science in the Clouds: History, Challenges, and Opportunities

Large Genome (7.9M)

Page 52: Science in the Clouds: History, Challenges, and Opportunities

What’s the Upshot?

We can do full-scale assemblies as a routine matter on existing conventional machines.
Our solution is faster (wall-clock time) than the next fastest assembler run on 1024x BG/L.
You could almost certainly do better with a dedicated cluster and a fast interconnect, but such systems are not universally available.
Our solution opens up research in assembly to labs with “NASCAR” instead of “Formula-One” hardware.

Page 53: Science in the Clouds: History, Challenges, and Opportunities

What if your application doesn’t fit a regular pattern?

Page 54: Science in the Clouds: History, Challenges, and Opportunities

Makeflow

part1 part2 part3: input.data split.py
	./split.py input.data

out1: part1 mysim.exe
	./mysim.exe part1 >out1

out2: part2 mysim.exe
	./mysim.exe part2 >out2

out3: part3 mysim.exe
	./mysim.exe part3 >out3

result: out1 out2 out3 join.py
	./join.py out1 out2 out3 > result
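
The rules above imply a dependency graph: the split must run first, the three simulations can then run in parallel, and the join runs last. The sketch below derives that ordering with Python's standard `graphlib` module. It illustrates the graph only and is not how Makeflow itself is implemented; the `RULES` tuples and the `schedule` function are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# (targets, sources, command) for each rule on the slide.
RULES = [
    (["part1", "part2", "part3"], ["input.data", "split.py"], "./split.py input.data"),
    (["out1"], ["part1", "mysim.exe"], "./mysim.exe part1 >out1"),
    (["out2"], ["part2", "mysim.exe"], "./mysim.exe part2 >out2"),
    (["out3"], ["part3", "mysim.exe"], "./mysim.exe part3 >out3"),
    (["result"], ["out1", "out2", "out3", "join.py"], "./join.py out1 out2 out3 > result"),
]


def schedule(rules):
    """Group rule indices into waves; rules in one wave can run in parallel."""
    producer = {t: i for i, (targets, _, _) in enumerate(rules) for t in targets}
    sorter = TopologicalSorter()
    for i, (_, sources, _) in enumerate(rules):
        # A rule depends on whichever rules produce its source files.
        sorter.add(i, *{producer[s] for s in sources if s in producer})
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for wave in schedule(RULES):
    print([RULES[i][2] for i in wave])
```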

Page 55: Science in the Clouds: History, Challenges, and Opportunities

Makeflow Implementation

bfile: afile prog
	prog afile >bfile

[Figure: a makeflow master queues tasks to a work queue and collects tasks done. The work queue drives 100s of workers dispatched to the cloud.]

Detail of a single worker (files prog, afile, bfile):

put prog
put afile
exec prog afile > bfile
get bfile

Two optimizations: Cache inputs and output. Dispatch tasks to nodes with data.

Page 56: Science in the Clouds: History, Challenges, and Opportunities

Experience with Makeflow

Still in initial deployment, so no big results to show just yet.
Easy to test and debug on a desktop machine or a multicore server.
The workload says nothing about the distributed system. (This is good.)
Graduate students in bioinformatics running codes at production speeds on hundreds of nodes in less than a week.

Page 57: Science in the Clouds: History, Challenges, and Opportunities

Abstractions as a Social Tool

Collaboration with outside groups is how we encounter the most interesting, challenging, and important problems in computer science.
However, often neither side understands which details are essential or non-essential:
– Can you deal with files that have upper case letters?
– Oh, by the way, we have 10TB of input, is that ok?
– (A little bit of an exaggeration.)
An abstraction is an excellent chalkboard tool:
– Accessible to anyone with a little bit of mathematics.
– Makes it easy to see what must be plugged in.
– Forces out essential details: data size, execution time.

Page 58: Science in the Clouds: History, Challenges, and Opportunities

Conclusion

Grids, clouds, and clusters provide enormous computing power, but are very challenging to use effectively.
An abstraction provides a robust, scalable solution to a narrow category of problems; each requires different kinds of optimizations.
Limiting expressive power results in systems that are usable, predictable, and reliable.
Is there a menu of abstractions that would satisfy many consumers of clouds?

Page 59: Science in the Clouds: History, Challenges, and Opportunities

Acknowledgments

Cooperative Computing Lab
– http://www.cse.nd.edu/~ccl

Grad Students
– Chris Moretti
– Hoang Bui
– Li Yu
– Mike Olson
– Michael Albrecht

Faculty:
– Patrick Flynn
– Nitesh Chawla
– Kenneth Judd
– Scott Emrich

NSF Grants CCF-0621434, CNS-0643229

Undergrads
– Mike Kelly
– Rory Carmichael
– Mark Pasquier
– Christopher Lyon
– Jared Bulosan