HPC opportunities and challenges in e-Science

Fabrizio Gagliardi
EMEA and LATAM Director, Technical Computing
Microsoft Research

International Conference on Computational Science
Krakow, Poland, 24 June 2008
Outline
• Background
• Research e- and cyber-infrastructures for e-Science
• The experience of the Grid
• Examples beyond e-Science
• Issues and new trends:
• Green Grid, Cloud Computing and HPC in every lab
• Cost analysis: Grid vs Cloud Computing
• Conclusions
Accelerating Scientific Process
• A thousand years ago:
  Experimental Science - description of natural phenomena
• Last few hundred years:
  Theoretical Science - Newton's Laws, Maxwell's Equations …
• Last few decades:
  Computational Science - simulation of complex phenomena
• Today:
  'e-Science' or Data-centric Science - unifying theory, experiment, and simulation

[Cycle: 1. Observation → 2. Analysis → 3. Simulation → 4. Validation]
10/9/2008 ICCS 2008, Krakow
Background
Courtesy Kiril Faenov, MSR
Background: The data pipeline

Data Gathering → Discovery and Browsing → Science Exploration → Domain-specific analyses → Scientific Output

• Data Gathering: "Raw" data includes sensor output, data downloaded from agency or collaboration web sites, and papers (especially for ancillary data).
• Discovery and Browsing: "Raw" data browsing for discovery (do I have enough data in the right places?), cleaning (does the data look obviously wrong?), and lightweight science via browsing.
• Science Exploration: "Science variables" and data summaries for early science exploration and hypothesis testing. Similar to discovery and browsing, but with science variables computed via gap filling, unit conversions, or simple equations.
• Domain-specific analyses: "Science variables" combined with models, other specialized code, or statistics for deep science understanding.
• Scientific Output: Scientific results via packages such as MATLAB or R; special rendering packages such as ArcGIS; paper preparation.

Courtesy Catherine VanIngen, MSR
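The pipeline stages above can be sketched as a chain of stage functions. This is a minimal illustration only: the stage names follow the slide, but every function body (the filtering, the unit conversion, the summary statistic) is an invented placeholder, not part of the original material.

```python
# Hypothetical sketch of the slide's data pipeline as chained stage functions.
# All stage bodies are placeholders invented for illustration.
from functools import reduce

def gather(raw):
    """Data Gathering: collect sensor output, downloads, papers (drop gaps)."""
    return [r for r in raw if r is not None]

def browse(data):
    """Discovery and Browsing: drop obviously wrong records (cleaning)."""
    return [d for d in data if d >= 0]

def explore(data):
    """Science Exploration: derive science variables (here, a unit conversion)."""
    return [d * 0.001 for d in data]   # e.g. grams -> kilograms

def analyze(variables):
    """Domain-specific analyses: a summary statistic standing in for a model."""
    return sum(variables) / len(variables)

def pipeline(raw):
    # Apply the stages left to right, mirroring the slide's flow.
    return reduce(lambda x, stage: stage(x), [gather, browse, explore, analyze], raw)

print(pipeline([1200, None, -5, 800]))  # mean of the two valid readings, in kg
```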
Background: Explosion of Data

[Chart: data volumes in Petabytes from Experiments, Simulations, Archives, and Literature, doubling every 2 years]

Courtesy Kiril Faenov, MSR
Research e-Infrastructures for e-Science

e-Infrastructures in Europe:
• Research Network infrastructure:
  – GEANT pan-European network interconnecting National Research and Education Networks
• Computing Grid Infrastructure:
  – Enabling Grids for E-SciencE (EGEE project)
  – Transition to the sustainable European Grid Initiative (EGI), currently being worked out through the EGI_DS project
• Data & Knowledge Infrastructure:
  – Digital Libraries (DILIGENT) and repositories (DRIVER-II)
• A series of other projects:
  – Middleware interoperation, applications, policy and support actions, etc.

Cyber-Infrastructures around the world:
• Similar initiatives in the US and Asia Pacific
The experience of the Grid 1/3

• Grids for e-Science: a success story so far?
  – Several Grid middleware stacks
  – Many HPC applications using the Grid
    • Some (HEP, Bio) in production use
    • Some still in testing phase: more effort is still required to make the Grid their day-to-day workhorse
  – e-Health applications are also part of the Grid
  – Some industrial applications:
    • CGG Earth Sciences
The experience of the Grid 2/3

• Grids beyond e-Science?
  – Slower adoption: these sectors prefer different environments and tools, and have different TCOs
    • Intra-grids, internal dedicated clusters, cloud computing
  – e-Business applications
    • Finance, ERP, SMEs
  – Industrial applications
    • Automotive, Aerospace, Pharmaceutical industry, Telecom
  – e-Government applications
    • Earth Observation, Civil protection: e.g. the CYCLOPS project
The experience of the Grid 3/3

• Industry has also demonstrated interest in becoming an HPC infrastructure provider:
  – On-demand infrastructures:
    • Cloud and Elastic computing, pay as you go…
    • Data centers: data is getting more and more attention
  – Service hosting: outsourced integrated services
  – Virtualisation being exploited in Cloud and Elastic computing (e.g. Amazon EC2 virtual instances)
• "Pre-commercial procurement"
  – Research-industry collaboration in Europe to achieve new leading-edge products
    • Example: PRACE building a PetaFlop Supercomputing Centre in Europe
Examples beyond e-Science

• EU BEinGRID: Computational Fluid Dynamics
• CYCLOPS: Forest Fire propagation
• EGEODE VO: Seismic processing based on the Geocluster application by the CGG company (France)
In summary…

• Grid computing has delivered an affordable, high-performance computing infrastructure to scientists all over the world, letting them solve computing- and storage-intensive problems within constrained research budgets
• It has also been used effectively by industry to increase the utilisation of their computing infrastructure and reduce Total Cost of Ownership (TCO)
• The Grid is not only aggregating computing resources but also leveraging international research networks to deliver an effective and irreplaceable channel for international collaboration
The flip side…

• Major issues with wide adoption of Grid computing in e-Science, e-Business, industry, etc. have to do with:
  • Cost of operations and management complexity
  • Not a solution for all problems (latency-sensitive and fine-grained parallelism are difficult)
  • Difficult to use for the average scientist
  • Security and reliability
• Power consumption and heat dissipation are becoming a limiting factor for consumer-based distributed systems
• We are observing the limits of Moore's law
Switching Gears:
"To Distribute or Not To Distribute"

• Prof. Satoshi Matsuoka, TITech; keynote at the Mardi Gras Conference, Baton Rouge, Jan. 31, 2008
• In the late 90s, petaflops were considered very hard and at least 20 years off…
  • while grids were supposed to happen right away
• After 10 years (around now) petaflops are "real close", but there's still no "global grid"
• What happened:
  • It was easier to put together massive clusters than to get people to agree about how to share their resources
  • For tightly coupled HPC applications, tightly coupled machines are still necessary
  • Grids are inherently suited for loosely coupled apps, or for enabling access to machines and/or data
• With Gilder's Law*, bandwidth to the compute resources will promote the thin-client approach
  * "Bandwidth grows at least three times faster than computer power." This means that if computer power doubles every eighteen months (per Moore's Law), then communications power doubles every six months
• Example: the Tsubame machine in Tokyo
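The arithmetic behind the Gilder's Law footnote is easy to make concrete. A minimal sketch, using only the doubling periods quoted above (18 months for compute, 6 months for bandwidth):

```python
# Compound growth under the doubling periods quoted in the Gilder's Law footnote.

def growth(doubling_months: float, months: float) -> float:
    """Multiplicative growth after `months`, doubling every `doubling_months`."""
    return 2.0 ** (months / doubling_months)

years = 6
months = years * 12
compute = growth(18, months)    # Moore's Law: compute doubles every 18 months
bandwidth = growth(6, months)   # Gilder's Law: bandwidth doubles every 6 months

print(f"After {years} years: compute x{compute:.0f}, bandwidth x{bandwidth:.0f}")
print(f"Bandwidth outgrows compute by a factor of {bandwidth / compute:.0f}")
```

Over six years compute grows 16-fold while bandwidth grows 4096-fold, a 256x gap: this is why ample bandwidth to remote compute resources favors the thin-client approach.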
Supercomputing Reached the Petaflop
IBM RoadRunner at
Los Alamos National Lab
New trends (1/3): Green Grid, pay per CPU/GB, and/or HPC in every lab…?

• The Green Grid, IBM Big Green and other IT industry initiatives try to address current HPC limits in energy and environmental-impact requirements
• Computer and data centers in energy- and environmentally-favorable locations are becoming important
• Elastic computing, computing on the Cloud, data centers and service hosting are emerging as the new solutions for HPC applications
• Many-core and multi-core processors and CPU accelerators are promising potential breakthroughs
New trends: Cloud computing and storage on demand (2/3)

• Cloud Computing: http://en.wikipedia.org/wiki/Cloud_computing
• Amazon, IBM, Google, Microsoft, Sun and Yahoo are the major potential 'Cloud Platform' providers
  • Operating compute and storage facilities around the world
  • Have developed middleware technologies for resource sharing
• First services are already operational. Examples:
  • Amazon Elastic Compute Cloud (EC2) and Simple Storage Service (S3)
New trends: Cloud computing and storage on demand (3/3)

• http://www.itjungle.com/bns/bns100807-story02.html
• http://www.itjungle.com/tug/tug050307-story05
• http://www.computerworld.com/action/article.do?command=viewArticleBasic&taxonomyName=mainframes_and_supercomputers&articleId=9073758&taxonomyId=67&intsrc=kc_top
Amazon EC2 and S3

• EC2 Beta Service: Web-Services based
  http://www.amazon.com/gp/browse.html?node=201590011
  – $0.10 per hour - Small Instance (Default)
    • 1.7 GB of memory, 1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit), 160 GB of instance storage, 32-bit platform
    • EC2 Compute Unit (ECU): one ECU provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor
• S3 storage services: WS-based (REST and SOAP)
  http://www.amazon.com/S3-AWS-home-page-Money/b/ref=sc_fe_l_2?ie=UTF8&node=16427261&no=3440661&me=A36L942TSJ2AJA
  – Storage: $0.15 per GB-month of storage used
  – Data transfer: $0.10 per GB - all data transfer IN
    • $0.18 per GB - first 10 TB / month data transfer OUT
    • $0.16 per GB - next 40 TB / month data transfer OUT
    • $0.13 per GB - data transfer OUT / month over 50 TB
• Services may be offered below actual cost for various reasons
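The tiered price list above turns into a monthly bill as follows. A minimal sketch using only the 2008 rates quoted on the slide (the example workload figures are invented for illustration):

```python
# Monthly AWS bill under the 2008 prices quoted on the slide:
# $0.10/instance-hour, $0.15/GB-month storage, $0.10/GB transfer in,
# and tiered outbound transfer (first 10 TB, next 40 TB, remainder).

def monthly_cost(instance_hours: float, storage_gb: float,
                 transfer_in_gb: float, transfer_out_gb: float) -> float:
    cost = 0.10 * instance_hours          # EC2 Small Instance hours
    cost += 0.15 * storage_gb             # S3 storage per GB-month
    cost += 0.10 * transfer_in_gb         # all inbound transfer
    # Tiered outbound transfer, in GB (10 TB = 10,000 GB on this slide's scale).
    tiers = [(10_000, 0.18), (40_000, 0.16), (float("inf"), 0.13)]
    remaining = transfer_out_gb
    for size, rate in tiers:
        used = min(remaining, size)
        cost += used * rate
        remaining -= used
    return cost

# Hypothetical workload: one Small Instance for a 720-hour month,
# 100 GB stored, 50 GB transferred in and out.
print(f"${monthly_cost(720, 100, 50, 50):.2f}")
```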
EGEE cost estimation (1/2)

Capital Expenditures (CAPEX):
a. Hardware costs: 55,000 CPUs and 25 PB of storage, on the order of 100M Euros (60-140M)
   Depreciating the infrastructure over 5 years: ~25M Euros per year (10-15M to 40-45M)
b. Cooling and power installations (supposing existing housing facilities are available):
   25% of hardware costs: 25M, depreciated over 5 years: 5M Euros per year (2-8M)

Total: ~30M Euros / year (15-45M)

Slide courtesy of Fotis Karayannis
EGEE cost estimation (2/2)

Operational Expenditures (OPEX):
a. 20M Euros per year for all EGEE costs (including site administration, operations, middleware, etc.)
b. Electricity, ~10% of hardware costs: 10M Euros per year (other calculations lead to similar results)
c. Internet connectivity: supposing no connectivity costs (existing over-provisioned NREN connectivity)

Total: 30M Euros / year

CAPEX + OPEX = 60M Euros per year (45-75M)

Slide courtesy of Fotis Karayannis
EGEE if performed with Amazon EC2 and S3

On the order of ~50M Euros, probably more cost-effective than EGEE's actual cost, depending on the promotion of the EC2/S3 service

Slide courtesy of Bob Jones
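The back-of-the-envelope comparison in the EGEE cost slides can be reproduced directly from the per-line figures they quote (all values in millions of Euros per year; the breakdown below simply re-adds the slides' own numbers):

```python
# EGEE cost estimate, re-adding the figures quoted on the preceding slides
# (millions of Euros per year; ranges on the slides are omitted here).

capex_depreciation = 25   # ~100M Euros of hardware depreciated over 5 years
capex_cooling = 5         # cooling/power: 25% of h/w cost, over 5 years
capex = capex_depreciation + capex_cooling   # slide total: ~30M/year

opex_operations = 20      # site administration, operations, middleware
opex_electricity = 10     # electricity, ~10% of hardware costs
opex_connectivity = 0     # assumed free (over-provisioned NREN connectivity)
opex = opex_operations + opex_electricity + opex_connectivity  # ~30M/year

total = capex + opex      # CAPEX + OPEX: ~60M Euros/year (45-75M)
cloud = 50                # rough EC2/S3 estimate from this slide

print(f"EGEE ~{total}M Euros/year vs cloud ~{cloud}M Euros/year")
```

Under these rough numbers the cloud figure undercuts EGEE by about 10M Euros per year, which is the basis for the "probably more cost-effective" claim above.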
But are clouds mature enough for big science?

Probably not yet, as they were not designed for it and do not support complex scenarios: "S3 lacks in terms of flexible access control and support for delegation and auditing, and it makes implicit trust assumptions"

http://www.symmetrymagazine.org/breaking/2008/05/23/are-commercial-computing-clouds-ready-for-high-energy-physics/
http://www.csee.usf.edu/~anda/papers/dadc108-palankar.pdf
Other directions: HPC in every lab?

[Image: x64 server]

Courtesy Kiril Faenov, MSR
Hardware Paradigm Shift

"… we see a very significant shift in what architectures will look like in the future … fundamentally the way we've begun to look at doing that is to move from instruction-level concurrency to … multiple cores per die. But we're going to continue to go beyond there. And that just won't be in our server lines in the future; this will permeate every architecture that we build. All will have massively multicore implementations."

Pat Gelsinger
Chief Technology Officer, Senior Vice President, Intel Corporation
Intel Developer Forum, Spring 2004 (February 19, 2004)
[Chart: power density (W/cm2) of Intel processors from the 4004, 8008, 8080, 8085, 8086, 286, 386 and 486 through the Pentium processors, rising from ~1 toward 10,000 W/cm2 between the '70s and '10s, passing the levels of a hot plate, a nuclear reactor, a rocket nozzle, and the Sun's surface. Intel Developer Forum, Spring 2004 - Pat Gelsinger]

Today's architecture: heat is becoming an unmanageable problem!

To Grow, To Keep Up, We Must Embrace Parallel Computing

[Chart: projected GOPS from 2004 to 2015, growing from 16 to 32,768: an 80X parallelism opportunity]
Challenge: High Productivity Computing

"Make high-end computing easier and more productive to use. Emphasis should be placed on time to solution, the major metric of value to high-end computing users… A common software environment for scientific computation encompassing desktop to high-end systems will enhance productivity gains by promoting ease of use and manageability of systems."

2004 High-End Computing Revitalization Task Force,
Office of Science and Technology Policy, Executive Office of the President
The Goal… More Time For Science

• Today: highly skilled scientists spend too much time doing non-scientific work; past and present approaches are manually intensive, leaving not enough time for science
• Tomorrow: integrated information management (contextual, collaborative, with rich content) means more time on real science
Conclusion (1/2)

• We are at a flex point in the evolution of distributed computing (nothing new under the sun…)
• Grid remains a good solution for a reduced number of communities (and often for social/political reasons)
• Cloud computing and hosted services are emerging as the next incarnation of distributed computing, with some obvious additional advantages (think of data centres located in Iceland or Siberia)
• HPC in every lab is also affordable: MS technologies
Conclusion (2/2)

• The emphasis should move toward making computing easier for the "normal scientist"
• We should critically re-think and avoid over-engineered solutions (learning from past experience)
• If we are successful, we will enable major new scientific discoveries, and industry and commerce will follow, as has always happened…
Thanks

Thanks to the organizers for the kind invitation, and to all of you for your attention

fabrig microsoft com