
SUPER-COMPUTERS

AT THE FRONTIERS OF EXTREME COMPUTING

NOVEMBER 2011

PUBLISHED IN PARTNERSHIP WITH


Research and Innovation with HPC

Joint Laboratory · SMEs · HPC

At the interface of computer science and mathematics, Inria researchers have spent 40 years establishing the scientific bases of a new field of knowledge: computational science. In interaction with other scientific disciplines, computational science offers new concepts, languages, methods and subjects for study that open new perspectives in the understanding of complex phenomena.

High Performance Computing is a strategic topic for Inria, and about thirty Inria research teams are involved. Inria has thus established large-scale strategic partnerships with Bull for the design of future HPC architectures and with EDF R&D focused on high-performance simulation for energy applications.

At the international level, Inria and the University of Illinois at Urbana-Champaign (United States) created a joint laboratory for research in supercomputing, the Joint Laboratory for Petascale Computing (JLPC), in 2009. The work of this laboratory focuses on the development of algorithms and software for computers at the petaflop scale and beyond. The laboratory's researchers carry out their work as part of the Blue Waters project.

It is also noteworthy that several former Inria spin-off companies, such as Kerlabs, Caps Enterprise, Activeon and Sysfera, have developed their business on this market.

Finally, in order to boost technology transfer from public research to industry, which is part of Inria's core mission, the institute has launched an «SME go HPC» programme together with GENCI, OSEO and four French industry clusters (Aerospace Valley, Axelera, Minalogic, Systematic). The objective of the programme is to bring high-level expertise to SMEs willing to move to simulation and HPC as a means to strengthen their competitiveness. SMEs wanting to use high-performance computing or simulation to develop their products and services (design, modelling, systems, testing, data processing and visualisation) can apply on the website devoted to this HPC-SME Initiative.

Inria is the only French national public research organization fully dedicated to digital sciences; it hosts more than 1,000 young researchers each year.

www.inria.fr
www.initiative-hpc-pme.org


EDITORIAL

OUR STAKE IN THE FUTURE

BY PHILIPPE VANNIER, Chairman and CEO of Bull.

High-performance computing, or HPC, has gradually become a part of our daily lives, even if we are not always aware of it. It is in our medicines, our investments, in the films we go to see at the cinema and the equipment of our favourite athletes, the cars we drive and the petrol that they run on.

It makes our world a safer place, where our resources are used more wisely, and, thanks to researchers, a world we can more easily understand. Yet these giant steps forward, notably by breaking the petaflops barrier, or one million billion operations per second, will soon seem modest indeed as even greater technological upheavals lie ahead.

Cloud Computing is revolutionising and broadening access to scientific computing. The exaflops, 1,000 times more powerful than a petaflops, will give a new dimension to digital simulations. Today the great regions of the world, with the United States and China in the lead, have taken significant steps to ensure control of future technologies. Up until now, Europe has remained on the sidelines. We need to act quickly if we want to hold on to this know-how, which is essential for our independence, our research and our industries, and preserve our jobs.


The TERATEC Technopole
Created by a CEA initiative to develop and promote high-performance simulation and computing, the TERATEC technopole is located in Bruyères-le-Châtel, in the southern part of Île-de-France, and includes all the elements of the HPC and simulation value chain around three entities:

The CEA Very Large Computing Center (TGCC)
An infrastructure dedicated to supercomputers, equipped in particular with the CCRT machines and the European PRACE machine. It is also a place for exchanges and meetings, with a «conference area» including a 200-seat auditorium.

The TERATEC Campus
In the TERATEC Technopole and facing the CEA Very Large Computing Center, the TERATEC Campus, with more than 13,000 m², brings together:
- industrial companies (systems, software and services), plus a business center and an incubator;
- industrial research laboratories: Exascale Computing Research Lab (Intel / CEA / GENCI / UVSQ), Extreme Computing Lab (BULL / CEA)...;
- a European HPC Training Institute;
- platform services accessible to all industrial companies and research organizations.

The objective of the TERATEC Campus is to provide professionals in the field of high-performance simulation and computing with a dynamic and user-friendly environment to serve as a crossroads for innovation in three major areas: systems performance and architecture, software development, and services.

The TERATEC Association
The TERATEC Association brings together more than 80 partners from industry and research who share advanced usage and development of systems, software or services dedicated to high-performance simulation and computing. TERATEC federates and leads the HPC community to promote and develop numerical design and simulation, and facilitates exchanges and collaborations between participants. Each year, TERATEC organizes the TERATEC Forum, the major event in this domain in France and in Europe (next edition planned for June 26 and 27, 2012; more on www.teratec.eu).

If you are interested in joining the TERATEC Campus, contact TERATEC: [email protected] or +33 (0)1 69 26 61 76.


CONTENTS

NEW HORIZONS
06 AN ONGOING CHALLENGE FOR SUPERCOMPUTERS
Since the end of nuclear testing, the CEA has taken up the challenge of ensuring the reliability and security of nuclear weapons through simulations alone.
08 HIGH-PERFORMANCE COMPUTING FOR ALL!
GENCI intends to provide all scientists with access to high-performance computing.
11 INRIA IS LEADING THE WAY IN HPC
Digital simulation on supercomputers is driving France in the race to Exascale.
12 TERA 100: A LEADER IN EFFICIENCY
Tera 100 is 7 times more energy-efficient than its predecessor Tera 10.
14 TRI-GATE 3D TRANSISTORS IN THE RACE TO EXASCALE
The development of exaflopic computers will depend on major technological breakthroughs.

MAJOR CHALLENGES
16 MODELLING MOLECULES FOR MORE EFFECTIVE TREATMENTS
Simulation should orient research towards new drugs.
20 USING SUPERCOMPUTERS TO IMPROVE TSUNAMI WARNING SYSTEMS
The effects of submarine earthquakes on coastlines could be predicted in just 15 minutes!
22 FUTURE NUCLEAR REACTORS ALREADY BENEFIT FROM HPC
National security also relies on three-dimensional modelling.
24 WATCHING MATERIALS GROW, ONE ATOM AT A TIME
Simulating growth at the atomic level will lead to mastery of nanoelectronics.
26 CALCULATING NUCLEAR DISSUASION
Modelling and simulations are the key tools in nuclear design.
28 UNDERSTANDING HOW A STAR IS BORN
Analysing what happens when galaxies collide and how stars are born.
30 THE PHYSICS OF SHOCKS ON AN ATOMIC SCALE
The mechanics of materials must be understood at the atomic level.
32 MARTENSITIC DEFORMATIONS SEEN THROUGH THE PROCESSOR PRISM
Metal alloys can spring back to their initial shape after a major transformation.
34 USING GRAPHICS PROCESSORS TO VISUALISE LIGHT
Or the eternal question of how laser beams behave…

THE FUTURE: EXASCALE COMPUTING
35 THE NEXT CHALLENGE: CONTROLLING ENERGY CONSUMPTION
Improving the energy efficiency of memories and processors is a real challenge for tomorrow's machines…
41 CORRECTING ERRORS IS A TOP PRIORITY
In the run-up to Exascale, simulation should help researchers confirm calculations, even in the event of failures.

Supplement 2 of “La Recherche” cannot be sold separately from supplement 1 (LR N° 457). “La Recherche” is published by Sophia Publications, a subsidiary of Financière Tallandier.

SOPHIA PUBLICATIONS 74, avenue du Maine 75014 Paris. Tel.: +33 (0)1 44 10 10 10. Editorial office email: [email protected]

CEO AND PUBLICATION MANAGER Philippe Clerget

MANAGEMENT ADVISOR Jean-Michel Ghidaglia

To contact a member of the editorial team directly by phone, dial +33 (0)1 44 10 followed by the four digits after his or her name.
EDITORIAL DIRECTOR Aline Richard

EDITOR-IN-CHIEF Luc Allemand

DEPUTY EDITOR-IN-CHIEF FOR SUPPLEMENT 2 Thomas Guillemain

EDITORIAL ASSISTANT FOR SUPPLEMENT 2 Jean-Marc Denis

ARTWORK AND LAYOUT A noir, +33 (0)1 48 06 22 22

PRODUCTION Christophe Perrusson (1378)

SALES, ADVERTISING AND DEVELOPMENT Caroline Nourry (1396)

CUSTOMER RELATIONS Laurent Petitbon (1212)

ADMINISTRATIVE AND FINANCE DIRECTOR Dounia Ammor

SALES AND PROMOTION Évelyne Miont (1380)
Headings, subheadings, presentation texts and captions are written by the editorial office. The law of March 11, 1957 prohibits copying or reproduction intended for collective use. Any representation or reproduction in full or in part made without the consent of the author, or of his assigns or assignees, is unlawful (article L.122-4 of the French Intellectual Property Code). Any duplication must be approved by the French copyright agency (CFC, 20, rue des Grands-Augustins, 75006 Paris, France. Tel.: +33 (0)1 44 07 47 70, Fax: +33 (0)1 46 34 67 19). The editor reserves the right to refuse any insert that would be deemed contrary to the moral or material interests of the publication. Supplement 2 of “La Recherche”. Joint commission of Press Publications and Agencies: 0909 K 85863. ISSN 0029-5671

PRINTED IN ITALY BY G. Canale & C., Via Liguria 24, 10071 Borgaro, Torino.

Copyright deposit. © 2011 SOPHIA PUBLICATIONS.


NEW HORIZONS

1996: FRANCE DECIDES TO DEFINITIVELY STOP ALL NUCLEAR TESTING. THIS MEANS A NEW CHALLENGE FOR THE CEA: GUARANTEEING, BY 2011, THE RELIABILITY AND SECURITY OF NUCLEAR WEAPONS EXCLUSIVELY VIA SIMULATIONS. THE FOLLOWING IS A RECAP OF THIS FIFTEEN-YEAR INDUSTRIAL AND RESEARCH ADVENTURE, WITH JEAN GONNORD FROM THE CEA.

“AN ONGOING CHALLENGE FOR SUPERCOMPUTERS”

JEAN GONNORD is project manager for digital simulations and computing in the military applications department of the CEA.

Now that we’ve reached the year 2011, would you say you have achieved your goals?
Jean Gonnord: We have just delivered the “Standard 2010” to weapons designers, i.e. the set of simulation codes for nuclear weapons that, combined with our supercomputer Tera 100, now up and running, will guarantee future nuclear warheads on submarines without conducting new nuclear tests. This is a scientific first! Only the United States has dared, like France, to tackle this ambitious challenge.

Today your vision of high-performance computing and simulation has been unanimously embraced by industry and research...
J. G.: Luckily, yes! Europe was very far behind. From 1996 to 2006, its presence in the Top 500 went from 28% down to less than 17%, and it is now back up to 25%. It took us ten years to convince people that high-performance simulation was strategic, not only for the industrial world, in order to reduce development cycles and cut costs, but also for research: in energy, climatology, health, etc. This is now accepted throughout the world. But computing power alone is not enough.


If we consider this capacity is strategic, then we need to control the technology! However, up until 2005, Europe was completely absent from the HPC (High Performance Computing) market: less than 0.2% of these machines were designed here. We have a global vision of this industry, from mastering hardware and software technologies and integrating them in supercomputers to the final application. On this last point, we often felt like we were talking to a wall.

How were you able to implement this policy?
J. G.: As an engineer, by developing a long-term strategy and intermediate phases to take advantage of feedback. And by setting the goal of developing general-purpose, reliable machines that can withstand competition, not computational behemoths designed only for the race to the top, or military machines for our own needs. The very high level of expertise within the team and the capacity of the CEA (Atomic Energy and Alternative Energies Commission) to organise big projects did the rest. Keeping our commitments in 2010 meant we needed 500 teraflops*. In 2001 we aimed for 5 teraflops and designed Tera 1 and, in 2005, we reached 50 teraflops with Tera 10. The success of Tera 10 encouraged us to double the power planned for Tera 100, which in the end has a capacity of more than 1,000 teraflops, or 1 petaflops. To achieve such computing power, a massive parallel architecture was required. This meant using 100,000 high-tech processors for Tera 100. For economic reasons, we chose mass-produced or “off the shelf” components. Fifteen years later, these turned out to be the most efficient ones too. Only in a global economy can we finance the R&D required for a new processor. Regarding software, we decided to pool development and validation skills by using open source software. Finally, for Tera 100, we implemented a highly innovative policy, co-design, which associates the manufacturer with the user-expert in architecture. Thanks to co-design, the real utility of these Formula 1 computers is ensured.

How did you organise work with the manufacturer?
J. G.: On the basis of a contract, after a call for bids in 2008, which included a proposal to share R&D (and therefore intellectual property rights), construction of a demonstrator and an option to buy. The French manufacturer Bull, which had already built Tera 10, won the tender. More than two hundred people at the CEA and Bull worked together on Tera 100, both on the hardware architecture and the systems software. It was a huge human and organisational success, materialised in the quality of the end result. This is a general-purpose machine and not some research tool like Blue Gene or IBM’s Roadrunner, in Los Alamos, New Mexico, which we beat out in the end. Finally, it is a real commercial success for the French manufacturer Bull. The architecture of Tera 100 was voted “Best architecture of the year 2009” by the American magazine HPC Wire (1). Bull has sold these computers in Europe as well as in Brazil. Our British equivalent (AWE, the Atomic Weapons Establishment) has purchased two 150-teraflops machines. GENCI has ordered the Curie supercomputer, with a capacity of more than 1.5 petaflops, for its PRACE programme; it will be up and running this year at the TGCC (Très grand centre de calcul) on the CEA site in Bruyères-le-Châtel. It will be the first petaflopic computer available to all European researchers. The international programme for fusion energy, F4E, ordered a 1.5-petaflops computer in April that will be set up in Rokkasho, Japan. This proves that when you have a goal, the desire to reach it and you’re perfectly organised, nothing is impossible.

How do you see the future?
J. G.: Now we need to ensure the future of this capacity, which we have demonstrated, for Europe to design and construct these huge supercomputers, which are strategic for our economy and society as a whole. In an era where Japan and China are now leading the way, in front of the United States, Europe cannot remain the only region in the world where others control a technology that is vital for its future. This is why we support the creation of an ETP, or European Technology Platform, run by European industries and backed by major research laboratories. As far as we are concerned, R&D for the next two generations of CEA/DAM machines, Tera 1000 and EXA1, has already been launched. And we will have broken the exaflops* barrier by the end of the decade.

INTERVIEW CONDUCTED BY ISABELLE BELLIN
(1) HPCwire, November 2009.

More than 200 people at the CEA and Bull worked together on Tera 100. It is the third supercomputer set up at the CEA in Bruyères-le-Châtel (Essonne, France) dedicated to simulation. Below: a simulation of turbulent flows.


FLOPS (Floating point Operations Per Second) is a unit of measure for the performance of computers. One teraflops equals one thousand billion operations per second (10^12), one petaflops equals one million billion operations per second (10^15) and one exaflops equals one billion billion operations per second (10^18).
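To make these orders of magnitude concrete, and to check the claim in the article below that a supercomputer can do in a day what a desktop would need roughly 150 years for, here is a minimal Python sketch. The 20-gigaflops desktop figure is an assumption for a circa-2011 machine, not a number given in the supplement.

```python
# Orders of magnitude used in this supplement (operations per second).
TERAFLOPS = 1e12   # one thousand billion op/s
PETAFLOPS = 1e15   # one million billion op/s
EXAFLOPS = 1e18    # one billion billion op/s

# Assumption: a typical 2011 desktop sustains roughly 20 gigaflops.
desktop_flops = 20e9
supercomputer_flops = 1 * PETAFLOPS

# Work done by a 1-petaflops machine in one day, expressed in desktop-years.
ops_in_one_day = supercomputer_flops * 86_400
desktop_years = ops_in_one_day / (desktop_flops * 86_400 * 365)
print(f"1 petaflops for a day ≈ {desktop_years:.0f} years of desktop computing")
# ≈ 137 years, the same order of magnitude as the "150 years" quoted below.
```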

DIGITAL SIMULATION HAS GRADUALLY BECOME A UNIVERSAL VECTOR FOR SCIENTIFIC AND ECONOMIC DEVELOPMENT. THE AMBITION OF THE GENCI, A PUBLIC ORGANISATION, IS TO INCREASE ACCESS TO HIGH-PERFORMANCE COMPUTING BY MAKING IT AVAILABLE TO ALL SCIENTISTS, THROUGHOUT THE COUNTRY.

ACCESS TO HIGH-PERFORMANCE COMPUTING FOR ALL!

BY LÆTITIA BAUDIN, head of communications for the GENCI (Grand équipement national de calcul intensif).

Catherine Rivière, CEO of GENCI.

Imagine machines so powerful they could do in a day what it would take a desktop computer 150 years to accomplish. Is this science fiction? No; just science! These machines, called supercomputers, are capable of performing millions of billions of operations in a single second, hence the term high-performance computing. They help reproduce, through modelling and simulation, experiments that cannot be conducted in a lab because they are too dangerous, costly, time-consuming, complex or even inaccessible on a human scale.

Today, digital simulation has become a key approach in scientific research alongside theory and experimentation. In France, the GENCI (Grand équipement national de calcul intensif) is the public organisation in charge of implementing French policy in terms of high-performance computing in academic research. Alongside the Ministry of Higher Education and Research, the GENCI brings together the main players in high-performance computing: the CEA, CNRS, public universities and INRIA. “Over four years, the GENCI’s investments have helped multiply by more than 30 times the computing power available to the French scientific community, which is currently approximately 600 teraflops*”, adds Catherine Rivière.

Outside France, European high-performance computing is taking shape. Convinced that no country can finance and sustainably develop a world-class computing structure alone, twenty representatives of European countries, including the GENCI for France, created the PRACE (Partnership for Advanced Computing in Europe) research infrastructure on 9th June 2010 in Barcelona. Its objective? To set up and run a distributed and lasting computing infrastructure in the Old World consisting of four to six centres equipped with machines offering a computing power greater than one petaflops*.

A virtual laboratory
“The success of PRACE depends on the scientific results it can obtain, which must be recognised as among the best in the world, emphasises the British physicist Richard Kenway, President of the PRACE scientific council. This is a fundamental goal for us. Demonstrating our success is a prerequisite for enlarging the number of member countries in PRACE, who will contribute substantially to operating the research infrastructure. These new contributions will allow PRACE to make even more competitive resources available, which will promote, in turn, the production of the best scientific results... It’s a virtuous circle.” PRACE is off to a good start: Hungary joined the research infrastructure on 8th June, thus becoming the 21st European state in PRACE.

As early as mid-2010, European scientists had access to the Jugene supercomputer, the first component in the PRACE infrastructure, located in Jülich (Germany). Since the beginning of 2011, they have also been able to use the CURIE computer, set up in France at the TGCC (Très Grand Centre de Calcul) of the CEA. Financed by the GENCI, this supercomputer will be fully operational at the end of 2011, with a power of at least 1.8 petaflops. Starting in 2012, scientists will have access to even more machines, in Germany, Italy and Spain.

For Jérémie Bec, the first French scientist to benefit from these “European” computing hours, major scientific breakthroughs are on the horizon. “Petaflopic supercomputers open the door to a new era in research, an era of experimentation in a virtual lab”, proclaims this specialist in turbulent flows based at the OCA (Côte d’Azur Observatory), near Nice, who is studying the role of turbulent fluctuations in triggering precipitation from hot clouds.


More generally speaking, according to Alain Lichnewsky, scientific director of GENCI: “The increased capacities of supercomputers, installed in France under the aegis of the GENCI or within the framework of PRACE, allow innovative results such as the generalisation of ab initio models based on fundamental principles in the fields of chemistry and materials, or the collection of vital data for developing new experimental methods. As modelling has progressed, and the codes of new supercomputers have been adapted, the frontier of established knowledge is being defined by confronting state-of-the-art simulation with the nature of the problems studied.”

Thus scientists can tackle increasingly complex phenomena and provide practical answers to crucial economic or societal problems.

Towards a pyramid structure
In the field of climatology, for example, preserving the planet means gaining a deeper understanding of our climate: “We absolutely need massive computing power to simulate, as realistically as possible, our climate’s past, our current conditions and future trends, according to different scenarios, explains Jean Jouzel, Vice President of the IPCC (Intergovernmental Panel on Climate Change). Thanks to CURIE, we will be able to envisage climate simulations with a resolution of a dozen kilometres for the entire planet and over several hundreds of years. This will also allow us to increase European >>>


Official inauguration of the European research infrastructure PRACE (Partnership for Advanced Computing in Europe) on 9th June 2010 in Barcelona.

The CURIE petaflopic supercomputer during its installation at the TGCC (Très grand centre de calcul) of the CEA in Bruyères-le-Châtel, France.



>>> participation in the next international exercises in climate simulation.” This is only one example. Others will be presented in this edition.

Much remains to be done. “Today, we do not have an intermediate level between national and European supercomputers and laboratory machines, underlines Olivier Pironneau, President of the CSCI (Comité stratégique du calcul intensif/Strategic Committee for Intensive Computing), in charge of supervising the coherency of French HPC initiatives. Developing university facilities is a top priority; in fact, that is the aim of the Equip@meso project, or the Excellence facility for intensive computing of coordinated Mesocentres, which is coordinated by GENCI and involves ten different regional partners”.

Selected during an “Excellence facilities” call for projects, conducted under the aegis of the French General Commission for Investment (Commissariat général à l’investissement), Equip@meso benefits from €10.5 million in funds to reinforce computing means at the regional level, in order to support national facilities. “Therefore, we are going to speed up construction of a HPC pyramid around three geographical strata: European HPC facilities, resources in national computing centres and other means coordinated at the regional level”, adds Catherine Rivière.

Waiting for Exascale
Coordinating with universities should also allow us to deploy a concerted and complete training offer for specialists mastering high-performance computing and digital simulation such as, for example, the Master in Modelling and Simulation set up by the CEA, Centrale Paris, the École polytechnique and the UVSQ (Université de Versailles Saint-Quentin-en-Yvelines).

“We do indeed need more young scientists trained in the latest computer technologies, who are capable of understanding, developing and maintaining the required software”, reckons Richard Kenway. Especially since the next challenge will be the transition to Exascale in around 2018: “This is much more than a theoretical change, asserts Olivier Pironneau, to successfully make this transition, a huge amount of work needs to be done: improved communication between chips, development of software adapted to the number of processors required to perform a billion billion operations per second and resistance to failures, which should benefit from virtualisation.”

In France this decisive challenge is being tackled via the ECR Lab (Exascale Computing Research Laboratory), created in partnership by the CEA, GENCI, Intel and UVSQ. Home to around twenty researchers, the ECR Lab is preparing and developing hardware and software architectures (scientific codes and programming tools) that help support exaflopic performance. “The contribution of GENCI consists, notably, in preparing the French scientific community for the advent of Exascale”, explains Stéphane Requena, technical manager at GENCI.

Yet the field of high-performance computing and digital simulation is not limited to academic research: “These are also strategic tools from an economic standpoint, asserts Catherine Rivière. They are an essential component in industrial productivity, by considerably reducing the time spent on the design and commercialisation phases of a product or service and by significantly contributing to innovation and optimisation of production and maintenance cycles.”

Digital simulations and SMEs
While large corporations or financial firms – like TOTAL, EDF, Airbus or BNP Paribas – have integrated digital simulation and high-performance computing in their development plans, SMEs need to be convinced, mainly because they do not always master the technological, financial and human stakes. Hence the HPC-PME initiative, supported by GENCI, INRIA and OSEO. Constructed in keeping with French government recommendations (France Numérique 2012), this initiative was launched just over a year ago. “Our aim is to help SMEs assess the relevance of using digital simulations in terms of their business model, by calling on players in HPC who can guide them in this process”, adds Catherine Rivière.

By the summer of 2011 no fewer than 15 SMEs had expressed their interest in benefitting from this support. Operating in different sectors (automobile, aeronautics, digital media, shipbuilding, microelectronics, signal processing, etc.), they are located throughout the country.

At a time when Asia is winning the supercomputer race, it is crucial to support national scientific and economic development. Digital simulation is just one tool. Now more than ever, we need to increase access to high-performance computing!

From top to bottom: the British physicist Richard Kenway, President of the PRACE scientific council, and Olivier Pironneau, President of the CSCI (Comité Stratégique du Calcul Intensif/Strategic Committee for Intensive Computing).


THE INRIA (NATIONAL INSTITUTE FOR RESEARCH IN COMPUTER SCIENCE AND CONTROL) HAS LAUNCHED A “LARGE-SCALE INITIATIVE” IN ORDER TO OPTIMISE THE USE OF SUPERCOMPUTERS THROUGH DIGITAL SIMULATIONS. WE CAN ALREADY SAY THAT FRANCE IS UP AND RUNNING IN THE RACE TO EXASCALE.

INRIA IS LEADING THE WAY IN HPC

BY STÉPHANE LANTÉRI, head of the NACHOS research team, dedicated to numerical modelling and high-performance computing, AND JEAN ROMAN, head of the HiePACS research team, dedicated to high-end parallel algorithms for challenging numerical simulations.

How can we tackle the major issues involved in programming massive parallel petaflopic and exaflopic computer architectures? How can we imagine their deployment in order to understand the complex scientific problems and technologies that are of interest for our society today? To answer these questions, INRIA has launched a “large-scale initiative” that brings together teams from its different sites around the theme of “High-performance computing for computational sciences”. The objective: to set up a skills continuum with the aim of using supercomputer processing and storage capacities more efficiently to implement highly complex, large-scale digital simulations.

Massively parallel
The initiative will be dictated by applicative fields that represent key “challenges” for the computing and mathematical methodologies studied. All these research activities – whether they concern methodology or applications – will focus on the same goal: achieving the best performance with the possibilities offered by current and future technologies.

This partnership within INRIA will be completed by the participation of other organisations and industries, such as, in an initial phase, ANDRA (the French National Radioactive Waste Management Agency), BRGM (geosciences), the CEA, EDF R&D or Dassault Aviation. Their participation will include the definition of application challenges in various fields such as the environment (seismic risks, CO2 capture, radionuclide transport), nuclear fusion in the framework of the Iter project (plasma dynamics) or aeronautics (aerodynamic design). Each of these application challenges will lead to the implementation of very large-scale digital simulations, in terms of problem size and the volume of data involved, on massive parallel computing configurations with several thousand to several hundreds of thousands of cores.

INRIA has also set up joint research laboratories with key institutions in the field. The first of these laboratories opened two years ago, in partnership with the University of Illinois at Urbana-Champaign (USA) and the National Center for Supercomputing Applications at the same university. Scientific objectives were defined around four themes: parallel programming languages and environments, systems software, algorithms and digital libraries, and failure tolerance. Another example is the joint national research laboratory INRIA has opened with the European Centre for Research and Advanced Training in Scientific Computation (CERFACS) in Toulouse. This laboratory is focused on the design and creation of precise digital tools using large numbers of processor cores for complex applications in materials physics, fluid mechanics and climatology.

Defining a road map
This involves drawing up an inventory of all the applications required for such a machine, imagining the architectures that could be built in 2018 with the help of manufacturers and, finally, estimating the R&D required to create viable software as soon as the first such supercomputer is available. The IESP (International Exascale Software Project) has drawn up an initial road map. The EESI (European Exascale Software Initiative) is helping the IESP by establishing a European version. INRIA is a key player in both of these projects.

Finally, we need to examine how to adapt applications so they can work on an extreme scale. INRIA is a founder and member of the G8 ECS (Enabling Climate Simulation at Extreme Scale) project. This project brings together the best teams of six countries to study new algorithms and software technologies in order to achieve the highest level of performance from future exascale supercomputers in the field of climatology.


“The next challenge for the scientific community regarding digital simulation will be the transition to exascale, which means the performance of the best supercomputer will be multiplied by 1,000. INRIA has taken part, since 2009, in preparing the run-up to exascale.”


INSTALLED IN JULY 2010, THE TERA 100 SUPERCOMPUTER IS NOW RANKED AMONG THE MOST EFFICIENT ON THE PLANET. IT IS THE FIRST SUPERCOMPUTER IN EUROPE TO BREAK THE PETAFLOPS BARRIER. ITS GREATEST ADVANTAGE: IMPROVED ENERGY EFFICIENCY, UP TO 7 TIMES GREATER THAN ITS PREDECESSOR TERA 10, WHICH WAS 20 TIMES LESS POWERFUL!

TERA 100: A LEADER IN EFFICIENCY

The machine’s interconnection network has a highly complex system of cables, enabling communications between its 4,370 nodes.

The end result of two years of joint R&D by the engineers of the military applications division of the CEA (Atomic Energy and Alternative Energies Commission) and the French computer company Bull, the Tera 100 supercomputer is the first computer designed and built in Europe to break the symbolic petaflops* barrier. A measured performance (Rmax) of 1.05 petaflops placed it 6th, in 2010, among the 500 most powerful computers worldwide. It is currently ranked 9th on the June 2011 list. With a remarkable efficiency* of 83.7% during the benchmark test for this ranking, it is certainly one of the most general-purpose supercomputers among the top ten worldwide.

The product of more than a dozen years of work on digital simulation and the architecture of major computing and data management systems, Tera 100 also benefited from experience gained since the beginning of the 2000s and from the successful installation and operation of its predecessors: Tera 1 in 2001 and Tera 10 in 2005. Thanks to these skills and knowledge, in terms of needs analysis and computer architecture, as well as changes in hardware and software technologies, we have succeeded in defining the features of this computational behemoth.

Tera 100 was designed to be efficient no matter which digital methods or algorithms are used.


No sooner had Tera 100 been completely installed than the Bull-CEA team started thinking about its successor... which will be 1,000 times more powerful!


BY PIERRE LECA, head of the simulation and information sciences department of the military applications division of the CEA, AND SOPHIE HOUSSIAUX, Tera 100 project manager at Bull.


THE TOP 10 SUPERCOMPUTERS (June 2011 list)

RANK | SITE – COUNTRY | NAME | FIRM | POWER (in petaflops) | EFFICIENCY (%)
1 | RIKEN Advanced Institute for Computational Science – Japan | K Computer | Fujitsu | 8.16 | 93.0
2 | National Supercomputing Center, Tianjin – China | Tianhe-1A | NUDT MPP | 2.57 | 54.6
3 | DOE/Oak Ridge National Laboratory, Tennessee – United States | Jaguar | Cray XT | 1.75 | 75.5
4 | National Supercomputing Center, Shenzhen – China | Nebulae | Dawning | 1.27 | 42.6
5 | Tokyo Institute of Technology – Japan | Tsubame-2.0 | HP 3000SL | 1.19 | 52.1
6 | DOE/Los Alamos National Laboratory, New Mexico – United States | Cielo | Cray XE | 1.11 | 81.2
7 | NASA/Ames Research Center/NAS – United States | Pleiades | SGI | 1.09 | 82.7
8 | DOE/National Energy Research Scientific Computing Center, California – United States | Hopper | Cray XE | 1.05 | 81.8
9 | Atomic Energy and Alternative Energies Commission (CEA) – France | Tera 100 | Bull bullx | 1.05 | 83.7
10 | DOE/Los Alamos National Laboratory, New Mexico – United States | Roadrunner | IBM | 1.04 | 75.7
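Since efficiency is defined in this supplement as the ratio between measured (Rmax) and theoretical (Rpeak) power, the theoretical peak of any machine in the table can be recovered as Rmax divided by efficiency. A minimal Python sketch, using three rows of the table above:

```python
# Rpeak ≈ Rmax / efficiency, with efficiency given as a percentage in the table.
top10_sample = {
    # name: (Rmax in petaflops, efficiency in %)
    "K Computer": (8.16, 93.0),
    "Tianhe-1A": (2.57, 54.6),
    "Tera 100": (1.05, 83.7),
}

for name, (rmax, eff) in top10_sample.items():
    rpeak = rmax / (eff / 100.0)
    print(f"{name}: Rmax {rmax:.2f} PFlops, estimated Rpeak {rpeak:.2f} PFlops")
# Tera 100 comes out at roughly 1.25 PFlops of theoretical peak for 1.05 PFlops measured.
```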

Built with open source software and widely available processors, it has paved the way for a new competitive range of commercial supercomputers. Indeed, the purpose was not to build a single object, but to integrate this project in an industrial model.

The widely used architecture of x86 microprocessors was selected for the project. Therefore the supercomputer easily integrates the environment of workstations operating under Linux.

Design choices took into consideration technological evolutions in microprocessors. Their frequency – or speed – is no longer increasing; instead it is the number of cores, or basic processing units, that is being multiplied within each microprocessor. Those selected to build Tera 100, the Intel Xeon X7560 – also known as the Nehalem-EX – have eight cores. Four microprocessors are assembled to form a multi-processor (the bullx 3060 server), the foundation of the supercomputer, called a “node” in computer speak. Thus, at the heart of each node, 32 cores share 64 gigabytes of memory. To achieve its computing power, Tera 100 interconnects 4,370 nodes, or 17,480 microprocessors and nearly 140,000 cores.

The interconnection network was designed according to an unusual architecture, a topology in archipelagos, linking clusters of nodes and offering easy access to data storage media. The maximum aggregated throughput of the network can reach 13 terabytes (13 thousand billion bytes) per second. Overall, the architecture of Tera 100 is a lot like a series of Russian dolls. First, there is a microprocessor, nested inside a node (4 microprocessors), inside a cabinet (24 nodes), within an archipelago (around 20 cabinets), which all make up the supercomputer (10 archipelagos). Moreover, the interconnection network has two levels: intra-archipelago and inter-archipelago. This ensures data is communicated between any two nodes in the computer.

Energy consumption under control
Controlling energy consumption, a major problem for this type of mega facility, was one of the main preoccupations for the Bull-CEA team. Among its main innovations is a water-based cooling system, installed in the cabinet doors, which has allowed the facility to remain compact. The supercomputer only occupies 650 m² of floor space. This cooling system, which is located closer to where heat originates, improves the energy efficiency of the computing facility. Furthermore, consumption is modulated according to workloads, which involves lowering the core frequencies when they are not using their full power. During normal use, this should limit the electrical power required to 3 megawatts. Twenty times more powerful than its predecessor, Tera 10, Tera 100 improves energy efficiency by 7 fold.
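To make the figures in the two paragraphs above concrete, here is a minimal arithmetic sketch in Python. The only assumption is that the quoted 20x performance and 7x energy-efficiency ratios between Tera 10 and Tera 100 can be combined directly; the implied Tera 10 power draw is therefore an estimate, not a figure from the article.

```python
# Tera 100 "Russian doll" hierarchy, using the figures from the article.
cores_per_cpu = 8      # Intel Xeon X7560 (Nehalem-EX)
cpus_per_node = 4      # four microprocessors per bullx node
nodes = 4370

cpus = nodes * cpus_per_node      # 17,480 microprocessors
cores = cpus * cores_per_cpu      # 139,840 cores, i.e. "nearly 140,000"
print(cpus, cores)

# 20x the computing power with 7x the energy efficiency (useful work per joule)
# implies roughly 20 / 7 ≈ 2.9x the electrical power of Tera 10.
power_ratio = 20 / 7
tera100_power_mw = 3.0            # normal-use figure quoted in the article
tera10_power_mw = tera100_power_mw / power_ratio
print(f"Implied Tera 10 draw ≈ {tera10_power_mw:.1f} MW")   # ≈ 1.1 MW (estimate)
```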

In order to benefit from the most advanced software, developments were conducted with the international community, including specialists from the US Department of Energy, industrialists and academic researchers, notably on Lustre, the open source file management system. This “distributed” system can share data between hundreds of nodes. The software environment takes into consideration the specific features of nodes and the special topology of the interconnection network. Of course, this is the case for the communication library, but also for the resource management software, which organises calculations according to their profile.

Finally, an administration and supervision programme checks the status of the different components (memories, processors, network...) to prevent failures during computation. Maintaining the computer in operational condition is critical during its entire life cycle. Now that the petaflops barrier has been broken, the Bull-CEA team is working on the Holy Grail of supercomputing: exaflops*, or a power 1,000 times greater than Tera 100.

FLOPS (Floating point Operations Per Second) is a unit of measure for the performance of computers. One teraflops equals one thousand billion operations per second (10^12), one petaflops equals one million billion operations per second (10^15) and one exaflops equals one billion billion operations per second (10^18).

EFFICIENCY is the ratio between the measured and theoretical power of a computer.

A FRENCH SUPERCOMPUTER RANKED 9TH WORLDWIDE
Each year, in June and November, a ranking of the 500 most powerful supercomputers worldwide is established on the basis of a benchmark system of equations called LINPACK. In the most recent ranking, in June 2011, all the computers in the top 10 – compared to 7 in November 2010 – had broken the petaflops barrier. This speed record was broken for the first time in 2008 by the IBM Roadrunner, currently ranked 10th. Tera 100 is the most powerful supercomputer in Europe, as was its predecessor, Tera 10, in June 2006. With its 9th place worldwide, Tera 100 ranks second in terms of efficiency (83.7%), in other words, for its measured performance relative to its theoretical performance. China continues its dizzying ascension, with two machines in the latest Top 10, but with low efficiency. The United States remains the unchallenged leader in the number of facilities, but Japan is again ranked N° 1 – and by a long shot – with the K computer by Fujitsu, running at 8.16 petaflops.


THE DEVELOPMENT OF EXAFLOPIC COMPUTERS, WITH A COMPUTING POWER A THOUSAND TIMES GREATER THAN CURRENT SUPERCOMPUTERS, WILL DEPEND ON MAJOR TECHNOLOGICAL BREAKTHROUGHS IN TERMS OF BOTH HARDWARE AND SOFTWARE.

TRI-GATE 3D TRANSISTORS IN THE RACE TO EXASCALE

BY MARC DOLLFUS, head of public sector HPC and research at Intel.

In the beginning of May, Intel announced a major innovation in transistors, the microscopic component that is the basis of modern electronics. For the first time since their invention more than fifty years ago, transistor design is about to evolve, taking on a three-dimensional structure. These revolutionary new 3D transistors, called Tri-Gate, will be integrated in a microprocessor for the first time (code name: Ivy Bridge) using a 22-nanometre etching process.

Until now, and for decades, all transistors have used a planar 2D structure that can be found in all computers, mobile phones and consumer electronics, but also in onboard control systems in vehicles, avionics, household appliances, medical equipment and literally thousands of other devices that we use every day, hence the importance of this innovation.

Scientists recognised long ago the interest of a 3D structure for improving processor characteristics. Today, the microscopic size of transistors makes their design even more difficult, as they are subject to the physical laws of the infinitely small. Therefore, this is a real technological feat, regarding the design of the processor in and of itself, as well as the fact that it can be mass produced.

The Tri-Gate 3D marks a new era in transistors. The 2D planar (or “flat”) power-conducting channel is replaced by a thin 3D fin that rises vertically from the silicon of the transistor. Current is then controlled on each of the three sides of the 3D transistor, rather than just on the top side, as is the case for the current generation of planar 2D transistors. This additional control means there is as much current flowing as possible when the transistor is in the “on” state, increasing performance. Conversely, when the transistor is in the “off” state, the flow is close to zero, minimizing energy consumption. It also allows the transistor to switch very quickly from one state to another, once again increasing performance.

Continuing Moore’s Law
Just as a skyscraper allows urban developers to optimise the available space by building a vertical structure, the Tri-Gate 3D transistor is a means of managing density. As the fins are vertical, transistors can be arranged more densely side by side, which is essential in order to exploit the technological and economic advantages of Moore’s Law. For future generations of transistors, designers will be able to lengthen these fins to achieve even better performance and energy efficiency.

For more than forty years, the economic model of the semiconductor industry has been dictated by Moore’s Law, named after Gordon Moore, co-founder of Intel. It describes a long-term trend according to which the number of transistors that can be placed inexpensively on an integrated circuit doubles approximately every two years, exponentially improving functionality, performance and costs. Commenting on the advent of a 3D structure, Gordon Moore observes that “for years we have seen limits to how small transistors can get. This change in the basic structure is a truly revolutionary approach, and one that should allow Moore’s Law, and the historic pace of innovation, to continue.”


Tri-Gate transistors allow chips to function at lower voltages and with less leakage. The result is an unusual combination of improved performance and greater energy efficiency compared to the previous generation of transistors, even the most modern ones. The new 22 nm Tri-Gate 3D transistor improves performance by 37% compared to Intel’s 32 nm planar transistors. Moreover, these new transistors use less than half the energy to achieve the same level of performance as their 2D predecessors. This significant gain makes them highly attractive for use in small pocket-sized terminals, for which energy consumption is a key issue.

An unexpected leap forward
For Mark Bohr, Intel Senior Fellow, who contributed a great deal to these innovations, the gains in performance and energy savings offered by Tri-Gate 3D transistors are unlike anything that has been done before. This phase is much more than a simple confirmation of Moore’s Law. The advantages in terms of voltage and energy consumption are much greater than those generally achieved in a single generation of etching techniques. They will offer designers the flexibility they need to make existing devices more intelligent. They will also allow them to design completely new products.

The Tri-Gate 3D transistor will be deployed during the shift to a new manufacturing process: 22 nm etching. Intel Core processors with Ivy Bridge chipsets will be the first to be mass produced, at the end of 2011.
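As a small illustration of the "doubling approximately every two years" trend described in the section on Moore's Law above, here is a minimal Python sketch. The 2011 starting point of about one billion transistors per chip is an illustrative assumption, not a figure from the article.

```python
# Moore's law as described above: transistor counts double roughly every two years.
def projected_transistors(start_count: float, start_year: int, year: int) -> float:
    """Project a transistor count assuming one doubling every two years."""
    return start_count * 2 ** ((year - start_year) / 2)

# Assumption: roughly 1e9 transistors per chip in 2011, for illustration only.
for year in (2011, 2015, 2019):
    count = projected_transistors(1e9, 2011, year)
    print(f"{year}: ~{count:.1e} transistors per chip")
# Four doublings over eight years correspond to roughly a 16x increase by 2019.
```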


Up until now, transistors had a two-dimensional structure (left: the 90 nm transistor, presented in 2002). The 3D Tri-Gate (pictured above) has a vertical fin that rises from the silicon of the transistor.


it allows these calculations to be performed m o r e a n d more rapidly. “At the start of the years 2000, predicting a single interaction between a therapeutic mol-ecule and a biomol-ecule required three months of calcula-tions! recalls Michel Masella, researcher at the living chemistry laboratory of the CEA (Atomic Energy and Alternative Ener-gies Commission). At the time, some people thought digital simulations were amusing and that it would be quicker to con-duct experiments and wait three months for the results. Today, our goal is the produce 100 to 1,000 predictions of interactions a day, which will be possible thanks to the resources offered by the Cu-rie supercomputer, hosted by the CEA, and which will soon have 80,000 processors.” This is a gi-ant step forward! A step forward from a technical standpoint, be-cause more effi cient machines were needed, but also in terms of physics and algorithms, be-cause the methods researchers are currently using have been signifi cantly improved and op-timised. Certain CEA research units continue, in fact, to work on improving the quantitative effi ciency of algorithmic codes i.e. increasing the number of op-erations performed per second,

T here are thousands of molecules that could potentially be used for therapeutic purposes.

This is the problem facing phar-maceutical laboratories every day. It is impossible to organise clinical trials for each one. How can we predict which biomol-ecules have the bests chances of success? One solution, which was once considered only a dream, consists in digital simu-lations. A colossal machine, a supercomputer, could become a vital tool in understanding treat-ments at the molecular level.

“Thanks to the computing power already available, it is now possible to simulate the be-haviour of biomacromolecules – proteins, nucleic acids, poly-saccharides, etc – in their natu-ral environment and understand their interactions and functional roles inside the cell”, explains Richard Lavery, researcher at the CNRS (National Centre for Scientifi c Research) BMSII labo-ratory (Structural and Molecu-lar Basis of Infectious Systems) at the University of Lyon 1. “By replacing atomic models with simplified representations, we can even build a model of a virus – containing the equivalent of 15 million atoms – and observe its structure evolve over time.”

Simulation can be used to probe what happens inside a liv-ing cell. Practically speaking, this means modelling systems with tens of millions of atoms, pre-dicting the forces established be-tween them - interactions - and

estimating precisely how they will behave inside the human body. However, the stronger an interaction is, the more likely the molecule under consideration will be an effective therapeutic or diagnostic agent. “The prob-lem is that simulation forces us to adopt models of macromolecules that comply with traditional Newtonian mechanics rather then the quantum mechanics of Schrödinger, which are better suited to the molecular world”, underlines Richard Lavery.

Indeed, in simple terms, a chemical link between two atoms corresponds to an ex-change of electrons, a phenom-enon that is essentially quantic. However, this is a very diffi cult equation to solve as there are thousands, even millions, of electrons that need to be taken into account. And while atomic interactions are important, it is also important to bear in mind that atoms move, that atomic structures fold, etc. These com-plex movements over time rep-resent the dynamic part of the molecule. “Biomacromolecules have fl exible structures and their functions involve movements over several time scales, from a femtosecond * to a second”, adds Richard Lavery. Thus, many cal-culations are required to repre-sent their behaviour with suffi -cient precision.

The problem of scalabilityIt is essentially for this rea-son that the power of super-computers is vital, because

HOW CAN WE DETERMINE THE EFFECTIVENESS OF A THERAPEUTIC MOLECULE BEFORE LAUNCHING CLINICAL TRIALS? ONE SOLUTION COULD LIE IN THE USE OF SUPERCOMPUTERS. IN PREPARING TESTS, SIMULATIONS SHOULD BE ABLE TO ORIENT RESEARCH TOWARDS NEW DRUGS.

MODELLING MOLECULES FOR MORE EFFECTIVE TREATMENTS

FEMTOSECONDA nanosecond (1 ns) equals 109 seconds. A picosecond (1 ps) equals 1012 seconds. A femstosecond (1 fs) equals 1015 seconds.

SOLVATATIONA physiochemical phenomenon observed when a chemical compound is dissolved in a solvent and the atoms, ions or molecules of the chemical species are dissipated in the solution by interacting with molecules in the solvent.

LES GRANDS CHALLENGES16

MA

JO

R C

HA

LL

EN

GE

S

N° 457 NOVEMBER 2011 LA RECHERCHE SUPERCOMPUTERS

Page 17: SUPER- COMPUTERS · SUPER-COMPUTERS AT THE FRONTIERS OF EXTREME COMPUTING NOVEMBER 2011 PUBLISHED IN PARTNERSHIP WITH. Joint Laboratory SMEs HPC At the interface of computer science

quantum physics laboratory, Michel Caffarel is leading a re-search team dedicated to the development of an alternative method for solving Shrödinger’s equation, as precise as DFT but not restricted by the number of processors. This is called “ideal scalability”, a Monte-Carlo quantum method. The term re-fers to the famous casino and is explained by the introduction

of random electronic trajec-tories, constructed by a computer picking ran-dom series of numbers, like a roulette wheel. “The advantage, with the probabilistic dy-

namics of electrons, is that it is parallel-isable. We can per-form parallel cal-culations without knowing what is happening on each processor, explains

Michel Caffarel. Why? Simply because the simulation can be broken down into

a set of independent electronic trajectories.

This unique property allows us to use an arbi-trary number of proces-sors, because they do not communicate with each

other during the entire simulation.” In practice, each proces-

sor calculates a single trajec-tory. Then the mean of the re-sults obtained is used to recon-stitute the final result, which should correspond to a single electronic trajectory, obtained by juxtaposing individual ones. Last spring this method was tested successfully on the Curie machine. Working with Anthony Scemama, a young research at the CNRS, a simulation was con-ducted on a set of 10,000 proces-sors of the machine with perfect scalability. The simulation was therefore performed 10,000 times faster than it would have taken on a single processor. “I’ve been developing these methods for twenty years now and I think we have reached a turning point. We will be able to surpass current methods in just a few years!” ex-ults Michel Caffarel. And the val-idation phase is about to start. “In October we will study the interaction of basic molecules in the chemistry of Alzheimer’s disease – the aggregation of amy-loid peptides – using 80,000 pro-cessors on the Curie super >>>

equation describing the quantum "nature" of electrons, it is considered a good compromise in terms of speed and chemical precision. Unfortunately, due to its poor "scalability", taking advantage of platforms with more than a few thousand processors would be very difficult.

Random trajectories
This is why research is focusing on the development of new approaches intrinsically adapted to computers with an arbitrary number of processors. In Toulouse, in the chemistry and


z This model of a flu virus obtained through digital simulation will eventually allow the study of interactions between the main proteins (in orange, ivory and red) and therapeutic molecules. (D. Parton - Oxford University, et al.)


or flops. However, these quantum methods will not "scale" to exascale. In computer jargon, we talk about "non-scalability", meaning that it is impossible for these calculations to exploit the full potential of exascale computers, which will have millions, even billions, of processors. Today, the method of choice in the quantum simulation of living molecules is one that emerged in the 90s: DFT, or Density Functional Theory, which describes electronic structures. While this approach can only be used to approximately solve Schrödinger's


>>> computer, and over the long term we can imagine contributing to the understanding of certain aspects of neurodegenerative diseases". What a challenge! Above all, this offers new hope in developing research on pathologies for which clinical trials are very difficult to organise.

In the more distant future lies the possibility of more targeted medical research. "We are going to start integrating precise digital simulations upstream in all our experimental programmes, which is what we did before with very simplified models, in order to shift to a more efficient predictive phase", forecasts Michel Masella. Cooperation between theory and experimentation is starting to take shape. This should allow researchers to predict highly probable paths to success.

It's all about interactions
"Yet we need to remain realistic, warns Michel Masella. In biology, chemical reactions sometimes depend on tiny things." While this approach is highly promising, we must continue to optimise "pre-selected" molecules. This is why models need to be refined as much as possible. "Little by little we are correcting known defects. For example, we are currently working on long range electrostatic interactions. The existing models only analyse interactions at relatively short distances, which doesn't adequately describe the solvation* of charged molecules. However, this is an important parameter in understanding the reaction of a drug dissolved in a solvent. Now the goal is to adapt the model to charged interactions."

Furthermore, if we want to understand more clearly what happens inside a living cell, we must not forget that in addition to the therapeutic interactions of the drug with a cellular protein, proteins also interact with each other. And, to make matters even more complicated, genetics means that each individual displays variations in these proteins. While producing individualised simulations of therapeutic interactions seems like science fiction today, Michel Masella does not completely dismiss the possibility. "When we are capable of identifying mutations, we will be able to envisage what we refer to as personalisation. But that will take another twenty years!" z MORGANE KERGOAT

z The movement of electrons (white and grey dots) of this beta-amyloid peptide involved in Alzheimer's disease was simulated with the quantum Monte Carlo method. During each step of the simulation, the colour of the electrons is modified. (A. Scemama - CEA, M. Caffarel - CNRS).


THE NEED FOR GREATER PRECISION
Unlike aeronautics or climatology, which have relied on digital models for a long time now, biochemistry is having difficulty integrating them in its culture. In the first two fields, there is no need for highly evolved models to make predictions based on simulations. While in biology, during a test phase a result must corroborate the entire experiment so the method can be confirmed. If this is not the case and a difference is observed, the results are not corrected, but instead it is the entire theoretical model that must be reviewed. This requires calculations with an extreme level of precision and constantly adapted codes. This culture of excellence comes with a trade-off: falling behind other scientific fields that rely more heavily on digital simulations.



would reach the Polynesian coasts in no more than three hours. When an earthquake is detected, researchers analyse its location and magnitude and then estimate if it could form a tsunami. Traditionally, the estimation of the size of the expected tsunami was performed using an empirical law based on previously observed events in Polynesia. This is how the tsunami alert was established for Chile in February 2010. Another method, which the CCPT started to develop in 2009, was used in an operational manner for the Japanese tsunami of 2011. It is based on data from 260 pre-calculated tsunami scenarios, whose hypothetical sources are spread across the main Pacific subduction * zones. This new method reduces uncertainty regarding the height of the expected waves and does so very quickly after the earthquake, but it does not offer a detailed map of expected heights along the coastline.

A triple model
While more precise, a complete simulation of a tsunami with detailed information on the coastline

required a computation time that prohibited any real-time use. However, the parallelisation of code used in high-performance computing offers new possibilities. Digital simulations of tsunamis are based on modelling three phenomena: deformations in the ocean floor caused by an earthquake, displacement of the overlying water and coastal effects. The initial deformation of the ocean floor is calculated using elastic models of the Earth's crustal deformation.

Propagation, on the other hand, involves solving a set of non-linear equations used in fluid mechanics for "long waves": the length of tsunami waves (100 to 300 km) is much greater than the depth of the propagation zone (4 to 6 km). Finally, simulating the effects of a tsunami as it approaches the coastline is only possible if high resolution bathymetric (sea depth) and topographical data are available. The precision of representations of physical processes such as flooding, whirlpools or amplification through resonance in a harbour depends on the resolution of computing grids. With the model we use, we can access a
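For reference, this "long wave" propagation stage is classically described by the non-linear shallow-water equations; a schematic form (leaving aside bottom friction and Coriolis terms, which operational codes may add) is

\[
\frac{\partial \eta}{\partial t} + \nabla \cdot \big[(h+\eta)\,\mathbf{u}\big] = 0,
\qquad
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u}\cdot\nabla)\,\mathbf{u} = -g\,\nabla \eta,
\]

where \(\eta\) is the free-surface elevation, \(h\) the local ocean depth given by the bathymetry, \(\mathbf{u}\) the depth-averaged horizontal velocity and \(g\) the acceleration due to gravity.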

As we observed last March in Japan, tsunamis can devastate entire coastal areas and cause considerable damage. These particular waves translate into successive flooding of the coast – every 20 to 40 minutes – alternating with marked withdrawals of the sea. Tsunamis are generated by strong earthquakes, generally with a magnitude greater than 7.5 on the Richter scale, that occur in subduction zones *. In the Pacific Ocean these phenomena are more frequent, due to the intense tectonic activity in that part of the world.

During the 1960s, after five catastrophic tsunamis in the Pacific, the French Polynesia Warning Centre (CCPT - Centre Polynésien de Prévention des Tsunamis) was created at the CEA Geophysics Laboratory based in Tahiti. Its mission is to ensure constant surveillance of overall seismic activity in the Pacific Ocean in order to alert Polynesian authorities. The closest region where earthquakes could occur is in the Kermadec-Tonga subduction zone. A tsunami triggered in this zone


WHEN A SUBMARINE EARTHQUAKE CAUSES A TSUNAMI, THE TIME REQUIRED TO CALCULATE THE HEIGHT OF THE WAVES EXPECTED ON THE COAST PROHIBITS ANY REAL TIME USE. THANKS TO SUPERCOMPUTERS, THE EFFECTS OF TSUNAMIS ON COASTLINES COULD BE PREDICTED IN FIFTEEN MINUTES.

USING SUPERCOMPUTERS TO IMPROVE TSUNAMI WARNING SYSTEMS

BY ANTHONY JAMELOT AND DOMINIQUE REYMOND

of the Pamatai Geophysics Laboratory in Tahiti.

BY SÉBASTIEN ALLGEYER, FRANÇOIS SCHINDELÉ AND HÉLÈNE HÉBERT

of the Analysis, Surveillance and Environment Department of the CEA Military Applications Division.

z The height of a tsunami wave as it hits the coast (photomontage opposite) is only measured today after the fact. Faster calculations could be used to anticipate waves and evacuate populations.


Since the middle of the 20th century, more than fifteen tsunamis have been observed in the bays of these islands. Considering the magnitude of the Chilean earthquake, the empirical law used for this alert indicated wave heights of up to 3 metres for the Marquesas and 2 metres for Tahiti.

From 36 hours to 15 minutes
The alert level rapidly rose to "red", which implies the evacuation of coastal areas and ports. The maximum water level (from "crest to trough") measured on tide gauges in ports reached more than 3 metres in Hiva Oa and Nuku Hiva, compared to approximately 35 cm in the port of Papeete. The heights observed on the coasts of the Marquesas (excluding tide gauges) range up to 3 metres above low-tide level, but the tsunami hit the coast at low tide. The alert worked well and there were no victims, only property damage – a boat whose owner refused to evacuate. The digital simulation of the event was conducted a posteriori on two hundred CCRT processors, for fifteen bays in the Marquesas Islands. The results were obtained after


bathymetric description with a spatial resolution of up to 5 km for the entire Pacific Ocean and up to 15 metres for harbours and inlets. We have high definition bathymetric and topographic data for nineteen Polynesian bays. Parallelisation of code and the use of the CCRT (Centre de calcul recherche et technologie - Research and Technology Computing Centre) at the CEA's Ile-de-France site have been essential in multiplying the number of studies and reducing uncertainty.

However, what are these techniques worth when compared to actual experience? We were able to answer this question after the Chilean tsunami on February 27th 2010. At 6:34 GMT a major earthquake (magnitude 8.8) occurred off the coast of Chile. As expected, for this type of quake, the tsunami caused significant damage off Chile and then propagated across the Pacific. In French Polynesia, the Marquesas Islands were affected the most. Indeed, gentle submarine slopes and large, open bays with no barrier reefs offer favourable conditions for the amplification of tsunamis.

z Figure 1. Maximum heights after fifteen hours of tsunami propagation. Comparison of records from two tsunami monitoring systems by DART® (Deep-ocean Assessment and Reporting of Tsunamis) and simulations.

SUBDUCTION A geological process by which one tectonic plate is forced under another and sinks into the Earth’s mantle.

only 15 minutes of computation, where this would have taken 36 hours with a single processor. This allowed us to establish the distribution of maximum heights off the coast after fifteen hours of propagation – or after covering a distance of 10,000 km. We could observe that French Polynesia was in the main energy axis (Figure 1).
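These figures give a rough measure of the parallel performance obtained on the two hundred processors used for this a posteriori simulation:

\[
S = \frac{36\ \text{h}}{15\ \text{min}} = \frac{2160\ \text{min}}{15\ \text{min}} = 144,
\qquad
E = \frac{S}{200} \approx 0.72,
\]

i.e. a speed-up of about 144 and a parallel efficiency of roughly 72 %.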

Estimating water heights
Furthermore, digital simulation of the propagated tsunami in increasingly smaller grids around the coast (10 to 15 metres of resolution) has allowed us to estimate maximal water heights as well as horizontal speed fields describing currents at a given moment for the fifteen bays. Conclusion: synthetic tidal curves are comparable to real readings of harbour tide gauges. Similarly, the comparison is consistent with water heights reported in eyewitness accounts or photos, for instance in Tahauku Bay on Hiva Oa Island. A whirlpool photographed in Hakahau Bay (Ua Pou Island), after approximately 11 hrs and 45 min of propagation, was reproduced at the same moment.

This complete simulation of a tsunami, from its source to the coastline, shows how the precision of these results constitutes a vital tool in warning populations, in addition to existing methods. Indeed, knowing the level of flooding of coastal areas in advance, with minimal uncertainty, would revolutionise the management of alerts. This is not possible today, but we can imagine that in the near future we will be able to obtain predictive results from these simulations for all of French Polynesia in less than an hour. This implies, of course, that adequate computing means are dedicated to the warning system. z


tron balance equations: the Boltzmann equation can be used to model the life cycle of neutrons, while the Bateman equation translates the evolution of isotopes over time. These are two "exact" equations, i.e. "without approximation". They use physical quantities that characterise the interaction of particles, such as neutrons, with the fuel and materials of the reactor. And this basic nuclear data is produced through precise experimental measurements.
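As an indication, the isotopic evolution governed by the Bateman equations can be written schematically, for the concentration \(N_i\) of isotope \(i\) under a neutron flux \(\phi\) (the exact production and disappearance terms depend on the decay and reaction chains considered):

\[
\frac{dN_i}{dt} \;=\; \sum_{j \neq i} \big(\lambda_{j\to i} + \sigma_{j\to i}\,\phi\big)\,N_j
\;-\; \big(\lambda_i + \sigma_{a,i}\,\phi\big)\,N_i ,
\]

where the decay constants \(\lambda\) and microscopic cross-sections \(\sigma\) are precisely the basic nuclear data mentioned above.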

The finesse of modelling
All this seems well and good: precise data and equations can model

Whether it is a matter of safety, extending the life cycle of reactors or optimising waste

management, simulation plays an important and increasingly prevalent role in the nuclear power industry. Engineering teams as well as R&D are using it more and more to calculate the normal behaviour of systems, of course, but also to explore fields of operation beyond what experience can measure. This is a major point in nuclear safety. As in any industrial sector, this approach is based on a trifecta: modelling of physical phenomena, digital simulation

TODAY WE CANNOT IMAGINE THE NUCLEAR POWER INDUSTRY WITHOUT SIMULATIONS. THEY ARE USED TO OPTIMISE THE ENTIRE FUEL CYCLE, INCLUDING MANAGEMENT OF NUCLEAR WASTE. THREE DIMENSIONAL MODELLING HELPS ENGINEERS IN THEIR WORK… AND KEEPS REACTORS SAFE.

FUTURE NUCLEAR REACTORS ALREADY BENEFIT FROM HPC


head of the laboratory in the reactor engineering and applied mathematics department of the nuclear energy division of the CEA, expert in high-performance computing applied to reactor physics.

BY CHRISTOPHE CALVIN


and experimental validation. In the field of reactor physics and the fuel cycle, several physical phenomena are the object of calculations. The kinetics and distribution of neutrons in the core determine control of the chain reaction and the nuclear fuel. The propagation of ionizing radiation is calculated both for the protection of people and in order to understand its effects on materials. Finally, the evolution of nuclear fuel is directly linked to the optimisation of fissile resources and waste management.

Theoretical modelling of these phenomena is based on two neu-

z The core of a nuclear reactor is loaded with fuel. Its composition and position are key parameters for the plant’s power and security.


biases of calculation systems for nuclear reactors is one of the key points in designing tomorrow’s reactors.

By associating HPC with modern code, engineers manage to assess these uncertainties more effectively. They can rapidly learn the precise impact of a more or less fine computation grid on the final result. They can therefore define "reference" solutions for assessing uncertainties. And to do this they can rely on significant means: computing power in the hundreds of teraflops *, using more than 30,000 processors simultaneously. And the result: an extremely realistic 3D computation of the reactor core, thanks to the precise solution of a neutron transport equation in just a few hours. With a single processor, the same operation would take at least a year…

Changed at the core
Finally, high-performance computing can be used to implement innovative methods for optimising operational parameters and systematically mastering uncertainties. A typical example of this approach is the combined use of computing power, artificial intelligence methods and a new generation of simulation codes to optimise loading plans for the reactor core. What does this mean?

A reactor core is made up of fuel assemblies. These components remain a certain time inside the reactor core. Their position, depending on their type (nature of the fuel and time spent in the core), constitutes the core loading plan. According to this loading plan, the main operational parameters regarding safety and yields – like the maximum power of the reactor – can be optimised.

Up until now, these optimisations were performed based on human expertise and feedback from existing reactors. However, for new reactor concepts, it is more and more difficult, and above all time-consuming, to optimise fuel loading "manually", considering the number of possible configurations.

The design of an optimisation tool, suitable for determining nuclear fuel loading plans and independent from the configuration under study (core, fuel), can facilitate the work of engineers and above all reduce the time spent on studies. This tool, based on multi-criteria optimisation software integrating genetic algorithms (VIZIR) and the APOLLO3 code, allows the designer to improve reactor performance and safety. This is how, with the help of 4,000 processors, we can now find solutions for complex reactor core loading plans in less than 24 hours. z
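To make the idea concrete, here is a deliberately simplified sketch of a genetic algorithm applied to a loading plan treated as a permutation of assemblies; the fitness function is a made-up placeholder and bears no relation to the real VIZIR/APOLLO3 coupling, in which each evaluation is itself a full core calculation.

# Toy genetic algorithm for a core loading plan, treated as a permutation of
# fuel assemblies over core positions. Purely illustrative: in the real tool
# each candidate plan is evaluated by a full APOLLO3 core calculation.
import random

N_POSITIONS = 20

def fitness(plan):
    # Made-up objective favouring low-index ("fresh") assemblies near the
    # centre; a real objective mixes power peaking, cycle length, margins...
    centre = (N_POSITIONS - 1) / 2
    return -sum(a * abs(i - centre) for i, a in enumerate(plan))

def crossover(p1, p2):
    cut = random.randrange(1, N_POSITIONS)          # order-preserving crossover
    head = p1[:cut]
    return head + [a for a in p2 if a not in head]

def mutate(plan, rate=0.1):
    if random.random() < rate:                      # occasionally swap two positions
        i, j = random.sample(range(N_POSITIONS), 2)
        plan[i], plan[j] = plan[j], plan[i]
    return plan

population = [random.sample(range(N_POSITIONS), N_POSITIONS) for _ in range(50)]
for _ in range(200):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                       # keep the best plans
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(40)]
    population = parents + children

best = max(population, key=fitness)
print("best loading plan found:", best)

The point of the real coupling is precisely that each call to the fitness function is expensive, which is why thousands of processors are needed to explore the space of configurations in an acceptable time.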

physical phenomena with great accuracy. However, these equations involve solving systems with more than... 1,000 billion unknowns! An impossible mission, no matter how much computing power is available. Thus, in order to solve these equations, it is necessary to mobilise two complementary approaches. On the one hand there is the deterministic approach. It relies on physical hypotheses and digital models in order to solve a problem. This is the basis of the APOLLO2 and APOLLO3 codes developed by the nuclear energy department of the CEA (CEA/DEN). On the other hand there is the Monte-Carlo approach. This relies on a native representation of the data to "play out" the life of billions of "useful" neutrons (the TRIPOLI-4™ code, also developed at the CEA/DEN). These digital simulation methods are closely coupled with experiments and measurements.

In the field of reactor physics, high-performance computing (HPC) leads to modifications in the way digital simulations are used. Evolutions in computer codes (algorithms, digital methods, etc.) and the effective use of increased computing power continually contribute to improving the accuracy and finesse of models. The information obtained via digital simulation is also more complete, thanks to the generalisation of three-dimensional calculations – which first appeared in the 90s – and the consideration of multi-physical phenomena (neutron transport, thermohydraulics, fuel modelling, etc.).

Another step forward is the possibility of simulating simultaneously an ever vaster range of power plant components. For example, HPC opens the door to modelling, at the heart of three-dimensional simulations, the reactor core and the boiler while taking into consideration nuclear power and thermo-hydraulic phenomena together – primary water system, steam generators, etc. Thus engineers obtain much more precise results without spending more time than they did before on common simplified models.

Let us take the example of the nuclear simulation code APOLLO3, which was used at the end of 2010 on the new petaflop supercomputer (Tera 100) at the CEA/DAM (military applications department of the CEA) to produce 3D simulations of a fourth generation reactor, the successor of the EPR. Determining uncertainties and

z In this image of a fourth generation reactor core, calculated using APOLLO3 code, each diamond represents an assembly of nuclear fuel. The level of nuclear power for each one is represented by a colour, from blue (no power) to red (maximal power). The blue diamonds in the centre are control rods that are used to manage the nuclear reaction.

FLOPS
(Floating point Operations Per Second) is a unit of measure for the performance of computers. One teraflops equals one thousand billion operations per second (10¹²), one petaflops equals one million billion operations per second (10¹⁵) and one exaflops equals one billion billion operations per second (10¹⁸).


Basle) or ART (Activation-Relaxation Technique, developed by Normand Mousseau at the University of Montreal).

Currently a large number of algorithms have been developed and are being tested to search as quickly as possible for the most stable atomic configurations and energy barriers. These algorithms are required to understand, among other things, protein folding.

The second step consists in comparing the different mechanisms for distributing atoms. To do this, we need to call on statistical physics, and more precisely a Kinetic Monte Carlo algorithm, which consists in randomly "drawing" a new atomic position, depending on the energy barrier that must be overcome. It is therefore possible to conduct numerical experiments of material growth by considering each atom, one by one.
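A minimal sketch of such a kinetic Monte Carlo step is given below, assuming Arrhenius-type rates with an assumed attempt frequency; the barriers are invented for illustration, whereas in the study described here they come from the ab initio energy landscape.

# Minimal kinetic Monte Carlo step: pick an atomic move with a probability
# proportional to its rate, then advance the clock. The barriers below are
# invented for illustration; in practice they come from ab initio calculations.
import math
import random

K_B = 8.617e-5           # Boltzmann constant in eV/K
ATTEMPT_FREQ = 1.0e13    # typical attempt frequency in Hz (assumed value)

def kmc_step(barriers_eV, temperature_K, rng=random):
    """Choose one event among `barriers_eV` and return (event index, time step)."""
    rates = [ATTEMPT_FREQ * math.exp(-e / (K_B * temperature_K)) for e in barriers_eV]
    total = sum(rates)
    # draw an event with probability proportional to its rate
    r = rng.random() * total
    cumulative = 0.0
    for chosen, rate in enumerate(rates):
        cumulative += rate
        if r <= cumulative:
            break
    # residence time of the current configuration (exponentially distributed)
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt

# Example: three possible moves with different energy barriers (in eV)
event, dt = kmc_step([0.5, 0.8, 1.1], temperature_K=900.0)
print(f"selected move {event}, elapsed time {dt:.3e} s")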

However, the calculation of minima and energy barriers using ab initio methods is a handicap for this approach, due to the huge computing power required. Indeed, the Schrödinger equation needs to be solved! For this we need to mobilise the "Kohn-Sham system", based on "density functional theory", honoured by a Nobel Prize in Chemistry in 1998. What does it consist in? The underlying idea is that the electronic density is sufficient to determine the ground state of the electrons for a given atomic position. If we consider that electrons are always in equilibrium when atoms move, it is possible to calculate the atomic forces and find the minima – i.e. the stable atomic configurations – and energy barriers (Figure 1).
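Written schematically and in atomic units (the exchange-correlation approximation is left unspecified here), the Kohn-Sham equations replace the many-electron problem by single-particle equations solved self-consistently:

\[
\Big[-\tfrac{1}{2}\nabla^2 + v_{\text{eff}}[\rho](\mathbf{r})\Big]\,\psi_i(\mathbf{r}) = \varepsilon_i\,\psi_i(\mathbf{r}),
\qquad
\rho(\mathbf{r}) = \sum_i |\psi_i(\mathbf{r})|^2 ,
\]

where the effective potential \(v_{\text{eff}}\) depends only on the electronic density \(\rho\): this is precisely why the density is sufficient to determine the ground state for a given atomic configuration.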

Our group is focused on simulating the growth of graphene

on silicon carbide (SiC) (Figure 2). We have considered surfaces that are periodic in two directions, consisting of 700 atoms, or 2,608 electrons. In one of these directions, the silicon carbide (SiC) crystal is made up of alternating layers of pure carbon and silicon. A surface can therefore end with a layer of carbon or silicon. Ab initio methods are essential in order to describe, in particular, the bonds of the carbon atoms, which are different in silicon carbide and graphene. To give an idea, the calculation of one atomic configuration requires around five hours, using a supercomputer with 600 parallel processors!

Wavelets for good measure
The BigDFT code makes novel use of mathematical functions – wavelets – which until now had been used mainly to compress images. The code has been optimised – in partnership with the Grenoble computer lab – to use several cores for each electron simulated. In practice, it is therefore possible for our system of 2,608 electrons to use more than 2,608 cores. The BigDFT code is also capable of using graphics processors, which reduces computing time by a factor of 10.

The Tera 100 supercomputer can perform calculations 2,000 times faster than a traditional computer. Thus, we can determine two new minima per day with the ART algorithm. To achieve this, the algorithm has been highly optimised. The goal is to reduce by a factor of 4 the number of energy assessments required to determine a minimum and achieve results with only 400 assessments – the basis of a preliminary study that has just been published in the Journal of Chemical Physics.

On the atomic level, materials grow with the movement of atoms as they arrive at the surface and combine with others that are already there. At the same time, this minimises the energy of the system, which is why atoms organise in highly regular networks, like crystals. To study this phenomenon, we need to identify the different mechanisms behind atomic distribution and then compare them at different temperatures. This means simulating matter on an atomic scale.

We can model growth using phenomenological models, i.e. models with very few parameters, based on certain hypotheses, that describe the essential features of a physical phenomenon. However, it is difficult to determine whether such a model really captures the phenomena and whether it will have predictive capacities without conducting real or numerical experiments.

Thus it is better to rely on "first principles", or ab initio, methods based on the Schrödinger equation of quantum mechanics, which are capable of predicting, very precisely, the most stable atomic configurations. They allow the observation of rare events governing growth, such as the movement or dispersion of an atom from one stable site to another.

Energy barriers
In practice, a computing method for the electronic structure, such as the BigDFT code (developed since 2005 by the L_Slim laboratory at the CEA in Grenoble), needs to be combined with an algorithm that searches as exhaustively as possible for minima and energy barriers, such as MH (Minima Hopping, invented by Stefan Goedecker's group in

HOW DO ATOMS LINK UP TO FORM MATERIALS? SIMULATING GROWTH AT THE ATOMIC LEVEL REMAINS A MAJOR CHALLENGE. THE GOAL IS TO MASTER THE FLOW OF ATOMS, TEMPERATURE, THE ELECTRIC FIELD… KNOWLEDGE THAT IS VITAL IN THE FIELD OF NANOTECHNOLOGIES.

WATCHING MATERIALS GROW, ONE ATOM AT A TIME


researcher at the INAC (Institute for Nanoscience and Cryogenics) at the CEA (Atomic Energy and Alternative Energies Commission) of Grenoble, where he is in charge of the atomistic simulation laboratory.

BY THIERRY DEUTSCH

researcher at the INAC.

AND PASCAL POCHET



Nevertheless, we estimate it will take several months to obtain physical data that can be used in the Kinetic Monte Carlo part.

Currently, we are studying growth of graphene on SiC using a surface ending with silicon (Figure 3). Next, we will examine the growth of a complete sheet of graphene using the carbon atoms present in the final layer of silicon carbide. We hope, in the end, to compare our results with experimental data. If these fit, we will have found a mechanism for synthesising graphene on silicon carbide. z

z Figure 1 Energy curve representing the path between two minima for a cage of SiC (120 atoms). There is an energy barrier (bump) between the structure on the left (local minimum) and the structure on the right (global minimum).

z Figure 2 Bare surface of SiC ending in silicon (in green) on a yellow plane. The bottom surface ends in carbon (in black). The box outlined with blue lines corresponds to the calculation box containing 700 atoms.

z Figure 3 View above the SiC surface showing a nanosheet consisting of 16 carbon atoms in blue. The red links in the graphene nanosheet are different from the purple ones in the SiC material.

Source: E. Machado-Charry et al., J. Chem. Phys., 135, 034102, 2011.


It is impossible to solve this system precisely. Numerical analysis must be used to transform it into a system of "approximate" equations that a supercomputer can solve. Calculations are performed on small volumes called cells. The more cells there are, the closer we can come to an accurate solution to the real problem. Practically speaking, dozens, even hundreds, of millions of cells are used. This represents billions of unknowns.

Step-by-step validation
To ensure the computer produces the most realistic representation, validation of results is a two-step process. The first step is validation of parts of the model: a single physical phenomenon, like the mechanical behaviour of a specific material, or a few combined phenomena. To do this, we compare the results of the simulations to experiments performed with two tools: the Airix radiographic machine and the Laser Mégajoule (LMJ).

In operation since 2000 in Moronvilliers, in the Champagne-Ardenne region of France, the Airix facility can produce a flash X-ray image of a nuclear weapon without fissile matter to confirm the initial, pyrotechnical phase. It simulates the compression of matter using non-radioactive materials with comparable mechanical and thermal behaviours. The X-ray images produced during this compression phase are compared to digital simulations. As for the LMJ facility, under construction in Le Barp, near Bordeaux, it will allow us, at the end of 2014, to reproduce and study the nuclear fusion phase. It will be the equivalent of a wind tunnel in airplane design. The LMJ facility will focus the

equivalent of 240 laser beams on a bead filled with two hydrogen isotopes *, deuterium and tritium, triggering their fusion for several billionths of a second.

A prototype, the LIL (Laser Integration Line), consisting of four beams, started operating in March 2002 and has allowed us to confirm the technological choices of the LMJ. The latter, as is already the case for the LIL, will be available for use by the international scientific community in the fields of astrophysics, energy, etc.

Finally, the last step consists of global validation. This means comparing the results obtained with the software to all the measurements recorded during past nuclear tests, notably those conducted in 1996, during the last French programme. This step is used to produce a computation standard, in other words genuine prescriptions for using the digital simulations, defining the trust domain of the simulation at a given time "t".

A world first
In 2001, thanks to the Tera 1 supercomputer with a computing

To simulate the complete deployment of an atomic weapon without relying on nuclear testing, you

need to come up with a mathematical model, solve a series of equations on supercomputers and, finally, confirm the results obtained through laboratory experiments, based on the measurements recorded during past nuclear tests.

The first step, establishing a model, requires, above all, knowledge of the different physical phenomena involved and how they are linked (see "H-bomb basics"). The equations that could reproduce them through computation are already known: Navier-Stokes equations for fluid mechanics, the Boltzmann transport equation for neutrons and diffusion and transport equations for the evolution of matter and photons. By combining these physical models, we can obtain a system of mathematical equations that faithfully reproduces the workings of a nuclear weapon.
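To give an idea of just the fluid-mechanics building block, the compressible Navier-Stokes equations can be written schematically in conservation form (without the source terms that couple them to neutron and photon transport):

\[
\frac{\partial \rho}{\partial t} + \nabla\cdot(\rho\,\mathbf{u}) = 0,\qquad
\frac{\partial (\rho\,\mathbf{u})}{\partial t} + \nabla\cdot(\rho\,\mathbf{u}\otimes\mathbf{u}) + \nabla p = \nabla\cdot\boldsymbol{\tau},\qquad
\frac{\partial E}{\partial t} + \nabla\cdot\big[(E+p)\,\mathbf{u}\big] = \nabla\cdot(\boldsymbol{\tau}\cdot\mathbf{u} - \mathbf{q}),
\]

with \(\rho\) the density, \(\mathbf{u}\) the velocity, \(p\) the pressure, \(E\) the total energy, \(\boldsymbol{\tau}\) the viscous stress tensor and \(\mathbf{q}\) the heat flux.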

Billions of unknowns
The next step, in order to solve this system of equations, is simulation in conditions as close as possible to the real deployment of a nuclear weapon. To achieve this, we need to describe a wide range of particles (neutrons and photons, as well as ions and electrons) on three time scales of less than a millionth of a second. And this must be done in states of extreme pressure, up to a thousand billion times atmospheric pressure! This calculation is incomparably more complex than many other modelling processes, such as meteorology, due to the brevity of the phenomena in play and the intimate links between the physical mechanisms.

HOW CAN WE DESIGN ATOMIC WEAPONS WITHOUT CONDUCTING NUCLEAR TESTS? THE SOLUTION LIES IN COMBINING MODELLING AND SIMULATION. IN THESE FIELDS SUPERCOMPUTERS HAVE PROVEN THEIR UTILITY. AND IN THIS AREA FRANCE IS AHEAD OF THE CURVE.

CALCULATING NUCLEAR DISSUASION


Head of the simulation programme in the military applications department of the CEA.

ISOTOPES are atoms of a chemical element with differing numbers of neutrons.

FLOPS (FLoating point Operations Per Second) is a unit of measure for the performance of computers. One teraflop equals one thousand billion operations per second (10¹²).

H-BOMB BASICS
The first stage in the deployment of a thermonuclear weapon is detonation of an explosive charge (pyrotechnics). In just a few millionths of a second, several thousands of degrees in temperature are reached, triggering fission by compressing the fissile matter (plutonium or uranium) transformed into plasma (ionised gas). This mini nuclear explosion lasts around 100 millionths of a second and raises the temperature to the ten million degrees required to set off the third step, fusion of deuterium and tritium, which are two hydrogen isotopes. The temperature then rises to around a billion degrees for a few millionths of a second.

BY CHARLES LION


our knowledge, this represents a world first. Missiles on board Mirage and Rafale jet fighters began to be equipped with them in 2009.

With a computing power of 1,000 teraflops, Tera 100 represents a giant step forward in modelling capacities. Installed in July 2010, the supercomputer is now fully operational. Three-dimensional simulations are available in large numbers, thus allowing us to guarantee the future nuclear warheads of submarines, which

should be in service in 2015. Simulation will allow us to adapt the computation standard to their specific thermal and mechanical environments. The other fundamental dimension of this research consists of training and certifying future designers of nuclear weapons. They must be able to master digital simulations and also understand their limitations.

The combination of unique facilities, such as the world-class Tera supercomputers and the LMJ, will also attract young and brilliant engineers, physicists and mathematicians and preserve the necessary skills for our dissuasive missions. z

power of 5 teraflops *, set up at the CEA in Bruyères-le-Châtel, we produced our first computation standard, the first step in guaranteeing nuclear weapons with a single simulation. In 2005, Tera 10 (50 teraflops) allowed us to define a computation standard that was confirmed through a much vaster set of experiments, with the first 3D simulations. We could guarantee the performance of an air-launched nuclear warhead with a single simulation, without organising new nuclear tests. To

z The LMJ facility, currently under construction at Le Barp, near Bordeaux, is a major tool in nuclear weapon simulation. Nuclear fusion will be reproduced on a true scale in this experiment chamber that measures 10 metres in diameter.


the centre of galaxies, where intense starbursts can occur.

One month of simulation
So much for theory. How can we describe these phenomena with precision? Since we cannot measure them, because they occur at astronomical distances and on time scales much longer than a human life... we can simulate the evolution of encounters between galaxies! While this seems simple enough at first glance, it is a real scientific

challenge. To simulate the behaviour of galaxies, we need to consider the dynamics of their discs on a scale of tens of kiloparsecs (kpc) *, or a distance of 10²¹ m (a thousand billion billion metres). Regarding the formation of stars and interstellar clouds, the scale that needs to be taken into consideration is the parsec, or approximately 10¹⁶ m. Moreover, the calculations must combine these two scales. Supercomputers have allowed us to start a statistical study of merging galax-

At the beginning of the Universe, small dwarf galaxies started to form, which would combine to

create increasingly massive galaxies. The merging of galaxies also plays a role in the redistribution of the angular momentum * between the visible component of rapidly rotating galaxies (gas and stars) and the dark matter, which is like a halo around the visible matter. They create favourable conditions for gas to collapse at

WHAT HAPPENS WHEN TWO GALAXIES COLLIDE? THIS IS NOT MERELY A QUESTION OF CURIOSITY FOR ASTRONOMERS. THE PROCESSES OF INTERACTION AND FUSION BETWEEN GALAXIES ARE KEY PHASES IN THE BIRTH AND FORMATION OF STARS.

UNDERSTANDING HOW A STAR IS BORN


BY PAOLA DI MATTEO, FRANÇOISE COMBES AND BENOÎT SEMELIN

astrophysicists at the Paris Observatory.



In our simulation method, the system is modelled by a set of "particles", which are monitored as they change under the effect of the different physical processes under consideration. One particle corresponds to either a cluster of stars or a cloud of interstellar gas. The physical processes are essentially gravitational forces exerted between all the particles, but also the pressure of gas, viscosity, shock waves, the formation of stars from gas and the ejection of gas by stars at the end of their life cycle.

To obtain a spatial resolution of 50 parsecs, the particles' mass must be 5,000 times that of the Sun and 30 million particles are required to model the merger of two large Milky Way type galaxies. To assess the gravitational forces between N particles, the simplest algorithms require N² calculations of interactions. At the cost of an approximation, the algorithm we used, called a "tree" and developed in the 1980s, can be used to perform the calculation with a number of operations proportional to N ln(N). Without this significant gain, these simulations would be impossible.
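A back-of-the-envelope comparison for the 30 million particles quoted above shows the size of this gain:

# Back-of-the-envelope comparison of a direct N^2 force calculation with an
# N ln(N) tree code (Barnes-Hut type), for the 30 million particles used here.
import math

N = 30_000_000
direct = N * N                # brute-force pairwise interactions
tree = N * math.log(N)        # tree approximation

print(f"direct sum: {direct:.2e} interactions per force evaluation")
print(f"tree code : {tree:.2e} interactions per force evaluation")
print(f"gain      : about a factor of {direct / tree:,.0f}")

That is roughly 9 x 10¹⁴ interactions against about 5 x 10⁸, a gain of more than a million per force evaluation.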

Hydrodynamic forces
However, gravitational forces are not the only phenomena at work. We must consider the hydrodynamic forces (pressure and viscosity) that characterise the dynamics of galaxies. We calculated these using the SPH method (Smoothed Particle Hydrodynamics), developed at the end of the 1970s, which consists in representing the fluid as a multitude of tiny components covering it.
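In SPH, each fluid quantity is reconstructed from neighbouring particles through a smoothing kernel; in its standard form (the specific kernel and corrections used in this work are not detailed here), the density estimate reads

\[
\rho(\mathbf{r}_i) = \sum_j m_j \, W\!\big(|\mathbf{r}_i - \mathbf{r}_j|,\, h\big),
\]

where \(m_j\) are the particle masses, \(W\) the smoothing kernel and \(h\) the smoothing length; pressure and viscous forces are then obtained from gradients of the same kernel.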

To monitor not only the rotation of galaxies, but also the formation of small dense structures during mergers (molecular clouds whose cycle of dynamic evolution is much shorter), we assessed forces every 250,000 years, or 8,000 times during the 2 billion years covered by a simulation. We can estimate this corresponds to a total of around 10,000 billion calculations of interactions between particles during a single simulation.

If we had only one processor to work with, we would have needed several years to perform these calculations. There was only one solution: use several arithmetic units for each simulation. In the context of the "Grand Challenge" on the

Curie supercomputer, and thanks to parallelisation of the code (OpenMP), each simulation was launched on 32 cores and used more than 50 GB of RAM on each node. The strength of these simulations – which are still in progress and have just started to yield viable data – is to offer results with a previously unheard-of level of spatial resolution. As a first step, this has allowed us to know where, and how efficiently, gases are transformed into stars.

Our vision of galactic dynamics will radically change. Indeed, the nuclear energy of stars is not only "radiated", but also transformed into kinetic energy, through flows of ejected gas. At low resolution, the formation of stars can only be treated in a semi-analytical way, by using a probability proportional to the density of the gas present. However, this is not satisfactory as the instabilities of galactic disks are more subtle.

A different global structure?
Our hope is to treat interstellar gas with greater precision, by considering its dispersion via radiation, collisions between clouds, the shock waves formed, the macroscopic turbulence generated and its viscosity. If we take these different physical parameters into account very precisely, the global structure could be completely transformed. First, because the energy re-injected into the interstellar environment by the stars, stellar winds and supernova explosions may go undetected if the scale considered is lower than the resolution. However, if simulations do not overlook them, this energy transferred within the interstellar environment completely changes its dynamics and could prevent the formation of stars.

Above all, the interstellar environment has several phases, with densities that differ by more than 10 orders of magnitude between the densest clouds and the diffuse environment. At low resolution, these different phases can be simulated by different components, with exchanges of mass between phases, calibrated in a semi-analytical manner. At high resolution we can hope to directly recreate several phases, using the natural instabilities of the environment, and therefore treat full-fledged molecular clouds, without introducing them artificially. z

ies, by conducting approximately fifty simulations, with a spatial resolution of around 50 parsecs. Practically speaking, we performed calculations for over a month on 1,550 cores of the Curie supercomputer, set up at the CCRT (Centre de calcul recherche et technologie - Research and Technology Computing Centre) of the CEA.

Modelling mergers
Therefore we are modelling mergers between galaxies of the same

size – called "major mergers" – as well as the accretion of satellites by a Milky Way type galaxy – or "minor mergers". The results of this violent transformation can then be compared with the evolution of slower internal processes, called secular evolution. The final goal is to understand, in particular, the redistribution of angular momentum between visible and dark matter, the intensity of starbursts, or the morphological properties of merger remnants.

z Distribution of gas after the first encounter between two galaxies of the same size (major merger). When two galaxies pass near each other, some matter may be ejected far outside the disk, forming structures called "tidal tails" that can span several hundred kiloparsecs.

ANGULAR MOMENTUM
A vector parallel to the axis of rotation whose magnitude characterises the amount of rotation of a system.

PARSEC
Parallax of one second, a unit of length used in astronomy that equals 3.26 light years, or 206,265 astronomical units (the Earth-Sun distance).


practical cases. For example, the transition from an elastic to a plastic state: above a certain threshold pressure, propagation of a shock causes the creation of irreversible structural flaws.

Because it significantly affects the mechanical properties of the material, plasticity requires very precise modelling, based on a description of the underlying elementary mechanisms.

Plasticity and rupture
We were particularly interested in studying the plasticity of diamonds. This is a real challenge because it involves not only taking into account the complexity of the process, but also setting up an interaction model capable of reproducing the many different states of carbon (diamond, graphite, etc.). The Stamp code is capable of describing carbon very precisely, but the trade-off is a significant computation time. We used it to simulate the propagation of shocks with various intensities on the principal crystallographic orientations of a diamond (Figure 1).

Thus, no less than 1.3 million carbon atoms were considered for a calculation performed on 6,400 cores of the Tera 100 supercomputer. The results? We observed multiple structural defects, signifying the appearance of plasticity. This highly detailed measurement of the material's state will be an invaluable tool in constructing a model that can be used with a code working on a macroscopic scale.

During its propagation, a shock encounters interfaces between materials and, in particular, so-called free surfaces forming the frontiers of the system with ambient air. However, when it is reflected on a free surface, a shock

can cause different phenomena that depend on its intensity, as well as the local topology of the surface, including its roughness. These phenomena can significantly affect the properties of a material and damage elements in its environment such as, for example, a measurement system.

This is the case of rupture by spalling. This phenomenon occurs when the expansion (or decompression) wave produced by the reflection of a shock on a free surface encounters the one following the initial shock. This causes sudden local tensioning of the material, which can lead to its rupture. Spalling is currently the object of studies associating large-scale simulations of molecular dynamics and specific experiments.

Ejection of matter
Another phenomenon caused by shocks is the projection of matter. This is caused by a defect in the flatness of a free surface, such as, for example, a scratch produced by a machining tool. Depending on the type of defect, the matter ejected can take on the form of small aggregates, jets, etc. If we want to protect the surrounding environment (measurement apparatus, coatings on experiment chambers, etc.) we need to control ejection mechanisms. The conditions for forming a jet, for example, depend on relatively well-known hydrodynamic conditions. Its subsequent behaviour, and particularly its fragmentation, is more difficult to grasp. Once again, we can analyse the problem in an extremely precise manner but on a reduced spatial scale – thanks to a digital experiment in molecular dynamics. Thus, we simulated the formation of a jet by considering 125 million cop-

What happens, on a molecular level, when a material withstands an intense shock after, for example, an explosion or violent collision? A shock can be seen as an extremely rapid compression – lasting around 10⁻¹¹ s – and an intense one – 10¹⁰ pascals, or 10⁵ atmospheres, or more. It is still impossible to observe experimentally, with precision and in real time, what occurs on an atomic scale. We do know that it involves complex phenomena: plasticity (irreversible deformation of the material), damage (cracks, breaks), chemical decomposition, etc. To get a more precise idea, one solution consists in simulations on petaflop computers.

Molecular dynamics is a widely used simulation method. It consists in solving the equations of motion for a set of particles (atoms, molecules) interacting through a predefined force. In recent years, computers have allowed us to conduct phenomenological studies of small systems – a few million atoms, or sizes of a few hundredths of a micron.
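The time integration at the heart of molecular dynamics is conceptually simple; below is a minimal sketch using velocity Verlet, a common integrator, with a made-up harmonic force. This is an assumption-laden illustration, not the Stamp code's actual force model or integration scheme.

# Minimal molecular dynamics loop (velocity Verlet integrator), for
# illustration only. The harmonic force is a toy stand-in for the real
# interatomic potentials used in production codes.
import numpy as np

def toy_forces(positions, k=1.0):
    """Hypothetical force model: each atom is pulled back towards the origin."""
    return -k * positions

def run_md(positions, velocities, masses, dt=1.0e-3, n_steps=1000):
    forces = toy_forces(positions)
    for _ in range(n_steps):
        # velocity Verlet: half kick, drift, recompute forces, half kick
        velocities += 0.5 * dt * forces / masses[:, None]
        positions += dt * velocities
        forces = toy_forces(positions)
        velocities += 0.5 * dt * forces / masses[:, None]
    return positions, velocities

rng = np.random.default_rng(0)
pos = rng.normal(size=(1000, 3))     # 1,000 atoms here; real runs reach millions
vel = np.zeros((1000, 3))
mass = np.ones(1000)
pos, vel = run_md(pos, vel, mass)
print("mean squared displacement:", float(np.mean(pos ** 2)))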

With the Tera 100 supercomputer, we can now study systems with dimensions of nearly a micron, which can be compared with experiments, while using forces that more closely describe the complexity of the interactions between atoms.

To successfully perform these simulations, we needed to rethink software programmes so that they take full advantage of the computing power of massively parallel systems. This is the case of the Stamp code, developed by the military applications department of the CEA (CEA/DAM) over the last fifteen years. Thanks to this code we were able to simulate some

MECHANICS AFFECTING MATERIALS DURING INTENSE SHOCKS ARE STILL NOT FULLY UNDERSTOOD. TO STUDY THEM, WE NEED TO DESCEND TO ATOMIC LEVEL. IRREVERSIBLE DEFORMATIONS, DAMAGE, DECOMPOSITION… HIGH-LEVEL SIMULATIONS ARE REQUIRED TO UNDERSTAND THE PHYSICS OF SHOCKS.

THE PHYSICS OF SHOCKS ON AN ATOMIC SCALE


engineers in the military applications department of the CEA.

BY LAURENT SOULARD, NICOLAS PINEAU, OLIVIER DURAND AND JEAN-BERNARD MAILLET


ing from a few microns to several millimetres, depending on the nature of the explosive. However, simulations of chemical reactions, which necessarily require taking electrons into account, are limited to nanometric scales.

Revisiting models
How can we link these two scales? With a model that simulates the evolution of a set of "super particles" representing one or more molecules with their own internal dynamics. The dynamics of this system is dictated by an extension of traditional Hamiltonian


per atoms thanks to computations performed by 4,000 cores on Tera 100 (Figure 2). As with our study of plasticity, this data will enable us to construct models on a macroscopic scale.

Finally, if the shock is propagated in a material that is also the seat of chemical reactions, we can expect a state of detonation, i.e. the propagation of a chemical reaction by the shock wave. We can understand this phenomenon by studying the molecular mechanisms behind it. The chemical decomposition of the explosive is produced over a thickness rang-

z Figure 1. Propagation of a shock wave in the single crystal of a diamond. Behind the shock front (white line), multiple structural defects show the appearance of plasticity (1 ps equals 1 picosecond, or 10⁻¹² second).


mechanics. They therefore remain compatible with a molecular dynamics code. Simulations of nitromethane (CH3NO2) with 8,000 cores on Tera 100 demonstrated that the appearance of a detonation is a much more complex process than suggested by experimental observations.

Simulation can be used to revisit macroscopic models. It complements experiments so we can understand these complex physics. Intimately linked to the power of computers, its importance can only grow over the coming years. z

z Figure 2. Formation of a jet caused by the reflection of a shock on a surface with a defect in flatness. A total of 125 million copper atoms and 4,000 processor cores on Tera 100 were required for this calculation.


Research engineer at the CEA.

BY CHRISTOPHE DENOUAL


the simulation time, limits the correlation between different points in space. In practice, increasing the size of the calculation box beyond this "maximal correlation distance" contributes nothing more if the duration of the simulation is not increased proportionally. However, while the power of computers has increased with the multiplication of processor cores, the power of each core has not really evolved.

Today, as was the case ten years ago, it is very difficult in practice to exceed a million computing cycles, with each one corresponding to a length in time of one femtosecond (one millionth of one billionth of a second), or a total simulation time of one nanosecond. This limited duration corresponds to a maximal correlation distance of only a few micrometres and restricts the emergence of larger microstructures.
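The order of magnitude follows from a simple estimate: elastic information propagates at roughly the speed of sound, typically a few thousand metres per second in a metal (the exact value depends on the alloy), so over the accessible nanosecond of simulated time

\[
d_{\max} \approx c_s \, t_{\text{sim}} \approx 5000\ \text{m/s} \times 10^{-9}\ \text{s} = 5\ \mu\text{m},
\]

consistent with the few micrometres quoted above.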

The largest simulations find themselves up against this "wall of time". It can be overcome by changing the representation, by grouping together several atoms in a complex molecule, for example, to optimise calculations. In the same spirit, we opted in 2010 for a method that allows for a radical change in the "granularity" of the representation of matter by forming groups of 10, 100 or 1,000 atoms, while ensuring a certain equivalence between the energy in this set and the atomic (and detailed) underlying representation.

This approach is based on a particularly compact representation of the energy landscape involved in phase transformations. Each of the stable states is represented by a minimum of energy, the level of which is estimated (before calculations) by a modelling technique at the atomic scale. During transformation, the ma-

It was in 1970 that we first discovered metal alloys had surprising properties: when subjected to weak tensile stress,

they can stretch to a certain point and then, when "released", return to their initial shape. These complex alloys – often nickel and titanium based – whose behaviour is often qualified as "pseudo-elastic", have rapidly been used for many applications, like frames for eyeglasses or surgical stents.

The mechanism behind such major deformations is called "martensitic transformation" (Figure 1): this involves a modification in the crystal structure after a change in the environment, which can be either an applied strain, pressure or change in temperature. In metals, these transformations are extremely commonplace and are the object of intense research. While they rarely allow such spectacular behaviours as pseudo-elasticity, they can induce, under extreme strain, significant deformations, and play a fundamental role in the behaviour of metals.

Our understanding of these mechanisms on an atomic level has grown a great deal, thanks to modelling martensitic transitions using molecular dynamics shock simulations conducted with millions of atoms. For prolonged periods of stress, the final microstructure becomes very complex and stretches over broad areas of space, often up to one millimetre. Can molecular dynamics simulate martensitic transformations over such broad fields?

The wall of time
During such a simulation, equilibrium is established via the propagation of deformation waves. The maximum distance covered by these waves, proportional to

FOR AROUND FORTY YEARS NOW METAL ALLOYS HAVE REVEALED ASTONISHING MECHANICAL PROPERTIES: THEY CAN UNDERGO MAJOR DEFORMATIONS BEFORE RETURNING TO THEIR INITIAL SHAPE. REACHING AN UNDERSTANDING OF THIS PHENOMENON ON THE ATOMIC LEVEL IS ONLY POSSIBLE WITH HUGE COMPUTING POWER.

MARTENSITIC DEFORMATIONS SEEN THROUGH THE PROCESSOR PRISM

MAJOR CHALLENGES


Figure 2. Appearance of lamellae: a deformation, applied here over 1 microsecond to an iron-nickel alloy, induces a complex martensitic transformation. The different phases stack into large lamellae, which in turn form bands.


During the transformation, the material shifts from one stable state (or well) to another via a “reaction pathway”, i.e. a series of states that allows a smooth transition between two wells. This reaction pathway tree (Figure 1) is duplicated for all the cells in the calculation, each minimising its energy as much as possible.
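A toy illustration of this per-cell minimisation in Python; the two-well energy function, its parameters and the barrier test are invented for the example and stand in for the precomputed atomic-scale energy landscape described in the article:

```python
def cell_energy(state, strain, wells, stiffness=10.0):
    """Energy of one coarse-grain cell sitting in a given well (arbitrary units)."""
    preferred_strain, depth = wells[state]
    return depth + 0.5 * stiffness * (strain - preferred_strain) ** 2

def relax_cell(current_state, strain, wells, barrier=0.005):
    """Let a cell hop to another well if the energy gain exceeds a small barrier."""
    energies = {s: cell_energy(s, strain, wells) for s in wells}
    best = min(energies, key=energies.get)
    if energies[current_state] - energies[best] > barrier:
        return best            # transformation along the reaction pathway
    return current_state       # stay in the current well

# Two variants only: austenite (zero preferred strain) and one martensite variant
wells = {"austenite": (0.00, 0.000), "martensite": (0.08, 0.010)}
state = "austenite"
for applied_strain in (0.00, 0.02, 0.05, 0.09, 0.00):
    state = relax_cell(state, applied_strain, wells)
    print(f"strain {applied_strain:.2f} -> {state}")
```

Loading past a critical strain switches the cell to the martensite well and unloading switches it back, a pseudo-elastic cycle in miniature; a real calculation repeats this over millions of cells, each with its nine-dimensional tree of wells.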

The reaction pathway tree
Let us take, for example, the calculations for an iron-nickel alloy (Figure 2). Conducted on Tera 100 with more than 4,500 processors, they allowed us to reach cubes with sides measuring 0.5 microns * for a simulation time of 1 microsecond.

The microstructures that emerge from these calculations show alternating lamellae made up of different variants of martensite with very flat edges. On a larger scale, these lamellae also stack to form relatively straight bands, which are in turn contained within a broader corridor.

To obtain this three-level nested structure, it is important to guarantee excellent resolution (here each cell represents around 100 atoms), large calculation boxes and sufficiently long computing times for the larger structures to emerge. Only coarse-grained computing, carried out on thousands of cores, can reach these levels of resolution and these scales in space and time.

The results we have obtained today are based on a reaction pathway tree calculated using simplified atomic potentials. Yet, in reality, these transformations are induced by a recomposition of the atoms' electronic structure, which, at the scale of an individual atom, can only be described with a quantum approach. Simulating them requires a different method: ab initio calculations of electronic structures. These are also highly complex and greedy in computing time, but they too benefit from the massive parallelism typical of high-performance computing. We thus obtain a reliable estimate of the reaction pathways.

By combining these two approaches, we will soon have a unified vision of a microstructure from the scale of one millimetre down to the scale of one atom. These transformations, which are sometimes so fast that they do not show up in the most precise diagnostics, can therefore be analysed in detail, and the mystery of how they form can be partially solved.

Figure 1. Martensitic transformations: iron-nickel alloys can easily shift from a face-centred cubic structure (upper left) to a body-centred cubic structure (upper right), provided there is a homogeneous deformation of the cell (white frame). The phase transformations are represented as lines (in grey) connecting two stable states (coloured dots). The space of the tree (represented here simply in 2 dimensions) is generally 9-dimensional.

MICRON
Or micrometre, equals one thousandth of a millimetre (10⁻⁶ m).


CPUs (Central Processing Units) distributed in a string along a single specific dimension, are becoming obsolete for this type of calculation.

Describing how plasma is formed
Indeed, simulating a laser pulse less than one millimetre in diameter would monopolise 128 CPUs for several months. With centimetre-sized beams like LMJ (Laser Mégajoule), every operation would have to be multiplied by 10,000! Moreover, in addition to optical mechanisms, it is necessary to simulate the deterioration of the material. In practical terms, this means calculating the formation of plasma, during a few femtoseconds, when the laser's intensity reaches several dozen TW/cm² *.

The solution lies in the use of another type of processor: graphics processors, or GPUs (Graphics Processing Units). Originally designed for video games, recent GPUs

offer high processing power, thanks to several thousand arithmetic units that can function simultaneously.

The programmer's art consists in organising all these resources by expressing the algorithm with the help of thousands of lightweight processors within a special system called a “computational grid”. Gains in speed for GPUs, compared to their precursor, the CPU, can represent a factor of 50. They can solve systems of non-linear equations in just a few hours, a process that still took several days only a year ago.
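A minimal sketch of this “computational grid” pattern, written here with the numba.cuda Python library (our choice of example; the article does not say which programming environment was used). Each lightweight thread updates one grid point of the laser field, applying the intensity-dependent phase shift that drives self-focusing; the constant passed at launch is an illustrative value, not one taken from the article:

```python
import math
import numpy as np
from numba import cuda

@cuda.jit
def kerr_phase_step(field_re, field_im, k0_n2_dz):
    """One non-linear step: each GPU thread handles a single point of the field."""
    i = cuda.grid(1)                        # global index within the computational grid
    if i < field_re.shape[0]:
        intensity = field_re[i] ** 2 + field_im[i] ** 2
        phase = k0_n2_dz * intensity        # Kerr phase, proportional to local intensity
        c, s = math.cos(phase), math.sin(phase)
        re_new = field_re[i] * c - field_im[i] * s
        im_new = field_re[i] * s + field_im[i] * c
        field_re[i] = re_new
        field_im[i] = im_new

n = 1_000_000
re_dev = cuda.to_device(np.ones(n, dtype=np.float32))
im_dev = cuda.to_device(np.zeros(n, dtype=np.float32))
threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
kerr_phase_step[blocks, threads_per_block](re_dev, im_dev, 1e-3)
```

In a full propagation code this non-linear step would alternate with a linear diffraction step, but the one-thread-per-grid-point organisation is the same.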

Today, GPUs offer the possibility of simulating laser pulses lasting 3 nanoseconds, with a resolution of 30 femtoseconds (or five orders of magnitude on one dimension), in less than a week. Our next challenge? Reaching the femtosecond to describe the formation of plasma. Using graphics processors to visualise light... what could be more natural?
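The five orders of magnitude quoted above follow directly from the ratio of the two time scales:

$$\frac{3\ \mathrm{ns}}{30\ \mathrm{fs}} = \frac{3 \times 10^{-9}\ \mathrm{s}}{3 \times 10^{-14}\ \mathrm{s}} = 10^{5}.$$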

THE FIRST LASER BEAMS WERE PRODUCED MORE THAN FIFTY YEARS AGO! HOWEVER, THE BEHAVIOUR OF THIS LIGHT, AS IT PASSES THROUGH VARIOUS MATERIALS, STILL RAISES MANY QUESTIONS. NON-LINEAR OPTICS, SELF-FOCUSING...THANKS TO SUPERCOMPUTERS, THE LASER BEAM STILL HAS MANY SURPRISES IN STORE FOR US.

USING GRAPHICS PROCESSORS TO VISUALISE LIGHT

It is one of the basic lessons of “linear” optics: light behaves differently in different media. Each material has a specific

refractive index. Yet, this principle has been challenged since the 1960s with the advent of laser sources: the refractive index of transparent matter (gases, glass, etc.) can depend on the intensity of the light source. The study of this phenomenon is called “non-linear optics” (NLO).

In certain conditions, for example when the power output of a laser beam is greater than a threshold value, the index of the medium increases continually along the optical path. Consequently, the laser pulse focuses as if through a magnifying glass or a contact lens. This is called “self-focusing”.
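In non-linear optics this intensity dependence is commonly written (a standard textbook form, not spelled out in the article) as

$$n(I) = n_0 + n_2 I,$$

where $n_0$ is the ordinary refractive index and $n_2$ the non-linear index; when $n_2 > 0$, the most intense part of the beam sees the largest index, which bends the light inwards and produces the self-focusing described here.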

Obsolete architecture
This strange property is encountered in experiments using moderate-energy laser sources, of just a few millijoules, but with pulses lasting a femtosecond *. It can also be observed if we use high-energy sources (a dozen kilojoules) but longer pulses, of one nanosecond *. This is the case, for example, of high-power laser installations, where self-focusing can be observed in silica glasses and leads to a fragmentation of the optical pulse into a multitude of micrometric filaments with an intensity one thousand times greater than the incident wave.

To describe these non-linear dynamics, we can conduct numerical simulations on high-performance supercomputers, such as Titane or Tera 100, located at the CEA centre in Bruyères-le-Châtel. However, traditional parallel computing techniques, which consist in calculating the laser field simultaneously on different

FEMTOSECOND
A nanosecond (1 ns) equals 10⁻⁹ seconds. A picosecond (1 ps) equals 10⁻¹² seconds. A femtosecond (1 fs) equals 10⁻¹⁵ seconds.

TERAWATT (TW)
Equals one trillion (10¹²) watts.

[Figure: simulated laser intensity map; axes x (mm) and t (fs), colour scale in TW/cm². See the caption below.]

BY LUC BERGÉ, head of research in the theoretical and applied physics department at the CEA centre in Bruyères-le-Châtel, AND GUILLAUME COLIN DE VERDIÈRE, research engineer in the simulation and information sciences department at the CEA centre in Bruyères-le-Châtel.

In this 3D simulation of a “small” laser beam, lasting a picosecond and 0.5 mm in diameter, we can observe that the pulse is initially homogeneous, but then breaks up under the effect of self-focusing into a multitude of highly intense filaments, each with a micrometric diameter and a duration of several femtoseconds.


INCREASING THE POWER OF SUPERCOMPUTERS WILL INVOLVE DRASTICALLY IMPROVING ENERGY EFFICIENCY. MEMORIES AND PROCESSORS THAT ARE LESS GREEDY, MASSIVELY PARALLEL ARCHITECTURES, OPTIMISED SOFTWARE, COOLING SYSTEMS: RESEARCHERS ARE EXAMINING EVERY POSSIBLE LEAD TO REDUCE THEIR VORACITY.

THE NEXT CHALLENGE: CONTROLLING ENERGY CONSUMPTION

It consumes nearly 10 megawatts when running at full capacity, or the annual consumption (heating excluded) of 5,000 households!

Thanks to all this energy, it can perform 8.16 million billion operations per second, or 8.16 petaflops *, according to the Linpack benchmark test *, placing it in the No. 1 slot of the TOP500, an international ranking of supercomputers which, while sometimes challenged, remains widely used.

In absolute terms, the amount of electricity consumed by Super K is impressive, but, paradoxically, it is also one of the most energy-efficient machines of its kind.

High performance computing is an Olympic undertaking. To set new records, the “athletes”

in the field have one priority today: reducing their consumption... of electricity. The voracity of these machines has indeed become a major drawback to the development of more powerful supercomputers.

Currently, they require huge amounts of electricity. The world champion since last June is the Japanese system baptised Super K, co-developed by Fujitsu and the RIKEN Advanced Institute of Computational Science in Kobe.

Its “performance per watt” is 824 megaflops * (8.16 petaflops divided by 9.9 megawatts), while the average energy efficiency of the top ten supercomputers is only 463 megaflops per watt. This good performance is still insufficient to envisage building a new machine capable of the goal set in 2009 by key players in the field: breaking the exaflops * barrier, or reaching a billion billion operations per second. “We reckon that at the start of 2012 the most powerful supercomputers will deliver 10 petaflops while consuming 10 megawatts, or around one petaflops per >>>


Equipped with 68,544 processors, Super K is the world's most powerful supercomputer, according to benchmark tests last June. It is also one of the most efficient in terms of energy consumption.

THE FUTURE: EXASCALE COMPUTING

FLOPS
(Floating-point Operations Per Second) is a unit of measure for the performance of computers. One teraflops equals one thousand billion operations per second (10¹²) and one exaflops equals one billion billion operations per second (10¹⁸).

LINPACK
A benchmark test used to measure the time required for a computer to solve a system of n linear equations with n unknowns.


Opening or closing simultaneously with each cycle of operations, these transistors need a phenomenal amount of energy, a large share of which is, moreover, dissipated in the form of heat (Joule effect), which entails installing cooling systems that also consume large amounts of energy.

More than 500,000 cores
Over the last thirty years, component manufacturers (Intel, IBM, AMD, Fujitsu, Nvidia, etc.) have continually refined the etching process, i.e. reduced the diameter of the smallest

wire linking two components in a circuit. They have also developed alloys that have increased the switching frequencies of transistors while limiting chip voltages.

>>> megawatt,” explains Franck Cappello, co-director of the Joint Laboratory for Petascale Computing formed by INRIA and the University of Illinois at Urbana-Champaign (USA) and a specialist in the race to break the exaflops barrier. “By extrapolating these results, an exaflopic supercomputer would consume 1,000 megawatts. This is unacceptable.”

Even if demand for computers capable of exaflopic performance is strong, the cost of the energy required is not economically sustainable. For a supercomputer “burning” up to 1,000 megawatts at full capacity, or as much as a space shuttle during lift-off, the annual utilities bill would exceed 500 million euros in France, or around twice the cost of the supercomputer itself (between 200 and 300 million). “The goal is to design an exaflopic supercomputer by 2018 that will consume only 20 megawatts, as this corresponds to the maximum capacity of the infrastructures required to host such a machine,” explains Franck Cappello. “However, there are those who believe it will be difficult to remain under 50 megawatts.”
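The 1,000-megawatt extrapolation and the 20-megawatt target translate into the following efficiency figures:

$$\frac{10\ \mathrm{PFlops}}{10\ \mathrm{MW}} = 1\ \mathrm{GFlops/W} \;\Rightarrow\; 1\ \mathrm{EFlops}\ \text{at}\ 1\ \mathrm{GFlops/W} = 1\,000\ \mathrm{MW}; \qquad \frac{1\ \mathrm{EFlops}}{20\ \mathrm{MW}} = 50\ \mathrm{GFlops/W} = 50\,000\ \mathrm{MFlops/W}.$$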

With a goal of 20 megawatts, the energy efficiency of an exaflopic supercomputer would be 50,000 megaflops per watt! Is it possible to multiply the power of current supercomputers by 100 and still improve their energy efficiency 50-fold? To meet this challenge, researchers are exploring all possible solutions. The first area of improvement consists in developing microprocessors and memories that use less energy. To perform billions of operations per second, supercomputers need more and more processors and memories that contain a multitude of transistors, each requiring an electrical current.


THE ENERGY EFFICIENCY OF THE WORLD'S TOP TEN SUPERCOMPUTERS

RANK | NAME | MANUFACTURER | COUNTRY | POWER (PETAFLOPS) | ENERGY CONSUMPTION (MEGAWATTS) | ENERGY EFFICIENCY (MEGAFLOPS/W)
1 | Super K | Fujitsu | Japan | 8.16 | 9.9 | 824.6
2 | Tianhe-1A | NUDT | China | 2.57 | 4.04 | 635.1
3 | Jaguar | Cray | United States | 1.75 | 6.95 | 253.1
4 | Nebulae | Dawning | China | 1.27 | 2.58 | 492.6
5 | Tsubame 2.0 | NEC/HP | Japan | 1.19 | 1.4 | 852.3
6 | Cielo | Cray | United States | 1.11 | 3.98 | 278.9
7 | Pleiades | SGI | United States | 1.09 | 4.10 | 265.2
8 | Hopper | Cray | United States | 1.05 | 2.91 | 362.2
9 | Tera 100 | Bull | France | 1.05 | 4.59 | 228.8
10 | Roadrunner | IBM | United States | 1.04 | 2.35 | 444.3


“If we reduce the voltage by half, we need to multiply the number of cores by four to maintain the same level of performance.”

This prototype silicon circuit, designed by Intel, produces a laser beam that can be used to exchange up to 50 gigabits of data per second – a boon for supercomputer manufacturers.


This is a major area for reducing electricity bills. “In 2018, supercomputers will have hundreds of millions of cores, compared to 300,000 today,” predicts Franck Cappello.

Greedier than ever
Another lever for increasing energy efficiency consists in improving memories and communication links between components. Currently, transferring data between processors and memories consumes more energy than processing itself! Once again, designers are counting on component manufacturers to innovate.

The first of such innovations consists in implementing optical links instead of copper wires to

connect cabinets and circuit boards, as well as components on chips, by replacing certain pathways in printed circuits.

By using photons, rather than electrons, to transfer data, photonic links created by silicon chips emitting and receiving a laser beam have reached very high throughputs in laboratory tests (more than 50 Gb/s) and could reduce energy consumption tenfold. “Implementation of an end-to-end optical communication network will represent a major technological breakthrough in the race to break the exaflops barrier,” underlines Patrick Demichel, a systems architect specialised in intensive computing at Hewlett-Packard. Manufacturers are developing >>>

But the heat given off by these increasingly small surfaces had already reached unacceptable levels in 2004, and manufacturers were forced to limit chip frequencies, which now rarely exceed 4 gigahertz (GHz). Inside a supercomputer they generally oscillate between 1 and 3 GHz.

Speed could not be increased, so progress in miniaturisation was used to serve parallelisation. Each microprocessor now contains several cores, processing units capable of working alone or in parallel. Super K, for example, uses 68,544 SPARC64 VIIIfx processors with 8 cores each, made with a 45-nanometre (nm) process and running at 2 GHz, for a total of 548,352 cores. And this is just the start. “We predict that in 2018 chips will be etched at 8 nm and processors will contain more than 1,000 cores,” forecasts Franck Cappello.

The proliferation of cores can improve energy efficiency by reducing chip voltage, a technique called “voltage scaling”. By reducing a processor's voltage, consumption is reduced, but performance is also diminished and more cores are required to compensate for this. If we reduce the voltage by half, we need to multiply the number of cores by four to maintain the same level of performance.
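A first-order model of CMOS dynamic power (a standard textbook relation, not one given in the article) makes this trade-off explicit:

$$P_{\mathrm{dyn}} \approx \alpha\, C\, V^{2} f,$$

where $\alpha$ is the switching activity, $C$ the switched capacitance, $V$ the supply voltage and $f$ the clock frequency. Halving $V$ divides the dynamic power of each core by roughly four, but the core must then also be clocked more slowly, so several slower cores are needed to recover the lost throughput; this is the balance described above.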


CO-DESIGN TO THE RESCUE
Considering the energy performance of current general-purpose supercomputers, some researchers believe we will need to consider co-design in order to develop an exaflopic machine that consumes less than 50 megawatts. Co-design consists in creating hardware architectures according to the application they will run, and therefore creating specialised supercomputers offering the best performance and/or energy efficiency. It is highly probable that the first exaflopic supercomputer will be a specialised machine. The drawback, of course, is that it may be too specialised.

The Green Flash project aims to design a climate simulator capable of running at 200 petaflops on only 4 megawatts, or an energy efficiency of 50,000 megaflops per watt. To achieve this, it will deploy a massively parallel architecture, based on 20 million cores and created specifically for its target application. A prototype demonstrated the relevance of this approach at the end of 2010, but Green Flash still lacks the necessary funding, estimated at $75 million, to construct the entire machine.


Experts estimate that consumption, excluding cooling systems, for an exaflopic supercomputer could reach 150 megawatts in 2018. Its processors would consume 50 megawatts, the memory 80 megawatts and the network 20 megawatts. Then we would have to add the consumption of the cooling system, an additional 50 to 70%, for a total consumption in excess of 200 megawatts. This is still much too high.

Graphics processing units
If we want to make further progress, the supercomputer's architecture will be a determining factor. Designers are examining the nature of the processing units inside supercomputers.

There are two competing trends. Some favour machines using a single model of general-purpose processor – like Super K – while

others combine general-purpose processors (CPU, or Central Processing Unit) and Graphics Processing Units (GPU *) within “hybrid” machines – like the second most powerful supercomputer in the world, Tianhe-1A, located in Tianjin, China, capable of reaching 2.5 petaflops with 4 megawatts.

Used in this case for performing calculations instead of displaying graphics, GPUs serve as an accelerator for certain applications and improve overall energy efficiency. In addition to their specialisation, their current weakness is that they do not communicate rapidly with other GPUs, because they need the CPUs to act as intermediaries. For applications where processors communicate a great deal with each other, this approach is not relevant. Yet, this situation could evolve.

>>> memory chips that enable 3D stacking thanks to vertical communication interfaces. It will therefore be possible to stack them above processors and reduce distances between components. But it is above all the advent of non-volatile memories, such as phase-change memories * or memristors *, that will considerably reduce electricity consumption.

Unlike current DRAM, non-volatile memories do not require continual power. “If they are sufficiently efficient, they could allow us to change the architecture of machines and simplify fault-tolerance systems, which would have fewer backups to perform on a hard drive or SSD * and would require only a few micro-checkpoints,” explains Franck Cappello.

These estimates consider technological changes in components alone, without taking architectures into account.


COOLING SYSTEMS ARE THE FOCUS OF SPECIAL ATTENTION
Cooling systems, which ensure supercomputers run smoothly, generally represent 50 to 75% of energy consumption. Designers are therefore developing all sorts of innovations to make them more efficient. A top priority for manufacturers is “free cooling”, i.e. systems where air circulates without using a heat pump. Whenever possible, it is best to select a site in a cool region, but free cooling alone remains out of reach for an exaflopic supercomputer. Manufacturers are therefore trying to find complementary solutions that use as little energy as possible. For example, the cabinet doors that contain processor clusters can also contain chilled-water circuits. Certain manufacturers are producing printed circuit boards that include cooling circuits, and IBM is even developing chips covered with micro-channels filled with coolant. Despite these efforts, some experts believe that the owners of supercomputers should sell the heat recovered to achieve a more balanced economic model. This is an attractive idea, but difficult to put into practice.

The cabinet doors of the most powerful French supercomputer, Tera 100 (ranked 9th worldwide), built by Bull, contain a heat exchanger, fans and a chilled-water cooling circuit linked to a suspended ceiling network that is 1 km long. Each door dissipates 40 kW per cabinet.


PHASE-CHANGE MEMORY
Phase-change memories (or PRAM, for Phase-Change Random Access Memory) record data in a vitreous material that changes state when an electric current is applied. They do not need to be powered continuously.

MEMRISTOR
A memristor is a passive electronic component whose electrical resistance changes under the effect of an electric charge and is retained without power. It is used to design RRAM or ReRAM memories (Resistive Random Access Memory) that do not need to be powered continuously.


“In the race to break the exaflops barrier, the main scientific problem will be finding the right programming methods and new mathematical models,” predicts Serge Petiton, head of the MAP team (Méthodologie et algorithmique parallèle pour le calcul scientifique) at the LIFL (the computer science laboratory of Lille 1 University). “I think we're going to find ourselves up against a wall when we hit 100 petaflops, and we'll absolutely need to consider new paradigms.”

Already today it is more and more difficult to parallelise calculations, i.e. to divide them into sub-calculations executed by processor cores. Software must also focus on locality, thus limiting movements of data between processors. This has led to the advent of new algorithmic disciplines such as “communication-avoiding algorithms”, whose aim is to allow processors to work independently whenever possible.
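A small illustration of this locality principle, using the mpi4py library (our choice of example, not a tool named in the article): each process reduces its own block of data locally, and a single short collective message is the only communication:

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank owns its local block of the data; no element-by-element exchange
local_block = np.random.rand(1_000_000)

# All the arithmetic stays local to the rank...
local_sum = local_block.sum()

# ...and one collective reduction is the only communication step
total = comm.allreduce(local_sum, op=MPI.SUM)

if rank == 0:
    print(f"global sum over {size} ranks: {total:.3f}")
```

Run, for instance, with `mpirun -n 8 python sum.py`; the more work each rank keeps local, the less the network is used.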

Since it is never possible to achieve perfect parallelism at all times, software must now be capable of idling unused hardware resources. This role has already been allocated to operating systems and to certain hardware, such as processors that can slow down or disable certain cores. Applications, too, are being designed to save energy. “The programmer now has three criteria he needs to manage: the number of iterations, the duration of each iteration and the overall energy associated with the execution of these iterations,” explains Serge Petiton. “And, depending on the need, these algorithms will be used to reduce computation times or energy bills.”

With hundreds of thousands of processors, it is more and more difficult to predict the best computation method to reduce energy consumption. According to Serge Petiton, “this is not a deterministic problem. This is why we are implementing autotuning techniques, which consist in automatically updating computation parameters in real time in order to reduce energy consumption.” Optimising operating systems, languages, compilers and applications could represent 10 to 20% in energy savings.
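A schematic autotuning loop in Python; the candidate parameters, the fake workload and the energy probe are hypothetical placeholders for whatever tunables and hardware counters a real machine exposes:

```python
import random
import time

def run_iteration(block_size, n_threads):
    """Placeholder for one iteration of the real computation."""
    time.sleep(0.001 * block_size / n_threads)

def energy_of_one_iteration(block_size, n_threads):
    """Placeholder: a real version would read a hardware energy counter
    before and after the call; here we return a fictitious value in joules."""
    run_iteration(block_size, n_threads)
    return random.uniform(80.0, 120.0)

def autotune(candidates, n_trials=3):
    """Keep the parameter set with the lowest average measured energy."""
    scores = {}
    for params in candidates:
        scores[params] = sum(energy_of_one_iteration(*params)
                             for _ in range(n_trials)) / n_trials
    return min(scores, key=scores.get)

best = autotune([(64, 4), (128, 8), (256, 16)])
print("retained (block_size, n_threads):", best)
```

In a real code the loop would keep running alongside the application, re-measuring and re-adjusting the parameters as conditions change.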

The gains obtained could be significant, but the complexity of computer architectures will require reinforcing fault tolerance (error-correction systems, register backups...). “We estimate this will consume more energy and, in the end, fault-tolerance mechanisms should drain a third of the energy burned by a supercomputer, compared to 20% today,” imagines Franck Cappello.

Researchers' ingenuity is vital in finding solutions for all these problems. Hence Serge Petiton estimates that “in order to design an exaflopic supercomputer, we will need cross-disciplinary teams with hundreds of people trained in energy issues; it will be a real challenge in terms of recruitment and training.”

Implications for the general public
However, the results will be worth it. By developing exaflopic machines, a multitude of problems affecting society in general can be tackled, such as climate forecasting or drug screening. In the same way that innovations in Formula 1 racing have changed cars for the “common man”, progress in supercomputers should significantly reduce the energy consumption of PCs and consumer electronics. So bring on the challenges! CONSTANTIN LENSEELE

“In the end, we will probably no longer make the distinction between CPU and GPU, because processors will contain both general-purpose and specialised cores that will serve as accelerators,” claims Franck Cappello.

“Considering the complexity of future supercomputers, tomorrow's software will play a vital role.”


SSD
An SSD (Solid-State Drive) is a data storage device made up of Flash memory.

GPU
A Graphics Processing Unit is a processor dedicated to calculating displays. It can perform several tasks at the same time.

“In the race to break the exaflops barrier, the main scientific problem will be finding the right programming methods and new mathematical models”


>> to fulfill the needs of
- CEA high-end simulations
- Large scientific and industrial projects
- Major European research programmes
>> to pool
- Expertise in HPC technologies
- R&D efforts
- Infrastructures
>> to manage
- Access to key technologies
- The complexity of high-performance computing equipment

www.cea.fr

The CEA Scientific computing complex

- design, development, guarantee and maintenance of the stockpile
- 2nd and 3rd generation reactors, and 4th generation reactors
- refining estimates of climate warming and of future change

>>> TGCC is a new “green infrastructure” for high-performance computing, able to host petascale supercomputers. This supercomputing centre has been designed to host the first French petascale machine for the PRACE project and the next generation of the Computing Centre for Research and Technology (CCRT).

TGCC

Some applications of simulation: materials science, astrophysics, nuclear physics, aeronautics, climatology, meteorology, theoretical physics, quantum mechanics, biology, chemistry, technological research, etc.

High-performance simulations of plasma turbulence, key physics issues for ITER.


A COMPUTER WITH 1 MILLION PROCESSING UNITS, EACH ONE FAILING ONLY ONCE EVERY THOUSAND YEARS, WOULD STILL PRODUCE ONE ERROR PER MINUTE! IN THE RUN-UP TO EXASCALE, THE PROBLEM OF FAILURE TOLERANCE NEEDS TO BE RESOLVED WITH PROTOCOLS FOR BACKUPS OF EXECUTION STATES AND FAILURE AVOIDANCE.

CORRECTING ERRORS IS A TOP PRIORITY

THE CHALLENGE FOR DIGITAL LIBRARIES
BY LUC GIRAUD, member of the HIEPACS team, dedicated to high-end parallel algorithms for challenging numerical simulations

Digital libraries are software building blocks that help solve recurring mathematical problems. These generic problems are part of the large simulation codes developed to understand complex phenomena that cannot be studied through experimentation, but that we will be able to understand thanks to the next generation of exaflopic computers. INRIA teams are working with their partners on solutions aimed at facilitating the use of these new machines by researchers who are not experts in parallel computing. In this context, many challenges need to be met in order to have a significant impact on the use of supercomputers. New “flexible hierarchy” algorithms are capable of simultaneously using large numbers of cores on heterogeneous computers. These libraries must be particularly failure-tolerant. They should also limit the energy used for computing while automatically adapting to variable conditions of use, in terms of data volume and the number of cores available to process it.

At the end of the decade we should see the first “exascale” computers, that is to say, machines

with a computing power of one exaflops.

But in order to run scientific applications and take full advantage of hundreds of millions of cores (massively parallel systems), we must accept that implementing such calculations also involves tolerating a constant flow of failures. The Jaguar supercomputer at the Oak Ridge National Laboratory in the United States experiences an average of one failure a day, and it has “only” 250,000 cores.

Imagine this: a computer with 1 million processing units, each one failing only once every >>>


BY FRANCK CAPPELLO

co-director of the INRIA/Urbana-Champaign Joint Laboratory for Petascale Computing in the United States.

The Jaguar supercomputer (USA) experiences one failure a day on average. This is not surprising considering the number of cores it contains (nearly 250,000) and its computing power (2.33 petaflops).


>>> thousand years (an optimistic hypothesis based on reliable hardware) would still produce one error per minute! We can see why it would be impossible to run any application without experiencing numerous failures...

What sort of failures can a supercomputer encounter? These range from power outages and “crashes” due to a system error to intermittent failures in integrated circuits. The latter are by far the most frequent kind: they involve changes in the state of memory cells (a bit flipping from 0 to 1 or back) caused by electromagnetic or cosmic radiation!

Exascale applications
Supercomputer hardware is designed to detect and correct intermittent failures, but at the cost of greater energy consumption, because this multiplies the number of circuits. Otherwise, the application data can be corrupted and programmes can enter unexpected states, producing erroneous results without the researcher realising it.

INRIA teams are developing algorithmic techniques and software tools with their partners in order to solve the problem of failure tolerance in exascale machines, by working on protocols for backing up execution states and on failure avoidance. In the first case, if a failure occurs, execution is restarted from the most recent backed-up state.

The second approach consists in predicting failures and shifting calculations in progress to reliable resources. Predicting failures remains difficult, however, and backups of execution states remain a key approach. The challenge lies in designing very fast backup algorithms that require fewer resources, while the application uses 1 million processor cores.
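A bare-bones checkpoint/restart loop in Python illustrates the first protocol; the file name, the checkpoint interval and the pickle format are arbitrary choices for the example, not the tools actually used on these machines:

```python
import os
import pickle

CHECKPOINT = "state.ckpt"        # illustrative file name
INTERVAL = 100                   # save every 100 iterations (arbitrary)

def save_state(step, state):
    """Write the current execution state to stable storage."""
    with open(CHECKPOINT, "wb") as f:
        pickle.dump((step, state), f)

def load_state():
    """Resume from the most recent backed-up state, or start from scratch."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return 0, {"x": 0.0}

def run(n_steps=1_000):
    step, state = load_state()           # restart point after a failure
    while step < n_steps:
        state["x"] += 1.0                # stand-in for one unit of real work
        step += 1
        if step % INTERVAL == 0:
            save_state(step, state)      # periodic backup of the execution state
    return state

print(run())
```

At exascale, the difficulty mentioned above is precisely that writing such a state for a million cores must itself be fast and light enough not to dominate the run.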

Teams are studying stochastic execution models in order to predict, and therefore optimise, the performance of a large-scale parallel scientific application. Finally, researchers are working on new numerical methods and robust algorithms for simulations, which can compute the desired solutions even if many failures occur. This last approach is very promising over the long term, but it will be difficult to adapt all exascale applications by the end of the decade.


VIRTUALISATION OF HYBRID ARCHITECTURES
BY RAYMOND NAMYST, leader of the RUNTIME team, dedicated to high-performance runtime systems for parallel architectures

The recent advent of “hybrid” computers, associating general-purpose processors and accelerators, has profoundly changed development techniques for simulation applications. While programmers of scientific applications and developers of environments and compilers already had a lot on their plate with the arrival of multi-core processors for supercomputers, a new revolution is taking place in the world of computing: the use of accelerators such as GPUs (graphics processing units) to shore up traditional multi-core processors. Originally, GPUs were adopted for their capacity to speed up specific parts of applications, whose computation was “delegated” to them. Gradually, the use of GPUs became more widespread, and today their power is often greater than that of general-purpose processors. However, they require radically

different programming techniques from those traditionally used. One of the greatest challenges facing the IT community is succeeding in using all the computing units simultaneously. To achieve this, we need to continuously feed a heterogeneous set of processing units. One approach recently explored at INRIA consists in breaking down applications into tasks, without deciding in advance which processing units will run them. The idea is to preserve as much flexibility as possible. The challenge lies in adjusting as

carefully as possible the distribution of tasks among processing units. Typically, tasks will be assigned to the units that can run them most efficiently. However, there are many parameters involved, which makes the problem difficult to solve: the degree of usable parallelism, the amount of data transferred, energy consumption, etc. Furthermore, while GPUs are generally more powerful than traditional cores, the gains in speed obtained depend a great deal on the task and on the volume of data processed. While certain calculations can be performed fifty times faster on a GPU than on a traditional core, gains are much more modest for others and sometimes negative! Surprisingly, this apparent drawback is in fact a quality: efficient allocation of tasks on a hybrid machine yields much better results than on a homogeneous one, even though the latter is much easier to program! The idea is that a factory with a variety of specialised workers is more efficient than one with workers having the same skill set.
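A toy Python scheduler in the spirit described above; the task costs and the two kinds of worker are invented for the illustration, and real runtime systems use far richer cost models:

```python
# Estimated run time of each task on each kind of processing unit (seconds).
# Some tasks benefit enormously from the GPU, one is actually slower there.
tasks = {
    "dense_matmul":   {"cpu": 10.0, "gpu": 0.2},
    "sparse_solve":   {"cpu": 4.0,  "gpu": 3.0},
    "io_postprocess": {"cpu": 1.0,  "gpu": 2.5},
}

# Two CPU cores and one GPU, each with the time at which it becomes free.
workers = {"cpu0": ("cpu", 0.0), "cpu1": ("cpu", 0.0), "gpu0": ("gpu", 0.0)}

def schedule(tasks, workers):
    """Greedy earliest-finish-time assignment across heterogeneous units."""
    plan = {}
    # Consider the tasks with the largest best-case cost first (simple heuristic)
    for name, costs in sorted(tasks.items(), key=lambda kv: -min(kv[1].values())):
        best_worker, best_finish = None, float("inf")
        for wname, (kind, busy_until) in workers.items():
            finish = busy_until + costs[kind]
            if finish < best_finish:
                best_worker, best_finish = wname, finish
        kind, _ = workers[best_worker]
        workers[best_worker] = (kind, best_finish)   # unit is now busy longer
        plan[name] = best_worker
    return plan

print(schedule(tasks, workers))
```

The point of keeping the task-to-unit mapping open until run time, as in the approach explored at INRIA, is exactly that this choice can be revised with measured costs rather than fixed in advance.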

“A factory with a variety of specialised workers is more efficient than one with workers having the same skill set.”


Tianhe-1 was the world's most powerful supercomputer when it was first presented at the end of 2010. Designed in China, it beat out the Jaguar (USA) by adopting a hybrid architecture associating general-purpose processors (CPU) and graphics accelerators (GPU).


BECOME A MASTER SCIENTIST IN MODELLING AND SIMULATION

Mastering High Performance Computing

UVSQ
The future of computing

UVSQ.fr

www.maisondelasimulation.fr

www.maisondelasimulation.fr/M2S

MAISON DE LA SIMULATION
