The Scientific Case for High Performance Computing in Europe 2012-2020

ALL RIGHTS RESERVED. This report contains material protected under International and Federal Copyright Laws and Treaties. Any unauthorized reprint or use of this material is prohibited. No part of this report may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage and retrieval system without express written permission from the author / publisher.

The copyrights of the content and images contained within this report remain with their original owner(s). The PRACE name and the PRACE logo are © PRACE.

All diagrams and most photographs are original to this book. See copyright notice for individual copyrights.

Printed by Bishops Printers Ltd, Spa House, Walton Road, Portsmouth, Hampshire, PO6 1TR, United Kingdom, Telephone +44 239 2334 900

Published by Insight Publishers Ltd, 12-13 King Square, Bristol, BS2 8JH, United Kingdom, Telephone +44 117 2033 120, www.ipl.eu.com

Project Manager: Ellen Haggan
Book designer: Mike Stafford
Additional designers: Thomas Hunt
Layout assistant: Simon Browne
Proofreader: Becky Freeman
Print manager: Paul Rumbold

The PRACE Research Infrastructure is established as an international non-profit association with its seat in Brussels and is named “Partnership for Advanced Computing in Europe aisbl”. It has 24 member countries (June 2012) whose representative organisations are creating a pan-European supercomputing infrastructure, providing access to computing and data management resources and services for large-scale scientific and engineering applications at the highest performance level.

PRACE, October 2012

ISBN 978-0-9574348-0-6

www.prace-ri.eu

This work has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) in the PRACE-1IP Project under grant agreement n° RI-261557.

Lead Author:

Martyn Guest – Cardiff University, UK

Panel Chairs:

Giovanni Aloisio – University of Salento and ENES-CMCC, Italy

Stefan Blügel – Forschungszentrum Jülich, Germany

Modesto Orozco – Institute for Research in Biomedicine, Spain

Philippe Ricoux – TOTAL, France

Andreas Schäfer – University of Regensburg, Germany

Scientific Case Management Group:

Richard Kenway – PRACE Scientific Steering Committee (Chair)

Turlough Downes – PRACE User Forum (Chair)

Thomas Lippert – PRACE-1IP Project Coordinator

Maria Ramalho – Acting Managing Director of PRACE aisbl

Secretary:

Giovanni Erbacci – PRACE-1IP WP4 leader

Editor-in-chief:

Marjolein Oorsprong – PRACE Communications Officer

TABLE OF CONTENTS

Glossary
Executive Summary
Key Recommendations
1  The European HPC Ecosystem and its Potential Impact – 2012–2020
   1.1  Introduction and Background
   1.2  Objectives and Scope of the Scientific Case Update
   1.3  Progress to be Expected During the Petascale Era
   1.4  Balance between Scientific, Industrial and Societal Benefits
2  Weather, Climatology and solid Earth Sciences
   2.1  Summary
   2.2  Computational Grand Challenges and Expected Outcomes
   2.3  A Roadmap for Capability and Capacity Requirements
   2.4  Expected Status in 2020
3  Astrophysics, High-Energy Physics and Plasma Physics
   3.1  Summary
   3.2  Computational Grand Challenges and Expected Outcomes
   3.3  A Roadmap for Capability and Capacity Requirements
   3.4  Expected Status in 2020
4  Materials Science, Chemistry and Nanoscience
   4.1  Summary
   4.2  Computational Grand Challenges and Expected Outcomes
   4.3  A Roadmap for Capability and Capacity Requirements
   4.4  Expected Status in 2020
5  Life Sciences and Medicine
   5.1  Summary
   5.2  Computational Grand Challenges and Expected Outcomes
   5.3  A Roadmap for Computational Requirements
   5.4  Expected Status in 2020
6  Engineering Sciences and Industrial Applications
   6.1  Introduction
   6.2  Computational Grand Challenges & Expected Outcomes in Engineering
   6.3  Computational Grand Challenges and Expected Outcomes in Industry
   6.4  Engineering and Industrial Exascale Issues
   6.5  A Roadmap for Computational Requirements
   6.6  Expected Status in 2020
7  Requirements for the Effective Exploitation of HPC by Science and Industry
   7.1  Introduction
   7.2  An Effective and Persistent Infrastructure
   7.3  Computational Science Infrastructure in Europe
   7.4  The Challenges of Exascale-Class Computing
   7.5  A Support Infrastructure for the European HPC Community
   7.6  Education and Training of Researchers
   7.7  Community Building and Centres of Competence
8  Membership of International Scientific Panel

GLOSSARY

4DVAR: Four-dimensional variational assimilation – a simple generalisation of 3DVAR for observations that are distributed in time
ACARE: Advisory Council for Aviation Research and Innovation in Europe
AGN: Active Galactic Nuclei
ALD: Atomic Layer Deposition
ARGO: Argo is a global array of 3,000 free-drifting profiling floats that measures the temperature and salinity of the upper 2,000 m of the ocean
B3LYP: A hybrid functional in which the exchange energy, in this case from Becke’s exchange functional, is combined with the exact exchange energy from Hartree–Fock theory (a sketch of the conventional form is given at the end of this glossary)
Big-BOSS: A ground-based dark energy experiment to study baryon acoustic oscillations (BAO) and the growth of structure with an all-sky galaxy redshift survey
BLAS: Basic Linear Algebra Subprograms
BLAST: Basic Local Alignment Search Tool (bioinformatics)
BSC: Barcelona Supercomputing Center (Spain)
BSM: Beyond the Standard Model
CAD: Computer-Aided Design
CAE: Computer-Aided Engineering
CASPT2: Complete Active Space with Second-order Perturbation Theory
CASSCF: Complete Active Space Self-Consistent Field method – a particularly important multi-configurational self-consistent field approach (MCSCF)
CC: Coupled cluster – a numerical technique used for describing many-body systems
CCSD(T): Coupled-cluster method that includes singles and doubles fully, while triples are calculated non-iteratively
CEA: Commissariat à l’Energie Atomique
CECAM: Centre Européen de Calcul Atomique et Moléculaire
CECDC: Combustion Exascale Co-Design Center (USA)
CERF: Co-Design for Exascale Research in Fusion (USA)
CERFACS: European Centre for Research and Advanced Training in Scientific Computation (France)
CERN: European Organisation for Nuclear Research
CESAR: Office of Science Center for Exascale Simulation of Advanced Reactors (USA)
CFD: Computational Fluid Dynamics
CI: Configuration Interaction
CINES: Centre Informatique National de l’Enseignement Supérieur (France)
CMCC: Centro Euro-Mediterraneo per i Cambiamenti Climatici (Italy)
CMIP5: Coupled Model Intercomparison Project Phase 5
CNRS: Centre National de la Recherche Scientifique (France)
COPES: Coordinated Observation and Prediction of the Earth System
CPMD: ‘Car-Parrinello’ molecular dynamics (ab-initio MD)
CPU: Central Processing Unit
CSM: Continuum solvation models
CT-QMC: Continuous-Time Quantum Monte Carlo methods for numerically exact calculation of complicated fermionic path integrals
CTM: Chemical transport models

CUDA: NVIDIA’s parallel computing architecture
CVD: Chemical vapour deposition
DAs: Distribution Amplitudes
DEM: Discrete Element Method (particle simulation technology)
DESY: Deutsches Elektronen-Synchrotron DESY, a Research Centre of the Helmholtz Association, in Hamburg (Germany)
DFT: Density Functional Theory – a quantum mechanical modelling method used to investigate the electronic structure of many-body systems
DMFT: Dynamical Mean-Field Theory
DNA: Deoxyribonucleic Acid
DNS: Direct Numerical Simulation
DPD: Dissipative Particle Dynamics
EBI: European Bioinformatics Institute
EByte: 1 Exabyte = 10^18 bytes of digital information
ECWS: Exascale Climate and Weather Science Co-Design Center
EDF: Electricité de France (France)
EESI: European Exascale Software Initiative (Europe)
Eflop/s: 1 Exaflop = 10^18 floating-point operations per second
EFT: Effective Field Theory
EIDA: European Integrated Waveform Data Archive
ELI: Extreme Light Infrastructure, a European Project involving nearly 40 research and academic institutions from 13 EU Member States, forming a pan-European Laser facility
ELIXIR: European bioinformatics initiative to construct and operate a sustainable infrastructure for biological information in Europe
EMBL: European Molecular Biology Laboratory
EM-PIC: Electromagnetic PIC simulation
ENES: European Network for Earth System modelling
ENSO: El Niño/La Niña–Southern Oscillation is a quasiperiodic climate pattern that occurs across the tropical Pacific Ocean roughly every five years
EPOS: European Plate Observing System
EPR: A third-generation pressurised water reactor (PWR) design
ESA: European Space Agency
ESF: European Science Foundation
ESM: Earth System Model (climate)
ESMF: Earth System Modelling Framework – open-source software for building climate, numerical weather prediction, data assimilation and other Earth science software applications
ESFRI: European Strategy Forum for Research Infrastructures
ETP4HPC: European Technology Platform (ETP) for High-Performance Computing
Euclid: A planned space telescope, an M-class mission of the ESA Cosmic Vision 2020–2025, planned to be launched in 2019
Exascale: Simulations and HPC systems which calculate at around 10^18 floating-point operations per second
FAIR: Facility for Antiproton and Ion Research
FE: Finite Elements
FEA: Finite Element Analysis
FFT: Fast Fourier Transformation
Flash: High-Energy Density Physics Co-Design Center (USA)
FP7: European Commission – Research: The Seventh Framework Programme (2007–2013)

FZJ: Forschungszentrum Jülich (Germany)
Gauß-Allianz: The Gauß-Allianz e.V. is a German association in which academic computing centres team up to create the necessary infrastructure for the future of HPC and Grid computing on a national level
GCM: Global Climate Model
GENCI: Grand Equipement National de Calcul Intensif (France)
GENE: Gyrokinetic Electromagnetic Numerical Experiment – an open-source plasma microturbulence code
GEO600: A gravitational wave detector located near Sarstedt, Germany
GHG: Greenhouse Gas
GMES: Global Monitoring for Environment and Security – European initiative
GPDs: Generalised Parton Distributions
GP-GPU: General-Purpose Graphics Processing Unit
GPU: Graphics Processing Unit
GRAPE: Gravity Pipeline Engine – special-purpose hardware
GW: The GW approximation, derived by Hedin, is based on an expansion in terms of the dynamically screened Coulomb interaction
GYSELA: The GYSELA code simulates the electrostatic branch of the Ion Temperature Gradient turbulence in tokamak plasmas
Hadoop: Open-source software project that enables the distributed processing of large data sets across clusters of commodity servers
HED: High energy density (plasma physics)
HEP: High-Energy Physics
HiPER: High-Power Laser for Energy Research
HLRS: High-Performance Computing Center Stuttgart (Germany)
HPC: High-Performance Computing
HQP: Highly Qualified Personnel
HTS: High-Throughput Screening
I/O: Input and Output
IATA: International Air Transport Association
IBM: Immersed Boundary Method (particle simulation technology)
IceCube: The IceCube Neutrino Observatory is a neutrino telescope constructed at the Amundsen–Scott South Pole Station in Antarctica
ICF: Inertial Confinement Fusion
IDC: International Data Corporation
IESP: International Exascale Software Project
IFERC: International Fusion Energy Research Centre
IGBP: International Geosphere–Biosphere Programme
IGBP-AIMES: The Earth System synthesis and integration project of the IGBP
INCITE: Innovative and Novel Computational Impact on Theory and Experiment
Intel MIC: Intel Many Integrated Core Architecture
IPCC: Intergovernmental Panel on Climate Change
IPCC-AR5: Fifth Assessment Report (AR5) of the IPCC
ITER: ITER Fusion Research Collaboration
JET: Joint European Torus
JWST: The James Webb Space Telescope
K-computer: The first machine to achieve 10 Pflop/s (Fujitsu, Japan)
KIT: Karlsruhe Institute of Technology
LBM: Lattice Boltzmann Method
LDA: Local Density Approximation
LDA-DMFT: LDA-Dynamical Mean-Field Theory to address strongly correlated electron systems

LES: Large eddy simulation
LHC: Large Hadron Collider
LIGO: Laser Interferometer Gravitational-Wave Observatory (US)
Linpack: The LINPACK Benchmarks are a measure of a system’s floating-point computing power
LQCD: Lattice Quantum Chromodynamics
LSST: Large Synoptic Survey Telescope
MapReduce: A patented software framework introduced by Google in 2004 to support distributed computing on large data sets on clusters of computers
MatSEEC: An independent ESF science-based committee in materials science and its applications, materials engineering and technologies and related fields of science and research management
MByte: 1 Megabyte = 10^6 bytes of digital information
MC: Monte Carlo
MCF: Magnetic Confinement Fusion
MCTDH: Multi-Configuration Time-Dependent Hartree algorithm
MD: Molecular Dynamics
MDGRAPE: Molecular Dynamics Gravity Pipeline Engine
MDO: Multidisciplinary Design and Optimisation
MeMoVolc: European research network in Measuring and Modelling Volcano Eruption Dynamics
MHD: Magneto-Hydrodynamics
MMM: Multiscale Materials Modelling
MP2: 2nd-order Møller–Plesset perturbation theory (MP)
MPCD: Multi-Particle Collision Dynamics
MPI: Message Passing Interface (distributed memory system programming model)
MRAM: Magnetic Random Access Memory
MRI: Magnetic Resonance Imaging
MSU: Moscow State University (Russia)
MW: Megawatt
MyOcean: Implementation of the Marine Core Service
NERIES: Integrated Infrastructure Initiative FP6 project aiming at networking the European seismic networks
Neutronics: Neutron transport, or simply neutronics, is the term used to describe the mathematical treatment of neutron and gamma ray transport through materials
NVH: Noise, Vibration and Harshness
ODEs: Ordinary Differential Equations
OpenMP: Open specification for Multi-Processing (shared memory system programming model)
ORCA12: Global ocean model including sea ice, at 1/12 degree resolution
ORFEUS: The European non-profit foundation that aims at coordinating and promoting digital broadband (BB) seismology in the European-Mediterranean area
OSIRIS: An integrated framework for parallel PIC simulations
Pan-STARRS: The Panoramic Survey Telescope and Rapid Response System
PByte: 1 Petabyte = 10^15 bytes of digital information
PDE: Partial Differential Equations
Petascale: Simulations and HPC systems which calculate at around 10^15 floating-point operations per second
Pflop/s: 1 Petaflop = 10^15 floating-point operations per second
PIC: Particle In Cell
PRACE: Partnership for Advanced Computing in Europe

PRACE-RI: PRACE Research Infrastructure
PRACE-1IP: 1st Implementation Phase of PRACE
PRMAT: Parallel R-Matrix Program
PWR: Pressurised Water Reactor
QCD: Quantum Chromodynamics
QFT: Quantum Field Theory
QM/MM: Quantum Mechanics / Molecular Mechanics
QMC: Quantum Monte Carlo
QSAR: Quantitative Structure–Activity Relationship
QUEST: QUantitative Estimation of Earth’s Seismic Sources and Structure – Initial Training Network in computational seismology funded within EU FP7
RANS: Reynolds-Averaged Navier–Stokes
Reτ: In fluid mechanics, the Reynolds number (Re) gives a measure of the ratio of inertial forces (which characterise how much a particular fluid resists any change in motion) to viscous forces, and consequently quantifies the relative importance of these two types of forces for given flow conditions
SAR: Synthetic-Aperture Radar
SDP: Seismic Data Processing
SHMEM: SHared MEMory
SKA: Square Kilometre Array
SLOOP: SheLf to deep Ocean mOdelling of Processes – the International Training Network SLOOP, recently submitted to FP7
SM: Standard Model
SME: Small and Medium Enterprise
SN: A supernova (abbreviated SN, plural SNe for supernovae)
Sn neutronics: A deterministic method for neutronics (neutron transport) in which the particle flux distribution in space, angle and energy is found by solving the transport equation numerically
SNP: Single Nucleotide Polymorphism
SPH: Smoothed Particle Hydrodynamics (particle simulation technology)
SPICE: Seismic wave Propagation and Imaging in Complex media – a Marie Curie Research Training Network in FP6 focusing on research and training in all aspects of computational seismology
SQUIDs: Superconducting QUantum Interference Devices – sensitive sensors for magnetic fields
SSC: PRACE Scientific Steering Committee
SST: Sea-surface temperature
STFC: Science and Technology Facilities Council (UK)
Super-K: The Super-Kamiokande neutrino observatory (Japan)
TByte: 1 Terabyte = 10^12 bytes of digital information
TDDFT: Time-Dependent DFT
Terascale: Simulations and HPC systems which calculate at around 10^12 floating-point operations per second
Tflop/s: 1 Teraflop = 10^12 floating-point operations per second
THC: Thermohaline Circulation
Tier-0: Leadership-class computing systems
Tier-1: National ‘mid-range’ HPC systems
Tier-2: Institutional HPC systems
TMDs: Transverse Momentum-Dependent Distribution functions
TOPO-EUROPE: European initiative in the Geoscience of Coupled Deep Earth–Surface Processes

TOPOMOD: Training project to investigate and model the origin and evolution of topography of the continents over a wide range of spatial and temporal scales, using a multidisciplinary approach coupling geophysics, geochemistry, tectonics and structural geology with advanced geodynamic modelling
UCL: University College London
US: United States
USA: United States of America
VERCE: Virtual Earthquake and seismology Research Community in Europe e-science environment
VIRGO: A gravitational wave detector (Michelson laser interferometer) in Italy with two orthogonal arms, each 3 kilometres long
WCES: Weather, Climatology and solid Earth Sciences
WCRP: World Climate Research Programme
WG: Working Group
WORM: A Write Once Read Many drive is a data storage device where information, once written, cannot be modified
WP: Work Package
ZByte: 1 Zettabyte = 10^21 bytes of digital information
Zettascale: Simulations and HPC systems which calculate at around 10^21 floating-point operations per second
Zflop/s: 1 Zettaflop = 10^21 floating-point operations per second
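
For reference, a sketch of the B3LYP hybrid functional listed above, in the form conventionally quoted in the quantum chemistry literature (the mixing coefficients below are the standard published values and are not taken from this report):

\[
E_{xc}^{\mathrm{B3LYP}} = (1-a_0)\,E_x^{\mathrm{LSDA}} + a_0\,E_x^{\mathrm{HF}} + a_x\,\Delta E_x^{\mathrm{B88}} + (1-a_c)\,E_c^{\mathrm{VWN}} + a_c\,E_c^{\mathrm{LYP}},
\qquad a_0 = 0.20,\; a_x = 0.72,\; a_c = 0.81
\]

Here the exact Hartree–Fock exchange is mixed with local (LSDA) and gradient-corrected (Becke 1988) exchange, and local (VWN) correlation is mixed with gradient-corrected (LYP) correlation.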

Communication from the EC [6] on ‘High Performance Computing: Europe’s Place in a Global Race’, COM(2012) 45 final

‘The race for leadership in HPC systems is driven both by the need to address societal and scientific grand challenges more effectively, such as early detection and treatment of diseases like Alzheimer’s, deciphering the human brain, forecasting climate evolution, or preventing and managing large-scale catastrophes, and by the needs of industry to innovate in products and services.’

‘Industry has a dual role in high-end computing: firstly, supplying systems, technologies and software services for HPC; and secondly, using HPC to innovate in products, processes and services. Both are important in making Europe more competitive. Especially for SMEs, access to HPC, modelling, simulation, product prototyping services and consulting is important to remain competitive. This Action Plan advocates for a dual approach: strengthening both the industrial demand and supply of HPC.’

 

EXECUTIVE SUMMARY

1. In 2005 and 2006, an international panel produced a White Paper entitled ‘Scientific Case for Advanced Computing in Europe’ that argued the case for High-Performance Computing (HPC) to support European competitiveness. [1] The document was published by the ‘HPC in Europe Taskforce’ [2] in January 2007. The initiative was instrumental in the establishment of PRACE – The Partnership for Advanced Computing in Europe – and the PRACE Research Infrastructure (PRACE-RI) in April 2010. [3] This document represents the culmination of an initiative by PRACE to create an update of the Scientific Case and capture the current and expected future needs of the scientific communities. It has involved the PRACE Scientific Steering Committee (SSC), [4] leading scientists from across all major user disciplines, and has been funded through the 1st Implementation Phase of PRACE – the PRACE-1IP project.

2. Five years after the publication of the scientific case, the HPC landscape in Europe has changed significantly. The PRACE Research Infrastructure is providing Tier-0 HPC services – large allocations of time on some of the most powerful computers in the world – to researchers in Europe, a global effort has been launched towards achieving exascale HPC [5] by the end of this decade, and the importance of HPC in solving the socio-economic challenges and maintaining Europe’s competitiveness has become even more evident.

3. The scope of this report is wide-ranging and captures the conclusions of five scientific areas, each derived from the work of an associated panel of experts. The five panels cover the areas of: Weather, Climatology and solid Earth Sciences; Astrophysics, HEP and Plasma Physics; Materials Science, Chemistry and Nanoscience; Life Sciences and Medicine; and Engineering Sciences and Industrial Applications.

4. The position of HPC has evolved since 2007 from a technology crucial to the academic research community to a point where it is acknowledged as central in pursuing ‘Europe’s place in a Global Race’. [6] The extract from the communication from the Commission to the European Parliament, quoted above, bears testimony to this position.

[1] www.hpcineuropetaskforce.eu/files/Scientific case for European HPC infrastructure HET.pdf
[2] http://www.hpcineuropetaskforce.eu
[3] http://www.prace-ri.eu/
[4] http://www.prace-ri.eu/Organisation
[5] http://www.exascale.org – Experts predict that exascale computers (capable of 10^18 operations per second) will be in existence before 2020.
[6] Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions – ‘High-Performance Computing: Europe’s place in a Global Race’, 15.02.2012, http://ec.europa.eu/information_society/newsroom/cf/item-detail-dae.cfm?item_id=7826

5. HPC is currently undergoing a major change as the next generation of computing systems (‘exascale systems’) [4] is being developed for 2020. These new systems pose numerous challenges, from a 100-fold reduction of energy consumption to the development of programming models for computers that host millions of computing elements, while addressing the data challenge presented by the storage and integration of both observational and simulation/modelling data. These challenges cannot be met by mere extrapolation but require radical innovation in several computing technologies. This offers opportunities to industrial and academic players in the EU to reposition themselves in the field.

6. All of the panels contributing to this report are convinced that the competitiveness of European science and industry will be jeopardised if sufficiently capable computers are not made available, together with the associated infrastructure and skilled people necessary to maximise their exploitation. The panels have listed multiple areas at risk in concluding that access to high-performance computers in the exascale range is of the utmost importance. Thus, in aerospace, considerable changes in the development processes will lead to a significant reduction in development times while at the same time including more and more disciplines in the early design phases to find an overall optimum for the aircraft configuration. This will enable the European aircraft industry to keep a leading role in worldwide competition, facing both an old challenge, i.e. competing with the USA, and a new, rapidly emerging one – keeping an innovation advantage over China. However, while aerospace can afford its own HPC provision, it may not have the capability to exploit exascale if similar systems are not available to academia for training and software development. In a similar vein, the lack of high-performance computers appropriate for life sciences research will displace R&D activities to the USA, China or Japan, putting European leadership in this field at risk.

7. Providing scientists and engineers with ongoing access to computers of leadership class must be recognised as an essential strategic priority in Europe: there is a compelling need for a continued European commitment to exploit the most powerful computers. Such resources are likely to remain extremely expensive and require significant expertise to procure, deploy and utilise efficiently; some fields even require research into specialised and optimised hardware. The panel stresses that these resources should continue to be reserved for the most exigent computational tasks of high potential value. It is clear that the computational resource pyramid must remain persistent and compelling at all levels, including national centres, access and data grids. The active involvement of the European Community along with appropriate member states remains critical in maintaining a world-leading supercomputer infrastructure in the European ecosystem. Europe must foster excellence and cooperation in order to gain the full benefits of exascale computing for science, engineering and industry in the European Research Area.

By way of summary, we present below key statements from the Commission and from our thematic panels that emphasise the essential role of a sustainable top-level infrastructure.

Communication from the Commission [7] to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions – ‘High-Performance Computing: Europe’s place in a Global Race’, COM(2012) 45 final, Brussels, 15.2.2012

‘High-Performance Computing (HPC) is critical for industries that rely on precision and speed, such as automotive and aviation, and the health sector. Access to rapid simulations carried out by ever-improving supercomputers can be the difference between life and death; between new jobs and profits or bankruptcy.’

 

Weather, Climatology and solid Earth Sciences (WCES)

‘In the last decade, our understanding of climate change has increased, as has the societal need to carry this over into advice and policy. However, while there is great confidence in the fact that climate change is happening, there remain uncertainties. In particular, there is uncertainty about the levels of greenhouse gas emissions and aerosols likely to be emitted and, perhaps more significant, there are uncertainties about the degree of warming and the likely impacts. Increasing the capability and comprehensiveness of ‘whole Earth system’ models that represent in ever-increasing realism and detail scenarios for our future climate is the only way to reduce these latter uncertainties.’

‘A programme of provision of leadership-class computational resources will make it increasingly possible in solid Earth sciences to address the issues of resolution, complexity, duration, confidence and certainty. Challenges have significant scientific and social implications, playing today a central role in natural hazard mitigation, treaty verification for nuclear weapons, and increased discovery of economically recoverable petroleum resources and monitoring of waste disposal.’

‘There is a fundamental need in oceanography and marine forecasting to build and efficiently operate the most accurate ocean models. Improved understanding of ocean circulation and biogeochemistry is critical to assess properly climate variability and future climate change and related impacts on, for example, ocean acidification, coastal sea level, marine life and polar sea-ice cover.’

Astrophysics, High-Energy Physics and Plasma Physics

‘Astrophysics, high-energy physics and plasma physics have, in recent years, shared a dramatic change in the role of theory for scientific discovery. In all three fields, new experiments became ever more costly, require increasingly long timescales and aim at the investigation of more and more subtle effects. Consequently, theory is faced with two types of demands: precision of theory predictions has to be increased to the point that it is better than the experimental one. Since the latter can be expected to increase by further orders of magnitude until 2020, this is a most demanding requirement. In parallel, the need to explore model spaces of much larger extent than previously investigated also became apparent. For example: In astrophysics, determination of the nature of dark energy and dark matter requires a detailed comparison of predictions from large classes of cosmological models with data from the new satellites and ground-based detectors which will be deployed until 2020. In high-energy physics, one of the tasks is to explore many possible extensions of the Standard Model to such a degree that even minute deviations between experimental data and Standard Model predictions can serve as smoking guns for a specific realisation of New Physics. In plasma physics, one of the tasks is to understand the physics observed at ITER at such a high level that substantially more efficient fusion reactors could be reliably designed based on theoretical simulations which explore a large range of options.’

[7] http://ec.europa.eu/information_society/newsroom/cf/item-detail-dae.cfm?item_id=7826

Materials Science, Chemistry and Nanoscience

‘Computational materials science, chemistry and nanoscience is concerned with the complex interplay of the myriads of atoms in a solid or a liquid, thereby producing a continuous stream of new and unexpected phenomena and forms of matter, characterised by an extreme range of length, time, energy, entropy and entanglement scales. The target of this science is to design materials ranging from the level of a single atom up to the macroscopic scale, and unravel phenomena and design processes from electronic reaction times in the femtosecond range up to geological periods. Computational materials science, chemistry and nanoscience stand in close interaction with the neighbouring disciplines of biology and medicine, as well as the geosciences, and affect wide fields of the engineering sciences. A large and diverse computational community that views as critical assets the conceptualisation, development and implementation of algorithms and tools for cutting-edge HPC will achieve this goal. These tools are used to great benefit in other communities such as medicine and life sciences, and engineering sciences and industrial applications.’

‘The advance from petascale to exascale computing will change the paradigm of computational materials science and chemistry. The move to petascale is broadening this paradigm – to an integrated engine that determines the pace in a design continuum from the discovery of a fundamental physical effect, a process, a molecule or a material, to materials design, systems engineering, processing and manufacturing activities, and finally to the deployment in technology, where multiple scientific disciplines converge. Exascale computing will significantly accelerate the innovation, availability and deployment of advanced materials and chemical agents and foster the development of new devices. These developments will profoundly affect society and the quality of life, through new capabilities in dealing with the great challenges of knowledge and information, sustained welfare, clean energy, health, etc.’

 

Life Sciences and Medicine

‘In life sciences and medicine, Eflop/s [8] capabilities will allow the use of more accurate formalisms (more accurate energy calculations, for example) and enable molecular simulation for high-throughput applications (e.g. the study of larger numbers of systems). Molecular simulation is a key tool for computer-aided drug design. The lack of high-performance computers appropriate for this research will displace R&D activities to the USA, China or Japan, putting European leadership in this field at risk. Appropriate exascale resources could revolutionise the simulation of biomolecules, allowing molecular simulators to decipher the atomistic clues to the functioning of living organisms.

‘Biomedical simulation will reduce costs, time to market and animal experimentation. In the medium to long term, simulation will have a major impact on public health, providing insights into the cause of diseases and allowing the development of new diagnostic tools and treatments. It is expected that understanding the basic mechanisms of cognition, memory, perception, etc., will allow the development of completely new forms of energy-efficient computation and robotics. The potential long-term social and economic impact is immense.

‘While exaflop machines are essential for specific areas of life sciences (e.g. brain simulation), and higher computational power will enable significantly increased accuracy for current modelling studies, some extremely important fields in life science will be mainly limited by throughput and data management.’

[8] flop/s, for floating-point operations per second; teraflop, 1 Tflop/s = 10^12 flop/s; exaflop, 1 Eflop/s = 10^18 flop/s

Engineering Sciences and Industrial Applications

‘All of us experience the effects of HPC in our day-to-day lives, although in many cases we are unaware of that impact. We travel in cars and aeroplanes designed using modelling and simulation applications run on HPC systems so that they are efficient and safe. HPC is essential for ensuring that our energy needs are met. Finding and recovering fossil fuels require engineering analysis that only HPC can deliver. Nuclear power generation also relies heavily on HPC to ensure that it is safe and reliable. In the coming years, HPC will have an even greater impact as more products and services come to rely on it.’

‘The automotive industry is actively pursuing important goals that need exaflop computing capability or greater. Examples include (i) vehicles that will operate for 250,000 kilometres on average without the need for repair – this would provide substantial savings for automotive companies by enabling the vehicles to operate through the end of the typical warranty period at minimal cost to the automakers – and (ii) insurance companies require full-body crash analysis that includes simulation of soft tissue damage – today’s “crash dummies” are inadequate for this purpose.’

‘The impact of computer simulation in aircraft design has been significant and continues to grow. Numerical simulation allows the development of highly optimised designs and reduced development risks and costs. Boeing, for example, exploited HPC in order to reduce drastically the number of real prototypes, from 77 physical prototype wings for the 757 aircraft to only 11 prototype wings for the 787 “Dreamliner” plane. HPC usage saved the company billions of dollars.’

‘In addition to the automotive and aeronautics examples above, many areas within the engineering sciences – seismic wave equation inversion, engine combustion (chemical and multi-physics combustion) and turbulence – demand highly scalable or so-called “hero applications” to deliver long-term social and economic impact.’

Giovanni Aloisio, University of Salento and ENES-CMCC, Italy – Chair, Weather, Climatology and solid Earth Sciences Panel

Andreas Schäfer, University of Regensburg, Germany – Chair, Astrophysics, HEP and Plasma Physics Panel

Stefan Blügel, Forschungszentrum Jülich, Germany – Chair, Materials Science, Chemistry and Nanoscience Panel

Modesto Orozco, Institute for Research in Biomedicine, Spain – Chair, Life Sciences and Medicine Panel

Philippe Ricoux, TOTAL, France – Chair, Engineering Sciences and Industrial Applications Panel

Martyn Guest, Cardiff University, Wales, United Kingdom – Lead Author of the Scientific Case

 

KEY RECOMMENDATIONS

In pointing to the compelling need for a continued European commitment to exploit leadership-class computers, the scientific panels have considered the infrastructure requirements that must underpin this commitment and present their considerations as part of the review of computational needs. This review covers both the vital components of the computational infrastructure and the user support functions that must be provided to realise the full benefit of that infrastructure. It has led to a set of key recommendations deemed vital in shaping the future provision of resources, recommendations that are justified in full in the Scientific Case [9] and outlined below.

Recommendation 1: Need for HPC Infrastructure at the Europe Level

The scientific progress that has been achieved using HPC since the ‘Scientific Case for Advanced Computing in Europe’ was published in 2007, the growing range of disciplines that now depend on HPC, and the technical challenges of exascale architectures make a compelling case for continued investment in HPC at the European level. Europe should continue to provide a world-leading HPC infrastructure to scientists in academia and industry, for research that cannot be done any other way, through peer review based solely on excellence. This infrastructure should also address the need for centres to test the maturity of future exascale codes and to validate HPC exascale software ecosystem components developed in the EU or elsewhere.

Recommendation 2: Leadership and Management

The development of Europe’s HPC infrastructure, its operation and access mechanisms must be driven by the needs of science, industry and society to conduct world-leading research. This public-sector investment must be a source of innovation at the leading edge of technology development, and this requires user-centric governance. Leadership and management of HPC infrastructure at the Europe level should be a partnership between users and providers.

Recommendation 3: A Long-Term Commitment to Europe-Level HPC

Major experiments depend on HPC for analysis and interpretation of data, including simulation of models to try to match observation to theory, and support research programmes extending over 10–20 year time frames. Some applications require access to stable hardware and system software for 3–5 years. Data typically need to be accessed over long periods and require a persistent infrastructure. Investment in new software must realise benefits over at least 10 years, with the lifetime of major software packages being substantially longer. A commitment to Europe-level HPC infrastructure over several decades is required to provide researchers with a planning horizon of 10–20 years and a rolling 5-year specific technology upgrade roadmap.

[9] See section 7.

Recommendation 4: Algorithms, Software and Tools

Most applications targeting Tier-0 machines require some degree of rewriting to expose more parallelism, and many face severe strong-scaling challenges if they are effectively to progress to exascale, as is demanded by their science goals. There is an ongoing need for support for software maintenance, tools to manage and optimise workflows across the infrastructure, and visualisation. Support for the development and maintenance of community code bases is recognised as enhancing research productivity and take-up of HPC. There is an urgent need for algorithm and software development to be able to continue to exploit high-end architectures efficiently to meet the needs of science, industry and society.

Recommendation 5: Integrated Environment for Compute and Data

Most application areas foresee the need to run long jobs (for months or years) at sustained performances [10] around 100 Pflop/s to generate core data sets, and very many shorter jobs (for hours or days) at lower performances for pre- and post-processing, model searches and uncertainty quantification. A major challenge is the end-to-end management of, and fast access to, large and diverse data sets, vertically through the infrastructure hierarchy. Most researchers seek more flexibility and control over operating modes than they have today to meet the growing need for on-demand use with guaranteed turnaround times, for computational steering and to protect sensitive codes and data. Europe-level HPC infrastructure should attach equal importance to compute and data, provide an integrated environment across Tiers 0 and 1, and support efficient end-to-end data movement between all levels. Its operation must be increasingly responsive to user needs and data security issues.

Recommendation 6: People and Training

There is grave concern about HPC skills shortages across all research areas and particularly in industry. The need is for people with both domain and computing expertise. The problems are both insufficient supply and low retention, because of poor career development opportunities for those supporting academic research. Europe’s long-term competitiveness depends on people with skills to exploit its HPC infrastructure. It must provide ongoing training programmes to keep pace with the rapid evolution of the science, methods and technologies, and must put in place more attractive career structures for software developers to retain their skills in universities and associated institutions.

Recommendation 7: Thematic Centres

Organisational structure is needed to support large long-term research programmes, bringing together competences to share expertise. This could take the form of virtual or physical thematic centres which might support community codes and data, operate dedicated facilities, focus on co-design, or have a cross-cutting role in the development and support for algorithms, software or tools. While some existing application areas have self-organised in this way, new areas such as medicine might achieve more rapid impact if encouraged to follow this path. Thematic centres should be established to support large long-term research programmes and cross-cutting technologies, to preserve and share expertise, to support training and to maintain software and data.

[10] Petaflop: 1 Pflop/s = 10^15 flop/s

1 THE EUROPEAN HPC ECOSYSTEM AND ITS POTENTIAL IMPACT 2012–2020

1.1 Introduction and Background

In this section, we initially take the opportunity to provide some of the background to developments since the last report. Following a brief summary of the emerging European strategy for HPC (section 1.1.1) and the achievements of PRACE to date (section 1.1.2), we highlight the major opportunity presented by the advent of the next-generation exascale systems. [4] The objectives and scope of the present update and the resulting scientific perspective from each of the five panels charged with contributing to this case are summarised in sections 1.2 and 1.3. Our considerations extend beyond the scientific impact to consider the balance between the scientific, industrial and social benefits of HPC (section 1.4).

HPC is currently undergoing a major change as the next generation of computing systems (‘exascale systems’)4 is being developed for 2020. These new systems pose numerous challenges, from a 100-fold reduction of energy consumption to the development of programming models for computers that host millions of computing elements. Exascale systems will be very different from today’s HPC systems, and building, operating and using such systems will pose severe technological challenges. While the major focus of this report lies in the scientific challenges and outcomes associated with the provision of exascale resources, this panel would stress from the outset that such outcomes are critically dependent on the provision of the associated support infrastructure: without this provision, the full benefits of an exascale-class infrastructure will simply not be realised. An overview of the key requirements is given in section 7.

Through  PRACE,  the  academic  sector  is  now  pooling  its  leadership-­‐class  or  Tier-­‐0  computing  systems  as   a   single   infrastructure,  making   them  available   to   all   researchers   in   the   EU.   Critical  mass   is   thus  achieved,   and   access   to   these   top-­‐of-­‐the-­‐range   HPC   systems   is   provided   based   on   scientific  excellence   rather   than   the   geographical   location   of   a   researcher.   PRACE   is   further   extending   its  services   to  mid-­‐range  HPC  systems   (Tier-­‐1)  with   the  objective  of  providing  a  distributed  computing  ecosystem   that   serves   its   users   irrespective   of   their   location   and   the   availability   of   national  resources.   The   scientific   panels   responsible   for   this   paper   are   convinced   that   the   PRACE  model   of  pooling  and  sharing  systems  and  expertise  makes  optimal  use  of  the  limited  resources  available.    

Europe has many of the technical capabilities and human skills needed to tackle the exascale challenge, i.e. to develop native capabilities that cover the whole technology spectrum from processor architectures to applications. Even though the EU is currently weak compared to the US in terms of HPC system vendors, there are particular strengths in microprocessor architectures, applications, low-power computing, systems and integration that can be leveraged to engage successfully in this global race, restoring the EU’s position on the world scene as a leading-edge technology supplier. Progress within Europe has to date been channelled through the EESI11 – the European Exascale Software Initiative – an initiative co-funded by the European Commission.

EESI’s  goal  is  to  build  a  European  vision  and  roadmap  to  address  the  challenge  of  the  new  generation  of   massively   parallel   systems   that   will   provide   Pflop/s   performances   in   2010   and   Eflop/s  performances  in  2020.12  EESI   is   investigating  the  strengths  and  weaknesses  of  Europe  in  the  overall  

11 http://www.eesi-project.eu/pages/menu/homepage.php
12 flop/s: floating-point operations per second; teraflop, 1 Tflop/s = 10^12 flop/s; petaflop, 1 Pflop/s = 10^15 flop/s; exaflop, 1 Eflop/s = 10^18 flop/s; zettaflop, 1 Zflop/s = 10^21 flop/s

 


international   HPC   landscape   and   competition.   In   identifying   priority   actions   and   the   sources   of  competitiveness  for  Europe  induced  by  the  development  of  peta/exascale  solutions  and  usages,  EESI  is   investigating   and   proposing   programmes   in   education   and   training   for   the   next   generation   of  computational   scientists.   The   Initiative   is   also   seeking   to   identify   and   stimulate   opportunities   for  worldwide  collaborations.  

Leveraging  the  results  from  the  EESI  deliberations  as  part  of  the  current  exercise  has  been  ensured  through  including  many  of  the  EESI  project  leads  in  the  present  panel  membership.  

The   primary   objectives   of   this   update   to   the   Scientific   Case   are   to   identify   the   scientific   areas   for  which  PRACE   is  an   important  Research   Infrastructure  and  the  key  challenges  within   these  areas.   In  highlighting  the  crucial  role  of  large-­‐scale  computer  simulation,  we  identify  the  potential  outcomes  in  science   and   engineering   to   be   addressed   through   PRACE   petascale   resources,   and   the   impact   of  computer  simulations  on  the  economy  and  society  in  general.  This  impact  is  quantified  through  the  production  of  a  roadmap  of  expected  achievements  in  the  next  5–8  years.    

The   importance   of   exascale-­‐class   supercomputing   for   scientific   and   economic   leadership   has   been  stressed  in  numerous  reports  in  the  USA  and  Europe.  Globally,  nations  are  investing  in  HPC  to  tackle  some  of  these  issues.  In  the  1990s,  the  USA  stood  out  as  the  world  leader  in  HPC,  with  Europe  and  Japan   the   other   major   players.   Now,   countries   including   India,   Russia   and   China   are   undertaking  ambitious  HPC  programmes.  Europe  has  lost  ground  by  10%  since  2007  in  terms  of  HPC  investment.13  Failure  by  Europe  to  increase  its  investment  means  that  not  only  will  it  risk  falling  further  behind  the  world  leader,  the  USA,  but,  worse,  its  position  may  be  threatened  by  emerging  HPC  powers.    

1.1.1 The Emergence of a European HPC Strategy
The development of HPC has long been a national affair for EU Member States, often driven by military and nuclear energy applications. In recent years, the increasing importance of HPC for researchers and industry, as well as the exponential rise in the investments required to stay competitive at world level, have led to a common understanding that ‘Europeanisation’ of this domain would benefit everyone. This is also true for those Member States which encounter difficulties in creating self-sufficient national HPC infrastructures but which can make valuable contributions to and benefit from EU-level HPC capabilities.

As  outlined  above,  the  HPC  in  Europe  Taskforce  published  in  2007  a  White  Paper  entitled  ‘Scientific  Case   for   Advanced   Computing   in   Europe’1   that   argued   the   case   for   HPC   to   support   EU  competitiveness.   This   work   was   carried   out   in   the   context   of   the   ESFRI14   European   Roadmap   for  Research  Infrastructures.  It   led  to  the  consolidation  of  national  HPC  strategies,  e.g.   in  Germany  and  France  with  the  creation  of   the  Gauß-­‐Allianz15  and  of  GENCI   (Grand  Equipement  National  de  Calcul  Intensif)16  respectively.  In  turn,  these  developments  resulted  in  the  setting  up  of  PRACE,  as  Member  States   and   national   institutions   have   realised   that   only   through   a   joint   and   coordinated   effort  will  they  be  able  to  stay  competitive.  This  was  supported  in  2009  by  the  European  Council,  which  called  for  further  efforts  in  this  domain.    

1.1.2 Achievements of PRACE to Date
Since the creation of the PRACE legal entity in 2010, the academic sector has been pooling its leadership-class computing systems as a single infrastructure, making them available to all

                                                                                                                         13  http://insidehpc.com/2010/11/23/video-­‐interview-­‐with-­‐idc-­‐on-­‐their-­‐strategic-­‐agenda-­‐for-­‐european-­‐supercomputing-­‐leadership/  

14  European  Strategy  Forum  for  Research  Infrastructures  http://ec.europa.eu/research/infrastructures/index_en.cfm?pg=esfri  

15 http://www.gauss-allianz.de/en/
16 http://www.genci.fr/?lang=en



researchers  in  the  EU.  Critical  mass  is  achieved,  and  access  to  these  top-­‐of-­‐the-­‐range  HPC  systems  is  provided  based  on  scientific  excellence  rather  than  the  geographical  location  of  a  researcher.  PRACE  is   further   extending   its   services   to   mid-­‐range   HPC   systems   with   the   objective   of   providing   a  distributed  computing  platform  that  serves  its  users  irrespective  of  their  location  and  the  availability  of  national  resources.  The  PRACE  model  of  pooling  and  sharing  systems  and  expertise  makes  optimal  use  of  the  limited  resources  available.    

The   mission   of   the   PRACE   RI   is   thus   to   enable   high-­‐impact   European   scientific   discovery   and  engineering   research  and  development  across  all  disciplines   to  enhance  European  competitiveness  for  the  benefit  of  society.  

 The   PRACE   RI   seeks   to   realise   this   goal   through   world-­‐class   computing   and   data   management  resources  and  services  open  to  all  European  public  research  through  a  peer  review  process.  With  the  broad   participation   of   European   governments   through   representative   organisations,   a   diversity   of  resources  can  be  provided  by  the  PRACE  RI  –  including  expertise  throughout  Europe  in  effective  use  of  the  resources.    

PRACE   encourages   collaboration   with   industry   and   industrial   use   and   conducts   annual   Industrial  Seminars   at   locations   throughout   Europe.   It   also   seeks   to   strengthen   the   European   HPC   industry  through  various  initiatives  and  has  a  strong  interest  in  improving  the  energy  efficiency  of  computing  systems  and  reducing  their  environmental  impact.  

The   PRACE   RI   is   established   as   an   international   non-­‐profit   association   located   in   Brussels   and   is  named  the   ‘Partnership  for  Advanced  Computing  in   Europe   AISBL’.   It   has   24   member   countries  whose  representative  organisations  are  creating  a  pan-­‐European   supercomputing   infrastructure,  providing   access   to   computing   and   data  management   resources   and   services   for   large-­‐scale  scientific  and  engineering  applications  at  the  highest   performance   level.   PRACE   is   funded   by  member   governments   through   their  representative   organisations   and   the   EU’s  Seventh   Framework   Programme   (FP7/2007-­‐2013).17    

The   first   PRACE   computer   systems   and   their  operations  are  funded  by  the  governments  of  the  representative  organisations  hosting  the  systems.  

It   is   clear   that   the   position   of   HPC   has   evolved  since   the   last   Scientific   Case   in   2007   from   a  discipline   crucial   to   the   academic   research  community   to   a   point   where   it   is   acknowledged  to  be  a  central  asset  in  pursuing  ‘Europe’s  place  in  a  Global  Race’.18    

The following extract from the communication from the Commission to the European Parliament illustrates this recognition:

‘PRACE ensures the wide availability of HPC resources on equal access terms. It has to be further strengthened to acquire the competence to (i) pool national and EU funds, (ii) set the specifications and carry out joint (pre-commercial) procurement for leadership-class systems, (iii) support Member States in their preparation of procurement exercises, (iv) provide research and innovation services to industry, and (v) provide a platform for the exchange of resources and contributions necessary for the operation of high-performance computing infrastructure. Additionally, an e-Infrastructure for HPC application software and tools needs to be put in place. It should further consolidate the EU’s strong position in HPC applications by coordinating and stimulating parallel software code development and scaling, and by ensuring the availability of quality HPC software to users.’

17 Under grant agreement n° RI-261557
18 Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions – ‘High-Performance Computing: Europe’s place in a Global Race’, 15.2.2012, http://ec.europa.eu/information_society/newsroom/cf/item-detail-dae.cfm?item_id=7826


Apex of Resources
The PRACE leadership systems form the apex of resources for large-scale computing and data management for scientific discovery, engineering research and development for the benefit of Europe and are well integrated into the European HPC ecosystem.

Access
The PRACE RI offers three different forms of access: Project Access, Programme Access and Preparatory Access. Project Access is the norm for individual researchers and research groups. It is open to academic researchers worldwide and to industry for projects deemed to have significant European and international impact. ‘Calls for Proposals’ issued twice a year are evaluated by leading scientists and engineers in a peer review process governed by a PRACE Scientific Steering Committee comprising leading European researchers from a broad range of disciplines. Programme Access is available to major European projects or infrastructures that can benefit from PRACE resources and for which Project Access is not appropriate. Preparatory Access is a simplified form of access to limited resources for the preparation of resource requests in response to Project Access Calls for Proposals.

Education and Training
PRACE has an extensive education and training effort for effective use of the RI through seasonal schools, workshops and scientific and industrial seminars throughout Europe. Seasonal schools target broad HPC audiences, whereas workshops focus on particular technologies, tools, disciplines or research areas. Education and training material and documents related to the RI are available on the PRACE website, as is the schedule of events (http://www.training.prace-ri.eu/). Six PRACE Advanced Training Centres were established in 2012.

Software and Hardware Technology Initiatives
PRACE undertakes software and hardware technology initiatives with the goal of preparing for changes in technologies used in the RI and provides the proper tools, education and training for the user communities to adapt to those changes. A goal of these initiatives is also to reduce the lifetime cost of systems and their operations, in particular the energy consumption of systems and the environmental impact.

ETP4HPC
The European Technology Platform (ETP) for High-Performance Computing (HPC) was created to improve Europe’s position in the domain of HPC technologies and to foster collaboration among all players in the HPC supply chain. ETP4HPC will promote the growth of Europe’s HPC vendors by maintaining a Strategic Research Agenda for HPC technologies, complementing the support provided by PRACE for academic and industrial user communities.

 

1.2 Objectives and Scope of the Scientific Case Update
The preceding section provides a summary of the developments since the Scientific Case was last published in 2007. The primary objectives of this update to the Scientific Case19 are to identify the scientific areas for which PRACE is an important Research Infrastructure and the key challenges within these areas, highlighting the crucial role that large-scale computer simulation is playing in many areas of science. In addition to identifying the potential outcomes in science and engineering to be addressed through PRACE petascale resources and the anticipated approach of exascale capabilities, the update focuses on the potential impact of computer simulations on the economy and society in general. This impact is quantified through the production of a roadmap of expected achievements in the next 5–8 years.

The scope of this update is to cover the period 2012–2020, including the same panels and scientific areas as before, while extending the life science panel to include medicine. The aim is to place a


greater emphasis on socio-economic challenges, business and innovation than the original Case. While leveraging the results from the EESI11, the update reports on the status of implementation of the recommendations of the original Scientific Case, in particular through the establishment of the PRACE Research Infrastructure. The final aim is to provide strategic input to the European Commission, the national funding agencies, decision makers, the science communities and the PRACE Research Infrastructure.

19 Highlighted by the PRACE SSC, the PRACE User Forum Programme Committee and the Board of Directors

The   following   five   sections   of   this   report   are   devoted   to   the   description   of   a   scientific   roadmap,  detailing  the  major  challenges,  the  scientific  and  societal  benefits  through  making  progress  towards  their  resolution,  and  the  prerequisites  for  being  able  to  tackle  these  challenges.  Five  scientific  areas  are  presented  in  sections  2–6,  each  derived  from  the  work  of  an  associated  panel  of  experts.  The  five  panels  and  their  Chairs  who  have  contributed  to  the  case  include  those  in  the  following  areas:  

• Weather, Climatology and solid Earth Sciences: Chair Giovanni Aloisio, University of Salento and ENES-CMCC, Italy. The focus is on climate change, oceanography and marine forecasting, meteorology, hydrology and air quality, and solid Earth sciences.

• Astrophysics, HEP and Plasma Physics: Chair Andreas Schäfer, University of Regensburg, Germany. A compelling and recurring theme is that of theory and modelling providing fresh insight and hence contributing to the success of major experimental facilities – from space experiments such as the European Planck Surveyor satellite to those at large European centres like CERN, FAIR and ITER.

• Materials Science, Chemistry and Nanoscience: Chair Stefan Blügel, Research Centre Jülich, Germany. The key challenges highlight the crucial role that computer simulations now play in almost all aspects of the study of materials, not only in traditional materials science, physics and chemistry, but also in nanoscience, surface science, electrical engineering, Earth sciences, biology and drug design. The demands of environmental constraints – cleaner catalysis-based chemical processes, materials able to withstand extreme stress and environments, nanotechnologies, etc. – drive many of these developments.

• Life Sciences and Medicine: Chair Modesto Orozco, Institute for Research in Biomedicine, Spain. The focus lies in the key challenges and scientific objectives in genomics, systems biology, molecular dynamics, biomolecular simulation and in medicine.

• Engineering Sciences and Industrial Applications: Chair Philippe Ricoux, TOTAL, France. The key objectives and challenges present compelling exemplars of the sheer breadth and impact of simulation. These range from innovation in technology and design (complete helicopter simulation for next-generation rotorcraft, ‘green’ aircraft and the virtual power plant) to enhanced understanding and modelling of physical phenomena in, for example, optimising gas turbines (aero-engines or power generation) or internal combustion engines (in terms of costs, stability, higher combustion efficiency, reduced fuel consumption, near-zero pollutant emissions and low noise). The scientific and societal impacts are widespread, from disaster management (e.g. forest fires) to improvements in medical care (biomedical flows).

An   outline   of   the   key   components   of   the   Computational   Infrastructure,   and   the   user   support  functions   that  must   be   provided   to   realise   the   full   benefit   of   that   infrastructure,   are   presented   in  section  7.  The  full  membership  of  each  panel  is  given  in  section  8.  Furthermore,  representatives  from  each  PRACE  Partner86  have  been  appointed  as  national  contact  points  to  disseminate  the  activity  and  identify  scientists  in  their  country  that  could  also  contribute  to  the  process.  

We consider it essential that the updated Scientific Case has broad support from the scientific community. The PRACE partners have therefore spread the information widely and encouraged participation in the preparation of the document itself. Drafts have been made publicly available for comment on the PRACE website.


1.3 Progress to be Expected During the Petascale Era
By way of introduction to the domain areas, we present initially a summary of the key scientific objectives and challenges from each of the scientific areas under consideration. We then present this summary in tabular form to impress on the reader the sheer scope of accomplishments promised through provision of a sustainable top-level infrastructure. Thus, Table 1.1 is organised to identify the key challenges from the five distinct areas listed above.

1.3.1 Weather, Climatology and solid Earth Sciences
Weather, Climatology and solid Earth Sciences (WCES) encompass a wide range of disciplines, from the study of the atmosphere, the oceans and the biosphere to issues related to the solid part of the planet. They are all part of Earth system sciences, or geosciences. Earth system sciences address many important societal issues, from weather prediction, air quality, ocean prediction and climate change to natural hazards such as seismic and volcanic events, for which the development and use of high-performance computing plays a crucial role.

Research in the fields of weather, climatology and solid Earth sciences is of key importance for Europe for:

•     Informing  and  supporting  preparation  of  EU  policy  on  environment  and  climate  mitigation  and  adaptation  

•         Understanding  the  likely  impact  of  the  natural  environment  on  EU  infrastructure,  economy  and  society  

• Enabling   informed   EU   investment   decisions   in   ensuring   sustainability   within   the   EU   and  globally  

• Developing   civil   protection   capabilities   to   protect   the   citizens   of   the   EU   from   natural  disasters  

• Supporting the joint EU and ESA initiative on Global Monitoring for Environment and Security

The  challenges  and  outcomes  in  the  WCES  scientific  domains  to  be  addressed  through  petascale  HPC  provision   are   summarised   below.  More   details   are   provided   in   Table   1.1;   the   associated   societal  benefits  are  fully  developed  in  section  1.4.    

Climate  Change    

Quantify uncertainties on the degree of warming and the likely impacts on nature and society. In particular this implies: (i) increasing the capability and complexity of ‘whole Earth system’ models that represent in ever-increasing realism and detail the scenarios for our future climate; (ii) performing process studies with ultra-high-resolution models of components of the Earth system (e.g. cloud-resolving models of the global atmosphere); (iii) running large-member ensembles of these models.

Oceanography  and  Marine  Forecasting    

Build  and  efficiently  operate  the  most  accurate  ocean  models  in  order  to  assess  and  predict  how  the  different  components  of  the  ocean  (physical,  biogeochemical,  sea-­‐ice)  evolve  and  interact.  Produce  realistic  reconstructions  of  the  ocean's  evolution  in  the  recent  past  and  accurate  predictions  of  the  ocean's   future   state   over   a   broad   range   of   time   and   space   scales,   to   provide   policy   makers,  environment  agencies  and  the  general  public  with  relevant  information  and  to  develop  applications  and  services  for  government  and  industry.  

Meteorology,  Hydrology  and  Air  Quality    

Predict  weather  and  flood  events  with  high  socio-­‐economic  and  environmental  impact  a  few  days  in  advance   –   with   enough   certainty   and   early   warning   to   allow   practical   mitigation   decisions   to   be  


taken. Understand and predict the quality of air at the Earth’s surface: development of advanced real-time forecasting systems to allow early enough warning and practical mitigation in the case of a pollution crisis.

Solid  Earth  Sciences    

Challenges   span   a  wide   range   of   disciplines   and   have   significant   scientific   and   social   implications,  playing  today  a  central  role  in  natural  hazard  mitigation  (seismic,  volcanic,  tsunami  and  landslides),  treaty   verification   for   nuclear   weapons,   and   increased   discovery   of   economically   recoverable  petroleum   resources   and   monitoring   of   waste   disposal.   Exascale-­‐class   computing   capability   will  make   it   increasingly  possible   to   address   the   issues  of   resolution,   complexity,   duration,   confidence  and  certainty,  and  to  resolve  explicitly  phenomena  that  were  previously  parameterised,  and  will  lead  to  operational  applications  in  other  European  centres,  national  centres  and  industry.  

1.3.2 Astrophysics, High-Energy Physics and Plasma Physics
In recent years, astrophysics, high-energy physics and plasma physics have shared a dramatic change in the role of theory for scientific discovery. In all of these fields, new experiments have become ever more costly, require increasingly long timescales and aim at the investigation of ever more subtle effects. Consequently, theory is faced with two types of demand. First, the precision of theoretical predictions has to be increased to the point that it is better than experimental precision. Since the latter can be expected to increase by further orders of magnitude by 2020, this is a most demanding requirement. In all of these research fields, well-established theoretical methods have existed for many decades. To achieve dramatic progress therefore requires a dramatic increase in theoretical resources, including computer resources for numerical studies.

Second, the need to explore model spaces of much larger extent than previously investigated has also become apparent. For example, to determine the nature of dark energy and dark matter requires a detailed comparison of predictions from large classes of cosmological models with data from the new satellites and ground-based detectors which will be deployed by 2020.

These  predictions  can  be  generated  only  by  massive  numerical   simulations.   In  high-­‐energy  physics,  one  of  the  tasks  is  to  explore  many  possible  extensions  of  the  Standard  Model  to  such  a  degree  that  even  minute   deviations   between   experimental   data   and   Standard  Model   predictions   can   serve   as  smoking   guns   for   a   specific   realisation   of   New   Physics.   In   plasma   physics,   one   of   the   tasks   is   to  understand  the  physics  observed  at  ITER  at  such  a  high  level  that  substantially  more  efficient  fusion  reactors   can   be   reliably   designed   based   on   theoretical   simulations  which   explore   a   large   range   of  options.    

While the three fields covered in this section are distinctly different, they also have substantial overlap. For example, the Big Bang is equally a topic of astrophysics and of high-energy physics, while nucleosynthesis depends on nuclear physics as well as on the modelling of supernova explosions. Plasma physics is crucial for many aspects of astrophysics as well as, for example, for a better understanding of high-energy heavy-ion collisions at CERN.

As the experimental roadmap until 2020 is already fixed in all three research fields, it is possible to quantify with some reliability what these demands imply for HPC in Europe. If one requires that theory keeps up with the experimental progress, which is crucial to maximise the scientific output of the latter, these three fields together will require at least one integrated sustained Eflop/s-year. This corresponds to a dedicated compute power of at least 1 Eflop/s peak for roughly one decade.
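As a rough, illustrative cross-check (the 10% sustained-to-peak efficiency below is an assumed figure, not one taken from the panels' analysis), one sustained Eflop/s-year delivered over a decade corresponds to an average sustained rate of about 0.1 Eflop/s, which at that assumed efficiency indeed points to a dedicated machine of roughly 1 Eflop/s peak:

    # Illustrative sketch only: converts an integrated demand (in sustained
    # Eflop/s-years) into a dedicated peak capability. The 10% sustained-to-peak
    # efficiency is an assumed, typical value, not a figure from this report.
    def required_peak_eflops(sustained_eflops_years, years, efficiency=0.10):
        average_sustained = sustained_eflops_years / years
        return average_sustained / efficiency

    print(required_peak_eflops(1.0, 10))  # -> 1.0 Eflop/s peak over ten years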

1.3.3 Materials Science, Chemistry and Nanoscience
We all experience in many aspects of life the increasing use of diverse materials-science-based technologies, from ingestible radio transmitters and fluorescent quantum dots for medical diagnosis and treatment to modern multifunctional cell phones that take photographs, receive and transmit


electronic   mail   and   connect   us   ad   hoc   to   digital   information   available   via   the   Internet,   the   latter  made   possible   by   virtue   of   nanotechnology-­‐enabled   electronics.   The   objective   of   computational  materials  science,  chemistry  and  nanoscience  is  to  create  novel  materials  or  chemical  agents  ranging  from   the   level   of   a   single   atom  up   to  macroscopic  measures,  with   effects   ranging   from   electronic  reaction   times   in   the   femtosecond   range   up   to   geological   periods   that   enter  materials   formation.  Computational   materials   science   is   thus   part   of   a   processing   and   manufacturing   activity   that   will  finally   be   deployed   in   a   technology   that   affects   our   society   today   and   determines   our   options   to  shape  our  future.    

Computational   materials   science,   chemistry   and   nanoscience   is   concerned   with   the   complex  interplay  of  the  myriads  of  electrons  and  atoms  in  a  solid  or  a  liquid,  thereby  producing  a  continuous  stream  of  new  and  unexpected  phenomena,  processes  and  forms  of  matter.  The  science  deals  with  the  complexity  of  the  quantum  dimension  and  the  complexity  introduced  by  the  large  configuration  space  of  the  many  particles  that   interact  and  compete  at  a   large  range  of   length,  time,  energy  and  entropy  scales.  It  focuses  on  the  conceptualisation,  development,  implementation  and  application  of  analytical   models   and   novel,   technically   very   complex   and   computationally   very   demanding  computer-­‐based   classical   and   quantum   mechanical   methods.   The   goals   are   to   analyse   and   to  interpret  experimental  characterisation,  to  assist  the  design  and  optimisation  of  routes  for  materials  synthesis,  processing  and  manufacturing  –  ranging  from  chemical  reactions  for  growth  to  long-­‐term  annealing   and   recovery   routes   of   materials.   The   aim   is   to   facilitate   the   discovery   and   design   of  materials   with   new   functionalities   and   desired   properties,   and   to   provide   the   methods   and  computational   tools   for   neighbouring   fields   such   as   life   sciences   and   medicine,   and   engineering  sciences  and  industrial  applications  that  will  result,  for  example,  in  new  drugs  or  more  efficient  solar  cells.    

The advance from petascale to exascale computing is a driving force for improving the robustness and predictability of the computational models, for instance by extending and improving the degree of quantum mechanics that is included in the models. One example is the field of strongly correlated electron materials exhibiting a wealth of exciting properties such as the absence of freezing down to the lowest of temperatures, high-temperature superconductivity, colossal magneto-resistance, orbitaltronics, or a simultaneous presence of magnetism and ferroelectricity. Many of these phenomena are investigated today by combining dynamical mean-field theory with density functional theory, but the analysis of one system in a petascale environment can easily take up to two years. Capability computing in an exascale environment is necessary to address grand challenges in computational nanoscience. It not only enables the investigation of such phenomena and materials by larger computational models to improve the robustness of the predictions, but also allows the scanning of parameter spaces as a function of temperature, pressure, external fields and stimuli for a large number of systems. More memory per computer node permits the application of new algorithms with finer energy resolution.
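A minimal sketch of what this step implies, assuming a nominal factor of roughly 1,000 between petascale and exascale capability (an illustrative assumption, not a figure from the panel): the two-year single-system analysis quoted above would shrink to less than a day, or the same two years could instead cover on the order of a thousand points in the parameter space of temperature, pressure and external fields:

    # Illustrative sketch: impact of an assumed ~1000x petascale-to-exascale step
    # on the DMFT+DFT analysis quoted above as taking up to two years per system.
    speedup = 1000                      # assumed capability ratio (illustrative)
    time_per_system_days = 2 * 365      # two years per system at petascale

    print(time_per_system_days / speedup, "days per system at exascale")
    print(speedup, "parameter-space points in the original two-year budget")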

Electron  excitations  and  dynamics   and  non-­‐adiabatic  molecular  or   spin  dynamics   are,   for  example,  responsible  for  the  description  of  photosynthesis,  photovoltaic  and  chemical  reactions  and  processes  relevant  to  ultrafast  writing  and  reading  of   information,  and  at  the  end  even  for  the  van  der  Waals  interaction.   This   is   another   area   that   benefits   greatly   from   an   exascale   facility  which   provides   the  required  computational  resources  –  computing  power  and  memory  –  for  simulations  relevant  to  the  fields  of  information  technology,  chemistry  and  the  life  sciences,  as  well  as  energy  management.    

Finally,  on  the  road  from  a  fundamental  principle  to  synthesis  and  growth  and  then  to  functionality  in  a  device  or  a  biological  environment  that  is  integrated  in  a  certain  technology,  materials  are  part  of  a  heterogeneous   system   driven   by   simulation.   The   latter   involves   the   concurrent   coupling   between  atomic-­‐scale   and   macro-­‐scale   dynamics   through   a   multiscale,   multidisciplinary   approach   that  stretches   from   fundamental   science   to   chemical   and   process   engineering.   Exascale   computing  will  provide   the  resources   to  change   in  part   the  paradigm  of  computational  materials   science,  evolving  from   explanation   and   analysis   to   discovery   and   eventually   to   control   of   materials   properties   and  


processes   at   the   quantum   scale.   It   will   become   the   driving   engine   for   a   design   continuum   from  fundamental  discovery,  through  systems  engineering  and  processing,  to  technology  that  accelerates  the   availability   of   new   devices   with   new   functionalities.   In   addition   to   promoting   the   role   of  computational   materials   science   in   guiding   the   field   with   respect   to   experimental   and   theoretical  approaches,   it   is   also   needed   to   perform   the   atomistic   level   simulations   that   provide   the  fundamental   data   for   the   coarse-­‐grained   methods   used   in   multiscale   simulation.   The   exascale  infrastructure   offers   high   throughput   through   massive   capacity   computing.   It   facilitates   the   link  between   materials   science   and   materials   informatics,   enabling   new   discoveries   through  combinatorial  search  among  a  vast  number  of  alternatives  being  in  a  feedback  loop  to  processing  and  manufacturing.    

In  conclusion,  there  is  no  doubt  that  the  materials  science,  chemistry  and  nanoscience  community  in  Europe   requires   a   large   allocation   of   CPU   time   that  will   exceed   1   Eflop/s.   There   is   a   considerable  demand   on   Tier-­‐0   capability   computing   for   dynamical   mean-­‐field   theory,   (ab-­‐initio)   molecular  dynamics,  and  multiscale  and  device  simulation.  Obviously,  a  European  exascale  environment  must  take  into  account  that  this  field  also  requires  capacity  computing  to  search  the  immense  phase  space  of  opportunities.  Therefore  a  heterogeneous   infrastructure  best   serves   this   field.  We  re-­‐emphasise  that   a   critical   requirement  of   this   community   is   the  optimal   and  efficient  use  of  massively  parallel  supercomputers   for   this   very   broad   range   of   complex   problems   in   soft   matter.   The   ‘know-­‐how’  surrounding  suitable  parallelisation  strategies  needs  to  be  strengthened  and  expanded.  

1.3.4 Life Sciences and Medicine
The benefits of the continuous development of more powerful computing systems are visible in many areas of the life sciences. For example, at the beginning of 2000, the Human Genome Project20 was an international flagship project that took several months of CPU time using a hundred-Gflop/s computer with 1 terabyte of secondary data storage.

Today, genomic sequencing has changed from being a scientific milestone to a powerful tool for the treatment of diseases, in particular because it is able to deliver results in days, while the patients are still under treatment. As an example, the Beijing Genomics Institute is capable of sequencing more than 100 human genomes a week using Next Generation Sequencing instruments and a 100 Tflop/s computer that will migrate in the near future to a 1 Pflop/s capability.21 Today, genome sequencing technology is ineffective if the data analysis needs to be carried out on a grid or cloud-like distributed computing platform. First, such systems cannot achieve the necessary dataflow, of the order of 10–100 petabytes/year;22 second, research involving living patients requires both speed and high security; last, but not least, ethical and confidentiality issues handicap distributing patient data across the cloud. In coming years, sequencing instrument vendors expect to decrease costs by one to two orders of magnitude, with the objective of sequencing a human genome for $1,000. This will make it possible to integrate genomic data into clinical trials (which typically involve thousands of human tests) and into the health systems of European countries, making drug development easier and faster and having a dramatic impact on the development of personalised therapies.
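A hedged back-of-the-envelope estimate shows how such throughput, multiplied across centres and downstream analysis copies, approaches the 10–100 petabytes/year range; the per-genome raw data volume below is an assumption chosen only for illustration:

    # Hedged estimate of raw sequencing output for a single large centre.
    # The per-genome raw data volume is an assumed, illustrative figure.
    genomes_per_week = 100        # throughput quoted above for one centre
    gb_per_genome = 300           # assumption: raw reads at ~30x coverage
    pb_per_year = genomes_per_week * 52 * gb_per_genome / 1e6

    print(f"~{pb_per_year:.1f} PB/year of raw data for one centre")  # ~1.6 PB/year
    # Several such centres, plus re-sequencing and analysis copies, push the
    # total towards the 10-100 PB/year range quoted in the text.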

We should not forget, however, that all these possibilities will develop only if computer resources can deal with the complexity of the large interconnected data sets that are serving the large life science community. For example, the EBI (which hosts the major core bio-resources of Europe) doubled its storage from 6,000 TBytes (in 2009) to 11,000 TBytes (in 2010), and has received an average of 4.6 million requests per day (see Figure 1.1).

20 International Human Genome Sequencing Consortium, Nature, 2001
21 http://www.genomics.cn/en/platform.php?id=248
22 The byte is a unit of digital information that most commonly consists of eight bits; terabyte, 1 TByte = 10^12 bytes; petabyte, 1 PByte = 10^15 bytes; exabyte, 1 EByte = 10^18 bytes; zettabyte, 1 ZByte = 10^21 bytes


 

   

Figure 1.1. The exponential growth of data storage at the EBI (TBytes). Figure reproduced from the EBI annual report.23

There are also many other steps along the drug discovery pipeline that will benefit from advances in supercomputing. For example, the identification of potential drug candidates for identified disease targets will be fuelled by next-generation supercomputers. Most lead discovery projects currently involve high-throughput screening (HTS) instruments that can scan 100,000 molecules per day looking for those showing activity against the target. The cost of this technique is very high and the typical success rate is very low. In contrast, virtual screening is a computational technique that can scan the ability of a therapeutic target to recognise molecules from a virtual library, extending the chemical search space and dramatically reducing the costs of drug discovery. Current virtual libraries can contain one billion drug-like compounds24 and they are expected to grow still larger.25 Only multi-petascale supercomputers are capable of scanning such chemical spaces while simultaneously treating a large number of potential targets. The improvement of drugs during the process of ‘lead optimisation’ largely relies on structure-based drug design procedures, requiring very large computer resources when thousands of potential modifications of the lead need to be analysed.
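A simple, hedged throughput estimate illustrates the point; the per-compound docking cost and core count below are assumptions chosen only for illustration:

    # Hedged estimate of the wall time needed to screen a billion-compound
    # virtual library against one target. Per-compound cost and core count
    # are assumed, illustrative values; real docking costs vary widely.
    library_size = 1_000_000_000
    core_seconds_per_compound = 60      # assumption: ~1 core-minute per docking
    cores = 100_000                     # assumption: a multi-petascale partition

    wall_days = library_size * core_seconds_per_compound / cores / 86_400
    print(f"~{wall_days:.0f} days of wall time per target")   # roughly a week
    # Screening many targets at once multiplies this, which is why the text
    # points to multi-petascale systems rather than grids or single clusters.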

Finally,  we  are  fast  approaching  an  information-­‐rich  scenario,  where  we  will  have  detailed  structural  information   on   biomacromolecules,   complete   information   on  DNA,   RNA   and  protein   expression   in  different   cellular   situations,   complete   metabolomic   data   and   accurate   imaging   of   sub-­‐cellular  structures,  complete  cells  and  tissues.  In  the  near  future,  we  will  need  to  integrate  all  this  data  into  predictive  mathematical  models   that  will  be  able   to   represent  not  only   individual   macromolecules  but   entire   cells   and   even   organs.   Flagship   efforts,   such   as  the  Human  Brain  Project,  which  targets  simulating   the   behaviour   of   a   human   brain,   will   open   new   views   in   the   medical   field.   The  computational  cost  of  these  multiscale  simulations,  ranging  from  macromolecules  to  entire  organs,  is  still  far  beyond  current  computational  resources.  

The priorities set out by the life sciences panel include new techniques for: (i) data management and large storage systems (increase of shared memory capacity), (ii) interactive supercomputing, (iii) data analysis and visualisation, (iv) multi-level simulation, and (v) training. Because life sciences and health is such a heterogeneous field, it will be necessary to have several application-oriented initiatives developed in parallel, although they can share similar agendas. A flexible access protocol to Tier-0 resources will be as important as absolute computer power for this community.

23 http://www.ebi.ac.uk/Information/Brochures/pdf/Annual_Report_2010_hi_res.pdf
24 Reymond et al. (2009), J. Am. Chem. Soc.
25 Bohacek et al. (1996), Medical Research Reviews


1.3.5 Engineering  Sciences    

The   engineering   sciences   domains   considered   in   this   overview   include   turbulence,   combustion,  aeroacoustics,  biomedical   flows,  and  general  process   technologies  and  chemical  engineering.  The  challenges  and  outcomes   in   the  engineering  scientific  domains   to  be  addressed  through  petascale  HPC  provision  are  summarised  below,  with  more  detail  provided  in  Table  1.1;  the  associated  societal  benefits  are  fully  developed  in  section  1.4.    

Turbulence  

Turbulence is characterised by many degrees of freedom, measured by the Reynolds number, which imply large computational grids. Present simulations have Reynolds numbers of a few thousand, involve 10^10 grid points, and run for hundreds of millions of CPU hours on O(10^5) processors. Direct Numerical Simulations (DNS), which centred on simple turbulent channels five years ago, have turned to jets and boundary layers, and the trend towards ‘useful’ flows is likely to continue. The Reynolds numbers have increased by a factor of roughly five, implying a work increase of three orders of magnitude.

However, this can only be considered an intermediate stage in turbulence research. There is a tentative consensus that a ‘breakthrough’ boundary layer free of viscous effects requires Reynolds numbers of the order of Reτ = 10,000 – five times higher than present simulations. That implies computer times 1,000 times longer than at present (Re^4), and storage capacities 150 times larger (Re^3). Turbulence research requires the storage and sharing of large data sets, becoming O(20 PBytes) per case within the next 5–10 years. The rewards in the form of more accurate models, increased physical understanding and better design strategies will grow apace.
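These scalings can be summarised in a small illustrative calculation, assuming compute cost grows roughly as Re^4 and storage as Re^3 as quoted above; the baseline figures are indicative only, loosely matched to the numbers in the text:

    # Illustrative DNS scaling estimate, assuming compute ~ Re^4 and storage ~ Re^3
    # as quoted above. Baseline values are indicative, not measured figures.
    def dns_requirements(re_now, re_target, cpu_hours_now, storage_pb_now):
        f = re_target / re_now
        return cpu_hours_now * f**4, storage_pb_now * f**3

    # Baseline (assumed): Re_tau ~ 2,000 today, ~1e8 CPU hours, ~0.15 PB per case.
    cpu_hours, storage_pb = dns_requirements(2_000, 10_000, 1e8, 0.15)
    print(f"{cpu_hours:.1e} CPU hours, {storage_pb:.0f} PB per case")
    # A five-fold increase in Re gives ~625x the compute and ~125x the storage,
    # broadly consistent with the ~1,000x and ~150x factors cited above.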

Combustion  

Scientific challenges in combustion are numerous. First, a large range of physical scales must be considered, from fast chemical reaction characteristics and pressure wave propagation up to burner or system scales. Turbulent flows are, by nature, strongly unsteady. Handling chemistry and pollutant emissions in numerical simulations requires adapted models, the treatment of fuels demands that two-phase flows are taken into account, while solid particles such as soot may also be encountered. Interactions between flow hydrodynamics, acoustics and combustion may induce strong combustion instabilities or cycle-to-cycle variations, degrading burner performance and, in extreme cases, leading to destruction of the system. The design of cooling systems requires knowledge of heat transfer to walls due to conduction, convection and radiation, as well as of flame/material interactions.

Aeroacoustics  

In the development of new aircraft, engines, high-speed trains, wind turbines and so forth, the prediction of the flow-generated acoustic field becomes vital as society expects a quieter environment and noise regulations become stricter. The future of noise prediction, and one day even noise-oriented design, belongs to unsteady three-dimensional numerical simulations and first principles, but the contribution of such methods to industrial activities in aerospace seems to be years away. Certification often depends on a fraction of a dB, whereas at present predicting noise to within, say, 2 dB without adjustable parameters is decidedly impressive.

The state of the art is limited to simplified components or geometries which can be tackled using manually generated structured meshes, in contrast to the actually installed systems which need to be simulated. Massively parallel machines in the Eflop/s range and higher are essential for solving aeroacoustics problems not only on a generic but also on an industrial scale.


Biomedical  Flows  

Surgical treatment in human medicine can be optimised using virtual environments, where surgeons perform pre-surgical interventions to explore best-practice methods for the individual patient. The treatment of the pathology is supported by analysing the flow field, for example optimising nasal cavity flows or understanding the development of aneurysms. The computational requirements for such flow problems have constantly increased over recent years and have reached the limits of petascale computing. It is vital to understand fully the details of the flow physics in order to draw conclusions about medical pathologies and to propose, for instance, shape optimisations for surgical interventions. Such an in-depth analysis can be obtained only by a higher resolution of the flow field, which in turn increases the overall problem size.

As an example of the demands of biomedical flow requirements, computations that have to be performed for the nasal cavity problem under high-frequency conditions involve Reynolds numbers in the range of Re ≈ 15,000. Tackling problems where the entire fluid and structural mechanics of the respiratory system is simulated will demand even the next generation of exascale computers.

General  Process  Technologies,  Chemical  Engineering  

Chemical   engineering   and   process   technology   are   traditional   users   of   HPC   for   dimensioning   and  optimising   reactors   in   the  design   stage.  Computational   techniques  are  also  used   for   improving   the  operation   of   processes,   for   example   through  model   predictive   optimal   control,   or   through   inverse  modelling   for   estimating   system   parameters.   The   computational   models   used   in   chemical  engineering   span   a   wide   range   of   scales.   On   the   microscopic   level,   chemical   reactions   may   be  represented  by  molecular  dynamics  techniques,  while  on  the  mesoscopic  level,  flows  through  pores  or  around  an  individual  particle  may  be  of  interest.    The  macroscopic  scale  eventually  considers  the  operation  including  heat  and  mass  transfer  in  a  full  industrial-­‐scale  reactor  or  even  the  operation  of  a  full  facility.  

Exascale systems will permit a better understanding of highly dispersed phenomena or very large up- (or down-) scaling problems, such as aggregate formation and growth, through the development of much-improved particle simulation technologies, for example for describing multiscale interactions between fluids and structures, fluid-solid suspensions, interfaces and multi-physics coupling.

1.3.6 Industrial  Applications  

A   variety   of   industrial   applications   are   considered   in   this   overview,   including   aeronautics,   turbo  machines  and  propulsion,   energy,   automotive,  oil   and  gas,   and  other  applications.  Requirements  are  outlined  below  and  presented  in  greater  detail  in  Table  1.1.  

Aeronautics
Aircraft companies are now heavily engaged in trying to solve problems such as calculating maximum lift using HPC resources. This problem has an insatiable appetite for computing power and, if solved, would enable companies designing civilian and military aircraft to produce lighter, more fuel-efficient and environmentally friendlier planes. The challenges of future aircraft transportation (‘Greening the Aircraft’) demand the ability to flight-test a virtual aircraft with all its multidisciplinary interactions in a computer environment, to compile all of the data required for development and certification – with guaranteed accuracy – in a reduced time frame. For these challenges, a complete digital aircraft will require more than Zflop/s systems.

Turbo Machines, Propulsion

Numerical simulation and optimisation is pervasive in the aeronautics industry, and notably in the design of propulsion engines. The main targets are substantial reductions in specific fuel consumption and environmental nuisance – in particular greenhouse gases, pollutant emissions and noise – as put forward by regulators such as ACARE and IATA. On the engine side, these ambitious goals are pursued by increasing propulsive and thermodynamic efficiency, reducing weight and, finally, controlling sources of noise. The development of disruptive propulsive technology is needed, relying even more heavily on numerical tools to overcome the lack of design experience. There are two major HPC-related challenges: the use of high-fidelity numerical tools leading to a more direct representation of turbulence, and the evolution of optimisation strategies.

Energy

Objectives include improved safety and efficiency of the facilities (especially nuclear plants), plus optimisation of maintenance, operation and life span. This is one area in which physical experimentation – for example with nuclear plants – can be both impractical and unsafe. Computer simulation, in both the design and operational stages, is therefore indispensable.

Considering  thermal  hydraulic  CFD  applications,  the  improvement  of  efficiency  may  typically  involve  mainly  steady  CFD  calculations  on  complex  geometries,  while  improvement  and  verification  of  safety  may   involve   long   transient   calculations   on   slightly   less   complex   geometries.   As   safety   studies  increasingly   require   assessment   of   CFD   code   uncertainty,   sensitivity   to   boundary   conditions   and  resolution  options  must   be   studied,   but   turbulence  models  may   still   induce   a   bias   in   the   solution.  Doing  away  with  turbulence  models  and  running  DNS-­‐type  calculations  at  least  for  a  set  of  reference  calculations   would   be   a   desirable   way   of   removing   this   bias.   Such   studies   will   require   sustained  access  to  multi-­‐Eflop/s  capacities  over  several  weeks.  

Neutronics applications

These include the capability to model very complex, possibly coupled phenomena over extended spatial and time scales. In addition, uncertainty quantification and data assimilation are considered key to industrial acceptance, so the associated computational needs, which depend on the complexity of the model considered, must also be met.

Automotive

The automotive industry is actively pursuing important goals that need Eflop/s computing capability or greater. These include (i) vehicles that will operate for 250,000 kilometres on average without the need for repairs, (ii) full-body crash analysis that includes simulation of soft tissue damage, and (iii) longer-lasting batteries for electrically powered and hybrid vehicles.

For both aerodynamics and combustion, at least LES simulations, and if possible DNS, are required at industrial scale, and Eflop/s applications must be developed at the right scale.

Crash:   Most   computations   are   currently   done   in   parallel   (8–64   cores),   and   scalability   tests   have  shown  that  up  to  1,024  cores  may  be  reasonable  on  10  million  finite  elements.  It  is  likely  that  future  simulations   will   require   model   sizes   for   a   full   car   ranging   from   1.5   to   10   billion   finite   elements,  demanding  the  development  of  new  codes  (mainly  open  source)  for  Eflop/s  systems.  

Oil and Gas

The petroleum industry is driven to increase the efficiency of its processes, especially in exploration and production, and to reduce risks through the deployment of HPC. Typical steps in the business process are: geoscience for the identification of oil and gas underground; development of reservoir models; design of facilities for the recovery of hydrocarbons; drilling of wells and construction of plant facilities; operations during the life of the fields; and eventually decommissioning of facilities at the end of production.

Other Industrial Applications

Banks and insurance companies are increasingly using HPC, mostly for embarrassingly parallel Monte-Carlo solutions of stochastic ODEs; but high-frequency trading will inevitably require better models and faster calculation. They also have the challenge of interconnecting supercomputers and several private clouds.

In  common  with  many  other   industries  mentioned   in  the  report,   they  are   faced  with  the   ‘big  data’  problem   in   the   sense   that   massive   market   data   are   available   (Reuters)   and   current   calibration  algorithms  cannot  exploit   such   large   input.  Note   that  41  machines  are  characterised  as   ‘finance’   in  the  November  2011  Top  500  list.  
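
To illustrate why such Monte-Carlo workloads parallelise so readily, a minimal sketch is given below; the model (geometric Brownian motion), the parameters and the payoff are illustrative assumptions only, not any institution's production code.

```python
# Minimal sketch: embarrassingly parallel Monte-Carlo valuation of a European call
# under geometric Brownian motion (illustrative parameters, not a production model).
import numpy as np

def price_chunk(n_paths, s0=100.0, k=105.0, r=0.02, sigma=0.25, t=1.0, seed=0):
    """Value one independent chunk of Monte-Carlo paths using the exact GBM terminal value."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_paths)
    s_t = s0 * np.exp((r - 0.5 * sigma**2) * t + sigma * np.sqrt(t) * z)
    payoff = np.maximum(s_t - k, 0.0)
    return np.exp(-r * t) * payoff.mean()

# Each chunk is statistically independent, so chunks can be farmed out to thousands
# of cores and simply averaged at the end -- the 'embarrassingly parallel' structure
# referred to above.
chunks = [price_chunk(100_000, seed=s) for s in range(8)]
print("estimated price:", sum(chunks) / len(chunks))
```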

Pharma   industries,   firmly   established   in   Europe,   already   use   ab-­‐initio   and   molecular   simulation  applied   to   their  domains,  and  will   increase  R&D  efforts   in   this   field,   for  example  drug  design   (GSK,  Sanofi)  or  biomedical  applications  (L’Oréal)  (see  sections  4  and  5  of  this  report).  

1.3.7 A Schematic Overview of the Scientific Roadmap

To complete our introduction to the domain areas, we present the key scientific objectives and challenges in tabular form to impress on the reader the sheer scope of accomplishments promised through provision of a sustainable top-level infrastructure.

Table   1.1   is   organised   to   identify   the   key   challenges   from   the   five   distinct   areas   listed   above.    

Table 1.1. The challenges and outcomes in science and engineering to be addressed through petascale HPC provision.

Application   Science  Challenges  and  Potential  Outcomes    

Weather, Climatology and Solid Earth Sciences

Climate  Change  

In the last decade, our understanding of climate change has increased, as has the societal need for pull-through to advice and policy. However, while there is great confidence in the fact that climate change is happening, there remain uncertainties. In particular, there is uncertainty about the levels of greenhouse gases and aerosols likely to be emitted and, perhaps more significantly, there are uncertainties about the degree of warming and the likely impacts. These latter uncertainties can only be reduced by increasing the capability and complexity of 'whole Earth system' models that represent, in ever-increasing realism and detail, the scenarios for our future climate. A further challenge is to provide more robust predictions of regional climate change at the decadal, multi-decadal and centennial timescales to underpin local adaptation policies. In many regions of the world, there is still considerable uncertainty in the model predictions of the local consequences of climate change. Model resolution plays a key role. A dual-track approach should be taken, involving multi-model comparisons at the current leading-edge model resolution (about 20 km), alongside the longer-term aim of developing a global convection-resolving model (about 1 km). Reducing these uncertainties in climate projections requires a coordinated set of experiments and multi-year access to a stable HPC platform. Issues relating to mass data storage, and to the dissemination of model outputs for analysis to a wide-ranging community of scientists over a long period, will need to be resolved. A multi-group programmatic approach could allow Europe to carry out a set of model inter-comparisons focused on a number of priority climate science questions.
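
As a rough indication of what the step from 20 km to 1 km implies, the estimate below assumes the common rule of thumb that the cost of an explicit global dynamical core grows with roughly the cube of horizontal resolution (grid points in two dimensions multiplied by a CFL-limited time step); the exact exponent varies from model to model.

```latex
\[
\frac{C_{1\,\mathrm{km}}}{C_{20\,\mathrm{km}}} \approx \left(\frac{20\,\mathrm{km}}{1\,\mathrm{km}}\right)^{3} = 8{,}000
\]
```

before any increase in vertical resolution, ensemble size or simulated period is taken into account.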

Oceanography  and  Marine  Forecasting  

The ocean is a fundamental component of the Earth system. Improving understanding of ocean circulation and biogeochemistry is critical to assess properly climate variability and future climate change and related impacts on, for example, ocean acidification, coastal sea level, marine life and polar sea-ice cover. The ocean greatly influences the climate system at shorter timescales, i.e. for weather forecasting and seasonal climate prediction, but its influence grows as timescales increase. Beyond climate, ocean scientists are being called on to help assess and maintain the wealth of services that the ocean provides to society. Human activities, including supply of food and energy, transport of goods, etc., exert an ever-increasing stress on the open and coastal oceans. These stressors must be evaluated and regulated in order to preserve the ocean's integrity and resources. Society must also protect against marine natural hazards. Marine safety concerns are becoming more acute as the coastal population and maritime activities continue to grow. For all these concerns, there is a fundamental need to build and efficiently operate the most accurate ocean models in order to assess and predict how the different components of the ocean (physical, biogeochemical, sea-ice) evolve and interact. The main perspective is to produce realistic reconstructions of the ocean's evolution in the recent past (e.g. reanalyses) and accurate predictions of the ocean's future state over a broad range of time and space scales, to provide policy makers and the general public with relevant information, and to develop applications and services for government and industry.

Meteorology,  Hydrology  and  Air  Quality  

Weather and flood events with high socio-economic and environmental impact may be infrequent, but the consequences of their occurrence can be catastrophic for the societies and Earth systems affected. There is, of course, a link to climate prediction and climate change impacts, if severe meteorological and hydrological events are to become more frequent and/or more extreme. Predicting these low-frequency, high-impact events a few days in advance – with enough certainty and early warning to allow practical mitigation decisions to be taken – remains difficult. Understanding and predicting the quality of air at the Earth's surface is an applied scientific area of increasing relevance. Poor air quality can cause major environmental and health problems affecting both industrialised and developing countries around the world (e.g. adverse effects on flora and fauna, and respiratory diseases, especially in sensitive people). Advanced real-time forecasting systems are basic and necessary tools for providing early warning advice to populations and practical mitigation strategies in the event of an air-pollution crisis.

Solid  Earth  Sciences  

Computational challenges in solid Earth sciences span a wide range of scales and disciplines, and address fundamental problems in understanding the Earth system – evolution and structure – in its near-surface environment. Solid Earth sciences have significant scientific and social implications, today playing a central role in natural hazard mitigation (seismic, volcanic, tsunami and landslides), hydrocarbon and energy resource exploration, containment of underground wastes and carbon sequestration, and national security (nuclear test monitoring and treaty verification). In the realm of seismic hazard mitigation alone, it is well to recall that, despite continuous progress in building codes, one critical remaining step is the ability to forecast the earthquake ground motion to which a structure will be exposed during its lifetime. Until such forecasting can be done reliably, complete success in the design process will not be achieved. All these areas of expertise require increased computing capability in order to deliver breakthrough science. A programme of provision of leadership-class computational resources will make it increasingly possible to address the issues of resolution, complexity, duration, confidence and certainty, and to resolve explicitly phenomena that were previously parameterised. Each of the challenges represents an increase by a factor of at least 100 over the individual national facilities currently available. A large number of the numerical models and capability-demanding simulations described below will lead to operational applications in other European centres, national centres and industry.

 

Astrophysics, HEP and Plasma Physics

Astrophysics  

Understanding   our   place   in   time   and   space   has   been   central   to   human   scientific   and  sociological   advancement   throughout   the   past   four   centuries.   An   appropriate   mix   of  computing   infrastructure,   software   development   and   observational   facilities   will   allow  significant  progress  in  the  next  decade  in  answering  a  number  of  outstanding  fundamental  questions  in  astrophysics  and  cosmology:      

What  is  the  identity  of  the  cosmic  dark  matter  and  dark  energy?  

How  did  the  universe  emerge  from  the  dark  ages  following  the  Big  Bang?  

How  did  galaxies  form?  

How  do  galaxies  and  quasars  evolve  chemically  and  dynamically  and  what  is  the  cause  of  their  diverse  phenomenology?  

How does the chemical enrichment of the universe take place?

How do stars form?

How do stars die?

How do planets form?

Where  is  life  outside  the  Earth?  

How  are  magnetic  fields  in  the  universe  generated  and  what  role  do  they  play  in  particle  acceleration  and  other  plasma  processes?  

How  can  we  unravel  the  secrets  of  the  sources  of  strongest  gravity?  

What  will  as  yet  unexplored  windows  into  the  universe  such  as  neutrinos  and  gravitational  waves  reveal?  

 

Answering   these  questions   requires   accurate  numerical   treatment  of   a   range  of   coupled  complex   non-­‐linear   physical   processes   including   gravitation,   hydrodynamics,   non-­‐equilibrium   gas   chemistry,   magnetic   fields,   radiative   transfer   and   relativistic   effects.    Although   the  equations   that  describe   these  physical  processes  are  well   known,   solutions  are   attainable   only   by   numerical   simulation.   Exascale   resources   together   with   efficient  algorithms  are  essential  for  this  task.  

Elementary  Particle  Physics  

While   particle   physics   has   been   very   successful   in   recent   decades   in   unravelling   the  fundamental,  underlying  laws  governing  our  world,  it  is  certain  that  very  important  aspects  are  not  yet  understood.    In  particular,  it  is  clear  that  physics  has  to  change  dramatically  at  much  larger  energy  scales  than  those  explored  so  far.  In  view  of  the  enormous  costs  of  accelerators  with  much  higher  energy   than   the  existing   and  planned  ones   –   in   Europe   in  particular   LHC  and   FAIR  –   the  missing  pieces  of  the  puzzle  can  be  primarily  searched  for  at  the  precision  frontier.    This  requires  very  high  luminosity  accelerators  on  the  one  side  and  very  precise  theoretical  calculations  and  the  systematic  exploration  of  possible  scenarios  for  Beyond-­‐the-­‐Standard-­‐Model   (BSM)  physics  on  the  other  side.  The   latter   two  aspects  depend  crucially  on   input  from  lattice  field  theory.      Within   the   Standard   Model,   the   theoretical   uncertainties   are   dominated   by   non-­‐perturbative  aspects  of  QCD  (quantum  chromodynamics),  many  of  which  can  be  explored  with  lattice-­‐QCD  (LQCD).  Some  of  the  most  relevant  topics  are:    

• The  QCD  phase  diagram,  i.e.  QCD  at  finite  temperature  and  baryon  density,    which  is  also  very  relevant  for  astrophysics    

• The  three-­‐dimensional  quantum  structure  of  hadrons  as  encoded  in  precisely  defined  QCD  quantities  like  Generalised  Parton  Distributions  (GPDs),  distribution  amplitudes  (DAs),  transverse  momentum-­‐dependent  distribution  functions  (TMDs),  Bag  parameters,    etc.      

• The quark masses, which are some of the fundamental parameters of the Standard Model

A crucial advantage of computer simulations is that masses, coupling constants and other aspects of the Standard Model can be varied at will, which allows accidental correlations to be distinguished from fundamental ones and thus clarifies many aspects of physics. The latter, namely the possibility of exploring theoretical model spaces, is also a most important aspect of the search for BSM physics: without such numerical simulation of non-perturbative aspects it is not possible to know where best to search for signals of the new physics.

Plasma  Physics  

Some   of   the  most   demanding   scientific   and   computational   grand   plasma   challenges   are  closely  tied  to  the  development  of  plasma-­‐confinement  devices  for  fusion  energy  research  and   the   recent   developments   in   ultra-­‐intense   laser   technology,   with   the   possibility   of  exploring  astrophysical  scenarios  with  a  fidelity  that  was  previously  not  accessible  due  to  limited  computing  power.    The  main  scientific  challenges  are  in:    

• Plasma  accelerators  (either  laser  or  beam  driven)  and  possible  advanced  radiation  sources  based  on  these,  which  have  promising  applications  in  bio-­‐imaging  and  medical  therapy    

• Magnetic  confinement  fusion  devices  and  in  particular  ITER,  the  international  fusion  experiment    

• Inertial  fusion  energy  and  advanced  concepts  with  ultra-­‐intense  lasers,  which  aim  to  demonstrate  nuclear  fusion  ignition  in  the  laboratory    

• Collisionless  shocks  in  plasma  astrophysics,  associated  with  extreme  events  such  as  gamma  ray  bursters,  pulsars  and  AGNs  

• Solar  physics  addressing  the  combined  Sun  and  Earth  magneto-­‐plasma  system  by  means  of  observations  and  simulations  

These are topics of relevance not only from a fundamental perspective but also in terms of potential direct economic benefits. Research in plasma accelerators is exploring the route to a new generation of more compact and cheaper particle and light sources. The Magnetic Confinement Fusion (MCF) and Inertial Confinement Fusion (ICF) approaches to nuclear fusion are critical for sustainable energy production – a driving force for economic growth.

Materials Science, Chemistry and Nanoscience  

Materials  Informatics  

Materials informatics offers a pathway to cope with the challenge of developing new materials technologies at a faster rate, in a more cost-effective way, and in closer alignment with the product development cycle than was previously possible. Combining quantum methods for computing the stability of materials with information techniques such as data mining and data analytics offers a novel approach to materials design. Owing to the increased availability of computational resources, it is possible to run calculations on many thousands of potential materials and to generate sizeable 'theoretical databases'. Databases of derived materials, with calculated physical and engineering properties, will no doubt be an increasingly important tool for researchers and engineers working in fields related to materials development.
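
As a schematic illustration of the kind of screening such theoretical databases enable, the short sketch below filters a toy table of computed candidate compounds by stability and band gap; the formulae, property values and thresholds are invented purely for illustration.

```python
# Toy illustration of database screening in materials informatics: filter
# hypothetical computed candidates by stability and band gap (all values invented).
import pandas as pd

candidates = pd.DataFrame({
    "formula":         ["A2BO4", "ABO3", "AB2X4", "A3BX6"],   # hypothetical compounds
    "e_above_hull_eV": [0.00,     0.03,   0.12,    0.01],      # proxy for thermodynamic stability
    "band_gap_eV":     [1.4,      2.1,    0.3,     1.8],       # computed electronic property
})

# Keep candidates that are (nearly) stable and fall in a target band-gap window,
# e.g. for a photovoltaic absorber screen.
hits = candidates[(candidates["e_above_hull_eV"] <= 0.05) &
                  candidates["band_gap_eV"].between(1.0, 2.0)]
print(hits["formula"].tolist())
```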

Multiscale  Modelling  

A  pressing  research  challenge  involves  the  integration  of  the  various  length  and  time  scales  relevant   for  materials   science.  Multiscale  materials   simulation   is   currently   a   high-­‐impact  field  of  research,  where  much  effort  is  focused  towards  more  seamless  integration  of  the  length   and   time   scales,   from   electronic   structure   calculations,   atomistic   and   molecular  dynamics,   kinetic   and   statistical   modelling   to   the   continuum.   Together   with   new   and  emerging  techniques,  the  provision  of  increased  computational  power  can  yield  answers  to  versatile   and   complex   questions   central   to   materials   manufacture,   non-­‐equilibrium  processing  –  growth,  processing,  patterning  using  electron  or  ion  beams,  or  plasma  sources  –   properties,   performance   and   technological   applications.   A   sufficiently   detailed   and  realistic  computational  modelling  and  understanding  of  these  highly  complex  physical  and  technological   processes   can   be   achieved   only   by   large-­‐scale   computer   simulations  combining  a  range  of  simulations  that  cover  different  length  and  time  scales.  

Soft  Matter  Systems  –  Structure  and  Flow  

A sufficiently detailed theoretical modelling and understanding of highly complex soft matter systems is possible only through large-scale computer simulations. This is a huge challenge, as the relevant structures span many orders of magnitude in length scale. The timescale challenges are even greater. Thus, exceptional amounts of computer time are needed to simulate soft matter and soft materials in thermal equilibrium. Describing the behaviour of soft matter under flow is even more challenging. To address these challenges, mesoscale hydrodynamics simulation techniques, such as Lattice-Boltzmann, Dissipative Particle Dynamics and Multi-Particle Collision Dynamics, have been developed in recent years which allow the investigation of many interesting and important issues.

Photo-­‐chemistry  

Sunlight is the predominant source of energy on Earth and a key factor in photosynthesis and photovoltaics. The in-depth understanding of the nature of electronic excited states in biological or other complex systems is unquestionably one of the key subjects in present-day chemical and physical sciences, and there are wide-ranging technological applications of these processes. It is a challenge to simulate realistic photo-activated processes of interest in biology and materials science. These phenomena usually involve non-adiabatic transitions among the electronic states of the system, induced by the coupled motion of electronic and nuclear degrees of freedom. Consequently, their simulation requires both accurate ab-initio calculations of the (many) electronic states of the system and of the couplings among them, and the non-adiabatic time evolution of its components.

Nanoscience: Ab-Initio Quantum Device Simulation

Our  understanding  of   self-­‐assembly,  programmed  materials,   and  complex  nanosystems  and  their  corresponding  architectures   is  still   in   its   infancy,  as   is  our  ability  to  support  design  and  nano-­‐manufacturing.  The  advance  of  faster  and  less  energy-­‐consuming  information  processing  or  the  development  of  new  generations  of  processors  requires  the  shrinking  of  devices,  which  demands  a  more  detailed  understanding  of  nano-­‐electronics.  As   semiconductor  devices  get  smaller,   so   it   becomes   more   difficult   to   design   or   predict   their   operation   using   existing  techniques.  Given  this  reduction  in  size,  the  next  generation  of  supercomputers  will  enable  us  to  perform  simulations  for  whole  practical  nanoscale  devices,  based  on  electronic  theory  and  transport   theory,   and   to  develop  guidelines   for  designing  new  devices   that   incorporate   the  quantum  effects  that  control  nano-­‐level  phenomena.  

Life Sciences and Medicine  

Genomics  

The   fast   evolution   of   genomics   is   fuelling   the   future   of   personalised   medicine.   Genetic  variability   affects   how   drugs   react   with   each   patient,   sometimes   in   a   positive   manner  (increasing   the   healing   effect),   sometimes   in   a   negative   manner   (increasing   toxic   side  effects)  or  simply  by  reducing  drug  response.  Personalised  medicine  is  a  concept  that  will  replace  the  outdated  idea  that  a  single  drug  is  the  solution  for  an  entire  population.  Thanks  to  recent  advances  in  high-­‐throughput  genome  sequencing,  we  can  already  access  the  full  genomic   profile   of   a   patient   in   a   single   day,   and   the   throughput   of   next   generation  sequencing  techniques  is  increasing  much  faster  than  Moore’s  law.    

Currently, sequencing centres require multi-petabyte systems to store patient data, and data processing is carried out on supercomputers in the 100 Tflop/s to 1 Pflop/s range. Requirements are expected to increase dramatically as sequencing projects are extended to entire populations, making linkage studies possible; however, for most genomics challenges, an Eflop/s computer that is 'unbalanced' (in the ratio of compute nodes to I/O and memory capacity) would constitute a substantial barrier to efficient utilisation.

Systems  Biology  

The   perturbation   of   biological   networks   is   a   major   underlying   cause   of   adverse   drug  effects.   Intense   research   is   being   carried   out   today   to   develop   models   for   identifying  protein  network  pathways   that  will  help  us   to  understand  the  undesired  effects  of  drugs  and   explore   how   they   are   related   to   network   connectivity.   Detailed   knowledge   of   the  structure   and   dynamics   of   biological   networks   will   undoubtedly   uncover   new  pharmacological   targets.   The   use   of   complex   network   medicine   is   expected   to   have   a  dramatic  impact  on  therapy  in  several  areas:  the  discovery  of  alternative  targets;  reducing  toxicity  risks  associated  with  drugs;  opening  new  therapeutic  strategies  based  on  the  use  of  ‘dirty’  drugs  targeting  different  proteins;  helping  to  discover  new  uses  for  existing  drugs.    

Systems biology is now at the stage of collecting data to build models for complex simulations that will, in the near future, describe the dynamics of cells and organs that presently remain unknown. Progress is rapid, and systems biology will allow us to couple the simulations of these models with a biomedical problem. This will require large computational resources, and systems biology will benefit from Eflop/s capabilities, but aspects related to data management are going to be as important as pure processing capability.

Molecular  Simulation  

Eflop/s capabilities will allow the use of more accurate formalisms (more accurate energy calculations, for example) and enable molecular simulation for high-throughput applications (e.g. the study of a larger number of systems). Unfortunately, if Eflop/s capabilities are achieved simply by aggregating a vast number of slow processors, this will not favour studies of longer timescales (a key tool for computer-aided drug design), since it will not be possible to scale up to hundreds of thousands of cores (as the simulated systems typically have fewer than 1 million atoms). The lack of high-performance computers appropriate for this research will displace R&D activities to the USA, China or Japan, putting European leadership in this field at risk. Appropriate exascale resources could revolutionise the simulation of biomolecules, allowing molecular simulators to decipher the atomistic clues to the functioning of living organisms.
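
A simple headcount illustrates the scaling limit described above; assuming, for illustration, a typical biomolecular system of about one million atoms spread over one hundred thousand cores:

```latex
\[
\frac{10^{6}\ \text{atoms}}{10^{5}\ \text{cores}} \approx 10\ \text{atoms per core},
\]
```

far too few atoms per core for the communication costs of domain decomposition to be amortised, so strong scaling saturates long before such core counts are reached.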

Biomedical  Simulation  

The extensive use of simulation will help to integrate knowledge and data about the body, tissues, cells, organelles and biomacromolecules into a common framework that will facilitate the simulation of the impact of factors that perturb the basal situation (drugs, pathology, etc.). Simulation will reduce costs, time to market and animal experimentation. In the medium to long term, simulation will have a major impact on public health, providing insights into the cause of diseases and allowing the development of new diagnostic tools and treatments. It is expected that understanding the basic mechanisms of cognition, memory, perception, etc., will allow the development of completely new forms of energy-efficient computation and robotics. The potential long-term social and economic impact is immense.

Engineering Sciences and Industrial Applications

Turbulence  

Direct Numerical Simulations (DNS, using no models), which centred on simple turbulent channels five years ago, have turned to jets and boundary layers, which are much closer to real-life applications, and the trend towards 'useful' flows is likely to continue. The Reynolds numbers have increased by a factor of roughly five, implying a work increase of three orders of magnitude. However, this is only an intermediate stage in turbulence research. A 'breakthrough' boundary layer free of viscous effects requires Reynolds numbers of the order of Reτ = 10,000 – five times higher than present simulations. That implies computer times 1,000 times longer than present (scaling as Re^4), and storage capacities 150 times larger (scaling as Re^3). Keeping wall times constant implies increasing processor counts from the present O(32 Kproc) to O(32 Mproc), which will require rewriting present codes but is probably not insurmountable. Storage might be a tougher problem. Turbulence research requires storing and sharing large data sets, presently O(100 TBytes) per case, and becoming O(20 PBytes) within the next 5–10 years. Archiving, transmitting and post-processing those data will require work, but the rewards in the form of more accurate models, increased physical understanding, and better design strategies will grow apace.
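
The factors quoted above follow directly from the stated cost scalings; a minimal worked version, assuming computer time grows as Re^4 and storage as Re^3 for a five-fold increase in Reynolds number:

```latex
\[
\left(\frac{Re_{\text{target}}}{Re_{\text{present}}}\right)^{4} = 5^{4} = 625 \sim 10^{3},
\qquad
\left(\frac{Re_{\text{target}}}{Re_{\text{present}}}\right)^{3} = 5^{3} = 125 \sim 150,
\]
```

consistent with the quoted thousand-fold increase in computer time and roughly 150-fold increase in storage once constants and resolution margins are included.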

Combustion  

Scientific challenges in combustion are numerous. First, a large range of physical scales must be considered, from fast chemical reaction characteristics and pressure wave propagation up to burner or system scales. Turbulent flows are, by nature, strongly unsteady. Chemistry and pollutant emissions involve hundreds of chemical species and thousands of chemical reactions, and they cannot be handled in numerical simulations without adapted models. Usual fuels are liquid, storing a large amount of energy in small volumes. Accordingly, two-phase flows should be taken into account (fuel pulverisation, spray evolution, vaporisation, mixing and combustion). Solid particles, such as soot, may also be encountered. Interactions between flow hydrodynamics, acoustics and combustion may induce strong combustion instabilities (gas turbines, furnaces) or cycle-to-cycle variations (piston engines), decreasing burner performance and, in extreme cases, leading to the destruction of the system in a short time. Control devices may help to avoid these instabilities. The design of cooling systems requires knowledge of heat transfer to walls due to conduction, convection and radiation, as well as of flame/material interactions. Europe is very well positioned in this field through the development of scalable and mature codes based on LES and DNS simulations. Fire simulations are less mature than gas turbine or internal combustion engine computations, but predictions in terms of safety, prevention and fire fighting are challenging. Forest fires regularly and severely affect southern European countries and, because of climate change, may affect northern regions in the future. Their social impact is very important. The simulation of fire fighting – for example, by dropping fluids with or without retardant – is also a challenging research topic of crucial importance.

Aeroacoustics  

In the development of new aircraft, engines, high-speed trains, wind turbines and so forth, the prediction of the flow-generated acoustic field becomes more and more important, since society expects a quieter environment and noise regulations, not only near airports, become stricter every year. The future of noise prediction, and one day even of noise-oriented design, belongs to unsteady three-dimensional numerical simulation from first principles, but the contribution of such methods to industrial activities in aerospace seems to be years away. Certification often depends on a fraction of a dB, whereas presently predicting noise to within, say, 2 dB without adjustable parameters is decidedly impressive. The state of the art is limited to simplified components or geometries which can be tackled using manually generated structured meshes, in contrast to the systems actually installed, which need to be simulated, most probably using adaptive unstructured body-fitted or Cartesian grids. The latter can be decomposed into an arbitrary number of blocks such that the computations can be done on massively parallel machines in the Eflop/s range and higher. Such machines are essential.

Biomedical  Flows  

Surgical treatment in human medicine can be optimised using virtual environments, where surgeons perform pre-surgical interventions to explore best-practice methods for the individual patient. The treatment of the pathology is supported by analysing the flow field, for example optimising nasal cavity flows or understanding the development of aneurysms. The computational requirements for such flow problems have constantly increased over recent years and have reached the limits of petascale computing, in terms not only of computational effort but also of required storage. It is vital to understand fully the details of the flow physics in order to derive conclusions about medical pathologies and to propose, for instance, shape optimisations for surgical interventions. Such an in-depth analysis can be obtained only by a higher resolution of the flow field, which in turn increases the overall problem size. Tackling, for example, the nasal cavity problem, in which the entire fluid and structural mechanics of the respiratory system are simulated, will demand even the next generation of exascale computers, which are expected to be available in 2020.

General Process Technologies, Chemical Engineering

Chemical  engineering  and  process  technology  are  traditional  users  of  HPC  for  dimensioning  and  optimising   reactors   in   the  design   stage.   Computational   techniques   are   also  used   for  improving   the   operation   of   processes,   for   example   through   model   predictive   optimal  control  or  through  inverse  modelling  for  estimating  system  parameters.  The  computational  models  used  in  chemical  engineering  span  a  wide  range  of  scales.    

On the microscopic level, chemical reactions may be represented by molecular dynamics techniques; on the mesoscopic level, flows through pores or around an individual particle may be of interest. The macroscopic scale eventually considers the operation, including heat and mass transfer, of a full industrial-scale reactor or even the operation of a full facility. Exascale systems will permit a better understanding of highly dispersed phenomena and of very large up- (or down-) scaling problems, such as aggregate formation and growth, through the development of much improved particle simulation technologies, for example for describing multiscale interactions between fluids and structures, fluid-solid suspensions, interfaces and multi-physics coupling.

Industrial Applications

 

Aeronautics   Full  Multidisciplinary  Design  and  Optimisation  (MDO),  CFD-­‐based  noise  and  in-­‐flight  simulation:  the  digital  aircraft  

Turbo  Machines,  Propulsion  

Aircraft  engines,  helicopters,  etc.  

Structure  Calculation   Design  new  composite  compounds,  deformation  

Energy  Turbulent combustion in closed engines and open furnaces, explosions in confined areas, power generation, hydraulics, nuclear plants

Automotive   Combustion,  crash,  external  aerodynamics,  thermal  exchanges,  etc.  

Oil  and  Gas  Industries  

Full  3D  inverse  waveform  problem  (seismic),  reservoir  modelling,  multiphase  flows  in  porous  media  at  different  scales,  process  plant  design  and  optimisation,  CO2  storage  

Engineering  (in  general)  

Multiscale CFD, multi-fluid flows, multi-physics modelling, computer-aided engineering, stochastic optimisation and uncertainty quantification, etc.

Special Chemistry   Molecular dynamics (catalysts, surfactants, tribology, interfaces), nano-systems

Others   Banking/finance, medical industry, pharma industry, etc.: big data, data mining, image processing, etc.

 

Aeronautics  

Aircraft  companies  are  now  heavily  engaged  in  trying  to  solve  problems  such  as  calculating  maximum  lift  using  HPC  resources.  This  problem  has  an  insatiable  appetite  for  computing  power   and,   if   solved,   would   enable   companies   designing   civilian   and  military   aircraft   to  produce  lighter,  more  fuel-­‐efficient  and  environmentally  friendlier  planes.  

To meet the challenges of future aircraft transportation ('Greening the Aircraft'), it is vital to be able to flight-test a virtual aircraft with all its multidisciplinary interactions in a computer environment and to compile all of the data required for development and certification with guaranteed accuracy in a reduced time frame. For these challenges, exascale is not the final goal – a complete digital aircraft will require more than Zflop/s systems. In parallel, future aircraft concepts require deeper basic understanding in areas such as turbulence, transition and flow control, to be achieved by dedicated scientific investigations.

Turbo  Machines,  Propulsion  

Numerical simulation and optimisation is pervasive in the aeronautics industry, and in particular in the design of propulsion engines. The main driving force of technological evolution is the substantial, targeted reduction of specific fuel consumption and environmental nuisance – in particular greenhouse gases, pollutant emissions and noise – as put forward by regulators such as ACARE and IATA. On the engine side, these ambitious goals are pursued by increasing propulsive and thermodynamic efficiency, reducing weight and, finally, controlling sources of noise. The targets can probably not be achieved simply through gradual improvement of current concepts. The development of disruptive propulsive technology is needed, relying even more heavily on numerical tools to overcome the lack of design experience. We can foresee two major challenges related to HPC: the use of high-fidelity numerical tools towards a more direct representation of turbulence, and the evolution of optimisation strategies.

Energy  

The objectives are multiple: first, improvement of the safety and efficiency of the facilities (especially nuclear plants), and second, optimisation of maintenance, operation and lifespan. This is one field in which physical experimentation, for example with nuclear plants, can be both impractical and unsafe. Computer simulation, in both the design and operational stages, is therefore indispensable. Thermal-hydraulic CFD applications: improvement of efficiency may typically involve mainly steady CFD calculations on complex geometries, while improvement and verification of safety may involve long transient calculations on slightly less complex geometries. Note that, as safety studies increasingly require assessment of CFD code uncertainty, sensitivity to boundary conditions and resolution options must be studied, but turbulence models may still induce a bias in the solution. Doing away with turbulence models and running DNS-type calculations, at least for a set of reference calculations, would be a desirable way of removing this bias. Such studies will require access to multi-Eflop/s capacities over several weeks. Neutronics applications will also require access to Eflop/s capacities for moving current Sn neutron transport codes to full Monte Carlo transport calculations on millions of cores, implementing time-dependent and multi-physics coupling. Many other applications exist beyond those mentioned: new generations of power plants, innovation in renewable energies and storage, customers' energy efficiency, development of new technologies and materials for homes and buildings, etc.

Automotive  

The  automotive  industry  is  actively  pursuing  important  goals  that  need  Eflop/s  computing  capability  or  greater,  including  the  following  examples:  

• Vehicles  that  will  operate  for  250,000  kilometres  (150,000  miles)  on  average  without  the  need  for  repairs  –  this  would  save  automotive  companies  substantial  money  by  enabling  the  vehicles  to  operate  through  the  end  of  the  typical  warranty  period  at  minimal  cost  to  the  automakers    

• Full-­‐body  crash  analysis  that  includes  simulation  of  soft  tissue  damage  –    today's  ‘crash  dummies’  are  inadequate  for  this  purpose  –  required  particularly  by  insurance  companies  

• Longer-lasting batteries for electrically powered and hybrid vehicles

For both aerodynamics and combustion, at least LES simulations, and if possible DNS, are required at industrial scale, and Eflop/s applications must be developed at the right scale.

Crash: Most computations are currently done in parallel (8–64 cores), and scalability tests have shown that up to 1,024 cores may be reasonable on 10 million finite elements. It is likely that future simulations will require model sizes for a full car ranging from 1.5 to 10 billion finite elements, demanding the development of new codes (mainly open source) for Eflop/s systems. Such codes must display the attributes of coupling (a standardised mapping between manufacturing simulation and crash simulation), optimisation and stochastic analysis, with embedding into a simulation data management system with automated pre- and post-processing, including monitoring and coupling to other fields and functionalities. On a 10-year horizon, a variety of major challenges, including true virtual testing, will be addressed, leading to a factor of more than 1,000 in the required number of computations (see section 6.3.5 for a complete list).
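
The jump in model size alone already implies roughly three orders of magnitude more work; taking the figures above at face value (an illustrative estimate that ignores the smaller stable time steps finer meshes require):

```latex
\[
\frac{10^{10}\ \text{elements}}{10^{7}\ \text{elements}} = 10^{3},
\]
```

so keeping the per-core load comparable to today's runs would push core counts from O(10^3) towards O(10^6), before the additional factor of more than 1,000 in the number of computations is considered.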

Oil  and  Gas  

The petroleum industry is strongly motivated to increase the efficiency of its processes, especially in exploration and production, and to reduce risks through the deployment of HPC. Typical steps in the business process are: geoscience for the identification of oil and gas underground; development of reservoir models; design of facilities for the recovery of hydrocarbons; drilling of wells and construction of plant facilities; operations during the life of the fields; and eventually decommissioning of facilities at the end of production. Geoscience analyses seismic data with numerical techniques for inverse problems. The economic impact of HPC is definitely high and the best possible tools are deployed. Eflop/s is not the ultimate goal for this industry: the complete inverse-problem resolution of the wave equation needs more computational resources. The objective of this application is to produce, from a seismic campaign, the best estimate of the underground topography in order to optimise reservoir delineation and production by solving the Full Waveform Inversion problem. This application is largely embarrassingly parallel, and the higher performing the HPC system, the better the approximation of the underground topography. However, large HPC simulations are also required to understand (i) multi-phase, multi-fluid flows of different viscosities in porous media at high pressure and high temperature, and now in carbonates, and (ii) multi-fluid flows in risers (e.g. the BP Macondo well in the Gulf of Mexico), with safe and optimised flows, plus transport and storage.
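
The shot-level parallelism that makes this workload scale so well is sketched schematically below; simulate_shot and the misfit values are placeholders standing in for a real wave-propagation kernel, and the whole example is an assumption for illustration rather than an actual production workflow.

```python
# Schematic of the embarrassingly parallel structure of full-waveform inversion:
# each seismic shot can be forward-modelled and its misfit evaluated independently,
# and the per-shot contributions are only combined at the end of each iteration.
from concurrent.futures import ProcessPoolExecutor

def simulate_shot(shot_id, model):
    """Placeholder for a forward wave-propagation solve for one shot (illustrative only)."""
    # A real kernel would solve the wave equation on the current velocity model
    # and return the misfit between synthetic and recorded traces for this shot.
    return (shot_id % 7) * 0.01 * sum(model)   # dummy misfit value

def total_misfit(model, n_shots=64, workers=8):
    # Shots are independent, so they can be distributed across processes (or, at
    # scale, across thousands of nodes) and the misfits summed afterwards.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        misfits = pool.map(simulate_shot, range(n_shots), [model] * n_shots)
    return sum(misfits)

if __name__ == "__main__":
    velocity_model = [1.5, 2.0, 2.5, 3.0]   # toy stand-in for a 3D velocity cube
    print("aggregate misfit:", total_misfit(velocity_model))
```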

Others  

Banks and insurance companies are increasingly using HPC, mostly for embarrassingly parallel Monte-Carlo solutions of stochastic ODEs; but high-frequency trading will inevitably require better models and faster calculation. They also have the challenge of interconnecting supercomputers and several private clouds. In common with many other industries, they are faced with the 'big data' problem in the sense that massive market data are available (Reuters) and current calibration algorithms cannot exploit such large input. Note that 41 machines are characterised as 'finance' in the latest Top 500 list (November 2011). They also need new and much more efficient data mining methods.

Industrial pharma applications: all these industries, firmly established in Europe, already use ab-initio and molecular simulation applied to their domains, and will increase R&D efforts in this field (see sections 4 and 5 of this report) for drug design (GSK, Sanofi) or biomedical applications (L'Oréal). The main issues for these industries include:

• Big  data  management,  generation,  transport,  storage,  etc.,  due  to  screening  simulations  

• Exascale-efficient MD software
• New data mining methods for massively parallel QSAR (quantitative structure-activity relationships)

Academic  and  Industry  Common  Issues  

Common issues include operating software with load balancing, fault tolerance and coupling matched to users' needs. On the other hand, what is clearly expected by 2020 includes:

• Standard coupling interfaces and software tools
• Mesh-generation tools, automatic and adaptive meshing, highly parallel (from meshes of about 100 million tetrahedra to 10 billion tetrahedra)
• Coupling of multi-physics, refined chemistry
• Meshless methods and particle simulation, billion-particle simulations
• New numerical methods, algorithms, solvers/libraries (BLAS, etc.)
• Uncertainty quantification
• Optimisation, data assimilation
• Large databases, big data, new methods for data mining and valorisation

     


1.4 Balance between Scientific, Industrial and Societal Benefits

While the focus of this report lies in the description of the scientific roadmap and the major challenges associated with that roadmap, we should emphasise the increasing economic and societal benefits that arise from making progress towards their resolution. In this section, we consider the potential impact of computer simulations on the economy and society in general, providing a number of compelling examples from each of the scientific domains central to this report. HPC is a key enabler for economic growth in Europe today. It delivers a competitive edge to companies operating in the global marketplace, allowing them to design and produce products and services that differentiate them from their competitors. A 2004 study of HPC users by IDC found that almost 100% indicated that HPC was indispensable for their business. All of us experience the effects of HPC in our day-to-day lives, although in many (and probably most) cases we are unaware of that impact. We travel in cars and aeroplanes designed using modelling and simulation applications run on HPC systems so that they are efficient and safe. HPC is essential for ensuring that our energy needs are met. Finding and recovering fossil fuels require engineering analysis that only HPC can deliver. Nuclear power generation also relies heavily on HPC to ensure that it is safe and reliable. In the coming years, HPC will have an even greater impact as more products and services come to rely on it.

Besides the above benefits that industry can draw from HPC to achieve increased innovation rates and competitiveness, HPC is also a critical and essential technology as we address some of the societal challenges ahead, such as generating clean and efficient energy, predicting and mitigating the effects of climate change, and ensuring safe and efficient travel. Clearly, one of the most important challenges facing our world is to design and provide clean and climate-friendly transportation systems and energy-producing technologies that would not lead to the current levels of CO2 and other greenhouse gases. HPC is therefore essential in two ways:

1. It is the only way to study and design new processes, as analogue mock-up systems are becoming unaffordable, if not impossible, to construct (as, for example, in the design of low-pressure stable combustion chambers, of more fuel-efficient terrestrial and airborne vehicles, etc.)

2. It  is  the  only  way  to  check  the  real  impact  of  these  new  designs  through  the  use  of  advanced  climate  models  

To capture the economic and societal benefits arising from investment in HPC resources, Table 1.2 below summarises these benefits as a function of scientific domain. We have attempted to colour-code them according to whether they are economic (red) or societal (green), and to highlight those developments that impact on both (purple).

Table  1.2.    The  economic  and  societal  benefits  arising  through  HPC  provision  as  a  function  of  scientific  domain.  

Economic  and  Societal  Benefits  in  Weather,  Climatology    and  Solid  Earth  Sciences  

Quantifying  the  Certainty  and  Impact  of  Forecasts  

Natural  disasters  claim  hundreds  of  thousands  of  lives  annually  and  cause  vast  property  losses.  To  what  extent  anthropogenic  climate  change  will  lead  to  an  increase  in  occurrence  and  severity  of  extreme  events  and  natural  disasters  is  one  of  today’s  most  important  and  challenging  scientific  questions.  The  countries  that  will  have  access  to  the  highest  performance  in  computing  will  be  able  to  perform  experiments  that  will  become  the  references  for  future  scientific  assessments  and  associated  political  decisions.  Even  though  Europe  has  world-­‐class  expertise  in  climate,  oceanography,  weather  and  air  quality,  earthquake  and  tsunami  modelling  issues,  European  scientists  may  lose  their  current  prominence  if  they  cannot  access  the  most  powerful  computing  systems.  


The  economic  benefit  to  society  of  quantifying  the  certainty  and  impact  of  forecasts,  on  whatever  timescale,  is  enormous.  By  providing  probabilistic  results  to  agencies  involved  in  assessing  impacts  of  extreme  events  or  climate  change  adaptation,  mitigation  strategies  can  be  developed  and  the  impacts  constrained.  

The  societal  benefits  range  from  mitigation  of  high-­‐impact  weather  by  having  a  more  accurate  and  timely  weather  nowcasting  system  to  a  better  air-­‐quality  forecast  with  direct  impact  on  health,  traffic  (e.g.  fog  and  rain),  agriculture  (e.g.  ozone  influence),  etc.  In  the  last  few  years,  the  emphasis  on  ‘policy  relevance’  has  moved  beyond  the  issues  of  the  existence  of  a  human  effect  on  climate  and  how  to  mitigate  future  change.  As  it  becomes  more  likely  that  a  certain  level  of  climate  change  is  inevitable,  interest  has  extended  to  how  climate  will  change  in  the  next  few  decades,  and  to  evaluate  the  most  effective  ways  to  adapt  to  it.  Hence,  from  a  policy  point  of  view,  there  are  two  important  timescales  that  have  to  be  understood:  the  next  few  decades  during  which  vulnerabilities  can  be  assessed  and  adaptation  responses  planned,  and  the  centennial  scale  on  which  we  can  understand  how  global  strategies  could  mitigate  climate  change.  

Seismic,  Volcanic  and  Tsunami  Hazard  Mitigation  

It is well to recall that, despite continued progress in building-code development in Europe and abroad, catastrophic seismic, volcanic and tsunami events still have disruptive implications for society, with high associated economic costs, and they must be properly managed.

This applies particularly to critical facilities such as nuclear power plants and long-term waste repositories. Critical steps are: first, the ability to predict a wide range of possibilities in the process of designing and planning new facilities; and second, the means to perform simulation and near-term prediction to support decision-making strategies for retrofitting existing structures and managing specific issues. Predictions have to rely on high-resolution models and on the ability to run vast ensembles of simulations with them, and to integrate the high-volume data outputs using appropriate analytics for uncertainty quantification and the prediction of extremes.

Monitoring of nuclear-test treaties and underground explosion activity relies on the detection and characterisation of nuclear explosions from the seismic signals recorded by the global seismic networks. Detecting and discriminating nuclear explosions from earthquakes and other types of wave source relies on a deep understanding of wave-propagation physics, and on the ability to perform vast ensembles of high-resolution simulations of the seismic wave propagation generated by a wide variety of sources. There must also be the ability to process these synthetic waveforms and to compare them with the actual observations.

Research  and  Development  in  the  Energy  Industry    

Research and development in the energy industry depends heavily on innovative computational analysis, and on leading-edge computing and data capabilities, to explore the containment of underground wastes, carbon sequestration to reduce global warming, and new energy resources, and to find innovative means to tap these resources and to monitor their exploitation.

All  these  tasks  require  high-­‐resolution  3D  tomographic  images  of  the  Earth’s  interior  and  now  time-­‐lapse  repeated  tomography  to  detect  changes.  The  capability  to  perform  simulation-­‐based  inversion  and  optimisation  faster,  integrating  multiscale  and  multi-­‐physics  methodology,  in  high-­‐dimensional  and  complex  parameter  spaces  is  extremely  important  to  the  overall  competitiveness  of  Europe  in  these  industrial  and  societal  issues.  

     


Marine  Monitoring  and  Forecasting  Systems    

The  societal  benefit  of  marine  monitoring  and  forecasting  systems  is  crucial  in  the  areas  of  marine  safety  (e.g.  wave  models  predicting  sea  state  are  significantly  improved  by  accurate  prediction  of  ocean  currents),  marine  resources,  marine  environment  and  climate  (e.g.  forecasting  of  toxic  algae  blooms,  storm  surges,  regional  sea-­‐level  changes  in  the  long  term),  seasonal  and  weather  forecasting.    

The  impact  of  these  services  will  be  enormous,  as  they  will  be  crucial  contributions  to  the  environmental  information  base  allowing  Europe  to  evaluate  independently  its  policy  responses  in  a  reliable  and  timely  manner.    

The  societal  benefit  of  an  intensive  use  of  higher-­‐resolution  ocean  models  in  the  research  community  will  also  be  of  crucial  importance.    

Observations alone cannot provide the fundamental knowledge of the ocean that modelling activities can deliver. This knowledge will be a key factor in the improvement of forecasting and climate prediction models themselves, in the evaluation of uncertainties related to the non-linear character of ocean flows, and in the development and rationalisation of the ocean observational networks.

Economic  and  Societal  Benefits  in  Astrophysics,    High-­‐Energy  Physics  and  Plasma  Physics  

Alternative  Sources  of  Energy  

Ultra-compact laser-plasma accelerators will also produce high-quality electron and ion beams directly relevant for biomedical applications. Ion beams from these accelerators (at 200+ MeV) would be particularly suited for the treatment of deep tumours or radio-resistant cancers, at a very small fraction of the cost of existing facilities, or, at lower energies, to produce radioisotopes for medical diagnostics. Coherent X-rays produced from high-quality electron beams with ultra-short pulse duration can have a tremendous impact in structural biology and on the bio-imaging of viruses, again at a small fraction of the cost and footprint of existing light sources.

Besides their interest as natural phenomena (aurorae, etc.), magnetospheric perturbations are also of practical interest since they can affect satellites, disrupt telecommunications and occasionally affect power grids. Thus, solar physics modelling, coupled with observations, can anticipate disruptive solar activity and contribute to the strategy to protect investments in satellites and telecommunications. Europe is also the leading partner in several research satellites, such as Ulysses, SoHO and the Cosmic Vision Solar Orbiter.

 

   

Figure 1.2. Spot price for a barrel of oil versus global oil production, suggesting saturation in the supply rate from 2004 (from [26]).


Figure  1.2  suggests  that  global  oil  production  may  have  peaked  in  2004,26  an  event  that,  whenever  it  happens,  will  inevitably  have  enormous  consequences  for  the  world  economy.    

This,  in  addition  to  climate  change,  provides  a  further  timely  reminder  of  society’s  urgent  need  to  develop  alternative  sources  of  energy  that  are  independent  of  fossil  fuels.  

HPC  research  makes  very  important  contributions  to  the  development  of  carbon-­‐free  sources  of  energy  (e.g.  from  nuclear  fusion).      

The  possible  economic  benefit  from  top-­‐level  computing  capacity  in  the  field  of  thermonuclear  fusion  research  is  self-­‐evident.  Achieving  thermonuclear  fusion  as  a  possible  future  energy  source  is  a  great  dream  of  mankind.    It  requires  very  large  investments,  and,  as  in  many  other  fields,  computer  simulations  are  an  effective  way  to  steer  applied  research  in  the  right  directions.    

Europe  is  the  world  leader  in  magnetic  confinement  with  the  Joint  European  Torus  (JET)  as  the  top  running  experiment  and  as  the  partner  contributing  the  largest  fraction  of  the  construction  costs  of  the  international  project  ITER.  Europe  is  also  establishing  a  strong  effort  in  Inertial  Confinement  Fusion  through  large-­‐scale  pan-­‐European  laser  projects  such  as  the  High  Power  Laser  for  Energy  Research  –  HiPER.  

Forefront  computing  capabilities  are  an  essential  cost-­‐effective  means  to  back  all  these  investments  and  to  strengthen  or  maintain  leadership  in  these  fields.  

Economic  and  Societal  Benefits  in  Materials  Science,    Chemistry  and  Nanoscience  

Finding  Substitutes  for  Critical  Minerals    

Minerals are important components of many products people use in daily life (e.g. ultra-high-coercivity magnets, cell phones, computers, lasers, navigation systems and automobiles). Yet the European Union does not mine or process much of that raw material. The Enterprise and Industry Directorate of the European Commission issued a report defining critical raw materials27 as ones whose supply chain is at risk, or for which the impact of a supply restriction would be severe, or both. The report recommends that substitution should be encouraged, notably by promoting research on substitutes for critical raw materials in different applications. An exascale infrastructure will assist combinatorial materials discovery and design, helping to rapidly discover and develop substitutes for technologies and applications that are currently dependent on critical minerals for which no alternative is known today.

Computational  materials  science  is  intrinsically  a  rather  diverse  interdisciplinary  community  that  plays  a    very  prominent  role  in  PhD  and  post-­‐doctoral  research  training.  The  young  scientists  working  in  this  field,  developing  complex  computational  methods  and  applying  them  in  a  multi-­‐  and  trans-­‐disciplinary  environment,  are  in  great  demand  in  both  industry  and  academia.  

Computational  materials  science,  chemistry  and  nanoscience  have  a  proven  track  record  of  direct  impact  on  our  society.  One  may  think  of  the  Internet  and  smart-­‐phone  revolution  or  the  efficiency  of  modern  detergents.  There  is  no  reason  to  believe  that  this  may  stop;  instead,  the  impact  will  accelerate  due  to  the  steep  progress  in  materials  design  that  is  foreseen  by  the  use  of  an  exascale  infrastructure.  This  is  necessary  to  contribute  to  the  pressing  issues  on  energy  harvesting,  storage,  conversion  and  saving,  environmental  protection  and  toxicity  management,  decontamination,  air  cleaning,  biotechnology  and  health  care.  

     

26 James Murray and David King, 'Oil's tipping point has passed', Nature 481, 433, 26 January 2012.
27 http://ec.europa.eu/enterprise/policies/raw-materials/files/docs/report-b_en.pdf


Energy  Materials  

Computational materials science assisted by exascale computing can provide invaluable input. Examples where the design of new materials is needed include high-efficiency photovoltaic cells, fuel cells for electricity production from hydrogen, energy-efficient solid-state lighting, batteries for energy storage and thermoelectric materials. Given the need for green energy production and green information technology, computational materials science can have a global impact on the grand challenges of energy, environment and sustainability.

Economic  and  Societal  Benefits  in  Life  Sciences  and  Medicine  

Rational  Drug  Design  and  the  Pharmaceutical  Industry  

Europe has a very competitive industry that launches almost 40% of the pharmaceutical products on the worldwide market. Advances in genomics, systems biology and molecular simulation are making rational drug design a powerful alternative to trial-and-error methods. Computing systems that cannot deliver such large-scale computations within a time frame of weeks would not be profitable for the drug industry. Assuming a 20-year patent, drug companies have an average of 6 years' exclusivity to recover fully their $1.2 billion investment,28 while maintaining their operating costs. This introduces strong pressure that is damaging pharmaceutical companies in Europe, forcing them to design new approaches to reduce the time required for drug development and to design new therapeutic scenarios.

In this field, personalised medicine appears to be one of the low-hanging fruits. Its market is expected to show annual growth of about 10%. The core diagnostic and therapeutic segment of the market comprises primarily pharmaceutical, medical-device and diagnostics companies, and is projected to reach $452 billion by 2015.29 Any computational strategy leading to increased efficiency in target discovery, lead finding and lead optimisation will have an enormous impact on the European pharmaceutical industry and will represent enormous savings for national public health systems.

Additional savings will derive from the partial replacement of animal testing (which accounts for $150 million per launch30 in the development of a new pharmaceutical entity) by in-silico approaches. In this field, we cannot ignore that, under the REACH initiative, extensive animal testing will be required to evaluate the safety of the major chemicals sold in Europe over the coming decade,31 which will involve millions of laboratory animals, with an estimated total cost of €1.3–9.5 billion.

Researchers  in  medical  simulation,  systems  biology  and  molecular  simulation  are  working  towards  the  development  of  in-­‐silico  models  that  can  simulate  systems  from  cells  to  complex  organs  such  as  the  brain.  These  models,  when  complete,  will  certainly  require  exascale  computing.  

Economic  and  Societal  Benefits  in  Engineering  Sciences  and  Industrial  Applications  

In  the  Energy  Domain  

Every human activity and energy process – nuclear, thermal, hydro – needs continuously to improve its environmental impact (thermal discharge of nuclear/thermal power plants, long-term geomorphology in rivers due to hydropower, water or air quality, etc.). Achieving this environmental risk assessment will also require the use of multi-physics (sedimentology, water quality), multiscale (from the local scale of the near field to the large scale of a river basin) and complex multidimensional, time-dependent models (sometimes to simulate five years of possible evolution).

28 Tufts University CSDD Outlook 2008
29 The Science of Personalised Medicine: Translating the Promise into Practice (2009), PricewaterhouseCoopers
30 Paul et al. (2010) Nature Reviews Drug Discovery
31 http://ec.europa.eu/environment/chemicals/reach/reach_intro.htm


The main challenge facing the nuclear industry is, today more than ever, to design safe nuclear power plants. This is presently being done for the so-called third generation (e.g. the French EPR), and is under active preparatory study for the forthcoming fourth generation (e.g. sodium- or gas-cooled fast-neutron reactors). HPC, and in particular Eflop/s computing, will contribute to improving nuclear power plant design, involving the use of multi-physics, multiscale, complex three-dimensional time-dependent models. It is worth noting that HPC 3D simulations are crucial to EDF for assessing the 10-year extension of the lifetime of nuclear power plants, each such extension representing hundreds of millions of euros of savings. Another example concerns the cost of drilling by oil companies, which amounts to several tens of millions of dollars per well; avoiding an unproductive well, thanks to extensive HPC analysis of seismic data, is clearly worth the cost of the computation!

The easily accessible oil reserves are decreasing rapidly, with an oil peak forecast to occur sometime around the middle of the century. Improving the efficiency of the search for new oil reservoirs, including non-traditional reservoirs, can be done only through very advanced wave-propagation models and image-processing methods. Such methods are being considered by oil companies as they contribute crucial competitive advantages. Equally, production from non-traditional fields, such as marginal or deep-water fields, often involves oil and gas qualities that are more difficult to exploit. Flow from wells to production plants needs particular attention and engineering expertise, and the handling of the produced fluids is more complex. Energy supply can be safeguarded only with extended capabilities in reservoir modelling and enhanced techniques for production (e.g. processing and transport of crude oil streams in harsh environments). Increased levels and quality of computer simulation enable engineers and scientists to turn potential risks into well-managed opportunities for the energy-consuming society.

    In  the  Chemical  Engineering  Domain  

The chemical industry relies heavily on oil as a carbon source. With the prospect of increasing oil prices and decreasing oil reserves, gasification (with its low CO2 footprint) of low-grade coals and biomass, providing synthesis gas for further chemical processing, has become an important technology. Gasification, with reactor sizes of several metres, residence times of 10–100 seconds and physical processes occurring on micrometre and millisecond scales, is a challenging multiscale, multi-physics problem. Future (virtual) design and optimisation of these reactors will be driven by numerical simulation. Developing such reactors for a large variety of feedstocks requires a large number of simulations, and each single full 3D, time-dependent simulation requires efficient numerical models and the corresponding computational resources.

In  the  Transportation  Domain  

Future air transport systems will have to meet the constantly increasing needs of European citizens for travel and transport, as well as the pressing requirements to preserve the environment and quality of life. Within the ACARE (Advisory Council for Aeronautics Research in Europe) Vision 2020, ambitious goals have been set for air traffic in the coming decades. These include a reduction of emissions by 50% and a decrease of the perceived external noise level by 10–20 dB. Continuous improvement of conventional technologies will not be sufficient to achieve these goals: a technological leap forward is required. Numerically based design and flight testing will be a key technology in the drive towards more affordable, safer, cleaner, quieter and hence greener aircraft. Access to high-performance computers in the exascale range is of the utmost importance. Considerable changes in the development process will lead to significant reductions in development times while at the same time including more and more disciplines in the early design phases to find an overall optimum for the aircraft configuration. This will enable the European aircraft industry to retain a leading role in worldwide competition, facing both an old challenge, i.e. competing with the US, and a new, rapidly emerging one, i.e. keeping an innovation advantage over China.

Designing efficient engines, motors and reactors for airplanes and cars is critical for both the efficiency and the safety of current and future propulsion systems. This is the challenge facing advanced European industries: how can we use less energy to propel the new vehicles presently under development while meeting the corresponding environmental challenge of emitting fewer greenhouse gases? How do we design safe reactors that can operate at low pressure without exhibiting dangerous instabilities?

 


2 WEATHER, CLIMATOLOGY AND SOLID EARTH SCIENCES

2.1 Summary  

Weather,  Climatology  and  solid  Earth  Sciences   (WCES)  encompass  a  wide   range  of  disciplines   from  the  study  of  the  atmosphere,  the  oceans  and  the  biosphere  to  issues  related  to  the  solid  part  of  the  planet.   They   are   all   part   of   Earth   system   sciences   or   geosciences.   Earth   system   sciences   address  many  important  societal  issues,  from  weather  prediction  to  air  quality,  ocean  prediction  and  climate  change  to  natural  hazards  such  as  seismic,  volcanic  and  tsunami  hazards,  for  which  the  development  and  the  use  of  high-­‐performance  computing  plays  a  crucial  role.  

Research in the fields of weather, climatology and solid Earth sciences is of key importance for Europe for:

•     Informing  and  supporting  preparation  of  the  EU  policy  on  environment,  climate  and  natural  hazard  mitigation  and  adaptation    

•     Understanding  the  likely  impact  of  the  natural  environment  on  EU  infrastructure,  economy  and  society  

• Enabling   informed   EU   investment   decisions   in   ensuring   sustainability   within   the   EU   and  globally  

• Developing   civil   protection   capabilities   to   protect   the   citizens   of   the   EU   from   natural  disasters  

• Supporting the EU and ESA joint initiative on Global Monitoring of Environment and Security

The   following   paragraphs   introduce   the   WCES   scientific   domains,   the   societal   benefits   and   the  international  background.  

Climate Change

In the last decade, our understanding of climate change has increased, as has the societal need for this understanding to be carried into advice and policy. However, while there is great confidence in the fact that climate change is happening, there remain uncertainties. In particular, there is uncertainty about the levels of greenhouse gas emissions and aerosols likely to be emitted and, perhaps even more significant, there are uncertainties about the degree of warming and the likely impacts. Increasing the capability and comprehensiveness of 'whole Earth system' models that represent scenarios for our future climate in ever-increasing realism and detail is the only way to reduce these latter uncertainties. A further challenge is to provide more robust predictions of regional climate change at the decadal, multi-decadal and centennial timescales to underpin local adaptation policies. In many regions of the world, there is still considerable uncertainty in the model predictions of the local consequences of climate change at different timescales. Model resolution plays a key role. A dual-track approach should be taken, involving multi-member multi-model comparisons at the current leading-edge model resolution (about 20 km, limited to a few decades) alongside the longer-term aim to develop a global convective-resolving model (down to 1 km resolution). Reducing these uncertainties in climate projections requires a coordinated set of experiments and multi-year access to a stable High-Performance Computing (HPC) platform. Issues relating to mass data storage, and the dissemination of model outputs for analysis to a wide-ranging community of scientists over a long period, will need to be resolved. A multi-group programmatic approach could allow a set of model inter-comparisons in Europe focused on a number of priority climate science questions.


Oceanography and Marine Forecasting

The ocean is a fundamental component of the Earth system. Improving understanding of ocean circulation and biogeochemistry is critical to assessing properly climate variability and future climate change and related impacts on, for example, ocean acidification, coastal sea level, marine life and polar sea-ice cover. The ocean influences the climate system at shorter timescales (i.e. for weather forecasting and seasonal climate prediction), but its influence grows as timescales increase. Beyond climate, ocean scientists are being called on to help assess and maintain the wealth of services that the ocean provides to society. Human activities, including the supply of food and energy, the transport of goods, etc., exert an ever-increasing stress on the open and coastal oceans. These stressors must be evaluated and regulated in order to preserve the ocean's integrity and resources. Society must also protect itself against marine natural hazards. Marine safety concerns are becoming more acute as the coastal population and maritime activities continue to grow. For all these concerns, there is a fundamental need to build and efficiently operate the most accurate ocean models in order to assess and predict how the different components of the ocean (physical, biogeochemical, sea-ice) evolve and interact. The main perspective is to produce realistic reconstructions of the ocean's evolution in the recent past (e.g. reanalyses) and accurate predictions of the ocean's future state over a broad range of time and space scales, to provide policy makers and the general public with relevant information, and to develop applications and services for government and industry.

Meteorology, Hydrology and Air Quality

Weather and flood events with high socio-economic and environmental impact may be infrequent, but the consequences of their occurrence can be catastrophic for the societies and Earth systems affected. There is, of course, a link to climate prediction and climate change impacts, if severe meteorological and hydrological events are to become more frequent and/or more extreme. Predicting these low-frequency, high-impact events a few days in advance – with enough certainty and early warning to allow practical mitigation decisions to be taken – remains difficult. Understanding and predicting the quality of air at the Earth's surface is an applied scientific area of increasing relevance. Poor air quality can cause major environmental and health problems affecting both industrialised and developing countries around the world (e.g. adverse effects on flora and fauna, and respiratory diseases, especially in sensitive people). Advanced real-time forecasting systems are basic and necessary tools for providing early warning advice to populations and practical mitigation strategies in the case of an air-pollution crisis.

Solid Earth Sciences

Computational challenges in solid Earth sciences span a wide range of scales and disciplines and address fundamental problems in understanding the Earth system – its evolution and structure – and its near-surface environment. Solid Earth sciences have significant scientific and social implications, playing today a central role in natural hazard mitigation (seismic, volcanic, tsunami and landslide), hydrocarbon and energy resource exploration, containment of underground wastes and carbon sequestration, and national security (nuclear test monitoring and treaty verification). In the realm of seismic hazard mitigation alone, it is well to recall that, despite continuous progress in building codes, one critical remaining step is the ability to forecast the earthquake ground motion to which a structure will be exposed during its lifetime.

All  these  areas  of  expertise  require  increased  computing  capability  in  order  to  provide  breakthrough  science.   A   programme   of   provision   of   leadership-­‐class   computational   resources   will   make   it  increasingly   possible   to   address   the   issues   of   resolution,   complexity,   duration,   confidence   and  certainty,   and   to   resolve   explicitly   phenomena   that   were   previously   parameterised.   Each   of   the  challenges   represents   an   increase   by   a   factor   of   at   least   100   over   individual   national   facilities  currently  available.  A   large  number  of  the  numerical  models  and  capability-­‐demanding  simulations  described   below  will   lead   to   operational   applications   in   other   European   centres,   national   centres  and  industry.  


2.2 Computational  Grand  Challenges  and  Expected  Outcomes  

2.2.1 Climate of the Earth System

Motivation

There is a vital need for high-performance computing in order to predict the future evolution of the climate and answer key societal questions about the impact of global warming on human activities. Even though the scientific community has little doubt that climate is sensitive to mankind's activity, many questions remain unsolved at the quantitative level. There is a need to better qualify and quantify the uncertainty of the predictions, estimate the probability of extreme events and regional impacts, quantify the feedbacks between climate and biogeochemical cycles such as those of carbon dioxide and methane, and identify the impacts of climate change on marine and terrestrial ecosystems and on societies. All these questions are strongly linked to the amount of computing power and data storage capacity available, since they call for increased model resolution, large numbers of experiments, increased complexity of Earth system models and longer simulation periods compared to the current state of climate models. It is also important to perform coordinated ensembles of simulations using different models to ensure the robustness of the model results. Such coordinated multi-model activities are carried out within the framework of the IPCC, but considerably more could be done within a European context. Sustained computing power of the order of 100 Tflop/s to 1 Pflop/s or more is already required today for Europe to maintain its scientific weight in climate change research worldwide.

Challenges:  description  and  state  of  the  art  

Fundamental  questions  facing  climate  change  research  can  be  summarised  in  four  key  challenges.  

Challenge  #1:  The  need   for  very  high-­‐resolution  models   to  better  understand,  quantify  and  predict  extreme  events  and   to  better  assess   the   impact  of  climate  change  on  society  and  economy  on   the  regional  scale  

Modelling   the   climate   system   is   a   challenge   because   it   requires   the   simulation   of   a   myriad   of  interacting   and   complex   processes   as   well   as   their   analysis   at   different   time   and   spatial   scales.  Climate  system  modelling  requires  sophisticated  numerical  models,  due  to  the  inherently  non-­‐linear  governing   equations.   Huge   computational   resources   are   needed   to   solve   billions   of   individual  equations   describing   the   physical   processes   at   different   scales.   Indeed,  model   simulations   are  required   to   represent   both   modification   of   the   larger-­‐scale,   global,   state   (inside   which   extreme  events   are   developing)   and   the   fine-­‐scale   temporal   and   spatial   structure   of   such   events   (storms,  cyclones,  intense  precipitation,  etc.).  

Currently,   global   climate  models   have   typical   grid   spacing   of   100–200   km  and   are   limited   in   their  capacity   to   represent   processes   such   as   clouds,   orography   effects,   small-­‐scale   hydrology,   etc.   The  latest  generation  of  models,  under  development  or  just  starting  to  be  used,  have  grid  spacing  in  the  20–50   km   range,   and   there   is   evidence   that   a   number   of   important   climate   processes   are   better  represented   at   this   resolution   (e.g.   ENSO,   blocking,   tropical   storm   numbers,   etc.).   A   priority   is   to  continue  the  development  of  coupled  models  at  such  high  resolution  and  use  them  in  multi-­‐member  multi-­‐model  inter-­‐comparisons  focused  on  key  climate  processes.    

In  weather   forecasting   applications,  much   higher   convective   resolving   limited-­‐domain  models   are  now  being  used  operationally.  However,  these  models  cannot  be  run  globally  for  climate  because  of  the  prohibitive  cost  of  associated  computing   resources  and   limits   in  model   scalability.  The  climate  community’s  first  ‘grand  challenge’  in  the  longer  term  is  therefore  to  develop  global  climate  models  that   resolve   convective   scale   motions   (nominally   around   1   km   horizontal   resolution).   These   very  high-­‐resolution   models   will   directly   resolve   convective   systems,   allow   a   better   representation   of  orographic   effects,   atmosphere   and   ocean   energy   and   matter   transport,   and   provide   greater  regional  details.    


They will allow determination of whether convective-scale resolution is necessary for credible predictions of some important aspects of regional climate change. Reaching such very high resolutions will require developing scalable and more efficient dynamical cores and improving physical parameterisations.

 

Figure  2.1.  Snapshot  of  Hadley  Centre  HadGEM3  simulation  at  a  resolution  of  25  km  (N512).  These  simulations,  part  of  the  UPSCALE  project,  were  performed  on  the  HLRS  HERMIT  supercomputer,  with  funding  from  PRACE.  Ocean  temperatures  (in  colour  going  from  blue  =  cold  to  violet  =  warm)  are  shown  in  the  background,  while  clouds  (B/W  scale)  and  precipitation  (colour)  are  shown  in  the  foreground.  Over  land,  snow  cover  is  shown  in  white.  (Credits:  P.  L.  Vidale  and  R.  Schiemann  (NCAS-­‐Climate,  University  of  Reading)  and  the  PRACE-­‐UPSCALE  team.)  

Very  high-­‐resolution   global   models  are  expected  to   improve  our  predictions  and  understanding  of  the   effect   of   global   warming   on   high-­‐impact   weather   events   on   seasonal,   decadal   and   century  timescales.  Another   issue   is   related   to   the   simulation  of  regional-­‐scale  climate  features,  of  crucial  importance   for   assessing   impacts   on   society   and   economic   activities   (farming,   fisheries,   health,  transportation,  etc.)  and  for  which  improved  regional  models,   embedded   in   global   climate  models,  are  necessary.    

Such regional models, currently run at 10–50 km, also call for spatial resolutions of a few kilometres.

Increasing model resolution down to 1 km requires increases by factors of at least 100 to 1,000 in computing power compared to the current state, i.e. in the multi-petascale to exascale range towards the end of the decade. It should be noted that each increase of the spatial resolution by a factor of 2 in each direction mandates at least an eightfold increase in computing power, depending also on the time-step length, which needs to be decreased with increasing resolution.
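
As a minimal illustrative sketch of this rule of thumb (and not a statement taken from the report beyond the factor-of-eight rule itself), the short Python function below assumes that the quoted rule generalises multiplicatively, i.e. a refinement of the grid spacing by a factor r costs at least r cubed in compute; the function name cost_increase and the example spacings are placeholders chosen for illustration.

    def cost_increase(old_dx_km, new_dx_km):
        """Rule-of-thumb cost factor when refining grid spacing from old_dx_km to new_dx_km.

        Assumes the quoted rule that a factor of 2 in resolution implies at least
        a factor of 2**3 = 8 in compute (finer grid plus a shorter time step),
        applied multiplicatively for larger refinements.
        """
        refinement = old_dx_km / new_dx_km
        return refinement ** 3

    if __name__ == "__main__":
        # Illustrative refinements only; actual requirements depend on the model.
        for target in (10, 5, 2, 1):
            print(f"20 km -> {target} km: >= {cost_increase(20, target):,.0f}x compute")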

Challenge  #2:  The  need  to  move  from  current  climate  models  towards  Earth  system  models  

Today it is clear that models must also include more sophisticated representations of non-physical processes and subsystems that are of major importance for long-term climate development, such as the carbon cycle. Scientists are keen to discover the sensitivity of predictions not only to unresolved physical processes (e.g. the cloud feedbacks mentioned above) but also to non-physical ones, such as those related to biology and chemistry (including, for example, those involving the land surface and greenhouse-gas reactions). In the last few years, biological and chemical processes have begun to be included in long-term simulations of climate change, albeit in a simplified way. In addition to the value of being able to predict changes in vegetation and atmospheric composition, it turns out that these additional processes can have quite a marked effect on the magnitude of climate change. For example, European modelling groups were the first to show, in coupled Earth system models, that the global carbon cycle accelerates climate change.

However,   the   carbon   cycle   itself   is   intertwined   with   other   biogeochemical   cycles,   such   as   the  nitrogen  cycle,  so  other  matter  cycles  also  need  to  be  included.  Moreover,  other  processes,  such  as  GHG-­‐reactions   or   aerosol-­‐related   processes   and   their   indirect   effect   on   clouds   or   interactive  vegetation,  still  need  to  be  better  accounted  for.  

It should be noted that including the representation of biogeochemical cycles, using different biochemical tracers and aerosols, typically increases the computation time by a factor of between 5 and 20 (depending on the complexity of the parameterisations and the number of tracers). An increase in computing power by a factor of 5 to 20 is then required to better account for the complexity of the system.

Challenge  #3:  Quantifying  uncertainty  

Future  projections  of  climate  change  are  uncertain   for  a  number  of   reasons.  The   future   forcing  by  greenhouse   gases   and   aerosols   is   uncertain,   and   climate   variations   have   both   a   natural   and  anthropogenic  component,  both  of  which  need  to  be  represented  in  climate  models.  The  models  are  also  inherently  imperfect  owing  to  physical  processes  that  are  either  not  completely  understood  or  yet  to  be  adequately  represented  because  of  limited  computer  power.  

To  better  understand  and  predict   global   and   regional   climate   change  and   climate   variability   using  numerical   models,   a   wide   range   of   underlying   scientific   issues   needs   to   be   solved   by   the  international   community,   as   reported   in   the  WCRP   strategy   COPES   ‘Coordinated  Observation   and  Prediction  of  the  Earth  System’  2005–2015  (http://wcrp.ipsl.jussieu.fr/).  

Taking  into  account  the  interests  and  strengths  of  the  European  climate  science  community  and  the  aim  to  answer  societal  needs,  the  major  issues  are  related  to:    

• The predictability – and its limits – of climate on a range of timescales
• The range of uncertainty that can be fully represented using the models currently available
• The sensitivity of climate and how much we can reduce the current uncertainty in the major feedbacks, including those due to clouds, atmospheric chemistry and the carbon cycle

The   consensus   approach   to   solving   these   problems   is   to   assume   that   the   uncertainty   can   be  estimated  by  combining  multi-­‐model  multi-­‐member  experiments.  Running  multi-­‐model  experiments  in  a  coordinated  European  way  allows  investigation  of  the  sensitivity  of  results  to  model  parameters.    Moreover,   running  multi-­‐member   ensembles   of   climate   integrations   allows   the   chaotic   nature   of  climate   to   be   accounted   for   and   thereby   enables   systematic   assessment   of   the   relative   roles   of  natural  climate  variability  and  man-­‐made  climate  change.  

The use of different scenarios for the emissions of greenhouse agents is also mandatory in order to produce experiments that probe the future course of climate. Furthermore, quantifying uncertainty about future climate change will require investigating the sensitivity of results to the specification of the initial state; this initialisation issue is particularly important for decadal-timescale predictions, where both natural climate variability and man-made climate change need to be predicted within the model.

It should be noted that computing requirements scale directly with the number of ensemble members required to better represent the uncertainties associated with both internal variability and model parameterisations; moreover, the number of members required to keep the same signal-to-noise ratio in climate forecasts increases as the spatial resolution increases. Ensemble experiments are therefore computationally expensive – a factor of 10 to 100 for each experiment – but will bring enormous economic benefit, as they will improve the reliability of models and our understanding of uncertainties in forecasts.
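
To make the multiplicative nature of these costs concrete, the following back-of-envelope Python sketch combines a baseline single-simulation cost with ensemble-size, resolution and complexity factors of the kind discussed above. The numerical values and the function name ensemble_cost are illustrative assumptions chosen for the example, not figures taken from this report.

    def ensemble_cost(core_hours_per_member, n_members, resolution_factor=1.0, complexity_factor=1.0):
        """Back-of-envelope total cost of an ensemble experiment.

        core_hours_per_member : cost of one simulation at the reference resolution/complexity
        n_members             : ensemble size (cost scales linearly with it)
        resolution_factor     : extra cost from refining the grid (e.g. ~8 per halving of spacing)
        complexity_factor     : extra cost from added Earth-system components (e.g. 5-20)
        """
        return core_hours_per_member * n_members * resolution_factor * complexity_factor

    # Purely illustrative numbers: a 1 million core-hour baseline run, a 50-member
    # ensemble, one halving of the grid spacing and a modest complexity factor.
    total = ensemble_cost(1.0e6, n_members=50, resolution_factor=8, complexity_factor=5)
    print(f"total ~ {total:.2e} core-hours")

Even with these modest assumptions the total quickly reaches billions of core-hours, which is why ensemble experiments at high resolution are tied to leadership-class systems.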

Challenge  #4:  The  need  to  investigate  the  possibility  of  climate  surprises  

In   a   complex   non-­‐linear   system   such   as   the   Earth   system,  minute   actions   could   cause   long-­‐term,  large-­‐scale   changes.   These   changes   could   be   abrupt,   surprising   and   unmanageable.   Paleoclimatic  data   indicate   the   occurrence   of   such   rapid   changes   in   the   past.   For   example,   it   is   crucial   to  determine  if  there  are  thresholds  in  the  greenhouse  gas  concentrations  over  which  climate  change  could   become   irreversible.   The   Atlantic   thermohaline   circulation   (THC)   might   undergo   abrupt  changes  as  inferred  from  paleo-­‐records  as  well  as  from  some  long  simulations  of  future  climate.  The  possible  climatic  consequences  of  such  a  slowdown  in  Atlantic  THC  are  still  under  debate.  Surprises  may  also  arise  from  ice-­‐sheet  collapse  and  large  amounts  of  fresh  water  in  the  ocean.  

Some  key  questions  arise  on  the  possibility:    

• To   model   and   understand   glacial–interglacial  cycles,  including  changes  in  carbon  cycle  and  major  ice  sheets    

• To   use   observational   evidence   from   past   climates   to   calibrate   the   sensitivity   of   complex  climate  models  and  respective  adjustable  model  parameters    

• To   what   extent   we   can   attribute   signals   in   the   period   of   the   instrumental   record   to  understand  Earth  system  processes  (from  weather  scales  to  those  typical  of  anthropogenic  climate   change).   The   need   for   longer   historical   runs,   both   current-­‐era   hindcasts   and  palaeoclimates,   and   for   longer   runs   to   investigate   possible   future   non-­‐linear   changes   is  evident,  and  the  computing  needs  scale  accordingly.  

Investigating climate surprises requires longer simulations of future and past periods at medium to high resolution and with various degrees of complexity. A factor of 10 to 1,000 in computing power is required, which, however, cannot be achieved simply by increasing the number of cores; it also requires an increase in the performance of each individual core.

Roadmap  

To improve the reliability and capability of climate and Earth system simulations, both physical and software infrastructures are required. Computing power is a strong constraint on the type of problem that can be addressed. At the European level, Tier-0 (international – the PRACE infrastructure), Tier-1 (national) and Tier-2 (institutional) facilities are available. Thus, a suitable strategy for Earth system modelling is needed to exploit efficiently all of these systems belonging to the European 'computing ecosystem'.

Climate models are inherently difficult to scale on supercomputers because the problems they represent are tightly coupled, both algorithmically between the components of the Earth system and physically across all spatial dimensions. The significant communication this requires leads to an increasing overhead as the domain decomposition becomes finer. Both capability and capacity computing are therefore important for Earth system modelling. Capability is needed given the long timescales over which every coupled model configuration needs to spin up to a stable state. As described above, paleo-studies in particular also require relatively long runs and therefore capability. In both cases, this will remain true as long as no technique for parallelisation in time is available. Higher-resolution simulations, of course, also strongly benefit from capability. However, carrying out control and transient multi-member ensemble runs of the modern climate, starting from the above-mentioned stable state, is a typical capacity problem. For example, producing the set of experiments for IPCC AR5, organised through CMIP5, requires running a large number of simulations (typically a cumulative 10,000 simulated years for each of the ~25 modelling centres) and certainly calls for capacity, as all of these runs must be considered part of the same experiment. These capacity-demanding, ensemble-type runs with high-resolution models are generally done most efficiently on central HPC systems and not in a distributed manner. There are applications for which distributed systems would provide good performance, but these cases generally depend on models with very good portability and relatively low input/output volumes, criteria that are not generally fulfilled by Earth system models. Systems especially suited to Earth system modelling therefore need to provide both capability and capacity, with a good balance between computing power, storage system size and read/write efficiency.

The climate community is just beginning to use the Tier-0 machines available within the PRACE infrastructure (e.g. the UPSCALE project on HLRS, based on an atmosphere-only model at 20 km resolution). Most climate models are executed (e.g. for IPCC scenarios) on Tier-1 national systems, sometimes purpose-built, or on dedicated Tier-2 machines, which are sufficiently tailored towards climate applications, providing, for example, a good balance between computing performance, bandwidth to storage and storage capacity. Lack of scalability of the multi-component (e.g. atmosphere, ocean, land, coupler) climate models is just one reason why they have not yet been run often on Tier-0 machines. Another important aspect is that access requirements for these Tier-0 platforms are based on capability only, rather than on both capability and capacity. Another key reason for this limited use is that running a coordinated set of experiments requires multi-year access to a stable platform, in terms of hardware and middleware, because of the high cost of losing bit-level reproducibility.

In  summary,  climate  modelling  requires:  

• Computing  platforms  offering  both  capability  and  capacity  and  access  requirements  for  the  applications  including  these  two  aspects  

• Multi-­‐year  access  so  that  the  simulations  can  be  carried  out  on  the  same  Tier-­‐0  machine  with  the  same  environment  as  the  ones  used  during  the  model  porting  and  validating  phase  

• The  possibility  of  getting  multi-­‐year  multi-­‐modelling  groups  access  to  investigate  scientific  questions  through  targeted  multi-­‐model  experiments  

• Appropriate mass storage and dissemination mechanisms for the model output data
• Outputs of the coordinated set of experiments easily available for analysis to a wide-ranging community of scientists over a long period

These requirements will be central to the usefulness of the PRACE infrastructure for the climate community. ENES (European Network for Earth System Modelling – http://www.enes.org) has started a collaboration with PRACE aimed at fostering the use of Tier-0 machines by the climate community. PRACE has started with general-purpose machines that serve all research fields. The specific requirements of the climate community, such as appropriate queue structures and access to high-volume data archiving, would be better met in the future by a dedicated world-class machine for Earth system modelling. Such a facility would also allow the production of ensembles of very high-resolution simulations of the future climate, relevant for the development and provision of climate services. It would also fit well with the proposed establishment of an Exascale Climate and Weather Science (ECWS) Co-Design Centre, where integrated teams of climate and weather science researchers, applied mathematicians, computer scientists and computer architects would cooperate closely. At the very least, facilities customised for climate models are expected.

   

2.2.2 Oceanography and Marine Forecasting

Motivation

Progress  in  ocean  science  is  intricately  linked  to  the  computing  power  available  because  of  the  need  for  increasingly  higher  model  resolutions,  many  more  simulations,  and  greater  complexity  in  ocean  system   models.   Operational   oceanography   is   a   new   and   rapidly   growing   sector,   providing   key  
assessments  for  coastal  water  quality,  fisheries  and  marine  ecosystems,  offshore,  military,  transport,  etc.   The   advent  of   satellite  measurements  of   sea   level   (altimetry)   and  of   the   global  Argo   array  of  profiling  floats  has  led  to  major  breakthroughs  by  overcoming  the  historically  sparse  data  coverage  in   the   surface   ocean.   Yet   the   subsurface   and   deep   ocean   remains   drastically   under-­‐sampled.  We  must  therefore  assimilate  available  data  into  models  and  make  sure  that  those  models  account  for  the  key  ocean  physical  and  biogeochemical  processes  to  be  able  to  predict   the  evolution  of  ocean  characteristics  and  of  marine  ecosystems  at  all  relevant  scales.  A  key  concern  in  the  ocean,  as  in  the  atmosphere,  is  eddies.  Their  ubiquitous  nature  causes  them  to  play  a  fundamental  role  in  setting  the  mean  circulation,  in  transporting  heat,  carbon  and  other  key  properties  across  frontal  structures  and  between  basins.  Small  eddies  (sub-­‐mesoscale)  drive  variations  in  vertical  motion  that  affect  nutrient  supply   and   thus  ocean  biota.   Eddies   and  other   non-­‐linear   ocean  processes   generate   intrinsic   low-­‐frequency   variability   that   cannot   be   quantified   by   present   observation   systems   and   that   require  large-­‐ensemble  modelling   strategies.   These   key   concerns   can   be   addressed   only   by   building   upon  recent  advances  in  ocean  modelling  to  construct  more  accurate,  high-­‐resolution  models.  

Challenges:  description  and  state  of  the  art  

Peta-­‐   to   exa-­‐flops   computing   capabilities  would   greatly   help   resolve   three  major   issues   facing   the  oceanographic  research  community.  

Challenge  #1:  High-­‐resolution  ocean  circulation  models  

Spatial   resolution   is   of   prime   importance  because  major  ocean   currents   are   critically   governed  by  small-­‐scale  topographic  features,  such  as  narrow  straits  and  deep  sills,  and  by  energetic  small-­‐scale  eddies.  These  factors  are  essentially  unresolved  in  current  climate  models.  Existing  eddy-­‐permitting  (O(10   km)   grid)   models   have   now   begun   to   capture   eddy   processes   in   the   subtropics   and   mid-­‐latitudes   (with   strong   effects,   for   example,   on   the   Gulf   Stream),   but   much   higher   resolution   is  needed  to  achieve  comparable  progress  in  the  subpolar  and  polar  oceans.  Yet  it  remains  a  challenge  to   run   realistic   global   or   regional   ocean/sea-­‐ice   models   at   resolutions   high   enough   to   ensure  dynamical  consistency  over  a  wide  range  of  resolved  scales.  The  immediate  challenge  is  to  use  such  models  in  many  ensemble  simulations  to  quantify  and  understand  the  broadband  chaotic  variability  that   spontaneously   emerges   within   the   eddying   ocean.   This   will   further   help   quantify   associated  uncertainties   in   ocean   forecasting   and   in   climate   prediction.   Going   further,   we   also   need   to  represent   scales   of  O(1   km)  over   large  domains   (i.e.   at   a   basin   or   global   scale)   as   process   studies  show  that  features  of  the  scale  have  significant  impacts  on  larger  scales.  Another  challenge  concerns  developing  data  assimilation  in  eddying  ocean  models:  new  computationally  efficient  methodologies  are  needed  to  adequately  constrain  such  models  with  a  combination  of  a  large  variety  of  very  high-­‐resolution   satellite   observations   (wide-­‐swathe   altimetry,   high-­‐resolution   SST,   ocean   colour,   SAR  images,  etc.)  and  the  much  more  dispersed  and  under-­‐sampled  in-­‐situ  data  (e.g.  ARGO  drifters).  

Challenge  #2:  Carbon  fluxes  

A   key   challenge   concerns   the   strong   control   of   ocean   physics   on   ocean   biogeochemistry.   Ocean  biogeochemistry  affects  air–sea  CO2  fluxes,  which   in  turn  affect  atmospheric  CO2  and  thus  climate.  To  project  future  changes  in  air–sea  CO2  fluxes  and  climate  change  accurately,  a  key  prerequisite  is  to  be  able  to  simulate  adequately  decadal  and  inter-­‐decadal  variability  of  the  oceanic  carbon  cycle  over   the   recent   past,   while   separating   natural   from   anthropogenic   components.   Major  improvements  are  needed  to  assess  adequately   large-­‐scale  carbon  transport,  regional  variations  of  ocean  carbon  sources  and  sinks,  and  exchanges  between  the  open  and  coastal  oceans.  In  addition,  because  ocean   circulation   in  general   and   small-­‐scale  physical  processes   in  particular   greatly   affect  ocean   carbon,   it   is   critical   to   pursue   these   studies   with   coupled   physical/biogeochemical   global  ocean  models  at  the  highest  resolution  available.  

Challenge  #3:  Understanding  and  monitoring  marine  ecosystems  

Another great challenge in ocean sciences is to understand the evolution of regional marine ecosystems over seasonal to decadal scales and their sensitivity to a changing environment. For instance, increasing atmospheric CO2 also affects the ocean by reducing ocean pH (ocean acidification), which threatens some marine organisms, especially corals and shell builders. Human influence is also causing a general warming of surface waters (through climate change) as well as reductions of oxygen in subsurface waters. How these multiple stressors interact and will change in the future is a burning question. Accurate models of these interactions would greatly improve the understanding, monitoring and forecasting of marine resources and support the preservation of our coastal zones. These models will need to be able to simulate biogeochemical cycles and 'blooms' accurately and to build stronger links between fishery (halieutic) resources and fine-scale ocean circulation features. In the mid-latitudes, ecosystems are strongly affected by vertical eddy velocities that can enhance or impede the supply of nutrient-rich deep water. High resolution is essential, possibly using nested regional models (grid refinement up to 1/100°) embedded within larger-scale models. Explicit resolution of fine-scale processes avoids the use of subgrid-scale parameterisations, which remain inaccurate in critical regions that have remote impacts on basin-scale or global oceanic circulation.

Roadmap  

Running ocean/sea-ice circulation models that can resolve the spectrum of ocean dynamics down to the sub-mesoscale (e.g. O(1/100°)) is at present beyond any computing capability, except in very local areas and over short periods. Since a twofold increase in grid resolution requires roughly a tenfold increase in computer power, reaching kilometric resolution at the global scale demands a thousandfold increase in computer power. Adding a full carbon cycle model increases the computational cost and storage capacity by a factor of O(5). The highest-resolution global ocean/sea-ice circulation model presently used in Europe for research and operational forecasting (e.g. ORCA12, used in the MyOcean MCS) uses a grid resolution of 1/12° (i.e. 5 to 10 km). A single 50-year run of this model requires an available peak computational power of 25 Tflop/s for a period of 2 months, which represents ~2.5% of the annual peak power available on a Pflop/s computer (or ~10% of a Tier-1 computer where this model is presently being run). The most commonly used eddying ocean models (for operational forecasting, seasonal and climate prediction, and ocean climate variability studies) use a grid resolution of about 1/4° (10 to 25 km) and require, for a 50-year run, a peak computational power of 10 Tflop/s for a period of 20 days. In the near future, the scientific objectives of the oceanographic community will require, on the one hand, performing series of multi-decadal experiments with O(1/12°) models, ensemble runs of O(50) members with O(1/4°) models, or increased eddying ocean model complexity with a full carbon cycle. Operational oceanography, on the other hand, urgently needs to develop higher-resolution products for its marine core services, and to develop global models with a grid resolution of, for example, 1/24° (a resolution of 5 km at low latitudes and up to 2 km at high latitudes), requiring an increase in computational power by a factor of 10 or more. At the same time, we need to make significant progress in data assimilation: accurate ocean simulations require accurate initial conditions, accurate forcing fields and accurate calibration of model parameters. These requirements are even more demanding than modelling in terms of computational power. Moreover, it would be important to include grid refinements, down to 1 km scales, to resolve explicitly the dynamics of specific oceanic regions that are critical for the global oceanic circulation, and thereby avoid the need for parameterisations in key regions. Altogether, the above requirements call for peak computational resources of 500 to 1,000 Tflop/s available for periods of months. This can be obtained only with O(10 to 100) Pflop/s computers coupled to very large storage facilities of O(10 to 100) PBytes that will store the simulation outputs over long periods (at least O(5) years) for subsequent studies.
Finally, support should be made available to train young interdisciplinary scientists to become specialists not only in climate science or HPC, but in both. Training should be provided via summer schools and international training networks (e.g. the International Training Network SLOOP (SheLf to deep Ocean mOdelling of Processes), recently submitted to FP7).
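To make the scaling quoted above concrete, the short sketch below (not part of the original study; Python, with illustrative numbers only) applies the stated rule of thumb that a twofold increase in horizontal grid resolution costs roughly a factor of ten in compute, and estimates the factor needed to go from the current 1/12° global configuration to kilometric (roughly 1/100°) resolution.

```python
# Minimal sketch: cost scaling of global ocean models with horizontal resolution.
# Rule of thumb from the text: doubling the resolution costs ~10x more compute
# (2x in each horizontal direction, plus a shorter time step and more vertical detail).
import math

def cost_factor(resolution_increase, cost_per_doubling=10.0):
    """Multiplicative increase in compute cost for a given resolution increase."""
    doublings = math.log2(resolution_increase)
    return cost_per_doubling ** doublings

# From the eddy-resolving 1/12 degree configuration (5-10 km) towards ~1 km
# (roughly 1/100 degree): resolution increases by about a factor of 8.
print(f"1/12deg -> 1/24deg : ~{cost_factor(2):.0f}x more compute")
print(f"1/12deg -> ~1/100deg: ~{cost_factor(100/12):.0f}x more compute")
# The second line gives a factor of order 1,000, consistent with the
# 'thousandfold increase' quoted for kilometric global resolution.
```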

2.2.3 Weather and Air Quality

Motivation

Mitigating high-impact weather through more accurate and timely weather nowcasting requires that the comprehensiveness of the various models used to describe the short-term development of the atmosphere be enhanced dramatically. The resolution of numerical weather prediction models will have to be increased to about 1 km horizontally to resolve convection explicitly. In addition, a probabilistic approach will have to be taken to arrive at meaningful warning scenarios. Similarly, one of the most far-reaching developments results from enhanced capabilities in air quality forecast modelling. Chemical transport models (CTMs), which aim to simulate the physical and chemical processes in the atmosphere, have been used for urban pollution problems. However, operational air quality forecast systems require high spatial resolution, significant computational effort and a large volume of input data. This is even more relevant for new online codes, in which weather and air quality are solved with an integrated approach. The ability to forecast local and regional air pollution events is challenging, since the processes governing the production and sustenance of atmospheric pollutants are complex and non-linear. The availability of increased computational power and the possibility of accessing scattered data online with the help of a cloud infrastructure, coupled with advances in the computational structure of the models, now enable their use in real-time air quality forecasting. Furthermore, this may contribute to the GMES (Global Monitoring for Environment and Security) European initiative, which has stated as a priority objective the deployment of environmental forecasting services. These challenges are currently being tackled, albeit on a much smaller scale, because the available computing power is about three orders of magnitude smaller than that required for the complete solutions. Today it is only possible to solve the problems for a limited number of variables in a limited area. It is known in principle what needs to be done in the future, but the resources are not yet available.

Challenges:  description  and  state  of  the  art  

Fundamental   questions   facing   weather   and   air   quality   research   can   be   summarised   in   three   key  challenges.  

Challenge   #1:   The   need   for   very   high-­‐resolution   atmospheric   models   and   associated   forecasting  system  

The resolution of the models will have to be increased in both time and space in order to resolve explicitly physical processes that today are still parameterised. To arrive at meaningful results, this work also entails evaluating the error growth due to the uncertainty of the initial and boundary conditions. One option is the computation of multiple scenarios with initial conditions varying within the error space. In addition, the data gathered by new, high-resolution observing systems, either space- or ground-based, need to be assimilated using new techniques such as 4DVAR. The I/O rates of these applications will be of the order of 5 GB/s for the duration of the runs, resulting in files of more than 6 TBytes. In order to test these ideas, a preoperational trial of a complete end-to-end system needs to be carried out, checking whether the integrated system consisting of data acquisition, data assimilation, forecast run and product generation can be handled in a sufficiently short time to allow future operational deployment. Similarly, sets of ensemble forecasts will have to be run as a preoperational test case.
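As a rough check on the data-volume figures above, the minimal sketch below (Python; the write durations are illustrative assumptions, not values from the text) simply relates a sustained output rate to the resulting data volume.

```python
# Minimal sketch: output volume implied by a sustained I/O rate.
# The 5 GB/s rate comes from the text; the write durations are illustrative only.
def output_volume_tb(rate_gb_per_s, write_seconds):
    """Total data written, in TBytes, for a given sustained rate and write time."""
    return rate_gb_per_s * write_seconds / 1000.0

rate = 5.0  # GB/s, as quoted for high-resolution forecast runs
for minutes in (10, 20, 60):
    print(f"{minutes:3d} min of sustained output at {rate} GB/s -> "
          f"{output_volume_tb(rate, minutes * 60):.1f} TBytes")
# Already ~20 minutes of sustained output at this rate exceeds the 6 TBytes
# per-run file size quoted in the text.
```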

Challenge   #2:   High-­‐resolution   assimilated   air   quality   forecast   and   cloud–aerosol–radiation  interaction  models  

A  reliable  pan-­‐European  capability  of  air  pollutant  forecasting  in  Europe  with  high  resolution  (1  km)  becomes  essential   in  informing  and  alerting  the  population,  and  in  the  understanding  of  when  and  why  episodes  of  air  pollution  arise  and  how  they  can  be  abated.  It  is  well  known  that  the  accuracy  of  an  air  quality   forecasting  system  depends  dramatically  on  the  accuracy  of  emission  data.   It   is   thus  timely   to   develop   a   system   using   the   four-­‐dimensional   variational   analysis   (4DVAR)   to   improve  simulations   of   air   quality   and   its   interactions  with   climate   change.   4DVAR   is   an   improved  way   of  
combining   observations   valid   at   different   times   (satellite   data,   radiosondes,   ground   observations,  aircraft   measurements,   photometer   data,   model   data)   with   background   fields   from   a   previous  forecast  to  create  a  starting  point  for  a  new  forecast.    
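For readers less familiar with the method, the standard (strong-constraint) 4DVAR cost function is sketched below in LaTeX; this generic textbook form is added here for orientation only and is not taken from the original document.

```latex
% Generic strong-constraint 4DVAR cost function (textbook form, added for orientation).
% x_0: initial state (control variable); x_b: background (previous forecast);
% B: background-error covariance; y_i: observations at time t_i;
% H_i: observation operator; M_{0->i}: model propagation from t_0 to t_i;
% R_i: observation-error covariance.
\begin{equation}
J(\mathbf{x}_0) \;=\;
\tfrac{1}{2}\,(\mathbf{x}_0 - \mathbf{x}_b)^{\mathsf T}\,\mathbf{B}^{-1}\,(\mathbf{x}_0 - \mathbf{x}_b)
\;+\;
\tfrac{1}{2}\sum_{i=0}^{N}
\bigl(H_i(M_{0\to i}(\mathbf{x}_0)) - \mathbf{y}_i\bigr)^{\mathsf T}
\mathbf{R}_i^{-1}
\bigl(H_i(M_{0\to i}(\mathbf{x}_0)) - \mathbf{y}_i\bigr)
\end{equation}
```

Minimising J over the assimilation window requires repeated integrations of the forecast model and its adjoint, which is why 4DVAR is considerably more expensive than the forecast itself, as noted below.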

4DVAR is computationally very expensive (considerably more expensive than the forecast itself) and also requires a lot of memory. For the provision of high-resolution air quality forecasts and improved weather forecasts, an important unresolved question is the role of aerosols in modifying clouds, precipitation and the thermal structure of the atmosphere. Under natural conditions, dust generates complex feedbacks within atmospheric processes: an increased dust load modifies the thermal and dynamic structure of the air; the modified atmosphere in turn changes the conditions for dust uptake from deserts, and so on. In a similar way, other aerosols may act as cloud condensation nuclei and affect precipitation processes. In current-generation models, these processes are still highly simplified. Several efforts in this area must be progressed with the help of supercomputers.

Challenge  #3:  Develop  pan-­‐European  short-­‐range  weather  and  air  quality  modelling  systems  

Weather and air quality models are commonly used in forecast mode. Further improvements are still needed, and these require more computing power. Such improvements include: the representation of clouds and small-scale processes; the coupling with the biosphere, atmosphere, aerosols, clouds and chemical reactions in the environment; data assimilation; and the estimation and computation of different sources of emissions. In addition to the requirements of the previous sections, this means including more physics and more chemical species in the models. Short-range and very high-resolution models need observational data and subsequent data assimilation, and this assimilation of new data must also be handled as a real-time application. This increases the computing requirements by a factor of 10.

Roadmap  

The  national   computing   resources   for   solving   the   challenges  outlined  above  are  expected   to  grow  over  the  coming  years.  This  growth  will  allow  a  gradual  increase  in  complexity  and  resolution  of  the  various  models  used  and  will  result  in  narrowing  the  gap  between  what  is  possible  at  any  given  time  and  what   is   required.  The  work  will  be  carried  out  by  collaboration  between  established  scientific  communities  consisting  of  the  European  National  Meteorological  Services  and  the  European  Centre  for   Medium-­‐range  Weather   Forecasts   (organised   in   the   European   Meteorological   Infrastructure),  collaborating  universities,  and  scientific  research  centres.  The  European  partners  have  collaborated  over  many  years,  often  supported  by  EU  projects.    

A complete pan-European weather and air quality forecast requires high resolution over a wide area, a complete description of the meteorology and of gas-phase and aerosol chemistry, and their transport and coupling with the other components of the Earth system, such as the emission of mineral dust. Integrated air quality forecasts for the area of all EU Member States at the highest feasible resolution (1 km) require extensive computing resources, as each of the applications proposed will require 100–300 Tflop/s sustained performance. The work will be carried out by established scientific communities, alongside other initiatives existing in the Member States at local or national level. The most relevant initiative is the GMES project, a joint initiative between the EU and ESA to strengthen the acquisition and integration of high-quality EU environmental, geographical and socio-economic data, which will help improve policymaking from the local to the global level.

2.2.4 Solid Earth Sciences

Motivation

Because   solid   Earth   processes   occur   on   many   different   spatial   and   temporal   scales,   it   is   often  convenient  to  use  different  models.  A  key  issue  is  to  better  identify  and  quantify  uncertainties,  and  estimate   the   probability   of   extreme   events   through   simulation   of   scenarios   and   exploration   of  parameter   spaces.  For   some  problems,   the  underlying  physics   is   today  adequately  understood  and  the  main  limitation  is  the  amount  of  computing  and  data  capabilities  available.  For  other  problems,  a  
new  level  of  computing  and  data  capabilities  is  required  to  advance  our  understanding  of  underlying  physics  where  laboratory  experiments  can  hardly  address  the  wide  range  of  scales  involved  in  these  systems,   for   example  modelling   and   simulating   earthquake  dynamics   rupturing  processes   together  with  high-­‐frequency  radiation  in  heterogeneous  Earth.  

Figure 2.2. Top left: Simulation of the seismic wavefield generated by the L'Aquila earthquake (Italy) on 6 April 2009; snapshots at 6 s, 11 s, 16 s and 21 s after the event (vertical displacement, up/down as red/blue). Bottom left: Mesh discretisation of the L'Aquila region for high-frequency wave simulations. (Courtesy E. Casarotti and F. Magnoni, see Peter et al., 2011.) Right: Visualisation of the magnetic state in the Earth's liquid core during an inversion of the magnetic polarity using the Dynamical Magnetic Field Line Imaging method. (Courtesy J. Aubert, see Aubert et al., 2008.)

The  solid  Earth  community  is  preparing  itself  for  massive  use  of  supercomputers  by  the  current  (re-­‐)  organisation  of  some  of  the  communities  through  large-­‐scale  EU  projects,  including  the  ESFRI  project  EPOS   (http://www.epos-­‐eu.org),   the   FP7-­‐Infrastructure   project   VERCE   (http://www.verce.eu),   the  Marie   Curie   ITN   initiative   QUEST   (http://www.quest-­‐itn.org),   and   other   initiatives   like   SHARE,  TOPOEurope,  TOPOMod  and  MEMoVOLC.  The  need  for  leadership-­‐class  data-­‐intensive  computing  is  illustrated  through  the  four  major  challenges  outlined  below.    

Challenges: description and state of the art

Fundamental questions facing solid Earth sciences research can be summarised in four key challenges.

Challenge #1: Earthquake ground motion simulation and seismic hazard

To   understand   the   basic   science   of   earthquakes   and   to   help   engineers   better   prepare   for   such  events,   scientists   need   to   identify  which   regions   are   likely   to   experience   the  most   intense   ground  shaking,  particularly  in  populated  sediment-­‐filled  basins.  This  understanding  can  be  used  to  improve  building  codes  in  high-­‐risk  areas  and  to  help  engineers  design  safer  structures,  potentially  saving  lives  and  property.   In  the  absence  of  deterministic  earthquake  prediction,  the  forecasting  of  earthquake  ground   motion   based   on   simulation   of   scenarios   is   one   of   the   most   promising   tools   to   mitigate  earthquake-­‐related   hazard.   This   requires   intense  modelling   that  meets   the   actual   spatio-­‐temporal  
resolution  scales  of  the  continuously  increasing  density  and  resolution  of  the  seismic  instrumentation  which  records  dynamic  shaking  at  the  surface  as  well  as  of  the  basin  models.  Another  important  issue  is  to  improve  our  physical  understanding  of  the  earthquake  rupture  processes  and  seismicity.  Large-­‐scale   simulations  of   earthquake   rupture  dynamics,   and  of   fault   interactions,   are   currently   the  only  means  to  investigate  these  multiscale  physics  together  with  data  assimilation  and  inversion.    

High-­‐resolution  models   are   also   required   to   develop   and   assess   fast   operational   analysis   tools   for  real-­‐time   seismology   and   early   warning   systems.   Earthquakes   are   a   fact   of   life   in   Europe   and   all  around  the  world.  Accurate  simulations  must  span  an  enormous  range  of  scales,   from  metres  near  the   earthquake   source   to   hundreds   of   kilometres   across   the   entire   region,   and   timescales   from  hundredths   of   a   second   –   to   capture   the   higher   frequencies,   which   have   the   greatest   impact   on  buildings   –   to   hundreds   of   seconds   for   the   full   event.   Adding   to   the   challenge,   ground   motion  depends   strongly   on   subsurface   soil   behaviour.  While   providing  much   useful   information,   today’s  most   advanced   earthquake   simulations   are   generally   not   capable   of   adequately   reproducing   the  observed  seismograms.  The  likely  reason  is  that  these  models  are  based  on  a  number  of  assumptions  made  largely  to  reduce  the  computational  effort  and  on  the  often  poor  knowledge  of  the  medium  at  the   scale-­‐length   target   of   the   waveform   modelling.   There   is   an   urgent   need   to   enhance   these  simulations   and   to   improve   model   realism   by   incorporating   more   fundamental   physics   into  earthquake  simulations.  The  goal  is  to:    

• Extend the spatial dimensions of the models by a factor of 10

• Increase the highest resolved frequency above 5 Hz (for structural engineering purposes), implying a 64-fold increase in computational size, since the size scales roughly as the cube of the resolved frequency (see the sketch after this list)

• Move to more realistic soil behaviours, implying at least a two-order-of-magnitude increase in computational complexity

• Incorporate a new physics-based dynamic rupture component at 100 m resolution for realistic wave radiation and near-field risk assessment, implying at least an order-of-magnitude increase in computation

• Invert for both the earthquake source and the geological parameters, which necessitates repeated solutions of the forward problem, leading to an increase of one to two orders of magnitude in computations

• Perform stochastic modelling of seismic events and wave propagation for quantifying uncertainties and exploring earthquake scenarios, which implies a 10–50 times increase in computation
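The cube-law scaling quoted in the second bullet can be made explicit with the minimal sketch below (Python; the baseline frequency is an illustrative assumption, chosen so that the stated 64-fold factor corresponds to a fourfold increase in resolved frequency).

```python
# Minimal sketch: growth of problem size with the highest resolved frequency.
# The text states that computational size scales roughly as the cube of the
# resolved frequency (grid spacing shrinks in all three spatial dimensions).
def size_increase(f_new_hz, f_old_hz):
    """Multiplicative growth in grid size when raising the resolved frequency."""
    return (f_new_hz / f_old_hz) ** 3

f_old = 1.25  # Hz, illustrative baseline for today's regional simulations
f_new = 5.0   # Hz, the engineering-relevant target quoted in the text
print(f"{f_old} Hz -> {f_new} Hz: ~{size_increase(f_new, f_old):.0f}x larger problem")
# Prints ~64x, the factor quoted above. Time-stepping costs grow faster still,
# since the time step must also shrink with the grid spacing.
```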

These  improved  simulations  will  give  scientists  new  insights  into  where  strong  ground  motions  may  occur   in   the   event   of   such   an   earthquake,   which   can   be   especially   intense   and   long-­‐lasting   in  sediment-­‐filled  basins.    

State  of  the  art  can  be  split   into  two  categories.  First,  problems  for  which  the  underlying  physics   is  adequately   understood   and   the   challenges   arise   mainly   from   computational   limitations   (e.g.  simulation   of   an   elastic   wave   propagation   in   strongly   heterogeneous   geological   media   remains   a  computational   challenge   for   the   new   Pflop/s   technology).   Second,   problems   for   which   high-­‐performance   computing   resources   are   required   to   advance   scientific   understanding   –   modelling  earthquake  dynamic   rupturing   processes   together  with   high   frequency   radiation   in   heterogeneous  media  is  an  example  of  this  type  of  problem.  Fully  coupled  extended  earthquake  dynamics  and  wave  propagation  will  remain  a  grand  challenge  problem  even  with  the  next  generation  computers.  

Challenge  #2:  High-­‐resolution  imaging  techniques  

The  capacity  for  imaging  accurately  the  Earth’s  subsurface,  on  land  and  below  the  sea  floor,  is  one  of  the   challenging   problems   that   have   important   economic   applications   in   terms   of   resource  management,   identification  of   new  energy   reservoirs   and   storage   sites   as  well   as   their  monitoring  
through  time.  As  recoverable  deposits  of  petroleum  become  harder  to  find,  the  costs  of  drilling  and  extraction   increase;   the   need   for  more   detailed   imaging   of   underground   geological   structures   has  therefore   become   obvious.   Recent   progress   in   seismic   acquisition   related   to   dense   networks   of  sensors  and  data  analysis  makes   it  possible  now  to  extract  new  information  from  fine  structures  of  the  large  volume  of  recorded  signals  associated  with  strongly  diffracted  waves.  This  increase  in  data  acquisition   is   also   important   for   risk   mitigation.   Imaging   accurately   seismic   rupture   evolution   on  complex   faulting   systems   embedded   in   a   heterogeneous   medium,   or   time-­‐lapse   monitoring   of  volcanoes,  proceed  in  similar  fashion.  

Seismic imaging of the Earth's subsurface has important implications in terms of energy resources and environmental management. While deep-ocean (1,000–2,000 m) fossil energy resources are coming under extraction, investigations of complex tectonic zones such as foothills structures are crucial, because these zones are expected to host reservoirs of future economic interest. With the advent of high-resolution, large-dynamic-range instrumentation, the challenge is now to exploit fully the fine details of the recorded signals, going beyond the first-arrival waves and exploring late-arriving signals associated with strongly and possibly multiply diffracted waves. This will open new perspectives in very complex geological settings, as well as the capacity to monitor through time waste disposal sites or reservoirs during their exploitation and, for example, possible ascents and descents of magmas within volcanoes.

Differential and time-lapse seismology, migration and correlation methods are being explored today in order to extract this detailed information. In these imaging techniques, only adjoint methods related to linearised formulations are tractable to date, and back-projection is the mathematical tool for image reconstruction. Adjoint methods allow only a local analysis of the resolution and uncertainties. Semi-local analysis will require simulated annealing or genetic algorithms, leading to an increase in computer resources that we cannot yet foresee, not to mention exhaustive inspection of the model space with importance-sampling strategies. Because thousands of forward problems must be solved within an iterative optimisation scheme, in proportion to the number of sources and receivers, techniques must be investigated for solving these forward problems efficiently in a combined way. Moreover, the forward models must accurately simulate complex wave propagation phenomena such as reflection and diffraction in heterogeneous media with high impedance contrasts, or diffraction by rough topography at the surface of the Earth or at the bottom of the sea, at very high frequencies (10–40 Hz) where complex attenuation is expected. The bridge between deterministic estimates and probabilistic approaches should be clearly identified and will justify the demanding task of performing wave propagation modelling.
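To illustrate why the cost multiplies with the number of sources and iterations, the toy example below (Python/NumPy; a linear forward operator standing in for wave propagation, all sizes illustrative) performs an iterative, adjoint-based model update in which every iteration requires one forward and one adjoint evaluation per source, exactly the pattern described above. The real forward problem would of course be a 3D wave simulation rather than a matrix product.

```python
# Toy sketch of iterative adjoint-based imaging: a linear forward operator per
# source stands in for a full wave-propagation solve. Each iteration costs one
# forward plus one adjoint evaluation per source, so the work scales with
# (number of sources) x (number of iterations).
import numpy as np

rng = np.random.default_rng(0)
n_model, n_data, n_sources = 200, 50, 20          # illustrative sizes only
G = [rng.standard_normal((n_data, n_model)) / n_data for _ in range(n_sources)]
m_true = rng.standard_normal(n_model)             # "true" model to recover
data = [Gs @ m_true for Gs in G]                  # synthetic observations

m = np.zeros(n_model)                             # starting model
step = 0.5
for it in range(100):
    grad = np.zeros(n_model)
    misfit = 0.0
    for Gs, d in zip(G, data):
        residual = Gs @ m - d                     # forward solve for this source
        grad += Gs.T @ residual                   # adjoint (here: transpose) solve
        misfit += 0.5 * residual @ residual
    m -= step * grad                              # steepest-descent model update
    if it % 25 == 0:
        print(f"iteration {it:3d}: misfit = {misfit:.3e}")
print(f"forward/adjoint solves performed: {2 * n_sources * 100}")
```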

Challenge  #3:  Structure  and  dynamics  of  the  Earth’s  interior  

One  of  the  major  problems  facing  Earth  scientists  is  to  improve  the  resolution  and  the  understanding  of  the  Earth’s  interior  structure  and  dynamics.  Broadband  seismological  data  volumes  are  increasing  at  a  faster  rate  than  computational  power,  challenging  both  the  analysis  and  the  modelling  of  these  observations.  This  progress  is  thanks  to  the  federation  of  digital  seismic  networks  and  to  the  notable  presence  of   the  European   Integrated  Data  Archive(s)   (i.e.  ORFEUS  and  EIDA)  which,   irrespective  of  the   specific   archive   to   which   the   data   request   is   submitted,   provides   data   contained   in   all   the  federated   archives.   So   far,   only   a   small   fraction   of   the   information   contained   in   broadband  seismograms  is  actually  used  to   infer  the  structure  of  the  Earth’s   interior.  Recent  advances   in  high-­‐performance  computing  and  numerical  techniques  have  facilitated  three-­‐dimensional  simulations  of  seismic  wave  propagation  at  unprecedented   resolution  and  accuracy  at   regional   and  global   scales.  The  realm  of  Pflop/s  computing  opens  the  door  to  full  waveform  tomographic  inversions  that  make  use  of  these  new  tools  to  enhance  considerably  the  resolution  of  the  Earth’s  interior  image.  This  is  a  grand   challenge   problem   due   to   the   large   number   of   mesh-­‐dependent   model   parameters   and   of  wave  propagation  simulations  required  during  the  inversion  procedure.  

Convection of the solid Earth mantle drives plate tectonics and the Earth's thermal evolution. Mantle convection is dominated by slow viscous creep, involving timescales of hundreds of millions of years. Despite the low velocities, the Rayleigh number is of the order of 10⁷, inducing strongly time-dependent dynamics and convective scales that are small compared with the size of the domain. One computational challenge is thus the resolution of convective features of less than 100 km over a spherical domain of depth 2,900 km and circumference 40,000 km. Another challenging issue is the resolution of rapid spatial variations of the physical properties: viscosity depends strongly on temperature, pressure and stress (e.g. six orders of magnitude with temperature and two orders of magnitude with depth). Incorporating melt-induced compositional differentiation, self-consistent plate-like behaviour (elastic-brittle) and solid–solid compositional phase changes is extremely difficult and computationally demanding. How plate tectonics arises from mantle convection remains an outstanding issue.
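For orientation, the Rayleigh number referred to above is defined, in its standard form for thermal convection in a layer, as follows (a textbook definition added here for clarity, not taken from the original document):

```latex
% Standard Rayleigh number for thermal convection in a layer of thickness d.
% rho: density, g: gravity, alpha: thermal expansivity, Delta T: temperature
% contrast across the layer, kappa: thermal diffusivity, eta: dynamic viscosity.
\begin{equation}
\mathrm{Ra} \;=\; \frac{\rho\, g\, \alpha\, \Delta T\, d^{3}}{\kappa\, \eta}
\end{equation}
```

Boundary-layer arguments suggest thermal structures shrinking roughly as Ra^{-1/3} of the layer depth, consistent with the need to resolve features much smaller than the 2,900 km shell depth at Ra of order 10⁷.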

Seismology   is   the  unique  method   that  can  probe   the  Earth’s   interior   from  the  surface   to   the   inner  core,  as  well  as  its  external  coupling  with  the  atmosphere  and  the  oceans.  Improving  the  capability  to  enhance   the   quality   of   3D   tomographic   images   of   the   Earth’s   interior,   with   a   resolution   of   the  thermal   and   chemical   heterogeneities   lower   than   tens   of   kilometres,   using   the   continuously  increasing  data  sets  of  broadband  seismological  records,   is  today  essential  to   improve  core–mantle  dynamical  models  and  our  knowledge  of  the  Earth’s  physics.  This  is  also  an  essential  step  in  order  to  improve   the   imaging   of   earthquake   rupture   processes   using   both   regional   and   tele-­‐seismic  seismological   observations.   Solid   Earth   internal   dynamical   processes   often   take   place   on   scales   of  tens  to  millions  of  years.  Even  with  the  most  advanced  observational  systems,  the  temporal  sampling  of  these  phenomena  is  poor.  In  order  to  understand  these  systems,  simulations  must  be  carried  out  concurrently  with  observations.  Mantle  convection  provides  the  driving  force  behind  plate  tectonics  and  geological  processes  that  shape  our  planet  and  control  the  sea  level.  Realistic  models  of  thermo-­‐mechanical  mantle   convection   in   3D   spherical   geometry   are   required   to   better   assimilate  mineral  physics   and   seismology   information   into   the   deep-­‐Earth   dynamics.   The   short-­‐time   scale   dynamic  behaviour  will  serve  as  the  monitor  for  stress  build-­‐up  that  loads  seismically  active  regions.  

Numerical 3D simulation of wave propagation at regional and global scales has recently been achieved at unprecedented resolution and will continue to improve over the next few years. Using these new developments for non-linear arrival-time and waveform inversions will lead to a revolution in global and regional tomography in the next decade. Even in the realm of Pflop/s computing, this is an extraordinary computational challenge given the hundreds or thousands of model parameters involved. Taking advantage of the fact that adjoint calculations and time-reversal imaging are quite straightforward in seismic inverse problems opens new doors for efficiently computing the gradient of the misfit function and for developing new scalable algorithms for seismic inversions. Full waveform inversion is today a very challenging data-intensive application, requiring well-balanced HPC architectures.
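As a point of reference, a commonly used least-squares waveform misfit for such inversions can be written as below (a generic form added for clarity; the specific misfit used by any given tomography code may differ):

```latex
% Generic least-squares waveform misfit over sources s and receivers r.
% u(x_r, t; m): synthetic seismogram computed in Earth model m;
% d(x_r, t): observed seismogram; T: record length.
\begin{equation}
\chi(m) \;=\; \frac{1}{2} \sum_{s}\sum_{r} \int_{0}^{T}
\bigl\| u_{s}(\mathbf{x}_r, t; m) - d_{s}(\mathbf{x}_r, t) \bigr\|^{2}\, dt
\end{equation}
```

One adjoint simulation per source (or per assembled event) yields the gradient of χ with respect to all model parameters at once, which is what makes adjoint-based tomography feasible despite the very large parameter space.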

Three-­‐dimensional   numerical   simulations   of   mantle   convection   with   both   chemical   and   thermal  buoyancies   are   today   performed   both   in   Cartesian   and   spherical   shell   geometries.   Numerical  simulations   that   include   both  melt-­‐induced   compositional   differentiation   and   self-­‐consistent   plate  tectonics-­‐like   behaviour   have   been   performed   only   in   two   dimensions   and   in   small   3D   Cartesian  geometries.   Incorporating   melt-­‐induced   compositional   differentiation,   self-­‐consistent   plate-­‐like  behaviour   (elastic   brittle)   and   composition   solid–solid   phase   changes   in   high-­‐resolution   spherical  shell   models   is   today   a   challenging   problem   that   can   be   addressed   only   in   the   realm   of   Pflop/s  computing.  

Challenge  #4:  Generation  of  the  Earth’s  magnetic  field  

The generation of the Earth's magnetic field has been named one of the enigmas of the natural sciences, and the extremely involved magneto-hydrodynamic simulations of the core dynamics and of the associated external magnetic field are essential to progress in this field. The past seven years have seen significant advances in computational simulations of
convection   and  magnetic-­‐field   generation   in   the   Earth's   core.   Although   dynamically   self-­‐consistent  models  of  the  geodynamo  have  simulated  magnetic  fields  that  appear  in  some  ways  quite  similar  to  the   geomagnetic   field,   none   is   able   to   run   in   an   Earth-­‐like   parameter   regime   because   of   the  considerable  spatial  resolution  that  is  required.  

The history of the Earth's magnetic-field variations is engraved in the frozen-in field directions found in most volcanic rocks on Earth (e.g. oceanic crust generated at the spreading ridges). Many of those observable directions are used to derive plate motions in recent times, and it is important to understand the constraints on these estimates, particularly during periods with frequent reversals. On a shorter timescale, it is important to understand the phenomenology of magnetic-field reversals, not least because the field strength is currently decreasing steadily, with some likelihood of a reversal over the next few thousand years. Understanding the generation of the Earth's magnetic field is not only crucial for geophysics; it has strong implications for astrophysics in understanding the magnetism of planets and stars. Moreover, the geodynamo is one of the challenges of non-linear physics.

While the relevant programs are parallelised, much higher resolution is required to approach natural conditions. In addition, many realisations are necessary to obtain stable results for such highly non-linear processes with strong dependence on initial and boundary conditions. No global convective dynamo simulation has yet been able to afford the spatial resolution required to simulate turbulent convection, which surely must exist in the Earth's low-viscosity liquid core. All such simulations have employed greatly enhanced eddy diffusivities to stabilise the low-resolution numerical solutions and to account crudely for the transport and mixing by the unresolved turbulence. A grand challenge for the next generation of geodynamo models is to produce simulations with the thermal and viscous (eddy) diffusivities set no larger than the actual magnetic diffusivity of the Earth's fluid core, while using the core's dimensions, mass, rotation rate and heat flow. Another challenge is to develop new, highly parallel, adjoint-based assimilation methods for understanding and predicting the evolution and fluctuations of the Earth's magnetic field. Ensemble-based methods that exploit massively parallel architectures are the next challenging data-intensive HPC applications.
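The diffusivity target stated above can be phrased compactly in terms of standard non-dimensional diffusivity ratios (textbook definitions added here for clarity; the original text does not use this notation):

```latex
% nu: (eddy) viscous diffusivity, kappa: (eddy) thermal diffusivity,
% eta: magnetic diffusivity of the liquid core.
% The goal quoted in the text corresponds to running with
\begin{equation}
\mathrm{Pm} \;=\; \frac{\nu}{\eta} \;\le\; 1,
\qquad
q \;=\; \frac{\kappa}{\eta} \;\le\; 1,
\end{equation}
% i.e. thermal and viscous diffusivities no larger than the magnetic diffusivity,
% while keeping the core's actual dimensions, rotation rate and heat flow.
```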

Roadmap  

Nowadays, simulations of seismic wave propagation in small geological volumes have reached routine production at Tflop/s scale, and state-of-the-art cases are rising into the realm of Pflop/s computing. Typically, for heterogeneous geological basins of dimensions 300 km x 300 km x 80 km at 200 m wavelength resolution, a high-frequency simulation requires O(1K–10K) processors, O(10–100) TBytes and O(1–10) hours to complete, with complex, intensive data movement between compute nodes, disk and archival storage elements.
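To give a feel for where the O(10–100) TBytes figure comes from, the minimal sketch below (Python) estimates the grid size for the quoted 300 km x 300 km x 80 km basin at 200 m wavelength resolution; the points per wavelength, number of stored fields and precision are illustrative assumptions, not values from the text.

```python
# Minimal sketch: memory footprint of a regional wave-propagation grid.
# Domain size and the 200 m wavelength resolution come from the text;
# points-per-wavelength, field count and precision are illustrative assumptions.
def grid_memory_tb(dims_km, min_wavelength_m=200.0, points_per_wavelength=5,
                   fields=15, bytes_per_value=8):
    dx = min_wavelength_m / points_per_wavelength          # grid spacing in metres
    cells = 1
    for d_km in dims_km:
        cells *= int(d_km * 1000.0 / dx)
    bytes_total = cells * fields * bytes_per_value
    return cells, bytes_total / 1e12

cells, tbytes = grid_memory_tb((300.0, 300.0, 80.0))
print(f"grid cells : {cells:.2e}")
print(f"memory     : ~{tbytes:.0f} TBytes for the stored fields alone")
# With these choices the grid holds roughly 1e11 cells and ~14 TBytes of field
# data, at the lower end of the O(10-100) TBytes quoted above; higher
# frequencies or more stored variables push this up quickly.
```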

However,  these  simulations  remain  limited:  the  resolved  frequencies  are  still  too  low  for  the  seismic  engineering  applications  (>  5  Hz);   important  non-­‐linear  effects  and  complex  soil  behaviours  are  not  yet   taken   into   account;   earthquake   sources   remain   simplistic;   and   uncertainty   quantification   and  data  assimilation  are  not  yet  reachable.  Enhancing  the  resolution  and  the  physics,  making  inversion  of  extended  earthquake  sources  and  seismic  parameters,  and  quantifying  the  uncertainties  through  strong   motion   scenarios   will   push   these   simulations   into   the   realm   of   Pflop/s   computing.   A   key  requirement   will   be   the   provision   of   large-­‐scale   end-­‐to-­‐end   data   cyber   infrastructures   to   handle,  analyse  and  visualise  PBytes  of  simulated  data  for  storage.  Visualisation  of  very  large  data  sets  will  be  a  related  important  challenging  problem.  

The   time   formulation   (e.g.  wave  propagation   in   time)   allows  handling  3D   imaging  problems  at   the  expense  of  computer  time  using  present-­‐day  algorithm  technology.  For  simulations  on  boxes  of  100  km  x  100  km  x  25  km,  with  10  Hz  content,  sustained  performances  range  between  10  Tflop/s  and  100  Tflop/s.  Moving   to  more  powerful   resources  will   increase   the  size  of   the  box  and/or   the  maximum  frequency.   Pflop/s   computing   will   improve   seismic   imaging   resolution   by   using   thousands   of  recorded  seismograms.    

Present-­‐day  numerical  algorithm  know-­‐how  should  be  improved  for  handling  expected  biases  coming  from   our   initial   guess   of   the   Earth’s   image.   Pflop/s   computing   will   also   allow   us   to   tackle   the  important   problem   of   uncertainties   by  making   use   of   repeated   forward  modelling.   The   frequency  formulation   (e.g.   Helmholtz   equation)   allows   efficient   image   processing,   using   a   parallel   direct  algebraic   solver,   by   choosing   only   a   few   frequencies   and   by   speeding   up  multi-­‐sources   and  multi-­‐receivers  computations.  The  3D  imaging  problem  in  the  frequency  domain  is  today  a  challenge  both  for   computer   resources   and   for   numerical   algorithms   due   to   the   lack   of   an   efficient   large-­‐scale  parallel   direct   algebraic   solver.   The   realm   of   Pflop/s   computing   will   provide   the   possibility   of  performing   such   seismic   imaging   in   the   frequency   domain.   This   will   require   access   to   a   large  memory/processor   ratio,   efficient   algorithms   for   direct   decomposition   of   very   large  matrices,   and  optimised  parallel  and  sequential  IOs.    

Achieving load balancing between processors in the frequency-domain approach will be a challenge. Unfortunately, resorting to iterative methods diminishes the appeal of a frequency formulation compared with the time-domain formulation. Seismic Data Processing (SDP) is of paramount importance for imaging underground geological structures and is in use all over the world to search for petroleum deposits and to probe the deeper portions of the Earth. Recent advances in data acquisition and in multi-component and multi-attribute analysis have increased data volumes several fold. Processing methods have also changed to deliver higher resolution, leading to an increase in the computational effort that is beyond the scope of current computer resources. Large data volumes and complex mathematical algorithms make seismic data processing an extremely compute- and I/O-intensive activity, which requires high-performance computers (1–10 Tflop/s sustained) with a large memory.

Global wave simulation of the body-wave phases that probe the Earth's core is today a Pflop/s challenge problem. It requires global wave simulations at periods of 1 second or less, with spatial resolution at wavelengths of tens of kilometres, in 3D anelastic Earth models that include high-resolution crustal models, topography and bathymetry, together with rotation and ellipticity. Today, front-end global seismology simulations run at wavelengths of tens of kilometres and typical periods down to 1–2 seconds in 3D Earth models, on hundreds of thousands of cores with a sustained performance of ~200 Tflop/s. The next generation of forward global wave simulations will target periods below 1 second and will require hundreds of TBytes of memory and 1 Pflop/s of sustained performance. Another great challenge will be the adjoint-based inversion of complete waveforms using these 3D wave propagation simulation models, which will lead to at least a one-order-of-magnitude increase in the computational requirements.

The first models were able to simulate magnetic fields quite similar to the geomagnetic field on Gflop/s technology, but only through subtle compromises involving modification of the equations and a set of parameters far from Earth conditions. In 2005, results for the same set of parameters were obtained without modifying the physics, using 512 processors of the 38.6 Tflop/s Earth Simulator for 6,500 hours. For the first time, a dynamo was obtained with a viscous moment that is small compared with the magnetic one. The challenge now is to achieve the balance relevant to the dynamics of the Earth's core, for which both moments are vanishing. Massive access to Pflop/s computing will allow European researchers to investigate the mechanism of these dynamos (only obtained in 2005) and to understand their physical principles. Yet the parameters accessible on such resources are still a factor of 1 million away from the actual geophysical values. Progress achieved over the last few years clearly indicates that an Earth-like solution (for which both moments vanish) could be reached by decreasing the relevant parameter (controlling viscous effects) by a factor of only 1,000. Constructing such an Earth-like numerical dynamo model is therefore realistic only in the realm of Pflop/s computing. When such simulations become available, the critical scientific issue will be to interpret the dynamical models in the framework of dynamo theory. This will require PBytes of storage to describe the 4D (time and space) magneto-hydrodynamic solution. Another great challenge is to develop highly parallel, ensemble-based assimilation methods to predict the evolution and changes of the Earth's magnetic field. This will require new PByte-scale capabilities combining data- and CPU-intensive architectures.

2.3 A  Roadmap  for  Capability  and  Capacity  Requirements    

The  computational   requirements   for   the  weather,  climatology  and  solid  Earth  sciences  applications  discussed   in   this   document   have   one   common   feature:   the   urgent   need   for   access   to   very   large  computational   resources   does   not   stem   from   a   single   aspect   such   as   the   need   to  model   a   larger  number  of  objects  or   to  model  at   a  higher   resolution.  Currently  available   compute  power   restricts  these   applications   in   several   ways.   For   example,   the   envisaged   advanced   climate   studies   require  simultaneously  higher   resolutions,  a  more  sophisticated   representation  of  processes  and  ensemble  methods  to  quantify  uncertainty.    

Very   similarly,   earthquake   and   ground   motion   modelling   requires   higher   resolutions,   a   more  sophisticated   representation   of   the   physical   processes   of   earthquake   source   dynamics   and  quantification  of  uncertainties  on  strong  motion  scenarios.  The  need  to  improve  multiple  aspects  of  the   application   implies   very   high   computational   requirements.   The   requirements   are   typically   a  factor  of  1,000  above  what  can  be  run  today  on  the  top  computational  facilities  installed  in  Europe.  In   absolute   terms,   the   performance   requirements   of   these   applications   range   from   100+   Tflop/s  sustained   to   1   Pflop/s   sustained,   with   some   of   the   applications   having   even   higher   longer-­‐term  requirements.   As   a   large   number   of   such   applications   are   concerned   (see   the   many   challenges  described  above),  the  total  sustained  performance  which  would  be  necessary  is  then  in  the  range  of  10+  to  100  Pflop/s.  

The   ratio   between   sustained   and   peak   performance   varies   from   application   to   application;   in   the  past,  typical  factors  of  1:10  for  scalar  architectures  and  1:3  for  vector  processor-­‐based  systems  were  given  for  weather,  climatology  and  solid  Earth  sciences.  However,  vector  systems  have  become  less  efficient  and  are  effectively  absent   from   the   technology   landscape.  Moreover,  due   to   the  extreme  parallelism  present  in  the  envisaged  Eflop/s  systems  with  millions  of  cores,  such  performance  ratios  could  be  sustained  only  if  the  application  programs  were  modified  to  deal  with  such  parallelism.    
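The sustained-versus-peak ratios quoted above translate directly into peak-capacity requirements; the short sketch below (Python) applies the 1:10 scalar-architecture ratio to the sustained figures given in this section, purely as an illustration of the arithmetic.

```python
# Minimal sketch: converting sustained performance requirements into peak
# capacity, using the 1:10 sustained-to-peak ratio quoted for scalar systems.
def required_peak_pflops(sustained_pflops, sustained_to_peak=0.10):
    """Peak capacity needed to deliver a given sustained performance."""
    return sustained_pflops / sustained_to_peak

# Figures from the text: single applications need 0.1-1 Pflop/s sustained,
# and the ensemble of WCES applications 10-100 Pflop/s sustained in total.
for label, sustained in [("single application (low)", 0.1),
                         ("single application (high)", 1.0),
                         ("all WCES applications (low)", 10.0),
                         ("all WCES applications (high)", 100.0)]:
    print(f"{label:28s}: {sustained:6.1f} Pflop/s sustained "
          f"-> ~{required_peak_pflops(sustained):7.1f} Pflop/s peak")
# The last line lands at ~1,000 Pflop/s peak, i.e. the exascale range noted below.
```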

The   peak   performance   requirement   is   therefore   in   the   exascale   range,   when   considering   a   scalar  architecture.   (However,   it   should   be   stressed   here   that   for   many   applications,   such   as   medium  resolution  climate  models  used  for  paleo  simulations,  strong  scaling  would  be  required  to  an  extent  that   seems   not   to   be   possible.   These   applications   would   only   benefit   from   increased   per   core  performance   (e.g.   powerful   new   vector-­‐type   architectures).)   Even   new   mathematical   algorithms  might  be  required.  Due  to  the  high  internal  communication  requirements  of  the  applications  and  the  continuous  need  to  modify  and  enhance  the  model  codes,  a  general-­‐purpose  computing  system  that  offers  excellent  communication  bandwidth  and  low  latencies  between  all  processors  is  required.  For  most   of   the  WCES   applications,   the   amount  of   computer  memory   required   is   generally   not   higher  than   that   required  by   applications   in   other   scientific   disciplines.  However,   studies   of   the   structure  and  dynamics  of  the  Earth’s  deep  interior,  and  high-­‐resolution  seismic  inversion,  will  require  memory  sizes   approaching   100   TBytes.   To   ensure   efficient   utilisation   of   the   system,   an   I/O   subsystem   that  supports   high   transfer   rates   and   provides   substantial   amounts   of   online   disk   storage   (at   least   1+  PBytes   today   and   10+   for   the   2016   timeframe)   is   essential.   Such   online   storage   needs   to   be  complemented   by   local   offline   storage   (at   least   10+   PBytes),   to   enable   inputs   and   outputs   to   be  stored   up   to   12   months.   A   possible   long-­‐term   storage   strategy   would   be   for   each   community   to  develop   its   own  distributed  but   shared  database   system  based  on  data-­‐grid   technology.   The   long-­‐term  archive  could  then  be  held  at  national  facilities.  Most  of  this  archive  would  be  communal  data  available  to  other  researchers  rather  than  private  data.  Depending  on  the  community,  these  archives  would   hold   between   20   and   100   PBytes   of   data.   To   implement   a   grid-­‐based   distributed   archive  system,   high-­‐speed   network   links   between   the   European   resources   and   the   larger   of   the   national  facilities   would   be   a   fundamental   requirement.   Such   a   strong   link   with   national   facilities   would  enable   the   bulk   of   the   pre-­‐   and   post-­‐processing   to   be   carried   out   at   these   facilities.   Equally,  visualisation  and  analysis  of  model  outputs  would  be  possible  through  these  network  links.  

 


2.4 Expected Status in 2020

Today, all successful development projects for sustained Pflop/s applications have in common that major portions of application codes had to be rewritten (software refactoring) and algorithms had to be re-engineered for applications to run efficiently and productively on novel architectures.

Software must be redesigned to meet such requirements and adapted to exploit forthcoming supercomputing architectures effectively, in particular their extreme parallelism. Furthermore, means have to be put in place to cope with the expected high failure rate of components. To reach such an ambitious target, a deep synergy between HPC experts and application developers from the communities must be established, together with a strong commitment from the scientific side.

This principle is not limited to the high end of supercomputing but applies to all tiers of the HPC ecosystem. The compute node architecture, in fact, follows current technology trends towards more parallelism and more customisation. Compute nodes will integrate thousands of cores, some of which will serve as computational accelerators (e.g. GP-GPUs, Intel MIC, etc.). These technology trends pose major challenges to which software needs to respond swiftly and effectively. However, the effort involved in software refactoring can be substantial and often surpasses the capacity of individual research groups. Furthermore, such effort often requires deep knowledge of the algorithms and of the codes and, in many cases, an understanding of the physics behind the numerical algorithms. A successful practice is therefore to place these activities in community projects, where substantial code development is common and long-term software development for high-end computing can be sustained.

Supercomputer centres, with their profound expertise in computing architectures and programming models, have to recast their service activities in order to support, guide and enable scientific program developers and researchers in refactoring codes and re-engineering algorithms, influencing the development process at its root. The resulting codes will fit both Tier-1 and Tier-0 allocation schemes, and it will be the particular requirements of the users of these community codes that determine whether they apply for Tier-1 or Tier-0 resources. To be effective, the computing services should be provided for longer periods without users having to change their codes.

Due to the complexity of these novel HPC environments and the importance of the scientific challenges, interdisciplinary teams and training programmes will be essential. Training programmes will allow WCES scientists to improve their HPC background and to establish stronger links between the HPC community and their own domain. In this respect, funding specific actions to support training activities, summer/winter schools, intra-European fellowships as well as international incoming and outgoing fellowships will play a strategic role in preparing new scientists with a stronger and more interdisciplinary background. Given the expected increase in complexity of the component models and of future exascale computing platforms, substantial resources should be devoted to the technical aspects of coupled climate modelling; the coupler development teams should be reinforced with computing science experts who remain, at the same time, very close to the climate modelling scientists.


Appendix A: WCES Challenges and Expected Status in the Time Frame 2012–2020

 

Challenges are listed with their expected status in the 2012, 2016 and 2020 time frames.

HPC system peak performance: 2016: x10; 2020: x100 (relative to 2012).

Climate

Climate Extreme Events, Impacts, Quantifying Uncertainties
2012: High-resolution experiments (~20 km), few decades, atmosphere-only ensembles; small decadal ensembles coupled to a 1/4° ocean; ensembles of multi-model and multi-experiment runs at medium resolution (~100 km) with coupled climate models.
2016: Ensembles of multi-model and multi-experiment high-resolution (~20 km) centennial experiments with coupled climate models (1/4° ocean); small decadal ensembles coupled to a 1/12° ocean; atmosphere-only runs at higher resolution (~7 km).
2020: Ensembles of multi-model and multi-experiment very-high-resolution (~7 km) experiments with (eddy-resolving) coupled climate models; atmosphere-only runs at ~1 km.

Climate Prediction
2012: Seasonal multi-model (~3), multi-member (~50) global predictions made on different HPC systems at ~125 km resolution; decadal multi-model (~4), multi-member (~10) hindcasts/predictions made at ~125–250 km coupled resolution.
2016: Equivalent seasonal-to-decadal predictions routinely at ~50 km coupled resolution, with ~50 members (seasonal) and 20 (decadal); improved physics and increased vertical resolution (especially stratosphere and upper ocean); testing of ~15–25 km seasonal-to-decadal predictions in multi-model/multi-member configurations.
2020: Seasonal-to-decadal predictions at ~10 km resolution (ensemble size 25–50 members per model); increased vertical resolution and advanced physics.

Regional Climate Modelling
2012: ~25 km standard multi-GCM, multi-RCP transient downscaling of centennial projections; limited number of ~10 km multi-GCM downscalings of centennial projections over Europe.
2016: ~10 km standard for multi-GCM centennial dynamical downscaling on the European scale; limited downscaling at ~5 km resolution and some smaller-domain downscaled projections at 1–2 km resolution (cloud resolving).
2020: ~2 km downscaling at the European scale.

Climate Earth System Modelling
2012, 2016, 2020: see Appendix B, 'Key Numbers for Climate Earth System Modelling'.

Paleo Climate (e.g. Holocene simulations or glacial cycles) and Climate Surprises (footnote 32)
2012: Coupled ocean–atmosphere model at 200–500 km resolution; possible to simulate O(1,000) years per (calendar) year.
2016: Need to add additional components; desirable to increase resolution by a factor of 2, but priority on decreasing turnaround time at the given resolution.
2020: Simulate several 10,000 years per year at O(100 km) resolution with a full-blown ESM.

32 Thermohaline circulation slow-down in the North Atlantic; rainforest changes and/or boreal forest changes and carbon uptake changes; ocean stability and ocean carbon uptake changes; sudden ice-sheet loss and sea level; permafrost melt and methane release.


Oceanography

HPC system peak performance: 2012: Ocean CINES, 300 Tflop/s; 2016: x10; 2020: x100.

Glossary: blue = physical, white = sea-ice, green = biological/biogeochemical; 1/4° = 27 km to 9 km; 1/12° = 9 km to 3 km; 1/24° = 4.5 km to 1.5 km.

Ocean Climate Variability
2012: Eddy-resolving (1/12°) multi-decade simulations of the blue & white global oceans; eddy-permitting (1/4°) centennial simulations of the blue & white global oceans; multi-decade O(1 km) simulations of the coastal and regional blue oceans.
2016–2020: Eddy-resolved (1/24°) multi-decade simulations of the blue & white global oceans; eddy-resolving (1/12°) multi-decade ensembles or multi-centennial simulations of the blue & white global oceans; eddy-resolving (1/12°) pluri-annual simulations of the blue & white & green global oceans; eddy-permitting (1/4°) multi-centennial simulations; pluri-annual O(100 m) simulations of the coastal and regional blue oceans.

Ocean Monitoring and Forecasting
2012: Eddy-resolving (1/12°) analyses and forecasts of the blue & white global oceans; eddy-permitting (1/4°) reanalyses of the global blue & white oceans; sub-mesoscale eddy-permitting (1/36°) analyses and forecasts of the blue & white coastal/regional oceans.
2016: Eddy-resolved (1/24°) analyses and forecasts of the blue & white oceans; eddy-resolving (1/12°) reanalyses of the global blue & white oceans; eddy-permitting (1/4°) reanalyses of the global blue & white & green oceans; sub-mesoscale eddy-permitting (1/36°) analyses and forecasts of the blue & white & green coastal/regional oceans; sub-mesoscale eddy-resolving O(1 km) analyses and forecasts of the blue & white coastal/regional oceans.
2020: Eddy-resolved (1/24°) analyses and forecasts of the blue & white oceans; sub-mesoscale eddy-resolving O(1 km) analyses and forecasts of the blue & white & green coastal/regional oceans; eddy-resolving (1/12°) reanalyses of the global blue & white & green oceans.

Solid Earth Sciences

Earthquake Ground Motion Simulation and Seismic Hazard
2012: High-resolution earthquake dynamic rupture and radiated wave simulation models (f ~4 Hz, l ~100 m); ground motion simulation up to 4 Hz in complex geological basins with strong impedance contrasts, non-linear surface soil behaviour, kinematic finite sources and tens of probabilistic earthquake scenarios.
2016: Ground motion simulation at 4 Hz in large, complex geological basins with a dynamic earthquake source, non-linear surface soil behaviour, hundreds of earthquake scenarios and a stochastic approach; dynamic source and velocity inversion (PBytes of output).
2020: Stochastic source and wave propagation simulation with quantification of the forward uncertainties for ground motion prediction.

Global Wave Simulation in 3D Earth Models and 3D Global Tomography
2012: Global surface and body wave simulations down to periods of 3 s for large earthquakes; 3D long-period full-waveform tomography using elastic waves (mantle heterogeneities and anisotropy); point source (CMT) and extended source inversion.
2016: Global surface and body wave simulations below 1 s period for large earthquakes; exploration of the quality of Earth models in the data space by comparing predicted and observed waveforms at the stations of global dense seismic arrays (f < 1 Hz); full short-period waveform tomography using adjoint-based inversion methods for high resolution of the Earth's structure and seismic sources (mantle and core heterogeneities and anisotropy).
2020: Bayesian full-waveform tomography using global inversion methods for high resolution of the Earth's structure and seismic sources, with forward and inverse error quantification.

Mantle Convection; Earth Magnetic Field and Geodynamo Modelling
2012: 3D spherical thermo-chemical mantle convection at global scale including self-consistent plate-like behaviour; seismological signature of dynamic mantle convection; geodynamo simulation: investigation of the different regimes and scalings.
2016: 3D high-resolution spherical thermo-chemical mantle convection at global scale including self-consistent plate-like behaviour plus solid–solid phase changes and resolution of fine-scale structures (plumes, swells); coupling of core and mantle dynamical convection models; ensemble-based data assimilation for the prediction of geomagnetic field evolution and temporal changes.
2020: Earth-like dynamo models with both viscous and magnetic vanishing moments; several realisations.


Appendix B: Some Key Numbers for Ocean and Climate Earth System Modelling

Key Numbers for Climate Earth System Modelling (time frames 2012 / 2016 / 2020)

Horizontal resolution of each coupled model component (km): 125 / 50 / 10
Increase in horizontal parallelisation with respect to 2012 (assuming weak scaling in 2 directions): 1 / 6.25 / 156.25
Horizontal parallelisation of each coupled model component (no. of cores): 1.00E+03 / 6.25E+03 / 1.56E+05
Vertical resolution of each coupled model component (no. of levels): 30 / 50 / 100
Vertical parallelisation of each coupled model component: 1 / 1 / 10
No. of components in the coupled model: 2 / 2 / 5
No. of members in the ensemble simulation: 10 / 20 / 50
No. of models/groups in the ensemble experiments: 4 / 4 / 4
Total number of cores (product of horizontal parallelisation, vertical parallelisation, components, members and models): 8.00E+04 / 1.00E+06 / 1.56E+09
Increase with respect to 2012: 1 / 13 / 19,531

Data produced for one component (GBytes per month of simulation): 2.5 / 26 / 1,302
Data produced in total, i.e. data per component x components x members x models x increase in vertical resolution (GBytes per month of simulation): 200 / 4,167 / 1,302,083
Increase with respect to 2012: 1 / 21 / 6,510
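The total core counts above are simply the product of the parallelisation and ensemble factors listed in the table; the short sketch below (Python, illustrative only, with all input values taken directly from the table) reproduces that arithmetic.

```python
# Sketch reproducing the core-count arithmetic of the table above
# (all input values are taken directly from the table).
frames     = ["2012", "2016", "2020"]
horiz_par  = [1.00e3, 6.25e3, 1.5625e5]  # horizontal parallelisation per component (cores)
vert_par   = [1, 1, 10]                  # vertical parallelisation
components = [2, 2, 5]                   # components in the coupled model
members    = [10, 20, 50]                # ensemble members
models     = [4, 4, 4]                   # models/groups in the ensemble experiments

for i, year in enumerate(frames):
    cores = horiz_par[i] * vert_par[i] * components[i] * members[i] * models[i]
    print(f"{year}: total cores ~ {cores:.2e}")  # 8.00e+04, 1.00e+06, 1.56e+09
```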

Key Numbers for Ocean Modelling

The reference Tier-1 computer is JADE at CINES (23,040 cores, 267 Tflop/s). The unit simulation for research applications is a 50-year (multi-decade) run.

Requirements per challenge (computational power; storage capacity):
– Effective power for the eddy-resolving model ORCA12 (1/12°, see footnote 33): 25 Tflop/s for 2 months (i.e. 50 Tflop/s.month); 105 TBytes
– Projection for a reanalysis with the eddy-resolving model ORCA12 (1/12°): 250 Tflop/s for 2 months (i.e. 500 Tflop/s.month); 105 TBytes
– Projection for the eddy-resolving model ORCA12 (1/12°) with biogeochemistry: 125 Tflop/s for 2 months (i.e. 250 Tflop/s.month); 840 TBytes
– Projection for the eddy-resolved model ORCA24 (1/24°): 100 Tflop/s for 5 months (i.e. 500 Tflop/s.month); 525 TBytes
– Projection for a 50-member ensemble run of the eddy-permitting model ORCA025 (1/4°): 500 Tflop/s for 1 month (i.e. 500 Tflop/s.month); 60 TBytes

33 ORCA12 is presently the 'biggest' ocean circulation model in Europe for research and operational use.
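As a rough aggregate (an illustrative calculation, not a figure stated in the report), the sketch below sums the compute and storage entries listed above for the five ocean modelling challenges.

```python
# Illustrative tally of the ocean modelling requirements listed above.
# Each tuple: (challenge, compute in Tflop/s.month, storage in TBytes).
challenges = [
    ("ORCA12 1/12 deg, effective power",       50,  105),
    ("ORCA12 1/12 deg, reanalysis",           500,  105),
    ("ORCA12 1/12 deg, with biogeochemistry", 250,  840),
    ("ORCA24 1/24 deg",                        500,  525),
    ("ORCA025 1/4 deg, 50-member ensemble",    500,   60),
]
total_compute = sum(c for _, c, _ in challenges)
total_storage = sum(s for _, _, s in challenges)
print(f"total compute: {total_compute} Tflop/s.month; total storage: {total_storage} TBytes")
```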


3 ASTROPHYSICS, HIGH-ENERGY PHYSICS AND PLASMA PHYSICS

3.1 Summary  

In recent years, astrophysics, high-energy physics and plasma physics have shared a dramatic change in the role of theory for scientific discovery. In all of these fields, new experiments have become ever more costly, require increasingly long time scales and aim at the investigation of ever more subtle effects. Consequently, theory is faced with two types of demand. The first is that the precision of theoretical predictions has to be increased to the point where it is better than the experimental one; since experimental precision can be expected to improve by further orders of magnitude by 2020, this is a most demanding requirement. In all of these research fields, well-established theoretical methods have existed for decades, so achieving dramatic progress requires a dramatic increase in theoretical resources, including computer resources for numerical studies.

The second demand, which has also become apparent, is the need to explore model spaces of much larger extent than previously investigated. For example, determining the nature of dark energy and dark matter requires a detailed comparison of predictions from large classes of cosmological models with data from the new satellites and ground-based detectors that will be deployed by 2020. These predictions can only be generated by massive numerical simulations. In high-energy physics, one of the tasks is to explore many possible extensions of the Standard Model to such a degree that even minute deviations between experimental data and Standard Model predictions can serve as smoking guns for a specific realisation of New Physics. In plasma physics, one of the tasks is to understand the physics observed at ITER at such a high level that substantially more efficient fusion reactors can be reliably designed on the basis of theoretical simulations exploring a large range of options.

While the three fields covered in this section are distinctly different, they also have substantial overlap. For example, the Big Bang is as much a topic of astrophysics as of high-energy physics, while nucleosynthesis depends on nuclear physics as well as on the modelling of supernova explosions. Plasma physics is crucial for many aspects of astrophysics as well as, for example, for a better understanding of high-energy heavy-ion collisions at CERN.

As the experimental roadmap to 2020 is already fixed in all three research fields, it is possible to quantify with some reliability what these demands imply for HPC in Europe. If theory is to keep up with experimental progress, which is crucial to maximise the scientific output of the latter, these three fields together will require at least one integrated sustained Eflop/s-year, which in turn implies dedicated compute power of at least 1 Eflop/s peak for roughly one decade.

3.2 Computational  Grand  Challenges  and  Expected  Outcomes  

3.2.1 Astrophysics

In astrophysics, perhaps even more than in other scientific disciplines, there is an intimate interdependency between theoretical research, which is overwhelmingly reliant on simulations and modelling, and the exploitation of data from large observational facilities. Controlled experiments are not possible in astrophysics. This important methodological gap is filled by simulations and modelling which, due to the intrinsic complexity of astrophysical phenomena, require the most advanced computational infrastructure. It follows therefore that state-of-the-art HPC is an essential precondition for the advancement of this scientific discipline and for the proper exploitation of the very large funding that society has decided to invest in astronomical facilities.


Astrophysics  and  cosmology  have  a  unique  public  appeal  and  capture  the  public  imagination  in  a  way  that   few   other   sciences   do.   The   public   are   genuinely   interested   in   many   of   the   questions   that  professional   astrophysicists   address,   such   as   the   nature   of   the   dark   matter   or   the   prevalence   of  Earth-­‐like   planets   outside   our   solar   system.   A   vibrant   research   presence   in   astrophysics   is   a  cornerstone  of  the  scientific  literacy  of  the  population  as  a  whole.  Such  elevated  status,  however,  has  practical   consequences   of   immeasurable   impact   on   society:   a   recent   survey   conducted   by   the  Institute  of  Physics  in  the  UK  showed  that  a  large  fraction  of  students  who  enrol  in  physics  degrees  at  university  are  attracted  to  science  by  the  excitement  of  fundamental  physics  and  astrophysics.    

Europe has traditionally been a world leader in HPC-based theoretical astrophysics. European scientists lead the world in the development and release of algorithms and codes, and this has led to many of the most important breakthroughs in, for example, cosmology and stellar evolution. Europe is already committed to leading or participating in many of the major ground- and space-based astronomical facilities for the next decade and beyond. By contrast, there is as yet only an incipient coordinated effort, through PRACE, to ensure that the computational resources that are the essential counterpart to these facilities are developed. An increase by a factor of 10 in the size of the largest supercomputers available for astrophysical research by 2015, and by a factor of 100 by 2020, is the minimum requirement.

Challenges    

The  following  are  12  fundamental  questions   in  astrophysics   in  which  significant  progress   is   likely   in  the   next   decade   if   an   appropriate   mix   of   computing   infrastructure,   software   development   and  observational  facilities  can  be  achieved.    

1. What is the identity of the cosmic dark matter and dark energy?
2. How did the universe emerge from the dark ages immediately following the Big Bang?
3. How did galaxies form?
4. How do galaxies and quasars evolve chemically and dynamically, and what is the cause of their diverse phenomenology?
5. How does the chemical enrichment of the universe take place?
6. How do stars form?
7. How do stars die?
8. How do planets form?
9. Where is life outside the Earth?
10. How are magnetic fields in the universe generated, and what role do they play in particle acceleration and other plasma processes?
11. How can we unravel the secrets of the sources of strongest gravity?
12. What will as yet unexplored windows into the universe, such as neutrinos and gravitational waves, reveal?

Answering   these   questions   requires   accurate   numerical   treatment   of   a   range   of   coupled   complex  non-­‐linear   physical   processes   including   gravitation,   hydrodynamics,   non-­‐equilibrium   gas   chemistry,  magnetic   fields,   radiative   transfer   and   relativistic   effects.   The   set   of   partial   differential   equations  describing   this   blend   of   physics   is   well   known   but   largely   inaccessible   by   analytic   techniques.   The  astrophysics  community  has  addressed   these  needs  by  developing  a   large  number  of   sophisticated  algorithms  and  codes.  There  are  ongoing  efforts  to  enable  these  codes  to  scale  to  100,000  cores  and  beyond,   to   include   ‘on  the   fly’  analysis  and  to  use  accelerators  such  as  GPUs;  but  more  manpower  support   is   essential.   There   is   an   urgent   need   to   address   the   data   challenge,   including   storage   and  integration  of  observational  data  and  simulation/modelling  data.  The  hardware  requirements  of  the  community  are  a  mixture  of  very  large  calculations  that  use  a  large  fraction  of  the  machine  alongside  a   large   number   of   smaller   calculations   for   exploring   different   models,   parameters   and   physical  processes.  These  questions  fall  into  three  broad  categories:  cosmology  and  the  large-­‐scale  structure  of  the  universe,  planets  and  stars,  and  strong  gravity  and  physical  processes.  


Cosmology  and  the  large-­‐scale  structure  of  the  universe:  Questions  1–5  

The  next  decade  will  see  the  advent  of  large-­‐scale  cosmological  experiments,  both  from  the  ground  (LSST)  and  from  space  (Euclid).  Radio  telescopes  like  SKA  will  revolutionise  our  understanding  of  the  high   redshift   Universe,   providing   21   cm   tomography   of   the   epoch   of   cosmic   reionisation.   The  successor  of  the  Hubble  space  telescope,  JWST,  as  well  as  extremely  large  optical  telescopes  of  the  30   m   class   here   on   Earth   will   peer   back   in   time   to   observe   the   infancy   of   galaxy   formation   in  unprecedented   detail,   yielding   insights   into   the   formation   of   the   first   generation   of   objects.   Large  galaxy   surveys   that  are   currently  underway  or  are   to   commence   this  decade   (such  as  Pan-­‐STARRS,  Big-­‐BOSS)  will  drastically  improve  the  statistical  constraints  on  dark  matter,  the  nature  of  dark  energy  and  galaxy  formation.    

It is imperative to advance our simulations dramatically to allow proper interpretation of upcoming observational data and to provide tight constraints on cosmological models. For example, one key step is to compute reliably the clustering of dark matter for a given set of cosmological parameters. Since gravitational dynamics is a complex non-linear problem, we need to use large N-body simulations covering the same volume and the same galaxy mass range detected by these surveys. Indeed, to interpret data from Euclid requires simulations of the cosmological horizon, i.e. cubes of size 12,000 Mpc/h, with at least 100 particles per halo of size L*/10, where L* is the luminosity of the Milky Way. This translates into a prodigious number of particles, namely N = 32,768^3, or about 35 trillion bodies. State-of-the-art N-body solvers (GADGET, PKDGRAV or RAMSES) typically require 200 bytes per particle. At 4 GB of memory per core, and taking into account memory overheads (a factor of 2), these calculations require a machine with of the order of 10^6 cores or more. Exascale resources will allow simulations of the large-scale universe with sufficient particles to resolve all dark matter haloes that could host stars. They will allow multiple realisations of the Hubble volume to test and constrain models for the dark sector. Similar requirements apply to the study of baryonic processes on large scales, such as the mechanism of reionisation, or chemical enrichment and feedback from stars and AGN.
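The memory arithmetic behind this estimate is simple; the sketch below (Python, purely illustrative, using only the figures quoted in the paragraph above) makes it explicit.

```python
# Memory-driven core-count estimate for a Euclid-scale N-body simulation,
# using the figures quoted in the text above.
n_particles  = 32768 ** 3        # ~3.5e13 particles ("35 trillion bodies")
bytes_per_p  = 200               # typical for GADGET / PKDGRAV / RAMSES (as quoted)
overhead     = 2                 # memory overhead factor (as quoted)
mem_per_core = 4 * 2 ** 30       # 4 GB of memory per core

total_bytes = n_particles * bytes_per_p * overhead
print(f"particles           : {n_particles:.2e}")
print(f"total memory needed : {total_bytes / 2 ** 50:.1f} PiB")
print(f"cores (memory-bound): {total_bytes / mem_per_core:.1e}")  # order 10^6 or more
```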

Our understanding of galaxy formation and the origin of the Hubble sequence (from dwarf galaxies to grand-design spirals and ellipticals) is still extremely sketchy at best, even though a basic formation paradigm exists: 'hierarchical galaxy formation'. The fundamental problem is that galaxy formation involves the aforementioned blend of different physical processes that are non-linearly coupled over a wide range of scales, leading to extremely complex dynamics. For this reason, HPC simulation techniques have become the primary avenue for theoretical research in galaxy formation. This is also helped by the fact that the current standard model of cosmology precisely specifies the initial conditions of cosmic structure formation at a time briefly after the Big Bang. It is a computational problem par excellence to try to evolve this initial state forward in time, staying as faithful to the physics as possible. Current state-of-the-art hydrodynamic simulations of galaxy formation reach a resolution length of about 100 parsecs in the hydrodynamic and gravitational components, while individual stars are modelled with super-particles that each represent about 1,000 stars. Exascale resources would enable an increase in mass resolution by a factor of 1,000 over the existing state of the art. This would allow simulations of Milky Way models in which the sites of star formation are accurately followed and each star is represented by a single particle.

Planets  and  Stars:  Questions  6–9  

Understanding planet formation is one of the key questions of modern astronomy. It will help us to learn about the history of our own planet, which set the conditions for human existence, and at the same time give us an idea of how rare or frequent the conditions for life in the universe are: is life a cosmic phenomenon? Today, the Kepler satellite is revolutionising our understanding of planetary systems around other stars, while missions in the next decade will characterise their atmospheres and search for signatures of life. Simulations are essential for understanding how these systems arise and for quantifying their habitability and stability. Understanding planetary architecture requires detailed 3D simulations of the unobservable processes in disks that involve multiple physical


processes acting over a large range of scales. This sets the nature of turbulence in disks and the resulting planetary building bricks (planetesimals) that are produced. The initial size distribution of planetesimals is vital to understanding the further growth to planetary cores. At the same time, the growth and migration of gas giants and the reshuffling of terrestrial planets in the habitable zones around the host star also depend crucially on the properties of turbulence in the gas disk. This programme already uses significant resources of the available Pflop/s machines within PRACE. To take the next step in improving our simulations, we have to gain an order of magnitude in resolution, which will catapult our applications from running on about 10^4 cores for 10^7 CPU hours to new architectures providing 10^11 CPU hours per year; this is a clear scientific application case and a strong argument for the envisioned exascale supercomputers.
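The jump from 10^7 to 10^11 CPU hours follows from a standard scaling argument; the sketch below (Python, illustrative only, and assuming a CFL-type reduction of the time step with resolution, which the text does not spell out) reproduces the order of magnitude.

```python
# Order-of-magnitude cost scaling for one decade gain in linear resolution
# of a 3D disk simulation (assumption: CFL-type time-step reduction).
current_cpu_hours = 1e7        # ~10^7 CPU hours on ~10^4 cores, as quoted above
resolution_gain   = 10         # one order of magnitude in linear resolution
cost_factor = resolution_gain ** 3 * resolution_gain   # more cells * smaller time step
print(f"projected need: ~{current_cpu_hours * cost_factor:.0e} CPU hours per year")  # ~1e11
```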

   

Figure 3.1. One typical result of the Millennium Run, which investigated the evolution of the matter distribution in a cubic region of the Universe about 2 billion light years across by tracing 10 billion particles. Courtesy of Volker Springel (Heidelberg Institute for Theoretical Studies) and Thomas Janka (MPA Garching)

The  birth  of  stars  and  their  planetary  systems  is  intimately  coupled  to  the  dynamical  state  of  the  gas  they   are   forming   from   –   cold   and   dense   clouds   of   molecular   hydrogen   and   dust   embedded   in   a  turbulent,  multi-­‐phase  environment.  Understanding  the   life  cycle  of  molecular  clouds  and  the   local  onset  and   termination  of   stellar  birth   in  galaxies  at  different   redshifts   is   a   key  problem  of  modern  astronomy  and  lies  at  the  very  forefront  of  computational  astrophysics.  Progress  requires  combining  very   high-­‐resolution   multi-­‐species   magneto-­‐hydrodynamic   simulations   of   dust   and   gas   with   time-­‐dependent   non-­‐equilibrium   chemistry   in   order   to   describe   correctly   the   different   phases   of   the  turbulent,   self-­‐gravitating   ISM   and   their   heating   and   cooling   behaviour   (including   high-­‐precision  multi-­‐frequency  radiative  transfer).  In  addition,  we  need  to  account  for  the  internal  stellar  evolution  of   the   (proto)   stars   that   form   within   the   clouds   to   be   able   to   model   correctly   stellar   feedback  processes  such  as  winds  and  outflows  or  radiation.  Such  feedback  may  be  able  to  destroy  the  clouds  


from the inside by heating and stirring them, as well as by driving outflows and large-scale winds, thus transporting gas (and metals) from the disk into the galactic halo. This global matter cycle plays a major role in controlling the long-term evolution of star-forming galaxies in the universe. With current computational power, answering these questions is simply not possible. The proposed multiscale and multi-physics approach to model the complete life cycle of molecular clouds in the disk of the Milky Way requires a concerted collaborative effort of several research groups with complementary expertise and experience. It will also require an increase in computational power of at least a factor of 100, bringing us into the exascale regime.

Figure 3.2. 3D rendering of a core-collapse supernova explosion 0.15 seconds after the bounce. Courtesy of Volker Springel (Heidelberg Institute for Theoretical Studies) and Thomas Janka (MPA Garching)

Strong  gravity  and  physical  processes:  Questions  10–12  

One of the primary goals of relativistic astrophysics to 2020 and beyond will be the first direct measurement of the gravitational waves predicted by Einstein's theory, using huge laser interferometer facilities like VIRGO and GEO600 in Europe or LIGO in the US. The strongest sources expected to radiate in this exciting upcoming observational window on the universe are orbiting and merging binary black holes, colliding compact stars, and collapsing and exploding massive stars. Elaborate numerical models of these astrophysical phenomena are needed for accurate signal predictions that will allow us to extract the gravitational waves from a noisy background and to realise their promise to unravel some of the mysteries of neutron stars and black holes, the most exotic objects in the universe. Exploring phenomena in the strong-gravity environment of such extreme objects requires a treatment of general relativistic effects. The numerical complications of the corresponding highly non-linear hyperbolic metric equations translate into extremely high computational demands.

Magnetic   fields   are   omnipresent,   from   the  largest   dimensions   of   galaxy   clusters   and  intergalactic  space  to  the  intermediate  scale  of   interstellar  gas  and  dust  clouds,  down  to  small  bodies   like  planets  and  moons.  While  the   origin   of   initial   seed   fields   shortly   after  the  Big  Bang   is   still   speculative,   the  growth  of   these   seed   fields   is   understood   as   a  consequence  of  plasma  flows  on  large  scales  and   through   a   cascade   of   highly   turbulent  magneto-­‐hydrodynamic   (MHD)   dynamo  effects.   Computer   simulations   of   these  processes   are   possible,   and   they   would  encompass   all   scales   between   the   global  size   of   objects   at   the   one   end   and   small  vortex  flows  and  granulation  on  the  scale  of  wave  structures  at  the  other.  However,  this  requires   resolutions   reaching   orders   of  magnitude   beyond   the   capabilities   of  present   HPC   systems.   Understanding   how  stars  end  their   lives  as  supernovae  (SNe)  or  what   happens   when   compact   stars   collide  requires   following   the   extreme   conditions   and   physical   processes   that   are   acting   on   very   short  timescales.   It   is   essential   to   include   neutrino   physics   and   radiative   transport   processes   in   these  calculations.  This  ultimately  will  require  highly  parallelised  Monte  Carlo  methods,  whose  application  can   most   easily   be   adapted   to   arbitrary   source   geometries   in   three   dimensions.   Neutrino-­‐   and  photon-­‐radiation  hydrodynamics  simulations  in  3D  are  among  the  computationally  most  challenging  and   demanding   tasks   to   be   performed   on   forthcoming   generations   of   supercomputers   with  requirements   easily   reaching   into   the   Eflop/s   regime.   Results   of   most   sophisticated   models   are  


indispensable for understanding the messages carried by the radiation from such highest-energy phenomena and for harvesting the cosmic neutrino-burst measurements that are possible with big underground facilities like IceCube at the South Pole, SuperK in Japan and new instruments planned in Europe, the US and Japan.

For example, a successful simulation of the first second of an SN explosion (corresponding to 10^6 time steps) is necessary to predict neutrino and gravitational-wave signals and to continue through the nucleosynthesis phase into the late, astronomically observable stages of the explosion. For a modest spatial resolution of 10 million cells and 12 neutrino-energy bins, this requires about 50 million core hours on 16,000 of the strongest cores of currently available PRACE Tier-0 systems. Improving the spatial and energy resolutions by only a factor of two will increase the computer time demand to about 1 PFlop/s-year per simulation, and there are many stellar parameter sets for which such simulations are needed. A fully relativistic treatment, convergent turbulent MHD flows and highly accurate neutrino spectra will require even larger, exascale resources for SN modelling.
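A rough sketch of how such an estimate scales is given below (Python, illustrative only; the factor-of-two scaling of the time step and the ~5 GFlop/s sustained per-core rate are assumptions introduced here, not figures from the report).

```python
# Rough scaling of the supernova simulation cost quoted above: doubling the
# spatial and energy resolutions multiplies the cost by roughly
# 2^3 (cells) * 2 (time steps, CFL-type assumption) * 2 (energy bins) = 32.
baseline_core_hours = 50e6              # ~50 million core hours (quoted above)
scaled_core_hours   = baseline_core_hours * 2 ** 3 * 2 * 2
sustained_per_core  = 5e9               # assumed sustained flop rate per core (illustrative)
total_flops  = scaled_core_hours * 3600 * sustained_per_core
pflops_years = total_flops / (1e15 * 3.15e7)
print(f"scaled run: {scaled_core_hours:.1e} core hours, ~{pflops_years:.1f} PFlop/s-year")
```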

3.2.2 High-­‐Energy  Physics  

Quantum Field Theory (QFT) is the fundamental theory of our world, describing all particles and interactions with extremely high precision. However, it is mathematically consistent only if additional interactions and particles exist which contribute significantly only at high energies and cure potential divergences. If this were not the case, the correct description of thousands of quantities by the Standard Model would be purely accidental, which is statistically close to impossible. It is therefore generally believed that physics Beyond the Standard Model (BSM) must exist.

The search for this physics is the driving force behind high-energy particle physics. Many suggestions exist as to the nature of this physics. Some of them fall within the scope of QFT; some are of a fundamentally different nature, like string theory. Supersymmetry, a fundamental symmetry between bosonic and fermionic degrees of freedom, is usually assumed at some level. Many high-precision experiments at high and low energies search for this new physics, and in particular the Large Hadron Collider (LHC) at CERN was built to find it (besides the Higgs particle itself). So far, no clear signal has been observed, and it has become unlikely that BSM physics will reveal itself soon through large measurable effects. This expectation has been strengthened by the fact that the recently established Higgs particle (footnote 34) is so far compatible with Standard Model expectations and thus does not yet give any hints on the physics beyond the Standard Model. Future discoveries will therefore require high precision, both experimental and theoretical. As the systematic theoretical uncertainties are primarily caused by effects of the quark–gluon interaction, which are described by Quantum Chromodynamics (QCD), a large fraction of present-day theoretical work focuses on that theory. In fact, QCD is a very complicated theory, which combines extremely strong non-linearities with all the complexity of relativistic quantum field theories, resulting in an extremely rich phenomenology, much of which is still not fully understood. QCD is therefore not only crucial to improving the experimental sensitivity for BSM physics; it is also a fascinating field in its own right. It allows, for example, study of the fundamental connection between quantum physics and thermodynamics in well-controlled settings, and of how effective degrees of freedom (the hadrons, i.e. quark–gluon bound states) emerge from fundamental ones (quarks and gluons). Over the years, highly sophisticated techniques have been developed which link all of QCD dynamics to a large number of precisely defined non-perturbative quantities. These can be calculated numerically by lattice QCD (LQCD), and in many cases LQCD is actually the only known method to determine them. Such calculations, which are indispensable to interpret high-energy experiments reliably, constitute the largest fraction of the HPC demand of particle physics.

34 http://www.bbc.co.uk/news/world-18702455

The  basic   idea  of   LQFT   is   the   following:  most  of   the  non-­‐perturbative  quantities  of   interest   can  be  expressed  within  functional  integral  quantisation  but  not  in  canonical  quantisation.  This  implies  that  



functional integrals, which are infinite-dimensional, have to be calculated numerically. This is only possible if a sufficiently large space–time volume is approximated by an ensemble of discrete points, turning the functional integral into an ordinary integral of very large dimension. To control this approximation, the limit of vanishing lattice spacing has to be under complete control, which is usually the hardest challenge.
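To make this idea concrete, the toy example below (Python; purely illustrative and not taken from the report) applies the same recipe to the simplest possible case, a quantum-mechanical particle in one dimension: discretising Euclidean time turns the path integral into an ordinary N-dimensional integral, which is then estimated by importance-sampling Monte Carlo, the same basic strategy that full LQCD applies, with far more sophisticated algorithms, to gauge and quark fields on a four-dimensional lattice and at vastly larger scale.

```python
# Toy "lattice field theory": Euclidean path-integral Monte Carlo for a 1D
# harmonic oscillator. Discretising time turns the functional integral into an
# ordinary N-dimensional integral, sampled here with the Metropolis algorithm.
import math, random

N, a = 64, 0.5            # number of time slices (lattice sites) and lattice spacing
m, omega = 1.0, 1.0       # mass and oscillator frequency
x = [0.0] * N             # the "field": one real value per lattice site

def delta_action(i, new):
    """Change in the discretised Euclidean action if site i is set to `new`."""
    ip, im = (i + 1) % N, (i - 1) % N                  # periodic boundary conditions
    def local_S(xi):
        kinetic = m * ((x[ip] - xi) ** 2 + (xi - x[im]) ** 2) / (2.0 * a)
        potential = a * 0.5 * m * omega ** 2 * xi ** 2
        return kinetic + potential
    return local_S(new) - local_S(x[i])

def sweep(step=1.0):
    """One Metropolis sweep: propose a local update at every site."""
    for i in range(N):
        proposal = x[i] + random.uniform(-step, step)
        dS = delta_action(i, proposal)
        if dS <= 0 or random.random() < math.exp(-dS):  # accept with prob. min(1, e^-dS)
            x[i] = proposal

random.seed(1)
for _ in range(500):                                    # thermalisation sweeps
    sweep()
measurements = []
for _ in range(2000):                                   # measurement sweeps
    sweep()
    measurements.append(sum(xi * xi for xi in x) / N)   # estimator for <x^2>
print("<x^2> =", sum(measurements) / len(measurements))
# Expect a value near the continuum 1/(2*m*omega) = 0.5, up to discretisation
# and statistical effects; the continuum limit is approached by repeating such
# runs at smaller lattice spacing a, which is where the real cost lies.
```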

Another   crucial   task   for   theory   is   to   explore   much   larger   theoretical   model   spaces.   The   absence   of  signals  for  BSM  physics  at  LHC  calls  for  the  development  of  alternative  schemes  for  unifying  theories.  For  some   of   the   most   attractive   candidates,   conformal   symmetry   plays   a   fundamental   role.   While  supersymmetry  remains  among  the  most  interesting  theoretical  candidates,  its  lattice  implementation  is  still  plagued  by  conceptual  problems.  As  it  is  unclear  whether  this  situation  will  change  in  the  next  few  years,   it   is   not   possible   to   predict   what   computer   resources   might   be   needed   in   future   by   such  investigations.  However,  for  alternative  scenarios,  inspired  by  the  assumption  of  conformal  symmetry  at  high-­‐energy  scales,  there  are  questions  that  can  very  effectively  be  addressed  already  by  existing  means,  like   the   search   for  mechanisms   of   electroweak   symmetry   breaking   based   on   strong   dynamics.   These  Technicolor-­‐like   models   require   a   slow   running   of   the   QCD   coupling   which   can   be   realised   in   the  proximity  of  the  ‘conformal  window’  of  gauge  theories.  This   is   just  one  example  for  the  need  to  study  QFTs  other  than  QCD,  which  can  be  done  using  the  same  numerical  lattice  techniques.  

Challenges    

With respect to the Grand Challenge problems listed in the last report, very substantial progress was made on QCD thermodynamics and on making lattice QCD simulations at physical quark masses possible (see Figure 3.3), while it turned out that the demands of hadron structure calculations could not be met with the resources available so far. The reason is technical: the extrapolation of simulation results to the real physical situation turned out not to be well under control, so that one needs simulations with much higher statistics in a larger parameter space (for example with respect to quark mass and lattice spacing). Only these can provide the urgently needed information. As stated above, QCD is an extremely rich theory, and different lattice collaborations have made significant progress in exploring a large range of aspects of hadron physics. They would all benefit greatly from moving simulations even closer to the real physical world.

 Figure  3.3.  The  hadron  mass  spectrum  from  simulations  at  physical  quark  masses  by  the  BMW-­‐Collaboration,  compared  to  the  experimental  values.  Courtesy  of  Budapest-­‐Marseille-­‐Wuppertal  Collaboration  

To  summarise  all  of  this  under  the  title  of  hadron  physics  does  not  do  justice  to  the  richness  of  QCD,  but  it  is  practical  in  that  most  simulations  serve  the  analysis  of  a  number  of  physics  questions  in  parallel  such  that  it  is  hardly  possible  to  specify  the  computational  need  for  each  of  them  separately.  The  keywords  cited   above   refer   to   such   diverse   tasks   as   pinning   down   the   fundamental   constants   of   the   standard  model,   in   particular   the   quark  masses;   providing   the   theory   input  which   allows   interpretation   of   the  


decay   rates   of   hadrons   containing   bottom  quarks  which   is   a   sector   of   the   Standard  Model   especially  sensitive  to  BSM  physics;  determining  the  transverse  quark  gluon  structure  of  protons  which  is  needed  to  interpret  certain  aspects  of  proton–proton  collisions  at  CERN;  determining  the  strange  quark  content  of  the  nucleon,  which  is  surprisingly  poorly  known;  and  many,  many  more.  

High-­‐energy   physics   is   intimately   related   to   astrophysics   and   plasma   physics.   For   example,   the  cosmological  Big  Bang  can  be  seen  as  a  high-­‐energy  physics  experiment  far  exceeding  the  power  of  any  man-­‐made   accelerator.   Therefore   it   should   be   possible   to   extract   elements   of   BSM   physics   from   the  analysis  of  its  precise  properties,  which  is  one  of  the  joint  big  tasks  of  all  three  fields.  Particle  theory  has  to  contribute  a  precise  understanding  of  QFT  thermodynamics  and  of  likely  BSM  candidates.  The  former  is  primarily  studied  for  QCD  thermodynamics  that  gets  direct  experimental  input  from  high-­‐energy  heavy-­‐ion  experiments,   in  particular  at   LHC.   In   this   field,  very   substantial  progress  was  made   in   recent  years.  Many  bulk  properties,  which  were  still  hotly  debated  a   few  years  ago,  are  now  settled  and  theory  has  progressed  to  address,  for  example,  subtle  charge  correlations  observed  in  experiment.  However,  much  more  information  is  still  needed.  One  fundamental  piece  of  information  is  the  equation  of  state  of  QCD  matter,  but  there  is  much  more.  LQCD  does  not  allow  studying  QCD  dynamics  directly  which  is,  however,  needed  for  Early  Universe  physics.  Instead,  one  can  investigate  equilibrium  properties  for  many  different  situations   (different   baryon   densities,   background   magnetic   fields,   varying   masses   and   interaction  parameters,   different   gauge   groups,   different   numbers   of   fermions,   etc.)   and   thus   constrain   analytic  calculations   of   QFT   dynamics.   Much   of   this   field   is   still   unexplored.   Let   us   mention   as   an   example  simulations  of  the  inflationary  phase  of  the  Early  Universe  and  the  process  of  reheating,  leading  to  the  hot  Big  Bang  initial  conditions,  which  are  needed  for  astrophysics.  Finally,  it  could  prove  necessary  to  repeat  these  studies  with  a  more  costly  fermion  action.  At  present,  formulations  are  mostly  used  which  for  finite  lattice  spacing  violate  chirality,  a  fundamental  symmetry  relevant  for  QCD  thermodynamics  which  might  imply   that   the   continuum   limit,   i.e.   the  extrapolation   to   vanishing   lattice   spacing,   is   less  under   control  than  usually  assumed.   If   such  simulations   turned  out   to  be  necessary,   they  would   require  hundreds  of  PFlop/s-­‐years.    

The heading 'high-energy physics' is somewhat misleading, as QCD at medium-large and low energies is equally fascinating. Research in these fields does not aim primarily at the discovery of BSM physics (although this is also part of the agenda, e.g. in neutrinoless double beta decay) but rather at a better understanding of ordinary matter, primarily atomic nuclei and their interactions. Again, the ties to astrophysics are very close. A typical application is the core collapse leading to a supernova explosion and the resulting nucleosynthesis of heavy nuclei.

This field has been revolutionised in recent years as it has become possible to extract nuclear forces from ab-initio LQCD calculations. These forces are in the process of superseding the schematic phenomenological forces used so far. One way to reach this goal uses effective field theory (EFT) as an intermediary. (Only very small nuclei can be simulated in toto on the lattice.) With the expected development of computer resources, EFT will allow medium-heavy nuclei to be treated directly in the next few years.

These developments are perfectly matched by important theoretical progress in ab-initio calculations of nuclear structure and reactions based on the nucleon–nucleon forces obtained from QCD and EFT. We are approaching the point where nuclear physics calculations for heavy nuclei can be done with the same rigour as QFT ones. Particularly demanding are the simulations of explosive events such as supernovae, novae, and the sources of X-ray or gamma-ray bursts. The associated nucleosynthesis typically involves several thousand isotopes and several hundred thousand reaction channels. Needless to say, all of this is only possible with numerical techniques.

Other computational tasks in high-energy physics are closely related to hydrodynamics and plasma physics. For example, to extract information on the QCD phase diagram one also has to describe the hydrodynamic phase of heavy-ion collisions correctly. This requires the simulation of finite systems with relativistic viscous hydrodynamics to very high accuracy, which is just another formidable numerical task.


Table 3.1 cannot give more than a rough impression of the development to be expected in the period to 2020. The largest demand is expected in the field of hadron physics, for which it can also be quantified rather precisely based on existing experience. This includes the physics of light hadrons and their weak matrix elements, as well as the physics of heavy flavours – charm and beauty – the main goal of LHCb and an essential probe of BSM effects. Phenomenology needs a reduction of present-day statistical errors for matrix elements from LQCD by at least an order of magnitude. To obtain corresponding control of the extrapolation to the real physical world, as well as of all other systematic uncertainties, one needs simulations for a large number of lattice parameters. This would require, for example, about 100 PFlop/s-years for the most relevant hadron structure observables of light baryons. One example of such systematic uncertainties is possible artefacts from admixtures of hadrons with the same quantum numbers but larger mass. One would also need independent simulations by collaborations using different lattice formulations, again to control other systematic uncertainties. Assuming the existence of at least two large collaborations plus several smaller and less ambitious ones leads to the 300 sustained PFlop/s-years given in the table below.

Table 3.1. Summary of some key high-energy physics developments to be expected in the period to 2020.

1. Physics objective: LQCD at zero temperature – hadron matrix elements at the physical point (moments of Generalised Parton Distributions (GPDs) and Distribution Amplitudes (DAs), g_A, the hadronic contribution to the muon anomalous magnetic moment, singlet contributions, bag parameters, input for CKM physics, transition form factors, TMDs, etc.), carried out by several independent collaborations with different fermion actions.
   Required sustained performance: > 300 Pflop/s-years.
   Relevant experiments: e.g. ALICE, ATLAS, BES, CMS, JLab, J-PARC, LHCb, PANDA, PHENIX, STAR.

2. Physics objective: LQCD in a box – decay characteristics of hadron resonances.
   Required sustained performance: > 20 Pflop/s-years.
   Relevant experiments: BES, PANDA, LHCb.

3. Physics objective: LQCD at finite temperature – the equation of state at physical masses and the localisation of the critical point (each by at least two independent collaborations/fermion actions).
   Required sustained performance: 50 Pflop/s-years for the equation of state and 50 Pflop/s-years for the critical point.
   Relevant experiments: ALICE, ATLAS, CBM, CMS, PHENIX, STAR.

4. Physics objective: non-QCD applications of LQFT – investigating 'walking technicolor' models, supersymmetric models and inflation scenarios.
   Required sustained performance: 5 Pflop/s-years for technicolor; unclear for supersymmetric models (conceptual problems) and for inflation scenarios (lack of benchmarks).
   Relevant experiments: LHC, ILC.

5. Physics objective: EFT calculations for medium-heavy nuclei.
   Required sustained performance: 10 Pflop/s-years.
   Relevant experiments: FAIR.

6. Physics objective: nuclear physics input for nucleosynthesis in supernovae.
   Required sustained performance: 2 Pflop/s-years.
   Relevant experiments: FAIR.


Figure  3.4.  Electrostatic  potential  fluctuations  from  a  gyrokinetic  simulation  of  a  tokamak  plasma  (TCV  Machine),  obtained  with  the  GENE  code  (PRACE  early  access  call).  IPP  Garching  

3.2.3 Plasma Physics

Plasmas are pervasive in nature, comprising more than 99% of the visible universe, and permeate the solar system and the interstellar and intergalactic environments, occurring over huge ranges of spatial, energy and density scales. Plasma-based technology has been at the forefront of scientific research for more than half a century, with applications in fundamental science and a wide range of topics from nuclear fusion to medicine. These are extremely challenging scientific problems, requiring state-of-the-art numerical tools and computational resources.

Magnetic fusion research seeks to reach thermonuclear conditions by containing plasmas with strong magnetic fields in suitably designed devices. Attaining burning plasma conditions, at the density achievable in present-day devices, requires heating the plasma to very high temperatures (of the order of 100 × 10^6 °C) and correspondingly high gradients (~50 × 10^6 °C/m). In these conditions, energy and particles are lost from the device through turbulent convection. Understanding the transport of energy and matter is one of the key questions in fusion plasma physics, and it is of great practical interest, since the efficiency of a power station would depend on the ratio of the fusion power output to the input power required to operate the device. The next-step international fusion experiment, ITER, is a 10 GEuro tokamak being constructed in France, with the objective of achieving a ratio of fusion power to heating power that exceeds 10.

Concurrently, some of the most demanding scientific and computational grand challenges in plasma physics are closely tied to recent developments in ultra-intense laser technology and to the possibility of exploring astrophysical scenarios with a fidelity that was previously not accessible due to limitations in computing power. The main scientific challenges are in (i) plasma accelerators (either laser or beam driven) and possible advanced radiation sources based on these, which have promising applications in bio-imaging and medical therapy, (ii) inertial fusion energy and advanced concepts with ultra-intense lasers, which aim to demonstrate nuclear fusion ignition in the laboratory, and (iii) collisionless shocks in plasma astrophysics, associated with extreme events such as gamma-ray bursts, pulsars and AGNs.

These are topics of relevance not only from a fundamental point of view but also in terms of potential direct economic benefits. For instance, research in plasma accelerators is exploring the route to a new generation of more compact and cheaper particle and light sources, a topic where Europe is clearly leading thanks to the large-scale pan-European laser projects (e.g. the Extreme Light Infrastructure (ELI) and the High Power Laser for Energy Research (HiPER)) and the national efforts on the development of laser-based secondary sources (e.g. in Germany, France and the UK). The exploration of an alternative path to the Magnetic Confinement Fusion (MCF) approach to nuclear fusion is critical for sustainable energy production, which is a driving force for economic growth. Solar physics is also a very active field of research, with new satellites becoming operational and with increased interest in the numerical simulation of the complex solar dynamics and the impact of solar phenomena on the terrestrial environment.


Specifically, the present-day theoretical challenge lies in the need to solve the equations of magneto-hydrodynamics in the turbulent regime across a large portion of the Sun: from the thin (in relative terms) boundary layer called the tachocline, which marks the transition between the inner region dominated by radiation and the outer convective zone ruled by hydrodynamics, up to the solar surface with its rich phenomenology.

Even more ambitiously, future simulations aim to target the full 11-year solar cycle, including a treatment of coronal phenomena and the generation and time variation of the solar wind. In parallel, the interaction of the solar wind with the geomagnetic environment is also a subject of study of great interest. The study of the combined Sun–Earth magneto-plasma system by means of observations and simulations is often referred to as the field of Space Weather.

Challenges    

In all of these fields, huge computational challenges stem from the very wide ranges of scales in space and time that must be resolved to model any significant portion of the system over the integration times of interest. Advances in HPC are facilitating higher-fidelity simulations with better approximations, and these are greatly improving our understanding of plasmas.

Magnetic fusion devices are several metres across, whereas turbulent structures occur at the millimetre scale and significant magnetic disturbances at the scale of at least several centimetres. Computational modelling will play a crucial role in maximising the success of the unique experiment, ITER. For this machine, encompassing all the important phenomena underlying energy losses will require simulations with several thousand grid points in each of the two directions perpendicular to the magnetic field (and fewer in the parallel direction).

Furthermore,  in  fusion  devices,  plasma  collisions  are  so  rare  that  the  mean  free  path  along  the  field  lines   can   be   longer   than   the   characteristic   macroscopic   scale.  Modelling   transport   along   the   field  lines   by   a   fluid   closure   is   problematic.   The  modern   trend   is   to   use   a   reduced   form   of   the   kinetic  equation  for  each  plasma  component  (electrons  and  ion  species).  At  best,  this  requires  hundreds  of  grid  points  in  each  of  two  (parallel  and  perpendicular)  velocity  directions.    

In  the  case  of   ITER,  the  smallest  timescale  to  be  resolved  is  the  one  associated  with  millimetre-­‐size  vortices,  which  is  typically  of  the  order  of  a  microsecond.  The  energy  confinement  time  can  be  of  the  order   of   seconds.   Thus,   depending   on   the   time-­‐advancing   algorithm,   a   significant   simulation   to  steady-­‐state  energy  balance  would  require  tens  of  millions  of  time  steps.  

As an example, a recent state-of-the-art simulation using a kinetic plasma model was carried out with the GYSELA code with about 10^11 variables in phase space, corresponding to a few TBytes of storage. Integrating the model for a duration of about 1 ms of physical time required about a month of CPU time at an effective speed of around 5 Tflop/s.

However, a significant ITER simulation with a more complete physical model may require 10^14 variables and shorter time steps, in order to simulate electron scales, or an integration time a factor of 10^3 longer for simulations over the energy confinement timescale.
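The following minimal Python sketch illustrates the scale of this extrapolation, taking the GYSELA figures quoted above (about 10^11 variables, ~1 ms of physical time, roughly one month at a sustained 5 Tflop/s) as the reference point; the linear cost scaling and the machine rates used for comparison are simplifying assumptions for an order-of-magnitude estimate only.

```python
# Back-of-envelope extrapolation from the GYSELA run quoted above. All figures
# are illustrative order-of-magnitude estimates, not measured values.

SECONDS_PER_MONTH = 30 * 24 * 3600

# Reference run: ~1e11 phase-space variables, ~1 ms of plasma time,
# ~1 month of wall clock at an effective (sustained) 5 Tflop/s.
ref_cost_flop = 5e12 * SECONDS_PER_MONTH            # ~1.3e19 flop in total

def sustained_pflops_days(flop):
    return flop / 1e15 / 86400

# Variant A: resolve electron scales, ~1e14 variables (1000x more), assuming the
# cost grows at least linearly with the variable count (shorter time steps would
# add a further, unquantified factor).
cost_a = ref_cost_flop * (1e14 / 1e11)

# Variant B: integrate 1e3 times longer, out to the energy confinement time.
cost_b = ref_cost_flop * 1e3

print(f"electron scales : ~{sustained_pflops_days(cost_a):.0f} sustained Pflop/s-days")
print(f"confinement time: ~{sustained_pflops_days(cost_b):.0f} sustained Pflop/s-days")

# Doing both at once multiplies the factors (~1e6 x the reference run) and lands
# firmly in exascale territory: ~150 days at a sustained 1 Eflop/s.
```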

Figure   3.4   shows   a   picture   of   the   electric   potential   in   a   turbulent   plasma  obtained  with   the  GENE  code.  Thanks  also  to  PRACE  resources,  plasma  turbulence  simulations  are  advancing  rapidly  and  are  being  compared  with  measurements  to  yield  valuable   insights   into  the  mechanisms  that  determine  losses  from  the  confinement  system.    

In   laser   plasma   interaction   scenarios,   the   numerical   tools   of   choice   are   usually   fully   relativistic  particle-­‐in-­‐cell  (PIC)  codes  such  as  the  OSIRIS  framework.  PIC  models  work  at  the  most  fundamental,  microscopic  level  and  are  therefore  the  most  computationally  intensive  models  in  plasma  physics.  


 Figure  3.5.  Laser  Wakefield  Accelerator  simulation  showing  laser  (red/white),  wakefield  structure  (green/blue)  and  accelerated  particles  (spheres).  Code  OSIRIS,  PRACE  first  call.  

For example, recent scalings for a laser-plasma electron accelerator indicate that, to reach an energy of 10 GeV, the accelerating length must be of the order of ~0.5 m, with a plasma density of ~10^17 cm^-3. Since the laser wavelength (~1 micron) needs to be resolved, the simulation cell size will be ~10^-7 m and the total number of iterations will be of the order of ~10^7. The total number of simulation particles required will be over ~10^11, and the total computer memory requirement is in excess of ~10 TBytes.
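A short sketch of the arithmetic behind these scalings is given below; the cell size, the speed-of-light time step and the assumed ~100 bytes per simulation particle are illustrative choices intended only to reproduce the orders of magnitude quoted above, not parameters of any particular OSIRIS run.

```python
# Rough resource estimate for the 10 GeV laser wakefield stage quoted above.
# Cell size, particle count and bytes-per-particle are assumed values chosen
# only to reproduce the orders of magnitude in the text.

acceleration_length_m = 0.5       # metre-scale plasma stage
cell_size_m = 1e-7                # ~lambda_laser / 10, to resolve a ~1 micron laser

# Iterations: the light front must cross the whole stage, roughly one cell per
# step (for a speed-of-light time step dt ~ dx / c).
iterations = acceleration_length_m / cell_size_m
print(f"time steps      ~ {iterations:.0e}")      # ~5e6, i.e. order 1e7

# Particle count and memory (moving-window simulation box).
particles = 1e11
bytes_per_particle = 100          # position, momentum, charge, IDs, etc. (assumed)
memory_tb = particles * bytes_per_particle / 1e12
print(f"particle memory ~ {memory_tb:.0f} TB")    # order 10 TB
```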

Solar physics simulations are no less challenging. Solar plasmas are characterised by a variety of interacting phenomena, with disparate plasma parameters. Magnetism originates from a still ill-understood dynamo mechanism in the convective region. This couples to the Sun's surface, where events such as prominences, flux emergences and solar flares occur, and where the solar wind is generated.

Turbulence is generally at high Reynolds number, and the spatio-temporal scale separation is extreme (8–10 orders of magnitude). Present research aims at MHD simulations with at least 10,000 grid points in each spatial direction for very long integration times.

Computer   resources   awarded   with   the   first   PRACE   calls   have   already   permitted   ground-­‐breaking  results   in   plasma   physics.   Ab-­‐initio   simulations   of   turbulence   in  magnetic   fusion   plasmas  with   the  GENE  code  have  clarified   the   limit  of  validity  of  energy  confinement  scaling   laws  and  are  exploring  the   physical   mechanisms   that   can   suppress   turbulence   in   transport   barriers,   which   significantly  enhance  fusion  performance.    

Studies  of  inertial  confinement  fusion  with  the  OSIRIS  code  have  explored  various  scenarios  for  fast  ignition  with  realistic  simulations  as  well  as  the  dynamics  of  particle  acceleration  in  shocks  produced  by  intense  laser  beams  (see  Figure  3.5).  

Table   3.2   lists   a   few   grand   challenge   plasma   simulations   that   should   become   accessible   with   the  computer  power  that  is  likely  to  emerge  over  the  time  frame  2012–2020.  

 


Table 3.2. Summary of some key computational challenges in plasma physics.

1. Grand challenge: Magnetic confinement plasmas (tokamaks) – global gyrokinetic (GK) simulations of ion-scale electrostatic plasma turbulence, to well beyond the turbulence saturation time (say ~1 ms); natural extensions to longer times (e.g. the confinement time, ~5 s), to electron scales and to magnetic fluctuations; multiscale simulations that separate the slow transport and fast turbulent timescales by coupling a local gyrokinetics code to a transport solver.
   Benefits: theoretical understanding of the processes determining fusion power in ITER plasmas, to be exploited to optimise ITER and design smaller devices; improved general understanding of plasma turbulence and its impact on confinement, with more sophisticated comparisons of turbulence simulations against measurements from existing tokamaks; maximising the scientific benefit from the unique 10 GEuro ITER experiment (the EU being the largest contributor), with the HPC simulations above needed before ITER first plasma (2019).
   Compute requirement: ~O(200) Pflop/s-hours and > O(10s) TB of memory for the baseline simulations; the extensions improve the model but are much more demanding computationally; the multiscale approach is a less demanding route to the transport timescale, but many such simulations are needed to optimise ITER scenarios.
   Challenges: GK codes currently scale to ~10^4–10^5 cores; can algorithm bottlenecks be improved to reach 10^8 cores (a serious issue)? Can we find clever ways to parallelise in time?

2. Grand challenge: Laser plasmas and ICF – can plasma-based acceleration lead to the development of compact accelerators for use at the energy frontier, in medicine, in probing materials and in novel light sources? Can we achieve fusion ignition and, eventually, useful fusion energy from compressed and heated high-energy-density (HED) fusion plasma?
   Benefits: the holistic comprehension of plasmas, in particular in highly non-linear conditions such as those associated with intense lasers, is highly complex and requires ab-initio fully kinetic simulations; this understanding is fundamental for the design and development of new-generation plasma accelerators, with the possibility of tremendous scientific and societal impact.
   Compute requirement: one-to-one modelling of plasma accelerators requires petascale systems, and parametric scans are required for system design optimisation; ICF simulations are even more demanding, and full 3D modelling of implosion, ignition and explosion will push requirements to the exascale realm.
   Challenges: can EM-PIC simulations strong-scale to ~10^7–10^8 cores? Can we extend our models to work effectively in overdense regimes? Can multi-physics models be efficiently included in our algorithms?


3.3 A  Roadmap  for  Capability  and  Capacity  Requirements    

3.3.1 Astrophysics

Astrophysics is a large and diverse research area in which numerical simulations play a central role. The community has always taken advantage of the latest HPC hardware. As an example, the first N-body simulations of the 1960s followed the trajectories of about 100 particles to study star cluster evolution and galaxy dynamics. Today we are simulating the Hubble volume with almost a trillion particles and with high dynamic resolution. This increase of 10^10 in particle number alone in 60 years is faster than Moore's law, thanks to the continued software development that takes place within the field. Astrophysics has always kept up with HPC trends – from serial codes to vector machines to the current MPI-based clusters that have dominated research for over a decade. Novel special-purpose hardware, such as the GRAPE board, was developed by the groups studying dense stellar systems. Today many groups are starting to utilise GPUs as hardware accelerators. The community is well poised to follow the current trends towards massively parallel, large multi-core nodes, each with multiple GPUs. As in other areas, there is also the need for appropriately matched growth in storage capacity. However, following these trends requires ever more complex algorithms, and investment in manpower for software development is essential.
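As a rough check of the comparison with Moore's law, the sketch below converts the quoted growth in particle number into an implied doubling time; the start and end particle counts, the 60-year interval and the two-year transistor doubling time taken as the Moore's-law baseline are assumptions made purely for illustration.

```python
import math

# Growth of N-body particle counts versus a Moore's-law baseline. The start/end
# particle counts and the 2-year doubling time are assumptions for illustration.

n_start, n_end = 1e2, 1e12        # ~100 particles (1960s) -> ~1e12 today
years = 60                        # the interval quoted in the text

growth = n_end / n_start                                   # 1e10
implied_doubling_years = years * math.log(2) / math.log(growth)
print(f"implied doubling time: {implied_doubling_years:.1f} years")   # ~1.8 years

# Moore's-law baseline: doubling every ~2 years over the same period.
moore_factor = 2 ** (years / 2)
print(f"Moore's-law factor over {years} years: {moore_factor:.1e} "
      f"vs particle-count factor {growth:.1e}")
```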

3.3.2 High-Energy Physics

The computational needs of LQCD have changed substantially in recent years. While in the past the main cost factor was the generation of ensembles of quantum field configurations, analysis has lately become comparable in computational cost. While ensemble generation requires very high scalability of the architecture to obtain sufficiently long Monte-Carlo histories, analysis can also be done efficiently on cluster architectures (CPU, GPU or hybrid) but requires easy programmability due to the very broad spectrum of quantities to be analysed. Due to this shift of requirements, many LQCD collaborations currently face a certain shortage of computer resources for analysis. In the near future, the situation is likely to change again, as the limited memory bandwidth of scalable architectures will reduce the efficiency with which ensemble generation can be performed. Efficiencies of order 10% are currently reached routinely, and sometimes substantially more, with the maximum efficiency reached for compute kernels being close to 40%. Thus the roughly 300 sustained Pflop/s-years needed for hadron structure physics might already correspond to 3 Eflop/s-years of peak capacity, depending on the computer architecture.
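The conversion from sustained to peak requirements can be made explicit with a small sketch; the 10% sustained-to-peak efficiency is the assumed typical value mentioned above, and real figures will vary with architecture and code.

```python
# Converting the sustained requirement quoted above into peak machine time,
# assuming a ~10% sustained-to-peak efficiency for ensemble generation.

sustained_pflops_years = 300
efficiency = 0.10                 # assumed typical value; kernels can reach ~40%
peak_pflops_years = sustained_pflops_years / efficiency
print(f"~{peak_pflops_years / 1000:.0f} Eflop/s-years of peak capacity")   # ~3
```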

Another   problem,   which   will   become   more   relevant,   is   the   growing   mismatch   between   compute  power  and  I/O  bandwidth.  

While ensemble generation is clearly part of capability computing, analysis is halfway between capability and capacity computing. A typical configuration one might analyse five years from now might consist of several GBytes of data, requiring a moderately scalable architecture for analysis.

LQCD  has  a  strong  record  in  making  efficient  use  of  a  large  range  of  computer  architectures.  The  field  will   therefore  be   able   to  drive   the  usage  of   future   computer   architectures,  whether   it   is   based  on  more  traditional  CPU  architectures,  GPU  architectures  or  emerging  new  technologies.    

3.3.3 Plasma Physics

In magnetic fusion, the big computational challenge for the next decade will be to acquire the capability to simulate the totality of low-frequency dynamics in general toroidal geometry. Such a code would treat MHD and micro-turbulence in a unified way, and would need mesh sizes nearing 10^5 points in each transverse direction to encompass electron-scale lengths at physical ion/electron mass ratios. This is a daunting challenge if a straightforward scaling from present simulations is employed.


For  codes  evolving  the  kinetic  equations,  a  crucial  problem  is  the  efficient  and  accurate  interpolation  of  the  value  of  the  distribution  function  to  neighbouring  grid  points.  For  PIC  codes,  noise  reduction  is  a  key  element.  Both  methods  need  efficient   integration   in  velocity  space  and  global  solvers   for  the  electromagnetic   potentials   in   complex   geometry.   This   is   perhaps   the   key   limiting   factor   for   better  performance.   Likewise,   the   main   challenge   in   solar   physics   will   come   from   comprehensive  simulations  of  the  solar  interior,  the  solar  surface  dynamics  and  space  weather.  

For   plasma-­‐based   accelerator   modelling,   the  main   computational   challenge   is   to   perform   3D   full-­‐scale   one-­‐to-­‐one   modelling   of   metre   class   wakefield   accelerator   scenarios,   providing   insight   and  direction   into   the  next  generations  of  plasma-­‐based  accelerators.   Inertial  Confinement  Fusion,  and  Fast/Shock  Ignition  in  particular,  will  also  require  full-­‐scale  3D  modelling  to  analyse  energy  transport  and  deposition  in  the  compressed  core  leading  to  ignition.    

3.4 Expected  Status  in  2020  

3.4.1 Astrophysics

Astrophysics and cosmology will continue to be driven by observations of our cosmos. Space- and ground-based missions are planned beyond the 2020 timescale that will provide the data to answer our dozen questions. Simulations play an essential role in this grand quest to understand our origins, and it is essential that growth in HPC resources continues at its current rate in order to meet the requirements of these missions. If HPC trends continue, and these resources are made available, then by 2020 we expect many breakthroughs and a huge amount of progress towards answering those fundamental questions.

3.4.2 High-Energy Physics

The development of computational particle physics has been so rapid in the past that predictions so far into the future are hardly possible. In any week, for example, the LHC could make a ground-shaking discovery which might fundamentally change our understanding of particle physics and could change the research agenda of all groups working in the field. For instance, it could be observed that the decay probability for one of the Higgs decay channels does not fit the Standard Model. Checking as many of these channels as possible is presently the focus of experimental research at the LHC. A few developments can, however, be predicted with a reasonable level of certainty.

• QCD  thermodynamics  should  be  more  or  less  settled  by  2020.  This  does  not  mean  that  there  will  not  be  any  open  questions  left  –  quite  to  the  contrary  –  but  things  like  the  equation  of  state  or  the  existence/non-­‐existence  of  a  critical  point  should  be  decided  and  should  have  turned  into  textbook  physics.  

• The  status  of  hadron  structure  calculations  is  less  clear.  In  the  past,  all  predictions  have  turned  out  in  the  end  to  be  over-­‐optimistic.  Nevertheless,  based  on  present  knowledge,  it  is  (again)  a  rational  expectation  that  LQCD  will  have  reached  the  point  at  which  it  can  provide  information  comparable  in  reliability  to  direct  experimental  data,  but  of  a  highly  complementary  and  broader  nature.  

• Non-QCD LQFT will experience rapid growth over the next years. If BSM physics is found, this new physics will have to be analysed; if it is not found, the question will become ever more urgent: how can it be that it is not yet within reach, and what are the more promising avenues to find it?

• The  non-­‐LQFT  applications  which  are  still  at  an  early  stage  of  development  will  certainly  catch  up  and  play  a  much  more  prominent  role  by  2020.    


3.4.3 Plasma Physics

With ITER expected to be operational at the end of this decade, there is a strong community effort to develop the numerical tools to carry out the necessary science.

As far as magnetic fusion is concerned, one expects to be able to carry out a full-torus simulation with a detailed plasma physics model for the duration of an energy confinement time. This is a grand HPC challenge requiring exascale resources. In parallel, capacity HPC computing will be necessary to carry out parametric studies with somewhat simpler plasma physics models. Alongside this activity, the community will exploit a suite of codes, combined into a single framework for plasma and machine-integrated modelling, ranging from data analysis and reconstruction of the plasma state to interpretative and predictive simulations. This suite will include some first-principles codes that require HPC.

By the end of the decade, laser plasma accelerators will have matured, and detailed full-scale simulations of problems involving a very large range of spatial and temporal scales will be required for high-fidelity modelling. This modelling will also require exascale resources and will provide predictive capability able to sustain engineering advances. In inertial confinement fusion, and in advanced ignition schemes in particular, the community should be in a position to perform complete modelling of inertial fusion, using multi-physics/multiscale codes to analyse ICF scenarios in detail and to optimise target designs.

Similarly, the solar and space physics community is likely to be in a position to carry out the grand challenge of simulating the dynamics of the Sun over a long timescale, comparable to the 11-year cycle. Space weather forecasting will require a full chain of codes, from measurement and reconstruction of the solar MHD state, through stability analysis, simulation of the solar surface dynamics, and prediction of events and their impacts on the solar wind, down to simulations of the interaction of the solar wind with the magnetosphere. Again, certain codes belonging to this integrated modelling suite will need access to HPC resources.


4 MATERIALS SCIENCE, CHEMISTRY AND NANOSCIENCE

4.1 Summary

The advance from petascale to exascale computing will change the paradigm of computational materials science and chemistry. Today the discipline acts as the third scientific method in a tight loop between experiment and theory. The move to exascale will broaden this paradigm – to an integrated engine that determines the pace in a design continuum from the discovery of a fundamental physical effect, a process, a molecule or a material, to materials design, systems engineering, processing and manufacturing activities, and finally to deployment in technology, where multiple scientific disciplines converge. Exascale computing will significantly accelerate the innovation, availability and deployment of advanced materials and chemical agents and foster the development of new devices. These developments will profoundly influence society and the quality of life, through new capabilities in dealing with the great challenges of knowledge and information, increased productivity, sustained welfare, clean energy, health, etc.

Computational materials science, chemistry and nanoscience is concerned with the complex interplay of the myriad atoms in a solid or a liquid, which produces a continuous stream of new and unexpected phenomena and forms of matter. An extreme range of length, time, energy, entropy and entanglement scales gives rise to the complexity of an extremely broad range of materials and associated properties.35 The target of this science is to design materials ranging from the level of a single atom up to the macroscopic scale, and phenomena from electronic reaction times in the femtosecond range up to geological periods. Computational materials science, chemistry and nanoscience stands in close interaction with the neighbouring disciplines of biology and medicine, as well as the geosciences, and it impacts extensive fields of endeavour within the engineering sciences. The above design goal will be achieved by a large and diverse computational community that views as critical assets the conceptualisation, development and implementation of algorithms and tools for cutting-edge HPC. These tools are used to great benefit in other communities such as medicine and the life sciences, the engineering sciences and industrial applications. Furthermore, this domain serves another major goal in educating human resources for future advances in computational materials science.

During   the   past   5–10   years,   many   of   the   goals   laid   out   in   the   first   ‘Scientific   Case   for   European  Petascale  Computing’  have  been  successfully  accomplished.  In  the  field  of  nanoscience,  for  example,  robust   tools   for   the   quantitative   understanding   of   structure   and   dynamics   at   the   nanoscale   have  been   developed,  matching   the   extraordinary   developments   in   experimentation.   Today   four  major  thematic  challenges  are  noted:    

1. In response to the new petascale computer architectures, powerful algorithms are being developed that make use of thousands of computer cores, setting a direction towards exascale computing.

2. Unprecedented progress is being made in increasing the precision and robustness of computational predictions on much finer energy scales, e.g. for many-body quantum systems requiring orders of magnitude more processor time.

3. A considerable effort is being undertaken to bridge seamlessly the gap between the different length and timescales inherent to this field, in order to reach the complete simulation of an entire device or of systems integrated into a technology.

4. Long-time trajectory simulations are urgently needed.

35 Inorganic and organic solids, molecules and polymers, smart materials that self-repair, actuate and transduce, bio-compatible and programmable materials and soft matter, self-assembly and biomimetic synthesis, quantum dots and strongly correlated quantum materials, photosynthesis and ultrafast magnetisation dynamics, to name but a few examples.


With a sustained Eflop/s, many of the leadership-class calculations of the materials science, chemistry, nanoscience, condensed matter physics, mineralogy and geoscience communities that today may take 1–2 years on a petascale computer become state-of-the-art high-throughput calculations. This allows the application of these methods to a multitude of systems under many different external parameters and enables the validation and quantification of the robustness of the methods and the reliability of the associated models. High throughput facilitates the link between materials science and materials informatics, which suggests new discoveries through combinatorial search among a vast number of alternatives in a feedback loop with processing and manufacturing.
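A simple way to see the shift from capability to high-throughput working is sketched below; the baseline of a sustained 1 Pflop/s for today's runs, the sustained 1 Eflop/s target and the assumption of perfect scaling are illustrative simplifications.

```python
# Illustrative throughput gain: a run that needs 1-2 years of a sustained
# 1 Pflop/s today, executed on a machine sustaining 1 Eflop/s. The sustained
# rates and perfect scaling are assumptions for the sake of the arithmetic.

speedup = 1e18 / 1e15                      # sustained Eflop/s vs sustained Pflop/s
for run_years in (1, 2):
    days_at_exascale = run_years * 365 / speedup
    runs_per_year = 365 / days_at_exascale
    print(f"{run_years}-year petascale run -> ~{days_at_exascale:.1f} days, "
          f"i.e. ~{runs_per_year:.0f} such calculations per year")
```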

4.2 Computational  Grand  Challenges  and  Expected  Outcomes    

Many upcoming challenges in computational materials science, chemistry and nanoscience have been outlined in a recent Science Position Paper37 of the ESF Materials Science and Engineering Expert Committee (MatSEEC)36, on which we draw here. This section, however, is divided into separate overviews of materials science, chemistry and nanoscience. Each field is vast. This division is in part historic and stems from the different communities and the characteristic questions that are addressed, although, in practice, the boundaries between the fields are somewhat fluid. A wide variety of societal challenges38 translate in part into research challenges, with these challenges in turn inspiring computational challenges.

This translates into demand for dealing with ever-increasing complexity, larger system sizes, increasing robustness, ever-longer simulation sequences, seamless changes in length and time scale, and a diversity of underlying algorithms on a necessarily diverse set of HPC platforms. The research challenges and the computational challenges span all three fields but are listed below under the subfield where they fit most naturally.

There  are  four  major  themes  of  computational  challenge  common  to  all  sub-­‐fields:    

1. Exploiting exascale architectures. Highly parallel computing platforms, with tens or hundreds of thousands of cores as well as special-purpose processors, such as powerful graphics cards or accelerators, will be essential for the future of this field. In response to these new computer architectures, a critical investigation and exploration of algorithms and their computational implementation is essential to enable such platforms to be utilised. Stochastic methods, linear-scaling methods and divide-and-conquer algorithms, developed during the Pflop/s era, need to be adapted to the exascale. Truly massively parallel computing is a major challenge for the future.

2. An  increase  in  the  precision  and  robustness  of  the  computational  models  is  required  to  improve  their  predictive  capability  for  larger  systems  over  longer  timescales.  The  computational  cost  often  increases  as  a  power  law  with  a  large  exponent  as  a  function  of  system  size.  The  underlying  algorithms  are  frequently  more  complex  but  potentially  benefit  most  from  highly  parallel  computing  platforms.  Redeveloping  these  algorithms  for  new  computer  architectures  is  a  major  challenge.    

3. Bridging seamlessly the gap between the different length and timescales inherent to this field, in order to transition between the composition of a material, its processing and its conditions of use. One would like to have a single model that bridges atomistic and continuum descriptions seamlessly, i.e. that contains the atomistic and continuum limits as special cases. This problem has yet to be solved, but the mathematical methods required for multiscale modelling are developing rapidly. These include such topics as multi-resolution analysis, high-dimensional computation, domain decomposition, turbulence, level sets and discrete mathematics. These topics need to be explored from the point of view of their application to various materials-science problems, ranging from differential equations to stochastic simulation. A further dimension is the increased use of materials informatics, and the development of databases together with their curation and mining.

4. Long-time trajectory simulations are needed. The simulation of nucleation, growth, self-assembly and polymerisation is central to the design and performance of many diverse materials such as rubbers, paints, fuels, detergents, functional organic materials, cosmetics and food.

36 www.esf.org/matseec
37 Computational Techniques, Methods and Materials Design, European Science Foundation, March 2011, ISBN: 978-2-918428-38-1
38 Energy harvesting, storage, conversion and saving, environmental protection and toxicity management, decontamination, air cleaning, integration of data and information technology for higher information availability and connectivity, critical materials substitution, biotechnology, topography, health care, or mobility

4.2.1 Materials  Science  

4.2.1.1 Hard  Matter      

State of research

Hard condensed matter encompasses an extremely rich variety of materials and systems.39 These materials find their application in solar cells, fuel cells, SQUIDs, sensors, responsive actuators, headphones, non-volatile resistive and magnetic memories, processors, cameras, smart phones, batteries, cars, rockets, satellites, etc. Non-equilibrium phenomena are of great practical importance in such diverse areas as optimising self-assembled and biomimetic techniques to produce and process materials, manufacturing technologies, designing energy-efficient transportation, processing structural materials, and mitigating the damage caused by earthquakes.

Challenges  

Challenge:  Strongly  correlated  and  quantum  materials  

• Multiferroicity, colossal magnetoresistance, exotic superconductivities, Mott insulation, Coulomb blockade, the Kondo effect, heavy fermions and orbital ordering are all variants of strongly correlated electron phenomena. Our understanding of the underlying physics and chemistry of the emerging class of strongly correlated materials is still severely limited and is hampering technological applications. The current development of the LDA+DMFT method (see Section 4.3) provides a new approach that addresses the complexity of the quantum properties of these materials, but one that truly requires exascale-level simulations.

• Topological and Chern insulators, the (anomalous, spin, quantum spin and quantum anomalous) Hall effects, the orbital moment and magneto-electric coupling are examples of a new class of quantum materials that is classified by the topological nature of the electrons at the Fermi surface. The spin-orbit energy is the relevant energy scale. A reliable integration over the Fermi surface requires an extremely high resolution and is a task that benefits from massively parallel computing.

                                                                                                                         39  To  give  an  impression  of  the  vastness  of  the  field,  we  mention  metals,  semiconductors,  insulators,  alloys,  glasses,  amorphous  materials,  quasi-­‐crystals,  heterostructures,  nanoclusters,  quantum  dots,  graphene,  nanotubes,  buckyballs  and  related  structures,  zeolites,  wires,  composite  materials,  phase  change  materials,  smart  materials,  steel,  shape  memory  materials,  magnets,  giant  magnetoresistance  materials,  colossal-­‐magnetoresistive  oxides,  spin-­‐chain  and  spin-­‐ladder  compounds,  magneto-­‐optical  recording  materials,  piezo-­‐  and  ferro-­‐electric  materials,  electro  ceramics,  tunnel  junctions,  multiferroics,  artificially  engineered  metamaterials,  topological  insulators,  high-­‐temperature  superconductors,  organic  superconductors,  organic  electronic  materials,  porous  silicon,  Bose–Einstein  condensates,  transistors.  


 Figure  4.1.  Visualisation  of  the  near-­‐tip  deformation  in  brittle  fracture  (Alessandro  De  Vita,  King’s  College,  London).  

Challenge:  Materials  informatics  

Materials informatics offers the potential to develop new materials technologies at a faster rate, and in a more cost-effective way, than previously possible. The challenge is to reduce development time both for the discovery of new materials and for the prediction of their properties and processing. This accelerated approach provides closer alignment with the product development cycle while contributing to increased product performance. Some of the techniques of interest in materials informatics are standard – for example, quantum methods for computing the stability of materials, or information techniques such as data mining and data analytics. However, combining these techniques to exploit materials informatics is not standard practice and offers a novel approach to materials design. Due to the increased availability of computational resources, our ability to address complex materials issues is improving dramatically: for example, it is possible to run many thousands of potential-material calculations and generate substantial 'theoretical databases'. Databases of derived materials, with calculated physical and engineering properties, will no doubt become an increasingly important tool for researchers and engineers working in fields related to materials development. The capacity computing capabilities of an exascale facility will drive the establishment and management of such databases. This is closely related to the Materials Genome Initiative for Global Competitiveness40 initiated by the Office of Science and Technology Policy of the Executive Office of the President of the United States.
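A minimal sketch of the data-mining step in such a workflow is shown below; the records, property names and screening thresholds are entirely hypothetical and stand in for the thousands to millions of entries a real theoretical database would contain.

```python
# Screening a database of *computed* properties for candidates that meet simple
# design targets. All entries, property names and thresholds are hypothetical,
# chosen only to illustrate the workflow of high-throughput materials informatics.

records = [
    {"formula": "AB2X4-1", "band_gap_eV": 1.4, "formation_energy_eV_atom": -0.9},
    {"formula": "AB2X4-2", "band_gap_eV": 0.2, "formation_energy_eV_atom": -1.1},
    {"formula": "ABX3-7",  "band_gap_eV": 1.8, "formation_energy_eV_atom": -0.3},
    {"formula": "A2BX6-3", "band_gap_eV": 1.5, "formation_energy_eV_atom": -0.8},
]

def is_candidate(rec, gap_window=(1.1, 1.7), max_formation_energy=-0.5):
    """Keep entries whose computed band gap lies in a target window and which
    are predicted to be reasonably stable (sufficiently negative formation energy)."""
    lo, hi = gap_window
    return (lo <= rec["band_gap_eV"] <= hi
            and rec["formation_energy_eV_atom"] <= max_formation_energy)

shortlist = [r["formula"] for r in records if is_candidate(r)]
print("candidates for further (more expensive) calculations:", shortlist)
```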

Challenge:  Multiscale  modelling  

A pressing research challenge involves the integration of the various length and time scales relevant for materials science, briefly outlined above. Multiscale materials simulation is currently a high-impact field of research, where much effort is focused on a more seamless integration of the length and time scales, from electronic structure calculations, atomistic and molecular dynamics, and kinetic and statistical modelling through to the continuum. Together with new and emerging techniques, the provision of increased computational power can yield answers to versatile and complex questions central to materials manufacture, properties, performance and technological applications. Typical examples of multiscale materials problems include the following.

• Modelling  related  to  materials  growth,  processing  and  modification  using  electron  and  ion  beams  or  plasma  techniques.  Examples  are  chemical  vapour  deposition  (CVD)  or  atomic  layer  deposition  (ALD)  growth  of  thin  films  and  coatings,  where  the  scales  vary  from  the  sub-­‐nanometre  surface  region  to  the  metre-­‐scale  reactor.    

• Friction  and  sealing.  These  topics  are  extremely  important  for  mechanical  engineering  and  physics  involving,  for  example,  functionality,  energy  efficiency,  environmental  protection,  safety  and  miniaturisation  of  technological  devices.  Orders  of  magnitude  in  length  scale  have  to  be  covered.    

• Ageing of engineering materials. For engineering materials in our daily environment (e.g. from cement and concrete to clay sediments or materials for waste storage), understanding the relationship between their complex, hierarchical microstructure and the long-term evolution of transport or mechanical properties is the basis for improving durability and sustainability. Nano- and micro-scale processes, from ion and water transport to the local evolution of composition and morphology, drive the evolution and ageing of their mechanical properties and may lead to functional failure and structural deterioration. Computational physical chemistry, statistical physics and experimental approaches designed for glasses and amorphous materials can give insight into this critical range of length scales, from nanometres to microns.

40 http://www.whitehouse.gov/sites/default/files/microsites/ostp/materials_genome_initiative-final.pdf

• Brittle fracture. It has gradually become clear that fracture in pure phases such as glasses, crystalline semiconductors and minerals – as well as in complex systems such as advanced ceramic fuel cells, thermal barriers and biomimetic coating films – presents challenging problems for theory. This is mainly because high accuracy and large system sizes are both necessary ingredients for modelling failure in brittle materials. On the one hand, the ionic or covalent bond-breaking and formation associated with brittle fracture advancement (accompanied, for example, by surface reconstruction, chemical attack by inflow of corrosive species, or reactions with pre-existing impurities) require interatomic potentials truly capable of quantum chemical accuracy. On the other hand, the need to capture faithfully the stress concentration phenomena requires large-scale (~10^6 atoms) model systems. All of the above has made brittle fracture an extremely hard problem to tackle. These simulations contribute decisively to the prediction of the lifetime of high-performance materials in energy technologies such as high-efficiency gas turbines.

Many of the above problems have been studied at the continuum level. A sufficiently detailed and realistic computational modelling and understanding of these highly complex technological processes can only be achieved by large-scale computer simulations combining large numbers of particles with long-timescale simulations at different length and time scales.

4.2.1.2 Soft  Matter    

State of research

Soft condensed matter (see Figure 4.2) encompasses polymers, colloids, membranes, amphiphiles and surfactants, synthetic and biological polymers, lipids and proteins, as well as organic–inorganic hybrid systems. Classical chain molecules, i.e. polymers in the narrower sense, form only a subgroup of soft matter and serve as a reference for model building. These macromolecules find their applications in many different kinds of materials such as rubbers, paints, fuels, detergents, functional organic materials, cosmetics, food, bio-membranes, the cytoskeleton and the cytoplasm of living cells.

 

 

Figure  4.2.  The  ‘classical’  soft  matter  fields  of  (synthetic)  polymers,  hard-­‐sphere  colloids  and  amphiphilic  systems  have  merged  into  a  single  research  area  during  the  last  decade,  because  many  macromolecules  are  studied  today  which  display  polymeric,  colloidal  or  amphiphilic  character  simultaneously.  (Copyright:  Forschungszentrum  Jülich.)  

 

 

 


The unifying principles of soft matter systems are structures on mesoscopic length scales from nanometres to micrometres, and typical energy scales of the order of the thermal energy. Thermal fluctuations play an important role because of the relatively low energy density in these materials – they are 'soft'. Self-assembly is a dominant feature in soft materials and the essential reason for their complexity. The many interacting degrees of freedom imply that entropy plays an important role or even dominates in many cases, leading to universal behaviour. The spatially large molecules are able to fluctuate strongly in their shape (conformation). Consequently, processes on the microscopic-atomistic level and those on the mesoscopic level contribute to the materials' properties in equal measure. The large range of relevant length scales implies a large range of timescales. Indeed, there are often ten or more orders of magnitude between the typical timescales of local atomic movements and those of the meso- and macroscopic phenomena. Unlike 'hard matter', the solid phases of soft matter are at best partially crystalline; they are typically in an amorphous, glass-like state, and often heterogeneous on mesoscopic scales.

Challenges

A sufficiently detailed theoretical modelling and understanding of highly complex soft matter systems can be achieved only by large-scale computer simulations. This is a huge challenge, as the relevant structures span many orders of magnitude in length scale. As far as the timescales are concerned, the problem is more challenging still. Already in the case of the simplest polymer system, a melt of linear neutral homo-polymers, structures are encountered on lengths from the covalent bond at 0.1 nanometres to clusters of 10 nanometres in size. The length scale of collective phenomena is larger still by an order of magnitude. Relaxation times reach from the period of bond-angle vibrations of about 0.1 picoseconds up to 10 microseconds for reptation movement, i.e. eight orders of magnitude. (The movement of a chain in a dense polymeric system is highly constrained; due to entanglements with other chains, lateral motion of the chain at many points is highly improbable.) The most important questions concern the dependence on polymer chain length, on concentration for multi-component systems, and on temperature, and therefore require many simulations of this kind. Thus, huge amounts of computer time are needed to simulate soft matter and soft materials in thermal equilibrium.

Similar considerations apply to the simulation of charged systems, such as poly-electrolytes in polar solvents. Poly-electrolytes, charged colloids and charged amphiphilic molecules are present in a wide range of systems and applications, from biological systems (where charged bio-macromolecules, like DNA and proteins, are ubiquitous) to waste-water treatment. Here, the main challenge is the long-range nature of the electrostatic interaction between the charged macromolecules, the counter-ions and the salt ions, and the multi-component nature of the relevant systems.

Questions concerning the behaviour of soft matter under flow are even more challenging. Examples include the extrusion process during the fabrication of polymer materials by injection moulding; the directed self-assembly of nano-colloids to obtain formulations with desired properties for either the processing or the function of nanoparticle-based materials; and the flow properties of blood cells. The main issue here is the incorporation of the long-range hydrodynamic interactions, and the description of the interplay of hydrodynamic flows, the deformation of macromolecules, membranes, droplets, vesicles and cells, the effect of walls and confinement, and the effect of thermal fluctuations. To address these challenges, mesoscale hydrodynamics simulation techniques, such as Lattice-Boltzmann, Dissipative Particle Dynamics and Multi-Particle Collision Dynamics, have been developed in recent years. Although these approaches need further development, they already allow the investigation of many interesting and important issues.
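As an illustration of the simplest of these mesoscale techniques, the sketch below implements a minimal two-dimensional lattice-Boltzmann (D2Q9) model with single-relaxation-time (BGK) collisions on a periodic box; the grid size, relaxation time and initial shear perturbation are arbitrary choices, and production soft-matter codes of the kind discussed here couple such solvers to particles, boundaries and thermal fluctuations.

```python
import numpy as np

# Minimal D2Q9 lattice-Boltzmann sketch on a periodic box: distribution
# functions f_i(x, t) are relaxed locally towards equilibrium (BGK collision)
# and then streamed to neighbouring nodes, recovering hydrodynamics on the
# macroscopic scale. Parameters below are illustrative only.

nx, ny, tau, steps = 128, 64, 0.8, 500
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])          # discrete velocities
w = np.array([4/9] + [1/9]*4 + [1/36]*4)                    # lattice weights

def equilibrium(rho, ux, uy):
    cu = c[:, 0, None, None]*ux + c[:, 1, None, None]*uy    # c_i . u at each node
    usq = ux**2 + uy**2
    return w[:, None, None] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

# Initial state: uniform density and a sinusoidal shear flow that decays
# viscously (a standard sanity check for LBM codes).
rho = np.ones((nx, ny))
ux = 0.05 * np.sin(2*np.pi*np.arange(ny)/ny)[None, :] * np.ones((nx, ny))
uy = np.zeros((nx, ny))
f = equilibrium(rho, ux, uy)

for _ in range(steps):
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    f += -(f - equilibrium(rho, ux, uy)) / tau               # BGK collision
    for i in range(9):                                       # streaming step
        f[i] = np.roll(np.roll(f[i], c[i, 0], axis=0), c[i, 1], axis=1)

print("max |u_x| after viscous decay:", np.abs(ux).max())
```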

Another important issue is the desirable permanent monitoring of experimental studies by simulation, which will become possible only if the computer performance available today is increased dramatically. Further parallelisation can be the solution only in special cases, since in many cases the temporal development of the system has to be followed. Therefore, apart from the need for considerably more powerful supercomputers, intensive efforts are required to develop simulation methods with which several length and time scales can be systematically linked to each other (multiscale simulations). Real progress is possible only if both developments go hand in hand. Examples are the consideration of local ion interactions and explicit solvents (e.g. the molecular structure of water) for poly-electrolytes, to which almost all biopolymers belong; the dynamics of realistic polymer melts with branched polymers; the phase behaviour of multi-component systems; and scale-spanning calculations with realistic dynamics and with conformation changes of smaller and, later, larger biopolymers.

Finally, downsizing experiments to the level of control of single molecules or of very small dimensions, as in micro- and the upcoming nano-fluidics, will open the pathway to performing parallel experiments on the computer, offering unprecedented insight into molecular processes. This requires the parallel development of hardware as well as of new simulation methods. From a computational point of view, 'very large' systems remain to be studied, and these come into range with the new, more powerful hardware. Non-equilibrium behaviour, the basis for almost all natural and technological processes, can then be investigated. This should help us to understand these processes, but it will also help to improve force fields. These are typically parameterised on the basis of macroscopic experiments, while quantum chemistry can be used for the bonded interactions. However, both suffer from a naturally rather low level of accuracy compared with the needs in this field, so that the combination of experiment and simulation is also promising here.

4.2.2 Chemistry  

State of research

Computational chemistry is currently concerned with:

•     The  design  and  production  of  new  substances,  materials,  and  molecular  devices  with  properties  that  can  be  predicted,  tailored,  and  tuned  before  production  

•     The  simulation  of  technologically  relevant  chemical  reactions  and  processes,  which  has  a  huge  potential  in  a  variety  of  fields    

•  The control of how molecules react, over all timescales and the full range of molecular sizes

•  Catalysis, which remains a major challenge in the chemistry of complex materials, with many applications in industrial chemistry – e.g. a combinatorial materials search with realistic treatment of supported catalytic nanoparticles involving several hundred transition-metal atoms would require resources of at least 10 Pflop/s

•     The  knowledge  of  atmospheric  chemistry,  which  is  crucial  for  environmental  prediction  and  protection  (clean  air).    

Challenges  

Challenge:  Quantum  chemistry  

The key goal of quantum chemistry is the accurate calculation of the geometrical and electronic ground-state properties of molecules as well as of their excited states. The requested chemical accuracy of 1 kcal/mol is difficult to achieve with the functionals available in density functional theory. On the other hand, quantum chemical methods are predominantly applied to small isolated molecules, which correspond to the state of an ideal gas. Most chemical processes, however, take place in the condensed phase, and the interaction of a molecule with its environment can generally not be neglected.

•     Solvent  effects.  Solvent  molecules  can  directly  interact  with  the  reacting  species,  for  example  by  coordination  to  a  metal  centre  or  by  formation  of  hydrogen  bonds.  In  such  cases,  it  is  necessary  to  include  explicitly  solvent  molecules  in  the  calculation.  Depending  on  the  size  of  the  solvent  molecules  and  the  number  needed  for  convergence  of  the  calculated  properties,  the  overall  size  of  the  molecular  system  and  the  resulting  computational  effort  can  increase  significantly.  Currently,  only  DFT  methods  are  able  to  handle  several  hundred  

atoms,  but  the  developments  towards  linear-­‐scaling  approaches  in  quantum  chemical  wavefunction-­‐based  methods  are  very  promising  (see  below).  An  alternative  would  be  a  mixed  quantum  mechanical  (QM)  and  molecular  mechanical  (MM)  treatment  (QM/MM  method).  If  there  are  no  specific  solute–solvent  interactions,  the  main  effect  of  the  solvent  is  electrostatic  screening,  depending  on  its  dielectric  constant.  This  can  be  described  very  efficiently  by  continuum  solvation  models  (CSM).  

•     Spectroscopy  for  large  molecules.  Calculated  molecular  spectroscopic  properties  are  very  helpful  in  the  assignment  and  interpretation  of  measured  spectra,  provided  that  the  accuracy  is  sufficiently  high.  In  many  cases,  spectra  can  be  obtained  with  reasonable  accuracy  at  the  DFT  or  MP2  level,  but  ultraviolet  and  visible  spectra  normally  require  more  elaborate  theories  like  configuration  interaction  (CI)  which  are  only  applicable  to  very  small  molecules.  To  improve  on  the  currently  applied  semi-­‐empirical  approaches,  it  would  be  necessary  to  calculate  accurate  excitation  energies  also  for  large  molecules  such  as  organic  dyes  with  50  or  more  atoms.  One  option  is  to  implement  methods  for  division  of  large  systems  into  separate  fragments  to  be  calculated  with  quantum  chemical  methods  on  separate  nodes  in  parallel  (denoted  by  different  authors  as  divide-­‐and-­‐conquer  or  fragment  molecular  orbitals).  

•  Accurate thermochemistry. The highly accurate calculation of thermochemical data with quantum chemical methods is currently possible only for small molecules of up to about 10 atoms. However, much of the data for molecules of this size is already known, whereas accurate experiments for larger compounds are quite rare. Therefore, efficient quantum chemical methods are needed which are able to treat molecules with 30–50 atoms at the same level of accuracy. Another problem arises for large molecules: they often have high torsional flexibility, and the calculation of partition functions based on a single conformer is therefore not correct (see the sketch below). Quantum molecular dynamics could probably give better answers, but in many cases it is too expensive. This will clearly change in the coming decade.
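To make the conformer issue concrete, the toy calculation below uses hypothetical conformer energies to contrast counting only the lowest-energy conformer with a Boltzmann-weighted sum over several thermally accessible conformers; even this small ensemble shifts the free energy by roughly 0.4 kcal/mol, a sizeable fraction of the chemical-accuracy target.

```python
# Toy illustration of the single-conformer problem in thermochemistry.
# The conformer energies below are hypothetical, chosen only to show the size
# of the effect; a real flexible molecule can have far more conformers.
import math

kT = 0.593                                       # kcal/mol at roughly 298 K
conformer_energies = [0.0, 0.4, 0.7, 1.1, 1.5]   # relative energies (assumed)

# Conformational partition function: Boltzmann sum over accessible conformers.
q_conf = sum(math.exp(-E / kT) for E in conformer_energies)

# Free-energy contribution missed when only the lowest conformer (q = 1) is used.
dG = -kT * math.log(q_conf)
print(f"conformational free-energy contribution: {dG:.2f} kcal/mol")
# ~ -0.4 kcal/mol here, already a sizeable fraction of the 1 kcal/mol
# 'chemical accuracy' target, and it grows with the number of conformers.
```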

Challenge:  Photochemistry  

Sunlight is the predominant source of energy on Earth and a key factor in photosynthesis; it is intimately related to life. The in-depth understanding of the nature of electronic excited states in biological and other complex systems is unquestionably one of the key subjects in present-day chemical and physical sciences. It is well known that the interaction between light and matter has many important consequences in biological processes and in the development of advanced materials, ranging from the comprehension of physiological processes (e.g. vision) to the development of phototherapeutic drugs. There are wide-ranging technological applications of these processes: from the elaboration of molecular photoelectronic devices to the design of efficient solar cells, excited-state chemical synthesis and the quantum control of reactions, and on to possible applications in quantum computing and information processing using excited molecules.

It is a challenge to simulate realistic photo-activated processes of interest in biology and materials science. These phenomena usually involve non-adiabatic transitions among the electronic states of the system, induced by the coupled motion of electronic and nuclear degrees of freedom. Consequently, their simulation requires accurate ab-initio calculations of the (many) electronic states of the system and of the couplings among them, together with the non-adiabatic time evolution of its components. Several approaches have been developed recently to tackle these problems, but the techniques currently available are generally either not efficient or not accurate enough to provide a reliable tool for studying photo-physical processes in complex systems. The importance of the field will drive progress in the future.

4.2.3 Nanoscience  

State of research

Nanoscience and nanotechnology are typically understood as research and technology development at the atomic, molecular or macromolecular levels, on length scales of approximately 1–100 nanometres, i.e. involving from roughly one to one billion atoms. The field creates and uses structures, devices and systems that have novel properties and functions because of their small or intermediate size. The ability to control or manipulate matter on the atomic scale is an essential part of the field. Atomic details are still important: surface charge, impurities, dopants, vacancies, clusters, symmetry, step edges and corners, and passivation. A large number of simulation challenges and opportunities can be found in the broad topical areas of (i) nano building blocks (nanotubes and graphene, quantum dots, clusters and nanoparticles, organic materials, DNA), (ii) complex nanostructures and nano-interfaces, (iii) transport mechanisms at the nanoscale, and (iv) the assembly and growth of nanostructures.

Over the past 10 years, the focus in theory, modelling and simulation research has been on elucidating fundamental concepts that relate the structure of matter at the nanoscale to the properties of materials and devices. As a result, theory, modelling and simulation have played an important role in developing a fundamental understanding of nanoscale building blocks. Computational capability has increased by more than a factor of 1,000 over the past 10 years, leading to more ambitious simulations and wider use of simulation. For example, the number of atoms simulated by classical molecular dynamics for 10 ns time durations has increased from fewer than 10 million in 2000 to nearly 1 billion in 2010. Over the past decade, new theoretical approaches and computational methods have also been developed and are maturing. On the K-computer in Japan we witnessed at the end of 2011 a simulation41 – recognised by the Gordon Bell Prize – that achieved an execution performance of more than 3 Pflop/s. This simulation of the electron states of silicon nanowires with approximately 100,000 atoms (20 nanometres in diameter and 6 nanometres long) – close to the actual size of the materials – showed that the electron transport characteristics change depending on the cross-sectional shape of the nanowire.

Opportunities

Taking into account the establishment of an exascale infrastructure and the ongoing developments of the computational methods, for the first time in history there will be a direct overlap of experimentally and computationally accessible length scales. This creates unprecedented opportunities for science and technology to gain detailed knowledge of dynamic and transport processes at the nanoscale, moving in the direction of designer functionality and designer materials.

Challenges

Our understanding of self-assembly, programmed materials, and complex nanosystems and their corresponding architectures is still in its infancy, as is our ability to support design and nanomanufacturing. Two challenges are outlined below.

Challenge: Ab-initio quantum device simulation

The  advance  of  faster  and  less  energy-­‐consuming  information  processing  or  the  development  of  new  generations   of   processors   requires   the   shrinking   of   devices,   which   demands   a   more   detailed  understanding   of   nanoelectronics.   As   semiconductor   devices   become   smaller,   so   it   becomes  more  difficult  to  design  or  predict  their  operation  using  existing  techniques.  On  the  other  hand,  given  this  reduction   in   size,   the  next  generation  of   supercomputers  will  enable  us   to  perform  simulations   for  whole  practical  nanoscale  devices,  based  on  electronic  theory  and  transport  theory,  and  to  develop  

                                                                                                                         41  Y.  Hasegawa  et  al.,  Proceedings  of  2011  International  Conference  for  High  Performance  Computing,  Networking,  Storage  and  Analysis  (ACM  New  York,  2011)    

 

 Figure  4.3.  Spatial  distribution  of  local  density  of  states  for  the  phase  change  materials  Ge512Vac512Sb1024Te2048  in  a  supercell  of  4,096  sites.  In  the  left  lower  part  of  the  cube  the  chemical  information  and  in  the  upper  right  part  the  value  of  the  local  density  of  states  (DOS)  are  displayed.  Here,  large  (small)  radii  of  the  spheres  correspond  to  high  (low)  DOS  values  as  specified  in  the  right  panel.  For  both  parts  of  the  plot  Ge,  Vac,  Sb,  and  Te  are  shown  in  white,  transparent,  light  blue,  and  dark  blue,  respectively.  (A.  Thiess,  R.  Zeller,  P.  H.  Dederichs  &  S.  Blügel,  2012.)  

guidelines for designing new devices that incorporate the quantum effects that control nano-level phenomena. This requires the description of the temporal evolution of a switching quantum device with defects and leads. Envisaged are picosecond simulations – for example, based on time-dependent density functional theory – of 1,000,000 atoms of a spin-torque magnetic random access memory (MRAM), of nano-ionics-based resistive switching memories, or of organic electronics, all possible for the first time with the advent of exascale computing.

Challenge:  Design  of  nanostructures  

Self-assembly is a central feature of nanoscience. Understanding, predicting and controlling this process is crucial for the design and manufacture of viable nanosystems. Clearly the subsystems involved in this process assemble themselves according to some minimum-energy principle. Once an understanding of the underlying physics is available, optimisation problems can be formulated to predict the final configurations. Since these systems are also huge and likely to have many local minima, careful development of the models, the constraints and the algorithms will also be required here. The description is pursued within a QM/MM context.
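As a toy illustration only of the kind of global-optimisation problem this implies, the sketch below uses basin hopping (as implemented in SciPy) to search the many-local-minima energy landscape of a small Lennard-Jones cluster; the cluster size, the pair potential and the optimiser settings are arbitrary stand-ins for the far richer interactions and constraints of real self-assembling nanosystems.

```python
# Toy global optimisation of a small Lennard-Jones cluster with basin hopping
# (scipy.optimize.basinhopping), standing in for the far harder structure
# prediction of self-assembled nanosystems. N, the potential and the optimiser
# settings are arbitrary illustrative choices.
import numpy as np
from scipy.optimize import basinhopping

N = 7  # number of atoms (assumed)

def lj_energy(x):
    """Total Lennard-Jones energy; x holds the 3N Cartesian coordinates."""
    pos = x.reshape(N, 3)
    diff = pos[:, None, :] - pos[None, :, :]
    r2 = (diff**2).sum(axis=-1)
    iu = np.triu_indices(N, k=1)            # unique atom pairs only
    inv6 = 1.0 / r2[iu]**3
    return float(np.sum(4.0 * (inv6**2 - inv6)))

rng = np.random.default_rng(0)
x0 = rng.normal(scale=1.0, size=3 * N)      # random starting geometry
result = basinhopping(lj_energy, x0, niter=200,
                      minimizer_kwargs={"method": "L-BFGS-B"})
# The known global minimum of the 7-atom Lennard-Jones cluster is about -16.5
# (reduced units), which gives a quick sanity check on the search.
print("lowest energy found:", result.fun)
```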

4.3 A  Roadmap  for  Capability  and  Capacity  Requirements  

In the past decade, the fundamental techniques of theory, modelling and simulation that are relevant to materials science have undergone a stunning revolution. This has made the community an intense user of computing at all levels, driven by the change from the Tflop/s to the Pflop/s level. The field has produced several Gordon Bell Prize winners for the 'fastest application code' – in 1988 (1 Gflop/s), in 2001 (11 Tflop/s) and most recently in 2011 (3.08 Pflop/s) on the K-computer in Japan for the electronic structure of silicon nanowires. But, impressive as the increase in computing power has been, it is only part of the overall advance in simulation that has occurred over the same period. Advances in the period include the following.

• New  mesoscale  methods  (including  dissipative  particle  dynamics  and  field  theoretic  polymer  simulation)  have  been  developed  for  describing  systems  with  long  relaxation  times  and  large  

spatial scales, and these are proving useful for the rapid prototyping of nanostructures in multicomponent polymer blends. Here the requirement is for larger simulation times and larger systems.

• Molecular dynamics with fast multipole methods for computing long-range inter-atomic forces has made possible accurate calculations of the dynamics of millions and sometimes billions of atoms. The requirement here is also for larger simulation times and larger systems. Molecular dynamics simulations are important in soft matter but also in the life sciences.

• Monte Carlo methods for classical simulations have undergone a revolution, with the development of a range of techniques (e.g. parallel tempering, continuum configurational bias and extended ensembles) that permit extraordinarily fast equilibration of systems with long relaxation times (a minimal parallel-tempering sketch is given after this list). Recently, Monte Carlo methods have been combined with DFT methods to determine materials-specific thermodynamic properties.

• Density functional theory (DFT) and extensions such as ab-initio 'Car-Parrinello' molecular dynamics (ab-initio MD) and time-dependent DFT (TDDFT) have transformed materials physics and likewise computational chemistry, surface science and nanoscience. These methods provide the capability to describe the electronic structure, interatomic forces and in part also the electronic excitations of molecules and condensed media containing hundreds or thousands of atoms in a computational volume that might be periodically repeated (see Figure 4.3), together with their static and dynamical structural properties. A large variety of DFT methods42,43 have been developed that are able to cope with the chemical challenge of the periodic table, the heterogeneity of systems and the structural and compositional diversity, i.e. large classes of molecules and materials can now be described with reliable predictability. Applying the popular local density approximation (LDA), the accuracy of calculated energy differences between equilibrium states is estimated at about 3 meV/atom (~0.1 kcal/mol), 0.2% for a charge density difference, and an atomic force difference of 10^-5 atomic units. Ab-initio MD has extended its field of applications through the development of algorithms to capture rare events. In particular, a number of linear-scaling methods have been developed or are under development, some of which are becoming efficient for system sizes larger than 10,000 atoms. These are geared towards efficient use of massively parallel computers, are likely to be extendable to exascale computing and can bring the computable system sizes to new horizons. In the past 10 years we have witnessed a drive to extend the applicability of DFT to wider classes of systems exhibiting strong electron correlations (oxide materials, defects, partially filled d- and f-electron systems) or long-range correlation, as exemplified by the van der Waals interaction. Modern approaches to improve the description by better exchange-correlation functionals are based on orbital-dependent quantities, such as hybrid functionals, range-separated functionals or a separate treatment of exchange and correlation (e.g. exact exchange plus the random-phase approximation). These functionals improve the predictability of properties (e.g. the enthalpy of formation of molecules is on average better than 3 kcal/mol for the B3LYP functional), demanding some 10–100 times more CPU time. Parallelisation on massively parallel computers seems non-trivial, while local accelerators attached to a CPU would appear beneficial.

• Beyond  DFT,  Hedin’s  GW  approximation  based  on  many-­‐body  perturbation  theory  has  been  implemented  in  many  electronic  structure  codes  and  has  found  widespread  use  in  calculating  spectroscopic  properties.  Originally  used  primarily  to  calculate  the  band  gap  of  semiconductors,  access  to  increased  computational  resource  and  diversity  of  methods  has  seen  GW  recently  applied  to  study  surfaces,  nanostructures  and  molecules.  A  detailed  comparison  of  the  self-­‐energy  in  the  GW  context  with  the  exchange-­‐correlation  functional  in  DFT  is  expected  to  pave  the  way  for  further  improvement  of  functionals.  This  interplay  of  different  approaches  to  

                                                                                                                         42  Yousef  Saad,  James  R.  Chelikowsky  &  Suzanne  M.  Shontz,  SIAM  Review,  vol.  52,  pp.  3–54  (2010)  43  http://www.psi-­‐k.org/codes.shtml  

correlated  systems  has  already  been  exploited  in  the  realm  of  TDDFT.  

• Wavefunction-based schemes are the norm for the quantum chemistry community. They allow a systematic approach to the exact solution of the electronic Schrödinger equation and in this way offer a hierarchy of methods useful for estimating error bars of simpler approximations. Also, traditional DFT methods are not capable of correctly predicting the breaking of chemical bonds or of describing the dark excited states that control photochemistry. However, standard wavefunction-based methods have computational costs that rise steeply with the number of atoms in the molecule. Much effort is focused on numerical efficiency and parallelisation. The treatment of 'static' correlation is often centred on the (usually unattainable) full configuration interaction (CI) method. In practice, CASSCF and CASPT2 are among the most successful methodologies, with multi-configurational perturbation theory (PT2) adding the necessary dynamical correlation. For 'dynamical correlation' (or single-reference) problems, quantum chemists have developed Coupled Cluster (CC) methodologies such as CCSD(T), or perturbation theories such as MP2. In recent years, the range of systems amenable to highly accurate CC calculations has increased dramatically. Although parallelisation on massively parallel computers is difficult, a breakthrough can be expected in the next decade with the application of tensor-network theory and tensor approximations to quantum chemistry. The introduction of linear-scaling variants of many quantum-chemical methods and of the computation of various molecular properties has circumvented the steep increase of computational effort with molecular size. These methods exploit the local electronic structure and open the way to treating large molecular systems with 1,000 atoms and more at the Hartree–Fock, DFT and MP2 levels. This community would appear to be best served by a diverse mix of architectures, including those with computationally powerful fat nodes.

• The  interest  in  Quantum  Molecular  Dynamics  continues  to  grow.  The  standard  method  of  solving  the  Schrödinger  equation  uses  a  representation  of  the  wavepacket  and  Hamiltonian  in  an  appropriate  product  basis.  The  method  is  restricted  by  the  computational  resources  required,  which  grow  exponentially  with  the  number  of  degrees  of  freedom.  The  treatment  of  tetra-­‐atomic  systems  is  now  becoming  the  state  of  the  art,  but  studies  of  systems  with  more  than  six  degrees  of  freedom  are  in  general  still  impossible.  The  multi-­‐configuration  time-­‐dependent  Hartree  (MCTDH)  algorithm,  which  corresponds  to  a  multi-­‐configurational  mean-­‐field  method,  does  not  overcome  the  exponential  scaling  but  significantly  alleviates  the  problem  due  to  the  construction  of  a  variationally  optimised  moving  basis.  MCTDH  is  arguably  today's  most  powerful  wavepacket  propagation  method,  and  it  can  be  applied  for  systems  typically  involving  20–50  degrees  of  freedom.  The  exponential  scaling  can  be  avoided  by  turning  to  more  approximate,  in  particular  to  semi-­‐classical,  methods,  where  the  wavepacket  is  approximated  by  an  ensemble  of  particles  that  follow  classical  trajectories  (e.g.  ab-­‐initio  path  integral  molecular  dynamics).  A  considerable  range  of  theoretical  methods  is  applied  to  tackle  these  systems.    

• Quantum  Monte  Carlo  (QMC)  methods  now  promise  to  provide  nearly  exact  descriptions  of  the  electronic  structures  of  molecules.  Traditionally,  these  methods  have  been  based  on  the  variational  MC  method,  or  diffusion  MC  and  Green's  function  MC.  The  latter  two  are  projection  approaches,  which  dispense  with  quantum  chemical  basis  sets  but  have  to  deal  with  the  Fermion  sign  problem  and  the  related  fixed-­‐node  approximation.  In  general,  all  QMC  methods  exhibit  good  scaling  with  the  number  of  electrons,  enabling  the  treatment  of  relatively  large  systems,  but  with  a  computational  cost  much  larger  than  traditional  ab-­‐initio  methods  based  on  DFT.    Unlike  wavefunction-­‐based  quantum  chemical  methods,  they  are  essentially  stochastic  in  the  way  they  seek  to  solve  the  electron  correlation  problem  exactly  and  thus  benefit  significantly  from  massively  parallel  computer  architectures  that  become  effective  in  Pflop/s  and  Eflop/s  computing.  This  is  one  reason  why  their  importance  will  increase  during  the  next  decade.    

• During  the  last  few  years  conventional  electronic  structure  calculations  based  on  DFT  in  the  local  density  approximation  (LDA)  have  been  merged  with  a  modern  many-­‐body  approach  –  the  

dynamical  mean-­‐field  theory  (DMFT)  –  into  a  novel  computational  method  referred  to  as  LDA+DMFT  to  address  strongly  correlated  electron  systems.  The  solution  of  the  effective  multiband  impurity  problem  is  the  main  point,  achieved  by  a  quantum  impurity  solver  –  typically  the  Hirsch–Fye  QMC  algorithm  based  on  a  time  discretisation  approach.  A  new  generation  of  continuous-­‐time  Quantum  Monte  Carlo  (CT-­‐QMC)  methods  for  numerically  exact  calculation  of  complicated  fermionic  path  integrals  has  recently  been  proposed  for  interacting  electrons  based  on  the  weak-­‐coupling  and  strong-­‐coupling  perturbation  expansion.  This  methodological  breakthrough  in  the  quantum  many-­‐body  theory  will  stimulate  the  realistic  modelling  of  the  electronic,  magnetic,  orbital  and  structural  properties  of  materials  such  as  transition  metals  and  their  oxides.  It  still  needs  considerable  developments  to  be  able  to  treat  increasingly  complex  systems.  
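As noted in the Monte Carlo item above, the following purely illustrative sketch shows the structure of a parallel-tempering (replica-exchange) simulation on a one-dimensional double-well potential; the temperature ladder, step size and run length are assumed values, and the example stands in for the vastly larger state spaces of real materials problems.

```python
# Minimal parallel-tempering (replica-exchange) Monte Carlo sketch on a
# one-dimensional double-well potential; all parameters are illustrative.
import numpy as np

def V(x):                       # double-well potential with minima at x = +/- 1
    return (x**2 - 1.0)**2

temps = np.array([0.05, 0.1, 0.2, 0.4, 0.8])   # temperature ladder (assumed)
betas = 1.0 / temps
x = np.full(len(temps), -1.0)                  # all replicas start in left well
rng = np.random.default_rng(1)
visits_right = np.zeros(len(temps))

for step in range(20000):
    # Local Metropolis move in every replica
    prop = x + rng.normal(scale=0.2, size=x.shape)
    accept = rng.random(x.shape) < np.exp(-betas * (V(prop) - V(x)))
    x = np.where(accept, prop, x)
    # Attempt to swap a pair of neighbouring temperature levels
    i = rng.integers(len(temps) - 1)
    d = (betas[i] - betas[i + 1]) * (V(x[i]) - V(x[i + 1]))
    if rng.random() < np.exp(min(0.0, d)):
        x[i], x[i + 1] = x[i + 1], x[i]
    visits_right += (x > 0)

# Thanks to the exchanges, the cold replica crosses the barrier far more often
# than it would alone.
print("fraction of time spent in the right well:", visits_right / 20000)
```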

4.3.1 Materials Science

In materials science all these methods come into use as elements of multiscale materials design, where the design of a material includes all aspects from functionality to manufacturing.

Steel is an example of a seven-component alloy where the macroscopic properties depend on the microscopic properties of seven chemical constituents that determine the properties of the mesoscopic structure. Catalysts, device elements, smart materials and composite materials are other examples where materials screening will develop into a materials genome project, in which some 250,000 different materials systems have to be calculated to generate a database. This requires massive capacity computing with high throughput requirements – a factor of 1,000 improvement in throughput speed is necessary to make materials informatics a powerful and widely used tool.
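The capacity-computing character of such a screening campaign can be pictured as a task farm of independent jobs. The sketch below is a toy driver only: evaluate_candidate and the element list are hypothetical placeholders for a real electronic-structure workflow, and a genuine campaign of some 250,000 systems would of course run through batch and workflow systems rather than a single Python process pool.

```python
# Toy task-farming driver for a materials-screening campaign: each candidate
# composition is an independent job, so throughput (capacity computing) matters
# more than single-job capability. evaluate_candidate is a hypothetical stand-in
# for a real electronic-structure calculation.
from concurrent.futures import ProcessPoolExecutor
import itertools, json

def evaluate_candidate(composition):
    """Placeholder for a DFT-level property calculation (hypothetical)."""
    # A real workflow would launch an electronic-structure code here and
    # parse its output; we only return a dummy record.
    return {"composition": composition, "formation_energy": None}

# Hypothetical screening space: ternary combinations of a few elements.
elements = ["Fe", "Cr", "Ni", "Mn", "Co", "Al", "Si"]
candidates = list(itertools.combinations(elements, 3))

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        database = list(pool.map(evaluate_candidate, candidates))
    print(json.dumps(database[:3], indent=2), "...", len(database), "entries")
```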

4.3.1.1 Strongly  Correlated  Electron  Materials  

LDA+DMFT represents a novel and extremely promising approach to treat strongly correlated electron materials realistically and to compute their properties. There is strong motivation to develop this tool towards predictive power, and this is a major effort for the decade ahead. Today, an in-depth analysis of one system with three to five orbitals may take two years on a rack (4,096 cores) of a Pflop/s high-performance computer. The CT-QMC algorithm has the potential to push back the sign problem beyond seven orbitals and to reach lower temperatures that cannot be studied today owing to the lack of CPU time. Switching to an exascale computer with 500 MByte per core will allow a much larger throughput of systems, which is absolutely necessary to scan the properties of these materials as a function of temperature, pressure and other external stimuli. Higher throughput is also necessary to validate the accuracy of the underlying model and to engage more people in this research. Only under these conditions can we use these methods in a materials science approach and unravel the secrets of this materials class for use in the context of functional design. Since the time-consuming algorithm is stochastic in nature, the methods can make use of massively parallel computers. They are expected to scale to exascale computing and are certainly an application for a Tier-0 infrastructure. If more memory per core is available, different impurity solvers can be employed that are more robust against the fermionic sign problem. Considering the many materials that exhibit magnetism, superconductivity, multiferroicity, orbital ordering, the Mott transition or the Kondo effect – in bulk, at surfaces and interfaces, and in molecular crystals – this becomes a major activity once the CPU time is available. The modestly estimated CPU time is 30 Pflop/s of Tier-0 access throughout the year.

4.3.1.2 Soft  Matter    

Progress   in   computational   studies   of   soft-­‐matter   systems   requires   the   possibility   to   investigate  systems  that  are  more  complex.  Increasing  complexity  typically  requires  much  larger  system  sizes.  In  many   cases,   due   to   increased   relaxation   times,   it   also   requires   much   longer   runs.   In   the   past,  progress   has   been  possible   on   the  one  hand  due   to   improved   simulation   codes,   and  on   the  other  

hand  due  to  the  enormous  increase  in  computer  power.  Since  many  codes  are  already  very  efficient,  the  field  relies  heavily  on  the  future  increase  of  computer  capability.    

Today’s   available   computer   resources   are   nowhere   near   adequate   to   simulate   sufficiently   realistic  chemical   models   of   long   polymer   chains   in   a   melt   long   enough   to   achieve   predictions   for   real  systems:   this   would   require   an   increase   in   the   power   of   present-­‐day   computers   by   two   to   three  orders  of  magnitude.    

An  exascale  infrastructure  would  allow  access  to  important  new  areas  of  investigation.  These  include  the  following:  

• The  suppression  of  turbulence  in  liquids  by  addition  of  polymers  (‘turbulent  drag  reduction’)    

• The  prediction  of  the  flow  behaviour  of  blood  from  the  squeezing  of  a  single  red  blood  cell  through  narrow  capillaries  to  the  streaming  through  arteries,  membrane  fluctuations  and  membrane  functions  including  their  interaction  with  membrane  proteins  and  the  realistic  consideration  of  the  surrounding  water    

• The  structure  building  and  function  of  molecular  aggregates  as  well  as  the  connection  of  the  variable  conformation  of  macro-­‐molecules  with  functional  groups  (e.g.  chromophores  in  fluorescent  polymers)  and  their  electronic  properties  

The latter examples are closely related to other fields of computational science, including catalysis research, quantum chemistry and fluid mechanics. In order to stay competitive in the aforementioned fields and to take part in the developments described, the available computational power has to increase by orders of magnitude, and access to it at national and international levels is indispensable. Serious consideration must also be given to the provision of a special-purpose computer for long MD runs.

A   crucial   aspect   is   the  optimal   and  efficient  use  of  massively  parallel   supercomputers   for   this   very  broad   range   of   complex   problems   in   soft   matter.   The   ‘know-­‐how’   on   suitable   parallelisation  strategies  must  be  strengthened  and  expanded.  

4.3.2 Chemistry

Considering the great importance of non-adiabatic processes, from photoinduced charge transfer in energy harvesting and solar cells to femto- and atto-chemistry, time-dependent density functional theory in combination with Ehrenfest dynamics44 and quantum molecular dynamics will be used more extensively in the future. Since these methods require considerable computational resources, considerable benefit will arise from an exascale environment, expanding usage to a wider community. Estimated CPU time: ~10 Pflop/s.

4.3.3 Nanoscience

Nanoscience has benefited considerably from DFT, Car-Parrinello MD and the associated DFT tool development. Nanoscience at '1 nm' length scales (i.e. a few thousand atoms) is achievable today on a few thousand processors, with a cubic scaling of the CPU time with system size. As a rule of thumb (with many exceptions), 1 processor and 2 GB of memory per atom shows good scalability, and the scalability limit is a few processors per atom. Density functional investigations at the nanoscale, using existing methods, have large aggregate CPU demands (billions of CPU hours per year). Nano-structure calculations require computation of the structures, energetics and dynamics of highly inhomogeneous systems. As is typical for this field, many runs of a similar kind are often required to reach a conclusion.

As   an   example,   one   ab-­‐initio   MD   simulation   of   1,000   atoms   simulated   with   20,000   molecular  dynamics  steps  (1  time  step  is  equivalent  to  0.5  fsec,  total  simulation  time  10  psec)  requires  about  2  weeks  of  simulation  time  on  4,096  processors.  Section  7.7  reveals  about  6,000  DFT  publications  per  

                                                                                                                         44  10  times  higher  numerical  effort  than  ab-­‐initio  molecular  dynamics,  typically  large  systems  of  a  few  hundred  atoms  

year in Europe, about 750 of which are publications resulting from ab-initio MD. Estimating crudely that five simulations of this type lead to one publication, ab-initio MD currently requires around 1.875 Pflop/s of sustained computer power throughout the year. The aim of computational nanoscience is to overlap more realistically with the stunning experimental advances of the field, which produce experimental details that require quantitative analysis.

This means we have to deal with larger length scales and longer simulation times (estimated factor of increase in floating-point operations: 1,000), more configurations – different isomers, compositions and atomic distributions (estimated factor: 10) – and greater precision in approximations to DFT, in particular for the excited state (estimated factor: 100). One then arrives at the rather shocking estimate that, even after the establishment of an exascale infrastructure, the available computer time will be insufficient and only the most excellent research proposals can be funded at a European exascale facility.
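The sustained-power figure of 1.875 Pflop/s quoted above, and the conclusion that even an exascale system would be oversubscribed, follow from simple bookkeeping. The sketch below reproduces them; note that the per-processor sustained rate is an assumption (the text does not state one), chosen at roughly 3.2 Gflop/s so that the numbers come out consistent.

```python
# Reproducing the sustained-power bookkeeping behind the figures quoted above.
# The per-processor sustained rate is an assumption (the text does not give
# one); roughly 3.2 Gflop/s per processor makes the numbers consistent.
publications_per_year = 750        # ab-initio MD publications in Europe
simulations_per_paper = 5          # crude estimate from the text
weeks_per_simulation = 2           # 1,000 atoms, 20,000 MD steps
processors_per_simulation = 4096
weeks_per_year = 52

processor_weeks = (publications_per_year * simulations_per_paper
                   * weeks_per_simulation * processors_per_simulation)
processors_busy_all_year = processor_weeks / weeks_per_year        # ~5.9e5

sustained_rate_per_processor = 3.2e9                               # assumed
current_need = processors_busy_all_year * sustained_rate_per_processor
print(f"current sustained need: {current_need/1e15:.2f} Pflop/s")  # ~1.9

# Applying the demand factors quoted above (1,000 x 10 x 100):
projected = current_need * 1000 * 10 * 100
print(f"projected sustained need: {projected/1e18:.0f} Eflop/s")   # ~1,900
```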

The conventional DFT codes will not generally run on 10^5 or more processors, and this becomes even more difficult with the addition of orbital-based functionals to obtain higher predictability. The order-N methods developed over the past years are designed to scale on massively parallel computers and are expected to have the potential to scale up to exascale. Adapting or redesigning these codes for such architectures is not only time-intensive but a challenge in itself.

In  summary,  there  is  no  doubt  that  the  materials  science,  chemistry  and  nanoscience  community  in  Europe  requires  a  large  allocation  of  CPU  time  that  will  exceed  1  Eflop/s.  There  is  a  large  demand  on  Tier-­‐0   capability   computing   for   dynamical   mean-­‐field   theory,   (ab-­‐initio)   molecular   dynamics,  multiscale  and  device  simulation.    

Obviously,   a   European   exascale   environment   must   take   into   account   that   this   field   also   requires  capacity   computing   to   search   the   immense   phase   space   of   opportunities.   Therefore,   a  heterogeneous  infrastructure  best  serves  this  field.    

We   re-­‐emphasise   that   a   critical   requirement   of   this   community   is   the  optimal   and   efficient   use  of  massively  parallel  supercomputers  for  this  very  broad  range  of  complex  problems  in  soft  matter.  The  ‘know-­‐how’  surrounding  suitable  parallelisation  strategies  needs  to  be  strengthened  and  expanded.  

 

4.4 Expected Status in 2020

Implementing existing codes or developing new codes for an exascale facility is an enormous challenge that will be addressed. An analysis of the number of papers published over time by the simulation- and computation-oriented communities in Europe reveals a linear or faster growth in the number of active scientists and published papers. Analysing the increasing number of faculties in Europe in the fields of materials science, physics, applied physics, chemistry and the engineering sciences clearly shows that the growth of computer power – a factor of 1,000 every 10 years – accelerates progress to the point where simulation becomes a driving factor for materials discovery and innovation. Indeed, it becomes the driving force in a design continuum from fundamental discovery through systems engineering and processing to technology. By 2020, full device simulations from first principles will become possible. Across Europe we will have more graduate schools of simulation sciences. In 2020 we can expect to reach the 'simulation laboratory' paradigm, where core developers of community codes are in contact with a European exascale facility and at the same time educate and train the community on codes for capability computing, enabling them to perform simulations using these codes. The innovation and design of new materials, processes, devices and technology will speed up dramatically. Labour-intensive experimental trial and error will become much more efficient owing to computational pre-screening. Materials informatics will gain traction. Designing complex materials systems based on knowledge of structural, mechanical, chemical, optical, dielectric, electric and magnetic properties will significantly influence the integration of knowledge and communication, the progress of medical analysis capabilities, solutions to energy and environmental quests, and the way our society develops into the future.

5 LIFE SCIENCES AND MEDICINE

5.1 Summary  

The life sciences community is very diverse and there is an important imbalance between the large community of experimental biologists (who strongly depend on computational results) and the small community of computational biologists (who rely heavily on HPC resources). For this reason, the work of computational biologists has a 'multiplicative' effect on the life sciences. As an example, it has been determined that a discontinuation of access to biological databases would block most research pipelines in biomedicine within a couple of days. Despite the relatively small size of the 'in-silico' biology community, its impact on the life sciences is enormous. The primary goal of computational biology and bioinformatics is to understand the mechanisms of living systems. With recent experimental advances in this area (e.g. the next generation of DNA sequencing instruments), the data generated are becoming larger and more complex. In contrast to other communities, there are no universal software packages, and software evolves very fast to adapt to new instruments. The problems faced by scientists working in molecular simulation and in genomics are also very different, as are the computer algorithms used. Fast and flexible access to very large computational resources is crucial in the many fields of the life sciences, and the lack of suitable computers can block entire projects, with important consequences for science and for society.

Opinions  with  respect  to  extreme  computing  (exascale  computing)  are  unanimously  favourable,  but  opinions   about   single-­‐machine   Eflop/s   computing  were   less   enthusiastic.  While   Eflop/s  machines  are  a  major  requirement  for  specific  issues  (e.g.  brain  simulation),  and  higher  computational  power  will   enable   significantly   increased   accuracy   for   current  modelling   studies,   some   highly   important  fields  in  life  science  will  be  mainly  limited  by  throughput  and  data  management.  Therefore  a  single-­‐minded   focus  on  achieving  high   flop  rates   in   individual   runs,   rather   than  application   results,   could  seriously  damage  European  research  in  these  areas.  

Four main areas in life sciences and health that require HPC are described below: genomics, systems biology, molecular simulation and biomedical simulation. These four fields are strongly related to the pharmaceutical and biotechnology sectors, but also to other economically important areas such as food (agriculture), the environment (biotoxicity) and energy (biofuels).

 

5.2 Computational  Grand  Challenges  and  Expected  Outcomes  

5.2.1 Genomics

In genomics research we face problems involving the management of massive amounts of data (e.g. the sequencing of 2,500 genomes of cancer patients) in programs that can require hundreds of thousands of processors but little inter-processor communication. However, the vast amount of data to be managed (often combined with confidentiality and privacy aspects) hampers the use of cloud or grid-computing initiatives as a general solution. Suitable and flexible access to computer resources is crucial in this area. Currently known cornerstones for an exascale system (number of computer nodes, I/O and memory capacities) are clearly driving the race for Eflop/s peak performance. For most of the genomics challenges, such an Eflop/s computer could be even less 'balanced' than today's HPC systems and this would constitute a substantial barrier to using it efficiently.

The   fast   evolution   of   genomics   is   fuelling   the   future   of   personalised   medicine   (see   Figure   5.1).  

Genetic variability affects how drugs act in each patient, sometimes in a positive manner (increasing the healing effect), sometimes in a negative manner (increasing toxic side effects) or simply by reducing the drug response. Personalised medicine is a concept that will replace the outdated idea that a single drug is the solution for an entire population. It will develop specific solutions for segments of the population characterised by given genetic factors, or even for individuals. Thanks to recent advances in high-throughput genome sequencing, we can already access the full genomic profile of a patient in a single day, and the throughput of next-generation sequencing techniques is increasing much faster than Moore's law. Currently, sequencing centres require multi-PByte systems to store patient data, and data processing is carried out on supercomputers in the 100 Tflop/s to 1 Pflop/s range. Requirements are expected to increase dramatically as sequencing projects are extended to entire populations, making linkage studies possible.

 

   

Figure  5.1.  Pharmacogenomics  aims  to  identify  patients  at  risk  for  toxicity  or  reduced  response  to  therapy  prior  to  medication  selection.  (S.  Marsh  &  H.  L.  McLeod,  Hum.  Mol.  Genet.  15,  R89–R93  (2006).)  

5.2.2 Systems Biology

Some diseases cannot be understood at the gene level (genomics) but only in a more complex, pathway context. Drug effects are similarly studied at the systems biology level. Disease-associated networks containing several proteins have been reported as possible causes of disease.

Furthermore,   perturbation   of   biological   networks   is   a   major   underlying   cause   of   adverse   drug  effects.  Detailed  knowledge  of  the  structure  and  dynamics  of  biological  networks  will  undoubtedly  uncover   new   pharmacological   targets.   Intense   research   is   being   carried   out   today   to   develop  models  for  identifying  protein  network  pathways  that  will  help  to  understand  the   undesired  effects  of   drugs  and  explore   how   they   are   related   to  network  connectivity   (see   Figure  5.2).   The  use  of  complex  network  medicine  is  expected  to  have  a  dramatic  impact  on  therapy  in  several  areas:  the  discovery   of   alternative   targets;   reducing   toxicity   risks   associated   with   drugs;   opening   new  therapeutic   strategies   based   on   the   use   of   ‘dirty’   drugs   targeting   different   proteins;   helping   to  discover  new  uses  for  existing  drugs.    Systems  biology  is  now  at  the  stage  of  collecting  data  to  build  models  for  complex  simulations  that  will,   in  the  near  future,  describe  the  dynamics  of  cells  and  organs  that  presently  remain  unknown.  The  models  that  are  developed  today  are  stored  in  databases.  Progress  is  rapid  and  systems  biology  

will   allow   us   to   couple   the   simulations   of   the   models   with   a   biomedical   problem   (e.g.   monitor  mutations   in  a  specific  genome  that  can  change  the  activity   of   a   protein).   This   will   require   large  computational   resources   and   systems   biology   will   benefit   from   Eflop/s   capabilities,   but   aspects  related  to  data  management  are  going  to  be  as  important  as  pure  processing  capability.  

Figure  5.2.  Development  of  models  that  can  be  used  to  do  drug  re-­‐profiling  and  to  simulate  in-­‐silico  drug  toxicity.  (Patrick  Aloy  et  al.,  IRBB.)  

5.2.3 Molecular Simulation

Eflop/s capabilities will allow the use of more accurate formalisms (more accurate energy calculations, for example) and enable molecular simulation for high-throughput applications (e.g. the study of larger numbers of systems). Unfortunately, if Eflop/s capabilities are achieved simply by aggregating a vast number of slow processors, this will not favour studies of longer timescales, since it will not be possible to scale up to hundreds of thousands of cores (as the simulated systems typically have fewer than 1 million atoms).

The   needs   of   the   molecular   simulation   field   will   probably   be   better   served   by   a   heterogeneous  machine,   with   hierarchical   capabilities   in   terms   of   the   number   of   cores,   the   amount   of  memory,  memory  access  bandwidth  and   inter-­‐core   communication.   This   should  be  contrasted  with   current  ideas  regarding  a  ‘flat’  machine  with  peak  Eflop/s  power.  Exascale  capability  will,  however,  facilitate  biased-­‐sampling   techniques,   which   require   parallel   computing,   enabling   in-­‐silico   experiments  unreachable  today.    Examples   include   the   proteome-­‐scale   screening   of   chemical   libraries   to   find   new   drugs   and   the  study   of   entire   organelles,   or   even   cells,   at   the   molecular   level.   In   most   of   these   cases,  parallelisation   is  expected  to  be  hierarchical   (e.g.  ensemble  simulations,  multiscale  modelling,  or  a  mix  of  parallelisation  and  high  throughput).  Molecular   simulation   is   a   key   tool   for   computer-­‐aided   drug  design.   The   lack   of   high-­‐performance  computers   appropriate   for   this   research   will   displace   R&D   activities   to   the   USA,   China   or   Japan,  putting  European  leadership  in  this  field  at  risk.  Computational  simulation  of  biomolecules  provides  a  unique  tool  to  link  our  knowledge  of  the  fundamental  principles  of  chemistry  and  physics  with  the  behaviour  of  biological  systems  (see  Figure  5.3).  

 

 

 Figure  5.3.  Multiscale  molecular  simulation  in  life  sciences.  

 Appropriate   exascale   resources   could   revolutionise   this   area,   allowing   molecular   simulators   to  decipher   the   atomistic   clues   to   the   functioning   of   living   organisms.   Certain   grand   challenge  problems  in  this  area  fit  well  to  the  conventional,  general-­‐purpose,  exascale  development  roadmap.  However,   other   vital   problems   in   the   field   will   be   addressable   only   through   the   development   of  novel   architectures,   not   by   huge   machines   with   very   large   theoretical   peak   power   but  limited  efficiency   for   the   applications   of   interest.   This   is   already   at   an   advanced   stage   in   the   USA   and  Japan,  and  there  is  an  extreme  danger  that  Europe  will  be  left  behind.  

5.2.4 Biomedical  Simulation  

We envision projects such as the simulation of the brain (see Figure 5.4), organ and tissue modelling, and in-silico toxicity prediction.

In these areas, Eflop/s capabilities will be a necessary, but not sufficient, requirement, since the integration of experimental information, human interaction with calculations and the refinement of underlying physical models will also be instrumental for success. As in the case of molecular simulation, multiscale modelling is one of the major challenges of this area and represents one of the major cross-cutting issues of exascale systems for life sciences.

The extensive use of simulation will allow significant improvements in the quality and quantity of research in this area. Simulation will help to integrate knowledge and data on the body, tissues, cells, organelles and biomacromolecules into a common framework that will facilitate the simulation of the impact of factors that perturb the basal situation (drugs, pathology, etc.).


Figure 5.4. Human brain simulation timeline. (Felix Schürmann & Henry Markram, Blue Brain Project, EPFL.)

Biomedical  simulation  will  reduce  costs,  time  to  market  and  animal  experimentation.  In  the  medium  to  long  term,  simulation  will  have  a  major  impact  on  public  health,  providing  insights  into  the  cause  of   diseases   and   allowing   the   development   of   new   diagnostic   tools   and   treatments.   In   parallel,  simulations   will   have   a   major   impact   on   information   technology.   Thus,   it   is   expected   that  understanding   the   basic   mechanisms   of   cognition,   memory,   perception,   etc.,   will   allow   the  development  of  completely  new  forms  of  energy-­‐efficient  computation  and  robotics.  The  potential  long-­‐term  social  and  economic  impact  is  immense.  

5.3 A  Roadmap  for  Computational  Requirements  

The priorities set out by the experts include new techniques for (i) data management and large storage systems (increase of shared memory capacity), (ii) interactive supercomputing, (iii) data analysis and visualisation, (iv) multi-level simulation, and (v) training. As life sciences and health form such a heterogeneous field, it will be necessary to have several application-oriented initiatives developed in parallel, although they can share similar agendas. A flexible access protocol to Tier-0 resources will be as important as absolute computing power for this community. The following specific points were highlighted by the panel.

Competence Centre. The life sciences panel is eager to apply the model of the USA co-design centres focused on exascale physics applications, such as the Centre for Exascale Simulation of Advanced Reactors (CESAR), the Co-Design for Exascale Research in Fusion (CERF), the Flash High-Energy Density Physics Co-Design Centre, and the Combustion Exascale Co-Design Centre (CECDC). A centre with academic and industrial participation focused on life sciences and health will be instrumental in facilitating efficient use of PRACE Tier-0 resources in areas such as tissue and organ simulation, molecular dynamics, cell simulation, genome sequencing and personalised medicine.


Considering the complex nature of the bio-computational field, only a powerful competence centre will guarantee compatibility between research needs in the area and the new generation of exascale computers.

Capability and Capacity. The panel fails to recognise the real relevance of the debate between 'capability' and 'capacity'. Some bio-problems will require access to multi-Pflop/s computers, others will largely benefit from special-purpose or hybrid machines, while others will necessarily require exascale resources. Doubts exist, however, as to whether a massive computer created by aggregating a vast number of slow cores would be beneficial for most of the life science community. We believe that obtaining large peak performance at the expense of a loss of balance would be an error. It should be stressed that bio-problems will require not only Eflop/s calculation power but also exadata management capabilities.

Software for Extreme Computing. While most of the software in use today could be used at the exascale, most of the software that should be used has not yet been developed. On the other hand, there are software packages available today whose 'functionality' (but not necessarily the code itself) needs to be ported to exascale platforms. These applications will currently not run efficiently on exascale computers without enormous efforts in method development. Concerns exist in the panel over the fact that most current algorithms cannot scale up to using 10⁵ or 10⁶ slow processors. Rather, there is a need to completely reconsider which parallelisation approaches should be used. A brief analysis of the software universe in the field is shown below.

Quantum  Chemistry.   The  current   capability  of   first-­‐principles  quantum  chemistry   is  used   to  study  neurotransmitters,   helical   peptides   and   DNA   complexes.   Quantum   chemistry   calculations   are  precise   but   expensive.   Exascale   should   make   feasible   calculations   that   are   unthinkable   today.  Important  applications  in  this  field  used  for  bio-­‐simulations  include  Dalton,45  GAMESS,46  Gaussian47  

and  CPMD.48  

Chemical Informatics. It is becoming infeasible to fully explore and predict the 1D, 2D and 3D chemical properties of small molecules with databases of tens of millions of compounds. Drug discovery based on small molecules will need to deal with the increasing size of databases (up to 1 billion entries today). Several types of open-source and proprietary software need to be made ready for exascale systems.
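A hedged back-of-envelope sketch of the throughput problem: the 10⁹-entry library size comes from the text above, while the assumed cost of one CPU-second per compound for a 1D/2D/3D property pipeline is purely illustrative.

# Rough cost of exhaustively profiling a very large compound library.
library_size = 1_000_000_000        # compounds (figure quoted in the text)
cpu_seconds_per_compound = 1.0      # assumed cost of a property-prediction pipeline

total_cpu_hours = library_size * cpu_seconds_per_compound / 3600
for cores in (1_000, 100_000, 10_000_000):
    wall_hours = total_cpu_hours / cores
    print(f"{cores:>10,} cores -> ~{wall_hours:,.2f} wall-clock hours")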

Stochastic Models and Biostatistics. Stochastic methods will be applied to model complex biological systems, to simulate large coarse-grained systems, to sample conformational space for molecular docking, or to predict the secondary structure of RNA. Personalised medicine is based on so-called Single Nucleotide Polymorphism (SNP) association studies to identify mutations as bio-markers for genes that predispose individuals to diseases. The existing multi-SNP methods are only capable of handling 10 to 100 SNPs, a very small fraction of what is required; exascale systems should enable methods that can handle much higher dimensionality.
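The dimensionality problem can be made concrete with a small combinatorial sketch: the number of k-way interactions among n candidate SNPs grows as C(n, k), so exhaustive multi-SNP testing explodes well before genome-wide SNP counts are reached. The SNP counts used below are illustrative.

# Combinatorial growth of multi-SNP association tests.
from math import comb

for n_snps in (100, 10_000, 1_000_000):
    pairs = comb(n_snps, 2)        # all pairwise interaction tests
    triples = comb(n_snps, 3)      # all three-way interaction tests
    print(f"{n_snps:>9,} SNPs: {pairs:.3e} pairwise tests, {triples:.3e} 3-way tests")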

Sequence Analysis. With the increased amount of data generated in laboratories around the world, basic protein and DNA sequence-based calculations are becoming a significant bottleneck in research. For example, in phylogeny (the reconstruction of ancestral proteins), present-day Bayesian approaches cannot be applied to more than 200 sequences (200 base pairs long), and new methods will increase the complexity. Vital applications in this area of research include BWA,49 BLAST/BLASTMPI,50 CLUSTALW,51 HMMER52 and MrBayes.53

45 http://dirac.chem.sdu.dk/daltonprogram.org/
46 http://www.msg.ameslab.gov/gamess/
47 http://www.gaussian.com/
48 http://www.cpmd.org/
49 BWA, Burrows-Wheeler Aligner, http://bio-bwa.sourceforge.net/
50 blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download


Molecular Modelling. Molecular modelling is a key discipline for rational drug design. Computational tools in this area allow scientists to model pharmaceutical target structures and calculate protein–drug docking energy. Molecular modelling represents one of the main exascale challenges. Vital applications include Gromacs,54 AMBER,55 NAMD,56 Autodock,57 Glide,58 Dock,59 Flexx,60 FTDock,61 LigandFit62 and ROSETTA.63

Network Medicine. In recent years it has become apparent that many common disorders such as cancer, cardiovascular diseases and mental diseases are often caused by multiple molecular abnormalities. As mathematical systems theory shows, the scale and complexity of the solution should match those of the problem. Network medicine has multiple potential biological and clinical applications. For example, understanding the effects of cellular interconnectedness may offer better targets for drug development, more accurate biomarkers to monitor diseases and better disease classification. Exascale computing will be the necessary infrastructure to move from a static to a dynamic understanding of biological pathways and protein interaction networks (as a reference, the human interactome connects 25,000 protein-coding genes and ~1,000 metabolites). In the panel's opinion, software to be used in this area on exascale computers still needs to be developed.

Cell Simulation. It is estimated that eukaryotic cells contain about 10,000 different proteins (with close to 1,000,000 copies of some of them). Whole-cell and sub-cellular simulations (e.g. membranes) will require huge computational resources and efficient coupled multiscale simulation applications. In the panel's opinion, software to be used in this area on exascale computers still needs to be developed.

Tissue Modelling. As described in previous sections, tissue simulations (such as heart, respiratory system and brain) are going to be key issues for animal substitution in drug testing. Future medicine will be based on virtual patient models and this should increase both the safety and the efficacy of drugs. Again, it is clear that software to be used in this field on exascale computers still needs to be developed.

Finally,   the  panel   identified   several   additional   technical   aspects   that  will   require   special   attention  from   computer   scientists.   These   include:   (i)   software   quality   control,   (ii)   development   tools,   (iii)  software  optimisation,  (iv)  hardware  optimisation,  and  (v)  exabyte  data  management.  To  implement  the  life  sciences  and  health  exascale  computing  applications,  the  experts  propose  a  timeline  to  build  an   exascale   centre   for   life   sciences.   The   centre   will   require   the   combined   expertise   of   vendors,  hardware  architects,   system  software  developers,   life  science  researchers  and  computer   scientists  working   together   to   make   informed   decisions   about   features   in   the   design   of   the   hardware,  software  and  underlying  algorithms.  

     

51 http://www.ebi.ac.uk/Tools/msa/clustalw2/
52 http://hmmer.janelia.org/
53 mrbayes.sourceforge.net/
54 www.gromacs.org/
55 http://ambermd.org
56 www.ks.uiuc.edu/Research/namd/
57 http://autodock.scripps.edu/
58 www.schrodinger.com/
59 dock.compbio.ucsf.edu/
60 www.biosolveit.de/FlexX/
61 www.sbg.bio.ic.ac.uk/docking/ftdock.html
62 www.accelrys.com
63 http://www.rosettacommons.org/


5.4 Expected  Status  in  2020  

5.4.1 Genomics
Advances in the technologies for data generation that both increase the output and decrease the cost will mean that, over the next decade, the quantity of data being produced will increase by at least a thousandfold and maybe as much as a millionfold. There are three key aspects that HPC centres will have to deal with: (i) data storage, (ii) data transportation, and (iii) data confidentiality. On the other hand, while the most popular genomics software is regularly reviewed and optimised for new systems, a large part of the available genomics libraries were started in the 1990s and use inefficient script/high-level languages (e.g. Perl, Java or Python packages). These codes still perform well for current data loads, but may not be ready for the data challenges of the next decade. In order to avoid simplistic views of the problem, it is important to stress that bioinformatics software has been developed under strong time pressure, given the rapid changes in technology, and several codes are not open-source and can only be optimised by the code owners. Furthermore, they evolve very rapidly, generating serious problems for program optimisation following standard working procedures in the computer science community. Given the large amount of code available, an interesting alternative could be the development of more efficient compilers for script/high-level languages.
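A simple projection, assuming a purely illustrative 2012 baseline archive of 100 TBytes, shows what the thousandfold-to-millionfold growth quoted above would mean for storage.

# Projecting genomics data volumes using the growth range quoted in the text.
baseline_tb = 100.0                               # assumed 2012 baseline (TBytes)
for label, factor in (("thousandfold", 1e3), ("millionfold", 1e6)):
    projected_tb = baseline_tb * factor
    print(f"{label:>12} growth: {projected_tb/1e3:,.0f} PBytes "
          f"({projected_tb/1e6:,.2f} EBytes)")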

5.4.2 Systems Biology
One of the main challenges here is the reverse engineering of the operation of biological networks in normal cells (of all types) and the identification of the intercellular communication networks which are responsible for the functioning of multicellular organisms. This is a first step towards a full understanding of the impact that external perturbations can have on biological systems and, in turn, to explaining complex human diseases. Current applications dealing with the 'omics' (proteomics, metabolomics, etc.) generally require more system memory than intense CPU usage. Extensive information retrieval and database operations constitute the layer underlying systems biology. Problems related to data handling, data integrity and confidentiality are all important in this area. Model reconstruction and engineering will require the integration of different levels of granularity, from coarse-grained models to detailed ones. Each specific application will have its own requirements, ranging from easily parallelisable code to highly integrated algorithms. The considerations related to temporal modelling and the simulation of fluctuations will add additional levels of complexity. A central repository of data with distributed hubs across Europe will be a major requirement of systems biology. Participation of major bioinformatics initiatives in Europe (such as ELIXIR) in the definition of exascale requirements is judged as very important by this panel.

5.4.3 Molecular Simulation
Examples of grand challenges we will face in the future are: (i) simulations of biological systems that are thousands of times larger than those possible today (e.g. realistic cell membrane models, including drug permeation and binding), (ii) simulations that are thousands of times more computationally complex than those possible today (e.g. quantum simulations of biomolecules), and (iii) simulations that cover timescales thousands of times longer than those possible today. However, in the long term, the real challenge in the field will be multiscale simulation. Structural genomics initiatives are beginning to encompass many of the important organisms, while proteomics initiatives increase our knowledge of the structural space of drug targets. Massive sequencing projects, transcriptomics and functional genomics are deciphering the molecular mechanisms of cellular action, and a variety of spectroscopic techniques are providing a picture of how tissues and organisms work (see Figure 5.4). Multiscale simulation will integrate multiple simulation layers at different scales to reach a unified vision of living systems (from atoms to tissues). There is an obvious complexity in merging such techniques, which will not necessarily have the same hardware requirements (memory, disk space, processor, etc.).


Exascale systems will need to integrate this multiscale scenario while providing a simple user interface.

5.4.4 Biomedical Simulation
The simulation of complete organs is a frontier of biocomputation. These simulations are characterised by: (i) a very large, highly heterogeneous state space; (ii) multi-level modelling at the molecular, sub-cellular, cellular, tissue and organ levels; (iii) multiple timescales (from picoseconds to years); and (iv) structural plasticity. Handling very large volumes of state data will require new techniques for: (i) data management; (ii) collaborative interactive visualisation; (iii) computational steering of simulations; (iv) real-time monitoring of performance; (v) run-time switching and load balancing between models at different levels of abstraction; and (vi) coding of parallel tasks and processes. Bandwidth and memory capabilities are growing more slowly than flops and they are constrained by energy consumption. It is currently expected that early Eflop/s machines will provide no more than 0.1 EBytes of memory and this may be insufficient for flagship simulations (e.g. whole brain simulation). Tissue simulation will benefit from heterogeneous CPUs, i.e. CPUs that combine complex cores (useful for subcellular simulation) with larger numbers of smaller cores (ideal for the cellular level). In current supercomputing, some compute-intensive processes (e.g. visualisation, data analysis) are run offline on specialised machines, a situation that will be impractical in exascale environments. It is therefore important that these processes should be executed in situ. More generally, reducing data flow will require new approaches to I/O that avoid large movements of system memory to disk.
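A minimal sketch of the in-situ idea, assuming NumPy and a placeholder time stepper: the statistic of interest is accumulated in memory while the simulation runs, so only a reduced result reaches the file system instead of every raw snapshot.

# In-situ reduction: analyse state while it is in memory instead of writing
# every snapshot to disk for offline post-processing.
import numpy as np

def time_step(state, rng):
    return state + 0.01 * rng.standard_normal(state.shape)   # placeholder physics

def run_with_in_situ_analysis(n_cells=1_000_000, n_steps=100):
    rng = np.random.default_rng(0)
    state = np.zeros(n_cells)
    running_mean = np.zeros(n_cells)          # reduced quantity kept in memory
    for step in range(1, n_steps + 1):
        state = time_step(state, rng)
        running_mean += (state - running_mean) / step     # streaming mean update
    np.save("field_mean.npy", running_mean)   # only the reduction is written
    raw_gb = n_steps * state.nbytes / 1e9
    print(f"wrote {running_mean.nbytes/1e6:.1f} MBytes of reduced output "
          f"instead of {raw_gb:.1f} GBytes of raw snapshots")

if __name__ == "__main__":
    run_with_in_situ_analysis()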


6 ENGINEERING SCIENCES AND INDUSTRIAL APPLICATIONS

6.1 Introduction64  

The engineering sciences represent a major source of technological innovation within the European Community and contribute substantially to its economic success. Applications in industry are underpinned by research in the field of engineering science. Consequently, industries of all kinds are facing opportunities and challenges driven by the application of high-performance computing (HPC). The efficient use and successful exploitation of modern HPC will therefore play a significant role in delivering increased understanding of realistic engineering problems through high-fidelity modelling, simulation and optimisation. However, although European engineering companies have achieved remarkable success, the computational community remains fragmented.

The   topics   covered   by   computational   engineering   are   extremely   diverse   and   cover,   for   example,  aeronautical  engineering,  automotive  engineering,  civil  engineering,  oil  and  gas  exploration  (seismic)  and   engineering   (multi-­‐fluids   flows,   etc.),   chemical   processing,   nuclear   engineering,   biological   and  medical   engineering.   This   includes   research   to   understand   and   predict   turbulent   fluid   flow,  multiphase  flow,  turbulent  combustion,  fluid–structure  and  fluid–material  interactions  and  structural  failure,  and  the  integration  of  these  tools  into  robust  optimisation  schemes  for  product  design.  Other  industrial  applications,  for  example  in  the  fields  of  chemistry  and  biology,  are  covered  elsewhere  in  this  report.  Many  of  these  fields  have  interlinked  challenges  such  as  energy  –  we  need  to  develop  a  better  understanding  of   renewable  energy  sources.  Simulation   is  especially  critical   to  analysing   the  socio-­‐economic  impact  and  any  potential  environmental  consequences.  

Examples of the cost and efficiency savings made possible by the use of PRACE large-scale resources in industry in the period 2012–2020 include, among many others:

• Improved efficiency of energy conversion in gas turbines
• Reduction in the number and cost of wind tunnel or crash tests associated with aircraft and automotive design
• Development of new energy-efficient designs of cars and traffic systems
• Reduction in the number of unsuccessful wells drilled by use of more accurate seismic analysis
• Reduction of environmental pollution and noise

The engineering community is not as organised as other scientific communities. In contrast to the scientific disciplines, there are few 'community' codes, and institutions make use of both in-house and commercial software. Advances in engineering science, underpinning further industrial development, are often achieved through collaborations between industry, research institutes and universities.

The broad objective of the PRACE engineering working group is to identify challenges and bottlenecks and develop high-fidelity software for informing critical design and operational decisions. This will require an understanding of hardware and software HPC trends, such as Intel's emerging multi-core technology or the use of GP-GPU architectures. The panel has identified a number of common issues in exploiting current petascale resources. While many codes are known to scale well to hundreds of thousands of cores, subsequent analysis requires the associated data storage and network bandwidths to scale at approximately the same rate.

64 Uli Rüde, Neil Sandham, Dave Emerson


There are concerns that tools for post-processing, remote visualisation and valorisation are not suitable for the largest data sets that will occur.

For many engineering applications, there are generally three distinct stages:
1. Pre-processing (creating the computational mesh)
2. Solution (discretising the equations and implementing the numerical algorithm)
3. Post-processing (analysing and displaying the numerical results)

A  lot  of  time  and  effort  has  been  invested  in  developing  efficient  numerical  solver  algorithms.  What  now   has   to   be   considered   is   the   challenge   of   getting   these   developments   to   scale   up   to   many  thousands  of  processors,  something  that  has  received  only  limited  attention.  

The pre-processing stage often requires the creation of a good-quality computational mesh, which is crucial to the success of any grid-based numerical algorithm. However, none of the available tools is capable of generating meshes of the size necessary to exploit hardware using 100,000 cores and beyond. This challenge can be tackled in two main ways: (i) investigating parallel grid generation; (ii) using adaptive resolution techniques, including mesh and/or order adaptation. In both cases, load balancing becomes an important issue.
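A toy sketch of the load-balancing issue raised above: mesh cells of unequal cost (for example in locally refined regions) are assigned greedily to the least-loaded partition. This is only an illustration; production codes would typically rely on dedicated graph partitioners.

# Greedy ("longest processing time first") assignment of weighted cells to
# partitions, reporting the resulting load imbalance.
import heapq
import random

def greedy_partition(cell_costs, n_parts):
    heap = [(0.0, p) for p in range(n_parts)]     # (current load, partition id)
    heapq.heapify(heap)
    loads = [0.0] * n_parts
    for cost in sorted(cell_costs, reverse=True):
        load, p = heapq.heappop(heap)             # least-loaded partition
        loads[p] = load + cost
        heapq.heappush(heap, (loads[p], p))
    return loads

if __name__ == "__main__":
    rng = random.Random(42)
    # Mostly cheap cells, plus a refined region where cells are 5-20x as costly.
    costs = [1.0] * 90_000 + [rng.uniform(5, 20) for _ in range(10_000)]
    loads = greedy_partition(costs, n_parts=64)
    imbalance = max(loads) / (sum(loads) / len(loads))
    print(f"load imbalance (max/mean): {imbalance:.3f}")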

The use of automated optimisation chains for large-scale problems poses additional practical issues, both for job scheduling and for the external connectivity of the HPC facility. An optimisation chain is driven by a central optimiser that autonomously launches a large number of computations with a certain need for synchronisation. Moreover, such optimisation chains are based on interaction with a parameterised CAD description of the geometry, manipulated with industry-standard tools which are simply not available on HPC architectures. This means that either these tools need porting, or a connection to a standard (mainly Windows-based) computer is required.

Data analysis of results obtained from a current Pflop/s or an upcoming Eflop/s computer presents some formidable challenges. Again, like pre-processing, it may not have received much attention, but it is clearly going to play an important role in interpreting the data produced. Visualisation (perhaps remote) will be the key to understanding the large amounts of data generated, and more research is needed to develop intelligent feature extraction algorithms; again, currently available tools are not suitable for the challenges of exascale. More generally, future massive multiscale and multi-physics simulations will generate a deluge of data (raw data and its associated metadata) and there is a need to redevelop (or re-invent) the post-processing toolchain in order to facilitate data mining on huge and heterogeneous data. Convergence with 'big data' methodologies already used in web data mining is to be expected.

A   further   challenge   facing   engineering   is   code   coupling.   In  multi-­‐physics   applications,   where   we  need   to   couple   continuum-­‐based   software   such   as   structural  mechanics,   acoustics,   fluid   dynamics  and  thermal  heat  transfer,  this  is  required  in  a  horizontal  fashion.  For  large  numbers  of  cores,  with  a  complex   memory   and   accelerator   hierarchy,   much   work   needs   to   be   done.   In   addition,   there   is  growing  interest  in  coupling  codes  in  the  vertical  direction  (multiscale  models),  i.e.  from  continuum  to  mesoscale   to  molecular  dynamics   to  quantum  chemistry.  This   requires  bridging   length  and   time  scales  that  span  many  orders  of  magnitude.  
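A minimal sketch of horizontal coupling: two placeholder single-physics solvers advance in lock-step and exchange interface data once per coupling step. In practice each solver would be a separate parallel code linked through a coupling layer; the 'physics' below is invented only to make the loop runnable.

# Two toy solvers exchanging interface data in a fixed-point coupling loop.
import numpy as np

class FluidSolver:
    def __init__(self, n):
        self.pressure = np.zeros(n)
    def advance(self, wall_displacement):
        # placeholder: surface pressure responds to the structural displacement
        self.pressure = 1.0 + 0.1 * wall_displacement
        return self.pressure

class StructureSolver:
    def __init__(self, n):
        self.displacement = np.zeros(n)
    def advance(self, surface_pressure):
        # placeholder: displacement relaxes towards the applied load
        self.displacement += 0.05 * (surface_pressure - self.displacement)
        return self.displacement

if __name__ == "__main__":
    n_interface = 1_000
    fluid, structure = FluidSolver(n_interface), StructureSolver(n_interface)
    disp = np.zeros(n_interface)
    for _ in range(50):                      # coupling iterations
        pressure = fluid.advance(disp)       # fluid -> structure exchange
        disp = structure.advance(pressure)   # structure -> fluid exchange
    print(f"mean interface displacement after coupling: {disp.mean():.4f}")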

Further  developments  with  potentially  high  impact  on  computational  engineering  include  the  use  of  HPC   systems   for   interactive   computational   steering   that   requires   interactive   behaviour   and  correspondingly   fast   response   times   for   the   simulation.   Even   beyond   this   are   real-­‐time   and  embedded  simulations  and  immersive  virtual  reality  techniques.  For  example,  the  control  systems  for  a  large-­‐scale  power  plant  are  being  designed  and  developed  before  the  plant  itself  is  operational  by  using  a  real-­‐time  HPC  simulator.  Similarly,  the  real-­‐time  simulator  can  be  used  for  training  the  plant  operators   for   dangerous   operation   modes   and   emergencies.   As   another   example,   real-­‐time   HPC  simulators   are   being   developed   into   new   types   of   diagnostic   tools   in   medical   engineering.   For  example,  blood  flow  simulators  can  be  used  for  operation  and  therapy  planning.  


In common with many of today's scientific disciplines, the majority of the numerical algorithms used to solve the problem have been successfully parallelised using MPI. However, the new generation of heterogeneous many-core systems presents formidable challenges to engineering software, which has been developed and validated over many years. Here the software engineering methodology for high-performance scientific and engineering codes is critically underdeveloped, and standards are necessary in order to secure future software developments.

Community activities are also needed to routinely equip engineering students with the knowledge and confidence to apply HPC in their industrial careers, as is an outreach activity to spread the use of HPC out from the core areas into new areas such as biomedical engineering, where there may be significant potential for HPC-inspired research and development.

6.2 Computational  Grand  Challenges  &  Expected  Outcomes  in  Engineering    

6.2.1 Turbulence65
Turbulence is one of the most important unsolved problems in classical mechanics. Virtually all flows faster than a few metres per second or larger than a few centimetres are turbulent, including most cases of interest in industry, and practically all atmospheric, oceanic and astrophysical flows. Even the flow in the largest human arteries can become turbulent. When doctors detect a 'heart murmur', they are listening to the noise of turbulence. From the engineering point of view, turbulence can be favourable or deleterious. Turbulent mixing allows combustion to proceed efficiently in power plants and aircraft engines, but turbulent drag is responsible for much of the energy spent in transportation. Most of the pressure drop in large water mains or in oil and gas pipelines is turbulent dissipation, and roughly half of the drag of aircraft is turbulent skin friction. About 10% of the energy use in the world is spent overcoming turbulent friction.

Turbulence theory has long been a theme of engineering research, mostly through theory and experiments, but it has made large strides in recent years because of the influence of supercomputing. Simulations are basically experiments by different means, but they offer at least two key advantages: they provide better control of the experimental conditions, including some that cannot be created otherwise, and they result in essentially complete databases. On the other hand, they are expensive. Turbulence is characterised by many degrees of freedom, measured by the Reynolds number, which imply large computational grids. Present research simulations routinely have Reynolds numbers of a few thousand, involve 10¹⁰ grid points, and run over hundreds of millions of CPU hours on O(10⁵) processors. Turbulence was explicitly mentioned in the first PRACE Scientific Case as one of the necessary underpinnings of engineering research, and it continues to be so today. Likely future trends were predicted to be the simulation of more complex and realistic flows, and the increase in the Reynolds numbers of canonical ones. Both have taken place. Direct Numerical Simulations (DNS, using no models), which centred on simple turbulent channels five years ago, have turned to jets and boundary layers, which are much closer to real-life applications, and the trend towards 'useful' flows is likely to continue. The Reynolds numbers have increased by a factor of roughly five, implying a work increase of three orders of magnitude. It is interesting that this increase has taken place with relatively little degradation of computational efficiency, and that many landmark simulations have been performed by European researchers. Even if the European turbulence community has traditionally been strong, its current prominence in computation was far from being assured five years ago.

                                                                                                                         65  Javier  Jiménez,  Philipp  Schlatter,  Roel  W.  C.  P.  Verstappen  


Another  development  has  been  the   improvement  of   large-­‐eddy  simulation  (LES)  models,  which  are  an  intermediate  level  of  detail  between  full  modelling  and  direct  simulation  of  turbulence.  They  hold  the   best   promise   for   practical   turbulence   simulations   in   the   future,   although   boundary   conditions  continue   to  be  a  problem.  Many  of   the   theoretical  developments  have  also   taken  place   in  Europe,  and  owe  a   lot   to   the  new  higher-­‐Reynolds  number  numerical   data   sets   against  which   they   can  be  tested.  

On  the  more  applied  side,  the  most  impressive  results  have  probably  originated  from  the  US,  such  as  the   computation   of   an   entire   jet   engine   at   Stanford.   That   simulation   used   a   combination   of  modelling,   LES   and   DNS,   and   centred   on   interfacing   the   various   simulation   levels,   rather   than   on  physical   accuracy.   There   have   been   few   comparable   European   programs,   which   are   nevertheless  important   if  HPC   is   going   to   fulfil   its   promise   in   the   application  of   turbulence   research   to   the   real  world,  particularly  in  the  extension  of  LES  to  real  industrial  cases.  

On the other hand, even the favourable situation just described can only be considered an intermediate stage in turbulence research. There is a tentative consensus that a 'breakthrough' boundary layer free of viscous effects requires Reynolds numbers of the order of Reτ = 10,000, which are lower than many industrial applications, but five times higher than present simulations. That implies computer times 1,000 times longer than at present (scaling as Re⁴), and storage capacities 150 times larger (scaling as Re³). Keeping wall times constant implies increasing processor counts from the present O(32 Kproc) to O(32 Mproc), which will require rewriting present codes but is probably not insurmountable. Turbulent simulations have scaled correctly for 25 years, from single processors to O(10⁵). Storage might be a tougher problem. Turbulence research requires storing and sharing large data sets, presently O(100 TBytes) per case, and becoming O(20 PBytes) within the next 5–10 years. Archiving, transmitting and post-processing those data will require work, but the rewards in the form of more accurate models, increased physical understanding, and better design strategies will grow apace.
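The scaling argument can be written out explicitly. Assuming present simulations sit near Reτ ≈ 2,000 (the 10,000 target being described above as five times higher than present), the Re⁴ and Re³ scalings give factors of the same order as the quoted ~1,000x in CPU time and ~150x in storage.

# Reynolds-number scaling of DNS cost: CPU time ~ Re^4, storage ~ Re^3.
re_present = 2_000          # assumed present friction Reynolds number
re_target = 10_000          # 'breakthrough' value quoted in the text
ratio = re_target / re_present

cpu_factor = ratio ** 4     # ~625, same order as the ~1,000x quoted
storage_factor = ratio ** 3 # ~125, same order as the ~150x quoted
print(f"Re ratio {ratio:.0f}: ~{cpu_factor:.0f}x CPU time, "
      f"~{storage_factor:.0f}x storage")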

6.2.2 Combustion66
Combustion has a strong impact on the environment (greenhouse gases, pollutant emissions, noise) but represents more than 80% of energy conversion worldwide; it is essential for ground and air transportation, electricity production and industrial processes, and is involved in safety (fires and explosions). The central position of combustion in our world will not decrease in the near future. Science, and especially numerical simulation, is mandatory to promote its use at the highest efficiency with the lowest impact on climate. The objective of combustion studies is to better understand and model physical phenomena in order to optimise, for example, gas turbines (aero-engines or power generation), internal combustion engines or industrial processes in terms of cost, stability, higher efficiency, reduced fuel consumption, near-zero pollutant emissions and low noise, or to help in fire prevention and fighting. Computational Fluid Dynamics (CFD) offers design engineers the unique opportunity to develop new technical concepts, reducing development costs by avoiding extensive and very expensive experimental campaigns. From an economic point of view, industrial companies involved in propulsion and energy systems are among the biggest employers in the European Union. Giving them more efficient and cost-effective system designs is crucial support for their competitiveness on the worldwide market.

Scientific challenges in combustion are numerous. First, a large range of physical scales must be considered, from fast chemical reaction characteristics (reaction zone thicknesses of about tenths of millimetres, 10⁻⁶ s), through pressure wave propagation (sound speed), up to burner scales (tens of centimetres, 10⁻² s residence times) or system scales (metres for gas turbines, kilometres for forest fires). Turbulent flows are, by nature, strongly unsteady. Chemistry and pollutant emissions involve hundreds of chemical species and thousands of chemical reactions, and cannot be handled in numerical simulations without adapted models. Usual fuels are liquids, storing a large amount of energy in small volumes (about 50 MJ/kg).

                                                                                                                         66  Denis  Veynante  and  Stewart  Cant  


Accordingly, two-phase flows should be taken into account (fuel pulverisation, spray evolution, vaporisation, mixing and combustion). Solid particles, such as soot, may also be encountered. Interactions between flow hydrodynamics, acoustics and combustion may induce strong combustion instabilities (gas turbines, furnaces) or cycle-to-cycle variations (piston engines), decreasing burner performance and, in extreme cases, leading to destruction of the system within a short time. Control devices, based on either passive (geometry changes, Helmholtz resonators) or active (actuators) techniques, may help to avoid these instabilities. The design of cooling systems requires knowledge of the heat transfer to walls due to conduction, convection and radiation, as well as of flame/material interactions.

Fire simulations are today probably less mature than gas turbine or internal combustion engine computations, but predictions in terms of safety, prevention and fire fighting are challenging. Forest fires regularly and strongly affect southern European countries and, because of climate change, may concern northern regions in the future. Their social impact is very important (land, buildings, human and animal life, agriculture, tourism, economy). Forest fires involve a very large range of spatial and temporal scales. Chemical mechanisms are especially complex (wood pyrolysis depends on the nature and moisture of the wood and involves numerous chemical species). Forest fires are strongly controlled by long-distance radiative heat transfer, generally neglected in ordinary combustion computations. Buoyancy effects (large-scale flames) as well as interactions with the local meteorology (winds, moisture) and the local topography (hills, valleys, etc.) should also be taken into account and need adapted models, since these features are not relevant in burners. The simulation of fire fighting, for example by dropping fluids with or without retardant, is also a challenging research topic of crucial importance.

A related area concerns accidental explosions in industrial process plants caused by leaked clouds of flammable gas or vapour. Simulation technology in this area is widely used for safety case assessment, but the accuracy suffers from the very large range of scales that must be represented. The chemistry need not be represented in full detail, but the coupling of flow, turbulence and combustion is strong and complex. Such explosions can have devastating consequences, and reliable suppression methods are required.

High-end high-performance computing systems provide the opportunity for ambitious research aimed at using combustion with the highest efficiency and the lowest impact on climate. Combustion simulations will combine three methodologies:

• Direct numerical simulations (DNS) are very high-fidelity computations without modelling of the turbulence (all the relevant flow scales are explicitly computed). Because of their computational cost, they are limited to small cubic domains and low turbulence Reynolds numbers, but they are the best workhorse today to reveal the internal structure of turbulent flames, understand propagation, extinction, ignition, pollutant formation and new combustion regimes (homogeneous, or 'flameless', combustion) and devise combustion models. Exascale machines will give access to configurations and operating conditions close to realistic laboratory turbulent burners.67

• Large eddy simulations (LES), where the largest flow motions are explicitly computed while only the effects of the small ones are modelled, are more relevant for computing and analysing unsteady flows in larger domains of realistic shapes under practical operating conditions, as encountered in gas turbine chambers, piston engines and industrial furnaces. LES have revolutionised the field of numerical combustion in the last 20 years by bringing almost DNS-like capabilities to actual industrial systems. Examples include the ignition of a helicopter combustion chamber, unstable modes of an industrial gas turbine, and cycle-to-cycle piston engine variations – all have been simulated on national (Tier-1) or European (Tier-0 of PRACE) machines.68

                                                                                                                         67  DNS  is  the  topic  of  the  Combustion  Co-­‐design  Center  initiated  in  2011  by  J.  Chen  and  J.  Bell  at  Sandia  National  Laboratories  (USA).  Multiple  groups  in  Europe  have  the  capacity  to  develop  such  competitive  research  programmes.  


Today, industry relies on and invests in LES to compute multiple phenomena that are beyond the capabilities of the existing classical RANS codes available in companies (see below). European groups are leaders in this field, and their LES combustion codes are recognised as the most advanced.

• Reynolds  Averaged  Navier  Stokes  (RANS)  remains  the  standard  approach  used  within  the  energy  industry.  It  allows  for  the  inclusion  of  complex  industrial  geometries  together  with  complex  physics  at  a  level  of  computational  cost  that  can  be  tolerated  within  the  engineering  design  cycle.    Larger,  more  advanced  and  more  frequent  RANS  calculations  are  being  carried  out  in  order  to  explore  the  design  space  for  novel  clean  combustion  systems.  A  strong  coupling  exists  between  DNS,  LES  and  RANS  whereby  data  and  modelling  insight  move  from  DNS  through  LES  to  RANS,  while  questions  and  new  challenges  move  the  other  way.        

To   compute   only   the   reacting   flow   within   the   combustion   chamber   is   not   sufficient   and   multi-­‐physics/chemical   phenomena   must   be   coupled   in   the   simulations.   For   example,   in   a   gas   turbine,  simultaneous   simulations   of   the   combustion   chamber,   the   compressor   (feeding   the   chamber)   and  the  turbine  (fed  by  the  chamber)  are  needed.  Flame/wall   interactions  should  be  taken  into  account  in  terms  of  heat  transfers,  flow/structure  and  flame/material  interactions  to  design  cooling  systems  and  control   system   lifetime.  The  noise  emitted  by   the  combustor,  as  well  as   its  perception  at   long  distances,  must   also   be   computed.   These   various   phenomena   are   generally   described   by   different  codes69   that   should   run   together  on  a  massively  parallel  machine  and  exchange  data.  They   lead   to  new   challenges   in   terms   of   load   balancing   and   simulation   control   but   also   in   terms   of   physical  phenomena  coupling  and  model  compatibilities.  

6.2.3 Aeroacoustics70
In the development of new aircraft, engines, high-speed trains, wind turbines and so forth, the prediction of the flow-generated acoustic field becomes more and more important, since society expects a quieter environment and the noise regulations – not only near airports – become stricter every year. Considering what has been achieved in the field of computational aeroacoustics over the last 10 years, it is evident that the future of noise prediction, and one day even noise-oriented design, belongs to unsteady three-dimensional numerical simulations and first principles. However, the contribution of such methods to industrial activities in aerospace seems to be years away, i.e. they lag behind the contributions of computational fluid dynamics to the design of, for example, airframes and gas turbines. Certification often depends on a fraction of a dB, whereas presently predicting noise to within, say, 2 dB without adjustable parameters is without doubt impressive. Generally, industry uses database methods, which chronically leave significant uncertainties leading up to flight tests, with serious business consequences. And model tests at small scales – of the order of 1/10 – are not reliable to a fraction of a dB. The extra difficulty in simulations aimed at engine, airframe or combustion noise is due to the very wide range of chemical, turbulent, acoustic and geometric scales, which are defined by the configuration, the thin wall-bounded and free shear layers, the chemical layers and the audible range of sounds.

68 Note that, because of their very high computational costs, these simulations are still limited (a few tens of cycles in internal combustion engines for a given regime) and sometimes unique (ignition of a helicopter combustion chamber). Exascale machines will give access to longer physical times and repeated simulations with different designs, operating conditions or model constants to quantify the overall sensitivity to these parameters and optimise practical systems.

69 The structure of the code may strongly depend on the related physical phenomena. For example, combustion involves balances over small volumes and domain splitting is retained for parallelisation. On the other hand, radiative heat transfer is controlled by long-distance interactions and is more naturally parallelised by wavelength and/or radiation direction.

70  Wolfgang  Schröder  


The   state  of   the   art   is   limited   to   simplified   components   or   geometries  which   can  be   tackled  using  manually  generated  structured  meshes,   in  contrast  to  the  systems  actually   installed,  which  need  to  be  simulated,  most  probably  by  adaptive  unstructured  body-­‐fitted  or  Cartesian  grids.  The  latter  can  be   decomposed   into   an   arbitrary   number   of   blocks   such   that   the   computations   can   be   done   on  massively  parallel  machines  in  the  Eflop/s  range  and  higher.    

Such machines are essential for solving aeroacoustics problems not only at a generic but also at an industrial scale, i.e. a complete wing in high-lift configuration, a full landing gear, or a combustion-chamber–turbine–jet configuration, at such a level of efficiency, reliability and accuracy that a low-noise design can be achieved.

Consider  the  acoustic  analysis  of  the  noise  generated  by  a  full  landing  gear  at  a  Reynolds  number  that  is   still   two  orders  of  magnitude  below   the   real   flow  condition.  To  determine   the  noise   source,   the  turbulent  flow  field  has  to  be  simulated.  This  requires  a  mesh  in  the  range  of  tera  cells  and  storage  in  the  PByte  range.  Economically,  such  an  analysis  is  completely  out  of  range  today,  and  multi-­‐petascale  and  then  exascale  computers  are  needed  in  the  next  three  to  five  years  to  make  such  computations  feasible.  To  tackle  problems  in  the  real  Reynolds  number  range,  the  next  generation  of  computers  is  necessary.  
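For orientation, a rough storage estimate consistent with the figures above; only the tera-cell mesh size comes from the text, while the per-cell variable count, precision and number of stored time levels are illustrative assumptions.

# Back-of-envelope storage estimate for a tera-cell aeroacoustics simulation.
n_cells = 1e12                 # tera-cell mesh (from the text)
variables_per_cell = 10        # assumed: density, momentum, energy, etc.
bytes_per_value = 8            # double precision
snapshots = 100                # assumed number of stored time levels

total_bytes = n_cells * variables_per_cell * bytes_per_value * snapshots
print(f"~{total_bytes/1e15:.0f} PBytes of raw field data")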

6.2.4 Biomedical Flows71
Surgical treatment in human medicine can be optimised using virtual environments, where surgeons perform pre-surgical interventions to explore best practice methods for the individual patient. The treatment of the pathology is supported by analysing the flow field, for example optimising nasal cavity flows or understanding the development of aneurysms. The computational requirements for such flow problems have constantly increased over recent years and have reached the limits of petascale computing, in the sense not only of computational effort but also of required storage. It is vital to understand fully the details of the flow physics to finalise the derivation of medical pathologies and to propose, for instance, shape optimisations for surgical interventions. Such an in-depth analysis can be obtained only by a higher resolution of the flow field, which in turn increases the overall problem size.

It goes without saying that it is very important in biomedical flows to fully resolve wall-bounded shear layers in order to understand the influence of the flow on the tissue causing irritations. This is done by an accurate computation of the wall-shear stresses. In this context, the wall heat flux also needs to be considered, requiring not only a high resolution close to the highly intricate geometry of the wall but also a highly resolved computational mesh representing the deformable tissue. A coupled solution approach is required to compute such fluid–structure interaction problems, which again increases the problem size and the computational effort, i.e. it necessitates exascale computing. Moreover, to determine the transitional flow, direct numerical simulations have to be performed to correctly capture time-dependent spatial flow structures such as evolving vortices, recirculation zones, separated flow, and mixing layers as they appear (e.g. during the respiration phase in human lungs). The moisturisation and heating of the flow is strongly coupled to the formation of droplets caused by condensation at inhaled particle surfaces. In this context, understanding the transport, coagulation and collision of millions of particles from micro- to nanometre scale is extremely important. The aspect of particle transport is also essential to understand particle deposition in nasal drug delivery with sprays and of diesel aerosols in the human lung, which can cause cancer.

This predicted growth of the computational costs can only be handled by splitting the problem into subproblems distributed over more computational resources. Such resources could be provided by exascale computers.

                                                                                                                         71  Wolfgang  Schröder  


The current trend of reduced distributed memory combined with a massive increase in the number of computational units will shape future HPC systems, in which highly reliable, fast interconnects need to be implemented to deal with the increased communication effort and to guarantee good scaling speed-ups for exascale applications. Furthermore, the additional overhead to perform particle simulations cannot be handled by current Pflop/s computers but could be handled on exascale systems. All of these simulations need to be performed under unsteady conditions, leading to very high storage requirements that will reach the exascale range and cannot be accommodated on today's HPC systems.
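The decomposition strategy described above can be illustrated with a minimal, serial sketch: a 1D diffusion problem is split into subdomains with ghost cells, and the explicit ghost-cell copies stand in for the halo exchange that an MPI implementation would perform over the interconnect. The grid size, subdomain count and diffusion number are arbitrary assumptions.

```python
import numpy as np

# Sketch of splitting a 1D diffusion problem into subdomains with ghost cells.
# Serial stand-in for what each distributed-memory rank would do; the explicit
# ghost-cell copies play the role of the halo exchange over the interconnect.

N, P, steps, alpha = 64, 4, 200, 0.4            # grid points, subdomains, steps, diffusion number
u = np.sin(np.linspace(0.0, np.pi, N))          # global initial condition
chunks = np.array_split(u, P)                   # decompose the domain
sub = [np.concatenate(([0.0], c, [0.0])) for c in chunks]   # pad each piece with ghost cells

for _ in range(steps):
    # "Halo exchange": copy neighbour boundary values into the ghost cells
    for p in range(P):
        sub[p][0]  = sub[p - 1][-2] if p > 0     else sub[p][1]    # left ghost (mirror at domain end)
        sub[p][-1] = sub[p + 1][1]  if p < P - 1 else sub[p][-2]   # right ghost
    # Local explicit diffusion update on the interior points of each subdomain
    for p in range(P):
        s = sub[p]
        s[1:-1] = s[1:-1] + alpha * (s[2:] - 2.0 * s[1:-1] + s[:-2])

u_final = np.concatenate([s[1:-1] for s in sub])
print("max value after diffusion:", u_final.max())
```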

Computations that have to be performed for the nasal cavity problem under high-frequency conditions involve Reynolds numbers in the range of Re ≈ 15,000. The lung problem – not just for the upper bifurcations but for approximately 20 generations – leads to cell numbers in the range of tera cells, a total of about a billion time steps, and storage requirements in the range of a PByte. Currently, such a computation takes a few hundred days on a multi-Pflop/s IBM BlueGene/Q system (JUQUEEN). To perform such an analysis in the next couple of years definitely requires exascale computing power. Furthermore, to tackle problems where the entire fluid and structure mechanics of the respiration system is simulated will demand even the next generation of exascale computers, expected to be available in 2020.

6.2.5 Solid  Body,  Mechanical  and  Electrical  Engineering72    

The design of new structures with composite materials – with or without elastomeric behaviour – and of mechanical structures that perform well at both very low and very high temperatures has improved impressively thanks to HPC. In practice, the equations of mechanical deformation are highly non-linear and not easily solved on many-core computers. Major progress is expected in the next few years with exascale applications.

Solution  of  auxiliary  problems  (equations,  eigenvalue)  

Hardware enhancements leading to exascale by 2020 – increasing both speed and memory some thousand times – should enable the solution of fundamental auxiliary problems for which algorithms with asymptotically linear complexity exist. This would include solving systems of linear equations or eigenvalue problems some 10–100 times faster, or treating problems that are some 10–1,000 times larger. Similar improvements are expected in the solution of basic large-scale state problems of mechanics and of electromagnetic fields; this should yield a similar impact in advanced computational engineering (see below). However, these goals will not be accomplished without research into solution methods that adapt current approaches to new architectures, where the cost of an operation depends on the placement of its arguments in memory and on the structure of the communication costs.
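As an illustration of the kind of asymptotically (near-)linear auxiliary solver mentioned above, the following sketch applies a matrix-free conjugate-gradient iteration to a 1D Laplacian; each iteration costs O(n) operations and memory traffic. The problem size and tolerance are arbitrary, and production codes would of course use preconditioned, distributed variants.

```python
import numpy as np

# Minimal conjugate-gradient sketch on a 1D Laplacian, applied matrix-free.
# Each iteration costs O(n) work and memory traffic, which is the kind of
# asymptotically (near-)linear kernel referred to in the text.

def apply_laplacian(x):
    """Matrix-free y = A x for the 1D Poisson matrix (Dirichlet ends)."""
    y = 2.0 * x
    y[1:] -= x[:-1]
    y[:-1] -= x[1:]
    return y

def conjugate_gradient(b, tol=1e-10, max_iter=10_000):
    x = np.zeros_like(b)
    r = b - apply_laplacian(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = apply_laplacian(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

n = 1000
b = np.ones(n)
x = conjugate_gradient(b)
print("residual norm:", np.linalg.norm(b - apply_laplacian(x)))
```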

Complex  structures  (larger  problems)  

Emerging  computers  will  enable  more  realistic  modelling  of  complex  structures.  Examples  include:    

• The transient analysis of a complete engine with evaluation of the stress and temperature fields

• Vibration  analysis  of  relevant  parts  of  power  stations  taking  into  account  the  effect  of  damping  or  non-­‐linear  effects    

• Multiscale  problems  such  as  more  reliable  analysis  of  constructions  with  fibre  composites,  or  modelling  of  the  crash  tests  with  more  realistic  interaction  of  passengers  

Optimal  design  (improved  speed)  

The typical goal of computation is to improve the performance of the product. An increase in computer speed by three orders of magnitude enables a thousandfold enhancement in the resolution of problems on the current edge of complexity, enough to switch from the analysis of such problems to optimal design. This should markedly increase the applicability of the optimal design methodology.

                                                                                                                         72  Zdenek  Dostal  



Reliability  of  computations  (improved  speed)  

Most analysis that is carried out today does not take into account the uncertainty of the input data. For example, the meaningful result of a stress analysis is typically not a single stress limit but its distribution function. The reason is that such computation is considerably more time-consuming. Switching from the common deterministic analysis to an analysis that takes into account the uncertainty of the data would considerably increase the reliability of the analysis and of the resulting decisions. There are engineering problems where the explicit analysis of uncertainty is critical, such as the analysis of deposits of radioactive waste. Improved performance would also result in improved reliability of computation by using methods which require additional cost, such as a posteriori error estimates.
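A minimal sketch of such non-deterministic analysis is given below: uncertain inputs (load, cross-section and yield strength, all with assumed, illustrative distributions) are propagated through a trivial stress model by Monte Carlo sampling, so that the output is a distribution and a failure probability rather than a single deterministic value.

```python
import numpy as np

# Sketch of non-deterministic analysis: propagate uncertain inputs through a
# simple axial-stress model by Monte Carlo sampling, so the output is a
# distribution (and a failure probability) rather than a single value.
# The model and all distributions are illustrative assumptions.

rng = np.random.default_rng(0)
n_samples = 100_000

load   = rng.normal(80_000.0, 8_000.0, n_samples)   # applied force [N] (assumed)
area   = rng.normal(4.0e-4, 4.0e-5, n_samples)      # cross-section [m^2] (assumed)
yield_ = rng.normal(250e6, 20e6, n_samples)         # yield strength [Pa] (assumed)

stress = load / area                                # simple axial stress model
p_fail = np.mean(stress > yield_)                   # probability of exceeding yield

print(f"mean stress {stress.mean()/1e6:.1f} MPa, "
      f"95th percentile {np.percentile(stress, 95)/1e6:.1f} MPa, "
      f"P(failure) ~ {p_fail:.3f}")
```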

6.2.6 General Process Technologies, Chemical Engineering73

Chemical engineering and process technology are traditional users of HPC for dimensioning and optimising reactors in the design stage. Computational techniques are also used for improving the operation of processes – for example, through model predictive optimal control or through inverse modelling for estimating system parameters. The computational models used in chemical engineering span a wide range of scales. On the microscopic level, chemical reactions may be represented by molecular dynamics techniques, while on the mesoscopic level, flows through pores or around an individual particle may be of interest. The macroscopic scale eventually considers the operation including heat and mass transfer in a full industrial-scale reactor or even the operation of a full facility.

Usually,   laboratory-­‐scale   reactors   do   not   scale   trivially   to   full   industrial   process   size.   Therefore,  simulation   tools   are   essential   to   avoid   time-­‐consuming   and   expensive   prototype   systems   when  designing  new  processes.  Typical  computational  problems  here  involve  complex  reactive  multiphase  flows.   On   the   process   scale,   these   systems   can   currently   be   represented   only   by   averaging  techniques   and   with   macroscopic   models   that   cannot   capture   the   physics   on   the   microscopic   or  mesoscopic   scale   from   first   principles.   Such   reactors   typically   involve   bubbles,   droplets   and   flow  through   pores.   Additionally,   the   nucleation,   transport   and   agglomeration   of   particles   may   be   of  interest.  Modelling   such   kinds   of   interactions   individually   is   already   difficult   for   single  microscopic  objects.   Currently,   such   models   are   computationally   only   feasible   on   a   small   scale   since   the  computational  power  is  insufficient  to  simulate  larger  ensembles.  Future  systems  will  be  essential  to  bridge  the  scales  better  and  permit  more  detailed  models.    

Exascale systems will permit a better understanding of very dispersed phenomena and of very large up- (or down-) scaling problems, such as aggregate formation and growth, through the development of much-improved particle simulation technologies (LBM, IBM, DEM, SPH, etc.),74 for example for describing multiscale interactions between fluids and structures, fluid–solid suspensions, interfaces and multi-physics coupling.

For the process design and its optimisation, both now and in the future, macroscopic models based on continuum descriptions will be used. However, macroscopic models require a closure of the model equations. Correct closure laws are essential for the fidelity of the simulation, but currently they can often only be derived from empirical arguments. The predictive power of such macroscopic models is therefore limited and thus their industrial use is not yet satisfactory in many cases. With exascale computers, it will become possible to model and simulate such systems with much finer resolution and it will become increasingly feasible to use more refined and detailed models.

73 Uli Rüde
74 LBM – Lattice-Boltzmann Method; IBM – Immersed Boundary Method; DEM – Discrete Element Method; SPH – Smoothed Particle Hydrodynamics


For example, given exascale computational power, new methods in particle dynamics and discrete element methods can be coupled with continuum-based CFD models to simulate particulate flows directly with a full resolution of each particle.
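The following sketch hints at the structure of such particle-flow coupling, in a deliberately simplified form: point particles are advanced with a Stokes-drag coupling to a prescribed 2D velocity field. A fully resolved DEM-CFD coupling of the kind discussed above would resolve each particle surface and feed the forces back into the flow solver; the particle count, response time and flow field here are assumptions for illustration only.

```python
import numpy as np

# Minimal sketch of particle-laden flow: point particles advanced with Stokes
# drag in a prescribed 2D fluid velocity field. A fully resolved DEM-CFD
# coupling would resolve each particle surface and feed forces back to the
# flow; here only the per-particle update is shown.

rng = np.random.default_rng(1)
n_particles = 1_000
tau_p = 0.01                     # particle response time [s] (assumed)
dt, n_steps = 1e-3, 500

pos = rng.uniform(0.0, 1.0, (n_particles, 2))
vel = np.zeros((n_particles, 2))

def fluid_velocity(x):
    """Prescribed cellular flow field (a stand-in for a CFD solution)."""
    u =  np.sin(np.pi * x[:, 0]) * np.cos(np.pi * x[:, 1])
    v = -np.cos(np.pi * x[:, 0]) * np.sin(np.pi * x[:, 1])
    return np.stack([u, v], axis=1)

for _ in range(n_steps):
    drag = (fluid_velocity(pos) - vel) / tau_p    # Stokes drag acceleration
    vel += dt * drag
    pos += dt * vel
    pos %= 1.0                                    # periodic box

print("mean particle speed:", np.linalg.norm(vel, axis=1).mean())
```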

We   expect,   for   example,   that   such   multiscale   and   multi-­‐physics   models   will   present   many   new  opportunities   for   the   process   industry,   but   they   critically   depend   on   the   interdisciplinary  development   of   models,   algorithms   and   simulation   software,   and   of   course   on   the   availability   of  sufficient  computational  power   in  the  form  of  future  exascale  systems.  Eflop/s  systems  will  help  to  reduce   the   dependence   on   surrogate  models   and   hence   lead   to  more   and  more   accurate   process  models.  In  some  cases,  exascale  systems  may  allow  the  simulation  of  complex  multiscale  phenomena  at  full  industrial  scales.  

 

6.3 Computational  Grand  Challenges  and  Expected  Outcomes  in  Industry    

The industrial applications in the field of numerical simulation that need next-generation exascale systems are mainly:

• Aeronautics:  full  Multidisciplinary  Design  and  Optimisation  (MDO),  CFD-­‐based  noise  and  in-­‐flight  simulation:  the  digital  aircraft    

• Turbo machines, propulsion: aircraft engines, helicopters, etc.
• Structure calculation: design of new composite compounds, deformation, etc.
• Energy: turbulent combustion in closed engines and open furnaces, explosions in confined areas, power generation, hydraulics, nuclear plants, etc.
• Automotive: combustion, crash, external aerodynamics, thermal exchanges, etc.
• Oil and gas industries: full 3D inverse waveform problem (seismic), reservoir modelling, multiphase flows in porous media at different scales, process plant design and optimisation, CO2 storage, etc.

• Engineering  (in  general):  multiscale  CFD,  multi-­‐fluids  flows,  multi-­‐physics  modelling,  computer-­‐aided  engineering,  stochastic  optimisation,  etc.    

• Special chemistry: molecular dynamics (catalysts, surfactants, tribology, interfaces), nano-systems, etc.

• Others  (bank/finance,  medical  industry,  pharma  industry,  etc.):  ‘Big  data’,  data  mining,  image  processing,  etc.  

• Common issues for all of the above include data assimilation, uncertainty quantification, etc.

6.3.1 Turbo Machines, Propulsion75

Motivation. Numerical simulation and optimisation is pervasive in the aeronautics industry, and in particular in the design of propulsion engines. The main driving force of technological evolution is the substantial targeted reduction of specific fuel consumption and environmental nuisance – in particular greenhouse gases, pollutant emissions and noise – as put forward by regulators such as ACARE and IATA. On the engine side, these ambitious goals are pursued by increasing propulsive and thermodynamic efficiency, reducing weight and finally controlling sources of noise. The targets can probably not be achieved simply through gradual improvement of current concepts. The development of disruptive propulsive technology is needed, relying even more heavily on numerical tools to overcome the lack of design experience. We can foresee two major challenges related to HPC: the use of high-fidelity numerical tools towards a more direct representation of turbulence, and the evolution of optimisation strategies.

                                                                                                                         75  Koen  Hillewaert  



High-­‐fidelity   aerodynamic   simulation.   Although   jet   engine   design   is   inherently   multidisciplinary,  predicting  the  aerodynamics   is  both  the  most  critical  and  the  most  costly   issue.  A  recent  review  of  the  state  of  the  art  is  given  by  Tucker.76,77  

To date, the Reynolds-Averaged Navier–Stokes (RANS) approach, modelling the ensemble-averaged impact of turbulence on the main flow, is the main workhorse. This is mainly due to the combination of acceptable prediction accuracy with low computational cost. RANS is capable of simulating flow unsteadiness, such as rotor–stator interactions, at least in theory, provided there is a clear scale separation between the turbulence and the computed flow features. The problem is that this scale separation is usually not guaranteed, which is a possible explanation for the discrepancy between computed and measured performance. RANS also clearly fails to predict transitional flows, flow instabilities, broadband (and to a much lesser extent tonal) noise generation, combustion efficiency, etc.

At   the   other   end   of   the   spectrum   large   eddy   simulation   (LES)   approaches   represent   the   energy-­‐carrying  scales  of  the  turbulence  directly,  while  using  relatively    simple  models  for  the  more  isotropic  and   universal   non-­‐resolved   scales.   However,   due   to   accuracy   and   resolution   requirements,   the  computational  effort  for  this  approach   is  prohibitive   in  practice  and   is   likely  to  remain  so;  although  the   approach   has   already   been   applied   to   realistic   geometries,   it   is   generally   accepted   that   these  computations  were  not  sufficiently  resolved.69    

Hybrid  approaches,  using  either  RANS  in  the  proximity  of  the  wall70  or  wall  models,  result  in  a  further  significant  reduction  of  the  computational  effort,  but  lead  to  increased  modelling  error,  in  particular  concerning  flow  transition  to  turbulence.  To  date,  no  consensus  has  been  reached  on  whether  these  approaches  are  adequate  and  what  their  optimal  parameterisation  is.    

We can expect that the focus for the next five years will be on the further development of the LES and hybrid approaches, relying on direct numerical simulations both for a better comprehension of the flow phenomena and as reference data. A second axis concerns the reduction of computational effort using more accurate and adaptive discretisation techniques. Adaptation techniques may also prove a solution for mesh generation, a process that is inherently difficult to parallelise. During this period, access to large-scale resources is of paramount importance for the development of these methods. The recurrent industrial application of these accurate simulation strategies to fans, isolated blade rows and stages is expected near the end of the decade, initially for final verification to complement the optimisation chain, and later on integrated in the optimisation loop. Given the complexity of the flow and the wish to reduce the modelling hypotheses as much as possible, it is expected that the use of computational resources will follow their availability. In any case, by this time industry hopes to use Eflop/s-scale machines and beyond, at least for a number of urgent computations.

Optimisation.  Given   the   level   of   expected   performance,   it   is   clear   that   the   design   challenges   are  quite   daunting   and   therefore   require   the   use   of   automated   optimisation   chains.   These   chains  autonomously  launch  simulations  to  assess  the  merits  of  a  number  of  design  choices.  A  very  popular  class  of  approaches,  in  particular  for  complex  design  spaces,  constructs  a  surrogate  model  from  these  computations,  on  which  the  actual  optimisation  is  performed.    
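A minimal sketch of this surrogate-based optimisation pattern follows: a handful of "expensive" evaluations sample a one-parameter design space, a cheap polynomial surrogate is fitted to them, and the optimisation is carried out on the surrogate. The objective function and the quadratic surrogate are stand-ins; industrial chains typically use kriging or other response-surface models over many parameters.

```python
import numpy as np

# Sketch of surrogate-based optimisation: a few "expensive" simulations sample
# the design space, a cheap surrogate is fitted to them, and the optimisation
# is carried out on the surrogate. The objective is a stand-in for a costly
# CFD evaluation of a single design parameter.

def expensive_simulation(x):
    """Stand-in for a costly RANS/LES evaluation of design parameter x."""
    return (x - 0.3) ** 2 + 0.05 * np.sin(12.0 * x)

# 1. Design of experiments: a few expensive samples
x_samples = np.linspace(0.0, 1.0, 7)
y_samples = np.array([expensive_simulation(x) for x in x_samples])

# 2. Fit a cheap surrogate (quadratic polynomial here; kriging/GP in practice)
coeffs = np.polyfit(x_samples, y_samples, deg=2)
surrogate = np.poly1d(coeffs)

# 3. Optimise the surrogate densely at negligible cost
x_dense = np.linspace(0.0, 1.0, 10_001)
x_opt = x_dense[np.argmin(surrogate(x_dense))]

print(f"surrogate optimum at x = {x_opt:.3f}, "
      f"true objective there = {expensive_simulation(x_opt):.4f}")
```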

Aerodynamic simulation currently relies largely on steady RANS computations. Over the next few years, more expensive unsteady periodic computations will be used more extensively to include unsteady effects, blade excitation and tonal noise in the optimisation loop. However, since significant effort is devoted to the development of more economical approaches, optimisation will continue to rely on a large number of relatively cheap computations.

76 P. G. Tucker, 'Computation of unsteady turbomachinery flows: part 1 – progress and challenges', Progress in Aerospace Sciences, vol. 47, pp. 522–545, 2011
77 P. G. Tucker, 'Computation of unsteady turbomachinery flows: part 2 – LES and hybrids', Progress in Aerospace Sciences, vol. 47, pp. 546–569, 2011


We can therefore assume that, in the next few years, the main evolution with respect to HPC will probably be a significant increase in the number of computations as a function of the available resources, due to the need for robust optimisation and uncertainty quantification, as well as the exploration of ever-greater design spaces.

An issue that is very important and specific to optimisation is the heterogeneity of the application, not only due to the involvement of multiple physics, each with different timescales and computational requirements, but also due to the need to couple automatic geometry modification and mesh generation to the global steering by the optimisation tool. This requires even more flexible scheduling and the development/adaptation of heterogeneous communication protocols. Moreover, standard non-HPC numerical technology will need to be ported, in particular CAD manipulation and mesh generation tools. Alternatively, a connection to standard workstations will be required.

In   the   longer   run,   high-­‐fidelity   large-­‐scale   simulations   will   also   be   integrated   in   the   optimisation  chain.  Given   large   computational   requirements,   it   is   to   be   expected   that   this  will   only   be   possible  within   the   framework   of   a  multi-­‐fidelity   approach,   combining   different   levels   of   resolution   for   the  construction   of   the   surrogate   model.   The   time   frame   for   this   inclusion   therefore   relies   on   the  development   of   LES   and   hybrid   technology   but   also   on   the   development   of   the   mathematical  framework   underpinning  multi-­‐fidelity   surrogate  models.  We   can   probably   expect   to   see   the   first  demonstrators  towards  the  end  of  this  decade.  

6.3.2 Aeronautics78

The impact of computer simulation in aircraft design has been significant and continues to grow. Numerical simulation allows the development of highly optimised designs and reduced development risks and costs. Boeing, for example, exploited HPC in order to reduce drastically the number of real prototypes, from 77 physical prototype wings for the 757 aircraft to only 11 prototype wings for the 787 'Dreamliner' plane. HPC usage saved the company billions of euros.

Aircraft  companies  are  now  heavily  engaged  in  trying  to  solve  problems  such  as  calculating  maximum  lift  using  HPC  resources.  This  problem  has  an  insatiable  appetite  for  computing  power  and,  if  solved,  would  enable  companies  designing  civilian  and  military  aircraft  to  produce  lighter,  more  fuel-­‐efficient  and  environmentally  friendlier  planes.  

To  meet  the  challenges  of  future  aircraft  transportation  (‘Greening  the  Aircraft’),  it  is  vital  to  be  able  to   flight-­‐test   a   virtual   aircraft  with   all   its  multidisciplinary   interactions   in   a   computer   environment  and  to  compile  all  of  the  data  required  for  development  and  certification  with  guaranteed  accuracy  in  a  reduced  time  frame.    

For  these  challenges,  exascale  is  not  the  final  goal.  A  complete  digital  aircraft  will  require  more  than  Zflop/s  systems.  

In  parallel,  future  aircraft  concepts  require  deeper  basic  understanding  in  areas  such  as  turbulence,  transition  and  flow  control  to  be  achieved  by  dedicated  scientific   investigations  (see  above  on  each  engineering  scientific  item).  

The  roadmap  for  approaching  the  digital  aircraft  vision   includes  the  following  major  simulation  and  optimisation  challenges:  

• Improved physical modelling for highly separated flows
• Real-time simulation of aircraft in flight, coupling the aerodynamic, structural-mechanics, aeroelastic and flight-mechanics disciplines based on high-fidelity methods within a multidisciplinary, massively parallel simulation environment

• Aerodynamic  and  aeroelastic  data  production  

                                                                                                                         78  Philippe  Ricoux,  Stephane  Requena  


• Noise  source  and  impact:  full  development  of  noise  source  mechanisms,  acoustic  radiation  and  noise  impact  simulation  tools  which  compute  acoustic  disturbances  on  top  of  aircraft  flow    

• Multidisciplinary aircraft design: fully coupled simulation of the flow around a parameterised aircraft configuration and surface shapes covering a reactive structural model within a sophisticated optimisation process. The coupled large-scale simulations will run multiple times on exascale systems, allowing a mix of capacity and capability applications

In  terms  of  timing,  the  aeronautics  industry  has  already  produced  their  roadmap  linking  capacity  and  methods  (see  Figure  6.1).  

Figure 6.1. Aeronautics industry roadmap linking capacity and methods (courtesy of Airbus)

6.3.3 Seismic, Oil and Gas79

The petroleum industry is strongly motivated to increase the efficiency of its processes, especially in exploration and production, and to reduce risks by the deployment of high-performance computing. Typical steps in the business process are: geoscience for the identification of oil and gas underground; development of reservoir models; design of facilities for the recovery of hydrocarbons; drilling of wells and construction of plant facilities; operations during the life of the fields; and eventually decommissioning of facilities at the end of production.

Geoscience   analyses   seismic   data   with   numerical   techniques   for   inverse   problems.   The   economic  impact  of  HPC  is  definitely  high  and  the  best  possible  tools  are  deployed.    

Again, Eflop/s is not the ultimate goal: the complete resolution of the inverse problem for the wave equation needs even more computational resources.

The objective of this application is to produce, from a seismic campaign, the best estimation of the underground topography in order to optimise reservoir delineation and production by solving the full inverse wave equation. This application is largely embarrassingly parallel, and the higher performing the HPC system, the better the approximation of the underground topography.
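The embarrassingly parallel structure mentioned above can be sketched as follows: each shot (source experiment) is processed independently and the misfit contributions are summed at the end. The forward model and misfit below are toy stand-ins for a 3D wave-equation solve, and the local process pool stands in for distributing shots across the nodes of a large system.

```python
import numpy as np
from multiprocessing import Pool

# Sketch of the embarrassingly parallel structure of seismic inversion: each
# shot is simulated and its misfit contribution computed independently, then
# the contributions are summed. The "forward model" here is a toy stand-in
# for a 3D wave-equation solve.

N_SHOTS = 16
MODEL = np.full(100, 2000.0)        # toy velocity model [m/s] (assumed)

def shot_misfit(shot_id):
    """Independent work for one shot: forward model + misfit (toy version)."""
    rng = np.random.default_rng(shot_id)
    observed = rng.normal(0.0, 1.0, MODEL.size)     # pretend field data
    synthetic = np.zeros_like(observed)             # pretend simulated data
    residual = synthetic - observed
    return 0.5 * float(residual @ residual)

if __name__ == "__main__":
    with Pool(processes=4) as pool:                 # shots distributed over workers
        misfits = pool.map(shot_misfit, range(N_SHOTS))
    print("total misfit over all shots:", sum(misfits))
```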

                                                                                                                         79  Philippe  Ricoux  


Alongside seismic, other physics must be coupled and solved, such as electromagnetism (antennas, radar, sonar, etc., and more generally wave propagation). Maxwell's equations could be solved without modelling assumptions in 3D academic configurations to build databases that are post-processed to analyse the physics and to develop models devoted to practical systems.

A roadmap of the steps of this kind of approach, showing the successively more complex approximations of the physical reality that are needed (e.g. elastic, visco-elastic, etc.), is shown in Figure 6.2, courtesy of TOTAL; this roadmap is now accepted by all international oil companies. For this application, it is essential both to define and implement new algorithms that represent more accurately the physics of the problems to be solved, and to deploy ever more powerful hardware.

Moving beyond geoscience, the other activities in the petroleum industry have aspects which are generally classified as system-of-systems design and multiphase fluid dynamics. Of these topics, fluid dynamics requires a significant effort in terms of computing.

Similar  criteria  are  valid  for  multi-­‐fluid  problems  as  for  geoscience.  Enhanced  quality  of  simulations  depends  both  on  more  appropriate  physical  models  and  on  numerical  methods  and  techniques  (e.g.  for  bifurcation  analysis).  Physical   scales  are  disparate,   for   instance   in  pipeline  modelling  where   the  diameter  is  measured  in  fraction  of  a  metre  while  the  length  of  the  pipeline  is  normally  measured  in  kilometres.    

For  all  oil  and  gas  applications,  the  future  codes  must  merge  and  couple  multiscale  techniques  and  multi-­‐physics  models  due  to  the  complexity  of  non-­‐linear  and  stochastic  equations.  

These domains will therefore require one or more breakthroughs for efficient use of exascale systems.

In summary, the major problems and issues are:
• Use of standard programming models (MPI, OpenMP, etc.) and cross-compiling – a portable software stack for development on PCs and deployment on large HPC systems
• Maintenance of legacy codes
• Tools for test, verification and validation of (parallel) codes
• Memory access
• Data management
• Task management and distribution
• Efficient solvers
• Numerical multiscale techniques and methods
• Efficient massively parallel coupling
• Uncertainty quantification
• Data assimilation

Figure 6.2. Seismic depth imaging methods evolution and HPC (courtesy of TOTAL)



 

6.3.4 Power Generation, Nuclear Plant80

In this industrial domain,81 the objectives are multiple: (i) improvement of the safety and efficiency of the facilities (especially nuclear plants), and (ii) optimisation of maintenance operations and lifespan. This is one field in which physical experimentation, for example with nuclear plants, can be both impractical and unsafe. Computer simulation, in both the design and operational stages, is therefore indispensable.

Thermal  Hydraulic  CFD  Application  Field  

Improvement   of   efficiency   may   typically   involve   mainly   steady   CFD   calculations   on   complex  geometries,  while  improvement  and  verification  of  safety  may  involve  long  transient  calculations  on  slightly  less  complex  geometries.  

• Study of flow-induced loads to minimise vibration and wear through fretting in fuel assemblies may require from 200 million to 2 billion cells per fuel assembly; to account correctly for both cross-flows in the core and the walls around the core, at least one quarter of a core (over 100 assemblies) may need to be modelled.

• To study flow-induced deformation in PWR (pressurised water reactor) cores, a full core may need to be represented, at a slightly lower resolution, for an estimated grid size of at least 5 billion cells, which leads to runs on 100 Pflop/s systems over several weeks.

• Detailed simulations designed to verify and increase safety may require full core simulations, and mesh sensitivity studies for these transient calculations may require unsteady calculations for meshes from 5 to 20 billion cells before 2020, which correspond to runs on 400 Pflop/s systems over several weeks.

• To  validate  the  models  used  for  calculations  such  as  the  ones  described  above,  as  well  as  many  others,  running  quasi-­‐DNS  type  calculations  on  subsets  of  the  calculation  domain  may  be  necessary.    

This  will  require  meshes  in  the  20-­‐billion  cell  range  by  2012  (to  study  cross-­‐flow  in  a  tube-­‐bundle,  in  a  simplified  steam  generator  type  configuration),  and  running  similar  calculations  for  more  complete  calculation  domains  may  require  meshes  well  above  100  billion  by  2020.    

Note   that,   as   safety   studies   increasingly   require  assessment  of  CFD  code  uncertainty,   sensitivity   to  boundary  conditions  and  resolution  options  must  be  studied,  but  turbulence  models  may  still  induce  a  bias  in  the  solution.    

 

 

 

80 Philippe Ricoux, EDF, EESI
81 Note that we do not consider HPC applications linked to nuclear weapons in this report, but restrict our attention to civil nuclear applications.

 

Doing  away  with  turbulence  models  and  running  DNS-­‐type  calculations  at  least  for  a  set  of  reference  calculations  would  be  a  desirable  way  of  removing  this  bias.  Such  studies  will  require  access  to  multi-­‐Eflop/s  capacities  over  several  weeks.  

Neutronics  Application  Field  

This includes the capability to model very complex, possibly coupled phenomena over extended spatial and time scales. In addition, uncertainty quantification and data assimilation are considered key to industrial acceptance, so their associated computational needs, which depend on the complexity of the model considered, have to be met.

In terms of computing resources, projections are difficult to make because of the non-linear behaviour of iterative algorithms with respect to the degrees of freedom – and the number of processors. Additionally, new algorithms may have to be implemented to address the new types of numerical/physical problems within an evolving architecture.

Electric Power Generation Overview

Many other applications exist beyond those mentioned: new generations of power plants, innovation in renewable energies and storage, protection against specific environmental threats (earthquakes, flood, heatwave, etc.), customers' energy efficiency, development of home and building technologies and services for energy efficiency, etc.

Several problems should be addressed to reach these goals: CFD, heat and multi-fluid flows, thermal hydraulic CFD, LES simulations, etc., modelling very complex systems, with possibly coupled phenomena over extended spatial and time scales, mixed with capabilities such as uncertainty quantification or data assimilation.

Figure 6.3. A possible roadmap for Eflop/s neutronics computation (courtesy of EDF)



The  challenge  is  particularly  severe  for  multi-­‐physics,  multi-­‐scale  simulation  platforms  that  will  have  to   combine   massively   parallel   software   components   developed   independently   from   each   other.  Another  difficult  issue  is  to  deal  with  legacy  codes,  which  are  constantly  evolving  and  have  to  stay  in  the  forefront  of  their  disciplines.    

This will require new compilers, libraries, middleware, programming environments and languages, as well as new numerical methods, code architectures, mesh generation tools, visualisation tools, etc.

6.3.5 Transportation, Automotive82

The automotive industry is actively pursuing important goals that need Eflop/s computing capability or greater, including the following examples:

• Vehicles  that  will  operate  for  250,000  kilometres  (150,000  miles)  on  average  without  the  need  for  repairs  –  this  would  provide  considerable  savings  for  automotive  companies  by  enabling  the  vehicles  to  operate  through  the  end  of  the  typical  warranty  period  at  minimal  cost  to  the  automakers    

• Full-­‐body  crash  analysis  that  includes  simulation  of  soft  tissue  damage  (today's  ‘crash  dummies’  are  inadequate  for  this  purpose)  –  insurance  companies  in  particular  require  this  

• Longer-­‐lasting  batteries  for  electrically  powered  and  hybrid  vehicles  

For both aerodynamics and combustion, at least LES, and if possible DNS, simulations are required at industrial scale, and Eflop/s applications must be developed at the right scale according to weak scalability; these simulations must also be coupled to all the physics (flow, thermal, thermodynamic, chemistry, etc.) involved in the global transportation system. This leads to a requirement for coupled simulations involving at least one legacy code with:

• Full-­‐scale,  multi-­‐physics  configurations  

• Multiple  runs  for  optimisation  and  parametric/statistical  analysis.  

The global roadmap for this sector could thus be as follows:
• Individual performance and scalability of component codes
• Eflop/s systems will mainly allow multiple runs, by 'farming' applications, for 'optimised' resolutions
• Overall performance of the multi-physics coupled system; once more, that leads to farming applications
• Data management

For  combustion  and  external  aerodynamics,  see  above  specific  scientific  descriptions  (section  6.2.2).  

Crash  

Crash simulation must evolve from the present, where most computations are run in parallel on 8–64 cores and scalability tests have shown that up to 1,024 cores may be reasonable for 10 million finite elements, to a future where model sizes for a full car will range between 1.5 and 10 billion elements.

New   codes   (mainly   open-­‐source)   must   be   developed   for   Eflop/s   systems   with   the   following  attributes:    

                                                                                                                         82  Philippe  Ricoux,  Stephane  Requena  


• Coupling   to   perform   a   standardised   mapping   between   manufacturing   simulation   and   crash  simulation    

• Optimisation  and  stochastic  analysis  

• First multi-level computations are being tested in research and in industrial applications where so-called sub-cycling is used – more detailed parts of the problem are treated on a dedicated group of cores.

• In   general,   crash   simulation   is   already   well   embedded   into   a   simulation   data   management  system  with   automated   pre-­‐   and   post-­‐processing   including  monitoring   and   coupling   to   other  fields  and  functionalities.  

For  the  10-­‐year  perspective,  the  following  main  challenges  must  be  addressed:    

• True virtual testing replacing some physical tests, which requires reliable computations – for example, it will be necessary to replace parts meshed with shells by 3D meshes (a middle pillar meshed with 30 million finite elements at Audi).

• Handling of a much higher complexity of finite element models (new materials, human models instead of dummies, etc.); these new materials require better and more efficient/stable algorithms. The human models have to be improved, with stochasticity included. It will be necessary to consider not only meshed driver models but also fully 3D meshed passenger models, to increase the number of test cases to be more representative of typical real car accidents, and to model more precisely the behaviour of all the airbags with a good acceleration model (including a law to model airbag release).

• Ensuring  that  the  overall  computational  wall  clock  time  remains  constant  (ideally  ca.  8  hours  for  an  overnight  production  run).  

• Addressing  true  multidisciplinary  and  multi-­‐physics  simulations  including  optimisations  and  stochastic  analysis;  this  will  lead  to  a  factor  of  >  1,000  for  the  required  number  of  computations  compared  to  today  and  the  necessity  to  embed  all  simulations  in  an  overall  simulation  data  management.  As  an  example,  optimising  by  hand  is  possible  for  three  to  five  parameters  but  not  for  >  100  parameters  at  the  same  time.  A  big  challenge  is  to  lower  the  weight  of  cars  (in  order  to  reduce  their  consumption)  and  first  R&D  studies  showed  that  there  is  a  need  to  change  materials  (from  steel  to  aluminium)  and  as  a  result,  reconsider  the  weight/cost/performance  ratio.  This  re-­‐conception  process  with  such  new  materials  will  need  to  perform  massive  shape  optimisation  studies  in  order  to  maintain  performance  and  safety  while  reducing  weight  and  cost.  

• Establishing  robust  topology  and  shape  optimisation  for  crash  including  meta-­‐modelling  techniques  for  fast-­‐coupled  multidisciplinary  analysis  (especially  for  fluid  structure  coupling)  or  for  crash/stamping  coupling  with  a  very  accurate  representation  of  materials.    

• Multi-level simulations where some local effects (e.g. failure) are studied on the meso-level in parallel to the overall macro-computation, which might be realised based on hybrid parallelisation schemes. Representation of the tearing of sheets and the fracture of spot welds is important because it changes the crash shock scenario. The models currently used in industry do not represent, even in a simplified way, the behaviour at the meso level. The latter is expected to involve increasing the size of the models by a factor of two.

 

Addressing  these  challenges  requires  projects  in  the  following  areas:  

• Due to the complexity and high non-linearity of crash simulation, it will be difficult to progress through strong scalability alone. The limit is estimated to lie between 64 cores (the current standard) and 2,000 cores for next-generation crash simulations. Farming applications should run on Eflop/s machines able to address simultaneously both capability and capacity simulations.


• Memory is currently not an issue because of the explicit FE (Finite Element) method, but it will become more important, especially given the trend towards coupled simulations where a large amount of data needs to be mapped from manufacturing simulation to crash simulation. In the future, some companies may use implicit methods, which are more memory intensive. As an example, BMW is using implicit methods for its crash simulations: the computational cost is greater but the results are much more accurate. The ESI Group is working on merging explicit and implicit methods to allow the possibility of performing a crash and an NVH (Noise, Vibration and Harshness) simulation with the same mesh (a minimal contrast of explicit and implicit time stepping is sketched after this list).

• Automated  pre-­‐processing  should  be  improved  (e.g.  meshing  of  3D  objects,  coupling  between  CAD  and  CAE,  unified  geometrical  modelling  by  isogeometric  analysis,  parameterisations  for  sensitivity  and  optimisation  studies).    
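A minimal contrast of explicit and implicit time stepping, as referred to in the list above, is sketched below on a toy mass-spring chain (a stand-in for an FE semi-discretisation). The explicit step needs essentially only matrix-vector products with a lumped mass matrix, whereas the implicit step requires a linear solve at every time step, which is where the additional memory and cost come from. The matrix sizes and the time step are arbitrary assumptions.

```python
import numpy as np

# Toy contrast of explicit vs implicit time stepping for M a + K u = 0 on a
# small mass-spring chain. The explicit step uses only matrix-vector products
# (the "solve" with a lumped, diagonal M is a trivial element-wise division);
# the implicit step requires solving a linear system each step.

n, dt = 50, 1e-3
M = np.eye(n)                                            # lumped mass matrix
K = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)     # stiffness of a chain

u = np.zeros(n); u[n // 2] = 1.0                         # initial displacement
v = np.zeros(n)

# Explicit (symplectic Euler) step with lumped mass
a = -np.linalg.solve(M, K @ u)                           # trivial for diagonal M
v_exp = v + dt * a
u_exp = u + dt * v_exp

# Implicit (backward Euler) step: solve (M + dt^2 K) u_new = M (u + dt v)
A = M + dt**2 * K
u_imp = np.linalg.solve(A, M @ (u + dt * v))
v_imp = (u_imp - u) / dt

print("explicit |u| =", np.linalg.norm(u_exp), " implicit |u| =", np.linalg.norm(u_imp))
```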

As  for  the  numerous  other  industrial  applications  in  the  energy  domain,  the  following  methods  must  be  addressed  for  an  Eflop/s  crash  test  application:    

• More  efficient  algorithms  for  stochastic  modelling  

• More efficient algorithms for shape and topology optimisation; current single crash simulations take around 8 to 15 hours on up to 64 cores. The target will be to perform, in an overnight run, a whole shape-optimisation study on a full body consisting of 10x or 100x single crash simulations for analysis the next day (a sketch of such a 'farming' study follows this list).

• Establishment  of  a  uniform  approach  for  CAD  and  CAE  (and  other  CAx),  already  demonstrated  on  subsystems  to  tens  of  parameters;  the  target  would  be  to  use  it  on  a  full  system  

• Improved  material  models  for  soft  tissues  (human  model),  composites,  honeycomb  structures,  multi-­‐material  light  weighting.  

• Algorithms  for  multi-­‐level  analysis  for  composites  and  other  new  lightweight  materials  where  a  coupling  between  manufacturing  and  crash  simulation  is  realised  

• Algorithms  for  multi-­‐physics  (especially  for  electric  cars)  and  multidisciplinary  simulations  

• New  techniques  for  parallelisation  to  improve  scalability  (based  on  sub-­‐cycling  or  other  approaches)  

• Robust meshing techniques for 3D modelling, which can be used during simulation to enable shape optimisation and adaptive multi-level computation (e.g. for failure analysis). Adaptive meshing is still not possible and the trend is to move to the standardisation of functions used in CAD and mesh.

• Fluid–structure interaction (simulation and optimisation), in combination with the simulation of combustion and pollutant emissions.
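The 'farming' pattern referred to in the list above can be sketched as follows: many independent, parameterised runs are launched concurrently (here with a local process pool standing in for the capacity partition of a large system) and the designs are ranked afterwards. The crash objective and the two design parameters are placeholders, not a real crash model.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

# Sketch of "farming" a shape-optimisation study: many independent,
# parameterised runs are launched concurrently and the designs are ranked
# afterwards. crash_run is a placeholder for a full crash simulation.

def crash_run(params):
    """Placeholder for one single-crash simulation of a parameterised design."""
    thickness, radius = params
    intrusion = 100.0 / thickness + 5.0 * (radius - 0.3) ** 2   # toy metric [mm]
    mass = 20.0 * thickness + 2.0 * radius                      # toy mass [kg]
    return intrusion + 0.5 * mass                               # scalar merit

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    designs = [(t, r) for t, r in zip(rng.uniform(1.0, 3.0, 64),
                                      rng.uniform(0.1, 0.5, 64))]
    with ProcessPoolExecutor(max_workers=8) as pool:            # capacity "farm"
        merits = list(pool.map(crash_run, designs))
    best = int(np.argmin(merits))
    print("best design:", designs[best], "merit:", round(merits[best], 2))
```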

6.3.6 Other Important Industrial Applications83

Industrial Medical Applications. This is a large market, although the companies involved are mainly SMEs. HPC is used for cardiovascular flows, the modelling of the brain (not yet industrialised), tumour growth and medical images (combination of MRIs, for example). This market will grow with increasing performance in viscous-flow simulation, image processing, 2D/3D reconstruction and 'big data' management.

Industrial Pharma Applications. All these industries, firmly established in Europe, already use ab-initio and molecular simulation applied to their domains.

                                                                                                                         83  Philippe  Ricoux,  Olivier  Pironeau  


They will increase R&D efforts in this field (see sections 4 and 5 of this report) for drug design (GSK, Sanofi) and biomedical applications (L'Oréal). The main issues for these industries include:

• 'Big data' management, generation, transport and storage – due to screening simulations
• Exascale-efficient MD software
• New data mining for massively parallel QSAR (quantitative structure–activity relationships)
§ (cf. the bio and medical sciences sections of this Scientific Case for academic developments)

Banks and Insurance Companies are increasingly using HPC, mostly for embarrassingly parallel Monte Carlo solutions of stochastic ODEs, but high-frequency trading will inevitably require better models and faster calculations. They also face the challenge of interconnecting supercomputers and several private clouds. Finally, in common with many other industries mentioned in this report, they are faced with the 'big data' problem, in the sense that massive market data are available (Reuters) and current calibration algorithms cannot exploit such large inputs. Note that 41 machines are characterised as 'finance' in the Top 500 list (November 2011).84
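As an illustration of the workload described above, the sketch below prices a European call by Monte Carlo simulation of Euler-Maruyama paths of a geometric Brownian motion; every path (or batch of paths) is independent, which is what makes the problem embarrassingly parallel. All market parameters are assumed, illustrative values.

```python
import numpy as np

# Sketch of the embarrassingly parallel Monte Carlo workload mentioned above:
# Euler-Maruyama paths of a geometric Brownian motion dS = r S dt + sigma S dW,
# used to price a European call. Parameters are illustrative; in production
# each batch of paths can be farmed out independently.

rng = np.random.default_rng(42)
S0, K, r, sigma, T = 100.0, 105.0, 0.02, 0.25, 1.0
n_paths, n_steps = 200_000, 250
dt = T / n_steps

S = np.full(n_paths, S0)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)
    S += r * S * dt + sigma * S * dW          # Euler-Maruyama update

payoff = np.maximum(S - K, 0.0)
price = np.exp(-r * T) * payoff.mean()
print(f"Monte Carlo call price ~ {price:.3f}")
```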

Emerging   Technologies.     New   types   of   industry   are   evolving   around   computer   networks,   data  mining,  social  networks,  etc.  The  main  issues  here  include:  

• 'Big data' management, generation, transport, storage
• New data mining for massively parallel analysis (K Tables, etc.)

One of the major issues will be to allow major companies to take advantage of HPC to increase their competitiveness, but also to help their entire supply chain (including SMEs) to engage in the use of HPC. That means that, from one perspective, large exascale systems should be easily downscaled to Pflop/s 'in-a-box' systems for SMEs, and that software is made available, known and affordable for such small companies. Such an issue is crucial for ensuring global European industrial competitiveness.

 

                                                                                                                         84  http://www.top500.org/lists/2011/11  


6.4 Engineering and Industrial Exascale Issues

The major issues from both an academic and an industrial perspective that must be addressed in order to enable efficient exascale applications are shown in Table 6.1 below.

Table 6.1. Enabling exascale applications – an academic and industrial perspective

1. The Simulation Environment

• Unified  Simulation  Framework  and  associated  services:  CAD,  mesh  generation,  data-­‐setting  tools,  computational  scheme  editing  aids,  visualisation,  etc.    

• Multi-­‐physics  simulations:  establishment  of  standard  coupling  interfaces  and  software  tools,  mixing  legacy  and  new  generation  codes    

• Common  (jointly  developed)  mesh-­‐generation  tool,  automatic  and  adaptive  meshing,  highly  parallel  

• Standardised  efficient  parallel  I/O  and  data  management  (sorting  memory  for  fast  access,  allocating  new  memory  as  needed  in  smaller  chunks,  identifying  memory  that  is  rarely/never  needed  based  on  heuristic  algorithms,  etc.)    

2. Codes/Applications

• New  numerical  methods,  algorithms,  solvers/libraries,  improved  efficiency  

• Optimisation,  data  assimilation  

• Coupling  between  stochastic  and  deterministic  methods,  uncertainty  quantification  

• Numerical  scheme  involving  stochastic  HPC  computing  for  uncertainty  and  risk  quantification  

• Meshless methods and particle simulation
• Large databases, 'big data', new methods for data mining and valorisation

• Scalable  programs,  strong  and  weak  scalability,  load  balancing,  fault-­‐tolerance  techniques,  multi-­‐level  parallelism  (issues  identified  with  multi-­‐core  with  reduced  memory  bandwidth  per  core,  collective  communications,  efficient  parallel  I/O)    

• Development of standard programming models (MPI, OpenMP, C++, Fortran, etc.) handling multi-level parallelism and heterogeneous architectures (GPUs)

3.  Archival  Storage  and  Data  Transfer    

• Certainly  one  of  the  hardest  challenges  will  be  in  archival  storage,  network  capacity  to  transfer  multi-­‐petabyte  data  sets,  and  post-­‐processing  tools,  such  as  graphics  software  capable  of  managing  individual  files  in  the  multi-­‐TBytes  range.    

• This may require the establishment of one or several dedicated service centres across Europe, linked to mass-storage facilities. Basic engineering research, as opposed to proprietary development, is cooperative, and it is important that access to such data and centres remains open to groups beyond the data originators for several years after the simulations are run.

4. Human Resources
• Training and education of HPC developers and engineers


6.5 A  Roadmap  for  Computational  Requirements  

For many academic and industrial applications, the computational requirements are very similar. The timing is also similar, with a step around 2015–2017 for 100 Pflop/s systems and 2020–2022 for exascale. For turbulence, combustion, aeronautics, etc., these steps correspond to developing new resolution methods (LES, DNS-like) and increasing the number of grid points (as described in previous sections). Table 6.2 below represents a good compromise for all these domains involving automatic mesh generation.

Table 6.2. Computational requirements for domains involving automatic mesh generation

Case                     | Adverse-pressure-gradient boundary layer, Reθ = 20,000   | Compressible jet, with nozzle and acoustics, ReD = 50,000 | LES of multi-stage low-pressure turbine (50 blades)
Current State of the Art | Reθ = 2,000                                              | ReD = 8,000, developing from pipe                         | RANS modelling
Likely Date              | 2015                                                     | 2015                                                      | 2020
Grid Points (Gpoints)    | 300                                                      | 32                                                        | 5,000
CPU Hours (IBM BG-P)     | 5 Gh                                                     | 2.5 Gh                                                    | 200 Gh
Cores (BG-P)             | 4 M cores                                                | 1.5 M cores                                               | 60 M cores
Central Storage          | 80 TBytes                                                | 80 TBytes                                                 | 5 PBytes
Archival Disk Storage    | 7 PBytes                                                 | 400 TBytes                                                | 10 PBytes
Notes                    | Extension of current software (several cases to be run) | Extension of current software                             | Requires new integration, gridding and other software

 As  one  illustrative  example,  we  project  the  progress  expected  in  direct  numerical  simulation  in  Table  6.3.    

Table 6.3. Direct Numerical Simulation challenges and expected status in the 2012–2020 timeframe

Year                  | 2012                                                                 | 2017                                                                                     | 2022
Computational Power   | 5 Pflop/s                                                            | 100 Pflop/s                                                                              | > 1 Eflop/s
Main Memory           | 100 TBytes                                                           | 1 PByte                                                                                  | 10 PBytes
# Cores               | 100 K                                                                | 2 M                                                                                      | 100 M
# Particles Simulated | 100 K – 10 M                                                         | 1 M – 100 M                                                                              | > 1 G
Particle Model        | Mass point or spherical                                              | Geometrically resolved, rigid                                                            | Elastic deformable
# CFD Grid Cells      | 10 G – 100 G                                                         | 100 G – 1 T                                                                              | > 10 T
Core Hours            | > 20 M                                                               | > 200 M                                                                                  | > 2 G
Notes                 | Simple particle models, explicit coupling for about 10 M time steps | Using fluid–structure interaction techniques and immersed/embedded boundary techniques  | Using models of deformable particles, each with e.g. 100 degrees of freedom

For industrial applications such as oil and gas, aeronautics and nuclear plants, exascale is not the ultimate goal but just a stepping stone towards zettascale. Solving the seismic inverse problem, designing a digital aircraft, or tackling space applications will require much more than exascale computers.

6.6 Expected Status in 2020

Eflop/s computers are expected from HPC vendors around 2020, and one of the key issues will be to keep the overall power consumption at an acceptable level of around 20 MW. These systems will be a central ingredient in the further development of engineering research in Europe. Several fundamental flow problems are simply waiting for the availability of such machines, for example certain turbulence codes and several DNS campaigns.

For the majority of applications, however, the evidence suggests that the real need, namely large-scale cooperative simulation projects, is not currently addressed by EU funding schemes. Such projects should include technological and basic research into areas such as flow physics, code integration and interfacing, verification and validation, gridding, numerics, parallelisation, and the interaction of all these aspects with new computer and accelerator architectures.

What will be particularly required by 2020 is software offering load balancing and fault tolerance, developed in close coupling with user needs. What is clearly expected by 2020 includes:

• Standard coupling interfaces and software tools
• Mesh-generation tools with automatic and adaptive meshing, highly parallel (a quick scaling check follows this list):
  § today: meshes of ca. 100 million tetras, 16k cores, one second of physical time
  § expected by 2020: 10 billion tetras, 1.5 million cores, one second of physical time
• Multi-physics, refined chemistry
• Billion-particle simulations
• New numerical methods, algorithms, solvers/libraries
• Uncertainty quantification
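The quick scaling check referred to in the meshing bullet above is sketched here; it uses only the figures quoted in that bullet and shows that the 2020 target is essentially a weak-scaling extrapolation of today's per-core workload.

    # Quick check of the mesh-generation target: is it essentially weak scaling?
    cells_now, cores_now = 100e6, 16e3        # ca. 100 million tetras on 16k cores
    cells_2020, cores_2020 = 10e9, 1.5e6      # 10 billion tetras on 1.5 million cores

    print(f"cells per core today : {cells_now / cores_now:,.0f}")
    print(f"cells per core 2020  : {cells_2020 / cores_2020:,.0f}")
    # Both come out at roughly 6,000-7,000 cells per core, i.e. the 2020 goal keeps
    # today's per-core workload while scaling the mesh and the core count together.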

Finally, we reference again the three prospective industry roadmaps shown earlier in this section, which capture in turn the expectations of the aeronautics industry (Figure 6.1), the evolution of seismic depth imaging methods (Figure 6.2) and a possible roadmap for Eflop/s neutronics computation.


7 REQUIREMENTS FOR THE EFFECTIVE EXPLOITATION OF HPC BY SCIENCE AND INDUSTRY

7.1 Introduction

All of the panels contributing to this report are convinced that the competitiveness of European science and industry will be jeopardised if sufficiently capable computers are not made available, together with the associated infrastructure necessary to maximise their exploitation. In reviewing the scientific impact and societal benefits in the preceding sections of this report, the panels have identified multiple areas at risk and conclude that access to high-performance computers in the exascale range is of utmost importance.

Such  resources  are  likely  to  remain  extremely  expensive  and  require  significant  expertise  to  procure,  deploy   and   utilise   efficiently;   some   fields   even   require   research   for   specialised   and   optimised  hardware.   The   panel   stresses   that   these   resources   should   continue   to   be   reserved   for   the   most  exigent   computational   tasks   of   high   potential   value.   It   is   clear   that   the   computational   resource  pyramid  must   remain  persistent  and  compelling  at  all   levels,   including  national  centres,  access  and  data   grids.   The   active   involvement   of   the   European   Community   along   with   appropriate   Member  States  remains  critical  in  establishing  a  world-­‐leading  supercomputer  infrastructure  in  the  European  ecosystem.   Europe   must   foster   excellence   and   cooperation   in   order   to   gain   the   full   benefits   of  exascale  computing  for  science,  engineering  and  industry  in  the  European  Research  Area.    

In  pointing  to  the  compelling  need  for  a  continued  European  commitment  to  exploit  leadership  class  computers,   the   panels   have   considered   the   infrastructure   requirements   that   must   underpin   this  commitment,  and  present  their  considerations  below  as  part  of  the  review  of  computational  needs.  This  considers  both  the  vital  components  of  the  computational  infrastructure,  and  the  user  support  functions  that  must  be  provided  to  realise  the  full  benefit  of  that  infrastructure.    This  review  has  led  to   a   set   of   key   recommendations   deemed   vital   in   shaping   the   future   provision   of   resources,  recommendations   that   are   justified   below   and   presented   as   sidebars   in   the   following   text,   and  captured  as  part  of  the  Executive  Summary  to  this  report.  

 

7.2 An Effective and Persistent Infrastructure

The resources required to support computational science through a number of large and often diverse computational projects span a hierarchy of levels – desktop, departmental or laboratory level machines, regional centres and supercomputer centres. These resources need to be organised in a hierarchical multi-tier pyramid and connected by adequate high-speed links and protocols. Furthermore, there are several complementary functions that must be provided by a Computational Infrastructure if it is to prove both effective and persistent (see Table 7.1).

Usually, ‘capacity computing’ is deployed against tasks (b) and (c), while ‘capability computing’ provides the only solution to deliver against task (d). In this Scientific Case, we advocate that this essential component of capability computing should be performed through shared European services that complement national facilities. This will add value at all levels, in particular by being more competitive on the innovative aspects permitted by type (d) tasks. We also show that the infrastructure needs to embrace a pyramid of resources to deliver effectively against all of the above, and we consider from a scientific perspective how this infrastructure might best be balanced.

Recommendation: Need for HPC Infrastructure at the Europe Level

The scientific progress that has been achieved using HPC since the ‘Scientific Case for Advanced Computing in Europe’ was published in 2007, the growing range of disciplines that now depend on HPC, and the technical challenges of exascale architectures make a compelling case for continued investment in HPC at the European level. Europe should continue to provide a world-leading HPC infrastructure to scientists in academia and industry, for research that cannot be done any other way, through peer review based solely on excellence.

In   order   to   integrate   the   variety   of   resource   levels,   facilitate   access   for   users   and   simplify   the  management   of   the   extreme   volumes   of   data   required,   an   appropriate   electronic   data  communication  infrastructure  is  key.  Typically  referred  to  as  a  ‘Grid’,  this  infrastructure  needs  to  be  highly  tuned  for  HPC  usage,  and  connected  to  the  various  tiers  of  HPC  facilities.  

 Table  7.1.  Complementary  functions  of  an  effective  and  persistent  infrastructure.  

(a)  The  development  and  evolution  of  innovative  application  programs,  models  and  methods  (we  return  to  this  function  in  7.5.1  below).  

(b)  Preparatory  and  post-­‐processing  work,  permitting  the  design  and  validation  of  particular  models;  this  may  require  both  data  preparation  plus  the  analysis  and  exploitation  of  the  data  generated  by  the  computations.  

(c)  Large-­‐scale  systematic  studies,  where  each  case  requires  true  supercomputer  power.  This  enables  exploration  of  the  parameter  space  of  devices  and  phenomena,  with  the  ability  to  deal  with  multiple  combinations  of  parameter  values,  thereby  enabling  the  investigation  of  the  statistical  behaviour  of  phenomena  i.e.  uncertainty  quantification.  

(d)  Extremely  large,  so-­‐called  'hero'  computations,  where  the  sheer  power  of  the  entire  computational  resource  is  used  to  study  more  detailed  models  than  previously  possible.  The  objective  may  be  scientific  insight,  where  the  model  would  include  scientific  aspects  not  previously  understood,  or  an  attempt  to  deal  with  more  detailed  data  than  usually  feasible.  In  industry,  it  may  be  necessary  to  validate  models  extensively  before  they  are  used  more  routinely  in  design  processes.  Extremely  large  computations  may  also  be  required  to  deal  with  unexpected  situations  and  incidents,  in  order  to  mitigate  the  consequences  or  rapidly  prepare  design  changes.  

(e)  Efficient  algorithms  are  an  essential  ingredient  of  any  HPC  project.  As  larger  and  larger  problems  are  solved  on  larger  and  larger  computers,  it  becomes  increasingly  important  to  select  optimal,  or  near  optimal,  algorithms  and  solvers.  As  most  problems  have  a  superlinear  computational  complexity,  simply  relying  on  hardware  advances  to  solve  these  larger  problems  is  ultimately  doomed.  Moreover,  some  of  the  more  critical  tasks  are  generic,  in  the  sense  that  they  are  not  tied  to  one  particular  application  –  or  even  one  particular  field  –  but  will  occur  in  most  of  the  challenges  listed  in  this  report  e.g.,  the  solution  of  (large  and  sparse)  linear  and  non-­‐linear  systems,  computation  of  the  Fast  Fourier  Transform  (FFT)  and  integration  of  time-­‐dependent  differential  equations.  
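Item (e) argues that algorithmic improvements often outweigh raw hardware gains. The sketch below (illustrative only, assuming NumPy is available) makes this point for one of the generic kernels mentioned there, contrasting a direct O(N^2) evaluation of the discrete Fourier transform with the O(N log N) FFT on the same machine.

    # Direct O(N^2) DFT versus the O(N log N) FFT: same result, very different cost.
    import time

    import numpy as np

    def naive_dft(x):
        # Straight evaluation of the DFT definition: builds an N x N matrix of
        # complex exponentials, so work and memory both grow as N^2.
        n = len(x)
        k = np.arange(n)
        twiddle = np.exp(-2j * np.pi * np.outer(k, k) / n)
        return twiddle @ x

    x = np.random.rand(2048)

    t0 = time.perf_counter()
    slow = naive_dft(x)
    t1 = time.perf_counter()
    fast = np.fft.fft(x)
    t2 = time.perf_counter()

    print(f"naive DFT: {t1 - t0:.4f} s   FFT: {t2 - t1:.6f} s")
    print("results agree:", np.allclose(slow, fast))

The gap widens rapidly with N, which is why near-optimal algorithms and solvers, not hardware advances alone, determine whether the larger problems described in this report remain tractable.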

 

There   was   widespread   consensus   among  the   panels   that   the   development   of   the  infrastructure,   its   operation   and   access  mechanisms  must   be   driven   by   the   needs  of  science,  industry  and  society  to  conduct  world-­‐leading   research.   PRACE   should  work  more   closely  with   its  users,  with   the  leadership  and  management  involving  both  researchers  and  providers.  

While this report targets specifically the infrastructure required to handle capability jobs, it also acknowledges the importance of the remainder of the pyramid. This comprises the grid or network infrastructure, which can be based on state-of-the-art developments within existing projects. The national computational centres are important resources; our view is that the European dimension should consider them as an integral part of the European resource pyramid, whose apex should be an exceptional, exascale-level resource permitting very large capability-class computations. Such an approach will position European capability resources at a level comparable to the best in the world, resources that to date are predominantly available in Japan and the USA.

Recommendation: Integrated Environment for Compute and Data

Most application areas foresee the need to run long jobs (for months or years) at sustained performances around 100 Pflop/s to generate core data sets and very many shorter jobs (for hours or days) at lower performances for pre- and post-processing, model searches and uncertainty quantification. A major challenge is the end-to-end management of, and fast access to, large and diverse datasets, vertically through the infrastructure hierarchy. Most researchers seek more flexibility and control over operating modes than they have today to meet the growing need for on-demand use with guaranteed turnaround times, for computational steering and to protect sensitive codes and data. Europe-level HPC infrastructure should attach equal importance to compute and data, provide an integrated environment across Tiers 0 and 1, and support efficient end-to-end data movement between all levels. Its operation must be increasingly responsive to user needs and data security issues.

Recommendation: Leadership and Management

The development of Europe’s HPC infrastructure, its operation and access mechanisms must be driven by the needs of science and industry to conduct world-leading research. This public-sector investment must be a source of innovation at the leading edge of technology development and this requires user-centric governance. Leadership and management of HPC infrastructure at the Europe level should be a partnership between users and providers.

The   effect   of   a   European   collaboration   to   advance   the   apex   of   the   resource   pyramid   amounts   to  positioning   it  competitively  with  respect   to  similar  systems   in  other  major  countries,  notably   Japan  and   the  USA,  and   the  emerging  HPC  nations  undertaking  ambitious  HPC  programs,   including   India,  Russia  and  China.  The  key  driver   is   to  promote  scientific  competitiveness;   these  systems  should  be  targeted   strategically   at   scientific   challenges   with   the   full   support   and   agreement   of   the   relevant  scientific  communities.  In  this  report,  we  show  that  a  wide  spectrum  of  scientific  challenges  demand  exascale  resources  best  achieved  at   the  European   level.  The   justification  of  such  an  endeavour  has  been  given  on  scientific  grounds.    

Most  application  areas   foresee   the  need   to   run   some   long   jobs   (for  months  or   years)   at   sustained  performances  around  100  Pflop/s,   typically   to  generate  core  data  sets,  and  very  many  shorter   jobs  (for   hours   or   days)   at   lower   performances   for  pre-­‐   and   post-­‐processing,  model   searches   and  uncertainty   quantification.   This   requires   a  small   number   of   Eflop/s   machines   at   Tier-­‐0,  integrated  with  a  much  larger  number  of  multi-­‐Pflop/s   machines   at   Tier-­‐1.   The   main  impediments   to   realising   these   performances  are   the   management   of,   and   fast   access   to,  multi-­‐PByte   datasets   and   the  algorithm/software  challenges  of  strong  scaling  to  exploit  Eflop/s  architectures  efficiently.    

The computational materials science, chemistry and nanoscience community in Europe comprises more than 10,000 scientists – probably some tens of thousands – working in fields as diverse as nanoelectronics, steel, blood flow, poly-electrolytes and bio-compatible materials. Such an active and diverse community has applications in capability and capacity computing that are best served by a heterogeneous computational science infrastructure and flexible policies for PRACE access and project duration. Many applications require capacity computing. Examples include the investigation of the properties of quantum materials with strongly correlated electrons exhibiting exotic properties, multiscale simulations of complex fluids, soft and biomaterials and heterogeneous materials, or the complete simulation of a nanoelectronic device. Dealing with all elements of the periodic table, introducing myriads of atoms, scanning temperature, pressure and chemical potential ranges, and simulating non-equilibrium processes, including external stimuli, reveals a large phase space of opportunities that encourages further progress by combinatorial materials optimisation to develop a treasure map for technological applications. This type of science is not possible without powerful capacity computing capabilities. On the other hand, capability computing should serve dynamical mean field, Quantum Monte Carlo, molecular dynamics and order-N density functional theory software. In these circumstances, the importance and the adoption of the methods would undoubtedly increase, with different computer platforms required as a function of algorithm.

Recommendation: Thematic Centres

Organisational structure is needed to support large long-term research programmes, bringing together competences to share expertise. This could take the form of virtual or physical thematic centres which might support community codes and data, operate dedicated facilities, focus on co-design, or have a cross-cutting role in the development and support for algorithms, software, or tools. While some existing application areas have self-organised in this way, new areas such as medicine might achieve more rapid impact if encouraged to follow this path. Thematic centres should be established to support large long-term research programmes and cross-cutting technologies, to preserve and share expertise, to support training, and to maintain software and data.

Another  consequence  of  this  discussion  is  that  it  is  highly  unlikely  that  there  will  be  a  single  design  or  architecture   that   best   addresses   the   exascale   requirements   of   all   disciplines.   Indeed,   some  application  areas   require   intensive  use  of   specific   system  architectures   and/or  particular  modes  of  access  and  operation  such  as  on-­‐demand  access  and  guaranteed  turnaround,  data  and  code  security  or   access   to  massive   data   repositories   and   instruments,   arguing   for   the   introduction  of   dedicated,  thematic  facilities.  Thus,  the  computational  materials  science,  chemistry  and  nanoscience  community  are  giving   serious   consideration   to   the  provision  of   a   special-­‐purpose   computer   for   long  molecular  dynamics   runs.   Vital   problems   in   the   field   of   life   sciences   will   only   be   addressable   through   the  development   of   novel   architectures,   not   by   huge   machines   with   very   large   theoretical   peak  power   but  limited  efficiency  for  the  applications  of   interest.  This   is  already  at  an  advanced  stage  in  the  USA  and  Japan,  and  there  is  an  extreme  danger  that  Europe  will  be  left  behind.  

What is clear is that most researchers seek more flexibility and control over operating modes than they have today, largely to manage data efficiently, but also to meet the growing need for on-demand use with guaranteed turnaround times. A minority would like support for computational steering and co-scheduling. Thus, in biomolecular simulation (see section 5.4.4), the handling of very large volumes of state data will require new techniques for data management, collaborative interactive visualisation and the computational steering of simulations. Further developments with a potentially high impact on computational engineering include the use of HPC systems for interactive computational steering that requires interactive behaviour and correspondingly fast response times for the simulation. Even beyond this are real-time and embedded simulations, and immersive virtual reality techniques (see section 6.1).

7.3 Computational  Science  Infrastructure  in  Europe  

Important   considerations   in   the   provision   of   high-­‐performance   computing   include   the   associated  development  infrastructure  in  place  around  the  machines,  plus  the  level  of  expertise  required  within  the  scientific  community   to  ensure  effective  exploitation  of   the  resources  provided.  We  focus  here  on  the  human  aspect  of  this   infrastructure  and  what   is  needed  to  keep  Europe  as  a   leading  area  in  the  world.    

 

Page 137: The Scientific Case for High Performance Computing in Europe 2012-2020

PRACE  –  The  Scientific  Case  for  HPC  in  Europe   Requirements  –  The  Effective  Exploitation  of  HPC

137  

 

Development  of  Adequate  Models  

The  development  of  adequate  models,  and  their  evolution  according  to  scientific  progress.  

Development of Mathematical Methods, Numerical and Statistical Methods

The  development  of,  or  improvements  to,  the  hierarchy  of  mathematical  methods,  numerical  and  statistical  methods,  and  other  resolution  techniques  required  to  fully  exploit  the  developed  models.  It  is  important  to  recognise  that,  while  the  continuing  growth  in  computer  power  certainly  has  a  major  impact  on  computational  science,  by  far  the  greater  advances  are  due  to  algorithmic  and  method  developments.  

Associated  Computer  Codes  

The  development  of  the  associated  computer  codes,  together  with  associated  algorithms  and  their  efficient  implementation  on  the  available  resources.  Here  code  development  is  taken  to  include  the  following  components:  

• The  whole  process  from  a  researcher  (often  a  doctoral  student  or  young  postdoc)  initiating  an  algorithm  for  a  new  type  of  simulation  (for  example)  to  its  incorporation  in  a  generally  applicable  form  in  widely  disseminated  codes  

• Maintenance  of  codes  that  may  contain  a  million    lines  of  Fortran  as  new  advances  have  to  be  incorporated  from  diverse  directions  

• Code  portability  and  optimisation  for  new  machines,  particularly  with  novel  architectures  

• Interfacing with other codes
• Incorporating new computational developments such as GRID, middleware, sophisticated databases, metadata, visualisation and the use of different types of architecture for different purposes

• All  types  of  code,  from  the  large  community  codes,  to  a  toolbox  of  simple  basic  codes  for  researchers  to  access  as  platforms  for  developing  new  directions.  

Researcher  Training  and  Support  

The  need  for  training  is  an  inherent  consequence  of  the  rapid  development  of  the  field  and  the  very  sophisticated  nature  of  much  of  the  methodology,  including  numerous  approximations,  tricks  and  short-­‐cuts  to  make  the  simulations  feasible.  In  many  areas,  it  is  only  in  the  simplest  routine  applications  that  one  can  use  the  code  as  a  'black  box'  without  expert  steering.  Young  researchers  having  been  trained,  and  code  users  generally,  need  continuing  expert  support  and  personal  contact  as  research  priorities  change  and  codes  evolve.  We  return  to  this  area  of  support  in  section  7.5.  

Access  to  Expertise  

Access  to  expertise,  code  libraries  and  other  information  across  this  interdisciplinary  field.  Better  code  libraries,  databases,  input  and  output  standardisation,  etc.,  are  needed,  with  the  means  of  access  to  all  sorts  of  information  and  personal  expertise  through  websites,  newsletters  and  email  lists.  

 Figure  7.1.  Key  components  of  the  computational  science  infrastructure.  

 


We are concerned here specifically with the form of the network of expertise that is needed. More specific descriptions have been provided in the preceding thematic chapters, but from a very general perspective this infrastructure should provide for:

• The development of adequate models, and their evolution according to scientific progress
• The development of, or improvements to, the hierarchy of mathematical methods, numerical and statistical methods, and other resolution techniques required to exploit fully the developed models

• The  development  of  the  associated  computer  codes,  together  with  associated  algorithms  and  their  efficient  implementation  on  the  available  resources    

• Researcher  training  and  support.  The  need  for  training  is  an  inherent  consequence  of  the  rapid  development  of  the  field  and  the  very  sophisticated  nature  of  much  of  the  methodology,  including  numerous  approximations,  tricks  and  short-­‐cuts  to  make  the  simulations  feasible.  (We  return  to  this  area  of  support  in  section  7.6)  

• Access  to  expertise,  code  libraries  and  other  information  across  this  interdisciplinary  field  

We   expand   on   each   of   the   above   points   in   Figure   7.1.   What   is   clear   is   that   addressing   these  requirements  requires  significant  planning  and  human  investment.  For  example,  the  development  of  a   large  code  may   involve  a  collaborative   team  effort   lasting  some  five  years,  culminating   in  a  code  that  may  be  used  for  10  to  20  years.  

Therefore,   a   visible,   long-­‐term   commitment   of   the   European   Community   and   of   the   research  organisations  is  crucial.  Such  a  commitment  would  convince  the  scientific  community  to  commit  their  own   expertise   and   resources;   indeed,   commitment   to   a   European   exascale-­‐level   supercomputing  infrastructure  would  be  a  clear  signal  of   intent,  confirming   to   leading  scientists   that  computational  science  is,  indeed,  perceived  to  be  one  of  the  major  pillars  of  scientific  progress  (see  section  7.4).  

This argument also suggests that a European exascale-level infrastructure would increase the role and impact of the overall computational resource pyramid: beneficiaries would include national centres, application code repositories, access and data grids, and so on. We have already shown that computational infrastructure is an enabler for scientific and technological development; a European leadership-class infrastructure will prove to be an enabler for many scientific and engineering programmes.

The  organisational  structure  needed  to  implement  the  three  activities  described  above  should  be  on  a   European   level   because   one   country   is   too   small   a   unit   for   efficiency   and   effectiveness.   We  envisage  that  the  cyber-­‐infrastructure  has  to  be  largely  managed  by  the  research  community  itself  in  each  particular  field  because  the  circumstances  vary  so  widely  across  the  sciences.  But  of  course  this  will  be  with  some  help  from  major  computer  centres  and/or  European  organisations  such  as  CECAM,  EMBL,  etc.  These  can  provide  a  permanent  hub  and  a  home  with  scientific  and  organisational  support  for  a  particular  research  community,  as  well  as  some  technical  help.    

7.3.1 Panel Perspectives

Experience in the astrophysics, high-energy physics and plasma physics communities shows that new code development targeted at capability computing is very much an individual’s initiative, and resources are initially scarce. This seems an unavoidable feature of frontier research. Conversion and optimisation of mature codes can be dealt with by a more project-oriented organisation involving a team. It should be clear that code development as well as optimisation is an integral part of the research process, with all the hurdles and perhaps dead ends characteristic of exploring the unknown.

The scientific challenges faced, for example in plasma physics, require dedicated effort over timescales measured in decades. Thus, the sustained availability of state-of-the-art computer resources such as those provided by PRACE, as well as of adequate technical support to code developers and users, is essential to meet the challenges faced by the field.

Less   demanding   of   the   highest   levels   of   HPC   resources,   the   development   of   integrated  modelling  frameworks  requires  a  strong  community  effort  –  probably  the  development  of  specific  software  to  interface  the  codes  –  substantial  optimisation  work,  and  perhaps  dedicated,  although  not  necessarily  field-­‐specific,   hardware.   The   fusion   community   recognises   the   crucial   importance   of   HPC  infrastructure   for   plasma   simulations.     A   Pflop/s   machine   was   recently   acquired   by   IFERC  (International   Fusion   Energy   Research   Center),   under   an   EU-­‐Japan   bilateral   agreement   that  accompanied  approval  of  the  ITER  device.  This  machine  provides  both  capability  computing  for  grand  challenges   in   fundamental   plasma   physics   dynamics   and   capacity   computing   for   demanding  parametric  studies  of   fusion  devices.  The   fusion  plasma  simulation  challenge  and  potential   societal  benefit  are  enormous.  Expanding  the  HPC  resources,  and  the  available  manpower  with  appropriate  IT  and  algorithm  development  skills,  could  be  an  essential  and  worthwhile  investment  for  Europe.  

Many   of   the   problems   in   the   life   sciences   and   medicine   cannot   be   addressed   with   present-­‐day  simulation   methodologies.   This   goes   beyond   the   adaptation   of   existing   software   to   new  computational   platforms   and   involves   a   general   lack   of   scalability   as   well   as  missing   concepts   of  multiscale,  multi-­‐model  interactions    that    are    required    to    exploit    exascale    computing    platforms  efficiently.   Such  hurdles   can   best   be   overcome   by  nucleating   communities   of   scientists,   from   life  science   research,   bioinformatics   and   computer   science,   who   will   work   together   to   address   the  problems    and    to    develop    innovative    solutions.  Such  communities    can    be    fostered    by  programs  such  as   the  E-­‐science/E-­‐infrastructure  schemes   implemented  in  the  present   ICT  program  of  FP7.  A  vigorous   expansion   of   such   activities   is   required   in   order   to   generate   methods     and    implementations     capable     of     correctly     exploiting     the     new     computational   resources   for   life  science  applications.   It  must  be  acknowledged   that   life   science   research   rewards   applications   and  method   development   only   in   the   context   of   successful   applications.   In   order   to   generate   a  sustainable  and  effective  set  of  codes   for   life  science  applications,   it   is   important   to  nucleate  and  consolidate   the   scientific   community  at   the   European   scale.   The   formation  of  broad   communities  targeting  exascale  method  development  would  be  a  tremendous  benefit  for  R&D  efforts  in  Europe,  because  it  would  enable  the  transfer  of  such  technologies    to    the    European    end-­‐user,    generating    a    competitive    advantage    over    other  regions.  

 

7.4 The  Challenges  of  Exascale-­‐Class  Computing  HPC  is  currently  undergoing  a  major  change  as  the  next  generation  of  computing  systems  (‘exascale  systems’4)   is   being   developed   for   2020.   These   new   systems   pose   numerous   challenges,   from   a  hundredfold   reduction   of   energy   consumption85   to   the   development   of   programming   models   for  computers  that  host  millions  of  computing  elements.  These  challenges  are  common  to  all  and  cannot  be  met  by  mere  extrapolation  but  require  radical   innovation   in  many  computing  technologies.  This  offers   opportunities   to   industrial   and   academic   players   in   the   EU   to   reposition   themselves   in   the  field.   Europe   has   all   the   technical   capabilities   and   human   skills   needed   to   tackle   the   exascale  challenge,   i.e.   to   develop   native   capabilities   that   cover   the   whole   technology   spectrum   from  processor  architectures  to  applications86.  Even  though  the  EU  is  currently  weak  compared  to  the  US  in  terms  of  HPC  system  vendors,  there  are  particular  strengths  in  applications,  low-­‐power  computing,  systems  and  integration  that  can  be  leveraged  to  engage  successfully  in  this  global  race,  getting  the  EU  back   on   the  world   scene   as   a   leading-­‐edge   technology   supplier.   Progress  within   Europe  has   to  

date been channelled through the EESI – The European Exascale Software Initiative11 – an initiative co-funded by the European Commission. EESI’s goal is to build a European vision and roadmap to address the challenge of the new generation of massively parallel systems that will provide Pflop/s performances in 2010 and Eflop/s performances in 2020. EESI is investigating the strengths and weaknesses of Europe in the overall international HPC landscape and competition. In identifying priority actions and the sources of competitiveness for Europe induced by the development of peta/exascale solutions and usages, EESI is investigating and proposing programmes in education and training for the next generation of computational scientists. The Initiative is also seeking to identify and stimulate opportunities for worldwide collaborations.

85 In line with Europe’s green economy targets, ec.europa.eu/europe2020/targets/eu-targets/index_en.htm; COM(2009) 111, Mobilising Information and Communication Technologies to facilitate the transition to an energy-efficient, low-carbon economy
86 http://www.prace-ri.eu/IMG/png/fecafedc.png

 Figure  7.2  The  work  package  structure  of  EESI11.  

 

WP2  

WP2  International  networking  (Europe,  US  and  Asia);  acting  as  an  interface  between  Europe,  US  and  ASIA,  WP2  is  communicating  progress  and  opportunities  to  European  software  communities  involved  in  scientific  software  development,  and  also  signalling  the  needs  and  challenges  faced  by  European  scientific  software  developers,  on  a  global  level.  It  is  also  expected  to  identify  some  US,  ASIA  and  European  cross  actions,  providing  coordination  with  the  International  Exascale  Software  Project  (IESP12).  

WP3  and  WP4  

Working  groups  charged  with  creating  a  common  vision  and  deriving  coherent  roadmaps  for  each  of  eight  specified  topics,  including  the  identification  of  competitiveness  sources  for  Europe,  and  needs  for  education  and  training.  The  working  groups  include  those  in:  

• Industrial  and  Engineering  Applications  • Weather,  Climatology  and  Earth  Sciences  • Fundamental  Sciences  (including  Physics  and  Chemistry)  • Life  Science  and  Health  • Hardware  Roadmaps  and  Links  with  Vendors  • Software  Ecosystem  • Numerical  Libraries,  Solvers  and  Algorithms  • Scientific  Software  Engineering  

WP3  

Investigating  the  application  drivers  for  peta-­‐  and  exa-­‐scale  computing,  looking  to  identify  the  needs  and  expectations  of  scientific  applications  in  the  exascale  time  frame  in  terms  of  scientific  challenges,  levels  of  physics  involved,  coupling  of  codes,  numeric,  algorithms,  programming  models  and  languages,  size  of  data  sets,  simulation  steering,  pre/post  processing,  and  the  expected  level  of  performance  on  exascale-­‐class  resources  

WP4  

Identifying the technologies needed to enable exascale computing, taking as input the application requirements and needs identified by WP3. This WP uses a cross-disciplinary approach to assess novel hardware and software technologies addressing the exascale challenge. Other areas of interest are highly scalable system software and program tracing tools, fault tolerance on the system as well as the application side, novel programming paradigms, and novel, highly scalable numerical algorithms

WP5  

Dissemination;  this  WP  is  dedicated  to  communication  and  dissemination  actions  at  large  

The  project  is  divided  into  the  five  work  packages  outlined  in  Figure  7.2.  Leveraging  the  results  from  the  EESI  deliberations  as  part  of  the  current  exercise  has  been  ensured  through  including  many  of  the  EESI  project  leads  in  the  Scientific  Case  panel  membership.      


7.4.1 Addressing the Data Challenge

The management of data faces many challenges as rapidly increasing computational power and similarly fast progress in a variety of sensor technologies create floods of data. Science is not alone in facing an explosion of data, with the consequence that a range of technological solutions are emerging as industry responds to sector-wide demand for storage devices at lower cost and lower power use.

Scientific data centres need to be able to exploit these new technologies. Just as the management of data poses a number of significant challenges, so exascale supercomputing is faced with a number of data challenges, perhaps none more so than the storage system, and particularly the software it entails. I/O capabilities in high-performance computing have typically lagged behind the computing capabilities of such systems, especially at the high end. If not addressed, these exascale storage issues promise to become even more intractable by the time the first exascale machines start to appear toward the end of the decade. By way of example, we provide below just two instances of the so-called ‘data deluge’ from the scientific panels central to this report – from life sciences and medicine, and from the climate-modelling community.

The benefits of the continuous development of more powerful computation systems are visible in many areas of life sciences. For example, at the beginning of 2000, the Human Genome Project87 was an international flagship project that took several months of CPU time using a 100 Gflop/s computer with 1 terabyte of secondary data storage. Today, genomic sequencing has changed from being a scientific milestone to a powerful tool for the treatment of diseases, in particular because it is able to deliver results in days, while the patients are still under treatment. The Beijing Genomics Institute is capable of sequencing more than 100 human genomes a week using the Next Generation Sequencing instruments and a 100 Tflop/s computer that will migrate in the near future to a 1 Pflop/s capability.88 Today, genome sequencing technology is ineffective if the data analysis needs to be carried out on a grid or cloud-like distributed computing platform. First, such systems cannot achieve the necessary dataflow, of the order of 20 PBytes/year, and, second, research involving living patients requires both speed and high security that are lacking in such environments. Lastly, ethical and confidentiality issues handicap distributing patient data across the cloud world. In coming years, sequencing instrument vendors expect to decrease costs by one to two orders of magnitude, with the objective of sequencing a human genome for $1,000. This will make it possible to integrate genomic data into clinical trials (that typically involve thousands of human tests) and into the health systems of European countries. Drug development will become easier and faster, and it will have a dramatic impact on therapy. It is worth noting again here that Europe’s pharmaceutical industry contributes significantly more to the region’s GDP than is true of the pharmaceutical industries in the US and other nations. We should not forget, however, that all these possibilities could only develop if computer resources can deal with the complexity of the large interconnected data sets that are serving the large community of life science. For example, today the EBI (which hosts the major core bio-resources of Europe) has doubled the storage from 6,000 TBytes (in 2009) to 11,000 TBytes (in 2010), and has received an average of 4.6 million requests per day (see Figure 1.1).

Genomics   research   faces   problems   (e.g.   the   sequencing   of   2,500   genomes   of   cancer   patients)  involving  the  management  of  massive  amounts  of   data   in   programs  that   can   require  hundreds  of  thousands   of   processors,   but   little   inter-­‐processor   communication.   However,   the   vast   amount   of  data   to   be  managed   (and   often   confidentiality   and   privacy   aspects)   hampers   the   use   of   cloud   or  grid-­‐computing  initiatives  as  a  general  solution.  Suitable  and  flexible  access  to  computer  resources  is   crucial   in   this   area.   The   genomic   subpanel   asserts   that   currently   known   cornerstones   for   an  exascale   system   (number   of   computer   nodes,   I/O   and  memory   capacities)   are   clearly   driving   the  

focus only to reach the Eflop/s peak performance. For most of the genomics challenges, an Eflop/s computer that could be even less ‘balanced’ than today’s HPC systems would be a substantial barrier to using these machines efficiently. The genomics subpanel, and by extension the entire life sciences panel, wishes to stress its major concern that its exascale problems will be difficult to treat with the unbalanced architectures of anticipated Eflop/s computers.

87 International Human Genome Sequencing Consortium. Nature 2001
88 http://www.genomics.cn/en/platform.php?id=248

Biological data is growing at an incredible rate and, with it, the computational needs in the field are increasing. The panel wishes to stress that, in this field, computing ‘capability’ does not simply translate to the number of flops that can be brought to bear on a single project. It requires instead computer systems that can solve biological problems on an appropriate timescale. This point, considered crucial by the panel, becomes very clear when considering studies that can have a direct impact on the health of living patients. The panel considers that flops should not be the only parameter defining HPC capabilities. This, in turn, requires defining exactly what ‘exascale resources’ means. Efficient data management and fast and flexible interaction with computer resources are, in many fields of life sciences, at least as important as theoretical peak power.

We  must   remember  that  biological  data   is   expected  to   grow  by  a  factor  of  10,000  before  the  end  of   the   present  decade,   surpassing  Moore’s  law   (see,   for   example,   the   growth   of   storage   in   the  EBI,   Figure   1.1).   Biological   data   is   very   heterogeneous,  is  difficult  to  organise  and,  in  some  cases,  is   subjected   to   ethical   restrictions   on   its   use.   Efficient  management   of   biological   data   to   obtain  relevant   information   will   require   optimised   I/O   capabilities,   efficient   structures,   post-­‐processing  pipelines   (quality  and  validation),  multi-­‐PByte  data   sharing  systems  and,   in  some  cases,  significant  main   memory   requirements.   The   standard   protocols   for   the   access   to   HPC   resources   are   not  presently   compatible   with   the   needs   of   research   in   several   areas   of   life   sciences,   especially  concerning  human  health,  where   fast  data  processing  has  a   real   impact  on  patients  under  clinical  treatment.  With  these  technical  hurdles   in  mind,  the   life  sciences  community  is  already  preparing  for  the  next  bio-­‐supercomputing  challenges.  

Advances in the technologies for data generation that both increase the output and decrease the cost will mean that, over the next decade, the quantity of data being produced will increase by at least a thousandfold and perhaps as much as a millionfold. There are three challenges facing HPC centres – data storage, data transportation and data confidentiality. On the other hand, while the most popular genomics software is regularly reviewed and optimised for new systems (e.g. BLAST), a large part of the available genomics libraries has been built since the 1990s using inefficient scripting/high-level languages (e.g. Perl, Java or Python packages). These codes still perform well for current data loads, but they may not be ready for the data challenges of the next decade.

Within the climate modelling community, the CMIP5 (Coupled Model Inter-comparison Project Phase 5)89 archive is pushing the boundaries of data management, with an expected volume of around 10 PBytes. Rapidly increasing HPC performance will be reflected in increased data volumes, pushing towards a 1 EByte archive within a decade. Dealing with such volumes of data will require fundamental shifts in data management and analysis methodologies. The underlying technology drivers behind such shifts have been outlined in the EESI Working Group Report on Weather, Climate and solid Earth Sciences90. Three of these are summarised below:

89 At a September 2008 meeting involving 20 climate modelling groups from around the world, the WCRP's Working Group on Coupled Modelling (WGCM), with input from the IGBP AIMES project, agreed to promote a new set of coordinated climate model experiments. These experiments comprise the fifth phase of the Coupled Model Intercomparison Project (CMIP5).

90  Working  Group  Report  on  Weather,  Climate  and  solid  Earth  Sciences,  ESI_D3.4_WG3.2-­‐REPORT_R2.0.DOCX,  CSA-­‐2010-­‐261513,  08/11/2011.  We  refer  the  reader  to  the  Working  Group  report  for  details  of  the  other  technology  drivers,  and  the  associated    R&D  Strategies.  The  latter  include  (i)  Taking  the  computation  to  the  data,  (ii)  Grid  vs  cloud,  (iii)  Scientific  Workflow  Tools,  (iv)  Scalable  data  formats,  (v)  Search  and  Query  tools,  (vi)  A  range  of  Storage  media,  backup  and  curation  tools,  (vii)  Optimised  hardware  deployment  within  the archive,  (viii)  Archive  locations  and  the  optimal  location  of  data  centres,  and    (ix)  Options  for  collecting  robust  meta-­‐data.  


1. Increasing  rate  of  data  supply.  The  rapid  increase  of  HPC  centre  productivity  and  parallel  increases  in  sensor  technologies  can  be  expected  to   increase  data  flow  by  a   factor  of  1,000  in  the  coming  decade,  taking  us   from  peta-­‐  to  exa-­‐scale  archives.  

2. Power  supply  constraints.  The   rapidly  falling  cost  of   computation  is   expected  to   lead   to  a  correspondingly  rapid   increase   in  data  generation.  Data  volumes  can  be  expected  to  increase  by  a  factor  of  100–1,000  over  the  next  10  years.  Over  the  same  period,  the  fall  in  energy  usage  per  byte  of  stored  data  held  on  disk  may  only  be  a  factor  10,  compared  to  a  projected  fall  in  procurement  costs  by  a  factor  of  200.  We  are  likely  to  move  from  a  regime  in  which  procurement  costs  dominate  to  one  in  which  power  costs  dominate.  This  will  greatly  increase  the  cost  of  holding  data  on  disk,  demanding  more  power-­‐aware  data  management  strategies.  Major  commercial  data  centres  are  responding  by  placing  major  data  archives  near  sources  of  cheap  and  sustainable  cooling  and  power.  

3. Heterogeneity  of  storage  media.  The  growing  complexity  of  storage  systems  will  challenge  management  strategies.  A  storage  centre  might  contain  a  selection  of  storage  technologies:  traditional  disk,  micro-­‐servers,  tape,  solid  state,  WORM1.  In  addition,  the  disks  will  have  multiple  modes  of  operation:  full  speed,  slow,  idle,  rest.  A  multi-­‐state  cache  algorithm  will  be  needed  to  optimise  usage  where  more  than  two  technologies  are  deployed  and  disks  are  in  multiple  states.  
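To illustrate the kind of multi-state policy envisaged in the third point above, the snippet below (a sketch under assumed tier names and idle-time thresholds, not a design taken from the report) demotes files down a storage hierarchy as they grow cold and promotes them again on access.

    # Illustrative multi-state placement policy for heterogeneous storage:
    # files migrate down the hierarchy when idle and return to fast storage on access.
    # Tier names and thresholds are assumptions made for this sketch.
    import time
    from dataclasses import dataclass, field

    TIERS = ["solid_state", "disk_full_speed", "disk_idle", "tape"]
    DEMOTE_AFTER_S = [3600, 86400, 30 * 86400]   # idle time before each demotion step

    @dataclass
    class FileRecord:
        name: str
        tier: int = 0                                   # start on the fastest tier
        last_access: float = field(default_factory=time.time)

    def on_access(rec: FileRecord) -> None:
        # Any read promotes the file back to the fastest tier.
        rec.tier = 0
        rec.last_access = time.time()

    def demote_if_cold(rec: FileRecord, now: float) -> None:
        # Walk the file down the hierarchy once it has been idle long enough.
        while rec.tier < len(TIERS) - 1 and now - rec.last_access > DEMOTE_AFTER_S[rec.tier]:
            rec.tier += 1

    rec = FileRecord("climate_run_0042.nc")
    demote_if_cold(rec, time.time() + 2 * 86400)        # pretend two days of inactivity
    print(rec.name, "is now on tier:", TIERS[rec.tier])
    on_access(rec)                                      # a new read promotes it back
    print(rec.name, "after access, back on tier:", TIERS[rec.tier])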

7.4.2 Software Development and Tools for Exascale-Class Computers

Exascale HPC systems will be very different from today’s HPC systems, and building, operating and using exascale systems will face severe technological challenges. There is wide agreement in the HPC community that these challenges cannot be dealt with only on the hardware level. Therefore the HPC middleware and the application developers have to address these challenges, the most important ones being scalability, resilience, energy and performance (see Table 7.2).

 

 

Table 7.2. Software challenges faced by the HPC middleware and application developers – findings of the EESI Software Ecosystem.

 

Challenges  Faced  by  the  HPC  Middleware  and    the  Application    Developers  

Scalability: The number of cores in an exascale system will be of the order of 10^8. Not only the applications and the algorithms but the whole software ecosystem has to support this unprecedented level of parallelism.

Resilience:  For  statistical  reasons,  the  mean  time  between  critical  systems  failures  will  become  shorter.  The  general  expectation  is   that   resiliency  cannot  rely  only  on  hardware  features.  The  whole  software  ecosystem  has  to  be  aware  of  the  resilience  issue.  

Energy:  Exascale  systems  will  be  based  on  low-­‐power  components  (cores,  memory,  interconnect,  etc.).  Energy  consumption  will  be  crucial  and  needs  to  be  dynamically  managed  through  software  control.  In  particular,  developer  tools  have  to  adopt  the  notion  of  ‘energy  optimisation’  in  addition  to  the  standard  ‘performance  optimisation’.  

Performance: Achieving the necessary high levels of performance on a complex, integrated exascale hardware and software stack, in the presence of dynamic system adaptation caused by power and fault-tolerance events, will be very challenging. Performance-aware design of system, runtime and application software will be essential.

   


Recommendation:  Algorithms, Software and Tools Most  applications  targeting  Tier-­‐0  machines  require  some  degree  of  rewriting  to  expose  more  parallelism,  and  many  face  severe  strong-­‐scaling  challenges  if  they  are  effectively  to  progress  to  exascale,  as  is  demanded  by  their  science  goals.  There  is  an  ongoing  need  for  support  for  software  maintenance,  tools  to  manage  and  optimise  workflows  across  the  infrastructure,  and  visualisation.  Support  for  the  development  and  maintenance  of  community  code  bases  is  recognised  as  enhancing  research  productivity  and  take-­‐up  of  HPC.  There  is  an  urgent  need  for  algorithm  and  software  development  to  be  able  to  continue  to  exploit  high-­‐end  architectures  efficiently  to  meet  the  needs  of  science,  industry  and  society.    

 

Findings  of  the  EESI  Software  Ecosystem  group  

For exascale computing, most of the HPC software components will have to be newly developed to address the exascale challenges of scalability, resilience, energy management, etc. The current HPC software does not address these challenges; an evolutionary approach (developing the current software further) will therefore not be sufficient. A substantial investment in new HPC software development is necessary.

While  the  exascale  hardware  and  the  system  software  will  be  substantially  different  from  today’s  HPC  systems,  the  applications  –  in  particular  the  industrial  applications  –  require  a  continuous  path  to  exascale.  The  lifetime  of  HPC  application  codes  is  very  long  and  the  application  developers  (in  research  and  in  industry)  are  typically  not  ready  to  tune  their  applications  to  very  specific  ‘exotic’  new  programming  models.  

The software ecosystem must be portable so that it can support the various exascale hardware architectures based on many-core designs. The differences between exascale system architectures should not be visible at the level of the programming model.

It is expected that most of the new exascale software ecosystem will be developed by the R&D community under open-source licences. This developer community should work closely – more so than in the past – with the vendors, to adapt the new software ecosystem to the vendors' exascale platforms. Co-design processes must be established, ensuring that international HPC hardware and system vendors collaborate with European R&D laboratories.

Considerable  and  adequate  funding  is  essential  for  research  and  development  in  areas  where  Europe  has  technology  leadership  (e.g.  programming  models,  performance  tools,  validation  and  correctness  tools)  to  maintain  and  extend  this  leadership.  The  key  players  should  form  alliances  and  work  more  closely  with  the  hardware  vendors  to  define  (de  facto)  standards.  

 

The panel acknowledges the work undertaken by the EESI software ecosystem group91 which, in focusing on the HPC software between the hardware and the application (system software and development tools), addressed the aspects of European competitiveness, potential collaborations and the need for future investments (and funding). The panel further supports the findings of this report, which are summarised in Table 7.2; their conclusions are reinforced by all the scientific panels contributing to this report. The materials science, chemistry and nanoscience community is scientifically very diverse and deals with a large spectrum of simulation methodologies, realised in many different computer codes used by large communities – including those in life science, medicine, engineering and industry. The computer codes are very complex and their lifespan is in general much longer than that of a given hardware architecture. These codes are typically developed by small expert groups that are part of their respective communities. The adaptation of existing software to new computing platforms is a major challenge that accompanies the advent of petascale computing – either in the form of massively parallel computing platforms with tens of thousands of cores or through the use of accelerators (e.g. GPUs). This challenge will become more acute with the arrival of exascale computing.

91 Working Group Report on the Software Eco-system, CSA-2010-261513, EESI-D4.4-WG4.2-REPORT-R2.1.DOCX


Recommendation: A Long-Term Commitment to Europe-Level HPC
Major experiments depend on HPC for analysis and interpretation of data, including simulation of models to try to match observation to theory, and support research programmes extending over 10–20 year time frames. Some applications require access to stable hardware and system software for 3–5 years. Data typically need to be accessed over long periods and require a persistent infrastructure. Investment in new software must realise benefits over at least 10 years, with the lifetime of major software packages being substantially longer. A commitment to Europe-level HPC infrastructure over several decades is required to provide researchers with a planning horizon of 10–20 years and a rolling 5-year specific technology upgrade roadmap.

This challenge manifests itself as a lack of scalability, as well as in missing concepts (e.g. for parallelising long-time simulations of particle trajectories) or in the multiscale and multi-model interactions that are required to use an exascale computing infrastructure to best effect. It is worth noting that many national funding agencies across Europe recognise and reward applications but view method development in a very different light. Clearly, the development of an exascale infrastructure must overcome such obstacles, take into account the nature of the networking and organisational structure of the community at the European scale, and nucleate, support and nurture these vigorous, grass-roots communities of scientists. Such measures should be addressed within the European funding scheme FP7, with these communities working coherently to phase out computer codes that do not scale to the new architectures while developing new codes that reuse, in part, software from previous instantiations. There is a clear requirement to establish simulation laboratories to provide training and workshops for a wider community on these codes and, in the context of exascale applications, to develop such codes further in response to community requirements.

All these groups together form the broad community targeting exascale method development in Europe, with significant synergies across them. PRACE itself should have a small expert group advising the developers and assisting the deployment of the codes. A network of these people is essential for promoting computational science on an exascale infrastructure.

7.5 A  Support  Infrastructure  for  the  European  HPC  Community      

7.5.1 Long-term Continuity of Reliable HPC Provision
The central requirement for all fields is the need for long-term continuity of reliable HPC provision and support. The typical time scale for any real development in astro-, particle or plasma physics, for example, is one or several decades. Each large experiment needs continuous theory support, starting from the early planning phase and ending only when the analysis is finished, which often happens years after a high-energy physics experiment has been shut down or a satellite has run out of power. One also has to realise that the development and optimisation of application codes takes at least several years. Finally, the demands faced by theory – and thus the need for HPC infrastructure – will increase steadily for many years to come.

Most major experimental and observational facilities depend on large-scale HPC for analysis and interpretation of data, including simulation of a range of models and/or parameter values to try to match observation to theory. These facilities often support research programmes extending over 10–20 year time frames, and the supporting HPC infrastructure needs to exist over a comparable period. Some applications (e.g. climate modelling) require bit-reproducibility over multi-year programmes, necessitating guaranteed access to stable hardware and system software for periods of 3–5 years. Data typically need to be accessed over long periods and require a persistent infrastructure. Investment in new software must realise benefits over at least a 10-year period, with the lifetime of major software packages being substantially longer. A commitment to Europe-level HPC infrastructure over several decades is required to provide researchers with a planning horizon of 10–20 years and a rolling 5-year specific technology upgrade roadmap. However, the present funding scheme of PRACE, based on 5-year periods, does not fit these facts.


Thus, there is an urgent need to turn PRACE into a long-term institution that guarantees adequate and reliable HPC support in Europe, including for high-level research groups based in countries that cannot by themselves meet those groups' computational needs.

Many communities – computational materials science, chemistry and nanoscience being a case in point – are best served if the small, expert group of code developers behind a community simulation code is supported within a scheme of the European Commission's FP7 framework. Indeed, it is this set of expert groups that constitutes the community of exascale method developers. They may be members of small simulation laboratories providing education and service to the community that applies this software on the exascale infrastructure. PRACE should maintain a small expert group advising and assisting these code-developer groups and the simulation laboratories.

7.5.2 User Support
All across Europe, world-class research teams are using HPC resources to make new discoveries. The breadth of research applications is staggering, encompassing virtually all areas of the sciences, engineering and medicine, with growing applications in the social sciences and humanities. Although it is the lead researchers who often assume the high-profile roles in the research process, there has to be a large supporting team working behind the scenes for HPC-related activities.

An  effective  HPC  facility  is  much  more  than  hardware;  the  smooth  operation  of  the  facility  requires  a  second   tier   of   highly   qualified   personnel   (HQP)   to  manage,   operate   and  maintain   the   facility.   It   is  equally   important   for   these  highly   trained   technical   support   staff   to   train  and  assist   researchers   in  making  the  best  use  of  this  expensive  infrastructure.  

An  investment  in  people  for  today  and  for  the  future  is  a  critical  component  of  this  proposal.  In  many  respects,   computing   infrastructure   can  be  more   readily   acquired   than  human   infrastructure.  Given  adequate   funding,   the   upgrading   of   the   capital   equipment   is   straightforward:   one   can   simply   buy  whatever  is  needed.    

However,  human  infrastructure  is  much  more  challenging   to   obtain.   It   can   take   years   to  train  people  with  the  necessary  skill  sets,  and  then   they   can   be   easily   enticed   away   from  Europe   by   the   lure   of   better   opportunities  coupled   with   higher   salaries.   If   Europe   is   to  invest   in   people   and   skills,   then   it  must   also  invest   in   creating   the   environment   to   attract  and  retain  them.  

A   variety   of   skilled   personnel   and   support  roles   is   therefore   essential   to   the   effective  operation   and   maximum   exploitation   of   any  HPC  facility.  The  skills  and  experience  needed  are   extensive,   including:   (i)   managing,  operating   and   maintaining   the   facility;   (ii)  training  and  assisting  researchers  to  make  the  best  use  of   its   resources  and  capabilities;   (iii)  ensuring  maximal  productivity  of  the  HPC  sites  by,  for  example,  checking  that  software  is  run  on  the  most  suitable  computing  platform  and  reworking  code  to  achieve  significant  performance  gains;  and  (iv)   helping   to   create   new  applications   in   support   of   innovative   research   initiatives.   This   variety   of  skilled  personnel  is  summarised  in  Figure  7.3.  

Recommendation: People and Training
There is grave concern about HPC skills shortages across all research areas and particularly in industry. The need is for people with both domain and computing expertise. The problems are both insufficient supply and low retention, because of poor career development opportunities for those supporting academic research. Europe's long-term competitiveness depends on people with the skills to exploit its HPC infrastructure. It must provide ongoing training programmes, to keep pace with the rapid evolution of the science, methods and technologies, and must put in place more attractive career structures for software developers to retain their skills in universities and associated institutions.


Figure  7.3.  The  variety  of  skilled  personnel  required  in  the  effective  operation  and  maximum  exploitation  of  any  HPC  facility.  

System  Administration  and  Operations  

Systems  administration  and  operations  are  primarily  concerned  with  the  day-­‐to-­‐day  care  of  the  HPC  hardware  and  software  infrastructure.  The  supporting  personnel  ensure  the  proper  functioning  of  HPC  facilities,  providing  systems  management  and  operations  support.    Specific  tasks  include  installing  and  maintaining  operating  system(s),  performing  updates  and  patches,  managing  file  systems  and  backups,  and  ensuring  the  integrity  and  security  of  the  user  data.  These  activities  are  crucial  to  ensuring  that  the  system  is  fully  functional  and  available  to  the  community.  

  Programmer  /  Analysts      

The role of programmer/analysts is to provide specialised technical assistance to researchers, to conduct workshops and training, and to evaluate and implement software tools to make effective use of available resources. HPC hardware typically operates at a sustained rate well below the theoretical peak performance of the system, usually because of a lack of parallelism in parts of an application. A creative team of programmer/analysts can double that rate through code optimisation, algorithm redesign, enhanced cache utilisation and improved data locality (a simple illustration of the impact of data locality is given after Figure 7.3). The added value from such activities can be huge, corresponding to twice the science delivered for the same hardware. These skills can thus dramatically increase the scientific productivity of the research community. By allowing researchers to run their applications faster, analysts help researchers and their students to do better science.

Applications  Programmers  

Frontier  science  requires  world-­‐class  software  applications.  While  much  of  the  development  of  new  scientific  functionality  is  traditionally  carried  out  in  a  researcher’s  own  laboratory,  HPC  applications  programmers  often  make  valuable  contributions  to  this  work  by  virtue  of  their  own  scientific,  numerical  or  visualisation  experience.  The  additional  skills  of  the  support  staff  often  play  an  integral  role  in  enabling  ideas,  concepts  and  advice  to  flow  with  greater  ease  in  the  subject  domain  of  the  scientist.    This  support  has  the  additional  benefit  of  greatly  reducing  what  is  normally  a  challenging  start-­‐up  period  for  researchers  learning  to  work  with  HPC.  This  skill  set  is  imparted  to  students  and  postdoctoral  fellows  as  well,  giving  them  both  the  scientific  knowledge  and  the  programming  experience  necessary  to  create  new  computational  methods  and  applications  in  their  various  fields,  eventually  leading  to  dramatic  new  insights.  

Data Management and Visualisation Personnel

The  importance  of  versatile  analysis  and  visualisation  techniques  for  simulation  work  is  self-­‐evident,  and  both  computation  and  visualisation  activities  are  increasingly  being  driven  by  'science  pull'  rather  than  'technology  push'.  The  most  challenging  aspect  of  data  management  and  visualisation  is  coping  with  the  massive  datasets  that  are  being  produced.  Simulations  in  climatology,  bioinformatics  and  astrophysics,  for  example,  regularly  produce  data  sets  that  are  hundreds  of  TBytes  or  even  PBytes  in  size.    Entirely  new  techniques  and  computing  resources  will  be  necessary  to  cope  with  them:  in  most  cases,  interactive  visualisation  is  the  only  practical  way  to  glean  insights  into  these  data  sets.  In  addition,  the  effective  exploitation  of  such  volumes  of  data  will  require  a  major  development  effort  in  distributed  computing  across  high-­‐speed  networks.  This  requires  the  training  and  the  retention  of  personnel  able  to  manage  the  data  resources  and  to  develop  the  new  tools  and  techniques  required  to  visualise  them.  
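To make concrete the kind of gain that programmer/analysts deliver through improved data locality (referred to in the Programmer/Analysts description above), the following minimal Python sketch – an illustration using NumPy, not drawn from this report – times traversal of the same matrix row by row and column by column. On typical cache-based hardware the contiguous, row-wise traversal is noticeably faster; spotting and removing exactly this kind of strided access pattern is a routine part of analyst-led optimisation.

```python
import time
import numpy as np

n = 4_000
a = np.random.rand(n, n)   # C-ordered: each row is contiguous in memory

def timed(fn):
    t0 = time.perf_counter()
    fn()
    return time.perf_counter() - t0

# Row-wise traversal touches memory contiguously (good cache utilisation).
t_rows = timed(lambda: sum(float(a[i, :].sum()) for i in range(n)))
# Column-wise traversal strides across memory (poor data locality).
t_cols = timed(lambda: sum(float(a[:, j].sum()) for j in range(n)))

print(f"row-wise: {t_rows:.3f} s, column-wise: {t_cols:.3f} s")
```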

           


7.6 Education and Training of Researchers
The overall goal of HPC support is to proactively support the needs of a wide variety of researchers by engaging with them throughout their HPC lifecycle. In addition to maintaining the HPC facility, support staff must:

• Create  awareness  of  the  resources  available  and  the  potential  for  these  resources  to  accelerate  research  productivity    

• Provide  usage  instructions  and  courses  (on  topics  such  as  parallelism,  programming  tools  and  performance  analysis)    

• Help users find the right match between their application and the available technologies
• Develop new tools or tune existing ones (hardware and software) to optimise the use of HPC resources and the applications running on them – enabling researchers to obtain more results in the same time, or the same results in less time, and thus freeing up facility time for others

HPC   support   staff   are  essential   for   training  Europe’s  next   generation  of   scientists   and  engineers   in  the  use  and  improvement  of  HPC  resources.  Interactions  of  HPC  support  staff  with  graduate  students  and   postdoctoral   fellows   will   provide   a   fertile   training   ground   to   develop   the   next   generation   of  researchers,  giving  the  new  scientists  grounding  in  HPC  as  part  of  their  disciplinary  training.  Much  like  today’s   researchers   use   personal   computers   to   support   their   work,   the   next   generation   of  researchers  will  rely  on  and  be  able  to  take  effective  advantage  of  HPC  resources,  thus  accelerating  their  research  outputs.  To  do  so  effectively,  researchers  will  need  appropriate  training.    

PRACE   has   an   extensive   education   and   training   effort   for   effective   use   of   the   RI   through   seasonal  schools,  workshops  and  scientific  and  industrial  seminars  throughout  Europe.  Seasonal  schools  target  broad  HPC  audiences,  whereas  workshops  are  focused  on  particular  technologies,  tools  or  disciplines  or  research  areas.  Education  and  training  material  and  documents  related  to  the  RI  are  available  on  the  PRACE  website  as  is  the  schedule  of  events.92  We  also  note  the  detailed  section  on  the  needs  of  education  and  training  from  the  EESI  Working  Group  report  on  Scientific  Software  Engineering.93  

Below, we illustrate the training requirements identified by the panels with specific reference to the areas of weather, climatology and solid Earth sciences (WCES); astrophysics, high-energy physics and plasma physics; engineering sciences and industrial applications; and life sciences and medicine.

                 

92 http://www.training.prace-ri.eu/
93 Where attention is focused on (i) the programming model, (ii) the runtime environment, (iii) debugging, (iv) validation and correctness, (v) performance tools, (vi) performance modelling and simulation, (vii) batch systems and resource managers, (viii) I/O and file systems, (ix) resilience, and (x) energy efficiency. The report stresses the absence of any such courses in Europe (e.g. on storage or on fault tolerance in HPC) and the need to develop these kinds of courses in Europe.


 Table  7.3.  Training  perspectives  from  weather,  climatology  and  solid  Earth  sciences,    astrophysics,  high-­‐energy  physics  and  plasma  physics,  and  engineering  sciences  and  industrial  applications.  

 

Weather,  Climatology  and  solid  Earth  Sciences  (WCES)  

Training programmes will allow WCES scientists to improve their HPC background as well as to establish stronger links between the HPC community and their own domain. In this respect, funding specific actions to support training activities, summer/winter schools, intra-European fellowships, and international incoming and outgoing fellowships will play a strategic role in preparing new scientists with a stronger and more interdisciplinary background. Given the expected increase in the complexity of the component models and of future exascale computing platforms, substantial resources should be devoted to the technical aspects of coupled climate modelling; the coupler development teams should be reinforced with computing-science experts who at the same time remain very close to the climate modelling scientists.

In oceanography and marine forecasting, support should be made available to train young interdisciplinary scientists to become specialists not only in climate science or HPC but in both. Training should be provided via summer schools and international training networks (e.g. the International Training Network SLOOP (SheLf to deep Ocean mOdeling of Processes), recently submitted to FP7).

In solid Earth sciences, the community is preparing itself for extensive use of supercomputers through the current (re-)organisation of some of its communities via large-scale EU projects, e.g. the Marie Curie Research Training Network SPICE (http://www.spice.rtn.org) and the EC projects NERIES (http://www.neries-eu.org/), NERA (http://www.nera-eu.org) and VERCE (http://www.verce.eu). Similar developments are occurring in geodynamics (e.g. TOPOEurope) and in geodynamo modelling.

Astrophysics,  High-­‐Energy  Physics  and  Plasma  Physics  

The steady advances in computer performance usually demand deep code modifications, accompanied by steep learning curves in sophisticated coding techniques. The resources involved in the development of computer hardware are usually much larger than those devoted to scientific software development. Moreover, this problem is exacerbated by the decline in the number of students interested in embarking on a scientific career. One must avoid the situation in which expensive improvements in computer facilities are not followed by comparable scientific advances because of inadequate manpower to develop and exploit efficient codes. The time has come to stress the importance of a pan-European training and tenure-track programme in HPC.94

Engineering  Sciences  and  Industrial  Applications  

The  systematic  training  of  code  developers  is  vital,  with  the   primary   task   being   to   educate   developers   of  technical   simulation   codes   in   the   design   of   hardware-­‐aware  algorithms   and   in   the   systematic   analysis   of   the  computational  performance  of  their  programs.  Failure  to  address   these   requirements  will   be   detrimental  to   the  practical  industrial  use  of  exascale  systems.  

Industry obviously needs skilled personnel to build, maintain and program exascale hardware. Human resources are the key element in grasping the competitive edge and benefit of HPC, and education and training are mandatory for realising the potential of the technology.

Education  and  training  may  be  viewed  generally  in  two  distinct   ways.   First   is   the   classical   specialisation   on  specific   topics   –   university     courses   on   themes   such   as  hardware   design,   compiler   technology,   numerical  algebra,   etc.   Courses   at   universities   or   research  institutions   are   commonly   organised   in   this   way   and  are   appropriate   solutions   for   analysis   or   in-­‐depth  activities.  

A second way to view education and training is in a somewhat different light – from a systems view: that is, the capability to build up systems by joining components or knowledge from different areas. This holistic view is typically necessary when solving real-world problems of the kind found in industry. The implementation and deployment of exascale computing have been described in terms of co-design and ecosystems; these terms indicate the need for personnel with the skills for systems 'thinking and building'.

Classical education and training with a focus on analysis and specific in-depth study is certainly both needed and appropriate. Here we suggest that it is highly desirable to complement it with the aspects of system design and engineering.

                                                                                                                         94  M.  Feldman,  http://www.hpcwire.com/hpcwire/2012-­‐01-­‐03/wanted:_supercomputer_programmers.html  


7.6.1 Life Sciences and Medicine
There is a clear asymmetry in education and training needs. Life scientists and clinicians must learn how to use best-of-breed in-silico science. Programmers, on the other hand, need to understand end users' needs in more detail. This requires significant efforts to develop methods that are able to address pressing life science problems with the help of exascale computing.

Figure  7.4.  Training  priorities  in  the  life  sciences  and  medicine.      

Method development. Many present-day techniques are, for fundamental reasons, not scalable towards the envisioned exascale architectures. For many challenging problems, codes addressing various aspects of the problem must be combined in an interdependent set of simulations/calculations. Method development in this direction is still in its infancy and will probably become the most significant obstacle to the exploitation of exascale computing for life science applications.

Memory  management.  Memory  can  become  a  major  bottleneck  for  life  sciences  exascale  computing  challenges.  A  better  understanding  of  how  to  optimise  this  resource  will  reduce  future  costs  in  updating  codes.  

Integration of optimised libraries in scripts. Genomics and systems biology are fast-evolving disciplines and the driving force for code development will be non-expert programmers. It is important that bio-programmers understand which (exascale-demanding) regions of the code should be developed – possibly by specialists – in efficient programming languages. They also need to understand how to link such libraries into high-level/scripting languages such as Java, Perl or Python (a minimal sketch is given at the end of this section).

Data storage. As data access (memory/disk) is much more costly than flops, training should focus on how to store information efficiently, including compression methods and database storage models.

Data integration and analysis. Tools such as Hadoop and MapReduce can expedite searches through the large, irregular data sets that characterise some life sciences problems. These tools can be effective for retrieving and moving through huge volumes of complex data, but they do not allow researchers to take the next step and pose intelligent questions. A related issue is that these tools may be fine for working with a few TBytes of scientific data, but become cumbersome to use when data sets cross the 100-TByte threshold. Effective tools for scientific data integration and analysis on this scale are largely lacking today.

Code parallelisation. The trend in the hardware industry is to increase flop power by multiplying the number of cores. However, this trend has resulted in unbalanced architectures, and it implies enormous efforts in code parallelisation. In the coming years, it will be important to train researchers in, at least, the standard parallel programming models (MPI, OpenMP); a minimal MPI example is given at the end of this section.

Benchmarking support. Benchmarking performance is needed to evaluate hardware and software alternatives: software tools such as BSC-Tools are able to identify inefficient blocks of code, or bottlenecks, in applications, and such performance tools will provide developers with a powerful aid in meeting the exascale challenges. Another vital aspect of benchmarking is the verification of the scientific quality of the results, by setting standard protocols for comparison and by developing meta-servers that can combine multiple approaches. The diversity of areas makes this issue computationally complex.

Computational  methods  training.  Overall,  the  groups  involved  in  exascale  challenges  for  life  science  will  require  expertise  in  code  parallelisation,  applied  mathematics,  mathematical  modelling,  statistics,  biology,  biochemistry,  biophysics,  data  analysis,  data  visualisation  and  biological  simulation.  Therefore,  we  will  need  to  focus  on  training  in  computational  methods  for  those  coming  from  a  biological,  as  opposed  to  a  physical  sciences,  background.  In  parallel,  computational  biologists  need  to  learn  how  to  design  software  and  build  friendly  interfaces  for  experimentalists.  

There is a critical need to train the computational life science community in the special demands of parallel computing (programming, performance optimisation, etc.), and to prepare it for using HPC in combination with systems and integrative biology. Unfortunately, there are very few places in Europe providing education in bioinformatics and computational biology. The panel identified the priorities in training shown in Figure 7.4.
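As an illustration of the library-linking priority in Figure 7.4, the short sketch below uses Python's standard ctypes module to call a function from a compiled C library. Here it calls the system maths library, whose lookup is platform-dependent (the example assumes a typical Linux or macOS installation); in practice one would wrap a domain-specific, optimised kernel in exactly the same way, and alternatives such as Cython, f2py or native extension modules exist.

```python
import ctypes
import ctypes.util

# Locate and load the system C maths library (e.g. libm.so.6 on Linux).
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path)

# Declare the C signature: double cos(double)
libm.cos.restype = ctypes.c_double
libm.cos.argtypes = [ctypes.c_double]

print(libm.cos(0.0))   # -> 1.0, computed by the compiled library
```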
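Similarly, for the code-parallelisation priority, the following minimal MPI example uses the mpi4py binding (the integration exercise and the process count are illustrative assumptions). It shows the basic pattern such training would cover: decompose the work across ranks, compute locally, then combine the partial results with a collective operation.

```python
# Run with, for example:  mpirun -n 4 python pi_mpi.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank integrates its own slice of f(x) = 4/(1+x^2) over [0,1] -> pi.
n = 10_000_000
x = (np.arange(rank, n, size) + 0.5) / n
local = np.sum(4.0 / (1.0 + x * x)) / n

pi = comm.reduce(local, op=MPI.SUM, root=0)   # combine partial sums on rank 0
if rank == 0:
    print(f"pi is approximately {pi:.10f} using {size} ranks")
```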


7.7 Community  Building  and  Centres  of  Competence    

In  the  preceding  section,  we  considered  the  generic  class  of  HPC  support  staff  required  in  developing,  sustaining   and   supporting   a   petascale-­‐class   computing   infrastructure.   We   consider   below   the  broader   requirements   of   community   building   and   the   software   facilities   crucial   to   the   successful  exploitation  of  leadership-­‐class  systems.    

We   illustrate   these   requirements   through   specific   reference   to   the   areas   of   life   sciences,   and  materials  science,  chemistry  and  nanoscience.  

The   life   science   panel   is   eager   to   apply   the  model   of   USA   co-­‐design   centres   focused   on   exascale  physics  applications,  such  as  the  Centre  for  Exascale  Simulation  of  Advanced  Reactors  (CESAR),  the  Co-­‐Design  for  Exascale  Research  in  Fusion  (CERF),  the  Flash  High-­‐Energy  Density  Physics  Co-­‐Design  Centre,  and  the  Combustion  Exascale  Co-­‐Design  Centre.    

A centre with academic and industrial participation focused on life sciences and health will be instrumental in facilitating efficient use of PRACE Tier-0 resources in areas such as tissue and organ simulation, molecular dynamics, cell simulation, genome sequencing and personalised medicine. Considering the complex nature of the bio-computational field, only a powerful competence centre will guarantee compatibility between research needs in the area and the new generation of exascale computers.

European activity in materials science has been considerably strengthened by the establishment of CECAM (Centre Européen de Calcul Atomique et Moléculaire)95 as the focal organisation, with a nodal structure and with funding links to national organisations. It provides an intersection for many computational disciplines, with activities ranging from the organisation of scientific workshops to specific graduate-level tutorials on the use of especially relevant software, and the sponsorship of specialised courses in the computational sciences at the masters level. In particular, it interlinks the first-principles community (Psi-k) and the molecular dynamics community.

First-principles simulations, based primarily on quantum-mechanical density functional theory (DFT), provide the most important framework across physics, surface science, materials science, chemistry, nanoscience, computational biology, mineralogy, Earth science and engineering. Such simulations are capable of deriving properties – frequently with predictive capability – from the atomic level with no input other than the atomic number of the constituent chemical elements. Also included are ab-initio molecular dynamics, time-dependent DFT, quantum many-body perturbation theory, and many-body approaches to electron correlation and strongly correlated electron systems.

Research   is   pursued   worldwide   by   a   large   and   diverse   community.   Since   the   calculations   are  computationally  very  demanding,  the  progress  of  this  field  scales  with  the  availability  of  a  powerful  supercomputing   infrastructure   for   capability   and   capacity   computing.   This   community   has   gained  vital   experience   on   HPC   infrastructures   by   developing   cutting-­‐edge   algorithms.   Europe   is   a  recognised   leader   in   this   field.   In  2011,  more   than  17,000  papers  containing  DFT  calculations  were  published   worldwide.   As   shown   in   Figure   7.5,   Europe   contributes   more   than   one   third   to   this  outcome.      

A large majority of the computer codes in use by this community originate from and are being developed in Europe (e.g. see http://www.psi-k.org/codes.shtml, or footnote 42). The European scientific value and strength in computational materials science was recently recognised by the European Science Foundation through the launch of a new Research Networking Programme in Advanced Concepts in ab-initio Simulations of Materials (Psi-k), continuing the highly successful work of the previous Psi-k Programmes, in which more than 1,000 scientists are organised.

                                                                                                                         95  www.cecam.org  


Soft  matter      

Europe is a stronghold of soft matter science worldwide. The importance of this field is, of course, also recognised in the USA, Japan and China, as demonstrated by the steady increase in the number of groups and researchers working in the field. However, Europe has the longest tradition, the largest number of groups and many of the leading scientists. Soft matter science is characterised by a very fruitful interplay of experiment, theory and computer simulations. As explained above, computational science has played, and will continue to play, an essential role in progress in this field. Soft matter research in Europe is integrated through a Network of Excellence (NoE).

 

Figure 7.5. Number of publications per year containing results of ab-initio calculations based on density functional theory over a timespan of 20 years, originating from Europe, the USA and East Asia, as listed in the Web of Knowledge of the Institute for Scientific Information (ISI). The search criterion was that the topic contains the keywords "ab-initio" or "first principles" or "density functional". In 2011, more than 17,000 publications were counted worldwide; Europe contributes more than one third of the total. The rapid growth in East Asia is due to China. Courtesy of the Institute for Scientific Information (ISI).


8 MEMBERSHIP OF INTERNATIONAL SCIENTIFIC PANEL

Area   Country   Title   First  Name   Last  Name   Institution   E-­‐Mail  Address   Workshop  

Weather, Climatology and Solid Earth Sciences

DE  

Prof.   Geerd-­‐Rüdiger     Hoffmann   Deutscher  Wetterdienst   geerd-­‐[email protected]   WCES  

Prof.  Dr   Heiner     Igel    LMU  Munich   [email protected]­‐muenchen.de   WCES  

Dr   Joachim   Biercamp   Deutsche  Klimarechenzentrum  (DKRZ)   [email protected]     WCES  

Dr   Reinhard   Budich   MPI-­‐M,  Hamburg   [email protected]     WCES  

ES   Prof.   José  María   Baldasano   UPC  /  BSC   [email protected]   WCES  

IT  

Prof.   Giovanni   Aloisio†   IS-ENES & Univ. Salento   [email protected]   WCES

Dr   Massimo   Cocco   Istituto  Nazionale  di  Geofisica  e  Vulcanologia   [email protected]   WCES  

Dr   Alberto   Michelini   Istituto Nazionale di Geofisica e Vulcanologia   [email protected]   WCES

FI   Dr   Johan   Silen   Finnish Meteorological Institute (FMI)   [email protected]   WCES

FR  

Dr   Jean  Claude   Andre   CERFACS   jean-­‐[email protected]     WCES  

Dr   Fabien   Dubuffet     CNRS,  Lyon     fabien.dubuffet@univ-­‐lyon1.fr   WCES  

Dr   Marie  Alice   Foujols   Institut  Pierre-­‐Simon  Laplace  (IPSL)   marie-­‐[email protected]     WCES  

Dr   Sylvie   Joussaume   Institut  Pierre-­‐Simon  Laplace  (IPSL)   [email protected]   WCES  

Prof.   Jean  Pierre   Vilotte   Institut  de  Physique  du  Globe   [email protected]   WCES  

Prof.   Bernard   Barnier   LEGI/Univ.  Grenoble   [email protected]­‐inp.fr   WCES  

Dr   Sophie   Valcke   CERFACS   [email protected]   WCES  


SE   Dr   Colin   Jones   Swedish  Meteorological  &  Hydrological  Institute  (SMHI)   [email protected]   WCES  

 

UK   Dr   Mike   Ashworth   STFC Daresbury   [email protected]   WCES

Dr   Chris   Gordon   Meteorological Office   [email protected]   WCES

Prof.   Bryan   Lawrence   NCAS  Reading  University   [email protected]   WCES  

Dr   Graham   Riley   Manchester  University   [email protected]     WCES  

Dr   John   Brodholt   University  College  London   [email protected]   WCES  

 

Astrophysics, High-Energy Physics and Plasma Physics

DE  

Prof.  Dr   Wolfgang   Hillebrandt   MPI  für  Astrophysik   wfh@mpa-­‐garching.mpg.de   HEPPA  

Prof. Dr   Andreas   Schäfer†   Fakultät Physik – Universität Regensburg   [email protected]   HEPPA

ES  

 Prof.   Jesús     Marco   Universidad  de  Cantabria   [email protected]   HEPPA  

 Prof.   Gustavo   Yepes   UAM  Madrid   [email protected]   HEPPA  

 Prof.   José  María   Ibañez   Universitat  de  València   [email protected]   HEPPA  

 Dr   Victor   Tribaldos   CIEMAT   [email protected]     HEPPA  

FR    Dr   Edouard   Audit   CEA,  SAP     [email protected]   HEPPA  

UK  

Prof.   Carlos   Frenk   University  of  Durham   [email protected]   HEPPA  

Prof.   Richard     Kenway   University  of  Edinburgh   [email protected]   HEPPA  

Dr   Colin   Roach   UKAEA   [email protected]   HEPPA  

Panel  Mailing  List  

CH   Prof.   Ben   Moore   University of Zurich   [email protected]   HEPPA

Prof. Dr   Romain   Teyssier   University of Zurich   [email protected]   HEPPA

DE   Prof.  Dr    Gernot   Münster   Westfälische  Wilhelms-­‐Universität  Münster     munsteg@uni-­‐muenster.de   HEPPA  

FR   Prof.   Maurizio   Ottaviani   CEA Cadarache   [email protected]   HEPPA


Prof.   Laurent     Lellouch   CNRS  Luminy   [email protected]­‐mrs.fr   HEPPA  

PT   Ricardo   Fonseca   DCTI – ISCTE, Lisbon   [email protected]   HEPPA

 

Life  Sciences  and  Medicine  

DE   Dr   Wolfgang     Wenzel     Institut  für  Nanotechnologie   [email protected]   LIFE  

ES  Prof.   Modesto   Orozco†   UB  /  BSC   [email protected]   LIFE  

Prof.   Roderic   Guigó   UPF   [email protected]   LIFE  

FR  

Dr   Thomas   Simonson     CNRS   [email protected]   LIFE  

Dr   Olivier   Poch   CNRS   [email protected]­‐strasbg.fr   LIFE  

UK   Dr   Charles   Laughton   University of Nottingham   [email protected]   LIFE

Panel  Mailing  List  

CH   Dr   Manuel   Peitsch   Swiss Institute of Bioinformatics   [email protected]   LIFE

DE   Dr   Paolo   Carloni   German Research School for Simulation Sciences, Jülich   [email protected]   LIFE

DE   Prof.   Helmut   Grubmüller   Max-Planck Institute for Biophysical Chemistry   [email protected]   LIFE

IT   Prof.   Anna   Tramontano   University  of  Rome  ‘La  Sapienza’   [email protected]   LIFE  

LU   Dr   Reinhard   Schneider   University of Luxembourg   [email protected]   LIFE

SE   Prof.   Erik   Lindahl   Stockholm University   [email protected]   LIFE

UK   Prof.   Peter     Coveney       University  College  London   [email protected]     LIFE  

 

     

 

BG   Prof.   Georgi   Vayssilov   University  of  Sofia   [email protected]­‐sofia.bg   CMSN  

DE  

Prof. Dr   Gerhard   Gompper   Forschungszentrum Jülich   [email protected]   CMSN

Prof. Dr   Stefan   Blügel†   Forschungszentrum Jülich GmbH   [email protected]   CMSN


 

     

       

       

Materials Science, Chemistry and Nanoscience

Prof.  Dr     Wolfgang   Wenzel   Karlsruhe  Institute  of  Technology   [email protected]   CMSN  

ES   Prof.   Agustí   Lledós   UAB   [email protected]   CMSN  

FI   Prof.   Risto   Nieminen   Helsinki  University  of  Technology   [email protected]   CMSN  

FR   Dr   Thierry   Deutsch   CEA-­‐Grenoble   [email protected]   CMSN  

IE   Prof.    Jim     Greer   Tyndall  National  Institute,  Cork   [email protected]   CMSN  

IT   Prof.   Giovanni   Ciccotti   University  of  Rome,  ‘La  Sapienza’   [email protected]   CMSN  

NO   Prof.   Kenneth   Ruud   University  of    Tromsø     [email protected]   CMSN  

UK  

           

Prof.   Martyn   Guest   Cardiff  University   [email protected]   CMSN  

Prof.   Mike   Payne   Cambridge  University   [email protected]   CMSN  

Panel  Mailing  List  

AT   Prof.  Dr   Christoph     Dellago       University  of  Vienna   [email protected]   CMSN  

CH   Prof.   Alessandro   Curioni       IBM  Research  –  Zurich   [email protected]         CMSN  

DE   Prof.  Dr   Kurt       Binder   Johannes  Gutenberg-­‐Universität  Mainz   kurt.binder@uni-­‐mainz.de   CMSN  

ES   Prof.   Manuel     Yanez   Universidad  Autonoma  de  Madrid   [email protected]   CMSN  

FI   Prof.   Kai   Nordlund   University of Helsinki   [email protected]   CMSN

FR   Prof.   Gilles         Zerah   CEA  –  CECAM   [email protected]   CMSN  

FR   Prof.   Philippe   Sautet   CNRS and Ecole Normale Supérieure of Lyon   [email protected]   CMSN

RU   Prof.   Alexei   Khokhlov   Moscow State University   [email protected]   CMSN

SE   Dr   Kersti   Hermansson   Uppsala  University   [email protected]   CMSN  


UK  

Prof.   Steve   Parker         Bath  University   [email protected]   CMSN  

Prof.   Jonathan   Tennyson   University  College  London   [email protected]   CMSN  

Dr   Adrian   Wander   STFC Daresbury Laboratory   [email protected]   CMSN

 

Engineering Sciences and Industrial Applications

BE   Dr   Koen   Hillewaert   CENAERO   [email protected]   Eng  

CZ   Prof.   Zdenek   Dostal   VŠB-­‐Technical  University  of  Ostrava   [email protected]   Eng  

DE  

Prof.  Dr     Uli   Rüde   University  Erlangen-­‐Nuremberg   [email protected]   Eng  

Prof. Dr-Ing   Wolfgang   Schröder   RWTH Aachen – Lehrstuhl für Strömungslehre und Aerodynamisches Institut   [email protected]   Eng

ES   Prof.   Javier   Jimenez  Sendín   UPM   [email protected]   Eng  

FR  

Dr   Stephane   Requena   GENCI   [email protected]   Eng  

Dr   Denis   Veynante   CNRS     [email protected]   Eng  

Dr   Philippe     Ricoux†   TOTAL   [email protected]   Eng  

SE   Dr   Philipp   Schlatter   KTH  Mechanics     [email protected]   Eng  

UK  

Prof.   Neil D   Sandham   School of Engineering Sciences (Aero) – University of Southampton   [email protected]   Eng

Prof.   Stewart   Cant   University of Cambridge   [email protected]   Eng

Dr   David   Emerson   STFC  Daresbury  Laboratory   [email protected]   Eng  

NL   Dr   Roel   Verstappen   University  of  Groningen   [email protected]   Eng  

Panel  Mailing  List  

DE   Dr   H.   Pitsch   RWTH Aachen – Lehrstuhl für Strömungslehre und Aerodynamisches Institut   [email protected]   Eng


  †  Moderator    

  Country   Title   First  Name   Last  Name   Institution     E-­‐Mail  Address  

Editorial    Group  

 

DE   Prof. Dr Dr   Thomas   Lippert   Forschungszentrum Jülich GmbH – Central Institute for Applied Mathematics   Chairman, PRACE 1IP   [email protected]

UK   Prof.   Richard   Kenway   University of Edinburgh   Chairman, Scientific Steering Committee   [email protected]

Prof.   Martyn     Guest   University  of  Cardiff   Lead  Editor   [email protected]  

PT   Dr   Maria   Ramalho   PRACE   Acting Managing Director of PRACE aisbl   [email protected]

IE   Dr   Turlough   Downes   Dublin City University   Chairman, User Forum Programme Committee   [email protected]

   

  Country   Title   First  Name   Last  Name   Institution     E-­‐Mail  Address  

PRACE    Project   IT   Dr   Giovanni   Erbacci   CINECA   WP4-­‐1IP   [email protected]  

