Page 1

Computational Chemistry at Daresbury 16-22 November 2002

Computational Science and Engineering Department Daresbury Laboratory

Computational Chemistry at Daresbury Laboratory

Quantum Chemistry Group

Martyn F. Guest, Paul Sherwood, and Huub J.J. van Dam

http://www.dl.ac.uk/CFS

http://www.cse.clrc.ac.uk/Activity/QUASI

Molecular Simulation Group

Bill Smith and Maurice Leslie

http://www.dl.ac.uk/TCSC/Software/DL_POLY

Page 2

Outline
• Performance of Computational Chemistry Codes
  – Serial Applications Benchmarks: GAMESS-UK and DL_POLY
  – Parallel Performance on High-end and Commodity-class systems
• NWChem
  – Global Array (GA) Tools
  – Parallel Eigensolver (PeIGS)
• GAMESS-UK
  – SCF, DFT, MP2 and 2nd Derivatives
• DL_POLY
  – Version 2: Replicated data
  – Version 3: Distributed data (domain decomposition)
• CHARMM and QM/MM Calculations
  – Thrombin and TIM benchmarks

Page 3

GAMESS-UK and DL_POLY Serial Benchmarks

12 Typical QC Calculations

Module | Basis (GTOs) | Species
1. SCF | STO-3G (124) | Morphine
2. SCF | 6-31G (154) | C6H3(NO2)3
3. ECP Geometry | ECPDZ (70) | Na7Mg+
4. Direct-SCF | 6-31G (82) | Cytosine
5. CAS-geometry | TZVP (52) | H2CO
6. MCSCF | EXT1 (74) | H2CO
7. Direct-CI | EXT2 (64) | H2CO/H2+CO
8. MRD-CI (26M) | ECP (59) | TiCl4
9. MP2-geometry | 6-31G* (70) | H3SiNCO
10. SCF 2nd derivs. | 6-31G (64) | C5H5N
11. MP2 2nd derivs. | 6-31G* (60) | C4
12. Direct-MP2 | DZP (76) | C5H5N

Six Typical Simulations

Simulation | Atoms | Time steps
1. Na-K disilicate glass | 1080 | 300
2. Metallic Al with Sutton-Chen potential | 256 | 8000
3. Valinomycin in 1223 water molecules | 3837 | 100
4. Dynamic shell model water with 1024 sites | 768 | 1000
5. Dynamic shell model MgCl2 with 1280 sites | 768 | 1000
6. Model membrane, 2 membrane chains, 202 solute and 2746 solvent molecules | 3148 | 1000

Page 4

The GAMESS-UK Benchmark: Performance relative to the Compaq Alpha ES45/68-1000

[Figure: bar chart of relative performance (scale 0-150) for IBM RS/6000 44P-270, IBM 690 Turbo/POWER4 1.3 GHz, HP PA-9000/J6000-552, HP PA-9000/J6700-750, SGI Origin3800/R14k-500, SGI O2 R12k/270, SUN Fire 6800/900 Cu, SUN Blade 1000/750, Compaq Alpha ES45/1000, Compaq Alpha ES40/833, Intel Tiger Itanium 2/1000, HP RX2600 Itanium 2/1000, IBM IA64 Itanium 800/4MB, HP RX4610 Itanium 733/2MB, Pentium 4/2000, AMD MP1800+/1533 and Pentium III/1000. Annotated time: 3.6 minutes.]

Page 5

The DL_POLY Benchmark: Performance relative to the Compaq Alpha ES45/68-1000

[Figure: bar chart of relative performance (scale 0-160) for IBM RS/6000 44P-270, IBM 690 Turbo/POWER4 1.3 GHz, HP PA-9000/J6000-552, HP PA-9000/J6700-750, Cray T3E/1200, SGI Origin3800/R14k-500, SGI O2 R12k/270, SUN Fire 6800/900 Cu, SUN Blade 1000/750, Compaq Alpha ES45/1000, Compaq Alpha ES40/833, Intel Tiger Itanium 2/1000, HP RX2600 Itanium 2/1000, IBM IA64 Itanium 800/4MB, HP RX4610 Itanium 733/2MB, Pentium 4/2000, AMD MP1800+/1533 and Pentium III/1000.]

Page 6

High-End Systems Evaluated
• Cray T3E/1200E
  – 816 processor system at Manchester (CSAR service)
  – 600 MHz EV56 Alpha processor with 256 MB memory
• IBM SP/WH2-375 and SP/Regatta-H
  – 32 CPU system at DL, 4-way Winterhawk2 SMP “thin nodes” with 2 GB memory, 375 MHz Power3-II processors with 8 MB L2 cache
  – IBM Regatta-H (32-way node, 1.3 GHz POWER4 CPUs) at Montpellier
  – IBM SP/Regatta-H (8-way LPAR’d nodes, 1.3 GHz) at ORNL
• Compaq AlphaServer SC
  – 4-way ES40/667 A21264A (APAC) and 833 MHz SMP nodes (2 GB RAM)
  – TCS1 system at PSC (750 4-way ES45 nodes, i.e. 3,000 EV68 CPUs, with 4 GB memory per node and 8 MB L2 cache)
  – Quadrics “fat tree” interconnect (5 usec latency, 250 MB/sec B/W)
• SGI Origin 3800
  – SARA (1000 CPUs): NUMAlink with R14k/500 & R12k/400 CPUs
• Cray Supercluster at Eagan
  – Linux Alpha Cluster (96 × API CS20s: dual 833 MHz EV67 CPUs, Myrinet)

Page 7

Commodity Systems (CSx): Prototype / Evaluation Hardware

System | Location | CPUs | Configuration
CS1 | Daresbury | 32 | Pentium III / 450 MHz + FE (EPSRC)
CS2 | Daresbury | 64 | 24 × dual UP2000/EV67-667, QSNet Alpha/Linux cluster; 8 × dual CS20/EV67-833 (“loki”)
CS3 | RAL | 16 | Athlon K7 850 MHz + Myrinet
CS4 | SARA | 32 | Athlon K7 1.2 GHz + FE
CS6 | CLiC | 528 | Pentium III / 800 MHz; fast ethernet (Chemnitzer Cluster)
CS7 | Daresbury | 64 | AMD K7/1000 MP + SCALI/SCI (“ukcp”)
CS8 | NCSA | 320 | 160 × dual IBM Itanium/800 + Myrinet 2k (“titan”)
CS9 | Bristol | 96 | Pentium 4 Xeon/2000 + Myrinet 2k (“dirac”)

Prototype Systems
CS0 | Daresbury | 10 | 10 CPUs, Pentium II/266
CS5 | Daresbury | 16 | 8 × dual Pentium III/933, SCALI

www.cse.clrc.ac.uk/Activity/DisCo

Page 8

Performance Metrics: 1999-2001

Attempted to quantify the delivered performance of the commodity-based systems against the current MPP (CSAR Cray T3E/1200E) and ASCI-style SMP-node platforms (e.g. SGI Origin 3800), i.e.

Performance Metric (% 32-node Cray T3E) = T (32-node Cray T3E/1200E) / T (32 CPUs CSx)

e.g. T 32-node T3E / T 32-node CS1 Pentium III/450 + FE
     T 32-node T3E / T 32-node CS6 Pentium III/800 + FE
     T 32-node T3E / T 32-CPU CS2 Alpha Linux Cluster + Quadrics

Performance Metrics: 2002

Performance Metric (% 32-node AlphaServer SC [PSC]) = T (32-CPU AlphaServer SC ES45/1000) / T (32 CPUs CSx)

e.g. T 32-CPU AlphaServer ES45 / T 32-CPU CS9 Pentium 4 Xeon/2000 + Myrinet 2k
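The metric is simply a ratio of elapsed times for the same job on the same number of CPUs, expressed as a percentage, so values above 100% mean the commodity system finished first. A minimal Python sketch of that arithmetic follows; the function name and the example timings are hypothetical and serve only to illustrate the calculation.

```python
def performance_metric(t_reference: float, t_candidate: float) -> float:
    """Delivered-performance metric in percent (hypothetical helper).

    t_reference: elapsed time on the reference system, e.g. a 32-node Cray T3E/1200E
    t_candidate: elapsed time for the same job on 32 CPUs of a commodity system CSx
    """
    if t_reference <= 0 or t_candidate <= 0:
        raise ValueError("elapsed times must be positive")
    return 100.0 * t_reference / t_candidate


# Illustrative timings in seconds, not measured data.
print(f"{performance_metric(1000.0, 800.0):.0f}%")  # 125%
```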

Page 9

Beowulf Comparisons with the T3E & O3800/R14k-500

CSx - Pentium III + FE: % of 32-node Cray T3E/1200E (values listed as CS1, CS6)

GAMESS-UK
  SCF: 53-69%, 96%
  DFT: 65-85%, 130-178%
  DFT (Jfit): 44-77%, 65-131%
  DFT Gradient: 90%, 130%
  MP2 Gradient: 44%, 73%
  SCF Forces: 80%, 127%
NWChem (DFT Jfit): 50-60%
REALC: 67%
CRYSTAL: 79-145%
DL_POLY
  Ewald-based: 95-107%, 151-184%
  bond constraints: 34-56%, 69%
CHARMM: 96%, 172%
CASTEP: 33%, 42%
CPMD: 62%
ANGUS: 60%, 68%
FLITE3D: 104%

CS2 - QSNet Alpha Linux Cluster: % of 32-node Cray T3E and O3800/R14k-500 (values listed as T3E, O3800)

GAMESS-UK
  SCF: 256%, 99%
  DFT †: 301-361%, 99%
  DFT (Jfit): 219-379%, 89-100%
  DFT Gradient †: 289%, 89%
  MP2 Gradient: 228%, 87%
  SCF Forces: 154%, 86%
NWChem (DFT Jfit) †: 150-288%, 74-135%
CRYSTAL †: 349%
DL_POLY
  Ewald-based †: 363-470%, 95%
  bond constraints: 143-260%, 82%
CHARMM †: 404%, 78%
CASTEP: 166%, 78%
ANGUS: 145%
FLITE3D †: 480%

Page 10

High-End Computational Chemistry: The NWChem Software

Capabilities (direct, semi-direct and conventional):
• RHF, UHF, ROHF using up to 10,000 basis functions; analytic 1st and 2nd derivatives.
• DFT with a wide variety of local and non-local XC potentials, using up to 10,000 basis functions; analytic 1st and 2nd derivatives.
• CASSCF; analytic 1st and numerical 2nd derivatives.
• Semi-direct and RI-based MP2 calculations for RHF and UHF wave functions using up to 3,000 basis functions; analytic 1st derivatives and numerical 2nd derivatives.
• Coupled cluster, CCSD and CCSD(T), using up to 3,000 basis functions; numerical 1st and 2nd derivatives of the CC energy.
• Classical molecular dynamics and free energy simulations with the forces obtainable from a variety of sources.

Page 11

Global Arrays

Single, shared data structure; physically distributed data

• Shared-memory-like model
  – Fast local access
  – NUMA aware and easy to use
  – MIMD and data-parallel modes
  – Inter-operates with MPI, …
• BLAS and linear algebra interface
• Ported to major parallel machines
  – IBM, Cray, SGI, clusters, ...
• Originated in an HPCC project
• Used by 5 major chemistry codes, financial futures forecasting, astrophysics, computer graphics
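To illustrate a single, shared data structure over physically distributed data, the Python sketch below models a 2-D array whose rows are split into contiguous blocks, one per owner, with `put` and `get` calls that hide which owner holds which element. It is a hypothetical serial model: the class and method names are illustrative assumptions, they are not the Global Arrays API, and no real communication takes place.

```python
from typing import Dict, List, Tuple


class BlockDistributedArray:
    """Serial toy model of a 2-D array whose rows are split into blocks.

    Owner p holds rows [lo, hi) in its own local list; put() and get() map
    global indices to the owning block, so callers see one shared index
    space. Hypothetical illustration only: this is not the Global Arrays API.
    """

    def __init__(self, nrows: int, ncols: int, nproc: int) -> None:
        base, extra = divmod(nrows, nproc)
        self.bounds: List[Tuple[int, int]] = []  # [lo, hi) row range per owner
        lo = 0
        for p in range(nproc):
            hi = lo + base + (1 if p < extra else 0)
            self.bounds.append((lo, hi))
            lo = hi
        # One separate local block per owner, mimicking physically distributed data.
        self.local: Dict[int, List[List[float]]] = {
            p: [[0.0] * ncols for _ in range(r_hi - r_lo)]
            for p, (r_lo, r_hi) in enumerate(self.bounds)
        }

    def owner(self, row: int) -> int:
        for p, (lo, hi) in enumerate(self.bounds):
            if lo <= row < hi:
                return p
        raise IndexError(f"row {row} out of range")

    def put(self, row: int, col: int, value: float) -> None:
        p = self.owner(row)
        self.local[p][row - self.bounds[p][0]][col] = value

    def get(self, r0: int, r1: int, c0: int, c1: int) -> List[List[float]]:
        """Return the patch of rows [r0, r1) and columns [c0, c1), whoever owns it."""
        patch = []
        for row in range(r0, r1):
            p = self.owner(row)
            patch.append(self.local[p][row - self.bounds[p][0]][c0:c1])
        return patch


ga = BlockDistributedArray(6, 4, nproc=3)  # owners hold rows 0-1, 2-3 and 4-5
for i in range(6):
    for j in range(4):
        ga.put(i, j, 10.0 * i + j)
print(ga.get(1, 4, 2, 4))  # [[12.0, 13.0], [22.0, 23.0], [32.0, 33.0]], spanning owners 0 and 1
```

The caller never names an owner: the patch request crosses a block boundary transparently, which is the property the slide calls a shared-memory-like model.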

Page 12

PeIGS 3.0 Parallel Performance

(Solution of real symmetric generalized and standard eigensystem problems)

Features (not available elsewhere):
• Inverse iteration using Dhillon-Fann-Parlett’s parallel algorithm (fastest uniprocessor performance and good parallel scaling)
• Guaranteed orthonormal eigenvectors in the presence of large clusters of degenerate eigenvalues
• Packed storage
• Smaller scratch space requirements

[Figure: time (sec) vs number of processors (0-64) on the IBM SP and Cray T3E. Full eigensolution performed on a matrix generated in a charge density fitting procedure (966 fitting functions for a fluorinated biphenyl).]
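PeIGS handles both generalized and standard problems. One common way to bring the generalized real symmetric problem A x = λ B x (B positive definite) into standard form is Cholesky reduction: factor B = L L^T, form C = L^-1 A L^-T, solve C y = λ y and recover x = L^-T y. The pure-Python sketch below shows only the reduction step; it illustrates the textbook technique under that assumption, not how PeIGS implements it, and all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(b: Matrix) -> Matrix:
    """Return lower-triangular L with B = L L^T (B symmetric positive definite)."""
    n = len(b)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = b[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("B is not positive definite")
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return low


def forward_solve(low: Matrix, rhs: Matrix) -> Matrix:
    """Solve L X = RHS for X, one column at a time (L lower triangular)."""
    n, m = len(low), len(rhs[0])
    x = [[0.0] * m for _ in range(n)]
    for col in range(m):
        for i in range(n):
            s = rhs[i][col] - sum(low[i][k] * x[k][col] for k in range(i))
            x[i][col] = s / low[i][i]
    return x


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def reduce_to_standard(a: Matrix, b: Matrix) -> Matrix:
    """Return C = L^-1 A L^-T, so A x = lam B x becomes C y = lam y with x = L^-T y."""
    low = cholesky(b)
    w = forward_solve(low, a)  # W = L^-1 A
    # L^-1 W^T = L^-1 A^T L^-T, which equals C because A is symmetric.
    return forward_solve(low, transpose(w))


c = reduce_to_standard([[4.0, 1.0], [1.0, 3.0]], [[2.0, 1.0], [1.0, 2.0]])
# C is symmetric; its trace (4) and determinant (11/3) equal the sum and
# product of the generalized eigenvalues 2 +/- sqrt(3)/3.
print(c)
```

The reduced matrix C can then be handed to any standard symmetric eigensolver.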

Page 13

Scalability of Numerical Algorithms: Real symmetric eigenvalue problems

SGI Origin 3800/R12k-400 (“green”)

[Figure, left panel: Fock matrix (N = 1152); time (sec) vs number of processors (2-32) for PeIGS 2.1, PeIGS 2.1 on the Cray T3E/1200, PeIGS 3.0, PDSYEV (ScaLAPACK 1.5) and PDSYEVD (ScaLAPACK 1.7).]

[Figure, right panel: Fock matrix (N = 3888); time (sec) vs number of processors (16-512) for PeIGS 2.1, PeIGS 3.0, PDSYEV (ScaLAPACK 1.5), PDSYEVD (ScaLAPACK 1.7) and BFG-Jacobi (DL).]
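The BFG-Jacobi entry belongs to the family of Jacobi eigensolvers, which diagonalize a real symmetric matrix by repeated plane rotations. The sketch below is a plain serial cyclic Jacobi method in Python, written only to show the basic rotation step; it is not the DL BFG-Jacobi code, whose details are not given here, and the function name and convergence parameters are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def jacobi_eigenvalues(a: Matrix, tol: float = 1e-12, max_sweeps: int = 50) -> List[float]:
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations, sorted ascending."""
    n = len(a)
    m = [row[:] for row in a]  # work on a copy; the input is left unchanged
    for _ in range(max_sweeps):
        # Stop when the Frobenius norm of the off-diagonal part is below tol.
        off = math.sqrt(sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if m[p][q] == 0.0:
                    continue
                # Angle with tan(2*theta) = 2 m_pq / (m_qq - m_pp); this rotation zeroes m_pq.
                theta = 0.5 * math.atan2(2.0 * m[p][q], m[q][q] - m[p][p])
                # Keep |theta| <= pi/4, the usual choice for the inner rotation.
                if theta > math.pi / 4:
                    theta -= math.pi / 2
                elif theta < -math.pi / 4:
                    theta += math.pi / 2
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # M <- M J (mixes columns p and q)
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p] = c * mkp - s * mkq
                    m[k][q] = s * mkp + c * mkq
                for k in range(n):  # M <- J^T M (mixes rows p and q)
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k] = c * mpk - s * mqk
                    m[q][k] = s * mpk + c * mqk
    return sorted(m[i][i] for i in range(n))


print(jacobi_eigenvalues([[2.0, 1.0], [1.0, 2.0]]))  # approximately [1.0, 3.0]
```

Each rotation touches only two rows and two columns, which is why Jacobi-type methods parallelize naturally, although they need more floating-point work than the tridiagonal-reduction solvers in the comparison.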

Page 14

Parallel Eigensolvers: Real symmetric eigenvalue problems (N = 3888)

[Figure, left panel: CS7 AMD K7/1000 MP + SCALI; measured time (seconds) vs number of processors (4-64) for ScaLAPACK PDSYEV with 1 CPU per node and with 2 CPUs per node.]

[Figure, right panel: CS9 P4/2000 Xeon + Myrinet; measured time (seconds) vs number of processors (8-64) for PDSYEV, PDSYEVD and PeIGS 2.1, each with 1 CPU per node and with 2 CPUs per node.]

Page 15

Case Studies - Zeolite Fragments

Si8O7H18 347/832
Si8O25H18 617/1444
Si26O37H36 1199/2818
Si28O67H30 1687/3928

• DFT calculations with Coulomb fitting
  – Basis (Godbout et al.): DZVP - O, Si; DZVP2 - H
  – Fitting basis: DGAUSS-A1 - O, Si; DGAUSS-A2 - H
• NWChem & GAMESS-UK
  – Both codes use an auxiliary fitting basis for the Coulomb energy, with the 3-centre 2-electron integrals held in core.

Page 16

DFT Coulomb Fit - NWChem

Si8O7H18 347/832; Si8O25H18 617/1444

[Figure: two panels, one per fragment, of measured time (seconds) vs number of CPUs (16, 32) for CS6 PIII/800 + FE, CS7 AMD K7/1000 + SCI, CS9 P4/2000 + Myrinet, CS2 QSNet Alpha/667, Cray T3E/1200E, SGI O3800/R14k-500 and AlphaServer SC ES45/1000.]

Annotations: 76%, 95%; 88%, 104%

Page 17

DFT Coulomb Fit - NWChem

Si28O67H30 1687/3928; Si26O37H36 1199/2818

T IBM-SP/P2SC-120 (256) = 1137; T IBM-SP/P2SC-120 (256) = 2766

[Figure: two panels, one per fragment, of measured time (seconds) vs number of CPUs (32, 64, 128) for CS7 AMD K7/1000 + SCI, CS9 P4/2000 + Myrinet 2k, CS2 QSNet Alpha Cluster/667, Cray T3E/1200E, SGI Origin 3800/R14k-500 and AlphaServer SC ES45/1000.]

Annotations: 79%, 210%; 85%, 227%

Page 18

Memory-driven Approaches: NWChem - DFT (LDA)
1. Performance on the SGI Origin 3800

Zeolite ZSM-5

• DZVP basis (DZV_A2) and DGauss A1_DFT fitting basis: AO basis: 3554; CD basis: 12713
• MIPS R14k-500 CPUs (Teras)
• Wall time (13 SCF iterations): 64 CPUs = 5,242 seconds; 128 CPUs = 3,951 seconds
• Est. time on 32 CPUs = 40,000 secs
• 3-centre 2e-integrals = 1.00 × 10^12
• Schwarz screening = 5.95 × 10^10
• % 3c 2e-ints. in core = 100%
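The two measured wall times on this slide give a relative speed-up and parallel efficiency for doubling the CPU count. The sketch below assumes the usual definitions relative to the smaller run, S = T(p_base) / T(p) and E = S · p_base / p. The slides do not state which baseline they use for their own efficiency figures, so the helper and its baseline are assumptions.

```python
from typing import Tuple


def relative_speedup(t_base: float, p_base: int, t: float, p: int) -> Tuple[float, float]:
    """Speed-up and parallel efficiency of a p-CPU run relative to a p_base-CPU run.

    S = T(p_base) / T(p); E = S * p_base / p, where E = 1.0 is perfect linear scaling.
    """
    if min(t_base, t) <= 0 or min(p_base, p) <= 0:
        raise ValueError("times and CPU counts must be positive")
    s = t_base / t
    return s, s * p_base / p


# Wall times (13 SCF iterations) for ZSM-5 on the SGI Origin 3800, as quoted on the slide.
s, e = relative_speedup(5242.0, 64, 3951.0, 128)
print(f"64 -> 128 CPUs: speed-up {s:.2f}, efficiency {e:.0%}")  # 64 -> 128 CPUs: speed-up 1.33, efficiency 66%
```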

Page 19

Memory-driven Approaches: NWChem - DFT (LDA)
2. Performance on the HP/Compaq AlphaServer SC

Pyridine in Zeolite ZSM-5

• DZVP basis (DZV_A2) and DGauss A1_DFT fitting basis: AO basis: 5457; CD basis: 12713
• 256 EV67/6-667 CPUs (64 Compaq AlphaServer SC nodes)
• Wall time (10 SCF iterations) on 256 CPUs = 11,960 seconds (60% efficiency)
• 3-centre 2e-integrals = 3.79 × 10^12
• Schwarz screening = 2.81 × 10^11
• % 3c 2e-ints. in core = 1.66%

Page 20

Parallel Implementations of GAMESS-UK

Extensive use of Global Array (GA) Tools and Parallel Linear Algebra from the NWChem Project (EMSL)

• SCF and DFT
  – Replicated data, but …
  – GA Tools for caching of I/O for restart and checkpoint files
  – Storage of 2-centre 2-e integrals in DFT Jfit
  – Linear algebra (via PeIGS, DIIS/MMOs, inversion of 2c-2e matrix)
• SCF second derivatives
  – Distribution of <vvoo> and <vovo> integrals via GAs
• MP2 gradients
  – Distribution of <vvoo> and <vovo> integrals via GAs

Page 21

GAMESS-UK SCF Performance: Cray T3E/1200E, High-end and Commodity-based Systems

Cyclosporin (3-21G basis, 1000 GTOs)

[Figure: elapsed time (seconds) vs number of CPUs (16, 32) for CS6 PIII/800 + FE, CS7 AMD K7/1000 MP + SCI, CS2 QSNet Alpha Cluster/667, CS9 P4 Xeon/2000 + Myrinet, Cray T3E/1200E, IBM SP/WH2-375, Compaq AlphaServer SC ES40/667, SGI Origin3800/R14k-500 and Compaq AlphaServer SC ES45/1000.]

Annotation: 95%, 135%

T T3E (128) = 436

Impact of Serial Linear Algebra: T IBM-SP (16) = 2656 [1289]; T IBM-SP (32) = 2184 [821]

Page 22

GAMESS-UK. DFT B3LYP Performance: Cray T3E/1200, High-end and Commodity-based Systems

Cyclosporin, 1000 GTOs; basis: 6-31G

[Figure: elapsed time (seconds) vs number of CPUs (16, 32) for CS6 PIII/800 + FE, CS4 AMD K7/1200 + FE, CS7 AMD K7/1000 MP + SCI, CS2 QSNet Alpha Cluster/667, CS9 P4 Xeon/2000 + Myrinet, Cray T3E/1200E, IBM SP WH2/375, SGI Origin 3800/R14k-500, Cray SuperCluster/833 and AlphaServer SC ES45/1000.]

Annotation: 70%, 117%

Page 23

GAMESS-UK. DFT B3LYP Performance: The Cray T3E/1200 and High-end Systems

Cyclosporin, 1000 GTOs; basis: 6-31G

[Figure, left panel: speed-up vs number of CPUs (16-128), with a linear reference, for Cray T3E/1200E, Compaq AlphaServer SC/667, Cray SuperCluster/833, SGI Origin 3800/R14k-500 and AlphaServer SC ES45/1000; annotated S = 106.7 (a second value, 73.4, is also marked).]

[Figure, right panel: elapsed time (seconds) vs number of CPUs (32, 64, 128) for Cray T3E/1200E, Compaq AlphaServer SC ES40/667, SGI Origin 3800/R14k-500, Cray SuperCluster/833 and Compaq AlphaServer SC ES45/1000.]

Page 24

DFT BLYP Gradient: Cray T3E/1200, High-end and Commodity-based Systems

Geometry optimisation of polymerisation catalyst Cl(C3H5O).Pd[(P(CMe3)2)2.C6H4]

Basis 3-21G* (446 GTOs): 10 energy + gradient evaluations

[Figure, left panel: elapsed time (seconds) vs number of CPUs (32, 64) for CS6 PIII/800 + FE, CS7 AMD K7/1000 MP + SCI, CS2 QSNet Alpha Cluster/667, CS9 P4 Xeon/2000 + Myrinet, Cray T3E/1200E, IBM SP/WH2-375, SGI Origin 3800 R14k/500 and AlphaServer SC ES45/1000.]

[Figure, right panel: speed-up vs number of CPUs (16-64), with a linear reference, for Cray T3E/1200E, IBM SP/WH2-375, CS1 PIII/450 + FE, SGI Origin 3800/R14k-500 and CS2 QSNet Alpha Cluster/667.]

Annotation: 69%, 114%

Page 25

Auxiliary Basis Coulomb Fit (I)

The approach is based on the expansion of the charge density in an auxiliary basis of Gaussian functions u:

$$\rho(\mathbf{r}) = \sum_{pq} D_{pq}\, p(\mathbf{r})\, q(\mathbf{r}) \approx \tilde{\rho}(\mathbf{r}) = \sum_{u} C_u\, u(\mathbf{r})$$

As suggested by Dunlap, a variational choice of the fitting coefficients C can be obtained as follows:

$$C = V^{-1} \sum_{pq} D_{pq}\, b_{pq}, \qquad V_{uv} = (u|v), \qquad (b_{pq})_u = (pq|u)$$

where V is the matrix of 2-centre 2-electron repulsion integrals in the charge density basis and b are the three-centre electron repulsion integrals between the wavefunction basis set and the charge density basis.
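In Dunlap's variational fit the coefficients minimize the Coulomb self-repulsion of the fitting error, which leads to the linear system V C = Σ_pq D_pq b_pq. The parallel GAMESS-UK implementation is described as inverting the 2c-2e matrix. The Python sketch below instead solves the system with a Cholesky factorisation of V, a common alternative that avoids forming V^-1; the function name, matrix and right-hand side are hypothetical toy values, not integrals from a real basis.

```python
import math
from typing import List


def solve_fit_coefficients(v: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve V C = rhs for the fitting coefficients C (V symmetric positive definite).

    rhs stands for the contracted 3-centre integrals sum_pq D_pq (pq|u).
    Cholesky V = L L^T, then forward (L y = rhs) and back (L^T C = y) substitution.
    """
    n = len(v)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = v[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            # math.sqrt raises ValueError if V is not positive definite.
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    y = [0.0] * n
    for i in range(n):
        y[i] = (rhs[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (y[i] - sum(low[k][i] * c[k] for k in range(i + 1, n))) / low[i][i]
    return c


v_toy = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical 2-centre matrix (u|v)
rhs_toy = [1.0, 2.0]              # hypothetical contracted 3-centre vector
print(solve_fit_coefficients(v_toy, rhs_toy))  # approximately [0.0, 2.0]
```

Because V depends only on the fitting basis, its factorisation can be computed once and reused in every SCF iteration, with only the right-hand side changing as the density matrix D changes.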

Page 26

Auxiliary Basis Coulomb Fit (II)

The number of 3-centre integrals is significantly smaller than the number of 4-centre integrals used in the conventional Coulomb evaluation, but for large molecules additional screening is required. We make use of the Schwarz inequality

$$\lvert (pq|u) \rvert \le (pq|pq)^{1/2}\, (u|u)^{1/2}$$

where p and q are AO basis functions and u are the fitting functions. Since screening is applied on a shell basis, the maximal integrals for each shell quartet are stored.

Using this screening, and exploiting the aggregate memory of a parallel machine, it is possible to hold a significant fraction of the 3-centre integrals in core.
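A minimal sketch of Schwarz screening for the 3-centre integrals: given the diagonal values (pq|pq) and (u|u), a triple (p, q, u) is kept only if its upper bound exceeds a threshold, so the integral itself never needs to be computed for the discarded triples. For simplicity the sketch screens individual functions rather than shells, and the function name, diagonal values and threshold are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def screened_triples(
    pair_diag: Dict[Tuple[int, int], float],
    fit_diag: List[float],
    threshold: float,
) -> List[Tuple[int, int, int]]:
    """Return the (p, q, u) triples whose Schwarz bound exceeds `threshold`.

    pair_diag[(p, q)] holds (pq|pq) and fit_diag[u] holds (u|u); the bound
    |(pq|u)| <= sqrt((pq|pq)) * sqrt((u|u)) decides which triples to keep.
    """
    fit_sqrt = [math.sqrt(x) for x in fit_diag]
    kept = []
    for (p, q), pq_pq in pair_diag.items():
        pq_sqrt = math.sqrt(pq_pq)
        for u, u_sqrt in enumerate(fit_sqrt):
            if pq_sqrt * u_sqrt > threshold:
                kept.append((p, q, u))
    return kept


pairs = {(0, 0): 1.0, (0, 1): 1e-8, (1, 1): 0.5}  # hypothetical (pq|pq) values
fits = [1.0, 1e-6]                               # hypothetical (u|u) values
kept = screened_triples(pairs, fits, threshold=1e-6)
print(f"kept {len(kept)} of 6 triples")  # kept 5 of 6 triples
```

Only the surviving triples would need to be computed and stored, which is what makes holding a large fraction of them in the aggregate memory of a parallel machine feasible.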

Page 27

GAMESS-UK: DFT HCTH on Valinomycin. Impact of Coulomb Fitting

[Figure: measured time (seconds) vs number of CPUs (16, 32), J_EXPLICIT and J_FIT panels, on CS6 PIII/800 + FE, CS4 AMD K7/1200 + FE, CS7 AMD K7/1000 MP + SCALI, CS2 QSNet Alpha Cluster/667, CS9 P4 Xeon/2000 + Myrinet, Cray T3E/1200E, IBM SP/WH2-375, Cray SuperCluster/833 and AlphaServer SC ES45/1000]

Basis: DZV_A2 (DGauss); A1_DFT fit: 882/3012

T3E/1200E (128 CPUs): J_EXPLICIT = 2139 s; J_FIT = 995 s

CS9 P4 Xeon/2000 + Myrinet as % of 32-CPU AlphaServer SC / Origin 3800: J_EXPLICIT 73%, 161%; J_FIT 61%, 93%

Page 28

GAMESS-UK: DFT HCTH on Valinomycin. Impact of Coulomb Fitting: Cray T3E/1200, Cray SuperCluster/833, Compaq AlphaServer SC/667 and SGI Origin 3800/R14k-500

[Figure: measured time (seconds) vs number of CPUs (32, 64, 128), J_FIT and J_EXPLICIT panels, on Cray T3E/1200E, Compaq AlphaServer SC/667, SGI Origin 3800/R14k-500 and Cray Alpha Cluster/833]

Basis: DZV_A2 (DGauss); A1_DFT fit: 882/3012

Cray T3E/1200E at 32, 64 and 128 CPUs: J_EXPLICIT 7617, 4300, 2139 s; J_FIT 3063, 1573, 995 s

Page 29

GAMESS-UK: DFT HCTH on Valinomycin. Speedups for both Explicit and Coulomb Fitting

[Figure: speedup vs number of CPUs (16-128) with linear reference, J_EXPLICIT and J_FIT panels, on Cray T3E/1200E, CS2 QSNet Alpha Cluster/667, Compaq AlphaServer SC/667, Cray SuperCluster/833, SGI Origin 3800/R12k-400 and (J_FIT panel) SGI Origin 3800/R14k-500. Labelled speedups: 112 and 90.6 (J_EXPLICIT); 100.1 and 65.4 (J_FIT); annotation "105+".]
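The speedup axes start at 16 CPUs, which suggests (an assumption, not stated on the slide) that speedups are quoted relative to a 16-CPU run taken as ideal, S(P) = 16 T(16) / T(P). A small helper for that convention, with hypothetical timings:

```python
def relative_speedups(times, base=None):
    """Speedup S(P) = P0 * T(P0) / T(P), assuming ideal scaling up to the base count P0.

    times: {cpu_count: elapsed_seconds}; base defaults to the smallest CPU count.
    """
    p0 = min(times) if base is None else base
    t0 = times[p0]
    return {p: p0 * t0 / t for p, t in sorted(times.items())}


def efficiencies(speedups):
    """Parallel efficiency E(P) = S(P) / P."""
    return {p: s / p for p, s in speedups.items()}


# Hypothetical timings in seconds, not taken from the charts.
example = relative_speedups({16: 1000.0, 32: 520.0, 128: 150.0})
```

With these made-up numbers the 128-CPU speedup is about 106.7, i.e. roughly 83% efficiency.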

Page 30

Memory-driven Approaches - SCF and DFT

HF and DFT energy and gradient calculation for NiSiMe2CH2CH2C6F13: Basis Ahlrichs DZ (1485 GTOs)

[Figure: elapsed time (seconds) vs number of CPUs (28, 60, 124) on the SGI Origin 3800/R14k-500 for HF Direct-SCF, HF In-core, DFT Direct-SCF and DFT In-core]

In-core: integrals are written directly to memory, rather than to disk, and are not recalculated.
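The in-core variant trades memory for recomputation. The toy sketch below, with a hypothetical `IntegralStore` class and a made-up integral function, only counts how often integrals are evaluated over several SCF-like cycles; it does not model how GAMESS-UK actually stores integrals.

```python
class IntegralStore:
    """Hands out integrals either by recomputing them (direct) or from memory (in-core)."""

    def __init__(self, compute, in_core):
        self.compute = compute    # callable: integral label -> value
        self.in_core = in_core
        self.cache = {}
        self.evaluations = 0      # how many times `compute` was actually called

    def get(self, label):
        if self.in_core and label in self.cache:
            return self.cache[label]
        self.evaluations += 1
        value = self.compute(label)
        if self.in_core:
            self.cache[label] = value
        return value


def run_cycles(store, labels, n_cycles):
    """Stand-in for repeated Fock builds: every cycle touches every integral once."""
    total = 0.0
    for _ in range(n_cycles):
        total = sum(store.get(label) for label in labels)
    return total
```

For N integrals and n cycles, the direct store performs n x N evaluations and the in-core store N, at the cost of holding all N values in (aggregate) memory.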

Page 31

MP2 Gradient Algorithms

Serial
• Conventional: integrals written to disk, read back, transformed, written out, resorted, etc.; heavy I/O demands
• Direct/semi-direct (Frisch, Head-Gordon & Pople; Haase and Ahlrichs): replace all/some I/O with batched integral recomputation

Parallel
• Poor I/O-to-compute performance of MPPs favours the direct approach
• Current MPPs have large global memories
• Store a subset of MO integrals: reduces the number of integral recomputations but increases communication overhead
• The subset includes the VOVO, VVOO and VOOO classes; the VVVO class is too large to store, so VVVO terms are computed in a separate step
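The reason the VVVO class is singled out follows from simple counting. With o occupied and v virtual orbitals, the classes hold roughly v²o², v²o², vo³ and v³o values. This sketch ignores permutational symmetry, which reduces each count only by a constant factor, and assumes 8-byte values; the orbital counts used are hypothetical.

```python
def mo_integral_class_sizes(n_occ, n_virt):
    """Number of MO integrals per class, ignoring permutational symmetry."""
    o, v = n_occ, n_virt
    return {
        "VOVO": v * o * v * o,
        "VVOO": v * v * o * o,
        "VOOO": v * o * o * o,
        "VVVO": v * v * v * o,
    }


def gib(n_values, bytes_per_value=8):
    """Storage in GiB for n_values double-precision numbers."""
    return n_values * bytes_per_value / 2**30


# Hypothetical orbital counts, for illustration only.
sizes = mo_integral_class_sizes(n_occ=20, n_virt=180)
```

Since v is usually much larger than o, VVVO exceeds VOVO by a factor v/o (9 for the made-up counts above), which is why it is handled in a separate step rather than stored.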

Page 32

Performance of MP2 Gradient Module: Cray T3E/1200, High-end and Commodity-based Systems

Mn(CO)5H - MP2 geometry optimisation; Basis: TZVP + f (217 GTOs)

[Figure: elapsed time (seconds) vs number of CPUs (16, 32) on CS6 PIII/800 + FE, CS7 AMD K7/1000 MP + SCI, CS2 QSNet Alpha Cluster/667, CS9 P4 Xeon/2000 + Myrinet, Cray T3E/1200E, IBM SP/WH2-375, Compaq AlphaServer SC/667, SGI Origin 3800/R14k-500 and AlphaServer SC ES45/1000; elapsed time and speed-up vs number of CPUs (16-128) with linear reference on Cray T3E/1200E, SGI Origin 3800/R14k-500 and AlphaServer SC ES45/1000]

CS9 P4 Xeon/2000 + Myrinet as % of 32-CPU AlphaServer SC / Origin 3800: 59%, 78%

Page 33

SCF Analytic 2nd Derivatives Performance: Cray T3E/1200, High-end and Commodity-based Systems

(C6H4(CF3))2: Basis 6-31G (196 GTOs)

[Figure: elapsed time (seconds) vs number of CPUs (16-128) on Cray T3E/1200E, SGI Origin 3800/R12k-400, SGI Origin 3800/R14k-500 and AlphaServer SC ES40/667; elapsed time vs number of CPUs (16, 32) on CS2 QSNet Alpha Cluster/667, CS9 P4 Xeon/2000 + Myrinet, Cray T3E/1200E, IBM SP/WH2-375, SGI Origin 3800/R14k-500, AlphaServer SC ES40/667 and AlphaServer SC ES45/1000. Gaussian98 reference labels: 2271, 1706 and 1490 s.]

• Terms from MO 2e-integrals are held in GA storage (CPHF and perturbed Fock matrices); the calculation is dominated by CPHF. Gaussian98 L1002 (CPU time): 32 nodes 1181 s, 64 nodes 1058 s. GAMESS-UK (total job time): 128 nodes 499 s.

CS9 P4 Xeon/2000 + Myrinet as % of 32-CPU AlphaServer SC / Origin 3800: 92%, 148%

Page 34

Performance Analysis of GA-based Applications using Vampir

Vampir 2.5: Visualization and Analysis of MPI Programs
• extensions to handle GA applications

GAMESS-UK on High-end and Commodity class machines

Page 35

GAMESS-UK / Si8O25H18: 8 CPUs: One DFT Cycle

Page 36

GAMESS-UK / Si8O25H18: 8 CPUs: Q†HQ (GAMULT2) and PEIGS

Page 37

Materials Simulation Codes

Plane Wave DFT Codes:
• CASTEP
• VASP
• CPMD

These codes have similar functionality, power and problems. CASTEP is the flagship code of UKCP and hence subsequent discussions will focus on this.

Local Gaussian Basis Set Codes:
• CRYSTAL

This code presents a different set of problems when considering performance on HPC(x).

SIESTA and CONQUEST:
• O(n) scaling codes which will be extremely attractive to users.
• Both are currently development rather than production codes.

Page 38

CRYSTAL-2000: Distributed Data Implementation

Benchmark: An Acid Centre in Zeolite-Y (Faujasite)
• Single point energy
• 145 atoms/cell, no symmetry, 8 k-points
• 2208 basis functions (6-21G*)

[Figure: elapsed time (seconds) vs number of CPUs (16-128) on CS1 PIII/450 + FE/LAM, CS2 Alpha Linux Cluster/667, Cray T3E/1200E, SGI Origin 3800/R12k-400 and IBM SP/WH2-375; speed-up vs number of CPUs (up to 256) with linear reference for Cray T3E/1200E, CS1 PIII/450 + FE and SGI Origin 3800/R12k-400; labelled speed-ups 144 and 128.8]

T256 (SGI Origin 3800) = 945 s

Page 39

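Materials Simulation. Plane Wave Methods: CASTEP

$$\psi_{j,\mathbf{k}}(\mathbf{r}) = \sum_{\mathbf{G}} c_{j,\mathbf{k}+\mathbf{G}}\, e^{i(\mathbf{k}+\mathbf{G})\cdot\mathbf{r}}, \qquad \frac{\hbar^{2}}{2m}\,|\mathbf{k}+\mathbf{G}|^{2} \le E_{\mathrm{cut}}$$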

• Direct minimisation of the total energy (avoiding diagonalisation)

• Pseudopotentials must be used to keep the number of plane waves manageable

• Large number of basis functions, N ~ 10^6 (especially for heavy atoms).

The plane wave expansion means that the bulk of the computation comprises large 3D Fast Fourier Transforms (FFTs) between real and momentum space.

• These are distributed across the processors in various ways.

• The actual FFT routines are optimized for the cache size of the processor.
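The size of the basis follows directly from the cutoff condition on |k+G|. A minimal sketch, assuming a simple cubic cell and Hartree atomic units (ħ = m = 1, lengths in bohr), so the condition reads ½|k+G|² ≤ E_cut. It counts the G vectors explicitly and compares the count with the free-electron estimate N ≈ V g_max³ / (6π²). The function names are illustrative and not part of CASTEP.

```python
import itertools
import math


def count_plane_waves(cell_length, ecut, k=(0.0, 0.0, 0.0)):
    """Count G vectors of a simple cubic cell with 0.5*|k+G|^2 <= ecut (Hartree units)."""
    b = 2.0 * math.pi / cell_length          # reciprocal lattice spacing
    gmax = math.sqrt(2.0 * ecut)             # largest allowed |k+G|
    kmag = math.sqrt(sum(c * c for c in k))
    nmax = int(math.ceil((gmax + kmag) / b))  # safe search range in each direction
    count = 0
    for n in itertools.product(range(-nmax, nmax + 1), repeat=3):
        kg2 = sum((k[i] + b * n[i]) ** 2 for i in range(3))
        if 0.5 * kg2 <= ecut:
            count += 1
    return count


def free_electron_estimate(cell_length, ecut):
    """Sphere volume in reciprocal space divided by the volume per G vector."""
    gmax = math.sqrt(2.0 * ecut)
    return cell_length ** 3 * gmax ** 3 / (6.0 * math.pi ** 2)
```

The count grows as V E_cut^{3/2}, which is why large cells, and the hard pseudopotentials needed for heavy atoms, push N towards the 10^6 range.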

Page 40

CASTEP 4.2 - Parallel Benchmark: Chabazite

• Acid sites in a zeolite (Si11O24AlH)
• Vanderbilt ultrasoft pseudopotential
• Pulay density mixing minimiser scheme
• Single k-point total energy, 96 bands
• 15045 plane waves on a 3D FFT grid of size 54x54x54; convergence in 17 SCF cycles

[Figure: measured time (seconds) vs number of CPUs (16, 32) on CS6 PIII/800 FE/MPICH, CS2 QSNet Alpha Cluster/667, CS9 P4/2000 Xeon + Myrinet, CS8 Itanium/800 + Myrinet, Cray T3E/1200E, IBM SP/WH2-375, SGI Origin 3800/R14k-500, AlphaServer SC ES45/1000 and IBM Regatta-H]

Communication time (seconds): IBM SP/WH2-375 157; Cray T3E/1200E 90; CS6 PIII/800 + FE 600; CS7 AMD K7/1000 + SCI 242; CS9 P4/2000 + Myrinet 115; CS2 QSNet Alpha 111; SGI Origin 3800/R14k 71

CS9 P4 Xeon/2000 + Myrinet as % of 32-CPU AlphaServer SC / Origin 3800: 67%, 75%

Page 41

CASTEP 4.2 - Parallel Benchmark II. TiN

A TiN 32-atom slab, 8 k-points, single point energy calculation with Mulliken analysis.

[Figure: measured time (seconds) vs number of CPUs (32, 64, 128) on CS9 P4/2000 + Myrinet 2k, CS8 Itanium/800 + Myrinet 2k, Cray T3E/1200E, SGI Origin 3800/R14k-500, AlphaServer SC ES45/1000 and IBM SP/Regatta-H]

CS9 P4 Xeon/2000 + Myrinet as % of 32-CPU AlphaServer SC / Origin 3800: 47%, 81%

Page 42

CPMD - Car-Parrinello Molecular Dynamics

CPMD Version 3.5.1: Hutter, Alavi, Deutsch, Bernasconi, St. Goedecker, Marx, Tuckerman and Parrinello (1995-2001)
• DFT, plane waves, pseudopotentials and FFTs
• CPMD is the base code for the new CCP1 flagship project

Benchmark Example: Liquid Water (Sprik and Vuilleumier, Cambridge)
• Physical specifications: 32 molecules, simple cubic periodic box of length 9.86 Å, temperature 300 K
• Electronic structure: BLYP functional, Troullier-Martins pseudopotential, reciprocal space cutoff 70 Ry = 952 eV

[Figure: elapsed time (seconds) vs number of CPUs (16, 32) on CS7 dual-Athlon K7/1000, CS9 P4/2000 Xeon + Myrinet (2CPU and 1CPU), CS2 Alpha Linux Cluster, IBM SP/WH2-375, SGI Origin 3800/R14k-500, AlphaServer SC ES40/667 and AlphaServer SC ES45/1000]

CS9 P4 Xeon/2000 + Myrinet as % of 32-CPU AlphaServer SC / Origin 3800: 30%, 53%
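Two numbers in the benchmark specification can be cross-checked with a few lines of Python: the cutoff conversion (1 Ry ≈ 13.6057 eV) and the density implied by 32 water molecules in a 9.86 Å cubic box. The Avogadro constant and Rydberg energy are the SI/CODATA 2018 values; the water molar mass of 18.015 g/mol is an assumed standard value.

```python
AVOGADRO = 6.02214076e23        # mol^-1 (exact SI value)
RYDBERG_EV = 13.605693122994    # 1 Ry in eV (CODATA 2018)
WATER_MOLAR_MASS = 18.015       # g/mol (assumed standard value)


def rydberg_to_ev(energy_ry):
    """Convert an energy from Rydberg to electronvolts."""
    return energy_ry * RYDBERG_EV


def cubic_box_density(n_molecules, molar_mass, box_length_angstrom):
    """Mass density in g/cm^3 of n_molecules in a cubic box of the given side (Å)."""
    mass_g = n_molecules * molar_mass / AVOGADRO
    volume_cm3 = (box_length_angstrom * 1e-8) ** 3  # 1 Å = 1e-8 cm
    return mass_g / volume_cm3
```

70 Ry converts to about 952.4 eV, and the box gives a density of about 0.999 g/cm³, consistent with liquid water near room temperature.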

Page 43

DL_POLY Parallel Benchmarks (Cray T3E/1200). V2: Replicated Data

[Figure: speed-up vs number of nodes, with linear reference, in two panels (up to 256 and up to 64 nodes) for the following benchmarks]
1. Metallic Al (19,652 atoms, Sutton-Chen)
2. Peptide in water (neutral groups + SHAKE, 3,993)
3. Transferrin in water (neutral groups + SHAKE, 27,593)
4. NaCl; Ewald, 27,000 ions
5. NaK-disilicate glass; 8,640 atoms, MTS + Ewald
6. K/valinomycin in water (SHAKE, AMBER, 3,838)
7. Gramicidin in water (SHAKE, 12,390)
8. MgO microcrystal: 5,416 atoms
9. Model membrane/valinomycin (MTS, 18,886)

Page 44

DL_POLY: Cray/T3E, High-end and Commodity-based Systems

Bench 4. NaCl; 27,000 ions, Ewald, 75 time steps, cutoff = 24 Å

[Figure: measured time (seconds) vs number of CPUs (16, 32) on CS6 PIII/800 + FE/LAM, CS7 AMD K7/1000 MP + SCI, CS4 AMD K7/1200 + FE, CS9 P4/2000 + Myrinet, CS2 QSNet Alpha Cluster/667, CS8 Itanium/800 + Myrinet, Cray T3E/1200E, IBM SP/WH2-375, SGI Origin 3800/R14k-500, Cray SuperCluster/833, IBM Regatta-H and AlphaServer SC ES45/1000]

T3E (128 CPUs) = 94 s

CS9 P4 Xeon/2000 + Myrinet as % of 32-CPU AlphaServer SC / Origin 3800: 44%, 71%

Page 45

DL_POLY: Scalability on the T3E, High-end & Commodity Systems

Bench 5: NaK-disilicate glass; 8,640 atoms, MTS + Ewald: 270 time steps

[Figure: measured time (seconds) vs number of CPUs (16, 32, 64) on CS6 PIII/800 + FE/LAM, CS7 AMD K7/1000 MP + SCI, CS9 P4/2000 + Myrinet, CS8 Itanium/800 + Myrinet, CS2 QSNet Alpha Cluster/667, SGI Origin 3800/R14k-500, Cray SuperCluster/833 and AlphaServer SC ES45/1000; speed-up vs number of CPUs (up to 128) with linear reference for Cray T3E/1200E, SGI Origin 3800/R14k-500, Cray SuperCluster/833, CS1 Pentium III/800 + FE, CS7 AMD K7/1000 MP + SCI and Compaq AlphaServer ES45/1000; labelled speed-ups 98 and 50.1]

T3E (128 CPUs) = 75 s

CS9 P4 Xeon/2000 + Myrinet as % of 32-CPU AlphaServer SC / Origin 3800: 53%, 84%

Page 46

DL_POLY: Macromolecular Simulations

Bench 7. Gramicidin in water; rigid bonds and SHAKE, 12,390 atoms, 500 time steps

[Figure: measured time (seconds) vs number of CPUs (16, 32) on CS6 PIII/800 + FE/LAM, CS4 AMD K7/1200 + FE, CS7 AMD K7/1000 MP + SCI, CS9 P4/2000 + Myrinet, CS2 QSNet Alpha Cluster/667, CS8 Itanium/800 + Myrinet, Cray T3E/1200E, IBM SP/WH2-375, Cray SuperCluster/833, SGI Origin 3800/R14k-500, IBM Regatta-H and AlphaServer SC ES45/1000]

T3E (128 CPUs) = 166 s

CS9 P4 Xeon/2000 + Myrinet as % of 32-CPU AlphaServer SC / Origin 3800: 41%, 64%

Page 47

Migration from Replicated to Distributed Data. DL_POLY-3: Domain Decomposition

• Distribute atoms and forces across the nodes
  - more memory efficient; can address much larger cases (10^5-10^7 atoms)
  - SHAKE and short-range forces require only neighbour communication
  - communications scale linearly with the number of nodes
• Coulombic energy remains global
  - the strategy depends on problem and machine characteristics
  - adopt the Particle Mesh Ewald scheme, which includes a Fourier-transform-smoothed charge density (reciprocal space grid typically 64x64x64 to 128x128x128)

[Figure: schematic with regions labelled A, B, C, D]
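A minimal sketch of the block decomposition idea, assuming a cubic periodic box split into a px × py × pz grid of domains and a short-range cutoff no larger than one domain width, so that each domain only needs data from its adjacent domains. The names are hypothetical and this is not the DL_POLY-3 implementation.

```python
import itertools


def domain_of(position, box, grid):
    """Map a position in a periodic cubic box of side `box` to its block domain index."""
    # x % box wraps into [0, box); the final % n guards the x == box rounding edge
    return tuple(int((x % box) / box * n) % n for x, n in zip(position, grid))


def assign_atoms(positions, box, grid):
    """Group atom indices by owning domain: each node would hold only its own list."""
    domains = {}
    for i, pos in enumerate(positions):
        domains.setdefault(domain_of(pos, box, grid), []).append(i)
    return domains


def neighbour_domains(domain, grid):
    """Distinct domains sharing a face, edge or corner with `domain` (periodic wrap)."""
    out = set()
    for shift in itertools.product((-1, 0, 1), repeat=3):
        if shift == (0, 0, 0):
            continue
        out.add(tuple((d + s) % n for d, s, n in zip(domain, shift, grid)))
    out.discard(domain)  # on very small grids a wrapped image can be the domain itself
    return out
```

With a 3×3×3 grid each domain has 26 distinct neighbours; on a 2×2×2 grid the periodic images coincide and only 7 remain. Either way the communication partners per node stay fixed as the system grows, which is what makes short-range forces and SHAKE local.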

Page 48

Migration from Replicated to Distributed Data. DL_POLY-3: Coulomb Energy Evaluation

Conventional routines (e.g. FFTW) assume plane or column distributions. A global transpose of the data is required to complete the 3D FFT, and additional costs are incurred reorganising the data from the natural block domain decomposition.

An alternative FFT algorithm has been designed to reduce communication costs:
• the 3D FFT is performed as a series of 1D FFTs, each involving communication only between blocks in a given column
• more data is transferred, but in far fewer messages
• rather than all-to-all, the communications are column-wise only

[Figure: plane vs block data distributions]
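The separability that the column-wise scheme relies on can be checked serially. A 3D discrete Fourier transform equals successive 1D transforms along z, then y, then x, and each 1D stage only combines values lying on one line of the grid. In a block-distributed code, that means only the blocks in the same column along that axis need to exchange data. This sketch uses naive O(n²) DFTs in pure Python rather than FFTW and does no communication; the function names are illustrative.

```python
import cmath
import itertools


def dft_1d(seq):
    """Naive 1D DFT: X[k] = sum_t x[t] exp(-2*pi*i*t*k/n)."""
    n = len(seq)
    return [sum(seq[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


def dft_3d_by_axes(a):
    """3D DFT of nested lists a[x][y][z] as 1D transforms along z, then y, then x."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    b = [[dft_1d(a[x][y]) for y in range(ny)] for x in range(nx)]  # along z
    for x, z in itertools.product(range(nx), range(nz)):            # along y
        col = dft_1d([b[x][y][z] for y in range(ny)])
        for y in range(ny):
            b[x][y][z] = col[y]
    for y, z in itertools.product(range(ny), range(nz)):            # along x
        col = dft_1d([b[x][y][z] for x in range(nx)])
        for x in range(nx):
            b[x][y][z] = col[x]
    return b


def dft_3d_direct(a):
    """Reference: the 3D DFT evaluated directly from its definition."""
    nx, ny, nz = len(a), len(a[0]), len(a[0][0])
    out = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for kx, ky, kz in itertools.product(range(nx), range(ny), range(nz)):
        s = 0j
        for x, y, z in itertools.product(range(nx), range(ny), range(nz)):
            phase = kx * x / nx + ky * y / ny + kz * z / nz
            s += a[x][y][z] * cmath.exp(-2j * cmath.pi * phase)
        out[kx][ky][kz] = s
    return out
```

Both routes agree to rounding error; the parallel version would replace each 1D stage's gather with a column-wise exchange instead of a global transpose.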

Page 49

Migration from Replicated to Distributed Data. DL_POLY-3: Coulomb Energy Performance

NaCl simulation:
• DL_POLY_2.11, Ewald summation: 27,000 ions, 500 time steps, cutoff = 24 Å
• DL_POLY_3, PME: 27,000 ions, 500 time steps, cutoff = 12 Å
• DL_POLY_3: 216,000 ions, 200 time steps, cutoff = 12 Å

[Figure: speed-up vs number of CPUs with linear reference; up to 128 CPUs for DL_POLY-2 (AlphaServer) and DL_POLY_3 on AlphaServer SC ES45/1000, CS9 P4/2000 + Myrinet and SGI Origin 3800/R14k-500 (labelled speed-ups 74.4, 79.1 and 94.5); up to 256 CPUs on AlphaServer ES45/1000, CS9 P4/2000 + Myrinet 2k and SGI Origin 3800/R14k-500 (labelled speed-ups 171 and 186)]

Page 50

Migration from Replicated to Distributed Data. DL_POLY-3: Macromolecular Simulations

DL_POLY_3: 792,960 ions, 50 time steps

[Figure: speed-up vs number of CPUs (up to 256) with linear reference on AlphaServer SC ES45/1000, CS9 P4/2000 + Myrinet 2k and SGI Origin 3800/R14k-500; labelled speed-ups 174.7 and 208.4]

Gramicidin in water; rigid bonds + SHAKE:
• DL_POLY_2.11: 12,390 atoms, 500 time steps
• DL_POLY_3: 99,120 atoms, 100 time steps

[Figure: speed-up vs number of CPUs (up to 64) with linear reference for DL_POLY-2 (12,390 atoms, AlphaServer) and DL_POLY_3 on AlphaServer SC ES45/1000, CS9 P4/2000 + Myrinet 2k and SGI Origin 3800/R14k-500; labelled speed-ups 26.4, 31 and 43]

Page 51

Parallel CHARMM Benchmark

Benchmark MD calculation of carboxy myoglobin (MbCO) with 3830 water molecules: 14,026 atoms, 1000 steps (1 ps), 12-14 Å shift.

[Figure: speed-up vs number of CPUs (4-52) with linear reference on CS1 PIII/450 + FE/LAM, Cray T3E/1200E, CS2 QSNet Alpha Cluster/667 and SGI Origin 3800/R14k-500; measured time (seconds) vs number of CPUs (16, 32, 64) on CS6 PIII/800 + FE/LAM, CS2 QSNet Alpha Cluster/667, CS7 AMD K7/1000 MP + SCI, CS9 P4/2000 + Myrinet 2k, SGI Origin 3800/R14k-500 and AlphaServer SC ES45/1000]

T3E (16 CPUs) = 665 s; T3E (128 CPUs) = 106 s

CS9 P4 Xeon/2000 + Myrinet as % of 32-CPU AlphaServer SC / Origin 3800: 95%, 103%

Page 52

Parallel CHARMM Benchmark: LAM MPI vs. MPICH

Benchmark MD calculation of carboxy myoglobin (MbCO) with 3830 water molecules

[Figure: measured time (seconds) vs number of CPUs (4, 8, 16, 32) on CS6 PIII/800 + FE using LAM 6.3-b2 and MPICH-1.2.1; labelled times 198.9 s and 440.1 s]

Page 53

QM/MM Thrombin Benchmark

QM/MM calculation on the thrombin pocket: the QM region (69 atoms) is modelled by an SCF 6-31G calculation (401 GTOs), and the total system size for the classical calculation is 16,659 atoms. The MM calculation was performed using all non-bonded interactions (i.e. without a pairlist), and the QM/MM interaction was cut off at 15 Å using a neutral group scheme (leading to the inclusion of 1507 classical centres in the QM calculation).
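A neutral group scheme keeps or drops whole charge groups, so the QM region never sees a fractional, net-charged group at the cutoff boundary. The exact selection criterion is not given on the slide. This sketch assumes, hypothetically, that a group is included when its geometric centre lies within the cutoff of the nearest QM atom. Coordinates are in Å and the function name is illustrative.

```python
import math


def select_mm_groups(qm_coords, mm_coords, groups, cutoff=15.0):
    """Return MM atom indices whose whole (neutral) group lies within `cutoff` of the QM region.

    groups: list of lists of MM atom indices; each group is kept or dropped as a unit,
    judged here (an assumption) by the distance from its centroid to the nearest QM atom.
    """
    selected = []
    for atoms in groups:
        centre = [sum(mm_coords[i][d] for i in atoms) / len(atoms) for d in range(3)]
        if min(math.dist(centre, q) for q in qm_coords) <= cutoff:
            selected.extend(atoms)
    return selected
```

A group straddling the cutoff is then included or excluded entirely, even if one of its atoms lies just inside 15 Å.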

Scaling for Thrombin + Inhibitor (NAPAP) + Water

[Figure: QM time and MM time vs number of nodes (1-256); speedup (Cray T3E) vs number of nodes (up to 128)]

Page 54

QM/MM Applications

Triosephosphate isomerase (TIM)
• Central reaction in glycolysis, catalytic interconversion of DHAP to GAP
• Demonstration case within QUASI (Partners UZH and BASF)
• QM region 35 atoms (DFT BLYP)
  - include residues with possible proton donor/acceptor roles
  - GAMESS-UK, MNDO, TURBOMOLE
• MM region (4,180 atoms + 2 link atoms)
  - CHARMM force-field, implemented in CHARMM, DL_POLY

[Figure: measured time (seconds) vs number of CPUs (8, 16, 32, 64) on CS7 AMD K7/1000 + SCI, CS9 P4/2000 + Myrinet 2k, SGI Origin 3800/R14k-500 and AlphaServer SC ES45/1000]

T128 (O3800/R14k-500) = 181 s

CS9 P4 Xeon/2000 + Myrinet as % of 32-CPU AlphaServer SC / Origin 3800: 89%, 139%

Page 55

Commodity Comparisons with High-end Systems: Compaq AlphaServer SC ES45/1000 and the SGI O3800/R14k-500

CS9 - Pentium4/2000 Xeon + Myrinet 2k: performance as % of 32-CPU AlphaServer SC / Origin 3800

Code                          AlphaServer SC   Origin 3800
GAMESS-UK SCF                 95%              135%
GAMESS-UK DFT                 70-73%           117-161%
GAMESS-UK DFT (Jfit)          59-70%           92-111%
GAMESS-UK DFT Gradient        69%              114%
GAMESS-UK MP2 Gradient        59%              78%
GAMESS-UK SCF Forces          92%              148%
NWChem (DFT Jfit)             65-88%           94-227%
GAMESS / CHARMM               89%              139%
DL_POLY Ewald-based           44-53%           71-84%
DL_POLY bond constraints      41%              64%
CHARMM                        95%              103%
CASTEP                        47-67%           75-81%
CPMD                          30%              53%
ANGUS                         35-69%           51-147%