
Analysis of Mantevo MiniMD benchmark - RRZE Moodle...Analysis of Mantevo MiniMD benchmark Gaurav Chotalia Friedrich-Alexander-University Erlangen-Nürnberg July 6, 2016 MuCoSim SS16

May 21, 2020


Analysis of Mantevo MiniMD benchmark

Gaurav Chotalia

Friedrich-Alexander-University Erlangen-Nürnberg

July 6, 2016, MuCoSim SS16

Gaurav Chotalia (FAU) MiniMDJuly 6, 2016 MuCoSim SS16 1 /

12

Page 2: Analysis of Mantevo MiniMD benchmark - RRZE Moodle...Analysis of Mantevo MiniMD benchmark Gaurav Chotalia Friedrich-Alexander-University Erlangen-Nürnberg July 6, 2016 MuCoSim SS16

Overview

1 MD for atomistic simulation

2 Profiling

3 Hotspot and bottlenecks

4 Performance analysis

5 Further outlook


Algorithm for MD in MiniMD

Velocity Verlet formulation:

Initialize: X (positions), V (velocities) and F (forces).
For each timestep:
  For every atom:
    Update V by 1/2 step (using F).
    Update X (using V).
    Build neighbor lists (occasionally).
    Calculate F (considering neighbors of current atom).
    Update V by 1/2 step (using new F).
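The timestep above can be sketched in C++ as follows. This is a minimal illustration, not MiniMD's integrator: it assumes unit mass (so acceleration equals force), a hypothetical `Atom` struct, and a hypothetical `compute_forces` callback that overwrites every atom's force; neighbor-list rebuilding is left out.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical atom layout (not MiniMD's actual data structures).
struct Atom {
  double x[3];  // position
  double v[3];  // velocity
  double f[3];  // force
};

// Callback that overwrites f for all atoms from the current positions.
using ForceFn = std::function<void(std::vector<Atom>&)>;

// One velocity Verlet step, assuming unit mass (a = F).
void verlet_step(std::vector<Atom>& atoms, double dt, const ForceFn& compute_forces) {
  assert(dt > 0.0);
  for (Atom& a : atoms) {
    for (int d = 0; d < 3; ++d) {
      a.v[d] += 0.5 * dt * a.f[d];  // update V by 1/2 step (using F)
      a.x[d] += dt * a.v[d];        // update X (using V)
    }
  }
  // A real code would (occasionally) rebuild neighbor lists here.
  compute_forces(atoms);            // calculate new F
  for (Atom& a : atoms) {
    for (int d = 0; d < 3; ++d) {
      a.v[d] += 0.5 * dt * a.f[d];  // update V by 1/2 step (using new F)
    }
  }
}
```

With a harmonic test force F = -x, repeated calls keep the energy close to its initial value, which is the property that makes velocity Verlet attractive for long MD runs.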


Lennard-Jones (LJ) potential and cut-off

$U_{LJ} = 4\varepsilon\left[\left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6}\right]$

LJ is a fast-decaying pair potential.

The force is calculated as the negative gradient of the potential, $\mathbf{F} = -\nabla U_{LJ}$.

The potential decreases rapidly and approaches zero.

This justifies a cut-off beyond a certain distance, which saves computation.

This calls for creating and maintaining a list of neighbors.

Figure: LJ potential (source: www.file.scir.org)
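Differentiating $U_{LJ}$ gives the pair force $\mathbf{F}_{ij} = 48\varepsilon\, s^6 (s^6 - \tfrac{1}{2})\, r^{-2}\, \mathbf{r}_{ij}$ with $s = \sigma/r$ and $\mathbf{r}_{ij} = \mathbf{x}_i - \mathbf{x}_j$. A minimal C++ sketch of the scalar factor with a cut-off is below; the function name `lj_fpair` is hypothetical. Working with $r^2$ avoids a square root, and the single division $1/r^2$ per pair inside the cut-off is presumably the DIV that the Roofline model counts.

```cpp
#include <cassert>

// Scalar factor fpair such that F_ij = fpair * (x_i - x_j) for
// U(r) = 4*eps*[(sigma/r)^12 - (sigma/r)^6], truncated at the cut-off.
// rsq = r^2; cutforcesq = r_cut^2. Returns 0 beyond the cut-off.
double lj_fpair(double rsq, double epsilon, double sigma, double cutforcesq) {
  assert(rsq > 0.0);
  if (rsq >= cutforcesq) return 0.0;      // outside cut-off: no DIV, no force
  const double sr2 = 1.0 / rsq;           // the one division per interacting pair
  const double s2 = sigma * sigma * sr2;  // (sigma/r)^2
  const double sr6 = s2 * s2 * s2;        // (sigma/r)^6
  return 48.0 * epsilon * sr6 * (sr6 - 0.5) * sr2;
}
```

The factor is positive (repulsive) for $r < 2^{1/6}\sigma$, zero at the potential minimum, and negative (attractive) beyond it.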


Neighbor list creation

Verlet list

List of atoms within a sphere of radius $R_{cut} + R_{skin}$

Update the list when any atom moves more than $\frac{1}{2} R_{skin}$

Creating the list still requires checking against all atoms!!
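The rebuild criterion can be sketched as below. It assumes positions stored as flat (x, y, z) triples, a copy saved at the last rebuild, and no periodic wrapping; `needs_rebuild` is a hypothetical name, and some codes instead simply rebuild at a fixed step interval.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// True if any atom moved more than r_skin/2 since the last list build.
// Two atoms approaching each other by r_skin/2 each close a gap of at most
// r_skin, so no pair can cross r_cut unnoticed before this check fires.
bool needs_rebuild(const std::vector<double>& x,
                   const std::vector<double>& x_at_build, double r_skin) {
  assert(x.size() == x_at_build.size() && x.size() % 3 == 0);
  const double limit_sq = 0.25 * r_skin * r_skin;  // (r_skin / 2)^2
  for (std::size_t i = 0; i < x.size(); i += 3) {
    const double dx = x[i] - x_at_build[i];
    const double dy = x[i + 1] - x_at_build[i + 1];
    const double dz = x[i + 2] - x_at_build[i + 2];
    if (dx * dx + dy * dy + dz * dz > limit_sq) return true;
  }
  return false;
}
```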

Link Cell

Organize atoms in cells of size $R_{cut}$

Check only neighboring cells on the grid for force calculations. Scanning volume of $27 R_{cut}^3$ vs. $\frac{4}{3}\pi R_{cut}^3$
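As a quick worked check of this comparison (assuming a uniform particle density), the fraction of scanned candidates that actually lie inside the cut-off sphere is:

```latex
\frac{\tfrac{4}{3}\pi R_{cut}^3}{27\,R_{cut}^3} = \frac{4\pi}{81} \approx 0.155
```

So roughly 6.4 candidate distances are computed for every pair that actually falls inside $R_{cut}$ when whole cells are scanned.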

Hybrid

Use the link-cell approach to create the Verlet list.

Figure: Neighbor list (source: www.lammps.sandia.gov)
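A minimal C++ sketch of the hybrid scheme: bin atoms into link cells, then build a full Verlet list by scanning only the 27 surrounding cells. Assumptions (not MiniMD's code): a non-periodic cubic box $[0, L)^3$, no ghost atoms, flat (x, y, z) positions, and cells sized to the list radius $R_{cut} + R_{skin}$ rather than $R_{cut}$ so that the 27-cell stencil still finds every list member. `build_verlet_list` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Full Verlet list (each pair stored in both directions) built via link cells.
std::vector<std::vector<int>> build_verlet_list(const std::vector<double>& x,
                                                double box, double r_list) {
  assert(r_list > 0.0 && x.size() % 3 == 0);
  const int n = static_cast<int>(x.size() / 3);
  int nc = static_cast<int>(box / r_list);  // cells per dimension
  if (nc < 1) nc = 1;
  const double cell = box / nc;              // cell edge >= r_list
  auto coord = [&](double v) {               // cell coordinate, clamped to grid
    int c = static_cast<int>(v / cell);
    return c < 0 ? 0 : (c >= nc ? nc - 1 : c);
  };
  auto bin_id = [&](int cx, int cy, int cz) {
    return (static_cast<std::size_t>(cz) * nc + cy) * nc + cx;
  };
  // 1) Link cells: bin every atom once, O(N).
  std::vector<std::vector<int>> bins(static_cast<std::size_t>(nc) * nc * nc);
  for (int i = 0; i < n; ++i)
    bins[bin_id(coord(x[3 * i]), coord(x[3 * i + 1]), coord(x[3 * i + 2]))].push_back(i);
  // 2) Verlet list: for each atom, test only atoms in the 27 surrounding cells.
  const double rsq_list = r_list * r_list;
  std::vector<std::vector<int>> neigh(n);
  for (int i = 0; i < n; ++i) {
    const int cx = coord(x[3 * i]), cy = coord(x[3 * i + 1]), cz = coord(x[3 * i + 2]);
    for (int dz = -1; dz <= 1; ++dz)
      for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
          const int bx = cx + dx, by = cy + dy, bz = cz + dz;
          if (bx < 0 || by < 0 || bz < 0 || bx >= nc || by >= nc || bz >= nc) continue;
          for (int j : bins[bin_id(bx, by, bz)]) {
            if (j == i) continue;
            const double ddx = x[3 * i] - x[3 * j];
            const double ddy = x[3 * i + 1] - x[3 * j + 1];
            const double ddz = x[3 * i + 2] - x[3 * j + 2];
            if (ddx * ddx + ddy * ddy + ddz * ddz < rsq_list) neigh[i].push_back(j);
          }
        }
  }
  return neigh;
}
```

The link cells are only used while building the list; the force kernel then iterates over `neigh[i]` until the next rebuild.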


Exploiting Newton's 3rd law

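Forces are calculated as interactions between two particles.

Use the fact that $F_{ij} = -F_{ji}$.

Only half of the total work needs to be done.

But this causes problems with vectorization: every pair must also update the force on neighbor j through an indirect index (a scatter).

Figure: Neighbor list (source: www.lammps.sandia.gov)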
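A minimal C++ sketch contrasting the two loop structures. It assumes an LJ kernel with $\varepsilon = \sigma = 1$, flat x/f arrays and neighbor lists as vectors of vectors; `forces_half` and `forces_full` are hypothetical names, not MiniMD's `ForceLJ` code. The half-list version writes `f[j]` through the indirect index `j` (a scatter the compiler generally cannot prove conflict-free), while the full-list version only writes `f[i]` at the cost of computing every pair twice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using NeighList = std::vector<std::vector<int>>;

// LJ force factor with eps = sigma = 1 (simplifying assumption).
static double fpair_lj(double rsq, double cutsq) {
  if (rsq >= cutsq) return 0.0;
  const double sr2 = 1.0 / rsq;
  const double sr6 = sr2 * sr2 * sr2;
  return 48.0 * sr6 * (sr6 - 0.5) * sr2;
}

// Half list: each pair stored once; Newton's 3rd law applied via f[j].
void forces_half(const std::vector<double>& x, std::vector<double>& f,
                 const NeighList& half, double cutsq) {
  assert(x.size() == f.size() && half.size() * 3 == x.size());
  for (double& v : f) v = 0.0;
  for (std::size_t i = 0; i < half.size(); ++i) {
    for (int j : half[i]) {
      double d[3], rsq = 0.0;
      for (int k = 0; k < 3; ++k) { d[k] = x[3 * i + k] - x[3 * j + k]; rsq += d[k] * d[k]; }
      const double fp = fpair_lj(rsq, cutsq);
      for (int k = 0; k < 3; ++k) {
        f[3 * i + k] += fp * d[k];
        f[3 * j + k] -= fp * d[k];  // scatter into the neighbor: F_ji = -F_ij
      }
    }
  }
}

// Full list: each pair stored twice; only f[i] is written in the inner loop.
void forces_full(const std::vector<double>& x, std::vector<double>& f,
                 const NeighList& full, double cutsq) {
  assert(x.size() == f.size() && full.size() * 3 == x.size());
  for (double& v : f) v = 0.0;
  for (std::size_t i = 0; i < full.size(); ++i) {
    for (int j : full[i]) {
      double d[3], rsq = 0.0;
      for (int k = 0; k < 3; ++k) { d[k] = x[3 * i + k] - x[3 * j + k]; rsq += d[k] * d[k]; }
      const double fp = fpair_lj(rsq, cutsq);
      for (int k = 0; k < 3; ++k) f[3 * i + k] += fp * d[k];
    }
  }
}
```

Both versions produce the same forces; the trade-off is half the arithmetic against a scatter in the inner loop.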


Profiling

% time   Name
77.56    ForceLJ::compute_halfneigh<0, 1>
14.29    Neighbor::build
 2.04    ForceLJ::compute_halfneigh<0, 1>

Table: Half neighbor list

% time   Name
77.00    ForceLJ::compute_fullneigh<0>
19.12    Neighbor::build
 2.77    ForceLJ::compute_fullneigh<1>

Table: Full neighbor list


Code from bottleneck region


Naive Roofline model (considering only DIV)

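Performance unit: particle updates per second (PU/s)

Average neighbors per particle: $\text{neigh}_{avg} = 78$
Average fraction of neighbors within the cut-off (DIV performed): $\text{cutNeigh}_{avg} = 0.7$

$P_{max} = \frac{f \cdot n_{cores} \cdot \text{SIMD}}{\text{cycles per DIV} \cdot \text{neigh}_{avg} \cdot \text{cutNeigh}_{avg}}$ PU/s

$P_{max} = \frac{2.2 \cdot 10^{9} \cdot 10 \cdot 2}{14 \cdot 78 \cdot 0.7} = 57.56$ MPU/s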

$I_{knee} \cdot b_s = P_{max}$, considering $b_s = 40.6$ GB/s

$I_{knee} = \frac{P_{max}}{b_s} = 0.0014$ PU/byte

Assumed data transfer: 3 LD per neighbor

$I = \frac{1 \text{ PU}}{\text{neigh}_{avg} \cdot 3 \cdot 8 \text{ bytes}} = 0.0005$ PU/byte $< I_{knee}$ !!!

likwid-measured data transfer: 1/4 of the assumed volume

$I = 4 \cdot \frac{1 \text{ PU}}{\text{neigh}_{avg} \cdot 3 \cdot 8 \text{ bytes}} = 0.002$ PU/byte $> I_{knee}$ !!!

The code scales perfectly, so the assumed data transfer is obviously wrong!!
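The Roofline arithmetic on this slide can be reproduced with a few lines of C++. The input values are the ones stated on the slide; the struct and function names are hypothetical, and `traffic_reduction` is the ratio of assumed to actual memory traffic (1 for the 3-LD assumption, 4 for the likwid measurement).

```cpp
#include <cassert>

// Naive DIV-only Roofline inputs, as given on the slide.
struct RooflineInputs {
  double freq_hz = 2.2e9;        // clock frequency f
  int n_cores = 10;
  int simd = 2;                  // DIV lanes per instruction
  double cycles_per_div = 14.0;
  double neigh_avg = 78.0;       // average neighbors per particle
  double cut_neigh_avg = 0.7;    // fraction within cut-off (DIV performed)
  double bandwidth = 40.6e9;     // b_s in bytes/s
};

// P_max in particle updates per second (PU/s).
double p_max(const RooflineInputs& in) {
  return in.freq_hz * in.n_cores * in.simd /
         (in.cycles_per_div * in.neigh_avg * in.cut_neigh_avg);
}

// Knee intensity I_knee = P_max / b_s in PU/byte.
double i_knee(const RooflineInputs& in) { return p_max(in) / in.bandwidth; }

// Code intensity in PU/byte: 3 loads of 8 bytes per neighbor,
// divided by how much less traffic actually reaches memory.
double intensity(const RooflineInputs& in, double traffic_reduction) {
  assert(traffic_reduction > 0.0);
  return traffic_reduction / (in.neigh_avg * 3.0 * 8.0);
}
```

With these inputs, the assumed traffic puts the kernel left of the knee (memory-bound) while the measured traffic puts it right of the knee (DIV-bound), which is consistent with the observed perfect scaling.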


Possible cause for the data transfer discrepancy

Kernel code

for (int i = 0; i < nlocal; i++) {
  for (int k = 0; k < numneighs; k++) {  // avg. length = 78
    const int j = neighs[k];              // indirect access, stride PAD per atom
    // 3 LD per neighbor:
    const double xj = x[j * PAD + 0];
    const double yj = x[j * PAD + 1];
    const double zj = x[j * PAD + 2];
    // calculations
  }
  // update f[i]
}

Total data volume for one i iteration (considering cache-line granularity) = 78 × 8 × 8 bytes = 4992 bytes (≈ 4.9 KiB), which can be kept in cache.

Not every atom will have entirely different neighbors.

Extra neighbors loaded as part of a cache line can be reused from cache.

likwid measurements show a 40 % higher data volume for L2 compared to MEM and L3.
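The cache-line argument can be made concrete by counting how many distinct 64-byte lines one atom's neighbor loads actually touch. This sketch assumes `x` is 64-byte aligned, stores 8-byte doubles with `PAD` entries per atom (PAD = 4 in the example calls is an assumption, not a value from the slides), and ignores all other cached data; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Distinct 64-byte cache lines touched when loading x[j*PAD+0..2]
// for every neighbor j.
std::size_t distinct_cache_lines(const std::vector<int>& neighs, int pad) {
  assert(pad >= 3);
  std::set<long> lines;
  for (int j : neighs) {
    const long first = static_cast<long>(j) * pad * 8;  // byte offset of x[j*PAD+0]
    const long last = first + 2 * 8;                    // byte offset of x[j*PAD+2]
    lines.insert(first / 64);
    lines.insert(last / 64);  // with PAD = 3 a triple can straddle two lines
  }
  return lines.size();
}

// Volume assumed above: one full cache line per neighbor.
std::size_t naive_bytes(const std::vector<int>& neighs) { return neighs.size() * 64; }
```

For example, neighbors 10, 11, 12 and 13 with PAD = 4 share only two cache lines, so half of the naively counted volume is served by lines that are already loaded.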


Further outlook

Check whether the neighbor-list build produces some ordering, so that we can know a priori the percentage of loads in the inner loop that would not be needed.

Refine the Roofline model to consider other operations that cannot hide behind the DIV (currently measured performance is 1/4 $P_{max}$).

Investigate the effect of the LD/ST ratio (here 11).

Investigate the effects of branch misprediction?

Compare the performance of the full and half neighbor list versions.


Thank you. Questions?
