Page 1:

Distributed Computing for Crystallography: experiences and opportunities

Dr. Kenneth Shankland & Tom Griffin
ISIS Facility
CCLRC Rutherford Appleton Laboratory
STFC

Page 2: Background – Parallel Computing Options

Supercomputer
• Expensive
• State of the art
• Good results
• Dedicated

Cluster
• Cheaper
• Can easily expand
• Dedicated

Distributed grid (many separate machines)
• Cheaper
• Capacity increases with time
• Easy to use
• Can easily expand
• Not dedicated (though individual machines may be)

Page 3: Spare cycles concept

• Typical PC CPU usage is about 10%
• Usage is minimal after 5pm
• Most desktop PCs are really fast
• Can we use ("steal"?) those unused CPU cycles to solve computational problems?

Page 4: Suitable apps

• CPU intensive
• Low to moderate memory use
• Licensing issues
• Not too much file output
• Coarse grained
• Command line / batch driven

Page 5: The United Devices GridMP System

• Server hardware
  • Two dual-Xeon 2.8 GHz servers, RAID 10
• Software
  • Servers run Red Hat Linux Advanced Server / DB2
  • Unlimited Windows (and other) clients
• Programming
  • Web Services interface: XML, SOAP
  • Accessed with C++ and Java (sketch below)
• Management console
  • Web browser based
  • Can manage services, jobs, devices etc.
• Large industrial user base
  • GSK, J&J, Novartis etc.
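
Since the server is driven entirely through Web Services, jobs can be scripted from any SOAP-capable language. As a flavour, a minimal Java (SAAJ) call might look like the sketch below; the operation name, namespace and endpoint URL are illustrative only, not the actual GridMP WSDL:

import javax.xml.soap.*;

public class GridStatusQuery {
    public static void main(String[] args) throws Exception {
        // Build an empty SOAP request (SAAJ, bundled with Java at the time)
        SOAPConnection conn = SOAPConnectionFactory.newInstance().createConnection();
        SOAPMessage request = MessageFactory.newInstance().createMessage();
        // Hypothetical operation and namespace; the real schema comes from the GridMP WSDL
        SOAPElement op = request.getSOAPBody().addChildElement("getJobStatus", "mp", "urn:gridmp");
        op.addChildElement("jobId").addTextNode("4300");
        request.saveChanges();
        // Placeholder endpoint standing in for the local GridMP server
        SOAPMessage reply = conn.call(request, "https://gridserver.example/ws");
        reply.writeTo(System.out);   // dump the raw XML response
        conn.close();
    }
}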

Page 6: GridMP Platform Object Model

[Diagram: the GridMP object hierarchy for a docking application. An application ("Docking") contains application services (GOLD, Ligandfit, MyDock) and data sets (test ligands: molec 1 … molec m; proteins: protein 1 … protein n). The GOLD 2.0 program carries one program module per platform: gold20win.exe for Windows and gold20_rh.exe for Linux.]

Page 7: Adapting a program for GridMP

1) Think about how to split your data
2) Wrap your executable
3) Write the application service (pre- and post-processing; see the sketch below)
4) Use the grid

• Fairly easy to write
• Interface to the grid is via Web Services
• So far used: C++, Java, Perl, C# (any .NET language)
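
The application service itself is ordinary code. As a rough sketch of step 3 (method and file names are illustrative, not the GridMP API): pre-processing splits the work into units, post-processing merges the per-unit results.

import java.nio.file.*;
import java.util.*;

public class SaService {
    // Pre-processing: split N independent runs into work units
    static List<String> preProcess(int totalRuns, int runsPerUnit) {
        List<String> units = new ArrayList<>();
        for (int first = 1; first <= totalRuns; first += runsPerUnit) {
            int last = Math.min(first + runsPerUnit - 1, totalRuns);
            units.add("runs " + first + "-" + last); // stand-in for a real work-unit descriptor
        }
        return units;
    }

    // Post-processing: merge the per-unit result files into one file for the user
    static void postProcess(List<Path> resultFiles, Path merged) throws Exception {
        try (var out = Files.newBufferedWriter(merged)) {
            for (Path p : resultFiles) {
                out.write(Files.readString(p));
            }
        }
    }
}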

Page 8: Package your executable

A program module bundles everything the client needs to run your code: the executable, its DLLs, standard data files and environment variables. The module can optionally be compressed and encrypted, and is uploaded to, and resident on, the server (see the sketch below).
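
GridMP supplies its own packaging tools, but the idea is no more than bundling the pieces into one (optionally compressed and encrypted) archive. A sketch, with file names invented for illustration:

import java.io.*;
import java.nio.file.*;
import java.util.zip.*;

public class PackageModule {
    public static void main(String[] args) throws IOException {
        // Executable, DLLs and standard data files go into a single module
        String[] contents = { "myapp.exe", "support.dll", "standard_data.txt" };
        try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream("module.zip"))) {
            for (String name : contents) {
                zip.putNextEntry(new ZipEntry(name));       // one entry per file
                zip.write(Files.readAllBytes(Path.of(name)));
                zip.closeEntry();
            }
        }
        // The resulting module.zip would then be uploaded to the server
    }
}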

Page 9: Create / run a job

[Diagram: on the client side, molecule packages (Pkg1, Pkg2) and protein packages (Pkg3, Pkg4) are uploaded over https to the server side, where they become datasets. Creating the job generates the cross product of the datasets as work units; the job is then started.]
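
The cross-product step is worth spelling out: every molecule package is paired with every protein package, so m molecule packages and n protein packages yield m x n work units. A minimal sketch (class names illustrative, not the GridMP object model):

import java.util.*;

public class CrossProduct {
    record WorkUnit(String moleculePkg, String proteinPkg) {}

    static List<WorkUnit> cross(List<String> molecules, List<String> proteins) {
        List<WorkUnit> units = new ArrayList<>();
        for (String m : molecules)
            for (String p : proteins)
                units.add(new WorkUnit(m, p));   // one work unit per pairing
        return units;
    }

    public static void main(String[] args) {
        // 2 molecule packages x 2 protein packages -> 4 work units
        System.out.println(cross(List.of("Pkg1", "Pkg2"), List.of("Pkg3", "Pkg4")));
    }
}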

Page 10: Job execution

Page 11: Current status at ISIS

• 218 registered devices
• 321 total CPUs
• Potential power ~300 Gflops (cf. HPCx @ 500 Gflops)

Page 12: Application: structures from powders

Application: Structures from powders

CT-DMF2: Solvated form of a polymorphic pharmaceutical from xtal screen– a=12.2870(7), b=8.3990(4),

c=37.021(2), β= 92.7830(10)

– V= 3816.0(4)

– P21/c, Z’=2 (Nfragments=6)

605040302010

2,200

2,000

1,800

1,600

1,400

1,200

1,000

800

600

400

200

Asymmetric unit

Page 13: DASH

• Optimise molecular models against diffraction data
• Multi-solution simulated annealing
• Execute a number of SA runs (say 25) and pick the best one (see the sketch below)
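
Because each SA run is independent, the multi-solution strategy is embarrassingly parallel: farm the runs out, then keep the solution with the best figure of merit. A sketch of the idea only (the SA routine here is a placeholder, not DASH code):

import java.util.*;
import java.util.stream.*;

public class MultiSolutionSA {
    record Solution(long seed, double chiSq) {}

    // Placeholder for one SA run: a real run optimises the model against the data
    static Solution runSA(long seed) {
        return new Solution(seed, new Random(seed).nextDouble() * 100);
    }

    public static void main(String[] args) {
        Solution best = LongStream.range(0, 25)    // say, 25 runs
                .parallel()                        // independent, so trivially parallel
                .mapToObj(MultiSolutionSA::runSA)
                .min(Comparator.comparingDouble(Solution::chiSq))
                .orElseThrow();                    // keep the best (lowest chi-squared)
        System.out.println("best seed " + best.seed() + ", chi2 = " + best.chiSq());
    }
}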

Page 14: Grid adapt - straightforward

• Run GUI DASH as normal up to the SA run point; create a .duff file
• Submit SA runs to the grid from your own PC:

c:\dash-submit famot.grd
uploading data to server…
your job_id is 4300

• Retrieve and collate SA results from the grid to your own PC:

c:\dash-retrieve 4300
retrieving job data from server…
results stored in famot.dash

• View results as normal with DASH

Page 15: Example of speedup

Example of speedup

• Execute 80 SA runs on famotidine

with #SA moves set to 4 million

• Elapsed time 6hrs 40mins on 2.8GHz P4

• Elapsed time on grid 27 mins

• Speedup factor = 15 with only 24PCs
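
For the record, the arithmetic: 6 hrs 40 mins is 400 minutes, and 400 / 27 ≈ 14.8. That is about 62% of the theoretical maximum speedup of 24, presumably reflecting scheduling overhead and the mixed speeds of the pool machines.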

Page 16: Think big… or think different…

13 torsions + disordered benzoate
Z' = 4, 72 non-H atoms / asu

Page 17: HMC structure solution

< single molecular dynamics trajectory >
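
For context: each hybrid Monte Carlo proposal integrates a single short molecular dynamics (leapfrog) trajectory and then applies a Metropolis accept/reject test on the total energy. A minimal one-dimensional sketch, with a quadratic potential standing in for the real agreement function:

import java.util.Random;

public class HmcStep {
    static final Random RNG = new Random();

    static double potential(double x) { return 0.5 * x * x; }  // stand-in for chi-squared
    static double gradient(double x)  { return x; }

    static double step(double x, double dt, int nSteps) {
        double p = RNG.nextGaussian();                 // fresh momentum each proposal
        double h0 = potential(x) + 0.5 * p * p;        // total energy before the trajectory
        double xNew = x, pNew = p;
        pNew -= 0.5 * dt * gradient(xNew);             // leapfrog: initial half kick
        for (int i = 0; i < nSteps; i++) {
            xNew += dt * pNew;                         // drift
            pNew -= (i < nSteps - 1 ? dt : 0.5 * dt) * gradient(xNew); // kick (half at the end)
        }
        double h1 = potential(xNew) + 0.5 * pNew * pNew;
        // Metropolis: accept the whole trajectory, or keep the old state
        return Math.log(RNG.nextDouble()) < h0 - h1 ? xNew : x;
    }
}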

Page 18: Algorithm 'sweet spot'

The calculations embody ca. 6 months of CPU time. On our current grid, the runs would be completed in ca. 20 hours.
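
The scaling is easy to check: six months is roughly 4,400 CPU-hours, and divided across the ~220 registered machines that is about 20 hours of elapsed time.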

Page 19: MD Manager

• Slightly different to the previous 'batch' style jobs
• More 'interactive'

Page 20: Instrument simulation

• Large run: McStas
  • The submit program breaks up the -n##### neutron count
  • Uploads a new command line + data + executable
• Parameter scan at fixed neutron count
  • Send each run to a separate machine (see the sketch below)
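
A sketch of the splitting idea for the large-run case: divide the total neutron count across machines by rewriting the -n argument (varying the random seed per machine), one command line per work unit. The instrument name, machine count and seed flag are illustrative assumptions, not the actual submit program:

import java.util.*;

public class SplitMcStas {
    public static void main(String[] args) {
        long totalNeutrons = 1_000_000_000L;   // the one large run
        int machines = 200;
        long perMachine = totalNeutrons / machines;
        List<String> workUnits = new ArrayList<>();
        for (int i = 0; i < machines; i++) {
            // each machine gets its own neutron count and its own seed
            workUnits.add("hrpd.exe -n " + perMachine + " -s " + (i + 1));
        }
        workUnits.forEach(System.out::println); // one command line per work unit
    }
}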

Page 21: Full diffraction simulation for HRPD

[Figure: two simulated HRPD diffraction patterns for cubic ZrW2O8 (100.00 %), plotted as normalised intensity against time-of-flight in microseconds (30,000–60,000 and 40,000–120,000), showing calculated, observed (full MC simulation) and difference curves.]

CPU time: 5,537 hours (= 230 days); elapsed time on the grid: 2.5 days

Page 22: Problems / issues

• Hardware – very few
• Software – a few, but excellent support
• Security concerns – encryption and tampering
• System administrators are suspicious of us!
• End user obtrusiveness
  • Perceived
  • Real (memory grab with povray)
  • Unanticipated

Page 23: Programs in the wild

[Diagram: releasing a program to the grid is like releasing a drug. A drug that shows no side effects in clinical trial patients may still show drug interactions in the general population; a program that runs OK on the test computer pool may still show program interactions on all connected PCs.]

Page 24: Top tips

• Get a 'head honcho' involved early on
• Test, test and test again
• Employ small test groups of friendly users
• Know your application
• Don't automatically dismiss complaints about obtrusiveness