Python for High Performance Computing Monte Lunacek Research Computing, University of Colorado Boulder

Jun 08, 2018


Python for High Performance Computing

Monte Lunacek

Research Computing, University of Colorado Boulder


University of Colorado

• Some tightly-coupled MPI codes

• Many independent tasks

• Diverse computing backgrounds

• Geography

• Ecology and Evolutionary Biology

• Microbial Ecology

• Astronomy

• Geology

• Range of computational experience


High Throughput Computing

• Simulations

• Monte Carlo

• Parameter scan

• Uncertainty Quantification

• Parameter Optimization

• Data Analysis (MapReduce)

• Parallel workflows


Supercomputing Without the Pain

• Accessible to anyone with:

• Simulation or analysis to run

• Desire to do it faster

• Remove barriers to entry


Success Stories

• ~500,000 simulations on ~7,000 cores with mpi4py (http://mpi4py.scipy.org/)

• Parameter optimization on ~100 cores with Scoop (https://code.google.com/p/scoop/) and DEAP (https://code.google.com/p/deap/)

• Improved biological workflow with IPython Parallel (http://ipython.org/ipython-doc/dev/parallel/)

• Wrapped an engineering simulation with f2py (http://www.scipy.org/F2py) and IPython Parallel (http://ipython.org/ipython-doc/dev/parallel/)


Outline

• Python (http://python.org)

• IPython Notebook (http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html)

• High Throughput Computing

• IPython Parallel (http://ipython.org/ipython-doc/dev/parallel/)

• Scoop (https://code.google.com/p/scoop/)

• mpi4py (http://mpi4py.scipy.org/)

• Data Analysis with pandas (http://pandas.pydata.org/)

• Conclude


What is Python?


Python

• Flexible, powerful programming language

• Object oriented

• Runs everywhere

• Easy, clean syntax

• Glue: Cython, F2py

• Large community of support

• Consistent feel

• Free as in free beer

• Free as in free speech


Packages for Computational Science

• python: the base language

• numpy: arrays, fast operations on arrays

• scipy: higher level computational routines

• matplotlib: plotting

• ipython: notebooks, flexible shell, and parallel

• pandas: data analysis


What can you do with Python?

• OS support: manage files and directories

• Glue existing applications

• LAPACK and BLAS: access powerful C and Fortran libraries

• Parallel

• Data Analysis

• Visualization

• GUI programming

• Scrape websites

• Build websites

• Anything!


Distributions

• Enthought (http://www.enthought.com/products/epd.php)

• Python(x,y) (http://www.pythonxy.com/)

• Anaconda (https://store.continuum.io/cshop/anaconda)

IPython terminal

ipython --pylab

IPython notebook

ipython notebook --pylab=inline


Notebook


High Throughput Computing


Bash

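np=12                      # processes per batch
count_base=0
for i in {1..N}            # N: number of batches (placeholder)
do
    for j in {1..12}
    do
        b=$(($count_base + $j))
        ./simulator -s 5 -t $b &    # run in the background
    done
    wait                            # block until the whole batch finishes
    count_base=$(( $count_base + $np ))
done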

Limited to a single node


pbsdsh

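#!/bin/bash
# wrapper.sh: run one trial on the virtual node assigned by pbsdsh
PATH=$PBS_O_WORKDIR:$PBS_O_PATH
TRIAL=$(($PBS_VNODENUM + $1))
python ./simulator.py -s 5 -t $TRIAL

count=0
for i in {1..N}            # N: number of batches (placeholder)
do
    pbsdsh wrapper.sh $count
    count=$(( $count + 12 ))
done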

A little painful
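
The wrapper above gives every process a distinct trial number by adding its virtual-node index to the offset passed on the command line. The sketch below replays that arithmetic in Python. It assumes 12 virtual nodes per pbsdsh call, a counter that starts at 0, and three batches; those values and the function name trial_ids are illustrative, not taken from the talk's job script.

```python
def trial_ids(n_batches, n_vnodes=12, start=0):
    """List the TRIAL values produced by successive pbsdsh calls."""
    ids = []
    count = start
    for _ in range(n_batches):
        # PBS_VNODENUM runs from 0 to n_vnodes - 1 within one pbsdsh call.
        ids.extend(vnode + count for vnode in range(n_vnodes))
        count += n_vnodes
    return ids


ids = trial_ids(n_batches=3)
# First id, last id, and whether every id is unique.
print(ids[0], ids[-1], len(set(ids)) == len(ids))
```

Because the offset grows by the number of virtual nodes after each call, no two processes share a trial number.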


A little inefficient

[Figure: task timeline by node, node0000 to node0019, from 07:05 to 07:20]


Objective Function

def simulation(x):
    value = x*x + 10
    return value

The function's name is simulation.


Multiprocessing

Import

from multiprocessing import Pool

Map the values

if __name__ == '__main__':
    pool = Pool(12)     # workers
    data = range(200)   # tasks
    results = pool.map(simulation, data)

Great for single node

API modeled on Python's threading library


Scoop

Import

from scoop import futures

Map the values

if __name__ == '__main__':
    data = range(200)   # tasks
    results = futures.map(simulation, data)

Launch

python -m scoop filename.py

Efficient startup!


IPython Parallel

from IPython.parallel import Client, require

Map the values

if __name__ == '__main__':
    data = range(200)   # tasks
    rc = Client(profile='mpi')
    lview = rc.load_balanced_view()
    results = lview.map(simulation, data)
    results.wait()


Compare

results = pool.map(simulation, data)

results = futures.map(simulation, data)

results = lview.map(simulation, data)

The map() call is the same in all three. What separates these methods is how you create the object that provides map().
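
Because the three calls share the shape of Python's built-in map(), an analysis can be written once against any map-like callable. The sketch below is an illustration rather than code from the talk: run_all is a hypothetical helper, and the built-in map and a standard-library thread pool serve as stand-ins for pool.map, futures.map, and lview.map.

```python
from concurrent.futures import ThreadPoolExecutor


def simulation(x):
    """Objective function from the earlier slide."""
    value = x * x + 10
    return value


def run_all(map_fn, data):
    """Apply simulation to every item using whichever map() is supplied.

    run_all is a hypothetical helper; map_fn can be any callable that takes
    (function, iterable) like the built-in map.
    """
    return list(map_fn(simulation, data))


data = range(200)  # tasks

# Serial baseline with the built-in map.
serial = run_all(map, data)

# Stand-in for a parallel backend: a standard-library thread pool.
with ThreadPoolExecutor(max_workers=4) as executor:
    threaded = run_all(executor.map, data)

print(serial[:3], serial == threaded)
```

Switching backends changes only the argument passed to run_all, which is the point the comparison makes.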


Compare

Good

IPython            Fault-tolerance
IPython            Schedule
IPython            Interactive
Scoop              Efficient launch
Multiprocessing    Included in the standard library

Needs work

All                Scaling unknown
IPython            Launcher (configuration)
Scoop and MP       __main__
Scoop              Schedule
Multiprocessing    One node (kind of)


Scheduling: Bash

[Figure: task timeline by node, node0000 to node0019, from 07:05 to 07:20]


Scheduling: Static

[Figure: task timeline by node, node0000 to node0019, from 07:05 to 07:20]


Scheduling: Dynamic

[Figure: task timeline by node, node0000 to node0019, from 07:05 to 07:20]
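
One way to see why the three schedules can differ is to model them. The sketch below is a toy model, not the measurement behind the figures: the task durations, the worker count, and the function names are all made up for illustration. Batch-and-wait mirrors the Bash loop (each batch lasts as long as its slowest task), static assigns tasks round-robin before anything runs, and dynamic hands the next task to whichever worker becomes free first.

```python
import heapq


def batch_makespan(durations, n_workers):
    """Run tasks in batches of n_workers, waiting for each whole batch."""
    total = 0
    for start in range(0, len(durations), n_workers):
        total += max(durations[start:start + n_workers])
    return total


def static_makespan(durations, n_workers):
    """Assign task i to worker i % n_workers before anything runs."""
    loads = [0] * n_workers
    for i, d in enumerate(durations):
        loads[i % n_workers] += d
    return max(loads)


def dynamic_makespan(durations, n_workers):
    """Give each task to the worker that becomes free first."""
    free_at = [0] * n_workers  # min-heap of times at which workers go idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)


# Hypothetical task durations in minutes: a few long tasks among short ones.
durations = [6, 1, 6, 1, 1, 6, 1, 1]
for name, fn in [("batch", batch_makespan),
                 ("static", static_makespan),
                 ("dynamic", dynamic_makespan)]:
    print(name, fn(durations, 2))
```

For this made-up workload the total time shrinks from batch to static to dynamic; other workloads can rank the last two differently.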


Tenacious Robustness Test

@require('time','socket','random','IPython.parallel.error.KernelError')
def simulation(x):
    time.sleep(5)
    if random.random() < 0.3:
        raise KernelError
    return {'task':x, 'host' :socket.gethostname()}

Launch 10 nodes

Run several tasks

At some point, kill a node
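
A test like this only passes if failed tasks get run again. The following sketch imitates that behaviour locally, without IPython: flaky_simulation, run_with_retries, TaskFailure, and the retry limit are hypothetical names and values chosen for illustration, and a seeded random generator stands in for real failures. It does not show how IPython Parallel itself implements fault tolerance.

```python
import random


class TaskFailure(Exception):
    """Stand-in for the KernelError raised in the test above."""


def flaky_simulation(x, rng, failure_rate=0.3):
    """Fail at random with probability failure_rate, else return a result."""
    if rng.random() < failure_rate:
        raise TaskFailure(x)
    return {'task': x}


def run_with_retries(tasks, rng, max_attempts=50):
    """Resubmit each failed task until it succeeds or attempts run out."""
    results = []
    for x in tasks:
        for _ in range(max_attempts):
            try:
                results.append(flaky_simulation(x, rng))
                break
            except TaskFailure:
                continue  # try the same task again
        else:
            raise RuntimeError('task %r never succeeded' % x)
    return results


rng = random.Random(0)  # fixed seed so the run is repeatable
results = run_with_retries(range(20), rng)
print(len(results), 'tasks completed')
```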


mpi4py

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = {'key1' : [7, 2.72, 3.2],
            'key2' : ( 'abc', 'xyz')}
else:
    data = None

data = comm.bcast(data, root=0)
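
Broadcasting shared data is one half of running many independent tasks under MPI; the other is deciding which rank runs which task. A minimal sketch of one common choice, a round-robin split by rank, follows. The helper name tasks_for_rank is hypothetical, and this is not necessarily how the runs in this talk were distributed. The loop simulates four ranks in plain Python so the snippet runs without MPI; in an mpi4py script, rank and size would come from comm.Get_rank() and comm.Get_size().

```python
def tasks_for_rank(tasks, rank, size):
    """Return every size-th task starting at rank (round-robin split)."""
    return tasks[rank::size]


tasks = list(range(10))
size = 4  # pretend four MPI processes were launched

for rank in range(size):
    print(rank, tasks_for_rank(tasks, rank, size))
```

Every task lands on exactly one rank, and no communication is needed to agree on the split.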


mpi4py Scaling

3-second jobs

2048    92%
4096    87%
8192    67%
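
Reading these numbers as core count against parallel efficiency (an assumption; the slide does not label the columns), each efficiency can be turned into an equivalent number of fully used cores by multiplying the two. The short calculation below does only that arithmetic.

```python
# (cores, efficiency) pairs as read from the slide; the meaning of the
# two columns is an assumption.
scaling = [(2048, 0.92), (4096, 0.87), (8192, 0.67)]

for cores, efficiency in scaling:
    effective = cores * efficiency  # cores' worth of useful work
    print(cores, round(effective))
```

Under that reading, the amount of useful work still grows with core count even as the efficiency drops.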


Data Analysis


Conclusions

Python makes supercomputing accessible.

Combine libraries to achieve the task at hand.

• Simulate and analyze

• Share methods in a notebook

• Push your data to a database

• Share it on the web

• In parallel


References

• Python Scripting for Computational Science (http://www.springer.com/mathematics/computational+science+%26+engineering/book/978-3-540-73915-9)

• Python Snakes Its Way Into HPC (http://www.hpcwire.com/hpcwire/2010-11-17/python_snakes_its_way_into_hpc.html)

• Andy Terrel: Getting Started with Python in HPC (http://andy.terrel.us/blog/2012/09/27/starting-with-python/)

• Python Tutorial (http://docs.python.org/2/tutorial/)

• Think Python (http://www.greenteapress.com/thinkpython/)

• Data Analysis with Python