Python for Science and Engineering: a presentation to A*STAR and the Singapore Computational Sciences Club, Edward Schofield, Python Charmers, June 2011

Post on 27-Jan-2015


DESCRIPTION

An introduction to Python in science and engineering. The presentation was given by Dr Edward Schofield of Python Charmers (www.pythoncharmers.com) to A*STAR and the Singapore Computational Sciences Club in June 2011.

Transcript

Python for Science and Engineering

Dr Edward Schofield

A*STAR / Singapore Computational Sciences Club Seminar, June 14, 2011

Scientific programming in 2011

Most scientists and engineers are:

programming for 50+% of their work time (and rising)

self-taught programmers

using inefficient programming practices

using the wrong programming languages: C++, FORTRAN, C#, PHP, Java, ...

Scientific programming needs

Rapid prototyping

Efficiency for computational kernels

Pre-written packages!

Vectors, matrices, modelling, simulations, visualisation

Extensibility; web front-ends; database backends; ...

Ed's story: How I found Python

PhD in statistical pattern recognition: 2001-2006

Needed good tools for my research!

Discovered Python in 2002 after frustration with C++, Matlab, Java, Perl

Contributed to NumPy and SciPy:

maxent, sparse matrices, optimization, Monte Carlo, etc.

Managed six releases of SciPy in 2005-6

1. Why Python?

Introducing Python

What is it?

What is it good for?

Who uses it?

What is Python?

interpreted

strongly but dynamically typed

object-oriented

intuitive, readable

open source, free

‘batteries included’

‘batteries included’

Python’s standard library is:

very large

well-supported

well-documented

Python’s standard library

data types strings networking threads

operating system compression GUI arguments

CGI complex numbers FTP cryptography

testing multimedia databases CSV files

calendar email XML serialization

What is an efficient programming language?

Native Python code executes 10x more slowly than C and FORTRAN

Would you build a racing car ...... to get to Kuala Lumpur ASAP?

Date        Cost per GFLOPS (US $)   Technology
1961        $1.1 trillion            17 million IBM 1620s
1984        $15,000,000              Cray X-MP
1997        $30,000                  Two 16-CPU clusters of Pentiums
2000, Apr   $1000                    Bunyip Beowulf cluster
2003, Aug   $82                      KASY0
2007, Mar   $0.42                    Ambric AM2045
2009, Sep   $0.13                    ATI Radeon R800

Source: Wikipedia: “FLOPS”

Unit labor cost growth: proxy for cost of programmer time

Efficiency

When FORTRAN was invented, computer time was more expensive than programmer time.

In the 1980s and 1990s that reversed.

Efficient programming

Python code is 10x faster to write than C and FORTRAN

What if ...... you now need to reach Sydney?

Advantages of Python

Easy to write

Easy to maintain

Great standard libraries

Thriving ecosystem of third-party packages

Open source

‘Batteries included’

Python’s standard library is:

very large

well supported

well documented

Python’s standard library

data types strings networking threads

operating system compression GUI arguments

CGI complex numbers FTP cryptography

testing multimedia databases CSV files

calendar email XML serialization

Question: What is the date 177 days from now?
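One way to answer this with only the standard library is the datetime module. This is a minimal sketch; the helper name date_after is illustrative and not from the talk.

```python
from datetime import date, timedelta

def date_after(days, start=None):
    """Return the calendar date `days` days after `start` (default: today)."""
    if start is None:
        start = date.today()
    return start + timedelta(days=days)

# Using the seminar date as a fixed starting point
print(date_after(177, date(2011, 6, 14)))  # 2011-12-08

# 177 days from whenever this is run
print(date_after(177))
```

Date arithmetic with timedelta handles month lengths and leap years, so no calendar tables are needed.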

Natural applications of Python

Rapid prototyping

Plotting, visualisation, 3D

Numerical computing

Web and database programming

All-purpose glue

Python vs other languages

Languages used at CSIRO

Python Fortran Java

Matlab C VB.net

IDL C++ R

Perl C# +5-10 others!

Which language do I choose?

A different language for each task?

A language you know?

A language others in your team are using: support and help?

                             Python     Matlab
Interpreted                  Yes        Yes
Powerful data input/output   Yes        Yes
Great plotting               Yes        Yes
General-purpose language     Powerful   Limited
Cost                         Free       $$$
Open source                  Yes        No

                             Python   C++
Powerful                     Yes      Yes
Portable                     Yes      In theory
Standard libraries           Vast     Limited
Easy to write and maintain   Yes      No
Easy to learn                Yes      No

                                                                  Python   C
Fast to write                                                     Yes      No
Good for embedded systems, device drivers and operating systems   No       Yes
Good for most other high-level tasks                              Yes      No
Standard library                                                  Vast     Limited

                                   Python   Java
Powerful, well-designed language   Yes      Yes
Standard libraries                 Vast     Vast
Easy to learn                      Yes      No
Code brevity                       Short    Verbose
Easy to write and maintain         Yes      Okay

Open source

Python is open source software

Benefits:

No vendor lock-in

Cross-platform

Insurance against bugs in the platform

Free

Python success stories

Computer graphics:

Industrial Light & Magic

Web:

Google: News, Groups, Maps, Gmail

Legacy system integration:

AstraZeneca - collaborative drug discovery

Python success stories (2)

Aerospace:

NASA

Research:

universities worldwide ...

Others:

YouTube, Reddit, BitTorrent, Civilization IV,

Industrial Light & Magic

Python spread from scripting to the entire production pipeline

Numerous reviews since 1996: Python is still the best tool for them

United Space Alliance

A common sentiment:

“We achieve immediate functioning code so much faster in Python than in any other language that it’s staggering.”

- Robin Friedrich, Senior Project Engineer

Case study: air-traffic control

Eric Newton, “Python for Critical Applications”: http://metaslash.com/brochure/recall.html

Metaslash, Inc: 1999 to 2001

Mission-critical system for air-traffic control

Replicated, fault-tolerant data storage

Case study: air-traffic control

Python prototype -> C++ implementation -> Python again

Why?

C++ dependencies were buggy

C++ threads, STL were not portable enough

Python’s advantages over C++

More portable

75% less code: more productivity, fewer bugs

More case studies

See http://www.python.org/about/success/ for lots more case studies and success stories

2. The scientific Python ecosystem

Scientific software development

Small beginnings

Piecemeal growth, quirky interfaces

... Large, cumbersome systems

NumPy: An n-dimensional array/matrix package

NumPy: Centre of Python’s numerical computing ecosystem

NumPy

The most fundamental tool for numerical computing in Python

Fast multi-dimensional array capability

What NumPy defines:

Two fundamental objects:

1. n-dimensional array

2. universal function

a rich set of numerical data types

nearly 400 functions and methods on arrays:

type conversions

mathematical

logical

NumPy's features

Fast. Written in C with BLAS/LAPACK hooks.

Rich set of data types

Linear algebra: matrix inversion, decompositions, …

Discrete Fourier transforms

Random number generation

Trig, hypergeometric functions, etc.

Elementwise array operations

Loops are mostly unnecessary

Operate on entire arrays!

>>> a = numpy.array([20, 30, 40, 50])
>>> a < 35
array([True, True, False, False], dtype=bool)
>>> b = numpy.arange(4)
>>> a - b
array([20, 29, 38, 47])
>>> b**2
array([0, 1, 4, 9])

Universal functions

NumPy defines 'ufuncs' that operate on entire arrays and other sequences (hence 'universal')

Example: sin()

>>> a = numpy.array([20, 30, 40, 50])
>>> c = 10 * numpy.sin(a)
>>> c
array([ 9.12945251, -9.88031624, 7.4511316 , -2.62374854])

Array slicing

Arrays can be sliced and indexed powerfully:

>>> a = numpy.arange(10)**3
>>> a
array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])
>>> a[2:5]
array([ 8, 27, 64])

Fancy indexing

Arrays can be used as indices into other arrays:

>>> a = numpy.arange(12)**2
>>> ind = numpy.array([ 1, 1, 3, 8, 5 ])
>>> a[ind]
array([ 1, 1, 9, 64, 25])

Other linear algebra features

Matrix inversion: mat(A).I

Or: linalg.inv(A)

Linear solvers: linalg.solve(A, b) solves Ax = b

Pseudoinverse: linalg.pinv(A)
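The calls listed above can be sketched on a small system. The 2x2 matrix and right-hand side are made up for illustration, and the sketch uses plain arrays with numpy.linalg rather than the mat(A).I form.

```python
import numpy as np

# A small, well-conditioned system chosen purely for illustration
A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])

A_inv = np.linalg.inv(A)     # explicit matrix inverse
x = np.linalg.solve(A, b)    # solves A x = b without forming the inverse
A_pinv = np.linalg.pinv(A)   # Moore-Penrose pseudoinverse

print(x)
print(np.allclose(A.dot(x), b))    # the solution satisfies the system
print(np.allclose(A_inv, A_pinv))  # pinv equals inv for an invertible matrix
```

For solving a system, linalg.solve is usually preferable to multiplying by an explicit inverse, since it avoids the extra work and rounding error of forming the inverse.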

What is SciPy?

A community

A conference

A package of scientific libraries

Python for scientific software

Back-end: computational work

Front-end: input / output, visualization, GUIs

Dozens of great scientific packages exist

Python in science (2)

NumPy: numerical / array module

Matplotlib: great 2D and 3D plotting library

IPython: nice interactive Python shell

SciPy: set of scientific libraries: sparse matrices, signal processing, …

RPy: integration with the R statistical environment

Python in science (3)

Cython: C language extensions

Mayavi: 3D graphics, volumetric rendering

Nitime, Nipype: Python tools for neuroimaging

SymPy: symbolic mathematics library

Python in science (4)

VPython: easy, real-time 3D programming

UCSF Chimera, PyMOL, VMD: molecular graphics

PyRAF: Hubble Space Telescope interface to IRAF astronomical data

BioPython: computational molecular biology

Natural language toolkit: symbolic + statistical NLP

Physics: PyROOT

The SciPy package

BSD-licensed software for maths, science, engineering

integration signal processing sparse matrices

optimization linear algebra maximum entropy

interpolation ODEs statistics

FFTs n-dim image processing scientific constants

clustering interpolation C/C++ and Fortran integration

SciPy optimisation example

Fit a model to noisy data: $y = \frac{a}{x^b} \sin(cx) + \varepsilon$

Example: fitting a model with scipy.optimize

Task: Fit a model of the form $y = \frac{a}{x^b} \sin(cx) + \varepsilon$ to noisy data.

Spec:

1. Generate noisy data

2. Choose parameters (a, b, c) to minimize sum squared errors

3. Plot the data and fitted model (next session)

SciPy optimisation example

import numpy
import pylab
from scipy.optimize import leastsq

def myfunc(params, x):
    # Model: a / x**b * sin(c * x)
    (a, b, c) = params
    return a / (x**b) * numpy.sin(c * x)

true_params = [1.5, 0.1, 2.]

def f(x):
    # Noise-free model evaluated at the true parameters
    return myfunc(true_params, x)

def err(params, x, y):
    # Error function: residuals between model and data
    return myfunc(params, x) - y

# Generate noisy data to fit
n = 30; xmin = 0.1; xmax = 5
x = numpy.linspace(xmin, xmax, n)
y = f(x)
y += numpy.random.rand(len(x)) * 0.2 * (y.max() - y.min())

v0 = [3., 1., 4.]  # initial param estimate

# Fitting
v, success = leastsq(err, v0, args=(x, y), maxfev=10000)

print('Estimated parameters: ', v)
print('True parameters: ', true_params)
X = numpy.linspace(xmin, xmax, 5 * n)
pylab.plot(x, y, 'ro', X, myfunc(v, X))
pylab.show()

SciPy optimisation example

Fit a model to noisy data: $y = \frac{a}{x^b} \sin(cx) + \varepsilon$

Ingredients for this example

numpy.linspace

numpy.random.rand for the noise model (uniform)

scipy.optimize.leastsq

Sparse matrix example

Construct and solve a sparse linear system

Sparse matrices

Sparse matrices are mostly zeros.

They can be symmetric or asymmetric.

Sparsity patterns vary:

block sparse, band matrices, ...

They can be huge!

Only non-zeros are stored.

Sparse matrices in SciPy

SciPy supports seven sparse storage schemes

... and sparse solvers in Fortran.

Sparse matrix creation

To construct a 1000x1000 lil_matrix and add values:

>>> from scipy.sparse import lil_matrix
>>> from numpy.random import rand
>>> from scipy.sparse.linalg import spsolve

>>> A = lil_matrix((1000, 1000))
>>> A[0, :100] = rand(100)
>>> A[1, 100:200] = A[0, :100]
>>> A.setdiag(rand(1000))

Solving sparse matrix systems

Now convert the matrix to CSR format and solve Ax=b:

>>> A = A.tocsr()
>>> b = rand(1000)
>>> x = spsolve(A, b)

# Convert it to a dense matrix and solve, and check that the result is the same:
>>> from numpy.linalg import solve, norm
>>> x_ = solve(A.todense(), b)
# Compute norm of the error:
>>> err = norm(x - x_)
>>> err < 1e-10
True

Matplotlib

Great plotting package in Python

Matlab-like syntax

Great rendering: anti-aliasing etc.

Many ‘backends’: Cairo, GTK, Cocoa, PDF

Flexible output: to EPS, PS, PDF, TIFF, PNG, ...

Matplotlib: worked examples

Search the web for 'Matplotlib gallery'

Example: NumPy vectorization

1. Use a Monte Carlo algorithm to estimate π:

1. Generate uniform random variates (x, y) over [0, 1].

2. Estimate π from the proportion p that land in the unit circle.

2. Time two ways of doing this:

1. Using for loops

2. Using array operations (vectorized)
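The exercise above might be sketched as follows. The quarter of the unit circle inside the unit square has area π/4, so π is estimated as 4p. The function names, fixed seeds and sample size are assumptions made for illustration, not part of the original exercise.

```python
import random
import time

import numpy as np

def estimate_pi_loop(n, seed=0):
    """Estimate pi with a pure-Python for loop."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

def estimate_pi_vectorized(n, seed=0):
    """Estimate pi with NumPy array operations (no explicit loop)."""
    rng = np.random.RandomState(seed)
    x = rng.rand(n)
    y = rng.rand(n)
    p = np.mean(x * x + y * y <= 1.0)  # proportion inside the circle
    return 4.0 * p

n = 200000
for func in (estimate_pi_loop, estimate_pi_vectorized):
    start = time.time()
    estimate = func(n)
    elapsed = time.time() - start
    print('%s: pi ~ %.4f in %.4f s' % (func.__name__, estimate, elapsed))
```

Timings depend on the machine, so none are quoted here; the vectorized version typically runs much faster because the loop executes in compiled code inside NumPy.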

3. Scaling

HPC: High-performance computing

Aspects to HPC

Supercomputers Distributed clusters / grids

Parallel programming Scripting

Caches, shared memory Job control

Code porting Specialized hardware

Python for HPC

Advantages             Disadvantages
Portability            Global interpreter lock
Easy scripting, glue   Less control than C
Maintainability        Native loops are slow

Profiling to identify hotspots

Vectorization with NumPy

Large data sets

Useful Python language features:

Generators, iterators

Useful packages:

Great HDF5 support from PyTables!
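As a small illustration of why generators suit large data sets, the sketch below computes a mean in one pass without ever holding the data in memory. The function names and the stand-in data source are hypothetical.

```python
def read_values(lines):
    """Yield one float per non-blank line, lazily."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)

def running_mean(values):
    """Compute the mean in a single pass without storing the values."""
    total = 0.0
    count = 0
    for value in values:
        total += value
        count += 1
    return total / count if count else float('nan')

# Any iterable of lines works here: an open file object or, as in this
# small stand-in, a generator expression producing a million lines.
lines = ('%d\n' % i for i in range(1, 1000001))
print(running_mean(read_values(lines)))  # 500000.5
```

Because each stage pulls one item at a time from the previous one, memory use stays constant regardless of how many lines the source produces.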

Hierarchical data

Databases without the relational baggage

Great interface for HDF5 data

Efficient support for massive data sets

Applications of PyTables

aeronautics telecommunications

drug discovery data mining

financial analysis statistical analysis

climate prediction etc.

Breaking news: June 2011

PyTables Pro is now being open sourced.

Indexed searches for speed

Merging with PyTables

Working project name: NewPyTables

PyTables performance

OPSI indexing engine speed:

Querying 10 billion rows can take hundredths of a second!

Target use-case:

mostly read-only or append-only data

Principles for efficient code

Important principles

1. "Premature optimization is the root of all evil"

Don't write cryptic code just to make it more efficient!

2. 1-5% of the code takes up the vast majority of the computing time!

... and it might not be the 1-5% that you think!

Checklist for efficient code

From most to least important:

1. Check: Do you really need to make it more efficient?

2. Check: Are you using the right algorithms and data structures?

3. Check: Are you reusing pre-written libraries wherever possible?

4. Check: Which parts of the code are expensive? Measure, don't guess!
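For the last check, the standard library's cProfile module is one way to measure rather than guess. The functions below are deliberately simple stand-ins for real code.

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    """Deliberately naive hotspot: a pure-Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def cheap_setup():
    """Inexpensive work that should not dominate the profile."""
    return list(range(100))

def main():
    cheap_setup()
    return slow_sum_of_squares(200000)

profiler = cProfile.Profile()
profiler.enable()
result = main()
profiler.disable()

# Report the five most expensive calls, sorted by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats('cumulative')
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

The same profile can be collected for a whole script without editing it by running python -m cProfile -s cumulative on the script.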

Relative efficiency gains

Exponential-order and polynomial-order speedups are possible by choosing the right algorithm for a task.

These require the right data structures!

These dwarf 10-25x linear-order speedups from:

using lower-level languages

using different language constructs.
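A small sketch of a polynomial-order speedup from a data-structure change alone: counting the items two collections share, first with list lookups and then with a set. The data and function names are made up for illustration.

```python
import time

def common_items_list(a, b):
    """Count items of `a` also in `b` using list lookups: O(len(a) * len(b))."""
    return sum(1 for item in a if item in b)

def common_items_set(a, b):
    """Same count using a set for lookups: O(len(a) + len(b)) on average."""
    b_set = set(b)
    return sum(1 for item in a if item in b_set)

a = list(range(0, 4000, 2))  # even numbers below 4000
b = list(range(0, 4000, 3))  # multiples of three below 4000

for func in (common_items_list, common_items_set):
    start = time.time()
    count = func(a, b)
    elapsed = time.time() - start
    print('%s: %d common items in %.5f s' % (func.__name__, count, elapsed))
```

Both versions are plain Python and give the same answer; the gap between them grows with the input size, which a constant-factor gain from switching languages does not.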

4. About Python Charmers

The largest Python training provider in South-East Asia

Delighted customers include:

Most popular course topics

Python for Programmers                     3 days
Python for Scientists and Engineers        4 days
Python for Geoscientists                   4 days
Python for Bioinformaticians               4 days
Python for Financial Engineers             4 days
Python for IT Security Professionals       3 days

New courses:

Python Charmers: Topics of expertise

Python: beginners, advanced

Scientific data processing with Python

Software engineering with Python

Large-scale problems: HPC, huge data sets, grids

Statistics and Monte Carlo problems

Python Charmers: Topics of expertise (2)

Spatial data analysis / GIS

General scripting, job control, glue

GUIs with PyQt

Integrating with other languages: R, C, C++, Fortran, ...

Web development in Django

How to get in touch

See PythonCharmers.com

or email us at: info@pythoncharmers.com
