PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Jul 16, 2015

Page 1: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Pythran: Static Compilation of Parallel Scientific Kernels

a.k.a. Python/Numpy compiler for the mass

Proudly made in Namek by serge-sans-paille & pbrunet

Page 2: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

/us

Serge « sans paille » Guelton
$ whoami
sguelton
R&D engineer at QuarksLab on compilation for security
Associate researcher at Télécom Bretagne

Pierrick Brunet
$ whoami
pbrunet
R&D engineer at INRIAlpes/MOAIS on parallelism

Page 3: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Pythran in a snake shell

A Numpy-centric Python-to-C++ translator
A Python code optimizer
A Pythonic C++ library

Page 4: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Core concepts
Focus on high-level constructs
Generate clean high level code
Optimize Python code before generated code
Vectorization and Parallelism
Test, test, test
Bench, bench, bench

Page 5: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Ask StackOverflow
when you're looking for test cases

http://stackoverflow.com/[...]numba-or-cython-acceleration-in-reaction-diffusion-algorithm

import numpy as np
def GrayScott(counts, Du, Dv, F, k):
    n = 300
    U = np.zeros((n+2, n+2), dtype=np.float32)
    V = np.zeros((n+2, n+2), dtype=np.float32)
    u, v = U[1:-1, 1:-1], V[1:-1, 1:-1]

    r = 20
    u[:] = 1.0
    U[n/2-r:n/2+r, n/2-r:n/2+r] = 0.50
    V[n/2-r:n/2+r, n/2-r:n/2+r] = 0.25
    u += 0.15*np.random.random((n, n))
    v += 0.15*np.random.random((n, n))

    for i in range(counts):
        Lu = (U[0:-2, 1:-1] + U[1:-1, 0:-2] - 4*U[1:-1, 1:-1] + U[1:-1, 2:] + U[2:, 1:-1])
        Lv = (V[0:-2, 1:-1] + V[1:-1, 0:-2] - 4*V[1:-1, 1:-1] + V[1:-1, 2:] +

Page 6: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Thread Summary

OP
My code is slow with Cython and Numba

Best Answer
You need to make all loops explicit

Page 7: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Cython Version

cimport cython
import numpy as np
cimport numpy as np

cpdef cythonGrayScott(int counts, double Du, double Dv, double F, double k):
    cdef int n = 300
    cdef np.ndarray U = np.zeros((n+2, n+2), dtype=np.float_)
    cdef np.ndarray V = np.zeros((n+2, n+2), dtype=np.float_)
    cdef np.ndarray u = U[1:-1, 1:-1]
    cdef np.ndarray v = V[1:-1, 1:-1]

    cdef int r = 20
    u[:] = 1.0
    U[n/2-r:n/2+r, n/2-r:n/2+r] = 0.50
    V[n/2-r:n/2+r, n/2-r:n/2+r] = 0.25
    u += 0.15*np.random.random((n, n))
    v += 0.15*np.random.random((n, n))

    cdef np.ndarray Lu = np.zeros_like(u)
    cdef np.ndarray Lv = np.zeros_like(v)
    cdef int i, c, r1, c1, r2, c2
    cdef double uvv

Page 8: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Pythran version
Add this line to the original kernel:

#pythran export GrayScott(int, float, float, float, float)

Timings
$ python -m timeit -s 'from grayscott import GrayScott' 'GrayScott(40, 0.16, 0.08, 0.04, 0.06)'
10 loops, best of 3: 52.9 msec per loop
$ cython grayscott.pyx
$ gcc grayscott.c `python-config --cflags --libs` -shared -fPIC -o grayscott.so -O3 -march=
$ python -m timeit -s 'from grayscott import GrayScott' 'GrayScott(40, 0.16, 0.08, 0.04, 0.06)'
10 loops, best of 3: 36.4 msec per loop
$ pythran grayscott.py -O3 -march=native
$ python -m timeit -s 'from grayscott import GrayScott' 'GrayScott(40, 0.16, 0.08, 0.04, 0.06)'
10 loops, best of 3: 20.3 msec per loop
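The three timings on this slide can be turned into speedup ratios with a few lines of arithmetic. This is only a sketch: the labels are assumptions about which run is which (first the unmodified kernel, then the Cython build, then the Pythran build), and the numbers are copied from the slide.

```python
# Timings in milliseconds, copied from the slide.
# The labels are assumed from the order of the commands shown.
timings_ms = {
    "CPython + Numpy": 52.9,
    "Cython": 36.4,
    "Pythran": 20.3,
}

baseline = timings_ms["CPython + Numpy"]

# Speedup relative to the unmodified kernel.
speedups = {name: baseline / t for name, t in timings_ms.items()}

for name, ratio in speedups.items():
    print("%s: %.2fx" % (name, ratio))
```

On these numbers the Pythran build comes out at roughly 2.6 times faster than the original kernel, and the Cython build at roughly 1.45 times.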

Page 9: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Lessons learnt
Explicit is not always better than implicit
Many ``optimization hints'' can be deduced by the compiler
High level constructs carry valuable information

I am not saying Cython is bad. Cython does a great job. It is just pragmatic where Pythran is idealist

Page 10: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Compilation Challenges

u = U[1:-1, 1:-1]
U[n/2-r:n/2+r, n/2-r:n/2+r] = 0.50
u += 0.15*np.random.random((n, n))

Array views
Value broadcasting
Temporary arrays creation
Extended slices composition
Numpy API calls
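The challenges listed on this slide can be observed directly in Numpy. The sketch below is an illustration only: the function name build_grid, the small grid size and the seeded generator are assumptions made for the example, and n // 2 replaces the Python 2 style n/2 of the slides so that it runs on Python 3.

```python
import numpy as np

def build_grid(n=6, r=1, seed=0):
    """Reproduce the three statements of the slide on a small grid."""
    rng = np.random.RandomState(seed)       # seeded for reproducibility
    U = np.zeros((n + 2, n + 2))

    u = U[1:-1, 1:-1]                       # array view: no copy is made
    u[:] = 1.0                              # value broadcasting of a scalar

    c = n // 2
    U[c - r:c + r, c - r:c + r] = 0.50      # extended slice assignment

    # The right-hand side creates temporary arrays before the in-place add.
    u += 0.15 * rng.random_sample((n, n))   # Numpy API call
    return U, u

U, u = build_grid()
print(np.shares_memory(U, u))   # the view and the array share their buffer
```

Because u is a view, every write through u lands in U, which is exactly what a compiler has to track when it translates these statements.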

Page 11: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Optimization Opportunities

Lu = (U[0:-2, 1:-1] + U[1:-1, 0:-2] - 4*U[1:-1, 1:-1] + U[1:-1, 2:] + U[2:, 1:-1])

Many useless temporaries
Lu could be forward-substituted
SIMD instruction generation opportunities
Parallel loop opportunities
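To make the "useless temporaries" point concrete, the sliced expression can be compared with a single fused loop nest that writes each output element once. This is a hand-written sketch of what fusion means, not code generated by Pythran; the function names are hypothetical.

```python
import numpy as np

def laplacian_sliced(U):
    # Each + and * on slices allocates an intermediate array in plain Numpy.
    return (U[0:-2, 1:-1] + U[1:-1, 0:-2] - 4 * U[1:-1, 1:-1]
            + U[1:-1, 2:] + U[2:, 1:-1])

def laplacian_fused(U):
    # One loop nest, one output array, no intermediate arrays.
    rows, cols = U.shape[0] - 2, U.shape[1] - 2
    out = np.empty((rows, cols), dtype=U.dtype)
    for i in range(rows):
        for j in range(cols):
            out[i, j] = (U[i, j + 1] + U[i + 1, j] - 4 * U[i + 1, j + 1]
                         + U[i + 1, j + 2] + U[i + 2, j + 1])
    return out

grid = np.random.RandomState(0).random_sample((8, 8))
print(np.allclose(laplacian_sliced(grid), laplacian_fused(grid)))
```

In CPython the fused version is much slower because the loops are interpreted; the point is only that both forms compute the same values, so a compiler is free to emit the fused form, which is also the natural target for SIMD and parallel loops.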

Page 12: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Pythran Usage

$ pythran --help
usage: pythran [-h] [-o OUTPUT_FILE] [-E] [-e] [-f flag] [-v] [-p pass]
               [-m machine] [-I include_dir] [-L ldflags]
               [-D macro_definition] [-O level] [-g]
               input_file

pythran: a python to C++ compiler

positional arguments:
  input_file      the pythran module to compile, either a .py or a .cpp file

optional arguments:
  -h, --help      show this help message and exit
  -o OUTPUT_FILE  path to generated file
  -E              only run the translator, do not compile
  -e              similar to -E, but does not generate python glue
  -f flag         any compiler switch relevant to the underlying C++ compiler
  -v              be verbose
  -p pass         any pythran optimization to apply before code generation
  -m machine      any machine flag relevant to the underlying C++ compiler

Page 13: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Sample Usage

$ pythran input.py # generates input.so
$ pythran input.py -E # generates input.cpp
$ pythran input.py -O3 -fopenmp # parallel!
$ pythran input.py -march=native -Ofast # Esod Mumixam !

Page 14: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Type Annotations
Only for exported functions

#pythran export foo0()
#pythran export foo1(int)

#pythran export foo2(float32[][])
#pythran export foo2(float64[][])
#pythran export foo2(int8[][][])

#pythran export foo3((int, float), int list, str:str dict)
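Since the annotations are ordinary comments, an annotated module remains valid Python and still runs under the regular interpreter. The sketch below assumes a hypothetical function named scale and reuses the "int list" notation from the slide; whether this exact module compiles depends on the Pythran version in use.

```python
# Two export lines for the same function, as with foo2 on the slide:
# one native version is requested per signature.
#pythran export scale(float list, float)
#pythran export scale(int list, int)
def scale(values, factor):
    """Multiply every element of a list by a factor."""
    return [v * factor for v in values]

# Under CPython the export comments are simply ignored.
print(scale([1, 2, 3], 2))
print(scale([0.5, 1.5], 2.0))
```

Only scale needs an annotation here; any helper it called would have its types inferred from the exported signature.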

Page 15: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Pythran Compilation Flow

[Diagram: input.py → AST (via import ast) → IR (optimization loop) → output.py or input.cpp → input.so (via gcc [...])]

Page 16: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Front End
100% based on the ast module

Supports
  Several standard modules (incl. partial Numpy)
  Polymorphic functions
  ndarray, list, tuple, dict, str, int, long, float
  Named parameters, default arguments
  Generators
  …

Does not Support
  Non-implicitly typed code
  Global variables
  Most Python modules (no CPython mode!)
  User-defined classes
  …
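A front end built on the ast module starts from the standard parser. The sketch below is not Pythran's code: the helper names, the regular expression and the sample module are assumptions used to show how export comments and function definitions could be collected from a source string.

```python
import ast
import re

# Hypothetical sample module: one exported function, one internal helper.
SOURCE = '''
#pythran export foo1(int)
def foo1(n):
    return helper(n) + 1

def helper(x):
    return 2 * x
'''

# Comments are not part of the AST, so export lines are scanned as text.
EXPORT_RE = re.compile(r"^#\s*pythran\s+export\s+(\w+)\s*\((.*)\)\s*$")

def exported_signatures(source):
    """Map each exported function name to its list of signatures."""
    signatures = {}
    for line in source.splitlines():
        match = EXPORT_RE.match(line.strip())
        if match:
            name, args = match.group(1), match.group(2).strip()
            signatures.setdefault(name, []).append(args)
    return signatures

def function_names(source):
    """Return the names of all functions defined in the module."""
    tree = ast.parse(source)
    return [node.name for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef)]

print(exported_signatures(SOURCE))
print(function_names(SOURCE))
```

Here helper has no annotation at all, which matches the slide's statement that annotations are only needed for exported functions.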

Page 17: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Middle End
Iteratively applies high level, Python-aware optimizations:

Interprocedural Constant Folding
For-Based-Loop Unrolling
Forward Substitution
Instruction Selection
Deforestation
Scalar Renaming
Dead Code Elimination

Fun Facts: can evaluate pure functions at compile time ☺
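Constant folding, the first pass in the list, can be sketched as an AST-to-AST rewrite. The version below is a deliberately small, intraprocedural illustration written for Python 3.9 or later (it relies on ast.unparse); the class and function names are assumptions, and it is not Pythran's implementation.

```python
import ast
import operator

# Only a few arithmetic operators are folded in this sketch.
FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}

class ConstantFolder(ast.NodeTransformer):
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        fold = FOLDABLE.get(type(node.op))
        if (fold is not None
                and isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)):
            try:
                value = fold(node.left.value, node.right.value)
            except TypeError:
                return node  # e.g. "a" - 1: leave the error for run time
            return ast.copy_location(ast.Constant(value=value), node)
        return node

def fold_constants(source):
    """Return the source with constant sub-expressions pre-computed."""
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

print(fold_constants("x = 2 * 3 + y"))
```

The rewritten tree stays valid Python, which is what makes a Python back end usable for inspecting the effect of each pass.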

Page 18: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Back Ends

Python Back End
Useful for debugging!

C++11 Back End
C++11 implementation of __builtin__, numpy, itertools…
Lazy evaluation through Expression Templates
Relies on OpenMP, nt2 and boost::simd for the parallelization / vectorization
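The idea behind lazy evaluation through expression templates can be imitated in a few lines of Python: operators build a tree instead of computing arrays, and elements are computed one by one only when the result is requested. This is an analogy with hypothetical class names, not the C++ library that Pythran generates code against.

```python
import operator

class Expr:
    """Operators return a lazy node; nothing is computed here."""
    def __add__(self, other):
        return BinaryExpr(operator.add, self, as_expr(other))
    def __mul__(self, other):
        return BinaryExpr(operator.mul, self, as_expr(other))

class Array(Expr):
    def __init__(self, data):
        self.data = list(data)
    def __len__(self):
        return len(self.data)
    def __getitem__(self, i):
        return self.data[i]

class Scalar(Expr):
    """A scalar that behaves as if broadcast to any length."""
    def __init__(self, value):
        self.value = value
    def __getitem__(self, i):
        return self.value

class BinaryExpr(Expr):
    def __init__(self, op, lhs, rhs):
        self.op, self.lhs, self.rhs = op, lhs, rhs
    def __len__(self):
        side = self.rhs if isinstance(self.lhs, Scalar) else self.lhs
        return len(side)
    def __getitem__(self, i):
        # Element i is computed on demand, without intermediate arrays.
        return self.op(self.lhs[i], self.rhs[i])

def as_expr(value):
    return value if isinstance(value, Expr) else Scalar(value)

def evaluate(expr):
    """Single pass over the indices: the only place an array is allocated."""
    return Array(expr[i] for i in range(len(expr)))

a, b = Array([1, 2, 3]), Array([10, 20, 30])
lazy = a + b * 2              # builds a tree, computes nothing
print(evaluate(lazy).data)
```

In C++ the tree is encoded in template types, so the per-element dispatch shown here is resolved at compile time instead of at run time.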

Page 19: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

C++11: Typing
the W.T.F. slide

Pythran translates Python implicitly statically typed polymorphic code into C++ meta-programs that are instantiated for the user-given types, and specializes them for the target architecture

Page 20: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

C++11: Parallelism

Implicit
Array operations and several numpy functions are written using OpenMP and Boost.simd

Explicit
OpenMP 3 support, ported to Python

#omp parallel for reduction(+:r)
for i, v in enumerate(l):
    r += i * v
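Placed inside a complete function, the annotated loop from this slide looks as follows. The function name weighted_sum and the export line are assumptions added for the example; under CPython the OpenMP directive is an ordinary comment, so the loop runs sequentially and gives the reference result.

```python
#pythran export weighted_sum(float list)
def weighted_sum(l):
    """Sum of index * value over the list."""
    r = 0.0
    #omp parallel for reduction(+:r)
    for i, v in enumerate(l):
        r += i * v
    return r

print(weighted_sum([1.0, 2.0, 3.0]))
```

The reduction clause tells OpenMP that each thread may accumulate its own partial r, with the partial sums added together at the end of the loop. As the Sample Usage slide indicates, the directive is taken into account when compiling with -fopenmp.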

Page 21: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Benchmarks

A collection of high-level benchmarks
https://github.com/serge-sans-paille/numpy-benchmarks

Code gathered from StackOverflow + other compiler code base
Mostly high-level code
Generate results for CPython, PyPy, Numba, Parakeet, Hope and Pythran

Most kernels are too high level for Numba and Hope…

Page 22: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Benchmarks
no parallelism, no vectorisation (, no fat)

Page 23: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet
Page 24: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

(Num)Focus: growcut
From the Numba codebase!

#pythran export growcut(float[][][], float[][][], float[][][], int)
import math
import numpy as np
def window_floor(idx, radius):
    if radius > idx:
        return 0
    else:
        return idx - radius

def window_ceil(idx, ceil, radius):
    if idx + radius > ceil:
        return ceil
    else:
        return idx + radius

def growcut(image, state, state_next, window_radius):
    changes = 0
    sqrt_3 = math.sqrt(3.0)

    height = image.shape[0]
    width = image.shape[1]

Page 25: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

(Num)Focus: growcut

Page 26: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Academic Results
Pythran: Enabling Static Optimization of Scientific Python Programs, S. Guelton, P. Brunet et al. in CSD, 2015
Exploring the Vectorization of Python Constructs Using Pythran and Boost SIMD, S. Guelton, J. Falcou and P. Brunet, in WPMVP, 2014
Compiling Python modules to native parallel modules using Pythran and OpenMP Annotations, S. Guelton, P. Brunet and M. Amini, in PyHPC, 2013
Pythran: Enabling Static Optimization of Scientific Python Programs, S. Guelton, P. Brunet et al. in SciPy, 2013

Page 27: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

Powered by Strong Engineering
Prerequisite for reproducible science

2773 test cases, incl. unit testing, doctest, Continuous integration (thx Travis!)
Peer-reviewed code
Python2.7 and C++11
Linux, OSX (almost okay), Windows (on going)
User and Developer doc: http://pythonhosted.org/pythran/
Hosted on https://github.com/serge-sans-paille/pythran
Releases on PyPi: $ pip install pythran
Custom Debian repo: $ apt-get install pythran

Page 28: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

We need more peons
Pythonic needs a serious cleanup
Typing module needs better error reporting
OSX support is partial and Windows support is on-going
numpy.random and numpy.linalg

Page 29: PyData Paris 2015 - Track 3.2 Serge Guelton et Pierrick Brunet

THE END

AUTHORS
Serge Guelton, Pierrick Brunet, Mehdi Amini, Adrien Merlini, Alan Raynaud, Eliott Coyac…

INDUSTRIAL SUPPORT
Silkan, NumScale, →You←

CONTRIBUTE
#pythran on FreeNode, [email protected], GitHub repo