Scaling High-Performance Python with Minimal Effort. Ehsan Totoni, Research Scientist, Intel Labs. STAC Summit NYC, Nov. 1st, 2017
Scaling High-Performance Python with Minimal Effort

Ehsan Totoni

Research Scientist, Intel Labs

STAC Summit NYC, Nov. 1st, 2017

Legal Disclaimer

No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.

Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade.

This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.

The products and services described may contain defects or errors known as errata which may cause deviations from published specifications. Current characterized errata are available on request.

Intel, the Intel logo, Intel Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

2017 © Intel Corporation.

*Other names and brands may be claimed as the property of others

Motivation

Data analytics is the greatest value driver in technology

Financial services need insights from data

• Exploit market data for financial modeling, etc.

High-performance big data analytics is crucial

• Democratize HPC for data scientists

http://www.businesscomputingworld.co.uk/

Productivity-Performance Gap

Scripting languages like Python are productive but slow and serial

Big data frameworks (Hadoop/Spark) are hard to use and slow

• High overhead runtime libraries

• Not based on parallel computing fundamentals

High performance requires low-level programming

• Not practical for interactive workflows of data scientists and their expertise

python.org, llnl.org, isocpp.org, Infoobjects.com

Motivation

[Figure: Logistic Regression on Amazon AWS. Execution time (s, log scale) of Spark vs. MPI/C++ on 1 (36), 2 (72), and 4 (144) Amazon AWS c4.8xlarge instances (vCPUs). Data labels as extracted: 127.22, 64.1, 4.28, 2.15, 1.11. Annotated gap: 53x.]

Totoni et al. “A Case Against Tiny Tasks in Iterative Analytics”, HotOS’17

NOT STAC BENCHMARK

Overview

High performance/scalability for analytics/ML/AI with little effort

• Minimal changes to scripting source code

Compiler optimization and parallelization

• Scripting program → efficient parallel binary

High Performance Analytics Toolkit (HPAT)

• Python (previously Julia)

https://github.com/IntelLabs/hpat

https://github.com/IntelLabs/HPAT.jl

python.org

HPAT Python Example

    import h5py
    import numpy as np
    import hpat

    @hpat.jit
    def logistic_regression(iterations):
        # Load the feature matrix and the response vector from HDF5
        f = h5py.File("lr.hdf5", "r")
        X = f['points'][:]
        Y = f['responses'][:]
        D = X.shape[1]
        w = np.ones(D) - 0.5
        for i in range(iterations):
            # Subtract the gradient of the logistic loss log(1 + exp(-Y * X.w))
            w -= np.dot(((1.0 / (1.0 + np.exp(-Y * np.dot(X, w))) - 1.0) * Y), X)
        return w

33x speedup on 4 nodes

Example launch command:

    mpirun -n 144 python logistic_regression.py

NumPy code is implicitly data-parallel
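To make the update rule in this example concrete, here is a minimal sketch in plain NumPy, without HPAT or MPI. The synthetic data, the helper `logistic_loss`, and the `step` parameter (with the gradient averaged over samples) are assumptions added here to keep the example small and numerically stable; the slide's code reads `lr.hdf5` and subtracts the raw gradient.

```python
import numpy as np

def logistic_regression(X, Y, iterations, step=0.5):
    """Batch gradient descent on the logistic loss; labels Y are in {-1, +1}."""
    w = np.ones(X.shape[1]) - 0.5
    for _ in range(iterations):
        # Same gradient expression as on the slide
        grad = np.dot((1.0 / (1.0 + np.exp(-Y * np.dot(X, w))) - 1.0) * Y, X)
        # Assumption: scale by a step size and the sample count for stability
        w -= step * grad / len(Y)
    return w

def logistic_loss(X, Y, w):
    """Mean logistic loss, used here only to check that training helps."""
    return np.mean(np.log1p(np.exp(-Y * np.dot(X, w))))

# Hypothetical synthetic data standing in for lr.hdf5
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))
true_w = rng.standard_normal(10)
Y = np.where(np.dot(X, true_w) > 0, 1.0, -1.0)

w0 = np.ones(10) - 0.5
w = logistic_regression(X, Y, 200)
print("loss before:", logistic_loss(X, Y, w0))
print("loss after: ", logistic_loss(X, Y, w))
```

Every operation in the loop is a whole-array NumPy expression, which is the property the slide refers to when it calls NumPy code implicitly data-parallel.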

Data Parallelism Extraction

D = A * B + C

Recognize parallelism:

    parfor i=1:n
        t[i]=A[i]*B[i]
    parfor i=1:n
        D[i]=t[i]+C[i]

Fuse loops:

    parfor i=1:n
        D[i]=A[i]*B[i]+C[i]

[Diagram: expression tree for D = A * B + C, with * applied to A and B, and + applied to that product and C.]

1 Anderson et al. “Parallelizing Julia with a Non-invasive DSL”, ECOOP’17
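The two steps on this slide (recognize parallelism, then fuse loops) can be illustrated with ordinary Python loops. This is only a sketch of the transformation's effect, not HPAT's compiler output; the function names `unfused` and `fused` are hypothetical.

```python
import numpy as np

def unfused(A, B, C):
    """Two element-wise loops with a temporary array t, as after recognition."""
    n = len(A)
    t = np.empty(n)
    D = np.empty(n)
    for i in range(n):      # parfor 1: element-wise multiply
        t[i] = A[i] * B[i]
    for i in range(n):      # parfor 2: element-wise add
        D[i] = t[i] + C[i]
    return D

def fused(A, B, C):
    """One loop after fusion: no temporary array, one pass over the data."""
    n = len(A)
    D = np.empty(n)
    for i in range(n):
        D[i] = A[i] * B[i] + C[i]
    return D

A = np.array([1.0, 2.0, 3.0])
B = np.array([4.0, 5.0, 6.0])
C = np.array([0.5, 0.5, 0.5])
print(unfused(A, B, C))
print(fused(A, B, C))
```

Both versions compute the same result as `A * B + C`; the fused form avoids allocating `t` and reads each input element once.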

Spark Workflow vs. HPAT Workflow

Spark Workflow: Python code → (rewrite by programmer) → Spark API code → Spark Runtime → Cluster/cloud (Driver; Executor 0, Executor 1, …, Executor N-1)

HPAT Workflow: Python code → (compile by HPAT) → Parallel binary (MPI) → Cluster/cloud (Rank 0, Rank 1, …, Rank N-1)

Performance Evaluation

[Figure: two bar charts of execution time (s, log scale) for Spark, MPI/C++, and HPAT on Kernel Density, Linear Regression, Logistic Regression, and K-Means.]

Cori at NERSC/LBL, 64 nodes (2048 cores): 370x-2000x speedup of HPAT vs Spark. Data labels as extracted: 61, 767, 830, 351; 0.013, 10.5, 0.24, 0.03; 2.06, 0.98, 0.95.

Amazon AWS, 4 nodes c4.8xlarge (144 vCPUs): 20x-256x speedup of HPAT vs Spark. Data labels as extracted: 46.2, 102, 64.1, 547; 0.08, 1.47, 1.09, 0.83; 0.18, 5.08, 1.81, 2.91.

HPAT is within 2-4x of MPI/C++

HPAT Julia used; Python will be similar

Totoni et al. “HPAT: High Performance Analytics with Scripting Ease-of-Use”, ICS’17

NOT STAC BENCHMARKS

Pandas Example

    @hpat.jit(locals={'s_open': hpat.float64[:], …})
    def intraday_mean_revert():
        f = h5py.File("stock_data.hdf5", "r"); …
        for i in prange(nsyms):
            symbol = sym_list[i]
            s_open = f[symbol+'/Open'][:]; …
            df = pd.DataFrame({'Open': s_open, …})
            df['Stdev'] = df['Close'].rolling(window=90).std()
            df['Moving Average'] = df['Close'].rolling(window=20).mean()
            df['Criteria1'] = (df['Open'] - df['Low'].shift(1)) < -df['Stdev']
            df['Criteria2'] = df['Open'] > df['Moving Average']
            df['BUY'] = df['Criteria1'] & df['Criteria2']
            df['Pct Change'] = (df['Close'] - df['Open']) / df['Open']
            df['Rets'] = df['Pct Change'][df['BUY'] == True]
            n_days = len(df['Rets'])
            res = np.zeros(max_num_days)
            if n_days:
                res[-n_days:] = df['Rets'].fillna(0)
            all_res += res

100x speedup on 36 cores

http://www.pythonforfinance.net/2017/02/20/intraday-stock-mean-reversion-trading-backtest-in-python/

Explicit loop parallelism
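The per-symbol body of this loop can be tried in plain pandas on one synthetic symbol. The sketch below is not the HPAT version: the function name `mean_revert_returns` and the random price series are hypothetical stand-ins for the HDF5 data, and `Series.where` replaces the slide's boolean-index assignment, which likewise leaves NaN on non-signal days.

```python
import numpy as np
import pandas as pd

def mean_revert_returns(df):
    """Daily returns on days where the mean-reversion buy signal fires, else 0."""
    df = df.copy()
    df['Stdev'] = df['Close'].rolling(window=90).std()
    df['Moving Average'] = df['Close'].rolling(window=20).mean()
    # Signal 1: the open gaps below the previous low by more than one Stdev
    df['Criteria1'] = (df['Open'] - df['Low'].shift(1)) < -df['Stdev']
    # Signal 2: the open is above the 20-day moving average
    df['Criteria2'] = df['Open'] > df['Moving Average']
    df['BUY'] = df['Criteria1'] & df['Criteria2']
    df['Pct Change'] = (df['Close'] - df['Open']) / df['Open']
    # Keep returns only on BUY days; other days become NaN and then 0
    return df['Pct Change'].where(df['BUY']).fillna(0).to_numpy()

# Hypothetical synthetic prices for a single symbol
rng = np.random.default_rng(0)
n = 300
close = 1000 + np.cumsum(rng.standard_normal(n))
open_ = close + rng.standard_normal(n)
low = np.minimum(open_, close) - np.abs(rng.standard_normal(n))
df = pd.DataFrame({'Open': open_, 'Low': low, 'Close': close})

rets = mean_revert_returns(df)
print("signal days:", int(np.count_nonzero(rets)))
```

The first 89 rows can never signal, because the 90-day rolling standard deviation is still NaN there and any comparison against NaN is false.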

HPAT Operations

NumPy:

• Element-wise operations: +, /, ==, exp, log, sqrt, …

• Array creation: zeros, ones_like, random, normal, …

• Others: sum, prod, dot, …

Pandas:

• Column access and operations: df.A, df['A'], df.A.std()

• Filter: df[df.A > .5]

• Rolling windows: df.A.rolling(window=5).mean()

Parallel loop:

    for i in prange(n):
        s += A[i]**2
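The parallel loop on this slide is a sum reduction. As a rough illustration of why such a loop can be split across workers, the sketch below reduces each chunk independently and then combines the partial sums. It runs serially in plain Python; `sum_of_squares_chunked` and `num_workers` are hypothetical names, and this is not a description of how HPAT implements `prange`.

```python
import numpy as np

def sum_of_squares_serial(A):
    """The loop from the slide, with range in place of prange."""
    s = 0.0
    for i in range(len(A)):
        s += A[i] ** 2
    return s

def sum_of_squares_chunked(A, num_workers):
    """Each hypothetical worker reduces its own chunk; partials are then combined."""
    chunks = np.array_split(A, num_workers)
    partials = [sum_of_squares_serial(chunk) for chunk in chunks]
    return sum(partials)

A = np.arange(10, dtype=np.float64)
print(sum_of_squares_serial(A))
print(sum_of_squares_chunked(A, 3))
```

Because addition is associative up to floating-point rounding, the chunked result agrees with the serial one, which is what makes the reduction safe to parallelize.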

Variable Type Limitation

Input code to HPAT should be statically compilable (type stable)

• Dynamic code example:

    if flag1:
        a = 2
    else:
        a = np.ones(n)
    if isinstance(a, np.ndarray):
        doWork(a)

    if flag2:
        f = np.zeros
    else:
        f = np.ones
    b = f(m)

• Rare in analytics
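One way to remove the dynamic function choice in the second snippet is to call the array constructor directly in each branch, so that `b` has a single type on every path and each call target is known statically. This is a sketch of the general idea only: the function names are hypothetical, and the slides do not state whether HPAT accepts this exact form.

```python
import numpy as np

def make_b_dynamic(flag2, m):
    """Dynamic version: f is bound to one of two functions at run time."""
    if flag2:
        f = np.zeros
    else:
        f = np.ones
    return f(m)

def make_b_static(flag2, m):
    """Type-stable version: every call target is fixed, b is always a float array."""
    if flag2:
        b = np.zeros(m)
    else:
        b = np.ones(m)
    return b

print(make_b_static(True, 3))
print(make_b_static(False, 3))
```

Both versions return the same arrays; only the second keeps every variable's type and every call target fixed at compile time.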

Pandas Limitation

Data Frame column accesses should be static

• Dynamic code example:

    for i in range(5):
        A += df['c'+str(i)]

• Refactor to:

    A += df['c0']
    A += df['c1']
    A += df['c2']
    A += df['c3']
    A += df['c4']
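The refactoring on this slide does not change the result, which can be checked in plain pandas on a small hypothetical DataFrame. The column values and the function names below are made up for illustration, and `.to_numpy()` is added so that `A` stays a NumPy array throughout.

```python
import numpy as np
import pandas as pd

# Hypothetical data: columns c0..c4, where column ci holds [i, i+1, i+2]
df = pd.DataFrame({'c' + str(i): np.arange(3, dtype=np.float64) + i
                   for i in range(5)})

def dynamic_sum(df):
    """Column names are built at run time, so they are not known statically."""
    A = np.zeros(len(df))
    for i in range(5):
        A += df['c' + str(i)].to_numpy()
    return A

def static_sum(df):
    """Column names are literals, so each access can be resolved at compile time."""
    A = np.zeros(len(df))
    A += df['c0'].to_numpy()
    A += df['c1'].to_numpy()
    A += df['c2'].to_numpy()
    A += df['c3'].to_numpy()
    A += df['c4'].to_numpy()
    return A

print(dynamic_sum(df))
print(static_sum(df))
```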

Summary

Compiler approach superior to library approach for analytics

• Higher performance

• Easier to use

• Simpler infrastructure

• Broader functionality

HPAT bridges productivity-performance gap

• Compiles Python programs to efficient parallel binaries

• Available on GitHub: https://github.com/IntelLabs/hpat

References

E. Totoni, A. Roy, S. R. Dulloor, “A Case Against Tiny Tasks in Iterative Analytics”, HotOS’17

E. Totoni, T. A. Anderson, T. Shpeisman, “HPAT: High Performance Analytics with Scripting Ease-of-Use”, ICS’17, https://arxiv.org/abs/1611.04934

T. A. Anderson, H. Liu, L. Kuper, E. Totoni, J. Vitek, T. Shpeisman, “Parallelizing Julia with a Non-invasive DSL”, ECOOP’17

E. Totoni, W. Hassan, T. A. Anderson, T. Shpeisman, “HiFrames: High Performance Data Frames in a Scripting Language”, (arxiv) 2017, https://arxiv.org/abs/1704.02341