Moving Towards Deep Learning Algorithms on HPCC Systems Maryam M. Najafabadi
Moving Toward Deep Learning Algorithms on HPCC Systems

Apr 21, 2017

Data & Analytics

HPCC Systems
Page 1: Moving Toward Deep Learning Algorithms on HPCC Systems

Moving Towards Deep Learning Algorithms on HPCC Systems

Maryam M. Najafabadi

Page 2: Moving Toward Deep Learning Algorithms on HPCC Systems

Overview

• L-BFGS

• HPCC Systems

• Implementation of L-BFGS on HPCC Systems

• SoftMax

• Sparse Autoencoder

• Toward Deep Learning

Page 3: Moving Toward Deep Learning Algorithms on HPCC Systems

Mathematical optimization

• Minimizing/Maximizing a function

Minimum

Page 4: Moving Toward Deep Learning Algorithms on HPCC Systems

Optimization Algorithms in Machine Learning

• Linear Regression

Minimize Errors

Page 5: Moving Toward Deep Learning Algorithms on HPCC Systems

Optimization Algorithms in Machine Learning

• SVM

Maximize Margin

Page 6: Moving Toward Deep Learning Algorithms on HPCC Systems

Optimization Algorithms in Machine Learning

• Collaborative filtering

• K-means

• Maximum likelihood estimation

• Graphical models

• Neural Networks

• Deep Learning

Page 7: Moving Toward Deep Learning Algorithms on HPCC Systems

Formulate Training as an Optimization Problem

• Training model: finding parameters that minimize some objective function

• Define Parameters

• Define an Objective Function: Cost term + Regularization term

• Find values for the parameters that minimize the objective function: Optimization Algorithm

Page 8: Moving Toward Deep Learning Algorithms on HPCC Systems

How they work

Search Direction

Step Length

Page 9: Moving Toward Deep Learning Algorithms on HPCC Systems

Gradient Descent

• Step length: constant value

• Search direction: negative gradient

Page 10: Moving Toward Deep Learning Algorithms on HPCC Systems

Gradient Descent

• Step length: constant value

• Search direction: negative gradient

Small Step Length

Page 11: Moving Toward Deep Learning Algorithms on HPCC Systems

Gradient Descent

• Step length: constant value

• Search direction: negative gradient

Large Step Length
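The effect of the step length can be illustrated with a minimal Python sketch. The test function f(x) = (x - 3)^2, the starting point, and the three step values are assumptions chosen for illustration and are not taken from the slides.

```python
def gradient_descent(grad, x0, step, iters):
    """Gradient descent with a constant step length and the
    negative gradient as the search direction."""
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)
    return x

# Illustrative objective f(x) = (x - 3)^2, whose minimum is at x = 3.
grad = lambda x: 2.0 * (x - 3.0)

small = gradient_descent(grad, 0.0, 0.01, 50)  # small step: slow progress
good = gradient_descent(grad, 0.0, 0.4, 50)    # moderate step: converges
large = gradient_descent(grad, 0.0, 1.1, 50)   # large step: overshoots and diverges
```

With the small step the iterate is still far from the minimum after 50 iterations, while the large step moves it further away on every iteration.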

Page 12: Moving Toward Deep Learning Algorithms on HPCC Systems

Newton Methods

• Step length: use a line search

• Search direction: use curvature information (inverse of the Hessian matrix)

Page 13: Moving Toward Deep Learning Algorithms on HPCC Systems

Quasi Newton Methods

• Problem with large n in Newton methods: calculating the inverse of the Hessian matrix is too expensive

• Quasi-Newton methods continuously update an approximation of the inverse of the Hessian matrix in each iteration

Page 14: Moving Toward Deep Learning Algorithms on HPCC Systems

BFGS

• Broyden, Fletcher, Goldfarb, and Shanno

• Most popular Quasi Newton Method

• Uses Wolfe line search to find step length

• Needs to keep n×n matrix in memory

Page 15: Moving Toward Deep Learning Algorithms on HPCC Systems

L-BFGS

• Limited-memory: only a few vectors of length n (m×n instead of n×n)

• m << n

• Useful for solving large problems (large n)

• More stable learning

• Uses curvature information to take a more direct route: faster convergence
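One common way to compute the L-BFGS search direction from only the stored vectors is the two-loop recursion. The sketch below is a generic pure-Python version, not the ECL implementation described in this talk; the function name and the list-based history are assumptions.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def lbfgs_direction(grad, s_hist, y_hist):
    """Return p = -H * grad, where H is the implicit inverse-Hessian
    approximation built from the stored pairs (oldest first):
    s_i = x_{i+1} - x_i and y_i = grad_{i+1} - grad_i."""
    q = list(grad)
    alphas = []
    # First loop: newest pair to oldest pair.
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        alphas.append((alpha, rho))
    # Initial scaling of the approximation (identity if no history yet).
    if s_hist:
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1])
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest pair.
    for (s, y), (alpha, rho) in zip(zip(s_hist, y_hist), reversed(alphas)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]

# Illustrative history of m = 2 pairs for a problem with n = 2.
s_hist = [[1.0, 0.0], [0.5, 0.5]]
y_hist = [[2.0, 0.0], [1.0, 3.0]]
steepest = lbfgs_direction([1.0, 3.0], [], [])
direction = lbfgs_direction([1.0, 3.0], s_hist, y_hist)
```

Only the m stored pairs are touched, so memory use is on the order of m×n rather than n×n. With no history the direction falls back to the negative gradient.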

Page 16: Moving Toward Deep Learning Algorithms on HPCC Systems

How to use

• Define a function that calculates the objective value and the gradient

ObjectiveFunc(x, ObjectiveFunc_params, TrainData, TrainLabel)
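As a rough illustration of this interface, the Python sketch below returns an objective value and its gradient for a regularized least-squares model. The model, the "lambda" parameter name, and the data layout are assumptions made for the example; the talk's ECL function is not shown in the slides.

```python
def objective_func(x, params, train_data, train_label):
    """Hypothetical objective: mean squared error (cost term)
    plus an L2 penalty (regularization term).
    Returns (objective value, gradient with respect to x)."""
    lam = params["lambda"]
    m = len(train_data)
    n = len(x)
    value = 0.0
    grad = [0.0] * n
    for row, label in zip(train_data, train_label):
        err = sum(xi * ri for xi, ri in zip(x, row)) - label
        value += 0.5 * err * err / m
        for j in range(n):
            grad[j] += err * row[j] / m
    value += 0.5 * lam * sum(xi * xi for xi in x)
    grad = [g + lam * xi for g, xi in zip(grad, x)]
    return value, grad

value, grad = objective_func([1.0, 2.0], {"lambda": 0.1},
                             [[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0])
```

An optimizer such as L-BFGS only needs this pair of outputs at each candidate x, which is why the same optimizer can be reused for different models.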

Page 17: Moving Toward Deep Learning Algorithms on HPCC Systems

Why L-BFGS?

• Toward Deep Learning: optimization is the heart of DL and many other ML algorithms

• Popular

• Advantages over SGD

Page 18: Moving Toward Deep Learning Algorithms on HPCC Systems

HPCC Systems

• Open-source, massively parallel processing platform for big data processing and analytics

• LexisNexis Risk Solutions

• Runs on clusters of commodity hardware on top of the Linux operating system

• Based on the DataFlow programming model

• THOR-ROXIE

• ECL

Page 19: Moving Toward Deep Learning Algorithms on HPCC Systems

THOR

RECORD base

Page 20: Moving Toward Deep Learning Algorithms on HPCC Systems

DataFlow Analysis

• Main focus is how the data is being changed

• A Graph represents a transformation on the data

• Each node is an operation

• Edges show the data flow

Page 21: Moving Toward Deep Learning Algorithms on HPCC Systems

A DataFlow example

Input:

Id  value
1   2
1   3
2   5
1   10
3   4
2   9

Sorted by Id:

Id  value
1   2
1   3
1   10
2   5
2   9
3   4

MAX value per Id:

Id  value
1   10
2   9
3   4
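The same dataflow (sort, then take the maximum per Id) can be mimicked in a few lines of Python. This is only an analogy for the graph above, not ECL code.

```python
from itertools import groupby

rows = [(1, 2), (1, 3), (2, 5), (1, 10), (3, 4), (2, 9)]  # (Id, value)

# Node 1 of the graph: sort by Id (Python's sort is stable).
sorted_rows = sorted(rows, key=lambda r: r[0])

# Node 2 of the graph: MAX of value within each Id.
max_per_id = [(key, max(value for _, value in group))
              for key, group in groupby(sorted_rows, key=lambda r: r[0])]
```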

Page 22: Moving Toward Deep Learning Algorithms on HPCC Systems

ECL

• Enterprise Control Language

• Compiled into optimized C++ code

• Declarative language that provides parallel, distributed, DataFlow-oriented processing

Page 24: Moving Toward Deep Learning Algorithms on HPCC Systems

Declarative

• What to accomplish, rather than How to accomplish

• You’re describing what you’re trying to achieve, without instructing how to do it

Page 27: Moving Toward Deep Learning Algorithms on HPCC Systems

Id  value
1   2
1   3
2   5
1   10
3   4
2   9

Page 31: Moving Toward Deep Learning Algorithms on HPCC Systems

Step            Node 1 (Id, value)                 Node 2 (Id, value)
READ            (1, 2), (1, 3), (3, 4), (1, 10)    (2, 5), (2, 9)
LOCAL SORT      (1, 2), (1, 3), (1, 10), (3, 4)    (2, 5), (2, 9)
LOCAL GROUP     (1, 2), (1, 3), (1, 10), (3, 4)    (2, 5), (2, 9)
LOCAL AGG/MAX   (1, 10), (3, 4)                    (2, 9)
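The per-node steps can be simulated in Python as below. The dictionary standing in for the two nodes is an assumption for illustration; on a real cluster each node would hold its own partition.

```python
from itertools import groupby

# Rows as read on each node: node number -> list of (Id, value).
nodes = {
    1: [(1, 2), (1, 3), (3, 4), (1, 10)],
    2: [(2, 5), (2, 9)],
}

def local_max(rows):
    """LOCAL SORT, LOCAL GROUP and LOCAL AGG/MAX on one node's rows."""
    rows = sorted(rows, key=lambda r: r[0])            # LOCAL SORT by Id
    groups = groupby(rows, key=lambda r: r[0])         # LOCAL GROUP by Id
    return [(key, max(value for _, value in group))    # LOCAL AGG/MAX
            for key, group in groups]

result = {node: local_max(rows) for node, rows in nodes.items()}
```

Because all rows with the same Id sit on the same node in this example, the local results are already the global answer and no data has to move between nodes.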

Page 32: Moving Toward Deep Learning Algorithms on HPCC Systems

Back to L-BFGS

• Minimize f(x)

• Start with an initial point x_0

• Repeatedly update: x_{k+1} = x_k + α_k p_k

• Step length α_k: Wolfe line search

• Search direction p_k: L-BFGS

Page 33: Moving Toward Deep Learning Algorithms on HPCC Systems

• If x is too large, it does not fit in the memory of one machine

• Needs m × n memory

• Distribute x across different machines

• Do computations locally where possible

• Do global computations only as necessary

Page 38: Moving Toward Deep Learning Algorithms on HPCC Systems

• Dot Product

1, 3, 6, 8, 10, 9, 1, 2, 3, 9, 8

2, 3, 3, 8, 3, 11, 1, 2, 5, 9, 5

Page 39: Moving Toward Deep Learning Algorithms on HPCC Systems

• Dot Product

            Node 1        Node 2         Node 3
Vector 1    1, 3, 6, 8    10, 9, 1, 2    3, 9, 8
Vector 2    2, 3, 3, 8    3, 11, 1, 2    5, 9, 5

Page 40: Moving Toward Deep Learning Algorithms on HPCC Systems

• Dot Product

                     Node 1        Node 2         Node 3
Vector 1             1, 3, 6, 8    10, 9, 1, 2    3, 9, 8
Vector 2             2, 3, 3, 8    3, 11, 1, 2    5, 9, 5
LOCAL Dot Product    93            134            136

Page 41: Moving Toward Deep Learning Algorithms on HPCC Systems

• Dot Product

                     Node 1        Node 2         Node 3
Vector 1             1, 3, 6, 8    10, 9, 1, 2    3, 9, 8
Vector 2             2, 3, 3, 8    3, 11, 1, 2    5, 9, 5
LOCAL Dot Product    93            134            136

Global Summation: 93 + 134 + 136 = 363
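A minimal Python sketch of the same two-stage computation, using the two vectors from the slide. The partition sizes (4, 4, 3) follow the node layout shown; the helper name is an assumption.

```python
a = [1, 3, 6, 8, 10, 9, 1, 2, 3, 9, 8]
b = [2, 3, 3, 8, 3, 11, 1, 2, 5, 9, 5]

def split(vector, sizes):
    """Cut a vector into consecutive partitions, one per node."""
    parts, start = [], 0
    for size in sizes:
        parts.append(vector[start:start + size])
        start += size
    return parts

sizes = [4, 4, 3]  # elements held by Node 1, Node 2, Node 3
a_parts, b_parts = split(a, sizes), split(b, sizes)

# LOCAL step: each node multiplies and sums only its own elements.
local_dots = [sum(x * y for x, y in zip(pa, pb))
              for pa, pb in zip(a_parts, b_parts)]

# GLOBAL step: one small value per node is summed.
total = sum(local_dots)
```

Only one number per node crosses the network, regardless of how long the vectors are.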

Page 42: Moving Toward Deep Learning Algorithms on HPCC Systems

Using ECL for implementing L-BFGS

x = 0.1, 0.3, 0.6, 0.8, 0.2, 0.7, 0.5, 0.5, 0.5, 0.3, 0.4, 0.6, 0.7, 0.7

Page 46: Moving Toward Deep Learning Algorithms on HPCC Systems

Using ECL for implementing L-BFGS

x = 0.1, 0.3, 0.6, 0.8, 0.2, 0.7, 0.5, 0.5, 0.5, 0.3, 0.4, 0.6, 0.7, 0.7

Node_id  partition_values
1        0.1, 0.3, 0.6, 0.8
2        0.2, 0.7, 0.5, 0.5
3        0.5, 0.3, 0.4, 0.6
4        0.7, 0.7

[Figure: each record is distributed to the node given by its Node_id (Node 1, Node 2, Node 3, Node 4)]
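The partitioned layout can be sketched in Python as a list of records, one per node. The field names mirror the table above; the fixed partition size of 4 is an assumption read off the example.

```python
x = [0.1, 0.3, 0.6, 0.8, 0.2, 0.7, 0.5, 0.5, 0.5, 0.3, 0.4, 0.6, 0.7, 0.7]

def partition_vector(vector, part_size):
    """Cut a vector into consecutive partitions and tag each
    with the id of the node that should hold it (1-based)."""
    return [
        {"node_id": start // part_size + 1,
         "partition_values": vector[start:start + part_size]}
        for start in range(0, len(vector), part_size)
    ]

records = partition_vector(x, 4)
```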

Page 47: Moving Toward Deep Learning Algorithms on HPCC Systems

Example of LOCAL operations

• Scale

Page 48: Moving Toward Deep Learning Algorithms on HPCC Systems

Example of LOCAL operations

• Scale

x:

Node_id  partition_values
1        0.1, 0.3, 0.6, 0.8
2        0.2, 0.7, 0.5, 0.5
3        0.5, 0.3, 0.4, 0.6
4        0.7, 0.7

Page 49: Moving Toward Deep Learning Algorithms on HPCC Systems

Example of LOCAL operations

• Scale

x_10:

Node_id  partition_values
1        1, 3, 6, 8
2        2, 7, 5, 5
3        5, 3, 4, 6
4        7, 7
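Scaling needs no communication, since every element is updated independently. A small Python sketch, with a dictionary standing in for the four nodes (an assumption for illustration):

```python
# node_id -> partition_values of x held on that node
x_parts = {
    1: [0.1, 0.3, 0.6, 0.8],
    2: [0.2, 0.7, 0.5, 0.5],
    3: [0.5, 0.3, 0.4, 0.6],
    4: [0.7, 0.7],
}

def scale_local(parts, factor):
    """Multiply every element by factor; each node only touches its own partition."""
    return {node: [factor * value for value in values]
            for node, values in parts.items()}

x_10 = scale_local(x_parts, 10)
```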

Page 51: Moving Toward Deep Learning Algorithms on HPCC Systems

Example of Global operation

• Dot Product

x:

Node_id  partition_values
1        0.1, 0.3, 0.6, 0.8
2        0.2, 0.7, 0.5, 0.5
3        0.5, 0.3, 0.4, 0.6
4        0.7, 0.7

x_10:

Node_id  partition_values
1        1, 3, 6, 8
2        2, 7, 5, 5
3        5, 3, 4, 6
4        7, 7

Page 52: Moving Toward Deep Learning Algorithms on HPCC Systems

Example of Global operation

• Dot Product

dot_local:

Node_id  dot_value
1        2.27
2        1.39
3        1.67
4        0.98

Page 54: Moving Toward Deep Learning Algorithms on HPCC Systems

L-BFGS based Implementations

• Softmax

• Sparse Autoencoder

Page 55: Moving Toward Deep Learning Algorithms on HPCC Systems

SoftMax Regression

• Generalizes logistic regression

• More than two classes

• MNIST -> 10 different classes

Page 56: Moving Toward Deep Learning Algorithms on HPCC Systems

Formulate as an optimization problem

• Parameters: K × f variables

• Objective function: generalizes the logistic regression objective function

• Define a function to calculate the objective value and gradient at a given point
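A minimal pure-Python sketch of such a function for SoftMax, using the average negative log-likelihood without a regularization term. The function name, the 0-based class labels, and the sample-per-row data layout are assumptions for the example and do not reflect the ECL implementation.

```python
import math

def softmax_objective(theta, data, labels):
    """theta: K x f parameters; data: m samples with f features each;
    labels: class index in 0..K-1 for each sample.
    Returns (cost, gradient), where the gradient is K x f."""
    K, f, m = len(theta), len(theta[0]), len(data)
    cost = 0.0
    grad = [[0.0] * f for _ in range(K)]
    for x, y in zip(data, labels):
        scores = [sum(t * xi for t, xi in zip(row, x)) for row in theta]
        top = max(scores)                      # subtract max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        cost -= math.log(probs[y]) / m
        for k in range(K):
            coeff = (probs[k] - (1.0 if k == y else 0.0)) / m
            for j in range(f):
                grad[k][j] += coeff * x[j]
    return cost, grad

theta0 = [[0.0, 0.0] for _ in range(3)]        # K = 3 classes, f = 2 features
cost, grad = softmax_objective(theta0, [[1.0, 2.0], [3.0, 4.0]], [0, 2])
```

With all parameters at zero every class is equally likely, so the cost equals log(K).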

Page 57: Moving Toward Deep Learning Algorithms on HPCC Systems

SoftMax Results

Page 58: Moving Toward Deep Learning Algorithms on HPCC Systems

SoftMax Results

Dataset           Size      Iterations  Function evaluations  Time
Lshtc-large       410 GB    61          81                    1 hour
Wikipedia-medium  1,048 GB  12          21                    Half an hour

Cluster size: 400 nodes

Page 59: Moving Toward Deep Learning Algorithms on HPCC Systems

More Examples

• Parameter matrix in SoftMax: K × f

• Data Matrix: f × m

• Multiply these two matrices

• Result is K × m

Page 60: Moving Toward Deep Learning Algorithms on HPCC Systems

If parameter matrix is small

[Figure: K × f parameter matrix and f × m data matrix; the data matrix is split across Node1, Node2 and Node3 as m1, m2, m3; a LOCAL JOIN on each node gives K×m1, K×m2 and K×m3, which together form the K×m result]
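The column-split strategy can be checked with a small Python sketch. Here every "node" uses the full parameter matrix and its own block of data columns; the matrices and block boundaries are made-up illustration values.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))] for i in range(len(a))]

theta = [[1, 2], [3, 4], [5, 6]]             # K = 3, f = 2 (small parameter matrix)
data = [[1, 0, 2, 1, 3], [0, 1, 1, 2, 1]]    # f = 2, m = 5
col_blocks = [(0, 2), (2, 4), (4, 5)]        # columns m1, m2, m3 on three nodes

# Local step on each node: K x f times f x m_i gives K x m_i.
local_results = [matmul(theta, [row[start:end] for row in data])
                 for start, end in col_blocks]

# The local blocks placed side by side form the K x m result.
result = []
for i in range(len(theta)):
    row = []
    for block in local_results:
        row.extend(block[i])
    result.append(row)
```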

Page 67: Moving Toward Deep Learning Algorithms on HPCC Systems

If both matrices are big

[Figure: the K × f parameter matrix is split by columns into f1, f2, f3 and the f × m data matrix is split by rows into f1, f2, f3; each matching pair produces a partial K×m product; the three K×m partial products are combined with ROLLUP]
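When the split runs along f, each node produces a full-size K × m partial product and the partial products have to be added together. The Python sketch below uses an element-wise sum as a stand-in for that combining step; the matrices and block boundaries are made-up illustration values.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    inner = len(b)
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))] for i in range(len(a))]

theta = [[1, 2, 3], [4, 5, 6]]       # K = 2, f = 3
data = [[1, 0], [2, 1], [0, 3]]      # f = 3, m = 2
f_blocks = [(0, 1), (1, 2), (2, 3)]  # f1, f2, f3 on three nodes

partials = []
for start, end in f_blocks:
    theta_cols = [row[start:end] for row in theta]   # K x f_i
    data_rows = data[start:end]                      # f_i x m
    partials.append(matmul(theta_cols, data_rows))   # partial K x m

# Combine: element-wise sum of the partial K x m products.
rolled_up = [[sum(p[i][j] for p in partials) for j in range(len(data[0]))]
             for i in range(len(theta))]
```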

Page 73: Moving Toward Deep Learning Algorithms on HPCC Systems

Sparse Autoencoder

• Autoencoder: the target output is the same as the input

• Sparsity: constrain the hidden neurons to be inactive most of the time

• Stacking them up makes a deep network

Page 74: Moving Toward Deep Learning Algorithms on HPCC Systems

Formulate as an optimization problem

• Parameters: weight and bias values

• Objective function: difference between output and expected output

• Penalty term to impose sparsity

• Define a function to calculate the objective value and gradient at a given point
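The slides do not say which sparsity penalty is used. One common choice, assumed here purely for illustration, is the KL divergence between a target activation rho and the average activation of each hidden neuron:

```python
import math

def kl_sparsity_penalty(rho, rho_hats):
    """Sum over hidden neurons of KL(rho || rho_hat_j), where rho is the
    target average activation and rho_hat_j is the observed average
    activation of hidden neuron j (both strictly between 0 and 1)."""
    return sum(
        rho * math.log(rho / r) + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - r))
        for r in rho_hats
    )

on_target = kl_sparsity_penalty(0.05, [0.05, 0.05, 0.05])
too_active = kl_sparsity_penalty(0.05, [0.5, 0.4, 0.05])
```

The penalty is zero when every neuron matches the target and grows as neurons become more active than the target, which pushes the hidden layer toward sparse activations.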

Page 75: Moving Toward Deep Learning Algorithms on HPCC Systems

Sparse Autoencoder results

• 10,000 randomly selected 8×8 patches

Page 76: Moving Toward Deep Learning Algorithms on HPCC Systems

Sparse Autoencoder results

• MNIST dataset

Page 77: Moving Toward Deep Learning Algorithms on HPCC Systems

Toward Deep Learning

• Feed the features learned by one sparse autoencoder as input to another sparse autoencoder

• Stack them up to build a deep network

• Fine tuning: use forward propagation to calculate the cost value and back propagation to calculate gradients

• Use L-BFGS to fine tune

Page 78: Moving Toward Deep Learning Algorithms on HPCC Systems

SUMMARY

• HPCC Systems allows implementation of large-scale ML algorithms

• Optimization algorithms are an important aspect of advanced machine learning problems

• L-BFGS implemented on HPCC Systems: SoftMax, Sparse Autoencoder

• Implement other algorithms by calculating objective value and gradient

• Toward deep learning

Page 79: Moving Toward Deep Learning Algorithms on HPCC Systems

• HPCC Systems: https://hpccsystems.com/

• ECL-ML Library: https://github.com/hpcc-systems/ecl-ml

• My GitHub: https://github.com/maryamregister

• My Email: [email protected]

Page 80: Moving Toward Deep Learning Algorithms on HPCC Systems

Questions?

Thank You
