Computer & Information Sciences - University of Delaware Colloquium / 55 Exhaustive Phase Order Search Space Exploration and Evaluation by Prasad Kulkarni (Florida State University)
Exhaustive Phase Order Search Space Exploration and Evaluation

Jan 29, 2016

Page 1: Exhaustive Phase Order Search Space Exploration and Evaluation

Computer & Information Sciences - University of Delaware Colloquium / 55

Exhaustive Phase Order Search Space Exploration and Evaluation

by

Prasad Kulkarni

(Florida State University)

Page 2: Exhaustive Phase Order Search Space Exploration and Evaluation

Compiler Optimizations

• To improve efficiency of compiler-generated code

• Optimization phases require enabling conditions
– need specific patterns in the code
– many also need available registers

• Phases interact with each other

• Applying optimizations in different orders generates different code

Page 3: Exhaustive Phase Order Search Space Exploration and Evaluation

Phase Ordering Problem

• To find an ordering of optimization phases that produces optimal code with respect to possible phase orderings

• Evaluating each sequence involves compiling, assembling, linking, executing, and verifying results

• Best optimization phase ordering depends on
– source application
– target platform
– implementation of optimization phases

• Long-standing problem in compiler optimization!

Page 4: Exhaustive Phase Order Search Space Exploration and Evaluation

Phase Ordering Space

• Current compilers incorporate numerous different optimization phases
– 15 distinct phases in our compiler backend

• 15! = 1,307,674,368,000

• Phases can enable each other
– any phase can be active multiple times

• 15^15 = 437,893,890,380,859,375
– cannot restrict sequence length to 15

• 15^44 = 5.598 * 10^51
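The three search-space sizes quoted on this slide can be checked with a few lines of arithmetic. This is only a sanity check of the numbers; the constant name `NUM_PHASES` is chosen here for illustration.

```python
import math

NUM_PHASES = 15

# Orderings in which each of the 15 phases is applied exactly once.
permutations = math.factorial(NUM_PHASES)

# Sequences of length 15 in which any phase may repeat.
repeats_len_15 = NUM_PHASES ** 15

# Sequences of length 44 in which any phase may repeat.
repeats_len_44 = NUM_PHASES ** 44

print(f"15!   = {permutations:,}")
print(f"15^15 = {repeats_len_15:,}")
print(f"15^44 = {repeats_len_44:.3e}")
```

Python integers have arbitrary precision, so the first two values are exact; only the last is rounded for display.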

Page 5: Exhaustive Phase Order Search Space Exploration and Evaluation

Addressing Phase Ordering

• Exhaustive Search – universally considered intractable

• We are now able to exhaustively evaluate the optimization phase order space.

Page 6: Exhaustive Phase Order Search Space Exploration and Evaluation

Re-stating of Phase Ordering

• Earlier approach
– explicitly enumerate all possible optimization phase orderings

• Our approach
– explicitly enumerate all function instances that can be produced by any combination of phases

Page 7: Exhaustive Phase Order Search Space Exploration and Evaluation

Outline

• Experimental framework

• Exhaustive phase order space evaluation

• Faster conventional compilation

• Conclusions

• Summary of my other work

• Future research directions

Page 9: Exhaustive Phase Order Search Space Exploration and Evaluation

Experimental Framework

• We used the VPO compilation system
– established compiler framework, started development in 1988
– comparable performance to gcc -O2

• VPO performs all transformations on a single representation (RTLs), so it is possible to perform most phases in an arbitrary order

• Experiments use all the 15 re-orderable optimization phases in VPO

• Target architecture was the StrongARM SA-100 processor

Page 10: Exhaustive Phase Order Search Space Exploration and Evaluation

VPO Optimization Phases

ID  Optimization Phase
b   branch chaining
c   common subexpression elimination
d   remove unreachable code
g   loop unrolling
h   dead assignment elimination
i   block reordering
j   minimize loop jumps
k   register allocation
l   loop transformations
n   code abstraction
o   evaluation order determination
q   strength reduction
r   reverse branches
s   instruction selection
u   remove useless jumps

Page 11: Exhaustive Phase Order Search Space Exploration and Evaluation

Disclaimers

• Did not include optimization phases normally associated with compiler front ends
– no memory hierarchy optimizations
– no inlining or other interprocedural optimizations

• Did not vary how phases are applied

• Did not include optimizations that require profile data

Page 12: Exhaustive Phase Order Search Space Exploration and Evaluation

Benchmarks

• 12 MiBench benchmarks; 244 functions

Category   Program       Description
auto       bitcount      test processor bit manipulation abilities
           qsort         sort strings using quicksort sorting algorithm
network    dijkstra      Dijkstra’s shortest path algorithm
           patricia      construct patricia trie for IP traffic
telecomm   fft           fast fourier transform
           adpcm         compress 16-bit linear PCM samples to 4-bit
consumer   jpeg          image compression and decompression
           tiff2bw       convert color .tiff image to b&w image
security   sha           secure hash algorithm
           blowfish      symmetric block cipher with variable length key
office     stringsearch  searches for given words in phrases
           ispell        fast spelling checker

Page 14: Exhaustive Phase Order Search Space Exploration and Evaluation

Terminology

• Active phase – An optimization phase that modifies the function representation

• Dormant phase – A phase that is unable to find any opportunity to change the function

• Function instance – any semantically, syntactically, and functionally correct representation of the source function (that can be produced by our compiler)

Page 15: Exhaustive Phase Order Search Space Exploration and Evaluation

Naïve Optimization Phase Order Space

• All combinations of optimization phase sequences are attempted

[Figure: tree of phase sequences over phases a, b, c, d, levels L0 to L2]

Page 16: Exhaustive Phase Order Search Space Exploration and Evaluation

Eliminating Consecutively Applied Phases

• A phase just applied in our compiler cannot be immediately active again

[Figure: tree of phase sequences over phases a, b, c, d, levels L0 to L2, with consecutively applied phases pruned]

Page 17: Exhaustive Phase Order Search Space Exploration and Evaluation

Eliminating Dormant Phases

• Get feedback from the compiler indicating if any transformations were successfully applied in a phase.

[Figure: tree of phase sequences over phases a, b, c, d, levels L0 to L2, with dormant phases pruned]

Page 18: Exhaustive Phase Order Search Space Exploration and Evaluation

Identical Function Instances

• Some optimization phases are independent
– example: branch chaining & register allocation

• Different phase sequences can produce the same code

First sequence
  r[2] = 1;
  r[3] = r[4] + r[2];
instruction selection
  r[3] = r[4] + 1;

Second sequence
  r[2] = 1;
  r[3] = r[4] + r[2];
constant propagation
  r[2] = 1;
  r[3] = r[4] + 1;
dead assignment elimination
  r[3] = r[4] + 1;

Page 19: Exhaustive Phase Order Search Space Exploration and Evaluation

Equivalent Function Instances

Source Code
  sum = 0;
  for (i = 0; i < 1000; i++)
    sum += a[i];

Register Allocation before Code Motion
  r[10]=0;
  r[12]=HI[a];
  r[12]=r[12]+LO[a];
  r[1]=r[12];
  r[9]=4000+r[12];
L3
  r[8]=M[r[1]];
  r[10]=r[10]+r[8];
  r[1]=r[1]+4;
  IC=r[1]?r[9];
  PC=IC<0,L3;

Code Motion before Register Allocation
  r[11]=0;
  r[10]=HI[a];
  r[10]=r[10]+LO[a];
  r[1]=r[10];
  r[9]=4000+r[10];
L5
  r[8]=M[r[1]];
  r[11]=r[11]+r[8];
  r[1]=r[1]+4;
  IC=r[1]?r[9];
  PC=IC<0,L5;

After Mapping Registers
  r[32]=0;
  r[33]=HI[a];
  r[33]=r[33]+LO[a];
  r[34]=r[33];
  r[35]=4000+r[33];
L01
  r[36]=M[r[34]];
  r[32]=r[32]+r[36];
  r[34]=r[34]+4;
  IC=r[34]?r[35];
  PC=IC<0,L01;
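The register mapping on this slide can be sketched as a renaming pass: registers and labels are renumbered in order of first appearance, so two instances that differ only in register and label names become textually identical. The function name `canonicalize`, the regular expressions, and the starting register number 32 (read off the slide's mapped listing) are assumptions for illustration, not VPO's actual implementation.

```python
import re

def canonicalize(rtl: str, first_reg: int = 32) -> str:
    """Rename registers and labels in order of first appearance (illustrative)."""
    reg_map = {}
    label_map = {}

    def rename_reg(match):
        old = match.group(1)
        if old not in reg_map:
            reg_map[old] = first_reg + len(reg_map)
        return f"r[{reg_map[old]}]"

    def rename_label(match):
        old = match.group(0)
        if old not in label_map:
            label_map[old] = f"L{len(label_map) + 1:02d}"
        return label_map[old]

    renamed = re.sub(r"r\[(\d+)\]", rename_reg, rtl)
    return re.sub(r"\bL\d+\b", rename_label, renamed)

# The two listings from the slide, written on one line each.
alloc_first = ("r[10]=0; r[12]=HI[a]; r[12]=r[12]+LO[a]; r[1]=r[12]; r[9]=4000+r[12]; "
               "L3 r[8]=M[r[1]]; r[10]=r[10]+r[8]; r[1]=r[1]+4; IC=r[1]?r[9]; PC=IC<0,L3;")
motion_first = ("r[11]=0; r[10]=HI[a]; r[10]=r[10]+LO[a]; r[1]=r[10]; r[9]=4000+r[10]; "
                "L5 r[8]=M[r[1]]; r[11]=r[11]+r[8]; r[1]=r[1]+4; IC=r[1]?r[9]; PC=IC<0,L5;")

same = canonicalize(alloc_first) == canonicalize(motion_first)
print(same)
```

Both inputs map to the "After Mapping Registers" listing, which is why the two phase orders count as one equivalent function instance.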

Page 20: Exhaustive Phase Order Search Space Exploration and Evaluation

Efficient Detection of Unique Function Instances

• After pruning dormant phases there may be tens or hundreds of thousands of unique instances

• Use a CRC (cyclic redundancy check) checksum on the bytes of the RTLs representing the instructions

• Used a hash table to check if an identical or equivalent function instance already exists in the DAG
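A minimal sketch of the detection scheme described above, assuming each function instance is available as a canonical text string. The class name `InstanceTable` and the use of `zlib.crc32` are illustrative choices; the full byte comparison inside a bucket is an extra safeguard added here against checksum collisions and is not claimed to be part of the original system.

```python
import zlib

class InstanceTable:
    """Hash table of function instances keyed by CRC and length (illustrative)."""

    def __init__(self):
        self._buckets = {}

    def add(self, rtl_text: str) -> bool:
        """Return True if the instance is new, False if it was already stored."""
        data = rtl_text.encode("utf-8")
        key = (zlib.crc32(data), len(data))
        bucket = self._buckets.setdefault(key, [])
        if data in bucket:  # byte-for-byte check guards against CRC collisions
            return False
        bucket.append(data)
        return True

table = InstanceTable()
first = table.add("r[3]=r[4]+1;")
repeat = table.add("r[3]=r[4]+1;")
other = table.add("r[2]=1; r[3]=r[4]+r[2];")
print(first, repeat, other)
```

Keying on the checksum keeps each lookup cheap even when hundreds of thousands of instances are stored.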

Page 21: Exhaustive Phase Order Search Space Exploration and Evaluation

Eliminating Identical/Equivalent Function Instances

• Resulting search space is a DAG of function instances

[Figure: DAG of function instances with edges labeled by phases a, b, c, d, levels L0 to L2]

Page 22: Exhaustive Phase Order Search Space Exploration and Evaluation

Static Enumeration Results

Function             Inst.   Fn_inst    Len   CF    Batch vs. optimal
start_input_bmp (j)  1,372   120,777    25    70    1.41
correct (i)          1,295   1,348,154  25    663   4.18
main (t)             1,276   2,882,021  29    389   16.25
parse_switches (j)   1,228   180,762    20    53    0.41
start_input_gif (j)  1,009   39,352     21    18    2.46
start_input_tga (j)  972     63,458     21    30    1.66
askmode (i)          942     232,453    24    108   7.87
skiptoword (i)       901     439,994    22    103   1.45
start_input_ppm (j)  795     8,521      16    45    2.70
Average (234)        196.2   89,946.7   14.7  36.2  6.46

Page 23: Exhaustive Phase Order Search Space Exploration and Evaluation

Exhaustively enumerated the optimization phase order space to find an optimal phase ordering with respect to code-size

[Published in CGO ’06]

Page 24: Exhaustive Phase Order Search Space Exploration and Evaluation

Determining Program Performance

• Almost 175,000 distinct function instances, on average
– largest enumerated function has 2,882,021 instances

• Too time consuming to execute each distinct function instance
– assemble, link, and execute is more expensive than compilation

• Many embedded development environments use simulation
– simulation is orders of magnitude more expensive than execution

• Use data obtained from a few executions to estimate the performance of all remaining function instances

Page 25: Exhaustive Phase Order Search Space Exploration and Evaluation

Determining Program Performance (cont...)

• Function instances having identical control-flow graphs execute each block the same number of times

• Execute application once for each control-flow structure

• Statically estimate the number of cycles required to execute each basic block

• dynamic frequency measure = Σ (static cycles * block frequency), summed over the basic blocks
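The estimate above can be sketched as a weighted sum. All block names, cycle counts, and frequencies below are hypothetical values invented for illustration; the only idea taken from the slide is that instances sharing a control-flow graph share block frequencies, so one execution suffices for all of them.

```python
def dynamic_frequency_measure(static_cycles, block_frequency):
    """Sum of (estimated cycles per block) * (times the block executes)."""
    return sum(static_cycles[block] * count
               for block, count in block_frequency.items())

# Measured once for a given control-flow structure (hypothetical numbers).
block_frequency = {"B1": 1, "B2": 100, "B3": 100, "B4": 1}

# Static cycle estimates for two instances with that same structure.
instance_a = {"B1": 4, "B2": 6, "B3": 3, "B4": 2}
instance_b = {"B1": 4, "B2": 5, "B3": 3, "B4": 2}

measure_a = dynamic_frequency_measure(instance_a, block_frequency)
measure_b = dynamic_frequency_measure(instance_b, block_frequency)
print(measure_a, measure_b)
```

Saving one cycle in a block that runs 100 times lowers the estimate by 100, which is how the measure ranks instances without executing each one.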

Page 26: Exhaustive Phase Order Search Space Exploration and Evaluation

Predicting Relative Performance – I

[Figure: two function instances with per-block cycle estimates and execution frequencies. Total cycles = 744 for the first instance and 789 for the second.]

Page 27: Exhaustive Phase Order Search Space Exploration and Evaluation

Dynamic Frequency Results

                                                              % from optimal
Function            Inst.  Fn_inst    Len   CF    Leaf    Batch  Worst
main (t)            1,276  2,882,021  29    389   15,164  0.0    84.3
parse_switches (j)  1,228  180,762    20    53    2,027   6.7    64.8
askmode (i)         942    232,453    24    108   475     8.4    56.2
skiptoword (i)      901    439,994    22    103   2,834   6.1    49.6
start_input_ppm (j) 795    8,521      16    45    80      1.7    28.4
pfx_list_chk (i)    640    1,269,638  44    136   4,660   4.3    78.6
main (f)            624    2,789,903  33    122   4,214   7.5    46.1
sha_transform (h)   541    548,812    32    98    5,262   9.6    133.4
main (p)            483    14,510     15    10    178     7.7    13.1
Average (79)        234.3  174,574.8  16.1  47.4  813.4   4.8    65.4

Page 28: Exhaustive Phase Order Search Space Exploration and Evaluation

Correlation – Dynamic Frequency Counts Vs. Simulator Cycles

• Static performance estimation is inaccurate
– ignored cache/branch misprediction penalties

• Most embedded systems have simpler architectures
– estimation may be sufficiently accurate
– simulator cycles are close to executed cycles

• We show strong correlation between our measure of performance and simulator cycles

Page 29: Exhaustive Phase Order Search Space Exploration and Evaluation

Complete Function Correlation

• Example: init_search in stringsearch

Page 30: Exhaustive Phase Order Search Space Exploration and Evaluation

Leaf Function Correlation

• Leaf function instances are generated when no additional phases can be successfully applied

• Leaf instances provide a good sampling
– represents the only code that can be generated by an aggressive compiler, like VPO
– at least one leaf instance represents an optimal phase ordering for over 86% of functions
– significant percent of leaf instances among optimal

Page 31: Exhaustive Phase Order Search Space Exploration and Evaluation

Leaf Function Correlation Statistics

• Pearson’s correlation coefficient

$$P_{corr} = \frac{\sum xy - \left(\sum x \sum y\right)/n}{\sqrt{\left(\sum x^2 - \left(\sum x\right)^2/n\right)\left(\sum y^2 - \left(\sum y\right)^2/n\right)}}$$

• Accuracy of our estimate of optimal perf.

$$L_{corr} = \frac{\text{cycle count for best leaf}}{\text{cycle count for leaf with best dynamic frequency count}}$$
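Both statistics on this slide can be computed directly from paired per-leaf measurements. In the sketch below, x is taken to be the dynamic frequency count and y the simulator cycle count of each leaf instance; that pairing, the function names, and all sample numbers are assumptions for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient in the sum-based form shown on the slide."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - (sum_x * sum_y) / n
    denominator = math.sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator

def leaf_ratio(cycles, freq_counts):
    """Cycle count of the best leaf over that of the leaf with the best estimate."""
    predicted_best = min(range(len(cycles)), key=lambda i: freq_counts[i])
    return min(cycles) / cycles[predicted_best]

# Hypothetical leaves: cycles are an exact linear function of the estimate.
freq_counts = [100, 120, 150, 90]
cycles = [210, 250, 310, 190]
p_corr = pearson(freq_counts, cycles)
l_corr = leaf_ratio(cycles, freq_counts)

# Hypothetical case where the estimate picks the slower of two leaves.
l_corr_miss = leaf_ratio([200, 190], [90, 100])
print(p_corr, l_corr, l_corr_miss)
```

A ratio of 1.0 means the leaf with the best estimate is also the fastest leaf; values below 1.0 show how far the chosen leaf is from the true best.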

Page 32: Exhaustive Phase Order Search Space Exploration and Evaluation

Leaf Function Correlation Statistics (cont…)

                        Lcorr 0%        Lcorr 1%
Function        Pcorr   Ratio  Leaves   Ratio  Leaves
AR_btbl...(b)   1.00    1.00   1        1.00   1
BW_btbl...(b)   1.00    1.00   2        1.00   2
bit_count.(b)   1.00    1.00   2        1.00   2
bit_shifter(b)  1.00    1.00   2        1.00   2
bitcount(b)     0.89    0.92   1        0.92   1
main(b)         1.00    1.00   6        1.00   23
ntbl_bitcnt(b)  1.00    0.95   2        0.95   2
ntbl_bit…(b)    0.99    1.00   2        1.00   2
dequeue(d)      0.99    1.00   6        1.00   6
dijkstra(d)     1.00    0.97   4        1.00   269
....            ….      ….     ….       ….     ….
average         0.96    0.98   4.38     0.996  21

Page 33: Exhaustive Phase Order Search Space Exploration and Evaluation

Exhaustively evaluated the optimization phase order space to find a near-optimal phase ordering with respect to simulator cycles

[Published in LCTES ’06]

Page 35: Exhaustive Phase Order Search Space Exploration and Evaluation

Phase Enabling Interaction

• b enables a along the path a-b-a

[Figure: DAG of function instances with edges labeled by phases a, b, c, d]

Page 36: Exhaustive Phase Order Search Space Exploration and Evaluation

Phase Enabling Probabilities

[Table: the blank cells of the original slide were not preserved, so each row lists its entries in order without column alignment]

Ph  St b c d g h i j k l n o q r s u
b   0.72 0.02 0.01 0.04 0.01 0.02 0.66
c   1.00 0.01 0.68 0.01 0.02 0.07 0.05 0.15 0.34
d   1.00 1.00 1.00
g   0.22 0.28 0.17 0.05 0.02 0.14 0.34 0.09 0.15
h   0.08 0.16 0.14 0.02 0.01 0.20
i   0.72 0.04 0.01 0.09
j   0.03 0.06 0.44
k   0.98 0.28 0.01 0.02 0.01 0.96
l   0.60 0.73 0.02 0.01 0.01 0.03 0.53
n   0.41 0.36 0.01 0.01 0.01 0.29
o   0.88 0.40 0.03
q   0.99 0.02 0.99
r   0.57 0.06 0.06
s   1.00 0.33 0.41 0.83 0.07 0.05 0.15 0.07
u   0.01 0.01 0.02

Page 37: Exhaustive Phase Order Search Space Exploration and Evaluation

Phase Disabling Interaction

• b disables a along the path b-c-d

[Figure: DAG of function instances with edges labeled by phases a, b, c, d]

Page 38: Exhaustive Phase Order Search Space Exploration and Evaluation

Disabling Probabilities

[Table: the blank cells of the original slide were not preserved, so each row lists its entries in order without column alignment]

Ph  b c d g h i j k l n o q r s u
b   1.00 0.28 0.09 0.18 0.20 0.11 0.01
c   0.01 1.00 0.02 0.08 0.02 0.30 0.32 1.00 0.08
d   1.00 0.03 0.01 0.01
g   0.13 1.00 0.06 0.01 0.12 0.22
h   0.01 0.01 1.00 0.04 0.10 1.00 0.01
i   0.02 0.22 1.00 0.20 0.01 0.44 0.91
j   0.01 0.08 1.00 0.01 0.16
k   0.01 0.05 1.00 0.05 0.14 1.00
l   0.02 1.00 0.11 0.04 0.07 1.00 0.32 1.00
n   0.07 0.01 0.02 0.01 0.01 1.00 1.00 0.01
o   0.01 0.08 0.01 1.00
q   1.00
r   0.06 0.20 0.36 1.00 0.05
s   0.07 0.03 0.31 0.22 0.14 0.26 0.02 1.00
u   0.41 0.02 0.34 0.15 1.00

Page 39: Exhaustive Phase Order Search Space Exploration and Evaluation

Faster Conventional Compiler

• Modified VPO to use enabling and disabling phase probabilities to decrease compilation time

# p[i] - current probability of phase i being active
# e[i][j] - probability of phase j enabling phase i
# d[i][j] - probability of phase j disabling phase i

For each phase i do
    p[i] = e[i][st];
While (any p[i] > 0) do
    Select j as the current phase with highest probability of being active
    Apply phase j
    If phase j was active then
        For each phase i, where i != j do
            p[i] += ((1-p[i]) * e[i][j]) - (p[i] * d[i][j])
    p[j] = 0
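The pseudocode above can be turned into a small runnable sketch. The function name `probabilistic_compile`, the dictionary layout of `e` and `d`, the `apply_phase` callback, and the two-phase toy compiler are all assumptions for illustration. The reset `p[j] = 0` is applied after every attempt, active or dormant, which is the reading under which the loop terminates.

```python
def probabilistic_compile(phases, e, d, apply_phase):
    """Apply phases in order of estimated probability of being active.

    e[i][j] / d[i][j]: probability that phase j enables / disables phase i;
    e[i]["st"] is the probability that phase i is active at the start.
    apply_phase(j) must return True if phase j changed the code.
    """
    p = {i: e.get(i, {}).get("st", 0.0) for i in phases}
    attempted = []
    while any(prob > 0 for prob in p.values()):
        j = max(phases, key=lambda i: p[i])
        attempted.append(j)
        if apply_phase(j):
            for i in phases:
                if i != j:
                    enable = e.get(i, {}).get(j, 0.0)
                    disable = d.get(i, {}).get(j, 0.0)
                    p[i] += (1 - p[i]) * enable - p[i] * disable
        p[j] = 0.0  # a phase just attempted cannot be active again immediately
    return attempted

# Toy compiler (hypothetical): each phase can change the code exactly once.
remaining = {"a": 1, "b": 1}

def apply_phase(name):
    if remaining[name] > 0:
        remaining[name] -= 1
        return True
    return False

e = {"a": {"st": 1.0, "b": 0.5}, "b": {"st": 0.5, "a": 1.0}}
d = {}
order = probabilistic_compile(["a", "b"], e, d, apply_phase)
print(order)
```

In the toy run, phase a is attempted a second time because b may have re-enabled it; once that attempt is dormant, every probability is zero and compilation stops.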

Page 40: Exhaustive Phase Order Search Space Exploration and Evaluation

Probabilistic Compilation Results

                  Old Compilation    Prob. Compilation  Prob. / Old
Function          Attempted  Active  Attempted  Active  Time   Size   Speed
start_inp...(j)   233        16      55         14      0.469  1.014  N/A
parse_swi...(j)   233        14      53         12      0.371  1.016  0.972
start_inp...(j)   270        15      55         14      0.353  1.010  N/A
start_inp...(j)   233        14      49         13      0.420  1.003  N/A
start_inp...(j)   231        11      53         12      0.436  1.004  1.000
fft_float(f)      463        28      99         25      0.451  1.012  0.974
main(f)           284        20      73         18      0.550  1.007  1.000
sha_trans...(h)   284        17      67         16      0.605  0.965  0.953
read_scan...(j)   233        13      43         10      0.342  1.018  N/A
LZWReadByte(j)    268        12      45         11      0.325  1.014  N/A
main(j)           270        12      57         14      0.375  1.007  1.000
dijkstra(d)       231        9       43         9       0.409  1.010  1.000
....              ....       ....    ....       ....    ....   ....   ....
average           230.3      8.9     47.7       9.6     0.297  1.015  1.005

Page 42: Exhaustive Phase Order Search Space Exploration and Evaluation

Conclusions

• Phase ordering problem
– long-standing problem in compiler optimization
– exhaustive evaluation always considered infeasible

• Exhaustively evaluated the phase order space
– re-interpretation of the problem
– novel application of search algorithms
– fast pruning techniques
– accurate prediction of relative performance

• Analyzed properties of the phase order space to speed up conventional compilation

• Published in CGO ’06, LCTES ’06, submitted to TOPLAS

Page 43: Exhaustive Phase Order Search Space Exploration and Evaluation

Challenges

• Exhaustive phase order search is a severe stress test for the compiler
– isolate analysis required and invalidated by each phase
– produce correct code for all phase orderings
– eliminate all memory leaks

• Search algorithm needs to be highly efficient
– used CRCs and hashes for function comparisons
– stored intermediate function instances to reduce disk access
– maintained logs to restart search after crash

Page 45: Exhaustive Phase Order Search Space Exploration and Evaluation

VISTA

• Provides an interactive code improvement paradigm
– view low-level program representation
– apply existing phases and manual changes in any order
– browse and undo previous changes
– automatically obtain performance information
– automatically search for effective phase sequences

• Useful as a research as well as teaching tool
– employed in three universities

• Published in LCTES ’03, TECS ’06

Page 46: Exhaustive Phase Order Search Space Exploration and Evaluation

VISTA – Main Window

Page 47: Exhaustive Phase Order Search Space Exploration and Evaluation

Faster Genetic Algorithm Searches

• Improving performance of genetic algorithms
– avoid redundant executions of the application
  • over 87% of executions were avoided
  • reduced search time by 62%
– modify search to obtain comparable results in fewer generations
  • reduced GA generations by 59%
  • reduced search time by 35%

• Published in PLDI ’04, TACO ’05

Page 48: Exhaustive Phase Order Search Space Exploration and Evaluation

Heuristic Search Algorithms

• Analyzing the phase order space to improve heuristic algorithms
– detailed performance and cost comparison of different heuristic algorithms
– demonstrated the importance and difficulty of selecting the correct sequence length
– illustrated the importance of leaf function instances
– proposed modifications to existing algorithms, and new search algorithms

• Will be published in CGO ’07

Page 49: Exhaustive Phase Order Search Space Exploration and Evaluation

Dynamic Compilation

• Explored asynchronous dynamic compilation in a virtual machine
– demonstrated shortcomings of current popular compilation strategy
– described importance of minimum compiler utilization
– discussed new compilation strategies
– explored the changes needed to current compilation strategies to exploit free cycles

• Submitted to VEE ’07

Page 51: Exhaustive Phase Order Search Space Exploration and Evaluation

Compiler Technology Challenges

High Level Language

• Support for parallelism
– traditional languages
– express parallelism
– dynamic scheduling

• Virtual machines
– dynamic code generation and optimization

• Push compilation decisions further down

Compiler

Machine Architecture

• multi-core
– heterogeneous cores

• No great solution
– performance monitoring
– software-controlled reconfiguration

• Can no longer do it alone

Page 52: Exhaustive Phase Order Search Space Exploration and Evaluation

Iterative Compilation & Machine Learning

• Improved scope for iterative compilation & machine learning
– proliferation of new architectures
  • automate tuning compiler heuristics
– tuning important libraries
– using performance monitors
  • dynamic JIT compilers

• How to use machine learning to optimize and schedule more efficiently?

Page 53: Exhaustive Phase Order Search Space Exploration and Evaluation

Dynamic Compilation

• Virtual machines likely to grow in importance
– productivity, portability, interoperability, isolation...

• Challenges
– when, what, how to parallelize
– using hardware performance monitors
– using static analyses to aid dynamic compilation
– debugging tools for correctness and performance debugging

Page 54: Exhaustive Phase Order Search Space Exploration and Evaluation

Heterogeneous Multi-core Architectures

• Can provide the best performance, cost, power balance

• Challenges
– schedule tasks, allocate resources
– dynamic core-specific optimization
– automatic data layout to prevent conflicts

Page 55: Exhaustive Phase Order Search Space Exploration and Evaluation

Questions ?

Page 56: Exhaustive Phase Order Search Space Exploration and Evaluation

Leaf Vs. Non-Leaf Performance

Page 57: Exhaustive Phase Order Search Space Exploration and Evaluation

Phase Order Space Evaluation – Summary

[Flowchart boxes: generate next optimization sequence; last phase active?; identical function instance?; equivalent function instance?; add node to DAG; seen control-flow structure?; simulate application; calculate function performance. Each decision box has Y and N branches.]
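The enumeration part of this flowchart can be sketched on a toy model. Here a function instance is a set of remaining optimization "opportunities" and each phase removes one if it can; the three phases, the opportunity names, and the breadth-first traversal are hypothetical stand-ins chosen so the example runs, not the VPO phases or the authors' search code. Performance evaluation is left out.

```python
from collections import deque

# Hypothetical toy phases: each returns a new instance, or the same one if dormant.
def phase_a(inst):
    return inst - {"x"}

def phase_b(inst):
    return inst - {"y"}

def phase_c(inst):
    # Only enabled after phase a has consumed opportunity "x".
    return inst - {"z"} if "x" not in inst else inst

PHASES = {"a": phase_a, "b": phase_b, "c": phase_c}

def enumerate_instances(start):
    """Build the DAG of distinct function instances reachable from start."""
    edges = {start: {}}  # instance -> {phase name: successor instance}
    worklist = deque([start])
    while worklist:
        inst = worklist.popleft()
        for name, phase in PHASES.items():
            successor = phase(inst)
            if successor == inst:       # dormant phase: nothing to add
                continue
            edges[inst][name] = successor
            if successor not in edges:  # identical instance seen before: reuse node
                edges[successor] = {}
                worklist.append(successor)
    return edges

dag = enumerate_instances(frozenset("xyz"))
leaves = [inst for inst, out in dag.items() if not out]
print(len(dag), leaves)
```

The toy space has 6 distinct instances and a single leaf, compared with 27 naive phase sequences of length 3, which is the kind of collapse that makes exhaustive enumeration feasible.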

Page 65: Exhaustive Phase Order Search Space Exploration and Evaluation

Predicting Relative Performance – II

[Figure: basic-block cycle estimates and execution frequencies for two function instances. Total cycles = 170 for the first; the total for the second is marked unknown (??).]

Page 66: Exhaustive Phase Order Search Space Exploration and Evaluation

Case when No Leaf is Optimal