Software Pipelining in Pegasus/CASH Cody Hartwig Elie Krevat {chartwig,ekrevat}@cs.cmu.edu.

Dec 22, 2015

Software Pipelining in Pegasus/CASH

Cody Hartwig

Elie Krevat

{chartwig,ekrevat}@cs.cmu.edu

Software Pipelining

- Software pipelining is a method for increasing the parallelism available to instruction scheduling
  - Data dependencies limit the opportunity for parallel execution
  - Software pipelining overlaps loop iterations to increase the number of operations available to schedule between dependencies
- Many techniques exist [classification by Allan et al.]
  - Kernel recognition (e.g., Aiken & Nicolau)
    - Assumes the schedule for each iteration is fixed and the loop is unrolled n times
    - Pattern recognition identifies a repeating kernel
  - Modulo scheduling
    - Analyzes data dependencies (resource/precedence constraints)
    - Finds the minimum initiation interval to use when scheduling
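As background for the modulo-scheduling bullet, the minimum initiation interval is conventionally taken as the larger of a resource bound and a recurrence bound. The sketch below is a minimal illustration of that textbook calculation, not code from CASH; the function names and the array-based inputs are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Ceiling division for a >= 0, b > 0. */
static int ceil_div(int a, int b) {
    assert(a >= 0 && b > 0);
    return (a + b - 1) / b;
}

/* Resource bound: resource class r is used uses[r] times per iteration
   and the machine offers units[r] copies of it per cycle. */
int res_mii(const int *uses, const int *units, size_t n_resources) {
    int bound = 1;
    for (size_t r = 0; r < n_resources; r++) {
        int b = ceil_div(uses[r], units[r]);
        if (b > bound) bound = b;
    }
    return bound;
}

/* Recurrence bound: dependence cycle c has total latency latency[c]
   and spans distance[c] loop iterations. */
int rec_mii(const int *latency, const int *distance, size_t n_cycles) {
    int bound = 1;
    for (size_t c = 0; c < n_cycles; c++) {
        int b = ceil_div(latency[c], distance[c]);
        if (b > bound) bound = b;
    }
    return bound;
}

/* The minimum initiation interval is the larger of the two bounds. */
int min_ii(int res_bound, int rec_bound) {
    return res_bound > rec_bound ? res_bound : rec_bound;
}
```

With hypothetical inputs of 4 ALU uses on 2 ALUs, 2 memory uses on 1 port, and one recurrence of latency 5 spanning 2 iterations, the resource bound is 2 and the recurrence bound is 3, so a scheduler would start its search at an interval of 3.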

Software Pipelining in Pegasus/CASH

- Pegasus is an intermediate representation used by the CASH compiler
  - The Pegasus graph models both control flow and data flow
- Our Approach: Apply optimizations to the Pegasus graph, not the generated assembly
  - Abstracts away resource constraints
  - A feedback loop is possible after scheduling and register allocation (e.g., to implement less aggressive pipelining because of register spilling)

How Operations are Pipelined

- Our approach computes operation outputs for future loop iterations in the current iteration
  - Operations are copied into the pre-header, and the data flow for the values before and after that operation is fed into the loop hyperblock
  - Each loop iteration then uses the operation value already computed and computes the operation value for the next iteration
- This approach is analogous to preparing temporary variables for future iterations to make the loop body schedule more efficient
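The temporary-variable analogy can be made concrete at the source level. The following is a minimal sketch, not output of CASH: the function names are hypothetical, and the bounds guard on the look-ahead load is an assumption needed only because plain C cannot speculate a load past the end of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: each iteration loads a[i] and then uses it. */
long sum_squares_plain(const int *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        int v = a[i];
        total += (long)v * v;
    }
    return total;
}

/* Pipelined form: the load for iteration i+1 is issued during iteration i. */
long sum_squares_pipelined(const int *a, size_t n) {
    assert(a != NULL || n == 0);
    if (n == 0) return 0;

    long total = 0;
    int next = a[0];                 /* copy of the load placed in the pre-header */
    for (size_t i = 0; i < n; i++) {
        int v = next;                /* value already computed for this iteration */
        if (i + 1 < n)
            next = a[i + 1];         /* compute the value for the next iteration */
        total += (long)v * v;
    }
    return total;
}
```

Both functions return the same result; in the second, the multiply no longer waits on a load issued in the same iteration, which is the scheduling freedom the transformation is after.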

Choosing Operations to Pipeline via Pattern Matching

- An operation may be pipelined if it matches one of a number of recognized patterns
  - Patterns depend only on the type of operation and the source of its inputs
  - The operation type must allow speculative execution (e.g., loads are OK, but stores are not)
- Operations on the most expensive paths to etas are moved first
  - The most expensive path is not necessarily the longest (e.g., a single ‘load’ operation is more expensive than two ‘add’ operations)
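The difference between the longest path and the most expensive path can be illustrated with a cost-weighted walk over a small dependence graph. This is a sketch under stated assumptions: the node layout, the function names, and the per-operation costs (add = 1, load = 4) are invented for the example and are not the costs CASH uses.

```c
#include <assert.h>

enum op_kind { OP_ADD, OP_LOAD, OP_ETA };

/* Assumed per-operation costs, chosen only for illustration. */
static int op_cost(enum op_kind k) {
    switch (k) {
    case OP_ADD:  return 1;
    case OP_LOAD: return 4;
    default:      return 0;   /* eta nodes only mark the loop exit/back edge */
    }
}

struct op {
    enum op_kind kind;
    int n_preds;
    int preds[2];   /* indices of producing ops; must be smaller than own index */
};

/* ops[] is in topological order. out[i] receives the cost of the most
   expensive path that ends at op i (including op i itself). */
void path_costs(const struct op *ops, int n, int *out) {
    for (int i = 0; i < n; i++) {
        int best = 0;
        for (int p = 0; p < ops[i].n_preds; p++) {
            int src = ops[i].preds[p];
            assert(src >= 0 && src < i);
            if (out[src] > best) best = out[src];
        }
        out[i] = op_cost(ops[i].kind) + best;
    }
}
```

For a graph with one eta fed by a single load and another eta fed by a chain of two adds, the first path costs 4 and the second costs 2 under these assumed costs, so the load would be considered first even though its path is shorter.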

Recognized Patterns

- Arithmetic Operation
- Load Operation
- Cast Operation

As operations are moved, new operations will form the recognized patterns
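The slides do not spell out the exact pattern definitions, so the following sketch makes an explicit assumption: an operation matches when its type is speculatable and every input comes either from a loop-entry value or from an operation that has already been moved. All names are hypothetical, and the cost-based ordering of candidates is omitted. The point is only to show how moving one operation lets its consumers match on a later check.

```c
#include <assert.h>
#include <stdbool.h>

enum kind { K_ARITH, K_LOAD, K_CAST, K_STORE };

struct node {
    enum kind kind;
    int n_inputs;
    int inputs[2];   /* index of producing node, or -1 for a loop-entry value */
    bool moved;
};

/* Stores have side effects, so they cannot be executed speculatively. */
static bool speculatable(enum kind k) {
    return k != K_STORE;
}

/* Assumed pattern: speculatable type, all inputs available at loop entry. */
static bool matches(const struct node *g, int i) {
    if (g[i].moved || !speculatable(g[i].kind)) return false;
    for (int j = 0; j < g[i].n_inputs; j++) {
        int src = g[i].inputs[j];
        if (src >= 0 && !g[src].moved) return false;
    }
    return true;
}

/* Repeatedly move matching operations until none match or the budget
   is exhausted. Returns the number of operations moved. */
int move_matching(struct node *g, int n, int budget) {
    assert(n >= 0 && budget >= 0);
    int moved = 0;
    bool progress = true;
    while (progress && moved < budget) {
        progress = false;
        for (int i = 0; i < n && moved < budget; i++) {
            if (matches(g, i)) {
                g[i].moved = true;
                moved++;
                progress = true;
            }
        }
    }
    return moved;
}
```

On a load feeding an arithmetic operation feeding a store, the load matches first, the arithmetic operation matches once the load has moved, and the store never matches.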

Example

int i = 0;
char a[100];
while (i < 100) {
    char tmp = a[i];
    tmp = tmp * 2;
    a[i] = tmp;
    i++;
}

The load and store are forced to execute in series

Operations in red are available to move

Step 1 Step 2

Load and store are no longer dependent!
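Since the Step 1 and Step 2 graphs did not survive in this transcript, here is a source-level approximation of where the example loop ends up after the load and then the multiply have been moved. It is a sketch, not the Pegasus graph: the function names are hypothetical, and the bounds guard is an assumption that stands in for the speculative load Pegasus can issue.

```c
#include <assert.h>

enum { N = 100 };

/* The example loop as written: load, multiply, and store run in series. */
void double_plain(char a[N]) {
    int i = 0;
    while (i < N) {
        char tmp = a[i];
        tmp = tmp * 2;
        a[i] = tmp;
        i++;
    }
}

/* After moving the load (step 1) and the multiply (step 2). */
void double_pipelined(char a[N]) {
    assert(a != 0);
    int i = 0;
    char next = a[0] * 2;            /* pre-header copies of load and multiply */
    while (i < N) {
        char tmp = next;             /* value prepared by the previous iteration */
        if (i + 1 < N)
            next = a[i + 1] * 2;     /* next iteration's load and multiply */
        a[i] = tmp;                  /* store no longer waits on a load this iteration */
        i++;
    }
}
```

The load of `a[i + 1]` and the store to `a[i]` touch different elements, so within one iteration they can be scheduled independently.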

Evaluation – Moving Average

void move_avg(int *a) {
    int i = 1;
    while (i < 100) {
        int t1 = a[i];
        int t2 = a[i-1];
        a[i] = (t1 + t2) / 2;
        i++;
    }
}

Schedule Length Statistics (after moving 11 operations)

              Before   After
Pre-header       8       14
Loop Body       22       18

Cost of entire function ≈ Cost(Pre-header) + 100*Cost(Loop Body)

Cost before Software Pipelining ≈ 2208

Cost after Software Pipelining ≈ 1814

Software Pipelining improves performance here by ≈ 18%
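The arithmetic behind these figures can be checked directly from the cost model and the table. The helper names below are hypothetical, and the iteration count of 100 is the slide's own approximation.

```c
#include <assert.h>

/* Cost model from the slide: Cost(Pre-header) + iterations * Cost(Loop Body). */
long total_cost(int pre_header, int loop_body, int iterations) {
    assert(pre_header >= 0 && loop_body >= 0 && iterations >= 0);
    return pre_header + (long)iterations * loop_body;
}

/* Relative improvement in percent, rounded to the nearest integer. */
int improvement_percent(long before, long after) {
    assert(before > 0 && after >= 0 && after <= before);
    return (int)((100 * (before - after) + before / 2) / before);
}
```

Plugging in the table values gives 8 + 100 × 22 = 2208 before and 14 + 100 × 18 = 1814 after, a reduction of 394 schedule steps, or about 17.8%, which rounds to the 18% quoted.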

Moving Average – Before Software Pipelining

Moving Average – After Software Pipelining

Pipelined graphs are considerably more complex

Conclusion

Software pipelining at the Pegasus level can significantly shorten loop schedules (≈ 18% lower estimated cost in the moving-average example)

Most regular operation types are pipelinable via our iterative pattern matching algorithm

Cost of improvement is increased register pressure & more complicated Pegasus graphs