Compiler Optimization Overview 1. Computer Hardware Architecture Review 2. Analysis 3. Optimizations 4. Continuing Development

Dec 21, 2015

Compiler Optimization Overview

1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development

Review: Phases of a Compiler

Intermediate code optimizations are not machine specific

Low level optimizations can be machine specific

Review: Compiler Options

Review: Basic Processor Parts

Review: CISC vs RISC

CISC (e.g., Intel x86)
Multi-clock complex instructions
Memory access incorporated in instructions
Complex instruction set

RISC (e.g., Mac PowerBook)
Single-clock instructions
Memory accesses are separate instructions
Simple instruction set

Review: Memory Hierarchy

Memory access becomes exponentially slower at higher levels

Memory access intensive programs require special optimizations

Review: Multiple Cores

Need to create and use instruction-level parallelism (ILP)

Multiple cores on the same die can share cache, letting them work together faster

Can only execute trivial parallelism (Dr. Doughty)

Must eliminate hazards

Review: Pipelines

Review: Pipelines

Compiler Optimization Overview

1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development

Optimization Goals

Speed
Executable size
Memory access
Power usage (embedded)
Debugging

Optimizing for Speed*

Useful for CPU intensive applications (graphics, video editing, sorting)

Scheduling for out-of-order execution
Removal of dependencies to increase ILP
Instruction latency
Multiple ALUs, cores, etc.
Mix instruction types (int, float, mult, read, write)
Eliminate jumps
Buffer writes (writes cannot be performed out of order)
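The "removal of dependencies to increase ILP" item can be shown at source level. The following is a minimal sketch in Python, chosen only for readability. The interpreter itself gains nothing from this rewrite; what matters is the shape a compiler would give the machine code. A single accumulator creates one serial chain of additions. Splitting the sum across two independent accumulators lets a superscalar core keep two additions in flight at once. The function names are illustrative and do not come from the slides.

```python
def sum_serial(xs: list[int]) -> int:
    # Every addition depends on the previous one: one long dependence chain.
    acc = 0
    for x in xs:
        acc += x
    return acc


def sum_two_chains(xs: list[int]) -> int:
    # Even and odd positions feed independent accumulators, so the two
    # additions in each iteration do not depend on each other.
    acc0 = acc1 = 0
    n = len(xs)
    i = 0
    while i + 1 < n:
        acc0 += xs[i]
        acc1 += xs[i + 1]
        i += 2
    if i < n:              # odd-length input: one element left over
        acc0 += xs[i]
    return acc0 + acc1     # combine the two chains once, at the end


data = list(range(1, 102))
print(sum_serial(data), sum_two_chains(data))  # 5151 5151
```

Integer addition is associative, so both versions return the same result. With floating-point data, reassociation can change the rounding. For that reason compilers generally apply this rewrite to floating-point sums only when a relaxed floating-point mode allows it.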

Optimizing for Size

More common for embedded applications
Competes with power and speed optimizations

Limiting code size to keep critical loops in memory

Choose form of instruction that is smaller (CISC)

Use short constants for jumps (simpler form of addressing)

Increase instruction length for loop alignment

Optimizing for Memory

Useful for memory I/O intensive applications
Consider proper alignment of data and instructions to reduce cache misses and improve paging behavior
Use instructions for controlling the cache
Partially addresses the von Neumann bottleneck
Reading the lowest-level cache on a P4 takes 3 clocks
Each higher level is an order of magnitude larger (10, 100)
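Blocking (tiling) is one common way to act on the cache-miss concerns above. The sketch below uses a matrix transpose and assumes a tile size of 4, picked purely for illustration; real compilers derive tile sizes from cache parameters. The tiled version finishes one small block before moving on, so the rows it touches are more likely to still be cached. Python lists of lists are not contiguous arrays, so this example only demonstrates the loop transformation, not the speed-up.

```python
def transpose_naive(a: list[list[int]]) -> list[list[int]]:
    rows, cols = len(a), len(a[0])
    out = [[0] * rows for _ in range(cols)]
    for i in range(rows):
        for j in range(cols):
            out[j][i] = a[i][j]          # each inner loop walks down a column of `out`
    return out


def transpose_tiled(a: list[list[int]], tile: int = 4) -> list[list[int]]:
    rows, cols = len(a), len(a[0])
    out = [[0] * rows for _ in range(cols)]
    for ii in range(0, rows, tile):
        for jj in range(0, cols, tile):
            # Finish one tile x tile block before moving on, so the few rows of
            # `a` and `out` it touches can stay resident in cache.
            for i in range(ii, min(ii + tile, rows)):
                for j in range(jj, min(jj + tile, cols)):
                    out[j][i] = a[i][j]
    return out


m = [[i * 7 + j for j in range(7)] for i in range(5)]
print(transpose_tiled(m, tile=3) == transpose_naive(m))   # True
```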

Analysis

Alias
Control flow
Data flow
Dependence
Interprocedural

Alias Analysis

Determines whether there are multiple ways to access the same memory location

Knowing aliases helps identify optimizations by recognizing data dependencies and locating redundant code/data updates

Alias analysis is critical for global optimizations (reference parameters, globally defined data, pointers)
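Here is a minimal sketch of why aliases block optimizations. In C the situation usually involves two pointer parameters; here, Python lists stand in for references to storage. The function name is hypothetical. A compiler would like to replace the final load with the constant just stored, but that is correct only when the two parameters cannot refer to the same storage.

```python
def store_then_load(a: list[int], b: list[int]) -> int:
    a[0] = 1
    b[0] = 2
    # Folding this load to the constant 1 is only safe if alias analysis
    # proves that `a` and `b` can never refer to the same storage.
    return a[0]


x, y = [0], [0]
print(store_then_load(x, y))   # 1: two distinct lists
print(store_then_load(x, x))   # 2: a and b alias, so the second store overwrote a[0]
```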

Control Flow Analysis

Precursor to critical loop reductions and replacement of inefficient code

Gathers information concerning hierarchical flow of control

Identifies potential branches in program execution useful for mitigating pipeline hazards
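Control flow analysis usually starts by splitting code into basic blocks and linking them with edges. The sketch below applies the classic leader rule: the first instruction, every jump target, and every instruction following a jump or return each begin a new block. It uses a hypothetical tuple-based instruction format. The encoding and the example program are assumptions made for illustration.

```python
# Hypothetical instruction format:
#   ("op", text)              straight-line operation
#   ("label", name)           jump target
#   ("goto", name)            unconditional jump
#   ("if", condition, name)   conditional jump, falls through otherwise
#   ("ret", value)            return from the procedure
JUMPS = ("goto", "if")


def basic_blocks(code):
    labels = {ins[1]: i for i, ins in enumerate(code) if ins[0] == "label"}
    leaders = {0}                                   # first instruction
    for i, ins in enumerate(code):
        if ins[0] in JUMPS:
            leaders.add(labels[ins[-1]])            # jump target
        if ins[0] in JUMPS + ("ret",) and i + 1 < len(code):
            leaders.add(i + 1)                      # instruction after a jump/return
    starts = sorted(leaders)
    blocks = [code[s:e] for s, e in zip(starts, starts[1:] + [len(code)])]

    block_of_label = {blk[0][1]: b for b, blk in enumerate(blocks) if blk[0][0] == "label"}
    edges = set()
    for b, blk in enumerate(blocks):
        last = blk[-1]
        if last[0] in JUMPS:
            edges.add((b, block_of_label[last[-1]]))   # taken branch
        if last[0] not in ("goto", "ret") and b + 1 < len(blocks):
            edges.add((b, b + 1))                      # fall-through
    return blocks, edges


PROGRAM = [
    ("op", "s = 0"),
    ("op", "i = 0"),
    ("label", "loop"),
    ("if", "i >= n", "done"),
    ("op", "s = s + i"),
    ("op", "i = i + 1"),
    ("goto", "loop"),
    ("label", "done"),
    ("ret", "s"),
]

blocks, edges = basic_blocks(PROGRAM)
print(len(blocks), sorted(edges))   # 4 [(0, 1), (1, 2), (1, 3), (2, 1)]
```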

Example: Fibonacci

Example: Fibonacci

Example: Fibonacci
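None of the text from the Fibonacci slides survives, so what follows is an assumed example, not the original one. It shows an iterative Fibonacci function and its control flow graph, written out by hand from the block comments. An iterative dominator computation then locates the loop through its back edge (an edge n -> h where h dominates n). The block numbering and all helper names are illustrative.

```python
def fib(n: int) -> int:
    if n < 2:             # B0: entry test
        return n          # B1: early exit
    a, b, i = 0, 1, 1     # B2: loop setup
    while i < n:          # B3: loop header
        a, b = b, a + b   # B4: loop body, branches back to B3
        i += 1
    return b              # B5: exit


# Successor lists for fib's CFG, written by hand from the comments above.
SUCC = {0: [1, 2], 1: [], 2: [3], 3: [4, 5], 4: [3], 5: []}


def predecessors(succ):
    preds = {n: set() for n in succ}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)
    return preds


def dominators(succ, entry=0):
    # Iterate Dom(n) = {n} | intersection of Dom(p) over predecessors p
    # until no set changes.
    preds = predecessors(succ)
    dom = {n: set(succ) for n in succ}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in succ:
            if n == entry:
                continue
            pred_doms = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*pred_doms) if pred_doms else set())
            if new != dom[n]:
                dom[n], changed = new, True
    return dom


def back_edges(succ, dom):
    # An edge n -> h is a back edge when h dominates n; h is a loop header.
    return [(n, h) for n, targets in succ.items() for h in targets if h in dom[n]]


def natural_loop(succ, n, h):
    # The header plus every block that can reach n without passing through h.
    preds = predecessors(succ)
    body, stack = {h}, []
    if n != h:
        body.add(n)
        stack.append(n)
    while stack:
        for p in preds[stack.pop()]:
            if p not in body:
                body.add(p)
                stack.append(p)
    return body


dom = dominators(SUCC)
print(back_edges(SUCC, dom))      # [(4, 3)]
print(natural_loop(SUCC, 4, 3))   # {3, 4}
```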

Data Flow Analysis

Gathers information about how a procedure uses data

Builds on structures from control flow analysis

There are many ways to achieve this goal:
Reaching definitions: calculate which definitions can potentially reach a given point in the code
Iterative analysis: uses the control flow graph
Structural analysis, etc.
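Reaching definitions, solved by iterative analysis over the control flow graph, can be sketched as follows. Each block is assumed to be summarized as an ordered list of (definition id, variable) pairs, and the graph as successor lists. The code computes gen and kill sets, then iterates IN[B] = union of OUT[P] over predecessors and OUT[B] = gen[B] | (IN[B] - kill[B]) until nothing changes. The example is a hypothetical summing loop with definitions d1 to d4.

```python
def reaching_definitions(blocks, succ):
    preds = {b: [] for b in blocks}
    for b, targets in succ.items():
        for t in targets:
            preds[t].append(b)

    defs_of = {}                           # variable -> every definition id of it
    for defs in blocks.values():
        for d, var in defs:
            defs_of.setdefault(var, set()).add(d)

    gen, kill = {}, {}
    for b, defs in blocks.items():
        last_def = {}
        for d, var in defs:
            last_def[var] = d              # a later definition in the block wins
        gen[b] = set(last_def.values())
        kill[b] = set().union(*(defs_of[v] for v in last_def)) - gen[b]

    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            IN[b] = set().union(*(OUT[p] for p in preds[b]))
            new_out = gen[b] | (IN[b] - kill[b])
            if new_out != OUT[b]:
                OUT[b], changed = new_out, True
    return IN, OUT


# B0: d1: s = 0       d2: i = 0
# B1: if i >= n goto B3
# B2: d3: s = s + i   d4: i = i + 1   goto B1
# B3: return s
BLOCKS = {0: [("d1", "s"), ("d2", "i")], 1: [], 2: [("d3", "s"), ("d4", "i")], 3: []}
SUCC = {0: [1], 1: [2, 3], 2: [1], 3: []}

IN, OUT = reaching_definitions(BLOCKS, SUCC)
print(sorted(IN[3]))   # ['d1', 'd2', 'd3', 'd4']
```

All four definitions reach the exit block because the loop may execute zero times.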

Dependence Analysis*

Recognizes relationships between statements using a DAG:
True (flow) dependence
Antidependence
Output dependence
Input dependence (does not affect execution order)

Used for instruction scheduling and data caching
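The four dependence kinds above can be detected by comparing the variables each statement reads and writes. The minimal sketch below assumes straight-line code summarized as (text, reads, writes) triples. That is a simplification: real dependence analysis must also handle memory references, arrays, and loop-carried dependences. Edges run from an earlier statement to a later one. Input (read-after-read) dependences are reported only on request, since they do not constrain execution order. All names are illustrative.

```python
Stmt = tuple[str, set[str], set[str]]    # (source text, reads, writes)


def classify(r1, w1, r2, w2):
    kinds = set()
    if w1 & r2:
        kinds.add("true")     # write then read (flow, RAW)
    if r1 & w2:
        kinds.add("anti")     # read then write (WAR)
    if w1 & w2:
        kinds.add("output")   # write then write (WAW)
    if r1 & r2:
        kinds.add("input")    # read then read (RAR)
    return kinds


def dependence_dag(stmts, include_input=False):
    edges = []
    for i, (_, r1, w1) in enumerate(stmts):
        for j in range(i + 1, len(stmts)):
            _, r2, w2 = stmts[j]
            for kind in sorted(classify(r1, w1, r2, w2)):
                if kind == "input" and not include_input:
                    continue  # read-after-read never forces an order
                edges.append((i, j, kind))
    return edges


BLOCK: list[Stmt] = [
    ("a = b + c", {"b", "c"}, {"a"}),
    ("d = a * c", {"a", "c"}, {"d"}),
    ("b = 7", set(), {"b"}),
    ("a = d - 1", {"d"}, {"a"}),
]

print(dependence_dag(BLOCK))
# [(0, 1, 'true'), (0, 2, 'anti'), (0, 3, 'output'), (1, 3, 'anti'), (1, 3, 'true')]
```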

Interprocedural Analysis

Incorporates analysis methods discussed earlier, but on a broader level

Object-oriented design (OOD) and other high-level coding methodologies are optimal for human understanding, not computer processing

Includes analysis of relationships between function calls to mitigate the overhead of object-oriented code
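Interprocedural side-effect (MOD) analysis is a small example of analysis across function calls. The sketch below propagates "may modify this global" facts up a call graph until the sets stop growing. The program, its function names, and its globals are hypothetical. Once the analysis is done, a compiler can see that a call to `helper` changes no global, so a value loaded from a global before the call can stay in a register across it.

```python
# Hypothetical program: globals each function writes directly, and its callees.
DIRECT_MOD = {
    "main": set(),
    "update": {"counter"},
    "log": {"log_buf"},
    "helper": set(),
    "square": set(),
}
CALLS = {
    "main": ["update", "helper"],
    "update": ["log"],
    "log": [],
    "helper": ["square"],
    "square": [],
}


def transitive_mod(direct_mod, calls):
    # A function may modify what it writes directly plus whatever any callee
    # may modify; iterating to a fixed point also handles recursive calls.
    mod = {f: set(g) for f, g in direct_mod.items()}
    changed = True
    while changed:
        changed = False
        for f, callees in calls.items():
            for g in callees:
                if not mod[g] <= mod[f]:
                    mod[f] |= mod[g]
                    changed = True
    return mod


mod = transitive_mod(DIRECT_MOD, CALLS)
print(sorted(mod["main"]))   # ['counter', 'log_buf']
print(mod["helper"])         # set()
```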

Compiler Optimization Overview

1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development

Loop Optimizations*

Loop optimizations have the greatest impact on overall code performance

Desire to reduce dependencies to allow ILP
Desire to reduce the overhead of jumping and branching in the loop
Predictability: predicting loop behavior to mitigate pipeline hazards
Loops must be well behaved:
Single return
No breaks, branches, etc.
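Loop unrolling is one way to reduce the jump and branch overhead described above. The sketch below unrolls by a factor of 4, an arbitrary choice for illustration. The rewrite is simple only because the loop is well behaved: its trip count is known on entry and it has no early break. Python is used only to show the shape of the transformation, and the function names are illustrative.

```python
def dot_rolled(a: list[int], b: list[int]) -> int:
    total = 0
    for i in range(len(a)):          # one test-and-branch per element
        total += a[i] * b[i]
    return total


def dot_unrolled4(a: list[int], b: list[int]) -> int:
    n = len(a)
    limit = n - n % 4                # largest multiple of 4 not exceeding n
    total = 0
    i = 0
    while i < limit:                 # one test-and-branch per four elements
        total += (a[i] * b[i] + a[i + 1] * b[i + 1]
                  + a[i + 2] * b[i + 2] + a[i + 3] * b[i + 3])
        i += 4
    while i < n:                     # remainder loop for the last n % 4 elements
        total += a[i] * b[i]
        i += 1
    return total


xs, ys = list(range(10)), list(range(10, 20))
print(dot_rolled(xs, ys), dot_unrolled4(xs, ys))   # 735 735
```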

Loop Strength Reduction
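No text from this slide survives. The sketch below shows the classic form of loop strength reduction: a multiplication by the loop index is replaced with a running addition. The address arithmetic (`base`, `stride`) is a hypothetical stand-in for indexing an array of fixed-size elements.

```python
def addresses_multiply(base: int, stride: int, n: int) -> list[int]:
    # Before: each iteration computes base + i * stride with a multiplication.
    return [base + i * stride for i in range(n)]


def addresses_reduced(base: int, stride: int, n: int) -> list[int]:
    # After: the induction expression lives in `addr` and is advanced by an
    # addition each iteration, so the multiplication leaves the loop.
    out = []
    addr = base
    for _ in range(n):
        out.append(addr)
        addr += stride
    return out


print(addresses_reduced(1000, 8, 4))   # [1000, 1008, 1016, 1024]
```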

Procedure Optimizations

Based on control flow
Desire to eliminate the overhead of context switches
Possibly turn function calls into branches
Optimizations occur at both high and low level:
High level: procedure integration
Low level: in-line expansion
Conventions:
Leaf routines (which call no others) have reduced overhead
Shrink wrapping creates pseudo-leaves by using data flow analysis
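Here is a rough sketch of in-line expansion on a hypothetical three-address IR; the instruction format, function names, and renaming scheme are all assumptions. Each call is replaced by a copy of the callee body with renamed locals. Copies bind the arguments to the parameters, and one final copy delivers the return value. The small interpreter exists only to check that the program computes the same result before and after. The extra copies are what a later copy propagation pass would clean up.

```python
# Hypothetical IR: (dest, op, x, y) with op in "+", "*", "copy";
# a call is (dest, "call", callee_name, [actual arguments]).
FUNCS = {
    "sq_plus": {                       # sq_plus(x, y) = x * x + y
        "params": ["x", "y"],
        "body": [("t", "*", "x", "x"), ("r", "+", "t", "y")],
        "ret": "r",
    },
}

CALLER = [
    ("a", "copy", 3, None),
    ("b", "call", "sq_plus", ["a", 4]),     # b = a*a + 4
    ("c", "call", "sq_plus", ["b", "a"]),   # c = b*b + a
]


def val(env, v):
    return env[v] if isinstance(v, str) else v   # strings are variables, ints constants


def run(code, env, funcs):
    env = dict(env)
    for dest, op, x, y in code:
        if op == "call":
            callee = funcs[x]
            callee_env = dict(zip(callee["params"], [val(env, arg) for arg in y]))
            env[dest] = run(callee["body"], callee_env, funcs)[callee["ret"]]
        elif op == "copy":
            env[dest] = val(env, x)
        elif op == "+":
            env[dest] = val(env, x) + val(env, y)
        elif op == "*":
            env[dest] = val(env, x) * val(env, y)
    return env


def rename(v, suffix):
    return v + suffix if isinstance(v, str) else v


def inline_calls(code, funcs):
    # Replace each call with a renamed copy of the callee body.
    # Assumes callee bodies contain no calls themselves.
    out = []
    for site, (dest, op, x, y) in enumerate(code):
        if op != "call" or x not in funcs:
            out.append((dest, op, x, y))
            continue
        callee, suffix = funcs[x], f"_{site}"       # unique names per call site
        for param, arg in zip(callee["params"], y):
            out.append((rename(param, suffix), "copy", arg, None))    # bind arguments
        for d, o, a, b in callee["body"]:
            out.append((rename(d, suffix), o, rename(a, suffix), rename(b, suffix)))
        out.append((dest, "copy", rename(callee["ret"], suffix), None))  # return value
    return out


inlined = inline_calls(CALLER, FUNCS)
print(run(CALLER, {}, FUNCS)["c"], run(inlined, {}, FUNCS)["c"])   # 172 172
```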

Tail Call Optimization: Tail Recursion
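No text from this slide survives. The sketch below shows the standard idea using a tail-recursive factorial. The recursive call is the last action of its frame, so the compiler can rebind the parameters and jump back to the top instead of pushing a new frame. The loop version shows the result of that rewrite. Function names are illustrative.

```python
def fact_tail(n: int, acc: int = 1) -> int:
    # Tail call: the recursive call is the last thing this frame does,
    # so nothing in the frame is needed after the call returns.
    if n <= 1:
        return acc
    return fact_tail(n - 1, acc * n)


def fact_loop(n: int, acc: int = 1) -> int:
    # What tail call elimination turns fact_tail into: rebind the
    # parameters and jump back to the top instead of calling.
    while n > 1:
        n, acc = n - 1, acc * n
    return acc


print(fact_tail(10), fact_loop(10))   # 3628800 3628800
```

CPython does not perform this optimization itself. For large n, `fact_tail` therefore exceeds the default recursion limit, while `fact_loop` runs in constant stack space.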

Code Scheduling*

Block scheduling:
Blocks are optimized as independent pieces of code
Cross-block scheduling is then applied to the optimized blocks
Branch scheduling:
Fill stall cycles after a branch with independent code
Reduces the effect of bad branch predictions in the hardware pipeline
Software pipelining:
Executes multiple iterations of a loop simultaneously, overlapping successive iterations
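List scheduling is one common way to do block scheduling and fill stall cycles with independent code. The sketch below makes several assumptions: a single-issue machine, a hypothetical basic block with made-up latencies, and critical-path length as the priority heuristic. Each cycle, it issues the highest-priority instruction whose operands are ready. A cycle with nothing ready is a stall.

```python
# Hypothetical basic block: latencies in cycles and each instruction's inputs.
# ld1/ld2 are loads, add uses both, st stores add's result, mul is independent.
LATENCY = {"ld1": 3, "ld2": 3, "add": 1, "mul": 2, "st": 1}
DEPS = {"add": ["ld1", "ld2"], "st": ["add"]}


def list_schedule(latency, deps):
    succs = {i: [] for i in latency}
    for i, preds in deps.items():
        for p in preds:
            succs[p].append(i)

    height = {}                                 # longest latency path to block end
    def critical_path(i):
        if i not in height:
            height[i] = latency[i] + max((critical_path(s) for s in succs[i]), default=0)
        return height[i]
    for i in latency:
        critical_path(i)

    done_at, schedule = {}, []                  # result-ready cycle; (cycle, instr)
    remaining, cycle = set(latency), 0
    while remaining:
        ready = [i for i in remaining
                 if all(p in done_at and done_at[p] <= cycle for p in deps.get(i, []))]
        if ready:                               # single issue: one instruction per cycle
            best = min(ready, key=lambda i: (-height[i], i))
            schedule.append((cycle, best))
            done_at[best] = cycle + latency[best]
            remaining.remove(best)
        cycle += 1                              # a cycle with nothing issued is a stall
    return schedule


print(list_schedule(LATENCY, DEPS))
# [(0, 'ld1'), (1, 'ld2'), (2, 'mul'), (4, 'add'), (5, 'st')]
```

If the block were issued strictly in its written order (ld1, ld2, add, mul, st), the last instruction would issue at cycle 6. The scheduler instead moves the independent `mul` into a slot that would otherwise be spent waiting on the loads.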

Register Allocation

Applies to low-level assembly
Loops and nesting are used to weigh which values should be kept in registers
Nested loops weigh more heavily
Considers variable activity before and after a block of code is accessed
Uses operation costs and the number of times each operation is performed

Register Allocation Calculation
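None of the text from the calculation slide survives. A common textbook way to turn the weighting rules above into numbers works like this: charge every definition and use of a value its operation cost times 10 raised to the loop-nesting depth where it occurs. The values with the highest totals then stay in registers. The sketch below assumes that formula. The factor 10, the unit costs, and the example occurrences are assumptions, not values from the slides.

```python
def spill_costs(occurrences, def_cost=1, use_cost=1, loop_weight=10):
    # occurrences: (variable, "def" or "use", loop nesting depth) for every
    # place the variable is defined or used in the procedure.
    costs = {}
    for var, kind, depth in occurrences:
        base = def_cost if kind == "def" else use_cost
        costs[var] = costs.get(var, 0) + base * loop_weight ** depth
    return costs


OCCURRENCES = [
    ("x", "def", 0), ("x", "use", 2), ("x", "use", 2),   # used in an inner loop
    ("i", "def", 0), ("i", "def", 1), ("i", "use", 1), ("i", "use", 1),
    ("t", "def", 0), ("t", "use", 0),                    # straight-line code only
]

costs = spill_costs(OCCURRENCES)
print(costs)                                        # {'x': 201, 'i': 31, 't': 2}
print(sorted(costs, key=costs.get, reverse=True))   # ['x', 'i', 't']
```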

Register Allocation: Graph Coloring

Use the subset of objects that should be allocated to registers as the graph's nodes

Arcs connect two objects that exist (are live) at the same time

Arcs also represent conflicts where an object cannot be assigned a particular register (for example, int vs. float)

Color the graph with as many colors as there are registers

Assign registers based on color
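Here is a minimal sketch of the coloring steps above, using the simplify/select scheme usually attributed to Chaitin with optimistic coloring. Both the scheme and the example interference graph are assumptions. Here k is the number of registers. A node that still has no free color at select time is reported as spilled. A real allocator would then insert spill code and rebuild the graph, which this sketch does not do.

```python
def build_graph(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)      # interference is symmetric
        graph.setdefault(v, set()).add(u)
    return graph


def color_graph(graph, k):
    # Simplify: repeatedly remove a node with fewer than k neighbours (it can
    # always be colored later); if none exists, remove the highest-degree node
    # as a spill candidate and hope it still gets a color (optimistic coloring).
    work = {v: set(ns) for v, ns in graph.items()}
    stack = []
    while work:
        low = [v for v in work if len(work[v]) < k]
        v = min(low) if low else max(work, key=lambda n: (len(work[n]), n))
        stack.append(v)
        for n in work.pop(v):
            work[n].discard(v)
    # Select: pop nodes and give each the lowest color its neighbours lack.
    colors, spilled = {}, []
    while stack:
        v = stack.pop()
        used = {colors[n] for n in graph[v] if n in colors}
        free = [c for c in range(k) if c not in used]
        if free:
            colors[v] = free[0]
        else:
            spilled.append(v)                  # no register left: spill to memory
    return colors, spilled


EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
graph = build_graph(EDGES)
print(color_graph(graph, 3))   # ({'e': 0, 'd': 1, 'c': 0, 'b': 2, 'a': 1}, [])
print(color_graph(graph, 2))   # ({'d': 0, 'b': 1, 'a': 0, 'e': 1}, ['c'])
```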

Redundancy Elimination

Based on data flow analysis
Intermediate-level optimization
Includes:
Common subexpression elimination
Loop-invariant code motion
Partial redundancy elimination
Code hoisting
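Common subexpression elimination within a single basic block can be done with local value numbering. The sketch below uses a hypothetical (dest, op, x, y) instruction format. It gives each distinct value a number and replaces a recomputation with a copy from the variable that already holds that value, as long as that variable has not been overwritten in the meantime. Operands of + and * are sorted because those operators are commutative. The global forms listed above (loop-invariant code motion, partial redundancy elimination, hoisting) need the data flow machinery and are not shown here.

```python
def local_value_numbering(block):
    # block: list of (dest, op, x, y); strings are variables, ints are constants.
    value_of = {}          # variable or constant -> value number
    table = {}             # (op, vn(x), vn(y)) -> (value number, variable holding it)
    counter = [0]
    out = []

    def vn(operand):
        if operand not in value_of:
            value_of[operand] = counter[0]
            counter[0] += 1
        return value_of[operand]

    for dest, op, x, y in block:
        if op == "copy":
            value_of[dest] = vn(x)                 # dest now names x's value
            out.append((dest, op, x, y))
            continue
        key = (op, vn(x), vn(y))
        if op in ("+", "*"):                       # commutative: a+b == b+a
            key = (op, *sorted(key[1:]))
        hit = table.get(key)
        if hit is not None and value_of.get(hit[1]) == hit[0]:
            out.append((dest, "copy", hit[1], None))   # reuse the earlier result
            value_of[dest] = hit[0]
        else:
            value_of[dest] = counter[0]
            counter[0] += 1
            table[key] = (value_of[dest], dest)
            out.append((dest, op, x, y))
    return out


BLOCK = [
    ("t1", "+", "a", "b"),
    ("t2", "*", "t1", 2),
    ("t3", "+", "b", "a"),    # same value as t1
    ("a", "copy", 7, None),   # a changes, so a + b is a new value from here on
    ("t4", "+", "a", "b"),
    ("t5", "*", "t1", 2),     # same value as t2
]
for ins in local_value_numbering(BLOCK):
    print(ins)
```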

Peephole Optimizations

Focused on very small subsets of code
Generally performed late in the compilation process
Arguably covers up bad and incomplete optimizations from earlier passes
Some examples include:
Dead code elimination (of dead code created by earlier optimizations)
Strength reductions
Constant folding
Instruction combining
Copy propagation
Algebraic simplifications
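Three of the rewrites listed above (constant folding, algebraic simplification, and strength reduction) can be sketched as a single pass over a hypothetical (dest, op, x, y) instruction list. The rules are valid for integers only. For example, x * 0 -> 0 does not hold for floating-point NaN or infinity. Real peephole optimizers usually match patterns over short windows of machine instructions rather than one instruction at a time.

```python
import operator

FOLD = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def peephole(code):
    # code: list of (dest, op, x, y); strings are variables, ints are constants.
    out = []
    for dest, op, a, b in code:
        if op in ("+", "*") and isinstance(a, int) and not isinstance(b, int):
            a, b = b, a                                     # constant operand on the right
        if op in FOLD and isinstance(a, int) and isinstance(b, int):
            out.append((dest, "copy", FOLD[op](a, b), None))    # constant folding
        elif (op in ("+", "-") and b == 0) or (op == "*" and b == 1):
            out.append((dest, "copy", a, None))                 # x+0, x-0, x*1 -> x
        elif op == "*" and b == 0:
            out.append((dest, "copy", 0, None))                 # x*0 -> 0
        elif op == "*" and isinstance(b, int) and b > 0 and b & (b - 1) == 0:
            out.append((dest, "<<", a, b.bit_length() - 1))     # x*2^k -> x<<k
        else:
            out.append((dest, op, a, b))
    return out


CODE = [
    ("t1", "*", 4, 8),       # both constant -> t1 = 32
    ("t2", "+", "x", 0),     # -> t2 = x
    ("t3", "*", 1, "y"),     # commuted, then -> t3 = y
    ("t4", "*", "z", 8),     # -> t4 = z << 3
    ("t5", "-", "x", "y"),   # unchanged
]
for ins in peephole(CODE):
    print(ins)
```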

Compiler Optimization Overview

1. Computer Hardware Architecture Review
2. Analysis
3. Optimizations
4. Continuing Development

Continuous Relevance of Compiler Development

Back ends of compilers for older languages are reworked to take advantage of advances in hardware

Pipelines are becoming longer
Multiple cores are now common, allowing more use of parallel instructions

Research Areas

Domain specific subjects: security, reliability, parallel, distributed, embedded, mobile

Analysis, prediction, and debugging tools
Embedded JIT compilation
Development of a research compiler (GCC)
Enhancing compiler optimization times, specifically for iterative and whole-program optimizations

Microsoft F#: a functional language for .NET, similar to ML

Compiler Job Options

Additional exploitation of parallel computing environments for desktop platforms

Multiple OS/environment support
Integration of AI techniques, such as machine learning, to decide when, how, and where to apply optimizations (GCC)

Special-purpose languages for video, graphics, and audio processing (NVIDIA)

Special purpose vendors for embedded products (Wind River, VxWorks)

Compiler Job Options

Library adaptation for reconfigurable processors (GCC)

Fault tolerance and exception handling for security

Compiler Optimization Problems

Many optimizations are localized
Non-local optimizations create increased overhead in the computation process
Multiple optimization objectives create conflicts, for example speed vs. executable size