Intermediate Representations
Dec 29, 2015
• Front end - produces an intermediate representation (IR)
• Middle end - transforms the IR into an equivalent IR that runs more efficiently
• Back end - transforms the IR into native code
• IR encodes the compiler's knowledge of the program
• Middle end usually consists of several passes
[Figure: Source Code → Front End → IR → Middle End → IR → Back End → Target Code]
Intermediate Representations
• Decisions in IR design affect the speed and efficiency of the compiler
• Some important IR properties:
  Ease of generation
  Ease of manipulation
  Procedure size
  Freedom of expression
  Level of abstraction
• The importance of different properties varies between compilers
  Selecting an appropriate IR for a compiler is critical
Types of Intermediate Representations
Three major categories
• Structural
  Graphically oriented
  Heavily used in source-to-source translators
  Tend to be large
  Examples: trees, DAGs
• Linear
  Pseudo-code for an abstract machine
  Level of abstraction varies
  Simple, compact data structures
  Easier to rearrange
  Examples: three-address code, stack machine code
• Hybrid
  Combination of graphs and linear code
  Example: control-flow graph
Level of Abstraction
• The level of detail exposed in an IR influences the profitability and feasibility of different optimizations.
• Two different representations of an array reference:
High-level AST (good for memory disambiguation):

      subscript
      /   |   \
     A    i    j

Low-level linear code (good for address calculation):

  loadI 1      => r1
  sub   rj, r1 => r2
  loadI 10     => r3
  mult  r2, r3 => r4
  sub   ri, r1 => r5
  add   r4, r5 => r6
  loadI @A     => r7
  add   r7, r6 => r8
  load  r8     => rAij
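The low-level sequence computes @A + ((j-1) * 10 + (i-1)): the address of A[i,j] for a 10-row array indexed from 1 in column-major order. A C sketch of the same calculation, register by register (treating each element as one addressable word is an assumption):

```c
/* Address of A[i,j] for a 10-row array indexed from 1, stored in
 * column-major order on a word-addressed machine -- the same
 * computation as the ILOC sequence above, one temporary per line. */
unsigned addr_of_Aij(unsigned base_A, unsigned i, unsigned j)
{
    unsigned r2 = j - 1;        /* sub  rj, r1 => r2 */
    unsigned r4 = r2 * 10;      /* mult r2, r3 => r4 */
    unsigned r5 = i - 1;        /* sub  ri, r1 => r5 */
    unsigned r6 = r4 + r5;      /* add  r4, r5 => r6 */
    return base_A + r6;         /* add  r7, r6 => r8 */
}
```

So A[1,1] lives at the base address, and moving down one row (i+1) moves one word while moving right one column (j+1) moves ten.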
Level of Abstraction
• Structural IRs are usually considered high-level
• Linear IRs are usually considered low-level
• Not necessarily true:

Low-level AST (the address computation for A[i,j], spelled out as a tree):

  load( add( @A, add( mult( 10, sub( j, 1 ) ), sub( i, 1 ) ) ) )

High-level linear code:

  loadArray A, i, j
Abstract Syntax Tree
An abstract syntax tree (AST) is the procedure's parse tree with the nodes for most non-terminals removed

Example: x - 2 * y

      -
     / \
    x   *
       / \
      2   y

• Can use a linearized form of the tree
  Easier to manipulate than pointers
  x 2 y * -  in postfix form
  - x * 2 y  in prefix form
• S-expressions are (essentially) ASTs
Directed Acyclic Graph
A directed acyclic graph (DAG) is an AST with a unique node for each value
• Makes sharing explicit
• Encodes redundancy

Example:
  z ← x - 2 * y
  w ← x / 2

[Figure: DAG for the two statements; the nodes for x and 2 are shared between the tree for z and the tree for w]
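One common way to get the unique-node-per-value property is hash-consing: before building a node, look for an existing node with the same (op, left, right) and reuse it. A minimal C sketch (the linear search stands in for a real hash table, and the leaf encoding is invented for illustration):

```c
/* A DAG node: op is an operator character, or 'V' (variable) / 'C'
 * (constant) for leaves, where `left` carries the name or value. */
typedef struct { char op; int left, right; } Node;

static Node pool[256];
static int  pool_len = 0;

/* Return the index of a node for (op, left, right), reusing an
 * existing node when this value was already built -- the reuse is
 * exactly what makes the result a DAG rather than a tree. */
int make_node(char op, int left, int right)
{
    for (int k = 0; k < pool_len; k++)
        if (pool[k].op == op && pool[k].left == left && pool[k].right == right)
            return k;                      /* shared: value already exists */
    pool[pool_len] = (Node){ op, left, right };
    return pool_len++;
}
```

Building z ← x - 2 * y and then w ← x / 2 yields one node each for x and 2, referenced from both expressions.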
Stack Machine Code
Originally used for stack-based computers, now Java• Example:
x - 2 * y becomes
Advantages• Compact form• Introduced names are implicit, not explicit• Simple to generate and execute code
Useful where code is transmittedover slow communication links (the net )
push xpush 2push ymultiplysubtract
Implicit names take up no space, where explicit ones do!
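The five instructions above run on a trivial operand-stack interpreter; a C sketch (the opcode encoding is invented for illustration):

```c
enum { PUSH, MUL, SUB };                      /* hypothetical opcode encoding */
typedef struct { int op; int operand; } Insn;

/* Execute stack-machine code: PUSH puts an operand on the stack;
 * operators pop two values and push the result.  The intermediate
 * value 2*y never gets a name -- it only ever lives on the stack. */
int run(const Insn *code, int n)
{
    int stack[64], sp = 0;
    for (int k = 0; k < n; k++) {
        int a, b;
        switch (code[k].op) {
        case PUSH: stack[sp++] = code[k].operand; break;
        case MUL:  b = stack[--sp]; a = stack[--sp]; stack[sp++] = a * b; break;
        case SUB:  b = stack[--sp]; a = stack[--sp]; stack[sp++] = a - b; break;
        }
    }
    return stack[sp - 1];   /* result is left on top of the stack */
}
```

With x = 10 and y = 3, the sequence push 10, push 2, push 3, multiply, subtract leaves 10 - 6 = 4 on the stack.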
Three Address Code
Several different representations of three-address code
• In general, three-address code has statements of the form:
  x ← y op z
  with one operator (op) and, at most, three names (x, y, & z)

Example: z ← x - 2 * y becomes

  t ← 2 * y
  z ← x - t

Advantages:
• Resembles many machines
• Introduces a new set of names
• Compact form
Three Address Code: Quadruples
Naïve representation of three-address code
• Table of k * 4 small integers
• Simple record structure
• Easy to reorder
• Explicit names

  RISC assembly code        Quadruples

  load  r1, y               load   1   y
  loadI r2, 2               loadi  2   2
  mult  r3, r2, r1          mult   3   2   1
  load  r4, x               load   4   x
  sub   r5, r4, r3          sub    5   4   3

The original FORTRAN compiler used "quads"
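Since each quad is just an opcode plus three small integers, the whole IR is an array of fixed-size records that can be reordered freely. A C sketch of the table above (the field names and the leaf encoding are mine):

```c
/* One quadruple: an opcode plus up to three small-integer names.
 * A whole procedure is just an array of these, easy to reorder. */
typedef enum { LOAD, LOADI, MULT, SUB } Op;

typedef struct {
    Op  op;
    int result;       /* name being defined (explicit)            */
    int arg1, arg2;   /* explicit names of the inputs; for LOAD a
                         variable tag, for LOADI a constant       */
} Quad;

/* z <- x - 2 * y, as in the table above */
static const Quad code[] = {
    { LOAD,  1, 'y', 0 },
    { LOADI, 2,  2,  0 },
    { MULT,  3,  2,  1 },
    { LOAD,  4, 'x', 0 },
    { SUB,   5,  4,  3 },
};
```

Because every result name is explicit, swapping two independent quads changes nothing else in the table.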
Three Address Code: Triples
• Index used as implicit name
• 25% less space consumed than quads
• Much harder to reorder

  (1)  load   y
  (2)  loadI  2
  (3)  mult   (1)  (2)
  (4)  load   x
  (5)  sub    (4)  (3)

Implicit names take no space!
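In a triple the result field disappears: an instruction's own index is its name, which is where the space saving comes from. A C sketch (using a negative argument to mean "the value of that earlier triple" is my encoding):

```c
typedef enum { T_LOAD, T_LOADI, T_MULT, T_SUB } TOp;

/* A triple has no result field: instruction i IS the name (i).
 * Here an argument of -i means "the value computed by triple i";
 * a positive argument is a variable tag or constant. */
typedef struct { TOp op; int arg1, arg2; } Triple;

static const Triple tcode[] = {
    /* (1) */ { T_LOAD,  'y',  0 },
    /* (2) */ { T_LOADI,  2,   0 },
    /* (3) */ { T_MULT,  -1,  -2 },
    /* (4) */ { T_LOAD,  'x',  0 },
    /* (5) */ { T_SUB,   -4,  -3 },
};
```

This also shows why triples are hard to reorder: moving an instruction renumbers it, invalidating every later reference to its index.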
Three Address Code: Indirect Triples
• List the first triple in each statement
• Implicit name space
• Uses more space than triples, but easier to reorder
• Major tradeoff between quads and triples is compactness versus ease of manipulation
  In the past, compile-time space was critical
  Today, speed may be more important

  Statement list        Triples
  (1)  (100)            (100)  load   y
  (2)  (105)            (101)  loadI  2
                        (102)  mult   (100) (101)
                        (103)  load   x
                        (104)  sub    (103) (102)
Static Single Assignment Form
• The main idea: each name is defined exactly once
• Introduce φ-functions to make it work

Strengths of SSA-form
• Sharper analysis
• φ-functions give hints about placement

Original:

  x ← …
  y ← …
  while (x < k)
    x ← x + 1
    y ← y + x

SSA-form:

  x0 ← …
  y0 ← …
  if (x0 ≥ k) goto next
  loop: x1 ← φ(x0, x2)
        y1 ← φ(y0, y2)
        x2 ← x1 + 1
        y2 ← y1 + x2
        if (x2 < k) goto loop
  next: …
Two Address Code
• Allows statements of the form
  x ← x op y
  with one operator (op) and, at most, two names (x and y)

Example: z ← x - 2 * y becomes

  t1 ← 2
  t2 ← load y
  t2 ← t2 * t1
  z  ← load x
  z  ← z - t2

• Can be very compact

Problems
• Machines no longer rely on destructive operations
• Difficult name space
  Destructive operations make reuse hard
  Good model for machines with destructive ops (PDP-11)
Control-flow Graph
Models the transfer of control in the procedure
• Nodes in the graph are basic blocks
  Basic blocks: maximal-length sequences of straight-line code
  Can be represented with quads or any other linear representation
• Edges in the graph represent control flow

Example:

        if (x = y)
        /        \
   a ← 2          a ← 3
   b ← 5          b ← 4
        \        /
        c ← a * b
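A CFG is a hybrid in the most literal sense: linear code inside each node, graph edges between nodes. A C sketch of the diamond above (the types, the string-valued block bodies, and the successor limit are all illustrative):

```c
#define MAX_SUCC 2   /* a branch has at most two targets here */

typedef struct Block Block;
struct Block {
    const char *code;            /* straight-line code, in any linear IR */
    Block      *succ[MAX_SUCC];  /* control-flow edges out of the block  */
    int         nsucc;
};

/* The example: a test block branching to two assignment blocks,
 * both of which fall into the join block c <- a * b. */
static Block b_join = { "c <- a * b",          { 0, 0 },            0 };
static Block b_then = { "a <- 2\nb <- 5",      { &b_join, 0 },      1 };
static Block b_else = { "a <- 3\nb <- 4",      { &b_join, 0 },      1 };
static Block b_cond = { "if (x = y)",          { &b_then, &b_else }, 2 };
```

Optimizers walk the edges (e.g., to compute dominators or liveness) while code inside each block stays in whatever linear form the compiler already uses.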
Using Multiple Representations
• Repeatedly lower the level of the intermediate representation
  Each intermediate representation is suited towards certain optimizations
• Example: the Open64 compiler's WHIRL intermediate format
  Consists of 5 different IRs that are progressively more detailed

[Figure: Source Code → Front End → IR 1 → Middle End → IR 2 → Middle End → IR 3 → Back End → Target Code]
Memory Models
Two major models
• Register-to-register model
  Keep all values that can legally be stored in a register in registers
  Ignore machine limitations on number of registers
  Compiler back-end must insert loads and stores
• Memory-to-memory model
  Keep all values in memory
  Only promote values to registers directly before they are used
  Compiler back-end can remove loads and stores
• Compilers for RISC machines usually use register-to-register
  Reflects programming model
  Easier to determine when registers are used
The Rest of the Story…
Representing the code is only part of an IR
There are other necessary components
• Symbol table (already discussed)
• Constant table
  Representation, type
  Storage class, offset
• Storage map
  Overall storage layout
  Overlap information
  Virtual register assignments
The Procedure as a Control Abstraction
Procedures have well-defined control-flow

The Algol-60 procedure call
• Invoked at a call site, with some set of actual parameters
• Control returns to the call site, immediately after invocation
• Most languages allow recursion

int p(a,b,c)
  int a, b, c;
{
  int d;
  d = q(c,b);
  ...
}

int q(x,y)
  int x, y;
{
  return x + y;
}

…
s = p(10,t,u);
…
The Procedure as a Control Abstraction
Implementing procedures with this behavior
• Requires code to save and restore a "return address"
• Must map actual parameters to formal parameters (c → x, b → y)
• Must create storage for local variables (&, maybe, parameters)
  p needs space for d (&, maybe, a, b, & c)
  Where does this space go in recursive invocations?

Compiler emits code that causes all this to happen at run time
The Procedure as a Control Abstraction
Implementing procedures with this behavior
• Must preserve p's state while q executes
  Recursion causes the real problem here
• Strategy: create a unique location for each procedure activation
  Can use a "stack" of memory blocks to hold local storage and return addresses

Compiler emits code that causes all this to happen at run time
The Procedure as a Name Space
Each procedure creates its own name space
• Any name (almost) can be declared locally
• Local names obscure identical non-local names
• Local names cannot be seen outside the procedure
  Nested procedures are "inside" by definition
• We call this set of rules & conventions "lexical scoping"

Examples
• C has global, static, local, and block scopes (Fortran-like)
  Blocks can be nested, procedures cannot
• Scheme has global, procedure-wide, and nested scopes (let)
  Procedure scope (typically) contains formal parameters
The Procedure as a Name Space
Why introduce lexical scoping?
• Provides a compile-time mechanism for binding "free" variables
• Simplifies rules for naming & resolves conflicts

How can the compiler keep track of all those names?

The Problem
• At point p, which declaration of x is current?
• At run-time, where is x found?
• As the parser goes in & out of scopes, how does it delete x?

The Answer
• Lexically scoped symbol tables (see § 5.7.3)
Do People Use This Stuff?
C macro from the MSCP compiler

#define fix_inequality(oper, new_opcode)          \
  if (value0 < value1)                            \
  {                                               \
    Unsigned_Int temp = value0;                   \
    value0 = value1;                              \
    value1 = temp;                                \
    opcode_name = new_opcode;                     \
    temp = oper->arguments[0];                    \
    oper->arguments[0] = oper->arguments[1];      \
    oper->arguments[1] = temp;                    \
    oper->opcode = new_opcode;                    \
  }

Declares a new name (temp) in a nested block scope
Do People Use This Stuff?
C code from the MSCP implementation

static Void phi_node_printer(Block *block)
{
  Phi_Node *phi_node;
  Block_ForAllPhiNodes(phi_node, block)
  {
    if (phi_node->old_name < register_count)
    {
      Unsigned_Int i;
      fprintf(stderr, "Phi node for r%d: [", phi_node->old_name);
      for (i = 0; i < block->pred_count; i++)
        fprintf(stderr, " r%d", phi_node->parms[i]);
      fprintf(stderr, " ] => r%d\n", phi_node->new_name);
    }
    else
    {
      Unsigned_Int2 *arg_ptr;
      fprintf(stderr, "Phi node for %s: [",
              Expr_Get_String(Tag_Unmap(phi_node->old_name)));
      Phi_Node_ForAllParms(arg_ptr, phi_node)
        fprintf(stderr, " %d", *arg_ptr);
      fprintf(stderr, " ] => %d\n", phi_node->new_name);
    }
  }
}

More local declarations!
Lexically-scoped Symbol Tables
The problem
• The compiler needs a distinct record for each declaration
• Nested lexical scopes admit duplicate declarations

The interface
• insert(name, level) – creates a record for name at level
• lookup(name, level) – returns a pointer or index
• delete(level) – removes all names declared at level

Many implementation schemes have been proposed (see § B.4)
• We'll stay at the conceptual level
• Hash table implementation is tricky, detailed, & fun

Symbol tables are compile-time structures the compiler uses to resolve references to names. We'll see the corresponding run-time structures that are used to establish addressability later.  (§ 5.7 in EaC)
Example

procedure p {
  int a, b, c
  procedure q {
    int v, b, x, w
    procedure r {
      int x, y, z
      ….
    }
    procedure s {
      int x, a, v
      …
    }
    … r … s
  }
  … q …
}

The same structure, as nested blocks:

B0: {
  int a, b, c
  B1: {
    int v, b, x, w
    B2: {
      int x, y, z
      ….
    }
    B3: {
      int x, a, v
      …
    }
    …
  }
  …
}
Lexically-scoped Symbol Tables
High-level idea
• Create a new table for each scope
• Chain them together for lookup

"Sheaf of tables" implementation
• insert() may need to create a table; it always inserts at the current level
• lookup() walks the chain of tables & returns the first occurrence of the name
• delete() throws away the table for level p, if it is the top table in the chain

If the compiler must preserve the table (for, say, the debugger), this idea is actually practical. Individual tables can be hash tables.
[Figure: sheaf of tables for the example — r's table (x, y, z) chains to q's table (v, b, x, w), which chains to p's table (a, b, c)]
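A minimal C sketch of the sheaf: each scope gets its own table chained to its enclosing scope, and lookup walks the chain so the innermost declaration wins (linear lists stand in for the per-scope hash tables):

```c
#include <stdlib.h>
#include <string.h>

typedef struct Entry { const char *name; struct Entry *next; } Entry;
typedef struct Table { Entry *entries; struct Table *enclosing; } Table;

/* Entering a scope pushes a fresh table onto the chain;
 * "deleting" a level is just popping back to t->enclosing. */
Table *enter_scope(Table *enclosing)
{
    Table *t = calloc(1, sizeof *t);
    t->enclosing = enclosing;
    return t;
}

/* insert() always inserts at the current (innermost) level. */
void insert(Table *t, const char *name)
{
    Entry *e = malloc(sizeof *e);
    e->name = name;
    e->next = t->entries;
    t->entries = e;
}

/* lookup() walks the chain of tables and returns the first table
 * containing the name -- the innermost declaration obscures the rest. */
Table *lookup(Table *scope, const char *name)
{
    for (Table *t = scope; t != NULL; t = t->enclosing)
        for (Entry *e = t->entries; e != NULL; e = e->next)
            if (strcmp(e->name, name) == 0)
                return t;
    return NULL;
}
```

In the running example, a lookup of b from inside r finds q's b, not p's; a lookup of a walks all the way to p.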
Implementing Lexically Scoped Symbol Tables
Stack organization

[Figure: a single stack of entries — a, b, c (level 0, p), then v, b, x, w (level 1, q), then x, y, z (level 2, r) — growing toward nextFree]

Implementation
• insert() creates a new level pointer if needed and inserts at nextFree
• lookup() searches linearly, starting at nextFree – 1 and working back toward the base
• delete() sets nextFree to the start location of the level being deleted

Advantage
• Uses much less space

Disadvantage
• Lookups can be expensive
Implementing Lexically Scoped Symbol Tables
Threaded stack organization

[Figure: the same stack of entries (a, b, c | v, b, x, w | x, y, z), plus a hash table h(x) whose buckets thread each name to the chain of its declarations, innermost first]

Implementation
• insert() puts the new entry at the head of the list for its name
• lookup() goes directly to the location
• delete() processes each element in the level being deleted, removing it from the head of its list

Advantage
• Lookup is fast

Disadvantage
• delete takes time proportional to the number of variables declared in the level
The Procedure as an External Interface
OS needs a way to start the program's execution
• Programmer needs a way to indicate where it begins
  The "main" procedure in most languages
• When the user invokes "grep" at a command line
  OS finds the executable
  OS creates a process and arranges for it to run "grep"
  "grep" is code from the compiler, linked with the run-time system
  It starts the run-time environment & calls "main"
  After main, it shuts down the run-time environment & returns
• When "grep" needs system services
  It makes a system call, such as fopen()

(UNIX/Linux specific discussion)
Where Do All These Variables Go?
Automatic & Local• Keep them in the procedure activation record or in a
register• Automatic lifetime matches procedure’s lifetimeStatic • Procedure scope storage area affixed with procedure
name &_p.x
• File scope storage area affixed with file name• Lifetime is entire executionGlobal• One or more named global data areas• One per variable, or per file, or per program, …• Lifetime is entire execution
Placing Run-time Data Structures
Classic Organization

[Figure: single logical address space, from 0 to high — Code | Static & Global | Heap → … ← Stack]

• Code, static, & global data have known size
  Use symbolic labels in the code
• Heap & stack both grow & shrink over time
• This is a virtual address space
• Better utilization if stack & heap grow toward each other
  Very old result (Knuth)
• Code & data separate or interleaved
• Uses address space, not allocated memory
How Does This Really Work?
The Big Picture

[Figure: the compiler's view is one virtual address space (Code | Static & Global | Heap | Stack); the OS's view is many such virtual address spaces, one per process; the hardware's view is the single physical address space onto which they are all mapped]
Where Do Local Variables Live?
A simplistic model
• Allocate a data area for each distinct scope
• One data area per "sheaf" in the scoped table

What about recursion?
• Need a data area per invocation (or activation) of a scope
• We call this the scope's activation record
• The compiler can also store control information there!

A more complex scheme
• One activation record (AR) per procedure instance
• All the procedure's scopes share a single AR (may share space)
• Static relationship between scopes in a single procedure

Used this way, "static" means knowable at compile time (and, therefore, fixed).
Translating Local Names
How does the compiler represent a specific instance of x?
• The name is translated into a static coordinate
  <level, offset> pair
  "level" is the lexical nesting level of the procedure
  "offset" is unique within that scope
• Subsequent code will use the static coordinate to generate addresses and references
• "level" is a function of the table in which x is found
  Stored in the entry for each x
• "offset" must be assigned and stored in the symbol table
  Assigned at compile time
  Known at compile time
  Used to generate code that executes at run-time
Storage for Blocks within a Single Procedure
• Fixed-length data can always be at a constant offset from the beginning of a procedure
  In our example, the a declared at level 0 will always be the first data element, stored at byte 0 in the fixed-length data area
  The x declared at level 1 will always be the sixth data item, stored at byte 20 in the fixed-length data area
  The x declared at level 2 will always be the eighth data item, stored at byte 28 in the fixed-length data area
  But what about the a declared in the second block at level 2?

B0: {
  int a, b, c
  B1: {
    int v, b, x, w
    B2: {
      int x, y, z
      ….
    }
    B3: {
      int x, a, v
      …
    }
    …
  }
  …
}
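The byte offsets quoted above (a at 0, the level-1 x at 20, the level-2 x at 28) follow from assigning 4-byte ints consecutive slots in lexical order; a sketch of that assignment, walking the declarations of B0, B1, and B2 (4-byte ints are an assumption, matching the slide's arithmetic):

```c
/* Declarations of B0, B1, B2 in lexical order; each gets the next
 * 4-byte slot in the procedure's fixed-length data area. */
static const char *decls[] = { "a", "b", "c",        /* B0, level 0 */
                               "v", "b", "x", "w",   /* B1, level 1 */
                               "x", "y", "z" };      /* B2, level 2 */

/* Constant offset of the i-th declaration, fixed at compile time. */
int offset_of(int decl_index)
{
    if (decl_index < 0 || decl_index >= (int)(sizeof decls / sizeof decls[0]))
        return -1;             /* no such declaration */
    return decl_index * 4;     /* 4 bytes per int, assigned in lexical order */
}
```

The teaser in the slide (where does B3's a go?) is exactly what this naive lexical walk does not answer: B2 and B3 are sibling blocks, so a smarter layout could let their locals share offsets.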
Variable-length Data
Arrays
• If size is fixed at compile time, store in the fixed-length data area
• If size is variable, store a descriptor in the fixed-length area, with a pointer to the variable-length area
• The variable-length data area is assigned at the end of the fixed-length area for the block in which it is allocated

B0: {
  int a, b
  … assign value to a
  B1: {
    int v(a), b, x
    B2: {
      int x, y(8)
      ….
    }

Layout:  a  b  v  b  x  x  y(8) | v(a)
         (fixed-length area)      (variable-length area)

The variable-length data area includes the variable-length data for all blocks in the procedure …
Activation Record Basics

One AR for each invocation of a procedure. Its fields, with the role of each:

  parameters         : space for parameters to the current routine
  register save area : saved register contents
  return value       : if a function, space for the return value
  return address     : address at which to resume the caller
  addressability     : help with non-local access
  caller's ARP       : to restore the caller's AR on a return
  local variables    : space for local values & variables (including spills)

The ARP (activation record pointer) points to a fixed position in the AR, so every field sits at a known offset from it.
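The layout above can be written down as a struct; a C sketch (the field types, array sizes, and slot count are assumptions for illustration — a real AR is laid out by the compiler at known offsets, not declared in source):

```c
/* Sketch of one activation record, mirroring the field list above. */
typedef struct AR AR;
struct AR {
    int   params[4];        /* parameters to the current routine          */
    int   saved_regs[8];    /* register save area                         */
    int   return_value;     /* if a function, space for the return value  */
    void *return_address;   /* address at which to resume the caller      */
    AR   *access_link;      /* addressability: help with non-local access */
    AR   *callers_arp;      /* to restore the caller's AR on a return     */
    int   locals[1];        /* local values & variables, grown as needed  */
};
```

One such record exists per invocation, so recursive calls each get their own copy of every field.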
Activation Record Details
How does the compiler find the variables?
• They are at known offsets from the AR pointer
• The static coordinate leads to a "loadAI" operation
  Level specifies an ARP, offset is the constant

Variable-length data
• If the AR can be extended, put it below the local variables
• Leave a pointer at a known offset from the ARP
• Otherwise, put variable-length data on the heap

Initializing local variables
• Must generate explicit code to store the values
• Among the procedure's first actions
Activation Record Details
Where do activation records live?
• If lifetime of AR matches lifetime of invocation, AND
• If code normally executes a "return"
  → keep ARs on a stack
• If a procedure can outlive its caller, OR
• If it can return an object that can reference its execution state
  → ARs must be kept in the heap
• If a procedure makes no calls
  → AR can be allocated statically

Efficiency prefers static, then stack, then heap

(Yes! This stack is the run-time stack from the address-space layout: Code | Static & Global | Heap | Stack.)
Communicating Between Procedures
Most languages provide a parameter passing mechanism
  Expression used at the "call site" becomes a variable in the callee

Two common binding mechanisms
• Call-by-reference passes a pointer to the actual parameter
  Requires a slot in the AR (for the address of the parameter)
  Multiple names with the same address?  (consider call fee(x,x,x);)
• Call-by-value passes a copy of its value at time of call
  Requires a slot in the AR
  Each name gets a unique location (may have the same value)
  Arrays are mostly passed by reference, not value
• Can always use global variables …
Establishing Addressability
Must create base addresses
• Global & static variables
  Construct a label by mangling names (i.e., &_fee)
• Local variables
  Convert to a static coordinate and use ARP + offset
• Local variables of other procedures
  Convert to static coordinates
  Find the appropriate ARP
  Use that ARP + offset
    Must find the right AR; need links to nameable ARs
Establishing Addressability
Using access links
• Each AR has a pointer to the AR of its lexical ancestor
• The lexical ancestor need not be the caller
• A reference to <p,16> runs up the access link chain to p
• Cost of access is proportional to lexical distance

[Figure: three ARs, each with the fields parameters, register save area, return value, return address, access link, caller's ARP, and local variables; each access link points to the AR of the lexical ancestor]

Some setup cost on each call
Establishing Addressability
Using access links
• Access & maintenance cost varies with level
• All accesses are relative to the ARP (r0)

Assume
• Current lexical level is 2
• Access link is at ARP - 4

Maintaining the access link
• Calling level k+1: use the current ARP as the link
• Calling level j < k: find the ARP for level j – 1; use that ARP as the link
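Resolving a static coordinate <level, offset> from a procedure at level k is a walk of k - level access links; a C sketch (the AR is reduced to just its access link and a locals array, an illustration rather than a real frame layout):

```c
typedef struct AR AR;
struct AR {
    AR *access_link;   /* AR of the lexical ancestor (ARP - 4 in the slide) */
    int locals[16];    /* stand-in for the rest of the record */
};

/* Resolve static coordinate <level, offset> from a procedure running
 * at lexical level cur_level: follow (cur_level - level) access links,
 * then index at the offset.  Cost grows with lexical distance. */
int fetch(AR *arp, int cur_level, int level, int offset)
{
    for (int hops = cur_level - level; hops > 0; hops--)
        arp = arp->access_link;
    return arp->locals[offset];
}
```

A reference to a variable of the current procedure walks zero links; a reference to the outermost procedure walks the whole chain.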
Establishing Addressability
Using a display
• Global array of pointers to nameable ARs
• The needed ARP is an array access away
• A reference to <p,16> looks up p's ARP in the display & adds 16
• Cost of access is constant (ARP + offset)

[Figure: a display with slots for levels 0 through 3, each pointing to the most recent AR at that level; each AR carries a "saved ptr." field holding the display entry it displaced]

Some setup cost on each call
Establishing Addressability
Using a display
• Access & maintenance costs are fixed
• Address of the display may consume a register

Assume
• Current lexical level is 2
• Display is at label _disp

Maintaining the display
• On entry to level j
  Save the level j entry into the AR (Saved Ptr field)
  Store the ARP in the level j slot
• On exit from level j
  Restore the level j entry

The desired AR is at _disp + 4 × level
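With a display, the same <level, offset> reference costs one array lookup plus an add, independent of lexical distance. A C sketch including the save/restore protocol from the slide (again with a toy AR holding only the saved pointer and locals):

```c
#define MAX_LEVEL 8

typedef struct AR AR;
struct AR {
    AR *saved_ptr;     /* displaced display entry for this level */
    int locals[16];
};

static AR *display[MAX_LEVEL];   /* _disp: level -> most recent ARP */

/* On entry to a procedure at level j: save the old display entry in
 * the AR's Saved Ptr field, then install our own ARP.
 * On exit: restore the saved entry. */
void enter(AR *arp, int j) { arp->saved_ptr = display[j]; display[j] = arp; }
void leave(AR *arp, int j) { display[j] = arp->saved_ptr; }

/* Constant-cost access: one array lookup, then ARP + offset. */
int fetch(int level, int offset) { return display[level]->locals[offset]; }
```

The save/restore pair is why recursion still works: each new activation at level j remembers the entry it displaced and puts it back on exit.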
Establishing Addressability
Access links versus displays
• Each adds some overhead to each call
• Access link costs vary with the level of the reference
  Overhead only incurred on references & calls
  If ARs outlive the procedure, access links still work
• Display costs are fixed for all references
  References & calls must load the display address
  Typically, this requires a register (rematerialization)

Your mileage will vary
• Depends on the ratio of non-local accesses to calls
• An extra register can make a difference in overall speed

For either scheme to work, the compiler must insert code into each procedure call & return
Procedure Linkages
How do procedure calls actually work?
• At compile time, the callee may not be available for inspection
  Different calls may be in different compilation units
  Compiler may not know system code from user code
  All calls must use the same protocol

Compiler must use a standard sequence of operations
• Enforces control & data abstractions
• Divides responsibility between caller & callee
Usually a system-wide agreement (for interoperability)
Procedure Linkages
Standard procedure linkage

[Figure: procedure p's pre-call sequence jumps to procedure q's prolog; q's epilog returns control to p's post-return sequence]

The procedure has
• a standard prolog
• a standard epilog

Each call involves a
• pre-call sequence
• post-return sequence

These are completely predictable from the call site; they depend on the number & type of the actual parameters
Procedure Linkages
Pre-call Sequence
• Sets up the callee's basic AR
• Helps preserve the caller's own environment

The Details
• Allocate space for the callee's AR
  except space for local variables
• Evaluate each parameter & store its value or address
• Save the return address and the caller's ARP into the callee's AR
• If access links are used
  Find the appropriate lexical ancestor & copy it into the callee's AR
• Save any caller-save registers
  Save into space in the caller's AR
• Jump to the address of the callee's prolog code
Procedure Linkages
Post-return Sequence
• Finish restoring the caller's environment
• Place any returned value where it belongs

The Details
• Copy the return value from the callee's AR, if necessary
• Free the callee's AR
• Restore any caller-save registers
• Restore any call-by-reference parameters to registers, if needed
  Also copy back call-by-value/result parameters
• Continue execution after the call
Procedure Linkages
Prolog Code
• Finish setting up the callee's environment
• Preserve parts of the caller's environment that will be disturbed

The Details
• Preserve any callee-save registers
• If a display is being used
  Save the display entry for the current lexical level
  Store the current ARP into the display for the current lexical level
• Allocate space for local data
  Easiest scenario is to extend the AR
  (With a heap-allocated AR, may need a separate heap object for local variables)
• Find any static data areas referenced in the callee
• Handle any local variable initializations
Procedure Linkages
Epilog Code
• Wind up the business of the callee
• Start restoring the caller's environment

The Details
• Store return value? No, this happens on the return statement
• Restore callee-save registers
• Free space for local data, if necessary (on the heap)
• Load the return address from the AR
• Restore the caller's ARP
• Jump to the return address

If ARs are stack allocated, freeing may not be necessary. (The caller can reset stacktop to its pre-call value.)