CSE P501 – Compiler Construction Parser Semantic Actions Intermediate Representations AST Linear Next Spring 2014 Jim Hogg - UW - CSE - P501 G-1

Jan 11, 2016


Wilfrid Glenn

CSE P501 – Compiler Construction

Parser Semantic Actions Intermediate Representations

AST Linear

Next

Spring 2014 Jim Hogg - UW - CSE - P501 G-1

Parts of a Compiler

[Figure: compiler pipeline from Source to Target, grouped into Front End, 'Middle End', and Back End]

chars → Scan → tokens → Parse → AST → Semantics → AST → Convert → IR → Optimize → IR → Select Instructions → IR → Allocate Registers → IR → Emit → Machine Code

AST = Abstract Syntax Tree
IR = Intermediate Representation

What does a Parser Do?

So far, we have discussed only Recognizers, which accept or reject a program as valid. Need to do more to be useful

Idea: at significant points during parse, perform a semantic action
• Typically at an LR reduce step; or a convenient point in LL
• Attached to parser code - "syntax-directed translation"

Typical semantic actions
• Compiler: build and return a representation of the chunk of input we have parsed so far (typically, AST)
• Interpreter: compute and return result
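
A minimal sketch of syntax-directed translation, assuming a hand-written recursive-descent (LL) parser for a small expression grammar. The names `Parser`, `BuildAst` and `Evaluate` are invented for illustration and are not from the course. The same parser yields an AST (compiler-style) or a value (interpreter-style), depending only on which semantic actions are attached.

```python
import re

def tokenize(text):
    # Integers, identifiers, and single-character operators/parentheses.
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*/()]", text)

class BuildAst:
    """Compiler-style actions: return a tree for the input parsed so far."""
    def number(self, text): return ("int", int(text))
    def name(self, text): return ("id", text)
    def binary(self, op, left, right): return (op, left, right)

class Evaluate:
    """Interpreter-style actions: compute and return the result."""
    def __init__(self, env): self.env = env
    def number(self, text): return int(text)
    def name(self, text): return self.env[text]
    def binary(self, op, left, right):
        if op == "+": return left + right
        if op == "-": return left - right
        if op == "*": return left * right
        return left // right  # integer division: an assumption of this sketch

class Parser:
    def __init__(self, tokens, actions):
        self.tokens, self.pos, self.actions = tokens, 0, actions

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expr(self):    # E -> T { (+|-) T }
        left = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            left = self.actions.binary(op, left, self.term())  # semantic action
        return left

    def term(self):    # T -> F { (*|/) F }
        left = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            left = self.actions.binary(op, left, self.factor())  # semantic action
        return left

    def factor(self):  # F -> int | id | ( E )
        tok = self.take()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            inner = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if tok.isdigit():
            return self.actions.number(tok)   # semantic action
        if tok[0].isalpha() or tok[0] == "_":
            return self.actions.name(tok)     # semantic action
        raise SyntaxError(f"unexpected token: {tok}")

ast = Parser(tokenize("2 * (m + n)"), BuildAst()).expr()
value = Parser(tokenize("2 * (m + n)"), Evaluate({"m": 3, "n": 4})).expr()
print(ast)    # ('*', ('int', 2), ('+', ('id', 'm'), ('id', 'n')))
print(value)  # 14
```

Keeping the actions in a separate object is just one design choice; parser generators typically let you write the action code inline next to each production instead.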

Intermediate Representations

In most compilers, the parser builds an intermediate representation (IR) of the program
• “intermediate” between source & target

Rest of the compiler transforms the IR to improve (“optimize”) it; eventually translate to final code
• Sometimes use multiple forms of IR on the way
• Eg: Muchnick’s HIR, MIR, LIR (“lowering”)

Choosing the 'right' IR is crucial!

Some general examples now; specific examples as we cover later topics

IR Design

Decisions affect speed and efficiency of the compiler. Difficult to change later!

Desirable properties
• Easy to generate
• Easy to manipulate, for analysis and transformations
• Expressive
• Appropriate level of abstraction
• Efficient & compact (memory footprint; cache performance)

Different tradeoffs depending on compiler goals
• Eg: Throughput versus Code Quality; JIT versus 'batch'

Different tradeoffs in different parts of the same compiler

IR Design Taxonomy

Structure
• Graphical (trees, DAGs, etc)
• Linear (code for some abstract machine)
• Hybrids are common (eg: Flowgraphs)

Abstraction Level
• High-level, near to source language
• Low-level, closer to machine

Levels of Abstraction

Key design decision: how much detail to expose
• Affects possibility and profitability of various optimizations
• Structural IRs are typically fairly high-level
• Linear IRs are typically low-level

Examples: Array Reference

Source
A[i, j]

AST
subscript node with children A, i, j

High-level, linear IR
t1 ← A[i, j]

Low-level, linear IR
loadI 1 => r1
sub rj,r1 => r2
loadI 10 => r3
mult r2,r3 => r4
sub ri,r1 => r5
add r4,r5 => r6
loadI @A => r7
add r7,r6 => r8
load r8 => r9
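
The low-level sequence for A[i, j] can be checked by hand. The sketch below interprets the nine instructions in Python, under assumptions not stated on the slide: registers `ri` and `rj` already hold the index values, `@A` is the array's base address, and memory is addressed in element-sized units. Under those assumptions the code computes `@A + (j - 1) * 10 + (i - 1)`, that is, 1-based indexing with a stride of 10 elements between successive values of j. The `run_iloc` helper is hypothetical.

```python
# The nine instructions from the slide, as (opcode, operands, destination).
PROGRAM = [
    ("loadI", (1,), "r1"),
    ("sub", ("rj", "r1"), "r2"),
    ("loadI", (10,), "r3"),
    ("mult", ("r2", "r3"), "r4"),
    ("sub", ("ri", "r1"), "r5"),
    ("add", ("r4", "r5"), "r6"),
    ("loadI", ("@A",), "r7"),
    ("add", ("r7", "r6"), "r8"),
    ("load", ("r8",), "r9"),
]

def run_iloc(program, regs, symbols, memory):
    """Execute the instructions and return the final register contents."""
    regs = dict(regs)
    for opcode, args, dest in program:
        if opcode == "loadI":
            # Immediate operand: a literal, or a symbolic address such as @A.
            operand = args[0]
            regs[dest] = symbols[operand] if isinstance(operand, str) else operand
        elif opcode == "load":
            regs[dest] = memory[regs[args[0]]]
        else:
            a, b = regs[args[0]], regs[args[1]]
            regs[dest] = {"add": a + b, "sub": a - b, "mult": a * b}[opcode]
    return regs

i, j, base = 3, 5, 1000
expected_address = base + (j - 1) * 10 + (i - 1)
regs = run_iloc(PROGRAM, {"ri": i, "rj": j}, {"@A": base}, {expected_address: 42})
print(regs["r8"], regs["r9"])  # 1042 42
```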


Structural IRs

Typically reflect source (or other higher-level) language structure

Tend to be large - all those grammar NonTerminals

Examples: syntax trees, DAGs

Generally used in early phases of compilers

Concrete Syntax Trees

Also called “Parse Trees”

Useful for IDE; syntax coloring; refactoring; source-to-source translation (which also retains comments)

Full grammar is needed to guide the parser; but once parsed, we don’t need:
• NonTerminals used to define associativity & precedence (recall E, T, F in Expression Grammar)
• NonTerminals for every production
• Punctuation, such as ( ) { } - these help us express a tree structure in linear text format (consider XML and LISP)

Syntax Tree Example

Concrete syntax for x=2*(n+m);

A → id = E
E → E + T | E – T | T
T → T * F | T / F | F
F → int | id | ( E )

Parse Tree: x = 2 * (m + n)

A → id = E
E → E + T | E – T | T
T → T * F | T / F | F
F → int | id | ( E )

A
  id: x
  =
  E
    T
      T
        F
          int: 2
      *
      F
        (
        E
          E
            T
              F
                id: m
          +
          T
            F
              id: n
        )

Legend: a highlighted node on the slide denotes a node that survives into the AST

Abstract Syntax Trees

Want only essential structural information
• Omit extraneous junk

Can be represented explicitly as a tree or in a linear form
• Example: LISP/Scheme S-expressions are essentially ASTs

Common output from parser; used for static semantics (type checking, etc) and high-level optimizations
• Usually lowered for later compiler phases

Parse Tree & AST

A → id = E
E → E + T | E – T | T
T → T * F | T / F | F
F → int | id | ( E )

x = 2 * (m + n)

AST
=
  id:x
  *
    int:2
    +
      id:m
      id:n

Think of each box as a Java object

[Figure: the full parse tree for the same statement is shown beside the AST for comparison]
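
The slide suggests thinking of each AST box as an object. A minimal sketch of that idea follows, using Python dataclasses rather than Java. The class names `Assign`, `BinOp`, `Id` and `IntLit` and the `to_sexpr` helper are invented here for illustration. Printing the tree as an S-expression shows a linear form of the same AST.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class IntLit:
    value: int

@dataclass(frozen=True)
class Id:
    name: str

@dataclass(frozen=True)
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Assign:
    target: Id
    value: "Expr"

Expr = Union[IntLit, Id, BinOp]

def to_sexpr(node):
    """Render an AST in LISP-style prefix notation."""
    if isinstance(node, IntLit):
        return str(node.value)
    if isinstance(node, Id):
        return node.name
    if isinstance(node, BinOp):
        return f"({node.op} {to_sexpr(node.left)} {to_sexpr(node.right)})"
    if isinstance(node, Assign):
        return f"(= {to_sexpr(node.target)} {to_sexpr(node.value)})"
    raise TypeError(f"not an AST node: {node!r}")

# AST for x = 2 * (m + n): no E/T/F nodes and no parentheses survive.
tree = Assign(Id("x"), BinOp("*", IntLit(2), BinOp("+", Id("m"), Id("n"))))
print(to_sexpr(tree))  # (= x (* 2 (+ m n)))
```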

Directed Acyclic Graphs

DAGs often used to identify common sub-expressions (CSEs)

Not necessarily a primary representation: the compiler might build a DAG, then translate back after some code improvement

Leaves = operands
Interior nodes = operators


Expression DAG example

DAG for a + a * (b – c) + (b – c) * d


AST for a + a * (b – c) + (b – c) * d

+
  +
    id:a
    *
      id:a
      -
        id:b
        id:c
  *
    -
      id:b
      id:c
    id:d

Duplicated Nodes: the (b – c) subtree appears twice

DAG for a + a * (b – c) + (b – c) * d

Original AST: the (b – c) subtree appears twice

+
  +
    id:a
    *
      id:a
      -
        id:b
        id:c
  *
    -
      id:b
      id:c
    id:d

'Folded' AST or DAG: both uses of (b – c) point to one shared subtree (green on the slide)

+
  +
    id:a
    *
      id:a
      -   (shared)
        id:b
        id:c
  *
    (the shared - node)
    id:d

When we come to generate code (compiler) or evaluate (interpreter), we will process the shared (green) nodes only once. Example of Common Sub-Expression Elimination (loosely called “CSE”, altho' it should really be "CSEE")
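
One common way to fold an AST into a DAG is hash-consing: keep a table keyed by (operator, children) and reuse an existing node whenever the same key comes up again. The sketch below is one possible implementation, not necessarily how the course builds its DAGs; `DagBuilder` and `evaluate` are hypothetical names. Unlike the slide's figure, this version also shares the two `id:a` leaves.

```python
class DagBuilder:
    def __init__(self):
        self.table = {}  # (op, child ids...) -> node id
        self.nodes = []  # node id -> (op, child ids...)

    def node(self, op, *children):
        """Return the id of the node for this key, creating it only if new."""
        key = (op, *children)
        if key not in self.table:
            self.table[key] = len(self.nodes)
            self.nodes.append(key)
        return self.table[key]

    def leaf(self, name):
        return self.node("id:" + name)

def evaluate(builder, root, env):
    """Evaluate each DAG node exactly once, in creation order (children first)."""
    values = []
    for op, *kids in builder.nodes:
        if op.startswith("id:"):
            values.append(env[op[3:]])
        else:
            left, right = values[kids[0]], values[kids[1]]
            values.append({"+": left + right, "-": left - right, "*": left * right}[op])
    return values[root]

# a + a * (b - c) + (b - c) * d
dag = DagBuilder()
a, b, c, d = (dag.leaf(name) for name in "abcd")
b_minus_c = dag.node("-", b, c)
left = dag.node("+", a, dag.node("*", a, b_minus_c))
root = dag.node("+", left, dag.node("*", dag.node("-", b, c), d))  # "-" is reused

print(len(dag.nodes))  # 9 nodes, versus 13 in the plain AST
print(evaluate(dag, root, {"a": 2, "b": 7, "c": 3, "d": 5}))  # 30
```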

Linear IRs

Pseudo-code for some abstract machine

Level of abstraction varies
• Eg: t ← a[i, j] rather than @a + 4 * (i * numcols + j)
• Eg: no registers, just variables & temps

Simple, compact data structures
• Commonly used: arrays, linked lists

Examples
• Three-Address Code (TAC) – ADD t123, b, c
• Stack machine code – push a; push b; add

IRs for a[i, j+2]

High-level
a[i, j+2]
• Akin to source code

Medium-level
t1 ← j + 2
t2 ← i * 20
t3 ← t1 + t2
t4 ← 4 * t3
t5 ← addr a
t6 ← t5 + t4
t7 ← *t6
• Spells out 2-D indexing
• Defines temps - like virtual registers

Low-level
r1 ← [fp-4]
r2 ← r1 + 2
r3 ← [fp-8]
r4 ← r3 * 20
r5 ← r4 + r2
r6 ← 4 * r5
r7 ← fp – 216
f1 ← [r7+r6]
• Full detail
• Actual machine registers
• Actual locations (no variable names)
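
The medium-level form can be produced mechanically from the high-level one. A minimal sketch, assuming (as the slide's constants suggest) 20 elements per row and 4-byte elements; `lower_array_load` is a hypothetical helper, and the output writes `<-` for the slide's arrow.

```python
import itertools

def lower_array_load(array, row, col_base, col_offset, row_len=20, elem_size=4):
    """Return medium-level TAC lines for array[row, col_base + col_offset]."""
    counter = itertools.count(1)
    code = []

    def emit(rhs):
        # Invent a fresh temp, record the instruction, and hand the temp back.
        temp = f"t{next(counter)}"
        code.append(f"{temp} <- {rhs}")
        return temp

    col = emit(f"{col_base} + {col_offset}")  # column index
    row_start = emit(f"{row} * {row_len}")    # elements in the preceding rows
    index = emit(f"{col} + {row_start}")      # flat element index
    offset = emit(f"{elem_size} * {index}")   # byte offset
    base = emit(f"addr {array}")              # base address of the array
    address = emit(f"{base} + {offset}")      # address of the element
    emit(f"*{address}")                       # load the element
    return code

tac = lower_array_load("a", "i", "j", 2)
print("\n".join(tac))
```

With the default parameters this reproduces the seven medium-level instructions listed for a[i, j+2].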


Abstraction Level Tradeoffs

High-level: good for source optimizations; semantic checking; refactoring

Medium-level: great for machine-independent optimizations. Many (all?) optimizing compilers work at this level

Low-level: required for actual memory (frame) layout; target instruction selection; register allocation; peephole (target-specific) optimizations

Examples:
• Cooper & Torczon "ILOC"
• LLVM - http://llvm.org
• SUIF - http://suif.stanford.edu

Three-Address Code (TAC)

Usual form: x ← y op z
• One operator
• Maximum of 3 names
• (Copes with: nullary x ← y and unary x ← op y)

Eg: x = 2 * (m + n) becomes
t1 ← m + n; t2 ← 2 * t1; x ← t2

• You may prefer: add t1, m, n; mul t2, 2, t1; mov x, t2
• Invent as many new temp names as needed. “Expression temps” don’t correspond to any user variables; they de-anonymize expressions

Store in a quad(ruple) <lhs, rhs1, op, rhs2>
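
A minimal sketch of generating quads with a post-order walk of an AST. The tuple-based AST shape and the names `gen_assign` and `gen_expr` are assumptions made for this example; the quad layout follows the <lhs, rhs1, op, rhs2> order given above.

```python
import itertools

def gen_assign(target, expr):
    """Translate `target = expr` into a list of quads (lhs, rhs1, op, rhs2)."""
    quads = []
    temps = itertools.count(1)

    def gen_expr(node):
        # Leaves (names and constants) need no code; use them as operands.
        if not isinstance(node, tuple):
            return node
        op, left, right = node
        rhs1 = gen_expr(left)
        rhs2 = gen_expr(right)
        temp = f"t{next(temps)}"  # fresh expression temp
        quads.append((temp, rhs1, op, rhs2))
        return temp

    result = gen_expr(expr)
    quads.append((target, result, None, None))  # plain copy: x <- t2
    return quads

# x = 2 * (m + n), with interior nodes written as (op, left, right)
quads = gen_assign("x", ("*", 2, ("+", "m", "n")))
for quad in quads:
    print(quad)
# ('t1', 'm', '+', 'n')
# ('t2', 2, '*', 't1')
# ('x', 't2', None, None)
```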

Three Address Code

Advantages
• Resembles code for actual machines
• Explicitly names intermediate results
• Compact
• Often easy to rearrange

Various representations
• Quadruples, triples, SSA (Static Single Assignment)
• We will see much more of this…

Stack Machine Code Example

Hypothetical code for x = 2 * (m + n)

pushaddr x
pushconst 2
pushval n
pushval m
add
mult
store

Compact: common opcodes just 1 byte wide; instructions have 0 or 1 operand

Stack contents (bottom to top)
after the four pushes: @x 2 n m
after add: @x 2 m+n
after mult: @x 2*(m+n)
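
To watch the stack evolve, here is a small interpreter for the seven hypothetical instructions. Their exact semantics are assumed for this sketch: `pushaddr` pushes a variable's address (modelled as `@name`), `pushval` pushes its current value, and `store` pops a value and then an address. `run_stack` is an invented name.

```python
def run_stack(program, memory):
    """Execute the program; return the stack contents after each instruction."""
    stack = []
    trace = []
    for instr in program:
        opcode, *operand = instr.split()
        if opcode == "pushaddr":
            stack.append("@" + operand[0])
        elif opcode == "pushconst":
            stack.append(int(operand[0]))
        elif opcode == "pushval":
            stack.append(memory[operand[0]])
        elif opcode in ("add", "mult"):
            right, left = stack.pop(), stack.pop()
            stack.append(left + right if opcode == "add" else left * right)
        elif opcode == "store":
            value, address = stack.pop(), stack.pop()
            memory[address[1:]] = value  # strip the "@" to get the variable name
        else:
            raise ValueError(f"unknown opcode: {opcode}")
        trace.append(list(stack))
    return trace

program = ["pushaddr x", "pushconst 2", "pushval n", "pushval m",
           "add", "mult", "store"]
memory = {"m": 3, "n": 4, "x": 0}
trace = run_stack(program, memory)
print(trace[3])     # ['@x', 2, 4, 3]
print(trace[4])     # ['@x', 2, 7]
print(trace[5])     # ['@x', 14]
print(memory["x"])  # 14
```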

Stack Machine Code

Originally used for stack-based computers (famous example: B5000, ~1961)

Also now used for virtual machines:
• UCSD Pascal – pcode
• Forth
• Java bytecode in .class files (generated by Java compiler)
• MSIL in a .dll or .exe assembly (generated by C#/F#/VB compiler)

Advantages
• Compact; mostly 0-address opcodes (fast download over network)
• Easy to generate; easy to write a FrontEnd compiler, leaving the 'heavy lifting' and optimizations to the JIT
• Simple to interpret or compile to machine code

Disadvantages
• Inconvenient/difficult to optimize directly
• Does not match up with modern chip architectures

Hybrid IRs

Combination of structural and linear

Level of abstraction varies

Most common example: control-flow graph
• Nodes: basic blocks
• Edge from B1 to B2 if execution can flow from B1 to B2

Basic Blocks: Starting Tuples

 1  i = 1
 2  j = 1
 3  t1 = 10 * i
 4  t2 = t1 + j
 5  t3 = 8 * t2
 6  t4 = t3 - 88
 7  a[t4] = 0
 8  j = j + 1
 9  if j <= 10 goto #3
10  i = i + 1
11  if i <= 10 goto #2
12  i = 1
13  t5 = i - 1
14  t6 = 88 * t5
15  a[t6] = 1
16  i = i + 1
17  if i <= 10 goto #13

Typical "tuple stew" - IR generated by traversing an AST

Partition into Basic Blocks:
• Sequence of consecutive instructions
• No jumps into the middle of a BB
• No jumps out of the middle of a BB
• "I've started, so I'll finish"
• (Ignore exceptions)

Basic Blocks: Leaders

 1  i = 1                  (leader)
 2  j = 1                  (leader)
 3  t1 = 10 * i            (leader)
 4  t2 = t1 + j
 5  t3 = 8 * t2
 6  t4 = t3 - 88
 7  a[t4] = 0
 8  j = j + 1
 9  if j <= 10 goto #3
10  i = i + 1              (leader)
11  if i <= 10 goto #2
12  i = 1                  (leader)
13  t5 = i - 1             (leader)
14  t6 = 88 * t5
15  a[t6] = 1
16  i = i + 1
17  if i <= 10 goto #13

Identify Leaders (first instruction in a basic block):
• First instruction is a leader
• Any target of a branch/jump/goto
• Any instruction immediately after a branch/jump/goto

Leaders are shown in red on the slide (marked "leader" here). Why is each leader a leader?
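
The three leader rules translate directly into a linear scan. A minimal sketch over the 17 tuples, assuming every branch is written as `... goto #N`; `find_leaders` and `basic_blocks` are hypothetical helper names.

```python
import re

TUPLES = {
    1: "i = 1", 2: "j = 1", 3: "t1 = 10 * i", 4: "t2 = t1 + j",
    5: "t3 = 8 * t2", 6: "t4 = t3 - 88", 7: "a[t4] = 0", 8: "j = j + 1",
    9: "if j <= 10 goto #3", 10: "i = i + 1", 11: "if i <= 10 goto #2",
    12: "i = 1", 13: "t5 = i - 1", 14: "t6 = 88 * t5", 15: "a[t6] = 1",
    16: "i = i + 1", 17: "if i <= 10 goto #13",
}

def find_leaders(tuples):
    numbers = sorted(tuples)
    leaders = {numbers[0]}                    # rule 1: first instruction
    for n in numbers:
        match = re.search(r"goto #(\d+)", tuples[n])
        if match:
            leaders.add(int(match.group(1)))  # rule 2: target of a branch
            if n + 1 in tuples:
                leaders.add(n + 1)            # rule 3: instruction after a branch
    return sorted(leaders)

def basic_blocks(tuples):
    """Each block runs from one leader up to just before the next leader."""
    leaders = find_leaders(tuples)
    ends = [nxt - 1 for nxt in leaders[1:]] + [max(tuples)]
    return [list(range(start, end + 1)) for start, end in zip(leaders, ends)]

print(find_leaders(TUPLES))  # [1, 2, 3, 10, 12, 13]
print(basic_blocks(TUPLES))
```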

Basic Blocks: Flowgraph

ENTRY

B1:
  i = 1

B2:
  j = 1

B3:
  t1 = 10 * i
  t2 = t1 + j
  t3 = 8 * t2
  t4 = t3 - 88
  a[t4] = 0
  j = j + 1
  if j <= 10 goto B3

B4:
  i = i + 1
  if i <= 10 goto B2

B5:
  i = 1

B6:
  t5 = i - 1
  t6 = 88 * t5
  a[t6] = 1
  i = i + 1
  if i <= 10 goto B6

EXIT

Edges: ENTRY → B1 → B2 → B3; B3 → B3; B3 → B4; B4 → B2; B4 → B5; B5 → B6; B6 → B6; B6 → EXIT

Control Flow Graph ("CFG", again!)
• 3 loops total
• 2 of the loops are nested

Most of the execution time is likely spent in loop bodies; that's where to focus efforts at optimization
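
The edges of the flowgraph follow from each block's last instruction: a conditional branch contributes one edge to its target and one fall-through edge to the next block. A minimal sketch, assuming every branch is conditional (true of this example) and that blocks are listed in program order; the `BLOCKS` table, `build_cfg` and `back_edges` are invented for illustration.

```python
import re

BLOCKS = [
    ("B1", ["i = 1"]),
    ("B2", ["j = 1"]),
    ("B3", ["t1 = 10 * i", "t2 = t1 + j", "t3 = 8 * t2", "t4 = t3 - 88",
            "a[t4] = 0", "j = j + 1", "if j <= 10 goto B3"]),
    ("B4", ["i = i + 1", "if i <= 10 goto B2"]),
    ("B5", ["i = 1"]),
    ("B6", ["t5 = i - 1", "t6 = 88 * t5", "a[t6] = 1", "i = i + 1",
            "if i <= 10 goto B6"]),
]

def build_cfg(blocks):
    """Map each block name to the list of its successor block names."""
    names = [name for name, _ in blocks]
    edges = {"ENTRY": [names[0]]}
    for index, (name, code) in enumerate(blocks):
        successors = []
        match = re.search(r"goto (\w+)$", code[-1])
        if match:
            successors.append(match.group(1))  # branch taken
        # Fall through to the next block, or to EXIT after the last one.
        successors.append(names[index + 1] if index + 1 < len(names) else "EXIT")
        edges[name] = successors
    return edges

def back_edges(edges, order):
    """Edges to the same or an earlier block; in this example each closes a loop."""
    position = {name: i for i, name in enumerate(order)}
    return [(src, dst) for src in order for dst in edges[src]
            if dst in position and position[dst] <= position[src]]

cfg = build_cfg(BLOCKS)
loops = back_edges(cfg, [name for name, _ in BLOCKS])
print(cfg["B3"], cfg["B4"], cfg["B6"])
print(loops)  # [('B3', 'B3'), ('B4', 'B2'), ('B6', 'B6')]
```

The three back edges correspond to the three loops; the B3 loop sits inside the B2..B4 loop, which gives the nested pair.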


Basic Blocks: Recap

A maximal sequence of instructions entered at the first instruction and exited at the last

So, if we execute the first instruction, we execute all of them

• No jumps/branches into the middle of a BB
• No jumps/branches out of the middle of a BB

We are ignoring exceptions!

Identifying Basic Blocks: Recap

Perform linear scan of instruction stream

A basic block begins at each instruction that:
• Is the beginning of a method
• Is the target of a branch
• Immediately follows a branch or return

What IR to Use?

Common choice: all of them!

AST used in early stages of the compiler
• Closer to source code
• Good for semantic analysis
• Facilitates some higher-level optimizations, such as CSEE - altho' this can be done equally well on linear IR

Lower to linear IR for optimization & codegen
• Closer to machine code
• Use to build control-flow graph
• Exposes machine-related optimizations

Hybrid (graph + linear IR) for dataflow

Next

Representing ASTs

Working with ASTs
• Where do the algorithms go?
• Is it really object-oriented?
• Visitor pattern

Semantic analysis, type-checking, and symbol tables