Instruction Selection. Presented by Huang Kuo-An, Lu Kuo-Chang. Subproject 3. A. Aho, M. Lam, R. Sethi, J. Ullman, “Instruction Selection by Tree Rewriting,” Compilers: Principles, Techniques, & Tools, 2nd edition, Pearson Education, Inc., 2007, pp. 558-563. “The LLVM Target-Independent Code Generator: Instruction Selection.”

Dec 14, 2015


Braydon Ozanne
Page 1:

Instruction Selection
Presented by Huang Kuo-An, Lu Kuo-Chang
Subproject 3

A. Aho, M. Lam, R. Sethi, J. Ullman, “Instruction Selection by Tree Rewriting,” Compilers: Principles, Techniques, & Tools, 2nd edition, Pearson Education, Inc., 2007, pp. 558-563.

“The LLVM Target-Independent Code Generator: Instruction Selection.” http://llvm.org/docs/CodeGenerator.html#instselect

Page 2:

Outline

• Introducing LLVM
• Instruction Selection
  ▫ Tree Rewriting
• Why do we use LLVM?
• Progress

Page 3:

Introducing LLVM

• The LLVM compiler infrastructure
  ▫ Provides modular & reusable components.
  ▫ Reduces the time & cost to build a particular compiler.
  ▫ Those components are shared across different compilers.

Page 4:

The Steps of the LLVM Compiler

[Diagram: C, C++ → Language Front-end → LLVM IR]

Page 5:

The Steps of the LLVM Compiler

[Diagram: C or C++ (either one) → Language Front-end → LLVM IR]

Page 6:

The Steps of the LLVM Compiler

[Diagram: C, C++ → Language Front-end → LLVM IR]

LLVM IR is an intermediate representation:
• Lower than the high-level language (simple instructions, no for loops, etc.)
• Higher than the machine code (no opcodes, no registers, etc.)

Page 7:

The Steps of the LLVM Compiler

[Diagram: C, C++ → Language Front-end → LLVM IR]

LLVM IR is an intermediate representation:
• Lower than the high-level language (simple instructions, no for loops, etc.), so it is source-language independent
• Higher than the machine code (no opcodes, no registers, etc.), so it is target-processor independent

Page 8:

The Steps of the LLVM Compiler

[Diagram: C, C++ → Language Front-end → LLVM IR → Mid-level Optimizer → LLVM IR]

Page 9:

The Steps of the LLVM Compiler

[Diagram: C, C++ → Language Front-end → LLVM IR → Mid-level Optimizer → LLVM IR → Code Generation → .s file, executable]

Page 10:

The Steps of the LLVM Compiler

[Diagram: C, C++ → Language Front-end → LLVM IR → Mid-level Optimizer → LLVM IR → Code Generation → .s file, executable]

Inside Code Generation (LLVM IR in, target machine instructions out):
• Instruction Selection
• Scheduling
• Register Allocation
• Machine-specific Optimizations
• Code Emission

Page 11:

Instruction Selection

How does the compiler translate a C statement like this:

    a[i] = b + 1

into machine code like this?

    LD  R0, #a
    ADD R0, R0, SP
    ADD R0, R0, i(SP)
    LD  R1, b
    INC R1
    ST  *R0, R1

Page 12:

Instruction Selection

How does the compiler translate the C statement a[i] = b + 1 into the machine code shown on Page 11?

First answer: break it into two steps.

Page 13:

Instruction Selection

How does the compiler translate the C statement a[i] = b + 1 into the machine code shown on Page 11?

First answer: break it into two steps.

The intermediate representation (IR):

    =
    ├─ ind
    │   └─ +
    │       ├─ +
    │       │   ├─ Ca
    │       │   └─ Rsp
    │       └─ ind
    │           └─ +
    │               ├─ Ci
    │               └─ Rsp
    └─ +
        ├─ Mb
        └─ C1

In prefix notation: =(ind(+(+(Ca, Rsp), ind(+(Ci, Rsp)))), +(Mb, C1))

Page 14:

Instruction Selection

The intermediate representation (IR), in prefix notation:

    =(ind(+(+(Ca, Rsp), ind(+(Ci, Rsp)))), +(Mb, C1))

Into machine code like this:

    LD  R0, #a
    ADD R0, R0, SP
    ADD R0, R0, i(SP)
    LD  R1, b
    INC R1
    ST  *R0, R1

Page 15:

Instruction Selection

The intermediate representation (IR), in prefix notation:

    =(ind(+(+(Ca, Rsp), ind(+(Ci, Rsp)))), +(Mb, C1))

New question: how do we go from the IR to the machine code shown on Page 11?

Page 16:

Instruction Selection

The intermediate representation (IR), in prefix notation:

    =(ind(+(+(Ca, Rsp), ind(+(Ci, Rsp)))), +(Mb, C1))

How do we go from the IR to the machine code shown on Page 11?

One answer: use tree rewriting.
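
As an illustration only, the IR tree for a[i] = b + 1 can be written down as nested tuples. The node names ("const", "reg", "mem") and the helper show are assumptions made for this sketch; they do not come from the slides or from LLVM.

```python
# Hypothetical encoding: ("const", x) is Cx, ("reg", x) is Rx, ("mem", x) is Mx;
# operators are ("ind", child), ("+", left, right) and ("=", target, value).
TREE = ("=",
        ("ind", ("+", ("+", ("const", "a"), ("reg", "SP")),
                      ("ind", ("+", ("const", "i"), ("reg", "SP"))))),
        ("+", ("mem", "b"), ("const", 1)))


def show(t):
    """Render a tree in prefix notation, e.g. +(Ca, Rsp)."""
    kind = t[0]
    if kind == "const":
        return f"C{t[1]}"
    if kind == "reg":
        return f"R{str(t[1]).lower()}"
    if kind == "mem":
        return f"M{t[1]}"
    # Operator node: render every child recursively.
    return f"{kind}({', '.join(show(child) for child in t[1:])})"


print(show(TREE))
```

Printing the tree this way makes it easy to compare each intermediate state of the rewriting walk-through with the slides.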

Page 17:

Tree Rewriting

Tree (prefix notation): =(ind(+(+(Ca, Rsp), ind(+(Ci, Rsp)))), +(Mb, C1))

Tree rewriting rules (replacement ← template, {target instruction}, equivalent IR operations):

    Ri ← Ca                       {LD Ri, #a}            ld Ri, #a
    Ri ← Mx                       {LD Ri, x}             ld Ri, x
    M  ← =(Mx, Ri)                {ST x, Ri}             st x, Ri
    Ri ← ind(+(Ca, Rj))           {LD Ri, a(Rj)}         add Rx, Rj, #a; ld Ri, Rx
    M  ← =(ind(Ri), Rj)           {ST *Ri, Rj}           st *Ri, Rj
    Ri ← +(Ri, ind(+(Ca, Rj)))    {ADD Ri, Ri, a(Rj)}    add Rx, Rj, #a; add Ri, Ri, Rx
    Ri ← +(Ri, Rj)                {ADD Ri, Ri, Rj}       add Ri, Ri, Rj
    Ri ← +(Ri, C1)                {INC Ri}               add Ri, Ri, #1

Page 20:

Tree Rewriting

Tree (prefix notation): =(ind(+(+(R0, Rsp), ind(+(Ci, Rsp)))), +(Mb, C1))
(The leaf Ca has been rewritten to R0. Rule table as on Page 17.)

Page 21:

Tree Rewriting

Tree (prefix notation): =(ind(+(+(R0, Rsp), ind(+(Ci, Rsp)))), +(Mb, C1))
(Rule table as on Page 17.)

Code emitted so far:

    LD  R0, #a

Page 24:

Tree Rewriting

Tree (prefix notation): =(ind(+(R0, ind(+(Ci, Rsp)))), +(Mb, C1))
(The subtree +(R0, Rsp) has been rewritten to R0. Rule table as on Page 17.)

Code emitted so far:

    LD  R0, #a

Page 25:

Tree Rewriting

Tree (prefix notation): =(ind(+(R0, ind(+(Ci, Rsp)))), +(Mb, C1))
(Rule table as on Page 17.)

Code emitted so far:

    LD  R0, #a
    ADD R0, R0, SP

Page 29:

Tree Rewriting

Tree (prefix notation): =(ind(R0), +(Mb, C1))
(The subtree +(R0, ind(+(Ci, Rsp))) has been rewritten to R0. Rule table as on Page 17.)

Code emitted so far:

    LD  R0, #a
    ADD R0, R0, SP

Page 30:

Tree Rewriting

Tree (prefix notation): =(ind(R0), +(Mb, C1))
(Rule table as on Page 17.)

Code emitted so far:

    LD  R0, #a
    ADD R0, R0, SP
    ADD R0, R0, i(SP)

Page 33:

Tree Rewriting

Tree (prefix notation): =(ind(R0), +(R1, C1))
(The leaf Mb has been rewritten to R1. Rule table as on Page 17.)

Code emitted so far:

    LD  R0, #a
    ADD R0, R0, SP
    ADD R0, R0, i(SP)

Page 34:

Tree Rewriting

Tree (prefix notation): =(ind(R0), +(R1, C1))
(Rule table as on Page 17.)

Code emitted so far:

    LD  R0, #a
    ADD R0, R0, SP
    ADD R0, R0, i(SP)
    LD  R1, b

Page 37:

Tree Rewriting

Tree (prefix notation): =(ind(R0), R1)
(The subtree +(R1, C1) has been rewritten to R1. Rule table as on Page 17.)

Code emitted so far:

    LD  R0, #a
    ADD R0, R0, SP
    ADD R0, R0, i(SP)
    LD  R1, b

Page 38:

Tree Rewriting

Tree (prefix notation): =(ind(R0), R1)
(Rule table as on Page 17.)

Code emitted so far:

    LD  R0, #a
    ADD R0, R0, SP
    ADD R0, R0, i(SP)
    LD  R1, b
    INC R1

Page 41:

Tree Rewriting

Tree: M
(The whole tree =(ind(R0), R1) has been rewritten to the single node M. Rule table as on Page 17.)

Code emitted so far:

    LD  R0, #a
    ADD R0, R0, SP
    ADD R0, R0, i(SP)
    LD  R1, b
    INC R1

Page 42:

Tree Rewriting

Tree: M
(Rule table as on Page 17.)

Code emitted so far:

    LD  R0, #a
    ADD R0, R0, SP
    ADD R0, R0, i(SP)
    LD  R1, b
    INC R1
    ST  *R0, R1

Page 43:

Tree Rewriting

The tree has been fully rewritten. (Rule table as on Page 17.)

Final generated code:

    LD  R0, #a
    ADD R0, R0, SP
    ADD R0, R0, i(SP)
    LD  R1, b
    INC R1
    ST  *R0, R1
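
The walk-through on the preceding pages can be sketched as a small recursive matcher. This is a minimal illustration, not the textbook's algorithm or LLVM's selector: the tuple encoding, the names select, reg and is_indexed, and the "next free register" numbering are all assumptions made for the example. It tries the larger templates before the smaller ones and emits one instruction per rule applied.

```python
from itertools import count

# Hypothetical encoding of the tree for a[i] = b + 1.
TREE = ("=",
        ("ind", ("+", ("+", ("const", "a"), ("reg", "SP")),
                      ("ind", ("+", ("const", "i"), ("reg", "SP"))))),
        ("+", ("mem", "b"), ("const", 1)))


def is_indexed(t):
    """True if t has the shape ind(+(Ca, Rj))."""
    return t[0] == "ind" and t[1][0] == "+" and t[1][1][0] == "const"


def select(tree):
    """Rewrite an assignment tree down to M; return the emitted instructions."""
    code = []
    fresh = (f"R{n}" for n in count())   # R0, R1, ... (no real register allocation)

    def reg(t):
        """Reduce subtree t to a register name, emitting code as a side effect."""
        kind = t[0]
        if kind == "reg":                # already a register, nothing to emit
            return t[1]
        if kind == "const":              # Ri <- Ca                 {LD Ri, #a}
            r = next(fresh)
            code.append(f"LD {r}, #{t[1]}")
            return r
        if kind == "mem":                # Ri <- Mx                 {LD Ri, x}
            r = next(fresh)
            code.append(f"LD {r}, {t[1]}")
            return r
        if is_indexed(t):                # Ri <- ind(+(Ca, Rj))     {LD Ri, a(Rj)}
            rj = reg(t[1][2])
            r = next(fresh)
            code.append(f"LD {r}, {t[1][1][1]}({rj})")
            return r
        if kind == "+":
            left, right = t[1], t[2]
            # Note: ri is overwritten, so this sketch assumes the left operand
            # is not a live register leaf such as SP.
            ri = reg(left)
            if is_indexed(right):        # Ri <- +(Ri, ind(+(Ca, Rj)))
                rj = reg(right[1][2])
                code.append(f"ADD {ri}, {ri}, {right[1][1][1]}({rj})")
            elif right == ("const", 1):  # Ri <- +(Ri, C1)          {INC Ri}
                code.append(f"INC {ri}")
            else:                        # Ri <- +(Ri, Rj)          {ADD Ri, Ri, Rj}
                rj = reg(right)
                code.append(f"ADD {ri}, {ri}, {rj}")
            return ri
        raise ValueError(f"no rule matches {t!r}")

    op, target, value = tree
    if op != "=":
        raise ValueError("expected an assignment tree")
    if target[0] == "mem":               # M <- =(Mx, Ri)           {ST x, Ri}
        ri = reg(value)
        code.append(f"ST {target[1]}, {ri}")
    elif target[0] == "ind":             # M <- =(ind(Ri), Rj)      {ST *Ri, Rj}
        ri = reg(target[1])
        rj = reg(value)
        code.append(f"ST *{ri}, {rj}")
    else:
        raise ValueError(f"no store rule matches {target!r}")
    return code


for line in select(TREE):
    print(line)
```

Checking the larger templates first is what lets +(R0, ind(+(Ci, Rsp))) become a single ADD R0, R0, i(SP) instead of a separate load followed by an add.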

Page 44:

(Rule table as on Page 17.)

But actually, something is missing…

The IR immediate value, #a, does not have a size limit, but the actual machine has a limited number of bits for the immediate value (let’s say, 16 bits).

Page 45:

(Rule table as on Page 17, with the first rule now guarded:)

    Ri ← Ca    {LD Ri, #a}    ld Ri, #a (a ≤ 0xFFFF)

But actually, something is missing…

So we ought to state that this tree rewriting rule only applies when the immediate value can be expressed in 16 bits (i.e., a ≤ 0xFFFF).

Page 46:

(Rule table as on Page 17, with the first rule guarded:)

    Ri ← Ca    {LD Ri, #a}    ld Ri, #a (a ≤ 0xFFFF)

But actually, something is missing…

But what if a cannot be expressed in 16 bits?

Then we need a new rule:

Page 47:

(Rule table as on Page 17, with the first rule guarded and a second, still empty, rule for Ca:)

    Ri ← Ca    {LD Ri, #a}    ld Ri, #a (a ≤ 0xFFFF)
    Ri ← Ca

But actually, something is missing…

But what if a cannot be expressed in 16 bits?

Then we need a new rule:

Page 48:

(Rule table as on Page 17, with the first rule guarded and a new rule whose action is not yet filled in:)

    Ri ← Ca    {LD Ri, #a}    ld Ri, #a (a ≤ 0xFFFF)
    Ri ← Ca                   ld Ri, #a (a > 0xFFFF)

But actually, something is missing…

But what if a cannot be expressed in 16 bits?

Then we need a new rule:

Page 49:

(Rule table as on Page 17, with the first rule guarded and a new rule whose action is not yet filled in:)

    Ri ← Ca    {LD Ri, #a}    ld Ri, #a (a ≤ 0xFFFF)
    Ri ← Ca                   ld Ri, #a (a > 0xFFFF)

But actually, something is missing…

The problem is that the target processor does not have an instruction for 32-bit immediates. Instead, a sequence of machine instructions is needed. We call this sequence a pattern.

Page 50:

(Rule table as on Page 17, with the first rule guarded and the new rule completed:)

    Ri ← Ca    {LD Ri, #a}                   ld Ri, #a (a ≤ 0xFFFF)
    Ri ← Ca    {LD Ri, low16(#a)             ld Ri, #a (a > 0xFFFF)
                LD Rj, high16(#a)
                SHL Rj, Rj, #16
                ADD Ri, Ri, Rj}

But actually, something is missing…

The problem is that the target processor does not have an instruction for 32-bit immediates. Instead, a sequence of machine instructions is needed. We call this sequence a pattern.
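
A minimal sketch of the guarded rule pair, assuming a 32-bit immediate and a hypothetical helper named load_immediate (neither the function nor the scratch-register argument comes from the slides). The high half has to be shifted left by 16 before the add, which is the SHL used in the One-to-Many example on Page 56.

```python
def load_immediate(reg, tmp, a):
    """Return the instruction pattern that loads the constant a into reg.

    reg is the destination register, tmp a scratch register (assumed free).
    """
    if not 0 <= a <= 0xFFFFFFFF:
        raise ValueError("this sketch only handles unsigned 32-bit immediates")
    if a <= 0xFFFF:
        # Rule with guard a <= 0xFFFF: one instruction is enough.
        return [f"LD {reg}, #{a:#x}"]
    # Rule with guard a > 0xFFFF: split the value into two 16-bit halves.
    low16 = a & 0xFFFF
    high16 = a >> 16
    assert (high16 << 16) + low16 == a   # the pattern rebuilds the same value
    return [
        f"LD {reg}, #{low16:#x}",
        f"LD {tmp}, #{high16:#x}",
        f"SHL {tmp}, {tmp}, #16",
        f"ADD {reg}, {reg}, {tmp}",
    ]


print(load_immediate("R0", "R1", 0x1234))
print(load_immediate("R0", "R1", 0x12345678))
```

The guard is what selects between the two rules: the template Ca is the same in both, so the selector must test the constant's value before picking a pattern.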

Page 51:

Kinds of tree rewriting rules

• One-to-One: add Ri, Ri, #1

Page 52:

Kinds of tree rewriting rules

• One-to-One: add Ri, Ri, #1 → INC Ri

Page 53:

Kinds of tree rewriting rules

• One-to-One: add Ri, Ri, #1 → INC Ri
• Many-to-One: add Rx, Rj, #a; add Ri, Ri, Rx

Page 54:

Kinds of tree rewriting rules

• One-to-One: add Ri, Ri, #1 → INC Ri
• Many-to-One: add Rx, Rj, #a; add Ri, Ri, Rx → ADD Ri, Ri, a(Rj)

Page 55:

Kinds of tree rewriting rules

• One-to-One: add Ri, Ri, #1 → INC Ri
• Many-to-One: add Rx, Rj, #a; add Ri, Ri, Rx → ADD Ri, Ri, a(Rj)
• One-to-Many: ld Ri, #a (a > 0xFFFF)

Page 56:

Kinds of tree rewriting rules

• One-to-One: add Ri, Ri, #1 → INC Ri
• Many-to-One: add Rx, Rj, #a; add Ri, Ri, Rx → ADD Ri, Ri, a(Rj)
• One-to-Many: ld Ri, #a (a > 0xFFFF) → LD Ri, low16(#a); LD Rj, high16(#a); SHL Rj, #16; ADD Ri, Ri, Rj
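
The one-to-one and many-to-one cases can be illustrated on a flat list of IR operations. This is only a sketch: the tuple form ("add", dst, src1, src2), the function name lower, and the assumption that the temporary Rx is not used anywhere else are all made up for the example, and the mapping simply follows the slide's rule table.

```python
def lower(ops):
    """Map IR 'add' operations to target instructions, following the slide's rules."""
    out = []
    i = 0
    while i < len(ops):
        op = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if op[0] != "add":
            raise ValueError(f"no rule matches {op!r}")
        # Many-to-one: add Rx, Rj, #a ; add Ri, Ri, Rx  ->  ADD Ri, Ri, a(Rj)
        if (nxt is not None and nxt[0] == "add" and op[3].startswith("#")
                and nxt[1] == nxt[2] and nxt[3] == op[1]):
            out.append(f"ADD {nxt[1]}, {nxt[1]}, {op[3][1:]}({op[2]})")
            i += 2
        # One-to-one: add Ri, Ri, #1  ->  INC Ri
        elif op[1] == op[2] and op[3] == "#1":
            out.append(f"INC {op[1]}")
            i += 1
        # One-to-one fallback: a plain ADD with the same operands.
        else:
            out.append(f"ADD {op[1]}, {op[2]}, {op[3]}")
            i += 1
    return out


ir = [("add", "Rx", "SP", "#i"), ("add", "R0", "R0", "Rx"), ("add", "R1", "R1", "#1")]
print(lower(ir))
```

The many-to-one rule is tried first so that two IR operations collapse into one instruction whenever the pair matches.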

Page 57:

So, what’s the point?

To design an instruction selector, you do not need to write a program. Just define a set of rewriting rules.

Page 58:

(Rule table as on Page 17, with the 16-bit guard on ld Ri, #a and the added rule for a > 0xFFFF from Page 50.)

So, what’s the point?

To design an instruction selector, you do not need to write a program. Just define a set of rewriting rules.

Page 59:

(Rule table as on Page 17, with the 16-bit guard on ld Ri, #a and the added rule for a > 0xFFFF from Page 50.)

So, what’s the point?

To design an instruction selector, you do not need to write a program. Just define a set of rewriting rules.

Then use an existing instruction selection program to apply your set of rules. The LLVM compiler has such a selector.

Page 60:

Instruction Selection

Suppose you want to use the LLVM compiler to create PowerPC code. The PowerPC has a single-precision floating-point add instruction:

    FADDS T1, X, Y

How can we allow the LLVM compiler to generate FADDS instructions? We need to create a tree rewriting rule in the LLVM format:

Page 61: Instruction Selection Presented by Huang Kuo-An, Lu Kuo-Chang Subproject 3 A. Aho, M. Lam, R. Sethi, J. Ullman, “Instruction Selection by Tree Rewriting.”

Instruction SelectionSuppose you want to use the LLVM compiler to create PowerPC code.The PowerPC has a single-precision floating point add instruction:

FADDS T1, A, BQ:How can we allow the LLVM compiler to generate FADDS instructions?We need to create a tree rewriting rule in the LLVM format:

Page 62: Instruction Selection Presented by Huang Kuo-An, Lu Kuo-Chang Subproject 3 A. Aho, M. Lam, R. Sethi, J. Ullman, “Instruction Selection by Tree Rewriting.”

Instruction Selection

Suppose you want to use the LLVM compiler to create PowerPC code. The PowerPC has a single-precision floating-point add instruction:

    FADDS T1, A, B

Q: How can we allow the LLVM compiler to generate FADDS instructions?
A: We need to create a tree rewriting rule in the LLVM format:

    ……
    def FADDS : AForm_2<59, 21,
                        (outs F4RC:$FRT), (ins F4RC:$FRA, F4RC:$FRB),
                        "FADDS $FRT, $FRA, $FRB",
                        [(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))]>;
    ……

The instruction selector reads this definition as the rule:

    fadd RT, RA, RB      FRT ← +(FRA, FRB)      {FADDS FRT, FRA, FRB}

Instruction Selection

The PowerPC also has a single-precision floating-point multiply instruction:

    FMULS T1, X, Y

So we need to create a tree rewriting rule for it too:

    ……
    def FADDS : AForm_2<59, 21,
                        (outs F4RC:$FRT), (ins F4RC:$FRA, F4RC:$FRB),
                        "FADDS $FRT, $FRA, $FRB",
                        [(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))]>;
    def FMULS : AForm_3<59, 25,
                        (outs F4RC:$FRT), (ins F4RC:$FRA, F4RC:$FRB),
                        "FMULS $FRT, $FRA, $FRB",
                        [(set F4RC:$FRT, (fmul F4RC:$FRA, F4RC:$FRB))]>;
    ……

The instruction selector now has two rules:

    fmul RT, RA, RB      FRT ← *(FRA, FRB)      {FMULS FRT, FRA, FRB}
    fadd RT, RA, RB      FRT ← +(FRA, FRB)      {FADDS FRT, FRA, FRB}

Instruction Selection

With these two rules, we can now generate PowerPC code for the following LLVM IR:

    %t1 = fmul float %X, %Y
    %t2 = fadd float %t1, %Z

The selector matches each node against one rule:

    fmul:f32 X, Y       FMULS t1, X, Y
    fadd:f32 t1, Z      FADDS t2, t1, Z

Instruction Selection

But wait! PowerPC has the FMADDS instruction, which performs both a multiply and an add. Why didn’t the compiler choose that instruction?

Because no tree rewriting rule was defined for FMADDS.

What are the consequences of not giving a rule for FMADDS?

•Broken compiler? No. Why not? Because the FMADDS instruction’s function can also be performed by other PowerPC instructions that were defined. (But if FADDS were not defined, the compiler would be broken.)
•Bad compiler? Yes. Why? FMADDS will never be used, and it is faster than FMULS + FADDS.

    %t1 = fmul float %X, %Y      fmul:f32 X, Y       FMULS t1, X, Y
    %t2 = fadd float %t1, %Z     fadd:f32 t1, Z      FADDS t2, t1, Z

Instruction Selection

We can add a new rule for the PowerPC’s multiply-and-add instruction:

    FMADDS T1, A, B, C

    ……
    def FADDS : AForm_2<59, 21,
                        (outs F4RC:$FRT), (ins F4RC:$FRA, F4RC:$FRB),
                        "FADDS $FRT, $FRA, $FRB",
                        [(set F4RC:$FRT, (fadd F4RC:$FRA, F4RC:$FRB))]>;
    def FMULS : AForm_3<59, 25,
                        (outs F4RC:$FRT), (ins F4RC:$FRA, F4RC:$FRB),
                        "FMULS $FRT, $FRA, $FRB",
                        [(set F4RC:$FRT, (fmul F4RC:$FRA, F4RC:$FRB))]>;
    def FMADDS : AForm_1<59, 29,
                         (outs F4RC:$FRT), (ins F4RC:$FRA, F4RC:$FRC, F4RC:$FRB),
                         "FMADDS $FRT, $FRA, $FRC, $FRB",
                         [(set F4RC:$FRT, (fadd (fmul F4RC:$FRA, F4RC:$FRC), F4RC:$FRB))]>;
    ……

The instruction selector now has three rules:

    fmul RT, RA, RB                        FRT ← *(FRA, FRB)            {FMULS FRT, FRA, FRB}
    fadd RT, RA, RB                        FRT ← +(FRA, FRB)            {FADDS FRT, FRA, FRB}
    fmul RT1, RA, RC; fadd RT2, RT1, RB    FRT ← +(*(FRA, FRC), FRB)    {FMADDS FRT, FRA, FRC, FRB}
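The effect of adding the FMADDS rule can be tried out with a small Python sketch. It is an illustration only, not LLVM's selector: the tuple encoding of the DAG, the `BASIC` and `FUSED` rule lists, and the `match`, `select` and `emit` helpers are assumptions made for this example. Rules are tried in order, so the larger FMADDS pattern is listed before the single-node rules.

```python
from itertools import count

# A rule is (pattern, mnemonic). Plain strings in a pattern are operand placeholders.
BASIC = [
    (("fadd", "A", "B"), "FADDS"),
    (("fmul", "A", "B"), "FMULS"),
]
# Same selector, one extra rule: fadd(fmul(A, C), B) is covered by a single FMADDS.
FUSED = [(("fadd", ("fmul", "A", "C"), "B"), "FMADDS")] + BASIC

def match(pat, tree, binds):
    """Return True if pat covers the top of tree, binding placeholders to subtrees."""
    if isinstance(pat, str):             # placeholder: accepts any operand
        binds[pat] = tree
        return True
    return (isinstance(tree, tuple) and len(tree) == len(pat) and tree[0] == pat[0]
            and all(match(p, t, binds) for p, t in zip(pat[1:], tree[1:])))

def select(tree, rules, out, temps):
    """Select instructions for tree and return the name holding its value."""
    if isinstance(tree, str):            # a leaf is already a named value
        return tree
    for pat, mnemonic in rules:          # first matching rule wins
        binds = {}
        if match(pat, tree, binds):
            ops = [select(sub, rules, out, temps) for sub in binds.values()]
            dest = f"t{next(temps)}"
            out.append(f"{mnemonic} {dest}, " + ", ".join(ops))
            return dest
    raise ValueError(f"no rule covers {tree!r}")

def emit(tree, rules):
    out = []
    select(tree, rules, out, count(1))
    return out

# %t1 = fmul float %X, %Y ; %t2 = fadd float %t1, %Z
dag = ("fadd", ("fmul", "X", "Y"), "Z")
print(emit(dag, BASIC))   # two instructions: FMULS, then FADDS
print(emit(dag, FUSED))   # one instruction: FMADDS
```

Both rule sets produce correct code for the same input; only the quality of the result differs, which is why a missing many-to-one rule makes the compiler worse rather than broken.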

3 Kinds of Tree Rewriting Rules

•One-to-One: add Ri, Ri, #1 → INC Ri
•Many-to-One: add Rx, Rj, #a; add Ri, Ri, Rx → ADD Ri, Ri, a(Rj)
•One-to-Many: ld Ri, #a (a > 0xFFFF) → LD Ri, low16(#a); LD Rj, high16(#a); SHL Rj, Rj, #16; ADD Ri, Ri, Rj

FMADDS is a many-to-one rule, so it is not needed for a basic compiler. In fact, many-to-one rules can all be skipped, because the simpler rules already cover the same trees, only with slower code.
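The one-to-many rule for large constants is plain arithmetic, and a short Python sketch can check it. The sketch assumes 32-bit registers and a 16-bit immediate field, as the a > 0xFFFF condition suggests; the function names `load_constant` and `simulate` are made up for this example.

```python
def load_constant(a):
    """Expand 'ld Ri, #a' into target instructions, following the rules above."""
    if a <= 0xFFFF:                      # fits the immediate field: one-to-one
        return [f"LD Ri, #{a:#x}"]
    low16, high16 = a & 0xFFFF, (a >> 16) & 0xFFFF
    return [                             # too wide: one-to-many
        f"LD Ri, #{low16:#x}",
        f"LD Rj, #{high16:#x}",
        "SHL Rj, Rj, #16",
        "ADD Ri, Ri, Rj",
    ]

def simulate(a):
    """Carry out the four-instruction sequence on 32-bit values and return Ri."""
    ri = a & 0xFFFF                      # LD Ri, low16(#a)
    rj = (a >> 16) & 0xFFFF              # LD Rj, high16(#a)
    rj = (rj << 16) & 0xFFFFFFFF         # SHL Rj, Rj, #16 moves the half into place
    return (ri + rj) & 0xFFFFFFFF        # ADD Ri, Ri, Rj

print(load_constant(0x12345678))
print(hex(simulate(0x12345678)))
```

The shift has to be to the left: high16(#a) arrives in the low half of Rj, and a right shift by 16 would clear it.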

We will use the LLVM compiler

Because:
•It has good optimizations
•It has good documentation
•It is designed to be a little bit easier to retarget to a new processor
•It was the compiler used by subproject 3, year 1, so there is some infrastructure

But there are some difficulties with the LLVM compiler

Because:
•It compiles C, not OpenGL 2.0
•Though it has backends for several processors, none of them are SIMD
•So, the LLVM IR is not SIMD

How we will use the LLVM compiler

Our work is in two parallel paths:
•Fast track: uses Subproject 2’s code to convert OpenGL to C
•Slow track: uses Subproject 3 year 1’s code to generate SIMD instructions in the LLVM IR

A quick reminder

• OpenGL 2.0 code is stored in a string array.
• It is not compiled until the game is actually running.
• At some point while the game is running, the game calls glCompileShader, which takes the string array as its input and produces an object file.
• Maybe the player entered a new level, and the new level has brick walls. The previous level did not have brick walls, so the graphics processor does not yet have a rule for how to render bricks.
• The brick shader must be compiled, linked, and loaded onto the graphics processor.
• This is accomplished through 3 OpenGL calls from within the game:
  • glCompileShader(…)
  • glLinkProgram(…)
  • glUseProgram(…)
• Our current work is only on the implementation of glCompileShader.
• glCompileShader is a program that runs on the ARM processor when called by the ARM’s OS.
• So, our compiler (which is written in C++) is compiled into an ARM executable. When this compiler executable is run, it generates a shader executable.

Sample OpenGL shader code

Here is some shader code (the contents of the shader string array):

    uniform vec3 LightPosition;

    const float SpecularContribution = 0.3;
    const float DiffuseContribution  = 1.0 - SpecularContribution;

    varying float LightIntensity;
    varying vec2  MCposition;

    void main(void)
    {
        // Position and normal in eye coordinates
        vec3 ecPosition = vec3(gl_ModelViewMatrix * gl_Vertex);
        vec3 tnorm      = normalize(gl_NormalMatrix * gl_Normal);

        // Light, reflection, and view directions
        vec3 lightVec   = normalize(LightPosition - ecPosition);
        vec3 reflectVec = reflect(-lightVec, tnorm);
        vec3 viewVec    = normalize(-ecPosition);

        float diffuse = max(dot(lightVec, tnorm), 0.0);
        float spec    = 0.0;
        if (diffuse > 0.0) {
            spec = max(dot(reflectVec, viewVec), 0.0);
            spec = pow(spec, 16.0);
        }

        LightIntensity = DiffuseContribution * diffuse +
                         SpecularContribution * spec;
        MCposition  = gl_Vertex.xy;
        gl_Position = ftransform();
    }

A sample compilation trigger

And here is a function inside the game (running on the ARM) that compiles and loads the shader from the shader string array:

    void AddBrickFragments(GLuint currentProgram) {
        GLuint brickFS = glCreateShader(GL_FRAGMENT_SHADER);
        glShaderSource(brickFS, 1, brickStringArray, NULL);  /* hand over the shader string array */
        glCompileShader(brickFS);                            /* compile it at run time */
        glAttachShader(currentProgram, brickFS);
        glLinkProgram(currentProgram);
        glUseProgram(currentProgram);                        /* load it onto the graphics processor */
    }

The fast track compiler process

1. As the game runs, the call to glCompileShader in AddBrickFragments happens.
2. The ARM processor then calls the LLVM compiler, passing in the shader string array for compilation.
3. The LLVM compiler then:
   1. Runs Proj2Converter to make C code
   2. Runs the LLVM front end to create IR
   3. Runs our new LLVM backend to create the shader object file
   4. Sends the object file back to the game

Data flow shown so far: shader string array (in the game running on ARM) → Proj2Converter → equivalent C code → LLVM front end → equivalent LLVM IR

1. So now, as the game runs, this call to glCompileShader happens.

2. Then the ARM processor calls the LLVM compiler, passing in this code for compilation.

3. The LLVM compiler then:
   1. Runs Proj2Converter to make C code
   2. Runs the LLVM front end to create IR
   3. Runs our new LLVM backend to create the shader object file
   4. Sends the object file back to the game

Game running on ARM:

void AddBrickFragments(GLuint currentProgram) {
    GLuint brickFS = glCreateShader(GL_FRAGMENT_SHADER);
    glShaderSource(brickFS, 1, brickStringArray, NULL);
    glCompileShader(brickFS);
    glAttachShader(currentProgram, brickFS);
    glLinkProgram(currentProgram);
    glUseProgram(currentProgram);
}

Shader string array:

uniform vec3 LightPosition;
const float SpecularContribution = 0.3;
const float DiffuseContribution = 1.0 - SpecularContribution;
varying float LightIntensity;
varying vec2 MCposition;

void main(void) {
    vec3 ecPosition = vec3(gl_ModelViewMatrix * gl_Vertex);
    vec3 tnorm = normalize(gl_NormalMatrix * gl_Normal);
    vec3 lightVec = normalize(LightPosition - ecPosition);
    vec3 reflectVec = reflect(-lightVec, tnorm);
    vec3 viewVec = normalize(-ecPosition);
    float diffuse = max(dot(lightVec, tnorm), 0.0);
    float spec = 0.0;
    if (diffuse > 0.0) {
        spec = max(dot(reflectVec, viewVec), 0.0);
        spec = pow(spec, 16.0);
    }
    LightIntensity = DiffuseContribution * diffuse + SpecularContribution * spec;
    MCposition = gl_Vertex.xy;
    gl_Position = ftransform();
}

The fast track compiler process:

shader string array -> Proj2 converter -> equivalent C code -> LLVM frontend -> equivalent LLVM IR -> fast track backend -> equivalent shader object file

Equivalent shader object file (excerpt):

……
MUL R1, R2, R3
MADD R4, R1, R5
……

The slow track compiler process

It is not good to use subproject 2’s converter:

•The compiler is run during game execution, so the conversion step adds overhead

•The conversion destroys vectors, so you can’t create SIMD code

•After all, if C were a good fit for 3D shaders, then we wouldn’t need the OpenGL language!
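As a rough illustration of the second bullet, the sketch below compares a vec3 dot product once it has been flattened to scalars against the same operation when the vector survives to instruction selection. It is only a sketch: the instruction sequences (MUL followed by two MADs, versus a single DP3) and the reading of MAD as multiply-add are assumptions chosen for illustration, not output of either compiler track.

```python
def dot3_scalar(a, b):
    """Dot product once the vec3 has been split into three scalars (assumed sequence)."""
    ops = ["MUL"]                  # acc = a.x * b.x
    acc = a[0] * b[0]
    for i in (1, 2):
        ops.append("MAD")          # acc = a[i] * b[i] + acc (assumed multiply-add)
        acc = a[i] * b[i] + acc
    return acc, ops


def dot3_vector(a, b):
    """Dot product when the vec3 is still a vector at instruction selection."""
    return sum(x * y for x, y in zip(a, b)), ["DP3"]


print(dot3_scalar((1, 2, 3), (4, 5, 6)))
print(dot3_vector((1, 2, 3), (4, 5, 6)))
```

Both functions compute the same value; the difference is only in how many instructions the selector would have to emit, which is the cost the slide attributes to losing vector information.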

The slow track compiler process

The subproject 3, year 1 team addressed this problem:

•They modified the LLVM frontend to read OpenGL code instead of C code

•To handle the SIMD information expressed in the OpenGL code (such as variables declared as “vec4”), they added vectors to the LLVM IR

•The problem is that the LLVM backend was not modified, so their result is a non-standard LLVM IR that currently cannot be compiled

•The gist of our slow track development process is modifying the backend to understand the augmented IR

The fast track compiler process:

shader string array -> Proj2 converter -> equivalent C code -> LLVM frontend -> equivalent LLVM IR -> fast track backend -> equivalent shader object file (…… MUL R1, R2, R3; MADD R4, R1, R5 ……)

The slow track compiler process:

shader string array -> Proj3Y1 LLVM frontend -> equivalent, augmented LLVM IR -> slow track backend -> equivalent shader object file (…… SQRT R1, R2; RCP R4, R1 ……)

In both cases the shader string array is the same brick shader source shown earlier.

Instruction selection summary

There are then 3 steps in our instruction selector:

1st cut: fast track selection
- Backend changed to target the shader processors
- Works but has no SIMD operation

2nd cut: slow track selection
- Merge in the second backend change to understand the augmented IR
- Update instruction selector to make SIMD choices

3rd cut: Create tree rewriting rules for the complex processor instructions, like SQRT and LOG

Progress

The following list shows the shader instructions. Our goal is to map LLVM instructions onto our shader instructions.

SHADER Instructions (LLVM column still empty):
MOV, LD, ST, MUL, ADD, MAD, MIN, MAX, SLT, SLE, SGT, SGE, AND, OR, XOR, DP3, DP4, RCP, RSQ, LOG, EXP, BEQ, JMP, NOP

Progress

There are some LLVM instructions that can obviously map onto our shader instructions (- means no mapping yet):

SHADER Instructions    LLVM
MOV                    -
LD                     -
ST                     -
MUL                    mul
ADD                    add
MAD                    -
MIN                    -
MAX                    -
SLT                    setlt
SLE                    setle
SGT                    setgt
SGE                    setge
AND                    and
OR                     or
XOR                    xor
DP3                    -
DP4                    -
RCP                    -
RSQ                    -
LOG                    -
EXP                    shl
BEQ                    seteq
JMP                    -
NOP                    nop

Progress

We have mapped some of the instructions in the table, but there are more LLVM IR instructions. If an LLVM IR instruction has no tree rewriting rule, then you are not going to get a working compiler.

Some instructions are harder to map, so we are going to cover them one by one. Being harder to map means one of 2 things:
•It will require a more complicated mapping
•It can be skipped (for now), because it’s a many-to-one mapping
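The point about missing rules can be sketched with a lookup table. This is a minimal sketch, not the LLVM selector: `RULES` holds a subset of the one-to-one mappings from the table, and `select` is a hypothetical helper that fails as soon as it meets an IR opcode with no rule, which is the situation the slide warns about.

```python
# Hypothetical rule table: a subset of the slide's one-to-one mappings.
RULES = {
    "mul": "MUL",
    "add": "ADD",
    "setlt": "SLT",
    "setle": "SLE",
    "setgt": "SGT",
    "setge": "SGE",
    "and": "AND",
    "or": "OR",
    "xor": "XOR",
}


def select(ir_ops):
    """Map each LLVM IR opcode to a shader opcode, failing on a missing rule."""
    selected = []
    for op in ir_ops:
        if op not in RULES:
            raise LookupError(f"no rewriting rule for LLVM IR instruction '{op}'")
        selected.append(RULES[op])
    return selected


print(select(["mul", "add", "setlt"]))

try:
    select(["mul", "ashr"])
except LookupError as err:
    print(err)
```

One unmapped opcode is enough to stop selection for the whole program, which is why every remaining IR instruction needs a rule of some kind.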

Progress

For example, here is how we map the SHR instruction, which is easy.

First, we took the MIPS backend as the one to modify. It defines the SHR instruction like this:

def SHR : SetCC_R<0x00, 0x2a, "shr", setlt>;

Then we turn it into the following code:

def SHR : SetCC_R<0x00, 0x2a, "SHR", setlt>;

Because this is a simple mapping, we can just change the string, which is what we actually see in the assembly file. For now our target is just to get correct assembly, not executables.

But some instructions are hard to map, for example, the ASHR instruction.

First, a reminder of what arithmetic shift right is:
•It’s a shift that preserves the sign by sign extension.

Consider: if
R0 = 10101010101010101010101010101010
then
SHR R0,10 = 00000000001010101010101010101010
but
ASHR R0,10 = 11111111111010101010101010101010

The shr was easy to make a rule for, because the shader has an SHR instruction. But it doesn’t have an ASHR.

Q: How then can we make a rule to deal with the LLVM ashr IR instruction?

A: We’ll need to use multiple shader instructions (1 to many)
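The difference between the two shifts can be checked with a short sketch. The helper names `shr32` and `ashr32` are illustrative only; they model 32-bit registers on top of Python's unbounded integers and are not part of any backend.

```python
MASK32 = 0xFFFFFFFF


def shr32(x, n):
    """Logical shift right on a 32-bit value: vacated top bits become 0."""
    return (x & MASK32) >> n


def ashr32(x, n):
    """Arithmetic shift right on a 32-bit value: vacated top bits copy the sign bit."""
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32           # reinterpret the bit pattern as a negative integer
    return (x >> n) & MASK32   # Python's >> on negative integers is arithmetic


r0 = 0b10101010101010101010101010101010
print(f"{shr32(r0, 10):032b}")
print(f"{ashr32(r0, 10):032b}")
```

For a value whose top bit is 0 the two functions agree, so only inputs with the sign bit set need the extra work.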

•But how to define a pattern of shader instructions?

In the example of the previous slide, we see that
SHR R0,10 = 00000000001010101010101010101010
ASHR R0,10 = 11111111111010101010101010101010

Split each result into a left part (the top 10 bits) and a right part (the remaining 22 bits):
•Left part: this part can be different
•Right part: this part is always the same

Within the left part of the ASHR result, the bits are always the same as each other: they are all copies of the sign bit.

Read as a number, the ten 1 bits of the left part are 1023 = 2^10 - 1, which can be computed as (1<<10) - 1.

So it looks like the answer here is to compute the right part with SHR and the left part as: (TopBit << ShiftAmount) - 1

Now we can start to build the ASHR instruction

•First we define a pattern called RED. Recall that shader registers have 4 32-bit fields: Red, Green, Blue, and Alpha. Since we are not using SIMD yet, we will only deal with one 32-bit field of the register. That is what RED does. Here is the LLVM pattern:

def : PAT<(RED Rx),(AND Rx,0xFFFFFFFF)>

•Second we define a pattern of shader instructions for computing the left part:

def : PAT<(TOPBITS Rx,Ry), (SHL(SUB(SHL(SHR(RED Rx),31),Ry),1),(SUB 32,Ry))>

The pattern is read from the innermost expression outward. The slides highlight each step in a different color. For example, take Rx = 10101010101010101010101010101010 and y = 10.

1. SHR(RED Rx),31 (the blue part): This strips out everything but the sign bit, which is now in the bottom bit position.
00000000000000000000000000000001

2. SHL(...,Ry) (the green part): This pushes the sign bit up y places. Thus it computes 2^y, if the sign bit is 1.
00000000000000000000010000000000

3. SUB(...,1) (the purple part): This now computes the left part.
00000000000000000000001111111111

4. SHL(...,(SUB 32,Ry)) (the red part): Finally, the sign extension bits shift up to where they go.
11111111110000000000000000000000
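The four intermediate values of the TOPBITS pattern can be reproduced with a short sketch. The function name `topbits_steps` is made up for illustration, and the code assumes that the shader's SHL, SHR and SUB behave as ordinary 32-bit wraparound operations, which the slides do not state.

```python
MASK32 = 0xFFFFFFFF


def topbits_steps(rx, ry):
    """Return the four intermediate values of the TOPBITS pattern, innermost first."""
    red = rx & MASK32                     # RED: keep one 32-bit field
    sign = red >> 31                      # SHR ..., 31
    pushed = (sign << ry) & MASK32        # SHL ..., Ry
    ones = (pushed - 1) & MASK32          # SUB ..., 1
    left = (ones << (32 - ry)) & MASK32   # SHL ..., (SUB 32, Ry)
    return sign, pushed, ones, left


for value in topbits_steps(0xAAAAAAAA, 10):
    print(f"{value:032b}")
```

For the example register and y = 10 the printed values match the four bit strings shown for the blue, green, purple and red parts.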

•Third we define a pattern of shader instructions for merging the left and right parts:

def : PAT<(ASHR Rx,Ry), (OR (TOPBITS Rx,Ry), (SHR (RED Rx),Ry))>

From the previous slide, the TOPBITS part (shown in red on the slide) is:
11111111110000000000000000000000

It is clear that the SHR part (shown in lavender on the slide) is:
00000000001010101010101010101010

And a bitwise-OR of the two parts yields:
11111111111010101010101010101010

•So, we defined 3 patterns:

def : PAT<(RED Rx),(AND Rx,0xFFFFFFFF)>

def : PAT<(TOPBITS Rx,Ry), (SHL(SUB(SHL(SHR(RED Rx),31),Ry),1),(SUB 32,Ry))>

def : PAT<(ASHR Rx,Ry), (OR (TOPBITS Rx,Ry), (SHR (RED Rx),Ry))>

•As a result, there is now a rewriting rule for ashr

•It’s awkward, but it works
▫Besides, it’s unclear how often shaders would do an ashr

•We must similarly build patterns for every LLVM IR instruction that does not naturally map to a shader processor instruction
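The three patterns can be composed in a small model to check the rule against the worked example. This is a sketch under assumptions: the helper names are hypothetical, shift amounts are limited to 1..31, and SUB is assumed to wrap around at 32 bits. Under that assumption, a register whose sign bit is 0 gives (0 << y) - 1 = all ones, so `ashr_as_written` would set the top bits for non-negative inputs as well. The variant `ashr_sign_adjusted` subtracts the sign bit instead of the constant 1; it is a deviation from the slides, shown only to make the difference visible.

```python
MASK32 = 0xFFFFFFFF


def red(rx):
    """RED: keep one 32-bit field of the register."""
    return rx & MASK32


def shr(rx, ry):
    """Logical shift right on a 32-bit value."""
    return (rx & MASK32) >> ry


def topbits(rx, ry, subtrahend=1):
    """TOPBITS as in the slides when subtrahend is 1 (assumes wraparound SUB)."""
    sign = shr(red(rx), 31)
    ones = ((sign << ry) - subtrahend) & MASK32
    return (ones << (32 - ry)) & MASK32


def ashr_as_written(rx, ry):
    """ASHR built from OR, TOPBITS and SHR, following the three patterns."""
    return topbits(rx, ry) | shr(red(rx), ry)


def ashr_sign_adjusted(rx, ry):
    """Variant (not from the slides): subtract the sign bit instead of 1."""
    sign = shr(red(rx), 31)
    return topbits(rx, ry, subtrahend=sign) | shr(red(rx), ry)


print(f"{ashr_as_written(0xAAAAAAAA, 10):032b}")     # sign bit 1
print(f"{ashr_as_written(0x2AAAAAAA, 10):032b}")     # sign bit 0
print(f"{ashr_sign_adjusted(0x2AAAAAAA, 10):032b}")  # sign bit 0
```

Whether the non-negative case matters on the real shader processor depends on its SUB and shift semantics, which the slides do not spell out.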

Future work

All of the above is just for the first-cut compiler

1st cut: fast track selection
- Backend changed to target the shader processors
- Works but has no SIMD operation

2nd cut: slow track selection
- Merge in the second backend change to understand the augmented IR
- Update instruction selector to make SIMD choices

3rd cut: Create tree rewriting rules for the complex processor instructions, like SQRT and LOG