CSE4100 Compiler (a.k.a. Programming Language Translation) Notes credit go to Me CSE Laurent Michel CSE Aggelos Kiayias CSE Steven Demurjian CSE Robert La Barre UTRC

Dec 21, 2015


CSE4100 Compiler

(a.k.a. Programming Language Translation)

Notes credit go to

Me CSE

Laurent Michel CSE

Aggelos Kiayias CSE

Steven Demurjian CSE

Robert La Barre UTRC

CSE244 Compilers

Overview

• Objectives

• Structure of the course

• Evaluation

• Compiler Introduction

– A compiler bird’s eye view

• Lexical analysis

• Parsing (Syntax analysis)

• Semantic analysis

• Code generation

• Optimization

Information

• Course web page

– Will be on HuskyCT

– Currently

• http://www.engr.uconn.edu/~weiwei/

• Instructor

– Wei Wei

[email protected]

• Office hours

– TuTh 3:30pm ~ 4:30pm

– Wednesday 1:30pm ~ 4:30pm

Objectives

• Compilers are…

– Ubiquitous in Computer Science

– Central to symbolic processing

– Related to

• Theory of computing

• Practice of computing

• Programming languages

• Operating Systems

• Computer Architecture

Purpose

• Simple intent

– Translate

• From Source Language

• To Target Language

Translate… Why?

• Languages offer

– Abstractions

– At different levels

• From low

– Good for machines…

• To high

– Good for humans…

Let the computer do the heavy lifting.

Translate… How?

• Three approaches

– Interpreted

– Compiled

– Mixed

Interpreter

• Motivation…

– Easiest to implement!

• Upside?

• Downside?

• Phases

– Lexical analysis

– Parsing

– Semantic checking

– Interpretation

Compiler

• Motivation

– It is natural!

• Upside?

• Downside?

• Phases

– [Interpreter]

– Code Generation

– Code Optimization

– Link & load

Mixed

• Motivation

– The best of two breeds…

• Upside ?

• Downside?

Objectives

• Learn about compilers because…

– It helps you program better

– You understand the tools of the trade

– Many languages are compiled

• Programming Languages [C, C#, ML, LISP, …]

• Communication Languages [XML, HTML, …]

• Presentation Languages [CSS, SGML, …]

• Hardware Languages [VHDL, …]

• Formatting Languages [PostScript, troff, LaTeX, …]

• Query Languages [SQL & friends]

– Many assistive tools use compiler technology…

• Syntax highlighting

• Type assist - type completion

– You may write/use compiler technology!

Course structure

• A reflection of the topic

– Lexical analysis

– Parsing

– Semantic analysis

– Midterm

– Runtime structures

– Intermediate code generation

– Machine code generation

– Optimization

– Final

Evaluation

• Course evaluation

– Six homeworks (50%)

• 5 mandatory

• 1 extra credit

– One midterm (20%)

– One final (30%)

• Exams

– Open book

Six Homeworks

• One project to rule them all...

• Purpose

– First-hand experience with compiler technology

• Six Homeworks are connected

– Scanning

– Parsing

– Analysis

– IR Code Generation

– IR Optimization

– Machine Code Generation

And in the Darkness Bind You...

The Source Language

C++, C, Java, C--

C--

• The Object Oriented Language For Dummies

– C-- supports

• Classes

• Inheritance

• Polymorphism

• 2 basic types

• Arrays

• Simple control primitives

– if-then-else

– while

C-- Example

class Foo {
    int fact(int n) {
        return 0;
    }
    int fib(int x) {
        return 1;
    }
};

class Main extends Foo {
    Main() {
        int x;
        x = fact(5);
    }
    int fact(int n) {
        if (n==0)
            return 1;
        else
            return n * fact(n-1);
    }
};

The Target Language

• Something realistic...

• Something challenging...

• Something useful...

It’s True! We will generate code for a Pentium-class machine running either Windows or Linux.

But...

• Won’t that be hard?

• No!

– My C-- compiler

• 6000 lines of code

• Written in < 10 days

– Your C-- compiler

• Will use some of my code...

Really?

This is NASM assembly generated for the C-- example shown earlier.

global main
extern printf
extern malloc
section .data

D_0: dd block_0,block_1
D_1: dd block_3,block_1,block_2

section .text
main:
    push 4                    ; push constant
    call malloc               ; call to C function
    add esp,4                 ; pop argument
    mov [eax], dword D_1      ; move source into dest
    push eax                  ; push variable
    mov eax, dword [eax]      ; get the VTBL
    mov eax, dword [eax+8]    ; get the k=2 method
    call eax                  ; call the method
    add esp,4                 ; pop args
    ret

block_0:
    mov [esp-4], dword ebp    ; save old BP
    mov ebp, dword esp        ; set BP to SP
    mov [ebp-8], dword esp    ; save old SP
    sub esp,8                 ; reserve space
    mov eax, dword 0          ; write result in output register
    mov esp, dword [ebp-8]    ; restore old SP
    mov ebp, dword [ebp-4]    ; restore old BP
    ret                       ; return from function
block_1:
    mov [esp-4], dword ebp    ; save old BP
    mov ebp, dword esp        ; set BP to SP
    mov [ebp-8], dword esp    ; save old SP
    sub esp,8                 ; reserve space
    mov eax, dword 1          ; write result in output register
    mov esp, dword [ebp-8]    ; restore old SP
    mov ebp, dword [ebp-4]    ; restore old BP
    ret                       ; return from function

block_3:
    mov [esp-4], dword ebp    ; save old BP
    mov ebp, dword esp        ; set BP to SP
    mov [ebp-8], dword esp    ; save old SP
    sub esp,20                ; reserve space
    mov eax, dword [ebp+8]    ; move o2
    cmp eax,0                 ; do the operation
    mov eax,0
    sete ah
    cmp eax,0                 ; compare to 0 to set CC
    jz block_5                ; transfer to true block
block_4:
    mov eax, dword 1          ; write result in output register
    jmp block_6               ; transfer control
block_5:
    mov eax, dword [ebp+8]    ; move o2
    sub eax,1                 ; do the operation
    push eax                  ; push variable
    mov ebx, dword [ebp+4]    ; get argument in register
    push ebx                  ; push variable
    mov ebx, dword [ebx]      ; get the VTBL
    mov ebx, dword [ebx]      ; get the k=0 method
    call ebx                  ; call the method
    add esp,8                 ; pop args
    mov ecx, dword [ebp+8]    ; move o2
    imul ecx,eax              ; do the operation
    mov eax, dword ecx        ; write result in output register
block_6:
    mov esp, dword [ebp-8]    ; restore old SP
    mov ebp, dword [ebp-4]    ; restore old BP
    ret                       ; return from function

block_2:
    mov [esp-4], dword ebp    ; save old BP
    mov ebp, dword esp        ; set BP to SP
    mov [ebp-8], dword esp    ; save old SP
    sub esp,12                ; reserve space
    push 5                    ; push constant
    mov eax, dword [ebp+4]    ; get argument in register
    push eax                  ; push variable
    mov eax, dword [eax]      ; get the VTBL
    mov eax, dword [eax]      ; get the k=0 method
    call eax                  ; call the method
    add esp,8                 ; pop args
    mov esp, dword [ebp-8]    ; restore old SP
    mov ebp, dword [ebp-4]    ; restore old BP
    ret                       ; return from function
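The D_0 and D_1 tables in the listing look like per-class method tables (VTBLs): the object's first word points at its table, and a call fetches slot k from it. The following Python sketch mimics that lookup. Reading D_0 as the table for Foo and D_1 as the table for Main is an inference from the listing, not something the slides state, and the dictionary-based object layout is purely illustrative.

```python
# Each function stands in for one compiled block of the listing.
def foo_fact(self, n):      # block_0: Foo.fact
    return 0

def foo_fib(self, x):       # block_1: Foo.fib (inherited unchanged by Main)
    return 1

def main_fact(self, n):     # block_3: Main.fact, overriding Foo.fact
    if n == 0:
        return 1
    # Virtual call through slot 0 of the receiver's table.
    return n * self["vtbl"][0](self, n - 1)

def main_ctor(self):        # block_2: the Main() constructor
    # The C-- code stores the result in a local; it is returned here
    # only so the sketch has something observable.
    return self["vtbl"][0](self, 5)

D_0 = [foo_fact, foo_fib]               # assumed: method table for Foo
D_1 = [main_fact, foo_fib, main_ctor]   # assumed: method table for Main

def new_object(vtbl):
    """Allocate an object whose first field is its method table."""
    return {"vtbl": vtbl}

obj = new_object(D_1)
result = obj["vtbl"][2](obj)            # slot k=2, as in "mov eax, [eax+8]"
```

Because slot 0 holds main_fact in D_1 but foo_fact in D_0, the same call sequence reaches a different method depending on the object's table, which is how the generated code gets polymorphism.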

Compiler Classes

• Compilers Viewed from Many Perspectives

– Construction: Single Pass, Multiple Pass, Load & Go

– Functional: Debugging, Optimizing

• However, all utilize the same basic tasks to accomplish their actions

Compiler Structure

• Two fundamental sets of issues

– Analysis: Text analysis, Syntactic analysis, Structural analysis

– Synthesis: Program generation, Program optimization

• Our Focus:

– Both

Some Good news!

• Tools do exist for

– Lexical and syntactic analysis

– Note: it was not always the case.

• Structure / Syntax directed editors:

– Force “syntactically” correct code to be entered

• Pretty Printers:

– Standardized version for program structure (i.e., indenting)

• Static Checkers:

– A “quick” compilation to detect rudimentary errors

• Interpreters:

– “real” time execution of code a “line-at-a-time”

Phases of compilation

Source Program

1. Lexical Analyzer

2. Syntax Analyzer

3. Semantic Analyzer

4. Intermediate Code Generator

5. Code Optimizer

6. Code Generator

Target Program

Symbol-table Manager

Error Handler

1, 2, 3: Analysis

4, 5, 6: Synthesis

Source Program

1. Pre-Processor

2. Compiler

3. Assembler

4. Relocatable Machine Code

5. Loader Link/Editor (also takes: Library, relocatable object files)

Executable

Analysis

[Figure: the compilation-phase diagram (Source Program, phases 1 to 6, Target Program, Error Handler), labeled “Language Analysis Phases”]

Lexical analysis

• Purpose

– Slice the sequence of symbols into tokens

Date x := new Date ( System.today( ) + 30 ) ;

Id Date

Id x

Symbol :=

Keyword new

Id Date

Symbol (

Id System

Symbol .

Id today

Symbol (

Symbol )

Symbol +

Integer 30

Symbol )

Symbol ;
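The slicing step can be sketched with regular expressions. This is a minimal Python illustration, not the course's scanner: the token classes follow the slide (Id, Keyword, Symbol, Integer), while the keyword set and the exact patterns are assumptions made for the example.

```python
import re

# Assumed for this sketch: "new" is the only keyword, and these are the only symbols.
KEYWORDS = {"new"}
TOKEN_RE = re.compile(r"""
    (?P<Integer>\d+)
  | (?P<Id>[A-Za-z_]\w*)
  | (?P<Symbol>:=|[()+.;])
  | (?P<Skip>\s+)
""", re.VERBOSE)

def tokenize(text):
    """Slice a character sequence into (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind = m.lastgroup            # name of the alternative that matched
        if kind == "Skip":
            continue                  # whitespace separates tokens but is not one
        lexeme = m.group()
        if kind == "Id" and lexeme in KEYWORDS:
            kind = "Keyword"          # reserved words are recognized after the fact
        tokens.append((kind, lexeme))
    return tokens

tokens = tokenize("Date x := new Date ( System.today( ) + 30 ) ;")
```

Classifying keywords by looking identifiers up in a table, rather than giving each keyword its own pattern, keeps the pattern list short; it is one common design choice among several.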

Syntax Analysis (parsing)

• Purpose

– Organize tokens in sentences based on grammar

[Figure: the tokens from the lexical analysis example, organized according to the grammar]

What is a grammar?

• Grammar is a Set of Rules Which Govern

– The Interdependencies &

– Structure Among the Tokens

statement is an

assignment statement, or while statement, or if statement, or ...

assignment statement is an

identifier := expression ;

expression is an

(expression), or expression + expression, or expression * expression, or number, or identifier, or ...
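A small recursive-descent parser shows how rules like these turn tokens into a tree. This Python sketch covers only the assignment and expression rules. The slide's expression rule does not say how + and * nest, so the sketch assumes the usual layering (expression, term, factor) in which * binds tighter than +; the helper names are hypothetical.

```python
import re

def split_tokens(text):
    # Throwaway lexer for the sketch; characters it does not know are silently skipped.
    return re.findall(r":=|\d+|[A-Za-z_]\w*|[()+*;]", text)

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def assignment(self):
        # assignment statement is an: identifier := expression ;
        name = self.peek()
        if name is None or not (name[0].isalpha() or name[0] == "_"):
            raise SyntaxError(f"expected identifier, found {name!r}")
        self.pos += 1
        self.expect(":=")
        value = self.expression()
        self.expect(";")
        return (":=", name, value)

    def expression(self):
        # expression: term { + term }
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        # term: factor { * factor }
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        # factor: ( expression ) | number | identifier
        tok = self.peek()
        if tok == "(":
            self.pos += 1
            node = self.expression()
            self.expect(")")
            return node
        if tok is not None and (tok.isdigit() or tok[0].isalpha() or tok[0] == "_"):
            self.pos += 1
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

tree = Parser(split_tokens("position := initial + rate * 60 ;")).assignment()
```

The result is a nested tuple in which the multiplication sits below the addition, reflecting the assumed precedence.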

Summary so far…

• Turn a symbol stream into a parse tree

Semantic Analysis

• Purpose

– Determine Unique / Unambiguous Interpretation

– Catch errors related to program meaning

• Examples

– Natural Language

• “John Took Picture of Mary Out on the Patio”

– Programming Language

• Wrong types

• Missing declaration

• Missing methods

• Ambiguous statements…

Semantic Analysis

• Main task

– Type checking

– Many Different Situations

– Primary tool

• Symbol table

Real := int + char ;

A[int] := A[real] + int ;

while char <> int do

… etc.
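Type checking against a symbol table can be sketched in a few lines of Python. The typing rules here are assumptions chosen for illustration (int widens to real, char combines only with char); the slides do not specify the rules of any particular language.

```python
def binary_type(left, right):
    """Result type of a binary operator under this sketch's assumed rules."""
    if left == right:
        return left
    if {left, right} == {"int", "real"}:
        return "real"                       # assumed: int widens to real
    raise TypeError(f"cannot combine {left} and {right}")

def type_of(expr, symbols):
    """expr is a variable name (str) or a tuple (op, left, right)."""
    if isinstance(expr, str):
        if expr not in symbols:
            raise NameError(f"missing declaration for {expr!r}")
        return symbols[expr]                # the symbol table supplies declared types
    _op, left, right = expr
    return binary_type(type_of(left, symbols), type_of(right, symbols))

def check_assignment(target, expr, symbols):
    """Return the target's type, or raise if the value cannot be stored in it."""
    target_type = type_of(target, symbols)
    value_type = type_of(expr, symbols)
    if value_type != target_type and (target_type, value_type) != ("real", "int"):
        raise TypeError(f"cannot assign {value_type} to {target_type}")
    return target_type

# Symbol table: declared name -> type
symbols = {"r": "real", "i": "int", "c": "char"}
ok_type = check_assignment("r", ("+", "i", "i"), symbols)
```

Under these rules, a statement of the form real := int + char is rejected because int and char cannot be combined, and a use of an undeclared name is reported as a missing declaration.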

Summary so far...

• Turned a symbol stream into an annotated parse tree

Synthesis Phases

[Figure: the compilation-phase diagram (Source Program, phases 1 to 6, Target Program, Error Handler), labeled “Synthesis Phases”]

Intermediate Code

• What is intermediate code?

– A low level representation

– A simple representation

– Easy to reason about

– Easy to manipulate

– Programming Language independent

– Hardware independent

IR Code example

• Three-address code

– At most three operands per instruction

– Each assignment instruction has at most one operator on the right side

– Instructions fix the order in which operations are to be done

– Generate a temporary name to hold the value computed

• Example

position = initial + rate * 60

(id1 = position, id2 = initial, id3 = rate)

t1 = inttofloat(60)

t2 = id3 * t1

t3 = id2 + t2

id1 = t3
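Generating three-address code amounts to a post-order walk of the expression tree that introduces one temporary per operator. The Python sketch below reproduces the slide's example. It assumes that every identifier is a float, so integer literals are wrapped in inttofloat; a real compiler would decide this from type information.

```python
import itertools

def gen_tac(target, expr):
    """Return a list of three-address instructions that assign expr to target.

    expr is an identifier (str), an integer literal (int),
    or a tuple (op, left, right).
    """
    code = []
    counter = itertools.count(1)

    def emit(rhs):
        name = f"t{next(counter)}"          # fresh temporary for each computed value
        code.append(f"{name} = {rhs}")
        return name

    def walk(node):
        if isinstance(node, int):
            # Assumption of this sketch: operands are floats, so literals are converted.
            return emit(f"inttofloat({node})")
        if isinstance(node, str):
            return node                     # identifiers are used directly
        op, left, right = node
        left_name = walk(left)              # operands first, fixing evaluation order
        right_name = walk(right)
        return emit(f"{left_name} {op} {right_name}")

    result = walk(expr)
    code.append(f"{target} = {result}")
    return code

tac = gen_tac("id1", ("+", "id2", ("*", "id3", 60)))
```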

Code Optimization

• Purpose

– Improve the intermediate code

• Get rid of redundant / dead code

• Get rid of redundant computation

• Reorganize code

• Schedule instructions

• Factor out code

– Improve the machine code

• Register allocation

• Instruction scheduling [for specific hardware]
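As one concrete instance of "get rid of dead code", here is a minimal Python sketch of dead-code elimination over a straight-line block of three-address instructions. The tuple format and the live_out parameter are assumptions of the sketch, and it presumes instructions have no side effects and the block contains no jumps.

```python
def eliminate_dead_code(code, live_out):
    """Drop instructions whose result is never used.

    code: list of (dest, op, arg1, arg2) tuples; arg2 may be None.
    live_out: names whose values are needed after the block.
    """
    live = set(live_out)
    kept = []
    # Walk backwards so each instruction is judged by what later code needs.
    for dest, op, arg1, arg2 in reversed(code):
        if dest in live:
            live.discard(dest)              # this instruction defines dest
            live.update(a for a in (arg1, arg2) if isinstance(a, str))
            kept.append((dest, op, arg1, arg2))
    kept.reverse()
    return kept

block = [
    ("t1", "*", "b", "c"),
    ("t2", "+", "a", "t1"),
    ("t3", "*", "b", "c"),      # computed but never used
    ("x", "copy", "t2", None),
]
optimized = eliminate_dead_code(block, live_out={"x"})
```

The backward walk is what makes a single pass sufficient: by the time an instruction is examined, every later use of its result has already been seen.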

Machine Code Generation

• Purpose

– Generate code for the target architecture

– Code is relocatable

– Accounts for platform specifics

• Registers

– IA32: 6 GP registers (int)

– MIPS: 32 GP registers (int)

• Calling convention

– IA32: stack-based

– MIPS: register based

– Sparc: register sliding window based

Machine code generation

• Pragmatics

– Generate assembly

– Let an assembler produce the binary code.

Assemblers

• Assembly code: names are used for instructions, and names are used for memory addresses.

• Two-pass Assembly:

– First Pass: all identifiers are assigned to memory addresses (0-offset)

– e.g. substitute 0 for a, and 4 for b

– Second Pass: produce relocatable machine code:

MOV a, R1

ADD #2, R1

MOV R1, b

0001 01 00 00000000 *

0011 01 10 00000010

0010 01 00 00000100 *

* = relocation bit
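The two passes can be sketched in Python for exactly the three instructions on the slide. The field layout (4-bit opcode, 2-bit register, 2-bit addressing mode, 8-bit operand) and the opcode meanings (0001 load, 0010 store, 0011 add) are read off the slide's encoded output; the 4-byte variable size and the function names are assumptions of the sketch.

```python
WORD = 4  # assumed size in bytes of each variable

def is_register(operand):
    return operand.startswith("R") and operand[1:].isdigit()

def first_pass(program):
    """Assign each identifier a 0-based offset, in order of first appearance."""
    symbols = {}
    for _op, src, dst in program:
        for operand in (src, dst):
            if not is_register(operand) and not operand.startswith("#"):
                symbols.setdefault(operand, len(symbols) * WORD)
    return symbols

def second_pass(program, symbols):
    """Emit one bit-string per instruction; ' *' marks a relocatable address."""
    out = []
    for op, src, dst in program:
        if op == "MOV" and is_register(dst):       # memory -> register (load)
            opcode, reg, operand = "0001", dst, src
        elif op == "MOV" and is_register(src):     # register -> memory (store)
            opcode, reg, operand = "0010", src, dst
        elif op == "ADD" and is_register(dst):
            opcode, reg, operand = "0011", dst, src
        else:
            raise ValueError(f"unsupported instruction: {op} {src}, {dst}")
        if operand.startswith("#"):
            mode, value, reloc = "10", int(operand[1:]), ""    # immediate constant
        else:
            mode, value, reloc = "00", symbols[operand], " *"  # address, needs relocation
        out.append(f"{opcode} {int(reg[1:]):02b} {mode} {value:08b}{reloc}")
    return out

program = [("MOV", "a", "R1"), ("ADD", "#2", "R1"), ("MOV", "R1", "b")]
symbols = first_pass(program)
machine_code = second_pass(program, symbols)
```

Only the instructions that mention a or b carry the relocation mark, since the constant 2 does not change when the code is loaded at a different address.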

Linker & Loader

• Loader

– Input

• Relocatable code in a file

– Output

• Bring executable into virtual address space. Relocate “shared” libs.

• Linker

– Input

• Many relocatable machine code files (object files)

– Output

• A single executable file with all the object files glued together.

– Need to relocate each object file as needed
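The "glue together and relocate" step can be illustrated with a toy model in Python. The object-file format here (a list of words plus a list of indices that hold addresses) is invented for the sketch, and symbol resolution across files is deliberately left out.

```python
def link(objects):
    """Concatenate object files and rebase their internal addresses.

    objects: list of dicts with
      'code'  - list of integer words
      'reloc' - indices of words holding addresses relative to the object's own start
    """
    image = []
    for obj in objects:
        base = len(image)             # where this object lands in the executable
        words = list(obj["code"])     # copy so the input is not modified
        for index in obj["reloc"]:
            words[index] += base      # relocate: shift the address by the base
        image.extend(words)
    return image

# Two hypothetical object files; in each, word 1 holds an object-relative address.
first = {"code": [10, 0, 20], "reloc": [1]}
second = {"code": [30, 1], "reloc": [1]}
executable = link([first, second])
```

The first object is placed at offset 0, so its address is unchanged; the second starts at offset 3, so its address 1 becomes 4.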

Pre-processors

• Purpose

– Macro processing

– Performs text replacement (editing)

– Performs file concatenation (#include)

• Example

– In C, #define does not exist as far as the compiler proper is concerned

– In C, #define is handled by a preprocessor

#define X 3

#define Y A*B+C

#define Z getchar()
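The text-replacement idea can be sketched in Python. This is a deliberately simplified model and not how the C preprocessor is specified: it handles only object-like macros, does not rescan expansions, and ignores string literals and comments.

```python
import re

def preprocess(source):
    """Expand object-like #define macros by whole-identifier text replacement."""
    macros = {}
    output = []
    for line in source.splitlines():
        m = re.match(r"\s*#define\s+(\w+)\s+(.*)", line)
        if m:
            macros[m.group(1)] = m.group(2).strip()
            continue                  # the directive itself never reaches the compiler
        expanded = re.sub(r"\b\w+\b",
                          lambda t: macros.get(t.group(), t.group()),
                          line)
        output.append(expanded)
    return "\n".join(output)

expanded = preprocess("#define X 3\n#define Y A*B+C\nint n = X * Y;")
```

Here the expansion is int n = 3 * A*B+C; which groups as (3 * A * B) + C. Because the replacement is purely textual, macro bodies such as A*B+C are usually wrapped in parentheses.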

Modern Compilers

• Motto

– Re-targetable compilers

• Organization

– Front-end

• The analysis

• The generation of intermediate code

– Flexible IR

– Back-end

• The generation of machine code

• The architecture-specific optimization

Modern Compiler Example

• GCC

– Front-ends for

• C, C++, FORTRAN, Objective-C, Java

– IR

• Static Single Assignment (SSA)

– Back-ends for

• IA32, PPC, Ultra, MIPS, ARM, Itanium, DEC, Alpha, VMS, …

Ahead

• Today’s lecture

– Reading

• Chapter 1

– For the inquisitive mind

• Chapter 2 is an overview of the entire process. Useful to browse and refer to later on.

• Next Lecture

– Scanning

– Reading

• Chapter 3