
Compiler Construction

Chapter 1: Introduction

Slides modified from Louden Book and Dr. Scherger


Terminology

January, 2010 Chapter 1: Introduction 2

Compiler

Interpreter

Translator

Assembler

Linker

Loader

Preprocessor

Editor

Debugger

Profiler

Source Language

Target Language

Target Platform

Relocatable

Macro substitution

IDE

Cross Compiler

Disassembler

Front End

Back End

Compiler Stages

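Phases in order, with the representation each one produces:

Source Code
  -> Scanner -> Tokens
  -> Parser -> Syntax Tree
  -> Semantic Analyzer -> Annotated Tree
  -> Source Code Optimizer -> Intermediate Code
  -> Code Generator -> Target Code
  -> Target Code Optimizer -> Target Code

Used by all phases: Literal Table, Symbol Table, Error Handler

The diagram labels the earlier phases Analysis and the later phases Synthesis.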

Files Used by Compilers

A source code text file (.c, .cpp, .java, etc. file extensions).

Intermediate code files: transformations of source code during compilation, usually kept in temporary files rarely seen by the user.

An assembly code text file containing symbolic machine code, often produced as the output of a compiler (.asm, .s file extensions).

Files Used by Compilers (cont.)

One or more binary object code files: machine instructions, not yet linked or executable (.obj, .o file extensions).

A binary executable file: linked, independently executable (well, not always…) code (.exe, .out extensions, or no extension).


Compiler Execution

What is O() of a compiler?

Extended Example

Source code:

a[index] = 4 + 2

Tokens: ID Lbracket ID Rbracket AssignOp Num AddOp Num

Parse tree (syntax tree with all steps of the parser in gory detail):

Parse Tree

expression
  assign-expression
    expression
      subscript-expression
        expression
          identifier a
        [
        expression
          identifier index
        ]
    =
    expression
      additive-expression
        expression
          number 4
        +
        expression
          number 2

Syntax Tree

A "trimmed" version of the parse tree with only essential information:

assign-expression
  subscript-expression
    identifier a
    identifier index
  additive-expression
    number 4
    number 2

Annotated Syntax Tree (with attributes)

assign-expression : integer
  subscript-expression : integer
    identifier a : array of integer
    identifier index : integer
  additive-expression : integer
    number 4 : integer
    number 2 : integer


Intermediate Code

The syntax tree is very abstract.

Machine code is too specific.

Something in between may make optimization much easier.

One such representation is three-address code: each instruction refers to at most three addresses (variables).

t = 4 + 2

a[index] = t
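As a rough illustration of how three-address code like the two lines above could be produced, here is a minimal Python sketch that walks a syntax tree for the running example. The tuple encoding of the tree, the function name `gen_three_address`, and the temporary names `t1`, `t2`, ... (the slide simply writes `t`) are assumptions made for this sketch, and it only handles the node kinds that appear in the example.

```python
import itertools

def gen_three_address(tree):
    """Translate a tiny syntax tree into a list of three-address instructions.

    Assumed encoding: ("assign", target, value), ("subscript", array, index),
    ("add", left, right); leaves are names (str) or numbers (int).
    """
    code = []
    temps = (f"t{n}" for n in itertools.count(1))  # fresh temporary names

    def operand(node):
        # Leaves (names and numbers) are already usable as addresses.
        if isinstance(node, (int, str)):
            return str(node)
        kind = node[0]
        if kind == "add":
            left, right = operand(node[1]), operand(node[2])
            temp = next(temps)
            code.append(f"{temp} = {left} + {right}")
            return temp
        if kind == "subscript":
            return f"{operand(node[1])}[{operand(node[2])}]"
        raise ValueError(f"unknown node kind: {kind}")

    kind, target, value = tree
    if kind != "assign":
        raise ValueError("expected an assignment at the root")
    rhs = operand(value)                 # evaluate the right-hand side first
    code.append(f"{operand(target)} = {rhs}")
    return code

example = ("assign", ("subscript", "a", "index"), ("add", 4, 2))
for line in gen_three_address(example):
    print(line)
```

Each interior `add` node gets its own temporary, which is what keeps every emitted instruction down to a small, fixed number of addresses.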

Target Code

(edited & modified for this presentation):

mov eax, 6

mov ecx, DWORD PTR _index$[ebp]

mov DWORD PTR _a$[ebp+ecx*4], eax

(Note the source-level constant folding optimization: 4 + 2 has been replaced by 6.)

Source code: a[index] = 4 + 2

Tokens: ID Lbracket ID Rbracket AssignOp Num AddOp Num
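The constant folding noted above can be sketched as a small tree rewrite. This is only an illustration in Python, assuming a tuple encoding of the syntax tree (`("add", left, right)` and so on) and a hypothetical function name `fold_constants`; a real optimizer would cover many more operators and cases.

```python
def fold_constants(node):
    """Return a copy of the tree with constant additions evaluated.

    Assumed encoding: interior nodes are tuples (kind, child, ...);
    leaves are names (str) or numbers (int).
    """
    if not isinstance(node, tuple):
        return node                        # leaf: name or number
    kind, *children = node
    children = [fold_constants(child) for child in children]
    if kind == "add" and all(isinstance(c, int) for c in children):
        return children[0] + children[1]   # both operands known at compile time
    return (kind, *children)

example = ("assign", ("subscript", "a", "index"), ("add", 4, 2))
print(fold_constants(example))
```

After folding, the right-hand side of the assignment is the single number 6, which is why the target code can load 6 directly into eax.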


The Big Picture

The compiler stages applied to the running example:

Source Code: a[index] = 4 + 2

Scanner -> Tokens: ID Lbracket ID Rbracket AssignOp Num AddOp Num

Parser -> Syntax Tree:

assign-expression
  subscript-expression
    identifier a
    identifier index
  additive-expression
    number 4
    number 2

Semantic Analyzer -> Annotated Tree:

assign-expression : integer
  subscript-expression : integer
    identifier a : array of integer
    identifier index : integer
  additive-expression : integer
    number 4 : integer
    number 2 : integer

Source Code Optimizer -> Intermediate Code:

t = 4 + 2
a[index] = t

Code Generator and Target Code Optimizer -> Target Code:

mov eax, 6
mov ecx, DWORD PTR _index$[ebp]
mov DWORD PTR _a$[ebp+ecx*4], eax

Used by all phases: Literal Table, Symbol Table, Error Handler

Algorithmic Tools

Tokens: defined using regular expressions. (Chapter 2)

Scanner: an implementation of a finite state machine (deterministic automaton) that recognizes the token regular expressions. (Chapter 2)
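To make the finite-state-machine idea concrete, here is a minimal hand-coded scanner in Python for the running example. The token names come from the slides; the state names, the `scan` function, and the exact character classes are assumptions for this sketch rather than the textbook's scanner.

```python
# States of the automaton (names are illustrative).
START, IN_ID, IN_NUM = "start", "in_id", "in_num"

# Single-character tokens, using the token names from the slides.
SINGLE_CHAR = {"[": "Lbracket", "]": "Rbracket", "=": "AssignOp", "+": "AddOp"}

def scan(source):
    """Return a list of (token_name, lexeme) pairs using an explicit DFA."""
    tokens = []
    state = START
    lexeme = ""
    i = 0
    while i <= len(source):
        ch = source[i] if i < len(source) else ""   # "" marks end of input
        if state == START:
            if ch.isalpha():
                state, lexeme = IN_ID, ch
            elif ch.isdigit():
                state, lexeme = IN_NUM, ch
            elif ch in SINGLE_CHAR:
                tokens.append((SINGLE_CHAR[ch], ch))
            elif ch == "" or ch.isspace():
                pass                                # skip blanks and end marker
            else:
                raise ValueError(f"unexpected character {ch!r} at position {i}")
            i += 1
        elif state == IN_ID:
            if ch.isalnum():
                lexeme += ch
                i += 1
            else:
                tokens.append(("ID", lexeme))
                state = START       # do not consume ch; rescan it from START
        else:  # IN_NUM
            if ch.isdigit():
                lexeme += ch
                i += 1
            else:
                tokens.append(("Num", lexeme))
                state = START       # do not consume ch; rescan it from START
    return tokens

print([name for name, _ in scan("a[index] = 4 + 2")])
```

The transitions out of IN_ID and IN_NUM do not consume the character that ended the token, which is the usual one-character lookahead a scanner needs.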

Algorithmic Tools (cont.)

Parser: a push-down automaton (i.e. uses a stack), based on grammar rules in a standard format (BNF – Backus-Naur Form). (Chapters 3, 4, 5)

Semantic Analyzer and Code Generator: recursive evaluators based on semantic rules for attributes (properties of language constructs). (Chapters 6, 7, 8)
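A recursive evaluator for one attribute, the data type, might look like the following Python sketch. It computes the types shown in the annotated syntax tree for the running example. The tuple encoding of the tree, the `SYMBOL_TYPES` declarations, and the function name `type_of` are assumptions for illustration, and the type rules are deliberately simplified.

```python
# Assumed declarations for the running example.
SYMBOL_TYPES = {"a": "array of integer", "index": "integer"}

def type_of(node, symbols=SYMBOL_TYPES):
    """Recursively compute the type attribute of a syntax tree node."""
    if isinstance(node, int):
        return "integer"
    if isinstance(node, str):
        return symbols[node]            # KeyError if the name is undeclared
    kind = node[0]
    if kind == "subscript":
        array_type = type_of(node[1], symbols)
        index_type = type_of(node[2], symbols)
        if not array_type.startswith("array of ") or index_type != "integer":
            raise TypeError("subscript needs an array and an integer index")
        return array_type[len("array of "):]    # the element type
    if kind == "add":
        if type_of(node[1], symbols) != "integer" or type_of(node[2], symbols) != "integer":
            raise TypeError("operands of + must be integers")
        return "integer"
    if kind == "assign":
        target_type = type_of(node[1], symbols)
        if target_type != type_of(node[2], symbols):
            raise TypeError("assignment between different types")
        return target_type
    raise ValueError(f"unknown node kind: {kind}")

example = ("assign", ("subscript", "a", "index"), ("add", 4, 2))
print(type_of(example))
```

The type of each node is computed from the types of its children, which is the sense in which the evaluator is driven by semantic rules for attributes.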

Other Phase Features

Parser and scanner together typically operate as a unit (parser calls scanner repeatedly to generate tokens).

Front end: parser, scanner, semantic analyzer and source code optimizer depend primarily on source language.

Back end: code generator and target code optimizer depend primarily on target language (machine architecture).
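The parser-calls-scanner arrangement described above can be sketched in Python with the scanner written as a generator, so that each token is produced only when the parser asks for it. This sketch uses recursive descent, where the call stack plays the role of the parser's stack, and a regular-expression scanner for brevity; the grammar covers only statements shaped like the running example, and the names `tokens` and `Parser` are assumptions.

```python
import re

# One named group per token, using the token names from the slides.
TOKEN_RE = re.compile(r"\s*(?:(?P<ID>[A-Za-z]\w*)|(?P<Num>\d+)|(?P<Lbracket>\[)"
                      r"|(?P<Rbracket>\])|(?P<AssignOp>=)|(?P<AddOp>\+))")

def tokens(source):
    """Scanner as a generator: yields one (name, lexeme) pair per request."""
    source = source.rstrip()
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"bad character at position {pos}")
        pos = match.end()
        yield match.lastgroup, match.group(match.lastgroup)
    yield "EOF", ""

class Parser:
    def __init__(self, source):
        self.scanner = tokens(source)
        self.advance()

    def advance(self):
        # The parser asks the scanner for exactly one more token.
        self.name, self.lexeme = next(self.scanner)

    def expect(self, name):
        if self.name != name:
            raise SyntaxError(f"expected {name}, found {self.name}")
        lexeme = self.lexeme
        self.advance()
        return lexeme

    def assignment(self):      # assignment -> subscript AssignOp additive
        target = self.subscript()
        self.expect("AssignOp")
        value = self.additive()
        if self.name != "EOF":
            raise SyntaxError(f"unexpected {self.name} after statement")
        return ("assign", target, value)

    def subscript(self):       # subscript -> ID Lbracket ID Rbracket
        array = self.expect("ID")
        self.expect("Lbracket")
        index = self.expect("ID")
        self.expect("Rbracket")
        return ("subscript", array, index)

    def additive(self):        # additive -> Num { AddOp Num }
        left = int(self.expect("Num"))
        while self.name == "AddOp":
            self.advance()
            left = ("add", left, int(self.expect("Num")))
        return left

print(Parser("a[index] = 4 + 2").assignment())
```

Because the scanner is only ever driven through `advance`, the two components behave as one unit from the point of view of the rest of the compiler.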

Other Classifications

Logical unit: phase

Physical unit: separately compiled code file (see later)

Temporal unit: pass

Passes: trips through the source code (or intermediate code). A pass is not the same thing as a phase, although a compiler can be organized so that each phase is its own pass.

Data Structure Tools

Syntax tree: see previous pictures.

Literal table: "Hello, world!", 3.141592653589793, etc.

If a literal is used more than once (as literals often are in a program), we still want to store it only once.

So we use a table (almost always a hash table or table of hash tables).

Symbol table: all names (variables, functions, classes, typedefs, constants, namespaces).

Again, a hash table or set of hash tables is the most likely data structure.
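Both tables can be sketched with Python dictionaries, which are hash tables. The class and method names below (`LiteralTable.intern`, `SymbolTable.declare`, and so on) are hypothetical, and the scoping rule shown (one dictionary per open scope, searched innermost first) is one common reading of "set of hash tables", not necessarily the one the text uses.

```python
class LiteralTable:
    """Stores each distinct literal once and hands out a stable index."""

    def __init__(self):
        self._index = {}     # (type name, value) -> position in _values
        self._values = []

    def intern(self, value):
        key = (type(value).__name__, value)   # keeps 1 and 1.0 distinct
        if key not in self._index:
            self._index[key] = len(self._values)
            self._values.append(value)
        return self._index[key]

    def __len__(self):
        return len(self._values)


class SymbolTable:
    """A stack of hash tables, one per open scope, innermost last."""

    def __init__(self):
        self._scopes = [{}]

    def enter_scope(self):
        self._scopes.append({})

    def exit_scope(self):
        self._scopes.pop()

    def declare(self, name, info):
        if name in self._scopes[-1]:
            raise KeyError(f"{name} already declared in this scope")
        self._scopes[-1][name] = info

    def lookup(self, name):
        for scope in reversed(self._scopes):   # innermost scope wins
            if name in scope:
                return scope[name]
        raise KeyError(f"{name} is not declared")


literals = LiteralTable()
first = literals.intern("Hello, world!")
second = literals.intern("Hello, world!")
print(first == second, len(literals))
```

Interning the same string twice returns the same index, so the literal is stored only once no matter how often the program uses it.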

Error Handler

One of the more difficult parts of a compiler to design.

Must handle a wide range of errors.

Must handle multiple errors.

Must not get stuck.

Must not get into an infinite loop (typical simple-minded strategy: count errors, stop if count gets too high).
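The count-and-stop strategy could look roughly like this in Python. The `ErrorHandler` class, the `TooManyErrors` exception, the default limit of 10, and the toy "every statement must end with a semicolon" check are all assumptions chosen to keep the sketch short.

```python
class TooManyErrors(Exception):
    """Raised when the error count reaches the configured limit."""


class ErrorHandler:
    def __init__(self, max_errors=10):
        self.max_errors = max_errors
        self.messages = []

    def report(self, line, message):
        self.messages.append(f"line {line}: {message}")
        if len(self.messages) >= self.max_errors:
            raise TooManyErrors(f"giving up after {len(self.messages)} errors")


def check_statements(statements, handler):
    """Toy checker: every statement must end with ';'.

    After reporting an error it moves on to the next statement, so it
    handles multiple errors and never re-examines the same input.
    """
    for line, text in enumerate(statements, start=1):
        if not text.rstrip().endswith(";"):
            handler.report(line, "missing ';'")
    return handler.messages


print(check_statements(["x = 1;", "y = 2", "z = 3;", "w = 4"], ErrorHandler()))
```

Always advancing to the next statement after an error is what guarantees progress; the error limit is only a backstop for inputs that are hopelessly wrong.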

Kinds of Errors

Syntax: iff (x == 0) y + = z + r; }

Semantic: int x = "Hello, world!";

Runtime: int x = 2;

...

double y = 3.14159 / (x - 2);

Errors (cont.)

A compiler must handle syntax and semantic errors, but not runtime errors (whether a runtime error will occur is an undecidable question).

Sometimes a compiler is required to generate code to catch runtime errors and handle them in some graceful way (either with or without exception handling).

This, too, is often difficult.
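One way to picture generated code that catches a runtime error is a code generator that emits a guard before each division. The Python sketch below returns the instruction lines as strings; the instruction syntax, the label scheme, and the `runtime_error` routine it calls are all hypothetical and stand in for whatever the target platform actually provides.

```python
def emit_division(result, numerator, denominator):
    """Return instruction lines for result = numerator / denominator,
    preceded by a check that the denominator is not zero.

    The instruction syntax and the runtime_error routine are illustrative only.
    """
    return [
        f"if {denominator} != 0 goto ok_{result}",
        'call runtime_error("division by zero")',
        f"ok_{result}:",
        f"{result} = {numerator} / {denominator}",
    ]

for line in emit_division("y", "3.14159", "t1"):
    print(line)
```

The compiler cannot know whether the denominator will be zero, so it emits the test unconditionally and leaves the decision to run time.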

Sample Compilers in This Class ("Toys")

TINY: a 4-pass compiler for the TINY language, based on Pascal (see text, pages 22-26).

C-Minus: a project language given in the text (see text, pages 26-27 and Appendix A). Based on C.

SIL: Simple Island Language:

TINY Example

read x;
if x > 0 then
  fact := 1;
  repeat
    fact := fact * x;
    x := x - 1
  until x = 0;
  write fact
end

C-Minus Example

int fact( int x )
{ if (x > 1)
    return x * fact(x-1);
  else
    return 1;
}

void main( void )
{ int x;
  x = read();
  if (x > 0) write( fact(x) );
}

Structure of the TINY Compiler

globals.h main.c

util.h util.c

scan.h scan.c

parse.h parse.c

symtab.h symtab.c

analyze.h analyze.c

code.h code.c

cgen.h cgen.c

Conditional Compilation Options

NO_PARSE: builds a scanner-only compiler.

NO_ANALYZE: builds a compiler that parses and scans only.

NO_CODE: builds a compiler that performs semantic analysis, but generates no code.

Listing Options (built in - not flags)

EchoSource: echoes the TINY source program to the listing, together with line numbers.

TraceScan: displays information on each token as the scanner recognizes it.

TraceParse: displays the syntax tree in a linearized format.

TraceAnalyze: displays summary information on the symbol table and type checking.

TraceCode: prints code generation-tracing comments to the code file.

Terminology Review

Compiler

Interpreter

Translator

Assembler

Linker

Loader

Preprocessor

Editor

Debugger

Profiler

Source Language

Target Language

Target Platform

Relocatable

Macro substitution

IDE

Cross Compiler

Disassembler

Front End

Back End