Compiler construction in4020 – lecture 5 Koen Langendoen Delft University of Technology The Netherlands
Feb 10, 2017

Page 1: Compiler construction in4020 – lecture 5

Compiler construction in4020 – lecture 5

Koen Langendoen

Delft University of Technology, The Netherlands

Page 2: Compiler construction in4020 – lecture 5

Summary of lecture 4

• syntax analysis: tokens → AST
• bottom-up parsing
  • push-down automaton
  • ACTION/GOTO tables
  • LR(0): no look-ahead
  • SLR(1): one-token look-ahead, FOLLOW sets to solve shift-reduce conflicts
  • LR(1): like SLR(1), but FOLLOW set per item
  • LALR(1): like LR(1), but “equal” states are merged

Page 3: Compiler construction in4020 – lecture 5

Quiz

2.50 Is the following grammar LR(0), SLR(1), LALR(1), or LR(1)?

(a) S → x S x | y

(b) S → x S x | x

Page 4: Compiler construction in4020 – lecture 5

Overview

• semantic analysis
  • identification – symbol tables
  • type checking
• assignment
  • yacc
  • LLgen

[Figure: program text → lexical analysis → tokens → syntax analysis → AST → context handling → annotated AST; a parser generator derives the syntax analysis phase from the language grammar]

Page 5: Compiler construction in4020 – lecture 5

Semantic analysis

• information is scattered throughout the program
• identifiers serve as connectors
• find the defining occurrence of each applied occurrence of an identifier in the AST
  • undefined identifiers → error
  • unused identifiers → warning
• check rules in the language definition
  • type checking
  • control flow (dead code)

se-man-tic: of or relating to meaning in language. (Webster’s Dictionary)


Page 7: Compiler construction in4020 – lecture 5

Symbol table

• global storage used by all compiler phases
• holds information about identifiers:
  • type
  • location
  • size
  • ...

[Figure: the compiler pipeline (program text → lexical analysis → syntax analysis → context handling → annotated AST → optimizations → code generation → executable), with the symbol table alongside the phases]

Page 8: Compiler construction in4020 – lecture 5

Symbol table implementation

• extensible string-indexable array
• linear list
• tree
• hash table

[Figure: a hash table with buckets 0 to 3; each bucket heads a linked list of entries with the fields next, name, and type, holding the names "aap", "noot", and "mies". Hash function: string → int]
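The hash-table variant can be sketched in C roughly as follows. This is only an illustration, not the course's implementation: the names `struct entry`, `sym_lookup`, and `sym_insert`, the multiplicative hash, and the bucket count of 4 (taken from the figure) are all assumptions.

```c
#include <stdlib.h>
#include <string.h>

#define N_BUCKETS 4            /* assumption: the figure draws buckets 0..3 */

/* One chain element, with the fields shown in the figure. */
struct entry {
    struct entry *next;
    const char   *name;
    int           type;        /* placeholder for real type information */
};

static struct entry *bucket[N_BUCKETS];

/* Hash function: string -> int (a simple multiplicative hash). */
static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h % N_BUCKETS;
}

/* Return the entry for name, or NULL if it is not in the table. */
struct entry *sym_lookup(const char *name)
{
    struct entry *e;
    for (e = bucket[hash(name)]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* Insert name at the head of its chain; returns NULL if out of memory. */
struct entry *sym_insert(const char *name, int type)
{
    unsigned h = hash(name);
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->name = name;            /* the caller keeps the string alive */
    e->type = type;
    e->next = bucket[h];
    bucket[h] = e;
    return e;
}
```

Names that hash to the same bucket simply share a chain, so lookup cost depends on chain length rather than on the total number of identifiers.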

Page 9: Compiler construction in4020 – lecture 5

Identification

• different kinds of identifiers
  • variables
  • type names
  • field selectors
• name spaces
• scopes

    typedef int i;

    int j;

    void foo(int j)
    {
        struct i {i i;} i;

    i:  i.i = 3;
        printf("%d\n", i.i);
    }


Page 11: Compiler construction in4020 – lecture 5

Handling scopes

• stack of scope elements
• when entering a scope a new element is pushed on the stack
• declared identifiers are entered in the top scope element
• applied identifiers are looked up in the scope elements from top to bottom
• the top element is removed upon scope exit
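The scope stack described above might look like this in C. It is a simplified sketch under stated assumptions: all scope elements share one array of declarations, each scope element is just the index where its declarations start, and the names `enter_scope`, `exit_scope`, `declare`, and `lookup` as well as the fixed capacities are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_DECLS  64          /* assumed capacities, for the sketch only */
#define MAX_SCOPES 16

struct decl {
    const char *name;
    int         type;          /* placeholder for real type information */
};

static struct decl decls[MAX_DECLS];    /* declarations of all open scopes */
static int n_decls;
static int scope_start[MAX_SCOPES];     /* first declaration of each scope */
static int n_scopes;

/* Entering a scope pushes a new, empty scope element. */
int enter_scope(void)
{
    if (n_scopes == MAX_SCOPES)
        return 0;
    scope_start[n_scopes++] = n_decls;
    return 1;
}

/* Leaving a scope removes the top element and all its declarations. */
void exit_scope(void)
{
    if (n_scopes > 0)
        n_decls = scope_start[--n_scopes];
}

/* Declared identifiers are entered in the top scope element. */
int declare(const char *name, int type)
{
    if (n_decls == MAX_DECLS)
        return 0;
    decls[n_decls].name = name;
    decls[n_decls].type = type;
    n_decls++;
    return 1;
}

/* Applied identifiers are searched from the top scope down to the bottom. */
const struct decl *lookup(const char *name)
{
    int i;
    for (i = n_decls - 1; i >= 0; i--)
        if (strcmp(decls[i].name, name) == 0)
            return &decls[i];
    return NULL;
}
```

Searching the shared array backwards visits the innermost scope first, so an inner declaration of `j` hides an outer one until `exit_scope()` is called.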

Page 12: Compiler construction in4020 – lecture 5

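A scoped hash-based symbol table

    aap(int noot)
    {
        int mies, aap;
        ....
    }

[Figure: a hash table (buckets 0 to 3) whose entries "aap", "noot", and "mies" each point to a list of declarations with their properties, next to a scope stack with levels 0, 1, and 2. "aap" has declarations at levels 0 and 2, "noot" at level 1, and "mies" at level 2.]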

Page 13: Compiler construction in4020 – lecture 5

Identification: complications

• overloading
  • operators: N*2, prijs*2.20371
  • functions: PUT(s:STRING), PUT(i:INTEGER)
  • solution: yield set of possibilities (to be constrained by type checking)
• imported scopes
  • C++ scope resolution operator x::
  • Modula FROM module IMPORT ...
  • solution: stack (or merge) the new scope

Page 14: Compiler construction in4020 – lecture 5

Type checking

• operators and functions impose restrictions on the types of the arguments
• types
  • basic types
  • structured types
  • type names

    typedef struct {
        double re;
        double im;
    } complex;

Page 15: Compiler construction in4020 – lecture 5

Forward declarations

• recursive data structures
• type information must be stored
  • type table
• type information must be resolved
  • undefined types
  • circularities

    TYPE Tree = POINTER TO Node;
    TYPE Node = RECORD
        element : Integer;
        left, right : Tree;
    END RECORD;

Page 16: Compiler construction in4020 – lecture 5

Type equivalence

• name equivalence [all types get a unique name]

    VAR a : ARRAY [Integer 1..10] OF Real;
    VAR b : ARRAY [Integer 1..10] OF Real;

• structural equivalence [difficult to check]

    TYPE c = RECORD
        i : Integer;
        p : POINTER TO c;
    END RECORD;
    TYPE d = RECORD
        i : Integer;
        p : POINTER TO
            RECORD
                i : Integer;
                p : POINTER TO c;
            END RECORD;
    END RECORD;
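One common way to check structural equivalence in the presence of recursive types is to assume that a pair of types is equal while its components are being compared. The C sketch below illustrates that idea only; the type representation (`struct type` with at most two components), the names `equiv` and `structurally_equal`, and the fixed-size pair list are assumptions, not taken from the lecture.

```c
#include <stddef.h>

#define MAX_PAIRS 64           /* assumed limit on pairs under comparison */

enum tkind { T_INTEGER, T_POINTER, T_RECORD };

/* Simplified type node: POINTER uses field[0], RECORD uses both fields. */
struct type {
    enum tkind   kind;
    struct type *field[2];
};

/* Pairs of types that are currently assumed to be equivalent. */
static const struct type *pair_a[MAX_PAIRS], *pair_b[MAX_PAIRS];
static int n_pairs;

static int equiv(const struct type *x, const struct type *y)
{
    int i, n;

    if (x == y)
        return 1;              /* same node: trivially equivalent */
    if (x->kind != y->kind)
        return 0;
    if (x->kind == T_INTEGER)
        return 1;
    for (i = 0; i < n_pairs; i++)
        if (pair_a[i] == x && pair_b[i] == y)
            return 1;          /* already assumed equal: breaks the cycle */
    if (n_pairs == MAX_PAIRS)
        return 0;              /* give up conservatively */
    pair_a[n_pairs] = x;
    pair_b[n_pairs] = y;
    n_pairs++;
    n = (x->kind == T_POINTER) ? 1 : 2;
    for (i = 0; i < n; i++)
        if (!equiv(x->field[i], y->field[i]))
            return 0;
    return 1;
}

/* Structural equivalence: compare the shapes of the two type graphs. */
int structurally_equal(const struct type *x, const struct type *y)
{
    n_pairs = 0;
    return equiv(x, y);
}
```

Under name equivalence the check would reduce to the pointer comparison `x == y`; with this sketch, type graphs built for `c` and `d` from the example compare as equal even though they are distinct nodes.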

Page 17: Compiler construction in4020 – lecture 5

Coercions

• implicit data and type conversion to match operand (argument) type
• coercions complicate identification (ambiguity)
• two phase approach
  • expand a type to a set by applying coercions
  • reduce type sets based on constraints imposed by (overloaded) operators and language semantics

    VAR a : Real;
    ...
    a := 5;

    3.14 + 7
    8 + 9

Page 18: Compiler construction in4020 – lecture 5

Variable: value or location?

• two usages of variables
  • rvalue: value
  • lvalue: location
• insert coercion to dereference variable
• checking rules:

                 expected
    found     lvalue   rvalue
    lvalue      -      deref
    rvalue    ERROR      -

    VAR p : Real;
    VAR q : Real;
    ...
    p := q;

[Figure: AST for p := q; the := node has two children, (location of) p and a deref node applied to (location of) q]
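The checking rules in the table translate directly into a small function. The following C sketch uses hypothetical names (`enum kind`, `enum action`, `check_kind`); it only encodes the four table entries.

```c
enum kind   { LVALUE, RVALUE };
enum action { NO_ACTION, INSERT_DEREF, KIND_ERROR };

/* Decide what to do when a node of kind `found` appears where a node
   of kind `expected` is required, following the table of checking rules. */
enum action check_kind(enum kind found, enum kind expected)
{
    if (found == expected)
        return NO_ACTION;      /* the two "-" entries */
    if (found == LVALUE)
        return INSERT_DEREF;   /* lvalue found, rvalue expected */
    return KIND_ERROR;         /* rvalue found, lvalue expected */
}
```

For `p := q`, the right-hand side `q` is found as an lvalue where an rvalue is expected, so the checker would insert a deref node above it.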

Page 19: Compiler construction in4020 – lecture 5

Exercise (5 min.)

complete the table

    expression construct     result kind (lvalue/rvalue)
    constant                 rvalue
    identifier
    &lvalue
    *rvalue
    V[rvalue]
    V.selector
    rvalue + rvalue
    lvalue = rvalue

V stands for lvalue or rvalue

Page 20: Compiler construction in4020 – lecture 5

Answers

complete table

    expression construct     result kind (lvalue/rvalue)
    constant                 rvalue
    identifier (variable)    lvalue
    identifier (otherwise)   rvalue
    &lvalue                  rvalue
    *rvalue                  lvalue
    V[rvalue]                V
    V.selector               V
    rvalue + rvalue          rvalue
    lvalue = rvalue          rvalue

V stands for lvalue or rvalue

Page 21: Compiler construction in4020 – lecture 5

Break

Page 22: Compiler construction in4020 – lecture 5

Assignment (practicum)

Asterix compiler

1) replace yacc by LLgen
2) make Asterix object-oriented
   • classes and objects
   • inheritance and dynamic binding

[Figure: the Asterix compiler; an Asterix program passes through lexical analysis (generated by lex from the token description), syntax analysis (generated by yacc from the Asterix grammar), context handling, and code generation, producing C-code]

Page 23: Compiler construction in4020 – lecture 5

Yet another compiler compiler

• yacc (bison): parser generator for UNIX
  • LALR(1) grammar → C code
• format of the yacc input file:

    definitions        tokens + properties
    %%
    rules              grammar rules + actions
    %%
    user code          auxiliary C-code

Page 24: Compiler construction in4020 – lecture 5

Yacc-based expression interpreter

• input file
• yacc maintains a stack of “values” that may be referenced ($i) in the semantic actions

    %token DIGIT

    %%
    line : expr '\n'      { printf("%d\n", $1);}
         ;
    expr : expr '+' expr  { $$ = $1 + $3;}
         | expr '*' expr  { $$ = $1 * $3;}
         | '(' expr ')'   { $$ = $2;}
         | DIGIT
         ;
    %%

(left column: grammar; right column: semantics)

Page 25: Compiler construction in4020 – lecture 5

Yacc interface to lexical analyzer

• yacc invokes yylex() to get the next token
• the “value” of a token must be stored in the global variable yylval
• the default value type is int, but can be changed

    %%
    yylex()
    {
        int c;

        c = getchar();

        if (isdigit(c)) {
            yylval = c - '0';
            return DIGIT;
        }
        return c;
    }

Page 26: Compiler construction in4020 – lecture 5

Yacc interface to back-end

• yacc generates a function named yyparse()
• syntax errors are reported by invoking a callback function yyerror()

    %%
    yylex()
    {
        ...
    }

    main()
    {
        yyparse();
    }

    yyerror()
    {
        printf("syntax error\n");
        exit(1);
    }

Page 27: Compiler construction in4020 – lecture 5

Yacc-based expression interpreter

• input file (desk0)
• run yacc

    %%
    line : expr '\n'      { printf("%d\n", $1);}
         ;
    expr : expr '+' expr  { $$ = $1 + $3;}
         | expr '*' expr  { $$ = $1 * $3;}
         | '(' expr ')'   { $$ = $2;}
         | DIGIT
         ;
    %%

    > make desk0
    bison -v desk0.y
    desk0.y contains 4 shift/reduce conflicts.
    gcc -o desk0 desk0.tab.c
    >

Page 28: Compiler construction in4020 – lecture 5

Conflict resolution in Yacc

• shift-reduce: prefer shift
• reduce-reduce: prefer the rule that comes first

Page 29: Compiler construction in4020 – lecture 5

Yacc-based expression interpreter

• input file (desk0)
• run yacc
• run desk0, is it correct?

    %%
    line : expr '\n'      { printf("%d\n", $1);}
         ;
    expr : expr '+' expr  { $$ = $1 + $3;}
         | expr '*' expr  { $$ = $1 * $3;}
         | '(' expr ')'   { $$ = $2;}
         | DIGIT
         ;
    %%

    > desk0
    2*3+4
    14

NO: with the usual operator precedence, 2*3+4 is 10.
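The result 14 follows from the "prefer shift" rule: after reading `2*3` with `+` as the next token, the parser shifts instead of reducing, so the input is grouped as 2*(3+4). The C sketch below mimics that grouping by hand; it is not yacc output, and the function name `eval_shift_first` and the restriction to single digits without parentheses are assumptions made for illustration.

```c
/* Evaluate single digits joined by '+' and '*' the way the ambiguous
   grammar is parsed when every shift/reduce conflict is resolved as
   "shift": the right operand swallows the rest of the input. */
int eval_shift_first(const char **p)
{
    int left, right;
    char op;

    left = *(*p)++ - '0';      /* DIGIT */
    op = **p;
    if (op == '+' || op == '*') {
        (*p)++;                /* shift the operator ...                */
        right = eval_shift_first(p);   /* ... and parse the rest first */
        return op == '+' ? left + right : left * right;
    }
    return left;
}
```

Called on the string "2*3+4" this returns 14, matching the desk0 run, whereas "2+3*4" happens to come out right only because the right-nested grouping coincides with the usual precedence.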

Page 30: Compiler construction in4020 – lecture 5

Operator precedence in Yacc

priority from top (low) to bottom (high)

    %token DIGIT
    %left '+'
    %left '*'

    %%
    line : expr '\n'      { printf("%d\n", $1);}
         ;
    expr : expr '+' expr  { $$ = $1 + $3;}
         | expr '*' expr  { $$ = $1 * $3;}
         | '(' expr ')'   { $$ = $2;}
         | DIGIT
         ;
    %%

Page 31: Compiler construction in4020 – lecture 5

Exercise (7 min.)

multiple lines:

    %%
    lines: line
         | lines line
         ;
    line : expr '\n'      { printf("%d\n", $1);}
         ;
    expr : expr '+' expr  { $$ = $1 + $3;}
         | expr '*' expr  { $$ = $1 * $3;}
         | '(' expr ')'   { $$ = $2;}
         | DIGIT
         ;
    %%

Extend the interpreter to a desk calculator with registers named a – z. Example input: v=3*(w+4)

Page 33: Compiler construction in4020 – lecture 5

Answers

    %{
    int reg[26];
    %}
    %token DIGIT
    %token REG
    %right '='
    %left '+'
    %left '*'
    %%
    expr : REG '=' expr   { $$ = reg[$1] = $3;}
         | expr '+' expr  { $$ = $1 + $3;}
         | expr '*' expr  { $$ = $1 * $3;}
         | '(' expr ')'   { $$ = $2;}
         | REG            { $$ = reg[$1];}
         | DIGIT
         ;
    %%

Page 34: Compiler construction in4020 – lecture 5

Answers

    %%
    yylex()
    {
        int c = getchar();

        if (isdigit(c)) {
            yylval = c - '0';
            return DIGIT;
        } else if ('a' <= c && c <= 'z') {
            yylval = c - 'a';
            return REG;
        }
        return c;
    }

Page 35: Compiler construction in4020 – lecture 5

LLgen: LL(1) parser generator

• LLgen is part of the Amsterdam Compiler Kit
• takes LL(1) grammar + semantic actions in C and generates a recursive descent parser
• LLgen features:
  • repetition operators
  • advanced error handling
  • parameter passing
  • control over semantic actions
  • dynamic conflict resolvers

Page 36: Compiler construction in4020 – lecture 5

LLgen example: expression interpreter

• start from LR(1) grammar
• make grammar LL(1)
  • left recursion
  • operator precedence
• use repetition operators

yacc:

    lines : line
          | lines line
          ;
    line  : expr '\n'
          ;
    expr  : expr '+' expr
          | expr '*' expr
          | '(' expr ')'
          | DIGIT
          ;

Page 37: Compiler construction in4020 – lecture 5

LLgen example: expression interpreter

• start from LR(1) grammar
• make grammar LL(1)
  • left recursion
  • operator precedence
• use repetition operators

LLgen:

    %token DIGIT;

    main   : [line]+ ;
    line   : expr '\n' ;
    expr   : term [ '+' term ]* ;
    term   : factor [ '*' factor ]* ;
    factor : '(' expr ')'
           | DIGIT
           ;

• add semantic actions
  • attach parameters to grammar rules
  • insert C-code between the symbols

Page 38: Compiler construction in4020 – lecture 5

    main : [line]+
         ;
    line {int e;}
         : expr(&e) '\n'              { printf("%d\n", e);}
         ;
    expr(int *e) {int t;}
         : term(e) [ '+' term(&t)     { *e += t;} ]*
         ;
    term(int *t) {int f;}
         : factor(t) [ '*' factor(&f) { *t *= f;} ]*
         ;
    factor(int *f)
         : '(' expr(f) ')'
         | DIGIT                      { *f = yylval;}
         ;

(left column: grammar; right column: semantics)

• values/results passed as parameters
• semantic actions: C code between {}
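Since LLgen generates a recursive descent parser, the grammar with its parameters corresponds closely to a set of C functions, one per rule. The hand-written sketch below shows that correspondence; it is not what LLgen actually emits, and the helper names (`in`, `tok`, `tokval`, `next`, `eval_line`) and the string-based input are assumptions for illustration.

```c
#include <ctype.h>

enum { DIGIT = 256 };          /* token code outside the character range */

static const char *in;         /* hypothetical input cursor */
static int tok;                /* current token */
static int tokval;             /* value of the current DIGIT token */

/* Minimal lexer: digits become DIGIT tokens, other characters stand
   for themselves, and 0 marks the end of the input. */
static void next(void)
{
    int c = *in ? (unsigned char)*in++ : 0;
    if (isdigit(c)) {
        tokval = c - '0';
        tok = DIGIT;
    } else {
        tok = c;
    }
}

static void expr(int *e);

/* factor : '(' expr ')' | DIGIT */
static void factor(int *f)
{
    if (tok == '(') {
        next();
        expr(f);
        if (tok == ')')
            next();
    } else if (tok == DIGIT) {
        *f = tokval;
        next();
    } else {
        *f = 0;                /* syntax error: no recovery in this sketch */
    }
}

/* term : factor [ '*' factor ]* */
static void term(int *t)
{
    int f;
    factor(t);
    while (tok == '*') {
        next();
        factor(&f);
        *t *= f;
    }
}

/* expr : term [ '+' term ]* */
static void expr(int *e)
{
    int t;
    term(e);
    while (tok == '+') {
        next();
        term(&t);
        *e += t;
    }
}

/* Evaluate one expression given as a string. */
int eval_line(const char *s)
{
    int e = 0;
    in = s;
    next();
    expr(&e);
    return e;
}
```

Each repetition operator `[ ... ]*` becomes a `while` loop, and each rule parameter becomes a pointer argument through which the result is passed back. With this layering, `eval_line("2*3+4")` yields 10 because `*` is handled one level below `+`.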

Page 41: Compiler construction in4020 – lecture 5

LLgen interface to lexical analyzer

• by default LLgen invokes yylex() to get the next token
• the “value” of a token can be stored in any global variable (yylval) of any type (int)

    yylex()
    {
        int c;

        c = getchar();

        if (isdigit(c)) {
            yylval = c - '0';
            return DIGIT;
        }
        return c;
    }

Page 42: Compiler construction in4020 – lecture 5

LLgen interface to back-end

• LLgen generates a user-named function (parse)
• LLgen handles syntax errors by inserting missing tokens and deleting unexpected tokens
• LLmessage() is invoked to notify the lexical analyzer

    %start parse, main;

    LLmessage(int class)
    {
        switch (class) {
        case -1:
            printf("expecting EOF, ");
            /* fall through: the unexpected token is deleted */
        case 0:
            printf("deleting token (%d)\n", LLsymb);
            break;
        default:
            /* push back token LLsymb */
            printf("inserting token (%d)\n", class);
            break;
        }
    }

Page 43: Compiler construction in4020 – lecture 5

Exercise (5 min.)

• extend LLgen-based interpreter to a desk calculator with registers named a – z. Example input: v=3*(w+4)

Page 45: Compiler construction in4020 – lecture 5

Answers

    %token REG;

    expr(int *e) {int r,t;}
         : %if (ahead() == '=')
           reg(&r) '=' expr(e)        { reg[r] = *e;}
         | term(e) [ '+' term(&t)     { *e += t;} ]*
         ;
    factor(int *f) {int r;}
         : '(' expr(f) ')'
         | DIGIT                      { *f = yylval;}
         | reg(&r)                    { *f = reg[r];}
         ;
    reg(int *r)
         : REG                        { *r = yylval;}
         ;

(the %if (ahead() == '=') test is the dynamic conflict resolution)

Page 47: Compiler construction in4020 – lecture 5

Homework

• study sections:
  • 2.2.4.6 LLgen
  • 2.2.5.9 yacc
• assignment 1:
  • replace yacc with LLgen
  • deadline April 9 08:59
• print handout for next week [blackboard]