Lab 3: Using ML-Yacc


Zhong [email protected]

How to write a parser?

- Write a parser by hand
- Use a parser generator
  - May not be as efficient as a hand-written parser
  - General and robust

How does it work?

[Diagram: a parser specification is fed to the parser generator, which produces the parser; the parser turns a stream of tokens into abstract syntax.]

ML-Yacc specification

Three parts again:

User Declarations: declare values available in the rule actions

%%

ML-Yacc Definitions: declare terminals and nonterminals; special declarations to resolve conflicts

%%

Rules: parser specified by CFG rules and associated semantic actions that generate abstract syntax

ML-Yacc Definitions

- specify the type of positions
    %pos int * int
- specify terminal and nonterminal symbols
    %term IF | THEN | ELSE | PLUS | MINUS ...
    %nonterm prog | exp | op
- specify the end-of-parse token
    %eop EOF
- specify the start symbol (by default, the nonterminal on the LHS of the first rule)
    %start prog

A Simple ML-Yacc File

    %%
    %term NUM | PLUS | MUL | LPAR | RPAR | EOF
    %nonterm exp | fact | base
    %pos int
    %start exp
    %eop EOF
    %%
    exp  : fact ()
         | fact PLUS exp ()
    fact : base ()
         | base MUL fact ()
    base : NUM ()
         | LPAR exp RPAR ()

Grammar symbols: the %term and %nonterm declarations.
Grammar rules: everything after the second %%.
Semantic actions: the () after each alternative (currently do nothing).

- each nonterminal may have a semantic value associated with it
- when the parser reduces with (X ::= s), a semantic action is executed
  - it uses the semantic values of the symbols in s
- when parsing completes successfully, the parser returns the semantic value associated with the start symbol
  - usually a syntax tree
- to use semantic values during parsing, we must declare symbol types:
    %term NUM of int | PLUS | MUL | ...
    %nonterm exp of int | fact of int | base of int
- the type of a semantic action must match the type declared for the nonterminal of its rule

A Simple ML-Yacc File with Actions

    %%
    %term NUM of int | PLUS | MUL | LPAR | RPAR | EOF
    %nonterm exp of int | fact of int | base of int
    %pos int
    %start exp
    %eop EOF
    %%
    exp  : fact (fact)
         | fact PLUS exp (fact + exp)
    fact : base (base)
         | base MUL base (base1 * base2)
    base : NUM (NUM)
         | LPAR exp RPAR (exp)

Grammar symbols with type declarations: the %term and %nonterm lines.
Grammar rules with semantic actions: everything after the second %%. A symbol that occurs more than once on a right-hand side is numbered (base1, base2).
The semantic actions compute an integer result.
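To see how the semantic actions combine values, here is a minimal Python sketch that mirrors the three rules of the file above. It is only an illustration: it uses recursive descent rather than the LR algorithm ML-Yacc generates, and the function name `evaluate` and the token encoding (plain ints for NUM, strings for the other terminals) are assumptions made for this example.

```python
def evaluate(tokens):
    """tokens: list of ints (NUM) and the strings "PLUS", "MUL", "LPAR", "RPAR"."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else "EOF"

    def advance():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def exp():
        fact_value = fact()
        if peek() == "PLUS":          # exp : fact PLUS exp (fact + exp)
            advance()
            return fact_value + exp()
        return fact_value             # exp : fact (fact)

    def fact():
        base1 = base()
        if peek() == "MUL":           # fact : base MUL base (base1 * base2)
            advance()
            base2 = base()
            return base1 * base2
        return base1                  # fact : base (base)

    def base():
        tok = advance()
        if tok == "LPAR":             # base : LPAR exp RPAR (exp)
            value = exp()
            if advance() != "RPAR":
                raise SyntaxError("expected RPAR")
            return value
        if isinstance(tok, int):      # base : NUM (NUM)
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    result = exp()
    if peek() != "EOF":
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result
```

For example, `evaluate([2, "PLUS", 3, "MUL", 4])` follows the same steps as the actions above: `fact` yields 3 * 4 and `exp` then adds 2.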

Conflicts in ML-Yacc

We often write ambiguous grammars.

Example

    exp ::= NUM | exp PLUS exp | exp MUL exp | LPAR exp RPAR

Tokens from lexer: NUM PLUS NUM MUL NUM
State of parser: E+E
To be read: MUL NUM

If we shift:
    Shift     E+E*
    Shift     E+E*E
    Reduce    E+E
    Reduce    E
    Result is: E+(E*E)

If we reduce:
    Reduce    E
    Shift     E*
    Shift     E*E
    Reduce    E
    Result is: (E+E)*E

This is a shift-reduce conflict. We want E+(E*E), because “*” has higher precedence than “+”.

Another shift-reduce conflict

Tokens from lexer: NUM PLUS NUM PLUS NUM
State of parser: E+E
To be read: PLUS NUM

If we shift:
    Shift     E+E+
    Shift     E+E+E
    Reduce    E+E
    Reduce    E
    Result is: E+(E+E)

If we reduce:
    Reduce    E
    Shift     E+
    Shift     E+E
    Reduce    E
    Result is: (E+E)+E
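The two traces can be replayed with a small Python sketch. It handles only the rule E ::= E op E over numbers and applies one fixed policy whenever both shifting and reducing are possible. The function name `parse`, the `on_conflict` parameter, and the tuple encoding of trees are assumptions made for this illustration, not part of ML-Yacc.

```python
OPERATORS = {"+", "*"}

def parse(tokens, on_conflict):
    """Return a tree (op, left, right); on_conflict is "shift" or "reduce"."""
    stack = []
    rest = list(tokens)
    while True:
        # Reducing by E ::= E op E is possible when the stack ends in "E op E".
        can_reduce = len(stack) >= 3 and stack[-1] not in OPERATORS
        if can_reduce and (on_conflict == "reduce" or not rest):
            right = stack.pop()
            op = stack.pop()
            left = stack.pop()
            stack.append((op, left, right))
        elif rest:
            stack.append(rest.pop(0))   # shift the next token
        else:
            return stack[0]
```

With the input `[1, "+", 2, "*", 3]`, the "shift" policy groups the multiplication first and the "reduce" policy groups the addition first, matching the two results on the slide.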

For NUM PLUS NUM PLUS NUM we need to reduce, because “+” is left associative.

Deal with shift-reduce conflicts

Ways to deal with it:

- let ML-Yacc complain
  - the default choice is to shift when it encounters a shift-reduce conflict
  - BAD: programmer intentions unclear; harder to debug other parts of your grammar; generally inelegant
- rewrite the grammar to eliminate ambiguity
  - can be complicated and less clear
- use Yacc precedence directives
  - %left, %right, %nonassoc

Precedence and Associativity

- the precedence of a terminal is based on the order in which associativity is specified (later declarations have higher precedence)
- the precedence of a rule is the precedence of its right-most terminal
  - e.g. precedence of (E ::= E + E) == prec(+)
- a shift-reduce conflict is resolved as follows:
    prec(terminal) > prec(rule)  ==> shift
    prec(terminal) < prec(rule)  ==> reduce
    prec(terminal) = prec(rule)  ==>
        assoc(terminal) = left      ==> reduce
        assoc(terminal) = right     ==> shift
        assoc(terminal) = nonassoc  ==> report as error
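The resolution rule can be written down directly. The following Python sketch is only a restatement of the cases above; the function name `resolve` and the numeric precedence levels (PLUS = 1, MUL = 2) are assumptions chosen for the example.

```python
def resolve(prec_terminal, prec_rule, assoc_terminal):
    """Decide a shift-reduce conflict; returns "shift", "reduce" or "error"."""
    if prec_terminal > prec_rule:
        return "shift"
    if prec_terminal < prec_rule:
        return "reduce"
    # Equal precedence: the associativity of the terminal decides.
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[assoc_terminal]

# Hypothetical levels for the examples: PLUS = 1, MUL = 2, both left associative.
PLUS, MUL = 1, 2
after_plus_sees_mul = resolve(MUL, PLUS, "left")    # stack E+E, next token *
after_plus_sees_plus = resolve(PLUS, PLUS, "left")  # stack E+E, next token +
after_mul_sees_plus = resolve(PLUS, MUL, "left")    # stack E*E, next token +
```

Here the rule on the stack takes its precedence from its right-most terminal, so for a stack of E+E the second argument is the level of PLUS.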

    datatype exp = Int of int
                 | Add of exp * exp
                 | Sub of exp * exp
                 | Mul of exp * exp
                 | Div of exp * exp

    %%

    %left PLUS MINUS
    %left MUL DIV

    %%

    exp : NUM             (Int NUM)
        | exp PLUS exp    (Add (exp1, exp2))
        | exp MINUS exp   (Sub (exp1, exp2))
        | exp MUL exp     (Mul (exp1, exp2))
        | exp DIV exp     (Div (exp1, exp2))
        | LPAR exp RPAR   (exp)

Higher precedence: MUL and DIV, because their %left line comes after the one for PLUS and MINUS.
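To check what trees these declarations should produce, the following Python sketch builds the same kind of syntax tree by precedence climbing. This is a different technique from the LR parser that ML-Yacc generates and is used here only because it is short. The classes `Int` and `BinOp`, the tables `LEVEL` and `NODE`, and the token encoding are assumptions for this illustration; there is no error handling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Int:
    value: int

@dataclass(frozen=True)
class BinOp:
    op: str        # "Add", "Sub", "Mul" or "Div"
    left: object
    right: object

# Mirrors "%left PLUS MINUS" followed by "%left MUL DIV": later line, higher level.
LEVEL = {"PLUS": 1, "MINUS": 1, "MUL": 2, "DIV": 2}
NODE = {"PLUS": "Add", "MINUS": "Sub", "MUL": "Mul", "DIV": "Div"}

def parse(tokens):
    """tokens: ints for NUM, strings for operators, "LPAR" and "RPAR"."""
    pos = 0

    def primary():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "LPAR":
            inner = expr(1)
            pos += 1               # skip the matching RPAR
            return inner
        return Int(tok)

    def expr(min_level):
        nonlocal pos
        left = primary()
        while (pos < len(tokens) and tokens[pos] in LEVEL
               and LEVEL[tokens[pos]] >= min_level):
            op = tokens[pos]
            pos += 1
            # Requiring a strictly higher level on the right gives left associativity.
            right = expr(LEVEL[op] + 1)
            left = BinOp(NODE[op], left, right)
        return left

    return expr(1)
```

For `[1, "MINUS", 2, "MINUS", 3]` the result nests to the left, and for `[1, "PLUS", 2, "MUL", 3]` the multiplication ends up below the addition, which is what the two %left lines ask for.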

Reduce-reduce Conflict

This kind of conflict is more difficult to deal with.

Example

    sequence  ::=
                | maybeword
                | sequence word
    maybeword ::=
                | word

When we get a “word” from the lexer:
    word -> maybeword -> sequence                        (rule 1)
    empty -> sequence, then sequence word -> sequence    (rule 2)

We have more than one way to get “sequence” from the input “word”.

Reduce-reduce Conflict

- A reduce-reduce conflict means there are two or more rules that apply to the same sequence of input.
- This usually indicates a serious error in the grammar.
- ML-Yacc reduces by the first rule.
- Generally, a reduce-reduce conflict is not allowed in your ML-Yacc file.
- We need to fix our grammar:

    sequence ::=
               | sequence word
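The ambiguity can be made concrete by counting parse trees. The Python sketch below counts, for an input of n words, how many distinct trees each version of the grammar allows. The function names are assumptions for this illustration. Note that the count for the original grammar also includes the variant in which the empty sequence itself comes from an empty maybeword.

```python
def ambiguous_sequence_trees(n):
    """Parse trees for n words under
         sequence  ::= <empty> | maybeword | sequence word
         maybeword ::= <empty> | word
    """
    if n == 0:
        return 2                                   # <empty>, or maybeword -> <empty>
    via_maybeword = 1 if n == 1 else 0             # sequence -> maybeword -> word
    via_append = ambiguous_sequence_trees(n - 1)   # sequence -> sequence word
    return via_maybeword + via_append

def fixed_sequence_trees(n):
    """Parse trees for n words under  sequence ::= <empty> | sequence word."""
    return 1 if n == 0 else fixed_sequence_trees(n - 1)
```

Any count above 1 means the parser has to choose between reductions at some point. The repaired grammar yields exactly one tree for every input length.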

Summary of conflicts

- Shift-reduce conflict
  - resolve with precedence and associativity
  - shift by default
- Reduce-reduce conflict
  - reduce by first rule
  - Not allowed!

Lab3

- Your job is to finish a parser for the C language
  - Input: a “.c” file
  - Output: “Success!” if the “.c” file is correct
- File description
    c.lex  c.grm  main.sml  call-main.sml  sources.cm  lab3.mlb  test.c

Using ML-Yacc

- Read the ML-Yacc Manual
- Run
  - When you finish “c.grm” and “c.lex”, run on the command line (using MLton's tools):
      mlyacc c.grm
      mllex c.lex
  - we will get “c.grm.sig”, “c.grm.sml”, “c.grm.desc”, “c.lex.sml”
- Then compile Lab3
  - Start SML/NJ and run
      CM.make "sources.cm";
  - or on the command line:
      mlton lab3.mlb
- To run lab3
  - In SML/NJ:
      Main.parse "test.c";
  - or on the command line:
      lab3 test.c

“Debug” ML-Yacc File

- When you run mlyacc, you’ll see error messages if your ML-Yacc file has conflicts. For example:
    mlyacc c.grm
    2 shift/reduce conflicts
- Open the file “c.grm.desc” (this file is generated by mlyacc)
- The beginning of this file:
    2 shift/reduce conflicts
    error: state 0: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
    error: state 1: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)
- The rest are all the states:
    state 0:
        prog : . structs vdecs preds funcs
        MYSTRUCT    shift 3
        prog        goto 429
        structs     goto 2
        structdec   goto 1
        .           reduce by rule 12
- rule 12 means the 12th rule (counting from 0) in your ML-Yacc file
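When a grammar has many conflicts, it can help to pull them out of “c.grm.desc” mechanically. The Python sketch below does this with a regular expression. The pattern is derived only from the two sample error lines shown above, so it is an assumption that other conflict lines follow the same layout; the names `CONFLICT` and `find_conflicts` are hypothetical.

```python
import re

# Pattern assumed from the sample lines, e.g.
# "error: state 0: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)"
CONFLICT = re.compile(
    r"error:\s+state (\d+): shift/reduce conflict "
    r"\(shift (\w+), reduce by rule (\d+)\)"
)

def find_conflicts(desc_text):
    """Return a list of (state, shifted_terminal, rule_number) tuples."""
    return [(int(state), terminal, int(rule))
            for state, terminal, rule in CONFLICT.findall(desc_text)]

sample = (
    "2 shift/reduce conflicts\n"
    "error: state 0: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)\n"
    "error: state 1: shift/reduce conflict (shift MYSTRUCT, reduce by rule 12)\n"
)
conflicts = find_conflicts(sample)
```

Each tuple names the state to look up in the rest of the file and the rule (counted from 0) to look up in the grammar.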

Use ML-Lex with ML-Yacc

- Most of the work in “c.lex” this time can be copied from Lab2
  - You can re-use regular expressions and lexical rules
- Difference from Lab2
  - You have to define the tokens in “c.grm”:
      %term INT of int | EOF
  - The “%term” declarations in “c.grm” automatically appear in “c.grm.sig”:
      signature C_TOKENS =
      sig
        type ('a,'b) token
        type svalue
        val EOF: 'a * 'a -> (svalue,'a) token
        val INT: (int) * 'a * 'a -> (svalue,'a) token
      end

Hints

- Read the ML-Yacc Manual
- Read the language specification
- Test a lot!
