CS 153: Concepts of Compiler Design October 26 Class Meeting Department of Computer Science San Jose State University Fall 2015 Instructor: Ron Mak www.cs.sjsu.edu/~mak 1

Jan 19, 2016


CS 153: Concepts of Compiler Design
October 26 Class Meeting

Department of Computer Science
San Jose State University

Fall 2015
Instructor: Ron Mak

www.cs.sjsu.edu/~mak


Computer Science Dept.
Fall 2015: October 26

CS 153: Concepts of Compiler Design
© R. Mak


Review: JavaCC Compiler-Compiler

Feed JavaCC the grammar for a source language and it will automatically generate a scanner and a parser.

Specify the source language tokens with regular expressions. JavaCC generates a scanner for the source language.

Specify the source language syntax rules with Extended BNF. JavaCC generates a parser for the source language.


Review: JavaCC Compiler-Compiler, cont’d

The generated scanner and parser are written in Java.

Note: JavaCC calls the scanner the “tokenizer”.


Review: JavaCC Regular Expressions

Literals: <HELLO : "hello">

Character classes: <SPACE_OR_COMMA : [" ", ","]>

Character ranges: <LOWER_CASE : ["a"-"z"]>

Alternates: <UPPER_OR_LOWER : ["A"-"Z"] | ["a"-"z"]>

In each specification, the token name comes before the colon and the token string follows it.


Review: JavaCC Regular Expressions, cont’d

Negation: <NO_DIGITS : ~["0"-"9"]>

Repetition:
<THREE_A : ("a"){3}>
<TWO_TO_FOUR_A : ("a"){2,4}>

Quantifiers:
<ONE_OR_MORE_A : ("a")+>
<ZERO_OR_ONE_SIGN : (["+", "-"])?>
<ZERO_OR_MORE_DIGITS : (["0"-"9"])*>
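For experimenting outside JavaCC, the expressions reviewed above have close counterparts in java.util.regex. The following is a rough sketch, not JavaCC syntax: the class name RegexReview and the helper matches are made up for illustration, and JavaCC compiles its token expressions into the generated tokenizer rather than using java.util.regex.

```java
import java.util.regex.Pattern;

// Rough java.util.regex analogs of the JavaCC token expressions reviewed above.
// Illustration only: the names are hypothetical and this is not JavaCC syntax.
class RegexReview {
    static final Pattern THREE_A = Pattern.compile("a{3}");            // ("a"){3}
    static final Pattern TWO_TO_FOUR_A = Pattern.compile("a{2,4}");    // ("a"){2,4}
    static final Pattern NO_DIGITS = Pattern.compile("[^0-9]");        // ~["0"-"9"]
    static final Pattern ZERO_OR_ONE_SIGN = Pattern.compile("[+-]?");  // (["+", "-"])?
    static final Pattern UPPER_OR_LOWER = Pattern.compile("[A-Z]|[a-z]");

    // True if the whole input matches the pattern.
    static boolean matches(Pattern p, String input) {
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches(THREE_A, "aaa"));          // true
        System.out.println(matches(TWO_TO_FOUR_A, "aaaaa"));  // false
        System.out.println(matches(NO_DIGITS, "7"));          // false
        System.out.println(matches(ZERO_OR_ONE_SIGN, ""));    // true
    }
}
```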


JavaCC Parser Specification

Use JavaCC regular expressions to specify tokens. Use EBNF to specify JavaCC production rules.

Phone number example from Chapter 3 of the JavaCC book.

Example phone number: 408-123-4567

EBNF:

<digit> ::= 0|1|2|3|4|5|6|7|8|9
<three digits> ::= <digit> <digit> <digit>
<four digits> ::= <digit> <digit> <digit> <digit>
<phone number> ::= <three digits> - <three digits> - <four digits>


JavaCC Parser Specification, cont’d

EBNF:

<digit> ::= 0|1|2|3|4|5|6|7|8|9
<three digits> ::= <digit> <digit> <digit>
<four digits> ::= <digit> <digit> <digit> <digit>
<phone number> ::= <three digits> - <three digits> - <four digits>

JavaCC (phone.jj):

Token specifications:

TOKEN : {
    <FOUR_DIGITS : (<DIGITS>){4}>
  | <THREE_DIGITS : (<DIGITS>){3}>
  | <#DIGITS : ["0"-"9"]>
}

Production rule:

void PhoneNumber() : {}
{
    <THREE_DIGITS> "-" <THREE_DIGITS> "-" <FOUR_DIGITS> <EOF>
}

Java statements can go inside the {} that follows the colon. In the production rule, PhoneNumber is a nonterminal; <THREE_DIGITS>, <FOUR_DIGITS>, and <EOF> are terminals; each "-" is a literal.


JavaCC Production Rule Methods

JavaCC generates a top-down recursive-descent parser. Each production rule becomes a Java method of the parser class. You can pass parameters to the methods.

phone_method_param.jj:

void PhoneNumber() :
{
    StringBuffer sb = new StringBuffer();
}
{
    AreaCode(sb) "-"
    <THREE_DIGITS> {sb.append(token.image);} "-"
    <FOUR_DIGITS> {sb.append(token.image);}
    <EOF>
    {System.out.println("Number: " + sb.toString());}
}

void AreaCode(StringBuffer buf) : {}
{
    <THREE_DIGITS> {buf.append(token.image);}
}

The declaration of sb is a Java statement. Each {...} block that follows a token is a syntactic action.

Also try with debugging on!
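To make "each production rule becomes a Java method" concrete, here is a hand-written sketch with the same shape as the grammar above. It is not the code JavaCC generates: the class name PhoneSketch and the helper expect are hypothetical, and a java.util.regex loop stands in for the generated tokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written recursive-descent sketch (not JavaCC output).
class PhoneSketch {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;  // index of the next unconsumed token

    PhoneSketch(String input) {
        // Stand-in tokenizer: FOUR_DIGITS, THREE_DIGITS, or the literal "-".
        Matcher m = Pattern.compile("\\d{4}|\\d{3}|-").matcher(input);
        int end = 0;
        while (m.find() && m.start() == end) {
            tokens.add(m.group());
            end = m.end();
        }
        if (end != input.length()) {
            throw new IllegalArgumentException("Lexical error at index " + end);
        }
    }

    // Consume and return the next token if it matches the given token kind.
    private String expect(String regex) {
        if (pos >= tokens.size() || !tokens.get(pos).matches(regex)) {
            throw new IllegalStateException("Parse error at token " + pos);
        }
        return tokens.get(pos++);
    }

    // One method per production rule, as in the grammar.
    String phoneNumber() {
        StringBuffer sb = new StringBuffer();
        areaCode(sb);
        expect("-");
        sb.append(expect("\\d{3}"));
        expect("-");
        sb.append(expect("\\d{4}"));
        if (pos != tokens.size()) {  // plays the role of <EOF>
            throw new IllegalStateException("Expected end of input");
        }
        return sb.toString();
    }

    // The buffer is passed in as a parameter, like AreaCode(StringBuffer buf).
    void areaCode(StringBuffer buf) {
        buf.append(expect("\\d{3}"));
    }

    public static void main(String[] args) {
        // prints "Number: 4081234567"
        System.out.println("Number: " + new PhoneSketch("408-123-4567").phoneNumber());
    }
}
```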


Grammar Problems

Be very careful when specifying grammars!

JavaCC will not be able to generate a correct parser for a faulty grammar.

Common grammar faults include choice conflicts and left recursion.


Choice Conflict

Suppose we want to parse both local phone numbers and long-distance phone numbers:

Local: 123-4567
Long-distance: 201-456-7890

<local> ::= <prefix> - <four digit>
<long distance> ::= <area code> - <prefix> - <four digit>
<prefix> ::= <three digit>
<area code> ::= <three digit>

Choice conflict! While attempting to parse “123-4567”, the parser cannot tell whether the initial “123” is a <prefix> or an <area code>, since they are both <three digit>.

phone_choice.jj


Choice Conflict Resolution: Left Factoring

One way to resolve a choice conflict is by left factoring. Factor out the common head from the productions.

phone_left_factored.jj:

void PhoneNumber() : {}
{
    Head() "-" (LocalNumber() | LongDistanceNumber()) <EOF>
}

void LocalNumber() : {}
{
    <FOUR_DIGITS>
}

void LongDistanceNumber() : {}
{
    <THREE_DIGITS> "-" <FOUR_DIGITS>
}

void Head() : {}
{
    <THREE_DIGITS>
}

How does this fix the problem?
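One way to see the fix: once the common head and the first "-" are consumed, the very next token is either four digits (local) or three digits (long-distance), so a single token decides. The sketch below hand-codes that decision. It is an illustration, not JavaCC output, and the class LeftFactoredPhone with its helpers peek and expect is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written sketch of the left-factored grammar (not JavaCC output).
class LeftFactoredPhone {
    private final List<String> tokens = new ArrayList<>();
    private int pos = 0;  // index of the next unconsumed token

    LeftFactoredPhone(String input) {
        // Stand-in tokenizer: FOUR_DIGITS, THREE_DIGITS, or the literal "-".
        Matcher m = Pattern.compile("\\d{4}|\\d{3}|-").matcher(input);
        int end = 0;
        while (m.find() && m.start() == end) {
            tokens.add(m.group());
            end = m.end();
        }
        if (end != input.length()) {
            throw new IllegalArgumentException("Lexical error at index " + end);
        }
    }

    // One-token lookahead: does the next token match this kind?
    private boolean peek(String regex) {
        return pos < tokens.size() && tokens.get(pos).matches(regex);
    }

    private void expect(String regex) {
        if (!peek(regex)) {
            throw new IllegalStateException("Parse error at token " + pos);
        }
        pos++;
    }

    // PhoneNumber ::= Head "-" ( LocalNumber | LongDistanceNumber ) <EOF>
    String phoneNumber() {
        expect("\\d{3}");              // Head()
        expect("-");
        String kind;
        if (peek("\\d{4}")) {          // FOUR_DIGITS next: LocalNumber()
            expect("\\d{4}");
            kind = "local";
        } else {                       // otherwise: LongDistanceNumber()
            expect("\\d{3}");
            expect("-");
            expect("\\d{4}");
            kind = "long-distance";
        }
        if (pos != tokens.size()) {    // plays the role of <EOF>
            throw new IllegalStateException("Expected end of input");
        }
        return kind;
    }

    public static void main(String[] args) {
        System.out.println(new LeftFactoredPhone("123-4567").phoneNumber());      // local
        System.out.println(new LeftFactoredPhone("201-456-7890").phoneNumber());  // long-distance
    }
}
```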


Lookahead

A top-down parser naturally “looks ahead” one token. This token tells the parser which nonterminal it will parse next.

“IF”: next parse an IF statement
“REPEAT”: next parse a REPEAT statement

A choice conflict occurs if a one-token lookahead is not sufficient to determine which nonterminal to parse next.

<three digit>: next parse a local number or a long-distance number?


Backtracking

The parser cannot backtrack.

Suppose the parser has parsed “123-”.

It decides that’s an area code, so it must be parsing a long-distance number.

Now it sees “4567”.

Oops! It cannot backtrack and reparse “123-” as the prefix to a local number.


Choice Conflict Resolution: Lookahead

Another way to resolve a choice conflict is by telling the parser to look ahead more than just one token.

To decide between parsing a local number and a long-distance telephone number:

One-token lookahead is insufficient: “123”
Two-token lookahead is insufficient: “123-”
Three-token lookahead will distinguish a local number from a long-distance number: “123-4567”

void PhoneNumber() : {}
{
    ( LOOKAHEAD(3) LocalNumber() | LongDistanceNumber() ) <EOF>
}

By looking ahead three tokens, the parser can successfully choose between LocalNumber() and LongDistanceNumber().

phone_lookahead.jj
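The effect of the three-token lookahead can be imitated by hand: peek at the first three tokens without consuming them, and commit to an alternative only afterwards. This is a minimal sketch under the assumption that the input is already split into tokens; the class LookaheadPhone and the helper startsWith are hypothetical and do not reflect how JavaCC implements LOOKAHEAD internally.

```java
import java.util.Arrays;
import java.util.List;

// Hand-written imitation of a three-token lookahead (not JavaCC output).
class LookaheadPhone {
    static final String THREE = "\\d{3}", DASH = "-", FOUR = "\\d{4}";
    static final String[] LOCAL = {THREE, DASH, FOUR};
    static final String[] LONG_DISTANCE = {THREE, DASH, THREE, DASH, FOUR};

    // True if the leading tokens match the given token kinds, in order.
    // Nothing is consumed: this only peeks.
    static boolean startsWith(List<String> tokens, String... kinds) {
        if (tokens.size() < kinds.length) {
            return false;
        }
        for (int i = 0; i < kinds.length; i++) {
            if (!tokens.get(i).matches(kinds[i])) {
                return false;
            }
        }
        return true;
    }

    static String phoneNumber(List<String> tokens) {
        // Peek three tokens ahead to choose an alternative.
        boolean local = startsWith(tokens, LOCAL);
        String[] chosen = local ? LOCAL : LONG_DISTANCE;
        // Parse the chosen alternative, then require end of input.
        if (tokens.size() != chosen.length || !startsWith(tokens, chosen)) {
            throw new IllegalStateException("Parse error");
        }
        return local ? "local" : "long-distance";
    }

    public static void main(String[] args) {
        System.out.println(phoneNumber(Arrays.asList("123", "-", "4567")));              // local
        System.out.println(phoneNumber(Arrays.asList("201", "-", "456", "-", "7890")));  // long-distance
    }
}
```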


Lookahead

Global lookahead: Major performance penalty. Avoid if possible!

Syntactic lookahead, semantic lookahead, nested lookahead: Too convoluted! Minimize the need for these. Why would you design a grammar that needed these?


Lookahead

Lookahead will slow down parsing.

Try to design grammars that do not require more than one token of lookahead.

For example, Pascal only requires one-token lookahead.


Left Recursion

Suppose we want to parse very simple expressions like “1+2”, “1+2+3”, “9+4+7+2”, etc.

<expression> ::= <expression> + <term> | <term>
<term> ::= <digit>

Left recursion!

The nonterminal <expression> refers to itself recursively such that the recursion will never end.

Because the recursive reference is at the left end of the rule, no tokens are consumed.

<expression> ::= <expression> + <term>


Left Recursion, cont’d

void Expression() : {}
{
    ( Expression() "+" Term() | Term() )
    { System.out.println("Parsed expression"); }
}

expression_left_recursion.jj
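A literal hand translation of the left-recursive rule shows why no progress is made: the method calls itself before it consumes anything. The sketch below adds an artificial depth limit so that it stops instead of overflowing the stack. The class LeftRecursionDemo, the limit, and the helper tokensConsumed are assumptions for illustration only.

```java
// Hand-written illustration of left recursion (not JavaCC output).
class LeftRecursionDemo {
    private final String[] tokens;
    private final int maxDepth;   // artificial guard, for demonstration only
    private int pos = 0;          // index of the next unconsumed token
    private int depth = 0;

    LeftRecursionDemo(String[] tokens, int maxDepth) {
        this.tokens = tokens;
        this.maxDepth = maxDepth;
    }

    // <expression> ::= <expression> + <term>
    void expression() {
        if (++depth > maxDepth) {
            throw new IllegalStateException("Depth limit reached; consumed "
                    + pos + " of " + tokens.length + " tokens");
        }
        expression();  // recursive reference at the left end: nothing consumed yet
        pos++;         // "+"     (never reached)
        pos++;         // <term>  (never reached)
    }

    // Runs the parser until the guard trips and reports how many tokens it consumed.
    static int tokensConsumed(String[] tokens, int maxDepth) {
        LeftRecursionDemo parser = new LeftRecursionDemo(tokens, maxDepth);
        try {
            parser.expression();
        } catch (IllegalStateException e) {
            // Guard tripped; without it the recursion would not end.
        }
        return parser.pos;
    }

    public static void main(String[] args) {
        // prints 0: a thousand nested calls, and still no token consumed
        System.out.println(tokensConsumed(new String[] {"1", "+", "2"}, 1000));
    }
}
```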


Left Recursion Resolution: Iteration

Resolve left recursion by replacing it with iteration.

Instead of:

<expression> ::= <expression> + <term> | <term>
<term> ::= <digit>

Use EBNF:

<expression> ::= <term> { + <term> }
<term> ::= <digit>

void Expression() : {}
{
    Term() ("+" Term())*
    { System.out.println("Parsed expression"); }
}

expression_iteration.jj
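In hand-written form, the iterative rule is a loop: parse one term, then keep consuming "+" and another term for as long as a "+" is next. A minimal sketch, assuming single-digit terms and no whitespace; the class IterativeExpression and the method countTerms are hypothetical names.

```java
// Hand-written sketch of <expression> ::= <term> { + <term> } (not JavaCC output).
class IterativeExpression {

    // Parses the whole input and returns the number of terms found.
    static int countTerms(String input) {
        int pos = term(input, 0);
        int terms = 1;
        // ("+" Term())* : every pass through the loop consumes two tokens.
        while (pos < input.length() && input.charAt(pos) == '+') {
            pos = term(input, pos + 1);
            terms++;
        }
        if (pos != input.length()) {
            throw new IllegalArgumentException("Unexpected character at index " + pos);
        }
        return terms;
    }

    // <term> ::= <digit>; returns the index just past the digit.
    static int term(String input, int pos) {
        if (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            return pos + 1;
        }
        throw new IllegalArgumentException("Expected a digit at index " + pos);
    }

    public static void main(String[] args) {
        System.out.println(countTerms("1+2"));      // 2
        System.out.println(countTerms("9+4+7+2"));  // 4
    }
}
```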


Right Recursion

Right recursion:

<expression> ::= <term> + <expression> | <term>
<term> ::= <digit>

Right recursion is not a problem for JavaCC.

Because there are non-recursive references to the left of the recursive reference, tokens are consumed by the scanner.
The parser continues to make forward progress.
The recursion ends as soon as the parser sees a token that doesn’t fit the production rule.


Right Recursion

However, there may be choice conflicts.

Does a <digit> start <term> + <expression> or simply <term>?

How much lookahead do we need?

void Expression() : {}
{
    LOOKAHEAD(2) Term() "+" Expression()
  | Term()
    { System.out.println("Parsed expression"); }
}

expression_right_recursion.jj
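The two-token decision can also be written out by hand: a digit followed by "+" selects the recursive alternative, and anything else falls through to a single term. This sketch assumes single-digit terms and no whitespace; the class RightRecursiveExpression is a hypothetical name, and the code is an illustration rather than what JavaCC generates.

```java
// Hand-written sketch of <expression> ::= <term> + <expression> | <term>
// with a two-token lookahead (not JavaCC output).
class RightRecursiveExpression {

    // Parses an expression starting at pos; returns the index just past it.
    static int expression(String input, int pos) {
        // Two-token lookahead: is the next digit followed by '+'?
        boolean digitThenPlus = pos + 1 < input.length()
                && Character.isDigit(input.charAt(pos))
                && input.charAt(pos + 1) == '+';
        if (digitThenPlus) {
            pos = term(input, pos);             // <term>
            pos++;                              // "+"
            return expression(input, pos);      // recursion after tokens were consumed
        }
        return term(input, pos);                // simply <term>
    }

    // <term> ::= <digit>; returns the index just past the digit.
    static int term(String input, int pos) {
        if (pos < input.length() && Character.isDigit(input.charAt(pos))) {
            return pos + 1;
        }
        throw new IllegalArgumentException("Expected a digit at index " + pos);
    }

    public static void main(String[] args) {
        System.out.println(expression("1+2+3", 0));  // 5: all five characters consumed
        System.out.println(expression("7", 0));      // 1
    }
}
```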


JJDoc

JJDoc produces documentation for your grammar.

Right-click in the .jj edit window to generate an HTML file from a .jj grammar file.

Or run from the command line:

java -classpath /Applications/Eclipse/plugins/sf.eclipse.javacc_1.5.30/jars/javacc.jar jjdoc foo.jj

Read Chapter 5 of the JavaCC book. Ideal for your project documentation!

Demo