CS 153: Concepts of Compiler Design
August 31 Class Meeting

Department of Computer Science
San Jose State University

Fall 2015
Instructor: Ron Mak
www.cs.sjsu.edu/~mak


Computer Science Dept. Fall 2015: August 31
CS 153: Concepts of Compiler Design © R. Mak

Conceptual Design (Version 3)

A compiler and an interpreter can both use the same front end and intermediate tier.

Three Java Packages

[Figure: UML package and class diagrams. Callouts: Package, Class, FROM, TO.]

Front End Class Relationships

[Figure: UML class diagram of the front end. Callouts: class, abstract class, field, “owns a”, transient relationship.]

UML visibility markers:
  + public
  - private
  # protected
  ~ package

These four framework classes should be source language-independent.

Messages from the Front End

The Parser generates messages.
  Syntax error messages
  Parser summary
    number of source lines parsed
    number of syntax errors
    total parsing time

The Source generates messages.
  For each source line:
    line number
    contents of the line

Front End Messages, cont’d

We want the message producers (Parser and Source) to be loosely coupled to the message listeners.

The producers shouldn’t care who listens to their messages.

The producers shouldn’t care what the listeners do with the messages.

The listeners should have the flexibility to do whatever they want with the messages.

Front End Messages, cont’d

Producers implement the MessageProducer interface.

Listeners implement the MessageListener interface.

A listener registers its interest in the messages from a producer.

Whenever a producer generates a message, it “sends” the message to all of its registered listeners.

Front End Messages, cont’d

A message producer can delegate message handling to a MessageHandler.

This is the Observer Design Pattern.
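
The producer/listener arrangement described above can be sketched roughly as follows. This is a minimal illustration and not the course's actual code. The names MessageProducer, MessageListener, MessageHandler, Message, addMessageListener(), sendMessage(), and messageReceived() appear in these slides, but the exact signatures, the two-value MessageType enum, the addListener() helper, and the ToySource class are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed simplification: only two message types.
enum MessageType { SOURCE_LINE, PARSER_SUMMARY }

// A message has a type and a body.
class Message {
    private final MessageType type;
    private final Object body;

    Message(MessageType type, Object body) {
        this.type = type;
        this.body = body;
    }

    MessageType getType() { return type; }
    Object getBody() { return body; }
}

interface MessageListener {
    void messageReceived(Message message);
}

interface MessageProducer {
    void addMessageListener(MessageListener listener);
    void sendMessage(Message message);
}

// Helper that keeps the listener list and does the actual notification.
class MessageHandler {
    private final List<MessageListener> listeners = new ArrayList<>();

    void addListener(MessageListener listener) { listeners.add(listener); }

    void sendMessage(Message message) {
        for (MessageListener listener : listeners) {
            listener.messageReceived(message);
        }
    }
}

// Hypothetical producer that delegates all message handling to a MessageHandler.
class ToySource implements MessageProducer {
    private final MessageHandler handler = new MessageHandler();

    public void addMessageListener(MessageListener listener) {
        handler.addListener(listener);
    }

    public void sendMessage(Message message) {
        handler.sendMessage(message);
    }

    // Pretend a source line was read: notify whoever is listening.
    void readLine(int lineNumber, String text) {
        sendMessage(new Message(MessageType.SOURCE_LINE,
                                new Object[] {lineNumber, text}));
    }
}

class ObserverSketch {
    // Registers one listener, produces two lines, returns what the listener saw.
    static List<String> demo() {
        List<String> seen = new ArrayList<>();
        ToySource source = new ToySource();
        source.addMessageListener(message -> {
            Object[] body = (Object[]) message.getBody();
            seen.add(body[0] + ": " + body[1]);
        });
        source.readLine(1, "PROGRAM hello;");
        source.readLine(2, "BEGIN END.");
        return seen;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Note that ToySource never learns what its listener does with a message; it only knows that the listener implements MessageListener.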

Message Implementation

Message producers implement the MessageProducer interface.

Message listeners implement the MessageListener interface.

A message producer can delegate message handling to a MessageHandler.

Each Message has a message type and a body.

[Figure: UML class diagram of the message classes. Callouts: “implements”; multiplicity “zero or more”.]

This appears to be a lot of extra work, but it will be easy to use and it will pay back large dividends.

Two Message Types

SOURCE_LINE message
  the source line number
  text of the source line

PARSER_SUMMARY message
  number of source lines read
  number of syntax errors
  total parsing time

By convention, the message producers and the message listeners agree on the format and content of the messages.
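
As a rough illustration of such a convention, a listener might unpack a PARSER_SUMMARY body like this. The body layout (lines read, syntax errors, elapsed seconds, in that order, as a Number array) matches the parse() method shown later in these slides; the class name, the method name, and the output wording are hypothetical.

```java
import java.util.Locale;

class SummaryFormatSketch {
    // Assumed body layout: {lines read, syntax errors, elapsed seconds}.
    static String formatParserSummary(Number[] body) {
        int lines = body[0].intValue();
        int errors = body[1].intValue();
        float seconds = body[2].floatValue();
        return String.format(Locale.ROOT,
                             "%d source lines, %d syntax errors, %.2f seconds",
                             lines, errors, seconds);
    }

    public static void main(String[] args) {
        System.out.println(formatParserSummary(new Number[] {20, 0, 0.5f}));
    }
}
```

If the producer ever changed the order of the body elements, a listener written this way would silently misreport them, which is why both sides have to agree on the format.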

Good Framework Symmetry

An Apt Quote?

“Before I came here, I was confused about this subject. Having listened to your lecture, I am still confused, but on a higher level.”
Enrico Fermi, physicist, 1901-1954

Pascal-Specific Front End Classes

PascalParserTD is a subclass of Parser and implements the parse() and getErrorCount() methods for Pascal.
  TD for “top down”

PascalScanner is a subclass of Scanner and implements the extractToken() method for Pascal.

Strategy Design Pattern

The Pascal Parser Class

The initial version of method parse() does hardly anything, but it forces the scanner into action and serves our purpose of doing end-to-end testing.

    public void parse() throws Exception
    {
        Token token;
        long startTime = System.currentTimeMillis();

        while (!((token = nextToken()) instanceof EofToken)) {}

        // Send the parser summary message.
        float elapsedTime = (System.currentTimeMillis() - startTime)/1000f;
        sendMessage(new Message(PARSER_SUMMARY,
                                new Number[] {token.getLineNumber(),
                                              getErrorCount(),
                                              elapsedTime}));
    }

What does this while loop do?

The Pascal Scanner Class

The initial version of method extractToken() doesn’t do much either, other than create and return either a default token or the EOF token.

    protected Token extractToken() throws Exception
    {
        Token token;
        char currentChar = currentChar();

        // Construct the next token. The current character determines the
        // token type.
        if (currentChar == EOF) {
            token = new EofToken(source);
        }
        else {
            token = new Token(source);
        }

        return token;
    }

Remember that the Scanner method nextToken() calls the abstract method extractToken().

Here, the Scanner subclass PascalScanner implements method extractToken().
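
The relationship between nextToken() and extractToken() can be sketched as follows. This is a simplified illustration under stated assumptions, not the framework's real Scanner: tokens are plain strings here, and the class names ScannerSketch and WordScannerSketch and the "<EOF>" marker are hypothetical.

```java
// Framework side: knows nothing about any particular source language.
abstract class ScannerSketch {
    private String currentToken;

    String currentToken() { return currentToken; }

    // Fixed skeleton: delegate the language-specific step to the subclass.
    String nextToken() {
        currentToken = extractToken();
        return currentToken;
    }

    // Each language-specific scanner supplies this step.
    protected abstract String extractToken();
}

// Language side: a toy scanner whose tokens are whitespace-separated words.
class WordScannerSketch extends ScannerSketch {
    private final String[] words;
    private int index = 0;

    WordScannerSketch(String line) {
        words = line.trim().split("\\s+");
    }

    @Override
    protected String extractToken() {
        return index < words.length ? words[index++] : "<EOF>";
    }

    public static void main(String[] args) {
        ScannerSketch scanner = new WordScannerSketch("BEGIN END");
        String token;
        while (!(token = scanner.nextToken()).equals("<EOF>")) {
            System.out.println(token);
        }
    }
}
```

Callers only ever see ScannerSketch and nextToken(), so swapping in a scanner for another language would not change them.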

The Token Class

The Token class’s default extract() method extracts just one character from the source.
  This method will be overridden by the various token subclasses.
  It serves our purpose of doing end-to-end testing.

    protected void extract() throws Exception
    {
        text = Character.toString(currentChar());
        value = null;

        nextChar();  // consume current character
    }

The Token Class, cont’d

A character (or a token) is “consumed” after it has been read and processed, and the next one is about to be read.

If you forget to consume, you will loop forever on the same character or token.
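
A small sketch of what consuming looks like in a loop. The method names currentChar() and nextChar() follow the slides; the class itself, the string-backed source, and the EOF marker value are assumptions for illustration.

```java
class ConsumeSketch {
    static final char EOF = (char) 0;   // hypothetical end-of-input marker

    private final String text;
    private int pos = 0;

    ConsumeSketch(String text) { this.text = text; }

    // Look at the current character without consuming it.
    char currentChar() {
        return pos < text.length() ? text.charAt(pos) : EOF;
    }

    // Consume the current character and return the one after it.
    char nextChar() {
        ++pos;
        return currentChar();
    }

    // Counts characters by examining and then consuming each one.
    static int countChars(String text) {
        ConsumeSketch source = new ConsumeSketch(text);
        int count = 0;
        while (source.currentChar() != EOF) {
            ++count;
            source.nextChar();   // without this call the loop never terminates
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countChars("BEGIN"));
    }
}
```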

A Front End Factory Class

A language-specific parser goes together with a scanner for the same language.

But we don’t want the framework classes to be tied to a specific language. Framework classes should be language-independent.

We use a factory class to create a matching parser-scanner pair.

Factory Method Design Pattern

A Front End Factory Class, cont’d

Good:

    Parser parser = FrontendFactory.createParser( … );

Arguments to the createParser() method enable it to create and return a parser bound to an appropriate scanner.

Variable parser doesn’t have to know what kind of parser subclass the factory created.
  “Coding to the interface.”

Once again, the idea is to maintain loose coupling.

A Front End Factory Class, cont’d

Good:

    Parser parser = FrontendFactory.createParser( … );

Bad:

    PascalParserTD parser = new PascalParserTD( … );

Why is this bad? Now variable parser is tied to a specific language.

A Front End Factory Class, cont’d

    public static Parser createParser(String language, String type,
                                      Source source)
        throws Exception
    {
        if (language.equalsIgnoreCase("Pascal") &&
            type.equalsIgnoreCase("top-down"))
        {
            Scanner scanner = new PascalScanner(source);
            return new PascalParserTD(scanner);
        }
        else if (!language.equalsIgnoreCase("Pascal")) {
            throw new Exception("Parser factory: Invalid language '" +
                                language + "'");
        }
        else {
            throw new Exception("Parser factory: Invalid type '" +
                                type + "'");
        }
    }

Initial Back End Subclasses

The CodeGenerator and Executor subclasses will only be (do-nothing) stubs for now.

Strategy Design Pattern

The Code Generator Class

All the process() method does for now is send the COMPILER_SUMMARY message.
  number of instructions generated (none for now)
  code generation time (nearly no time at all for now)

    public void process(ICode iCode, SymTab symTab) throws Exception
    {
        long startTime = System.currentTimeMillis();
        float elapsedTime = (System.currentTimeMillis() - startTime)/1000f;
        int instructionCount = 0;

        // Send the compiler summary message.
        sendMessage(new Message(COMPILER_SUMMARY,
                                new Number[] {instructionCount,
                                              elapsedTime}));
    }

The Executor Class

All the process() method does for now is send the INTERPRETER_SUMMARY message.
  number of statements executed (none for now)
  number of runtime errors (none for now)
  execution time (nearly no time at all for now)

    public void process(ICode iCode, SymTab symTab) throws Exception
    {
        long startTime = System.currentTimeMillis();
        float elapsedTime = (System.currentTimeMillis() - startTime)/1000f;
        int executionCount = 0;
        int runtimeErrors = 0;

        // Send the interpreter summary message.
        sendMessage(new Message(INTERPRETER_SUMMARY,
                                new Number[] {executionCount,
                                              runtimeErrors,
                                              elapsedTime}));
    }

A Back End Factory Class

    public static Backend createBackend(String operation)
        throws Exception
    {
        if (operation.equalsIgnoreCase("compile")) {
            return new CodeGenerator();
        }
        else if (operation.equalsIgnoreCase("execute")) {
            return new Executor();
        }
        else {
            throw new Exception("Backend factory: Invalid operation '" +
                                operation + "'");
        }
    }

End-to-End: Program Listings

Here’s the heart of the main Pascal class’s constructor:

    source = new Source(new BufferedReader(new FileReader(filePath)));
    source.addMessageListener(new SourceMessageListener());

    parser = FrontendFactory.createParser("Pascal", "top-down", source);
    parser.addMessageListener(new ParserMessageListener());

    backend = BackendFactory.createBackend(operation);
    backend.addMessageListener(new BackendMessageListener());

    parser.parse();
    iCode = parser.getICode();
    symTab = parser.getSymTab();

    backend.process(iCode, symTab);
    source.close();

The front end parser creates the intermediate code and the symbol table of the intermediate tier.

The back end processes the intermediate code and the symbol table.

Listening to Messages

Class Pascal has inner classes that implement the MessageListener interface.

    private static final String SOURCE_LINE_FORMAT = "%03d %s";

    private class SourceMessageListener implements MessageListener
    {
        public void messageReceived(Message message)
        {
            MessageType type = message.getType();
            Object body[] = (Object []) message.getBody();

            switch (type) {

                case SOURCE_LINE: {
                    int lineNumber = (Integer) body[0];
                    String lineText = (String) body[1];

                    System.out.println(String.format(SOURCE_LINE_FORMAT,
                                                     lineNumber, lineText));
                    break;
                }
            }
        }
    }

Demo

Is it Really Worth All this Trouble?

Major software engineering challenges:
  Managing change.
  Managing complexity.

To help manage change, use the open-closed principle.
  Close the code for modification.
  Open the code for extension.

Closed: The language-independent framework classes.

Open: The language-specific subclasses.

Is it Really Worth All this Trouble? cont’d

Techniques to help manage complexity:
  Partitioning
  Loose coupling
  Incremental development
    Always build upon working code.

Good object-oriented design with design patterns.

Source Files from the Book

Download the Java source code from each chapter of the book: http://www.apropos-logic.com/wci/

You will not survive this course if you use a simple text editor like Notepad to view and edit the Java code.

The complete Pascal interpreter in Chapter 12 contains 127 classes and interfaces.

Integrated Development Environment (IDE)

You can use either Eclipse or NetBeans.

Eclipse is preferred because there is a JavaCC plug-in.

Learn how to create projects, edit source files, single-step through execution, set breakpoints, examine variables, read stack dumps, etc.

Pascal-Specific Front End Classes

The Payoff

Now that we have …
  Source language-independent framework classes
  Pascal-specific subclasses
    Mostly just placeholders for now
  An end-to-end test (the program listing generator)

… we can work on the individual components
  Without worrying (too much) about breaking the rest of the code.

Front End Framework Classes

Pascal-Specific Subclasses

PascalTokenType

Each token type is an enumerated value.

    public enum PascalTokenType implements TokenType
    {
        // Reserved words.
        AND, ARRAY, BEGIN, CASE, CONST, DIV, DO, DOWNTO, ELSE, END,
        FILE, FOR, FUNCTION, GOTO, IF, IN, LABEL, MOD, NIL, NOT,
        OF, OR, PACKED, PROCEDURE, PROGRAM, RECORD, REPEAT, SET,
        THEN, TO, TYPE, UNTIL, VAR, WHILE, WITH,

        // Special symbols.
        PLUS("+"), MINUS("-"), STAR("*"), SLASH("/"), COLON_EQUALS(":="),
        DOT("."), COMMA(","), SEMICOLON(";"), COLON(":"), QUOTE("'"),
        EQUALS("="), NOT_EQUALS("<>"), LESS_THAN("<"), LESS_EQUALS("<="),
        GREATER_EQUALS(">="), GREATER_THAN(">"),
        LEFT_PAREN("("), RIGHT_PAREN(")"),
        LEFT_BRACKET("["), RIGHT_BRACKET("]"),
        LEFT_BRACE("{"), RIGHT_BRACE("}"),
        UP_ARROW("^"), DOT_DOT(".."),

        IDENTIFIER, INTEGER, REAL, STRING,
        ERROR, END_OF_FILE;
        ...
    }

PascalTokenType, cont’d

The static set RESERVED_WORDS contains all of Pascal’s reserved word strings in lower case: "and", "array", "begin", etc.

    // Set of lower-cased Pascal reserved word text strings.
    public static HashSet<String> RESERVED_WORDS = new HashSet<String>();
    static {
        PascalTokenType values[] = PascalTokenType.values();
        for (int i = AND.ordinal(); i <= WITH.ordinal(); ++i) {
            RESERVED_WORDS.add(values[i].getText().toLowerCase());
        }
    }

We can test whether a token is a reserved word:

    if (RESERVED_WORDS.contains(text.toLowerCase())) …
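
The same ordinal-range technique can be tried out on a much smaller enum. This is a standalone sketch under stated assumptions: MiniTokenType, its four reserved words, and the classify() method are hypothetical, and name() stands in for the getText() method whose definition is elided above.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

enum MiniTokenType {
    // Reserved words (a small subset, for illustration only).
    BEGIN, END, IF, THEN,
    // Everything else.
    IDENTIFIER;

    // Lower-cased text of every constant from BEGIN through THEN.
    static final Set<String> RESERVED_WORDS = new HashSet<>();
    static {
        MiniTokenType[] values = MiniTokenType.values();
        for (int i = BEGIN.ordinal(); i <= THEN.ordinal(); ++i) {
            RESERVED_WORDS.add(values[i].name().toLowerCase(Locale.ROOT));
        }
    }

    // Case-insensitive lookup: a reserved word maps to its own constant,
    // anything else is treated as an identifier.
    static MiniTokenType classify(String word) {
        return RESERVED_WORDS.contains(word.toLowerCase(Locale.ROOT))
                ? valueOf(word.toUpperCase(Locale.ROOT))
                : IDENTIFIER;
    }
}
```

The loop depends on the reserved words being declared contiguously and in the expected order, so inserting a non-reserved constant between the two end points would quietly add it to the set.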

PascalTokenType, cont’d

Static hash table SPECIAL_SYMBOLS contains all of Pascal’s special symbols.
  Each entry’s key is the string, such as "<", "=", "<=".
  Each entry’s value is the corresponding enumerated value.

    // Hash table of Pascal special symbols.
    // Each special symbol's text is the key to its Pascal token type.
    public static Hashtable<String, PascalTokenType> SPECIAL_SYMBOLS =
        new Hashtable<String, PascalTokenType>();
    static {
        PascalTokenType values[] = PascalTokenType.values();
        for (int i = PLUS.ordinal(); i <= DOT_DOT.ordinal(); ++i) {
            SPECIAL_SYMBOLS.put(values[i].getText(), values[i]);
        }
    }

PascalTokenType, cont’d

We can test whether a token is a special symbol:

    if (PascalTokenType.SPECIAL_SYMBOLS
            .containsKey(Character.toString(currentChar))) …
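
A single-character test like the one above identifies the start of a special symbol, but symbols such as ">=" or ":=" are two characters long. One possible way to handle that, sketched here as an assumption and not taken from the course code, is to look the first character up in the table and then check whether the two-character string is also a key. The class name, the symbolAt() method, and the string-valued table are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class SpecialSymbolSketch {
    // Small stand-in for SPECIAL_SYMBOLS: symbol text -> token type name.
    static final Map<String, String> SPECIAL_SYMBOLS = new HashMap<>();
    static {
        SPECIAL_SYMBOLS.put("<", "LESS_THAN");
        SPECIAL_SYMBOLS.put("<=", "LESS_EQUALS");
        SPECIAL_SYMBOLS.put("<>", "NOT_EQUALS");
        SPECIAL_SYMBOLS.put(">", "GREATER_THAN");
        SPECIAL_SYMBOLS.put(">=", "GREATER_EQUALS");
        SPECIAL_SYMBOLS.put(":", "COLON");
        SPECIAL_SYMBOLS.put(":=", "COLON_EQUALS");
    }

    // Returns the longest special symbol that starts at index start,
    // or null if the character there does not begin a special symbol.
    static String symbolAt(String line, int start) {
        if (start >= line.length()) {
            return null;
        }
        String one = line.substring(start, start + 1);
        if (!SPECIAL_SYMBOLS.containsKey(one)) {
            return null;
        }
        if (start + 2 <= line.length()) {
            String two = line.substring(start, start + 2);
            if (SPECIAL_SYMBOLS.containsKey(two)) {
                return two;
            }
        }
        return one;
    }

    public static void main(String[] args) {
        String symbol = symbolAt("index >= 10", 6);
        System.out.println(symbol + " is " + SPECIAL_SYMBOLS.get(symbol));
    }
}
```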

Pascal-Specific Token Classes

Each class PascalWordToken, PascalNumberToken, PascalStringToken, PascalSpecialSymbolToken, and PascalErrorToken is a subclass of class PascalToken.
  PascalToken is a subclass of class Token.

Each Pascal token subclass overrides the default extract() method of class Token.
  The default method could only create single-character tokens.

Loosely coupled. Highly cohesive.
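
To illustrate the override, here is a minimal sketch in which a word token replaces the single-character default. It is a simplified stand-in, not the book's classes: CharSource, BasicToken, and WordTokenSketch are hypothetical names, and the rule that a word is a run of letters and digits is an assumption.

```java
// Hypothetical character source over a single string.
class CharSource {
    private final String text;
    private int pos = 0;

    CharSource(String text) { this.text = text; }

    char currentChar() { return pos < text.length() ? text.charAt(pos) : '\0'; }
    void nextChar() { ++pos; }
}

// Stand-in for the framework Token class.
class BasicToken {
    protected final CharSource source;
    protected String text;

    BasicToken(CharSource source) {
        this.source = source;
        extract();   // runs the subclass override, if there is one
    }

    // Default behavior: the token is just the current character.
    protected void extract() {
        text = Character.toString(source.currentChar());
        source.nextChar();   // consume it
    }

    String getText() { return text; }
}

// Stand-in for a language-specific word token.
class WordTokenSketch extends BasicToken {
    WordTokenSketch(CharSource source) { super(source); }

    // Copy characters up to, but not including, the first one
    // that cannot be part of a word.
    @Override
    protected void extract() {
        StringBuilder buffer = new StringBuilder();
        char ch = source.currentChar();
        while (Character.isLetterOrDigit(ch)) {
            buffer.append(ch);
            source.nextChar();
            ch = source.currentChar();
        }
        text = buffer.toString();
    }
}
```

Because the constructor calls an overridable method, a subclass written in this style should not rely on its own fields inside extract(); they are not yet initialized when the base constructor runs.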

Syntax Diagrams

How to Scan for Tokens

Suppose the source line contains

IF (index >= 10) THEN

The scanner skips over the leading blanks. The current character is I, so the next token must be a word.

The scanner extracts a word token by copying characters up to but not including the first character that is not valid for a word, which in this case is a blank. The blank becomes the current character. The scanner determines that the word is a reserved word.

How to Scan for Tokens, cont’d

The scanner skips over any blanks between tokens. The current character is (. The next token must be a special symbol.

After extracting the special symbol token, the current character is i. The next token must be a word.

After extracting the word token, the current character is a blank.

How to Scan for Tokens, cont’d

Skip the blank. The current character is >.

Extract the special symbol token. The current character is a blank.

Skip the blank. The current character is 1, so the next token must be a number.

After extracting the number token, the current character is ).

How to Scan for Tokens, cont’d

Extract the special symbol token. The current character is a blank.

Skip the blank. The current character is T, so the next token must be a word.

Extract the word token. Determine that it’s a reserved word.

The current character is \n, so the scanner is done with this line.

Basic Scanning Algorithm

Skip any blanks until the current character is nonblank.

In Pascal, a comment and the end-of-line character each should be treated as a blank.

The current (nonblank) character determines what the next token is and becomes that token’s first character.

Extract the rest of the next token by copying successive characters up to but not including the first character that does not belong to that token.

Extracting a token consumes all the source characters that constitute the token. After extracting a token, the current character is the first character after the last character of that token.
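
The algorithm above can be sketched for a single source line as follows. This is an illustrative sketch only, with several assumptions that do not come from the slides: tokens are returned as strings of the form "KIND text", only a handful of reserved words and two-character symbols are listed, comments are assumed to be enclosed in braces and to end on the same line, numbers are plain digit runs, and any other character is treated as a one-character symbol. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class LineScannerSketch {
    static final Set<String> RESERVED =
        new HashSet<>(Arrays.asList("if", "then", "else", "begin", "end"));
    static final Set<String> TWO_CHAR_SYMBOLS =
        new HashSet<>(Arrays.asList(":=", "<>", "<=", ">=", ".."));

    static List<String> scan(String line) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (true) {
            // Skip blanks; a { ... } comment is treated as a blank.
            while (pos < line.length()) {
                char ch = line.charAt(pos);
                if (Character.isWhitespace(ch)) {
                    ++pos;
                } else if (ch == '{') {
                    int close = line.indexOf('}', pos);
                    pos = (close < 0) ? line.length() : close + 1;
                } else {
                    break;
                }
            }
            if (pos >= line.length()) {
                return tokens;   // done with this line
            }

            // The current (nonblank) character decides the token kind.
            char first = line.charAt(pos);
            int start = pos;
            if (Character.isLetter(first)) {
                // Consume characters that still belong to the word.
                while (pos < line.length()
                        && Character.isLetterOrDigit(line.charAt(pos))) {
                    ++pos;
                }
                String word = line.substring(start, pos);
                String kind = RESERVED.contains(word.toLowerCase(Locale.ROOT))
                        ? "RESERVED" : "IDENTIFIER";
                tokens.add(kind + " " + word);
            } else if (Character.isDigit(first)) {
                while (pos < line.length()
                        && Character.isDigit(line.charAt(pos))) {
                    ++pos;
                }
                tokens.add("NUMBER " + line.substring(start, pos));
            } else {
                boolean twoChars = pos + 2 <= line.length()
                        && TWO_CHAR_SYMBOLS.contains(line.substring(pos, pos + 2));
                pos += twoChars ? 2 : 1;
                tokens.add("SYMBOL " + line.substring(start, pos));
            }
            // pos now indexes the first character after the token.
        }
    }

    public static void main(String[] args) {
        scan("IF (index >= 10) THEN").forEach(System.out::println);
    }
}
```

Running main() on the example line from the earlier slides lists the seven tokens in the order the walkthrough describes: IF, (, index, >=, 10, ), THEN.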