
Writing a Lexical Analyzer in Haskell - University of … Lecture Regular Languages and Lexical Analysis 1 Writing a Lexical Analyzer in Haskell Today – (Finish up last Thursday)

May 11, 2018


DuongAnh

CS453 Lecture Regular Languages and Lexical Analysis 1

Writing a Lexical Analyzer in Haskell

Today
– (Finish up last Thursday) User-defined datatypes
– (Finish up last Thursday) Lexical analysis for punctuation and keywords in Haskell
– Regular languages and lexical analysis, part I

This week
– HW2: Due tonight
– PA1: Due in 6 days!
– PA2 has been posted. We are starting to cover concepts needed for PA2.


User-defined Datatypes in Haskell

Kind of like enumerated types, but constructors can have fields:

    data Bool = False | True

    data Shape = Point | Rect Int Int Int Int | Circle Int

Can derive handy properties, such as Show for printing:

    data Color = Blue | Red | Yellow deriving (Show)

    main = print Yellow

and Eq for comparing with ==:

    data Color = Blue | Red | Yellow deriving (Show, Eq)

    if (Yellow == Blue) then ... else ...

Constructors can be used in pattern matching (a constructor with fields must be parenthesized in a pattern):

    foo :: Shape -> String
    foo Point = "Point"
    foo (Rect p1 p2 p3 p4) = "Rect " ++ (show p1) ++ ...
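Put together, the fragments above could form a small module like the following sketch (load it into GHCi). `Bool` is left out because the Prelude already defines it, and the `Circle` clause and the exact strings that `foo` builds are illustrative choices of ours, not taken from the slide.

```haskell
-- A constructor may carry fields; here they are Int coordinates/sizes.
data Shape = Point | Rect Int Int Int Int | Circle Int
  deriving (Show, Eq)

-- Deriving Show lets us print values; deriving Eq lets us compare with (==).
data Color = Blue | Red | Yellow
  deriving (Show, Eq)

-- Pattern matching on constructors. A constructor applied to its fields
-- must be wrapped in parentheses when used as an argument pattern.
foo :: Shape -> String
foo Point              = "Point"
foo (Rect p1 p2 p3 p4) = "Rect " ++ unwords (map show [p1, p2, p3, p4])
foo (Circle r)         = "Circle " ++ show r
```

For example, `foo (Rect 1 2 3 4)` evaluates to `"Rect 1 2 3 4"`, `print Yellow` prints `Yellow`, and `Yellow == Blue` is `False`.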


Structure of a Typical Compiler

(Diagram: the phases of a compiler.)

Analysis: character stream → lexical analysis → tokens ("words") → syntactic analysis → AST ("sentences") → semantic analysis → annotated AST

Synthesis: IR code generation → IR → optimization → IR → code generation → target language

The diagram also includes an interpreter.


Tokens for Example MeggyJava program

    import meggy.Meggy;

    class PA3Flower {

        public static void main(String[] whatever){
            {
                // Upper left petal, clockwise
                Meggy.setPixel( (byte)2, (byte)4, Meggy.Color.VIOLET );
                Meggy.setPixel( (byte)2, (byte)1, Meggy.Color.VIOLET);
                …
            }
        }

Tokens: TokenImportKW, TokenMeggyKW, TokenSemi, TokenClassKW, TokenID "PA3Flower", TokenLBrace, …


Some Lexical Analysis with Haskell (why is this broken?)

    module Lexer where

    import Data.Char -- needed for isSpace function

    data Token
        = TokenIfKW
        | TokenComma
        -- TODO: constructors for all other tokens
        deriving (Show, Eq)

    lexer :: String -> [Token]
    lexer [] = []
    lexer ('i':'f':rest) = TokenIfKW : lexer rest
    -- TODO: patterns for other keyword and punctuation tokens
    lexer (c:rest) = if isSpace c then lexer rest else lexer (c:rest)
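One reason the lexer above is broken: its last clause calls `lexer (c:rest)` on exactly the same input whenever `c` is not white space, so any character that no earlier pattern handles (for example `,` or `x`) makes it loop forever. A subtler problem is that the `'i':'f'` pattern also fires on the start of an identifier such as `iffy`. The sketch below is one possible repair, not the course's reference solution: it adds a comma clause, treats `if` as a keyword only when the next character cannot continue a word, and raises an error instead of looping. The helper `isIdChar` (letters, digits, `_`) and the error message are our own assumptions; identifiers themselves are still a TODO, so `iffy` is reported as an error rather than mis-lexed.

```haskell
import Data.Char (isAlphaNum, isSpace)

data Token
  = TokenIfKW
  | TokenComma
  deriving (Show, Eq)

-- Assumed identifier characters: letters, digits and '_'.
isIdChar :: Char -> Bool
isIdChar c = isAlphaNum c || c == '_'

lexer :: String -> [Token]
lexer [] = []
lexer ('i':'f':rest)
  -- Keyword only if the next character cannot extend the word.
  -- If the guard fails, matching falls through to the clauses below.
  | not (continuesWord rest) = TokenIfKW : lexer rest
  where
    continuesWord (c:_) = isIdChar c
    continuesWord []    = False
lexer (',':rest) = TokenComma : lexer rest
lexer (c:rest)
  | isSpace c = lexer rest   -- skip white space and make progress
  | otherwise = error ("lexer: unexpected character " ++ show c)
```

In GHCi, `lexer "if , if"` gives `[TokenIfKW,TokenComma,TokenIfKW]`.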


General Approach for Lexical Analysis

Regular Languages

Finite State Machines
– DFAs: Deterministic Finite Automata
– Complications when doing lexical analysis
– NFAs: Nondeterministic Finite State Automata

From Regular Expressions to NFAs

From NFAs to DFAs


About The Slides on Languages and Finite Automata

Slides originally developed by Prof. Costas Busch (2004)
– Many thanks to Prof. Busch for developing the original slide set.

Adapted with permission by Prof. Dan Massey (Spring 2007)
– Subsequent modifications; many thanks to Prof. Massey for the CS 301 slides.

Adapted with permission by Prof. Michelle Strout (Spring 2011)
– Adapted for use in CS 453

Adapted by Wim Bohm (added regular expressions → NFA → DFA, Spring 2012)

Added slides from Profs. Christian Collberg and Saumya Debray (Fall 2016)


Languages

A language is a set of strings (sometimes called sentences).

String: a finite sequence of letters

Examples: "cat", "dog", "house", …

Defined over a fixed alphabet:

$\Sigma = \{a, b, c, \ldots, z\}$


Empty String

A string with no letters: $\varepsilon$

Observations:

$|\varepsilon| = 0$

$\varepsilon w = w \varepsilon = w$

$\varepsilon\, abba = abba\, \varepsilon = abba$
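In Haskell a `String` is a list of characters, so $\varepsilon$ is simply `""`, $|w|$ is `length w`, and concatenation is `(++)`. The observations above can then be checked directly; `epsilon`, `emptyLength` and `identityHolds` are names introduced here for illustration.

```haskell
-- The empty string: a string with no letters.
epsilon :: String
epsilon = ""

-- |ε| = 0
emptyLength :: Int
emptyLength = length epsilon

-- εw = wε = w: ε is the identity element for concatenation.
identityHolds :: String -> Bool
identityHolds w = epsilon ++ w == w && w ++ epsilon == w
```

For instance, `identityHolds "abba"` is `True`.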


Regular Expressions

Regular expressions describe regular languages. You have probably seen them in operating systems and editors.

Example: the regular expression

$(a \mid (b)(c))^*$

describes the language

$L((a \mid (b)(c))^*) = \{\varepsilon, a, bc, aa, abc, bca, \ldots\}$


Recursive Definition for Specifying Regular Expressions

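Primitive regular expressions: $\emptyset$, $\varepsilon$, $\alpha$, where $\alpha \in \Sigma$ for some alphabet $\Sigma$.

Given regular expressions $r_1$ and $r_2$,

$r_1 \mid r_2, \quad r_1 r_2, \quad r_1^*, \quad (r_1)$

are regular expressions.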
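The recursive definition maps directly onto a Haskell datatype with one constructor per case. The sketch below adds a deliberately naive matcher that follows the meaning of $L(r)$ by trying every way of splitting the input; it can take exponential time and only serves to make the definition concrete (scanners instead compile regular expressions to automata, the topic of the following slides). The constructor and function names are our own.

```haskell
-- One constructor per case of the recursive definition.
-- Grouping (r1) needs no constructor: the tree structure already groups.
data Regex
  = EmptySet            -- ∅ : matches nothing
  | Eps                 -- ε : matches only the empty string
  | Sym Char            -- α ∈ Σ : matches the one-letter string α
  | Alt Regex Regex     -- r1 | r2
  | Cat Regex Regex     -- r1 r2
  | Star Regex          -- r1*
  deriving (Show, Eq)

-- All ways to split a string into a prefix and a suffix.
splits :: String -> [(String, String)]
splits s = [splitAt i s | i <- [0 .. length s]]

-- matches r w is True exactly when w is in L(r).
matches :: Regex -> String -> Bool
matches EmptySet _    = False
matches Eps w         = null w
matches (Sym a) w     = w == [a]
matches (Alt r1 r2) w = matches r1 w || matches r2 w
matches (Cat r1 r2) w = or [matches r1 u && matches r2 v | (u, v) <- splits w]
matches (Star r) w
  | null w    = True
  -- Take a non-empty first piece from L(r), then match the rest against r* again.
  | otherwise = or [matches r u && matches (Star r) v | (u, v) <- tail (splits w)]

-- The example from the previous slide: (a | (b)(c))*
example :: Regex
example = Star (Alt (Sym 'a') (Cat (Sym 'b') (Sym 'c')))
```

Every string listed in the example language (`""`, `"a"`, `"bc"`, `"aa"`, `"abc"`, `"bca"`) satisfies `matches example`, while strings such as `"b"` or `"ab"` do not.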


Regular operators

– choice: A | B, a string from L(A) or from L(B)
– concatenation: A B, a string from L(A) followed by a string from L(B)
– repetition: A*, 0 or more concatenations of strings from L(A); A+, 1 or more
– grouping: ( A )

Concatenation has precedence over choice: A|B C means A|(B C), which differs from (A|B)C.

More syntactic sugar, used in scanner generators:
– [abc] means a or b or c
– [\t\n ] means tab, newline, or space
– [a-z] means a, b, c, …, or z


Example Regular Expressions and Regular Definitions

Regular definition: name : regular expression. The name can then be used in other regular expressions.

Keywords: "print", "while"

Operations: "+", "-", "*"

Identifiers:

    let : [a-zA-Z]    // choose from a to z or A to Z
    dig : [0-9]
    id  : let (let | dig)*

Numbers: dig+ = dig dig*
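For simple regular definitions like these, a hand-written recognizer can follow the definitions almost word for word. This sketch uses the ASCII predicates from `Data.Char` for the `[a-zA-Z]` and `[0-9]` classes; the function names `isLet`, `isDig`, `isId` and `isNum` are our own.

```haskell
import Data.Char (isAsciiLower, isAsciiUpper, isDigit)

-- let : [a-zA-Z]
isLet :: Char -> Bool
isLet c = isAsciiLower c || isAsciiUpper c

-- dig : [0-9]   (Data.Char.isDigit accepts only the ASCII digits '0'..'9')
isDig :: Char -> Bool
isDig = isDigit

-- id : let (let | dig)*
isId :: String -> Bool
isId (c:cs) = isLet c && all (\x -> isLet x || isDig x) cs
isId []     = False

-- dig+ = dig dig*   (one or more digits)
isNum :: String -> Bool
isNum s = not (null s) && all isDig s
```

For example, `isId "PA3Flower"` is `True`, `isId "3x"` is `False`, and `isNum ""` is `False` because `dig+` requires at least one digit.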


Finite Automaton, or Finite State Machine (FSM)

(Diagram: Input String → Finite Automaton → Output String)


Finite State Machine

(Diagram: Input String → Finite Automaton → Output: "Accept" or "Reject")


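State Transition Graph

(Diagram: the abba finite accepter, with states $q_0, \ldots, q_5$. $q_0$ is the initial state and $q_4$ is the final ("accept") state; each arrow is a transition. The path $q_0 \xrightarrow{a} q_1 \xrightarrow{b} q_2 \xrightarrow{b} q_3 \xrightarrow{a} q_4$ spells abba; every other transition leads to $q_5$, which loops on $a, b$.)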


Initial Configuration

(Diagram: the abba finite accepter with the input string abba; the machine is in the initial state $q_0$.)

Reading the Input

(Diagram sequence: the machine reads abba one letter at a time, moving $q_0 \to q_1 \to q_2 \to q_3 \to q_4$.)

Input finished. Output: "accept" (the machine stops in the final state $q_4$).

String Rejection

(Diagram sequence: the abba finite accepter reads the input aba, moving $q_0 \to q_1 \to q_2 \to q_5$.)

Input finished. Output: "reject" (the machine stops in $q_5$, which is not a final state).
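The step-by-step configurations can be reproduced by encoding the abba accepter's transitions as a Haskell function and collecting every state the machine passes through with `scanl`. The `State` type and the names `step`, `run` and `accepts` are illustrative choices; the transitions follow the diagram, with `Q5` as the trap state, and the input is assumed to use only the letters a and b.

```haskell
-- States of the abba finite accepter.
data State = Q0 | Q1 | Q2 | Q3 | Q4 | Q5
  deriving (Show, Eq)

-- Transitions: q0 -a-> q1 -b-> q2 -b-> q3 -a-> q4;
-- every other move goes to the trap state q5.
step :: State -> Char -> State
step Q0 'a' = Q1
step Q1 'b' = Q2
step Q2 'b' = Q3
step Q3 'a' = Q4
step _  _   = Q5

-- Every configuration, starting in the initial state q0.
run :: String -> [State]
run = scanl step Q0

-- Accept exactly when the last state reached is the final state q4.
accepts :: String -> Bool
accepts w = last (run w) == Q4
```

In GHCi, `run "abba"` gives `[Q0,Q1,Q2,Q3,Q4]` and `run "aba"` gives `[Q0,Q1,Q2,Q5]`, matching the two traces.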

The Empty String

(Diagram: the abba finite accepter with the empty input string $\varepsilon$; the machine stays in the initial state $q_0$.)

Output: "reject" ($q_0$ is not a final state).

Would it be possible to accept the empty string?

Another Example

(Diagram: a DFA with states $q_0, q_1, q_2$. $q_0$ loops on $a$ and moves to $q_1$ on $b$; $q_1$ is the final state and moves to $q_2$ on $a, b$; $q_2$ loops on $a, b$.)

(Diagram sequence: the machine reads the input aab, moving $q_0 \to q_0 \to q_0 \to q_1$.)

Input finished. Output: "accept"

Rejection

(Diagram sequence: the same DFA reads the input abb, moving $q_0 \to q_0 \to q_1 \to q_2$.)

Input finished. Output: "reject"

Which strings are accepted?
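Assuming the second machine is the three-state DFA described above ($q_0$ loops on a and moves to the final state $q_1$ on b; $q_1$ moves to the trap state $q_2$ on a or b), it appears to accept exactly the strings made of zero or more a's followed by a single b. A small sketch checks that guess by comparing the automaton with that description on every string over {a, b} of length at most 5; all names are our own.

```haskell
data State = Q0 | Q1 | Q2
  deriving (Show, Eq)

step :: State -> Char -> State
step Q0 'a' = Q0   -- stay in q0 while reading a's
step Q0 'b' = Q1   -- the first b leads to the final state
step _  _   = Q2   -- anything after that b is a dead end

accepts :: String -> Bool
accepts w = foldl step Q0 w == Q1

-- Direct description of the guessed language: a^n b for some n >= 0.
isAnB :: String -> Bool
isAnB w = not (null w) && last w == 'b' && all (== 'a') (init w)

-- All strings over {a, b} of length at most n (uses the list monad).
stringsUpTo :: Int -> [String]
stringsUpTo n = concat [sequence (replicate k "ab") | k <- [0 .. n]]
```

Evaluating `all (\w -> accepts w == isAnB w) (stringsUpTo 5)` gives `True`; such a finite check supports the answer but is not a proof.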

Formalities

Deterministic Finite Automaton (DFA)

$M = (Q, \Sigma, \delta, q_0, F)$

– $Q$: set of states
– $\Sigma$: input alphabet
– $\delta$: transition function
– $q_0$: initial state
– $F$: set of final (accepting) states
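The 5-tuple translates naturally into a Haskell record with one field per component. This is a sketch under the assumption that states are represented by `Int` and letters by `Char`; the names `DFA`, `accepts` and `abbaDFA` are ours, and the abba instance uses the transitions from the earlier diagram with state 5 as the trap state.

```haskell
-- M = (Q, Σ, δ, q0, F)
data DFA = DFA
  { states   :: [Int]              -- Q (kept for completeness; accepts does not use it)
  , alphabet :: [Char]             -- Σ
  , delta    :: Int -> Char -> Int -- δ : Q × Σ → Q
  , start    :: Int                -- q0
  , finals   :: [Int]              -- F
  }

-- Accept w when it uses only letters of Σ and applying δ letter by letter,
-- starting from q0, ends in a state of F.
accepts :: DFA -> String -> Bool
accepts m w = all (`elem` alphabet m) w && foldl (delta m) (start m) w `elem` finals m

abbaDFA :: DFA
abbaDFA = DFA
  { states   = [0 .. 5]
  , alphabet = "ab"
  , delta    = d
  , start    = 0
  , finals   = [4]
  }
  where
    d 0 'a' = 1
    d 1 'b' = 2
    d 2 'b' = 3
    d 3 'a' = 4
    d _ _   = 5
```

With this definition `accepts abbaDFA "abba"` is `True` and `accepts abbaDFA "aba"` is `False`.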

Input Alphabet $\Sigma$

(Diagram: the abba finite accepter.)

$\Sigma = \{a, b\}$

Set of States $Q$

(Diagram: the abba finite accepter.)

$Q = \{q_0, q_1, q_2, q_3, q_4, q_5\}$

Initial State $q_0$

(Diagram: the abba finite accepter, with the initial state $q_0$ highlighted.)

Set of Final States $F$

(Diagram: the abba finite accepter, with the final state $q_4$ highlighted.)

$F = \{q_4\}$

Transition Function $\delta$

(Diagram: the abba finite accepter.)

$\delta : Q \times \Sigma \to Q$

$\delta(q_0, a) = q_1$

(Diagram: the abba finite accepter, highlighting the transition from $q_0$ to $q_1$ on $a$.)

$\delta(q_0, b) = q_5$

(Diagram: the abba finite accepter, highlighting the transition from $q_0$ to $q_5$ on $b$.)

$\delta(q_2, b) = q_3$

(Diagram: the abba finite accepter, highlighting the transition from $q_2$ to $q_3$ on $b$.)

Transition Function / Table $\delta$

(Diagram: the abba finite accepter.)

    δ   | a    b
    ----+---------
    q0  | q1   q5
    q1  | q5   q2
    q2  | q5   q3
    q3  | q4   q5
    q4  | q5   q5
    q5  | q5   q5
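Scanner generators commonly store δ as a table rather than as code. A minimal sketch of that idea, assuming the `Data.Map` module from the `containers` package (distributed with GHC) and copying the rows of the table above; the names `table`, `runTable` and `acceptsTable` are ours, and a missing entry (a letter outside Σ) is treated as rejection.

```haskell
import qualified Data.Map as Map

-- The abba transition table: (state, letter) -> next state.
table :: Map.Map (Int, Char) Int
table = Map.fromList
  [ ((0, 'a'), 1), ((0, 'b'), 5)
  , ((1, 'a'), 5), ((1, 'b'), 2)
  , ((2, 'a'), 5), ((2, 'b'), 3)
  , ((3, 'a'), 4), ((3, 'b'), 5)
  , ((4, 'a'), 5), ((4, 'b'), 5)
  , ((5, 'a'), 5), ((5, 'b'), 5)
  ]

finalStates :: [Int]
finalStates = [4]

-- Drive the table from state 0; Nothing means no entry for some letter.
runTable :: String -> Maybe Int
runTable = go 0
  where
    go q []     = Just q
    go q (c:cs) = Map.lookup (q, c) table >>= \q' -> go q' cs

acceptsTable :: String -> Bool
acceptsTable w = maybe False (`elem` finalStates) (runTable w)
```

Here `runTable "abba"` is `Just 4`, `runTable "aa"` is `Just 5` (the trap state), and `runTable "abx"` is `Nothing`.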

Complications

1. "1234" is a NUMBER, but what about the "123" in "1234", or the "23", etc.? Also, the scanner must recognize many tokens, not one, stopping only at end of file.

2. "if" is a keyword or reserved word IF, but "if" also matches the regular expression for identifiers, ID. We want to recognize IF.

3. We want to discard white space and comments.

4. "123" is a NUMBER, but so are "235" and "0", just as "a" is an ID and so is "bcd". We want to recognize a token, but add attributes to it.
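The four complications are usually handled inside the lexer itself. A minimal hand-written sketch, assuming a tiny token set of our own choosing (`TokenIfKW`, `TokenWhileKW`, `TokenID`, `TokenNum`, `TokenPlus`, `TokenSemi`) and `//` line comments, which are not the token set required by the course assignments: (1) digits and identifier characters are consumed with `span`, so the longest possible match wins and lexing continues until the input runs out; (2) each word is looked up in a keyword table before it becomes an identifier; (3) white space and comments are dropped; (4) identifiers and numbers carry their text or value as a constructor field.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit, isSpace)

data Token
  = TokenIfKW
  | TokenWhileKW
  | TokenID String   -- attribute: the identifier's name
  | TokenNum Int     -- attribute: the number's value
  | TokenPlus
  | TokenSemi
  deriving (Show, Eq)

-- Words are lexed like identifiers, then looked up here first.
keywords :: [(String, Token)]
keywords = [("if", TokenIfKW), ("while", TokenWhileKW)]

lexer :: String -> [Token]
lexer [] = []                                           -- stop at end of input
lexer ('/':'/':rest) = lexer (dropWhile (/= '\n') rest) -- discard a // comment
lexer (c:rest)
  | isSpace c = lexer rest                              -- discard white space
  | isDigit c =
      -- Longest match: "1234" is one NUMBER, not "1" followed by "234".
      let (digits, rest') = span isDigit (c:rest)
      in TokenNum (read digits) : lexer rest'
  | isAlpha c =
      -- Longest match, then prefer a keyword over an identifier.
      let (word, rest') = span isAlphaNum (c:rest)
      in maybe (TokenID word) id (lookup word keywords) : lexer rest'
  | c == '+' = TokenPlus : lexer rest
  | c == ';' = TokenSemi : lexer rest
  | otherwise = error ("lexer: unexpected character " ++ show c)
```

For example, `lexer "if x1 + 1234; // done\nwhile"` gives `[TokenIfKW,TokenID "x1",TokenPlus,TokenNum 1234,TokenSemi,TokenWhileKW]`, and `lexer "iffy"` gives `[TokenID "iffy"]` rather than a keyword.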


Before Next Time

HW2: Due tonight!

PA1: It is due in 6 days. Should be almost done.

Read Chapters 2 and 3 in the online book.
