Combinator Parsing By Swanand Pagnis

Combinator parsing

Mar 17, 2018


Swanand Pagnis
Transcript
Page 1: Combinator parsing

Combinator Parsing
By Swanand Pagnis

Page 2: Combinator parsing

Higher-Order Functions for Parsing
By Graham Hutton

Page 3: Combinator parsing

• Abstract & Introduction

• Build a parser, one fn at a time

• Moving beyond toy parsers

Page 4: Combinator parsing

Abstract

Page 5: Combinator parsing

In combinator parsing, the text of parsers resembles BNF notation. We present the basic method, and a number of extensions. We address the special problems presented by whitespace, and parsers with separate lexical and syntactic phases. In particular, a combining form for handling the “offside rule” is given. Other extensions to the basic method include an “into” combining form with many useful applications, and a simple means by which combinator parsers can produce more informative error messages.

Page 6: Combinator parsing

• Combinators that resemble BNF notation

• Whitespace handling through "Offside Rule"

• "Into" combining form for advanced parsing

• Strategy for better error messages

Page 7: Combinator parsing

Introduction

Page 8: Combinator parsing

Primitive Parsers

• Take input

• Process one character

• Return results and unused input

Page 9: Combinator parsing

Combinators

• Combine primitives

• Define building blocks

• Return results and unused input

Page 10: Combinator parsing

Lexical analysis and syntax

• Combine the combinators

• Define lexical elements

• Return results and unused input

Page 11: Combinator parsing

input: "from:swiggy to:me"
output: [("f", "rom:swiggy to:me")]

Page 12: Combinator parsing

input: "42 => ans"
output: [("4", "2 => ans")]

Page 13: Combinator parsing

rule: 'a' followed by 'b'
input: "abcdef"
output: [(('a','b'),"cdef")]

Page 14: Combinator parsing

rule: 'a' followed by 'b'
input: "abcdef"
output: [(('a','b'),"cdef")]

Combinator

Page 15: Combinator parsing

Language choice

Page 16: Combinator parsing

Suggested: Lazy Functional Languages

Page 17: Combinator parsing

Miranda: Author's choice

Page 18: Combinator parsing

Haskell: An obvious choice. 🤓

Page 19: Combinator parsing

Racket: Another obvious choice. 🤓

Page 20: Combinator parsing

Ruby: 🍎 to 🍊 so $ for learning

Page 21: Combinator parsing

OCaml: Functional, but not lazy.

Page 22: Combinator parsing

Haskell

Page 23: Combinator parsing

Simple when you stick to FP fundamentals

• Higher order functions

• Immutability

• Recursive problem solving

• Algebraic types

Page 24: Combinator parsing

Let's build a parser, one fn at a time

Page 25: Combinator parsing

type Parser a b = [a] -> [(b, [a])]

Page 26: Combinator parsing

Types help with abstraction

• We'll be dealing with parsers and combinators

• Parsers are functions, they accept input and return results

• Combinators accept parsers and return parsers

Page 27: Combinator parsing

A parser is a function that accepts an input and returns parsed results and the unused input for each result

Page 28: Combinator parsing

Parser is a function type that accepts a list of type a and returns all possible results as a list of tuples of type (b, [a])

Page 29: Combinator parsing

(Parser Char Number)
input: "42 it is!" -- a is Char, so the input is a [Char]
output: [(42, " it is!")] -- b is a Number
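To make the type concrete, here is a minimal sketch of a hand-written parser with this shape. The `digit` parser is a hypothetical example that is not part of the talk; it only illustrates how a result is paired with the unused input, and how an empty list signals failure.

```haskell
import Data.Char (digitToInt, isDigit)

-- A parser consumes a list of `a` and returns every possible
-- (result, unused input) pair. An empty list means failure.
type Parser a b = [a] -> [(b, [a])]

-- Hypothetical example parser: reads one decimal digit as an Int.
digit :: Parser Char Int
digit (c : rest) | isDigit c = [(digitToInt c, rest)]
digit _ = []
```

Loaded in GHCi, `digit "42 it is!"` yields one result paired with the rest of the string, while `digit "x"` yields no results at all.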

Page 30: Combinator parsing

type Parser a b = [a] -> [(b, [a])]

Page 31: Combinator parsing

Primitive Parsers

Page 32: Combinator parsing

succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

Page 33: Combinator parsing

Always succeeds
Returns "v" for all inputs

Page 34: Combinator parsing

failure :: Parser a b
failure inp = []

Page 35: Combinator parsing

Always fails
Returns "[]" for all inputs

Page 36: Combinator parsing

satisfy :: (a -> Bool) -> Parser a a
satisfy p [] = failure []
satisfy p (x:xs)
  | p x = succeed x xs -- if p(x) is true
  | otherwise = failure []

Page 37: Combinator parsing

satisfy :: (a -> Bool) -> Parser a a
satisfy p [] = failure []
satisfy p (x:xs)
  | p x = succeed x xs -- if p(x) is true
  | otherwise = failure []

Guard Clauses, if you want to Google

Page 38: Combinator parsing

literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

Page 39: Combinator parsing

match_3 = (literal '3')
match_3 "345" -- => [('3',"45")]
match_3 "456" -- => []

Page 40: Combinator parsing

succeed failure satisfy literal
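The four primitives can be collected into one file and tried out in GHCi. This is a sketch that restates the slide definitions; the only change is that `failure` ignores its argument with `_`, and `match_3` is given an explicit type.

```haskell
type Parser a b = [a] -> [(b, [a])]

-- Always succeeds with the value v, consuming nothing.
succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

-- Always fails: no results for any input.
failure :: Parser a b
failure _ = []

-- Consumes one item if it passes the predicate p.
satisfy :: (a -> Bool) -> Parser a a
satisfy _ [] = failure []
satisfy p (x : xs)
  | p x = succeed x xs
  | otherwise = failure xs

-- Consumes one item if it equals x.
literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

match_3 :: Parser Char Char
match_3 = literal '3'
```

Note that every primitive returns a list, so callers never need a separate error channel: "no parse" is just the empty list.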

Page 41: Combinator parsing

Combinators

Page 42: Combinator parsing

match_3_or_4 = match_3 `alt` match_4
match_3_or_4 "345" -- => [('3',"45")]
match_3_or_4 "456" -- => [('4',"56")]

Page 43: Combinator parsing

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

Page 44: Combinator parsing

(p1 `alt` p2) inp = p1 inp ++ p2 inp

List concatenation

Page 45: Combinator parsing

(match_3 `and_then` match_4) "345" -- => [(('3','4'),"5")]

Page 46: Combinator parsing

🐉

Page 47: Combinator parsing

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

Page 48: Combinator parsing

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [ ((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1 ]

List comprehensions

Page 49: Combinator parsing

p1 inp: (v11, out11) (v12, out12) (v13, out13)

p2 out11: (v21, out21) (v22, out22)

p2 out12: (v31, out31) (v32, out32)

p2 out13: (v31, out31)

Page 50: Combinator parsing

((v11, v21), out21) ((v11, v22), out22)

Page 51: Combinator parsing

(match_3 `and_then` match_4) "345" -- => [(('3','4'),"5")]
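The pairing of results is easier to see with a parser that returns more than one result. The sketch below restates `alt` and `and_then` and adds `oneOrTwo`, a hypothetical ambiguous parser invented for this illustration: it offers both "take one item" and "take two items", and `and_then` then runs the second parser on each leftover input.

```haskell
type Parser a b = [a] -> [(b, [a])]

literal :: Eq a => a -> Parser a a
literal x (y : ys) | x == y = [(y, ys)]
literal _ _ = []

-- Results of both parsers, p1's first.
alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

-- Run p2 on every leftover input produced by p1 and pair the values.
and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1]

-- Hypothetical ambiguous parser: takes either one or two leading items.
oneOrTwo :: Parser a [a]
oneOrTwo (x : y : rest) = [([x], y : rest), ([x, y], rest)]
oneOrTwo [x] = [([x], [])]
oneOrTwo [] = []
```

With input `"abc"`, `oneOrTwo` offers `"a"` and `"ab"`; only the second leaves a `'c'` at the front, so `oneOrTwo `and_then` literal 'c'` keeps just that branch. Failed branches disappear silently because the list comprehension produces nothing for them.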

Page 52: Combinator parsing

Manipulating values

Page 53: Combinator parsing

match_3 = (literal '3')
match_3 "345" -- => [('3',"45")]
match_3 "456" -- => []

Page 54: Combinator parsing

(number "42") "42 => answer" -- => [(42, " answer")]

Page 55: Combinator parsing

(keyword "for") "for i in 1..42" -- => [(:for, " i in 1..42")]

Page 56: Combinator parsing

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp ]

Page 57: Combinator parsing

((string "3") `using` float) "3" -- => [(3.0, "")]
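The slide does not show `float`, so the sketch below assumes it is a plain string-to-number conversion and stands in `read` for it. The `digits` parser is also a hypothetical helper, used here only so that the example does not depend on `string`, which is introduced later.

```haskell
import Data.Char (isDigit)

type Parser a b = [a] -> [(b, [a])]

-- Apply f to every parsed value, leaving the unused input untouched.
using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp]

-- Assumed stand-in for the `float` helper used on the slide.
float :: String -> Double
float = read

-- Hypothetical helper: takes the longest run of leading digits.
digits :: Parser Char String
digits inp = case span isDigit inp of
  ([], _) -> []
  (ds, rest) -> [(ds, rest)]
```

Because `using` only touches the value half of each pair, any ordinary function can be plugged in: `digits `using` float` produces a `Double`, while `digits `using` length` would produce a digit count.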

Page 58: Combinator parsing

Levelling up

Page 59: Combinator parsing

many :: Parser a b -> Parser a [b]
many p = ((p `and_then` many p) `using` cons) `alt` (succeed [])

Page 60: Combinator parsing

0 or many

Page 61: Combinator parsing

(many (literal 'a')) "aab" -- => [("aa","b"),("a","ab"),("","aab")]

Page 62: Combinator parsing

(many (literal 'a')) "xyz" -- => [("","xyz")]

Page 63: Combinator parsing

some :: Parser a b -> Parser a [b]
some p = ((p `and_then` many p) `using` cons)

Page 64: Combinator parsing

1 or many

Page 65: Combinator parsing

(some (literal 'a')) "aab" -- => [("aa","b"),("a","ab")]

Page 66: Combinator parsing

(some (literal 'a')) "xyz" -- => []

Page 67: Combinator parsing

positive_integer = some (satisfy Data.Char.isDigit)

negative_integer = ((literal '-') `and_then` positive_integer) `using` cons

positive_decimal = (positive_integer `and_then` (((literal '.') `and_then` positive_integer) `using` cons)) `using` join

negative_decimal = ((literal '-') `and_then` positive_decimal) `using` cons

Page 68: Combinator parsing

number :: Parser Char [Char]
number = negative_decimal `alt` positive_decimal `alt` negative_integer `alt` positive_integer
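The slides use `cons` and `join` without defining them. A reasonable guess, and it is only an assumption, is that they are the uncurried forms of `(:)` and `(++)`, since `and_then` produces pairs. Under that assumption, `many`, `some` and `number` fit together as in this sketch.

```haskell
import Data.Char (isDigit)

type Parser a b = [a] -> [(b, [a])]

succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

satisfy :: (a -> Bool) -> Parser a a
satisfy p (x : xs) | p x = [(x, xs)]
satisfy _ _ = []

literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp]

-- Assumed helpers: uncurried (:) and (++), to consume and_then's pairs.
cons :: (a, [a]) -> [a]
cons = uncurry (:)

join :: ([a], [a]) -> [a]
join = uncurry (++)

many, some :: Parser a b -> Parser a [b]
many p = ((p `and_then` many p) `using` cons) `alt` succeed []
some p = (p `and_then` many p) `using` cons

positive_integer, negative_integer :: Parser Char String
positive_integer = some (satisfy isDigit)
negative_integer = (literal '-' `and_then` positive_integer) `using` cons

positive_decimal, negative_decimal :: Parser Char String
positive_decimal =
  (positive_integer `and_then` ((literal '.' `and_then` positive_integer) `using` cons))
    `using` join
negative_decimal = (literal '-' `and_then` positive_decimal) `using` cons

number :: Parser Char String
number = negative_decimal `alt` positive_decimal `alt` negative_integer `alt` positive_integer
```

Results come back longest first, so `head (number "-3.5")` is the full match. The shorter alternatives such as `("-3", ".5")` stay in the list for later combinators to try if the longest one leads nowhere.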

Page 69: Combinator parsing

word :: Parser Char [Char]
word = some (satisfy isLetter)

Page 70: Combinator parsing

string :: (Eq a) => [a] -> Parser a [a]
string [] = succeed []
string (x:xs) = (literal x `and_then` string xs) `using` cons

Page 71: Combinator parsing

(string "begin") "begin end" -- => [("begin"," end")]

Page 72: Combinator parsing

xthen :: Parser a b -> Parser a c -> Parser a c
p1 `xthen` p2 = (p1 `and_then` p2) `using` snd

Page 73: Combinator parsing

thenx :: Parser a b -> Parser a c -> Parser a b
p1 `thenx` p2 = (p1 `and_then` p2) `using` fst

Page 74: Combinator parsing

ret :: Parser a b -> c -> Parser a c
p `ret` v = p `using` (const v)
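A short sketch of how `string`, `xthen`, `thenx` and `ret` behave together. The `Keyword` type, `keywordFor` and `bracketed` are hypothetical names made up for this example; `Keyword` stands in for the `:for` symbol shown on the earlier slide.

```haskell
type Parser a b = [a] -> [(b, [a])]

succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

literal :: Eq a => a -> Parser a a
literal x (y : ys) | x == y = [(y, ys)]
literal _ _ = []

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp]

string :: Eq a => [a] -> Parser a [a]
string [] = succeed []
string (x : xs) = (literal x `and_then` string xs) `using` uncurry (:)

-- Keep only the right-hand result.
xthen :: Parser a b -> Parser a c -> Parser a c
p1 `xthen` p2 = (p1 `and_then` p2) `using` snd

-- Keep only the left-hand result.
thenx :: Parser a b -> Parser a c -> Parser a b
p1 `thenx` p2 = (p1 `and_then` p2) `using` fst

-- Replace whatever was parsed with a fixed value.
ret :: Parser a b -> c -> Parser a c
p `ret` v = p `using` const v

-- Hypothetical keyword type and parsers for illustration.
data Keyword = For deriving (Show, Eq)

keywordFor :: Parser Char Keyword
keywordFor = string "for" `ret` For

bracketed :: Parser Char Char
bracketed = literal '[' `xthen` (literal 'x' `thenx` literal ']')
```

The "x" in each name marks the side whose value is thrown away, which makes delimiters and keywords easy to discard once they have been matched.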

Page 75: Combinator parsing

succeed, failure, satisfy, literal, alt, and_then, using, many, some, number, word, string, xthen, thenx, ret

Page 76: Combinator parsing

Expression Parser & Evaluator

Page 77: Combinator parsing

data Expr = Const Double
          | Expr `Add` Expr
          | Expr `Sub` Expr
          | Expr `Mul` Expr
          | Expr `Div` Expr

Page 78: Combinator parsing

(Const 3) `Mul` ((Const 6) `Add` (Const 1)) -- => "3*(6+1)"

Page 79: Combinator parsing

parse "3*(6+1)" -- => (Const 3) `Mul` ((Const 6) `Add` (Const 1))

Page 80: Combinator parsing

(Const 3) `Mul` ((Const 6) `Add` (Const 1)) -- => 21

Page 81: Combinator parsing

BNF Notation

expn ::= expn + expn
       | expn − expn
       | expn ∗ expn
       | expn / expn
       | digit+
       | (expn)

Page 82: Combinator parsing

Improving a little:

expn ::= term + term | term − term | term
term ::= factor ∗ factor | factor / factor | factor
factor ::= digit+ | (expn)

Page 83: Combinator parsing

Parsers that resemble BNF

Page 84: Combinator parsing

addition = ((term `and_then` ((literal '+') `xthen` term)) `using` plus)

Page 85: Combinator parsing

subtraction = ((term `and_then` ((literal '-') `xthen` term)) `using` minus)

Page 86: Combinator parsing

multiplication = ((factor `and_then` ((literal '*') `xthen` factor)) `using` times)

Page 87: Combinator parsing

division = ((factor `and_then` ((literal '/') `xthen` factor)) `using` divide)

Page 88: Combinator parsing

parenthesised_expression = ((nibble (literal '(')) `xthen` ((nibble expn) `thenx`(nibble (literal ')'))))

Page 89: Combinator parsing

value xs = Const (numval xs)
plus (x,y) = x `Add` y
minus (x,y) = x `Sub` y
times (x,y) = x `Mul` y
divide (x,y) = x `Div` y

Page 90: Combinator parsing

expn = addition `alt` subtraction `alt` term

Page 91: Combinator parsing

term = multiplication `alt` division `alt` factor

Page 92: Combinator parsing

factor = (number `using` value) `alt` parenthesised_expression

Page 93: Combinator parsing

expn "12*(5+(7-2))" -- => [ (Const 12.0 `Mul` (Const 5.0 `Add` (Const 7.0 `Sub` Const 2.0)),""), … ]

Page 94: Combinator parsing

value xs = Const (numval xs)
plus (x,y) = x `Add` y
minus (x,y) = x `Sub` y
times (x,y) = x `Mul` y
divide (x,y) = x `Div` y

Page 95: Combinator parsing

value = numval
plus (x,y) = x + y
minus (x,y) = x - y
times (x,y) = x * y
divide (x,y) = x / y

Page 96: Combinator parsing

expn "12*(5+(7-2))" -- => [(120.0,""), (12.0,"*(5+(7-2))"), (1.0,"2*(5+(7-2))")]

Page 97: Combinator parsing

expn "(12+1)*(5+(7-2))" -- => [(130.0,""), (13.0,"*(5+(7-2))")]
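The evaluator can be assembled into one runnable file. This is a sketch under a few assumptions, not the talk's exact code: `number` is simplified to unsigned integers, `numval` is replaced by `read`, whitespace is not handled, and the four near-identical operator parsers are folded into a hypothetical helper called `binary`.

```haskell
import Data.Char (isDigit)

type Parser a b = [a] -> [(b, [a])]

succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

satisfy :: (a -> Bool) -> Parser a a
satisfy p (x : xs) | p x = [(x, xs)]
satisfy _ _ = []

literal :: Eq a => a -> Parser a a
literal x = satisfy (== x)

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp]

xthen :: Parser a b -> Parser a c -> Parser a c
p1 `xthen` p2 = (p1 `and_then` p2) `using` snd

thenx :: Parser a b -> Parser a c -> Parser a b
p1 `thenx` p2 = (p1 `and_then` p2) `using` fst

some, many :: Parser a b -> Parser a [b]
some p = (p `and_then` many p) `using` uncurry (:)
many p = some p `alt` succeed []

-- Simplification (assumption): unsigned integers only.
number :: Parser Char String
number = some (satisfy isDigit)

-- Hypothetical helper: "operand op operand", combined with f.
binary :: Parser Char Double -> Char -> (Double -> Double -> Double) -> Parser Char Double
binary operand op f =
  (operand `and_then` (literal op `xthen` operand)) `using` uncurry f

-- One definition per BNF rule; values are computed instead of trees.
expn, term, factor :: Parser Char Double
expn = binary term '+' (+) `alt` binary term '-' (-) `alt` term
term = binary factor '*' (*) `alt` binary factor '/' (/) `alt` factor
factor = (number `using` read) `alt` (literal '(' `xthen` (expn `thenx` literal ')'))
```

The grammar is not left-recursive, because every path into `expn` consumes a digit or an opening bracket before it can reach `expn` again. As on the slides, partial parses remain in the result list, so a caller would normally pick the result whose unused input is empty.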

Page 98: Combinator parsing

Moving beyond toy parsers

Page 99: Combinator parsing

Whitespace? 🤔

Page 100: Combinator parsing

white = (literal ' ') `alt` (literal '\t') `alt` (literal '\n')

Page 101: Combinator parsing

white = many (any_of literal " \t\n")

Page 102: Combinator parsing

/\s*/

Page 103: Combinator parsing

any_of p = foldr (alt.p) failure

Page 104: Combinator parsing

any_of p [x1,x2,...,xn] = (p x1) `alt` (p x2) `alt` ... `alt` (p xn)

Page 105: Combinator parsing

white = many (any_of literal " \t\n")

Page 106: Combinator parsing

nibble p = white `xthen` (p `thenx` white)

Page 107: Combinator parsing

The parser (nibble p) has the same behaviour as parser p, except that it eats up any white-space in the input string before or afterwards

Page 108: Combinator parsing

(nibble (literal 'a')) " a " -- => [('a',""),('a'," "),('a'," ")]

Page 109: Combinator parsing

symbol = nibble.string

Page 110: Combinator parsing

symbol "$fold" " $fold " -- => [("$fold", ""), ("$fold", " ")]
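The whitespace helpers can be checked in isolation. The sketch below uses the name `any_of` (as the lexer slides do) rather than `any`, which would clash with the Prelude function of the same name; everything else follows the slide definitions.

```haskell
type Parser a b = [a] -> [(b, [a])]

succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

failure :: Parser a b
failure _ = []

literal :: Eq a => a -> Parser a a
literal x (y : ys) | x == y = [(y, ys)]
literal _ _ = []

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp]

xthen :: Parser a b -> Parser a c -> Parser a c
p1 `xthen` p2 = (p1 `and_then` p2) `using` snd

thenx :: Parser a b -> Parser a c -> Parser a b
p1 `thenx` p2 = (p1 `and_then` p2) `using` fst

many :: Parser a b -> Parser a [b]
many p = ((p `and_then` many p) `using` uncurry (:)) `alt` succeed []

string :: Eq a => [a] -> Parser a [a]
string [] = succeed []
string (x : xs) = (literal x `and_then` string xs) `using` uncurry (:)

-- Build one parser per list element and try them all.
any_of :: (c -> Parser a b) -> [c] -> Parser a b
any_of p = foldr (alt . p) failure

white :: Parser Char String
white = many (any_of literal " \t\n")

-- Same as p, but skips whitespace on both sides.
nibble :: Parser Char b -> Parser Char b
nibble p = white `xthen` (p `thenx` white)

symbol :: String -> Parser Char String
symbol = nibble . string
```

Because `many` also returns the shorter matches, `nibble` yields one result per amount of trailing whitespace consumed. For an input with a single space on each side that means two results: one with the trailing space eaten and one with it left over.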

Page 111: Combinator parsing

The Offside Rule

Page 112: Combinator parsing

w = x + y
    where
      x = 10
      y = 15 - 5
z = w * 2

Page 113: Combinator parsing

w = x + y
    where
      x = 10
      y = 15 - 5
z = w * 2

Page 114: Combinator parsing

When obeying the offside rule, every token must lie either directly below, or to the right of its first token

Page 115: Combinator parsing

i.e. A weak indentation policy

Page 116: Combinator parsing

The Offside Combinator

Page 117: Combinator parsing

type Pos a = (a, (Integer, Integer))

Page 118: Combinator parsing

prelex "3 + \n 2 * (4 + 5)" -- => [('3',(0,0)), ('+',(0,2)), ('2',(1,2)), ('*',(1,4)), … ]
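The slides show what `prelex` returns but not how it is written. One possible sketch is below; it assumes rows and columns both start at 0, that a newline starts a new row, and that every other character (including a tab) advances the column by one. Unlike the abbreviated output on the slide, this version keeps whitespace characters in its result.

```haskell
type Pos a = (a, (Integer, Integer))

-- Tag every character with its (row, column) position.
prelex :: String -> [Pos Char]
prelex = go (0, 0)
  where
    go _ [] = []
    go (r, c) (x : xs)
      | x == '\n' = (x, (r, c)) : go (r + 1, 0) xs -- newline: next row, column 0
      | otherwise = (x, (r, c)) : go (r, c + 1) xs -- anything else: next column
```

Positions are attached before any parsing happens, so later combinators can reason about layout without tracking it themselves.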

Page 119: Combinator parsing

satisfy :: (a -> Bool) -> Parser a a
satisfy p [] = failure []
satisfy p (x:xs)
  | p x = succeed x xs -- if p(x) is true
  | otherwise = failure []

Page 120: Combinator parsing

satisfy :: (a -> Bool) -> Parser (Pos a) a
satisfy p [] = failure []
satisfy p (x:xs)
  | p a = succeed a xs -- if p(a) is true
  | otherwise = failure []
  where (a, (r, c)) = x

Page 121: Combinator parsing

satisfy :: (a -> Bool) -> Parser (Pos a) a
satisfy p [] = failure []
satisfy p (x:xs)
  | p a = succeed a xs -- if p(a) is true
  | otherwise = failure []
  where (a, (r, c)) = x

Page 122: Combinator parsing

offside :: Parser (Pos a) b -> Parser (Pos a) b
offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]
  where
    inpON = takeWhile (onside (head inp)) inp
    inpOFF = drop (length inpON) inp
    onside (a, (r, c)) (b, (r', c')) = r' >= r && c' >= c

Page 123: Combinator parsing

offside :: Parser (Pos a) b -> Parser (Pos a) b

Page 124: Combinator parsing

offside :: Parser (Pos a) b -> Parser (Pos a) b
offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]

Page 125: Combinator parsing

(nibble (literal 'a')) " a " -- => [('a',""),('a'," "),('a'," ")]

Page 126: Combinator parsing

offside :: Parser (Pos a) b -> Parser (Pos a) b
offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]

Page 127: Combinator parsing

offside :: Parser (Pos a) b -> Parser (Pos a) b
offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]
  where
    inpON = takeWhile (onside (head inp)) inp

Page 128: Combinator parsing

offside :: Parser (Pos a) b -> Parser (Pos a) b
offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]
  where
    inpON = takeWhile (onside (head inp)) inp
    inpOFF = drop (length inpON) inp

Page 129: Combinator parsing

offside :: Parser (Pos a) b -> Parser (Pos a) b
offside p inp = [(v, inpOFF) | (v, []) <- (p inpON)]
  where
    inpON = takeWhile (onside (head inp)) inp
    inpOFF = drop (length inpON) inp
    onside (a, (r, c)) (b, (r', c')) = r' >= r && c' >= c
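A small runnable sketch of `offside` in action. The `prelex` definition and the `visible` helper (which drops whitespace characters after positions have been assigned) are assumptions added for this example; `offside` and the position-aware `satisfy` follow the slides.

```haskell
import Data.Char (isAlpha, isSpace)

type Parser a b = [a] -> [(b, [a])]
type Pos a = (a, (Integer, Integer))

-- Position-aware satisfy: tests the item, ignores where it sits.
satisfy :: (a -> Bool) -> Parser (Pos a) a
satisfy p ((a, _) : xs) | p a = [(a, xs)]
satisfy _ _ = []

some, many :: Parser a b -> Parser a [b]
some p inp = [(v : vs, out2) | (v, out1) <- p inp, (vs, out2) <- many p out1]
many p inp = some p inp ++ [([], inp)]

-- Run p on the onside prefix only, and insist that p consumes all of it.
offside :: Parser (Pos a) b -> Parser (Pos a) b
offside p inp = [(v, inpOFF) | (v, []) <- p inpON]
  where
    inpON = takeWhile (onside (head inp)) inp
    inpOFF = drop (length inpON) inp
    onside (_, (r, c)) (_, (r', c')) = r' >= r && c' >= c

-- Assumed definition: rows and columns start at 0.
prelex :: String -> [Pos Char]
prelex = go (0, 0)
  where
    go _ [] = []
    go (r, c) (x : xs)
      | x == '\n' = (x, (r, c)) : go (r + 1, 0) xs
      | otherwise = (x, (r, c)) : go (r, c + 1) xs

-- Hypothetical helper: positioned characters with whitespace removed.
visible :: String -> [Pos Char]
visible = filter (not . isSpace . fst) . prelex
```

For the input `"  ab\n   cd\nef"`, the first token sits at column 2, so `e` at column 0 of the last line is offside. `offside (some (satisfy isAlpha))` therefore parses `"abcd"` and hands back the positioned `e` and `f` as unused input. Laziness matters here: `head inp` is never evaluated when `inp` is empty, because `takeWhile` has nothing to test.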

Page 130: Combinator parsing

(3 + 2 * (4 + 5)) + (8 * 10)

(3 + 2 * (4 + 5)) + (8 * 10)

Page 131: Combinator parsing

(offside expn) (prelex inp_1) -- => [(21.0,[('+',(2,0)),('(',(2,2)),('8',(2,3)),('*',(2,5)),('1',(2,7)),('0',(2,8)),(')',(2,9))])]

(offside expn) (prelex inp_2) -- => [(101.0,[])]

Page 132: Combinator parsing

Quick recap before we 🛫

Page 133: Combinator parsing

∅
|> succeed, failure
|> satisfy, literal
|> alt, and_then, using
|> many, some
|> string, thenx, xthen, ret
|> expression parser & evaluator
|> any_of, nibble, symbol
|> prelex, offside

Page 134: Combinator parsing

Practical parsers

Page 135: Combinator parsing

🎯 Syntactical analysis
🎯 Lexical analysis
🎯 Parse trees

Page 136: Combinator parsing

type Parser a b = [a] -> [(b, [a])]
type Pos a = (a, (Integer, Integer))

Page 137: Combinator parsing

data Tag = Ident | Number | Symbol | Junk deriving (Show, Eq)
type Token = (Tag, [Char])

Page 138: Combinator parsing

(Symbol, "if") (Number, "123")

Page 139: Combinator parsing

Parse the string with parser p, & apply token t to the result

Page 140: Combinator parsing

(p `tok` t) inp = [ (((t, xs), (r, c)), out) | (xs, out) <- p inp]
  where (x, (r,c)) = head inp

Page 141: Combinator parsing

(p `tok` t) inp = [ ((<token>,<pos>),<unused input>) | (xs, out) <- p inp]
  where (x, (r,c)) = head inp

Page 142: Combinator parsing

(p `tok` t) inp = [ (((t, xs), (r, c)), out) | (xs, out) <- p inp]
  where (x, (r,c)) = head inp

Page 143: Combinator parsing

((string "where") `tok` Symbol) inp -- => ((Symbol,"where"), (r, c))

Page 144: Combinator parsing

many ((p1 `tok` t1) `alt` (p2 `tok` t2) `alt` ... `alt` (pn `tok` tn))

Page 145: Combinator parsing

[(p1, t1), (p2, t2), …, (pn, tn)]

Page 146: Combinator parsing

lex = many.(foldr op failure)
  where (p, t) `op` xs = (p `tok` t) `alt` xs

Page 147: Combinator parsing

🐉

Page 148: Combinator parsing

lex = many.(foldr op failure)
  where (p, t) `op` xs = (p `tok` t) `alt` xs

Page 149: Combinator parsing

-- Rightmost computation
cn = (pn `tok` tn) `alt` failure

Page 150: Combinator parsing

-- Followed by
(pn-1 `tok` tn-1) `alt` cn

Page 151: Combinator parsing

many ((p1 `tok` t1) `alt` (p2 `tok` t2) `alt` ... `alt` (pn `tok` tn))

Page 152: Combinator parsing

lexer = lex [
  ((some (any_of literal " \n\t")), Junk),
  ((string "where"), Symbol),
  (word, Ident),
  (number, Number),
  ((any_of string ["(", ")", "="]), Symbol)]

Page 153: Combinator parsing

lexer = lex [
  ((some (any_of literal " \n\t")), Junk),
  ((string "where"), Symbol),
  (word, Ident),
  (number, Number),
  ((any_of string ["(", ")", "="]), Symbol)]

Page 154: Combinator parsing

lexer = lex [
  ((some (any_of literal " \n\t")), Junk),
  ((string "where"), Symbol),
  (word, Ident),
  (number, Number),
  ((any_of string ["(", ")", "="]), Symbol)]

Page 155: Combinator parsing

head (lexer (prelex "where x = 10"))
-- => ([((Symbol,"where"),(0,0)),
--      ((Ident,"x"),(0,6)),
--      ((Symbol,"="),(0,8)),
--      ((Number,"10"),(0,10))],[])

Page 156: Combinator parsing

(head.lexer.prelex) "where x = 10"
-- => ([((Symbol,"where"),(0,0)),
--      ((Ident,"x"),(0,6)),
--      ((Symbol,"="),(0,8)),
--      ((Number,"10"),(0,10))],[])

Page 157: Combinator parsing

(head.lexer.prelex) "where x = 10"
-- => ([((Symbol,"where"),(0,0)),
--      ((Ident,"x"),(0,6)),
--      ((Symbol,"="),(0,8)),
--      ((Number,"10"),(0,10))],[])

Function composition

Page 158: Combinator parsing

length ((lexer.prelex) "where x = 10") -- => 198

Page 159: Combinator parsing

Conflicts? Ambiguity?

Page 160: Combinator parsing

In this case, "where" is a source of conflict. It can be a symbol or an identifier.

Page 161: Combinator parsing

lexer = lex [
  {- 1 -} ((some (any_of literal " \n\t")), Junk),
  {- 2 -} ((string "where"), Symbol),
  {- 3 -} (word, Ident),
  {- 4 -} (number, Number),
  {- 5 -} ((any_of string ["(",")","="]), Symbol)]

Page 162: Combinator parsing

Higher priority, higher precedence

Page 163: Combinator parsing

Removing Junk

Page 164: Combinator parsing

strip :: [(Pos Token)] -> [(Pos Token)]
strip = filter ((/= Junk).fst.fst)

Page 165: Combinator parsing

((/= Junk).fst.fst) ((Symbol,"where"),(0,0)) -- => True
((/= Junk).fst.fst) ((Junk,"where"),(0,0)) -- => False

Page 166: Combinator parsing

(fst.head.lexer.prelex) "where x = 10"
-- => [((Symbol,"where"),(0,0)),
--     ((Junk," "),(0,5)),
--     ((Ident,"x"),(0,6)),
--     ((Junk," "),(0,7)),
--     ((Symbol,"="),(0,8)),
--     ((Junk," "),(0,9)),
--     ((Number,"10"),(0,10))]

Page 167: Combinator parsing

(strip.fst.head.lexer.prelex) "where x = 10"
-- => [((Symbol,"where"),(0,0)),
--     ((Ident,"x"),(0,6)),
--     ((Symbol,"="),(0,8)),
--     ((Number,"10"),(0,10))]
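The lexical phase can be put together end to end as a sketch. Several parts are assumptions rather than the talk's code: `prelex` is written as a simple row and column counter, `word` and `number` are inlined and simplified (letters only, unsigned integers only), and the Prelude's own `lex` is hidden so the name can be reused.

```haskell
import Prelude hiding (lex)
import Data.Char (isDigit, isLetter)

type Parser a b = [a] -> [(b, [a])]
type Pos a = (a, (Integer, Integer))

data Tag = Ident | Number | Symbol | Junk deriving (Show, Eq)
type Token = (Tag, [Char])

succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

failure :: Parser a b
failure _ = []

satisfy :: (a -> Bool) -> Parser (Pos a) a
satisfy p ((a, _) : xs) | p a = [(a, xs)]
satisfy _ _ = []

literal :: Eq a => a -> Parser (Pos a) a
literal x = satisfy (== x)

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp]

some, many :: Parser a b -> Parser a [b]
some p = (p `and_then` many p) `using` uncurry (:)
many p = some p `alt` succeed []

string :: Eq a => [a] -> Parser (Pos a) [a]
string [] = succeed []
string (x : xs) = (literal x `and_then` string xs) `using` uncurry (:)

any_of :: (c -> Parser a b) -> [c] -> Parser a b
any_of p = foldr (alt . p) failure

-- Assumed definition: rows and columns start at 0.
prelex :: String -> [Pos Char]
prelex = go (0, 0)
  where
    go _ [] = []
    go (r, c) (x : xs)
      | x == '\n' = (x, (r, c)) : go (r + 1, 0) xs
      | otherwise = (x, (r, c)) : go (r, c + 1) xs

-- Tag the parsed text and stamp it with the position of its first character.
tok :: Parser (Pos Char) [Char] -> Tag -> Parser (Pos Char) (Pos Token)
(p `tok` t) inp = [(((t, xs), (r, c)), out) | (xs, out) <- p inp]
  where (_, (r, c)) = head inp

-- Fold the (parser, tag) table into one big alternative, then repeat it.
lex :: [(Parser (Pos Char) [Char], Tag)] -> Parser (Pos Char) [Pos Token]
lex = many . foldr op failure
  where (p, t) `op` xs = (p `tok` t) `alt` xs

lexer :: Parser (Pos Char) [Pos Token]
lexer = lex
  [ (some (any_of literal " \n\t"), Junk)
  , (string "where", Symbol)
  , (some (satisfy isLetter), Ident) -- simplified `word`
  , (some (satisfy isDigit), Number) -- simplified `number`
  , (any_of string ["(", ")", "="], Symbol)
  ]

strip :: [Pos Token] -> [Pos Token]
strip = filter ((/= Junk) . fst . fst)
```

Taking `head` of the lexer's results selects the parse in which every table entry that appears earlier wins and every token is as long as possible, which is why `"where"` comes out as a `Symbol` rather than an `Ident`. The other results are the backtracking alternatives that the list-of-successes representation keeps around.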

Page 168: Combinator parsing

Syntax Analysis

Page 169: Combinator parsing

characters |> lexical analysis |> tokens

Page 170: Combinator parsing

tokens |> syntax analysis |> parse trees

Page 171: Combinator parsing

f x y = add a b
        where
          a = 25
          b = sub x y

answer = mult (f 3 7) 5

Page 172: Combinator parsing

f x y = add a b
        where
          a = 25
          b = sub x y

answer = mult (f 3 7) 5

Script

Page 173: Combinator parsing

f x y = add a b
        where
          a = 25
          b = sub x y

answer = mult (f 3 7) 5

Definition

Page 174: Combinator parsing

f x y = add a b
        where
          a = 25
          b = sub x y

answer = mult (f 3 7) 5

Body

Page 175: Combinator parsing

f x y = add a b
        where
          a = 25
          b = sub x y

answer = mult (f 3 7) 5

Expression

Page 176: Combinator parsing

f x y = add a b
        where
          a = 25
          b = sub x y

answer = mult (f 3 7) 5

Definition

Page 177: Combinator parsing

f x y = add a b
        where
          a = 25
          b = sub x y

answer = mult (f 3 7) 5

Primitives

Page 178: Combinator parsing

data Script = Script [Def]
data Def = Def Var [Var] Expn
data Expn = Var Var
          | Num Double
          | Expn `Apply` Expn
          | Expn `Where` [Def]
type Var = [Char]

Page 179: Combinator parsing

prog = (many defn) `using` Script

Page 180: Combinator parsing

defn = ( (some (kind Ident)) `and_then` ((lit "=") `xthen` (offside body))) `using` defnFN

Page 181: Combinator parsing

body = ( expr `and_then` (((lit "where") `xthen` (some defn)) `opt` [])) `using` bodyFN

Page 182: Combinator parsing

expr = (some prim) `using` (foldl1 Apply)

Page 183: Combinator parsing

prim = ((kind Ident) `using` Var) `alt` ((kind Number) `using` numFN) `alt` ((lit "(") `xthen` (expr `thenx` (lit ")")))

Page 184: Combinator parsing

-- only allow a kind of tag
kind :: Tag -> Parser (Pos Token) [Char]
kind t = (satisfy ((== t).fst)) `using` snd

-- only allow a given symbol
lit :: [Char] -> Parser (Pos Token) [Char]
lit xs = (literal (Symbol, xs)) `using` snd

Page 185: Combinator parsing

prog = (many defn) `using` Script

Page 186: Combinator parsing

defn = ( (some (kind Ident)) `and_then` ((lit "=") `xthen` (offside body))) `using` defnFN

Page 187: Combinator parsing

body = ( expr `and_then` (((lit "where") `xthen` (some defn)) `opt` [])) `using` bodyFN

Page 188: Combinator parsing

expr = (some prim) `using` (foldl1 Apply)

Page 189: Combinator parsing

prim = ((kind Ident) `using` Var) `alt` ((kind Number) `using` numFN) `alt` ((lit "(") `xthen` (expr `thenx` (lit ")")))

Page 190: Combinator parsing

data Script = Script [Def]
data Def = Def Var [Var] Expn
data Expn = Var Var
          | Num Double
          | Expn `Apply` Expn
          | Expn `Where` [Def]
type Var = [Char]

Page 191: Combinator parsing

Orange functions are for transforming values.

Page 192: Combinator parsing

Use data constructors to generate parse trees

Page 193: Combinator parsing

Use evaluation functions to evaluate and generate a value

Page 194: Combinator parsing

f x y = add a b
        where
          a = 25
          b = sub x y

answer = mult (f 3 7) 5

Page 195: Combinator parsing

Script [
  Def "f" ["x","y"] (
    ((Var "add" `Apply` Var "a") `Apply` Var "b")
    `Where` [
      Def "a" [] (Num 25.0),
      Def "b" [] ((Var "sub" `Apply` Var "x") `Apply` Var "y")]),
  Def "answer" [] (
    (Var "mult" `Apply` ((Var "f" `Apply` Num 3.0) `Apply` Num 7.0))
    `Apply` Num 5.0)]
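The syntax phase relies on a few helpers the slides do not show: `opt`, `defnFN`, `bodyFN` and `numFN`. The sketch below fills them in with plausible definitions, which are assumptions: `opt` offers a default value as an extra alternative, `numFN` is `Num . read`, and `lit` is written with `satisfy` directly. It works on already-lexed tokens, so it can be tried without the lexer.

```haskell
type Parser a b = [a] -> [(b, [a])]
type Pos a = (a, (Integer, Integer))

data Tag = Ident | Number | Symbol | Junk deriving (Show, Eq)
type Token = (Tag, [Char])

type Var = [Char]
data Script = Script [Def] deriving (Show, Eq)
data Def = Def Var [Var] Expn deriving (Show, Eq)
data Expn = Var Var | Num Double | Expn `Apply` Expn | Expn `Where` [Def]
  deriving (Show, Eq)

-- Core combinators, restated so the sketch is self-contained.
succeed :: b -> Parser a b
succeed v inp = [(v, inp)]

satisfy :: (a -> Bool) -> Parser (Pos a) a
satisfy p ((a, _) : xs) | p a = [(a, xs)]
satisfy _ _ = []

alt :: Parser a b -> Parser a b -> Parser a b
(p1 `alt` p2) inp = p1 inp ++ p2 inp

and_then :: Parser a b -> Parser a c -> Parser a (b, c)
(p1 `and_then` p2) inp =
  [((v1, v2), out2) | (v1, out1) <- p1 inp, (v2, out2) <- p2 out1]

using :: Parser a b -> (b -> c) -> Parser a c
(p `using` f) inp = [(f v, out) | (v, out) <- p inp]

some, many :: Parser a b -> Parser a [b]
some p = (p `and_then` many p) `using` uncurry (:)
many p = some p `alt` succeed []

xthen :: Parser a b -> Parser a c -> Parser a c
p1 `xthen` p2 = (p1 `and_then` p2) `using` snd

thenx :: Parser a b -> Parser a c -> Parser a b
p1 `thenx` p2 = (p1 `and_then` p2) `using` fst

-- Assumed definition of `opt`: try p, and also offer the default v.
opt :: Parser a b -> b -> Parser a b
p `opt` v = p `alt` succeed v

offside :: Parser (Pos a) b -> Parser (Pos a) b
offside p inp = [(v, inpOFF) | (v, []) <- p inpON]
  where
    inpON = takeWhile (onside (head inp)) inp
    inpOFF = drop (length inpON) inp
    onside (_, (r, c)) (_, (r', c')) = r' >= r && c' >= c

kind :: Tag -> Parser (Pos Token) [Char]
kind t = satisfy ((== t) . fst) `using` snd

lit :: [Char] -> Parser (Pos Token) [Char]
lit xs = satisfy (== (Symbol, xs)) `using` snd

prog :: Parser (Pos Token) Script
prog = many defn `using` Script

defn :: Parser (Pos Token) Def
defn = (some (kind Ident) `and_then` (lit "=" `xthen` offside body)) `using` defnFN
  where
    defnFN (f : xs, e) = Def f xs e
    defnFN ([], _) = error "unreachable: some yields at least one name"

body :: Parser (Pos Token) Expn
body = (expr `and_then` ((lit "where" `xthen` some defn) `opt` [])) `using` bodyFN
  where
    bodyFN (e, []) = e
    bodyFN (e, ds) = e `Where` ds

expr :: Parser (Pos Token) Expn
expr = some prim `using` foldl1 Apply

prim :: Parser (Pos Token) Expn
prim = (kind Ident `using` Var)
  `alt` (kind Number `using` (Num . read))
  `alt` (lit "(" `xthen` (expr `thenx` lit ")"))
```

Given a stripped token list, `head (prog tokens)` returns the parse tree together with the unused tokens. The `offside body` call is what ends a definition: tokens of the next top-level definition start in column 0, fall outside the onside prefix, and are handed back to `many defn`.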

Page 196: Combinator parsing

Strategy for writing parsers

Page 197: Combinator parsing

1. Identify components i.e. Lexical elements

Page 198: Combinator parsing

lexer = lex [
  ((some (any_of literal " \n\t")), Junk),
  ((string "where"), Symbol),
  (word, Ident),
  (number, Number),
  ((any_of string ["(", ")", "="]), Symbol)]

Page 199: Combinator parsing

2. Structure these elements a.k.a. syntax

Page 200: Combinator parsing

defn = ((some (kind Ident)) `and_then` ((lit "=") `xthen` (offside body))) `using` defnFN

body = (expr `and_then` (((lit "where") `xthen` (some defn)) `opt` [])) `using` bodyFN

expr = (some prim) `using` (foldl1 Apply)

prim = ((kind Ident) `using` Var) `alt` ((kind Number) `using` numFN) `alt` ((lit "(") `xthen` (expr `thenx` (lit ")")))

Page 201: Combinator parsing

3. BNF notation is very helpful

Page 202: Combinator parsing

4. TDD in the absence of types

Page 203: Combinator parsing

Where to, next?

Page 204: Combinator parsing

Monadic Parsers
Graham Hutton, Erik Meijer

Page 205: Combinator parsing

Introduction to FP
Philip Wadler

Page 206: Combinator parsing

The Dragon Book
If your interest is in compilers

Page 207: Combinator parsing

Libraries?

Page 208: Combinator parsing

Haskell: Parsec, MegaParsec. ✨
OCaml: Angstrom. ✨ 🚀
Ruby: rparsec, or roll your own
Elixir: Combine, ExParsec
Python: Parsec. ✨

Page 209: Combinator parsing

Thank you!

Page 210: Combinator parsing

Twitter: @_swanand GitHub: @swanandp