Regular Languages Regular Expressions Finite-State Automata Torbjörn Lager, Stockholm University
Dec 12, 2015

Regular Languages, Regular Expressions, Finite-State Automata

Torbjörn Lager, Stockholm University

FST - Torbjörn Lager, UU 2

Languages

Sets of strings. Example:

{“ac” “abc” “abbc” “abbbc” ...}

Finite-State Automata

Directed graphs consisting of states, and arcs labelled with symbols. For example:

[Figure: automaton with states 0, 1 and 2; arcs labelled a, b and c.]
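To make the idea concrete, here is a minimal Python sketch of such an automaton run as a table lookup. The topology (a from 0 to 1, b looping on 1, c from 1 to 2, with 2 as the only final state) is an assumption chosen to match the language {“ac” “abc” “abbc” ...}; the names TRANSITIONS, START, FINALS and accepts are illustrative only.

```python
# Assumed topology: 0 --a--> 1, 1 --b--> 1 (loop), 1 --c--> 2, with 2 final.
TRANSITIONS = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
START = 0
FINALS = {2}


def accepts(string):
    """Follow one arc per symbol; reject as soon as no arc applies."""
    state = START
    for symbol in string:
        if (state, symbol) not in TRANSITIONS:
            return False
        state = TRANSITIONS[(state, symbol)]
    return state in FINALS


for candidate in ["ac", "abc", "abbbc", "ab", "c", ""]:
    print(repr(candidate), accepts(candidate))
```

The dictionary plays the role of the labelled arcs, so accepting a string is simply a walk through the graph.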

Regular expressions

Example: [a b* c]

Note: Atomic symbols (e.g. ‘a’) denote languages (e.g. {“a”})

Operations (here: concatenation and Kleene star) are operations over languages
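The point that operators act on languages, not on single strings, can be sketched in Python with languages modelled as sets of strings. This is only an illustration: the helper names concat and star are made up here, and because the true Kleene star is infinite, star is cut off at an assumed max_repeats bound.

```python
def concat(lang_a, lang_b):
    """Concatenation of two languages: every x from A followed by every y from B."""
    return {x + y for x in lang_a for y in lang_b}


def star(lang, max_repeats):
    """Bounded approximation of Kleene star: 0 to max_repeats copies of lang."""
    result = {""}
    layer = {""}
    for _ in range(max_repeats):
        layer = concat(layer, lang)
        result |= layer
    return result


# Atomic symbols denote one-string languages.
a, b, c = {"a"}, {"b"}, {"c"}

# [a b* c], with the star truncated after three repetitions.
language = concat(concat(a, star(b, 3)), c)
print(sorted(language, key=len))
```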

Regular expressions, regular languages and automata

Regular expressions compile into finite-state automata.

Finite-state automata generate/accept regular languages.

Regular expressions denote regular languages.

Regular expressions, regular languages and automata

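[a b* c] compiles into the automaton below.

[Figure: automaton with states 0, 1 and 2; arcs labelled a, b and c.]

The automaton generates/accepts {“ac” “abc” “abbc” ...}

[a b* c] denotes {“ac” “abc” “abbc” ...}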

Regular expression operators

Concatenation              A B
Union                      A | B
Iteration (Kleene star)    A*
Difference                 A - B
Intersection               A & B
Grouping of expressions    [A]

Examples

a b                {“ab”}
a [b|c]            {“ab” “ac”}
a* [b|c]*          {“” “a” “ab” ...}
[a|b] & [b|c]      {“b”}
[a|b] - b          {“a”}
[a|b] - [b|a]      {}
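The finite examples above can be reproduced with ordinary Python sets, which is a rough way to see that union, intersection and difference are plain set operations on languages. The concat helper and the examples dictionary are illustrative names; the a* [b|c]* example is left out because its language is infinite.

```python
def concat(lang_a, lang_b):
    """Concatenation of two languages represented as sets of strings."""
    return {x + y for x in lang_a for y in lang_b}


a, b, c = {"a"}, {"b"}, {"c"}

examples = {
    "a b": concat(a, b),
    "a [b|c]": concat(a, b | c),
    "[a|b] & [b|c]": (a | b) & (b | c),
    "[a|b] - b": (a | b) - b,
    "[a|b] - [b|a]": (a | b) - (b | a),
}

for expression, language in examples.items():
    print(expression, "denotes", sorted(language))
```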

Special symbols

?            The any symbol
?*           The universal language
[]           The empty-string language
0 (or “”)    The empty string (epsilon)

Regular expression operators

Optionality    (A)
Kleene plus    A+
Complement     ~A
Containment    $A
Restriction    A => B _ C

Examples

a (b) c       {“ac” “abc”}
a b+ c        {“abc” “abbc” “abbbc” ...}
~[a b c]      {“” ... “a” ... “ab” ... “abca” ...}
$[a|b]        {“a” ... “abba” ... “abcd” ...}
b => a _ c    {“” ... “a” ... “ccc” ... “abc” ...}
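Complement, containment and restriction denote infinite languages, so one way to get a feel for them is to test individual strings for membership. The sketch below does this directly on Python strings rather than by compiling automata; the function names are hypothetical and the restriction check only handles plain string contexts, not arbitrary regular expressions.

```python
def in_complement(string, excluded):
    """~A: the string is anything except a member of A."""
    return string not in excluded


def contains_any(string, factors):
    """$A: some member of A occurs somewhere inside the string."""
    return any(factor in string for factor in factors)


def restricted(string, target, left, right):
    """target => left _ right: every occurrence of target has left
    immediately before it and right immediately after it."""
    start = string.find(target)
    while start != -1:
        end = start + len(target)
        if not (string[:start].endswith(left) and string[end:].startswith(right)):
            return False
        start = string.find(target, start + 1)
    return True


print(in_complement("abca", {"abc"}))       # ~[a b c]
print(contains_any("abcd", {"a", "b"}))     # $[a|b]
print(restricted("ccc", "b", "a", "c"))     # b => a _ c, no b at all
print(restricted("ab", "b", "a", "c"))      # b not followed by c
```

Note that a string with no b at all satisfies the restriction, which is why “” and “ccc” appear in the example language.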

Component technologies in FST

Word lists and lexica
Tokenisers
Morphological analysers
Part-of-speech taggers
Parsers

Applications of FST

Named-entity recognition
Information extraction
Corpus linguistics
Spelling and grammar checking
Speech processing applications

The Xerox Finite-State Tool

Compiles extended regular expressions into finite-state machines (automata and transducers)

Allows the user to display, examine and modify the machines

Non-deterministic FSAs

At least one state has more than one transition leading from it labelled with the same symbol

[Figure: non-deterministic automaton with states 0, 1 and 2; one arc labelled a and two arcs labelled b.]

Determinization of FSAs

Any non-deterministic FSA can be transformed into an equivalent deterministic FSA.

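Example:

[Figure: a non-deterministic automaton and an equivalent deterministic automaton, each drawn with states 0, 1 and 2 and arcs labelled a, b and b.]

Determinize for efficiency!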
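One standard way to carry out this transformation is the subset construction, sketched below in Python. The input automaton is an assumption (0 goes to 1 on a, and state 1 has two b arcs, one looping and one to the final state 2); the slides do not spell out the exact topology, and the function names are illustrative.

```python
from collections import deque

# Assumed NFA: state 1 has two arcs labelled b, so it is non-deterministic.
NFA = {(0, "a"): {1}, (1, "b"): {1, 2}}
NFA_START = 0
NFA_FINALS = {2}


def determinize(nfa, start, finals):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    alphabet = sorted({symbol for (_, symbol) in nfa})
    dfa_start = frozenset({start})
    dfa, dfa_finals = {}, set()
    seen, queue = {dfa_start}, deque([dfa_start])
    while queue:
        subset = queue.popleft()
        if subset & finals:
            dfa_finals.add(subset)
        for symbol in alphabet:
            target = frozenset(t for s in subset for t in nfa.get((s, symbol), ()))
            if not target:
                continue  # no arc on this symbol: the DFA is left partial
            dfa[(subset, symbol)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa, dfa_start, dfa_finals


def accepts(dfa, start, finals, string):
    """Run the deterministic machine: at most one arc per state and symbol."""
    state = start
    for symbol in string:
        state = dfa.get((state, symbol))
        if state is None:
            return False
    return state in finals


DFA, DFA_START, DFA_FINALS = determinize(NFA, NFA_START, NFA_FINALS)
print(accepts(DFA, DFA_START, DFA_FINALS, "abbb"))
```

After determinization, recognition needs no backtracking or state sets at run time, which is where the efficiency gain comes from.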

Minimization of FSAs

Any (deterministic) FSA can be transformed into an equivalent FSA that has a minimal number of states.

Minimize for space!
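A simple way to find the minimal machine is partition refinement in the style of Moore's algorithm, sketched below. This is not necessarily what any particular tool does; the sketch assumes a complete DFA whose states are all reachable, and the example machine (strings over {a, b} containing at least one a, with a redundant state) is made up for illustration.

```python
def minimize(states, alphabet, delta, finals):
    """Moore-style partition refinement over a complete DFA.

    Returns a dict mapping each state to the index of its equivalence class;
    states in the same class can be merged into one.
    """
    # Start by separating final from non-final states.
    block_of = {s: int(s in finals) for s in states}
    while True:
        # States stay together only if they agree on their current block
        # and on the block reached under every symbol.
        ids, refined = {}, {}
        for s in states:
            signature = (block_of[s],) + tuple(block_of[delta[(s, x)]] for x in alphabet)
            refined[s] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block_of.values())):
            return refined  # no block was split: the partition is stable
        block_of = refined


# Hypothetical DFA: states 1 and 2 behave identically.
STATES = [0, 1, 2]
DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 1, (2, "b"): 2,
}
classes = minimize(STATES, "ab", DELTA, finals={1, 2})
print(classes)
```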

Representing word lists

Think of a word list as a regular language

Use the calculus of regular expressions to query and update the wordlist

Determinize for speed!
Minimize for space!

Various equivalences

(A) = A | []
A+ = A A*
A+ = A* - []    (when A does not contain the empty string)
A - B = A & ~B
~A = ?* - A
$A = ?* A ?*
~[A | B] = ~A & ~B
~[A & B] = ~A | ~B
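Some of these identities can be sanity-checked by brute force on a bounded universe. The sketch below is only a spot check, not a proof: UNIVERSE stands in for ?* but contains just the strings over an assumed alphabet {a, b} up to length 3, and A and B are arbitrary sample languages.

```python
from itertools import product

ALPHABET = "ab"
MAX_LEN = 3
# Bounded stand-in for ?*: every string over ALPHABET with at most MAX_LEN symbols.
UNIVERSE = {
    "".join(symbols)
    for length in range(MAX_LEN + 1)
    for symbols in product(ALPHABET, repeat=length)
}


def complement(lang):
    """~A, taken relative to the bounded universe."""
    return UNIVERSE - lang


A = {"a", "ab", "bba"}
B = {"ab", "b", ""}

checks = {
    "A - B = A & ~B": (A - B) == (A & complement(B)),
    "~[A | B] = ~A & ~B": complement(A | B) == (complement(A) & complement(B)),
    "~[A & B] = ~A | ~B": complement(A & B) == (complement(A) | complement(B)),
}

for identity, holds in checks.items():
    print(identity, holds)
```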

Various equivalences

A - A = ~[?*]
A | ~[?*] = A
A [] = A
A ~[?*] = ~[?*]
A & ?* = A
A | ?* = ?*

Important theoretical results

Kleene’s theorem (concerning FSAs)
Closure properties of regular languages and regular relations
Decidability

FSAs and regular expressions

Kleene’s theorem: Any language recognised by an FSA is denoted by a regular expression, and any language denoted by a regular expression can be recognised by an FSA.

Regex to FSA to regex

[Figure: conversions among four representations: regular expressions, nondeterministic FSAs with epsilon transitions, nondeterministic FSAs, and deterministic FSAs.]

Picture adapted from Hopcroft & Ullman 1979

From regular expressions to finite-state automata

The only really necessary operators:
Disjunction
Concatenation
Iteration

Sidenote: Compare regular grammars:
A --> x B
A --> x
(where A and B are nonterminals, and where x is a sequence of terminals)
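A common textbook way to go from these three operators to an automaton is Thompson's construction, which builds an automaton with epsilon transitions piece by piece. The Python sketch below is one possible rendering, not the method of any specific tool; Fragment, symbol, union, concat, star and accepts are all names invented for this example.

```python
import itertools

_fresh = itertools.count()  # source of new state numbers
EPS = None                  # label used for epsilon arcs


class Fragment:
    """An epsilon-NFA piece with one start state and one final state."""

    def __init__(self, start, final, arcs):
        self.start, self.final, self.arcs = start, final, arcs  # arcs: (src, label, dst)


def symbol(x):
    s, f = next(_fresh), next(_fresh)
    return Fragment(s, f, [(s, x, f)])


def concat(m, n):
    return Fragment(m.start, n.final, m.arcs + n.arcs + [(m.final, EPS, n.start)])


def union(m, n):
    s, f = next(_fresh), next(_fresh)
    extra = [(s, EPS, m.start), (s, EPS, n.start), (m.final, EPS, f), (n.final, EPS, f)]
    return Fragment(s, f, m.arcs + n.arcs + extra)


def star(m):
    s, f = next(_fresh), next(_fresh)
    extra = [(s, EPS, m.start), (m.final, EPS, f), (s, EPS, f), (m.final, EPS, m.start)]
    return Fragment(s, f, m.arcs + extra)


def closure(states, arcs):
    """All states reachable from `states` through epsilon arcs only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for src, label, dst in arcs:
            if src == q and label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(m, string):
    current = closure({m.start}, m.arcs)
    for x in string:
        moved = {dst for src, label, dst in m.arcs if src in current and label == x}
        current = closure(moved, m.arcs)
    return m.final in current


# [a b* c]
machine = concat(concat(symbol("a"), star(symbol("b"))), symbol("c"))
print([s for s in ["ac", "abc", "abbc", "a", ""] if accepts(machine, s)])
```

Every other operator on the earlier slides can then be defined in terms of these three, or handled by separate automaton constructions such as complementation.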

Closure properties of regular languages

A set is said to be closed under an operation iff applying the operation to members of the set will never take us outside the set

Example: if A and B are regular languages, then [A|B] is always regular. Therefore regular languages are closed under union.
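Closure under union can be shown constructively: given automata for A and B, build one for [A|B]. The sketch below uses the product construction on complete deterministic automata, which is one of several ways to do it. The tuple layout (states, delta, start, finals) and the two sample machines are assumptions made for this example.

```python
from itertools import product


def union_dfa(dfa1, dfa2, alphabet):
    """Product construction: run both DFAs in lockstep, accept if either accepts."""
    states1, delta1, start1, finals1 = dfa1
    states2, delta2, start2, finals2 = dfa2
    states = set(product(states1, states2))
    delta = {
        ((p, q), x): (delta1[(p, x)], delta2[(q, x)])
        for (p, q) in states
        for x in alphabet
    }
    finals = {(p, q) for (p, q) in states if p in finals1 or q in finals2}
    return states, delta, (start1, start2), finals


def accepts(dfa, string):
    _, delta, state, finals = dfa
    for x in string:
        state = delta[(state, x)]
    return state in finals


# Hypothetical inputs over {a, b}: strings ending in a, and strings of even length.
ENDS_IN_A = ({0, 1}, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}, 0, {1})
EVEN_LENGTH = ({0, 1}, {(0, "a"): 1, (0, "b"): 1, (1, "a"): 0, (1, "b"): 0}, 0, {0})

EITHER = union_dfa(ENDS_IN_A, EVEN_LENGTH, "ab")
print(accepts(EITHER, "bb"), accepts(EITHER, "b"))
```

Changing `or` to `and` in the definition of finals gives the automaton for the intersection instead.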

Decidability

Given one automaton A:
Is the string S a string in L(A)?
Does L(A) contain any strings at all?
Is L(A) equivalent to ?* ?

Given two automata A1 and A2:
Is L(A1) a subset of L(A2)?
Are L(A1) and L(A2) equivalent?
Do L(A1) and L(A2) overlap?
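Each of these questions reduces to a finite search over states, which is why they are decidable. The sketch below answers them for complete deterministic automata given as (delta, start, finals) triples over a shared alphabet; this representation, the function names and the sample machines are all assumptions for illustration.

```python
from collections import deque


def explore(start, successors):
    """Breadth-first search returning every node reachable from start."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def accepts(dfa, string):
    delta, state, finals = dfa
    for x in string:
        state = delta[(state, x)]
    return state in finals


def reachable(dfa, alphabet):
    delta, start, _ = dfa
    return explore(start, lambda q: [delta[(q, x)] for x in alphabet])


def is_empty(dfa, alphabet):
    return not (reachable(dfa, alphabet) & dfa[2])


def is_universal(dfa, alphabet):
    return reachable(dfa, alphabet) <= dfa[2]


def reachable_pairs(dfa1, dfa2, alphabet):
    (d1, s1, _), (d2, s2, _) = dfa1, dfa2
    return explore((s1, s2), lambda pq: [(d1[(pq[0], x)], d2[(pq[1], x)]) for x in alphabet])


def is_subset(dfa1, dfa2, alphabet):
    # Fails exactly when some string drives A1 to a final and A2 to a non-final state.
    return not any(p in dfa1[2] and q not in dfa2[2] for p, q in reachable_pairs(dfa1, dfa2, alphabet))


def equivalent(dfa1, dfa2, alphabet):
    return is_subset(dfa1, dfa2, alphabet) and is_subset(dfa2, dfa1, alphabet)


def overlap(dfa1, dfa2, alphabet):
    return any(p in dfa1[2] and q in dfa2[2] for p, q in reachable_pairs(dfa1, dfa2, alphabet))


# Hypothetical machines over {a, b}.
ENDS_IN_A = ({(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0}, 0, {1})
EVEN_LENGTH = ({(0, "a"): 1, (0, "b"): 1, (1, "a"): 0, (1, "b"): 0}, 0, {0})
EVERYTHING = ({(0, "a"): 0, (0, "b"): 0}, 0, {0})
NOTHING = ({(0, "a"): 0, (0, "b"): 0}, 0, set())

print(is_subset(ENDS_IN_A, EVERYTHING, "ab"), overlap(ENDS_IN_A, EVEN_LENGTH, "ab"))
```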
