cs3102: Theory of Computation Class 10: DFAs in Practice Spring 2010 University of Virginia David Evans
Jan 11, 2016
Sylvia Harris
Page 1

cs3102: Theory of Computation

Class 10: DFAs in Practice

Spring 2010, University of Virginia, David Evans

Page 2

Menu

• Today:
  – Preparing for Exam 1
  – Language class for Deterministic PDAs
  – Applications of DFAs

• Thursday:
  – Exam Review (if you send questions and/or topics)
  – Applications of probabilistic DFAs and Grammars

Page 3

Exam 1

• In class, next Tuesday, 2 March
• Covers:
  – Classes 1-9 (10 and 11)
  – Sipser Ch 0-2
  – Problem Sets 1-3 + Comments

[Figure: Venn diagram of the sets above, with Exam 1 drawn as a set among them]

Note: unlike nearly all other sets we draw in this class, all of these sets are finite, and the drawn size (roughly) represents their relative size.

Page 4

What’s on the Exam?

• Definitions
  – Language, problem, sets
• Constructing and understanding computing models
  – Finite automata (DFA, NFA)
  – Pushdown automata (DPDA, NPDA)
  – Grammars (Context-Free Grammar)
• Language Classes: Regular and Context Free
  – Show a language is in the class
  – Show a language is not in the class
  – Prove or disprove a closure property
• Proof Methods
  – Proof by Induction
  – Proof by Construction
  – Understand and use the pumping lemmas for RL and CFL

Sample exam on website should give you a good idea what to expect

Your exam will probably also have “what’s wrong with this proof” questions

Page 5

Exam 1 Notesheet

For Exam 1, you may use only:
  – Your own brain and body
  – A low-tech writing instrument (pen or pencil)
  – A single page (both sides) of notes that you create

You may work with others to create your notes page.

Page 6

Admiral Grace Hopper

John von Neumann

Albert Einstein

Page 7

Exam Help Available

• Office Hours:
  – Thursdays, 8:30-9:30am
  – Thursdays, after class
  – Fridays, 10-11:30am (Sonali in Stacks)
  – Mondays, 1:15-3pm

• TA’s Exam Review Session
  – This Sunday, 5-6:30pm, Olsson 228E

Page 8

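[Figure: Venn diagram of nested language classes: All Languages ⊃ Context-Free (CFG or NPDA) ⊃ Regular Languages (DFA, NFA, RE, RG) ⊃ Finite Languages. Example languages marked in the diagram: w, a^n, a^n b^n c^n, ww.]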

Where are the languages recognized by a Deterministic PDA?

Page 9

Proving Set Equivalence

A = B ⇔ A ⊆ B and B ⊆ A

Sets A and B are equivalent if A is a subset of B and B is a subset of A.
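The two-direction check can be tried on small finite sets. The following is only an illustrative sketch: the language classes in this lecture are infinite, so a real proof needs an argument for each direction rather than enumeration. The function name `sets_equal` and the example sets are made up for this illustration.

```python
def sets_equal(a, b):
    """Show A = B by checking both inclusions: A is a subset of B, and B is a subset of A."""
    a_subset_b = all(x in b for x in a)
    b_subset_a = all(x in a for x in b)
    return a_subset_b and b_subset_a


# Two descriptions of the same set: multiples of 6 below 30.
multiples_of_6 = {n for n in range(30) if n % 6 == 0}
multiples_of_2_and_3 = {n for n in range(30) if n % 2 == 0 and n % 3 == 0}
print(sets_equal(multiples_of_6, multiples_of_2_and_3))
```

If only one direction holds, the sets are not equivalent; one is a proper subset of the other.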

[Figure: Venn diagrams illustrating A ⊆ B and B ⊆ A]

Page 10

Proving Formalism Equivalence

Page 11

Proving Formalism Equivalence

Page 12

Proving Formalism Non-Equivalence

Page 13

[Figure: Venn diagram: All Languages ⊃ Context-Free (CFG or NPDA) ⊃ Regular Languages (DFA, NFA, RE, RG), with a^n b^n marked.]

Which of these could be true?

Page 14

[Figure: two candidate Venn diagrams, each relating Regular Languages (DFA, NFA, RE, RG), DPDA, and Context-Free (NPDA)]

How can we distinguish these two plausible possibilities?

Page 15

[Figure: two candidate Venn diagrams, each relating Regular Languages (DFA, NFA, RE, RG), DPDA, and Context-Free (NPDA)]

How can we distinguish these two plausible possibilities?

Find some language A that can be recognized by some NPDA but not by any DPDA.

A

Prove by construction: for any NPDA, there is a DPDA that recognizes the same language.

Page 16
Page 17

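[Figure: NPDA state diagram. Transition labels, written as input, pop → push:]

ε, ε → $
a, ε → +
ε, ε → ε
b, + → ε
ε, $ → ε
ε, ε → ε
b, + → ε
b, ε → ε
ε, $ → ε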

Page 18

Proof by contradiction: Assume there is a DPDA that recognizes A. Show how to construct an NPDA that recognizes some language we know is not context free.

Proved by construction: We showed an NPDA that recognizes A.
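A rough way to see why nondeterminism helps here is to simulate the NPDA's guess by trying each branch in turn. The sketch below assumes A = {a^i b^i} ∪ {a^i b^2i}, which is inferred from the strings mentioned in the proof rather than stated as a definition in the slides. It reuses the stack symbols $ and + from the transition labels, and the function names are made up for this illustration.

```python
def run_branch(s, bs_per_a):
    """One deterministic branch: push '+' for each a, pop one '+' per `bs_per_a` b's."""
    stack = ["$"]                      # bottom-of-stack marker
    i = 0
    while i < len(s) and s[i] == "a":
        stack.append("+")
        i += 1
    pending = 0                        # b's read since the last pop
    while i < len(s) and s[i] == "b":
        pending += 1
        if pending == bs_per_a:
            if stack[-1] != "+":       # more b's than the a's allow
                return False
            stack.pop()
            pending = 0
        i += 1
    # Accept only if the whole input was consumed and the stack is back to '$'.
    return i == len(s) and pending == 0 and stack == ["$"]


def in_A(s):
    """The NPDA's nondeterministic choice, modeled by trying both branches."""
    return run_branch(s, 1) or run_branch(s, 2)


print(in_A("aabb"), in_A("aabbbb"), in_A("aabbb"))
```

Each branch on its own is deterministic. The difficulty for a single DPDA is that it has to commit to one counting strategy before it has seen how many b's follow.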

Page 19

Proof by contradiction. Suppose there is a DPDA M that recognizes A. It must be in an accept state only after processing a^i b^i and a^i b^{2i}.

First half: … a, α → β … b, α → β (2i transitions, consuming a^i b^i)

Second half: … b, α → β … b, α → β (i transitions, consuming b^i)

Construct M’: copy all the states in the second half, replacing b with c:

… a, α → β … b, α → β … c, α → β … c, α → β

What is the language of M’?

Page 20

Proof by contradiction. Suppose there is a DPDA M that recognizes A. It must be in an accept state only after processing a^i b^i and a^i b^{2i}.

… a, α → β … b, α → β … b, α → β … b, α → β

Construct M’: copy all the states in the second half, replacing b with c:

… a, α → β … b, α → β … c, α → β … c, α → β

Not a Context-Free Language!

We have a contradiction: if A is in L(DPDA), we could use the DPDA that recognizes A to construct a DPDA that recognizes a non-context-free language! Hence, A must not be in L(DPDA).

Page 21

[Figure: Venn diagram: All Languages ⊃ Context-Free (CFG or NPDA) ⊃ Deterministic Context-Free Languages ⊃ Regular Languages (DFA, NFA, RE, RG), with a^n b^n and A marked.]

Deterministic Context-Free Languages: recognized by a DPDA (or DCFG)

Context-Free Languages ⊃ Deterministic Context-Free Languages ⊃ Regular Languages

Page 22

DFAs in Practice

Page 23

Malware Scanner

W32.Bolzano.Gen: 576a222bd2c20400558b4c240cd9ffff07fbffffff{0-2}5c4e544c445200{0-2}5c57494e4e545c73797374656d33325c6e746f736b726e6c2e65786500{0-29}3b4658

W32.MyLife.E: 7a6172793230*40656d61696c2e636f6d

Note: These are the signatures from ClamAV, an open source virus scanner.
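Signatures like these describe regular languages, so they can be turned into ordinary regular expressions. The sketch below assumes that `*` means "any number of bytes" and `{n-m}` means "between n and m arbitrary bytes". It handles only those two wildcards plus plain hex byte pairs, not the full signature syntax, and the function name `signature_to_regex` is made up for this illustration.

```python
import re


def signature_to_regex(sig):
    """Translate a hex signature with * and {n-m} wildcards into a compiled bytes regex."""
    parts = []
    i = 0
    while i < len(sig):
        if sig[i] == "*":
            parts.append(b".*")                      # any number of bytes
            i += 1
        elif sig[i] == "{":
            close = sig.index("}", i)
            low, high = sig[i + 1:close].split("-")
            parts.append(b".{%d,%d}" % (int(low), int(high)))
            i = close + 1
        else:
            byte = bytes.fromhex(sig[i:i + 2])       # two hex digits = one literal byte
            parts.append(re.escape(byte))
            i += 2
    # DOTALL so that '.' also matches the newline byte.
    return re.compile(b"".join(parts), re.DOTALL)


mylife = signature_to_regex("7a6172793230*40656d61696c2e636f6d")
print(bool(mylife.search(b"mail from zary20-test@email.com")))
```

The two literal parts of the W32.MyLife.E signature decode to the ASCII strings `zary20` and `@email.com`, so any input that contains the first followed later by the second will match.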

[Figure labels: Files, Network Traffic]

Page 24

String Matching

q0 –t→ q1 –r→ q2 –u→ q3 –t→ q4 –h→ q5

We hold these truths to be self-evident, that …

How much work is it to scan a string of length N for a signature?
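One way to explore this question is to complete the six-state chain into a full DFA, adding the transitions that say where to go on a mismatch, and then feed it the text one character at a time. The sketch below uses the standard Knuth-Morris-Pratt style table construction, which is an assumption about how the missing transitions are filled in, not something shown on the slide. It also assumes a non-empty pattern, and the function names are made up for this illustration.

```python
def build_dfa(pattern):
    """dfa[state][char] -> next state; a missing entry means 'go back to state 0'."""
    m = len(pattern)
    dfa = [dict() for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    restart = 0  # state the DFA would be in after reading pattern[1:j]
    for j in range(1, m + 1):
        dfa[j].update(dfa[restart])        # on a mismatch, behave like the restart state
        if j < m:
            dfa[j][pattern[j]] = j + 1     # on a match, advance along the chain
            restart = dfa[restart].get(pattern[j], 0)
    return dfa


def dfa_find(text, pattern):
    """Return the index of the first match, or -1. Takes one transition per input character."""
    dfa = build_dfa(pattern)
    state = 0
    for i, ch in enumerate(text):
        state = dfa[state].get(ch, 0)
        if state == len(pattern):          # reached the accepting state (q5 for "truth")
            return i - len(pattern) + 1
    return -1


text = "We hold these truths to be self-evident, that"
print(dfa_find(text, "truth"))
```

Because the DFA never backs up in the input, scanning a string of length N takes exactly N transitions once the table is built.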

Page 25

Faster String Matching

q0 –t→ q1 –r→ q2 –u→ q3 –t→ q4 –h→ q5

We hold these truths to be self-evident, that …

s[4] = h? s[10] = h?

s[9] = t? s[8] = u?

Skip table:
a, b, c, d, e, f, g, i, j, k, l, m, n, o, p, q, s, v, w, x, y, z: 6
h: 0
r: 4
t: 1
u: 2
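The skipping idea can be sketched with the Horspool simplification of Boyer-Moore: compare the pattern right to left, and on a mismatch shift by an amount looked up from the text character aligned with the pattern's last position. This is a swapped-in textbook variant, not necessarily the scheme drawn on the slide. Its shift values follow the common convention (a character absent from the pattern shifts by the full pattern length), so they need not agree with the skip table above. The function names are made up for this illustration.

```python
def build_shift_table(pattern):
    """For each character except the last, distance from its last occurrence to the pattern's end."""
    m = len(pattern)
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_find(text, pattern):
    """Return (index of first match or -1, number of character comparisons made)."""
    m, n = len(pattern), len(text)
    shift = build_shift_table(pattern)
    comparisons = 0
    pos = 0
    while pos + m <= n:
        j = m - 1
        while j >= 0:                      # compare right to left
            comparisons += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        if j < 0:
            return pos, comparisons
        # Shift based on the text character under the pattern's last position.
        pos += shift.get(text[pos + m - 1], m)
    return -1, comparisons


text = "We hold these truths to be self-evident, that"
index, comparisons = horspool_find(text, "truth")
print(index, comparisons, len(text))
```

On this text the search inspects far fewer characters than the text contains, because most alignments are rejected after a single comparison.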

Page 26

DFA / Skipping DFA

Is a “Skipping DFA” still a DFA?

(That is, does it still only accept the Regular Languages?)

Page 27

J. Strother Moore (UT Austin)

Boyer-Moore Fast String Searching Algorithm (1977)

Best case: N/(w+1) comparisons where N is the length of the text and w is the length of the search string

Is this fast enough for a malware scanner?

Page 28

Virus Detection

Total number of signatures: 720,033

[Figure: signature database size (MB, axis from 2 to 12) by date (11/01 to 03/06) for Symantec and RAV AV. From Nate Paul’s study.]

Can we scan one input for many possible malware signatures quickly?

Page 29

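Combining DFAs?

Regular languages are closed under union:

[Figure: a new start state q0 with ε-transitions to qA0 and qB0, the start states of machines A and B; each machine then continues on its own, e.g. qA0 –a→ qA1 and qB0 –a→ qB1]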

How many states are there now?
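To get a feel for the state count, here is a sketch of the product construction, which builds a DFA for the union directly instead of going through the ε-transition NFA. It is a different technique from the one drawn on the slide, chosen because the result is deterministic and its size is easy to count. The tuple layout and the two example machines are made up for this illustration.

```python
from itertools import product


def union_dfa(dfa_a, dfa_b, alphabet):
    """Each DFA is (states, start, accepting, delta), with delta[(state, symbol)] -> state."""
    states_a, start_a, accept_a, delta_a = dfa_a
    states_b, start_b, accept_b, delta_b = dfa_b
    states = set(product(states_a, states_b))          # one state per pair
    delta = {((p, q), c): (delta_a[(p, c)], delta_b[(q, c)])
             for (p, q) in states for c in alphabet}
    accepting = {(p, q) for (p, q) in states if p in accept_a or q in accept_b}
    return states, (start_a, start_b), accepting, delta


def accepts(dfa, s):
    _, state, accepting, delta = dfa
    for c in s:
        state = delta[(state, c)]
    return state in accepting


# Machine A: even number of a's. Machine B: input ends with b.
even_a = ({0, 1}, 0, {0}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1})
ends_b = ({0, 1}, 0, {1}, {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1})
both = union_dfa(even_a, ends_b, "ab")
print(len(both[0]), accepts(both, "ab"), accepts(both, "a"))
```

The combined machine has |QA| × |QB| states, so combining many signature DFAs this way multiplies the state counts rather than adding them.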

Page 30

Signatures

First byte:    Set of signatures:
00000000       ~720000/256
00000001       ~720000/256
00000010       ~720000/256
…
11111111       ~720000/256

Page 31

Try a Trie

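[Figure: trie of signatures. The root q0 has one child per first byte (0x00 → q00, 0x01 → q01, 0x02 → q02, …, 0xFF → qFF); each child branches again on the second byte (q0000, q0001, q0002, …, q01FF).]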

720000/(256*256) ~ 11

Alfred V. Aho and Margaret J. Corasick, 1975

q0000Alureona

0x02
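The trie idea extends to scanning for all signatures in a single pass by adding failure links, which is the core of the Aho-Corasick algorithm. The following is a compact sketch for byte-string patterns under simplifying assumptions: exact patterns only, no wildcards, everything held in memory. The function names and the toy patterns are made up for this illustration and are not real signatures.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie edges, failure links, and per-state output lists for the given patterns."""
    goto = [{}]      # goto[state][byte] -> state
    fail = [0]       # state to fall back to on a mismatch
    out = [[]]       # patterns that end at this state
    for pat in patterns:
        state = 0
        for b in pat:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(pat)
    # Breadth-first pass: depth-1 states fail to the root; deeper states follow their parent.
    queue = deque(goto[0].values())
    while queue:
        r = queue.popleft()
        for b, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and b not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(b, 0)
            out[s] = out[s] + out[fail[s]]   # inherit matches that end here as suffixes
    return goto, fail, out


def scan(data, automaton):
    """Return (start index, pattern) for every occurrence of any pattern in data."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits


automaton = build_automaton([b"he", b"she", b"his", b"hers"])
print(scan(b"ushers", automaton))
```

The scan reads each input byte once, and the trie shares common prefixes among patterns, so its size grows with the total length of the signatures rather than multiplying as in the product construction.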

Page 32

Scanner Demo

http://www.virustotal.com

Page 33

Evasive Malware

Metamorphic Code: as the virus propagates, each new copy is different

How hard is it to automatically modify code without changing its behavior?

Page 34

Detecting Evasive Malware

• Less exact signatures (e.g., W32.MyLife.E: 7a6172793230*40656d61696c2e636f6d)
  – Dangerous: start matching benign programs if you’re not careful!
• Behavioral signatures: match the behavior, not the program text
  – Undecidable in general (we’ll see in a few weeks)
  – Expensive and difficult in practice (but done by all decent scanners)

Page 35

Faster String Scanning

Page 36

Charge

• We focus on DFAs, NFAs, PDAs, CFGs, etc. as abstract models: Number of states, time to process, etc. don’t matter

• Lots of real applications of these models: but in practice, what matters is different

If you have topics you want me to review, post comments (on today’s class announcement) by 5pm tomorrow.