Zadar, August 2010 1 Learning from Text Colin de la Higuera University of Nantes

Learning from Text

Jan 23, 2016

Page 1: Learning from Text

Zadar, August 2010 1

Learning from Text

Colin de la Higuera, University of Nantes

Page 2: Learning from Text

Cdlh 2010

Acknowledgements

Laurent Miclet, Jose Oncina, Tim Oates, Anne-Muriel Arigon, Leo Becerra-Bonache, Rafael Carrasco, Paco Casacuberta, Pierre Dupont, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Jean-Christophe Janodet, Satoshi Kobayashi, Thierry Murgue, Frédéric Tantini, Franck Thollard, Enrique Vidal, Menno van Zaanen, ...

http://pagesperso.lina.univ-nantes.fr/~cdlh/
http://videolectures.net/colin_de_la_higuera/

Page 3: Learning from Text

Outline

1. Motivations, definition and difficulties
2. Some negative results
3. Learning k-testable languages from text
4. Learning k-reversible languages from text
5. Conclusions

http://pagesperso.lina.univ-nantes.fr/~cdlh/slides/
Chapters 8 and 11

Page 4: Learning from Text

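1 Identification in the limit

[Figure: a class of languages $\mathcal{L}$, a set of presentations $\mathrm{Pres} \subseteq \mathbb{N} \to X$ and a class of grammars $\mathcal{G}$, linked by the naming function $\mathbb{L}$ (from $\mathcal{G}$ to $\mathcal{L}$), the function yields (from $\mathrm{Pres}$ to $\mathcal{L}$) and a learner $a$ (from presentations to $\mathcal{G}$)]

$\phi(\mathbb{N}) = \psi(\mathbb{N}) \Rightarrow \mathrm{yields}(\phi) = \mathrm{yields}(\psi)$

$\mathbb{L}(a(\phi)) = \mathrm{yields}(\phi)$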

Page 5: Learning from Text

Learning from text

Only positive examples are available
Danger of over-generalization: why not return $\Sigma^*$?
The problem is “basic”:
  Negative examples might not be available
  Or they might be heavily biased: near-misses, absurd examples…
Base line: all the rest is learning with help

Page 6: Learning from Text

GI as a search problem

[Figure: a search space with the labels "PTA" and "?"]

Page 7: Learning from Text

Questions?

Data is unlabelled…
Is this a clustering problem?
Is this a problem posed in other settings?

Page 8: Learning from Text

2 The theory

Gold 67: No super-finite class can be identified from positive examples (or text) only

Necessary and sufficient conditions for learning

Literature: inductive inference, ALT series, …

Page 9: Learning from Text

Limit point

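A class $\mathcal{L}$ of languages has a limit point if there exists an infinite sequence $(L_n)_{n \in \mathbb{N}}$ of languages in $\mathcal{L}$ such that $L_0 \subsetneq L_1 \subsetneq \dots \subsetneq L_n \subsetneq \dots$, and there exists another language $L \in \mathcal{L}$ such that $L = \bigcup_{n \in \mathbb{N}} L_n$

$L$ is called a limit point of $\mathcal{L}$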

Page 10: Learning from Text

L is a limit point

[Figure: nested languages $L_0, L_1, L_2, L_3, \dots, L_i, \dots$, all contained in $L$]

Page 11: Learning from Text

Theorem

If $\mathcal{L}$ admits a limit point, then $\mathcal{L}$ is not learnable from text

Proof: Let $s^i$ be a presentation in length-lex order for $L_i$, and $s$ be a presentation in length-lex order for $L$. Then $\forall n \in \mathbb{N}\ \exists i : \forall k \le n,\ s^i_k = s_k$

Note: having a limit point is a sufficient condition for non-learnability, not a necessary condition
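The proof idea can be illustrated on a small case. The sketch below is only an illustration under assumptions that are not in the slides: it takes $L_i = \{a^j : j \le i\}$ and $L = a^*$, and pads the presentation of the finite language $L_i$ by repeating its last string. The function names are hypothetical.

```python
def presentation_Li(i, n):
    """Elements s^i_0 .. s^i_n of a length-lex presentation of
    L_i = {a^j : j <= i}. Assumption: once L_i is exhausted, the
    last string a^i is repeated so that the presentation is infinite."""
    return ["a" * min(j, i) for j in range(n + 1)]


def presentation_L(n):
    """Elements s_0 .. s_n of the length-lex presentation of L = a*."""
    return ["a" * j for j in range(n + 1)]


# For every n there is an i (here i = n) whose presentation agrees
# with that of L on all indices k <= n.
for n in range(10):
    assert presentation_Li(n, n) == presentation_L(n)
```

Whatever finite prefix of the presentation of $L$ the learner has seen, that prefix is also the start of a presentation of some $L_i$, which is what the statement in the proof expresses.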

Page 12: Learning from Text

Mincons classes

A class is mincons if there is an algorithm which, given a sample $S$, builds a $G \in \mathcal{G}$ such that $S \subseteq L \subseteq \mathbb{L}(G) \Rightarrow L = \mathbb{L}(G)$

I.e. there is a unique minimum (for inclusion) consistent grammar

Page 13: Learning from Text

Accumulation point (Kapur 91)

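A class $\mathcal{L}$ of languages has an accumulation point if there exists an infinite sequence $(S_n)_{n \in \mathbb{N}}$ of sets such that $S_0 \subseteq S_1 \subseteq \dots \subseteq S_n \subseteq \dots$, and $L = \bigcup_{n \in \mathbb{N}} S_n \in \mathcal{L}$, and for any $n \in \mathbb{N}$ there exists a language $L_n'$ in $\mathcal{L}$ such that $S_n \subseteq L_n' \subsetneq L$.

The language $L$ is called an accumulation point of $\mathcal{L}$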

Page 14: Learning from Text

L is an accumulation point

[Figure: nested sets $S_0, S_1, S_2, S_3, \dots, S_n$ inside a language $L_n'$, itself inside $L$]

Page 15: Learning from Text

Theorem (for Mincons classes)

$\mathcal{L}$ admits an accumulation point iff $\mathcal{L}$ is not learnable from text

Page 16: Learning from Text

Infinite Elasticity

If a class of languages has a limit point there exists an infinite ascending chain of languages $L_0 \subsetneq L_1 \subsetneq \dots \subsetneq L_n \subsetneq \dots$

This property is called infinite elasticity

Page 17: Learning from Text

Infinite Elasticity

[Figure: an infinite sequence of strings $x_0, x_1, x_2, x_3, \dots, x_i, x_{i+1}, x_{i+2}, x_{i+3}, x_{i+4}, \dots$]

Page 18: Learning from Text

Finite elasticity

L has finite elasticity if it does not have infinite elasticity

Page 19: Learning from Text

Theorem (Wright)

If $\mathbb{L}(\mathcal{G})$ has finite elasticity and is mincons, then $\mathcal{G}$ is learnable.

Page 20: Learning from Text

Tell tale sets

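[Figure: a tell-tale set $T_G$ containing $x_1, x_2, x_3, x_4$ inside $\mathbb{L}(G)$; a language $\mathbb{L}(G')$ lying between $T_G$ and $\mathbb{L}(G)$ is marked "Forbidden"]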

Page 21: Learning from Text

Theorem (Angluin)

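$\mathcal{G}$ is learnable iff there is a computable partial function $\psi : \mathcal{G} \times \mathbb{N} \to \Sigma^*$ such that:

1) $\forall n \in \mathbb{N}$, $\psi(G,n)$ is defined iff $G \in \mathcal{G}$ and $\mathbb{L}(G) \neq \emptyset$

2) $\forall G \in \mathcal{G}$, $T_G = \{\psi(G,n) : n \in \mathbb{N}\}$ is a finite subset of $\mathbb{L}(G)$ called a tell-tale subset

3) $\forall G, G' \in \mathcal{G}$, if $T_G \subseteq \mathbb{L}(G')$ then $\mathbb{L}(G') \not\subsetneq \mathbb{L}(G)$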

Page 22: Learning from Text

Proposition (Kapur 91)

A language $L$ in $\mathcal{L}$ has a tell-tale subset iff $L$ is not an accumulation point.

(for mincons classes)

Page 23: Learning from Text

Summarizing

Many alternative ways of proving that identification in the limit is not feasible

Methodological-philosophical discussion

We still need practical solutions

Page 24: Learning from Text

3 Learning k-testable languages

P. García and E. Vidal. Inference of k-testable languages in the strict sense and applications to syntactic pattern recognition. Pattern Analysis and Machine Intelligence, 12(9):920–925, 1990

P. García, E. Vidal, and J. Oncina. Learning locally testable languages in the strict sense. In Workshop on Algorithmic Learning Theory (ALT 90), pages 325–338, 1990

Page 25: Learning from Text

Definition

Let $k > 0$. A k-testable language in the strict sense (k-TSS) is a 5-tuple $Z_k = (\Sigma, I, F, T, C)$ with:
$\Sigma$ a finite alphabet
$I, F \subseteq \Sigma^{k-1}$ (allowed prefixes of length k-1 and suffixes of length k-1)
$T \subseteq \Sigma^k$ (allowed segments)
$C \subseteq \Sigma^{<k}$ contains all strings of length less than k
Note that $I \cap F = C \cap \Sigma^{k-1}$

Page 26: Learning from Text

The k-testable language is $\mathbb{L}(Z_k) = (I\Sigma^* \cap \Sigma^*F - \Sigma^*(\Sigma^k - T)\Sigma^*) \cup C$

Strings (of length at least k) have to use a good prefix and a good suffix of length k-1, and all substrings of length k have to belong to T. Strings of length less than k should be in C

Or: $\Sigma^k - T$ defines the prohibited segments

Key idea: use a window of size k
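The window idea can be sketched as a membership test. This is a minimal sketch, not code from the slides: the function name `in_ktss` is hypothetical, strings are Python `str` values, the empty string plays the role of the empty word, and I, F, T, C are plain sets.

```python
def in_ktss(w, k, I, F, T, C):
    """Membership test for the k-TSS language given by (I, F, T, C)."""
    if len(w) < k:
        # Short strings are accepted only if listed in C.
        return w in C
    ok_prefix = w[:k - 1] in I
    ok_suffix = w[len(w) - (k - 1):] in F
    # Slide a window of size k over w: every segment must be allowed.
    windows = (w[i:i + k] for i in range(len(w) - k + 1))
    return ok_prefix and ok_suffix and all(seg in T for seg in windows)


# A 2-testable instance: I = F = {a}, T = {aa, ab, ba}, C = {empty word, a}.
I, F, T, C = {"a"}, {"a"}, {"aa", "ab", "ba"}, {"", "a"}
print(in_ktss("aba", 2, I, F, T, C))   # window never sees "bb"
print(in_ktss("abba", 2, I, F, T, C))  # "bb" is a prohibited segment
```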

Page 27: Learning from Text

An example (2-testable)

I = {a}
F = {a}
T = {aa, ab, ba}
C = {λ, a}

[Figure: the corresponding automaton, with transitions labeled a and b]

Page 28: Learning from Text

Window language

By sliding a window of size 2 over a string we can parse

ababaaababababaaaab OK
aaabbaaaababab not OK

Page 29: Learning from Text

The hierarchy of k-TSS languages

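$k\text{-TSS}(\Sigma) = \{L \subseteq \Sigma^* : L \text{ is } k\text{-TSS}\}$
All finite languages are in $k\text{-TSS}(\Sigma)$ if k is large enough!
$k\text{-TSS}(\Sigma) \subset [k+1]\text{-TSS}(\Sigma)$
$(ba^k)^* \in [k+1]\text{-TSS}(\Sigma)$
$(ba^k)^* \notin k\text{-TSS}(\Sigma)$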

Page 30: Learning from Text

A language that is not k-testable

[Figure: an automaton with transitions labeled a and b]

Page 31: Learning from Text

K-TSS inference

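Given a sample S, $a_{k\text{-TSS}}(S) = Z_k$ where $Z_k = (\Sigma(S), I(S), F(S), T(S), C(S))$ and
$\Sigma(S)$ is the alphabet used in S
$C(S) = \Sigma(S)^{<k} \cap S$
$I(S) = \Sigma(S)^{k-1} \cap \mathrm{Pref}(S)$
$F(S) = \Sigma(S)^{k-1} \cap \mathrm{Suff}(S)$
$T(S) = \Sigma(S)^k \cap \{v : uvw \in S\}$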
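The definitions of I(S), F(S), T(S) and C(S) translate almost directly into set comprehensions. The sketch below is one possible reading, with the hypothetical name `infer_ktss`; the alphabet is left implicit (it is the set of letters occurring in the sample).

```python
def infer_ktss(sample, k):
    """Compute (I, F, T, C) for a sample of strings and a window size k > 0."""
    sample = set(sample)
    # C(S): sample strings shorter than k.
    C = {w for w in sample if len(w) < k}
    # I(S), F(S): prefixes and suffixes of length k-1 occurring in S.
    I = {w[:k - 1] for w in sample if len(w) >= k - 1}
    F = {w[len(w) - (k - 1):] for w in sample if len(w) >= k - 1}
    # T(S): every length-k segment occurring inside a sample string.
    T = {w[i:i + k] for w in sample for i in range(len(w) - k + 1)}
    return I, F, T, C


I, F, T, C = infer_ktss(["a", "aa", "abba", "abbbba"], 3)
print(sorted(I), sorted(F), sorted(T), sorted(C))
```

Since every set is built by scanning the sample once, adding strings to the sample can only add elements to I, F, T and C.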

Page 32: Learning from Text

Example

S = {a, aa, abba, abbbba}
Let k = 3

Σ(S) = {a, b}
I(S) = {aa, ab}
F(S) = {aa, ba}
C(S) = {a, aa}
T(S) = {abb, bbb, bba}

$\mathbb{L}(a_{3\text{-TSS}}(S)) = a + aa + abbb^*a$ (aba is excluded, since the segment aba is not in T(S))

Page 33: Learning from Text

Building the corresponding automaton

Each string in $I \cup C$ and $\mathrm{PREF}(I \cup C)$ is a state
Each substring of length k-1 of strings in T is a state
λ is the initial state
Add a transition labeled b from u to ub for each state ub
Add a transition labeled b from au to ub for each aub in T
Each state/substring that is in F is a final state
Each state/substring that is in C is a final state
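The construction steps above can be sketched as follows. This is an illustrative sketch, not the author's code: `build_dfa` and `accepts` are hypothetical names, states are named by strings, and the rule "from u to ub for each state ub" is read as applying to the prefix states obtained from I and C.

```python
def build_dfa(k, I, F, T, C):
    """Build a DFA (states, delta, initial, finals) from a k-TSS tuple."""
    # Prefixes of strings in I and C (including the empty word) are states.
    prefix_states = {w[:i] for w in I | C for i in range(len(w) + 1)}
    # Both length-(k-1) substrings of each segment in T are states.
    states = prefix_states | {seg[j:j + k - 1] for seg in T for j in (0, 1)}
    delta = {}
    for ub in prefix_states:
        if ub:  # transition labeled b from u to ub
            delta[(ub[:-1], ub[-1])] = ub
    for seg in T:  # seg = aub: transition labeled b from au to ub
        delta[(seg[:-1], seg[-1])] = seg[1:]
    finals = (F | C) & states
    return states, delta, "", finals


def accepts(dfa, w):
    """Run the DFA on w; a missing transition rejects."""
    _, delta, state, finals = dfa
    for a in w:
        state = delta.get((state, a))
        if state is None:
            return False
    return state in finals


dfa = build_dfa(3, {"aa", "ab"}, {"aa", "ba"}, {"abb", "bbb", "bba"}, {"a", "aa"})
print([w for w in ["a", "aa", "aba", "abba", "abbbba"] if accepts(dfa, w)])
```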

Page 34: Learning from Text

Running the algorithm

S={a, aa, abba, abbbba}

I={aa, ab}

F={aa, ba}

T={abb, bbb, bba}

C={a, aa}

[Figure: the automaton built step by step from I, F, T and C, with states a, aa, ab, bb, ba and transitions labeled a and b]

Page 35: Learning from Text

Properties (1)

$S \subseteq \mathbb{L}(a_{k\text{-TSS}}(S))$

$\mathbb{L}(a_{k\text{-TSS}}(S))$ is the smallest k-TSS language that contains S
If there is a smaller one, some prefix, suffix or substring has to be absent

Page 36: Learning from Text

Properties (2)

$a_{k\text{-TSS}}$ identifies any k-TSS language in the limit from polynomial data
Once all the prefixes, suffixes and substrings have been seen, the correct automaton is returned

If $Y \subseteq S$, $\mathbb{L}(a_{k\text{-TSS}}(Y)) \subseteq \mathbb{L}(a_{k\text{-TSS}}(S))$

Page 37: Learning from Text

Properties (3)

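$\mathbb{L}(a_{(k+1)\text{-TSS}}(S)) \subseteq \mathbb{L}(a_{k\text{-TSS}}(S))$

In $I_{k+1}$ (resp. $F_{k+1}$ and $T_{k+1}$) there are fewer allowed prefixes (resp. suffixes or substrings) than in $I_k$ (resp. $F_k$ and $T_k$)

$\forall k > \max_{x \in S} |x|$, $\mathbb{L}(a_{k\text{-TSS}}(S)) = S$
Because for a large k, $T_k(S) = \emptyset$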

Page 38: Learning from Text

4 Learning k-reversible languages from text

D. Angluin. Inference of reversible languages. Journal of the Association for Computing Machinery, 29(3):741–765, 1982

Page 39: Learning from Text

The k-reversible languages

The class was proposed by Angluin (1982)
The class is identifiable in the limit from text
The class is composed of regular languages that can be accepted by a DFA such that its reverse is deterministic with a look-ahead of k

Page 40: Learning from Text

Let $A = (\Sigma, Q, \delta, I, F)$ be an NFA. We denote by $A^T = (\Sigma, Q, \delta^T, F, I)$ the reversal automaton with:

$\delta^T(q,a) = \{q' \in Q : q \in \delta(q',a)\}$
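A small sketch of the reversal, under an encoding chosen here for illustration (it is not from the slides): an NFA is a tuple `(states, transitions, initials, finals)` where `transitions` is a set of triples `(source, letter, target)`.

```python
def reverse(nfa):
    """Reversal automaton: flip every transition, swap initials and finals."""
    states, trans, initials, finals = nfa
    reversed_trans = {(r, a, p) for (p, a, r) in trans}
    return states, reversed_trans, finals, initials


def accepts(nfa, w):
    """Standard subset simulation of an NFA on the string w."""
    _, trans, initials, finals = nfa
    current = set(initials)
    for a in w:
        current = {r for (p, b, r) in trans if p in current and b == a}
    return bool(current & set(finals))


A = ({0, 1, 2}, {(0, "a", 1), (1, "b", 2)}, {0}, {2})
print(accepts(A, "ab"), accepts(reverse(A), "ba"))
```

With this encoding, reversing twice gives back the original automaton.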

Page 41: Learning from Text

[Figure: an automaton $A$ with states 0 to 4 and transitions labeled a and b, and its reversal $A^T$]

Page 42: Learning from Text

Some definitions

u is a k-successor of q if $|u| = k$ and $\delta(q,u) \neq \emptyset$

u is a k-predecessor of q if $|u| = k$ and $\delta^T(q,u^T) \neq \emptyset$

λ is 0-successor and 0-predecessor of any state
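These two notions can be computed by a depth-bounded recursion over the transitions. The sketch below assumes transitions are given as a set of `(source, letter, target)` triples; the function names and the three-state automaton are made up for illustration and are not the automaton of the slides.

```python
def k_successors(q, k, trans):
    """Strings u with |u| = k that label some path starting in q."""
    if k == 0:
        return {""}  # the empty word is a 0-successor of any state
    return {a + v for (p, a, r) in trans if p == q
            for v in k_successors(r, k - 1, trans)}


def k_predecessors(q, k, trans):
    """Strings u with |u| = k that label some path ending in q."""
    if k == 0:
        return {""}  # the empty word is a 0-predecessor of any state
    return {v + a for (p, a, r) in trans if r == q
            for v in k_predecessors(p, k - 1, trans)}


# Hypothetical automaton: 0 -a-> 1 -a-> 2 and 0 -b-> 2.
trans = {(0, "a", 1), (1, "a", 2), (0, "b", 2)}
print(k_successors(0, 2, trans), k_predecessors(2, 1, trans))
```

Reading u backwards in the reversal automaton from q is the same as finding a path labeled u that ends in q, which is what `k_predecessors` does.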

Page 43: Learning from Text

[Figure: automaton $A$ with states 0 to 4 and transitions labeled a and b]

aa is a 2-successor of 0 and 1 but not of 3
a is a 1-successor of 3
aa is a 2-predecessor of 3 but not of 1

Page 44: Learning from Text

An NFA is deterministic with look-ahead k if $\forall q, q' \in Q : q \neq q'$,

$(q, q' \in I) \lor (q, q' \in \delta(q'', a))$

$\Rightarrow$ (u is a k-successor of q) $\land$ (v is a k-successor of q') $\Rightarrow u \neq v$
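The condition can be checked by brute force over pairs of states. This is a sketch under the same illustrative encoding as before (transitions as `(source, letter, target)` triples); the function names and the example automaton are assumptions, not taken from the slides.

```python
from itertools import combinations


def k_successors(q, k, trans):
    """Strings u with |u| = k that label some path starting in q."""
    if k == 0:
        return {""}
    return {a + v for (p, a, r) in trans if p == q
            for v in k_successors(r, k - 1, trans)}


def deterministic_with_lookahead(states, trans, initials, k):
    """True iff no two competing states share a k-successor."""
    for q, q2 in combinations(states, 2):
        both_initial = q in initials and q2 in initials
        # q and q2 are both reached from one state by the same letter.
        same_source = any((p, a, q2) in trans
                          for (p, a, r) in trans if r == q)
        if both_initial or same_source:
            if k_successors(q, k, trans) & k_successors(q2, k, trans):
                return False
    return True


# Hypothetical NFA: 0 -a-> 1, 0 -a-> 2, 1 -b-> 3, 2 -c-> 3.
states = {0, 1, 2, 3}
trans = {(0, "a", 1), (0, "a", 2), (1, "b", 3), (2, "c", 3)}
print(deterministic_with_lookahead(states, trans, {0}, 0))
print(deterministic_with_lookahead(states, trans, {0}, 1))
```

In this example one letter of look-ahead (b versus c) is enough to tell states 1 and 2 apart, while no look-ahead is not.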

Page 45: Learning from Text

Prohibited:

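[Figure: two distinct states 1 and 2, both reached by a transition labeled a, each followed by the same string u, with $|u| = k$]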

Page 46: Learning from Text

Example

This automaton is not deterministic with look-ahead 1 but is deterministic with look-ahead 2

[Figure: an automaton with states 0 to 4 and transitions labeled a and b]

Page 47: Learning from Text

K-reversible automata

$A$ is k-reversible if $A$ is deterministic and $A^T$ is deterministic with look-ahead k

Example

[Figure: two automata over states 0, 1, 2 with transitions labeled a and b; the first is captioned "deterministic", the second "deterministic with look-ahead 1"]

Page 48: Learning from Text

Notations

$RL(\Sigma, k)$ is the set of all k-reversible languages over alphabet $\Sigma$

$RL(\Sigma)$ is the set of all k-reversible languages over alphabet $\Sigma$ (i.e. for all values of k)

$a_{k\text{-RL}}$ is the learning algorithm we describe

Page 49: Learning from Text

Properties

There are some regular languages that are not in $RL(\Sigma)$

$RL(\Sigma, k) \supsetneq RL(\Sigma, k-1)$

Page 50: Learning from Text

Violation of k-reversibility

Two states q, q' violate the k-reversibility condition if

they violate the deterministic condition: $q, q' \in \delta(q'', a)$

or they violate the look-ahead condition:

$q, q' \in F$, $\exists u \in \Sigma^k$: u is k-predecessor of both q and q'

$\exists u \in \Sigma^k$, $\delta(q,a) = \delta(q',a)$ and u is k-predecessor of both q and q'

Page 51: Learning from Text

Learning k-reversible automata

Key idea: the order in which the merges are performed does not matter!

Just merge states that do not comply with the conditions for k-reversibility

Page 52: Learning from Text

K-RL algorithm (ak-RL)

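Data: $k \in \mathbb{N}$, S sample of a k-RL language L

$A_0 = \mathrm{PTA}(S)$
$\pi = \{\{q\} : q \in Q\}$
While $\exists B, B' \in \pi$ k-reversibility violators do
    $\pi = \pi - B - B' \cup \{B \cup B'\}$
$A = A_0/\pi$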

Page 53: Learning from Text

K-RL Algorithm (ak-RL)

Data: $k \in \mathbb{N}$, S sample of a k-RL language L

$A = \mathrm{PTA}(S)$
While $\exists q, q'$ k-reversibility violators do
    $A = \mathrm{merge}(A, q, q')$
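The merge-until-no-violators loop can be sketched in Python as below. This is a naive illustration, not the author's implementation, and it makes no attempt to reach the stated complexity bounds. Assumptions made here: states are named by PTA prefixes, transitions are `(source, letter, target)` triples, a merge keeps the shorter name so that the empty prefix stays the initial state, and all function names are hypothetical. Requires Python 3.8 or later.

```python
from itertools import combinations


def pta(sample):
    """Prefix tree acceptor: one state per prefix of the sample."""
    states = {w[:i] for w in sample for i in range(len(w) + 1)}
    trans = {(u[:-1], u[-1], u) for u in states if u}
    return states, trans, set(sample)


def k_preds(q, k, trans):
    """Strings of length k labelling some path that ends in q."""
    if k == 0:
        return {""}
    return {v + a for (p, a, r) in trans if r == q
            for v in k_preds(p, k - 1, trans)}


def find_violators(states, trans, finals, k):
    # Determinism: one state has two different successors by one letter.
    for (p, a, q), (p2, a2, q2) in combinations(trans, 2):
        if p == p2 and a == a2 and q != q2:
            return q, q2
    # Look-ahead: both final, or a shared successor by one letter,
    # together with a common k-predecessor.
    for q, q2 in combinations(states, 2):
        both_final = q in finals and q2 in finals
        share_succ = any((q2, a, r) in trans for (p, a, r) in trans if p == q)
        if (both_final or share_succ) and \
                k_preds(q, k, trans) & k_preds(q2, k, trans):
            return q, q2
    return None


def merge(states, trans, finals, q, q2):
    keep, drop = sorted((q, q2), key=lambda s: (len(s), s))
    ren = lambda s: keep if s == drop else s
    return ({ren(s) for s in states},
            {(ren(p), a, ren(r)) for (p, a, r) in trans},
            {ren(s) for s in finals})


def k_rl(sample, k):
    states, trans, finals = pta(sample)
    while (pair := find_violators(states, trans, finals, k)) is not None:
        states, trans, finals = merge(states, trans, finals, *pair)
    return states, trans, finals


def accepts(automaton, w):
    _, trans, finals = automaton
    current = {""}  # the empty prefix is the initial state
    for a in w:
        current = {r for (p, b, r) in trans if p in current and b == a}
    return bool(current & finals)


A = k_rl(["a", "aa", "abba", "abbbba"], 2)
print(len(A[0]), accepts(A, "abbbbbba"), accepts(A, "abbba"))
```

The order in which violators are found depends on set iteration order; since the order of merges does not matter for the result, only the state names could differ between runs, not the accepted language.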

Page 54: Learning from Text

Let S={a, aa, abba, abbbba}

[Figure: PTA(S), with states a, aa, ab, abb, abba, abbb, abbbb, abbbba and transitions labeled a and b]

k=2

Violators, for u = ba

Page 55: Learning from Text

S={a, aa, abba, abbbba}

[Figure: the automaton after the first merge, with states a, aa, ab, abb, abba, abbb, abbbb and transitions labeled a and b]

k=2

Violators, for u = bb

Page 56: Learning from Text

S={a, aa, abba, abbbba}

[Figure: the automaton after the second merge, with states a, aa, ab, abb, abbb, abba and transitions labeled a and b]

k=2

Suppose k=1. Then now a, aa and abba violate.

Page 57: Learning from Text

Properties (1)

$\forall k \ge 0$, $\forall S$, $a_{k\text{-RL}}(S)$ is a k-reversible language

$\mathbb{L}(a_{k\text{-RL}}(S))$ is the smallest k-reversible language that contains S

The class $RL(\Sigma, k)$ is identifiable in the limit from text

Page 58: Learning from Text

Properties (2)

A regular language L is k-reversible iff

$(u_1 v)^{-1}L \cap (u_2 v)^{-1}L \neq \emptyset$ and $|v| = k$ $\Rightarrow$ $(u_1 v)^{-1}L = (u_2 v)^{-1}L$

(if two strings end with the same suffix v of length k and share a continuation into L, then they are Nerode-equivalent)

Page 59: Learning from Text

Properties (3)

$\mathbb{L}(a_{k\text{-RL}}(S)) \subseteq \mathbb{L}(a_{(k-1)\text{-RL}}(S))$

$RL(\Sigma, k) \supsetneq RL(\Sigma, k-1)$

Page 60: Learning from Text

Properties (4)

The time complexity is $O(k\|S\|^3)$

The space complexity is $O(\|S\|)$

Page 61: Learning from Text

Properties (4) Polynomial aspects

Polynomial characteristic sets
Polynomial update time
But not necessarily a polynomial number of mind changes

Page 62: Learning from Text

Extensions

Sakakibara built an extension for context-free grammars whose tree language is k-reversible

Marion & Besombes propose an extension to tree languages

Different authors propose to learn these automata and then estimate the probabilities as an alternative to learning stochastic automata

Page 63: Learning from Text

Exercises

Build a language L that is not k-reversible, $\forall k \ge 0$

Prove that the class of all k-reversible languages is not learnable from text

Run ak-RL on S={aa, aba, abb, abaaba, baaba} for k=0,1,2,3

Page 64: Learning from Text

Solution (idea)

$L_k = \{a^i : i \le k\}$

Then for each k: $L_k$ is k-reversible but not (k-1)-reversible.

And $\bigcup_k L_k = a^*$

So there is an accumulation point…

Page 65: Learning from Text

5 Conclusions

Window languages

Page 66: Learning from Text

Exercise (1)

Let $J_n = \{w \in \Sigma^* : |w| \le n\}$
And $\mathcal{J} = \bigcup_n \{J_n\}$
Find an algorithm that identifies $\mathcal{J}$ in the limit from text
Prove that this algorithm works in polynomial update time
Prove that it admits a polynomial locking sequence (characteristic set)
Prove that the algorithm does not meet Yokomori’s conditions

Page 67: Learning from Text

Exercise (2)

Let $B_{n,w} = \{u \in \Sigma^* : d_{edit}(u,w) \le n\}$

And $\mathcal{B} = \bigcup_{n,w} \{B_{n,w}\}$

Find an algorithm that identifies B in the limit from text.

Does your algorithm meet Yokomori’s conditions?