Characterization of state merging strategies which ensure identification in the limit from complete data
Cristina Bibire
Dec 14, 2015
History
Motivation
Preliminaries
RPNI
Further Research
Bibliography
History

In the second half of the 60's, it was Gold who first formulated the process of learning formal languages. Motivated by observing children's learning, he proposed the idea that learning is an infinite process of making guesses of grammars: it does not terminate in finitely many steps, but is only able to converge to a correct grammar in the limit.
Gold’s algorithm for learning regular languages from both positive and negative examples finds the correct automaton when a characteristic sample is included in the data.
The problem of learning the minimum-state DFA that is consistent with a given sample has been actively studied for over two decades. Many algorithms have been developed: RPNI (Regular Positive and Negative Inference), ALERGIA, MDI (Minimum Divergence Inference), DDSM (Data Driven State Merging) and many others.
Even if there is no guarantee of identification from the available data, the existence of the associated characteristic sets means that these algorithms converge towards the correct solution.
Motivation

Given two sets of strings, how can we decide whether or not they contain a characteristic sample for a given algorithm? How do we decide which algorithm to apply? How many consistent DFAs can we find? Which is the best search strategy: exhaustive search, beam search, greedy search, etc.?
The importance of learning regular languages (or, equivalently, identification of the corresponding DFA) is justified by the fact that algorithms treating the inference problem for DFA can be nicely adapted for larger classes of grammars, for instance: even linear grammars (Takada 88 & 94; Sempere & Garcia 94; Makinen 96), subsequential functions (Oncina, Garcia & Vidal 93), tree automata (Knuutila) or context-free grammars from skeletons (Sakakibara 90).
The problem of exactly learning the target DFA from an arbitrary set of labeled examples and the problem of approximating the target DFA from labeled examples are both known to be hard problems. Thus the question as to whether DFA are efficiently learnable under some restricted but fairly general and practically useful classes of distribution is clearly of interest.
Preliminaries

We will assume that the target DFA being learned is a canonical DFA.

Let S+ and S− denote the sets of positive and negative examples of A, respectively. A is consistent with a sample S = (S+, S−) if it accepts all the positive examples and rejects all the negative examples.

A set S+ is said to be structurally complete with respect to a DFA A if it covers each transition of A and uses each final state of A.

Given a set S+, let PTA(S+) denote the prefix tree automaton for S+. PTA(S+) is a DFA that contains a path from the start state to an accepting state for each string in S+, modulo common prefixes.

Ex: For S+ = {00, 1, 010, 011, 100}, the states of PTA(S+) are labeled based on the standard order of the set Pr(S+): λ, 0, 1, 00, 01, 10, 010, 011, 100. [Figure: the prefix tree automaton for this S+.]
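The construction above can be sketched in runnable Python. This is an illustrative implementation, not code from the talk (the function name `pta` and the tuple representation are assumptions of this sketch); it uses the S+ of the example:

```python
def pta(positives):
    """Build PTA(S+): states are the prefixes of S+ in standard order,
    state 0 (the empty prefix) is the start state, and the final states
    are exactly the words of S+."""
    prefixes = {""}
    for w in positives:
        prefixes.update(w[:i] for i in range(1, len(w) + 1))
    # standard order: by length, then lexicographically
    states = sorted(prefixes, key=lambda p: (len(p), p))
    idx = {p: i for i, p in enumerate(states)}
    # each nonempty prefix p is reached from its parent p[:-1] on symbol p[-1]
    delta = {(idx[p[:-1]], p[-1]): idx[p] for p in states if p}
    finals = {idx[w] for w in positives}
    return states, delta, finals

# the S+ of the slide's example (λ is represented as the empty string)
states, delta, finals = pta(["00", "1", "010", "011", "100"])
print(states)  # ['', '0', '1', '00', '01', '10', '010', '011', '100']
```

Since the tree is built over prefixes, the automaton is deterministic by construction, and every string of S+ labels a path from state 0 to a final state.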
Given a DFA A = (Q, Σ, δ, q0, F), π = {B1, ..., Bn} is a partition of Q iff:
1. Each Bi is nonempty;
2. Bi ∩ Bj = ∅ for all i ≠ j;
3. ∪i Bi = Q.

Ex: Consider the DFA A = ({p, q, r}, {0, 1}, δ, p, {p, r}). [Figure: the transition diagram of A over the states p, q, r.]

The partitions of Q are:
π1 = {{p}, {q}, {r}}, π2 = {{p, q}, {r}}, π3 = {{p, q, r}}, π4 = {{p, r}, {q}}, π5 = {{p}, {q, r}}.

[Figure: the lattice of partitions, with π1 at the bottom, π3 at the top, and an edge between πi and πj iff πi covers πj, i.e. πj ≤ πi with no partition strictly between them.]
Given a DFA A and a partition π on the set of states Q of A, we define the quotient automaton A/π obtained by merging the states of A that belong to the same block of the partition π.
Note that a quotient automaton of a DFA might be an NFA, and vice versa.
Ex: Given M: [Figure: a DFA with states p, q, r, where p --0--> q and q --1--> r.]

A structurally complete set for M is S+ = {01}.

A = PTA(S+): [Figure: the prefix tree automaton of S+ = {01}.]

Then, for π4 = {{p, r}, {q}}, the quotient A/π4 is: [Figure: block {p, r} with a 0-transition to {q}, and {q} with a 1-transition back to {p, r}.]
Search space comprising the π-quotient automata of A:

[Figure: the lattice of quotient automata A/π1, ..., A/π5. A/π1 = A has the three states p, q, r; A/π2 merges p and q; A/π5 merges q and r; A/π4 merges p and r; A/π3 merges all three states into one, with self-loops on 0 and 1.]
The set of all derived automata obtained by systematically merging the states of A represents a lattice of finite state automata.
Given a canonical DFA M and a set S+ that is structurally complete with respect to M, the lattice derived from PTA(S+) is guaranteed to contain M (Pao & Carr, 1978; Parekh & Honavar, 1993; Dupont et al., 1994).
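The merge that produces a quotient automaton can be sketched as follows. Transitions map a (block, symbol) pair to a *set* of blocks, which makes the possible nondeterminism of the quotient explicit; the function name and dictionary representation are assumptions of this sketch, applied to the M and π4 of the earlier example:

```python
def quotient(delta, finals, blocks):
    """Quotient automaton of a DFA under a partition of its states.
    delta: {(state, symbol): state}; blocks: list of frozensets partitioning
    the states. Returns block-level transitions (sets of target blocks,
    since the result may be an NFA) and the set of final blocks."""
    block_of = {s: b for b in blocks for s in b}
    qdelta, qfinals = {}, set()
    for (s, x), t in delta.items():
        qdelta.setdefault((block_of[s], x), set()).add(block_of[t])
    for s in finals:
        qfinals.add(block_of[s])
    return qdelta, qfinals

# M from the slide: p --0--> q --1--> r, merged under pi4 = {{p, r}, {q}}
delta = {("p", "0"): "q", ("q", "1"): "r"}
B1, B2 = frozenset({"p", "r"}), frozenset({"q"})
qd, qf = quotient(delta, {"r"}, [B1, B2])
# qd: {(B1, '0'): {B2}, (B2, '1'): {B1}}; qf: {B1}
```

Here the quotient happens to stay deterministic (each transition set is a singleton); merging states that share an outgoing symbol with different targets is what produces a genuine NFA.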
Preliminaries

Pr(α) – the set of prefixes of α.
Pr(L) = {α ∈ Σ* : ∃β ∈ Σ* such that αβ ∈ L} – the set of prefixes of L.
TL(α) = {β ∈ Σ* : αβ ∈ L} – the set of tails of α in L.

The standard order of strings over the alphabet Σ is denoted by <. For Σ = {a, b}, the standard enumeration of strings over Σ is λ, a, b, aa, ab, ba, bb, ...

Sp(L) = {α ∈ Pr(L) : ¬∃β ∈ Σ* such that TL(β) = TL(α) and β < α} – the short prefixes of L.
N(L) = {λ} ∪ {αa : α ∈ Sp(L), a ∈ Σ, αa ∈ Pr(L)} – the kernel of L.
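The standard order compares strings first by length and then lexicographically; the generator below (an illustrative sketch, with λ rendered as the empty string `''`) reproduces the enumeration above for Σ = {a, b}:

```python
from itertools import count, islice, product

def standard_enumeration(alphabet):
    """Yield all strings over `alphabet` in standard order:
    shorter strings first, lexicographic order within each length."""
    for n in count(0):
        for tup in product(sorted(alphabet), repeat=n):
            yield "".join(tup)

print(list(islice(standard_enumeration("ab"), 7)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```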
Definition: A sample S = (S+, S−) is said to be characteristic with respect to a regular language L (with the canonical DFA A) if it satisfies the following two conditions:
1. ∀α ∈ N(L): if α ∈ L then α ∈ S+, else ∃β ∈ Σ* such that αβ ∈ S+;
2. ∀α ∈ Sp(L), ∀β ∈ N(L): if TL(α) ≠ TL(β) then ∃γ ∈ Σ* such that (αγ ∈ S+ and βγ ∈ S−) or (αγ ∈ S− and βγ ∈ S+).

Intuitively, condition 1 implies structural completeness with respect to A, and condition 2 implies that for any two distinct states of A there is a suffix γ that correctly distinguishes them.

Notice that:
- if you add more strings to a characteristic sample, it is still characteristic;
- there can be many different characteristic samples.
RPNI

The Regular Positive and Negative Inference (RPNI) algorithm [Oncina & Garcia, 1992] is a polynomial-time algorithm for identifying a DFA consistent with a given sample S = (S+, S−). It can be shown that, given a characteristic sample for the target DFA, the algorithm is guaranteed to return a canonical representation of the target DFA [Oncina & Garcia, 1992; Dupont, 1996].

A := PTA(S+); K := {q0}; Fr := {δ(q0, a) : a ∈ Σ};
While Fr ≠ ∅ do
    choose q from Fr
    if ∃p ∈ K such that L(dmerge(A, p, q)) ∩ S− = ∅
        then A := dmerge(A, p, q)
        else K := K ∪ {q}
    Fr := {δ(q, a) : q ∈ K, a ∈ Σ} \ K
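The pseudocode above can be sketched as runnable Python. This is a minimal, illustrative implementation, not the authors' code: the deterministic merge `dmerge` is realized by folding state pairs with a union-find structure, a tentative merge is kept only if the resulting quotient rejects every negative example, and (as a simplification) cascaded merges between states of K are allowed. All names are assumptions of this sketch.

```python
def pta(positives):
    """Prefix tree acceptor: states are the prefixes of S+ in standard order."""
    prefixes = {""}
    for w in positives:
        prefixes.update(w[:i] for i in range(1, len(w) + 1))
    states = sorted(prefixes, key=lambda p: (len(p), p))  # standard order
    idx = {p: i for i, p in enumerate(states)}
    delta = {(idx[p[:-1]], p[-1]): idx[p] for p in states if p}
    return len(states), delta, {idx[w] for w in positives}

class UF:
    """Union-find; the representative of a block is its smallest state id."""
    def __init__(self, n):
        self.par = list(range(n))
    def find(self, x):
        while self.par[x] != x:
            self.par[x] = self.par[self.par[x]]
            x = self.par[x]
        return x
    def copy(self):
        u = UF(len(self.par)); u.par = self.par[:]; return u

def fold(delta, uf, p, q):
    """Merge the blocks of p and q, cascading until the quotient is
    deterministic again (this realizes dmerge)."""
    stack = [(p, q)]
    while stack:
        a, b = stack.pop()
        a, b = uf.find(a), uf.find(b)
        if a == b:
            continue
        uf.par[max(a, b)] = min(a, b)
        # a symbol leading from the merged block to two different blocks
        # forces a further merge
        succ = {}
        for (s, x), t in delta.items():
            if uf.find(s) == uf.find(a):
                succ.setdefault(x, []).append(t)
        for ts in succ.values():
            stack.extend((ts[0], t) for t in ts[1:])

def quotient(delta, finals, uf):
    d = {(uf.find(s), x): uf.find(t) for (s, x), t in delta.items()}
    return d, {uf.find(s) for s in finals}

def accepts(d, f, start, w):
    q = start
    for x in w:
        q = d.get((q, x))
        if q is None:
            return False
    return q in f

def rpni(positives, negatives):
    n, delta, finals = pta(positives)
    uf, red = UF(n), [0]          # red plays the role of K
    while True:
        d, f = quotient(delta, finals, uf)
        red_reps = {uf.find(r) for r in red}
        blue = sorted({t for (s, x), t in d.items() if s in red_reps} - red_reps)
        if not blue:              # frontier Fr is empty
            return d, f, uf.find(0)
        q = blue[0]               # smallest frontier state in standard order
        for p in sorted(red_reps):
            trial = uf.copy()
            fold(delta, trial, p, q)
            d2, f2 = quotient(delta, finals, trial)
            if not any(accepts(d2, f2, trial.find(0), w) for w in negatives):
                uf = trial        # consistent merge: keep it
                break
        else:
            red.append(q)         # no consistent merge: promote q to K

# the characteristic sample of the mod-3 example from the talk
d, f, start = rpni(["0101", "10100", "1110"], ["0", "1", "1001"])
```

By construction the returned automaton accepts every string of S+ (it is a quotient of PTA(S+)) and rejects every string of S−, since each accepted merge was checked against the negative examples.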
Ex: Suppose our language L is the set of all words w ∈ {0,1}* which, read as binary numbers, are congruent with 2 (mod 3).

A canonical automaton for this language is: [Figure: a three-state DFA over {0,1} tracking the value of the input modulo 3, whose accepting state corresponds to the residue 2.]

It can be easily verified that S = (S+, S−) is a characteristic sample, where S+ = {0101, 10100, 1110} and S− = {0, 1, 1001}.

The PTA(S+) is: [Figure: the prefix tree automaton over the 13 states Pr(S+) = {λ, 0, 1, 01, 10, 11, 010, 101, 111, 0101, 1010, 1110, 10100}.]
[Figures: a step-by-step trace of RPNI on this sample. Starting from A = PTA(S+) with K = {λ} and frontier Fr = {0, 1}, the algorithm repeatedly takes the smallest state of Fr in standard order and tries to merge it with each state of K in turn. A merge is rejected whenever the resulting automaton would accept one of the negative examples 0, 1 or 1001; when every candidate merge fails, the state is promoted to K. The accepted merges successively collapse the 13 states of the PTA, and the trace ends with the three-block quotient {λ, 0, 11}, {1, 01, 111, 1010}, {10, 010, 101, 0101, 1110, 10100}, which is the canonical automaton of L.]
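As a sanity check on this example, membership in L can be computed directly: reading a binary digit d takes the value v of the prefix read so far to 2v + d, so a single counter modulo 3 simulates the canonical three-state automaton. A minimal sketch (assuming words are read most-significant-bit first; `mod3_accepts` is an illustrative name):

```python
def mod3_accepts(w):
    """Simulate the canonical DFA: the state is the value of the prefix
    read so far modulo 3; accept iff the final residue is 2."""
    s = 0
    for d in w:
        s = (2 * s + int(d)) % 3   # appending a bit doubles the value
    return s == 2

assert all(mod3_accepts(w) for w in ["0101", "10100", "1110"])   # S+
assert not any(mod3_accepts(w) for w in ["0", "1", "1001"])      # S-
```

This confirms that the given sample is consistent with L: every positive example has residue 2, and no negative example does.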
The convergence of the RPNI algorithm relies on the fact that sooner or later, the set of labeled examples seen by the learner will include a characteristic set.
If the stream of examples provided to the learner is drawn according to a simple distribution, the characteristic set would be made available relatively early (during learning) with a sufficiently high probability and hence the algorithm will converge quickly to the desired target.
RPNI is an optimistic algorithm: at each step two states are compared and the question is: can they be merged? No positive evidence can be produced; merging takes place whenever the merge does not produce an inconsistency. Obviously, an early mistake can have disastrous effects, and Lang showed that a breadth-first exploration of the lattice is likely to be better.
Further Research

o The RPNI complexity is not a tight upper bound. Find the correct complexity.
o Are DFAs PAC-identifiable if examples are drawn from the uniform distribution, or some other known simple distribution?
o The study of data-independent algorithms (which do not use the state merging strategy).
o The development of software which would facilitate the merging of the states in any given algorithm (any merging strategy).
Bibliography

• Colin de la Higuera, José Oncina, Enrique Vidal. "Identification of DFA: Data-Dependent versus Data-Independent Algorithms". Lecture Notes in Artificial Intelligence 1147, Grammatical Inference: Learning Syntax from Sentences, 313-325.
• Rajesh Parekh, Vasant Honavar. "Learning DFA from Simple Examples". Lecture Notes in Artificial Intelligence 1316, Algorithmic Learning Theory, 116-131.
• Satoshi Kobayashi. Lecture notes for the 3rd International PhD School on Formal Languages and Applications, Tarragona, Spain.
• Colin de la Higuera. Lecture notes for the 3rd International PhD School on Formal Languages and Applications, Tarragona, Spain.
• Michael J. Kearns, Umesh V. Vazirani. "An Introduction to Computational Learning Theory".