Grammatical Inference
François Coste
SML, Master SIF
2020-2021
F. Coste (Inria) Grammatical Inference SML 2020-2021 1 / 123

Grammatical Inference
Learn the grammar of a language from correct (and incorrect) sentences
N. Chomsky, Syntactic Structures, Mouton, 1957, PhD thesis MIT 1955
E. M. Gold, Language Identification in the Limit, Information and Control, 1967
...
(Targeted) applications
Syntactic pattern recognition [Fu, 1982]
Natural language, molecular biology, structured texts, Web, action planning, intrusion detection...
Field
Theoretical (learnability), practical (algorithms)

Formal language theory
Sequence of symbols s1 s2 ... sp: word
Set of words {m1, m2, ...}: language
Set of production rules generating a language: grammar
Learning a grammar by induction: grammatical inference
(covers more broadly inductive learning of languages, even if the representation is not grammatical)

Grammar
Grammar: G = 〈Σ, N, S, R〉
Σ: finite set of terminals (a, b, c, ...)
N: finite set of non-terminals (S, T, U, ...)
S (∈ N): axiom (start symbol)
R: set of rewriting rules
Each rule is written as: α → β, with α ∈ (N ∪ Σ)* N (N ∪ Σ)* and β ∈ (N ∪ Σ)*
When some rules have the same left-hand side, we write: α → β1 | β2 | ···

Grammars and languages
Elementary derivation ⇒G: μαδ ⇒G μβδ iff ∃ α → β ∈ R, μ, δ ∈ (N ∪ Σ)*
Derivation ⇒*G: finite sequence of elementary derivations
Language generated by a grammar G: L(G) = {m ∈ Σ* | S ⇒*G m}
Free monoid Σ*: set of all the words on Σ
Empty word: ε (or λ)
Empty language: ∅ (≠ {ε})

Example
Dyck1's grammar (balanced parentheses): G = 〈Σ, N, S, R〉
Σ = {a, b}, N = {S}, R = {S → aSbS, S → ε}
Derivation: S ⇒ aSbS ⇒ aaSbSbS ⇒ aabSbS ⇒ aabbS ⇒ aabb

Exercises
Find the grammars generating the following languages:
- {aaba, aaa}
- All the words on {a, b} (Σ*)
- Words on {a, b} beginning with a
- Codons on {a, c, g, t} (letter count is a multiple of 3)
- Palindromes on {a, b}: R = {S → aSa | bSb | a | b | ε}
- Biological palindromes (on {a, c, g, t}, a-t, c-g): exercise...
- {a^n b^n c^n | n ≥ 1}: R = {S → abc | aSAc, bA → bb, cA → Ac}
  S ⇒ aSAc ⇒ aabcAc ⇒ aabAcc ⇒ aabbcc
- Copy: {ww | w ∈ {a, b}*}: exercise...

Chomsky Hierarchy
Hierarchy of recursively enumerable languages:
0 Unrestricted
1 Context-sensitive ("grammaires contextuelles"): α → β, |α| ≤ |β|
2 Context-free ("grammaires algébriques"): A → β, A ∈ N
3 Regular ("grammaires régulières", automata): A → aB or A → a, with A, B ∈ N, a ∈ Σ ∪ {ε}
The Chomsky Hierarchy
Regular languages are worth inferring
For practical applications, powerful recursive models may not be required
Regular languages can account for short-term dependencies (like N-grams), but also some long-term dependencies.
Any language can be approximated by a regular language (each finite language is regular!).
Properties of regular languages are well studied; this makes the development of inference methods easier.
Simple and efficient parsing of strings (O(|m|) for a DFA).
Outline
1 Learning automata
   Definitions
   Learning automata from positive and negative examples
   Learning automata from positive examples
Automata
A = 〈Σ, Q, Q0, QF , δ〉
Example: multiples of 3 (binary):
Σ: finite alphabet, {0, 1}
Q: finite set of states, {q0, q1, q2}
Q0 (⊆ Q): initial states, {q0}
QF (⊆ Q): final states, {q0}
δ: transition function Q × Σ → P(Q)
(δ∗: P(Q) × Σ∗ → P(Q) denotes the extension of δ to words)
Language accepted by A:
L(A) = {m ∈ Σ∗ | δ∗(Q0, m) ∩ QF ≠ ∅}
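As a quick illustration, the "multiples of 3" automaton above is easy to encode and run; the dict encoding and integer state names below are assumptions made for this sketch, not the lecture's notation:

```python
# DFA for binary words encoding multiples of 3.
# State i means "the value read so far is congruent to i (mod 3)";
# reading bit b sends i to (2*i + b) mod 3.
delta = {(i, b): (2 * i + b) % 3 for i in range(3) for b in (0, 1)}
q0, QF = 0, {0}

def accepts(word):
    """Run the DFA on a binary string and test membership."""
    state = q0
    for c in word:
        state = delta[(state, int(c))]
    return state in QF
```

For instance `accepts("110")` is True (110 in binary is 6) while `accepts("101")` is False (5).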
Automata and languages
Language accepted/recognized by an automaton: regular language (equivalently described by regular expressions, built from +, ∗ and parentheses)
Exercises
Find automata on Σ = {a, b} recognizing:
- {abba, aab} (show that each finite language is regular)
- all the words on Σ: (a + b)∗ = {a, b}∗ = Σ∗
- all the words containing the motif aa
- all the words with 3 letters (extension to codons?)
- all the words with an even number of a.
Deterministic finite-state automata (DFA): |δ(q, a)| ≤ 1
Any non-deterministic automaton (NFA) can be determinized
⇒ L_NFA = L_DFA
Canonical automaton of L, A(L): smallest DFA accepting L
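The determinization step mentioned above is the classical subset construction; here is a minimal sketch, assuming the NFA is given as a dict from (state, letter) to sets of states (the example NFA for words containing "aa" is an assumption for illustration):

```python
from collections import deque

def determinize(Q0, nfa_delta, finals, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    start = frozenset(Q0)
    dfa_delta, dfa_finals = {}, set()
    queue, seen = deque([start]), {start}
    while queue:
        S = queue.popleft()
        if S & finals:                       # accepts iff it contains an NFA final state
            dfa_finals.add(S)
        for a in alphabet:
            T = frozenset(t for q in S for t in nfa_delta.get((q, a), ()))
            dfa_delta[(S, a)] = T
            if T not in seen:
                seen.add(T)
                queue.append(T)
    return start, dfa_delta, dfa_finals

# Example NFA: words over {a, b} containing the factor "aa"
nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0},
       (1, 'a'): {2},
       (2, 'a'): {2}, (2, 'b'): {2}}
start, dfa_delta, dfa_finals = determinize({0}, nfa, {2}, "ab")

def dfa_accepts(word):
    S = start
    for a in word:
        S = dfa_delta[(S, a)]
    return S in dfa_finals
```

Here `dfa_accepts("baab")` holds while `dfa_accepts("abab")` does not.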
Can we learn regular languages from positive examples only?
Theoretical framework: identification in the limit [Gold67]
Presentation : infinite sequence of examples
P: x1 x2 x3 ... xk ... xi ...
After each example, the learner outputs a hypothesis: H1, H2, H3, ..., Hk, ..., with Hi ≡ Hk ≡ H0 from rank k on
Identification in the limit of H0 :
∀P,∃k, ∀i > k,Hi ≡ H0
Let’s try!
a, aa, aaa . . .
Limit point
If a limit point exists:
L1 ⊂ L2 ⊂ L3 ⊂ ··· ⊂ L∞ = ⋃i Li
Then
The class of languages is not identifiable in the limit from positive examples
Results [Gold67]
No superfinite class of languages (⊃ regular) can be identified in the limit from text (i.e. positive examples only)
The class of primitive recursive functions ("fonctions récursives primitives") can be identified in the limit from informant (examples and counter-examples)
(False for the class of total recursive functions)
→ rationale for using counter-examples
Time needed for learning ???
Polynomial Time and Data Identification in the Limit [Gold 78] [Pitt 89] [de la Higuera 95]
Identification in the limit from Polynomial Time and Data (IPTD)
A representation class R is identifiable in the limit from polynomial time and data iff there exist two polynomials p and q and a learning algorithm A s.t.:
- Given any sample S = 〈S+, S−〉 of size m, A returns a representation R in R compatible with S in p(m) time
- For each representation R of size n, there exists a characteristic sample of size less than q(n)
Characteristic sample CS = 〈CS+, CS−〉: for any S = 〈S+, S−〉 s.t. CS+ ⊆ S+ and CS− ⊆ S−, A returns a representation R′ equivalent to R
Are automata IPTD?
Outline
1 Learning automata
   Definitions
   Learning automata from positive and negative examples
   Learning automata from positive examples
1 Learning automata
   Definitions
   Learning automata from positive and negative examples
      Problem definition
      RPNI
      Structural completeness hypothesis
      Utility of counter-examples
      EDSM heuristic
   Learning automata from positive examples
Remark: given a sample S = 〈S+, S−〉, an infinite number of automata are compatible with S
Searching for the smallest compatible DFA
Smallest compatible DFA problem
Given S+ ⊂ Σ∗ (examples) and S− ⊂ Σ∗ (counter-examples),
find the smallest DFA A s.t. S+ ⊆ L(A) and S− ∩ L(A) = ∅
Application of Occam’s razor
Canonical automaton of the language...
NP-complete problem [Gold 78] [Angluin 78]
Proof: reduction from SAT
Finding a DFA (only) polynomially bigger than the smallest DFA compatible with 〈S+, S−〉 is NP-complete [Pitt, Warmuth 93]
PAC-learning DFA is as hard as breaking the RSA cryptosystem [Pitt, Warmuth 88] [Kearns, Valiant 89]
A ← PTA(S+)
for all (p, q) in standard order¹ do
    A′ ← deterministic merge(A, p, q)
    if A′ accepts no counter-example from S− then
        A ← A′
    end if
end for

Complexity: O((|S+| + |S−|) · |S+|²)

¹ Standard order u ≺ v: (|u| < |v|) ∨ (|u| = |v| ∧ ∃k, ∀i < k, ui = vi ∧ uk < vk)
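The loop above can be sketched as runnable code. This is an illustrative simplification, not the lecture's reference implementation: PTA states are the prefixes themselves, and a partition dict (state → block representative) plays the role of the merged automaton.

```python
def pta(S_plus):
    """Prefix Tree Acceptor: one state per prefix of S+."""
    states, trans, finals = {""}, {}, set()
    for w in S_plus:
        for i in range(len(w)):
            trans[(w[:i], w[i])] = w[:i + 1]
            states.add(w[:i + 1])
        finals.add(w)
    return states, trans, finals

def merged(part, p, q):
    """New partition with the block of q merged into the block of p."""
    p, q = part[p], part[q]
    return {s: (p if r == q else r) for s, r in part.items()}

def fold(part, trans):
    """Keep merging blocks until the quotient automaton is deterministic."""
    changed = True
    while changed:
        changed = False
        out = {}
        for (s, a), t in trans.items():
            key, t = (part[s], a), part[t]
            if key in out and out[key] != t:
                part = merged(part, out[key], t)  # nondeterminism: merge targets
                changed = True
                break
            out[key] = t
    return part

def accepts(w, part, trans, finals):
    """Does the (deterministic) quotient automaton accept w?"""
    state, final_blocks = part[""], {part[f] for f in finals}
    for a in w:
        nxt = next((part[t] for (s, x), t in trans.items()
                    if x == a and part[s] == state), None)
        if nxt is None:
            return False
        state = nxt
    return state in final_blocks

def rpni(S_plus, S_minus):
    states, trans, finals = pta(S_plus)
    part = {s: s for s in states}
    order = sorted(states, key=lambda s: (len(s), s))  # standard order
    for q in order[1:]:
        if part[q] != q:              # q was already merged into an earlier block
            continue
        for p in order:
            if p == q:
                break
            if part[p] != p:
                continue
            trial = fold(merged(part, p, q), trans)
            if not any(accepts(w, trial, trans, finals) for w in S_minus):
                part = trial          # keep the merge: no counter-example accepted
                break
    return part, trans, finals
```

For instance, with S+ = {ε, aa} and S− = {a} the returned quotient has two blocks and accepts exactly the words with an even number of a's.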
[Figure: success rate vs. number of sequences in the training sample, from [Lang, 1992]]
Identification ?
Requirements for finding the solution with RPNI?
1. The target automaton has to be in the search space
and
2. The good merges have to be chosen
1 Learning automata
   Definitions
   Learning automata from positive and negative examples
      Problem definition
      RPNI
      Structural completeness hypothesis
      Utility of counter-examples
      EDSM heuristic
   Learning automata from positive examples
Structural completeness hypothesis
S+ is structurally complete wrt A if an acceptance of S+ by A exists s.t.:
Every transition of A is used
Every final state of A is used for an acceptance
S+ = {aaa, bba, baaa} A =
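For the DFA case, the two conditions above can be checked directly; this is a sketch (the definition itself allows NFAs and any accepting run), reusing the "multiples of 3" DFA as an assumed example:

```python
def structurally_complete(S_plus, delta, q0, finals):
    """DFA check: parsing S+ must use every transition of the automaton
    and end at least once in every final state."""
    used_trans, reached = set(), set()
    for w in S_plus:
        q = q0
        for a in w:
            used_trans.add((q, a))
            q = delta[(q, a)]
        reached.add(q)
    return used_trans == set(delta) and set(finals) <= reached

# Assumed example: the 'multiples of 3' DFA over {0, 1}, final state 0
delta3 = {(0, '0'): 0, (0, '1'): 1, (1, '0'): 2,
          (1, '1'): 0, (2, '0'): 1, (2, '1'): 2}
```

With `S_plus = ["110", "1001", "101101"]` every transition is exercised and the final state is reached, so the sample is structurally complete; dropping "101101" leaves the transition (2, '1') unused.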
Maximal Canonical Automaton
Rote learning of S+ = {aaa, bba, baaa}
Union :
MCA(S+)
Only one initial state (classical but not required):
MCA(S+)
Merging states
Language generalisation operator
Preserve structural completeness
Theorem
All automata A s.t. S+ is structurally complete wrt A can be built by merging states of MCA(S+)
Search space
DFA search space
operator: deterministic merge
Theorem
All automata A s.t. S+ is structurally complete wrt A can be built by deterministic merges of states in MCA(S+) (or PTA(S+))
1 Learning automata
   Definitions
   Learning automata from positive and negative examples
      Problem definition
      RPNI
      Structural completeness hypothesis
      Utility of counter-examples
      EDSM heuristic
   Learning automata from positive examples
Limiting generalisation with a set of counter-examples S−
Border Set: set of most general elements
(greater generalisation under control of S−)
Occam's razor → looking for the smallest automaton
S− also guides the search...
Characteristic sample for RPNI
How to ensure that RPNI returns A(L) ?
Ideas :
Sample has to be structurally complete wrt A(L)
Sample is informative enough to prevent merging distinct states
Characteristic sample for RPNI
Short prefixes and kernel
Let Pr(L) denote the set of prefixes of a language L: Pr(L) = {u ∈ Σ∗ : ∃v ∈ Σ∗, uv ∈ L}
Short prefixes
Smallest sequences enabling to reach each state of the target:
Sp(L) = {u ∈ Pr(L) : ∄v ∈ Pr(L), v ≺ u and δA(L)(q0, v) = δA(L)(q0, u)}
Kernel
Sequences of Sp concatenated with one letter, allowing to reach a new state (exercising all the possible transitions):
N(L) = {ua ∈ Pr(L) : u ∈ Sp(L), a ∈ Σ} ∪ {ε}
What would be N(L) for the following DFA target?
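Sp(L) and N(L) can be computed from a target DFA by a breadth-first search that records, for each state, the first word reaching it in standard order. A sketch, assuming a trim DFA (every state useful, so every ua with a defined transition is indeed a prefix of L); the "multiples of 3" DFA is an assumed example:

```python
from collections import deque

def short_prefixes_and_kernel(delta, q0, alphabet):
    """BFS in standard order: the first word reaching each state is its short prefix.
    The kernel adds one letter along every defined transition, plus the empty word."""
    sp = {q0: ""}
    queue = deque([q0])
    while queue:
        q = queue.popleft()
        for a in sorted(alphabet):
            t = delta.get((q, a))
            if t is not None and t not in sp:
                sp[t] = sp[q] + a
                queue.append(t)
    Sp = set(sp.values())
    N = {u + a for q, u in sp.items()
         for a in sorted(alphabet) if (q, a) in delta} | {""}
    return Sp, N

# Assumed example: the 'multiples of 3' DFA (binary), q0 = 0
delta3 = {(0, '0'): 0, (0, '1'): 1, (1, '0'): 2,
          (1, '1'): 0, (2, '0'): 1, (2, '1'): 2}
```

For this target, Sp = {ε, 1, 10} and N = {ε, 0, 1, 10, 11, 100, 101}.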
Characteristic sample for RPNI
S = 〈S+, S−〉 is a characteristic sample of A(L) for RPNI if:
∀x ∈ N(L): ∃u ∈ Σ∗, xu ∈ S+ (u = ε if x ∈ L)
∀x, y ∈ N(L) with δA(L)(q0, x) ≠ δA(L)(q0, y):
∃u ∈ Σ∗, ((xu ∈ S+ and yu ∈ S−) or (xu ∈ S− and yu ∈ S+))
What would be a characteristic sample for ... ?
Is the characteristic sample unique for an automaton?
It can be shown that:
- adding new examples to the characteristic sample does not change the automaton returned by RPNI
- for each A(L), there exists a characteristic sample of size O(|A(L)|²)
What about merging states in random order? [Trakhtenbrot and Barzdin 1973]
Algorithm: deterministic merges, in random order, of pairs of states not resulting in an incompatible automaton
Algorithm complexity? At most |PTA|·|A|² [Lang 92] (where A is the target automaton)
Characteristic sample? {w ∈ Σ∗ : |w| ≤ d + 1 + ρ}
d: depth of the automaton
ρ: distinguishability degree (length of the suffix required to distinguish pairs of states, i.e. allowing to reach a final state and a non-final state)
Worst case: d = ρ = |A| − 1
On average, ρ = log_|Σ| log₂ |A| and d = C log_|Σ| |A| (where C is a constant wrt Σ)
For |Σ| = 2, the average size is ∼ 16|A|² − 1:
|A| = 32 → 16383 seq., 65 → 67599, 506 → 4096575 ...
RPNI
The solution returned by RPNI is:
- a DFA belonging to the Border Set
- the canonical automaton of the language that it accepts
- if the sample is characteristic, the smallest compatible DFA (contradiction with the NP-completeness of the problem?)
... would require ∼ 4 000 seq. with RPNI
Learning from positive and negative examples
[Gold 67]:
No superfinite class of languages can be identified in the limit from positive examples only
The class of primitive recursive functions can be identified in the limit from positive and negative examples
Efficient learning
DFA are IPTD from positive and negative examples (RPNI)
Extension to some closely related classes
NFA are not! CFG neither...
A heuristic (EDSM) that seems to perform better... (?)
What if negative examples are not available?
Outline
1 Learning automata
   Definitions
   Learning automata from positive and negative examples
   Learning automata from positive examples
Learning from positive example (only)
Statistical criteria for not merging pairs of states: ALERGIA
"Characterizable" methods: k-RI, k-testable languages
Heuristic methods: ECGI
1 Learning automata
   Definitions
   Learning automata from positive and negative examples
   Learning automata from positive examples
      ALERGIA
      k-reversible languages
      ECGI
ALERGIA
[Carrasco, Oncina 99]
Input: S+, precision parameter α
Output: (probabilistic) DFA A
A ← PPTA(S+)
for all (p, q) in standard order do
    if compatible(p, q, α) then
        A ← deterministic merge(A, p, q)
    end if
end for
ALERGIA
Compatibility between two states q1 and q2:
Transition probabilities are similar enough: ∀a ∈ Σ ∪ {#},
|C(q1, a)/C(q1) − C(q2, a)/C(q2)| < √((1/2) ln(2/α)) · (1/√C(q1) + 1/√C(q2))
Compatibility of successors:
∀a ∈ Σ, δ(q1, a) and δ(q2, a) are α-compatible
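The bound above is a Hoeffding-style test and transcribes directly; in this sketch n1, n2 play the role of the counts C(q1), C(q2) and f1, f2 the role of C(q1, a), C(q2, a):

```python
import math

def hoeffding_compatible(n1, f1, n2, f2, alpha):
    """ALERGIA's compatibility test at precision alpha: the observed
    frequencies f1/n1 and f2/n2 must differ by less than the Hoeffding bound."""
    if n1 == 0 or n2 == 0:
        return True  # no evidence either way
    bound = (math.sqrt(0.5 * math.log(2.0 / alpha))
             * (1.0 / math.sqrt(n1) + 1.0 / math.sqrt(n2)))
    return abs(f1 / n1 - f2 / n2) < bound
```

For example, with 100 observations on each side and α = 0.05, frequencies 0.50 and 0.55 pass the test while 0.50 and 0.90 do not.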
ALERGIA
Local measure of suffix language similarity
Other measures...
→ Learning probabilistic automata
→ Identification of probability distributions on words
See:
- PAC-learnability of Probabilistic Deterministic Finite State Automata, A. Clark and F. Thollard, Journal of Machine Learning Research, 2004.
- Towards feasible PAC-learning of probabilistic deterministic finite automata, J. Castro and R. Gavaldà, ICGI 2008.
- Learning rational stochastic languages, F. Denis, Y. Esposito, A. Habrard, COLT 2006.
- Spectral learning of weighted automata: a forward-backward perspective, B. Balle, X. Carreras, F. M. Luque, A. Quattoni, Machine Learning, 2014.
1 Learning automata
   Definitions
   Learning automata from positive and negative examples
   Learning automata from positive examples
      ALERGIA
      k-reversible languages
      ECGI
Characterizable learning
The negative result of [Gold 67] applies to superfinite classes of languages.
To avoid over-generalization, an approach performing a minimal generalisation at each step ensures identification for particular classes of languages.
Biological palindrome: S → aSt | cSg | tSa | gSc | ε
Derivation tree of atgttcgaacat?
Consequence of adding a new rewriting rule: S → SS | aSt | cSg | tSa | gSc | ε?
Derivation tree of caaatcgatcatcgaagagctcttgttg? Of gaatattcgaatattc?
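For the base grammar (without the SS rule), a word is derivable iff it equals its own reverse complement, which gives a simple membership check (a sketch; the names are illustrative):

```python
# Complement pairing on the DNA alphabet (a-t, c-g).
COMP = {"a": "t", "t": "a", "c": "g", "g": "c"}

def is_bio_palindrome(w):
    """Membership test for S -> aSt | cSg | tSa | gSc | epsilon:
    w is derivable iff position i always pairs with position -1-i.
    Odd-length words are rejected (a letter never pairs with itself)."""
    return all(COMP[w[i]] == w[-1 - i] for i in range(len(w)))
```

With the extra rule S → SS, membership instead asks whether the word splits into a concatenation of such palindromes, as in the derivation-tree exercises above.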
Copy
S → AaS | CcS | GgS | TtS | X
X → ε
Aa → aA ; Ac → cA ; Ag → gA ; At → tA
Ca → aC ; Cc → cC ; Cg → gC ; Ct → tC
Ga → aG ; Gc → cG ; Gg → gG ; Gt → tG
Ta → aT ; Tc → cT ; Tg → gT ; Tt → tT
AX → Xa ; CX → Xc ; GX → Xg ; TX → Xt
Derivation tree of ctaacctaac ?
What we have seen in SML so far
Introduction to machine learning
Generalisation, necessity of a bias...
How to define properly a machine learning problem: choice of object description, choice of hypothesis space, choice of 'best' hypothesis, i.e. setting biases
Exploration of the search space
Evaluation of the risk
Learning on sequences
Vectorization of texts and Naive Bayes
Automata and learnability
Next: state-of-the-art algorithms for attribute-value representations of instances...