Zulu: an active finite state machine learning competition
ICGI, Valencia, September 2010
Colin de la Higuera
Jan 13, 2016
Cdlh 2010
General goal
http://labh-curien.univ-st-etienne.fr/zulu
To support research in DFA learning
To promote active learning as an alternative to statistical learning
To attempt to use learning for under-resourced languages
State of the art (1)
1. Learning automata is a difficult but great topic, with not enough positive results (… do come this afternoon…)
2. The question of learning DFA has received attention for 30 years
3. The typical protocol consists of learning from a batch of data: you need a lot of data if you want to learn…
State of the art (2)
1. Alternative introduced by Angluin: the learner can make queries to an oracle
2. Typical queries are membership, equivalence, subset or correction queries
3. Algorithm L* can learn DFA with a polynomial amount of resources
State of the art (3)
Many reasons for wanting to learn DFA from queries
Useful in a number of fields
Start with DFA…
Under-resourced languages
The task
The participant is told that (s)he is to learn a DFA and is allowed to ask k membership queries.
(S)he is given the alphabet, k, and an upper bound on the number of states.
The participant interactively uses the online oracle and, after making k queries, is given 1800 strings that (s)he has to parse and classify. The score is the percentage of correct labels.
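The protocol above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `BudgetedOracle`, the function `score` and the toy target language `even_as` are hypothetical names, not part of the Zulu server, and the test set here has 4 strings rather than 1800.

```python
from typing import Callable, List


class BudgetedOracle:
    """Answers membership queries until the budget k is exhausted (hypothetical helper)."""

    def __init__(self, target: Callable[[str], bool], k: int) -> None:
        self._target = target   # hidden language, given as a predicate on strings
        self.remaining = k      # number of queries still allowed

    def member(self, word: str) -> bool:
        if self.remaining <= 0:
            raise RuntimeError("query budget exhausted")
        self.remaining -= 1
        return self._target(word)


def score(target: Callable[[str], bool], test_words: List[str],
          labels: List[bool]) -> float:
    """Percentage of test words whose submitted label matches the target."""
    correct = sum(target(w) == lab for w, lab in zip(test_words, labels))
    return 100.0 * correct / len(test_words)


# Assumed toy target: strings over {a, b} with an even number of a's.
def even_as(word: str) -> bool:
    return word.count("a") % 2 == 0


oracle = BudgetedOracle(even_as, k=3)
answers = [oracle.member(w) for w in ["", "a", "aba"]]

# A deliberately poor learner that labels every test string as accepted.
test_words = ["aa", "ab", "b", "a"]
result = score(even_as, test_words, [True] * len(test_words))
```

Once the budget is spent, any further call to `member` raises an error, which mirrors the rule that only k queries are answered before the test strings are handed out.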
The baseline
Angluin’s L* algorithm learns perfectly but uses both membership queries (MQs) and equivalence queries (EQs)
A version in which EQs are “simulated” by random sampling is provided
A membership query
Learner: does aababababbbab belong to the language?
Oracle: no
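To make the exchange concrete, here is a minimal sketch of how an oracle could answer a membership query by running the string through its DFA. The target automaton is not given in the slides, so the two-state DFA below (strings with an even number of b's) is an assumption, chosen only because it also answers "no" on the string from the example.

```python
from typing import Dict, Set, Tuple

# Assumed example DFA: accepts strings over {a, b} with an even number of b's.
DELTA: Dict[Tuple[int, str], int] = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 1, (1, "b"): 0,
}
START: int = 0
ACCEPTING: Set[int] = {0}


def accepts(word: str) -> bool:
    """Run the DFA on `word` and report whether it ends in an accepting state."""
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


# The learner's query from the example above.
answer = accepts("aababababbbab")   # the string has 7 b's, so the answer is "no"
```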
An equivalence query
Learner: Is (aa*(b+ab)*bb+aa)* the correct answer?
Oracle: No, because aabababba does belong to the language
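An oracle that knows the target DFA can answer an equivalence query exactly by exploring the product of the two automata and returning a shortest string on which they disagree. The sketch below assumes both DFAs are complete (every state has a transition for every symbol); the type alias `DFA` and the function `find_counterexample` are hypothetical names used for illustration only.

```python
from collections import deque
from typing import Dict, Optional, Set, Tuple

# A DFA as (transition table, start state, accepting states).
DFA = Tuple[Dict[Tuple[int, str], int], int, Set[int]]


def find_counterexample(hyp: DFA, target: DFA, alphabet: str) -> Optional[str]:
    """Breadth-first search over pairs of states; None means the DFAs are equivalent."""
    (dh, sh, fh), (dt, st, ft) = hyp, target
    seen = {(sh, st)}
    queue = deque([(sh, st, "")])
    while queue:
        p, q, word = queue.popleft()
        if (p in fh) != (q in ft):
            return word                      # the two automata disagree on `word`
        for c in alphabet:
            nxt = (dh[(p, c)], dt[(q, c)])
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt[0], nxt[1], word + c))
    return None


# Assumed target: even number of b's. Assumed hypothesis: accepts every string.
target: DFA = ({(0, "a"): 0, (0, "b"): 1, (1, "a"): 1, (1, "b"): 0}, 0, {0})
accept_all: DFA = ({(0, "a"): 0, (0, "b"): 0}, 0, {0})

cex = find_counterexample(accept_all, target, "ab")
same = find_counterexample(target, target, "ab")
```

Because the search is breadth-first, the counterexample returned is one of minimal length.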
Simulating an equivalence query
Random strings are sampled: aabba, bbabba, aaaababab, bbabababaaaa,…
Learner’s hypothesis: aabba ∈ L
Learner: does aabba belong to L?
Either Oracle: yes (if we agree many times, I can’t be far off)
Or Oracle: no (aabba can be used as a counterexample)
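The idea of replacing an equivalence query by sampling can be sketched as follows. This is not the competition's baseline code: the function name `sampled_equivalence_query`, the uniform choice of lengths and symbols, and the parameters `n_samples` and `max_len` are all assumptions made for the example. Each sampled string costs one membership query.

```python
import random
from typing import Callable, Optional


def sampled_equivalence_query(hypothesis: Callable[[str], bool],
                              member: Callable[[str], bool],
                              alphabet: str,
                              n_samples: int,
                              max_len: int,
                              rng: random.Random) -> Optional[str]:
    """Return a sampled string on which hypothesis and oracle disagree, else None."""
    for _ in range(n_samples):
        length = rng.randint(0, max_len)   # assumed distribution: uniform length
        word = "".join(rng.choice(alphabet) for _ in range(length))
        if hypothesis(word) != member(word):
            return word                    # usable as a counterexample
    return None                            # agreed on every sample


# Assumed target language: even number of b's.
def even_bs(word: str) -> bool:
    return word.count("b") % 2 == 0


rng = random.Random(0)
agree = sampled_equivalence_query(even_bs, even_bs, "ab", 100, 8, rng)
cex = sampled_equivalence_query(lambda w: True, even_bs, "ab", 200, 8, rng)
```

A `None` result only means that no disagreement was found among the samples; it is not a proof of equivalence.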
The theory
DFA are learnable with MQ and EQ
DFA are not learnable from a polynomial number of MQ
You can’t really simulate the EQ through sampling because you don’t know what the distribution is
The oracle (1)
is given an upper bound n on the number of states and the size of the alphabet
generates a (minimal) DFA with at most n states
runs the baseline on this DFA and halts as soon as the baseline's hypothesis is 70% correct. The number of queries used up to that point is the budget k for that task.
gives the player an identifier.
The oracle (2)
interacts with the learner and answers k queries
generates 1800 strings and gives them to the learner
receives the 1800 labels and computes the score
Scientific committee
Dana Angluin, Yale University, USA
Leo Becerra Bonache, Univ. de Tarragona, Spain
François Coste, IRISA, Rennes, France
Alex Clark, Royal Holloway Univ. of London, UK
Ricard Gavaldá, UPC Barcelona, Spain
Colin de la Higuera, U. Saint-Etienne/Nantes, France
Jean-Christophe Janodet, U. de Saint-Etienne, France
Aurélien Lemay, Université de Lille 3, France
Laurent Miclet, ENSSAT Lannion and IRISA, France
Tim Oates, University of Maryland, USA
Anssi Yli-Jyrä, Helsinki, Finland
Menno van Zaanen, Tilburg University, The Netherlands
Organisation committee
Myrtille Ponge
David Combe
Jean-Christophe Janodet
Colin de la Higuera
Some open issues
How should the DFA be generated? What is a random DFA? Generate random NFA instead? Should they not be “typical DFA”?
What distribution for the test set? If the distribution is known, this helps!
How do we have a fair competition?
Main dates
23rd of July 2009: official launch
Until May 2010: advertising and training phase
June 2010: competition phase
7th July 2010: results published
September 2010: Workshop / Special session
Zulu competition
http://labh-curien.univ-st-etienne.fr/zulu
23 competing algorithms, 11 players
End of the competition a week ago.
Tasks: Learn a DFA, be as precise as possible, with n queries
Results
Task  Queries  Alphabet  States  Best %
   1      304         3       8  100.00
   2      199         3      16  100.00
   3     1197         3      81   96.50
   4     1384         3     100   93.22
   5     1971         3     151   85.89
   6     3625         3     176  100.00
   7      429         5      15  100.00
   8      375         5      18  100.00
   9     2524         5      84   96.44
  10     3021         5      90  100.00
  11     5428         5     153   99.94
  12     4616         5     123  100.00
  13      725        15      10  100.00
  14     1365        15      17  100.00
  15     5266        15      60  100.00
  16     7570        15      71  100.00
  17    17034        15     147  100.00
  18    16914        15     143   87.94
  19     1970         5      93   81.67
  20     1329         5      61   70.00
  21      571         5      40   69.22
  22      735         5      57   65.11
  23      483         5      73   86.61
  24      632         5      78  100.00
Winners
Falk Howar
Balle
Eisenstat