Socrates/Erasmus Programme, University of Minho, Braga, May 29, 2003
Automatic Generation of Language-based Tools:
The LISA Approach
Marjan Mernik
UNIVERSITY OF MARIBOR
FACULTY OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE
Outline of the Presentation
• How to specify a programming language?
• Formal methods for programming language definition
• LISA compiler/interpreter generator
• Language-based tools generated by LISA
How to specify a programming language?
• Using natural language
  – Advantages:
    • descriptions are understandable,
    • accessible to a wide variety of users.
  – Disadvantages:
    • lack of clarity,
    • ambiguities,
    • various interpretations.
How to specify a programming language?
• Using a formal method
  – Advantages:
    • the syntax and the semantics are defined in a precise and unambiguous manner,
    • compilers or interpreters can be generated automatically,
    • the method serves as a tool for programming language design.
  – Disadvantages:
    • detailed knowledge of the formal method is required.
Formal methods for programming language definition
• Lexicon (regular definitions, FSA)
• Syntax (BNF)
• Semantics (axiomatic, attribute grammars, denotational, algebraic, structural operational/natural, action, abstract state machines, ...)
Formal methods for programming language definition
• Possibility for automatic generation of compilers or interpreters
  – Attribute grammars: Synthesizer Generator
  – Denotational: PSG
  – Algebraic: ASF+SDF
  – Structural operational/Natural: Centaur
  – Action: ASD
  – Abstract state machines: Gem-Mex
Formal methods for programming language definition
• From formal language definitions many other language-based tools can be automatically generated, such as:
  – syntax-directed editors,
  – type checkers,
  – dataflow analyzers,
  – partial evaluators,
  – debuggers,
  – test case generators,
  – animators, etc.
Formal methods for programming language definition
• The core language definitions have to be augmented, or
• only a part of the formal language definition is enough for automatic tool generation, or
• implicit information must be extracted from the formal language definition.
Formal methods for programming language definition
• Automatic generation is possible whenever a tool can be built from a fixed part and a variable part, and the variable (language-dependent) part can be systematically derived from the language specification (Table 1).
Table 1
Generated tool                | Formal specification                    | Fixed part                                                 | Variable part
Lexer                         | regular definitions                     | algorithm which interprets the action table                | action table: State × Σ → State
Parser (LR)                   | BNF                                     | algorithm which interprets the action table and goto table | action table: State × T → Action; goto table: State × (T ∪ N) → State
Evaluator                     | attribute grammar                       | tree walk algorithm                                        | semantic functions
Language knowledgeable editor | regular definitions (extracted from AG) | matching algorithm                                         | same as lexer
Table 1 (cont.)
Generated tool                 | Formal specification                    | Fixed part                     | Variable part
FSA visualization              | regular definitions (extracted from AG) | FSA layout algorithm           | same as lexer
Syntax tree visualization      | BNF (extracted from AG)                 | syntax tree layout algorithm   | syntax tree
Dependency graph visualization | extracted from AG                       | DG layout algorithm            | dependency graph
Semantic evaluator animation   | extracted from AG                       | semantic tree layout algorithm | decorated syntax tree & semantic functions
Regular definitions
• An example – arithmetic expressions, e.g. (23+2)*3
  integer    [0-9]+
  operator   + | *
  separator  ( | )
Regular definitions (variable part)
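action table: State × Σ → State
State | 0..9 | +,* | (,) | \t, \n, ' '
0     | 1    | 2   | 3   | 4
1     | 1    |     |     |
2     |      |     |     |
3     |      |     |     |
4     |      |     |     | 4
[Figure: FSA diagram. Start state 0; edges 0 → 1 on 0..9, 0 → 2 on +,*, 0 → 3 on (,), 0 → 4 on \t,\n; loops on state 1 (0..9) and state 4 (\t,\n). Final states: 1 = tInteger, 2 = tOperator, 3 = tSeparator, 4 = tIgnore.]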
Regular definitions (variable part)
void initAutomata() {
for (int i = 0; i<=maxState; i++) {
for (int j = 0; j<256; j++)
automata[i][j] = noEdge;
}
for (int i = '0'; i<='9'; i++)
automata[0][i] = automata[1][i] = 1;
automata[0]['+'] = automata[0]['*'] = 2;
automata[0]['('] = automata[0][')'] = 3;
automata[0]['\n']=automata[0][' ']=automata[0]['\t']=4;
automata[4]['\n']=automata[4][' ']=automata[4]['\t']=4;
finite[0] = tLexError;
finite[1] = tInteger;
finite[2] = tOperator;
finite[3] = tSeparator;
finite[4] = tIgnore;
}
Regular definitions (fixed part)
Token nextToken() {
  int currentState = startState;
  string lexem;
  int startColumn = column;
  int startRow = row;
  do {
    // Follow transitions as long as the next character has an edge.
    int tempState = getNextState(currentState, peek());
    if (tempState != noEdge) {
      currentState = tempState;
      lexem += (char)read();
    } else {
      if (isFiniteState(currentState)) {
        Token token(lexem, startColumn, startRow, getFiniteState(currentState), eof());
        // Whitespace is skipped by scanning for the next token.
        if (token.getToken() == tIgnore) return nextToken();
        else return token;
      } else {
        return Token("", startColumn, startRow, tLexError, eof());
      }
    }
  } while (true);
}
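The split between the variable part (tables) and the fixed part (table interpreter) can be tried out in isolation. The following is a minimal sketch, not LISA's generated code: it assumes the input is an in-memory string, and the names LexerTables, Token and tokenize are hypothetical stand-ins for the scanner classes used on the slides. Only the contents of the tables come from the regular definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum TokenKind { tLexError, tInteger, tOperator, tSeparator, tIgnore };
struct Token { TokenKind kind; std::string lexem; };  // hypothetical, simplified

const int noEdge = -1;
const int maxState = 4;

// Variable part: tables derived from the regular definitions.
struct LexerTables {
    int automata[maxState + 1][256];
    TokenKind finite[maxState + 1];
    LexerTables() {
        for (int i = 0; i <= maxState; i++)
            for (int j = 0; j < 256; j++) automata[i][j] = noEdge;
        for (int c = '0'; c <= '9'; c++) automata[0][c] = automata[1][c] = 1;
        automata[0]['+'] = automata[0]['*'] = 2;
        automata[0]['('] = automata[0][')'] = 3;
        for (char c : std::string("\n \t"))
            automata[0][(int)c] = automata[4][(int)c] = 4;
        finite[0] = tLexError;  finite[1] = tInteger;  finite[2] = tOperator;
        finite[3] = tSeparator; finite[4] = tIgnore;
    }
};

// Fixed part: interprets the tables; it knows nothing about the language.
std::vector<Token> tokenize(const std::string& input) {
    static const LexerTables tables;
    std::vector<Token> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        int state = 0;
        std::string lexem;
        // Longest match: consume characters while a transition exists.
        while (pos < input.size()) {
            int next = tables.automata[state][(unsigned char)input[pos]];
            if (next == noEdge) break;
            state = next;
            lexem += input[pos++];
        }
        assert(state >= 0 && state <= maxState);
        TokenKind kind = tables.finite[state];
        if (kind == tLexError) {            // no edge from the start state
            tokens.push_back({tLexError, lexem});
            return tokens;
        }
        if (kind != tIgnore) tokens.push_back({kind, lexem});
    }
    return tokens;
}
```

Under these assumptions, tokenize("(23+2)*3") yields seven tokens: ( 23 + 2 ) * 3. Supporting another language would change only LexerTables; tokenize stays the same.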
BNF
• An example – arithmetic expressions, e.g. (23+2)*3
  E  ::= T EE
  EE ::= + T EE | ε
  T  ::= F TT
  TT ::= * F TT | ε
  F  ::= ( E ) | #integer
LL(1) Parser (fixed part)
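T(x) denotes the program fragment generated for syntax graph element x.
Syntax graph element           | Program fragment
terminal a                     | if token = 'a' then nextToken else error
alternatives 1 | 2 | ... | n   | if token IN FIRST(1) then T(1) else if token IN FIRST(2) then T(2) ... else if token IN FIRST(n) then T(n) else error
nonterminal A                  | call A;
sequence 1 2 ... n             | T(1); T(2); ...; T(n)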
LL(1) Parser (variable part)
bool EE() {
  if (scanner->currentToken().getLexem() == "+") {
    scanner->nextToken();
    return T() && EE();
  }
  return true;
}
bool F() {
  if (scanner->currentToken().getToken() == Scanner::tInteger) {
    scanner->nextToken();
    return true;
  } else if (scanner->currentToken().getLexem() == "(") {
    scanner->nextToken();
    bool zac = E();
    if (zac && scanner->currentToken().getLexem() == ")") {
      scanner->nextToken();
      return true;
    } else return false;
  } else return false;
}
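The slide shows only EE() and F(). A complete recognizer for the expression grammar might look like the sketch below. It is an illustration under assumptions: the Recognizer class, its peek/match helpers and the accepts function are hypothetical, the scanner is replaced by direct character lookahead on a string, and whitespace is not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the scanner: one character of lookahead,
// consecutive digits form a single integer token.
class Recognizer {
public:
    explicit Recognizer(const std::string& text) : input(text), pos(0) {}

    // Accept only if E derives the whole input.
    bool parse() {
        bool ok = E();
        assert(pos <= input.size());
        return ok && pos == input.size();
    }

private:
    std::string input;
    std::size_t pos;

    char peek() const { return pos < input.size() ? input[pos] : '\0'; }
    bool match(char c) {
        if (peek() != c) return false;
        ++pos;
        return true;
    }

    // One function per nonterminal, following the translation rules.
    bool E()  { return T() && EE(); }              // E  ::= T EE
    bool EE() {                                    // EE ::= + T EE | empty
        if (match('+')) return T() && EE();
        return true;
    }
    bool T()  { return F() && TT(); }              // T  ::= F TT
    bool TT() {                                    // TT ::= * F TT | empty
        if (match('*')) return F() && TT();
        return true;
    }
    bool F() {                                     // F  ::= ( E ) | #integer
        if (std::isdigit((unsigned char)peek())) {
            while (std::isdigit((unsigned char)peek())) ++pos;
            return true;
        }
        if (match('(')) return E() && match(')');
        return false;
    }
};

bool accepts(const std::string& text) { return Recognizer(text).parse(); }
```

With this sketch, accepts("(23+2)*3") returns true, while accepts("2+") and accepts("(2") return false.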
SLR(1) Parser (variable part)
        F (action)                  G (goto)
State   i   (   )   +   -   #       S   E   T   i   (   )   +   -   #
0       s   s                           1   2   3   4
1                   s   s   s                               6   7   5
2       r2  r2  r2  r2  r2  r2
3       r5  r5  r5  r5  r5  r5
4       s   s                           10  2   3   4
5       a
6       s   s                               8   3   4
7       s   s                               9   3   4
8       r3  r3  r3  r3  r3  r3
9       r4  r4  r4  r4  r4  r4
10              s   s   s                               11  6   7
11      r6  r6  r6  r6  r6  r6
s = shift, rk = reduce by the k-th production, a = accept
F: State × T → Action    G: State × (T ∪ N) → State
SLR(1) Parser (fixed part)
PUSH(stack,0)
current := nextToken
while true do
  if F(TOPV(stack), current) = “s” then {shift}
    T := G(TOPV(stack), current)
    PUSH(stack,T)
    current := nextToken
  elseif F(TOPV(stack), current) = “r k” then {reduce k-th production}
    for j = 1 to SIZE(k)
      POP(stack)
    T := G(TOPV(stack),LHS(k))
    PUSH(stack,T)
  elseif F(TOPV(stack),current) = “a” then
    accept
  else
    error
  endif
enddo
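The driver above can be exercised with the tables from the previous slide. The sketch below is a hedged illustration only. The slides do not list the productions, so the grammar is an assumption inferred from the table entries: 1: S → E #, 2: E → T, 3: E → E + T, 4: E → E - T, 5: T → i, 6: T → ( E ). Tokens are single characters, the names G, reduceBy, productions and parse are hypothetical, and accept is taken as soon as state 5 is on top of the stack.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Variable part: goto table G, keyed by (state, symbol).
const std::map<std::pair<int, char>, int> G = {
    {{0, 'E'}, 1},  {{0, 'T'}, 2}, {{0, 'i'}, 3}, {{0, '('}, 4},
    {{1, '+'}, 6},  {{1, '-'}, 7}, {{1, '#'}, 5},
    {{4, 'E'}, 10}, {{4, 'T'}, 2}, {{4, 'i'}, 3}, {{4, '('}, 4},
    {{6, 'T'}, 8},  {{6, 'i'}, 3}, {{6, '('}, 4},
    {{7, 'T'}, 9},  {{7, 'i'}, 3}, {{7, '('}, 4},
    {{10, ')'}, 11}, {{10, '+'}, 6}, {{10, '-'}, 7},
};
// Rows of F that contain only "r k": state -> production number k.
const std::map<int, int> reduceBy = {{2, 2}, {3, 5}, {8, 3}, {9, 4}, {11, 6}};
// LHS(k) and SIZE(k) for the ASSUMED productions 2..6.
struct Production { char lhs; int size; };
const std::map<int, Production> productions = {
    {2, {'E', 1}}, {3, {'E', 3}}, {4, {'E', 3}}, {5, {'T', 1}}, {6, {'T', 3}}};
const int acceptState = 5;

bool isTerminal(char c) { return std::string("i()+-#").find(c) != std::string::npos; }

// Fixed part: the shift/reduce loop; a shift exists where G has a terminal entry.
bool parse(const std::string& tokens) {
    std::vector<int> stack{0};
    std::size_t pos = 0;
    auto nextToken = [&]() { return pos < tokens.size() ? tokens[pos++] : '\0'; };
    char current = nextToken();
    while (true) {
        assert(!stack.empty());
        int top = stack.back();
        if (top == acceptState) return true;                  // "a"
        auto red = reduceBy.find(top);
        if (red != reduceBy.end()) {                          // "r k"
            const Production& p = productions.at(red->second);
            for (int j = 0; j < p.size; j++) stack.pop_back();
            stack.push_back(G.at({stack.back(), p.lhs}));
            continue;
        }
        auto shift = G.find({top, current});
        if (isTerminal(current) && shift != G.end()) {        // "s"
            stack.push_back(shift->second);
            current = nextToken();
        } else {
            return false;                                     // error
        }
    }
}
```

Here parse("i+i#") and parse("(i-i)+i#") return true, while parse("i+#") returns false. As on the lexer slides, only the tables depend on the language.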
Attribute Grammars
Productions (p ∈ P):     Semantic functions (f_p,a, a ∈ S(X0)):
E → T EE                 E.val = EE.val
                         EE.inval = T.val
EE0 → + T EE1            EE0.val = EE1.val
                         EE1.inval = EE0.inval + T.val
EE0 → ε                  EE0.val = EE0.inval
T → F TT                 T.val = TT.val
                         TT.inval = F.val
TT0 → * F TT1            TT0.val = TT1.val
                         TT1.inval = TT0.inval * F.val
TT0 → ε                  TT0.val = TT0.inval
F → ( E )                F.val = E.val
F → Integer              F.val = Str2Int(Integer.lexVal)
Attribute Grammars
bool EE(int inVal, int &val) {
  if (scanner->currentToken().getLexem() == "+") {
    scanner->nextToken();
    int tempVal;
    bool ok = T(tempVal);
    return ok && EE(inVal + tempVal, val);
  } else {
    val = inVal;
    return true;
  }
}
bool T(int &val) {
  int tempVal;
  bool ok = F(tempVal);
  return ok && TT(tempVal, val);
}
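The same pattern extends to every nonterminal: inherited attributes (inval) become value parameters and synthesized attributes (val) become reference parameters. The following sketch evaluates a whole expression under the semantic functions of the attribute grammar. It is an assumption-laden illustration, not LISA output: the Evaluator class, its peek/match helpers and the evaluate function are hypothetical, input is a plain string without whitespace, and integer overflow is not checked.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

class Evaluator {
public:
    explicit Evaluator(const std::string& text) : input(text), pos(0) {}

    bool run(int& val) {
        bool ok = E(val);
        assert(pos <= input.size());
        return ok && pos == input.size();
    }

private:
    std::string input;
    std::size_t pos;

    char peek() const { return pos < input.size() ? input[pos] : '\0'; }
    bool match(char c) {
        if (peek() != c) return false;
        ++pos;
        return true;
    }

    bool E(int& val) {                       // EE.inval = T.val; E.val = EE.val
        int t = 0;
        return T(t) && EE(t, val);
    }
    bool EE(int inVal, int& val) {
        if (match('+')) {                    // EE1.inval = EE0.inval + T.val
            int t = 0;
            return T(t) && EE(inVal + t, val);
        }
        val = inVal;                         // EE0.val = EE0.inval
        return true;
    }
    bool T(int& val) {                       // TT.inval = F.val; T.val = TT.val
        int f = 0;
        return F(f) && TT(f, val);
    }
    bool TT(int inVal, int& val) {
        if (match('*')) {                    // TT1.inval = TT0.inval * F.val
            int f = 0;
            return F(f) && TT(inVal * f, val);
        }
        val = inVal;                         // TT0.val = TT0.inval
        return true;
    }
    bool F(int& val) {
        if (std::isdigit((unsigned char)peek())) {   // F.val = Str2Int(lexVal)
            val = 0;
            while (std::isdigit((unsigned char)peek()))
                val = val * 10 + (input[pos++] - '0');
            return true;
        }
        if (match('(')) return E(val) && match(')'); // F.val = E.val
        return false;
    }
};

bool evaluate(const std::string& text, int& val) { return Evaluator(text).run(val); }
```

For the running example, evaluate("(23+2)*3", v) returns true and sets v to 75.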
LISA ver. 1
• Developed in 1994
• Mernik, Korbar, Žumer. LISA: A Tool for Automatic Language Implementation, ACM SIGPLAN Notices, Vol. 30, No. 4, pp. 71-79, 1995.
LISA ver. 2.0
• LISA ver. 2.0 (joint work with M. Lenič, E. Avdičaušević, V. Žumer)
  – started in summer 1997
  – finished in summer 2000
• Incremental language development
• Educational tool
LISA ver. 2.0
• LISA also generates other language-based tools
  – editors,
  – inspectors,
  – visualizers/animators (Slovene-Portugal Project)
LISA ver. 2.0
• LISA tool demonstration
More info: http://marcel.uni-mb.si/lisa