by Neng-Fa Zhou
Programming language syntax
Three aspects of languages
– Syntax
  • How are sentences formed?
– Semantics
  • What does a sentence mean?
– Pragmatics
  • How to use the language?
Only syntax can be described formally
– Regular expressions and context-free grammars
Regular expressions (RE)
The empty string ε is a RE
Every symbol in Σ (the alphabet) is a RE
Let r and s be REs. Then the following are also REs:
– r | s : or
– rs : concatenation
– (r)* : zero or more instances
– (r)+ : one or more instances
– (r)? : zero or one instance
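The operators above can be tried out with java.util.regex. This is only a sketch: the class name ReOperators and the helper in() are made up for illustration, and Pattern.matches is used because it tests the whole string rather than a substring.

```java
import java.util.regex.Pattern;

class ReOperators {
    // True when the whole string s is in the language of the regular expression re.
    static boolean in(String re, String s) {
        return Pattern.matches(re, s);
    }

    public static void main(String[] args) {
        System.out.println(in("a|b", "b"));       // r | s   : prints true
        System.out.println(in("ab", "ab"));       // rs      : prints true
        System.out.println(in("(ab)*", ""));      // (r)*    : prints true, zero instances allowed
        System.out.println(in("(ab)+", ""));      // (r)+    : prints false, needs at least one
        System.out.println(in("(ab)?", "abab"));  // (r)?    : prints false, at most one instance
    }
}
```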
Precedence of operators
From high to low:
– r* r+ r? (highest)
– rs
– r|s (lowest)
All operators are left associative.
Examples, with Σ = {a,b}
1. a|b
2. (a|b)(a|b)
3. a*
4. (a|b)*
5. a | a*b
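A quick way to check the precedence rules on example 5 is to compare it with explicitly parenthesized versions. The sketch below assumes java.util.regex follows the same precedence (it does for these three operators); the class name RePrecedence and the test strings are chosen only for illustration.

```java
import java.util.regex.Pattern;

class RePrecedence {
    // True when the whole string s is in the language of the regular expression re.
    static boolean in(String re, String s) {
        return Pattern.matches(re, s);
    }

    public static void main(String[] args) {
        String implicit = "a|a*b";        // example 5 as written
        String explicit = "(a)|((a*)b)";  // grouping implied by the precedence rules
        String other = "(a|a)*b";         // a different grouping, for contrast
        for (String s : new String[] {"a", "b", "aab", "aa", ""}) {
            System.out.println("\"" + s + "\": " + in(implicit, s) + " "
                    + in(explicit, s) + " " + in(other, s));
        }
    }
}
```

The first two columns agree on every input, while the string a is accepted by a|a*b but not by (a|a)*b.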
Algebraic Properties of RE
Regular Definitions
d1 → r1
d2 → r2
...
dn → rn
Each ri is a RE over Σ ∪ {d1,d2,...,di-1}
Definitions may refer only to earlier names, so they are not recursive
Examples
Identifiers
letter → A | B | ... | Z | a | b | ... | z
digit → 0 | 1 | ... | 9
id → letter ( letter | digit )*
Decimal integers in Java
DecimalNumeral → 0 | nonZeroDigit digit*
Hexadecimal integers
HexaNumeral → (0x | 0X) hexadigit+
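Because each definition only refers to earlier ones, the examples can be mirrored in Java by plain string concatenation. A sketch only: nonZeroDigit and hexadigit are not spelled out on the slide, so the character classes used for them here are assumptions, as are the class and constant names.

```java
import java.util.regex.Pattern;

class RegularDefinitions {
    // Each constant is built only from constants defined before it (no recursion).
    static final String LETTER = "[A-Za-z]";
    static final String DIGIT = "[0-9]";
    static final String ID = LETTER + "(" + LETTER + "|" + DIGIT + ")*";

    // Assumed definitions: the slide names these but does not define them.
    static final String NON_ZERO_DIGIT = "[1-9]";
    static final String HEXA_DIGIT = "[0-9A-Fa-f]";

    static final String DECIMAL_NUMERAL = "0|" + NON_ZERO_DIGIT + DIGIT + "*";
    static final String HEXA_NUMERAL = "(0x|0X)" + HEXA_DIGIT + "+";

    // True when the whole string s matches the given definition.
    static boolean matches(String definition, String s) {
        return Pattern.matches(definition, s);
    }

    public static void main(String[] args) {
        System.out.println(matches(ID, "x1"));               // prints true
        System.out.println(matches(ID, "1x"));               // prints false
        System.out.println(matches(DECIMAL_NUMERAL, "007")); // prints false
        System.out.println(matches(HEXA_NUMERAL, "0XfF"));   // prints true
    }
}
```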
Lex
A tool for automatically generating lexical analyzers
Lex Specifications
declarations
%%
translation rules
%%
auxiliary procedures

The translation rules pair each pattern (a regular expression) with an action:
p1 {action1}
p2 {action2}
...
pn {actionn}
Lex Regular Expressions
Example-1
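%{
/* Counts the lines and characters in the input. */
#include <stdio.h>
int num_lines = 0, num_chars = 0;
%}
%%
\n      { ++num_lines; ++num_chars; }
.       { ++num_chars; }
%%
int main(void)
{
    yylex();
    printf("# of lines = %d, # of chars = %d\n", num_lines, num_chars);
    return 0;
}

/* Return 1 at end of input so the scanner stops; returning 0 promises more input. */
int yywrap(void) { return 1; }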
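Example-2
%{
/* Recognizes numbers: digits, optional fraction, optional exponent. */
#include <stdio.h>
%}
D       [0-9]
INT     {D}{D}*
%%
{INT}("."{INT}((e|E)("+"|-)?{INT})?)?   { printf("valid %s\n", yytext); }
.                                       { printf("unrecognized %s\n", yytext); }
%%
int main(int argc, char *argv[])
{
    ++argv, --argc;     /* skip the program name */
    if (argc > 0)
        yyin = fopen(argv[0], "r");
    else
        yyin = stdin;
    if (yyin == NULL) { perror(argv[0]); return 1; }
    yylex();
    return 0;
}

/* Return 1 at end of input so the scanner stops; returning 0 promises more input. */
int yywrap(void) { return 1; }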
java.util.regex
import java.util.regex.*;

class Number {
    public static void main(String[] args) {
        // Digits, optional fraction, optional exponent (same syntax as Example-2).
        String regExNum = "\\d+(\\.\\d+((e|E)(\\+|-)?\\d+)?)?";
        if (Pattern.matches(regExNum, args[0]))
            System.out.println("valid");
        else
            System.out.println("invalid");
    }
}
String Pattern Matching in Perl
print "Input a string: ";
$_ = <STDIN>;
chomp($_);
if (/^[0-9]+(\.[0-9]+((e|E)(\+|-)?[0-9]+)?)?$/) {
    print "valid\n";
} else {
    print "invalid\n";
}
Context-free Grammars
G = (Σ, N, P, S)
– Σ is a finite set of terminals
– N is a finite set of non-terminals
– P is a finite set of production rules of the form A → α, where A is in N and α is in (Σ ∪ N)*
– S, a member of N, is the start symbol
CFG: Examples
Arithmetic expressions
E → T | E + T | E - T
T → F | T * F | T / F
F → id | (E)
Statements
IfStatement → if E then Statement else Statement
CFG vs. Regular Expressions
CFG is more expressive than RE
– Every language that can be described by a regular expression can also be described by a CFG
Example languages that are context-free but not regular
– if-then-else statements, {a^n b^n | n>=1}
Languages that are not context-free
– L1 = {wcw | w is in (a|b)*}
– L2 = {a^n b^m c^n d^m | n>=1 and m>=1}
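To see why the a^n b^n language fits a CFG, note that it is generated by the grammar S → a S b | a b (this grammar is an assumption for illustration, it is not on the slide). The sketch below checks membership by following the two productions directly; the nesting it relies on is what a regular expression cannot express. The class and method names are made up.

```java
class AnBn {
    // True when s can be derived from S, where S → a S b | a b.
    static boolean derivesS(String s) {
        int len = s.length();
        // Both productions start with a and end with b.
        if (len < 2 || s.charAt(0) != 'a' || s.charAt(len - 1) != 'b') {
            return false;
        }
        // S → a b, or S → a S b with the middle part derived from S again.
        return len == 2 || derivesS(s.substring(1, len - 1));
    }

    public static void main(String[] args) {
        System.out.println(derivesS("aabb")); // prints true
        System.out.println(derivesS("aab"));  // prints false
        System.out.println(derivesS("abab")); // prints false
    }
}
```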
Derivations
αAβ ⇒ αγβ if A → γ
α ⇒* α
If α ⇒* β and β ⇒ γ, then α ⇒* γ
If S ⇒* α, then α is a sentential form
α is a sentence if it contains only terminal symbols
Derivations
Leftmost derivation
αAβ ⇒lm αγβ if A → γ and α is a string of terminals
Rightmost derivation
αAβ ⇒rm αγβ if A → γ and β is a string of terminals
Parse Trees
A parse tree is any tree in which
– The root is labeled with S
– Each leaf is labeled with a token a or ε
– Each interior node is labeled by a nonterminal
– If an interior node is labeled A and has children labeled X1, ..., Xn, then A → X1...Xn is a production
Parse Trees and Derivations
E → E + E | E * E | E - E | - E | ( E ) | id
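A leftmost derivation in this grammar can be traced mechanically: at each step the leftmost E is replaced by the right-hand side of one production. The sketch below does this by string rewriting, which works here only because E is the single nonterminal and no terminal contains the letter E. The class name, the method names and the chosen sentence -(id+id) are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LeftmostDerivation {
    // Right-hand sides of the productions for E, written without spaces.
    static final List<String> RHS = Arrays.asList("E+E", "E*E", "E-E", "-E", "(E)", "id");

    // One leftmost derivation step: rewrite the leftmost E using E → rhs.
    static String step(String form, String rhs) {
        if (!RHS.contains(rhs)) {
            throw new IllegalArgumentException("not a production: E -> " + rhs);
        }
        int i = form.indexOf('E');
        if (i < 0) {
            throw new IllegalArgumentException("no nonterminal left in " + form);
        }
        return form.substring(0, i) + rhs + form.substring(i + 1);
    }

    // Returns every sentential form of the derivation, starting from E.
    static List<String> derive(String... choices) {
        List<String> forms = new ArrayList<>();
        String form = "E";
        forms.add(form);
        for (String rhs : choices) {
            form = step(form, rhs);
            forms.add(form);
        }
        return forms;
    }

    public static void main(String[] args) {
        // prints E => -E => -(E) => -(E+E) => -(id+E) => -(id+id)
        System.out.println(String.join(" => ", derive("-E", "(E)", "E+E", "id", "id")));
    }
}
```

Each string in the output is a sentential form, and the last one is a sentence because it contains only terminals.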
YACC
%token DIGIT
%%
lines   : lines expr '\n'       { printf("%d\n", $2); }
        | lines '\n'
        | /* empty */
        ;
expr    : expr '+' term         { $$ = $1 + $3; }
        | expr '-' term         { $$ = $1 - $3; }
        | term
        ;
term    : term '*' factor       { $$ = $1 * $3; }
        | term '/' factor       { $$ = $1 / $3; }
        | factor
        ;
factor  : '(' expr ')'          { $$ = $2; }
        | DIGIT
        ;
%%
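For comparison, the same expr/term/factor grammar can be evaluated by hand. The sketch below is not what YACC generates (YACC builds a table-driven bottom-up parser); it is a hand-written recursive-descent evaluator in which the left-recursive rules are rewritten as loops. It assumes DIGIT is a single character 0 to 9 and that the input has no spaces; the class and method names are made up.

```java
class ExprEval {
    private final String src;
    private int pos = 0;

    private ExprEval(String src) {
        this.src = src;
    }

    // Evaluates one expression built from single digits, + - * / and parentheses.
    static int eval(String src) {
        ExprEval p = new ExprEval(src);
        int value = p.expr();
        if (p.pos != src.length()) {
            throw new IllegalArgumentException("unexpected input at " + p.pos);
        }
        return value;
    }

    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    // expr : expr '+' term | expr '-' term | term   (left recursion written as a loop)
    private int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // term : term '*' factor | term '/' factor | factor
    private int term() {
        int value = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            int rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;
        }
        return value;
    }

    // factor : '(' expr ')' | DIGIT
    private int factor() {
        char c = peek();
        if (c == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("expected ')' at " + pos);
            }
            pos++;
            return value;
        }
        if (c >= '0' && c <= '9') {
            pos++;
            return c - '0';
        }
        throw new IllegalArgumentException("expected digit or '(' at " + pos);
    }

    public static void main(String[] args) {
        System.out.println(eval("2+3*4"));   // prints 14
        System.out.println(eval("(2+3)*4")); // prints 20
        System.out.println(eval("8-3-2"));   // prints 3
    }
}
```

The loops keep the left associativity of the original rules, so 8-3-2 is evaluated as (8-3)-2.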
DCG in Prolog
Strings with an equal number of 0’s and 1’s
DCG
:- table e/2.
e --> [].
e --> [0], e, [1].
e --> [1], e, [0].
e --> e, e.
Prolog clauses
:- table e/2.
e(A, A).
e(A, B) :- 'C'(A, 0, C), e(C, D), 'C'(D, 1, B).
e(A, B) :- 'C'(A, 1, C), e(C, D), 'C'(D, 0, B).
e(A, B) :- e(A, C), e(C, B).