LAB # 5: SYMBOL TABLE, FLEX COMPILER ENGINEERING
Jan 17, 2015
Department of Computer Science - Compiler Engineering Lab
WHY DO I NEED A SYMBOL TABLE?
• A symbol table answers the following questions:
1. For a given declaration of an identifier name, does it have multiple declarations in different scopes?
  • If YES, keep track of these different cases
2. For a USE of an identifier name, to which scope does it correspond? (using the "most closely nested" rule)
3. How can the various logical structures of the language be represented?
• One of the main purposes of the symbol table is to keep track of the IDENTIFIERS recognized in the input stream.
• All subsequent references to an identifier refer to the appropriate symbol table index.
1-4/4/12
WHY DO I NEED A SYMBOL TABLE?
• So, a symbol table is a group of linked hash tables that either:
  • manage the attributes of IDENTIFIERS,
  • or convey the structure of the statement\expression that they reflect.
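The idea of linked hash tables and the "most closely nested" rule can be sketched in C as follows. This is a minimal illustration rather than the lab's actual implementation: the names `Scope`, `Entry`, `scope_push`, `scope_declare`, `scope_lookup`, and `scope_pop`, the table size, and the single `type` attribute are all assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

#define BUCKETS 64  /* hypothetical table size */

/* One entry: an identifier name plus a single illustrative attribute. */
typedef struct Entry {
    char *name;
    const char *type;    /* e.g. "int"; stands in for real attributes */
    struct Entry *next;  /* chaining inside one bucket */
} Entry;

/* One hash table per scope, linked to the enclosing scope. */
typedef struct Scope {
    Entry *buckets[BUCKETS];
    struct Scope *parent;
} Scope;

static unsigned hash(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33u + (unsigned char)*s++;
    return h % BUCKETS;
}

/* Open a new scope nested inside `parent` (NULL for the global scope). */
Scope *scope_push(Scope *parent) {
    Scope *s = calloc(1, sizeof *s);
    if (s) s->parent = parent;
    return s;
}

/* Record a declaration in the current scope only; returns 1 on success. */
int scope_declare(Scope *s, const char *name, const char *type) {
    Entry *e = malloc(sizeof *e);
    if (!e) return 0;
    e->name = malloc(strlen(name) + 1);
    if (!e->name) { free(e); return 0; }
    strcpy(e->name, name);
    e->type = type;
    unsigned h = hash(name);
    e->next = s->buckets[h];
    s->buckets[h] = e;
    return 1;
}

/* Resolve a use: search the innermost scope first, then walk outward. */
const Entry *scope_lookup(const Scope *s, const char *name) {
    for (; s; s = s->parent)
        for (const Entry *e = s->buckets[hash(name)]; e; e = e->next)
            if (strcmp(e->name, name) == 0) return e;
    return NULL;
}

/* Close a scope, free its entries, and return the enclosing scope. */
Scope *scope_pop(Scope *s) {
    Scope *parent = s->parent;
    for (int i = 0; i < BUCKETS; i++) {
        Entry *e = s->buckets[i];
        while (e) {
            Entry *next = e->next;
            free(e->name);
            free(e);
            e = next;
        }
    }
    free(s);
    return parent;
}
```

Declaring `x` in both a global and a nested scope and then calling `scope_lookup` from the nested scope returns the inner declaration; after `scope_pop`, the same lookup returns the outer one.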
BUILDING COMPILER WITH TOOLS
1. Lexical Analysis (Lex \ Flex)
2. Syntax-Semantic Analysis (Yacc \ Bison)
3. Assembly (LLVM)
4. Linking
FIRST: LEXICAL ANALYSIS WITH FLEX
Reminder: What is the goal of the Lexical Analysis phase? In other words, why do I use TOKENS?
• Tokens are numerical representations of strings, and they simplify processing.
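As a rough illustration of why numeric tokens simplify processing, the C sketch below maps lexemes to integer codes once, so that later phases can compare integers instead of strings. The token names, their numeric values, and the helper functions `classify_word` and `starts_statement` are hypothetical and chosen only for this example.

```c
#include <string.h>

/* Hypothetical token codes, started above 255 so they cannot collide
   with single-character codes (an assumption of this sketch). */
enum Token { TOK_IF = 256, TOK_THEN, TOK_WHILE, TOK_IDENT };

/* Map a lexeme to its token code: keywords get their own code,
   anything else is treated as an identifier. */
enum Token classify_word(const char *lexeme) {
    static const struct { const char *text; enum Token tok; } keywords[] = {
        { "if", TOK_IF }, { "then", TOK_THEN }, { "while", TOK_WHILE },
    };
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i].text) == 0)
            return keywords[i].tok;
    return TOK_IDENT;
}

/* A later phase only needs an integer switch, not string comparisons. */
int starts_statement(enum Token t) {
    switch (t) {
    case TOK_IF:
    case TOK_WHILE:
        return 1;
    default:
        return 0;
    }
}
```

The string comparison happens once, in the scanner; everything downstream works on the small integer code.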
BUILDING A COMPILER WITH FLEX(LEX) & BISON(YACC)
[Diagram: Flex\Bison Interaction Model, showing Lex \ Flex and Yacc \ Bison]
FLEX & BISON INPUT-OUTPUT INTERACTION
• Bas.y (Parser Input)
• y.tab.h (Parser Output):
  • is a header file that contains, among other things,
    • the definitions of the token names NUM, OPA, etc.,
    • and the variable yylval, which we use to pass the semantic values of tokens to the Bison code
• y.tab.c (Parser Output):
  • contains the C\C++ code for the parser (a function called yyparse())
• Bas.l (Lexical Analyzer Input):
  • needs the definitions in y.tab.h so that the yylex() function it defines for Bison can pass back the information the parser needs
• lex.yy.c (Lexical Analyzer Output):
  • is generated by Flex and contains, among other things, the definition of the yylex() function
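The hand-off described above can be sketched in plain C. This is a hand-written stand-in, not the code that Flex or Bison generate: the numeric values of NUM and OPA, the union members `val` and `sym` (borrowed from the calculator example), and the function `sketch_yylex` with its `input` parameter are assumptions for illustration. The real yylex() takes no such argument and reads from its own input stream.

```c
#include <ctype.h>
#include <stdlib.h>

/* Roughly what the parser-generated header contributes
   (the numeric values here are assumptions). */
enum { NUM = 258, OPA = 259 };
typedef union { int val; char sym; } YYSTYPE;

/* In a real build the parser defines yylval and the header declares it. */
YYSTYPE yylval;

/* Hand-written stand-in for yylex(): returns a token code, stores the
   token's semantic value in yylval, and returns 0 at end of input. */
int sketch_yylex(const char **input) {
    const char *p = *input;
    while (*p == ' ' || *p == '\t' || *p == '\n') p++;  /* skip whitespace */
    if (*p == '\0') { *input = p; return 0; }
    if (isdigit((unsigned char)*p)) {
        char *end;
        yylval.val = (int)strtol(p, &end, 10);
        *input = end;
        return NUM;
    }
    if (*p == '+' || *p == '-') {
        yylval.sym = *p;
        *input = p + 1;
        return OPA;
    }
    *input = p + 1;
    return (unsigned char)*p;  /* any other character is returned as itself */
}
```

The return value tells the parser which kind of token was seen, while `yylval` carries the value that goes with it (the number 12, the operator '+').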
FLEX
• Flex is an open-source tool for generating fast lexical analyzers. It can be downloaded from the Flex download webpage.
• Flex reads user-specified input files
  • If no input file is given, Flex reads its standard input
• Flex INPUT file:
  • is a description of a scanner to generate
  • The description consists of pairs of regular expressions and C code, called RULES
Source: Flex Website
FLEX
• When the executable runs, it analyzes its input for occurrences of text matching the regular expression (RegExp) of each rule.
  • Whenever it finds a match, it executes the corresponding C code
• Flex generates a C source file named “lex.yy.c”, which defines the function yylex()
  • The file “lex.yy.c” can be compiled and linked to produce an executable
BISON(1)
• Bison is a general-purpose parser generator that converts an annotated context-free grammar into a deterministic LR or generalized LR (GLR) parser.
• You can use Bison to develop a wide range of language parsers.
• Bison is compatible with Yacc.
• It is based on C\C++ and also works with Java.
(1) Source: Bison Webpage
FLEX TOOL
FIRST : LEXICAL ANALYSIS
BISON\YACC & PARSING
• The grammar in the previous diagram (Flex\Bison Interaction Model Diagram) is a text file you create with a text editor
• Yacc will read your grammar and generate C code for a syntax analyzer or parser
• The syntax analyzer uses grammar rules that allow it to analyze tokens from the lexical analyzer and create a syntax tree
• The syntax tree imposes a hierarchical structure on the tokens
  • e.g. operator precedence and associativity are apparent in the syntax tree
LEXICAL ANALYSIS: FLEX IN DETAIL
• The Flex-generated scanner code tries to match characters from the current input stream against these regular expressions, and when a match is found, it executes the associated action code.
• The variable yytext contains the string of characters that were matched (a string in the C sense, i.e. a '\0'-terminated char*).
• When more than one match is possible, it breaks ties by choosing the longest match, then the rule listed first.
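The tie-breaking rule can be illustrated with a small C sketch. It does not reproduce how Flex works internally (Flex compiles the regular expressions into a single automaton); instead each rule is modelled as a hypothetical matcher function, and `pick_rule`, the `Matcher` type, and the three sample rules are assumptions for the example.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A rule is modelled as a function that returns how many leading
   characters of `s` it matches (0 means no match). */
typedef size_t (*Matcher)(const char *s);

static size_t match_if(const char *s) {
    return strncmp(s, "if", 2) == 0 ? 2 : 0;
}

static size_t match_ident(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

static size_t match_number(const char *s) {
    size_t n = 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Rules in listing order: the keyword comes before the identifier rule. */
enum { RULE_IF, RULE_IDENT, RULE_NUMBER, RULE_COUNT };
static const Matcher rules[RULE_COUNT] = { match_if, match_ident, match_number };

/* Return the index of the winning rule, or -1 if no rule matches.
   *matched_len receives the length of the winning match. */
int pick_rule(const char *input, size_t *matched_len) {
    int best = -1;
    size_t best_len = 0;
    for (int i = 0; i < RULE_COUNT; i++) {
        size_t len = rules[i](input);
        /* Only a strictly longer match replaces the current winner,
           so on equal lengths the earlier rule is kept. */
        if (len > best_len) {
            best = i;
            best_len = len;
        }
    }
    *matched_len = best_len;
    return best;
}
```

With these rules, the input "if" matches both the keyword and the identifier rule with length 2, so the keyword wins because it is listed first; the input "iffy" goes to the identifier rule because that match is longer. This is why keyword rules are usually placed before the identifier rule in a Flex file.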
FLEX FILES FORMAT
The general format of Lex source is:
{definitions}
%%
{rules}
%%
{user subroutines}
FLEX FILES FORMAT
• Any line which is not part of a Lex rule or action and which begins with a blank or tab is copied into the Lex-generated program. (used for comments)
  • This means each rule must start in column one.
• Source input prior to the first %% delimiter will be external to any function in the code; if it appears immediately after the first %%, it appears in an appropriate place for declarations in the function written by Lex.
• Anything included between lines containing only %{ and %} is copied out as above. The delimiters are discarded. This format permits entering text like preprocessor statements that must begin in column 1.
EXAMPLE # 1 : CALCULATOR FLEX FILE
Write a Flex file for a calculator that is able to recognize the following:
• numbers,
• (+,-,*,/) operators,
• and parentheses
EXAMPLE # 1 : CALCULATOR (.L)
%{
#include <cstdlib>      /* atoi, exit */
#include <iostream>
#include "ex1.tab.hpp"  /* token names and yylval from the parser */
using namespace std;
%}

%option noyywrap

%%
[0-9]+      { yylval.val = atoi(yytext); return NUM; }
[+\-]       { yylval.sym = yytext[0]; return OPA; }   /* '+' or '-' */
[*/]        { yylval.sym = yytext[0]; return OPM; }   /* '*' or '/' */
"("         { return LP; }
")"         { return RP; }
";"         { return STOP; }
<<EOF>>     { return 0; }
[ \t\n]+    { /* skip whitespace */ }
.           { cerr << "Unrecognized token!" << endl; exit(1); }
%%
EXAMPLE # 2 : “BASIC LANGUAGE” FILE
Write a Flex file for “Basic Language” that is able to recognize the following:
• Numbers,
• Identifiers,
• Keywords,
• Relational operators,
• Arithmetic operators,
• and Delimiters
EXAMPLE # 2 : (BAS.L) FILE FOR BASIC LANGUAGE
%{
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* atoi */
#include <string.h>  /* strdup */
#include "y.tab.h"   /* token names and yylval from the parser */
%}

digit   [0-9]
letter  [a-zA-Z]

%%
"+"     { return PLUS; }
"-"     { return MINUS; }
"*"     { return TIMES; }
"/"     { return SLASH; }
"("     { return LPAREN; }
")"     { return RPAREN; }
";"     { return SEMICOLON; }
","     { return COMMA; }
"."     { return PERIOD; }
EXAMPLE # 2 : (BAS.L) FILE FOR BASIC LANGUAGE
":="    { return BECOMES; }
"="     { return EQL; }
"<>"    { return NEQ; }
"<"     { return LSS; }
">"     { return GTR; }
"<="    { return LEQ; }
">="    { return GEQ; }
EXAMPLE # 2 : (BAS.L) FILE FOR BASIC LANGUAGE
"begin"      { return BEGINSYM; }
"call"       { return CALLSYM; }
"const"      { return CONSTSYM; }
"do"         { return DOSYM; }
"end"        { return ENDSYM; }
"if"         { return IFSYM; }
"odd"        { return ODDSYM; }
"procedure"  { return PROCSYM; }
"then"       { return THENSYM; }
"var"        { return VARSYM; }
"while"      { return WHILESYM; }
EXAMPLE # 2 : (BAS.L) FILE FOR BASIC LANGUAGE
{letter}({letter}|{digit})*  { yylval.id = (char *)strdup(yytext); return IDENT; }
{digit}+                     { yylval.num = atoi(yytext); return NUMBER; }
[ \t\n\r]                    /* skip whitespace */
.                            { printf("Unknown character [%c]\n", yytext[0]); return UNKNOWN; }
%%
int yywrap(void) { return 1; }
INSTALLING FLEX & BISON
• Flex & Bison are Unix-based tools that can be installed directly under a Unix\Linux environment.
• Under a Windows environment, Flex & Bison can be installed under Cygwin (Linux Terminal).
  • Cygwin can be downloaded from here.
  • Cygwin is a Linux API layer providing substantial Linux API functionality.
INSTALLING FLEX & BISON
• Steps to download Cygwin:
1. Click the following link for Setup
2. Click Run, then Next. Choose the option “Install from Internet”, then Next. Make sure the installation directory is correct, then Next.
3. Choose “Direct Connection”, then Next.
4. Choose any of the available HTTP download sites and click Next.
5. In the “Select Packages” page: search for both the “Flex” and “Bison” packages through the search textbox, then ensure the “install” option is selected.
6. Finish the installation process by clicking Next to the end.
OVERVIEW OF ANTLR(1)
• ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions in a variety of target languages.
• ANTLR provides excellent support for tree construction, tree walking, translation, error recovery, and error reporting
• ANTLR can be downloaded from the ANTLR (.jar) file download link
(1) Source: ANTLR Webpage
QUESTIONS?
Thank you for listening