Compiler Lab Manual Final E-content

PRIST UNIVERSITY

(Estd. u/s 3 of UGC Act, 1956)

_________________________________________________________________________________

11150L66-Compiler Design

Lab Manual

III Year / VI Semester

PREPARED BY

R.BHAVANI,

ASSISTANT PROFESSOR

CSE DEPARTMENT

November-2013

FACULTY OF ENGINEERING AND TECHNOLOGY

CONTENTS

S.No Title

1. Implement a lexical analyzer in “C”

2. Implement a lexical analyzer in “C”

3. Use LEX tool to implement a lexical analyzer

4. Implement a recursive descent parser for an expression grammar that generates arithmetic expressions with digits, + and *.

5. Use YACC and LEX to implement a parser for the same grammar as given in previous problem

6. Write semantic rules to the YACC program and implement a calculator.

7. Implement the front end of a compiler that generates the three address code for a simple language.

8. Implement the front end of a compiler that generates the three address code for a simple language.

9. Implement the back end of the compiler and produces the assembly language instructions

10. Implement the back end of the compiler which takes the three address code generated and produces the assembly language instructions.

11. Implementing a Shift reduce parser.

12. Sample LEX program

11150L66/Compiler Design Lab III year/VI Sem

1 Prepared by R.Bhavani, AP/CSE, PRIST University.

THEORY OF PRACTICE

Lexical Analyzer

• The lexical analyzer reads the source program character by character to produce tokens.

• Normally a lexical analyzer doesn't return a list of tokens in one pass; it returns a

token each time the parser asks for one.

The Role of the Lexical Analyzer

Read input characters

Group them into lexemes

Produce as output a sequence of tokens

◦ Which in turn is used as input for the syntax analyzer

Interact with the symbol table

◦ Insert identifiers

It strips out

◦ comments

◦ White spaces: blank, new line, tab…

◦ other separators

Correlates error messages generated by the compiler with the source program

◦ By keeping track of the number of newlines seen

◦ And associating a line number with each error message

Tokens, Patterns, Lexemes

Token - pair of:

◦ Token name – abstract symbol representing a kind of lexical unit

keyword, identifier, …

◦ Optional attribute value

Pattern

◦ Description of the form that the lexeme of a token may take

◦ e.g.

For a keyword the pattern is the character sequence forming that

keyword

For identifiers the pattern is a complex structure that is matched by

many strings

Lexeme

◦ A sequence of characters in the source program matching a pattern for a

token



Shift-Reduce Parser:

• A shift-reduce parser has four possible actions:

– Shift : The next input symbol is shifted onto the top of the stack.

– Reduce: Replace the handle on the top of the stack by the non-terminal.

– Accept: Successful completion of parsing.

– Error: Parser discovers a syntax error, and calls an error recovery routine.

• Initially the stack contains only the end-marker $.

• The end of the input string is marked by the end-marker $.

A Stack Implementation of A Shift-Reduce Parser

Stack      Input        Action

$          id+id*id$    shift
$id        +id*id$      reduce by F -> id
$F         +id*id$      reduce by T -> F
$T         +id*id$      reduce by E -> T
$E         +id*id$      shift
$E+        id*id$       shift
$E+id      *id$         reduce by F -> id
$E+F       *id$         reduce by T -> F
$E+T       *id$         shift
$E+T*      id$          shift
$E+T*id    $            reduce by F -> id
$E+T*F     $            reduce by T -> T*F
$E+T       $            reduce by E -> E+T
$E         $            accept

Recursive-Descent Parsing (uses Backtracking)

• It tries to find the left-most derivation.

• Backtracking is needed (if a choice of a production rule does not work, we

backtrack to try other alternatives).

• It is a general parsing technique, but not widely used.

• It is not efficient.



Operator Precedence Parsing:

• Form of Shift/Reduce parsing

• Two important properties of the grammars for these shift-reduce parsers are that ε (epsilon) does not

appear on the right side of any production and that no production has two adjacent non

terminals (NT).

E -> E + E // acceptable production

T -> + T T // wrong production, because it has 2 adjacent NT.

These restrictions allow us to find handles.

Precedence

• We need to define three different precedence relations between pairs of

terminals. They are written <., =. and >. and are assigned by using

precedence rules.

Relation Meaning:

a <. b    a yields precedence to b

a =. b    a has the same precedence as b

a >. b    a takes precedence over b

THE LEXICAL-ANALYZER GENERATOR LEX

In this section, we introduce a tool called Lex, or in a more recent implementation Flex,

that allows one to specify a lexical analyzer by specifying regular expressions to describe

patterns for tokens. The input notation for the Lex tool is referred to as the Lex language

and the tool itself is the Lex compiler. Behind the scenes, the Lex compiler transforms the

input patterns into a transition diagram and generates code, in a file called lex.yy.c,

that simulates this transition diagram. The mechanics of how this translation from regular

expressions to transition diagrams occurs is the subject of the next sections; here we only

learn the Lex language.

Use of Lex:

An input file lex1 is written in the Lex language and describes the lexical analyzer to be

generated. The Lex compiler transforms lex1 to a C program in a file that is always

named lex.yy.c.



STRUCTURE OF A LEX PROGRAM:

A Lex program has the following form:

declarations

%%

translation rules

%%

auxiliary functions

The declarations section includes declarations of variables, manifest constants (identifiers

declared to stand for a constant, e.g., the name of a token), and regular definitions

The translation rules each have the form

Pattern { Action }

Each pattern is a regular expression, which may use the regular definitions of the

declaration section. The actions are fragments of code, typically written in C.

The third section holds whatever additional functions are used in the actions.

Alternatively, these functions can be compiled separately and loaded with the lexical

analyzer.

The lexical analyzer created by Lex behaves as follows :

1. When called by the parser, the lexical analyzer begins reading its remaining input, one

character at a time, until it finds the longest prefix of the input that matches one of the

patterns Pi.

2.It then executes the associated action Ai. Typically, Ai will return to the parser, but if it

does not (e.g., because Pi describes whitespace or comments), then the lexical analyzer

proceeds to find additional lexemes, until one of the corresponding actions causes a

return to the parser.

3.The lexical analyzer returns a single value, the token name, to the parser, but uses the

shared, integer variable yylval to pass additional information about the lexeme found, if

needed.

LEX PROGRAM FOR TOKEN :

%{
/* definitions of manifest constants
LT, LE, EQ, NE, GT, GE,
IF, THEN, ELSE, ID, NUMBER, RELOP */
%}

/* regular definitions */
delim    [ \t\n]
ws       {delim}+
letter   [A-Za-z]
digit    [0-9]
id       {letter}({letter}|{digit})*
number   {digit}+(\.{digit}+)?(E[+-]?{digit}+)?

%%

{ws}      {/* no action and no return */}
if        {return(IF);}
then      {return(THEN);}
else      {return(ELSE);}
{id}      {yylval = (int) installID(); return(ID);}
{number}  {yylval = (int) installNum(); return(NUMBER);}
"<"       {yylval = LT; return(RELOP);}
"<="      {yylval = LE; return(RELOP);}
"="       {yylval = EQ; return(RELOP);}
"<>"      {yylval = NE; return(RELOP);}
">"       {yylval = GT; return(RELOP);}
">="      {yylval = GE; return(RELOP);}

%%

int installID() {/* function to install the lexeme, whose first character is pointed to by yytext,
and whose length is yyleng, into the
symbol table and return a pointer
thereto */ }

int installNum() {/* similar to installID, but puts numerical
constants into a separate table */ }

In the declarations section we see a pair of special brackets, %{ and %}. Anything

within these brackets is copied directly to the file lex.yy.c, and is not treated as a

regular definition. The manifest constants are placed inside these brackets. The declarations

section also contains a sequence of regular definitions.

Regular definitions that are used in later definitions or in the patterns of the

translation rules are surrounded by curly braces. Thus, for instance, delim is defined to be

a shorthand for the character class consisting of the blank, the tab, and the newline; the

latter two are represented, as in all UNIX commands, by backslash followed by t or n,

respectively.

In the auxiliary-function section, we see two such functions, installID() and installNum().

Like the portion of the declaration section that appears between %{ and %}, everything in the

auxiliary section is copied directly to file lex.yy.c, but may be used in the actions.

First, ws, an identifier declared in the first section, has an associated empty action. If we find

whitespace, we do not return to the parser, but look for another lexeme. The second token

has the simple regular expression pattern if. Should we see the two letters if on the input,

and they are not followed by another letter or digit (which would cause the lexical

analyzer to find a longer prefix of the input matching the pattern for id), then the lexical

analyzer consumes these two letters from the input and returns the token name IF, that is,

the integer for which the manifest constant IF stands. Keywords then and else are treated

similarly. The fifth token has the pattern defined by id. Note that, although keywords like

if match this pattern as well as an earlier pattern, Lex chooses whichever pattern is listed

first in situations where the longest matching prefix matches two or more patterns. The

action taken when id is matched is given as follows:

1. Function installID() is called to place the lexeme found in the symbol table.

2. This function returns a pointer to the symbol table, which is placed in global variable

yylval, where it can be used by the parser or a later component of the compiler. Note that

installID() has available to it two variables that are set automatically by the lexical

analyzer that Lex generates:

(a) yytext is a pointer to the beginning of the lexeme.

(b) yyleng is the length of the lexeme found.

3. The token name ID is returned to the parser.

Lex and yacc:

This section contains example programs for the lex and yacc commands. Together, these

example programs create a simple, desk-calculator program that performs addition,

subtraction, multiplication, and division operations. This calculator program also allows

you to assign values to variables (each designated by a single, lowercase letter) and then

use the variables in calculations. The files that contain the example lex and yacc

programs are as follows:

calc.lex Specifies the lex command specification file, which defines the lexical analysis

rules.

calc.yacc Specifies the yacc command grammar file, which defines the parsing rules, and

calls the yylex subroutine created by the lex command to provide input.

The following descriptions assume that the calc.lex and calc.yacc example programs are

located in your current directory.

Compiling the Example Program

To create the desk calculator example program, do the following:

1. Process the yacc grammar file using the -d optional flag (which informs the yacc

command to create a file that defines the tokens used in addition to the C language

source code):



yacc -d calc.yacc

2. Use the ls command to verify that the following files were created:

y.tab.c

The C language source file that the yacc command created for the parser

y.tab.h

A header file containing define statements for the tokens used by the parser

3. Process the lex specification file:

lex calc.lex

4. Use the ls command to verify that the following file was created:

lex.yy.c

The C language source file that the lex command created for the lexical analyzer

5. Compile and link the two C language source files:

cc y.tab.c lex.yy.c

6. Use the ls command to verify that the following files were created:

y.tab.o

The object file for the y.tab.c source file

lex.yy.o

The object file for the lex.yy.c source file

a.out

The executable program file

To run the program directly from the a.out file, type:

$ a.out

To move the program to a file with a more descriptive name, as in the following example,

and run it, type:

$ mv a.out calculate



$ calculate

The calc.yacc grammar file contains the following sections:

Declarations Section. This section contains entries that:

Include standard I/O header file

Define global variables

Define the list rule as the place to start processing

Define the tokens used by the parser

Define the operators and their precedence

Rules Section. The rules section defines the rules that parse the input

stream.

%start - Specifies that the whole input should match list.

%union - By default, the values returned by actions and the lexical analyzer are

integers. yacc can also support values of other types, including structures. In

addition, yacc keeps track of the types, and inserts appropriate union member

names so that the resulting parser will be strictly type checked. The yacc value

stack is declared to be a union of the various types of values desired. The user

declares the union, and associates union member names to each token and

nonterminal symbol having a value. When the value is referenced through a $$

or $n construction, yacc will automatically insert the appropriate union name,

so that no unwanted conversions will take place.

%type - Makes use of the members of the %union declaration and gives an

individual type for the values associated with each part of the grammar.

%token - Lists the tokens which come from lex tool with their type.

Programs Section. The programs section contains the following subroutines.

Because these subroutines are included in this file, you do not need to use the

yacc library when processing this file.

main The required main program that calls the yyparse subroutine to

start the program.

yyerror(s) This error-handling subroutine only prints a syntax error message.

yywrap The wrap-up subroutine that returns a value of 1 when the end of

input occurs.



Front end of compiler:

The front end analyzes the source code to build an internal representation of the

program, called the intermediate representation or IR. It also manages the symbol table, a

data structure mapping each symbol in the source code to associated information such as

location, type and scope. This is done over several phases, which include some of the

following:

1. Lexical analysis breaks the source code text into small pieces called tokens. Each

token is a single atomic unit of the language, for instance a keyword, identifier or

symbol name. The token syntax is typically a regular language, so a finite state

automaton constructed from a regular expression can be used to recognize it. This

phase is also called lexing or scanning, and the software doing lexical analysis is

called a lexical analyzer or scanner.

2. Preprocessing: some languages, like C, require a preprocessing phase which

supports macro substitution and conditional compilation. Typically the

preprocessing phase occurs before syntactic or semantic analysis; e.g. in the case

of C, the preprocessor manipulates lexical tokens rather than syntactic forms.

However, some languages such as Scheme support macro substitutions based on

syntactic forms.

3. Syntax analysis involves parsing the token sequence to identify the syntactic

structure of the program. This phase typically builds a parse tree, which replaces

the linear sequence of tokens with a tree structure built according to the rules of a

formal grammar which define the language's syntax. The parse tree is often

analyzed, augmented, and transformed by later phases in the compiler.

4. Semantic analysis is the phase in which the compiler adds semantic information to

the parse tree and builds the symbol table. This phase performs semantic checks

such as type checking (checking for type errors), or object binding (associating

variable and function references with their definitions), or definite assignment

(requiring all local variables to be initialized before use).

Back end of compiler

The term back end is sometimes confused with code generator because of the

overlapped functionality of generating assembly code. Some literature uses middle end to

distinguish the generic analysis and optimization phases in the back end from the

machine-dependent code generators.

The main phases of the back end include the following:

1. Analysis: This is the gathering of program information from the intermediate

representation derived from the input. Typical analyses are data flow analysis to

build use-define chains, dependence analysis, alias analysis, pointer analysis,

escape analysis etc. Accurate analysis is the basis for any compiler optimization.

The call graph and control flow graph are usually also built during the analysis

phase.

2. Optimization: the intermediate language representation is transformed into

functionally equivalent but faster (or smaller) forms. Popular optimizations are

inline expansion, dead code elimination, constant propagation, loop transformation,

register allocation and even automatic parallelization.

3. Code generation: the transformed intermediate language is translated into the

output language, usually the native machine language of the system. This involves

resource and storage decisions, such as deciding which variables to fit into

registers and memory and the selection and scheduling of appropriate machine

instructions along with their associated addressing modes.



EX: NO: 1 LEXICAL ANALYZER USING ‘C’ LANGUAGE

DATE:

AIM:

To write a c program to implement lexical analysis for separating tokens.

ALGORITHM:

Start the program.

Read the input statement from the keyboard.

Use the tokenseperation() function to analyze the input program

and store the identifiers, keywords, operators, punctuation in the

dynamic arrays respectively.

Stores the tokens in its corresponding pointer array.

Increment the line number of each token and its occurrences.

Use the printtoken() function to print the stored tokens from the

arrays.

Stop the program.



PROGRAM: LEXICAL ANALYSER USING ‘C’

//File name: lex.c

#include<stdio.h>
#include<ctype.h>
#include<string.h>

char str[100];
char symboltable[25][25];
char attributetable[25][25];
int firstindex=0;

void tokenseperation(void);
void printtokens(void);

/* store a token and its attribute in the next free table slot */
void install(const char *token,const char *attribute)
{
    if(firstindex<25)
    {
        strcpy(symboltable[firstindex],token);
        strcpy(attributetable[firstindex++],attribute);
    }
}

int main(void)
{
    printf("\n enter the source program \n");
    if(fgets(str,sizeof str,stdin)==NULL)
        return 1;
    str[strcspn(str,"\n")]='\0'; /* drop the trailing newline */
    tokenseperation();
    printtokens();
    return 0;
}

void tokenseperation(void)
{
    int i,j,k,len;
    int keyword;
    const char *keywords[]={"if","else","while","void","switch","int","main","case"};
    const char *operators[]={"<",">","=","+","-","*","/"};
    const char punctuation[]="{}[];:()";
    char token[25];
    len=(int)strlen(str);
    i=0;
    while(i<len)
    {
        j=0;
        if(isalpha((unsigned char)str[i])) /* identifier or keyword */
        {
            while(isalnum((unsigned char)str[i]))
            {
                if(j<24)
                    token[j++]=str[i];
                i++;
            }
            token[j]='\0';
            keyword=0;
            for(k=0;k<8;k++)
                if(strcmp(keywords[k],token)==0)
                {
                    keyword=1;
                    break;
                }
            install(token,keyword?"key":"var");
        }
        else if(str[i]=='"') /* string literal, stored without its quotes */
        {
            i++;
            while(i<len&&str[i]!='"')
            {
                if(j<24)
                    token[j++]=str[i];
                i++;
            }
            token[j]='\0';
            if(i<len)
                i++; /* skip the closing quote */
            install(token,"l");
        }
        else if(isdigit((unsigned char)str[i])) /* numeric constant */
        {
            while(isdigit((unsigned char)str[i])||(str[i]=='.'))
            {
                if(j<24)
                    token[j++]=str[i];
                i++;
            }
            token[j]='\0';
            install(token,"c");
        }
        else /* single-character operator or punctuation; anything else is skipped */
        {
            token[0]=str[i++];
            token[1]='\0';
            for(k=0;k<7;k++)
                if(strcmp(operators[k],token)==0)
                {
                    install(token,"operator");
                    break;
                }
            if(strchr(punctuation,token[0])!=NULL)
                install(token,"p");
        }
    }
}

void printtokens(void)
{
    int i;
    for(i=0;i<firstindex;i++)
        printf("\n%s\t%s\n",symboltable[i],attributetable[i]);
}

OUTPUT:

Enter the source program

a+b=c;

a var

+ operator

b var

= operator

c var

; p

RESULT:

Thus the program has been executed and the separated tokens are

printed.



EX: NO: 2 LEXICAL ANALYZER USING ‘C’ LANGUAGE

DATE:

Aim:

To write a program for dividing the given input program into lexemes.

ALGORITHM:

Start the program.

Declare the file pointer and necessary variables.

Open the input file in the read mode.

Use the string comparison function to check whether the current input

string is punctuation or keyword or operator or identifier respectively.

Print the tokens which are found.

Stop the program.



Program: LEXICAL ANALYSER USING ‘C’

//File name: lexical.c

#include<stdio.h>
#include<string.h>

int main(void)
{
    int i;
    char s[120],r[100];
    char par[6]={'(',')','{','}','[',']'};
    char sym[9]={'.',';',':',',','<','>','?','$','#'};
    char key[11][10]={"main","if","else","switch","void","do","while","for","return",
                      "include","stdio"};
    char dat[4][10]={"int","float","char","double"};
    char opr[5]={'*','+','-','/','^'};
    FILE *fp;

    printf("\n\n\t enter the file name");
    if(scanf("%119s",s)!=1)
        return 1;
    fp=fopen(s,"r");
    if(fp==NULL)
    {
        printf("\n could not open %s\n",s);
        return 1;
    }
    /* read one whitespace-separated word at a time until end of file */
    while(fscanf(fp,"%99s",r)==1)
    {
        for(i=0;i<6;i++)
            if(strchr(r,par[i])!=NULL)
                printf("\n paranthesis :%c",par[i]);
        for(i=0;i<9;i++)
            if(strchr(r,sym[i])!=NULL)
                printf("\n symbol :%c",sym[i]);
        for(i=0;i<11;i++)
            if(strstr(r,key[i])!=NULL)
                printf("\n keyword :%s",key[i]);
        for(i=0;i<4;i++)
            if(strstr(r,dat[i])!=NULL&&strstr(r,"printf")==NULL)
            {
                printf("\n data type :%s",dat[i]);
                /* the word after a data type is reported as the identifier */
                if(fscanf(fp,"%99s",r)!=1)
                    break;
                printf("\n identifiers :%s",r);
            }
        for(i=0;i<5;i++)
            if(strchr(r,opr[i])!=NULL)
                printf("\n operator :%c",opr[i]);
    }
    printf("\n");
    fclose(fp);
    return 0;
}

INPUT FILE : sample.c

#include <stdio.h>

main()

{

}

OUTPUT:

enter the file name: sample.c

keyword : include

punctuation: <

keyword: stdio

punctuation: .

punctuation:>

keyword: main

punctuation: (

punctuation: )

punctuation: {

punctuation: }

RESULT:

Thus the program has been executed and the tokens are separated.



EX: NO: 3 IMPLEMENT THE LEXICAL ANALYZER USING

LEX TOOL.

DATE:

Aim:

To implement the lexical analyzer using LEX tool, for a subset of C language.

ALGORITHM:

Start the program.

Declare necessary variables and creates token representation using

Regular Expression.

Print the pre processor or directives, keywords by analysis of the

input program.

Check whether there are argument counter and argument vectors.

Open the input file in read mode.

Read the file and if any token in source program matches with the

regular expression that are all returned as integer values.

Print the token identified using yylex() function.

Stop the program.



Program: LEXICAL ANALYSER USING ‘LEX TOOL’

// File name is lexp.l

%{

#include <stdio.h>

#include <stdlib.h>

int COMMENT=0;

%}

identifier [a-zA-Z][a-zA-Z0-9]*

%%

#.* { printf("\n%s is a PREPROCESSOR DIRECTIVE",yytext);}

int |

float |

char |

double |

while |

for |

do |

if |

break |

continue |

void |

switch |

case |

long |

struct |

const |

typedef |

return |

else |

goto {printf("\n\t%s is a KEYWORD",yytext);}

"/*" {COMMENT = 1;}

"*/" {COMMENT = 0;}

{identifier}\( {if(!COMMENT)printf("\n\nFUNCTION\n\t%s",yytext);}

\{ {if(!COMMENT) printf("\n BLOCK BEGINS");}

\} {if(!COMMENT) printf("\n BLOCK ENDS");}

{identifier}(\[[0-9]*\])? {if(!COMMENT) printf("\n %s is an IDENTIFIER",yytext);}

\".*\" {if(!COMMENT) printf("\n\t%s is a STRING",yytext);}

[0-9]+ {if(!COMMENT) printf("\n\t%s is a NUMBER",yytext);}

\)(\;)? {if(!COMMENT) printf("\n\t");ECHO;printf("\n");}

\( ECHO;

= {if(!COMMENT)printf("\n\t%s is an ASSIGNMENT OPERATOR",yytext);}

\<= |

\>= |

\< |

== |

\> {if(!COMMENT) printf("\n\t%s is a RELATIONAL OPERATOR",yytext);}



%%

int main(int argc,char **argv)

{

if (argc > 1)

{

FILE *file;

file = fopen(argv[1],"r");

if(!file)

{

printf("could not open %s \n",argv[1]);

exit(0);

}

yyin = file;

}

yylex();

printf("\n\n");

return 0;

}

int yywrap()

{

return 1;

}

INPUT:

$vi var.c

#include<stdio.h>

main()

{

int a,b;

}



OUTPUT:

$lex lexp.l

$cc lex.yy.c

$./a.out var.c

#include<stdio.h> is a PREPROCESSOR DIRECTIVE

FUNCTION

main (

)

BLOCK BEGINS

int is a KEYWORD

a IDENTIFIER

b IDENTIFIER

BLOCK ENDS

RESULT:

Thus the Lexical Analyzer was implemented using LEX TOOL for a subset of C

language.



EX: NO: 4 IMPLEMENTATION OF RECURSIVE DESCENT PARSING

DATE:

AIM:

To perform the implementation of recursive descent parsing.

ALGORITHM:

Start the program.

Get the expression from the user and call the e() function.

Read the input string into ipsym; the look-ahead pointer ipptr indexes the

current input symbol.

In e() and eprime(), check whether the look-ahead symbol is '+'; if it is

not, eprime() applies the empty production.

In t() and tprime(), check whether the look-ahead symbol is '*'; if it is

not, tprime() applies the empty production.

In f(), check whether the look-ahead symbol is an identifier or an opening

parenthesis; otherwise report a syntax error.

In advance(), advance the input pointer to the next position of the

input string.

In main(), if the input does not match the given CFG, report a

syntax error.



PROGRAM: RECURSIVE DESCENT PARSING

// File name: recur.c

#include<stdio.h>
#include<stdlib.h>
#include<string.h>

char ipsym[15];
int ipptr=0;

void e(void);
void eprime(void);
void t(void);
void tprime(void);
void f(void);
void advance(void);

void e(void)
{
    printf("\n \t \t E-->TE'");
    t();
    eprime();
}

void eprime(void)
{
    if(ipsym[ipptr]=='+')
    {
        printf("\n \t \t E'-->+TE'");
        advance();
        t();
        eprime();
    }
    else
        printf("\n \t \t E'-->e");
}

void t(void)
{
    printf("\n \t \t T-->FT'");
    f();
    tprime();
}

void tprime(void)
{
    if(ipsym[ipptr]=='*')
    {
        printf("\n \t \t T'-->*FT'");
        advance();
        f();
        tprime();
    }
    else
        printf("\n \t \t T'-->e");
}

void f(void)
{
    if((ipsym[ipptr]=='i')||(ipsym[ipptr]=='I'))
    {
        printf("\n\t\tF-->i");
        advance();
    }
    else if(ipsym[ipptr]=='(')
    {
        advance();
        e();
        if(ipsym[ipptr]==')')
        {
            advance();
            printf("\n\t\tF-->(E)");
        }
        else
        {
            printf("\n\t syntax error\n");
            exit(1);
        }
    }
    else
    {
        printf("\n\t syntax error\n");
        exit(1);
    }
}

void advance(void)
{
    ipptr++;
}

int main(void)
{
    printf("\n\t\tINPUT");
    printf("\n\t\tGrammar without left recursion");
    printf("\n\t\tE-->TE'\n\t\tE'-->+TE'|e\n\t\tT-->FT'");
    printf("\n\t\tT'-->*FT'|e\n\t\tF-->(E)|i");
    printf("\nENTER THE IP EXPRESSION");
    if(fgets(ipsym,sizeof ipsym,stdin)==NULL)
        return 1;
    ipsym[strcspn(ipsym,"\n")]='\0'; /* drop the trailing newline */
    printf("\n\t\toutput");
    printf("\n sequence of production rules");
    e();
    if(ipsym[ipptr]!='\0') /* leftover input: the string is not in the language */
        printf("\n syntax error");
    printf("\n");
    return 0;
}

OUTPUT:

INPUT

Grammar without left recursion

E-->TE'

E'-->+TE'|e

T-->FT'

T'-->*FT'|e

F-->(E)|i

ENTER THE IP EXPRESSION i+i

output

sequence of production rules

E-->TE'

T-->FT'

F-->i

T'-->e

E'-->+TE'

T-->FT'

F-->i

T'-->e

E'-->e

RESULT:

Thus the program to implement the recursive descent parser was

implemented and input strings were parsed.



EX: NO: 5 Using LEX and YACC to implement lexical analyzer

DATE:

AIM:

To write a c program to implement the lexical analyzer using

LEX and YACC tool.

ALGORITHM:

Start the program

Open the input file seven.c in read mode and call yylex() to scan it.

Define the patterns for alphabets, numbers and identifiers.

Print the preprocessor directives, functions and keywords using yytext.

Print the relational, assignment and all other operators using yytext.

Also scan and print where each block begins and ends.

Stop the program.



PROGRAM: IMPLEMENTATION OF LEXICAL ANALYSER USING

LEX & YACC

// File name is lexical.l

%{

#include <stdio.h>

#include <stdlib.h>

%}

%%

#.* {printf("\n %s is a PREPROCESSOR DIRECTIVE",yytext);}

int |

float |

char |

double |

while |

for |

do |

if |

break |

continue |

void |

switch |

case |

long |

struct |

scanf |

printf |

const |

typedef |

return |

else |

goto {printf("\n\t %s is a KEYWORD",yytext);}

\< |

\> |

\<= |

\>= |

\== |

\!= {printf("\n\t %s is RELATIONAL OPERATOR",yytext); }

\= {printf("\n\t %s is ASSIGNMENT OPERATOR",yytext);}

\+ |

\- |

\* |

\/ |

\% {printf("\n\t %s is ARITHMETIC OPERATOR",yytext);}

\".*\" {printf("\n\t %s is a string",yytext);}

[0-9]+ {printf("\n\t %s is a NUMBER",yytext);}

[a-zA-Z][a-zA-Z0-9]* {printf("\n\t %s is a IDENTIFIER",yytext);}

\{ {printf("\n BLOCK BEGINS");}

\} {printf("\n BLOCK ENDS");}

\/\*.*\*\/ printf("\n %s is COMMENT",yytext);

[\t];



%%

int main(int argc,char **argv)

{

if(argc>1)

{

FILE *f;

f=fopen(argv[1],"r");

if(!f)

{

printf("could not open %s\n",argv[1]);

exit(0);

}

yyin=f;

}

yylex();

}

INPUT:

#include<stdio.h>

void main()

{

int a,b,c;

printf("enter the a ");

scanf("%d%d",&a,&b);

c=a+b;

getch();

}



OUTPUT:

$ lex lexical.l

$ cc lex.yy.c -ll

$ ./a.out seven.c

#include<stdio.h> is a PREPROCESSOR DIRECTIVE

void is a KEYWORD

main is a IDENTIFIER

BLOCKS BEGINS

int is a KEYWORD

a,b,c are IDENTIFIER

printf is a KEYWORD

scanf is a KEYWORD

BLOCK ENDS

RESULT:

Thus the program for implementation of a lexical analyzer using LEX and YACC

tools was successfully done.



EX: NO: 6 IMPLEMENTATION OF SIMPLE CALCULATOR

USING LEX and YACC

DATE:

AIM:

To write LEX and YACC program for implementing calculator using LEX and

YACC tools.

ALGORITHM:

Start the program.

In LEX program declare the identifier for log, cos, sin, tan and memory.

Identify the identifier and return id to parser.

In YACC program declare the possible symbol type, which are the tokens

which are returned by LEX.

Define precedence and associativity.

Define rule in CFG for non terminal.

In main() get the expression from user and print the output.

Stop the program.



PROGRAM:

//File name: calc.yacc

%{

#include <stdio.h>

int yylex(void);

int yyerror(char *s);

int regs[26];

int base;

%}

%start list

%token DIGIT LETTER

%left '|'

%left '&'

%left '+' '-'

%left '*' '/' '%'

%left UMINUS /*supplies precedence for unary minus */

%% /* beginning of rules section */

list: /*empty */

|

list stat '\n'

|

list error '\n'

{

yyerrok;

}

;

stat: expr

{

printf("%d\n",$1);

}

|

LETTER '=' expr

{

regs[$1] = $3;

}

;

expr: '(' expr ')'

{

$$ = $2;

}

|

expr '*' expr

{

$$ = $1 * $3;

}

|

expr '/' expr

{

$$ = $1 / $3;

}

|

expr '%' expr

{

$$ = $1 % $3;

}

|



expr '+' expr

{

$$ = $1 + $3;

}

|

expr '-' expr

{

$$ = $1 - $3;

}

|

expr '&' expr

{

$$ = $1 & $3;

}

|

expr '|' expr

{

$$ = $1 | $3;

}

|

'-' expr %prec UMINUS

{

$$ = -$2;

}

|

LETTER

{

$$ = regs[$1];

}

|

number

;

number: DIGIT

{

$$ = $1;

base = ($1==0) ? 8 : 10;

} |

number DIGIT

{

$$ = base * $1 + $2;

}

;

%%

int main(void)

{

return(yyparse());

}

int yyerror(char *s)

{

fprintf(stderr, "%s\n",s);

return 0;

}

int yywrap(void)

{

return(1);

}
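The number rule in the grammar above builds a multi-digit value one DIGIT at a time, and treats a leading 0 as a request for base 8. A minimal C sketch of the same accumulation follows. The function name parse_number and the -1 error value are assumptions made for this illustration; like the grammar, it does not reject the digits 8 and 9 inside an octal number.

```c
/* Mirrors the `number` rule: a leading 0 selects base 8, otherwise base 10. */
int parse_number(const char *digits)
{
    int base, value;

    if (*digits < '0' || *digits > '9')
        return -1;                          /* not a number (assumed error value) */

    value = *digits - '0';                  /* number: DIGIT */
    base = (value == 0) ? 8 : 10;

    for (digits++; *digits >= '0' && *digits <= '9'; digits++)
        value = base * value + (*digits - '0');   /* number: number DIGIT */

    return value;
}
```

With this sketch, "45" evaluates to 45 and "017" evaluates to 15, which is what the calculator prints for the same inputs.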



PROGRAM:

// File name: calc.lex

%{

#include <stdio.h>

#include "y.tab.h"

int c;

extern int yylval;

%}

%%

" " ;

[a-z] {

c = yytext[0];

yylval = c - 'a';

return(LETTER);

}

[0-9] {

c = yytext[0];

yylval = c - '0';

return(DIGIT);

}

[^a-z0-9\b] {

c = yytext[0];

return(c);

}

OUTPUT:

$ lex calc.lex

$ yacc -d calc.yacc

$ cc y.tab.c lex.yy.c

$ mv a.out calculate

$ ./calculate

m=45 <press enter>

m+10 <press enter>

55

RESULT:

Thus the program for implementing a calculator using LEX and YACC tools was

successfully done.



EX NO:7 & 8: IMPLEMENTATION OF THE FRONT END OF

COMPILER

DATE:

AIM:

To write a C program that implements the front end of a compiler.

ALGORITHM:

STEP 1: The input is a normal c program.

STEP 2: It is given as a text file.

STEP 3: The front end of the compiler has task of converting the source

program into intermediate code.

STEP 4: The intermediate code may be a syntax tree, three-address code or postfix

notation.

STEP 5: The backtracking process is involved here to produce the

Intermediate code.

STEP 6: Thus the intermediate code is generated for the given input.
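The central step, turning an assignment into three-address code, can be sketched on its own. The helper below is hypothetical (the name emit_quads and its parameters are not part of the lab program) and assumes a statement of the exact single-character form a=b+c, which is the shape the program in this exercise tests for with pg[j][1]=='='.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes two quadruple rows for a statement of the
 * form "a=b+c" into out and returns the number of rows written
 * (0 if the statement does not have that shape).
 */
int emit_quads(const char *stmt, int pos, int temp, char *out, size_t outsize)
{
    if (strlen(stmt) < 5 || stmt[1] != '=' || strchr("+-*/", stmt[3]) == NULL)
        return 0;

    /* Row 1: temp := arg1 op arg2.  Row 2: target := temp. */
    snprintf(out, outsize, "%d\t%c\t%c\t%c\tt%d\n%d\t:=\tt%d\t\t%c\n",
             pos, stmt[3], stmt[2], stmt[4], temp,
             pos + 1, temp, stmt[0]);
    return 2;
}
```

For the statement a=b+c with pos and temp both 0, the buffer holds the row "0 + b c t0" followed by "1 := t0 a" (tab separated), matching the pos, oper, arg1, arg2, result columns used in this exercise.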



PROGRAM: IMPLEMENTATION OF FRONT END OF COMPILER

//file name is front.c

#include<stdio.h>

#include<conio.h>

#include<string.h>

void main()

{char pg[100][100],str1[24];

int tem=-1,ct=0,i=-1,j=0,j1,pos=-1,t=-1,flag,flag1,tt=0,fg=0;

clrscr();

printf("Enter the codings \n");

while(i>-2)

{i++;

lab1:

t++;

scanf("%s",pg[i]);

if((strcmp(pg[i],"getch();"))==0)

{i=-2;

goto lab1;}}

printf("\n pos \t oper \t arg1 \t arg2 \tresult \n");

while(j<t)

{lop:ct=0;

if(pg[j][1]=='=')

{ pos++;

tem++;

printf("%d\t%c\t%c\t%c\tt%d\n",pos,pg[j][3],pg[j][2],pg[j][4],tem);

pos++;

printf("%d\t:=\tt%d\t\t%c\n",pos,tem,pg[j][0]);

}

else if(((strcmp(pg[j],"if"))==0)||((strcmp(pg[j],"while"))==0))

{if((strcmp(pg[j],"if"))==0)

strcpy(str1,"if");

if((strcmp(pg[j],"while"))==0)

strcpy(str1,"while");

j++;

j1=j;tem++;

pos++;

if(pg[j][3]=='=')

printf("%d\t%c\t%c\t%c\tt%d\n",pos,pg[j][2],pg[j][1],pg[j][4],tem);

else


j1+=2;

pos++;

while((strcmp(pg[j],"}"))!=0)

{ j++;

if(pg[j][1]=='=')

{tt=j;



fg=1;

}

ct++;

}

ct=ct+pos+1;

printf("%d\t==\tt%d\tFALSE\t%d\n",pos,tem,ct);

if(fg==1)

{ j=tt;

goto lop;

}


{ pos++;

tem++;


pos++;

printf("%d\t:=\tt%d\t\t%c\n",pos,tem,pg[j][0]);

j++;

}

if((strcmp(pg[j+1],"else"))==0)

{ct=0;

j++;

j1=j;

j1+=2;

pos++;


{ j1++;

ct++;

}ct=ct*2;

ct++;

ct+=(pos+1);

j+=2;

printf("%d\tGOTO\t\t\t%d\n",pos,ct);


{ pos++;

tem++;


pos++;

printf("%d\t:=\tt%d\t\t%c\n",pos,tem,pg[j][0]);

j++;

}

pos++;

printf("%d\tGOTO\t\t\t%d\n",pos,ct);

}}

if((strcmp(pg[j],"}"))==0)

{ pos++;

printf("%d\tGOTO\t\t\t%d\n",pos,pos+1);



}

j++;

}

getch();

}

OUTPUT:

RESULT:

Thus the program implementing the front end of the compiler was executed

successfully.



EX NO:9&10 IMPLEMENTATION OF BACK END OF COMPILER

DATE:

AIM:

To write a C program that implements the back end of a compiler.

ALGORITHM:

The input for the back end of the compiler is the intermediate code

generated by front end of the compiler.

The input file (IN.TXT) is provided in read mode.

The output file (TARGET.TXT) is created by the program in write mode.

Each intermediate code instruction in the input file is converted to its

equivalent target code by the back end of the compiler.

The output is stored in the TARGET.TXT file in the form of assembly

language.

Stop the program.
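The conversion step above can be sketched for a single arithmetic quadruple. The helper name translate and its buffer parameters are assumptions made for this illustration; the register usage (R0, R1) and instruction layout follow the sample TARGET.TXT output of this exercise.

```c
#include <stdio.h>

/*
 * Hypothetical helper: writes the target code for one arithmetic
 * quadruple (op arg1 arg2 result) into out.
 * Returns 1 on success, 0 for an unsupported operator.
 */
int translate(char op, const char *arg1, const char *arg2,
              const char *result, char *out, size_t outsize)
{
    const char *mnemonic;

    switch (op) {
    case '+': mnemonic = "ADD"; break;
    case '-': mnemonic = "SUB"; break;
    case '*': mnemonic = "MUL"; break;
    case '/': mnemonic = "DIV"; break;
    default:  return 0;
    }

    /* Load both operands, apply the operation, store the result. */
    snprintf(out, outsize, "MOV R0,%s\nMOV R1,%s\n%s R0 R1\nMOV %s,R0\n",
             arg1, arg2, mnemonic, result);
    return 1;
}
```

Calling translate('*', "x", "y", "t1", buf, sizeof buf) fills buf with the four instructions shown for the line * x y t1 in the sample output.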



PROGRAM: IMPLEMENTATION OF BACK END OF COMPILER

//file name is back.c

#include<stdio.h>

#include<conio.h>

#include<stdlib.h>

#include<string.h>

int label[20];

int no=0;

int main()

{

FILE *fp1,*fp2;

int check_label(int n);

char fname[10],op[10],ch;

char operand1[8],operand2[8],result[8];

int i=0;

clrscr();

printf("\n\nEnter filename of the intermediate code:");

scanf("%s",fname);

fp1=fopen(fname,"r");

fp2=fopen("target.txt","w");

if(fp1==NULL||fp2==NULL)

{

printf("\nError Opening the File.");

getch();

exit(0);

}

while(fscanf(fp1,"%s",op)==1)

{

fprintf(fp2,"\n");

i++;

if(check_label(i))

{

fprintf(fp2,"\nlabel#%d:",i);

}

if(strcmp(op,"print")==0)

{

fscanf(fp1,"%s",result);

fprintf(fp2,"\n\tOUT%s",result);

}



if(strcmp(op,"goto")==0)

{

fscanf(fp1,"%s",operand2);

fprintf(fp2,"\n\t JMP label#%s",operand2);

label[no++]=atoi(operand2);

}

if(strcmp(op,"[]=")==0)

{

fscanf(fp1,"%s%s%s",operand1,operand2,result);

fprintf(fp2,"\n\tSTORE%s[%s],%s",operand1,operand2,result);

}

if(strcmp(op,"uminus")==0)

{

fscanf(fp1,"%s%s",operand1,result);

fprintf(fp2,"\n\tMOV R1,-%s",operand1);

fprintf(fp2,"\n\tMOV %s,R1",result);

}

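switch(op[0])

{

case'*':

fscanf(fp1,"%s%s%s",operand1,operand2,result);

fprintf(fp2,"\n\t MOV R0,%s",operand1);

fprintf(fp2,"\n\t MOV R1,%s",operand2);

fprintf(fp2,"\n\t MUL R0 R1");

fprintf(fp2,"\n\t MOV %s,R0",result);

break;

case'+':

fscanf(fp1,"%s%s%s",operand1,operand2,result);

fprintf(fp2,"\n\t MOV R0,%s",operand1);

fprintf(fp2,"\n\t MOV R1,%s",operand2);

fprintf(fp2,"\n\t ADD R0 R1");

fprintf(fp2,"\n\t MOV %s,R0",result);

break;

case'-':

fscanf(fp1,"%s%s%s",operand1,operand2,result);

fprintf(fp2,"\n\t MOV R0,%s",operand1);

fprintf(fp2,"\n\t MOV R1,%s",operand2);

fprintf(fp2,"\n\t SUB R0 R1");

fprintf(fp2,"\n\t MOV %s,R0",result);

break;

case'/':

fscanf(fp1,"%s%s%s",operand1,operand2,result);

fprintf(fp2,"\n\t MOV R0,%s",operand1);

fprintf(fp2,"\n\t MOV R1,%s",operand2);

fprintf(fp2,"\n\t DIV R0 R1");

fprintf(fp2,"\n\t MOV %s,R0",result);

break;

case'%':

fscanf(fp1,"%s%s%s",operand1,operand2,result);

fprintf(fp2,"\n\t MOV R0,%s",operand1);

fprintf(fp2,"\n\t MOV R1,%s",operand2);

fprintf(fp2,"\n\t DIV R0 R1");

fprintf(fp2,"\n\t MOV %s,R0",result);

break;

case'=':

fscanf(fp1,"%s%s",operand1,result);

fprintf(fp2,"\n\t MOV %s,%s",result,operand1);

break;

case'>':

fscanf(fp1,"%s%s%s",operand1,operand2,result);

fprintf(fp2,"\n\t JGT %s,%s label#%s",operand1,operand2,result);

label[no++]=atoi(result);

break;

case'<':

fscanf(fp1,"%s%s%s",operand1,operand2,result);

fprintf(fp2,"\n\t JLT %s,%s label#%s",operand1,operand2,result);

label[no++]=atoi(result);

break;

}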

}

fclose(fp2);

fclose(fp1);

fp2=fopen("target.txt","r");

if(fp2==NULL)

{

printf("\nError Opening the File");

getch();

exit(0);

}

{

int c; /* int, not char, so that EOF can be detected reliably */

while((c=fgetc(fp2))!=EOF)

putchar(c);

}

fclose(fp2);

getch();

return 0;

}

int check_label(int k)

{

int i;

for(i=0;i<no;i++)

{

if(k==label[i])

return 1;

}

return 0;

}

INPUT: (IN.TXT)

[]=a i 1

* x y t1

+ t1 z t2

> t2 num 6

goto 8

+ x x x

+ y y y

print x

= y z

print z

OUTPUT: (TARGET.TXT)

MOV R0,x

MOV R1,y

MUL R0 R1

MOV t1,R0

MOV R0,t1



MOV R1,z

ADD R0 R1

MOV t2,R0

JGT t2,num label#6

JMP label#8

label#8:

MOV R0,x

MOV R1,x

ADD R0 R1

MOV x,R0

MOV R0,y

MOV R1,y

ADD R0 R1

MOV y,R0

OUTx

MOV z,y

OUTz

RESULT:

Thus the program implementing the back end of the compiler was executed

successfully.



ADDITIONAL LIST OF PRACTICALS

EX.NO : 11 SHIFT REDUCE PARSER:

Date:

AIM:

To write a C program to perform the shift reduce parsing.

ALGORITHM:

1. Start the program.

2. Define the main function.

3. Declare array for string and stack and other necessary variables.

4. Get the expression from the user and store it as string.

5. Append $ to the end of the string.

6. Store $ into the stack.

7. Print three columns as 'Stack', 'String' and 'Action' for the respective actions.

8. Use for loop from i as 0 till string length and check the string.

9. If string has some operator or id, push it to the stack.

10. Mark this action as 'Shift'.

11. Print the stack, string and action values.

12. If stack contains some production on shifting, reduce it.

13. Mark this action as 'Reduce'.

14. Print the stack, string and action values.

15. Repeat steps 9 to 14 again and again till the for loop is valid.

16. Now check the string and the stack.

17. If the string contains only $ and the stack has only $E within it, then print that

the 'given string is valid'.

18. Else print that the 'given string is invalid'.

19. End the program
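The steps above can be sketched as a compact recognizer. This is a simplified illustration under stated assumptions, not the lab program: the function name sr_parse is hypothetical, the identifier is written as the single character i, the grammar is taken to be E -> E+E | E*E | i, and a reduction is applied as soon as a handle appears on top of the stack (operator precedence is ignored because only acceptance is checked).

```c
#include <string.h>

/* Returns 1 if input (for example "i+i*i") is accepted, else 0. */
int sr_parse(const char *input)
{
    char stk[64];
    int top = 0;
    size_t k, n = strlen(input);

    if (n + 2 > sizeof stk)          /* keep the fixed-size stack in bounds */
        return 0;
    stk[top] = '$';                  /* bottom-of-stack marker */

    for (k = 0; k < n; k++) {
        char c = input[k];
        int reduced = 1;

        if (c != 'i' && c != '+' && c != '*')
            return 0;                /* symbol outside the grammar */
        stk[++top] = c;              /* shift */

        while (reduced) {            /* reduce while a handle is on top */
            reduced = 0;
            if (stk[top] == 'i') {                       /* E -> i */
                stk[top] = 'E';
                reduced = 1;
            } else if (top >= 3 && stk[top] == 'E' &&
                       (stk[top - 1] == '+' || stk[top - 1] == '*') &&
                       stk[top - 2] == 'E') {            /* E -> E+E | E*E */
                top -= 2;
                stk[top] = 'E';
                reduced = 1;
            }
        }
    }
    return top == 1 && stk[1] == 'E';   /* accept only when the stack is $E */
}
```

Here sr_parse("i+i*i") returns 1, while sr_parse("i+") and sr_parse("ii") return 0 because the stack does not end as $E.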



PROGRAM: SHIFT REDUCE PARSER

// file name is shift.c

#include<stdio.h>

#include<string.h>

#include<conio.h>

void main()

{

char str[25],stk[25];int i,j,t=0,l,r;

clrscr();

printf("Enter the String : ");

scanf("%s",str);

l=strlen(str);

str[l]='$';

stk[t]='$';

printf("Stack\t\tString\t\tAction\n-----------------------------------\n ");

for(i=0;i<l;i++)

{if(str[i]=='i')

{t++;

stk[t]=str[i];

stk[t+1]=str[i+1];

for(j=0;j<=t+1;j++)

printf("%c",stk[j]);

printf("\t\t ");

for(j=i+2;j<=l;j++)

printf("%c",str[j]);

printf("\t\tShift");

printf("\n ");

stk[t]='E';i++;

}

else

{t++;

stk[t]=str[i];

}

for(j=0;j<=t;j++)

printf("%c",stk[j]);

printf("\t\t ");

for(j=i+1;j<=l;j++)

printf("%c",str[j]);

if(stk[t]=='+' || stk[t]=='*')

printf("\t\tShift");

else

printf("\t\tReduce");

printf("\n ");

}

while(t>1)

{if(stk[t]=='E' && (stk[t-1]=='+' || stk[t-1]=='*') && stk[t-2]=='E')



{t-=2;

for(j=0;j<=t;j++)

printf("%c",stk[j]);

printf("\t\t");

printf(" %c",str[l]);

printf("\t\tReduce\n ");

}

else

t-=2;

}

if(t==1 && stk[t]!='+' && stk[t]!='*')

printf("\nThe Given String is Valid\n\n");

else

printf("\nThe Given String is Invalid\n\n");

getch();

}

OUTPUT:

Enter the String : id+id*id

Stack       String      Action

$id         +id*id$     Shift

$E          +id*id$     Reduce

$E+         id*id$      Shift

$E+E        *id$        Reduce

$E+E*       id$         Shift

$E+E*id     $           Shift

$E+E*E      $           Reduce

$E+E        $           Reduce

$E          $           Reduce

The Given String is Valid

ALTERNATE PROGRAM:

// file name is shift1.c

#include<stdio.h>

#include<conio.h>

#include<string.h>

#include<ctype.h>

#include<process.h>

typedef struct

{

char num[10];

int top;



}stack;

stack s;

void st_push(char a)

{ s.num[s.top++]=a;

}

char st_pop()

{ int a;

if(s.top==0)

return-1;

s.top=s.top-1;

a=s.num[s.top];

s.num[s.top]='\0';

return(a);

}

char *substring(char *str,int start)

{ static char sub[20]; /* writable buffer; a string literal must not be modified */

int i=0;

while (str[start]!='\0')

sub[i++]=str[start++];

sub[i]='\0';

return sub;

}

void main()

{ char lhs[10],rhs[10][10];

char sub[10],ipstring[10];

char doller[3],*substr;

int i,j,k,l,m,n,flag;

int length=0,index=0;

clrscr();

printf("\n Enter the no. of productions:");

scanf("%d",&n);

printf("\n Enter the production in following form");

printf("\nlhs\trhs\n");

for(i=0;i<n;i++)

{

scanf(" %c",&lhs[i]); /* the leading space skips the newline left by the previous input */

scanf("%s",rhs[i]);

}

doller[0]='$';

doller[1]=lhs[0];

doller[2]='\0';

printf("\nEnter the input string to be checked");

scanf("%s",ipstring);

strcat(ipstring,"$");

st_push('$');

length=strlen(ipstring);

i=0;

printf("\n\t stack\tinput\taction");

printf("\n\t%s\t%s\t",s.num,ipstring);



st_push(ipstring[i++]);

substr=substring(ipstring,1);

printf("\n\t%s\t%s\tSHIFT",s.num,substr);

while(i<length)

{

for(j=1;j<s.top;j++)

{

flag=0;

index=0;

if(s.top==0)

sub[0]='\0';

else

for(k=j;k<s.top;k++)

sub[index++]=s.num[k];

sub[index]='\0';

for(k=0;k<n;k++)

{

if(strcmp(sub,rhs[k])==0)

{

m=0;

while(m<strlen(rhs[k]))

{

st_pop();

m++;

}

st_push(lhs[k]);

flag=1;

substr=substring(ipstring,i);

printf("\n\t%s\t%s\tREDUCE",s.num,substr);

}

}

if(flag==1)

break;

}

if(flag==0)

{ if(ipstring[i]!='$')

{ st_push(ipstring[i++]);

substr=substring(ipstring,i);

printf("\n\t%s\t%s\tSHIFT",s.num,substr);

}

else

{

if(strcmp(s.num,doller)==0)

printf("\n\t%s\t%s\tACCEPT",s.num,substr);

else

printf("\n\t%s\t%s\tERROR",s.num,substr);

getch();



exit(0);

}

}

}

}

OUTPUT:

Enter the no. of productions:3

Enter the production in following form

lhs rhs

S CC

C cC

C d

Enter the input string to be checkedcdcd

stack input action

$ cdcd$

$c dcd$ SHIFT

$cd cd$ SHIFT

$cC cd$ REDUCE

$C cd$ REDUCE

$Cc d$ SHIFT

$Ccd $ SHIFT

$CcC $ REDUCE

$CC $ REDUCE

$S $ REDUCE

$S $ ACCEPT

RESULT:

Thus the Shift reduce parser has been successfully implemented.



EX: NO:12 SAMPLE LEX PROGRAM

DATE:

AIM:

To write a LEX program to count the number of characters, words and lines in

an input program.

ALGORITHM:

Start the program

Open the input file (seven.c) in read mode and call yylex() to scan the

input.

Define and initialize the counters cc, lc and wc.

Count number of characters, words and lines until end of input file.

Print the number of characters, words and lines (cc, wc and lc).

Stop the program.
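The counting logic in the steps above can also be sketched in plain C, which may help when checking the LEX version by hand. The function name count_text and its output parameters are assumptions made for this illustration; a word is taken to be any run of non-whitespace characters.

```c
#include <ctype.h>

/* Counts characters (cc), words (wc) and lines (lc) in a NUL-terminated text. */
void count_text(const char *s, int *cc, int *wc, int *lc)
{
    int in_word = 0;

    *cc = *wc = *lc = 0;
    for (; *s != '\0'; s++) {
        (*cc)++;                          /* every character counts */
        if (*s == '\n')
            (*lc)++;                      /* a newline ends a line */
        if (isspace((unsigned char)*s)) {
            in_word = 0;                  /* whitespace ends the current word */
        } else if (!in_word) {
            in_word = 1;                  /* first character of a new word */
            (*wc)++;
        }
    }
}
```

For the two-line text "int a;" followed by "b", each ending in a newline, the sketch reports 9 characters, 3 words and 2 lines.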



PROGRAM: SAMPLE LEX PROGRAM

// file name is word.l

%{

int cc=0,wc=0,lc=0;

%}

%%

[^ \t\n]+ {wc++;cc+=yyleng;}

\n {cc++;lc++;}

[ \t] cc++;

%%


int main(int argc,char **argv)

{

if(argc>1)

{

FILE *f;

f=fopen(argv[1],"r");

if(!f)

{

printf("could not open%s",argv[1]);

exit(0);

}

yyin=f;

}


yylex();

printf("%d\n%d\n%d\n",cc,wc,lc);

return 0;

}

int yywrap()

{return 1;

}

INPUT:

#include<stdio.h>

void main()

{

int a,b,c;

printf("enter the a ");

scanf("%d%d",&a,&b);



c=a+b;

getch();

}

OUTPUT:

$ lex word.l

$ cc lex.yy.c -ll

$ ./a.out seven.c

101

9

RESULT:

Thus the LEX program successfully counted the number of characters, words

and lines in an input file.
