UNIT - III INTRODUCTION TO COMPILERS Phase structure of Compiler and entire compilation process. Lexical Analyzer: The Role of the Lexical Analyzer, Input Buffering. Specification of Tokens, Recognition of Tokens, Design of Lexical Analyzer using Uniform Symbol Table, Lexical Errors. LEX: LEX Specification, Generation of Lexical Analyzer by LEX.
What is a Compiler?
• A compiler is software that translates a program written in a high-level language (the source language) into a low-level language (the object/target/machine language).
Symbol Table
• Symbol Table – It is a data structure built and maintained by the compiler that holds all the identifiers' names along with their types. It helps the compiler function smoothly by letting it find identifiers quickly.
Phases of Compiler
• The structure of a compiler has two parts:
1. Analysis phase (front end)
2. Synthesis phase (back end)
• The front end consists of the lexical analyzer, syntax analyzer, semantic analyzer, and intermediate code generator; the remaining phases are assembled to form the back end.
1.Lexical Analyzer
• Lexical Analyzer – It reads the source program and converts its stream of characters into a stream of tokens. Tokens are defined by regular expressions, which the lexical analyzer understands. It also removes white space and comments.
• Example: x := z + y
• Five tokens: x, :=, z, +, y; after this stage: id1 assign id2 binop id3
Tokens, Patterns, and Lexemes

Token     | Sample Lexemes        | Informal Description of Pattern
const     | const                 | const
if        | if                    | if
relation  | <, <=, =, <>, >, >=   | < or <= or = or <> or >= or >
id        | pi, count, D2         | letter followed by letters and digits
num       | 3.1416, 0, 6.02E23    | any numeric constant
literal   | "core dumped"         | any characters between " and " except "
The pattern classifies the lexeme, but the actual values are critical. This information is:
1. Stored in the symbol table
2. Returned to the parser
2.Syntax Analyzer
• It is sometimes called the parser. It constructs the parse tree.
• It takes the tokens one by one and uses a context-free grammar to construct the parse tree.
• Why a grammar? The rules of a programming language can be represented by a few productions. Using these productions we can represent what the program actually is, and check whether the input is in the desired format.
• Syntax errors can be detected at this level if the input does not conform to the grammar.
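Checking the token stream against a grammar can be sketched as a recursive-descent recognizer. This is a hypothetical example in C for a tiny expression grammar (E → T { + T }, T → F { * F }, F → letter | ( E )), not the grammar of any particular language:

```c
#include <ctype.h>

/* Recursive-descent recognizer for the productions:
 *   E -> T { '+' T }
 *   T -> F { '*' F }
 *   F -> letter | '(' E ')'
 * One recognizing function per nonterminal.
 */
static const char *cur;          /* current position in the input */

static int parse_E(void);

static int parse_F(void) {
    if (isalpha((unsigned char)*cur)) { cur++; return 1; }
    if (*cur == '(') {
        cur++;
        if (!parse_E() || *cur != ')') return 0;   /* missing subexpression or ')' */
        cur++;
        return 1;
    }
    return 0;                    /* syntax error: unexpected character */
}

static int parse_T(void) {
    if (!parse_F()) return 0;
    while (*cur == '*') { cur++; if (!parse_F()) return 0; }
    return 1;
}

static int parse_E(void) {
    if (!parse_T()) return 0;
    while (*cur == '+') { cur++; if (!parse_T()) return 0; }
    return 1;
}

/* Returns 1 if the whole string is a valid expression of the grammar. */
int accepts(const char *s) {
    cur = s;
    return parse_E() && *cur == '\0';
}
```

For example, `accepts("a+b*c")` and `accepts("(a+b)*c")` succeed, while `accepts("a+")` fails because no F follows the `+`.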
3.Semantic Analyzer
• Semantic Analyzer – It verifies the parse tree, checking whether it is meaningful, and produces a verified parse tree.
• Semantic analysis deals with type checking and constraint checking with the help of semantic rules.
4.Intermediate Code Generator
• It acts as a bridge between the analysis phase and the synthesis phase of the compilation process.
• It generates intermediate code, a form that can be readily translated into the target machine language. There are many popular intermediate representations.
• Examples – three-address code (TAC), quadruples, triples, postfix notation, etc.
• Intermediate code is converted to machine language by the last two phases, which are platform dependent. Up to the intermediate code, compilation is the same for every compiler; after that, it depends on the platform. To build a new compiler we do not need to start from scratch: we can take the intermediate code from an existing compiler and build only the last two phases.
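As a concrete illustration, the assignment x := a + b * c can be lowered into three-address code, where each instruction has at most one operator on its right-hand side (the temporary names t1, t2 are generated by the compiler):

```
t1 := b * c
t2 := a + t1
x  := t2
```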
5.Code Optimizer
• Code Optimizer – It transforms the code so that it consumes fewer resources and runs faster.
• The meaning of the transformed code is not altered.
• Optimization can be categorized into two types: machine dependent and machine independent.
• Optimization techniques:
1. Removing redundant identifiers
2. Removing unreachable sections of code
3. Identifying common sub-expressions
4. Unfolding (unrolling) loops
5. Eliminating procedures (inlining)
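One of the techniques above, identifying common sub-expressions, can be shown by hand in C (a real optimizer performs this transformation automatically on the intermediate code; the function names here are made up for the illustration):

```c
/* Unoptimized: the sub-expression (a * b) is computed twice. */
int calc_unoptimized(int a, int b) {
    return (a * b) + (a * b) / 2;
}

/* Optimized: the common sub-expression is computed once and reused.
 * The meaning of the code is not altered. */
int calc_optimized(int a, int b) {
    int t = a * b;               /* t holds the common sub-expression */
    return t + t / 2;
}
```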
6.Target Code Generator
• Target Code Generator – Its main purpose is to produce code that the machine can understand.
• The output is dependent on the type of assembler.
• This is the final stage of compilation.
1.Lexical Analysis
• Lexical analysis is the first phase of the compiler; the component that performs it is also known as the scanner. It converts the input program into a sequence of tokens.
• Lexical analysis can be implemented with a deterministic finite automaton (DFA).
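A minimal sketch of that idea: a two-state DFA, written here in C, that accepts identifiers of the form letter (letter | digit)* (the state names and the function `is_identifier` are illustrative):

```c
#include <ctype.h>

/* States of a DFA recognizing identifiers: letter (letter | digit)* */
enum { START, IN_ID, DEAD };

/* Returns 1 if s is accepted as an identifier, 0 otherwise. */
int is_identifier(const char *s) {
    int state = START;
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        switch (state) {
        case START: state = isalpha(c) ? IN_ID : DEAD; break;  /* must start with a letter */
        case IN_ID: state = isalnum(c) ? IN_ID : DEAD; break;  /* then letters or digits */
        }
        if (state == DEAD) return 0;   /* DEAD is a trap state: reject immediately */
    }
    return state == IN_ID;             /* IN_ID is the only accepting state */
}
```

For instance, `is_identifier("count")` and `is_identifier("D2")` accept, while `is_identifier("1xab")` rejects because the DFA dies on the leading digit.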
• What is a token? A lexical token is a sequence of characters that can be treated as a single logical entity.
• Examples of tokens: keywords, operators, constants, identifiers, special symbols.
• Keywords: for, while, if, etc. Identifiers: variable names, function names, etc. Operators: '+', '++', '-', etc. Separators: ',', ';', etc.
Tokens, Patterns, Lexemes
• Pattern: a set of strings in the input for which the same token is produced as output. This set of strings is described by a rule, called a pattern, associated with the token. – e.g., id => "letter followed by letters and digits"
• Lexeme: a sequence of characters in the source program that is matched by the pattern for a token.
• Example: int a;
First string: int – pattern: int, lexeme: int, token: keyword
Second string: a – pattern: [a-zA-Z][a-zA-Z0-9]*, lexeme: a, token: identifier
It is the first phase of the compiler.
It reads the input characters and produces as output a sequence of tokens that the parser uses for syntax analysis.
It strips out comments and white space (blank, tab, and newline characters) from the source program.
It also correlates error messages from the compiler with the source program (because it keeps track of line numbers).
The role of the lexical analyzer
Interaction Of The Lexical Analyzer With The Parser
[Figure: interaction of the lexical analyzer with the parser – on each "get next token" request, the lexical analyzer reads the source program and returns a token (token, tokenval) to the parser; both components consult the symbol table and report errors.]
Recognition of Token
• Data structures used in the lexical analyzer:
• Terminal Table (TRM)
• Identifier Table (IDN)
• Uniform Symbol Table
• Literal Table
Lexical Errors
• A lexical error is a sequence of characters that does not match the pattern of any token. Lexical phase errors are found during the lexical analysis of the program.
• A lexical phase error can be:
1. A spelling error.
2. Exceeding the allowed length of an identifier or numeric constant.
3. The appearance of illegal characters.
4. Removal of a character that should be present.
5. Replacement of a character with an incorrect character.
6. Transposition of two characters.
Example:
void main() {
    int x = 10, y = 20;
    char *a;
    a = &x;
    x = 1xab;
}
In this code, 1xab is neither a number nor an identifier, so the lexical analyzer reports a lexical error.
Lexical Errors
A lexical error is any input that can be rejected by the lexer.
This generally results from token recognition falling off the end of the rules defining tokens.