Compiler and Language Processing Tools
Summer Term 2009
Introduction
Dr.-Ing. Ina Schaefer
Software Technology Group, TU Kaiserslautern
Ina Schaefer Compilers 1
Outline
1. Introduction
  - Overview and Application Domains
  - Tasks of Language-Processing Tools
  - Examples
2. Language Processing
  - Terminology and Requirements
  - Compiler Architecture
3. Compiler Construction
Introduction Overview and Application Domains
Language Processing Tools
• Processing of Source Texts in Source Languages
• Analysis of Source Texts
• Translation to Target Languages
Language Processing Tools (2)
Typical Source Languages
• Programming Languages: C, C++, C#, Java, ML, Smalltalk, Prolog, Script Languages (JavaScript), bash
• Languages for Configuration Management: make, ant
• Application and Tool-Specific Languages: Excel, JFlex, CUP
• Specification Languages: Z, CASL, Isabelle/HOL
• Formatting and Data Description Languages: LaTeX, HTML, XML
• Design and Architecture Description Languages: UML, SDL, VHDL, Verilog
Language Processing Tools (3)
Typical Target Languages
• Assembly and Machine Languages
• Programming Languages
• Data and Layout Description Languages
• Languages for Printer Control
Language Processing Tools (4)
Language Implementation Tasks
• Tool Support for Language Processing
• Integration into Existing Systems
• Connection to Other Systems
Application Domains
• Programming Environments
  - Context-sensitive Editors, Class Browsers
  - Graphical Programming Tools
  - Pre-Processors
  - Compilers
  - Interpreters
  - Debuggers
  - Run-time Environments (loading, linking, execution, memory management)
Application Domains (2)
• Generation of Programs from Design Documents (UML)
• Program Comprehension, Re-engineering
• Design and Implementation of Application-specific Languages
  - Robot Control
  - Simulation Tools
  - Spread Sheets, Active Documents
• Web Technology
  - Analysis of Web Sites
  - Active Websites (with integrated functionality)
  - Abstract Platforms, e.g. JVM, .NET
  - Optimization of Caching
Related Fields
• Formal Languages, Language Specification and Design
• Programming and Specification Languages
• Programming, Software Engineering, Software Generation, Software Architecture
• System Software, Computer Architecture
Introduction Tasks of Language-Processing Tools
Tasks of Language-Processing Tools
[Figure: three kinds of language-processing tools]
• Analyser: Source Code → Analysis Results
• Translation: Source Code → Target Code
• Interpreter: Source Code and Input Data → Output Data
Analysis, Translation and Interpretation are often combined.
Tasks of Language-Processing Tools (2)
1. Translation
  - Compiler implements Analysis and Translation
  - OS and Real Machine implement Interpretation
Pros:
  - Most efficient solution
  - One interpreter for all programming languages
  - Prerequisite for other solutions
Tasks of Language-Processing Tools (3)
2. Direct Interpretation
  - Interpreter implements all tasks.
  - Examples: JavaScript, Command Line Languages (bash)
  - Pros: No translation necessary (but analysis at run time)
Tasks of Language-Processing Tools (4)
3. Abstract and Virtual Machines
  - Compiler implements Analysis and Translation to Abstract Machine Code
  - Abstract Machine works as Interpreter
  - Examples: Java/JVM, C#/.NET
  - Pros:
    • Platform independent (portability, mobile code)
    • Self-modifying programs possible
4. Other Combinations
Introduction Examples
Example: Analysis
Figure © A. Poetzsch-Heffter, TU Kaiserslautern (17.04.2007)

Source file b1_1/Weltklasse.java (the keyword "implements" is deliberately misspelled as "implement"):

    package b1_1
    ;
    class Weltklasse
    extends Superklasse
    implement BesteBohnen
    {Qualifikation studieren
    ( Arbeit schweiss) {
    return new
    Qualifikation
    ();}}

Input to the javac analyzer: the source file, together with Superklasse.class, Qualifikation.class, Arbeit.class, BesteBohnen.class, ...

Output of the javac analyzer:

    b1_1/Weltklasse.java:4: '{' expected.
    extends Superklasse
                       ^
    1 error
Example: Translation
Source file b1_1/Weltklasse.java:

    package b1_1;
    class Weltklasse
        extends Superklasse
        implements BesteBohnen
    {
        Qualifikation studieren( Arbeit schweiss ) {
            return new Qualifikation();
        }
    }

Input to javac: the source file, together with Superklasse.class, Qualifikation.class, Arbeit.class, BesteBohnen.class, ...

Output of javac (class file, shown as a listing, abridged):

    Compiled from Weltklasse.java
    class b1_1/Weltklasse
        extends ... implements ... {
        b1_1/Weltklasse();
        b1_1.Qualifikation studieren(...);
    }
    Method b1_1/Weltklasse()
    ...
    Method b1_1.Qualifikation studieren(...)
    ...
Example: Translation (2)
Result of Translation
Compiled from Weltklasse.java
class b1_1/Weltklasse
extends b1_1.Superklasse
implements b1_1.BesteBohnen {
b1_1/Weltklasse();
b1_1.Qualifikation studieren(b1_1.Arbeit);
}
Method b1_1/Weltklasse()
0 aload_0
1 invokespecial #6 <Method b1_1.Superklasse()>
4 return
Method b1_1.Qualifikation studieren(b1_1.Arbeit)
0 new #2 <Class b1_1.Qualifikation>
3 dup
4 invokespecial #5 <Method b1_1.Qualifikation()>
7 areturn
Example 2: Translation
Source file hello_world.c:

    #include <stdio.h>  /* declares printf */

    int main() {
        printf("Willkommen zur Vorlesung!");
        return 0;
    }

Output of gcc (assembly, abridged):

    .file "hello_world.c"
    .version "01.01"
    gcc2_compiled.:
    .section .rodata
    .LC0:
    .string "Willkommen zur Vorlesung!"
    .text
    .align 16
    .globl main
    .type main,@function
    main:
    pushl %ebp
    movl %esp,%ebp
    subl $8,%esp
    ...
Example 2: Translation (2)
Result of Translation
.file "hello_world.c"
.version "01.01"
gcc2_compiled.:
.section .rodata
.LC0:
.string "Willkommen zur Vorlesung!"
.text
.align 16
.globl main
.type main,@function
main:
pushl %ebp
movl %esp,%ebp
subl $8,%esp
addl $-12,%esp
pushl $.LC0
call printf
addl $16,%esp
xorl %eax,%eax
jmp .L2
.p2align 4,,7
.L2:
movl %ebp,%esp
popl %ebp
ret
.Lfe1:
.size main,.Lfe1-main
.ident "GCC: (GNU) 2.95.2 19991024 (release)"
Example 3: Translation
groovy.tex (104 bytes):

    \documentclass{article}
    \begin{document}
    \vspace*{7cm}
    \centerline{\Huge\bf It's groovy}
    \end{document}

latex translates groovy.tex to groovy.dvi (207 bytes, binary):

    ...

dvips translates groovy.dvi to groovy.ps (7136 bytes):

    %!PS-Adobe-2.0
    %%Creator: dvips(k) 5.86 ...
    %%Title: groovy.dvi
    ...
Example: Interpretation
[Figure: the Java Virtual Machine (JVM) interprets a .class file and maps Input Data to Output Data.]

Excerpt of the interpreted .class file:

    ...
    14 iload_1
    15 iload_2
    16 idiv
    17 istore_3
    ...
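The interpretation step can be illustrated with a minimal sketch. It assumes a hypothetical `MiniJvm` class that understands only the four instructions from the excerpt and represents them as strings; a real JVM works on binary class files and is far more involved.

```java
class MiniJvm {
    // Interprets a tiny subset of JVM bytecode (assumption: instructions are
    // given as mnemonic strings) and returns the local variables afterwards.
    static int[] run(String[] code, int[] locals) {
        int[] vars = locals.clone();
        int[] stack = new int[16]; // operand stack, large enough for this sketch
        int sp = 0;                // index of the next free stack slot
        for (String ins : code) {
            switch (ins) {
                case "iload_1":
                    stack[sp++] = vars[1];
                    break;
                case "iload_2":
                    stack[sp++] = vars[2];
                    break;
                case "idiv": {
                    int right = stack[--sp]; // divisor is on top of the stack
                    int left = stack[--sp];
                    stack[sp++] = left / right;
                    break;
                }
                case "istore_3":
                    vars[3] = stack[--sp];
                    break;
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + ins);
            }
        }
        return vars;
    }
}
```

Here the local variables play the role of input and output data: with locals 1 and 2 set to 17 and 5, local 3 holds the integer quotient after the run.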
Example: Combined Technique
Java Implementation with Just-In-Time (JIT) Compiler
[Figure: javac (analyzer and translator) translates a Java source code unit into Java bytecode (.class file). Inside the JVM, the JIT translator translates the bytecode into machine code for the real machine/hardware, which maps Input Data to Output Data.]
Language Processing Terminology and Requirements
Language Processing: The Translation Task
[Figure: Source Code → Translator → Error Message or Target Code]
• Translator (in a broader sense): Analysis, Optimization and Translation
• Source Code: Input (String) for Translator in Syntax of Source Language (SL)
• Target Code: Output (String) of Translator in Syntax of Target Language (TL)
Phases of Language Processing
• Analysis of Input:
  - Program Text
  - Specification
  - Diagrams
• Dependent on Target of Implementation:
  - Transformation (XSLT, Refactoring)
  - Pretty Printing, Formatting
  - Semantic Analysis (Program Comprehension)
  - Optimization
  - (Actual) Translation
Compile Time vs. Run-time
• Compile Time: during the run time of the compiler/translator
  Static: all information/aspects known at compile time, e.g.
  - Type Checks
  - Evaluation of Constant Expressions
  - Relative Addresses
• Run Time: during the run time of the compiled program
  Dynamic: all information that is not statically known, e.g.
  - Allocation of Dynamic Arrays
  - Bounds Check of Arrays
  - Dynamic Binding of Methods
  - Memory Management of Recursive Procedures
For dynamic aspects that cannot be handled at compile time, the compiler generates code that handles these aspects at run time.
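The distinction can be made concrete with a small Java sketch. The class `StaticVsDynamic` and all names in it are illustrative assumptions, not part of the lecture material: the constant expression can be evaluated by the compiler, whereas array allocation, the bounds check, and method binding are only resolved while the program runs.

```java
class StaticVsDynamic {
    // Static: a constant expression that javac can evaluate at compile time.
    static final int SECONDS_PER_DAY = 60 * 60 * 24;

    // Dynamic: the array length is only known at run time, so the allocation
    // and the bounds check of the access happen in the running program.
    static boolean inBounds(int length, int index) {
        int[] data = new int[length];
        try {
            data[index] = 1;
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            return false; // the run-time bounds check rejected the access
        }
    }

    static class Shape {
        String name() { return "shape"; }
    }

    static class Circle extends Shape {
        @Override
        String name() { return "circle"; }
    }

    // Dynamic binding: which name() runs depends on the run-time class of s.
    static String describe(Shape s) {
        return s.name();
    }
}
```

The static type of the parameter in `describe` is `Shape`, which the compiler checks; the method that is actually executed is chosen at run time.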
What is a good compiler?
Requirements for Translators
• Error Handling (Static/Dynamic)
• Efficient Target Code
• Choice: Fast Translation with Slow Code vs. Slow Translation with Fast Code
• Semantically Correct Translation
Semantically Correct Translation
Intuitive Definition: Compiled Program behaves according to Language Definition of Source Language.
Formal Definition:
• $sem_{SL}: SL\_Program \times SL\_Data \to SL\_Data$
• $sem_{TL}: TL\_Program \times TL\_Data \to TL\_Data$
• $compile: SL\_Program \to TL\_Program$
• $code: SL\_Data \to TL\_Data$
• $decode: TL\_Data \to SL\_Data$
Semantic Correctness: $sem_{SL}(P, D) = decode(sem_{TL}(compile(P), code(D)))$
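The correctness equation can be checked on a toy instance. The following sketch rests on assumptions made only for illustration: the source language is arithmetic expressions over one variable x with `int` data, the target language is a list of stack-machine instructions with `long` data, and `code`/`decode` convert between the two. The class and method names mirror the definition above but are otherwise hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class Correctness {
    // Source language (assumption): expressions over a single variable x.
    interface Expr {}
    static final class Num implements Expr { final int v; Num(int v) { this.v = v; } }
    static final class Var implements Expr {}
    static final class Add implements Expr { final Expr l, r; Add(Expr l, Expr r) { this.l = l; this.r = r; } }
    static final class Mul implements Expr { final Expr l, r; Mul(Expr l, Expr r) { this.l = l; this.r = r; } }

    // semSL: evaluates program p on input d (the value of x).
    static int semSL(Expr p, int d) {
        if (p instanceof Num) return ((Num) p).v;
        if (p instanceof Var) return d;
        if (p instanceof Add) { Add a = (Add) p; return semSL(a.l, d) + semSL(a.r, d); }
        Mul m = (Mul) p;
        return semSL(m.l, d) * semSL(m.r, d);
    }

    // compile: translates an expression to postfix stack-machine code.
    static List<String> compile(Expr p) {
        List<String> out = new ArrayList<>();
        emit(p, out);
        return out;
    }

    private static void emit(Expr p, List<String> out) {
        if (p instanceof Num) { out.add("PUSH " + ((Num) p).v); }
        else if (p instanceof Var) { out.add("LOAD"); }
        else if (p instanceof Add) { Add a = (Add) p; emit(a.l, out); emit(a.r, out); out.add("ADD"); }
        else { Mul m = (Mul) p; emit(m.l, out); emit(m.r, out); out.add("MUL"); }
    }

    // semTL: runs the stack-machine code on target data d.
    static long semTL(List<String> code, long d) {
        Deque<Long> stack = new ArrayDeque<>();
        for (String ins : code) {
            if (ins.startsWith("PUSH ")) { stack.push(Long.parseLong(ins.substring(5))); }
            else if (ins.equals("LOAD")) { stack.push(d); }
            else if (ins.equals("ADD") || ins.equals("MUL")) {
                long r = stack.pop(), l = stack.pop();
                stack.push(ins.equals("ADD") ? l + r : l * r);
            } else { throw new IllegalArgumentException("unknown instruction: " + ins); }
        }
        return stack.pop();
    }

    static long code(int d) { return d; }        // SL_Data -> TL_Data
    static int decode(long t) { return (int) t; } // TL_Data -> SL_Data

    // Checks semSL(P, D) = decode(semTL(compile(P), code(D))) for one P and D.
    static boolean correctFor(Expr p, int d) {
        return semSL(p, d) == decode(semTL(compile(p), code(d)));
    }
}
```

Such a check only tests the equation for individual programs and inputs; establishing semantic correctness for all P and D requires a proof.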
Language Processing Compiler Architecture
Compiler Architecture
[Figure: compiler architecture as a pipeline]
Analysis:
  Source Code as String → Scanner → Token Stream → Parser → Syntax Tree → Name and Type Analysis → Decorated Syntax Tree (close to SL)
Synthesis:
  Decorated Syntax Tree → Translator → Intermediate Language → Code Generator → Target Code as String
Optimization steps shown in the figure: Attribution & Optimization (twice) and Peephole Optimization.
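As an illustration of the first phase, the following sketch turns a source string into a token stream. The token classes (`NUM`, `ID`, `SYM`), the accepted symbols, and the class name `MiniScanner` are assumptions chosen for the example; the lecture does not prescribe them.

```java
import java.util.ArrayList;
import java.util.List;

class MiniScanner {
    // Splits the source string into tokens, rendered here as strings
    // such as "ID(x1)" or "NUM(42)" to keep the sketch short.
    static List<String> scan(String source) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) { // white space separates tokens
                i++;
                continue;
            }
            int start = i;
            if (Character.isDigit(c)) {
                while (i < source.length() && Character.isDigit(source.charAt(i))) i++;
                tokens.add("NUM(" + source.substring(start, i) + ")");
            } else if (Character.isLetter(c)) {
                while (i < source.length() && Character.isLetterOrDigit(source.charAt(i))) i++;
                tokens.add("ID(" + source.substring(start, i) + ")");
            } else if ("+-*/()=;".indexOf(c) >= 0) {
                i++;
                tokens.add("SYM(" + c + ")");
            } else {
                throw new IllegalArgumentException(
                        "unexpected character '" + c + "' at position " + i);
            }
        }
        return tokens;
    }
}
```

The parser would then consume this token stream instead of individual characters, which is one reason the two phases are usually kept conceptually separate.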
Properties of Compiler Architectures
• Phases represent Concepts.
• Phases can be interleaved.
• Concrete Layout of Phases depends on Source Language, Target Language and Concrete Design Decisions.
• Phase vs. Pass (Phase can comprise more than one pass.)
• Separate Translation of Program Parts (Interface information must be accessible.)
• Combination with other Architecture Decisions: Common Intermediate Language
Common Intermediate Language
[Figure: Source Language 1, Source Language 2, ..., Source Language n → Intermediate Language → Target Language 1, Target Language 2, ..., Target Language m]
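A rough count shows why a common intermediate language can pay off. Assuming one translator is needed per directly connected pair of languages (a simplification not stated on the slide), translating n source languages to m target languages directly takes n times m translators, whereas going through the intermediate language takes n front ends plus m back ends:

```latex
\[
\underbrace{n \cdot m}_{\text{direct translators}}
\quad\text{vs.}\quad
\underbrace{n + m}_{\text{front ends} + \text{back ends}},
\qquad\text{e.g. } n = 4,\ m = 3:\quad 12 \ \text{vs.}\ 7 .
\]
```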
Dimensions of Compiler Construction
• Programming Languages
  - Sequential Procedural, Imperative, OO-Languages
  - Functional, Logical Languages
  - Parallel Languages/Language Constructs
• Target Languages/Machines
  - Code for Abstract Machines
  - Assembler
  - Machine Languages (CISC, RISC, ...)
  - Multi Processor Architectures
  - Memory Hierarchy
• Translation Tasks: Analysis, Optimization, Synthesis
• Construction Techniques and Tools: Bootstrapping, Generators
• Portability, Specification, Correctness
Compiler Construction
Compiler Construction Techniques
1. Stepwise Construction
  - Construction with compiler for different language
  - Construction with compiler for different machine
  - Bootstrapping
2. Compiler-Compiler: Tools for Compiler Generation
  - Scanner Generators (regular expressions)
  - Parser Generators (context-free grammars)
  - Attribute Evaluation Generators (attribute grammar)
  - Code Generator Generators (machine specification)
  - Interpreter Generators (semantics of language)
  - Other Phase-specific Tools
3. Special Programming Techniques
  - General Technique: syntax-driven
  - Special Technique: recursive descent
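Recursive descent can be sketched as one method per nonterminal of the grammar. The grammar below (expressions with +, -, *, parentheses and non-negative integers) and the class name `RecursiveDescent` are assumptions for illustration; the sketch evaluates the expression while parsing, which is a simple form of syntax-driven processing.

```java
class RecursiveDescent {
    private final String src; // input with white space removed
    private int pos = 0;      // current position in src

    private RecursiveDescent(String text) {
        this.src = text.replaceAll("\\s+", "");
    }

    // Parses and evaluates the whole input; positions in error messages
    // refer to the input after white space has been removed.
    static int evaluate(String text) {
        RecursiveDescent p = new RecursiveDescent(text);
        int value = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return value;
    }

    // Expr ::= Term { ('+' | '-') Term }
    private int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int right = term();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // Term ::= Factor { '*' Factor }
    private int term() {
        int value = factor();
        while (peek() == '*') {
            pos++;
            value *= factor();
        }
        return value;
    }

    // Factor ::= Number | '(' Expr ')'
    private int factor() {
        if (peek() == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("')' expected at position " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("number expected at position " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    // Returns the current character, or '\0' at the end of the input.
    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }
}
```

Operator precedence is encoded in the grammar itself: `term` binds tighter than `expr` because `expr` calls `term`, not the other way round.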
Stepwise Construction
Programming typically depends on an existing compiler for the implementation language. For compiler construction, this does not hold in general.
Source, target and implementation languages of compilers can be denoted in T-Diagrams.
[Figure: T-diagram with source language QS on the left, target language ZS on the right, and implementation language PS at the bottom]
The T-diagram denotes a compiler from QS to ZS written in PS.
Construction with compiler for different language
• Construct: SL Compiler for machine language MS, written in MS
• Suppose: C Compiler exists on platform with machine language MS
[Figure: T-diagrams (SL is called QS in the figure)]
• To be developed: compiler from SL to MS, written in C
• Existing: compiler from C to MS, written in MS
• By translation: compiler from SL to MS, written in MS
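The construction can be replayed with a small model of T-diagrams. The class `TDiagram` and its method `translatedWith` are hypothetical helpers for this sketch: translating a compiler's implementation with another compiler keeps its source and target language and replaces its implementation language by the target language of the tool.

```java
class TDiagram {
    final String source; // language the compiler accepts
    final String target; // language the compiler produces
    final String impl;   // language the compiler is written in

    TDiagram(String source, String target, String impl) {
        this.source = source;
        this.target = target;
        this.impl = impl;
    }

    // Translates this compiler's implementation with the given tool. The tool
    // must accept this compiler's implementation language as its source.
    TDiagram translatedWith(TDiagram tool) {
        if (!tool.source.equals(impl)) {
            throw new IllegalArgumentException(
                    "tool translates " + tool.source + ", but compiler is written in " + impl);
        }
        return new TDiagram(source, target, tool.target);
    }

    @Override
    public String toString() {
        return source + " -> " + target + " [in " + impl + "]";
    }
}
```

For the case above, the compiler to be developed is `new TDiagram("SL", "MS", "C")`, the existing one is `new TDiagram("C", "MS", "MS")`, and translating the former with the latter yields a compiler from SL to MS written in MS.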
Construction with compiler for different machine
• Construct: C Compiler for M1 in M1
• Suppose: C Compiler exists for M2 in M2
• Method: Construct Cross Compiler first
First Step
[Figure: T-diagrams for the first step (M1 and M2 are called MS1 and MS2 in the figure). To be developed: C compiler for M1 written in C. Existing: C compiler for M2 written in M2. Translating the former with the latter yields the Cross Compiler: a C compiler for M1 that runs on M2.]
Construction with compiler for different machine (2)
Second Step
[Figure: T-diagrams for the second step (M1 and M2 are called MS1 and MS2 in the figure). The C compiler for M1 written in C is translated with the Cross Compiler (C compiler for M1 running on M2). The result is the C compiler for M1 written in M1.]
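Both steps can be traced with a compact sketch. It assumes that a compiler is modelled as an array {source, target, implementation language}; the class `CrossCompilation` and its methods are illustrative names only.

```java
class CrossCompilation {
    // Translates the implementation of `compiler` with `tool`; the result has
    // the same source and target, but is implemented in the tool's target.
    static String[] translate(String[] compiler, String[] tool) {
        if (!tool[0].equals(compiler[2])) {
            throw new IllegalArgumentException("tool cannot translate " + compiler[2]);
        }
        return new String[] { compiler[0], compiler[1], tool[1] };
    }

    // First step: build the cross compiler on M2.
    static String[] crossCompiler() {
        String[] newCompilerInC = { "C", "M1", "C" }; // to be developed
        String[] existingOnM2 = { "C", "M2", "M2" };  // assumed to exist
        return translate(newCompilerInC, existingOnM2); // C -> M1, runs on M2
    }

    // Second step: translate the same C source with the cross compiler.
    static String[] nativeCompiler() {
        String[] newCompilerInC = { "C", "M1", "C" };
        return translate(newCompilerInC, crossCompiler()); // C -> M1, runs on M1
    }
}
```

Note that the compiler source written in C is translated twice: once by the existing compiler and once by the cross compiler that resulted from the first translation.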
Bootstrapping
• Construct: QS Compiler for MS in MS
• Suppose: no compiler exists yet
• Method:
  1. Design sublanguages $QS_i$ of QS such that $QS_0 \subset QS_1 \subset QS_2 \subset \dots \subset QS$
  2. Implement the $QS_0$ Compiler for MS in MS
  3. Implement the $QS_{i+1}$ Compiler for MS in $QS_i$
  4. Create the $QS_{i+1}$ Compiler for MS in MS by translation
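The method above amounts to a loop over the sublanguages. The following sketch models only the languages involved, not real compilers; the class `Bootstrap`, the nested `Compiler` type, and the string names "QS0", "QS1", ... are assumptions for illustration.

```java
class Bootstrap {
    static final class Compiler {
        final String source, target, impl;

        Compiler(String source, String target, String impl) {
            this.source = source;
            this.target = target;
            this.impl = impl;
        }

        // Translates this compiler's implementation with an existing compiler.
        Compiler translatedWith(Compiler tool) {
            if (!tool.source.equals(impl)) {
                throw new IllegalArgumentException("tool cannot translate " + impl);
            }
            return new Compiler(source, target, tool.target);
        }
    }

    // Runs the given number of bootstrapping rounds and returns the final
    // compiler that is available in MS.
    static Compiler bootstrap(int rounds) {
        // Step 2: the QS0 compiler for MS is written manually in MS.
        Compiler current = new Compiler("QS0", "MS", "MS");
        for (int i = 0; i < rounds; i++) {
            // Step 3: write the QS(i+1) compiler for MS in QSi (by extension).
            Compiler extended = new Compiler("QS" + (i + 1), "MS", "QS" + i);
            // Step 4: translate it with the QSi compiler (by translation).
            current = extended.translatedWith(current);
        }
        return current;
    }
}
```

After each round the larger sublanguage becomes available as an implementation language, so only the very first compiler has to be written in machine language.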
Bootstrapping (2)
[Figure: bootstrapping chain of T-diagrams. The QS0 compiler for MS is written manually in MS. Each QS(i+1) compiler for MS is written in QSi (by extension) and then translated with the QSi compiler into a QS(i+1) compiler for MS in MS (by translation), until the QS compiler for MS in MS is obtained.]
Recommended Reading
Wilhelm, Maurer:
• Chap. 1, Introduction (pp. 1–5)
• Chap. 6, Structure of Compilers (pp. 225–238)
Appel:
• Chap. 1, Introduction (pp. 3–14)