Introduction to Compilers Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY
Jan 03, 2016
Computers run 0/1 strings (machine language programs)
0010001000000100
0010010000000100
0001011001000010
0011011000000011
1111000000100101
0000000000000101
0000000000000110
0000000000000000
A machine language program that adds two numbers
• First 4 bits for opcode
• Last 12 bits for operands
Source: Louden and Lambert's book, Programming Languages
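The 4-bit/12-bit split described above can be sketched in a few lines of Python. The function name `decode` is a hypothetical helper, and the sketch only separates the two fields as numbers. It does not assume any particular instruction set, so no opcode is given a mnemonic.

```python
# The eight 16-bit words of the program shown on the slide.
PROGRAM = [
    "0010001000000100",
    "0010010000000100",
    "0001011001000010",
    "0011011000000011",
    "1111000000100101",
    "0000000000000101",
    "0000000000000110",
    "0000000000000000",
]

def decode(word):
    """Split a 16-bit word into (opcode, operand bits) as integers."""
    if len(word) != 16 or set(word) - {"0", "1"}:
        raise ValueError(f"not a 16-bit binary word: {word!r}")
    opcode = int(word[:4], 2)    # first 4 bits
    operands = int(word[4:], 2)  # last 12 bits
    return opcode, operands

if __name__ == "__main__":
    for word in PROGRAM:
        opcode, operands = decode(word)
        print(f"{word}  opcode={opcode:2d}  operands={operands:012b}")
```

Every word is decoded the same way here, although some words in a real program may hold data rather than instructions.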
Programmers write more readable character strings
An assembly language program that adds two numbers, from Louden and Lambert’s book.
Even more readable character strings: high-level languages
• Imperative languages: specify HOW
  Fortran, ALGOL, Pascal, C, C++, Java
• Declarative languages: specify WHAT
  SQL, ML, Prolog
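A small sketch of the HOW/WHAT distinction, assuming Python with its standard `sqlite3` module. The table name, column names, and rows are made up for illustration. The first function spells out each step, while the second only states the desired result in SQL and leaves the method to the database engine.

```python
import sqlite3

# Made-up sample data: (sailor name, rating).
RATINGS = [("Ann", 7), ("Bob", 3), ("Cho", 9)]

def high_rated_imperative(rows):
    # HOW: visit every row, test it, and build the answer step by step.
    result = []
    for name, rating in rows:
        if rating > 5:
            result.append(name)
    result.sort()
    return result

def high_rated_declarative(rows):
    # WHAT: describe the result; the SQL engine chooses how to compute it.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sailors (sname TEXT, rating INTEGER)")
    conn.executemany("INSERT INTO sailors VALUES (?, ?)", rows)
    found = conn.execute(
        "SELECT sname FROM sailors WHERE rating > 5 ORDER BY sname"
    ).fetchall()
    conn.close()
    return [name for (name,) in found]
```

Both functions return the same list for the same input; only the amount of procedure the programmer has to write differs.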
Models of Computation in Languages
Underlying most programming languages is a model of computation:
Procedural: Fortran (1957)
Functional: Lisp (1958)
Object oriented: Simula (1967)
Logic: Prolog (1972)
Relational algebra: SQL (1974)
Source: A. V. Aho. Lectures of Programming Languages and Translators
Programming Languages Evolve: Java as an Example
• Java 1.0, 1996
  Object-oriented
  The language of choice for internet applet programs.
• Java 8, 2014
  Changing computing background: multicore and processing big data.
  Java 8 streams support a database-query style of programming.
  Java 8 incorporates many ideas from functional programming.
What is a compiler?
A compiler is a translator between computers and programmers.
More generally speaking, a compiler is a translator between source strings and target strings:
• between assembly language and Fortran
• between Java and Java bytecode
• between Java and SQL
Assembly language vs Fortran
Source: Stephen A. Edwards. Lectures of Programming Languages and Translators
The Structure of a Compiler
1. Lexical Analysis
2. Syntax Analysis (or Parsing)
3. Semantic Analysis
4. Intermediate Code Generation
5. Code Optimization
6. Code Generation
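As an illustration of the first phase only, here is a minimal lexical analyzer sketch in Python. The token classes, their patterns, and the sample statement `position = initial + rate * 60` are assumptions chosen for illustration, not taken from the slides. The later phases (parsing, semantic analysis, code generation) are not covered.

```python
import re

# Token classes for a toy language; names and patterns are illustrative.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def tokenize(source):
    """Turn a character string into a list of (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[pos]!r} at position {pos}"
            )
        if match.lastgroup != "SKIP":  # whitespace carries no meaning here
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

if __name__ == "__main__":
    for token in tokenize("position = initial + rate * 60"):
        print(token)
```

The token list is what a syntax analyzer would consume next to build a tree for the statement.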
Translation of an assignment statement (1)
Translation of an assignment statement (2)
Translation of SQL query
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
[Figure: relational algebra tree. Leaves: Reserves, Sailors. Join: sid=sid. Selection: bid=100 ∧ rating > 5. Projection: sname.]
• Query can be converted to relational algebra
• Relational algebra converts to a tree; joins form branches
• Each operator has implementation choices
• Operators can also be applied in a different order!
π_sname(σ_{bid=100 ∧ rating > 5}(Reserves ⋈ Sailors))
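The point that operators can be applied in a different order can be checked on a toy example. The sketch below assumes tiny in-memory relations with made-up rows and three hypothetical helpers (`select`, `join`, `project`). It evaluates the query once with the selections after the join and once with the selections before the join.

```python
# Made-up rows for illustration only.
SAILORS = [
    {"sid": 1, "sname": "Ann", "rating": 7},
    {"sid": 2, "sname": "Bob", "rating": 3},
    {"sid": 3, "sname": "Cho", "rating": 9},
]
RESERVES = [
    {"sid": 1, "bid": 100},
    {"sid": 2, "bid": 100},
    {"sid": 3, "bid": 101},
]

def select(rows, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

def join(left, right, key):
    """Equi-join on one column, written as a plain nested loop."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def project(rows, column):
    """Projection with set semantics: duplicates removed, result sorted."""
    return sorted({row[column] for row in rows})

def plan_select_after_join():
    joined = join(RESERVES, SAILORS, "sid")
    kept = select(joined, lambda t: t["bid"] == 100 and t["rating"] > 5)
    return project(kept, "sname")

def plan_select_before_join():
    r = select(RESERVES, lambda t: t["bid"] == 100)
    s = select(SAILORS, lambda t: t["rating"] > 5)
    return project(join(r, s, "sid"), "sname")
```

Both plans give the same answer, but the second one joins two rows with two rows instead of three with three, which is the kind of saving an optimizer looks for.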
Cost-based Query Sub-System
Queries (e.g. SELECT * FROM Blah B WHERE B.blah = blah)
• Query Parser
• Query Optimizer
  • Plan Generator
  • Plan Cost Estimator
• Query Executor
• Catalog Manager: Schema, Statistics
Usually there is a heuristics-based rewriting step before the cost-based steps.
Motivating Example
• Cost: 500 + 500*1000 I/Os
• By no means the worst plan!
• Misses several opportunities:
  • selections could be "pushed" down
  • no use made of indexes
• Goal of optimization: find faster plans that compute the same answer.
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
Plan: π sname (on-the-fly), σ bid=100 ∧ rating > 5 (on-the-fly), ⋈ sid=sid (page-oriented nested loops), over Sailors and Reserves.
Cost: 500,500 IOs
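The 500,500 figure follows from the cost expression 500 + 500*1000. A minimal sketch of that arithmetic, assuming the two numbers are the page counts of the outer and inner relations (500 pages for Sailors and 1000 pages for Reserves is a reading of the formula, not something the slide states outright):

```python
def page_nested_loops_cost(outer_pages, inner_pages):
    """Page I/Os for a page-oriented nested loops join.

    The outer relation is read once, and the whole inner relation
    is scanned once for every page of the outer relation.
    """
    return outer_pages + outer_pages * inner_pages

SAILORS_PAGES = 500    # assumed: outer relation
RESERVES_PAGES = 1000  # assumed: inner relation

if __name__ == "__main__":
    cost = page_nested_loops_cost(SAILORS_PAGES, RESERVES_PAGES)
    print(f"{cost:,} I/Os")
```

The selection and projection add no I/O in this model because they are applied on the fly to tuples already in memory.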
Alternative Plans – Push Selects (No Indexes)
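[Figure: two plan trees side by side.]
Original plan: π sname (on-the-fly), σ bid=100 ∧ rating > 5 (on-the-fly), ⋈ sid=sid (page-oriented nested loops), over Sailors and Reserves.
Alternative plan: π sname (on-the-fly), σ bid=100 (on-the-fly), ⋈ sid=sid (page-oriented nested loops), σ rating > 5 (on-the-fly), over Sailors and Reserves.
Cost: 250,500 IOs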
Alternative Plans – Push Selects (No Indexes)
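[Figure: two plan trees side by side.]
First plan (as on the previous slide): π sname (on-the-fly), σ bid=100 (on-the-fly), ⋈ sid=sid (page-oriented nested loops), σ rating > 5 (on-the-fly), over Sailors and Reserves.
Cost: 250,500 IOs
Second plan: π sname (on-the-fly), ⋈ sid=sid (page-oriented nested loops), σ bid = 100 (on-the-fly), σ rating > 5 (on-the-fly), over Sailors and Reserves.
Cost: 500 + 1000 + 250 + 250*10 = 4250 IOs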
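To compare the plans side by side, the cost figures can be put into a short script. The first and last expressions are the ones given on the slides. The breakdown 500 + 250*1000 for the middle plan is an assumption that reproduces the stated 250,500 total (it supposes that half of the 500 outer pages survive the rating > 5 selection). The plan labels and the `speedup` helper are hypothetical names.

```python
# Estimated page I/Os per plan.
PLAN_COSTS = {
    "selections after join": 500 + 500 * 1000,            # from the slides
    "rating > 5 pushed down": 500 + 250 * 1000,           # assumed breakdown
    "cheapest plan shown": 500 + 1000 + 250 + 250 * 10,   # from the slides
}

def speedup(baseline, candidate):
    """How many times fewer I/Os the candidate plan needs than the baseline."""
    return PLAN_COSTS[baseline] / PLAN_COSTS[candidate]

if __name__ == "__main__":
    for name, cost in PLAN_COSTS.items():
        print(f"{name}: {cost:,} I/Os")
    ratio = speedup("selections after join", "cheapest plan shown")
    print(f"speedup of cheapest over original: {ratio:.1f}x")
```

All three plans compute the same answer, so the ratio is a direct measure of what the optimizer gains by choosing among them.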