Introduction to Language Processing Technology
Natawut Nupairoj, Ph.D.
Department of Computer Engineering, Chulalongkorn University
Jan 05, 2016
Outline
• Level of Programming Languages
• Language Processors
• Specification of Programming Languages
Level of Programming Languages
High level (C source):
void swap(int v[], int k)
{
    int temp;
    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;
}
    ↓ C Compiler
Low level (MIPS assembly):
swap:
    muli $2, $5, 4
    add  $2, $4, $2
    lw   $15, 0($2)
    ...
    ↓ Assembler
Machine language (binary):
000010001101101100110000
...
• High level: C / Java / Pascal
• Low level: Assembly / Bytecode
• Machine language
High-Level Language Characteristics
• Expressions, e.g. a = b + (c - d)/2;
• Data types: integer, character, boolean; record, array.
• Control structures: selection and iteration.
• Declarations: an identifier can denote a constant, variable, procedure, function, or type.
• Abstraction (object-oriented concept): be concerned only with what a component does, not how it does it.
• Encapsulation (object-oriented concept): information hiding.
Language Processors
A language processor is a program that manipulates programs expressed in some programming language.
Examples:
• Editor
• Translator / compiler
• Interpreter
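Translator
A translator translates a “source” program into an “equivalent” “object” program, and reports error messages when the source program is invalid.
source program → Translator → object program (+ error messages)
Typical source languages: C, C++, FORTRAN, Java, VB
Typical object languages: Assembly, C, Bytecode, p-code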
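Tombstone Diagrams: Ordinary Program
A program P written with language L is drawn as a tombstone labelled P, with L at its base.
Examples: Sort written in Java; Sort written in x86 machine code; Web Browser written in x86 machine code.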
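Tombstone Diagrams: Machine
A machine M is drawn as a shape labelled M. Examples: an x86 machine, a SPARC machine.
A program runs on machine M only if it is written in M's machine language: Web Browser written in x86 runs on an x86 machine.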
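Tombstone Diagrams: Translator
A translator from source language S to target language T (S is translated to T), written with language L, is drawn as a T-shape with S → T on top and L at the base.
Examples: a Java → x86 compiler written in C; a Java → x86 compiler written in x86; a Java → C translator written in C++.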
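Tombstone Diagrams: Compilation
A C → x86 compiler written in x86 runs on an x86 machine and translates Sort written in C into Sort written in x86. The resulting object program then runs on an x86 machine.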
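Tombstone Diagrams: Cross Compilation
A C → SPARC compiler written in x86 runs on an x86 machine and translates Sort written in C into Sort written in SPARC. The object program runs on a SPARC machine, not on the x86 machine that compiled it.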
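Tombstone Diagrams: Two-stage Compilation
A Java → C translator written in x86 (running on x86) translates Sort written in Java into Sort written in C. A C → x86 compiler written in x86 (running on x86) then translates that into Sort written in x86.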
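Tombstone Diagrams: Compiling a Compiler
A C → x86 compiler written in x86 (running on x86) translates a Pascal → x86 compiler written in C into a Pascal → x86 compiler written in x86, which can then run on an x86 machine.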
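Tombstone Diagrams: Interpreter
An interpreter for source language S, written in language L, is drawn as a rectangle with S on top and L at the base.
Examples: a Basic interpreter written in x86; an SQL interpreter written in SPARC. Sort written in Basic runs on the Basic interpreter written in x86, which in turn runs on an x86 machine.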
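Tombstone Diagrams: Abstract Machine
Abstract machine = hardware emulator: an interpreter for a low-level language.
Example: a 370 interpreter written in C is compiled by a C → x86 compiler (written in x86, running on x86) into a 370 interpreter written in x86. Running that interpreter on an x86 machine is equivalent to having a 370 machine, so program HW1 written in 370 machine code runs on it.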
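Tombstone Diagrams: Java
Java provides a portable environment (write once, run anywhere) through an interpretive compiler, which consists of:
• a Java → JVM compiler written in M, and
• a JVM interpreter written in M,
for the host machine M. Here JVM stands for the JVM's code, i.e. bytecode.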
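Tombstone Diagrams: Java on Several Machines
A Java → JVM compiler written in x86 (running on x86) translates Sort written in Java into Sort written in JVM bytecode. The same Sort bytecode then runs on a JVM interpreter written in x86 (on an x86 machine) or on a JVM interpreter written in SPARC (on a SPARC machine).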
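Tombstone Diagrams: Bootstrapping
Bootstrapping: producing a compiler for language L that is written in L itself.
• Full bootstrap: start from nothing.
• Half bootstrap: start from another machine.
Goal in the examples: a C → NNP compiler written in NNP, where NNP is a new machine.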
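Tombstone Diagrams: Full Bootstrap
Start from nothing on NNP, using Csub, a subset of C:
1. A Csub → NNP compiler written in NNP, running on NNP, translates a Csub → NNP compiler written in Csub into a Csub → NNP compiler written in NNP.
2. The Csub → NNP compiler written in NNP translates a C → NNP compiler written in Csub into a C → NNP compiler written in NNP.
3. That C → NNP compiler written in NNP translates a C → NNP compiler written in C into a C → NNP compiler written in NNP; from then on the C compiler can compile itself.
(The final slide chains stages 1 to 3 into a single diagram.)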
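Tombstone Diagrams: Half Bootstrap
Start from an existing C → x86 compiler written in x86, running on an x86 machine:
1. It translates a C → NNP compiler written in C into a C → NNP compiler written in x86, i.e. a cross-compiler that runs on x86.
2. The cross-compiler, running on x86, translates the same C → NNP compiler written in C into a C → NNP compiler written in NNP, which then runs natively on NNP.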
Specification of Programming Languages
A language specification covers three aspects:
• Syntax: defines the symbols and structure of the language, i.e. its grammar.
• Contextual constraints: constraints beyond the grammar, i.e. rules of the language such as scope rules and type rules.
• Semantics: the meaning of a program, i.e. its behavior when run. It also determines how a sentence S of language L is translated into machine code on a machine M.
Syntax
A context-free grammar consists of:
• terminals
• non-terminals (variables)
• a start symbol
• production rules
It is usually expressed in BNF notation.
BNF Notation
Backus-Naur Form. A production rule
N → α
can be written as:
N ::= α
Example: Mini-Triangle Program
! This is a comment. It continues to the end-of-line.
let
    const m ~ 7;
    var n: Integer
in
    begin
        n := 2 * m * m;
        putint(n)
    end
Terminals: begin const do else end if in let then var while ; : := ~ ( ) + - * / < > = \
Mini-Triangle Syntax
Program ::= Command
Command ::= single-Command
| Command ; single-Command
single-Command ::= V-name := Expression
| Identifier ( Expression )
| if Expression then single-Command
else single-Command
| while Expression do single-Command
| let Declaration in single-Command
| begin Command end
Expression ::= primary-Expression
| Expression Operator primary-Expression
primary-Expression ::= Integer-Literal
| V-name
| Operator primary-Expression
| ( Expression )
V-name ::= Identifier
Declaration ::= single-Declaration
| Declaration ; single-Declaration
single-Declaration ::= const Identifier ~ Expression
| var Identifier : Type-denoter
Type-denoter ::= Identifier
Operator ::= + | - | * | / | < | > | = | \
Identifier ::= Letter | Identifier Letter
| Identifier Digit
Integer-Literal ::= Digit | Integer-Literal Digit
Comment ::= ! Graphic* eol
Letter ::= a | b | … | z
Digit ::= 0 | 1 | 2 | … | 9
Syntax Tree
A syntax tree is an ordered tree in which:
• internal nodes are labelled with non-terminals;
• leaf nodes are labelled with terminals.
An N-tree of a grammar G is a syntax tree whose root is the non-terminal N.
Mini-Triangle Syntax Tree
Expression ::= primary-Expression
| Expression Operator primary-Expression
primary-Expression ::= Integer-Literal
| V-name
| Operator primary-Expression
| ( Expression )
V-name ::= Identifier
…
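Syntax tree (an Expression-tree) for d + 10 * n:
Expression
|-- Expression
|   |-- Expression
|   |   `-- primary-Expr.
|   |       `-- V-name
|   |           `-- Ident.
|   |               `-- d
|   |-- Op.
|   |   `-- +
|   `-- primary-Expr.
|       `-- Int. Lit.
|           `-- 10
|-- Op.
|   `-- *
`-- primary-Expr.
    `-- V-name
        `-- Ident.
            `-- n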