DCSPM: Develop and Compile Subset of PASCAL Language to MSIL
Master Project
Abdullah Sheneamer, MSCS Graduate Candidate, Fall 2012
Abdullah Sheneamer Master project presentation 11/xx/2012
Feb 23, 2016
Outline
- Introduction to MSIL
- Related Works
- Why PASCAL to MSIL
- PASCAL Compiler
- Lexical Analyzer Design
- Symbol Table Design
- Parser and MSIL Design
- Improvements
- Evaluations
- Lessons Learned
- Future Work
- Conclusion
Introduction to MSIL
Microsoft Intermediate Language (MSIL) is the lowest-level human-readable programming language defined by the Common Language Infrastructure (CLI) specification and used by the .NET Framework.
MSIL includes instructions for loading, storing, initializing, and calling methods on objects, as well as instructions for arithmetic and logical operations.
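To make these instruction categories concrete, here is a minimal sketch, written in Python rather than MSIL, of how a stack machine might evaluate a few real MSIL opcodes (`ldc.i4`, `stloc`, `ldloc`, `add`, `ceq`). The `run` function and the (opcode, operand) tuple encoding are assumptions made for illustration; they are not part of the CLI or of the DCSPM compiler.

```python
def run(program, num_locals=4):
    """Evaluate (opcode, operand) pairs on a toy evaluation stack (illustrative only)."""
    stack = []
    local_vars = [0] * num_locals
    for opcode, operand in program:
        if opcode == "ldc.i4":        # push an integer constant
            stack.append(operand)
        elif opcode == "stloc":       # pop the top of the stack into a local
            local_vars[operand] = stack.pop()
        elif opcode == "ldloc":       # push the value of a local
            stack.append(local_vars[operand])
        elif opcode == "add":         # pop two values, push their sum
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "ceq":         # pop two values, push 1 if equal, else 0
            right, left = stack.pop(), stack.pop()
            stack.append(1 if left == right else 0)
        else:
            raise ValueError(f"unsupported opcode: {opcode}")
    return local_vars


# b := 1; c := 2; a := b + c   (locals 0, 1, 2 stand for a, b, c)
program = [
    ("ldc.i4", 1), ("stloc", 1),
    ("ldc.i4", 2), ("stloc", 2),
    ("ldloc", 1), ("ldloc", 2), ("add", None), ("stloc", 0),
]
result = run(program)
```

Every operation takes its inputs from the stack and leaves its result there, which is why MSIL listings consist mostly of load, operate, and store triples.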
Related Works
“The Design and Implementation of C-like Language Interpreter” [XX11]
The authors design and implement a C-like language interpreter in C++ based on the idea of modularity. The lexical analyzer reads character strings from the source program, splits them into separate words, and constructs the internal representation of these words, that is, tokens. The basic idea of the lexical analyzer design is, first, to determine the start and end position of a word and, second, once the word is separated, to determine its attribute.
“Simple Calculator Compiler Using Lex and YACC” [Upad11]
The author describes how to develop a simple compiler for a procedural language using Lex (Lexical Analyzer Generator) and YACC (Yet Another Compiler-Compiler). Lex helps write programs whose control flow is directed by instances of regular expressions in the input stream.
Why PASCAL to MSIL
- Allow PASCAL to run on the .NET platform
- Study how compilers in the .NET environment work
- PASCAL can now be run on modern machines
- MSIL is platform independent
- JIT compilers can be optimized for specific machines and architectures
PASCAL Compiler
Compilation process: takes PASCAL source code and produces Microsoft Intermediate Language (MSIL).
Execution process: MSIL must be converted to CPU-specific code, usually by a just-in-time (JIT) compiler. Native code is code that is compiled to run on a particular processor (such as an Intel x86-class processor) and its instruction set.
Compilation Process
[Figure: compilation process. PASCAL Source Code -> Lexical Analysis -> Parser & MSIL -> MSIL Code Output, with the Symbol Table and Error Handler as supporting components.]
Lexical Analyzer Design
After reading the next character from the input stream:
State 0: Identify the current token and decide the next state.
State 1: Handle identifiers and keywords.
State 2: Handle numbers.
State 3: Handle one-character or two-character tokens.
States 4, 5: Handle comments introduced by "//" or "/*": skip a line that starts with "//", or skip the data between "/*" and "*/".
Lexical Analyzer Design (cont.)
Begin: 1- lexbuf=""; 2- state=0.

State 0 (INITIAL):
- WhiteSpace / no action.
- Letter or @ or _ / place it in lexbuf; go to state 1 (ID).
- Digit / place it in lexbuf; go to state 2 (NUM).

State 1 (ID):
- Letter or Digit / place it in lexbuf.
- Anything else / 1- Return that last char into the input stream. 2- Search for lexbuf in the symbol table. 3- Insert it as ID if not found; otherwise get the row number p. 4- Build the token as [code=symbol[p,token], attr=p]. 5- Enqueue the token and set lexbuf="".

State 2 (NUM):
- Digit / place it in lexbuf.
- Anything else / 1- Return that last char into the input stream. 2- Build the token as [code: NUM, attr: value]. 3- Enqueue the token and set lexbuf="".
Lexical Analyzer Design (cont.)
State 0 (INITIAL):
- One or two char / go to state 3.

State 3 (one- or two-character token):
- Unrelated character / 1- Return the last char into the input stream. 2- Build the token [code=ASCII(first char in lexbuf); attr=-1]. 3- lexbuf=""; state=0. 4- Return the token to the parser.
- Sequence is "//" / state=4.
- Sequence is "/*" / lexbuf=""; state=5.
- Other character / 1- Place it in lexbuf. 2- Get the code for the two-character token in lexbuf. 3- Build the token [code=obtained code; attr=-1]. 4- lexbuf=""; state=0. 5- Return the token to the parser.

State 4 (single-line comment):
- New line / lexbuf=""; state=0.
- Anything else / place it in lexbuf.

State 5 (multiple-line comment):
- Sequence is "*/" / lexbuf=""; state=0.
- Anything else / place it in lexbuf.
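The six states described in these slides can be sketched as a small hand-written scanner. The version below is an illustration in Python, not the project's own C# code. The `tokenize` function, the (code, attr) tuple layout, the lowercase keyword matching, and the trailing newline sentinel are all assumptions; the token codes are the ones listed on the Symbol Table Design slide.

```python
ID_CODE, NUM_CODE = 256, 257
KEYWORDS = [("begin", 300), ("if", 323), ("for", 302), ("switch", 305), ("while", 376)]
TWO_CHAR_CODES = {"!=": 406, "==": 407, "<=": 408, ">=": 409}


def tokenize(source):
    """Return (tokens, table); each token is a (code, attr) pair."""
    table = list(KEYWORDS)                        # symbol table rows: (name, code)
    rows = {name: p for p, (name, _) in enumerate(table)}
    tokens, lexbuf, state, i = [], "", 0, 0
    text = source + "\n"                          # sentinel so the last token is flushed
    while i < len(text):
        ch = text[i]
        if state == 0:                            # INITIAL: pick the next state
            if not ch.isspace():
                lexbuf = ch
                if ch.isalpha() or ch in "@_":
                    state = 1
                elif ch.isdigit():
                    state = 2
                else:
                    state = 3
            i += 1
        elif state == 1:                          # ID: identifiers and keywords
            if ch.isalnum():
                lexbuf += ch
                i += 1
            else:                                 # ch stays in the input stream
                name = lexbuf.lower()
                if name not in rows:              # insert as ID if not found
                    rows[name] = len(table)
                    table.append((name, ID_CODE))
                p = rows[name]
                tokens.append((table[p][1], p))
                lexbuf, state = "", 0
        elif state == 2:                          # NUM: integer literals
            if ch.isdigit():
                lexbuf += ch
                i += 1
            else:                                 # ch stays in the input stream
                tokens.append((NUM_CODE, int(lexbuf)))
                lexbuf, state = "", 0
        elif state == 3:                          # one- or two-character token
            pair = lexbuf + ch
            if pair == "//":
                lexbuf, state = "", 4
                i += 1
            elif pair == "/*":
                lexbuf, state = "", 5
                i += 1
            elif pair in TWO_CHAR_CODES:
                tokens.append((TWO_CHAR_CODES[pair], -1))
                lexbuf, state = "", 0
                i += 1
            else:                                 # single char: code is its ASCII value
                tokens.append((ord(lexbuf), -1))
                lexbuf, state = "", 0
        elif state == 4:                          # single-line comment
            if ch == "\n":
                state = 0
            i += 1
        else:                                     # state 5: multi-line comment
            lexbuf += ch
            if lexbuf.endswith("*/"):
                lexbuf, state = "", 0
            i += 1
    return tokens, table


tokens, table = tokenize("if a1 >= 10 // note\n b == a1 /* x */ + 2")
```

"Returning the last character to the input stream" is modeled here by simply not advancing `i`, so the same character is examined again in state 0.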
Symbol Table Design
- Every keyword is a token and has a unique integer code.
- The identifier token has code 256.
- The number token has code 257.
- Every special character is a token whose integer token code equals its ASCII number.
- Two-character tokens have unique token codes.
Token Code    Keyword
300           Begin
323           If
302           For
305           Switch
376           While

Token Code    Two-Character Token
406           !=
407           ==
408           <=
409           >=
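The code-assignment rules above can be captured in one small classification function. This is a hedged sketch in Python: `token_code` is a hypothetical helper, the keyword and two-character tables contain only the sample rows shown on the slide, and lowercase keyword matching is an assumption.

```python
IDENTIFIER_CODE = 256
NUMBER_CODE = 257
KEYWORD_CODES = {"begin": 300, "if": 323, "for": 302, "switch": 305, "while": 376}
TWO_CHAR_CODES = {"!=": 406, "==": 407, "<=": 408, ">=": 409}


def token_code(lexeme):
    """Map an already separated lexeme to its integer token code."""
    if not lexeme:
        raise ValueError("empty lexeme")
    if lexeme.lower() in KEYWORD_CODES:           # keywords: unique codes
        return KEYWORD_CODES[lexeme.lower()]
    if lexeme in TWO_CHAR_CODES:                  # two-character tokens: unique codes
        return TWO_CHAR_CODES[lexeme]
    if lexeme.isdigit():                          # numbers share code 257
        return NUMBER_CODE
    if lexeme[0].isalpha() or lexeme[0] in "@_":  # identifiers share code 256
        return IDENTIFIER_CODE
    if len(lexeme) == 1:                          # special character: its ASCII number
        return ord(lexeme)
    raise ValueError(f"unrecognized lexeme: {lexeme!r}")
```

Because the identifier and number codes start at 256, they cannot collide with the single-character codes, which stay within the ASCII range.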
Parser and MSIL Design
The parser implements most of the PASCAL grammar in BNF [22], such as nested if/else statements and if statements with logical expressions.
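As an illustration of how nested if/else statements might be lowered to branch instructions, the sketch below walks a small hand-built syntax tree and emits MSIL-style text. It is written in Python and is not the DCSPM parser: the tuple-based tree, the `emit` function, and the `ELSE_n`/`END_n` label names are assumptions, while `brfalse`, `br`, `ldloc`, `stloc`, `ldc.i4`, and `ceq` are real MSIL opcodes.

```python
from itertools import count


def emit(statements, out, labels):
    """Append MSIL-style lines for a list of statements to out."""
    for statement in statements:
        kind = statement[0]
        if kind == "assign":
            _, local_index, value_code = statement
            out.extend(value_code)                # code that pushes the value
            out.append(f"stloc {local_index}")    # store it in the local
        elif kind == "if":
            _, condition_code, then_part, else_part = statement
            n = next(labels)                      # fresh label number per if
            out.extend(condition_code)            # leaves 0 or 1 on the stack
            out.append(f"brfalse ELSE_{n}")       # skip the then-part when false
            emit(then_part, out, labels)
            out.append(f"br END_{n}")             # jump over the else-part
            out.append(f"ELSE_{n}:")
            emit(else_part, out, labels)
            out.append(f"END_{n}:")
        else:
            raise ValueError(f"unknown statement kind: {kind}")
    return out


# if a == 0 then (if b == 1 then c := 2 else c := 3) else c := 4
program = [
    ("if", ["ldloc 0", "ldc.i4 0", "ceq"],
     [("if", ["ldloc 1", "ldc.i4 1", "ceq"],
       [("assign", 2, ["ldc.i4 2"])],
       [("assign", 2, ["ldc.i4 3"])])],
     [("assign", 2, ["ldc.i4 4"])]),
]
code = emit(program, [], count())
```

The recursion mirrors the grammar: each nested if gets its own label pair, so inner branches never interfere with the labels of the enclosing statement.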
Improvements
Two improvements in the DCSPM compiler:
1- Lexical analysis improvement: the symbol table was changed from an ArrayList to a Dictionary.
Improvements (Cont.)
2- MSIL Code Output Improvement
Simple Pascal code:
    begin
      a := 0; b := 1; c := 2;
      if (a == 0) then
      begin
        a := b + c;
      end;
    end;
    end.

MSIL listing with a single ceq and brfalse.s:
    IL_0000: ldc.i4.0
    IL_0001: stloc.0
    IL_0002: ldc.i4.1
    IL_0003: stloc.1
    IL_0004: ldc.i4.2
    IL_0005: stloc.2
    IL_0006: ldloc.0
    IL_0007: ldc.i4.1
    IL_0008: ceq
    IL_000a: stloc.3
    IL_000b: ldloc.3
    IL_000c: brfalse.s IL_0012
    IL_000e: ldloc.1
    IL_000f: ldloc.2
    IL_0010: add
    IL_0011: stloc.0
    IL_0012: ret

MSIL listing with an extra ldc.i4.0/ceq pair and brtrue.s:
    IL_0000: ldc.i4.0
    IL_0001: stloc.0
    IL_0002: ldc.i4.1
    IL_0003: stloc.1
    IL_0004: ldc.i4.2
    IL_0005: stloc.2
    IL_0006: ldloc.0
    IL_0007: ldc.i4.1
    IL_0008: ceq
    IL_000a: ldc.i4.0
    IL_000b: ceq
    IL_000d: stloc.3
    IL_000e: ldloc.3
    IL_000f: brtrue.s IL_0015
    IL_0011: ldloc.1
    IL_0012: ldloc.2
    IL_0013: add
    IL_0014: stloc.0
    IL_0015: ret
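The two listings differ only in how the comparison result is turned into a branch: one negates it with `ldc.i4.0` / `ceq` and then uses `brtrue.s`, the other branches directly with `brfalse.s`. One way such a rewrite could be automated is a peephole pass over the instruction list. The Python sketch below is an assumption for illustration; the slides do not say that DCSPM produces its output this way. It uses a symbolic label `DONE` instead of IL offsets, and it assumes the temporary local is not read again after the branch.

```python
def simplify_negated_branch(instructions):
    """Rewrite 'ldc.i4.0; ceq; stloc.N; ldloc.N; brtrue.s L' as 'stloc.N; ldloc.N; brfalse.s L'."""
    result = []
    i = 0
    while i < len(instructions):
        window = instructions[i:i + 5]
        if (len(window) == 5
                and window[0] == "ldc.i4.0"
                and window[1] == "ceq"
                and window[2].startswith("stloc.")
                and window[3] == "ldloc." + window[2][len("stloc."):]
                and window[4].startswith("brtrue.s ")):
            target = window[4][len("brtrue.s "):]
            # Branching when (v == 0) is true is the same as branching when v is false.
            result += [window[2], window[3], "brfalse.s " + target]
            i += 5
        else:
            result.append(instructions[i])
            i += 1
    return result


longer = [
    "ldloc.0", "ldc.i4.1", "ceq",
    "ldc.i4.0", "ceq", "stloc.3", "ldloc.3", "brtrue.s DONE",
    "ldloc.1", "ldloc.2", "add", "stloc.0",
    "DONE:", "ret",
]
shorter = simplify_negated_branch(longer)
```

The rewrite removes two instructions per if statement, so the saving grows with the number of conditionals in the program.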
Evaluations
1- Array list data structure vs. Dictionary data structure
[Figure: chart with two series, Array List and Dictionary.]
Evaluations (cont.)
Collection: Dictionary
  Ordering: Unordered
  Contiguous storage?: Yes
  Direct access?: Via key
  Lookup efficiency: Key: O(1)
  Manipulate efficiency: O(1)
  Notes: Best for high performance lookups.

Collection: ArrayList
  Ordering: User has precise control over element ordering
  Contiguous storage?: Yes
  Direct access?: Via index
  Lookup efficiency: O(n)
  Manipulate efficiency: O(n)
  Notes: Best for smaller lists.

Complexity of Array list vs. Dictionary
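The difference in lookup efficiency can be seen by counting comparisons rather than measuring time. The sketch below uses a Python list and dict as stand-ins for the .NET ArrayList and Dictionary; `list_lookup`, the `id0` to `id999` names, and the 1000-row table are assumptions made for illustration.

```python
def list_lookup(rows, name):
    """Linear search; returns (row index or -1, number of comparisons made)."""
    comparisons = 0
    for row, (symbol, _code) in enumerate(rows):
        comparisons += 1
        if symbol == name:
            return row, comparisons
    return -1, comparisons


# A symbol table with 1000 identifiers, all carrying the identifier code 256.
rows = [(f"id{n}", 256) for n in range(1000)]

# Hash index from name to row, playing the role of the Dictionary.
index = {symbol: row for row, (symbol, _code) in enumerate(rows)}

row_from_list, steps = list_lookup(rows, "id999")   # worst case: last row
row_from_dict = index["id999"]                       # one hashed access
```

Finding the last identifier costs the list 1000 comparisons, while the hash index reaches the same row in a single keyed access. Since the lexer performs one lookup per identifier, this cost is paid repeatedly as the source grows.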
Evaluations (cont.)
2- Parser phase test
[Figure: Parser Phase. x-axis: # lines of Pascal code; y-axis: time (ms).]
Evaluations (cont.)
3- Initial and improved nested if/else MSIL code
[Figure: if/else MSIL results. Series: unimproved MSIL code, improved MSIL code. x-axis: # lines of Pascal code; y-axis: time (ms).]
Evaluations (cont.)
Size of initial and improved nested if/else MSIL code
[Figure: Size of initial and improved if/else MSIL. Series: unimproved size, improved size. x-axis: # lines of Pascal code; y-axis: size (KB).]
Lessons Learned
ildasm.exe: tool that converts compiled IL into human-readable code.
    C:\Program Files\Microsoft SDKs\Windows\v7.0A\bin
ilasm.exe: tool that converts human-readable IL code into compiled IL.
    C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322
    or C:\Windows\Microsoft.NET\Framework\v2.0.50727

DateTime and TimeSpan:
    // Take wall-clock timestamps before and after the lexical analysis.
    DateTime Start = DateTime.Now;
    lex();
    TimeSpan Elapsed = DateTime.Now - Start;
    speed = "Time Elapsed of Lexical Analysis: " + Elapsed.TotalMilliseconds + "ms";

Stopwatch class:
    // Stopwatch measures the elapsed interval directly.
    System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();
    stopwatch.Start();
    lex();
    stopwatch.Stop();
    speed = "Time Elapsed of Lexical Analysis: " + stopwatch.Elapsed.TotalMilliseconds + "ms";
Lessons Learned (cont.)
Nested if/else logic statement
Future Works
Many statements and data structures of the Pascal language are not yet supported, and the related MSIL is not yet generated:
1- complicated case statement
2- if logic of a complex condition with multiple levels
3- assert statement
4- exit statement
5- goto statement
6- repeat statement
7- next statement
8- complicated one-dimensional array
9- two-dimensional array data structure
10- queue data structure
11- stack data structure
Conclusion
The DCSPM compiler is useful for running legacy Pascal on modern machines, and its MSIL output is platform independent. MSIL code is verified for safety at run time and can be executed in any environment supporting the CLI (Common Language Infrastructure).
A one-dimensional array has two cases when compiled to MSIL. In the first case, when the array has one or two elements, the generated MSIL looks like the MSIL of other statements (if/else/while, etc.).
The initial lexical analysis uses an ArrayList data structure for the symbol table, and the improved lexical analysis uses a Dictionary data structure. I tested both versions with the Stopwatch class.
Conclusion (cont.)
A batch file, timer.cmd, was used to measure the execution time of the MSIL results.
The improved nested if/else statement runs faster than the initial nested if/else statement, although both produce the same results.
The experience gained in this project can serve as a foundation for developing a new programming language.
Demo & Questions
http://cs.uccs.edu/~gsc/pub/master/asheneam/src/COMPILER/bin/Debug/
Bibliography
[MC5tk]: http://msdn.microsoft.com/en-us/library/c5tkafs1(v=vs.71).aspx
[XX11]: Xiaohong Xiao and You Xu, “The Design and Implementation of C-like Language Interpreter,” Proceedings of 2nd International Symposium on Intelligence Information Processing and Trusted Computing (IPTC), pp. 104-107, 2011
[Upad11]: Mohit Upadhyaya, “Simple Calculator Compiler Using Lex and YACC,” Proceedings of 3rd IEEE International Conference on Electronic Computer Technology (ICECT), Vol. 6, pp. 182-187, 8-10 April 2011
[DLNYM]: H. M. Deitel, P. J. Deitel, J. Listfield, T. R. Nieto, C. Yaeger, and M. Zlatkina, C# How to Program
[L97]: Kenneth C. Louden, Compiler Construction: Principles and Practice
[MN11]: D. S. Malik and P. S. Nair, Data Structures Using Java
[L06]: Peter Linz, An Introduction to Formal Languages and Automata, Fourth Edition
[ASU11]: Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman, Compilers: Principles, Techniques, and Tools (2nd Edition)
[AL09]: Abdul Sattar and Torben Lorenzen, Develop a Compiler in Java for a Compiler Design Course
[Assembly11]: James T. Streib, Guide to Assembly Language: A Concise Introduction [electronic resource], London; New York: Springer, c2011
[WFRBE89-90]: Dr. Gerald Wildenberg, Using a Stack Assembler Language in a Compiler Course, St. John Fisher College, Rochester, NY; Bristol Polytechnic, England (1989-1990)
Bibliography (cont.)
[LS56]: Serge Lidin (1956-), Expert .NET 2.0 IL Assembler, Berkeley, CA
[CodeProject]: http://www.codeproject.com/Articles/3778/Introduction-to-IL-Assembly-Language
[MHt8e]: http://msdn.microsoft.com/en-us/library/ht8ecch6(v=vs.71)
[K08]: Pro C# 2008 and the .NET 3.5 Platform, Fourth Edition
[CodeMSIL]: http://www.codeguru.com/csharp/.net/net_general/il/article.php/c4635/MSIL-Tutorial.htm
[WikiPascal]: http://en.wikipedia.org/wiki/Pascal_(programming_language)
[PagesCs]: http://pages.cs.wisc.edu/~fischer/cs536.s08/lectures/Lecture02.4up.pdf
[MArraylist]: http://msdn.microsoft.com/en-us/library/system.collections.arraylist.aspx
[MKx37]: http://msdn.microsoft.com/en-us/library/kx37x362.aspx
[WikiExpr]: http://en.wikipedia.org/wiki/Microsoft_Visual_Studio_Express#Visual_C.23_Express
[DllAssem]: http://dll-repair-tools.com/dll-files/fusiondll-the-assembly-manager
[learnExp]: http://www.learnvisualstudio.net/start-here/lesson-1-1-installing-visual-c-2010-express-edition/
[SeasPascal]: http://www.seas.gwu.edu/~hchoi/teaching/cs160d/pascal.pdf
[GeekClass]: http://geekswithblogs.net/BlackRabbitCoder/archive/2011/06/16/c.net-fundamentals-choosing-the-right-collection-class.aspx
[DotArray]: http://www.dotnetperls.com/arraylist
[Ecma]: http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf