8/2/2019 CSCI 5535 Course Project -- A Report on Interpreted Programming Languages
1/22
CSCI 5535 Course Project -- A Report On Interpreted Programming Languages
By Xiaoli Zhang & Helen Wong Dec. 11, 1996 1
CSCI 5535 Project
A Report on
Interpreted Programming languages
by
Xiaoli Zhang
Helen Wong
December 11, 1996
Content
1. Introduction
2. Two important Languages in the Evolution of Interpreted Languages
Pascal
Smalltalk
3. Interpreter and Virtual Machine
Traditional Compilation Process
Self Compilation
Compiler and Interpreter
Intermediate Language
Just-in-time and On-the-fly
Virtual Machine
Examples of intermediate languages and related abstract machines
Abstract Machine to Actual Machine
Portability of Interpreters
4. Scripting Languages and Interpreted Languages
5. Case Study
a) Java
Overview
Java Virtual Machine
Java Language Construct and Java's Interpreter
b) Tcl/Tk
Overview
On-the-fly Bytecode compiler for Tcl
c) Both are Web Programming Languages
d) Another Mobile Language: Omniware
6. Why Interpreted Languages?
Portability
Security
Reusability
Rapid Development
Performance
Other Advantages of Interpreted Languages
7. Summary
1. Introduction
Interpreted languages have become increasingly popular. In recent years, interpreted languages
such as Java, Tcl/Tk and Perl have become hot topics and are in widespread use. Why? Generally,
it is because they are portable, easy to use, fast to develop in, and safe. In addition, most
interpreted languages are closely related to Web programming. In this paper, we examine the nature
of interpreted programming languages and how these features are achieved.
2. Two important Languages in the Evolution of Interpreted Languages
Pascal
Pascal, developed by Niklaus Wirth, is one of the early interpreted languages. The non-interpreted
Pascal was designed and implemented in 1967. The first Pascal compiler was implemented for the
CDC6000 computer family and was written in Pascal itself. In implementing the Pascal compiler,
Wirth found that the effort needed to generate good code is proportional to the mismatch between
language and machine, and the CDC6000 had certainly not been designed with high-level languages
in mind. [1]
What's more, after Pascal became well known, many people asked Wirth for assistance in
implementing Pascal on various other machines. Most of them wanted to use Pascal for teaching
purposes. They liked Pascal for its simplicity and the elegance of its implementation, and did
not care much about performance.
Thereupon, Wirth decided to provide a version of the compiler that would generate code for
machines of different designs. This code later became known as P-code. P-code is an abstract
machine code whose target is a virtual machine called the P-machine. As an intermediate language,
P-code is then interpreted, which emulates the virtual machine on a real machine. The P-code
version of Pascal was easy to construct because the new compiler was developed as a substantial
exercise in structured programming by stepwise refinement, and therefore the first few refinement
steps could be adopted unchanged. It also proved very successful in spreading the language among
many users on different machines. Wirth later regretted that he had not possessed the wisdom to
foresee the dimensions of this movement; otherwise, he would have put more effort into designing
and documenting P-code. [1]
Pascal's P-code and its virtual machine elaborated the existing concepts of intermediate language
and virtual machine and are thus very important in the evolution of interpreted languages. P-code
has by now almost become a household word in the area of programming languages. With the virtual
machine, the Pascal-P system developed into an environment with an integrated compiler, filter,
editor, and debugger, which helped Pascal spread further.
As mentioned above, Pascal-P is both compiled and interpreted. It has both a compiler, such as
pcom, and an interpreter, such as pint. [4] Overall, execution takes place in two phases: first the
compiler compiles the source code into P-code, and then the interpreter interprets the P-code.
This implementation uses self-compilation: the compiler is written in its own source language and
can compile itself. This approach is a common combination of elementary methods and is called
bootstrapping, which is also very helpful in software migration. In Pascal-P, the resulting
compiler is expressed in the virtual machine language, P-code, and generates code for this same
machine. Hence the compiler itself must be interpreted. [2] A similar story occurs in Java's
implementation.
Smalltalk
Another significant interpreted language is Smalltalk, which was developed during the 1970s at
Xerox PARC (Palo Alto Research Center). It was the first language to really exploit a graphical
user interface, and many of the ideas for the Macintosh came from Smalltalk. Smalltalk is more an
environment than a language, because there is a Smalltalk virtual machine, and the entire
operation of the Smalltalk environment and language is built on that virtual machine. [6]
We call Smalltalk an interpreted language here solely because it runs on a virtual machine in the
manner of the P-machine. What actually happens as a result of a message send in Smalltalk is:
- First, the system checks whether the method has already been translated to machine code that
  has been cached in memory.
- If the native machine code form is in the cache, the system executes that machine code.
- If the cache does not contain a translated form of the method, the system dynamically compiles
  the method's bytecode. [5]
Dynamic translation yields the benefits of the execution speed of compiled code and the space
compactness of bytecode. If all the code in a running Smalltalk image were kept purely in the
form of compiled machine code, the image would consume 5-10 times as much memory, and
therefore could in fact degrade performance on a virtual memory system by causing increased
paging. [7]
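The message-send steps above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not Smalltalk's actual mechanism: the names MethodCache,
translate and send, the two-opcode bytecode format, and the use of generated Python source as a
stand-in for native machine code are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Bytecode = List[Tuple[str, int]]  # hypothetical (opcode, operand) pairs


def translate(bytecode: Bytecode) -> Callable[[int], int]:
    """Stand-in for the dynamic translator: emit Python source and compile it once."""
    lines = ["def native(x):"]
    for opcode, operand in bytecode:
        if opcode == "add":
            lines.append(f"    x = x + ({operand})")
        elif opcode == "mul":
            lines.append(f"    x = x * ({operand})")
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    lines.append("    return x")
    namespace: dict = {}
    exec("\n".join(lines), namespace)  # the "native" form lives only in memory
    return namespace["native"]


class MethodCache:
    """Keeps compact bytecode for every method and translated code only for used ones."""

    def __init__(self, methods: Dict[str, Bytecode]) -> None:
        self.methods = methods
        self.translated: Dict[str, Callable[[int], int]] = {}
        self.translations = 0  # how many times the translator had to run

    def send(self, selector: str, argument: int) -> int:
        native = self.translated.get(selector)   # step 1: look in the cache
        if native is None:                       # step 3: miss, translate now
            native = translate(self.methods[selector])
            self.translated[selector] = native
            self.translations += 1
        return native(argument)                  # step 2: run the translated form
```

Sending the same message twice translates the method only once; methods that are never sent stay
in their compact bytecode form, which is the space benefit the paragraph above describes.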
Many features of Smalltalk are worth borrowing by newer interpreted languages such as Java. One
of these features is the just-in-time compilation mentioned above. Currently, Java is adopting
just-in-time compilation of bytecode into native code to improve its performance.
In the next section, while introducing the interpreter and virtual machine, we will address some
less well-known interpreted languages. We will also address in detail popular interpreted
languages such as Java and Tcl as case studies, and address Perl as a scripting language.
3. Interpreter and Virtual Machine
In the last section, quite a few terms (in bold characters) related to interpreted languages were
mentioned. In this section, we report in detail on the concepts these terms represent. Let's first
step back a little.
Traditional Compilation Process
The process that translates a high-level language into machine code, which the hardware can
understand, is carried out by the compiler. The task of the compiler has two subtasks: analysis of
the source program and synthesis of the object program. Typically, as in Figure 1, the analysis
task consists
of three subphases: lexical analysis, syntax analysis, and semantic analysis, while the synthesis
task is usually a single phase: code generation.
Figure 1
The lexical analyzer is responsible for reading the characters of the source program, recognizing
the basic syntactic components, or tokens, that they represent, and returning the tokens to the
syntax analyzer, or parser. The parser then has to determine how to group and structure the tokens
according to the syntax rules of the language. The output of the parser is a representation of the
syntactic structure of the source program, often expressed in the form of a parse tree. The parse
tree is then passed to the semantic analyzer, which determines the meaning of the source program,
including the meaning of declarations and scopes of identifiers, storage allocation, type checking,
selection of appropriate polymorphic operators, addition of automatic type transfers, etc.
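As an illustration of the lexical analysis phase just described, the following is a minimal
sketch of a lexical analyzer in Python. The token kinds and the tiny Pascal-like token set (":="
for assignment, for instance) are assumptions chosen for the example, not part of any
implementation discussed in this report.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Hypothetical token set for a tiny Pascal-like language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP", r"[+\-*/()]"),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Read characters and yield the tokens they form, skipping white space."""
    position = 0
    while position < len(source):
        match = MASTER.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r} at offset {position}")
        position = match.end()
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())
```

A parser would consume this token stream and group the tokens according to the grammar; the
lexer itself knows nothing about how tokens may be combined.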
The code generator, in the last phase of the compilation process, takes the output of the semantic
analyzer as input and generates machine code or assembly language for the target hardware. In
order to generate object code for that machine, it has to know the machine architecture, including
machine instructions, allocation of machine registers, addressing, interfacing with the operating
system, and so on.
While the analysis phase, or front end, is language-dependent (the analyzers have to know the
syntactic and semantic rules of the language), the synthesis phase, or back end, is
machine-dependent.
The code generator usually includes some form of code optimizer to produce faster or more compact
code. Code generation may include both machine-dependent and machine-independent techniques. [27]
Self Compilation
While a compiler translates high-level languages into machine code or object code, most compilers
are themselves software written in high-level languages, and some of them are written in the very
source language they are supposed to compile. How can this self-compilation be achieved? It is
done by a process called bootstrapping. We take Pascal as an example to illustrate the
bootstrapping process. Refer to Table 1, and suppose there are machines X, Y, and Z. Any two of them
[Figure 1: source program -> lexical analyzer -> syntax analyzer -> semantic analyzer (Front End) -> code generator (Back End) -> object program]
could be the same or different.
      Source code of Pascal compiler     Compiler used                               Object code
      { written in, target machine }     [ written in, running on ]
(1)   { Modula-2, for Y }                [ X's assembly language, running on X ]     X's object code
(2)   { Pascal, for Z }                  [ Modula-2, running on X ]                  Y's object code
(3)   { Pascal, for any machine }        [ Pascal, running on Y ]                    Z's object code
Table 1. Bootstrapping
In compilation process (1), the source code of a Pascal compiler for Y was written in Modula-2.
This code was then compiled by a Modula-2 compiler written in X's assembly language and translated
into X's object code. Once this new compiler existed, the source code of a Pascal compiler for Z,
written in Pascal, could be passed to it and translated into Y's object code, as in (2). Further,
the source code of another Pascal compiler, written in Pascal for an arbitrary machine, could be
passed to the newest compiler and translated into Z's object code, as in (3). Note that a compiler
compiles its input source code into object code for its target machine, while the compiler itself
is object code for the machine on which it runs. That machine does not have to be its target
machine. [27]
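The three bootstrapping steps can be traced with a small model. In this sketch a compiler is
described only by three properties: the language it accepts, the machine it generates code for,
and the form in which the compiler itself exists. The class Compiler, the function
compile_compiler, and the machine name "W" (standing for the arbitrary machine of step (3)) are
hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compiler:
    source: str      # language the compiler accepts
    target: str      # machine whose object code it emits
    written_in: str  # language or object code the compiler itself exists in


def compile_compiler(tool: Compiler, program: Compiler) -> Compiler:
    """Compile the source text of `program` with `tool`.

    The result accepts the same language and has the same target as `program`,
    but it now exists as object code of the machine that `tool` generates code for.
    """
    if program.written_in != tool.source:
        raise ValueError(f"{tool.source} compiler cannot read {program.written_in} source")
    return Compiler(program.source, program.target, f"{tool.target} object code")


# The existing tool: a Modula-2 compiler for X, itself running on X.
modula2_on_x = Compiler("Modula-2", "X", "X object code")

step1 = compile_compiler(modula2_on_x, Compiler("Pascal", "Y", "Modula-2"))
step2 = compile_compiler(step1, Compiler("Pascal", "Z", "Pascal"))
step3 = compile_compiler(step2, Compiler("Pascal", "W", "Pascal"))
```

Each result matches the "object code" column of Table 1: step1 exists as X's object code, step2
as Y's, and step3 as Z's, because a compiler's output always belongs to its target machine rather
than to the machine it runs on.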
Compiler and Interpreter
By interpreted languages we mean those languages that use an interpreter in their compilation
process. So what is an interpreter, by definition?
A translator takes a program written in a source language as input and translates it into a
program having the same meaning but written in an object language. If the source language is a
higher-level one, the translator is a compiler. Generally, a compiler generates machine code or
abstract machine code from source code.
An interpreter directly executes its source language, without first translating it into an object
language. Some Lisp or APL implementations can be considered pure interpreters. But many language
implementations consist of both a compiler and an interpreter. The former translates the source
language into an interpretable intermediate language; in this case, the intermediate language is
the source language for the interpreter. [2]
With an intermediate language and an interpreter, the compilation process becomes more
sophisticated, typically as in Figure 2. The semantic analysis phase is often followed by another
process that takes the parse tree from the syntax analyzer and produces a linear sequence of
instructions equivalent to the original source program. [27] This sequence of instructions can be
considered abstract machine code, since it is targeted not at an actual machine but at an
abstraction of real machines. This abstraction is often called an abstract machine or virtual
machine.
Intermediate Language
The intermediate language, which occurs between two phases of a language translation process, is
the object code of the first phase and the source language of the second phase. It is very
important for modern interpreted languages such as Java and plays a great role in a language's
portability and security.
Figure 2
With an intermediate language, the problems arising from the characteristics of the target
hardware can be confined to the code generator. So the front end of the compiler can be reused
with different code generators for different machines, and the compiler can easily be ported to a
different machine, for example by bootstrapping, since only the code generator needs to be ported.
If we substitute an interpreter for the code generator in the back end of the compilation process,
implementing a language on new hardware becomes easier still, since implementing an interpreter is
much easier than implementing a code generator. We will see this in Java's implementation
(section 5. a)).
A minor disadvantage of an intermediate language is that it is sometimes somewhat harder to
generate optimized machine code from the intermediate language than directly from the parse
tree. [27]
[Figure 2: source program -> lexical analyzer -> syntax analyzer -> semantic analyzer -> abstract machine code generator (Front end) -> abstract machine code = intermediate language -> interpreter, or on-the-fly code generator -> native machine code in memory (Back end)]
An intermediate language is usually designed for a particular source language. It reflects the
constructs, data types and operators of the source language in its basic operations. For example,
the P-machine is a hypothetical stack-based virtual machine with a very simple structure, and
P-code contains many instructions closely related to Pascal's language constructs. Designing an
intermediate code for several different source languages, by contrast, is hard. Such a generic
intermediate language, UNCOL (UNiversal Computer-Oriented Language), was proposed as early as the
1950s but was never fully developed because of practical difficulties. Recently, people have
again been considering generic intermediate languages, and some efforts are under way. We will
address this in the next section, on the virtual machine.
Intermediate languages represent internal interfaces in the compilation process, and consequently
they can take any suitable form: trees, triples, quadruples, assembly languages, bytecode, etc.
Pascal's P-code is a famous intermediate language and takes the form of an assembly language. [2]
Just-in-time and On-the-fly
As in Figure 2, an intermediate language can be interpreted directly by an interpreter, or
sometimes compiled again by a native compiler or code generator into native machine code. This
native compilation is often just-in-time compilation, meaning that the native compiler rewrites
compute-intensive sections into native machine code at run time as necessary, and the native
machine code exists not in the disk file system but directly in memory. So the compilation of an
intermediate language into native machine code is often an on-the-fly compilation, which by
definition means that the output of the compiler does not exist in the disk file system but is
loaded into memory portion by portion. The input of an on-the-fly compilation can be either
high-level source code or intermediate code, and the output can be either intermediate code or
native machine code. When the output is an intermediate code such as bytecode, an interpreter is
needed to interpret the bytecode, which is usually cached in memory.
Virtual Machine
As mentioned above, a virtual machine is an abstraction of a family of real machines. More
precisely, a virtual machine is the fictitious target machine of an intermediate language; it
specifies a somewhat idealized machine chosen for convenience, either because it makes a
simple-minded compiler easier to write or because it is closer to most real machines.
Most computers now have a set of general-purpose registers. Usually, operations take one of their
operands from a register and the other from memory. In most cases only some of the registers are
used for addressing. [2] So a register-based virtual machine is closer to most real machines, and
emulating it on a real machine needs fewer native machine instructions and thus has less overhead.
Most virtual machines today, however, are stack-based. A stack machine has few actual registers
but has an operand stack, where operations find their operands and put their results. The
advantage of stack machines is that they can be totally independent of the computer [2], and the
compilation process is relatively simple for a stack machine.
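A stack machine of the kind described above can be interpreted with a very small loop. The
following sketch assumes a hypothetical instruction set (PUSH, LOAD, STORE, ADD, SUB, MUL) and is
not the P-machine or the Java Virtual Machine; it only shows operations finding their operands on
the operand stack and putting their results back there.

```python
import operator
from typing import Dict, List, Optional, Tuple

BINARY = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run(program: List[Tuple], variables: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Interpret a linear instruction sequence and return the final variable bindings."""
    env = dict(variables or {})
    stack: List[int] = []
    for instruction in program:
        opcode = instruction[0]
        if opcode == "PUSH":                 # push a constant
            stack.append(instruction[1])
        elif opcode == "LOAD":               # push a variable's value
            stack.append(env[instruction[1]])
        elif opcode == "STORE":              # pop into a variable
            env[instruction[1]] = stack.pop()
        elif opcode in BINARY:               # pop two operands, push the result
            right = stack.pop()
            left = stack.pop()
            stack.append(BINARY[opcode](left, right))
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return env
```

Running the program LOAD a, PUSH 2, LOAD b, MUL, ADD, STORE c with a = 1 and b = 5 leaves c = 11.
The interpreter needs no knowledge of the host's registers, which reflects the independence claimed
for stack machines; the cost is that every operand passes through the stack.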
Again, the usefulness of a virtual machine stems from the fact that it allows the majority of the
9/22
CSCI 5535 Course Project -- A Report On Interpreted Programming Languages
By Xiaoli Zhang & Helen Wong Dec. 11, 1996 9
compilation process to be isolated from dependency on a specific machine.
Examples of intermediate languages and related abstract machines [2]
- P-code and related abstract machine: The abstract machine associated with Pascal-P is very
  conventional and flexible. It is a stack machine with five registers: top of stack, base of
  global variables, top of heap used for dynamic variables, base of local variables, and
  instruction counter.
- EM-1 intermediate language and related machine: EM-1 is a more sophisticated language than
  P-code and is closer to actual assembly language. It contains 130 instructions, whereas P-code
  has only 60, and it also contains a dozen pseudo-instructions. The EM-1 machine is very similar
  to the P-code machine, with a stack of local variable areas whose top is used as an execution
  stack, a heap, a global variable area, and a program area.
- Janus and its abstract machine: The abstract machine associated with the intermediate language
  Janus has a memory that is divided into several independent areas organized as tree structures.
  It uses a stack for expression evaluation, a processing unit to execute Janus instructions, and
  three specialized registers: condition code, instruction counter, and index register.
We will address two other important virtual machines, the Java Virtual Machine and Omniware, in
the Case Study section of this paper.
Abstract Machine to Actual Machine
To obtain an actual implementation, the abstract machine must then be transported onto an actual
machine. Generally, it is the interpreter that executes the abstract machine's instruction set on
the actual machine and so gives the abstract machine an actual implementation. The two general
ways of implementing an interpreter are:
- If the intermediate language resembles an assembly language, like Pascal's P-code, the base
  operations of the abstract machine can be implemented using a macroprocessor. But if the
  macroprocessor is not a very powerful one, the resulting code is usually rather inefficient.
- An interpreter can be programmed or microprogrammed, which amounts to direct execution of the
  abstract machine's instruction set, like Java's interpreter. [2]
For performance reasons, an interpreter is not the only way to get from an abstract machine to an
actual machine. The intermediate code can also be recompiled, usually on the fly, into the target
machine's native code.
Portability of Interpreters
The functionality of the front end of a language's compilation process is identical for different
target machines. With the self-compilation technique, the implementation of the front end can be
totally portable across platforms, as with Java's compiler javac, while the non-portable part
of the implementation of a language is the interpreter or code generator.
Like a code generator, an interpreter has to make use of the operating system facilities of the
target machine for performing input and output, using graphics or window systems, making storage
allocation requests, etc. That is, an interpreter has to deal with the run-time library, which is
a convenient way of providing an interface between the compiled (abstract) machine code and the
operating system. It includes a set of routines that can be called by the compiled code to perform
all the machine-dependent and operating-system-dependent functions required by the user's
high-level language program. It is possible to write part of the run-time library in a high-level
language, as the Java API is written in Java. But at least part of the run-time library will have
to be written in a low-level language to make use of particular machine and operating system
facilities.
4. Scripting Languages and Interpreted Languages
Scripting languages are valued because they provide variables, flow control, and procedures for
commands and serve as glue between commands.
Scripting languages are all interpreted. UNIX shell languages are simple scripting languages; they
are interpreted directly, with no intermediate language or virtual machine involved. The
interpreter for a UNIX shell language is just a single executable, such as /usr/local/bin/sh for
the Bourne shell or /usr/local/bin/ksh for the Korn shell.
An interpreter that interprets a high-level language directly has to include the lexical and
syntax analysis phases of the front end of the compilation process. But most directly interpreted
high-level languages, such as the UNIX shell programming languages, are simple enough that their
interpreters can still be kept simple.
A modern scripting language such as Perl, however, is no longer interpreted directly. The design
goal of Perl is a scripting language that is portable and easy to develop in, so the
implementation of Perl is two-phased: Perl is both compiled and interpreted. It is compiled
because the program is completely read and parsed before the first statement is executed. It is
interpreted because there is no object code sitting around filling up disk space. In a way, it is
the best of both worlds: a typical on-the-fly compiler-interpreter process.
Compilation does take time: it is inefficient to have a voluminous Perl program that does one
small quick task and then exits, because the run time of the program will be dwarfed by the
compile time. But it is more efficient for heavy tasks, such as those with a large loop body,
where compilation saves the time of reparsing. That is why another scripting language that was
directly interpreted, Tcl, is switching to the on-the-fly bytecode compiler-interpreter style of
Perl. To take further advantage of this style, both Perl and Tcl cache the compiled object code
between invocations. We will address Tcl's on-the-fly bytecode compilation in the Tcl case study.
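The compile-then-interpret style, together with caching of the compiled form, can be illustrated
with Python's own built-in compile and exec functions, since Python also compiles a whole script
to bytecode in memory before running it. This is an analogy only, not a description of how Perl
or Tcl is implemented; the class name ScriptRunner and its in-memory cache keyed by source text
are assumptions made for the sketch.

```python
from types import CodeType
from typing import Dict


class ScriptRunner:
    """Compile a whole script to in-memory bytecode, then interpret it; reuse the bytecode."""

    def __init__(self) -> None:
        self._cache: Dict[str, CodeType] = {}
        self.compilations = 0  # how many times the compile phase actually ran

    def run(self, source: str, namespace: dict) -> dict:
        code = self._cache.get(source)
        if code is None:
            # Phase 1: the entire script is read and parsed before any statement runs.
            # A syntax error anywhere in the script is raised here.
            code = compile(source, "<script>", "exec")
            self._cache[source] = code
            self.compilations += 1
        # Phase 2: the bytecode is interpreted; no object file is written to disk.
        exec(code, namespace)
        return namespace
```

Two consequences follow. A script whose last line is malformed never executes its first line,
because parsing fails before interpretation starts. And running the same script text again skips
the compile phase, which is the saving that caching is meant to provide.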
The on-the-fly compilation of Tcl or Perl is different from Java's on-the-fly compilation, since
it is on-the-fly bytecode compilation. The whole compiler-interpreter process happens at run
time; the target of the on-the-fly compilation is bytecode that will be cached in
memory and then be interpreted dynamically.
While all scripting languages are interpreted, not all interpreted languages are scripting
languages. An example is the popular page description language PostScript. PostScript is
typically interpreted and stack-based. Its stack-based nature makes PostScript source code natural
to interpret and portable. It also makes PostScript device independent, meaning that the image is
described without reference to any specific device features. So PostScript files in source form
can be transferred from machine to machine, even by email in ASCII form, and then be interpreted
without any modification by interpreters such as ghostview and those built into printers.
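Why a stack-based source language is natural to interpret can be seen from a toy interpreter for
postfix text. The sketch below handles only numbers and a handful of operators that carry the
same names as PostScript's add, sub, mul, dup, exch and pop; it is a simplified illustration under
that assumption and covers none of PostScript's graphics, procedures, or dictionaries.

```python
from typing import Callable, Dict, List, Union

Number = Union[int, float]


def _binary(function: Callable[[Number, Number], Number]) -> Callable[[List[Number]], None]:
    def apply(stack: List[Number]) -> None:
        right = stack.pop()
        left = stack.pop()
        stack.append(function(left, right))
    return apply


def _dup(stack: List[Number]) -> None:
    stack.append(stack[-1])


def _exch(stack: List[Number]) -> None:
    stack[-1], stack[-2] = stack[-2], stack[-1]


def _pop(stack: List[Number]) -> None:
    stack.pop()


OPERATORS: Dict[str, Callable[[List[Number]], None]] = {
    "add": _binary(lambda a, b: a + b),
    "sub": _binary(lambda a, b: a - b),
    "mul": _binary(lambda a, b: a * b),
    "dup": _dup,
    "exch": _exch,
    "pop": _pop,
}


def interpret(source: str) -> List[Number]:
    """Execute whitespace-separated postfix words left to right; return the final stack."""
    stack: List[Number] = []
    for word in source.split():
        if word in OPERATORS:
            OPERATORS[word](stack)
        else:
            try:
                stack.append(int(word))
            except ValueError:
                stack.append(float(word))  # raises ValueError for unknown words
    return stack
```

Because operands always precede their operator, the interpreter needs no parser and no lookahead:
each word is either pushed or executed as soon as it is read.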
5. Case Study
a) Java
Overview
Java is a simple, familiar, object-oriented language. That is because Java adopts a syntax very
similar to that of C and C++ while being a cleaned-up version of C++. It supports garbage
collection and removes a number of features that make C and C++ complex, such as pointers,
automatic coercions, operator overloading, and multiple inheritance.
Other important reasons for Java's success are its internet-related features, such as the
following:
- Dynamic: In Java, classes are linked only as needed. New code modules can be linked in on
  demand from a variety of sources, even from sources across a network. Instead of simply
  downloading static pages of text and images, a Web browser can download Java applets and run
  them on the client machine. This supports image animation and real-time interaction between
  user and program.
- Threaded: Modern network-based applications, such as the HotJava Web browser, typically need to
  do several things at the same time. A user can run several animations concurrently while
  downloading an image and scrolling the page. Java's multithreading capability provides the
  means to support this. [11]
The reason why Java is a popular mobile language is that it is architecture neutral and portable.
To accommodate the diversity of operating environments, the Java compiler generates bytecode, an
architecture-neutral intermediate format designed to transport code efficiently to multiple
hardware and software platforms. The interpreted nature of Java solves both the binary
distribution problem and the version problem; the same Java bytecode will run on any platform.
Java's portability also relies on the fixed definition of its basic data types and of the
behavior of its arithmetic operators. This makes programs behave the same on every platform;
there are no data type incompatibilities across hardware and software architectures.
The self-compilation feature of Java is another factor that makes Java more portable. Java's
compiler is written in Java and exists as Java bytecode. Furthermore, the Java API and the HotJava
browser also exist as bytecode. In the end, only the interpreter in the Java system is left
dependent on the run-time system.
Java Virtual Machine
The architecture-neutral and portable platform of Java is the Java Virtual Machine. It is the
specification of an abstract machine for which the Java compiler can generate code. Specific
implementations of the Java Virtual Machine for specific hardware and software platforms then
provide the concrete realization of the virtual machine. The Java Virtual Machine is based
primarily on the POSIX interface specification, an industry-standard definition of a portable
system interface. Implementing the Java Virtual Machine on a new architecture is a relatively
straightforward task as long as the target platform meets basic requirements such as support for
multithreading. [11]
The Java VM has been called a "soft CPU". It is a stack-based machine. The JVM supports about 248
bytecodes, each of which performs a basic CPU operation such as adding an integer to a register,
combining the numbers in two registers, jumping to a subroutine, storing a result, or incrementing
or decrementing a register. In effect, the JVM is a stack-based arithmetic logic unit with local
and global variables.
To add two numbers, the VM works as follows: it first pushes them onto its stack and then adds
them. After completing the addition, the VM leaves the result on the stack for the next step in
the process. Emulating this on a real machine, most probably a register-based one, takes quite a
few real machine instructions and memory references. So there is overhead in mapping the
stack-based VM onto a register-based real machine. We will address this further in section 6 of
this paper, under Performance.
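The overhead of the push, push, add sequence can be made visible by counting memory references in
a deliberately naive emulation. The sketch assumes that the operand stack lives entirely in host
memory and that only the stack pointer is held in a register; the class CountingMemory and the
function emulate_stack_add are hypothetical, and a real interpreter or on-the-fly compiler may
well keep the top of the stack in registers and do better.

```python
from typing import List, Tuple


class CountingMemory:
    """Host memory that counts every read and write made through it."""

    def __init__(self, size: int) -> None:
        self.cells: List[int] = [0] * size
        self.references = 0

    def read(self, address: int) -> int:
        self.references += 1
        return self.cells[address]

    def write(self, address: int, value: int) -> None:
        self.references += 1
        self.cells[address] = value


def emulate_stack_add(a: int, b: int) -> Tuple[int, int]:
    """Emulate 'push a, push b, add' and return (result, memory references used)."""
    memory = CountingMemory(8)
    sp = 0                             # stack pointer, assumed to sit in a host register
    memory.write(sp, a); sp += 1       # push a
    memory.write(sp, b); sp += 1       # push b
    sp -= 1; right = memory.read(sp)   # add: pop the right operand
    sp -= 1; left = memory.read(sp)    #      pop the left operand
    memory.write(sp, left + right)     #      push the sum back
    sp += 1
    result = memory.cells[sp - 1]      # peek without counting, for reporting only
    return result, memory.references
```

Under these assumptions a single addition costs five memory references plus the stack pointer
updates, whereas a register machine that already holds both operands in registers needs a single
add instruction and no memory traffic.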
In the beginning, the Java Virtual Machine was the target machine for the Java source language
only. Recently, people have been trying to support other languages on top of the same Java
Virtual Machine. According to Java's creator James Gosling, languages like Visual Basic, COBOL,
Dylan and Scheme are fairly reasonable bets for the Java VM. So although the JVM was not designed
as a generic virtual machine, it is now intended to serve as one to meet existing needs. [28]
Java Language Construct and Java's Interpreter
A Java programmer can create:
- Applets: Programs that are included in HTML pages through the APP tag and displayed in the
  HotJava browser. The simple hello world program shown in "A Simple Java Program" is an applet.
  The HotJava browser is invoked by the hotjava command included in the Java code distribution.
- Applications: Stand-alone programs written in Java and executed independently of the HotJava
  browser. This is done using the Java interpreter, java, included in the Java code distribution.
- Protocol handlers: Programs that are loaded into the user's HotJava browser and interpret a
  protocol. These protocols include standard ones such as HTTP as well as programmer-defined
  protocols.
- Content handlers: Programs that are loaded into the user's HotJava browser and interpret files
  of a type defined by the Java programmer. The Java programmer provides the necessary code for
  the user's HotJava browser to display or interpret this special format.
- Native methods: Methods that are declared in a Java class but implemented in C. These native
  methods essentially allow a Java programmer to access C code from Java. [10]
There is another tool in the JDK, called AppletViewer, for testing and running applets.
AppletViewer also has the Java interpreter, java, embedded.
A Java interpreter is built into every Java-enabled Web browser. A practical way to understand the
technical description of Java is to look at the processes that occur when a user with a
Java-enabled browser requests a page containing a Java applet:
1. The user sends a request for an HTML document to the information provider's server.
2. The HTML document is returned to the user's browser. The document contains the APP tag,
which identifies the applet.
3. The corresponding applet bytecode is transferred to the user's host. This bytecode was
previously created by the Java compiler from the Java source code for that applet.
4. The Java-enabled browser on the user's host interprets the bytecode and provides the display.
5. The user may interact further with the applet, with no further downloading from the
provider's Web server, because the bytecode contains all the information necessary to interpret
the applet.
b) Tcl/Tk
Overview
Tcl stands for Tool Command Language. It is an extensible, embeddable command language, or
scripting language, implemented by John Ousterhout, originally at the University of California,
Berkeley, and now working for Sun.
What sets Tcl apart from other scripting languages is how easily a Tcl interpreter can be added
to an application. A Tcl interpreter consists of a set of commands, a set of variable
bindings and a command execution state. It is the basic unit manipulated by most of the Tcl
library procedures. An application may have one or more interpreters, depending on its
complexity, and multiple interpreters may be responsible for different purposes. Tcl commands may be
built-in commands, such as the flow-control keywords for, if, case and eval, or application-specific
commands defined by users. Developers and users can add application-specific commands without
limit. Because programmers can structure their applications from a set of primitive operations,
any existing commands, and any new commands they develop to suit their needs, there is no need
to invent a command language for each new application. Commands are embedded by creating
interpreter objects inside the application through calls to library procedures, similar to
declaring an extern function in C. It is natural for an application to create a Tcl interpreter
in its own source code, since an interpreter is essentially a set of commands. In other
languages, such as Java, the interpreter is a separate executable, and running it costs memory
and CPU time concurrently with interpreting the bytecode.
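The idea of an embedded interpreter as a set of commands plus a set of variable bindings can be sketched as follows. This is only a minimal illustration in Python, not Tcl's C library: the class Interp and its methods create_command and eval are hypothetical names chosen for this sketch, and the only syntax it understands is whitespace-separated words with $name substitution.

```python
class Interp:
    """A toy command interpreter: a command table plus variable bindings."""

    def __init__(self):
        self.commands = {}   # command name -> Python callable
        self.variables = {}  # variable name -> string value
        # Built-in commands are registered exactly like application ones.
        self.create_command("set", self._cmd_set)

    def create_command(self, name, func):
        """Register a command; func receives the interpreter and the arguments."""
        self.commands[name] = func

    def _cmd_set(self, interp, args):
        name, value = args
        interp.variables[name] = value
        return value

    def eval(self, script):
        """Run one command per line; words are separated by whitespace."""
        result = ""
        for line in script.splitlines():
            words = line.split()
            if not words:
                continue
            # Substitute $name with the current value of the variable.
            words = [self.variables[w[1:]] if w.startswith("$") else w
                     for w in words]
            result = self.commands[words[0]](self, words[1:])
        return result


# An application embeds the interpreter and adds its own command (hypothetical).
def cmd_area(interp, args):
    width, height = (int(a) for a in args)
    return str(width * height)


interp = Interp()
interp.create_command("area", cmd_area)
result = interp.eval("set w 6\narea $w 7")  # result is the string "42"
```

Note that the application-specific command area is invoked in exactly the same way as the built-in set, which is the property the text describes.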
The aspects that set Tcl apart from other extension languages, such as Scheme, Elisp and Python,
are: (1) Tcl has simple constructs somewhat like C, and Tcl primitives are written as C or C++
procedures. (2) The Tcl C library provides a clean interface to native C code. (3) Most extensions
add new functionality such as socket access for network programming, database access,
telephone control and expected interactive features. (4) Tcl is open to development by its
community. [21]
The most notable extension of Tcl is Tk, a toolkit for the X Window System as well as for
Windows and the Macintosh. Tk provides a convenient way for users to build Motif-style GUIs
because of its higher-level interface to X and its rapid turnaround in development.
Safe-Tcl is a subset of Tcl in which access to system resources is controlled. Because it is
secure, Safe-Tcl is suited to running network agents. Built on the combination of Tk and
Safe-Tcl, a web browser called TkWWW is now available for free. [12]
On-the-fly Bytecode compiler for Tcl [19]
Although Tcl has a number of advantages as a new scripting language, its lack of structure and
its slowness make it a poor fit for large applications. To improve Tcl's performance, people at
Sun Microsystems Laboratories are working on an on-the-fly bytecode compiler for Tcl. Below are
some direct quotations from the paper An On-the-fly Bytecode Compiler for Tcl by Brian T. Lewis
of that lab:
So far Tcl is interpreted directly. Although the current Tcl interpreter is fast enough for most Tcl
uses, there are many applications that need greater speed. The two main performance problems in
the current Tcl system (Tcl 7.5) are script reparsing and conversions between strings and other
data representations. The current interpreter spends as much as 50% of its time in parsing. It
reparses the body of a loop, for example, on each iteration. Data conversions also consume a
great deal of time. It is reported that 92% of the time in incr's command procedure
Tcl_incrCmd() was spent converting between strings and integers.
To solve these performance problems, a new Tcl compiler and interpreter are being developed at
Sun Microsystems Laboratories. Their goal for the bytecode compiler is to improve the speed of
compute-intensive Tcl scripts by a factor of 10.
The compiler translates Tcl scripts at program runtime, or on-the-fly, into a sequence of bytecode
instructions that are then interpreted. The compiler eliminates most runtime script parsing. It also
makes at compile time many decisions that are now made only at runtime. It can tell, for
example, whether a variable name refers to a scalar or an array element. It also compiles away many
type conversions. As an example, it can recognize whether the argument string specifying the
increment amount in an incr command represents a constant integer.
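The cost of reparsing a loop body on every iteration, and the saving from translating it once, can be illustrated with a small sketch. It is written in Python rather than Tcl, and the functions parse, execute, run_direct and run_cached are hypothetical names invented for this illustration. The "compiled" form here is simply a list of pre-split words, not real Tcl bytecode, and the sketch addresses only reparsing, not the type conversions mentioned above.

```python
stats = {"parses": 0}  # counts how often a script body is parsed


def parse(script):
    """Split a script into commands, each a list of words."""
    stats["parses"] += 1
    return [line.split() for line in script.splitlines() if line.strip()]


def execute(commands, variables):
    """Execute parsed commands; only 'incr name amount' is supported here."""
    for name, var, amount in commands:
        if name != "incr":
            raise ValueError("unknown command: " + name)
        variables[var] = variables.get(var, 0) + int(amount)


def run_direct(body, iterations, variables):
    """Direct interpretation: the loop body is reparsed on every iteration."""
    for _ in range(iterations):
        execute(parse(body), variables)


compiled_cache = {}  # script text -> parsed form


def run_cached(body, iterations, variables):
    """On-the-fly translation: parse once, then reuse the cached form."""
    if body not in compiled_cache:
        compiled_cache[body] = parse(body)
    commands = compiled_cache[body]
    for _ in range(iterations):
        execute(commands, variables)


body = "incr total 2\nincr count 1"

direct_vars = {}
run_direct(body, 100, direct_vars)
direct_parses = stats["parses"]   # one parse per iteration

stats["parses"] = 0
cached_vars = {}
run_cached(body, 100, cached_vars)
cached_parses = stats["parses"]   # a single parse for the whole loop
```

Both runs compute the same variable values; only the number of parses differs.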
The bytecode interpreter uses dual-ported objects extensively. These objects contain both a
string and an internal representation appropriate for some data type. For example, a Tcl list is now
represented as an object that holds the list's string representation as well as an array of pointers to
the objects for each list element. Dual-ported objects avoid most runtime type conversions. They
also improve the speed of many operations since an appropriate representation is available. The
compiler itself uses dual-ported objects to cache the bytecode resulting from the compilation of
each script.
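A dual-ported object can be pictured as a value that carries a string form and a cached internal form side by side. The following is a minimal sketch in Python, assuming an integer as the internal representation. The class DualObject and the function incr are hypothetical names for illustration and do not reflect the actual data structures of the Tcl implementation.

```python
class DualObject:
    """Holds a string form and, lazily, a cached integer form of one value."""

    def __init__(self, string):
        self.string = string   # string representation
        self.internal = None   # cached internal (integer) representation
        self.conversions = 0   # string-to-integer conversions, for illustration

    def as_int(self):
        # Convert from the string only if no internal form is cached yet.
        if self.internal is None:
            self.internal = int(self.string)
            self.conversions += 1
        return self.internal

    def set_int(self, value):
        self.internal = value
        self.string = None     # string form is stale; regenerate on demand

    def as_string(self):
        if self.string is None:
            self.string = str(self.internal)
        return self.string


def incr(obj, amount=1):
    """Increment in place, working on the cached integer form."""
    obj.set_int(obj.as_int() + amount)


counter = DualObject("0")
for _ in range(1000):
    incr(counter)
final_string = counter.as_string()
```

In this sketch the string "0" is converted to an integer once, on the first increment; the remaining increments reuse the cached integer, and the string form is rebuilt only when it is asked for.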
c) Both are Web Programming Languages
As we mentioned in a) of this section, Java is an Internet-oriented language. Tcl/Tk is also closely
related to Web programming. Sun has recently released a Tcl/Tk plug-in for Netscape Navigator.
It allows Web pages to contain Tcl/Tk scripts and display interfaces in the browser window. The
plug-in uses the Safe-Tcl mechanism to ensure that even untrusted scripts can be executed safely.
So what's the difference between Java and Tcl/Tk?
Tcl is a high-level scripting language. It is good for creating small and medium-sized applications
quickly and for gluing existing things together. It has a simple syntax and almost no structure,
which makes it good for scripting. However, at least so far, Tcl is a directly interpreted language,
so it may not perform well for very large tasks. Think of Tcl as something like a UNIX shell,
except that it is embeddable and portable and can be used for Internet scripting, including CGI
implementation.
Java, on the other hand, is a system programming language like C or C++. It is much more
structured than Tcl, which makes it easier to build large, complex applications in Java than in
Tcl. Java is also compiled, which results in great efficiency. Java also supports multi-threading,
whereas Tcl does not. Think of Java as something like C++, except simpler and more powerful and
with facilities for sending Java programs around the Internet as executable content. [20]
Since both Java and Tcl belong to Sun and both are web programming languages, people
are considering a marriage of Java and Tk, using Tk as the GUI-building part of Java. It is said
that Sun has an early version of a Tcl-to-Java interface.
d) Another Mobile Language: Omniware
Mobile languages have become quite popular recently. The term denotes languages that can be easily
ported and widely run on many nodes of a network. Since any programming language can be a web
programming language, and a web programming language does not have to be portable, it is better
to call Java, Perl and Tcl mobile languages.
Another notable mobile language from our report's point of view is Omniware.
Omniware is an interpreted language with a two-phase compiler-interpreter process. It defines a
virtual machine called OmniVM.
The advantages of Omniware are:
1. OmniVM is a register-based virtual machine, and thus it is closer to most real machines.
Translation from OmniVM to a real machine is therefore a shorter and lighter process than
translation from the Java Virtual Machine, which is stack-based.
2. OmniVM was designed with all languages that have C/C++ constructs in mind, so it can be the
compiler target for C/C++ and many other languages. In this respect, Omniware serves somewhat
as a generic virtual machine.
Omniware uses a technique called Software-based Fault Isolation, which provides security by adding
instructions that check at runtime that addresses are within the legal address space. But as with
many other mobile languages, access to the host's system resources still remains a big problem in
Omniware. [12]
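The runtime address check described here can be sketched in a few lines. This is a simplified illustration in Python, not Omniware's actual mechanism. The names Sandbox, checked_store and SegmentFault are hypothetical, and a real implementation would insert the check as machine instructions before each store rather than as a method call.

```python
class SegmentFault(Exception):
    """Raised when a store targets an address outside the sandbox segment."""


class Sandbox:
    def __init__(self, memory_size, base, size):
        self.memory = bytearray(memory_size)  # the whole address space
        self.base = base                      # first legal address
        self.limit = base + size              # one past the last legal address

    def checked_store(self, address, value):
        # The inserted check: refuse any address outside [base, limit).
        if not (self.base <= address < self.limit):
            raise SegmentFault(hex(address))
        self.memory[address] = value


sandbox = Sandbox(memory_size=256, base=64, size=32)
sandbox.checked_store(70, 0xAB)      # inside the segment: allowed

try:
    sandbox.checked_store(10, 0xFF)  # outside the segment: rejected
    blocked = False
except SegmentFault:
    blocked = True
```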
6. Why Interpreted Languages?
Now the hottest languages, such as Java, Tcl/Tk and Perl, are all interpreted languages. Why? An
important reason, we think, is that they are all closely related to the Internet. For an
Internet-oriented language, the most important feature is portability. Such a language also has
to operate in a distributed environment, which means that security is of paramount importance.
Interpreted languages have an advantage in supporting both of these features.
Portability
A program is portable if the effort required for its transport is much less than the effort required
for its initial implementation and if its initial qualities remain the same after the transport. The
portability of a program can be evaluated by measuring the transport effort. For example, if I is the
work involved in initial implementation and T is the work involved in transport, then the
program's portability can be evaluated as (I-T)/I. By this measure, a program is 100 percent
portable only when no transport effort is involved at all (T = 0), but this is
impossible. [2]
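As a small worked example of the measure (I-T)/I, the sketch below computes it in Python. The function name portability and the effort figures are hypothetical values chosen only to illustrate the arithmetic; they are not measurements from the cited work.

```python
def portability(initial_effort, transport_effort):
    """Portability measure (I - T) / I from the definition above."""
    return (initial_effort - transport_effort) / initial_effort


# Hypothetical efforts, in person-days: 100 to implement, 5 to transport.
p = portability(100, 5)         # 0.95, that is, 95 percent portable
p_ideal = portability(100, 0)   # 1.0 only when transport costs nothing
```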
A mechanism that supports software portability is thus one that reduces the effort of software
transport. Some significant mechanisms of this kind are:
A compiler generates intermediate code that is independent of the target computer. If the
compiler is self-compiling, it is itself also portable. This is typically a compiler-interpreter
mechanism, with Pascal-P, Snobol4 and Java as typical examples.
A compiler can also be divided into two parts: the front end, which depends on the source language,
and the back end, which depends on the object language, which in turn depends on the target
machine. The interface between these two parts, if well designed, can be independent of both
languages. Ongoing study of generic virtual machines focuses on this mechanism. The mobile
language Omniware mentioned above is a good attempt in this category.
Platform-dependent parts of the software are isolated, and configuration tools such as
imake are then used to enable the code to be compiled and installed on different platforms.
The first two mechanisms are typically realized by interpreted languages with a virtual machine.
The virtual machines of interpreted languages are the platforms for architecture-neutral and
portable languages. Java and Omniware are typical examples.
Security
Part of Java's security mechanism comes from its language design policy: simplicity. It excludes
many dangerous features of C++, such as pointers, with which a programmer could accidentally
manipulate memory directly. At the same time, Java provides automatic garbage collection. But
the more important security mechanisms come from its compiler-interpreter nature mentioned
above.
The compiler-interpreter mechanism with bytecode provides several levels of security defense for
Java. The first level is provided by extensive compile-time checking. A trustworthy compiler
ensures that Java source code does not violate the safety rules. The second level is provided by
the bytecode verifier, which operates at run time. Java does not trust an applet coming from
anywhere on the Internet, so the bytecode verifier has to ensure that the code passed to the Java
interpreter is in a fit state to be executed and can run without breaking the Java interpreter.
The third level of defense is provided by the class loader. The class loader dynamically partitions
each network class source into its own private namespace and then prevents classes in one
namespace from polluting other namespaces. [13]
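The third level, partitioning classes by network source, can be sketched as a table of namespaces keyed by source. The following Python illustration is not Java's class loader API. The class NamespaceLoader, its methods define and resolve, and the example source URLs are all hypothetical.

```python
class NamespaceLoader:
    """Keeps classes from each network source in a separate namespace."""

    def __init__(self):
        self.namespaces = {}  # source -> {class name -> class object}

    def define(self, source, name, cls):
        self.namespaces.setdefault(source, {})[name] = cls

    def resolve(self, source, name):
        """Look a name up only in the requesting source's own namespace."""
        try:
            return self.namespaces[source][name]
        except KeyError:
            raise LookupError(name + " is not visible from " + source) from None


class HelperFromA:
    label = "from site A"


class HelperFromB:
    label = "from site B"


loader = NamespaceLoader()
# Two sources define a class with the same name; neither replaces the other.
loader.define("http://a.example/", "Helper", HelperFromA)
loader.define("http://b.example/", "Helper", HelperFromB)

seen_by_a = loader.resolve("http://a.example/", "Helper").label
seen_by_b = loader.resolve("http://b.example/", "Helper").label
```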
While Java's security is mainly provided by its compiler-interpreter mechanism, Tcl's security is
provided by Safe-Tcl.
Safe-Tcl is a mechanism that initializes a Tcl interpreter to a safe subset of Tcl commands so that
Tcl scripts cannot harm their hosting machine or application. There are also mechanisms to grant
privileges to a safe interpreter so that the script can do non-trivial things.
So the basic approach to ensuring safety is first to remove the file command completely from safe
interpreters and then to replace it with command aliases. The Netscape Tcl plug-in supports Tcl/Tk
applets, also called Tclets. The Tcl plug-in implements the standard Safe-Tcl subset, plus a
limited version of Tk.
Command aliases are the primary mechanism provided by Safe-Tcl to grant privileges. An alias is
a command in the untrusted interpreter that is really implemented by a different, fully trusted
interpreter. This is much like user mode and kernel mode in multiuser operating systems. In
Safe-Tcl, an untrusted script is isolated in its interpreter context and given a few extra commands
that are carefully implemented by another Tcl interpreter to ensure safety.
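The alias idea can be sketched with two toy interpreters, one trusted and one untrusted. The code below is an illustration in Python and is not the Safe-Tcl API. Interp, alias, raw_read, guarded_read and the directory /tmp/tclet/ are hypothetical, and raw_read only returns a placeholder string so that the sketch touches no real files.

```python
class Interp:
    """A toy interpreter: just a table mapping command names to callables."""

    def __init__(self, commands):
        self.commands = dict(commands)

    def call(self, name, *args):
        if name not in self.commands:
            raise PermissionError("no such command: " + name)
        return self.commands[name](*args)

    def alias(self, name, target, target_command):
        """Make `name` here invoke `target_command` in the `target` interpreter."""
        self.commands[name] = lambda *args: target.call(target_command, *args)


ALLOWED_PREFIX = "/tmp/tclet/"  # hypothetical directory the script may read


def raw_read(path):
    # Stand-in for real file access, so the sketch touches no files.
    return "contents of " + path


def guarded_read(path):
    # The trusted side validates its arguments before doing privileged work.
    if not path.startswith(ALLOWED_PREFIX) or ".." in path:
        raise PermissionError("access denied: " + path)
    return raw_read(path)


trusted = Interp({"read": raw_read, "guarded_read": guarded_read})
untrusted = Interp({"upper": str.upper})   # harmless commands only
untrusted.alias("read", trusted, "guarded_read")

allowed = untrusted.call("read", "/tmp/tclet/data.txt")

try:
    untrusted.call("read", "/etc/passwd")
    denied = False
except PermissionError:
    denied = True
```

The untrusted interpreter never holds the unrestricted read command; it sees only the alias, whose checks run on the trusted side.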
Reusability
Scripting languages as interpreted languages typically provide glue for commands. A shared, uni-
versal scripting language like Tcl serves as a powerful and flexible glue for assembling reusable
components.
Tcl is a reusable command language because almost everything in the language is a command,
from flow control (for, if, case, continue, etc.) to variables and procedures (global, proc,
return, set). These built-in commands provide programmability and extensibility for free. Users
of Tcl are free to develop application-specific commands, much as UNIX commands extend the
UNIX shell, and these commands appear the same as the built-in commands in Tcl.
The most important design goal of Tcl is reusability, so it takes a component-based approach.
Rather than building a new application as a self-contained monolith with hundreds of thousands of
lines of code, a Tcl application is a combination of many smaller reusable components. Each
component would be small enough to be implemented by a small group, and interesting applications
could be created by assembling existing components. [17]
Rapid Development
Reusability provides a way to develop software applications rapidly. The scripting, or
interpreted, nature of interpreted languages is obviously good for rapid development. Instead of
going through heavyweight compile, link, crash, debug cycles, interpreted languages can be
interpreted directly, and it is easier to trace what is happening during interpretation.
Performance
Currently, Java runs about 30 times slower than an equivalent C program. This does not seem very
bad considering the advantages Java has. In fact, performance has always been a consideration for
Java's designers. They believe they have achieved superior performance by adopting a scheme in
which the interpreter can run at full speed without needing to check the runtime environment.
Also, the automatic garbage collector runs as a low-priority background thread, ensuring a high
probability that memory is available when required, which leads to better performance. What's more,
Sun has also been improving performance by providing just-in-time compilation of the bytecode
into native code. Applications requiring large amounts of computing power can be designed
so that compute-intensive sections are rewritten in native machine code as required and
interfaced with the Java platform.
In general, Java's interactive applications respond quickly even though they are interpreted. But
the effort to improve performance will never come to an end. The current performance of Java still
cannot meet the needs of some categories of applications.
Typically, an interpreted language has relatively low performance because of the overhead of
fetching and decoding each virtual command or virtual instruction before performing the work
specified by that command. Most virtual machines at present are stack-based, while most real
machines are register-based. Interpreting the intermediate code to emulate such a virtual
machine on a real machine is thus a heavier process than in the situation where the virtual
machine and the real machine have a similar structure, either both stack-based or both
register-based.
In Java, interpretation proceeds by token threading, with one dispatch for each bytecode
executed. A token-threaded dispatch requires about three instructions and five memory references,
and each virtual instruction requires several real machine instructions. For example, executing
an integer add (IADD) of the JVM on most general-purpose processors (Sparc, 80x86, 680X0, PowerPC,
ARM and MIPS) requires at least seven conventional processor instructions when using a C source
code interpreter.
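The fetch, decode and execute overhead of a stack-based virtual machine can be seen in even a very small interpreter loop. The sketch below is written in Python with if/elif dispatch rather than token-threaded C. The opcode numbering and the function run are hypothetical, and only the name IADD is taken from the JVM.

```python
# Hypothetical opcode numbering for this sketch (not real JVM encodings).
PUSH, IADD, IMUL, HALT = range(4)


def run(code):
    """Interpret a list of opcodes and operands on an operand stack."""
    stack = []
    pc = 0          # program counter: index of the next opcode
    dispatched = 0  # number of virtual instructions fetched and decoded
    while True:
        opcode = code[pc]   # fetch
        pc += 1
        dispatched += 1
        if opcode == PUSH:  # decode, then do the real work
            stack.append(code[pc])
            pc += 1
        elif opcode == IADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == IMUL:
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif opcode == HALT:
            return stack.pop(), dispatched
        else:
            raise ValueError("bad opcode: " + str(opcode))


# (2 + 3) * 4 expressed for the stack machine.
program = [PUSH, 2, PUSH, 3, IADD, PUSH, 4, IMUL, HALT]
result, dispatched = run(program)
```

Even for the single addition, the loop performs a fetch, a counter update, a comparison chain, two pops and a push, which is the kind of per-instruction overhead the text refers to.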
To improve performance, a just-in-time compilation technique has been applied to Java, which
translates Java bytecode into instructions for the host processor at runtime. This technique
improves Java's performance several times over. Since native code compilers (or code generators)
are usually complex software that costs both memory and execution time, JIT compilation
uses less aggressive optimization, which just translates each bytecode to in-line machine code or
keeps the top of the stack in a register.
The performance improvement from JIT compilation is limited, and it comes at a cost in memory.
There are arguments that the most efficient execution vehicle for many Java applications
would be a dedicated Java chip that directly executes the bytecode. Sun is now building a
picoJava chip, a microcontroller intended to execute Java bytecodes directly. It is a
simple, stack-based processor. Rather than being a pure stack architecture, the machine would
have specific hardware features for dealing with bytecode and other hardware features to suit the
garbage-collected, object-oriented, multithreaded nature of Java.
Now let us set aside those rare, newly designed stack-based real machines and talk only about
stack-based virtual machines on register-based real machines.
As mentioned above, the execution time of an interpreted program depends on the number of
commands interpreted, the fetching and decoding cost of each command, and the time spent
actually executing the operation specified by each command. Since the number of commands required
to accomplish a given task depends on the level of the virtual machine of the language, the
performance of an interpreted language mainly depends on the level of the virtual machine defined
for that language. A simple virtual machine, like Java's, might require the execution of a large
number of commands, but the overhead of each virtual command is small and nearly fixed. In contrast,
Perl and Tcl each define complex virtual machines, which result in non-uniform slowdowns relative
to C implementations even though their virtual machines can execute a given program in fewer
commands. [26]
As we mentioned in section 5. b), an on-the-fly bytecode compiler for Tcl is being implemented
to improve Tcl's performance. Caching compiled bytecode will also be very helpful in
improving the performance of interpreted languages such as Perl and Tcl. [24]
Other Advantages of Interpreted Languages [27]
The type of a variable can change dynamically during execution
Compiling efficient code for dynamic typing, where the type of a variable can change during
execution, is hard because the type of the variable is not known at compile time. An interpreter,
by contrast, can handle this situation easily and efficiently.
An interpreter can be very good for debugging
The interpreter can access the source program in its original form or in an internal form at any
time. It also maintains a symbol table containing variable names and values. So programmers
can get diagnostic information in easily understandable forms.
7. Summary
Most interpreted languages have a virtual machine, either explicitly defined, such as Java's
virtual machine, or implicitly defined, such as Tcl's and Perl's. Some simple scripting languages,
such as the UNIX shell languages, are interpreted directly and do not have a virtual machine.
Virtual machines play a great role in interpreted programming languages. With the assistance of
a virtual machine, the compiler-interpreter mechanism provides portability, security and better
performance for interpreted programming languages.
Scripting languages, as interpreted languages, are good for gluing programming components together.
When the set of components is open for extension, as with Tcl's commands (built-in plus
application-specific commands), the language can provide great reusability.
The development processes of interpreted languages are relatively lightweight compared with the
compile-link-test cycles in a traditional compiled language. So, interpreted languages are good for
rapid development.
Acknowledgment
We would like to thank Professor Benjamin Zorn for his guidance on the topics in this paper.
We believe that without his help we would still be lost in a maze.
References
[1]. Wirth N. From Programming Language Design to Computer Construction, ACM, February
1985, Vol 28, No. 2
[2]. Lecarm O., Cart M. P., Gart M. Software Portability, McGraw-Hill Publishing Company,
1989
[3]. Kamin S. N. Programming Languages: an Interpreter-Based Approach, Addison-Wesley Pub.
Co., 1990
[4]. Pemberton S., Daniels M. Pascal Implementation: The P4 Compiler and Interpreter, ISBN:
0-13-653-0311
[5]. Newsgroup: comp.lang.smalltalk
[6]. Byrne S. B. GNU Smalltalk User's Guide, http://www.cs.utah.edu/csinfo/texinfo/mst/mst_toc.html
[7]. Goldberg, Robson, Smalltalk-80: The Language and Its Implementation, Addison Wesley,
1983, ISBN 0-201-11371-6
[8]. Sun Microsystems. The Java Virtual Machine Specification. http://java.sun.com/doc/vmspec/
html/vmspecl.html, 1995
[9]. Gosling, J, Java Intermediate Bytecodes, ACM SIGPLAN Workshop on Intermediate Repre-
sentation, Jan. 1995
[10]. Sun JavaSoft: Getting Started: The Java Developer's Kit
[11]. Sun JavaSoft: Design Goals of Java 1.2
[12]. Caron J. Java: Status Report and Language Overview, CSCI 5535 Project, Dec. 1995, Uni-
versity of Colorado at Boulder
[13]. Wang W, An Y, Zang L, Security --- How is it implemented in the Java language?, CSCI 5535
Project, Dec. 1995, University of Colorado at Boulder
[14]. Sun JavaSoft: A Look Inside the Java Platform
[15]. Sun JavaSoft: The Java language Environment, a White paper
[16]. Abelson, H. and Sussman, G.J. Structure and Interpretation of Computer Programs, MIT
Press, Cambridge, MA, 1985
[17]. Ousterhout J. K. Tcl and Tk Toolkit, Addison-Wesley, ISBN 0-201-63337-X
[18]. Ousterhout J. K. Tcl: An Embeddable Command language, USENIX Conference Proceed-
ings, 1990
[19]. Lewis B. T. An On-the-fly Bytecode Compiler for Tcl. http://www.sunlabs.com/people/
brian.lewis/
[20]. Ousterhout J. K. What's Happening at Sun Labs. http://www.sunlabs.com/research/tcl/team.html,
April 1996
[21]. Welch B. Practical Programming in Tcl and Tk, Prentice-Hall, 1995, ISBN 0-13-182007-9
[22]. newsgroup: comp.lang.tcl
[23]. Ousterhout. J. K. An Introduction To Tcl Scripting, http://www.sunlabs.com/people/
john.ousterhout/
[24]. Schwartz R. L. Learning Perl, O'Reilly & Associates, Inc. 1993
[25]. Perl Documentation, http://www.csc.tntech.edu/docs/perl.html
[26]. Romer, T. H. Lee D. etc. The structure and Performance of Interpreters, ACM, Oct.1996
[27]. Watson, D High-level Languages and Their Compilers, Addison-Wesley Publishing Com-
pany, 1989
[28]. Gosling on Java, DATAMATION, March 1, 1996