CSCI 5535 Course Project -- A Report on Interpreted Programming Languages


By Xiaoli Zhang & Helen Wong, Dec. 11, 1996

CSCI 5535 Project

A Report on

Interpreted Programming languages

by

Xiaoli Zhang

Helen Wong

December 11, 1996





Contents

1. Introduction

2. Two important Languages in the Evolution of Interpreted Languages

Pascal

Smalltalk

3. Interpreter and Virtual Machine

Traditional Compilation Process

Self Compilation

Compiler and Interpreter

Intermediate Language

Just-in-time and On-the-fly

Virtual Machine

Examples of intermediate languages and related abstract machines

Abstract Machine to Actual Machine

Portability of Interpreters

4. Scripting Languages and Interpreted Languages

5. Case Study

a) Java

Overview

Java Virtual Machine

Java Language Construct and Java's Interpreter

b) Tcl/Tk

Overview

On-the-fly Bytecode compiler for Tcl

c) Both are Web Programming Languages

d) Another Mobile Language: Omniware

6. Why Interpreted Languages?

Portability

Security

Reusability

Rapid Development

Performance

Other Advantages of Interpreted Languages

7. Summary





1. Introduction

Interpreted languages have become more and more popular. In recent years, interpreted languages such as Java, Tcl/Tk and Perl have become hot topics and widespread. Why? Generally, it is because they are portable, easy to use, fast to develop in, and safe. Moreover, most interpreted languages are closely related to Web programming. In this paper, we study the nature of interpreted programming languages and how these features of interpreted languages are achieved.

2. Two important Languages in the Evolution of Interpreted Languages

Pascal

Pascal, developed by Niklaus Wirth, is one of the early interpreted languages. The original, non-interpreted Pascal was designed in the late 1960s and first implemented around 1970. The first Pascal compiler was implemented for the CDC 6000 computer family and was written in Pascal itself. In implementing the Pascal compiler, Wirth found that the effort to generate good code is proportional to the mismatch between language and machine, and the CDC 6000 had certainly not been designed with high-level languages in mind. [1]

What's more, after the existence of Pascal became well known, many people asked Wirth for assistance in implementing Pascal on various other machines. Most of them wanted to use Pascal for teaching. They liked Pascal for its simplicity and the elegance of its implementation, and did not care much about performance.

Thereupon, Wirth decided to provide a version of the compiler whose output could run on machines of different designs. This output later became known as P-code. P-code is abstract machine code whose target is a virtual machine called the P-machine. As an intermediate language, P-code is then interpreted by a program that emulates the virtual machine on a real machine. The P-code version of Pascal was easy to construct because the new compiler was developed as a substantial exercise in structured programming by stepwise refinement, and therefore the first few refinement steps could be adopted unchanged. It also proved very successful in spreading the language among many users on different machines. Wirth later regretted that he had not possessed the wisdom to foresee the dimensions of this movement; otherwise, he would have put more effort into designing and documenting P-code. [1]

Pascal's P-code and its related virtual machine elaborated the existing concepts of intermediate language and virtual machine, and are thus very important in the evolution of interpreted languages. P-code has now almost become a household word in the area of programming languages. Built on the virtual machine, the Pascal-P system was developed into an environment with an integrated compiler, filer, editor, and debugger, which spread Pascal even further.

As mentioned above, Pascal-P is both compiled and interpreted. It has both a compiler, such as pcom, and an interpreter, such as pint. [4] As a whole, execution takes place in two phases: first the compiler compiles the source code into P-code, and then the interpreter interprets the P-code. This implementation used self-compilation: the compiler is written in its own source language and can compile itself. This approach is a common combination of elementary methods and is called bootstrapping, which is also very helpful in software migration. In Pascal-P, the resulting compiler is written in the virtual machine language, P-code, and generates code for this same machine. Hence the compiler itself must be interpreted. [2] A similar story happens in Java's implementation.

Smalltalk

Another significant interpreted language is Smalltalk, which was developed during the 1970s at Xerox PARC (Palo Alto Research Center). It was the first language to really exploit a graphical user interface, and many of the ideas for the Macintosh came from Smalltalk. Smalltalk is more of an environment than a language, because there is a Smalltalk virtual machine, and the entire operation of the Smalltalk environment and language is built on that virtual machine. [6]

We call Smalltalk an interpreted language here solely because Smalltalk is built on a virtual machine in the manner of the P-machine. What actually happens as a result of a message sent in Smalltalk is:

1. First, the system checks whether the method has already been translated to machine code that has been cached in memory.

2. If the native machine code form is in the cache, the system executes that machine code.

3. If the cache doesn't contain a translated form of the method, the system dynamically compiles the method's bytecode. [5]
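The three-step lookup above can be modeled in a few lines of Java. This is a toy sketch under our own assumptions: the method table, the operation names inc and dbl, and the use of a composed Java function as a stand-in for translated native machine code are all hypothetical; a real Smalltalk virtual machine translates bytecode into actual machine instructions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Toy model of Smalltalk-style dynamic translation (all names are hypothetical).
// Each method is stored as compact "bytecode"; on the first send it is translated
// into an executable form, which is then cached and reused on later sends.
class DynamicTranslation {
    // Method bytecode: space-separated operations applied to the receiver.
    static final Map<String, String> BYTECODE = Map.of(
            "incTwice", "inc inc",
            "incThenDouble", "inc dbl");

    // Code cache: stands in for translated native machine code kept in memory.
    static final Map<String, IntUnaryOperator> codeCache = new HashMap<>();
    static int translations = 0;

    // Dynamic translation: turn a method's bytecode into a composed Java function.
    static IntUnaryOperator translate(String bytecode) {
        translations++;
        IntUnaryOperator code = IntUnaryOperator.identity();
        for (String op : bytecode.split(" ")) {
            IntUnaryOperator step;
            if (op.equals("inc")) step = x -> x + 1;
            else if (op.equals("dbl")) step = x -> x * 2;
            else throw new IllegalArgumentException("unknown op " + op);
            code = code.andThen(step);
        }
        return code;
    }

    // A message send: run cached code if present, otherwise translate and cache it first.
    static int send(String selector, int receiver) {
        IntUnaryOperator code =
                codeCache.computeIfAbsent(selector, s -> translate(BYTECODE.get(s)));
        return code.applyAsInt(receiver);
    }

    public static void main(String[] args) {
        System.out.println(send("incThenDouble", 5)); // 12
        System.out.println(send("incThenDouble", 1)); // 4, served from the cache
        System.out.println(translations);             // 1
    }
}
```

The second send of incThenDouble skips translation entirely, which is the point of caching translated methods.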

Dynamic translation combines the execution speed of compiled code with the space compactness of bytecode. If all the code in a running Smalltalk image were kept purely in the form of compiled machine code, the image would consume 5-10 times as much memory, and could therefore actually degrade performance on a virtual memory system by causing increased paging. [7]

Many features of Smalltalk are worth borrowing for new interpreted languages such as Java. One of these is the just-in-time (dynamic) compilation described above. Currently, Java implementations are adopting just-in-time compilation of bytecode into native code to improve performance.

In the next section, which introduces interpreters and virtual machines, we will address some less well-known interpreted languages. We will also treat popular interpreted languages such as Java and Tcl in detail as case studies, and discuss Perl as a scripting language.

3. Interpreter and Virtual Machine

In the last section, quite a few terms related to interpreted languages were mentioned. In this section, we explain in detail the concepts these terms represent. Let's step back a little.

Traditional Compilation Process

The process of translating a high-level language into machine code, which the hardware can understand, is performed by the compiler. The compiler's task has two subtasks: analysis of the source program and synthesis of the object program. Typically, as in Figure 1, the analysis task consists of three subphases: lexical analysis, syntax analysis and semantic analysis, while the synthesis task is usually a single phase: code generation.

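Figure 1. Traditional compilation process: source program -> lexical analyzer -> syntax analyzer -> semantic analyzer (front end) -> code generator -> object program (back end)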

The lexical analyzer is responsible for reading the characters of the source program, recognizing the basic syntactic components, or tokens, that they represent, and returning the tokens to the syntax analyzer, or parser. The parser then determines how to group and structure the tokens according to the syntax rules of the language. The output of the parser is a representation of the syntactic structure of the source program, often expressed in the form of a parse tree. The parse tree is then passed to the semantic analyzer, which determines the meaning of the source program, including the meaning of declarations and the scopes of identifiers, storage allocation, type checking, selection of appropriate polymorphic operators, insertion of automatic type conversions, etc.
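To make the first two front-end phases concrete, here is a minimal Java sketch for integer expressions with + and *. The grammar, the class name FrontEnd and the prefix-string form of the parse tree are our own illustrative choices; semantic analysis is omitted because this toy language has only one type.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a compiler front end for a hypothetical expression grammar:
//   expr   := term ('+' term)*
//   term   := factor ('*' factor)*
//   factor := NUMBER | '(' expr ')'
class FrontEnd {
    // Lexical analysis: turn characters into tokens (numbers and single-char symbols).
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                tokens.add(src.substring(start, i));
            } else if ("+*()".indexOf(c) >= 0) {
                tokens.add(String.valueOf(c));
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens;
    }

    // Syntax analysis: a recursive-descent parser that builds the parse tree,
    // represented here as a prefix string such as (+ 1 (* 2 3)).
    private final List<String> tokens;
    private int pos = 0;

    FrontEnd(List<String> tokens) { this.tokens = tokens; }

    private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

    String expr() {
        String left = term();
        while (peek().equals("+")) { pos++; left = "(+ " + left + " " + term() + ")"; }
        return left;
    }

    String term() {
        String left = factor();
        while (peek().equals("*")) { pos++; left = "(* " + left + " " + factor() + ")"; }
        return left;
    }

    String factor() {
        String tok = peek();
        pos++;
        if (tok.equals("(")) {
            String inner = expr();
            if (!peek().equals(")")) throw new IllegalStateException("expected )");
            pos++;
            return inner;
        }
        if (tok.isEmpty() || !Character.isDigit(tok.charAt(0)))
            throw new IllegalStateException("expected number, got '" + tok + "'");
        return tok;
    }

    static String parse(String src) {
        FrontEnd p = new FrontEnd(tokenize(src));
        String tree = p.expr();
        if (p.pos != p.tokens.size()) throw new IllegalStateException("trailing input");
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12 + 3*(4+5)")); // [12, +, 3, *, (, 4, +, 5, )]
        System.out.println(parse("12 + 3*(4+5)"));    // (+ 12 (* 3 (+ 4 5)))
    }
}
```

The grammar's layering (expr over term over factor) is what gives * higher precedence than + in the resulting tree.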

The code generator, in the last phase of the compilation process, takes the output of the semantic analyzer as input and generates machine code or assembly language for the target hardware. To generate object code for that machine, it has to know the machine architecture, including the machine instructions, allocation of machine registers, addressing, interfacing with the operating system, and so on.

While the analysis phase, or front end, is language-dependent (the analyzers have to know the syntactic and semantic rules of the language), the synthesis phase, or back end, is machine-dependent.

The code generator usually includes some form of code optimizer to produce faster or more compact code. This optimization may use both machine-dependent and machine-independent techniques. [27]
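As one example of a machine-independent optimization, the sketch below performs constant folding on a small expression tree: any operator whose operands are both constants is evaluated at compile time. The tree types (Num, Var, BinOp) are hypothetical and not taken from any particular compiler.

```java
// Sketch of constant folding, a machine-independent optimization, on a
// hypothetical expression tree. Variables are left alone; an operator whose
// operands both fold to constants is replaced by its computed value.
class ConstantFolding {
    interface Expr {}
    record Num(int value) implements Expr {}
    record Var(String name) implements Expr {}
    record BinOp(char op, Expr left, Expr right) implements Expr {}

    static Expr fold(Expr e) {
        if (e instanceof BinOp b) {
            Expr l = fold(b.left());
            Expr r = fold(b.right());
            if (l instanceof Num x && r instanceof Num y) {
                return new Num(switch (b.op()) {
                    case '+' -> x.value() + y.value();
                    case '*' -> x.value() * y.value();
                    default -> throw new IllegalArgumentException("unknown operator " + b.op());
                });
            }
            return new BinOp(b.op(), l, r);
        }
        return e; // Num and Var cannot be simplified further
    }

    public static void main(String[] args) {
        // x * (2 + 3) folds to x * 5
        Expr e = new BinOp('*', new Var("x"), new BinOp('+', new Num(2), new Num(3)));
        System.out.println(fold(e)); // BinOp[op=*, left=Var[name=x], right=Num[value=5]]
    }
}
```

Because the transformation never looks at registers or instructions, the same pass could be shared by back ends for different machines.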

Self Compilation

While a compiler translates high-level languages into machine code or object code, most compilers are themselves software written in high-level languages, and some of them are written in the very source languages they are supposed to compile. How can this self-compilation be achieved? It is done by a process called bootstrapping. We take Pascal as an example to illustrate the bootstrapping process. Refer to Table 1, and suppose there are machines X, Y and Z; any two of them could be the same or different.

     Source code of     Target    Compiler used                            Object code
     Pascal compiler    machine
(1)  Modula-2           Y         in X's assembly language, running on X   X's object code
(2)  Pascal             Z         in Modula-2, running on X                Y's object code
(3)  Pascal             any       in Pascal, running on Y                  Z's object code

Table 1. Bootstrapping

In compilation process (1), the source code of a Pascal compiler for Y was written in Modula-2. This code was then compiled by a Modula-2 compiler written in X's assembly language, and was translated into X's object code. Once this new compiler existed, the source code of a Pascal compiler for Z, written in Pascal, could be passed to it and translated into Y's object code, as in (2). Further, the source code of another Pascal compiler, written in Pascal for an arbitrary machine, could be passed to the newest compiler and translated into Z's object code, as in (3). Note that a compiler compiles its input source code into object code for its target machine, while the compiler itself is object code for the machine it runs on. This machine does not have to be its target machine. [27]
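The bootstrapping chain of Table 1 can be checked mechanically. In this hypothetical Java model, a compiler is described by the language it compiles, the machine it targets and the machine it runs on; compiling a compiler's source code with an existing compiler yields a new compiler that runs on the existing compiler's target machine. The record names are our own, and W stands for the "any machine" of step (3).

```java
// Hypothetical model of the bootstrapping steps in Table 1.
class Bootstrap {
    // A compiler binary: the language it reads, the machine it targets, the machine it runs on.
    record Compiler(String sourceLang, String targetMachine, String runsOn) {}

    // Compiler source code: the language it is written in, plus what the finished
    // compiler will compile and which machine it will target.
    record CompilerSource(String writtenIn, String compiles, String targets) {}

    // Compiling compiler source with an existing compiler produces object code for the
    // existing compiler's target machine, so the new compiler runs there.
    static Compiler compile(CompilerSource src, Compiler with) {
        if (!src.writtenIn().equals(with.sourceLang()))
            throw new IllegalArgumentException("compiler cannot read " + src.writtenIn());
        return new Compiler(src.compiles(), src.targets(), with.targetMachine());
    }

    public static void main(String[] args) {
        // Starting point: a Modula-2 compiler written in X's assembly language, running on X.
        Compiler m2 = new Compiler("Modula-2", "X", "X");
        Compiler c1 = compile(new CompilerSource("Modula-2", "Pascal", "Y"), m2); // step (1)
        Compiler c2 = compile(new CompilerSource("Pascal", "Pascal", "Z"), c1);   // step (2)
        Compiler c3 = compile(new CompilerSource("Pascal", "Pascal", "W"), c2);   // step (3)
        System.out.println(c1); // Compiler[sourceLang=Pascal, targetMachine=Y, runsOn=X]
        System.out.println(c2); // Compiler[sourceLang=Pascal, targetMachine=Z, runsOn=Y]
        System.out.println(c3); // Compiler[sourceLang=Pascal, targetMachine=W, runsOn=Z]
    }
}
```

Each printed compiler matches the "object code" column of Table 1: the result of step (n) runs on the machine targeted by the compiler used in that step.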

Compiler and Interpreter

We call languages interpreted if their implementations use an interpreter in the compilation process. So, what is an interpreter, by definition?

A translator takes a program written in a source language as input and translates it into a program having the same meaning but written in an object language. If the source language is a higher-level one, the translator is a compiler. Generally, a compiler generates machine code or abstract machine code from source code.

An interpreter directly executes its source language, without first translating it into an object language. Some Lisp or APL implementations could be considered pure interpreters. But many language implementations consist of both a compiler and an interpreter. The former translates the source language into an interpretable intermediate language; in this case, the intermediate language is the source language for the interpreter. [2]
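As an illustration of a pure interpreter in the Lisp style, the following Java sketch reads prefix expressions such as (+ 1 (* 2 3)) and computes the result while reading them, without producing any object or intermediate code. It supports only + and * on integers and is a toy, not a model of any real Lisp or APL system.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Toy "pure" interpreter: evaluates prefix expressions directly from their tokens.
class PureInterpreter {
    // Split the source into tokens, then evaluate immediately.
    static long eval(String src) {
        Deque<String> tokens = new ArrayDeque<>(Arrays.asList(
                src.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+")));
        return evalNext(tokens);
    }

    private static long evalNext(Deque<String> tokens) {
        String tok = tokens.poll();
        if (!"(".equals(tok)) return Long.parseLong(tok);  // a number evaluates to itself
        String op = tokens.poll();
        if (!"+".equals(op) && !"*".equals(op))
            throw new IllegalArgumentException("unsupported operator " + op);
        long acc = op.equals("+") ? 0 : 1;                  // identity element of the operator
        while (!")".equals(tokens.peek())) {
            long v = evalNext(tokens);                      // evaluate each argument in turn
            acc = op.equals("+") ? acc + v : acc * v;
        }
        tokens.poll();                                      // consume ")"
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(eval("(+ 1 (* 2 3))")); // 7
    }
}
```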

With an intermediate language and an interpreter, the compilation process becomes more sophisticated, typically as in Figure 2. The semantic analysis phase is often followed by another process that takes the parse tree from the syntax analyzer and produces a linear sequence of instructions equivalent to the original source program. [27] This sequence of instructions can be considered abstract machine code, since it is targeted not at an actual machine but at an abstraction of real machines. This abstraction is often called an abstract machine or virtual machine.
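The extra phase that turns a tree into a linear instruction sequence can be sketched as a post-order walk: emit code for the operands first, then for the operator that consumes them. The instruction names PUSH, LOAD, ADD and MUL and the tree types are hypothetical, chosen to resemble a simple stack-based abstract machine.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: flattening an expression tree into linear code for a hypothetical stack machine.
class StackCodeGen {
    interface Expr {}
    record Num(int value) implements Expr {}
    record Var(String name) implements Expr {}
    record BinOp(char op, Expr left, Expr right) implements Expr {}

    // Post-order traversal: operands first, then the operator.
    static void gen(Expr e, List<String> out) {
        if (e instanceof Num n) {
            out.add("PUSH " + n.value());
        } else if (e instanceof Var v) {
            out.add("LOAD " + v.name());
        } else if (e instanceof BinOp b) {
            gen(b.left(), out);
            gen(b.right(), out);
            out.add(switch (b.op()) {
                case '+' -> "ADD";
                case '*' -> "MUL";
                default -> throw new IllegalArgumentException("unknown operator " + b.op());
            });
        }
    }

    public static void main(String[] args) {
        // a * (b + 2)
        Expr tree = new BinOp('*', new Var("a"), new BinOp('+', new Var("b"), new Num(2)));
        List<String> code = new ArrayList<>();
        gen(tree, code);
        System.out.println(code); // [LOAD a, LOAD b, PUSH 2, ADD, MUL]
    }
}
```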

Intermediate Language





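The intermediate language, which occurs between two phases of a language translation process, is the object code of the first phase and the source language of the second phase. It is very important for modern interpreted languages such as Java, and plays a great role in a language's portability and security.

Figure 2. Compilation with an intermediate language: source program -> lexical analyzer -> syntax analyzer -> semantic analyzer -> abstract machine code generator (front end) -> abstract machine code (= intermediate language) -> interpreter, or on-the-fly native machine code generator -> native machine code in memory (back end)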

With an intermediate language, the problems arising from the characteristics of the target hardware can be confined to the code generator, so the front end of the compiler can be reused with different code generators for different machines. The compiler can also be easily ported to a different machine, for example by bootstrapping, since only the code generator needs to be ported. If we substitute an interpreter for the code generator in the back end of the compilation process, implementing the language on new hardware becomes easier still, since an interpreter is much easier to implement than a code generator. We will see this in Java's implementation (section 5 a).

A minor disadvantage of an intermediate language is that it is sometimes somewhat harder to generate optimized machine code from the intermediate language than directly from the parse tree. [27]

An intermediate language is usually designed for a particular source language. It reflects the constructs, data types and operators of the source language in its basic operations. For example, the P-machine is a hypothetical stack-based virtual machine with a very simple structure, and P-code contains many instructions closely related to Pascal's language constructs. Designing an intermediate code for several different source languages, however, is hard. Such a generic intermediate language, called UNCOL (UNiversal Computer-Oriented Language), was proposed in the late 1950s, but it was never developed because of practical difficulties. Recently, people have been reconsidering generic intermediate languages, and some such efforts are ongoing. We will address this in the subsection on virtual machines.

Intermediate languages represent internal interfaces in the compilation process, and consequently they can take any suitable form: trees, triples, quadruples, assembly languages, bytecode, etc. Pascal's P-code is a famous intermediate language in assembly-language form. [2]

Just-in-time and On-the-fly

As in Figure 2, an intermediate language can be interpreted directly by an interpreter, or sometimes compiled again by a native compiler or code generator into native machine code. This native compilation is often just-in-time compilation: the native compiler rewrites computation-intensive sections into native machine code at run time as necessary, and the native machine code exists not in the disk file system but directly in memory. So the compilation of an intermediate language into native machine code is often an on-the-fly compilation, which by definition means that the output of the compiler does not exist in the disk file system but is loaded into memory portion by portion. The input of on-the-fly compilation can be either high-level source code or intermediate code, and the output can be either intermediate code or native machine code. When the output is intermediate code such as bytecode, an interpreter is needed to interpret the bytecode, which is usually cached in memory.
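A toy model of just-in-time compilation in Java follows. The threshold of three calls, the operation names and the use of a composed Java function in place of native machine code are all our own assumptions; the point is only that a method is interpreted until it proves "hot", and is then translated once into an in-memory form that is never written to disk.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntUnaryOperator;

// Toy model of just-in-time compilation (hypothetical names and threshold).
class JitSketch {
    static final int HOT_THRESHOLD = 3;
    static final Map<String, Integer> callCounts = new HashMap<>();
    static final Map<String, IntUnaryOperator> codeCache = new HashMap<>(); // memory only

    // Slow path: decode every operation on every call.
    static int interpret(String[] ops, int x) {
        for (String op : ops) {
            switch (op) {
                case "inc" -> x = x + 1;
                case "dbl" -> x = x * 2;
                default -> throw new IllegalArgumentException("unknown op " + op);
            }
        }
        return x;
    }

    static IntUnaryOperator opFunction(String op) {
        if (op.equals("inc")) return v -> v + 1;
        if (op.equals("dbl")) return v -> v * 2;
        throw new IllegalArgumentException("unknown op " + op);
    }

    // "Compile" once: decode the operations a single time into one composed function.
    static IntUnaryOperator compile(String[] ops) {
        IntUnaryOperator f = IntUnaryOperator.identity();
        for (String op : ops) f = f.andThen(opFunction(op));
        return f;
    }

    static int call(String name, String[] ops, int x) {
        IntUnaryOperator compiled = codeCache.get(name);
        if (compiled != null) return compiled.applyAsInt(x);                 // fast path
        int count = callCounts.merge(name, 1, Integer::sum);
        if (count >= HOT_THRESHOLD) codeCache.put(name, compile(ops));       // method is hot
        return interpret(ops, x);
    }

    public static void main(String[] args) {
        String[] ops = {"inc", "dbl"};
        for (int i = 0; i < 5; i++) System.out.print(call("f", ops, i) + " "); // 2 4 6 8 10
        System.out.println();
        System.out.println(codeCache.containsKey("f"));                       // true
    }
}
```

The first three calls are interpreted; from the fourth call on, only the compiled form is used.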

Virtual Machine

As mentioned above, a virtual machine is an abstraction of a family of real machines. More precisely, a virtual machine is a fictitious target machine for an intermediate language; it specifies a somewhat idealized machine chosen for convenience, either to make a simple-minded compiler easier to write or to be closer to most real machines.

Most computers now have a set of general-purpose registers. Usually, operations take one of their operands from a register and the other from memory, and in most cases only some of the registers are used for addressing. [2] So a register-based virtual machine is closer to most real machines, and emulating it on a real machine needs fewer native machine instructions and thus has less overhead.

However, most virtual machines today are stack-based. A stack machine has few actual registers, but has an operand stack where operations find their operands and put their results. The advantage of stack machines is that they can be totally independent of any particular computer [2], and compilation for a stack machine is relatively simple.
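To make the stack-versus-register trade-off concrete, the following sketch generates code for (a + b) * (c - d) for two hypothetical machines. Stack code falls out of a simple post-order walk with no register allocation, but needs seven instructions; three-address register code needs only three, each closer to a real CPU instruction. The instruction syntax is invented for illustration, and the register version naively allocates a fresh register for every intermediate result.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch contrasting code for a hypothetical stack machine and register machine.
class StackVsRegister {
    interface Expr {}
    record Var(String name) implements Expr {}
    record BinOp(String op, Expr left, Expr right) implements Expr {}

    // Stack machine: operands are implicit, taken from the top of the operand stack.
    static void stackCode(Expr e, List<String> out) {
        if (e instanceof Var v) {
            out.add("push " + v.name());
        } else if (e instanceof BinOp b) {
            stackCode(b.left(), out);
            stackCode(b.right(), out);
            out.add(b.op());
        }
    }

    // Register machine: each instruction names its destination and both operands.
    // Returns the name of the location that holds the value of e.
    static String registerCode(Expr e, List<String> out, int[] nextReg) {
        if (e instanceof Var v) return v.name();          // variables are used in place
        BinOp b = (BinOp) e;
        String l = registerCode(b.left(), out, nextReg);
        String r = registerCode(b.right(), out, nextReg);
        String dst = "r" + nextReg[0]++;                  // naive: fresh register per result
        out.add(b.op() + " " + dst + ", " + l + ", " + r);
        return dst;
    }

    public static void main(String[] args) {
        // (a + b) * (c - d)
        Expr e = new BinOp("mul", new BinOp("add", new Var("a"), new Var("b")),
                                  new BinOp("sub", new Var("c"), new Var("d")));
        List<String> stack = new ArrayList<>();
        stackCode(e, stack);
        List<String> regs = new ArrayList<>();
        registerCode(e, regs, new int[] {1});
        System.out.println(String.join("; ", stack)); // push a; push b; add; push c; push d; sub; mul
        System.out.println(String.join("; ", regs));  // add r1, a, b; sub r2, c, d; mul r3, r1, r2
    }
}
```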

Again, the usefulness of a virtual machine stems from the fact that it allows the majority of the compilation process to be isolated from dependency on a specific machine.

Examples of intermediate languages and related abstract machines [2]

P-code and related abstract machine: The abstract machine associated with Pascal-P is very

conventional and flexible. It is a stack machine with five registers: top of stack, base of global

variables, top of heap used for dynamic variables, base of local variables and instruction

counter.

EM-1 intermediate language and related machine: EM-1 is a more sophisticated language than P-code and is closer to actual assembly language. It contains 130 instructions, whereas P-code has only 60, and it also contains a dozen pseudo-instructions. The EM-1 machine, however, is very similar to the P-code machine, with a stack of local variable areas whose top is used as an execution stack, a heap, a global variable area, and a program area.

Janus and its abstract machine: The abstract machine associated with the intermediate language Janus has a memory divided into several independent areas organized as tree structures. It uses a stack for expression evaluation, a processing unit to execute Janus instructions, and three specialized registers: condition code, instruction counter, and index register.

We will address two other important virtual machines, the Java Virtual Machine and Omniware, in the case study section of this paper.

Abstract Machine to Actual Machine

To obtain an actual implementation, the abstract machine must then be transported onto an actual machine. Generally, it is the interpreter that executes the abstract machine's instruction set on the actual machine and thereby gives the abstract machine an actual implementation. The two general ways of implementing the interpreter are:

1. If the intermediate language resembles an assembly language, like Pascal's P-code, the basic operations of the abstract machine can be implemented using a macroprocessor. But if the macroprocessor is not a very powerful one, the resulting code is usually rather inefficient.

2. An interpreter can be programmed or microprogrammed, which amounts to direct execution of the abstract machine's instruction set, like Java's interpreter. [2]
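The second approach, direct execution of the abstract machine's instruction set, is usually structured as a fetch-decode-execute loop. Below is a minimal Java sketch for a hypothetical stack machine with four opcodes; the opcode numbering and the fixed 16-slot stack are illustrative assumptions, and there is no checking for stack overflow or underflow.

```java
// Sketch of a programmed interpreter for a hypothetical stack-based abstract machine.
class DispatchLoop {
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0;   // stack pointer: index of the next free slot
        int pc = 0;   // instruction counter
        while (true) {
            int opcode = code[pc++];                        // fetch
            switch (opcode) {                               // decode and execute
                case PUSH -> stack[sp++] = code[pc++];      // operand follows the opcode
                case ADD -> { sp--; stack[sp - 1] += stack[sp]; }
                case MUL -> { sp--; stack[sp - 1] *= stack[sp]; }
                case HALT -> { return stack[sp - 1]; }      // result is on top of the stack
                default -> throw new IllegalStateException("bad opcode " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // 20
    }
}
```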

For performance reasons, an interpreter is not the only way to map an abstract machine onto an actual machine: the intermediate code can be recompiled, usually on the fly, into the target machine's native code.

Portability of Interpreters

The functionality of the front end of a language's compilation process is identical for different target machines. With the self-compilation technique, the implementation of the front end can be made totally portable to various platforms, as with Java's compiler javac. The non-portable part of a language's implementation is the interpreter or code generator.

Like a code generator, an interpreter has to make use of the operating system facilities of the target machine: performing input and output, using graphics or window systems, making storage allocation requests, etc. That is, an interpreter has to deal with the run-time library, which is a convenient way of providing an interface between the compiled (abstract) machine code and the operating system. It includes a set of routines that can be called by the compiled code to perform all the machine- and operating-system-dependent functions required by the user's high-level language program. It is possible to write part of the run-time library in a high-level language, as Java's API is written in Java, but at least part of the run-time library will have to be written in a low-level language to make use of particular machine and operating system facilities.

4. Scripting Languages and Interpreted Languages

Scripting languages are good at providing variables, flow control and procedures for commands, and at serving as glue between commands.

Scripting languages are all interpreted. UNIX shell languages are simple scripting languages; they are interpreted directly, with no intermediate languages or virtual machines involved. The interpreter for a UNIX shell language is just a single executable, such as /usr/local/bin/sh for the Bourne shell or /usr/local/bin/ksh for the Korn shell.

An interpreter that interprets a high-level language directly has to include the lexical and syntax analysis phases of the front end of the compilation process. But most directly interpreted high-level languages, such as the UNIX shell programming languages, are simple enough that their interpreters can still be kept simple.

But modern scripting languages such as Perl are no longer interpreted directly. The design goal of Perl is to make a scripting language that is easy to develop in and portable, so Perl's implementation is two-phased: Perl is both compiled and interpreted. It is compiled because the program is completely read and parsed before the first statement is executed. It is interpreted because there is no object code sitting around filling up disk space. In some ways it is the best of both worlds, a typical on-the-fly compiler-interpreter process.

The compilation does take time: it is inefficient to have a voluminous Perl program that does one small quick task and then exits, because the program's run time will be dwarfed by its compile time. But this approach is more efficient for heavy tasks, such as those with a large loop body, because compilation saves the time spent reparsing. That is why Tcl, another directly interpreted scripting language, is switching to the on-the-fly bytecode compile-and-interpret style of Perl. To take more advantage of this style, both Perl and Tcl cache the compiled code between invocations. We will address Tcl's on-the-fly bytecode compilation in the Tcl case study.
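The saving from compiling once can be shown with a small Java experiment. The mini-language (statements such as add 1 and mul 2 separated by semicolons), the class name and the cache keyed by script text are hypothetical; the sketch simply counts how often the same loop body is parsed with and without caching.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical experiment: reparsing a script on every run versus compiling it once.
class CompileOnce {
    record Instr(String op, int arg) {}

    static int parses = 0;
    static final Map<String, List<Instr>> cache = new HashMap<>();

    // "Compile" a script into a list of instructions (our stand-in for bytecode).
    static List<Instr> compile(String script) {
        parses++;
        List<Instr> code = new ArrayList<>();
        for (String stmt : script.split(";")) {
            String[] parts = stmt.trim().split("\\s+");
            code.add(new Instr(parts[0], Integer.parseInt(parts[1])));
        }
        return code;
    }

    static int execute(List<Instr> code, int acc) {
        for (Instr in : code) {
            acc = switch (in.op()) {
                case "add" -> acc + in.arg();
                case "mul" -> acc * in.arg();
                default -> throw new IllegalArgumentException("unknown op " + in.op());
            };
        }
        return acc;
    }

    // Direct interpretation: the source text is reparsed every time it runs.
    static int runReparsing(String script, int acc) { return execute(compile(script), acc); }

    // Compile-and-cache: parse once, then reuse the cached instructions.
    static int runCached(String script, int acc) {
        return execute(cache.computeIfAbsent(script, CompileOnce::compile), acc);
    }

    public static void main(String[] args) {
        String body = "add 1; mul 2";
        for (int n = 0; n < 1000; n++) runReparsing(body, n);
        System.out.println("parses with reparsing: " + parses); // 1000
        parses = 0;
        for (int n = 0; n < 1000; n++) runCached(body, n);
        System.out.println("parses with caching: " + parses);   // 1
    }
}
```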

The on-the-fly compilation of Tcl or Perl differs from Java's on-the-fly compilation, since it is on-the-fly bytecode compilation: the whole compile-and-interpret process happens at run time, and the target of the on-the-fly compilation is bytecode that is cached in memory and then interpreted dynamically.

While all scripting languages are interpreted, not all interpreted languages are scripting languages. An example is the popular page description language PostScript. PostScript is typically interpreted and stack-based. The stack-based design makes PostScript source code natural to interpret and portable. It also makes PostScript device independent, meaning that an image is described without reference to any specific device features. So PostScript files can be transferred in source form from machine to machine, even by email in ASCII form, and then be interpreted without any modification by an interpreter such as Ghostscript (used by the ghostview viewer) or one built into a printer.
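The following Java sketch shows why stack-based source code is natural to interpret: each token is executed the moment it is read, with numbers pushed onto the operand stack and operators popping their arguments. Only integers and the PostScript operators add, sub and mul are handled; a real PostScript interpreter does vastly more.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy interpreter for a tiny PostScript-like subset: integers, add, sub, mul.
class PostfixInterpreter {
    static int run(String program) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (String tok : program.trim().split("\\s+")) {
            switch (tok) {
                case "add" -> stack.push(stack.pop() + stack.pop());
                case "mul" -> stack.push(stack.pop() * stack.pop());
                case "sub" -> { int b = stack.pop(); int a = stack.pop(); stack.push(a - b); }
                default -> stack.push(Integer.parseInt(tok)); // an operand: push it
            }
        }
        return stack.pop(); // the result is left on top of the stack
    }

    public static void main(String[] args) {
        System.out.println(run("3 4 add 2 mul")); // 14
        System.out.println(run("10 4 sub"));      // 6
    }
}
```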

5. Case Study

a) Java

Overview

Java is a simple, object-oriented language that is familiar to users, because its syntax is very similar to that of C and C++, while it is a cleaned-up version of C++. It supports garbage collection and removes a number of features that make C and C++ complex, such as pointers, automatic coercions, operator overloading and multiple inheritance.

Other important aspects of Java's success are its Internet-related features:

Dynamic: In Java, classes are linked only as needed. New code modules can be linked in on demand from a variety of sources, even from sources across a network. Instead of simply downloading static pages of text and images, users can download Java applets through a Web browser and run them on the client machine. This supports animation and real-time interaction between user and program.

Threaded: Modern network-based applications, such as the HotJava Web browser, typically need to do several things at the same time. A user can run several animations concurrently while downloading an image and scrolling the page. Java's multithreading capability provides the means to support this. [11]

Java is a popular mobile language because it is architecture-neutral and portable. To accommodate the diversity of operating environments, the Java compiler generates bytecode, an architecture-neutral intermediate format designed to transport code efficiently to multiple hardware and software platforms. The interpreted nature of Java solves both the binary distribution problem and the version problem; the same Java bytecode will run on any platform.

Java's portability also relies on its basic data types and the behavior of its arithmetic operators, which are specified identically for every platform, so programs behave the same everywhere. There are no data type incompatibilities across hardware and software architectures.
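A small Java program illustrates this: the sizes of the primitive types and the results of integer overflow and integer division are fixed by the Java Language Specification, so the method below returns the same string on every platform. By contrast, in C the width of int is implementation-defined and signed overflow is undefined behavior.

```java
// Values below are fixed by the Java Language Specification, independent of hardware.
class PortableTypes {
    static String facts() {
        int big = Integer.MAX_VALUE;
        return Integer.SIZE + " " + Long.SIZE     // int is always 32 bits, long always 64
                + " " + (big + 1)                 // int overflow always wraps around
                + " " + (-7 / 2) + " " + (-7 % 2); // integer division rounds toward zero
    }

    public static void main(String[] args) {
        System.out.println(facts()); // 32 64 -2147483648 -3 -1
    }
}
```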





The self-compilation feature of Java also makes Java more portable. Java's compiler is written in Java and exists as Java bytecode. Furthermore, the Java API and the HotJava browser all exist as bytecode. In the end, only the interpreter in the Java system is left dependent on the underlying platform.

Java Virtual Machine

The architecture-neutral and portable platform of Java is the Java Virtual Machine. It is the specification of an abstract machine for which the Java compiler can generate code. Specific implementations of the Java Virtual Machine for specific hardware and software platforms then provide the concrete realization of the virtual machine. The Java Virtual Machine is based primarily on the POSIX interface specification, an industry-standard definition of a portable system interface. Implementing the Java Virtual Machine on new architectures is a relatively straightforward task, as long as the target platform meets basic requirements such as support for multithreading. [11]

The Java VM has been called a "soft CPU". It is a stack-based machine. The JVM supports about 248 bytecodes, each performing a basic CPU-like operation such as adding two integers, jumping to a subroutine, storing a result, or incrementing or decrementing a value. In effect, the JVM is a stack-based arithmetic logic unit with local and global variables.

To add two numbers, the VM first pushes them onto its stack and then adds them. After completing the addition, the VM leaves the result on the stack for the next step in the process. Emulating this on a real machine, most probably a register-based one, takes quite a few real machine instructions and memory references, so there is overhead in mapping the stack-based VM onto a register-based real machine. We will address this further in the Performance part of section 6.
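For the method static int add(int a, int b) { return a + b; }, javac emits the four instructions iload_0, iload_1, iadd, ireturn (they can be listed with javap -c). The Java sketch below mimics those instructions with an explicit operand stack and local-variable array; it is a toy model of the VM's behavior, not of how a production JVM is built.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the JVM executing the bytecode of: static int add(int a, int b)
class JvmAddSketch {
    static int execute(String[] bytecode, int[] locals) {
        Deque<Integer> operandStack = new ArrayDeque<>();
        for (String insn : bytecode) {
            switch (insn) {
                case "iload_0" -> operandStack.push(locals[0]);   // push local 0 (a)
                case "iload_1" -> operandStack.push(locals[1]);   // push local 1 (b)
                case "iadd" -> operandStack.push(operandStack.pop() + operandStack.pop());
                case "ireturn" -> { return operandStack.pop(); }  // result is on the stack
                default -> throw new IllegalArgumentException("unsupported " + insn);
            }
        }
        throw new IllegalStateException("method ended without ireturn");
    }

    public static void main(String[] args) {
        String[] add = {"iload_0", "iload_1", "iadd", "ireturn"};
        System.out.println(execute(add, new int[] {2, 3})); // 5
    }
}
```

On a register-based CPU, each of these pushes and pops becomes real memory or register traffic, which is the overhead described above.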

At the beginning, the Java Virtual Machine was the target machine for the Java source language only. Recently, people have been trying to support other languages on top of the same Java Virtual Machine. According to Java's creator James Gosling, languages like Visual Basic, COBOL, Dylan and Scheme are fairly reasonable bets for the Java VM. So, although the JVM was not designed as a generic virtual machine, it is now intended to serve as one for existing requirements. [28]

Java Language Construct and Java's Interpreter

A Java programmer can create:

Applets: Programs that are included in HTML pages through the APP tag and displayed in the HotJava browser. The simple hello world program shown in "A Simple Java Program" is an applet. The HotJava browser is invoked by the hotjava command included in the Java code distribution.

Applications: Stand-alone programs written in Java and executed independently; the HotJava browser itself is one. They are run using the Java interpreter, java, included in the Java code distribution.





Protocol handlers: Programs that are loaded into the user's HotJava browser and interpret a protocol. These protocols include standard ones such as HTTP as well as programmer-defined protocols.

Content handlers: Programs loaded into the user's HotJava browser that interpret files of a type defined by the Java programmer. The Java programmer provides the necessary code for the user's HotJava browser to display or interpret this special format.

Native methods: Methods that are declared in a Java class but implemented in C. These

native methods essentially allow a Java programmer to access C code from Java. [10]

There is another tool in the JDK, called AppletViewer, for testing and running applets. AppletViewer also has the Java interpreter embedded.

A Java interpreter is plugged into every Java-enabled Web browser. A practical way to understand the technical description of Java is to look at the processes that occur when a user with a Java-enabled browser requests a page containing a Java applet:

1. The user sends a request for an HTML document to the information provider's server.

2. The HTML document is returned to the user's browser. The document contains the APP tag, which identifies the applet.

3. The corresponding applet bytecode is transferred to the user's host. This bytecode was previously created by the Java compiler from the Java source code for that applet.

4. The Java-enabled browser on the user's host interprets the bytecode and provides the display.

5. The user may interact further with the applet with no further downloading from the provider's Web server, because the bytecode contains all the information necessary to interpret the applet.

b) Tcl/Tk

Overview

Tcl stands for Tool Command Language, an extensible, embeddable command language or scripting language implemented by John Ousterhout, originally at the University of California, Berkeley, and now working for Sun.

What sets Tcl apart from other scripting languages is how easily a Tcl interpreter can be added to an application. A Tcl interpreter consists of a set of commands, a set of variable bindings and a command execution state, and it is the basic unit manipulated by most Tcl library procedures. An application may have one or more interpreters, depending on its complexity, and different interpreters may serve different purposes. Tcl commands may be built-in commands, such as the flow-control commands for, if, case and eval, or application-specific commands defined by users. There is no limit to how far developers and users can extend the set of application-specific commands. Because programmers can structure an application from primitive operations, existing commands and new commands of their own that best suit their needs, there is no need to invent a new command language for each new application. An application embeds Tcl by calling library procedures to create one or more interpreter objects and to register its commands with them, much as one declares an extern function in C. It is equally natural to create an interpreter from within Tcl code, since an interpreter is itself just a set of commands. This differs from languages such as Java, where the interpreter is a separate executable whose own execution consumes memory and CPU time alongside the interpretation of the bytecode.
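
To make the pieces just described concrete (a command table, variable bindings, and commands registered by the application next to the built-ins), here is a minimal Python sketch rather than Tcl's actual C interface. `MiniInterp`, its methods and the `greet` command are hypothetical, and the whitespace-splitting parser stands in for Tcl's much richer quoting and substitution rules. In real Tcl, an application creates an interpreter with the C library procedure `Tcl_CreateInterp` and adds commands with `Tcl_CreateCommand`.

```python
class MiniInterp:
    """A toy Tcl-like interpreter: a command table plus variable bindings."""

    def __init__(self):
        self.commands = {}   # command name -> Python callable
        self.variables = {}  # variable name -> string value
        self.output = []     # text written by the built-in puts command
        # Built-ins are registered exactly the way an application registers its own.
        self.register("set", MiniInterp._cmd_set)
        self.register("puts", MiniInterp._cmd_puts)

    def register(self, name, func):
        """Add or replace a command; func is called as func(interp, *args)."""
        self.commands[name] = func

    def eval(self, script):
        """Run one command per line; words are separated by whitespace."""
        result = ""
        for line in script.strip().splitlines():
            words = line.split()
            if not words:
                continue
            # Very simplified substitution: a word "$name" becomes the variable's value.
            words = [self.variables[w[1:]] if w.startswith("$") else w for w in words]
            name, args = words[0], words[1:]
            if name not in self.commands:
                raise NameError(f'invalid command name "{name}"')
            result = self.commands[name](self, *args)
        return result

    @staticmethod
    def _cmd_set(interp, name, value=None):
        if value is not None:
            interp.variables[name] = value
        return interp.variables[name]

    @staticmethod
    def _cmd_puts(interp, *args):
        interp.output.append(" ".join(args))
        return ""


def greet(interp, who):
    """A hypothetical application-specific command."""
    return f"hello, {who}"


interp = MiniInterp()
interp.register("greet", greet)
interp.eval("set user Ousterhout")
print(interp.eval("greet $user"))  # hello, Ousterhout
```

Because `greet` lives in the same table as `set` and `puts`, a script cannot tell an application-specific command from a built-in one.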

The aspects that set Tcl apart from other extension languages, such as Scheme, Elisp and Python, are: (1) Tcl has simple, somewhat C-like constructs, and its primitives are written as C or C++ procedures; (2) the Tcl C library provides a clean interface to native C code; (3) many extensions add new functionality, such as socket access for network programming, database access, telephone control and Expect-style control of interactive programs; (4) Tcl is open to development by its community. [21]

The most notable extension of Tcl is Tk, a toolkit for the X Window System as well as for Microsoft Windows and the Macintosh. Tk gives users a convenient way to build Motif-style GUIs, because it offers a higher-level interface to X and rapid turnaround during development.

Safe-Tcl is a subset of Tcl in which access to system resources is controlled. Because of this added security, Safe-Tcl is intended for running network agents. By combining Tk and Safe-Tcl, a Web browser called TkWWW has been built and is now freely available. [12]

On-the-fly Bytecode compiler for Tcl [19]

Although Tcl has many advantages as a new scripting language, its lack of structure and its slowness make it a poor choice for large applications. To improve Tcl's performance, researchers at Sun Microsystems Laboratories are working on an on-the-fly bytecode compiler for Tcl. The following passages are quoted directly from the paper "An On-the-fly Bytecode Compiler for Tcl" by Brian T. Lewis of that lab:

So far Tcl is interpreted directly. Although the current Tcl interpreter is fast enough for most Tcl uses, there are many applications that need greater speed. The two main performance problems in the current Tcl system (Tcl 7.5) are script reparsing and conversions between strings and other data representations. The current interpreter spends as much as 50% of its time in parsing. It reparses the body of a loop, for example, on each iteration. Data conversions also consume a great deal of time. It is reported that 92% of the time in incr's command procedure Tcl_incrCmd() was spent converting between strings and integers.

To solve these performance problems, a new Tcl compiler and interpreter are being developed at Sun Microsystems Laboratories. Their goal for the bytecode compiler is to improve the speed of compute-intensive Tcl scripts by a factor of 10.

The compiler translates Tcl scripts at program runtime, or on-the-fly, into a sequence of bytecode instructions that are then interpreted. The compiler eliminates most runtime script parsing. It also makes many decisions at compile time that are now made only at runtime. It can tell, for example, whether a variable name refers to a scalar or an array element. It also compiles away many type conversions. As an example, it can recognize whether the argument string specifying the increment amount in an incr command represents a constant integer.
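
The benefit of parsing once can be illustrated with a small Python sketch, assuming a toy script made only of `incr` commands. `run_direct` reparses the loop body and converts strings to integers on every iteration, the behavior the quoted passage attributes to Tcl 7.5; `compile_body` parses each command once and records the increment as an already-converted integer constant. The instruction name `INCR_CONST` and all function names are invented for illustration and do not describe the real Tcl bytecode format.

```python
import re

parse_count = 0  # how many times a command string has been parsed


def parse(command):
    """Split a command into words, counting each call."""
    global parse_count
    parse_count += 1
    return command.split()


def run_direct(body, iterations, variables):
    """Direct interpretation: reparse every command on every iteration."""
    for _ in range(iterations):
        for command in body.splitlines():
            name, var, amount = parse(command)
            if name != "incr":
                raise ValueError(f"unsupported command: {command}")
            # Values are kept as strings, so each incr converts in both directions.
            variables[var] = str(int(variables[var]) + int(amount))


def compile_body(body):
    """On-the-fly compilation: parse once and emit bytecode tuples."""
    code = []
    for command in body.splitlines():
        name, var, amount = parse(command)
        if name != "incr" or not re.fullmatch(r"-?\d+", amount):
            raise ValueError(f"unsupported command: {command}")
        # The increment is recognized as a constant integer at compile time.
        code.append(("INCR_CONST", var, int(amount)))
    return code


def run_compiled(code, iterations, variables):
    """Interpret the bytecode: no parsing and no string conversion in the loop."""
    for _ in range(iterations):
        for op, var, amount in code:
            if op == "INCR_CONST":
                variables[var] += amount  # variables hold integers here


body = "incr x 1\nincr y 2"

direct_vars = {"x": "0", "y": "0"}
run_direct(body, 1000, direct_vars)
direct_parses = parse_count      # 2 commands x 1000 iterations

parse_count = 0
compiled_vars = {"x": 0, "y": 0}
run_compiled(compile_body(body), 1000, compiled_vars)
compiled_parses = parse_count    # each command parsed once

print(direct_parses, compiled_parses)  # 2000 2
```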

The bytecode interpreter uses dual-ported objects extensively. These objects contain both a string and an internal representation appropriate for some data type. For example, a Tcl list is now represented as an object that holds the list's string representation as well as an array of pointers to the objects for each list element. Dual-ported objects avoid most runtime type conversions. They also improve the speed of many operations, since an appropriate representation is already available. The compiler itself uses dual-ported objects to cache the bytecode resulting from the compilation of each script.
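
A minimal Python sketch of the dual-ported idea, assuming a list of integers as the only internal type: the object keeps whichever representation it already has, converts to the other only on demand, and caches the result until a modification makes it stale. `DualObject` and its methods are hypothetical names, not the data structure used in Sun's implementation.

```python
class DualObject:
    """Holds a string form and/or an internal form (here: a list of ints)."""

    def __init__(self, string=None, items=None):
        if string is None and items is None:
            raise ValueError("need at least one representation")
        self._string = string
        self._items = items
        self.conversions = 0  # number of representation conversions performed

    def as_list(self):
        """Return the internal form, parsing the string only if necessary."""
        if self._items is None:
            self._items = [int(word) for word in self._string.split()]
            self.conversions += 1
        return self._items

    def as_string(self):
        """Return the string form, regenerating it only if it is stale."""
        if self._string is None:
            self._string = " ".join(str(item) for item in self._items)
            self.conversions += 1
        return self._string

    def append(self, item):
        """Modify the internal form and invalidate the cached string."""
        self.as_list().append(item)
        self._string = None


numbers = DualObject(string="1 2 3")
total = 0
for _ in range(100):
    total += sum(numbers.as_list())  # the string is parsed only on the first pass
numbers.append(4)
print(numbers.as_string(), numbers.conversions)  # 1 2 3 4 2
```

One hundred list operations cost a single parse, and the string is rebuilt only once, when it is actually requested after the change.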

c) Both are Web Programming Languages

As mentioned in part a) of this section, Java is an Internet-oriented language. Tcl/Tk is also closely related to Web programming: Sun has recently released a Tcl/Tk plug-in for Netscape Navigator, which allows Web pages to contain Tcl/Tk scripts that display interfaces in the browser window. The plug-in uses the Safe-Tcl mechanism to ensure that even untrusted scripts can be executed safely.

So what's the difference between Java and Tcl/Tk?

Tcl is a high-level scripting language. It is good for creating small and medium-sized applications quickly and for gluing existing things together. It has a simple syntax and almost no structure, which makes it good for scripting. However, at least so far, Tcl is a directly interpreted language, so it may not perform well for very large tasks. Think of Tcl as something like the UNIX shell, except that it is embeddable and portable and can be used for Internet scripting, including CGI implementation.

Java, on the other hand, is a system programming language like C or C++. It is much more structured than Tcl, which makes it easier to build large, complex applications in Java than in Tcl. Java is also compiled (to bytecode), which makes it considerably more efficient than directly interpreted Tcl. Java also supports multithreading, whereas Tcl does not. Think of Java as something like C++, except simpler and more powerful, and with facilities for sending Java programs around the Internet as executable content. [20]

Since Java and Tcl are both developed at Sun and are both Web programming languages, people are considering a marriage of Java and Tk, using Tk as the GUI-building part of Java. Sun reportedly has an early version of a Tcl-to-Java interface.

d) Another Mobile Language: Omniware





Mobile languages have become quite popular recently. The term denotes languages whose programs can be easily ported and run on many nodes of a network. Since any programming language can be used for Web programming without being portable, it is more accurate to call Java, Perl and Tcl mobile languages.

From our report's point of view, another notable mobile language is Omniware.

Omniware is an interpreted language with a two-phase compiler-interpreter process. It defines a virtual machine called OmniVM.

The advantages of Omniware are:

1. OmniVM is a register-based virtual machine, and is thus closer to most real machines. Translating from OmniVM to a real machine is therefore a shorter and lighter process than translating from the stack-based Java Virtual Machine.

2. OmniVM was designed with languages that have C/C++-like constructs in mind, so it can serve as the compiler target for C, C++ and many other languages. In this sense, Omniware acts as a somewhat generic virtual machine.
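
The structural difference can be seen by compiling the same statement, `a = b + c`, for two toy virtual machines. The Python sketch below uses made-up instruction sets, not actual OmniVM or JVM encodings, although the stack version follows the same load, load, add, store pattern that JVM bytecode uses for local variables. The register form maps almost one-to-one onto the add instruction of a register-based processor, whereas the stack form must be translated back into register operations.

```python
def run_stack(code, env):
    """Toy stack VM: operands travel through an explicit operand stack."""
    stack, dispatches = [], 0
    for op, *args in code:
        dispatches += 1
        if op == "LOAD":
            stack.append(env[args[0]])
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            env[args[0]] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
    return dispatches


def run_register(code, regs):
    """Toy register VM: each instruction names its destination and sources."""
    dispatches = 0
    for op, dst, src1, src2 in code:
        dispatches += 1
        if op == "ADD":
            regs[dst] = regs[src1] + regs[src2]
        else:
            raise ValueError(f"unknown opcode {op}")
    return dispatches


# a = b + c, written for each machine
stack_code = [("LOAD", "b"), ("LOAD", "c"), ("ADD",), ("STORE", "a")]
register_code = [("ADD", "a", "b", "c")]

env = {"b": 2, "c": 3}
regs = {"b": 2, "c": 3}
stack_steps = run_stack(stack_code, env)
register_steps = run_register(register_code, regs)
print(stack_steps, register_steps, env["a"], regs["a"])  # 4 1 5 5
```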

To provide security, Omniware uses a technique called Software-based Fault Isolation, which adds instructions that check at runtime that addresses lie within the legal address space. As in many other mobile languages, however, controlling access to the host's system resources remains a big problem in Omniware. [12]
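
The inserted runtime check can be sketched in Python, assuming a single 4 KiB fault domain identified by the upper bits of an address. The constants, `FaultIsolationError` and `checked_store` are illustrative only: a real SFI system inserts such checks directly into the untrusted module's code rather than calling a function, and the SFI literature also describes a cheaper variant that forces addresses into the segment by bit masking instead of checking them.

```python
MEMORY_SIZE = 1 << 16   # 64 KiB of simulated host memory
SEGMENT_SHIFT = 12      # a fault domain is 4 KiB
SEGMENT_ID = 0x2        # the untrusted module owns addresses 0x2000-0x2FFF

memory = bytearray(MEMORY_SIZE)


class FaultIsolationError(Exception):
    pass


def checked_store(address, value):
    """The check an SFI rewriter would place before every untrusted store."""
    if address >> SEGMENT_SHIFT != SEGMENT_ID:
        raise FaultIsolationError(f"store to {address:#06x} outside fault domain")
    memory[address] = value & 0xFF


checked_store(0x2010, 42)          # inside the module's segment: allowed
try:
    checked_store(0x0100, 99)      # would overwrite host data: trapped
except FaultIsolationError as err:
    blocked = str(err)
print(blocked)  # store to 0x0100 outside fault domain
```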

6. Why Interpreted Languages?

Today's hottest languages, such as Java, Tcl/Tk and Perl, are all interpreted languages. Why? An important reason, we think, is that they are all closely tied to the Internet. For an Internet-oriented language, the most important feature is portability. Such a language must also operate in a distributed environment, which means that security is of paramount importance. Interpreted languages have advantages in supporting both of these features.

Portability

A program is portable if the effort required to transport it is much less than the effort required for its initial implementation, and if its initial qualities remain the same after the transport. The portability of a program can therefore be evaluated by measuring the transport effort. For example, if I is the work involved in the initial implementation and T is the work involved in transport, then the program's portability can be evaluated as (I - T)/I. By this measure a program would be 100 percent portable only if T = 0, that is, if transporting it required no effort at all, which is impossible in practice. [2]
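
As a worked example of the measure, with purely hypothetical figures: if the initial implementation takes I = 100 person-days and transporting the program to a new machine takes T = 10 person-days, its portability is (100 - 10)/100 = 0.9, or 90 percent. The same arithmetic in Python (the function name is ours):

```python
def portability(initial_effort, transport_effort):
    """Degree of portability (I - T) / I; 1.0 would mean zero transport effort."""
    if initial_effort <= 0:
        raise ValueError("initial effort I must be positive")
    return (initial_effort - transport_effort) / initial_effort


# Hypothetical figures: 100 person-days to write, 10 person-days to port.
print(portability(100, 10))  # 0.9
```

A negative result would mean that porting costs more than rewriting the program from scratch.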

A mechanism that supports software portability is thus one that reduces the effort of transporting software. Some significant mechanisms of this kind are:





1. A compiler generates intermediate code that is independent of the target computer. If the compiler is self-compiling, the compiler itself is portable as well. This is the typical compiler-interpreter mechanism; typical examples are Pascal-P, Snobol4 and Java.

2. A compiler can also be divided into two parts: a front end that depends on the source language and a back end that depends on the object language, which in turn depends on the target machine. If well designed, the interface between these two parts can be independent of both languages. Ongoing research on generic virtual machines focuses on this mechanism, and the mobile language Omniware mentioned above is a good attempt in this category.

3. The platform-dependent parts of the software are isolated, and configuration tools such as imake are then used so that the code can be compiled and installed on different platforms.

The first two mechanisms are typically realized by interpreted languages with virtual machines. The virtual machines of interpreted languages serve as the platforms for architecture-neutral, portable languages; Java and Omniware are typical examples.

Security

Part of Java's security comes from its language design policy of simplicity. It excludes many dangerous C++ features, such as pointers, with which a programmer could accidentally manipulate memory directly. At the same time, Java provides automatic garbage collection. But the more important security mechanisms come from the compiler-interpreter nature mentioned above.

The compiler-interpreter mechanism with bytecode provides several levels of security defense for Java. The first level is provided by extensive compile-time checking: a trustworthy compiler ensures that Java source code does not violate the safety rules. The second level is provided by the bytecode verifier, which runs at run time, when a class is loaded and before its code executes. Java does not trust any applet coming from anywhere on the Internet, so the bytecode verifier must ensure that the code passed to the Java interpreter is in a fit state to be executed and can run without breaking the interpreter. The third level of defense is provided by the class loader, which dynamically partitions the classes from each network source into their own private namespace and prevents classes in one namespace from polluting another. [13]
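
To illustrate the kind of check the second level performs, here is a minimal Python sketch of a verifier for a toy instruction set, assuming straight-line code with no branches. It rejects unknown opcodes, operand-stack underflow, and code that exceeds its declared maximum stack depth, all before anything runs. The opcode names echo JVM mnemonics, but `STACK_EFFECT` and `verify` are hypothetical; the real Java bytecode verifier also follows branches and tracks the type of every stack slot and local variable.

```python
# Stack effect of each toy opcode: (values popped, values pushed).
STACK_EFFECT = {
    "iconst": (0, 1),
    "iload": (0, 1),
    "iadd": (2, 1),
    "istore": (1, 0),
    "ireturn": (1, 0),
}


def verify(code, max_stack):
    """Check straight-line code statically; return "ok" or a reason for rejection."""
    depth = 0
    for pc, instruction in enumerate(code):
        op = instruction[0]
        if op not in STACK_EFFECT:
            return f"pc {pc}: unknown opcode {op!r}"
        pops, pushes = STACK_EFFECT[op]
        if depth < pops:
            return f"pc {pc}: stack underflow"
        depth += pushes - pops
        if depth > max_stack:
            return f"pc {pc}: stack overflow"
    return "ok"


good = [("iload", 1), ("iload", 2), ("iadd",), ("ireturn",)]
bad = [("iload", 1), ("iadd",), ("ireturn",)]  # iadd finds only one operand
print(verify(good, max_stack=2))  # ok
print(verify(bad, max_stack=2))   # pc 1: stack underflow
```

Because the whole method is checked once at load time, the interpreter can later execute `iadd` without testing for an empty stack on every step.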

While Java's security is mainly provided by its compiler-interpreter mechanism, Tcl's security is provided by Safe-Tcl.

Safe-Tcl is a mechanism that initializes a Tcl interpreter with a safe subset of Tcl commands so that Tcl scripts cannot harm their host machine or application. There are also mechanisms for granting privileges to a safe interpreter so that a script can do nontrivial things.

The basic approach to ensuring safety is first to remove the file command from safe interpreters completely, and then to replace it with command aliases. The Netscape Tcl plug-in supports Tcl/Tk applets, also called Tclets; it implements the standard Safe-Tcl subset plus a limited version of Tk.





Command aliases are the primary mechanism Safe-Tcl provides for granting privileges. An alias is a command in the untrusted interpreter that is really implemented by a different, fully trusted interpreter. This is much like user mode and kernel mode in multiuser operating systems: in Safe-Tcl, an untrusted script is isolated in its own interpreter context and given a few extra commands that are carefully implemented by another, trusted Tcl interpreter to ensure safety.
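
The alias idea can be sketched in Python with two toy interpreters, assuming a single trusted "master" and a simulated file table instead of the real file system. The safe interpreter has no file command at all; its only way to read a file is the alias `read_pref`, whose body runs with the master's privileges and enforces a policy first. `Interp`, `make_alias`, `read_pref` and `file_read` are hypothetical names; in Tcl itself this is done with `interp create -safe` and `interp alias`.

```python
import posixpath


class TclError(Exception):
    pass


class Interp:
    """Toy interpreter reduced to its command table."""

    def __init__(self):
        self.commands = {}

    def invoke(self, name, *args):
        if name not in self.commands:
            raise TclError(f'invalid command name "{name}"')
        return self.commands[name](*args)


# Simulated host files, so the sketch never touches the real file system.
HOST_FILES = {"/etc/passwd": "root:x:0:0", "/tmp/tclet/prefs": "color=blue"}

master = Interp()  # trusted: has the full file command
master.commands["file_read"] = lambda path: HOST_FILES[path]

safe = Interp()    # untrusted: starts with no file command


def make_alias(target, target_cmd, allowed_prefix):
    """Build a command for the safe interpreter whose body runs in the trusted one."""
    def alias(path):
        path = posixpath.normpath(path)  # collapse "../" tricks before checking
        if not path.startswith(allowed_prefix):
            raise TclError(f"access to {path} denied")
        return target.invoke(target_cmd, path)
    return alias


safe.commands["read_pref"] = make_alias(master, "file_read", "/tmp/tclet/")

print(safe.invoke("read_pref", "/tmp/tclet/prefs"))  # color=blue
denied = []
for cmd, path in [("file_read", "/etc/passwd"), ("read_pref", "/tmp/tclet/../../etc/passwd")]:
    try:
        safe.invoke(cmd, path)
    except TclError as err:
        denied.append(str(err))
print(denied)
```

The untrusted script never holds a general file command; it holds only a narrow entry point whose checks live in trusted code, which mirrors the user-mode and kernel-mode split described above.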

Reusability

Scripting languages, being interpreted, typically provide the glue between commands. A shared, universal scripting language like Tcl serves as a powerful and flexible glue for assembling reusable components.

Tcl is a reusable command language because almost everything in it is a command, from flow control (for, if, case, continue, etc.) to variables and procedures (global, proc, return, set). These built-in commands provide programmability and extensibility for free. Tcl users are free to develop application-specific commands, much as UNIX commands extend the UNIX shell, and these commands look the same as Tcl's built-in commands.

The most important design goal of Tcl is reusability, so it takes a component approach. Rather than building each new application as a self-contained monolith with hundreds of thousands of lines of code, an application is built by combining many smaller reusable components. Each component is small enough to be implemented by a small group, and interesting applications can be created by assembling existing components. [17]

Rapid Development

Reusability makes rapid development of software applications possible. The scripting, interpreted nature of interpreted languages is also obviously good for rapid development. Instead of the heavyweight compile, link, crash, debug cycle, programs in interpreted languages can be run directly, and it is easier to trace what is happening during interpretation.

Performance

Currently, Java runs about 30 times slower than an equivalent C program. This does not seem very bad, considering the advantages Java has. Performance has in fact always been a consideration for Java's designers. They believe they achieved superior performance by adopting a scheme in which the interpreter can run at full speed without needing to check the runtime environment. Also, automatic garbage collection runs as a low-priority background thread, ensuring a high probability that memory is available when required, which leads to better performance. What's more, Sun has been improving performance by providing just-in-time compilation of the bytecode into native code. Applications requiring large amounts of computing power can be designed so that compute-intensive sections are rewritten in native machine code as required and interfaced with the Java platform.

In general, Java's interactive applications respond quickly even though they are interpreted. But the effort to improve performance will never end: Java's current performance still cannot meet the needs of some categories of applications.

Typically, an interpreted language has relatively low performance because of the overhead of fetching and decoding each virtual command or virtual instruction before performing the work it specifies. Most current virtual machines are stack-based, while most real machines are register-based. Emulating a stack-based virtual machine on a register-based real machine is thus a heavier process than it would be if the virtual and real machines had similar structures, either both stack-based or both register-based.

In Java, interpretation is done by token threading: each bytecode is executed through one token-threaded dispatch. A dispatch requires about three instructions and five memory references, and each virtual instruction in turn requires several real machine instructions. For example, when the interpreter is written in C, executing the JVM's integer add (IADD) on most general-purpose processors (SPARC, 80x86, 680x0, PowerPC, ARM and MIPS) requires at least seven conventional processor instructions.
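
The fetch, decode and execute pattern behind token threading can be sketched in Python, assuming a dispatch table indexed by the opcode byte. Only three opcodes are implemented, using their JVM opcode values (`iconst_1` = 0x04, `iadd` = 0x60, `ireturn` = 0xAC); everything else is a simplification. In a C interpreter each handler would end by fetching the next token and jumping through the table; here that per-instruction overhead is the loop body around `handler(frame)`, and it is paid once per bytecode even though the real work of `iadd` is a single addition.

```python
ICONST_1, IADD, IRETURN = 0x04, 0x60, 0xAC  # JVM opcode values


def op_iconst_1(frame):
    frame["stack"].append(1)


def op_iadd(frame):
    # The useful work: pop two ints, push their sum wrapped to 32 bits as in Java.
    b, a = frame["stack"].pop(), frame["stack"].pop()
    frame["stack"].append((a + b + 2**31) % 2**32 - 2**31)


def op_ireturn(frame):
    frame["result"] = frame["stack"].pop()
    frame["running"] = False


# Dispatch table indexed by the opcode byte (the "token").
HANDLERS = [None] * 256
HANDLERS[ICONST_1] = op_iconst_1
HANDLERS[IADD] = op_iadd
HANDLERS[IRETURN] = op_ireturn


def interpret(bytecode):
    """Fetch, decode and execute until ireturn; also count dispatches."""
    frame = {"stack": [], "running": True, "result": None}
    pc = dispatches = 0
    while frame["running"]:
        token = bytecode[pc]       # fetch the next opcode byte
        handler = HANDLERS[token]  # decode: one table lookup
        if handler is None:
            raise ValueError(f"unimplemented opcode {token:#04x} at pc {pc}")
        pc += 1
        dispatches += 1
        handler(frame)             # execute the instruction's actual work
    return frame["result"], dispatches


result, dispatches = interpret(bytes([ICONST_1, ICONST_1, IADD, IRETURN]))
print(result, dispatches)  # 2 4
```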

To improve performance, just-in-time (JIT) compilation, which translates Java bytecode into instructions for the host processor at runtime, has been applied to Java. This technique does improve Java's performance several times over. Because native code compilers (or code generators) are usually complex programs that cost both memory and execution time, JIT compilation uses less aggressive optimization: it simply translates each bytecode into in-line machine code, or keeps the top of the stack in a register.
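
The "translate each bytecode in line" strategy can be imitated in Python by expanding a fixed source template per instruction and compiling the result once with the built-in `compile` and `exec`. This is an analogy rather than a real JIT: the target is Python rather than host machine code, the instruction set is a hypothetical subset, Java's 32-bit overflow rules are ignored, and the bytecode is assumed to have already passed verification. Because the code is straight-line, the stack depth at each instruction is known at translation time, so stack slots can be bound to named locals, a rough counterpart of keeping the top of the stack in a register.

```python
def jit_compile(code):
    """Translate straight-line toy bytecode into one Python function, once.

    Each instruction expands to a fixed source template. Because the code has
    no branches, the stack depth is known at every instruction, so stack slots
    become local variables s0, s1, ... (a stand-in for machine registers).
    """
    lines, depth = ["def compiled(locals_):"], 0
    for instruction in code:
        op = instruction[0]
        if op == "iload":
            lines.append(f"    s{depth} = locals_[{int(instruction[1])}]")
            depth += 1
        elif op == "iconst":
            lines.append(f"    s{depth} = {int(instruction[1])}")
            depth += 1
        elif op == "iadd":
            depth -= 1
            lines.append(f"    s{depth - 1} = s{depth - 1} + s{depth}")
        elif op == "ireturn":
            depth -= 1
            lines.append(f"    return s{depth}")
        else:
            raise ValueError(f"cannot compile {op!r}")
    source = "\n".join(lines)
    namespace = {}
    exec(compile(source, "<jit>", "exec"), namespace)  # translate once
    return namespace["compiled"], source


bytecode = [("iload", 0), ("iload", 1), ("iadd",), ("iconst", 10), ("iadd",), ("ireturn",)]
compiled, source = jit_compile(bytecode)
print(source)
print(compiled([3, 4]))  # 17
```

The generated function contains no dispatch loop and no stack list, which is where the speedup of template-style translation comes from; the translation itself costs time and memory, echoing the trade-off described above.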

The performance improvement from JIT compilation is limited, and it comes at a cost in memory. Some argue that the most efficient execution vehicle for many Java applications would be a dedicated Java chip that executes bytecode directly. Sun is now building picoJava, a microcontroller intended to execute Java bytecodes directly. It is a simple stack-based processor, but rather than being a pure stack architecture, it will have specific hardware features for handling bytecode, as well as features suited to the garbage-collected, object-oriented and multithreaded nature of Java.

Setting aside such rare, newly designed stack-based real machines, let us consider only stack-based virtual machines running on register-based real machines.

As mentioned above, the execution time of an interpreted program depends on the number of commands interpreted, the cost of fetching and decoding each command, and the time spent actually executing the operations the commands specify. Since the number of commands required to accomplish a given task depends on the level of the language's virtual machine, the performance of an interpreted language depends mainly on the level of the virtual machine defined for it. A simple virtual machine such as Java's may require executing a large number of commands, but the overhead of each virtual command is small and nearly fixed. In contrast, Perl and Tcl each define complex virtual machines and show non-uniform slowdowns relative to equivalent C implementations, even though their virtual machines can execute a given program in fewer commands. [26]
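
The three factors in the paragraph above can be written as a simple cost model (our own restatement, not a formula taken from [26]), where N is the number of virtual commands executed, t_fd,i is the fetch-and-decode cost of command i, and t_exec,i is the time spent performing its operation:

```latex
T = \sum_{i=1}^{N} \bigl( t_{\mathrm{fd},i} + t_{\mathrm{exec},i} \bigr),
\qquad
T \approx N\,t_{\mathrm{fd}} + \sum_{i=1}^{N} t_{\mathrm{exec},i}
\quad \text{if } t_{\mathrm{fd},i} \approx t_{\mathrm{fd}} \text{ for every } i.
```

For a low-level virtual machine such as Java's, N is large but each command adds only a small, nearly fixed t_fd. A higher-level virtual machine lowers N, but its per-command costs vary widely from command to command, which is consistent with the non-uniform slowdowns described above.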





As mentioned in section 5 b), an on-the-fly bytecode compiler for Tcl is being implemented to improve Tcl's performance. Caching compiled bytecode should be very helpful in improving the performance of interpreted languages such as Perl and Tcl. [24]

Other Advantages of Interpreted Languages [27]

The type of a variable can change dynamically during execution

Compiling efficient code for dynamic typing, where the type of a variable can change during execution, is hard because the type of a variable is not known at compile time. An interpreter, in contrast, can handle this situation easily and efficiently, since it sees each value's actual type at the moment it executes an operation.
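
A minimal Python sketch, assuming a toy language whose `+` adds integers and concatenates anything else: every value carries an explicit type tag, and the interpreter chooses the operation each time it executes `+`, so the same variable can hold an integer on one step and a string on the next. A compiler would have to emit code for both cases, or reject the program, because it cannot know in advance which tag will appear. The tag names, `tagged` and `interp_add` are hypothetical.

```python
def tagged(value):
    """Wrap a Python int or str in an explicit (tag, value) pair."""
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(f"unsupported value {value!r}")
    return ("int", value) if isinstance(value, int) else ("str", value)


def interp_add(x, y):
    """Pick the meaning of '+' at run time from the operands' current tags."""
    (tag_x, val_x), (tag_y, val_y) = x, y
    if tag_x == tag_y == "int":
        return ("int", val_x + val_y)
    # At least one operand is a string: concatenate their text forms.
    return ("str", str(val_x) + str(val_y))


env = {}                                  # the interpreter's variable bindings
env["v"] = tagged(2)
first = interp_add(env["v"], tagged(3))   # v currently holds an int
env["v"] = tagged("two")                  # the same variable now holds a string
second = interp_add(env["v"], tagged(3))
print(first, second)  # ('int', 5) ('str', 'two3')
```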

An interpreter can be very good for debugging

The interpreter can access the source program, in its original form or in an internal form, at any time. It also maintains a symbol table containing variable names and values, so programmers can get diagnostic information in easily understandable forms.

7. Summary

Most interpreted languages have a virtual machine, either explicitly defined, such as Java's virtual machine, or implicitly defined, such as those of Tcl and Perl. Some simple scripting languages, such as the UNIX shell languages, are interpreted directly and have no virtual machine.

Virtual machines play a major role in interpreted programming languages. With the help of a virtual machine, the compiler-interpreter mechanism provides portability, security and better performance for interpreted languages.

Scripting languages, as interpreted languages, are good for gluing programming components together. When the set of components is open to extension, as with Tcl's built-in plus application-specific commands, the language can provide great reusability.

Development in interpreted languages is relatively lightweight compared with the compile-link-test cycle of a traditional compiled language, so interpreted languages are good for rapid development.

Acknowledgment

We would like to thank Professor Benjamin Zorn for his guidance on the topics in this paper. Without his help, we believe we would still be lost in a maze.


References

[1]. Wirth N. From Programming Language Design to Computer Construction, ACM, Vol. 28, No. 2, February 1985

[2]. Lecarm O., Cart M. P., Gart M. Software Portability, McGraw-Hill Publishing Company, 1989

[3]. Kamin S. N. Programming Languages: An Interpreter-Based Approach, Addison-Wesley Publishing Company, 1990

[4]. Pemberton S., Daniels M. Pascal Implementation: The P4 Compiler and Interpreter, ISBN 0-13-653031-1

[5]. Newsgroup: comp.lang.smalltalk

[6]. Byrne S. B. GNU Smalltalk User's Guide, http://www.cs.utah.edu/csinfo/texinfo/mst/mst_toc.html

[7]. Goldberg A., Robson D. Smalltalk-80: The Language and Its Implementation, Addison-Wesley, 1983, ISBN 0-201-11371-6

[8]. Sun Microsystems. The Java Virtual Machine Specification, http://java.sun.com/doc/vmspec/html/vmspecl.html, 1995

[9]. Gosling J. Java Intermediate Bytecodes, ACM SIGPLAN Workshop on Intermediate Representation, January 1995

[10]. Sun JavaSoft. Getting Started: The Java Developer's Kit

[11]. Sun JavaSoft. Design Goals of Java 1.2

[12]. Caron J. Java: Status Report and Language Overview, CSCI 5535 Project, University of Colorado at Boulder, December 1995

[13]. Wang W., An Y., Zang L. Security --- How Is It Implemented in the Java Language?, CSCI 5535 Project, University of Colorado at Boulder, December 1995

[14]. Sun JavaSoft. A Look Inside the Java Platform

[15]. Sun JavaSoft. The Java Language Environment: A White Paper

[16]. Abelson H., Sussman G. J. Structure and Interpretation of Computer Programs, MIT Press, Cambridge, MA, 1985

[17]. Ousterhout J. K. Tcl and the Tk Toolkit, Addison-Wesley, ISBN 0-201-63337-X

[18]. Ousterhout J. K. Tcl: An Embeddable Command Language, USENIX Conference Proceedings, 1990

[19]. Lewis B. T. An On-the-fly Bytecode Compiler for Tcl, http://www.sunlabs.com/people/brian.lewis/

[20]. Ousterhout J. K. What's Happening at Sun Labs, http://www.sunlabs.com/research/tcl/team.html, April 1996

[21]. Welch B. Practical Programming in Tcl and Tk, Prentice-Hall, 1995, ISBN 0-13-182007-9

[22]. Newsgroup: comp.lang.tcl

[23]. Ousterhout J. K. An Introduction to Tcl Scripting, http://www.sunlabs.com/people/john.ousterhout/

[24]. Schwartz R. L. Learning Perl, O'Reilly & Associates, Inc., 1993

[25]. Perl Documentation, http://www.csc.tntech.edu/docs/perl.html

[26]. Romer T. H., Lee D., et al. The Structure and Performance of Interpreters, ACM, October 1996

[27]. Watson D. High-level Languages and Their Compilers, Addison-Wesley Publishing Company, 1989

[28]. Gosling on Java, DATAMATION, March 1, 1996

[28]. Gosling on Java, DATAMATION, March 1, 1996
