Programming Language Concepts
Axel-Tobias Schreiner
Department of Computer Science
Rochester Institute of Technology
This volume contains copies of the overhead slides used in class.
This information is available online as part of the World Wide Web;
it contains hypertext references to itself and to the documentation
for various programming languages in the Web. The example programs
are included into this text from the original sources.
So that it may be viewed on other platforms, this text also exists
as a PDF document. With the Acrobat Reader from Adobe the text can
be printed on Windows systems.
The text is not a complete transcript of the lectures. A
rudimentary knowledge of some programming languages is assumed; for
self study one would have to consult introductory books on
programming, programming languages, and compiler
construction.
A major part of this course consisted of student reports on numerous
programming languages, followed by programming assignments in the more
prominent of these languages. During the presentations, common and
unique concepts of different languages were related and contrasted.
The assignments tried to exhibit simple but typical uses of each
language through progressively more difficult problems.
Contents
0 Introduction
1 Timeline
2 Rosetta Stone
3 bc
4 XML
5 Little Smalltalk Cribsheet
6 Syntax Analysis
7 Pure Lisp
Adobe Framemaker, Distiller. OmniGraffle is used for the drawings. The
slides are available on the World Wide Web.
Today there are lots of programming languages and even more books
about them. Some books deal with single languages, others compare
languages or discuss the history of language development. Here are
typical ones:

Flanagan             1-56592-487-8   Java in a Nutshell (3rd Edition)
Kernighan/Ritchie    3-446-15497-3   The C Programming Language
Schreiner/Friedman                   Compiler Construction
Wexelblatt                           History
Wirth                                Pascal
Wirth                                Modula
Wirth                                Oberon

There is also a useful Web site:
http://dmoz.org/Computers/Programming/Languages/.
1 Timeline
Here is a brief list of programming languages that are significant
for one reason or another and that are/were used by many people or
for a long time:

around 1960
  Algol 60      IFIP, block structure, recursion, BNF syntax          dead
  COBOL         business data processing                              legacy
  COMIT         MIT, string processing                                Snobol
  Fortran       IBM, workhorse for numerical problems                 updated
  Lisp          McCarthy, workhorse for artificial intelligence       updated

around 1965
  APL           Iverson, numerical mathematics                        J
  Basic         Dartmouth, interactive programming                    updated
  PL/I          IBM, attempt at a universal language                  dead
  Simula        Dahl et al., simulation, class concept                SmallTalk
  Snobol        Griswold, string processing                           Icon

around 1970
  Algol 68      very formally defined, updated successor to Algol 60  stillborn
  C             Ritchie et al., workhorse for systems programming
  Pascal        Wirth's first for academia                            Delphi

around 1975
  awk           Aho et al., scripting, report generation              Perl
  bc            Unix, high precision interactive calculator
  m4            Unix, macro processor
  Prolog        Clocksin et al., declarative programming              dying

around 1980
  Icon          Griswold's successor to Snobol                        exotic
  Modula        Wirth's first for modular programming in academia     exotic
  Objective C   Cox, NeXT, C with object orientation                  MacOS X
  SGML          Goldfarb et al., document markup                      XML
  sh            Bourne et al., Unix command languages
  SmallTalk 80  PARC, proverbial object orientation

around 1985
  Ada           DoD, designed through requirements                    updated
  C++           Stroustrup, C with object orientation                 dying

around 1990
  Oberon        Wirth's best C ever                                   exotic

around 1995
  Java          Gosling et al., selection from C++ and Objective C

around 2000
  C#            Microsoft's own brand of Java, common runtime
  Haskell       Thompson, functional programming
  XML           W3C, structured data representation
  XSLT          W3C, functional language for XML manipulation
2 Rosetta Stone
In the early 1970s Bill Wulf suggested creating a Rosetta Stone
for programming languages. He had a serious comparison in mind;
http://internet.ls-la.net/mirrors/99bottles/ is much less so. The
following tables are derived from Wulf's project. They serve as a
starting point for comparing programming languages from a lexical
and syntactic (looks) and semantic (meaning) point of view.
2.1 Lexical Aspects

punched cards, some columns special    COBOL, Fortran
line oriented                          Basic, machine languages (assembler)
free format                            most current languages
white space ignored                    Basic, Fortran

Comments

part of executing program              APL, Basic
positional on line                     Assembler
positional within phrases              Algol, between parameters
like a statement
                                       <!-- comment -->
                                       " comment "
                                       -- comment

Delimiters

end of line                            awk (mostly), Basic, Fortran
semicolon as separator                 Pascal
semicolon as terminator                C

Keywords

distinguished by position              Basic
distinguished by context               Fortran, PL/I
distinguished by typeface              Algol
stropping                              .begin end
reserved                               Pascal

Names

Names tend to consist of letters and digits, but what is a letter or
a digit?
Scalars        fixed decimal(2)    ^integer    procedure
Aggregates     [] of integer    record ... end    file of record ... end

2.3 Literals

A literal is a self-defining constant, i.e., the textual appearance
must indicate type and value.

Is there a literal for every data type?

Scalars        integer, various bases    1.0 1e2
Aggregates
2.4 Variables

A variable can be viewed as a binding between a name and a value,
which in many programming languages can be changed while a program is
executed.

A constant can be viewed as a variable where the binding cannot be
changed.

Are there variables/constants for every data type supported by the
language?

    x := x + 1;

This typical assignment statement shows that the meaning of a
variable name usually depends on context: it delivers a typed value
(r-value) or it is the target of rebinding (l-value).

How does the language decide on the type of a variable?

no typing        each operation interprets a value          usually in machine languages
strong typing    variables have predeclared attributes      usually in compiled languages
dynamic typing   (re)bound value determines current type    often in interactive languages

2.5 Subroutines

A procedure is a name for a sequence of statements. A function is a
name for a sequence of statements which computes a value.

Which does the language have?

Properties

Which data types can a function deliver as a result?

Is recursion allowed, forbidden but detected, or simply forbidden?

Can global variables be changed by a subroutine?

Can subroutines be programmed in other languages?
Parameters

Can subroutines have parameters?

Which data types can be used for parameters and for arguments?

Which conversions are implicit, if any?

How are arguments and parameters related:

by value          argument value is copied
by value return   parameter value is copied back
by address        parameter aliases any argument
by reference      parameter aliases only variable as argument
by name           effectively, text of argument is used in place of parameter

Does caller or subroutine decide how a parameter is passed?

2.6 Scope

Where are variable names and other user-defined names visible in a
program?

flat           Basic
local/global   awk
nested         Algol
modular        Modula, export/import

Do (all?) names have to be declared before they are used?

2.7 Lifetime

How long is a value available?

as long as program                C static
as long as subroutine is active   Algol
as long as referencable           Lisp
as long as program decides        C malloc/free
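The parameter passing modes listed in 2.5 can be imitated in a language that does not offer them directly. The following Python sketch is only an illustration under stated assumptions: a one-element list stands in for a variable passed by reference, and a parameterless function (a thunk) stands in for an argument passed by name. All function and variable names are hypothetical.

```python
def increment_by_value(n):
    # The parameter is a local name; rebinding it does not affect the caller.
    n = n + 1
    return n

def increment_by_reference(cell):
    # Assumption: a one-element list plays the role of the caller's variable.
    cell[0] = cell[0] + 1

def twice_by_name(thunk):
    # By name: the argument text is re-evaluated at every use of the parameter.
    return thunk() + thunk()

x = 1
increment_by_value(x)          # x is unchanged afterwards

cell = [1]
increment_by_reference(cell)   # the caller's cell is changed

calls = []
def argument():
    # An argument expression with a side effect: it counts its evaluations.
    calls.append(1)
    return len(calls)

result = twice_by_name(argument)   # argument() is evaluated twice
```

The by-name case shows why this mode is hard to reason about: an argument with side effects is evaluated once per use of the parameter, not once per call.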
break, continue; on error begin ... end; loop ... exitif ... end;
dependent on range; fork(), synchronization
3 bc
bc (basic calculator) originally appeared with Unix. It is a simple
programming language with high precision arithmetic. It was
originally implemented as a frontend for dc (desk calculator), which
implements a stack machine. POSIX bc is less elaborate than GNU bc.
bc is powerful enough to implement mathematical functions: sine
s(x), cosine c(x), arctangent a(x), natural logarithm l(x),
exponential e(x), and Bessel function j(x,y).
3.1 Lexical Aspects
Format      newline is statement separator and it starts execution

    12; 34; 56
    12
    34
    56

Comments    asymmetrically marked    /* comment */

Delimiters  semicolon as separator [there is no else]

Keywords    reserved

    x=7
    if=12
    (standard_in) 2: parse error
    x
    7

Names       Single lowercase ASCII letters (or alphanumeric as an
            extension).
ibase and obase control input and output, scale controls
computation.

Numbers are converted to decimal and have arbitrary precision.
length is the total number of significant digits, scale is the
number of decimal digits to the right of the decimal point.

    y=1.001
    length(y)
    4
    scale(y)
    3
    x = 0.001
    length(x)
    3

Strings can only be used as literals.

Arrays are managed dynamically and contain numbers.

    a[7] = 3; x = 7; a[0]; a[x]
    0
    3

Numbers as sequences of digits and uppercase letters A through F,
with or without decimal point, interpreted in ibase (between 2 and
16), printed in obase.

    ibase=2; 110
    6
    ibase=1010; 81
    81
    ibase=2; obase=10000; 11111
    1F

Single digits define themselves, irrespective of ibase. Too-large
digits yield ibase-1.

    ibase=2; F
    15
    1234
    15
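The two rules for number literals above (single digits define themselves, too-large digits yield ibase-1) can be sketched in Python. This is a hedged illustration of the rules as stated in the text, restricted to integers without a decimal point; it is not taken from any bc implementation, and the name bc_literal is hypothetical.

```python
DIGITS = "0123456789ABCDEF"

def bc_literal(text, ibase):
    """Value of an integer literal as described above (illustrative only)."""
    values = [DIGITS.index(ch) for ch in text]
    if len(values) == 1:
        # A single digit defines itself, irrespective of ibase.
        return values[0]
    result = 0
    for value in values:
        # A digit that is too large for ibase counts as ibase-1.
        result = result * ibase + min(value, ibase - 1)
    return result

# Printing in another base corresponds to obase, here base 16.
as_hex = format(bc_literal("11111", 2), "X")
```

With ibase 2 the literal 1234 is read as 1111, which is why the session above prints 15.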
Strings as sequences of characters within doublequotes.
"hello " hello
bc has functions that can be used as procedures. Definitions cannot
be nested.

Recursion is allowed.

    define f(n) {
      if (n > 1) return (n*f(n-1))
      return (1)
    }
    f(10)
    3628800
    define b(n,k) {
      return (f(n)/f(k)/f(n-k))
    }
    b(4,2)
    6

There are local variables, initialized with zero. Nested function
calls can access them. Globals can be changed.

    define a() { auto a, b
      b()
      a; b
    }
    define b() { auto a
      ( a=3 )
      ( b=4 )
    }
    a()
    3          assignment to b's a
    4          assignment to a's b
    0          result of b()
    0          value of a's a
    4          value of a's b
    0          result of a()

Parameters

Functions have parameters which must be exactly matched. Arguments
are numbers and are passed by value.

    a = 4; a
    4
    define b(a) {
      ( a = 5 )
    }
    b(a); a
    5          assignment to b's a
    0          result of b()
    4          global a
3.6 Scope

There are 26 names, but they can be used three times: for scalar
variables, for array variables, and for functions.

A function can declare an auto list of more names. A called
function searches outward for the caller's auto names or for global
names. Variables are initialized auto names.

    define a(b) { auto a
      b()
    }
    define b() {
      ( a = b )
    }
    a(5); a
    5          assignment to a's a with a's b
    0          result of b()
    0          result of a(5)
    0          global a is not affected
    b=3; b(); a
    3          assignment to global a with b
    0          result of b()
    3          global a is affected

3.7 Lifetime

Globals live as long as the program, locals and parameters as long
as the active function call.
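The outward search through the caller's auto names described in 3.6 is dynamic scoping. The following Python sketch imitates it with an explicit stack of name tables; it mirrors the bc session above but is only an illustration. The helpers frames, lookup, assign and call are hypothetical and not part of bc.

```python
frames = [{}]                      # frames[0] holds the global names

def lookup(name):
    # Search from the innermost active call outward to the globals.
    for frame in reversed(frames):
        if name in frame:
            return frame[name]
    return 0                       # unset variables read as zero

def assign(name, value):
    # Rebind the innermost visible name, or create a global.
    for frame in reversed(frames):
        if name in frame:
            frame[name] = value
            return
    frames[0][name] = value

def call(body, **local_names):
    # A call pushes parameters and auto names, and pops them on return.
    frames.append(dict(local_names))
    try:
        return body()
    finally:
        frames.pop()

def b():
    assign("a", lookup("b"))       # ( a = b )
    return 0

def a(argument):
    def body():
        call(b)                    # b() sees the names of its caller
        return 0
    return call(body, b=argument, a=0)   # define a(b) { auto a ... }

a(5)
after_a = lookup("a")              # the global a was not touched
assign("b", 3)
call(b)
after_b = lookup("a")              # now the global a was assigned
```

Called from a, the function b finds a's auto variable first; called from the top level, the same function assigns to the global.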
Relational operators are only allowed singly in statements.

             negative                                    -a
                                                         scale=3; a=2; a^3; a^-3
                                                         8
                                                         .125
from left    multiplication, etc.;                       a*b; a/b; a%b
             tricky scale for remainder
from left    addition, etc.                              a+b; a-b
from right   assignment                                  a=b; a+=b

Simple Statements

expression statement, printed                            ( a = 3 )
assignment statement, not printed                        a = 3
string, printed                                          "hello"
compound                                                 { a; b }
stop (immediately)                                       quit

optional part

    if ( a <= b ) "hello"

dependent on precondition

    define e(x,y) {
      while (x != y) {
        if (x > y) x -= y
        if (x < y) y -= x
      }
      return (x)
    }
    e(36,54)
    18

dependent on range

    i=4
    for (j=0; j<=i; ++ j) b(i,j)
    1
    4
    6
    4
    1
4 XML
XML is a slightly modernized version of SGML, which was originally
intended to logically mark up text for printing. XML is now mostly
used to represent structured data for transmission or storage; a
well-formed XML-based document has a tree structure.
XSL is a modernized version of DSSSL, which was originally intended
to manipulate and in particular print SGML-based documents. XSLT,
part of XSL, is a functional language, mostly for tree
transformations. XSL includes a specification for formatting
objects, into which XML-based documents can be transformed with
XSLT for high-quality printing. XSL and XSLT programs are specified
as XML-based documents.
4.1 XML (simplified)

A well-formed XML-based document consists of a header and one
element which may contain other elements.

xml/hello.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- Hello, World -->
    <hello>Hello, World!</hello>

Elements consist of a start tag and an end tag with matching names.
They must be properly nested (no overlap) and can also contain
text.

An empty element consists of a tag marked with / at the end:

    <empty/>

The tag of an empty element or a start tag can have attributes
following the tag name. An attribute consists of a key and a value;
the value is enclosed in single or double quotes:

    <greeting text="hello" terminator='!'/>
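The rules above (matching tags, proper nesting, quoted attribute values) are what a parser checks for well-formedness. As a minimal sketch, the Python standard library module xml.etree.ElementTree can be used for such a check; the function name is_well_formed is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the text parses as a single well-formed XML element."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

nested = is_well_formed("<a><b>text</b></a>")            # properly nested
overlap = is_well_formed("<a><b>text</a></b>")           # tags overlap
quoted = is_well_formed("<greeting text=\"hello\" terminator='!'/>")
unquoted = is_well_formed("<greeting terminator=!/>")    # value not quoted

# A parsed document is a tree: elements carry a tag, attributes, and text.
greeting = ET.fromstring("<greeting text=\"hello\" terminator='!'/>")
attributes = greeting.attrib
```

Only the overlapping and the unquoted variants are rejected; single and double quotes are interchangeable around attribute values.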
Lexical aspects

XML is free format; whitespace is significant as part of text or an
attribute value.

Tag and attribute names are case-sensitive (unlike HTML) and
consist of letters and digits and some other characters.

Comments are special elements, cannot contain --, and cannot appear
everywhere.

There is a text replacement mechanism (entities) but it is not
cleanly separated from the language.

In particular, &amp; &apos; &gt; &lt; and &quot; must be used
because, e.g., attribute values cannot contain < > delimiting
quotes and &. Furthermore, &#dd; and &#xhh; can be used
to specify any character by its position in the Unicode character
set.
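As a small illustration of the predefined entities and the numeric character references, the following Python sketch uses the standard library: xml.sax.saxutils.escape replaces &, < and > when producing XML text, and the parser in xml.etree.ElementTree resolves entities and character references when reading it. The sample strings are made up for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Writing: the special characters must be replaced by entity references.
escaped = escape("a < b & c > d")

# Reading: the parser resolves the five predefined entities and the
# decimal (&#65;) and hexadecimal (&#x42;) character references.
element = ET.fromstring("<t>&lt;&amp;&gt;&apos;&quot; &#65;&#x42;</t>")
text = element.text
```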
Checking

There are many XML parsers. rxp is particularly fast for checking:

    $ rxp -o b hello.xml
    At 39: comment: Hello, World
    At 61: start: hello
    At 68: pcdata: Hello, World!
    At 81: end: hello
    At 119: EOF

Sun's Java API for XML Processing (JAXP) provides factory classes to
access different parsers. The package includes an XML parser
(crimson) and an XSLT transformer (xalan). The following main
program can be used to check an XML-based document:

xml/WellFormed.java

    $ javac -classpath .:jaxp.jar:crimson.jar WellFormed.java
    $ java -classpath .:jaxp.jar:crimson.jar WellFormed < hello.xml
Validation

Sometimes it is desirable to describe what elements a document may
use, how they may be nested, and what attributes they may use, etc.

A valid XML-based document is well-formed and conforms to a
Document Type Definition (DTD), which is either referenced from the
document

xml/pets.xml

    ...

or included in it:

xml/people.xml

    <!DOCTYPE people [
      <!ELEMENT people (person*)>
      <!ELEMENT person (first,last)>
      <!ATTLIST person
        pid  ID            #REQUIRED
        boss IDREF         #IMPLIED
        sex  (male|female) #REQUIRED
      >
      <!ELEMENT first (#PCDATA)>
      <!ELEMENT last (#PCDATA)>
    ]>
    <people>
      <person pid="i4711" boss="i4712" sex="female">
        <first>Jane</first>
        <last>Doe</last>
      </person>
      <person pid="i4712" sex="male">
        <first>John</first>
        <last>Doe</last>
      </person>
    </people>

rxp can be used for validation:

    $ rxp -V -o 0 hello.xml
    Warning: Document has no DTD, validating abandoned
     (detected at end of prolog of document file:///.../plc/code/xml/hello.xml)
    $ rxp -V -o 0 pets.xml
    $ rxp -V -o 0 people.xml
With JAXP a parser can be asked to validate, if it can:

xml/Valid.java

        DocumentBuilder db = df.newDocumentBuilder();
        if (validate != db.isValidating())
          System.err.println("validate/validating mismatch");
        Document doc = db.parse(System.in, cwd.toString());
      }
    }

    $ java -classpath .:jaxp.jar:crimson.jar Valid < hello.xml
    $ java -classpath .:jaxp.jar:crimson.jar -Dvalidate=true Valid < hello.xml
    Warning: validation was turned on but an org.xml.sax.ErrorHandler was not
    set, which is probably not what is desired. Parser will use a default
    ErrorHandler to print the first 10 errors. Please call
    the 'setErrorHandler' method to fix this.
    Error: URI=file:/Volumes/Axel/axel/Documents/Vorlesungen/plc/code/xml/ Line=3: Element type "hello" is not declared.
    $ java -classpath .:jaxp.jar:crimson.jar -Dvalidate=true Valid < pets.xml
4.2 DTD (simplified)

A DTD can control possible attribute names and restrict element
nesting.

XML Schema is likely to succeed DTDs, not least because a DTD
has almost no provisions to restrict attribute values or text in
elements. Unfortunately, XML Schema is XML-based and views a
document more through datatype semantics and less through syntactic
means, which makes it harder to see what is legal.

Here is an external DTD for the pets:

pets.dtd

    <!ELEMENT pets (dogs?, cats?)>
    <!ELEMENT dogs (dog+)>
    <!ELEMENT dog (color)>
    <!ATTLIST dog
      breed (Labrador) #REQUIRED
    >
    <!ELEMENT cats (cat+)>
    <!ELEMENT cat (color)>
    <!ATTLIST cat
      breed (Siamese|Burmese|Tortoiseshell) #REQUIRED
    >
    <!ELEMENT color (#PCDATA)>
Elements

ELEMENT describes what content is legal for a particular element
name:

    <!ELEMENT a EMPTY>                     empty element
    <!ELEMENT b ANY>                       arbitrary well-formed content
    <!ELEMENT c (#PCDATA)>                 text content
    <!ELEMENT d (#PCDATA)*>                text content
    <!ELEMENT e (#PCDATA | a)*>            text mixed with many elements
    <!ELEMENT f (a)?>                      optional nested element
    <!ELEMENT g (a)+>                      one or more nested elements
    <!ELEMENT h (a)*>                      any number of nested elements
    <!ELEMENT f (a | b? | c+ | d*)>        choice of elements
    <!ELEMENT f (a , b? , c+ , d*)>        nested elements in order

The content models can be nested:

    <!ELEMENT nest (( a, b ) | ( c, d ))+ >

and the result can be ambiguous:

    <!ELEMENT bits (( n, e )*, (n, e )*) >

One of the most significant shortcomings of DTDs is that the mixed
content model cannot specify the order or number of elements nested
in between text.

Attributes

ATTLIST describes legal attributes and tries to restrict the values.
Attributes may appear at most once but in any order.

    <!ATTLIST element                      explains for which element
      a CDATA "default value"              text value with default
      b CDATA #FIXED "only value"          text with mandatory value
      c CDATA #IMPLIED                     optional text
      d CDATA #REQUIRED                    mandatory text
      e (u | v | w) #REQUIRED              one of three names
      f ID #REQUIRED                       unique name
      g IDREF #REQUIRED                    reference to unique name
    >

Attributes could be declared more than once; the first declaration
is binding. An element can only have one ID.

One of the most significant shortcomings of DTDs is that they cannot
specify attribute value types or element text content types.

For a document type designer it is not really clear when to choose
text content or text attributes.
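An element content model is essentially a regular expression over the names of the nested elements. The following Python sketch makes this concrete by translating a content model into a Python regular expression and matching it against a sequence of child names. It is a simplified illustration: the function names are hypothetical, and #PCDATA, EMPTY and ANY are deliberately not handled.

```python
import re

# A token is an element name or one of the operators of a content model.
TOKEN = re.compile(r"\s*(?:(?P<name>[A-Za-z_][\w.-]*)|(?P<op>[()|?*+,]))")

def model_to_regex(model):
    """Translate an element content model into a compiled regular expression."""
    model = model.strip()
    parts = []
    position = 0
    while position < len(model):
        match = TOKEN.match(model, position)
        if match is None:
            raise ValueError("cannot translate: " + model[position:])
        if match.group("name"):
            # Each child name is matched together with a trailing comma.
            parts.append("(?:" + re.escape(match.group("name")) + ",)")
        elif match.group("op") != ",":
            # ( ) | ? * + mean the same in a regular expression;
            # the sequence operator , becomes plain concatenation.
            parts.append(match.group("op"))
        position = match.end()
    return re.compile("".join(parts))

def content_matches(model, children):
    """True if the sequence of child element names conforms to the model."""
    encoded = "".join(name + "," for name in children)
    return model_to_regex(model).fullmatch(encoded) is not None
```

For example, content_matches("(a , b? , c+ , d*)", ["a", "c", "c"]) holds, while a lone ["a"] is rejected because c+ requires at least one c.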
5 Little Smalltalk Cribsheet
Little Smalltalk is an old, text-only implementation by Tim Budd
which is nevertheless useful to study the language structure.

comment

    "comment"

literals

simple computations

    1 + 2 4 print. 5 1 + 2; print. 3 hello, , world a <- 9

control structures

    ( x ~= y ) ifTrue: [t] ifFalse: [f]
    [ x < y ] whileTrue: [x <- x + 1 ]
    b <- [:parm| parm print]
    b value:3. b value:5
    (1 to: 5 by: 2) do: b

array operations

    a <- #(1 2)
    (a grow: 3) grow: 4
    a select: [:each| true]

class management

structure of a method
6 Syntax Analysis
6.1 Grammars

A grammar consists of a finite set of nonterminal symbols, a finite
set of terminal symbols, a start symbol which is a nonterminal, and a
finite set of rules. A rule is an ordered pair of sequences of
nonterminal and terminal symbols. For example:

    nonterminals: a, b
    terminals:    c, d
    start symbol: a
    rules:        (ac, ab), (a, )

Chomsky distinguishes four different kinds of grammars based on the
structure of the rules.

In a context-free grammar each rule must have a single nonterminal
as the first sequence. It is known that a push-down automaton (PDA,
stack machine) is sufficient for recognition. For example, the
grammar above is not context-free, but with the following rules it
is:

    rules:        (a, b), (a, ac), (b, d)

In a regular grammar, a rule consists either of a nonterminal and a
terminal or of a nonterminal and a sequence consisting of a
nonterminal and a terminal. A finite state automaton (FSA) can be
constructed from the grammar and perform recognition. For example,
the grammar above with the context-free rules is in fact regular.

It turns out that the pattern matching performed by commands like
grep can (for the most part) be done with an FSA. This is where the
regular expressions describing the patterns got their name.
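To make the definition concrete, the grammar with the context-free rules (a, b), (a, ac), (b, d) can be written down as data, and its sentences can be derived mechanically by replacing the first sequence of a rule with the second. The following Python sketch does this by breadth-first search up to a length limit; the function name sentences and the length limit are assumptions of this illustration.

```python
import re
from collections import deque

NONTERMINALS = {"a", "b"}
START = "a"
RULES = [("a", "b"), ("a", "ac"), ("b", "d")]   # pairs of sequences

def sentences(max_length):
    """All sequences of terminals of at most max_length symbols
    that can be derived from the start symbol."""
    found = set()
    seen = {START}
    queue = deque([START])
    while queue:
        form = queue.popleft()
        if not any(symbol in NONTERMINALS for symbol in form):
            found.add(form)             # only terminals left
            continue
        for left, right in RULES:
            index = form.find(left)
            while index != -1:
                derived = form[:index] + right + form[index + len(left):]
                # No rule here shortens a form, so long forms can be dropped.
                if len(derived) <= max_length and derived not in seen:
                    seen.add(derived)
                    queue.append(derived)
                index = form.find(left, index + 1)
    return sorted(found)

short_sentences = sentences(3)

# The same language as a regular expression: one d followed by any number of c.
all_regular = all(re.fullmatch(r"dc*", s) for s in short_sentences)
```

The derived sentences are d followed by zero or more c, which is exactly what the regular expression dc* describes.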
6.2 Trees

The start symbol of a grammar produces syntax trees: Nodes are
nonterminals or terminals. If a node has descendants, the node must
be a nonterminal and there must be a rule consisting of the
nonterminal and the (ordered) sequence of descendants. For example:

    [Figure: syntax tree with root a]

The ordered sequence of all terminal leaves of a syntax tree (with
the start symbol as a root) is called a sentence. In this example
the sentence is dc.

A language is a set of sentences, i.e., sequences of terminals. A
grammar for a language must exactly produce all sentences. A
language can have more than one grammar.

A grammar is called ambiguous if there is a sentence for which
there is more than one syntax tree. For example:

    nonterminals: sum
    terminals:    number, +
    start symbol: sum
    rules:        (sum, sum + sum), (sum, number)

    [Figure: two syntax trees with root sum]

There is no easy way to prove that a grammar is not ambiguous, but
there are sufficient conditions which can be checked by programs.
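The ambiguity of the sum grammar can be made visible by enumerating all syntax trees for a sentence. The following Python sketch represents a tree as a nested tuple and builds every tree for a sentence with a given number of operands; the function name trees and the tuple representation are assumptions of this illustration.

```python
def trees(operands):
    """All syntax trees for number + number + ... with the given number
    of operands, using the rules (sum, sum + sum) and (sum, number)."""
    if operands == 1:
        return ["number"]                       # rule (sum, number)
    result = []
    for left_size in range(1, operands):        # rule (sum, sum + sum)
        for left in trees(left_size):
            for right in trees(operands - left_size):
                result.append((left, "+", right))
    return result

two_trees = trees(3)     # number + number + number
```

For number + number + number there are two trees, one grouping to the left and one to the right, so the grammar is ambiguous. The number of trees grows quickly with the length of the sentence.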
6.3 Backus-Naur-Form

BNF is a language for rule specification. yacc and similar programs
use BNF as part of their input:

bnf/sum.y

    %left '+'       /* defines associativity to disambiguate */
    %start sum      /* defines start symbol (if not first) */
    %%              /* separates parts of the input */
    sum : sum '+' sum
        | number
        ;

Single characters in quotes are terminals. The first part of the file
contains %token lines to define terminal symbols. %start defines the
start symbol. %left and %right lines specify a table of increasing
precedence and associativity for operators; this can help to make a
grammar non-ambiguous.

BNF can be used to define BNF:

bnf/bnf.y

    sequence: /* empty */
        | sequence nonterminal
        | sequence terminal
        ;

Rules with the same nonterminal are normally combined; the
alternative sequences are separated using |.
Using yacc

yacc checks if a grammar is LALR(1) and thus non-ambiguous:

    $ yacc bnf.y
    $ yacc sum.y

yacc produces a function yyparse() that calls a function yylex()
for symbols and a function yyerror() when it detects that the
sequence of symbols cannot be a sentence. The functions can be in
the third part of the input to yacc:

bnf/sum.y

    yyerror (char *s) {
      fprintf(stderr, "%s\n", s), exit(1);
    }

    int yylex () {
      int ch;
      for (;;)
        if ((ch = getchar()) == EOF)
          return 0;
        else if (isspace(ch))
          continue;
        else if (isdigit(ch)) {
          while ((ch = getchar()) != EOF && isdigit(ch))
            ;
          ungetc(ch, stdin);
          return number;
        } else
          return ch;
    }

yylex() returns zero at end of file, the symbol names which yacc
defines as constants, or the character values for single character
terminals. lex can be used to generate yylex() from a table of
regular expressions and actions programmed in C.

The result is a program to recognize sentences:

    $ yacc sum.y
    $ cc -o sum y.tab.c
    $ sum
    1 + 2
    ^D
    $ sum
    1 - 2
    syntax error
The program can be extended to do something during recognition.
yylex() can assign a value to yylval when a symbol is recognized:

bnf/adder.y

    int yylex () {
      int ch;
      for (;;)
        if ((ch = getchar()) == EOF)
          return 0;
        else if (isspace(ch))
          continue;
        else if (isdigit(ch)) {
          char buf [BUFSIZ], *bp = buf;
          do
            *bp ++ = ch;
          while ((ch = getchar()) != EOF && isdigit(ch));
          ungetc(ch, stdin);
          *bp = '\0';
          yylval = atoi(buf);
          return number;
        } else
          return ch;
    }

Here the value is an int, but a different type can be declared
through yacc.

Following each alternative there can be an action: C code in
braces.

bnf/adder.y

    sum : sum '+' sum { $$ = $1 + $3; }
        | number      { $$ = $1; }
        ;

The code uses variables $1, $2 etc. to access the values
corresponding to the elements of the sequence. $$ is a variable
corresponding to the nonterminal, i.e., to the result of
recognizing the sequence. The action is executed once the sequence
is recognized.

    $ yacc adder.y
    $ cc -o adder y.tab.c
    $ adder
    1 + 2
    3

jay is yacc retargeted to Java. In this case the values are
objects, the actions are programmed in Java, and
java.io.StreamTokenizer is a quick way to assemble symbols from
input characters.
6.4 Extended BNF

BNF has to use recursion to express repetition. Extended BNF uses a
notation to express repetition. For example, a DTD uses

    ( )    grouping
    *      zero or more occurrences
    +      one or more occurrences
    ?      zero or one occurrence

More commonly, braces { } denote repetition and brackets [ ] denote
an optional part.

Wirth defined Pascal using syntax graphs, a graphical representation
of EBNF. For example:

    [Figure: syntax graph]

A rule is a graph labeled with the nonterminal. Rectangular nodes
reference other graphs, round nodes contain terminals. Edges
connect the nodes in the order of the rule's sequence; just as in a
flow diagram there can be alternatives and repetitions.
(Nassi-Shneiderman structograms suggest a topologically sounder way
to sketch this.)

EBNF can easily be translated into a hand-coded recognizer
consisting of functions that call each other recursively; one
function for each graph.

However, recognition only works if, considering the graph as a road
map, it is clear at each intersection how to proceed without
backtracking. This condition is known as LL(1) and the recognizer
algorithm is called recursive descent.

XML parsers demonstrate that EBNF can be used to define a grammar
from which a recognizer can be mechanically produced. Our parser
generator oops accepts EBNF and produces a recognizer in Java or
C#.
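The recursive descent technique described in this section can be sketched for the sum grammar. The Python example below assumes the EBNF rule sum : number { '+' number } as a replacement for the recursive BNF rule, uses a generator in the role of yylex(), and adds up the numbers during recognition like adder.y. The names tokens and parse_sum are hypothetical; this is an illustration, not the output of yacc, jay, or oops.

```python
def tokens(text):
    """Yield ("number", value) or (character, None), then ("EOF", None)."""
    index = 0
    while index < len(text):
        ch = text[index]
        if ch.isspace():
            index += 1
        elif ch.isdigit():
            end = index
            while end < len(text) and text[end].isdigit():
                end += 1
            yield ("number", int(text[index:end]))
            index = end
        else:
            yield (ch, None)
            index += 1
    yield ("EOF", None)

def parse_sum(text):
    """Recognize  sum : number { '+' number }  and return the total."""
    stream = tokens(text)
    kind, value = next(stream)
    if kind != "number":
        raise SyntaxError("number expected")
    total = value
    kind, value = next(stream)
    while kind == "+":                 # the repetition { '+' number }
        kind, value = next(stream)
        if kind != "number":
            raise SyntaxError("number expected")
        total += value
        kind, value = next(stream)
    if kind != "EOF":
        raise SyntaxError("unexpected " + kind)
    return total
```

At every point the next symbol alone decides how to proceed, which is the LL(1) condition; parse_sum("1 + 2") returns 3 and parse_sum("1 - 2") raises a SyntaxError.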
7 Pure Lisp
Lisp dates back to John McCarthy in the late 1950s. At the time it
had no peer as a language for symbolic computation. Current
dialects Common Lisp and Scheme are used, e.g., for programming in
artificial intelligence. The dialects are quite different from each
other.
Common Lisp the Language, 2nd ed. by Guy Steele is the definitive
book on the language. Common Lisp: A Gentle Introduction to
Symbolic Computation by David S. Touretzky is a good introduction.
Both are available online.
There is a tutorial by Colin Allen and Maneesh Dhagat. There is an
overview of some Lisp words by Marty Hall who also provides a
tutorial and lots of links to Lisp materials on his web page.
Lisp is based on very few, primitive concepts and has many, many
predefined words.
This introduction centers only on Common Lisp and on functional
programming using a subset of words as suggested by Andrew
Kitchen.
7.1 GNU Common Lisp

GNU Common Lisp is a more or less portable, free implementation of
Common Lisp:

    $ gcl                        start the interpreter
    > (help)                     access the help system
    > (load "filename")          load a file
    > (trace name)               trace a function
    > (untrace name)
    >>                           prompt if something is amiss
    >> :q                        return to top level in interpreter
    > (bye)                      leave interpreter; also happens at end of file

[Based on information from Edith Hemaspaandra.]

7.2 GNU CLisp

GNU CLisp is a more portable, free implementation of Common Lisp
with line-editing features:

    $ ~ats/bin/clisp             start the interpreter
    [1]> help                    access the help system
    [2]> (load "filename")       load a file
    [3]> (edit-file "filename")  edit a file using vi
    :set ai sm lisp              turn on auto indent, parentheses matching,
                                 and other lisp stuff
    > (trace name)               trace a function
    > (untrace name)
    1. Break [4]>                prompt if something is amiss
    1. Break [4]> Abort          return to top level in interpreter;
                                 also happens at end of file
    > (bye)                      leave interpreter; also happens at end of file
7.3 First Steps

White space is ignored. Comments range from a semicolon to the end
of a line.

    ( ) , ; : " \ | #

Lines are executed as they are entered (read eval loop). Some
things represent themselves:

    $ gcl
    > "string"
    "string"
    > nil                        ; another way to represent false
    NIL
    > 10                         ; numbers: integers, floating point, rational
    10
    > (quote axel)               ; one way to make a symbol
    AXEL
    > 'axel                      ; another way to make a symbol
    AXEL
    > '(1 2 3 4)                 ; quoting a list
    (1 2 3 4)

Lists are constructed from atoms and lists. The empty list () is
also an atom.

Lists are executed like function calls with a prefix notation.
Arguments are usually passed by value:

    > (* 1/3 (+ 1/5 0.3))        ; mixed arithmetic
    0.16666666666666666
    > (+ 1 2 3 4)                ; n-ary operation
    10
    > (let ((nm 10)              ; locally bind a value to an atom
           )
        nm                       ; and use it
      )
    10

There are words for global assignment (setf and setq); however,
they should be offensive to Lisp purists.

There are also dotted pairs, vectors, and other aggregates, all
impure.
7.4 Functions

A function receives the argument values and returns a value:

    (append list..)       concatenation of all contents of all arguments or nil
    (cons x list)         constructs new list with x inserted before contents
                          of list
    (eval x)              value of x (i.e., the value of x is evaluated again)
    (first list)          first element from list or nil
    (car list)
    (funcall name x..)    result of applying name to the arguments x; name
                          must be bound to a result of function or lambda;
                          for gcl lambda must be quoted
    (length list)         number of elements
    (list x..)            list with x as elements or nil
    (max num ..)          maximum of the arguments
    (min num ..)          minimum of the arguments
    (print x)             outputs and returns x
    (princ x)             without string quotes and preceding newline
    (rest list)           list without first element or nil
    (cdr list)
    (+ num..)             sum of all arguments or 0
    (- num ..)            cumulative difference of all arguments
    (* num..)             product of all arguments or 1
    (/ num ..)            cumulative quotient of all arguments

7.5 Predicates

A predicate is a function, i.e., it receives the argument values,
and it returns t or nil:

    (atom x)              t if argument is atom
    (eql x y)             t if equal atoms or identical lists
    (equal x y)           t if equal atoms or equal lists
    (evenp num)           t if argument is even
    (not x)               t if x is nil (i.e., empty list)
    (null x)
    (oddp num)            t if argument is odd
    (= num ..)            t if all arguments are equal
    (< num ..)            t if all arguments form numerically ordered sequence
    (> num ..)
    (<= num ..)
    (>= num ..)

7.6 Macros

A macro simplifies writing. It decides which arguments, if any, to
evaluate:

    (and x..)             evaluates arguments in order until nil value or
                          last argument; returns nil or last value or t
    (or x..)              evaluates arguments in order until non-nil value;
                          returns non-nil value or nil
7.7 Special Forms

Lisp expressions, atoms or lists, are usually called forms. Special
forms are built into the language and, just like macros, decide
which arguments, if any, to evaluate:

    (cond (test body..) .. )
        evaluates each test until first non-nil; evaluates each body
        after that test; returns value of last body or non-nil test or
        nil

    (defun name (name..) body.. )
        defines a function with a global name and local parameter names
        and the forms to be executed when the function is called;
        arguments and parameters should match

    (function name)
    #'name
        return the function bound to name by defun; if this is bound to
        another name it can be used in funcall

    (lambda (name..) body.. )
        defines a lambda expression which can be applied to arguments
        using funcall; when called arguments are bound to each name and
        then each body is evaluated; nil or the last value is returned

    (let ((name init).. ) body.. )
        simultaneously binds each name locally to the value of init or
        to nil; sequentially evaluates each body; returns last value or
        nil

    (quote x)
    'x
        return x unevaluated

7.8 First Programs

There are only two control structures: cond handles decisions and
defun is used to define a function with a (global) name. Functions
are called recursively to handle repetition. Because the body is not
evaluated while the function is defined, no predeclaration is
necessary.

lisp/euclid.cl

    (defun euclid (x y)                 ; Euclid's algorithm
      (cond ((> x y) (euclid (- x y) y))
            ((< x y) (euclid x (- y x)))
            (t x)
      )
    )

lisp/factorial.cl

    (defun factorial (n)                ; ubiquitous recursion
      (cond ((<= n 1) 1)
            (t (* n (factorial (- n 1))))
      )
    )
lisp/rev.cl

    (defun rev (list)                   ; reverse a list
      (cond ((atom list) list)
            ((null list) list)
            (t (append (rev (cdr list)) (list (car list))))
      )
    )
append concatenates lists, cons inserts an atom before a list, and
list makes a list from elements. car (first) and cdr (rest) produce
an atom and a list, respectively.
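For readers more familiar with other languages, the list words used by rev can be imitated in Python. The sketch below represents Lisp lists as Python lists; this is an assumption made for illustration only (real Lisp lists are chains of pairs, and (car nil) is nil, whereas car([]) fails here).

```python
def car(items):
    return items[0]                 # first element

def cdr(items):
    return items[1:]                # list without first element

def cons(x, items):
    return [x] + items              # new list with x in front

def append(*lists):
    result = []
    for items in lists:
        result.extend(items)        # concatenate the contents
    return result

def rev(items):
    # Atoms and the empty list are returned unchanged, as in rev.cl.
    if not isinstance(items, list) or not items:
        return items
    return append(rev(cdr(items)), [car(items)])

reversed_list = rev([1, 2, 3, 4])
```

As in the Lisp version, only the top level is reversed; nested lists keep their internal order.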
Trees, specifically algebraic expressions, can be represented as
nested lists:

lisp/naive-infix.cl

    (defun naive-infix (f)              ; render an algebraic expression
      (cond ((atom f)                   ; render atom as is
              (princ f))
            ((= (length f) 1)           ; render first of singleton
              (princ (car f)))
            ((= (length f) 2)           ; render first and second of pair
              (princ (car f))
              (princ (car (cdr f))))
            (t                          ; recursively render rest with parentheses
              (princ "(")
              (naive-infix (car (cdr f)))
              (princ ")")
              (princ (car f))
              (princ "(")
              (naive-infix (car (cdr (cdr f))))
              (princ ")"))
      )
    )

    $ clisp
    [1]> (load "naive-infix.cl")
    [2]> (naive-infix '(* (+ a b) c))
    ((A)+(B))*(C)
    ")"
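The same tree traversal can be sketched in Python, with nested tuples standing in for nested lists and a returned string instead of output through princ. Both choices are assumptions of this illustration, and the function name naive_infix is hypothetical.

```python
def naive_infix(f):
    """Render a prefix expression such as ("*", ("+", "a", "b"), "c")."""
    if not isinstance(f, (list, tuple)):
        return str(f)                           # render atom as is
    if len(f) == 1:
        return str(f[0])                        # first of singleton
    if len(f) == 2:
        return str(f[0]) + str(f[1])            # first and second of pair
    # operator between its two parenthesized, recursively rendered operands
    return "(" + naive_infix(f[1]) + ")" + str(f[0]) + "(" + naive_infix(f[2]) + ")"

rendered = naive_infix(("*", ("+", "a", "b"), "c"))
```

The result corresponds to the Lisp session above, except that Lisp prints symbols in upper case.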
7.9 Scope

let and lambda define local names. defun defines local parameter
names and a global function name. defun can even be used to
dynamically redefine its own function:

lisp/gcd-lcm.cl

    (defun gcd-lcm (x y)                ; a function with a mind of its own
      (defun euclid (x y)               ; greatest common divisor
        (cond ((> x y) (euclid (- x y) y))
              ((< x y) (euclid x (- y x)))
              (t (defun gcd-lcm (x y) (dijkstra x y y x))
                 x)
        )
      )
      (defun dijkstra (x y u v)         ; least common multiple
        (cond ((> x y) (dijkstra (- x y) y u (+ u v)))
              ((< x y) (dijkstra x (- y x) (+ u v) v))
              (t (defun gcd-lcm (x y) (euclid x y))
                 (/ (+ u v) 2))
        )
      )
      (euclid x y)
    )

    $ clisp
    [1]> (load "gcd-lcm.cl")
    [2]> (gcd-lcm 36 54)
    18
    [3]> (gcd-lcm 36 54)
    108
    [4]> (gcd-lcm 36 54)
    18
    [5]> (dijkstra 36 54 54 36)
    108

Pure Lisp should be side-effect free; after all, there is nothing
like an assignment.

gcd-lcm is a rather extreme counterexample: Redefining gcd-lcm from
within gives the function state that survives the function call in
the conventional fashion, namely by assigning to gcd-lcm as a
global variable.
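The behavior of gcd-lcm can be imitated in Python. In the sketch below the redefinition through defun is replaced by an explicit one-element list holding the current implementation; this substitution, and the name current, are assumptions of the illustration. The two algorithms are transcribed from the Lisp code above.

```python
def euclid(x, y):
    # greatest common divisor by repeated subtraction
    if x > y:
        return euclid(x - y, y)
    if x < y:
        return euclid(x, y - x)
    current[0] = lambda x, y: dijkstra(x, y, y, x)   # next call computes lcm
    return x

def dijkstra(x, y, u, v):
    # least common multiple, carried along in u and v
    if x > y:
        return dijkstra(x - y, y, u, u + v)
    if x < y:
        return dijkstra(x, y - x, u + v, v)
    current[0] = euclid                              # next call computes gcd
    return (u + v) // 2

current = [euclid]      # stands in for the global function binding

def gcd_lcm(x, y):
    return current[0](x, y)

results = [gcd_lcm(36, 54), gcd_lcm(36, 54), gcd_lcm(36, 54)]
```

Successive calls with the same arguments alternate between the greatest common divisor and the least common multiple, which is exactly the hidden state that the text objects to.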
7.10 Function Pointers

plot is a very trivial function plotter that accepts a function as
a parameter:

lisp/plot.cl

    (defun plot (f from to step x-offset) ; plot function in a range, shifted over
      (defun plot-1 (x)                 ; plot a line -- should be made local
        (cond ((> x 0) (princ " ") (plot-1 (- x 1)))
              (t (princ "*"))
        )
      )
      (defun plot-curve (x)             ; plot curve -- should be made local
        (print x)
        (plot-1 (+ x-offset (funcall f x)))
        (let ((next (+ x step))
             )
          (cond ((<= next to) (plot-curve next)))
        )
      )
      (plot-curve from)
    )

    $ clisp
    [1]> (load "plot.cl")
    [2]> (defun sqr (x) (* x x))
    [3]> (plot (function sqr) 1 5 1 10)
    1            *
    2               *
    3                    *
    4                           *
    5                                    *
    NIL
    [4]> (plot (lambda (x) (* x x)) 1 5 1 10)
    1            *
    2               *
    3                    *
    4                           *
    5                                    *
    NIL
    [5]> (plot-curve 0)
    0           *
    1            *
    2               *
    3                    *
    4                           *
    5                                    *
    NIL

lambda creates an anonymous function similar to using function to
retrieve a function defined with defun. For gcl, lambda must be
quoted. The last example shows that the locally defined global
functions are bound to their lexical environment, i.e., plot-curve
is defined bound to whatever argument values were last given to
plot.
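Passing a function as a parameter works the same way in most modern languages. The following Python sketch follows plot.cl but returns the plotted lines as strings instead of printing them, and it assumes that the function values are integers; the parameter names are chosen for this illustration. It does not reproduce the redefinition of global helper functions discussed above.

```python
def plot(f, start, stop, step, x_offset):
    """Return one line per x: the value of x, then a star shifted
    right by x_offset + f(x) positions."""
    lines = []
    x = start
    while True:                       # like plot-curve: plot first, then test
        lines.append(str(x) + " " + " " * (x_offset + f(x)) + "*")
        x += step
        if x > stop:
            break
    return lines

def sqr(x):
    return x * x

named = plot(sqr, 1, 3, 1, 2)                     # like (function sqr)
anonymous = plot(lambda x: x * x, 1, 3, 1, 2)     # like (lambda (x) (* x x))
```

Both calls produce the same lines, since a named function and an anonymous function are interchangeable as arguments.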