Programming Language Concepts - Department of Computer Science

Programming Language Concepts
Department of Computer Science
Rochester Institute of Technology
This volume contains copies of the overhead slides used in class. This information is available online as part of the World Wide Web; it contains hypertext references to itself and to the documentation for various programming languages on the Web. The example programs are included in this text from the original sources.
So that it may be viewed on other platforms, this text also exists as a PDF document. With the Acrobat Reader from Adobe the text can be printed on Windows systems.
The text is not a complete transcript of the lectures. A rudimentary knowledge of some programming languages is assumed; for self study one would have to consult introductory books on programming, programming languages, and compiler construction.
A major part of this course was a series of reports on numerous programming languages given by students, followed by programming assignments in the more prominent of these languages. During the presentations, common and unique concepts of different languages were related and contrasted with each other. The assignments tried to exhibit simple but typical uses of each language through progressively more difficult problems.
Contents
0 Introduction
1 Timeline
2 Rosetta Stone
3 bc
4 XML
5 Little Smalltalk Cribsheet
6 Syntax Analysis
7 Pure Lisp
Adobe FrameMaker, Distiller. OmniGraffle is used for the drawings. The slides are available on the World Wide Web.
Today there are lots of programming languages and even more books about them. Some books deal with single languages, others compare languages or discuss the history of language development. Here are typical ones:

Flanagan              Java in a Nutshell (3rd Edition)    1-56592-487-8
Kernighan/Ritchie     The C Programming Language          3-446-15497-3
Schreiner/Friedman    Compiler Construction
Wexelblat             History
Wirth                 Pascal
Wirth                 Modula
Wirth                 Oberon

There is also a useful Web site: http://dmoz.org/Computers/Programming/Languages/.
1 Timeline
Here is a brief list of programming languages that are significant for one reason or another and that are/were used by many people or for a long time:

around 1960
Algol 60       IFIP, block structure, recursion, BNF syntax             dead
COBOL          business data processing                                 legacy
COMIT          MIT, string processing                                   Snobol
Fortran        IBM, workhorse for numerical problems                    updated
Lisp           McCarthy, workhorse for artificial intelligence          updated

around 1965
APL            Iverson, numerical mathematics                           J
Basic          Dartmouth, interactive programming                       updated
PL/I           IBM, attempt at a universal language                     dead
Simula         Dahl et al., simulation, class concept                   SmallTalk
Snobol         Griswold, string processing                              Icon

around 1970
Algol 68       very formally defined, updated successor to Algol 60     stillborn
C              Ritchie et al., workhorse for systems programming
Pascal         Wirth's first for academia                               Delphi

around 1975
awk            Aho et al., scripting, report generation                 Perl
bc             Unix, high precision interactive calculator
m4             Unix, macro processor
Prolog         Clocksin et al., declarative programming                 dying

around 1980
Icon           Griswold's successor to Snobol                           exotic
Modula         Wirth's first for modular programming in academia        exotic
Objective C    Cox, NeXT, C with object orientation                     MacOS X
SGML           Goldfarb et al., document markup                         XML
sh             Bourne et al., Unix command languages
SmallTalk 80   PARC, proverbial object orientation

around 1985
Ada            DoD, designed through requirements                       updated
C++            Stroustrup, C with object orientation                    dying

around 1990
Oberon         Wirth's best C ever                                      exotic

around 1995
Java           Gosling et al., selection from C++ and Objective C

around 2000
C#             Microsoft's own brand of Java, common runtime
Haskell        Thompson, functional programming
XML            W3C, structured data representation
XSLT           W3C, functional language for XML manipulation
2 Rosetta Stone
In the early 1970s Bill Wulf suggested creating a Rosetta Stone for programming languages. He had a serious comparison in mind; http://internet.ls-la.net/mirrors/99bottles/ is much less so. The following tables are derived from Wulf's project. They serve as a starting point for comparing programming languages from a lexical and syntactic (looks) and semantic (meaning) point of view.
2.1 Lexical Aspects

punched cards, some columns special    COBOL, Fortran
line oriented                          Basic, machine languages (assembler)
free format                            most current languages
white space ignored                    Basic, Fortran

Comments
part of executing program              APL, Basic
positional on line                     Assembler
positional within phrases              Algol, between parameters
like a statement
" comment "
-- comment

Delimiters
end of line                            awk (mostly), Basic, Fortran
semicolon as separator                 Pascal
semicolon as terminator                C

Keywords
distinguished by position              Basic
distinguished by context               Fortran, PL/I
distinguished by typeface              Algol
stropping                              .begin end
reserved                               Pascal

Names
Names tend to consist of letters and digits, but what is a letter or a digit?
Scalars
Aggregates
2.3 Literals
A literal is a self-defining constant, i.e., the textual appearance must indicate type and value.
Is there a literal for every data type?
Scalars
Aggregates
fixed decimal(2)
^integer
procedure
[] of integer
record ... end
file of record ... end
integer, various bases
1.0 1e2 1.0 1e2
2.4 Variables
A variable can be viewed as a binding between a name and a value, which in many programming languages can be changed while a program is executed.
A constant can be viewed as a variable where the binding cannot be changed.
Are there variables/constants for every data type supported by the language?
x := x + 1;
This typical assignment statement shows that the meaning of a variable name usually depends on context: it delivers a typed value (r-value) or it is the target of rebinding (l-value).
How does the language decide on the type of a variable?

no typing         each operation interprets a value          usually in machine languages
strong typing     variables have predeclared attributes      usually in compiled languages
dynamic typing    (re)bound value determines current type    often in interactive languages

2.5 Subroutines
A procedure is a name for a sequence of statements. A function is a name for a sequence of statements which computes a value.
Which does the language have?

Properties
Which data types can a function deliver as a result?
Is recursion allowed, forbidden but detected, or simply forbidden?
Can global variables be changed by a subroutine?
Can subroutines be programmed in other languages?
Parameters
Can subroutines have parameters?
Which data types can be used for parameters and for arguments?
Which conversions are implicit, if any?
How are arguments and parameters related:

by value           argument value is copied
by value return    parameter value is copied back
by address         parameter aliases any argument
by reference       parameter aliases only variable as argument
by name            effectively, text of argument is used in place of parameter

Does caller or subroutine decide how a parameter is passed?

2.6 Scope
Where are variable names and other user-defined names visible in a program?

flat            Basic
local/global    awk
nested          Algol
modular         Modula, export/import

Do (all?) names have to be declared before they are used?

2.7 Lifetime
How long is a value available?

as long as program                 C static
as long as subroutine is active    Algol
as long as referenceable           Lisp
as long as program decides         C malloc/free
break,
continue,
on error begin ... end
loop ... exitif ... end
dependent on range
fork() synchronization
3 bc
bc (basic calculator) originally appeared with Unix. It is a simple programming language with high precision arithmetic. It was originally implemented as a frontend for dc (desk calculator), which implemented a stack machine. POSIX bc is less elaborate than GNU bc.
bc is powerful enough to implement mathematical functions: sine s(x), cosine c(x), arctangent a(x), natural logarithm l(x), exponential e(x), and Bessel function j(x,y).
3.1 Lexical Aspects
Names
Single lowercase ASCII letters (or alphanumeric as an extension).

newline is statement separator and it starts execution
    12; 34; 56
    12
    34
    56

asymmetrically marked
    /* comment */

semicolon as separator [there is no else]

reserved
    x=7
    if=12
    (standard_in) 2: parse error
    x
    7
ibase and obase control input and output, scale controls computation.
Numbers are converted to decimal and have arbitrary precision. length is the total number of significant digits, scale is the number of decimal digits to the right of the decimal point.

    y=1.001
    length(y)
    4
    scale(y)
    3
    x = 0.001
    length(x)
    3
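The two measures can be sketched in Python with the decimal module. This is only an illustration that reproduces the results shown above; the function names bc_scale and bc_length are made up for this sketch, and real bc implementations may differ in corner cases.

```python
from decimal import Decimal


def bc_scale(text: str) -> int:
    """Number of decimal digits to the right of the decimal point."""
    exponent = Decimal(text).as_tuple().exponent
    return max(0, -exponent)


def bc_length(text: str) -> int:
    """Total number of significant digits, never less than the scale."""
    digits = Decimal(text).as_tuple().digits
    return max(len(digits), bc_scale(text))


print(bc_length("1.001"), bc_scale("1.001"))  # 4 3
print(bc_length("0.001"))                     # 3
```

Taking the maximum with the scale is what makes 0.001 report three digits although only one of them is non-zero.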
Strings can only be used as literals.
Arrays are managed dynamically and contain numbers.
    a[7] = 3; x = 7; a[0]; a[x]
    0
    3
Numbers as sequences of digits and uppercase letters A through F, with or without decimal point, interpreted in ibase (between 2 and 16), printed in obase.
    ibase=2; 110
    6
    ibase=1010; 81
    81
    ibase=2; obase=10000; 11111
    1F

Single digits define themselves, irrespective of ibase. Too-large digits yield ibase-1.

    ibase=2; F
    15
    1234
    15
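The digit rules can be sketched in Python. This is a minimal illustration for non-negative integers without a decimal point; parse_bc_integer is a hypothetical helper, not part of bc.

```python
DIGITS = "0123456789ABCDEF"


def parse_bc_integer(text: str, ibase: int) -> int:
    """Interpret a digit string the way the rules above describe."""
    if len(text) == 1:
        # A single digit defines itself, irrespective of ibase.
        return DIGITS.index(text)
    value = 0
    for ch in text:
        # A digit that is too large for the base counts as ibase-1.
        digit = min(DIGITS.index(ch), ibase - 1)
        value = value * ibase + digit
    return value


print(parse_bc_integer("110", 2))   # 6
print(parse_bc_integer("F", 2))     # 15
print(parse_bc_integer("1234", 2))  # 15, read as 1111 in base 2
```

The input 1010 in base 2 evaluates to ten, which is why ibase=1010 switches back to decimal input in the session above.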
Strings as sequences of characters within doublequotes.

    "hello "
    hello
bc has functions that can be used as procedures. Definitions cannot be nested. Recursion is allowed.

    define f(n) {
      if (n > 1) return (n*f(n-1))
      return (1)
    }
    f(10)
    3628800
    define b(n,k) {
      return (f(n)/f(k)/f(n-k))
    }
    b(4,2)
    6

There are local variables, initialized with zero. Nested function calls can access them. Globals can be changed.

    define a() {
      auto a, b
      b()
      a; b
    }
    define b() {
      auto a
      ( a=3 )
      ( b=4 )
    }
    a()
    3        assignment to b's a
    4        assignment to a's b
    0        result of b()
    0        value of a's a
    4        value of a's b
    0        result of a()

Parameters
Functions have parameters which must be exactly matched. Arguments are numbers and are passed by value.

    a = 4; a
    4
    define b(a) {
      ( a = 5 )
    }
    b(a); a
    5        assignment to b's a
    0        result of b()
    4        global a
3.6 Scope
There are 26 names, but they can be used three times: for scalar variables, for array variables, and for functions.

A function can declare an auto list of more names. A called function searches outward for the caller's auto names or for global names. Variables are initialized auto names.

    define a(b) {
      auto a
      b()
    }
    define b() {
      ( a = b )
    }
    a(5); a
    5        assignment to a's a with a's b
    0        result of b()
    0        result of a(5)
    0        global a is not affected
    b=3; b(); a
    3        assignment to global a with b
    0        result of b()
    3        global a is affected

3.7 Lifetime
Globals live as long as the program, locals and parameters as long as the active function call.
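The outward search through the callers' auto names is dynamic scoping. It can be sketched in Python with an explicit stack of name tables; the class DynamicScope and the helpers call_a and call_b are invented for this illustration and only model the example above, not bc itself.

```python
class DynamicScope:
    """A stack of name tables; frames[0] holds the globals."""

    def __init__(self):
        self.frames = [{}]

    def push(self, **names):
        self.frames.append(dict(names))   # parameters and auto names

    def pop(self):
        self.frames.pop()

    def lookup(self, name):
        for frame in reversed(self.frames):   # search outward through callers
            if name in frame:
                return frame[name]
        return 0                              # unset names read as zero

    def assign(self, name, value):
        for frame in reversed(self.frames):
            if name in frame:
                frame[name] = value
                return
        self.frames[0][name] = value          # otherwise it is a global


def call_b(env):
    env.push()                                # b() has no names of its own
    env.assign("a", env.lookup("b"))          # ( a = b )
    env.pop()
    return 0


def call_a(env, argument):
    env.push(b=argument, a=0)                 # parameter b, auto a
    call_b(env)
    env.pop()
    return 0


env = DynamicScope()
call_a(env, 5)
print(env.lookup("a"))   # 0, the assignment went to a's auto a
env.assign("b", 3)
call_b(env)
print(env.lookup("a"))   # 3, this time the global a was assigned
```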
Relational operators are only allowed singly in statements.

negative
    -a

    scale=3; a=2; a^3; a^-3
    8
    .125

from left: multiplication, etc.; tricky scale for remainder
    a*b; a/b; a%b

from left: addition, etc.
    a+b; a-b

from right: assignment
    a=b; a+=b

Simple Statements

expression statement, printed
    ( a = 3 )

assignment statement, not printed
    a = 3

string, printed
    "hello"

compound
    { a; b }

stop (immediately)
    quit

optional part
    if ( a <= b ) "hello"

dependent on precondition
    define e(x,y) {
      while (x != y) {
        if (x > y) x -= y
        if (x < y) y -= x
      }
      return (x)
    }
    e(36,54)
    18

dependent on range
    i=4
    for (j=0; j<=i; ++ j) b(i,j)
    1
    4
    6
    4
    1
4 XML
XML is a slightly modernized version of SGML, which was originally intended to mark up text logically for printing. XML is now mostly used to represent structured data for transmission or storage; a well-formed XML-based document has a tree structure.
XSL is a modernized version of DSSSL, which was originally intended to manipulate and in particular print SGML-based documents. XSLT, part of XSL, is a functional language, mostly for tree transformations. XSL includes a specification for formatting objects, into which XML-based documents can be transformed with XSLT for high-quality printing. XSL and XSLT programs are specified as XML-based documents.

4.1 XML (simplified)
A well-formed XML-based document consists of a header and one element which may contain other elements.
xml/hello.xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hello, World -->
<hello>Hello, World!</hello>

Elements consist of a start tag and an end tag with matching names. They must be properly nested (no overlap) and can also contain text.
An empty element consists of a tag marked with / at the end:
<empty/>
The tag of an empty element or a start tag can have attributes following the tag name. An attribute consists of a key and a value; the value is enclosed in single or double quotes:
<greeting text="hello" terminator='!'/>

Lexical aspects
XML is free format; whitespace is significant as part of text or an attribute value.
Tag and attribute names are case-sensitive (unlike HTML) and consist of letters and digits and some other characters.
Comments are special elements, cannot contain --, and cannot appear everywhere.
There is a text replacement mechanism (entities) but it is not cleanly separated from the language.
In particular, &amp; &apos; &gt; &lt; and &quot; must be used because, e.g., attribute values cannot contain <, >, the delimiting quotes, and &. Furthermore, &#dd; and &#xhh; can be used to specify any character by its position in the Unicode character set.
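As an illustration, Python's standard library can produce and resolve these replacements. The sketch below uses escape and quoteattr from xml.sax.saxutils and the ElementTree parser; the sample strings are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Text content: &, < and > are replaced by entity references.
text = escape("a < b & c > d")
print(text)

# Attribute value: quoteattr also picks suitable delimiting quotes.
attribute = quoteattr('say "hi" & go')
print(attribute)

# A parser resolves the references again, including numeric ones.
element = ET.fromstring("<x a=%s>%s &#65;&#x42;</x>" % (attribute, text))
print(element.get("a"))
print(element.text)
```

The round trip through the parser yields the original strings, with &#65; and &#x42; turned into the letters A and B.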
Checking
There are many XML parsers. rxp is particularly fast for checking:

$ rxp -o b hello.xml
At 39: comment: Hello, World
At 61: start: hello
At 68: pcdata: Hello, World!
At 81: end: hello
At 119: EOF

Sun's Java API for XML Processing (JAXP) provides factory classes to access different parsers. The package includes an XML parser (crimson) and an XSLT transformer (xalan). The following main program can be used to check an XML-based document:

xml/WellFormed.java

$ javac -classpath .:jaxp.jar:crimson.jar WellFormed.java
$ java -classpath .:jaxp.jar:crimson.jar WellFormed < hello.xml
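A comparable well-formedness check can be sketched with Python's standard ElementTree parser. This is not the WellFormed.java program referred to above, only an analogous illustration; the function name is_well_formed is made up.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: str) -> bool:
    """Return True if the parser accepts the text as one XML element tree."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


print(is_well_formed("<hello>Hello, World!</hello>"))              # True
print(is_well_formed("<a><b></a></b>"))                            # False, overlap
print(is_well_formed("<greeting text=\"hello\" terminator=!/>"))   # False, unquoted value
```

Like rxp without -V, this only checks nesting and syntax; it does not validate against a DTD.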
Validation
Sometimes it is desirable to describe what elements a document may use, how they may be nested, what attributes they may use, etc.
A valid XML-based document is well-formed and conforms to a Document Type Definition (DTD), which is either referenced from the document

xml/pets.xml
...

or included in the document itself:

xml/people.xml
<!DOCTYPE people [
  <!ELEMENT people (person*)>
  <!ELEMENT person (first,last)>
  <!ATTLIST person
    pid  ID            #REQUIRED
    boss IDREF         #IMPLIED
    sex  (male|female) #REQUIRED
  >
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
]>
<people>
  <person pid="i4711" boss="i4712" sex="female" >
    <first>Jane</first>
    <last>Doe</last>
  </person>
  <person pid="i4712" sex="male">
    <first>John</first>
    <last>Doe</last>
  </person>
</people>

rxp can be used for validation:

$ rxp -V -o 0 hello.xml
Warning: Document has no DTD, validating abandoned
 (detected at end of prolog of document file:///.../plc/code/xml/hello.xml)
$ rxp -V -o 0 pets.xml
$ rxp -V -o 0 people.xml
With JAXP a parser can be asked to validate, if it can:

xml/Valid.java
    DocumentBuilder db = df.newDocumentBuilder();
    if (validate != db.isValidating())
      System.err.println("validate/validating mismatch");
    Document doc = db.parse(System.in, cwd.toString());
  }
}

$ java -classpath .:jaxp.jar:crimson.jar Valid < hello.xml
$ java -classpath .:jaxp.jar:crimson.jar -Dvalidate=true Valid < hello.xml
Warning: validation was turned on but an org.xml.sax.ErrorHandler was not set, which is probably not what is desired. Parser will use a default ErrorHandler to print the first 10 errors. Please call the 'setErrorHandler' method to fix this.
Error: URI=file:/Volumes/Axel/axel/Documents/Vorlesungen/plc/code/xml/ Line=3: Element type "hello" is not declared.
$ java -classpath .:jaxp.jar:crimson.jar -Dvalidate=true Valid < pets.xml
4.2 DTD (simplified)
A DTD can control possible attribute names and restrict element nesting.
XML Schema is likely to succeed DTDs, not least because a DTD has almost no provisions to restrict attribute values or text in elements. Unfortunately, XML Schema is XML-based and views a document more through datatype semantics and less through syntactic means, which makes it harder to see what is legal.
Here is an external DTD for the pets:

pets.dtd
<!ELEMENT pets (dogs?, cats?)>
<!ELEMENT dogs (dog+)>
<!ELEMENT dog (color)>
<!ATTLIST dog
  breed (Labrador) #REQUIRED
>
<!ELEMENT cats (cat+)>
<!ELEMENT cat (color)>
<!ATTLIST cat
  breed (Siamese|Burmese|Tortoiseshell) #REQUIRED
>
<!ELEMENT color (#PCDATA)>
Elements
ELEMENT describes what content is legal for a particular element name:

<!ELEMENT a EMPTY>                 empty element
<!ELEMENT b ANY>                   arbitrary well-formed content
<!ELEMENT c (#PCDATA)>             text content
<!ELEMENT d (#PCDATA)*>            text content
<!ELEMENT e (#PCDATA | a)*>        text mixed with many elements
<!ELEMENT f (a)?>                  optional nested element
<!ELEMENT g (a)+>                  one or more nested elements
<!ELEMENT h (a)*>                  any number of nested elements
<!ELEMENT f (a | b? | c+ | d*)>    choice of elements
<!ELEMENT f (a , b? , c+ , d*)>    nested elements in order

The content models can be nested:
<!ELEMENT nest (( a, b ) | ( c, d ))+ >
and the result can be ambiguous:
<!ELEMENT bits (( n, e )*, (n, e )*) >
One of the most significant shortcomings of DTDs is that the mixed content model cannot specify the order or number of elements nested in between text.

Attributes
ATTLIST describes legal attributes and tries to restrict the values. Attributes may appear at most once but in any order.

<!ATTLIST element                  explains for which element
  a CDATA "default value"          text value with default
  b CDATA #FIXED "only value"      text with mandatory value
  c CDATA #IMPLIED                 optional text
  d CDATA #REQUIRED                mandatory text
  e (u | v | w) #REQUIRED          one of three names
  f ID #REQUIRED                   unique name
  g IDREF #REQUIRED                reference to unique name
>

Attributes could be declared more than once; the first declaration is binding. An element can only have one ID.
One of the most significant shortcomings of DTDs is that they cannot specify attribute value types or element text content types.
For a document type designer it is not really clear when to choose text content or text attributes.
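An element content model is essentially a regular expression over element names. The following Python sketch translates a model into a regular expression and checks a sequence of child element names against it. It assumes pure element content (no #PCDATA, EMPTY, or ANY); the helper names are invented for this illustration and this is not how a validating parser is implemented.

```python
import re


def content_model_to_regex(model: str) -> "re.Pattern[str]":
    """Translate a model such as '(dogs?, cats?)' into a regular expression."""
    compact = "".join(model.split())          # drop white space
    # Each element name becomes a group matching the name plus one blank.
    pattern = re.sub(r"[A-Za-z_][\w.-]*",
                     lambda m: "(?:" + re.escape(m.group(0)) + " )",
                     compact)
    # A comma means sequence, which is plain concatenation in a regex.
    return re.compile(pattern.replace(",", ""))


def children_match(model: str, children: list) -> bool:
    """Check the ordered child element names against the content model."""
    sequence = "".join(name + " " for name in children)
    return content_model_to_regex(model).fullmatch(sequence) is not None


print(children_match("(dogs?, cats?)", ["dogs", "cats"]))            # True
print(children_match("(dogs?, cats?)", ["cats", "dogs"]))            # False
print(children_match("(( a, b ) | ( c, d ))+", ["a", "b", "c", "d"]))  # True
```

The operators ?, * and + as well as | and parentheses mean the same thing in both notations, so they are passed through unchanged.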
5 Little Smalltalk Cribsheet
Little Smalltalk is an old, text-only implementation by Tim Budd which is nevertheless useful to study the language structure.

comment
    "comment"

literals

simple computations
    1 + 2 4 print. 5 1 + 2; print. 3 hello, , world a <- 9

control structures
    ( x ~= y ) ifTrue: [t] ifFalse: [f]
    [ x < y ] whileTrue: [x <- x + 1 ]
    b <- [:parm| parm print]
    b value:3. b value:5
    (1 to: 5 by: 2) do: b

array operations
    a <- #(1 2)
    (a grow: 3) grow: 4
    a select: [:each| true]
class management
structure of a method
6 Syntax Analysis
6.1 Grammars
A grammar consists of a finite set of nonterminal symbols, a finite set of terminal symbols, a start symbol which is a nonterminal, and a finite set of rules. A rule is an ordered pair of sequences of nonterminal and terminal symbols. For example:

    nonterminals: a, b
    terminals: c, d
    start symbol: a
    rules: (ac, ab), (a, )

Chomsky distinguishes four different kinds of grammars based on the structure of the rules.
In a context-free grammar each rule must have a single nonterminal as the first sequence. It is known that a push-down automaton (PDA, stack machine) is sufficient for recognition. For example, the grammar above is not context-free, but with the following rules it is:

    rules: (a, b), (a, ac), (b, d)

In a regular grammar, a rule consists either of a nonterminal and a terminal or of a nonterminal and a sequence consisting of a nonterminal and a terminal. A finite state automaton (FSA) can be constructed from the grammar and perform recognition. For example, the grammar above with the context-free rules is in fact regular.
It turns out that the pattern matching performed by commands like grep can (for the most part) be done with an FSA. This is where the regular expressions describing the patterns got their name.
6.2 Trees
The start symbol of a grammar produces syntax trees: Nodes are nonterminals or terminals. If a node has descendants the node must be a nonterminal and there must be a rule consisting of the nonterminal and the (ordered) sequence of descendants. For example:

    [Figure: syntax tree with root a]

The ordered sequence of all terminal leaves of a syntax tree (with the start symbol as a root) is called a sentence. In this example the sentence is dc.
A language is a set of sentences, i.e., sequences of terminals. A grammar for a language must exactly produce all sentences. A language can have more than one grammar.
A grammar is called ambiguous if there is a sentence for which there is more than one syntax tree. For example:

    nonterminals: sum
    terminals: number, +
    start symbol: sum
    rules: (sum, sum + sum), (sum, number)

    [Figure: two syntax trees, each with root sum]

There is no easy way to prove that a grammar is not ambiguous, but there are sufficient conditions which can be checked by programs.
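The ambiguity of the sum grammar can be made concrete by counting syntax trees. The following Python sketch counts, for a given sequence of terminals, how many different trees the rules (sum, sum + sum) and (sum, number) allow; the function name count_trees is made up for this illustration.

```python
from functools import lru_cache


def count_trees(tokens) -> int:
    """Number of syntax trees that derive the token sequence from sum."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo: int, hi: int) -> int:
        # Trees for tokens[lo:hi] with sum as the root.
        if hi - lo == 1:
            return 1 if tokens[lo] == "number" else 0     # (sum, number)
        total = 0
        for mid in range(lo + 1, hi - 1):                 # (sum, sum + sum)
            if tokens[mid] == "+":
                total += count(lo, mid) * count(mid + 1, hi)
        return total

    return count(0, len(tokens))


print(count_trees(["number", "+", "number"]))                 # 1
print(count_trees(["number", "+", "number", "+", "number"]))  # 2
```

Two trees for a sum of three numbers correspond to grouping the additions from the left or from the right, so the sentence shows that the grammar is ambiguous.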
6.3 Backus-Naur-Form
BNF is a language for rule specification. yacc and similar programs use BNF as part of their input:
bnf/sum.y
%left '+' /* defines associativity to disambiguate */
%start sum /* defines start symbol (if not first) */
%% /* separates parts of the input */
sum : sum '+' sum | number ;
Single characters in quotes are terminals. The first part of the file contains %token lines to define terminal symbols. %start defines the start symbol. %left and %right lines specify a table of increasing precedence and associativity for operators; this can help to make a grammar non-ambiguous.
BNF can be used to define BNF:
bnf/bnf.y
sequence: /* empty */ | sequence nonterminal | sequence terminal ;
Rules with the same nonterminal are normally combined; the alternative sequences are separated using |.
Using yacc
yacc checks if a grammar is LALR(1) and thus non-ambiguous:

$ yacc bnf.y
$ yacc sum.y

yacc produces a function yyparse() that calls a function yylex() for symbols and a function yyerror() when it detects that the sequence of symbols cannot be a sentence. The functions can be in the third part of the input to yacc:
bnf/sum.y
yyerror (char *s) {
  fprintf(stderr, "%s\n", s), exit(1);
}

int yylex () {
  int ch;

  for (;;)
    if ((ch = getchar()) == EOF)
      return 0;
    else if (isspace(ch))
      continue;
    else if (isdigit(ch)) {
      while ((ch = getchar()) != EOF && isdigit(ch))
        ;
      ungetc(ch, stdin);
      return number;
    } else
      return ch;
}

yylex() returns zero at end of file, the symbol names which yacc defines as constants, or the character values for single character terminals. lex can be used to generate yylex() from a table of regular expressions and actions programmed in C.
The result is a program to recognize sentences:
$ yacc sum.y
$ cc -o sum y.tab.c
$ sum
1 + 2
^D
$ sum
1 - 2
syntax error
The program can be extended to do something during recognition. yylex() can assign a value to yylval when a symbol is recognized:
bnf/adder.y
int yylex () {
  int ch;

  for (;;)
    if ((ch = getchar()) == EOF)
      return 0;
    else if (isspace(ch))
      continue;
    else if (isdigit(ch)) {
      char buf [BUFSIZ], *bp = buf;

      do
        *bp ++ = ch;
      while ((ch = getchar()) != EOF && isdigit(ch));
      ungetc(ch, stdin);
      *bp = '\0';
      yylval = atoi(buf);
      return number;
    } else
      return ch;
}
Here the value is an int, but a different type can be declared through yacc.
Following each alternative there can be an action: C code in braces.
bnf/adder.y
sum : sum '+' sum { $$ = $1 + $3; }
    | number      { $$ = $1; }
    ;
The code uses variables $1, $2 etc. to access the values corresponding to the elements of the sequence. $$ is a variable corresponding to the nonterminal, i.e., to the result of recognizing the sequence. The action is executed once the sequence is recognized.
$ yacc adder.y
$ cc -o adder y.tab.c
$ adder
1 + 2
3
jay is yacc retargeted to Java. In this case the values are objects, the actions are programmed in Java, and java.io.StreamTokenizer is a quick way to assemble symbols from input characters.
6.4 Extended BNF
BNF has to use recursion to express repetition. Extended BNF uses a notation to express repetition. For example, a DTD uses

    ( )    grouping
    *      zero or more occurrences
    +      one or more occurrences
    ?      zero or one occurrence

More commonly, braces { } denote repetition and brackets [ ] denote an optional part.
Wirth defined Pascal using syntax graphs, a graphical representation of EBNF. For example:

    [Figure: syntax graph for number]

A rule is a graph labeled with the nonterminal. Rectangular nodes reference other graphs, round nodes contain terminals. Edges connect the nodes in the order of the rule's sequence; just as in a flow diagram there can be alternatives and repetitions. (Nassi-Shneiderman structograms suggest a topologically sounder way to sketch this.)
EBNF can easily be translated into a hand-coded recognizer consisting of functions that call each other recursively; one function for each graph.
However, recognition only works if, considering the graph as a road map, it is clear at each intersection how to proceed without backtracking. This condition is known as LL(1) and the recognizer algorithm is called recursive descent.
XML parsers demonstrate that EBNF can be used to define a grammar from which a recognizer can be mechanically produced. Our parser generator oops accepts EBNF and produces a recognizer in Java or C#.
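As an illustration of recursive descent, here is a hand-coded recognizer in Python for the EBNF rule sum : number { '+' number }, an assumed EBNF rendering of the adder grammar from the previous section. It has one function per rule and also adds up the numbers; the names parse_sum and tokenize are invented for this sketch.

```python
import re


def tokenize(text: str) -> list:
    """Split input into digit sequences and single non-blank characters."""
    return re.findall(r"[0-9]+|\S", text)


def parse_sum(text: str) -> int:
    """Recognize  sum : number { '+' number }  and return the total."""
    tokens = tokenize(text)
    position = 0

    def number() -> int:
        nonlocal position
        if position < len(tokens) and tokens[position][0] in "0123456789":
            value = int(tokens[position])
            position += 1
            return value
        raise SyntaxError("number expected")

    def sum_() -> int:
        nonlocal position
        total = number()
        # One symbol of lookahead decides whether the repetition continues.
        while position < len(tokens) and tokens[position] == "+":
            position += 1
            total += number()
        return total

    result = sum_()
    if position != len(tokens):
        raise SyntaxError("syntax error")
    return result


print(parse_sum("1 + 2"))   # 3
```

The while loop corresponds to the braces of the EBNF rule; the decision to continue depends only on the next symbol, which is the LL(1) condition in this small case.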
7 Pure Lisp
Lisp dates back to John McCarthy in the late 1950s. At the time it had no peer as a language for symbolic computation. Current dialects Common Lisp and Scheme are used, e.g., for programming in artificial intelligence. The dialects are quite different from each other.
Common Lisp the Language, 2nd ed. by Guy Steele is the definitive book on the language. Common Lisp: A Gentle Introduction to Symbolic Computation by David S. Touretzky is a good introduction. Both are available online.
There is a tutorial by Colin Allen and Maneesh Dhagat. There is an overview of some Lisp words by Marty Hall who also provides a tutorial and lots of links to Lisp materials on his web page.
Lisp is based on very few, primitive concepts and has many, many predefined words.
This introduction centers only on Common Lisp and on functional programming using a subset of words as suggested by Andrew Kitchen.
7.1 GNU Common Lisp
GNU Common Lisp is a more or less portable, free implementation of Common Lisp:

$ gcl                            start the interpreter
> (help)                         access the help system
> (load "filename")              load a file
> (trace name)                   trace a function
> (untrace name)
>>                               prompt if something is amiss
>> :q                            return to top level in interpreter
> (bye)                          leave interpreter; also happens at end of file

[Based on information from Edith Hemaspaandra.]

7.2 GNU CLisp
GNU CLisp is a more portable, free implementation of Common Lisp with line-editing features:

$ ~ats/bin/clisp                 start the interpreter
[1]> help                        access the help system
[2]> (load "filename")           load a file
[3]> (edit-file "filename")      edit a file using vi
     :set ai sm lisp             turn on auto indent, parentheses matching, and other lisp stuff
> (trace name)                   trace a function
> (untrace name)
1. Break [4]>                    prompt if something is amiss
1. Break [4]> Abort              return to top level in interpreter; also happens at end of file
> (bye)                          leave interpreter; also happens at end of file
7.3 First Steps
White space is ignored. Comments range from a semicolon to the end of a line.
( ) , ; : " \ | #
Lines are executed as they are entered (read eval loop). Some things represent themselves:
$ gcl
> "string"
"string"
> nil                  ; another way to represent false
NIL
> 10                   ; numbers: integers, floating point, rational
10
> (quote axel)         ; one way to make a symbol
AXEL
> 'axel                ; another way to make a symbol
AXEL
> '(1 2 3 4)           ; quoting a list
(1 2 3 4)
Lists are constructed from atoms and lists. The empty list () is also an atom.
Lists are executed like function calls with a preÞx notation. Arguments are usually passed by value:
> (* 1/3 (+ 1/5 0.3))  ; mixed arithmetic
0.16666666666666666
> (+ 1 2 3 4)          ; n-ary operation
10
> (let ((nm 10)        ; locally bind a value to an atom
       )
    nm                 ; and use it
  )
10
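The way a list is executed as a function call in prefix notation can be sketched with a tiny evaluator in Python. It is only an illustration covering n-ary + and *, with nested Python lists standing in for Lisp lists and Fraction for rationals; it is not how a Lisp system is implemented.

```python
import operator
from fractions import Fraction
from functools import reduce

# Each operator comes with the value it yields for no arguments.
OPERATORS = {"+": (operator.add, 0), "*": (operator.mul, 1)}


def evaluate(form):
    """Evaluate a nested list in prefix notation; atoms represent themselves."""
    if not isinstance(form, list):
        return form
    name, *arguments = form
    function, identity = OPERATORS[name]
    values = [evaluate(argument) for argument in arguments]   # arguments first
    return reduce(function, values, identity)


print(evaluate(["+", 1, 2, 3, 4]))                                    # 10
print(evaluate(["*", Fraction(1, 3), ["+", Fraction(1, 5), 0.3]]))
```

Arguments are evaluated before the function is applied to their values, which mirrors passing arguments by value.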
There are words for global assignment (setf and setq); however, they should be offensive to Lisp purists.
There are also dotted pairs, vectors, and other aggregates, all impure.
7.4 Functions
A function receives the argument values and returns a value:

(append list..)       concatenation of all contents of all arguments or nil
(cons x list)         constructs new list with x inserted before contents of list
(eval x)              value of x (i.e., the value of x is evaluated again)
(first list)          first element from list or nil
(car list)
(funcall name x..)    result of applying name to the arguments x; name must be bound to a result of function or lambda; for gcl lambda must be quoted
(length list)         number of elements
(list x..)            list with x as elements or nil
(max num ..)          maximum of the arguments
(min num ..)          minimum of the arguments
(print x)             outputs and returns x
(princ x)             without string quotes and preceding newline
(rest list)           list without first element or nil
(cdr list)
(+ num..)             sum of all arguments or 0
(- num ..)            cumulative difference of all arguments
(* num..)             product of all arguments or 1
(/ num ..)            cumulative quotient of all arguments

7.5 Predicates
A predicate is a function, i.e., it receives the argument values, and it returns t or nil:

(atom x)              t if argument is atom
(eql x y)             t if equal atoms or identical lists
(equal x y)           t if equal atoms or equal lists
(evenp num)           t if argument is even
(not x)               t if x is nil (i.e., empty list)
(null x)
(oddp num)            t if argument is odd
(= num ..)            t if all arguments are equal
(< num ..)            t if all arguments form numerically ordered sequence
(> num ..)
(<= num ..)
(>= num ..)

7.6 Macros
A macro simplifies writing. It decides which arguments if any to evaluate:

(and x..)             evaluates arguments in order until nil value or last argument; returns nil or last value or t
(or x..)              evaluates arguments in order until non-nil value; returns non-nil value or nil
7.7 Special Forms
Lisp expressions, atoms or lists, are usually called forms. Special forms are built into the language and just like macros decide which arguments if any to evaluate:

(cond (test body..) .. )
    evaluates each test until first non-nil; evaluates each body after that test; returns value of last body or non-nil test or nil
(defun name (name..) body.. )
    defines a function with a global name and local parameter names and the forms to be executed when the function is called; arguments and parameters should match
(function name)
#'name
    return the function bound to name by defun; if this is bound to another name it can be used in funcall
(lambda (name..) body.. )
    defines a lambda expression which can be applied to arguments using funcall; when called arguments are bound to each name and then each body is evaluated; nil or the last value is returned
(let ((name init).. ) body.. )
    simultaneously binds each name locally to the value of init or to nil; sequentially evaluates each body; returns last value or nil
(quote x)
'x
    return x unevaluated

7.8 First Programs
There are only two control structures: cond handles decisions and defun is used to define a function with a (global) name. Functions are called recursively to handle repetition. Because the body is not evaluated while the function is defined, no predeclaration is necessary.

lisp/euclid.cl
(defun euclid (x y)                      ; Euclid's algorithm
  (cond ((> x y) (euclid (- x y) y))
        ((< x y) (euclid x (- y x)))
        (t x)
  )
)

lisp/factorial.cl
(defun factorial (n)                     ; ubiquitous recursion
  (cond ((<= n 1) 1)
        (t (* n (factorial (- n 1))))
  )
)
lisp/rev.cl
(defun rev (list)                        ; reverse a list
  (cond ((atom list) list)
        ((null list) list)
        (t (append (rev (cdr list)) (list (car list))))
  )
)

append concatenates lists, cons inserts an atom before a list, and list makes a list from elements. car (first) and cdr (rest) produce an atom and a list, respectively.
Trees, specifically algebraic expressions, can be represented as nested lists:
lisp/naive-infix.cl
(defun naive-infix (f)                   ; render an algebraic expression
  (cond ((atom f)                        ; render atom as is
          (princ f))
        ((= (length f) 1)                ; render first of singleton
          (princ (car f)))
        ((= (length f) 2)                ; render first and second of pair
          (princ (car f))
          (princ (car (cdr f))))
        (t                               ; recursively render rest with parentheses
          (princ "(") (naive-infix (car (cdr f))) (princ ")")
          (princ (car f))
          (princ "(") (naive-infix (car (cdr (cdr f)))) (princ ")"))
  )
)

$ clisp
[1]> (load "naive-infix.cl")
[2]> (naive-infix '(* (+ a b) c))
((A)+(B))*(C)
")"
7.9 Scope
let and lambda define local names. defun defines local parameter names and a global function name. defun can even be used to dynamically redefine its own function:
lisp/gcd-lcm.cl
(defun gcd-lcm (x y)                     ; a function with a mind of its own
  (defun euclid (x y)                    ; greatest common divisor
    (cond ((> x y) (euclid (- x y) y))
          ((< x y) (euclid x (- y x)))
          (t (defun gcd-lcm (x y) (dijkstra x y y x))
             x)
    )
  )
  (defun dijkstra (x y u v)              ; least common multiple
    (cond ((> x y) (dijkstra (- x y) y u (+ u v)))
          ((< x y) (dijkstra x (- y x) (+ u v) v))
          (t (defun gcd-lcm (x y) (euclid x y))
             (/ (+ u v) 2))
    )
  )
  (euclid x y)
)

$ clisp
[1]> (load "gcd-lcm.cl")
[2]> (gcd-lcm 36 54)
18
[3]> (gcd-lcm 36 54)
108
[4]> (gcd-lcm 36 54)
18
[5]> (dijkstra 36 54 54 36)
108

Pure Lisp should be side-effect free; after all, there is nothing like an assignment.
gcd-lcm is a rather extreme counterexample: Redefining gcd-lcm from within gives the function state that survives the function call in the conventional fashion, namely by assigning to gcd-lcm as a global variable.
7.10 Function Pointers
plot is a very trivial function plotter that accepts a function as a parameter:
lisp/plot.cl
(defun plot (f from to step x-offset)    ; plot function in a range, shifted over
  (defun plot-1 (x)                      ; plot a line -- should be made local
    (cond ((> x 0) (princ " ") (plot-1 (- x 1)))
          (t (princ "*"))
    )
  )
  (defun plot-curve (x)                  ; plot curve -- should be made local
    (print x) (plot-1 (+ x-offset (funcall f x)))
    (let ((next (+ x step))
         )
      (cond ((<= next to) (plot-curve next)))
    )
  )
  (plot-curve from)
)

$ clisp
[1]> (load "plot.cl")
[2]> (defun sqr (x) (* x x))
[3]> (plot (function sqr) 1 5 1 10)
1 *
2 *
3 *
4 *
5 *
NIL
[4]> (plot (lambda (x) (* x x)) 1 5 1 10)
1 *
2 *
3 *
4 *
5 *
NIL
[5]> (plot-curve 0)
0 *
1 *
2 *
3 *
4 *
5 *
NIL

lambda creates an anonymous function similar to using function to retrieve a function defined with defun. For gcl, lambda must be quoted. The last example shows that the locally defined global functions are bound to their lexical environment, i.e., plot-curve is defined bound to whatever argument values were last given to plot.
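Passing a function as a parameter looks much the same in other languages. The following Python sketch mirrors plot for integer values; it returns the lines instead of printing them, and the parameter names are chosen for this illustration.

```python
from typing import Callable, List


def plot(f: Callable[[int], int], start: int, stop: int,
         step: int, x_offset: int) -> List[str]:
    """Plot f over a range of integers, shifted over by x_offset blanks."""
    lines = []
    x = start
    while True:
        blanks = max(0, x_offset + f(x))       # plot-1 emits no blanks if <= 0
        lines.append(f"{x} " + " " * blanks + "*")
        x += step
        if x > stop:                           # like plot-curve, plot at least once
            break
    return lines


def sqr(x: int) -> int:
    return x * x


for line in plot(sqr, 1, 5, 1, 10):            # a named function as argument
    print(line)
for line in plot(lambda x: x * x, 1, 5, 1, 10):  # an anonymous function
    print(line)
```

Because the helper logic lives inside plot and nothing is defined globally, there is no counterpart here to calling plot-curve directly after plot has returned.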