Kent Academic Repository Full text document (pdf) Copyright & reuse Content in the Kent Academic Repository is made available for research purposes. Unless otherwise stated all content is protected by copyright and in the absence of an open licence (eg Creative Commons), permissions for further reuse of content should be sought from the publisher, author or other copyright holder. Versions of research The version in the Kent Academic Repository may differ from the final published version. Users are advised to check http://kar.kent.ac.uk for the status of the paper. Users should always cite the published version of record. Enquiries For any further enquiries regarding the licence status of this document, please contact: [email protected]If you believe this document infringes copyright then please contact the KAR admin team with the take-down information provided at http://kar.kent.ac.uk/contact.html Citation for published version Welch, Peter H. (1998) Parallel and Distributed Computing in Education (Invited Talk). In: VECPAR''98: Third International Conference on Vector and Parallel Processing - Selected Papers, 21/06/1998, Porto , Portugal . DOI Link to record in KAR http://kar.kent.ac.uk/21644/ Document Version UNSPECIFIED
Parallel and Distributed Computing in Education (Invited Talk)
Peter H. Welch
Computing Laboratory, University of Kent at Canterbury, CT2 7NF.
P.H.Welch@ukc.ac.uk
Abstract. The natural world is certainly not organised through a central thread of control. Things happen as the result of the actions and interactions of unimaginably large numbers of independent agents, operating at all levels of scale from nuclear to astronomic. Computer systems aiming to be of real use in this real world need to model, at the appropriate level of abstraction, that part of it for which they are to be of service. If that modelling can reflect the natural concurrency in the system, it ought to be much simpler.
Yet, traditionally, concurrent programming is considered to be an advanced and difficult topic, certainly much harder than serial computing which, therefore, needs to be mastered first. But this tradition is wrong.
This talk presents an intuitive, sound and practical model of parallel computing that can be mastered by undergraduate students in the first year of a computing (major) degree. It is based upon Hoare's mathematical theory of Communicating Sequential Processes (CSP), but does not require mathematical maturity from the students: that maturity is pre-engineered in the model. Fluency can be quickly developed in both message-passing and shared-memory concurrency, whilst learning to cope with key issues such as race hazards, deadlock, livelock, process starvation and the efficient use of resources. Practical work can be hosted on commodity PCs or UNIX workstations using either Java or the occam multiprocessing language. Armed with this maturity, students are well-prepared for coping with real problems on real parallel architectures that have, possibly, less robust mathematical foundations.
1 Introduction
At Kent, we have been teaching parallel computing at the undergraduate level for the past ten years. Originally, this was presented to first-year students before they became too set in the ways of serial logic. When this course was expanded into a full unit (about 30 hours of teaching), timetable pressure moved it into the second year. Either way, the material is easy to absorb and, after only a few (around 5) hours of teaching, students have no difficulty in grappling with the interactions of 25 (say) threads of control, appreciating and eliminating race hazards and deadlock.
Parallel computing is still an immature discipline with many conflicting cultures. Our approach to educating people into successful exploitation of parallel mechanisms is based upon focusing on parallelism as a powerful tool for simplifying the description of systems, rather than simply as a means for improving their performance. We never start with an existing serial algorithm and say: 'OK, let's parallelise that!'. And we work solely with a model of concurrency that has a semantics that is compositional (a fancy word for WYSIWYG) since, without that property, combinatorial explosions of complexity always get us as soon as we step away from simple examples. In our view, this rules out low-level concurrency mechanisms, such as spin-locks, mutexes and semaphores, as well as some of the higher-level ones (like monitors).
Communicating Sequential Processes (CSP) [1–3] is a mathematical theory for specifying and verifying complex patterns of behaviour arising from interactions between concurrent objects. Developed by Tony Hoare in the light of earlier work on monitors, CSP has a compositional semantics that greatly simplifies the design and engineering of such systems, so much so that parallel design often becomes easier to manage than its serial counterpart. CSP primitives have also proven to be extremely lightweight, with overheads in the order of a few hundred nanoseconds for channel synchronisation (including context-switch) on current microprocessors [4, 5].
Recently, the CSP model has been introduced into the Java programming language [6–10]. Implemented as a library of packages [11, 12], JavaPP [10] enables multithreaded systems to be designed, implemented and reasoned about entirely in terms of CSP synchronisation primitives (channels, events, etc.) and constructors (parallel, choice, etc.). This allows 20 years of theory, design patterns (with formally proven good properties, such as the absence of race hazards, deadlock, livelock and thread starvation), tools supporting those design patterns, education and experience to be deployed in support of Java-based multithreaded applications.
2 Processes, Channels and Message Passing
This section describes a simple and structured multiprocessing model derived from CSP. It is easy to teach and can describe arbitrarily complex systems. No formal mathematics need be presented: we rely on an intuitive understanding of how the world works.
2.1 Processes
A process is a component that encapsulates some data structures and algorithms for manipulating that data. Both its data and algorithms are private. The outside world can neither see that data nor execute those algorithms. Each process is alive, executing its own algorithms on its own data. Because those algorithms are executed by the component in its own thread (or threads) of control, they express the behaviour of the component from its own point of view1. This considerably simplifies that expression.
A sequential process is simply a process whose algorithms execute in a single thread of control. A network is a collection of processes (and is, itself, a process). Note that recursive hierarchies of structure are part of this model: a network is a collection of processes, each of which may be a sub-network or a sequential process.
But how do the processes within a network interact to achieve the behaviour required from the network? They can't see each other's data nor execute each other's algorithms, at least not if they abide by the rules.
2.2 Synchronising Channels
The simplest form of interaction is synchronised message-passing along channels. The simplest form of channel is zero-buffered and point-to-point. Such channels correspond very closely to our intuitive understanding of a wire connecting two (hardware) components.
Fig. 1. A simple network: processes A and B connected by channel c.
In Figure 1, A and B are processes and c is a channel connecting them. A wire has no capacity to hold data and is only a medium for transmission. To avoid undetected loss of data, channel communication is synchronised. This means that if A transmits before B is ready to receive, then A will block. Similarly, if B tries to receive before A transmits, B will block. When both are ready, a data packet is transferred, directly from the state space of A into the state space of B. We have a synchronised distributed assignment.
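
The rendezvous just described can be sketched in plain Java, assuming java.util.concurrent.SynchronousQueue as a stand-in for a zero-buffered, point-to-point channel: its put blocks until another thread takes the value, and a non-blocking offer fails when no receiver is waiting. The class ChannelDemo and its method are hypothetical names; this is not the JCSP API.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: processes A and B joined by a zero-buffered channel c.
class ChannelDemo {
    public static void main(String[] args) {
        System.out.println(run()); // prints 42
    }

    // Returns the value B received, or -1 if the rendezvous did not happen.
    static long run() {
        SynchronousQueue<Long> c = new SynchronousQueue<>();

        // A wire holds nothing: with no receiver waiting, a non-blocking send is refused.
        if (c.offer(42L)) throw new IllegalStateException("channel buffered a value");

        // Process A: blocks inside put() until B is ready to receive.
        Thread a = new Thread(() -> {
            try {
                c.put(42L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        a.start();

        // Process B: blocks until A transmits (bounded here so the demo cannot hang).
        try {
            Long x = c.poll(5, TimeUnit.SECONDS);
            a.join();
            return x == null ? -1 : x;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }
}
```

The refused offer shows that the wire itself has no capacity; the value moves only when A and B are both ready.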
2.3 Legoland
Much can be done, or simplified, just with this basic model: for example, the design and simulation of self-timed digital logic, multiprocessor embedded control systems (for which occam [13–16] was originally designed), GUIs etc.
1 This is in contrast with simple 'objects' and their 'methods'. A method body normally executes in the thread of control of the invoking object. Consequently, object behaviour is expressed from the point of view of its environment rather than the object itself. This is a slightly confusing property of traditional 'object-oriented' programming.
Here are some simple examples to build up fluency. First we introduce some elementary components from our 'teaching' catalogue (see Figure 2). All processes are cyclic and all transmit and receive just numbers. The Id process cycles through waiting for a number to arrive and, then, sending it on. Although inserting an Id process in a wire will clearly not affect the data flowing through it, it does make a difference. A bare wire has no buffering capacity. A wire containing an Id process gives us a one-place FIFO. Connect 20 in series and we get a 20-place FIFO: sophisticated function from a trivial design.
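
The claim that a chain of Id processes behaves as a FIFO whose capacity equals its length can be checked with a small sketch. It again assumes SynchronousQueue for zero-buffered channels and plain daemon threads for processes; IdChain, id and capacity are hypothetical names, and a send not accepted within 500 ms is taken to mean that the writer would block.

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: n Id processes in series, each a daemon thread between zero-buffered channels.
class IdChain {
    public static void main(String[] args) {
        System.out.println(capacity(20));
    }

    static long take(SynchronousQueue<Long> c) {
        try { return c.take(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
    }

    static void put(SynchronousQueue<Long> c, long v) {
        try { c.put(v); } catch (InterruptedException e) { throw new IllegalStateException(e); }
    }

    // Id (in, out) = in ? x --> out ! x --> Id (in, out)
    static void id(SynchronousQueue<Long> in, SynchronousQueue<Long> out) {
        Thread t = new Thread(() -> { while (true) put(out, take(in)); });
        t.setDaemon(true);
        t.start();
    }

    // Builds a chain of n Ids with nobody reading the far end, then counts how many
    // values a writer can hand over before a send is no longer accepted.
    static int capacity(int n) {
        SynchronousQueue<Long> first = new SynchronousQueue<>();
        SynchronousQueue<Long> link = first;
        for (int i = 0; i < n; i++) {
            SynchronousQueue<Long> next = new SynchronousQueue<>();
            id(link, next);
            link = next;
        }
        int accepted = 0;
        try {
            // Each Id holds one value once the chain fills, so the (n+1)th send times out.
            while (first.offer((long) accepted, 500, TimeUnit.MILLISECONDS)) accepted++;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return accepted;
    }
}
```

With no Ids at all, capacity(0) reports that the bare wire accepts nothing, which matches the zero-buffered channel of Figure 1.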
Fig. 2. Extract from a component catalogue: Id (in, out), Succ (in, out), Plus (in0, in1, out), Delta (in, out0, out1), Prefix (n, in, out) and Tail (in, out).
Succ is like Id, but increments each number as it flows through. The Plus component waits until a number arrives on each input line (accepting their arrival in either order) and outputs their sum. Delta waits for a number to arrive and, then, broadcasts it in parallel on its two output lines; both those outputs must complete (in either order) before it cycles round to accept further input. Prefix first outputs the number stamped on it and then behaves like Id. Tail swallows its first input without passing it on and then, also, behaves like Id. Prefix and Tail are so named because they perform, respectively, prefixing and tail operations on the streams of data flowing through them.
It's essential to provide a practical environment in which students can develop executable versions of these components and play with them (by plugging them together and seeing what happens). This is easy to do in occam and now, with the JCSP library [11], in Java. Appendices A and B give some of the details. Here we only give some CSP pseudo-code for our catalogue (because that's shorter than the real code):
  Id (in, out) = in ? x --> out ! x --> Id (in, out)
  Succ (in, out) = in ? x --> out ! (x+1) --> Succ (in, out)
  Plus (in0, in1, out)
    = ((in0 ? x0 --> SKIP) || (in1 ? x1 --> SKIP));
      out ! (x0 + x1) --> Plus (in0, in1, out)
  Delta (in, out0, out1)
    = in ? x --> ((out0 ! x --> SKIP) || (out1 ! x --> SKIP));
      Delta (in, out0, out1)
  Prefix (n, in, out) = out ! n --> Id (in, out)
  Tail (in, out) = in ? x --> Id (in, out)
[Notes: 'free' variables used in these pseudo-codes are assumed to be locally declared and hidden from outside view. All these components are sequential processes. The process (in ? x --> P (...)) means: "wait until you can engage in the input event (in ? x) and, then, become the process P (...)". The input operator (?) and output operator (!) bind more tightly than the -->.]
2.4 Plug and Play
Plugging these components together and reasoning about the resulting behaviour is easy. Thanks to the rules on process privacy2, race hazards leading to unpredictable internal state do not arise. Thanks to the rules on channel synchronisation, data loss or corruption during communication cannot occur3. What makes the reasoning simple is that the parallel constructor and channel primitives are deterministic. Non-determinism has to be explicitly designed into a process and coded; it can't sneak in by accident!
Figure 3 shows a simple example of reasoning about network composition. Connect a Prefix and a Tail and we get two Ids:
  (Prefix (n, in, c) || Tail (c, out)) = (Id (in, c) || Id (c, out))
Equivalence means that no environment (i.e. external network in which they are placed) can tell them apart. In this case, both circuit fragments implement a 2-place FIFO. The only place where anything different happens is on the internal wire c, and that's undetectable from outside. The formal proof is a one-liner from the definition of the parallel (||), communications (!, ?) and and-then-becomes (-->) operators in CSP. But the good thing about CSP is that the mathematics engineered into its design and semantics cleanly reflects an intuitive human feel for the model. We can see the equivalence at a glance and this quickly builds confidence both for us and our students.
2 No external access to internal data. No external execution of internal algorithms (methods).
3 Unreliable communications over a distributed network can be accommodated in this model, the unreliable network being another active process (or set of processes) that happens not to guarantee to pass things through correctly.
Fig. 3. A simple equivalence: Prefix (n) feeding Tail over channel c equals Id feeding Id over channel c.
Fig. 4. Some more interesting circuits: Numbers (out), built from Prefix (0), Delta and Succ; Integrate (in, out), built from Plus, Delta and Prefix (0); Pairs (in, out), built from Delta, Tail and Plus.
Figure 4 shows some more interesting circuits, with the first two incorporating feedback. What do they do? Ask the students! Here are some CSP pseudo-codes for these circuits:
  Numbers (out)
    = Prefix (0, c, a) || Delta (a, out, b) || Succ (b, c)
  Integrate (in, out)
    = Plus (in, c, a) || Delta (a, out, b) || Prefix (0, b, c)
  Pairs (in, out)
    = Delta (in, a, b) || Tail (b, c) || Plus (a, c, out)
Again, our rule for these pseudo-codes means that a, b and c are locally declared channels (hidden, in the CSP sense, from the outside world). Appendices A and B list occam and Java executables; notice how closely they reflect the CSP.
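
A plain-Java sketch of the three Figure 4 circuits, wired exactly as in the pseudo-code above. It is not the code of Appendices A and B: channels are modelled (an assumption) by a SynchronousQueue subclass Ch, each catalogue component is a daemon thread, and the || inside Plus and Delta is a hypothetical par helper that runs both branches and waits for both. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;

// Hypothetical plain-Java wiring of Numbers, Integrate and Pairs (not the JCSP API).
class Circuits {
    public static void main(String[] args) {
        System.out.println(numbers(5));
        System.out.println(integrate(List.of(1L, 2L, 3L, 4L), 4));
        System.out.println(pairs(List.of(1L, 2L, 3L, 4L), 3));
    }

    static void spawn(Runnable r) { Thread t = new Thread(r); t.setDaemon(true); t.start(); }

    // CSP (left || right): run both branches, finish when both have finished.
    static void par(Runnable left, Runnable right) {
        Thread t = new Thread(right);
        t.setDaemon(true);
        t.start();
        left.run();
        try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
    }

    static void succ(Ch in, Ch out) { spawn(() -> { while (true) out.wr(in.rd() + 1); }); }
    static void prefix(long n, Ch in, Ch out) { spawn(() -> { out.wr(n); while (true) out.wr(in.rd()); }); }
    static void tail(Ch in, Ch out) { spawn(() -> { in.rd(); while (true) out.wr(in.rd()); }); }

    static void delta(Ch in, Ch out0, Ch out1) {
        spawn(() -> { while (true) { long x = in.rd(); par(() -> out0.wr(x), () -> out1.wr(x)); } });
    }

    static void plus(Ch in0, Ch in1, Ch out) {
        spawn(() -> {
            while (true) {
                long[] x = new long[2];
                par(() -> x[0] = in0.rd(), () -> x[1] = in1.rd()); // inputs in either order
                out.wr(x[0] + x[1]);
            }
        });
    }

    // Numbers (out) = Prefix (0, c, a) || Delta (a, out, b) || Succ (b, c)
    static List<Long> numbers(int count) {
        Ch a = new Ch(), b = new Ch(), c = new Ch(), out = new Ch();
        prefix(0, c, a); delta(a, out, b); succ(b, c);
        return collect(out, count);
    }

    // Integrate (in, out) = Plus (in, c, a) || Delta (a, out, b) || Prefix (0, b, c)
    static List<Long> integrate(List<Long> input, int count) {
        Ch in = new Ch(), a = new Ch(), b = new Ch(), c = new Ch(), out = new Ch();
        plus(in, c, a); delta(a, out, b); prefix(0, b, c);
        feed(in, input);
        return collect(out, count);
    }

    // Pairs (in, out) = Delta (in, a, b) || Tail (b, c) || Plus (a, c, out)
    static List<Long> pairs(List<Long> input, int count) {
        Ch in = new Ch(), a = new Ch(), b = new Ch(), c = new Ch(), out = new Ch();
        delta(in, a, b); tail(b, c); plus(a, c, out);
        feed(in, input);
        return collect(out, count);
    }

    static void feed(Ch in, List<Long> values) { spawn(() -> { for (long v : values) in.wr(v); }); }

    static List<Long> collect(Ch out, int count) {
        List<Long> got = new ArrayList<>();
        for (int i = 0; i < count; i++) got.add(out.rd());
        return got;
    }
}

// A zero-buffered channel of longs, modelled (as an assumption) by SynchronousQueue.
class Ch extends SynchronousQueue<Long> {
    long rd() { try { return take(); } catch (InterruptedException e) { throw new IllegalStateException(e); } }
    void wr(long v) { try { put(v); } catch (InterruptedException e) { throw new IllegalStateException(e); } }
}
```

Delta must really perform its two outputs in parallel: if it wrote to out0 before out1, Pairs would deadlock on its second input, because Plus is not ready for a second item on a until Tail has delivered on c.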
Back to what these circuits do: Numbers generates the sequence of natural numbers, Integrate computes running sums of its inputs and Pairs outputs the sum of its last two inputs. If we wish to be more formal, let c<i> represent the i'th element that passes through channel c, i.e. the first element through c is c<1>. Then, for any i >= 1:
  Numbers:   out<i> = i - 1
  Integrate: out<i> = Sum {in<j> | j = 1..i}
  Pairs:     out<i> = in<i> + in<i + 1>
Be careful: the above gives only part of the specification of these circuits, namely how the values in their output stream(s) relate to the values in their input stream(s). We also have to be aware of how flexible they are in synchronising with their environments, as they generate and consume those streams. The base level components Id, Succ, Plus and Delta each demand one input (or pair of inputs) before generating one output (or pair of outputs). Tail demands two inputs before its first output, but thereafter gives one output for each input. This effect carries over into Pairs. Integrate adds 2-place buffering between its input and output channels (ignoring the transformation in the actual values passed). Numbers will always deliver to anything trying to take input from it.
If necessary, we can make these synchronisation properties mathematically precise. That is, after all, one of the reasons for which CSP was designed.
2.5 Deadlock: First Contact
Consider the circuit in Figure 5. A simple stream analysis would indicate that:
  Pairs2: a<i> = in<i>
  Pairs2: b<i> = in<i>
  Pairs2: c<i> = b<i + 1> = in<i + 1>
  Pairs2: d<i> = c<i + 1> = in<i + 2>
  Pairs2: out<i> = a<i> + d<i> = in<i> + in<i + 2>
Fig. 5. A dangerous circuit: Pairs2 (in, out), in which Delta feeds Plus directly on channel a and, via two Tails, on channels b, c and d.
But this analysis only shows what would be generated if anything were generated. In this case, nothing is generated since the system deadlocks. The two Tail processes demand three items from Delta before delivering anything to Plus. But Delta can't deliver a third item to the Tails until it's got rid of its second item to Plus. But Plus won't accept a second item from Delta until it's had its first item from the Tails. Deadlock!
In this case, deadlock can be designed out by inserting an Id process on the upper (a) channel. Id processes (and FIFOs in general) have no impact on stream contents analysis but, by allowing a more decoupled synchronisation, can impact on whether streams actually flow. Beware, though, that adding buffering to channels is not a general cure for deadlock.
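
The deadlock, and its cure, can be observed with a sketch built from the same kind of hypothetical plain-Java components (Ch, par and the catalogue processes are assumptions, not JCSP classes). Pairs2 is built twice, with and without an Id on channel a, and fed the values 1 to 5; waiting one second without output is used as a heuristic deadlock detector, not a proof.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model of Pairs2 (Figure 5), with and without an Id on channel a.
class Pairs2Demo {
    public static void main(String[] args) {
        System.out.println("without Id: " + run(false));
        System.out.println("with Id:    " + run(true));
    }

    static void spawn(Runnable r) { Thread t = new Thread(r); t.setDaemon(true); t.start(); }

    // CSP (left || right) for the two branches inside Delta and Plus.
    static void par(Runnable left, Runnable right) {
        Thread t = new Thread(right);
        t.setDaemon(true);
        t.start();
        left.run();
        try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
    }

    static void id(Ch in, Ch out) { spawn(() -> { while (true) out.wr(in.rd()); }); }
    static void tail(Ch in, Ch out) { spawn(() -> { in.rd(); while (true) out.wr(in.rd()); }); }

    static void delta(Ch in, Ch out0, Ch out1) {
        spawn(() -> { while (true) { long x = in.rd(); par(() -> out0.wr(x), () -> out1.wr(x)); } });
    }

    static void plus(Ch in0, Ch in1, Ch out) {
        spawn(() -> {
            while (true) {
                long[] x = new long[2];
                par(() -> x[0] = in0.rd(), () -> x[1] = in1.rd());
                out.wr(x[0] + x[1]);
            }
        });
    }

    // Feeds 1..5 and collects up to three outputs; stops at the first one-second silence.
    static List<Long> run(boolean idOnA) {
        Ch in = new Ch(), a = new Ch(), b = new Ch(), c = new Ch(), d = new Ch(), out = new Ch();
        if (idOnA) {
            Ch a0 = new Ch();
            delta(in, a0, b);
            id(a0, a);               // the fix: a one-place FIFO on the upper channel
        } else {
            delta(in, a, b);
        }
        tail(b, c);
        tail(c, d);
        plus(a, d, out);
        spawn(() -> { for (long v = 1; v <= 5; v++) in.wr(v); });
        List<Long> got = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            Long x = out.rdWithin(1000);
            if (x == null) break;    // no progress: treat as deadlock
            got.add(x);
        }
        return got;
    }
}

// Zero-buffered channel of longs, modelled (as an assumption) by SynchronousQueue.
class Ch extends SynchronousQueue<Long> {
    long rd() { try { return take(); } catch (InterruptedException e) { throw new IllegalStateException(e); } }
    void wr(long v) { try { put(v); } catch (InterruptedException e) { throw new IllegalStateException(e); } }
    Long rdWithin(long ms) {
        try { return poll(ms, TimeUnit.MILLISECONDS); } catch (InterruptedException e) { throw new IllegalStateException(e); }
    }
}
```

Without the Id, no output ever appears; with it, the outputs follow out<i> = in<i> + in<i + 2>, giving 4, 6 and 8 for the inputs 1 to 5.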
So, there are always two questions to answer: what data flows through the channels, assuming data does flow, and are the circuits deadlock-free? Deadlock is a monster that must, and can, be vanquished. In CSP, deadlock only occurs from a cycle of committed attempts to communicate (input or output): each process in the cycle refusing its predecessor's call as it tries to contact its successor. Deadlock potential is very visible: we even have a deadlock primitive (STOP) to represent it, on the grounds that it is a good idea to know your enemy!
In practice, there now exists a wealth of design rules that provide formally proven guarantees of deadlock freedom [17–22]. Design tools supporting these rules, both constructive and analytical, have been researched [23, 24]. Deadlock, together with related problems such as livelock and starvation, need threaten us no longer, even in the most complex of parallel systems.
2.6 Structured Plug and Play
Consider the circuits of Figure 6. They are similar to the previous circuits, but contain components other than those from our base catalogue: they use components we have just constructed. Here is the CSP:
We have seen how fixed capacity FIFO buffers can be added as active processes to CSP channels. For the occam binding, the overheads for such extra processes are negligible.
With the JavaPP libraries, the same technique may be used, but the channel objects can be directly configured to support buffered communications, which saves a couple of context switches. The user may supply objects supporting any buffering strategy for channel configuration, including normal blocking buffers, overwrite-when-full buffers, infinite buffers and black-hole buffers (channels that can be written to but not read from, useful for masking off unwanted outputs from components that, otherwise, we wish to reuse intact). However, the user had better stay aware of the semantics of the channels thus created!
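
As an illustration of one of those strategies, here is a minimal overwrite-when-full channel buffer in plain Java. It is a sketch under stated assumptions, not the JavaPP/JCSP buffer classes: OverwritingChannel, write and read are hypothetical names; writers never block and, when the buffer is full, the oldest item is discarded, while readers block while it is empty.

```java
import java.util.ArrayDeque;

// Hypothetical overwrite-when-full buffered channel (illustration only).
class OverwritingChannel<T> {
    public static void main(String[] args) {
        OverwritingChannel<Integer> c = new OverwritingChannel<>(3);
        for (int i = 1; i <= 5; i++) c.write(i);                           // never blocks
        System.out.println(c.read() + " " + c.read() + " " + c.read());   // prints 3 4 5
    }

    private final ArrayDeque<T> items = new ArrayDeque<>();
    private final int capacity;

    OverwritingChannel(int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    // Writer side: if full, drop the oldest item to make room; then wake any waiting reader.
    synchronized void write(T item) {
        if (items.size() == capacity) items.removeFirst();
        items.addLast(item);
        notifyAll();
    }

    // Reader side: block while empty, then take the oldest surviving item.
    synchronized T read() {
        while (items.isEmpty()) {
            try { wait(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return items.removeFirst();
    }
}
```

Such a channel suits a reader that only cares about recent values (a display, say); it silently changes the stream semantics, which is exactly why the user must stay aware of what each configured channel does.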
Asynchronous communication is commonly found in libraries supporting interprocessor message-passing (such as PVM and MPI). However, the concurrency model usually supported is one for which there is only one thread of control on each processor. Asynchronous communication lets that thread of control launch an external communication and continue with its computation. At some point, that computation may need to block until that communication has completed.
These mechanisms are easy to obtain from the concurrency model we are teaching (and which we claim to be general). We don't need anything new. Asynchronous sends are what happen when we output to a buffer (or buffered channel). If we are worried about being blocked when the buffer is full or if we need to block at some later point (should the communication still be unfinished), we can simply spawn off another process7 to do the send:
  (out ! packet --> SKIP |PRI| someMoreComputation (...));
  continue (...)
The continue process only starts when both the packet has been sent and someMoreComputation has finished. someMoreComputation and sending the packet proceed concurrently. We have used the priority version of the parallel operator (|PRI|, which gives priority to its left operand), to ensure that the sending process initiates the transfer before the someMoreComputation is scheduled.
Asynchronous receives are implemented in the same way:
  (in ? packet --> SKIP |PRI| someMoreComputation (...));
  continue (...)
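
The same pattern in plain Java, as a sketch: the send runs in its own thread while the computation proceeds, and join() stands in for the sequential ';' before continue. Java has no equivalent of |PRI| (thread priorities are only hints), so the sender is simply started first. The slow receiver, its 500 ms delay and all names are hypothetical. An asynchronous receive is obtained the same way, with take() in place of put().

```java
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of (out ! packet --> SKIP |PRI| someMoreComputation (...)); continue (...)
class AsyncSend {
    public static void main(String[] args) { System.out.println(run()); }

    static String run() {
        SynchronousQueue<String> out = new SynchronousQueue<>();
        AtomicBoolean sent = new AtomicBoolean(false);

        // A slow receiver elsewhere in the network, ready only after 500 ms.
        Thread receiver = new Thread(() -> {
            try {
                Thread.sleep(500);
                out.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();

        // out ! packet, spawned off as its own process.
        Thread sender = new Thread(() -> {
            try {
                out.put("packet");
                sent.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.start();

        // someMoreComputation (...): proceeds while the send is still pending.
        long sum = 0;
        for (int i = 1; i <= 1000; i++) sum += i;
        boolean pendingDuringComputation = !sent.get();

        // continue (...): only after both the send and the computation have finished.
        try {
            sender.join();
            receiver.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return "sum=" + sum + ", send pending during computation=" + pendingDuringComputation
                + ", sent before continue=" + sent.get();
    }
}
```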
2.10 Shared Channels
CSP channels are strictly point-to-point. occam3 [28] introduced the notion of (securely) shared channels and channel structures. These are further extended in the KRoC occam [29] and JavaPP libraries and are included in the teaching model.
7 The occam overheads for doing this are less than half a microsecond.
A channel structure is just a record (or object) holding two or more CSP channels. Usually, there would be just two channels, one for each direction of communication. The channel structure is used to conduct a two-way conversation between two processes. To avoid deadlock, of course, they will have to understand protocols for using the channel structure, such as who speaks first and when the conversation finishes. We call the process that opens the conversation a client and the process that listens for that call a server8.
Fig. 13. A many-many shared channel (clients on one side, servers on the other).
The CSP model is extended by allowing multiple clients and servers to share the same channel (or channel structure); see Figure 13. Sanity is preserved by ensuring that only one client and one server use the shared object at any one time. Clients wishing to use the channel queue up first on a client-queue (associated with the shared channel), servers on a server-queue (also associated with the shared channel). A client only completes its actions on the shared channel when it gets to the front of its queue, finds a server (for which it may have to wait if business is good) and completes its transaction. A server only completes when it reaches the front of its queue, finds a client (for which it may have to wait in times of recession) and completes its transaction.
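
A shared channel structure of this kind can be sketched in plain Java, assuming a request channel and a reply channel (both SynchronousQueues) guarded by two fair ReentrantLocks that play the roles of the client-queue and the server-queue. This is not the occam3, KRoC or JavaPP mechanism, only an illustration of the same discipline: a client keeps its place for the whole conversation, and so does the server, so a reply always goes back to the client that asked. The squaring service and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.locks.ReentrantLock;

class SharedChannelDemo {
    public static void main(String[] args) { System.out.println(run(5, 2)); }

    // Hypothetical shared channel structure: one channel each way plus a fair lock per side.
    static final class SharedStructure {
        private final SynchronousQueue<Integer> request = new SynchronousQueue<>();
        private final SynchronousQueue<Integer> reply = new SynchronousQueue<>();
        private final ReentrantLock clientQueue = new ReentrantLock(true); // fair: longest waiter first
        private final ReentrantLock serverQueue = new ReentrantLock(true);

        // Client: reach the front of the client-queue, then hold one whole conversation.
        int call(int question) throws InterruptedException {
            clientQueue.lockInterruptibly();
            try {
                request.put(question);   // may wait for a server ("business is good")
                return reply.take();
            } finally {
                clientQueue.unlock();
            }
        }

        // Server: reach the front of the server-queue, then serve exactly one client.
        void serveOne() throws InterruptedException {
            serverQueue.lockInterruptibly();
            try {
                int q = request.take();  // may wait for a client ("times of recession")
                reply.put(q * q);        // hypothetical service: square the request
            } finally {
                serverQueue.unlock();
            }
        }
    }

    static List<Integer> run(int clients, int servers) {
        SharedStructure shared = new SharedStructure();
        for (int s = 0; s < servers; s++) {
            Thread server = new Thread(() -> {
                try {
                    while (true) shared.serveOne();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            server.setDaemon(true);
            server.start();
        }
        Integer[] answers = new Integer[clients];
        List<Thread> callers = new ArrayList<>();
        for (int c = 0; c < clients; c++) {
            final int id = c;
            Thread client = new Thread(() -> {
                try {
                    answers[id] = shared.call(id);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            callers.add(client);
            client.start();
        }
        for (Thread t : callers) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return List.of(answers);
    }
}
```

Which server answers which client depends on queue arrival order, which is the scheduling-dependent non-determinism noted next; the answers themselves do not.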
Note that shared channels, like the choice operator between multiple events, introduce scheduling-dependent non-determinism. The order in which processes are granted access to the shared channel depends on the order in which they join the queues.
Shared channels provide a very efficient mechanism for a common form of choice. Any server that offers a non-discriminatory service9 to multiple clients should use a shared channel, rather than ALTing between individual channels from those clients. The shared channel has a constant time overhead; ALTing is linear on the number of clients. However, if the server needs to discriminate between its clients (e.g. to refuse service to some, depending upon its internal state), ALTing gives us that flexibility. The mechanisms can be efficiently combined. Clients can be grouped into equal-treatment partitions, with each group clustered on its own shared channel and the server ALTing between them.
8 In fact, the client/server relationship is with respect to the channel structure. A process may be both a server on one interface and a client on another.
9 Examples of such servers include window managers for multiple animation processes, data loggers for recording traces from multiple components from some machine, etc.
For deadlock freedom, each server must guarantee to respond to a client call within some bounded time. During its transaction with the client, it must follow the protocols for communication defined for the channel structure and it may engage in separate client transactions with other servers. A client may open a transaction at any time but may not interleave its communications with the server with any other synchronisation (e.g. with another server). These rules have been formalised as CSP specifications [21]. Client-server networks may have plenty of data-flow feedback but, so long as no cycle of client-server relations exists, [21] gives formal proof that the system is deadlock, livelock and starvation free.
Shared channel structures may be stretched across distributed memory (e.g. networked) multiprocessors [15]. Channels may carry all kinds of object, including channels and processes themselves. A shared channel is an excellent means for a client and server to find each other, pass over a private channel and communicate independently of the shared one. Processes will drag pre-attached channels with them as they are moved and can have local channels dynamically (and temporarily) attached when they arrive. See David May's work on Icarus [30, 31] for a consistent, simple and practical realisation of this model for distributed and mobile computing.
3 Events and Shared Memory
Shared memory concurrency is often described as being 'easier' than message passing. But great care must be taken to synchronise concurrent access to shared data, else we will be plagued with race hazards and our systems will be useless. CSP primitives provide a sharp set of tools for exercising this control.
3.1 Symmetric Multi-Processing (SMP)
The private memory/algorithm principles of the underlying model, and the security guarantees that go with them, are a powerful way of programming shared memory multiprocessors. Processes can be automatically and dynamically scheduled between available processors (one object code fits all). So long as there is an excess of (runnable) processes over processors and the scheduling overheads are sufficiently low, high multiprocessor efficiency can be achieved, with guaranteed no race hazards. With the design methods we have been describing, it's very easy to generate lots of processes with most of them runnable most of the time.
3.2 Token Passing and Dynamic CREW
Taking advantage of shared memory to communicate between processes is an extension to this model and must be synchronised. The shared data does not belong to any of the sharing processes, but must be globally visible to them, either on the stack (for occam) or heap (for Java).
The JavaPP channels in previous examples were only used to send data values between processes, but they can also be used to send objects. This steps outside the automatic guarantees against race hazard since, unconstrained, it allows parallel access to the same data. One common and useful constraint is only to send immutable objects. Another design pattern treats the sent object as a token conferring permission to use it, the sending process losing the token as a side-effect of the communication. The trick is to ensure that only one copy of the token ever exists for each sharable object.
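
A token-passing sketch in plain Java, under these assumptions: two mutable int[] buffers are the tokens; the producer acquires an empty one, fills it, sends it on a zero-buffered channel (a SynchronousQueue) and drops its reference, and the consumer returns it after use through a two-place buffered channel (an ArrayBlockingQueue). Because only two buffers ever exist and each is held by exactly one process at a time, the buffer contents need no locking. TokenPassing and run are hypothetical names.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;

// Hypothetical token-passing sketch: whoever holds the array reference owns the buffer.
class TokenPassing {
    public static void main(String[] args) { System.out.println(run(10, 4)); }

    static long run(int batches, int size) {
        BlockingQueue<int[]> full = new SynchronousQueue<>();      // producer -> consumer
        BlockingQueue<int[]> empty = new ArrayBlockingQueue<>(2);  // consumer -> producer
        empty.add(new int[size]);
        empty.add(new int[size]);  // exactly two tokens ever exist (double buffering)

        Thread producer = new Thread(() -> {
            try {
                for (int b = 0; b < batches; b++) {
                    int[] buf = empty.take();                   // acquire a token
                    for (int i = 0; i < size; i++) buf[i] = b * size + i;
                    full.put(buf);                              // pass ownership on ...
                    buf = null;                                 // ... and forget the reference
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        long total = 0;
        try {
            for (int b = 0; b < batches; b++) {
                int[] buf = full.take();                        // now the consumer owns it
                for (int v : buf) total += v;
                empty.put(buf);                                 // hand the token back
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return total;
    }
}
```

With 10 batches of 4, every value from 0 to 39 is produced and consumed exactly once, so the total is 780. The BlockingQueue hand-over also gives the memory-visibility guarantee needed for the receiver to see the sender's writes.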
Dynamic CREW (Concurrent Read Exclusive Write) operations are also possible with shared memory. Shared channels give us an efficient, elegant and easily provable way to construct an active guardian process with which application processes synchronise to effect CREW access to the shared data. Guarantees against starvation of writers by readers, and vice-versa, are made. Details will appear in a later report (available from [32]).
3.3 Structured Barrier Synchronisation and SPMD
Point-to-point channels are just a specialised form of the general CSP multi-process synchronising event. The CSP parallel operator binds processes together with events. When one process synchronises on an event, all processes registered for that event must synchronise on it before that first process may continue. Events give us structured multiway barrier synchronisation [29].
Fig. 14. Multiple barriers to three processes (execution traces of P, M and D, synchronising on events b0, b1 and b2).
We can have many event barriers in a system, with different (and not necessarily disjoint) subsets of processes registered for each barrier. Figure 14 shows the execution traces for three processes (P, M and D) with time flowing horizontally. They do not all progress at the same, or even constant, speed. From time to time, the faster ones will have to wait for their slower partners to reach an agreed barrier before all of them can proceed. We can wrap up the system in typical SPMD form as:
  || <i = 0 FOR 3>
    S (i, ..., b0, b1, b2)
where b0, b1 and b2 are events. The replicated parallel operator runs 3 instances of S in parallel (with i taking the values 0, 1 and 2 respectively in the different instances). The S process simply switches into the required form:
  S (i, ..., b0, b1, b2)
    = CASE i
        0 : P (..., b0, b1)
        1 : M (..., b0, b1, b2)
        2 : D (..., b1, b2)
and where P, M and D are registered only for the events in their parameters. The code for P has the form:
  P (..., b0, b1)
    = someWork (...); b0 --> SKIP;
      moreWork (...); b0 --> SKIP;
      lastBitOfWork (...); b1 --> SKIP;
      P (..., b0, b1)
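
The SPMD form above can be sketched in Java with java.util.concurrent.Phaser standing in for CSP events (an assumption: a Phaser created with a fixed number of parties behaves as a multiway barrier for exactly those parties). The replicated parallel becomes a loop starting three threads, and CASE i becomes an if-chain. P follows the code in the text with its work omitted; the barrier sequences for M and D are invented for illustration, since the text does not give them.

```java
import java.util.List;
import java.util.concurrent.Phaser;

// Hypothetical SPMD sketch of || <i = 0 FOR 3> S (i, ..., b0, b1, b2).
class SpmdBarriers {
    public static void main(String[] args) { System.out.println(run(3)); }

    // Synchronise on an event: wait until every registered party has arrived.
    static void sync(Phaser event) { event.arriveAndAwaitAdvance(); }

    static List<Integer> run(int cycles) {
        Phaser b0 = new Phaser(2);  // registered: P, M
        Phaser b1 = new Phaser(3);  // registered: P, M, D
        Phaser b2 = new Phaser(2);  // registered: M, D
        Thread[] procs = new Thread[3];
        for (int i = 0; i < 3; i++) {
            final int id = i;
            procs[i] = new Thread(() -> {
                for (int c = 0; c < cycles; c++) {
                    if (id == 0) {          // P: someWork; b0; moreWork; b0; lastBitOfWork; b1
                        sync(b0); sync(b0); sync(b1);
                    } else if (id == 1) {   // M: hypothetical order
                        sync(b0); sync(b0); sync(b1); sync(b2);
                    } else {                // D: hypothetical order
                        sync(b1); sync(b2);
                    }
                }
            });
            procs[i].start();
        }
        for (Thread t : procs) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        // Completed phases per event: b0 twice per cycle, b1 and b2 once per cycle.
        return List.of(b0.getPhase(), b1.getPhase(), b2.getPhase());
    }
}
```

Each Phaser only counts the processes registered for it, mirroring the rule that P, M and D synchronise only on the events in their parameters.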
3.4 Non-Blocking Barrier Synchronisation
In the same way that asynchronous communications can be expressed (section 2.9), we can also achieve the somewhat contradictory sounding, but potentially useful, non-blocking barrier synchronisation.
In terms of serial programming, this is a two-phase commitment to the barrier. The first phase declares that we have done everything we need to do this side of the barrier, but does not block us. We can then continue for a while, doing things that do not disturb what we have set up for our partners in the barrier and do not need whatever it is that they have to set. When we need their work, we enter the second phase of our synchronisation on the barrier. This blocks us only if there is one, or more, of our partners who has not reached the first phase of its synchronisation. With luck, this window on the barrier will enable most processes most of the time to pass through without blocking:
  doOurWorkNeededByOthers (...);
  barrier.firstPhase ();
  privateWork (...);
  barrier.secondPhase ();
  useSharedResourcesProtectedByTheBarrier (...);
With our lightweight CSP processes, we do not need these special phases to get the same effect:
  doOurWorkNeededByOthers (...);
  (barrier --> SKIP |PRI| privateWork (...));
  useSharedResourcesProtectedByTheBarrier (...);
The explanation as to why this works is just the same as for the asynchronous sends and receives.
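
For Java programmers, java.util.concurrent.Phaser happens to provide the two-phase form directly: arrive() records arrival without blocking and returns the phase number, and awaitAdvance(phase) blocks only until every registered party has arrived. The sketch below swaps in Phaser for the CSP construction (it is not the CSP technique itself): each hypothetical worker publishes one slot of a shared array before arriving, does private work, then reads the whole array after the second phase.

```java
import java.util.Arrays;
import java.util.concurrent.Phaser;

// Hypothetical two-phase barrier sketch using Phaser.arrive() / awaitAdvance(phase).
class SplitPhaseBarrier {
    public static void main(String[] args) { System.out.println(Arrays.toString(run(4))); }

    static int[] run(int n) {
        Phaser barrier = new Phaser(n);
        int[] shared = new int[n];   // slot i is written only by worker i, before the barrier
        int[] seen = new int[n];     // what each worker observes after the barrier
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int me = i;
            workers[i] = new Thread(() -> {
                shared[me] = me + 1;                 // doOurWorkNeededByOthers (...)
                int phase = barrier.arrive();        // first phase: never blocks
                long scratch = 0;                    // privateWork (...): touches nothing shared
                for (int k = 0; k < 1_000; k++) scratch += k;
                barrier.awaitAdvance(phase);         // second phase: blocks only if a partner is late
                int sum = 0;
                for (int v : shared) sum += v;       // useSharedResourcesProtectedByTheBarrier (...)
                seen[me] = sum;
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return seen;
    }
}
```

Phaser guarantees that writes made before arrive() are visible to every party after the phase advances, so each worker sees all four slots and computes the same total.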
3.5 Bucket Synchronisation
Although CSP allows choice over general events, the occam and Java bindings do not. The reasons are practical: a concern for run-time overheads10. So, synchronising on an event commits a process to wait until everyone registered for the event has synchronised. These multi-way events, therefore, do not introduce non-determinism into a system and provide a stable platform for much scientific and engineering modelling.
Buckets [15] provide a non-deterministic version of events that are useful for when the system being modelled is irregular and dynamic (e.g. motor vehicle traffic [33]). Buckets have just two operations: jump and kick. There is no limit to the number of processes that can jump into a bucket, where they all block. Usually, there will only be one process with responsibility for kicking over the bucket. This can be done at any time of its own (internal) choosing, hence the non-determinism. The result of kicking over a bucket is the unblocking of all the processes that had jumped into it11.
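
A minimal bucket can be sketched in plain Java with a monitor and a kick counter: jump() blocks until the next kick(), and kick() releases everyone who had jumped in. This is a hypothetical illustration, not the occam or JavaPP implementation, and it does not have the constant-time overheads quoted for those bindings (notifyAll wakes every waiter). The awaitJumpers helper exists only so the demo can pick its kick moment deterministically.

```java
// Hypothetical bucket with jump and kick (illustration only).
class Bucket {
    public static void main(String[] args) { System.out.println(demo(3)); }

    private int jumpedIn = 0;   // processes currently blocked in the bucket
    private long kicks = 0;     // how many times the bucket has been kicked over

    // Jump into the bucket and block until the next kick.
    synchronized void jump() {
        long kicksSeen = kicks;
        jumpedIn++;
        notifyAll();            // lets awaitJumpers re-check its condition
        while (kicks == kicksSeen) {
            try { wait(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
    }

    // Kick the bucket over: release every process that had jumped in; returns how many.
    synchronized int kick() {
        int released = jumpedIn;
        jumpedIn = 0;
        kicks++;
        notifyAll();
        return released;
    }

    // Demo helper only: wait until n processes are in the bucket.
    synchronized void awaitJumpers(int n) {
        while (jumpedIn < n) {
            try { wait(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
    }

    static int demo(int n) {
        Bucket bucket = new Bucket();
        Thread[] jumpers = new Thread[n];
        for (int i = 0; i < n; i++) {
            jumpers[i] = new Thread(bucket::jump);
            jumpers[i].start();
        }
        bucket.awaitJumpers(n); // the kicker chooses its own moment; here, once all n are in
        int released = bucket.kick();
        for (Thread t : jumpers) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return released;
    }
}
```

In a real model the kicker would not wait for a known number of jumpers; it would kick whenever its own internal logic decides, which is where the non-determinism comes from.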
4 Conclusions
A simple model for parallel computing has been presented that is easy to learn, teach and use. Based upon the mathematically sound framework of Hoare's CSP, it has a compositional semantics that corresponds well with our intuition about how the world is constructed. The basic model encompasses object-oriented design with active processes (i.e. objects whose methods are exclusively under their own thread of control) communicating via passive, but synchronising, wires. Systems can be composed through natural layers of communicating components so that an understanding of each layer does not depend on an understanding of the inner ones. In this way, systems with arbitrarily complex behaviour can be safely constructed, free from race hazard, deadlock, livelock and process starvation.
A small extension to the model addresses fundamental issues and paradigms
for shared-memory concurrency (such as token passing, CREW dynamics and
bulk synchronisation). We can explore with equal fluency serial, message-passing
and shared-memory logic and strike whatever balance between them is
appropriate for the problem under study. Applications include hardware design (e.g.
FPGAs and ASICs), real-time control systems, animation, GUIs, regular and
irregular modelling, distributed and mobile computing.
occam and Java bindings for the model are available to support practical
work on commodity PCs and workstations. Currently, the occam bindings are
the fastest (context-switch times under 300 nanoseconds), lightest (in terms
of memory demands), most secure (in terms of guaranteed thread safety) and
quickest to learn. But Java has the libraries (e.g. for GUIs and graphics) and
will get faster. Java thread safety, in this context, depends on following the CSP
design patterns, and these are easy to acquire¹².

10 Synchronising on an event in occam has a unit time overhead, regardless of the
number of processes registered. This includes being the last process to synchronise, when
all blocked processes are released. These overheads are well below a microsecond for
modern microprocessors.
11 As for events, the jump and kick operations have constant time overhead, regardless
of the number of processes involved. The bucket overheads are slightly lower than
those for events.
The JavaPP JCSP library [11] also includes an extension to the Java AWT
package that attaches channel interfaces to all GUI components¹³. Each item (e.g.
a Button) is a process with a configure and an action channel interface. These are
connected to separate internal handler processes. To change the text or colour
of a Button, an application process outputs to its configure channel. If someone
presses the Button, it outputs down its action channel to an application
process (which can accept or refuse the communication as it chooses). Example
demonstrations of the use of this package may be found at [11]. Whether
GUI programming through the process-channel design pattern is simpler than
the listener-callback pattern offered by the underlying AWT, we leave for the
interested reader to experiment and decide.
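The process-channel shape of such a component can be sketched without any GUI at all. In this hypothetical Java sketch (the names ChannelButtonDemo, ChannelButton, press and label are illustrative, not the JCSP AWT API), java.util.concurrent.SynchronousQueue stands in for a synchronising, unbuffered channel, an internal handler thread applies configure messages, and a simulated press blocks until an application process accepts it on the action channel. Choice over several channels, which a real application process would use to refuse a press, is not modelled.

```java
import java.util.concurrent.SynchronousQueue;

public class ChannelButtonDemo {

    // Hypothetical button process with a configure and an action channel.
    static class ChannelButton {
        final SynchronousQueue<String> configure = new SynchronousQueue<>(); // application -> button
        final SynchronousQueue<String> action = new SynchronousQueue<>();    // button -> application
        private volatile String label;                                       // text shown on the button

        ChannelButton(String label) {
            this.label = label;
        }

        // Starts the internal handler that applies each configure message in turn.
        ChannelButton start() {
            Thread configurer = new Thread(() -> {
                try {
                    while (true) {
                        label = configure.take();
                    }
                } catch (InterruptedException e) {
                    // handler stops when interrupted
                }
            });
            configurer.setDaemon(true); // do not keep the JVM alive
            configurer.start();
            return this;
        }

        // Called by the (simulated) windowing system on a press; blocks until
        // an application process accepts the communication on the action channel.
        void press() throws InterruptedException {
            action.put(label);
        }

        String label() {
            return label;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ChannelButton button = new ChannelButton("Start").start();
        Thread user = new Thread(() -> {            // simulated user pressing once
            try {
                button.press();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        user.start();
        System.out.println("pressed: " + button.action.take()); // prints "pressed: Start"
        user.join();
        button.configure.put("Stop");               // application reconfigures the button
        while (!button.label().equals("Stop")) {    // the handler applies it asynchronously
            Thread.yield();
        }
        System.out.println("label now: " + button.label()); // prints "label now: Stop"
    }
}
```

Note that the application never calls back into the button: it only reads from the action channel and writes to the configure channel, which is the contrast with the listener-callback style the paragraph raises.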
All the primitives described in this paper are available for KRoC occam and
Java. Multiprocessor versions of the KRoC kernel targeting networks of workstations
(NoWs) and SMPs will be available later this year. The JCSP [11] and CJT [12] libraries
run on SMPs automatically if your JVM supports SMP threads. Hooks are provided in the
channel libraries to allow user-defined network drivers to be installed. Research
is continuing on portable/faster kernels and language/tool design for enforcing
higher level aspects of CSP design patterns (e.g. for shared memory safety and
deadlock freedom) that currently rely on self-discipline.
Finally, we stress that this is undergraduate material. The concepts are
mature and fundamental, not advanced, and the earlier they are introduced the
better. For developing fluency in concurrent design and implementation, no
special hardware is needed. Students can graduate to real parallel systems once they
have mastered this fluency. The CSP model is neutral with respect to parallel
architecture, so coping with a change in language or paradigm is
straightforward. However, even for uni-processor applications, the ability to do safe and
lightweight multithreading is becoming crucial, both to improve response times
and to simplify their design.
The experience at Kent is that students absorb these ideas very quickly and
become very creative¹⁴. Now that they can apply them in the context of Java,
they are smiling indeed.
12 Java active objects (processes) do not invoke each other's methods, but
communicate only through shared passive objects with carefully designed synchronisation
properties (e.g. channels and events). Shared use of user-defined passive objects will
be automatically thread-safe so long as the usage patterns outlined in Section 3 are
kept; their methods should not be synchronized (in the sense of Java monitors).
13 We believe that the new Swing GUI libraries from Sun (that will replace the AWT)
can also be extended through a channel interface for secure use in parallel designs,
despite the warnings concerning the use of Swing and multithreading [34].
14 The JCSP libraries used in Appendix B were produced by Paul Austin, an
undergraduate student at Kent.
References
1. C.A.R. Hoare. Communicating Sequential Processes. CACM, 21(8):666-677, August 1978.
2. C.A.R. Hoare. Communicating Sequential Processes. Prentice Hall, 1985.
3. Oxford University Computer Laboratory. The CSP Archive. <URL: http://www.comlab.ox.ac.uk/archive/csp.html>, 1997.
4. P.H. Welch and D.C. Wood. KRoC: the Kent Retargetable occam Compiler. In B. O'Neill, editor, Proceedings of WoTUG 19, Amsterdam, March 1996. WoTUG, IOS Press. <URL: http://www.hensa.ac.uk/parallel/occam/projects/occam-for-all/kroc/>.
5. Peter H. Welch and Michael D. Poole. occam for Multi-Processor DEC Alphas. In A. Bakkers, editor, Parallel Programming and Java, Proceedings of WoTUG 20, volume 50 of Concurrent Systems Engineering, pages 189-198, Amsterdam, Netherlands, April 1997. World occam and Transputer User Group (WoTUG), IOS Press.
6. Peter Welch et al. Java Threads Workshop: Post Workshop Discus-