Scientific Modeling with Massively Parallel SIMD Computers

NIGEL B. WILDING, ARTHUR S. TREW, KEN A. HAWICK, AND G. STUART PAWLEY

Invited Paper

A great many scientific models possess a high degree of inherent parallelism. For simulation purposes this may often be exploited by employing a massively parallel SIMD computer. We describe one such computer, the distributed array processor (DAP), and discuss the optimal mapping of a typical problem onto the computer architecture to best exploit the model parallelism. By focussing on specific models currently under study, we exemplify the types of problem which benefit most from a parallel implementation. The extent of this benefit is considered relative to implementation on a machine of conventional architecture.

I. INTRODUCTION

It has been a tradition in science to seek an understanding of the world by decomposing problems into progressively smaller subunits. While being most successful in many senses, this approach has suffered from the disadvantage that it fails to provide information on the manner in which individually simple units or subunits interact with one another to generate collectively complex behavior. Thus for example, we gain little understanding of the cooperative behavior of the atoms in a crystal from the study of the inner structure of atoms. Nevertheless, there are a number of fundamental many-body problems that must be tackled, and in many cases very little analytic progress has been made towards their understanding. In an effort to gain more knowledge of such complex phenomena, scientists are appealing increasingly to computer simulation techniques. In recent years, advances in these techniques have coupled with the huge expansion in computer power and availability to establish simulation as a vital tool for the study of many scientific phenomena.

Manuscript received April 5, 1990; revised August 31, 1990. Funding for DAP 510 and DAP 608 was provided by the Alvey ARC H001 project and by the Computer Board. The first author was supported by a science faculty scholarship from the University of Edinburgh. The third author was supported by an SERC CASE studentship with AERE Harwell. N. B. Wilding and G. S. Pawley are with the Department of Physics, University of Edinburgh, Edinburgh EH9 3JZ, United Kingdom. A. S. Trew and K. A. Hawick are with the Edinburgh Parallel Computing Centre, University of Edinburgh, Edinburgh EH9 3JZ, United Kingdom. IEEE Log Number 9042729.

In many respects, a computer simulation resembles an experiment. No a priori bias is incorporated in a simulation; we simply specify the physical laws governing the interactions between the constituents and allow the simulation to evolve as it will. In common with real experiments, simulations offer the possibility of new discoveries while possessing the additional advantage that complete control is retained over all experimental parameters. Simulations are also of use to theorists, as they may help to elucidate the properties of analytically opaque models. They therefore play an important complementary role to experimentation and theory alike and serve as a modern bridge between them.

Although the task of programming a computer to model a complex system can be relatively easy, these programs almost invariably require extremely computer-intensive calculations. This difficulty stems from the fact that natural systems typically involve huge numbers of interacting particles. As the number of particles that can be used in a simulation is necessarily small, it is imperative that we study the limitations imposed by the restrictions on the particle number. With present-day computers, attainable sample sizes range typically from 10² to 10⁶ particles. The most serious drawback of small simulations is that their spatial extent is often short relative to the correlation lengths over which cooperative phenomena can occur. When this occurs, a simulation can develop spurious artifacts and non-physical features, usually termed finite-size effects. Although there is usually some scope for extrapolation of results derived from medium-scale models to larger systems, this is generally a non-trivial procedure. One is therefore often faced with the need to perform larger-scale simulations. This requirement itself necessitates computing power usually found only in supercomputers.

II. THE RISE OF THE MACHINE

The first computers were serial devices, that is they adhered to the model of a von Neumann machine and executed program instructions in strict time sequence. Clearly, this is a very important computational model since the vast majority of computers are still made this way. It does, however, suffer from one important problem: a potential lack of speed. Since all operations are performed consecutively, the only way to reduce overall execution time is to reduce the time required for each instruction. This means constructing faster processors which, in order to reduce signal propagation times, means making smaller chips. Unfortunately, it is now clear that in terms of chip design we are approaching the physical limits of conventional technology. Therefore, orders of magnitude improvements in processor speeds seem improbable, yet the computational tasks which must be tackled are becoming ever larger. So how do we progress?

One answer has been known for many years: move away from the sequential model of computing. There are basically two methods for doing this, vectorization or parallelization. The former is used in a number of different commercial computers, most notably the Cray. It relies upon the principle of a pipeline of processors, each performing part of the work and then passing its result along to the next, and so forth. This is analogous to an assembly line in a factory. Vectorization is most powerful when working with large data sets in which operations on arrays form a large part of the computational load. However, it suffers from the same problem as the serial machine with physical limitations on processor speeds. Even the vector computer manufacturers are now looking towards parallelism as the direction for the future.

A parallel computer is one in which there are a large number of linked, but independent processors. These execute instructions on their data concurrently, passing information between processors as required. There are two basic models of parallelism: single instruction multiple datastream (SIMD) and multiple instruction multiple datastream (MIMD). In the former all processors perform the same tasks synchronously but on different data, while in the latter both the tasks and data may differ. In either case the speed of the computer is derived not from the power of any single processor but from their multiplicity. This permits the use of less sophisticated, and therefore cheaper, chip sets than their serial or vector counterparts. The Active Memory Technology distributed array processor (DAP) described hereafter is a massively parallel SIMD array processor on which we have ten years of experience.

Many problems in scientific programming, for example fluid flow or image processing, have an inherent parallelism which results in the need to repeat the same operation many times on different data. In these cases, the SIMD computational model fits the application more naturally than MIMD and since the operations are identical the problem is inherently load-balanced. This means that there is no loss of efficiency due to the processors working in lock-step. In fact, the terms data-parallel and control-parallel have been coined by Thinking Machines, the maker of the Connection Machine (a large SIMD computer), as alternative descriptions of SIMD and MIMD since these better reflect the form of the parallelism within the model.
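To make the data-parallel idea concrete: in a SIMD formulation the same operation is applied to every element of an array at once, instead of element by element. The following Python/NumPy fragment is purely illustrative (it is not DAP code) and simply contrasts the two styles of expressing one elementary update.

    import numpy as np

    data = np.arange(16, dtype=float).reshape(4, 4)

    # Serial (von Neumann) style: one element at a time.
    serial = data.copy()
    for i in range(serial.shape[0]):
        for j in range(serial.shape[1]):
            serial[i, j] = 2.0 * serial[i, j] + 1.0

    # Data-parallel style: the same operation expressed on the whole array.
    # On a SIMD machine each processing element would hold one element and
    # all would execute the operation in lock-step.
    parallel = 2.0 * data + 1.0

    assert np.array_equal(serial, parallel)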

III. DAP ARCHITECTURE

The DAP is a massively parallel SIMD supercomputer which contains either 32 x 32 (DAP 500 series) or 64 x 64 (DAP 600 series) processing elements (PEs). It is front-ended by a UNIX or VMS workstation which provides the operating system and program I/O; from the user's point of view, therefore, operation is very straightforward.

The PEs are arranged as a two-dimensional array, the edges of which may be connected together to form a torus topology. This ability to select the hardware configuration from software enhances the range of application of the machine considerably. The PEs can perform variable precision arithmetic and are closely connected with high speed nearest-neighbor links. There is also a set of orthogonal data highways. The nearest-neighbor links provide a very fast interprocessor communication rate of 1.1 Gbyte/s, making data broadcast very efficient. The provision of logical masks means that individual processors can effectively be switched off from parts of the calculation if required.

Each PE has direct connection to its own local memory, the size of which depends upon the individual machine but, for example, that at Edinburgh has 64 kbit per PE, giving a total of 32 Mbyte. There is no global memory, though there is a separate code store in direct communication with the master control unit (MCU) which, in turn, issues commands to the processor array. A schematic diagram showing the structure of the DAP is given in Fig. 1. All current DAPs have a 10-MHz clock; however, that at Edinburgh is a somewhat older model and runs at 8 MHz. The performance figures quoted later (Table 1) can therefore be updated for the new hardware.

The PEs are simple bit-serial processors which handle all functions (real and integer multiplies, trigonometric functions, etc.) in software. This has the disadvantage of some loss of speed compared to handling this in hardware, but does allow for a greater range of variable precision than is found normally. A new DAP with 8-bit coprocessors attached to each PE is scheduled for release in late 1990. This addition will greatly enhance its performance for integer and floating-point arithmetic. A fuller description of the DAP hardware may be found in Past, Present, Parallel [1].

The DAP may be programmed in an expanded dialect of Fortran called FORTRAN-PLUS which supports a number of parallel datatypes permitting simple mapping of the problem onto the processor array [2]. An assembly language (APAL) is also available and this may be used with a resultant increase in speed within CPU-intensive subroutines, especially when the variables are of low precision. Table 1 gives some impression of the current DAP's arithmetic speed when using FORTRAN-PLUS. The figures illustrate certain unusual features, for example a squaring is much faster than a multiplication.

The suitability of the DAP for logical or short-length arithmetic is obvious from these figures and it will be clear from the following sections that there are many problems which can exploit this to good advantage. Nevertheless, even for real arithmetic applications (e.g., the Molecular Dynamics simulations of Section VI-B) we have compared the speed of the DAP with the Cray-1 when both are running an appropriate algorithm and find that the DAP 608 is faster by a factor of 2 [3].

Data transfer between the host and the DAP is handled by a special purpose SCSI interface which operates at a rate of 1-2 Mbyte/s. A faster data channel for both input and output is also available, this being used predominantly for graphical display. The provision of good graphical visualization is often essential for the full exploitation of such machines since it can display results in a very direct and immediate fashion.


Fig. 1. Schematic diagram of DAP 600 architecture illustrating the relationship between the processor array, its memory and the master control unit. The host computer is also shown.

Table 1. Rate of Operation on the DAP 608, Measured in Million Operations/s

    Operation            Rate      Operation              Rate
    Real*4 multiply      22.6      Real*4 add             42.2
    Real*4 square        43.0      Integer*1 multiply     117
    Real*4 saxpy         86        Logical OR             2500
    sin(Real*4)          7.6       log(Real*4)            19.2

IV. IMPLEMENTATION

Here we discuss the implementation on the DAP of a typical complex system composed of a large number of interacting subsystems, which could represent, for example, the atoms in a crystal or the neurons in a brain.

In common with all parallel computers, a little thought is required prior to writing a DAP program to effect the optimal mapping of a problem onto the machine architecture. In practice, the appropriate mapping almost invariably takes the form of some type of geometrical decomposition in which the total problem is subdivided into a number of subunits, each of which contains one or more interacting subsystems. The programmer then tries to assign one subunit to each processing element such that neighboring subunits of the problem are mapped onto neighboring PEs. Such a decomposition ensures that all information transfer between subunits (needed for the calculation of interactions) involves only physically close processors, thereby minimizing communications overheads. Once achieved, an efficient mapping allows the full exploitation of the SIMD character of the DAP by permitting the simultaneous manipulation of whole matrices of variables.

While the parallel character of the DAP makes it extremely fast for a wide range of problems, it transpires that its performance is most outstanding when dealing with subsystems whose state can be represented by logical variables. This is in contrast to many machines which often wastefully use a byte per logical. In fact, a surprising number of models can be represented in terms of logical variables. For example, as we shall detail later, the essential physical features of ferromagnetism can be formulated in terms of a two-state quantity known as spin. A spin variable (+1 or -1) is assigned to each atom in the model ferromagnet, and it may flip between these values on occasion depending on, among other things, the values of the spins of neighboring atoms.

In FORTRAN-PLUS, the bit-matrices used to hold the spin state variables of our example ferromagnet could be easily handled with statements like the following:

      logical L(*100, *100)
C
C     set spins randomly using system-supplied
C     random number generator
C
      L = RANDOM_SETUP()

      do 10 istep = 1, nsteps
         L = FUNC(L, shNc(L), shSc(L), shEc(L), shWc(L))
 10   continue

This declares L to be a two-dimensional (100 x 100) logical matrix, though the same program structure will hold irrespective of the data type of L. The * in the declaration of L indicates that these dimensions may be operated on in parallel. The interaction function used to calculate the spin flip is expressed by FUNC, which operates on the subsystems themselves (stored in L) and their four nearest neighbors, each shifted by one lattice spacing in the north, south, east, and west directions. These shift functions are part of the FORTRAN-PLUS language, and take only a few machine clock cycles to move a whole matrix. By virtue of the SIMD character of the DAP, all PEs update their elements of L simultaneously.

Since the matrix L is larger than the size of the processor array it is necessary to take into account the mapping of the data structure across the physical array boundaries when performing shift functions, etc. In older implementations of the compiler such allowances had to be carried out by the programmer, but in the latest version this is handled by system software and is transparent to the user. Although this has resulted in a considerable increase in ease of use, it is still true that the DAP is most efficient when the matrix dimensions are integral multiples of the physical array size.

Despite being a 2-D array of processors, the DAP is nevertheless readily programmed for 1-D, 3-D and indeed models of any dimensionality. The DAP system software allows a quantity declared as a matrix to be treated as a 1-D array known as a long-vector, with the differences in connectivity handled by the system software at a little loss of speed. Problems involving dimensionality greater than two are handled by using stacks of matrices. In general, programming is simplest if one restricts the parallelism to 2-D with serial looping over higher dimensions. However more sophisticated mapping strategies have been developed [4].
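For readers without access to FORTRAN-PLUS, the whole-matrix update above can be mimicked in ordinary array notation. The following Python/NumPy sketch is not DAP code: np.roll stands in for the cyclic shift functions shNc, shSc, shEc and shWc, and func is a hypothetical placeholder rule (a simple majority vote of the four neighbours), not the spin-flip dynamics of the ferromagnet example.

    import numpy as np

    rng = np.random.default_rng(0)
    L = rng.random((100, 100)) < 0.5          # logical spin matrix, set randomly

    def func(c, n, s, e, w):
        # Hypothetical placeholder for FUNC: each cell adopts the majority
        # state of its four neighbours, keeping its own state on a tie.
        count = n.astype(int) + s.astype(int) + e.astype(int) + w.astype(int)
        return np.where(count == 2, c, count > 2)

    for istep in range(1000):
        L = func(L,
                 np.roll(L,  1, axis=0),   # analogue of shNc(L)
                 np.roll(L, -1, axis=0),   # analogue of shSc(L)
                 np.roll(L,  1, axis=1),   # analogue of shEc(L)
                 np.roll(L, -1, axis=1))   # analogue of shWc(L)

As on the DAP, every element of L is updated in the same step; the wrap-around of np.roll corresponds to the toroidal connection of the processor array edges.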

Below, we describe in some detail a number of models of interest to computational scientists, which have been recently implemented on the DAPs at Edinburgh. In addition to those models discussed, this machine has been widely used for applications as diverse as quantum chromodynamics, hydrodynamics, image-processing and neural networks [5], [6].

V. CELLULAR AUTOMATA

A. Introduction

The concept of the Cellular Automaton (CA) was first proposed by von Neumann in the 1940s [7]. CAs are essentially elementary systems whose collective behavior can mimic the highly complex phenomena found in real physical systems. They therefore offer a wealth of potential for sophisticated modeling while remaining computationally simple. This simplicity offers the dual advantages of both high speed and ease of programming compared with the more traditional brute force simulation techniques, such as the numerical solution of systems of partial differential equations.

Essentially, CAs can be envisaged as cells in space forming a regular grid or lattice. Each cell is surrounded by neighbors, so, for example, a cell in a 2-D square grid has four nearest neighbors in the North, South, East, and West directions and four next-nearest neighbors in the NW, NE, SE, and SW directions.

Each automaton can exist in a number of discrete states, such as dormant or active, sick or healthy. Hence in this case the automaton could be described by a pair of logical (boolean) variables, e.g., (0,1) meaning dormant but healthy. The state of any given automaton may change at regular intervals, or timesteps, according to a set of rules. Typical rules for updating depend jointly on the state at the previous timestep of the automaton and those of its neighbor cells.

Clearly each automaton cell is updated in the same way, and this type of parallelism is ideally suited to the SIMD character of the DAP. Thus for example, with a 64 x 64 processor DAP we can simultaneously update 4096 automata states, a clear advantage over a serial machine which would have to update each state successively. Even greater benefits accrue from the DAP standpoint when one can represent the automata states by Boolean variables rather than integers or reals.

In the following we examine examples of the applications of CAs which have been implemented on the Edinburgh DAPs.

1) Sand on a Table: This is a very good example of a cellular automaton because it is both simple and easily understood, yet its results are fascinating and unexpected. Consider a table divided into small squares, the CA cells, which can each contain a number of grains of sand. The CA state is then simply the number of grains in each square. At any one time-step each CA will give a grain of sand to each of its four neighbors provided it has four or more grains to give. Accordingly, it will receive grains of sand from those of its neighbors who have four or more grains. All activity ceases when no CA has more than three grains of sand; this happens because those CAs at the edge send grains off the table which are lost.

The interest in this CA problem is in the pattern which results as the end point of all the activity. It is rather surprising that if the system starts in an extremely ordered arrangement, such as by having N = 7 grains on each square, a fractal pattern results. Figure 2(a) shows the endpoint for such a starting arrangement on a table of 256 x 256 squares. A certain motif is easily identified which appears at different sizes with small systematic variations. The detail becomes progressively finer as the table size increases, but the basic features remain. The biggest difference can be found by starting with a different value of N, as Fig. 2(b) shows for N = 4 on a 512 x 512 table.

The interest in this problem does not end here, and the reader may wish to consider the following question. The results have been achieved by simultaneously updating 4096 CAs in such a way that none of the neighbors are updated at the same time, and all the CAs are updated regularly. Does the final result depend on this way of updating? Would either asynchronous or random updating produce a different pattern, or is there a fixed end-point? Answering this question required much more ingenuity on the DAP because the program had to perform in a way quite alien to its SIMD nature. However, invariance of the final result was exactly as the theorists had predicted: the prediction, due to Ron Peierls of Brookhaven National Laboratory, was that the endpoint is a fixed point, invariant under the update scheme.

Fig. 2. The pattern of sand grains remaining on the table at the end point of runs with (a) a 256 x 256 table with N = 7 and (b) a 512 x 512 table with N = 4. The white regions are where there are either 1 or 2 grains left.
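The updating rule is simple enough to sketch directly. The following Python/NumPy fragment is illustrative only and is not the DAP program; in particular it topples all over-full cells in the same synchronous step, whereas the DAP run described above avoided updating neighboring cells simultaneously (and, as noted in the text, the final pattern is insensitive to this choice). Grains pushed over the edge of the table are lost.

    import numpy as np

    N = 7                      # initial grains per square, as in Fig. 2(a)
    size = 256
    table = np.full((size, size), N, dtype=int)

    while True:
        toppling = table >= 4              # cells with four or more grains
        if not toppling.any():
            break                          # all activity has ceased
        give = toppling.astype(int)
        table -= 4 * give                  # each such cell gives away 4 grains
        # Receive one grain from each toppling neighbor; padding with zeros
        # and slicing discards contributions from outside the table, so
        # grains sent over the edge are simply lost.
        padded = np.pad(give, 1)
        table += (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                  padded[1:-1, :-2] + padded[1:-1, 2:])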

2) A Forest Fire: This is another very simple CA which is very well suited to the DAP. The model is of fire burning in a forest, or equivalently of disease spreading through a population. As above, the CA space is a square grid, but now cyclic boundaries are used to approximate an infinite system. The CA has only three states: empty, tree, or burning. At any one time, a tree ignites if any one of its four neighbors is on fire. Simultaneously, any tree initially burning becomes exhausted and the CA state changes to empty. Any cell which is empty at the start of the timestep has a probability, P, of tree regeneration. Although there is some debate as to whether such random features fall within the original strict definition of the CA concept, they are clearly very similar to their fully deterministic cousins, while being undoubtedly distinct from the stochastic models to be discussed later in this work. They therefore merit discussion in the present context.

Unlike the previous example, it is now important to update all the CAs simultaneously, as asynchronous updating would give quite a different result. On the DAP it is easy to perform such updating with a 64 x 64 sample, but with a larger system it is necessary to have some serial code to handle the sub-lattices in such a way that the result is equivalent to that on an SIMD machine which was the full size of the problem. In our implementation we have a maximum world size of 1024 x 1024; this then matches the graphics monitor exactly. Thus we could use the terminology of the Connection Machine, and say that each PE was operating as 256 virtual PEs, giving a 1048576 virtual-PE machine for this problem.
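A minimal synchronous sketch of the rule set just described is given below in Python/NumPy. It is illustrative only (it is not the DAP program): the state encoding, initial densities and random seed are arbitrary choices made here, while the cyclic boundaries and the simultaneous update follow the description above.

    import numpy as np

    EMPTY, TREE, BURNING = 0, 1, 2
    P = 0.005                        # regeneration probability, cf. Fig. 3(a)
    size = 512
    rng = np.random.default_rng(1)
    world = rng.choice([EMPTY, TREE, BURNING], size=(size, size),
                       p=[0.7, 0.29, 0.01])

    def step(w):
        # Cyclic boundaries approximate an infinite forest.
        burning_neighbor = ((np.roll(w,  1, 0) == BURNING) |
                            (np.roll(w, -1, 0) == BURNING) |
                            (np.roll(w,  1, 1) == BURNING) |
                            (np.roll(w, -1, 1) == BURNING))
        new = np.full_like(w, EMPTY)             # burning trees become exhausted
        new[(w == TREE) & ~burning_neighbor] = TREE     # a tree with no burning
        new[(w == TREE) & burning_neighbor] = BURNING   # neighbor survives; else
                                                        # it ignites
        # A cell empty at the start of the step regrows with probability P.
        regrow = (w == EMPTY) & (rng.random(w.shape) < P)
        new[regrow] = TREE
        return new

    for t in range(1000):
        world = step(world)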

This CA problem must be one of the simplest to demonstrate critical behavior. The model is described by a single parameter, the regeneration probability (P). Varying this quantity results in different behavior and it may be seen that there exists a critical value of P below which the fire cannot be sustained, even in a truly infinite forest. This value can easily be found (to a fair degree of accuracy) using trial and error by an interactive user. The examples shown in Fig. 3 demonstrate the difference found when P is close to the critical value, as in Fig. 3(a) for which P = 0.005, and when P is well away from criticality, P = 0.025, Fig. 3(b). As P becomes larger the wavefronts disappear.

It is instructive to watch the progress of this model on the monitor. If P is very small, fire is rare and large tracts of forest slowly become dense. Fire then sweeps through these regions with a very clear wave front. For the fire to be sustained continually it is necessary that the regeneration of the forest behind this front is sufficiently rapid for fire to catch back. This becomes less likely as P diminishes, with the result that the fire wavefronts start from rare events and then become very large. The time between one devastation and the next increases with decreasing P. There is also a slowing down of the phenomenon as we move to larger system sizes, a topic to which we will return in Section VI-C-1. These are two very fundamental properties of critical phenomena, and make it progressively more difficult to improve on the accuracy of measurement of the critical value of P.

Fig. 3. Plane view of a burning forest. The white dots represent trees, and the burning trees are easily located at the wavefronts. The sample area is 512 x 512, and the regeneration probability P is (a) 0.005 and (b) 0.025.

3) Evolutionary Modelling (CAfE): Cellular Automata for Evolution (CAfE) is a DAP program written to investigate the interaction between different species of CAs with each other and their environment. The CAfE world is populated with a number of different species of CA which have varying success in survival or colonization of a section of the world dependent upon their genetic makeup and local conditions. Like the model above, CAfE includes a degree of randomness which extends the basic CA definition.

The CA rule determines the survival and propagation of any given CA, dependent upon the number and distribution of its neighbors. In CAfE, the CAs are distributed on a regular orthogonal grid and each is described by a 16-bit logical string (the genome), giving 65536 unique species. The environment of each cell is described by a number known as its surround code (S) and is calculated for filled and empty cells alike. The surround code has a value in the range 0-15 and is calculated by a sum of the occupancies of nearest neighbor cells, on a system in which N = 1, S = 2, E = 4, W = 8. Thus a cell with neighbors only on the South and East, therefore, has S = 6.

A pictorial representation of the updating scheme for a single cell is shown in Fig. 4, where for simplicity we have assumed that this region is isolated.

Fig. 4. Pictorial representation of the CAfE method. The central (dark) cell is surrounded on the N, S, and E by an eternal, non-reproductive species (light grey). The genome for the central cell is shown below. Given the surround code for this cell (7) and that of its W neighbor (4) it may be seen that it both survives and propagates to the west.

Survival of a given CA is determined by a combination of its genome and surround code: if the (S + 1) genome bit element is 1, then that cell survives, otherwise it dies. The propagation of species into an empty site is also determined by the combination of the surround code for that cell and the genomes of its neighbors. If the (S + 1) bit in a neighbor is 0 then that CA may propagate into the empty cell. Several exclusion and preference rules also apply to regulate growth and to decide between potential propagators should more than one be able to expand into a given cell.

From the correspondence between the survival and propagation rules it is clear that there is an anti-correlation between the survivability of a CA species and its ability to procreate. This gives an interesting conflict between the ephemeral but highly reproductive species and those which are long-lived but slow in propagating. Similar balances are to be seen in nature. Mutation of CAs is also possible via a random process which alters the genome by a small amount, and this can alter the success of the new species. This is responsible for much of the complex behavior displayed by the model.

The maximum world size is constrained by display limitations; this restricts CAfE to 512 x 512 grids. Although the complexity of the world is greater for larger simulations, the same behavior is observed for a range of model sizes. From our simulations, we have observed that an initially complex eco-system will quickly stabilize into one containing only a few species. Left on its own the dominant species may well expand to encompass the entire world, but with mutation new CA types are continually created, some of which may supplant their parent. This process of the dominant species being replaced by a mutation is observed to recur. The establishment of symbiotic relationships between species, enabling local population densities to rise above the norm, is also seen. Such behavior may be understood by considering the potential conflict within a species between its desire to have neighbors in a given direction and its inability to propagate in that way. When it encounters another CA species which fulfills this requirement then the local density may rise.

Given the simplicity of the model it is perhaps surprising that it exhibits such a large range of types of behavior in common with biological systems. We are at present investigating the properties of the model in detail and observe effects such as parasitism, commensalism and symbiosis.
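As an illustration of the surround code and survival rule just described, the following Python/NumPy sketch computes S for every cell and applies the survival test. It is not the CAfE program: the boundary treatment (cyclic here), the bit ordering (bit 1 taken as the least significant genome bit), and the initial state are assumptions, and the propagation, exclusion, preference and mutation rules are omitted.

    import numpy as np

    rng = np.random.default_rng(2)
    size = 64
    occupied = rng.random((size, size)) < 0.5            # which cells hold a CA
    genome = rng.integers(0, 2**16, size=(size, size))   # 16-bit genome per cell

    def surround_code(occ):
        # S = N*1 + S*2 + E*4 + W*8, evaluated for filled and empty cells
        # alike; cyclic boundaries are an assumption made here for brevity.
        n = np.roll(occ,  1, axis=0).astype(int)
        s = np.roll(occ, -1, axis=0).astype(int)
        e = np.roll(occ, -1, axis=1).astype(int)
        w = np.roll(occ,  1, axis=1).astype(int)
        return n * 1 + s * 2 + e * 4 + w * 8

    S = surround_code(occupied)
    # Survival: an occupied cell survives if genome bit (S + 1) is 1,
    # counting bit 1 as the least significant bit (an assumed convention).
    survives = occupied & (((genome >> S) & 1) == 1)
    # Propagation into an empty cell would use the complementary test on
    # the neighbours' genomes, plus the exclusion/preference rules.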

VI. THERMAL MODELS

A. Introduction

Among the most interesting and most studied of macroscopic phenomena in nature are the transformations occurring between the various states or phases of matter. Such phase transitions range from the very common, e.g., the freezing of water, to the more bizarre, e.g., the onset of superconductivity or the transition to chaos.

Phase transitions are signaled by abrupt changes and large-scale fluctuations in the observables which characterize the system state. For such effects to occur requires substantial cooperative behavior between the microscopic constituents of the system. It is of great interest to understand how microscopic behavior can lead to macroscopic changes, and phase transitions are therefore a topic concerning many scientists today.

Statistical mechanics provides us with a precise mathematical relationship between the microscopic description of a physical system and the resulting macroscopic behavior. Unfortunately this relationship is expressed in terms of integrals of high dimensionality which, with a few exceptions, turn out to be intractable to direct analytic assault. Simplifications or approximation schemes have therefore to be found.

The two principal computational techniques used in the study of thermal models are Molecular Dynamics (MD) and Monte Carlo (MC) simulations. Most MD simulations attempt to simplify the situation by considering the system of interest as being completely isolated thermally, so that the total system energy is a constant. An interaction potential is specified for the particles and the time evolution of the system is found by numerically integrating Newton's equations of motion. This method gives good information on time-dependent properties but provides little information on statistical properties such as entropy and free energy.

In contrast to the fully deterministic nature of MD simulations, Monte Carlo simulations involve the use of random numbers. Essentially MC serves as an approximation scheme for estimating the high-dimensional integrals appropriate to the non-isolated ensemble (for which the total system energy is permitted to fluctuate). Although they provide little time-dependent information, MC simulations are a useful route to the statistical properties of systems.


B. Molecular Dynamics Simulations

1) Bulk Simulations: The calculation of the dynamics of an interacting many-body system is well suited for parallel computation because of the inherent parallelism of the system. In solid-state physics much work has been done on the simulation of phase changes by this technique using very simple forms of the inter-atomic potential.

One approach which has been adopted to try to overcome finite-size effects in MD simulations is to use periodic boundary conditions (PBC). This effectively produces an infinite crystal composed of many self images of the operational data set. While undoubtedly attractive, this technique does encounter difficulties if the interaction length is comparable with the repeat distance of the system, since unphysical self-self interactions are thereby introduced. Some phase changes may also be inhibited by the cyclic boundaries.

The first application we performed on the ICL DAP in 1980 was a series of bulk MD simulations of Sulphur Hexafluoride (SF6), to model the condensed molecular phases and their transitions so as to aid understanding and to develop experiments [8]. The ordered nature of true crystals makes theoretical calculations possible; however, on heating, many crystalline materials undergo transformations to phases with a high degree of disorder. It is only through simulation that such disordered states may be understood. Naturally any such models must mirror the natural phenomena. If the results reproduce the observed features of the real system satisfactorily, then the detailed dynamics (not always available to experiment) may be accepted as a good representation of the actual behavior.

In these calculations we use the truncated Lennard-Jones potential, which has a very simple analytic form. It permits rapid calculation of the motion of all molecules in the sample, though a very small timestep is required to give a good approximation. New positions and orientations are calculated at every timestep. However, it is important that the chosen cutoff be sufficiently large so as not to affect appreciably the particle dynamics.

When liquid SF6 is cooled, the first solid state formed has a body-centered cubic (BCC) crystal structure. Although the molecules have octahedral symmetry, the effect of the protruding F atoms on the resulting potential surface is quite small. Hence at these high temperatures the molecules enjoy a high degree of orientational freedom. This state of matter is called a plastic crystal [9]. On further cooling the disordering role of temperature diminishes and the octahedral molecular structure asserts itself, the system undergoing a solid-state phase transition to a true crystal of very low symmetry. This low temperature structure was unknown when these simulations were initiated.
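The truncated Lennard-Jones potential itself is not reproduced in the text; its standard form, with a cutoff and the minimum-image convention implied by the periodic boundary conditions discussed above, can be sketched as follows. This Python fragment is illustrative only: the parameters are in reduced units and are not those of the SF6 model, which additionally involves molecular orientations.

    import numpy as np

    epsilon, sigma = 1.0, 1.0        # illustrative parameters in reduced units
    r_cut = 2.5 * sigma              # truncation radius of the potential
    box = 10.0 * sigma               # edge of the periodic (cubic) cell

    def pair_energy(r1, r2):
        # Minimum-image convention: each particle interacts with the nearest
        # periodic image of its partner.
        d = r1 - r2
        d -= box * np.round(d / box)
        rr = np.dot(d, d)
        if rr > r_cut * r_cut:
            return 0.0               # truncated: no interaction beyond the cutoff
        sr6 = (sigma * sigma / rr) ** 3
        return 4.0 * epsilon * (sr6 * sr6 - sr6)

    # Example: two particles separated by 9 box-units interact across the
    # periodic boundary as if they were only 1 unit apart.
    print(pair_energy(np.array([0.0, 0.0, 0.0]), np.array([9.0, 0.0, 0.0])))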

The DAP simulations used a sample of 4096 molecules, initially arranged on a BCC lattice. We first tried to simulate the formation of the unknown low-temperature crystalline structure by cooling from the plastic state and found that the system contracted with an evolution of latent heat, giving a polycrystalline sample with an obvious crystal structure.


equilibrated into the experimentally observed low temperature phase (C2/c). On heating to 65 ± 10 K a transition was observed to a phase which had not previously been observed experimentally. This new phase was identified to have a rhombohedral space group, R3m. Further heating beyond 175 ± 15 K removed all apparent orientational order and a further transformation took place, giving the correct underlying FCC lattice of the plastic crystal. This phase was stable in only a narrow temperature range, and by 200 ± 15 K the crystal had melted. This is approximately the correct melting temperature for the bulk natural material. Melting occurred rather than sublimation, again consistent with the natural phenomena.

A set of neutron powder diffraction experiments has recently been performed to test for the existence of the predicted rhombohedral phase. Although no evidence for this phase was found at ambient pressure, it is possible that new phases recently discovered under pressure may show similarities to the rhombohedral structure. Further analysis of these results is in progress.

C. Monte Carlo Models

In general a thermodynamic assembly of particles will fluctuate. The probability, P({σ}), of finding an assembly corresponding to a particular temperature T, in a particular configuration (i.e., arrangement of states) {σ} of energy E({σ}), is proportional to the Boltzmann weight:

    P(\{\sigma\}) \propto \exp(-E(\{\sigma\})/kT).    (1)

Our inability to solve this system analytically is reflected in the fact that we have no information concerning the constant of proportionality X in this relationship. However, in MC, this problem can be circumvented by a procedure whereby X cancels out. This procedure necessitates the generation of a sequence of configurations which simulate the energy fluctuations of a system in contact with a thermal reservoir or heat-bath. The distribution of these configurations is such as to satisfy the exact form of P({σ}).

To facilitate MC simulations, a model must provide an expression for the energy E({σ}) of the ensemble. The appropriate energy equation is usually in the form of a sum over the states of the particles and an interaction with their neighbors (cf. equation (2)). Thermal fluctuations may be simulated with the aid of computer-generated pseudorandom numbers according to the Metropolis algorithm as follows [17]:

1. Set up the simulation in some essentially arbitrary initial configuration and specify a value for the temperature T.
2. Generate a trial configuration which differs from the previous one in some way.
3. Using the model energy equation, calculate the energy difference (dE) between this trial configuration and the previous one. If dE is negative (lower in energy), accept the trial configuration as the new one. If dE is positive then generate a uniform random number in the range (0-1) and accept the trial configuration as the new configuration only if P = exp(-dE/kT) exceeds the random number.

4. Repeat from 2).

These rules are clearly consistent with the fact that a thermal system will always evolve towards configurations of lowest total free energy, although thermal fluctuations (whose size is governed by the Boltzmann weight) may cause the energy to increase transiently.

The above procedure requires that the ratio of probabilities for the change under consideration and the reverse change is given by the Boltzmann factor. From this it transpires that the unknown constant X cancels out. However, for this to happen, it is important to ensure that the configuration obtained after the change must be exactly as assumed in the calculation of dE. Thus no other part of the configuration involved in dE may be updated in parallel. Failure to observe this condition (known as the condition of detailed balance) correctly has been the cause of some erroneous MC results!

Following a number of steps to bring the simulation to equilibrium at the desired temperature, the sequence of configurations generated by the MC procedure will have an energy distribution given by the exact form of P({σ}). All observable quantities in the simulation are therefore also distributed according to this relationship. By measuring the quantities of interest as an average over a sequence of configurations we therefore obtain an estimate of the value which would result from a full analytic solution of the model, were that available.

Using the Monte Carlo technique, computational physicists have been able to tackle a wide range of analytically intractable models, some of the most important of which we discuss below.
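The accept/reject step of the Metropolis scheme just described can be written compactly. The following Python fragment is a minimal sketch, not the DAP code; propose_move and energy_difference are hypothetical placeholders for whatever trial move and model energy equation a particular simulation uses, and the temperature is taken in units where k = 1.

    import math
    import random

    def metropolis_accept(dE, T):
        # Step 3 above, with Boltzmann's constant absorbed into T.
        if dE <= 0.0:
            return True                                  # downhill: always accept
        return random.random() < math.exp(-dE / T)       # uphill: accept sometimes

    # Hypothetical usage (placeholder functions, not defined here):
    #
    #   for step in range(n_steps):
    #       trial = propose_move(config)                 # step 2
    #       dE = energy_difference(config, trial)        # step 3
    #       if metropolis_accept(dE, T):
    #           config = trial                           # accept the trial
    #       measure(config)                              # accumulate averages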

1) The Ising Model and Critical Phenomena: Possibly the most studied of all thermal models in physics is the standard Ising model. This is the prototype for a ferromagnet and consists of a regular lattice of variables known as spins.

In the Ising model, a spin represents a magnetic moment and can take either of two values: +1 (up) or -1 (down). The spins can, therefore (after an appropriate transformation), be represented by logicals in the DAP. On a regular orthogonal lattice, spins interact with their nearest neighbors via the following equation:

    E = -\frac{J}{2} \sum_{\langle ij \rangle} S_i S_j    (2)

where the notation ⟨ij⟩ means sum over the ith spin S_i with its j nearest neighbors S_j, and the quantity J is a coupling constant determining the strength of the interaction between neighbors. The factor of 1/2 avoids the double counting of each bond implied by the sum.

It is clear from this equation that the total energy of the system is least when all spins are mutually aligned, i.e., when all are up or all down. Thus at low temperatures, the ground state of the system has all spins ordered in the so-called ferromagnetic phase. However, as the temperature rises it may be seen that this free energy minimum becomes shallower, and in fact above a certain temperature the free energy is a minimum when all the spins are randomly oriented, giving maximum entropy (the paramagnetic or disordered phase). The phase transition from the low-temperature ordered state to the high temperature disordered state occurs at a well-defined temperature T_c. At this temperature, the model exhibits a variety of complex effects known as critical phenomena. Principally, at the critical point there is a divergence in the correlation length and an effect known as critical slowing down. This latter effect often results in inconveniently large equilibration times at criticality.

The study of critical phenomena has wide relevance in physics, since similar effects are found in many diverse areas, e.g., solid-state physics, fluid-dynamics and particle physics. It is because the Ising model is the simplest system to exhibit critical behavior that it has attracted so much attention.

The ability to represent the Ising spins by logicals, and the inherent parallelism of the MC update scheme, makes it ideal for study with the DAP. The system is mapped onto the DAP by geometric decomposition and thus 64² spins can be updated simultaneously, as described earlier for the example ferromagnet. Indeed the DAP is arguably the fastest Ising engine available today, boasting the ability to perform up to 1500 million individual spin updates per second. For some time now, we have been studying the cubic (3-D) Ising model in order to determine precise values for quantities such as the transition temperature T_c. On a 64³ lattice of spins the Monte Carlo technique has been used to find the most accurate value yet for the transition temperature [18]. This work is now being extended using a larger simulation (128³ spins), thereby increasing the accuracy of the measurements by reducing finite-size effects. However, the scale of this calculation is such that, even with the DAP, it is anticipated that several thousand hours of computer time will be required.
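A standard way to reconcile simultaneous updating with the detailed-balance requirement of Section VI-C (no two interacting spins updated in the same step) is a checkerboard decomposition of the lattice, in which the two sublattices are updated alternately. The following Python/NumPy sketch of a 2-D Ising sweep is illustrative only; it is not the DAP implementation, and it sets J = 1 and k = 1.

    import numpy as np

    rng = np.random.default_rng(3)
    edge = 64                                   # even edge, so the colors tile
    spins = rng.choice(np.array([-1, 1]), size=(edge, edge))
    ii, jj = np.indices((edge, edge))
    color = (ii + jj) % 2                       # checkerboard (red/black) labels
    J, T = 1.0, 2.0                             # coupling and temperature (k = 1)

    def sweep(s):
        for c in (0, 1):
            # With periodic boundaries every neighbour of a red site is black,
            # so a whole sublattice can be flipped at once without updating
            # any pair of interacting spins in the same step.
            nbr = (np.roll(s, 1, 0) + np.roll(s, -1, 0) +
                   np.roll(s, 1, 1) + np.roll(s, -1, 1))
            dE = 2.0 * J * s * nbr              # energy cost of flipping each spin
            accept = (dE <= 0) | (rng.random(s.shape) < np.exp(-dE / T))
            s = np.where(accept & (color == c), -s, s)
        return s

    for _ in range(100):
        spins = sweep(spins)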

One very important and highly successful theory to emerge from the study of critical phenomena is the so-called universality hypothesis. According to this theory many of the large-scale physical features which characterize the critical point in a given system may also be common to a whole class of apparently distinct systems, irrespective of the details of their microscopic interactions. Systems which are similar in this way belong to the same universality class. One such class, the Ising class, is that set of models whose critical properties are similar to those of the Ising model. At present we are pursuing a DAP simulation with a view to showing that a 2-D particle fluid interacting via the Lennard-Jones potential also falls into the Ising universality class. However, this simulation, like the 3-D Ising model, is hampered by the effect of critical slowing down and is expected to take many hours of CPU time even on the DAP.

2) Ising Related Models: There are various extensions to the Ising model, two of which are the Potts model and the binary alloy model. The former is simply a multiple spin-state version of the Ising model. Typically the cells may be in any one of Q different states and there is an aligning interaction between cells of the same Q. An update consists of selecting a new trial Q-value for each cell using the Metropolis algorithm. Models with a large number of possible Q-states usually display a wealth of phase transitions and interesting growth behavior, as domains of different Q-value compete for domination of the lattice.

Unlike the Ising and Potts models, the binary alloy conserves the number of cells in each of the possible states. Instead of a cell flipping to a new state at each update, neighboring cells exchange states. This is analogous to the mixing and unmixing processes that occur among the different atomic species in metallic alloys and is a good model for predicting structure formation in real materials. In a collaborative project with Harwell Laboratory (AERE, U.K.) and Argonne National Laboratory (U.S.), this model has been used for comparison with neutron scattering data from real alloy materials.

The speed of parallel processing is making it possible to emulate longer alloy aging times than would be possible using conventional computer architectures. The simulations yield results which are directly comparable with transmission electron micrographs obtained for the experimental alloys, and with small angle neutron scattering (SANS) data. To date the simulations have assisted in identifying microstructural stability in PE16 as due to the formation of the so-called gamma-prime phase, where spherical particles of this nickel-rich phase become immobile after heat treatment. This is by no means clear from experimental work alone.

D. Variable-Pressure Simulations

Many of the interesting natural phenomena which we might wish to simulate, such as phase transitions, are not driven solely by temperature, but are also influenced by pressure. All of the MC models described above have been closed systems, that is to say, the number of particles in the simulations is constant. However, it is possible to devise models in which the pressure may also be varied. In MC simulations of this type, this is normally achieved by allowing the number of particles in the assembly to fluctuate. Such open systems could represent a gas at constant volume and pressure, but with variable temperature. To achieve controlled pressure in a constant volume assembly, we must couple our assembly to a particle reservoir, with which it can exchange particles. Thus on increasing the temperature, there is a net transfer of particles to the simulated gas from the reservoir. One such model exhibiting interesting properties as a function of pressure as well as temperature is the diatomic lattice gas, now to be described.

1) Diatomic Lattice Gas: This model, originally proposed by Parrinello and Tosatti, is an attempt to model the phases of diatomic matter like nitrogen or hydrogen [19]. Basically we construct a gas of simulated atoms which may occupy sites on a square lattice. The simulation is connected to a particle reservoir, so that atoms may appear and disappear from the lattice according to a Monte Carlo rule (in much the same way as a spin would flip in the Ising model). The model does not, however, provide for an atom to move around the lattice.

The interatomic interactions of this simulation are such that it is energetically favorable for an atom to have one and only one nearest neighbor atom. These features are designed to mimic, albeit crudely, the behavior of real diatomic systems in which one finds relatively few single atoms compared to a vast preponderance of bound atom pairs (dimers). The many-body nature of real bonds also means that the close approach of a third atom to a bound pair can disrupt and break an interatomic bond. This latter effect is also catered for in the interaction scheme.

We have studied this model as a function of temperature and pressure in both two and three dimensions. In the low pressure regime, we observe that relatively few bound dimers exist and these form a dimer fluid. On increasing the pressure, the density of dimers is observed to increase, their numbers eventually being sufficient for crystallization to occur. This crystalline structure is such as to maximize the packing fraction of dimers. At still greater pressures atoms are forced into interstitial positions within the crystal, breaking the dimer bonds and resulting in a continuous distribution of atoms, completely filling the lattice. This last situation is analogous to the metallization which occurs, for example, in nitrogen under conditions of extreme pressure. The effect of increasing temperature on the simulation is to disorder the regular crystal structure, eventually leading to melting.

This model also appears to exhibit critical behavior at a certain temperature and pressure and further work is planned in order to investigate the universality class of the model.
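The paper does not give the interaction energies of the diatomic lattice gas, so the following Python sketch only illustrates the kind of particle-reservoir (insertion/deletion) Monte Carlo move involved. The local energy function, chemical potential and temperature are invented for illustration; the real model's energetics, which favor exactly one nearest neighbour and penalize a crowding third atom, are only crudely mimicked here, and the cyclic boundaries are an added assumption.

    import math
    import random

    size = 32
    T, mu = 0.5, -0.2              # temperature and chemical potential (invented)
    lattice = [[0] * size for _ in range(size)]    # 0 = empty site, 1 = atom

    def occupied_neighbors(grid, i, j):
        return sum(grid[(i + di) % size][(j + dj) % size]
                   for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)))

    def site_term(grid, i, j):
        # Invented local energy: an atom with exactly one neighbour is bound;
        # additional close neighbours are penalized (crude bond breaking).
        if not grid[i][j]:
            return 0.0
        n = occupied_neighbors(grid, i, j)
        return -1.0 if n == 1 else 0.5 * n

    def local_energy(grid, i, j):
        # Only the chosen site and its four neighbours change on a flip.
        cells = [(i, j)] + [((i + di) % size, (j + dj) % size)
                            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))]
        return sum(site_term(grid, a, b) for a, b in cells)

    for step in range(50000):
        i, j = random.randrange(size), random.randrange(size)
        before = local_energy(lattice, i, j)
        lattice[i][j] = 1 - lattice[i][j]          # trial: insert or remove an atom
        dE = local_energy(lattice, i, j) - before - mu * (2 * lattice[i][j] - 1)
        if dE > 0 and random.random() >= math.exp(-dE / T):
            lattice[i][j] = 1 - lattice[i][j]      # reject: undo the change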

VII. CONCLUSIONS

The DAP is a massively parallel single instruction multiple data stream (SIMD) computer, employing a large number of simple bit-serial processors. The high speed of the machine is therefore derived from processor multiplicity rather than complexity. This architecture lends itself well to the simulation of a diverse range of physical problems which possess inherent parallelism.

The role of such simulations within modern science is very important since it is often the only way to investigate certain phenomena. The DAP is especially useful for simulations which can be expressed as bit manipulation problems; however, even for real number calculations it compares favorably with conventional vector supercomputers. In this paper we have discussed a number of scientific models which are currently under study at Edinburgh and the methods which have been used to map them onto the DAP architecture.

REFERENCES

[1] A. S. Trew and G. V. Wilson, Eds., Past, Present, Parallel: A Survey of Available Computing Systems. London, U.K.: Springer-Verlag, to be published.
[2] R. W. Hockney and C. R. Jesshope, Parallel Computers. Bristol, U.K.: Adam Hilger, 1981, 1st ed.
[3] A. S. Trew and G. S. Pawley, Canad. J. Chem., vol. 66, p. 1018, 1988.
[4] G. S. Pawley and G. W. Thomas, J. Comput. Phys., vol. 47, p. 165, 1982.
[5] K. C. Bowler, A. D. Bruce, R. D. Kenway, G. S. Pawley, and D. J. Wallace, Physics Today, vol. 40, no. 10, p. 40, 1987.
[6] K. C. Bowler and G. S. Pawley, Proc. IEEE, vol. 72, p. 42, 1984.
[7] The first discussion von Neumann gives of self-reproducing automata was in his talk, "The general and logical theory of automata," presented at a symposium in Pasadena, CA, 1948. Also, see his Collected Works, vol. 5. New York: Macmillan, p. 288.
[8] G. S. Pawley, "Molecular dynamics and spectroscopy," in Neutron Scattering (Methods of Experimental Physics), D. L. Price and K. Skold, Eds. New York: Academic, 1986, p. 441.
[9] J. N. Sherwood, Ed., The Plastically Crystalline State. New York: Wiley, 1979.
[10] G. S. Pawley and G. W. Thomas, Phys. Rev. Lett., vol. 48, p. 410, 1982.
[11] G. Raynerd, G. J. Tatlock, and J. A. Venables, Acta Cryst., vol. B38, p. 1896, 1982.
[12] G. S. Pawley and M. T. Dove, Chem. Phys. Lett., vol. 99, p. 45, 1983.
[13] J. Farges, M. F. de Feraudy, B. Raoult, G. Torchet, A. H. Fuchs, and G. S. Pawley, J. Chem. Phys., in press, 1990.
[14] M. T. Dove and G. S. Pawley, J. Phys. C, vol. 17, p. 6581, 1984.
[15] A. S. Trew, G. S. Pawley, and A. J. Cairns-Smith, Acta Cryst., vol. A46, p. 979, 1990.
[16] A. H. Fuchs and G. S. Pawley, J. Phys. France, vol. 49, p. 41, 1988.
[17] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller, J. Chem. Phys., vol. 21, p. 1087, 1953.
[18] G. S. Pawley, R. H. Swendsen, D. J. Wallace, and K. G. Wilson, Phys. Rev., vol. B29, p. 4030, 1984.
[19] M. Parrinello and E. Tosatti, Phys. Rev. Lett., vol. 49, no. 16, p. 1165, 1982.

Arthur S. Trew received the B.Sc. and the Ph.D. degrees in astrophysics from the University of Edinburgh.
He joined the Department of Physics in the University of Edinburgh in 1986 as a Research Fellow and is now a member of staff in the Edinburgh Parallel Computing Centre, where he is working on a joint SIMD/MIMD computer. He is also a joint editor of a book about parallel computers, Past, Present, Parallel [1].

Nigel B. Wilding received the B.Sc. degree in physics from the University of Edinburgh in 1988. Following the award of a Colin and Ethel Gordon Scholarship he remained in Edinburgh to pursue research towards the Ph.D. degree.
His research interests include computer studies of universality and critical phenomena and experimental studies of pressure-induced phase transitions by means of neutron powder diffraction.


Ken A. Hawick graduated in physics from the University of Edinburgh and has just completed work for the Ph.D. degree, also at Edinburgh. His research work was financed by a CASE award with the Material Science Division at AEA Technology, Harwell.
He has worked with parallel computers at British Telecom Research Laboratories, Ipswich, at CalTech, Pasadena, CA, and at Argonne National Laboratory. He is now with the Edinburgh Parallel Computing Centre on a collaborative project with the UK Meteorological Office.

G. Stuart Pawley received the M.A. and Ph.D. degrees in physics from Cambridge University, Cambridge, U.K.
Following his degree, he spent two years at Harvard University, Cambridge, MA, in the Department of Chemistry. In 1969/1970 he was a guest professor at the Chemistry Department, Aarhus University, Denmark. Currently, he is Professor of Computational Physics at the University of Edinburgh, Scotland, in the Department of Physics. His computational work was built on work in condensed matter physics, especially in the study of the dynamics of condensed molecular matter. He instigated the work in parallel computation in Edinburgh, which was deemed vital for the simulation of the much more complex phase of molecular matter, namely the plastic crystalline phase, through molecular dynamics simulations.
Dr. Pawley is a Fellow of the Royal Society of Edinburgh.
