
EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH

CERN DRDC/91-45

Applications of the Scalable Coherent Interface to Data Acquisition at LHC

A. Bogaerts¹, J. Buytaert², R. Divia, H. Müller¹, C. Parkman, P. Ponting
CERN, Geneva, Switzerland

B. Skaali, G. Midttun, D. Wormald, J. Wikne
University of Oslo, Physics Department, Norway

S. Falciano, F. Cesaroni
INFN Sezione di Roma and University of Rome, La Sapienza, Italy

V.I. Vinogradov
Institute of Nuclear Research, Academy of Sciences, Moscow, USSR

E.H. Kristiansen, B. Solberg³
Dolphin Server Technology A.S., Oslo, Norway

A. Guglielmi
Digital Equipment Corporation (DEC), Joint Project at CERN

F-H. Worm, J. Bovier
Creative Electronic Systems (CES), Geneva, Switzerland

C. Davis
Radstone Technology plc, Towcester, UK

¹ joint spokesmen
² Fellow at CERN
³ Scientific associate at CERN, funded by the Norwegian Research Council for Science and Humanities


Abstract

We propose to use the Scalable Coherent Interface (SCI) as a very high speed interconnect between LHC detector data buffers and farms of commercial trigger processors. Both the global 2nd and 3rd level trigger can be based on SCI as a reconfigurable and scalable system. SCI is a proposed IEEE standard which uses fast point-to-point links to provide computer bus-like services. It can connect a maximum of 65536 nodes (memories or processors), providing data transfer rates of up to 1 Gbyte/s. Scalable data acquisition systems can be built using either simple SCI rings or complex switches. The interconnections may be flat cables, coaxial cables, or optical fibers. SCI protocols have been entirely implemented in VLSI, resulting in a significant simplification of data acquisition software. Novel SCI features allow efficient implementation of both data and processor driven readout architectures. In particular, a very efficient implementation of the 3rd level trigger can be achieved by combining SCI's shared and distributed memory with the virtual memory of modern RISC processors. This approach avoids complex event building hardware and software. Collaboration with computer and VLSI manufacturers is foreseen to assure the production of maintainable components. The proposed studies on SCI hardware and software will be made in collaboration with other LHC R&D projects to provide a basis for future, standard SCI-based data acquisition systems.


Contents

1 LHC Detector Readout
  1.1 Introduction
  1.2 LHC Detectors
  1.3 Size of the SCI Readout System
  1.4 SCI Readout Node Implementation
  1.5 Data Streams after 2nd Level Trigger
  1.6 Interface to 2nd Level Trigger

2 Application of SCI to Global 2nd and 3rd Level Triggers
  2.1 Status of the SCI standard
  2.2 Impact of SCI on Data Acquisition Systems
  2.3 Coherent Caching
  2.4 Use of SCI for the 2nd Level Trigger
  2.5 Use of SCI for the 3rd Level Trigger
  2.6 Demonstration Systems

3 Research and Development Program
  3.1 General Purpose SCI Interface
  3.2 SCI Ringlet Test System
  3.3 Direct SCI-Computer Interface
  3.4 SCI Memory
  3.5 SCI Bridges and Interfaces
  3.6 SCI/VME Single Board Computer
  3.7 Intelligent Data Controller
  3.8 Diagnostics, using a Protocol Tracer
  3.9 Software
  3.10 Modelling and Simulation

A Collaboration with industry
B Budgets
C Responsibilities
D Timescales, Milestones


1 LHC Detector Readout

1.1 Introduction

According to ECFA studies [1], the event rate for a general purpose LHC detector, operated at a luminosity of 2 x 10^34 cm^-2 s^-1, will not exceed 10^5 Hz after the 1st level trigger. The data volume generated by such a detector is estimated as:

Inner Tracking: 1 Mbyte per 15 ns bunch, 20 million channels
Calorimeter: 200 kbyte per 15 ns bunch, 200 000 channels
Muon Tracking: negligible amount from up to 10^6 sparsely filled channels

A possible readout scheme for such a detector, according to the current understanding [1] [2], is illustrated in fig. 1. A first stage of data concentration after the 1st level trigger decision is implemented by electronics located close to the detector. Next, data is carried off the detector by point-to-point links, further concentrated in bus units and stored in data buffers. The data volume of each segment is sufficiently small that these buffers can be implemented using conventional backplane bus systems.

Event data is further filtered by a 2nd level trigger which is implemented in two stages. The first stage consists of local trigger processors which have access to data of one segment of a detector only. These produce reformatted, reduced events complemented by trigger data which are stored in output buffers. The overall trigger decision is taken by global 2nd level trigger processors which can correlate the pre-processed event and trigger data of the first stage. The event rate at the input stages of the local and global 2nd level trigger is estimated at approximately 100 kHz; the output rate after the global 2nd level trigger at approximately 1 kHz. With an average of up to 1 ms for the global 2nd level decision, such a reduction could be obtained by a farm of 100 processors.

Final event rejection is accomplished by a 3rd level trigger, based on complete event data, further reducing the rate to approximately 100 Hz. Both the 3rd level trigger and the data logger can be implemented by a processor farm.

There are several reasons not to use buses for the readout system after the local 2nd level trigger [3]: the expected data rates exceed the capacity of existing buses, the required connectivity over large distances is very problematic, and event building methods based on buses [4] cannot be scaled to the size of an LHC system.

We propose to implement the global 2nd level trigger, the 3rd level trigger and the data logger using a uniform SCI network: copying of data is avoided and events are stored only in the (local) 2nd level output buffers, from where data can be accessed by the trigger processors. These and the data logger are all implemented as farms of commercially available computers. A significant simplification of both hardware and software can be obtained by using the processor's virtual memory hardware for implicit event building, whilst caches avoid repetitive data access.
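The farm size quoted above follows from a simple rate-times-latency estimate. The C sketch below merely replays that arithmetic; the 3rd level decision time used in it is an assumed illustration value, not a figure taken from this proposal.

```c
#include <math.h>
#include <stdio.h>

/* A trigger farm needs roughly one processor per event arriving
 * during one decision time: size = input rate x decision time.   */
static int farm_size(double input_rate_hz, double decision_time_s)
{
    return (int)ceil(input_rate_hz * decision_time_s);
}

int main(void)
{
    /* Global 2nd level trigger: ~100 kHz input, up to 1 ms average decision. */
    printf("global 2nd level farm: ~%d processors\n", farm_size(1.0e5, 1.0e-3));

    /* 3rd level trigger: ~1 kHz input; the 100 ms per event used here is an
     * assumed value for illustration only.                                  */
    printf("3rd level farm (assumed 100 ms/event): ~%d processors\n",
           farm_size(1.0e3, 0.1));
    return 0;
}
```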


Figure 1: LHC Detector Readout based on SCI.

1.2 LHC Detectors

The EAGLE proto-collaboration serves as a model for our application proposal. Leading candidate detectors include: a silicon preshower tracker, a liquid argon calorimeter and an inner tracker.

Amongst various candidates for inner tracking, the Silicon Strip [5] and Silicon Preshower Detectors [6] are considered as examples. Roughly 20 x 10^6 channels need to be processed and compacted for each. For both detectors a large concentration of input channels between the 1st and 2nd level trigger is foreseen.

Preshower Detector: for the Silicon Tracker Preshower (SITP) detector, 1st level data compaction and formatting takes place within a VLSI chip. After multiplexing of silicon-pad inputs at the VLSI level, data are further concentrated via 32 bit bus units, each containing 64 such chips. At a 1st level trigger rate of 50 kHz, the output rate per 64 pads is rather low at 12.5 kbyte/s: a typical 32 bit bus can combine outputs from 64 chips into one link, requiring at least 800 kbyte/s data throughput to the 2nd level local trigger stage. This stage can be implemented using standard bus units, providing an additional concentration of channels in the order of 10. This stage will require fast processors and storage of compacted data, to be transferred to the further readout system.

From 25 x 10^6 silicon pads, less than 1000 output channels could be connected to an SCI based readout system. All these SCI nodes need to be capable of transferring the bus-resident 2nd level data to an SCI memory.
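The 800 kbyte/s figure above can be checked with a few lines of arithmetic. The sketch below only restates the numbers quoted in the preceding paragraph and introduces nothing new.

```c
#include <stdio.h>

/* Replay of the SITP throughput figures quoted above. */
int main(void)
{
    const double l1_rate_hz       = 50.0e3;  /* 1st level trigger rate           */
    const double rate_per_64_pads = 12.5e3;  /* byte/s out of one 64-pad chip    */
    const int    chips_per_bus    = 64;      /* chips combined by one 32 bit bus */

    printf("data per 64 pads per event: %.2f byte\n", rate_per_64_pads / l1_rate_hz);
    printf("throughput per bus link   : %.0f kbyte/s\n",
           rate_per_64_pads * chips_per_bus / 1.0e3);
    return 0;
}
```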


Silicon Strip detector: the Si Strip Detector is parameterized for a 20-100 microsecond 2nd level decision time at a rate of 1 kHz. An analog pipeline feeds an Analog Pulse Shaper Processor whose output is kept in both analog and digital stores during the 2nd level decision time. 128 such channels are contained in one readout block, consisting of both analog and digital buses. Further concentration by a factor of 32 could be achieved via point to point links to data concentrating bus units. Local 2nd level trigger processors and buffers in the bus concentrators could be connected to an SCI system with less than 1000 nodes, as in the case of the SITP.

Calorimeters: an electromagnetic liquid argon spectrometer will probably provide 200,000 input channels whilst a hadronic spectrometer (such as SPACAL or liquid argon) will have around 30,000 channels. After the 1st level decision, less than 1/4 of these channels have to be read out. Proposals to process, format and compact this data are based on VLSI and multi-chip wafer [17] techniques which can integrate local processing and storage. We expect that calorimeter channels can be concentrated in less than 1000 output channels after the 2nd level local trigger, which again could be connected to an SCI readout system.

Muon Chambers: these would have up to 10^6 channels of which less than 20,000 channels will be read out after the 1st level decision. Though the amount of data is small, fast access to a random subset of muon channels is required, in particular for correlations by a global 2nd level trigger. An SCI system would provide a fast and uniform interconnection for such correlations.

1.3 Size of the SCI Readout System

The concentration of detector input channels into a smaller number of 2nd level output channels is based on front end VLSI chips, read out by bus units which are interconnected via point to point links. The bus units contain local 2nd level processors which feed an output buffer with compacted event and trigger data. The massive number of input channels is reduced to several thousand output channels across which different data streams have to be transferred to the readout system. The size of an SCI system will depend on the reduction in channels achieved by the concentrator units.

2nd Level Data Concentrator Units: the connection points to an SCI based readout are the data buffers of the local 2nd level data concentrator units [Fig. 2]. Implementation choices taken for the previous stages (analog or digital pipelines, local 2nd level processing and compaction techniques) have little influence on the proposed SCI readout system. The 2nd level bus units will very likely be based on conventional buses such as Fastbus or VMEbus. Unidirectional links, such as HIPPI [7], could transfer data from detector-resident VLSI channels to these bus units.

We estimate that after the local 2nd level, the output channels from the SITP, the Si Microstrip, the calorimeters and the muon chambers could each contribute roughly 1000 outputs to an SCI system. A general purpose detector would therefore require approximately 4000 SCI nodes, if a uniform system is implemented.

We assume 5000 nodes as a rough measure of the size of our proposed SCI system. Scaling to much larger or smaller size is an important feature of SCI, providing ample freedom in the final implementation.

Figure 2: Data Concentrator Units



1.4 SCI Readout Node Implementation

An SCI node provides access to the data buffers of the local 2nd level trigger. Our R&D program foresees several ways to implement SCI nodes for either VMEbus or Fastbus systems. A direct connection to VLSI based buffers will be studied as well. Each SCI node can transfer event and trigger data to the data logger, the 3rd level trigger and the global 2nd level trigger. The SCI node supports read and write access, allowing different readout methods to be implemented.

Nodes can be interconnected in four ways:

- Optical fibers at 100 Mbyte/s: where long distances up to 10 km have to be covered.
- Single ended coaxial cables at 100 Mbyte/s: where medium sized, cheap connections are desired for distances up to 50 m.
- Differential ECL, triaxial flat cables at 1 Gbyte/s: where fast and short connections are required for distances of up to 15 m.
- Differential low voltage CMOS, triaxial flat cables up to 1 Gbyte/s: where fast and short connections are required for distances of up to 15 m. (A first implementation will probably reach about 150-250 Mbyte/s over distances of 25-30 m.)

It is likely that all four versions will be required. Therefore we propose to investigate all of them.

1.5 Data Streams after 2nd Level Trigger

The parameters for an LHC data acquisition system after the local 2nd level trigger are best seen in terms of three data streams [Fig. 3]. All 2nd level output buffers contribute data to each of the streams. A study of these, based on compacted data, has been published by the ECFA subgroup on buses and links [2] [3] and is summarized as follows:

Global 2nd level: 10^8 Byte/s trigger data
3rd level: 10^8 Byte/s event data
Data Logger: 10^8 Byte/s event data

Required Bandwidth: the three data streams constitute a very large total bandwidth, each a factor of approximately 100 above typical LEP figures, such that conventional buses cannot be used. A system is required which can transfer and route the constituents of the above streams as small packets to their destination without blocking. Though the input bandwidth is divided over thousands of input nodes, the streams routed to the trigger processors add up to a bandwidth of at least 100 Mbyte/s. We expect that future computers equipped with SCI interfaces can handle this bandwidth.
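To make the bandwidth bookkeeping of section 1.5 concrete, the sketch below sums the three streams and spreads them over the node counts given earlier; the factor-of-two headroom quoted at the end is the rule of thumb discussed in the following paragraph, and nothing here is a new estimate.

```c
#include <stdio.h>

/* Bookkeeping for the three post-2nd-level data streams listed above.
 * Stream rates and the node count are the figures quoted in the text. */
int main(void)
{
    const double stream_rate = 1.0e8;  /* byte/s for each of the three streams  */
    const int    num_streams = 3;
    const int    input_nodes = 5000;   /* 2nd level output buffers, section 1.3 */

    double total = num_streams * stream_rate;

    printf("total post-2nd-level bandwidth : %.0f Mbyte/s\n", total / 1.0e6);
    printf("average share per input node   : %.2f Mbyte/s\n",
           total / 1.0e6 / input_nodes);
    printf("peak capacity per 100 Mbyte/s stream (2x headroom): %.0f Mbyte/s\n",
           2.0 * stream_rate / 1.0e6);
    return 0;
}
```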


As a rule of thumb, the maximum bandwidth of the transmission medium has to be significantly higher than the peak data rate required by the system. This means that in order to transfer 100 Mbyte/s streams on average, a system with 200 Mbyte/s to 1 Gbyte/s peak performance is required, and SCI is an adequate choice.

Figure 3: Data Streams.

1.6 Interface to 2nd Level Trigger

The SCI readout node requires an associated microprocessor to write local 2nd level output data to a processor farm or to transfer local data to cached SCI memory. Cheap micro-controllers would be sufficient for this purpose. The 2nd level processing required within a concentrator unit which is connected to SCI could be performed by the same processor, if a sufficiently powerful processor is chosen. The combination of a standard SCI node interface and a fast RISC or CISC processor is therefore part of our feasibility study, to be carried out in collaboration with computer industry.


2 Application of SCI to Global 2nd and 3rd Level Triggers

2.1 Status of the SCI standard

SCI provides computer-bus-like services but uses a network of fast point-to-point links. Compared to backplane buses, far higher speeds are possible. The proposed IEEE P1596 standard [8], consisting of the logical and physical specifications, will very likely be finalised before the end of 1991. It has passed the first level of balloting and it will be forwarded to the IEEE Standards Board after a short review in October. New working groups have been set up to cover the following areas:

P1596.1 SCI to VMEbus bridge architecture
P1596.2 SCI extensions
P1596.3 Low voltage differential signals
P1596.4 High bandwidth memory interface
P1596.5 SCI Transfer Formats

The first VLSI chips for SCI are developed by Dolphin Server Technology in Oslo. First VLSI and starter kits from Dolphin are expected by the first quarter of 1992. Their node chips connect directly to a cable for transmission at 1 Gbyte/s, and further CMOS chips provide interfaces to memory and second level caches. The starter kit consists of a VMEbus based node interface, a chip set and optionally a Diagnostic Tracer. The optical fiber and coaxial SCI versions are combinations of the node chip and the transceiver GIGA-chips from Hewlett Packard. The latter work with various framing protocols such as HIPPI and SCI and are under investigation at CERN for applications in Data Acquisition Systems [20].

2.2 Impact of SCI on Data Acquisition Systems

Amongst the innovations of SCI, coherent caching and virtual addressing can lead to new approaches to data acquisition. These provide very efficient access to distributed memories. A uniform SCI system can be constructed from a mixture of high speed SCI switches and cheap, passive ringlets interconnected by long distance optical fibers or short coaxial cables. Within such a network, any application program can trivially access data in any other node. The underlying protocols, which take care of packet (dis)assembly, buffering, routing, priorities and error recovery, are entirely implemented in VLSI. SCI protocols have been carefully designed to ensure reliable transmission and data coherency in a multi processor environment. Applications written in high level languages can access data implicitly without using special libraries or drivers. There is no restriction on the direction of the data flow. Data is normally cached to speed up repeated access. Inter-processor communication is possible using shared memory and coherent caching.

In view of the large size of an LHC experiment, scalability is a necessity. No loss in performance should be noted when enlarging the system to its final size. Contrary to backplane buses, SCI provides scaling in terms of size and performance. Simulations [9] have shown that simple systems based on SCI rings only scale to bandwidth limits of the order of a few Gbyte/s. Active SCI switches, which are expected to be developed soon after the availability of SCI node chips, should provide large system scaling. Another important scaling feature is topology-independent data access. This means that data acquisition software designed for small systems will be usable without modifications or loss of performance in large systems.
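As an illustration of the implicit data access described above, the following self-contained C sketch dereferences what would, on an SCI system, be a window onto another node's memory. Here a local array stands in for that mapped window so the example compiles and runs on its own; none of the code is taken from an actual SCI driver or library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* In a real system this buffer would be a window onto a remote node's
 * memory, mapped into the local address space by the SCI adapter and the
 * operating system; a local array stands in for it in this sketch.       */
static volatile uint32_t remote_window[1024];

static uint32_t fragment_checksum(const volatile uint32_t *buf, size_t words)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < words; i++)
        sum += buf[i];        /* on SCI this would be a remote (cached) read */
    return sum;
}

int main(void)
{
    for (size_t i = 0; i < 1024; i++)
        remote_window[i] = (uint32_t)i;          /* fake some event data */

    /* The application simply dereferences pointers; no readout library,
     * driver call or explicit data copy appears in the code.             */
    printf("checksum = %u\n", (unsigned)fragment_checksum(remote_window, 1024));
    return 0;
}
```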


2.3 Coherent Caching

Shared data residing in SCI memory may be cached by several processors. Coherency protocols ensure that changes to data in one processor are reflected in the caches of the sharing processors. For this purpose, SCI memories and caches contain, apart from data, a directory of pointers and status information. The protocols have been optimized to minimize memory updates, even allowing data to flow between caches without being stored in the main memory [fig. 4]. This has several advantages for data acquisition hardware and software:

- no software needed to transfer data: SCI coherency protocols implemented in VLSI transfer data transparently.
- application software can be written naturally: applications access data directly without using special libraries or being linked to complex data acquisition software.
- low bandwidth usage: only data which is accessed is transferred.
- fast data access: data is cached at both ends, which speeds up repeated access.
- efficient data transfers: SCI supports many options to optimize data transfers (such as cache to cache transfers). SCI cycles to access directory information may overlap with processing. Cache-prefetch of data improves latencies for read operations.

Figure 4: Coherency Protocols, transferring data directly between two cache memories.
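To make the shared-memory style of communication listed above concrete, here is a small, self-contained producer/consumer sketch. Two POSIX threads and C atomics stand in for two SCI nodes sharing a coherently cached status word; it illustrates the idea only and is not code from the proposal.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

/* "shared_event" and "event_ready" play the role of data and a status word
 * held in SCI memory and cached by two processors.                         */
static int shared_event[4];
static atomic_int event_ready = 0;

static void *producer(void *arg)
{
    (void)arg;
    for (int i = 0; i < 4; i++)
        shared_event[i] = 100 + i;          /* fill the "event" */
    atomic_store_explicit(&event_ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&event_ready, memory_order_acquire))
        ;                                   /* spin: coherency delivers the update */
    int sum = 0;
    for (int i = 0; i < 4; i++)
        sum += shared_event[i];
    printf("event received, sum = %d\n", sum);
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```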


2.4 Use of SCI for the 2nd Level Trigger

We assume that local 2nd level trigger processing is predominately performed by processors residing in bus systems. Because of the locality of the data, there is a large degree of parallelism and a modest data rate. Each local processor compacts the events and produces a small amount of additional trigger data into an output buffer, interfaced to SCI.

The main difficulty of the global 2nd level trigger is the recombination of data distributed over a large number of memories at a rate of approximately 100 kHz. Since this involves transporting data over large distances, latencies become a limiting factor. These can be reduced using the data driven approach, where data is written by a processor of an SCI readout node to a destination processor of a trigger farm. In addition, this makes better use of available bandwidth. A rapid decision (100 kHz) must be made by approximately 5000 nodes to select a free trigger processor. SCI provides a rich set of primitives for the required synchronization of processors. Amongst these, the cache coherency protocols are very promising because synchronization using shared memory can be very fast.

The average processing time available to each trigger processor depends on the number of processors and the rejection rate. Assuming a reduction to approximately 1 kHz, a farm of 100 processors would dispose of approximately 1 ms per processor and event.

We propose to investigate the data driven approach for the global 2nd level trigger using an SCI network for the data collection. Both CISC and RISC based modules developed by our industrial partners provide connections between the 2nd level output buffers and SCI. We propose to implement the global trigger processors as an SCI based farm.

2.5 Use of SCI for the 3rd Level Trigger

After the 2nd level trigger the event rate is reduced to a tolerable 1 kHz, though the total data rate remains still enormous (1 Gbyte/s). A further difficulty to be solved for LHC is the event building which is required for both the 3rd level trigger and the data recording.

We propose to exploit the memory mapping capabilities of SCI to reduce the effective data rate by a factor of approximately 10 and to simplify at the same time the event building. For the 3rd level trigger, our argument is based on the observation that only a small fraction of the event data (approximately 10%) is used for the rejection decision. This reduces the effective data rate into the trigger processors to approximately 100 Mbyte/s.

The data logger needs only to copy those events which pass the 3rd level trigger decision. Assuming approximately 100 Hz for the event rate after the 3rd level trigger decision, this results also in a data rate of approximately 100 Mbyte/s.

SCI allows data to be accessed directly from applications running on the processors, without copying data. In addition, the virtual memory hardware which is implemented on all modern processors can map the data which is distributed over many output buffers into a contiguous single event, as illustrated in Fig. 5. This avoids complex event building hardware and software.

Accessing data over a large network introduces latencies which manifest themselves as delays each time the processor tries to access a data item which is not yet cached. This


Figure 6: SCI Demonstration System.

Fastbus interface and the necessary system software. DEC offers to contribute to the software, provided that the hardware interface is designed at CERN. The Readout System Test Benches Project [12] views SCI as a good candidate for cached readout, without copying data from output buffers to trigger farms. A close collaboration already exists to test the first SCI starter kit in an environment comprising VMEbus and SUN SPARC stations running UNIX.

The Scalable Data Taking System Project at a test beam for LHC [16] has expressed interest in SCI for the 3rd level trigger, and regular contacts exist. The EAST 2nd Level Trigger Project [10] is currently concentrating on HIPPI links and switches, however it is interested in understanding SCI and its possibilities.


3 Research and Development Program

The Research and Development Program carried out by the different partners is briefly described. Following the requirements of an LHC data acquisition system, we need to test SCI's cache and virtual memory system, as well as the coherency option. For low latency applications we will also test non-cached transfers. Commercial components and modules are used to build a small, scalable SCI environment to test these protocols and to develop a test bed for SCI software.

3.1 General Purpose SCI Interface

A small, VLSI-based interface is required to connect SCI to local data buffers, user logic and processors. Conceived as a daughter board [fig. 7], Dolphin's SCI interface can be used for both existing and new implementations. It contains the node chip, the general cache controller (GCC) and cache memory, and provides an SCI connector and an external bus connection which is compliant to the MC88110 bus¹. Other bus protocols can be interfaced via protocol adapters. Dolphin can provide development tools, based on Verilog, to design such protocol adapters.

Each interface requires a processor to transfer local data to SCI memory, which serves as local 2nd level output buffer. If required, the same processor can be used for local trigger processing, data formatting and data collection. RISC or CISC processors or microcontrollers can be used.

The SCI interface contains sufficient dynamic memory as SCI cache under control of the GCC controller. It provides a 1 Gbyte/s transfer rate on differential coaxial SCI cables and is compliant to the SCI transaction and cache coherence protocols.

Bus Interfaces: buses like TURBOchannel or Fastbus require a bus protocol adapter which, when fully implemented, is called a bus bridge. Simple adapters, which implement a subset of the protocols, are sufficient for data readout applications. We plan to work on simple adapters to both Fastbus and TURBOchannel.

3.2 SCI Ringlet Test System

In order to understand the functionality, we plan to build a minimal SCI system which can be scaled to larger size in the future [fig. 8]. Starting with a two-node ringlet to test data transfers between SCI memory and a workstation, a third node will be added later to test data transfers from external buffers.

Ringlet Components: the memory and the interface to the workstation are based on commercial cards and allow testing of SCI transactions without initial hardware development. The VME bridge and SCI memory are available from Dolphin, the interface to a SPARCstation from Performance Technology [11].

User Environment: for test and diagnostics, the UNIX system and the C language are used on a SPARCstation which we have purchased for this purpose. We collaborate with the RSTB project [12] on the programming part of SPARC-VMEbus interfaces.

¹ processor bus of the Motorola MC88110 RISC processor
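As an illustration of the UNIX/C test environment mentioned in the User Environment paragraph above, a write/read-back test over a memory-mapped SCI window might look as follows. The device path and window size are invented placeholders for this sketch; the real mapping would go through whatever interface the SBus/VMEbus adapter driver actually provides.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define WINDOW_BYTES 4096u            /* illustrative window size */

int main(void)
{
    int fd = open("/dev/sci0", O_RDWR);          /* hypothetical device node */
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    volatile uint32_t *win = mmap(NULL, WINDOW_BYTES, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
    if (win == MAP_FAILED) { perror("mmap"); return EXIT_FAILURE; }

    unsigned errors = 0;
    for (uint32_t i = 0; i < WINDOW_BYTES / sizeof(uint32_t); i++)
        win[i] = i ^ 0xA5A5A5A5u;                /* write test pattern    */
    for (uint32_t i = 0; i < WINDOW_BYTES / sizeof(uint32_t); i++)
        if (win[i] != (i ^ 0xA5A5A5A5u))         /* read back and compare */
            errors++;

    printf("%u mismatches over %u words\n", errors,
           (unsigned)(WINDOW_BYTES / sizeof(uint32_t)));
    munmap((void *)win, WINDOW_BYTES);
    close(fd);
    return errors ? EXIT_FAILURE : EXIT_SUCCESS;
}
```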


Figure 7: SCI interface to User logic.

Cabling and Connectors: the SCI cables (ECL differential coax) will be bought from Gore [13], whilst the required IEEE 1301.1 standard connectors will be purchased from DuPont [14]. Testing of the SCI fiber version is foreseen at a later stage, after experience with small SCI ringlets has been gained.

3.3 Direct SCI-Computer Interface

Only in the long term do SCI-workstation interfaces need to be optimized for speed, requiring fast access to the CPU and its memory system. Though we expect that commercial computers with SCI interfaces will become available in the future, we need to test the functionality of a direct SCI-computer interface, serving as a first model of a trigger processor farm. From the leading candidates³ we have chosen the TURBOchannel for a test of a direct SCI interface. This computer bus is sufficiently fast and simple and allows us to collaborate with the Delphi experiment on a TURBOchannel-based VAXstation interface.

² These tests could be carried out in collaboration with INR, Moscow.
³ DEC's TURBOchannel (100 Mbyte/s), Sun's SBus (200 Mbyte/s) and the Futurebus+ (600-800 Mbyte/s)


3.4 SCI Memory

SCI memory is exclusively accessed and modified by SCI transactions. Such a memory is for example based on the GMC global memory controller from Dolphin, providing a maximum of 4 Gbyte per node. SCI memories can be shared between a large number of processors. For our ringlet test we need one SCI memory board.

Figure 8: SCI diagnostic environment.

3.5 SCI Bridges and Interfaces

Bridges between SCI and other buses and links are required to interface processor farms to the local 2nd level trigger. Possible candidates are [fig. 9]: VMEbus and Fastbus in the front ends, linked via VICbus, HIPPI or Fastbus cable segments, and Futurebus+ as well as SBus and TURBOchannel in the computer farms. More details are listed below. We do not propose to develop a full implementation of bus bridges within this project.

VMEbus Bridge: the VME bridge is a base module for our test system, since VMEbus is a fully understood commercial system supported by a range of computer interfaces. This module from Dolphin is implemented according to the IEEE substandard P1596.1. With an option to include a Motorola 88110 RISC processor, this 6U VME board conforms to


VMEbus 32 bit protocols. It is expected that a 64 bit VMEbus version will become available later.

Figure 9: SCI bridges to various bus systems.

Fastbus: an interface between Fastbus and SCI is planned to implement uni-directional data transfers from Fastbus to SCI, making use of Service Requests on the Fastbus port. A Fastbus card with an SCI interface and a processor has an internal, fifo-type data memory with a slave port input on the Fastbus backplane. A small subset of the Fastbus slave protocol is sufficient for data transfers to SCI. Read requests from SCI are converted into Fastbus Service Requests.

TURBOchannel: simple TURBOchannel I/O requests are converted into SCI read requests. SCI responders can subsequently write directly, noncached, into TURBOchannel's I/O window. We are planning to use an existing TURBOchannel interface board which is based on Xilinx Logic Cell Array (LCA) technology. This board, as well as software support, can be provided by our collaborators from DEC. ECP-EDA provide the development of the Xilinx interface, using Verilog design tools from Dolphin.

Futurebus+: a bridge will be specified by an IEEE Working Group, since commercial interest for such an interface is high. Major computer manufacturers plan to use Futurebus+ for high performance future platforms. A participation in this activity is desirable for our


project, and contacts with IEEE are maintained. INFN of Rome have expressed interest in this development.

SCI-SCI: one of the most important bridges is between SCI ringlets. Dolphin plans to design a VLSI implementation towards the end of 1992.

SCI-SBus: we have no immediate plans for such an interface, though the speed and the popularity of SBus justify such an interface. We assume that one of our industrial partners will design such an interface in the future.

3.6 SCI/VME Single Board Computer

Radstone Technology propose to design an Advanced Processor Extension Interface called APEX*, containing an SCI interface and coherent cache memory [fig. 10]. The SCI/APEX will plug into an existing 68040 processor card, the 68-42, thus providing access to a wide range of software. Development can start in the middle of 1992.

The resulting SCI/VME SBC is particularly attractive in smaller sub-systems which require VMEbus, a connection to an SCI network, a general purpose processor and an existing software base.

3.7 Intelligent Data Controller

Creative Electronics SA (CES) intends to study the feasibility of a VMEbus based Intelligent Data Controller (IDC) with an SCI connection [fig. 11]. The principal components of the IDC are the R4000 RISC processor from MIPS, on-board memory, an SCI interface and a connection to VICbus or HIPPI interfaces, which can be stacked via an internal bus to allow collecting data from distributed buses. The VICbus allows accessing other VMEbus crates, Fastbus, CAMAC or HIPPI. CES intends to have a working prototype available by the beginning of 1993. Preliminary studies can be carried out using a VMEbus-SCI bridge, the RIO 8260 from CES and the HIPPI Source Daughter Module developed by the ECP division at CERN [20].

The IDC can be used as a bus concentrator with an optional local 2nd level trigger processor.

3.8 Diagnostics, using a Protocol Tracer

A complete test system for SCI consists of an SCI ringlet, a workstation and diagnostic devices and tools. A user interface to the SCI Protocol Tracer from Dolphin is required for easy application of diagnostics to SCI tests. It is implemented as a VME card, acting passively on an SCI ringlet. Low level libraries will be available for displaying bit patterns. Higher level libraries for the portable interactive user interface (based on X-windows) for a UNIX environment are required.

Diagnostic hardware, the SCI Tracer, and low level libraries will be provided by Dolphin. The University of Oslo has started to develop the user interface. It may be necessary to adapt it to other platforms, such as SUN SPARC or DECstation.

* Radstone specific line of plug-in modules
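Relating to the low level bit-pattern display mentioned in section 3.8, a minimal dump routine might look as follows. The 16 bit word width, the sample buffer and the formatting are illustrative choices for this sketch, not a description of the Tracer's actual output format.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Minimal "bit pattern display": print captured trace words in hex and binary.
 * A real tracer front end would hand over whatever the hardware captured.    */
static void print_word(uint16_t w)
{
    printf("%04X  ", w);
    for (int bit = 15; bit >= 0; bit--)
        putchar((w >> bit) & 1 ? '1' : '0');
    putchar('\n');
}

int main(void)
{
    const uint16_t trace[] = { 0x0040, 0xBEEF, 0x1234, 0x0000 };  /* fake capture */
    for (size_t i = 0; i < sizeof trace / sizeof trace[0]; i++)
        print_word(trace[i]);
    return 0;
}
```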


Figure 11: Intelligent Data Controller.

3.10 Modelling and Simulation

Simulation is widely used by many of our partners for various stages of hardware and software developments. Available today, Dolphin provides a complete, vendor independent simulation of the SCI node chip in VERILOG. Also existing today are complete SCI based systems comprising several nodes, simulated on a cycle by cycle basis (2 ns) using the C routines which are part of the standard SCI protocol specifications. The Institute of Informatics at the University of Oslo, where these simulations were carried out, has also developed a simulation program written in SIMULA as a research tool to study cache coherence protocols.

At CERN, we have developed a general SCI Modelling Program to simulate large data acquisition systems containing thousands of nodes. A precise simulation of the data flow at a granularity of approximately 100 ns (the typical SCI packet size) is now possible due to the availability of cheap workstations.

Several global aspects of Data Acquisition Systems must be studied to evaluate latencies, memory requirements and scalability of different topologies. Detailed studies are needed to isolate and optimize critical paths and to estimate throughput under different conditions (initialisation, sustained traffic, noise). We intend to continue architectural studies all the way along the project, and make the tools available to designers and integrators of data acquisition

systems. We have chosen MODSIM II [21], a commercial, object oriented simulation language which is now being used by many other HEP institutes.

Figure 12: Simulation of an SCI based Data Acquisition System with data distributed over approximately 1000 memory nodes. The upper plot shows the raw data traffic on the SCI links. The lower plot shows the amount of data flowing into the processors belonging to the trigger farms and the data logger. (Plot title: "LHC Data Acquisition System Model, nodes 1 - 1083"; axes: node input link bandwidth and node bandwidth versus SCI node Id.)
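The studies shown in figure 12 were produced with the MODSIM II model described above. The toy C sketch below is only meant to illustrate the kind of ring-occupancy question such simulations answer; every parameter in it is made up and it does not reproduce the SCI protocol.

```c
#include <stdio.h>
#include <stdlib.h>

/* Toy Monte Carlo of traffic on one unidirectional ringlet: each node injects
 * packets with some probability per time slot, each packet occupies one link
 * per hop towards a random destination.                                      */
int main(void)
{
    const int    nodes    = 16;       /* nodes on one ringlet               */
    const int    steps    = 100000;   /* time slots of one packet duration  */
    const double inject_p = 0.05;     /* per-node injection probability     */

    long hop_slots = 0;               /* packet-occupied link slots         */
    srand(12345);

    for (int t = 0; t < steps; t++) {
        for (int src = 0; src < nodes; src++) {
            if ((double)rand() / RAND_MAX < inject_p) {
                int dst  = (src + 1 + rand() % (nodes - 1)) % nodes;
                int hops = (dst - src + nodes) % nodes;   /* ring distance */
                hop_slots += hops;
            }
        }
    }

    double link_load = (double)hop_slots / ((double)steps * nodes);
    printf("offered load per node  : %.3f packets/slot\n", inject_p);
    printf("average link occupancy : %.3f (1.0 = saturated)\n", link_load);
    return 0;
}
```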


A Collaboration with industry

The European computer industry is providing a large measure of essential support for our proposal. Several companies are involved, lending their expertise as well as design and development effort.

In addition to bilateral contacts with our direct partners (listed below), the Project has numerous relations with industry at large through its involvement with the IEEE SCI standardisation process. These include: Apple Computers (personal computer networks), Hewlett Packard (optical SCI applications and specific SCI memories) and National Semiconductor (low-voltage CMOS implementations of SCI).

DOLPHIN SERVER TECHNOLOGY AS (Norway)

Dolphin is the supplier of indispensable technology and expertise for the SCI. Their personnel are heavily involved in the IEEE standardisation process for the SCI itself (IEEE P1596) as well as a VMEbus to SCI bridge (IEEE P1596.1). They have taken the world-wide lead in developing the VLSI parts necessary for the initial implementation of the SCI node. They are contributing one engineer who is working full-time at CERN, paid by Dolphin and the Norwegian Research Council for Science and Humanities.

RADSTONE TECHNOLOGY plc (United Kingdom)

Radstone is a major manufacturer of VMEbus boards and VMEbus-based systems. They are bringing that expertise into the project to design and construct an SCI general-purpose CISC-based processor module in VMEbus. This device will provide the means to graft SCI into existing data acquisition systems and allow the early testing of some basic architectural concepts without major investments in hardware and software.

CREATIVE ELECTRONIC SYSTEMS SA (Switzerland)

CES is a very important supplier of VMEbus and related equipment to CERN and other research laboratories. They are bringing their expertise in VMEbus RISC microprocessor implementations and system interconnect technology to provide an intelligent data controller which will be used for the collection and high-performance processing of data.


B Budgets

Our budget estimation covers one year of purchase and development of SCI test equipment, allowing us to set up an SCI Ringlet Test System. Continuing future work will require an additional budget.

                                                  kCHF
VME basic SCI equipment [fig. 13]                   72
Diagnostic Tracer and CPU [fig. 13]                 45
Software Development tools                          34   (+15 seed money 1991)
4 SCI General Purpose Interfaces                    36
Infrastructure, trips and visits                    50
Computers [fig. 13]                                 23
Instruments, use and purchase                       35
Cables and Connectors
TURBOchannel Interface [fig. 14]                    40
Fastbus Interface [fig. 14]                         25
Futurebus Interface + Crate                         45
Technical Student                                   40
Total Budget                                       452

Budget Partitioning

Dolphin              55 kCHF  (partial funding of B. Solberg Associateship, Verilog tools)
University Oslo      30 kCHF  (TRACER and CPU)
DEC                  40 kCHF  (VAXstation and Xilinx board)
CES                  20 kCHF  (IDC prototype)
Radstone             20 kCHF  (SCI processor prototype)
INFN                 45 kCHF  (Futurebus+ base equipment)
CERN                242 kCHF


Figure 13: VMEbus Modules for Ringlet Test.

Figure 14: TURBOchannel-SCI-Fastbus setup.


C Responsibilities

As shown in the timescales and milestones, responsibilities are required for various subprojects during the development phase. For a future continuation, responsibilities need to be negotiated at a later stage.

Verilog Tools                                     Dolphin Server Technology A.S.
General Purpose SCI Interface                     Dolphin Server Technology A.S.
SCI Memory                                        Dolphin Server Technology A.S.
SCI Chip Set                                      Dolphin Server Technology A.S.
SCI to VMEbus Bridge                              Dolphin Server Technology A.S.
Diagnostic Tracer Hardware                        Dolphin Server Technology A.S.
SCI to SCI Bridge                                 Dolphin Server Technology A.S.
Diagnostic Tracer Software                        University of Oslo, Physics Department
Turbochannel/SCI Interface Sw                     Digital Equipment Corp., CERN Joint Project
Intelligent Data Controller                       Creative Electronics S.A.
VMEbus/SCI General Purpose Proc                   Radstone Technology plc
Futurebus+ Bridge                                 INFN Rome
Ringlet Test System                               P33, CERN
SCI Test Software                                 P33, CERN
Global 2nd Level Trigger Tests                    P33, CERN
3rd Level Trigger Tests                           P33, CERN
Data Logger Tests                                 P33, CERN
Multi-Ringlet Test System                         P33, CERN
Participation in Data Acq. Architecture Studies   P33, CERN
Modsim II Simulations                             P33, CERN
SCI to Fastbus Interface                          P33, CERN
Turbochannel/SCI Interface Hw                     P33, CERN; Delphi, CERN
SCI DAQ Proposal + Softw.                         to be negotiated


References

[1] N. Ellis, S. Cittolin and L. Mapelli, CERN, Signal Processing, Triggering and Data Acquisition, Large Hadron Collider Workshop, Aachen, Oct 90, Proceedings Vol I, ECFA 90-133.

[2] J.F. Renardy et al., SCI at LHC, Large Hadron Collider Workshop, Aachen, Oct 90, Proceedings Vol III, ECFA 90-133, p. 165 ff.

[3] Buses and Standards for LHC, H. Müller, Convenor, Large Hadron Collider Workshop, Aachen, Oct 90, Proceedings Vol III, ECFA 90-133, p. 161 ff; and CERN/ECP 90-10.

[4] H. Müller et al., New buses and links for Data Acquisition, CERN-ECP-91-xx, submitted to NIM, Proceedings of the 5th Pisa meeting on advanced detectors.

[5] Development of High Resolution Si Strip Detectors for Experiments at High Luminosity at LHC, CERN DRDC 91-10.

[6] SITP Collaboration, A Proposal to Study a Tracking/Preshower Detector for LHC, CERN/DRDC/P3; and A. Poppleton, Fast electron triggers from a silicon track/preshower detector, ECFA 90-133, p. 201.

[7] High Performance Parallel Interface: Mechanical, Electrical and Signalling Protocol Specification, Proposed ANSI standard X3T9.3, Bob Morris, Intelligent Interface Inc., Chairman.

[8] SCI Scalable Coherent Interface, Proposed standard IEEE P1596, D.B. Gustavson, SLAC, Chairman. Specifications in Postscript or Mac format are available on ftp server hplsci.hpl.hp.com in pub/sci.

[9] A. Bogaerts et al., SCI based Data Acquisition Architectures, to be published in IEEE Real Time '91 Conference Proceedings, IEEE Seventh Conference, 25-28 June 1991, Jülich, Fed. Rep. of Germany.

[10] EAST collaboration, Embedded Architectures for Second-level Triggering in LHC Experiments, CERN/DRDC/90-56.

[11] Model PT-SBS915 SBus to VMEbus adapter from Performance Technology Inc., New York, USA.

[12] RSTB collaboration, Readout System Test Benches, CERN/DRDC/90-62, P15, RD12.

[13] W. L. Gore & Assoc. (UK) Ltd., Pitreavie Business Park, Dunfermline, Fife, Scotland, United Kingdom, KY11 5PU (information sheet on Goretex cables and their application to SCI).

[14] EIA IS-64 2 mm Device Connector System, DuPont Company, Electronics Division, New York, USA.


[15] CSR1212, a proposed IEEE standard for Control and Status Register architecture for SCI and Futurebus+; preliminary specifications in Postscript or Mac format are available on ftp server hplsci.hpl.hp.com in pub/csr.

[16] A Scalable Data Taking System at a Test Beam for LHC, CERN/DRDC/90-64, P16, RD-13.

[17] FERMI collaboration, A Digital Front-end and Readout Microsystem for Calorimetry at LHC, CERN/DRDC/90-74, P19, RD16.

[18] Dolphin Server Technology A.S., PO Box 52 Bogerud, Oslo, Norway (SCI project).

[19] Hewlett Packard, 1501 Page Mill Road 3U, Palo Alto, CA 94304 (Giga-Link project).

[20] T. Angelov et al., HIPPI Developments for CERN Experiments, to be presented at the Nucl. Science Symposium, Santa Fe, 1991; and R.C. Walker et al., A 1.5 Gbit/s Link Interface Chip Set for Computer Data Transmission, Instruments and Photonics Laboratory, HPL-90-105, July 1990.

[21] CACI Products Company, La Jolla, CA, USA, "MODSIM II".