Communication & Data Flow, Marlon Barbero, Bonn University, FE-I4 Review, CERN, Nov. 3rd - 4th 2009
Jan 06, 2016
Marlon Barbero, FE-I4 Review, Communication / Data Flow, CERN 2009 Nov. 3/4 2
Contents
10:00 Communication and data flow (1h00')
- Input clock and command
- Clock multiplier
- Data output
- Readout architecture overview, simulations & calculations
Talk Overview
[Figure: FE-I4 block diagram. Pixel Array: 80×336 digital pixels arranged in 40 DC, with pixel config, L1T / token / read lines, EoC token, 28 b × 40 DC. Periphery: interface and digital ctrl block (40MHz), global register bank (global config), trigger FIFO, EoC, data formatting / compression, asynch. FIFO (hamming code), PLL (40MHz in, 160MHz out) with clk select and aux input, ‘LVDS’-out at 160Mb/s, monitoring, powering.]
What drove choice of region architecture? Efficiency of region? Extra features?
Data formatting block. How? Why chosen format?
Data flow in DC. Coding, format.
Storage in FIFO. How? Why?
8b10b encoder unit. Why? Specs/Protocol?
PLL (40MHz → 160MHz). Input clock, MUX, clock multiplier, high speed serializer…
Command decoder: Configuration, reset & L1T. Protocol chosen.
A- Inputs
Plan:
• A- Inputs.
• B- Readout Architecture & Data Flow.
• C- Other means of communication -I/O.
• A- Inputs:
  – A1- Clocks & clock multiplication.
  – A2- Command decoder.
A1- Clock Input
• Main clock input: LVDS 40_MHz_clock_in. This is the clock which is:
  – sent to the PLL for higher frequency clock generation.
  – sent to and used by ‘all’ blocks: Command Decoder (CMD), End Of Chip Logic (EOCHL), End Of Double-Column Logic (EODCL), Pixel Digital Region (PDR)…
  – Note that the Data Output Block (DOB) uses a higher frequency clock to stream out data at 160Mb/s. We’ll come back to that.
• Auxiliary clock input: LVDS AUX_clock_in. This is the clock which goes:
  – to the PLL. Bypassing the PLL and using AUX for data streaming is possible.
  – might be used somewhere else? (stop mode?)
Clock & timing distribution, Abder’s
Andre
Tomek
Implement. in progress
CLKGEN: Clock Multiplier
• For IBL, need to transmit data out at BW of 160Mb/s.
• 2 options:
  – send an 80MHz CLK to the FE and use both edges to transmit:
    • Needs modification of BOC / ROD to produce higher speed TTC.
    • Needs synchronization protocol on the FE between 80MHz clock & beam crossing.
    • A new DORIC needs to decode CLK at twice the frequency.
  – send a 40MHz CLK to the FE and multiply clock on FE:
    • Needs a clock multiplier on chip.
    • Note: synergy with what the strip MCC need.
• In FE-I4, we have FE clock multiplier + AUX clock input:
  – Clock multiplier from the 40MHz input clock.
  – AUX: possibility to send “your choice of clock” to the FE.
I/O choices for ATLAS IBL, ATLAS Pixel System Design Task Force
Andre
CLKGEN specs
• I/O:
  – CLKGEN Input: REFCLK 40MHz input clock → PLL: 640MHz, divided down to 320 / 160 / 80 / 40 MHz.
  – CLKGEN Output: 2 single-ended clocks selected from internal clocks (not 640MHz), 40MHz in or AUX in.
• Why 640 MHz:
  – Good duty cycle for divided down clocks (dual edge serializing initially intended).
  – Higher freq VCO → smaller LF cap, reduced area.
  – Synergy with other projects.
  – Drawbacks: power, switching noise.
• Area 236×281 μm², I_average_nominal ~ 3-4 mA, settling time ~1.2 μs, loss of lock detect.
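The clock options discussed for CLKGEN come down to simple arithmetic. The sketch below assumes the 640 MHz PLL output is brought down by successive divide-by-2 stages (consistent with the 320 / 160 / 80 / 40 MHz list, although the divider structure is not detailed here) and that one bit is sent per active clock edge. The function names are illustrative only.

```python
VCO_MHZ = 640  # PLL output frequency quoted on this slide


def divided_clocks(vco_mhz, stages=4):
    """Frequencies after 1..stages divide-by-2 steps (assumed divider chain)."""
    return [vco_mhz // 2 ** n for n in range(1, stages + 1)]


def link_rate_mbps(clock_mhz, dual_edge=False):
    """Serial bit rate with one bit per rising edge, or per edge if dual_edge."""
    return clock_mhz * (2 if dual_edge else 1)


print(divided_clocks(VCO_MHZ))             # internal clocks offered to the MUX
print(link_rate_mbps(160))                 # on-chip multiplied clock, single edge
print(link_rate_mbps(80, dual_edge=True))  # 80 MHz clock, both edges
```

Both routes reach the 160 Mb/s needed for IBL; they differ in where the complexity sits (off-detector TTC changes versus an on-chip multiplier).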
PLL Overview
[Figure: PLL block diagram with Phase Frequency Detector, Charge Pump, Loop Filter, Voltage Controlled Oscillator, Frequency Divider, and Conversion, Enabling and Buffering. IN: 40 MHz. OUT: 640 MHz; 40, 80, 160, 320 available. CLK fed back.]
CLKGEN Overview
[Figure: CLKGEN block diagram. The PLL takes Ref (40M In) and generates 640 MHz, divided down to 320 / 160 / 80 / 40 MHz. Config registers hold the enables (EN_40M, EN_80M, EN_160M, EN_320M, EN_PLL); ICP and Ibias each come from a DAC, each current controlled by an 8 bit DAC register. Two MUXes, each selecting among AUX, 320, 160, 80, 40 and 40 In, drive CLK0_out and CLK1_out: one output is used for data stream-out in DOB / serializer, the other in the 4 to 1 MUX. Other labels: Ref2Fast, Fb2Fast.]
CLKGEN
• MUX scheme allows:
  1- serializing data with various clocks.
  2- tests of 4-chip modules for sLHC in star configuration.
[Figure: star configuration. 3 FEs each send data at 80 Mb/s to a fourth FE. This FE has a special role: it accepts 80 Mb/s streams from 3 FEs and streams out at 320 Mb/s.]
A2- Command Decoder
• There exists a dual path for configuration.
• Test Mode: CMOS pins + Shift Register à la FE-I4_proto1.
• We focus here on “standard input”: command decoder.
• Select between Command Decoder & Bypass from bond (InMUX_select=1).
• CmdDec Inputs: LVDS Command in, LVDS clock in.
• Similar to FE-I3 command decoder.
• 3 classes of commands: trigger, fast, slow.
• Issuing commands during running? No automatic exit from RunMode anymore, but a choice of the user (slow ctrl command needed to exit).
• If RunMode off, fast commands and trigger NOT accepted.
Maurice test overview tomorrow
Roberto
Main feature
• Robust vs. SEU:
– All triplicated logic + majority vote (Address / WrRegData corrected each clock cycle).
– State machine returns to idle quickly by construction (no need of reset -FE-I3 like-).
– Error detection provided (XOR of all triplicated outputs). (Increments counter & stored in CmdBitFlip[4:0] Config. Reg.)
– Trigger 11101 single bit flip safe (bit flip flagged, but trigger issued).
– Various error counters (e.g. invalid field 1, 2, 3)
• Fully scan-able: 3 ports, TST_SE (Enable), TST_SI (Scan In), TST_Out (Scan Out).
More details provided tomorrow
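The triplication with majority vote and flip detection listed above can be illustrated in a few lines. This is a behavioural sketch, not the FE-I4 logic: the class name, the `upset` helper and the saturating 5-bit counter (sized like CmdBitFlip[4:0]) are assumptions made for the example.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 vote over three copies of a register."""
    return (a & b) | (a & c) | (b & c)


def mismatch(a, b, c):
    """Non-zero when the three copies are not all identical (XOR-based check)."""
    return (a ^ b) | (b ^ c)


class TriplicatedRegister:
    """Hypothetical model of a triplicated register refreshed every clock cycle."""

    def __init__(self, value, counter_bits=5):
        self.copies = [value, value, value]
        self.flip_count = 0
        self.max_count = (1 << counter_bits) - 1  # assumed saturating counter

    def upset(self, copy, bit):
        """Inject a single-event upset: flip one bit in one copy."""
        self.copies[copy] ^= 1 << bit

    def refresh(self):
        """Vote, count a detected disagreement, and rewrite all three copies."""
        voted = majority(*self.copies)
        if mismatch(*self.copies):
            self.flip_count = min(self.flip_count + 1, self.max_count)
        self.copies = [voted] * 3
        return voted


reg = TriplicatedRegister(0b1010)
reg.upset(copy=1, bit=0)   # corrupt one copy
print(bin(reg.refresh()), reg.flip_count)
```

A single upset in any one copy is outvoted by the other two and is repaired at the next refresh, which is why correcting each clock cycle keeps errors from accumulating.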
Commands
• trigger, fast, slow:
  – Trigger: only the LV1 command.
  – Fast: 3 commands.
  – Slow: allows 6 commands (16 possible).
Type     Name  Field 1  Field 2  Field 3  Description
Trigger  LV1   11101    -        -        Level 1 trigger
Fast     BCR   10110    0001     -        Bunch Counter Reset
Fast     ECR   10110    0010     -        Event Counter Reset
Fast     CAL   10110    0100     -        Calibration pulse
Slow     CMD   10110    1000     Command  Slow command header
Field 1: Trigger OR Fast / Slow?
Field 2: Which of 3 Fast OR Slow?
Commands
• Trigger: LV1: In RunMode only. Triggers acquisition of event. Only Field 1 (11101) needed → OK with ATLAS requirement (1 trigger per 5 clock cycles).
• Fast: In RunMode only. Field 2 (not 1000):
  – BCR: Bunch Counter set to 0.
  – ECR: Event Counter reset. Clears data path (all memory pointers, data structures, clears pending events). Interrupts data transmission if in progress.
  – CAL: Calibration pulse sent in response. Delay (bx granularity) up to 64 bx. Width (1-256). Dig & Analog hits.
• Slow: Accepted at all times. No automatic taking of RunMode. Field 2 is 1000.
Cal Inj. Abder
Dig. Inj. Tomek
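The two-step decision (Field 1, then Field 2) and the single-bit-flip tolerance of the trigger pattern can be sketched as follows. This is an illustration under assumptions: commands are given as bit strings in arrival order, and the `classify` function and its return convention are invented for the example; the real decoder is a triplicated state machine.

```python
TRIGGER = "11101"
HEADER = "10110"
FIELD2 = {"0001": "BCR", "0010": "ECR", "0100": "CAL", "1000": "SLOW"}


def hamming_distance(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(bits, run_mode=True):
    """Return (command name, bit_flip_flagged) for the command starting `bits`."""
    field1 = bits[:5]
    if len(field1) < 5:
        return ("INCOMPLETE", False)
    distance = hamming_distance(field1, TRIGGER)
    if distance <= 1:
        # A single flipped bit still issues the trigger, but is flagged.
        return ("LV1" if run_mode else "IGNORED", distance == 1)
    if field1 == HEADER:
        name = FIELD2.get(bits[5:9], "INVALID")
        if name in ("BCR", "ECR", "CAL") and not run_mode:
            return ("IGNORED", False)  # fast commands need RunMode
        return (name, False)
    return ("INVALID", False)


print(classify("11101"))
print(classify("11001"))
print(classify("101100010"))
print(hamming_distance(TRIGGER, HEADER))
```

The trigger and header patterns differ in three bit positions, so a trigger with one flipped bit can never be mistaken for the fast / slow header.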
Slow Commands
Name          Field 3  Field 4¹  Field 5²  Field 6  Bits²  Description
RdRegister    0001     ChipId    Address   -        -      Read addressed FE register.
WrRegister    0010     ChipId    Address   Data     16     Write into addressed FE register.
WrFrontEnd    0100     ChipId    -         Data     672    Write conf data to enabled DC (1 to 40 DC @ a time).
GlobalReset³  1000     ChipId    -                         Reset command. Puts the chip in its idle state.
GlobalPulse³  1001     ChipId    -         -        1-64   Has variable pulse width. Used to latch / read data, inject dig hit, enable clock in stop mode. Reset command too.
EnDataTake³   1010     ChipId    -         -               Sets the FE in RunMode.
• Field 3: identifies slow command.
• Field 4: 3+1 bits Chip ID; broadcast 1xxx.
• Field 5: 6 bit address, for WrReg & RdReg.
• Field 6: used by all write operations: into RegisterBank OR Config FE (40 DC shift register, 672 bits).
Note: A DisableRunMode command will be added.
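As an illustration of how the fields concatenate, here is a sketch that assembles a WrRegister bit string from the slow command header (10110 1000), Field 3, a 4-bit ChipId field, a 6-bit address and 16 data bits. MSB-first ordering within each field, and the broadcast flag being the leading bit of the ChipId field (suggested by "broadcast 1xxx"), are assumptions; `wr_register` and `to_bits` are hypothetical helper names.

```python
SLOW_HEADER = "10110" + "1000"  # Field 1 + Field 2
FIELD3 = {
    "RdRegister": "0001",
    "WrRegister": "0010",
    "WrFrontEnd": "0100",
    "GlobalReset": "1000",
    "GlobalPulse": "1001",
    "EnDataTake": "1010",
}


def to_bits(value, width):
    """Fixed-width binary string, MSB first (assumed bit order)."""
    if not 0 <= value < (1 << width):
        raise ValueError(f"{value} does not fit in {width} bits")
    return format(value, f"0{width}b")


def wr_register(chip_id, address, data, broadcast=False):
    """Bit string for a WrRegister slow command (illustrative layout)."""
    field4 = ("1" if broadcast else "0") + to_bits(chip_id, 3)
    return (SLOW_HEADER + FIELD3["WrRegister"] + field4
            + to_bits(address, 6) + to_bits(data, 16))


command = wr_register(chip_id=5, address=3, data=0xABCD)
print(command, len(command))
```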
Notes on slow commands -1-
• List of global registers (8-16b): doc FEI4_Global_Register_vX.
  – These are SEU-hard latches:
    • Analog Pixel tuning: e.g.: PrmpVbpf, DisVbnA, Amp2Vbn… (~20)
    • FE Config.: PxStrobes (13 bits / 13 latches), PxSRSetup (Write to which DC, S0, S1, … global communication to/from analog DC).
    • LVDS (bias) / PLL (bias, clk config.) / VCAL (Internal calibration, 10 bits + setting delay 6 bits, LSB~1ns).
    • DIG mode (DC clock source in stop mode, 8b/10b disable,…)
    • ColMask / ErrorMask / Trigger (Latency setting, self trigger, # consecutive…).
    • Empty Record (empty pattern when 8b10b disabled).
    • ANAsel 1/2/3: MUX test analog buffer.
    • CMOSout 1/2: sel for InMUX.
see data out protocol
Abder
see InMUX
Michael, Andre
Tomek / J-D
Notes on slow commands -2-
• Global Register:
  – Also EFuse shadow register: for redundant SR of Double-Columns, trimming of references (CREF, VREF), Chip Serial Number.
• Grand Total: ~ 50 Global Registers, either 8b or 16b wide.
Notes on slow commands -3-
• WrFrontEnd: writing configuration to FE, 672 bits / 1 Double-Column granularity. Which DC is addressed is set by the PxSRSetup register (0-40 possible). 13b / pixel → configuration of complete FE takes ~9ms.
• Note on WrFrontEnd: writing the shift register also shifts its content out.
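The ~9 ms figure can be cross-checked with a short calculation. The sketch assumes one configuration bit is shifted per 40 MHz clock cycle and ignores command headers and latching overhead; both are simplifications, not statements about the actual protocol timing.

```python
ROWS = 336              # pixel rows
COLUMNS_PER_DC = 2      # a double-column holds two pixel columns
DOUBLE_COLUMNS = 40
BITS_PER_PIXEL = 13
CLOCK_HZ = 40e6         # assumed: one bit shifted per 40 MHz clock cycle


def full_config_time_ms():
    """Time to load all 13 configuration bits of every pixel, overhead ignored."""
    bits_per_dc_load = ROWS * COLUMNS_PER_DC              # 672-bit shift register
    total_bits = bits_per_dc_load * DOUBLE_COLUMNS * BITS_PER_PIXEL
    return total_bits / CLOCK_HZ * 1e3


print(f"{full_config_time_ms():.3f} ms")
```

The result is a little under 9 ms, so the quoted value leaves room for the per-command overhead that this estimate leaves out.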
Notes on slow commands -4-
• GlobalReset: Resets the whole FE to initial state.
• GlobalPulse: Reset command of various length. Selective reset based on length (à la FE-I3). Also used to latch / read data, inject dig hit, ctrl stop mode.
• EnDataTake: Sets the FE in RunMode. Can then decode L1T and Fast Commands. No automatic exit.
B- Readout Architecture, Data Flow
• B1- 4-pixel digital region.
• B2- Data transfer through the Double-Column.
• B3- Compression / formatting at End of Column.
• B4- Storage in FIFO.
• B5- 8b10b coder and protocol out.
B1- 4-pixel digital region
• Choice made based on 3 ways of checking performance of architecture chosen:
  – C++ description of chip: flexible framework with time-based description of pixel region / DC / Chip / Communication protocol → all 1st studies coupling pixels in phi / z / z&phi, various region size… Based on physics hits (see backup). Identified 4-pixel region architecture.
  – Verilog model and test bench: Towards implementation. Other sources of inefficiency + power.
  – Analytical model: Mathematical crosscheck of inefficiency (not time-based, no protocol).
• Coherent picture → for 3×LHC full lumi & 3.7cm layer: 4-pixel region tied in phi & z the winner!
David Arutinov
Tomek
Tomek
4-pixel region specs
• Storage of up to five 4-pixel + neighbor events.
• Small / big hit discrimination, 3 programmable modes (of course no discrimination available too). 2 BX association for small hits.
• Analog info = 4b ToT.
• Neighbor Logic (small hits in adjacent pixels -phi-): 4 bits.
• Records up to 16 consecutive triggers. Programmable latency up to 255 BX.
Digital Pixel: Regional Architecture
[Figure: Digital Region, 4-Pixel Unit. Four discriminator inputs (disc. top left, bot. left, top right, bot. right) feed hit processing (TS / sm / big / ToT); 5 ToT memories / pixel; 5 latency counters / region; Read & Trigger; Neighbor, Token, L1T, Read signals. Companion sketch: shared digital part with control, clustering logic, local buffer (local storage), trigger logic, data.]
• Store hits locally in region until L1T: 0.25% of pixel hits shipped to EoC → DC bus traffic “low”.
Consequences of regional architecture:
• Each pixel is tied to its neighbors -time info- (clustered nature of real hits). Small hits are close to large hits! To record small hits, use position instead of time. Handle on TW. Spatial association of digital hit to recover lower analog performance.
• Lowers digital power consumption (below 10 μW / pixel at IBL occupancy).
• Physics simulation → efficient architecture.
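The 0.25% of pixel hits shipped to the EoC matches the ratio of the trigger rate to the bunch-crossing rate. A one-line check, assuming the 100 kHz L1T rate used elsewhere in this talk, a 40 MHz crossing rate and one triggered crossing per L1T (consecutive-trigger modes would raise the fraction):

```python
BX_RATE_HZ = 40e6     # bunch-crossing clock
L1T_RATE_HZ = 100e3   # Level 1 trigger rate assumed in the backup slides


def triggered_fraction(l1t_rate_hz=L1T_RATE_HZ, bx_rate_hz=BX_RATE_HZ):
    """Fraction of bunch crossings (hence of stored hits) that are read out."""
    return l1t_rate_hz / bx_rate_hz


print(f"{triggered_fraction():.2%}")
```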
Performance / Efficiency
IBL: charge sharing in Z comparable to phi.
Regional Buffer Overflow
           Simulation         Analytical
Memories   IBL      10xLHC    IBL      10xLHC
5          0.047%   2.19%     0.029%   2.25%
6          0.011%   0.65%     0.003%   0.57%
7          <0.01%   0.16%     <0.01%   0.13%
Plot annotations: η=0; Mean ToT = 4; 0.6%.
Inefficiency: @ IBL rate, pile-up inefficiency is the dominant source of inefficiency.
• Pile-up inefficiency (related to pixel x-section and return to baseline behavior of analog pixel) ~ 0.5%.
• Regional buffer overflow ~0.05%.
• Inefficiency under control for IBL occupancy.
Digital Power
[Figure: drop on Vdd along the 4-pixel regions; for 21 regions <7mV.]
• Digital power: at IBL occupancy, digital power < 10μW/pixel.
Tomek gives more recent estimate in his PDR
B2- Digital DC / Data transfer
• Made of 168 4-pixel digital regions.
• In DC, token-based readout (dual token scheme DC / EoC with triple redundancy + majority voting).
• 21 4-pixel digital regions: the base structure for clock / buffering:
  – Skew-compensated clock routing → ~0.8ns skew for all pixels of array?
  – Buffering of read / L1T.
• Data transferred to FIFO asap. All controlled by EOCHL.
• Address transfer with minimal number of gates for yield enhancement (thermal encoder scheme). Data + Address is Hamming coded, decoded and corrected before data compression block.
Tomek
Jan-David
B3- Formatting at EoC
• Reducing bandwidth is an issue, both at IBL & sLHC.
• Estimated data rates with the same tools as previously described, physics-based (MC data from Vadim Kostyukhin), extrapolation to various radii and various possibilities to reduce rates.
• Studied clustering possibility, proximity algorithms, formatting. See backup formatting section.
• Formatting also to fit FIFO / 8b10b coding needs.
Tomek
e.g.: 10×LHC (50ns bx) / sLHC
[Figure: r-z map of pixel hit rates, r [mm] vs z [mm], with η lines from 0 to 3.5. Layer radii shown: 37, 50.5, 70, 88.5, 122.5, 131, 150, 201, 210 mm; z marks at 324 and 524. Mean rates shown: 60, 34, 13.4, 8.4, 3.9. Values shown along η: 55.10, 38.67, 59.15, 60.12, 60.02, 58.74, 61.18. Rates given in [pixel hits.bx⁻¹.cm⁻²]. Legend: FE-I4, 50μm×250μm; FE-I4 simul., 50μm×250μm; FE-I4 Nigel, 50μm×250μm; FE-I4 sdtf 220908, 50×250μm².]
Pixel occupancy → Data bandwidth
• Pixel hit rate → FE output bandwidth:
  – # bits / pixel transmitted?
    » address 7+9 bits, analog info 4+2 bits → 22b?
    » data output protocol?
• Reduce data output by taking into account clustered nature of physics hits / geometry.
[Figure: NUMBER OF PIXELS histograms for FE-I4, central module: 21cm layer (3xLHC), 3.7cm layer (10xLHC), 3.7cm layer (3xLHC).]
Formatting considered
• Clustering: Z-clustering. Can have logic in EoC to calculate Z cluster size (above certain # pixels adjacent in Z) → discard analog info (long clusters in Z → info not useful), ship out pixel ID + size of cluster → at η~2.0, 0.6 BW? BUT very dependent on FE location, and throws away analog info. NO.
• Proximity algorithm: send out relative addresses, pix: 7+9b → 1 + 3b address (8 “next pixels” coded this way). 0.8 BW? BUT variable data format, error prone.
• Fixed format clustered data transfer.
Histogram of distance:
distance  count
2         37975
1         27978
655       1921
3         1527
4         929
653       878
6         482
5         352
8         345
10        303
12        280
11        278
14        267
7         262
9         257
Numbering scheme:
0    1    656  657
2    3    658  659
4    5    ..   ..
..   ..
     653
654  655
Fixed format clustered data
• compression factor (all at 3×LHC), 3.7cm (vs. 21cm), η=0. Rate [10⁶ count.FE⁻¹.s⁻¹] × bits per word = relative bandwidth [A.U.]. Preliminary.
scheme          rate 3.7cm (21cm)  bits per word  relative BW 3.7cm (21cm)
indiv pixels    4.09 (0.25)        7+9+4+2        1.00 (1.00)
static 1×2      3.45 (0.18)        7+8+2×4+2      0.96 (0.83)
dynamic 1×2     3.02 (0.15)        7+9+2×4+2      0.87 (0.74)
static 1×4      2.86 (0.17)        6+8+4×4+4      1.08 (1.08)
dyn. in-DC 1×4  2.43 (0.15)        6+9+4×4+4      0.95 (0.95)
dynamic 1×4     2.13 (0.14)        7+9+4×4+4      0.85 (0.94)
[Figure: word layout with fields DC (×40), row (×336), column, row, ToT, NL.]
Choice: Dynamic phi-pairing (dynamic 1×2) → merge neighbours and small hits in process. Compression ok, simple to do and good format, 24 bits (nice for FIFO and 8b10b). Note that Hamming decoding is needed before the formatter.
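The relative bandwidth figures for the 3.7 cm layer follow from multiplying each word rate by its word length and normalising to the individual-pixel format. A small sketch reproducing that arithmetic from the numbers quoted on this slide (the dictionary keys and function names are illustrative; the 21 cm values are omitted because the quoted rates are rounded too coarsely to reproduce them exactly):

```python
# scheme: (word rate at 3.7 cm in 1e6 words per FE per second, bits per word)
SCHEMES = {
    "indiv pixels":   (4.09, 7 + 9 + 4 + 2),
    "static 1x2":     (3.45, 7 + 8 + 2 * 4 + 2),
    "dynamic 1x2":    (3.02, 7 + 9 + 2 * 4 + 2),
    "static 1x4":     (2.86, 6 + 8 + 4 * 4 + 4),
    "dyn. in-DC 1x4": (2.43, 6 + 9 + 4 * 4 + 4),
    "dynamic 1x4":    (2.13, 7 + 9 + 4 * 4 + 4),
}


def bandwidth_mbps(rate_mwords, bits):
    """Raw payload bandwidth in Mb/s per FE (no header, trailer or coding)."""
    return rate_mwords * bits


def relative_bandwidth(schemes, reference="indiv pixels"):
    """Bandwidth of each scheme normalised to the reference scheme."""
    ref = bandwidth_mbps(*schemes[reference])
    return {name: bandwidth_mbps(*v) / ref for name, v in schemes.items()}


for name, ratio in relative_bandwidth(SCHEMES).items():
    print(f"{name:15s} {bandwidth_mbps(*SCHEMES[name]):6.1f} Mb/s  {ratio:.2f}")
```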
B4- data storage FIFO
• In FIFO, record words stored as 3×8b words.
• Beginning of data event, EOCHL stores 24-b Header in FIFO.
• Then data words are stored, address (16-b) + ToTs (8-b) for 2 pixels.
• In FIFO are also stored:
  – Read back from Configuration.
  – Service messages.
• More in summary data format below.
Jan-David
From DC to FIFO
[Figure: data path from DC to FIFO. Blocks shown: From Columns (DC 6b, Region Add 8b, Data 20b), Hamming Decoder, Event Builder, Header, Service, Read Back, Read out Control, Data Switch, three Hamming Encoders (Word 0, Word 1, Word 2; 8 Bits in, 12 Bits out), Fifo with 8 places of 3 * 12 Bit = 36 Bits. Signals: Read, Busy, Write, Full.]
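The FIFO path shows each 8-bit word entering a Hamming encoder and leaving as 12 bits. A textbook single-error-correcting Hamming(12,8) code has exactly this shape, so it serves as an illustration of the principle. The bit layout below (parity bits at positions 1, 2, 4, 8) is the standard construction and an assumption; the slides do not specify the layout used on chip.

```python
def hamming_encode(data_bits, n_parity=4):
    """Encode data bits with parity bits at power-of-two positions (1-indexed)."""
    n = len(data_bits) + n_parity
    code = [0] * (n + 1)                       # index 0 unused
    data = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):                    # not a power of two: data position
            code[pos] = next(data)
    for p in range(n_parity):
        mask = 1 << p
        parity = 0
        for pos in range(1, n + 1):
            if pos & mask and pos != mask:
                parity ^= code[pos]
        code[mask] = parity                    # makes each parity group even
    return code[1:]


def hamming_decode(codeword):
    """Return (data bits, syndrome); a non-zero syndrome is the corrected position."""
    code = [0] + list(codeword)
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos
    if 0 < syndrome < len(code):
        code[syndrome] ^= 1                    # correct the single flipped bit
    data = [code[pos] for pos in range(1, len(code)) if pos & (pos - 1)]
    return data, syndrome


word = [1, 0, 1, 1, 0, 0, 1, 0]
stored = hamming_encode(word)
stored[6] ^= 1                                 # single bit flip while stored
print(hamming_decode(stored))
```

Such a code corrects any single flipped bit per 12-bit word; detecting double flips would need one more parity bit.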
B5- 8b10b encoder and protocol
• Normal mode is 8b10b coded.
• Test mode is 8b10b off. Good thing for testing the link (requirement of off-detector group).
8b10b
• For IBL, need to transmit data out at BW of 160Mb/s.
• At BOC/ROD:
  – Data rate 4 times the clock rate.
  – Phase adjustment.
• Use Clock Data Recovery mechanism.
• CDR requires an output data stream with good engineering properties.
• 8b10b:
  – adequate for this purpose, enough transitions for reliable CDR.
  – widely used → easy to implement.
  – provides some level of error detection.
  – provides comma for frame identification & synchronization.
I/O choices for ATLAS IBL, ATLAS Pixel System Design Task Force
Control symbols & Commas
• Control symbols: a set of 12 extra valid 10-bit sequences.
• Can be used as command.
• K28.1, K28.5, K28.7: commas.
• 11111 or 00000 cannot be found anywhere else in data stream.
K.28.7 & bit flip
• Single bit flip in K.28.7 cannot transform the stream into another meaningful stream (only into K.28.1 & K.28.5).
• K.28.1, K.28.5, K.28.7 can be used for re-synchronization of the 10-bit streams in case of loss of synchronization (only streams having a running disparity of +/-5).
• In sync. state, flip in regular data cannot generate K.28.7.
• Note: Restriction. Not 2 K.28.7 in a row → use K.28.5/1 for fillers.
Frame
• Frame built up (meaningful frames + empty records) such that:
  – detects single bit flips in synchronized state.
  – tolerant to loss of sync (data slipping) → re-sync on next commas.
  – state machine in receiving part:
    • can do CDR.
    • can search the 11111 or 00000 stream and re-synchronize to the commas.
  – state machine in receiving part needs to:
    • check for violation of 8b10b protocol.
Format implemented -1
• Three 8b10b commas used:
  – SOF: K.28.7.
  – EOF: K.28.5.
  – Idle state: K.28.1.
• Record words: (all that follows shown before 8b10b for clarity)
  – 24 bits long.
  – all start with 11101 (except: data record DR & empty record ER).
  – Data Header (DH): | 11101 | 001 | xxxx | [3:0]trigID | [7:0]bcID |
    • header for transmission of regular data.
    • 001: 1-b flip gives invalid code.
    • xxxx: for later uses.
    • trigID is trigger ID as received by ROD, bcID bunch crossing ID, needed for internal consecutive triggers (up to 16 trigg depending on RunMode), stop mode (up to 255 trigg).
Format implemented -2
  – Data Record (DR): | [6:0]Column | [8:0]Row | [3:0]ToTtop | [3:0]ToTbot |
    • Column numbering goes from 0000001 to 1010000.
  – Address Record (AR): | 11101 | 010 | Type | [14:0]Address |
    • Address of a global register, or the position of the shift register.
    • 010: Flags the Address Record; 1b flip gives invalid code.
    • Type: 1 bit information. 0 = Global Register; 1 = Shift Register.
    • [14:0]Address: If Type 0, address gives the Global Register ID. If Type 1, address gives the Shift Register position.
    • Note that the transmission of an Address Record always requires the transmission of an associated Value Record.
  – Value Record (VR): | 11101 | 100 | [15:0]Value |
    • Value of a global register, or value contained in the shift register.
    • Note that the use of 11101 followed by 100 for a Value Record allows also for sending Value Records with no Address Record before.
Format implemented -3
  – Service Record (SR): | 11101 | 111 | [15:0]Message |
    • A service message (e.g. error message).
    • 111: Flags the Service Record; 1b flip gives invalid code.
    • [15:0]Message: Service message.
    • Note that SR can belong to a data stream → only a single SR is then allowed.
  – Empty Record (ER): | 3×ER[xxxx.xxxx] |
    • When 8b10b coding is turned off, to fit the 24-bit long record word requirement and to ease the recognition of the end of a data stream, Empty Records are simply made of as many 24-bit long programmable words as needed. These can be 0-frames, but also 11001100 for example (that’s the 40MHz clk sent back).
    • When 8b10b is on, and no data / SR / config. read back is pending, ERs are then transmitted out, made of as many K.28.1 commas as needed.
Summary data format: 24-bit Record Word
Record          Acronym  Field 1      Field 2   Field 3        Field 4        Field 5     Comments
Data Header     DH       11101        001       xxxx           [3:0]trigID    [7:0]bcID   xxxx reserved for later use
Data Record     DR       [6:0]Column  [8:0]Row  [3:0]ToTtop    [3:0]ToTbot    -           Column numbering: 0000001 to 1010000
Address Record  AR       11101        010       Type           [14:0]Address  -           Type 0: Global Register / Type 1: Shift Register Position
Value Record    VR       11101        100       [15:0]Value    -              -           Value Record without previous Address Record allowed
Service Record  SR       11101        111       [15:0]Message  -              -           Service Message (e.g. error codes)
Empty Record    ER       ERvalue      ERvalue   ERvalue        -              -           Idle = K.28.1 commas (8b10b coding case)
e.g.: SOF | DH | DR | DR | DR | SR | EOF | Idle | Idle | AR | VR | AR | VR | Idle…..
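The record layouts can be exercised with a small pack / parse sketch. It assumes each 24-bit record is held as an integer with the leftmost field in the most significant bits; the function names and the dictionary output are illustrative, and Empty Records are not handled.

```python
PREFIX = 0b11101
CODES = {0b001: "DH", 0b010: "AR", 0b100: "VR", 0b111: "SR"}


def pack_data_header(trig_id, bc_id, reserved=0):
    """| 11101 | 001 | xxxx | [3:0]trigID | [7:0]bcID |"""
    return (PREFIX << 19) | (0b001 << 16) | (reserved << 12) | (trig_id << 8) | bc_id


def pack_data_record(column, row, tot_top, tot_bot):
    """| [6:0]Column | [8:0]Row | [3:0]ToTtop | [3:0]ToTbot |"""
    return (column << 17) | (row << 8) | (tot_top << 4) | tot_bot


def parse_record(word):
    """Split a 24-bit record word into named fields."""
    if word >> 19 == PREFIX:
        kind = CODES.get((word >> 16) & 0b111, "invalid")
        payload = word & 0xFFFF
        if kind == "DH":
            return {"type": kind, "trig_id": (payload >> 8) & 0xF,
                    "bc_id": payload & 0xFF}
        if kind == "AR":
            return {"type": kind, "shift_register": bool(payload >> 15),
                    "address": payload & 0x7FFF}
        if kind in ("VR", "SR"):
            return {"type": kind, "value": payload}
        return {"type": "invalid"}
    # Columns run from 1 to 80, so the top five bits of a Data Record are
    # at most 10100 and can never look like the 11101 prefix.
    return {"type": "DR", "column": word >> 17, "row": (word >> 8) & 0x1FF,
            "tot_top": (word >> 4) & 0xF, "tot_bot": word & 0xF}


print(parse_record(pack_data_header(trig_id=5, bc_id=200)))
print(parse_record(pack_data_record(column=80, row=336, tot_top=7, tot_bot=15)))
```

The column range is what lets the Data Record go without a type prefix: all 16 address bits stay available while the other record types remain distinguishable.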
Output rates (sLHC)
Trigger rate: 100 kHz
Interactions per crossing: 400
Sensor model: 260um planar, unirradiated
Comparator threshold: 4000e
Output format for analog data: Fixed frame dynamic 2 pixel phi pairing
Bits / pixels per analog output frame: 26 / 2
Output format for binary data: Fixed frame dynamic 2 pixel phi pairing (L2, L3); Fixed frame dynamic 4 pixel group (L0, L1)
Bits / pixels per binary output frame: 24 / 4 (L2, L3) ; 20 / 2 (L0, L1)
Encoding, parity, redundancy or headers: None
Design margin: Factor of 2
Columns: (1) Layer (~rad.) [cm]; (2) comp. firing per cm^2 per BX; (3) Required bandwidth per chip (Mb/s) (analog / binary); (4) chips / module; (5) 320Mb/s LVDS outputs / module (analog / binary); (6) EOS card data volume (Gb/s) (analog / binary); (7) FE-I4 chip data losses (*) x 10^-4.
(1)    (2)   (3)        (4)  (5)    (6)         (7)
3.7    60.0  749 / 454  1    3 / 2  12.0 / 7.3  n/a + 5
7      18.4  230 / 140  4    3 / 2  5.5 / 3.4   n/a + 2
16     6.6   75 / 58    4    1 / 1  2.4 / 1.8   18 + 1
20     3.9   42 / 32    4    1 / 1  2.7 / 2.1   10 + 0
disks  -     80 max?    4    1      2.9?        10 to 20?
Now dynamic phi-pairing, 24 bits / 2 pixels.
Requirements for SLHC pixel electrical system (system design task force)
The simulations for an IBL at 3.7 cm radius and a luminosity of x3 LHC indicate a data rate of at least 86 Mbps per FE.
I/O recommendations for IBL, Dec. 08. (system design task force)
C- Other means of communication -I/O
• C1- Stop mode.
• C2- External control of DC.
• C3- No 8b10b.
• C4- InMUX.
C1- Stop mode
• Stop standard data acquisition and read all hits from chip.
• How:
  – clock gating.
  – L1T and clock controlled externally by user (or logic).
  – e.g. procedure:
    • Set latency to proper value (max?).
    • StopMode on. Clock control gated.
    • Send one trigger, send one clock, read all FE.
• Implementation details still being worked on.
• This will be an important test mode: test PDR!
C2- External control of DC
• A feature implemented in test submission of digital region (3D Tezzaron-Chartered).
• Control of DC externally with few signals:
  – L1T, read, peripheral trigger_counter value (and clock) to be provided from outside.
  – Off-chip, sense token (sent out) to know if data is available.
• Might turn out to be an interesting test feature too.
• Needed? Still debatable. Implementation not yet done.
C3- No 8b10b
• Link test like (need?) clock sent back from FE → turn off 8b10b, send empty frame with appropriate Empty Record value to mimic clock.
• Convenient to more directly check output data.
• Speed
C4- InMUX
• Multipurpose slow control access:
  – 4 configurable CMOS inputs and 4 outputs (their function depends on the 3 configuration pins InMUX_select).
  – InMUX 1: direct control of Global registers and pixel SR.
  – InMUX 2: manual control of EODCL.
  – InMUX 3: control of end of chip logic.
  – InMUX 4: scan chain for CMD.
  – InMUX 5: scan chain for EOCHL.
THE END
• MORE INFO IN BACK-UP SLIDES (organized by topic) IF NEEDED.
BACKUP
•BACKUP CLOCK
Star Config. Timing
Why fout,max=640 MHz
• fout=640 MHz is not a big challenge in 130 nm CMOS
• Frequency division easily possible
• More precise duty cycle handling @ 160 MHz
• Smaller capacitance values in LF, less area
• Potential need of a higher clock in the future
• Synergy with other projects
BACKUP
•BACKUP 4-pixel region
Introduction
• Pixel hit rate → consequences on digital architecture and on FE data bandwidth (data output protocol, module concept, EoS…).
• Events: (Pythia generator)
  – WH(120GeV); H→bb.
  – overlaid with: 24 / 75 / 240 / 400 events pileup. “LHC” / “3×LHC” / “sLHC” (25ns / 50ns bx)
• Geometry: (Geant3 simulation package)
  – pixel size: FE-I3: 400×50μm²; FE-I4: 250×50μm².
  – first: 4 barrels, 3.7 (FE-I4) & 5.05/8.85/12.25 cm radius FE-I3.
  – new: 6 barrels, 3.7/5.05/8.85/12.25/16/21 cm radius FE-I4.
• Threshold: first 3750e-. New down to 1000e-.
• Files contain a list of pixels that record digital hit on a bunch-crossing basis.
Foreword: Minimum Bias events
• FE-I4 for:
  - b-layer upgrade: luminosity? radius? → 75 ev pile-up & 3.7cm.
  - s-LHC: lumi.? radius? → 240/400 ev pile-up & outer layer.
• Extrapolation to LHC energy: extrapolation @ 14TeV: uncertainty ~ 30%? (1st years operation crucial to feedback simulation)
[Figures: <# charged particles> / interaction; <pt charged particle> at η=0.]
Option for the region (history)
• Different pixel organizations: 2x2 (truncation), 1x4 (truncation), 1x4, 1x8.
• Timing: 40MHz Clock, 20MHz Clock, BCID (8bit gray timing).
Hit processing (HC3 mode) - schematic
• Receives comparator output.
• BC resolution.
• Generates Leading Edge (LE).
• Generates Small hit Leading Edge (sLE).
• Generates Trailing Edge (TE).
• Generates ToT counter reset and enable (rst_cnt, en_cnt).
Hit processing timings (HC=1)
1. Signal from comparator too short (no positive edge of clock).
2. Signal from comparator is recorded.
3. Small leading edge (sLE) - signal to neighbor (always 2 BC).
4. Leading Edge (LE).
5. Trailing edge (TE).
6. Reset (rst_cnt) and enable counter.
[Figure: timing diagram of comp, clk, LE, sLE, TE, rst_cnt, cnt_clk, tot_cnt, mem_clk, with events 1 to 6 marked.]
ToT processing - schematic
• Start ToT Counter.
• Global LE generation (orLE).
• Reset memory signal generation (rst_mem).
• Memory pointer selection (freeAddr).
• Record reset/small in memory.
• Record neighbor.
• Record TOT value in memory.
Memory Management - schematic
• Selects free memory.
• Token management.
• Selects triggered memory during read.
• Enables outputs.
Design: x5 latency cell
Latency Cell / Trigger - schematic
• Start/Reset latency counter.
• Indicate status (full).
• Trigger (triggered).
• Store/Recognize trigger ID.
BACKUP
•BACKUP formatting
e.g.: 3×LHC
[Figure: r-z map of pixel hit rates, r [mm] vs z [mm], with η lines from 0.1 to 3.5; layers at r = 37, 50.5, 88.5, 122.5 mm. Legend: FE-I3, 50μm×400μm; FE-I4 simul., 50μm×250μm. Assumption: 100kHz L1T, 336×80 pixels FE-I4. Rates given in [10⁶ pixel hits.module⁻¹.s⁻¹] / [10⁶ pixel hits.FE⁻¹.s⁻¹].]
Values shown along η, one row per layer:
6.24 6.40 5.98 5.80 5.87 6.40 6.06
2.53 2.54 2.52 2.53 2.62 2.63 2.62
1.40 1.23 1.25 1.25 1.36 1.33 1.32
4.07 3.87 4.04 3.98 3.94 4.07 2.69
Summary values shown: 3.98, 6.06, 2.55, 1.30.
For reference in backup slides: rates given in [pixel hits.bx⁻¹.cm⁻²]; 3×LHC, same radius, FE-I4.
Extrapolations to other radius
[Figures: Hits/mm² vs r [cm], one plot per pileup scenario.]
• sLHC, 50ns bx / 400 events pileup. Reasonable fit with: exp(1.34-0.57*R)+0.15-0.0053*R
• sLHC, 25ns bx / 240 events pileup. Reasonable fit with: exp(0.86-0.58*R)+0.088-0.0031*R
Radius layer [mm]  sLHC (25ns) [pix.bx-1.cm-2]  sLHC (50ns) [pix.bx-1.cm-2]
37                 35                           60
50.5               19.5                         34
70                 10.6                         18.4
88.5               7.8                          13.4
122.5              4.7                          8.4
131                4.7                          8.3
150                4.2                          7.1
201                2.5                          4.4
210                2.3                          3.9
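The two fits can be evaluated directly and compared with the tabulated rates. The sketch assumes R is in cm and the fit returns hits per mm² (as the plot axes suggest), so a factor 100 converts to the pix.bx⁻¹.cm⁻² used in the table; the function and dictionary names are illustrative.

```python
import math

# (a, b, c, d) in exp(a - b*R) + c - d*R, R in cm, result in hits per mm^2
FITS = {
    "sLHC 50ns": (1.34, 0.57, 0.15, 0.0053),
    "sLHC 25ns": (0.86, 0.58, 0.088, 0.0031),
}


def hit_rate_per_cm2(radius_cm, scenario):
    """Fitted pixel hit rate per bunch crossing and cm^2 at a given radius."""
    a, b, c, d = FITS[scenario]
    per_mm2 = math.exp(a - b * radius_cm) + c - d * radius_cm
    return 100.0 * per_mm2          # 1 cm^2 = 100 mm^2


for radius_mm in (37, 50.5, 88.5, 122.5, 210):
    r = radius_mm / 10
    print(radius_mm,
          round(hit_rate_per_cm2(r, "sLHC 25ns"), 1),
          round(hit_rate_per_cm2(r, "sLHC 50ns"), 1))
```

At the tabulated radii the fitted values stay close to the simulated ones, for example about 59 against 60 at 37 mm in the 50 ns case.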
Pixel occupancy → Data bandwidth
• Example 1: pixel clustered in z.
  – no useful analog info.
  – can have logic in EoC to calculate Z cluster size → ship out pixel ID + size of cluster.
[Figure: reduction in BW vs ≥ cluster size (1, 3, 5, 7, 9) computed in EoC, shown for FE-I4, module 4, 3.7cm layer and for FE-I4, central module, 3.7cm layer.]
• Very dependent on FE location! And throws away analog info.
Marlon Barbero, FE-I4 Review, Communication / Data Flow, CERN 2009 Nov. 3/4 68
Pixel occupancy Data bandwidth
• Example 2: proximity algorithms.
distance count
2 37975
1 27978
655 1921
3 1527
4 929
653 878
6 482
5 352
8 345
10 303
12 280
11 278
14 267
7 262
9 257
FE-I4, central module, 3.7cm layer
Pixel numbering:
   0    1   656  657
   2    3   658  659
   4    5   ..   ..
  ..   ..
  ..  653
 654  655
Send out relative addresses: pix: 7+9b → "0" + 16b address, or "1" + 3b address (8 “next pixels” coded this way).
Compression efficiency: ~0.66 (address only!). [Plot labels: 3×LHC, 10×LHC]
But: variable data format length → error prone.
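A minimal sketch of such a relative-address scheme, to make the variable word length concrete. Assumptions not stated on the slide: a leading flag bit distinguishes the two word types ("0" = full 16-bit address, "1" = 3-bit offset), offsets 1..8 are stored as offset-1, and addresses arrive sorted. The function name is hypothetical.

```python
def encode_addresses(addresses, full_bits=16, rel_bits=3):
    """Encode sorted pixel addresses as variable-length bit strings.

    Assumed convention (not from the slide):
      "0" + full address          when no previous pixel, or it is too far away
      "1" + (offset - 1) in 3 bit when the pixel is one of the 8 "next pixels"
    """
    words = []
    previous = None
    max_offset = 2 ** rel_bits  # 8 "next pixels"
    for address in addresses:
        offset = None if previous is None else address - previous
        if offset is not None and 1 <= offset <= max_offset:
            words.append("1" + format(offset - 1, "0{}b".format(rel_bits)))
        else:
            words.append("0" + format(address, "0{}b".format(full_bits)))
        previous = address
    return words


if __name__ == "__main__":
    encoded = encode_addresses([10, 12, 13, 700])
    print(encoded)
    print(sum(len(w) for w in encoded), "bits instead of", 4 * 16)
```

The words come out 17 or 4 bits long, which is the variable-length format the slide flags as error prone: one corrupted flag bit shifts the interpretation of everything that follows.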
Pixel occupancy Data bandwidth
• Example 3: clustered data out with fixed format.
• bit count / pixel (all at 3×LHC), 3.7cm (vs. 21cm), η=0; first factor is the rate in 10^6 count.FE^-1.s^-1:
  • indiv pixels:    4.09 (0.25) × (7+9+4+2)   = 90.0 (5.49) Mb.FE^-1.s^-1
  • static 1×2:      3.45 (0.18) × (7+8+2×4+2) = 86.2 (4.58) Mb.FE^-1.s^-1
  • dynamic 1×2:     3.02 (0.15) × (7+9+2×4+2) = 78.5 (4.08) Mb.FE^-1.s^-1
  • static 1×4:      2.86 (0.17) × (6+8+4×4+4) = 97.2 (5.92) Mb.FE^-1.s^-1
  • dyn. in-DC 1×4:  2.43 (0.15) × (6+9+4×4+4) = 85.3 (5.23) Mb.FE^-1.s^-1
  • dynamic 1×4:     2.13 (0.14) × (7+9+4×4+4) = 76.5 (5.15) Mb.FE^-1.s^-1
[Diagram: output word fields: DC (×40), row (×336), column, row, ToT, NL]
Disclaimer: no header, trailer, DC-balancing, error correction…
assumption: 100kHz L1T, 336×80 pixels FE-I4
preliminary
Pixel occupancy Data bandwidth
• Example 3: clustered data out with fixed format.
• compression factor (all at 3×LHC), 3.7cm (vs. 21cm), η=0; first factor is the rate in 10^6 count.FE^-1.s^-1:
  • indiv pixels:    4.09 (0.25) × (7+9+4+2)   = 1.00 (1.00) A.U.
  • static 1×2:      3.45 (0.18) × (7+8+2×4+2) = 0.96 (0.83) A.U.
  • dynamic 1×2:     3.02 (0.15) × (7+9+2×4+2) = 0.87 (0.74) A.U.
  • static 1×4:      2.86 (0.17) × (6+8+4×4+4) = 1.08 (1.08) A.U.
  • dyn. in-DC 1×4:  2.43 (0.15) × (6+9+4×4+4) = 0.95 (0.95) A.U.
  • dynamic 1×4:     2.13 (0.14) × (7+9+4×4+4) = 0.85 (0.94) A.U.
[Diagram: output word fields: DC (×40), row (×336), column, row, ToT, NL]
assumption: 100kHz L1T, 336×80 pixels FE-I4
Disclaimer: no header, trailer, DC-balancing, error correction…
dyn. 1×4 better at small R? (larger η!) dyn. 1×2 at large R?
preliminary
For reference in backup slides: same at higher η
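The arithmetic behind the two Example 3 slides is simply (word rate) × (bits per word), then normalised to the individual-pixel format. The sketch below redoes it for the 3.7cm figures; the dictionary layout and function names are illustrative, and because the slide rates are rounded, results agree with the quoted values only to within rounding.

```python
# (word rate at 3.7 cm in 1e6 count per FE per s, bits per word), copied from the slide
FORMATS = {
    "indiv pixels":   (4.09, 7 + 9 + 4 + 2),
    "static 1x2":     (3.45, 7 + 8 + 2 * 4 + 2),
    "dynamic 1x2":    (3.02, 7 + 9 + 2 * 4 + 2),
    "static 1x4":     (2.86, 6 + 8 + 4 * 4 + 4),
    "dyn. in-DC 1x4": (2.43, 6 + 9 + 4 * 4 + 4),
    "dynamic 1x4":    (2.13, 7 + 9 + 4 * 4 + 4),
}


def bandwidth_mbps(rate_mhz, bits_per_word):
    """Bandwidth in Mb per FE per s: 1e6 words/s times bits per word."""
    return rate_mhz * bits_per_word


def compression_factors(formats, reference="indiv pixels"):
    """Bandwidth of each format relative to the reference format (A.U.)."""
    ref = bandwidth_mbps(*formats[reference])
    return {name: bandwidth_mbps(*spec) / ref for name, spec in formats.items()}


BANDWIDTH = {name: bandwidth_mbps(*spec) for name, spec in FORMATS.items()}
FACTORS = compression_factors(FORMATS)

if __name__ == "__main__":
    for name in FORMATS:
        print("{:15s} {:6.1f} Mb/FE/s   {:.2f} A.U.".format(
            name, BANDWIDTH[name], FACTORS[name]))
```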
BACKUP
•BACKUP FIFO
[Block diagram, after FIFO: FIFO (8 places, 3 × 12 bit), three Hamming decoders, multiplexer, 8 bit / 10 bit, serializer; signals: FIFO empty, Read]
BACKUP
•BACKUP 8b10b and CDR
Main characteristics of 8b10b coding
• 8 bits data → 10 bits data.
• DC-balance: same number of 0’s and 1’s.
• Disparity of 10b word: -2, 0 or +2.
• Maximum run length without transitions: 5 bits.
• DC balancing → frequent transitions in data stream. Essential for clock recovery from the data stream (allows CDR).
• Some low level of error detection.
• 256 data symbols (Dx.y) + 12 specific control symbols (Kx.y).
8b10b code mapping: data -1-
• Dx.y: 8 bits → 10 bits.
• Splits the 8 bits in 3 MSBs (y) and 5 LSBs (x).
• y = 3 MSBs → 4 bits.
• x = 5 LSBs → 6 bits.
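The split into x and y is plain bit masking. A minimal sketch follows; the helper names are hypothetical, and only the split and the Dx.y naming are shown, not the 5b/6b and 3b/4b lookup tables themselves.

```python
def split_8b(byte):
    """Split an 8-bit value into (x, y): x = 5 LSBs, y = 3 MSBs."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    x = byte & 0x1F        # 5 least significant bits, go to the 5b/6b encoder
    y = (byte >> 5) & 0x7  # 3 most significant bits, go to the 3b/4b encoder
    return x, y


def data_symbol_name(byte):
    """Name of the data symbol in the slide's Dx.y notation."""
    x, y = split_8b(byte)
    return "D{}.{}".format(x, y)


if __name__ == "__main__":
    print(data_symbol_name(0xBC))  # x = 28, y = 5
```

For example the byte 0xBC (binary 101 11100) splits into y = 5 and x = 28; as a control symbol the same x.y pair is the comma K28.5.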
8b10b code mapping: data -2-
[Tables: 3 bits → 4 bits and 5 bits → 6 bits code mappings (…)]
Choice made from value of 4b6b stream to respect some properties (Disparity, uniqueness of some bit string).
8b10b code mapping: disparity counter
• Disparity D = #1’s - #0’s.
• 4 bits: 16 values. Only 6 are disparity neutral (need 2^3).
• 6 bits: 64 values. Only 20 are disparity neutral (need 2^5).
• 4 bits and 6 bits: only even disparity possible.
• Therefore allow transmission of values with disparity -2, 0 and +2.
• Track down RD counter value and compensate!
RD=-1
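The counts quoted above (6 neutral words out of 16, 20 out of 64, even disparities only) can be checked by brute-force enumeration. This is only a verification sketch with illustrative function names.

```python
from collections import Counter
from itertools import product


def disparity(bits):
    """Disparity D = number of 1's minus number of 0's in a bit string."""
    return bits.count("1") - bits.count("0")


def disparity_histogram(width):
    """Count how many words of the given width have each disparity value."""
    words = ("".join(p) for p in product("01", repeat=width))
    return Counter(disparity(w) for w in words)


if __name__ == "__main__":
    for width, needed in ((4, 2 ** 3), (6, 2 ** 5)):
        hist = disparity_histogram(width)
        print(width, "bits:", dict(sorted(hist.items())),
              "neutral:", hist[0], "needed:", needed)
```

Since the neutral words alone (6 and 20) are fewer than the 8 and 32 values to be encoded, words of disparity -2 and +2 have to be admitted, which is why a running disparity has to be tracked.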
8b10b code mapping
RD+ data w. D = -2 or 0 transmitted
RD- data w. D = 0 or +2 transmitted
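A minimal sketch of this selection rule for two symbols. The bit patterns are taken from the standard 8b/10b code tables, not from the slides, and the function name and dictionary layout are illustrative: each symbol maps the current running disparity (-1 or +1) to the word to transmit.

```python
# Encodings keyed by current running disparity (standard 8b/10b tables).
K28_5 = {-1: "0011111010", +1: "1100000101"}  # disparity +2 / -2
D21_5 = {-1: "1010101010", +1: "1010101010"}  # disparity 0 in both cases


def transmit(symbol, running_disparity):
    """Pick the word allowed for the current RD and return (word, new RD)."""
    word = symbol[running_disparity]
    d = word.count("1") - word.count("0")  # -2, 0 or +2
    # RD- with D=+2 goes to RD+, RD+ with D=-2 goes to RD-, D=0 leaves RD unchanged.
    return word, running_disparity + d


if __name__ == "__main__":
    rd = -1  # initial running disparity
    for symbol in (K28_5, D21_5, K28_5):
        word, rd = transmit(symbol, rd)
        print(word, "RD ->", rd)
```

Starting from RD = -1, two consecutive K28.5 symbols alternate between the two encodings, so the stream stays DC-balanced.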
Error detection & Control symbols
• Not real error detection: out of 1024 10-bit sequences, only ~1/2 + 12 are allowed; the remaining ones produce an error flag.
• Control symbols: a set of 12 extra valid 10-bit sequences.
• Can be used as command.
• K28.1, K28.5, K28.7: commas. 11111 or 00000 cannot be found anywhere else in data stream.
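A receiver can use the comma to find the 10-bit word boundary. The sketch below searches a bit string for the 7-bit comma sequence of the standard 8b/10b code (0011111 or 1100000, i.e. the five-bit run quoted above plus its two leading bits); the function name is hypothetical and the stream is modelled as a Python string for simplicity.

```python
COMMA_PATTERNS = ("0011111", "1100000")  # standard 8b/10b comma sequences


def find_comma(bits):
    """Return the index of the first comma sequence in bits, or -1 if none."""
    positions = [bits.find(p) for p in COMMA_PATTERNS if p in bits]
    return min(positions) if positions else -1


if __name__ == "__main__":
    d21_5 = "1010101010"      # data symbol without long runs
    k28_5_rd_minus = "0011111010"
    stream = d21_5 + k28_5_rd_minus + d21_5
    offset = find_comma(stream)
    print("comma at bit", offset, "-> word boundary every 10 bits from there")
```

In K28.5 the comma starts at the first bit of the symbol, so the returned index directly gives the word alignment.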
Robustness?
• Stuff bits for error detection (parity check? / CRC?)?
• Max 5 consecutive identical bits. Furthermore, in commas only!
• Keep event synchronization: robustness against single bit flip in header.
• Use commas in the header? Unique stream in K28.1, K28.5, K28.7 → data / monitoring / configuration.
• Some level of burst error protection too.
Clock and Data Recovery -CDR- 1
• In receiver, PLL with approximate frequency reference, where phase alignment is done (phase alignment to the transitions in the data stream).
Clock and Data Recovery -CDR- 2
• PLL needs to lock to 8b10b stream: not a periodic signal!
• PFD needs to be flexible enough to allow no transition in data stream during clock period.
• Ex: Use tapped delay line, designed to be more than 1 data bit period, but less than 2.
• Transition in data stream:
  – centered: correct phase.
  – elsewhere: correct the VCO.
  – not present: do nothing.
Clock and Data Recovery -CDR- 3