LBT AO meeting – Arcetri, February 2005 LBT AO Real-Time SW LBT AO Real Time Software Roberto Biasi, Mario Andrighettoni, Dietrich Pescoller Microgate S.r.l. Via Stradivari,4 39100 – Bolzano Italy
Dec 25, 2015
Outline
Hardware platform
Software platform and Programming language
Interfaces (communication)
Slope computer SW
Real Time Reconstructor SW
Mirror Control SW
Safety issues
Performances
Hardware platform
Real-time software: Analog Devices TigerSHARC DSP, 32-bit floating point
Super Harvard architecture
SIMD: Single Instruction Multiple Data
Internal clock 242.86 MHz, 485 Mmac/s in 32-bit floating point (if properly programmed…)
Diagnostic: NIOS embedded processor, implemented on Altera Stratix FPGA
32-bit RISC processor, clock 60.7 MHz
Offloading of several tasks to the FPGA
‘Internal’ and ‘external’ connectivity were among the main design goals from the beginning: no bottlenecks!
Software platform/programming language
Real-time software: TigerSHARC assembly only! Why?
Performance loss using C is anything but negligible (~2.5x)
AD assembly language is ‘relatively’ readable
Diagnostic software (NIOS): C/C++
Operating System
All real-time software is OS-free!
It’s a hard real time environment with very few tasks and very limited interaction between tasks
We never felt the need for an operating system
Interfaces – communication
Data transfer latency is a critical aspect in parallel computers
• Strict separation between real-time communication and diagnostic communication
Real-time
Used to transfer real-time data (slopes from Slope Computer to Real Time Reconstructor, intermediate data of parallel processing) in a time-deterministic way
Based on the 2.125 Gbit/s Fibre Channel physical standard (layers 0, 1)
Handled entirely by HW (FPGA)
Up to 4.25 Gbit/s ‘raw’ communication speed (two channels paired), 3.3 Gbit/s typical data throughput
Robust protocol (CRC control)
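As a rough check of what these link figures mean for latency, the arithmetic below estimates how long one slope vector takes to cross the real-time link. The rates and the 1256-slope vector length are the ones quoted on the slides; the 32-bit word per slope is an assumption, and protocol overhead beyond the quoted typical throughput is ignored.

```python
# Figures quoted on the slides
LINE_RATE_GBIT = 2.125          # one Fibre Channel link
PAIRED_CHANNELS = 2             # two channels paired
TYPICAL_THROUGHPUT_GBIT = 3.3   # typical data throughput
N_SLOPES = 1256                 # slope vector length quoted on the Timing slide

# Assumption (not stated on the slides): one 32-bit word per slope
BITS_PER_SLOPE = 32

raw_rate_gbit = LINE_RATE_GBIT * PAIRED_CHANNELS
efficiency = TYPICAL_THROUGHPUT_GBIT / raw_rate_gbit
transfer_us = N_SLOPES * BITS_PER_SLOPE / (TYPICAL_THROUGHPUT_GBIT * 1e9) * 1e6

print(f"raw rate:        {raw_rate_gbit:.2f} Gbit/s")
print(f"link efficiency: {efficiency:.1%}")
print(f"slope transfer:  {transfer_us:.1f} us")
```

Under these assumptions the slope vector needs roughly 12 µs on the link, which is small compared with the 850 µs pixel readout quoted later.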
Interfaces – communication (cont.)
Diagnostic
Used to transfer diagnostic data (circular buffers), system configuration (code and coefficients uploading), operation (change operating mode), housekeeping (temperatures, currents, …) and maintenance (firmware upgrades, …)
Access to all devices embedded on BCU and DSP boards (DSP, SDRAM, SRAM, FPGA, SRAM). Memory mapped access.
Based on Gigabit Ethernet, dedicated UDP/IP protocol (MGP). Now we are convinced it has been a good choice
UDP stack implemented on NIOS soft-core embedded processor (BCU FPGA)
Speed: currently ~90 Mbit/s actual data throughput. The main bottleneck is on the host PC; there is also margin for improvement on the BCU.
Interfaces – communication primitives
Small set of low-level primitives
Write_same: copies a data vector from source memory to the same internal addresses of all destination devices (e.g. DSPs)
[Diagram: the HOST data vector (FIRST DATA … LAST DATA) is written to addresses FIRST ADDRESS … LAST ADDRESS of DSP #N, DSP #N+1, …, DSP #N+M]
Write_sequential: the source data vector is split among the destination devices, writing different data to the same internal addresses
Read_sequential: data are read sequentially from the same internal address of the devices
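The three primitives can be illustrated with a small model in which each destination device is just a list of memory words. This is only a sketch of the addressing semantics described above: the function signatures, the equal-sized split in `write_sequential` and the list-based memory model are assumptions, not the actual protocol.

```python
def write_same(devices, address, data):
    """Copy the same data vector to the same addresses of every device."""
    for memory in devices:
        memory[address:address + len(data)] = data


def write_sequential(devices, address, data):
    """Split the data vector among the devices (equal chunks assumed),
    writing each chunk to the same addresses of its device."""
    if len(data) % len(devices) != 0:
        raise ValueError("data length must be a multiple of the device count")
    chunk = len(data) // len(devices)
    for n, memory in enumerate(devices):
        memory[address:address + chunk] = data[n * chunk:(n + 1) * chunk]


def read_sequential(devices, address, length):
    """Read the same address range from each device, one device after another."""
    result = []
    for memory in devices:
        result.extend(memory[address:address + length])
    return result


# Three hypothetical DSPs with 8 words of memory each
dsps = [[0] * 8 for _ in range(3)]
write_same(dsps, 4, [7, 8, 9])
write_sequential(dsps, 0, [10, 11, 20, 21, 30, 31])
readback = read_sequential(dsps, 0, 2)
```

In this model `write_sequential` followed by `read_sequential` over the same range returns the original vector, which is the property that makes the pair convenient for distributing and collecting partial results.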
Interfaces – diagnostic buffers
Same concept used very extensively on MMT336
On each DSP board, 6 linear or circular buffers can be filled with the content of any DSP memory location (single location or contiguous vector).
Bi-directional (the content of a buffer can also be written to a DSP memory location)
Sampling occurs at each local control cycle (~71 kHz)
Decimation
Triggering on a logic condition (=, ≠, <, >) occurring on another memory location
Buffer management and data storage handled by FPGA. No DSP SW overhead, DMA memory access
64 Mbytes available on each DSP board, dynamic memory allocation
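The buffer behaviour listed on this slide (linear or circular storage, decimation, triggering on a condition at another memory location) can be modelled in a few lines. On the real boards this is done by the FPGA with DMA access; the class below is only a behavioural sketch, and its name, parameters and dictionary-based memory are hypothetical.

```python
import operator


class DiagnosticBuffer:
    """Behavioural model of one diagnostic buffer (read direction only)."""

    def __init__(self, size, circular=True, decimation=1, trigger=None):
        self.data = [0.0] * size
        self.size = size
        self.circular = circular
        self.decimation = decimation
        self.trigger = trigger          # (address, comparison, value) or None
        self.armed = trigger is None    # without a trigger, record immediately
        self.cycles = 0                 # control cycles seen since arming
        self.written = 0                # samples stored so far

    def sample(self, memory, address):
        """Call once per local control cycle."""
        if not self.armed:
            trig_addr, compare, value = self.trigger
            if not compare(memory[trig_addr], value):
                return
            self.armed = True
        keep = self.cycles % self.decimation == 0
        self.cycles += 1
        if not keep:
            return
        if not self.circular and self.written >= self.size:
            return                      # linear buffer is full
        self.data[self.written % self.size] = memory[address]
        self.written += 1


# Record location 0 every 2nd cycle, once location 1 exceeds 0.5
memory = {0: 0.0, 1: 0.0}
buf = DiagnosticBuffer(size=4, decimation=2, trigger=(1, operator.gt, 0.5))
for cycle in range(10):
    memory[0] = float(cycle)
    memory[1] = 1.0 if cycle >= 3 else 0.0
    buf.sample(memory, 0)
```

Here the trigger fires at cycle 3, and every second sample from then on is stored.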
Slope computer
Implemented on a single BCU (1 TigerSHARC DSP)
Direct interfacing to SciMeasure LittleJoe through AIA interface
The slope computer is also the arbiter of the RTR operation. Sequencing of operations relies on determinism of real-time communication
Time master is the frame sync signal from CCD controller
Slope computer
CCD is sampled by SciMeasure LittleJoe
Data transfer through standard AIA port, connected to BCU PIO port
The BCU FPGA sequences the pixel data into DSP memory, according to a user-definable LUT
The BCU DSP performs slopes computation, sequences the AdSec RTR operation and stores the diagnostic data
Through the LUT, the user specifies how many new slopes can be computed by the DSP
Pipelined operation
Slope computer
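Sequence: Start_of_frame → Pixel reading → Slopes computation → RTR management

Pixel offset & gain correction:
$$q_k^{(f)} = a_k\,\hat{q}_k^{(f)} + o_k$$
with $\hat{q}_k^{(f)}$ the raw value of pixel $k$ in frame $f$, $a_k$ the pixel gain and $o_k$ the pixel offset.

Slopes calculation:
$$s_{x,k}^{(f)} = \frac{q_{k,1}^{(f)} + q_{k,2}^{(f)} - q_{k,3}^{(f)} - q_{k,4}^{(f)}}{b_k^{(f)}}, \qquad s_{y,k}^{(f)} = \frac{q_{k,1}^{(f)} - q_{k,2}^{(f)} - q_{k,3}^{(f)} + q_{k,4}^{(f)}}{b_k^{(f)}}$$

where:
‘regular’: $b_k^{(f)} = q_{k,1}^{(f)} + q_{k,2}^{(f)} + q_{k,3}^{(f)} + q_{k,4}^{(f)}$
OR
Average flux normalization: $b_k^{(f)} = \frac{1}{N_{SUB}} \sum_{j=1}^{N_{SUB}} \left( q_{j,1}^{(f)} + q_{j,2}^{(f)} + q_{j,3}^{(f)} + q_{j,4}^{(f)} \right)$
OR
User-defined constant normalization: $b_k^{(f)} = \text{constant}$

Slope correction:
$$\sigma^{(f)} = s^{(f)} - s_{off,0} - s_{off,1}^{(f)}$$
The slope offsets are user-definable parameters ($s_{off,0}$ can be modified on the fly) and support on-sky interaction matrix calibration.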
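The slope pipeline on this slide (pixel offset and gain correction, four-pixel slopes with one of three normalizations) can be sketched as follows. The function names, the pixel ordering inside each subaperture and the sign convention of the x and y slopes are assumptions made for illustration; the real implementation is TigerSHARC assembly driven by the pixel LUT.

```python
def correct_pixels(raw, gain, offset):
    """Per-pixel gain and offset correction (affine form assumed)."""
    return [a * q + o for q, a, o in zip(raw, gain, offset)]


def compute_slopes(q, subapertures, mode="regular", constant=1.0):
    """Return (sx, sy) lists, one entry per subaperture.

    subapertures: list of 4-tuples of pixel indices (q1, q2, q3, q4).
    mode: 'regular'  -> normalize by the subaperture's own flux
          'average'  -> normalize by the mean flux over all subapertures
          'constant' -> normalize by a user-defined constant
    """
    flux = [sum(q[i] for i in sub) for sub in subapertures]
    if mode == "regular":
        norm = flux
    elif mode == "average":
        norm = [sum(flux) / len(flux)] * len(flux)
    elif mode == "constant":
        norm = [constant] * len(flux)
    else:
        raise ValueError(f"unknown normalization mode: {mode}")

    sx, sy = [], []
    for (i1, i2, i3, i4), b in zip(subapertures, norm):
        # Sign convention assumed for illustration only
        sx.append((q[i1] + q[i2] - q[i3] - q[i4]) / b)
        sy.append((q[i1] - q[i2] - q[i3] + q[i4]) / b)
    return sx, sy


# Two hypothetical subapertures of four pixels each
raw = [1.0, 2.0, 3.0, 4.0, 2.0, 2.0, 2.0, 2.0]
pixels = correct_pixels(raw, gain=[1.0] * 8, offset=[0.0] * 8)
subs = [(0, 1, 2, 3), (4, 5, 6, 7)]
sx_reg, sy_reg = compute_slopes(pixels, subs, mode="regular")
sx_avg, sy_avg = compute_slopes(pixels, subs, mode="average")
```

The average flux normalization trades per-subaperture accuracy for robustness: a dim subaperture no longer divides by a small, noisy number.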
Real Time Reconstructor
Implemented on the Adaptive Secondary control electronics
Parallel computing: every DSP computes 4 outputs
Parallel operation sequencing and data distribution managed by Slope Computer
Very efficient data re-circulation through real-time communication
Real Time Reconstructor
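Start_RTR (triggered by Slope Computer):
$$m^{(i)} = G\left[ B_0\, s^{(i)} + d^{(i-1)} + e^{(i-1)} \right]$$
end_of_RTR = TRUE (checked by Slope Computer)

Terms prepared for the next step:
$$d^{(i)} = B_1\, s^{(i)} + \dots + B_N\, s^{(i-N+1)}$$
$$e^{(i)} = A_1\, m^{(i)} + \dots + A_M\, m^{(i-M+1)}$$

Start_MM (triggered by Slope Computer):
$$c^{(i)} = M\, m^{(i)} + p_0^{(i)}$$
where $p_0$ is the actual mean gap.
end_of_MM = TRUE (checked by Slope Computer)

Start_FF (triggered by Slope Computer):
$$f_F^{(i)} = K\, c^{(i)}$$
Control forces pseudo-integrator (offloading to feedforward):
$$f_{FC}^{(i)} = B_{F0}\, f_C^{(i)} + B_{F1}\, f_C^{(i-1)} + A_{F1}\, f_{FC}^{(i-1)}$$
[Flow chart: feed-forward force update combining $f_F^{(i)}$, $f_F^{(i-1)}$ and $f_{FC}^{(i)}$; update_FF = TRUE; result passed to the control routine]

Can be swapped on the fly: [G] diagonal matrix (global gain); [B] gain matrices; $B_{F0}$, $B_{F1}$, $A_{F1}$ force integrator coefficients. Max N=3, M=4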
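The remark on the Timing slide that the filter order does not affect computational time follows from splitting the IIR filter: only the last tap (the product with the newest slope vector) sits on the critical path, while the contributions of older slopes and older modes are accumulated in advance. The sketch below illustrates that split with plain Python lists. The class name, the sign convention of the feedback terms and the point where the global gain is applied are assumptions.

```python
def matvec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def vadd(*vectors):
    return [sum(terms) for terms in zip(*vectors)]


class Reconstructor:
    """Modal IIR reconstructor with precomputed history terms d and e."""

    def __init__(self, G, B, A):
        # G: diagonal global gain (list), B: [B0, ..., BN], A: [A1, ..., AM]
        self.G, self.B, self.A = G, B, A
        n_modes, n_slopes = len(G), len(B[0][0])
        self.s_hist = [[0.0] * n_slopes for _ in B[1:]]  # s(i-1) ... s(i-N)
        self.m_hist = [[0.0] * n_modes for _ in A]       # m(i-1) ... m(i-M)
        self.d = [0.0] * n_modes
        self.e = [0.0] * n_modes

    def step(self, s):
        # Critical path: one matrix-vector product, whatever N and M are
        last_tap = matvec(self.B[0], s)
        m = [g * v for g, v in zip(self.G, vadd(last_tap, self.d, self.e))]

        # Off the critical path: update histories, prepare d and e
        self.s_hist = ([s] + self.s_hist)[:len(self.B) - 1]
        self.m_hist = ([m] + self.m_hist)[:len(self.A)]
        zero = [0.0] * len(m)
        self.d = vadd(zero, *[matvec(Bj, sj)
                              for Bj, sj in zip(self.B[1:], self.s_hist)])
        self.e = vadd(zero, *[matvec(Aj, mj)
                              for Aj, mj in zip(self.A, self.m_hist)])
        return m


# One mode, one slope: B0 = 1, A1 = 1 gives a pure integrator
integrator = Reconstructor(G=[1.0], B=[[[1.0]]], A=[[[1.0]]])
outputs = [integrator.step([1.0])[0] for _ in range(3)]
```

With a constant unit slope the integrator example ramps up by one per step, as expected for this choice of coefficients.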
Local mirror control
Loop runs @ ~70kHz (capsens sampling @ ~140kHz)
Computational time: 2.4µs (Total time for 4 channels handled by each DSP.)
16% of CPU time used by local control algorithm
Sequence of each control cycle:
Interrupt start (capsens ready, capsens ADC EOC)
Capsens linearization
Position compensator (4/4 IIR on pos. error)
Velocity compensator (4/4 IIR on position)
Total force: $f = f_C + f_F + f_0$
Write to drive DACs
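A minimal sketch of one channel of this loop is given below, together with the CPU budget arithmetic. The direct-form IIR class, the way the two compensator outputs are combined into the control force and all coefficient values are assumptions; only the 2.4 µs and ~70 kHz figures come from the slide.

```python
from collections import deque


class IIR:
    """Direct-form I filter: y[n] = sum(b[j]*x[n-j]) - sum(a[j]*y[n-1-j])."""

    def __init__(self, b, a):
        self.b = list(b)   # b0 ... bN
        self.a = list(a)   # a1 ... aM (a0 = 1 implied)
        self.x = deque([0.0] * len(self.b), maxlen=len(self.b))
        self.y = deque([0.0] * len(self.a), maxlen=len(self.a))

    def step(self, x):
        self.x.appendleft(x)
        y = (sum(bj * xj for bj, xj in zip(self.b, self.x))
             - sum(aj * yj for aj, yj in zip(self.a, self.y)))
        self.y.appendleft(y)
        return y


def control_step(pos_cmd, pos_meas, pos_comp, vel_comp, f_ff, f_bias):
    """One local control cycle for one channel (combination law assumed)."""
    f_control = pos_comp.step(pos_cmd - pos_meas) - vel_comp.step(pos_meas)
    return f_control + f_ff + f_bias   # f = f_C + f_F + f_0


# CPU budget from the slide figures: 2.4 us per cycle at ~70 kHz
cpu_load = 2.4e-6 * 70e3

# Small filter check: y[n] = x[n] + 0.5 * y[n-1]
lowpass = IIR(b=[1.0], a=[-0.5])
response = [lowpass.step(1.0) for _ in range(3)]
```

The computed load of about 17% agrees with the roughly 16% quoted on the slide, given that both the loop rate and the computation time are approximate.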
Timing
Sequence within each frame: SOF → Pixel readout → Slopes computation → RTR and FeedForward (steps 1 to 7)

Step                                            P45       LBT672
Pixel readout (SciMeasure LittleJoe)            850 µs    850 µs
Slopes computation (pipelined)                  ~0 µs     ~0 µs
1: transfer of 1256 slopes to DM2 crates        25 µs     13 µs
2: last tap of RTR IIR filter                   7 µs      7 µs
3: modes vector re-circulation                  8 µs      20 µs
4: commands calculation (modes to pos)          3 µs      35 µs
5: commands vector re-circulation               8 µs      20 µs
6: feed forward currents calculation            3 µs      35 µs
Total, steps 1 to 6                             54 µs     130 µs
7: FFWD currents check (Slope Comp.)            5 µs      35 µs

SC: EEV39 (80x80), no binning
RTR: 3/4 IIR filter, modal control, 672 modes
(order of filter does not affect computational time!)
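Reading the step durations on this slide in order, the six RTR and feed-forward steps add up to the quoted totals for both systems. The short check below makes that arithmetic explicit; the assignment of each duration to a numbered step assumes that the values are listed in step order.

```python
# Durations in microseconds for steps 1 to 6 (order assumed to follow the slide)
STEPS_US = {
    "P45":    [25, 7, 8, 3, 8, 3],
    "LBT672": [13, 7, 20, 35, 20, 35],
}
# Step 7: feed-forward currents check performed by the Slope Computer
CHECK_US = {"P45": 5, "LBT672": 35}
READOUT_US = 850  # pixel readout time quoted on the slide

totals = {name: sum(steps) for name, steps in STEPS_US.items()}
with_check = {name: totals[name] + CHECK_US[name] for name in totals}
fraction_of_readout = {name: totals[name] / READOUT_US for name in totals}

for name in totals:
    print(f"{name}: steps 1-6 = {totals[name]} us, "
          f"with check = {with_check[name]} us, "
          f"{fraction_of_readout[name]:.0%} of the readout time")
```

In both cases the computation after the last pixel is a small fraction of the 850 µs readout, so the readout dominates the frame time.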
Safety (shell…)
The Slope Computer can efficiently perform global checks on position commands, feedforward force commands and modal amplitudes
Checking positions and modal amplitudes has a moderate cost in terms of time (~5µs each): data are re-circulated anyway!
Checking feedforward currents is more time consuming (read, check and write): ~35µs
Policy when a command exceeds its limit (to be tested):
Skip the external loop control cycle entirely (= keep old mirror commands)
Apply only ‘acceptable’ commands, replace exceeding ones with the old commands
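The two candidate policies differ only in how much of the new command vector survives when a limit is exceeded. A minimal sketch, assuming a single symmetric absolute limit per command (the function names and the form of the limit are hypothetical):

```python
def skip_cycle(new_cmds, old_cmds, limit):
    """Policy 1: if any command exceeds the limit, keep all old commands."""
    if any(abs(c) > limit for c in new_cmds):
        return list(old_cmds)
    return list(new_cmds)


def replace_exceeding(new_cmds, old_cmds, limit):
    """Policy 2: apply acceptable commands, keep the old value elsewhere."""
    return [new if abs(new) <= limit else old
            for new, old in zip(new_cmds, old_cmds)]


old = [0.1, 0.2, 0.3]
new = [0.2, 5.0, 0.4]          # second command exceeds the limit
applied_skip = skip_cycle(new, old, limit=1.0)
applied_replace = replace_exceeding(new, old, limit=1.0)
```

The first policy keeps the applied command vector self-consistent at the price of losing a whole cycle; the second keeps the loop running but applies a vector that mixes two time steps.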
Conclusions
• The real time SW, including diagnostic and real-time communication, performs as expected by initial design. We are happy…
• Development completed ! Minor issues:
• Minor refinement and interface debugging with high level interface
• ‘shell safety’ policy still to be decided