Intelligent trigger for Hyper-K with GPUs Akitaka Ariga University of Bern, Switzerland
Jan 18, 2018
Recent changes in design
• Conventional design
– 10 compartments
– Noise rate in each of them is about SK scale
• Recently coming back to SK style
– For cost optimization
– 1 (or a few) large detector(s)
– Longer gate width
– Larger number of PMTs per detector
– Large noise rate to cope with
Noise rate in Hyper-K
• SK -> HK: smaller signal (S) and larger background noise (N)
– Detector size -> larger -> longer gate width: 200 ns -> 500 ns
– # of sensors -> larger -> larger N: 12k -> 20k ~ 80k
– Noise rate -> larger -> larger N: 4 kHz -> 10 kHz
– Photo coverage -> smaller -> smaller S: 40% -> 15% ~ 20%
• SK: 200 ns x 12,000 PMTs x 4 kHz = 10 hits/gate (SK threshold = 33 hits)
• HK: 500 ns x 20,000 PMTs x 10 kHz = 100 hits/gate
Direct impact on low energy neutrino physics, supernova and partially on proton decay
[Figure: signal in SK (40% coverage) and in HK (20% coverage) compared with the noise levels in SK and HK; solar neutrino and supernova regions marked]
Signal / background
• Signal: 6 hits/MeV (SK, 40%), 3 hits/MeV (HK, 20%)
• Noise level: expected number of hits in a gate
– SK: 200 ns x 12,000 PMTs x 4 kHz = 10 hits/gate
– HK: 500 ns x 20,000 PMTs x 10 kHz = 100 hits/gate
• Noise hits will be dominant at low energy (E < 30 MeV)
[Figure: signal and noise levels, with solar neutrino and supernova regions marked]
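The noise-level arithmetic on this slide can be reproduced with a few lines. This is a minimal sketch: the function names are hypothetical, and the only inputs are the gate widths, PMT counts, dark rates and hits/MeV figures quoted above.

```python
def noise_hits_per_gate(gate_ns, n_pmts, dark_rate_khz):
    """Expected number of dark-noise hits inside one trigger gate."""
    gate_s = gate_ns * 1e-9          # gate width in seconds
    rate_hz = dark_rate_khz * 1e3    # dark rate per PMT in Hz
    return gate_s * n_pmts * rate_hz


def energy_where_signal_equals_noise(noise_hits, hits_per_mev):
    """Energy (MeV) at which the mean signal equals the mean noise."""
    return noise_hits / hits_per_mev


# Numbers taken from the slide.
sk_noise = noise_hits_per_gate(200, 12_000, 4)    # about 10 hits/gate
hk_noise = noise_hits_per_gate(500, 20_000, 10)   # about 100 hits/gate

sk_crossover = energy_where_signal_equals_noise(sk_noise, 6.0)  # SK: 6 hits/MeV
hk_crossover = energy_where_signal_equals_noise(hk_noise, 3.0)  # HK: 3 hits/MeV

print(f"SK: {sk_noise:.1f} noise hits/gate, signal = noise at {sk_crossover:.1f} MeV")
print(f"HK: {hk_noise:.1f} noise hits/gate, signal = noise at {hk_crossover:.1f} MeV")
```

With the HK numbers the mean signal only reaches the mean noise at roughly 30 MeV, which is where the "E < 30 MeV" statement comes from.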
Detectable energy
• Detectable: Signal + Noise > Noise + noise fluctuation
• The noise issue is essential for access to low-energy physics below 20 MeV, where most supernova and solar neutrino signals and some proton decay signals exist.
[Figure: signal + noise in SK and in HK compared with noise + 5σ fluctuation (= realistic threshold); the region above the threshold is detectable]
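The detectability condition can be turned into a rough energy threshold. The sketch below assumes the noise count in a gate is Poisson distributed, so that one standard deviation is the square root of the mean. That assumption and the function name are illustrative and not taken from the slides. The noise levels (10 and 100 hits/gate) and hits/MeV values are the ones quoted earlier.

```python
import math


def threshold_energy_mev(noise_hits, hits_per_mev, n_sigma=5.0):
    """Energy at which the signal exceeds n_sigma noise fluctuations.

    Signal + Noise > Noise + n_sigma * sqrt(Noise)
    =>  Signal > n_sigma * sqrt(Noise)
    Assumption: Poisson noise, so sigma = sqrt(mean noise hits).
    """
    required_signal_hits = n_sigma * math.sqrt(noise_hits)
    return required_signal_hits / hits_per_mev


sk_threshold = threshold_energy_mev(noise_hits=10.0, hits_per_mev=6.0)
hk_threshold = threshold_energy_mev(noise_hits=100.0, hits_per_mev=3.0)

print(f"SK 5-sigma threshold: {sk_threshold:.1f} MeV")
print(f"HK 5-sigma threshold: {hk_threshold:.1f} MeV")
```

Under these assumptions the HK threshold with a simple hit count lands in the 15 to 20 MeV range, several times higher than in SK.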
Need to improve trigger quality
• Be intelligent!
– Use the 4D information of hits, $(x, y, z, t)$
• Many ideas
– Exploit TOF information to narrow the gate width (see next page)
– Vertex calculation: 2 hits define a hyperboloid surface, 4 hits uniquely identify the vertex position
– Ring pattern fitting
[Figure: hits A, B, C, each with $(x, y, z, t)$; one hyperboloid defined by A and B, another defined by B and C]
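The hyperboloid statement can be written out explicitly. This is a sketch under the assumption that light travels in straight lines at a constant speed $c_w$ in water; the symbols $\vec{r}$, $\vec{r}_A$, $\vec{r}_B$ and $c_w$ are introduced here for illustration. For a vertex at $\vec{r}$ emitting at time $t_0$, the hit times are $t_A = t_0 + |\vec{r}-\vec{r}_A|/c_w$ and $t_B = t_0 + |\vec{r}-\vec{r}_B|/c_w$, so the unknown $t_0$ cancels in the difference:

```latex
\bigl|\vec{r}-\vec{r}_A\bigr| - \bigl|\vec{r}-\vec{r}_B\bigr| = c_w\,(t_A - t_B)
```

This is one sheet of a two-sheeted hyperboloid with foci at the two PMT positions. Each additional hit adds one more such surface, which is why four hits are enough to fix the three vertex coordinates.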
One of many ideas: Sub-volume triggering
• The largest factor in the noise increase is the gate width, which is long because the detector is large -> let's make it small.
• Sub-volume triggering
– Divide the detector into several sub-volumes
– In each sub-volume, perform inversion of hit-time using the distance from the hit positions
– Smaller gate width, canceling the detector size increase
• Large computing power required
– Triggering in O(100) sub-volumes
[Figure: a hit A = $(x, y, z, t)$ is projected to the center of a sub-volume, V = $(x_0, y_0, z_0)$, giving the projected params A' = $(x_0, y_0, z_0, t - |AV|/c)$; time axes t and t']
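The hit-time inversion for one sub-volume can be sketched as follows. This is an illustration only, not the implementation used for the slides: the function names, the speed of light in water (c/n with n about 1.33) and the toy hit positions are all assumptions.

```python
import math
from collections import Counter

# Assumption: speed of light in water, c / n with n ~ 1.33, in m/ns.
C_WATER_M_PER_NS = 0.2254


def back_projected_times(hits, center):
    """Return t' = t - |AV| / c for each hit A = (x, y, z, t).

    `center` is V = (x0, y0, z0), the center of the sub-volume.
    """
    return [t - math.dist((x, y, z), center) / C_WATER_M_PER_NS
            for x, y, z, t in hits]


def max_hits_in_bin(times, bin_ns=10.0):
    """Largest number of hits falling into a single time bin."""
    counts = Counter(math.floor(t / bin_ns) for t in times)
    return max(counts.values()) if counts else 0


# Toy event: light emitted at t = 1005 ns from (0, 0, 0), seen by four PMTs.
vertex, t_emit = (0.0, 0.0, 0.0), 1005.0
pmts = [(10.0, 0.0, 0.0), (0.0, 20.0, 0.0), (0.0, 0.0, 30.0), (-34.5, 0.0, 0.0)]
hits = [(x, y, z, t_emit + math.dist((x, y, z), vertex) / C_WATER_M_PER_NS)
        for x, y, z in pmts]

raw_peak = max_hits_in_bin([t for _, _, _, t in hits])
projected_peak = max_hits_in_bin(back_projected_times(hits, vertex))
print(raw_peak, projected_peak)
```

In the toy event the raw hit times are spread over more than 100 ns, while the back-projected times for the correct sub-volume fall into a single 10 ns bin. Noise hits are not correlated with any vertex, so they stay spread out, which is what allows the narrower gate.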
Intelligent trigger with GPUs
• To profit from 4D data, more computing power is needed
• The GPU is an ideal solution: expertise in LHEP-Bern
– GPU: Graphics Processing Unit
– Parallel processing with O(1000) processing cores
– Triggering code can be highly parallelized
Parallel processing
• GPUs allow parallel processing with thousands of processing cores.
[Figure: serial processing on a CPU versus parallel processing on a GPU (task 1, task 2, ...)]
High computing power
• 1 full tower of a CPU-based computing cluster = 5-10 TFLOPS
• NVIDIA GeForce Titan Z = 8 TFLOPS
• FLOPS = floating-point operations per second
Experience of LHEP-Bern 1: High speed emulsion reconstruction
• Custom-made real-time scanning microscope
• CMOS camera: 0.5 – 2.4 Gbyte/s
• (Real-time) 3D track reconstruction with GPUs: x 90 faster
• GeForce GTX TITAN x 3: 2688 cores, 6 GB memory, 4.5 TFLOPS each
• JINST 9 P04002 (2014), GTC2014, GPU in high energy physics (2014)
Experience of LHEP-Bern 2: Reconstruction of LAr-TPC
• LAr detector (ArgonTube at LHEP-Bern)
• Hough transform with GPU
• x 50 faster processing achieved
Possible hardware for HK
• Data will be distributed to several nodes equipped with GPUs
• O(100) processes run with O(100,000) GPU cores
• 4U processing server: 2 CPUs x 10 cores, 8 GPUs (24,000 cores)
[Diagram: several processing machines, each with CPUs and GPUs; data rate label 2.5 Gbyte/s]
Improve WIT?
• One of the bottlenecks of the current algorithm is the number of combinations.
– A vertex is calculated from 4 hits
– $_nC_4$ increases quickly, like $n^4$
– $_{10}C_4 = 210$ (SK level), $_{100}C_4 = 3.9 \times 10^6$ (HK level)
– (According to Michael Smy, a hit selection can reduce $n^4$ -> $n^3$, which is implemented in WIT)
• A quick comparison of processing time was made.
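The growth of the number of 4-hit combinations is easy to check with the standard library. This is a small sketch; the function name is hypothetical, and the hit counts are the noise levels per gate quoted for SK and HK.

```python
import math


def n_four_hit_combinations(n_hits):
    """Number of distinct 4-hit combinations, nC4."""
    return math.comb(n_hits, 4)


sk_level = n_four_hit_combinations(10)    # about 10 hits per gate
hk_level = n_four_hit_combinations(100)   # about 100 hits per gate

for n in (10, 30, 100):
    # For large n, nC4 approaches n**4 / 24.
    print(n, n_four_hit_combinations(n), n**4 / 24)
```

A factor 10 in the number of hits costs roughly a factor 10^4 in combinations, which is why the combinatorics dominate the processing time at the HK noise level.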
Vertexing by 4-hit combinations
• Using WCSim-simulated data provided by Yano
– H 100 m, D 69 m, electrons start from the center
– Only signal hits are used, 5000 events
• Code implemented on CPU and GPU
• An equivalent result is, of course, obtained on the GPU
[Figure: reconstructed vertex distributions, CPU and GPU panels. Vertices are reconstructed at the center of the detector (0,0,0), as they should be.]
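One way to compute a vertex from a single 4-hit combination is sketched below. This is not the code used for the comparison above, only an illustration under stated assumptions: straight-line propagation at a constant speed (c/n with n about 1.33), exact hit times, and hypothetical function names. Subtracting the equation of the first hit from the other three gives a linear system for the position as a function of the emission time; inserting that back into the first equation leaves a quadratic in the emission time.

```python
import math

C_WATER_M_PER_NS = 0.2254  # assumption: c / n with n ~ 1.33


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("degenerate hit geometry")
    return [det3([[rhs[r] if col == k else m[r][col] for col in range(3)]
                  for r in range(3)]) / d for k in range(3)]


def vertices_from_four_hits(hits, c=C_WATER_M_PER_NS):
    """Return candidate (x, y, z, t_emit) solutions for four hits (x, y, z, t)."""
    x0, y0, z0, t0 = hits[0]
    m, a, b = [], [], []
    for x, y, z, t in hits[1:]:
        # 2 (p_i - p_0) . v = a_i + b_i * t_emit
        m.append([2 * (x - x0), 2 * (y - y0), 2 * (z - z0)])
        a.append(x * x + y * y + z * z - (x0 * x0 + y0 * y0 + z0 * z0)
                 - c * c * (t * t - t0 * t0))
        b.append(2 * c * c * (t - t0))
    u, w = solve3(m, a), solve3(m, b)          # v = u + w * t_emit
    d = [u[0] - x0, u[1] - y0, u[2] - z0]
    # |d + w T|^2 = c^2 (t0 - T)^2  ->  qa T^2 + qb T + qc = 0
    qa = sum(wi * wi for wi in w) - c * c
    qb = 2 * (sum(di * wi for di, wi in zip(d, w)) + c * c * t0)
    qc = sum(di * di for di in d) - c * c * t0 * t0
    if abs(qa) < 1e-15:
        roots = [-qc / qb] if qb != 0 else []
    else:
        disc = qb * qb - 4 * qa * qc
        if disc < 0:
            return []
        roots = [(-qb + s * math.sqrt(disc)) / (2 * qa) for s in (1, -1)]
    t_first = min(h[3] for h in hits)
    # Keep only causal solutions: emission before the earliest hit.
    return [(u[0] + w[0] * T, u[1] + w[1] * T, u[2] + w[2] * T, T)
            for T in roots if T <= t_first + 1e-9]


# Toy check with a known vertex and emission time.
true_vertex, t_emit = (5.0, -3.0, 8.0), 50.0
pmts = [(30.0, 0.0, -40.0), (-30.0, 0.0, 10.0), (0.0, 30.0, 40.0), (0.0, -30.0, -10.0)]
hits = [(x, y, z, t_emit + math.dist((x, y, z), true_vertex) / C_WATER_M_PER_NS)
        for x, y, z in pmts]
solutions = vertices_from_four_hits(hits)
print(solutions)
```

The quadratic can have two causal roots, so a real trigger would still need to pick between candidates, for example by checking against further hits. Each combination is independent of the others, which is what makes this step easy to spread over many GPU threads.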
First comparison in speed
• Basic optimization done for the CPU code
• Factor 35 faster with GPU
• In my experience, an additional factor of 2-5 can be gained with further optimization.
[Plot: processing time, CPU versus GPU, for energies of 3, 5, 7, 9, 11, 13, 15 and 20 MeV. 15 MeV: about 500,000 combinations / event; 20 MeV: about 1.6 million combinations / event. Labels: cpu 788 sec, gpu 22.71 sec]
Sub-volume triggering
• In each sub-volume, perform inversion of hit-time using the distance from the hit positions
– Smaller gate width, canceling the detector size increase
• Test with simulated data
– H 100 m, D 50 m
– Electron emitted from the center in the x direction
[Figure: a hit A = $(x, y, z, t)$ is projected to the center of a sub-volume, V = $(x_0, y_0, z_0)$, giving the projected params A' = $(x_0, y_0, z_0, t - |AV|/c)$; time axes t and t'; detector axes x, y, z with origin (0,0,0)]
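Sub-volume triggering
• Time back-calculation to predefined vertices along x
• 100 m height, 69 m diameter, 19k PMTs, 9 MeV
[Figure: a hit A = $(x, y, z, t)$ is projected to a predefined vertex V = $(x_0, y_0, z_0)$, giving the projected params A' = $(x_0, y_0, z_0, t - |AV|/c)$; detector axes x, y, z]
[Plots: one time histogram per predefined vertex, including the center; horizontal axis = [500, 1500] ns, 10 ns binning; blue histogram = event related]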
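Sub-volume triggering
• Time back-calculation to predefined vertices along Z
• 100 m height, 69 m diameter, 19k PMTs, 9 MeV
[Figure: a hit A = $(x, y, z, t)$ is projected to a predefined vertex V = $(x_0, y_0, z_0)$, giving the projected params A' = $(x_0, y_0, z_0, t - |AV|/c)$; detector axes x, y, z]
[Plots: one time histogram per predefined vertex, including the center; horizontal axis = [500, 1500] ns, 10 ns binning; blue histogram = event related]
• Compared with placing the vertices along the axis direction, the peak is more localized. The region with high values is ellipsoidal, so tracking is possible, and requiring several contiguous sub-volumes would also reduce the background.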
Signal/BG Separation
• Predefine vertices every 5 m in the detector volume (~3000 vertices)
• Find the vertex which has the highest entry in one of its time bins
• 9 MeV electron from the center x 5000 events
[Plots: noise only versus noise + signal. Left: simply counting # of hits in a 500 ns gate width, σ = 2.7. Right: number of hits in 10 ns in the most probable predefined vertex (time-space), σ = 7.0]
• Numerically the separation improves from 2.7 to 7 sigma, but it is not as good as expected. The distributions are not Gaussian in the first place. The cause is that, even for noise only, taking the maximum over 3000 vertices gives a high value through chance coincidence. Needs improvement.
• Speed
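The chance-coincidence effect of taking a maximum over many vertices can be estimated with a simple trials-factor calculation. This is a rough sketch under loudly stated assumptions: Poisson noise with a mean of 2 hits per 10 ns bin (10 ns x 20,000 PMTs x 10 kHz), 3000 vertices times 100 time bins treated as independent trials (they are not, since they share the same hits), and hypothetical function names.

```python
import math


def poisson_tail(mu, k, n_terms=200):
    """P(X >= k) for X ~ Poisson(mu), summed term by term from k upward."""
    return sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1))
               for i in range(k, k + n_terms))


def min_count_for_rare_fake(mu, n_trials, max_expected_fakes=1.0):
    """Smallest count k for which fewer than max_expected_fakes of the
    n_trials noise-only bins are expected to reach k hits."""
    k = 0
    while n_trials * poisson_tail(mu, k) >= max_expected_fakes:
        k += 1
    return k


mean_noise_per_bin = 2.0          # assumption: 10 ns x 20,000 PMTs x 10 kHz
n_trials = 3000 * 100             # assumption: vertices x time bins, independent

single_bin = min_count_for_rare_fake(mean_noise_per_bin, 1)
all_bins = min_count_for_rare_fake(mean_noise_per_bin, n_trials)
print(single_bin, all_bins)
```

Even though the mean noise in a 10 ns bin is small, the highest bin among several hundred thousand trials sits well above the mean, so the noise-only distribution of the maximum is shifted upward and is not Gaussian.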
Summary
• The noise rate is a crucial issue for low-energy neutrinos, supernovae and proton decay
• We are investigating an intelligent trigger that exploits the 4D data from the detector
• Larger computing power, of >O(100), could be necessary -> the use of GPUs is a promising solution