DCS Test Campaign
(Talk based on a presentation to the TB in December 2004)
Peter Chochula
DCS Workshop, Geneva, February 28, 2005
Purpose of the tests
• DCS is requested to provide information on the system scale
• Test of hardware compatibility (e.g. PCI controllers installed in PCI risers) and component verification before the mass purchase
• Test procedures for the components delivered by sub-detectors
• Test procedures for hardware components before installation
The System Scale
• Two main activity areas of the DCS tests:
  – Performance and stability tests
  – Resource consumption (implications for the system scale and hardware requirements)
• The tests also cover the DCS core computing (database, domain controllers, RAS, …), system management and security
Performance tests with impact on DCS scale planning
• PVSS II is a system which can be distributed and/or scattered across many CPUs
• Two extreme approaches are possible:
  – Group all processes on one machine
    • Even if this configuration runs stably for some systems, there could be a problem when a peak load occurs
  – Dedicate one machine per task (LV, HV, …)
    • Computer resources would surely be wasted
• Defining the optimal balance between the performance and the size of the system requires tests with realistic hardware and data
Who is doing what?
• The ACC follows tests performed by other groups and provides feedback to the sub-detectors
• The ACC performs tests which complement the work of JCOP and other groups. This includes:
  – Tests not performed by external groups
  – Tests for which the external planning is incompatible with our schedule (e.g. OS management)
  – ALICE-specific tests (FERO)
PVSS Performance Tests
• Tests performed within the framework of the SUP
  – Communication between 130 PVSS systems; the influence of heavy load on PVSS has been studied:
    • Alarm absorption, display, cancellation and acknowledgment
    • Data archival
    • Trending performance
    • Influence of heavy network traffic
[Figure: SUP test hierarchy. Data generated by leaf nodes was transported to the top-level machines]
(Some) SUP results
• 130 systems (with ~5 million DPEs defined) were interconnected successfully
• Connection of 100 UIs to a project generating 1000 changes/s has been demonstrated
  – These tests were later repeated by our team in order to understand the remote access mechanism
• Performance tests on realistic systems:
  – 40000 DPEs per machine, equivalent to 5 CAEN crates
  – 1000 alerts generated on a leaf machine in a burst lasting 0.18 s and repeated after a 1 s delay (sketched below)
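The burst profile itself is easy to script for reproduction. Below is a minimal Python sketch of the load pattern only; the raise_alert helper is hypothetical and stands in for whatever mechanism actually triggered an alert on the leaf machine:

```python
import time

def raise_alert(i):
    # Hypothetical stand-in: in the real setup this would trigger an
    # alert-generating datapoint element in the leaf PVSS project.
    pass

BURST_SIZE = 1000   # alerts per burst, as in the SUP tests
BURST_LEN  = 0.18   # seconds over which one burst is spread
PAUSE      = 1.0    # delay between consecutive bursts

def run_bursts(n_bursts):
    gap = BURST_LEN / BURST_SIZE   # ~0.18 ms between alerts
    for _ in range(n_bursts):
        for i in range(BURST_SIZE):
            raise_alert(i)
            time.sleep(gap)  # OS timer granularity may smear the burst slightly
        time.sleep(PAUSE)

run_bursts(10)
```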
Alert absorption by the PVSS system
• The PVSS system was able to absorb all alarms generated on a leaf node (the implied throughput rates are worked out below):
  – Display of 5000 CAME alerts: 26 s
  – Cancellation of 5000 alerts: 45 s
  – Acknowledgment of 10000 alerts: 2 min 20 s
• ETM is implementing a new alarm handling mechanism which includes alarm filtering and summary alarms, and will provide higher performance
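Expressed as throughput, the measured times above correspond to roughly the following rates (a back-of-envelope check, using only the numbers quoted on this slide):

```python
# Throughput implied by the measured absorption times above.
display = 5000 / 26     # ~192 alerts/s displayed
cancel  = 5000 / 45     # ~111 alerts/s cancelled
ack     = 10000 / 140   # ~71 alerts/s acknowledged (2 min 20 s = 140 s)
print(f"display {display:.0f}/s, cancel {cancel:.0f}/s, ack {ack:.0f}/s")
```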
• Archiving was fully efficient during these tests (no data loss)
  – We would like to see the performance with the RDB archive
• Trending performance depends on the queue settings. The evasive PVSS action (protecting the PVSS system from overloading) can disturb the trend, but the data can be recovered from the archive once the avalanche is gone
• An alert avalanche is memory hungry (>200 B per simple DP)
• The ACC participated in additional tests (December 2004) in which the network was flooded
  – No performance drop was observed in the above-mentioned tests
  – A report was published (Paul)
DCS Test Setup
• Test setup installed in the DCS lab
  – Prototypes of the DCS core computers
  – Worker nodes (rental)
[Diagram: a domain controller, a terminal server/router, two database servers and ten worker nodes on the DCS network, connected to the CERN network]
DCS core computers
• In order to operate the DCS, the following core computers will be needed:
  – Windows domain controller (DC)
  – Application gateway for remote access
  – Database server(s) with mass storage
  – DCS infrastructure node (prototype available)
  – Central operator's computer
• Prototypes for all components are available; the database servers need further testing
The DCS Lab
[Photos: backend prototype, pre-installation servers, frontend prototype]
Backend Systems
[Photos: backend prototype and pre-installation servers]
Complementary tests performed by the ACC – Remote access
• Tests performed by the SUP indicated that a large number of UIs can be connected to a running project
  – No interference was observed up to 100 UIs; the tests did not go further
• Our tests tried to simulate external access to a heavily loaded system using W2k3 Terminal Services, observing the effects on:
  – The terminal server
  – The running project
Remark: Terminal Server is the technology recommended by CERN computer security. The tests performed by ALICE were presented to JCOP.
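For illustration, the per-machine load figures shown on the following slides can be collected with a small logger of this kind. This is only a sketch: it assumes the third-party psutil library, whereas the original tests predate it and would have relied on the Windows performance counters:

```python
import time
import psutil  # assumed third-party library for system statistics

def log_load(duration_s=600, period_s=5, path="load.csv"):
    """Periodically record the average CPU load [%] and used memory [kB]."""
    t0 = time.time()
    with open(path, "w") as f:
        f.write("time_s,cpu_percent,mem_used_kB\n")
        while time.time() - t0 < duration_s:
            cpu = psutil.cpu_percent(interval=period_s)  # average over the period
            mem = psutil.virtual_memory().used // 1024
            f.write(f"{time.time() - t0:.0f},{cpu},{mem}\n")

log_load()
```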
Computer Infrastructure for Remote Access Tests
[Diagram: remote users on Windows XP Pro machines on the CERN network connect through a Windows Server 2003 terminal server/router to the DCS private network (192.168.39.0), where the PVSS master project runs on Windows Server 2003]
Computer loads for a large number of remote clients

Terminal server:

#clients | Average CPU load [%] | Mem [kB]
      60 |                 11.2 |  2781282
      55 |                 11.0 |  2788719
      45 |                 13.8 |  2790181
      35 |                 12.0 |  2672998
      25 |                  9.7 |  2067242
      15 |                  7.2 |  1448779
       5 |                  4.2 |   934763
       0 |                  4.9 |   666914

Workstation running the DCS project:

#clients | Average CPU load [%] | Mem [kB]
      60 |                 85.1 |   579666
      55 |                 86.6 |   579829
      45 |                 84.9 |   579690
      35 |                 81.3 |   579405
      25 |                 80.9 |   579384
      15 |                 81.4 |   579463
       5 |                 83.0 |   580003
       0 |                 83.7 |   579691

The master project defined 50000 datapoints and updated 3000 per second; each remote client displayed 50 values at a time.
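The per-session memory cost quoted in the conclusions on the next slide can be read directly off the terminal server table (a quick check; note that the memory curve flattens above ~45 clients, so the linear picture is only approximate):

```python
# Per-session memory on the terminal server, derived from the table above.
mem_60_clients = 2781282   # kB with 60 clients
mem_idle       = 666914    # kB with no clients connected
per_session = (mem_60_clients - mem_idle) / 60
print(f"{per_session:.0f} kB = {per_session / 1024:.1f} MB per session")
# -> ~35239 kB, i.e. the "~35 MB per session" quoted on the next slide
```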
Conclusions on remote access tests
• The prototype performed well
• Memory consumption: ~35 MB per "heavy" session
• CPU usage is reasonable: one Xeon CPU running at 3 GHz can handle the load
• Stability was tested over weeks
Additional possible bottlenecks to be tested
• The SUP tests focused only on the performance of the PVSS system itself
• What is different in a real DCS configuration? Please remember the discussion we had this morning
[Diagram: PVSS managers (EM, UI, CM, VA) connected through OPC clients (OPCc) and OPC servers (OPCs) to the hardware (HW), annotated with "Peak load?" and "Data queuing?"]
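To make the queuing question concrete, a toy single-server queue shows how a burst arriving faster than a downstream component can service it builds up a backlog that drains only after the burst ends. All numbers here are illustrative, not measured:

```python
def peak_backlog(arrival_rate, service_rate, dt=0.001, horizon=5.0):
    """Toy single-server queue: peak backlog for a given arrival profile.

    arrival_rate: function t -> events/s; service_rate: constant events/s.
    """
    backlog = peak = t = 0.0
    while t < horizon:
        backlog = max(backlog + (arrival_rate(t) - service_rate) * dt, 0.0)
        peak = max(peak, backlog)
        t += dt
    return peak

# Illustrative burst: 5000 events/s for 0.2 s against a 1000 events/s consumer.
burst = lambda t: 5000.0 if t < 0.2 else 0.0
print(f"peak backlog = {peak_backlog(burst, 1000.0):.0f} events")
# -> ~800 queued events, which take a further ~0.8 s to drain
```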
OPC tests
• Results from the CAEN OPC tests performed by HMPID and JCOP were made available in December 2004
• The test setup covered the full controls hierarchy
• Time to switch 200 channels: 8 s
• Recommendations:
  – A maximum of 4 fully equipped crates (~800 channels) per computer
• The tests provided a very useful comparison between real hardware and software simulators
See Giacinto's talk for details.
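Scaling the measured switching time to the recommended maximum load per computer gives a rough expectation (a linear extrapolation, which the tests would still need to confirm):

```python
# Extrapolation from the measurement above: 200 channels switched in 8 s.
rate = 200 / 8           # 25 channels/s
full_machine = 4 * 200   # 4 fully equipped crates, ~800 channels
print(f"~{full_machine / rate:.0f} s to switch all channels on one computer")
# -> ~32 s, assuming the per-channel cost stays constant with load
```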
What happens next
• We are preparing an inventory of the hardware (number of channels, crates, etc.)
  – The data are available on the DCS page; detectors are regularly requested to update the information
• More (not only performance) tests are scheduled for early 2005
Additional activities in the lab
• Component compatibility tests
  – Will the PCI cards work together?
  – Are they compatible with the PCI riser cards and rack-mounted chassis?
  – How do they perform in a rack?
• Obvious but non-trivial questions
  – How do we arrange the components in the racks?
  – What about cables, air flow, …?
• And of course – preparing for your software
Component Compatibility Tests
[Photos: PCI CAN controller, NI MXI-2 VME master, 2U PCI riser]
Trying to arrange the components in the racks…
Test schedule (as presented to the TB)
[Gantt chart, December 2004 to June 2005: ConfDB tests; ArchDB tests; full system configuration; OPC stability; alarms; ArchDB connection to IT; FERO – SPD prototype; mixed system environment; patch deployment; network security tests]
An additional delay with respect to this planning (~2 months) has accumulated due to the late delivery of computers, but we need to catch up.
• Input from the sub-detectors is essential
• The most unexpected problems are typically discovered only during operation
  – This experience cannot be obtained in the lab
  – The pre-installation is a very important period for the DCS
• Efficient tests can be performed only with realistic hardware
  – Components are missing
  – We do not have enough manpower to perform tests for all possible combinations of hardware at CERN