Peter Chochula, DCS Workshop Geneva, February 28, 2005 DCS Test campaign (Talk based on presentation to TB in December 2004) Peter Chochula





Transcript
Page 1:

Peter Chochula, DCS Workshop Geneva, February 28, 2005

DCS Test campaign (Talk based on presentation to the TB in December 2004)

Peter Chochula

Page 2:

Purpose of the tests

• DCS is requested to provide information on system scale
• Test of hardware compatibility (e.g. PCI controllers installed in PCI risers), component verification before the mass purchase
• Test procedures for the components delivered by sub-detectors
• Test procedures for hardware components before the installation

Page 3:

The System Scale

• Two main activity areas of DCS tests:
  – Performance and stability tests
  – Resource consumption (implications on system scale and hardware requirements)
• The tests also cover the DCS core computing (database, domain controllers, RAS…), system management and security

Page 4:

Performance tests with impact on DCS scale planning

• PVSS II is a system which can be distributed and/or scattered over many CPUs
• Two extreme approaches are possible:
  – Group all processes on one machine
    • Even if this configuration runs stably for some systems, there could be a problem when a peak load occurs
  – Dedicate one machine per task (LV, HV…)
    • Computer resources would surely be wasted
• Defining the optimal balance between the performance and the size of the system requires tests with realistic hardware and data

Page 5:

• Who is doing what?
  – ACC follows tests performed by other groups and provides feedback to sub-detectors
  – ACC performs tests which complement the work of JCOP and other groups. This includes:
    • Tests not performed by external groups
    • Tests for which the external planning is incompatible with our schedule (e.g. OS management)
    • ALICE-specific tests (FERO)

Page 6:

PVSS Performance Tests

• Tests performed in the framework of the SUP
  – Communication between 130 PVSS systems. The influence of heavy load on PVSS has been studied:
    • Alarm absorption, display, cancellation and acknowledgment
    • Data archival
    • Trending performance
    • Influence of heavy network traffic

[Figure: SUP test hierarchy – data generated by leaf nodes was transported to the top-level machines]

Page 7:

(Some) SUP results

• 130 systems (with ~5 million DPEs defined) interconnected successfully
• Connection of 100 UIs to a project generating 1000 changes/s has been demonstrated
  – These tests were later repeated by our team in order to understand the remote access mechanism
• Performance tests on realistic systems
  – 40000 DPEs/machine, equivalent to 5 CAEN crates
  – 1000 alerts generated on a leaf machine in a burst lasting 0.18 s and repeated after a 1 s delay
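The burst figures above translate into effective alert rates with a little arithmetic (a sketch: the 0.18 s burst length and 1 s repeat delay are the values quoted on the slide; modelling each cycle as burst + delay is an assumption):

```python
# Alert burst parameters quoted on the slide
alerts_per_burst = 1000
burst_length_s = 0.18
delay_s = 1.0  # pause between bursts

# Peak rate while the burst is active
peak_rate = alerts_per_burst / burst_length_s  # ~5556 alerts/s

# Sustained rate, assuming each cycle is burst + delay
cycle_s = burst_length_s + delay_s
sustained_rate = alerts_per_burst / cycle_s    # ~847 alerts/s

print(f"peak: {peak_rate:.0f}/s, sustained: {sustained_rate:.0f}/s")
```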

Page 8:

Alert absorption by the PVSS system

• The PVSS system was able to absorb all alarms generated on a leaf node
  – Display of 5000 CAME alerts: 26 s
  – Cancellation of 5000 alerts: 45 s
  – Acknowledgment of 10000 alerts: 2 min 20 s
• ETM is implementing a new alarm handling mechanism which includes alarm filtering and summary alarms, and will provide higher performance
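For comparison with other systems, the absorption times above can be converted into rough per-alert throughput (a sketch using only the counts and times quoted on the slide):

```python
# Times quoted on the slide, converted to alerts per second
display_rate = 5000 / 26          # ~192 alerts/s displayed
cancel_rate = 5000 / 45           # ~111 alerts/s cancelled
ack_rate = 10000 / (2 * 60 + 20)  # ~71 alerts/s acknowledged

print(f"display {display_rate:.0f}/s, cancel {cancel_rate:.0f}/s, ack {ack_rate:.0f}/s")
```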

Page 9:

• The archiving was fully efficient during these tests (no data loss)
  – We would like to see the performance with the RDB archive
• Trending performance depends on queue settings. An evasive PVSS action (protecting the PVSS system from overloading) can disturb the trend, but data can be recovered from the archive once the avalanche is gone
• An alert avalanche is memory hungry (>200 B per simple DP)
• The ACC participated in additional tests (December 2004), where the network was flooded
  – No performance drop was observed in the above-mentioned tests
  – A report was published (Paul)

Page 10:

DCS Test Setup

• Test setup installed in the DCS lab
  – Prototypes of DCS core computers
  – Worker nodes (rental)

[Diagram: domain controller, terminal server, router and two database servers connected to the CERN network, serving ten worker nodes]

Page 11:

DCS core computers

• In order to operate the DCS, the following core computers will be needed:
  – Windows DC
  – Application Gateway for remote access
  – Database server(s) with mass storage
  – DCS infrastructure node (prototype available)
  – Central operator's computer

Prototypes for all components are available. Database servers need further testing

Page 12:

The DCS Lab

Backend Prototype

Pre-installation Servers

Frontend prototype

Page 13:

Backend Systems

Backend Prototype

Pre-installation Servers

Page 14:

Complementary tests performed by the ACC – remote access

• Tests performed by the SUP indicated that a large number of UIs can be connected to a running project – no interference was observed up to 100 UIs – the tests did not go further
• Our tests tried to simulate external access, using W2k3 Terminal Services, to a heavily loaded system and to observe the effects on
  – the terminal server
  – the running project

Remark: Terminal Server is the technology recommended by CERN security. Tests performed by ALICE were presented to JCOP

Page 15:

Computer Infrastructure for Remote Access Tests

[Diagram: a Windows Server 2003 terminal server and a Windows Server 2003 router bridge the CERN network and the DCS private network 192.168.39.0; Windows XP Pro remote users connect through them to the PVSS master project]

Page 16:

Computer loads for a large number of remote clients

Terminal Server:

  #clients | Average CPU load [%] | Mem [kB]
  ---------+----------------------+----------
        60 |                 11.2 |  2781282
        55 |                 11.0 |  2788719
        45 |                 13.8 |  2790181
        35 |                 12.0 |  2672998
        25 |                  9.7 |  2067242
        15 |                  7.2 |  1448779
         5 |                  4.2 |   934763
         0 |                  4.9 |   666914

Workstation running the DCS project:

  #clients | Average CPU load [%] | Mem [kB]
  ---------+----------------------+----------
        60 |                 85.1 |   579666
        55 |                 86.6 |   579829
        45 |                 84.9 |   579690
        35 |                 81.3 |   579405
        25 |                 80.9 |   579384
        15 |                 81.4 |   579463
         5 |                 83.0 |   580003
         0 |                 83.7 |   579691

The master project generated 50000 datapoints and updated 3000/s. Each remote client displayed 50 values at a time
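One way to read the terminal-server measurements is to fit the per-client memory cost over the region where memory still grows with the number of clients (roughly 0–35 here; above that it appears to saturate). A least-squares sketch using only the numbers from the table – note that the slope includes any per-session OS overhead, so it is an upper bound on the pure PVSS UI cost:

```python
# Terminal-server rows from the table, unsaturated region only
clients = [0, 5, 15, 25, 35]
mem_kb = [666914, 934763, 1448779, 2067242, 2672998]

# Ordinary least-squares slope: kB of memory per additional client
n = len(clients)
mx = sum(clients) / n
my = sum(mem_kb) / n
slope_kb = sum((x - mx) * (y - my) for x, y in zip(clients, mem_kb)) / \
           sum((x - mx) ** 2 for x in clients)

print(f"~{slope_kb / 1024:.0f} MB per additional client session")
```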

Page 17:

Conclusions on remote access tests

• The prototype performed well
• Memory consumption ~35 MB per "heavy" session
• CPU usage is reasonable; one Xeon CPU running at 3 GHz can handle the load
• Stability tested over weeks

Page 18:

Additional possible bottlenecks to be tested

• SUP tests were focused only on the performance of the PVSS system
• What is different in a real DCS configuration? Please remember the discussion which we had this morning

[Diagram: UI and EM above CM and VA, fanning out to three OPC client – OPC server – hardware chains; annotated "Peak load? Data queuing?"]

Page 19:

OPC tests

• Results from the CAEN OPC tests performed by HMPID and JCOP were made available in December 2004
• The test setup covered the full controls hierarchy
• Time to switch 200 channels: 8 s
• Recommendations:
  – Maximum of 4 fully equipped crates (~800 channels) per computer
• The tests provided a very useful comparison between real hardware and software simulators
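The switching figure above gives a feel for the ramp time of a fully loaded computer (a sketch: the 200-channel/8 s measurement and the 800-channel recommendation are from the slide; the linear extrapolation is an assumption):

```python
# Measured: 200 channels switched in 8 s
channels_measured = 200
time_measured_s = 8
rate = channels_measured / time_measured_s  # 25 channels/s

# Extrapolate linearly to the recommended maximum per computer
channels_max = 800                          # ~4 fully equipped CAEN crates
estimated_time_s = channels_max / rate      # 32 s

print(f"{rate:.0f} channels/s -> ~{estimated_time_s:.0f} s for {channels_max} channels")
```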

See Giacinto's talk for details

Page 20:

What happens next

• We are preparing an inventory of the hardware (number of channels, crates, etc.)
  – Data are available on the DCS page; detectors are regularly requested to update the information
• More tests (not only performance) are scheduled for early 2005

Page 21:

Additional activities in the lab

• Component compatibility tests
  – Will the PCI cards work together?
  – Are they compatible with the PCI riser cards and rack-mounted chassis?
  – How do they perform in a rack?
• Obvious but non-trivial questions
  – How do we arrange components in the racks?
  – What about cables, air-flow, …
• And of course – preparing for your software

Page 22:

Component Compatibility Tests

PCI CAN Controller

NI-MXI2VME Master

2U PCI Riser

Page 23:

Trying to arrange the components in the racks…

Page 24:

Test schedule (as presented to the TB)

Timeline: 12/04 – 6/05

• ConfDB tests
• ArchDB tests
• Full system configuration
• OPC stability
• Alarms
• ArchDB – connection to IT
• FERO – SPD prototype
• Mixed system environment
• Patch deployment
• Network security tests

An additional delay with respect to this planning (~2 months) has accumulated due to the late delivery of computers, but we need to catch up

Page 25:

• Input from sub-detectors is essential
• Most unexpected problems are typically discovered only during operation
  – This experience cannot be obtained in the lab
  – Pre-installation is a very important period for the DCS
• Efficient tests can be performed only with realistic hardware
  – Components are missing
  – We do not have enough manpower to perform tests for all possible combinations of hardware at CERN