GridPP Status David Britton, 3/Sep/08.

Mar 27, 2015


Jason Ross

GridPP Status

David Britton, 3/Sep/08.


Switching on the LHC

The LHC was fully cold by mid-August. This is being followed by continued powering tests, consolidation and machine checkout in preparation for beam. Two short injection tests took place on 9/10 August and 23/24 August.

During the second of these tests, low-intensity pilot bunches were injected at point 8, through LHCb and sector 78, to the collimators at point 7. It all went very well! Remarkable performance from a huge number of systems.

The third test is planned for 5th - 8th September, during which beam will be taken from point 8 to the beam dump at point 6 and, at the same time, a dry run of the whole LHC will be performed in preparation for the start of beam commissioning proper on 10th September.


Switching on the Experiments


Switching on the Grid


CCRC08 in March and May


CCRC08 Conclusions (Jamie Shiers)


WLCG Growth


March 2008 September 2008


GridMap


Most sites appear to be "ready" for September, with storage tokens in place and reasonable SAM test availability. However, GridMap continues to show many sites as degraded: RAL-LCG2 (Tier-1) due to CASTOR; QMUL has a clock problem (and is also bringing a StoRM instance online); ECDF unknown; Brunel's cluster is undergoing maintenance; IC-HEP failed over the weekend, possibly due to a network outage (manpower shortage); Bristol (CMS filled storage) and BHAM have network-related problems. Other sites with recent problems include RHUL, where a disk pool failed; Oxford, after a torque server directory filled up; and Lancaster, after certificate problems. Glasgow saw problems with its CE overnight, which required a globus-gatekeeper restart.


UK Contribution


UKI

2006: 27%

2007: 18%

2008: 15%?

UKI


Tier Centre Contributions


Chart labels: London, NorthGrid, ScotGrid, Tier-1, SouthGrid, GridIreland

2007

NorthGrid: 34%

London: 28%

ScotGrid: 18%

Tier-1: 13%

SouthGrid: 7%


Storage (doh!)


CPU (in)efficiency


Availability and Reliability


Last 6 months: RAL Reliability = 97%

Target reliability for best 8 sites raised from 95% to 97% in June 2008.

July-08: RAL Reliability = 99%

(Target 97%)

July-08: RAL Availability = 98%

(Target 97%)
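
The availability and reliability figures differ only in how scheduled downtime is treated. The following is a minimal sketch, assuming the commonly used WLCG-style definitions (availability = time up / total time; reliability = time up / (total time - scheduled downtime)). The slide does not state these definitions, and the hour counts below are invented purely for illustration; they are not RAL's actual downtime records.

```python
# Sketch only: the definitions are an assumption (WLCG-style), and the
# downtime hours are hypothetical values chosen for illustration.

def availability(up_hours: float, total_hours: float) -> float:
    """Fraction of the whole period during which the site was up."""
    return up_hours / total_hours


def reliability(up_hours: float, total_hours: float,
                scheduled_down_hours: float) -> float:
    """Fraction of the period, excluding scheduled downtime, that the site was up."""
    return up_hours / (total_hours - scheduled_down_hours)


TARGET = 0.97                 # target quoted on the slide

total = 31 * 24.0             # hours in a 31-day month
scheduled_down = 8.0          # hypothetical planned maintenance
unscheduled_down = 7.0        # hypothetical unplanned outage
up = total - scheduled_down - unscheduled_down

avail = availability(up, total)
rel = reliability(up, total, scheduled_down)

print(f"availability = {avail:.1%}, meets target: {avail >= TARGET}")
print(f"reliability  = {rel:.1%}, meets target: {rel >= TARGET}")
```

Under these definitions reliability can never be lower than availability, because scheduled downtime is removed from the denominator only.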


UK Reliability (Steve’s Tests)


Outcome of Funding Crisis

• As anticipated at the time of GridPP20, the programmatic review recommended a 5% cut to GridPP:


• Although ALICE and LHCb were ultimately rescued, the cut was still imposed. However, there was a silver lining:


Funding Cut

• The bottom line was that GridPP had to return £1.2m, which was achieved by:

– Planned and unplanned late starts to a number of GridPP3 posts.

– Reduction in Tier-1 hardware to reflect changes imposed by the programmatic review (LHCb and BaBar).

– Recosting of hardware based on the 2007 procurement.

– A reduction in the budget line for the second tranche of Tier-2 hardware, consistent with the reduction in Tier-1 hardware.

– Reduction in travel and miscellaneous spending.


Tier-1 Hardware

• The FY2007 hardware procurement was brought into production at the end of April:

– 182 Disk servers added 1439 TB of disk to the existing 922 TB

– 113 CPU systems added 3000 KSI2K to the existing 1450 KSI2K

– 3030 tapes (0.45TB each) were added to the existing ~4000.

– 12 T10K tape drives added to the existing 3 (T10K) + 6 (9940).

• This was a major increase in hardware in preparation for data

• FY2008 hardware procurement is now well underway (£2m).
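
The scale of the FY2007 increase can be checked with some simple arithmetic on the figures quoted in the list above. This is a small sketch: the variable names are illustrative, the tape capacity is computed only for the newly added tapes (the slide gives no capacity for the existing ones), and the drive total assumes none of the older drives were retired.

```python
# Figures taken from the slide; names are illustrative only.
disk_existing_tb, disk_added_tb = 922, 1439
cpu_existing_ksi2k, cpu_added_ksi2k = 1450, 3000
tapes_added, tape_capacity_tb = 3030, 0.45
drives_existing, drives_added = 3 + 6, 12

disk_total_tb = disk_existing_tb + disk_added_tb        # 2361 TB
cpu_total_ksi2k = cpu_existing_ksi2k + cpu_added_ksi2k  # 4450 KSI2K
tape_added_tb = tapes_added * tape_capacity_tb          # about 1363.5 TB
drives_total = drives_existing + drives_added           # 21 drives (assumes none retired)

# Growth factors: new total relative to what was there before.
disk_growth = disk_total_tb / disk_existing_tb
cpu_growth = cpu_total_ksi2k / cpu_existing_ksi2k

print(f"Disk: {disk_total_tb} TB total ({disk_growth:.2f}x the previous capacity)")
print(f"CPU:  {cpu_total_ksi2k} KSI2K total ({cpu_growth:.2f}x the previous capacity)")
print(f"Tape: {tape_added_tb:.1f} TB added across {tapes_added} tapes")
print(f"Tape drives: {drives_total}")
```

On these numbers, disk capacity grew to roughly 2.6 times and CPU capacity to roughly 3.1 times their previous levels.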


Current Issues: Castor

• At the last Oversight Committee meeting in Oct 2007, CASTOR was flagged as a key concern.

• Subsequently, various deployments of CASTOR culminating in version 2.1.6, deployed as separate instances for each experiment, established a successful service for CCRC08.

• Unfortunately, various issues and missing functionality required one further upgrade to 2.1.7. Although this was tested, once deployed it fails under heavy load.

• Unfortunately, it is a different sub-version from the one at CERN (due to site differences).

• Unfortunately, it is not possible to roll back to 2.1.6.

• Fortunately, the problem is only a corruption of requests and not of files.

• The CASTOR team has worked extremely hard and this is an undeserved bookend to mark 12 months of work!


Current Issues: CA

• Another current issue is the Certificate Authority.

• The CA was spun off from GridPP to the NGS a few years ago, and we have come to rely on this service.

• “A Series of Unfortunate Events” beset the CA, including the loss of a copy of the root certificate private key and then the Debian security problem, which required a new root key.

• The NGS has responded decisively and has implemented management changes designed to head off future unfortunate events, or to handle them with greater agility.

• This is a timely move to try to ensure the highest level of service, which GridPP welcomes and for which it thanks the NGS.

• GridPP should take note – we need to ensure all our services aspire to the highest levels.


The Three R’s

• Reduce, Re-use, and Recycle (motto for the 21st Century?)

• Robust, Resilient and Reliable (motto for Grid Services?)


Robust: Strongly built or constructed.

Resilient: Able to deal readily with unexpected difficulties.

Reliable: To be certain of it working as expected.

....well, if I were going there, I wouldn’t start from here...


The Elephant in the Room

• Our tidy little world of expert users running well-honed production jobs on tidy data sets is about to be trampled underfoot.

• We must anticipate working with a large number of less expert users, who are likely to want to run a more diverse range of applications in a whole variety of ways.


Red Button Day

• September 10th is the official LHC start.

• All day Radio-4 coverage http://www.bbc.co.uk/radio4/bigbang/

• Events in Westminster and the Scottish Parliament.

• Anticipate local interest (and ensure your site is visible!)


The end: The Beginning

• After 7 years of preparation, this is the moment we’ve been working towards.

• There are challenges ahead.

• Some things will go wrong.

• Communication is vital.
