Friday 25 March 2022. CA update procedure. Hélène Cordier, IN2P3/CNRS Computing Centre, Lyon, France

Monday 12 October 2015. CA update procedure. Hélène Cordier, IN2P3/CNRS Computing Centre, Lyon, France.

Dec 31, 2015


Wednesday 19 April 2023

CA update procedure

Hélène Cordier

IN2P3/CNRS Computing Centre, Lyon, France


Contents

• Context
• Rationale
• Feedback and Suggestions
• Conclusions


Context

http://goc.grid.sinica.edu.tw/gocwiki/Procedure_for_new_CA_release

1. D. Groep opens a ticket in GGUS.
2. SA3 creates a test repository with the new CA rpms.
3. SAM team makes a new version of the CA sensor.
4. CERN-PPS upgrades to the new CA rpms.
5. SAM validation instance runs over CERN-PPS to test the new sensor.
6. The new sensor is put into the SAM production instance.
7. SA3 updates the lcg-CA production repository.
8. Broadcast to sites.
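The ordering of the procedure above can be sketched as a small checklist that refuses to run a step while an earlier one is still pending. This is only an illustration: `ReleaseChecklist` is a hypothetical helper, not an existing tool, and the numbering 1 to 8 is inferred from the later references to "step 7" and "step 8".

```python
from dataclasses import dataclass, field

# Step descriptions are taken from the slide; the numbering is an assumption.
STEPS = [
    "D. Groep opens a ticket in GGUS",
    "SA3 creates a test repository with the new CA rpms",
    "SAM team makes a new version of the CA sensor",
    "CERN-PPS upgrades to the new CA rpms",
    "SAM validation instance runs over CERN-PPS to test the new sensor",
    "The new sensor is put into the SAM production instance",
    "SA3 updates the lcg-CA production repository",
    "Broadcast to sites",
]


@dataclass
class ReleaseChecklist:
    """Hypothetical helper: records completed steps and refuses to skip ahead."""

    done: list = field(default_factory=list)

    def complete(self, step: int) -> None:
        # The next allowed step is always the one right after the last completed.
        expected = len(self.done) + 1
        if step != expected:
            raise RuntimeError(
                f"step {step} attempted while step {expected} is still pending"
            )
        self.done.append(step)

    @property
    def finished(self) -> bool:
        # True only once every step, including the broadcast, is recorded.
        return len(self.done) == len(STEPS)
```

With such a guard, executing step 5 while step 3 is still blocked would be rejected instead of going unnoticed.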


Context

Friday 01/02/08

https://gus.fzk.de/ws/ticket_info.php?ticket=31993&from=search

« Due to miscommunication step 5 was already executed while the rest of the procedure was blocked on step 3, which was finally done Monday at the end of the afternoon, but then step 6 could not be executed due to a problem with AFS permissions, which should get solved on Tuesday morning... »

Earlier:

« Release of CA1.17-1: sites are complaining that they are in a "warning status" without being told that there is a new set of rpms, i.e. *without* being told when the rpms are put in the repository. In other words, step 8 of the process does not seem to be properly followed up. In this specific instance, sites were appearing in "warning state" while the new CA version was being updated. The associated GGUS ticket was closed, with SAM tests and rpms released, without the relevant "broadcasts" being published. »

The topic came up again at the EGEE'07 ROC managers meeting and again on the ROC list.


Rationale

A round-table discussion was recently held with the SAM, integration, deployment and security teams involved, and no major objections to the current process emerged.

Apart from the SAM test modification request recorded below, the recommendations to be examined by the ROC managers mainly concern mechanisms surrounding the CA upgrade procedure. They aim to ensure that the procedure is followed smoothly (and rapidly, i.e. in less than a day) through to the very last step, and to help meet the sites' need for communication about the process.


Feedback from mailing lists

SAM CA tests

SAM – GGUS ticket #32204 – How about SAM test modifications?

Comes from Stephen Burke's remark of Feb 4th on rollout, which also echoes several others: M. Litmaath, Jeremy Coles, COD meetings.

« Now, none of this would have been a problem if the CA sensor only required a _minimum_ version instead of an exact version. Maybe I am missing some technical detail here, but I argue that the test should be changed. »

M. Litmaath

LHC Computer Grid - Rollout [mailto:[email protected]] On Behalf Of Maarten Litmaath said:
> Yes, that is a possibility, but it gives more work to the SAM team.
As opposed to causing trouble for 250+ sites ... anyway, the current procedure manages to have a switch from generating warnings to errors, is it that hard to have two switches - nothing -> warning -> error?
Stephen

No specific answer has ever been given except a non-warning "one-day grace period" (Nov 19th 2007).

Last update on ticket:
From SAM team: now proposes procedure modifications; no improvements in the tests so far.
From H. Cordier: Please take into account the SAM test improvement suggestions and help that have been proposed on rollout, namely from Eygene Ryabinkin: [email protected].
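The two suggestions quoted above (require a minimum version rather than an exact one, and switch "nothing -> warning -> error") could look roughly like the sketch below. It is not the SAM sensor's actual code: `parse_version` and `ca_status` are hypothetical names, and the length of the warning window is an assumption. Only the one-day grace period comes from the ticket.

```python
from datetime import date


def parse_version(text: str) -> tuple:
    """Turn a version such as '1.17-1' into (1, 17, 1) for numeric comparison."""
    return tuple(int(part) for part in text.replace("-", ".").split("."))


def ca_status(installed: str, required: str, released: date, today: date,
              grace_days: int = 1, warning_days: int = 7) -> str:
    """Return OK, WARNING or ERROR for a site's installed CA rpm version.

    grace_days=1 mirrors the "one-day grace period" mentioned in the ticket;
    warning_days=7 is purely an illustrative assumption.
    """
    # Minimum-version check: anything at or above the required release passes.
    if parse_version(installed) >= parse_version(required):
        return "OK"
    age = (today - released).days
    if age < grace_days:
        return "OK"        # "nothing": the release is too recent to complain
    if age < grace_days + warning_days:
        return "WARNING"
    return "ERROR"
```

For instance, `ca_status("1.18-1", "1.17-1", ...)` returns `"OK"` here, whereas an exact-version check would flag the site.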


Suggestions / feedback from the round table 1/2

The procedure is followed through to the end

1. Start the whole process on Mondays only: proposed by SAM yesterday (#32204). Wednesdays 12:00 seem to be more reasonable, as urgent updates need to take place and in an emergency the whole process should not take longer than a day.

2. Involve the OSCT-DC in the ticket so that they close the GGUS ticket at the last step of the process, in order to decouple the people doing the procedure from the people verifying the process, making sure the repositories are updated and the broadcast is done.

The OSCT validates the need for an external observer, and its involvement is validated by the CERN teams.


Suggestions / feedback from the round table 2/2

Improve communication towards sites

3. Introduce a CA release process indicator, to allow sites to follow the process, namely when a release is about to be prepared.

Site admins who wish to be informed could simply subscribe to an RSS feed that tracks status changes of this CA release process indicator. They would then know which stage the new CA release process is in.

The integration team (J. Flammer) supports this idea and mentions that the GGUS ticket number could be published together with the process indicator. This small page could be developed within the integration team (TBC).
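As a rough idea of what such a feed entry might contain, the sketch below builds a one-item RSS 2.0 document carrying the indicator status and the GGUS ticket number. The function name, the feed title and the page URL are all placeholders; no such page exists yet according to the slide.

```python
import xml.etree.ElementTree as ET


def indicator_feed(status: str, ticket: str, page_url: str) -> str:
    """Build a minimal RSS 2.0 feed with one item for the current status.

    All names and texts here are illustrative assumptions.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "CA release process indicator"
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = (
        "Status changes of the CA release process"
    )
    # One item per status change; the ticket number travels with it.
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = f"CA release process: {status}"
    ET.SubElement(item, "description").text = f"GGUS ticket #{ticket}"
    return ET.tostring(rss, encoding="unicode")
```

A call such as `indicator_feed("in progress", "32204", "https://example.org/ca-indicator")` would produce the XML that a feed reader polls; the URL is a placeholder.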


Example of process indicator for sites

As a first step, D. Groep could change the status of the CA release process indicator from "done" to "initialized" at D-15.

On D-Day, D. Groep creates the GGUS ticket according to the current procedure *and* sets the CA release process indicator to "in progress".

The procedure then proceeds unchanged until the last step, when the integration team, instead of closing the ticket directly at the end of step 8, assigns it to OSCT-On-Duty.

An extra "step 9" could be: the OSCT-DC closes the GGUS ticket after checking that both links in step 7 of the procedure are correctly updated and that the broadcast in step 8 has been done. Finally, the OSCT-DC sets the CA release process indicator to "done".
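The indicator life cycle described on this slide amounts to a three-state cycle with a guard on the last transition. A minimal sketch, assuming hypothetical names (`Indicator`, `advance`) and boolean inputs for the two step-9 checks:

```python
from enum import Enum


class Indicator(Enum):
    DONE = "done"
    INITIALIZED = "initialized"
    IN_PROGRESS = "in progress"


# Each status has exactly one successor, following the slide.
NEXT = {
    Indicator.DONE: Indicator.INITIALIZED,         # D-15, set by D. Groep
    Indicator.INITIALIZED: Indicator.IN_PROGRESS,  # D-Day, GGUS ticket created
    Indicator.IN_PROGRESS: Indicator.DONE,         # step 9, set by OSCT-DC
}


def advance(current: Indicator, links_updated: bool = False,
            broadcast_sent: bool = False) -> Indicator:
    """Move the indicator to its next status.

    Leaving "in progress" requires both step-9 checks: the step 7 links are
    updated and the step 8 broadcast has been sent.
    """
    if current is Indicator.IN_PROGRESS and not (links_updated and broadcast_sent):
        raise ValueError("step 9 checks not satisfied; ticket stays open")
    return NEXT[current]
```

The guard captures the point of the proposal: the indicator cannot return to "done" unless the repository links and the broadcast have both been verified.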


Additional Remarks

• SAM test improvements #32204 OCC

• The remaining room for improvement seems to depend on the internal organisation of the SAM/integration/deployment teams; it is very difficult to influence (priorities?) and out of scope here, except that ROCs/sites should point out *each time* that the existing process does not sufficiently compensate for this shortcoming. Namely: improved direct communication between CERN teams, e.g. phone numbers (cf. J. Flammer, F. Schaer).

• Add an INFO status to SAM tests in addition to the WARNING status (proposed by Gergely D., Fred Schaer, S. Burke).



Summary Conclusions

Round table closed. Sites should add comments in their RC reports for further debate. Actions needed from ROC and OCC to validate and follow up:

– 3 proposals on the procedure itself.
– SAM tests modification and GGUS ticket #32204.
– Nominate a responsible body for keeping the procedure updated:

By the way, the 2nd link in item 7 of the procedure does not work (mail from Romain on January 23rd 2008).