Jan 21, 2016
And the EGI Added Value?
• In order to be both attractive and maintainable, Grids need to have the following attributes:
1. Low cost of entry;
2. Low cost of ownership
both in terms of operations as well as application and user support
Currently, adapting an existing application to the Grid environment is a non-trivial exercise that requires an in-depth understanding not only of the Grid computing paradigm but also of the computing model of the application in question.
• One reason for the success of the Application Support team at CERN has been the very close physical proximity of “the highest level of middleware expertise”
I would also like to add the enormous enthusiasm and dedication of the people involved!
What is this EGI?
• The roles and scope of EGI are presumably strongly determined by the NGIs
• The NGIs are presumably(?) influenced by the user communities that they serve
• This leads to two clear lines of action:
1. Directly to the NGIs, clarifying the possible functions that could be performed by EGI, the “value-add”, etc.
2. To existing – and most importantly – new user communities
The latter is the area that I believe I can add most value in the next phase – together with “my friends” JK
How Many People?
Some examples:
• LHC experiments currently have one dedicated person (EIS team) plus a 2nd “cloud” of roughly 1 person per experiment. However, this is complemented by (at least) a handful or two of Grid experts within each experiment.
• For smaller applications (UNOSAT, GEANT4, ITU), a small team (<4 people) has been able to provide sufficient support to enable these applications to use the Grid successfully. How this would scale to much larger numbers of applications is not clear.
• The absolute minimum size for a team is 2 people – 5 (or perhaps more) at “Centres of excellence” would be a good basis
These people would nevertheless be well positioned to perform additional part-time functions (evangelising...)
Where?
• A model where the AS team(s) are physically separated (i.e. isolated) from the communities that they should support just ain’t gonna work!
• And as the user communities don’t sit in one place, a distributed implementation model falls out naturally
• “Centre(s) of excellence” could / should / would provide significant added value, both in directly supporting appropriate communities and in assisting the distributed support effort – presumably implemented by the NGIs and / or local communities
One such centre of excellence could well be situated “at” the EGI (whatever “at” means in this context…)
Why?
• This slide should clearly be completely superfluous
• If one has to start by motivating the need for Grid computing per se then the road will be long & tough!
For completeness – for March and for anything distributed beforehand – we should nevertheless assemble the most persuasive and convincing arguments that we have developed so far!
In particular, I feel the value of landing some “big fish” should be emphasized – i.e. not just applications looking for a “free lunch”
• This means projects such as ITER…
How? (Scenario 1)
• The project identifies a promising application (good examples could be ITU, QCD, ...)
• A “community/application” is identified by the project (EGI or proposed by one NGI) and this application is given support and resources via a lightweight procedure
Formal MoUs, negotiation of resource sharing, etc. should be avoided at this stage and left to another part of the project, once the application/community has obtained some convincing results
This requires the application/community to invest a relatively modest amount of time with a team of skilled experts as described [in this document]
• This procedure delivers a concrete (& quick) demonstration of the feasibility and value of the “gridification”
How? (Scenario 2 & Wrap-up)
• In the case of complex applications (mainly from the sociological/geographical point of view), the EGI should provide the management expertise more than the “gridification” know-how, and organise the collaboration
• This is non-trivial because if users from n countries are involved, the complexity grows like n², since “automatically” n NGIs will also be interacting
• Note that this is already visible now outside HEP: when the situation is favourable, groups from different countries “meet” because of the grid and start collaborating (e.g. Biomed H5N1 docking). When they are a bit less lucky, they cannot really start to collaborate because sidetracked by “simple” technical issues.
• Differences between these scenarios are as follows:
1. The deliverable is a demonstration, and success depends on maintaining technical excellence “in house” (which is ~impossible unless there is long-term involvement with a challenging application)
2. The deliverable involves the creation of a (long-term) technical support structure.
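The n² claim above follows from simple counting of pairwise interactions. A minimal sketch (my own illustration; the function name is invented, not from the talk):

```python
# Counting pairwise coordination channels: with users from n countries,
# n NGIs are drawn in "automatically", and every pair of NGIs may need
# to interact -- n * (n - 1) / 2 pairs, i.e. growth of order n^2.
def coordination_channels(n: int) -> int:
    return n * (n - 1) // 2

for n in (2, 5, 10, 20):
    print(n, coordination_channels(n))
```

Even at 20 participating countries this already means 190 distinct bilateral channels, which is why central management expertise matters more than gridification know-how in this scenario.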
Boot-strapping the NGIs - & EGI!
• Over the past 3.5 years, we have performed a series of “Service Challenges”, leading into “Full Dress Rehearsals” and (now) the “Common Computing Readiness Challenge”
• The main purpose has been to get the service up to the required levels – those needed for 2008 data taking – as well as provide increasingly functional services & resources to allow the community to develop & harden their Grid applications
• This has been an enormous amount of work – and it’s still far from over!
Despite the fact that this was preceded by many years of earlier data challenges & integration tests
• As well as decades of earlier collaboration in “non-Grid” environments
• Depending on how much functionality the EGI / NGI world should offer, one should nevertheless not underestimate the amount of effort, commitment – and length of time – that this takes!
• It almost certainly requires a dedicated coordination team…
• …and would have to start now with a full programme and clear milestones if a working multi-NGI environment is to be ready in 2010
LCG – 2007
• LCG ran ~ 44 M jobs in 2007 – workload has continued to increase – now at ~ 165k jobs/day
• Distribution of work across Tier 0/Tier 1/Tier 2 really illustrates the importance of the grid system
– Tier 2 contribution is around 50%; > 85% is external to CERN
• Data distribution CERN → Tier 1s
– Achieved target peak rates with ATLAS and CMS real workloads
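The quoted figures are self-consistent; a back-of-the-envelope check (my own arithmetic, assuming the ~44 M jobs were spread across the whole of 2007):

```python
# ~44 M jobs over 365 days averages to roughly 120k jobs/day, which sits
# plausibly below the quoted current rate of ~165k jobs/day, given that
# the workload continued to grow through the year.
total_jobs_2007 = 44_000_000
avg_jobs_per_day = total_jobs_2007 / 365
print(f"{avg_jobs_per_day:,.0f} jobs/day on average")
```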
[Chart: daily job rate rising to ~165k jobs/day, with the 2008 target indicated]
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
2007 Highlights – 1
14 January 2008 – IT Departmental Meeting
• EGEE infrastructure
– World-wide Grid grew from 196 to 260 sites (50K CPUs)
• 170k jobs/day (100K for LCG)
• ~500 M SI2k-days/month
– Grid operation shared between 10 Regional Operation Centres
• Seamless interoperation with OSG & related projects
– CERN ROC integrated with IT support structure
– Local grid services use FIO’s procedures and tools
• Most production services have been handed over to FIO
– Evolution of grid monitoring tools
• Common scalable grid monitoring infrastructure architecture defined
• Integration of EGEE and experiment monitoring dashboards started
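For context, the ~500 M SI2k-days/month figure can be turned into a rough utilisation estimate. This is entirely my own illustration; the ~1 kSI2k-per-CPU rating is an assumed ballpark for 2007-era hardware, not a number from the slides:

```python
# Assumed full-load capacity: 50K CPUs * ~1 kSI2k each * 30 days.
capacity_si2k_days = 50_000 * 1_000 * 30
delivered_si2k_days = 500_000_000  # figure quoted on the slide
utilisation = delivered_si2k_days / capacity_si2k_days
print(f"implied average utilisation: {utilisation:.0%}")
```

Under that assumption the infrastructure was delivering roughly a third of its nominal peak, which is a plausible average over mixed production and development use.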
What Next?
• We must reach agreement – or acceptable compromise – on what it is we are trying to achieve – as well as when
• We must do this first amongst ourselves (mega-urgent) and then within the NGI & Application Communities
• Many of the high-level statements that we quote right now are open to a wide range of interpretations
• Starting with the “legendary quote” from EGEE ’06…
With a sustained and concerted effort we can produce a reasonable “portfolio of functions” and “costing models” by March – although it would IMHO still be desirable to expose this earlier…
• And we can also make the May deliverables…
• But it means changing up a gear or two… Unfortunately, timescales don’t look promising
• EGEE UF, other existing commitments, …
EGI Operations Vision
• Notwithstanding the different and evolving needs of application
communities and NGIs, a key component of the EGI vision is the provision of a large-scale, production Grid infrastructure – built on National Grids that interoperate seamlessly at many levels, offering reliable and predictable services to a wide range of applications, ranging from “mission critical” to prototyping and research.
• It is understood that it will be a long and continuous process to reach this possibly utopian goal, with additional NGIs and/or application communities joining at different times, with varying needs and different levels of “maturity”
• In addition, sites of widely varying size, complexity and stage of maturity must clearly be taken into account
• However, it is felt important to emphasize this “vision” as a key component of the proposed EGI / NGI strategy
• The EGI must also have a role in the appropriate policies, such as “standards” closely related to operational aspects and security
• Including Low Cost of Entry & Low Cost of Ownership
EGI_DS WP3 OPS WG “Volunteers”
• Tiziana
• Per
• Rolf
• Fotis
• Jamie
• [email protected]
• Underscores seem not to be permitted by the naming wizard!
WG Schedule
• ‘Final’ draft needs to be ready for discussion at WP5 in Munich
on March 3rd
• This means that we have just 4 weeks to converge!
• We can’t count on February 29th – that is double-booked already!
• EGEE III has the following ‘operations’ tasks:
• Ops.1.1 Grid Management (OCC, ROC, (P)PS, OAG, accounting)
• Ops.1.2 Operations and support
• Ops.1.3 Interoperation
• Ops.1.4 Grid Security
• Ops.1.5 User and application support (GGUS, AS, …)
• Ops.1.6 Monitoring tools
• Ops.1.7 Coordination w/ other activities
• Ops.1.8 Overhead tasks
Is there a better place to start?
● With respect to EGEE, could concentrate primarily on coordination, rather than (also) execution
● This could perhaps result in significant savings in what has to be provided “at the EGI level”
N.B. consider not only the total cost but the cost–benefit
● If it costs a small amount more but is nevertheless of much more value, it may still make sense!
Proposal – John Gordon
• EGI will (at least initially) adopt the EGEE middleware stack and the monitoring regime as a test of compliance.
• gNOCs will take on the duties currently delivered by ROCs.
• If an NGI does not have the critical mass to do this it should form an alliance with neighbouring (or partner) countries.
Summary & Conclusions
• The only water-tight argument for Grid computing comes from the Application Communities that are thereby enabled to do more & better …
Ultimately, we will be judged by the success of these user communities
And it is this success – or otherwise – that will convince governments and / or other noble benefactors to provide long-term funding
The End