Jan 21, 2016
And the EGI Added Value?
• In order to be both attractive and maintainable, Grids need to have the following attributes:
1. Low cost of entry;
2. Low cost of ownership
both in terms of operations as well as application and user support
Currently, adapting an existing application to the Grid environment is a non-trivial exercise that requires an in-depth understanding not only of the Grid computing paradigm but also of the computing model of the application in question.
• One reason for the success of the Application Support team at CERN has been the very close physical proximity of “the highest level of middleware expertise”
I would also like to add the enormous enthusiasm and dedication of the people involved!
What is this EGI?
• The roles and scope of EGI are presumably strongly determined by the NGIs
• The NGIs are presumably(?) influenced by the user communities that they serve
• This leads to two clear lines of action:
1. Directly to the NGIs, clarifying the possible functions that could be performed by EGI, the “value-add”, etc.
2. To existing – and most importantly – new user communities
The latter is the area that I believe I can add most value in the next phase – together with “my friends” JK
How Many People?
Some examples:
• LHC experiments currently have one dedicated person (EIS team) plus a 2nd “cloud” of roughly 1 person per experiment. However, this is complemented by (at least) a handful or two of Grid experts within each experiment.
• For smaller applications (UNOSAT, GEANT4, ITU), a small team (<4 people) has been able to provide sufficient support to enable these applications to use the Grid successfully. How this would scale to much larger numbers of applications is not clear.
• The absolute minimum size for a team is 2 people – 5 (or perhaps more) at “Centres of excellence” would be a good basis
These people would nevertheless be well positioned to perform additional part-time functions (evangelising...)
Where?
• A model where the AS team(s) are physically separated (i.e. isolated) from the communities that they should support just ain’t gonna work!
• And as the user communities don’t sit in one place, a distributed implementation model falls out naturally
• “Centre(s) of excellence” could / should / would provide significant added value, both in directly supporting appropriate communities and in assisting the distributed support effort – presumably implemented by the NGIs and / or local communities
One such centre of excellence could well be situated “at” the EGI (whatever “at” means in this context…)
Why?
• This slide should clearly be completely superfluous
• If one has to start by motivating the need for Grid computing per se then the road will be long & tough!
For completeness – for March and for anything distributed beforehand – we should nevertheless assemble the most persuasive and convincing arguments that we have developed so far!
In particular, I feel the value of landing some “big fish” should be emphasized – i.e. not just applications looking for a “free lunch”
• This means projects such as ITER…
How? (Scenario 1)
• The project identifies a promising application (good examples could be ITU, QCD, ...)
• A “community/application” is identified by the project (EGI or proposed by one NGI) and this application is given support and resources via a lightweight procedure
Formal MoUs, negotiation of resource sharing, etc. should be avoided at this stage and left to another part of the project, once the application/community has obtained some convincing results
This requires the application/community to invest a relatively modest amount of time with a team of skilled experts as described [in this document]
• This procedure delivers a concrete (& quick) demonstration of the feasibility and value of the “gridification”
How? (Scenario 2 & Wrap-up)
• In the case of complex applications (mainly from the sociological/geographical point of view), the EGI should provide the management expertise more than the “gridification” know-how, and organise the collaboration
• This is non-trivial because if users from n countries are involved, the complexity grows like n², since “automatically” n NGIs will also be interacting
• Note that this is already visible now outside HEP: when the situation is favourable, groups from different countries “meet” because of the grid and start collaborating (e.g. Biomed H5N1 docking). When they are a bit less lucky, they cannot really start to collaborate because sidetracked by “simple” technical issues.
• Differences between these scenarios are as follows:
1. The deliverable is a demonstration, and success depends on maintaining technical excellence “in house” (which is ~impossible unless there is long-term involvement with a challenging application)
2. The deliverable involves the creation of a (long-term) technical support structure.
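The n² claim above follows from simple counting of pairwise interactions. A minimal sketch (my own illustration; the function name is invented, not from the talk):

```python
# Counting pairwise coordination channels: with users from n countries,
# n NGIs are drawn in "automatically", and every pair of NGIs may need
# to interact -- n * (n - 1) / 2 pairs, i.e. growth of order n^2.
def coordination_channels(n: int) -> int:
    return n * (n - 1) // 2

for n in (2, 5, 10, 20):
    print(n, coordination_channels(n))
```

Even at 20 participating countries this already means 190 distinct bilateral channels, which is why central management expertise matters more than gridification know-how in this scenario.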
Boot-strapping the NGIs - & EGI!
• Over the past 3.5 years, we have performed a series of “Service Challenges”, leading into “Full Dress Rehearsals” and (now) the “Common Computing Readiness Challenge”
• The main purpose has been to get the service up to the required levels – those needed for 2008 data taking – as well as provide increasingly functional services & resources to allow the community to develop & harden their Grid applications
• This has been an enormous amount of work – and it’s still far from over!
Despite the fact that this was preceded by many years of earlier data challenges & integration tests
• As well as decades of earlier collaboration in “non-Grid” environments
• Depending on how much functionality the EGI / NGI world should offer, one should nevertheless not underestimate the amount of effort, commitment – and length of time – that this takes!
• It almost certainly requires a dedicated coordination team…
• …and would have to start now with a full programme and clear milestones if a working multi-NGI environment is to be ready in 2010
LCG – 2007
• LCG ran ~ 44 M jobs in 2007 – workload has continued to increase – now at ~ 165k jobs/day
• Distribution of work across Tier 0/Tier 1/Tier 2 really illustrates the importance of the grid system
– Tier 2 contribution is around 50%; > 85% is external to CERN
• Data distribution CERN → Tier 1s
– Achieved target peak rates with ATLAS and CMS real workloads
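The quoted figures are self-consistent; a back-of-the-envelope check (my own arithmetic, assuming the ~44 M jobs were spread across the whole of 2007):

```python
# ~44 M jobs over 365 days averages to roughly 120k jobs/day, which sits
# plausibly below the quoted current rate of ~165k jobs/day, given that
# the workload continued to grow through the year.
total_jobs_2007 = 44_000_000
avg_jobs_per_day = total_jobs_2007 / 365
print(f"{avg_jobs_per_day:,.0f} jobs/day on average")
```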
[Chart: daily job rate rising to ~165k jobs/day, with the 2008 target indicated]
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
2007 Highlights – 1
14 January 2008 – IT Departmental Meeting
• EGEE infrastructure
– World-wide Grid grew from 196 to 260 sites (50K CPUs)
• 170k jobs/day (100K for LCG)
• ~500 M SI2k-days/month
– Grid operation shared between 10 Regional Operation Centres
• Seamless interoperation with OSG & related projects
– CERN ROC integrated with IT support structure
– Local grid services use FIO’s procedures and tools
• Most production services have been handed over to FIO
– Evolution of grid monitoring tools
• Common scalable grid monitoring infrastructure architecture defined
• Integration of EGEE and experiment monitoring dashboards started
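For context, the ~500 M SI2k-days/month figure can be turned into a rough utilisation estimate. This is entirely my own illustration; the ~1 kSI2k-per-CPU rating is an assumed ballpark for 2007-era hardware, not a number from the slides:

```python
# Assumed full-load capacity: 50K CPUs * ~1 kSI2k each * 30 days.
capacity_si2k_days = 50_000 * 1_000 * 30
delivered_si2k_days = 500_000_000  # figure quoted on the slide
utilisation = delivered_si2k_days / capacity_si2k_days
print(f"implied average utilisation: {utilisation:.0%}")
```

Under that assumption the infrastructure was delivering roughly a third of its nominal peak, which is a plausible average over mixed production and development use.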
What Next?
• We must reach agreement – or acceptable compromise – on what it is we are trying to achieve – as well as when
• We must do this first amongst ourselves (mega-urgent) and then within the NGI & Application Communities
• Many of the high-level statements that we quote right now are open to a wide range of interpretations
• Starting with the “legendary quote” from EGEE ’06…
With a sustained and concerted effort we can produce a reasonable “portfolio of functions” and “costing models” by March – although it would IMHO still be desirable to expose this earlier…
• And we can also make the May deliverables…
• But it means changing up a gear or two… Unfortunately, timescales don’t look promising
• EGEE UF, other existing commitments, …
EGI Operations Vision
• Notwithstanding the different and evolving needs of application
communities and NGIs, a key component of the EGI vision is the provision of a large-scale, production Grid infrastructure – built on National Grids that interoperate seamlessly at many levels, offering reliable and predictable services to a wide range of applications, ranging from “mission critical” to prototyping and research.
• It is understood that it will be a long and continuous process to reach this possibly utopian goal, with additional NGIs and/or application communities joining at different times, with varying needs and different levels of “maturity”
• In addition, sites of widely varying size, complexity and stage of maturity must clearly be taken into account
• However, it is felt important to emphasize this “vision” as a key component of the proposed EGI / NGI strategy
• The EGI must also have a role in the appropriate policies, such as “standards” closely related to operational aspects and security
• Including Low Cost of Entry & Low Cost of Ownership
EGI_DS WP3 OPS WG “Volunteers”
• Tiziana
• Per
• Rolf
• Fotis
• Jamie
• [email protected]
• Underscores seem not to be permitted by the naming wizard!
WG Schedule
• ‘Final’ draft needs to be ready for discussion at WP5 in Munich
on March 3rd
• This means that we have just 4 weeks to converge!
• We can’t count on February 29th – that is double-booked already!
• EGEE III has the following ‘operations’ tasks:
• Ops.1.1 Grid Management (OCC, ROC, (P)PS, OAG, accounting)
• Ops.1.2 Operations and support
• Ops.1.3 Interoperation
• Ops.1.4 Grid Security
• Ops.1.5 User and application support (GGUS, AS, …)
• Ops.1.6 Monitoring tools
• Ops.1.7 Coordination w/ other activities
• Ops.1.8 Overhead tasks
Is there a better place to start?
● With respect to EGEE, could concentrate primarily on coordination, rather than (also) execution
● This could perhaps result in significant savings in what has to be provided “at the EGI level”
N.B. consider not only the total cost but the cost–benefit
● If it costs a small amount more but is nevertheless of much more value, it may still make sense!
Proposal – John Gordon
• EGI will (at least initially) adopt the EGEE middleware stack and the monitoring regime as a test of compliance.
• gNOCs will take on the duties currently delivered by ROCs.
• If an NGI does not have the critical mass to do this it should form an alliance with neighbouring (or partner) countries.
Summary & Conclusions
• The only water-tight argument for Grid computing comes from the Application Communities that are thereby enabled to do more & better …
Ultimately, we will be judged by the success of these user communities
And it is this success – or otherwise – that will convince governments and / or other noble benefactors to provide long-term funding
The End