ITIL and Grid services at GridKa
CHEP 2009, 21 - 27 March, Prague
Tobias König, Dr. Holger Marten
Steinbuch Centre for Computing (SCC)
KIT – The cooperation of Forschungszentrum Karlsruhe GmbH and Universität Karlsruhe (TH)
www.kit.edu
Jan 20, 2016
Introduction
Karlsruhe Institute of Technology (KIT): the cooperation of the Forschungszentrum Karlsruhe GmbH (FZK) and the University of Karlsruhe (TH)
Steinbuch Centre for Computing (SCC): the computing centre of the KIT
GridKa: the German Tier-1 centre hosted by the SCC
The main aim of GridKa is to offer sustainable Grid services.
The availability and reliability of IT services directly affect:
  The users' satisfaction
  The reputation of the Computing Centre SCC
  The economic aspects, which should not be forgotten
It is therefore important to implement processes and tools that increase the availability and reliability of the IT services.
The Information Technology Infrastructure Library (ITIL) is a process-oriented framework for the management of IT processes.
| Tobias König | CHEP 2009 | 21 - 27 March Prague
General information
The SCC puts emphasis on the ITIL v2 Service Support processes:
  Configuration Management
  Incident Management and Service Desk
  Problem Management
  Change Management
  Release Management
Only these ITIL processes are currently implemented at the SCC. They are the most relevant ITIL processes for the SCC.
The ITIL Service Delivery processes, such as
  Service Level Management
  Availability Management
  Capacity Management
  Continuity Management
  Financial Management
are currently neither standardized according to ITIL nor implemented at the SCC, and are therefore not part of this talk.
Structure of this talk
ITIL Processes at GridKa:
  Incident Management
  Problem Management
  Change Management
  Configuration Management
Each process is presented with:
  Responsibility (internal roles and groups)
  Internal view
  External view
External/Internal Configuration Management
The Configuration Management Database (CMDB) is the basis of all ITIL processes.
The following items are stored within the CMDB:
  All the Configuration Items (CIs): hardware, software, racks
  The CIs are related to the service components
  The IT services themselves are compositions of the service components
  Etc.
[Diagram: Configuration Management. Services are composed of service components, which are related to CIs.]
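The relation between CIs, service components, and services can be illustrated with a minimal Python sketch. It is not the data model of the CMDB used at the SCC: all class, field, and item names below are hypothetical and only mirror the three levels named on the slide.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConfigurationItem:
    """A CI as listed on the slide: hardware, software, or a rack."""
    name: str
    kind: str  # "HW", "SW" or "Rack"


@dataclass
class ServiceComponent:
    """A service component and the CIs related to it."""
    name: str
    cis: list[ConfigurationItem] = field(default_factory=list)


@dataclass
class Service:
    """An IT service, composed of service components."""
    name: str
    components: list[ServiceComponent] = field(default_factory=list)


def affected_services(ci_name: str, services: list[Service]) -> list[str]:
    """Return the names of all services that depend on the given CI."""
    return [
        service.name
        for service in services
        if any(ci.name == ci_name for comp in service.components for ci in comp.cis)
    ]


# Hypothetical example data, not taken from the GridKa CMDB.
rack = ConfigurationItem("rack-01", "Rack")
server = ConfigurationItem("server-17", "HW")
daemon = ConfigurationItem("batch-daemon", "SW")

services = [
    Service("batch-service", [
        ServiceComponent("scheduler", [server, daemon]),
        ServiceComponent("worker-pool", [rack]),
    ]),
    Service("storage-service", [ServiceComponent("disk-pool", [rack])]),
]

print(affected_services("rack-01", services))
```

Storing the relations in this direction makes it possible to ask which services are touched when a single CI fails or is changed, which is the kind of question the other ITIL processes put to the CMDB.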
Incident Management
Prime service hours: Tickets are assigned directly by the 1st level support to the 2nd level support, the technical experts. The experts immediately start resolving the incident.
On-call service hours: An On-Call Engineer (OCE) immediately starts resolving the incident.
Documentation of the solution
[Diagram: Incident Management and Problem Management, responsibility]
Help Desk: 1st Level Support
Experts: 2nd Level Support
5*12 Experts / 7*24 On-Call Engineers
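The split between prime service hours and on-call service hours can be sketched as a small routing function. This is an illustration only: "5*12" is read here as Monday to Friday, twelve hours a day, and the 07:00 to 19:00 window is a placeholder assumption, since the talk does not state the actual times. The function names are hypothetical.

```python
from datetime import datetime

# Assumption: prime service hours run Monday to Friday, 07:00-19:00.
# The real start and end times are not given in the talk.
PRIME_START_HOUR = 7
PRIME_END_HOUR = 19


def is_prime_service_hours(when: datetime) -> bool:
    """True if the timestamp falls into the assumed 5*12 window."""
    is_weekday = when.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and PRIME_START_HOUR <= when.hour < PRIME_END_HOUR


def assign_incident(when: datetime) -> str:
    """Decide who starts working on an incident reported at the given time."""
    if is_prime_service_hours(when):
        return "2nd level support (experts)"
    return "on-call engineer (OCE)"


print(assign_incident(datetime(2009, 3, 23, 10, 0)))  # a Monday morning
print(assign_incident(datetime(2009, 3, 21, 10, 0)))  # a Saturday morning
```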
Problem Management
Problem Management is the detection of the underlying causes of an incident and their subsequent resolution and prevention.
Prime service hours: Meeting of the task force (OCEs and experts)
On-call service hours: The OCE can call another OCE at any time during the night, or other experts the next morning.
[Diagram: Incident Management and Problem Management, responsibility]
Task Force: problematic tickets, weekly reports, availability, 7*24 operations handover
Experts: 2nd Level Support
7*12 Experts / 7*24 On-Call Engineers
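One simple way to find candidates for Problem Management is to look for incidents that keep hitting the same service component. The sketch below is an assumption about how such a review could be supported, not the procedure used by the GridKa task force. The threshold of three incidents and all ticket and component names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def problem_candidates(incidents: list[tuple[str, str]], threshold: int = 3) -> list[str]:
    """Return components with at least `threshold` incidents.

    Each incident is a (ticket id, affected service component) pair.
    """
    counts = Counter(component for _, component in incidents)
    return sorted(name for name, n in counts.items() if n >= threshold)


# Hypothetical incident tickets collected for a weekly report.
weekly_incidents = [
    ("T-101", "disk-pool"),
    ("T-102", "scheduler"),
    ("T-103", "disk-pool"),
    ("T-104", "disk-pool"),
    ("T-105", "worker-pool"),
]

print(problem_candidates(weekly_incidents))
```

A component that shows up repeatedly is only a hint; finding the underlying cause remains the job of the experts.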
External/Internal Monitoring: Tools to support the Incident and Problem Management
External: An incident or problem can be recognized from outside by the user or by the external monitoring system. These alarms automatically create a ticket.
Internal: An incident or problem can be recognized by GridKa's Nagios system.
A workflow between both systems is planned.
[Diagram: Incident Management and Problem Management, internal and external view]
External monitoring: SAM tests
Internal monitoring: Nagios, Ganglia
External ticket system: GGUS
Internal ticket system: Remedy
7*24 knowledge base: Wiki and Remedy
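The step from an alarm to a ticket can be sketched as follows. The sketch keeps everything in memory and does not call GGUS, Remedy, SAM, or Nagios. It assumes that external alarms end up in the external ticket system and internal alarms in the internal one, a mapping the slides do not state explicitly. The `Ticket` class, the function name, and the host names are hypothetical.

```python
from dataclasses import dataclass
from itertools import count

# Assumed mapping from alarm source to ticket system (not confirmed by the talk).
TICKET_SYSTEMS = {"external": "GGUS", "internal": "Remedy"}

_ticket_ids = count(1)  # simple in-memory ticket numbering


@dataclass
class Ticket:
    ticket_id: int
    system: str
    summary: str


def ticket_from_alarm(source: str, host: str, message: str) -> Ticket:
    """Create a ticket record for an alarm raised by a monitoring system."""
    if source not in TICKET_SYSTEMS:
        raise ValueError(f"unknown alarm source: {source!r}")
    summary = f"[{source}] {host}: {message}"
    return Ticket(next(_ticket_ids), TICKET_SYSTEMS[source], summary)


print(ticket_from_alarm("external", "ce.example.org", "SAM test failed"))
print(ticket_from_alarm("internal", "node42.example.org", "Nagios check critical"))
```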
Change Management
For changes to the IT infrastructure, a Request for Change (RfC) is created in the CMDB.
There are three kinds of RfCs:
  Emergency RfC
  Standard RfC
  Planned RfC
GridKa's downtimes are announced externally via EGEE Broadcasts and via the Change Calendar at http://www.gridka.de/monitoring
[Diagram: Change Management, internal and external view]
Change calendar for maintenances
Planned RfC, Standard RfC, Emergency RfC
CMDB (RfCs)
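The three kinds of RfCs and a change calendar built from them can be sketched in a few lines. This is an illustration under stated assumptions, not the schema of the RfC records in the CMDB: the fields `title`, `start`, and `end`, the function `change_calendar`, and the example entries are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class RfCKind(Enum):
    """The three kinds of RfCs named on the slide."""
    EMERGENCY = "Emergency RfC"
    STANDARD = "Standard RfC"
    PLANNED = "Planned RfC"


@dataclass
class RfC:
    title: str
    kind: RfCKind
    start: datetime
    end: datetime


def change_calendar(rfcs: list[RfC]) -> list[str]:
    """Return one calendar line per RfC, ordered by start time."""
    ordered = sorted(rfcs, key=lambda rfc: rfc.start)
    return [
        f"{rfc.start:%Y-%m-%d %H:%M} to {rfc.end:%Y-%m-%d %H:%M}: "
        f"{rfc.title} ({rfc.kind.value})"
        for rfc in ordered
    ]


# Hypothetical RfCs.
rfcs = [
    RfC("Firmware upgrade", RfCKind.PLANNED,
        datetime(2009, 4, 2, 8, 0), datetime(2009, 4, 2, 12, 0)),
    RfC("Replace failed switch", RfCKind.EMERGENCY,
        datetime(2009, 3, 30, 14, 0), datetime(2009, 3, 30, 15, 0)),
]

for line in change_calendar(rfcs):
    print(line)
```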
Special management roles
The Configuration Manager
  Responsible for ensuring that all CIs are stored in the CMDB and that the databases are up to date
The Change Manager
  Responsible for the change process and the formal correctness of the RfCs
  Organizes the Change Advisory Board (CAB)
Special role: contact person to the IT Service Management department
[Diagram: Change Management and Configuration Management, responsibility]
Change Management: Change Manager, Change Advisory Board (CAB)
Configuration Management: Configuration Manager
GridKa contact person to Service Management: collecting data for the CMDB, supporting experts in filing RfCs
Thank you for your attention!
Discussion
[Summary diagram: ITIL Processes at GridKa (Incident Management, Problem Management, Change Management, Configuration Management), each shown with responsibility (internal roles and groups), internal view, and external view]
Change Manager
Configuration Manager
CAB (Change Advisory Board)
Task Force: problematic tickets, weekly reports, availability, 7*24 operations handover
Help Desk: 1st Level Support
Experts: 2nd Level Support
5*12 Experts / 7*24 On-Call Engineers
GridKa contact person to Service Management: collecting data for the CMDB, organizing the Task Force and 7*24 operations, supporting experts in filing RfCs
Change calendar for maintenances
Planned RfC, Standard RfC, Emergency RfC
CMDB (CIs: Services, Service components, HW, SW, RfCs)
Services
Service components
SLA
External monitoring: SAM tests
Internal monitoring: Nagios, Ganglia
External ticket system: GGUS
Internal ticket system: Remedy
Workflow between ticket systems
7*24 knowledge base: Wiki and Remedy