Retail Market Messaging Incident Report
Dec 13, 2015
Agenda
• Recap of integrated market messaging system
• Incident Overview
• Current Status
• Incident Observations
• Next Steps
Arrangements before May 2012
[Diagram. Components shown: SAP IS-U; Seebeyond Message Hub; Supplier Billing System; MPCC (Supplier); CC&B; EMMA (Supplier); Seebeyond EMMA; Seebeyond GEMMA]
Arrangements after Go-Live in October 2012
[Diagram. Components shown: SAP IS-U (labelled twice); VPN; Central Messaging Hub; Market Participant Billing System; Market Participant Messaging Application; EMMA (MPCC)]
• Common technical harmonisation solution
• Single instance supported by Northgate
• Total market messaging volumes per annum: ~30M and ~4M (as labelled on the diagram)
Incident/s in November, December ‘12 & January ‘13
[Diagram, shown in three builds. Components shown: Central Messaging Hub and five Market Participant stacks, each comprising a Billing System, a Messaging Application and an EMMA. The later builds add two databases (DB) and the label "Processing Messages".]
Incident Description
• Serious, multi-faceted performance issues at EMMA level
  – EI EMMA server resources
    • CPU, memory etc. at upper limits
  – EI EMMA database/application
    • Performance issues and timeouts
    • Database table/storage limits reached
    • System stressed
  – Other MPs' EMMAs had similar issues, but with greatly reduced impact
Incident Description
• Central messaging hub issues
  – Messages back-logged and clogging the system
    • Time-out parameters
  – System resources tied up processing un-sent messages, retry after retry
  – Performance impacted overall
  – Systems stressed and data storage management issues (processing folder)
  – Market Participant impacts depended on volumes and timings
  – Central hub database integrity was impacted
  – Presented serious operational issues for system performance
  – Highlighted specific application and database performance weaknesses
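The "retry after retry" pattern above can be bounded so that un-sent messages stop consuming resources indefinitely. The following is a minimal sketch only, not the hub's actual behaviour: the `Message` class, the `send` callable and the `max_attempts` limit are all hypothetical names introduced for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: str
    attempts: int = 0


def drain(queue, send, max_attempts=3):
    """Try each queued message at most max_attempts times.

    `send` is a hypothetical callable that returns True on delivery.
    Messages that exhaust their attempts are parked for manual handling
    rather than being retried without limit.
    """
    delivered, parked = [], []
    while queue:
        msg = queue.popleft()
        msg.attempts += 1
        if send(msg):
            delivered.append(msg.msg_id)
        elif msg.attempts >= max_attempts:
            parked.append(msg.msg_id)
        else:
            # Re-queue at the back so one failing message does not block the rest.
            queue.append(msg)
    return delivered, parked
```

Parking a message after a fixed number of attempts trades automatic delivery for predictable resource use; the parked list would then feed whatever manual process handles un-sent messages.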
TIBCO and Non-TIBCO Related Events
• A number of events occurred that impacted market messaging delivery
  – Only some of these events were the result of the market messaging application incidents
• Summary of the main events that occurred:
  – EMMA and Central Hub performance issues
  – Northgate outages
    • Emergency and planned
  – Market Participants' systems outages
    • EMMA and back-end systems (non-TIBCO)
  – Multiple hardware faults
    • ESBN SAP IS-U and PI
  – ESB data centre switch-overs
    • Planned and short notice
  – Connectivity issues
Incident Management
• EMMAs
  – Electric Ireland EMMA
    • Additional EMMA application server resources were added
    • Significant performance improvements resulted
    • EMMA database monitoring fully put in place by EI
  – Eirgrid EMMA
    • Application service resources for outbound messages re-allocated to inbound messages
    • Good performance improvements resulted
  – Airtricity EMMA
    • Database issue resolved
  – Database indexes applied at EMMAs
  – 341 market messaging defect fix applied to EMMAs
Incident Management
• Central messaging hub
  – Manual manipulation of messaging folders required to resolve incidents successfully
  – Manual operational management of un-sent messages put in place
  – Manual reconciliation processes put in place
  – Additional central messaging hub server and data storage resources were added
  – Database indexes applied
Current Status
• EMMAs are working well
• Central messaging is working well
• Working manual processes are in place to deal with a significant back-log of messages at the central messaging hub
• A clear scope of specific TIBCO software changes has been identified for project delivery
• Market Participants (MPs)
  – Monitoring of their EMMA database
  – Monitoring their EMMA servers with regard to performance
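One way a significant back-log might be spotted early is to watch how many files are waiting in a messaging folder and how old the oldest one is. The sketch below is illustrative only: the folder, the thresholds and the function name `backlog_status` are assumptions, not features of the TIBCO or EMMA products.

```python
import time
from pathlib import Path


def backlog_status(folder, max_files=1000, max_age_seconds=3600, now=None):
    """Summarise the files waiting in `folder` and flag a possible back-log.

    The thresholds are hypothetical defaults; real values would depend on
    normal daily volumes and agreed delivery times.
    """
    now = time.time() if now is None else now
    # Modification time is used as a stand-in for "time the message arrived".
    mtimes = [p.stat().st_mtime for p in Path(folder).iterdir() if p.is_file()]
    oldest_age = max((now - m for m in mtimes), default=0.0)
    return {
        "count": len(mtimes),
        "oldest_age_seconds": oldest_age,
        "alert": len(mtimes) > max_files or oldest_age > max_age_seconds,
    }
```

Checking both the count and the age covers two different failure shapes: a sudden flood of messages, and a small number of messages that are stuck.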
Risk Management
• There are various events that could potentially cause a central messaging hub message backlog, for example:
  – EMMA performance issues
  – MP loss of data communications services (broadband)
  – MP hardware failure
  – Northgate internet services outage
  – Others
• Mitigation
  – Improvements already made to EMMAs and central hub
  – MP management of EMMA
  – Increased Northgate awareness and experience of central hub management
  – Planned contingency solution
  – Improved communications overall
  – Remedial project delivery
Incident Observations
• Performance Testing
  – Significant repeated performance testing was carried out with message volumes of 350,000 plus in ~12 hour time periods
    • Current maximum working-day market messaging volume is in the range of ~135,000 (end-of-month figure)
  – Scenario testing coverage
    • The loss of an EMMA that processes large volumes of market messages for a prolonged period, resulting in a significant back-log of market messages at the central messaging hub, was not included in the performance testing scope
• EMMA Application and Database Management
  – Current MP EMMA documentation needs to be updated with relevant material on EMMA housekeeping activities
  – Documentation needs to be issued for comment and final version published to all
  – Archiving tools required for EMMA housekeeping
  – MP role and responsibilities in EMMA housekeeping
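The performance testing figures above can be put side by side with some simple arithmetic. This is a rough sketch under stated assumptions: it treats "350,000 plus in ~12 hours" as exactly 350,000 messages in 12 hours, and the `hours_to_clear` helper is a hypothetical illustration, not a model of the untested back-log scenario.

```python
tested_volume = 350_000     # messages per test run (the slide says "350,000 plus")
test_window_hours = 12      # approximate test window
peak_day_volume = 135_000   # end-of-month working-day maximum

tested_rate_per_hour = tested_volume / test_window_hours
headroom_factor = tested_volume / peak_day_volume


def hours_to_clear(backlog, rate_per_hour, inflow_per_hour=0.0):
    """Hours needed to drain a back-log at a sustained rate, net of new arrivals."""
    net = rate_per_hour - inflow_per_hour
    if net <= 0:
        raise ValueError("back-log never clears: inflow meets or exceeds throughput")
    return backlog / net
```

Under these assumptions the tested rate is about 29,000 messages per hour, and the tested volume is about 2.6 times the peak working-day volume. A back-log equal to one peak day would take roughly 4.6 hours to clear at the tested rate if no new messages arrived, and longer once new inflow is subtracted.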
Incident Observations
• Central Messaging Hub and EMMA
  – TIBCO system review carried out by Wipro as System Implementer
  – Northgate carried out an operational review
  – Application software changes now identified and scoped to deliver improvements in reliability, performance and database integrity
  – Performance testing will be required on all software changes
  – Project required for the delivery of these changes
Next Steps
• Remedial Project Scope and Delivery
  – Delivery of archiving tools for solution overall
  – Delivery of reconciliation of physical XML market messages
  – Contingency solution for possible future events
  – Update of market participants' EMMA documentation on housekeeping
  – Communications to market participants on EMMA housekeeping
  – Review of EMMA specification overall for the market
  – Delivery of messaging enquiry tool (MET) for central messaging hub
  – Review of TIBCO software modules with regard to software versions, upgrade paths and end of maintenance support
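Reconciliation of market messages could, in its simplest form, compare the identifiers of messages sent with those received. The sketch below assumes each message carries a unique identifier that has already been extracted from the XML; the function name `reconcile` and the identifier format are hypothetical and do not reflect the planned tool's design.

```python
from collections import Counter


def reconcile(sent_ids, received_ids):
    """Compare two collections of message identifiers.

    Returns identifiers that were sent but never received, received but
    never sent, and received more than once.
    """
    sent, received = Counter(sent_ids), Counter(received_ids)
    return {
        "missing": sorted(set(sent) - set(received)),
        "unexpected": sorted(set(received) - set(sent)),
        "duplicated": sorted(i for i, n in received.items() if n > 1),
    }
```

For example, `reconcile(["m1", "m2", "m3"], ["m1", "m3", "m3", "m9"])` reports `m2` as missing, `m9` as unexpected and `m3` as duplicated.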
Integrated Plan (development in progress)
• Market schema release 2013
• IPT for new market entrants
• Remedial project
• TIBCO product upgrade
• Market Participants' IT plans