PR006/21
1 February 2021
MTR Announces Launch of East Rail Line New Signalling System and 9-car Trains on 6 February and a Series of Measures to Enhance Shatin to Central Link Project Control
MTR Corporation announces today (1 February 2021) the commissioning of the East Rail Line (EAL) new signalling system and 9-car trains on 6 February 2021, after the satisfactory completion of all further testing and approvals from relevant Government departments on the safe and sound condition of the new signalling system and trains.
At the same time, the Corporation announces the establishment of a dedicated “Shatin to Central Link (SCL) Technical and Engineering Assurance Team”, directly accountable to the Chief Executive Officer (CEO), to monitor the SCL project from both a technical and service readiness perspective and to identify important unknown issues in the remaining works of the SCL project for timely reporting and follow-up.
The establishment of the team is an initiative of the Corporation after reviewing the Report of the Investigation Panel (the Report) into the postponement of the commissioning of the EAL new signalling system in mid-September last year.
In addition, at the Government’s request, a new Service Reliability Report has been introduced as part of the Government’s mechanism for reviewing the commissioning of new lines, to ensure the timely reporting and handling of issues with a potentially significant reliability impact. This report will complement the existing System Safety Report.
Dr Jacob Kam, CEO of MTR Corporation, points out that the Corporation decided to implement these two measures following the experience gained from the postponement and a detailed review of the Report. The Corporation also accepts and will implement the other recommendations made in the Report, as follows:
1. providing internal procedures to ensure that relevant Government departments are kept adequately informed of all significant reliability issues in the future;
2. strengthening training to raise sensitivity around public concerns, effective communications and the importance of service quality and reliability, in addition to safety issues; and
3. reinforcing second line of defence arrangements on risk management and compliance control to detect and escalate important issues early.
Dr Kam reiterated the Corporation’s commitment to enhancing the project’s preparation works in relation to safety and passenger service.
“We never compromise the safety of our passenger services, and also attach great importance to their reliability. This is also the case for the new signalling system and it will only be launched when the Corporation and the relevant Government departments are satisfied with its performance in these aspects,” said Dr Kam.
Dr Kam expressed his gratitude to the Investigation Panel chaired by Ir Edmund Leung. The Corporation announced the setting up of the Panel on 13 September 2020 to look into the communication and reporting mechanisms of the Corporation, both internally and with relevant Government departments, from May 2020, when the issue leading to the eventual postponement was first identified, to 11 September 2020, when the deferral decision was made. The Corporation submitted to the Transport and Housing Bureau on 21 January 2021 the report prepared by the Panel, which is appended in full (Appendix 1).
The Corporation acknowledges and accepts the findings of the Panel, which include a finding that the issue concerned is not an issue of safety but of service reliability. Safety has been reaffirmed by the technical investigation, which has shown that the issue concerned was caused by a non-safety-critical software module being overloaded by a new software module specifically built for the Corporation to provide extra train monitoring information to the Operations Control Centre. The contractor has resolved the issue by upgrading the software and stopping the new software module.
In the course of its investigation, the Investigation Panel referenced the findings of the technical investigation. The report of the technical investigation is appended in full (Appendix 2).
-End-
About MTR Corporation
Every day, MTR connects people and communities. As a recognised world-class operator of sustainable rail transport services, we are a leader in safety, reliability, customer service and efficiency.
MTR has extensive end-to-end railway expertise with more than 40 years of railway projects experience from design to planning and construction through to commissioning, maintenance and operations. Going beyond railway delivery and operation, MTR also creates and manages dynamic communities around its network through seamless integration of rail, commercial and property development.
With more than 40,000 dedicated staff*, MTR carries over 13 million passenger journeys worldwide every weekday in Hong Kong, the United Kingdom, Sweden, Australia and the Mainland of China. MTR strives to grow and connect communities for a better future.
For more information about MTR Corporation, please visit www.mtr.com.hk.
*includes our subsidiaries and associates in Hong Kong and worldwide
Investigation Panel Report on Route Recall Issue
Submitted by:
Ir Edmund KH Leung, Chairman
Dr Peter Ronald Ewen, Deputy Chairman
Date: 8 December 2020
Table of Contents
1. Introduction
2. Safe and Sound (S&S)
3. Sequence of Events
4. Findings
4.1 Safe & Sound
4.2 Identification, Analysis and Classification of the Issue
4.3 Internal Escalation
4.4 Reporting to Government
4.5 Advisers
5. Summary of Findings
6. Recommendations
1. Introduction
1.1 On 11 September 2020, MTR Corporation Limited (MTRCL) announced a postponement of the commissioning of the new signalling system and roll-out of new trains on the East Rail Line (EAL), which would have brought in the Mixed-Fleet Operation (MFO). The postponement decision was made after conducting a final review of the new system prior to service commencement. The announcement put on hold the changeover to MFO that was due to take place on 12 September 2020.
1.2 In the final review, MTRCL noted that during on-site testing in Non-Traffic Hours (NTH) on 11 May there had been a software issue that could potentially cause deviation of trains from their intended route. The software issue had been discovered and reported by Siemens, the contractor, as a defect that required remedial measures, but it had not been deemed to be a Safe and Sound (S&S) issue by the MTRCL team. Train deviations due to this software issue were only observed during analysis of logfiles during testing/simulation and did not actually occur in real operation on that day.[1] Although the probability of occurrence was considered remote (at that time), train deviation from an intended route remained a possibility. The problem was identified as a Route Recall (RR) issue.
1.3 The precautionary postponement of the commencement of MFO to address the software issue was deemed necessary to better ensure smooth and reliable operations of the new signalling system. It should be noted that the public has not experienced any disruption of the EAL service as a result of the deferment of MFO. However, the decision to defer the commissioning of the new signalling system and MFO just a day before the planned changeover has raised public concerns about the process of communication within MTRCL and its interaction with the Government.
[1] In the course of successfully replicating the RR issue during testing in October and December, among other technical findings, the team also found that a train was routed to the wrong platform of the correct station on two occasions (due to another known software bug in the timetable module and an incorrect software installation sequence respectively, neither caused by nor linked to RR). Both issues have now been corrected to prevent recurrence.
1.4 On 13 September, MTRCL announced the establishment of an Investigation Panel (IP) to investigate the matter.
1.5 The mandate of the Panel was:
• To ascertain how the potential route recall issue in the new signalling system was identified, confirmed, analysed and followed up
• To review whether the internal communication and reporting mechanism of MTRCL was sufficiently robust and was implemented in a timely and proper manner during the above-mentioned process
• To investigate the reporting by MTRCL to relevant Government departments and ascertain whether this was carried out in a timely and proper manner
1.6 It should be noted that since postponement of MFO, MTRCL has conducted a formal and comprehensive technical investigation into the RR issue. This is part of MTRCL’s ongoing effort to re-affirm its readiness for the commissioning of the new signalling system and MFO. The IP took note of this parallel stream of work by MTRCL which has, amongst other things, reviewed and validated the root cause[2] of the RR issue and identified solutions accordingly. The scope of the technical investigation is different from that of this IP’s work. The focus of the IP is primarily on the period between the emergence of the software issue on 11 May and the announcement of deferral on 11 September, and to ascertain the way in which the RR issue was handled, review the internal communication and reporting mechanisms, and investigate the communication of the issue with the concerned Government departments. Since the technical investigation was conducted after the 11 September deferral, it bears little direct reference to the focus of the IP.
[2] The new software link between the two internal ATS modules, which was built specifically for MTR to provide extra train fault status monitoring to the Traffic Controller, overloaded the Train Monitoring and Tracking (TMT) data processing capacity and resulted in the RR issue.
2. Safe and Sound (S&S)
2.1 For the purposes of the third part of the IP’s mandate as set out in Paragraph 1.5 above, the reporting to relevant Government departments refers mainly to the S&S declaration process. The S&S process is a formal approval process between MTRCL and Government departments to ensure that a new project is safe and ready for operation; it requires MTRCL to issue a declaration to the Railways Branch of EMSD (EMSD RB).
2.2 The requirements and guidance on what constitutes a safety-related issue are comprehensive and clear; and having considered all these, the potential consequences of the RR issue, and the findings of technical investigation conducted after 12 September, the IP is of the view that the RR issue is NOT an issue of safety. This will be discussed in greater detail in later sections of this report and is in line with the view of the Independent Safety Assessor (ISA).[3]
2.3 The scope of what must be demonstrated to prove “Sound” is agreed on a project-by-project basis. For MFO, Sound requirements related primarily to the demonstration of train service headway and journey time. The Sound requirements included demonstrating that the signalling system met the appropriate reliability performance, and demonstrating the readiness of the Operating Team to operate the system and recover from any potential reliability issues during service. These demonstrations were successfully completed for MFO.
2.4 A new project cannot commence passenger service without an S&S declaration from MTRCL and then a follow-up letter of ‘satisfaction’ of S&S conditions from EMSD RB. MTRCL submitted two S&S declarations for MFO: one in May and one in August. The reason for the second submission was to reconfirm the first one, following the later discovery of some issues (unrelated to the RR issue) on 23 and 25 May, which will be described later. With the second submission, EMSD RB issued a letter of satisfaction of S&S conditions for MFO on 25 August.
[3] The ISA is contracted by MTRCL to advise on critical safety aspects of the new signalling system.
3. Sequence of Events
3.1 Initial discovery: As part of the process to satisfy itself of the safety and soundness of the new signalling system and MFO (hereafter referred to as “the project”), MTRCL conducted extensive testing during NTH. The RR issue was first noted in an NTH test on 11 May. When a train that has already been given a route by the Automatic Route Setting (ARS) system physically moves past a signal, the route should be released. However, if the virtual representation of the train lags slightly behind its actual position (this phenomenon is termed “late stepping”), RR could occur. The virtual representation of trains is tracked and reported by a sub-function, Train Monitoring and Tracking (TMT), of the Automatic Train Supervision (ATS) system. When a “late stepping” issue happens, the ARS sets or “re-calls” the identical route, which can be picked up by the following train.
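The late-stepping condition described in Paragraph 3.1 can be illustrated with a toy model. The sketch below is not the ATS or ARS logic used on the East Rail Line; the class, function and field names and the position units are hypothetical. It assumes only what the paragraph states: a recall becomes possible when a train has physically passed a signal while its tracked (virtual) position still shows it approaching that signal.

```python
from dataclasses import dataclass


@dataclass
class Train:
    """Hypothetical train record; positions are arbitrary track units."""
    train_id: str
    actual_position: int   # where the train physically is
    tracked_position: int  # where the tracking function currently shows it


def route_recall_possible(train: Train, signal_position: int) -> bool:
    """Return True when this toy model predicts a route recall at the signal.

    Assumption: the route setter decides from the tracked position only,
    so it re-calls the route if the train still appears to be approaching
    the signal even though it has physically passed it.
    """
    physically_passed = train.actual_position > signal_position
    shown_as_approaching = train.tracked_position <= signal_position
    return physically_passed and shown_as_approaching


if __name__ == "__main__":
    signal = 100
    trains = [
        Train("late-stepping", actual_position=105, tracked_position=98),
        Train("in-step", actual_position=105, tracked_position=105),
        Train("approaching", actual_position=90, tracked_position=90),
    ]
    for t in trains:
        print(t.train_id, route_recall_possible(t, signal))
```

In this simplified picture only the first train triggers a recall: the tracked position lags the actual position across the signal, which is the “late stepping” case.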
3.2 An RR issue can lead to a “deadlock” situation when a number of trains cannot continue along their path, because they are blocked by another train. At bifurcations and at stations with multiple platforms, an RR issue could potentially lead to a train taking the wrong route. There are no safety implications as the signalling safety sub-systems would maintain the safe separation of trains and prevent collisions even if a train took the wrong route. However, an RR issue may result in service disruption or passenger inconvenience through a delay of service or train movement along an unintended route, even though safety is protected throughout the whole movement.
3.3 Analysis and classification of the RR issue: As soon as the problem was noted, Siemens, the contractor for the Shatin to Central Link signalling project, began analysis of the issue. On 12 May, the Siemens team presented to MTRCL their conclusion that the RR issue was caused by system overloading, which affected the TMT functions of the ATS system software. The RR issue was jointly (i.e. with MTRCL’s agreement) classified as a “medium” ATS issue and was categorised as a “Day 2” item. Day 2 items are those not considered to be critical to the commencement of service (commencement is “Day 1”) and can be resolved after commencement of MFO. The classification of it as a “medium” issue meant that it required a fix but was not critical. As the RR issue observed on 11 May had been classified as a Day 2 item and was not considered to be an S&S issue, MTRCL submitted its S&S declaration to EMSD RB later on 12 May.
3.4 Identification of corrective measure: By 4 June, Siemens presented to MTRCL a corrective measure to resolve the RR issue and suggested it could be implemented with the next version of the ATS software. According to the plan, this new version would be installed on 15 September, shortly after the scheduled commencement of MFO on 12 September. The issue remained designated as a “medium” issue.
3.5 The IP observed that there were no significant consequences of the RR issue identified at this point and, according to Siemens, the RR issue was not relevant to safety. It appears that neither Siemens nor MTRCL had contemplated the RR issue from the perspective of “soundness”, which (as explained earlier) is based primarily on headway and journey time. This is despite the possibility, albeit considered at that time to be remote, of a deadlock and/or mis-routing in a real-life RR situation. Indeed, it appears that with the identification of a corrective action, the Siemens and MTRCL teams considered the issue as temporarily “resolved” insofar as a solution had been identified for near future implementation, and that no further action was therefore required at that stage. This judgement is now known to have been flawed. The IP opines that at this stage Siemens should have provided, and MTRCL should have requested, a full investigation of the RR issue including a probability and impact statement, which should then have been included in the change request documentation (which will be discussed later). This analysis and the change process would have initiated the escalation and reporting process.
3.6 Meanwhile, on 23 May and 25 May separately, during a further phase of NTH testing, three significant issues occurred. The first incident involved the Signalling Automatic Train Supervision (ATS) subsystem and resulted in a display Greyout in the Operations Control Centre (OCC). It was concluded that the problem was the result of the activation of a data logging function, Paktel, in the ATS Subsystem during the testing, and it was determined that the data logging function would not be used in normal service operation. The major lesson learned from this incident was to be very cautious about making last-minute changes to a system. The second incident involved the shutdown of the interlocking system, and it was determined that the cause was the simultaneous manual shutdown of two of the four safety computers instead of the normal sequential shutdown. This was a procedural error, and relevant maintenance manuals were subsequently updated to highlight the precautions and appropriate methods for shutting down safety-related computers. The third incident involved a test train proceeding in the wrong direction and passing a red signal under “Restricted Manual (RM)” mode. This incident was attributed to human factors, and the training and assessment of Train Captains on RM mode driving have since been enhanced.
3.7 While the three issues on 23 May and 25 May were not directly related to the RR issue, the Greyout issue, in particular, had an impact on the resolution of the RR issue. First, resources and attention were focused on resolving these new issues. Second, because the root cause of the Greyout issue was related to the changing of a logging function, also used for debugging purposes in the ATS system, there was an increased caution in making any alterations to the ATS system, particularly as a last-minute change. It should be noted that all three issues were resolved before the scheduled date of MFO commencement, and a full report was submitted to the Government[4] and a Press Release[5] issued by MTRCL in August. A subsequent inspection of logs during the technical investigation undertaken after 12 September has shown that RR issues had occurred in the same test sequence but went unremarked, due to the belief that RR was a Day 2 issue for which a solution had already been identified, and because the SPAD (signal passed at danger), Greyout and interlocking shutdown issues demanded more immediate attention.
[4] Report for Incidents during the Two Tests for East Rail Line (EAL) Mixed Fleet Operation (MFO) on 22-23 and 24-25 May 2020, dated 13 August 2020.
[5] Press Release PR055/20 dated 17 August 2020: Report on the three incidents on East Rail Line May 2020.
3.8 Submission of EDOC to adjust TMT trace-level settings: In early July, Siemens proposed adjusting the TMT trace-level settings on the ATS system to boost its overall performance (the TMT trace-level setting alters the debugging and logging details but is different from the debugging tool Paktel that caused the Greyout). Over the course of July, as testing of the systems and analysis was ongoing, it appeared (at that time) that this adjustment could also help alleviate the TMT system overloading issue, which Siemens had reported at the end of July to be the cause of the RR issue. (This was a mitigation but not the full fix, which was still planned to be achieved through the software upgrade post-commencement of MFO.) Within MTRCL, with the formal handover of the system from the Projects team to Operations, as per normal practice, on 11 May, any change to the system required a formal application and approval through the Engineering Document (EDOC) process. The EDOC process is a change approval process for any change to a system under the custodianship of Operations. At the end of July, an EDOC was submitted to adjust the TMT trace-level settings, and this was approved on 18 August.
3.9 The EDOC process requires that all parties that could be impacted by the EDOC review and sign the documents before the system changes concerned can proceed according to the implementation steps defined therein. This is to ensure that the integrity and configuration of the system are maintained. Once an EDOC is signed, it is expected that it will be implemented. In this case, implementation of the EDOC was planned to be before commencement of MFO (as ‘strongly suggested’ in Section 4.3 of the signed EDOC). The EDOC went through two drafts from 29 July to its final signing on 18 August. The second draft explained that the change to the TMT trace-level settings would improve TMT performance, as well as reduce the likelihood of occurrence of RR. However, although this might have been the understanding at the time, the IP notes that MTRCL’s technical investigation (referred to in Paragraph 1.6 above) has subsequently shown that this TMT trace-level settings adjustment would bring no notable improvement to the RR issue.
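The sign-off rule in Paragraph 3.9 (a change may proceed only once every potentially impacted party has reviewed and signed) can be expressed as a simple check. This is a minimal sketch of that rule only; it does not describe MTRCL’s actual EDOC tooling, and the function names and party names below are hypothetical.

```python
from typing import Iterable, List


def missing_signatures(impacted: Iterable[str], signed: Iterable[str]) -> List[str]:
    """List the impacted parties that have not yet signed, in sorted order."""
    return sorted(set(impacted) - set(signed))


def change_may_proceed(impacted: Iterable[str], signed: Iterable[str]) -> bool:
    """A change may proceed only when no impacted party's signature is missing."""
    return not missing_signatures(impacted, signed)


if __name__ == "__main__":
    # Party names are illustrative assumptions, not the actual signatories.
    impacted = ["Operations", "Projects", "Contractor"]
    print(change_may_proceed(impacted, ["Operations", "Projects"]))
    print(missing_signatures(impacted, ["Operations", "Projects"]))
```

Note that the rule covers approval only. As the report goes on to describe, a signed EDOC can still remain unimplemented, so approval and implementation would need to be tracked as separate states.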
3.10 Efforts to effect adjustment in the TMT trace-level settings prior to MFO: At this stage, the RR issue continued to be a Day 2 issue. Over the course of August, there were increasing efforts to change the TMT trace-level settings, mainly because it was seen as a relatively easy procedure that would improve the performance of the ATS system. It is notable that the implications for soundness or reliability, such as possible train next stop/destination deviations or deadlocks, were not presented as reasons for making the change. However, Siemens urged MTRCL to make the TMT trace-level settings adjustment before commencement of MFO. On 7 August, Siemens “strongly suggest[ed]” implementing the change before the start of MFO, because “it is the first step to improve TMT performance without changing the software itself.” Although this “strong recommendation” was not explicitly linked by Siemens to the RR issue, it was understood to be so by the relevant MTRCL team and was articulated as such in the EDOC. It appears that the consequences of not implementing the TMT trace-level settings change were not made explicit by Siemens or the MTRCL team because it was assumed that it would be executed upon approval of the EDOC. However, Siemens never declared that the TMT trace-level settings adjustment was a pre-requisite for MFO, despite their repeated requests to implement the change. That is, Siemens never asked to stop MFO because the adjustment had not been made.
3.11 As will be discussed in later sections, the IP opines that the receipt of the “strong recommendation” to implement the change before MFO should have been the trigger point for MTRCL to proactively engage EMSD RB. The IP also believes that if the root cause and full solution requirements of RR had been investigated fully at this point (as was done by the technical investigation after 12 September), it would have been known that the cause of TMT overloading was more complex and the full solution required much more consideration than was thought at that time. The IP therefore believes that the MTRCL team initially under-estimated the complexity of the RR issue, leading to delayed reporting and resolution of the issue.
3.12 MTRCL had planned to implement the TMT trace-level settings change to test its effectiveness (with a plan to reset it after the test whilst the results were being analysed), prior to NTH testing on 18-19 August, but due to a Typhoon Signal No. 8, this activity was cancelled. However, throughout August it was believed that the EDOC would still be implemented, and subsequent attempts to implement the TMT trace-level settings adjustment were made before the end of the month.
3.13 Second S&S declaration issued: Meanwhile, the second and final S&S declaration was under preparation and was signed off for submission to EMSD RB on 17 August. As mentioned in Paragraph 3.3, the first S&S declaration submitted on 12 May did not include the RR issue, because it was not considered an S&S issue. The second S&S declaration was also issued without mention of the RR issue because it was still considered a priority Day 2 issue, i.e. its resolution was highly desirable but not essential for Day 1 MFO operations. Furthermore, it was believed (at that time) that the issue would be mitigated by the adjustment of the TMT trace-level settings prior to MFO commencement and that there was therefore no reason to raise it to EMSD RB. The IP opines that if the complexity of the RR issue had been fully understood, the issue would have been raised to EMSD RB.
3.14 Growing pressure from Siemens to adjust the TMT trace-level settings: By the end of August, there was growing pressure from Siemens to adjust the TMT trace-level settings. On 24 August, Siemens reiterated its advice to adjust the TMT trace-level settings. MTRCL advised that it would be discussed at the meeting of the Signalling Implementation Task Force (ITF) on 25 August.[6] At the meeting there was a discussion of the TMT trace-level settings adjustment and the strong need to get its implementation approved, although there is no record of the potential full consequences being explicitly articulated.
3.15 There were then two further attempts in emails on 26 August and 31 August to get the EDOC implemented. Again, there were concerns expressed internally that any changes to the ATS system could lead to unforeseen issues, and therefore there was a reluctance to make the TMT trace-level settings adjustment. It appears that the priority was to deliver the commencement of MFO and that any changes to the systems were seen as creating a risk of potentially introducing unknown new problems and were therefore advised against.
[6] The ITF is an MTRCL internal co-ordination meeting to prioritise site-work on the new signalling system, and to minimise the impact of site-work on the existing train service on East Rail Line.
3.16 Simulation tests were also carried out to assess the impact of the TMT trace-level settings change. No notable improvement to the system loading and TMT performance was observed after the simulated change to the TMT trace-level settings. The IP opines that this was another trigger for a fuller technical investigation into the RR issue, at least to establish how much benefit the TMT trace-level settings adjustment could actually bring to improve the RR issue.
3.17 Final attempt to implement the EDOC and decision to defer action by Joint T&C Safety Panel: The final attempt to implement the EDOC came in the week of 7 September. On 7 September, Siemens specified the possibility of a 20-minute delay and that a train could be wrongly routed. These issues were reiterated in an internal early morning email on 8 September.
3.18 This was then followed by a meeting on 8 September of the Joint Testing & Commissioning (T&C) Safety Panel.[7] The focus of the discussion of the RR issue at the meeting was on the deadlocking situation in the terminus. There was a clear discussion and full agreement that this was not a safety issue. The potential consequences of a 20-minute delay due to deadlocking at the terminus and at a bifurcation point were discussed and understood. However, while the potential for wrong routing at a bifurcation point was mentioned by Siemens, it did not appear to have been picked up by all the participants at the meeting.
3.19 At this Joint T&C Safety Panel meeting, a decision was made to delay the implementation of the TMT trace-level settings adjustment until after commencement of MFO for a number of reasons:
• The implementation of the TMT trace-level settings change would only mitigate and not solve the RR issue, so there was still a possibility that the issue could occur. In fact, technical investigations after 12 September have now shown that the TMT trace-level settings adjustment would bring no notable improvement to the RR issue.
• It was considered that there would be some risks entailed in implementing any change so close to the commencement of MFO, and the concern over this risk was heightened by the Greyout issue in May, which had also been related to late changes to the ATS system.
• The likelihood of occurrence of the RR issue was considered to be remote. Furthermore, if RR were to happen in service, it was considered that the situation could be detected and controlled manually by operational procedure.
[7] The Joint T&C Safety Panel is an MTRCL panel with external experts and contractor attendance. It is responsible for reviewing the readiness for any major tests and drills of the new signalling system to be carried out on site.
3.20 The IP agrees that making last-minute changes should be avoided whenever possible; however, the decision whether or not to implement the TMT trace-level settings adjustment should have been made in the light of a clear understanding of the probability and impact of the RR issue occurring and the potential benefits that the adjustment would bring. Furthermore, this decision should have been discussed with Government in order to develop an agreed way forward.
3.21 The Joint T&C Safety Panel asked the team to develop an
operational procedure to handle the deadlock issue and to further investigate what would need to be done before implementation of the TMT trace-level settings adjustment.
3.22 The development of operational measures to manage the RR
issue for passenger operations: Following the Joint T&C Safety Panel meeting on 8 September, the Projects team and Operation Control Centre (OCC) staff met on 9 and 10 September to discuss an operational procedure as the TMT trace-level settings adjustment was not going to be implemented before the commencement of MFO. There were some objections raised to the initial proposed procedure, as it would involve someone
monitoring a screen for 19 hours. In light of this concern, a more robust operational procedure was developed.
3.23 Over the course of 9 and 10 September, the operational
procedure was refined, incorporating observations from various stakeholders.
3.24 On 10 September, Siemens reiterated the need to adjust the
TMT trace-level settings. Meanwhile, there were further discussions that day among various MTRCL staff to develop the operational procedure, and by the afternoon of 10 September, a fully developed operational procedure was in place.
3.25 At the same time, the media raised questions with the
Government and MTRCL about the train routing issue and the possible consequences, including the potential for wrong routing.
3.26 Later on 10 September, the Executive was informed about the
issue and the media interest. It was agreed to hold any decision making until the issue had been discussed with Government the following day.
3.27 After discussion with relevant Government departments, it was
decided that a technical solution, rather than operational procedure, would be the optimal way to deal with the RR issue. Accordingly, on 11 September, MTRCL decided to defer the commencement of MFO and an announcement was made in the afternoon. The IP opines that this was the correct decision on 11 September. Furthermore, if in June the RR issue, its root cause, consequence and solution requirements, and even interim operational procedures had been determined and discussed with EMSD RB, a more robust and earlier decision concerning MFO could have been made.
4. Findings
4.1 Safe & Sound
4.1.1 Having reviewed the evidence, including those related to the S&S declaration, the IP is of the view that the RR
issue was not a safety related issue. MTRCL has conducted assessments of the potential for misrouting of a train including both as part of the risk review undertaken during the original project development process and subsequently as part of the technical investigation undertaken since 12 September. It has been formally assessed as having no safety impact and this view aligns with that of the ISA also given as part of the recent technical investigation undertaken.
4.1.2 While the IP could not identify any single definition for ‘soundness’, a process has been implemented based on a practice which is understood by relevant stakeholders. The scope of what must be demonstrated to Government departments to prove soundness is agreed on a project-by-project basis.
4.1.3 The assessment of the soundness of the Shatin to
Central Link (SCL) project followed the normal practice, notably by assessing and demonstrating journey time and train service headway reliability performance. The RR issue was not considered by the MTRCL team to fall into the formal criteria for reporting to Government departments based on the established S&S.
4.1.4 The IP opines that additional analysis on the root-cause
as well as the probability and impact of the RR issue on service reliability should have been undertaken from when the proposed 15 September software implementation fix was identified in June (as discussed in para 3.4). Further, even without this analysis, as the understanding of the ultimate consequences of this RR issue and the urge to make the TMT trace-level settings adjustment before commencement of MFO grew, MTRCL should have proactively discussed it with Government. Since the first identification of the issue in May, there were many opportunities to discuss the issue with Government officials through formal and ad hoc channels.
4.2 Identification, Analysis and Classification of the Issue
4.2.1 The RR issue was first identified on 10-11 May. It was considered a Day 2 item upon its identification and analysis. However, the situation started to change from early July when Siemens first suggested an adjustment to the TMT trace-level settings to improve the performance of the ATS system.
4.2.2 By early August, the linkage between the adjustment of
TMT trace-level settings and the RR issue was understood by the MTRCL team, and Siemens increased the strength of their recommendation. On 7 August, Siemens replied to an MTRCL email, stating: “We strongly suggest to implement this [trace-level] change before start of MFO because it is the first step to improve TMT performance without changing the SW [software] itself” as they considered that an improvement had been identified which was “a very simple means to gain significant performance, hence providing an extra level of confidence and therefore was strongly recommended to be done before start of revenue operation.” However, the TMT trace-level settings adjustment was still only considered by the MTRCL team to be a highly desirable short-term Day 1 mitigation measure, with the software update as the Day 2 full resolution. An EDOC was created to implement the TMT trace-level settings change before commencement of MFO. This EDOC could be considered as the beginning of the escalation process given its circulation for comments and the need for sign-off. The EDOC was finally signed off by MTRCL on 18 August.
4.2.3 The IP considers that the overall performance of the TMT
appeared to be becoming increasingly concerning over the period in question with respect to its impact on the functionality of the ARS system and the RR issue. Siemens’ change of stance to a “strong suggestion”, followed by the persistent urging to implement the TMT trace-level settings adjustment, suggests that this issue had grown in significance. This change of significance
does not appear to have been fully recognised within the MTRCL team who then underestimated the complexity of the issue and trusted that there was a simple mitigation (by way of implementation of the TMT trace-level settings adjustment). Furthermore, the MTRCL team were not sensitive enough to the potential service impact of the RR issue. The IP opines that the MTRCL team did not press Siemens sufficiently to provide a more detailed analysis and follow-up of the RR issue and the overall ARS functionality. This underestimation then influenced the reporting actions that are covered in the next sections. At the same time, it was also incumbent on Siemens, as the signalling experts and system supplier and contractor, to provide a proper analysis of the RR issue and better explain their reasoning for requesting that the trace-level settings be adjusted before commencement of MFO. It appears that too much reliance was being placed on the assumption that the trace-level settings adjustment would help, without there being a detailed analysis.
4.2.4 The technical investigation after 12 September has now shown that the RR issue is more complex, that the TMT trace-level settings adjustment “strongly suggested” by Siemens has no notable effect in terms of mitigation, and that the full resolution requires software changes.
4.3 Internal Escalation
4.3.1 MTRCL has a hierarchy of governance and decision-making bodies to oversee the commissioning of the new signalling system. Project progress is reported through a hierarchy of reports from team and project levels to Executive meetings, and to the Board’s Capital Works Committee and the Board itself. The RR issue was not reported through this path. To manage drills and exercises and to co-ordinate on-site testing and commissioning activities, there are the Joint T&C Safety Panel and the ITF. There are therefore clear and well understood paths for escalation of issues and decision making at an appropriate level depending on the seriousness of an issue. Having reviewed this governance and reporting mechanism, the IP considers it to be robust, as long as an issue has been appropriately classified and understood by the relevant teams which, in this case, it was not.
4.3.2 There were three attempts to implement the TMT trace-level settings adjustment, but these were unsuccessful due to a typhoon and the prioritisation of other work which was considered to be more urgent. The issue was then escalated through the ITF to the Joint T&C Safety Panel on 8 September as a last-minute attempt to get the TMT trace-level settings adjusted before MFO. However, by this time, commencement of MFO was only four days away. Moreover, there was an underlying concern that any last-minute software changes could lead to a repetition of the Greyout issue, which had caused significant project progress disruption between May and August. The lack of time and the uncertainty caused by previous issues created a reticence to implement the TMT trace-level settings adjustment.
4.3.3 Although the issue was eventually escalated to Director
level on 8 September at the Joint T&C Safety Panel, this was too late. The opinion of the IP is that the issue should have been escalated earlier and more widely within MTRCL from early August, when Siemens’ position on the implementation of the adjustment to the TMT trace-level settings was stated to be a “strong suggestion”. The IP therefore believes that MTRCL’s internal checking (known as ‘Second Line of Defence’) should be enhanced to detect and escalate issues early.
4.4 Reporting to Government
4.4.1 The fact that the RR issue was not considered a
reportable Day 1 issue was acceptable in May but became increasingly less so from the time Siemens first stated their suggestion in early August.
4.4.2 As discussed above, the complexity of the issue was underestimated by the MTRCL team who were not sensitive enough to the potential service impact of the issue. Both Siemens and MTRCL considered that the potential risk resulting from the RR issue would be mitigated by the implementation of the TMT trace-level settings adjustment prior to commencement of MFO. They also considered that the TMT trace-level settings adjustment was a simple procedure. The fact that the RR issue would be largely mitigated before MFO with a simple fix led MTRCL to believe that it was not an issue that warranted being raised with Government. It also limited internal discussion on the ultimate consequences of the issue and the operational procedure that could have been required.
4.4.3 The IP is of the opinion that the fact the Government was
not informed was a misjudgement on the part of MTRCL. Given this, and despite the belief at that time that a simple change was going to be made before commencement of MFO, the Panel opines that the Government should have been informed by MTRCL. Furthermore, once the decision not to implement the TMT trace-level settings adjustment was made at the Joint T&C Safety Panel, MTRCL should have discussed the issue with Government to agree a proposed way forward.
4.4.4 MTRCL’s Project Integrated Management System (PIMS)
guidelines on whether, when and how Government departments should be informed of potential soundness issues and the overall Sound process are unclear. Therefore, the IP recommends that PIMS should be reviewed to provide more clarity on the Sound process and reporting responsibilities. This should include clear reference to and guidance on working with the Government’s newly introduced Service Reliability Reporting mechanism.
4.5 Advisers
4.5.1 Modern signalling systems are highly complex. While MTRCL has a high level of signalling expertise there is inevitably a degree of reliance on the deep system knowledge within the contractor organisation. Therefore, in order to support the assurance process MTRCL employs specialist advisers from whom advice can be sought, namely, the ISA and the Independent Reviewer (IR). As the ISA is contracted by MTRCL to advise on critical safety aspects of the new signalling system and given that this issue was never contemplated to be a safety issue by MTRCL and Siemens, it is understandable that the ISA was not engaged on this issue. However, the IR is contracted by MTRCL to provide advice and guidance on the technical maturity of the new signalling system and specifically on issues pertaining to performance and reliability. The IR was not consulted, despite working from the same office space. The IP considers that the IR should have been engaged as soon as Siemens made their “strong suggestion” and before making the decision as to whether or not to implement the TMT trace-level settings adjustment.
5. Summary of Findings
5.1 The key findings are as follows:
5.1.1 The RR issue was not a safety issue; it was one of reliability.
5.1.2 The root cause of the issue was not investigated
thoroughly for a number of reasons: there was a diversion of attention to issues of greater potential operational and safety impact (23 and 25 May); the possibility of an RR was considered remote; and both Siemens and MTRCL believed that the proposed TMT trace-level settings adjustment was to be implemented before commencement of MFO and the issue would be sufficiently mitigated (now known to be incorrect).
Furthermore, it was thought that the problem would be fully resolved shortly after commencement of MFO by a planned software upgrade. The IP opines that the MTRCL team and Siemens should have carried out a more detailed investigation earlier. Had they done so they would have realised there were additional underlying root causes to the issue, which were more complex. This would likely have alerted them to the necessity to escalate the matter within MTRCL and report it to Government. It was an error of judgement not to carry out a more detailed investigation earlier. The MTRCL team and Siemens were aware of the consequences of misrouting of a train, with a risk review undertaken as part of the project development process. It is considered this judgement error was influenced by the aforementioned factors and the IP found no evidence of deliberate concealment by MTRCL or Siemens of the RR issue.
5.1.3 The full consequences of the RR issue were not explicitly
articulated until 1 September. The IP opines that the consequences should have been better communicated by both Siemens and the MTRCL team.
5.1.4 The technical investigation conducted after 12
September has now identified the root cause of the RR issue and a full technical solution is being developed. This should have been done earlier by Siemens and MTRCL.
5.1.5 The issue was escalated internally to MTRCL Director
level, when it was realised that the TMT trace-level settings adjustment was not going to be undertaken before commencement of MFO. However, this was too late. Rather, it should have been escalated as soon as the contractor, Siemens, first stated their position to be a “strong suggestion”, that the TMT trace-level settings adjustment should be implemented before the commencement of MFO.
5.1.6 While it is not considered by the MTRCL work team to fall
within the pre-existing classification of matters required to be reported to relevant Government departments, the IP
considers that this issue should have been discussed with the relevant Government departments. The IP notes and supports the recently added requirement of a Service Reliability Report, which would have reported matters such as the unresolved RR issue, alongside the existing System Safety Report in the S&S process.
6. Recommendations
6.1 The Panel has the following recommendations after reviewing the relevant procedures and evidence:
6.1.1 There was no specific articulation, assessment, or questioning of the probability or impact of the RR issue, nor an assessment of the potential reduction in the likelihood of an RR occurrence by the proposed adjustment to the TMT trace-level settings. Once the issue had been identified, the potential consequences of not implementing the change should have been explicitly articulated in the TMT trace-level settings adjustment documentation. The IP recommends a review of the PIMS and EDOC processes to give clear guidance on:
• The need to make explicit the probability and impact of
the subject issue occurring
• The quantifiable benefits or improvements expected from implementing a proposed change
• A clear recommendation of when the change should be made and the relevance of that timing.
6.1.2 Relevant staff should be briefed to raise the sensitivity and awareness of public concerns, effective communication and the importance of service quality and reliability issues, in addition to the current focus on safety.
6.1.3 As the importance of the RR issue grew with time, it
should have been escalated more quickly and better use
should have been made of the IR. Furthermore, Government should have been informed. However, there was insufficient clarity in terms of whether this should have been reported to or shared with the relevant Government departments and by whom. The IP recommends a revision of PIMS to add more clarity on the use of both ISAs and IRs, and on escalation and reporting, including the importance and requirements of the S&S submission process.
6.1.4 The Second Line of Defence arrangements within
MTRCL should be enhanced in order to detect and escalate important issues early.
6.1.5 The IP is of the opinion that while the RR issue did not
concern safety, it was one of reliability and should have been reported to Government. The IP strongly supports the recent introduction of the Service Reliability Report, to accompany the existing System Safety Report. This should ensure that Government departments are kept adequately informed of all significant reliability issues in the future.
Appendix 2
Technical Investigation Report on Route Recall Issue
Submitted by: Dr. Tony Lee, Operations Director, Chairman of the Technical Investigation Core Team
Date: 21 January 2021
Table of Contents
Executive Summary
1. Preamble
2. Route Recall (RR) Issue
3. Technical Investigation
4. Findings on the RR Issue
5. Two Additional Automatic Route Setting issues
6. Known system performance improvements
7. Conclusions
8. Recommendations
Annex 1: Technical Investigation Core Team Members
Annex 2: TMT Data Processing between ATS modules “lidi” and “TMT”
Annex 3: Delayed TD Stepping Phenomenon
Executive Summary
Subsequent to the deferral of the planned Mixed Fleet Operation (MFO)
of the East Rail Line (EAL) on 12 September 2020 due to the “Route
Recall (RR)” Issue of the Automatic Train Supervision (ATS) subsystem
in the new signalling system, a technical investigation on the root cause
of the RR Issue together with other technical issues related to the
launching of EAL new signalling system has been conducted by the
Technical Investigation Core Team comprising MTR Operations and
Projects teams, Siemens (the Contractor) and External Technical
Advisors.
The Core Team conducted a series of simulations and train tests
during non-traffic hours (NTH) on 12, 19 and 28 October 2020 and
reviewed all the observed results to ascertain the root cause of the RR
Issue and develop the technical solutions.
The investigation confirmed that the RR Issue is not a safety issue but
a service reliability issue. It could lead to potential routing of a train to
an unintended destination and may generate false “Signal Passed at
Danger (SPAD)” alarms. It was attributed to the unexpectedly high
volume of data in the new Fault Classification Update (FCU) software
routine between two ATS modules, a non-standard piece of software
specifically built by the Contractor as an add-on feature to fulfil MTRCL’s
requirement to provide extra train monitoring information to the Traffic
Controller in the Operations Control Centre (OCC). The root cause of the
RR Issue is a software defect in the new FCU software routine
which was not identified during testing carried out by the Contractor.
The overall system performance for normal train operation will not be
affected if the new FCU software routine is shut down. The provisioning
of the FCU software routine for future use, if
needed, will be subject to Government approval with further design,
testing and validation processes before its introduction.
Simulation tests done in August 2020 and the series of NTH testing in
October 2020 also confirmed that the adjustment of the “Train
Monitoring and Tracking (TMT)” trace-level settings inside the ATS
subsystem as proposed by the Contractor from July to early September
2020 would have no observable effect in resolving the RR issue.
To rectify the RR issue, the Contractor has (a) shut down the new FCU
software routine, (b) upgraded the computer hardware and (c) upgraded
the Automatic Route Setting (ARS) software as an extra assurance to
prevent wrong routing. MTRCL and the Contractor have conducted a
series of NTH tests and verified the effectiveness of these measures in
eliminating the RR Issue.
During the NTH train testing on 12 October and 6 December 2020, two
additional ARS issues led to a train route being assigned to
an unintended platform of the correct station, due to another software
defect and a software installation issue respectively. They have been
fixed by upgrading the corresponding software and by enhancing the
software upgrading control & authorization procedures respectively.
In the course of the technical investigation of the RR issue, the Core
Team has also taken the opportunity to review latest known signalling
system performance improvement items as at early January 2021 with
EMSD, Highways Department (HyD) and Transport Department (TD).
They have been properly addressed with the appropriate operational
measures derived to handle any related incidents during service
particularly those that may lead to minor service impact after MFO
commencement.
MTRCL will continue to monitor the performance of the signalling
system closely so that any further improvement areas can be identified
through operational experience after MFO. MTRCL will keep
optimizing and enhancing its performance.
It is recommended to commence MFO after the completion of the Safe and
Sound process.
1. Preamble
1.1 North South Line (NSL), which is being built under the Shatin to
Central Link (SCL) project, is an extension of the existing East
Rail Line (EAL) to cross the harbor to the existing Admiralty
Station (ADM). The existing EAL Signalling System will be
migrated to the new EAL Signalling System prior to the opening
of NSL. The new EAL Signalling System will be put in passenger
service on EAL from the commencement of Mixed Fleet
Operation (MFO).
1.2 A new signalling system, introduced together with MFO, is necessary
to replace the existing aged signalling system, meet the
requirements of the extended part of the East Rail Line (EAL) to
Admiralty, and support the gradual conversion from 12-car to 9-car train
operation during MFO, leading to final 9-car train operation for
the SCL project.
1.3 In preparation for MFO on the East Rail Line (EAL), two tests
were conducted during the Non-Traffic Hours (NTH) of 22-23
and 24-25 May 2020 respectively to demonstrate the operation
of the new EAL signalling system. Two independent incidents on
system behavior were reported during the tests. One was an
Automatic Train Supervision (ATS) Line Overview Display grey-out
caused by activation of a data logging function, Paktel [1],
which was wrongly deployed for normal operation and degraded
the processing performance. Upon subsequent detailed
investigation of the root cause of the incident, this logging
function has since been banned from use. The other one was an
interlocking shutdown incident, which was related to the
simultaneous manual shutdown of the signalling interlocking
computers instead of a sequential shutdown. To
address the issue, a revision to the relevant maintenance manual has
been arranged, together with the necessary briefing to relevant staff
and the posting of prominent warning notices beside the related
computers to prevent simultaneous manual shutdown. For
these incidents, MTRCL had submitted the investigation report
with findings reviewed and accepted by the Government on 17
August 2020 and the improvement action plan as mentioned
above has been implemented.
1.4 Thereafter, MTRCL planned to commission the new signalling
system and gradually introduce new 9-car trains on the EAL
(collectively, “MFO”) starting from 12 September 2020.
[1] Paktel is a data logging function designed for use in testing and maintenance.
2. Route Recall Issue
2.1 On 10 September 2020, the media raised questions with the
Government and MTRCL about the train routing issue and the
possible consequences, including the potential for wrong routing.
On 11 September 2020, MTRCL and Siemens (the Contractor)
notified the Government about an issue observed during train
testing in non-traffic hours on 11 May 2020 (the “Route Recall
Issue”). In essence, each scheduled train shall have a pre-
determined route in the pre-set train operation schedule. The
symptom of the Route Recall (RR) Issue was a slow response of
the ATS system so that, even though a train had already entered
the designated route, the route for this train was recalled by the
Automatic Route Setting function (ARS) [2] for the following train.
As such, for train routing at a bifurcation, it was possible that the
following train may have gone to an unintended route and
destination e.g. Sheung Shui to Lo Wu/Lok Ma Chau, and
between Shatin and University via Fo Tan/Racecourse. Upon a
final review of the new system prior to service commencement
on 11 September 2020, the planned MFO commencement
scheduled for 12 September 2020 was held over to allow MTRCL
to find out the root cause and technical solution instead of using
operational measures to handle the issue.
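The mechanism described in paragraph 2.1 can be illustrated with a deliberately simplified sketch. It is not the Contractor's ATS or ARS logic: the function name `assign_routes`, the `confirmation_lag` parameter and the train identifiers are hypothetical, and the lag merely stands in for the slow ATS response. The model assumes that the route is set for the earliest scheduled train that the supervisor has not yet seen enter its route.

```python
def assign_routes(schedule, confirmation_lag):
    """Toy model of route setting at a bifurcation (illustrative assumption only).

    schedule: list of (train_id, intended_route) in arrival order.
    confirmation_lag: number of arrivals by which the supervisor's view of
        "this train has entered its route" trails reality (0 = no delay).
    Returns {train_id: route actually set when that train arrives}.
    """
    assigned = {}
    for arrival_index, (train_id, _intended_route) in enumerate(schedule):
        # Trains the supervisor believes have already entered their routes.
        confirmed = max(0, arrival_index - confirmation_lag)
        # The route is set for the earliest train not yet confirmed, which
        # may be a train that has in fact already gone through.
        _, route_set = schedule[confirmed]
        assigned[train_id] = route_set
    return assigned


# Hypothetical two-train schedule at the Sheung Shui bifurcation.
schedule = [("T1", "Lo Wu"), ("T2", "Lok Ma Chau")]

on_time = assign_routes(schedule, confirmation_lag=0)
delayed = assign_routes(schedule, confirmation_lag=1)
```

With no lag, each train receives its own route. With a lag of one arrival, the route for T1 is set again and T2 is sent towards Lo Wu instead of Lok Ma Chau, which corresponds to the misrouting symptom described in paragraph 2.1.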
[2] Automatic Route Setting (ARS) is a function in ATS to perform route setting for train movement
automatically.
2.2 Subsequent technical investigation revealed that RR also
occurred on 23 May during the NTH testing when the ATS grey
out was observed, which was due to the activation of a specific
data logging function, Paktel, in the ATS subsystem, as
mentioned in paragraph 1.3. On some occasions, a false “Signal
Passed At Danger (SPAD)” [3] alarm accompanied the RR as well,
although no train had physically passed a
stop signal at red. The SPAD alarm is intended to
alert operators to intervene when a train is detected passing
a stop signal at red without authorization.
3. Technical Investigation
3.1 A Technical Investigation Core Team was set up on 18
September 2020 to examine the root cause of the RR Issue and
develop technical solutions before the commencement of MFO.
The Core Team is staffed with MTRCL staff from Operations and
Projects teams, Contractor and External Technical Advisors (as
listed in Annex 1). The methodology adopted in the investigation
was primarily based on Root Cause Analysis i.e. identifying