Predicting Post-SafeTrack Metro Reliability
Georgetown SCS Data Science
Team Members: Patrick McGrady, Micah Melling, and Drew Wheatley
Problem Statement:
With an average weekday ridership exceeding 800,000 passengers, the Washington DC Metro is the second busiest rapid transit system in the United States. Through an ever-expanding
hub-and-spoke system of 91 stations, Metro provides service to two states and the District of
Columbia. Many riders choose Metro as an alternative to what is arguably the worst street traffic
congestion in the country. Given the Washington-area population’s heavy reliance on the rail
system, delays in train service can lead to serious losses in productivity.
In May 2016, following a series of high-profile delays, a deadly smoke crisis affecting the
Yellow Line, and a blistering report from the National Transportation Safety Board, Metro
officials announced the SafeTrack project. SafeTrack is a comprehensive track work
maintenance effort designed to improve safety and reliability. Track work was previously
restricted to the 33 hours a week when train service was shut down, but SafeTrack calls for
maintenance work that cuts into Metro’s operating schedule. This, in turn, leads to station
shutdowns, widespread single-tracking, and reduced service hours. WMATA officials say the
project will take 12 months and has an estimated price tag of $60 million.
The three members of our team, each a Metro commuter, were curious about the potential effect
SafeTrack would have on our daily schedules and on the region as a whole. We set out to create a
Metrorail simulation model to inform riders about the potential impact of SafeTrack on their
commutes. At the conclusion of our project, we wanted to gauge the effectiveness of the
maintenance project and answer the question on every Metro commuter’s mind: Will it be worth
it?
Methodology:
Our project adhered to the Data Science Pipeline outlined by Tony Ojeda and Ben Bengfort,
which identifies five stages of data research. We will show below how each step led us to our
final product.
Data Ingestion and Wrangling:
To develop a simulation model for DC’s Metrorail system, we needed two main data inputs:
1) the theoretical runtime of each line and 2) the ways in which that runtime is interrupted.
At a high level, we needed data that would allow us to portray, as accurately as possible, how the
Metro system is kept from reaching its theoretical operating condition.
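To make the role of these two inputs concrete, here is a minimal sketch of how a theoretical runtime and a set of disruptions could be combined in a Monte Carlo simulation. It is an illustration under stated assumptions, not the model built for this project: the function names, the independence of disruptions, and every probability and delay value below are hypothetical and are not drawn from the WMATA data.

```python
import random


def simulate_trip(theoretical_minutes, disruptions, rng):
    """Return one simulated end-to-end runtime in minutes.

    disruptions is a list of (probability, delay_minutes) pairs.
    Assumption: each disruption occurs independently of the others.
    """
    total = theoretical_minutes
    for probability, delay_minutes in disruptions:
        # Draw once per disruption type; add its delay if it occurs.
        if rng.random() < probability:
            total += delay_minutes
    return total


def mean_runtime(theoretical_minutes, disruptions, trials=10_000, seed=0):
    """Average simulated runtime over many trials (seeded for repeatability)."""
    rng = random.Random(seed)
    runs = [
        simulate_trip(theoretical_minutes, disruptions, rng)
        for _ in range(trials)
    ]
    return sum(runs) / trials


if __name__ == "__main__":
    # Hypothetical inputs: a 40-minute timetable runtime and two
    # made-up disruption types (probability, delay in minutes).
    example_disruptions = [(0.10, 6), (0.02, 15)]
    print(mean_runtime(40, example_disruptions))
```

The gap between the simulated mean and the timetable value is the quantity of interest: it summarizes how far disruptions pull a line away from its theoretical operating condition.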
Data on theoretical perfect runtimes were obtained from timetables on WMATA’s website. To
obtain data on system interruptions, we attempted to pull delay data from WMATA’s API.
However, we discovered that WMATA no longer made this data available, necessitating a
change of course.
Therefore, our team used a list of disruption reports found on the website Open Data DC. The
dataset was made available by a WMATA employee who was formerly tasked with compiling
this information and adding it to wmata.com. The dataset includes 23,630 instances of delays on
Metro trains. Each instance includes date, time, the line on which the incident occurred, the
direction the train was heading, a brief description of the incident, a cause, and the length of delay
in minutes. The data was available as a downloadable CSV file. There were 349 different
types of disruption “causes”, which we reclassified into three categories:
Technical: technology/mechanical failure (3rd rail power fail, radio malfunction, signal
problem, switch issue, unscheduled maintenance)
Operational: act of nature/scheduling issues (fires, medical emergency, police