

Journal of Big Data Analytics in Transportation (2020) 2:1–16 https://doi.org/10.1007/s42421-020-00014-z

ORIGINAL PAPER

Impact of Bikesharing Pricing Policies on Usage and Revenue: An Evaluation Through Curation of Large Datasets from Revenue Transactions and Trips

Mohan Venigalla1  · Shruthi Kaviti2 · Thomas Brennan3

Received: 20 September 2019 / Revised: 17 January 2020 / Accepted: 22 January 2020 / Published online: 3 February 2020 © Springer Nature Singapore Pte Ltd. 2020

Abstract A sustainable and robust stream of revenue is an essential element of the economic sustainability of bikeshare systems. To this end, there is a pressing need to quantify and understand the impacts of pricing policies and operational considerations on a bikeshare system’s revenue and ridership. A notable gap exists in the literature on the impact of pricing policy changes on ridership and revenue. The primary objective of this research was to assess the impact on ridership and revenue of the introduction of a $2 per trip single-trip fare (STF) product for casual users by Capital Bikeshare (CaBi), the bikeshare system in the Washington DC metro area. Two-year ridership and revenue transaction datasets of CaBi were used in the analysis. A substantial data curation effort was undertaken to fuse elements between the two large transactional datasets. The effort not only facilitated the impact assessment but also enhanced the value of the ridership dataset by identifying trips made by casual users by the type of fare product they purchased. Casual user revenues were traced to the individual bikeshare stations where trips originated, which allowed revenues and ridership ‘before’ and ‘after’ the launch of STF to be compared at the station level. Over 22 million records on individual bikeshare trips and revenue transactions for 3 years and 330 bikeshare stations were analyzed. The results showed a statistically significant increase in casual user ridership after the introduction of STF, along with a statistically significant decrease in revenue per ride. Statistical tests indicated that these changes might be attributable to the introduction of STF. The methods used in this study are transferable: they can be used to curate ridership data and to study the impacts of pricing policy changes on system usage and revenue at public bikesharing systems with characteristics similar to those of Capital Bikeshare.

Keywords Bike share · Pricing · Revenue · Ridership · Data curation · Data fusion · Big data · Price sensitivity

Introduction

The rapid growth of shared mobility services such as carpooling/ridesharing, ride hailing, carsharing and micromobility (which includes scootersharing and bikesharing) in recent years has reshaped the landscape of urban transportation systems. Since the 2007 launch of Velib by the city of Paris (France), public bikeshare systems have caused major disruption to urban transportation systems around the world. DeMaio (2009) argued that, as an emergent mode of mass transportation, bikesharing has the potential for profound impact in decreasing greenhouse gases and improving public health. According to various reports, there are over 100 bikesharing systems in cities and university campuses across the USA, over 50 of which have been in operation for 5 years or more. At present, the top five bikeshare operators collectively manage bikesharing systems in 25 major US cities (NACTO 2018). These systems include 3rd generation (dock based), 4th generation (dockless), electric bikesharing and electric scootersharing. During 2010–2017, an estimated 160 million or more trips were registered at bikesharing systems in US cities (NACTO 2018), and in 2018 alone there were 36.5 million bikeshare trips (NACTO 2019).

* Mohan Venigalla [email protected]

1 Office of the Secretary of Transportation (OST), US DOT, George Mason University, Fairfax, VA 22030, USA

2 Institute of Aeronautical Engineering (IARE), Hyderabad 500 043, India

3 The College of New Jersey, Armstrong Hall-RM 173, 2000 Pennington RD, Ewing Township, NJ 08628, USA


Although bikeshare reduces the burden of ownership and increases travel choices, a variety of structural and financial barriers prevent low-income communities from fully accessing these systems. Increasing participation in shared ridership in the region has the potential to advance the system’s economic sustainability. Robust participation is also an indication of successful strategies that extend benefits to more people in the system. It is therefore necessary to quantify and understand how pricing policy and operational considerations affect a bikeshare system’s revenues and ridership. As with transit, the cost of a bikeshare trip plays a major role in the mode choice behavior of users. To cater to user preferences, improve service and increase ridership, bikeshare providers routinely change the pricing of existing fare products, introduce new products and alter the menu of pricing models for all user types. Pucher et al. (2010) and De Nazelle et al. (2011) have addressed the implications of infrastructure and pricing policy for cycling usage. Despite the importance of pricing to bikeshare patronage, few studies have focused on the impact of a well-defined pricing strategy on revenue and ridership (Venigalla et al. 2018, 2019; Kaviti 2018; Kaviti et al. 2018; Kaviti and Venigalla 2019).

The primary goal of this research is to examine the impact of changes made to bikeshare pricing policy on bikesharing usage and revenue. This goal is accomplished by curating, fusing and analyzing large datasets on revenue transactions and system usage at Capital Bikeshare in the metropolitan Washington DC area.

Motivation

Public bikesharing programs typically serve three user groups: members (users with an annual or monthly membership); casual users (short-term users who purchase a single trip or a 24-h or multiday pass); and occasional members (users with a special key to pay for a short-term pass) (Shaheen et al. 2016). Subscriptions from members provide a steady stream of revenue to bikesharing programs, so many bikesharing providers emphasize catering to members’ preferences. On the other hand, in 2012, casual users of bikeshare programs in North America generated the largest share of revenue through membership and usage fees, ranging from 44 to 67% of a program’s total revenue (Shaheen et al. 2014). Casual users continue to account for over 70% of total revenue at Capital Bikeshare (Venigalla et al. 2018).

The motivation for this research was the policy decision by Capital Bikeshare (CaBi), the public bikeshare system in the Metro Washington DC area, to launch a single-trip fare (STF) product for its casual users in 2016. Overseen by the District Department of Transportation (DDOT), CaBi currently has over 500 stations and more than 5000 pedal bikes and is frequently expanding its coverage in the region (DDOT 2015; CaBi 2019). CaBi serves three types of users: casual users, occasional members, and registered members. Casual users and registered members combined constitute more than 98% of CaBi users (Venigalla et al. 2018). As of September 2019, CaBi’s prominent fare products were priced as follows.

Casual users:

• Single-trip fare (STF) for $2, for trips up to 30 min in duration (introduced in June 2016).

• 24-h pass for $8, for unlimited trips of 30-min duration or less in the 24-h period after the pass is purchased.

• 3-day pass for $17, for unlimited trips of 30-min duration or less in the 72-h period after the pass is purchased.

Registered members/occasional members:

• 30-day (monthly) pass for $28, valid for 30 days, for unlimited trips of 30-min duration or less.

• Annual pass for $85, valid for 365 days, for unlimited trips of 30-min duration or less.

In addition to the subscription fee, CaBi riders incur usage fees for trip durations exceeding 30 min.
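The casual fare prices above imply a simple break-even point between paying per ride and buying a pass. The Python sketch below computes it under two stated assumptions: every trip lasts 30 min or less, and usage fees are ignored (their amounts are not listed here). The helper names are hypothetical.

```python
# Casual fare prices listed above (as of September 2019), in US dollars.
# Assumption: every trip is 30 min or less, so no usage fees apply.
STF_PRICE = 2.00         # single-trip fare, per trip
PASS_24H_PRICE = 8.00    # unlimited 30-min trips within 24 h
PASS_3DAY_PRICE = 17.00  # unlimited 30-min trips within 72 h


def break_even_trips(pass_price: float, per_trip: float = STF_PRICE) -> float:
    """Number of single-trip fares that cost the same as one pass."""
    return pass_price / per_trip


def cheaper_within_24h(trips: int) -> str:
    """Compare paying STF for each trip against one 24-h pass (hypothetical helper)."""
    stf_total = trips * STF_PRICE
    if stf_total < PASS_24H_PRICE:
        return "STF"
    if stf_total > PASS_24H_PRICE:
        return "24-h pass"
    return "either"


print(break_even_trips(PASS_24H_PRICE))   # 4.0
print(break_even_trips(PASS_3DAY_PRICE))  # 8.5
# 1 to 6 trips in one day: STF, STF, STF, either, 24-h pass, 24-h pass
print([cheaper_within_24h(n) for n in range(1, 7)])
```

Under these assumptions, STF is cheaper than a 24-h pass for up to three trips in a day, and cheaper than a 3-day pass for up to eight trips in 72 h.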

CaBi added the STF product for casual users in June 2016, in conjunction with the first scheduled SafeTrack, a track maintenance and safety rehabilitation initiative of the Metro Rail system operated by the Washington Metropolitan Area Transit Authority (WMATA). During this rehabilitation, Metro Rail encouraged alternative travel options because of expected delays and capacity restrictions. CaBi’s rationale for charging per ride, as opposed to offering only 24-h and 3-day passes for casual use, was that a fixed cost per ride could widen the appeal of Capital Bikeshare beyond its current subscriber base to new audiences seeking alternative travel options during SafeTrack. The STF option was also aimed at potentially drawing new registered members to regular bikeshare use. Within a short time after its launch, STF became a very popular fare option among CaBi users (Venigalla et al. 2018). However, neither the effect of the STF price nor the effect of the launch timing on acceptance by CaBi users was studied before STF was introduced. A few months after the launch of STF, CaBi initiated this structured evaluation of the impact of STF on its revenues and ridership.

Research Objectives

The primary objective of the study that led to this paper was to evaluate the impact of this popular new fare product, STF, on revenue and ridership in the Capital Bikeshare system through disaggregate analysis of revenue and ridership data. This paper specifically addresses the following research questions:

(i) Are there any statistically significant changes in revenue and ridership from casual users of Capital Bikeshare after the launch of STF?

(ii) If there is a significant change, is it attributable to the launch of STF, or is it simply an extension of the background trend that existed before the launch?
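Question (ii) asks whether a post-launch change departs from the pre-launch trend. One common way to frame this, shown here only as an illustration and not necessarily the statistical tests used in this study, is segmented regression on an interrupted time series: y_t = b0 + b1·t + b2·D_t + b3·(t − T0)·D_t, where D_t = 1 from the launch period T0 onward. A nonzero b2 or b3 indicates a level or trend change beyond the background trend. The NumPy sketch below fits the coefficients only; it does not model seasonality, weather, or autocorrelation, and it omits the standard errors needed for significance testing.

```python
import numpy as np


def segmented_regression(y, launch_index):
    """Fit y_t = b0 + b1*t + b2*D_t + b3*(t - T0)*D_t by ordinary least squares.

    y            : series of counts per period (e.g., monthly casual trips)
    launch_index : index T0 of the first period after the launch
    Returns (b0, b1, b2, b3): intercept, pre-launch trend,
    level change at launch, and change in trend after launch.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    after = (t >= launch_index).astype(float)  # D_t indicator
    X = np.column_stack([np.ones_like(t), t, after, (t - launch_index) * after])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Synthetic (made-up) series: 12 periods before and 12 after a launch,
# with a steady trend of +2 per period plus a jump of 30 at the launch.
t = np.arange(24)
y = 100 + 2 * t + 30 * (t >= 12)
coef = segmented_regression(y, launch_index=12)
print(np.round(coef, 6))  # close to 100, 2, 30, 0
```

With real data, the pre-launch trend b1 captures the background growth, so a significant b2 or b3 is what would point to the launch rather than the existing trend.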

Publicly available trip data on CaBi ridership do not identify casual trips by fare type (namely, STF, 24-h pass, and 3-day pass). The ridership data alone were therefore inadequate for addressing the above research questions, and curating the ridership data to identify trip type emerged as a secondary objective of the study. The availability of large amounts of ridership data at the individual trip level and revenue data at the transaction level provided an opportunity to accomplish the study objectives.

Literature Overview

Of particular interest to this research are studies of subscription products, or ‘fare products’, and their pricing, as these play a key role in bikeshare policy and practice. The literature search focused on two primary themes: studies that employed disaggregate analyses of ridership and revenue data at the level of individual stations, and studies that examined the impact of pricing on bikeshare systems’ ridership and revenues.

Station‑Level Analysis of Bikeshare Usage Data

Rixey (2013) studied the impact of demographic and built environment characteristics on station-level bikeshare ridership for the CaBi, Denver B-cycle, and NiceRide MN systems. The results indicated that bikeshare ridership is positively correlated with population and retail job density; the presence of bikeways; and the number of bike, walk, and transit commuters. The findings also showed that minority population and days of precipitation are negatively associated with station-level bikeshare ridership. El-Assi et al. (2017) conducted a similar study to identify factors affecting Toronto’s bikeshare demand at the station level by developing trip generation models. The study further developed a station-pair regression model, which showed that bicycle ridership was positively correlated with increased infrastructure and with fewer intersections with major roads, and negatively correlated with distance. Ma et al. (2014) explored the linkages between bikeshare and transit at the station level and demonstrated that bikesharing programs can help increase transit ridership. The analysis showed that Metrorail stations are important origins and destinations for Capital Bikeshare trips and concluded that an increase in bikeshare trips would also increase transit ridership.

A few studies discussed how regression models could be used to estimate bikeshare ridership at the station level. Zhang et al. (2017) developed multiple linear regression models to study the effect of built environment variables on trip demand and the ratio of demand to supply (D/S) at the station level for the public bikesharing system in Zhongshan, China. The results showed that both trip demand and D/S were positively correlated with population density, length of bike lanes, and diverse land-use types near the station. The findings also suggest that adding a new station with additional capacity within a 300-m radius of an existing station can improve D/S at the station level. Wang et al. (2015) developed regression models to identify factors affecting bike station activity for Nice Ride Minnesota. The study showed that proximity to the Central Business District, academic campuses and parks, and access to off-street paths have the largest marginal effects on station use, whereas sociodemographic and economic variables have minimal marginal effects.

de Chardon and Caruso (2015) compared various aggregation models for estimating daily trips at public bikeshare systems. The study developed day aggregation, interval aggregation and station aggregation models to estimate the number of daily trips for eight major bicycle sharing systems in Europe and North America. The results showed that the daily aggregate model provides better trip estimates than the other models. In a study of big data associated with cycling, Romanillos et al. (2016) pointed towards possible new insights from analyzing combined spatio-temporal cycling data sets together with more conventional health, socio-demographic or transport data.

Research comparing aggregate and disaggregate models for predicting bikeshare demand is sparse. Biehl et al. (2018) developed two Generalized Linear Models, one at the station level and one at the community level, to predict average annual daily bicyclists for Chicago’s Divvy bikeshare system. The results show that the station-level analysis has greater predictive capacity than the community-level analysis, and that averaging disaggregate results to represent community areas is more accurate than the aggregate model. This is because the disaggregate model contains more information regarding the bikeshare system, built environment and socioeconomic factors that affect bike usage.

Studies Related to Impact of Pricing on Usage

Though numerous studies have discussed factors affecting bikeshare ridership, very few included pricing as one of those factors (Kaviti 2018; Kaviti et al. 2019). Jurdak (2013) analyzed the time-specific cost structure of the public bikesharing systems in Boston and Washington, DC, and observed that registered users exhibit higher cost sensitivity around the 30- and 60-min pricing boundaries than casual users. One of the study’s recommendations is that incentives be provided to bikeshare users on specific congested roads, with dynamic pricing based on current traffic conditions. Goodman and Cheshire (2014) examined how the profile of income-deprived and women users changed in the first 3 years of operation of the London Bicycle Sharing System (LBSS). The percentage of income-deprived users doubled as the LBSS expanded into areas with low-income populations, and women users made up a higher share of casual trips. However, these positive developments were partially offset by a 50% increase in the then-prevailing LBSS prices. The study further argues that bikeshare fares should be kept within a reasonable range to maximize bikeshare usage and make the system more equitable for all users.

A report by Venigalla et al. (2018) and a research paper by Kaviti et al. (2018) discussed the impact of CaBi’s launch of the $2/trip STF on its revenue and ridership at the jurisdiction level. These two studies examined the interrelationship of revenue and ridership with other system variables such as supply (as measured by the number of stations and bike racks or docks), jurisdiction, seasonality, transit disruptions, day of week and precipitation. Aggregate analysis performed for two urban jurisdictions (Washington DC and Arlington, VA) and two suburban jurisdictions (Alexandria, VA, and Montgomery County, MD) showed a significant increase in casual user ridership between two equal 12-month periods before and after the introduction of STF. However, the jurisdiction-level analysis of the impact of STF on casual user revenue was inconclusive. Though notable changes were observed in revenues aggregated at the jurisdiction level, the paper could not verify whether the changes observed after the introduction of STF were, in fact, attributable to STF. The analysis by Kaviti et al. (2018) was primarily based on ridership and revenue data aggregated by month and jurisdiction, which lack fidelity at the daily and station levels. Furthermore, by normalizing revenues and ridership on a ‘per-dock’ basis, that analysis included new stations with sparse ridership, which diluted the true impact of the single-trip fare at high-ridership stations. Therefore, disaggregate analysis at the station level (i.e., analysis of individual trips and revenue transactions by station) could provide additional valuable insights on the impact of STF.
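The dilution argument can be made concrete with a small arithmetic example. The Python sketch below uses made-up trip and dock counts for hypothetical stations A to D (not CaBi data): two existing stations grow after the launch, while two new low-ridership stations enter the system. Normalizing by all docks makes the system-wide per-dock rate fall, even though the existing stations grew.

```python
# Hypothetical monthly casual trips and dock counts (illustrative only).
DOCKS = {"A": 20, "B": 15, "C": 15, "D": 15}
BEFORE = {"A": 900, "B": 600}                    # period before the launch
AFTER = {"A": 1200, "B": 800, "C": 40, "D": 20}  # C and D are new stations
EXISTING = set(BEFORE)


def trips_per_dock(trips, stations):
    """Aggregate trips divided by aggregate docks over the given stations."""
    return sum(trips[s] for s in stations) / sum(DOCKS[s] for s in stations)


before_rate = trips_per_dock(BEFORE, EXISTING)
pct_all = trips_per_dock(AFTER, AFTER.keys()) / before_rate - 1
pct_existing = trips_per_dock(AFTER, EXISTING) / before_rate - 1

print(f"all stations: {pct_all:+.1%}, existing stations: {pct_existing:+.1%}")
# all stations: -26.1%, existing stations: +33.3%
```

A station-level comparison restricted to stations that existed in both periods avoids this effect, which is one reason the disaggregate approach can reveal changes that per-dock aggregates mask.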

Ahillen et al. (2016) compared the policies and ridership trends of Washington, DC’s Capital Bikeshare and Brisbane’s CityCycle. The findings show that CaBi made few changes to its pricing policy after its launch in 2010. Brisbane’s CityCycle, however, reduced its daily subscription fee from $11 to $2, introduced weekly subscriptions and provided free helmets at each station. The results show that providing helmets, reducing subscription fees, and adding flexible subscription options may have contributed to a 50% increase in CityCycle ridership in just 6 months. Kaviti et al. (2018) studied the impact of introducing the $2 single-trip fare (STF) on CaBi ridership and revenue. The results showed that introducing this new fare option increased monthly ridership for first-time casual users and all casual users by 79% and 41%, respectively.

Summary

The literature review identified limited research on station-level analysis and on the benefits of disaggregate over aggregate analyses of public bikeshare systems. Studies on the impact of pricing changes on bikeshare ridership are scant. This study attempts to fill these gaps by analyzing the impact of a single-trip fare on Capital Bikeshare ridership and revenue at the station level. This research also compares disaggregate models with aggregate models for the newly introduced fare product.

Data and Data Curation

Municipal jurisdictions with public bikesharing systems typically enter into contractual agreements under which bikeshare operators provide extensive, high-quality data on individual trips, with spatio-temporal references for every trip. For example, publicly available bikeshare trip data from eight of the largest bikeshare systems in the USA (New York City’s Citi Bike; Chicago’s Divvy; Washington DC’s Capital Bikeshare; Boston’s Blue Bikes; San Francisco’s Bay Wheels; Minneapolis’ Nice Ride; Los Angeles’ Metrobike and Philadelphia’s Indego) date back to 2010 and contain data on over 115 million trips. These data are extensively used by practitioners and researchers to study issues as diverse as rider travel behavior and ridership trends (Ahillen et al. 2016; Biehl et al. 2018; Goodman and Cheshire 2014; Handy et al. 2014; Kaviti 2018; Kaviti et al. 2018, 2019; Kodransky and Lewenstein 2014; Rixey 2013; Romanillos et al. 2016; Shaheen et al. 2014, 2016; Venigalla et al. 2019). Fishman et al. (2014) and Shaheen et al. (2016), for instance, studied such data and observed that operational factors such as bicycle base station density and bike availability pose barriers to increasing bikesharing usage.

Study Data

The study employed two primary data sources: data on individual CaBi trips, and transactional data for every CaBi revenue transaction during the period from January 2015 through May 2017.

• Dataset 1: CaBi ridership data. This dataset contains information on anonymized individual trips and is available to the public at http://www.capitalbikeshare.com/trip-history-data. It contains detailed information on each trip, including start and end stations, start and end times, and trip duration.

• Dataset 2: revenue transactional data. These data, which were obtained exclusively for this study and are not available to the public, include information on each revenue recognition transaction, including the type of subscription (fare product) and refunds issued to customers. Variables in the dataset include transaction date (with time to the second), fare product (single trip, annual membership, etc.), transaction amount, and the station at which the transaction occurred. To protect the security and identity of users, no personally identifiable data were provided to the researchers.

Additionally, for analysis control, daily weather data were obtained from the Weather Underground history data website (2016, http://www.weatherunderground.com), which offers historical weather data for different regions in the USA.

Curation and Fusion of Datasets

Data curation is the act of discovering data sources of interest, cleaning and transforming the new data, semantically integrating them with other data sources, and deduplicating the resulting composite (Stonebraker et al. 2013). Datasets 1 and 2 acquired for this study are disparate and have no relational connectivity. The trip dataset identifies trip-makers only as “Casual” users or “Registered” members. To determine the impact of STF on ridership and revenues, especially from the casual user segment, it is essential to identify casual users by the type of fare product they used. However, when designing data products for public consumption, CaBi operators did not include the fare product type in the ridership data. It was not clear whether the lost fidelity on trip type could be reclaimed using all available data. A major data curation effort was therefore necessary to fuse data elements between the two datasets to achieve the primary objective of this research. The curation process posed a unique challenge because the two datasets are not related by a foreign key. However, the operational details of the CaBi point of sale (POS) system during the evaluation period offered a solution to this challenge.

Typical points of sale for major public bikeshare systems in the USA include websites, self-service kiosks located at bikeshare stations, some retail outlets and mobile apps. Until the launch of the CaBi mobile app in late July 2017, casual users of CaBi could purchase single-trip or multi-ride passes only via credit card transactions at the self-serve kiosks located at bike stations. Furthermore, all casual users began their first trip at the station where they made the purchase. In contrast, trips made by registered members, and second or subsequent trips by 24-h and 3-day pass holders, cannot be attributed to any particular transaction in Dataset 2. This is because membership products can be bought from anywhere with internet access, and bikes can be checked out from any of the 500+ stations.

These POS operational details at CaBi made it possible to attribute every casual fare product transaction (single-trip, 24-h and 3-day passes) to at least one trip that began at the station where the transaction occurred. Such attribution would be difficult for casual fare products purchased through the CaBi app because, as with membership purchases, casual fares can be bought anywhere using the app, while the trip can only begin at a station. During the study period, a typical registered CaBi member purchased a monthly or annual pass only through the CaBi website. Although most casual user and member transactions now occur through the CaBi mobile app, the app had not yet been launched during the study period for this research. (It is important to note that the adoption of mobile apps as POS has been slow at several bikeshare systems in the USA. For example, CaBi launched its mobile app only in July 2017, nearly 7 years after the launch of the bikeshare program.)

Using this detailed knowledge of CaBi operations and its POS system, a data curation methodology was developed to merge data elements from the trip and revenue transaction datasets (Fig. 1). To reconstruct the lost fidelity on casual user type, ridership data (Dataset 1 in Fig. 1) and revenue transaction data (Dataset 2 in Fig. 1) were closely examined for relational elements between them. As illustrated in the figure, a ‘24-h’ pass was purchased at station 31277 with transaction ID #14134 on January 4, 2015, at 4:22 PM, and a ‘Casual’ trip with ID #12062 originated at the same station at 4:23 PM. Based on the operational details above, casual trip #12062 was taken by a ‘24-h’ pass purchaser. By the same logic, casual trip #12065 was made by a 3-day pass purchaser. Both of these casual trips (#12062 and #12065) were the first casual trips made by these users. Thus, trips #12062 and #12065 would be identified as trips by users of ‘24-h’ and ‘3-day’ passes. However, subsequent casual trips made by these two users cannot be identified by pass type and hence would be identified as ‘Repeat casual’ trips.

After the relationship between the revenue transaction data and the trip data was identified, a relational database model (Fig. 2) was implemented. A Visual Basic (VB) script was developed to implement the logic described above and to classify each casual trip as an STF, 24-h pass, 3-day pass or repeat casual trip. The pseudocode for the script is shown in Fig. 3.
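The two-table model and the matching logic can be sketched with Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions, not the authors' VB script: the table and column names (`revenue_txn`, `trip`, `purchased_at`, `started_at`, etc.) are hypothetical stand-ins for CaBi's fields, timestamps are stored as `'YYYY-MM-DD HH:MM:SS'` text so that string comparison follows time order, and the 5-min window is the tolerance within which the study reports matching more than 99% of casual transactions.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative, not CaBi's own.
SCHEMA = """
CREATE TABLE revenue_txn (
    txn_id       INTEGER PRIMARY KEY,
    station_id   INTEGER NOT NULL,
    purchased_at TEXT    NOT NULL,  -- 'YYYY-MM-DD HH:MM:SS'
    product      TEXT    NOT NULL   -- e.g. '24-h pass', '3-day pass', 'Single trip'
);
CREATE TABLE trip (
    trip_id          INTEGER PRIMARY KEY,
    start_station_id INTEGER NOT NULL,
    started_at       TEXT    NOT NULL,  -- 'YYYY-MM-DD HH:MM:SS'
    member_type      TEXT    NOT NULL   -- 'Casual' or 'Registered'
);
"""

# For each kiosk transaction, find the earliest casual trip that starts at the
# same station no earlier than the purchase and at most 5 minutes after it.
MATCH_SQL = """
SELECT x.txn_id,
       x.product,
       (SELECT t.trip_id
          FROM trip AS t
         WHERE t.start_station_id = x.station_id
           AND t.member_type = 'Casual'
           AND t.started_at >= x.purchased_at
           AND t.started_at <= datetime(x.purchased_at, '+5 minutes')
         ORDER BY t.started_at
         LIMIT 1) AS trip_id
  FROM revenue_txn AS x
 ORDER BY x.txn_id
"""


def build_db(transactions, trips):
    """Create an in-memory database holding the two datasets."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO revenue_txn VALUES (?, ?, ?, ?)", transactions)
    conn.executemany("INSERT INTO trip VALUES (?, ?, ?, ?)", trips)
    return conn


def match_transactions(conn):
    """Return (txn_id, product, trip_id or None) for every transaction."""
    return conn.execute(MATCH_SQL).fetchall()


if __name__ == "__main__":
    # Transaction #14134 and trip #12062 follow the Fig. 1 example; the other
    # rows (IDs 9xxxx, station 99999) are made up for illustration.
    conn = build_db(
        transactions=[
            (14134, 31277, "2015-01-04 16:22:00", "24-h pass"),
            (90010, 99999, "2015-01-04 09:00:00", "Single trip"),
        ],
        trips=[
            (12062, 31277, "2015-01-04 16:23:00", "Casual"),
            (90001, 31277, "2015-01-04 16:24:00", "Registered"),
            (90002, 31277, "2015-01-04 18:00:00", "Casual"),
        ],
    )
    print(match_transactions(conn))  # [(14134, '24-h pass', 12062), (90010, 'Single trip', None)]
```

The correlated subquery keeps the join readable, but it does not stop two transactions from claiming the same trip; a full curation run would also need a one-to-one rule.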

Fig. 1 Illustration of the data curation methodology

Fig. 2 Entity relationship (ER) diagram for fusing transactional data with trip data

Using the VB script, each casual user transaction in the revenue database was searched for matching records in the trip database, and the user information (24-h pass, 3-day pass or STF) was then transferred to the trip database. This curation process required looping over 22 million transactional records several times. When the task was completed, more than 99% of the casual user transactions had been matched with a casual trip that began within 5 min of the transaction at the station where the transaction occurred. ‘Casual’ trips in the ridership tables with no matching transaction were deemed to be ‘Repeat casual’ trips by users who had purchased ‘24-h’ or ‘3-day’ passes. No attempt was made to distinguish ‘Registered’ users in the ridership tables as ‘annual’ or ‘monthly’ members, because these products are not purchased at CaBi kiosks, which is an essential requirement for the above logic to work.
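The labeling rule described above, including the treatment of unmatched casual trips as ‘Repeat casual’, can be sketched in plain Python. This is an illustrative sketch, not the authors' VB script: the greedy rule (each transaction claims the earliest still-unclaimed casual trip at its station), the tuple layouts and the default 5-min window are assumptions. The function also returns the share of transactions that found a match, the quantity the study reports as more than 99%.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def label_casual_trips(transactions, casual_trips, window=timedelta(minutes=5)):
    """Attach a fare product to casual trips, greedily and one-to-one.

    transactions: list of (station_id, purchased_at, product) kiosk sales
    casual_trips: list of (trip_id, station_id, started_at) 'Casual' trips
    Returns (labels, match_rate): labels maps every trip_id to a product or
    to 'Repeat casual'; match_rate is the share of transactions matched.
    """
    # Index casual trips by start station, in start-time order.
    by_station = {}
    for trip_id, station, started in sorted(casual_trips, key=lambda t: t[2]):
        by_station.setdefault(station, []).append((started, trip_id))

    labels, matched = {}, 0
    for station, purchased, product in sorted(transactions, key=lambda x: x[1]):
        candidates = by_station.get(station, [])
        # Index of the first trip starting at or after the purchase time.
        i = bisect_left(candidates, (purchased,))
        while i < len(candidates) and candidates[i][0] <= purchased + window:
            trip_id = candidates[i][1]
            if trip_id not in labels:  # each trip can be claimed only once
                labels[trip_id] = product
                matched += 1
                break
            i += 1

    # Casual trips that no purchase claimed are later trips by pass holders.
    for trip_id, _, _ in casual_trips:
        labels.setdefault(trip_id, "Repeat casual")
    match_rate = matched / len(transactions) if transactions else 0.0
    return labels, match_rate


if __name__ == "__main__":
    t0 = datetime(2015, 1, 4, 16, 22)  # Fig. 1 purchase time; other rows are made up
    labels, rate = label_casual_trips(
        [(31277, t0, "24-h pass"),
         (31277, t0 + timedelta(minutes=1), "Single trip"),
         (31277, t0 + timedelta(hours=3), "3-day pass")],
        [(12062, 31277, t0 + timedelta(minutes=1)),
         (90001, 31277, t0 + timedelta(minutes=2)),
         (90002, 31277, t0 + timedelta(hours=1))],
    )
    print(labels)         # {12062: '24-h pass', 90001: 'Single trip', 90002: 'Repeat casual'}
    print(f"{rate:.1%}")  # 66.7%
```

Processing transactions in time order and letting each claim at most one trip keeps two purchases at a busy kiosk from being attributed to the same ride.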

In addition to being needed to meet the primary objective of this study, fare-type information on casual trips is important for gaining insights into the travel behavior of CaBi users by the fare product they purchased. For example, the trip-length frequencies (at 5-min intervals) of casual CaBi users by fare product (Fig. 4) could not be plotted before this curation effort. As shown in Fig. 4, the four types of casual trips (STF, first trips of 24-h and 3-day pass buyers, and repeat trips) have distinctly different trip-length frequency profiles. While more than 70% of registered members (Fig. 4e) complete their trips in under 15 min (well within the first 30 min of free riding time), usage times for casual users stretch to and beyond the 30-min mark.
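The trip-length frequency profiles in Fig. 4 amount to binning trip durations into 5-min intervals and expressing each bin as a share of trips. A minimal sketch, assuming durations in minutes are already available for one fare product; the function names are hypothetical.

```python
from collections import Counter


def trip_length_frequency(durations_min, bin_width=5):
    """Percent of trips falling in each [start, start + bin_width) minute bin."""
    n = len(durations_min)
    counts = Counter(int(d // bin_width) * bin_width for d in durations_min)
    return {start: 100.0 * c / n for start, c in sorted(counts.items())}


def share_under(durations_min, limit_min):
    """Percent of trips shorter than `limit_min` minutes."""
    return 100.0 * sum(d < limit_min for d in durations_min) / len(durations_min)


if __name__ == "__main__":
    toy = [3, 7, 8, 12, 31]  # made-up durations in minutes
    print(trip_length_frequency(toy))  # {0: 20.0, 5: 40.0, 10: 20.0, 30: 20.0}
    print(share_under(toy, 15))        # 80.0
```

Applied to each fare product separately, the first function yields one curve per panel, and the second gives figures such as the share of registered trips completed in under 15 min.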

This tedious data curation task would not have been necessary if the ridership data had contained information on the fare product (namely, single-trip, 24-h, 3-day, annual, etc.) in the ‘Member Type’ field, rather than only the aggregated categories ‘Casual’ and ‘Registered’. It should be pointed out that this problem is not unique to CaBi. With the growing popularity of the CaBi app, users can purchase all fare products anywhere, and therefore the purchase point and the start station of a trip can no longer be used to reconstruct the fare product purchased by the trip maker. Nevertheless, the data curation exercise helped augment the trip dataset, making it amenable to analyzing the impact of price changes on ridership and revenues.

Methodology and Results

The curated datasets were analyzed using SQL queries within the relational database management system and the statistical analysis system (SAS). The analysis methods and results are presented and discussed in this section.

Response Variables

System usage and revenue are the primary response variables examined in the impact assessment. The extent of system usage is reflected in the number of trips taken by users and in trip lengths or trip durations. However, trip-length (distance) information is not available in the data, so trip duration was used as the measure of usage. While total revenue is an indicator of the impact, the true impact on revenue may be captured only through revenue normalized for usage (i.e., revenue per trip). In summary, the response variables included in the analysis are the following:

• Ridership (number of trips).
• Usage (trip duration in minutes).
• Revenue (total revenue and revenue per trip).

The revenue analysis was performed only on the casual user segment. Revenues from registered members were not included in the analysis for two reasons. First, Kaviti et al. (2018) established that the launch of STF did not affect the ridership of registered users. Second, the revenue from registered users could not be traced to the individual stations where registered users made their trips.

Fig. 3 Pseudocode for fusing data between revenue transaction and trip datasets


Explanatory Variables

Variations in response variables were examined as a function of the following explanatory variables and their two-way and three-way interactions:

• Station: a single station or a set of stations grouped by location.

• Weekend/weekday: whether the rides were taken on a weekday, when commute trips could be predominant, or on a weekend, when recreational trips could be predominant.

• Month: the month in which trips were taken, to account for seasonality.

Fig. 4 Trip-length frequencies by fare product type (made possible by data curation). Panels: (a) single-trip users (2016, 2017); (b) 24-hour pass users (2015, 2016, 2017); (c) 3-day pass users (2015, 2016, 2017); (d) repeat casual users (2015, 2016, 2017); (e) registered users (2015, 2016, 2017). Horizontal axis: trip length (min); vertical axis: percentage of trips.

Control Variables

To enable a classical ‘before-and-after’ experimental set-up for evaluating the true impact of STF on response variables, other variables that could potentially influence the outcomes must be controlled for. These controls and treatments for the experimental setup and evaluation included the following:

• The station-level disaggregate comparative analysis pairs observations only at the 330 stations common to the 12-month periods ‘before’ and ‘after’ the launch of STF. This direct comparison excludes stations that were open for only part of the 24-month analysis period and, because each period spans a full year, also eliminates the impact of seasonality on the overall comparison.

• Days with precipitation were excluded from the analysis.

• No adjustments were made for temperature variations. However, by including calendar month as an independent variable, seasonal effects on ridership were controlled for.

Casual User Revenues at the Top 20 Common Stations

A prior analysis of aggregate monthly summaries of revenues showed a decline in revenue from casual users (Venigalla et al. 2018; Kaviti et al. 2018). However, because of normalization by the number of docks, the aggregate analysis did not fully explain the impact of STF on revenue from casual users. To examine the impact of STF at individual stations more closely, casual user revenues at the kiosks located at each of the 330 common stations were analyzed. In the revenue transaction database, a ‘Billing’ status of ‘Product’ indicates a sale at a station kiosk. Only revenues marked as ‘Product’ sales at a CaBi station are included in the analysis; usage fees and refunds were excluded.
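The revenue filter, together with the normalization of revenue by usage (revenue per trip) chosen as a response variable, can be sketched as follows. The dictionary keys (`station_id`, `billing`, `amount`) and the non-‘Product’ billing labels in the example are hypothetical; only the rule that ‘Product’ rows count as kiosk sales comes from the text.

```python
from collections import defaultdict


def casual_revenue_by_station(transactions):
    """Sum kiosk sales per station, keeping only rows billed as 'Product'.

    Each transaction is a dict with hypothetical keys 'station_id', 'billing'
    and 'amount'. Usage fees, refunds and any other billing status are skipped.
    """
    revenue = defaultdict(float)
    for t in transactions:
        if t["billing"] == "Product":
            revenue[t["station_id"]] += t["amount"]
    return dict(revenue)


def revenue_per_trip(revenue, casual_trips):
    """Normalize station revenue by the casual trips that originated there."""
    return {s: r / casual_trips[s] for s, r in revenue.items() if casual_trips.get(s)}


if __name__ == "__main__":
    toy = [  # made-up rows for illustration
        {"station_id": 31277, "billing": "Product", "amount": 8.0},
        {"station_id": 31277, "billing": "Usage fee", "amount": 2.0},
        {"station_id": 31277, "billing": "Product", "amount": 2.0},
        {"station_id": 99999, "billing": "Refund", "amount": -8.0},
    ]
    rev = casual_revenue_by_station(toy)
    print(rev)                                # {31277: 10.0}
    print(revenue_per_trip(rev, {31277: 4}))  # {31277: 2.5}
```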

Casual user revenues at the top 20 of the 330 common stations (Table 1) indicate that the introduction of STF resulted in a notable reduction in casual user revenues at almost all 20 stations. Revenues from 24-h and 3-day passes declined by 42% and 34%, respectively, indicating a shift of casual users from these passes toward the STF product. After the launch of STF, revenues from all casual users at these 20 stations declined by 21%, despite a 0.6% increase in casual ridership (Table 2). A closer examination of casual user revenues at individual stations indicates that all but two of the top 20 stations (Jefferson Memorial, and 14th and D St NW/Ronald Reagan Building) experienced a decline in revenues. Declines in revenues at individual stations range from about 6% at Constitution Ave and 2nd St NW/DOL to over 40% at 21st St and Constitution Ave NW (computations are not shown in Table 1).

Table 1 Revenues from casual fare products at the top 20 stations

Before STF: June 2015–May 2016; after STF: June 2016–May 2017.

Station | 24-h pass (before) | 3-day pass (before) | Total casual (before) | 24-h pass (after) | 3-day pass (after) | Single trip (after) | Total casual (after)
Jefferson Dr and 14th St SW | $157,272 | $10,540 | $167,812 | $90,968 | $7021 | $31,564 | $129,553
Lincoln Memorial | $141,872 | $8755 | $150,627 | $58,344 | $5287 | $29,418 | $93,049
Smithsonian-National Mall/Jefferson Dr and 12th St | $113,368 | $7735 | $121,103 | $67,960 | $5287 | $23,052 | $96,299
4th and C St SW | $48,440 | $5253 | $53,693 | $28,392 | $3060 | $9222 | $40,674
New York Ave and 15th St NW | $48,800 | $4182 | $52,982 | $29,616 | $3043 | $9664 | $42,323
Massachusetts Ave and DuPont Circle NW | $35,880 | $7157 | $43,037 | $23,016 | $4930 | $8450 | $36,396
Ohio Dr and West Basin Dr SW/MLK and FDR Memorials | $41,272 | $1343 | $42,615 | $24,040 | $1037 | $10,026 | $35,103
Constitution Ave and 2nd St NW/DOL | $38,216 | $4148 | $42,364 | $26,160 | $3468 | $10,352 | $39,980
Jefferson Memorial | $33,272 | $1734 | $35,006 | $22,904 | $1530 | $12,030 | $36,464
19th St and Constitution Ave NW | $33,072 | $1921 | $34,993 | $14,184 | $1122 | $5688 | $20,994
Columbus Circle/Union Station | $26,800 | $4828 | $31,628 | $17,096 | $3315 | $7498 | $27,909
10th St and Constitution Ave NW | $28,392 | $2414 | $30,806 | $18,704 | $1275 | $6584 | $26,563
17th and G St NW | $28,760 | $1938 | $30,698 | $20,576 | $1649 | $6322 | $28,547
14th and D St NW/Ronald Reagan Building | $26,344 | $2805 | $29,149 | $20,864 | $2227 | $6470 | $29,561
Thomas Circle | $22,272 | $6069 | $28,341 | $12,968 | $3485 | $5498 | $21,951
USDA/12th and Independence Ave SW | $25,032 | $2414 | $27,446 | $16,400 | $1156 | $5006 | $22,562
21st St and Constitution Ave NW | $24,504 | $2329 | $26,833 | $9960 | $1190 | $4808 | $15,958
Georgetown Harbor/30th St NW | $23,912 | $1581 | $25,493 | $14,120 | $884 | $7332 | $22,336
7th and F St NW/National Portrait Gallery | $22,200 | $2448 | $24,648 | $12,904 | $1547 | $6542 | $20,993
Washington and Independence Ave SW/HHS | $20,904 | $2176 | $23,080 | $12,104 | $1156 | $4786 | $18,046
Totals | $940,584 | $81,770 | $1,022,354 | $541,280 | $53,669 | $210,312 | $805,261
Percent change after STF | | | | −42% | −34% | N/A | −21%
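The percentage changes in the last row of Table 1 can be reproduced from the totals row. A minimal check in Python, using only the Table 1 totals; it also confirms that the after-STF product columns sum to the after-STF casual total.

```python
def pct_change(before, after):
    """Percent change from `before` to `after`."""
    return 100.0 * (after - before) / before


# Column totals for the top 20 stations, copied from Table 1.
BEFORE = {"24-h pass": 940_584, "3-day pass": 81_770, "Total casual": 1_022_354}
AFTER = {"24-h pass": 541_280, "3-day pass": 53_669, "Total casual": 805_261}

# After STF, 24-h + 3-day + single-trip revenues should add up to the total.
assert 541_280 + 53_669 + 210_312 == AFTER["Total casual"]

for product, before in BEFORE.items():
    print(f"{product}: {pct_change(before, AFTER[product]):.0f}%")
# 24-h pass: -42%
# 3-day pass: -34%
# Total casual: -21%
```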

Comparisons at All 330 Common Stations

Table 2 presents a summary of usage and revenues at the top 20 stations and at the 330 stations that existed throughout the study period. The comparison of usage metrics at the top 20 stations indicates that casual user ridership generally increased after the launch of STF: trips starting at the top 20 stations grew by less than 1%, and total trip hours increased by nearly 2%. In contrast, across all 330 common stations, casual trips increased by nearly 20% and trip hours by 38%. Of the 330 stations, 282 (86%) recorded growth in trips and 266 (81%) recorded growth in trip hours. Notably, usage by casual users (both trips and trip hours) increased at nearly twice as many stations as usage by registered users. Despite such large increases in usage at the common stations, total casual user revenue at the 330 stations declined by 16% (and by over 21% at the top 20 stations). Figure 5 illustrates heat maps of the changes in ridership and revenue after the launch of STF.

Hypotheses Testing

A number of pairwise comparisons were made to verify statistically whether the launch of STF affected revenue and ridership in the CaBi system. Hypothesis tests were conducted on the mean values of the response variables (number of trips, trip duration and normalized revenue) and on the growth rates of ridership and revenue. Because of its simplicity and time-tested dependability in establishing statistical significance, and because the number of stations is much greater than 30, the paired z test was selected as the most appropriate test for comparing the response variables ‘before’ and ‘after’ the introduction of STF. The generalized formulation of the hypotheses tested is shown below.

Null hypothesis, H0:

$$(\mu_{r,p})_A - (\mu_{r,p})_B = 0$$

Alternative hypotheses, Ha:

$$(\mu_{r,p})_A > (\mu_{r,p})_B \;\text{(one-tailed)}, \qquad (\mu_{r,p})_A < (\mu_{r,p})_B \;\text{(one-tailed)}, \qquad \text{or} \qquad (\mu_{r,p})_A - (\mu_{r,p})_B \neq 0 \;\text{(two-tailed)}$$

where $(\mu_{r,p})_A$ is the mean of response variable $r$ for comparison pair $p$ after the launch of STF, and $(\mu_{r,p})_B$ is the mean of response variable $r$ for comparison pair $p$ before the launch of STF.

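Table 2 Stations with changes in revenue from, and ridership of, casual users

Summaries are for trips originating at the stations over the 12-month periods before and after the STF launch. Top 20 = 20 stations with the highest ridership; 330 common = 330 stations that existed throughout the 24-month analysis period. The last two columns give the number and percent of stations with an increase after the STF launch.

Station group | User type | Metric | Before STF | After STF | % change | Stations with increase (number) | Stations with increase (percent)
Top 20 | Casual users | Trips | 212,004 | 213,315 | 0.6 | 9 | 45.0
Top 20 | Casual users | Trip hours | 118,602 | 120,887 | 1.9 | 10 | 50.0
Top 20 | Casual users | Revenue ($) | $1,022,354 | $805,261 | −21.2 | 2 | 10.0
Top 20 | Registered users | Trips | 187,926 | 181,285 | −3.5 | 4 | 20.0
Top 20 | Registered users | Trip hours | 42,017 | 41,341 | −1.6 | 5 | 25.0
330 common | Casual users | Trips | 607,621 | 727,691 | 19.8 | 282 | 85.5
330 common | Casual users | Trip hours | 322,446 | 445,439 | 38.1 | 266 | 80.6
330 common | Casual users | Revenue ($) | $2,205,559 | $1,850,467 | −16.1 | 70 | 21.2
330 common | Registered users | Trips | 2,541,227 | 2,610,443 | 2.7 | 142 | 43.0
330 common | Registered users | Trip hours | 487,501 | 442,828 | −9.2 | 117 | 35.5

Revenue recognized from registered members is not attributable to any particular station.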


The response variable r represents one of the following: mean ridership (number of casual trips); mean normalized revenue ($ per casual ride); mean growth rate in ridership; and mean growth rate in normalized revenue.

The pair level p represents the levels of the explanatory variables at which paired comparisons are made: (a) 330 individual stations (319 in the case of growth rate comparisons); (b) station and weekday/weekend (two-way interaction); (c) station and month (two-way interaction); and (d) station, month and weekday/weekend (three-way interaction).
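The paired z test behind these comparisons can be sketched with Python's standard `statistics` module. This is a minimal sketch, not the authors' SAS code: it computes z on the differences d = before − after, an orientation inferred from the signs reported in Tables 3 and 4 (positive z where μB > μA), and the `alternative` labels are hypothetical names.

```python
import math
from statistics import NormalDist, mean, stdev


def paired_z_test(before, after, alternative):
    """Paired z test on matched before/after observations.

    z is computed on d = before - after, so z > 0 when the 'before' mean is
    larger. alternative: 'before_greater' (Ha: mu_B > mu_A), 'before_less'
    (Ha: mu_B < mu_A) or 'two_sided' (Ha: mu_A != mu_B). Returns (z, p).
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two matched before/after pairs")
    diffs = [b - a for b, a in zip(before, after)]
    se = stdev(diffs) / math.sqrt(len(diffs))  # standard error of the mean difference
    if se == 0:
        raise ValueError("differences have zero variance")
    z = mean(diffs) / se
    cdf = NormalDist().cdf(z)  # standard normal CDF at z
    if alternative == "before_greater":
        p = 1.0 - cdf
    elif alternative == "before_less":
        p = cdf
    elif alternative == "two_sided":
        p = 2.0 * min(cdf, 1.0 - cdf)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p


if __name__ == "__main__":
    # Made-up revenue-per-ride pairs for four hypothetical stations.
    print(paired_z_test([5, 6, 7, 8], [3, 4, 4, 6], "before_greater"))
```

As a cross-check against the one-tailed p values, `NormalDist().cdf(-0.59)` is about 0.28, in line with the p value of 0.276 reported for station-level ridership (whose z is rounded). The normal approximation relies on large N, which holds at every pair level here.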

Tests for Normality

Paired z tests are applicable only to normally distributed populations. To confirm whether the response variables are normally distributed, the mean values of ridership, normalized revenue ($ per trip) and the growth rates of revenue and ridership were tested for normality using descriptive methods (box plots) and theory-driven methods (quantile–quantile or Q–Q plots). The box plots (Fig. 6) show that the whiskers are evenly spread around the boxes and that the median values generally lie in the middle of the box, both of which are indicative of a normal distribution. The box plots also show a sharp decline in revenue per casual ride and in its growth rate (Fig. 6a, c) and a noticeable increase in ridership growth (Fig. 6b). Q–Q plots and comparative histograms illustrating the distribution of the response variables for all stations in the analysis are shown in Fig. 6. Although the Q–Q plots deviate slightly from the line at the tails (Fig. 6d–f), the close fit of the histograms to the Gaussian curve is interpreted as an indication that the three response variables are approximately normally distributed. Furthermore, the close alignment of the bulk of the observations along the 45° line suggests that the data are approximately normal.
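What a Q–Q plot shows can be sketched numerically: pair each sorted observation with the matching quantile of a normal distribution fitted to the sample, then check how close the points lie to a straight line. This sketch assumes plotting positions (i + 0.5)/n and summarizes linearity with a Pearson correlation (the probability-plot correlation idea); that summary statistic is an illustration and was not reported in the study.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """(theoretical, observed) pairs for a normal Q-Q plot.

    Theoretical quantiles come from a normal distribution with the sample's
    mean and standard deviation at plotting positions (i + 0.5) / n, so points
    close to the 45-degree line suggest approximate normality.
    """
    xs = sorted(sample)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    return [(fitted.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]


def qq_correlation(sample):
    """Pearson correlation of the Q-Q points (1.0 means a perfectly straight line)."""
    pts = qq_points(sample)
    t = [p[0] for p in pts]
    o = [p[1] for p in pts]
    mt, mo = mean(t), mean(o)
    cov = sum((a - mt) * (b - mo) for a, b in zip(t, o))
    var_t = sum((a - mt) ** 2 for a in t)
    var_o = sum((b - mo) ** 2 for b in o)
    return cov / (var_t * var_o) ** 0.5


if __name__ == "__main__":
    near_normal = [NormalDist(5, 1).inv_cdf((i + 0.5) / 50) for i in range(50)]
    skewed = [2 ** i for i in range(12)]  # strongly right-skewed toy data
    print(round(qq_correlation(near_normal), 4), round(qq_correlation(skewed), 2))
```

Values close to 1 correspond to points hugging the reference line; heavy tails or skew pull the correlation down, which is the pattern a reader looks for by eye in Fig. 6d–f.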

The data preparation for hypotheses testing included the following steps:

1. Arranging ‘before’ and ‘after’ revenue, casual trips, and trip-hours data aggregated by all possible combinations of the station, month, and weekday/weekend.

2. Maintaining aggregation of paired observations of response variables by station, month, and whether the trip occurred on a weekday or a weekend. This grouping was chosen to verify whether calendar month or weekday status has any effect on the increase or decrease, because it is widely established in the literature that bikeshare ridership depends on these variables.

Fig. 5 Heat map of changes in ridership and revenue after the introduction of STF


3. Normalizing station-level revenue by the number of casual trips (i.e., using revenue per casual trip rather than total revenue) to smooth wide variations in total revenue among stations.

4. Removing data points on days with precipitation, as precipitation has its own impact on bikeshare ridership. However, no attempt was made to control for temperature, for example by eliminating data points on extremely cold or hot days.

5. Computing background growth rates using the available data for the 5 months before the launch of STF, so that these rates can be compared with the growth rates after the launch of STF. Such a comparison shows whether the background growth itself changed with the launch of STF, thereby supporting or negating the impact of STF on trips and revenue by casual users.
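Steps 1, 4 and 5 can be sketched as follows: sum a response variable by station, year, month and weekday/weekend while skipping precipitation days, then compute year-over-year growth for each station–month–day-type cell. This is a hypothetical sketch: the input layout, the function names and the use of `date.weekday()` (Saturday and Sunday as weekend) are assumptions, and the set of wet days would have to come from a weather record.

```python
from collections import defaultdict
from datetime import date


def aggregate(trips, wet_days):
    """Sum a response variable by (station, year, month, day type).

    trips: iterable of (station_id, trip_date, value), e.g. value=1 per trip
    wet_days: set of dates with precipitation, which are dropped entirely
    """
    totals = defaultdict(float)
    for station, day, value in trips:
        if day in wet_days:
            continue
        day_type = "weekend" if day.weekday() >= 5 else "weekday"  # Sat/Sun
        totals[(station, day.year, day.month, day_type)] += value
    return dict(totals)


def yoy_growth(totals, year):
    """Growth of each (station, month, day type) cell in `year` over `year - 1`.

    Cells that are missing or zero in the base year are skipped.
    """
    growth = {}
    for (station, y, month, day_type), value in totals.items():
        if y != year:
            continue
        base = totals.get((station, year - 1, month, day_type))
        if base:
            growth[(station, month, day_type)] = value / base - 1.0
    return growth


if __name__ == "__main__":
    toy_trips = [  # made-up data: station 1, March 2015 vs March 2016
        (1, date(2015, 3, 2), 1), (1, date(2015, 3, 3), 1),
        (1, date(2016, 3, 7), 1), (1, date(2016, 3, 8), 1),
        (1, date(2016, 3, 9), 1), (1, date(2016, 3, 10), 5),
        (1, date(2016, 3, 5), 2),  # a Saturday with no 2015 counterpart cell
    ]
    cells = aggregate(toy_trips, wet_days={date(2016, 3, 10)})
    print(yoy_growth(cells, 2016))  # {(1, 3, 'weekday'): 0.5}
```

Comparing the mean of these cell-level growth rates before and after the launch, pair by pair, is what the growth-rate z tests then evaluate.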

Pairwise Comparisons

A series of pairwise comparisons were made to verify the following two primary one-tailed alternative hypotheses:

Hypothesis 1 Casual user revenues decreased significantly after the launch of the STF product, i.e., $(\mu_{r,p})_A < (\mu_{r,p})_B$.

Hypothesis 2 Casual user ridership increased significantly after the launch of the STF product, i.e., $(\mu_{r,p})_A > (\mu_{r,p})_B$.

Fig. 6 Box plots and Q–Q plots and comparative histograms with normal and kernel densities. Panels (a)–(c) are box plots and panels (d)–(f) are Q–Q plots with comparative histograms, for: revenue per casual ride (n = 330) in (a) and (d); calendar-month growth rates of casual user ridership (n = 319) in (b) and (e); and calendar-month growth rates of revenue per casual user (n = 319) in (c) and (f). Histograms are overlaid with normal and kernel density curves.

Table 3a presents the results of the hypothesis tests for Hypothesis 1 at various levels of aggregation. The table shows that the mean values of revenue per ride in the 12 months before and the 12 months after STF, across combinations of the 330 stations, 12 months and 2 weekday/weekend designations, are $5.05 and $3.11, respectively. That is, before the launch of STF, casual users paid $5.05 per trip on average; this amount declined to $3.11 per trip after the STF launch. The total possible number of paired observations for these combinations would be 7920 (330 stations × 12 months × 2 weekday/weekend designations); Table 3a reports N = 6635 pairs at this level. The decline in mean revenue is statistically significant at the 5% level, as indicated by a z score of 59.96 and a p value near zero. Likewise, pairwise comparisons of mean revenues aggregated by station and month, and by station and weekday/weekend, indicate a statistically significant decline in revenue after the STF launch.

Table 3b presents an analysis of the change in monthly casual ridership (trips) before and after the launch of STF. As the table shows, the average numbers of trips for each combination of the 330 stations, 12 months and 2 weekday/weekend designations before and after STF are 90.5 and 101.1, respectively. The difference indicates an increase in ridership after the STF launch, and the z score (−2.54) and p value (0.005) show that this increase is statistically significant. Similarly, pairwise comparisons aggregated at all possible combinations of station and month indicate a statistically significant ridership increase at each station by month (p = 0.034). The p value of 0.276 for the difference in average trips at the station level (151.7 vs. 169.4) indicates relatively weak evidence of a station-level aggregate increase in trips after the launch of STF. The pairwise comparison of casual user ridership was not performed for data aggregated by station and weekday/weekend because the ridership analysis was based on monthly casual ridership. A closer examination of the data indicated that a few outliers with dramatic reductions in ridership might have skewed the station-level aggregation. For example, March 2017 was unusually cold in the CaBi service area compared with March 2016, which resulted in a dramatic drop in ridership in March 2017 relative to March 2016 (Venigalla et al. 2018). However, for consistency, no attempt was made to remove those outliers.

Thus, the common stations experienced a generally significant increase in ridership and a decisively significant decline in revenue per ride after the launch of STF. It is not known whether the launch of STF itself caused these changes or whether they continued a trend that existed in the months before the launch. Additional pairwise z tests were therefore performed to verify whether the growth trends in revenue and ridership changed significantly after STF.

Table 4a presents the pairwise comparisons of revenue and ridership growth rates for the 5 months before and the 5 months after the launch of STF. The 5-month period (as opposed to a 12-month period) was chosen because of limited data availability. The mean revenue growth rate of 0.162 (column labeled μB) indicates that, before the launch of STF, casual user revenue grew by 16.2% on average across all combinations of the 319 stations, five calendar months and weekday/weekend designations. Its counterpart after STF (column labeled μA) shows an average decline of about 29% in revenue. That is, revenue growth changed from positive to negative after the STF launch. On the other hand, as Table 4b shows, mean year-over-year growth rates of casual user ridership for comparable calendar months accelerated after the introduction of STF, from about 66% to about 119% at the station, month and weekday/weekend level. The pattern is similar at the other levels of aggregation. Thus, the statistics presented in Table 4 establish the statistical significance of the decline in revenue growth and of the increase in ridership growth after the launch of STF.

Table 3 Pairwise comparisons of revenue and ridership

(a) Revenue per casual ride. Observation pair: mean revenue ($) per casual ride

Pair level (p) | N | 12 months before STF, μB | 12 months after STF, μA | Ha (alternative hypothesis) | z score | p value
Station, month and weekday/weekend | 6635 | 5.046 | 3.113 | μB > μA | 59.96 | 0.00
Station and month | 3512 | 5.131 | 3.127 | μB > μA | 49.91 | 0.00
Station and weekday/weekend | 655 | 5.214 | 3.147 | μB > μA | 33.16 | 0.00
Station | 330 | 5.236 | 3.147 | μB > μA | 25.63 | 0.00

(b) Casual user ridership. Observation pair: mean monthly casual user ridership (trips)

Pair level (p) | N | 12 months before STF, μB | 12 months after STF, μA | Ha (alternative hypothesis) | z score | p value
Station, month and weekday/weekend | 6635 | 90.54 | 101.09 | μB < μA | −2.54 | 0.005
Station and month | 3512 | 171.05 | 190.98 | μB < μA | −1.82 | 0.034
Station | 330 | 151.70 | 169.37 | μB < μA | −0.59 | 0.276

Since trip duration and ridership tend to be highly correlated, pairwise comparisons were not performed with trip duration as a response variable.

Conclusions and Discussion

As its primary objective, this research examined the impact of the launch of a single-trip fare (STF) product on Capital Bikeshare ridership and revenue by analyzing large amounts of system-wide data. The unique characteristics of the point-of-sale system at Capital Bikeshare were leveraged not only for data curation but also for designing and executing a controlled ‘before’ and ‘after’ experiment. The analysis included a complex data curation and fusion process that made it possible to reclaim most of the lost fidelity of the trip data, across a combined total of over 22 million records.

Impact of Pricing

The curated dataset facilitated the design of an experiment in which revenues could be traced to individual stations, which in turn allowed station-level revenues and ridership to be compared before and after the launch of STF. The revenue and ridership analyses were performed at the individual station level, with trips aggregated by comparable month and weekday/weekend categories.

Statistical tests were performed on casual user revenue and casual user ridership for the 12-month periods before and after the introduction of STF at the 330 common stations. The results showed a decrease in casual user revenue per ride and an increase in monthly casual user ridership after the introduction of STF. Furthermore, calendar-month growth rates for ridership and revenue were compared for the five-month periods before and after the launch of the new fare product at 319 common stations. The study has established statistical evidence that the launch of STF significantly decreased revenues and increased ridership at CaBi. Additionally, revenue growth changed from positive to negative after the launch of STF. However, it should be noted that it is not practical to identify and control for all possible variables that could have caused the decline. This study also demonstrates that the disaggregate analysis conducted at the station level is more accurate, and gives a better understanding of the data, than the community-level analysis performed by Kaviti et al. (2018).

Table 4 Pairwise comparisons of growth rates

(a) Casual user revenue. Observation pair: mean growth rates of casual user revenue (ratio)

Pair level (p) | N | Before STF, μB | After STF, μA | Ha (alternative hypothesis) | z score | p value
Station, month and weekday/weekend | 2407 | 0.162 | −0.287 | μB > μA | 18.79 | 0.00
Station and month | 1319 | 0.171 | −0.289 | μB > μA | 15.94 | 0.00
Station and weekday/weekend | 622 | 0.168 | −0.308 | μB > μA | 15.34 | 0.00
Station | 319a | 0.165 | −0.314 | μB > μA | 12.66 | 0.00

(b) Casual user ridership. Observation pair: mean growth rates of casual user ridership (ratio)

Pair level (p) | N | Before STF, μB | After STF, μA | Ha (alternative hypothesis) | z score | p value
Station, month and weekday/weekend | 2407 | 0.662 | 1.194 | μB < μA | −8.82 | 0.00
Station and month | 1319 | 0.734 | 1.262 | μB < μA | −6.99 | 0.00
Station and weekday/weekend | 622 | 0.617 | 1.288 | μB < μA | −7.91 | 0.00
Station | 319a | 0.648 | 1.332 | μB < μA | −5.82 | 0.00

a Only the 319 of the 330 stations that existed during January–May 2015 are used in the growth rate analysis.

Most importantly, this paper fills a notable gap in the literature on the impact of introducing new fare options on bikeshare ridership and revenue. It should be noted that this study examined only the impact of the pricing change on usage and did not investigate the user behavioral factors that may have influenced the changes in usage. It is possible that the results and findings are unique to Capital Bikeshare. However, the controlled nature of the experiment and the analysis results shed light on the fundamental nature of the impact of changes in fare structure on revenues and ridership. Bikeshare providers who are considering changes to their fare product line and pricing could benefit from the findings of this study. Where changes have already been made, the methods used in this research may be employed to evaluate their impact on ridership and revenue. For example, other systems have also introduced single-trip fare products: Metro Bike (Los Angeles) in 2017, and Divvy (Chicago) and Citi Bike (New York) in 2018 ($3/trip at Divvy and Citi Bike; $3.50/trip at Metro Bike). The methods discussed in this paper are flexible enough to study the impact of STF on ridership and revenue at these systems.

Several factors pose barriers to implementing bikesharing operations in any urban environment. Of particular concern are low-income areas, which, based on previous research, stand to gain the most social and economic benefits from a shared mobility system. A strategy that increases participation in shared ridership in the region while also advancing the system’s economic sustainability indicates that more people are benefiting from the system. Studies that examine the interrelationship between pricing and users’ sensitivity to pricing, for example by developing price elasticities or logit models, can further advance this research.

The Need for, and the Value of Data Curation

The need for, and the effort required to, curate and fuse data could have been eliminated had the trip data contained information on the type of casual user making each trip. A similar need could arise to differentiate the types of registered members for other analytical needs. The study team examined the publicly available trip datasets of the seven largest bikeshare systems in the USA. Five of these seven systems categorize trips into only two user types: Casual (alternative designations: Single User, Customer, etc.) and Registered (alternative designation: Subscriber). This level of aggregation leads to a loss of fidelity with respect to the price paid for those trips. Thus, data curation could be needed at those five systems for analyses such as usage vs. price paid per trip (similar to the trip-length analysis shown in Fig. 4). Therefore, it is recommended that bikeshare providers and operators include additional detail on the type of user (in the case of Capital Bikeshare, the variable is “Member Type”).

Acknowledgements This research was performed in cooperation with the District Department of Transportation (DDOT) and the Federal Highway Administration (FHWA). The contents of this report reflect the views of the authors, who are responsible for the facts and the accuracy of the data presented herein. The contents do not necessarily reflect the official view or policies of the FHWA or DDOT. This report does not constitute a standard, specification, or regulation. Also, partial funding for the research was provided by DDOT and the US Department of Transportation’s University Transportation Centers research program. The study team would like to express its gratitude to the panel, especially to Ms. Kimberley Lucas, Dr. Stefanie Brodie and Ms. Stephanie Dock of DDOT for their invaluable guidance, input and support.

References

Ahillen M, Mateo-Babiano D, Corcoran J (2016) Dynamics of bike-sharing in Washington, DC and Brisbane, Australia: implications for policy and planning. Int J Sustain Transp 10(5):441–454

Biehl A, Ermagun A, Stathopoulos A (2018) Community mobility MAUP-ing: a socio-spatial investigation of bikeshare demand in Chicago. J Transp Geogr 66:80–90

Capital Bikeshare (CaBi) (2019) Experience Metro DC on two wheels. https://www.capitalbikeshare.com/how-it-works. Accessed 31 Aug 2019

DDOT (2015) District of Columbia Capital Bikeshare Development Plan. https://ddot.dc.gov/sites/default/files/dc/sites/ddot/page_content/attachments/Draft%20DDOT%20Bikeshare%20Development%20FINAL%20reduced.pdf. Accessed 31 Aug 2019

de Chardon CM, Caruso G (2015) Estimating bike-share trips using station-level data. Transp Res Part B Methodol 78:260–279

DeMaio P (2009) Bike-sharing: history, impacts, models of provision, and future. J Public Transp 12(4):3

De Nazelle A, Nieuwenhuijsen MJ, Antó JM, Brauer M, Briggs D, Braun-Fahrlander C, Cavill N, Cooper AR, Desqueyroux H, Fruin S, Hoek G, Panis LI, Janssen N, Jerrett M, Joffe M, Andersen ZJ, van Kempen E, Kingham S, Kubesch N, Leyden KM, Marshall JD, Matamala J, Mellios G, Mendez M, Nassif H, Ogilvie D, Peiró R, Pérez K, Rabl A, Ragettli M, Rodríguez D, Rojas D, Ruiz P, Sallis JF, Terwoert J, Toussaint JF, Tuomisto J, Zuurbier M, Lebret E (2011) Improving health through policies that promote active travel: a review of evidence to support integrated health impact assessment. Environ Int 37(4):766–777

El-Assi W, Mahmoud MS, Habib KN (2017) Effects of built environment and weather on bikesharing demand: a station-level analysis of commercial bikesharing in Toronto. Transportation 44(3):589–613

Fishman E, Washington S, Haworth N, Mazzei A (2014) Barriers to bikesharing: an analysis from Melbourne and Brisbane. J Transp Geogr 41:325–337

Goodman A, Cheshire J (2014) Inequalities in the London bicycle sharing system revisited: impacts of extending the scheme to poorer areas but then doubling prices. J Transp Geogr 41:272–279

Handy S, Van Wee B, Kroesen M (2014) Promoting cycling for transport: research needs and challenges. Transp Rev 34(1):4–24


Jurdak R (2013) The impact of cost and network topology on urban mobility: a study of public bicycle usage in 2 US cities. PLoS One 8(11):e79396

Kaviti S (2018) Profiles, preferences and reactions to price changes of bikeshare users: a comprehensive look at Capital Bikeshare data. Doctoral dissertation, George Mason University

Kaviti S, Venigalla MM (2019) Modeling bikeshare user sensitivity and elasticity to pricing using monadic design and ordered logit. In: Proceedings of the 15th world conference on transport research (WCTR), May 26–31, 2019, Mumbai

Kaviti S, Venigalla MM, Zhu S, Lucas K, Brodie S (2018) Impact of pricing and transit disruptions on bikeshare ridership and revenue. Transportation. https://doi.org/10.1007/s11116-018-9904-5

Kaviti S, Venigalla MM, Lucas K (2019) Travel behavior and price preferences of bikesharing members and casual users: a Capital Bikeshare perspective. Travel Behav Soc 15:133–145. https://doi.org/10.1016/j.tbs.2019.02.004

Kodransky M, Lewenstein G (2014) Connecting low-income people to opportunity with shared mobility. Institute for Transportation and Development Policy and Living Cities

Ma T, Liu C, Erdogan S (2014) Bicycle sharing and transit: does Capital Bikeshare affect Metrorail ridership in Washington, DC. University of Maryland, College Park

NACTO (2018) Bike Share in the US: 2010–2016. National Association of City Transportation Officials. https://nacto.org/bike-share-statistics-2016/. Accessed 6 Apr 2019

NACTO (2019) 84 Million Micromobility Trips in 2018. National Association of City Transportation Officials. https://nacto.org/2019/04/17/84-million-trips-on-shared-bikes-and-scooters/. Accessed 16 June 2019

Pucher J, Dill J, Handy S (2010) Infrastructure, programs, and policies to increase bicycling: an international review. Prev Med 50:S106–S125

Rixey R (2013) Station-level forecasting of bikesharing ridership: station network effects in three US systems. Transp Res Rec J Transp Res Board 2387:46–55

Romanillos G, Zaltz Austwick M, Ettema D, De Kruijf J (2016) Big data and cycling. Transp Rev 36(1):114–133

Shaheen SA, Martin EW, Cohen AP, Chan ND, Pogodzinski M (2014) Public bikesharing in North America during a period of rapid expansion: understanding business models, industry trends & user impacts, MTI Report 12-29

Shaheen S, Cohen A, Zohdy I (2016) Shared mobility: current practices and guiding principles (No. FHWA-HOP-16-022)

Stonebraker M, Bruckner D, Ilyas IF, Beskales G, Cherniack M, Zdonik SB, Xu S (2013) Data curation at scale: the data tamer system. In: CIDR

Venigalla M, Kaviti S, Pierce W, Zhu S (2018) Analysis of single-trip fare data for Capital Bikeshare. District Department of Transportation (DDOT). Final Report

Venigalla MM, Kaviti S, Brennan T (2019) Impact of the introduction of single-trip fare product on bikeshare usage and revenue: the Capital Bikeshare experience. In: Proceedings of the 15th world conference on transport research (WCTR), May 26–31, 2019, Mumbai

Wang X, Lindsey G, Schoner JE, Harrison A (2015) Modeling bike share station activity: effects of nearby businesses and jobs on trips to and from stations. J Urban Plan Dev 142(1):04015001

Weather Underground (2016) http://www.weatherunderground.com. Accessed 31 Mar 2018

Zhang Y, Thomas T, Brussel M, Van Maarseveen M (2017) Exploring the impact of built environment factors on the use of public bikes at bike stations: case study in Zhongshan, China. J Transp Geogr 58:59–70
