
Usage-Based Pricing of the Internet*

Aviv Nevo†
Northwestern University

John Turner‡
University of Georgia

Jonathan Williams§
University of Georgia

Preliminary and Incomplete
November 2012

Abstract

We estimate demand for residential broadband to study the efficiency properties of usage-based billing. Using detailed, high-frequency internet protocol data records, we exploit variation in the intertemporal tradeoffs faced by subscribers to estimate the distribution of subscribers' preferences for different characteristics of service: access and overage fees, usage allowances, and connection speeds. We find significant heterogeneity in tastes along each dimension of service. Using these estimates, we examine the efficiency of various three-part tariff pricing schedules. We find that usage-based pricing models currently being employed in North America are successful at eliminating large volumes of low-value traffic while having a minimal impact on subscriber welfare. These findings provide strong support for the FCC's backing of the industry's move away from flat-rate pricing.

Keywords: Demand, Broadband, Dynamics, Usage-based Pricing, Welfare.
JEL Codes: L13.

* We thank those North American Internet Service Providers that provided the data used in this paper. We thank Terry Shaw, Jacob Malone, Scott Atkinson, and seminar participants at Ga. Tech and UGA for insightful comments that significantly improved this paper. Jim Metcalf provided expert computational and storage support for this project. All remaining errors are ours.

† Department of Economics, Northwestern University, [email protected], ph: (847) 491-7001.
‡ Department of Economics, University of Georgia, [email protected], ph: (706) 542-3376.
§ Corresponding Author: Department of Economics, University of Georgia, [email protected], ph: (706) 542-3689.


1 Introduction

In the U.S., "last mile" connectivity to the internet is privately provided by telecom (e.g., AT&T) and/or cable (e.g., Comcast) companies. This leaves the problem of allocating scarce network resources (i.e., bandwidth) to the discretion of the internet service provider, or ISP. The ability of ISPs to price the delivery of content from the internet to subscribers (and vice versa) has important implications for the future development of online content, communications, and, more generally, the way in which people use the internet. It has therefore led to significant discussion of how the internet should be regulated.

During the past decade in the U.S., ISPs have typically sold unlimited access to the internet for a fixed monthly fee. During this time, the average residential subscriber's usage has grown 50% annually. This dramatic growth in usage has led to a shift in the industry towards usage-based pricing plans similar to those commonly associated with cellular phones. Typically, these plans take the form of a three-part tariff: a fixed access price, a usage allowance, and a marginal price for usage in excess of the allowance. Just this year, two of the largest cable providers, Comcast and Time Warner Cable, conducted trials of usage-based pricing in select markets.

ISPs argue that usage-based pricing is necessary to curtail the usage of the small number of subscribers who dramatically drive up network costs and degrade the quality of service for other subscribers. This view treats usage-based pricing as a type of Pigouvian tax that helps equate a subscriber's private benefit with the costs borne by the ISP (i.e., network investment) and by other subscribers (i.e., degraded service). ISPs also argue that usage-based pricing gives content developers the right incentives to minimize the bandwidth requirements of their applications. For example, YouTube recently added an option for users to lower the quality of video streams. This allows subscribers to reduce quality to a level acceptable to them, while avoiding overage charges and minimizing the costs that their traffic imposes on the ISP and other users. These types of arguments recently led the Federal Communications Commission (FCC) to back the practice: "Usage-based pricing would help drive efficiency in the networks," Julius Genachowski, FCC Chairman (Chicago Tribune, May 22, 2012).


The recent shift in the industry towards usage-based pricing models, along with the support of government regulators, has given rise to numerous organizations devoted to preventing it. These include web sites (e.g., www.stopthecap.com and openmedia.ca/meter) that monitor ISPs' activities for indications (e.g., reporting usage on subscribers' bills) that usage-based pricing will be introduced. The web sites then ask their followers to bombard the ISP with complaints, often providing direct contact information for the companies' executives, in the hope of preventing or delaying it. More formal organizations lobby regulators and legislators directly. Geekdom is one such organization: "It's like locking the doors to the library," Nicholas Longo, Geekdom Director (NY Times, June 26, 2012). Generally, these organizations believe that the activities of high-volume subscribers are of high value and that any type of "caps" or usage-based pricing will result in significant welfare losses for subscribers.

The extant academic literature on topics related to the economics of internet access is

very limited and almost exclusively theoretical in nature. Existing theoretical studies (e.g.

Odlyzko (2012)) often reach very strong and conflicting conclusions regarding the welfare

consequences of usage-based pricing. To date, the lack of detailed data on consumption of

internet content has limited empirical work on the topic to only a couple of papers: Goolsbee

and Klenow (2001) and Lambrecht et al. (2007). Goolsbee and Klenow (2001) use Forrester

Technographics Survey data on individuals' time spent on the internet and earnings to

innovatively estimate the private benefit to subscribers of residential broadband. However,

the authors' estimate relies on the potentially dubious assumption that an hour spent on the

internet is an hour in forgone wages.

The lack of more empirical studies on these important issues is largely due to the proprietary

nature of, and technological constraints associated with collecting, much of the data

required to study these issues. However, the dramatic growth in usage over the past decade

has forced ISPs to invest in technology to track usage and better manage scarce network

resources. These investments in data collection have created an opportunity for academic

researchers to better understand the economics of broadband internet access. In this paper,

we use 5 months of detailed hourly internet utilization data, obtained from a group

of North American ISPs, some of which employ usage-based pricing, to study the impact

of usage-based pricing on subscribers. We observe the total volume of content downloaded

and uploaded each hour for approximately 5 months from late 2011 to early 2012 for over

30,000 subscribers.

To study the welfare implications of usage-based pricing, we begin by building a dynamic model of subscribers' intertemporal decision making throughout a billing cycle under usage-based pricing. Specifically, we model subscribers as utility-maximizing agents who solve a dynamic optimization program each billing cycle. While we do not have variation in the service plans or tiers offered to subscribers during the sample period, the high-frequency nature of our data and the variety of three-part tariffs offered to consumers allow us to accurately estimate demand. In particular, the high-frequency data allow us to exploit variation in the shadow price, i.e., the implications of current consumption for the probability of incurring overage charges later in the billing cycle, to trace out marginal utility for subscribers. In addition, selection into plans, or the choice of a particular three-part tariff, reveals a great deal about preferences by revealing an average willingness to pay for content and a preference over the speed of one's connection (i.e., Mb/s). Our provider offers plans ranging from almost linear tariffs (i.e., very low usage allowances) to plans with allowances well over 100 GBs. The connection speeds, overage prices, and usage allowances are all non-decreasing in the fixed access fee.

A potential concern with such a model is that it may be wrong to have each subscriber solve his/her optimization problem in isolation. That is, network externalities among users could make the problem a dynamic game in which subscribers choose when to use the internet based on preferences and expectations regarding congestion in the network. However, the data we use in this paper come from an ISP that operates an over-provisioned and pristine network. This allows us to accurately model a subscriber's usage decision as an independent one. We discuss how we can measure the absence of network externalities in this provider's network in Section 2.

To estimate the model, we adapt the techniques of Ackerberg (2009), Bajari et al. (2007) and Fox et al. (2012). These techniques have three advantages. First, they avoid very computationally expensive fixed-point estimation algorithms: it is only necessary to solve the dynamic programming problem for each type of agent a single time. Second, one can relax parametric/distributional assumptions typically made when estimating such dynamic models. This is critical for our purposes given the limited amount of information we have about each subscriber and the extreme heterogeneity in usage behavior, which is difficult to model. Finally, the techniques naturally deal with difficult-to-model forms of selection, which is an important issue in our application. Subscribers select into a service tier or plan and we only observe usage under that optimally selected plan.

The application itself is of interest and has important policy implications. As desirable, yet bandwidth-intensive, applications continue to be developed and subscriber usage grows, it will be increasingly important to have accurate measures of the demand for content. Our results largely support the current regulatory stance of the FCC. We find that usage-based pricing, as currently implemented by North American providers, is successful at removing a great deal of low-value traffic from the networks. This is largely due to the negative correlation between the value and volume of typical online activities (e.g., 5 GBs to stream a movie and only bytes to send 1000s of emails).¹ We show that subscribers derive a great deal of utility from the first bytes of traffic generated on a broadband connection, but utility diminishes rapidly thereafter. This supports the goal of the FCC to increase the reach of basic broadband service to more rural and under-served areas as stated in their National Broadband Plan.² Finally, our results are important for how content will be delivered in the future by telecom and cable companies. There is an appeal to unicasting, or allowing each user to view content at their convenience. However, the low willingness to pay we find for this convenience and the high costs (i.e., the large amount of additional traffic to accommodate on networks) suggest that into the foreseeable future (a short time in this industry), the cost effectiveness of broadcasting will continue to dominate arguments against usage-based billing of the internet.

¹ This is consistent with Netflix's failed attempt to raise the prices of their service in 2012.

² The FCC is currently working with telecom and cable providers to offer a basic $9.99 tier to low-income households. See Greenstein and Prince (2008) for more on the historical reach of broadband services.

The remainder of the paper is as follows. In Section 2 we discuss our data in greater

detail and provide reduced-form results that motivate assumptions made in the structural

model and demonstrate a high price elasticity for online content. In Section 3 we discuss

the model used to capture the intertemporal decisions of subscribers regarding consumption

of content under usage-based billing. Section 4 presents our methodology for estimating

the structural model and presents the results. Section 5 presents the results of our

counterfactual exercise to identify the benefit to subscribers of removing usage allowances,

and Section 6 concludes.

2 Data

The data used in this paper are Internet Protocol Data Record (IPDR) data. IPDR is a

standardized framework for collecting usage and performance data from IP-based services

and is currently the most popular way for cable operators to efficiently measure subscriber

usage. The IPDR framework is supported by a DOCSIS (Data Over Cable Service Interface

Specification) 2.0/3.0 compliant CMTS (Cable Modem Termination System). A CMTS uses

a collector to gather data at a minimum configurable reporting period of 15 minutes. Our

data report the volume of a subscriber's usage every hour, linking subscribers to their

usage through their cable modem's (CM) MAC (Media Access Control) address. The

usage reported for a subscriber does not reflect DOCSIS framing overhead; however, it does

include operator-initiated management, control traffic, and Internet-originated traffic (pings,

port scans, etc.), which must be considered when metering internet usage. See Clarke (2009)

for more details on the structure of DOCSIS cable networks.

The unit of observation for the IPDR data is a MAC address and a record creation time,

or month, day, and hour. The data also reports the DOCSIS mode, 2.0 or 3.0, of each

subscriber's modem. DOCSIS 3.0 modems permit greater provisioned speeds, as DOCSIS

2.0 modems limit a subscriber's connection to no more than 42.88 Mb/s. In the data, we

observe bytes and packets passed by a subscriber, both in the upstream (e.g., uploading a

file to Dropbox) and downstream (e.g., streaming a movie from Netflix) directions. For

billing purposes, and consequently our purposes, the direction of the traffic is ignored and

we examine the total traffic across both directions.

In addition to the number of bytes and packets passed by a subscriber, we also observe

the number of packets that are delayed or dropped by the network for each subscriber.

Delayed packets correspond to those that are requested by a subscriber in excess of their

connection's provisioned speed, e.g., requesting packets at a rate of 10 Mb/s on a connection

that is provisioned for 8 Mb/s. Typically, delayed packets are ultimately passed. Dropped

packets are those that never reach their destination. Observing the extent of dropped

packets is extremely important, because in a network that is inadequately provisioned (i.e.,

not enough bandwidth to handle requests for content) externalities among users can result

in interdependent demands. Fortunately, our data come from a market and internet service

provider (ISP) that operates an over-provisioned and pristine network. Over a 5 month

period, not a single subscriber had more than 0.001% of packets dropped in any one-hour

period. Our discussions with industry experts suggest that dropped packets in excess of 1%

correspond to a degraded quality of service that would be noticeable to the subscriber. For

this reason, in modeling a subscriber's usage decision, we reasonably assume that their usage

decision does not depend on concerns over congestion in the network. Otherwise stated,

we assume independence of subscribers' demand functions. IPDR data do identify the

CMTS interface that a user is linked to. Using this information, one could infer which users

share an interface on the network and allow for interdependent demands. See Malone (2012)

for an empirical study of network externalities in broadband networks.

As mentioned above, a subscriber's usage is linked to the MAC address of their cable modem. Through the MAC address, we are able to link in information about the subscriber's service tier (e.g., a usage allowance of 50 GB and a provisioned downstream (upstream) speed of 8 Mb/s (1 Mb/s)) and the day that the billing cycle resets (e.g., usage counters reset to zero on the 7th of each month). We have monthly reports that give the service tier for each subscriber. Not surprisingly, since our provider did not change any features of any tiers, we see very few subscribers (less than 0.1%) switch tiers during the five months for which we have data. This lack of variation in the features of tiers would seem to be discouraging for identification purposes. However, as we discuss below, the high-frequency nature of the data allows us to see intertemporal decisions made by every subscriber at each point in the billing cycle, compensating for the lack of variation in features of the service tiers.

2.1 Sample and Descriptive Statistics

Our sample includes hourly usage for approximately 30,000 subscribers from October 1st of

2011 to February 14th of 2012 in a single metropolitan market. Due to the sheer volume of

data and the fact that over 85% of residential usage is during peak hours (7pm-11pm), we

aggregate usage to a daily level. See Figure 1 for average subscriber utilization in Kb/s

over the day and across all subscribers and tiers. In addition, we remove subscriber-day

observations that are not part of a complete billing cycle for a subscriber. This results in

either 3 or 4 complete billing cycles for each subscriber, depending on when the subscriber's

usage counter resets each month.

The internet service provider offers multiple tiers, which differ along a few dimensions of significance. Similar to broadband service offered by a typical North American provider, our provider differentiates service tiers by provisioned speed, ranging from 2 Mb/s (1 Mb/s) to 60 Mb/s (2 Mb/s) downstream (upstream). The fairly uncommon aspect of our provider's broadband service is that each tier is priced with a three-part tariff, similar to many cell phone plans, which includes an access fee, a usage allowance, and a per-GB overage fee. The access fee is a fixed amount paid each month, irrespective of usage, while the usage allowance permits the subscriber to use a certain amount of data before incurring overage fees for each GB of data in excess of the allowance. From the least to the most expensive tier (lowest to highest access fee), the usage allowance and provisioned speed are non-decreasing.
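As a minimal sketch of how a three-part tariff maps monthly usage into a bill, the following Python functions compute the total charge and the implied average price per GB. The function names and the plan values in the usage note are hypothetical, and the sketch assumes overage is billed on fractional GBs, which the text does not specify.

```python
def monthly_bill(usage_gb, access_fee, allowance_gb, overage_price):
    """Total monthly charge under a three-part tariff.

    Assumption (not from the text): overage is billed continuously,
    i.e. on fractional GBs above the allowance.
    """
    overage_gb = max(usage_gb - allowance_gb, 0.0)
    return access_fee + overage_price * overage_gb


def average_price_per_gb(usage_gb, access_fee, allowance_gb, overage_price):
    """Average price paid per GB consumed (undefined at zero usage)."""
    if usage_gb <= 0:
        raise ValueError("average price is undefined for zero usage")
    return monthly_bill(usage_gb, access_fee, allowance_gb, overage_price) / usage_gb
```

For a hypothetical plan with a $40 access fee, a 50 GB allowance and a $2 per-GB overage price, a subscriber using 30 GB pays only the access fee, while one using 60 GB pays the fee plus 10 GB of overage. Because the access fee is spread over few GBs for light users, the average price per GB can be high even when the marginal price is zero.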

Figures 2a and 2b plot monthly usage quantiles for subscribers on the least and most

expensive tiers as a percentage of the usage allowance, respectively. Figure 2a (2b) shows

that on the lowest (highest) tier approximately 30% (20%) of subscribers exceed their usage


allowance. The large number of subscribers well below the usage allowance demonstrates

the importance of allowing for satiation for online content in subscribers' preferences. These

figures also point to the large degree of heterogeneity in usage across subscribers, even within

a tier, with the heaviest users on each tier in a month using 20 times more than the median

user. We discuss how we control for selection into service tiers when estimating demand in

Section 4.

Table 1 breaks down usage at a daily frequency, the unit of observation for the remainder

of our analysis, aggregating across service tiers. Average usage in a month is 21.7 GBs,

while median usage is only 8.5 GBs. This corresponds to an interquartile range of 56 GBs,

with the 75th percentile (62 GBs) over 10 times the 25th percentile (6 GBs). On average,

approximately 6% of users exceed their usage allowance. Of those who exceed their usage

allowance, the average (median) overage is 26.9 GBs (14.2 GBs). For all subscribers, the

median price paid per GB of content is $5.73, while the 25th percentile is $1.79 and the 75th percentile is $19.73.

As we discuss in Section 4.1, these average willingness to pay statistics will be important for

inferring preferences for the subscribers that have a negligible probability of exceeding their

usage allowance in a given month (face a shadow price near zero for consuming content at

each point in the billing cycle).³

2.2 Preliminary Analysis

Before moving to the structural model we provide evidence that suggests that subscribers

are aware of their state (position relative to usage allowance and time remaining in month)

and are forward looking. Similar to many cell phone operators, our provider gives notices

via E-mail and text as a subscriber nears their allowance, allows the subscriber to login and

check their usage to date, and provides an application for web browsers that monitors usage

in real time. Thus, the cost of verifying one's state should be small. We begin by running

the following regression

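\[
c_{ikmt} = \beta_0 + \beta_1 \left( \frac{C_{ikm(t-1)}}{C_k} \right) + \beta_2\, \mathit{daysleft}_{mt} + \mathit{dow}_{mt}\gamma + \mathit{time}_t + \mu_{im} + \varepsilon_{ikmt}, \qquad (1)
\]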

³ Past studies, e.g., Lambrecht et al. (2007), have relied entirely on such variation to identify demand.

where the dependent variable, $c_{ikmt}$, is subscriber $i$'s usage on plan $k$ on day $t$ of the billing cycle in month $m$. The ratio $C_{ikm(t-1)}/C_k$ is the proportion of the usage allowance used to date: the subscriber's total usage in the previous $(t-1)$ days of the billing cycle, $C_{ikm(t-1)} = \sum_{\tau=1}^{t-1} c_{ikm\tau}$, divided by the usage allowance on plan $k$, $C_k$. We also include $\mathit{daysleft}_{mt}$, the number of days left in the billing cycle, dummies for the days of the week, $\mathit{dow}_{mt}$, and a time trend, $\mathit{time}_t$. The inclusion of subscriber-billing month fixed effects removes persistent forms of heterogeneity across subscribers as well as any billing-cycle specific shocks to usage (e.g., seasonality or trends in usage).
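To make the construction of the key regressors concrete, the sketch below builds, for one subscriber and one billing cycle, the share of the allowance used before each day and the number of days left. The function and field names are hypothetical, and the sketch assumes that the number of days left on day t of a T-day cycle is T minus t, which the text does not state explicitly.

```python
def cycle_regressors(daily_usage, allowance_gb):
    """Build one row per day of a billing cycle for a single subscriber.

    daily_usage: usage in GB for days 1..T of the cycle, in order.
    Assumption (not from the text): days_left on day t equals T - t.
    """
    n_days = len(daily_usage)
    rows = []
    cumulative = 0.0  # total usage over the previous t-1 days
    for t, usage in enumerate(daily_usage, start=1):
        rows.append({
            "day": t,
            "usage": usage,                              # dependent variable
            "share_used": cumulative / allowance_gb,     # allowance used to date
            "days_left": n_days - t,
        })
        cumulative += usage
    return rows
```

Each row pairs the current day's usage with cumulative usage through the previous day only, so the regressor is predetermined relative to the dependent variable.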

Intuitively, the coefficient on the proportion of the allowance used, $\beta_1$, should be negative, while the sign of the coefficient on days remaining, $\beta_2$, is ambiguous. As the probability that a subscriber will exceed the usage allowance increases, i.e., the shadow price of current consumption increases, a subscriber with a high price elasticity of demand will tend to pull back ($\beta_1 < 0$) on usage. This reduction in usage may occur well in advance of reaching the usage allowance if the subscriber wants to ensure that overage charges will not be incurred. Similarly, for any level of previous usage, a subscriber further from the end of the billing cycle may want to reduce current usage ($\beta_2 < 0$) to ensure the usage allowance is not exceeded. However, early in the billing cycle, a great deal of uncertainty regarding future demand is yet to be resolved, so the user may want to ensure that they use the entire allowance ($\beta_2 > 0$). It is important to note that any form of positive serial correlation in usage will work against finding a negative relationship between consumption ($c_{ikmt}$) and previous consumption ($C_{ikm(t-1)}/C_k$). Such correlation in usage may arise from the dynamics of the subscribers' intertemporal decision process itself. For example, a subscriber that enters an undesirable state (i.e., high cumulative usage early in a billing cycle) may respond by using the service consistently less throughout the remainder of the billing cycle. This points to the importance of modeling the entire process for consumption, not just the process near any nonlinearities in the pricing schedule. We discuss these issues further in Section 4.

The estimates of Equation 1, and variations of it, are reported in Tables 1a-1d. In each of the tables, Columns 1 and 2 report the estimates of Equation 1 where the dependent variable is in levels and log-transformed, respectively. Columns 1 and 2 both report a negative sign for the proportion of the usage allowance used to date. The pull back in consumption of 0.255 GB, or approximately 17% of daily consumption, is statistically and economically meaningful. These results are consistent with subscribers being aware of their states and adjusting consumption in response to the probability of exceeding the usage allowance and incurring overage charges, i.e., an increase in the shadow price of consumption. Only one of the estimates of the coefficient on the days remaining in the billing cycle is statistically significant. This may be due in part to the strong correlation between the proportion of the usage allowance used in a month and the number of days remaining in the month. The other controls show that usage is heaviest on the weekend (Sunday is omitted) and there is some evidence of a weak positive trend in daily usage.

The linear relationship between a subscriber's current usage and their position relative to the usage allowance assumed in Columns 1 and 2 is clearly ad hoc. In particular, one may expect a highly nonlinear relationship, as it is not clear when and how a subscriber will begin to respond to an increasing probability of exceeding their usage allowance. This would depend on a number of things, including any uncertainty in future demand for internet content. To better capture any such dynamics in usage, we specify a set of indicators for a subscriber's position relative to their usage allowance: between 50% and 75%, 75% and 90%, 90% and 95%, 95% and 100%, and over the allowance. Columns 3 and 4 of Table 1 report these estimates with the dependent variable in levels and log-transformed, respectively. Column 3 shows a monotonically decreasing consumption profile for subscribers nearing (and exceeding) their usage allowance. Subscribers increasingly reduce consumption as it becomes clear that the usage allowance will be binding and the shadow price of current consumption approaches the per-unit overage price. The results in Column 4 are very similar: once a user exceeds 95% of their usage allowance, they have reduced consumption by approximately 27% and have fully internalized the overage price. Yet, again, well in advance of these levels, subscribers begin to account for the possibility of exceeding the usage allowance. In the next section, we formalize this intuition by modeling the intertemporal decision-making process facing subscribers.
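The indicator specification described above can be sketched as a simple mapping from the share of the allowance used to date into a category. The labels are hypothetical, and the treatment of boundary values (intervals closed on the left, with exactly 100% counted as over the allowance) is an assumption, since the text does not say how ties are handled.

```python
def allowance_bin(share_used):
    """Map the share of the allowance used to date to an indicator label.

    Assumption (not from the text): bins are closed on the left, and
    shares below 50% form the omitted baseline category.
    """
    if share_used < 0.50:
        return "below_50"   # omitted baseline
    if share_used < 0.75:
        return "50_75"
    if share_used < 0.90:
        return "75_90"
    if share_used < 0.95:
        return "90_95"
    if share_used < 1.00:
        return "95_100"
    return "over"
```

Each non-baseline label would enter the regression as its own dummy variable, letting the response of usage vary freely across the ranges rather than linearly in the share used.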


3 Model

3.1 Utility

We assume consumers derive utility from content and a numeraire good. To consume content, each consumer must choose a tier or plan, indexed by $k$. Each plan is characterized by the speed $s_k$ at which content is delivered over the internet, by the usage allowance $C_k$, by the fixed fee $F_k$ and by the per-unit price of usage in excess of the allowance, $p_k$. Specifically, $F_k$ pays for all consumption up to $C_k$, while all units above $C_k$ cost $p_k$ per unit. For any plan, the number of days in the billing cycle is $T$.

Utility is additively separable over all days in the billing cycle.⁴ Let consumption of content on day $t$ of the billing cycle be $c_t$ and let consumption of the numeraire good on day $t$ be $y_t$. We specify a simple quasi-linear form, where the flow of utility is concave (logarithmic) in $c_t$. Specifically, a consumer of type $h$ on plan $k$ has

\[
u_h(c_t, y_t; k) = \upsilon_t \ln(1 + c_t) - c_t \left( \kappa_{1h} - \kappa_{2h} \ln(s_k) \right) + y_t,
\]

where the time-varying unobservable, $\upsilon_t$, is not known to the subscriber until period $t$ and is independently and identically distributed on $[0, \bar{\upsilon}]$ according to a distribution $G_h$.⁵ Hence, the consumer's marginal utility varies randomly across days in ways that the consumer cannot predict. The specification includes a constant marginal cost of consuming online content, $\kappa_{1h} - \kappa_{2h} \ln(s_k)$, that is decreasing in the speed of the connection, $s_k$. This implies that the consumer has a satiation point, which captures key features of the data. All parameters differ across types of consumers.
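A short derivation makes the satiation point explicit. It is a sketch under two assumptions not stated in the text: the marginal cost term is strictly positive, and no overage price is in effect. Writing $\upsilon_t$ for the daily preference shock and $\kappa_{1h} - \kappa_{2h}\ln(s_k)$ for the marginal cost term (symbol names assumed here), and using $c_t^{sat}$ as illustrative notation for the satiation level, setting the marginal utility of content to zero gives:

```latex
\frac{\partial u_h}{\partial c_t}
  = \frac{\upsilon_t}{1 + c_t} - \left( \kappa_{1h} - \kappa_{2h} \ln(s_k) \right) = 0
\quad \Longrightarrow \quad
c_t^{sat} = \max\left\{ \frac{\upsilon_t}{\kappa_{1h} - \kappa_{2h} \ln(s_k)} - 1,\; 0 \right\}.
```

The satiation level rises with the shock $\upsilon_t$ and, when $\kappa_{2h} > 0$, with the connection speed $s_k$, since a faster connection lowers the marginal cost of consuming content.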

Letting income be $I$ and letting total consumption of content and of the numeraire good since the beginning of the billing cycle be $C_t \equiv \sum_{j=1}^{t} c_j$ and $Y_t \equiv \sum_{j=1}^{t} y_j$, respectively, define the monthly budget constraint as

\[
F_k + p_k (C_T - C_k)\, 1[C_T > C_k] + Y_T \le I, \qquad (2)
\]

⁴ In this way, we assume that content with a similar marginal utility is generated each day or constantly refreshed. This may not be the case for a subscriber that has not previously had access to the internet.

⁵ The right-truncation of $G_h$ is necessary to ensure that the consumer can afford any efficient level of daily consumption. Let h, $s_k$ and $p_k$ be the highest levels of these parameters. It suffices to assume that $T p_k (h + \ln(s_k) + \bar{\upsilon}) < I$.


where $1[\cdot]$ is the indicator function.

Denote the discount factor $\delta \in (0, 1)$. Conditional on choosing plan $k$, the consumer's problem is to choose daily consumption to maximize

\[
U = \sum_{t=1}^{T} \delta^{t-1} E\left[ u_h(c_t, y_t; k) \right], \quad \text{subject to (2).}
\]

Throughout this paper, we will assume that all consumers have sufficient income to pay for satiation levels of content.

3.2 Optimal Consumption

The subscriber's problem is a finite-horizon dynamic-programming problem. Consider the terminal period ($T$) of a billing cycle and denote the remaining allowance $C_{kT} \equiv \max\{C_k - C_{T-1}, 0\}$. The efficiency condition for optimal consumption depends on whether it is optimal to exceed $C_{kT}$. Intuitively, if the consumer is well below the cap (i.e., $C_{kT}$ is high) and does not have a particularly high draw of $\upsilon_T$, then she consumes content up to the point where the marginal utility of content is zero. If marginal utility at $c_T = C_{kT}$ is positive but below $p_k$, then it is optimal to consume the remaining allowance. If one is already above the cap (i.e., $C_{kT} = 0$) or draws an extremely high $\upsilon_T$, then it is optimal to consume up to the point where the marginal utility of content equals the overage price.
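The three cases for the terminal period can be sketched in a few lines of Python. This is an illustration only: it assumes the logarithmic flow utility with a strictly positive marginal cost term, and the function and argument names are hypothetical.

```python
def terminal_consumption(shock, marginal_cost, overage_price, remaining_allowance):
    """Optimal last-day usage for flow utility shock*ln(1+c) - marginal_cost*c.

    Assumptions (not from the text): marginal_cost > 0 stands in for the
    plan-specific marginal cost term, and usage is continuous.
    """
    # Case 1: satiation, where marginal utility shock/(1+c) - marginal_cost is zero.
    satiation = max(shock / marginal_cost - 1.0, 0.0)
    if satiation <= remaining_allowance:
        return satiation
    # Case 3: usage at which marginal utility equals the overage price.
    priced = max(shock / (marginal_cost + overage_price) - 1.0, 0.0)
    if priced > remaining_allowance:
        return priced
    # Case 2: marginal utility at the cap is positive but below the overage
    # price, so the subscriber uses exactly the remaining allowance.
    return remaining_allowance
```

With a shock of 10, a marginal cost of 1 and an overage price of 1, a subscriber with 20 GB of allowance left stops at the satiation point of 9 GB, one with 5 GB left uses exactly 5 GB, and one with 2 GB left (or none) consumes 4 GB, the point where marginal utility equals the overage price.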

In each of these situations in the last period, there are no intertemporal tradeoffs. Usage today has no impact on next period's state, as cumulative consumption resets to zero at the beginning of each billing cycle. Thus, the problem is reduced to solving a static utility maximization problem, given a subscriber's cumulative usage up until period $T$, $C_{T-1}$, and the realization of the preference shock, $\upsilon_T$, which together determine the implications for overage charges and the marginal utility of usage, respectively. Denote this optimal level of consumption, for a given realization of $\upsilon_T$, by $c^*_{hkT}$.

For a given realization of the preference shock in period $T$, the subscriber's utility from entering the final period with state $C_{T-1}$ and behaving optimally in the terminal period is

\[
V_{hkT}(C_{T-1}, \upsilon_T) = \upsilon_T \ln(1 + c^*_{hkT}) - c^*_{hkT} \left( \kappa_{1h} - \kappa_{2h} \ln(s_k) \right) + y_T
- p_k \left\{ c^*_{hkT}\, 1[C_{T-1} > C_k] + (C_{T-1} + c^*_{hkT} - C_k)\, 1[C_{T-1} < C_k < C_{T-1} + c^*_{hkT}] \right\}.
\]

Prior to the realization of $\upsilon_T$, the subscriber's expected utility is then

\[
E\left[ V_{hkT}(C_{T-1}) \right] = \int_0^{\bar{\upsilon}} V_{hkT}(C_{T-1}, \upsilon_T)\, dG_h(\upsilon_T).
\]

The expected value function, $E[V_{hkT}(C_{T-1})]$, is defined for all $C_{T-1} > 0$. Similarly, the expected usage of a subscriber prior to the realization of $\upsilon_T$ is given by

\[
E\left[ c^*_{hkT}(C_{T-1}) \right] = \int_0^{\bar{\upsilon}} c^*_{hkT}(C_{T-1}, \upsilon_T)\, dG_h(\upsilon_T).
\]

Other conditional moments of optimal consumption can be calculated similarly for each state, $(C_{t-1}, t)$.
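The expectation over the preference shock can be approximated numerically. The sketch below uses a fixed-step composite Simpson rule with a normal density truncated to a bounded interval; it is a simpler stand-in, not the adaptive scheme, and the function names, the truncated-normal choice for the shock distribution and all parameter values are assumptions for illustration.

```python
import math


def truncated_normal_pdf(v, mu, sigma, upper):
    """Density of a N(mu, sigma^2) variable truncated to [0, upper]."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    if v < 0.0 or v > upper:
        return 0.0
    z = (v - mu) / sigma
    phi = math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return phi / (cdf(upper) - cdf(0.0))


def expected_usage(policy, mu, sigma, upper, n=200):
    """Composite Simpson approximation of the integral of policy(v) dG(v).

    policy: callable mapping a shock value to optimal usage at a fixed state.
    n: number of subintervals (rounded up to an even number).
    """
    if n % 2:
        n += 1
    h = upper / n
    total = 0.0
    for i in range(n + 1):
        v = i * h
        # Simpson weights: 1 at the endpoints, 4 at odd nodes, 2 at even nodes.
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * policy(v) * truncated_normal_pdf(v, mu, sigma, upper)
    return total * h / 3.0
```

Passing a policy function that depends only on the shock, for a fixed level of cumulative usage, returns the expected usage at that state. Other conditional moments follow by integrating the corresponding power of the policy.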

Similarly, for any day in the billing period besides the last day, $t < T$, the optimal policy function for a subscriber of type $h$ on plan $k$ is

\[
c^*_{hkt}(C_{t-1}, \upsilon_t) = \arg\max_{c_t} \Big[ \upsilon_t \ln(1 + c_t) - c_t \left( \kappa_{1h} - \kappa_{2h} \ln(s_k) \right) + y_t
- p_k \left\{ c_t\, 1[C_{t-1} > C_k] + (C_t - C_k)\, 1[C_{t-1} < C_k < C_t] \right\}
+ \delta E\left[ V_{hk(t+1)}(C_{t-1} + c_t) \right] \Big],
\]

where $C_t = C_{t-1} + c_t$, and the value functions are given by

\[
V_{hkt}(C_{t-1}, \upsilon_t) = \upsilon_t \ln(1 + c^*_{hkt}) - c^*_{hkt} \left( \kappa_{1h} - \kappa_{2h} \ln(s_k) \right) + y_t
- p_k \left\{ c^*_{hkt}\, 1[C_{t-1} > C_k] + (C_{t-1} + c^*_{hkt} - C_k)\, 1[C_{t-1} < C_k < C_{t-1} + c^*_{hkt}] \right\}
+ \delta E\left[ V_{hk(t+1)}(C_{t-1} + c^*_{hkt}) \right]
\]

for each ordered pair $(C_{t-1}, \upsilon_t)$.⁶ Similar to the terminal period, the expected value function is

\[
E\left[ V_{hkt}(C_{t-1}) \right] = \int_0^{\bar{\upsilon}} V_{hkt}(C_{t-1}, \upsilon_t)\, dG_h(\upsilon_t)
\]

for all $t < T = 30$, and the mean of a subscriber's usage at each state is

\[
E\left[ c^*_{hkt}(C_{t-1}) \right] = \int_0^{\bar{\upsilon}} c^*_{hkt}(C_{t-1}, \upsilon_t)\, dG_h(\upsilon_t). \qquad (3)
\]

⁶ Notice this formulation of the optimization problem assumes that the subscriber is aware of their cumulative consumption, $C_{t-1}$, on each day in the billing cycle. This is a realistic assumption for our data, as the results in Table 2 demonstrate.

The policy functions for each type (h) of subscriber imply a distribution for the time spent

in particular states (t; Ct�1) over a billing cycle. We discuss solving for this distribution,

generated by optimal subscriber behavior, and how it, along with the moments of usage,

forms the basis of our method of moments approach discussed in Section 4.

3.3 Model Solution and Stationary Distribution

Let $G_h$ denote a normal distribution, truncated at $\bar{\upsilon}$, with mean $\mu_h$ and variance $\sigma_h^2$. For a plan, $k$, and subscriber type, $h$, characterized by the vector $(\kappa_{1h}, \kappa_{2h}, \mu_h, \sigma_h)$, the finite-horizon dynamic program described above can be solved recursively, starting at the end of each billing cycle ($t = T$). To do so, we discretize the state space for $C_t$ to a grid of 1800 points with spacing of size $c_{sk}$ GBs for each plan, $k$. Our data are hourly, so time is naturally discrete, but we aggregate time up to the day ($t = 1, 2, \ldots, 30$ over a billing cycle with $T = 30$ days).7 This discretization leaves $\upsilon_t$ as the only continuous state variable. Because the subscriber does not know $\upsilon_t$ prior to period $t$, we can integrate it out, and the solution to the dynamic programming problem for a subscriber of each type $h$ can be characterized by the expected value functions, $EV_{hkt}(C_{t-1})$, and policy functions, $c^*_{hkt}(C_{t-1}, \upsilon_t)$. To perform the numerical integration over the bounded support $[0, \bar{\upsilon}]$ of $\upsilon_t$, we use adaptive Simpson quadrature.
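The backward recursion described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used for the estimates: the grid is much coarser than 1800 points, a fixed composite Simpson rule stands in for adaptive quadrature, income $y_t$ is dropped because it is a constant that does not affect choices, and all function names and parameter values (`solve_type`, `shock_nodes`, `flow_utility`, and the numbers passed to them) are hypothetical.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def shock_nodes(mu, sigma, ubar, n=20):
    """Simpson nodes on [0, ubar] and Simpson weight times truncated-normal density."""
    h = ubar / n  # n must be even for the composite Simpson rule
    mass = norm_cdf((ubar - mu) / sigma) - norm_cdf((0.0 - mu) / sigma)
    nodes, probs = [], []
    for j in range(n + 1):
        u = j * h
        w = h / 3.0 * (1 if j in (0, n) else 4 if j % 2 else 2)
        pdf = math.exp(-0.5 * ((u - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi) * mass)
        nodes.append(u)
        probs.append(w * pdf)
    return nodes, probs

def flow_utility(c, ups, C_prev, k1, k2, speed, price, allowance):
    """One-day utility net of overage charges (income omitted)."""
    # Units of today's usage that fall above the allowance.
    over = max(C_prev + c - allowance, 0.0) - max(C_prev - allowance, 0.0)
    return ups * math.log(1.0 + c) - c * (k1 - k2 * math.log(speed)) - price * over

def solve_type(k1, k2, mu, sigma, ubar, speed, price, allowance,
               step=1.0, n_grid=40, T=5, beta=0.99):
    """Backward induction. Returns EV[t][i] for t = 1..T+1 (row 0 is unused)."""
    nodes, probs = shock_nodes(mu, sigma, ubar)
    EV = [[0.0] * n_grid for _ in range(T + 2)]  # EV[T+1] = 0: cycle ends
    for t in range(T, 0, -1):
        for i in range(n_grid):
            total = 0.0
            for u, pr in zip(nodes, probs):
                # Choose how many grid steps z to consume today.
                best = max(
                    flow_utility(z * step, u, i * step, k1, k2, speed, price, allowance)
                    + beta * EV[t + 1][i + z]
                    for z in range(n_grid - i)
                )
                total += pr * best
            EV[t][i] = total
    return EV

# Hypothetical type and plan, chosen only to make the example run quickly.
EV = solve_type(k1=0.6, k2=0.1, mu=1.0, sigma=0.5, ubar=3.0,
                speed=10.0, price=2.0, allowance=20.0)
```

In this sketch the expected value is weakly decreasing in cumulative consumption, since a subscriber further from the allowance can always mimic the choices of one closer to it at a weakly lower overage cost.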

Having solved the program for a subscriber of type $h$, one can then generate the transition process for the state vector implied by the solution to the dynamic program. The transition probabilities between the 54,000 possible states (1800*30) are implicitly defined by threshold values for $\upsilon_t$. For example, consider a subscriber of type $h$ on plan $k$ that has consumed $C_{t-1}$ prior to period $t$. The value of $\upsilon_t$ that makes a subscriber indifferent between setting $c_t = z c_{sk}$ rather than $c_t = (z+1) c_{sk}$ (advancing cumulative consumption by $z$ or $z+1$ steps of size $c_{sk}$) equates the marginal utility (net of any overage charges) of an additional unit of consumption to the loss in the net present value of future utility,

$$\beta\left[EV_{hk(t+1)}\left(C_{t-1} + (z+1)c_{sk}\right) - EV_{hk(t+1)}\left(C_{t-1} + z c_{sk}\right)\right].$$

These thresholds, along with all subscribers' initial condition ($C_0 = 0$), define the transition process between states. Subscribers will consume no less if speed ($s_k$) is higher (lower opportunity cost of time), the overage price is lower, and the gradient of the expected value function is not too steep in cumulative consumption.

7 This aggregation loses very little information, as over 80% of usage is on peak (between 6pm and 11pm).

For each subscriber type, $h$, and plan, $k$, we characterize this transition process by the cdf of the stationary distribution that it generates,

$$\Gamma_{hkt}(C) = P(C_{t-1} < C),$$

the proportion of subscribers that have consumed less than $C$ through period $t$ of the billing cycle.8 These probabilities, for different values of $C$, are directly observable in our data and form the basis for our method of moments approach discussed in Section 4.
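To make the link between the policy functions and this cdf concrete, the sketch below pushes the distribution of cumulative consumption forward from $C_0 = 0$, one day at a time. It is illustrative only: the shock is assumed to take a handful of discrete values with known probabilities (rather than being handled through thresholds), and the names `propagate`, `cdf_below`, and the toy policy are hypothetical.

```python
def propagate(policy, shock_probs, n_grid, T):
    """dist[t][i] = P(cumulative usage sits at grid point i at the start of day t)."""
    dist = [[0.0] * n_grid for _ in range(T + 2)]
    dist[1][0] = 1.0  # every subscriber starts the cycle at C_0 = 0
    for t in range(1, T + 1):
        for i, mass in enumerate(dist[t]):
            if mass == 0.0:
                continue
            for u, pr in shock_probs:
                z = policy(t, i, u)            # grid steps consumed today
                j = min(i + z, n_grid - 1)     # cap at the top of the grid
                dist[t + 1][j] += mass * pr
    return dist

def cdf_below(dist_t, i):
    """P(C_{t-1} < i-th grid point); a step function on the grid."""
    return sum(dist_t[:i])

# Toy policy (an assumption for illustration): consume int(shock) grid steps.
shocks = [(0.0, 0.5), (1.0, 0.3), (2.0, 0.2)]
dist = propagate(lambda t, i, u: int(u), shocks, n_grid=10, T=2)
share_below_two_steps = cdf_below(dist[3], 2)
```

With a solved model, `policy` would instead look up the optimal number of grid steps for the state and shock in question.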

3.4 Optimal Plan Choice

After solving the dynamic program for a subscriber type, $h$, under every plan, $k$, selection into plans by subscribers can be dealt with naturally. A subscriber selects a plan with knowledge of their type, $(\kappa_{1h}, \kappa_{2h}, \mu_h, \sigma_h)$, and the features of the plan, but not the realization of their particular needs (realizations of $\upsilon_t$ for $t = 1, \ldots, T$) over the course of a billing cycle. In this case, the subscriber will select the plan, $k = 1, \ldots, K$, with the highest expected utility, or choose no plan at all, $k = 0$. To identify the optimal plan for each type, one can simply find the plan that gives the highest expected utility at the beginning of a billing cycle, $E[V_{hk1}(0)]$, and then ensure that this is greater than zero (the outside option's value, $E[V_{h01}(0)]$, is normalized to 0). The optimal plan for a type $h$ subscriber is then

$$k^*_h = \arg\max_{k \in \{0, 1, \ldots, K\}} \left\{ E\left[V_{hk1}(0)\right] - F_k \right\},$$

where the fixed fee for the outside option is $F_0 = 0$.

8 The discretized state space makes this cdf a step function.
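The plan-choice rule reduces to a single comparison once the expected values are in hand. A minimal sketch, with a hypothetical function name and made-up dollar values:

```python
def optimal_plan(expected_values, fixed_fees):
    """Return k maximizing E[V_hk1(0)] - F_k, where k = 0 is the outside option."""
    # The outside option has value 0 and fee 0, so its net value is 0.
    net = [0.0] + [ev - fee for ev, fee in zip(expected_values, fixed_fees)]
    # max() returns the first maximizer, so ties with 0 go to the outside option.
    return max(range(len(net)), key=lambda k: net[k])

choice = optimal_plan([50.0, 80.0, 95.0], [40.0, 60.0, 90.0])   # net: 10, 20, 5
no_plan = optimal_plan([30.0], [40.0])                          # net: -10
```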


4 Estimation

We use a method of moments approach to recover the primitives of the model, the joint distribution of the parameter vector $(\kappa_{1h}, \kappa_{2h}, \mu_h, \sigma_h)$. Our model predicts moments of optimal behavior at each state, along with the time spent in different states, $(C_{t-1}, t)$, for each subscriber type. We seek to find the distribution of subscriber types that matches the distribution of $C_{t-1}$ in the population of subscribers at each point in the billing cycle, $t$. This approach has the advantage of exploiting the high-frequency nature of our data, as it allows us to use variation in the intertemporal decisions made by subscribers at different states, rather than the end-product of these decisions (e.g., monthly internet usage).

Our approach to estimation is most similar to the two-step algorithms advocated by Ackerberg (2009), Bajari et al. (2007), and Fox et al. (2011). The first step is to recover the moments to be matched from the data and solve the dynamic program for a wide variety of subscriber types, $(\kappa_{1h}, \kappa_{2h}, \mu_h, \sigma_h)$.9 We recover both the cdf of cumulative consumption, $C_{t-1}$, for each plan, $k$, at each point in the billing cycle and the unconditional mean and variance of usage at each state, $(C_{t-1}, t)$. In the second step, we follow Fox et al. (2011) by searching for the weights, or density, of each type that best match the moments recovered from the data. We chose the moments to match for their identifying power and computational ease. In particular, these moments are linear in the type-specific weights, which reduces the matching process to a linear regression subject to a linear constraint and non-negativity restrictions. In addition to the computational advantages, this approach does not place parametric restrictions on the shape of the subscriber type distribution and naturally deals with selection (i.e., we identify each type's optimal plan, $k^*_h$, in the first step).

9 Fox et al. (2011) correctly point out that identifying the correct support for the parameter vector, $(\kappa_{1h}, \kappa_{2h}, \mu_h, \sigma_h)$, may in fact be viewed as an additional step in the estimation process. Yet their motivating example of a random coefficient demand model with aggregated data (i.e., market shares for each product) is much different from our application. In particular, the authors assume that one observes only aggregate data, making it impossible to know exactly what range of types is consistent with the data. However, in our application, we know the complete distribution of usage, and this dramatically simplifies identifying the support of the type distribution that is consistent with even the most infrequent occurrences in the data.


4.1 Identification

To realize the full computational advantages of the Fox et al. (2012) approach, we consider those moments with the most identifying power and then decompose the moments into parts that are linear in the parameters. The advantage of our data is that we observe the distribution of actions for subscribers at each state, $(C_{t-1}, t)$, along with the distribution of subscribers across states. Thus, we observe how consumers respond to marginal (shadow) prices ranging from zero up to the overage price. This allows us to consider any moments of the conditional distribution of consumption at those states at which a subscriber is present. We focus on the conditional mean and variance of usage at each state. These moments are determined by the probability that different types reach a particular state and the actions taken at this state. Thus, it is important that our econometric approach correctly identify both of these components of the conditional moments.

To see this, consider a model with two types, low usage (L) and high usage (H) subscribers. Consider those states that can only be reached by the low types (i.e., low cumulative consumption well into a billing period). At these states, subscribers are essentially solving a static utility maximization problem with a marginal price of zero, as there is a negligible probability they will exceed the usage allowance (i.e., the shadow price of consumption is nearly zero). Knowing that only low usage subscribers are present in these states and observing a subscriber solve this problem each day, equating marginal utility to zero, identifies the parameters of the utility function for these L types. Similarly, high demand subscribers are likely to exceed the usage allowance, equating marginal utility to the overage price from the beginning of the billing cycle. Thus, observing variation in usage at these states identifies the utility function for high demand subscribers.

One might then argue that the weights for each type, the relative mass of H and L types in the population, are identified by the mixture of actions taken at intermediate states that can be reached by both types. However, this is a very weak source of identification in our data due to the large degree of heterogeneity among users. Specifically, consumers sort themselves out across the state space so quickly at the beginning of the month that the only real identifying variation for the weights comes on the very first day of the billing cycle, for which all types are at the same state. After that point, the types are essentially spread over disjoint portions of the state space. Thus, the conditional moments by themselves may identify the types, $h$, of subscribers that are present in the data but provide very little information on their relative weights.

Along with the lack of identifying power for the weights, the conditional moments also have the problem of being nonlinear in the weights. For any reasonable number of types (e.g., 500 or more), this results in an infeasible constrained nonlinear optimization problem. For example, the conditional mean of consumption at each state is a mixture of type-specific policy functions,

$$E\left[c^*_{kt}(C_{t-1})\right] = \sum_{h=1}^{H} E\left[c^*_{hkt}(C_{t-1})\right] \omega_{ht}(C_{t-1}),$$

where

$$\omega_{ht}(C_{t-1}) = \frac{\gamma_{ht}(C_{t-1})\,\theta_h}{\sum_{h'=1}^{H} \gamma_{h't}(C_{t-1})\,\theta_{h'}}.$$

Thus, $\omega_{ht}(C_{t-1})$ is a nonlinear function of both the probability a type reaches a particular state, $\gamma_{ht}(C_{t-1})$, and the relative mass of the type in the population, $\theta_h$. The conditional variance of usage is defined similarly.

To remedy both the computational and identification difficulties with these moments, we decompose the conditional moments into two parts, the numerator and the denominator. The numerator,

$$\sum_{h=1}^{H} E\left[c^*_{hkt}(C_{t-1})\right] \gamma_{ht}(C_{t-1})\,\theta_h,$$

is just the unconditional mean of usage at each state, while the denominator,

$$\sum_{h=1}^{H} \gamma_{ht}(C_{t-1})\,\theta_h,$$

is the mass of subscribers at a particular level of cumulative consumption, $C_{t-1}$, on day $t$ of the billing cycle. Both these moments are linear in the weights, $\theta_h$, and together solve the identification problem. In particular, by matching both sets of moments, we match the conditional usage at each state (useful for identifying the utility of each type) while also pinning down the relative weights of each type by matching the distribution of subscribers across the state space. The details of the matching procedure are discussed in Section 4.3.
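A small numerical check may help fix ideas about why the decomposition matters. In the sketch below, with two hypothetical types and made-up numbers, the numerator responds linearly to the weights (its value at a 50/50 mix of two weight vectors equals the average of its values at each), whereas the conditional mean does not.

```python
def numerator(theta, reach_prob, mean_use):
    """Unconditional mean of usage at a state: linear in the weights theta."""
    return sum(m * g * th for m, g, th in zip(mean_use, reach_prob, theta))

def denominator(theta, reach_prob):
    """Mass of subscribers at the state: also linear in theta."""
    return sum(g * th for g, th in zip(reach_prob, theta))

def conditional_mean(theta, reach_prob, mean_use):
    """Ratio of the two: nonlinear in theta."""
    return numerator(theta, reach_prob, mean_use) / denominator(theta, reach_prob)

# Hypothetical inputs: (low type, high type) at one state.
reach_prob = [0.30, 0.02]   # probability each type reaches the state
mean_use = [0.5, 4.0]       # type-specific mean usage at the state, in GB
theta_a, theta_b = [0.9, 0.1], [0.5, 0.5]
mix = [0.5 * a + 0.5 * b for a, b in zip(theta_a, theta_b)]

linear_gap = numerator(mix, reach_prob, mean_use) - 0.5 * (
    numerator(theta_a, reach_prob, mean_use) + numerator(theta_b, reach_prob, mean_use))
nonlinear_gap = conditional_mean(mix, reach_prob, mean_use) - 0.5 * (
    conditional_mean(theta_a, reach_prob, mean_use)
    + conditional_mean(theta_b, reach_prob, mean_use))
```

Here `linear_gap` is zero up to rounding while `nonlinear_gap` is not, which is the property that lets the matching step be cast as a linear regression.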

4.2 Recovering Empirical Moments

The large number of observations and high frequency of our data, along with the low dimensionality of our state space, $(C_{t-1}, t)$, allow us to adopt a flexible nonparametric approach for recovering moments from the data to match to our model. We recover both the cdf of cumulative consumption for each day in the billing cycle, $t$, and the conditional mean and variance of usage at each state. The unconditional mean and variance are then the product of the pdf of cumulative consumption and the conditional moments.

4.2.1 CDF of Cumulative Consumption

To recover the cumulative distribution of $C_{t-1}$ at each point in the billing cycle, $t$, for each plan, $k$, we use a smooth version of a simple Kaplan-Meier estimator,

$$\hat{\Gamma}_{kt}(C) = \frac{1}{N_k}\sum_{i=1}^{N_k} \mathbf{1}\left[C_{i(t-1)} < C\right].$$

We estimate these moments for each $k$ and $t$, considering values of $C$ such that $\hat{\Gamma}_{kt}(C) \in [0.01, 0.99]$, ensuring that we fit the tails of the usage distribution. This results in approximately 30,000 moments to match for each plan. Let $\hat{\mathbf{m}}^{cdf}_k$ denote the vector of moments for plan $k$.10
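The unsmoothed version of this estimator is simple to write down. The sketch below omits the kernel smoothing step and uses hypothetical function names and toy data; it computes the share of subscribers strictly below each threshold and then keeps only the points whose estimated cdf lies in $[0.01, 0.99]$.

```python
def empirical_cdf(cum_usage, thresholds):
    """Share of subscribers whose cumulative usage is strictly below each threshold."""
    n = len(cum_usage)
    return [sum(1 for c in cum_usage if c < x) / n for x in thresholds]

def keep_interior(thresholds, cdf_values, lo=0.01, hi=0.99):
    """Retain (threshold, cdf) pairs with cdf inside [lo, hi]."""
    return [(x, p) for x, p in zip(thresholds, cdf_values) if lo <= p <= hi]

usage = [0.5, 2.0, 2.0, 7.5]        # toy cumulative usage (GB) on one day
grid = [1.0, 2.0, 10.0]
cdf_hat = empirical_cdf(usage, grid)
moments = keep_interior(grid, cdf_hat)
```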

10 We use a normal kernel and adaptive bandwidth to smooth the empirical cdf.

To compute point-wise standard errors for our estimates of these distributions, we draw on the literature on resampling methods with dependent data; see Lahiri (2003). The dependence in our data comes from the panel nature of the data, as we observe individuals making daily decisions on consumption over 3 or 4 full billing cycles. The straightforward structure of our panel significantly simplifies the resampling procedure. We repeatedly estimate the cumulative distribution functions, leaving out different groups of subscribers. We choose 1,000 randomly sampled groups of 5,000 subscribers and re-estimate each distribution, omitting a different group of subscribers each time. These estimates are then used to calculate a variance-covariance matrix, $\hat{\mathbf{V}}^{cdf}_k$, for the moments for each plan, $k$. This weighting matrix is used to account for the different scale of our moments and inversely weight more variable moments.
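The leave-a-group-out procedure can be sketched as follows. This is a hedged illustration: the function name, the toy data, and the moment function are hypothetical, and the sketch returns the raw sample covariance across replications without any rescaling factor, since the text does not specify one. Each element of `subscribers` stands for one subscriber's entire panel, so that all of a subscriber's observations are omitted together.

```python
import random

def delete_group_cov(subscribers, moment_fn, group_size, n_reps, seed=0):
    """Covariance of a moment vector across replications that each omit a random group."""
    rng = random.Random(seed)
    ids = list(range(len(subscribers)))
    draws = []
    for _ in range(n_reps):
        omitted = set(rng.sample(ids, group_size))
        kept = [s for i, s in enumerate(subscribers) if i not in omitted]
        draws.append(moment_fn(kept))
    m = len(draws[0])
    mean = [sum(d[j] for d in draws) / n_reps for j in range(m)]
    return [[sum((d[a] - mean[a]) * (d[b] - mean[b]) for d in draws) / (n_reps - 1)
             for b in range(m)] for a in range(m)]

# Toy data: one usage number per subscriber; two moments (mean, share below 2 GB).
subs = [float(i % 7) for i in range(200)]
two_moments = lambda kept: [sum(kept) / len(kept),
                            sum(1 for c in kept if c < 2.0) / len(kept)]
cov = delete_group_cov(subs, two_moments, group_size=20, n_reps=50)
```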

Figures 2a, 2b, and 2c present the recovered cdf of cumulative consumption for each day of the billing cycle, for the least expensive, most popular, and most expensive plans, respectively. The least and most expensive plans are the two least popular plans offered by our provider. Yet, there is still a more than adequate number of observations to get an accurate characterization of the time spent in different states by subscribers on these plans. On both the least and most expensive plans, a significant proportion of subscribers exceed their usage allowance, 20% and 30%, respectively. While the proportion of subscribers exceeding the allowance on the most popular plan is small, the absolute number of such users is actually larger than the total number of users exceeding the allowance on all other plans combined.

4.2.2 Unconditional Mean and Variance of Consumption

The large number of observations in, and the richness of, our data, along with the low dimensionality of our state space, $(C_{t-1}, t)$, allow us to adopt a very flexible estimation approach to recover the moments of usage at each state. Our problem essentially reduces to estimating a surface defined over the $(C_{t-1}, t)$ plane.

To flexibly estimate the conditional moments, we adopt a nearest neighbor approach. Consider a point in the state space, $(\tilde{C}_{\tilde{t}-1}, \tilde{t})$. A neighbor is an observation in the data for which $t = \tilde{t}$ and $C_{t-1}$ is within some distance of $\tilde{C}_{\tilde{t}-1}$ (e.g., 0.5 GBs). Denote the fixed number of nearest neighbors, those with the smallest distance from $\tilde{C}_{\tilde{t}-1}$, used to estimate the moments at $(\tilde{C}_{\tilde{t}-1}, \tilde{t})$ under plan $k$ by $N_k(\tilde{C}_{\tilde{t}-1}, \tilde{t})$. The estimate of the conditional mean at $(\tilde{C}_{\tilde{t}-1}, \tilde{t})$ is

$$\hat{E}\left[c^*_{kt}(\tilde{C}_{\tilde{t}-1})\right] = \frac{1}{N_k(\tilde{C}_{\tilde{t}-1}, \tilde{t})} \sum_{i=1}^{N_k(\tilde{C}_{\tilde{t}-1}, \tilde{t})} c_i,$$

where $i = 1, \ldots, N_k(\tilde{C}_{\tilde{t}-1}, \tilde{t})$ indexes the set of nearest neighbors. Similarly, our estimator of the conditional variance is

$$\hat{V}\left[c^*_{kt}(\tilde{C}_{\tilde{t}-1})\right] = \frac{1}{N_k(\tilde{C}_{\tilde{t}-1}, \tilde{t}) - 1} \sum_{i=1}^{N_k(\tilde{C}_{\tilde{t}-1}, \tilde{t})} \left(c_i - \hat{E}\left[c^*_{kt}(\tilde{C}_{\tilde{t}-1})\right]\right)^2.$$

If there are fewer than 100 neighbors, we do not estimate the conditional mean. If there are at least 100 but fewer than 500 neighbors, we use all neighbors to estimate the conditional mean. If there are 500 or more neighbors, we use the 500 neighbors nearest to $\tilde{C}_{\tilde{t}-1}$. The unconditional mean is then recovered as the product of the probability of observing a subscriber at state $(\tilde{C}_{\tilde{t}-1}, \tilde{t})$, estimated from the cdf of cumulative consumption we recover, and the conditional mean. Let the vector of estimates for the unconditional means and variances for plan $k$ be denoted by $\hat{\mathbf{m}}^{avg}_k$ and $\hat{\mathbf{m}}^{var}_k$, respectively.
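The neighbor-selection rule and the two estimators above can be sketched as a single function. The name `knn_moments` and the toy observations are hypothetical; the defaults mirror the 0.5 GB distance, the minimum of 100 neighbors, and the cap of 500 described in the text, and the example call lowers the minimum and the cap only so that a four-observation data set produces output.

```python
def knn_moments(target_C, observations, max_dist=0.5, min_n=100, max_n=500):
    """Conditional mean and variance of usage at cumulative usage target_C.

    observations: (C_prev, usage) pairs for subscribers on the target day.
    Returns None when fewer than min_n observations lie within max_dist.
    """
    near = sorted((abs(C - target_C), c) for C, c in observations
                  if abs(C - target_C) <= max_dist)
    if len(near) < min_n:
        return None
    used = [c for _, c in near[:max_n]]   # the max_n closest neighbors
    n = len(used)
    mean = sum(used) / n
    var = sum((c - mean) ** 2 for c in used) / (n - 1)
    return mean, var

obs = [(1.0, 2.0), (1.1, 4.0), (1.4, 6.0), (3.0, 100.0)]
result = knn_moments(1.0, obs, max_dist=0.5, min_n=2, max_n=2)
too_sparse = knn_moments(1.0, obs, max_dist=0.5, min_n=5, max_n=500)
```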

The nearest neighbor approach has a number of advantages over other estimators for our application. First, as with any nonparametric estimator, it imposes no parametric restrictions on the surface. Second, nearest neighbor estimators are inherently bandwidth adaptive; see Pagan and Ullah (1999). This is particularly useful in our application. The number of users reaching very high volumes of cumulative consumption can be small for some plans. In these low-density situations, nearest neighbor estimators will expand the bandwidth appropriately until a given number of observations are included in the estimator of the surface. We do restrict the degree to which the estimator can expand the bandwidth in these low-density situations in order to limit any potential bias such expansion might introduce. Our results are very robust to varying both the minimum number of neighbors ($N_k(\tilde{C}_{\tilde{t}-1}, \tilde{t}) \geq 100$) required for a conditional moment to be estimated and the cutoff that determines how much the bandwidth can adapt in low-density situations to identify $N_k(\tilde{C}_{\tilde{t}-1}, \tilde{t})$ neighbors. We estimate each surface at the same discrete set of state space points used when numerically solving the dynamic programming problem for each subscriber type. We again use a block-resampling procedure to compute a variance-covariance matrix for our estimates of the conditional mean and variance, $\hat{\mathbf{V}}^{avg}_k$ and $\hat{\mathbf{V}}^{var}_k$, respectively. We use these matrices to inversely weight more variable moments.


While we will match the unconditional mean and variance at each state, it is useful and

intuitive to present the conditional means which demonstrate a few properties of our data

more clearly than the analogous unconditional moments. These results are summarized

in Figures 3a-3c and 4a-4c for the mean and variance, respectively, for the least expensive,

most popular, and most expensive plans. For each plan, the surfaces characterizing the

conditional means have the same pattern.

Very early in the billing period the different types of subscribers reveal themselves. The high types sort themselves to high cumulative consumption states and continue to consume at a very high level. Interestingly, we see that consumption is relatively smooth across the billing cycle, which suggests that (high volume) subscribers are quite adept at smoothing consumption. That is, we do not see much of a drop in average consumption for the highest volume subscribers as they near the allowance, reinforcing our decision to model subscribers as forward-looking and rational economic agents. The low-volume types tend to migrate to low cumulative consumption states as the billing cycle progresses and continue to consume at low levels. In addition, there is a wide variety of intermediate types that consume at a fairly constant level throughout the billing cycle. The estimates of the standard deviation in usage at each state follow a similar pattern to the means. The sorting is again evident: higher mean types tend to have much more variable usage, and the standard deviation of usage tends to be proportional to the mean.

4.3 Matching Moments

4.3.1 Objective Function

The second step of our estimation approach follows the method of moments approach due to Bajari et al. (2007) and Fox et al. (2011). Our objective is to match, as closely as possible, the empirical moments we recover from the data to those predicted by our model. The parameters we minimize over are the relative mass of different types, $\theta_h$, in the population of subscribers that choose a plan, $k$.


Specifically, for each plan, our estimates of these weights are chosen to satisfy

$$\hat{\theta}_k = \arg\min_{\theta_k}\; g_k(\theta_k)'\, \hat{\mathbf{V}}_k^{-1}\, g_k(\theta_k),$$

subject to the weights for all types choosing plan $k$ summing to unity,

$$\sum_{h=1}^{H_k} \theta_{kh} = 1,$$

and each of the $H_k$ weights being nonnegative,

$$\theta_{kh} \geq 0 \quad \forall h.$$

The vector

$$g_k(\theta_k) = \begin{pmatrix} \hat{\mathbf{m}}^{avg}_k - \mathbf{m}^{avg}_k \theta_k \\ \hat{\mathbf{m}}^{var}_k - \mathbf{m}^{var}_k \theta_k \\ \hat{\mathbf{m}}^{cdf}_k - \mathbf{m}^{cdf}_k \theta_k \end{pmatrix}$$

is the difference between the moments recovered from the data ($\hat{\mathbf{m}}_k$) and the weighted average of type-specific moments predicted by the model ($\mathbf{m}_k$ is a matrix with $H_k$ columns). Thus, each element of $g_k(\theta_k)$ corresponds to a unique ordered pair, $t$ and $C$, and, for the cdf moments, is of the form

$$\hat{\Gamma}_{kt}(C) - \sum_{h=1}^{H_k} \theta_h \Gamma_{hkt}(C).$$

As in Fox et al. (2011), one can then think of $g_k(\theta_k)$ as a vector of random variables, where the randomness is a result of sampling variability in the empirical moments (measurement error in observed market shares in their example).

The weighting of moments by the block diagonal matrix,

$$\hat{\mathbf{V}}_k^{-1} = \begin{pmatrix} \hat{\mathbf{V}}^{avg}_k & 0 & 0 \\ 0 & \hat{\mathbf{V}}^{var}_k & 0 \\ 0 & 0 & \hat{\mathbf{V}}^{cdf}_k \end{pmatrix}^{-1},$$

ensures that more variable moments receive relatively less weight, although we find virtually no difference between our estimates and unweighted estimates. After estimating the weights associated with each type that selects a plan, we appropriately normalize the weights to reflect the number of subscribers choosing each plan to get the joint distribution of types across all plans.
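The matching step is a least squares problem over the probability simplex. The sketch below is one way to solve a small instance and is not the solver used for the estimates: it assumes an identity weighting matrix (the unweighted case), uses projected gradient descent with a Euclidean projection onto the simplex, and works with a made-up moment matrix. The names `project_simplex` and `fit_weights` are hypothetical.

```python
def project_simplex(v):
    """Euclidean projection of v onto {theta : theta >= 0, sum(theta) = 1}."""
    u = sorted(v, reverse=True)
    css, tau = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        t = (css - 1.0) / j
        if uj - t > 0:
            tau = t          # the last j satisfying the condition sets the shift
    return [max(x - tau, 0.0) for x in v]

def fit_weights(A, b, iters=20000):
    """Minimize ||A theta - b||^2 over the simplex by projected gradient descent.

    A[r][h] is the model's prediction of moment r for type h; b[r] is the data moment.
    """
    R, H = len(A), len(A[0])
    # Step 1 / (2 ||A||_F^2) is no larger than 1 / (Lipschitz constant of the gradient).
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))
    theta = [1.0 / H] * H
    for _ in range(iters):
        resid = [sum(A[r][h] * theta[h] for h in range(H)) - b[r] for r in range(R)]
        grad = [2.0 * sum(A[r][h] * resid[r] for r in range(R)) for h in range(H)]
        theta = project_simplex([theta[h] - step * grad[h] for h in range(H)])
    return theta

# Made-up moments for three types: three cdf points and one scaled mean-usage moment.
A = [[0.90, 0.40, 0.10],
     [1.00, 0.70, 0.20],
     [1.00, 0.90, 0.40],
     [0.05, 0.20, 0.60]]
true_theta = [0.6, 0.3, 0.1]
b = [sum(a * th for a, th in zip(row, true_theta)) for row in A]
theta_hat = fit_weights(A, b)
```

Because the data moments here are generated from `true_theta` without noise, the fitted weights recover it; with sampled moments the fit would be approximate, and a weighting matrix could be incorporated by pre-multiplying `A` and `b` by a square root of its inverse.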

As pointed out by Bajari et al. (2007) and Fox et al. (2011), least squares minimization over a bounded support and subject to linear constraints is a well-defined convex optimization problem. Thus, convergence ensures a global minimum, although not necessarily a unique one. In cases with many types, some of which behave similarly, identification issues can arise regardless of the richness of the moments one is matching. The approach of Bajari et al. (2007) and Fox et al. (2012), as with any approach, requires that the type-specific matrix of moments ($\mathbf{m}^{avg}_k$, $\mathbf{m}^{var}_k$, and $\mathbf{m}^{cdf}_k$) be of full rank. If types are too similar in their behavior, collinearity issues arise and it is not possible to separately identify the weights associated with each type. Fortunately, in our application, it is not necessary to accurately identify the weights associated with each individual type, but only the total mass of types that behave similarly.

We take intuitive steps to reduce any such issues associated with collinearity. After

extensive experimentation to identify the support for the parameter vector that completely

encompasses the range of individual behaviors observed in the data, we solve the dynamic

programming problem over an extensive grid. In total, we solve the dynamic programming

problem for 10,000 types (10 values for each parameter), identifying the optimal plan for each

or whether to subscribe at all. This leads to a total of 6,409 types selecting a plan rather

than not subscribing. For those types selecting a particular plan, of the 6,409 choosing any

plan, we use the following algorithm to identify a set of types that are not too similar to one

another.

We begin by constructing the matrix of correlation coefficients of each column (each corresponding to one type's predicted moments) of the model moment matrix in $g_k(\theta_k)$ with every other column. Thus, the $(i,j)$ element of this matrix is the correlation coefficient of the moments predicted by the model for types $i$ and $j$, two types that chose plan $k$. We then take the first type and remove all those types whose moments have a correlation coefficient greater than 0.99 with the first type. We take the next type, of those remaining, and eliminate all types that have a correlation coefficient greater than 0.99 with this type. We continue this process, cycling through the types, until we are left with a set of types whose moments have correlation coefficients less than 0.99 with all other types' moments. This process results in a well-defined optimization problem, such that the moment matrix is of full rank (the moments of the remaining types are not too near to being collinear). Notice that the algorithm would give a different set of types, depending on which type it is initiated with. However, the resulting set of types (the number of columns in the moment matrix) will always be the same and be indistinguishable from the perspective of their ability to match the behavior observed in the data. The algorithm results in a total, across all plans, of 1,189 types for which we will estimate weights. For each plan, the search algorithm takes less than one minute to converge.
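The pruning pass just described can be sketched as a greedy loop. The function names and the four toy moment vectors are hypothetical, and the sketch assumes every column has nonzero variance so that the correlation is defined.

```python
import math

def correlation(x, y):
    """Pearson correlation of two equal-length moment vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def prune_types(columns, cutoff=0.99):
    """Keep the first remaining type, drop all later types too correlated with it, repeat."""
    remaining = list(range(len(columns)))
    kept = []
    while remaining:
        first = remaining.pop(0)
        kept.append(first)
        remaining = [j for j in remaining
                     if correlation(columns[first], columns[j]) <= cutoff]
    return kept

# Toy predicted-moment vectors for four types; types 1 and 3 nearly duplicate type 0.
cols = [[1.0, 2.0, 3.0, 4.0],
        [1.01, 2.0, 3.02, 4.0],
        [4.0, 1.0, 3.0, 2.0],
        [2.0, 4.0, 6.0, 8.1]]
kept_types = prune_types(cols)
```

As the text notes, the particular types retained depend on the starting order; reordering `cols` would change which near-duplicates survive.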

4.3.2 Results

Similar to Bajari et al. (2007), we find that a relatively small number of types are assigned a nonzero weight despite estimating weights for 1,189 types. No plan has more than 28 types receiving positive weights, while the average across plans is only 15. This has the advantage of significantly simplifying the counterfactual analysis, where the dynamic programming problem is solved repeatedly. We return to this point in Section 5.

Plans with higher usage allowances attract types with higher average (i.e., high $\mu_h$) and more variable (i.e., high $\sigma_h$) usage. This reinforces the finding of Lambrecht et al. (2007) that uncertainty plays an important role in subscribers' plan choice. Those types with a higher preference for speed (i.e., high $\kappa_{2h}$) select the more expensive plans with greater provisioned speeds, while those with the highest opportunity cost of time (i.e., high $\kappa_{1h}$) often select plans with lower fixed fees and lower usage allowances.

The 4-dimensional joint distribution of the taste parameters is difficult to visualize, so in Figures 5a-5d, we present the marginal distribution of types across all plans. These figures make clear the benefit of the nonparametric approach of Bajari et al. (2007) and Fox et al. (2012), as common parametric densities would give an extremely poor fit. We find significant outliers along each dimension.


More important than the weights themselves is what the weights imply about the model's ability to fit the behavior observed in the data. Figures 6a, 6b, and 6c present the estimates of the cdf of cumulative consumption for the least expensive, most popular, and most expensive plans at each day in the billing cycle. Comparing the model's fit for these plans to the moments recovered from the data, Figures 2a, 2b, and 2c, it is clear that the model fits the data quite well. The one exception is at the very low end of the distribution, where the model tends to over-predict usage over the course of the billing cycle. In particular, the model has difficulty rationalizing an individual subscribing to broadband service and not using the service at all (less than 0.5 GBs a month). This situation is likely to arise in the data as a result of individuals taking extended vacations or simply not cancelling the service in periods when their demand is extremely low. To better visualize the model fit, Figures 7a, 7b, and 7c plot both the empirical moments and the model fit for the last day of the billing cycle. For over 95% of the usage distribution, the fit is very tight.

5 Welfare Implications of Usage-Based Pricing

Currently, some of the largest providers of residential broadband in the United States are in the process of implementing or conducting trials of usage-based pricing (i.e., usage allowances and a non-zero overage price).11 The estimates of the structural model provide an opportunity to explore the implications of usage-based pricing for consumer welfare and its potential to drive efficiency in broadband networks.

To accomplish this, we consider two alternative scenarios. We first examine how usage and consumer surplus change when overage fees are simply eliminated and all other features (i.e., fixed fees and provisioned speeds) are held constant. We compare these outcomes to those when the provider is permitted to choose new fixed fees.12

11 http://www.firstcoastnews.com/topstories/article/276426/483/Cable-companies-cap-data-use-for-revenue

12 It is important to note that our analysis only accounts for private welfare. In particular, we do not account for any positive (e.g., education) or negative (e.g., violent games) externalities due to exposure to content.

There are a couple of caveats to consider with this counterfactual exercise. First, our flexible nonparametric approach to identifying the joint distribution of subscribers' preferences limits what we can say about types that do not select a plan under the current usage-based pricing schedule. In particular, we only identify weights for those types that actually selected a plan.13 If those types not subscribing currently were to subscribe once overages were eliminated, we could understate the benefit to subscribers. This is likely not much of a concern for qualitative conclusions, as those types with the greatest demand for broadband will choose to subscribe in either case. Second, we do not allow the provider to change the number of plans offered or provisioned speeds. This can only reduce revenues to the provider and decrease the likelihood of finding that usage-based pricing is welfare improving. Permitting the provider to select the number of plans or alter speeds requires nesting a solution algorithm for the dynamic program, for the many types of subscribers for which we estimate a positive weight, inside a high-dimensional optimization problem. This problem is computationally prohibitive.

Usage is significantly higher for users at the top end of the distribution under both alternative scenarios, nearly doubling, while the bottom end of the distribution is basically identical, as one would expect. The only difference in usage between the two alternative scenarios is that the change in the fixed fees causes some subscriber types to switch to a lower tier. Overall, we find that average usage increases by approximately 38% under both alternative pricing regimes, with all of this increase coming from the top end of the distribution. Average usage increases from 21.7 GBs under the current pricing regime to approximately 30 GBs for both alternatives. We find that the overall increase in subscriber surplus is only 4% and 2% in the two respective scenarios. This results in a very large drop in the average value of each GB of data consumed. The average value of the additional GBs consumed by subscribers is less than $0.18. If variable costs increase linearly with usage, as ISPs argue, the small increase in consumer surplus associated with eliminating usage-based pricing does not warrant the additional costs.

13 One option would be to impose some type of smoothness restrictions on the distribution of the taste parameters. However, given the very unsmooth nature of the marginal distributions in Figures 5a-5d, it is not clear what those restrictions would be.


6 Conclusion

How to price content delivered over the internet will become an increasingly important topic as more bandwidth-intensive applications are developed and the fraction of computer-savvy individuals in the US population grows. Our results provide strong support for the FCC's decision to back the industry's move towards usage-based pricing as a means to efficiently allocate bandwidth to subscribers. We show that usage-based pricing is an effective means to eliminate extremely low-value traffic from the internet while minimizing the impact on consumer surplus. This suggests that the broadcasting model of delivering content will continue to be more efficient than unicasting means, unless there is a significant technological breakthrough that dramatically lowers the cost of bandwidth.

In addition to the policy implications of our findings, this study also represents a contribution to the literature on estimating demand in a dynamic setting. To our knowledge, we are the first to apply and demonstrate the usefulness of the nonparametric techniques of Fox et al. (2012). While our application is very well suited to these techniques, i.e., low dimensionality of the type-specific parameter vector, it is our belief that both the flexibility and computational ease of the techniques will make them appropriate for a wide variety of settings in empirical microeconomics outside Industrial Organization.

Access to data from a provider operating a pristine and uncongested network allowed us to ignore any interdependence in the decisions of subscribers when estimating preferences for online content. However, this is often not the case on broadband networks. In a recent Wall Street Journal article, the FCC published aggregate statistics on the significant degradation in the performance of broadband networks during peak hours. Measuring the extent of network externalities in communication networks is an important topic for future research. Malone (2012) represents a first step towards better understanding how congestion impacts the utility that subscribers derive from broadband service, yet more work needs to be done.


References

[1] Ackerberg, Daniel (2009) "A New Use of Importance Sampling to Reduce Computational Burden in Simulation Estimation", Quantitative Marketing and Economics, 7(4), 343-376.

[2] Aron-Dine, Aviva, Liran Einav, Amy Finkelstein, and Mark Cullen (2012) "Moral hazard in health insurance: How important is forward looking behavior?", Working Paper.

[3] Bajari, Patrick, Jeremy Fox, and Stephen Ryan (2007) "Linear Regression Estimation of Discrete Choice Models with Nonparametric Distributions of Random Coefficients", American Economic Review P&P, 97(2), 459-463.

[4] Copeland, A. and Cyril Monnet (2009) "The Welfare Effects of Incentive Schemes", Review of Economic Studies, 76(?), 93-113.

[5] Chung, Doug, Thomas Steenburgh, and K. Sudhir "Do Bonuses Enhance Sales Productivity? A Dynamic Structural Analysis of Bonus-Based Compensation Plans", HBS Working Paper #11-041.

[6] Fox, Jeremy, Kyoo il Kim, Stephen Ryan, and Patrick Bajari "A Simple Estimator for the Distribution of Random Coefficients", forthcoming in Quantitative Economics.

[7] Goolsbee, Austan and Peter J. Klenow (2006) "Valuing Products by the Time Spent Using Them: An Application to the Internet", American Economic Review P&P, 96(2), 108-113.

[8] Hendel, Igal, and Aviv Nevo (2006) "Measuring the Implications of Sales and Consumer Inventory Behavior", Econometrica, 74(6), 1637-1673.

[9] Johari, Ramesh, Gabriel Weintraub, and Benjamin Van Roy (2009) "Investment and Market Structure in Industries with Congestion", Working Paper.

[10] Lahiri, S.N. (2003) "Resampling Methods for Dependent Data", Springer.

[11] Lambrecht, Anja, Katja Seim, and Bernd Skiera (2007) "Does Uncertainty Matter? Consumer Behavior Under Three-Part Tariffs", Marketing Science, 26(5), 698-710.

[12] Malone, Jacob (2012) "Measuring Congestion Externalities in Communication Networks", Working Paper.

[13] Marsh, Christina (2012) "Estimating Demand Elasticities Using Nonlinear Pricing", Working Paper.

[14] Misra, Sanjong and Harikesh Nair (2010) "The Welfare Effects of Incentive Schemes", Working Paper.

[15] Odlyzko, Andrew, Bill St. Arnaud, Erik Stallman, and Michael Weinberg (2012) "Considering the Role of Data Caps and Usage Based Billing in Internet Access Service", Public Knowledge Whitepaper.

[16] Pagan, Adrian and Aman Ullah (1999) "Nonparametric Econometrics", Cambridge University Press.


[17] Weintraub, Gabriel Y., C. Lanier Benkard, and Benjamin Van Roy (2010) "Computational Methods for Oblivious Equilibrium", Operations Research, 58(4), 1247-1265.

[18] Weintraub, Gabriel Y., C. Lanier Benkard, and Benjamin Van Roy (2008) "Markov Perfect Industry Dynamics with Many Firms", Econometrica, 76(6), 1375-1411.

[19] Yao, Song, Carl Mela, Jeongwen Chiang, and Yuxin Chen (2011) "Determining Consumers' Discount Rates with Field Studies", Working Paper.


                            Levels-Fraction Log-Fraction    Level-Dummy     Log-Dummy

C(t-1)                      -0.255***       -0.171***
                            (0.004)         (0.006)
dum_50to75                                                  -0.228***       -0.169***
                                                            (0.005)         (0.008)
dum_75to90                                                  -0.350***       -0.159***
                                                            (0.009)         (0.014)
dum_90to95                                                  -0.483***       -0.289***
                                                            (0.017)         (0.026)
dum_95to100                                                 -0.566***       -0.279***
                                                            (0.019)         (0.029)
dum_gt100                                                   -0.773***       -0.272***
                                                            (0.011)         (0.016)
Time Left                   0.026           0.079**         0.026           0.079**
                            (0.027)         (0.040)         (0.027)         (0.040)
Day of Week Dummies         yes             yes             yes             yes
Time Trend                  yes             yes             yes             yes
Subscriber-Month Dummies    yes             yes             yes             yes
R2                          0.242           0.122           0.257           0.152
# of Observations           3,046,570       2,993,767       3,046,570       2,993,767

Table 1a: Consumption Regressions: All Tiers


C(t-1)                      -0.032***       -0.051***
                            (0.001)         (0.008)
dum_50to75                                                  0.017***        -0.063**
                                                            (0.004)         (0.031)
dum_75to90                                                  -0.003          -0.019
                                                            (0.006)         (0.049)
dum_90to95                                                  -0.004          -0.085
                                                            (0.011)         (0.082)
dum_95to100                                                 -0.000          -0.092
                                                            (0.011)         (0.085)
dum_gt100                                                   0.011**         -0.069*
                                                            (0.005)         (0.039)
Time Left                   -0.001***       0.000***        -0.003***       -0.003***
                            (0.000)         (0.000)         (0.001)         (0.001)
R2                          0.211           0.100           0.227           0.196
# of Observations           96,223          96,223          94,949          94,949

Table 1b: Consumption Regressions: Least Expensive Tier


C(t-1)                      -0.818***       -0.522***
                            (0.008)         (0.013)
dum_50to75                                                  -0.233***       -0.180***
                                                            (0.006)         (0.009)
dum_75to90                                                  -0.394***       -0.174***
                                                            (0.010)         (0.017)
dum_90to95                                                  -0.608***       -0.354***
                                                            (0.020)         (0.032)
dum_95to100                                                 -0.587***       -0.280***
                                                            (0.023)         (0.036)
dum_gt100                                                   -1.075***       -0.372***
                                                            (0.013)         (0.021)
Time Left                   0.024           0.024           0.089**         0.089**
                            (0.026)         (0.026)         (0.042)         (0.042)
R2                          0.271           0.131           0.258           0.196
# of Observations           2,608,388       2,608,388       2,559,925       2,559,925

Table 1c: Consumption Regressions: Most Popular Tier


C(t-1)                      -2.178***       -0.499***
                            (0.182)         (0.091)
dum_50to75                                                  -0.510***       -0.130**
                                                            (0.104)         (0.051)
dum_75to90                                                  -0.812***       -0.189**
                                                            (0.153)         (0.076)
dum_90to95                                                  -0.642**        -0.249*
                                                            (0.278)         (0.137)
dum_95to100                                                 -1.909***       -0.528***
                                                            (0.288)         (0.142)
dum_gt100                                                   -1.156***       -0.196**
                                                            (0.183)         (0.090)
Time Left                   -0.036***       -0.015***       -0.007***       -0.002
                            (0.005)         (0.004)         (0.002)         (0.002)
R2                          0.271           0.131           0.266           0.135
# of Observations           17,057          17,057          16,907          16,907

Table 1d: Consumption Regressions: Most Expensive Tier
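
The dum_* regressors in Tables 1a-1d appear, from their names, to indicate the range in which a subscriber's cumulative consumption falls as a share of the usage allowance. The following is a minimal sketch of how such indicators could be constructed. The function name `allowance_dummies`, the treatment of bin edges (open on the left, closed on the right), and the choice of shares at or below 50% as the omitted base category are assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical helper: bin edges mirror the variable names in Tables 1a-1d.
# Which edge of each bin is inclusive is an assumption, not stated in the source.
BINS = [
    ("dum_50to75", 0.50, 0.75),
    ("dum_75to90", 0.75, 0.90),
    ("dum_90to95", 0.90, 0.95),
    ("dum_95to100", 0.95, 1.00),
    ("dum_gt100", 1.00, float("inf")),
]


def allowance_dummies(cumulative_gb, allowance_gb):
    """Return 0/1 indicators for the share of the allowance already used."""
    if allowance_gb <= 0:
        raise ValueError("allowance_gb must be positive")
    share = cumulative_gb / allowance_gb
    # Shares at or below 0.50 fall in the omitted base category (all zeros).
    return {name: int(lo < share <= hi) for name, lo, hi in BINS}
```

Under these assumptions, a subscriber who has used 60 GB of a 100 GB allowance has dum_50to75 equal to one and every other indicator equal to zero, while a subscriber who has used 120 GB has only dum_gt100 equal to one.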

Figure 1: Traffic by Time of Day
[Plot omitted. Horizontal axis: Time of Day (12am to 8pm); vertical axis: Average Traffic per Subscriber (Kb/s); series: downloads (Kb/s), uploads (Kb/s).]

Figure 2a: Least Expensive Plan - CDF
Figure 2b: Most Popular Plan - CDF
Figure 2c: Most Expensive Plan - CDF
[Plots omitted. Vertical axis: CDF of Cumulative Consumption; horizontal axis (labeled in Figure 2a): Cumulative Consumption (% of Allowance).]

Figure 3a: Least Expensive Plan - Nearest-Neighbor Mean
Figure 3b: Most Popular Plan - Nearest-Neighbor Mean
Figure 3c: Most Expensive Plan - Nearest-Neighbor Mean
[Plots omitted. Axis labels recovered: Days into Billing Cycle (labeled in Figure 3a); Mean of GB/day.]

Figure 4a: Least Expensive Plan - Nearest-Neighbor SD
Figure 4b: Most Popular Plan - Nearest-Neighbor SD
Figure 4c: Most Expensive Plan - Nearest-Neighbor SD
[Plots omitted. Axis label recovered: Mean of GB/day.]

Figure 5a: Marginal Distribution of µh
Figure 5b: Marginal Distribution of κ1h
Figure 5c: Marginal Distribution of κ2h
Figure 5d: Marginal Distribution of σh
[Plots omitted. Horizontal axis: the parameter named in each caption; vertical axis: Relative Frequency of Types.]

Figure 7a: Model Fit of Least Expensive Plan CDF on Last Day of Billing Cycle
Figure 7b: Model Fit of Most Popular Plan CDF on Last Day of Billing Cycle
Figure 7c: Model Fit of Most Expensive Plan CDF on Last Day of Billing Cycle
[Plots omitted. Vertical axis: CDF; horizontal axis (labeled in Figure 7a): End of Month Usage (% of Allowance); series: Data, Model.]
