Usage-Based Pricing of the Internet*
Aviv Nevo†
Northwestern University
John Turner‡
University of Georgia
Jonathan Williams§
University of Georgia
Preliminary and Incomplete
November 2012
Abstract
We estimate demand for residential broadband to study the efficiency properties of usage-based billing. Using detailed, high-frequency Internet Protocol data records, we exploit variation in the intertemporal tradeoffs faced by subscribers to estimate the distribution of subscribers' preferences for different characteristics of service: access and overage fees, usage allowances, and connection speeds. We find significant heterogeneity in tastes along each dimension of service. Using these estimates, we examine the efficiency of various 3-part tariff pricing schedules. We find that usage-based pricing models currently being employed in North America are successful at eliminating large volumes of low-value traffic while having a minimal impact on subscriber welfare. These findings provide strong support for the FCC's backing of the industry's move away from flat-rate pricing.
Keywords: Demand, Broadband, Dynamics, Usage-based Pricing, Welfare.
JEL Codes: L13.
*We thank those North American Internet Service Providers that provided the data used in this paper. We thank Terry Shaw, Jacob Malone, Scott Atkinson, and seminar participants at Ga. Tech and UGA for insightful comments that significantly improved this paper. Jim Metcalf provided expert computational and storage support for this project. All remaining errors are ours.
†Department of Economics, Northwestern University, [email protected], ph: (847) 491-7001.
‡Department of Economics, University of Georgia, [email protected], ph: (706) 542-3376.
§Corresponding Author: Department of Economics, University of Georgia, [email protected], ph: (706) 542-3689.
1 Introduction
In the U.S., "last mile" connectivity to the internet is privately provided by telecom (e.g.,
AT&T) and/or cable (e.g., Comcast) companies. This leaves the problem of allocating
scarce network resources (i.e., bandwidth) to the discretion of the internet service provider
(ISP). The ability of ISPs to price the delivery of content from the internet to subscribers
(and vice versa) has important implications for the future development of online content,
communications, and, more generally, the way in which people use the internet. This
has led to significant discussion of how the internet should be regulated.
During the past decade in the U.S., ISPs have typically sold unlimited access to the
internet for a fixed monthly fee. During this time, the average residential subscriber's usage
has grown 50% annually. This dramatic growth in usage has led to a shift in the industry
towards usage-based pricing plans similar to those commonly associated with cellular phones.
Typically, these plans take the form of a three-part tariff: a fixed access price, a usage
allowance, and a marginal price for usage in excess of the allowance. Just this year, two of
the largest cable providers, Comcast and Time Warner Cable, conducted trials of usage-based
pricing in select markets.
ISPs argue that usage-based pricing is necessary to curtail the usage of the small number
of subscribers that dramatically drive up network costs and degrade the quality of service
for other subscribers. This view treats usage-based pricing as a type of Pigouvian tax that helps
equate a subscriber's private benefit to the costs borne by the ISP (i.e., network investment)
and other subscribers (i.e., degraded service). ISPs also argue that usage-based pricing gives
content developers the right incentives to minimize the bandwidth requirements of their
applications. For example, YouTube recently added an option for users to reduce the quality
of video streams. This allows subscribers to lower quality to a level acceptable to them,
while avoiding overage charges and minimizing the costs that the traffic they generate
imposes on the ISP and other users. These types of arguments led the Federal Communications
Commission (FCC) to recently back the practice: "Usage-based pricing would help drive efficiency
in the networks," Julius Genachowski, FCC Chairman (Chicago Tribune, May 22, 2012).
The recent shift in the industry towards usage-based pricing models, along with the support
of government regulators, has given rise to numerous organizations devoted to preventing
it. These include web sites (e.g., www.stopthecap.com and openmedia.ca/meter)
that monitor ISPs' activities for indications (e.g., reporting usage on subscribers' bills) that
usage-based pricing will be introduced. The web sites then ask their followers to bombard
the ISP with complaints, often providing direct contact information for the companies'
executives, in the hope of preventing or delaying it. More formal organizations lobby regulators
and legislators directly. Geekdom is one such organization: "It's like locking the doors to the
library," Nicholas Longo, Geekdom Director (NY Times, June 26, 2012). Generally, these
organizations believe that the activities of high-volume subscribers are of high value and that any
type of "caps" or usage-based pricing will result in significant welfare losses for subscribers.
The extant academic literature on topics related to the economics of internet access is
very limited and almost exclusively theoretical in nature. Existing theoretical studies (e.g.,
Odlyzko (2012)) often reach very strong and conflicting conclusions regarding the welfare
consequences of usage-based pricing. To date, the lack of detailed data on consumption of
internet content has limited empirical work on the topic to only a couple of papers: Goolsbee
and Klenow (2001) and Lambrecht et al. (2007). Goolsbee and Klenow (2001) use Forrester
Technographics Survey data on individuals' time spent on the internet and earnings to
innovatively estimate the private benefit to subscribers of residential broadband. However,
the authors' estimate relies on the potentially dubious assumption that an hour spent on the
internet is an hour in forgone wages.
The lack of more empirical studies on these important issues is largely due to the proprietary
nature of, and technological constraints associated with collecting, much of the data
required to study these issues. However, the dramatic growth in usage over the past decade
has forced ISPs to invest in technology to track usage and better manage scarce network
resources. These investments in data collection have created an opportunity for academic
researchers to better understand the economics of broadband internet access. In this paper,
we use detailed hourly data on internet utilization, obtained from a group
of North American ISPs, some of which employ usage-based pricing, to study the impact
of usage-based pricing on subscribers. We observe the total volume of content downloaded
and uploaded each hour for approximately 5 months from late 2011 to early 2012 for over
30,000 subscribers.
To study the welfare implications of usage-based pricing, we begin by building a dynamic
model of subscribers' intertemporal decision making throughout a billing cycle under
usage-based pricing. Specifically, we model subscribers as utility-maximizing agents that solve a
dynamic optimization program each billing cycle. While we do not have variation in the
service plans or tiers offered to subscribers during the sample period, the high-frequency
nature of our data and the variety of three-part tariffs offered to consumers allow us to accurately
estimate demand. In particular, the high-frequency data allow us to exploit variation in
the shadow price, i.e., the implications of current consumption for the probability of incurring
overage charges later in the billing cycle, to trace out marginal utility for subscribers. In
addition, selection into plans, or the choice of a particular three-part tariff, reveals a great deal
about preferences by revealing an average willingness to pay for content and a preference over
the speed of one's connection (i.e., Mb/s). Our provider offers plans ranging from almost
linear tariffs (i.e., very low usage allowances) to plans with allowances well over 100 GB.
The connection speeds, overage prices, and usage allowances are all non-decreasing in the
fixed access fee.
A potential concern with such a model is that it may be inappropriate to have
each subscriber solve his or her optimization problem in isolation. Network externalities
among users could make the problem a dynamic game in which subscribers choose when to use
the internet based on preferences and expectations regarding congestion in the network.
However, the data we use in this paper come from an ISP that operates an over-provisioned
and pristine network. This allows us to accurately model a subscriber's usage decision as
an independent one. We discuss how we can measure the absence of network externalities
in this provider's network in Section 2.
To estimate the model, we adapt the techniques of Ackerberg (2009), Bajari et al. (2007),
and Fox et al. (2012). These techniques have three advantages. First, they avoid very
computationally expensive fixed-point estimation algorithms. In particular, it is only necessary
to solve the dynamic programming problem for each type of agent a single time. Second, one
can relax the parametric/distributional assumptions typically made when estimating such
dynamic models. This is critical for our purposes given the limited amount of information we
have about a subscriber and the extreme heterogeneity in usage behavior, which is difficult to
model. Finally, the techniques naturally deal with difficult-to-model forms of selection, which
is an important issue in our application. Subscribers select into a service tier or plan and we
only observe usage under that optimally selected plan.
The application itself is of interest and has important policy implications. As desirable,
yet bandwidth-intensive, applications continue to be developed and subscriber usage grows,
it will be increasingly important to have accurate measures of the demand for content. Our
results largely support the current regulatory stance of the FCC. We find that usage-based
pricing, as currently implemented by North American providers, is successful at removing
a great deal of low-value traffic from the networks. This is largely due to the negative
correlation between the value and volume of typical online activities (e.g., 5 GB to stream
a movie and only bytes to send thousands of emails).¹ We show that subscribers derive a great
deal of utility from the first bytes of traffic generated on a broadband connection, but utility
diminishes rapidly thereafter. This supports the goal of the FCC to increase the reach of
basic broadband service to more rural and under-served areas as stated in its National
Broadband Plan.² Finally, our results are important for how content will be delivered in
the future by telecom and cable companies. There is an appeal to unicasting, or allowing
each user to view content at their convenience. However, the low willingness to pay we find
for this convenience and the high costs (i.e., the large amount of additional traffic to accommodate
on networks) suggest that into the foreseeable future (a short time in this industry), the cost
effectiveness of broadcasting will continue to dominate arguments against usage-based billing
of the internet.
¹ This is consistent with Netflix's failed attempt to raise the prices of their service in 2012.
² The FCC is currently working with telecom and cable providers to offer a basic $9.99 tier to low-income
households. See Greenstein and Prince (2008) for more on the historical reach of broadband services.
The remainder of the paper is organized as follows. In Section 2 we discuss our data in greater
detail and provide reduced-form results that motivate assumptions made in the structural
model and demonstrate a high price elasticity for online content. In Section 3 we discuss
the model used to capture the intertemporal decisions of subscribers regarding consumption
of content under usage-based billing. Section 4 presents our methodology for estimating
the structural model and presents the results. Sections 5 and 6 present the results of our
counterfactual exercise to identify the benefit to subscribers of removing usage allowances
and final conclusions, respectively.
2 Data
The data used in this paper are Internet Protocol Data Record (IPDR) data. IPDR is a
standardized framework for collecting usage and performance data from IP-based services
and is currently the most popular way for cable operators to efficiently measure subscriber
usage. The IPDR framework is supported by a DOCSIS (Data Over Cable Service Interface
Specification) 2.0/3.0 compliant CMTS (Cable Modem Termination System). A CMTS uses
a collector to gather data at a minimum configurable reporting period of 15 minutes. Our
data report the volume of a subscriber's usage every hour, linking subscribers to their
usage through their cable modem's (CM) MAC (Media Access Control) address. The
usage reported for a subscriber does not reflect DOCSIS framing overhead; however, it does
include operator-initiated management, control traffic, and Internet-originated traffic (pings,
port scans, etc.), which must be considered when metering internet usage. See Clarke (2009)
for more details on the structure of DOCSIS cable networks.
The unit of observation for the IPDR data is a MAC address and a record creation time,
or month, day, and hour. The data also reports the DOCSIS mode, 2.0 or 3.0, of each
where the time-varying unobservable, $\varepsilon_t$, is not known to the subscriber until period $t$ and is
independently and identically distributed on $[0, \bar{\varepsilon}]$ according to a distribution $G_h$.⁵ Hence,
the consumer's marginal utility varies randomly across days in ways that the consumer cannot
predict. The specification includes a constant marginal cost of consuming online content,
$\kappa_1 - \kappa_2 \ln(s_k)$, that is decreasing in the speed of the connection, $s_k$. This implies that the
consumer has a satiation point, which captures key features of the data. All parameters
differ across types of consumers.
Letting income be $I$ and letting cumulative consumption of the two goods since the beginning of
the billing cycle be $C_t \equiv \sum_{j=1}^{t} c_j$ and $Y_t \equiv \sum_{j=1}^{t} y_j$, respectively, define the monthly
budget constraint as
$$F_k + p_k (C_T - \overline{C}_k)\,\mathbf{1}[C_T > \overline{C}_k] + Y_T \le I, \tag{2}$$
where $\mathbf{1}[\cdot]$ is the indicator function.
⁴ In this way, we assume that content with a similar marginal utility is generated each day or constantly
refreshed. This may not be the case for a subscriber that has not previously had access to the internet.
⁵ The right-truncation of $G$ is necessary to ensure that the consumer can afford any efficient level of daily
consumption. Let $h$, $s_k$ and $p_k$ be the highest levels of these parameters. It suffices to assume that
$T p_k (h + \ln(s_k) + \bar{\varepsilon}) < I$.
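The left-hand side of the budget constraint is the subscriber's total monthly outlay under a three-part tariff. A minimal sketch of the charge for content follows; the function name and the plan values in the example are hypothetical and are not taken from the provider's actual plans.

```python
def monthly_bill(usage_gb, access_fee, allowance_gb, overage_price):
    """Charge under a three-part tariff: fixed fee plus overage on usage above the allowance."""
    # Only usage in excess of the allowance is billed at the marginal (overage) price.
    overage_gb = max(usage_gb - allowance_gb, 0.0)
    return access_fee + overage_price * overage_gb


# Hypothetical plan: 50 access fee, 100 GB allowance, 2 per GB of overage.
print(monthly_bill(80.0, 50.0, 100.0, 2.0))   # usage below the allowance
print(monthly_bill(120.0, 50.0, 100.0, 2.0))  # 20 GB of overage
```

Below the allowance the marginal price of content is zero, and above it the marginal price equals the overage price; this kink is what generates the shadow price of usage within a billing cycle.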
Denote the discount factor $\delta \in (0, 1)$. Conditional on choosing plan $k$, the consumer's
problem is to choose daily consumption to maximize
$$U = \sum_{t=1}^{T} \delta^{t-1} E[u_h(c_t, y_t, k)], \quad \text{subject to (2).}$$
Throughout this paper, we assume that all consumers have sufficient income to pay for
satiation levels of content.
3.2 Optimal Consumption
The subscriber's problem is a finite-horizon dynamic-programming problem. Consider the
terminal period ($T$) of a billing cycle and denote the remaining allowance
$\overline{C}_{kT} \equiv \max\{\overline{C}_k - C_{T-1}, 0\}$. The efficiency condition for optimal consumption depends on whether it is optimal
to exceed $\overline{C}_{kT}$. Intuitively, if the consumer is well below the cap (i.e., $\overline{C}_{kT}$ is high) and does
not have a particularly high draw of $\varepsilon_T$, then she consumes content up to the point where
the marginal utility of content is zero. If marginal utility at $c_T = \overline{C}_{kT}$ is positive but below
$p_k$, then it is optimal to consume the remaining allowance. If one is already above the cap
(i.e., $\overline{C}_{kT} = 0$) or draws an extremely high $\varepsilon_T$, then it is optimal to consume up to the
point where the marginal utility of content equals the overage price.
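The three cases of the terminal-period decision can be sketched in code. This is only an illustration under stated assumptions: the marginal utility function used in the example, `eps / (1 + c) - cost`, is a hypothetical stand-in chosen because it is strictly decreasing and has a satiation point, and it is not the specification estimated in this paper. The helper names are likewise hypothetical.

```python
def solve_decreasing(f, target, lo=0.0, hi=1e6, iterations=200):
    """Find c in [lo, hi] with f(c) = target by bisection; f must be strictly decreasing."""
    if f(lo) <= target:
        return lo  # corner solution: even the first unit is not worth consuming
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def terminal_consumption(marginal_utility, remaining_allowance, overage_price):
    """Optimal last-period usage given the remaining allowance and the overage price."""
    mu_at_cap = marginal_utility(remaining_allowance)
    if mu_at_cap <= 0.0:
        # Satiated before reaching the cap: consume until marginal utility is zero.
        return solve_decreasing(marginal_utility, 0.0)
    if mu_at_cap < overage_price:
        # Worth using up the allowance, but not worth paying for overage.
        return remaining_allowance
    # Marginal utility at the cap is at least the overage price: consume past the cap.
    return solve_decreasing(marginal_utility, overage_price)


# Hypothetical marginal utility with preference shock eps = 10 and marginal cost 1.
mu = lambda c: 10.0 / (1.0 + c) - 1.0
for allowance in (20.0, 6.0, 2.0):
    print(allowance, terminal_consumption(mu, allowance, overage_price=1.0))
```

With these illustrative numbers, the three allowances fall into the satiation, exhaust-the-allowance, and overage cases, respectively.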
In each of these situations in the last period, there are no intertemporal tradeoffs. Usage
today has no impact on next period's state, as cumulative consumption resets to zero at
the beginning of each billing cycle. Thus, the problem reduces to solving a static utility
maximization problem, given a subscriber's cumulative usage up until period $T$, $C_{T-1}$,
and the realization of the preference shock, $\varepsilon_T$, which together determine the implications for
overage charges and the marginal utility of usage, respectively. Denote this optimal level of
consumption, for a given realization of $\varepsilon_T$, by $c^*_{hkT}$.
For a given realization of the preference shock in period $T$, the subscriber's utility from
entering the final period with state $C_{T-1}$ and behaving optimally in the terminal period is
for each ordered pair $(C_{t-1}, \varepsilon_t)$.⁶ Similar to the terminal period, the expected value function
is
$$E[V_{hkt}(C_{t-1})] = \int_0^{\bar{\varepsilon}} V_{hkt}(C_{t-1}, \varepsilon_t)\, dG_h(\varepsilon_t)$$
for all $t < T = 30$, and the mean of a subscriber's usage at each state is
$$E[c^*_{hkt}(C_{t-1})] = \int_0^{\bar{\varepsilon}} c^*_{hkt}(C_{t-1}, \varepsilon_t)\, dG_h(\varepsilon_t). \tag{3}$$
⁶ Notice this formulation of the optimization problem assumes that the subscriber is aware of their cumulative
consumption, $C_{t-1}$, on each day in the billing cycle. This is a realistic assumption for our data, as
the results in Table 2 demonstrate.
The policy functions for each type ($h$) of subscriber imply a distribution for the time spent
in particular states $(t, C_{t-1})$ over a billing cycle. Below, we discuss how to solve for this
distribution, which is generated by optimal subscriber behavior, and how it, along with the
moments of usage, forms the basis of our method of moments approach discussed in Section 4.
3.3 Model Solution and Stationary Distribution
Let $G_h$ denote a normal distribution, truncated at $\bar{\varepsilon}$, with mean $\mu_h$ and variance $\sigma_h^2$. For
a plan, $k$, and subscriber type, $h$, characterized by the vector $(\kappa_{1h}, \kappa_{2h}, \mu_h, \sigma_h)$, the
finite-horizon dynamic program described above can be solved recursively, starting at the end of
each billing cycle ($t = T$). To do so, we discretize the state space for $C_t$ to a grid of 1800
points with spacing of size $c_{sk}$ GB for each plan, $k$. Our data is hourly, so time is naturally
discrete, but we aggregate time up to the day ($t = 1, 2, \ldots, 30$ over a billing cycle with $T = 30$
days).⁷ This discretization leaves $\varepsilon_t$ as the only continuous state variable. Because the
subscriber does not know $\varepsilon_t$ prior to period $t$, we can integrate it out, and the solution to
the dynamic programming problem for a subscriber of each type $h$ can be characterized by
the expected value functions, $EV_{hkt}(C_{t-1})$, and policy functions, $c^*_{hkt}(C_{t-1}, \varepsilon_t)$. To perform
the numerical integration over the bounded support $[0, \bar{\varepsilon}]$ of $\varepsilon_t$, we use adaptive Simpson
quadrature.
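The integration step can be illustrated with a textbook adaptive Simpson routine applied to a truncated normal density. This is a sketch under assumptions rather than the code used for the paper: the density is taken to be a normal truncated to $[0, \bar{\varepsilon}]$ on both sides, and the function names, tolerance, and parameter values are hypothetical.

```python
import math


def truncated_normal_pdf(x, mu, sigma, lower, upper):
    """Density of a N(mu, sigma^2) variable truncated to [lower, upper]."""
    def std_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    if x < lower or x > upper:
        return 0.0
    mass = std_cdf((upper - mu) / sigma) - std_cdf((lower - mu) / sigma)
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi) * mass)


def adaptive_simpson(f, a, b, tol=1e-9, max_depth=40):
    """Integrate f over [a, b], bisecting intervals until Simpson's rule stabilises."""
    def simpson(fa, fm, fb, width):
        return width / 6.0 * (fa + 4.0 * fm + fb)

    def recurse(a, fa, m, fm, b, fb, whole, tol, depth):
        lm, rm = 0.5 * (a + m), 0.5 * (m + b)
        flm, frm = f(lm), f(rm)
        left = simpson(fa, flm, fm, m - a)
        right = simpson(fm, frm, fb, b - m)
        delta = left + right - whole
        if depth <= 0 or abs(delta) <= 15.0 * tol:
            return left + right + delta / 15.0  # Richardson correction
        return (recurse(a, fa, lm, flm, m, fm, left, tol / 2.0, depth - 1)
                + recurse(m, fm, rm, frm, b, fb, right, tol / 2.0, depth - 1))

    m = 0.5 * (a + b)
    fa, fm, fb = f(a), f(m), f(b)
    return recurse(a, fa, m, fm, b, fb, simpson(fa, fm, fb, b - a), tol, max_depth)


def expected_value(g, mu, sigma, upper):
    """E[g(eps)] for eps drawn from a normal truncated to [0, upper]."""
    integrand = lambda e: g(e) * truncated_normal_pdf(e, mu, sigma, 0.0, upper)
    return adaptive_simpson(integrand, 0.0, upper)


# Hypothetical type: mu = 5, sigma = 2, shocks bounded above by 10.
print(expected_value(lambda e: 1.0, 5.0, 2.0, 10.0))  # the density integrates to one
print(expected_value(lambda e: e, 5.0, 2.0, 10.0))    # mean of the truncated shock
```

In the dynamic program, `g` would be replaced by the value function or policy function evaluated at a given grid point for cumulative consumption.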
Having solved the program for a subscriber of type $h$, one can then generate the transition
process for the state vector implied by the solution to the dynamic program. The transition
probabilities between the 54,000 possible states ($1800 \times 30$) are implicitly defined by threshold
values for $\varepsilon_t$. For example, consider a subscriber of type $h$ on plan $k$ who has consumed
$C_{t-1}$ prior to period $t$. The value of $\varepsilon_t$ that makes the subscriber indifferent between setting
$c_t = z c_{sk}$ and $c_t = (z + 1) c_{sk}$ (advancing cumulative consumption by $z$ or $z + 1$ steps
of size $c_{sk}$) equates the marginal utility (net of any overage charges) of an additional unit of
consumption to the loss in the net present value of future utility,
$$\delta \left[ EV_{hk(t+1)}(C_{t-1} + (z + 1) c_{sk}) - EV_{hk(t+1)}(C_{t-1} + z c_{sk}) \right].$$
These thresholds, along with all subscribers' initial condition ($C_0 = 0$), define the
transition process between states. Subscribers will consume no less if speed ($s_k$) is higher
(lower opportunity cost of time), the overage price is lower, and the gradient of the expected
value function is not too steep in cumulative consumption.
⁷ This aggregation loses very little information, as over 80% of usage is on peak (between 6pm and 11pm).
For each subscriber type, $h$, and plan, $k$, we characterize this transition process by the
cdf of the stationary distribution that it generates,
$$\Gamma_{hkt}(C) = P(C_{t-1} < C),$$
the proportion of subscribers that have consumed less than $C$ through period $t$ of the billing
cycle.⁸ These probabilities, for different values of $C$, are directly observable in our data and
form the basis for our method of moments approach discussed in Section 4.
3.4 Optimal Plan Choice
After solving the dynamic program for a subscriber type, $h$, under every plan, $k$, selection into
plans by subscribers can be dealt with naturally. A subscriber selects a plan with knowledge
of their type, $(\kappa_{1h}, \kappa_{2h}, \mu_h, \sigma_h)$, and the features of the plan, but not the realization of their
particular needs (realizations of $\varepsilon_t$ for $t = 1, \ldots, T$) over the course of a billing cycle. In this case,
the subscriber will select the plan, $k = 1, \ldots, K$, with the highest expected utility, or choose
no plan at all, $k = 0$. To identify the optimal plan for each type, one can simply find the
plan that gives the highest expected utility at the beginning of a billing cycle, $E[V_{hk1}(0)]$,
net of the fixed fee, $F_k$, and then ensure that this is greater than zero (the outside option's
value, $E[V_{h01}(0)]$, is normalized to 0). The optimal plan for a type $h$ subscriber is then
$$k^*_h = \arg\max_{k \in \{0, 1, \ldots, K\}} \left\{ E[V_{hk1}(0)] - F_k \right\},$$
where the fixed fee for the outside option is $F_0 = 0$.
⁸ The discretized state space makes this cdf a step function.
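The plan-choice rule is a simple argmax once the expected values at the start of the billing cycle are in hand. The sketch below assumes those values have already been computed for plans $k = 1, \ldots, K$; the function name and all numbers are hypothetical.

```python
def optimal_plan(expected_values, fixed_fees):
    """Return the index k in {0, 1, ..., K} maximising E[V_hk1(0)] - F_k.

    expected_values[k - 1] and fixed_fees[k - 1] refer to plan k; index 0 is the
    outside option, whose value and fee are both normalised to zero.
    """
    best_k, best_net = 0, 0.0
    for k, (value, fee) in enumerate(zip(expected_values, fixed_fees), start=1):
        net = value - fee
        if net > best_net:  # a plan must strictly beat the outside option
            best_k, best_net = k, net
    return best_k


# A hypothetical high-demand type and a hypothetical low-demand type facing three plans.
fees = [30.0, 50.0, 80.0]
print(optimal_plan([40.0, 75.0, 90.0], fees))  # picks the plan with the largest net value
print(optimal_plan([20.0, 25.0, 28.0], fees))  # no plan beats the outside option
```

Because the rule is evaluated type by type, selection is handled before any weights on types are estimated.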
4 Estimation
We use a method of moments approach to recover the primitives of the model, namely the joint
distribution of the parameter vector $(\kappa_{1h}, \kappa_{2h}, \mu_h, \sigma_h)$. Our model predicts moments of
optimal behavior at each state, along with the time spent in different states, $(C_{t-1}, t)$, for
each subscriber type. We seek to find the distribution of subscriber types that matches
the distribution of $C_{t-1}$ in the population of subscribers at each point in the billing cycle,
$t$. This approach has the advantage of exploiting the high-frequency nature of our data, as
it allows us to use variation in the intertemporal decisions made by subscribers at different
states, rather than the end product of these decisions (e.g., monthly internet usage).
Our approach to estimation is most similar to the two-step algorithms advocated by
Ackerberg (2009), Bajari et al. (2007), and Fox et al. (2011). The first step is to recover
the moments to be matched from the data and solve the dynamic program for a wide variety
of subscriber types, $(\kappa_{1h}, \kappa_{2h}, \mu_h, \sigma_h)$.⁹ We recover both the cdf of cumulative consumption,
$C_{t-1}$, for each plan, $k$, at each point in the billing cycle and the unconditional mean and
variance of usage at each state, $(C_{t-1}, t)$. In the second step, we follow Fox et al. (2011)
by searching for the weights or density of each type that best match the moments recovered
from the data. We chose the moments to match for their identifying power
and computational ease. In particular, these moments are linear in the type-specific weights,
which reduces the matching process to a linear regression subject to a linear constraint and
non-negativity restrictions. In addition to the computational advantages, this approach
has the advantage of not placing parametric restrictions on the shape of the subscriber type
distribution and naturally deals with selection (i.e., we identify each type's optimal plan, $k^*_h$, in
the first step).
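The second step amounts to least squares over weights that are non-negative and sum to one. The sketch below is one simple way to solve such a problem, projected gradient descent onto the probability simplex; it is an assumption made for illustration and not necessarily the solver used for the paper. The moment matrix `A` (rows are moments, columns are types) and the weights in the example are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, tau = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumulative += u_j
        shift = (cumulative - 1.0) / j
        if u_j - shift > 0.0:
            tau = shift  # keep the shift from the last index that stays positive
    return [max(x - tau, 0.0) for x in v]


def fit_type_weights(A, m, iterations=2000):
    """Minimise ||A w - m||^2 over the simplex by projected gradient descent."""
    n_moments, n_types = len(A), len(A[0])
    # Step size from an upper bound on the gradient's Lipschitz constant, 2 * ||A||_F^2.
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))
    w = [1.0 / n_types] * n_types
    for _ in range(iterations):
        resid = [sum(A[i][h] * w[h] for h in range(n_types)) - m[i]
                 for i in range(n_moments)]
        grad = [2.0 * sum(A[i][h] * resid[i] for i in range(n_moments))
                for h in range(n_types)]
        w = project_to_simplex([w[h] - step * grad[h] for h in range(n_types)])
    return w


# Hypothetical model moments for three types, and data moments generated by known weights.
A = [[1.0, 0.2, 0.0],
     [0.0, 1.0, 0.3],
     [0.1, 0.0, 1.0],
     [0.5, 0.5, 0.5]]
true_w = [0.2, 0.5, 0.3]
m = [sum(a * w for a, w in zip(row, true_w)) for row in A]
print(fit_type_weights(A, m))
```

With thousands of moments and hundreds of types, a dedicated quadratic programming or non-negative least squares routine would be the more practical choice; the point here is only that the problem is convex because the moments are linear in the weights.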
⁹ Fox et al. (2011) correctly point out that identifying the correct support for the parameter vector,
$(\kappa_{1h}, \kappa_{2h}, \mu_h, \sigma_h)$, may in fact be viewed as an additional step in the estimation process. Yet, their motivating
example of a random coefficient demand model with aggregated data (i.e., market shares for each product)
is much different from our application. In particular, the authors assume that one observes only
aggregate data, making it impossible to know exactly what range of types is consistent with the data.
However, in our application, we know the complete distribution of usage, and this dramatically simplifies
identifying the support of the type distribution that is consistent with even the most infrequent occurrences
in the data.
4.1 Identification
To realize the full computational advantages of the Fox et al. (2012) approach, we consider
those moments with the most identifying power and then decompose the moments into
parts that are linear in the parameters. The advantage of our data is that we observe the
distribution of actions for subscribers at each state, $(C_{t-1}, t)$, along with the distribution of
subscribers across states. Thus, we observe how consumers respond to marginal (shadow)
prices ranging from zero up to the overage price. This allows us to consider any moments of
the conditional distribution of consumption at those states at which a subscriber is present.
We focus on the conditional mean and variance of usage at each state. These moments are
determined by the probability that different types reach a particular state and the actions
taken at this state. Thus, it is important that our econometric approach correctly identify
both of these components of the conditional moments.
To see this, consider a model with two types, low usage (L) and high usage (H) subscribers.
Consider those states that can only be reached by the low types (i.e., low cumulative
consumption well into a billing period). At these states, subscribers are essentially solving
a static utility maximization problem with a marginal price of zero, as there is a negligible
probability they will exceed the usage allowance (i.e., the shadow price of consumption is
nearly zero). Knowing only low usage subscribers are present in these states and observing
a subscriber solve this problem each day, equating marginal utility to zero, identifies the
parameters of the utility function for these L types. Similarly, high demand subscribers are
likely to exceed the usage allowance, equating marginal utility to the overage price from the
beginning of the billing cycle. Thus, observing variation in usage at these states identifies
the utility function for high demand subscribers.
One might then argue that the weights for each type (the relative mass of H and L types in the
population) are identified by the mixture of actions taken at intermediate states that can be
reached by both types. However, this is a very weak source of identification in our data due
to the large degree of heterogeneity among users. Specifically, consumers sort themselves out
across the state space so quickly at the beginning of the month that the only real identifying
variation for the weights comes on the very first day of the billing cycle, for which all types
are at the same state. After that point, the types essentially occupy disjoint portions of
the state space. Thus, the conditional moments by themselves may identify the types, $h$, of
subscribers that are present in the data but provide very little information on their relative
weights.
Along with the lack of identifying power for the weights, the conditional
moments also have the problem of being nonlinear in the weights. For any reasonable number
of types (e.g., 500 or more), this results in an infeasible constrained nonlinear optimization
problem. For example, the conditional mean of consumption at each state is a mixture of
type-specific policy functions,
$$E[c^*_{kt}(C_{t-1})] = \sum_{h=1}^{H} E[c^*_{hkt}(C_{t-1})]\, \omega_{ht}(C_{t-1}),$$
where
$$\omega_{ht}(C_{t-1}) = \frac{\gamma_{ht}(C_{t-1})\, \theta_h}{\sum_{h'=1}^{H} \gamma_{h't}(C_{t-1})\, \theta_{h'}}.$$
Thus, $\omega_{ht}(C_{t-1})$ is a nonlinear function of both the probability a type reaches a particular
state and the relative mass of the type in the population. The conditional variance of usage
is defined similarly.
To remedy both the computational and identification difficulties with these moments, we
decompose the conditional moments into two parts, the numerator and the denominator. The
numerator,
$$\sum_{h=1}^{H} E[c^*_{hkt}(C_{t-1})]\, \gamma_{ht}(C_{t-1})\, \theta_h,$$
is just the unconditional mean of usage at each state, while the denominator,
$$\sum_{h=1}^{H} \gamma_{ht}(C_{t-1})\, \theta_h,$$
is the mass of subscribers at a particular level of cumulative consumption, $C_{t-1}$, on day $t$
of the billing cycle. Both of these moments are linear in the weights, $\theta_h$, and together solve
the identification problem. In particular, by matching both sets of moments, we match the
conditional usage at each state (useful for identifying the utility of each type) while also pinning
down the relative weights of each type by matching the distribution of subscribers across the
state space. The details of the matching procedure are discussed in Section 4.3.
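A two-type numerical illustration may help fix ideas. The function names and all numbers below are hypothetical; the sketch simply evaluates the numerator and denominator at a single state and recovers the conditional mean as their ratio.

```python
def unconditional_mean(type_means, reach_probs, weights):
    """Numerator: sum over types of mean usage x probability of reaching the state x weight."""
    return sum(c * g * w for c, g, w in zip(type_means, reach_probs, weights))


def state_mass(reach_probs, weights):
    """Denominator: mass of subscribers at the state, summed over types."""
    return sum(g * w for g, w in zip(reach_probs, weights))


def conditional_mean(type_means, reach_probs, weights):
    """Ratio of the two linear moments; this ratio is nonlinear in the weights."""
    return (unconditional_mean(type_means, reach_probs, weights)
            / state_mass(reach_probs, weights))


# Hypothetical state reached by a low type (mean usage 1 GB) and a high type (4 GB).
type_means = [1.0, 4.0]
reach_probs = [0.30, 0.10]  # probability each type is at this state
weights = [0.8, 0.2]        # population shares of the two types

print(unconditional_mean(type_means, reach_probs, weights))
print(state_mass(reach_probs, weights))
print(conditional_mean(type_means, reach_probs, weights))
```

Doubling one weight doubles that type's contribution to both the numerator and the denominator, which is the linearity that makes the matching problem tractable; the ratio does not share this property.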
4.2 Recovering Empirical Moments
The large number of observations and high frequency of our data, along with the low
dimensionality of our state space, $(C_{t-1}, t)$, allow us to adopt a flexible nonparametric approach
for recovering moments from the data to match to our model. We recover both the cdf of
cumulative consumption for each day in the billing cycle, $t$, and the conditional mean
and variance of usage at each state. The unconditional mean and variance are then the
product of the pdf of cumulative consumption and the conditional moments.
4.2.1 CDF of Cumulative Consumption
To recover the cumulative distribution of $C_{t-1}$ at each point in the billing cycle, $t$, for each
plan, $k$, we use a smooth version of a simple Kaplan-Meier estimator,
$$\hat{\Gamma}_{kt}(C) = \frac{1}{N_k} \sum_{i=1}^{N_k} \mathbf{1}\left[C_{i(t-1)} < C\right].$$
We estimate these moments for each $k$ and $t$, considering values of $C$ such that
$\hat{\Gamma}_{kt}(C) \in [0.01, 0.99]$, ensuring that we fit the tails of the usage distribution. This results in
approximately 30,000 moments to match for each plan. Let $\hat{m}^{cdf}_k$ denote the vector of moments for
plan $k$.¹⁰
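The unsmoothed version of this estimator is just the share of subscribers below a given level of cumulative usage. A minimal sketch, without the kernel smoothing applied in the paper and with hypothetical usage values:

```python
def empirical_cdf(cumulative_usage, c):
    """Share of subscribers whose cumulative usage entering day t is strictly below c."""
    return sum(1 for x in cumulative_usage if x < c) / len(cumulative_usage)


# Hypothetical cumulative usage (GB) of five subscribers on the same plan and day.
usage = [0.5, 2.0, 3.5, 8.0, 12.0]
print(empirical_cdf(usage, 3.5))   # the subscriber at exactly 3.5 GB is not counted
print(empirical_cdf(usage, 10.0))
```

Evaluating this function on a grid of values of $C$ for each day of the billing cycle yields the vector of cdf moments for a plan.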
10 We use a normal kernel and adaptive bandwidth to smooth the empirical cdf.
To compute point-wise standard errors for our estimates of these distributions, we draw
on the literature on resampling methods with dependent data; see Lahiri (2003). The
dependence in our data comes from the panel nature of the data, as we observe individuals
making daily decisions on consumption over 3 or 4 full billing cycles. The straightforward
structure of our panel significantly simplifies the resampling procedure. We repeatedly
estimate the cumulative distribution functions, leaving out different groups of subscribers. We
choose 1,000 randomly sampled groups of 5,000 subscribers and re-estimate each distribution
omitting the different groups of subscribers each time. These estimates are then used to
calculate a variance-covariance matrix, $\hat{V}^{cdf}_k$, for the moments for each plan, $k$. This weighting
matrix is used to account for the different scale of our moments and inversely weight more
variable moments.
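The cdf moments and the leave-group-out resampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the cdf is left unsmoothed (the paper applies a normal kernel), only point-wise variances are returned rather than the full variance-covariance matrix, and no jackknife rescaling factor is applied because the text does not specify one.

```python
import random


def empirical_cdf(cum_usage, grid):
    """Share of subscribers whose cumulative usage is strictly below each grid value."""
    n = len(cum_usage)
    return [sum(1 for c in cum_usage if c < g) / n for g in grid]


def trim_to_interior(grid, cdf, lo=0.01, hi=0.99):
    """Keep only the grid points whose estimated cdf lies in [lo, hi]."""
    return [(g, p) for g, p in zip(grid, cdf) if lo <= p <= hi]


def delete_group_variances(cum_usage, grid, n_groups, group_size, seed=0):
    """Point-wise variance of the cdf across re-estimates that each omit a random group.

    Assumption: one observation per subscriber, so dropping a subscriber drops
    the whole dependent block. The paper uses 1,000 groups of 5,000 subscribers.
    """
    rng = random.Random(seed)
    ids = list(range(len(cum_usage)))
    draws = []
    for _ in range(n_groups):
        dropped = set(rng.sample(ids, group_size))
        kept = [c for i, c in enumerate(cum_usage) if i not in dropped]
        draws.append(empirical_cdf(kept, grid))
    variances = []
    for j in range(len(grid)):
        column = [d[j] for d in draws]
        mean = sum(column) / len(column)
        variances.append(sum((x - mean) ** 2 for x in column) / (len(column) - 1))
    return variances
```

Inverting these variances gives a diagonal approximation to the weighting matrix; the full matrix would also use the covariances between grid points.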
Figures 2a, 2b, and 2c present the recovered cdf of cumulative consumption for each
day of the billing cycle, for the least expensive, most popular, and most expensive plans,
respectively. The least and most expensive plans are the two least popular plans offered
by our provider. Yet, there is still a more than adequate number of observations to get an
accurate characterization of the time spent in different states by subscribers on these plans.
On both the least and most expensive plans, a significant proportion of subscribers
exceed their usage allowance: 20% and 30%, respectively. While the proportion of
subscribers exceeding the allowance on the most popular plan is small, the absolute number
of such users is actually larger than the total number of users exceeding the allowance on all other
plans combined.
4.2.2 Unconditional Mean and Variance of Consumption
The large number of observations in and richness of our data, along with the low
dimensionality of our state space, $(C_{t-1}, t)$, allow us to adopt a very flexible estimation approach
to recover the moments of usage at each state. Our problem essentially reduces to estimating
a surface defined over the $(C_{t-1}, t)$ plane.
To flexibly estimate the conditional moments, we adopt a nearest neighbor approach.
Consider a point in the state space, $(\tilde{C}_{\tilde{t}-1}, \tilde{t})$. A neighbor is an observation in the data for
which $t = \tilde{t}$ and $C_{t-1}$ is within some distance of $\tilde{C}_{\tilde{t}-1}$ (e.g., 0.5 GBs). Denote the fixed
number of nearest neighbors, those with the smallest distance from $\tilde{C}_{\tilde{t}-1}$, used to estimate
the moments at $(\tilde{C}_{\tilde{t}-1}, \tilde{t})$, under plan $k$, by $N_{kt}(\tilde{C}_{\tilde{t}-1})$. The estimate of the conditional
mean at $(\tilde{C}_{\tilde{t}-1}, \tilde{t})$ is
$$\hat{E}\left[ c^{*}_{kt}(\tilde{C}_{\tilde{t}-1}) \right] = \frac{1}{N_{kt}(\tilde{C}_{\tilde{t}-1})} \sum_{i=1}^{N_{kt}(\tilde{C}_{\tilde{t}-1})} c_i,$$
where $i = 1, \ldots, N_{kt}(\tilde{C}_{\tilde{t}-1})$ indexes the set of nearest neighbors. Similarly, our estimator of
the conditional variance is
$$\hat{V}\left[ c^{*}_{kt}(\tilde{C}_{\tilde{t}-1}) \right] = \frac{1}{N_{kt}(\tilde{C}_{\tilde{t}-1}) - 1} \sum_{i=1}^{N_{kt}(\tilde{C}_{\tilde{t}-1})} \left( c_i - \hat{E}\left[ c^{*}_{kt}(\tilde{C}_{\tilde{t}-1}) \right] \right)^2.$$
If $N_{kt}(\tilde{C}_{\tilde{t}-1}) < 100$, we do not estimate the conditional mean. If there are at least 100 but
fewer than 500 neighbors, we use all neighbors to estimate the conditional mean. If there are
more than 500 neighbors, we use those 500 neighbors nearest to $\tilde{C}_{\tilde{t}-1}$. The unconditional
mean is then recovered as the product of the probability of observing a subscriber at state
$(\tilde{C}_{\tilde{t}-1}, \tilde{t})$, estimated from the cdf of cumulative consumption we recover, and the conditional
mean. Let the vector of estimates for the unconditional means and variances for plan $k$ be
denoted by $\hat{\psi}^{avg}_k$ and $\hat{\psi}^{var}_k$, respectively.
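The nearest neighbor rule above can be sketched in a few lines. This is an illustrative sketch, assuming the data for a given plan and day are available as (cumulative usage, daily usage) pairs; the function name and argument names are hypothetical, while the 0.5 GB radius and the 100 and 500 neighbor thresholds are taken from the text.

```python
def knn_conditional_moments(observations, target, radius=0.5, min_n=100, max_n=500):
    """Conditional mean and variance of daily usage near a cumulative-usage state.

    observations: list of (cumulative_usage, daily_usage) pairs, all recorded
    on the same day of the billing cycle under the same plan.
    Returns (mean, variance), or None when fewer than min_n neighbors exist.
    """
    # Neighbors are observations within `radius` of the target state,
    # ordered from nearest to farthest.
    neighbors = sorted(
        (abs(cum - target), daily)
        for cum, daily in observations
        if abs(cum - target) <= radius
    )
    if len(neighbors) < min_n:
        return None
    # Use every neighbor up to max_n; beyond that, only the nearest max_n.
    chosen = [daily for _, daily in neighbors[:max_n]]
    n = len(chosen)
    mean = sum(chosen) / n
    variance = sum((c - mean) ** 2 for c in chosen) / (n - 1)
    return mean, variance
```

Multiplying the returned mean by the estimated probability of the state then gives the unconditional moment used in matching.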
The nearest neighbor approach has a number of advantages over other estimators for
our application. First, as with any nonparametric estimator, it imposes no parametric
restrictions on the surface. Second, nearest neighbor estimators are inherently bandwidth
adaptive; see Pagan and Ullah (1999). This is particularly useful in our application. The
number of users reaching very high volumes of cumulative consumption can be small for
some plans. In these low-density situations, nearest neighbor estimators will expand the
bandwidth appropriately until a given number of observations are included in the estimator
of the surface. We do restrict the degree to which the estimator can expand the bandwidth
in these low-density situations in order to limit any potential bias such expansion might
introduce. Our results are very robust to varying both the minimum number of neighbors
($N_{kt}(\tilde{C}_{\tilde{t}-1}) \geq 100$) required for a conditional moment to be estimated and the cutoff that
determines how much the bandwidth can adapt in low-density situations to
identify $N_{kt}(\tilde{C}_{\tilde{t}-1})$ neighbors. We estimate each surface at the same discrete set of
state space points used when numerically solving the dynamic programming problem for
each subscriber type. We again use a block-resampling procedure to compute a
variance-covariance matrix for our estimates of the conditional mean and variance, $\hat{V}^{avg}_k$ and $\hat{V}^{var}_k$,
respectively. We use these matrices to inversely weight more variable moments.
While we will match the unconditional mean and variance at each state, it is useful and
intuitive to present the conditional means which demonstrate a few properties of our data
more clearly than the analogous unconditional moments. These results are summarized
in Figures 3a-3c and 4a-4c for the mean and variance, respectively, for the least expensive,
most popular, and most expensive plans. For each plan, the surfaces characterizing the
conditional means have the same pattern.
Very early in the billing period the different types of subscribers reveal themselves. The
high types sort themselves to high cumulative consumption states and continue to consume
at a very high level. Interestingly, we see that consumption is relatively smooth across the
billing cycle, which suggests that (high volume) subscribers are quite adept at smoothing
consumption. That is, we do not see much of a drop in average consumption for the highest
volume subscribers as they near the overage threshold, reinforcing our decision to model subscribers
as forward-looking and rational economic agents. The low-volume types tend to migrate to
low cumulative consumption states as the billing cycle progresses and continue to consume
at low levels. In addition, there is a wide variety of intermediate types that consume at a
fairly constant level throughout the billing cycle. The estimates of the standard deviation
in usage at each state follow a similar pattern to the means. The sorting is again evident:
higher mean types tend to have much more variable usage, and the standard deviation
of usage tends to be proportional to the mean.
4.3 Matching Moments
4.3.1 Objective Function
The second step of our estimation approach follows the method of moments approach due
to Bajari et al. (2007) and Fox et al. (2011). Our objective is to match, as closely
as possible, the empirical moments we recover from the data to those predicted by our
model. The parameters we minimize over are the relative mass of different types, $\theta_h$, in the
population of subscribers that choose a plan, $k$.
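Specifically, for each plan, our estimates of these weights are chosen to satisfy
$$\hat{\theta}_k = \arg\min_{\theta_k} \; g_k(\theta_k)' \, \hat{V}_k^{-1} \, g_k(\theta_k),$$
subject to the weights for all types choosing plan $k$ summing to unity, $\sum_{h=1}^{H_k} \theta_h = 1$. The vector
$$g_k(\theta_k) = \begin{pmatrix} \hat{\psi}^{avg}_k - \psi^{avg}_k \theta_k \\ \hat{\psi}^{var}_k - \psi^{var}_k \theta_k \\ \hat{\psi}^{cdf}_k - \psi^{cdf}_k \theta_k \end{pmatrix}$$
is the difference between the moments recovered from the data ($\hat{\psi}_k$) and the weighted average
of type-specific moments predicted by the model ($\psi_k$ is a matrix with $H_k$ columns). Thus,
each element of $g_k(\theta_k)$, which corresponds to a unique ordered pair, $t$ and $C$, is of the form
$$\hat{\psi}_{kt}(C) - \sum_{h=1}^{H_k} \theta_h \psi_{hkt}(C).$$
As in Fox et al. (2011), one can then think of $g_k(\theta_k)$ as a vector of random variables, where
the randomness is a result of sampling variability in the empirical moments (measurement
error in observed market shares in their example).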
The weighting of moments by the block diagonal matrix,
$$\hat{V}_k^{-1} = \begin{pmatrix} \hat{V}^{avg}_k & 0 & 0 \\ 0 & \hat{V}^{var}_k & 0 \\ 0 & 0 & \hat{V}^{cdf}_k \end{pmatrix}^{-1},$$
ensures that more variable moments receive relatively less weight, although we find virtually
no difference between our estimates and unweighted estimates. After estimating the weights
associated with each type that selects a plan, we appropriately normalize the weights to
reflect the number of subscribers choosing each plan to get the joint distribution of types
across all plans.
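The minimization described above is a least squares problem over weights that sum to one. The sketch below illustrates one way to solve such a problem with projected gradient descent; it is not the authors' solver. It assumes a diagonal weighting (one inverse variance per moment) instead of the full block diagonal matrix, and it also keeps the weights non-negative, which the text implies by treating them as masses. All function and argument names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {theta : theta >= 0, sum(theta) = 1}."""
    u = sorted(v, reverse=True)
    running = 0.0
    shift = 0.0
    for j, uj in enumerate(u, start=1):
        running += uj
        candidate = (running - 1.0) / j
        if uj - candidate > 0:
            shift = candidate  # the last j that passes defines the shift
    return [max(x - shift, 0.0) for x in v]


def fit_type_weights(model, data, inv_var, iters=2000):
    """Minimize sum_j inv_var[j] * (data[j] - sum_h theta[h] * model[j][h])**2.

    model: one row per moment, one column per type (model-predicted moments).
    data: empirical moments. inv_var: inverse variance of each moment.
    """
    n_types = len(model[0])
    theta = [1.0 / n_types] * n_types
    # Twice the trace of the weighted Gram matrix bounds the gradient's
    # Lipschitz constant, so 1 / bound is a safe (conservative) step size.
    bound = 2.0 * sum(w * sum(x * x for x in row) for w, row in zip(inv_var, model))
    step = 1.0 / bound
    for _ in range(iters):
        resid = [d - sum(t * x for t, x in zip(theta, row))
                 for d, row in zip(data, model)]
        grad = [-2.0 * sum(w * r * row[h] for w, r, row in zip(inv_var, resid, model))
                for h in range(n_types)]
        theta = project_to_simplex([t - step * g for t, g in zip(theta, grad)])
    return theta
```

Because the objective is convex and the feasible set is a simplex, any point the iteration converges to is a global minimizer, although it need not be the only one when types behave alike.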
As pointed out by Bajari et al. (2007) and Fox et al. (2011), least squares minimization
over a bounded support and subject to linear constraints is a well-defined convex optimization
problem. Thus, convergence ensures a global minimum, although the minimizer is not necessarily unique.
In cases with many types, some of which behave similarly, identification issues can arise
regardless of the richness of the moments one is matching. The approach of Bajari et al.
(2007) and Fox et al. (2011), as with any approach, requires that the type-specific matrix
of moments ($\psi^{avg}_k$, $\psi^{var}_k$, and $\psi^{cdf}_k$) be of full rank. If types are too similar in their behavior,
collinearity issues arise and it is not possible to separately identify the weights associated
with each type. Fortunately, in our application, it is not necessary to accurately identify
the weights associated with each individual type, only the total mass of types that
behave similarly.
We take intuitive steps to reduce any such issues associated with collinearity. After
extensive experimentation to identify the support for the parameter vector that completely
encompasses the range of individual behaviors observed in the data, we solve the dynamic
programming problem over an extensive grid. In total, we solve the dynamic programming
problem for 10,000 types (10 values for each parameter), identifying the optimal plan for each
or whether to subscribe at all. This leads to a total of 6,409 types selecting a plan rather
than not subscribing. For those types selecting a particular plan, of the 6,409 choosing any
plan, we use the following algorithm to identify a set of types that are not too similar to one
another.
We begin by constructing the matrix of correlation coefficients of each column (each
corresponding to a type-specific moment) of $g_k(\theta_k)$ with every other column. Thus, the
$(i,j)$ element of this matrix is the correlation coefficient of the moments predicted by the
model for types $i$ and $j$, two types that chose plan $k$. We then take the first type and
remove all those types whose moments have a correlation coefficient greater than 0.99 with
the first type. We take the next type, of those remaining, and eliminate all types that have
a correlation coefficient greater than 0.99 with this type. We continue this process, cycling
through the types, until we are left with a set of types whose moments have correlation
coefficients less than 0.99 with all other types' moments. This process results in a
well-defined optimization problem, such that $g_k(\theta_k)$ is of full rank (moments of types remaining
are not too near to being collinear). Notice the algorithm would give a different set of types,
depending on which type it is initiated with. However, the size of the resulting set (the number of
columns in $g_k(\theta_k)$) will always be the same, and the resulting sets are indistinguishable from the
perspective of their ability to match the behavior observed in the data. The algorithm results in a total,
across all plans, of 1,189 types for which we will estimate weights. For each plan, the search
algorithm takes less than one minute to converge.
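The pruning procedure just described can be written as a short greedy loop. The sketch below is illustrative only: it assumes each type's model-predicted moments are stored as a plain list, uses the 0.99 cutoff from the text, and assumes no type has constant moments (which would make the correlation undefined). The function names are hypothetical.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length moment vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prune_similar_types(moments_by_type, cutoff=0.99):
    """Return indices of types kept after dropping near-collinear ones.

    Takes the first remaining type, keeps it, removes every later type whose
    moments correlate above `cutoff` with it, then repeats with the next
    remaining type until none are left.
    """
    remaining = list(range(len(moments_by_type)))
    kept = []
    while remaining:
        current = remaining.pop(0)
        kept.append(current)
        remaining = [
            j for j in remaining
            if correlation(moments_by_type[current], moments_by_type[j]) <= cutoff
        ]
    return kept
```

As the text notes, starting from a different type changes which indices survive, so the input order acts as a tie-breaking rule among near-identical types.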
4.3.2 Results
Similar to Bajari et al. (2007), we find a relatively small number of types are assigned a
nonzero weight despite estimating weights for 1,189 types. No plan has more than 28 types
receiving positive weights, while the average across plans is only 15. This has the advantage
of significantly simplifying the counterfactual analysis, where the dynamic programming
problem is solved repeatedly. We return to this point in Section 5.
Plans with higher usage allowances attract types with higher average (i.e., high $\mu_h$) and
more variable (i.e., high $\sigma_h$) usage. This reinforces the finding of Lambrecht et al. (2007)
that uncertainty plays an important role in subscribers' plan choice. Those types with
a higher preference for speed (i.e., high $\kappa_{2h}$) select the more expensive plans with greater
provisioned speeds, while those with the highest opportunity cost of time (i.e., high $\kappa_{1h}$)
often select plans with lower fixed fees and lower usage allowances.
The 4-dimensional joint distribution of the taste parameters is difficult to visualize, so
in Figures 5a-5d, we present the marginal distribution of types across all plans. These
figures make clear the benefit of the nonparametric approach of Bajari et al. (2007) and
Fox et al. (2011), as common parametric densities would give an extremely poor fit. We
find significant outliers along each dimension.
More important than the weights themselves is what they imply about the
model's ability to fit the behavior observed in the data. Figures 6a, 6b, and 6c present the
estimates of the cdf of cumulative consumption for the least expensive, most popular, and
most expensive plans at each day in the billing cycle. Comparing the model's fit for these
plans to the moments recovered from the data, Figures 2a, 2b, and 2c, it is clear that the
model fits the data quite well. The one exception is at the very low end of the distribution, where
the model tends to over-predict usage over the course of the billing cycle. In particular,
the model has difficulty rationalizing an individual subscribing to broadband service and not
using the service at all (less than 0.5 GBs a month). This situation is likely to arise in the
data as a result of individuals taking extended vacations or simply not cancelling the service
in periods when their demand is extremely low. To better visualize the model fit, Figures
7a, 7b, and 7c plot both the empirical moments and the model fit for the last day of the
billing cycle. For over 95% of the usage distribution, the fit is very tight.
5 Welfare Implications of Usage-Based Pricing
Currently, some of the largest providers of residential broadband in the United States are
in the process of implementing or conducting trials of usage-based pricing (i.e., usage
allowances and a non-zero overage price).$^{11}$ The estimates of the structural model provide an
opportunity to explore the implications of usage-based pricing for consumer welfare and its
potential to drive efficiency in broadband networks.
To accomplish this, we consider two alternative scenarios. We first examine how usage
and consumer surplus change when overage fees are simply eliminated and all other features
(i.e., fixed fees and provisioned speeds) are held constant. We compare these outcomes to
those when the provider is permitted to choose new fixed fees.$^{12}$
11 http://www.firstcoastnews.com/topstories/article/276426/483/Cable-companies-cap-data-use-for-revenue
12 It is important to note that our analysis only accounts for private welfare. In particular, we do not
account for any positive (e.g., education) or negative (e.g., violent games) externalities due to exposure to
content.
There are a couple of caveats to consider with this counterfactual exercise. First, our flexible
nonparametric approach to identifying the joint distribution of subscribers' preferences limits
what we can say about types that do not select a plan under the current usage-based pricing
schedule. In particular, we only identify weights for those types that actually selected
a plan.$^{13}$ If those types not subscribing currently were to subscribe once overages were
eliminated, we could understate the benefit to subscribers. This is likely not much of a
concern for qualitative conclusions, as those types with the greatest demand for broadband
will choose to subscribe in either case. Second, we do not allow the provider to change
the number of plans offered or provisioned speeds. This can only reduce revenues to the
provider and decrease the likelihood of finding that usage-based pricing is welfare improving.
Permitting the provider to select the number of plans or alter speeds requires nesting a
solution algorithm for the dynamic program, for the many types of subscribers for which we
estimate a positive weight, inside a high-dimensional optimization problem. This problem
is computationally prohibitive.
Usage is significantly higher for users at the top end of the distribution under both
alternative scenarios, nearly doubling, while the bottom end of the distribution is basically
identical, as one would expect. The only difference in usage in the two alternative scenarios
is that the change in the fixed fees causes some subscriber types to switch to a lower tier.
Overall, we find that average usage increases by approximately 38% under both alternative
pricing regimes, with all of this increase coming from the top end of the distribution. The
average usage increases from 21.7 GBs under the current pricing regime to approximately
30 GBs for both alternatives. We find that the overall increase in subscriber surplus is only
4% and 2% in the two respective scenarios. This results in a very large drop in the average
value of each GB of data consumed. The average value of the additional GBs consumed by
subscribers is less than $0.18. If variable costs increase linearly with usage, as ISPs argue,
the small increase in consumer surplus associated with eliminating usage-based pricing does
not warrant the additional costs.
13 One option would be to impose some type of smoothness restrictions on the distribution of the taste
parameters. However, given the very unsmooth nature of the marginal distributions in Figures 5a-5d, it is
not clear what those restrictions would be.
6 Conclusion
How to price content delivered over the internet will become increasingly
important as more bandwidth-intensive applications are developed and the fraction
of computer-savvy individuals in the US population grows. Our results provide strong
support for the FCC's decision to support the industry's move towards usage-based pricing
as a means to efficiently allocate bandwidth to subscribers. We show that usage-based
pricing is an effective means to eliminate extremely low-value traffic from the internet, while
minimizing the impact on consumer surplus. This suggests that the broadcasting model of
delivering content will continue to be more efficient than unicasting, unless there is a
significant technological breakthrough that dramatically lowers the cost of bandwidth.
In addition to the policy implications of our findings, this study also represents a
contribution to the literature on estimating demand in a dynamic setting. To our knowledge,
we are the first to apply and demonstrate the usefulness of the nonparametric techniques
of Fox et al. (2011). While our application is very well suited to these techniques, i.e.,
low dimensionality of the type-specific parameter vector, it is our belief that both the flexibility
and computational ease of the techniques will make them appropriate for a wide variety of
settings in empirical microeconomics outside Industrial Organization.
Access to data from a provider operating a pristine and uncongested network allowed us
to ignore any interdependence in the decisions of subscribers when estimating preferences for
online content. However, this is often not the case on broadband networks. In a recent Wall
Street Journal article, the FCC published aggregate statistics on the significant degradation
in the performance of broadband networks during peak hours. Measuring the extent of
network externalities in communication networks is an important topic for future research.
Malone (2012) represents a first step towards better understanding how congestion impacts
the utility that subscribers derive from broadband service, yet more work needs to be done.
References
[1] Ackerberg, Daniel (2009) "A New Use of Importance Sampling to Reduce Computational Burden in Simulation Estimation", Quantitative Marketing and Economics, 7(4), 343-376.
[2] Aron-Dine, Aviva, Liran Einav, Amy Finkelstein, and Mark Cullen (2012) "Moral Hazard in Health Insurance: How Important is Forward Looking Behavior?", Working Paper.
[3] Bajari, Patrick, Jeremy Fox, and Stephen Ryan (2007) "Linear Regression Estimation of Discrete Choice Models with Nonparametric Distributions of Random Coefficients", American Economic Review P&P, 97(2), 459-463.
[4] Copeland, A. and Cyril Monnet (2009) "The Welfare Effects of Incentive Schemes", Review of Economic Studies, 76(?), 93-113.
[5] Chung, Doug, Thomas Steenburgh, and K. Sudhir "Do Bonuses Enhance Sales Productivity? A Dynamic Structural Analysis of Bonus-Based Compensation Plans", HBS Working Paper #11-041.
[6] Fox, Jeremy, Kyoo il Kim, Stephen Ryan, and Patrick Bajari "A Simple Estimator for the Distribution of Random Coefficients", forthcoming in Quantitative Economics.
[7] Goolsbee, Austan and Peter J. Klenow (2006) "Valuing Products by the Time Spent Using Them: An Application to the Internet", American Economic Review P&P, 96(2), 108-113.
[8] Hendel, Igal, and Aviv Nevo (2006) "Measuring the Implications of Sales and Consumer Inventory Behavior", Econometrica, 74(6), 1637-1673.
[9] Johari, Ramesh, Gabriel Weintraub, and Benjamin Van Roy (2009) "Investment and Market Structure in Industries with Congestion", Working Paper.
[10] Lahiri, S.N. (2003) "Resampling Methods for Dependent Data", Springer.
[11] Lambrecht, Anja, Katja Seim, and Bernd Skiera (2007) "Does Uncertainty Matter? Consumer Behavior Under Three-Part Tariffs", Marketing Science, 26(5), 698-710.
[12] Malone, Jacob (2012) "Measuring Congestion Externalities in Communication Networks", Working Paper.
[13] Marsh, Christina (2012) "Estimating Demand Elasticities Using Nonlinear Pricing", Working Paper.
[14] Misra, Sanjog and Harikesh Nair (2010) "The Welfare Effects of Incentive Schemes", Working Paper.
[15] Odlyzko, Andrew, Bill St. Arnaud, Erik Stallman, and Michael Weinberg (2012) "Considering the Role of Data Caps and Usage Based Billing in Internet Access Service", Public Knowledge Whitepaper.
[16] Pagan, Adrian and Aman Ullah (1999) "Nonparametric Econometrics", Cambridge University Press.
[17] Weintraub, Gabriel Y., C. Lanier Benkard, and Benjamin Van Roy (2010) "Computational Methods for Oblivious Equilibrium", Operations Research, 58(4), 1247-1265.
[18] Weintraub, Gabriel Y., C. Lanier Benkard, and Benjamin Van Roy (2008) "Markov Perfect Industry Dynamics with Many Firms", Econometrica, 76(6), 1375-1411.
[19] Yao, Song, Carl Mela, Jeongwen Chiang, and Yuxin Chen (2011) "Determining Consumers' Discount Rates with Field Studies", Working Paper.