Optimal Large-Scale Internet Media Selection
Courtney Paulson, Lan Luo, and Gareth M. James ∗
July 10, 2015
Abstract
Internet advertising is vital in today’s business world. It is uncommon for a major
Internet advertising campaign not to include an online display component. Neverthe-
less, research on optimal Internet media selection has been sparse. Firms face consider-
able challenges in their budget allocation decisions: the large number of websites they
may potentially choose; the vast variation in traffic and costs across websites; and the
inevitable correlations in viewership among these sites. Generally, attempting to select
the optimal subset of websites among all possible combinations is an NP-hard problem.
Therefore, existing approaches can only handle Internet media selection in settings on
the order of ten websites. We propose an optimization method that is computationally
feasible to allocate advertising budgets among thousands of websites. While perform-
ing similarly to extant approaches in settings scalable to prior methods, our approach
successfully tackles the challenging task of large-scale optimal Internet media selection.
Our method is also flexible to accommodate practical Internet advertising considera-
tions such as targeted consumer demographics, mandatory media coverage to matched
content websites, and target frequency of ad exposure.
1. Introduction
With the increased role of Internet use in the United States economy, Internet advertising is
becoming vital for company survival. In 2012, U.S. digital advertising spending (including
display, search, and video advertising) totaled 37 billion dollars (eMarketer, 2012). Of that 37
billion dollars, Internet display advertising accounted for 40%. Internet display ad spending
is also expected to grow to 45.6% of the total in 2016, outpacing paid search ad spending
(eMarketer, 2012). Such an increasing trend in Internet display advertising is related to a
∗Marshall School of Business, University of Southern California.
wide range of benefits offered by this advertising format, including building awareness and
recognition, forming attitudes, and generating direct responses such as website visits and
downstream purchases (Danaher et al., 2010; Hoban and Bucklin, 2015; Manchanda et al.,
2006).
Nevertheless, firms face considerable challenges in optimal Internet media selection of
online display ads. Because each website represents a unique advertising opportunity, the
number of websites firms may potentially choose to advertise among is extremely high. These
websites also vary vastly by their traffic and advertising costs. Furthermore, when optimizing
advertising budgets across a large number of websites, it is crucial for firms to account for the
inevitable correlations in the viewership among these sites. For example, the 2011 comScore
Media Metrix data show there is over 95% correlation in the viewership of Businessweek.com
and Reuters.com. In such cases, heavy advertising on both websites will inefficiently cause
firms to advertise twice to mostly the same viewers.
These challenges are so formidable that, although Internet advertising is increasingly
recommended to reach consumers (e.g. Unit, 2005; Chapman, 2009), companies often have
to rely on advertising exchanges such as DoubleClick to manage their Internet ad campaigns
(Lothia et al., 2003). These exchanges are recent innovations in advertising that allow firms
to outsource their Internet ad campaigns, giving firms the opportunity to expand online
advertising without having to combat the challenges themselves (Muthukrishnan, 2009).
Generally, a company will specify campaign characteristics (such as which types of consumers
to target) and pay a certain amount of money to the exchange to conduct a campaign with
those characteristics.
One advantage of ad exchanges is their ability to employ behavioral ad targeting, that
is, targeting ads to consumers based on their Internet browsing histories (Chen et al., 2009).
This is usually accomplished by installing cookies or web bugs on users’ computers to track
their online activity. However, this has led to numerous privacy concerns and, in some
cases, legal action against behavioral targeters (Hemphill, 2000; Goldfarb and Tucker, 2011).
Another major concern with outsourcing Internet display ad campaigns to ad exchanges is
that companies must turn over the control of the campaign to the exchange, which creates a
classical principal-agent problem. While the focal firm can request target demographics, the
exchange ultimately has sole discretion over how funds are allocated (Muthukrishnan, 2009). In
such cases, the ad exchange serves as a broker who maximizes its own profit via distributing
ad impressions across multiple campaigns from multiple firms, rather than allocating funds
aligning with each individual firm’s best interest. Consequently, when running an online ad
campaign through an ad exchange, the focal firm’s budget allocation may be suboptimal to
varying degrees compared with the alternative of managing its own campaign.
In this paper, we propose a method to overcome the above challenges and concerns.
We emphasize a scenario in which firms wish to retain control of their online advertising
campaigns, rather than entirely outsourcing such campaigns to advertising exchanges. In
particular, we consider a setting in which a company wishes to maximize reach, i.e. the
fraction of customers who are exposed to a given ad at least one time. In such cases,
firms still face the same Internet advertising challenges of overwhelming scope and variety.
Historically, to be in full control of their own online advertising campaigns, firms often had
to employ heuristics to choose a select number of websites over which to advertise. These
heuristics include advertising only at big-name websites like Amazon or Yahoo or allocating
evenly over the most visited websites under consideration (Cho and Cheon, 2004). While
such heuristics have been adopted in practice, they can lead to substantially suboptimal budget
allocation. For example, the five highest traffic websites are likely not the optimal sites for
firms to advertise over. Consider again the case of Businessweek and Reuters. The two
websites are both high in traffic, but they actually share highly similar users. A firm will
waste money without gaining many new ad viewers by heavily advertising on both websites,
even if a firm wishes to target primarily frequent viewers of such websites. In addition, a
very popular, high-traffic website may also be very expensive to advertise on and may have
a large percentage of repeat visitors. Hence, it may not be the most cost-effective option for
firms to spend a considerable portion of their advertising budgets on such websites. In many
cases, choosing a less visited but also less expensive website could be a better choice.
Despite the considerable importance of optimal Internet media selection for online display
ads, very few researchers have proposed methods to alleviate the above challenges faced by
firms. Danaher’s Sarmanov-based model (Danaher et al., 2010) was among the first and
most successful attempts to optimally allocate budget across multiple online media vehicles.
This Sarmanov-based method has been shown to work well for budget allocation in settings
on the order of 10 websites. While Danaher’s work represents the state of the art for
allocating Internet advertising budgets, under this method the consideration of
each additional website increases the optimization difficulty exponentially, such that the
Sarmanov criterion becomes very difficult to optimize over more than approximately 10
websites (Danaher, 2007; Danaher et al., 2010). For example, even if firms know they wish
to advertise across only 10 out of 50 potential websites, they must test each possible 10-
website combination, resulting in over 10 billion individual problem calculations. Since each
website represents a separate advertising opportunity, such methods are hindered by the
huge volume of Internet websites on which firms could potentially choose to advertise.
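The scale of this combinatorial explosion is easy to verify; for instance, the 10-of-50 count cited above:

```python
import math

# Number of ways to choose 10 websites out of 50: already over ten billion
# candidate subsets, each of which would require its own optimization.
print(math.comb(50, 10))  # 10272278170
```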
The primary goal of this research is to develop a method that allows firms to efficiently
select and allocate budget among a large set of websites (e.g., thousands). One reason for
the difficulty in considering a large number of websites is that the problem of choosing a
subset of websites is generally NP-hard. In a setting involving p potential websites, each of
the $2^p$ possible website subsets must be considered separately, leading to a computationally
infeasible problem.
In a linear regression setting, a similar problem is encountered when performing variable
selection involving large numbers of independent variables. A common solution, adopted by
the statistical literature, involves optimizing a constrained convex loss function, a relaxed
version of the NP-hard variable selection problem. A selection of recent papers includes the
Lasso (Tibshirani, 1996), SCAD (Fan and Li, 2001), the elastic net (Zou and Hastie, 2005),
the adaptive Lasso (Zou, 2006), CAP (Zhao et al., 2009), the Dantzig selector (Candes and
Tao, 2007), the relaxed Lasso (Meinshausen, 2007), and VISA (Radchenko and James, 2008).
Built upon this stream of research, we develop an analogous constrained criterion ap-
proach in our setting, i.e., a relaxed version of the NP-hard website selection problem. Our
method is related to the well-known Lasso formulation (Tibshirani, 1996), but diverges in
that our optimization criterion does not involve a quadratic loss function. Our empirical
investigation illustrates that, for a small number of websites, the proposed method performs
similarly to Danaher et al. (2010). Furthermore, our method can be used effectively in major
online advertising campaigns where a large number of websites is under consideration. Even
with 5000 websites, optimizing for a particular budget takes under twenty seconds on a
personal laptop computer.
We further demonstrate that this method is flexible enough to accommodate common
practical Internet advertising considerations such as targeted consumer demographics, mandatory
media coverage to matched content websites, and target frequency of ad exposure.
Consequently, firms could use our method to fully control their own Internet advertising
campaigns instead of being forced to rely on advertising exchanges, but without having to
give up specific targeting of particular demographic groups and/or websites. Additionally,
our algorithmic efficiency allows firms to quickly compare expected reach across numerous
budgets and various Internet advertising opportunities, giving firms a broad range of adver-
tising campaign and cost options.
The remainder of the paper is structured as follows: in Section 2, we describe our con-
strained optimization approach as a high-dimensional efficient alternative to existing methods
for large-scale Internet advertising optimization. In Section 3, we discuss simulation studies
that compare our optimization to Danaher et al.’s existing method and demonstrate that the
proposed method can handle budget allocation across thousands of websites. Also in Section
3, we provide two case studies (McDonald’s McRib Advertising Campaign and Norwegian
Cruise Lines Wave Season Advertising Campaign) using 2011 comScore Media Metrix data.
We conclude in Section 4 with a summary of our findings, contributions, and avenues for
future work.
2. Methodology
2.1 Model Formulation
Consider a firm that has a budget B for a campaign that is to be run over a particular time
span (e.g., one month or one quarter). A common goal for such a campaign would be to
allocate the firm’s budget across a set of p possible websites to maximize the probability that
an Internet user views the ad at least once during the campaign. This probability is known
as the reach of a campaign. Let wj represent the budget allocated to advertising at the jth
website, where j = 1, . . . , p. Further, let Xij represent the number of times an ad appears
to customer i during her visits to website j during the course of the ad campaign, where
i = 1, . . . , n. Hence, $Y_i = \sum_{j=1}^{p} X_{ij}$ corresponds to the total number of ad appearances to
customer i over all websites. Let us also denote an n by p matrix as Z, with zij corresponding
to the number of visits of customer i to website j during the time span of the ad campaign.
In practice, such data (e.g., the comScore Media Metrix data) are available from commercial
browsing-tracking companies such as comScore.
Within this context, our problem can be formulated as a fairly common marketing sce-
nario: given that we are constrained by a budget B, how do we allocate that budget to
maximize reach during our Internet display ad campaign? Mathematically this is equivalent
to the following optimization problem:
\[
\min_{\mathbf{w}} \; \frac{1}{n}\sum_{i=1}^{n} P(Y_i = 0 \mid \mathbf{z}_i, \mathbf{w}) \quad \text{subject to} \quad \sum_{j=1}^{p} w_j \le B, \text{ and } w_j \ge 0, \; j = 1, \dots, p, \tag{1}
\]
where $\mathbf{w} = (w_1, \dots, w_p)$ denotes the budget allocation to the $p$ websites, and $\mathbf{z}_i = (z_{i1}, \dots, z_{ip})$
represents the number of times consumer i visits the p websites over the course of the Internet
ad campaign.
It is challenging to solve Equation (1) because p may be in the thousands, which means
this is an extremely high dimensional optimization problem. Additionally, the optimal so-
lution to Equation (1) should be able to accommodate corner solutions (i.e., the solution
should allow wj = 0 to arise as an optimal solution for certain websites). We discuss how
we address both challenges below.
We first express $P(Y_i = 0 \mid \mathbf{z}_i, \mathbf{w})$ as a function of $\mathbf{z}_i$ and $\mathbf{w}$, where $Y_i = \sum_{j=1}^{p} X_{ij}$. A
natural approach is to model $X_{ij}$ as a Poisson random variable with expectation $\gamma_{ij}$, i.e.
$X_{ij} \mid z_{ij}, w_j \sim \text{Pois}(\gamma_{ij})$ or, equivalently,
\[
P(X_{ij} = x \mid z_{ij}, w_j) = \frac{e^{-\gamma_{ij}} \gamma_{ij}^{x}}{x!}. \tag{2}
\]
In Equation (2), we model γij as the expected number of ad appearances to consumer i at
website j, given the consumer’s number of visits to the site (zij) and the amount of money the
focal firm spends on advertising at the site (wj). This expected number of ad appearances
is given by the probability of an ad appearing on a random visit to website j (denoted as sj)
multiplied by the number of visits (zij), i.e. γij = sjzij . For example, if a firm buys 20% of
ad impressions at a particular website, and a consumer visits that website ten times during
the course of the ad campaign, γij = 0.2×10 = 2. In this example, on average we expect the
consumer to see the ad twice during the ten visits. The probability the ad appears is simply
the number of ad impressions bought at the website over the total number of expected visits
by all customers to the site, so sj is called the share of ad impressions (Danaher et al., 2010).
Note because of this, sj is interchangeable with wj; buying all ad impressions for website j
means sj = 1 (or, equivalently, wj is maximized such that the ad appears to all visits to the
site), while buying no impressions means sj = 0 (or, equivalently, wj = 0). In the paragraph
below, we provide the formula that outlines the exact correspondence between sj and wj.
Let τj represent the expected total number of visits at the jth website during the course
of the ad campaign. Following Danaher et al. (2010), we operationalize τj as τj = φjN ,
with φj being the expected number of per person visits to site j during the ad campaign,
and N being the total Internet population. Let cj represent the cost to purchase 1000
impressions. (Note that this is an industry standard, referred to popularly as CPM.) Then
the total number of impressions purchased will be given by 1000wj/cj . Hence, we obtain
the corresponding relationship between sj (share of ad impressions) and wj (budget spent)
as follows: $s_j = 1000 w_j / (\tau_j c_j)$. For example, if the CPM of a particular website is $2, the expected
total number of visits to the website during the entire ad campaign is 10 million, and the
firm spends $500 advertising on the website, the firm has bought 2.5% of the ad impressions
at that website.
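As a quick numerical check of this correspondence (the function name is ours; the numbers are those of the example above):

```python
def share_of_impressions(w_j, tau_j, cpm_j):
    """s_j = 1000 * w_j / (tau_j * c_j): the fraction of ad impressions bought
    at site j for a budget of w_j dollars, given expected total visits tau_j
    and a cost of cpm_j dollars per 1000 impressions."""
    return 1000.0 * w_j / (tau_j * cpm_j)

# $500 spent at a $2-CPM site expecting 10 million visits buys 2.5% of impressions.
print(share_of_impressions(500.0, 10_000_000, 2.0))  # 0.025
```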
Given $\gamma_{ij} = s_j z_{ij}$ and substituting $s_j$ with $1000 w_j / (\tau_j c_j)$, we can express $\gamma_{ij}$ as a function of $z_{ij}$
and $w_j$ below:
\[
\gamma_{ij} = \theta_{ij} \times w_j \quad \text{where} \quad \theta_{ij} = \frac{1000\, z_{ij}}{\tau_j c_j}. \tag{3}
\]
In Equation (3), $\theta_{ij}$ is a known quantity given values of $z_{ij}$, $\tau_j$, and $c_j$. With this setup,
correlations in viewership among the $p$ websites are directly captured in the $z_{ij}$ terms, which
carry into $\theta_{ij}$ and then into $\gamma_{ij}$. In Appendix A, we provide a simple illustration that
demonstrates how correlations in the Z matrix are incorporated in our method.
Thus we can model $Y_i = \sum_{j=1}^{p} X_{ij}$ as a Poisson distribution with expected value $\gamma_i = \sum_{j=1}^{p} \gamma_{ij}$, i.e.
\[
P(Y_i = y \mid \mathbf{z}_i, \mathbf{w}) = \frac{e^{-\gamma_i} \gamma_i^{y}}{y!}. \tag{4}
\]
Combining Equation (4) with our original Equation (1) gives the criterion we wish to optimize:
\[
\min_{\mathbf{w}} \; \frac{1}{n}\sum_{i=1}^{n} e^{-\gamma_i} \quad \text{subject to} \quad \sum_{j} w_j \le B \text{ and } w_j \ge 0, \; j = 1, \dots, p. \tag{5}
\]
The optimization in Equation (5) has the following appealing properties. First, because
the objective function is a well-behaved convex and smooth function, it is relatively easy
to solve the optimization, even for large values of p. This transforms the original problem
from NP-hard to one that is relatively easy to optimize. The algorithm will also not stall
at suboptimal local minima. Second, the form of Equation (5) encourages sparsity in the
solution. Under each given budget, as the number of websites under consideration increases,
our optimization criterion will automatically assign a budget of zero to more websites (hence
the corner solutions as we desired; see more discussions on this in Hastie et al., 2009, p. 71).
Lastly, given the convex and smooth nature of the objective function, prior budget solutions
can be used as effective starting points of neighboring budgets. Therefore, we are able to
efficiently optimize over a range of budgets rather than merely solving one particular budget
at a time.
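To make the criterion concrete, the objective of Equation (5), and the corresponding reach, can be evaluated directly for any candidate allocation. A minimal sketch, assuming theta has been precomputed as in Equation (3) (function and variable names are ours):

```python
import math

def objective(theta, w):
    """(1/n) * sum_i exp(-gamma_i) with gamma_i = sum_j theta[i][j] * w[j];
    the campaign's expected reach is one minus this value."""
    n = len(theta)
    total = 0.0
    for row in theta:
        gamma_i = sum(t_ij * w_j for t_ij, w_j in zip(row, w))
        total += math.exp(-gamma_i)
    return total / n

# Two users, two sites: with no budget allocated, nobody sees the ad (reach 0).
theta = [[0.010, 0.002], [0.000, 0.005]]
print(1.0 - objective(theta, [0.0, 0.0]))  # 0.0
```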
2.2 The Optimization Algorithm
In order to solve Equation (5), we reformulate the optimization using a Lagrangean¹:
\[
\min_{\mathbf{w}} \; \frac{1}{n}\sum_{i=1}^{n} e^{-\gamma_i} + \frac{\lambda}{n}\Big(\sum_{j} w_j - B\Big) \quad \text{subject to} \quad w_j \ge 0, \; j = 1, \dots, p, \tag{6}
\]
where λ > 0 is the Lagrangean multiplier. (Note λ must be greater than zero in our setting
given the constraint that budget must always be nonnegative.)
It is evident that, for each given budget, there is a corresponding Lagrangean multiplier
λ. For a given number of websites, as budget increases, λ decreases, and the algorithm
allocates more budget to more websites. As budget decreases, λ increases, and we get a
sparser solution.
Since we optimize over the $w_j$ terms, Equation (6) can be simplified as Equation (7), with
$B$ dropping out of the first order conditions:
\[
\min_{\mathbf{w}} \; \frac{1}{n}\sum_{i=1}^{n} e^{-\gamma_i} + \frac{\lambda}{n}\sum_{j} w_j \quad \text{subject to} \quad w_j \ge 0, \; j = 1, \dots, p. \tag{7}
\]
Although there is no direct closed form solution to Equation (7), problems similar to
that of Equation (7) have been extensively studied recently in the literature, particularly in
statistics, e.g. (Efron et al., 2004; Friedman et al., 2010; Goeman, 2010; Hesterberg et al.,
2008; Rosset and Zhu, 2007; Schmidt et al., 2007).

¹In the statistical literature, this is commonly referred to as a penalized optimization equation. In
statistics, the $\frac{\lambda}{n}\sum_j w_j$ penalty would frequently be written as an $\ell_1$ penalty rather than a summation
penalty. However, for our setup, these two are identical, since we have the condition $w_j \ge 0$ for all $j$.

As a result there exist very efficient
algorithms for solving such problems. In this paper, we utilize one of the most efficient
and easy to implement algorithms known as coordinate descent to solve Equation (7) over
a grid of values for λ, which in turn provides optimal allocations for a range of possible
campaign budgets. The idea behind coordinate descent is to reduce our optimization to a
sequence of simple one-dimensional optimizations, as described below (see Appendix B for more details of
the algorithm):
Algorithm 1 Coordinate Descent Algorithm for Budget Optimization
1. Specify a maximum budget, Bmax.
2. Initialize algorithm with w̃ = 0, j = 1, and λ corresponding to B = 0.
3. For j in 1 to p,
(a) Marginally optimize Equation (7) over a single website budget wj, keeping
w1, w2, . . . , wj−1, wj+1, . . . , wp fixed.
(b) Iterate until convergence.
4. Increase budget by incrementally decreasing λ over a grid of values, with each λ cor-
responding to a budget, and repeat Step 3 until reaching Bmax.
What makes this approach so efficient is that each update step is fast to compute and
typically not many iterations are required to reach convergence in Step 3 of the algorithm
above. Note that convergence is guaranteed by Luo and Tseng (1992) for the form of
Equation (7) as in Step 3 above. Thus our optimization becomes very efficient to solve for
a range of budgets at once.
However, because there is no closed form solution to Equation (7), we use a quadratic
approximation for the objective function in Step 3 of Algorithm 1. Specifically, since we are
using a coordinate descent approach, we employ a second order Taylor approximation of
$e^{-\gamma_i}$ around the current estimate $\tilde{\mathbf{w}}$ as follows:
\[
e^{-\gamma_i} \approx e^{-\tilde{\gamma}_i}\left[1 - \sum_{j=1}^{p}\theta_{ij}(w_j - \tilde{w}_j) + \frac{1}{2}\sum_{j=1}^{p}\sum_{k=1}^{p}\theta_{ij}\theta_{ik}(w_j - \tilde{w}_j)(w_k - \tilde{w}_k)\right] \quad \text{s.t. } w_j, w_k \ge 0, \; j,k = 1,\dots,p, \tag{8}
\]
where $\tilde{\gamma}_i = \sum_{j=1}^{p} \theta_{ij}\tilde{w}_j$, and $\tilde{w}_j$ can be taken as our most recent estimate for $w_j$ based on the
last iteration of the algorithm.
Substituting (8) into (7) and computing the first order condition with respect to wj, all
terms involving w1, w2, . . . , wj−1, wj+1, . . . , wp drop out of our criterion. Hence, up to an
additive constant (i.e. the first term of the Taylor expansion), we can approximate Equation
(7) for a particular coordinate wj as:
\[
\min_{w_j} \; \frac{1}{n}\sum_{i=1}^{n} e^{-\tilde{\gamma}_i}\left(-\theta_{ij}(w_j - \tilde{w}_j) + \frac{1}{2}\theta_{ij}^2(w_j - \tilde{w}_j)^2\right) + \frac{\lambda}{n}w_j \quad \text{subject to} \quad w_j \ge 0. \tag{9}
\]
With our simplified criterion, we show in Appendix B that the first order condition to
Equation (9) can be written as Equation (10), with the otherwise condition enforcing wj ≥ 0:
\[
w_j =
\begin{cases}
\tilde{w}_j + \dfrac{\sum_{i=1}^{n} e^{-\tilde{\gamma}_i}\theta_{ij} - \lambda}{\sum_{i=1}^{n} e^{-\tilde{\gamma}_i}\theta_{ij}^2} & \text{for } H_j > \lambda \\
0 & \text{otherwise,}
\end{cases} \tag{10}
\]
where $H_j = \sum_{i=1}^{n} e^{-\tilde{\gamma}_i}\theta_{ij}(\tilde{w}_j\theta_{ij} + 1)$ (note that $H_j$ is always positive here). Equation (10)
incorporates the wj ≥ 0 condition by testing if the wj coefficient has been forced below zero
by the update. If it has, we set that coefficient to 0, the minimum value allowed (since
budget cannot be negative). This equation can be computed quite efficiently.
Therefore, the optimization in Equation (7) can be solved by iteratively computing Equation
(10) for j from 1 to p and repeating until convergence.² Appendix B also demonstrates
the computational efficiency of our algorithm. When increasing the number of websites under
consideration to 5000, it takes less than twenty seconds to optimize for a particular budget
on a personal laptop computer with a 2.30 GHz processor.
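For illustration, the inner loop of Algorithm 1 for a single value of λ might be sketched as follows. This is a simplified pure-Python illustration under our notation, not the authors' implementation; the convergence tolerance, iteration cap, and incremental bookkeeping of the γ values are our choices.

```python
import math

def coordinate_descent(theta, lam, max_iter=500, tol=1e-10):
    """Minimize (1/n) sum_i exp(-gamma_i) + (lam/n) sum_j w_j with w_j >= 0
    by cycling the closed-form update of Equation (10) over coordinates.
    theta is an n-by-p list of lists; returns the budget allocation w."""
    n, p = len(theta), len(theta[0])
    w = [0.0] * p
    gamma = [0.0] * n  # gamma_i = sum_j theta[i][j] * w[j], kept up to date
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            e = [math.exp(-g) for g in gamma]
            num = sum(e[i] * theta[i][j] for i in range(n)) - lam
            den = sum(e[i] * theta[i][j] ** 2 for i in range(n))
            if den == 0.0:
                continue  # site j never visited; leave its budget at zero
            # Equation (10): update, clamped at zero (equivalent to H_j > lam)
            w_new = max(0.0, w[j] + num / den)
            if w_new != w[j]:
                delta = w_new - w[j]
                for i in range(n):
                    gamma[i] += theta[i][j] * delta
                w[j] = w_new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return w
```

As a sanity check, with a single site and θ_i1 = 1 for two users, λ = 0.5 yields the analytic optimum w = ln 4, while a large λ drives the allocation to the sparse solution w = 0.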
2.3 Model Extensions
In what follows we discuss three extensions to the proposed method. We will provide an
illustration of each extension in Section 3.
²Because we employ a Taylor approximation in our algorithm, we also did some empirical evaluation to verify the convergence of the approximation. We ran our algorithm with numerous initialization points to determine if the optimization had converged to a global optimum. In all cases, we obtained identical solutions regardless of initialization points and the convergence was achieved under very few iterations.
2.3.1 Extension 1: Targeted Consumer Demographics
In this subsection we describe how the method discussed above can be modified to accom-
modate targeted consumer demographics. Suppose that each individual belongs to one of m
possible demographic groups. For example, if we wished to target people based on household
income and whether or not they had children, we could have m = 4 possible demographic
groups (low household income with or without children, and high household income with or
without children). It will often be the case that the “actual” proportions of individuals with
these demographics in our data, P1,a, . . . , Pm,a, will differ from the targeted demographic
makeup, P1,d, . . . , Pm,d, of the firm. For instance, it may be that the fraction of individuals
with low household income and with children in our data Z is PLC,a = 0.3, while the focal
firm’s target consumer base consists of a much greater percentage of such consumers, e.g.,
PLC,d = 0.6. Within this context, we would like to upweight individuals with low household
income and children in our data sample.
This is easily accomplished with a simple adaptation to Equation (7):
\[
\min_{\mathbf{w}} \; \frac{1}{n}\sum_{i=1}^{n} p_i e^{-\gamma_i} + \frac{\lambda}{n}\sum_{j} w_j \quad \text{subject to} \quad w_j \ge 0, \; j = 1, \dots, p, \tag{11}
\]
where $p_i = P_{D_i,d}/P_{D_i,a}$ and $D_i$ represents the demographic group that individual $i$ falls into.
Since PDi,a is computed from observed data and PDi,d is based on the focal firm’s target
customer base, pi is a fixed and known quantity. Therefore, optimizing Equation (11) is
accomplished in exactly the same fashion as for Equation (7).
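Computing the weights is straightforward; a small sketch using the example proportions above (helper name and group labels are ours):

```python
def demographic_weights(groups, observed, desired):
    """p_i = P_{D_i,d} / P_{D_i,a}: ratio of the firm's desired proportion for
    individual i's demographic group to that group's observed proportion."""
    return [desired[g] / observed[g] for g in groups]

# Low-income-with-children ("LC") is 30% of the panel but 60% of the target,
# so such individuals are upweighted by a factor of 2.
weights = demographic_weights(["LC", "other"],
                              observed={"LC": 0.3, "other": 0.7},
                              desired={"LC": 0.6, "other": 0.4})
print(weights[0])  # 2.0
```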
2.3.2 Extension 2: Mandatory Media Coverage to Matched Content Websites
Aside from targeted consumer demographics, a firm might wish to impose mandatory media
coverage to certain subsets of websites. For example, when planning the online advertising
campaign for its annual “wave season,” Norwegian Cruise Lines may want to allocate a
certain minimum budget to advertising on aggregate travel sites such as Orbitz or Expedia
in addition to other websites. In this subsection we discuss how the proposed method can
be modified to accommodate such requirements. Specifically, we can modify Equation (7)
to require wj to be above a certain threshold, say wj ≥ minj , to ensure that a minimum
budget is allocated to each aggregate travel website j.
Using the same approach as for optimizing Equation (7) we can show that the new
optimization is accomplished by setting the “otherwise” condition in Equation (10) to a
minimum non-zero amount. Specifically, we would replace Equation (10) with the following:
\[
w_j =
\begin{cases}
\tilde{w}_j + \dfrac{\sum_{i=1}^{n} e^{-\tilde{\gamma}_i}\theta_{ij} - \lambda}{\sum_{i=1}^{n} e^{-\tilde{\gamma}_i}\theta_{ij}^2} & \text{for } H_j - \lambda > \mathrm{min}_j \\
\mathrm{min}_j & \text{otherwise.}
\end{cases} \tag{12}
\]
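Reading Equation (12) as clamping the coordinate update at the site's floor rather than at zero, the change to the algorithm is a single line. A sketch (helper name is ours; `num` and `den` stand for the numerator and denominator of the update):

```python
def floored_update(w_j, num, den, floor_j):
    """Coordinate update with a mandatory minimum budget floor_j for site j:
    identical to the unconstrained update, but falling back to floor_j
    instead of to zero."""
    return max(floor_j, w_j + num / den)

print(floored_update(0.0, -1.0, 2.0, 5.0))   # 5.0 (the floor binds)
print(floored_update(10.0, 1.0, 2.0, 5.0))   # 10.5
```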
2.3.3 Extension 3: Target Frequency of Ad Exposure
Another practical consideration in an online advertising campaign is the target frequency of
ad exposures (e.g., Krugman, 1972; Naples, 1979; Danaher et al., 2010). For example, sales
conversions and profits from online display ads might be highest when the consumer is served
an ad within a certain range of frequencies (e.g., one to three times) during the duration of
the ad campaign. The proposed method can also be readily modified to accommodate such
considerations. Within our context, this corresponds to P (ka ≤ Yi ≤ kb|zi,w) where ka < kb
respectively represent lower and upper bounds on ad exposures. Given prior experience,
the firm might determine the lower bound (i.e., ka) and the upper bound (i.e., kb) for the
target range of ad exposures. This is known as effective frequency or frequency capping (the
latter typically sets the lower bound at 1 and imposes an upper bound on the number of ad
exposures).
Within our context, we can modify Equation (5) as follows to accommodate such con-
siderations:
\[
\max_{\mathbf{w}} \; \frac{1}{n}\sum_{i=1}^{n} \sum_{y=k_a}^{k_b} P(Y_i = y \mid \mathbf{z}_i, \mathbf{w}) \quad \text{subject to} \quad \sum_{j} w_j \le B, \text{ and } w_j \ge 0, \tag{13}
\]
where as before $P(Y_i = y \mid \mathbf{z}_i, \mathbf{w}) = e^{-\gamma_i}\gamma_i^{y}/y!$. Using the example of $1 \le Y_i \le 3$, our problem
involves maximizing
\[
\frac{1}{n}\sum_{i=1}^{n} e^{-\gamma_i}\Big(\gamma_i + \frac{1}{2}\gamma_i^2 + \frac{1}{6}\gamma_i^3\Big). \tag{14}
\]
Again we take a second-order Taylor expansion, resulting in equations with a similar form
to Equation (9) and Equation (10).
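The closed form in Equation (14) is just the Poisson probability P(1 ≤ Y_i ≤ 3) written out term by term, which is easy to confirm numerically (function name is ours):

```python
import math

def prob_in_range(gamma_i, ka, kb):
    """P(ka <= Y_i <= kb) for Y_i ~ Poisson(gamma_i)."""
    return sum(math.exp(-gamma_i) * gamma_i ** y / math.factorial(y)
               for y in range(ka, kb + 1))

# For 1 <= Y_i <= 3 this matches exp(-g) * (g + g^2/2 + g^3/6).
g = 1.5
closed_form = math.exp(-g) * (g + g ** 2 / 2 + g ** 3 / 6)
print(abs(prob_in_range(g, 1, 3) - closed_form) < 1e-12)  # True
```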
3. Empirical Investigation
In Section 3.1, we compare the proposed method with the method by Danaher et al. (2010).
In Section 3.2, we demonstrate how our method can be used for optimal budget allocation
when the number of websites under consideration is very large (e.g., 5000 websites), which
is computationally prohibitive for extant methods. In Sections 3.3 and 3.4, we discuss two
case studies where we use the proposed method and its extensions for McDonald’s McRib
and Norwegian Cruise Lines’ Wave Season online advertising campaigns.
Our empirical illustrations are based on the 2011 comScore Media Metrix data, which
comes from the Wharton Research Data Service (www.wrds.upenn.edu). comScore uses
proprietary software to record daily webpage usage information from a panel of 100,000
Internet users (recorded anonymously by individual computer). Therefore, the comScore
data can be used to construct a matrix of all websites visited and the number of times each
computer visited each website during a particular time period. A number of prior studies
in marketing have utilized comScore Media Metrix data (e.g., Danaher, 2007; Liaukonyte
et al., 2015; Montgomery et al., 2004; Park and Fader, 2004).³
3.1 Comparison between Proposed Method and Danaher et al. (2010)
3.1.1 Comparison using Data Simulated from Danaher et al.’s Sarmanov Function
To date, the state-of-the-art method for optimal budget allocation of Internet display ads is
by Danaher et al. (2010). A basic premise of this method is that the number of visits indi-
viduals have to websites (denoted as a n by p matrix Z in our context) can be characterized
by a multivariate negative binomial distribution (referred to as MNBD hereafter). Within
this setup, Danaher et al. (2010) proposes an optimization method to maximize reach for
each given budget.
³We followed Danaher et al. (2010) to calculate the effective Internet population size for our data (denoted as N in Section 2). We first consider the size of the U.S. population at the time of our data set, which is 310.5 million (Schlesinger, 2010). We then multiply it by the proportion of users who actually visited at least one website in our data set (for example, 48.63% in our comScore January 2011 data). We then define N as 155.25 million (48.63%*310.5 million). It is worth noting that, because the specific value of N simply serves as a baseline effective Internet population estimate in our reach estimates, the relative performance of various methods remains qualitatively intact if N is defined as a smaller/greater proportion of the U.S. population.
13
To examine how our method performs under the basic premise of Danaher et al.’s ap-
proach, we first simulate a Z matrix from an MNBD distribution with a set of known
parameters. Based on the simulated Z matrix, we know the true optimal reach under each
budget. Next, we apply both methods on the simulated Z matrix and compare the discrep-
ancies between the true optimal reach and the reach obtained based on the budget allocations
suggested by the two methods.
Because the Z matrix in this case originates from the MNBD distribution (which is the
basic premise of Danaher et al.'s method), we expect that Danaher et al.'s (2010) method
would perform better than the proposed method under such comparisons. Nevertheless, we
aim to evaluate the extent to which the proposed method could achieve a reach that is similar
to the true optimal or the reach obtained under Danaher et al.’s (2010) method. Because
Danaher et al.’s (2010) method is only computationally efficient for budget allocation across
a relatively small number of websites, we demonstrate such comparisons for the case of seven
websites below.
We first generate the Internet usage matrix, Z, with 5000 rows (users) and 7 columns
(websites), based on an MNBD with αj and rj, j = 1, ..., 7, the usual parameters associated with an MNBD, and ωj,j′, a set of correlation parameters denoting the correlation coefficient
in viewership between websites j and j′. To make our simulation as realistic as possible,
we establish αj , rj , and ωj,j′ as the values from the seven most visited websites from the
December 2011 comScore data. We also use the CPMs provided by comScore’s 2010 Media
Metrix (Lipsman, 2010) in this simulation. See Appendix C for more details on our data
generation method.
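Appendix C gives the actual Sarmanov-based generation; as a simplified sketch of its building block, independent NBD visit counts for each site can be drawn via the standard gamma-Poisson mixture. The parameter values below are illustrative placeholders (not the fitted comScore values), and the pairwise correlations ωj,j′ are omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_nbd_visits(n_users, r, alpha, rng):
    """Draw NBD visit counts for one site via a gamma-Poisson mixture.

    Each user's visit rate lambda ~ Gamma(shape=r, scale=1/alpha), and
    visits | lambda ~ Poisson(lambda), giving NBD counts with mean r/alpha.
    """
    lam = rng.gamma(shape=r, scale=1.0 / alpha, size=n_users)
    return rng.poisson(lam)

# Illustrative parameters for 7 sites (NOT the fitted comScore values).
r = np.array([0.5, 0.8, 0.3, 1.2, 0.6, 0.4, 0.9])
alpha = np.array([0.2, 0.5, 0.1, 0.6, 0.3, 0.2, 0.4])

Z = np.column_stack([simulate_nbd_visits(5000, r[j], alpha[j], rng)
                     for j in range(7)])
print(Z.shape)         # (5000, 7)
print(Z.mean(axis=0))  # roughly r / alpha per site
```

The Sarmanov construction in Danaher et al. (2010) layers correlated viewership on top of these NBD marginals; the sketch above only illustrates the marginal counts.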
We then employ the following procedure to compare the two methods. We first obtain
the true optimal reach under each budget based on the true αj , rj , and ωj,j′ parameters and
the optimal criterion in Danaher et al.’s (2010) method. Next, we apply both the proposed
and Danaher et al.’s (2010) methods on the simulated Z matrix to obtain the corresponding
reach estimates. Note that Danaher's methodology optimizes over share of impressions, sj, instead of monetary spending, wj. Nevertheless, we can readily convert sj to wj using the formula sj = 1000 wj / (τj cj), as given in Section 2.4
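As a quick illustration of this conversion (the CPM, impression, and spend figures below are made up for the example; τj denotes the total number of ad impressions available at site j, per Section 2):

```python
def spend_to_share(w, cpm, tau):
    """Convert dollar spend w at CPM cpm into the share of a site's
    tau available impressions: s = 1000 * w / (tau * cpm)."""
    return 1000.0 * w / (tau * cpm)

def share_to_spend(s, cpm, tau):
    """Inverse conversion: w = s * tau * cpm / 1000."""
    return s * tau * cpm / 1000.0

# Illustrative: $50,000 at a $2.50 CPM on a site with 100M impressions
s = spend_to_share(50_000, 2.50, 100_000_000)
print(s)  # 0.2, i.e. 20% of the site's impressions
```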
4Since Danaher et al.'s reach function is highly nonconvex, it can find local optima during optimization. Consequently, we run Danaher's optimization with several initialization points and choose the results with the highest reach in our result comparisons. Since a firm cannot buy more than 100% of ad impressions at a
[Figure 1 shows two panels plotting reach against budget ($0.5M-$2.0M): the left panel reports reach under Danaher's reach definition, the right panel under the proposed method's reach definition. Legend: Optimal, Proposed, Danaher.]
Figure 1: Performance Comparison between Proposed and Danaher’s (Simulated Data)
Given that the proposed and Danaher et al.'s (2010) methods each have their own definitions of the reach function, we report the reach comparisons using both reach definitions.
(See Danaher et al. 2010 for the formal definition of their reach function.) Figure 1 shows
the reach curves for the average reach estimate at each budget across the 100 simulation
runs using both definitions of reach. Note that the true optimal reach is in solid black, the
Danaher estimate is in dashed red, and our estimate is in dotted blue.
When using Danaher et al.'s reach function (left panel), both methods yield reach fairly close to the true optimal reach. As expected, Danaher's method performs slightly better under this comparison, because not only are we using Danaher's definition of reach, but we also generate the data from the MNBD assumed by Danaher's method. When using our definition of reach (right panel), our estimate even slightly outperforms the optimal reach toward the higher budgets. This occurs because the optimal reach is based on Danaher's reach definition whereas here we are using our definition of reach in this figure.
website (i.e., 0 ≤ sj ≤ 1), we force our algorithm's optimization to stop allocating budget to a website once wj = τj cj / 1000 is reached (corresponding to sj = 1).
Overall, the comparisons above demonstrate that, even when the Internet usage matrix
Z is simulated from the MNBD as assumed in Danaher’s method, our method performs
reasonably well. Comparing the computation speed of the two methods, we find that the proposed method is over ten times faster under this setting. Given the highly non-convex nature of the optimization criterion in Danaher et al. (2010), this gap in computation speed would only widen for larger-scale problems.
3.1.2 Comparison using comScore Data
In this subsection, we compare the two methods using the December 2011 comScore Media
Metrix data. Specifically, we use Internet usage data from the top seven most visited websites
that support Internet display advertisements. The full month of data contained 51,093 users
who visited one of the seven websites at least once in December 2011. We fit both the
proposed and Danaher’s method to 100 randomly chosen subsets of these users, each of size
5,109 (approximately ten percent of the population). Again, we use the CPMs as given in
comScore Inc.'s Media Metrix data from May 2010 (Lipsman, 2010).
Figure 2 shows the reach curves for the average reach at each budget across the 100
sample runs, using both reach functions. Within this context, we define the true optimal
reach (black solid) as that obtained from our method applied to the entire data set of 51,093
users. Danaher's (red dashed) and our (blue dotted) estimates are both computed from the
10% subsets of the data. This also approximates real-world conditions in which a company
has access to only part of the total browsing history of all Internet users. All reach curves in
Figure 2 are then calculated on the ninety percent holdout data to ensure fair comparisons
across methods.
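Concretely, holdout reach can be estimated along the following lines. This is a generic sketch, not necessarily the paper's exact reach function: it treats each of a user's zij page views at site j as an independent exposure opportunity with probability sj.

```python
import numpy as np

def empirical_reach(Z_holdout, s):
    """Estimate reach as the expected fraction of holdout users exposed
    at least once: a user with z_ij views of site j misses the ad there
    with probability (1 - s_j) ** z_ij."""
    p_miss = np.prod((1.0 - s) ** Z_holdout, axis=1)
    return 1.0 - p_miss.mean()

# Toy check: 3 users, 2 sites
Z = np.array([[2, 0],
              [0, 0],
              [1, 1]])
s = np.array([0.5, 0.5])
print(empirical_reach(Z, s))  # 0.5
```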
When using Danaher’s definition of reach (left panel), the three methods yield relatively
similar reach. Similar to the right panel of Figure 1, the Danaher reach estimate outperforms
the “optimal” reach in the left panel because the optimal reach in this case was computed
[Figure 2 shows two panels plotting reach against budget ($0.5M-$2.0M): the left panel reports reach under Danaher's reach definition, the right panel under the proposed method's reach definition. Legend: Optimal, Proposed, Danaher.]
Figure 2: Performance Comparison between Proposed and Danaher’s (Real Data)
using the proposed method. When using our definition of reach (right panel), the full data
set performs best as expected, followed by results from applying our method to the 10%
subset, and finally the Danaher estimates.
To conclude, both comparisons in Section 3.1 illustrate that the proposed and Danaher’s
methods perform similarly when applied to problem settings scalable to the latter. However,
due to its non-convex optimization criterion, the Danaher approach is considerably slower to
compute and as a result encounters significant computational difficulties in settings involving
a large number of websites. In Section 3.2, we demonstrate that, while computationally
prohibitive for extant methods, the proposed method can be used to optimally allocate
advertising budget across a very large number of websites.
3.2 Simulated Large-Scale Problem: 5000 Websites
In practice, most Internet media selection problems involve far more than a handful of web-
sites. In this subsection we illustrate how the proposed method can optimize over thousands
of websites. To demonstrate this, we simulate an Internet usage matrix of 50,000 people over
5000 websites.5 The visits to each website are randomly generated from a standard normal distribution (after both rounding and taking the absolute value, since website views are nonnegative integers), which are then multiplied by a random integer from zero to ten with higher weight on a value of zero. This ensures that our simulated data set has similar characteristics
to the observed comScore data, since we observe a high percentage of matrix entries in the
real data as zeros. The CPMs of these websites are randomly generated, chosen from 0.25
to 8.00 in increments of 0.25.
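The generation just described can be sketched as follows, per our reading of the procedure. The dimensions are scaled down to 5,000 users and 500 sites to keep the sketch light, and the weighting probabilities for the zero-heavy multiplier are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_sites = 5_000, 500  # scaled down from 50,000 x 5,000

# Base counts: absolute value of rounded standard normals.
base = np.abs(np.rint(rng.standard_normal((n_users, n_sites)))).astype(int)

# Multiplier 0-10, weighted heavily toward zero so most entries vanish,
# mimicking the sparsity of the real comScore matrix (weights assumed).
mult_probs = np.array([0.80] + [0.02] * 10)
mult = rng.choice(np.arange(11), size=(n_users, n_sites), p=mult_probs)

Z = base * mult

# CPMs drawn from $0.25 to $8.00 in $0.25 increments.
cpms = rng.choice(np.arange(0.25, 8.25, 0.25), size=n_sites)

print(Z.shape, (Z == 0).mean())  # most entries are zero
```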
As before, we run our method over a 10% subset of the 50,000 users, then calculate reach on the 90% holdout data in our result comparisons. Because it is computationally
prohibitive for Danaher et al.’s method to optimize over 5000 websites, we compare the
proposed method to the following benchmark approaches: 1) equal allocation over all 5000
websites; and 2) cost-adjusted equal allocation (i.e. average number of visits/CPM) over the
most visited 10, 25, and 50 websites. We believe that these alternative approaches mimic
approaches often used in practice when the sheer number of websites is infeasible to examine
individually, such as those outlined in Cho and Cheon (2004).
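The two benchmark rules can be sketched as follows; the proportional form of the cost-adjusted rule is our interpretation of "average number of visits/CPM":

```python
import numpy as np

def equal_allocation(budget, n_sites):
    """Spread the budget evenly over all sites."""
    return np.full(n_sites, budget / n_sites)

def cost_adjusted_topk(budget, Z, cpms, k):
    """Allocate the budget over the k most-visited sites, with each
    chosen site's share proportional to (average visits) / CPM."""
    avg_visits = Z.mean(axis=0)
    top = np.argsort(avg_visits)[::-1][:k]
    weights = avg_visits[top] / cpms[top]
    w = np.zeros(Z.shape[1])
    w[top] = budget * weights / weights.sum()
    return w

# Toy example: 4 sites, allocate $1000 over the top 2
Z = np.array([[3, 0, 1, 2],
              [2, 1, 0, 4]])
cpms = np.array([2.0, 1.0, 1.0, 4.0])
w = cost_adjusted_topk(1000, Z, cpms, k=2)
print(w)  # $625 to site 0, $375 to site 3
```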
Figure 3 shows the result comparisons. Even with only a ten percent subset of the data,
the proposed method yields reach estimates very similar to the optimal reach estimate based
on the entire dataset. In addition, the proposed method outperforms all the benchmark
approaches. We find that the equal allocation approach is by far the worst. The cost-
adjusted approaches perform better, but still worse than our method. Overall, we show that
the proposed method can be used to effectively allocate advertising budget across a very
large set of websites.
5We choose to simulate this dataset because data cleaning in comScore for 5000 websites is highly time-consuming. For simplicity, the data is generated independently without correlations. Since the proposed method is designed to leverage correlations across sites, this setup provides a lower bound with respect to advantages from our approach.
[Figure 3 plots reach against budget ($0-$0.6M). Legend: Optimal, Proposed, Top 100, Top 50, Top 25, Equal.]
Figure 3: Simulated Data Reach, 5000 Websites
3.3 Case Study 1: McDonald’s McRib Sandwich Online Advertising Campaign
We now demonstrate how the proposed method can be applied in real-world settings. In our
first case study, we consider a yearly promotion for McDonald’s McRib Sandwich, which is
only available for a limited time each year (approximately one month).
Because McRib is often offered in or around December (Morrison, 2012), we consider
the comScore data from December 2011 to approximate a McRib advertising campaign. In
particular, we manually went through the comScore data set to identify the 500 most visited
websites that also supported Internet display ads. Our data then contains a record of every
computer that visited at least one of these 500 websites at least once (56,666 users). Thus
Z is a 56,666 by 500 matrix. We then separate our full data set into a ten percent training
data set (5667 users) and a ninety percent holdout data set. As before, we use the
training data to fit the method, then calculate reach on the holdout data.
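The repeated 10%/90% calibration-holdout split used throughout can be sketched as follows (the Poisson matrix is a stand-in for the comScore usage matrix, scaled down for illustration):

```python
import numpy as np

def split_panel(Z, train_frac, rng):
    """Randomly split panel users (rows of Z) into a calibration
    subset and a holdout set."""
    n = Z.shape[0]
    idx = rng.permutation(n)
    n_train = int(round(train_frac * n))
    return Z[idx[:n_train]], Z[idx[n_train:]]

rng = np.random.default_rng(42)
Z = rng.poisson(0.3, size=(5_000, 50))  # stand-in usage matrix
Z_train, Z_holdout = split_panel(Z, 0.10, rng)
print(Z_train.shape[0], Z_holdout.shape[0])  # 500 4500
```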
Table 1 provides the categorical makeup of the 500 websites we consider in this applica-
tion. We include sixteen categories of websites: Social Networking, Portals, Entertainment,
E-mail, Community, General News, Sports, Newspapers, Online Gaming, Photos, Fileshar-
ing, Information, Online Shopping, Retail, Service, and Travel. The Total Number column
provides the total number of websites in each category. For simplicity, the CPM values for
each website are based on average costs of the website categories provided by comScore Inc.’s
Media Metrix data from May 2010 (Lipsman, 2010).6 Table 1 shows that Entertainment and
Gaming are by far the largest categories (with 92 and 77 websites out of 500, respectively),
while Sports, Newspaper, and General News are the most expensive at which to advertise
(all over $6.00). Additionally, it appears in Table 1 that advertising costs vary considerably
across these website categories. In Appendix D (Table A3), we also provide an overview of
viewership correlations within and across each of the sixteen website categories.
Table 1 also shows the number of websites chosen in each of the sixteen website categories
over three different methods: 1) the original approach that maximizes overall reach, 2) our
extension to maximize reach among targeted consumer demographics, and 3) our extension
to maximize effective reach with target frequency of ad exposures. This table also provides
the number of websites chosen in each category when we only account for the top 25 and
top 50 most visited sites as benchmarks to our approach. More details about our result
comparisons are provided below.7
3.3.1 McRib Campaign: Maximizing Overall Reach
In this subsection, we assume that McDonald’s simply attempts to reach as many users
as possible during its McRib campaign. Again, because Danaher et al.'s (2010) method
cannot optimize over 500 websites, we use the following benchmark methods in our model
comparisons: equal allocation over all 500 websites, and cost-adjusted equal allocation across
the top 10, 25, and 50 most visited websites.8
6In practice firms could readily apply actual CPMs of all sites in such an optimization.
7Detailed budget allocation results for each budget and each website are available from the authors upon request.
8Note that, while included in Figure 4, the 10-website benchmark method is omitted from Table 1 for space considerations.
                                ----- Budget = $500K -----   --- Budget = $2 million ---   Benchmark
                 Total                Targeted  Targeted           Targeted  Targeted      Top  Top
Category         Number  CPM    Orig. Consumers Exposures    Orig. Consumers Exposures     25   50
Community          23    2.10     8      8        11          14     14        20           1    4
E-mail              7    0.94     7      7         7           7      7         7           3    5
Entertainment      92    4.75     2      1        10          13     10        29           0    0
Fileshare          28    1.08    23     20        26          24     22        28           2    7
Gaming             77    2.68    30     40        44          37     45        59           0    1
General News       12    6.14     0      0         0           0      0         0           0    0
Information        47    2.52    24     25        29          27     27        36           1    3
Newspaper          27    6.99     0      0         0           0      0         0           0    0
Online Shop        29    2.52    11     12        15          15     15        26           1    1
Photos              9    1.08     6      6         9           8      9         9           0    2
Portal             30    2.60    13     14        17          16     16        26           5    7
Retail             57    2.52    33     39        39          36     41        49           2    7
Service            18    2.52    13     14        10          14     14        12           2    2
Social Network     17    0.56    16     17        17          17     17        17           8   11
Sports             17    6.29     0      0         1           1      0         1           0    0
Travel             10    2.52     6      7         8           8      8         8           0    0
Table 1: Website Categories Chosen by Method, McRib ("Orig." denotes the original reach-maximizing optimization)
Table 1 reports the categorical makeup of chosen sites under two budgets ($500K and $2
million). This categorical makeup shows how many websites in each category were chosen
with non-zero budget allocation in the solutions of the optimization. It is not surprising
that the optimization does not select many websites in relatively expensive categories such
as Sports, Newspaper, and General News. Advertising at a relatively expensive website is
only desirable when that website can reach an otherwise unreachable audience. In this case,
other websites offer reach without the high price. Social Networking, for example, offers a
relatively inexpensive way to reach consumers who are visiting other websites as well. Note
that in Table A3 in Appendix D, social networking sites have relatively high correlations
in viewership across other site categories with the only exception being email and gaming
sites. Consequently, the optimization ultimately includes all 17 Social Networking websites
and leaves out the expensive categories where reach would be duplicated.
Priceline.com, Travelocity.com, and TripAdvisor.com). We require our optimization to place
at least 2.5 percent of the budget at each of these eight sites.
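Operationally, mandatory media coverage of this kind amounts to lower-bound constraints on the budget allocation. One simple way to impose them, sketched below (not the paper's actual optimizer), is to reserve the mandated minimums up front and let the optimizer distribute only the remainder:

```python
import numpy as np

def reserve_minimums(budget, n_sites, mandatory, min_frac):
    """Reserve min_frac of the budget for each mandatory site and
    return (fixed allocation, remaining budget to optimize freely)."""
    w = np.zeros(n_sites)
    w[mandatory] = min_frac * budget
    remaining = budget - w.sum()
    assert remaining >= 0, "minimums exceed the budget"
    return w, remaining

# Eight aggregate travel sites, each guaranteed 2.5% of a $1M budget
mandatory = list(range(8))
w_fixed, leftover = reserve_minimums(1_000_000, 500, mandatory, 0.025)
print(w_fixed[:8])  # each mandatory site gets $25,000
print(leftover)     # $800,000 left for the unconstrained optimizer
```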
We follow the same procedure as in the previous case study to obtain the 500 most
visited websites in January 2011 that supported online display advertisements. These 500
websites are also divided into sixteen categories and assigned an average CPM based on
their category. 48,628 users visited at least one of these 500 websites during January 2011,
meaning our Z matrix is 48,628 by 500. We again divide this data into a 10% subset (4,863
users) of calibration data and use the remaining 90% as holdout data.9
Figure 5 demonstrates our reach curves under this extension. We refer to the optimization
with mandatory media coverage of aggregate travel sites as constrained optimization (in dashed blue), and the standard optimization approach as unconstrained (in solid black). We also
include a naive method, allocating the entire budget evenly to the eight aggregate sites (in
dotted green).
The curves on the left show the calculation of reach using the entire data set, i.e. the
full 90% holdout data. As we expect, the unconstrained curve performs slightly better
9We omit the website category makeup description of this application due to its similarity to Table 1 and page limits. It is available from the authors upon request.
[Figure 5 shows two panels plotting reach against budget ($0-$0.6M): the left panel reports reach based on the overall data, the right panel reach based on the travel users subset. Legend: Unconstrained, Constrained, Equal.]
Figure 5: Reach with Mandatory Coverage in Aggregate Travel Sites
than the constrained curve, since we cannot do better in overall reach by constraining our
optimization. In addition, the naive approach performs poorly. Because the aggregate travel
websites do not reach a majority of the users of the data set, allocating budget only to these
eight websites will naturally limit the ad’s exposure to all Internet users.
The curves on the right show the reach for the subset of users who visited at least one of
the eight aggregate travel websites in January 2011 (there are 6,431 such individuals in our
data set). Presumably these consumers are more likely to be interested in searching for travel
deals compared to the others. In this case, the constrained curve significantly outperforms
the unconstrained curve. By constraining the optimization to allocate a percentage of the
budget to each aggregate travel website, we reach far more of the users who actually visit
these sites, which is the group NCL would like to target. In this case, NCL can meet its
aggregate travel site requirements without sacrificing much overall reach, meaning that most
users will still view the ad in general, but we are also confident that we have reached the
subset of people most likely to book a cruise.
In the right panel of Figure 5, the naive approach of equal allocation across the eight
travel aggregate sites performs slightly better than the proposed method when the reach is
calculated based on the subset of aggregate travel site users (i.e., constrained reach). But
this result is expected. NCL is most likely to reach users on the aggregate sites by putting
as much budget as possible into those eight sites. As we see from the overall reach curves on
the left, that method will not capture users on other websites who might also be attracted
to NCL’s Wave Season campaign but did not visit one of the eight aggregate travel websites.
Depending on whether the firm wants to reach a broader audience or a targeted audience,
either the constrained or the unconstrained optimization could be employed in such online
ad campaigns.
4. Conclusion and Future Work
In the current advertising climate, firms need an online presence more than ever. Never-
theless, the ever-increasing number of websites presents not only endless opportunities but
also tremendous challenges for firms' online display ad campaigns. While online advertising opportunities are limited only by the sheer number of websites, optimal Internet media selection among thousands of websites remains a prohibitively challenging task.
While existing methods can only solve Internet budget optimization for moderately-sized
problems (e.g. 10 websites), we propose a method that allows firms to efficiently allocate
budget across a large number of websites (e.g. 5000). We demonstrate the applicability
and scalability of our algorithm in real-world settings using the comScore data on Inter-
net usage. We also illustrate that the proposed method extends easily to accommodate
common practical Internet advertising considerations, including targeted consumer demo-
graphics, mandatory media coverage to matched content websites, and target frequency of
ad exposures. Furthermore, the low computational cost means that the proposed method
can rapidly examine a range of possible budgets. As a result, firms can easily examine the
correspondence between budget and reach, providing them with the ability to spend only as
much money as required to achieve a desired level of reach.
Consequently, the proposed method provides firms with great flexibility and adaptability in their online display advertising campaigns. Firms can retain control of their own Internet display ad campaigns, reducing their need to turn to ad agencies or large advertising exchanges that offer them little to no oversight of their own campaigns.
Our research also offers some promising avenues for further research. For example, while
the proposed method emphasizes maximizing the reach of online display ads, firms could
readily modify our approach and use Internet browsing-tracking data to maximize click-
through and/or downstream purchases of their Internet display ad campaigns. Additionally,
in the current paper, we consider the perspective of an individual firm that wishes to maxi-
mize reach for its particular campaign. This method could be further extended for use by an
advertising broker who wishes to maximize reach over a set of clients. Advertising brokers
must provide clients with the best possible campaigns but also use as much of their existing
ad space inventory as possible. Thus an interesting extension of our method would be to
maximize over multiple campaigns from the perspective of an advertising agency. We will
leave such endeavors for future work.
References
360i (2008). Point of view on gaming. Technical report.
Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much
larger than n. The Annals of Statistics, 35(6):2313–2351.
Chapman, M. (2009). Digital advertising’s surprising economics. Adweek, 50(10):8.
Chen, Y., Pavlov, D., and Canny, J. (2009). Large-scale behavioral targeting. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Cho, C. and Cheon, H. (2004). Why do people avoid advertising on the internet? Journal
of Advertising, 33(4):89–97.
Danaher, P. (2007). Modeling page views across multiple websites with an application to
internet reach and frequency prediction. Marketing Science, 26(3):422–437.
Danaher, P., Lee, J., and Kerbache, L. (2010). Optimal internet media selection. Marketing Science, 29(2):336–347.
Drewnowski, A. and Darmon, N. (2005). Food choices and diet costs: an economic analysis.
The Journal of Nutrition, 135(4):900–904.
Efron, B., Hastie, T., Johnstone, I., and Tibshirani, R. (2004). Least angle regression (with discussion). The Annals of Statistics, 32(2):407–451.
eMarketer (2012). Digital ad spending tops 37 billion. URL:
Table A1: True and Estimated Mean α, r Values, Simulated Data
Table A2 shows a comparison between the estimated and full α and r for the seven-
website data from comScore (Section 3.1.2), where the full values are values based on the
entire December 2011 comScore data set, and the estimated values are the mean values across
100 runs on random 10% subsets. The table also shows the mean squared error between the
full and estimated values over the 100 runs, as well as the mean absolute deviation. Again,
the estimated values based on the subset data highly resemble the values obtained from the
full data.
10Note the simulation used in Section 3.1 is done with 5,000 synthetic respondents due to the computational complexity involved in estimating Danaher et al.'s method for 50,000 synthetic respondents.