Submitted to Manufacturing & Service Operations Management manuscript (Please, provide the manuscript number!)

Authors are encouraged to submit new papers to INFORMS journals by means of a style file template, which includes the journal title. However, use of a template does not certify that the paper has been accepted for publication in the named journal. INFORMS journal templates are for the exclusive purpose of submitting to an INFORMS journal and should not be used to distribute the papers in print or online or to submit the papers to another publication.

Learning Demand Curves in B2B Pricing: A New Framework and Case Study

Huashuai Qu
Department of Mathematics, University of Maryland, College Park, MD 20742

Ilya O. Ryzhov, Michael C. Fu
Robert H. Smith School of Business, University of Maryland, College Park, MD 20742

[email protected], [email protected]

Eric Bergerson, Megan Kurka
Vendavo, Inc., Mountain View, CA 94043

In business-to-business (B2B) pricing, a seller seeks to maximize revenue obtained from high-volume transactions involving a wide variety of buyers, products, and other characteristics. Buyer response is highly uncertain, and the seller only observes whether buyers accept or reject the offered prices. These deals are also subject to high opportunity cost, since revenue is zero if the price is rejected. The seller must adapt to this uncertain environment and learn quickly from new deals as they take place. We propose a new framework for statistical and optimal learning in this problem, based on approximate Bayesian inference, that has the ability to measure and update the seller's uncertainty about the demand curve based on new deals. In a case study, based on historical data, we show that our approach offers significant practical benefits.

Key words: optimal learning; B2B pricing; price optimization; Bayesian learning; approximate Bayesian inference

History:

1. Introduction

We study the problem of optimally pricing high-volume commercial transactions between businesses, referred to as business-to-business or B2B pricing. For example, consider a negotiation between a supplier of raw materials (the seller) and a manufacturer (the buyer), which ends in a final price offer named by the seller. If the price is rejected, the seller incurs a high opportunity cost (lost revenue); however, it may not be clear whether a lower offer would have gotten the deal, and if so, how much lower it should have been. If the price is accepted, the seller is left wondering whether a higher price would have also worked. The seller makes many such pricing decisions over time, and attempts to maximize revenue, subject to considerable uncertainty about buyer behaviour and willingness to pay.

We consider a case application based on historical data provided by Vendavo, Inc., a firm specializing in B2B pricing science. The data include information on tens of thousands of B2B transactions (most of them unsuccessful) involving a single seller and a large number of buyers. This seller's price optimization problem involves the following challenges:

• Big data. The data are highly heterogeneous, covering thousands of distinct products and buyers. Different product types have different price sensitivities. Consequently, the data contain a large number of "rows" (observed deals) as well as "columns" (explanatory variables). Predictive models may thus be vulnerable to noise accumulation, spurious correlations, and computational issues (Fan et al. 2014).

• Noise. We are only able to observe a binary (yes/no) response from the buyer, representing whether the seller's price was accepted or rejected. The proportion of accepted offers ("wins") is very low. Furthermore, many of the products and buyers may appear infrequently and have few or no wins. Even with a large amount of data, predictive models are likely to be inaccurate.

• High cost of failure. If a price is rejected, the seller's revenue is zero. In B2B transactions, the total value of the deal may be in the millions of dollars. If the historical data are insufficient to make accurate predictions about future deals, the seller must learn quickly from new deals as they take place. It may not be enough to use a pricing strategy that works well in the long run, as the practical value is in the very short term.

We seek to address these challenges using predictive and prescriptive analytics, leading to new developments in both statistical modeling and price optimization. In addition to short-term performance, computational efficiency is also an issue. Ideally, price optimization should be implementable in real time and on demand, so that a sales representative may access it during a negotiation through a tablet app.

Many models in revenue management allow stochastic product demand (Bitran and Caldentey 2003), but in our case, the seller faces the additional challenge of environmental uncertainty: we do not know the exact distribution of the buyer's willingness to pay. Rather, this distribution is estimated from historical data, assuming some statistical model (e.g., logistic regression, as in Hormby et al. 2010), and this model is updated over time as new transactions take place. In this way, any given deal provides new information about the demand distribution, aside from its purely economic value in generating revenue. Furthermore, since any given statistical model is likely to be inaccurate, we may not wish to implement the price that seems to be optimal under that model. Instead, we may experiment with prices (for instance, charging slightly more or less than the recommended price) in order to obtain new information and potentially discover better pricing strategies. Doing this may result in lost revenue at first, but the new information may help to improve pricing decisions in the (hopefully near) future.

We approach the problem from the perspective of optimal learning (Powell and Ryzhov 2012), which typically uses Bayesian models to measure the uncertainty or the potential for error in the predictive model. In our case, we use logistic regression with the coefficients modeled as a random vector (because their "true" values are unknown). The practical power of these models comes from the concept of "correlated beliefs" (Negoescu et al. 2011, Qu et al. 2015), which measures the similarities and differences between various types of deals, so that a sale involving one product will teach us something about other, similar products. The Bayesian model can then be integrated with a pricing strategy that accounts for the uncertainty in the model, e.g., by correcting overly aggressive prices when the uncertainty is high, or by experimenting with higher prices when there is a chance that they may be better than we think. The outcomes of our decisions feed back into the model and modify our beliefs for future decisions. This framework can provide meaningful guidance within very short time horizons, even in the presence of very noisy data.


Optimal learning methods typically use simple Bayesian models that can be updated very quickly. In linear regression (ordinary least squares), the standard approach is to assume that the regression coefficients are normally distributed, which enables us to concisely model and update correlated beliefs. However, there is no analogous model for logistic regression, making it difficult to represent beliefs about logistic demand curves. We approach this problem using approximate Bayesian inference (Ryzhov 2015), and create a new learning mechanism that allows us to maintain and update a multivariate normal belief on the regression coefficients using rigorous statistical approximations. We then develop a "Bayes-greedy" pricing strategy that optimizes an estimate of expected revenue by averaging over all possible revenue curves.

We find that the Bayesian framework performs very well in both predictive and prescriptive roles. Surprisingly, despite the approximations used in the Bayesian model, its predictive power is quite competitive with exact logistic regression. Our first insight is that uncertainty is valuable: the benefits of quantifying our uncertainty about the predictive model easily compensate for any reduction in accuracy incurred by using approximations. Our second insight is that uncertainty is more valuable for optimization than for prediction: while the exact and Bayesian models have similar predictive power, the price recommendation obtained from the model is greatly improved by the inclusion of uncertainty in the pricing decision.

Thus, our paper makes the following contributions: 1) We introduce a new approximate Bayesian learning model for learning B2B demand curves based on logistic regression.¹ Our approach optimizes a statistical measure of distance (Kullback-Leibler divergence) between the multivariate normal approximation and the exact, non-normal posterior distribution. 2) We show how the seller's beliefs can be efficiently updated in this model, using stochastic gradient methods to calculate the optimal statistical approximation. 3) We propose the Bayes-greedy pricing policy and show how these prices can be efficiently computed. 4) We demonstrate the practical value of these methods on the Vendavo dataset. Part of our case study is purely data-driven, while another part uses simulation models calibrated using the data.

¹ We note that our model could potentially be used in any application of logistic regression, not only pricing. We believe that its performance in this application illustrates the practical potential of approximate Bayesian methods in other problems involving decision-making under uncertainty.

2. Literature Review

B2B pricing is a multifaceted problem and has been approached from multiple angles in the literature. Below, we survey other perspectives and contrast them with the present paper.

Practical implementations of dynamic price optimization often use statistical models such as logistic regression (Agrawal and Ferguson 2007, Hormby et al. 2010). A significant challenge in practice is to find a segmentation of the customers (Bodea and Ferguson 2014) that will lead to successful targeted pricing (Cross et al. 2011). The products can also be segmented based on common characteristics that influence their value (Gale and Swire 2012). In our work, the segmentation is assumed to be (mostly) given; in our case study, we use statistical model selection methods (Hastie et al. 2001) to identify the most important segments from among a large number pre-specified by the seller. Our learning framework assumes a fixed set of regression features. The question we ask is how the effects of these segments can be learned efficiently, in a way that leads to improved price optimization within a short timespan.

The literature has devoted considerable attention to behavioural issues affecting B2B negotiations. For instance, one important issue in practice (Elmaghraby et al. 2015) is that, even if the seller uses price optimization tools, the salespeople conducting the negotiations may choose not to implement the recommended prices, instead viewing them as references or targets (Bruno et al. 2012). Elmaghraby et al. (2012) studies the effects of such recommendations on changes in salespeople's ultimate price quotes. Zhang et al. (2014) proposes a model that explicitly captures the latent "trust" state of the buyer and infers this state from a sequence of interactions between the buyer and seller. Other work has considered strategic behaviour on the buyers' part (Leng and Parlar 2005, Elmaghraby et al. 2008), and its effect on the efficacy of pricing strategies such as trade-in rebates (Agrawal et al. 2015). Our approach may be able to partially accommodate some of these dimensions (for example, when considering a new deal with a given buyer, the regression features could potentially include information about our history with that buyer, such as previous price quotes and responses), but we do not explicitly include an economic or game-theoretic model. Rather, our core focus is on prediction and optimization based on historical data.

Another relevant stream of literature deals with the "learning and earning" problem (Harrison et al. 2012), which also considers pricing under environmental uncertainty, where learning plays a major role. A common approach there is to develop pricing strategies that collect revenue at the best possible asymptotic rate. In many cases (Besbes and Zeevi 2009, Broder and Rusmevichientong 2012, Keskin and Zeevi 2014, den Boer and Zwart 2015), such strategies are "semi-myopic," meaning that they alternate a period of randomized exploration with a period of purely myopic decision-making, with the exploration periods spaced increasingly further apart. More recent work (Besbes and Saure 2014, Keskin and Zeevi 2015) has extended these ideas to problems with non-stationary demand curves. While our B2B setting also involves learning, it differs from this work in two fundamental ways: 1) The rate-optimality of semi-myopic methods is asymptotic, meaning that the benefits are realized over a long time horizon. This is highly relevant in B2C applications, e.g., in e-commerce, where millions of transactions are observed, but the opportunity cost of each individual transaction may be fairly low. However, B2B strategies require good short-term performance, which is ultimately evaluated empirically. 2) Much of the existing work on learning in pricing assumes that customers are homogeneous (i.i.d.), and/or that only a single product is being sold. Pricing strategies may strongly rely on these assumptions; for example, the Gittins index approach of Xia and Dube (2007) or Chhabra and Das (2011) cannot be easily extended to heterogeneous data. By contrast, customer and product heterogeneity is built into our approach.

Our methodology in this paper has roots in the simulation literature (Chau et al. 2014), where Bayesian models are often used to estimate the performance of a simulation system (Chick 2006, Chen et al. 2015). We also use stochastic gradient methods from this literature (Fu 2015) to solve the technical problem of optimizing difficult expectations (such as the Bayesian expected revenue). Approximate Bayesian inference, which we use to develop our statistical model, is a promising method for designing learning mechanisms when standard models are not usable (Qu et al. 2015), and has previously demonstrated practical benefits in applications such as market-making in financial exchanges (Das and Magdon-Ismail 2009, Brahma et al. 2012). While Bayesian learning has previously been studied in the context of dynamic pricing (Araman and Caldentey 2009, Farias and Van Roy 2010), our approximate scheme provides the modeling flexibility needed to accommodate detailed segmentation, which plays a major role in the practice of B2B pricing.

3. Demand Model

Section 3.1 gives the basic definitions and notation for logistic demand curves. In Section 3.2, we explain how our uncertainty about such a curve can be represented by a Bayesian prior. Section 3.3 describes our approach to updating the prior after a single new deal is observed. Finally, Section 3.4 gives the technical details of how this update is implemented.

3.1. Modeling the demand curve

Consider a generic deal in which the seller quotes a price p, and the buyer makes a binary response denoted by Y. The event that Y = 1 represents a sale (or "win"), whereas Y = 0 is a "loss," meaning that the deal did not go through. We express the win probability P(Y = 1) as a function

ρ(x, β) = 1 / (1 + e^(−β⊤x)),   (1)

where x ∈ R^M is a vector that depends on p, as well as on additional characteristics of the product or the buyer, which are known to the seller at the time p is chosen. The function ρ, which is not known exactly to the seller, is also called the demand curve (Cope 2007). The seller's expected revenue from the deal is given by

R(p; x, β) = p · ρ(x, β),   p ≥ 0,

with p* = arg max_p R(p; x, β) denoting the optimal price. For simplicity, we work with the revenue function throughout this paper. However, it is straightforward to modify the analysis to maximize profit rather than revenue.
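As a concrete illustration (a minimal sketch with hypothetical coefficient values, not estimates from the paper's data), the demand curve (1) and the revenue function can be evaluated numerically in the simplest case x = [1, p]⊤, and p* found by a grid search:

```python
import numpy as np

def win_prob(x, beta):
    """Logistic demand curve (1): P(Y = 1) = 1 / (1 + exp(-beta' x))."""
    return 1.0 / (1.0 + np.exp(-beta @ x))

def expected_revenue(p, beta):
    """R(p; x, beta) = p * rho(x, beta) with the simplest feature map x = [1, p]."""
    return p * win_prob(np.array([1.0, p]), beta)

# Hypothetical coefficients: a positive intercept and a negative price
# coefficient, so the win probability falls as the price rises.
beta = np.array([5.0, -0.1])

# Grid search for p* = argmax_p R(p; x, beta) over a plausible price range.
prices = np.linspace(0.0, 200.0, 2001)
revenues = np.array([expected_revenue(p, beta) for p in prices])
p_star = prices[np.argmax(revenues)]
```

With these illustrative coefficients, the win probability at p = 50 is exactly 1/2; the optimizer balances a higher price against a lower chance of winning the deal.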


Equation (1) is an instance of logistic regression, a standard model for forecasting demand or sales (Ch. 9, Talluri and Van Ryzin 2006). In the simplest possible case, we can let x = [1, p]⊤, which implies that the buyers are homogeneous (given a fixed price, their valuations are drawn from a single common distribution). However, in practice, x also contains information such as the type and quantity of product stipulated in the deal. We may need to use a large number of dummy variables to describe the product. For example, a large retailer may wish to include features that classify products by department (e.g., electronics, furniture, housewares), then generally describe the item in question (e.g., TVs, cameras, tablets), and finally give more detailed information such as the brand and model of the item. Additionally, x could describe the buyer with varying degrees of granularity (e.g., whether the buyer is located in Europe or Asia, followed by more detailed country information), since B2B pricing is highly individualized in practice (Elmaghraby et al. 2015). We could also include interaction terms between product and customer features (e.g., if a particular product type sells better in a particular region), as well as interactions between these features and the price (to model the case where different products have different price sensitivities). Since the outcome of B2B negotiations heavily depends on the individual salesperson, x may also include characteristics of the sales force. In a practical application, x may include hundreds or thousands of elements.

However, in all of these cases, the regression coefficients β are unknown to the seller, and must be inferred based on prior knowledge as well as new information obtained by observing new wins and losses. The margin for error in estimating β is quite narrow. First, the opportunity cost for lost deals is extremely high (the seller receives zero revenue if the deal fails). Second, the demand curve can be highly sensitive to the values of β, meaning that small estimation errors can lead to large differences in the recommended prices. We now describe a Bayesian framework for optimal learning on the basis of a single new observation (the goal being to implement this framework sequentially).


3.2. Bayesian model for learning demand curves

In the Bayesian view, any unknown quantity is modeled as a random variable whose distribution represents our beliefs about likely values for that quantity. We use a multivariate normal prior distribution, that is,

β ∼ N(θ, Σ).   (2)

The main benefit of the multivariate normal distribution is that it allows us to compactly represent correlated beliefs using the covariance matrix Σ. The off-diagonal entries in this matrix can be viewed as representing the degree of similarity or difference between the values of different regression coefficients. Correlations have great practical impact when the design matrix is sparse, that is, many of the components of x are equal to zero for any given observation. This is likely to be the case in our application: the seller may include hundreds of distinct products into the model, and only a few observations may be available for a given product even if the overall dataset is large. However, if we believe that two products are similar, correlated beliefs will allow us to learn about one product from a deal that involves the other one. This greatly increases the information value of a single deal, and allows us to learn about a large number of products from a small number of observations. Furthermore, normality assumptions will substantially simplify the computation of optimal prices in Section 4.
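To illustrate what the prior (2) represents (a sketch with hypothetical values of θ and Σ, not fitted to any data), each sample of β drawn from N(θ, Σ) induces an entire demand curve, and the spread of win probabilities at a fixed price quantifies the seller's uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prior beliefs about beta = (intercept, price coefficient).
# The off-diagonal entry of Sigma encodes correlated beliefs: learning
# about one coefficient also tells us something about the other.
theta = np.array([5.0, -0.1])
Sigma = np.array([[1.0,  -0.02],
                  [-0.02, 0.001]])

def win_prob(x, beta):
    return 1.0 / (1.0 + np.exp(-beta @ x))

# Each sample of beta is one plausible demand curve; evaluate all of them
# at the feature vector x = [1, p] for a price of p = 40.
betas = rng.multivariate_normal(theta, Sigma, size=5000)
x = np.array([1.0, 40.0])
probs = np.array([win_prob(x, b) for b in betas])

# The mean is a point prediction; the standard deviation measures how
# uncertain the seller is about the win probability at this price.
print(probs.mean(), probs.std())
```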

However, we first require a mechanism for efficiently updating the covariance matrix after new observations. We use Bayes' rule to derive the conditional density of β given Y, the associated features x, and the modeling assumption in (2). This posterior density represents our new beliefs about the regression coefficients after an additional observation has been made. We first rewrite the likelihood function of Y more compactly as ℓ(H(β; Y)), where ℓ(z) = 1/(1 + e^(−z)) and H(β; Y) = (2Y − 1)β⊤x. Then, the posterior density of β can be written as

P(β | x, Y) ∝ ℓ(H(β; Y)) |Σ|^(−1/2) e^(−(1/2)(β−θ)⊤Σ^(−1)(β−θ)).   (3)

In multi-stage problems where decisions are made sequentially, it is desirable to use a conjugate model (DeGroot 1970) where the prior and posterior distributions belong to the same family (e.g., multivariate normal). Such models admit computationally efficient learning schemes where the entire belief distribution is compactly characterized by a finite number of parameters, and these parameters can be updated recursively after each new observation. However, (3) is non-normal due to the presence of ℓ.

We would like to retain the multivariate normal distribution in order to use the power of correlated beliefs. Since this is not possible using standard Bayesian updating, we use the methods of approximate Bayesian inference (Ryzhov 2015). Essentially, if the posterior distribution is not conjugate with the prior, we replace it by a simpler distribution that does belong to our chosen family (multivariate normal), and optimally approximates the true, non-normal posterior. We use a variational Bayesian approach, where the parameters (θ′, Σ′) of the desired normal density Q are chosen to minimize the Kullback-Leibler (KL) divergence between Q and the true posterior P(· | x, Y). This quantity is defined as

D_KL(Q ‖ P) = E_Q[ log( Q(β; θ′, Σ′) / P(β; x, Y, θ, Σ) ) ],   (4)

where E_Q is the expectation with respect to Q. The KL divergence, which is always non-negative, measures the "distance" between two probability distributions. Lower KL divergence suggests that there is more similarity between P and Q (zero KL divergence occurs if and only if P and Q are identical). We wish to find

(θ*, Σ*) = arg min_(θ′,Σ′) D_KL(Q ‖ P),

the parameter values for which the multivariate normal distribution Q optimally approximates the non-normal distribution P.

3.3. Approximate Bayesian inference for logistic regression

We first observe that the definition in (4) can be partially simplified, due to the following result.

Proposition 1. Given x, Y, and the modeling assumption in (2), the KL divergence can be written as

D_KL(Q ‖ P) = E_Q[ log(1 + e^(−H(β;Y))) ] + h(θ, Σ, θ′, Σ′),   (5)

with the second component given in closed form as

h(θ, Σ, θ′, Σ′) = (1/2) [ tr(Σ^(−1)Σ′) + (θ − θ′)⊤Σ^(−1)(θ − θ′) − M − log(|Σ′|/|Σ|) + C ],   (6)

where C is a constant that does not depend on θ′, Σ′.

Proof: From (3), we have

log[ Q(β; θ′, Σ′) / P(β; x, Y, θ, Σ) ] = log[ |Σ′|^(−1/2) e^(−(1/2)(β−θ′)⊤(Σ′)^(−1)(β−θ′)) / ( ℓ(H(β; Y)) |Σ|^(−1/2) e^(−(1/2)(β−θ)⊤Σ^(−1)(β−θ)) ) ] + C.

Taking expectations yields

D_KL(Q ‖ P) = E_Q[ log(1 + e^(−H(β;Y))) ] + D_KL(Q ‖ P0),

where P0 is the prior distribution N(θ, Σ). The KL divergence between two multivariate normal distributions is given in (6). Q.E.D.
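The closed form in (6) is, up to the constant C, the standard KL divergence between two multivariate normal distributions. As a quick numerical sanity check (with hypothetical parameter values), it can be coded directly and verified to be zero exactly when the two distributions coincide:

```python
import numpy as np

def gaussian_kl(theta_q, Sigma_q, theta_p, Sigma_p):
    """KL( N(theta_q, Sigma_q) || N(theta_p, Sigma_p) ), matching (6) with C = 0."""
    M = len(theta_q)
    Sp_inv = np.linalg.inv(Sigma_p)
    diff = theta_p - theta_q
    return 0.5 * (np.trace(Sp_inv @ Sigma_q)
                  + diff @ Sp_inv @ diff
                  - M
                  - np.log(np.linalg.det(Sigma_q) / np.linalg.det(Sigma_p)))

theta = np.array([5.0, -0.1])
Sigma = np.array([[1.0, 0.0],
                  [0.0, 0.01]])

# Zero iff Q and P are identical; strictly positive otherwise.
print(gaussian_kl(theta, Sigma, theta, Sigma))        # identical distributions
print(gaussian_kl(theta + 1.0, Sigma, theta, Sigma))  # shifted mean
```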

Unfortunately, even with this simplification, the expectation in (5) cannot be expressed in closed form. Note, however, that the function inside the expectation is known, and the expectation is taken with respect to a known distribution. To optimize the expected value, we can use gradient-based stochastic search (Kim 2006). In gradient-based optimization, we would first calculate

∇D_KL(Q ‖ P) = ∇E_Q[ log(1 + e^(−H(β;Y))) ] + ∇h(θ, Σ, θ′, Σ′),   (7)

where ∇ is the gradient with respect to (θ′, Σ′), and apply a steepest descent algorithm to find (θ*, Σ*) to a desired precision. Since the expectation in (7) is intractable, its gradient also cannot be written explicitly, but it can be estimated from Monte Carlo simulation. Blei et al. (2012) proposes to use the likelihood ratio method (Sec. 15.2 in Spall 2005) for estimating the gradient of the KL divergence in Bayesian logistic regression. However, this and other gradient-based methods often converge slowly to the optimal solution when the dimensionality of the problem is high. In our case, we are estimating M² + M parameters, where M is on the order of hundreds or thousands.
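The intractable expectation in (5) and (7) is nonetheless straightforward to estimate by Monte Carlo simulation, since the integrand is known and Q is a known multivariate normal. A minimal sketch, with hypothetical values for the candidate parameters and the observed deal:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical candidate posterior Q = N(theta_q, Sigma_q) and one observed
# deal with feature vector x (intercept and price) and outcome Y = 1 (a win).
theta_q = np.array([5.0, -0.1])
Sigma_q = np.array([[1.0, 0.0],
                    [0.0, 0.001]])
x = np.array([1.0, 40.0])
Y = 1

# Monte Carlo estimate of E_Q[ log(1 + exp(-H(beta; Y))) ],
# where H(beta; Y) = (2Y - 1) * beta' x.
betas = rng.multivariate_normal(theta_q, Sigma_q, size=20000)
H = (2 * Y - 1) * betas @ x
estimate = np.mean(np.log1p(np.exp(-H)))
```

This sample-average estimator is the basis for gradient-based search over (θ′, Σ′); the likelihood ratio method mentioned above differentiates the same expectation with respect to the parameters of Q.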

To mitigate these computational challenges, we propose the following form for (θ′, Σ′):

θ′ = Σ′( Σ^(−1)θ + (Y − 1/2)x ),   (8)
Σ′ = ( Σ^(−1) + v^(−1)xx⊤ )^(−1).   (9)


We apply the Sherman-Morrison-Woodbury formula (Golub and Van Loan 2012) to (8)-(9) and

obtain

θ′ = θ+v(Y − 1

2

)−x>θ

v+x>ΣxΣx, (10)

Σ′ = Σ− Σxx>Σ

v+x>Σx. (11)

This form substantially reduces the dimensionality of the optimization problem, as there is now

only a single parameter v to be determined. Aside from this computational convenience, we choose

this precise form for the posterior parameters because it resembles the updating equations used in

Bayesian linear regression. In a standard least-squares model y=x>β+ ε, normality assumptions

on β and the residual error ε induce normality of the posterior distribution of β given y and x.

Furthermore, the parameters of the posterior distribution can be computed recursively from the

prior parameters (Minka 2000) using an update that is very similar to (10)-(11). In our case, the

quantity v in (11) is exactly analogous to the variance of the residual error in linear regression,

while the quantity v(Y − 1/2) replaces the continuous observation y.

Intuitively, this model treats v(Y − 1/2) as an observation of the log-odds of success for the next deal. Subtracting 1/2 from Y ensures that this observation can be both positive and negative, so that

new wins cause us to increase the estimated win probability, while new losses shift the estimate

downward. This is in line with the standard interpretation of logistic regression that positive

coefficients lead to higher win probabilities. The parameter v can be thought of as a user-specified

measure of the accuracy of this observation (higher v means lower accuracy).
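As a concrete sketch (ours; variable and function names are illustrative), the rank-one update (10)-(11) takes only a few lines and never forms the matrix inverse in (9) explicitly:

```python
import numpy as np

def approx_posterior_update(theta, sigma, x, y, v):
    """Rank-one Bayesian update of (theta, sigma) after observing Y = y in {0, 1}.

    Implements the Sherman-Morrison form of eqs. (10)-(11): the full matrix
    inverse in (Sigma^-1 + v^-1 x x^T)^-1 is avoided, so the update costs O(M^2).
    """
    sigma_x = sigma @ x                          # Sigma x
    s = x @ sigma_x                              # x^T Sigma x (scalar)
    gain = (v * (y - 0.5) - x @ theta) / (v + s)
    theta_new = theta + gain * sigma_x                        # eq. (10)
    sigma_new = sigma - np.outer(sigma_x, sigma_x) / (v + s)  # eq. (11)
    return theta_new, sigma_new
```

The result agrees with the direct computation from (8)-(9), which is how the update can be unit-tested.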

In the literature, it has been fairly common to approach Bayesian logistic regression by forcing

it to resemble linear regression. The main issue is the choice of v, since there is no pre-specified

variance parameter in logistic regression. Spiegelhalter and Lauritzen (1990) proposed to use v = p(1 − p), where p is the predicted success probability for the feature vector x using θ as the regression coefficients.

recursive update was used for v. Under this rule, the posterior update in (10)-(11) was shown to

follow from a first-order Taylor series approximation of the non-normal density in (3).


We propose to calculate v by optimizing the KL divergence, that is, v* = arg min_v D_KL(Q ‖ P). Even with the simplified form of θ′, Σ′, the expectation in (5) is not expressible in closed form. Now, however, since we are solving a scalar optimization problem, gradient-based methods are an effective way to find v*.

3.4. Gradient-based optimization of the KL divergence

We estimate the gradient of the KL divergence using infinitesimal perturbation analysis (IPA; see

Fu 2008). If Q is the distribution of β, we can write

log(1 + e^{−H(β;Y)}) = log( 1 + e^{−(2Y−1)( x^⊤θ′ + √(x^⊤Σ′x) · Z )} ),

where Y ∈ {0,1} is fixed and Z ∼N (0,1). For a fixed sample path ω, we now write

∇_v log(1 + e^{−H(β(ω);Y)}) = −(2Y − 1) · e^{−H(β(ω);Y)} / (1 + e^{−H(β(ω);Y)}) · ∇_v( x^⊤θ′ + √(x^⊤Σ′x) · Z(ω) ),   (12)

where

∇_v( x^⊤θ′ + √(x^⊤Σ′x) · Z(ω) ) = ( (Y − 1/2) x^⊤Σx + x^⊤θ ) / (v + x^⊤Σx)² · x^⊤Σx + (x^⊤Σx)² / (v + x^⊤Σx)² · Z(ω).   (13)

The next result shows that the sample-path (IPA) derivative is an unbiased estimator of the derivative of the expectation in (5).

Proposition 2. ∇_v E_Q[ log(1 + e^{−H(β;Y)}) ] = E_Q[ ∇_v log(1 + e^{−H(β;Y)}) ].

Proof: We can directly verify the conditions given in Proposition 1 of L’Ecuyer (1995) for the

interchange between the gradient and the expectation. First, for any ω, the gradient in (12)-(13)

is continuous at all v≥ 0. Second, for any ω, the above gradient exists for all v≥ 0. Third, for any

fixed v ≥ 0, the above gradient exists for all ω. Finally, we observe that, for any v,

| ∇_v log(1 + e^{−H(β(ω);Y)}) | ≤ | x^⊤Σx + x^⊤θ | / | x^⊤Σx | + | Z(ω) |,

whence E_Q sup_v | ∇_v log(1 + e^{−H(β;Y)}) | < ∞. It is therefore valid to interchange the gradient and the expectation. Q.E.D.


The IPA estimator for fixed v can be constructed as follows. Given fixed θ, Σ, x, and Y, we calculate θ′ and Σ′ using (10) and (11). Then, we simulate Z ∼ N(0,1) and calculate x^⊤β = x^⊤θ′ + √(x^⊤Σ′x) · Z. The stochastic component of the estimator of ∇_v D_KL(Q ‖ P) is given by

G = −(2Y − 1) · e^{−H(β;Y)} / (1 + e^{−H(β;Y)}) · [ ( (Y − 1/2) x^⊤Σx + x^⊤θ ) / (v + x^⊤Σx)² · x^⊤Σx + (x^⊤Σx)² / (v + x^⊤Σx)² · Z ].

To obtain the deterministic component, we return to (5) and differentiate h. The terms in (6) can be rewritten as

tr(Σ^{−1}Σ′) = tr( I − x x^⊤Σ / (v + x^⊤Σx) ),

(θ − θ′)^⊤ Σ^{−1} (θ − θ′) = ( (v(Y − 1/2) − x^⊤θ) / (v + x^⊤Σx) )² x^⊤Σx,

log |Σ′| = log | (Σ^{−1} + v^{−1} x x^⊤)^{−1} |,

whence

∇_v tr(Σ^{−1}Σ′) = tr(x x^⊤Σ) / (v + x^⊤Σx)²,

∇_v (θ − θ′)^⊤ Σ^{−1} (θ − θ′) = 2 · (v(Y − 1/2) − x^⊤θ) / (v + x^⊤Σx) · ( (Y − 1/2) x^⊤Σx + x^⊤θ ) / (v + x^⊤Σx)² · x^⊤Σx,

and

∇_v log |Σ′| = −tr( ( ∇_v (Σ^{−1} + v^{−1} x x^⊤) ) Σ′ ) = (1/v²) tr( x x^⊤ ( Σ − Σ x x^⊤Σ / (v + x^⊤Σx) ) ) = (1/v) · 1/(v + x^⊤Σx) · tr(x x^⊤Σ).

The final form for the IPA estimator is given by

∇̂_v D_KL(Q ‖ P) = (v(Y − 1/2) − x^⊤θ) / (v + x^⊤Σx) · ( (Y − 1/2) x^⊤Σx + x^⊤θ ) / (v + x^⊤Σx)² · x^⊤Σx − ( x^⊤Σx / 2v ) · tr(x x^⊤Σ) / (v + x^⊤Σx)² + G,

and it follows from Proposition 2 that

∇_v D_KL(Q ‖ P) = E[ ∇̂_v D_KL(Q ‖ P) ].


We can now apply the Robbins-Monro stochastic approximation method (Kushner and Yin 2003)

v_{k+1} = v_k − α_k ∇̂_{v_k} D_KL(Q ‖ P),   (14)

which is guaranteed to converge to v* from an arbitrary starting point under suitable conditions on the stepsize α_k. The value obtained from this procedure can then be plugged into (10) and (11) to determine the parameters of the approximate posterior distribution.
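A minimal sketch of this procedure (ours, not the authors' code; all names are illustrative). Because Σ enters only through the scalars s = x^⊤Σx and b = x^⊤θ, the whole recursion can be written in scalar form. For verifiability the sketch differentiates the simulated objective exactly, including the chain-rule factor 1/(2√(x^⊤Σ′x)) on the variance derivative, and handles the deterministic term h(v) by a cheap central finite difference; it is a generic IPA implementation in the spirit of Section 3.4 rather than a transcription of the paper's estimator.

```python
import numpy as np

def kl_objective(v, s, b, y, Z):
    """Sample-average objective: mean of log(1 + e^{-H}) over draws Z, plus h(v).

    s = x^T Sigma x and b = x^T theta are the only statistics needed;
    h(v) is the closed-form Gaussian-KL term after the rank-one update.
    """
    a = (v * (y - 0.5) - b) / (v + s)            # coefficient of Sigma x in (10)
    mean = b + a * s                             # x^T theta'
    sd = np.sqrt(s * v / (v + s))                # sqrt(x^T Sigma' x)
    H = (2 * y - 1) * (mean + sd * Z)
    h = 0.5 * (a * a * s - s / (v + s) + np.log((v + s) / v))
    return np.mean(np.logaddexp(0.0, -H)) + h    # log(1 + e^{-H}), stably

def ipa_gradient(v, s, b, y, Z, eps=1e-6):
    """Sample-path (IPA) estimate of the derivative of kl_objective in v."""
    a = (v * (y - 0.5) - b) / (v + s)
    a_prime = ((y - 0.5) * s + b) / (v + s) ** 2
    sd = np.sqrt(s * v / (v + s))
    d_mean = a_prime * s
    d_sd = (s / (v + s)) ** 2 / (2.0 * sd)       # chain rule through the sqrt
    H = (2 * y - 1) * (b + a * s + sd * Z)
    G = np.mean(-(2 * y - 1) * (d_mean + d_sd * Z) / (1.0 + np.exp(H)))
    h = lambda w: 0.5 * (((w * (y - 0.5) - b) / (w + s)) ** 2 * s
                         - s / (w + s) + np.log((w + s) / w))
    return G + (h(v + eps) - h(v - eps)) / (2.0 * eps)

def optimize_v(s, b, y, v0=1.0, n_iter=500, seed=0):
    """Robbins-Monro recursion (14), projected to keep v positive."""
    rng = np.random.default_rng(seed)
    v = v0
    for k in range(1, n_iter + 1):
        v = max(v - ipa_gradient(v, s, b, y, rng.standard_normal(64)) / k, 1e-3)
    return v
```

With a fixed set of draws Z, the IPA average equals the exact derivative of the sample-average objective, so the estimator can be checked against a finite difference under common random numbers.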

4. Price optimization in the multi-stage problem

We now apply our approximate Bayesian framework to the multi-stage pricing problem. Suppose

that we have a sequence of deals, where xn, n= 0,1, ...,N , denotes the features of the (n+ 1)st deal

(including the quoted price pn), and Y n+1 is the buyer’s response. We use different time indices

to express the fact that the response is observed only after the features (and the price) have been

fixed. The seller’s initial beliefs are represented by a multivariate normal distribution with the prior

parameters (θ0,Σ0), which may be calibrated based on historical data (see Section 5).

Suppose now that, after the first n deals have been observed, the seller’s beliefs are represented

by a multivariate normal distribution with parameters (θn,Σn). The features xn of the next deal

become known to the seller, a price pn is quoted, and the response Y n+1 is observed. We now apply

approximate Bayesian inference and assume that the new posterior distribution of β, taking into

account the new information Y n+1, is normal. The parameters of this distribution are obtained

from the recursive update (10)-(11), with the variance parameter v computed using the procedure

in Section 3.4. We then proceed to the next deal under the assumption that the seller’s belief distri-

bution continues to be normal. In this way, approximate Bayesian inference is applied sequentially.

Every new iteration introduces an additional degree of approximation, but the learning mechanism

is computationally efficient, and we maintain the ability to model and update our uncertainty

about β. We now show how price optimization can be integrated into this framework.

4.1. Definition of Bayes-greedy prices

The seller’s pricing decisions are adaptive, so that pn may depend on the posterior parameters

(θn,Σn), as well as on the other features of xn. The seller’s decision is to choose a pricing policy,


which can be represented as a function π mapping (θn,Σn,xn) to a price pn ≥ 0. The optimal

policy maximizes the objective function

sup_π E^P [ Σ_{n=0}^{N} R(pn; xn, β) ],   (15)

where we take an additional expectation of the expected revenue since β is random and the price

pn is not known until n deals have been observed. The notation EP means that the expected value

is taken with respect to the probability measure P induced by the approximate Bayesian model.

It is clear that (15) is intractable even for small N, since our distribution of belief is characterized by M² + M continuous parameters, and furthermore, the sequence {xn}, n = 0, ..., N, of deals is not known to

the seller in advance. In fact, the seller has very little information about the process that generates

the features xn of each deal; modeling this process is substantially more difficult than modeling

uncertainty about the regression coefficients, and is outside the scope of this paper. However, since

the regression features xn become known just before we choose the price for that deal, it is possible

to design a myopic policy that seeks to maximize the revenue from the deal without looking ahead

to future deals. Myopic policies have also attracted attention in recent literature (e.g., Harrison

et al. 2012) because they can be shown to possess asymptotic optimality properties in some cases.

Since we primarily deal with short time horizons in our application, we focus on developing a

myopic policy that is computationally tractable and will perform well in practice.

Recall that, ideally, the seller would like to maximize the true revenue curve by choosing the

price

p*,n = arg max_{p≥0}  p / ( 1 + e^{−(xn)^⊤β} ),

where xn is a deterministic function of p. Since β is unknown, a standard definition for a myopic

policy is given by

pn = arg max_{p≥0}  p / ( 1 + e^{−(xn)^⊤θn} ),   (16)

where θn is the current vector of regression coefficients. This approach is used in frequentist models

(e.g., in Broder and Rusmevichientong 2012), where θn is computed using maximum likelihood


estimation (in other words, frequentist logistic regression). If xn depends linearly on the price, (16)

has a closed-form expression in terms of the Lambert W-function (Li and Huh 2011).
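To illustrate, consider the simplest case where the win probability is 1/(1 + e^{−(a+bp)}) with intercept a and price sensitivity b < 0; solving the first-order condition of (16) then gives p* = −(1 + W(e^{a−1}))/b, where W is the Lambert W-function (our derivation of the special case, offered as a sketch). Using SciPy:

```python
import numpy as np
from scipy.special import lambertw

def myopic_price(a, b):
    """Revenue-maximizing price when the win probability is sigmoid(a + b*p), b < 0.

    Solves d/dp [ p / (1 + exp(-(a + b*p))) ] = 0 in closed form via Lambert W.
    """
    assert b < 0, "price sensitivity must be negative"
    w = lambertw(np.exp(a - 1)).real   # principal branch; argument is positive
    return -(1.0 + w) / b
```

For a = 2, b = −1 the argument of W is e, W(e) = 1, and the price is exactly 2; the first-order condition can be verified numerically at the returned price.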

However, we argue that this approach may underperform here, because it does not use all of

the available information. The value of the Bayesian model is that it enables us to quantify the

decision-maker’s uncertainty about the regression coefficients, which should be leveraged when

making the pricing decision. We define the Bayes-greedy price

pn = arg max_{p≥0}  E^n_β [ p / ( 1 + e^{−(xn)^⊤β} ) ],   (17)

where the expectation is taken with respect to the distribution β ∼N (θn,Σn) obtained through

approximate Bayesian inference. Because the revenue function R is nonlinear, (16) and (17) yield

different prices even for the same values of xn and θn. The Bayes-greedy price takes uncertainty

into account by integrating over the entire space of possible revenue curves. The next result shows

that the Bayesian estimate of the revenue is quasi-concave, which implies that it has a single global

maximum at the Bayes-greedy price.

Proposition 3. Suppose that xn is linear in the price p. Then, the Bayes-greedy revenue function E^n_β R(p; xn, β) is quasi-concave in p when p ≥ 0.

Proof: It is straightforward to show that R (p;xn,β) is log-concave in p for fixed β. The expectation

is taken over a multivariate normal density, which is log-concave. The product of log-concave

functions is log-concave. From Brascamp and Lieb (1976), the integral of a log-concave function is also log-concave; since every log-concave function is quasi-concave, the result follows. Q.E.D.

4.2. Computation of Bayes-greedy prices

We now discuss the solution of the Bayes-greedy price optimization problem in (17). Since this

procedure only depends on n through the posterior parameters, we drop the time index in the

following for notational convenience. Under the Bayesian assumption β∼N (θ,Σ), we have

x^⊤β ∼ N( x^⊤θ, x^⊤Σx ).


Consequently, the revenue function can be rewritten as

R(p; x, β) = p / ( 1 + e^{−x^⊤θ − √(x^⊤Σx)·Z} ),

where Z ∼ N (0,1). The normality assumption considerably simplifies the computation of the

Bayes-greedy price, since (17) now requires us to optimize an expectation over a scalar probability

distribution. This expectation is known in statistics as the logistic-normal integral (Demidenko

2013), and cannot be expressed in closed form². However, we observe that IPA can again be used

to optimize it. Since the win probability ρ (x,β) is continuous, differentiable, and bounded in p, it

is straightforward to show (similarly to Proposition 2) that the relevant conditions for the validity

of the IPA estimator hold, whence

∇_p E_β R(p; x, β) = E_β [ ∇_p ( p / ( 1 + e^{−x^⊤β} ) ) ].

For a fixed sample path ω, we write

∇_p R(p; x, β(ω)) = 1 / ( 1 + e^{−x^⊤θ − √(x^⊤Σx)·Z(ω)} ) + p · e^{−x^⊤θ − √(x^⊤Σx)·Z(ω)} / ( 1 + e^{−x^⊤θ − √(x^⊤Σx)·Z(ω)} )² · ∇_p( x^⊤θ + √(x^⊤Σx)·Z(ω) ).   (18)

To make this expression more explicit, we need to specify the dependence of x on the price. Suppose

that this dependence is linear, that is, x can be partitioned as

x = [ xf, p · xp ]^⊤,

where xf is a vector of features whose values are known to the seller and not dependent on p, and

xp is another fixed vector of features related to the price sensitivity. Thus, each component of x

either depends linearly on p, or does not depend on p at all. In the simplest possible example, xf_i may be a dummy variable which equals 1 if the buyer is asking for a certain specific product. We may then have a different feature xp_j = xf_i for some j, so that our model includes the base effect of

² A closed-form approximation is available in Crooks (2009). However, we found that it was less reliable than IPA for price optimization.


the product on the win probability, as well as a specific price sensitivity for that product. We can

then partition

θ = [ θf, θp ]^⊤,   Σ = [ Σff  Σfp ; Σpf  Σpp ].

In this case,

∇_p( x^⊤θ + √(x^⊤Σx)·Z(ω) ) = (xp)^⊤θp + [ (xp)^⊤Σpf xf + p (xp)^⊤Σpp xp ] / √(x^⊤Σx) · Z(ω).   (19)

The IPA gradient ∇̂_p R(p; x, β) is obtained by generating Z ∼ N(0,1) and substituting this quantity for Z(ω) in (18) and (19). The optimal price is found by iterating

p_{k+1} = p_k + α_k ∇̂_{p_k} R(p_k; x, β).   (20)

Due to Proposition 3, this procedure converges to the Bayes-greedy price.
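To make the computation concrete, here is a sketch (ours, not the paper's implementation) that replaces the stochastic approximation recursion (20) with a sample-average approximation: a batch of draws Z is fixed once, and the resulting deterministic average revenue, which inherits the quasi-concavity of Proposition 3 in the limit, is maximized over a price grid. All function and variable names are illustrative.

```python
import numpy as np

def bayes_greedy_price(xf, xp, theta_f, theta_p, sigma, grid, n_draws=4000, seed=0):
    """Approximate arg max_p E[ p / (1 + e^{-x(p)^T beta}) ] with beta ~ N(theta, Sigma).

    x(p) = [xf, p * xp]; Sigma is partitioned conformably with (xf, xp).
    Uses a fixed set of N(0,1) draws (sample-average approximation) and a
    grid search instead of the gradient recursion (20).
    """
    Z = np.random.default_rng(seed).standard_normal(n_draws)
    theta = np.concatenate([theta_f, theta_p])
    best_p, best_rev = grid[0], -np.inf
    for p in grid:
        x = np.concatenate([xf, p * xp])
        mean = x @ theta                         # x^T theta
        sd = np.sqrt(max(x @ sigma @ x, 0.0))    # sqrt(x^T Sigma x)
        rev = np.mean(p / (1.0 + np.exp(-(mean + sd * Z))))
        if rev > best_rev:
            best_p, best_rev = p, rev
    return best_p
```

As a sanity check, when Σ is (numerically) zero the Bayes-greedy price collapses to the myopic price of (16); with a = 2 and b = −1 that price is 2.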

We can now summarize our entire framework for price optimization and statistical estimation.

Suppose that we have already observed outcomes from n deals and constructed the belief param-

eters (θn,Σn). For the (n+ 1)st deal, we are given the features xf,n, xp,n. We then carry out the

following steps:

1. Apply procedure (20) to find the Bayes-greedy price;

2. Implement the price pn that is returned by this procedure (i.e., quote the price to the buyer);

3. Observe the response Y n+1;

4. Apply procedure (14) to find the optimal variance parameter vn;

5. Calculate (θn+1,Σn+1) from (10)-(11).

This process is repeated for n= 0,1, ...,N .
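Putting the five steps above together, the sequential loop can be sketched as follows (ours, not the authors' code). For brevity this illustration fixes v heuristically at p(1 − p) in the spirit of Spiegelhalter and Lauritzen (1990) rather than running the inner optimization of Section 3.4, selects prices on a grid, and simulates buyer responses from a known β; all names are illustrative.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-np.clip(u, -60.0, 60.0)))

def run_pricing_loop(theta, sigma, deals, price_grid, beta_true, seed=0):
    """Sequential approximate-Bayes pricing over a list of (xf, xp) deals.

    beta_true generates simulated buyer responses; (theta, sigma) are the
    prior parameters and are updated after every response via eqs. (10)-(11).
    """
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(2000)
    revenue = 0.0
    for xf, xp in deals:
        # 1. Bayes-greedy price on a grid (sample-average approximation)
        def est_rev(p):
            x = np.concatenate([xf, p * xp])
            sd = np.sqrt(max(x @ sigma @ x, 0.0))
            return np.mean(p * sigmoid(x @ theta + sd * Z))
        p = max(price_grid, key=est_rev)
        # 2.-3. quote the price and observe the simulated response
        x = np.concatenate([xf, p * xp])
        y = float(rng.random() < sigmoid(x @ beta_true))
        revenue += p * y
        # 4. heuristic variance parameter (Spiegelhalter-Lauritzen)
        q = sigmoid(x @ theta)
        v = max(q * (1.0 - q), 1e-3)
        # 5. posterior update, eqs. (10)-(11)
        sx = sigma @ x
        s = x @ sx
        theta = theta + (v * (y - 0.5) - x @ theta) / (v + s) * sx
        sigma = sigma - np.outer(sx, sx) / (v + s)
    return theta, sigma, revenue
```

Since every update subtracts a positive semidefinite rank-one matrix from Σ, the seller's uncertainty (for example, the trace of Σ) shrinks monotonically as deals accumulate.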

5. Empirical study

We evaluated the proposed methods on a historical dataset provided by Vendavo. Our goal is to

validate two different dimensions of our work: 1) the approximate Bayesian statistical model, which

can be used for estimation and prediction even when price optimization is not required; 2) the

Bayes-greedy pricing policy. Section 5.1 describes the data and the pre-screening procedure used to


reduce the feature space. Section 5.2 compares the approximate Bayesian model with benchmarks

on purely statistical metrics, where the goal is accurate prediction rather than price optimization.

Section 5.3 presents a qualitative comparison of the Bayes-greedy prices with the historical prices

in the data, and Section 5.4 gives the results of simulations evaluating the potential of different

pricing strategies for maximizing revenue.

5.1. Data description and model selection

Historical transaction data were provided by Vendavo in anonymized form. The part of the data

used to train the model consisted of 50,000 individual observations (both historical wins and

historical losses were recorded). The available information included categorical features for product

and customer types; at the most detailed level, there were 1881 different products and 2051 different

customers. The product types were aggregated hierarchically on four levels; a single product ID

was also assigned to a ProductLevel1, ProductLevel2, and ProductLevel3, with ProductLevel1

representing the coarsest aggregation (containing the most products). Similarly, the customer types

were organized into a hierarchy with two levels. The data also included the quantity of product

stated in the deal, the historical unit price that was quoted, the geographical location of the

customer, the manufacturing plant where the product was made, the channel involved in the sale,

and the ID of the sales representative.

In addition, we engineered features that reflect the popularity of the products and customers’

willingness to purchase them on an aggregate level. These include: 1) the average win rate for each

product; 2) the average win rate aggregated by ProductLevel1; 3) the average unit price for each

product; 4) the average win rate for each customer; and 5) the average win rate aggregated by the

top customer level. We also included interaction terms between the price and the ProductLevel1

variables, modeling heterogeneous price sensitivities.

All together, the initial model contained over 5,000 features. However, many of these features

are unlikely to be statistically significant. For example, many individual product IDs appear very

rarely, in only a small number of deals. In many of these cases, we will never obtain enough


data to confirm that the presence of these products has significant effects on the win probability;

furthermore, even if these products are in fact significant, they simply do not appear frequently

enough to exert a heavy impact on revenue. However, it may well be the case that these products

belong to a significant ProductLevel1 or ProductLevel2.

With a large number of features, the cost of estimating a statistical model may also become

prohibitive. For all of these reasons, we first performed statistical screening using the Lasso

method (Tibshirani 1996, Roth 2004) to eliminate features that are unlikely to be correlated with

the response variable. This method applies a regularization penalty to the standard maximum-

likelihood estimation approach. Given a design matrix X and a response vector y, one solves the

problem

β̂ = arg min_β { −log L(β; X, y) + λ ‖β‖₁ },   (21)

where L is the usual likelihood function for logistic regression. The penalty function ‖β‖1 is non-

differentiable at zero, which causes βi to shrink to zero if the ith feature does not sufficiently

improve the likelihood. The parameter λ controls the tradeoff between model accuracy and model

size, and is typically chosen to optimize some statistical criterion. In our case, we use the area

under the ROC curve, which is widely used to measure the quality of a statistical model when the

response is binary with a low proportion of 1s (Smithson and Merkle 2013).
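Concretely, the AUC can be computed from ranks alone via the Mann-Whitney identity; the small helper below (ours, using SciPy's `rankdata`) handles tied scores with midranks.

```python
import numpy as np
from scipy.stats import rankdata

def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank-sum identity.

    Ties in the scores are handled with midranks (rankdata's default).
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels).astype(bool)
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    ranks = rankdata(scores)
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
```

A perfect ranking gives AUC 1, a reversed ranking gives 0, and constant scores give 1/2, which is why AUC is informative even when, as here, most responses are losses.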

The regularized problem (21) simultaneously performs model selection (eliminating insignificant

features) and estimation (calculating regression coefficients β). However, recent theoretical work

(Belloni and Chernozhukov 2013) has shown that the coefficients directly obtained from Lasso can

be biased. For this reason, practitioners are recommended to use Lasso for screening, remove all

insignificant features (that is, features i with βi = 0) from the model, and then refit the coefficients

of the remaining features using standard logistic regression or some other technique.
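The screen-then-refit recipe can be sketched end to end. The proximal-gradient (ISTA) solver below is a generic stand-in for the Lasso-logistic fit in (21), not the solver used in the case study; after screening, the surviving columns are refit without the penalty (λ = 0 reduces the same routine to plain maximum likelihood). All names are ours.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-np.clip(u, -60.0, 60.0)))

def lasso_logistic(X, y, lam, n_iter=3000):
    """L1-penalized logistic regression via proximal gradient (ISTA)."""
    n, d = X.shape
    beta = np.zeros(d)
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2   # 1/L for the mean logistic loss
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / n
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft threshold
    return beta

def screen_then_refit(X, y, lam):
    """Lasso screening followed by an unpenalized refit on the selected columns."""
    keep = np.flatnonzero(lasso_logistic(X, y, lam) != 0.0)
    beta = np.zeros(X.shape[1])
    beta[keep] = lasso_logistic(X[:, keep], y, lam=0.0)  # lam = 0: plain MLE
    return beta, keep
```

On synthetic data where only the first two of several features matter, the screen keeps the true features (with the refit restoring roughly unbiased coefficients) and zeroes out the rest, mirroring the 188-feature screen described in the text.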

At this point, our model enters the picture. From Lasso, we obtain a total of 188 selected features,

including the unit price, 4 ProductLevel1s, 17 ProductLevel2s, 23 ProductLevel3s, 15 Plants, 16

interactions between ProductLevel1s and the unit price, all the engineered features, and various


customer types. All other features are now removed from the model, so x now has 188 elements.

We can then apply our approximate Bayesian procedure from Section 3 to the training data in

order to estimate the regression coefficients and covariance matrix.

5.2. Statistical quality of the approximate Bayesian model

Our first test seeks to evaluate the approximate Bayesian model in a purely statistical sense.

This comparison can be carried out based purely on the historical data, and does not involve

any price optimization. The goal is simply to gauge the predictive power of the model given

pre-specified historical prices. This is not directly related to the seller’s objective of maximizing

revenue, since even a poorly-specified model could potentially yield good price recommendations

(Besbes and Zeevi 2015). Nevertheless, this issue is important for understanding the quality of the

approximations used in the model.

We considered three statistical methods:

1. Bayesian logistic regression with KL minimization (KL). This is the proposed method

described in Section 3, where the posterior update is made to resemble linear regression, and the

variance parameter is chosen to optimize the KL divergence.

2. Bayesian logistic regression with variational bound (VB). This is the technique proposed by

Jaakkola and Jordan (2000), which also forces the learning mechanism to resemble linear regression,

but uses a heuristic to choose the variance parameter. This benchmark allows us to quantify the

value of optimizing the parameter.

3. Frequentist logistic regression (LR). We also implemented classical frequentist LR with

maximum-likelihood estimation. This benchmark allows us to quantify the value of including uncer-

tainty in the form of a Bayesian prior.

Each of these methods began with the same training data after screening (that is, each method had

to fit 188 coefficients). Each model was then evaluated on a separate test dataset containing 19,385

observations. We calculated five different performance metrics for each model, as summarized in

Table 1.


The Accuracy metric is the percentage of correct predictions made by classifying data points as

wins if their predicted win probability is over 0.5. In this particular application, this metric is less

insightful as both the training and test data are imbalanced, that is, most of the deals were losses.

Therefore, a naive model that predicts all deals to be losses might appear to perform well in terms

of accuracy. The area under the ROC curve (AUC) is a better metric when considering imbalanced

binary response values. The F1 score includes both precision and recall in the calculation, but does

not consider the true negative rate.

We see that frequentist LR has the best AUC on both the training and test data. Nonetheless,

AUC scores are fairly close for all three policies. Furthermore, the VB method has the best recall

and F1 score (but also the worst precision and AUC). The KL model is generally situated between

the two. Both Bayesian models have better recall than LR, meaning that they make fewer false

negative predictions (i.e., are less likely to predict historical wins as losses).

The similarities between models may be more surprising than the differences. Recall that the

KL model uses two layers of approximations: first, we force the non-normal posterior to be normal

(Section 3.2), and second, we force the parameters of that posterior to resemble linear regression

(Section 3.3). By contrast, the LR method calculates the exact maximum-likelihood estimate of the

regression coefficients. Nonetheless, the value of incorporating uncertainty into the model, in the

form of the covariance matrix, largely outweighs the loss incurred by using these approximations,

producing a very similar AUC. Our conclusion from Table 1 is that all three models are fairly

competitive in terms of statistical predictive power; we experimented with several different ways

of generating the training and test sets, and found that KL was consistently close to LR. The real

value of Bayesian uncertainty becomes evident when we move from prediction to optimization.

Table 1  Performance metrics from three statistical models on the training and test data.

              Training Data         Test Data
Metric        LR     KL     VB     LR     KL     VB
Accuracy      0.867  0.863  0.858  0.828  0.825  0.823
AUC           0.871  0.858  0.842  0.851  0.839  0.827
F1 Score      0.445  0.441  0.471  0.439  0.439  0.479
Precision     0.643  0.616  0.566  0.643  0.616  0.591
Recall        0.340  0.343  0.403  0.333  0.341  0.403


5.3. Comparison with historical prices

We now begin to connect the models back to the price optimization problem. Each of the three

models in Section 5.2 can be used to calculate a recommended price for any new deal, given a fixed

set of features xf , xp. For the LR model, we use the simple myopic policy of (16). The Bayes-greedy

policy can be used by KL and VB, since both are Bayesian models. For each of the deals in the

test set, we calculated the recommended price for each model, and compared it to the historical

price in the data.

This comparison can provide qualitative arguments for or against a model. The historical prices

may not be optimal, but we expect them to be realistic. If any model recommends prices that

are consistently, unreasonably higher than historical, the model is most likely not useful for price

optimization. Additional insight can be obtained by separately considering the historical wins and

losses. Intuitively, if the historical price led to a loss, we expect that the price was too high, and

that the optimal price should be lower. Likewise, if the historical price led to a win, the optimal

price may have been higher.

However, we may not see such a clean separation across the whole dataset. Many of the products

appear very rarely in the data and have few wins, if any. For such products, any model will have

trouble distinguishing between wins and losses. The models will learn better for those product

types that appear reasonably often and have sufficiently many wins. For this reason, we conducted

this comparison for three different ProductLevel1s using data from the test set. Figure 1 shows the

empirical distributions of the difference between the recommended and historical prices (positive

values mean higher recommended prices, negative values mean higher historical prices) for each

model across these three types.

We observe the following behaviours:

• The LR model tends to recommend prices that are much higher than historical (the peaks of

the distributions appear to the right of zero). This is generally the case for both wins and losses.

• The VB model tends to recommend prices that are close to historical (the peaks are close to

zero), for both wins and losses.


Figure 1  Differences between recommended and historical prices for selected ProductLevel1s: (a) first ProductLevel1; (b) second ProductLevel1; (c) third ProductLevel1.


• The KL model tends to recommend prices that are close to or below historical for losses, but

higher than historical for wins.

This suggests that the KL model, in conjunction with Bayes-greedy pricing, has better potential

than the other models for price optimization, as it has a better ability to detect opportunities for

additional revenue. This is explored further in the next test.

5.4. Expected revenues based on simulated buyers

We now compare the revenues generated by different statistical and optimization models. This is

the main metric of interest for the seller. However, it is less clear how such a comparison should

be designed: unlike the purely statistical comparison in Section 5.2, it cannot be carried out based

purely on the historical data. The reason is that, for any observation, the distribution of the

response Y depends on the price. We only know the response for the historical price, and there is

no way to go back and redo the same deal with the same customer using a different price offer.

For this reason, we compare different pricing methods using a simulation model, where we use

the data to generate win probabilities for a sequence of buyers, then simulate their responses for

different prices. However, the mechanism for generating these customers must necessarily come from

some statistical model, which itself is one of the main research questions of this paper. For example,

if we use classical logistic regression to fit the demand curve used to generate the customers, this

may bias the results in favour of LR-based policies.

Our approach to this issue is to use multiple simulation models:

1. Frequentist model. Win probabilities are generated from (1), where the true coefficients β

are fixed, but unknown to any of the pricing policies. These values of β are obtained by fitting

a frequentist LR model to a large set of data (70,000 deals). The prior coefficients used by the

policies are fit using a smaller training dataset, so they may be quite different from the true β

values. However, one may expect LR-based methods to do better in this setting.

2. Bayesian model (trained). We first fit the KL model to the training data to obtain the prior

parameters (θ0,Σ0). Then, we run 1000 macroreplications, each of which generates a set of values


β ∼ N (θ0,Σ0). These values are fixed within a given macroreplication and plugged into (1) to

calculate win probabilities. This approach may favour the Bayesian model, since the true coefficients

are drawn from the prior. However, within each individual macroreplication, the values of β may

be quite different from the prior coefficients θ0.

3. Bayesian model (noisy). This approach is similar to the previous one. However, after the true

coefficients β are generated, we replace the prior covariance matrix Σ0 by a diagonal matrix. In

this way, the Bayesian methods are given less information about the correlations and start with

less accurate beliefs, making it more important to learn quickly.
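The three generators above can be sketched in Python as follows; the function name, argument names, and the fixed seed are illustrative, not taken from the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_coefficients(model, beta_freq, theta0, Sigma0):
    """Draw the 'true' beta for one macroreplication under each of the
    three simulation models described in the text (labels illustrative)."""
    if model == "frequentist":
        # Fixed coefficients from a frequentist LR fit to 70,000 deals.
        return beta_freq
    if model == "bayesian_trained":
        # beta ~ N(theta0, Sigma0), where (theta0, Sigma0) is the prior
        # fitted by the KL model on the training data.
        return rng.multivariate_normal(theta0, Sigma0)
    if model == "bayesian_noisy":
        # Same draw; the difference is that the *policies* are then given
        # only diag(Sigma0) as a prior, so they start without any
        # correlation information.
        return rng.multivariate_normal(theta0, Sigma0)
    raise ValueError(model)
```

In the noisy case, the generation of beta is unchanged; only the prior handed to the learning policies is degraded.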

In each of these cases, the various pricing methods start with a prior obtained from the training

data, and are implemented for 100 simulated deals. The features xf, xp of each deal are chosen by randomly sampling (bootstrapping) a row of data from the test set. The price p can then be calculated based on these features and the current beliefs about the coefficients, and the response is generated by plugging xf, xp, p, and the hidden values β into (1). If the price is accepted, the

policy generates p dollars in revenue; otherwise, no revenue is earned. The beliefs are then updated

using the appropriate statistical method.
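Given the true coefficients, a single simulated deal proceeds as sketched below. The exact feature layout of (1) is not reproduced here; appending the price as the last component of the feature vector is an assumption for illustration:

```python
import numpy as np

def sigmoid(z):
    """Logistic function used by the demand curve (1)."""
    return 1.0 / (1.0 + np.exp(-z))

def simulate_deal(beta, x_f, price, rng):
    """One simulated deal: win probability from the logistic demand curve,
    a Bernoulli response, and the realized revenue (price if won, else 0)."""
    # Assumed layout: bootstrapped features followed by the offered price.
    x = np.concatenate([x_f, [price]])
    p_win = sigmoid(x @ beta)
    win = bool(rng.random() < p_win)
    revenue = price if win else 0.0
    return win, revenue
```

After each simulated deal, the policy's beliefs would be updated from `(win, price, x_f)` using its own statistical method, exactly as described above.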

We implemented the following combinations of statistical and optimization schemes:

• Our approximate Bayesian learning scheme is implemented with both the Bayes-greedy (KL-Bayes) and frequentist (KL-Freq) pricing policies. That is, we use (10)-(11) to update after each deal, but the recommended prices are computed from (17) and (16), respectively. We also implemented a version where the KL model was used to fit the prior, but this prior was not updated during the 100 deals (“no learning” or KL-Bayes-NL). The Bayes-greedy policy was used for pricing. Note that, if the customers are homogeneous (that is, x = [1, p]^⊤), KL-Bayes-NL will always pick the same price. However, since our data are highly heterogeneous, this was not the case in our simulations. By including this version, we can measure the value added by continuing to learn after we have fit the prior.

• The VB model of Jaakkola and Jordan (2000) is implemented with the Bayes-greedy policy

(VB-Bayes).

• Classical logistic regression is implemented with the frequentist policy (LR-Freq).

• The historical pricing strategy is also implemented (since we are bootstrapping the features xf, xp from the test set, we can simply use the price p that was recorded). This method does not use any statistical updating since the decisions are already fixed.

Figure 2 Cumulative revenues for 100 deals (averaged over 1000 macroreplications): (a) frequentist simulation model; (b) Bayesian model (trained); (c) Bayesian model (noisy).

Figure 2 shows the averaged cumulative revenues obtained by different methods in each of

the three simulation models. Somewhat surprisingly, variants of the KL model achieve the best

performance in all cases, even when β is not drawn from the Bayesian model. The LR method

underperforms for the reason described in Section 5.3: it tends to recommend high prices, which

are only infrequently successful. However, LR consistently outperforms VB, suggesting that the

KL optimality criterion plays an important role for obtaining good practical performance from the Bayesian model.3

Figure 3 Outcomes and win probabilities of simulated deals: (a) classification of simulated deals by outcome; (b) true win probabilities for simulated deals.

The choice of statistical model accounts for much of the observed differences in performance.

This occurs because the models are using priors that are calibrated from a relatively large training

dataset (30,000 deals). The additional improvement that can be made from 100 deals is small in

comparison, explaining the similar performance of KL-Bayes and KL-Bayes-NL in Figures 2(a)-

2(b). Nonetheless, when covariance information is removed from the prior, the value of sequential

learning is evident (Figure 2(c)). Furthermore, the Bayes-greedy policy consistently outperforms

the frequentist policy, even when both policies use the KL model: in Figure 2(a), the Bayes-greedy

policy has earned approximately 20% more revenue from 100 deals.

Figure 3 examines the simulated deals more closely for insight into why the KL model and Bayes-greedy policy earn more revenue than LR. In our simulations, different pricing policies make use of different statistical models, but the resulting pricing decisions are evaluated on the same set of true coefficients β and the same bootstrapped transaction data xf, xp. Thus, for any given deal, the same demand curve is used to evaluate decisions made by two different policies. In Figure 3(a), we combined all the simulated deals from all the macroreplications and classified them by outcome, namely, whether KL-Bayes and LR-Freq won or lost the deal, and if they both won, which policy had the higher price (and thus earned more revenue).

3 The VB method would likely perform better if the customers were generated directly from that model. We did not consider this case here because the VB model demonstrated the lowest AUC in modeling the customers in the data (Table 1), and thus would produce the least “realistic” customers.
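The five categories of Figure 3(a) follow from a simple classification rule; `classify_deal` is a hypothetical helper, with labels paraphrasing the figure:

```python
def classify_deal(won_kl, price_kl, won_lr, price_lr):
    """Classify one simulated deal by outcome, mirroring the five
    categories of Figure 3(a). Both policies face the same demand curve;
    the inputs are their independent win/loss outcomes and offered prices."""
    if won_kl and won_lr:
        # Both won: the policy with the higher accepted price earned more.
        if price_kl > price_lr:
            return "both won, KL-Bayes higher"
        return "both won, LR-Freq higher"
    if won_kl:
        return "only KL-Bayes won"
    if won_lr:
        return "only LR-Freq won"
    return "both lost"
```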

The most immediate insight from Figure 3(a) is that, among the 19.3% of deals won by both

policies, KL-Bayes charged a higher price 75% of the time. Those deals that were won by only

one of the two policies were split almost equally between KL-Bayes and LR-Freq. Unsurprisingly

(considering the overall low proportion of wins in the data), both policies lost nearly half of the

deals.

To obtain additional insight into the five categories from Figure 3(a), we also calculated the

true win probabilities, using (1) with the true coefficients β, for all of the simulated deals (recall

that the coefficients β characterize the true demand curve, and are generated independently of the

pricing policy). The empirical distributions of these quantities are shown in Figure 3(b). First, we

see that the true win probability varies greatly between individual simulated transactions, which

reflects the heterogeneity of products and customers in our data. Simply put, there are many deals

where the win probability will be low regardless of the price.

Although the same demand curves are used to generate outcomes for KL-Bayes and LR-Freq,

those actual outcomes are simulated independently by generating Bernoulli random variables with

success probabilities given by (1). Thus, even if the policies recommend similar prices, low win

probabilities mean that we are much more likely to see outcomes where one policy wins and one

loses than we are to see outcomes where both policies win. This random noise accounts for the

large proportion of such cases in Figure 3(a). As can be seen in Figure 3(b), we are more likely to

see outcomes where both policies win if the win probability is higher overall for that deal.
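A back-of-the-envelope calculation makes this concrete. Suppose both policies' offers give the same illustrative win probability p = 0.2 for a deal, with the two outcomes drawn independently:

```python
# Independent Bernoulli outcomes for the two policies on the same deal.
p = 0.2                        # illustrative win probability for both offers
both_win = p * p               # ~0.04
exactly_one = 2 * p * (1 - p)  # ~0.32
both_lose = (1 - p) ** 2       # ~0.64
# A split outcome ("one wins, one loses") is 8 times as likely as a joint
# win, consistent with the large proportion of split cases in Figure 3(a).
```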

Thus, the main reason why KL-Bayes outperforms LR-Freq is not because KL-Bayes is able to

win more “long-shot” deals, but rather because KL-Bayes consistently makes better offers for those

deals that are realistically winnable. In other words, KL-Bayes obtains greater value from each

win.


6. Conclusion

We have developed a framework for statistical and optimal learning in B2B pricing. Our statistical

model uses approximate Bayesian inference to learn an unknown logistic demand curve efficiently.

Our optimization strategy then uses the distribution of belief in this model to recommend prices

that are adjusted for the seller’s uncertainty. Our case study shows that this approach performs well in realistic settings, and its predictive power is competitive with that of a frequentist

approach that does not use any approximations. We believe that this paper lays the methodological

groundwork for improved decision support tools in price optimization. Moosmayer et al. (2013)

finds that target prices are strongly correlated with salespeople’s final price quotes; thus, even if recommended prices are not implemented directly, they remain quite influential. A better model therefore not only makes better recommendations, but is also less likely to mislead the salesperson.

We briefly discuss some avenues for future work. First, our framework assumes that we have

access to data regarding both wins and losses. While this was true in our case study, there may be other practical settings where only the historical wins are available (for example, because

the salespeople prefer not to report losses). Then, in order to model the demand curve, the task

would be to reliably infer the number of losses based solely on the wins, a statistical problem

that is outside the scope of the present paper. Second, we have separated the problem of model

selection (identifying the most important segments) from estimation and optimization, whereas

ideally one might wish to adaptively identify new significant features in real time (for example,

if a new product that was not previously in the model suddenly experiences high demand). The

techniques from Li et al. (2015) may be applicable, but would lead to greater computational cost.

A simpler approach for practitioners may be to repeat the model selection procedure at regular

intervals and “reset” the Bayesian model, which would then be used to learn in the short term.

References

Agrawal, V., M. Ferguson. 2007. Bid-response models for customised pricing. Journal of Revenue & Pricing Management 6(3) 212–228.

Agrawal, V., M. Ferguson, G. C. Souza. 2015. Trade-in rebates for price discrimination and product recovery. Submitted for publication.

Araman, V. F., R. Caldentey. 2009. Dynamic pricing for nonperishable products with demand learning. Operations Research 57(5) 1169–1188.

Belloni, A. V., V. Chernozhukov. 2013. Least squares after model selection in high-dimensional sparse models. Bernoulli 19(2) 521–547.

Besbes, O., D. Saure. 2014. Dynamic pricing strategies in the presence of demand shifts. Manufacturing & Service Operations Management 16(4) 513–528.

Besbes, O., A. Zeevi. 2009. Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research 57(6) 1407–1420.

Besbes, O., A. Zeevi. 2015. On the (surprising) sufficiency of linear models for dynamic pricing with demand learning. Management Science 61(4) 723–739.

Bitran, G., R. Caldentey. 2003. An overview of pricing models for revenue management. Manufacturing & Service Operations Management 5(3) 203–229.

Blei, D. M., M. I. Jordan, J. W. Paisley. 2012. Variational Bayesian inference with stochastic search. Proceedings of the 29th International Conference on Machine Learning. 1367–1374.

Bodea, T., M. Ferguson. 2014. Segmentation, Revenue Management and Pricing Analytics. Routledge.

Brahma, A., M. Chakraborty, S. Das, A. Lavoie, M. Magdon-Ismail. 2012. A Bayesian market maker. Proceedings of the 13th ACM Conference on Electronic Commerce. 215–232.

Brascamp, H. J., E. H. Lieb. 1976. On extensions of the Brunn-Minkowski and Prekopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. Journal of Functional Analysis 22(4) 366–389.

Broder, J., P. Rusmevichientong. 2012. Dynamic pricing under a general parametric choice model. Operations Research 60(4) 965–980.

Bruno, H. A., H. Che, S. Dutta. 2012. Role of reference price on price and quantity: insights from business-to-business markets. Journal of Marketing Research 49(5) 640–654.

Chau, M., M. C. Fu, H. Qu, I. O. Ryzhov. 2014. Simulation optimization: A tutorial overview and recent developments in gradient-based methods. A. Tolk, S. Y. Diallo, I. O. Ryzhov, L. Yilmaz, S. Buckley, J. A. Miller, eds., Proceedings of the 2014 Winter Simulation Conference. 21–35.

Chen, C.-H., S. E. Chick, L. H. Lee, N. A. Pujowidianto. 2015. Ranking and selection: efficient simulation budget allocation. M. C. Fu, ed., Handbook of Simulation Optimization. Springer, 45–80.

Chhabra, M., S. Das. 2011. Learning the demand curve in posted-price digital goods auctions. Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems. 63–70.

Chick, S. E. 2006. Subjective probability and Bayesian methodology. S. G. Henderson, B. L. Nelson, eds., Handbooks of Operations Research and Management Science, vol. 13: Simulation. North-Holland Publishing, Amsterdam, 225–258.

Cope, E. 2007. Bayesian strategies for dynamic pricing in e-commerce. Naval Research Logistics 54(3) 265–281.

Crooks, G. E. 2009. Logistic approximation to the logistic-normal integral. Tech. rep., Lawrence Berkeley National Laboratory.

Cross, R. G., J. A. Higbie, Z. N. Cross. 2011. Milestones in the application of analytical pricing and revenue management. Journal of Revenue & Pricing Management 10(1) 8–18.

Das, S., M. Magdon-Ismail. 2009. Adapting to a market shock: Optimal sequential market-making. D. Koller, Y. Bengio, D. Schuurmans, L. Bottou, R. Culotta, eds., Advances in Neural Information Processing Systems, vol. 21. 361–368.

DeGroot, M. H. 1970. Optimal Statistical Decisions. John Wiley and Sons.

Demidenko, E. 2013. Mixed Models: Theory and Applications with R (2nd ed.). John Wiley and Sons.

den Boer, A. V., B. Zwart. 2015. Dynamic pricing and learning with finite inventories. Operations Research 63(4) 965–978.

Elmaghraby, W., A. Gulcu, P. Keskinocak. 2008. Designing optimal preannounced markdowns in the presence of rational customers with multiunit demands. Manufacturing & Service Operations Management 10(1) 126–148.

Elmaghraby, W., W. Jank, I. Z. Karaesmen, S. Zhang. 2012. An exploratory analysis of B2B price changes. Journal of Revenue & Pricing Management 11(6) 607–624.

Elmaghraby, W., W. Jank, S. Zhang, I. Z. Karaesmen. 2015. Sales force behavior, pricing information, and pricing decisions. Manufacturing & Service Operations Management (to appear).

Fan, J., F. Han, H. Liu. 2014. Challenges of big data analysis. National Science Review 1(2) 293–314.

Farias, V. F., B. Van Roy. 2010. Dynamic pricing with a prior on market response. Operations Research 58(1) 16–29.

Fu, M. C. 2008. What you should know about simulation and derivatives. Naval Research Logistics 55(8) 723–736.

Fu, M. C. 2015. Stochastic gradient estimation. M. C. Fu, ed., Handbook of Simulation Optimization. Springer, 105–147.

Gale, B. T., D. J. Swire. 2012. Implementing strategic B2B pricing: Constructing value benchmarks. Journal of Revenue & Pricing Management 11(1) 40–53.

Golub, G. H., C. F. Van Loan. 2012. Matrix Computations (3rd ed.). JHU Press.

Harrison, J. M., N. B. Keskin, A. Zeevi. 2012. Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Science 58(3) 570–586.

Hastie, T., R. Tibshirani, J. Friedman. 2001. The Elements of Statistical Learning (2nd ed.). Springer.

Hormby, S., J. Morrison, P. Dave, M. Meyers, T. Tenca. 2010. Marriott International increases revenue by implementing a group pricing optimizer. Interfaces 40(1) 47–57.

Jaakkola, T. S., M. I. Jordan. 2000. Bayesian parameter estimation via variational methods. Statistics and Computing 10(1) 25–37.

Keskin, N. B., A. Zeevi. 2014. Dynamic pricing with an unknown demand model: Asymptotically optimal semi-myopic policies. Operations Research 62(5) 1142–1167.

Keskin, N. B., A. Zeevi. 2015. Chasing demand: Learning and earning in a changing environment. Submitted for publication.

Kim, S. 2006. Gradient-based simulation optimization. L. F. Perrone, F. P. Wieland, J. Liu, B. G. Lawson, D. M. Nicol, R. M. Fujimoto, eds., Proceedings of the 2006 Winter Simulation Conference. 159–167.

Kushner, H. J., G. Yin. 2003. Stochastic Approximation and Recursive Algorithms and Applications (2nd ed.). Springer.

L’Ecuyer, P. 1995. Note: On the interchange of derivative and expectation for likelihood ratio derivative estimators. Management Science 41(4) 738–747.

Leng, M., M. Parlar. 2005. Free shipping and purchasing decisions in B2B transactions: A game-theoretic analysis. IIE Transactions 37(12) 1119–1128.

Li, H., W. T. Huh. 2011. Pricing multiple products with the multinomial logit and nested logit models: Concavity and implications. Manufacturing & Service Operations Management 13(4) 549–563.

Li, Y., H. Liu, W. B. Powell. 2015. The knowledge gradient policy using a sparse additive belief model. arXiv preprint arXiv:1503.05567.

Minka, T. P. 2000. Bayesian linear regression. Tech. rep., Microsoft Research.

Moosmayer, D. C., A. Y.-L. Chong, M. J. Liu, B. Schuppar. 2013. A neural network approach to predicting price negotiation outcomes in business-to-business contexts. Expert Systems with Applications 40(8) 3028–3035.

Negoescu, D. M., P. I. Frazier, W. B. Powell. 2011. The knowledge-gradient algorithm for sequencing experiments in drug discovery. INFORMS Journal on Computing 23(3) 346–363.

Powell, W. B., I. O. Ryzhov. 2012. Optimal Learning. John Wiley and Sons.

Qu, H., I. O. Ryzhov, M. C. Fu, Z. Ding. 2015. Sequential selection with unknown correlation structures. Operations Research 63(4) 931–948.

Roth, V. 2004. The generalized LASSO. IEEE Transactions on Neural Networks 15(1) 16–28.

Ryzhov, I. O. 2015. Approximate Bayesian inference for simulation and optimization. B. Defourny, T. Terlaky, eds., Modeling and Optimization: Theory and Applications. Springer. To appear.

Smithson, M., E. C. Merkle. 2013. Generalized Linear Models for Categorical and Continuous Limited Dependent Variables. CRC Press.

Spall, J. C. 2005. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. John Wiley & Sons.

Spiegelhalter, D. J., S. L. Lauritzen. 1990. Sequential updating of conditional probabilities on directed graphical structures. Networks 20(5) 579–605.

Talluri, K. T., G. J. Van Ryzin. 2006. The Theory and Practice of Revenue Management. Springer.

Tibshirani, R. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society B 58(1) 267–288.

Xia, C. H., P. Dube. 2007. Dynamic pricing in e-services under demand uncertainty. Production and Operations Management 16(6) 701–712.

Zhang, J. Z., O. Netzer, A. Ansari. 2014. Dynamic targeted pricing in B2B relationships. Marketing Science 33(3) 317–337.