Marketing Engineering Notes

Purpose

During the lectures I will cover some material that is not in the readings or that I do not think is well explained. The purpose of this set of notes is to provide you with some information on these topics. These notes give you a preview of what I plan to talk about in class and also a review after my lecture. The spreadsheets referenced in these notes should be available at http://www.business.utah.edu/~mktbm/mkt6600/ (I will not repeat this full path in these notes).

Response Models

Response models form the heart of marketing engineering and marketing decision-making. A response model forecasts the change in a dependent variable, Y, as a function of the change in one or more independent variables, X. Most commonly, we will look at how changes in the amount spent on advertising, sales promotions, or the sales force, changes in price, or changes in the features of a product or service impact sales. Other times we will look at how changes in the characteristics of a product or service change a person's preference or probability of purchasing it. So, typical dependent variables, i.e., Y, include sales, market share, preference, and probability of choice. Independent variables, i.e., X, include marketing mix elements, product or service characteristics, and characteristics of the buyer. Page 4 in the Response Models Technical Note (in the WebCT Technical Notes folder) shows a large number of possible functional forms of response models. While it is good to know something about each of these functions, we will deal primarily with four functions: linear, multiplicative, ADBUDG, and logit.

Linear Regression

The most common assumption is that the dependent variable is linearly related to the independent variable(s). That is, regression is based on an assumption that a set of points can be adequately represented by a straight line (or a hyperplane when there are several independent variables), i.e., most of the data points will lie relatively close to the regression line. Consider a linear equation with two independent variables X1 and X2:

Yi = a + b1 X1i + b2 X2i + ei,   i = 1, …, n

In this equation, a is the intercept. It is the expected value of Yi when both Xs are 0. Each regression coefficient, bk (k = 1, 2), is the slope associated with that independent variable, Xki. It gives the expected change in Yi for a one-unit change in Xki, holding the effect of the other variable(s) constant.

Regression finds the combination of estimates of a, b1, and b2 that minimizes the sum of the squared errors over all n observations, Σi ei², where ei is the difference between the actual and predicted Yi, i.e., ei = Yi − Ŷi.

Diagnostics. The variation in Y is called the total sum of squares, TSS; the variation in Ŷ is called the explained sum of squares, ESS; and the variation in e is called the residual sum of squares, RSS. (Sometimes this terminology is just reversed, so that ESS stands for error sum of squares and RSS stands for regression sum of squares.) Because the regression line always runs through the mean of the data, the following identity holds:

TSS = ESS + RSS

R² measures the proportion of the variation in the dependent variable, TSS, that can be explained by the regression, ESS. R² is the ratio of explained to total variation:

R² = ESS / TSS = 1 − RSS / TSS

This equation shows that the same regression weights that minimize the sum of the squared errors, RSS, also maximize R². There is no universal standard for a good R²; it depends on the application. We will see some very successful applications where the R² is relatively low and unsuccessful applications with high R²s.

The standard error of the estimate, se = sqrt(RSS / (n − k)), is a measure of how close the points lie to the regression line, where n is the number of observations and k is the number of parameters. Usually about 2/3 of the observed Ys will lie within ± one se of the regression line and 95% of the points will lie within ± two ses of the line.

One should examine the t-statistics and p-values for each regression coefficient (weight) to see if it is statistically significant. P-values should be less than .1, and many people believe they should be less than .05. This corresponds to t-statistics of approximately 1.8 and 2.0.

The medical advertising data (MedAdv.xls) recorded the response to a series of advertisements. In the weight loss advertising campaign, between 0 and 4 ads were run every month for a year, and the number of calls each month inquiring about a weight loss program was recorded. We can run a regression to estimate the relationship between the number of ads run in a given month and the expected number of calls. The dependent variable is the number of calls each month and the independent variable is the number of ads. Here are the data for the first four months:

Month      Ads   Calls
January    3     113
February   2     98
March      3     147
April      3     115

This resulted in the following regression equation and output:

Callsi = a + b Adsi + ei

SUMMARY OUTPUT - Weight Loss Advertising

Regression Statistics
Multiple R          0.95
R Square            0.91
Adjusted R Square   0.90
Standard Error      15.32
Observations        12

ANOVA
                         df   SS         MS         F         Significance F
Regression (Explained)   1    23778.17   23778.17   101.27    1.5E-06
Residual                 10   2348.08    234.81
Total                    11   26126.25

            Coefficients   Standard Error   t Stat   P-value
Intercept   12.63          6.30             2.01     0.07
Ads         36.10          3.59             10.06    0.00

The high R2 = .91 indicates that most of the data points lie close to the regression line. The standard error of the estimate is 15.32. That says about 2/3 of the observations will be within 15 calls of what is predicted by the line.

The Analysis of Variance (ANOVA) divides the total sum of squares, TSS (26126.25), into the explained sum of squares, ESS (23778.17), and the residual sum of squares, RSS (2348.08). Note that ESS + RSS = TSS and that R2 = ESS/TSS = 23,778.17 / 26,126.25 = 1 − (RSS/TSS) = 1 − (2,348.08 / 26,126.25) = .91.

The intercept of 12.63 has a p-value = .07 < .1 and is significantly different from zero. It says we should expect 12.63 calls a month if there were no advertising. The slope of 36.10 (also statistically significant, p-value < 0.01) indicates that on average each ad generates 36.10 more calls. The sign of the coefficient is positive, indicating that more ads are associated with more calls, which is what one would expect. We could examine either a plot of the data or the residuals, i.e., the eis, to see that a linear model does a good job of representing the data.

Linear regression models are popular because they are easy to estimate and are robust. That means that even when assumptions are violated, regression typically works pretty well. A linear model is also a good approximation of many phenomena within a certain range.

Forecasting with Linear Response Models. We can use the estimated regression coefficients to forecast the number of calls that would be generated from a given number of ads per month by "plugging" the expected number of ads into the regression equation:

Forecast calls = 12.63 + 36.10 (number of ads)

With zero ads, we expect to attract 12.63 calls; with one ad we expect 48.73 calls (12.63 because of the intercept and 36.1 because of the ad); and each extra ad would generate 36.1 incremental calls.

Profit Models. In addition to forecasting sales we can also forecast profits. We will typically use the following profit model:

Profit = Unit Sales x Margin – Fixed Cost

Assume there is a linear response model for sales, y, as a function of advertising dollars, x, which is a fixed cost: y = a + bx. Then, with margin m, profits are:

Profit = m (a + bx) – x

Continuing with the weight loss example, if each call generates $59 in contribution and each ad costs $1300, our forecast profit = $59 x Forecast calls – $1300 x number of ads. Each additional ad would generate 36.1 calls and $2130 (= $59 x 36.1) in contribution. It costs $1300. This campaign generates an incremental profit of $830 per ad. If we do not use our judgment, the model says we should place an infinite number of ads. With a linear response model, the optimal action is always going to be either to spend nothing or to spend an infinite amount on advertising. So, we should not take the recommendations literally, but use them to decide what to do directionally rather than exactly, i.e., run a few more or fewer ads.
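To make the arithmetic concrete, here is a minimal Python sketch (mine, not part of the course spreadsheets) that reproduces the forecast and profit calculations above:

    # Coefficients from the weight-loss regression output above.
    a, b = 12.63, 36.10         # intercept and calls generated per ad
    margin, ad_cost = 59, 1300  # $ contribution per call, $ cost per ad

    for ads in range(5):
        calls = a + b * ads                      # linear response model
        profit = margin * calls - ad_cost * ads  # Profit = Sales x Margin - Fixed Cost
        print(ads, round(calls, 1), round(profit, 2))

    # Each extra ad adds margin*b - ad_cost = 59*36.1 - 1300, about $830,
    # so taken literally the linear model always says "run more ads."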

Judgmental Calibration. You are probably more familiar with statistical estimation or calibration; however, it is also possible to calibrate models judgmentally. While this lacks the objectivity of statistical estimation, it has the benefit of incorporating the decision maker's beliefs into the model, which increases the likelihood that the model will be used. We will use judgmental estimates with both multiplicative and ADBUDG models later in the course.

A linear model is a two-parameter model (a slope and an intercept) when there is one independent variable. Therefore we need two judgments to calibrate this model. One way is to ask (1) what are the current levels of the independent variable (e.g., dollars spent on advertising or number of advertisements placed) and the dependent variable (sales, market share, or calls), and (2) how much will the dependent variable change with a one-unit increase in the independent variable. Looking at the first month of the weight loss data, one would say the current number of ads was 3 and the current number of calls was 113. If we expect to get 36 more calls per advertisement, we can solve for the intercept:

Calls = a + 36 (Ads) => 113 = a + 36 x 3, or a = 113 – 108 = 5.

If there are two independent variables, we would need to ask for the current levels of both independent variables and the single dependent variable, as well as how much the dependent variable would change with a one-unit increase in each of the independent variables.

Multinomial Logit (MNL) Models

Linear regression is an appropriate methodology when the dependent variable is continuous. However, when the dependent variable is either a zero–one variable (for example, if the dependent variable is brand choice, where the value is one if that brand was chosen and zero if it was not chosen) or is constrained to be between zero and one (for example, market share must be between 0 and 100%), it may be better to use a logit model to estimate the relationship between independent and dependent variables. For example, in choice-based conjoint analysis, we use a logit model to estimate the relationship between product characteristics and the probability that a person would choose a product with those characteristics. Alternatively, we could estimate how merchandising characteristics, like price promotions, advertising, displays, etc., influence market share or the probability that an individual would choose a certain brand.

In this case, the dependent variable is going to be a zero–one variable, e.g., one if the person chose an alternative and zero if s/he did not choose it. The independent variables are variables that might influence choice, such as product characteristics or merchandising characteristics. The output of a logit model is an estimated probability of choosing each alternative. The formula looks more complicated than a linear regression, but is quite similar.

Choice-based conjoint is similar to a ratings-based conjoint model; both are used to understand why people choose or prefer certain alternatives. We assume people choose or like things because of the benefits they offer or the characteristics they possess – preference is "caused" by product characteristics. We typically assume a linear function of the characteristics: overall preference is a weighted sum of the attribute levels the product possesses.

In either case, the product descriptions are the independent variables and the measure of preference is the dependent variable. When the dependent variable, e.g., preference, is measured on a continuous scale (1–10), we use regression to estimate the importance of the product attributes when making preference judgments about the brands.

Once we have estimated the importance weights, we can predict the preference or likelihood of purchasing any competitive product by substituting its perceptions into the equation. Also, we can estimate the impact of changing the perception of a given attribute; this may be a change in the physical product or just a new message. In Excel, we have done this with the SUMPRODUCT function of the regression weights and the independent variables.

In ratings-based conjoint, we typically assume the alternative with the highest predicted preference is the one that is chosen. The multinomial logit model assumes that the probability of choosing the ith alternative is equal to:

Pi = Ai / Σj Aj,   where Ai = exp(Vi) and Vi = Σk βk xik

There are several differences between logit and regression models. The dependent variable in a logit model is choice (a 0–1 variable) rather than preference rated on a continuous (1 to 10) scale. These "choices" can be either stated choices – what the person says s/he would choose – or revealed choices – what they actually chose. The independent variables are similar to those in ratings-based conjoint.

In ratings-based conjoint, if a product with a particular attribute level (low price or high gas mileage) is typically preferred (receives a high preference rating), the regression weight associated with that level will be large and positive. In a logit model, if a product with a particular attribute level is consistently chosen, its regression weight will also be large and positive. So we interpret the parameters the same way.

The estimated probability of choice is the exponentiated utility of that object over the sum of the exponentiated utilities of all of the objects. Because it includes all of the competing alternatives, the logit model allows us to capture competitive effects. It can model either market shares at the aggregate level or choice probabilities at the individual level.

Its form may look complicated, but it has two very nice properties. First, Ai is always non-negative because it is an exponential. Second, the predicted choice probabilities (or market shares) are all between zero and one, and they sum to one.

The models are usually estimated by a procedure called maximum likelihood. The model finds the parameters that maximize the probability of the observed outcomes. If a person makes a series of choices, the model finds parameters that make the estimated probabilities of the chosen alternatives as close to one as possible and the estimated probabilities of the non-chosen alternatives as close to zero as possible. The parameters are estimated through a search procedure like Solver, but our logit software does everything automatically.

The MNL model is similar to a preference regression model in that (1) we are trying to estimate importance weights for product attributes and (2) the independent variables are the product attributes or characteristics. It differs from the preference regression model in that (1) the dependent variable is choice, or probability of choice, instead of preference or liking, and (2) all of the alternatives in a given choice set are considered to be part of one observation instead of each brand constituting a separate observation. The software accomplishes this by asking for the number of alternatives (per case). Regression minimizes the sum of the squared errors; the MNL maximizes a likelihood function.

Example of Modeling Transportation Choice. In a simple example we want to determine the probability that a person will choose a car or mass transit. In the following table there are two rows for each person, one for each alternative: Auto or Mass. The dependent variable is the chosen mode of transportation; it is a 0–1 variable. The first two people chose mass transit and the third person chose auto. We could have asked people to choose a transportation mode or we could have observed what they actually chose.

The independent variables are travel time and a dummy variable (called a brand-specific or alternative-specific constant) for the first alternative, auto. The last alternative, mass transit, is the reference level. The dummy variable is automatically supplied by the program, and the coefficient associated with it is the difference in utility between taking auto and mass transit when travel time is held constant – some people would rather drive if travel times are similar and others would rather take mass transportation.

Observation   Alternative   Choice   Time   Auto
1             Auto          0        52.9   1
1             Mass          1        4.4    0
2             Auto          0        4.1    1
2             Mass          1        28.5   0
3             Auto          1        4.1    1
3             Mass          0        86.9   0

The output of the model looks like the following for these data:

Variable   Coefficient estimate   Standard error   t-statistic
Time       -0.05                  0.02             -2.57
Auto       -0.24                  0.75             -0.32

The negative coefficient associated with time says that people will tend to choose the faster transportation mode, i.e., the mode with the smaller travel time. The coefficient associated with auto is also negative, but insignificant. The insignificance says that, holding travel time constant, people are indifferent between auto and mass transit.
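To show what the estimation is doing, here is a minimal Python sketch (my own illustration, not the course software) that searches for the two coefficients by maximizing the likelihood of these three observed choices; scipy's minimizer plays the role of Solver:

    import numpy as np
    from scipy.optimize import minimize

    # Transportation data from the table above: (time, auto dummy),
    # two alternatives (Auto, Mass) per person.
    X = np.array([[52.9, 1], [4.4, 0],    # person 1 (chose Mass)
                  [4.1, 1], [28.5, 0],    # person 2 (chose Mass)
                  [4.1, 1], [86.9, 0]])   # person 3 (chose Auto)
    chosen = np.array([1, 1, 0])          # index of the chosen alternative per person

    def neg_log_likelihood(beta):
        V = (X @ beta).reshape(3, 2)                          # utilities
        P = np.exp(V) / np.exp(V).sum(axis=1, keepdims=True)  # logit probabilities
        return -np.log(P[np.arange(3), chosen]).sum()

    result = minimize(neg_log_likelihood, x0=np.zeros(2))
    print(result.x)  # should land in the neighborhood of the estimates above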

Forecasting with Logit Models. Just like regression, we use the estimated coefficients and the logit formula to forecast the probability that a given person would choose either auto or mass transit. This requires three steps:

1. Calculate Vi = Σj βj xij for each of the alternatives.
2. Exponentiate these calculated Vis: Ai = exp(Vi) = exp(Σj βj xij).
3. Plug them into the logit formula: Pi = Ai / Σj Aj.

Like the first person in this data set, assume that auto travel time is 52.9 minutes and mass transit travel time is 4.4 minutes.

V1auto = –.053 x 52.9 – .24 x 1 = –3.04
V1mass = –.053 x 4.4 – .24 x 0 = –.23

exp(V1auto) = exp(–3.04) = .0478
exp(V1mass) = exp(–.23) = .795
exp(V1auto) + exp(V1mass) = .0478 + .795 = .842
P1auto = .0478 / .842 = .057
P1mass = .795 / .842 = .944

As soon as we know a person's auto and mass transit travel times, we can forecast his/her transportation choice in a similar manner.
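The three steps are easy to wrap in a small function; a quick Python sketch (the function name is mine):

    import numpy as np

    def logit_probabilities(betas, alternatives):
        # Step 1: Vi = sum of beta_j * x_ij for each alternative.
        V = np.array([sum(b * x for b, x in zip(betas, alt)) for alt in alternatives])
        A = np.exp(V)        # Step 2: exponentiate the Vis.
        return A / A.sum()   # Step 3: plug into the logit formula.

    # Person 1: (time, auto dummy) for Auto and for Mass.
    print(logit_probabilities([-0.053, -0.24], [(52.9, 1), (4.4, 0)]))
    # -> approximately [0.057, 0.943]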

Example of Modeling Detergent Choice. In this example, we will use a logit model to predict the probability of choosing different brands of laundry detergent. We want to see how effective various marketing mix elements are. Specifically, we measure the effect of price, price discount, and whether the brand was on an end-of-aisle display or featured in an ad that week. The model will also include brand dummy variables for the first three brands to capture perceived product quality or overall image, and a loyalty variable to capture past purchases. We can use this to see how many people would purchase due to an end-of-aisle display and whether that is worth the cost of paying a store to set one up.

In this example, 8 people each made 10 purchases of laundry detergent (one each month) among four brands: Wisk, All, Tide, and Yes. The first column is the consumer number. The second column is the purchase number (again, one for each month) for that consumer (1 through 10). For each purchase there are four brands listed in the third column. The fourth column is the 0–1 variable showing which brand was chosen. After the loyalty column (which I will not cover in detail), there are four merchandising variables: List Price, Price Discount, Display, and Feature Ad. These are the marketing mix elements under the control of the retailer. Finally, there is a dummy variable for each of the first three brands. Yes is the reference level.

Consumer   Month   Brand   Choice   Loyalty   List Price   Discount   Display   Feature   Wisk   All   Tide
1          1       Wisk    0        0.25      3.25         0.63       0         0         1      0     0
1          1       All     0        0.25      3.10         0.71       0         0         0      1     0
1          1       Tide    1        0.25      3.30         0.82       1         1         0      0     1
1          1       Yes     0        0.25      2.95         0.86       0         0         0      0     0
1          2       Wisk    0        0.2       3.25         0.63       0         0         1      0     0
1          2       All     0        0.2       3.10         0.71       0         0         0      1     0
1          2       Tide    0        0.4       3.67         0.82       0         0         0      0     1
1          2       Yes     1        0.2       2.95         0.60       0         1         0      0     0

When we estimate this model, we get the following set of coefficients. List price is negative: people are less likely to buy products at a higher price. The price discount coefficient is positive: people are more likely to purchase when there is a bigger discount. Notice that people are more sensitive to the amount of the discount than to the list price. People are also more likely to buy when a product is displayed or when it is in a feature ad; the sizes of these two merchandising coefficients are approximately equal. The positive coefficient for Tide says that people are significantly more likely to buy Tide than the reference brand Yes, but there is no significant difference among the choice probabilities of the other three brands.

Variable     Coefficient estimate   Standard error   t-statistic
Loyalty      1.78                   1.22             1.46
List Price   -3.54                  1.10             -3.22
Discount     10.58                  2.01             5.25
Display      1.18                   0.43             2.72
Feature      1.25                   0.45             2.77
Wisk         0.37                   0.59             0.63
All          0.67                   0.56             1.20
Tide         1.99                   0.69             2.90

Linearizable Response Functions – Multiplicative Model

Decreasing and Increasing Returns Response Functions. If either theory or an inspection of the data suggests a nonlinear relationship between one or more of the independent variables and the dependent variable, you should consider a nonlinear model. The first alternative is a nonlinear model that can be linearized through a simple transformation. This allows you to do the estimation with linear regression, which is typically easier than nonlinear least squares estimation, may be more robust, and lets the regression module in Excel provide a number of useful diagnostics like R2 and t-statistics. By far the most widely used linearizable model is the multiplicative model. It is the only one we will cover in class.

Multiplicative Model. The multiplicative model is commonly used to represent either an increasing or a decreasing returns function. This model is popular because it is a constant elasticity model, i.e., it models the response in such a way that a given percent change in an independent variable always produces the same percentage change in the dependent variable. An increasing returns model (see P5 on Exhibit 2, page 4 of the Response Models Technical Note) might occur if there are network effects or positive feedback. A decreasing returns model (see P3 on Exhibit 2, page 4 of the Response Models Technical Note) might occur if the impact of a repeated advertisement declines over time.

The two-variable multiplicative model is written as:

Y = a X1^b1 X2^b2

It is called multiplicative because the Xs are multiplied together rather than added. If the bs are less than one it is a decreasing returns model, and if they are greater than one it is an increasing returns model. We estimate the parameters by taking logarithms of the above equation:

Ln Y = Ln a + b1 Ln X1 + b2 Ln X2

To estimate the model, take the logarithms of Y, X1, and X2 and then regress Ln Y on Ln X1 and Ln X2. Once we have estimated the parameters, we can use the following equation to forecast Y:

Forecast Y = a X1^b1 X2^b2, with the estimated values of a, b1, and b2 plugged in.

Judgmental Calibration. This is also a two-parameter model when there is one independent variable, so we must ask at least two questions to judgmentally estimate the parameters. First, what is the current level of the independent variable (e.g., dollars spent on advertising or price) and the current level of the dependent variable (sales or market share)? Second, what will be the percent change in the dependent variable with a one percent increase in the independent variable? If current sales are 50 units when the price is $130 and sales will increase by 1.5% with every one percent decrease in price, we would have the following model:

Sales = a (Price)^–b => 50 = a (130)^–1.5 = a (.000675), or a = 50 / (.000675) = 74111.

If there are two independent variables, we would need to ask for the current levels of both independent variables and the single dependent variable, as well as the expected percentage change in the dependent variable with a one percent change in each of the independent variables.

Forecasting with the Multiplicative Model. The process is the same as with the linear model: "plug" the forecast values of the independent variables into the equation and solve for the value of the dependent variable. For example, in the sales example above, sales when the price is dropped to $120 are calculated as follows:

Sales($120) = a (Price)^–b = 74111 (120)^–1.5 = 56.38 units.

Profit Models. We can build profit models just like we did with the linear model. Again, the general profit function is:

Profit = Unit Sales x Margin – Fixed Cost

In the above example, the marketing variable is price, which is not a fixed cost but enters into the margin. If the unit cost is $50, then

Profit = a (Price)^–b x Margin = 74111 (120)^–1.5 x ($120 – $50) = 56.38 x $70 = $3946.60
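A quick numeric check of the calibration, forecast, and profit calculations above (a Python sketch):

    # Constant-elasticity model: Sales = a * Price**(-b)
    b = 1.5                          # judgmental elasticity: sales rise 1.5% per 1% price cut
    a = 50 / 130 ** -b               # calibrate so Sales(130) = 50   ->  a = 74111
    sales_120 = a * 120 ** -b        # forecast at $120               ->  56.38 units
    profit = sales_120 * (120 - 50)  # margin = price - $50 unit cost ->  $3946.60
    print(a, sales_120, profit)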

Measuring the Impact of Price and Display on Sales

The cheese data contain weekly unit volume, price, and a measure of display activity for several key accounts (a city–retailer combination) for approximately 65 weeks. These data are for a sliced cheese product manufactured by Borden. The measure of display activity is percent of ACV (all commodity volume) on display. Later we will look at some soft drink data that have the same information for both a focal brand and a competitor. The models may look complicated, so we will build them in steps.

First is a simple model where sales are a function of just price:

St = e^β0 Pt^β1

where St is the unit volume at time t and Pt is the price at time t. In this equation, e^β0 adjusts for the size of the market; it is the size of the market when all independent variables equal one. β1 is the price elasticity – the percent change in volume for a 1% change in price.

To estimate the model, we take natural logarithms of each side of the equation to get an equivalent model:

ln(St) = β0 + β1 ln(Pt)

We can estimate this model with regression, where ln(St) is the dependent variable and ln(Pt) is the independent variable. Once we have estimated the parameters, the estimated sales volume is:

Forecast St = e^β0 Pt^β1, with the estimated βs plugged in.

Next, assume that display activity affects volume only and not price sensitivity. This results in the following model:

St = e^(β0 + β2 Dt) Pt^β1

where Dt is the percent of ACV on display and e^β2 Dt is a multiplier for display: if there is no display activity (Dt = 0), the multiplier is e^0 = 1, and if ACV display is 1, the sales volume is increased by a factor of e^β2. β2 is thus a measure of the percentage change in volume when there is a display.

This model can be written equivalently in terms of logarithms as:

ln(St) = β0 + β1 ln(Pt) + β2 Dt

We can estimate this model with regression, where ln(St) is the dependent variable and Dt and ln(Pt) are the independent variables. Once we have estimated the regression coefficients, we can forecast sales by plugging the price and display activity into:

Forecast St = e^(β0 + β2 Dt) Pt^β1, with the estimated βs plugged in.

Next, we can complicate the model even further by assuming that a display impacts not only volume, but also price sensitivity:

St = e^(β0 + β2 Dt) Pt^(β1 + β3 Dt)

As before, β2 measures the percentage change in volume due to display activity, and β3 measures the change in price sensitivity due to display activity. Again, we can write this as an equivalent model by taking logarithms of both sides:

ln(St) = β0 + β1 ln(Pt) + β2 Dt + β3 Dt ln(Pt)

We can estimate that model with regression, where ln(St) is the dependent variable and Dt, ln(Pt), and Dt x ln(Pt) are the independent variables. Once we have estimated the regression coefficients, we can forecast sales for any level of price and display activity.

Finally, we look at two brands, i and j, where we will call brand i our own brand and brand j the other brand. Furthermore, we will model the effects of price and display activity of both our own brand and the other brand. Sales of brand i are a function of its own pricing and display activity as well as the pricing and display activity of the other brand, brand j:

ln(Sit) = β0 + β1 Dit + β2 ln(Pit) + β3 Dit ln(Pit) + β4 Djt + β5 ln(Pjt) + β6 Djt ln(Pjt)

We expect that β2 and β3 will be negative – as our own price increases, our sales will decrease. On the other hand, if the two brands are competing, we expect β5 and β6 to be positive – as the price of the other brand increases, we expect sales of our own brand to increase. Similarly, we expect β1 and β4 to be of opposite signs: display activity of our own brand should increase own-brand sales, and display activity of the other brand should decrease own-brand sales. As before, we estimate the model in its logarithmic form with regression and make forecasts once we have estimated the parameters.
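Each of these models reduces to an ordinary regression once the transformed columns are built. For example, the third single-brand model can be estimated as follows (a Python sketch; the four data points are placeholders standing in for the roughly 65 weeks of cheese data):

    import numpy as np

    # Placeholder weekly data: unit volume, price, percent ACV on display.
    sales   = np.array([120., 95., 210., 160.])
    price   = np.array([2.49, 2.69, 1.99, 2.29])
    display = np.array([0.0, 0.0, 0.6, 0.2])

    # Regressors for ln(S) = b0 + b1 ln(P) + b2 D + b3 D ln(P)
    X = np.column_stack([np.ones_like(price), np.log(price),
                         display, display * np.log(price)])
    beta, *_ = np.linalg.lstsq(X, np.log(sales), rcond=None)

    def forecast(p, d):
        # Back-transform from logs to forecast unit volume.
        return np.exp(beta @ np.array([1.0, np.log(p), d, d * np.log(p)]))

    print(beta, forecast(2.29, 0.5))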

Other Linearizable Models

We will probably not use these models, but they are very similar to the multiplicative model, so they are briefly mentioned.

Exponential Model. Rather than taking logs of both X and Y, one can take logs of only one or the other. The exponential model has the following form and can model either increasing or decreasing returns:

Y = a e^(bX)

If we take logs of both sides of this, we have

Ln Yi = Ln a + b Xi + ei = α + b Xi + ei, where α = Ln a, or a = e^α.

Therefore, if we take the logarithm of Y, but not X, we are estimating this exponential model. This is one of the curve-fitting options in an Excel chart.

Semi-logarithmic Model. It is also possible to take logs of just one or more of the X variables, i.e.,

Yi = a + b1 Ln X1i + b2 X2i + ei

Typically, we might choose this model when we expect one of the independent variables to display a nonlinear relationship to Y. This might occur if X1 is a size variable, like number of employees. There may be large differences between small and medium-sized companies, but smaller differences between large and very large companies.

Example. The spreadsheet NonlinearAdvSales.xls provides an example of nonlinear modeling. It happens to have been done within the chart option of Excel rather than with regression; however, the appropriate columns allow you to run regressions with the multiplicative, exponential, and semi-logarithmic models. Look at the R2s and the plots of the residuals to choose the most appropriate model.

Estimating Nonlinear Models with Solver

It is also possible to estimate response models with Excel's Solver add-in (read the Excel Solver Technical Note in the WebCT Technical Notes folder). Solver searches for values of cells, or parameters, that maximize or minimize another cell, which is a function of the parameters. When estimating a response model, we will be searching for parameters (like regression weights) that minimize the sum of squared errors between the predicted and actual dependent variable. We can also use Solver to find values of marketing mix elements that maximize profits.

The spreadsheet NonlinearLeastSquares.xls contains two examples of estimating response models using Solver. The first is the linear regression dealing with the weight-loss problem we saw earlier (see the IntroReg sheet, Sheet1, and Chart1). The second deals with the ADBUDG model (see the IntroADBUDG sheet, Sheet2, and Chart2). In either case, the steps are the same and are given in the two Intro sheets. The following description is for the weight loss advertising example:

1. Select locations for the parameters you want to estimate and put in initial guesses. Select cells that are contiguous – A3 and B3.

2. Place the independent variable (in this case, number of advertisements) in a column (in this case, column B).

3. Place the dependent variable (calls) in a column (C). Calculate the mean of that column.

4. Create a column that uses the parameters to estimate the dependent variable (D).

5. Create a column (E) that is the squared difference between the dependent variable and the predicted dependent variable, (C – D)². Sum this column. That sum, which is the residual (or error) sum of squares, is the number you want to minimize. Solver should search over different values of the parameters (A3 and B3) to minimize this cell.

6. This is not required to estimate the parameters, but create a column (F) that is the total sum of squares, i.e., the squared difference between the dependent variable (C) and its mean. The purpose of this is to allow you to calculate R².

7. Calculate R².

8. To use Solver with either Excel 2003 or Excel 2007, click on Data then Solver. (If Solver has not been installed in Excel 2003, click on Tools, then Add-Ins, and check Solver. If Solver has not been installed in Excel 2007, click the Microsoft Office Button, Excel Options, Add-Ins, Manage Excel Add-ins, Go.) After the Solver dialog box comes up, select the cell to be minimized (the sum of the squared errors), click minimize, select the cells to be searched over (A3:B3), add any appropriate constraints (none are needed in this case), and click Solve. (The code sketch below performs the same search.)
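The same minimization can be written outside Excel; here is a sketch using scipy, with the four months of weight-loss data shown earlier standing in for the full 12-month series:

    import numpy as np
    from scipy.optimize import minimize

    ads = np.array([3., 2., 3., 3.])           # independent variable (step 2)
    calls = np.array([113., 98., 147., 115.])  # dependent variable (step 3)

    def sse(params):                        # the cell Solver minimizes (step 5)
        a, b = params
        return ((calls - (a + b * ads)) ** 2).sum()

    result = minimize(sse, x0=[0.0, 0.0])   # search over the two parameters (step 1)
    a, b = result.x
    r2 = 1 - sse(result.x) / ((calls - calls.mean()) ** 2).sum()  # steps 6-7
    print(a, b, r2)  # with all 12 months this reproduces 12.63, 36.10, and .91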

ADBUDG

ADBUDG is a flexible model that was developed for judgmental data. It can represent either an s-shaped model, where increasing returns occur up to a point and decreasing returns after that, or a concave model, which always has decreasing returns. The s-shaped model is appropriate for the situation where there is little response until we spend more than a certain amount, then sales increase rapidly for a period, but at some point advertising becomes increasingly less effective.

ADBUDG is the four-parameter function

Y = b + (a – b) X^c / (d + X^c)

where:

b is the minimum value of Y – "what will sales be if you do not do any advertising or promotion?"
a is the maximum value of Y – "what will sales be if you spend an infinite amount on advertising?"
c controls the shape of the curve; the curve is concave if 0 < c < 1 and s-shaped if c > 1, and
d works with c to control how quickly the curve rises.

Statistical Estimation using Solver. This is based on the model in Sheet2 of the NonLinearLeastSquares.xls spreadsheet.

1. There are four parameters, a–d. They are placed in A6:D6. Initial values are selected: b is set at a minimum value, a is the maximum value, c is set at 2 (I always do that), and d is set at 20 (that is hard to explain why).

2. The independent variable (marketing effort) is placed in column A.

3. The dependent variable (sales) is in column B. The mean of the column is at the bottom.

4. Forecast sales (Yhat) is in C – check out how this was calculated – I just plugged in the ADBUDG function using the parameter cells.

5. Calculate a column of squared errors, (C – B)². The sum is at the bottom. Create a column of TSS by taking the squared difference between the dependent variable and its mean. Sum this column.

6. Estimate R².

7. To run Solver, click Tools, then Solver. We want to minimize the sum of the squared errors by searching over the parameters (A6:D6). Here we should put some constraints on the parameters: A6:D6 > 0 and B6 < A6. (A code version of this fit appears below.)
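The equivalent search in code (a Python sketch; the effort and sales arrays are placeholders standing in for columns A and B of Sheet2):

    import numpy as np
    from scipy.optimize import minimize

    def adbudg(x, a, b, c, d):
        return b + (a - b) * x ** c / (d + x ** c)

    effort = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0])  # placeholder column A
    sales = np.array([31., 38., 45., 50., 53., 55.])   # placeholder column B

    def sse(params):
        return ((sales - adbudg(effort, *params)) ** 2).sum()

    # Step 1 initial values: a = max, b = min, c = 2, d = 20.
    x0 = [sales.max(), sales.min(), 2.0, 20.0]
    # Keep the parameters positive (the b < a constraint is left informal here).
    result = minimize(sse, x0, bounds=[(1e-6, None)] * 4)
    print(result.x)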

Judgmental parameter estimation. There are four parameters, and they can be uniquely determined with four estimates. Usually these estimates are in terms of changes from the current situation. By what percent would sales grow (shrink) if you used a saturation level of (did no) advertising? By what percent would sales increase if you spent 50% more on advertising? We assume sales would remain constant if your level of advertising remained constant.

b = y(0), i.e., the percent of current sales that would be retained if advertising were cut to zero;
a = y(∞), i.e., the percent of current sales you would reach if the advertising level were infinite;
y(1) is sales at the current level;
y(1.5) is the percent of current sales you would sell if you spent 50% more on advertising.

Because 1^c = 1, we can solve for d with the following formula:

y(1) = b + (a – b) / (d + 1)

Going through some algebra, we can see d is equal to the following:

d = (a – y(1)) / (y(1) – b), which, when y(1) = 1, gives d = (a – 1) / (1 – b).

Assuming that the person also provided an estimate of y(1.5), we can solve for c. After more algebra:

c = ln[ d (y(1.5) – b) / (a – y(1.5)) ] / ln(1.5)

For example, assume a manager believes that sales would drop to 60% of current sales without any advertising, rise to 2X current sales with saturation advertising, and rise to 1.3X current sales with 1.5X as much advertising. This would generate the following parameters:

a = 2.0
b = .6
d = (2.0 – 1) / (1 – .6) = 2.5
c = ln[2.5 (1.3 – .6) / (2.0 – 1.3)] / ln(1.5) = ln(2.5) / ln(1.5) ≈ 2.26
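The same algebra in code (a Python sketch; the function name is mine):

    import math

    def calibrate_adbudg(y0, y_inf, y_150):
        # Recover a, b, c, d from the three judgments, taking y(1) = 1.
        b, a = y0, y_inf
        d = (a - 1) / (1 - b)  # from y(1) = b + (a - b)/(d + 1) = 1
        c = math.log(d * (y_150 - b) / (a - y_150)) / math.log(1.5)
        return a, b, c, d

    print(calibrate_adbudg(0.6, 2.0, 1.3))  # -> (2.0, 0.6, c of about 2.26, d = 2.5)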
The cases in the book – Conglom, Syntex, and Blue Mountain Coffee – all use a slightly different method. They ask for the estimated change in sales from current sales at the following four levels of marketing effort: 0, 50% of current, 150% of current, and saturation. The method implicitly assumes that the current level of marketing effort is going to result in the current level of sales.

The first three parameters, a, b, and d, are estimated in the very same way as above:

b = y(0)
a = y(∞)
y(1) is sales at the current level, and since y(1) = 1, d = (a – 1) / (1 – b).

For the other two estimates, a nonlinear least squares procedure is used with the observations y(.5) and y(1.5) to estimate c. The errors in the two estimates are:

e(0.5) = y(0.5) – [b + (a – b)(0.5)^c / (d + (0.5)^c)]
e(1.5) = y(1.5) – [b + (a – b)(1.5)^c / (d + (1.5)^c)]

We use Solver to search for the value of c that minimizes e(0.5)² + e(1.5)². Both of these procedures are illustrated in the spreadsheet ADBUDGJudmental.xls.

Forecasting with ADBUDG Models. We do this the very same way we did forecasting with a regression model. The first step is to estimate the four parameters of the ADBUDG model. This can be done either with marketplace data (see the ADBUDG portion of the NonLinearLeastSquares.xls spreadsheet) or with judgmental data (see the ADBUDGJudmental.xls spreadsheet). Continuing with the example from the NonLinearLeastSquares.xls spreadsheet, as with the regression model, we use the estimated coefficients and plug the expected "marketing effort" into the ADBUDG function to forecast unit sales. With zero marketing effort we would expect 30.3 sales, and with an infinite amount of marketing effort we would expect 56.7 sales (i.e., the estimated b is 30.3 and the estimated a is 56.7).

Profit Models. In addition to forecasting sales we can also forecast profits. The general model is the same as earlier:

Profit = Unit Sales x Margin – Fixed Cost

Continuing with the same example and assuming that the margin is $2 and the cost of a unit of marketing effort is $1.50:

Profit = $2 x Forecast sales – $1.50 x marketing effort

The Profit worksheet calculates forecast sales and profits for different levels of marketing effort. This is graphed in Chart3. Cells H13:H17 of that same sheet allow you to use Solver to find the level of marketing effort that maximizes profits.

Clustering for Segmentation and Classification

Rather than assuming that the data can be represented by a line (or hyperplane) as in regression, cluster analysis assumes that the data can be represented by a much smaller set of points in a space. That is, most of the data points are expected to "cluster" around one of a small number of points. This smaller set of points can then adequately represent the data, just as a line can adequately represent the data in a regression.

In most cases, we will be clustering people to form market segments. We can think of each person as a point in an n-dimensional space, where n is the number of variables on which we have data. For example, in a demographic segmentation, we could have variables for age, income, educational level, marital status, and region of the country. If we did an attitudinal segmentation, each person would be represented by their answers to a number of attitudinal questions. We want to learn if there is some structure to the data, i.e., are people spread out uniformly or are there distinct groups or segments? Examples might be high-income professionals, liberals, conservatives, etc.

The Segmentation and Classification program in ME>XL has two options: hierarchical clustering and k-means. Hierarchical clustering is the default method, but I think k-means is more valuable. The hierarchical clustering program starts with each point in its own cluster and goes through a series of steps. In each step it combines the two clusters that are most similar; the new cluster is located at the centroid, or average, of the two clusters that have been combined. It continues through this process until there is only one cluster. At each step the two clusters that are joined together are the two that would increase the Error Sum of Squares (ESS) by the least amount – essentially, it combines the two clusters that are closest together. (In regression, RSS, the residual sum of squares, is the same thing as the error sum of squares.) At each step the ESS is the total error sum of squares associated with that number of clusters. A smaller ESS means a better fit. As the data are aggregated into fewer and fewer clusters, the ESS will continue to rise.

The k-means program uses the hierarchical solution as a starting point and "optimizes" that solution by sequentially moving each point to each cluster to see if the fit improves. It reports fit as a ratio of between to within sum of squares. This ratio is not that meaningful by itself, but it can be transformed into a number that is similar to an R2.

Because the clusters are described in terms of the means of the points in the cluster, i.e., the centroid, the Total Sum of Squares = Between Sum of Squares + Within Sum of Squares, or TSS = BSS + WSS. This is like regression, in which TSS = ESS + RSS. Here BSS corresponds to ESS, the explained sum of squares, and WSS corresponds to RSS, the residual sum of squares.

The WSS is the sum of the squared distances from all points represented by a cluster to the centroid of that cluster. It measures how well the cluster centroid represents that set of points. It is like Σe², the sum of the squared errors in regression – the sum of the squared distances of the data points from the line. BSS is a weighted sum of the squared distances between each pair of group centroids, where the weight is the number of points represented by each cluster. BSS is bigger when the groups are further apart, i.e., better separated.

In regression, R2 = ESS / TSS. We would like a similar statistic, but the k-means clustering program reports only the ratio BSS/WSS. However, we can go through a little algebra to calculate an R2:

R2 = BSS / TSS = BSS / (BSS + WSS)

If we divide the numerator and denominator of the right-hand side by WSS, we get:

R2 = (BSS/WSS) / (BSS/WSS + 1) = ratio / (ratio + 1)

where ratio = BSS/WSS is the quantity printed out by the program. Therefore, we can calculate R2 = ratio/(ratio + 1) from the k-means output.
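If you were doing this outside ME>XL, the same numbers can be computed with, for example, scikit-learn's k-means (a sketch; data is a random placeholder for the segmentation variables, since the PDA data are not reproduced here):

    import numpy as np
    from sklearn.cluster import KMeans

    data = np.random.rand(100, 5)   # placeholder segmentation variables

    tss = ((data - data.mean(axis=0)) ** 2).sum()   # total sum of squares
    for k in range(1, 9):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
        wss = km.inertia_            # within-cluster sum of squares (the ESS here)
        ratio = (tss - wss) / wss    # BSS/WSS
        print(k, round(ratio, 3), round(ratio / (ratio + 1), 2))  # B/W and R2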

We can use either the R2 or the ESS to help determine the proper number of clusters. In either case, we look for a big improvement in fit up to a certain number of clusters and a small improvement after that; this is called an "elbow."

In the PDA data, we had the following between/within sum of squares ratios (B/W), variance accounted for (VAF), and ESSs for differing numbers of clusters:

Clusters   B/W      VAF    ESS
1          –        –      2.84
2          .3609    0.27   1.53
3          .6964    0.41   0.83
4          .9594    0.49   0.67
5          1.211    0.55   0.44
6          1.441    0.59   0.31
7          1.639    0.62   0.31
8          1.793    0.64   0.25

Looking first at ESS, when we go from one to two clusters, the ESS drops by 1.31 (= 2.84 – 1.53). It drops by .70 (= 1.53 – .83) when we go from two to three clusters, by .16 going from three to four clusters, etc.

If there is a clear "elbow," we would choose that number of clusters. For example, suppose the ESSs for one to five clusters were the following: 2.84, 1.53, .83, .80, and .75. We would see that there is a large drop in ESS as we go from one to three clusters, but that little is gained after that. With real data we do not usually get this clean a solution, and we must look at other things such as the size and interpretability of the clusters.

Similarly, if we saw R2s that increased .27, .41, .49, .55, .59, we might choose the solution with an R2 of .49, or possibly .55, as the gains get smaller after that.

In our data, we see that little is gained after six clusters and quite a bit is gained for the first three or four clusters. This says that the proper number is probably between 3 and 6. We need to look at the size of the clusters and the interpretation of the new clusters to make a determination of the optimal number.

Discriminant Analysis

In addition to clustering, the Segmentation and Classification program also performs a "discrimination." Once clusters have been formed on one set of variables, say attitudinal, the program attempts to see if there are differences among these clusters in terms of another set of variables, say demographic. So it may try to determine if there are demographic differences between liberals and conservatives. This is accomplished through a statistical technique called discriminant analysis.

Discriminant analysis shares some similarities with both cluster analysis and regression. Like regression, it is a statistical technique that determines the best linear relationship between a set of independent variables and a dependent variable:

Yi = a + b1 X1i + b2 X2i + b3 X3i + ei

Regression finds the linear combination, i.e., the set of a and bs, that best explains the variation in the dependent variable – the combination of a, b1, b2, and b3 that minimizes Σe² or maximizes R2. The dependent variable is assumed to be interval scaled, and the relationship between the dependent and independent variables is assumed to be representable as a straight line (actually a hyperplane in multiple regression).

In discriminant analysis, Yi is a categorical variable, i.e., group membership. Categorical variables are just different, e.g., male and female (or the benefit clusters in the PDA case); there is no order to them. Discriminant analysis finds the linear combination (or linear combinations) that best separates the groups or, equivalently, that does the best job of predicting group membership.

For example, if a market is segmented into benefit or needs clusters, we might use discriminant analysis to see if a linear combination of demographic variables can separate these groups, i.e., determine which demographic variables best differentiate, or separate, these segments. Stated differently, we want to see if the clusters differ significantly in terms of demographic variables.

Rather than estimating a line that all points lie close to, discriminant analysis estimates a function such that the scores of all observations in one group are close to each other and far from the scores of the other groups. So, rather than interpreting the data in terms of a straight line, we interpret the data in terms of a single point for each group. Can we adequately represent our data as a set of points? We want a lot of variation (or distance) between groups – this is called the Between-group Sum of Squares, BSS – and very little variation within groups – called the Within-group Sum of Squares, WSS – i.e., we want all points to be close to the centroid of the group to which they belong. Like regression, which attempts to minimize RSS or maximize ESS, discriminant analysis attempts to maximize a function of the ratio BSS/WSS.

Like clustering, the statistics are based on WSS, the within-group sum of squares, and BSS, the between-group sum of squares. The big difference from clustering is that in clustering we do not know which cluster each observation is in before we start – we do not even know how many clusters there are. In discriminant analysis, we know which group each observation is in, and we want to find out if there are any differences in a set of independent variables among observations in different groups.

Suppose we wanted to discriminate between a group of males and a group of females. We want to predict group membership, a person's gender, i.e., we want to find a function of the independent variables that gives high scores to one gender and low scores to the other.

Suppose we measure people in terms of height, weight, shoe size, eye color, grade point average, and GMAT. The discriminant function would look like the following:

Genderi = α + β1 Hti + β2 Wti + β3 SSi + β4 ECi + β5 GPAi + β6 GMATi

If male is coded as one and female as zero, we want a set of α and βs that give scores close to one to men and scores close to zero to women. In this case we might expect that β1, β2, and β3 would be greater than zero, i.e., on average men tend to be physically bigger than women. If the sample consisted of MBA students, we might not expect a significant difference in GPA or GMAT between men and women. This function tells us which variables differ significantly between men and women. We can use this function (called a discriminant function) to predict the gender of a given person, given knowledge of their height, weight, shoe size, eye color, GPA, and GMAT. Some larger women would get incorrectly classified as men and some smaller men would get incorrectly classified as women.

One measure of the quality of a discriminant analysis is the proportion of observations that are correctly classified. This is kind of like an R2 – the amount of explained variance.
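As an illustration of the mechanics (not the ME>XL program itself), here is a sketch using scikit-learn's discriminant analysis; X and gender are random placeholders standing in for the measurements described above:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))          # height, weight, shoe size, etc. (placeholder)
    gender = rng.integers(0, 2, size=200)  # 1 = male, 0 = female (placeholder)

    lda = LinearDiscriminantAnalysis().fit(X, gender)
    accuracy = (lda.predict(X) == gender).mean()
    print(accuracy)  # proportion of observations correctly classified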

If there are only two groups, we can model their centroids in terms of two points on a line. That is, the space will be one-dimensional. When we have three groups, we will need to locate them as points in a two-dimensional space, unless one of the groups falls on the line between the other two groups. This means there will be two linear combinations, or discriminant functions, one for each dimension in the space. The first explains as much of the variation as possible, i.e., maximizes BSS/WSS. The second function explains as much of the residual variation as possible, subject to the constraint that it is orthogonal (perpendicular) to the first function. There can be no more dimensions (discriminant functions) than one fewer than the number of groups. If we have four groups, we can have at most three discriminant functions. One of the outputs of a discriminant analysis is the amount of variance that is explained by each dimension, as well as the cumulative variance explained by all dimensions up to and including the last. This can be interpreted like the fit statistics in clustering. You want to balance a small number of dimensions with as much explanatory power as possible. As in cluster analysis, where the question is whether one should add one more cluster to the solution, the question with discriminant analysis is whether another dimension is needed to adequately represent the groups.

In the following table, taken from the four-cluster needs-based PDA solution, only two discriminant functions are needed to adequately represent the demographic clusters, as they capture 80% of the variance.

Discriminant function   Percent of variance   Cumulative percent   Significance level
1                       48.49                 48.49                .000
2                       30.59                 79.09                .000
3                       20.91                 100.00               .015

It would be nice if the discriminant analysis program printed out the actual discriminant functions, which would be like printing out the regression weights. Unfortunately, ours prints out the correlations between the independent variables and the discriminant functions. The correlations are related to the discriminant function weights but are not the same thing. They do show the direction of each weight and which weights are more important in determining each function.

Variable     Func1    Func2
PDA          .708     .132
Income       .669     .086
Bus_Week     .635     -.089
Education    .622     -.011
Professnl    .591     .137
M_Gourmet    .456     .064
PC_Mag       .354     -.024
Field&Stre   -.277    .674
Construct    -.187    .660
Sales        -.045    -.512
Emergency    -.265    .424
Service      -.356    -.328
Age          -.076    .030

Again, these correlations are taken from the same PDA discriminant analysis. This says the first dimension is primarily PDA, Income, Education, Professional, etc., and the second dimension is Field&Stream, Construction, Sales, and Emergency.

Following is a plot of the group centroids from the positioning analysis program, which is similar to discriminant analysis. The first dimension has been reversed, as professional, PDA, etc. are located on the left side of the space.

The first, horizontal, dimension separates the professionals from the other groups and has all the first discriminant function variables lying on that axis. The second, vertical, dimension is sales, construction, service, and Field & Stream. This shows where the groups fall relative to each other demographically.

Perceptual Mapping

This section will cover perceptual mapping, or positioning analysis, using factor analysis. Cluster analysis tries to group observations (e.g., people) that are similar into groups; factor analysis tries to "group" variables that are similar. If two (or more) variables are highly correlated, then a single variable could do a fairly good job of representing both. Factor analysis replaces a set of correlated variables with a linear combination of them that retains as much of their information (variance) as possible. These linear combinations are called factors. This allows us to represent a set of objects in a reduced dimensional space. For example, following is a set of 10 cars (from several years ago) that have been rated on seven attributes:

Attribute      BMW      Cavalier   Intrepid   Taurus   Accord   Altima   Saturn   Subaru   Camry    VW Passat
Fuel Econ      -0.413   -0.152     -0.891     -0.543   0.413    0.065    0.587    0.021    0.587    0.326
Reliability    0.573    -1.034     -0.73      -0.73    0.921    0.182    -0.034   0.182    0.834    -0.165
Style          1.43     -1.091     -0.221     -1.004   0.517    0.12     -0.569   -0.134   0.43     0.517
Price          -1.465   0.969      -0.204     0.404    -0.247   0.273    0.969    -0.334   -0.16    -0.204
Fun to Drive   1.704    -1.078     0.139      -1.034   0.182    0.008    -0.6     0.182    0.095    0.4
Safety         0.652    -0.782     -0.217     0        0.217    0        -0.173   0.26     0.217    -0.173
Space          -0.543   -0.673     0.326      0.5      0.108    0.108    -0.282   0.63     0.195    -0.369

These numbers have been scaled so the average rating on each attribute is 0.0. Positive numbers indicate above-average ratings and negative numbers represent lower-than-average ratings. We can think of these cars as located in a 7-dimensional space. We cannot visualize things in seven dimensions, but we could plot pairs of dimensions, e.g., plot the cars on the dimensions of fuel economy and reliability, then fuel economy and style, etc.

The Positioning Analysis program uses factor analysis to derive a smaller number of dimensions that contain as much information from the original variables as possible. Following is a correlation table of the attributes:

36

Page 37: Marketing Engineering Notes

             Fuel  Reliability  Style  Price    Fun  Safety  Space
Fuel         1.00
Reliability  0.59         1.00
Style        0.18         0.77   1.00
Price        0.23        -0.55  -0.87   1.00
Fun         -0.05         0.61   0.95  -0.93   1.00
Safety       0.06         0.78   0.76  -0.81   0.75    1.00
Space       -0.17         0.09  -0.19  -0.06  -0.16    0.29   1.00

Five of these attributes, Reliability, Style, Price, Fun to Drive, and Safety, are highly (positively or negatively) correlated. If two variables are positively correlated, then a car that is perceived to be higher (or lower) than average on one of them is likely to be perceived as higher (or lower) on the other as well. If two variables are perfectly correlated, then the second one contains no new information, and their sum would contain just as much information as the two variables by themselves. If several variables are correlated, then their weighted sum will contain most of the information in all of them individually. In this example, these five attributes can be represented as a single dimension in a perceptual space without too much loss of information. Of the other two attributes, Fuel Economy is correlated with Reliability but with no other attributes, and Space is relatively uncorrelated with any other attribute.
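As an aside, a correlation table like this can be reproduced outside the course software with a few lines of Python. The sketch below assumes the scaled ratings have been typed into a pandas DataFrame; only two of the seven attributes are entered, to keep it short, and the rest follow the same pattern:

# A minimal sketch: reproduce part of the correlation table from the
# scaled ratings (two attributes shown; the others work the same way).
import pandas as pd

ratings = pd.DataFrame({
    "Fuel":        [-0.413, -0.152, -0.891, -0.543, 0.413,
                    0.065, 0.587, 0.021, 0.587, 0.326],
    "Reliability": [0.573, -1.034, -0.730, -0.730, 0.921,
                    0.182, -0.034, 0.182, 0.834, -0.165],
}, index=["BMW", "Cavalier", "Intrepid", "Taurus", "Accord",
          "Altima", "Saturn", "Subaru", "Camry", "VW Passat"])

print(ratings.corr().round(2))   # Fuel-Reliability comes out near 0.59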

In a two-dimensional space, we might guess that the first (horizontal) dimension will represent these first five attributes and that the second dimension will represent some combination of the other two. The positioning program produces the following perceptual map:

[Perceptual map of the 10 cars and 7 attributes]

We see that Fun to Drive, Safety, Style, and Reliability all lie close to each other, and Price, which was negatively correlated with them, points in the opposite direction. The first dimension accounts for 55.1% of the variation in the data. The second dimension, which is primarily Fuel Economy, accounts for 20.1% of the variation. Additionally, brands that are similar are located close to each other in the space, e.g., Camry and Accord. Brands that are distinct, like BMW and Taurus, are located away from the other brands.
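If you would like to see roughly how such a map is built, the following Python sketch uses principal components analysis, a close relative of the factor analysis in the Positioning Analysis program. It is a stand-in, not the program's method, so its coordinates and variance percentages will not match the map above exactly:

# A sketch using principal components on the standardized car ratings;
# PCA is a substitute for the program's factor analysis, so the numbers
# will differ somewhat from the map above.
import numpy as np
from sklearn.decomposition import PCA

brands = ["BMW", "Cavalier", "Intrepid", "Taurus", "Accord",
          "Altima", "Saturn", "Subaru", "Camry", "VW Passat"]
# rows = attributes (Fuel, Reliability, Style, Price, Fun, Safety, Space)
ratings = np.array([
    [-0.413, -0.152, -0.891, -0.543, 0.413, 0.065, 0.587, 0.021, 0.587, 0.326],
    [0.573, -1.034, -0.730, -0.730, 0.921, 0.182, -0.034, 0.182, 0.834, -0.165],
    [1.430, -1.091, -0.221, -1.004, 0.517, 0.120, -0.569, -0.134, 0.430, 0.517],
    [-1.465, 0.969, -0.204, 0.404, -0.247, 0.273, 0.969, -0.334, -0.160, -0.204],
    [1.704, -1.078, 0.139, -1.034, 0.182, 0.008, -0.600, 0.182, 0.095, 0.400],
    [0.652, -0.782, -0.217, 0.000, 0.217, 0.000, -0.173, 0.260, 0.217, -0.173],
    [-0.543, -0.673, 0.326, 0.500, 0.108, 0.108, -0.282, 0.630, 0.195, -0.369],
]).T                                           # transpose: rows become brands

X = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)  # standardize attributes
pca = PCA(n_components=2)
coords = pca.fit_transform(X)                  # brand locations on the two dimensions
print(pca.explained_variance_ratio_)           # variance captured by each dimension
for name, (d1, d2) in zip(brands, coords):
    print(f"{name:10s} {d1:+.2f} {d2:+.2f}")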

Construction of Joint Spaces

The information needed to locate preference vectors or ideal points in the space consists of consumers' preferences or purchase likelihoods for each brand. A regression is used to find the relationship between the brand locations and preferences. Preference, or purchase likelihood, Pj, is the dependent variable, and the brand locations, X1j and X2j, are the independent variables.

Preference vectors. The location of a preference vector is determined by a regression of the form shown in the next equation:

Pj = a + b1 X1j + b2 X2j + ej,   j = 1, ..., n

where a is the intercept, b1 and b2 are regression weights, and ej is an error term.

This procedure is illustrated with an example involving one person's likelihood of

purchasing each automobile on a 0 to 10 point scale (where 10 means the person was very likely

to purchase the automobile).

Respondent / Brands  BMW  Cavalier  Intrepid  Taurus  Accord  Altima  Saturn  Subaru  Camry  VW Passat
Bill                   6         2         2       4      10       8       2       8     10          8

The brand locations are given in the "Diagnostics" page of the ME>XL output under Coordinates. The above row of preferences has been pasted (with Paste Special) into the last column:

Brands       Dim 1     Dim 2   Bill
BMW         0.6245    0.2516      6
Cavalier   -0.5663   -0.0962      2
Intrepid   -0.1211    0.5908      2
Taurus     -0.3228    0.3792      4
Accord      0.2363   -0.2636     10
Altima      0.0044   -0.0710      8
Saturn     -0.2366   -0.4502      2
Subaru      0.1075    0.1541      8
Camry       0.2061   -0.3224     10
VW Passat   0.0681   -0.1723      8

A regression is run with the preferences as the dependent variable and the two coordinates as the independent variables. The regression gives the following results:


SUMMARY OUTPUT

Regression Statistics
Multiple R           0.75
R Square             0.56
Adjusted R Square    0.44
Standard Error       2.45
Observations         10

           Coefficients  Standard Error  t Stat  P-value
Intercept          6.00            0.78    7.73     0.00
Dim 1              6.47            2.45    2.64     0.03
Dim 2             -3.46            2.45   -1.41     0.20

The positive sign on the regression weight for X1 and the negative sign on X2 indicate that likelihood of purchase increases as one moves to the lower right in the space. Because the weight on the first dimension is approximately twice as large, in absolute value, as the weight on the second (6.47 versus -3.46), dimension one is more important in determining likelihood of purchase.

The preference vector is located by drawing a vector through the origin that decreases 3.46 units vertically for every 6.47 units horizontally. The preference vector for this person is shown in the following figure. It can be drawn manually by starting at the origin and moving down some multiple of 3.46 units for the same multiple of 6.47 units to the right.

[Perceptual map with Bill's preference vector]
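If you want to check this regression outside Excel, the following short Python sketch reproduces the fit from the coordinates and preferences tabulated above. It is my sketch, not part of ME>XL:

# A minimal sketch: refit Bill's preference-vector regression with
# ordinary least squares; the coefficients should match the SUMMARY
# OUTPUT above up to rounding.
import numpy as np

coords = np.array([
    [0.6245, 0.2516], [-0.5663, -0.0962], [-0.1211, 0.5908],
    [-0.3228, 0.3792], [0.2363, -0.2636], [0.0044, -0.0710],
    [-0.2366, -0.4502], [0.1075, 0.1541], [0.2061, -0.3224],
    [0.0681, -0.1723]])                              # (Dim 1, Dim 2) per brand
prefs = np.array([6, 2, 2, 4, 10, 8, 2, 8, 10, 8])   # Bill's ratings

X = np.column_stack([np.ones(len(prefs)), coords])   # intercept, Dim 1, Dim 2
b, *_ = np.linalg.lstsq(X, prefs, rcond=None)
print(b)   # approximately [6.00, 6.47, -3.46]
# The preference vector is the ray through the origin in the direction
# (b[1], b[2]), i.e., 6.47 to the right for 3.46 down.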

Ideal points. This formulation assumes that the squared distance between a brand (X1j, X2j) and an ideal point (Y1, Y2) in two dimensions is inversely related to preference for the brand, i.e., an ideal point is located close to the brands preferred by that person.

The following equation models preference for a given brand as a linear function of its squared distance from the ideal point in a two-dimensional perceptual space:

Pj = a + b[(X1j - Y1)^2 + (X2j - Y2)^2]

where Pj, a, b, and Xij are defined as before, and Yi is the location of the ideal point on the ith dimension. A negative sign on b indicates that preference for a brand decreases the further it is from the ideal point.[1] This relationship between brand preference and squared distance from the ideal point can be used to locate the ideal point. Stated differently: we know a person's preference for each brand, Pj, and each brand's location in the perceptual space, Xij; we want to find the location of the ideal point, Yi. The above equation is transformed into the following nonlinear regression equation:

Pj = a + b[(X1j - Y1)^2 + (X2j - Y2)^2] + ej

where Pj is the preference for the jth brand, (Y1, Y2) are the coordinates of the ideal point, (X1j, X2j) are the coordinates of the jth brand, and a and b are regression weights.

We can expand the above equation as follows:

Pj = a + b(X1j^2 + X2j^2) - 2bY1 X1j - 2bY2 X2j + b(Y1^2 + Y2^2) + ej

[1] If the sign on b is positive, it indicates that preference increases as a brand moves away from the ideal point. In this case the ideal point is called an "anti-ideal point," as it indicates a position of minimum, rather than maximum, preference.


Remember, the goal is to find estimates for (Y1, Y2). The above equation is rewritten so preference is a function of just the X's as follows:

Pj = b0 + b3(X1j^2 + X2j^2) + b1 X1j + b2 X2j + ej

where b0 = a + b(Y1^2 + Y2^2), b3 = b, and bi = -2bYi for i = 1, 2.

The location of the ideal point on the ith dimension, Yi, is given by:

Yi = -bi / (2 b3),   for i = 1, 2.

While the math may look complicated, it just shows that it is possible to run a regression where Pj is the dependent variable and X1j^2 + X2j^2, X1j, and X2j are the three independent variables. The location of the ideal point is then given by the above equation. If b3 is negative, then Yi represents an ideal point - a place of maximum preference. If b3 is positive, then Yi represents an anti-ideal point - a point of minimum preference.

Again, this is illustrated with the data from the same person as before. The only difference is the addition of the X1j^2 + X2j^2 term to the previous regression.

Brands       Dim 1     Dim 2   Dim 1^2 + Dim 2^2   Bill
BMW         0.6245    0.2516              0.4533      6
Cavalier   -0.5663   -0.0962              0.3300      2
Intrepid   -0.1211    0.5908              0.3637      2
Taurus     -0.3228    0.3792              0.2480      4
Accord      0.2363   -0.2636              0.1253     10
Altima      0.0044   -0.0710              0.0051      8
Saturn     -0.2366   -0.4502              0.2587      2
Subaru      0.1075    0.1541              0.0353      8
Camry       0.2061   -0.3224              0.1464     10
VW Passat   0.0681   -0.1723              0.0343      8

Again, one would expect the ideal point to be in the center of the Japanese cars and

closest to the Accord and Camry. The following regression is run:


SUMMARY OUTPUT

Regression Statistics
Multiple R           0.93
R Square             0.86
Adjusted R Square    0.79
Standard Error       1.50
Observations         10

                   Coefficients  Standard Error  t Stat  P-value
Intercept                  8.55            0.86    9.97     0.00
Dim 1                      6.18            1.50    4.11     0.01
Dim 2                     -0.99            1.65   -0.60     0.57
Dim 1^2 + Dim 2^2        -12.74            3.57   -3.57     0.01

The coefficient associated with the X1j^2 + X2j^2 term is negative, so this represents an ideal point. The coordinates are given by Yi = -bi / (2 b3):

Y1 = -6.18 / {2 * (-12.74)} = .24 and

Y2 = -(-.99) / {2 * (-12.74)} = -.04
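These calculations can be verified with a short Python sketch of my own (not part of ME>XL): it refits the quadratic regression on the same data and applies Yi = -bi / (2 b3) to the fitted coefficients.

# A minimal sketch: refit the ideal-point regression (intercept, Dim 1,
# Dim 2, and Dim 1^2 + Dim 2^2) and recover the ideal point. Results
# should match the SUMMARY OUTPUT above up to rounding.
import numpy as np

coords = np.array([
    [0.6245, 0.2516], [-0.5663, -0.0962], [-0.1211, 0.5908],
    [-0.3228, 0.3792], [0.2363, -0.2636], [0.0044, -0.0710],
    [-0.2366, -0.4502], [0.1075, 0.1541], [0.2061, -0.3224],
    [0.0681, -0.1723]])
prefs = np.array([6, 2, 2, 4, 10, 8, 2, 8, 10, 8])

d2 = (coords ** 2).sum(axis=1)                        # Dim 1^2 + Dim 2^2
X = np.column_stack([np.ones(len(prefs)), coords, d2])
b, *_ = np.linalg.lstsq(X, prefs, rcond=None)
print(b)                                  # approximately [8.55, 6.18, -0.99, -12.74]

ideal = -b[1:3] / (2 * b[3])              # Yi = -bi / (2 * b3)
print(ideal)                              # approximately [0.24, -0.04]
# b[3] is negative, so this is an ideal point (maximum preference).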

This location is different from the one generated by ME>XL plotted in the next figure.
