STATISTICAL TOOLS FOR ECONOMISTS

Daniel McFadden ©2001

Department of Economics, University of California
Berkeley, CA 94720-3880 ([email protected])

REVISED VERSION, CHAP. 1-7, 1/16/2001

COMMENTS AND CORRECTIONS WELCOME

This manuscript may be printed and reproduced for individual use, but may not be printed for commercial purposes without permission of the author.

TABLE OF CONTENTS

1  Economic Analysis and Econometrics
   1.1  Introduction
   1.2  Cab Franc's Rational Decision
   1.3  Stock Market Efficiency
   1.4  The Capital Asset Pricing Model
   1.5  Conclusions
   1.6  Exercises

2  Analysis and Linear Algebra in a Nutshell
   2.1  Some Elements of Mathematical Analysis
   2.2  Vectors and Linear Spaces
   2.3  Linear Transformations and Matrices
   2.4  Eigenvalues and Eigenvectors
   2.5  Partitioned Matrices
   2.6  Quadratic Forms
   2.7  LDU and Cholesky Factorizations of a Matrix
   2.8  Singular Value Decomposition of a Matrix
   2.9  Idempotent Matrices and Generalized Inverses
   2.10 Projections
   2.11 Kronecker Products
   2.12 Shaping Operations
   2.13 Vector and Matrix Derivatives
   2.14 Updating and Backdating Matrix Operations
   2.15 Notes and Comments
   2.16 Exercises

3  Probability Theory in a Nutshell
   3.1  Sample Spaces
   3.2  Event Fields and Information
   3.3  Probability
   3.4  Statistical Independence and Repeated Trials
   3.5  Random Variables, Distribution Functions, and Expectations
   3.6  Transformations of Random Variables
   3.7  Special Distributions
   3.8  Notes and Comments
   3.9  Exercises

4  Limit Theorems in Statistics
   4.1  Sequences of Random Variables
   4.2  Independent and Dependent Random Sequences
   4.3  Laws of Large Numbers
   4.4  Central Limit Theorems
   4.5  Extensions of Limit Theorems
   4.6  References
   4.7  Exercises

5  Experiments, Sampling, and Statistical Decisions
   5.1  Experiments
   5.2  Populations and Samples
   5.3  Statistical Decisions
   5.4  Statistical Inference
   5.5  Exercises

6  Estimation
   6.1  Desirable Properties of Estimators
   6.2  General Estimation Criteria
   6.3  Estimation in Normally Distributed Populations
   6.4  Large Sample Properties of Maximum Likelihood Estimators
   6.5  Exercises

7  Hypothesis Testing
   7.1  The General Problem
   7.2  The Cost of Mistakes
   7.3  Design of the Experiment
   7.4  Choice of Decision Procedure
   7.5  Hypothesis Testing in Large Samples
   7.6  Exercises

CHAPTER 1. ECONOMIC ANALYSIS AND ECONOMETRICS

1.1. INTRODUCTION

The study of resource allocation by the discipline of Economics is both a pure science, concerned with developing and validating theories of behavior of individuals, organizations, and institutions, and a policy science, concerned with the design of institutions and the prediction and social engineering of behavior. In both arenas, concepts from probability and statistics, and methods for analyzing and understanding economic data, play an important role. In this chapter, we give three introductory examples that illustrate the intertwining of economic behavior, statistical inference, and econometric forecasting. These examples contain some probability and statistics concepts that are explained in later chapters. What is important on first reading are the general connections between economic reasoning, probability, statistics, and economic data; details can be postponed.

1.2. CAB FRANC'S RATIONAL DECISION

Cab Franc is a typical professional economist: witty, wise, and unboundedly rational. Cab works at a university in California. Cab's life is filled with fun and excitement, the high point of course being the class he teaches in econometrics. To supplement his modest university salary, Cab operates a small vineyard in the Napa Valley, and sells his grapes to nearby wineries.

Cab faces a dilemma. He has to make a decision on whether to harvest early or late in the season. If the Fall is dry, then late-harvested fruit is improved by additional "hang time", and will fetch a premium price. On the other hand, if rains come early, then much of the late-harvested fruit will be spoiled. If Cab harvests early, he avoids the risk, but also loses the opportunity for the maximum profit. Table 1 gives Cab's profit for each possible action he can take, and each possible Event of Nature:

Table 1. Profit from Selling Grapes

                                        Action
Event of Nature     Frequency     Harvest Early     Harvest Late
Wet                    0.4           $30,000          $10,000
Dry                    0.6           $30,000          $40,000
Expected Profit                      $30,000          $28,000

Cab wants to maximize expected profit. In other words, he wants to make the probability-weighted average of the possible profit outcomes as large as possible. Cab is not averse to risk; he figures that risks will average out over the years. From historical records, he knows that the frequency of early rain is 0.4. To calculate expected profit in this case from a specified action, Cab multiplies the profit he will receive in each event of Nature by the probability of this event, and sums. If he harvests early, the expected profit is ($30,000)(0.4) + ($30,000)(0.6) = $30,000. If he harvests late, the expected profit is ($10,000)(0.4) + ($40,000)(0.6) = $28,000. Then, in the absence of any further information, Cab will choose to harvest early and earn an expected profit of $30,000.
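
The two expected-profit calculations are easy to reproduce; the short Python sketch below simply restates the payoffs and the 0.4/0.6 frequencies from Table 1:

    # Expected profit of each action, using the payoffs and frequencies in Table 1.
    payoff = {"harvest early": {"Wet": 30000, "Dry": 30000},
              "harvest late":  {"Wet": 10000, "Dry": 40000}}
    prob = {"Wet": 0.4, "Dry": 0.6}

    for action, by_event in payoff.items():
        expected = sum(prob[event] * by_event[event] for event in prob)
        print(action, expected)        # harvest early 30000.0, harvest late 28000.0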

There is a specialized weather service, Blue Sky Forecasting, that sells long-run precipitation forecasts for the Napa Valley. Cab has to choose whether to subscribe to this service, at a cost of $1000 for the year. Table 2 gives the historical record, over the past 100 years, on the joint frequency of various forecasts and outcomes.

Table 2. Frequency of Forecasts and Outcomes

                        Blue Sky Forecasts
Event of Nature       Early       Late       TOTAL
Wet                    0.3         0.1         0.4
Dry                    0.2         0.4         0.6
TOTAL                  0.5         0.5         1.0

The interpretation of the number 0.3 is that in 30 percent of the past 100 years, Blue Sky forecast early rains and they did in fact occur. The column totals give the frequencies of the different Blue Sky forecasts. The row totals give the frequencies of the different Events of Nature. Thus, Blue Sky forecasts early rain half the time, and the frequency of actual early rain is 0.4. One can also form conditional probabilities from Table 2. For example, the conditional probability of dry, given the event that late rains are forecast, equals 0.4/(0.1+0.4) = 0.8.

If Cab does not subscribe to the Blue Sky forecast service, then he is in the situation already analyzed, where he will choose to harvest early and earn an expected profit of $30,000. Now suppose Cab does subscribe to Blue Sky, and has their forecast available. In this case, he can do his expected profit calculation conditioned on the forecast. To analyze his options, Cab first calculates the conditional probabilities of early rain, given the forecast:

Prob(Wet|Forecast Early) = 0.3/(0.3+0.2) = 0.6,   Prob(Wet|Forecast Late) = 0.1/(0.1+0.4) = 0.2.

The expected profit from harvesting early is again $30,000, no matter what the forecast. Now consider the expected profit from harvesting late. If the forecast is for early rains, the expected profit is given by weighting the outcomes by their conditional probabilities given the forecast, or

($10,000)(0.6) + ($40,000)(0.4) = $22,000.

This is less than $30,000, so Cab will definitely harvest early in response to a forecast of early rain. Next suppose the forecast is for late rain. Again, the expected profit is given by weighting the outcomes by their conditional probabilities given this information,

($10,000)(0.2) + ($40,000)(0.8) = $34,000.

This is greater than $30,000, so Cab will harvest late if the forecast is for late rain.

Is subscribing to Blue Sky worthwhile? If Cab does not subscribe, then he will always harvest early and his expected profit is $30,000. If Cab does subscribe, then his expected profit in the event of an early rain forecast is $30,000 and in the event of a late rain forecast is $34,000. Since the frequency of an early rain forecast is 0.5, Cab's overall expected profit if he subscribes is

($30,000)(0.5) + ($34,000)(0.5) = $32,000.

This is $2000 more than the expected profit if Cab does not subscribe, so that the value of the information provided by the subscription is $2000. This is more than the $1000 cost of the information, so Cab will choose to subscribe and will earn an overall expected profit, net of the subscription cost, of $31,000.
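
The whole argument, from the conditional probabilities through the $2000 value of the forecast, can be checked with a few lines of Python built on Tables 1 and 2:

    # Value of the Blue Sky forecast, using Tables 1 and 2.
    payoff = {"early": {"Wet": 30000, "Dry": 30000},
              "late":  {"Wet": 10000, "Dry": 40000}}
    joint = {("Wet", "Early"): 0.3, ("Wet", "Late"): 0.1,
             ("Dry", "Early"): 0.2, ("Dry", "Late"): 0.4}

    def best_expected(prob_wet):
        """Expected profit of the better action when Prob(Wet) = prob_wet."""
        return max(prob_wet * payoff[a]["Wet"] + (1 - prob_wet) * payoff[a]["Dry"]
                   for a in payoff)

    # Without the forecast: use the unconditional frequency of early rain.
    no_forecast = best_expected(0.4)                          # 30000
    # With the forecast: condition on each forecast, then average over forecasts.
    with_forecast = 0.0
    for f in ("Early", "Late"):
        p_f = joint[("Wet", f)] + joint[("Dry", f)]           # Prob(forecast f)
        p_wet_given_f = joint[("Wet", f)] / p_f               # Prob(Wet | forecast f)
        with_forecast += p_f * best_expected(p_wet_given_f)
    print(no_forecast, with_forecast, with_forecast - no_forecast)   # 30000 32000 2000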

Cab Franc's decision problem is a typical one for an economic agent facing uncertainty. He has a criterion (expected profit) to be optimized, a "model" of the probabilities of various outcomes, the possibility of collecting data (the forecast) to refine his probability model, and actions that will be based on the data collected. An econometrician facing the problem of statistical inference is in a similar situation: There is a "model" or "hypothesis" for an economic phenomenon, data that provides information that can be used to refine the model, and a criterion to be used in determining an action in response to this information. The actions of the econometrician, to declare a hypothesis "true" or "false", or to make a forecast, are similar in spirit to Cab's choice. Further, the solution to the econometrician's inference problem will be similar to Cab's solution.

A textbook definition of econometrics is the application of the principles of statistical inference to economic data and hypotheses. However, Cab Franc's problem suggests a deeper connection between econometric analysis and economic behavior. The decision problems faced by rational economic agents in a world with imperfect information require statistical inference, and thus are "econometric" in nature. The solutions to these problems require the same logic and techniques that must be brought to bear more formally in scientific inference. Thus, all rational economic agents
are informal working econometricians, and the study of formal econometrics can provide at least prescriptive models for economic behavior. Turned around, econometrics is simply a codification of the "folk" techniques used by economic agents to solve their decision problems. Thus, the study of econometrics provides not only the body of tools needed in empirical and applied economics for data analysis, forecasting, and inference, but also key concepts needed to explain economic behavior.

1.3. STOCK MARKET EFFICIENCY

The hypothesis is often advanced that the stock market is efficient. Among the possible meanings of this term is the idea that arbitragers are sufficiently active and pervasive so that potential windows of opportunity for excess profit are immediately closed. Consider a broad-based stock market index, the New York Stock Exchange (NYSE) value-weighted index of the prices of all the stocks listed with this exchange. The gross return to be made by taking a dollar out of a "risk-free" channel (defined here to be 90-day U.S. Treasury Bills), buying a dollar's worth of the market stock portfolio, and selling it one day later is gt = log(Mt/Mt-1), where Mt equals the NYSE index on day t, and the log gives the one-day exponential growth rate in share price. It is necessary in general to account for distributions (dividends) paid by the stocks during the time they are held; this is done automatically when gt is reported. An arbitrager on day t-1 knows the history of the market and economic variables up through that day; let Ht-1 denote this history. In particular, Ht-1 includes the level Mt-1 of the index, the pattern of historical market changes, and the overnight interest rate it-1 on 90-day Treasury Bills, representing the opportunity cost of not keeping a dollar in a T-Bill account. The difference Rt = gt - it-1 is the profit an arbitrager makes by buying one dollar of the NYSE portfolio on t-1, and is called the excess return to the market on day t. (If the arbitrager sells rather than buys a dollar of the exchange index in t-1, then her profit is -Rt.) On day t-1, conditioned on the history Ht-1, the excess return Rt is a random variable, and the probability that it is less than any constant r is given by a cumulative distribution function F(r|Ht-1). Then, the expected profit is ∫ r dF(r|Ht-1). The argument is that if this expected profit is positive, then arbitragers will buy and drive the price Mt-1 up until the opportunity for positive expected profit is eliminated. Conversely, if the expected profit is negative, arbitragers will sell and drive the price Mt-1 down until the expected profit opportunity is eliminated. Then, no matter what the history, arbitrage should make expected profit zero. This argument does not take account of the possibility that there may be some trading cost, a combination of transactions charges, risk preference, the cost of acquiring information, and the opportunity cost of the time the arbitrager spends trading. These trading costs could be sufficient to allow a small positive expected excess return to persist; this is in fact observed, and is called the equity premium. However, even if there is an equity premium, the arbitrage argument implies that the expected excess return should be independent of history.

From the CRSP financial database and from the Federal Reserve Bank, we take observations on gt and it-1 for all days the market is open between January 2, 1968 and December 31, 1998, a total of 7806 observations. We then calculate the excess return Rt = gt - it-1. The next table gives some statistics on these quantities:

Variable     Sample Average     Sample Standard Deviation
gt               0.0514%                0.919%
it-1             0.0183%                0.007%
Rt               0.0331%                0.919%

These statistics imply that the annual return in the NYSE index over this period was 18.76 percent, and the annual rate of interest on 90-day Treasury Bills was 6.68 percent.
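
These annual figures are consistent with simply scaling the daily averages up by 365 calendar days, a convention inferred here from the reported numbers rather than stated in the text; a two-line check:

    # Annualize the sample average daily rates by 365 calendar days (inferred convention).
    avg_g, avg_i = 0.0514, 0.0183        # percent per day, from the table above
    print(round(avg_g * 365, 2))         # 18.76 (percent per year, NYSE index)
    print(round(avg_i * 365, 2))         # 6.68  (percent per year, 90-day T-Bills)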

Now consider the efficient markets hypothesis, which we have argued should lead to excess returns that do not on average differ from one previous history to another. The table below shows the sample average excess return in the NYSE index under each of two possible conditions, a positive or a negative excess return on the previous day.

Condition     Frequency        Sample Average     Standard Error
Rt-1 > 0      4082 (52.3%)         0.129%             0.013%
Rt-1 < 0      3723 (47.7%)        -0.072%             0.016%

We conclude that excess return on a day following a positive excess return is on average positive, and on a day following a negative excess return is on average negative. The standard errors measure the precision of the sample averages. These averages are sufficiently precise so that we can say that the difference in sample averages did not arise by chance, and the expected excess returns under the two conditions are different.1 We conclude that over the time period covered by our sample, the efficient markets hypothesis fails. There was a persistence in the market in which good days tended to be followed by better-than-average days, and bad days by worse-than-average days. There appears to have been a potentially profitable arbitrage strategy, to buy on up days and sell on down days, that was not fully exploited and eliminated by the arbitragers.

1 The mean difference is 0.202% with a standard error of 0.010%, and the T-statistic for the hypothesis that the two conditional means are the same is 19.5.

Define a positive (negative) run of length n to be a sequence of n successive days on which the excess return is positive (negative), with a day on each end on which the sign of excess return is
reversed. The figure below plots the average excess returns conditioned on the length of an up (+) or down (-) run over the immediate past days. The horizontal lines above and below each point give an indication of the accuracy with which it is estimated; these are 95 percent confidence bounds. The figure suggests that expected returns are positive following up runs and negative following down runs. A straight line is fitted through these points, taking into account the accuracy with which they are estimated, by the least squares regression method. This is also plotted in the figure, and shows that the average returns have a positive trend. The data suggest that the trend line may overstate the impact of longer run lengths, and longer runs may be less predictive. Nevertheless, the figure supports our previous conclusion that the efficient markets hypothesis fails.

[Figure. NYSE Value-Weighted Index (Excess Return Including Dividends). Average excess return, -0.3% to 0.3%, plotted against up (+) or down (-) run length, -6 to 6, with 95 percent confidence bounds and a fitted trend line.]

The preceding evidence suggests that arbitragers may underestimate the persistence of up and down runs, and fail to make profitable bets on persistence. The table below gives the observed counts of numbers of up and down runs of various lengths. It also includes a prediction of the numbers of runs of various lengths that you would expect to see if runs are the result of independent coin tosses, with the probability of an up outcome equal to the 52.4% frequency with which the excess return was positive in our sample.2 What one sees is that there are many fewer runs of length one and more runs of longer lengths than the coin tossing model predicts. There is the possibility that the differences in this table are the result of chance, but a statistical analysis using what is called a likelihood ratio test shows that the pattern we see is very unlikely to be due to chance. Then, up and down runs are indeed more persistent than one would predict if one assumed that the probability of an up or down day was independent of previous history.

2 If P is the probability of an up day, then P^(n-1)(1-P) is the probability of an up run of length n, and this multiplied by the total number of positive runs is the expected number of runs of length n. An analogous formula applies for negative runs.
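
The expected counts in the table below can be approximately reproduced from the formula in footnote 2. A minimal Python sketch, taking P = 0.523 (the sample frequency of positive days reported earlier; the text quotes 52.4%) and the observed totals of 4069 positive and 3705 negative runs:

    # Expected numbers of runs of each length under independent "coin tosses".
    P = 0.523                    # probability of an up day (sample frequency)
    pos_runs, neg_runs = 4069, 3705

    def expected_counts(p_same, total):
        """p_same = prob. the next day has the same sign as the current run."""
        probs = [p_same**(n - 1) * (1 - p_same) for n in range(1, 6)]
        probs.append(p_same**5)              # "6 or more": remaining probability
        return [total * pr for pr in probs]

    print([round(x, 1) for x in expected_counts(P, pos_runs)])       # positive runs
    print([round(x, 1) for x in expected_counts(1 - P, neg_runs)])   # negative runs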

Run Length    Observed Positive    Observed Negative    Expected Positive    Expected Negative
1                   1772                 1772                1940.6               1938.0
2                   1037                  954                1015.1                924.3
3                    602                  495                 531.0                440.8
4                    313                  245                 277.7                210.2
5                    159                  120                 145.3                100.3
6 or more            186                  119                 159.3                 91.4
Total               4069                 3705                4069                 3705

This example shows how an economic hypothesis can be formulated as a condition on a probability model of the data generation process, and how statistical tools can be used to judge whether the economic hypothesis is true. In this case, the evidence is that either the efficient markets hypothesis does not hold, or that there is a problem with one of the assumptions that we made along the way to facilitate the analysis.3 A more careful study of the time-series of stock market prices than was done above tends to support one aspect of the efficient markets hypothesis, that expected profit at time t-1 from an arbitrage to be completed the following period is zero. Thus, the elementary economic idea that arbitragers discipline the market is supported. However, there do appear to be longer-run time dependencies in the market, as well as heterogeneities, that are inconsistent with some stronger versions of the efficient markets hypothesis.

3 For example, stock prices historically have been adjusted in units of 1/8 of a dollar, rather than to exact market-clearing levels. Some day-to-day variations reflect this institutional peculiarity.

1.4. THE CAPITAL ASSET PRICING MODEL

The return that an investor can earn from a stock is a random variable, depending on events that impinge on the firm and on the economy. By selecting the stocks that they hold, investors can trade off between average return and risk. A basic, and influential, theory of rational portfolio selection is the Capital Asset Pricing (CAP) model. This theory concludes that if investors are concerned only with the mean and variance of the return from their portfolio, then there is a single portfolio of stocks that is optimal, and that every rational investor will hold this portfolio in some mix with a riskless asset to achieve the desired balance of mean and variance. Since every investor, no matter what her attitudes to risk, holds the same stock portfolio, this portfolio will simply be a share of the total market; that is, all investors simply hold a market index fund. This is a powerful conclusion, and
one that appears to be easily refutable by examining the portfolios of individuals. This suggests that other factors, such as irrationality, transactions costs, heterogeneity in information, or preferences that take into account risk features other than mean and variance, are influencing behavior. Nevertheless, the CAP model is often useful as a normative guide to optimal investment behavior, and as a tool for understanding the benefits of diversification.

To explain the CAP model, consider a market with K stocks, indexed k = 1,2,...,K. Let Pkt be the price of stock k at the end of month t; this is a random variable when considered before the end of month t, and after that it is a number that is a realization of this random variable. Suppose an investor can withdraw or deposit funds in an account that holds U.S. Treasury 30-Day Bills that pay an interest rate it-1 during month t. (The investor is assumed to be able to borrow money at this rate if necessary.) Conventionally, the T-Bill interest rate is assumed to be risk-free and known to the investor in advance. The profit, or excess return, that the investor can make from withdrawing a dollar from her T-Bill account and buying a dollar's worth of stock k is given by

Rkt = (Pkt + dk,t-1 - Pk,t-1)/Pk,t-1 - it-1,

where dk,t-1 is the announced dividend paid by the stock at the end of month t. The excess return Rkt is again a random variable. Let rk denote the mean of Rkt, let σk² denote its variance, and let σkj denote the covariance of Rkt and Rjt. Note that σkk and σk² are two different notations for the same variance. The square root of the variance, σk, is called the standard deviation.

Consider an investor's portfolio of value A* at the beginning of a month, and suppose A dollars are invested in stocks and A*-A dollars are held in the risk-free account. Many investors will have 0 ≤ A ≤ A*. However, it is possible to have A > A*, so that the investor has borrowed money (at the risk-free rate) and put this into stocks. In this case, the investor is said to have purchased stocks on margin. For the all-stock component of the portfolio, a fraction θk of each dollar in A is allocated to shares of stock k, for k = 1,...,K. The excess return to this portfolio is then ARpt, where

Rpt = Σ_{k=1}^{K} θk Rkt

is the excess return to the one dollar stock portfolio characterized by the shares (θ1,...,θK). The fractions θk are restricted to be non-negative. However, the list of stocks may also include financial derivatives, which can be interpreted as lottery tickets that pay off in dollars or stocks under specified conditions. For example, an investor may short a stock, which means that she in effect sells an IOU promising to deliver a share of the stock at the end of the month. She is then obligated to deliver a share of the stock at the end of the month, if necessary by buying a share of this stock to deliver in order to complete the transaction. Other elementary examples of financial derivatives are futures, which are contracts to deliver stocks at some future date, and mutual funds, which are institutions that sell shares and use the proceeds to buy portfolios of stocks. There are also more complex financial derivatives that require delivery under specified conditions, such as an increase
in a stock market index of more than a specified percentage. The excess return Rpt is again a random variable, with mean rp = Σ_{k=1}^{K} θk rk and variance σp² = Σ_{k=1}^{K} Σ_{j=1}^{K} θk θj σkj. This implies that the stock portfolio with A dollars invested has an excess return with mean Arp and variance A²σp² (or standard deviation Aσp). The covariance of Rkt and Rpt is given by σkp ≡ cov(Rkt,Rpt) = Σ_{j=1}^{K} θj σkj. Define the beta of stock k (with respect to the stock portfolio p) by the formula βk = σkp/σp², and note that

σp² = Σ_{k=1}^{K} Σ_{j=1}^{K} θk θj σkj = Σ_{k=1}^{K} θk σkp = Σ_{k=1}^{K} θk βk σp²,

and hence that Σ_{k=1}^{K} θk βk = 1.
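
In matrix notation, with V = [σkj] the covariance matrix and θ the vector of shares, rp = θ′r, σp² = θ′Vθ, and βk is the k-th element of Vθ/σp². A minimal numerical sketch (all the numbers below are invented, purely for illustration):

    import numpy as np

    # Invented monthly mean excess returns, covariance matrix, and portfolio shares.
    r = np.array([0.010, 0.006, 0.008])            # r_k
    V = np.array([[0.0040, 0.0010, 0.0008],
                  [0.0010, 0.0025, 0.0006],
                  [0.0008, 0.0006, 0.0030]])       # sigma_kj
    theta = np.array([0.5, 0.3, 0.2])              # theta_k, summing to one

    r_p = theta @ r                                # portfolio mean excess return
    var_p = theta @ V @ theta                      # portfolio variance sigma_p^2
    beta = (V @ theta) / var_p                     # beta_k = sigma_kp / sigma_p^2
    print(r_p, var_p, beta)
    print(theta @ beta)                            # the identity above: equals 1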

Now consider the rational investor's problem of choosing the level of investment A and the portfolio mix θ1,θ2,...,θK. Assume that the investor cares only about the mean and standard deviation of the excess return from her portfolio, and always prefers a higher mean and a lower standard deviation. The investor's tastes for risk will determine how mean and standard deviation are traded off. The figure below shows the alternatives available to the investor. The investor prefers to be as far to the northwest in this figure as possible, where mean is high and standard deviation is low, and will have indifference curves that specify her tradeoffs between mean and standard deviation. (Note that an investor with low risk aversion will have indifference curves that are almost horizontal, while an extremely risk-averse investor will have indifference curves that are almost vertical.) If preferences between mean and standard deviation are derived from a utility function that is concave in consumption, then the indifference curve will be convex. A specified mix (θ1^1,...,θK^1) of stocks determines a one-dollar all-equity portfolio with particular values for standard deviation and mean, and an all-stock portfolio of value A* invested in this mixture will have a mean and standard deviation that are A* times those of the one-dollar portfolio. This point is indicated in the diagram as Portfolio 1. By holding A of the all-equity portfolio and A* - A of the riskless asset in some combination, the investor can attain mean and standard deviation combinations anywhere on a straight line through the origin and the all-equity Portfolio 1 point. (Points between zero and the all-equity portfolio correspond to investing only part of the investor's assets in stocks, while points on the line to the right of the Portfolio 1 point correspond to borrowing money to take a margin position in stocks.) Another mix (θ1^2,...,θK^2) gives the point in the diagram indicated as Portfolio 2. Again, by holding the riskless asset and Portfolio 2 in some combination, the consumer could attain any point on the straight line connecting the origin and Portfolio 2. There will be a frontier envelope on the north-west boundary of all the mean and standard deviation combinations that can be attained
with all-equity portfolios with value A*; this envelope is drawn as a heavy curve in the figure, and represents efficient portfolios in terms of tradeoffs between mean and standard deviation in all-equity portfolios.

[Figure. Portfolio Mean and Standard Deviation: mean (vertical axis, 0 to 0.4) plotted against standard deviation (horizontal axis, 0 to 0.5), showing Portfolio 1, Portfolio 2, the best portfolio BP, and the optimal portfolio OP.]

Portfolio 1 is efficient, while Portfolio 2 is not because it is southwest of the all-equity portfolio frontier. Note however that the consumer can be made better off with a portfolio that is some combination of Portfolio 2 and the riskless asset than with any combination of Portfolio 1 and the riskless asset, because the line through Portfolio 2 is always northwest of the line through Portfolio 1. Consider all the lines that connect the origin and all-equity portfolios. The location of these lines reflects the operation of diversification to reduce risk; i.e., by holding a mix of stocks, some of which are likely to go up when others are going down, one may be able to reduce standard deviation for a given level of the mean. There will be an efficient mix (θ1*,...,θK*), labeled BP (for Best Portfolio) in the figure, that gives a line that is rotated as far to the northwest as possible. No matter what the specific tastes of the investor for mean versus standard deviation, she will maximize preferences
somewhere along this tangency line, using some combination of the riskless asset and the Best Portfolio. The diagram shows for a particular indifference curve how the optimal portfolio, labeled OP in the figure, is determined. Different investors will locate at different points along the optimal line, by picking different A levels, to maximize their various preferences for mean versus standard deviation. However, all investors will choose exactly the same BP mix (θ1*,...,θK*) for the stocks that they hold. But if every investor holds stocks in the same proportions, then these must also be the proportions that prevail in the market as a whole. Then the Best Portfolio (θ1*,...,θK*) will be the shares by value of all the stocks in the market. Such a portfolio is called a market index fund. The CAP model then concludes that rational investors who care only about the mean and standard deviation of excess return will hold only the market index fund, with the levels of investment reflecting their heterogeneous tastes for risk. Investors may purchase individual stocks in the BP proportions; however, there are mutual funds that do precisely this, and the investor may then simply put her stock portfolio into the market index mutual fund.

The problem of determining the optimal portfolio mix (θ1*,...,θK*) is most easily solved by considering a closely related problem. An investor's portfolio is characterized by A and (θ1,...,θK). Given a choice among all the portfolios that achieve a specified level of mean return, the investor would want to choose the one that minimizes variance. In the figure, this corresponds to getting as far to the left as possible when constrained to the feasible portfolios on a specified horizontal line. From the previous discussion, the solution to this problem will be a portfolio with the optimal mix of stocks, and the only difference between this problem and the one of maximizing preferences will be in determining the overall investment level A. The problem of minimizing variance for a given mean is one of constrained minimization:

Choose A, θ1,...,θK ≥ 0 to minimize A² Σ_{k=1}^{K} Σ_{j=1}^{K} θk θj σkj, subject to A Σ_{k=1}^{K} θk rk = c and Σ_{k=1}^{K} θk = 1, where c is a constant that can be varied parametrically. The first-order (Kuhn-Tucker) conditions for this problem are

(1)  2A² Σ_{j=1}^{K} θj* σkj  ≥  λArk + µ,  with equality unless θk* = 0, for k = 1,...,K,

(2)  2A Σ_{k=1}^{K} Σ_{j=1}^{K} θk* θj* σkj  =  λ Σ_{k=1}^{K} θk* rk,

where the scalars λ and µ are Lagrange multipliers. Multiply (1) by θk* and sum over k to obtain the result

A [ 2A Σ_{k=1}^{K} Σ_{j=1}^{K} θk* θj* σkj  -  λ Σ_{k=1}^{K} θk* rk ]  =  µ Σ_{k=1}^{K} θk*.

Using (2), this implies µ = 0. Then, (1) implies that the optimal θk* satisfy

(3)  Σ_{j=1}^{K} θj* σkj  ≥  γrk,  with equality unless θk* = 0, for k = 1,...,K,

where γ is a scalar defined so that the θk* sum to one. Since by the earlier comments this mix is simply the mix in the total market, equality will hold for all stocks that are in the market and have positive value.
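
When equality holds in (3) for every stock, the K conditions can be written in matrix form as Vθ* = γr, where V = [σkj] is the covariance matrix, so the optimal mix is proportional to V⁻¹r, rescaled so the shares sum to one. A minimal numerical sketch (the means and covariances are invented, and it simply assumes the resulting shares come out non-negative so the inequality constraints do not bind):

    import numpy as np

    r = np.array([0.010, 0.006, 0.008])            # invented mean excess returns r_k
    V = np.array([[0.0040, 0.0010, 0.0008],
                  [0.0010, 0.0025, 0.0006],
                  [0.0008, 0.0006, 0.0030]])       # invented covariance matrix [sigma_kj]

    raw = np.linalg.solve(V, r)                    # raw = V^-1 r, proportional to the optimal mix
    theta_star = raw / raw.sum()                   # rescale so the shares sum to one
    gamma = 1.0 / raw.sum()                        # the scalar gamma in (3)
    print(theta_star)
    print(V @ theta_star - gamma * r)              # condition (3) with equality: numerically zero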

Assume now that rp = Σ_{k=1}^{K} θk* rk and σp² = Σ_{k=1}^{K} Σ_{j=1}^{K} θk* θj* σkj refer to the best portfolio. The left-hand side of (3) equals σkp = βkσp², so this condition can be rewritten

(4)  βkσp²  ≥  γrk,  with equality unless θk* = 0, for k = 1,...,K.

Multiplying both sides of this inequality by θk* and summing yields the condition

γrp  ≡  γ Σ_{k=1}^{K} θk* rk  =  σp² Σ_{k=1}^{K} θk* βk  =  σp²,

or rp = σp²/γ. Substituting this into (4) gives us the final form of a main result of the CAP model,

rk  ≤  βkrp,  with equality if the stock is held, for k = 1,...,K.

The mean returns are not observed directly, but the realizations of monthly returns on individual
stocks and the market are observed. Write an observed return as the sum of its mean and a deviation from the mean, Rkt = rk + εkt and Rpt = rp + εpt. Note that Rpt = Σ_{k=1}^{K} θk* Rkt, so then εpt = Σ_{k=1}^{K} θk* εkt. For all stocks held in the market, the CAP model implies rk = βkrp. Use the form Rkt = rk + εkt to rewrite the equation rk = βkrp as Rkt - εkt = βk(Rpt - εpt). Define νkt = εkt - βkεpt. Then, the equation becomes

(5)  Rkt = βkRpt + νkt.

This equation can be interpreted as a relation between market risk, embodied in the market excess return Rpt, and the risk of stock k, embodied in Rkt. The disturbance νkt in this equation is sometimes called the specific risk in stock k, the proportion of the total risk in this stock that is not responsive to market fluctuations. This disturbance has the following properties:

(a) Eνkt = 0;

(b) Eνkt² = E(εkt - βkεpt)² = σk² + βk²σp² - 2βkσkp = σk² - βk²σp²;

(c) EνktRpt = E(εkt - βkεpt)εpt = σkp - βkσp² = 0.

Equation (5) is a linear regression formula. Properties (a)-(c) are called the Gauss-Markov conditions. We will see that they imply that an estimate of βk with desirable statistical properties can be obtained by using the method of least squares. Then, the CAP model's assumptions on behavior imply an econometric model that can then be fitted to provide estimates of the market betas, key parameters in the CAP analysis.
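
A minimal simulation sketch of this point (all parameter values are invented): data generated according to (5) are fed to the no-intercept least squares slope Σt RktRpt / Σt Rpt², which recovers βk.

    import numpy as np

    rng = np.random.default_rng(0)
    T, beta_k, sigma_p, sigma_nu = 240, 1.3, 0.04, 0.05   # invented parameters
    R_p = rng.normal(0.005, sigma_p, T)                   # market excess returns
    nu = rng.normal(0.0, sigma_nu, T)                     # specific risk of stock k
    R_k = beta_k * R_p + nu                               # equation (5)

    beta_hat = (R_k @ R_p) / (R_p @ R_p)                  # least squares slope, no intercept
    print(beta_hat)                                       # close to 1.3 for large T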

The market betas of individual stocks are often used by portfolio managers to assess the merits of adding or deleting stocks to their portfolios. (Of course, the CAP model says that there is no need to consider holding portfolios different than the market index fund, and therefore no need for portfolio managers. That these things exist is itself evidence that there is some deficiency in the CAP model, perhaps due to failures of rationality, the presence of transactions cost, or the ability to mimic the excess return of the market using various subsets of all the stocks on the market because the optimal portfolio is not unique.) Further, statistical analysis of the validity of the assumptions (a)-(c) can be used to test the validity of the CAP model.

The β's in formula (5) convey information on the relationship between the excess return on an individual stock and the excess return in the market. Subtract means in (5), square both sides, and take the expectation to get the formula

σk² = βk²σp² + δk²,

where δk² = Eνkt² is the variance of the disturbance. This equation says that the risk of stock k equals the market risk, amplified by βk², plus the specific risk. Stock k will have high risk if it has large specific risk, or a βk that is large in magnitude, or both. A positive βk implies that events that influence the market tend to influence stock k in the same way; i.e., the stock is pro-cyclic. A negative βk implies that the stock tends to move in a direction opposite that of the market, so that it is counter-cyclic. Stocks that have small or negative βk are defensive, aiding diversification and making an important contribution to reducing risk in the market portfolio.

The CAP model described in this example is a widely used tool in economics and finance. It illustrates a situation in which economic axioms on behavior lead to a widely used statistical model for the data generation process, the linear regression model with errors satisfying Gauss-Markov assumptions. An important feature of this example is that these statistical assumptions are implied by the economic axioms, not attached ad hoc to facilitate data analysis. It is a happy, but rare, circumstance in which an economic theory and its econometric analysis are fully integrated. To seek such harmonies, theorists need to draw out and make precise the empirical implications of their work, and econometricians need to develop models and methods that minimize the need for
facilitating assumptions that make the statistical analysis tractable but whose economic plausibility is weak or undetermined.

1.5. CONCLUSION

A traditional view of econometrics is that it is the special field that deals with economic data and with statistical methods that can be employed to use these data to test economic hypotheses and make forecasts. If this were all there was to econometrics, it would still be one of the most important parts of the training of most economists, who in their professional careers deal with economic hypotheses, policy issues, and planning that in the final analysis depend on facts and the interpretation of facts. However, the examples in this chapter are intended to show that the subject of econometrics is far more deeply intertwined with economic science, providing tools for modeling core theories of the behavior of economic agents under uncertainty, and a template for rational decision-making under incomplete information. This suggests that it is useful to understand econometrics at three levels: (1) the relatively straightforward and mechanical procedures for applied econometric data analysis and inference that are needed to understand and carry out empirical economics research, (2) a deeper knowledge of the theory of statistics in the form that it is needed to develop and adapt econometric tools for the situations frequently encountered in applied work where conventional techniques may not apply, or may not make efficient use of the data, and (3) a deeper knowledge of the concepts and formal theory of probability, statistics, and decision theory that enter models of the behavior of economic agents, and show the conceptual unity of econometrics and economic theory.

1.6. EXERCISES

Questions 1-3 refer to the decision problem of Cab Franc, from Section 1.2.

1. The Department of Agriculture offers Cab crop insurance, which costs $1000. If Cab's revenue from selling grapes falls below 90 percent of his expected net revenue of $31,000, then the insurance reimburses him for the difference between his actual revenue and 90 percent of $31,000. Is this insurance actuarially fair if Cab makes the same operating decisions that he did without insurance? Will Cab in fact make the same operating decisions if he has the insurance? Will he buy the insurance?

2. Blue Sky introduces a more detailed forecast, with the properties described below, at a cost of $1500. Will Cab buy this forecast rather than the previous one? For this problem, assume there is no crop insurance.

Frequency of Expanded Forecasts and Outcomes

                            Blue Sky Forecasts
Event of Nature      Bad      Poor      Fair      Good      TOTAL
Wet                  0.15     0.15      0.1       0.0        0.4
Dry                  0.05     0.15      0.2       0.2        0.6
TOTAL                0.2      0.3       0.3       0.2        1.0

3. After reviewing his budget commitments, Cab decides he is not risk neutral. If his net income from selling grapes falls below his mortgage payment of $25,000, then he will have to borrow the difference in the local spot market, and the vig will double the cost. Put another way, his utility of income Y is u = Y - 0.5 max(25000-Y, 0). What harvesting decision will maximize Cab's utility if he subscribes to the Blue Sky forecast? Again assume there is no crop insurance, but assume the expanded Blue Sky forecasts in question 2 are available.

Questions 4-5 refer to the market efficiency analysis in Section 1.3.

4. Formulating the efficient markets hypothesis as a feature produced by large numbers of active arbitragers assumes that there is an objective probability distribution for outcomes, and this distribution is common knowledge to all arbitragers. What would you expect to happen if the probabilities are subjective and are not the same for all arbitragers? If the number of arbitragers is limited? If institutional constraints place limits on the magnitudes of the positions that arbitragers can take? If arbitragers are risk-averse rather than risk neutral? (This is a thought question, as you do not yet have the tools for a formal analysis.)

5. The narrow form of the efficient markets hypothesis states that arbitragers will buy or sell to take advantage of expected profits or losses, and thereby arbitrage away these profits and losses, so that expected returns are zero. Broader forms of the efficient markets hypothesis state that arbitragers will create derivatives, and trade away any expected profit for any specified position they could take in the market for these derivatives. For example, a Call (or Call option) is an agreement between the buyer and the seller of the option whereby the buyer obtains the right but not the obligation to buy an agreed amount of a stock at a pre-agreed price (the strike price), on an agreed future date (the value date). Conversely, the seller of a Call has the obligation, but not the right, to sell the agreed amount of a stock at the pre-agreed price. Obviously, the holder of a Call will exercise it if the price on the value date exceeds the strike price, and will otherwise let the option expire. The price of a Call includes a premium that reflects its value as insurance against an increase in the price of the stock, and is determined by the beliefs of buyers and sellers about the probabilities of prices above the strike price. A Put option is in all respects the same except that it confers the right but not the obligation to sell at a pre-agreed rate. Suppose the cumulative probability distribution F(Pt|Pt-1) for the price of a stock on a value date t, given the current price at t-1, is known to all investors. Derive a formula for the price of a Call option at strike price P* if all investors are risk neutral.

Questions 6-8 refer to the Capital Asset Pricing Model in Section 1.4.

6. The table below gives the probability distribution of next month's price for three stocks, each of which has a current price of $100. There are no declared dividends in this month. The risk-free interest rate for the month is 0.05. Calculate the mean excess return for each stock, and the variances and covariances of their excess returns.

Probability     Stock A     Stock B     Stock C
0.25             $120        $130        $140
0.2              $110        $110        $100
0.2              $110        $90         $110
0.25             $100        $100        $90
0.1              $90         $100        $100

7. If the market is composed of the three stocks described in Question 6, with an equal number of shares of each stock in the market, calculate the excess return and variance of the market. Calculate the beta of each stock.

8. To derive the CAP model description of the optimal portfolio, instead of minimizing variance subject to a constraint on the expected rate of return, maximize the expected rate of return subject to a constraint on the variance. Give an interpretation of the Lagrange multiplier in this formulation. Show that it leads to the same characterization of the optimal portfolio as before.

CHAPTER 2. ANALYSIS AND LINEAR ALGEBRA IN A NUTSHELL

2.1. SOME ELEMENTS OF MATHEMATICAL ANALYSIS

2.1.1. Real numbers are denoted by lower case Greek or Roman letters; the space of real numbers is the real line, denoted by ℝ. The absolute value of a real number a is denoted by |a|. Complex numbers are rarely required in econometrics before the study of time series and dynamic systems. For future reference, a complex number is written a + ιb, where a and b are real numbers and ι is the square root of -1, with a termed the real part and ιb termed the imaginary part. The complex number can also be written as r(cos θ + ι sin θ), where r = (a²+b²)^(1/2) is the modulus of the number and θ = cos⁻¹(a/r). The properties of complex numbers we will need in basic econometrics are the rules for sums, (a+ιb) + (c+ιd) = (a+c)+ι(b+d), and products, (a+ιb)(c+ιd) = (ac-bd)+ι(ad+bc).
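
These rules are easy to verify with Python's built-in complex type, where ι is written j (a small illustration with arbitrary numbers):

    a, b, c, d = 1.0, 2.0, 3.0, -4.0
    z, w = complex(a, b), complex(c, d)
    print(z + w == complex(a + c, b + d))            # sum rule: True
    print(z * w == complex(a*c - b*d, a*d + b*c))    # product rule: True
    print(abs(z), (a**2 + b**2) ** 0.5)              # modulus r: both print 2.2360679...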

2.1.2. For sets of objects A and B, the union A∪B is the set of objects in either; the intersection A∩B is the set of objects in both; and A\B is the set of objects in A that are not in B. The empty set is denoted φ. Set inclusion is denoted A ⊂ B; we say A is contained in B. The complement of a set A (which may be relative to a set B that contains it) is denoted A^c. A family of sets is disjoint if the intersection of each pair is empty. The symbol a ∈ A means that a is a member of A; and a ∉ A means that a is not a member of A. The symbol ∃ means "there exists", the symbol ∀ means "for all", and the symbol ∍ means "such that". A proposition that A implies B is denoted A ⇒ B, and a proposition that A and B are equivalent is denoted A ⇔ B. The proposition that A implies B, but B does not imply A, is denoted A ⇒| B. The phrase "if and only if" is often abbreviated to "iff".

2.1.3. A function f:A → B is a mapping from each object a in the domain A into an object b = f(a) in the range B. The terms function, mapping, and transformation will be used interchangeably. The symbol f(C), termed the image of C, is used for the set of all objects f(a) for a ∈ C. For D ⊂ B, the symbol f⁻¹(D) denotes the inverse image of D: the set of all a ∈ A such that f(a) ∈ D. The function f is onto if B = f(A); it is one-to-one if it is onto and if a,c ∈ A and a ≠ c implies f(a) ≠ f(c). When f is one-to-one, the mapping f⁻¹ is a function from B onto A. If C ⊂ A, define the indicator function for C, denoted 1_C:A → ℝ, by 1_C(a) = 1 for a ∈ C, and 1_C(a) = 0 otherwise. The notation 1(a∈C) is also used for the indicator function 1_C. A function is termed real-valued if its range is ℝ.

2.1.4. The supremum of A, denoted sup A, is the least upper bound on A. A typical application has a function f:C → ℝ and A = f(C); then sup_{c∈C} f(c) is used to denote sup A. If the supremum is achieved by an object d ∈ C, so f(d) = sup_{c∈C} f(c), then we write f(d) = max_{c∈C} f(c). When there is a unique maximizing argument, write d = argmax_{c∈C} f(c). When there is a non-unique maximizing
argument, we will assume that argmax_{c∈C} f(c) is a selection of any one of the maximizing arguments. Analogous definitions hold for the infimum and minimum, denoted inf, min, and for argmin.

2.1.5. If ai is a sequence of real numbers indexed by i = 1,2,..., then the sequence is said to have a limit (equal to ao) if for each ε > 0, there exists n such that |ai - ao| < ε for all i ≥ n; the notation for a limit is lim_{i→∞} ai = ao or ai → ao. The Cauchy criterion says that a sequence ai has a limit if and only if, for each ε > 0, there exists n such that |ai - aj| < ε for i,j ≥ n. The notation limsup_{i→∞} ai means the limit of the supremum of the sets {ai,ai+1,...}; because it is nonincreasing, it always exists (but may equal +∞ or -∞). An analogous definition holds for liminf.

2.1.6. A real-valued function ρ(a,b) defined for pairs of objects in a set A is a distance function if it is non-negative, gives a positive distance between all distinct points of A, has ρ(a,b) = ρ(b,a), and satisfies the triangle inequality ρ(a,b) ≤ ρ(a,c) + ρ(c,b). A set A with a distance function ρ is termed a metric space. A real-valued function ‖a‖ defined for objects in a set A is a norm if ‖a-b‖ has the properties of a distance function. A typical example is the real line ℝ, with the absolute value of the difference of two numbers taken as the distance between them; then ℝ is a metric space and a normed space. A (ε-)neighborhood of a point a in a metric space A is a set of the form {b∈A | ρ(a,b) < ε}. A set C ⊂ A is open if for each point in C, some neighborhood of this point is also contained in C. A set C ⊂ A is closed if its complement is open. The closure of a set C is the intersection of all closed sets that contain C. The interior of C is the union of all open sets contained in C; it can be empty. A covering of a set C is a family of open sets whose union contains C. The set C is said to be compact if every covering contains a finite sub-family which is also a covering. A family of sets is said to have the finite-intersection property if every finite sub-family has a non-empty intersection. Another characterization of a compact set is that every family of closed subsets with the finite intersection property has a non-empty intersection. A metric space A is separable if there exists a countable subset B such that every neighborhood contains a member of B. All of the metric spaces encountered in econometrics will be separable. A sequence ai in a separable metric space A is convergent (to a point ao) if the sequence is eventually contained in each neighborhood of ao; we write ai → ao or lim_{i→∞} ai = ao to denote a convergent sequence. A set C ⊂ A is compact if and only if every sequence in C has a convergent subsequence (which converges to a cluster point of the original sequence).

2.1.7. Consider separable metric spaces A and B, and a function f:A → B. The function f is continuous on A if the inverse image of every open set is open. Another characterization of continuity is that for any sequence satisfying ai → ao, one has f(ai) → f(ao); the function is said to be continuous on C ⊂ A if this property holds for each ao ∈ C. Stated another way, f is continuous on C if for each ε > 0 and a ∈ C, there exists δ > 0 such that for each b in a δ-neighborhood of a, f(b) is in an ε-neighborhood of f(a). For real valued functions on separable metric spaces, the concepts of supremum and limsup defined earlier for sequences have a natural extension: sup_{a∈A} f(a) denotes
the least upper bound on the set {f(a) | a∈A}, and limsup_{a→b} f(a) denotes the limit as ε → 0 of the suprema of f(a) on ε-neighborhoods of b. Analogous definitions hold for inf and liminf. A real-valued function f is continuous at b if limsup_{a→b} f(a) = liminf_{a→b} f(a). Continuity of real-valued functions f and g is preserved by the operations of absolute value |f(a)|, multiplication f(a)g(a), addition f(a)+g(a), and maximization max{f(a),g(a)} and minimization min{f(a),g(a)}. The function f is uniformly continuous on C if for each ε > 0, there exists δ > 0 such that for all a ∈ C and b ∈ A with b in a δ-neighborhood of a, one has f(b) in an ε-neighborhood of f(a). The distinction between continuity and uniform continuity is that for the latter a single δ > 0 works for all a ∈ C. A function that is continuous on a compact set is uniformly continuous. The function f is Lipschitz on C if there exist L > 0 and δ > 0 such that |f(b) - f(a)| ≤ Lρ(a,b) for all a ∈ C and b ∈ A with b in a δ-neighborhood of a.

2.1.8. Consider a real-valued function f on ℝ. The derivative of f at ao, denoted f′(ao) or df(ao)/da, has the property, if it exists, that f(b) - f(ao) - f′(ao)(b-ao) = η(b-ao)·(b-ao), where lim_{c→0} η(c) = 0. The function is continuously differentiable at ao if f′ is a continuous function at ao. If a function is k-times continuously differentiable in a neighborhood of a point ao, then for b in this neighborhood it has a Taylor's expansion

f(b) = Σ_{i=0}^{k} f^(i)(ao) (b-ao)^i / i!  +  [f^(k)(λb+(1-λ)ao) - f^(k)(ao)] (b-ao)^k / k!,

where f^(i) denotes the i-th derivative, and λ is a scalar between zero and one. If lim_{i→∞} ai = ao and f is a continuous function at ao, then lim_{i→∞} f(ai) = f(ao). One useful result for limits is L'Hopital's rule, which states that if f(1/n) and g(1/n) are functions that are continuously differentiable at zero with f(0) = g(0) = 0, so that f(1/n)/g(1/n) approaches the indeterminate expression 0/0, one has lim_{n→∞} f(1/n)/g(1/n) = f′(0)/g′(0), provided the last ratio exists.

2.1.9. If ai for i = 0,1,2,... is a sequence of real numbers, the partial sums sn = Σ_{i=0}^{n} ai define a series. We say the sequence is summable, or that the series is convergent, if lim_{n→∞} sn exists and is finite. An example is the geometric series ai = r^i, which has sn = (1-r^(n+1))/(1-r) if r ≠ 1. When |r| < 1, this series is convergent, with the limit 1/(1-r). When r < -1 or r ≥ 1, the series diverges. In the borderline case r = -1, the series alternates between 0 and 1, so the limit does not exist. Applying the Cauchy criterion, a summable sequence has lim_{n→∞} an = 0 and lim_{n→∞} Σ_{i=n}^{∞} ai = 0. A sequence satisfies a more general form of summability, called Cesaro summability, if lim_{n→∞} n⁻¹ Σ_{i=0}^{n} ai exists. Summability implies Cesaro summability, but not vice versa. A useful result known as
Kronecker's lemma states that if ai and bi are positive series, bi is monotonically increasing to +∞, and Σ_{i=0}^{n} ai/bi is bounded for all n, then lim_{n→∞} bn⁻¹ Σ_{i=0}^{n} ai = 0.
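
A small numerical illustration of these definitions (the particular sequences are arbitrary choices): the partial sums of the geometric series with r = 0.9 approach 1/(1-r) = 10, while ai = (-1)^i is not summable but is Cesaro summable in the sense above, with limit 0.

    # Geometric series: partial sums s_n approach 1/(1-r) when |r| < 1.
    r, n = 0.9, 200
    s_n = sum(r**i for i in range(n + 1))
    print(s_n, 1 / (1 - r))                    # both close to 10.0

    # a_i = (-1)^i is not summable, but n**-1 * sum_{i=0}^{n} a_i tends to 0.
    for n in (10, 1000, 100000):
        print(sum((-1)**i for i in range(n + 1)) / n)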

2.1.10. The exponential function e^a, also written exp(a), and natural logarithm log(a) appear frequently in econometrics. The exponential function is defined for both real and complex arguments, and has the properties that e^(a+b) = e^a e^b, e^0 = 1, and the Taylor's expansion e^a = Σ_{i=0}^{∞} a^i/i! that is valid for all a. The trigonometric functions cos(a) and sin(a) are also defined for both real and complex arguments, and have Taylor's expansions cos(a) = Σ_{i=0}^{∞} (-1)^i a^(2i)/(2i)!, and sin(a) = Σ_{i=0}^{∞} (-1)^i a^(2i+1)/(2i+1)!. These expansions combine to show that e^(a+ιb) = e^a(cos(b) + ι sin(b)). The logarithm is defined for positive arguments, and has the properties that log(1) = 0, log(ab) = log(a) + log(b), and log(e^a) = a. It has a Taylor's expansion log(1+a) = Σ_{i=1}^{∞} (-1)^(i+1) a^i/i, valid for |a| < 1. A useful bound on logarithms is that for |a| < 1/3 and |b| < 1/3, |log(1+a+b) - a| < 4|b| + 3a². Another useful result, obtained by applying L'Hopital's rule to the expression log(1+an/n)/(1/n), is that lim_{n→∞} (1+an/n)^n = exp(a0) when lim_{n→∞} an = a0 exists.

A few specific series appear occasionally in probability theory. The series ai = i^α for i = 1,2,... is summable for α < -1, and divergent otherwise, with sn = n(n+1)/2 for α = 1, sn = n(n+1)(2n+1)/6 for α = 2, and sn = n²(n+1)²/4 for α = 3. Differentiating the formula sn = (1-r^(n+1))/(1-r) for a convergent geometric series leads to the expressions Σ_{i=1}^{∞} i r^i = r/(1-r)² and Σ_{i=1}^{∞} i² r^i = r(1+r)/(1-r)³.

2.1.11. If ai and bi are real numbers and ci are non-negative numbers for i = 1,2,..., then HoldersInequality states that for p > 0, q > 0, and 1/p + 1/q = 1, one has

. i ciaibi i ci|aibi| i ci|ai|p 1/p i ci|bi|

q 1/q

When p = q = 1/2, this is called the Cauchy-Schwartz inequality. Obviously, the inequality is usefulonly if the sums on the right converge. The inequality also holds in the limiting case where sums arereplaced by integrals, and a(i), b(i), and c(i) are functions of a continuous index i.

2.2. VECTORS AND LINEAR SPACES

2.2.1. A finite-dimensional linear space is a set such that (a) linear combinations of points in theset are defined and are again in the set, and (b) there is a finite number of points in the set (a basis)

Page 24: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-5, Page 21 ___________________________________________________________________________

such that every point in the set is a linear combination of this finite number of points. The dimensionof the space is the minimum number of points needed to form a basis. A point x in a linear space ofdimension n has a ordinate representation x = (x1,x2,...,xn), given a basis for the space b1,...,bn,where x1,...,xn are real numbers such that x = x1b1 + ... + xnbn. The point x is called a vector, andx1,...,xn are called its components. The notation (x)i will sometimes also be used for component i ofa vector x. In econometrics, we work mostly with finite-dimensional real space. When this spaceis of dimension n, it is denoted n. Points in this space are vectors of real numbers (x1,...,xn); thiscorresponds to the previous terminology with the basis for n being the unit vectors (1,0,..,0),(0,1,0,..,0),..., (0,..,0,1). Usually, we assume this representation without being explicit about the basisfor the space. However, it is worth noting that the coordinate representation of a vector depends onthe particular basis chosen for a space. Sometimes this fact can be used to choose bases in whichvectors and transformations have particularly simple coordinate representations.

The Euclidean norm of a vector x is x2 = (x12+...+xn

2)1/2. This norm can be used to define thedistance between vectors, or neighborhoods of a vector. Other possible norms include x1 =

x1+...+xn, x = max x1,...,xn, or for 1 p < +, xp = . Each normx1p...xn

p 1/p

defines a topology on the linear space, based on neighborhoods of a vector that are less than eachpositive distance away. The space n with the norm x2 and associated topology is called Euclideann-space.

The vector product of x and y in n is defined as xy = x1y1+...+xnyn. Other notations for vectorproducts are <x,y> or (when x and y are interpreted as row vectors) xy or (when x and y areinterpreted as column vectors) xy.

2.2.2. A linear subspace of a linear space such as n is a subset that has the property that alllinear combinations of its members remain in the subset. Examples of linear subspaces in 3 are theplane (a,b,c)b = 0 and the line (a,b,c)a = b = 2c. The linear subspace spanned by a set ofvectors x1,...,xJ is the set of all linear combinations of these vectors, L = x1α1+...+xJαJ(α1,...,αJ) J. The vectors x1,...,xJ are linearly independent if and only if one cannot be written as a linearcombination of the remainder. The linear subspace that is spanned by a set of J linearly independentvectors is said to be of dimension J. Conversely, each linear space of dimension J can be representedas the set of linear combinations of J linearly independent vectors, which are in fact a basis for thesubspace. A linear subspace of dimension one is a line (through the origin), and a linear subspace ofdimension (n-1) is a hyperplane (through the origin). If L is a subspace, then L = xn xy = 0for all yL is termed the complementary subspace. Subspaces L and M with the property that xy= 0 for all y L and x M are termed orthogonal, and denoted LM. The angle θ betweensubspaces L and M is defined by cos θ = Min xy y L, y2 = 1, x M, x2 = 1. Then, the anglebetween orthogonal subspaces is π/2, and the angle between subspaces that have a nonzero point incommon is zero. A subspace that is translated by adding a nonzero vector c to all points in thesubspace is termed an affine subspace.

Page 25: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-6, Page 22 ___________________________________________________________________________

2.2.3. The concept of a finite-dimensional space can be generalized. An example, for 1 p < +,

is the family Lp(n) of real-valued functions f on n such that the integral fp = isnf(x)pdx

1/p

well-defined and finite. This is a linear space with norm fp since linear combinations of functionsthat satisfy this property also satisfy (using convexity of the norm function) this property. One canthink of the function f as a vector in Lp(n), and f(x) for a particular value of x as a component of thisvector. Many, but not all, of the properties of finite-dimensional space extend to infinite dimensions.In basic econometrics, we will not need the infinite-dimensional generalization. It appears in moreadvanced econometrics, in stochastic processes in time series, and in nonlinear and nonparametricproblems.

2.3. LINEAR TRANSFORMATIONS AND MATRICES

2.3.1. A mapping A from one linear space (its domain) into another (its range) is a lineartransformation if it satisfies A(x+z) = A(x) + A(z) for any x and z in the domain. When the domainand range are finite-dimensional linear spaces, a linear transformation can be represented as a matrix.Specifically, a linear transformation A from n into m can be represented by a m×n array A with

elements aij for 1 i m and 1 j n, with y = A(x) having components yi = aijxj for 1 i n

j1

m. In matrix notation, this is written y = Ax. A matrix A is real if all its elements are real numbers,complex if some of its elements are complex numbers. Throughout, matrices are assumed to be realunless explicitly assumed otherwise. The set = xnAx = 0 is termed the null space of thetransformation A. The subspace containing all linear combinations of the column vectors of Ais termed the column space of A; it is the complementary subspace to .

If A denotes a m×n matrix, then A denotes its n×m transpose (rows become columns and viceversa). The identity matrix of dimension n is n×n with one's down the diagonal, zero's elsewhere, andis denoted In, or I if the dimension is clear from the context. A matrix of zeros is denoted 0, and an×1 vector of ones is denoted 1n. A permutation matrix is obtained by permuting the columns of anidentity matrix. If A is a m×n matrix and B is a n×p matrix, then the matrix product C = AB is of

dimension m×p with elements cik aijbjk for 1 i m and 1 k p. For the matrix productn

j1

to be defined, the number of columns in A must equal the number of rows in B (i.e., the matrices mustbe commensurate). A matrix A is square if it has the same number of rows and columns. A squarematrix A is symmetric if A = A, diagonal if all off-diagonal elements are zero, upper (lower)

Page 26: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-7, Page 23 ___________________________________________________________________________

triangular if all its elements below (above) the diagonal are zero, and idempotent if it is symmetricand A2 = A. A matrix A is column orthonormal if AA = I; simply orthonormal if it is both squareand column orthonormal.

A set of linearly independent vectors in n can be recursively orthonormalized; i.e., transformedso they are orthogonal and scaled to have unit length: Suppose vectors x1,...,xJ-1 have previously been

orthonormalized, and z is the next vector in the set. Then, z - is orthogonal to x1,...,xJ-1,J1

j1(xjz)xj

and is non-zero since it is linearly independent. Scale it to unit length; this defines xJ. Each columnof a n×m matrix A is a vector in n. The rank of A, denoted r = ρ(A), is the largest number ofcolumns that are linearly independent. Then A is of rank m if and only if x = 0 is the only solutionto Ax = 0. If A is of rank r, then orthonormalization applied to the linearly independent columns ofA can be interpreted as defining a r×m lower triangluar matrix U such that AU is columnorthonormal. A n×m matrix A is of full rank if ρ(A) = min(n,m). A n×n matrix A of full rank istermed nonsingular. A nonsingular n×n matrix A has an inverse matrix A-1 such that both AA-1 andA-1A equal the identity matrix In. An orthonormal matrix A satisfies AA = In, implying that A = A-1,and hence AA = AA = In. The trace tr(A) of a square matrix A is the sum of its diagonal elements.

2.3.2. The tables in this section summarize useful matrix and vector operations. In addition tothe operations in these tables, there are statistical operations that can be performed on a matrix whenits columns are vectors of observations on various variables. Discussion of these operations ispostponed until later. Most of the operations in Tables 2.1-2.3 are available as part of the matrixprogramming languages in econometrics computer packages such as SST, TSP, GAUSS, orMATLAB. The notation in these tables is close to the notation for the corresponding matrixcommands in SST and GAUSS.

TABLE 2.1. BASIC OPERATIONS Name Notation Definition

1. Matrix Product C = AB For m×n A and n×p B: cik = aijbjk

n

j1

2. Scalar Multiplication C = bA For a scalar b: cij = baij3. Matrix Sum C = A+B For A and B m×n: cij = aij + bij 4. Transpose C = A For m×n A: cij = aji5. Matrix Inverse C = A-1 For A n×n nonsingular: AA-1 = Im

6. Trace c = tr(A) For n×n A: c = aii

n

i1

Page 27: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-8, Page 24 ___________________________________________________________________________

TABLE 2.2. OPERATIONS ON ELEMENTS Name Notation Definition

1. Element Product C = A.*B For A, B m×n: cij = aij bij2. Element Division C = A.÷B For A, B m×n: cij = aij/bij3. Logical Condition C = A.B For A, B m×n: cij = 1(aijbij) (Note 1)4. Row Minimum c = vmin(A) For m×n A: ci = min1km aik (Note 2)5. Row Min Replace C = rmin(A) For m×n A: cij = min1km aik (Note 3)6. Column Min Replace C = cmin(A) For m×n A: cij = min1kn aik (Note 4)

7. Cumulative Sum C = cumsum(A) For m×n A: cij = akj

i

k1

NOTES: 1. 1(P) is one of P is true, zero otherwise. The condition is also defined for the logical operations "<", ">", "", "=", and "". 2. c is a m×1 vector. The operation is also defined for "max". 3. C is a m×n matrix, with all columns the same. The operation is also defined for "max" 4. C is a m×n matrix, with all rows the same. The operation is also defined for "max".

TABLE 2.3. SHAPING OPERATIONS

Name Notation Definition1. Kronecker Product C = AB Note 12. Direct Sum C = AB Note 23. diag C = diag(x) C a diagonal matrix with vector x

down the diagonal4. vec or vecr c = vecr(A) vector c contains rows of A, stacked5. vecc c = vecc(A) vector c contains columns of A, stacked6. vech c = vech(A) vector c contains upper triangle

of A, row by row, stacked7. vecd c = vecd(A) vector c contains diagonal of A8. horizontal contatination C = A,B Partitioned matrix C = [ A B ]9. vertical contatination C = A;B Partitioned matrix C = [ A B]10. reshape C = rsh(A,k) Note 3

NOTES: 1. Also termed the direct product, the Kronecker product creates an array made up of blocks, with each block the product ofan element of A and the matrix B; see Section 2.11.

2. The direct sum is defined for a m×n matrix A and a p×q matrix B by the (m+p)×(n+q) partitioned array AB = .A 00 B

3. If A is m×n, then k must be a divisor of mn. The operation takes the elements of A row by row, and rewrites the successiveelements as rows of a matrix C that has k rows and mn/k columns.

Page 28: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-9, Page 25 ___________________________________________________________________________

2.3.3. The determinant of a n×n matrix A is denoted A or det(A), and has a geometricinterpretation as the volume of the parallelepiped formed by the column vectors of A. The matrix Ais nonsingular if and only if det(A) 0. A minor of a matrix A (of order r) is the determinant of asubmatrix formed by striking out n-r rows and columns. A principal minor is formed by striking outsymmetric rows and columns of A. A leading principal minor (of order r) is formed by striking outthe last n-r rows and columns. The minor of an element aij of A is the determinant of the submatrixAij formed by striking out row i and column j of A. Determinants satisfy the recursion relation

det(A) = (-1)i+jaijdet(Aij) = (-1)i+jaijdet(Aij),n

i1

n

j1

with the first equality holding for any j and the second holding for any i. This formula can be usedas a recursive definition of determinants, starting from the result that the determinant of a scalar isthe scalar. A useful related formula is

(-1)i+jaikdet(Aij)/det(A) = δkj,n

i1

where δkj is one if k = j and zero otherwise.

2.3.4. We list without proof a number of useful elementary properties of matrices:

(1) (A) = A.(2) If A-1 exists, then (A-1)-1 = A.(3) If A-1 exists, then (A)-1 = (A-1).(4) (AB) = BA.(5) If A,B are square, nonsingular, and commensurate, then (AB)-1 = B-1A-1.(6) If A is m×n, then Min m,n ρ(A) = ρ(A) = ρ(AA) = ρ(AA).(7) If A is m×n and B is m×r, then ρ(AB) min(ρ(A),ρ(B)).(8) If A is m×n with ρ(A) = m, and B is m×r, then ρ(AB) = ρ(B).(9) ρ(A+B) ρ(A) + ρ(B).(10) If A is n×n, then det(A) 0 if and only if ρ(A) = n.(11) If B and C are nonsingular and commensurate with A, then ρ(BAC) = ρ(A).(12) If A, B are n×n, then ρ(AB) ρ(A) + ρ(B) - n.(13) det(AB) = det(A)det(B).(14) If c is a scalar and A is n×n, then det(cA) = cndet(A)(15) The determinant of a matrix is unchanged if a scalar times one column (row) is added toanother column (row). (16) If A is n×n and diagonal or triangular, then det(A) is the product of the diagonal elements.(17) det(A-1) = 1/det(A).

Page 29: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-10, Page 26 ___________________________________________________________________________

(18) If A is n×n and B = A-1, then bij = (-1)i+jdet(Aij)/det(A).(19) The determinant of an orthonormal matrix is +1 or -1.(20) If A is m×n and B is n×m, then tr(AB) = tr(BA).(21) tr(In) = n.(22) tr(A+B) = tr(A) + tr(B).(23) A permutation matrix P is orthonormal; hence, P = P-1.(24) The inverse of a (upper) triangular matrix is (upper) triangular, and the inverse of adiagonal matrix D is diagonal, with (D-1)ii = 1/Dii. (25) The product of orthonormal matrices is orthonormal, and the product of permutationmatrices is a permutation matrix.

2.4. EIGENVALUES AND EIGENVECTORS

An eigenvalue of a n×n matrix A is a scalar λ such that Ax = λx for some vector x 0. The vectorx is called a (right) eigenvector. The condition (A-λI)x = 0 associated with an eigenvalue impliesA-λI simgular, and hence det(A-λI) = 0. This determanental equation defines a polynomial in λ oforder n, and the n roots of this polynomial are the eigenvalues. For each eigenvalue λ, the conditionthat A-λI is less than rank n implies the existence of one or more linearly independent eigenvectors;the number equals the multiplicity of the root λ. The following basic properties of eigenvalues andeigenvectors of a n×n matrix A are stated without proof:

(1) If A is real and symmetric, then its eigenvalues and eigenvectors are real. However, if A isnonsymmetric, then its eigenvalues and eigenvectors in general are complex. (2) The number of nonzero eigenvalues of A equals its rank ρ(A).(3) If λ is an eigenvalue of A, then λk is an eigenvalue of Ak, and 1/λ is an eigenvalue of A-1 (if theinverse exists).(4) If A is real and symmetric, then the eigenvalues corresponding to distinct roots are orthogonal.[Axi = λixi implies xiAxj = λixixj = λjxixj, which can be true for i j only if xixj = 0.](5) If A is real and symmetric, and Λ is a diagonal matrix with the roots of the polynomialdet(A-λI) along the diagonal, then there exists an orthonormal matrix C such that CC = I and AC= CΛ, and hence CAC = Λ and CΛC = A. The transformation C is said to diagonalize A.[Take C to be an array whose columns are eigenvectors of A, scaled to unit length. In the caseof a multiple root, orthonormalize the eigenvectors corresponding to this root.].(6) If A is real and nonsymmetric, there exists a nonsingular complex matrix Q and a uppertriangular complex matrix T with the eigenvalues of A on its diagonal such that Q-1AQ = T.(7) A real and symmetric implies tr(A) equals the sum of the eigenvalues of A. [Since A =CΛC, tr(A) = tr(CΛC) = tr(CCΛ) = tr(Λ) by 2.3.20.]

Page 30: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-11, Page 27 ___________________________________________________________________________

(8) If Ai are real and symmetric for i = 1,...,p, then there exists C orthonormal such that CAiC,are all diagonal if and only if AiAj = AjAi for i,j = 1,...,p.

Results (5) and (6) combined with the result 2.3.13 that the determinant of a matrix product is theproduct of the determinants of the matrices, implies that the determinant of a matrix is the productof its eigenvalues. The transformations in (5) and (6) are called similarity transformations, and canbe interpreted as representations of the transformation A when the basis of the domain is transformedby C (or Q) and the basis of the range is transformed by C-1 (or Q-1). These transformations are usedextensively in econometric theory.

2.5. PARTITIONED MATRICES

It is sometimes useful to partition a matrix into submatrices,

A = ,A11 A12

A21 A22

where A is m×n, A11 is m1×n1, A12 is m1×n2, A21 is m2×n1, A22 is m2×n2, and m1+m2 = m and n1+n2 =n. Matrix products can be written for partitioned matrices, applying the usual algorithm to thepartition blocks, provided the blocks are commensurate. For example, if B is n×p and is partitioned

B = where B1 is n1×p and B2 is n2×p, one has AB = . B1

B2

A11B1A12B2

A21B1A22B2

Partitioned matrices have the following elementary properties:

(1) A square and A11 square and nonsingular implies det(A) = det(A11)det(A22-A21A11-1A12).

(2) A and A11 square and nonsingular implies

A-1 = A 1

11 A 111 A12C

1A21A1

11 A 111 A12C

1

C 1A21A1

11 C 1

with C = A22-A21A11-1A12. When A22 is nonsingular, the northwest matrix in this partition can also

be written as (A11-A12A22-1A21)-1.

Page 31: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-12, Page 28 ___________________________________________________________________________

2.6. QUADRATIC FORMS

The scalar function Q(x,A) = xAx, where A is a n×n matrix and x is a n×1 vector, is termed aquadratic form; we call x the wings and A the center of the quadratic form. The value of a quadraticform is unchanged if A is replaced by its symmetrized version (A+A)/2. Therefore, A will beassumed symmetric for the discussion of quadratic forms.

A quadratic form Q(x,A) may fall into one of the classes in the table below:

Class Defining ConditionPositive Definite x0 Q(x,A) > 0Positive Semidefinite x0 Q(x,A) 0Negative Definite x0 Q(x,A) < 0Negative Semidefinite x0 Q(x,A) 0

A quadratic form that is not in one of these four classes is termed indefinite. The basic properties ofquadratic forms are listed below:

(1) If B is m×n and is of rank ρ(B) = r, then BB and BB are both positive semidefinite; and ifr = m n, then BB is positive definite.(2) If A is symmetric and positive semidefinite (positive definite), then the eigenvalues of A arenonnegative (positive). Similarly, if A is symmetric and negative semidefinite (negative definite),then the eigenvalues of A are nonpositive (negative).(3) Every symmetric positive semidefinite matrix A has a symmetric positive semidefinite squareroot A1/2 [By 2.4.4, CAC = D for some C orthonormal and D a diagonal matrix with thenonnegative eigenvalues down the diagonal. Then, A = CDC and A1/2 = CD1/2C with D1/2 adiagonal matrix of the positive square roots of the diagonal of D.](4) If A is positive definite, then A-1 is positive definite.(5) If A and B are real, symmetric n×n matrices and B is positive definite, then there exists a n×nmatrix Q that simultaneously diagonalizes A and B: QAQ = Λ diagonal and QBQ = I. [From2.4(5), there exists a n×n orthonormal matrix U such that UBU = D is diagonal. Let G be anorthonormal matrix that diagonalizes D-1/2UAUD-1/2, and define Q = UD-1/2G.] (6) B positive definite and A - B positive semidefinite imply B-1 - A-1 positive semidefinite. [Fora vector z, let x = Q-1z, where Q is the diagonalizing matrix from (5). Then z(B - A)z = xQ(B -A)Qx = x(Λ - I)x 0, so no diagonal element of Λ is less than one. Alternately, let x = Qz.Then z(B-1 - A-1)z = xQ-1(B-1 - A-1)(Q)-1x = x(I - Λ-1)x must be non-negative.](7) The following conditions are equivalent:

(i) A is positive definite (ii) The principal minors of A are positive (iii) The leading principal minors of A are positive.

Page 32: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-13, Page 29 ___________________________________________________________________________

2.7. THE LDU AND CHOLESKY FACTORIZATIONS OF A MATRIX

A n×n matrix A has a LDU factorization if it can be written A = LDU, where D is a diagonalmatrix and L and U are lower triangular matrices. This factorization is useful for computation ofinverses, as triangular matrices are easily inverted by recursion.

Theorem 2.1. Each n×n matrix A can be written as A = PLDUQ, where P and Q arepermutation matrices, L and U are lower triangular matrices, each with ones on the diagonal, andD is a diagonal matrix. If the leading principal minors of A are all non-zero, then P and Q canbe taken to be identity matrices.

Proof: First assume that the leading principal minors of A are all nonzero. We give a recursiveconstruction of the required L and U. Suppose the result has been established for matrices up to ordern-1. Then, write the required decomposition A = LDU for a n×n matrix in partitioned form

= ,A11 A12

A21 A22

L11 0

L21 1

D11 0

0 D22

U11 U21

0 1

where A11, L11, D11, and U11 are (n-1)×(n-1), L21 is 1×(n-1), U21 is 1×(n-1), and A22 and D22 are 1×1.Assume that L11, D11, and U11 have been defined so that A11 = L11D11U11, and that L11

-1 and U11-1 also

exist and have been computed. Let S = L-1 and T = U-1, and partition S and T commensurately withL and U. Then, A11

-1 = U11-1D11

-1L11-1

and the remaining elements must satisfy the equations

A21 = L21D11U11 L21 = A21U11-1D11

-1 A21T11D11-1

A12 = L11D11U21 U21 = D11-1L11

-1A12 D11-1S11A12

A22 = L21D11U21 + D22 D22 = A22 - A21U11-1D11

-1L11-1A12 = A22 - A21A11

-1A12 S21 = -L21S11 S22 = 1 T21 = -T11U21 T22 = 1

where det(A) = det(A11)det(A22 - A21A11-1A12) 0 implies D22 0. Since the decomposition is trivial

for n = 1, this recursion establishes the result, and furthermore gives the triangular matrices S and Tfrom the same recursion that can be multiplied to give A-1 = TD-1S.

Now assume that A is of rank r < n, and that the first r columns of A are linearly independent,with non-zero leading principal minors up to order r. Partition

= , A11 A12

A21 A22

L11 0

L21 I

D11 0

0 0

U11 U21

0 I

Page 33: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-14, Page 30 ___________________________________________________________________________

where A11 is r×r and the remaining blocks are commensurate. Then, U21 = D11-1S11A12 and L21 =

A21T11D11-1, and one must satisfy A22 = L21D11U12 = A21A11

-1A12. But the rank condition implies the

last n-r columns of A can be written as a linear combination = of the first rA12

A22

A11

A21

C

columns, where C is some r×(n-r) matrix. But A12 = A11C implies C = A11-1A12 and hence A22 = A21C

= A21A11-1A12 as required.

Finally, consider any real matrix A of rank r. By column permutations, the first r columns canbe made linearly independent. Then, by row permutations, the first r rows of these r columns can bemade linearly independent. Repeat this process recursively on the remaining northwest principalsubmatrices to obtain products of permutation matrices that give nonzero leading principal minorsup to order r. This defines P and Q, and completes the proof of the theorem. G

Corollary 2.1.1. If A is symmetric, then L = U.Corollary 2.1.2. (LU Factorization) If A has nonzero leading principal minors, then A can bewritten A = LV, where V = DU is upper triangular with a diagonal coinciding with that of D.

Corollary 2.1.3. (Cholesky Factorization) If A is symmetric and positive definite, then A can bewritten A = VV, where V = LD1/2 is lower triangular with a positive diagonal.Corollary 2.1.4. A symmetric positive semidefinite implies A = PVVP, with V lowertriangular with a nonnegative diagonal, P a permutation matrix.Corollary 2.1.5. If A is m×n with m n, then there exists a factorization A = PLDUQ, withD n×n diagonal, P a m×m permutation matrix, Q a n×n permutation matrix, U a n×n lowertriangular matrix with ones on the diagonal, and L a m×n lower triangular matrix with ones onthe diagonal (i.e., L has the form L = [L11 L21] with L11 n×n and lower triangular with oneson the diagonal, and L21 (m-n)×n. Further, if ρ(A) = n, then (AA)-1A = QU

-1D-1(LL)-1LP.Corollary 2.1.6. If the system of equations Ax = y with A m×n of rank n has a solution, then thesolution is given by x = (AA)-1Ay = QU

-1D-1(LL)-1LPy.

Proof outline: To show Corollary 3, note that a positive definite matrix has positive leading principalminors, and note from the proof of the theorem that this implies that the diagonal of D is positive.Take V = D1/2U, where D1/2 is the positive square root. The same construction applied to the LDUfactorization of A after permutation gives Corollary 4. To show Corollary 5, note first that the rowsof A can be permuted so that the first n rows are of maximum rank ρ(A). Suppose A =

Page 34: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-15, Page 31 ___________________________________________________________________________

is of this form, and apply the theorem to obtain A11 = P11L11DUQ. The rank conditionA11 A12

A21 A22

implies that A21 = FA11 for some (m-n)×n array F. Then, A21 = L21DUQ, with L21 = FP11L11, so that

A = DUQ.L11

L21

To complete the proof, apply a left permutation if necessary to undo the initial row permutation ofA. Corollary 6 is an implication of the last result.

The recursion in the proof of the theorem is called Crout's algorithm, and is the method for matrixinversion of positive definite matrices used in many computer programs. It is unnecessary to do thepermutations in advance of the factorizations; they can also be carried out recursively, bringing inrows (in what is termed a pivot) to make the successive elements of D as large in magnitude aspossible. This pivot step is important for numerical accuracy.

The Cholesky factorization of a n×n positive definite matrix A that was obtained above as acorollary of the LDU decomposition states that A can be written as A = LL, where L is lowertriangular with a positive diagonal. This factorization is readily computed and widely used ineconometrics. We give a direct recursive construction of L that forms the basis for its computation.Write the factorization in partitioned form

A = .

A11 A12 A13

A21 A22 A23

A31 A32 A33

L11 0 0

L21 L22 0

U31 L32 L33

L11 0 0

L21 L22 0

L31 L32 L33

´

Also, let V = L-1, and partition it commensurately, so that

.

I1 0 0

0 I2 0

0 0 I3

L11 0 0

L21 L22 0

L31 L32 L33

´ V11 0 0

V21 V22 0

V31 V32 V33

Then A11 = L11L11, A12 = L11L21, A22 = L21L21 + L22L22, V11 = L11-1, V22 = L22

-1, and 0 = L21V11 +L22V22. Note first that if A11 is 1×1, then L11 = A11

1/2 and V11 = 1/L11. Now suppose that one hasproceeded recursively from the northwest corner of these matrices, and that L11 and V11 have alreadybeen computed up through dimension n1. Suppose that A22 is 1×1. Then, compute in sequence L21= V11A12

., L22 = (A22 - L12L12)1/2, V22 = 1/L22, and V12 = - V11L21V22. This gives the required factors

Page 35: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-16, Page 32 ___________________________________________________________________________

up through dimension n1+1. Repeat this for each dimension in turn to construct the full L and Vmatrices.

An extension of the Cholesky decomposition holds for an n×n positive semidefinite matrix A ofrank r, which can be written as A = PLLP with P a permutation matrix and L a lower triangularmatrix whose first r diagonal elements are positive. The construction proceeds recursively as before,but at each stage one may have to search among remaining columns to find one for which L22 > 0,determining the P matrix. Once dimension r is reached, all remaining columns will have L22 = 0.Now reinterpret L21 and L22 as a partition corresponding to all the remaining columns and computeL12 = V11A12

. and L22 = 0 to complete the Cholesky factor.

2.8. THE SINGULAR VALUE DECOMPOSITION OF A MATRIX

A factorization that is useful as a tool for finding the eigenvalues and eigenvectors of a symmetricmatrix, and for calculation of inverses of moment matrices of data with high multicollinearity, is thesingular value decomposition (SVD):

Theorem 2.2. Every real m×n matrix A of rank r can be decomposed into a product A = UDV,where D is a r×r diagonal matrix with positive nonincreasing elements down the diagonal, U is m×r,V is n×r, and U and V are column-orthonormal; i.e., UU = Ir = VV.

Proof: Note that the SVD is an extension of the LDU decomposition to non-square matrices. Toprove that the SVD is possible, note first that the m×m matrix AA is symmetric and positivesemidefinite. Then, there exists a m×m orthonormal matrix W whose columns are eigenvectors ofAA arranged in non-increasing order for the eigenvalues, partitioned W = [W1 W2] with W1 ofdimension m×r, such that W1(AA)W1 = Λ is diagonal with positive, non-increasing diagonalelements, and W2(AA)W2 = 0, implying AW2 = 0. Define D from Λ by replacing the diagonalelements of Λ by their positive square roots. Note that WW = I = WW W1 W1 + W2W2.Define U = W1 and V = D-1UA. Then, UU = Ir and VV = D-1UAAUD-1 = D-1ΛD-1 = Ir. Further,A = (Im-W2 W2)A = UUA = UDV. This establishes the decomposition.

If A is symmetric, then U is the array of eigenvectors of A corresponding to the non-zero roots,so that AU = UD1, with D1 the r×r diagonal matrix with the non-zero eigenvalues in descendingmagnitude down the diagonal. In this case, V = AUD-1 = UD1D-1. Since the elements of D1 and Dare identical except possibly for sign, the columns of U and V are either equal (for positive roots) orreversed in sign (for negative roots). Then, the SVD of a square symmetric nonsingular matrixprovides the pieces necessary to write down its eigenvalues and eigenvectors. For a positive definitematrix, the connection is direct.

Page 36: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-17, Page 33 ___________________________________________________________________________

When the m×n matrix A is of rank n, so that AA is symmetric and positive definite, the SVDprovides a method of calculating (AA)-1 that is particularly numerically accurate: Substituting theform A = UDV, one obtains (AA)-1 = VD-2V. One also obtains convenient forms for a square rootof AA and its inverse, (AA)1/2 = VDV and (AA)-1/2 = VD-1V.

The numerical accuracy of the SVD is most advantageous when m is large and some of thecolumns of A are nearly linearly dependent. Then, roundoff errors in the matrix product AA can leadto quite inaccurate results when a matrix inverse of AA is computed directly. The SVD extracts therequired information from A before the roundoff errors in AA are introduced. Computer programsfor the Singular Value Decomposition can be found in Press et al, Numerical Recipes, CambridgeUniversity Press, 1986.

2.9. IDEMPOTENT MATRICES AND GENERALIZED INVERSES

A symmetric n×n matrix A is idempotent if A2 = A. Examples of idempotent matrices are 0, I,and for any n×r matrix X of rank r, X(XX)-1X. Idempotent matrices are intimately related toprojections, discussed in the following section. Some of the properties of an n×n idempotent matrixA are listed below:

(1) The eigenvalues of A are either zero or one.(2) The rank of A equals tr(A).(3) The matrix I-A is idempotent.(4) If B is an orthonormal matrix, then BAB is idempotent.(5) If ρ(A) = r, then there exists a n×r matrix B of rank r such that A = B(BB)-1B. [Let C be anorthonormal matrix that diagonalizes A, and take B to be the columns of C corresponding to thenon-zero elements in the diagonalization.](6) A, B idempotent implies AB = 0 if and only if A+B is idempotent.(7) A, B idempotent and AB = BA implies AB idempotent. (8) A, B idempotent implies A-B idempotent if and only if BA = B.

Recall that a n×n non-singular matrix A has an inverse A-1 that satisfies AA-1 = A-1A = I. It isuseful to extend the concept of an inverse to matrices that are not necessarily non-singular, or evensquare. For an m×k matrix A (of rank r), define its Moore-Penrose generalized inverse A to be ak×m matrix with the following three properties:

(i) AAA = A, (ii) AAA = A (iii) AA and AA are symmetric

Page 37: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-18, Page 34 ___________________________________________________________________________

The next theorem shows that the Moore-Penrose generalized inverse always exists, and is unique.Conditions (i) and (ii) imply that the matrices AA and AA are idempotent. There are othergeneralized inverse definitions that have some, but not all, of the properties (i)-(iii); in particular A+

will denote any matrix that satisfies (i), or AA+A = A.

Theorem 2.3. The Moore-Penrose generalized inverse of a m×k matrix A of rank r (which hasa SVD A = UDV, where U is m×r, V is k×r, U and V are column-orthogonal, and D is r×r diagonalwith positive diagonal elements) is the matrix A = VD-1U. Let A+ denote any matrix, including A,that satisfies AA+A = A. These matrices satisfy:

(1) A = A+ = A-1 if A is square and non-singular. (2) The system of equations Ax = y has a solution if and only if y = AA+y, and the linear subspaceof all solutions is the set of vectors x = A+y + [I - A+A]z for z k. (3) AA+ and A+A are idempotent. (4) If A is idempotent, then A = A . (5) If A = BCD with B and D nonsingular, then A = D-1 C B-1, and any matrix A+ = D-1C+B-1

satisfies AA+A = A. (6) (A) = (A) (7) (AA) = A(A) (8) (A) = A = AA(A) = (A)AA.

(9) If A = Ai with AiAj = 0 and AiAj = 0 for i j, then A = Ai.

i

i

Theorem 2.4. If A is m×m, symmetric, and positive semidefinite of rank r, then (1) There exist Q positive definite and R idempotent of rank r such that A = QRQ and A =Q-1RQ-1.(2) There exists an m×r column-orthonormal matrix U such that UAU = D is positive diagonal,A = UDU, A = UD-1U = U(UAU)-1U, and any matrix A+ satisfying condition (i) for ageneralized inverse, AA+A = A, has UA+U = D-1. (3) A has a symmetric square root B = A1/2, and A = BB.

Proof: Let U be an m×r column-orthonormal matrix of eigenvectors of A corresponding to thepositive characteristic roots, and W be a m×(m-r) column-orthonormal matrix of eigenvectorscorresponding to the zero characteristic roots. Then [U W] is an orthonormal matrix diagonalizing

A, with = and D positive diagonal. Define Q = ,U

WA U W

D 00 0

U WD 1/2 0

0 Imr

U

W

and R = UU. The diagonalizing transformation implies UAU = D and AW = 0. One has UU = Ir,

Page 38: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-19, Page 35 ___________________________________________________________________________

WW = Im-r, and UU + WW = Im. Since AW = 0, A = A[UU + WW] = AUU and D = UAU =UAA+AU = UAUUA+UUAU = DUA+UD, implying UA+U = D-1. Define B = UD1/2U. G

2.10. PROJECTIONS

Consider a Euclidean space n of dimension n, and suppose X is a n×p array with columns thatare vectors in this space. Let X denote the linear subspace of n that is spanned or generated by X;i.e., the space formed by all linear combinations of the vectors in X. Every linear subspace can beidentified with an array such as X. The dimension of the subspace is the rank of X. (The array Xneed not be of full rank, although if it is not, then a subarray of linearly independent columns alsogenerates X.) A given X determines a unique subspace, so that X characterizes the subspace.However, any set of vectors contained in the subspace that form an array with the rank of thesubspace, in particular any array XA with rank equal to the dimension of X, also generates X. Then,X is not a unique characterization of the subspace it generates.

The projection of a vector y in n into the subspace X is defined as the point v in X that is theminimum Euclidean distance from y. Since each vector v in X can be represented as a linearcombination Xα of an array X that generates X, the projection is characterized by the value of α thatminimizes the squared Euclidean distance of y from X, (y-Xα)(y-Xα). The solution to this problemis the vector = (XX)Xy giving v = X = X(XX)Xy. In these formulas, we use theMoore-Penrose generalized inverse (XX) rather than (XX)-1 so that the solution is defined evenif X is not of full rank. The array PX = X(XX)X is termed the projection matrix for the subspaceX; it is the linear transformation in n that maps any vector in the space into its projection v in X.The matrix PX is idempotent (i.e., PXPX = PX and PX = PX), and every idempotent matrix can beinterpreted as a projection matrix. These observations have two important implications: First, theprojection matrix is uniquely determined by X, so that starting from a different array that generatesX, say an array S = XA, implies PX = PS. (One could use the notation PX rather than PX to emphasizethat the projection matrix depends only on the subspace, and not on any particular set of vectors thatgenerate X.) Second, if a vector y is contained in X, then the projection into X leaves it unchanged,PXy = y.

Define QX = I - PX = I - X(XX)-1X; it is the projection to the subspace orthogonal to thatspanned by X. Every vector y in n is uniquely decomposed into the sum of its projection PXy ontoX and its projection QXy onto the subspace orthogonal to X. Note that PXQX = 0, a property that holdsin general for two projections onto orthogonal subspaces.

If X is a subspace generated by an array X and W is a subspace generated by a more inclusivearray W = [X Z], then X W. This implies that PXPW = PWPX = PX; i.e., a projection onto a subspaceis left invariant by a further projection onto a larger subspace, and a two-stage projection onto a largesubspace followed by a projection onto a smaller one is the same as projecting directly onto thesmaller one. The subspace of W that is orthogonal to X is generated by QXW; i.e., it is the set of

Page 39: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-20, Page 36 ___________________________________________________________________________

linear combinations of the residuals, orthogonal to X, obtained by the difference of W and itsprojection onto X. Note that any y in n has a unique decomposition PXy + QXPWy + QWy into thesum of projections onto three mutually orthogonal subspaces, X, the subspace of W orthogonal to X,and the subspace orthogonal to W. The projection QXPW can be rewritten QXPW = PW - PX = PWQX

= QXPWQX, or since QXW = QX[X Z] = [0 QXZ], QXPW = = QXZ(ZQXZ) ZQX.PQXW PQXZ

This establishes that PW and QX commute. This condition is necessary and sufficient for the productof two projections to be a projection; equivalently, it implies that QXPW is idempotent since(QXPW)(QXPW) = QX(PWQX)PW = QX(QXPW)PW = QXPW.

2.11. KRONECKER PRODUCTS

If A is a m×n matrix and B is a p×q matrix, then the Kronecker (direct) product of A and B is the(mp)×(nq) partitioned array

AB = .

a11B

a21B

:an1B

a12B

a22B

:an2B

...

...:

...

a1mB

a2mB

:anmB

In general, AB BA. The Kronecker product has the following properties:

(1) For a scalar c, (cA)B = A(cB) = c(AB).(2) (AB)C = A(BC). (3) (AB) = (A)(B).(4) tr(AB) = (tr(A))(tr(B)) when A and B are square.(5) If the matrix products AC and BF are defined, then (AB)(CF) = (AC)(BF).(6) If A and B are square and nonsingular, then (AB)-1 = A-1B-1.(7) If A and B are orthonormal, then AB is orthonormal.(8) If A and B are positive semidefinite, then AB is positive semidefinite.(9) If A is k×k and B is n×n, then det(AB) = det(A)n

det(B)k.(10) ρ(AB) = ρ(A)ρ(B).(11) (A+B)C = AC + BC.

Page 40: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-21, Page 37 ___________________________________________________________________________

2.12. SHAPING OPERATIONS

The most common operations used to reshape vectors and matrices are (1) C = diag(x) whichcreates a diagonal matrix with the elements of the vector x down the diagonal; (2) c = vecc(A) whichcreates a vector by stacking the columns of A, and vecr(A) = vecc(A); (3) c = vech(A) which createsa vector by stacking the portions of the rows of A that are in the upper triangle of the matrix; and (4)c = vecd(A) which creates a vector containing the diagonal of A. (In some computer matrixlanguages, vec(A) stacks by row rather than by column.) There are a few rules that can be used tomanipulate these operations:

(1) If x and y are commensurate vectors, diag(x+y) = diag(x) + diag(y).(2) vecc(A+B) = vecc(A) + vecc(B).(3) If A is m×k and B is k×n, then vecr(AB) = (InA)vecr(B) = (BIm)vecr(A).(4) If A is m×k, B is k×n, C is n×p, then vecr(ABC) = (Ip(AB))vecr© = (CA)vecr(B) =((CB)Im)vecr(A). (5) If A is n×n, then vech(A) is of length n(n+1)/2.(6) vecd(diag(x)) = x.

2.13. VECTOR AND MATRIX DERIVATIVES

The derivatives of functions with respect to the elements of vectors or matrices can sometimesbe expressed in a convenient matrix form. First, a scalar function of a n×1 vector of variables, f(x),has partial derivatives that are usually written as the arrays

f/x = , f2/xx = .

f/x1

f/x2

|f/xn

f 2/x1 f 2/x1x2 ... f 2/x1xn

f 2/x2x1 f 2/x2 ... f 2/x2xn

| | |

f 2/xnx1 f 2/xnx2 ... f 2/xn

Other common notation is fx(x) or xf(x) for the vector of first derivatives, and fxx(x) or xxf(x) for thematrix of second derivatives. Sometimes, the vector of first derivatives will be interpreted as a rowvector rather than a column vector. Some examples of scalar functions of a vector are the linearfunction f(x) = ax, which has xf = a, and the quadratic function f(x) = xAx, which has xf = 2Ax.

When f is a column vector of scalar functions, f(x) = [f1(x) f2(x) ... fk(x)], then the array of firstpartial derivatives is called the Jacobean matrix and is written

Page 41: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-22, Page 38 ___________________________________________________________________________

J(x) = .

f 1/x1 f 1/x2 ... f 1/xn

f 2/x1 f 2/x2 ... f 2/xn

f k/x1 f k/x2 ... f k/xn

When calculating multivariate integrals of the form g(y)dy, where y n, A n, and g is aA

scalar or vector function of y, one may want to make a nonlinear one-to-one transformation ofvariables y = f(x). In terms of the transformed variables, the integral becomes

g(y)dy = g(f(x))det(J(x))dx ,A f 1(A)

where f-1(A) is the set of x vectors that map onto A, and the Jacobean matrix is square andnonsingular for well-behaved one-to-one transformations. The intuition for the presence of theJacobean determinant in the transformed integral is that "dy" is the volume of a small rectangle iny-space, and because determinants give the volume of the parallelepiped formed by the columns ofa linear transformation, det(J(x))dx gives the volume (with a plus or minus sign) of the image inx-space of the "dy" rectangle in y-space.

It is useful to define the derivative of a scalar function with respect to a matrix as an array ofcommensurate dimensions. Consider the bilinear form f(A) = xAy, where x is n×1, y is m×1, andA is n×m. By collecting the individual terms f/Aij = xiyj, one obtains the result f/A = xy.Another example for a n×n matrix A is f(A) = tr(A), which has f/A = In. There are a few otherderivatives that are particularly useful for statistical applications. In these formulas, A is a squarenonsingular matrix. We do not require that A be symmetric, and the derivatives do not imposesymmetry. One will still get valid calculations involving derivatives when these expressions areevaluated at matrices that happen to be symmetric. There are alternative, and somewhat morecomplicated, derivative formulas that hold when symmetry is imposed. For analysis, it is unnecessaryto introduce this complication.

(1) If det(A) > 0, then log(det(A))/A = A-1. (2) If A is nonsingular, then (xA-1y)/A = - A-1xyA-1. (3) If A = TT, with T square and nonsingular, then (xA-1y)/T = - 2A-1xyA-1T.

We prove the formulas in order. For (1), recall that det(A) = (-1)i+kaikdet(Aik), where Aik is thek

minor of aik. Then, det(A)/Aij = (-1)i+jdet(Aij). From 2.3.17, the ij element of A-1 is(-1)i+jdet(Aij)/det(A). For (2), apply the chain rule to the identity AA-1 I to get ∆ijA-1 + AA-1/Aij

Page 42: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-23, Page 39 ___________________________________________________________________________

0, where ∆ij denotes a matrix with a one in row i and column j, zeros elsewhere. Then, xA-1y/Aij= - xA-1∆ijA-1y = (A-1x)i(A-1y)j. For (3), first note that Aij/Trs = δirTjs+δjrTis. Combine this with (2)to get

xA-1y/Trs = (A-1x)i(A-1y)j(δirTjs+δjrTis) j

= (A-1x)r(A-1y)jTjs + (A-1x)i(A-1y)rTis = 2(A-1xyA-1T)rs. j

i

2.14. UPDATING AND BACKDATING MATRIX OPERATIONS

Often in statistical applications, one needs to modify the calculation of a matrix inverse or othermatrix operation to accommodate the addition of data, or deletion of data in bootstrap methods. Itis convenient to have quick methods for these calculations. Some of the useful formulas are givenbelow:

(1) If A is n×n and nonsingular, and A-1 has been calculated, and if B and C are arrays that are n×kof rank k, then (A+BC)-1 = A-1 - A-1B(Ik+CA-1B)-1CA-1, provided Ik+CA-1B is nonsingular. Nomatrix inversion is required if k = 1.(2) If A is m×n with m n and ρ(A) = n, so that it has a LDU factorization A = PLDUQ with D n×n

diagonal, P and Q permutation matrices, L and U lower triangular, then the array , with B k×n,AB

has the LDU factorization , where C = BQU-1D-1.

P 00 Ik

LC

DUQ

(3) Suppose A is m×n of rank n, and b = (AA)-1Ay. Suppose A* = and with C k×nAC

yyw

and w k×1, and b* = (A*A*)-1 A*y*. Then,

b* - b = (AA)-1C[Ik+C(AA)-1C]-1(w-Cb) = (A*A*)-1C[Ik-C(A*A*)-1C]-1(w-Cb*).

One can verify (1) by multiplication. To show (2), use Corollary 5 of Theorem 2.1. To show (3),apply (1) to A*A* = AA + CC, or to AA = A*A* - CC, and use A*y* = Ay + Cw.

Page 43: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-24, Page 40 ___________________________________________________________________________

40

2.15. NOTES AND COMMENTS The basic results of linear algebra, including the results stated without proof in this summary, canbe found in standard linear algebra texts, such as G. Hadley (1961) Linear Algebra, Addison-Wesleyor F. Graybill (1983) Matrices with Applications in Statistics, Wadsworth. The organization of thissummary is based on the admirable synopsis of matrix theory in the first chapter of F. Graybill (1961)An Introduction to Linear Statistical Models, McGraw-Hill. For computations involving matrices,W. Press et al (1986) Numerical Recipes, Cambridge Univ. Press, provides a good discussion ofalgorithms and accompanying computer code. For numerical issues in statistical computation, seeR. Thisted (1988) Elements of Statistical Computing, Chapman and Hall.

2.16. Exercises

1. The conjugate of a complex number z = a+ιb is the complex number z* = a - ιb. The square of the modulus of acomplex number is the product of the number and its complex conjugate. Show that this definition is the same as thedefinition of the modulus r of a complex number written in polar representation as z = reιθ = r(cos θ + ιsin θ).

2. Show that A \ B = ABc, AB = AB \ (A\B) \ (B\A),AB = AB (A\B) (B\A), and if AB = , then A\B = A

3. Consider the real-valued function y = f(x) x2 on the real line. Find the image of sets of the form [a,b]. Find theinverse image of sets of the form [c,d]. Is the mapping f-1 a real-valued function?

4. Use the Cauchy criterion to show that the sequence an = 1 + ... + rn has a limit if |r| < 1, but not if r 1.

5. Show that the real line is a separable metric space for each of the following distance functions: ρ(x,y) = x-y, ρ(x,y)= x-y1/2, ρ(x,y) = min(x-y,1). Show that ρ(x,y) = (x - y)2 fails to satisfy the triangle inequality for distance functions.

6. Show that the function f(x) = sin (1/x) is continuous, but not uniformly continuous, on the set (0,1]. Prove that acontinuous function on a compact set is uniformly continuous.

7. What are the differentiability properties of the real-valued function f(x) = x7/2 at x = 0? At x 0? Does this functionhave a second-order Taylor's expansion at x = 0?

8. Find the limit of the function xαlog(x) for positive x as x goes to zero, where α is a positive constant. What about(1+αx)1/x?

9. Show that the series an = (-1)n is Cesaro summable, but not summable.

10. Use Kronecker's lemma to show an = log(1+n) and bn = n1/α for any positive constant α imply n-1/αlog((n+1)!) 0.

Page 44: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-25, Page 41 ___________________________________________________________________________

41

11. State Holder's inequality in the limiting case p = 1 and q = +.

12. Consider the matrix A = . Is it symmetric? Idempotent? What is its rank?1 0.5

0.5 0.25

13. What is the rank of the matrix A = ?1 0.5 0.25

0.5 0.25 0.25

14. For the matrices A = and B = , determine which of the operations in Tables1 0.51 0.5

1 0.5 0.250.5 0.25 0.25

2.1-2.3 can be applied, and calculate the result if the operations do apply.

15. The determanent of a 2×2 matrix A = is det(A) = ad - bc. Show that this formula satisfies thea bc d

determanental identity in Section 2.3.3.

16. Prove Section 2.4 (3) and (4).

17. Prove, by multiplying out the formula, the result in Section 2.5 (2) for the inverse of partitioned matrices.

18. Prove Section 2.6 (1).

19. Calculate the Cholesky factorization and the Singular Value decomposition of the matrix A = .2 11 3

20. The Singular Value Decomposition of a matrix A of dimension m×k and rank r was defined in Section 2.8 as aproduct A = UDV, where U was m×r, D was r×r, V was k×r, the matrices U and V were both column orthogonal (i.e.,UU = I = VV) and D was diagonal with positive diagonal elements. An alternative definition, which is equivalent, isto write A = [U W2][D 0]V, where U, D, and V are the same as before, the array of 0's in [D 0] is r×(m-r), W2 ism×(m-r), and [U W2] is m*m and column orthogonal, and therefore orthonormal. Some computer programs give SVD'sin this alternative form. Define B = V[D-1 0][U W2] and C = V[D-1 G][U W2], where G is any non-zero r×(m-r) array.Show that ABA = ACA = A and that BAB = CAC = B, but then CAC C.

Page 45: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools, © 2000 Chapter 2-26, Page 42 ___________________________________________________________________________

42

21. Calculate the Moore-Penrose generalized inverse of A = , and show that A-A and AA- are idempotent.1 0.51 0.5

22. Consider the matrices A = , B = , C = , E = , and F = .1 01 0

1 00 0

1 10 0

0.5 0.50 0

0.5 0.5α α

Which of the matrices B, C, E, F meet which of the conditions in Section 2.9 to be a generalized inverse of A?

23. Prove Theorem 2.3 (2). Show that if one writes the matrix in terms of its SVD, A = UDV, then the equations havea solution iff UUy = y, and if there is a solution, then it satisfies x = VD-1Uy + [I - UU]z.

24. In 3, consider the subspace A generated by the vectors (1,1,1), (1,-1,1), and (1,3,1), the subspace B generated bythe vectors (2,1,1) and (-4,-2,-2), and the subspace C generated by the vector (1,1,1). What are the dimensions of thesesubspaces? What is the projection of B and C on A? Of A and C on B? Of A and B on C?

25. Prove a linear transformation A in n is a projection iff A is idempotent. Show that if A and B are projections, thenA + B is a projection iff AB = BA = 0, and A - B is a projection iff AB = BA = B.

26. Prove 2.11 (6) and (8).

27. Verify Section 2.12 (3) and (4) for the matrices A = , B = , C = .1 01 0

1 00 0

1 10 0

28. Consider the function g(x1,x2) = exp(-x1/2 -x2/2), and the transformation of variables x1 = rcos θ and x2 = rsin θ for

r 0 and 0 θ 2π. What is the Jacobean of this transformation? Evaluate the integral g(x1,x2)dx1dx2.

29. Prove 2.14 (1) and (2).

30. Suppose real-valued functions F(x) and G(x) are continuously differentiable on [a,b] with f(x) = xF(x) and g(x) =xG(x). Then, x(F(x)G(x)) = F(x)g(x) + f(x)G(x). Integrate this formula over [a,b] to establish the integration by parts

formula f(x)G(x)dx = F(b)G(b) - F(a)G(a) - F(x)g(x)dx. Evaluate the integral xe-xdx.b

a b

a

0

Page 46: McFadden-Statistical Tools for Economists

!"#$%%&'()!"#"$%"$&#'()**'%+(!)*+++))))))))))))))))))))))))))))))))))))))))))))))))))),-$./&0)123()4$5&)61!777777777777777777777777777777777777777777777777777777777777777777777777777!

,894:;<)1=)))9)<;>?;@)A#)4<AB9B?C?:D):8;A<D

1=3=))E9!4C;)E49,;

"#$%&'()'*+,%-.*+'%/.)%-).0(0*1*'2%'#$.)2%*&%'#$%3.+3$-'%./%(%"#$#%!&'!($#)*%4%5#*3#%*&%(%6$&3)*-'*.+./%$7$)2'#*+,%'#('%#(&%#(--$+$6%(+6%5*11%#(--$+%*+%'#$%8+*7$)&$9%%:+%-()'*381()4%'#*&%6$&3)*-'*.+*+3186$&%'#$%.8'3.;$&%./%(11%-).0(0*1*'2%(+6%&(;-1*+,%$<-$)*;$+'&9%%"#$%&$'%./%(11%-.&&*01$%&'('$&%./=('8)$%*&%3(11$6%'#$%"$+,-%!",$.%9%%>$'%"!6$+.'$%(%&'('$%./%=('8)$4%(+6%E%'#$%&(;-1$%&-(3$9%%"#$&$%()$(0&')(3'%.0?$3'&%'#('%-1(2%(%3.+3$-'8(1%)('#$)%'#(+%(%-)(3'*3(1%).1$%*+%'#$%6$7$1.-;$+'%./%-).0(0*1*'2'#$.)29%%@.+&$A8$+'124%'#$)$%3(+%0$%3.+&*6$)(01$%/1$<*0*1*'2%*+%'#*+B*+,%(0.8'%5#('%,.$&%*+'.%'#$6$&3)*-'*.+%./%(%&'('$%./%=('8)$%(+6%*+'.%'#$%&-$3*/*3('*.+%./% '#$%&(;-1$%&-(3$C% '#$%.+12%3)*'*3(1)$&')*3'*.+%*&%'#('%'#$)$%0$%$+.8,#%&'('$&%./%=('8)$%&.%'#('%6*&'*+3'%.0&$)7('*.+&%()$%(15(2&%(&&.3*('$65*'#%6*&'*+3'%&'('$&%./%=('8)$9%%:+%$1$;$+'()2%-).0(0*1*'2%'#$.)24%*'%*&%./'$+%3.+7$+*$+'%'.%'#*+B%./%'#$&'('$&%./%=('8)$%(&%3.))$&-.+6*+,%'.%'#$%.8'3.;$&%./%(%-()'*381()%$<-$)*;$+'4%&83#%(&%/1*--*+,%3.*+&.)%'.&&*+,%6*3$4%(+6%'.%&8--)$&&%'#$%6$&3)*-'*.+%./%$7$)2'#*+,%$1&$%*+%'#$%8+*7$)&$9%%D$3'*.+&%E9FGE9H*+%'#*&%@#(-'$)%3.+'(*+%(%/$5%3)83*(1%6$/*+*'*.+&4%/.)%$7$+'&4%-).0(0*1*'*$&4%3.+6*'*.+(1%-).0(0*1*'*$&4(+6% &'('*&'*3(1% *+6$-$+6$+3$9% % "#$2% (1&.% 3.+'(*+% (% ')$(';$+'% ./% ;$(&8)(0*1*'24% '#$% '#$.)2% ./*+'$,)('*.+4%(+6%-).0(0*1*'2%.+%-).683'%&-(3$&%'#('%*&%+$$6$6%;.&'12%/.)%;.)$%(67(+3$6%'.-*3&%*+$3.+.;$')*3&9%%"#$)$/.)$4%)$(6$)&%5#.%6.%+.'%#(7$%(%,..6%0(3B,).8+6%*+%;('#$;('*3(1%(+(12&*&%;(2/*+6%*'%8&$/81%'.%3.+3$+')('$%.+%'#$%6$/*+*'*.+&%(+6%$<(;-1$&%*+%'#$&$%&$3'*.+&4%(+6%-.&'-.+$%&'862./%'#$%;.)$%;('#$;('*3(1%;('$)*(1%8+'*1%*'%*&%+$$6$69

1=*=));>;F:)#?;CGE)9FG)?F#A<!9:?AF

E9F9I9%%J+%%/%0#!*&%(%&$'%./%&'('$&%./%=('8)$%5*'#%'#$%-).-$)'2%'#('%.+$%3(+%*+%-)*+3*-1$%6$'$);*+$5#$'#$)%'#$%$7$+'%.338)&%.)%+.'9%%:/%&'('$&%./%=('8)$%6$&3)*0$%(11%#(--$+*+,&4%*+3186*+,%'#$%.8'3.;$./%(%-()'*381()%3.*+%'.&&4%'#$+%.+$%$7$+'%;*,#'%0$%'#$%&$'%./%&'('$&%./%=('8)$%*+%5#*3#%'#*&%3.*+%'.&&3.;$&%8-%#$(6&9% %"#$% /(;*12%./%-.'$+'*(112%.0&$)7(01$%$7$+'&% *&%6$+.'$6%02%!9% %"#*&% /(;*12% *&(&&8;$6%'.%#(7$%'#$%/.11.5*+,%-).-$)'*$&KL*M%"#$%N(+2'#*+,%3(+%#(--$+N%$7$+'%E%*&%*+%!9L**M%:/%$7$+'%9%*&%*+%!4%'#$+%'#$%$7$+'%N+.'%9N4%6$+.'$6%93%.)%EO94%*&%*+%!9L***M%:/%9%(+6%B%()$%$7$+'&%*+%!4%'#$+%'#$%$7$+'%N0.'#%9%(+6%BN4%6$+.'$6%9B4%*&%*+%!9L*7M%:/%9I49F4999%%*&%(%/*+*'$%.)%3.8+'(01$%&$A8$+3$%./%$7$+'&%*+%!4%'#$+%'#$%$7$+'%N.+$%.)%;.)$%./

9I%.)%9F%.)%999N4%6$+.'$6% 9*4%*&%*+%!9

1I

J%/(;*12%!%%5*'#%'#$&$%-).-$)'*$&%*&%3(11$6%(%P2'1%-3!L.)%4&&-%$0!P2$-5%6*$M%./%&80&$'&%./%E9%%"#$%-(*)LE(!M%3.+&*&'*+,%./%(+%(0&')(3'%&$'%E%(+6%(%PG/*$16%!%./%&80&$'&%./%E%*&%3(11$6%(%+%$")*$6-%!",$.%4%(+6'#$%&$'&%*+%!%()$%3(11$6%'#$%+%$")*$6-%!&80&$'&%./%E9%%:;-1*3('*.+&%./%'#$%6$/*+*'*.+%./%(%PG/*$16%()$

(v) If A1, A2, ... is a finite or countable sequence of events in F, then ∩_{i=1}^∞ A_i is also in F.
(vi) If A1, A2, ... is a countable sequence of events in F that is monotone decreasing (i.e., A1 ⊇ A2 ⊇ ...), then its limit, also denoted A_i ↓ A_0, is also in F.  Similarly, if a sequence in F is monotone increasing (i.e., A1 ⊆ A2 ⊆ ...), then its limit A_0 = ∪_{i=1}^∞ A_i is also in F.
(vii) The empty event ∅ is in F.

We will use a few concrete examples of sample spaces and σ-fields:

Example 1.  [Two coin tosses] A coin is tossed twice, and for each toss a head or a tail appears.  Let HT denote the state of Nature in which the first toss yields a head and the second toss yields a tail.  Then S = {HH,HT,TH,TT}.  Let F be the class of all possible subsets of S; F has 2^4 members.

Example 2.  [Coin toss until a tail] A coin is tossed until a tail appears.  The sample space is S = {T, HT, HHT, HHHT,...}.  In this example, the sample space is infinite, but countable.  Let F be the σ-field generated by the finite subsets of S.  This σ-field contains events such as "At most ten heads", and also, using the monotone closure property (vi) above, events such as "Ten or more tosses without a tail", and "an even number of heads before a tail".  A set that is not in F will have the property that both the set and its complement are infinite.  It is difficult to describe such a set, primarily because the language that we normally use to construct sets tends to correspond to elements in the σ-field.  However, mathematical analysis shows that such sets must exist, because the cardinality of the class of all possible subsets of S is greater than the cardinality of F.

Example 3.  [S&P stock index] The stock index is a number in the positive real line ℜ₊, so S ⊆ ℜ₊.  Take the σ-field of events to be the Borel σ-field B(ℜ₊), which is defined as the smallest family of subsets of the real line that contains all the open intervals in ℜ₊ and satisfies the properties (i)-(iv) of a σ-field.  The subsets of ℜ₊ that are in B are said to be measurable, and those not in B are said to be non-measurable.

;I$J.K&)69%%TD[\%&'.3B%*+6$<%.+%&833$&&*7$%6(2&U%"#$%&$'%./%&'('$&%./%=('8)$%*&%'#$%@()'$&*(+-).683'%./%'#$%&$'%./%3(18$&%.+%6(2%.+$%(+6%'#$%&$'%./%7(18$&%.+%6(2%F4%E%R%]^]%L(1&.%6$+.'$6%]FM9"(B$%'#$%PG/*$16%./%$7$+'&%'.%0$%'#$%-).683'%./%'#$%.+$G6*;$+&*.+(1%PG/*$16&4%!%R%"I"F4%5#$)$%NN6$+.'$&%(+%.-$)('*.+%'#('%/.);&%'#$%&;(11$&'%PG/*$16%3.+'(*+*+,%(11%&$'&%./%'#$%/.);%9^,%5*'#%9%"I%(+6%,%%"F9%%:+%'#*&%$<(;-1$4%"I%(+6%"F%()$%*6$+'*3(1%3.-*$&%./%'#$%_.)$1%PG/*$16%.+%]9%%J&&8;$'#('%'#$%*+6$<%5(&%+.);(1*`$6%'.%0$%.+$%%('%'#$%0$,*++*+,%./%'#$%-)$7*.8&%2$()9%%a<(;-1$&%./%$7$+'&*+%!%()$%N0$1.5%I%.+%6(2%IN4%N('%1$(&'%F%.+%0.'#%6(2&N4%(+6%N#*,#$)%.+%'#$%&$3.+6%6(2%'#(+%'#$%/*)&'6(2N9%%"#$%.-$)('*.+%NN%*&%6*//$)$+'%'#(+%'#$%3()'$&*(+%-).683'%N^N4%5#$)$%"I^"F%*&%'#$%/(;*12%./%(11

)$3'(+,1$&%9^,%/.);$6%/).;%9%%"I%(+6%,%%"F9%%"#*&%/(;*12%*&%+.'%*'&$1/%(%PG/*$164%08'%'#$%PG/*$16'#('%*'%,$+$)('$&%*&%"I"F9%%b.)%$<(;-1$4%'#$%$7$+'%N#*,#$)%.+%'#$%&$3.+6%6(2%'#(+%'#$%/*)&'%6(2N%*&+.'%(%)$3'(+,1$4%08'%*&%.0'(*+$6%(&%(%;.+.'.+$%1*;*'%./%)$3'(+,1$&9

:+%'#$%/*)&'%$<(;-1$4%'#$%PG/*$16%3.+&*&'$6%./%(11%-.&&*01$%&80&$'&%./%'#$%&(;-1$%&-(3$9%%"#*&%5(&+.'%'#$%3(&$%*+%'#$%1(&'%'5.%$<(;-1$&4%0$3(8&$%'#$%_.)$1%PG/*$16%6.$&%+.'%3.+'(*+%(11%&80&$'&%./%'#$)$(1%1*+$9%%"#$)$%()$%'5.%)$(&.+&%'.%*+').683$%'#$%3.;-1*3('*.+%./%6$(1*+,%5*'#%PG/*$16&%'#('%6.%+.'3.+'(*+%(11%'#$%&80&$'&%./%'#$%&(;-1$%&-(3$4%.+$%&80&'(+'*7$%(+6%.+$%'$3#+*3(19%%"#$%&80&'(+'*7$)$(&.+% *&% '#('% '#$% PG/*$16% 3(+% 0$% *+'$)-)$'$6% (&% '#$% -.'$+'*(1% *+/.);('*.+% '#('% *&% (7(*1(01$% 02.0&$)7('*.+9%%:/%(+%.0&$)7$)%*&%*+3(-(01$%./%;(B*+,%.0&$)7('*.+&%'#('%6*&'*+,8*&#%'5.%&'('$&%./%=('8)$4'#$+%'#$%PG/*$16%3(++.'%3.+'(*+%&$'&%'#('%*+3186$%.+$%./%'#$&$%&'('$&%(+6%$<3186$&%'#$%.'#$)9%%"#$+4%'#$&-$3*/*3('*.+%./%'#$%PG/*$16%5*11%6$-$+6%.+%5#('%*&%.0&$)7(01$%*+%(+%(--1*3('*.+9%%"#$%'$3#+*3(1%)$(&.+*&% '#('%5#$+% '#$% &(;-1$% &-(3$% 3.+'(*+&% (+% *+/*+*'$%+8;0$)%./% &'('$&4% *'%;(2%0$%;('#$;('*3(112*;-.&&*01$% '.% 6$/*+$% -).0(0*1*'*$&%5*'#% &$+&*01$% -).-$)'*$&% .+% (11% &80&$'&% ./% '#$% &(;-1$% &-(3$9c$&')*3'*+,%'#$%6$/*+*'*.+%./%-).0(0*1*'*$&%'.%(--).-)*('$12%3#.&$+%PG/*$16&%&.17$&%'#*&%-).01$;9%

E9F9F9%%:'%*&%-.&&*01$%'#('%;.)$%'#(+%.+$%PG/*$16%./%&80&$'&%*&%6$/*+$6%/.)%(%-()'*381()%&(;-1$%&-(3$E9%%:/%#%*&%(+%()0*')()2%3.11$3'*.+%./%&80&$'&%./%E4%'#$+%'#$%&;(11$&'%PG/*$16%'#('%3.+'(*+&%#%*&%&(*6%'.0$%'#$%PG/*$16%5%0%*$#%3!02%#9%%:'%*&%&.;$'*;$&%6$+.'$6%PL#M9%%:/%!%%(+6%$%()$%0.'#%PG/*$16&4%(+6%$%!4%'#$+%$%*&%&(*6%'.%0$%(%")62'1%-3!./%!4%(+6%!%*&%&(*6%'.%.&0#$10!+&*%!10'&*+$#1&0!.)%*%'10%!$9%%:'*&%-.&&*01$%'#('%+$*'#$)%!%%$%+.)%$%%%!9%%"#$%*+'$)&$3'*.+%!$%%./%'5.%PG/*$16&%*&%(,(*+%(%PG/*$16'#('%3.+'(*+&%'#$%.&++&0!10'&*+$#1&0%*+%!%(+6%$9%%b8)'#$)4%'#$%*+'$)&$3'*.+%./%(+%()0*')()2%3.8+'(01$.)%8+3.8+'(01$%3.11$3'*.+%./%PG/*$16&% *&%(,(*+%(%PG/*$169% %"#$%8+*.+%!$%./% '5.%PG/*$16&% *&%+.'+$3$&&()*12%(%PG/*$164%08'%'#$)$%*&%(15(2&%(%&;(11$&'%PG/*$16%'#('%)$/*+$&%0.'#%!%(+6%$4%5#*3#%*&%&*;-12'#$%PG/*$16%PL!$M%,$+$)('$6%02%'#$%&$'&%*+%'#$%8+*.+%./%!%(+6%$4%.)%-8'%(+.'#$)%5(24%'#$%*+'$)&$3'*.+./%(11%PG/*$16&%'#('%3.+'(*+%0.'#%!%(+6%$9%

;I$J.K&)39%%L3.+'*+8$6M%>$'%!%6$+.'$%'#$%PG/*$16%./%(11%&80&$'&%./%E9%%J+.'#$)%PG/*$16%*&%$%RWH4E4WV"4VVX4W""4"VXX4%3.+'(*+*+,%$7$+'&%5*'#%*+/.);('*.+%.+12%.+%'#$%.8'3.;$%./%'#$%/*)&'%3.*+'.&&9%%d$'%(+.'#$)%PG/*$16%3.+'(*+&%'#$%$7$+'&%5*'#%*+/.);('*.+%.+12%.+%'#$%+8;0$)%./%#$(6&4%08'%+.''#$*)% .)6$)4% %% R% WH4E4WVVX4W""X4WV"4"VX4WVV4""X4WV"4"V4""X4WVV4V"4"VXX9% "#$+4% !3.+'(*+&%;.)$%*+/.);('*.+%'#(+%$%.)%%9% %"#$%*+'$)&$3'*.+%$%% *&% '#$%Y+.%*+/.);('*.+Z%PG/*$16WH4EX9%%"#$%8+*.+%$%%*&%+.'%(%PG/*$164%(+6%'#$%PG/*$16%PL$%M%'#('%*'%,$+$)('$&%*&%!9%%"#*&%3(+%0$7$)*/*$6%3.+&')83'*7$12%L*+%'#*&%/*+*'$%D%3(&$M%02%08*16*+,%8-%PL$%M%02%/.);*+,%*+'$)&$3'*.+&%(+68+*.+&%./%;$;0$)&%./%$%4%08'%*&%(1&.%.07*.8&%&*+3$%B+.5*+,%'#$%.8'3.;$%./%'#$%/*)&'%'.&&%(+6B+.5*+,%'#$%'.'(1%+8;0$)%./%#$(6&%)$7$(1&%/811%*+/.);('*.+%.+%0.'#%'.&&$&9

;I$J.K&)19%%L3.+'*+8$6M%>$'%!%6$+.'$%'#$%_.)$1%PG/*$169%%"#$+%$%R%WH4E4LI4M4LG4IUX%(+6%&%RWH4E4WG4FM4TF4MX%()$%0.'#%PG/*$16&4%'#$%/*)&'%3.))$&-.+6*+,%'.%'#$%(0*1*'2%'.%.0&$)7$%5#$'#$)%'#$

*+6$<%*&%(0.7$%I4%'#$%&$3.+6%3.))$&-.+6*+,%'.%'#$%(0*1*'2%'.%'$11%5#$'#$)%*'%*&%(0.7$%F9%%b.)%&#.)'#(+641$'%(%R%LG4IU4%0%R%LG4FU4%3%R%LI4]M4%6%R%LF4]M4%(+6%$%R%LI4FU9%%=$*'#$)%$%.)%&%3.+'(*+&%'#$%.'#$)40.'#%()$%3.+'(*+$6%*+%!4%(+6%'#$*)%*+'$)&$3'*.+%*&%'#$%Y+.%*+/.);('*.+Z%PG/*$16%WH4EX9%%"#$%%PG/*$16,$+$)('$6%02%'#$*)%8+*.+4%3.))$&-.+6*+,%'.%'#$%(0*1*'2%'.%'$11%*/%'#$%*+6$<%*&%*+%(4$4%.)%64%*&%PL$&M%RWH4E4(4043464$4(6X9%

J+%$1$;$+'%B%*+%(%PG/*$16%$%./%&80&$'&%./%E%*&%(+%$#&+%*/%'#$%.+12%&$'%*+%$%'#('%*&%(%-).-$)%&80&$'./%B%*&%'#$%$;-'2%&$'%H9%%:+%'#$%1(&'%$<(;-1$4%&%#(&%('.;&%0%(+6%64%(+6%'#$%('.;&%./%PL$&M%()$%(464%(+6%$4%08'%+.'%0%R%($%.)%3%R%$69%%"#$%('.;&%./%'#$%_.)$1%PG/*$16%()$%'#$%*+6*7*68(1%)$(1%+8;0$)&9J+%$3.+.;*3%*+'$)-)$'('*.+%./%'#*&%3.+3$-'%'#('%*/%'#$%PG/*$16%6$/*+*+,%'#$%3.;;.+%*+/.);('*.+%./'5.%$3.+.;*3%(,$+'&%3.+'(*+&%(+%('.;4%'#$+%(%3.+'*+,$+'%3.+')(3'%0$'5$$+%'#$;%;8&'%#(7$%'#$%&(;$)$(1*`('*.+%+.%;(''$)%5#('%&'('$%./%=('8)$%5*'#*+%'#*&%('.;%.338)&9

3.3.  PROBABILITY

3.3.1.  Given a sample space S and σ-field of subsets F, a probability (or probability measure) is defined as a function P from F into the real line with the following properties:

(i) P(A) ≥ 0 for all A ∈ F.
(ii) P(S) = 1.
(iii) [Countable Additivity] If A1, A2, ... is a finite or countable sequence of events in F that are mutually exclusive (i.e., A_i∩A_j = ∅ for all i ≠ j), then P(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).

With conditions (i)-(iii), P has the following additional intuitive properties of a probability when A and B are events in F:

(iv) P(A) + P(A^c) = 1.
(v) P(∅) = 0.
(vi) P(A∪B) = P(A) + P(B) - P(A∩B).  (A short derivation is sketched after this list.)
(vii) P(A) ≥ P(B) when B ⊆ A.
(viii) If A_i in F is monotone decreasing to ∅ (denoted A_i ↓ ∅), then P(A_i) → 0.
(ix) If A_i ∈ F, not necessarily disjoint, then P(∪_{i=1}^∞ A_i) ≤ Σ_{i=1}^∞ P(A_i).
(x) If {A_i} is a finite or countable partition of S (i.e., the events A_i ∈ F are mutually exclusive and exhaustive, or A_i∩A_j = ∅ for all i ≠ j and ∪_{i=1}^∞ A_i = S), then P(B) = Σ_{i=1}^∞ P(B∩A_i).
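(An added sketch, not in the original text, of why (vi) holds: decompose A∪B into disjoint pieces and apply additivity.)

$$P(A\cup B) = P(A\cap B^c) + P(A\cap B) + P(A^c\cap B),\qquad P(A) = P(A\cap B^c) + P(A\cap B),\qquad P(B) = P(A^c\cap B) + P(A\cap B).$$

Adding the last two equations and subtracting the first gives $P(A) + P(B) - P(A\cup B) = P(A\cap B)$, which is (vi).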

"#$%')*-1$'%LE4!4\M%3.+&*&'*+,%./%(%;$(&8)(01$%&-(3$%LE(!M%(+6%(%-).0(0*1*'2%;$(&8)$%\%*&%3(11$6%(,*&6$61-1#7!",$.%9

;I$J.K&)3%L3.+'*+8$6M9%@.+&*6$)%'#$%PG/*$16%%%3.+'(*+*+,%*+/.);('*.+%.+%'#$%+8;0$)%./%#$(6&408'%+.'% '#$*)%.)6$)9% %"#$%'(01$%0$1.5%,*7$&%'#)$$%/8+3'*.+&%\I4%\F4%\E%6$/*+$6%.+%%9% %J11%&('*&/2-).-$)'*$&%L*M%(+6%L**M%/.)%(%-).0(0*1*'29%%b8+3'*.+&%\F%(+6%\E%(1&.%&('*&/2%L***M4%(+6%()$%-).0(0*1*'*$&408'%\I%7*.1('$&%L***M%&*+3$%\ILWVVXW""XM%%\ILWVVXM%]%\ILW""XM9%%"#$%-).0(0*1*'2%\F%*&%,$+$)('$602%/(*)%3.*+&4%(+6%'#$%-).0(0*1*'2%\E%02%.+$%/(*)%3.*+%(+6%.+$%0*(&$6%3.*+9

H E VV "" V"4"V VV4"" V"4"V4"" VV4V"4"V

\I Q I IfE IfE IfF IfF FfE FfE

\F Q I IfH IfH IfF IfF EfH EfH

\E Q I IfE Ifg IfF IfF FfE hfg

3.3.2.  If A ∈ F has P(A) = 1, then A is said to occur almost surely (a.s.), or with probability one (w.p.1).  If A ∈ F has P(A) = 0, then A is said to occur with probability zero (w.p.0).  Finite or countable intersections of events that occur almost surely again occur almost surely, and finite or countable unions of events that occur with probability zero again occur with probability zero.

Example 2 (continued).  If the coin is fair, then the probability of k-1 heads followed by a tail is 1/2^k.  Use the geometric series formulas in 2.1.10 to verify that the probability of "At most 3 heads" is 15/16, of "Ten or more heads" is 1/2^10, and of "an even number of heads" is 2/3.
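(An added numerical check, not part of the original text, of the three probabilities quoted in Example 2, using P(k heads before the first tail) = 1/2^{k+1}; the truncation of the even-heads series at 200 terms is only for illustration.)

    # Probability of exactly k heads before the first tail with a fair coin: (1/2)**(k+1)
    p = lambda k: 0.5 ** (k + 1)

    at_most_3 = sum(p(k) for k in range(4))             # "at most 3 heads"
    ten_or_more = 1 - sum(p(k) for k in range(10))      # "ten or more heads"
    even_heads = sum(p(k) for k in range(0, 200, 2))    # "even number of heads" (truncated series)

    print(at_most_3, 15 / 16)        # 0.9375 in both cases
    print(ten_or_more, 0.5 ** 10)    # both equal 1/1024 up to rounding
    print(even_heads, 2 / 3)         # ~0.6667 in both cases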

Example 3 (continued).  Consider the function P defined on open sets (s,∞) ⊆ ℜ₊ by P((s,∞)) = e^{-s/2}.  This function maps into the unit interval.  It is then easy to show that P satisfies properties (i)-(iii) of a probability on the restricted family of open intervals, and a little work to show that when a probability is determined on this family of open intervals, then it is uniquely determined on the σ-field generated by these intervals.  Each single point, such as {1}, is in the σ-field.  Taking intervals that shrink to this point, each single point occurs with probability zero.  Then, a countable set of points occurs w.p.0.

E9E9E9%%j/'$+%(%;$(&8)(01$%&-(3$%LE(!M%5*11%#(7$%(+%(&&.3*('$6%+%$")*%!k%'#('%*&%(%3.8+'(012

(66*'*7$%/8+3'*.+%/).;%!%*+'.%'#$%+.++$,('*7$%)$(1%1*+$C%*9$94%kL 9*M%R% kL9*M%/.)%(+2

1I

1I

&$A8$+3$%./%6*&?.*+'%9*%%!9%%"#$%;$(&8)$%*&%,&"1#1/%!*/%kL9M%%Q%/.)%(11%9%%!C%5$%5*11%3.+&*6$)%.+12-.&*'*7$%;$(&8)$&9%%"#$%;$(&8)$%k%*&%'101#%!*/%kL9M%%l%/.)%&.;$%3.+&'(+'%l%(+6%(11%9%%!4%(+6

P2'101#%!*/%!%%3.+'(*+&%(%3.8+'(01$%-()'*'*.+%W9*X%./%E%&83#%'#('%'#$%;$(&8)$%./%$(3#%-()'*'*.+%&$'%*&/*+*'$C%*9$94%kL9*M%m%]9%%"#$%;$(&8)$%k%;(2%0$%(%-).0(0*1*'24%08'%;.)$%3.;;.+12%*'%*&%(%;$(&8)$%./N1$+,'#N%.)%N7.18;$N9%%b.)%$<(;-1$4%*'%*&%3.;;.+%5#$+%'#$%&(;-1$%&-(3$%E%*&%'#$%3.8+'(01$%&$'%./-.&*'*7$%*+'$,$)&%'.%6$/*+$%k%'.%0$%.&)0#105!+%$")*%!5*'#%kL9M%$A8(1%'.%'#$%+8;0$)%./%-.*+'&%*+%99S#$+%'#$%&(;-1$%&-(3$%E%*&%'#$%)$(1%1*+$4%5*'#%'#$%_.)$1%PG/*$16%"4%*'%*&%3.;;.+%'.%6$/*+$%k%'.%0$;%6%"5)%!+%$")*%4%5*'#%kLL(40MM%R%0%G%(%/.)%(+2%.-$+%*+'$)7(1%L(40M9%%_.'#%./%'#$&$%$<(;-1$&%()$-.&*'*7$%PG/*+*'$%;$(&8)$&9%%J%&$'%9%*&%&(*6%'.%0$%./%<2+%$")*%!:%*&%*/%kL9M%R%Q9%%J%-).-$)'2%'#('%#.16&$<3$-'%.+%(%&$'%./%;$(&8)$% $).%*&%&(*6%'.%#.16%$-+&"#!%/%*789%*%%L(9$9M9%%:'%5*11%&.;$'*;$&%0$%8&$/81'.%'(1B%(0.8'%(%PG/*+*'$%;$(&8)$%&-(3$%LE(!4nM%5#$)$%n%*&%-.&*'*7$%(+6%PG/*+*'$%(+6%;(2%$*'#$)%0$%(-).0(0*1*'2%;$(&8)$%.)%(%;.)$%,$+$)(1%3.8+'*+,%.)%1$+,'#%;$(&8)$%&83#%(&%>$0$&,8$%;$(&8)$9

E9E9H9%D8--.&$%/%*&%(%)$(1G7(18$6%/8+3'*.+%.+%(%PG/*+*'$%;$(&8)$%&-(3$%LE(!4nM9%%"#*&%/8+3'*.+%*&+%$")*$6-%% */% % /GIL,M%%!% % /.)%$(3#%.-$+%&$'%,% *+% '#$%)$(1% 1*+$9% %J%;$(&8)(01$%/8+3'*.+%#(&%'#$-).-$)'2%'#('%*'&%3.+'.8)%&$'&%./%'#$%/.);%W&Eo(/L&M3X%()$%3.+'(*+$6%*+%!%9%%"#*&%*;-1*$&%'#('%*/%B%!%%*&%(+%('.;4%'#$+%/L&M%;8&'%0$%3.+&'(+'%/.)%(11%&%%B9%%%

"#$%*+'$,)(1%./%;$(&8)(01$%/%.+%(%&$'%9%%!4%6$+.'$6% /L&MnL6&M4%*&%6$/*+$6%/.)%nL9M%m%],

(&%'#$%1*;*'%(&%+% %%./%&8;&%./%'#$%/.); LBf+MnL,B+M4%5#$)$%,B+%*&%'#$%&$'%./%&'('$&%./

=

=('8)$% *+% 9% /.)% 5#*3#% /L&M% *&% 3.+'(*+$6% *+% '#$% *+'$)7(1% LBf+4LB]IMf+U9% % J% /*+*'$% 1*;*'% $<*&'&

*/ oBf+onL,B+M%m%]4% *+%5#*3#%3(&$%/% *&%&(*6%'.%0$% 10#%5*$6-%%.+%99% %>$'%W9*X%%!) %0$%(

=

3.8+'(01$%-()'*'*.+%./%E%5*'#%nL9*M%m%]4%,8()(+'$$6%02%'#$%PG/*+*'$%-).-$)'2%./%n9%%"#$%/8+3'*.+%/

*&%*+'$,)(01$%.+%(%,$+$)(1%&$'%9%%!%%%*/%*'%*&%*+'$,)(01$%.+%99*%/.)%$(3#%*%(+6%*/ o/L&MonL6&M%R,

1*;+% o/L&MonL6&M%$<*&'&4%(+6%&*;-12% 10#%5*$6-%% */% *'% *&% *+'$,)(01$%/.)%9%R%E9% % :+01I ,,1

,$+$)(14%'#$%;$(&8)$%n%3(+%#(7$%-.*+'%;(&&$&%L('%('.;&M4%.)%3.+'*+8.8&%;$(&8)$4%.)%0.'#4%&.%'#('%'#$

+.'('*.+%/.)%*+'$,)('*.+%5*'#%)$&-$3'%'.%n%*+3186$&%&8;&%(+6%;*<$6%3(&$&9%%"#$%*+'$,)(1% /L&MnL6&M,

5*11%&.;$'*;$&%0$%6$+.'$6% /L&M6n4%.)%*+%'#$%3(&$%./%>$0$&,8$%;$(&8)$4 /L&M6&9%%, ,E9E9h9%b.)%(%PG/*+*'$%;$(&8)$%&-(3$%LE(!4nM4%6$/*+$%CALE4!4nM%/.)%I%%A%m%]%'.%0$%'#$%&$'%./

;$(&8)(01$%)$(1G7(18$6%/8+3'*.+&%.+%E%5*'#%'#$%-).-$)'2%'#('% o/oA%*&%*+'$,)(01$4%(+6%6$/*+$%/A%R

T /L&MA% nL6&MUIfA% '.% 0$% '#$% 0&*+% ./% /9% % "#$+4% CALE4!4nM% *&% (% 1*+$()% &-(3$4% &*+3$% 1*+$()3.;0*+('*.+&%./% *+'$,)(01$%/8+3'*.+&%()$%(,(*+%*+'$,)(01$9%%"#*&%&-(3$%#(&%;(+24%08'%+.'%(114%.//(;*1*()%-).-$)'*$&%./%/*+*'$G6*;$+&*.+(1%a831*6$(+%&-(3$9%%"#$%&$'%./%(11%1*+$()%/8+3'*.+&%.+%'#$%&-(3$CALE4!4nM%/.)%A%p%I%*&%'#$%&-(3$%C)LE4!4nM4%5#$)$%If)%R%I%G%IfA9%%"#*&%/.11.5&%/).;%(+%(--1*3('*.+%./V.16$)q&%*+$A8(1*'24%5#*3#%,$+$)(1*`$&%/).;%/*+*'$%7$3'.)%&-(3$&%'.%'#$%3.+6*'*.+

/%%CALE4!4nM%(+6%,%%C)LE4!4nM%5*'#%AGI%]%)GI%R%I%*;-12 /L&M,L&M%nL6&M%%/A,)9"#$%3(&$%A%R%)%R%F%,*7$&%'#$%@(83#2GD3#5()'`%*+$A8(1*'2%*+%,$+$)(1%/.);9%%"#*&%3(&$%()*&$&%./'$+%*+&'('*&'*3&4%5*'#%'#$%/8+3'*.+&%/%*+'$)-)$'$6%(&%)(+6.;%7()*(01$&%(+6%'#$%+.);%/F%*+'$)-)$'$6%(&%(A8(6)('*3%;$(+%.)%7()*(+3$9%

E9E9g9%"#$)$%()$%'#)$$%*;-.)'(+'%3.+3$-'&%/.)%'#$%1*;*'%./%(%&$A8$+3$%./%/8+3'*.+&%/+%%CALE4!4nM9b*)&'4%'#$)$%*&%.&0/%*5%0.%!10!0&*+4%.)%&').+,%3.+7$),$+3$K%%/%*&%(%1*;*'%./%/+%*/%/+%G%/A% %Q9%%D$3.+64'#$)$%*&%.&0/%*5%0.%!10!>2+%$")*%K%%/%*&%(%1*;*'%./%/+%*/%nLW&D%o/+L&M%G%/L&Mo%p%XM% %Q%/.)%$(3#%%p%Q9

"#*)64%'#$)$%*&%8%$=!.&0/%*5%0.%K%%/%*&%(%1*;*'%./%/+%*/ L/+L&M%G%/L&MM,L&M%nL6&M% %Q%/.)%$(3#%,%C)LE4!4nM%5*'#%If)%R%I%G%IfA9%%"#$%/.11.5*+,%)$1('*.+&#*-%#.16&%0$'5$$+%'#$&$%;.6$&%./%3.+7$),$+3$K

D').+,%@.+7$),$+3$%%S$(B%@.+7$),$+3$%%@.+7$),$+3$%*+%nG;$(&8)$

J+%$<(;-1$%&#.5&%'#('%3.+7$),$+3$%*+%nG;$(&8)$%6.$&%+.'%*+%,$+$)(1%*;-12%5$(B%3.+7$),$+3$K@.+&*6$)%CFLR+(3S4"4nM%5#$)$%"% *&% '#$%_.)$1%PG/*$16%(+6%n% *&%>$0$&,8$%;$(&8)$9% %@.+&*6$)% '#$&$A8$+3$%/+L&M%R+3L&If+M9%%"#$+%nLW&Do%o/+L&Mo%p%XM%R%If+4%&.%'#('%/+%3.+7$),$&%*+%nG;$(&8)$%'.

`$).4%08'%/.)%,L&M%R%&GIfE4%.+$%#(&%,F%R%EIfF%(+6 /+L&M,L&M%nL6&M%R%E+IfEfF%6*7$),$+'9%%J+.'#$)$<(;-1$%&#.5&%'#('%5$(B%3.+7$),$+3$%6.$&%+.'%*+%,$+$)(1%*;-12%&').+,%3.+7$),$+3$K%%@.+&*6$)%ER%WI4F4999X%$+6.5$6%5*'#%'#$%PG/*$16%,$+$)('$6%02%'#$%/(;*12%./%/*+*'$%&$'&%(+6%'#$%;$(&8)$%n%'#(',*7$&%5$*,#'%BGIfF%'.%-.*+'%B9%%@.+&*6$)%/+LBM%R+IfH3LB%R%+M9%%"#$+%/+FR%I9%%:/%,%*&%(%/8+3'*.+%/.)

5#*3#% /+LBM,LBMnLWBXM%R%,L+M+IfH%6.$&%+.'%3.+7$),$%'.%`$).4%'#$+%,LBMF%nLWBXM%*&%0.8+6$6

=I

(5(2%/).;%`$).%*+/*+*'$12%./'$+4%*;-12*+,%,F%R% ,LBMF%nLWBXM%R%]9%%"#$+4%/+%3.+7$),$&

=I

5$(B124%08'%+.'%&').+,124%'.% $).9%%"#$%/.11.5*+,%'#$.)$;4%5#*3#%*&%./%,)$('%*;-.)'(+3$%*+%(67(+3$6$3.+.;$')*3&4%,*7$&%(%8+*/.);*'2%3.+6*'*.+%8+6$)%5#*3#%'#$&$%;.6$&%./%3.+7$),$+3$%3.*+3*6$9

:-&T0&J)1=3=%L>$0$&,8$%r.;*+('$6%@.+7$),$+3$M%%:/%,%(+6%/+%/.)%+%R%I4F4999%()$%*+%CALE4!4nM%/.)I%%A%m%]%(+6%(%PG/*+*'$%;$(&8)$%&-(3$%LE(!4nM4%(+6%*/%o/+L&Mo%%,L&M%(1;.&'%$7$)25#$)$4%'#$+%/+3.+7$),$&%*+%nG;$(&8)$%'.%(%/8+3'*.+%/%*/%(+6%.+12%*/%/%%CALE4!4nM%(+6%/+%G%/A% %Q9

j+$% (--1*3('*.+% ./% '#*&% '#$.)$;% *&% (% )$&81'% /.)% *+'$)3#(+,$% ./% '#$% .)6$)% ./% *+'$,)('*.+% (+66*//$)$+'*('*.+9%%D8--.&$%/L4'M%%CALE4!4nM%/.)%'%*+%(+%.-$+%&$'%:%%+9%%D8--.&$%/%*&%31''%*%0#1$6-%4;$(+*+,%'#('%'#$)$%$<*&'&%(%/8+3'*.+%'/L4'M%%CALE4!4nM%/.)%'%%:%&83#%'#('%*/%']#%%:%(+6%#%%Q4%'#$+'#$%)$;(*+6$)%/8+3'*.+%)L&4'4#M%R%T/L&4']#M%G%/L&4'M%G%'/L4'M#Ufo#o%%CALE4!4nM%3.+7$),$&%*+%nG;$(&8)$

'.%`$).%(&%#% %Q9%%r$/*+$%bL'M%R% /L&4'MnL6&M9%%:/%'#$)$%$<*&'&%,%%CALE4!4nM%5#*3#%6.;*+('$&%'#$)$;(*+6$)%/8+3'*.+%L*9$94%o)L&4'4#Mo%%,L&M%(9$9M4%'#$+%"#$.)$;%E9I%*;-1*$&%1*;#Q)L4'4#MA%R%Q4%(+6%bL'M

*&%6*//$)$+'*(01$%(+6%&('*&/*$&%'bL'M%R% '/L&4'MnL6&M9J%/*+*'$%;$(&8)$%\%.+%LE(!M%*&%$6"&-)#%-7!.&0#10)&)"!5*'#%)$&-$3'%'.%(%;$(&8)$%k%*/%%9%%!%%(+6

kL9M%R%Q%*;-12%\L9M%R%Q9%%:/%\%*&%(%-).0(0*1*'2%;$(&8)$%'#('%*&%(0&.18'$12%3.+'*+8.8&%5*'#%)$&-$3'%'.'#$%;$(&8)$% k4% '#$+% (+% $7$+'% ./%;$(&8)$% `$).% .338)&% 59-9Q4% (+6% (+% $7$+'% '#('% *&% ')8$% (1;.&'$7$)25#$)$%.338)&%(1;.&'%&8)$129%%J%/8+6(;$+'(1%)$&81'%/).;%(+(12&*&%*&%'#$%'#$.)$;K

:-&T0&J)1=*=%Lc(6.+G=*B.62;M%:/%(%/*+*'$%;$(&8)$%\%.+%(%;$(&8)(01$%&-(3$%LE(!M%*&%(0&.18'$123.+'*+8.8&%5*'#%)$&-$3'%'.%(%-.&*'*7$%PG/*+*'$%;$(&8)$%k%.+%LE(!M4%'#$+%'#$)$%$<*&'&%(+%*+'$,)(01$%)$(1G7(18$6%/8+3'*.+%-%%CILE4!4kM%&83#%'#('

% -L&MkL6&M%R%\L9M%/.)%$(3#%9%%!9%%,S#$+%\% *&%(%-).0(0*1*'24% '#$%/8+3'*.+%-%,*7$+%02% '#$% '#$.)$;%*&%+.++$,('*7$4%(+6% *&%3(11$6% '#$,*&6$61-1#7!3%0"1#79%%J+%*;-1*3('*.+%./%'#$%c(6.+G=*B.62;%'#$.)$;%*&%'#('%*/%(%;$(&8)(01$%&-(3$LE(!M%#(&%(%-.&*'*7$%PG/*+*'$%;$(&8)$%k%(+6%(%-).0(0*1*'2%;$(&8)$%\%'#('%*&%(0&.18'$12%3.+'*+8.8&%5*'#)$&-$3'%'.%k4%'#$+%'#$)$%$<*&'&%(%6$+&*'2%-%&83#%'#('%/.)%$7$)2%/%%CALE4!4\M%/.)%&.;$%I%%A%m%]4%.+$

#(& /L&M\L6&M%R% /L&M-L&MkL6&M9! !

E9E9s9%%:+%(--1*3('*.+&%5#$)$%'#$%-).0(0*1*'2%&-(3$%*&%'#$%)$(1%1*+$%5*'#%'#$%_.)$1%PG/*$164%5*'#%(-).0(0*1*'2%\%&83#%'#('%\LLG4&UM%R%bL&M%*&%3.+'*+8.8&12%6*//$)$+'*(01$4%'#$%/8+6(;$+'(1%'#$.)$;%./

*+'$,)(1%3(13818&%&'('$&%'#('%-L&M%R%bL&M%&('*&/*$&%bL9M%R% -L&M6&9%%S#('%'#$%c(6.+G=*B.62;,'#$.)$;%6.$&% *&%$<'$+6% '#*&% )$&81'% '.%PG/*+*'$%;$(&8)$%&-(3$&%(+6%5$(B$+% '#$%(&&8;-'*.+%/).;

3.+'*+8.8&%6*//$)$+'*(0*1*'2%'.%(0&.18'$%3.+'*+8*'29%%:+%0(&*3%$3.+.;$')*3&4%5$%5*11%./'$+%3#()(3'$)*`$-).0(0*1*'*$&%0.'#%*+%'$);&%./%'#$%-).0(0*1*'2%;$(&8)$%L.)%6*&')*08'*.+M%(+6%'#$%6$+&*'24%(+6%5*118&8(112%+$$6%.+12%'#$%$1$;$+'()2%3(13818&%7$)&*.+%./%'#$%c(6.+G=*B.62;%)$&81'9%%V.5$7$)4%*'%*&8&$/81% *+% '#$.)$'*3(1% 6*&38&&*.+&% '.% )$;$;0$)% '#('% '#$% c(6.+G=*B.62;% '#$.)$;% ;(B$&% '#$3.++$3'*.+%0$'5$$+%-).0(0*1*'*$&%(+6%6$+&*'*$&9%%S$%,*7$%'5.%$<(;-1$&%'#('%*118&')('$%-)(3'*3(1%8&$./%'#$%3(13818&%7$)&*.+%./%'#$%c(6.+G=*B.62;%'#$.)$;9%%;I$J.K&)19%L3.+'*+8$6M%e*7$+%\LL&4MM%R%$G&fF4%.+$%3(+%8&$%'#$%6*//$)$+'*(0*1*'2%./%'#$%/8+3'*.+

*+%&%'.%(),8$%'#('%*'%*&%(0&.18'$12%3.+'*+8.8&%5*'#%)$&-$3'%'.%>$0$&,8$%;$(&8)$%.+%'#$%1*+$9%%t$)*/202%*+'$,)('*.+%'#('%'#$%6$+&*'2%*;-1*$6%02%'#$%c(6.+G=*B.62;%'#$.)$;%*&%-L&M%R%$G&fFfF9

;I$J.K&)L9%%J%-).0(0*1*'2%'#('%(--$()&%/)$A8$+'12%*+%&'('*&'*3&%*&%'#$%0&*+$-4%5#*3#%*&%6$/*+$6.+% L4"M4% 5#$)$% % *&% '#$% )$(1% 1*+$% (+6% "% '#$% _.)$1% PG/*$164% 02% '#$% 6$+&*'2% 0L&Gn4PM%

4%&.%'#('%\L9M%R% 9%%:+%'#*&%-).0(0*1*'24%n%(+6%P%()$LFuPFMIfF% L"nMFfFPF , LFuPFMIfF% L"nMFfFPF3"

-()(;$'$)&%'#('%()$%*+'$)-)$'$6%(&%6$'$);*+*+,%'#$%-&.$#1&0!(+6%".$-%!./%'#$%-).0(0*1*'24%)$&-$3'*7$129S#$+%n%R%Q%(+6%P%R%I4%'#*&%-).0(0*1*'2%*&%3(11$6%'#$%"#$03$*3!0&*+$-9

E9E9v9%@.+&*6$)%(%-).0(0*1*'2%&-(3$%LE(!4\M4%(+6%(%PG/*$16%$%%!9%%:/%'#$%$7$+'%B%%$%#(&%\LBM%pQ4% '#$+% '#$%.&031#1&0$-!,*&6$61-1#7!./%9%,*7$+%B% *&%6$/*+$6%(&%\L9BM%R%\L9BMf\LBM9% %D'('$6(+.'#$)%5(24%\L9BM%*&%(%)$(1G7(18$6%/8+3'*.+%.+%!^$%5*'#%'#$%-).-$)'2%'#('%\L9BM%R%\L9BM\LBM/.)%(11%9%%!%(+6%B%%$9%%S#$+%B%*&%(%/*+*'$%&$'4%'#$%3.+6*'*.+(1%-).0(0*1*'2%./%9%,*7$+%B%*&%'#$%)('*../%&8;&

%\L9oBM%R% 9",- ?LW"XM

"- ?LW"XM

Example 6.  On a quiz show, a contestant is shown three doors, one of which conceals a prize, and is asked to select one.  Before it is opened, the host opens one of the remaining doors which he knows does not contain the prize, and asks the contestant whether she wants to keep her original selection or switch to the other remaining unopened door.  Should the contestant switch?  Designate the contestant's initial selection as door 1.  The sample space consists of pairs of numbers ab, where a = 1,2,3 is the number of the door containing the prize and b = 2,3 is the number of the door opened by the host, with b ≠ a:  S = {12,13,23,32}.  The probability is 1/3 that the prize is behind each door.  The conditional probability of b = 2, given a = 1, is 1/2, since in this case the host opens door 2 or door 3 at random.  However, the conditional probability of b = 2, given a = 2, is zero, and the conditional probability of b = 2, given a = 3, is one.  Hence, P(12) = P(13) = (1/3)(1/2), and P(23) = P(32) = 1/3.  Let A = {12,13} be the event that door 1 contains the prize and B = {12,32} be the

Page 55: McFadden-Statistical Tools for Economists

!"#$%%&'()!"#"$%"$&#'()**'%+(!)*+++))))))))))))))))))))))))))))))))))))))))))))))))))),-$./&0)123+()4$5&)L*!777777777777777777777777777777777777777777777777777777777777777777777777777!

!"

#"

$"

%"

&"

'!"

()*+,-"(.+/)

!" #" $" %" &" '!"01/2."(.+/)

!"#$%&"'(")(*$+%(*"",(-%"./+

event that the host opens door 2.  Then the conditional probability of A given B is P(12)/(P(12)+P(32)) = (1/6)/((1/6)+(1/3)) = 1/3.  Hence, the probability of receiving the prize is 1/3 if the contestant stays with her original selection, 2/3 if she switches to the other unopened door.
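(An added Monte Carlo sketch of Example 6; the simulation code and its names, such as play, are ours and are only meant to illustrate the 1/3 versus 2/3 comparison derived above.)

    import random

    def play(switch, rng=random):
        """One round of the quiz show of Example 6; the contestant's initial pick is door 1."""
        prize = rng.randint(1, 3)
        # Host opens a door other than door 1 that he knows does not hide the prize.
        host_opens = rng.choice([d for d in (2, 3) if d != prize])
        final_choice = ({1, 2, 3} - {1, host_opens}).pop() if switch else 1
        return final_choice == prize

    n = 100_000
    stay_wins = sum(play(switch=False) for _ in range(n)) / n
    switch_wins = sum(play(switch=True) for _ in range(n)) / n
    print(f"stay: {stay_wins:.3f} (theory 1/3)   switch: {switch_wins:.3f} (theory 2/3)")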

;I$J.K&)O9%%"5.%/(&'%/..6%&'.)$&%()$%&*'$6%('%)(+6.;%-.*+'&%(1.+,%(%&')$$'%'#('%*&%'$+%;*1$&%1.+,9S#('%*&%'#$%-).0(0*1*'2%'#('%'#$2%()$%1$&&%'#(+%/*7$%;*1$&%(-()'w%%e*7$+%'#('%'#$%/*)&'%&'.)$%*&%1.3('$6('%'#$%'#)$$%;*1$%;()B$)4%5#('%*&%'#$%-).0(0*1*'2%'#('%'#$%&$3.+6%&'.)$%*&%1$&&%'#(+%/*7$%;*1$&%(5(2w"#$%(+&5$)&% ()$%.07*.8&% /).;% '#$% 6*(,)(;%0$1.54% *+%5#*3#% '#$% &(;-1$% &-(3$% *&%6$-*3'$6% (&% ()$3'(+,1$%./%6*;$+&*.+%IQ%02%IQ4%5*'#%'#$%#.)*`.+'(1%(<*&%,*7*+,%'#$%1.3('*.+%./%'#$%/*)&'%&'.)$%(+6'#$%7$)'*3(1%(<*&%,*7*+,%'#$%1.3('*.+%./%'#$%&$3.+6%&'.)$9%%"#$%&#(6$6%()$(&%3.))$&-.+6%'.%'#$%$7$+''#('%'#$%'5.%()$%;.)$%'#(+%/*7$%;*1$&%(-()'4%(+6%'#$%-).-.)'*.+%./%'#$%)$3'(+,1$%*+%'#$&$%()$(&%*&%IfH9@.+6*'*.+$6%.+%'#$%/*)&'%&'.)$%0$*+,%('%-.*+'%E%.+%'#$%#.)*`.+'(1%(<*&4%'#$%&$3.+6%&'.)$%*&%1.3('$6%(')(+6.;%.+%(%7$)'*3(1%1*+$%'#).8,#%'#*&%-.*+'4%(+6%'#$%-).-.)'*.+%./%'#*&%1*+$%'#('%1*$&%*+%'#$%&#(6$6()$(%*&%Ifh9%%>$'%<%0$%'#$%1.3('*.+%./%'#$%/*)&'%&'.)$4%2%'#$%1.3('*.+%./%'#$%&$3.+69%%"#$%3.+6*'*.+(1-).0(0*1*'2%./%'#$%$7$+'%'#('%%o<%G%2o%p%h4%,*7$+%<4%*&%o<GhofIQ9%%"#*&%3.816%#(7$%0$$+%6$)*7$6%02%/.);*+,'#$%-).0(0*1*'2%./%'#$%$7$+'%o<%G%2o%p%h%(+6%3%m%<%m%3]x%/.)%(%&;(11%-.&*'*7$%x4%'(B*+,%'#$%)('*.%./%'#*&-).0(0*1*'2%'.%'#$%-).0(0*1*'2%./%'#$%$7$+'%3%m%<%m%3]x%'.%.0'(*+%'#$%3.+6*'*.+(1%-).0(0*1*'2%./%'#$$7$+'%o<%G%2o%p%h%,*7$+%3%m%<%m%3]x4%(+6%'(B*+,%'#$%1*;*'%x% %Q9%


"#$%*6$(%0$#*+6%3.+6*'*.+(1%-).0(0*1*'*$&%*&%'#('%.+$%#(&%-()'*(1%*+/.);('*.+%.+%5#('%'#$%&'('$%./=('8)$%;(2%0$4%(+6%.+$%5(+'&%'.%3(1381('$%'#$%-).0(0*1*'2%./%$7$+'&%8&*+,%'#*&%-()'*(1%*+/.);('*.+9j+$%5(2%'.%)$-)$&$+'%-()'*(1%*+/.);('*.+%*&%*+%'$);&%./%(%&80/*$16C%$9,94%!%%*&%'#$%/*$16%./%$7$+'&%5#*3#6*&'*+,8*&#% .8'3.;$&% *+% 0.'#% '#$% -(&'% (+6% '#$% /8'8)$4% (+6% (% &80/*$16%$% 3.+'(*+&% $7$+'&%5#*3#6*&'*+,8*&#% .+12% -(&'% .8'3.;$&9% % J% 3.+6*'*.+(1% -).0(0*1*'2% \L9BM% 6$/*+$6% /.)%B%%$% 3(+% 0$*+'$)-)$'$6%/.)%/*<$6%9%(&%(%/8+3'*.+%/).;%$%*+'.%TQ4IU9%%".%$;-#(&*`$%'#*&4%3.+6*'*.+(1%-).0(0*1*'*$&

()$%&.;$'*;$&%5)*''$+%\L9$M4%(+6%$%*&%'$);$6%'#$%10'&*+$#1&0!"%#4%.)%(%/(;*12%./%$7$+'&%5*'#%'#$-).-$)'2%'#('%2.8%B+.5%5#$'#$)%.)%+.'%'#$2%#(--$+$6%('%'#$%'*;$%2.8%()$%/.);*+,%'#$%3.+6*'*.+(1-).0(0*1*'29

;I$J.K&)39% L3.+'*+8$6M%:/%$%R%WH4E4WV"4VVX4W""4"VXX4%&.%'#('%$7$+'&% *+%$%6$&3)*0$% '#$.8'3.;$%./%'#$%/*)&'%3.*+%'.&&4%'#$+%\LVVWVV4V"XM%R%\LVVMfL\LVVM]\LV"MM%R%y%*&%'#$%-).0(0*1*'2./%#$(6&%.+%'#$%&$3.+6%'.&&4%,*7$+%#$(6&%.+%'#$%/*)&'%'.&&9%%:+%'#*&%$<(;-1$4%'#$%3.+6*'*.+(1%-).0(0*1*'2./%(%#$(6%.+%'#$%&$3.+6%'.&&%$A8(1&%'#$%8+3.+6*'*.+(1%-).0(0*1*'2%./%'#*&%$7$+'9%%:+%'#*&%3(&$4%'#$.8'3.;$%./%'#$%/*)&'%3.*+%'.&&%-).7*6$&%+.%*+/.);('*.+%.+%'#$%-).0(0*1*'*$&%./%#$(6&%/).;%'#$%&$3.+63.*+4% (+6% '#$% '5.% '.&&$&% ()$% &(*6% '.% 0$% "#$#1"#1.$--7! 103%,%03%0#9% % :/% $' % RWH4E4WV"4"VX4WVVX4W""X4WVVX34W""X3X4%'#$%/(;*12%./%$7$+'&%'#('%6$'$);*+$%'#$%+8;0$)%./%#$(6&'#('%.338)%*+%'5.%'.&&$&%5*'#.8'%)$,()6%/.)%.)6$)4%'#$+%'#$%3.+6*'*.+(1%-).0(0*1*'2%./%#$(6&%.+%'#$/*)&'%'.&&4%,*7$+%('%1$(&'%.+$%#$(64%*&%\LWV"4VVXW""X3M%R%L\LV"M]\LVVMMfLIG\L""MMR%FfE9%%"#$+4'#$%3.+6*'*.+(1%-).0(0*1*'2%./%#$(6&%.+%'#$%/*)&'% '.&&%,*7$+%('% 1$(&'%.+$%#$(6%*&%+.'%$A8(1% '.%'#$8+3.+6*'*.+(1%-).0(0*1*'2%./%#$(6&%.+%'#$%/*)&'%'.&&9

;I$J.K&)19%%L3.+'*+8$6M%D8--.&$%$%R%WH4E4LI4M4LG4IUX%*&%'#$%PG/*$16%3.))$&-.+6*+,%'.%'#$$7$+'%'#('%'#$%*+6$<%$<3$$6&%I4%(+6%1$'%"%6$+.'$%'#$%_.)$1%PG/*$16%3.+'(*+*+,%(11%'#$%.-$+%*+'$)7(1&9"#$%8+3.+6*'*.+(1%-).0(0*1*'2%\LL&4MM%R%$G&fF% *;-1*$&%\LLI4MM%R%$GIfF%R%Q9gQgh9% %"#$%3.+6*'*.+(1-).0(0*1*'2%./%LF4M%,*7$+%LI4M%&('*&/*$&%\LLLF4MLI4MM%R%%\LLI4MLF4MMf\LLI4MM%R%$GIf$GIfF%R%Q9gQghp%\LLF4MM%R%Q9Egsz9%%%"#$%3.+6*'*.+(1%(+6%8+3.+6*'*.+(1%-).0(0*1*'*$&%()$%+.'%'#$%&(;$4%&.%'#('%'#$3.+6*'*.+*+,%$7$+'%-).7*6$&%*+/.);('*.+%.+%'#$%-).0(0*1*'2%./%LF4M9

b.)%(%-).0(0*1*'2%&-(3$%LE(!4\M4%&8--.&$%9I499949B%*&%(%/*+*'$%-()'*'*.+%./%EC%*9$94%9*9?%R%H%(+6

9*% R% E9% % "#$% -()'*'*.+% ,$+$)('$&% (% /*+*'$% /*$16%$%%!% 9% % b).;% '#$% /.);81(% \L9BM% R=1I

\L9BM\LBM%&('*&/*$6%02%3.+6*'*.+(1%-).0(0*1*'*$&4%.+$%#(&%/.)%(+%$7$+'%,%%!%%'#$%/.);81(

\L,M%R% \L,o9*M\L9*M9=

1I

"#*&%*&%./'$+%8&$/81%*+%3(1381('*+,%-).0(0*1*'*$&%*+%(--1*3('*.+&%5#$)$%'#$%3.+6*'*.+(1%-).0(0*1*'*$&()$%(7(*1(01$9%E9E9z9%:+%(%-).0(0*1*'2%&-(3$%LE4!4\M4%'#$%3.+3$-'%./%(%3.+6*'*.+(1%-).0(0*1*'2%\L9oBM%./%9%%!

,*7$+%(+%$7$+'%B%*+%(%PG/*$16%$%%!%%3(+%0$%$<'$+6$6%'.%3(&$&%5#$)$%\LBM%R%Q%02%6$/*+*+,%\L9BM(&%'#$%1*;*'%./%\L9B*M%/.)%&$A8$+3$&%B*%%$%'#('%&('*&/2%\LB*M%p%Q%(+6%B*% %B4%-).7*6$6%'#$%1*;*'$<*&'&9%%:/%5$%/*<%94%(+6%3.+&*6$)%\L9BM%(&%(%;$(&8)$%6$/*+$6%/.)%B%%$4%'#*&%;$(&8)$%.07*.8&12&('*&/*$&%\L9BM%%\LBM4%&.%'#('%*'%*&%(0&.18'$12%3.+'*+8.8&%5*'#%)$&-$3'%'.%\L_M9%%"#$+4%"#$.)$;%E9F

*;-1*$&%'#('%'#$)$%$<*&'&%(%/8+3'*.+%\L9oM%%CILE4$4\M%&83#%'#('%\L9BM%R \L9o&M\L6&M9%%S$-#(7$%5)*''$+%'#*&%/8+3'*.+%(&%*/%*'%5$)$%(%3.+6*'*.+(1%-).0(0*1*'2%./%J%,*7$+%'#$%Y$7$+'Z%W&X4%(+6%*'3(+%0$%,*7$+%'#*&%*+'$)-)$'('*.+9%%:/%B%%$%%*&%(+%('.;4%'#$+%'#$%;$(&8)(0*1*'2%./%\L9oM%5*'#%)$&-$3''.%$%%)$A8*)$&%'#('%*'%0$%3.+&'(+'%/.)%&%%B4%&.%'#('%\L9BM%R%\L9o&M\LBM%/.)%(+2%&%%B4%(+6%5$%3(+*+&'$(6%5)*'$%\L9BM%R%\L9oBM\LBM4%&('*&/2*+,%'#$%6$/*+*'*.+%./%3.+6*'*.+(1%-).0(0*1*'2%$7$+%*/%\LBMR%Q9

;I$J.K&) 69% L3.+'*+8$6M% % @.+&*6$)% !% R%""4% '#$% -).683'% _.)$1% PG/*$16% .+%]4% (+6%$% R"WH9]X4%'#$%PG/*$16%3.))$&-.+6*+,%'.%#(7*+,%3.;-1$'$%*+/.);('*.+%.+%'#$%1$7$1%./%'#$%*+6$<%.+'#$%/*)&'%6(2%(+6%+.%*+/.);('*.+%.+%'#$%&$3.+6%6(29%%D8--.&$%\LL&4M^L'4MM%R%FfLI]$&]'M9%%"#*&%*&%(-).0(0*1*'2% .+% '#$&$% .-$+% *+'$)7(1&% '#('% $<'$+6&% '.% !C% 7$)*/2*+,% '#*&% '(B$&% &.;$% 5.)B9% % "#$3.+6*'*.+(1%-).0(0*1*'2%./%L&4M^L'4M%,*7$+%'#$%$7$+'%L)4M^LQ4M%%$%(+6%&%%)%$A8(1&%\LL)4M^L'4MM6*7*6$6%02%\LL)4M^LQ4MM4%.)%LI]$)MfLI]$)]'M9%%"#$%3.+6*'*.+(1%-).0(0*1*'2%./%L&4M^L'4M%,*7$+%'#$$7$+'%L)4)]xM^LQ4M%%$%(+6%&%%)%*&%TIfLI]$)]'M%G%IfLI]$)]x]'MUfTIfLI]$)M%G%IfLI]$)]xMU9%%"#$%1*;*'%./%'#*&$<-)$&&*.+%(&%x% %Q%*&%$)LI]$)MFfLI]$)]'MF%R%\LL&4M^L'4MoW)X^LQ4MMC%'#*&%/8+3'*.+%./%)%*&%(1&.%'#$*+'$,)(+6%'#('%&('*&/*$&%"#$.)$;%E9F9%%=.'$%'#('%\LL&4M^L'4MoW)X^LQ4MM%%\LL&4M^LQ4MM%R%IfLI]$&M4&.%'#('%'#$%3.+6*'*.+*+,%$7$+'%3.+7$2&%*+/.);('*.+%(0.8'%'#$%-).0(0*1*'2%./%L&4M^L'4M9

3.4.  STATISTICAL INDEPENDENCE AND REPEATED TRIALS

3.4.1.  Consider a probability space (S,F,P).  Events A and B in F are statistically independent if P(A∩B) = P(A)P(B).  From the definition of conditional probability, if A and B are statistically independent and P(A) > 0, then P(B|A) = P(A∩B)/P(A) = P(B).  Thus, when A and B are statistically independent, knowing that A occurs is unhelpful in calculating the probability that B occurs.  The idea of statistical independence of events has an exact analogue in a concept of statistical independence of subfields.  Let G = {∅,A,A^c,S} and H = {∅,B,B^c,S} be the subfields of F generated by A and B, respectively.  Verify as an exercise that if A and B are statistically independent, then so are any pair of events A ∈ G and B ∈ H.  Then, one can say that the subfields G and H are statistically independent.  One can extend this idea and talk about statistical independence in a collection of subfields.  Let N denote an index set, which may be finite, countable, or non-countable.  Let F_i denote a σ-subfield of F (F_i ⊆ F) for each i ∈ N.  The subfields F_i are mutually statistically independent (MSI) if and only if P(∩_{j∈J} A_j) = ∏_{j∈J} P(A_j) for all finite J ⊆ N and A_j ∈ F_j for j ∈ J.  As in the case of statistical independence between two events (subfields), the concept of MSI can be stated in terms of conditional probabilities:  F_i for i ∈ N are mutually

statistically independent (MSI) if, for all i ∈ N, finite J ⊆ N\{i}, and A_j ∈ F_j for j ∈ {i}∪J, one has P(A_i | ∩_{j∈J} A_j) = P(A_i), so the conditional and unconditional probabilities are the same.

Example 1 (continued).  Let A = {HH,HT} denote the event of a head for the first coin, B = {HH,TH} denote the event of a head for the second coin, C = {HH,TT} denote the event of a match, and D = {HH} the event of two heads.  The table below gives the probabilities of various events.

Event   A     B     C     D     A∩B    A∩C    B∩C    A∩B∩C    A∩D
Prob.   1/2   1/2   1/2   1/4   1/4    1/4    1/4    1/4      1/4

The result P(A∩B) = P(A)P(B) = 1/4 establishes that A and B are statistically independent.  Verify that A and C are statistically independent, and that B and C are statistically independent, but that P(A∩B∩C) ≠ P(A)P(B)P(C), so that A, B, and C are not MSI.  Verify that A and D are not statistically independent.

;I$J.K&)69%%L3.+'*+8$6M%c$3(11%'#('%E%R%F%5*'#%!%R%""4%'#$%,*&3).#!_.)$1%PG/*$169%%r$/*+$%)R%WH4X%(+6%'#$%&80/*$16&%!I%R%"^)%(+6%!F%R%)^"4%3.+'(*+*+,%*+/.);('*.+%.+%'#$%*+6$<%1$7$1&%.+'#$%/*)&'%(+6%&$3.+6%6(24% )$&-$3'*7$129% %r$/*+$%$% % '.%0$% '#$%PG/*$16%,$+$)('$6%02% '#$%)$3'(+,1$&LQ4IU^LQ4IU4% LQ4IU^LI4M4LI4M^LQ4IU4% (+6LI4M^LI4M9% % "#$+%$% *&% '#$% &80/*$16% ./%"% 3.+'(*+*+,*+/.);('*.+%.+%5#$'#$)%'#$%*+6*3$&%.+%'#$%'5.%6(2&%()$%(0.7$%.+$9%%r$/*+$%!E%'.%0$%'#$%PG&80/*$16./%""% ,$+$)('$6% 02% &$'&% ./% '#$% /.);%9I^9F%5*'#%9I%%$% (+6%9F%%"C% '#$+%!E% 3.+'(*+&% /811*+/.);('*.+%.+%'#$%&$3.+6%6(2%*+6$<4%08'%.+12%'#$%A8(1*'('*7$%*+/.);('*.+%.+%5#$'#$)%'#$%/*)&'%6(2*+6$<%*&%(0.7$%.+$9%%D8--.&$%\LL&4M^L'4MM%R%$G&G'9%%"#$+%W!I4!FX%()$%lD:9%%V.5$7$)4%W!I4!EX%()$%+.'*+6$-$+6$+'9%

;I$J.K&)P9% %@.+&*6$)%E%R%WQ4%I4%F4%E4%H4%h4%g4%sX4%5*'#%!% %$A8(1% '.%(11% &80&$'&%./%E9% %J&%(&#.)'#(+64%1$'%QIFE%6$+.'$%WQ4I4F4EX4%$'39%%r$/*+$%'#$%&80/*$16&%

!I%R%WH4QIFE4Hhgs4EX4'!F%R%WH4FEHh4QIgs4EX4%!E%R%WH4QFHg4IEhs4EX4!H%R%WH4QI4FE4Hhgs4QIFE4FEHhgs4QIHhgs4EX4%

%%''!h%R%WH4QI4FE4Hh4gs4QIFE4QIHh4QIgs4FEHh4FEgs4Hhgs4QIFEHh4QIFEgs4QIHhgs4FEHhgs4EX4%%%''!g%R%WH4Qg4Is4FH4Eh4QIgs4QFHg4QEhg4IFHs4IEhs4FEHh4IFEHhs4QFEHhg4QIEhgs4QIFHgs4EX9%%

"#$%/*$16%!H%*&%(%*%'10%+%0#!./%'#$%/*$16%!I%L*9$94%!I%%!HM4%(+6%3(+%0$%&(*6%'.%3.+'(*+%;.)$%*+/.);('*.+'#(+%!I9%%"#$%/*$16%!h%*&%(%+)#)$-!*%'10%+%0#!./%!I%(+6%!F%L*9$94%!I!F%%!hM4%(+6%*&%*+%/(3'%'#$%&;(11$&';8'8(1%)$/*+$;$+'9%%:'%3.+'(*+&%(11%'#$%*+/.);('*.+%(7(*1(01$%*+%$*'#$)%!I%.)%!F9%%D*;*1()124%!g%*&%(

;8'8(1%)$/*+$;$+'%./%!F%(+6%!E9%%"#$%*+'$)&$3'*.+%./%!h%(+6%!g%*&%'#$%/*$16%!FC%*'%*&%'#$%3.;;.+*+/.);('*.+%(7(*1(01$%*+%!h%(+6%!g9%%:/4%/.)%$<(;-1$4%!h%3#()(3'$)*`$6%'#$%*+/.);('*.+%(7(*1(01$%'.%.+$$3.+.;*3%(,$+'4%(+6%!g%3#()(3'$)*`$6%'#$%*+/.);('*.+%(7(*1(01$%'.%(%&$3.+6%(,$+'4%'#$+%!F%5.8163#()(3'$)*`$%'#$%3.;;.+%*+/.);('*.+%8-.+%5#*3#%'#$2%3.816%0(&$%3.+'*+,$+'%3.+')(3'&9%%D8--.&$\L*M%R%Ifv9%%"#$+%W!I4%!F4%!EX%()$%lD:9%%a9,94%\LQIFEFEHhM%R%\LQIFEQFHgM%R%\LQIFEFEHhQFHgMR%\LQIFEM%R%IfF9%%V.5$7$)4%W!I4%!HX%()$%+.'%*+6$-$+6$+'C%$9,94%I%R%\LQIFEQIM%%\LQIFEM%R%IfF9%%

b.)%!)%F4%1$'%!!%6$+.'$%'#$%&;(11$&'%PG/*$16%3.+'(*+*+,%!*%/.)%(11%*%%!9%%"#$+%lD:%&('*&/*$&%'#$/.11.5*+,% '#$.)$;4% 5#*3#% -).7*6$&% (% 8&$/81% 3)*'$)*.+% /.)% 6$'$);*+*+,% 5#$'#$)% (% 3.11$3'*.+% ./&80/*$16&%*&%lD:K

:-&T0&J)1=1=%:/%!*%%()$%lD:%/.)%*%%F4%(+6%!)%FZW*X4%'#$+%W!*4!!X%()$%lD:9%%b8)'#$)4%!*%/.)%*F()$%lD:%*/%(+6%.+12%*/%W!*4!FO*X%()$%lD:%/.)%(11%*F9%%

;I$J.K&)L9%%L3.+'*+8$6M%:/%!)R%WF4EX4%'#$+%!l%%!g4%(+6%\LQIFE9M%R%y%/.)%$(3#%9%%!!9%%

E9H9F9%%"#$%*6$(%./%*%,%$#%3!#*1$-"!*&%'#('%(+%$<-$)*;$+'4%&83#%(&%(%3.*+%'.&&4%*&%)$-1*3('$6%.7$)(+6%.7$)9%%:'%*&%3.+7$+*$+'%'.%#(7$%3.;;.+%-).0(0*1*'2%&-(3$%*+%5#*3#%'.%6$&3)*0$%'#$%.8'3.;$&%./1(),$)%(+6%1(),$)%$<-$)*;$+'&%5*'#%;.)$%(+6%;.)$%)$-1*3('*.+&9%%"#$%+.'('*.+%/.)%)$-$('$6%')*(1&%5*110$%&*;*1()%'.%'#('%*+').683$6%*+%'#$%6$/*+*'*.+%./%;8'8(1%&'('*&'*3(1%*+6$-$+6$+3$9%%>$'%F)6$+.'$%(/*+*'$%.)%3.8+'(01$%*+6$<%&$'%./%')*(1&4%E*%(%&(;-1$%&-(3$%/.)%')*(1%*4%(+6%$*%(%PG/*$16%./%&80&$'&%./%E*9=.'$%'#('%YE*4$*M%;(2%0$%'#$%&(;$%/.)%(11%*9%%J&&8;$%'#('%LE*4%$*M%*&%'#$%)$(1%1*+$%5*'#%'#$%_.)$1%PG/*$164.)%(%3.8+'(01$%&$'%5*'#%'#$%/*$16%./%(11%&80&$'&4%.)%(%-(*)%5*'#%3.;-()(01$%;('#$;('*3(1%-).-$)'*$&L*9$94%E*%*&%(%3.;-1$'$%&$-()(01$%;$')*3%&-(3$%(+6%$*%*&%*'&%_.)$1%/*$16M9%%>$'%'%R%L&I4&F4999M%R%L&*%K%*FM

6$+.'$%(+%.)6$)$6%&$A8$+3$%./%.8'3.;$&%./%')*(1&4%(+6%EF%R%[*F)E*%6$+.'$%'#$%&(;-1$%&-(3$%./%'#$&$

&$A8$+3$&9%%>$'%!F%R%*F$*%6$+.'$%'#$%PG/*$16%./%&80&$'&%./%EF%,$+$)('$6%02%'#$%'101#%!*%.#$05-%"

5#*3#%()$%&$'&%./%'#$%/.);%L[*U%9*M^L[*FZU%E*M%5*'#%U%(%/*+*'$%&80&$'%./%F)(+6%9*%%$*%/.)%*%%U9"#$%3.11$3'*.+%!F%*&%3(11$6%'#$%,*&3).#!A2'1%-3%./%&80&$'&%./%EF9

;I$J.K&)Q9%%F)R%WI4F4EX4%E*%R%WQ4IX4%$*%R%WH4WQX4WIX4EX%*&%(%&(;-1$%&-(3$%/.)%(%3.*+%'.&&4%3.6$6YIN%*/%#$(6&%(+6%YQN%*/%'(*1&9%%"#$+%EF%R%W&I&F&E&*%%E*X%R%WQQQ4%QQI4%QIQ4%QII4%IQQ4%IQI4%IIQ4%IIIX45#$)$%QQQ%*&%&#.)'#(+6%/.)%'#$%$7$+'%WQX^WQX^WQX4%(+6%&.%/.)'#4%*&%'#$%&(;-1$%&-(3$%/.)%'#)$$%3.*+'.&&$&9%%"#$%/*$16%!F%*&%'#$%/(;*12%./%(11%&80&$'&%./%EF9

b.)%(+2%&80&$'%U)./%F4%6$/*+$%EU%R)[*U)E*%(+6%$U%R%*U$*9%%"#$+4%$U%*&%'#$%-).683'%PG/*$16.+%EU9%%r$/*+$%!U%'.%0$%'#$%PG/*$16%.+%EF%,$+$)('$6%02%&$'&%./%'#$%/.);%9[EFZU)/.)%9)%$U9%%"#$+%$U

(+6%!U%3.+'(*+%$&&$+'*(112%'#$%&(;$%*+/.);('*.+4%08'%$U% *&%(%/*$16%./%&80&$'&%./%EU%(+6%!0% *&%(3.))$&-.+6*+,%/*$16%./%&80&$'&%./%EF%5#*3#%3.+'(*+&%+.%*+/.);('*.+%.+%$7$+'&%.8'&*6$%./%U9%%D8--.&$\F%*&%(%-).0(0*1*'2%.+%LEF4%!FM9%%%"#$%*%"#*1.#1&0!./%\F%'.%LEU4$UM%*&%(%-).0(0*1*'2%\U%6$/*+$6%/.)%9)$U%02%\UL9M%R%\FL9^EFZUM9%%"#$%/.11.5*+,%)$&81'%$&'(01*&#$&%(%1*+B%0$'5$$+%6*//$)$+'%)$&')*3'*.+&K

:-&T0&J) 1=6=% :/%!) %U) (+6% \!4% \U% ()$% )$&')*3'*.+&% ./% \F4% '#$+% \!% (+6% \U% &('*&/2% '#$.&+,$#161-1#7!.&031#1&0!'#('%\!L9M%R%\UL9[EUZ!M%/.)%(11%9)%!!9

"#$)$%*&%'#$+%(%/8+6(;$+'(1%)$&81'%'#('%$&'(01*&#$&%'#('%5#$+%-).0(0*1*'*$&%()$%6$/*+$6%.+%(11%/*+*'$&$A8$+3$&%./%')*(1&%(+6%()$%3.;-('*01$4%'#$+%'#$)$%$<*&'&%(%-).0(0*1*'2%6$/*+$6%.+%'#$%*+/*+*'$%&$A8$+3$./%')*(1&%'#('%2*$16&%$(3#%./%'#$%-).0(0*1*'*$&%/.)%(%/*+*'$%&$A8$+3$%(&%(%)$&')*3'*.+9

:-&T0&J)1=L=%%:/%\U%.+%LEU4$UM%/.)%(11%/*+*'$%U)%F%&('*&/2%'#$%3.;-('*0*1*'2%3.+6*'*.+4%'#$+%'#$)$$<*&'&%(%8+*A8$%\F%.+%LEF4!FM%&83#%'#('%$(3#%\U%*&%(%)$&')*3'*.+%./%\F9%%

"#*&%)$&81'%,8()(+'$$&%'#('%*'%*&%;$(+*+,/81%'.%;(B$%-).0(0*1*'2%&'('$;$+'&%(0.8'%$7$+'&%&83#%(&%Y(+*+/*+*'$%+8;0$)%./%#$(6&%*+%)$-$('$6%3.*+%'.&&$&N99D8--.&$%')*(1&%LE*4$*4\*M%*+6$<$6%02%*%*+%(%3.8+'(01$%&$'%F%()$%;8'8(112%&'('*&'*3(112%*+6$-$+6$+'9

b.)%/*+*'$%U)%F4%1$'%$U%6$+.'$%'#$%-).683'%PG/*$16%.+%EU9%%"#$+%lD:%*;-1*$&%'#('%'#$%-).0(0*1*'2%./

(%&$'%[*U)9*%%$U% &('*&/*$&%\UL[*U)9*M%R% \?L9?M9% % %"#$+4% '#$%3.;-('*0*1*'2%3.+6*'*.+% *+@.

"#$.)$;%E9E%*&%&('*&/*$64%(+6%'#('%)$&81'%*;-1*$&%'#$%$<*&'$+3$%./%(%-).0(0*1*'2%\F%.+%LEF4!FM%5#.&$)$&')*3'*.+&%'.%LEU4$UM%/.)%/*+*'$%U)%F%()$%'#$%-).0(0*1*'*$&%\U9

E9H9E9% %"#$%(&&8;-'*.+%./%&'('*&'*3(112%*+6$-$+6$+'%)$-$('$6%')*(1&% *&%(%+('8)(1%.+$%/.)%;(+2&'('*&'*3(1% (+6% $3.+.;$')*3% (--1*3('*.+&%5#$)$% '#$% 6('(% 3.;$&% /).;% )(+6.;% &(;-1$&% /).;% '#$-.-81('*.+4% &83#% (&% &8)7$2&% ./% 3.+&8;$)&% .)% /*);&9% % "#*&% (&&8;-'*.+% #(&% ;(+2% -.5$)/81*;-1*3('*.+&4%(+6%5*11%0$%8&$6%'.%,$'%;.&'%./%'#$%)$&81'&%./%0(&*3%$3.+.;$')*3&9%%V.5$7$)4%*'%*&%(1&.3.;;.+%*+%$3.+.;$')*3&%'.%5.)B%5*'#%(,,)$,('$%'*;$%&$)*$&%6('(9%%:+%'#$&$%6('(4%$(3#%-$)*.6%./.0&$)7('*.+%3(+%0$%*+'$)-)$'$6%(&%(%+$5%')*(19%%"#$%(&&8;-'*.+%./%&'('*&'*3(1%*+6$-$+6$+3$%(3).&&'#$&$%')*(1&%*&%8+1*B$12%*+%;(+2%3(&$&4%0$3(8&$%*+%;.&'%3(&$&%)$(1%)(+6.;%$//$3'&%6.%+.'%3.+7$+*$+'121*;*'% '#$;&$17$&% '.% &*+,1$% '*;$% -$)*.6&9% % "#$% A8$&'*.+% 0$3.;$&% 5#$'#$)% '#$)$% ()$% 5$(B$)(&&8;-'*.+&%'#('%'*;$%&$)*$&%6('(%()$%1*B$12%'.%&('*&/2%'#('%()$%&'*11%&').+,%$+.8,#%'.%,$'%&.;$%./%'#$0(&*3% &'('*&'*3(1% '#$.)$;&9% % :'% '8)+&% .8'% '#('% '#$)$% ()$% A8*'$% ,$+$)(1% 3.+6*'*.+&4% 3(11$6%+1B105.&031#1&0"4%'#('%()$%$+.8,#%'.%2*$16%;(+2%./%'#$%B$2%)$&81'&9%%"#$%*6$(%0$#*+6%'#$&$%3.+6*'*.+&%*&%'#('8&8(112% $7$+'&% '#('% ()$% /()% (-()'% *+% '*;$% ()$% +$()12% *+6$-$+6$+'4% 0$3(8&$% *+'$)7$+*+,% &#.3B&.7$)5#$1;%'#$%.16$)%#*&'.)2%*+%6$'$);*+*+,%'#$%1('$)%$7$+'9%%"#*&%*6$(%*&%/.);(1*`$6%*+%@#(-'$)%H9

3.5.  RANDOM VARIABLES, DISTRIBUTION FUNCTIONS, AND EXPECTATIONS

E9h9I9%%J%*$03&+!/$*1$6-%!%*&%(%;$(&8)(01$%)$(1G7(18$6%/8+3'*.+%.+%(%-).0(0*1*'2%&-(3$%LE4!4\M4.)%KD%%9%%"#$+%$(3#%&'('$%./%=('8)$%&%6$'$);*+$&%(%7(18$%L&M%./%'#$%)(+6.;%7()*(01$4%'$);$6%*'&*%$-1:$#1&0%*+%&'('$%&9%%S#$+%'#$%/8+3'*.+(1%+('8)$%./%'#$%)(+6.;%7()*(01$%*&%'.%0$%$;-#(&*`$64%*'%*&6$+.'$6%LM4%.)%&*;-12%9%%S#$+%*'&%7(18$&%.)%)$(1*`('*.+&%()$%8&$64%'#$2%()$%6$+.'$6%L&M%.)%<9%%b.)$(3#%&$'%B%%"4%'#$%-).0(0*1*'2%./%'#$%$7$+'%'#('%'#$%)$(1*`('*.+%./%%*&%3.+'(*+$6%*+%B%*&%5$11G6$/*+$6(+6%$A8(1&%?qLBM%%?LGILBMM4%5#$)$%?q% *&% '$);$6% '#$%-).0(0*1*'2% 103).%3%.+%%02% '#$%)(+6.;7()*(01$%9%%j+$%3(+%#(7$%;(+2%)(+6.;%7()*(01$&%6$/*+$6%.+%'#$%&(;$%-).0(0*1*'2%&-(3$C%(+.'#$);$(&8)(01$%/8+3'*.+%2%R%dL&M%6$/*+$&%(%&$3.+6%)(+6.;%7()*(01$9%%:'%*&%*;-.)'(+'%*+%5.)B*+,%5*'#)(+6.;%7()*(01$&%'.%B$$-%*+%;*+6%'#('%'#$%)(+6.;%7()*(01$%*'&$1/%*&%(%/8+3'*.+%./%&'('$&%./%=('8)$4%(+6'#('% .0&$)7('*.+&% ()$% ./% )$(1*`('*.+&% ./% '#$% )(+6.;% 7()*(01$9% % "#8&4% 5#$+% .+$% '(1B&% (0.8'3.+7$),$+3$%./%(%&$A8$+3$%./%)(+6.;%7()*(01$&4%.+$%*&%(3'8(112%'(1B*+,%(0.8'%3.+7$),$+3$%./%(&$A8$+3$%./%/8+3'*.+&4%(+6%+.'*.+&%./%6*&'(+3$%(+6%31.&$+$&&%+$$6%'.%0$%/.);81('$6%(&%6*&'(+3$%(+631.&$+$&&%./%/8+3'*.+&9%%l81'*-12*+,%(%)(+6.;%7()*(01$%02%(%&3(1()4%.)%(66*+,%)(+6.;%7()*(01$&4)$&81'&%*+%(+.'#$)%)(+6.;%7()*(01$9%%"#$+4%'#$%/(;*12%./%)(+6.;%7()*(01$&%/.);&%(%-10%$*!/%.#&*",$.%9%%:+%(66*'*.+4%-).683'&%./%)(+6.;%7()*(01$&%()$%(,(*+%)(+6.;%7()*(01$&4%&.%'#('%'#$%/(;*12%./)(+6.;%7()*(01$&%/.);&%(+%C6%-1$0!5*&),!)03%*!+)-#1,-1.$#1&09%%"#$%/(;*12%./%)(+6.;%7()*(01$&%*&(1&.%31.&$6%8+6$)%+$@&*1:$#1&04% &.% '#('%|KD%%%6$/*+$6%02%|L&M%R%;(<LL&M4dL&MM% /.)% )(+6.;7()*(01$&%%(+6%d%*&%(,(*+%(%)(+6.;%7()*(01$9%%"#$+4%'#$%/(;*12%./%)(+6.;%7()*(01$&%/.);&%(%-$##1.%5*'#%)$&-$3'%'.%'#$%-()'*(1%.)6$)%%%dL*9$94%L&M%%dL&M%(1;.&'%&8)$12M9%

E9h9F9%%"#$%'$);%+%$")*$6-%!*+%'#$%6$/*+*'*.+%./%(%)(+6.;%7()*(01$%;$(+&%'#('%/.)%$(3#%&$'%9%*+'#$%_.)$1%PG/*$16%"%./%&80&$'&%./%'#$%)$(1%1*+$4%'#$%*+7$)&$%*;(,$%GIL9M%%W&EL&M9X%*&%*+%'#$PG/*$16%!%./%&80&$'&%./%'#$%&(;-1$%&-(3$%E9%%"#$%(&&8;-'*.+%./%;$(&8)(0*1*'2%*&%(%;('#$;('*3(1'$3#+*3(1*'2%'#('%$+&8)$&%'#('%-).0(0*1*'2%&'('$;$+'&%(0.8'%'#$%)(+6.;%7()*(01$%()$%;$(+*+,/819%%S$&#(11% +.'%;(B$%(+2%$<-1*3*'% )$/$)$+3$% '.%;$(&8)(0*1*'2% *+%0(&*3% $3.+.;$')*3&4% (+6% &#(11% (15(2&(&&8;$%*;-1*3*'12%'#('%'#$%)(+6.;%7()*(01$&%5$%()$%6$(1*+,%5*'#%()$%;$(&8)(01$9

E9h9E9%%"#$%-).0(0*1*'2%'#('%(%)(+6.;%7()*(01$%%#(&%(%)$(1*`('*.+%*+%(%&$'%9%%"%*&%,*7$+%02

%% bL9M%%\LGIL9MM%%\LW&EL&M9XM9

"#$%/8+3'*.+%b%*&%(%-).0(0*1*'2%.+%"C%*'%*&%6$/*+$6%*+%-()'*381()%/.)%#(1/G.-$+%*+'$)7(1&%./%'#$%/.);%9R%LG4<U4%*+%5#*3#%3(&$%bLLG4<UM%*&%(00)$7*('$6%'.%bL<M%(+6%*&%3(11$6%'#$%31"#*16)#1&0!')0.#1&0!L.)4.)+)-$#1/%!31"#*16)#1&0!')0.#1&0D!EFGH!./%9%%%b).;%'#$%-).-$)'*$&%./%(%-).0(0*1*'24%'#$%6*&')*08'*.+/8+3'*.+%#(&%'#$%-).-$)'*$&

L*M%bLGM%R%Q%(+6%bL]M%R%I9L**M%bL<M%*&%+.+G6$3)$(&*+,%*+%<4%(+6%3.+'*+8.8&%/).;%'#$%)*,#'9L***M%bL<M%#(&%('%;.&'%(%3.8+'(01$%+8;0$)%./%?8;-&4%(+6%*&%3.+'*+8.8&%$<3$-'%('%'#$&$%?8;-&9L\.*+'&%5*'#.8'%?8;-&%()$%3(11$6%.&0#10)1#7!-.*+'&9M

@.+7$)&$124%(+2%/8+3'*.+%b%'#('%&('*&/*$&%L*M%(+6%L**M%6$'$);*+$&%8+*A8$12%(%-).0(0*1*'2%b%.+%-9%%"#$"),,&*#!./%'#$%6*&')*08'*.+%b%*&%'#$%&;(11$&'%31.&$6%&$'%9%%"%&83#%'#('%bL9M%R%I9

;I$J.K&)L9%L3.+'*+8$6M%"#$%&'(+6()6%+.);(1%@rb%*&%LIM%R% 4%.0'(*+$6%02B

LFuMIfF% " FfF3"

*+'$,)('*+,% '#$% 6$+&*'2%L&M% R% 9% %j'#$)% $<(;-1$&% ()$% '#$%@rb% /.)% '#$% &'(+6()6LFuMIfF% " FfF

$<-.+$+'*(1%6*&')*08'*.+4%bL<M%R%I%G%$G<%/.)%<%p%Q4%(+6%'#$%@rb%/.)%'#$%1.,*&'*3%6*&')*08'*.+4%bL<M%R

IfLI]$G<M9%%J+%$<(;-1$%./%(%@rb%'#('%#(&%?8;-&%*&%bL<M%R%I%G%$G<fF%G% /.)%<%p%Q9

=I 3L=BMfF=I

E9h9H9% %:/%b%*&%(0&.18'$12%3.+'*+8.8&%5*'#%)$&-$3'% '.%(%PG/*+*'$%;$(&8)$%k%.+%C% *9$94%b%,*7$&-).0(0*1*'2%`$).%'.%(+2%&$'%'#('%#(&%kG;$(&8)$%`$).4%'#$+%L02%'#$%c(6.+G=*B.62;%'#$.)$;M%'#$)$$<*&'&%(%)$(1G7(18$6%/8+3'*.+%/%.+%4%3(11$6%'#$%3%0"1#7!L.)%,*&6$61-1#7!3%0"1#7!')0.#1&0D!,3'H!./%4&83#%'#('%

% %bL9M%R% /L<MkL6<M%,/.)%$7$)2%J%%"9%%S*'#%'#$%-.&&*01$%$<3$-'*.+%./%(%&$'%./%kG;$(&8)$%`$).4%b%*&%6*//$)$+'*(01$%(+6%'#$6$)*7('*7$%./% '#$%6*&')*08'*.+%,*7$&% '#$%6$+&*'24% /L<M%R%bL<M9% %S#$+% '#$%;$(&8)$%k% *&%;%6%"5)%+%$")*%4%&.%'#('%'#$%;$(&8)$%./%(+%*+'$)7(1%*&%*'&%1$+,'#4%*'%*&%38&'.;()2%'.%&*;-1*/2%'#$%+.'('*.+%(+6

5)*'$%bL9M%R% /L<M6<9,:/%b%*&%(0&.18'$12%3.+'*+8.8&%5*'#%)$&-$3'%'.%3.8+'*+,%;$(&8)$%.+%(%3.8+'(01$%&80&$'%,%./%4%'#$+*'%*&%3(11$6%(%31".*%#%!6*&')*08'*.+4%(+6%'#$)$%*&%(%)$(1G7(18$6%/8+3'*.+%/%.+%,%&83#%'#('%

bL9M%R% /L<M9B,

c$3(11%'#('%'#$%-).0(0*1*'2%*&%*'&$1/%(%;$(&8)$9%%"#*&%&8,,$&'&%(%+.'('*.+%bL9M%R% bL6<M%'#('%3.7$)&,0.'#%3.+'*+8.8&%(+6%3.8+'*+,%3(&$&9%%"#*&%*&%3(11$6%(%;%6%"5)%2I#1%-#@%"%*+'$,)(19

E9h9h9%%:/%L4"4bM%*&%'#$%-).0(0*1*'2%&-(3$%(&&.3*('$6%5*'#%(%)(+6.;%7()*(01$%4%(+6%,K% %%*&(%;$(&8)(01$% /8+3'*.+4% '#$+%d%R% ,LM% *&% (+.'#$)% )(+6.;%7()*(01$9% %"#$% )(+6.;%7()*(01$%d% *&

10#%5*$6-%!5*'#%)$&-$3'%'.%'#$%-).0(0*1*'2%b%*/%%% ,L<MbL6<M%m%]C

*/%*'%*&%*+'$,)(01$4%'#$+%'#$%*+'$,)(1% ,L<MbL6<M%% ,6b%$<*&'&4%*&%6$+.'$6%;),LM4%(+6%*& 3(11$6%#9%!%B,%.#$#1&0!&'!,LM9%%S#$+%+$3$&&()24%'#*&%$<-$3'('*.+%5*11%(1&.%0$%6$+.'$6%;,LM%'.*6$+'*/2%'#$%6*&')*08'*.+%8&$6%'.%/.);%'#$%$<-$3'('*.+9%%S#$+%b%*&%(0&.18'$12%3.+'*+8.8&%5*'#%)$&-$3'

'.%>$0$&,8$%;$(&8)$4%&.%'#('%b%#(&%(%6$+&*'2%/4%'#$%$<-$3'('*.+%*&%5)*''$+%%%;),LM%R% ,L<M/L<M6<9

J1'$)+('$124%/.)%3.8+'*+,%;$(&8)$%.+%'#$%*+'$,$)&%5*'#%6$+&*'2%/LBM4%;),LM%R% ,LBM/LBM9

=

"#$%$<-$3'('*.+%./%4%*/%*'%$<*&'&4%*&%3(11$6%'#$%+%$0!./%9%%"#$%$<-$3'('*.+%./%L%G%;MF4%*/%*'$<*&'&4%*&%3(11$6%'#$%/$*1$0.%!./%9%%r$/*+$%3L$M%'.%0$%(+%*+6*3('.)%/8+3'*.+%'#('%*&%.+$%*/%L&M%$4%(+6%`$).%.'#$)5*&$9%%"#$+4%;%3L$M%R%bL$M4%(+6%'#$%6*&')*08'*.+%/8+3'*.+%3(+%0$%)$3.7$)$6%/).;'#$%$<-$3'('*.+&%./%'#$%*+6*3('.)%/8+3'*.+&9% l.&'%$3.+.;$')*3%(--1*3('*.+&%6$(1%5*'#%)(+6.;7()*(01$&%'#('%#(7$%/*+*'$%7()*(+3$&9%%"#$%&-(3$%./%'#$&$%)(+6.;%7()*(01$&%*&%CFLE4!4?M4%'#$%&-(3$%./)(+6.;%7()*(01$&%%/.)%5#*3#%;%F%R%EL<MF?L6&M%%m%]9%%"#$%&-(3$%CFLE4!4?M%*&%(1&.%'$);$6%'#$&-(3$%./%"J)$*%210#%5*$6-%!')0.#1&0"9%%"#$%+.);%*+%'#*&%&-(3$%*&%)..'G;$(+G&A8()$4%F%R%TEL&MF?L6&MUy%9%%:;-1*3('*.+&%./%%%CFLE4!4?M%()$%;%oo%%E;(<LL&M4IM?L6&M%ELL&MF]IM?L6&M%R%%FF]%I%m%]%(+6%;%L%G%;MF%R%FF%G%L;%ooMF%%FF%m%]4%&.%'#('%%#(&%(%5$11G6$/*+$64%/*+*'$%;$(+(+6%7()*(+3$9

;I$J.K&)39%%L3.+'*+8$6M%r$/*+$%(%)(+6.;%7()*(01$%%02

%%L&M%R%Q 1' " KKI 1' " KL &* LKF 1' " LL

"#$+4%%*&%'#$%+8;0$)%./%#$(6&%*+%'5.%3.*+%'.&&$&9%%b.)%(%/(*)%3.*+4%;)%R%I9

;I$J.K&)*9%%L3.+'*+8$6M%>$'%%0$%(%)(+6.;%7()*(01$%6$/*+$6%'.%$A8(1%'#$%+8;0$)%./%#$(6&%'#('(--$()%0$/.)$%(%'(*1%.338)&9%%"#$+4%-.&&*01$%7(18$&%./%%()$%'#$%*+'$,$)&%,%R%WQ4I4F4999X9%%"#$+%,%*&'#$%&8--.)'%./%9%%b.)%<%)$(14%6$/*+$%T<U%'.%0$%'#$%1(),$&'%*+'$,$)%B%&('*&/2*+,%B%%<9%%J%6*&')*08'*.+

/8+3'*.+%/.)%4%6$/*+$6%.+%'#$%)$(1%1*+$4%*&%bL<M%R% C%'#$%(&&.3*('$6%6$+&*'2I FTBIU '&* Q B

Q '&* Q p B

6$/*+$6%.+%,%*&%%/LBM%R%FGBGI9%%"#$%$<-$3'('*.+%./%4%.0'(*+$6%8&*+,%$7(18('*.+%./%(%&-$3*(1%&$)*$&%/).;

F9I9IQ4%*&%;)%R% BFGBGI%R%I9

=Q

;I$J.K&)19%%L3.+'*+8$6M%r$/*+$%(%)(+6.;%7()*(01$%%02%L&M%R%&%G%I9%%"#$+4%%*&%'#$%;(,+*'86$./%'#$%6$7*('*.+%./%'#$%*+6$<%/).;%.+$9%%"#$%*+7$)&$%*;(,$%./%(+%*+'$)7(1%L(40M%*&%LIG04IG(MLI](4I]0M%!4%&.%'#('%%*&%;$(&8)(01$9%%j'#$)%$<(;-1$&%./%;$(&8)(01$%)(+6.;%7()*(01$&%()$%d%6$/*+$6%02%dL&MR%l(<%WI4&X%(+6%|%6$/*+$6%02%|L&M%R%&E9

E9h9g9%%@.+&*6$)%(%)(+6.;%7()*(01$%d%.+%L4"M9%%"#$%$<-$3'('*.+%;dB%*&%'#$%BG'#%+&+%0#%./%d4(+6%;LdG;dMB%*&%'#$%BG'#%.%0#*$-!+&+%0#9%%D.;$'*;$&%;.;$+'&%/(*1%'.%$<*&'9%%V.5$7$)4%*/%,LdM%*&3.+'*+8.8&%(+6%0.8+6$64% '#$+%;,LdM%(15(2&%$<*&'&9% %"#$%$<-$3'('*.+%;L'M%R%;$'d% *&% '$);$6%'#$+&+%0#!5%0%*$#105!')0.#1&0!L;,/M%./%dC%*'%&.;$'*;$&%/(*1&%'.%$<*&'9%%%@(11%(%;,/%,*&,%*!*/%*'%*&%/*+*'$/.)%'%*+%(+%*+'$)7(1%().8+6%Q9%%S#$+%(%-).-$)%;,/%$<*&'&4%'#$%)(+6.;%7()*(01$%#(&%/*+*'$%;.;$+'&%./(11%.)6$)&9%%"#$%$<-$3'('*.+%~L'M%R%;$'d4%5#$)$%%*&%'#$%&A8()$%)..'%./%GI4%*&%'$);$6%'#$%.9$*$.#%*1"#1.')0.#1&0!L3/M%./%d9%%"#$%3#()(3'$)*&'*3%/8+3'*.+%(15(2&%$<*&'&9

;I$J.K&)L9%L3.+'*+8$6M%J%6$+&*'2%/L<M%'#('%*&%&2;;$')*3%(0.8'% $).4%&83#%(&%'#$%&'(+6()6%+.);(14

#(&%;B%R% <B/L<M6<%R% <B/LG<M6<%]% <B/L<M6<%R% TI%]%LGIMBU<B/L<M6<%R%Q%/.)

Q

Q

Q

B%.669%%:+'$,)('*.+%02%-()'&%2*$16&%'#$%/.);81(%;B%R%FB %<BGI%TIGbL<MU6<%/.)%B%$7$+9%%b.)%'#$

Q

&'(+6()6%+.);(14%;FB%R R%LFBGIM;FBGF%/.)%B%p%F%8&*+,%*+'$,)('*.+F

QLFuMIfFB F=I% B FfFB3B

02%-()'&4%(+6%;F%R R%FLQM%R%I9%%"#$+4%;H%R%E%(+6%;g%R%Ih9%%"#$F

QLFuMIfF% B FfFB3B

;.;$+'% ,$+$)('*+,% /8+3'*.+% ./% '#$% &'(+6()6% +.);(1% *&% ;L'M% R% 9

LFuMIfF% #B% B FfF3B

@.;-1$'*+,%'#$%&A8()$%*+%'#$%$<-.+$+'%,*7$&%;L'M%R R% 9% # FfF

LFuMIfF% LB#MFfF3B % # FfF

E9h9s9%%:/%"%)(+6.;%7()*(01$&%()$%/.);$6%*+'.%(%7$3'.)4%LM%R%LL4IM49994L4"MM4%'#$%)$&81'%*&'$);$6% (% *$03&+! /%.#&*9% % b.)% $(3#% &% % E4% '#$% )$(1*`('*.+% ./% '#$% )(+6.;% 7$3'.)% *&% (% -.*+'LL&4IM49994L&4"MM% *+% "4% (+6% '#$% )(+6.;% 7$3'.)% #(&% (+% *+683$6% -).0(0*1*'2% .+% "% 5#*3#% *&3#()(3'$)*`$6%02%*'&%;81'*7()*('$%@rb4%bL<I49994<"M%R%\LW&EoL&4IM<I49994L&4"M<"XM9%%=.'$%'#('%(11'#$%3.;-.+$+'&%./%(%)(+6.;%7$3'.)%()$%/8+3'*.+&%./%'#$%"$+%%&'('$%./%=('8)$%&4%(+6%'#$%)(+6.;7$3'.)%3(+%0$%5)*''$+%(&%(%;$(&8)(01$%/8+3'*.+%%/).;%'#$%-).0(0*1*'2%&-(3$%LE4!4\M%*+'.%L"4"+"%9L"#$%+.'('*.+%"%"%;$(+&%""999"%%"%'*;$&4%5#$)$%"%*&%'#$%_.)$1%PG/*$16%.+%'#$%)$(1%1*+$9%%"#*&%*&

(1&.%3(11$6%'#$%,*&3).#!PG/*$164%(+6%*&%&.;$'*;$&%5)*''$+%"%"%R%*RI49994"!"*4%5#$)$%'#$%"*%()$%*6$+'*3(13.-*$&%./%"9M%%"#$%;$(&8)(0*1*'2%./%%)$A8*)$&%GIL,M%%E%/.)%$(3#%.-$+%)$3'(+,1$%,%*+%"9%%"#$*+6$-$+6$+3$%.)%6$-$+6$+3$%./%'#$%3.;-.+$+'&%./%%*&%6$'$);*+$6%02%'#$%/*+$%&')83'8)$%./%?%.+%E9J% 8&$/81% *+&*,#'% 3.;$&% /).;% 3.+&*6$)*+,% 6*//$)$+'% )$-)$&$+'('*.+&% ./% 7$3'.)&% *+% /*+*'$G

6*;$+&*.+(1%&-(3$&4%(+6%$<'$+6*+,%'#$&$%*6$(&%'.%*+/*+*'$G6*;$+&*.+(1%&*'8('*.+&9%%".%0$%&-$3*/*343.+&*6$)%F9%%S#$+%5$%$<-)$&&%(%/8+3'*.+%%.+%"%R%WI4FX%(&%(%-.*+'%LLIM4LFMM%*+%'#*&%&-(3$4%5#('5$%()$%)$(112%6.*+,%*&%6$/*+*+,%'5.%/8+3'*.+&%|I%R%LI4QM%(+6%|F%R%LQ4IM%5*'#%'#$%-).-$)'2%'#('%|I%(+6|F%&-(+%'#$%&-(3$4%(+6%'#$+%5)*'*+,%%(&%'#$%1*+$()%3.;0*+('*.+%%R%LIM|I%]LFM|F9%%"#$%-(*)%.//8+3'*.+&%L-.*+'&M%|I%(+6%|F%*&%3(11$6%(%L$+%-!6$"1"%/.)%F4%(+6%$7$)2%-.*+'%*+%'#$%&-(3$%#(&%(%8+*A8$)$-)$&$+'('*.+%*+%'$);&%./%'#*&%0(&*&9% %V.5$7$)4%'#$)$%;(2%0$%;(+2%6*//$)$+'%V(;$1%0(&$&9%%b.)$<(;-1$4%'#$%8+*'%/8+3'*.+%LI4IM%(+6%'#$%/8+3'*.+%3.&Lu'M%.)%LGI4IM%(1&.%/.);%(%V(;$1%0(&*&4%(+6%*+'$);&%./%'#*&%0(&*&%%#(&%'#$%)$-)$&$+'('*.+%%R%yLLIM]LFMMLI4IM%]%yLLFMGLIMMLGI4IM9J+.'#$)%5(2%'.%5)*'$%(%)(+6.;%7$3'.)%%*&%'.%6$/*+$%(+%*+6$<%&$'%:%R%WI49994"X4%(+6%'#$+%6$/*+$

%(&%(%)$(1G7(18$6%/8+3'*.+%.+%E%(+6%:4%KE^:9%%"#$+4%L4'M%*&%(%&*;-1$%)(+6.;%7()*(01$%/.)%$(3#'%%:4%(+6%L&4M%*&%(%)$(1%7$3'.)%'#('%*&%(%)$(1*`('*.+%./%%/.)%$(3#%&%%E9%%J%/8+3'*.+%6$/*+$6%*+%'#*&5(2% *&% (1&.% 3(11$6% (% "#&.9$"#1.! ,*&.%""4% -()'*381()12% 5#$+%:% *&% +.'% /*+*'$9% % "#$% ;$(&8)(0*1*'2)$A8*)$;$+'%.+%%*&%'#$%&(;$%(&%0$/.)$4%08'%3(+%0$%5)*''$+%*+%(%6*//$)$+'%/.);%(&%)$A8*)*+,%'#('%'#$*+7$)&$%*;(,$%./%$(3#%.-$+%*+'$)7(1%*+%%0$%3.+'(*+$6%*+%!*4%5#$)$%*%*&%(%PG/*$16%./%&80&$'&%./%:'#('%3(+%0$%'(B$+%'.%0$%'#$%/(;*12%./%(11%&80&$'&%./%:%(+6%YZ%6$+.'$&%'#$%.-$)('*.+%'#('%/.);&%'#$&;(11$&'%PG/*$16%3.+'(*+*+,%(11%&$'&%9^B%5*'#%9%%!%%(+6%B%%*9%%"#$)$%*&%'#$+%(%3.;-1$'$%68(1*'20$'5$$+%)(+6.;%7$3'.)&%*+%(%"G6*;$+&*.+(1%1*+$()%&-(3$%(+6%)(+6.;%/8+3'*.+&%.+%(%"G6*;$+&*.+(1*+6$<%&$'9%%"#*&%68(1*'2%0$'5$$+%7$3'.)&%(+6%/8+3'*.+&%5*11%,$+$)(1*`$%(+6%-).7*6$%8&$/81%*+&*,#'&%*+'.&'('*&'*3(1%(--1*3('*.+&%*+%5#*3#%:%*&%(%;.)$%,$+$)(1%&$'%*+6$<*+,%'*;$9%%"#$%31"#*16)#1&0!')0.#1&0L@rbM%./%%*&%

%%bL<I49994<"M%R%\LW&D*L&M%%<*%/.)%*%R%I49994"XM9%%%

:/%9%%"%"4%6$/*+$%bL9M%R%\LW&EL&M9XM9%%:/%bL9M%R%Q%/.)%$7$)2%&$'%9%./%>$0$&A8$%;$(&8)$`$).4%'#$+%'#$)$%$<*&'&%(%,*&6$61-1#7!3%0"1#7!')0.#1&0!L-6/M%/L<I49994<"M%&83#%'#('%

LIM%%%%%bL<I49994<"M%R% %/L2I499942"M%62I99962"9%BI

BF

999BK

b%(+6%/%()$%'$);$6%'#$%@&10#!.)%+)-#1/$*1$#%!@rb%(+6%-6/4%)$&-$3'*7$124%./%9%%"#$%)(+6.;%7()*(01$I%#(&%(%6*&')*08'*.+%'#('%&('*&/*$&%

%%bIL<IM%%\LW&EIL&M%%<IXM%R%bL<I4]49994]M9

"#*&%)(+6.;%7()*(01$%*&%;$(&8)(01$%5*'#%)$&-$3'%'.%'#$%PG&80/*$16%$I%3.+'(*+*+,%'#$%$7$+'&%5#.&$.338))$+3$%*&%6$'$);*+$6%02%I%(1.+$C%*9$94%$I%*&%'#$%/(;*12%,$+$)('$6%02%&$'&%./%'#$%/.);%9^^999^5*'#%9%%"9%%:/%b%*&%(0&.18'$12%3.+'*+8.8&%5*'#%)$&-$3'%'.%>$0$&A8$%;$(&8)$%.+%"%"4%'#$+%'#$)$%()$(&&.3*('$6%6$+&*'*$&%/%(+6%/I%&('*&/2*+,

LFM%%%%%%%% bIL<IM%R% /IL2IM%62I%BI

7I

LEM%%%%%% /IL<IM%R% /L<I42F499942+M62F99962+9%%

7F

70

bI%(+6%/I%()$%'$);$6%'#$%+$*510$-!@rb%(+6%-6/4%)$&-$3'*7$124%./%I9%%E9h9v9%%@.))$&-.+6*+,%'.%'#$%3.+3$-'%./%(%3.+6*'*.+(1%-).0(0*1*'24%5$%3(+%6$/*+$%(%.&031#1&0$-

31"#*16)#1&0K% D8--.&$% ,% *&% (+% $7$+'% *+% $I% 5*'#% \L,M% p% Q9% % "#$+4% 6$/*+$% bLFML<F49994<+,M% RbLW2+2I,42F<F499942+<+XMfbIL,M%'.%0$%'#$%3.+6*'*.+(1%6*&')*08'*.+%./%LF49994+M%,*7$+%I%%,9S#$+% b% *&% (0&.18'$12% 3.+'*+8.8&% 5*'#% )$&-$3'% '.% >$0$&,8$% ;$(&8)$% .+% +4% '#$% 3.+6*'*.+(16*&')*08'*.+%3(+%0$%5)*''$+%*+%'$);&%./%'#$%?.*+'%6$+&*'24

%%%bLFML<F49994<+,M%R% 97I/

BF

7F B0

70'L7I47F4999470M37I37F999370

7I/

7F

70'L7I47F4999470M37I37F999370

"(B*+,%'#$%1*;*'%(&%,%&#)*+B&%'.%(%-.*+'%I%R%<I4%.+$%.0'(*+&%'#$%3.+6*'*.+(1%6*&')*08'*.+%./%LF49994+M,*7$+%I%R%<I4

%%%bLFML<F49994<+IR<IM%R% %4%%%BF

7F B0

70'LBI47F4999470M37I37F999370

'ILBIM

-).7*6$6%/IL<IM%p%Q9%%b*+(1124%(&&.3*('$6%5*'#%'#*&%3.+6*'*.+(1%6*&')*08'*.+%*&%'#$%3.+6*'*.+(1%6$+&*'2/LFML<F49994<+IR<IM% R% /L<I4<F49994<+Mf/IL<IM9% % l.)$% ,$+$)(1124% .+$% 3.816% 3.+&*6$)% '#$% ;(),*+(16*&')*08'*.+&%./%(+2%&80&$'4%&(2%I4999B4%./% '#$%7$3'.)%4%5*'#%B]I4999+%*+'$,)('$6%.8'C%(+6% '#$

3.+6*'*.+(1% 6*&')*08'*.+&% ./% .+$% .)% ;.)$% ./% '#$% 7()*(01$&% B]I4999+% ,*7$+% .+$% .)% ;.)$% ./% '#$3.+6*'*.+&%I%R%<I49994B%R%<B9

E9h9z9%%8&'%(&%$<-$3'('*.+&%()$%6$/*+$6%/.)%(%&*+,1$%)(+6.;%7()*(01$4%*'%*&%-.&&*01$%'.%6$/*+$$<-$3'('*.+&%/.)%(%7$3'.)%./%)(+6.;%7()*(01$&9%%b.)%$<(;-1$4%;LI%G%;IMLFG;FM%*&%3(11$6%'#$.&/$*1$0.%!./%I%(+6%F4%(+6%;$'4%5#$)$%'%R%L'I49994'+M%*&%(%7$3'.)%./%3.+&'(+'&4%*&%(%L;81'*7()*('$M;.;$+'% ,$+$)('*+,% /8+3'*.+% /.)% '#$% )(+6.;% 7$3'.)% 9% % V$)$% ()$% &.;$% 8&$/81% -).-$)'*$&% ./$<-$3'('*.+&%./%7$3'.)&K%

L(M%:/%,LM%*&%(%/8+3'*.+%./%(%)(+6.;%7$3'.)4%'#$+%;,LM%*&%'#$%*+'$,)(1%./%,%5*'#%)$&-$3'%'.%'#$6*&')*08'*.+%./%9%%S#$+%,%6$-$+6&%.+%(%&807$3'.)%./%4%'#$+%;,LM%*&%'#$%*+'$,)(1%./%,L2M%5*'#)$&-$3'%'.%'#$%;(),*+(1%6*&')*08'*.+%./%'#*&%&807$3'.)9L0M%:/%%(+6%|%()$%)(+6.;%7$3'.)&%./%1$+,'#%+4%(+6%$!(+6%6!()$%&3(1()&4%'#$+%;L$%]%6|M%R%$;]%6;|9L3M%T@(83#2GD3#5()'`%*+$A8(1*'2U%:/%%(+6%|%()$%)(+6.;%7$3'.)&%./%1$+,'#%+4%'#$+%L;|MF%L;ML;||M9L6M% Tl*+B.5&B*% :+$A8(1*'2U% :/%% *&%(% )(+6.;%7$3'.)%./% 1$+,'#%+%(+6% )%%I% *&%(% &3(1()4% '#$+

L; *)MIf)%% L;*)MIf)901I

01I

L$M%T>.$7$%:+$A8(1*'2U%:/%%*&%(%)(+6.;%7$3'.)%./%1$+,'#%+%(+6%)%p%Q4%'#$+%; *)%01I

;(<LI4+)GIM% ;*)901I

L/M%T$+&$+%:+$A8(1*'2U%:/%%*&%(%)(+6.;%7$3'.)%(+6%,L<M%*&%(%3.+7$<%/8+3'*.+4%'#$+%;%,LM%,L;M9%%:/%,L<M%*&%(%3.+3(7$%/8+3'*.+4%'#$%*+$A8(1*'2%*&%)$7$)&$69

S#$+%$<-$3'('*.+&%$<*&'4%'#$2%3(+%0$%8&$6%'.%0.8+6%'#$%-).0(0*1*'2%'#('%(%)(+6.;%7()*(01$%'(B$&%.+$<')$;$%7(18$&9%

Theorem 3.7. Suppose X is a n×1 random vector and ε is a positive scalar.
a. [Markov bound] If max_i E|Xi| < +∞, then max_i Pr(|Xi| > ε) ≤ max_i E|Xi|/ε.
b. [Chebyshev bound] If EX'X < +∞, then Pr(X'X > ε²) ≤ EX'X/ε².
c. [Chernoff bound] If Ee^{t'X} exists for all vectors t in some neighborhood of zero, then for some positive scalars λ and M, Pr(X'X > ε²) ≤ Me^{-λε}.

Proof: All these inequalities are established by the same technique: If r(y) is a positive non-decreasing function of y > 0, and Er(|X|) < +∞, then

Pr(|X| > ε) = ∫_{|x|>ε} F(dx) ≤ ∫_{|x|>ε} [r(|x|)/r(ε)] F(dx) ≤ Er(|X|)/r(ε).

Taking r(y) = y² gives the result directly for the Chebyshev bound. In the remaining cases, first get a component-by-component inequality. For the Markov bound, Pr(|Xi| > ε) ≤ E|Xi|/ε for each i gives the result. For the Chernoff bound,

Pr(X'X > ε²) ≤ Σ_{i=1}^{n} [Pr(Xi > εn^{-1/2}) + Pr(Xi < -εn^{-1/2})],

since if the event on the left occurs, one of the events on the right must occur. Then apply the inequality Pr(Xi > ε) ≤ Er(Xi)/r(ε) with r(y) of exponential form to each term in the right-hand-side sum. The inequality for vectors is built up from a corresponding inequality for each component.
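To make the content of these bounds concrete, the following is a minimal Monte Carlo sketch, not part of the text. It assumes NumPy and takes X to be a unit exponential (so E|X| = 1 and EX² = 2), choices that are purely illustrative; it compares the estimated tail probability Pr(|X| > ε) with the Markov and Chebyshev bounds.

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200_000)   # unit-exponential sample: E|X| = 1, EX^2 = 2

for eps in (1.0, 2.0, 4.0):
    tail = np.mean(np.abs(x) > eps)            # estimated Pr(|X| > eps)
    markov = np.mean(np.abs(x)) / eps          # Markov bound  E|X|/eps
    cheby = np.mean(x ** 2) / eps ** 2         # Chebyshev bound  EX^2/eps^2
    print(f"eps={eps}: Pr={tail:.4f}  Markov<={markov:.4f}  Chebyshev<={cheby:.4f}")

The bounds are loose but always hold; the Chernoff bound behaves similarly once suitable constants λ and M are chosen.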

3.5.10. When the expectation of a random variable is taken with respect to a conditional distribution, it is called a conditional expectation. If F(x|A) is the conditional distribution of a random vector X given the event A, then the conditional expectation of a function g(X) given A is defined as

E_A g(X) = ∫ g(y) F(dy|A).

Another notation for this expectation is E(g(X)|A). When the distribution of the random variable X is absolutely continuous with respect to Lebesgue measure, so that it has a density f(x), the conditional density can be written as f(x|A) = f(x)/∫_A f(s)ds for x ∈ A (and zero otherwise), and the conditional expectation can then be written

E_A g(X) = ∫ g(x) f(x|A) dx = ∫_A g(x) f(x) dx / ∫_A f(x) dx.

When the distribution of X is discrete, this formula becomes

E_A g(X) = Σ_{x∈A} g(x) f(x) / Σ_{x∈A} f(x).

The conditional expectation is actually a function on the σ-field of conditioning events, and is sometimes written E_A g(X) or E(g(X)|A) to emphasize this dependence.

Suppose A1,...,Ak partition the domain of X. Then the distribution satisfies

F(x) = Σ_{i=1}^{k} F(x|Ai)·P(Ai),

implying

Eg(X) = ∫ g(x) F(dx) = Σ_{i=1}^{k} ∫ g(x) F(dx|Ai)·P(Ai) = Σ_{i=1}^{k} E[g(X)|Ai]·P(Ai).

This is called the law of iterated expectations, and is heavily used in econometrics.

Example 2. (continued) Recall that X is the number of heads that appear before a tail in a sequence of coin tosses, and that the probability of X = k is 2^{-k-1} for k = 0,1,... . Let A be the event of an even number of heads. Then,

E_A X = Σ_{k=0,2,4,...} k·2^{-k-1} / Σ_{k=0,2,4,...} 2^{-k-1} = Σ_{j=0,1,2,...} 2j·2^{-2j-1} / Σ_{j=0,1,2,...} 2^{-2j-1} = 2/3,

where the second ratio is obtained by substituting k = 2j, and the value is obtained using the summation formulas for a geometric series from 2.1.10. A similar calculation for the event Aᶜ of an odd number of heads yields E_{Aᶜ} X = 5/3. The probability of an even number of heads is Σ_{k=0,2,4,...} 2^{-k-1} = 2/3. The law of iterated expectations then gives

EX = E[X|A]·P(A) + E[X|Aᶜ]·P(Aᶜ) = (2/3)(2/3) + (5/3)(1/3) = 1,

which confirms the direct calculation of EX.
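The calculation above can be checked numerically. The short sketch below is not part of the text; it assumes NumPy and truncates the infinite sums at k = 199, where the omitted tail is negligible.

import numpy as np

k = np.arange(0, 200)
p = 0.5 ** (k + 1)                               # P(X = k) = 2^(-k-1)
even = (k % 2 == 0)

E_even = (k * p)[even].sum() / p[even].sum()     # E[X | even number of heads]  -> 2/3
E_odd = (k * p)[~even].sum() / p[~even].sum()    # E[X | odd number of heads]   -> 5/3
P_even = p[even].sum()                           # P(even)                      -> 2/3
print(E_even, E_odd, P_even)
print(E_even * P_even + E_odd * (1.0 - P_even))  # law of iterated expectations -> 1 = EX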

"#$%3.+3$-'%./%(%3.+6*'*.+(1%$<-$3'('*.+%*&%7$)2%*;-.)'(+'%*+%$3.+.;$')*3&%(+6%*+%$3.+.;*3'#$.)24%&.%5$%5*11%5.)B%.8'%*'&%-).-$)'*$&%*+%&.;$%6$'(*1%/.)%'#$%3(&$%./%'5.%7()*(01$&9%%D8--.&$)(+6.;%7()*(01$&%Li4M%#(7$%(%?.*+'%6$+&*'2%/L84<M9%%"#$%;(),*+(1%6$+&*'2%./%%*&%6$/*+$6%02%

%%,L<M%R% /L84<M684%

)

(+6%'#$%3.+6*'*.+(1%6$+&*'2%./%i%,*7$+%%R%<%*&%6$/*+$6%02%/L8<M%R%/L84<Mf,L<M4%-).7*6$6%,L<M%p%Q9"#$%3.+6*'*.+(1%$<-$3'('*.+%./%(%/8+3'*.+%#Li4M%&('*&/*$&%;L#Li4MR<M%R%#L84<M/L8<M684%(+6*&%(%/8+3'*.+%./%<9%%"#$%8+3.+6*'*.+(1%$<-$3'('*.+%./%#Li4M%&('*&/*$&

;#Li4M%R% #L84<M/L84<M686<%R %R%;;i#Li4MC%

B

)9L)4BM'L)BM3) 5LBM3B

(+.'#$)%$<(;-1$%./% '#$%1(5%./% *'$)('$6%$<-$3'('*.+&9% %"#$%.&031#1&0$-!+%$0!./%i%,*7$+%R<%*&!iL<M%%;iR<iC%02%'#$%1(5%./%*'$)('$6%$<-$3'('*.+&4%'#$%3.+6*'*.+(1%(+6%8+3.+6*'*.+(1%;$(+%()$

Page 70: McFadden-Statistical Tools for Economists

!"#$%%&'()!"#"$%"$&#'()**'%+(!)*+++))))))))))))))))))))))))))))))))))))))))))))))))))),-$./&0)12*L()4$5&)MO!777777777777777777777777777777777777777777777777777777777777777777777777777!

)$1('$6%02%%;ii%R%;;ii%%;!iLM9%%"#$%.&031#1&0$-!/$*1$0.%!./%i%*&%6$/*+$6%02)>Li<M%R;iLi%G%!iL<MMF9%%:'%*&%)$1('$6%'.%'#$%8+3.+6*'*.+(1%7()*(+3$%02%'#$%/.);81(

;iLi%G%;iiMF%R%;;iLi%G%!iLM%]%!iLM%G%;iiMF%%%%%%R%;;iLi%G%!iLMMF%]%;;iL!iLM%G%;iiMF%]%F;;iLi%G%!iLMML!iLM%G%;iiM%%%%R%;>LiM%]%;L!iLM%G%;iiMF%]%F;L!iLM%G%;iiM;iLi%G%!iLMM%%%%R%;>LiM%]%;L!iLM%G%;iiMF%

"#$+4%'#$%8+3.+6*'*.+(1%7()*(+3$%$A8(1&%'#$%$<-$3'('*.+%./%'#$%3.+6*'*.+(1%7()*(+3$%-18&%'#$%7()*(+3$./%'#$%3.+6*'*.+(1%$<-$3'('*.+9%%

Example 10. Suppose (U,X) are bivariate normal with means EU = μ_u and EX = μ_x, and second moments E(U-μ_u)² = σ_u², E(X-μ_x)² = σ_x², and E(U-μ_u)(X-μ_x) = σ_ux ≡ ρσ_uσ_x. Define

Q = (1/(1-ρ²))·[((u-μ_u)/σ_u)² - 2ρ((u-μ_u)/σ_u)((x-μ_x)/σ_x) + ((x-μ_x)/σ_x)²],

and observe that

Q = [(u-μ_u)/σ_u - ρ(x-μ_x)/σ_x]²/(1-ρ²) + ((x-μ_x)/σ_x)².

The bivariate normal density is f(u,x) = [2πσ_uσ_x(1-ρ²)^{1/2}]^{-1}·exp(-Q/2). The marginal density of X is normal with mean μ_x and variance σ_x²: n(x-μ_x,σ_x²) = (2πσ_x²)^{-1/2}·exp(-(x-μ_x)²/2σ_x²). This can be derived from the bivariate density by completing the square for u in Q and integrating over u. The conditional density of U given X then satisfies

f(u|x) = [2πσ_uσ_x(1-ρ²)^{1/2}]^{-1}·exp(-Q/2) / [(2πσ_x²)^{-1/2}·exp(-(x-μ_x)²/2σ_x²)]
  = [2πσ_u²(1-ρ²)]^{-1/2}·exp(-[(u-μ_u)/σ_u - ρ(x-μ_x)/σ_x]²/2(1-ρ²)).

Hence the conditional distribution of U, given X = x, is normal with conditional mean E(U|X=x) = μ_u + ρσ_u(x-μ_x)/σ_x ≡ μ_u + σ_ux(x-μ_x)/σ_x² and variance V(U|X=x) ≡ E((U - E(U|X=x))²|X=x) = σ_u²(1-ρ²) ≡ σ_u² - σ_ux²/σ_x². When U and X are joint normal random vectors with EU = μ_u, EX = μ_x, E(U-μ_u)(U-μ_u)' = Σ_uu, E(X-μ_x)(X-μ_x)' = Σ_xx, and E(U-μ_u)(X-μ_x)' = Σ_ux, then (U|X=x) is normal with E(U|X=x) = μ_u + Σ_ux Σ_xx^{-1}(x-μ_x) and V(U|X=x) = Σ_uu - Σ_ux Σ_xx^{-1} Σ_xu.
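The conditional-moment formulas of Example 10 can be checked by simulation. The following sketch is not from the text; it assumes NumPy, and the particular parameter values (means, standard deviations, correlation, and the conditioning point x0) are arbitrary choices for illustration. Conditioning on X = x0 is approximated by keeping draws with X in a narrow window around x0.

import numpy as np

rng = np.random.default_rng(2)
mu_u, mu_x, s_u, s_x, rho = 1.0, -0.5, 2.0, 1.5, 0.6
cov = [[s_u**2, rho*s_u*s_x], [rho*s_u*s_x, s_x**2]]
u, x = rng.multivariate_normal([mu_u, mu_x], cov, size=1_000_000).T

x0 = 0.5
sel = np.abs(x - x0) < 0.02                                  # draws with X near x0
print(u[sel].mean(), mu_u + rho * s_u * (x0 - mu_x) / s_x)   # conditional mean (population value 1.8)
print(u[sel].var(), s_u**2 * (1.0 - rho**2))                 # conditional variance (population value 2.56)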

3.5.11. Conditional densities satisfy f(u,x) = f(u|x)g(x) = f(x|u)h(u), where h(u) is the marginal density of U, and hence f(u|x) = f(x|u)h(u)/g(x). This is called Bayes Law. When U and X are independent, f(u,x) = h(u)g(x), or f(u|x) = h(u) and f(x|u) = g(x). For U and X independent, and r(·) and s(·) any functions, one has E(r(U)|X=x) = ∫ r(u)f(u|x)du = ∫ r(u)h(u)du = Er(U), and E(r(U)s(X)) = ∫∫ r(u)s(x)f(u,x)dudx = ∫ s(x)g(x)[∫ r(u)f(u|x)du]dx = ∫ s(x)g(x)E(r(U)|X=x)dx = [Es(X)]·[Er(U)], or cov(r(U),s(X)) = 0, provided Er(U) and Es(X) exist. If r(u) = u - EU, then E(r(U)|X=x) = 0 and cov(U,X) = E(U-EU)X = 0. Conversely, suppose U and X are jointly distributed. If cov(r(U),s(X)) = 0 for all functions r(·), s(·) such that Er(U) and Es(X) exist, then U and X are independent. To see this, choose r(u) = 1 for u ≤ u°, r(u) = 0 otherwise; choose s(x) = 1 for x ≤ x°, s(x) = 0 otherwise. Then Er(U) = H(u°) and Es(X) = G(x°), where H and G are the marginal cumulative distribution functions, and 0 = cov = F(u°,x°) - H(u°)G(x°), where F is the joint cumulative distribution function. Hence F(u°,x°) = H(u°)G(x°), and U, X are independent. Note that cov(U,X) = 0 is not sufficient to imply U, X independent. For example, g(x) = 1/2 for -1 ≤ x ≤ 1 and f(u|x) = 1/2 for -1 ≤ u - x² ≤ 1 gives a nonindependent pair with E(U|X=x) = x², but cov(U,X) = EX³ = 0. Furthermore, E(U|X=x) ≡ 0 is not sufficient to imply U, X independent. For example, g(x) = 1/2 for -1 ≤ x ≤ 1 and f(u|x) = 1/2(1+x²) for -(1+x²) ≤ u ≤ (1+x²) gives a nonindependent pair, since E(U²|X=x) = (1+x²)²/3 varies with x, yet E(U|X=x) ≡ 0.

Example 11. Suppose monthly family income (in thousands of dollars) is a random variable Y with a CDF F(y) = 1 - y^{-2} for y > 1. Suppose a random variable Z is one for home owners and zero otherwise, and that the conditional probability of the event Z = 1, given Y, is (Y-1)/Y. The unconditional expectation of Y is 2. The joint density of Y and Z is f(y)g(z|y) = (2y^{-3})(1 - y^{-1}) for z = 1. The unconditional probability of Z = 1 is then

∫_{1}^{∞} f(y)g(z|y) dy = 1/3.

Bayes Law gives the conditional density of Y given z = 1,

f(y|z) = f(y)g(z|y) / ∫_{1}^{∞} f(y)g(z|y) dy = (6y^{-3})(1 - y^{-1}),

so that the conditional expectation of Y given z = 1 is E(Y|Z=1) = ∫_{1}^{∞} y f(y|z) dy = 3.

Example 12. The problem of interpreting the results of medical tests illustrates Bayes Law. A blood test for prostate cancer is known to yield a "positive" with probability 0.9 if cancer is present, and a false "positive" with probability 0.2 if cancer is not present. The prevalence of the cancer in the population of males is 0.05. Then, the conditional probability of cancer, given a "positive" test result, equals the joint probability of cancer and a positive test result, (0.05)(0.9), divided by the probability of a positive test result, (0.05)(0.9)+(0.95)(0.2) = 0.235, or approximately 0.19. Thus, a "positive" test has a low probability of identifying a case of cancer, and if all "positive" tests were followed by surgery, roughly four out of five of these surgeries would prove unnecessary.
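The arithmetic of Example 12, spelled out in a few lines of plain Python (the numbers are those given in the example; the variable names are mine):

prev = 0.05          # prevalence of cancer
p_pos_cancer = 0.9   # P(positive | cancer)
p_pos_clear = 0.2    # P(positive | no cancer)

p_pos = prev * p_pos_cancer + (1.0 - prev) * p_pos_clear   # P(positive) = 0.235
p_cancer_pos = prev * p_pos_cancer / p_pos                 # P(cancer | positive), about 0.19
print(p_pos, p_cancer_pos, 1.0 - p_cancer_pos)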

3.5.12. The discussion of expectations will be concluded with a list of detailed properties of characteristic functions and moment generating functions:

a. ψ(t) = Ee^{ιtY} ≡ E cos(tY) + ιE sin(tY).
b. Z = a + bY has the cf e^{ιta}ψ(bt), and Z = f(Y) has the cf Ee^{ιtf(Y)}.
c. If EY^k exists, then ψ^(k)(t) ≡ d^kψ(t)/dt^k exists, satisfies the bound |d^kψ(t)/dt^k| ≤ E|Y|^k, and is uniformly continuous, and EY^k = ι^{-k}ψ^(k)(0). If ψ^(k)(t) exists, then EY^k exists.
d. If Y has finite moments through order k, then ψ(t) has a Taylor's expansion

ψ(t) = Σ_{j=0}^{k} (ι^j EY^j) t^j/j! + [ψ^(k)(λt) - ψ^(k)(0)] t^k/k!,

where λ is a scalar with 0 < λ < 1; the Taylor's expansion satisfies the bounds

|ψ(t) - Σ_{j=0}^{k-1} (ι^j EY^j) t^j/j!| ≤ |t|^k E|Y|^k/k!

and

|ψ(t) - Σ_{j=0}^{k} (ι^j EY^j) t^j/j!| ≤ 2|t|^k E|Y|^k/k!.

If EY^k exists, then the expression Λ(t) = log ψ(t), called the second characteristic function or cumulant generating function, has a Taylor's expansion

Λ(t) = Σ_{j=1}^{k} λ_j (ιt)^j/j! + [Λ^(k)(θt) - Λ^(k)(0)] t^k/k!,

where Λ^(k) ≡ d^kΛ/dt^k, and θ is a scalar with 0 < θ < 1. The expressions λ_j are called the cumulants of the distribution, and satisfy λ1 = EY and λ2 = Var(Y). The expression λ3/λ2^{3/2} is called the skewness, and the expression λ4/λ2² - 3 is called the kurtosis (i.e., thickness of tails relative to the center), of the distribution.
e. If Y is normally distributed with mean μ and variance σ², then its characteristic function is exp(ιμt - σ²t²/2). The normal has cumulants λ1 = μ, λ2 = σ², λ3 = λ4 = 0.
f. Random variables X and Y have identical distribution functions if and only if they have identical characteristic functions.
g. If Yn →_d Y (see Chap. 4.1), then the associated characteristic functions satisfy ψn(t) → ψ(t) for each t. Conversely, if Yn has characteristic function ψn(t) converging pointwise to a function ψ(t) that is continuous at t = 0, then there exists Y such that ψ(t) is the characteristic function of Y and Yn →_d Y.
h. The characteristic function of a sum of independent random variables equals the product of the characteristic functions of these random variables, and the second characteristic function of a sum of independent random variables is the sum of the second characteristic functions of these variables; the characteristic function of a mean of n independently identically distributed random variables, with characteristic function ψ(t), is ψ(t/n)ⁿ.

Similar properties hold for proper moment generating functions, with obvious modifications: Suppose a random variable Y has a proper mgf m(t), finite for |t| < τ, where τ is a positive constant. Then, the following properties hold:

a. m(t) = Ee^{tY} for |t| < τ.
b. Z = a + bY has the mgf e^{ta}m(bt).
c. EY^k exists for all k > 0, and m^(k)(t) ≡ d^k m(t)/dt^k exists and is uniformly continuous for |t| < τ, with EY^k = m^(k)(0).
d. m(t) has a Taylor's expansion (for any k) m(t) = Σ_{j=0}^{k} (EY^j) t^j/j! + [m^(k)(λt) - m^(k)(0)] t^k/k!, where λ is a scalar with 0 < λ < 1.
e. If Y is normally distributed with mean μ and variance σ², then it has mgf exp(μt + σ²t²/2).
f. Random variables X and Y with proper mgf have identical distribution functions if and only if their mgf are identical.
g. If Yn →_d Y and the associated mgf are finite for |t| < τ, then the mgf of Yn converges pointwise to the mgf of Y. Conversely, if the Yn have proper mgf which converge pointwise to a function m(t) that is finite for |t| < τ, then there exists Y such that m(t) is the mgf of Y and Yn →_d Y.
h. The mgf of a sum of independent random variables equals the product of the mgf of these random variables; the mgf of the mean of n independently identically distributed random variables, each with proper mgf m(t), is m(t/n)ⁿ.

The definitions of characteristic and moment generating functions can be extended to vectors of random variables. Suppose Y is a n×1 random vector, and let t be a n×1 vector of constants. Then ψ(t) = Ee^{ιt'Y} is the characteristic function and m(t) = Ee^{t'Y} is the moment generating function. The properties of cf and mgf listed above also hold in their multivariate versions, with obvious modifications. For characteristic functions, two of the important properties translate to

(b') Z = a + BY, where a is a m×1 vector and B is a m×n matrix, has cf e^{ιt'a}ψ(B't).
(e') If Y is multivariate normal with mean μ and covariance matrix Ω, then its characteristic function is exp(ιμ't - t'Ωt/2).

A useful implication of (b') and (e') is that a linear transformation of a multivariate normal vector is again multivariate normal. Conditions (c) and (d) relating Taylor's expansions and moments for univariate cf have multivariate versions where the expansions are in terms of partial derivatives of various orders. Conditions (f) through (h) are unchanged in the multivariate version.

The properties of characteristic functions and moment generating functions are discussed and established in C. R. Rao, Linear Statistical Inference, 2b.4, and W. Feller, An Introduction to Probability Theory, II, Chap. 13 and 15.

Page 74: McFadden-Statistical Tools for Economists

!"#$%%&'()!"#"$%"$&#'()**'%+(!)*+++))))))))))))))))))))))))))))))))))))))))))))))))))),-$./&0)12*Q()4$5&)O3!777777777777777777777777777777777777777777777777777777777777777777777777777!

1=M=)):<9FE#A<!9:?AFE)A#)<9FGA!)>9<?9BC;E

%%g9I9%%D8--.&$%%*&%(%;$(&8)(01$%)(+6.;%7()*(01$%.+%L4"M%5*'#%(%6*&')*08'*.+%bL<M%'#('%*&%(0&.18'$123.+'*+8.8&%5*'#%)$&-$3'%'.%>$0$&,8$%;$(&8)$4%&.%'#('%%#(&%(%6$+&*'2%/L<M9%%@.+&*6$)%(+%*+3)$(&*+,')(+&/.);('*.+%d%R%VLMC%'#$+%d%*&%(+.'#$)%)(+6.;%7()*(01$9%%>$'%#%6$+.'$%'#$%*+7$)&$%/8+3'*.+%./VC%*9$94%2%R%VL<M%*;-1*$&%<%R%#L2M9%%"#$%6*&')*08'*.+%/8+3'*.+%./%d%*&%,*7$+%02

%%eL2M%R%\)Ld%%2M%R%\)LVLM%%2M%R%\)L%%#L2MM%R%bL#L2MM9

S#$+%#L2M% *&%6*//$)$+'*(01$4%5*'#%(%6$)*7('*7$%#L2M%R%6#L2Mf624% '#$%6$+&*'2%./%d% *&%.0'(*+$6%026*//$)$+'*('*+,4%(+6%&('*&/*$&%%,L2M%R%/L#L2MM#L2M9%%D*+3$%2%%VL#L2MM4%.+$%.0'(*+&%02%6*//$)$+'*('*.+'#$% /.);81(% I% % VL#L2MM#L2M4% .)% #L2M% R% IfVL#L2MM9% % D80&'*'8'*+,% '#*&% /.);81(% ,*7$&% ,L2M% R/L#L2MMfVL#L2MM9

;I$J.K&)31=%%D8--.&$%%#(&%'#$%6*&')*08'*.+%/8+3'*.+%bL<M%R%IG$G<%/.)%<%p%Q4%5*'#%bL<M%R%Q%/.)<%%QC%'#$+%%*&%&(*6%'.%#(7$%(+%$<-.+$+'*(1%6*&')*08'*.+9%%D8--.&$%d%R%VLM%%1.,%4%&.%'#('%%R#LdM%%$d9%%"#$+4%eL2M%R%IG$<-LG$2M%(+6%eL2M%R%$<-LG$2M$2%R%$<-L2G$2M%/.)%G%m%2%m%]9%%"#*&%*&%3(11$6(+%$<')$;$%7(18$%6*&')*08'*.+9%%J%'#*)6%$<(;-1$%*&%%5*'#%&.;$%6*&')*08'*.+%/8+3'*.+%b%(+6%6$+&*'2/4%(+6%d%R%bLM4%&.%'#('%/.)%(+2%7(18$%./%4%'#$%3.))$&-.+6*+,%7(18$%./%d%*&%'#$%-).-.)'*.+%./%(11%'#('%()$%0$1.5%'#*&%7(18$9%%>$'%<-%6$+.'$%'#$%&.18'*.+%'.%bL<M%R%-9%%"#$%6*&')*08'*.+%/8+3'*.+%./%d%*&eL2M%R%bL<2M%R%29%%V$+3$4%d%#(&%'#$%8+*/.);%6$+&*'2%.+%'#$%8+*'%*+'$)7(19

"#$%)81$%/.)%(+%*+3)$(&*+,%')(+&/.);('*.+%./%(%)(+6.;%7()*(01$%%3(+%0$%$<'$+6$6%*+%&$7$)(15(2&9%%:/%'#$%')(+&/.);('*.+%d%R%VLM%*&%6$3)$(&*+,%)('#$)%'#(+%*+3)$(&*+,4%'#$+

%%eL2M%R%\)Ld%%2M%R%\)LVLM%%2M%R%\)L%%#L2MM%R%IGbL#L2MM4

5#$)$%#%*&%'#$%*+7$)&$%/8+3'*.+%./%V9%%r*//$)$+'*('*+,4

%%,L2M%R%/L#L2MMLG#L2MM9

"#$+4%3.;0*+*+,%3(&$&4%.+$%#(&%'#$%)$&81'%'#('%'&*!$07!&0%2#&2&0%!#*$0"'&*+$#1&0!d%R%VLM%81#910/%*"%!%R%#LdMD!#9%!3%0"1#7!&'!d%1"

%%,L2M%R%/L#L2MM#L2M%%/L#L2MMfVL#L2M9

J+%$<(;-1$%./%(%6$3)$(&*+,%')(+&/.);('*.+%*&%%5*'#%'#$%$<-.+$+'*(1%6$+&*'2%$G<%/.)%<%p%Q4%(+6%d%RIf9%%D#.5%(&%(+%$<$)3*&$%'#('%eL2M%R%$GIf2%(+6%,L2M%R%$GIf2f2F9

Consider a transformation Y = H(X) that is not one-to-one. The interval (-∞,y] is the image of a set A_y of x values that may have a complicated structure. One can write

G(y) = Pr(Y ≤ y) = Pr(H(X) ≤ y) = Pr(X ∈ A_y) = F(A_y).

If this expression is differentiable, then its derivative gives the density.

Example 14. If X has a distribution F and density f, and Y = |X|, then A_y = [-y,y], implying G(y) = F(y) - F(-y) and g(y) = f(y) + f(-y).

Example 15. If Y = X², then A_y = [-y^{1/2}, y^{1/2}], and G(y) = F(y^{1/2}) - F(-y^{1/2}). Differentiating for y > 0, g(y) = (f(y^{1/2}) + f(-y^{1/2}))/2y^{1/2}. Applying this to the standard normal with F(x) = Φ(x), the density of Y is g(y) = φ(y^{1/2})/y^{1/2} = (2πy)^{-1/2} e^{-y/2}, called the chi-square with one degree of freedom.
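Example 15 can be checked by simulation. The sketch below is not part of the text; it assumes NumPy and the standard math module, and uses the fact that the chi-square(1) CDF is G(y) = 2Φ(√y) - 1 = erf(√(y/2)).

import math
import numpy as np

rng = np.random.default_rng(3)
y = rng.standard_normal(500_000) ** 2            # squared standard normals

for y0 in (0.5, 1.0, 2.0, 4.0):
    empirical = np.mean(y <= y0)                 # empirical CDF of Y = X^2 at y0
    exact = math.erf(math.sqrt(y0 / 2.0))        # chi-square(1) CDF at y0
    print(f"y0={y0}: empirical={empirical:.4f}  chi2(1)={exact:.4f}")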

3.6.2. Next consider transformations of random vectors. These transformations will permit us to analyze sums or other functions of random variables. Suppose X is a n×1 random vector. Consider first the transformation Y = AX, where A is a nonsingular n×n matrix. The following result from multivariate calculus relates the densities of X and Y:

Theorem 3.8. If X has density f(x), and Y = AX, with A nonsingular, then the density of Y is

g(y) = f(A^{-1}y)/|det(A)|.

Proof: We will prove the result in two dimensions, leaving the general case to the reader. First, consider the diagonal case

(Y1, Y2)' = [a11 0; 0 a22]·(X1, X2)',  with a11 > 0 and a22 > 0.

One has G(y1,y2) ≡ F(y1/a11, y2/a22). Differentiating with respect to y1 and y2, g(y1,y2) ≡ f(y1/a11, y2/a22)/a11a22. This establishes the result for diagonal transformations. Second, consider the triangular case

(Y1, Y2)' = [a11 0; a21 a22]·(X1, X2)',  with a11 > 0 and a22 > 0.

Then

G(y1,y2) ≡ ∫_{-∞}^{y1/a11} ∫_{-∞}^{(y2 - a21·x1)/a22} f(x1,x2) dx2 dx1.

Differentiating with respect to y1 and y2 yields

∂²G(y1,y2)/∂y1∂y2 ≡ g(y1,y2) = (a11a22)^{-1}·f(y1/a11, (y2 - y1a21/a11)/a22).

This establishes the result for triangular transformations. Finally, consider the general transformation

(Y1, Y2)' = [a11 a12; a21 a22]·(X1, X2)',  with a11 > 0 and a11a22 - a12a21 > 0.

Apply the result for triangular transformations first to (Z1, Z2)' = [1 a12/a11; 0 1]·(X1, X2)', and second to (Y1, Y2)' = [a11 0; a21 a22 - a12a21/a11]·(Z1, Z2)'. This gives the general transformation, since

[a11 a12; a21 a22] = [a11 0; a21 a22 - a12a21/a11]·[1 a12/a11; 0 1].

The density of Z is h(z1,z2) = f(z1 - z2a12/a11, z2), and the density of Y is g(y1,y2) = h(y1/a11, (y2 - y1a21/a11)/(a22 - a12a21/a11)), divided by a11(a22 - a12a21/a11). Substituting for h in the last expression and simplifying gives

g(y1,y2) = f((a22y1 - a12y2)/D, (a11y2 - a21y1)/D)/D,

where D = a11a22 - a12a21 is the determinant of the transformation.

We leave as an exercise the proof of the theorem for the density of Y = AX in the general case, with A n×n and nonsingular. First, recall that A can be factored so that A = PLDU, where P is a permutation matrix, L is lower triangular and U is upper triangular, each with ones down the diagonal, and D is a nonsingular diagonal matrix. Write Y = PLDUX. Then consider the series of intermediate transformations obtained by applying each matrix in turn, constructing the densities as was done previously.
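Theorem 3.8 can be checked exactly in a case where both densities are known in closed form: if X ~ N(0, I2) and Y = AX, then Y ~ N(0, AA'), so f(A^{-1}y)/|det A| must coincide with the N(0, AA') density at every y. The sketch below is not from the text; it assumes NumPy and an arbitrarily chosen matrix A.

import numpy as np

A = np.array([[2.0, 1.0], [0.5, 3.0]])
Sigma = A @ A.T                                  # covariance of Y = AX when X ~ N(0, I)

def f_std(x):                                    # N(0, I_2) density
    return np.exp(-0.5 * (x @ x)) / (2.0 * np.pi)

def f_sigma(y):                                  # N(0, Sigma) density
    q = y @ np.linalg.solve(Sigma, y)
    return np.exp(-0.5 * q) / (2.0 * np.pi * np.sqrt(np.linalg.det(Sigma)))

for y in (np.array([1.0, -0.5]), np.array([0.3, 2.0])):
    lhs = f_std(np.linalg.solve(A, y)) / abs(np.linalg.det(A))   # the Theorem 3.8 formula
    print(lhs, f_sigma(y))                                       # the two values agree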

3.6.3. The extension from linear transformations to one-to-one nonlinear transformations of vectors is straightforward. Consider Y = H(X), with an inverse transformation X = h(Y). At a point y° and x° = h(y°), a first-order Taylor's expansion gives

y - y° = J·(x - x°) + o(x - x°),

where J is the Jacobian matrix

J = [∂H1(x°)/∂x1 ... ∂H1(x°)/∂xn ; ... ; ∂Hn(x°)/∂x1 ... ∂Hn(x°)/∂xn],

and the notation o(z) means an expression that is small relative to z. Alternately, one has

K = J^{-1} = [∂h1(y°)/∂y1 ... ∂h1(y°)/∂yn ; ... ; ∂hn(y°)/∂y1 ... ∂hn(y°)/∂yn].

The probability of Y in the little rectangle [y°, y°+Δy] is approximately equal to the probability of X in the little rectangle [x°, x°+J^{-1}Δy]. This is the same situation as in the linear case, except there the equality was exact. Then, the formulas for the linear case carry over directly, with the Jacobian matrix of the transformation replacing the linear transformation matrix A. If f(x) is the density of X, then g(y) = f(h(y))·|det(K)| = f(h(y))/|det(J)| is the density of Y.

Example 16. Suppose a random vector (X,Z) has a density f(x,z) for x,z > 0, and consider the nonlinear transformation W = XZ and Y = X/Z, which has the inverse transformation X = (WY)^{1/2} and Z = (W/Y)^{1/2}. The Jacobian matrix of the inverse transformation is

K = [∂x/∂w ∂x/∂y ; ∂z/∂w ∂z/∂y] = [(1/2)(y/w)^{1/2} (1/2)(w/y)^{1/2} ; (1/2)(wy)^{-1/2} -(1/2)w^{1/2}y^{-3/2}],

and |det(K)| = 1/2y. Hence, the density of (W,Y) is f((wy)^{1/2}, (w/y)^{1/2})/2y.

In principle, it is possible to analyze n-dimensional nonlinear transformations that are not one-to-one in the same manner as the one-dimensional case, by working with the one-to-many inverse transformation. There are no general formulas, and each case needs to be treated separately.

Often in applications, one is interested in a transformation from a n×1 vector of random variables X to a lower dimension. For example, one may be interested in the scalar random variable S = X1 + ... + Xn. If one "fills out" the transformation in a one-to-one way, so that the random variables of interest are components of the complete transformation, then Theorem 3.8 can be applied. In the case of S, the transformation Y1 = S filled out by Yi = Xi for i = 2,...,n is one-to-one, with

(Y1, Y2, Y3, ..., Yn)' = [1 1 1 ... 1; 0 1 0 ... 0; 0 0 1 ... 0; ...; 0 0 0 ... 1]·(X1, X2, X3, ..., Xn)'.

Example 17. Consider a random vector (X,Z) with a density f(x,z), and the transformation S = X + Z and T = Z, or (S, T)' = [1 1; 0 1]·(X, Z)'. The Jacobian of this transformation is one, and its inverse is (X, Z)' = [1 -1; 0 1]·(S, T)', so the density of (S,T) is g(s,t) = f(s-t,t). The marginal density of S is then

g1(s) = ∫_{-∞}^{+∞} f(s-t,t) dt.

If X and Z are statistically independent, so that their density is f(x,z) = f1(x)f2(z), then this becomes

g1(s) = ∫_{-∞}^{+∞} f1(s-t)f2(t) dt.

This is termed a convolution formula.
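The convolution formula can be evaluated numerically. The sketch below is not from the text; it assumes NumPy, takes X and Z to be independent unit exponentials (an illustrative choice), and compares the discretized convolution with the exact density of the sum, g1(s) = s·e^{-s}.

import numpy as np

dt = 0.001
t = np.arange(0.0, 20.0, dt)
f1 = np.exp(-t)                                  # density of X (unit exponential)
f2 = np.exp(-t)                                  # density of Z (unit exponential)
g1 = np.convolve(f1, f2)[: t.size] * dt          # discretized  g1(s) = integral of f1(s-t) f2(t) dt

for s in (0.5, 1.0, 3.0):
    i = int(round(s / dt))
    print(g1[i], s * np.exp(-s))                 # numerical value vs. exact density s*exp(-s)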

3.7. SPECIAL DISTRIBUTIONS

3.7.1. A number of special probability distributions appear frequently in statistics and econometrics, because they are convenient for applications or illustrations, because they are useful for approximations, or because they crop up in limiting arguments. The tables at the end of this chapter list many of these distributions.

3.7.2. Table 3.1 lists discrete distributions. The binomial and geometric distributions are particularly simple, and are associated with statistical experiments such as coin tosses. The Poisson distribution is often used to model the occurrence of rare events. The hypergeometric distribution is associated with classical probability experiments of drawing red and white balls from urns, and is also used to approximate many other distributions.

3.7.3. Table 3.2 lists a number of continuous distributions, including some basic distributions such as the gamma and beta from which other distributions are constructed. The extreme value and logistic distributions are used in the economic theory of discrete choice, and are also of statistical interest because they have simple closed form CDF's.

3.7.4. The normal distribution and its related distributions play a central role in econometrics, both because they provide the foundation for finite-sample distribution results for regression models with normally distributed disturbances, and because they appear as limiting approximations in large samples even when the finite sample distributions are unknown or intractable. Table 3.3 lists the normal distribution, and a number of other distributions that are related to it. The t and F distributions appear in the theory of hypothesis testing, and the chi-square distribution appears in large-sample approximations. The non-central versions of these distributions appear in calculations of the power of hypothesis tests.

It is a standard exercise in mathematical statistics to establish the relationships between normal, chi-square, F, and t distributions. For completeness, we state the most important results:

Theorem 3.9. Normal and chi-square random variables have the following properties:
(i) If S = Y1² + ... + Yk², where the Yj are independent normal random variables with means μj and unit variances, then S has a non-central chi-square distribution with degrees of freedom parameter k and non-centrality parameter λ = μ1² + ... + μk², denoted χ²(k,λ). If λ = 0, this is a (central) chi-square distribution with degrees of freedom parameter k, denoted χ²(k).
(ii) If Y and S are independent, Y is normal with mean δ and unit variance, and S is chi-square with k degrees of freedom, then T = Y/(S/k)^{1/2} is non-central t-distributed with degrees of freedom parameter k and non-centrality parameter δ, denoted t(k,δ). If δ = 0, this is a (central) t-distribution with degrees of freedom parameter k, denoted t(k).
(iii) If R and S are independent, R is non-central chi-square with degrees of freedom parameter k and non-centrality parameter λ, and S is central chi-square with degrees of freedom parameter n, then F = nR/kS is non-central F-distributed with degrees of freedom parameters (k,n) and non-centrality parameter λ, denoted F(k,n,λ). If λ = 0, this distribution is F-distributed with degrees of freedom parameters (k,n), and is denoted F(k,n).
(iv) T is non-central t-distributed with degrees of freedom parameter k and non-centrality parameter δ if and only if F = T² is non-central F-distributed with degrees of freedom parameters (1,k) and non-centrality parameter λ = δ².

Proof: These results can be found in most classical texts in mathematical statistics; see particularly Rao (1973), pp. 166-167, 170-172, 181-182, Johnson & Kotz (1970), Chap. 26-31, and Graybill (1961), Chap. 4.

In applied statistics, it is important to be able to calculate values x = G^{-1}(p), where G is the CDF of the central chi-square, F, or t distribution, and values p = G(x), where G is the CDF of the non-central chi-square, F, or t distribution. Selected points of these distributions are tabled in many books of mathematical and statistical tables, but it is more convenient and accurate to calculate these values within a statistical or econometrics software package. Most current packages, including TSP, STATA, and SST, can provide these values.
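As an illustration of these calculations in a package other than those named above (an assumption of mine, not the text's choice), the quantiles and non-central CDF values can be obtained from SciPy:

from scipy import stats

print(stats.chi2.ppf(0.95, df=5))                # x with G(x) = 0.95 for chi-square(5)
print(stats.f.ppf(0.95, dfn=3, dfd=20))          # 95th percentile of F(3,20)
print(stats.t.ppf(0.975, df=12))                 # 97.5th percentile of t(12)
print(stats.ncx2.cdf(11.07, df=5, nc=2.0))       # noncentral chi-square CDF, used in power calculations
print(stats.ncf.cdf(3.10, dfn=3, dfd=20, nc=1.5))
print(stats.nct.cdf(2.18, df=12, nc=0.5))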

3.7.5. One of the most heavily used distributions in econometrics is the multivariate normal. We describe this distribution and summarize some of its properties. A n×1 random vector Y is multivariate normal with a vector of means μ and a positive definite covariance matrix Σ if it has the density

n(y - μ, Σ) = (2π)^{-n/2}·det(Σ)^{-1/2}·exp(-(y - μ)'Σ^{-1}(y - μ)/2).

This density is also sometimes denoted n(y;μ,Σ), and the CDF denoted N(y;μ,Σ). Its characteristic function is exp(ιμ't - t'Σt/2), and it has the moments EY = μ and E(Y-μ)(Y-μ)' = Σ. From the characteristic function and the rule for linear transformations, one has immediately the property that a linear transformation of a multivariate normal vector is again multivariate normal. Specifically, if Y is distributed N(y;μ,Σ), then the linear transformation Z = a + BY, which has mean a + Bμ and covariance matrix BΣB', is distributed N(z; a+Bμ, BΣB'). The dimension of Z need not be the same as the dimension of Y, nor does B have to be of maximum rank; if BΣB' is less than full rank, then the distribution of Z is concentrated on an affine linear subspace (of dimension equal to the rank of BΣB') through the point a + Bμ. Let σk = (Σkk)^{1/2} denote the standard deviation of Yk, and let ρkj = Σkj/σkσj denote the correlation of Yk and Yj. Then the covariance matrix can be written

Σ = D·R·D,  with D = diag(σ1,...,σn) and R = [1 ρ12 ... ρ1n; ρ21 1 ... ρ2n; ...; ρn1 ρn2 ... 1],

where D is the diagonal array of standard deviations and R is the array of correlation coefficients.

Theorem 3.10. Suppose Y is partitioned Y' = (Y1' Y2'), where Y1 is m×1, and let μ' = (μ1' μ2') and Σ = [Σ11 Σ12; Σ21 Σ22] be commensurate partitions of μ and Σ. Then the marginal density of Y1 is multivariate normal with mean μ1 and covariance matrix Σ11. The conditional density of Y2, given Y1 = y1, is multivariate normal with mean μ2 + Σ21Σ11^{-1}(y1 - μ1) and covariance matrix Σ22 - Σ21Σ11^{-1}Σ12. Then, the conditional mean of a multivariate normal is linear in the conditioning variables.

Proof: The easiest way to demonstrate the theorem is to recall from Chapter 2 that the positive definite matrix Σ has a Cholesky factorization Σ = LL', where L is lower triangular, and that L has an inverse K that is again lower triangular. If X is a n×1 vector of independent standard normal random variables (e.g., each Xi has mean zero and variance 1), then Y = μ + LX is normal with mean μ and covariance matrix Σ. Conversely, if Y has density n(y - μ, Σ), then X = K(Y - μ) is a vector of i.i.d. standard normal random variables. These statements use the important property of normal random vectors that a linear transformation is again normal. This can be shown directly by using the formulas in Section 3.6 for densities of linear transformations, or by observing that the (multivariate) characteristic function of Y with density n(y - μ, Σ) is exp(ιμ't - t'Σt/2), and the form of this characteristic function is unchanged by linear transformations.

The Cholesky construction Y = μ + LX provides an easy demonstration for the densities of marginal or conditional subvectors of Y. Partition L and X commensurately with (Y1 Y2), so that L = [L11 0; L21 L22] and X' = (X1' X2'). Then Σ11 = L11L11', Σ21 = L21L11', Σ22 = L22L22' + L21L21', and hence Σ21Σ11^{-1} = L21L11^{-1}, implying L22L22' = Σ22 - Σ21Σ11^{-1}Σ12. Then, Y1 = μ1 + L11X1 has a marginal multivariate normal density with mean μ1 and covariance matrix L11L11' = Σ11. Also, Y2 = μ2 + L21X1 + L22X2, implying Y2 = μ2 + L21L11^{-1}(Y1 - μ1) + L22X2. Conditioned on Y1 = y1, this implies Y2 = μ2 + Σ21Σ11^{-1}(y1 - μ1) + L22X2 is multivariate normal with mean μ2 + Σ21Σ11^{-1}(y1 - μ1) and covariance matrix Σ22 - Σ21Σ11^{-1}Σ12.

"#$%+$<'%'#$.)$;%,*7$&%&.;$%(66*'*.+(1%8&$/81%-).-$)'*$&%./%'#$%;81'*7()*('$%+.);(1%(+6%./A8(6)('*3%/.);&%*+%+.);(1%7$3'.)&9%

:-&T0&J)1=33=%%>$'%d%0$%(%+^I%)(+6.;%7$3'.)9%%"#$+4L*M%:/%D%R%LDI%DFM%*&%;81'*7()*('$%+.);(14%'#$+%DI%(+6%DF%()$%*+6$-$+6$+'%*/%(+6%.+12%*/%'#$2()$%8+3.))$1('$69%%V.5$7$)4%DI%(+6%DF%3(+%0$%8+3.))$1('$6%(+6%$(3#%#(7$%(%;(),*+(1%+.);(16*&')*08'*.+%5*'#.8'%+$3$&&()*12%0$*+,%*+6$-$+6$+'9L**M%:/%$7$)2%1*+$()%3.;0*+('*.+%"d%*&%+.);(14%'#$+%d%*&%;81'*7()*('$%+.);(19L***M%:/%d%*&%*9*969%&'(+6()6%+.);(1%(+6%9%*&%(+%*6$;-.'$+'%+^+%;(')*<%./%)(+B%B4%'#$+%d9d%*&6*&')*08'$6%FLBM9L*7M%:/%d%*&%6*&')*08'$6%(L^4?M%(+6%9%*&%(+%*6$;-.'$+'%+^+%;(')*<%./%)(+B%B4%'#$+%d9d%*&6*&')*08'$6%FLB4xM%5*'#%x%R%^9^9L7M%:/%d%*&%*9*969%&'(+6()6%+.);(1%(+6%9%(+6%B%()$%-.&*'*7$%&$;*6$/*+*'$%+^+%;(')*3$&4%'#$+d9d%(+6%dBd%()$%*+6$-$+6$+'%*/%(+6%.+12%*/%9B%R%+9L7*M%:/%d%*&%6*&')*08'$6%(L^4?M4%(+6%9*%*&%(+%*6$;-.'$+'%+^+%;(')*<%./%)(+B%B*%/.)%:%R%I49994%'#$+'#$%d9*)d%()$%;8'8(112%*+6$-$+6$+'%(+6%6*&')*08'$6%FLB*4x*M%5*'#%x*%R%^9*^%*/%(+6%.+12%*/$*'#$)%L(M%9*9?%R%Q%/.)%:%%?%.)%L0M%9I%]%999%]%9%*&%*6$;-.'$+'9L7**M%:/%d%*&%6*&')*08'$6%(L^4?M4%9%*&%(%-.&*'*7$%&$;*6$/*+*'$%+^+%;(')*<4%B%*&%(%B^+%;(')*<4%(+6B9%R%+4%'#$+%Bd%(+6%d9d%()$%*+6$-$+6$+'9L7***M%:/%d%*&%6*&')*08'$6%(L^4?M%(+6%9%*&%(%-.&*'*7$%&$;*6$/*+*'$%+^+%;(')*<4%'#$+%a%d9d%R^9^%]%')L9M9

\)../K%%c$&81'&%L*M%(+6%L**M%()$%-).7$6%*+%J+6$)&.+%LIzhvM4%"#;9%F9H9F%(+6%F9g9F9%%b.)%L***M%(+6%L*7M45)*'$%9%R%\\4%5#$)$%'#*&%*&%*'&%&*+,81()%7(18$%6$3.;-.&*'*.+%5*'#%\%(%+^B%3.18;+%.)'#.,.+(1;(')*<9%%"#$+%\d%*&%6*&')*08'$6%(L\^4?BM4%(+6%'#$%)$&81'%/.11.5&%/).;%"#$.)$;%E9v9%%b.)%L7M4%1$'%B0$%'#$%)(+B%./%9%(+6%;%'#$%)(+B%./%B9%%"#$)$%$<*&'&%(%+^B%;(')*<%\%./%)(+B%B%(+6%(%+^;%;(')*<%>./%)(+B%;%&83#%'#('%9%R%\\%(+6%B%R%>>9%%"#$%7$3'.)&%\d%(+6%>d%()$%8+3.))$1('$64%#$+3$

Page 82: McFadden-Statistical Tools for Economists

!"#$%%&'()!"#"$%"$&#'()**'%+(!)*+++))))))))))))))))))))))))))))))))))))))))))))))))))),-$./&0)121O()4$5&)OQ!777777777777777777777777777777777777777777777777777777777777777777777777777!

*+6$-$+6$+'4%*/%(+6%.+12%*/%\>%R%+9%%_8'%9B%R%\L\>M>%*&% $).%*/%(+6%.+12%*/%\>%R%+%&*+3$%\%(+6>%()$%./%;(<*;8;%)(+B9%%b.)%L7*M4%8&$%'#$%Dtr%6$3.;-.&*'*.+%(&%*+%L*7M9%%b.)%L7**M4%5)*'$%9%R%\\5*'#%\%./%;(<*;8;%)(+B%(&%*+%L7M9%%"#$+%B9%R%LB\M\%R%+%*;-1*$&%B\%R%+4%&.%'#('%Bd%(+6%\d%()$*+6$-$+6$+'%02%L*M9%%b.)%L7**M4%a%d9d%R%^9^%]%a%LdG^M9YdG^M%R%^9^%]%')La%LdG^M9YdG^MM%R^9^%]')L9M9%%
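Property (viii) is easy to confirm by simulation. The sketch below is not from the text; it assumes NumPy, and the mean vector and the positive semidefinite matrix A are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(4)
n = 4
mu = np.array([1.0, -2.0, 0.5, 3.0])
B = rng.normal(size=(n, n))
A = B @ B.T                                      # a positive semidefinite matrix

Y = mu + rng.standard_normal((200_000, n))       # draws of Y ~ N(mu, I)
quad = np.einsum('ij,jk,ik->i', Y, A, Y)         # Y'AY for each draw
print(quad.mean(), mu @ A @ mu + np.trace(A))    # agree up to simulation error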

3.8. NOTES AND COMMENTS

The purpose of this chapter has been to collect the key results from probability theory that are used in econometrics. While the chapter is reasonably self-contained, it is expected that the reader will already be familiar with most of the concepts, and can if necessary refer to one of the excellent texts in basic probability theory and mathematical statistics, such as P. Billingsley, Probability and Measure, Wiley, 1986; or Y. Chow and H. Teicher, Probability Theory, 1997. A classic that provides an accessible treatment of fields of subsets, measure, and statistical independence is J. Neveu, Mathematical Foundations of the Calculus of Probability, Holden-Day, 1965. Another classic that contains many results from mathematical statistics is C. R. Rao (1973), Linear Statistical Inference and Its Applications, Wiley. A comprehensive classical text with treatment of many topics, including characteristic functions, is W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 1 & 2, Wiley, 1957. For special distributions, properties of distributions, and computation, a four-volume compendium by N. Johnson and S. Kotz, Distributions in Statistics, Houghton-Mifflin, 1970, is a good source. For the multivariate normal distribution, T. Anderson (1958), An Introduction to Multivariate Statistical Analysis, Wiley, and F. Graybill (1961), An Introduction to Linear Statistical Models, McGraw-Hill, are good sources. Readers who find some sections of this chapter unfamiliar or too dense may find it useful to first review an introductory text at the undergraduate level, such as K. Chung, A Course in Probability Theory, Academic Press, New York, or R. Larsen and M. Marx, Probability Theory, Prentice-Hall.

TABLE 3.1. SPECIAL DISCRETE DISTRIBUTIONS

1. Binomial: k = 0,1,...,n; 0 < p < 1.  Density C(n,k)·p^k·(1-p)^{n-k};  μ = np, σ² = np(1-p).  [Note 1]
2. Hypergeometric: k an integer with max{0,n-w} ≤ k ≤ min{r,n}; r, w, n positive integers, r+w ≥ n.  Density C(r,k)·C(w,n-k)/C(r+w,n);  μ = nr/(r+w), σ² = n·[r/(r+w)]·[w/(r+w)]·[(r+w-n)/(r+w-1)].  [Note 2]
3. Geometric: k = 0,1,2,...; 0 < p < 1.  Density p(1-p)^k;  μ = (1-p)/p, σ² = (1-p)/p².  [Note 3]
4. Poisson: k = 0,1,2,...; λ > 0.  Density e^{-λ}λ^k/k!;  μ = λ, σ² = λ.  [Note 4]
5. Negative Binomial: k = 0,1,2,...; r an integer, r > 0, 0 < p < 1.  Density C(r+k-1,k)·p^r·(1-p)^k;  μ = r(1-p)/p, σ² = r(1-p)/p².  [Note 5]

NOTES
1. μ ≡ EX (the mean) and σ² ≡ E(X-μ)² (the variance). The density is often denoted b(k;n,p). The moment generating function is (1-p+pe^t)^n.
2. The characteristic and moment generating functions are complicated.
3. The characteristic function is p/(1-(1-p)e^{ιt}) and the moment generating function is p/(1-(1-p)e^t), defined for t < -ln(1-p).
4. The moment generating function is exp[λ(e^t-1)], defined for all t.
5. The characteristic function is p^r/(1-(1-p)e^{ιt})^r, and the moment generating function is p^r/(1-(1-p)e^t)^r, defined for t < -ln(1-p).

TABLE 3.2. SPECIAL CONTINUOUS DISTRIBUTIONS

1. Uniform: a ≤ x ≤ b.  Density 1/(b-a);  μ = (a+b)/2, σ² = (b-a)²/12;  cf (e^{ιtb} - e^{ιta})/ιt(b-a).  [Note 1]
2. Triangular: |x| ≤ a.  Density (1 - |x|/a)/a;  μ = 0, σ² = a²/6;  cf 2(1 - cos(at))/a²t².
3. Cauchy: -∞ < x < +∞.  Density a/π(a² + (x-μ)²);  moments do not exist;  cf e^{ιtμ - a|t|}.
4. Exponential: x ≥ 0, λ > 0.  Density e^{-x/λ}/λ;  μ = λ, σ² = λ²;  cf 1/(1 - ιλt).  [Note 2]
5. Pareto: x ≥ a, b > 0.  Density b·a^b·x^{-b-1};  μ = ab/(b-1), σ² = ba²/[(b-1)²(b-2)].  [Note 3]
6. Gamma: x > 0; a, b > 0.  Density x^{a-1}e^{-x/b}/Γ(a)b^a;  μ = ab, σ² = ab²;  mgf (1 - bt)^{-a}.  [Note 4]
7. Beta: 0 < x < 1; a, b > 0.  Density x^{a-1}(1-x)^{b-1}·Γ(a+b)/Γ(a)Γ(b);  μ = a/(a+b), σ² = ab/[(a+b)²(a+b+1)].  [Note 5]
8. Extreme Value: -∞ < x < +∞, b > 0.  Density (1/b)·exp(-(x-a)/b)·exp(-e^{-(x-a)/b});  μ = a + 0.57721·b, σ² = (πb)²/6.  [Note 6]
9. Logistic: -∞ < x < +∞, b > 0.  Density (1/b)·e^{-(x-a)/b}/(1 + e^{-(x-a)/b})²;  μ = a, σ² = (πb)²/3.  [Note 7]

NOTES
1. The moment generating function is (e^{bt} - e^{at})/(b-a)t, defined for all t.
2. The moment generating function is 1/(1-λt), defined for t < 1/λ.
3. The moment generating function does not exist. The mean exists for b > 1, the variance exists for b > 2.
4. For a > 0, Γ(a) = ∫₀^∞ x^{a-1}e^{-x}dx is the gamma function. If a is an integer, Γ(a) = (a-1)!. The moment generating function (1-bt)^{-a} is defined for t < 1/b.
5. For the characteristic function, see C. R. Rao, Linear Statistical Inference, Wiley, 1973, p. 151.
6. The moment generating function is e^{at}·Γ(1 - tb), defined for t < 1/b.
7. The moment generating function is e^{at}·πbt/sin(πbt), defined for |t| < 1/b.

TABLE 3.3. THE NORMAL DISTRIBUTION AND ITS RELATIVES

1. Normal: -∞ < x < +∞, σ > 0.  Density n(x-μ,σ²) = (2πσ²)^{-1/2}·exp(-(x-μ)²/2σ²);  μ = mean, σ² = variance;  cf exp(ιtμ - σ²t²/2).  [Note 1]
2. Standard Normal: -∞ < x < +∞.  Density φ(x) = (2π)^{-1/2}·exp(-x²/2);  μ = 0, σ² = 1;  cf exp(-t²/2).
3. Chi-Square: 0 < x < +∞; k = 1,2,... .  Density χ²(x;k) = x^{k/2-1}e^{-x/2}/2^{k/2}Γ(k/2);  μ = k, σ² = 2k;  cf (1-2ιt)^{-k/2}.  [Note 2]
4. F-distribution: 0 < x < +∞; k, n positive integers.  Density F(x;k,n);  μ = n/(n-2) if n > 2, σ² = 2n²(k+n-2)/k(n-2)²(n-4) if n > 4.  [Note 3]
5. t-distribution: -∞ < x < +∞; k = 1,2,... .  Density t(x;k) = [Γ((k+1)/2)/Γ(k/2)(kπ)^{1/2}]·(1 + x²/k)^{-(k+1)/2};  μ = 0 if k > 1, σ² = k/(k-2) if k > 2.  [Note 4]
6. Noncentral Chi-Square: x > 0; k a positive integer, λ ≥ 0.  Density χ²(x;k,λ);  μ = k + λ, σ² = 2(k + 2λ).  [Note 5]
7. Noncentral F-distribution: x > 0; k, n positive integers, λ ≥ 0.  Density F(x;k,n,λ);  if n > 2, μ = n(k+λ)/k(n-2); if n > 4, σ² = 2(n/k)²·[(k+λ)² + (k+2λ)(n-2)]/(n-2)²(n-4).  [Note 6]
8. Noncentral t-distribution: -∞ < x < +∞; k a positive integer.  Density t(x;k,δ);  μ = δ·(k/2)^{1/2}·Γ((k-1)/2)/Γ(k/2) if k > 1, σ² = (1+δ²)·k/(k-2) - μ² if k > 2.  [Note 7]

NOTES TO TABLE 3.3
1. The density is often denoted n(x-μ,σ²), and the cumulative distribution referred to as N(x-μ,σ²), or simply N(μ,σ²). The moment generating function is exp(μt + σ²t²/2), defined for all t. The standard normal density is often denoted φ(x), and the standard normal CDF is denoted Φ(x). The general normal and standard normal formulas are related by n(x-μ,σ²) = φ((x-μ)/σ)/σ and N(x-μ,σ²) = Φ((x-μ)/σ).
2. The moment generating function is (1-2t)^{-k/2} for t < 1/2. The chi-square distribution with parameter k (degrees of freedom) is the distribution of the sum of squares of k independent standard normal random variables. The chi-square density is the same as the gamma density with b = 2 and a = k/2.
3. The F-distribution is the distribution of the expression nU/kV, where U is a random variable with a chi-square distribution with parameter k, and V is an independent random variable with a chi-square distribution with parameter n. The density is

F(x;k,n) = [Γ((k+n)/2)/Γ(k/2)Γ(n/2)]·(k/n)^{k/2}·x^{k/2-1}·(1 + kx/n)^{-(k+n)/2}.

For n ≤ 2, the mean does not exist, and for n ≤ 4, the variance does not exist. The characteristic and moment generating functions are complicated.
4. If Y is standard normal and Z is independently chi-square distributed with parameter k, then Y/(Z/k)^{1/2} has a t-distribution with parameter k (degrees of freedom). The characteristic function is complicated; the moment generating function does not exist.
5. The noncentral chi-square is the distribution of the sum of squares of k independent normal random variables, each with variance one, and with means whose squares sum to λ. The noncentral chi-square density is a Poisson mixture of (central) chi-square densities,

χ²(x;k,λ) = Σ_{j=0}^{∞} [e^{-λ/2}(λ/2)^j/j!]·χ²(x;k+2j).

6. The noncentral F-distribution has a density that is a Poisson mixture of rescaled (central) F densities,

F(x;k,n,λ) = Σ_{j=0}^{∞} [e^{-λ/2}(λ/2)^j/j!]·[k/(k+2j)]·F(kx/(k+2j); k+2j, n).

It is the distribution of the expression nU/kV, where U is a noncentral chi-square random variable with parameters k and λ, and V is an independent central chi-square random variable with parameter n.
7. If Y is standard normal and Z is independently chi-square distributed with parameter k, then (Y+δ)/(Z/k)^{1/2} has a noncentral t-distribution with parameters k and δ. The density can be expressed as a Poisson mixture of scaled beta densities. The square of a noncentral t-distributed random variable has a noncentral F-distribution with parameters 1, k, and λ = δ².

Page 87: McFadden-Statistical Tools for Economists

!"#$%%&'()!"#"$%"$&#'()**'%+(!)*+++))))))))))))))))))))))))))))))))))))))))))))))))))),-$./&0)126*()4$5&)P6!777777777777777777777777777777777777777777777777777777777777777777777777777!

1=Q);];<,?E;E

I9%:+%a<(;-1$%I4%5)*'$%.8'%(11%'#$%;$;0$)&%./%!9

F9%\).7$%'#('%(%PG/*$16%./%$7$+'&%3.+'(*+&%.&)0#$6-%!*+'$)&$3'*.+&%./%*'&%;$;0$)&9

E9%a<(;-1$%F%31(*;&%'#('%'#$%31(&&%./%(11%&80&$'&%./%3.8+'(01$%E%#(&%,)$('$)%3()6*+(1*'2%'#(+%E%*'&$1/9%%l('#$;('*3(1124%'#*&;$(+&%'#('%*'%*&%+.'%-.&&*01$%'.%(&&.3*('$%(%8+*A8$%$1$;$+'%./%E%5*'#%$(3#%;$;0$)%./%'#$%31(&&9%%i&$%'#$%/.11.5*+,%6$7*3$'.%3.+7*+3$%2.8)&$1/%'#*&%*&%')8$K%S)*'$%$(3#%+8;0$)%*+%'#$%8+*'%*+'$)7(1%(&%(%/)(3'*.+%*+%0*+()2%+.'('*.+4%Q90I0F9999%%J&&.3*('$5*'#%$(3#%+8;0$)%'#$%31(&&%;$;0$)%'#('%3.+'(*+&%'#$%&$A8$+3$%5*'#%?%#$(6&%*/%(+6%.+12%*/%0?%R%I9%%"#$+4%'#$%)$(1%+8;0$)&45#*3#%()$%8+3.8+'(01$4%;(-%*+'.%8+*A8$%;$;0$)&%./%'#$%31(&&4%&.%'#$%31(&&%*&%(1&.%8+3.8+'(01$9

H9%:+%a<(;-1$%H4%&#.5%'#('%'#$%$7$+'%Y'#$%3#(+,$%*&%'#$%&(;$%.+%&833$&&*7$%6(2&Z%*&%+.'%*+%-I^-F%4%08'%*&%(%;.+.'.+$%1*;*'./%&$'&%*+%-I^-F9

h9%a3.+.;*3%(,$+'&%3(+%;(B$%3.+'*+,$+'%')(6$&%.+12%*/%*'%*&%3.;;.+%B+.51$6,$%*/%'#$%3.+'*+,$+32%*&%)$(1*`$69%%:+%a<(;-1$I4%J,$+'%I%B+.5&%14%J,$+'%F%B+.5&%24%J,$+'%E%B+.5&%3%R%W4WVV4""X4WV"4"VX4EX9%%S#('%*&%'#$%3.;;.+%B+.51$6,$./%J,$+'&%I%(+6%Fw%%j/%J,$+'&%I%(+6%Ew

g9%D8--.&$4%*+%a<(;-1$%I4%'#('%*+&'$(6%./%V%(+6%"%0$*+,%$A8(112%1*B$124%'#$%-).0(0*1*'2%;$(&8)$%&('*&/*$&

VV V" "V ""

Q9F Q9H Q9I Q9E

S#('%*&%'#$%-).0(0*1*'2%'#('%'#$%/*)&'%3.*+%*&%#$(6&w%%"#('%'#$%&$3.+6%3.*+%*&%#$(6&w%%"#('%'#$%'5.%3.*+&%,*7$%'#$%&(;$%)$&81'w

s9%%@.+&*6$)%'#$%&$A8$+3$%./%/8+3'*.+&%/+L<M%R%<If+%/.)%Q%m%<%m%I9%%"#$&$%()$%&A8()$%*+'$,)(01$9%%r.%'#$2%3.+7$),$%'.%(%1*;*'4(+6%*/%&.4%5#('%*&%'#$%3.+7$),$+3$%&').+,4%*+%;$(&8)$4%.)%5$(Bw

v9%@.+&*6$)%'#$%-).0(0*1*'2%;$(&8)$%\LTQ4<UM%R%<IfF%.+%Q%%<%%I9%%r.$&%*'%;$$'%'#$%c(6.+G=*B.62;%3.+6*'*.+&%/.)%'#$$<*&'$+3$%./%(%-).0(0*1*'2%6$+&*'2w

z9%:'%*&%B+.5+%'#('%Q9F%-$)3$+'%./%'#$%-.-81('*.+%*&%V:tG-.&*'*7$9%%:'%*&%B+.5+%'#('%(%&3)$$+*+,%'$&'%/.)%V:t%#(&%(%IQ%-$)3$+'3#(+3$%./%*+3.))$3'12%&#.5*+,%-.&*'*7$%5#$+%'#$%&80?$3'%*&%+$,('*7$4%(+6%(%F%-$)3$+'%3#(+3$%./%*+3.))$3'12%&#.5*+,%+$,('*7$5#$+%'#$%&80?$3'%*&%-.&*'*7$9%%S#('%-).-.)'*.+%./%'#$%-.-81('*.+%'#('%'$&'&%-.&*'*7$%#(&%V:tw

IQ9%.#+%(+6%('$%()$%vQ%2$()&%.169%%"#$%-).0(0*1*'2%'#('%.#+%5*11%6*$%*+%'#$%+$<'%2$()%*&%Q9Qv4%(+6%'#$%-).0(0*1*'2%'#('%('$5*11%6*$%*+%'#$%+$<'%2$()%*&%Q9Qh9%%"#$%%-).0(0*1*'2%'#('%.#+%5*11%6*$4%,*7$+%'#('%('$%6*$&4%*&%Q9F9%%S#('%*&%'#$%-).0(0*1*'2'#('%0.'#%5*11%6*$w%%"#('%('%1$(&'%.+$%5*11%6*$w%%"#('%('$%5*11%6*$4%,*7$+%'#('%.#+%6*$&w

II9%"#$%-).0(0*1*'2%'#('%(%6)*7$)%5*11%#(7$%(+%(33*6$+'%+$<'%2$()%*/%&#$%#(&%(%\#9r9%*&%%Q9F9%%"#$%-).0(0*1*'2%&#$%5*11%#(7$(+%(33*6$+'%*/%&#$%6.$&%+.'%#(7$%(%\#9r9%*&%Q9Fh9%%"#$%-).0(0*1*'2%'#$%6)*7$)%#(&%(%\#9r9%(+6%(+%(33*6$+'%*&%Q9QI9%%S#('%*&'#$%-).0(0*1*'2%'#$%6)*7$)%#(&%(%\#9r9w%%S#('%*&%'#$%-).0(0*1*'2%./%(%\#9r9%,*7$+%(+%(33*6$+'w

Page 88: McFadden-Statistical Tools for Economists

!"#$%%&'()!"#"$%"$&#'()**'%+(!)*+++))))))))))))))))))))))))))))))))))))))))))))))))))),-$./&0)1261()4$5&)PL!777777777777777777777777777777777777777777777777777777777777777777777777777!

IF9%J%A8*`%&#.5%.//$)&%2.8%'#$%.--.)'8+*'2%'.%0$3.;$%(%;*11*.+(*)$%*/%2.8%(+&5$)%+*+$%A8$&'*.+&%3.))$3'129%%8$&'*.+&%3(+0$%$(&2%LaM4%;.6$)('$%LlM4%.)%#()6%LVM9%%"#$%)$&-$3'*7$%-).0(0*1*'*$&%'#('%2.8%5*11%(+&5$)%(+%a4%l4%.)%V%A8$&'*.+%3.))$3'12()$%FfE4%IfF4%(+6%IfE9%%:/%2.8%,$'%(+%a%A8$&'*.+4%2.8)%+$<'%A8$&'*.+%5*11%0$%a4%l4%.)%V%5*'#%-).0(0*1*'*$&%IfH4%IfF4%(+6%IfH)$&-$3'*7$129%%:/%2.8%,$'%(%l%A8$&'*.+4%2.8)%+$<'%A8$&'*.+%5*11%0$%a4%l4%.)%V%5*'#%-).0(0*1*'*$&%IfE4%IfE4%(+6%IfE%)$&-$3'*7$129:/%2.8%,$'%(%V%A8$&'*.+4%2.8)%+$<'%A8$&'*.+%5*11%0$%a4%l4%.)%V%5*'#%-).0(0*1*'*$&%IfF4%Q4%(+6%IfF%)$&-$3'*7$129%%"#$%/*)&'A8$&'*.+%*&%(15(2&%(+%a%A8$&'*.+9%%S#('%*&%'#$%-).0(0*1*'2%'#('%2.8%5*11%0$3.;$%(%;*11*.+(*)$w%%TV*+'K%%D#.5%'#('%'#$-).0(0*1*'2%./%5*++*+,%*/%2.8%)$(3#%A8$&'*.+%z%*&%*+6$-$+6$+'%./%5#$'#$)%'#*&%A8$&'*.+%*&%a4%l4%.)%V9%%"#$+%8&$%0(3B5()6)$38)&*.+9U

IE9%D#.5%'#('%*/%9)%B)(+6%\L9M%p%Q4%'#$+%\L,9M%3(+%0$%$*'#$)%1(),$)%.)%&;(11$)%'#(+%\L,BM9

IH9%J+%(*)-1(+$%#(&%IQQ%&$('&9%%"#$%-).0(0*1*'2%'#('%(%'*3B$'$6%-(&&$+,$)%&#.5&%8-%/.)%%'#$%/1*,#'%*&%Q9z4%(+6%'#$%$7$+'&%'#('(+2%'5.%6*//$)$+'%-(&&$+,$)&%&#.5%8-%*&%&'('*&'*3(112%*+6$-$+6$+'9%%:/%'#$%(*)1*+$%&$11&%IQh%&$('&4%5#('%*&%'#$%-).0(0*1*'2%'#(''#$%-1(+$%5*11%0$%.7$)0..B$6w%%V.5%;(+2%&$('&%3(+%'#$%(*)1*+$%&$114%(+6%B$$-%'#$%-).0(0*1*'2%./%.7$)0..B*+,%'.%h%-$)3$+'.)%1$&&w

Ih9%\).7$%'#('%'#$%$<-$3'('*.+%;L%G%3MF%*&%;*+*;*`$6%5#$+%3%R%;9

Ig9%\).7$%'#('%'#$%$<-$3'('*.+%;%G%3%*&%;*+*;*`$6%5#$+%3%R%;$6*(+LM9

Is9%S#('% 7(18$% ./% 3%;*+*;*`$&%;W;(<LG34QM% ]% LIGM;(<L3G4QMXw% % LV*+'K% 6$&3)*0$% '#$% &.18'*.+% *+% '$);&% ./% '#$6*&')*08'*.+%b%./%9U

Iv9%J%&$(1$6%0*6%(83'*.+%#(&%(%')(3'%./%1(+6%/.)%&(1$%'.%'#$%#*,#$&'%./%+%0*66$)&9%%d.8%()$%0*66$)%I9%%d.8)%$<-$)*$+3$%*&%'#(''#$%0*6&%./%$(3#%.'#$)%0*66$)%*&%6*&')*08'$6%5*'#%(%\.5$)%6*&')*08'*.+%bLM%R%%/.)%Q%%%%I9%%d.8)%-)./*'%*/%2.8%()$&833$&&/81%*+%082*+,%')(3'%('%-)*3$%2%*&%I%G%29% %S#('%&#.816%2.8%0*6%'.%;(<*;*`$%2.8)%$<-$3'$6%-)./*'w%%S#('% *&%2.8)-).0(0*1*'2%./%5*++*+,%'#$%(83'*.+w

Iz9%J%)(+6.;%7()*(01$%%#(&%(%+.);(1%6*&')*08'*.+%*/%*'&%6$+&*'2%*&%'L<M%R%%LFuPFMGy$<-LGL<GnMFfFPFM4%5#$)$%n%(+6%PF%()$-()(;$'$)&9%%\).7$%'#('%%#(&%;$(+%n%%(+6%7()*(+3$%PF9%%\).7$%'#('%;LGnME%R%Q%(+6%;LGnMH%R%EPH9%%TV*+'K%%b*)&'%&#.5%'#('<$<-LG<FfFM6<% R% G% $<-LG<FfFM% (+6% /.)% B% p% I4% '#$% *+'$,)('*.+% 02% -()'&% /.);81(% %<B$<-LG<FfFM6<% R% G<BGI$<-LG<FfFM% ]LBGIM<BGF$<-LG<FfFM6<9U

FQ9%D8--.&$%'#$%&'.3B%;()B$'%#(&%'5.%)$,*;$&4%i-%(+6%r.5+9%%:+%(+%i-%)$,*;$4%'#$%-).0(0*1*'2%'#('%'#$%;()B$'%*+6$<%5*11)*&$%.+%(+2%,*7$+%6(2%*&%\9%%:+%(%r.5+%)$,*;$4%'#$%-).0(0*1*'2%'#('%'#$%;()B$'%*+6$<%5*11%)*&$%.+%(+2%,*7$+%6(2%*&%4%5*'#%m%\9%%S*'#*+%(%)$,*;$4%'#$%-).0(0*1*'2%'#('%'#$%;()B$'%)*&$&%.+%(%,*7$+%6(2%*&%*+6$-$+6$+'%./%*'&%#*&'.)29%%"#$%-).0(0*1*'2./%0$*+,%*+%(%i-%)$,*;$%*&%IfF4%&.%'#('% */%2.8%6.%+.'%B+.5%5#*3#%)$,*;$%2.8%()$%*+4%'#$+%(11%2.8%3(+%&(2%*&%'#('%'#$-).0(0*1*'2%'#('%'#$%;()B$'%5*11%)*&$%.+%(+2%,*7$+%6(2%*&%c%R%L\]MfF9%%J&&8;$%'#('%)$,*;$&%-$)&*&'%/()%1.+,$)%'#(+%)8+&%./)*&$&4%&.%'#('%5#$+%(+(12`*+,%)8+&%'#$%)$,*;$%3(+%0$%')$('$6%(&%-$)&*&'*+,%*+6$/*+*'$129%%D#.5%'#('%5#$+%2.8%()$%*+%'#$%i-)$,*;$4%'#$%-).0(0*1*'2%./%(%)8+%./%B%.)%;.)$%&833$&&*7$%6(2&%*+%5#*3#%'#$%;()B$'%)*&$&%*&%\BGI4%(+6%'#('%'#$%-).0(0*1*'2%./(%)8+%./%$<(3'12%B%6(2&%*+%5#*3#%'#$%;()B$'%)*&$&%*&%\BGILIG\M9%%J%&*;*1()%/.);81(%5*'#%%*+&'$(6%./%\%#.16&%5#$+%2.8%()$*+%(%r.5+%)$,*;$9%%D#.5%'#('%$<-$3'$6%1$+,'#%*+%(+%i-%)$,*;$%./%(%)8+%./%)*&$&%*&%IfLIG\M9%%D#.5%'#('%yfLIG\M%]%yfLIGM%IfLIGcM9

FI9% "#$% )(+6.;% 7$3'.)% LI4FM% #(&% '#$% 6*&')*08'*.+% /8+3'*.+% $<-LGL$<-LGF<IM]$<-LGF<FMMIfFM9% % S#('% *&% '#$% ;(),*+(16*&')*08'*.+%./%Iw%%S#('%*&%'#$%3.+6*'*.+(1%6*&')*08'*.+%./%I%,*7$+%F%%3w%%e*7$+%F%R%3w

Page 89: McFadden-Statistical Tools for Economists

!"#$%%&'()!"#"$%"$&#'()**'%+(!)*+++))))))))))))))))))))))))))))))))))))))))))))))))))),-$./&0)1266()4$5&)PM!777777777777777777777777777777777777777777777777777777777777777777777777777!

FF9%"#$%$<-$3'('*.+%aL](|MF%%Q%/.)%)(+6.;%7()*(01$&%4%|%(+6%(+2%&3(1()%(99%%i&$%'#*&%-).-$)'2%'.%-).7$%'#$%@(83#2GD3#5()'`%*+$A8(1*'29

FE9%\).7$%$+&$+q&%*+$A8(1*'2%/.)%(%-).0(0*1*'2%3.+3$+')('$6%('%'5.%-.*+'&9

FH9%:+%a<(;-1$%F4%8&$%'#$%1(5%./%*'$)('$6%$<-$3'('*.+&%'.%3(1381('$%'#$%$<-$3'('*.+%./%'#$%+8;0$)%./%#$(6&4%,*7$+%'#('%'#$+8;0$)%$<3$$6&%.+$9

Fh9%:/%%(+6%|%()$%0*7()*('$%+.);(1%5*'#%;$(+&%I%(+6%F4%(+6%7()*(+3$&%I%(+6%H4%)$&-$3'*7$124%(+6%3.7()*(+3$%4%5#('%*&%'#$6$+&*'2%./%%,*7$+%|%R%`w%%i&$%_(2$&%1(5%'.%6$683$%'#$%3.+6*'*.+(1%6$+&*'2%./%|%,*7$+%%R%<w

Fg9%\).7$%'#$%/.);81(%/.)%'#$%3#()(3'$)*&'*3%/8+3'*.+%./%(%&'(+6()6%+.);(1%)(+6.;%7()*(01$9

Fs9%S#('%*&%'#$%6.;(*+%./%'#$%;.;$+'%,$+$)('*+,%/8+3'*.+%./%(+%$<-.+$+'*(112%6*&')*08'$6%)(+6.;%7()*(01$%5*'#%6$+&*'2/L<M%R%$<-LGE<M%/.)%<%p%Qw

Fv9% :/% L4|M% *&% (% )(+6.;% 7$3'.)% 5*'#% 6$+&*'2% /L<4`M% (+6% `% p% Q4% (+6% D% R% f|4% "% R% |4% 5#('% *&% '#$% (3.0$(+% ./% '#$')(+&/.);('*.+wFz9%:/%%(+6%d%()$%;81'*7()*('$%+.);(1%5*'#%`$).%;$(+&4%a%R%J4%add%R%_4%(+6%ad%R@4%&#.5%'#('%%(+6%|%R%d%G_GI@%()$%*+6$-$+6$+'9

EQ9%b.)%'#$%0*+.;*(1%6*&')*08'*.+%0LBC+4-M4%5#('%*&%'#$%7()*(+3$%./%'#$%/)$A8$+32%/%R%Bf+w

EI9%"#$%#2-$),$.;$')*3%6*&')*08'*.+%6$&3)*0$&%'#$%-).0(0*1*'2%'#('%B%./%+%0(11&%6)(5+%/).;%(+%8)+%5*11%0$%)$64%5#$)$%'#$%8)+3.+'(*+&%)%)$6%(+6%5%5#*'$%0(11&4%(+6%&(;-1*+,%*&%5*'#.8'%)$-1(3$;$+'9%%@(1381('$%'#$%&(;$%-).0(0*1*'2%*/%&(;-1*+,%*&%5*'#)$-1(3$;$+'9%%@(1381('$%'#$%-).0(0*1*'*$&4%5*'#%(+6%5*'#.8'%)$-1(3$;$+'4%5#$+%)%R%IQ4%5%R%zQ4%+%R%h4%B%R%I9

EF9%:+%(%\.*&&.+%6*&')*08'*.+4%5#('%*&%'#$%$<-$3'$6%3.8+'%3.+6*'*.+$6%.+%'#$%3.8+'%0$*+,%-.&*'*7$w

EE9%i+6$)%5#('%3.+6*'*.+&%*&%'#$%3#()(3'$)*&'*3%/8+3'*.+%./%(%8+*/.);%6*&')*08'*.+%./%TG(40U%)$(1w

EH9%D#.5%'#('%*/%%(+6%d%()$%*+6$-$+6$+'%*6$+'*3(112%6*&')*08'$6%$<')$;$%7(18$4%'#$+%%G%d%*&%1.,*&'*3%6*&')*08'$69

Eh9%D8--.&$%'#('%'#$%68)('*.+%./%(%&-$11%./%8+$;-1.2;$+'%L*+%6(2&M%3(+%0$%6$&3)*0$6%02%(%,$.;$')*3%6*&')*08'*.+4%\).0LBMR%-BLIG-M4%5#$)$%Q%m%-%m%I%*&%(%-()(;$'$)%(+6%B%*&%(%+.+G+$,('*7$%*+'$,$)9%%S#('%*&%'#$%$<-$3'$6%68)('*.+%./%8+$;-1.2;$+'wS#('%*&%'#$%-).0(0*1*'2%./%(%&-$11%./%8+$;-1.2;$+'%1(&'*+,%1.+,$)%'#(+%%6(2&w%%S#('%*&%'#$%3.+6*'*.+(1%$<-$3'('*.+%./%'#$68)('*.+%./%8+$;-1.2;$+'4%,*7$+%'#$%$7$+'%'#('%%p%;4%5#$)$%;%*&%(%-.&*'*7$%*+'$,$)w%%TV*+'K%i&$%/.);81(&%/.)%,$.;$')*3&$)*$&4%&$$%F9I9IQ9U

Eg9%i&$%'#$%;.;$+'%,$+$)('*+,%/8+3'*.+%'.%/*+6%;E%5#$+%%#(&%6$+&*'2%$G<ff4%<%p%Q9

Es9%J%1.,%+.);(1%)(+6.;%7()*(01$%d%*&%.+$%'#('%#(&%1.,LdM%+.);(19%%:/%1.,LdM%#(&%;$(+%n%(+6%7()*(+3$%PF4%/*+6%'#$%;$(+(+6%7()*(+3$%./%d9%%TV*+'K%%:'%*&%8&$/81%'.%/*+6%'#$%;.;$+'%,$+$)('*+,%/8+3'*.+%./%|%R%1.,LdM9U

Page 90: McFadden-Statistical Tools for Economists

!"#$%%&'()!"#"$%"$&#'()**'%+(!)*+++))))))))))))))))))))))))))))))))))))))))))))))))))),-$./&0)126L()4$5&)PO!777777777777777777777777777777777777777777777777777777777777777777777777777!

Ev9%:/%%(+6%d%()$%*+6$-$+6$+'%+.);(14%'#$+%]d%*&%(,(*+%+.);(14%&.%'#('%.+$%3(+%&(2%%'#('%#9%!0&*+$-!'$+1-7!1"!.-&"%3)03%*!$331#1&09%%LJ66*'*.+%./%)(+6.;%7()*(01$&%*&%(1&.%3(11$6%3.+7.18'*.+4%/).;%'#$%/.);81(%/.)%'#$%6$+&*'2%./%'#$%&8;9M=.5%&8--.&$%%%(+6%d%()$%*+6$-$+6$+'%(+6%#(7$%$<')$;$%7(18$%6*&')*08'*.+&4%\).0L%%<M%R%$<-LG$(G<M%(+6%\).0Ld%%2MR%$<-LG$0G2M%4%5#$)$%(%(+6%0%()$%1.3('*.+%-()(;$'$)&9%%D#.5%'#('%;(<L4dM%.+3$%(,(*+%#(&%(+%$<')$;$%7(18$%6*&')*08'*.+%L5*'#1.3('*.+%-()(;$'$)%3%R%1.,L$(]$0MM4%&.%'#('%#9%!%B#*%+%!/$-)%!'$+1-7!1"!.-&"%3!)03%*!+$B1+1:$#1&09

Ez9%:/%%*&%&'(+6()6%+.);(14%6$)*7$%'#$%6$+&*'2%(+6%3#()(3'$)*&'*3%/8+3'*.+%./%d%R%F4%%(+6%3.+/*);%'#('%'#*&%*&%'#$%&(;$%(&'#$%'(01$6%6$+&*'2%./%(%3#*G&A8()$%)(+6.;%7()*(01$%5*'#%.+$%6$,)$$%./%/)$$6.;9%%:/%%*&%+.);(1%5*'#%7()*(+3$%.+$%(+6%(%;$(+n%'#('%*&%+.'%`$).4%6$)*7$%'#$%6$+&*'2%./%d4%5#*3#%*&%+.+G3$+')(1%3#*G&A8()$%6*&')*08'$6%5*'#%.+$%6$,)$$%./%/)$$6.;%(+6+.+3$+')(1*'2%-()(;$'$)%nF9%

HQ9%c(+6.;%t()*(01$&%%(+6%d%()$%0*7()*('$%+.);(14%5*'#%;%R%I4%;d%R%E4%(+6%t()LM%R%%H4%t()LdM%R%z4%@.7()*(+3$L4dMR%h9%%%%%%%L(M%S#('%*&%'#$%;$(+%./%|%R%F%G%dw%%%%%L0M%S#('%*&%'#$%7()*(+3$%./%|%R%F%G%dw%%%%%!%S#('%*&%'#$%3.+6*'*.+(1%;$(+%./%|%,*7$+%%R%hw%%%%%L6M%S#('%*&%'#$%3.+6*'*.+(1%7()*(+3$%./%|%,*7$+%%R%hw

HI9%S#('%*&%'#$%-).0(0*1*'2%'#('%'#$%1(),$)%./%'5.%)(+6.;%.0&$)7('*.+&%/).;%(+2%3.+'*+8.8&%6*&')*08'*.+%5*11%$<3$$6%'#$-.-81('*.+%;$6*(+w%HF9%:/%)(+6.;%7()*(01$&%%(+6%d%()$%*+6$-$+6$+'4%5*'#%a%R%I4%ad%R%F4%%aF%R%H4%adF%R%z4%5#('%*&%'#$%8+3.+6*'*.+(1;$(+%(+6%7()*(+3$%./%%Edw%%S#('%*&%'#$%3.+6*'*.+(1%;$(+%(+6%7()*(+3$%./%Ed%,*7$+%%d%R%hw%

HE9%.0&%()$%3#()(3'$)*`$6%02%(%5(,$%)('$%S%(+6%(%68)('*.+%./%$;-1.2;$+'%4%(+6%LS4M%3(+%0$%*+'$)-)$'$6%(&%()(+6.;%7$3'.)9%%"#$%68)('*.+%./%$;-1.2;$+'%#(&%(+%$<-.+$+'*(1%6$+&*'2%$G<4%(+6%%'#$%5(,$%)('$%S%#(&%(%$<-.+$+'*(16$+&*'24%3.+6*'*.+$6%.+%%R%<4%$A8(1%'.%L]<M$GL]<M54%5#$)$%4%4%(+6%%()$%-.&*'*7$%%-()(;$'$)&9%%S#('%*&%'#$%;(),*+(16$+&*'2%./%Sw%%"#$%3.+6*'*.+(1%6$+&*'2%./%%,*7$+%Sw

HH9%c(+6.;%t()*(01$&%%(+6%d%()$%0*7()*('$%+.);(14%5*'#%;%R%I4%;d%R%E4%(+6%t()LM%R%%H4%t()LdM%R%[email protected]()*(+3$L4dM%R%h9%%%%%%%L(M%S#('%*&%'#$%;$(+%./%|%R%F%G%dw%%%%%L0M%S#('%*&%'#$%7()*(+3$%./%|%R%F%G%dw%%%%%!%S#('%*&%'#$%3.+6*'*.+(1%;$(+%./%|%,*7$+%%R%hw%%%%%L6M%S#('%*&%'#$%3.+6*'*.+(1%7()*(+3$%./%|%,*7$+%%R%hw

45. The data set nyse.txt in the class data area of the class home page contains daily observations on stock market returns from Jan. 2, 1968 through Dec. 31, 1998, a total of 7806 observations corresponding to days the market was open. There are four variables, in columns delimited by spaces. The first variable (DAT) is the date in yymmdd format, the second variable (RNYSE) is the daily return to the NYSE market index, defined as the log of the ratio of the closing value of the index today to the closing index on the previous day the market was open, with distributions (dividends) factored in. The third variable (SP500) is the S&P500 market index, an index of a majority of the high market value stocks in the New York stock exchange. The fourth variable (RTB90) is the rate of interest in the secondary market for 90-day Treasury Bills, converted to a daily rate commensurate with RNYSE.

a. Let m_n denote a sample average (empirical expectation). Find the sample mean µ = m_n X, variance σ² = m_n(X - µ)², skewness m_n(X - µ)³/σ³, and kurtosis m_n(X - µ)⁴/σ⁴ - 3, for the variables RNYSE and RTB90. Normally distributed random variables have zero skewness and kurtosis in the population. Making an "eyeball" comparison, do the sample moments appear to be consistent with the proposition that RNYSE and RTB90 are normally distributed?

b. For RNYSE, form the standardized variable Z = (RNYSE - µ)/σ, by subtracting this variable's sample mean and then dividing by the square root of its variance (or standard deviation). Sort the values of Z from low to high, and then construct a new variable Y that equals i/7806 for 1 ≤ i ≤ 7806. The values of Z are called the order statistics of the sample, and Y is the empirical CDF, a CDF that puts 1/7806 probability at each observed value of RNYSE. Plot Y against Φ(Z), where Φ is the standard normal CDF. If RNYSE is normal, then these curves will differ only because of sampling noise in Y. Does it appear by eyeball comparison that they are likely to be the same? A particular issue is the theoretical question of whether the distribution of returns has fat tails, so that the variance and higher moments are hard to estimate precisely or may fail to exist. In a normal sample, one would expect that on average 99 percent of standardized observations are less than 2.575 in magnitude. Do the standardized values Z appear to be consistent with this frequency?

c. A claim in the analysis of stock market returns is that the introduction of financial derivatives and index funds through the 1980s made it easier for arbitragers to close windows of profit opportunity. The argument is made that the resulting actions of arbitragers have made the market more volatile. Compare the subsamples of NYSE excess returns (EXCESS = RNYSE - RTB90) for the periods 1968-1978 and 1988-1998. By eyeball comparison, were there differences in mean excess return in the two decades? In the variance (or standard deviation) of excess return? Now do a 2×2 table of sample means classified by the two decades above and by whether or not the previous day's excess return was above its decade average. Does it appear that the gap between mean excess returns on days following previous rises and falls has increased or shrunk in the decade of the 90s?
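As a rough guide to the computations in parts a and b, the following Python sketch (an illustration added here, not part of the original text) shows one way to form the sample moments and the empirical-CDF comparison. It assumes the file nyse.txt is available in the working directory with the four space-delimited columns described above, and that numpy and scipy are installed.

```python
import numpy as np
from scipy.stats import norm

# Sketch for parts a and b of exercise 45, assuming nyse.txt has the four
# space-delimited columns DAT, RNYSE, SP500, RTB90 described in the exercise.
dat, rnyse, sp500, rtb90 = np.loadtxt("nyse.txt", unpack=True)

def sample_moments(x):
    # sample mean, variance, skewness, and excess kurtosis as defined in part a
    mu = x.mean()
    sig2 = ((x - mu) ** 2).mean()
    skew = ((x - mu) ** 3).mean() / sig2 ** 1.5
    kurt = ((x - mu) ** 4).mean() / sig2 ** 2 - 3
    return mu, sig2, skew, kurt

for name, x in [("RNYSE", rnyse), ("RTB90", rtb90)]:
    print(name, sample_moments(x))

# Part b: order statistics of standardized RNYSE versus the standard normal CDF.
z = np.sort((rnyse - rnyse.mean()) / rnyse.std())
y = np.arange(1, z.size + 1) / z.size          # empirical CDF at the order statistics
print("max |empirical CDF - normal CDF|:", np.abs(y - norm.cdf(z)).max())
print("fraction of |Z| < 2.575:", np.mean(np.abs(z) < 2.575), "(0.99 expected under normality)")
```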


CHAPTER 4. LIMIT THEOREMS IN STATISTICS

4.1. SEQUENCES OF RANDOM VARIABLES

4.1.1. A great deal of econometrics uses relatively large data sets and methods of statisticalinference that are justified by their desirable properties in large samples. The probabilisticfoundations for these arguments are laws of large numbers, sometimes called the law ofaverages, and central limit theorems. This chapter presents these foundations. It concentrateson the simplest versions of these results, but goes some way in covering more complicated versionsthat are needed for some econometric applications. For basic econometrics, the most criticalmaterials are the limit concepts and their relationship covered in this section, and for independentand identically distributed (i.i.d.) random variables the first Weak Law of Large Numbers in Section4.3 and the first Central Limit Theorem in Section 4.4. The reader may want to postpone othertopics, and return to them as they are needed in later chapters.

4.1.2. Consider a sequence of random variables Y1,Y2,Y3,... . These random variables are all functions Yk(s) of the same state of Nature s, but may depend on different parts of s. There are several possible concepts for the limit Yo of a sequence of random variables Yn. Since the Yn are functions of states of nature, these limit concepts will correspond to different ways of defining limits of functions. Figure 4.1 will be used to discuss limit concepts. Panel (a) graphs Yn and Yo as functions of the state of Nature. Also graphed are curves denoted Yo±ε and defined by Yo ± ε, which for each state of Nature s delineate an ε-neighborhood of Yo(s). The set of states of Nature for which |Yo(s) - Yn(s)| > ε is denoted Wn. Panel (b) graphs the CDF's of Yo and Yn. For technical completeness, note that a random variable Y is a measurable real-valued function on a probability space (S,F,P), where F is a σ-field of subsets of S, P is a probability on F, and measurable means that F contains the inverse image of every set in the Borel σ-field of subsets of the real line. The CDF of a vector of random variables is then a measurable function with the properties given in 3.5.3.

4.1.3. Yn converges in probability to Yo if for each ε > 0, lim_n→∞ Prob(|Yn - Yo| > ε) = 0. Convergence in probability is denoted Yn →p Yo, or plim_n→∞ Yn = Yo. With Wn defined as in Figure 4.1, Yn →p Yo iff lim_n→∞ Prob(Wn) = 0 for each ε > 0.

4.1.4. Yn converges almost surely to Yo, denoted Yn →as Yo, if for each ε > 0, lim_n→∞ Prob(sup_{m≥n} |Ym - Yo| > ε) = 0. For Wn defined in Figure 4.1, the set of states of nature for which |Ym(s) - Yo(s)| > ε for some m ≥ n is ∪_{m≥n} Wm, and Yn →as Yo iff Prob(∪_{m≥n} Wm) → 0.

An implication of almost sure convergence is lim_n→∞ Yn(s) = Yo(s) a.s. (i.e., except for a set of states of Nature of probability zero); this is not an implication of Yn →p Yo.


FIGURE 4.1. CONVERGENCE CONCEPTS FOR RANDOM VARIABLES

[Figure omitted. Panel (a) plots realizations of Yn, Yo, and the band Yo ± ε against the state of Nature, with the set Wn marked. Panel (b) plots the CDF of Yn and the CDF of Yo.]


4.1.5. Yn converges in ρ-mean (also called convergence in ρ-norm, or convergence in Lρ space) to Yo if lim_n→∞ E|Yn - Yo|^ρ = 0. For ρ = 2, this is called convergence in quadratic mean. The norm is defined as ‖Y‖ρ = [∫_S |Y(s)|^ρ P(ds)]^{1/ρ} = [E|Y|^ρ]^{1/ρ}, and can be interpreted as a probability-weighted measure of the distance of Y from zero. The norm of a random variable is a moment. There are random variables for which the ρ-mean will not exist for any ρ > 0; for example, Y with CDF F(y) = 1 - 1/(log y) for y ≥ e has this property. However, in many applications moments such as variances exist, and the quadratic mean is a useful measure of distance.

4.1.6. Yn converges in distribution to Yo, denoted Yn →d Yo, if the CDF of Yn converges to the CDF of Yo at each continuity point of Yo. In Figure 4.1(b), this means that Fn converges to the function Fo point by point for each argument on the horizontal axis, except possibly for points where Fo jumps. (Recall that distribution functions are always continuous from the right, and except at jumps are continuous from the left. Since each jump contains a distinct rational number and the rationals are countable, there are at most a countable number of jumps. Then the set of jump points has Lebesgue measure zero, and there are continuity points arbitrarily close to any jump point. Because of right-continuity, distribution functions are uniquely determined by their values at their continuity points.) If A is an open set, then Yn →d Yo implies liminf_n Fn(A) ≥ Fo(A); conversely, A closed implies limsup_n Fn(A) ≤ Fo(A); see P. Billingsley (1968), Theorem 2.1. Convergence in distribution is also called weak convergence in the space of distribution functions.

4.1.7. The relationships between different types of convergence are summarized in Figure 4.2. In this table, A ⟹ B means that A implies B, but not vice versa, and A ⟺ B means that A and B are equivalent. Explanations and examples are given in Sections 4.1.8-4.1.18. On first reading, skim these sections and skip the proofs.

4.1.8. Yn →as Yo implies Prob(Wn) ≤ Prob(∪_{m≥n} Wm) → 0, and hence Yn →p Yo. However, Prob(Wn) → 0 does not necessarily imply that the probability of ∪_{m≥n} Wm is small, so Yn →p Yo does not imply Yn →as Yo. For example, take the universe of states of nature to be the points on the unit circle with uniform probability, take the Wn to be successive arcs of length 2π/n, and take Yn to be 1 on Wn, 0 otherwise. Then Yn →p 0 since Pr(Yn ≠ 0) = 1/n, but Yn fails to converge almost surely to zero since the successive arcs wrap around the circle an infinite number of times, and every s in the circle is in an infinite number of the Wn.
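The following Python simulation (an illustration added here, not from the text) mimics the arcs-on-a-circle example: the probability of landing in Wn is 1/n, which vanishes, yet a fixed state of Nature keeps landing in later arcs because the arc lengths 1/n sum to infinity.

```python
import numpy as np

# Arcs example of 4.1.8: W_n is the arc of length 1/n (as a fraction of the circle)
# starting where W_{n-1} ends; Y_n(s) = 1 if s lies in W_n.  Prob(Y_n = 1) = 1/n -> 0,
# so Y_n ->p 0, but the arcs wrap around forever, so Y_n(s) = 1 recurs for every s.
rng = np.random.default_rng(1)
N = 50_000
starts = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, N)))) % 1.0  # arc start points
s = rng.uniform()                                                         # one fixed state
hits = [n for n in range(1, N + 1) if (s - starts[n - 1]) % 1.0 < 1.0 / n]
print("indices n with Y_n(s) = 1 (keeps recurring):", hits)
print("P(Y_n = 1) at n = 1000 is 1/1000 =", 1 / 1000)
```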

4.1.9. Suppose Yn →p Yo. It is a good exercise in manipulation of probabilities of events to show that Yn →d Yo. Given ε > 0, define Wn as before to be the set of states of Nature where |Yn(s) - Yo(s)| > ε. Given y, define An, Bo, and Co to be, respectively, the states of Nature with Yn ≤ y, Yo ≤ y - ε, and Yo ≤ y + ε. Then Bo ⊆ An ∪ Wn (i.e., Yo(s) ≤ y - ε implies either Yn(s) ≤ y or |Yo(s) - Yn(s)| > ε) and An ⊆ Co ∪ Wn (i.e., Yn(s) ≤ y implies Yo(s) ≤ y + ε or |Yo(s) - Yn(s)| > ε). Hence, for n large enough so Prob(Wn) < ε, Fo(y-ε) ≡ Prob(Bo) ≤ Prob(An) + Prob(Wn) < Fn(y) + ε, and Fn(y) ≡ Prob(An) ≤ Prob(Co) + Prob(Wn) < Fo(y+ε) + ε, implying Fo(y-ε) - ε ≤ lim_n Fn(y) ≤ Fo(y+ε) + ε. If y is a continuity point of Yo, then Fo(y-ε) and Fo(y+ε) approach Fo(y) as ε → 0, implying lim_n→∞ Fn(y) = Fo(y). This establishes that Yn →d Yo.

Convergence in distribution of Yn to Yo does not imply that Yn and Yo are close to each other. For example, if Yn and Yo are i.i.d. standard normal, then Yn →d Yo trivially, but clearly not Yn →p Yo, since Yn - Yo is normal with variance 2, so that |Yn - Yo| > ε with a positive, constant probability. However, there is a useful representation that is helpful in relating convergence in distribution and almost sure convergence; see P. Billingsley (1986), p. 343.

Theorem 4.1. (Skorokhod) If Yn →d Yo, then there exist random variables Yn′ and Yo′ such that Yn and Yn′ have the same CDF, as do Yo and Yo′, and Yn′ →as Yo′.

4.1.10. Convergence in distribution and convergence in probability to a constant are equivalent. If Yn →p c constant, then Yn →d c as a special case of 4.1.9 above. Conversely, Yn →d c constant means Fn(y) → Fc(y) at continuity points, where Fc(y) = 0 for y < c and Fc(y) = 1 for y ≥ c. Hence ε > 0 implies Prob(|Yn - c| > ε) ≤ Fn(c-ε) + 1 - Fn(c+ε) → 0, so Yn →p c. This result implies particularly that the statements Yn - Yo →p 0 and Yn - Yo →d 0 are equivalent. Then, Yn - Yo →d 0 implies Yn →d Yo, but the reverse implication does not hold.

4.1.11. The condition that convergence in distribution is equivalent to convergence ofexpectations of all bounded continuous functions is a fundamental mathematical result called theHelly-Bray theorem. Intuitively, the reason the theorem holds is that bounded continuous functionscan be approximated closely by sums of continuous almost-step functions, and the expectationsof almost step functions closely approximate points of CDFs. A proof by J. Davidson (1994), p.352, employs the Skorokhod representation theorem 4.1.

4.1.12. A Chebyshev-like inequality is obtained by noting for a random variable Z with density f(z) that E|Z|^ρ = ∫ |z|^ρ f(z)dz ≥ ∫_{|z|>ε} ε^ρ f(z)dz = ε^ρ Prob(|Z| > ε), or Prob(|Z| > ε) ≤ E|Z|^ρ/ε^ρ. (When ρ = 2, this is the conventional Chebyshev inequality. When ρ = 1, one has Prob(|Z| > ε) ≤ E|Z|/ε.) Taking Z = Yn - Yo, one has lim_n Prob(|Yn - Yo| > ε) ≤ ε^{-ρ} lim_n E|Yn - Yo|^ρ. Hence, convergence in ρ-mean (for any ρ > 0) implies convergence in probability. However, convergence almost surely or in probability does not necessarily imply convergence in ρ-mean. Suppose the sample space is the unit interval with uniform probability, and Yn(s) = e^n for s ≤ n^{-2}, zero otherwise. Then Yn →as 0 since Prob(Ym ≠ 0 for some m > n) ≤ Σ_{m>n} m^{-2} → 0, but E|Yn|^ρ = e^{ρn}/n² → +∞ for any ρ > 0.


FIGURE 4.2. RELATIONS BETWEEN STOCHASTIC LIMITS
(Section numbers for details are given in parentheses)

1. Yn →as Yo ⟹ Yn →p Yo (1.8) ⟹ Yn →d Yo (1.9)
2. Yn - Yo →as 0 ⟹ Yn - Yo →p 0 ⟺ Yn - Yo →d 0 (1.8) (1.10)
3. Yn →d c (a constant) ⟺ Yn →p c (1.10)
4. Yn →d Yo ⟺ Eg(Yn) → Eg(Yo) for all bounded continuous g (1.11)
5. ‖Yn - Yo‖ρ → 0 for some ρ > 0 ⟹ Yn →p Yo (1.12)
6. ‖Yn - Yo‖ρ ≤ M (all n) & Yn →p Yo ⟹ ‖Yn - Yo‖λ → 0 for 0 < λ < ρ (1.13)
7. Yn →p Yo ⟹ Ynk →as Yo for some subsequence nk, k = 1,2,... (1.14)
8. Σ_{n≥1} P(|Yn - Yo| > ε) < +∞ for each ε > 0 ⟹ Yn →as Yo (1.15)
9. Σ_{n≥1} E|Yn - Yo|^ρ < +∞ (for some ρ > 0) ⟹ Yn →as Yo (1.15)
10. Yn →d Yo & Zn - Yn →p 0 ⟹ Zn →d Yo (1.16)
11. Yn →p Yo ⟹ g(Yn) →p g(Yo) for all continuous g (1.17)
12. Yn →d Yo ⟹ g(Yn) →d g(Yo) for all continuous g (1.18)


4.1.13. Adding a condition of a uniformly bounded ρ-order mean E|Yn|^ρ ≤ M to convergence in probability Yn →p Yo yields the result that E|Yo|^λ exists for 0 < λ ≤ ρ, and E|Yn|^λ → E|Yo|^λ for 0 < λ < ρ. This result can be restated as "the moments of the limit equal the limit of the moments" for moments of order λ less than ρ. Replacing Yn by Yn - Yo and Yo by 0 gives the result in Figure 4.2.

To prove these results, we will find useful the property of moments that E|Y|^λ ≤ (E|Y|^ρ)^{λ/ρ} for 0 < λ < ρ. (This follows from Hölder's inequality (2.1.11), which states E|UV| ≤ (E|U|^r)^{1/r}(E|V|^s)^{1/s} for r,s > 0 and r^{-1} + s^{-1} = 1, by taking U = |Y|^λ, V = 1, and r = ρ/λ.) An immediate implication is E|Yn|^λ ≤ M^{λ/ρ}. Define g(y,λ,k) = min(|y|^λ, k^λ), and note that since it is continuous and bounded, the Helly-Bray theorem implies Eg(Yn,λ,k) → Eg(Yo,λ,k). Therefore,

M^{λ/ρ} ≥ E|Yn|^λ ≥ Eg(Yn,λ,k) = ∫_{|y|≤k} |y|^λ fn(y)dy + k^λ Prob(|Yn| > k) → ∫_{|y|≤k} |y|^λ fo(y)dy + k^λ Prob(|Yo| > k).

Letting k → ∞ establishes that E|Yo|^λ exists for 0 < λ ≤ ρ. Further, for λ < ρ,

0 ≤ E|Yn|^λ - Eg(Yn,λ,k) ≤ ∫_{|y|>k} |y|^λ fn(y)dy ≤ k^{λ-ρ} ∫_{|y|>k} |y|^ρ fn(y)dy ≤ k^{λ-ρ}M.

Choose k sufficiently large so that k^{λ-ρ}M < ε. The same inequality holds for Yo. Choose n sufficiently large so that |Eg(Yn,λ,k) - Eg(Yo,λ,k)| < ε. Then

|E|Yn|^λ - E|Yo|^λ| ≤ |E|Yn|^λ - Eg(Yn)| + |Eg(Yn) - Eg(Yo)| + |Eg(Yo) - E|Yo|^λ| ≤ 3ε.

This proves that E|Yn|^λ → E|Yo|^λ.

An example shows that E|Zn|^λ → 0 for λ < ρ does not imply E|Zn|^ρ bounded. Take Zn discrete with support {0,n} and probability log(n)/n at n. Then for λ < 1, E|Zn|^λ = log(n)/n^{1-λ} → 0, but E|Zn|¹ = log(n) → +∞.

4.1.14. If Yn →p Yo, then Prob(Wn) → 0. Choose a subsequence nk such that Prob(Wnk) ≤ 2^{-k}. Then Prob(∪_{k>k′} Wnk) ≤ Σ_{k>k′} Prob(Wnk) ≤ Σ_{k>k′} 2^{-k} = 2^{-k′}, implying Ynk →as Yo.


4.1.15. Conditions for a.s. convergence follow from this basic probability theorem:

Theorem 4.2. (Borel-Cantelli) If Ai is any sequence of events in a probability space (S,F,P), then Σ_{i≥1} P(Ai) < +∞ implies that almost surely only a finite number of the events Ai occur. If Ai is a sequence of independent events, then Σ_{i≥1} P(Ai) = +∞ implies that almost surely an infinite number of the events Ai occur.

Apply the Borel-Cantelli theorem to the events Ai = {s ∈ S : |Yi - Yo| > ε} to conclude that Σ_{i≥1} P(Ai) < +∞ implies that almost surely only a finite number of the events Ai occur, and hence |Yi - Yo| ≤ ε for all i sufficiently large. Thus, Yn - Yo →as 0, or Yn →as Yo. For the next result in the table, use (1.12) to get Prob(∪_{m≥n} Wm) ≤ Σ_{m>n} Prob(Wm) ≤ ε^{-ρ} Σ_{m>n} E|Ym - Yo|^ρ. Apply Theorem 4.2 to conclude that if this right-hand expression is finite, then Yn →as Yo. The example at the end of (1.12) shows that almost sure convergence does not imply convergence in ρ-mean. Also, the example mentioned in 1.8, which has convergence in probability but not almost sure convergence, can be constructed to have ρ-mean convergence but not almost sure convergence.

4.1.16. A result termed the Slutsky theorem, which is very useful in applied work, is that if two random variables Yn and Zn have a difference which converges in probability to zero, and if Yn converges in distribution to Yo, then Zn →d Yo also. In this case, Yn and Zn are termed asymptotically equivalent. The argument demonstrating this result is similar to that for 4.1.9. Let Fn and Gn be the CDF's of Yn and Zn respectively. Let y be a continuity point of Fo and define the following events: An = {s : Zn(s) ≤ y}, Bn = {s : Yn(s) ≤ y - ε}, Cn = {s : Yn(s) ≤ y + ε}, Dn = {s : |Yn(s) - Zn(s)| > ε}. Then An ⊆ Cn ∪ Dn and Bn ⊆ An ∪ Dn, implying Fn(y-ε) - Prob(Dn) ≤ Gn(y) ≤ Fn(y+ε) + Prob(Dn). Given δ > 0, one can choose ε > 0 such that y-ε and y+ε are continuity points of Fo, and such that Fo(y+ε) - Fo(y-ε) < δ/3. Then one can choose n sufficiently large so that Prob(Dn) < δ/3, |Fn(y-ε) - Fo(y-ε)| < δ/3 and |Fn(y+ε) - Fo(y+ε)| < δ/3. Then |Gn(y) - Fo(y)| < δ.


4.1.17 A useful property of convergence in probability is the following result:

Theorem 4.3. (Continuous Mapping Theorem) If g(y) is a continuous function on an open set containing the support of Yo, then Yn →p Yo implies g(Yn) →p g(Yo). The result also holds for vectors of random variables, and specializes to the rules that if Y1n →p Y10, Y2n →p Y20, and Y3n →p Y30, then (a) Y1nY2n + Y3n →p Y10Y20 + Y30, and (b) if Prob(|Y20| < ε) = 0 for some ε > 0, then Y1n/Y2n →p Y10/Y20. In these limits, Y10, Y20, and/or Y30 may be constants.

Proof: Given ε > 0, choose M such that P(|Yo| > M) < ε. Let Ao be the set of y in the support of Yo that satisfy |y| ≤ M. Then Ao is compact. Mathematical analysis can be used to show that there exists a nested sequence of sets Ao ⊆ A1 ⊆ A2 ⊆ A3 with A3 an open neighborhood of Ao on which g is continuous, A2 compact, and A1 open. From 4.1.6, liminf_n Fn(A1) ≥ Fo(A1) ≥ 1-ε implies there exists n1 such that for m > n1, Fm(A1) ≥ 1-2ε. The continuity of g implies that for each y ∈ A2, there exists δy > 0 such that |y′-y| < δy ⟹ |g(y′) - g(y)| < ε. These δy-neighborhoods cover A2. Then A2 has a finite subcover. Let δ be the smallest value of δy in this finite subcover. Then, g is uniformly continuous: y, y′ ∈ A2 and |y′-y| < δ imply |g(y′) - g(y)| < ε. Choose n > n1 such that for m > n, P(|Ym - Yo| > δ) < ε/2. Then for m > n, P(|g(Ym) - g(Yo)| > ε) ≤ P(|Ym - Yo| > δ) + P(|Yo| > M) + 1 - Fm(A1) ≤ 4ε.

4.1.18. The preceding result has an analog for convergence in distribution. This result establishes, for example, that if Yn →d Yo, with Yo standard normal and g(y) = y², then Yo² is chi-squared, so that Yn² converges in distribution to a chi-square random variable.

Theorem 4.4. If g(y) is a continuous function on an open set containing the support of Yo, then Yn →d Yo implies g(Yn) →d g(Yo). The result also holds for vectors of random variables.

Proof: The Skorokhod representation given in Theorem 4.1 implies there exist Yn′ and Yo′ that have the same distributions as Yn and Yo, respectively, and satisfy Yn′ →as Yo′. Then, Theorem 4.3 implies g(Yn′) →as g(Yo′), and results 4.1.8 and 4.1.9 above then imply g(Yn′) →d g(Yo′). Because of the common distributions, this is the result in Theorem 4.4. For this reason, this result is also sometimes referred to as (part of) the continuous mapping theorem. The Slutsky theorem, result 4.1.16, is a special case of the continuous mapping Theorems 4.3 and 4.4. For clarity, I also give a direct proof of Theorem 4.4. Construct the sets Ao ⊆ A1 ⊆ A2 ⊆ A3 as in the proof of Theorem 4.3. A theorem from mathematical analysis (Urysohn) states that there exists a continuous function r with values between zero and one that satisfies r(y) = 1 for y ∈ A1 and r(y) = 0 for y ∉ A3. Then g*(y) = g(y)r(y) is continuous everywhere. From the Helly-Bray theorem, Yn →d Yo ⟹ E h(Yn) → E h(Yo) for all continuous bounded h ⟹ E h(g*(Yn)) → E h(g*(Yo)) for all continuous bounded h, since the composition of continuous bounded functions is continuous and bounded, ⟹ g*(Yn) →d g*(Yo). But P(g*(Yn) ≠ g(Yn)) ≤ P(Yn ∉ A1) ≤ 2ε for n sufficiently large, and g*(Yo) = g(Yo). Then, 4.1.16 and g*(Yn) - g(Yn) →p 0 imply g(Yn) →d g(Yo).

4.1.19. Convergence properties are sometimes summarized in a notation called Op(⋅) and op(⋅) which is very convenient for manipulation. (Sometimes too convenient; it is easy to get careless and make mistakes using this calculus.) The definition of op(⋅) is that a random sequence Yn is op(n^α) if n^{-α}Yn converges in probability to zero; and one then writes Yn = op(n^α). Then, Yn →p Yo is also written Yn = Yo + op(1), and more generally n^{-α}(Yn - Yo) →p 0 is written Yn - Yo = op(n^α). Thus op(⋅) is a notation for convergence in probability to zero of a suitably normalized sequence of random variables. When two sequences of random variables Yn and Zn are asymptotically equivalent, so that they satisfy Yn - Zn = op(1), then they have a common limiting distribution by Slutsky's theorem, and this is sometimes denoted Yn ~a Zn.

The notation Yn = Op(1) is defined to mean that given ε > 0, there exists a large M (not depending on n) such that Prob(|Yn| > M) < ε for all n. A sequence with this property is called stochastically bounded. More generally, Yn = Op(n^α) means Prob(|Yn| > Mn^α) < ε for all n. A sequence that is convergent in distribution is stochastically bounded: If Yn →d Yo, then one can find M and no such that ±M are continuity points of Yo, Prob(|Yo| ≤ M) > 1-ε/2, |Fn(M) - Fo(M)| < ε/4 and |Fn(-M) - Fo(-M)| < ε/4 for n > no. Then Prob(|Yn| > M) < ε for n > no. This implies Yn = Op(1). On the other hand, one can have Yn = Op(1) without having convergence to any distribution (e.g., consider Yn ≡ 0 for n odd and Yn standard normal for n even). The notation Yn = Op(n^α) means n^{-α}Yn = Op(1).

Most of the properties of Op(⋅) and op(⋅) are obvious restatements of results from Figure 4.2. For example, n^{-α}Yn = op(1), or n^{-α}Yn →p 0, immediately implies for any ε > 0 that there exists no such that for n > no, Prob(|n^{-α}Yn| > ε) < ε. For each n ≤ no, one can find Mn such that Prob(|n^{-α}Yn| > Mn) < ε. Then, taking M to be the maximum of ε and the Mn for n ≤ no, one has Prob(|n^{-α}Yn| > M) < ε for all n, and hence n^{-α}Yn = Op(1). The results above can be summarized in the following string of implications:

n^{-α}Yn converges in probability to 0 ⟺ n^{-α}Yn = op(1) ⟹ n^{-α}Yn converges in distribution to 0 ⟹ n^{-α}Yn = Op(1).

An abbreviated list of rules for op and Op is given in Figure 4.3. We prove the very useful rule 6 in this figure: Given ε > 0, Yn = Op(n^α) implies there exists M > 0 such that Prob(n^{-α}|Yn| > M) < ε/2. Next, Zn = op(n^β) implies there exists no such that for n > no, Prob(n^{-β}|Zn| > ε/M) < ε/2. Hence Prob(n^{-α-β}|YnZn| > ε) ≤ Prob(n^{-α}|Yn| > M) + Prob(n^{-β}|Zn| > ε/M) < ε. Demonstration of the remaining rules is left as an exercise.


FIGURE 4.3. RULES FOR Op(⋅) AND op(⋅)

Definition: Yn = op(n^α) ⟺ Prob(|n^{-α}Yn| > ε) → 0 for each ε > 0.
Definition: Yn = Op(n^α) ⟺ for each ε > 0, there exists M > 0 such that Prob(|n^{-α}Yn| > M) < ε for all n.

1. Yn = op(n^α) ⟹ Yn = Op(n^α)
2. Yn = op(n^α) & β > α ⟹ Yn = op(n^β)
3. Yn = Op(n^α) & β > α ⟹ Yn = op(n^β)
4. Yn = op(n^α) & Zn = op(n^β) ⟹ YnZn = op(n^{α+β})
5. Yn = Op(n^α) & Zn = Op(n^β) ⟹ YnZn = Op(n^{α+β})
6. Yn = Op(n^α) & Zn = op(n^β) ⟹ YnZn = op(n^{α+β})
7. Yn = op(n^α) & Zn = op(n^β) & β ≥ α ⟹ Yn + Zn = op(n^β)
8. Yn = Op(n^α) & Zn = Op(n^β) & β ≥ α ⟹ Yn + Zn = Op(n^β)
9. Yn = Op(n^α) & Zn = op(n^β) & β > α ⟹ Yn + Zn = op(n^β)
10. Yn = Op(n^α) & Zn = op(n^β) & β < α ⟹ Yn + Zn = Op(n^α)
11. Yn = Op(n^α) & Zn = op(n^α) ⟹ Yn + Zn = Op(n^α)
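The following Python sketch (an added illustration, not from the text) shows the Op/op distinction numerically for sums Sn of i.i.d. standard normal variables: Sn = Op(n^{1/2}) because the 0.95 quantile of |Sn|/n^{1/2} stays near 1.96 for every n, while Sn = op(n) because the 0.95 quantile of |Sn|/n shrinks to zero, which is just the WLLN restated in op form.

```python
import numpy as np

# For normal summands S_n is exactly N(0, n), so we can draw S_n directly.
rng = np.random.default_rng(0)
for n in [100, 10_000, 1_000_000]:
    s = np.sqrt(n) * rng.standard_normal(2_000)        # 2000 draws of S_n
    q_root_n = np.quantile(np.abs(s) / n**0.5, 0.95)   # stable near 1.96: O_p(n^{1/2})
    q_n = np.quantile(np.abs(s) / n, 0.95)             # shrinks to zero: o_p(n)
    print(n, round(q_root_n, 2), round(q_n, 4))
```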

4.2. INDEPENDENT AND DEPENDENT RANDOM SEQUENCES

4.2.1. Consider a sequence of random variables Y1,Y2,Y3,... . The joint distribution (CDF) of a finite subsequence (Y1,...,Yn), denoted F1,...,n(y1,...,yn), is defined as the probability of a state of Nature such that all of the inequalities Y1 ≤ y1,...,Yn ≤ yn hold. The random variables in the sequence are mutually statistically independent if for every finite subsequence Y1,...,Yn, the joint CDF factors:

F1,...,n(y1,...,yn) = F1(y1)⋅⋅⋅Fn(yn).

The variables are independent and identically distributed (i.i.d.) if in addition they have a commonunivariate CDF F1(y). The case of i.i.d. random variables leads to the simplest theory of stochasticlimits, and provides the foundation needed for much of basic econometrics. However, there are


many applications, particularly in analysis of economic time series, where i.i.d. assumptions are notplausible, and a limit theory is needed for dependent random variables. We will define two typesof dependence, martingale and mixing, that will cover a variety of econometric time seriesapplications and require a modest number of tools from probability theory. We have introduced afew of the needed tools in Chapter 3, notably the idea of information contained in σ-fields of events,with the evolution of information captured by refinements of these σ-fields, and the definitions ofmeasurable functions, product σ-fields, and compatability conditions for probabilities defined onproduct spaces. There are treatments of more general forms of dependence than martingale ormixing, but these require a more comprehensive development of the theory of stochastic processes.

4.2.2. Consider a sequence of random variables Yk with k interpreted as an index of (discrete) time. One can think of k as the infinite sequence k ∈ K = {1,2,...}, or as a doubly infinite sequence, extending back in time as well as forward, k ∈ K = {...,-2,-1,0,1,2,...}. The set of states of Nature can be defined as the product space S = ×_{i∈K} ℝ, or S = ℝ^K, where ℝ is the real line, and the complete information σ-field of subsets of S defined as FK = ⊗_{i∈K} B, where B is the Borel σ-field of subsets of the real line; see 3.2. (The same apparatus, with K equal to the real line, can be used to consider continuous time. To avoid a variety of mathematical technicalities, we will not consider the continuous time case here.) Accumulation of information is described by a nondecreasing sequence of σ-fields ... ⊆ G-1 ⊆ G0 ⊆ G1 ⊆ G2 ⊆ ..., with Gt = (⊗_{i≤t} B)⊗(⊗_{i>t} {∅,ℝ}) capturing the idea that at time t the future is unknown. The monotone sequence of σ-fields Gt, t = ...,-1,0,1,2,... is called a filtration. The sequence of random variables Yt is adapted to the filtration if Yt is measurable with respect to Gt for each t. Some authors use the notation σ(...,Yt-2,Yt-1,Yt) for Gt to emphasize that it is the σ-field generated by the information contained in Ys for s ≤ t. The sequence ...,Y-1,Y0,Y1,Y2,... adapted to Gt for t ∈ K is termed a stochastic process. One way of thinking of a stochastic process is to recall that random variables are functions of states of Nature, so that the process is a function Y: S×K → ℝ. Then Y(s,k) is the realization of the random variable in period k, Y(s,⋅) a realization or time-path of the stochastic process, and Y(⋅,k) the random variable in period k. Note that there may be more than one sequence of σ-fields in operation for a particular process. These might correspond, for example, to the information available to different economic agents. We will need in particular the sequence of σ-fields Ht = σ(Yt,Yt+1,Yt+2,...) adapted to the process from time t forward; this is a nonincreasing sequence of σ-fields ... ⊇ Ht-1 ⊇ Ht ⊇ Ht+1 ⊇ ... . Sometimes Gt is termed the natural upward filtration, and Ht the natural downward filtration.

Each subsequence (Ym,...,Ym+n) of the stochastic process has a multivariate CDF Fm,...,m+n(ym,...,ym+n). It is said to be stationary if for each n, this CDF is the same for every m. A stationary process has the obvious property that moments such as means, variances, and covariances between random variables a fixed number of time periods apart are the same for all times m. Referring to 4.2.1, a sequence of i.i.d. random variables is always stationary.


4.2.3. One circumstance that arises in some economic time series is that while the successive random variables are not independent, they have the property that their expectation, given history, is zero. Changes in stock market prices, for example, will have this property if the market is efficient, with arbitragers finding and bidding away any component of change that is predictable from history. A sequence of random variables Xt adapted to Gt is a martingale if almost surely E(Xt|Gt-1) = Xt-1. If Xt is a martingale, then Yt = Xt - Xt-1 satisfies E(Yt|Gt-1) = 0, and is called a martingale difference (m.d.) sequence. Thus, stock price changes in an efficient market form a m.d. sequence. It is also useful to define a supermartingale (resp., submartingale) if almost surely E(Xt|Gt-1) ≤ Xt-1 (resp., E(Xt|Gt-1) ≥ Xt-1). The following result, called the Kolmogorov maximal inequality, is a useful property of martingale difference sequences.

Theorem 4.5. If random variables Yk have the property that E(Yk|Y1,...,Yk-1) = 0, or more technically the property that Yk adapted to σ(...,Yk-1,Yk) is a martingale difference sequence, and if EYk² = σk², then P(max_{1≤k≤n} |Σ_{i=1}^k Yi| > ε) ≤ Σ_{i=1}^n σi²/ε².

Proof: Let Sk = Σ_{i=1}^k Yi. Let Zk be a random variable that is one if |Sj| ≤ ε for j < k and |Sk| > ε, zero otherwise. Note that Σ_{k=1}^n Zk ≤ 1 and E(Σ_{k=1}^n Zk) = P(max_{1≤k≤n} |Σ_{i=1}^k Yi| > ε). The variables Sk and Zk depend only on Yi for i ≤ k. Then E(Sn - Sk|Sk,Zk) = 0. Hence

ESn² ≥ Σ_{k=1}^n ESn²Zk = Σ_{k=1}^n E[Sk + (Sn - Sk)]²Zk ≥ Σ_{k=1}^n ESk²Zk ≥ ε² Σ_{k=1}^n EZk.
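A quick numerical check of Theorem 4.5 (an added illustration, not from the text) for the special case of i.i.d. mean-zero normal summands, which form a martingale difference sequence; the parameter values are arbitrary.

```python
import numpy as np

# P(max_k |S_k| > eps) should not exceed (sum of variances)/eps^2 = n*sigma^2/eps^2.
rng = np.random.default_rng(0)
n, sigma, eps, reps = 50, 1.0, 15.0, 20_000
y = rng.normal(0.0, sigma, size=(reps, n))
max_abs_partial_sum = np.abs(np.cumsum(y, axis=1)).max(axis=1)
print("empirical P(max_k |S_k| > eps):", np.mean(max_abs_partial_sum > eps))
print("Kolmogorov bound              :", n * sigma**2 / eps**2)
```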

4.2.4. As a practical matter, many economic time series exhibit correlation between different time periods, but these correlations dampen away as time differences increase. Bounds on correlations by themselves are typically not enough to give a satisfactory theory of stochastic limits, but a related idea is to postulate that the degree of statistical dependence between random variables approaches negligibility as the variables get further apart in time, because the influence of ancient history is buried in an avalanche of new information (shocks). To formalize this, we introduce the concept of stochastic mixing. For a stochastic process Yt, consider events A ∈ Gt and B ∈ Ht+s; then A draws only on information up through period t and B draws only on information from period t+s on. The idea is that when s is large, the information in A is too stale to be of much use in determining the probability of B, and these events are nearly independent. Three definitions of mixing are given in the table below; they differ only in the manner in which they are normalized, but this changes their strength in terms of how broadly they hold and what their implications are. When the process is stationary, mixing depends only on time differences, not on time location.


Form of Mixing    Coefficient    Definition (for all A ∈ Gt and B ∈ Ht+s, and all t)
Strong            α(s) → 0       |P(A∩B) - P(A)P(B)| ≤ α(s)
Uniform           φ(s) → 0       |P(A∩B) - P(A)P(B)| ≤ φ(s)P(A)
Strict            ψ(s) → 0       |P(A∩B) - P(A)P(B)| ≤ ψ(s)P(A)P(B)

There are links between the mixing conditions and bounds on correlations between events that are remote in time:

(1) Strict mixing ⟹ Uniform mixing ⟹ Strong mixing.
(2) (Serfling) If the Yt are uniform mixing with EYt = 0 and EYt² = σt² < +∞, then |EYtYt+s| ≤ 2φ(s)^{1/2} σt σt+s.
(3) (Ibragimov) If the Yt are strong mixing with EYt = 0 and E|Yt|^d < +∞ for some d > 2, then |EYtYt+s| ≤ 8α(s)^{1-2/d} σt σt+s.
(4) If there exists a sequence ρn with lim_{n→∞} ρn = 0 such that |E(U-EU)(W-EW)| ≤ ρn [(E(U-EU)²)(E(W-EW)²)]^{1/2} for all bounded continuous functions U = g(Y1,...,Yt) and W = h(Yt+n,...,Yt+n+m) and all t, n, m, then the Yt are strict mixing.

An example gives an indication of the restrictions on a dependent stochastic process that produce strong mixing at a specified rate. First, suppose a stationary stochastic process Yt satisfies Yt = ρYt-1 + Zt, with the Zt independent standard normal. Then var(Yt) = 1/(1-ρ²) and cov(Yt+s,Yt) = ρ^s/(1-ρ²), and one can show with a little analysis that |P(Yt+s ≤ a, Yt ≤ b) - P(Yt+s ≤ a)P(Yt ≤ b)| ≤ ρ^s/[π(1 - ρ^{2s})^{1/2}]. Hence, this process is strong mixing with a mixing coefficient that declines at a geometric rate. This is true more generally of processes that are formed by taking stationary linear transformations of independent processes. We return to this subject in the chapter on time series analysis.
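The following Python sketch (an added illustration, not from the text) simulates this AR(1) example and compares sample autocovariances with the theoretical values ρ^s/(1-ρ²); the choice ρ = 0.7 and the sample length are arbitrary.

```python
import numpy as np

# Simulate Y_t = rho*Y_{t-1} + Z_t with standard normal Z_t, started in the
# stationary distribution, and check the geometric decay of the autocovariances.
rng = np.random.default_rng(0)
rho, T = 0.7, 200_000
z = rng.standard_normal(T)
y = np.empty(T)
y[0] = z[0] / np.sqrt(1 - rho**2)
for t in range(1, T):
    y[t] = rho * y[t - 1] + z[t]

for s in [1, 2, 5, 10]:
    sample_cov = np.cov(y[:-s], y[s:])[0, 1]
    theory = rho**s / (1 - rho**2)
    print(s, round(sample_cov, 3), round(theory, 3))
```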

4.3. LAWS OF LARGE NUMBERS

4.3.1. Consider a sequence of random variables Y1,Y2,... and a corresponding sequence of averages Xn = n^{-1} Σ_{i=1}^n Yi for n = 1,2,... . Laws of large numbers give conditions under which the averages Xn converge to a constant, either in probability (weak laws, or WLLN) or almost surely (strong laws, or SLLN). Laws of large numbers give formal content to the intuition that sample averages are accurate analogs of population averages when the samples are large, and are essential to establishing that statistical estimators for many problems have the sensible property that with sufficient data they are likely to be close to the population values they are trying to estimate. In econometrics, convergence in probability provided by a WLLN suffices for most purposes. However, the stronger result of almost sure convergence is occasionally useful, and is often attainable without additional assumptions.

4.3.2. Figure 4.4 lists a sequence of laws of large numbers. The case of independent identically distributed (i.i.d.) random variables yields the strongest result (Kolmogorov I). With additional conditions it is possible to get laws of large numbers even for correlated variables, provided the correlations of distant random variables approach zero sufficiently rapidly.

FIGURE 4.4. LAWS OF LARGE NUMBERS FOR Xn = n^{-1} Σ_{k=1}^n Yk

WEAK LAWS (WLLN)

1. (Khinchine) If the Yk are i.i.d., and EYk = µ, then Xn →p µ.
2. (Chebyshev) If the Yk are uncorrelated with EYk = µ and E(Yk - µ)² = σk² satisfying Σ_{k≥1} σk²/k² < +∞, then Xn →p µ.
3. If the Yk have EYk = µ, E(Yk - µ)² ≤ σk², and E(Yk - µ)(Ym - µ) ≤ ρkm σk σm with Σ_{k≥1} σk²/k^{3/2} < +∞ and lim_n n^{-1} Σ_{k=1}^n Σ_{m=1}^n ρkm < +∞, then Xn →p µ.

STRONG LAWS (SLLN)

1. (Kolmogorov I) If the Yk are i.i.d., and EYk = µ, then Xn →as µ.
2. (Kolmogorov II) If the Yk are independent, with EYk = µ, and E(Yk - µ)² = σk² satisfying Σ_{k≥1} σk²/k² < +∞, then Xn →as µ.
3. (Martingale) If Yk adapted to σ(...,Yk-1,Yk) is a martingale difference sequence with EYk² = σk² and Σ_{k≥1} σk²/k² < +∞, then Xn →as 0.
4. (Serfling) If the Yk have EYk = µ, E(Yk - µ)² = σk², and E(Yk - µ)(Ym - µ) ≤ ρ_{|k-m|} σk σm, with Σ_{k≥1} (log k)² σk²/k² < +∞ and Σ_{s≥1} ρs < +∞, then Xn →as µ.


To show why WLLN work, I outline proofs of the first three laws in Figure 4.4.

Theorem 4.6. (Khinchine) If the Yk are i.i.d., and EYk = µ, then Xn →p µ.

Proof: The argument shows that the characteristic function (c.f.) of Xn converges pointwise to the c.f. for a constant random variable µ. Let ψ(t) be the c.f. of Y1. Then Xn has c.f. ψ(t/n)^n. Since EY1 exists, ψ has a Taylor's expansion ψ(t) = 1 + ψ′(λt)t, where 0 < λ < 1 (see 3.5.12). Then ψ(t/n)^n = [1 + (t/n)ψ′(λt/n)]^n. But ψ′(λt/n) → ψ′(0) = ιµ. A result from 2.1.10 states that if a sequence of scalars αn has a limit, then [1+αn/n]^n → exp(lim αn). Then ψ(t/n)^n → e^{ιµt}. But this is the c.f. of a constant random variable µ, implying Xn →d µ, and hence Xn →p µ.
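A small simulation (an added illustration, not from the text) of what the Khinchine WLLN and the Kolmogorov I SLLN assert: along a single realization of i.i.d. exponential draws with mean µ = 1, the running average settles at µ.

```python
import numpy as np

# Running averages of i.i.d. exponential(1) draws converge to the mean mu = 1.
rng = np.random.default_rng(0)
y = rng.exponential(1.0, size=1_000_000)
running_avg = np.cumsum(y) / np.arange(1, y.size + 1)
for n in [10, 1_000, 100_000, 1_000_000]:
    print(n, round(running_avg[n - 1], 4))
```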

Theorem 4.7. (Chebyshev) If the Yk are uncorrelated with EYk = µ and E(Yk - µ)² = σk² satisfying Σ_{k≥1} σk²/k² < +∞, then Xn →p µ.

Proof: One has E(Xn - µ)² = n^{-2} Σ_{k=1}^n σk². Kronecker's Lemma (see 2.1.9) establishes that Σ_{k≥1} σk²/k² bounded implies E(Xn - µ)² → 0. Then Chebyshev's inequality implies Xn →p µ.

The condition that Σ_{k≥1} σk²/k² is bounded in Theorem 4.7 is obviously satisfied if σk² is uniformly bounded, but is also satisfied if σk² grows modestly with k; e.g., it is sufficient to have σk²(log k)²/k bounded.

Theorem 4.8. (WLLN 3) If the Yk have EYk = µ, E(Yk - µ)² ≤ σk², and E(Yk - µ)(Ym - µ) ≤ ρkm σk σm with Σ_{k≥1} σk²/k^{3/2} < +∞ and lim_n n^{-1} Σ_{k=1}^n Σ_{m=1}^n ρkm < +∞, then Xn →p µ.

Proof: Using Chebyshev's inequality, it is sufficient to show that E(Xn - µ)² converges to zero. The Cauchy-Schwartz inequality (see 2.1.11) is applied first to establish

[n^{-1} Σ_{m=1}^n σm ρkm]² ≤ [n^{-1} Σ_{m=1}^n σm²][n^{-1} Σ_{m=1}^n ρ²km],

and then to establish that

E(Xn - µ)² = n^{-2} Σ_{k=1}^n Σ_{m=1}^n σk σm ρkm = n^{-1} Σ_{k=1}^n σk [n^{-1} Σ_{m=1}^n σm ρkm]
≤ [n^{-1} Σ_{k=1}^n σk²]^{1/2} [n^{-1} Σ_{k=1}^n (n^{-1} Σ_{m=1}^n σm ρkm)²]^{1/2}
≤ [n^{-1} Σ_{k=1}^n σk²]^{1/2} [n^{-1} Σ_{m=1}^n σm² ⋅ n^{-2} Σ_{k=1}^n Σ_{m=1}^n ρ²km]^{1/2}
= [n^{-1} Σ_{k=1}^n σk²][n^{-2} Σ_{k=1}^n Σ_{m=1}^n ρ²km]^{1/2}
= [n^{-3/2} Σ_{k=1}^n σk²][n^{-1} Σ_{k=1}^n Σ_{m=1}^n ρ²km]^{1/2}.

The last form and Kronecker's lemma (2.1.9) give the result.

The conditions for this result are obviously met if the σk² are uniformly bounded and the correlation coefficients decline at a sufficient rate with the distance between observations; examples are a geometric decline with ρkm bounded by a multiple of λ^{|k-m|} for some λ < 1, and an arithmetic decline with ρkm bounded by a multiple of |k-m|^{-1}.

The Kolmogorov SLLN 1 is a better result than the Khinchine WLLN, yielding a stronger conclusion from the same assumptions. Similarly, the Kolmogorov SLLN 2 is a better result than the Chebyshev WLLN. Proofs of these theorems can be found in C. R. Rao (1973), p. 114-115. The Serfling SLLN 4 is broadly comparable to WLLN 3, but Serfling gets the stronger almost sure conclusion with somewhat stronger assumptions on the correlations and somewhat weaker assumptions on the variances. If variances are uniformly bounded and correlation coefficients decline at least at a rate inversely proportional to the square of the time difference, this is sufficient for either the WLLN 3 or SLLN 4 assumptions.

The SLLN 3 in the table applies to martingale difference sequences, and shows that Kolmogorov II actually holds for m.d. sequences.

Theorem 4.9. If Yk adapted to σ(...,Yk-1,Yk) is a martingale difference sequence with EYk² = σk² and Σ_{k≥1} σk²/k² < +∞, then Xn →as 0.

Proof: The theorem is stated and proved by J. Davidson (1994), p. 314. To give an idea why SLLN work, I will give a simplified proof when the assumption Σ_{k≥1} σk²/k² < +∞ is strengthened to Σ_{k≥1} σk²/k^{3/2} < +∞. Either assumption handles the case of constant variances with room to spare. Kolmogorov's maximal inequality (Theorem 4.5) with n = (m+1)² and ε = δm² implies that

P(max_{m²≤k≤(m+1)²} |Xk| > δ) ≤ P(max_{1≤k≤(m+1)²} |Σ_{i=1}^k Yi| > δm²) ≤ Σ_{i=1}^{(m+1)²} σi²/δ²m⁴.

The sum over m of the right-hand-side of this inequality satisfies

Σ_{m≥1} Σ_{i=1}^{(m+1)²} σi²/δ²m⁴ = Σ_{i≥1} (σi²/δ²) Σ_{m≥i^{1/2}-1} m^{-4} ≤ 36 Σ_{i≥1} σi²/i^{3/2}δ².

Then Σ_{m≥1} P(max_{m²≤k≤(m+1)²} |Xk| > δ) ≤ 36 Σ_{i≥1} σi²/i^{3/2}δ² < +∞, and Theorem 4.2 gives the result.

4.4. CENTRAL LIMIT THEOREMS

4.4.1. Consider a sequence of random variables Y1,...,Yn with zero means, and the associated sequence of scaled averages Zn = n^{-1/2} Σ_{i=1}^n Yi. Central limit theorems (CLT) are concerned with conditions under which the Zn, or variants with more generalized scaling, converge in distribution to a normal random variable Zo. I will present several basic CLT, prove the simplest, and discuss the remainder. These results are summarized in Figure 4.5.

The most straightforward CLT is obtained for independent and identically distributed (i.i.d.) random variables, and requires only that the random variables have a finite variance. Note that the finite variance assumption is an additional condition needed for the CLT that was not needed for the SLLN for i.i.d. variables.

Theorem 4.10. (Lindeberg-Levy) If random variables Yk are i.i.d. with mean zero and finite positive variance σ², then Zn →d Zo ~ N(0,σ²).

Proof: The approach is to show that the characteristic function of Zn converges for each argument to the characteristic function of a normal. The CLT then follows from the limit properties of characteristic functions (see 3.5.12). Let ψ(t) be the c.f. of Y1. Then Zn has c.f. ψ(tn^{-1/2})^n. Since EY1 = 0 and EY1² = σ², ψ(t) has a Taylor's expansion ψ(t) = 1 + ψ″(λt)t²/2, where 0 < λ < 1 and ψ″ is continuous with ψ″(0) = -σ². Then ψ(tn^{-1/2})^n = [1 + ψ″(λtn^{-1/2})t²/2n]^n. Then the limit result 2.1.10 gives lim_n [1 + ψ″(λtn^{-1/2})t²/2n]^n = exp(-σ²t²/2). Thus, the c.f. of Zn converges for each t to the c.f. of Zo ~ N(0,σ²).
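The following simulation (an added illustration, not from the text) makes the Lindeberg-Levy statement concrete for centered uniform summands with σ² = 1/12: the scaled sum's tail probability is close to the corresponding normal value.

```python
import numpy as np

# Z_n = n^{-1/2} * sum of i.i.d. Uniform(-1/2, 1/2) draws, compared with N(0, 1/12).
rng = np.random.default_rng(0)
n, reps = 400, 50_000
sigma = (1 / 12) ** 0.5
z = rng.uniform(-0.5, 0.5, size=(reps, n)).sum(axis=1) / n**0.5
print("simulated P(Z_n > 1.96*sigma):", np.mean(z > 1.96 * sigma), "(0.025 under normality)")
```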

4.4.2. When the variables are independent but not identically distributed, an additional boundon the behavior of tails of the distributions of the random variables, called the Lindeberg condition,is needed. This condition ensures that sources of relatively large deviations are spread fairly evenlythrough the series, and not concentrated in a limited number of observations. The Lindebergcondition can be difficult to interpret and check, but there are a number of sufficient conditions thatare useful in applications. The main result, stated next, allows more general scaling than by n-1/2.


FIGURE 4.5. CENTRAL LIMIT THEOREMS FOR Zn = n^{-1/2} Σ_{i=1}^n Yi

1. (Lindeberg-Levy) Yk i.i.d., EYk = 0, EYk² = σ² positive and finite ⟹ Zn →d Zo ~ N(0,σ²).

2. (Lindeberg-Feller) If Yk independent, EYk = 0, EYk² = σk² ∈ (0,+∞), cn² = Σ_{k=1}^n σk², then cn² → +∞, lim_n max_{1≤k≤n} σk/cn = 0, and Un = Σ_{k=1}^n Yk/cn →d Uo ~ N(0,1) if and only if the Lindeberg condition holds: for each ε > 0, Σ_{k=1}^n E Yk²1(|Yk| > εcn)/cn² → 0.

3. If Yk independent, EYk = 0, EYk² = σk² ∈ (0,+∞), and cn² = Σ_{k=1}^n σk² have cn² → +∞ and lim_n max_{1≤k≤n} σk/cn = 0, then each of the following conditions is sufficient for the Lindeberg condition:
(i) For some r > 2, Σ_{k=1}^n E|Yk|^r/cn^r → 0.
(ii) (Liapunov) For some r > 2, E|Yk/σk|^r is bounded uniformly for all k.
(iii) For some r > 2, E|Yk|^r is bounded, and ck²/k is bounded positive, uniformly for all k.

4. Yk a martingale difference sequence adapted to σ(...,Yk-1,Yk) with |Yk| < M for all k and EYk² = σk² satisfying n^{-1} Σ_{k=1}^n σk² → σo² > 0 ⟹ Zn →d Zo ~ N(0,σo²).

5. (Ibragimov-Linnik) Yk stationary and strong mixing with EYk = 0, EYk² = σ² ∈ (0,+∞), EYk+sYk = σ²ρs, and for some r > 2, E|Yk|^r < +∞ and Σ_{k≥1} α(k)^{1-2/r} < +∞ ⟹ Σ_{s≥1} |ρs| < +∞ and Zn →d Zo ~ N(0, σ²(1 + 2Σ_{s≥1} ρs)).


Theorem 4.11. (Lindeberg-Feller) Suppose random variables Yk are independent with mean zero and positive finite variances σk². Define cn² = Σ_{k=1}^n σk² and Un = Σ_{k=1}^n Yk/cn. Then cn² → +∞, lim_n max_{1≤k≤n} σk/cn = 0, and Un →d Uo ~ N(0,1) if and only if the Yk satisfy the Lindeberg condition that for ε > 0, lim_n Σ_{k=1}^n E Yk²1(|Yk| > εcn)/cn² = 0.

A proof of Theorem 4.11 can be found, for example, in P. Billingsley (1986), p. 369-375. It involves an analysis of the characteristic functions, with detailed analysis of the remainder terms in their Taylor's expansion. To understand the theorem, it is useful to first specialize it to the case that the σk² are all the same. Then cn² = nσ1², the conditions cn² → +∞ and lim_n max_{1≤k≤n} σk/cn = 0 are met automatically, and in the terminology at the start of this section, Un = Zn/σ1. The theorem then says Un →d Uo ~ N(0,1) if and only if the sample average n^{-1} Σ_{k=1}^n E Yk²1(|Yk| > εn^{1/2}) converges to zero for each ε > 0. The last condition limits the possibility that the deviations in a single random variable could be as large in magnitude as the sum, so that the shape of the distribution of this variable makes a significant contribution to the shape of the distribution of the sum. An example shows how the Lindeberg condition bites. Consider independent random variables Yk that equal ±k^r with probability 1/2k^{2r} each, and zero otherwise, where r is a positive scalar. The Yk have mean zero and variance one, and 1(|Yk| > εn^{1/2}) = 1 if k^r > εn^{1/2}, implying n^{-1} Σ_{k=1}^n E Yk²1(|Yk| > εn^{1/2}) = max(0, 1 - ε^{1/r}n^{(1-2r)/2r}). This converges to zero, so the Lindeberg condition is satisfied, iff r < 1/2. Thus, the tails of the sequence of random variables cannot fatten too rapidly.

The Lindeberg condition allows the variances of the Yk to vary within limits. For example, the variables Yk = ±2^k with probability 1/2 have σn/cn bounded positive, so that the variances grow too rapidly and the condition fails. The variables Yk = ±2^{-k} with probability 1/2 have cn bounded, so that σ1/cn is bounded positive, the variances shrink too rapidly, and the condition fails. The next result gives some easily checked sufficient conditions for the Lindeberg condition.

Theorem 4.12. Suppose random variables Yk are independent with mean zero and positive finite variances σk² that satisfy cn² = Σ_{k=1}^n σk² → +∞ and lim_n max_{1≤k≤n} σk/cn = 0. Then, each of the following conditions is sufficient for the Lindeberg condition to hold:

(i) For some r > 2, Σ_{k=1}^n E|Yk|^r/cn^r → 0.
(ii) (Liapunov) For some r > 2, E|Yk/σk|^r is bounded uniformly for all k.
(iii) For some r > 2, E|Yk|^r is bounded, and ck²/k is bounded positive, uniformly for all k.

Proof: To show that (i) implies the Lindeberg condition, write

Σ_{k=1}^n E Yk²1(|Yk| > εcn)/cn² ≤ (εcn)^{2-r} Σ_{k=1}^n E|Yk|^r 1(|Yk| > εcn)/cn² ≤ ε^{2-r} Σ_{k=1}^n E|Yk/cn|^r.

For (ii), the middle expression in the string of inequalities above satisfies

(εcn)^{2-r} Σ_{k=1}^n E|Yk|^r 1(|Yk| > εcn)/cn² ≤ ε^{2-r}(max_{k≤n} E|Yk/σk|^r) Σ_{k=1}^n σk^r/cn^r
≤ ε^{2-r}(max_{k≤n} E|Yk/σk|^r) [Σ_{k=1}^n (σk²/cn²)](max_{k≤n} (σk/cn)^{r-2}),

and the assumptions ensure that max_{k≤n} E|Yk/σk|^r is bounded and max_{k≤n} (σk/cn)^{r-2} → 0.

Finally, if (iii), then continuing the first string of inequalities,

Σ_{k=1}^n E|Yk|^r/cn^r ≤ cn^{2-r} n(sup_k E|Yk|^r)/[n(inf_n cn²/n)],

and the right-hand-side is proportional to cn^{2-r}, which goes to zero.

4.4.3. The following theorem establishes a CLT for the scaled sum Zn = n^{-1/2} Σ_{i=1}^n Yi of martingale differences, or Zn = n^{-1/2}(Xn - Xo). The uniform boundedness assumption in this theorem is a strong restriction, but it can be relaxed to a Lindeberg condition or to a uniform integrability condition; see P. Billingsley (1984), p. 498-501, or J. Davidson (1994), p. 385. Martingale differences can display dependence that corresponds to important economic applications, such as conditional variances that depend systematically on history.

Theorem 4.13. Suppose Yk is a martingale difference adapted to σ(...,Yk-1,Yk), and Yk satisfies a uniform bound |Yk| < M. Let EYk² = σk², and assume that n^{-1} Σ_{k=1}^n σk² → σo² > 0. Then Zn →d Zo ~ N(0,σo²).

4.4.4. Intuitively, the CLT results that hold for independent or martingale difference randomvariables should continue to hold if the degree of dependence between variables is negligible. Thefollowing theorem from I. Ibragimov and Y. Linnik, 1971, gives a CLT for stationary strong mixingprocesses. This result will cover a variety of economic applications, including stationary lineartransformations of independent processes like the one given in the last example.


Theorem 4.14. (Ibragimov-Linnik) Suppose Yk is stationary and strong mixing with mean zero, variance σ², and covariances EYk+sYk = σ²ρs. Suppose that for some r > 2, E|Yk|^r < +∞ and Σ_{k≥1} α(k)^{1-2/r} < +∞. Then Σ_{s≥1} |ρs| < +∞ and Zn →d Zo ~ N(0, σ²(1 + 2Σ_{s≥1} ρs)).

4.5. EXTENSIONS OF LIMIT THEOREMS

4.5.1. Limit theorems can be extended in several directions: (1) obtaining results for triangular arrays that include weighted sums of random variables, (2) sharpening the rate of convergence to the limit for "well-behaved" random variables, and (3) establishing "uniform" laws that apply to random functions. In addition, there are a variety of alternatives to the cases given above where independence assumptions are relaxed. The first extension gives limit theorems for random variables weighted by other (non-random) variables, a situation that occurs often in econometrics. The second extension provides tools that allow us to bound the probability of large deviations of random sums. This is of direct interest as a sharper version of a Chebyshev-type inequality, and also useful in obtaining further results. To introduce uniform laws, first define a random function (or stochastic process) y = Y(θ,s) that maps a state of Nature s and a real variable (or vector of variables) θ into the real line. This may also be written, suppressing the dependence on s, as Y(θ). Note that Y(⋅,s) is a realization of the random function, and is itself an ordinary non-random function of θ. For each value of θ, Y(θ,⋅) is an ordinary random variable. A uniform law is one that bounds sums of random functions uniformly for all arguments θ. For example, a uniform WLLN would say lim_n→∞ P(sup_θ |n^{-1} Σ_{i=1}^n Yi(θ,⋅)| > ε) = 0. Uniform laws play an important role in establishing the properties of statistical estimators that are nonlinear functions of the data, such as maximum likelihood estimates.

4.5.2. Consider a doubly indexed array of constants ain defined for 1 ≤ i ≤ n and n = 1,2,..., and weighted sums of the form Xn = Σ_{i=1}^n ain Yi. If the Yi are i.i.d., what are the limiting properties of Xn? We next give a WLLN and a CLT for weighted sums. The way arrays like ain typically arise is that there are some weighting constants ci, and either ain = ci/Σ_{j=1}^n cj or ain = ci/[Σ_{j=1}^n cj]^{1/2}. If ci = 1 for all i, then ain = n^{-1} or n^{-1/2}, respectively, leading to the standard scaling in limit theorems.


Theorem 4.15. Assume random variables Yi are independently identically distributed with mean zero. If an array ain satisfies lim_n Σ_{j=1}^n |ajn| < +∞ and lim_n max_{j≤n} |ajn| = 0, then Xn →p 0.

Proof: This is a weighted version of Khinchine's WLLN, and is proved in the same way. Let ζ(t) be the second characteristic function of Y1. From the properties of characteristic functions we have ζ(0) = 0, ζ′(0) = ιEY1 = 0, and a Taylor's expansion ζ(t) = tζ′(λt) for some 0 < λ < 1. The second characteristic function of Xn is then γ(t) = Σ_{i=1}^n ain t ζ′(λin ain t), implying |γ(t)| ≤ Σ_{i=1}^n |ain t ζ′(λin ain t)| ≤ |t|(max_{i≤n} |ζ′(λin ain t)|) Σ_{i=1}^n |ain|. Then lim Σ_{i=1}^n |ain| < +∞ and lim (max_{i≤n} |ain|) = 0 imply γ(t) → 0 for each t, and hence Xn converges in distribution, hence in probability, to 0.

Theorem 4.16. Assume random variables Yi are i.i.d. with mean zero and variance σ² ∈ (0,+∞). If an array ain satisfies lim_n max_{j≤n} |ajn| = 0 and lim_n Σ_{i=1}^n ain² = 1, then Xn →d Xo ~ N(0,σ²).

Proof: The argument parallels the Lindeberg-Levy CLT proof. The second characteristic function of Xn has the Taylor's expansion γ(t) = -(1/2)σ²t² Σ_{i=1}^n ain² + Σ_{i=1}^n [ζ″(λin ain t) + σ²] ain² t²/2, where λin ∈ (0,1). The limit assumptions imply that |γ(t) + (1/2)σ²t² Σ_{i=1}^n ain²| is bounded in magnitude by Σ_{i=1}^n |ζ″(λin ain t) + σ²| ain² t²/2 ≤ [Σ_{i=1}^n ain² t²/2] max_{i≤n} |ζ″(λin ain t) + σ²|. This converges to zero for each t since lim_n max_{i≤n} |ζ″(λin ain t) + σ²| = 0, and Σ_{i=1}^n ain² → 1. Therefore, γ(t) converges to the characteristic function of a normal with mean 0 and variance σ².

4.5.3. The limit theorems 4.15 and 4.16 are special cases of a limit theory for what are called triangular arrays of random variables, Ynt with t = 1,2,...,n and n = 1,2,3,... . (One additional level of generality could be introduced by letting t range from 1 up to a function of n that increases to infinity, but this is not needed for most applications.) This setup will include simple cases like Ynt = Zt/n or Ynt = Zt/n^{1/2}, and more general weightings like Ynt = antZt with an array of constants ant, but can also cover more complicated cases. We first give limit theorems for Ynt that are uncorrelated or independent within each row. These are by no means the strongest obtainable, but they have the merit of simple proofs.

Theorem 4.17. Assume random variables Ynt for t = 1,2,...,n and n = 1,2,3,... are uncorrelated across t for each n, with EYnt = 0 and EYnt² = σnt². Then Σ_{t=1}^n σnt² → 0 implies Σ_{t=1}^n Ynt →p 0.

Proof: Apply Chebyshev's inequality.

Theorem 4.18. Assume random variables Ynt for t = 1,2,...,n and n = 1,2,3,... are independent across t for each n, with EYnt = 0, EYnt² = σnt², Σ_{t=1}^n σnt² → 1, Σ_{t=1}^n E|Ynt|³ → 0, and Σ_{t=1}^n σnt⁴ → 0. Then Xn = Σ_{t=1}^n Ynt →d Xo ~ N(0,1).

Proof: From the properties of characteristic functions (see 3.5.12), the c.f. of Ynt has a Taylor's expansion that satisfies |ψnt(s) - 1 + s²σnt²/2| ≤ |s|³E|Ynt|³/6. Therefore, the c.f. γn(s) of Xn satisfies log γn(s) = Σ_{t=1}^n log(1 - s²σnt²/2 + λnt|s|³E|Ynt|³/6), where |λnt| ≤ 1. From 2.1.10, we have the inequality that for |a| < 1/3 and |b| < 1/3, |log(1+a+b) - a| < 4|b| + 3a². Then, the assumptions guarantee that |log γn(s) + s² Σ_{t=1}^n σnt²/2| ≤ 4|s|³ Σ_{t=1}^n E|Ynt|³/6 + 3s⁴ Σ_{t=1}^n σnt⁴/4. The assumptions then imply that log γn(s) → -s²/2, establishing the result.

In the last theorem, note that if Ynt = n^{-1/2}Zt, then E|Zt|³ bounded is sufficient to satisfy all the assumptions. Another set of limit theorems can be stated for triangular arrays with the property that the random variables within each row form a martingale difference sequence. Formally, consider random variables Ynt for t = 1,...,n and n = 1,2,3,... that are adapted to σ-fields Gnt that are a filtration in t for each n, with the property that E(Ynt|Gn,t-1) = 0; this is called a martingale difference array. A WLLN for this case is adapted from J. Davidson (1994), p. 299.

Theorem 4.19. If Ynt and Gnt for t = 1,...,n and n = 1,2,3,... is an adapted martingale difference array with |Ynt| ≤ M, EYnt² = σnt², σnt uniformly bounded, and Σ_{t=1}^n σnt² → 0, then Σ_{t=1}^n Ynt →p 0.

The following CLT for martingale difference arrays is taken from D. Pollard (1984), p. 170-174.

Theorem 4.20. If Ynt and Gnt for t = 1,...,n and n = 1,2,3,... is an adapted martingale difference array, λnt² = E(Ynt²|Gn,t-1) is the conditional variance of Ynt, Σ_{t=1}^n λnt² →p σ² ∈ (0,+∞), and if for each ε > 0, Σ_{t=1}^n E Ynt²1(|Ynt| > ε) → 0, then Xn = Σ_{t=1}^n Ynt →d Xo ~ N(0,σ²).


4.5.4. Chebyshev's inequality gives an easy, but crude, bound on the probability in the tail of a density. For random variables with well-behaved tails, there are sharper bounds that can be used to get sharper limit theorems. The following inequality, due to Hoeffding, is one of a series of results called exponential inequalities that are stated and proved in D. Pollard (1984), p. 191-193: If Yn are independent random variables with zero means that satisfy the bounds -an ≤ Yn ≤ bn, then

P(n^{-1} Σ_{i=1}^n Yi ≥ ε) ≤ exp(-2n²ε²/Σ_{i=1}^n (bi+ai)²).

Note that in Hoeffding's inequality, if |Yn| ≤ M, then P(|n^{-1} Σ_{i=1}^n Yi| ≥ ε) ≤ 2exp(-nε²/2M²). The next theorem gets a strong law of large numbers with weaker than usual scaling:

Theorem 4.21. If Yn are independent random variables with zero means and |Yn| ≤ M, then Xn = n^{-1} Σ_{i=1}^n Yi satisfies Xk k^{1/2}/log(k) →as 0.

Proof: Hoeffding's inequality implies Prob(|k^{1/2}Xk| > ε log k) < 2exp(-ε²(log k)²/2M²), and hence

Σ_{k≥n} Prob(|k^{1/2}Xk| > ε log k) ≤ 2 ∫_{z≥n-1} exp(-ε²(log z)²/2M²) dz ≤ (6/ε) exp(M²/2ε²) Φ(-ε(log n)/M + M/ε),

with the standard normal CDF Φ resulting from direct integration. Applying Theorem 4.2, this inequality implies n^{1/2}Xn/log n →as 0.

If the Yi are not necessarily bounded, but have a proper moment generating function, one can getan exponential bound from the moment generating function.

Theorem 4.22. If i.i.d. mean-zero random variables Yi have a proper moment generating function, then Xn = n^{-1} Σ_{i=1}^n Yi satisfies P(Xn > ε) < exp(-τεn^{1/2} + κ), where τ and κ are positive constants determined by the distribution of Yi.

Proof: P(Z > ε) = ∫_{z>ε} F(dz) ≤ ∫_{z>ε} e^{(z-ε)t} F(dz) ≤ e^{-εt} E e^{Zt} for a random variable Z and t > 0. Let m(t) be the moment generating function of Yi and τ be a constant such that m(t) is finite for t ≤ 2τ. Then one has m(t) = 1 + m″(λt)t²/2 for some λ < 1, for each t ≤ 2τ, from the properties of mgf's (see 3.5.12). The mgf of Xn is m(t/n)^n = (1 + m″(λt/n)t²/2n²)^n, finite for t/n ≤ 2τ. Replace t/n by τn^{-1/2} and observe that m″(λt/n) ≤ m″(τn^{-1/2}) and (1 + m″(τn^{-1/2})τ²/2n)^n ≤ exp(m″(τn^{-1/2})τ²/2). Substituting these expressions in the initial inequality gives P(Xn > ε) ≤ exp(-τεn^{1/2} + m″(τn^{-1/2})τ²/2), and the result holds with κ = m″(τ)τ²/2.

Using the same argument as in the proof of Theorem 4.21 and the inequality P(Xn > ε) < exp(-ετn^{1/2}+κ) from Theorem 4.22, one can show that Xk·k^{1/2}/(log k)² →as 0, a SLLN with weak scaling.

4.5.5. This section states a uniform SLLN for random functions on a compact set Θ in a Euclidean space ℝ^k. Let (S,F,P) denote a probability space. Define a random function as a mapping Y from Θ×S into ℝ with the property that for each θ ∈ Θ, Y(θ,·) is measurable with respect to (S,F,P). Note that Y(θ,·) is simply a random variable, and that Y(·,s) is simply a function of θ ∈ Θ. Usually, the dependence of Y on the state of nature is suppressed, and we simply write Y(θ). A random function is also called a stochastic process, and Y(·,s) is termed a realization of this process. A random function Y(θ,·) is almost surely continuous at θo ∈ Θ if for s in a set that occurs with probability one, Y(·,s) is continuous in θ at θo. It is useful to spell out this definition in more detail. For each ε > 0, define Ak(ε,θo) = {s ∈ S | sup_{|θ-θo|≤1/k} |Y(θ,s) - Y(θo,s)| > ε}. Almost sure continuity states that these sets converge monotonically as k → ∞ to a set Ao(ε,θo) that has probability zero.

The condition of almost sure continuity allows the modulus of continuity to vary with s, so there is not necessarily a fixed neighborhood of θo, independent of s, on which the function varies by less than ε. For example, the function Y(θ,s) = θ^s for θ ∈ [0,1] and s uniform on [0,1] is continuous at θ = 0 for every s, but Ak(ε,0) = [0,(-log ε)/(log k)) has positive probability for all k. The exceptional sets Ak(ε,θ) can vary with θ, and there is no requirement that there be a set of s with probability one, or for that matter with positive probability, where Y(θ,s) is continuous for all θ. For example, assuming θ ∈ [0,1] and s uniform on [0,1], and defining Y(θ,s) = 1 if θ ≥ s and Y(θ,s) = 0 otherwise gives a function that is almost surely continuous at every θ, and yet every realization has a discontinuity (at θ = s).

Theorem 4.3 in Section 4.1 established that convergence in probability is preserved by continuous mappings. The next result extends this to almost surely continuous transformations; the result below is taken from Pollard (1984), p. 70.

Theorem 4.23. (Continuous Mapping). If Yn(θ) →p Yo(θ) uniformly for θ in Θ ⊆ ℝ^k, random vectors τo, τn ∈ Θ satisfy τn →p τo, and Yo(θ) is almost surely continuous at τo, then Yn(τn) →p Yo(τo).


Consider i.i.d. random functions Yi(θ) that have a finite mean ψ(θ) for each θ, and consider the average Xn(θ) = n^{-1}∑_{i=1}^n Yi(θ). Kolmogorov's SLLN I implies that pointwise, Xn(θ) →as ψ(θ). However, we sometimes need in statistics the stronger result that Xn(θ) is uniformly close to ψ(θ) over the whole domain Θ. This is not guaranteed by pointwise convergence. For example, the random function Yn(s,θ) = 1 if n²|s - θ| ≤ 1, and Yn(s,θ) = 0 otherwise, where the sample space is the unit interval with uniform probability, has P(Yn(·,θ) > 0) ≤ 2/n² for each θ. This is sufficient to give Yn(·,θ) →as 0 pointwise. However, P(supθ Yn(θ) > 0) = 1.

Theorem 4.24. (Uniform SLLN). Assume Yi(θ) are independent identically distributed random functions with a finite mean ψ(θ) for θ in a closed bounded set Θ ⊆ ℝ^k. Assume Yi(·) is almost surely continuous at each θ ∈ Θ. Assume that Yi(·) is dominated; i.e., there exists a random variable Z with a finite mean that satisfies Z ≥ sup_{θ∈Θ} |Y1(θ)|. Then ψ(θ) is continuous in θ and Xn(θ) = n^{-1}∑_{i=1}^n Yi(θ) satisfies sup_{θ∈Θ} |Xn(θ) - ψ(θ)| →as 0.

Proof: We follow an argument of Tauchen (1985). Let (S,F,P) be the probability space, and write the random function Yi(θ,s) to make its dependence on the state of Nature explicit. We have ψ(θ) = ∫_S Y(θ,s)P(ds). Define u(θo,s,k) = sup_{|θ-θo|≤1/k} |Y(θ,s) - Y(θo,s)|. Let ε > 0 be given. Let Ak(ε/2,θo) be the measurable set given in the definition of almost sure continuity, and note that for k = k(ε/2,θo) sufficiently large, the probability of Ak(ε/2,θo) is less than ε/(4E Z). Then,

E u(θo,·,k) ≤ ∫_{Ak(ε/2,θo)} u(θo,s,k)P(ds) + ∫_{Ak(ε/2,θo)^c} u(θo,s,k)P(ds) ≤ ∫_{Ak(ε/2,θo)} 2Z(s)P(ds) + ∫_{Ak(ε/2,θo)^c} (ε/2)P(ds) ≤ ε.

Let B(θo) be an open ball of radius 1/k(ε/2,θo) about θo. These balls constructed for each θo ∈ Θ cover the compact set Θ, and it is therefore possible to extract a finite subcovering of balls B(θj) with centers at points θj for j = 1,...,J. Let µj = E u(θj,·,k(ε/2,θj)). For θ ∈ B(θj), |ψ(θ) - ψ(θj)| ≤ µj ≤ ε. Then


sup_{θ∈B(θj)} |Xn(θ) - ψ(θ)| ≤ [sup_{θ∈B(θj)} |Xn(θ) - Xn(θj)| - µj] + µj + |Xn(θj) - ψ(θj)| + sup_{θ∈B(θj)} |ψ(θj) - ψ(θ)|

≤ |n^{-1}∑_{i=1}^n u(θj,·,k(ε/2,θj)) - µj| + ε + |Xn(θj) - ψ(θj)| + ε.

Apply Kolmogorov's SLLN to each of the first and third terms to determine a sample size nj such that

P(sup_{n≥nj} |n^{-1}∑_{i=1}^n u(θj,·,k(ε/2,θj)) - µj| > ε) < ε/2J

and

P(sup_{n≥nj} |Xn(θj) - ψ(θj)| > ε) < ε/2J.

With probability at least 1 - ε/J, sup_{θ∈B(θj)} |Xn(θ) - ψ(θ)| ≤ 4ε for all n ≥ nj. Then, with probability at least 1 - ε, sup_{θ∈Θ} |Xn(θ) - ψ(θ)| ≤ 4ε for n > no = max(nj). □

The construction in the proof of the theorem of a finite number of approximating points can be reinterpreted as the construction of a finite family of functions, the Y(θj,·), with the approximation property that the expectation of the absolute difference between Y(θ,·) for any θ and one of the members of this finite family is less than ε. Generalizations of the uniform SLLN above can be obtained by recognizing that it is this approximation property that is critical, with a limit on how rapidly the size of the approximating family can grow with sample size for a given ε, rather than continuity per se; see D. Pollard (1984).
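A hedged simulation sketch of Theorem 4.24 (the random function and distributions are my own choices, not the author's): Yi(θ) = cos(θxi) with xi i.i.d. N(0,1) is dominated by Z = 1 and has ψ(θ) = exp(-θ²/2), and the supremum of |Xn(θ) - ψ(θ)| over a grid on Θ = [0,3] shrinks as n grows.

# Hedged simulation of the uniform SLLN: my own random function Y_i(theta) = cos(theta * x_i)
# with x_i ~ N(0,1), so psi(theta) = E cos(theta x) = exp(-theta^2/2) and |Y_i(theta)| <= 1.
import numpy as np

rng = np.random.default_rng(3)
theta = np.linspace(0.0, 3.0, 301)               # compact parameter set Theta = [0, 3]
psi = np.exp(-theta**2 / 2.0)

for n in (100, 1000, 10000):
    x = rng.standard_normal(n)
    Xn = np.cos(np.outer(x, theta)).mean(axis=0) # X_n(theta) = n^{-1} sum_i cos(theta x_i)
    print(n, np.max(np.abs(Xn - psi)))           # sup_theta |X_n(theta) - psi(theta)|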

4.6. REFERENCES

P. Billingsley (1968) Convergence of Probability Measures, Wiley.
P. Billingsley (1986) Probability and Measure, Wiley.
J. Davidson (1994) Stochastic Limit Theory, Oxford.
W. Feller (1966) An Introduction to Probability Theory and Its Applications, Wiley.
I. Ibragimov and Y. Linnik (1971) Independent and Stationary Sequences of Random Variables, Wolters-Noordhoff.
J. Neveu (1965) Mathematical Foundations of the Calculus of Probability, Holden-Day.
D. Pollard (1984) Convergence of Stochastic Processes, Springer-Verlag.
C. R. Rao (1973) Linear Statistical Inference and Its Applications, Wiley.


R. Serfling (1970) Convergence Properties of Sn Under Moment Restrictions, Annals of Mathematical Statistics, 41, 1235-1248.
R. Serfling (1980) Approximation Theorems of Mathematical Statistics, Wiley.
G. Tauchen (1985)
H. White (1984) Asymptotic Theory for Econometricians, Academic Press.

4.7. EXERCISES

1. The sequence of random variables Xn satisfies Xn(s) = s^n, where s is a state of Nature in the sample space S = [0,1] with uniform probability on S. Does Xn have a stochastic limit, and if so in what sense (weak, strong, quadratic mean, distribution)? What about Yn = nXn or Zn = log(Xn)?

2. A sequence of random variables Zn are multivariate normal with mean zero, variances σ²n, and covariances E ZnZn+m = σ²n for m > 0. (For an infinite sequence, this means that every finite subsequence is multivariate normal.) Let Sn = ∑_{k=1}^n Zk. Does Sn/n converge in probability? Is there a scale factor α(n) such that Sn/α(n) converges in probability? Is there a scale factor β(n) such that Sn/β(n) is asymptotically normal?

3. Ignoring adjustments for family composition and location, an American family is said to be below the poverty line if its annual income is less than $14,800 per year. Let Yi be the income level of family i, drawn randomly and independently from the American population, and let Qi be one if Yi is less than $14,800, zero otherwise. Family income can obviously never be larger than GDP, so that it is bounded above by a (very big) constant G, and cannot be negative. Let µ denote the population mean annual income and π denote the population proportion below the poverty line. Let mn and pn denote the corresponding sample means in a simple random sample of size n. Prove that sample mean annual income mn converges in probability to population mean annual income; i.e., show the requirements for a WLLN are met. Prove that n^{1/2}(mn - µ) converges in distribution to a normal; i.e., show the requirements for a CLT are met. Similarly, prove that pn converges in probability to π and n^{1/2}(pn - π) converges in distribution to a normal with mean 0 and variance π(1-π).

4. Empirical illustration of stochastic limits: On the computer, construct a simple random sample of observations Xk by drawing independent uniform random variables Uk and Vk from (0,1) and defining Xk = 1 if Uk > 1/2 and Xk = log(Vk) if Uk ≤ 1/2. Let mn be the sample mean of the Xk from a sample of size n for n = 10, 100, 1000. (a) Does mn appear to converge in probability? To what limit? (b) Draw 100 samples of size 10 by the procedure described above, and keep the sample means from each of the 100 samples. Calculate what are called "studentized residuals" by subtracting the mean of the 100 sample means, and dividing these differences by their sample standard deviation (i.e., the square root of the average of the squared deviations). Sort these studentized residuals from low to high and plot them against quantiles of the standard normal, Qk = Φ^{-1}((k-0.5)/n). This is called a normal probability plot, and deviations from the 45-degree line reflect differences in the exact distribution of the sample means from normal, plus random noise. Are there systematic deviations that suggest the normal approximation is not very good? (c) Repeat part (b) with 100 samples of size 100. Has the normal approximation become more accurate?


CHAPTER 5. EXPERIMENTS, SAMPLING, STATISTICAL DECISIONS

5.1. EXPERIMENTS

The business of economics is to explain how consumers and firms behave, and the implications of this behavior for the operation of the economy. To do this, economists need to be able to describe the features of the economy and its economic agents, to model behavior, and to test the validity of these models. For example, economists are interested in determining the effects of medicare eligibility on retirement decisions. They believe that the incentives implicit in medical insurance programs influence willingness to work, so that changes in these programs may cause retirement behavior to change. A first level of empirical interest is a description of the current situation, a snapshot of current retirement patterns under the current program. This description could be based on a census or sample of the current population. Statistics will play a role if a sample is used, providing tools for judging the accuracy of estimates of population parameters. At a deeper level, economists want to estimate how patterns would change if eligibility rules were altered. This interest requires that one conduct, or at least observe, an experiment in which different workers face different programs, and the impact of the program differences on their responses can be observed. The objective is to uncover a causal mapping from programs to behavioral responses. Major barriers to accomplishing this are confounding causal factors, or mutual causation by deeper hidden effects. A well-designed experiment or intelligent use of a natural experiment that provides a clear separation between the factor of interest and possible confounding effects will provide the most compelling empirical evidence on the economic question.

The most reliable way to try to uncover a causal relationship is through a designed experiment. For example, to study the effect of the medicare program on retirement, one could in principle establish several different levels of medicare eligibility, or treatments, and assign these treatments at random to members of the population. The measured response of employment to these treatments is the causal effect we are looking for, with the random assignment of treatments assuring that the effects we see are arising from this source alone, not from other, uncontrolled factors that might happen to be correlated with the treatment. Classical prototypes for designed experiments are those done in chemistry or biology labs, where a good procedure will be effective in eliminating potential confounding factors so the effect of the one factor of interest can be measured. Even here, there can be problems of measurement error and contaminated experiments, and statistical issues arise. Perhaps better prototypes for experiments in economics are designed field experiments in ecology or agronomy. For example, consider the classical experiment to measure the impact of fertilization on the productivity of corn plants. The agronomist prepares different test plots, and tries to keep conditions other than fertilizer, such as irrigation levels, comparable across the plots. However, there will be a variety of factors, such as wind and sunshine, that may differ from one plot to another. To isolate the effect of fertilizer from these confounding effects, the agronomist assigns the fertilizer treatments to the different plots at random. This randomized treatment design is a powerful tool for measuring the causal effect of the treatments.

Economists rarely have the freedom to study economic relationships by designing classical experiments with random assignment of treatments. At the scale of economic policy, it would often be invasive, time-consuming, and costly to conduct the experiments one would need. In addition, being experimented on can make economic agents and economies testy. However, there are various arenas in which designed experiments are done in economics. Field experiments have examined the impact of different marginal tax rates on employment behavior in low-income families, and the effect of different job training programs. Economics laboratory experiments have studied behavior in artificial markets. However, some areas of economic interest are beyond the reach of designed experiments because of technical and ethical barriers. No one would seriously propose, for example, to study the effect of life expectancy on savings behavior by randomly assigning execution dates, or the returns to education by randomly assigning years of schooling that students may receive. This makes economics primarily an observational or field science, like astronomy. Economists must search for natural experiments in which economic agents are subjected to varying levels of a causal factor of interest under circumstances where the effects of potential confounding factors are controlled, either by something like random assignment of treatments by Nature, or by measuring the levels of potential confounding factors and using modeling and data analysis methods that can untangle the separate effects of different factors. For example, to study the impact of schooling on income, we might try to use as a natural experiment individuals such as Vietnam war draftees and non-draftees who as a result of the random draft lottery have different access to schooling. To study the impact of medical insurance eligibility on retirement decisions, we might try to study individuals in different States where laws for State medical welfare programs differ. This is not as clean as random assignment of treatments, because economic circumstances differ across States and this may influence what welfare programs are adopted. Then, one is left with the problem of determining how much of a variation in retirement patterns between States with strong and weak work incentives in their medical welfare programs is due to these incentives, and how much is due to overall demographics or income levels that induced States to adopt one welfare program or the other.

Looking for good natural experiments is an important part of econometric analysis. The most persuasive econometric studies are those where Nature has provided an experiment in which there is little possibility that anything other than the effect you are interested in could be causing the observed response. In data where many factors are at work jointly, the ability of statistical analysis to identify the separate contribution of each factor is limited. Regression analysis, which forms the core of econometric technique, is a powerful tool for separating the contributions of different factors, but even so it is rarely definitive. A good way to do econometrics is to look for good natural experiments and use statistical methods that can tidy up the confounding factors that Nature has not controlled for us.


5.2. POPULATIONS AND SAMPLES

5.2.1. Often, a population census is impractical, but it is possible to sample from the population. A core idea of statistics is that a properly drawn sample is a representation of the population, and that one can exploit the analogies between the population and the sample to draw inferences from the sample about features of the population. Thus, one can measure the average retirement age in the sample, and use it to infer the mean retirement age in the population. Statistics provides the tools necessary to develop these analogies, and assess how reliable they are.

A basic statistical concept is that of a simple random sample. The properties of a simple random sample are that every member of the population has the same probability of being included, and the sample observations are statistically independent. A simple random sample can be defined formally in terms of independent trials from a probability space; see Chap. 3.4. However, for current purposes, it is sufficient to think of a population that is characterized by a probability distribution, and think of a random sample as a sequence of observations drawn independently from this distribution.

5.2.2. A simple random sample is representative of the underlying population in the sense that each sample observation has the population probability distribution. However, there is a more fundamental sense in which a simple random sample is an analog of the population, so that sample statistics are appealing approximations to their population analogs. Suppose one is dealing with a random variable X that is distributed in the population with a CDF F(x), and that one is interested in some feature of this distribution, such as its mean µ = ∫x F(dx). This expectation depends on F, and we could make the dependence explicit by writing it as µ(x,F). More generally, the target may be µ(g,F), where g = g(x) is some function of x, such as g(x) = x² or g(x) = 1(x ≤ 0).

Now suppose (x1,...,xn) is a simple random sample drawn from this CDF, and define Fn(x) = n^{-1}∑_{i=1}^n 1(xi ≤ x). Then Fn(x) equals the fraction of the sample values that are no greater than x. This is called the empirical CDF of the sample. It can be interpreted as coming from a probability measure that puts weight 1/n on each sample point. The population mean µ(x,F) has a sample analog which is usually written as x̄ = n^{-1}∑_{i=1}^n xi, but can also be written as µ(x,Fn) = ∫x Fn(dx). This notation emphasizes that the sample mean is a function of the empirical CDF of the sample. The population mean and the sample mean are then the same function µ(x,·), the only difference being that the first is evaluated at F and the second is evaluated at Fn. The following proposition, sometimes called the "fundamental theorem of statistics", establishes that as a simple random sample gets larger and larger, its empirical CDF approximates the population CDF more and more closely. Then, intuitively, if µ(g,·) is continuous in its second argument, an analogy principle suggests that µ(g,Fn) will converge to µ(g,F), so that µ(g,Fn) will be a good estimator of µ(g,F).

Theorem 5.1. (Glivenko-Cantelli) If random variables X1,X2,... are independent and have a common CDF F, then supx |Fn(x) - F(x)| converges to zero almost surely.

Proof: Given ε, δ > 0, there exists a finite number of points z1 < ... < zK such that the monotone right-continuous function F varies at most ε/2 between the points; i.e., F(zk*) - F(zk-1) < ε/2, where zk* denotes the limiting value as one approaches zk from the left. Any point where F jumps by more than ε/4 will be included as a zk point. By convention, assume z1 = -∞ and zK = +∞. For every x, bracketed by zk-1 ≤ x < zk, one has Fn(zk-1) - F(zk-1) - ε/2 ≤ Fn(x) - F(x) ≤ Fn(zk) - F(zk) + ε/2. The event supk |Fn(zk) - F(zk)| < ε/2 then implies the event supx |Fn(x) - F(x)| < ε. At each zk, the Kolmogorov SLLN establishes that Fn(zk) →as F(zk). Then there exists nk such that the probability of |Fn(zk) - F(zk)| > ε/3 for any n ≥ nk is less than δ/K. Let n = maxk nk. Then, with probability at least 1 - δ, the event supm>n supk |Fm(zk) - F(zk)| < ε/2 occurs, implying the event supm>n supx |Fm(x) - F(x)| < ε occurs. □

The Glivenko-Cantelli theorem implies that Fn converges in distribution to F, but is stronger, establishing that the convergence is uniform rather than pointwise, and is not restricted to continuity points of F. It is useful to state the Kolmogorov SLLN in the terminology used here: if the population statistic µ(g,F) exists, then the sample statistic µ(g,Fn) converges almost surely to µ(g,F). This provides a fundamental justification for the use of simple random samples, and for the use of sample statistics µ(g,Fn) that are analogs of population statistics µ(g,F) that are of interest.
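The following Python sketch is an illustration under my own choice F = Exponential(1), not part of the text: it computes the Kolmogorov distance supx |Fn(x) - F(x)| for increasing sample sizes, and the distance shrinks toward zero, as Theorem 5.1 asserts.

# Illustrative check of the Glivenko-Cantelli theorem (my own setup: F = Exponential(1)),
# computing the Kolmogorov distance sup_x |F_n(x) - F(x)| from the order statistics.
import numpy as np

rng = np.random.default_rng(4)
for n in (50, 500, 5000, 50000):
    x = np.sort(rng.exponential(1.0, n))
    F = 1.0 - np.exp(-x)                          # population CDF at the order statistics
    i = np.arange(1, n + 1)
    # the supremum over x is attained just before or at an order statistic
    sup_dist = np.maximum(i / n - F, F - (i - 1) / n).max()
    print(n, sup_dist)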

5.2.3. While the idea of simple random sampling is straightforward, implementation in applications may not be. The way sampling is done is to first establish a sample frame and a sampling protocol. The sample frame essentially identifies the members of the population in an operational way that makes it possible for them to be sampled, and the sampling protocol spells out precisely how the sampling is to be done and the data collected. For example, suppose your target is the population of individuals who were 55 years of age in 1980 and over the following twenty years have retired in some pattern that may have been influenced by their access to medical insurance. An ideal sample frame would be a master list containing the names and current telephone numbers of all individuals in this target population. The sampling protocol could then be to use a random number generator to select and call individuals from this list with equal probability, and collect data on their retirement age and economic circumstances. However, the required master list does not exist, so this simple sample design is infeasible. A practical sample frame might instead start from a list of all working residential telephone numbers in the U.S. The sampling protocol would be to call numbers at random from this list, ask screening questions to determine if anyone from the target population lives at that number, and interview an eligible resident if there is one. This would yield a sample that is not exactly a simple random sample, because some members of the target population have died or do not have telephones, households with multiple telephones are oversampled relative to those with one telephone, some households may contain more than one eligible person, and there may be attrition because some telephones are not answered or the respondent declines to participate. Even this sampling plan is infeasible if there is no master list of all the working residential telephone numbers. Then one might turn instead to random digit dialing (RDD), with a random number generator on a computer making up potential telephone numbers at random until the phone is answered. At first glance, it may seem that this is guaranteed to produce at least a simple random sample of working telephones, but even here complications arise. Different prefixes correspond to different numbers of working phones, and perhaps to different mixes of residential and business phones. Further, the probability that a number is answered may depend on the economic status of the owner. An important part of econometric analysis is determining when deviations from simple random sampling matter, and developing methods for dealing with them.

There are a variety of sampling schemes that are more complex variants on simple random sampling, with protocols that produce various forms of stratification. An example is cluster sampling, which first selects geographical units (e.g., cities, census tracts, telephone prefixes), and then samples residences within each chosen unit. Generally, these schemes are used to reduce the costs of sampling. Samples produced by such protocols often come with sample weights, the idea being that when these are applied to the sample observations, sample averages will be reasonable approximations to population averages. Under some conditions, econometric analysis can be carried out on these stratified samples by treating them as if they were simple random samples. However, in general it is important to consider the implications of sampling frames and protocols when one is setting up a statistical analysis.

We have given a strong theoretical argument that statistical analysis of simple random samples will give reasonable approximations to target population features. On the other hand, the history of statistics is filled with horror stories where an analysis has gone wrong because the sample was not random. The classical example is the Literary Digest poll in 1936 that predicted that Roosevelt would lose the Presidential election that he won in a landslide. The problem was that only the rich had telephones in 1936, so the sample was systematically biased. One should be very skeptical of statistical analyses that use purposive or selected samples, as the safeguards provided by random sampling no longer apply and sample statistics may be poor approximations to population statistics. Claims that a given sample frame and protocol have produced a simple random sample also deserve scrutiny.

An arena where the sampling theory is particularly obscure is in the analysis of economic time series. Here, one is observing a slice of history, and the questions are what the population is from which this sample is drawn, and in what sense this slice has the properties of a random sample. One way statisticians have thought about this is to visualize our universe as being one draw from a population of "parallel universes". This helps for doing formal probability theory, but is unsatisfying for the economist whose target is a hypothesis about the one universe we are in. Another way to approach the problem is to think about the time series sample as a slice of a stochastic process that operates through time, with certain rules that regulate the relationship between behavior in a slice and behavior through all time. For example, one might postulate that the stochastic process is stationary and ergodic, which would mean that the distributions of variables depend only on their relative position in time, not their absolute position, and that long run averages converge to limits.

In this chapter and several chapters following, we will assume that the samples we are dealing with are simple random samples. Once we have a structure for statistical inference in this simplest case, we turn in later chapters to the problems that arise under alternative sampling protocols.

5.3. STATISTICAL DECISIONS

5.3.1. The process of statistical estimation can be thought of as decision-making under uncertainty. The economic problem faced by Cab Franc in Chapter 1 is an example. In decision-making under uncertainty, one has limited information, based upon observed data. There are costs to mistakes. On the basis of the available information, one wants to choose an action that minimizes cost. Let x denote the data, which may be a vector of observations from a simple random sample, or some more complex sample such as a slice from a time series process. These observations are governed by a probability law, or data generation process (DGP). We do not know the true DGP, but assume now that we do know that it is a member of some family of possible DGP's which we will index by a parameter θ. The true DGP will correspond to a value θo of this index. Let F(x,θ) denote the CDF for the DGP corresponding to the index θ. For the remainder of this discussion, we will assume that this CDF has a density, denoted by f(x,θ). The density f is called the likelihood function of the data. The unknown parameter θo might be some population feature, such as age of retirement in the example discussed at the beginning of this chapter. The statistical decision problem might then be to estimate θo, taking into account the cost of errors. Alternately, θo might be one of two possible values, say 0 and 1, corresponding to the DGP an economist would expect to see when a particular hypothesis is true or false, respectively. In this case, the decision problem is to infer whether the hypothesis is in fact true, and again there is a cost of making an error.

Where do we get values for the costs of mistakes in statistical decisions? If the client for the statistical analysis is a business person or a policy-maker, an inference about θo might be an input into an action that has a payoff in profits or in a measure of social welfare that is indexed in dollars. A mistake will lower this payoff. The cost of a mistake is then the opportunity cost of foregoing the higher payoff to be obtained if one could avoid mistakes. For the example of retirement behavior, making a mistake on the retirement age may cause the planned medicare budget to go out of balance, and cost may be a known function of the magnitude of the unanticipated imbalance. However, if there are multiple clients, or the analysis is being performed for the scientific audience, there may not be precise costs, and it may be necessary to provide sufficient information from the analysis so that potential users can determine their most appropriate action based on their personal cost assessments. Before considering this situation, we will look at the case where there is a known cost function C(θ,θo,x) that depends on the true parameter value θo and on the inference θ made from the data, and in general can also depend directly on x.

5.3.2. A decision rule, or action, will be a mapping T(·) from the data x into the space of possible θ values. Note that while T(x) depends on the data x, it cannot depend directly on the unknown parameter θo, only indirectly through the influence of θo on the determination of x. Because the data are random variables, T(·) is also a random variable, and it will have a density ψ(t,θo) that could be obtained from f(x,θo) by considering a one-to-one transformation from x to a vector that contains T(x) and is filled out with some additional variables Z(x). The cost associated with the action T(·), given data x, is C(T(x),θo,x). One would like to choose this to be as small as possible, but the problem is that usually one cannot do this without knowing θo. However, the client may, prior to the observation of x, have some beliefs about the likely values of θo. We will assume that these prior beliefs can be summarized in a density h(θ). Given prior beliefs, it is possible to calculate an expected cost for an action T(·). First, apply Bayes law to the joint density f(x,θo)h(θo) of x and θo to obtain the conditional density of θo given x,

(1) p(θo|x) = f(x,θo)h(θo) / ∫f(x,θ)h(θ)dθ .

This is called the posterior density of θo, given the data x. Using this posterior density, the expected cost for an action T(x) is

(2) R(T(x),x) = ∫C(T(x),θo,x)p(θo|x)dθo = ∫C(T(x),θo,x)f(x,θo)h(θo)dθo / ∫f(x,θ)h(θ)dθ .

This expected cost is called the Bayes risk. It depends on the function T(·). The optimal action T*(·) is the function T(·) that minimizes the expected cost for each x, and therefore minimizes the Bayes risk. One has R(T*(x),x) ≤ R(T*(x) + λ,x) for a scalar λ, implying for each x the first-order condition 0 = ∫∇tC(T*(x),θo,x)p(θo|x)dθo, where ∇t denotes the derivative of C with respect to its first argument. A strategy T(x) is called inadmissible if there is a second strategy T′(x) such that C(T′(x),θ,x) ≤ C(T(x),θ,x) for all θ for which f(x,θ) > 0, with the inequality strict for some θ. Clearly the search for the optimal action T*(x) can be confined to the set of strategies that are admissible. In general, it is not obvious what the optimal action T*(x) that solves this problem looks like. A few examples help to provide some intuition:

Page 127: McFadden-Statistical Tools for Economists

McFadden, Statistical Tools © 2000 Chapter 5-8, Page 124______________________________________________________________________________

(i) Suppose C(θ,θo,x) = (θ - θo)², a quadratic cost function in which cost is proportional to the square of the distance of the estimator T(x) from the true value θo. For a given x, the argument θ = T(x) that minimizes (2) has to satisfy the first-order condition

0 = ∫(T*(x) - θo)f(x,θo)h(θo)dθo, or

(3) T*(x) = ∫θof(x,θo)h(θo)dθo / ∫f(x,θ)h(θ)dθ = ∫θp(θ|x)dθ .

Then, T*(x) equals the mean of the posterior density.

(ii) Suppose C(θ,θo,x) = α·max(0,θ - θo) + (1-α)·max(0,θo - θ), where α is a cost parameter satisfying 0 < α < 1. This cost function is linear in the magnitude of the error. When α = 1/2, the cost function is symmetric; for smaller α it is non-symmetric, with a unit of positive error costing less than a unit of negative error. The first-order condition for minimizing expected cost is

0 = α∫_{θo≤T*(x)} f(x,θo)h(θo)dθo - (1-α)∫_{θo≥T*(x)} f(x,θo)h(θo)dθo,

or, letting P(θ|x) denote the CDF of the posterior density, P(T*(x)|x) = 1 - α. Then T*(x) equals the (1-α)-level quantile of the posterior distribution. In the case that α = 1/2, so that costs are symmetric in positive and negative errors, this criterion picks out the median of the posterior density.

(iii) Suppose C(θ,θo,x) = -1/(2α) for |θ - θo| ≤ α, and C(θ,θo,x) = 0 otherwise. This is a cost function that gives a profit of 1/(2α) when the action is within a distance α of θo, and zero otherwise, with α a positive parameter. The criterion (2) requires that θ = T*(x) be chosen to minimize the expression -(1/2α)∫_{θ-α}^{θ+α} p(θo|x)dθo. If α is very small, then (1/2α)∫_{θ-α}^{θ+α} p(θo|x)dθo ≈ p(θ|x), so the criterion is approximately to minimize -p(θ|x). The argument minimizing -p(θ|x) is called the maximum posterior likelihood estimator; it picks out the mode of the posterior density. Then for α small, the optimal estimator is approximately the maximum posterior likelihood estimator. Recall that p(θ|x) is proportional to f(x,θ)h(θ). Then, the first-order condition for its maximization can be written

(4) 0 = ∇θf(x,θ)/f(x,θ) + ∇θh(θ)/h(θ) .

The first term on the right-hand side of this condition is the derivative of the log of the likelihood function, also called the score. The second term is the derivative of the log of the prior density. If prior beliefs are strong and tightly concentrated, then the second term will be very important, and the maximum will be close to the mode of the prior density, irrespective of the data. On the other hand, if prior beliefs are weak and very disperse, the second term will be small and the maximum will be close to the mode of the likelihood function. In this limiting case, the solution to the statistical decision problem will be close to a general-purpose classical estimator, the maximum likelihood estimator.

The cost function examples above were analyzed under the assumption that prior beliefs were characterized by a density with respect to Lebesgue measure. If, alternately, the prior density had a finite support, then one would have analogous criteria, with sums replacing integrals, and the criteria would pick out the best point from the support of the prior.
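A hedged, discretized illustration of examples (i)-(iii) follows (the likelihood, prior, and observed value are my own choices, not the author's): with a single observation x from N(θ,1) and an N(0,4) prior, the posterior mean, median, and mode correspond to the quadratic, symmetric linear, and small-window cost functions, and here they coincide because the posterior is normal.

# A hedged, discretized illustration of examples (i)-(iii): posterior on a grid for a single
# observation with my own choices of likelihood (N(theta,1)) and prior (N(0,4)).
import numpy as np

x_obs = 1.7                                       # hypothetical observed data point
theta = np.linspace(-10.0, 10.0, 20001)
dtheta = theta[1] - theta[0]
lik = np.exp(-0.5 * (x_obs - theta)**2)           # f(x,theta): N(theta,1) likelihood kernel
prior = np.exp(-0.125 * theta**2)                 # h(theta): N(0,4) prior kernel
post = lik * prior
post /= post.sum() * dtheta                       # discretized version of equation (1)

mean = (theta * post).sum() * dtheta              # quadratic cost -> posterior mean
cdf = np.cumsum(post) * dtheta
median = theta[np.searchsorted(cdf, 0.5)]         # symmetric linear cost -> posterior median
mode = theta[np.argmax(post)]                     # small-window cost -> posterior mode
print(mean, median, mode)                         # all close to 1.36 = 1.7 * 4/(4+1) here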

5.3.3. The idea that there are prior beliefs regarding the true value of θo, and that these beliefs can be characterized in terms of a probability density, is called the Bayesian approach to statistical inference. It is philosophically quite different than an approach that thinks of probabilities as being associated only with particular random devices such as coin tosses that can produce frequencies. Bayesian statistics assumes that humans have a coherent system of beliefs that can attach probabilities to events such as "the Universe will continue to expand forever" and "40 percent of workers age 65 will work another year if medicare is unavailable", and these personal probabilities satisfy the basic axioms of probability theory. One of the implications of this way of thinking is that it is meaningful to talk about the probability that an event occurs, even if the "event" is something like a mathematical theorem whose truth is completely determinable by logic, and not the result of some cosmic coin toss. (In this case, if you do not know if the theorem is true, your probability may reflect your opinion of the mathematical abilities of its author.) How one thinks about probabilities influences how one thinks about an economic hypothesis, such as the hypothesis that retirement age does not depend on the age of medicare eligibility. In classical statistics, a hypothesis is either true or false, and the purpose of statistical inference is to decide whether it is true. In Bayesian statistics, this would correspond to concluding that the probability that the event is true is either zero or one. For a Bayesian statistician, it is more meaningful to talk about a high or low probability of the hypothesis being true.

5.3.4. The statistical decision theory just developed assumed that the analysis had a client with precisely defined prior beliefs. As in the case of the cost of errors, there will be circumstances where the client's prior beliefs are not known, or there is not even a well-defined client. It is the lack of a clearly identified prior that is one of the primary barriers to acceptance of the Bayesian approach to statistics. (Bayesian computations can be quite difficult, and this is a second major barrier.) There are three possible options in the situation where there is not a well-defined prior.

5.3.5. The first option is to carry out the statistical decision analysis with prior beliefs that carry "zero information". For example, an analysis may use a "diffuse" prior that gives every value of θ an equal probability. There are some technical problems with this approach. If the set of possible θ values is unbounded, "diffuse" priors may not be proper probability densities that integrate to one. This problem can be skirted by using the prior without normalization, or by forming it as a limit of proper priors. More seriously, the idea of equal probability as being equivalent to "zero information" is flawed. A one-to-one but nonlinear transformation of the index θ can change a "diffuse" prior with equal probabilities into a prior in which probabilities are not equal, without changing anything about available information or beliefs. Then, equal probability is not in fact a characterization of "zero information". The technique of using diffuse or "uninformed" priors is fairly popular, in part because it simplifies some calculations. However, one should be careful not to assume that an analysis based on a particular set of diffuse priors is "neutral" or "value-free".

5.3.6. The second option is based on the idea that you are in a game against Nature in which Nature plays θo and reveals information x about her strategy, and you know that x is a draw from the DGP f(x,θo). You then play T(x). Of course, if you had a prior h(θo), which in this context might be interpreted as a conjecture about Nature's play, you could adopt the optimal Bayes strategy T*(x). A conservative strategy in games is to play in such a way that you minimize the maximum cost your opponent can impose on you. This strategy picks T*(x) = argmint maxh R(t,x,h), where R is the Bayes risk, now written with an argument h to emphasize that it depends on the prior h. The idea is that the worst Nature can do is draw θo from the prior that is the least favorable to you in terms of costs, and this strategy minimizes this maximum expected cost. This is called a minimax strategy. Unless the problem has some compactness properties, the minimax strategy may not exist, although there may be a sequence of strategies that come close. A minimax strategy is a sensible strategy in a zero-sum game with a clever opponent, since your cost is your opponent's gain. It is not obvious that it is a good strategy in a game against Nature, since the game is not necessarily zero-sum and it is unlikely that Nature is an aware opponent who cares about your costs. There may, however, be meta-Bayesian solutions in which the search for a least favorable prior is limited to a class that the analyst considers possible.

5.3.7. The final option is to stop the analysis short of a final solution, and simply deliver sufficient information from the sample to enable each potential user to compute the action appropriate to her own cost function and prior beliefs. Suppose there is a one-to-one transformation of the data x into two components (y,z) so that the likelihood function f(x,θ) factors into the product of a marginal density of y that depends on θ, and a conditional density of z, given y, that does not depend on θ, f(x,θ) = f1(y,θ)·f2(z|y). In this case, y is termed a sufficient statistic for θ. If one forms the posterior density of θ given x and a prior density h, one has in this case

(5) p(θ|x) = f1(y,θ)f2(z|y)h(θ) / ∫f1(y,θ)f2(z|y)h(θ)dθ = f1(y,θ)h(θ) / ∫f1(y,θ)h(θ)dθ = p(θ|y) .


Then, all the information to be learned about θ from x, reflected in the posterior density, can be learned from the summary data y. Then it is unnecessary to retain all the original data in x for purposes of statistical inference on θ; rather it is enough to retain the sufficient statistic y. By reporting y, the econometrician leaves the user completely free to form a prior, and calculate the posterior likelihood and the action that minimizes the user's Bayes risk. The limitations of this approach are that the dimensionality of sufficient statistics can be high, in many cases the dimension of the full sample, and that a substantial computational burden is being imposed on the user.
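As a numerical check of equation (5) (my own example, not from the text): for a Bernoulli(θ) sample, y = ∑xi is sufficient, and the posterior computed from the full data x coincides with the posterior computed from y alone, for an arbitrary prior on a grid.

# Hedged numerical check of equation (5) for a Bernoulli(theta) sample (my own example):
# the posterior built from the full data x equals the posterior built from y = sum(x) alone.
import numpy as np

rng = np.random.default_rng(5)
x = rng.binomial(1, 0.3, size=25)                 # hypothetical sample
y, n = x.sum(), x.size

theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]
prior = theta**1.5 * (1.0 - theta)**0.5           # an arbitrary prior h(theta) on a grid

lik_full = np.prod(theta[:, None]**x * (1.0 - theta[:, None])**(1 - x), axis=1)
lik_suff = theta**y * (1.0 - theta)**(n - y)      # the factor f1(y,theta) depending on y only

post_full = lik_full * prior
post_full /= post_full.sum() * dtheta
post_suff = lik_suff * prior
post_suff /= post_suff.sum() * dtheta
print(np.max(np.abs(post_full - post_suff)))      # numerically zero: p(theta|x) = p(theta|y)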

5.4. STATISTICAL INFERENCE

Statistical decision theory provides a template for statistical analysis when it makes sense to specify prior beliefs and costs of mistakes. Its emphasis on using economic payoffs as the criterion for statistical inference is appealing to economists as a model of decision-making under uncertainty, and provides a comprehensive, but not necessarily simple, program for statistical computations. While the discussion in this chapter concentrated on estimation questions, we shall see in Chapter 7 that it is also useful for considering tests of hypotheses.

The primary limitation of the Bayesian analysis that flows from statistical decision theory is that it is difficult to rationalize and implement when costs of mistakes or prior beliefs are not fully spelled out. In particular, in most scientific work where the eventual user of the analysis is not identified, so there is no consensus on costs of mistakes or priors, there is often a preference for "purely objective" solutions rather than Bayesian ones. Since a Bayesian approach can in principle be structured so that it provides solutions for all possible costs and priors, including those of any prospective user, this preference may seem puzzling. However, there may be compelling computational reasons to turn to "classical" approaches to estimation as an alternative to the Bayesian statistical decision-making framework. We will do this in the next chapter.

5.5. EXERCISES

1. Suppose you are concerned about the question of whether skilled immigrants make a positive net contribution to the U.S. economy, to the non-immigrant residents, and to the domestic workers who are competing directly for the same skilled jobs. If you could design an experiment to measure these effects, how would you do it? If you have to rely on a natural experiment, what would you look for?

2. Suppose you have a random sample of size n from a population with CDF F(x) which has mean µ and variance σ². You estimate µ by forming the sample average, or mean of the empirical distribution Fn. The distribution of this estimator could be determined by drawing repeated samples from F, forming the empirical distribution of the sample averages, and taking the limit. An approximation to this calculation, called the bootstrap, starts from the known empirical distribution of the original sample Fn rather than the unknown F. Try this computationally. Take F to be uniform on [0,1], and draw a base sample of size 10. Now estimate the CDF of the sample mean by (1) repeatedly sampling from the uniform distribution and (2) sampling with replacement from the base sample. Do 100 draws from each, and compare their medians.

3. Discuss how you would go about drawing a simple random sample of commuters in the San Francisco Bay Area. What problems do you face in defining the universe and the sample frame? Discuss ways in which you could implement the sampling. What problems are you likely to encounter?

4. Review the decision problem of Cab Franc in Chapter 1, and put it in the terminology of Section 5.3.2. Discuss the impact on Cab's decision of the particular loss function he has.


CHAPTER 6. ESTIMATION

6.1. DESIRABLE PROPERTIES OF ESTIMATORS

6.1.1. Consider data x that comes from a data generation process (DGP) that has a density f(x). Suppose we do not know f(·), but do know (or assume that we know) that f(·) is a member of a family of densities G. The estimation problem is to use the data x to select a member of G which in some appropriate sense is close to the true f(·). Suppose we index the members of G by the elements of some set Θ, and identify f(·) with a particular index value θo. Then, another way of stating the estimation problem is that in the family of densities f(x,θ) parameterized by θ ∈ Θ, we want to use the data x to estimate the true parameter value θo. The parameterization chosen for an estimation problem is not necessarily unique; i.e., there may be more than one way to parameterize the same family of densities G. Sometimes this observation can be used to our advantage, by choosing parameterizations that simplify a problem. However, a parameterization can create difficulties. For example, you might set up Θ in such a way that more than one value of θ picks out the true density f; e.g., for some θo ≠ θ1, one has f(x,θo) = f(x,θ1) for all x. Then you are said to have an identification problem. Viewed within the context of a particular parameterization, identification problems cause real statistical difficulties and have to be dealt with. Viewed from the standpoint of the fundamental estimation problem, they are an artificial consequence of an unfortunate choice of parameterization. Another possible difficulty is that the family of densities generated by your parametric specification f(x,θ), θ ∈ Θ, may fail to coincide with G. A particularly critical question is whether the true f(·) is in fact in your parametric family. You cannot be sure that it is unless your family contains all of G. Classical statistics always assumes that the true density is in the parametric family, and we will start from that assumption too. In Chapter 28, we will ask what the statistical properties and interpretation of parameter estimates are when the true f is not in the specified parametric family. A related question is whether your parametric family contains densities that are not in G. This can affect the properties of statistical inference; e.g., degrees of freedom for hypothesis tests and power calculations.

In basic statistics, the parameter θ is assumed to be a scalar, or possibly a finite-dimensional vector. This will cover many important applications, but it is also possible to consider problems where θ is infinite-dimensional. It is customary to call estimation problems where θ is finite-dimensional parametric, and problems where θ is infinite-dimensional semiparametric or nonparametric. (It would have been more logical to call them finite-parametric and infinite-parametric, respectively, but the custom is too ingrained to change.) Several chapters in the latter half of this book, particularly Chapter 28, deal with infinite-parameter problems.

6.1.2. In most initial applications, we will think of x as a simple random sample of size n, x = (x1,...,xn), drawn from a population in which x has a density f(x,θo), so that the DGP density is f(x,θo) = f(x1,θo)···f(xn,θo). However, the notation f(x,θo) can also cover more complicated DGPs, such as time-series data sets in which the observations are serially correlated. Suppose that θo is an unknown k×1 vector, but one knows that this DGP is contained in a family with densities f(x,θ) indexed by θ ∈ Θ. An important leading case is k = 1, so that θo is a scalar. For many of the topics in this Chapter, it is useful to concentrate first on this case, and postpone dealing with the additional complications introduced by having a vector of parameters. However, we will use definitions and notation that cover the vector as well as the scalar case. Let X denote the domain of x, and Θ denote the domain of θ. In the case of a simple random sample where an observation x is a point in a space X, one has X = X^n. The statistical inference task is to estimate θo. In Chapter 5, we saw that an estimator T(x) of θo was desirable from a Bayesian point of view if T(·) minimized the expected cost of mistakes. For typical cost functions where the larger the mistake, the larger the cost, Bayes estimators will try to get "close" to the true parameter value. That is, the Bayes procedure will seek estimators whose probability densities are concentrated tightly around the true θo. Classical statistical procedures lack the expected cost criterion for choosing estimators, but also seek estimators whose probability densities are near the true density f(x,θo).

In this Chapter, we will denote the expectation of a function r(x,γ) of x and a vector of parameters γ by E r(x,γ), or when it is necessary to identify the parameter vector of the true DGP, by Ex|θ r(x,γ) = ∫r(x,γ)f(x,θ)dx. Sometimes, the notation Ex|θ r(x,γ) is abbreviated to Eθ r(x,γ). This notation also applies when the parameters γ are also in Θ. Then Ex|θ r(x,θ) is the expectation of r(x,γ) when γ is set equal to the true parameter vector θ, and Ex|θ r(x,γ) is the expectation when r is evaluated at an argument γ that is not necessarily equal to the true parameter vector θ. The first of these expectations can be interpreted as a function of θ, and the second as a function of γ and θ.

The class of functions r(x) that do not depend on any unknown parameters are called statistics. Examples of statistics are sample means and sample variances. The expectation ρ(θ) ≡ Ex|θ r(x) of a statistic r(x) is a function of θ (if the expectation exists), and it is sometimes said that r(x) is an unbiased estimator of ρ(θ), or if ρ(θ) ≡ θ, an unbiased estimator of θ. It is important to note that ρ(θ) ≡ Ex|θ r(x) is an identity that holds for all θ, and unbiasedness is a statement about this identity, not about the expectation for a specific value of θ. Observe that the condition ρ(θ) ≡ θ is quite special, since in general r(x) need not be in the same space or of the same dimension as θ. The concept of unbiased estimators and their properties will be developed further later in this section. However, it is worth noting at this point that every statistic that has an expectation is by definition an unbiased estimator of this expectation. Colloquially, every estimator is an unbiased estimator of what it estimates.

When r(x) is in the same space as θ, so that the Euclidean distance between r(x) and θ is defined, the expectation MSE(θ) ≡ Ex|θ (r(x) - θ)² is termed the mean square error (MSE) of r. Note that the MSE may fail to exist, and when it does exist, it is a function of θ. The MSE is a classical concept in statistics, but note that it is identical to the quadratic cost function that is widely used in statistical decision theory. As a result, there will be a strong family resemblance between estimators that have optimality properties for the statistical decision problem with quadratic cost and estimators that have some classical optimality properties in terms of MSE.

It is sometimes useful to think of each possible value of θ as defining a coordinate in a space, and MSE(θ) as determining a point along this coordinate. For example, if there are two possible values of the parameter, θ1 and θ2, then (MSE(θ1),MSE(θ2)) is a point in two-dimensional space that characterizes its MSE as a function of θ. Different estimators have different MSE points in this space, and we may use the geometry of their relationship to select among them. There is an obvious generalization from two dimensions to a finite-dimensional vector space, corresponding to Θ finite, but the geometry continues to be well defined even when the set of possible parameter vectors is not finite.

The MSE can always be written as

MSE(θ) ≡ Ex|θ (r(x) - θ)² ≡ Ex|θ (r(x) - ρ(θ) + ρ(θ) - θ)² ≡ Ex|θ (r(x) - ρ(θ))² + (ρ(θ) - θ)²,

where ρ(θ) is the expectation of r(x), and there is no cross-product term since it has expectation zero. The term V(θ) ≡ Ex|θ (r(x) - ρ(θ))² is the variance of r(x), the term B(θ) = ρ(θ) - θ is the bias of r(x), and we have the result that MSE equals variance plus squared bias.
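A hedged simulation of the decomposition MSE = variance + bias² follows (the estimators and DGP are my own illustrative choices, not from the text): it compares the sample mean with a shrinkage estimator 0.9·x̄ for N(θo,1) data and verifies that the simulated MSE matches variance plus squared bias for each.

# Hedged simulation of the decomposition MSE = variance + bias^2, comparing the sample mean
# with a shrinkage estimator 0.9 * mean (both are my own illustrative choices).
import numpy as np

rng = np.random.default_rng(6)
theta0, n, reps = 2.0, 20, 200000
x = rng.normal(theta0, 1.0, size=(reps, n))

for name, est in [("sample mean", x.mean(axis=1)), ("shrunk mean", 0.9 * x.mean(axis=1))]:
    bias = est.mean() - theta0
    var = est.var()
    mse = ((est - theta0)**2).mean()
    print(name, mse, var + bias**2)               # the two columns agree for each estimator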

6.1.3. Listed below are some of the properties that are deemed desirable for classical estimators. Classical statistics often proceeds by developing some candidate estimators, and then using some of these properties to choose among the candidates. It is often not possible to achieve all of these properties at the same time, and sometimes they can even be incompatible. Some of the properties are defined relative to a class of candidate estimators, a set of possible T(·) that we will denote by T. The density of an estimator T(·) will be denoted ψ(t,θo), or when it is necessary to index the estimator, ψT(t,θo). Sometimes the parameter vector θ will consist of a subvector α that is of primary interest for the application and a subvector β that is not. Then, α is termed the primary parameter vector, β is termed a nuisance parameter vector, and the DGP f(x,α,β) depends on both the primary and nuisance parameters. In this case, the problem is often to estimate α, dealing with the nuisance parameters as expediently as possible. One approach with fairly wide applicability is to replace β in the DGP by some appropriate function r(x,α), obtaining a concentrated DGP f(x,α,r(x,α)) that is a function only of the α parameters. Some statistical analysis is needed to determine when this is feasible and can be used as a device to get estimates of α with reasonable statistical properties. A specific choice of r(x,α) that often works is the argument that solves the problem maxβ f(x,α,β). Keep in mind that the choice of parameterization is to some extent under the control of the analyst. Then it may be possible to choose a parameterization that defines α and isolates nuisance parameters in a way that helps in estimation of the primary parameters α.

6.1.4. Sufficiency. Suppose there is a one-to-one transformation from the data x into variables (y,z). Then the DGP density f(x,θ) can be described in terms of the density of (y,z), which we might denote g(y,z,θ) and write as the product of the marginal density of y and the conditional density of z given y, g(y,z,θ) = g1(y,θ)g2(z|y,θ). The relationship of the density f(x,θ) and the density g(y,z,θ) comes from the rules for transforming random variables; see Chapter 3.8. Let x = x(y,z) denote the inverse of the one-to-one transformation from x to y and z, and let J(y,z) denote the Jacobian of this mapping; i.e., the determinant of the array of derivatives of x(y,z) with respect to its arguments, signed so that it is positive. Then g(y,z,θ) = f(x(y,z),θ)J(y,z). The Jacobian J(y,z) does not depend on θ, so g(y,z,θ) factors into a term depending only on y and θ and a term independent of θ if and only if f(x(y,z),θ) factors in the same way.

In general, both the marginal and the conditional densities depend on θ. However, if the conditional distribution of z given y is independent of θ, g2(z|y,θ) = g2(z|y), then the variables y are said to be sufficient for θ. In this case, all of the information in the sample about θ is summarized in y, and once you know y, knowing z tells you nothing more about θ. (In Chapter 5.4, we demonstrated this by showing that the posterior density for θ, given y and z, depended only on y, no matter what the prior.) Sufficiency of y is equivalent to a factorization g(y,z,θ) = g1(y,θ)g2(z|y) of the density into one term depending only on y and θ and a second term depending only on z and y, where the terms g1 and g2 need not be densities; i.e., if there is such a factorization, then there is always an additional normalization by a function of y that makes g1 and g2 into densities. This characterization is useful for identifying sufficient statistics. Sufficiency can also be defined with respect to a subvector of primary parameters: if g(y,z,α,β) = g1(y,α)g2(z|y,β), then y is sufficient for α. Another situation that could arise is g(y,z,α,β) = g1(y,α)g2(z|y,α,β), so the marginal distribution of y does not depend on the nuisance parameters, but the conditional distribution of z given y depends on all the parameters. It may be possible in this case to circumvent estimation of the nuisance parameters by concentrating on g1(y,α). However, y is not sufficient for α in this case, as g2(z|y,α,β) contains additional information on α, unfortunately entangled with the nuisance parameters β.

An implication of sufficiency is that the search for a good estimator can be restricted to estimators T(y) that depend only on sufficient statistics y. In some problems, only the full sample x is a sufficient statistic, and you obtain no useful restriction from sufficiency. In others there may be many different transformations of x into (y,z) for which y is sufficient. Then, among the alternative sufficient statistics, you will want to choose a y that is a minimal sufficient statistic. This will be the case if there is no further one-to-one transformation of y into variables (u,v) such that u is sufficient for θ and of lower dimension than y. Minimal sufficient statistics will be most useful when their dimension is low, and/or they isolate nuisance parameters so that the marginal distribution of y depends only on the primary parameters.

An example shows how sufficiency works. Suppose one has a simple random sample x = (x1,...,xn) from an exponential distribution with an unknown scale parameter λ. The DGP density is the product of univariate exponential densities, f(x,λ) = (λexp(-λx1))·...·(λexp(-λxn)) = λ^n exp(-λ(x1 + ... + xn)). Make the one-to-one transformation y = x1 + ... + xn, z1 = x1,..., zn-1 = xn-1, and note that the inverse transformation implies xn = y - z1 - ... - zn-1. Substitute the inverse transformation into f to obtain g(y,z) = f(x(y,z)) = λ^n e^{-λy}. Then, g factors trivially into a marginal gamma density g1(y,λ) = λ^n y^{n-1} e^{-λy}/(n-1)! for y, and a conditional uniform density g2(z|y) = (n-1)!/y^{n-1} on the simplex 0 ≤ z1 + ... + zn-1 ≤ y. Then, y is a sufficient statistic for λ, and one need consider only estimators for λ that are functions of the univariate statistic y = x1 + ... + xn. In this case, y is a minimal sufficient statistic since obviously no further reduction in dimension is possible.

In this exponential example, there are other sufficient statistics that are not minimal. For example, any y such as y = (x1 + ... + xn-2, xn-1, xn) whose components can be transformed to recover the sum of the x's is sufficient. Obviously, the lower the dimension of the sufficient statistic, the less extensive the search needed to find a satisfactory estimator among all functions of the sufficient statistic. Then, it is worthwhile to start from a minimal sufficient statistic.
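
The factorization can also be checked numerically. The short sketch below is an added illustration (the two particular samples are invented): it generates two different exponential samples that share the same sum y and confirms that their likelihood functions in λ coincide, so the data enter the likelihood only through the sufficient statistic y.

    import numpy as np

    # Two samples of size n = 4 with the same sum y = 8.0 but different raw data.
    x_a = np.array([1.0, 2.0, 2.5, 2.5])
    x_b = np.array([0.5, 0.5, 3.0, 4.0])

    def exp_loglik(x, lam):
        # log f(x, lam) = n*log(lam) - lam*sum(x) for an exponential sample
        return x.size * np.log(lam) - lam * x.sum()

    grid = np.linspace(0.1, 3.0, 50)
    ll_a = np.array([exp_loglik(x_a, lam) for lam in grid])
    ll_b = np.array([exp_loglik(x_b, lam) for lam in grid])

    print(np.allclose(ll_a, ll_b))   # True: the likelihood depends on x only through y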

6.1.5. Ancillarity. Analogously to the discussion of sufficiency, suppose there is a one-to-one transformation from the data x into variables (y,w,z). Also suppose that the parameter vector θ is composed of a vector α of primary parameters and a vector β of nuisance parameters. Suppose that (y,w) are jointly sufficient for (α,β). Then the DGP density can be written as the product of the marginal density of (y,w) and the conditional density of z given y and w, g1(y,w,α,β)g2(z|y,w). Further, the marginal density g1 is the product of the conditional density of y given w times the marginal density of w, so that the density of the sample is g3(y|w,α,β)g4(w,α,β)g2(z|y,w). Both g3 and g4 depend in general on α and β. However, now consider the case where the sample density can be written g3(y|w,α)g4(w,β)g2(z|y,w), with g3 independent of β and g4 independent of α. In this case, all the information in the data about α is contained in the conditional distribution of y given w. In this situation, the statistics w are said to be ancillary to α, and y is said to be conditionally sufficient for α given w. The search for an estimator for α should then concentrate solely on the conditional density of y given w, for the same reasons that it is sufficient to look only at estimators that are functions of sufficient statistics, and the nuisance parameters drop out of the analysis.

In these definitions, the distinction between w and z is not essential; if one absorbs all of z into w, it is still true that (y,w) is sufficient and all the information on α is contained in the conditional density of y given w. However, in applications, it is useful to distinguish w and z in order to reduce the dimensions of the statistics in g3(y|w,α) as much as possible.

Consider an estimator T(y,w) of α that depends on the sufficient statistics (y,w), and suppose we now want to examine the properties of this estimator. For example, we might want to look at its expected value, and compare this to α. When w is ancillary, α influences the distribution of y given w, via g3(y|w,α), but it has no influence on the distribution of w. This suggests that in assessing T(y,w), we should examine its conditional expectation given w, rather than its unconditional expectation with respect to both y and w. Put another way, we should not be satisfied with the estimator T(y,w) if its conditional expectations are far away from α, even if its unconditional expectation were close to α, since the latter property is an accident of the distribution of w. Stated more generally, this is a principle of conditionality (or, principle of ancillarity) which says that the properties of an estimator should be considered conditional on ancillary statistics.

An artificial example makes it clear why the principle of conditionality is sensible. Suppose a survey is taken to estimate the size ν of a population of cows. There are alternative sets of instructions for the data collector. Instruction A says to count and report the number of noses. Instruction B says to count and report the number of ears. A sufficient statistic for ν is (y,w), where y is the reported count and w is an indicator that takes the value 1 if instruction A is used and the value 2 if instruction B is used. Each instruction has a 50/50 chance of being selected. Consider three estimators of ν: T1(y,w) = y/w, T2(y,w) = 2y/3, and T3(y,w) = y/w + w - 3/2. Now, the expected value of y if w = 1 is ν, and the expected value of y if w = 2 is 2ν. Then, the conditional and unconditional expectations of the three estimators are

E_{y|w} T1(y,w) = ν if w = 1, ν if w = 2;
E_{y|w} T2(y,w) = 2ν/3 if w = 1, 4ν/3 if w = 2;
E_{y|w} T3(y,w) = ν - 1/2 if w = 1, ν + 1/2 if w = 2;

E T1(y,w) = ν,  E T2(y,w) = (1/2)(2ν/3) + (1/2)(4ν/3) = ν,  E T3(y,w) = ν + Ew - 3/2 = ν.

All three estimators are functions of the sufficient statistics and have unconditional expectation ν, so that they are centered at the true value. However, only the estimator T1 satisfies the principle of conditionality that this centering property holds conditional on w. The estimator T2 is not a function of w, and has the property that conditional on w, it will be systematically off-center. Obviously, it is unappealing to use an estimator that has a systematic bias given the measurement method we use, even if the expectation over the choice of measuring instruments accidentally averages out the bias. The estimator T3 again has a systematic bias, given w, and is an unappealing estimator even though the further expectation over the selection of measurement method averages out the bias. In application, the principle of conditionality is usually, but not universally, desirable. First, some estimators that fail to satisfy this principle in finite samples will do so approximately in large samples, so that they satisfy what might be called asymptotic conditionality. This will often be enough to assure that they have reasonable large sample statistical properties. Second, there may be a tradeoff between the principle of conditionality and tractability; e.g., it is sometimes possible, by deliberately introducing some randomization into an estimator, to make it easier to compute or easier to characterize its distribution, and this gain will sometimes offset the loss of the conditionality property.
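
The following sketch is an added illustration of the cow example; the text does not specify a sampling distribution for the reported count, so a Poisson count with the stated conditional means is assumed here purely to make the example computable, and the labels ν and T1-T3 follow the notation above.

    import numpy as np

    rng = np.random.default_rng(1)
    nu, reps = 100, 500_000                       # illustrative true population size

    w = rng.integers(1, 3, size=reps)             # 1 = count noses, 2 = count ears
    y = rng.poisson(nu * w)                       # assumed Poisson count with mean nu*w

    T1 = y / w
    T2 = 2 * y / 3
    T3 = y / w + w - 1.5

    for name, T in [("T1", T1), ("T2", T2), ("T3", T3)]:
        print(name,
              "E[T|w=1] ~", round(T[w == 1].mean(), 1),
              "E[T|w=2] ~", round(T[w == 2].mean(), 1),
              "E[T] ~", round(T.mean(), 1))
    # Only T1 is centered at nu conditional on w; T2 and T3 are centered only on average.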

A more realistic example where ancillarity and the principle of conditionality are useful arises in data x = (x1,...,xn) where the xi are independent observations from an exponential density and the sample size n is random with a Poisson density γ^{n-1}e^{-γ}/(n-1)! for n = 1,2,.... The DGP density is then λ^n exp(-λ(x1 + ... + xn))·γ^{n-1}e^{-γ}/(n-1)!. This density factors into the density λ^n y^{n-1}e^{-λy}/(n-1)!, with y = x1 + ... + xn, that is now the conditional density of y given n, times a marginal density that is a function of n, y, and γ, but not of λ. Then, the principle of ancillarity says that to make inferences on λ, one should condition on n and not be concerned with the nuisance parameter γ that enters only the marginal density of n, and the principle of conditionality says that in evaluating an estimator of λ, one should condition on n and not rely on some averaging over the distribution of n to yield some apparently desirable property.

6.1.6. Unbiasedness. An estimator T(·) is unbiased for θ if E_{x|θ}T(x) ≡ θ for all θ; i.e., θ ≡ ∫T(x)f(x,θ)dx. An estimator with this property is "centered" at the true parameter value, and will not systematically be too high or too low. Unbiasedness is an intuitively appealing criterion that is often used in classical statistics to select estimators. One implication of unbiasedness is that a comparison of the MSE of two unbiased estimators reduces to a comparison of their variances, since bias makes no contribution to their respective MSE. When T(y,w) is a function of statistics (y,w), the estimator is conditionally unbiased for θ given w if E_{x|θ,w}T(y,w) ≡ θ. If w is ancillary to θ, then conditionally unbiased estimators, conditioned on the ancillary w, will be appealing, while the principle of conditionality suggests that an (unconditionally) unbiased estimator falls short. However, the next concept questions whether unbiasedness in general is a desirable property.

6.1.7. Admissibility. An estimator T(·) from a class of estimators T for a scalar parameter θ is admissible relative to T if there is no other estimator T′(·) in T with the property that the MSE of T′ is less than or equal to the MSE of T for all θ ∈ Θ, with inequality strict for at least one θ. This is the same as the definition of admissibility in statistical decision theory when the cost of a mistake is defined as mean squared error (MSE), the expected value of the square of the difference between the estimate and the true value of θ. An inadmissible estimator is undesirable because there is an identified alternative estimator that is more closely clustered around the true parameter value. Geometrically, if one interprets the MSE of an estimator as a point in a space where each possible θ is a coordinate, then the estimator T is admissible relative to T if there is no estimator in the class whose MSE is southwest of the MSE of T. The class T may have many admissible estimators, arrayed northwest or southeast of each other so that there is no uniform-in-θ ranking of their MSE. Then, admissibility is a relatively weak criterion that, when applied, may leave many candidate estimators to be sorted out on other grounds. Another limitation of the admissibility criterion is that one might in fact have a cost of mistakes that is inconsistent with minimizing mean squared error. Suppose, for example, you incur a cost of zero if your estimate is no greater than a distance M from the true value, and a cost of one otherwise. Then, you will prefer the estimator that gives a higher probability of being within distance M, even if it occasionally has large deviations that make its MSE large.

The admissibility criterion is usually inconsistent with the unbiasedness criterion, a conflict between two reasonable properties. An example illustrates the issue. Suppose T(·) is an unbiased estimator. Suppose θ* is an arbitrary point in Θ and c is a small positive constant, and define T′(·) = (1-c)T(·) + cθ*; this is called a Stein shrinkage estimator. Then

E_{x|θ}(T′(x) - θ)² = E_{x|θ}[(1-c)(T(x) - θ) + c(θ* - θ)]² = c²(θ* - θ)² + (1-c)²E_{x|θ}[T(x) - θ]²,

implying that ∂E_{x|θ}(T′(x) - θ)²/∂c = 2c(θ* - θ)² - 2(1-c)E_{x|θ}[T(x) - θ]² < 0 for c sufficiently small. Then, for a problem where (θ* - θ)² and E_{x|θ}[T(x) - θ]² are bounded for all θ ∈ Θ, one can find c for which T′(·) has lower MSE than T(·), so that T(·) is inadmissible.
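
A quick simulation, added here as an illustration (the normal population, θ* = 0, and the value of c are chosen arbitrarily), shows the shrinkage estimator beating the unbiased sample mean on MSE at a particular true θ:

    import numpy as np

    # Added illustration: MSE of the unbiased sample mean vs. a Stein-type
    # shrinkage estimator T' = (1-c)*xbar + c*theta_star.
    rng = np.random.default_rng(2)
    theta, sigma, n, reps = 1.0, 1.0, 20, 400_000
    c, theta_star = 0.02, 0.0

    x = rng.normal(theta, sigma, size=(reps, n))
    T = x.mean(axis=1)                         # unbiased estimator
    T_shrink = (1 - c) * T + c * theta_star    # shrinkage estimator

    print("MSE of xbar     :", np.mean((T - theta) ** 2))         # ~ sigma^2/n = 0.05
    print("MSE of shrinkage:", np.mean((T_shrink - theta) ** 2))  # slightly smaller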

The concept of admissibility can be extended to vectors of parameters by saying that an estimator is admissible if each linear combination of the estimators is admissible for the same linear combination of the parameters. Consequently, if a vector of estimators is admissible, each coordinate estimator is admissible, but the reverse is not true.

6.1.8. Efficiency. An estimator T(·) of a scalar parameter is efficient relative to an estimator T′(·) if for all θ the MSE of T(·) is less than or equal to the MSE of T′(·). The estimator T(·) is efficient relative to a class of estimators T if it is efficient relative to T′(·) for all T′(·) in T. An efficient estimator provides estimates that are most closely clustered around the true value of θ, by the MSE measure, among all the estimators in T. In terms of the geometric interpretation of MSE as a point in a space with a coordinate for each possible θ, an efficient estimator T(·) in T has the property that every other estimator T′(·) in T has a MSE to the northeast of T(·). It is possible for two distinct estimators to map into the same MSE point, in which case they are termed equivalent. This does not imply they are identical, or even that they agree in terms of other criteria that may be relevant to the user. Recall that admissibility requires that there be no other estimator in T with a MSE to the southwest of T(·). If there are estimators to the northwest or southeast of T(·), then T(·) will not be efficient because it no longer has a uniformly (in θ) weakly smallest MSE; however, T(·) can still be admissible even if it is not efficient. Thus, an efficient estimator must be admissible, but in general an admissible estimator need not be efficient. Then there can be many admissible estimators, but no efficient estimator. If T contains an efficient estimator T(·), then another estimator T′(·) in T is admissible only if it is also efficient. The concept of efficiency extends to parameter vectors by requiring that it apply to each linear combination of the parameter vector. The following theorem establishes an important efficiency result for estimators that are functions of sufficient statistics:

Theorem 6.1. (Blackwell) If T(·) is any estimator of θ from data x, and y is a sufficient statistic, then there exists an estimator T̄(·) that is a function solely of the sufficient statistic and that is efficient relative to T(·). If T(·) is unbiased, then so is T̄(·). If an unbiased estimator T̄(·) is uncorrelated with every unbiased estimator of zero, then T̄(·) has a smaller variance than any other unbiased estimator, and is the unique efficient estimator in the class of unbiased estimators.

Proof: Suppose there is a scalar parameter. Make a one-to-one transformation of the data x into (y,z), where y is the sufficient statistic, and let g1(y,θ)g2(z|y) denote the DGP density. Define T̄(y) = E_{z|y}T(y,z). Write T(y,z) - θ = [T(y,z) - T̄(y)] + [T̄(y) - θ]. Then

E(T(y,z) - θ)² = E(T(y,z) - T̄(y))² + E(T̄(y) - θ)² + 2E(T̄(y) - θ)(T(y,z) - T̄(y)).

But the last term satisfies

2E(T̄(y) - θ)(T(y,z) - T̄(y)) = 2E_y(T̄(y) - θ)E_{z|y}(T(y,z) - T̄(y)) = 0.

Therefore, E(T(y,z) - θ)² ≥ E(T̄(y) - θ)². If T(y,z) is unbiased, then ET̄(y) = E_yE_{z|y}T(y,z) = θ, and T̄(·) is also unbiased. Finally, suppose T̄(·) is uncorrelated with any estimator U(·) that is an unbiased estimator of zero, i.e., EU(y,z) = 0 implies EU(y,z)(T̄(y) - θ) = 0. Then, any unbiased T′(y,z) has U(y,z) = T′(y,z) - T̄(y) an unbiased estimator of zero, implying

E(T′(x) - θ)² = E(T′(x) - T̄(x) + T̄(x) - θ)² = E(T′(x) - T̄(x))² + E(T̄(x) - θ)² + 2E(T̄(x) - θ)(T′(x) - T̄(x)) = E(T′(x) - T̄(x))² + E(T̄(x) - θ)² > E(T̄(x) - θ)².

The theorem also holds for vectors of parameters, and can be established by applying the arguments above to each linear combination of the parameter vector. □

6.1.9. Minimum Variance Unbiased Estimator (MVUE). If T is a class of unbiased estimators of a scalar parameter, so that E_{x|θ}T(x) ≡ θ for every estimator T(·) in this class, then an estimator is efficient in this class if its variance is no larger than the variance of any other estimator in the class, and is termed a MVUE. There are many problems for which no MVUE exists. We next give a lower bound on the variance of an unbiased estimator. If a candidate estimator attains this bound, then we can be sure that it is MVUE. However, the converse is not true: there may be a MVUE, but its variance may still be larger than this lower bound; i.e., the lower bound may be unattainable. Once again, the MVUE concept can be extended to parameter vectors by requiring that it apply to each linear combination of parameters. If an observation x has a density f(x,θ) that is a continuously differentiable function of the parameter θ, define its Fisher Information to be

J = E_{x|θ}[∇_θ log f(x,θ)][∇_θ log f(x,θ)]′.

Because this is the expectation of a square (or in the matrix case, the product of a vector times its transpose), J is non-negative (or in the matrix case, positive semi-definite). Except for pathological cases, it will be strictly positive. The following bound establishes a sense in which the Fisher Information provides a lower bound on the precision with which a parameter can be estimated.

Theorem 6.2. (Cramer-Rao Bound) Suppose a simple random sample x = (x1,...,xn), with f(x,θ) the density of an observation x. Assume that log f(x,θ) is twice continuously differentiable in a scalar parameter θ, and that this function and its derivatives are bounded in magnitude by a function that is independent of θ and has a finite integral in x. Let J denote the Fisher information in an observation x. Suppose an estimator T(x) has E_{x|θ}T(x) ≡ θ + µ(θ), so that µ(θ) is the bias of the estimator. Suppose that µ(θ) is differentiable. Then, the variance of T(x) satisfies

V_{x|θ}(T(x)) ≥ (I + ∇_θµ(θ))(nJ)^{-1}(I + ∇_θµ(θ))′.

If the estimator is unbiased, so µ(θ) ≡ 0, this bound reduces to

V_{x|θ}(T(x)) ≥ (nJ)^{-1},

so that the variance of an unbiased estimator is at least as large as the inverse of the Fisher information in the sample. This result continues to hold when θ is a vector, with V_{x|θ}(T(x)) a covariance matrix and ≥ interpreted to mean that the matrix difference is positive semidefinite.

Proof: Assume θ is a scalar. Let L(x,θ) = Σ_{i=1}^n log f(xi,θ), so that the DGP density is f(x,θ) = e^{L(x,θ)}. By construction,

1 ≡ ∫e^{L(x,θ)}dx  and  θ + µ(θ) ≡ ∫T(x)e^{L(x,θ)}dx.

The conditions of the Lebesgue dominated convergence theorem are met, allowing differentiation under the integral sign. Then, differentiate each integral with respect to θ to get

0 ≡ ∫∇_θL(x,θ)e^{L(x,θ)}dx  and  1 + ∇_θµ(θ) ≡ ∫T(x)∇_θL(x,θ)e^{L(x,θ)}dx.

Combine these to get an expression for the covariance of T and ∇_θL,

1 + ∇_θµ(θ) ≡ ∫[T(x) - θ]∇_θL(x,θ)e^{L(x,θ)}dx ≡ ∫[T(x) - θ - µ(θ)]∇_θL(x,θ)e^{L(x,θ)}dx,

where the second identity uses the fact that ∫∇_θL(x,θ)e^{L(x,θ)}dx ≡ 0. Apply the Cauchy-Schwartz inequality; see 3.5.9. In this case, the inequality can be written

(1 + ∇_θµ(θ))² = [∫[T(x) - θ - µ(θ)]∇_θL(x,θ)e^{L(x,θ)}dx]² ≤ [V_{x|θ}(T(x))][E_{x|θ}(∇_θL(x,θ))²].

Dividing both sides by the Fisher information in the sample, which is simply the variance of the sample score, E_{x|θ}(∇_θL(x,θ))² = nJ, gives the bound.

When θ is k×1, one again has θ + µ(θ) = ∫T(x)e^{L(x,θ)}dx. Differentiating with respect to θ gives

I + ∇_θµ(θ) = ∫T(x)∇_θL(x,θ)′e^{L(x,θ)}dx = ∫(T(x) - θ - µ(θ))∇_θL(x,θ)′e^{L(x,θ)}dx.

The vector (T(x) - θ - µ(θ), ∇_θL(x,θ)) has a positive semidefinite covariance matrix that can be written in partitioned form as

[ V_{x|θ}(T(x))         I + ∇_θµ(θ) ]
[ (I + ∇_θµ(θ))′        nJ          ].

If one premultiplies this matrix by W, and postmultiplies by W′, with W = [ I   -(I + ∇_θµ(θ))(nJ)^{-1} ], the resulting matrix is positive semidefinite, and gives the Cramer-Rao bound for the vector case.
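
As a numerical illustration of the bound (added here; the exponential-mean example is not taken from the text), consider an exponential population with density f(x,θ) = θ^{-1}e^{-x/θ}. The sample mean is unbiased for θ, the Fisher information per observation is J = 1/θ², and the sample mean attains the Cramer-Rao bound θ²/n:

    import numpy as np

    rng = np.random.default_rng(3)
    theta, n, reps = 2.0, 50, 200_000

    x = rng.exponential(theta, size=(reps, n))
    xbar = x.mean(axis=1)                    # unbiased estimator of theta

    J = 1.0 / theta**2                       # Fisher information per observation
    cr_bound = 1.0 / (n * J)                 # Cramer-Rao bound for an unbiased estimator

    print("Var(xbar) ~", xbar.var())         # close to theta^2/n = 0.08
    print("CR bound   ", cr_bound)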

6.1.10. Invariance. In some conditions, one would expect that a change in a problem should not alter an estimate of a parameter, or should alter it in a specific way. Generically, these are called invariance properties of an estimator. For example, when estimating a parameter from data obtained by a simple random sample, the estimate should not depend on the indexing of the observations in the sample; i.e., T(x1,...,xn) should be invariant under permutations of the observations. To illustrate, suppose one found that T(x1) had some reasonable property such as being an unbiased estimator of θ. It is not invariant under permutation, but the symmetrized estimator T̄(x) = (T(x1) + ... + T(xn))/n is, and hence by this invariance criterion would be a preferable estimator.

A second example is invariance with sample scale: if Tn(x1,...,xn) denotes the estimator for a sample of size n, and the observations all equal a constant c, then the estimator should not change with sample size, or Tn(c,...,c) = T1(c). A sample mean, for example, has invariance under permutation and invariance with sample scale.

Sometimes a parameter enters a DGP in such a way that there is a simple relationship between shifts in the parameter and the shifts one would expect to observe in the data. For example, suppose the density of an observation is of the form f(xi,θ) ≡ h(xi - θ); in this case, θ is called a location parameter. If the true value of θ shifts up by an amount ∆, one would expect observations on average to shift up by the same amount ∆. If Tn(x1,...,xn) is an estimator of θo in this problem, a reasonable property to impose on Tn(·) is that Tn(x1+∆,...,xn+∆) = Tn(x1,...,xn) + ∆. In this case, Tn(·) is termed location invariant. For this parametric family, it is reasonable to restrict attention to estimators with this invariance property.

Another example is scale invariance. Suppose the density of an observation has the form f(xi,θ) ≡ θ^{-1}h(xi/θ). Then θ is called a scale parameter. If θ is increased by a proportion λ, one would expect observations on average to be scaled up by λ. The corresponding invariance property on an estimator Tn(·) is that Tn(λx1,...,λxn) = λTn(x1,...,xn).

To illustrate the use of invariance conditions, consider the example of a simple random sample x = (x1,...,xn) from an exponential distribution with an unknown scale parameter λ, with the DGP density f(x,λ) = λ^{-n}exp(-(x1 + ... + xn)/λ). Then y = (x1 + ... + xn)/n is sufficient and we need consider only estimators Tn(y). Invariance with respect to scale implies Tn(y) = yTn(1). Invariance with sample scale requires that if x1 = ... = xn = 1, so that y = 1, then Tn(1) = T1(1). Combining these conditions, Tn(y) = yT1(1), so that an estimator that is a function of the sufficient statistic and has these invariance properties must be proportional to the sample mean, with a proportion independent of sample size.

6.1.11. The next group of properties refer to the limiting behavior of estimators in a sequence of larger and larger samples, and are sometimes called asymptotic properties. The rationale for employing these properties is that when one is working with a large sample, then properties that hold in the limit will also hold, approximately, for this sample. The reason for considering such properties at all, rather than concentrating on the sample you actually have, is that one can use these approximate properties to choose among estimators in situations where the exact finite sample property cannot be imposed or is analytically intractable to work out.

Application of asymptotic properties raises several conceptual and technical issues. The first question is what it would mean to increase sample size indefinitely, and whether various methods that might be used to define this limit correspond to approximations that are likely to be relevant to a specific problem. There is no ambiguity when one is drawing simple random samples from an infinite population. However, if one samples from a finite population, a finite sequence of samples of increasing size will terminate in a complete census of the population. While one could imagine sampling with replacement and drawing samples that are larger than the population, it is not obvious why estimators that have some reasonable properties in this limit are necessarily appropriate for the finite population. Put another way, it is not obvious that this limit provides a good approximation to the finite sample.

The issue of the appropriate asymptotic limit is particularly acute for time series. One can imagine extending observations indefinitely through time. This may provide approximations that are appropriate in some situations for some purposes, but not for others. For example, if one is trying to estimate the timing of a particular event, a local feature of the time series, it is questionable that extending the time series indefinitely into the past and future leads to a good approximation to the statistical properties of the estimator of the timing of an event. Other ways of thinking of increasing sample sizes for time series, such as sampling from more and more "parallel" universes, or sampling at shorter and shorter intervals, have their own idiosyncrasies that make them questionable as useful approximations.

A second major issue is how the sequence of estimators associated with various sample sizes is defined. A conceptualization introduced in Chapter 5 defines an estimator to be a functional of the empirical CDF of the data, T(Fn). Then, it is natural to think of T(F(·,θ)) as the limit of this sequence of estimators, and the Glivenko-Cantelli theorem stated in Chapter 5.1 establishes an approximation property: the estimator T(Fn) converges almost surely to T(F(·,θ)) if the latter exists. This suggests that defining estimators as continuous functions of the CDF leads to a situation in which the asymptotic limit will have reasonable approximation properties in large samples. However, it is important to avoid reliance on asymptotic arguments when it is clear that the asymptotic approximation is irrelevant to the behavior of the estimator in the range of sample sizes actually encountered. Consider an estimation procedure which says "Ignore the data and estimate θo to be zero in all samples of size less than 10 billion, and for larger samples employ some computationally complex but statistically sound estimator". This procedure may technically have good asymptotic properties, but this approximation obviously tells you nothing about the behavior of the estimator in economic sample sizes of a few thousand observations.

6.1.12. Consistency. A sequence of estimators Tn(x) = Tn(x1,...,xn) for samples of size n is consistent for θo if the probability that they are more than a distance ε > 0 from θo goes to zero as n increases; i.e., lim_{n→∞} P(|Tn(x1,...,xn) - θo| > ε) = 0. In the terminology of Chapter 4, this is weak convergence or convergence in probability, written Tn(x1,...,xn) →p θo. One can also talk about strong consistency, which holds when lim_{n→∞} P(sup_{m≥n}|Tm(x1,...,xm) - θo| > ε) = 0, and corresponds to almost sure convergence, Tn(x1,...,xn) →as θo.

6.1.13. Asymptotic Normality. A sequence of estimators Tn(·) for samples of size n is consistent asymptotically normal (CAN) for θ if there exists a sequence rn of scaling constants such that rn → +∞ and rn(Tn(xn) - θ) converges in distribution to a normally distributed random variable with some mean µ = µ(θ) and variance σ² = σ(θ)². If Ψn(t) is the CDF of Tn(xn), then Qn = rn(Tn(xn) - θ) has the CDF P(Qn ≤ q) = Ψn(θ + q/rn). From Chapter 4, one will have convergence in distribution to a normal, rn(Tn(xn) - θ) →d Z with Z ~ N(µ,σ²), if and only if for each q, the CDF of Qn satisfies

lim_{n→∞} [Ψn(θ + q/rn) - Φ((q-µ)/σ)] = 0.

This is the conventional definition of convergence in distribution, with the continuity of the normal CDF Φ permitting us to state the condition without excepting jump points in the limit distribution. In this setup, Ψn(t) is converging in distribution to 1(t ≥ θ), the CDF of the constant random variable equal to θ. However, rn is blowing up at just the right rate so that Ψn(θ + q/rn) has a non-degenerate asymptotic distribution, whose shape is determined by the local shape of Ψn in shrinking neighborhoods of θ. Asymptotic normality is used to approximate points of the CDF of Tn(xn) in a large but finite sample by using the fact that Ψn(tn) ≈ Φ((rn(tn - θ) - µ)/σ). The mean µ is termed the asymptotic bias, and σ² is termed the asymptotic variance. If µ = 0, the estimator is said to be asymptotically unbiased. An unbiased estimator will be asymptotically unbiased, but the reverse is not necessarily true. Often, when a sequence of estimators is said to be asymptotically normal, asymptotic unbiasedness is taken to be part of the definition unless stated explicitly to the contrary. The scaling term rn can be taken to be n^{1/2} in almost all finite-parameter problems, and unless it is stated otherwise, you can assume that this is the scaling that is being used. When it is important to make this distinction clear, one can speak of Root-n consistent asymptotically normal (RCAN) sequences of estimators.

Convergence in distribution to a normal is a condition that holds pointwise for each true parameter θ. One could strengthen the property by requiring that this convergence be uniform in θ; i.e., by requiring for each ε > 0 and q that there be a sample size n(ε,q) beyond which sup_{θo} |Ψn(θo + q/rn) - Φ((q - µ(θo))/σ(θo))| < ε. If this form of convergence holds, and in addition µ(θ) and σ(θ)² are continuous functions of θ, then the estimator is said to be consistent uniformly asymptotically normal (CUAN).
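
As an added illustration of the definition (not from the text), the standardized sample mean of an exponential population with rn = n^{1/2} has µ = 0 and σ(θ) = θ, and its empirical CDF can be compared with the limiting normal CDF at a few points:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(4)
    theta, n, reps = 1.0, 200, 100_000        # exponential mean theta, sample size n

    xbar = rng.exponential(theta, size=(reps, n)).mean(axis=1)
    Q = np.sqrt(n) * (xbar - theta)           # r_n (T_n - theta) with r_n = sqrt(n)

    for q in (-1.0, 0.0, 1.0):
        print(q, (Q <= q).mean(), norm.cdf(q, loc=0.0, scale=theta))
    # The empirical CDF of Q is close to the N(0, theta^2) limit.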

6.1.14. Asymptotic Efficiency. Consider a family T of sequences of estimators Tn(·) that are CUAN for a parameter θ and have asymptotic bias µ(θ) ≡ 0. An estimator T*(·) is asymptotically efficient relative to the class T if its asymptotic variance is no larger than that of any other member of the family. The reason for restricting attention to the CUAN class is that in the absence of uniformity, there exist super-efficient estimators, constructed in the following way: Suppose Tn(·) is an asymptotically efficient estimator in the CUAN class. For an arbitrary θ*, define Tn*(·) to equal Tn(·) if n^{1/2}|Tn(x) - θ*| ≥ 1, and equal to θ* otherwise. This estimator will have the same asymptotic variance as Tn(·) for fixed θ ≠ θ*, and an asymptotic variance of zero for θ = θ*. Thus, it is more efficient. On the other hand, it has a nasty asymptotic bias for parameter vectors that are local to θ*, so that it is not CUAN, and would be an unattractive estimator to use in practice. Once these non-uniform superefficient estimators are excluded by restricting attention to the CUAN class, one has the result that under reasonable regularity conditions, an asymptotic version of the Cramer-Rao bound for unbiased estimators holds for CUAN estimators.

6.1.15. Asymptotic sufficiency. In some problems, sufficiency does not provide a useful reduction of dimension in finite samples, but a weaker "asymptotic" form of sufficiency will provide useful restrictions. This could arise if the DGP density can be written g1(y,θ)g2(z|y,θ) for a low-dimensional statistic y, but both g1 and g2 depend on θ so y is not sufficient. However, g2(z|y,θ) may converge in distribution to a density that does not depend on θ. Then, there is a large sample rationale for concentrating on estimators that depend only on y.

6.2. GENERAL ESTIMATION CRITERIA

6.2.1. It is useful to have some general methods of generating estimators that as a consequence of their construction will have some desirable statistical properties. Such estimators may prove adequate in themselves, or may form a starting point for refinements that improve statistical properties. We introduce several such methods:

6.2.2. Analogy Estimators. Suppose one is interested in a feature of a target population that can be described as a functional of its CDF F(·), such as its mean, median, or variance, and write this feature as θ = µ(F). An analogy estimator exploits the similarity of a population and of a simple random sample drawn from this population, and forms the estimator T(x) = µ(Fn), where µ is the functional that produces the target population feature and Fn is the empirical distribution function. For example, a sample mean will be an analogy estimator for a population mean.

6.2.3. Moment Estimators. Population moments will depend on the parameter index in the underlying DGP. This is true for ordinary moments such as means, variances, and covariances, as well as more complicated moments involving data transformations, such as quantiles. Let m(x) denote a function of an observation and E_{x|θ}m(x) = γ(θ) denote the population moment formed by taking the expectation of m(x). In a sample x = (x1,...,xn), the idea of a moments estimator is to form a sample moment (1/n)Σ_{i=1}^n m(xi) ≡ E_n m(x), and then to use the analogy of the population and sample moments to form the approximation E_n m(x) ≈ E_{x|θ}m(x) = γ(θ). The sample average of a function m(x) of an observation can also be interpreted as its expectation with respect to the empirical distribution of the sample; we use the notation E_n m(x) to denote this empirical expectation. The moment estimator T(x) solves E_n m(x) = γ(T(x)). When the number of moment conditions equals the number of parameters, an exact solution is normally obtainable, and T(x) is termed a classical method of moments estimator. When the number of moment conditions exceeds the number of parameters, it is not possible in general to find T(x) that sets them all to zero at once. In this case, one may form a number of linear combinations of the moments equal to the number of parameters to be estimated, and find T(x) that sets these linear combinations to zero. The linear combinations in turn may be derived starting from some metric that provides a measure of the distance of the moments from zero, with T(x) interpreted as a minimand of this metric. This is called generalized method of moments estimation.

6.2.4. Maximum likelihood estimators. Consider the DGP density f(x,θ) for a given sample as a function of θ. The maximum likelihood estimator of the unknown true value θo is the statistic T(x) that maximizes f(x,θ). The intuition behind this estimator is that if we guess a value for θ that is far away from the true θo, then the probability law for this θ would be very unlikely to produce the data that are actually observed, whereas if we guess a value for θ that is near the true θo, then the probability law for this θ would be likely to produce the observed data. Then, the T(x) which maximizes this likelihood, as measured by the probability law itself, should be close to the true θo. The maximum likelihood estimator plays a central role in classical statistics, and can be motivated solely in terms of its desirable classical statistical properties in large samples.

When the data are a sample of n independent observations, each with density f(x,θ), then the likelihood of the sample is f(x,θ) = ∏_{i=1}^n f(xi,θ). It is often convenient to work with the logarithm of the density, l(x,θ) ≡ Log f(x,θ). Then, the Log Likelihood of the sample is L(x,θ) ≡ Log f(x,θ) = Σ_{i=1}^n l(xi,θ). The maximum likelihood estimator is the function t = T(x) of the data that when substituted for θ maximizes f(x,θ), or equivalently L(x,θ).

The gradient of the log likelihood of an observation with respect to θ is denoted s(x,θ) ≡ ∇_θl(x,θ), and termed the score. The maximum likelihood estimator is a zero of the sample expectation of the score, E_n s(x,T(x)) = 0. Then, the maximum likelihood estimator is a special case of a moments estimator.

Maximum likelihood estimators will under quite general regularity conditions be consistent and asymptotically normal. Under uniformity conditions that rule out some odd non-uniform "super-efficient" alternatives, they are also asymptotically efficient. They often have good finite-sample properties, or can be easily modified so that they do. However, their finite-sample properties have to be determined on a case-by-case basis. In multiple parameter problems, particularly when there are primary parameters α and nuisance parameters β, the maximum likelihood principle can sometimes be used to handle the nuisance parameters. Specifically, maximum likelihood estimation for all parameters will find the parameter values that solve max_{α,β} L(x,α,β). But one could get the same solution by first maximizing in the nuisance parameters β, obtaining a solution β = r(x,α), and substituting this back into the likelihood function to obtain L(x,α,r(x,α)). This is called the concentrated likelihood function, and it can now be maximized in α alone. The reason this can be an advantage is that one may be able to obtain r(x,α) formally without having to compute it.
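
A small sketch of the concentration device, added here for illustration using the normal likelihood of the next section: for fixed µ the maximizing value of σ² is r(x,µ) = (1/n)Σ(xi - µ)², and maximizing the resulting concentrated log likelihood in µ alone recovers the sample mean. (A general-purpose scipy optimizer is used here; no particular routine is named in the text.)

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(6)
    x = rng.normal(3.0, 2.0, size=500)
    n = x.size

    def concentrated_neg_loglik(mu):
        # sigma^2 concentrated out analytically: r(x, mu) = mean((x - mu)^2)
        s2 = np.mean((x - mu) ** 2)
        return 0.5 * n * (np.log(2 * np.pi) + np.log(s2) + 1.0)

    res = minimize_scalar(concentrated_neg_loglik)
    print(res.x, x.mean())        # the concentrated MLE of mu equals the sample mean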

6.3. ESTIMATION IN NORMALLY DISTRIBUTED POPULATIONS

6.3.1. Consider a simple random sample x = (x1,...,xn) from a population in which observations are normally distributed with mean µ and variance σ². Let φ(ν) = (2π)^{-1/2}exp(-ν²/2) denote the standard normal density. Then the density of observation xi is φ((xi - µ)/σ)/σ. The log likelihood of the sample is

L(x,µ,σ²) = -(n/2)Log(2π) - (n/2)Log σ² - (1/2)Σ_{i=1}^n (xi - µ)²/σ².

We will find estimates µ_e and σ_e² for the parameters µ and σ² using the maximum likelihood method, and establish some of the statistical properties of these estimators.

6.3.2. The first-order conditions for maximizing L(x,µ,σ²) in µ and σ² are

0 = Σ_{i=1}^n (xi - µ)/σ²  ⇒  µ_e = x̄ ≡ (1/n)Σ_{i=1}^n xi,

0 = -n/(2σ²) + Σ_{i=1}^n (xi - µ)²/(2σ^4)  ⇒  σ_e² = (1/n)Σ_{i=1}^n (xi - x̄)².

The maximum likelihood estimator of µ is then the sample mean, and the maximum likelihood estimator of σ² is the sample variance. Define s² = σ_e²·n/(n-1) = (1/(n-1))Σ_{i=1}^n (xi - x̄)², the sample variance with a sample-size correction. The following result summarizes the properties of these estimators:

Theorem 6.3. If x = (x1,...,xn) is a simple random sample from a population in which observations are normally distributed with mean µ and variance σ², then
(1) (x̄, s²) are joint minimal sufficient statistics for (µ,σ²).
(2) x̄ is an unbiased estimator for µ, and s² an unbiased estimator for σ².
(3) x̄ is a Minimum Variance Unbiased Estimator (MVUE) for µ; s² is MVUE for σ².
(4) x̄ is Normally distributed with mean µ and variance σ²/n.
(5) (n-1)s²/σ² has a Chi-square distribution with n-1 degrees of freedom.
(6) x̄ and s² are statistically independent.
(7) n^{1/2}(x̄ - µ)/s has a Student's-T distribution with n-1 degrees of freedom.
(8) n(x̄ - µ)²/s² has an F-distribution with 1 and n-1 degrees of freedom.

Proof: (1) Factor the log likelihood function as

L(x,µ,σ²) = -(n/2)Log(2π) - (n/2)Log σ² - (1/2)Σ_{i=1}^n (xi - x̄ + x̄ - µ)²/σ²

          = -(n/2)Log(2π) - (n/2)Log σ² - (1/2)Σ_{i=1}^n (xi - x̄)²/σ² - (n/2)(x̄ - µ)²/σ²

          = -(n/2)Log(2π) - (n/2)Log σ² - (n-1)s²/(2σ²) - (n/2)(x̄ - µ)²/σ².

This implies that x̄ and s² are jointly sufficient for µ and σ². Because the dimension of (x̄, s²) is the same as the dimension of (µ,σ²), they are obviously minimal sufficient statistics.

(2) The expectation of x̄ is E x̄ = (1/n)Σ_{i=1}^n E xi = µ, since the expectation of each observation is µ. Hence x̄ is unbiased. To establish the expectation of s², first form the n×n matrix M = I_n - 1_n1_n′/n, where I_n is the n×n identity matrix and 1_n is a n×1 vector of ones. The matrix M is idempotent (check) and its trace satisfies tr(M) = tr(I_n) - tr(1_n1_n′/n) = n - 1, since tr(1_n1_n′/n) = 1_n′1_n/n = 1. The result then follows from Theorem 3.11 (viii). For a direct demonstration, let Z = (x1-µ,...,xn-µ)′ denote the vector of deviations of observations from the population mean. This vector contains independent identically distributed normal random variables with mean zero and variance σ², so that E ZZ′ = σ²I_n. Further, MZ = (x1 - x̄,...,xn - x̄)′ and s² = Z′M′MZ/(n-1) = Z′MZ/(n-1). Therefore, E s² = E(Z′MZ)/(n-1) = E tr(Z′MZ)/(n-1) = E tr(MZZ′)/(n-1) = tr(M·E(ZZ′))/(n-1) = σ²tr(M)/(n-1) = σ². Hence, s² is unbiased.

(3) The MVUE property of x̄ and s² is most easily proved by application of the Blackwell theorem. We already know that these estimators are unbiased. Any other unbiased estimator of µ then has the property that the difference of this estimator and x̄, which we will denote by h(x), must satisfy E h(x) ≡ 0. Alternately, h(x) could be the difference of s² and any other unbiased estimator of σ². We list a series of conditions, and then give the arguments that link these conditions.

(a) 0 ≡ E h(x) ≡ ∫h(x)exp(L(x,µ,σ²))dx ≡ ∫h(x)exp(-Σ_{i=1}^n (xi - µ)²/2σ²)dx.

(b) 0 ≡ ∫h(x)[Σ_{i=1}^n (xi - µ)]exp(-Σ_{i=1}^n (xi - µ)²/2σ²)dx ≡ ∫h(x)x̄·exp(-Σ_{i=1}^n (xi - µ)²/2σ²)dx.

(c) 0 ≡ ∫h(x)x̄[Σ_{i=1}^n (xi - µ)]exp(-Σ_{i=1}^n (xi - µ)²/2σ²)dx ≡ ∫h(x)x̄²·exp(-Σ_{i=1}^n (xi - µ)²/2σ²)dx ≡ ∫h(x)[x̄ - µ]²·exp(-Σ_{i=1}^n (xi - µ)²/2σ²)dx.

(d) 0 ≡ ∫h(x)[Σ_{i=1}^n (xi - µ)²]exp(-Σ_{i=1}^n (xi - µ)²/2σ²)dx ≡ ∫h(x)[Σ_{i=1}^n (xi - x̄)²]exp(-Σ_{i=1}^n (xi - µ)²/2σ²)dx.

Condition (a) is the statement that h(x) has expectation zero, and the second form is obtained by striking terms that can be taken outside the integral. Differentiate the last form of (a) with respect to µ and strike out terms to get condition (b). The second form of (b) is obtained by using (a) to show that the second part of the term in brackets makes a contribution that is zero. Differentiate the last form of (b) with respect to µ and strike out terms to get (c). The second form of (c) is obtained by using (a) to simplify the term in brackets. The last form of (c) follows by expanding the squared term and applying (a) and (b). Condition (d) is obtained by differentiating the last form of (a) with respect to σ² and striking out terms. Write xi - µ = xi - x̄ + x̄ - µ, expand the square, and apply (c) to obtain the last form of (d). Condition (b) implies E h(x)x̄ ≡ 0, and condition (d) implies E h(x)s² = 0. Then, the estimators x̄ and s² are uncorrelated with any unbiased estimator of zero. The Blackwell theorem then establishes that they are the unique minimum variance estimators among all unbiased estimators.

(4) Next consider the distribution of x̄. We use the fact that linear transformations of multivariate normal random vectors are again multivariate normal: if Z ~ N(µ,Ω) and W = CZ, then W ~ N(Cµ,CΩC′). This result holds even if Z and W are of different dimensions, or C is of less than full rank. (If the rank of CΩC′ is less than full, then the random variable has all its density concentrated on a subspace.) Now x̄ = Cx when C = (1/n,...,1/n) = 1_n′/n. We have x multivariate normal with mean µ1_n and covariance matrix σ²I_n. Therefore, x̄ ~ N(µC1_n, σ²CC′) = N(µ, σ²/n).

(5) Next consider the distribution of s². Consider the quadratic form (Z/σ)′M(Z/σ), where Z = x - µ1_n is the vector of deviations from (2) and M = I_n - 1_n1_n′/n is the idempotent matrix from (2). The vector Z/σ consists of independent standard normal components, and (Z/σ)′M(Z/σ) = (n-1)s²/σ², so that Theorem 3.11(iii) gives the result.

(6) The matrices C = 1_n′/n and M = I_n - 1_n1_n′/n have CM = 0. Then Theorem 3.11(vii) gives the result that C(x/σ) = x̄/σ and (Z/σ)′M(Z/σ) = (n-1)s²/σ² are independent.

For (7), use Theorem 3.9(ii), and for (8), use Theorem 3.9(iii).
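
The following simulation sketch is added as an illustration (it is not part of the proof): it checks the mean and variance of x̄, the unbiasedness of s², and the near-zero correlation between x̄ and s² implied by their independence.

    import numpy as np

    rng = np.random.default_rng(7)
    mu, sigma, n, reps = 5.0, 3.0, 10, 200_000

    x = rng.normal(mu, sigma, size=(reps, n))
    xbar = x.mean(axis=1)
    s2 = x.var(axis=1, ddof=1)          # sample variance with the n-1 correction

    print("E xbar ~", xbar.mean(), " (mu =", mu, ")")
    print("V xbar ~", xbar.var(), " (sigma^2/n =", sigma**2 / n, ")")
    print("E s^2  ~", s2.mean(), " (sigma^2 =", sigma**2, ")")
    print("corr(xbar, s^2) ~", np.corrcoef(xbar, s2)[0, 1])   # approximately zero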

6.4. LARGE SAMPLE PROPERTIES OF MAXIMUM LIKELIHOOD ESTIMATORS

This section provides a brief and informal introduction to the statistical properties of maximum likelihood estimators and similar estimation methods in large samples. Consider a simple random sample x = (x1,...,xn) from a population in which the density of an observation is f(x,θo). The DGP density or likelihood of the sample is then f(x,θ) = f(x1,θ)·...·f(xn,θ), with θo the true value of θ. The log likelihood of an observation is l(x,θ) = log f(x,θ), and the log likelihood of the sample is Ln(x,θ) = Σ_{i=1}^n l(xi,θ). The maximum likelihood estimator Tn(x) is a value of θ which maximizes Ln(x,θ). The first-order condition for this maximum is that the sample score,

∇_θLn(x,θ) = Σ_{i=1}^n ∇_θl(xi,θ),

equal zero at θ = Tn(x). The second-order condition is that the sample hessian, ∇_θθLn(x,θ) = Σ_{i=1}^n ∇_θθl(xi,θ), be negative at θ = Tn(x). When the parameter θ is more than one-dimensional, the second-order condition is that the sample hessian be a negative definite matrix.

Under very mild regularity conditions, the expectation of the score of an observation is zero at the true parameter vector. Start from the identity ∫exp(l(x,θ))dx ≡ 1 and differentiate with respect to θ under the integral sign to obtain the condition ∫∇_θl(x,θ)exp(l(x,θ))dx ≡ 0. (Regularity conditions are needed to assure that one can indeed differentiate under the integral; this will be supplied by assuming a dominance condition so that the Lebesgue dominated convergence theorem can be applied; see Theorem 3.1 and the discussion following.) Then, at the true parameter θo, one has E_{x|θo}∇_θl(x,θo) = 0, the condition that the population score is zero at θ = θo. Another regularity condition requires that E_{x|θo}∇_θl(x,θ) = 0 only if θ = θo; this has the interpretation of an identification condition. The maximum likelihood estimator can be interpreted as an analogy estimator that chooses Tn(x) to satisfy a sample condition (that the sample score be zero) that is analogous to the population score condition. One could sharpen the statement of this analogy by writing the population score as an explicit function of the population DGP, µ(θ,F(·,θo)) ≡ E_{x|θo}∇_θl(x,θ), and writing the sample score as µ(θ,Fn) ≡ E_n∇_θl(x,θ), where E_n stands for empirical expectation, or sample average. The mapping µ(θ,·) is linear in its second argument, and this is enough to assure that it is continuous (in an appropriate sense) in this argument. Then one has almost sure convergence of µ(θ,Fn) to µ(θ,F(·,θo)) for each θ, from the Glivenko-Cantelli theorem. A few additional regularity conditions are enough to ensure that this convergence is uniform in θ, and that a solution Tn(x) that sets the sample score to zero converges almost surely to the value θo that sets the population score to zero.

The basic large sample properties of maximum likelihood estimators are that, subject to suitable regularity conditions, Tn converges in probability to the true parameter vector θo, and n^{1/2}(Tn - θo) converges in distribution to a normal random variable with mean zero and a variance which achieves the Cramer-Rao bound for an unbiased estimator. These results imply that in large samples, Tn will become a more and more precise estimate of the true parameter. Further, the convergence in distribution to a Normal permits one to use the properties of a Normal population to construct approximate hypothesis tests and confidence bounds, and get approximations for significance levels and power whose accuracy increases with sample size. The achievement of the Cramer-Rao lower bound on variance indicates that in large samples there are no alternative estimators which are uniformly more precise, so MLE is the "best" one can do.

We next list a series of regularity conditions under which the results stated above can be shown to hold. Only the single parameter case will be presented. However, the conditions and results have direct generalizations to the multiple parameter case. This list is chosen so the conditions are easy to interpret and to check in applications. Note that these are conditions on the population DGP, not on a specific sample. Hence, "checking" means verifying that your model of the DGP and your assumptions on distributions of random variables are logically consistent with the regularity conditions. They cannot be verified empirically by looking at the data, but it is often possible to set up and carry out empirical tests that may allow you to conclude that some of the regularity conditions fail. There are alternative forms for the regularity conditions, as well as weaker conditions, which give the same or similar limiting results. The regularity conditions are quite generic, and will be satisfied in many economic applications. However, it is a serious mistake to assume without checking that the DGP you assume for your problem is consistent with these conditions. While in most cases the mantra "I assume the appropriate regularity conditions" will work out, you can be acutely embarrassed if your DGP happens to be one of the exceptions that is logically inconsistent with the regularity conditions, particularly if it results in estimators that fail to have desirable statistical properties. Here are the conditions:

A.1. There is a single parameter θ which is permitted to vary in a closed bounded subset Θ. The true value θo is in the interior of Θ.
A.2. The sample observations are realizations of independently identically distributed random variables x1,...,xn, with a common density f(x,θo).
A.3. The density f(x,θ) is continuous in θ, and three times continuously differentiable in θ, for each x, and is "well behaved" (e.g., measurable or piecewise continuous or continuous) in x for each θ.
A.4. There exists a bound β(x) on the density and its derivatives which is uniform in θ and satisfies |l(x,θ)| ≤ β(x), (∇_θl(x,θ))² ≤ β(x), |∇_θθl(x,θ)| ≤ β(x), |∇_θθθl(x,θ)| ≤ β(x), and ∫β(x)²f(x,θo)dx < +∞. (Then, β(x) is a dominating, square-integrable function.)
A.5. The function λ(θ) = E_{x|θo}l(x,θ) has λ(θ) < λ(θo) and ∇_θλ(θ) ≠ 0 for θ ≠ θo, and J ≡ -∇_θθλ(θo) > 0.

The expression J in A.5 is termed the Fisher information in an observation. The first two assumptions mostly set the problem. The restriction of the parameter to a closed bounded set guarantees that a MLE exists, and can be relaxed by adding conditions elsewhere. Requiring θo interior to Θ guarantees that the first-order condition E_n∇_θl(x,Tn(x)) = 0 for a maximum holds for large n, rather than an inequality condition for a maximum at a boundary. This really matters because MLE at boundaries can have different asymptotic distributions and rates of convergence than the standard n^{1/2} rate of convergence to the normal. The continuity conditions A.3 are satisfied for most economic problems, and in some weak form are critical to the asymptotic distribution results. Condition A.4 gives bounds that permit exchange of the order of differentiation and integration in forming expectations with respect to the population density. Condition A.5 is an identification requirement which implies there cannot be a parameter vector other than θo that on average always explains the data as well as θo.

The next result establishes that under these regularity conditions, a MLE is consistent and asymptotically normal (CAN):

Theorem 6.4. If A.1-A.5 hold, then a maximum likelihood estimator Tn satisfies
(1) Tn is consistent for θo.
(2) Tn is asymptotically normal: n^{1/2}(Tn(x) - θo) →d Zo ~ N(0,J^{-1}), with J equal to the Fisher information in an observation, J = E_{x|θo}[∇_θl(x,θo)]².
(3) E_n[∇_θl(x,Tn)]² →p J and -E_n∇_θθl(x,Tn) →p J.
(4) Suppose T′n is any sequence of estimators that solve equations of the form E_n g(x,θ) = 0, where g is twice continuously differentiable and satisfies E_{x|θo}g(x,θ) = 0 if and only if θ = θo; uniform bounds |g(x,θ)| ≤ β(x), (∇_θg(x,θ))² ≤ β(x), |∇_θθg(x,θ)| ≤ β(x), where Eβ(x)² < +∞; and R ≡ -E∇_θg(x,θo) ≠ 0. Let S = E g(x,θo)². Then T′n →p θo and n^{1/2}(T′n - θo) →d Z1 ~ N(0,V), where V = R^{-1}SR^{-1}. Further, V ≥ J^{-1}, so that the MLE Tn is efficient relative to T′n. Further, Zo and Z1 have the covariance property cov(Zo, Z1 - Zo) = 0.

Result (2) in this theorem implies that to a good approximation in large samples, the estimator Tn is normal with mean θo and variance (nJ)^{-1}, where J is the Fisher information in an observation. Since this variance is the Cramer-Rao bound for an unbiased estimator, this also suggests that one is not going to be able to find other estimators that are also unbiased in this approximation sense and which have lower variance. Result (3) gives two ways of estimating the asymptotic variance J^{-1} consistently, where we use the fact that J^{-1} is a continuous function of J for J > 0, so that it can be estimated consistently by the inverse of a consistent estimator of J. Result (4) establishes that MLE is efficient relative to a broad class of estimators called M-estimators.
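
To illustrate results (2) and (3), the sketch below is added here (the exponential-rate example is chosen for the illustration, not taken from the text): the MLE of the rate λo in an exponential sample is Tn = 1/x̄, the Fisher information per observation is J = 1/λo², and the two sample-based estimates of J in result (3) can be compared with it.

    import numpy as np

    rng = np.random.default_rng(8)
    lam0, n, reps = 0.5, 400, 50_000
    J = 1.0 / lam0**2                                # Fisher information per observation

    x = rng.exponential(1.0 / lam0, size=(reps, n))  # exponential with rate lam0
    Tn = 1.0 / x.mean(axis=1)                        # MLE of the rate

    # Result (2): n^{1/2}(Tn - lam0) is approximately N(0, 1/J).
    z = np.sqrt(n) * (Tn - lam0)
    print("var of n^(1/2)(Tn - lam0) ~", z.var(), " vs 1/J =", 1.0 / J)

    # Result (3): two consistent estimates of J from the first simulated sample.
    x0, T0 = x[0], Tn[0]
    J_hat_score = np.mean((1.0 / T0 - x0) ** 2)      # E_n[score at Tn]^2
    J_hat_hess = 1.0 / T0**2                         # minus E_n[hessian at Tn]
    print(J_hat_score, J_hat_hess, J)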

Proof: An intuitive demonstration of the Theorem will be given rather than formal proofs. Consider first the consistency result. The reasoning is as follows. Consider the expected likelihood of an observation,

λ(θ) ≡ Ex|θo l(x,θ) = ∫ l(x,θ)f(x,θo)dx.

We will argue that λ(θ) has a unique maximum at θo. Then we will argue that any function which is uniformly very close to λ(θ) must have its maximum near θo. Finally, we argue by applying a uniform law of large numbers that the likelihood function is with probability approaching one uniformly very close to λ for n sufficiently large. Together, these results will imply that with probability approaching one, Tn is close to θo for n large.

Assumption A.4 ensures that λ(θ) is continuous, and that one can reverse the order of differentiation and integration to obtain continuous derivatives

∇θλ(θ) = ∫ ∇θl(x,θ)f(x,θo)dx ≡ Ex|θo ∇θl(x,θ)

∇θθλ(θ) = ∫ ∇θθl(x,θ)f(x,θo)dx ≡ Ex|θo ∇θθl(x,θ).

Starting from the identity

1 ≡ ∫ f(x,θ)dx ≡ ∫ e^{l(x,θ)}dx,

one obtains by differentiation

0 ≡ ∫ ∇θl(x,θ)e^{l(x,θ)}dx

0 ≡ ∫ [∇θθl(x,θ) + (∇θl(x,θ))²]e^{l(x,θ)}dx.

Evaluated at θo, these imply 0 = ∇θλ(θo) and -∇θθλ(θo) = Ex|θo (∇θl(x,θo))² = J.

Assumption A.5 requires further that J > 0, and that θo is the only root of ∇θλ(θ). Hence, λ(θ) has a unique maximum at θo, and no other θ satisfies a first-order condition or boundary condition for a local maximum.

We argue next that any function which is close enough to ∇θλ(θ) will have at least one root near θo and no roots far away from θo. The figure below graphs ∇θλ(θ), along with a "sleeve" which is a vertical distance δ from ∇θλ. Any function trapped in the sleeve must have at least one root between θo - ε1 and θo + ε2, where [θo-ε1, θo+ε2] is the interval where the sleeve intersects the axis, and must have no roots outside this interval. Furthermore, the uniqueness of the root θo of ∇θλ(θ) plus the condition ∇θθλ(θo) < 0 imply that as δ shrinks toward zero, so do ε1 and ε2. In the graph, the sample score intersects the axis within the sleeve, but for parameter values near two is outside the sleeve. The last step in the consistency argument is to show that with probability approaching one the sample score will be entirely contained within the sleeve; i.e., that Ln(x,θ) is with probability approaching one contained in a δ-sleeve around λ(θ). For fixed θ, Ln(x,θ) = n^{-1} Σ_{i=1}^n l(xi,θ) is a sample average of i.i.d. random variables l(x,θ) with mean λ(θ). Then Kolmogorov's SLLN implies Ln(x,θ) →as λ(θ).


[Figure: the population score ∇θλ(θ) and a sample score plotted against the parameter value (horizontal axis, -2 to 2), with a δ-neighborhood ("sleeve") around the population score; vertical axis is the score, -0.8 to 0.8.]

This is not quite enough, because there is a question of whether Ln(x,θ) could converge non-uniformly to λ(θ), so that for any n there are some values of θ where Ln(x,θ) is outside the sleeve. However, assumptions A.1, A.3, and A.4 imply max_{θ∈Θ} |Ln(x,θ) - λ(θ)| →as 0. This follows in particular because the differentiability of f(x,θ) in θ from A.3 and the bound on ∇θl(x,θ) from A.4 imply that l(·,θ) is almost surely continuous on the compact set Θ, so that the uniform SLLN in Chapter 4.5 applies. This establishes that Tn →as θo.

We next demonstrate the asymptotic normality of Tn. A Taylor's expansion of the first-order condition for maximization of the log likelihood function about θo gives

(1) 0 = ∇θLn(Tn) = ∇θLn(θo) + ∇θθLn(θo)(Tn-θo) + ∇θθθLn(Tan)(Tn-θo)²/2,

where Tan is some point between Tn and θo. Define the quantities

Bn = n^{-1/2} Σ_{i=1}^n ∇θl(xi,θo), Cn = n^{-1} Σ_{i=1}^n ∇θθl(xi,θo), Dn = n^{-1} Σ_{i=1}^n ∇θθθl(xi,Tan).

Since Ln is a sample average, ∇θLn(θo) = n^{-1/2}Bn, ∇θθLn(θo) = Cn, and ∇θθθLn(Tan) = Dn. Multiply equation (1) by n^{1/2}/(1 + n^{1/2}|Tn-θo|) and let Zn = n^{1/2}(Tn-θo)/(1 + n^{1/2}|Tn-θo|). Then, one gets

(2) 0 = Bn/(1 + n^{1/2}|Tn-θo|) + CnZn + DnZn(Tn-θo)/2.

We make a limiting argument on each of the terms. First, the ∇θl(xi,θo) are i.i.d. random variables with E∇θl(xi,θo) = ∇θλ(θo) = 0 and E[∇θl(xi,θo)]² = -∇θθλ(θo) = J. Hence the Lindeberg-Levy CLT implies Bn →d Wo ~ N(0,J). Second, the ∇θθl(xi,θo) are i.i.d. random variables with E∇θθl(xi,θo) = -J. Hence the Khinchine WLLN implies Cn →p -J. Third, |Dn| ≤ n^{-1} Σ_{i=1}^n |∇θθθl(xi,Tan)| ≤ n^{-1} Σ_{i=1}^n β(xi) →p Eβ(x) < +∞, by A.4 and Khinchine's WLLN, so that Dn is stochastically bounded.


Furthermore, |Zn| ≤ 1, implying Zn = Op(1). Since Tn is consistent, (Tn - θo) = op(1). Therefore, by rule 6 in Figure 4.3, DnZn(Tn - θo)/2 = op(1).

Given J/2 > ε > 0, these arguments establish that we can find a sample size no such that for n > no, with probability at least 1-ε, we have |DnZn(Tn - θo)/2| < ε, |Cn + J| < ε, and |Bn| < M for a large constant M (since Bn →d Wo implies Bn = Op(1)). In this event, |Cn| > J - ε, |Bn + Cn n^{1/2}(Tn-θo)| < ε(1 + n^{1/2}|Tn-θo|), and |Bn| ≤ M imply |Cn| n^{1/2}|Tn-θo| - |Bn| ≤ |Bn + Cn n^{1/2}(Tn-θo)| < ε(1 + n^{1/2}|Tn-θo|). This implies the inequality (J - 2ε) n^{1/2}|Tn-θo| < M + ε. Therefore n^{1/2}(Tn-θo) = Op(1); i.e., it is stochastically bounded. Therefore, by rule 6 in Figure 4.3, multiplying (2) by 1 + n^{1/2}|Tn-θo| yields 0 = Bn + Cn n^{1/2}(Tn-θo) + op(1). But Cn →p -J < 0 implies Cn^{-1} →p -J^{-1}. By rule 6, (Cn^{-1} + J^{-1})Bn = op(1) and n^{1/2}(Tn-θo) = J^{-1}Bn + op(1). The limit rules in Figure 3.1 then imply J^{-1}Bn →d Zo ~ N(0,J^{-1}), n^{1/2}(Tn-θo) - J^{-1}Bn →p 0, and hence n^{1/2}(Tn-θo) →d Zo.

The third result in the theorem is that J is estimated consistently by

(3) Jn = n^{-1} Σ_{i=1}^n [∇θl(xi,Tn)]².

To show this, make a Taylor's expansion of this expression around θo,

(4) Jn = n^{-1} Σ_{i=1}^n [∇θl(xi,θo)]² + 2 [n^{-1} Σ_{i=1}^n ∇θl(xi,Tan)∇θθl(xi,Tan)](Tn-θo).

We have already shown that the first term in (4) converges in probability to J. The second term is the product of (Tn - θo) →p 0 and an expression which is bounded by 2 n^{-1} Σ_{i=1}^n β(xi)² →p 2Eβ(x)² < +∞, by Khinchine's WLLN. Hence the second term is op(1) and Jn →p J.

The final result in the theorem establishes that the MLE is efficient relative to any M-estimator T'n satisfying n^{-1} Σ_{i=1}^n g(xi,T'n) = 0, where g meets a series of regularity conditions. The first conclusion in this result is that T'n is consistent and n^{1/2}(T'n-θo) is asymptotically normal. This is actually of considerable independent interest, since many of the alternatives to MLE that are used in econometrics for reasons of computational convenience or robustness are M-estimators. Ordinary least squares is a leading example of an estimator in this class. The argument for the properties of T'n is exactly the same as for the MLE case above, with g replacing ∇θl. The only difference is that R and S are not necessarily equal, whereas for g = ∇θl in the MLE case, we had R = S = J. To make the efficiency argument, consider together the Taylor's expansions used to get the asymptotic distributions of Tn and T'n,


0 = n^{-1/2} Σ_{i=1}^n ∇θl(xi,Tn) = n^{-1/2} Σ_{i=1}^n ∇θl(xi,θo) + [n^{-1} Σ_{i=1}^n ∇θθl(xi,θo)] n^{1/2}(Tn-θo) + op(1)

0 = n^{-1/2} Σ_{i=1}^n g(xi,T'n) = n^{-1/2} Σ_{i=1}^n g(xi,θo) + [n^{-1} Σ_{i=1}^n ∇θg(xi,θo)] n^{1/2}(T'n-θo) + op(1)

Solving these two equations gives

n^{1/2}(Tn-θo) = J^{-1}Wn + op(1)

n^{1/2}(T'n-θo) = R^{-1}Un + op(1)

with Wn = n^{-1/2} Σ_{i=1}^n ∇θl(xi,θo) and Un = n^{-1/2} Σ_{i=1}^n g(xi,θo). Consider any weighted average of these equations,

n^{1/2}((1-γ)Tn + γT'n - θo) = J^{-1}(1-γ)Wn + R^{-1}γUn + op(1).

The Lindeberg-Levy CLT implies that this expression is asymptotically normal with mean zero and variance

Ω = J^{-2}(1-γ)² E[∇θl(x,θo)]² + R^{-2}γ² E[g(x,θo)]² + 2J^{-1}R^{-1}(1-γ)γ E[∇θl(x,θo)g(x,θo)].

The condition 0 ≡ ∫ g(x,θ)f(x,θ)dx ≡ ∫ g(x,θ)e^{l(x,θ)}dx implies, differentiating under the integral sign,

0 ≡ ∫ ∇θg(x,θ)e^{l(x,θ)}dx + ∫ ∇θl(x,θ)g(x,θ)e^{l(x,θ)}dx.

Evaluated at θo, this implies 0 = -R + E∇θl(x,θo)g(x,θo). Hence,

Ω = J^{-1}(1-γ)² + R^{-2}S γ² + 2(1-γ)γ J^{-1}R^{-1}R = J^{-1} + [R^{-2}S - J^{-1}]γ².

Since Ω ≥ 0 for any γ, this requires V = R^{-2}S ≥ J^{-1}, and hence Ω ≥ J^{-1}. Further, note that

Ω = var(Zo + γ(Z1-Zo)) = var(Zo) + γ² var(Z1-Zo) + 2γ cov(Zo, Z1-Zo),

and var(Zo) = J^{-1}, implying 2γ cov(Zo, Z1-Zo) ≥ -γ² var(Z1-Zo). Taking γ small positive or negative implies cov(Zo, Z1-Zo) = 0.


6.5. EXERCISES

1. You have a random sample i = 1,...,n of observations xi drawn from a normal distribution with unknown mean µ and known variance 1. Your prior density p(µ) for µ is normal with mean zero and variance 1/k, where k is a number you know. You must choose an estimate T of µ. You have a quadratic loss function C(T,µ) = (T - µ)². (a) What is the density of the observations, or likelihood, f(x,µ)? (b) What is the posterior density p(µ|x)? (c) What is the Bayes risk R(T(x)|x)? (d) What is the optimal estimator T*(x) that minimizes Bayes risk?

2. A simple random sample with n observations is drawn from an exponential distribution with density λexp(-λx). (a) What is the likelihood function f(x,λ)? (b) What is the maximum likelihood estimator for λ? (c) If you have a prior density αexp(-αλ) for λ, where α is a constant you know, what is the posterior density of λ? What is the optimal estimator that minimizes Bayes risk if you have a quadratic loss function? (d) Using characteristic functions, show that the exact distribution of W = 2nλx̄, where x̄ is the sample mean, is chi-square with 2n degrees of freedom. Use this to find the exact sampling distribution of the maximum likelihood estimator.

3. If h(t) is a convex function and t = T(x) is a statistic, then Jensen's inequality says that Eh(T) ≥ h(ET), with the inequality strict when h is not linear over the support of T. When h is a concave function, Eh(T) ≤ h(ET). If T is an unbiased estimator of a parameter σ², what can you say about T^{1/2} as an estimator of σ and exp(T) as an estimator of exp(σ²)?

4. A simple random sample i = 1,...,n is drawn from a binomial distribution b(K,1,p); i.e., K = k1 + ... + kn is the count of the number of times an event occurs in n independent trials, where ki = 1 with (unknown) probability p and ki = 0 with probability 1-p for i = 1,...,n. Which of the following statistics are sufficient for the parameter p: a. (k1,...,kn); b. (k1², [k2+...+kn]²); c. f ≡ K/n; d. (f, [k1²+...+kn²]); e. [k1²+...+kn²]?

5. You want to estimate mean consumption from a random sample of households i = 1,...,n. You have two alternative income measures, C1i which includes the value of in-kind transfers and C2i which excludes these transfers. You believe that the sample mean m1 of C1i will overstate economic consumption because in-kind transfers are not fully fungible, but the sample mean m2 of C2i will understate economic consumption because these transfers do have value. After some investigation, you conclude that 0.7m1 + 0.3m2 is an unbiased estimator of mean economic consumption; i.e., an in-kind transfer that costs a dollar has a value of 70 cents to the consumer because it is not fully fungible. Your friend Dufus proposes instead the following estimator: Draw a random number between 0 and 1, report the estimate m2 if this random number is less than 0.3, and report the estimate m1 otherwise. Is the Dufus estimator unbiased? Is it as satisfactory as your estimator? (Hint: Does it pass the test of ancillarity?)

6. Suppose T(x) is an unbiased estimator of a parameter θ, and that T has a finite variance. Show that T is inadmissible by demonstrating that (1-λ)T(x) + λ·17, for λ some small positive constant, has a smaller mean square error. (This is called a Stein shrinkage estimator. The constant 17 is obviously immaterial; zero is often used.)

7. In Problem Set 2, you investigated some of the features of the data set nyse.txt, located in the class data area, which contains 7806 observations from January 2, 1968 through December 31, 1998 on stock prices. The file contains columns for the date in yymmdd format (DAT), the daily return on the New York Stock Exchange, including distributions (RNYSE), the Standard & Poor Stock Price Index (SP500), and the daily return on U.S. Treasury 90-day bills from the secondary market (RTB90). Define an additional variable GOOD which is one on days when RNYSE exceeds RTB90, and zero otherwise. The variable GOOD identifies the days on which an overnight buyer of the NYSE portfolio makes money. For the purposes of this exercise, make the maintained hypothesis that these observations are independent, identically distributed draws from an underlying population; i.e., suspend concerns about dependence in the observations across successive days or secular trends in their distribution.


a. Estimate E(GOOD). Describe the finite sample distribution of your estimator, and estimate its sample variance. Use a normal approximation to the finite sample distribution (i.e., match the mean and variance of the exact distribution) to estimate a 90 percent confidence bound.

b. Estimate the population expectation µ of RNYSE employing the sample mean, and alternately the sample median. To obtain estimates of the distribution of these estimators, employ the following procedure, called the bootstrap. From the given sample, draw a resample of the same size with replacement. (To do this, draw 7806 random integers k = floor(1+7806*u), where the u are uniform (0,1) random numbers. Then take observation k of RNYSE for each random integer draw; some observations will be repeated and others will be omitted.) Record the resample mean and median. Repeat this process 100 times, and then estimate the mean and variance of the 100 bootstrap resample means and medians. Compare the bootstrap estimate of the precision of the sample mean estimator with what you would expect if RNYSE were normally distributed. Do confidence statements based on an assumption of normality appear to be justified? Compare the bootstrap estimates of the precision of the mean and median estimators of µ. Does choice of the sample mean rather than the sample median to estimate µ appear to be justified?
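A minimal sketch of the resampling loop described in part (b), written in Python rather than the SST package used elsewhere in the text; the file name comes from the problem statement, but the assumption that RNYSE is the second column of nyse.txt is mine and should be checked against the actual file.

    import numpy as np

    rnyse = np.loadtxt("nyse.txt", usecols=1)          # assumes column 2 holds RNYSE
    rng = np.random.default_rng(12345)

    boot_means, boot_medians = [], []
    for _ in range(100):                               # 100 bootstrap resamples
        resample = rng.choice(rnyse, size=rnyse.size, replace=True)
        boot_means.append(resample.mean())
        boot_medians.append(np.median(resample))

    # Bootstrap estimates of the mean and variance of the two estimators
    print(np.mean(boot_means), np.var(boot_means, ddof=1))
    print(np.mean(boot_medians), np.var(boot_medians, ddof=1))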


CHAPTER 7. HYPOTHESIS TESTING

7.1. THE GENERAL PROBLEM

It is often necessary to make a decision, on the basis of available data from an experiment (carried out by yourself or by Nature), on whether a particular proposition Ho (theory, model, hypothesis) is true, or the converse H1 is true. This decision problem is often encountered in scientific investigation. Economic examples of hypotheses are

(a) The commodities market is efficient (i.e., opportunities for arbitrage are absent).
(b) There is no discrimination on the basis of gender in the market for academic economists.
(c) Household energy consumption is a necessity, with an income elasticity not exceeding one.
(d) The survival curve for Japanese cars is less convex than that for Detroit cars.

Notice that none of these economically interesting hypotheses are framed directly as precise statements about a probability law (e.g., a statement that the parameter in a family of probability densities for the observations from an experiment takes on a specific value). A challenging part of statistical analysis is to set out maintained hypotheses that will be accepted by the scientific community as true, and which in combination with the proposition under test give a probability law. Deciding the truth or falsity of a null hypothesis Ho presents several general issues: the cost of mistakes, the selection and/or design of the experiment, and the choice of the test.

7.2. THE COST OF MISTAKES

Consider a two-by-two table that compares the truth with the result of the statistical decision. For now, think of each of the alternatives Ho and H1 as determining a unique probability law for the observations; these are called simple alternatives. Later, we consider compound hypotheses or alternatives that are consistent with families of probability laws.

                                    Truth
                         H0                       H1
 Decision
 H0 Accepted    Cost = 0                  Cost = CII
                Probability = 1 - α       Probability = β
 H0 Rejected    Cost = CI                 Cost = 0
                Probability = α           Probability = π = 1 - β

There are two possible mistakes, Type I in which a true hypothesis is rejected, and Type II in which a false hypothesis is accepted. There are costs associated with these mistakes -- let CI denote the cost


associated with a Type I mistake, and CII denote the cost associated with a Type II mistake. If the hypothesis is true, then there is a probability α that a particular decision procedure will result in rejection; this is also called the Type I error probability or the significance level. If the hypothesis is false, there is a probability β that it will be accepted; this is called the Type II error probability. The probability π ≡ 1-β is the probability that the hypothesis will be rejected when it is false, and is called the power of the decision procedure.

This table is in principle completely symmetric between the states Ho and H1: You can call your favorite theory Ho and hope the evidence leads to it being accepted, or call it H1 and hope the evidence leads to Ho being rejected. However, classical statistical analysis is oriented so that α is chosen by design, and β requires a sometimes complex calculation. Then, the Type I error is easier to control. Thus, in classical statistics, it is usually better to assign your theory between Ho and H1 so that the more critical mistake becomes the Type I mistake. For example, suppose you set out to test your favorite theory. Your study will be convincing only if your theory passes a test which it would have a high (and known) probability of failing if it is in fact false. You can get such a stringent test by making your theory H1 and selecting a null and a decision procedure for which α is known and small; then your theory will be rejected in favor of Ho with large known probability 1-α if in fact Ho rather than H1 is true. (This will not work if you pick a "straw horse" for the null that no one thinks is plausible.) Conversely, if you set out to do a convincing demolition of a theory that you think is false, then make it the null, so that there is a small known probability α of rejecting the hypothesis if it is in fact true.

A common case for hypothesis testing is that the null hypothesis Ho is simple, but the alternative hypothesis H1 is compound, containing a family of possible probability laws. Then, the probability of a Type II error depends on which member of this family is true. Thus, the power of a test is a function of the specific probability law in a compound alternative. When both the null hypothesis and alternative are compound, the probability of a Type I error is a function of which member of the family of probability laws consistent with Ho is true. In classical statistics, the significance level is always defined to be the "worst case": the largest α for any probability law consistent with the null.

Given the experimental data available and the statistical procedure adopted, there will be a tradeoff between the probabilities of Type I and Type II errors. When the cost CI is much larger than the cost CII, a good decision procedure will make α small relative to β. Conversely, when CI is much smaller than CII, the procedure will make α large relative to β. For example, suppose the null hypothesis is that a drug is sufficiently safe and effective to be released to the market. If the drug is critical for treatment of an otherwise fatal disease, then CI is much larger than CII, and the decision procedure should make α small. Conversely, a drug to reduce non-life-threatening wrinkles should be tested by a procedure that makes β small.

7.3. DESIGN OF THE EXPERIMENT

One way to reduce the probability of Type I and Type II errors is to collect more observations by increasing sample size. One may also by clever design be able to get more information from a given sample size, or more relevant information from a given data collection budget. One has the


widest scope for action when the data is being collected in a laboratory experiment that you can specify. For example, the Negative Income Experiments in the 1960's and 1970's were able to specify experimental treatments that presented subjects with different tradeoffs between wage and transfer income, so that labor supply responses could be observed. However, even in investigations where only natural experiments are available, important choices must be made on what events to study and what data to collect. For example, if a survey of 1000 households is to be made to determine the income elasticity of the demand for energy, one can get more precision by oversampling high income and low income households to get a greater spread of incomes.

There is an art to designing experiments or identifying natural experiments that allow tests of a null hypothesis without confounding by extraneous factors. For example, suppose one wishes to test the null hypothesis that Japanese cars have the same durability as Detroit cars. One might consider the following possible experiments:

(a) Determine the average age, by origin, of registered vehicles.
(b) Sample the age/make of scrapped cars as they arrive at junk yards.
(c) Draw a sample of individual new cars, and follow them longitudinally until they are scrapped.
(d) Draw a sample of individual new cars, and operate them on a test track under controlled conditions until they fail.

Experiment (a) is confounded by potential differences in historical purchase patterns; some of this could be removed by econometric methods that condition on the number of original purchases in earlier years. Experiments (a)-(c) are confounded by possible variations in usage patterns (urban/rural, young/old, winter roads/not). For example, if rural drivers who stress their cars less tend to buy Detroit cars, this factor rather than the intrinsic durability of the cars might make Detroit cars appear to last longer. One way to reduce this factor would be to assign drivers to car models randomly, as might be done for example for cars rented by Avis in the "compact" category. The ideal way to do this is a "double blind" experiment in which neither the subject nor the data recorder knows which "treatment" is being received, so there is no possibility that bias in selection or response could creep in. Most economic experimental treatments are obvious to aware subjects, so that "double blind" designs are impossible. This puts an additional burden on the researcher to carefully randomize assignment of treatments and to structure the treatments so that their form does not introduce factors that confound the experimental results.

Economists are often confronted with problems and data where a designed experiment is infeasible and Nature has not provided a clean "natural experiment", and in addition sample frames and protocols are not ideal. It may nevertheless be possible to model the data generation process to take account of sampling problems, and to use multivariate statistical methods to estimate and test hypotheses about the separate effects of different factors. This exercise can provide useful insights, but must be used cautiously and carefully to avoid misattribution and misinterpretation. Econometricians should follow the rule "Do No Harm". When a natural experiment or data are not adequate to resolve an economic hypothesis, econometric analysis should stop, and not be used to dress up propositions that a righteous analysis cannot support. Every econometric study should


consider very carefully all the possible processes that could generate the observed data, candidly discuss alternative explanations of observations, and avoid unsupportable claims.

7.4. CHOICE OF THE DECISION PROCEDURE

Suppose one thinks of hypothesis testing as a statistical decision problem, like the problem faced by Cab Franc in Chapter 1, with a prior po that Ho is true and p1 = 1 - po that H1 is true. Let f(x|Ho) denote the likelihood of x if Ho is true, and f(x|H1) denote the likelihood if H1 is true. Then, the posterior likelihood of Ho given x is, by application of Bayes Law, q(Ho|x) = f(x|Ho)po/[f(x|Ho)po + f(x|H1)p1]. The expected cost of rejecting Ho given x is then CI·q(Ho|x), and the expected cost of accepting Ho given x is CII·q(H1|x). The optimal decision rule is then to reject Ho for x in the critical region C where CI·q(Ho|x) < CII·q(H1|x). This inequality simplifies to CI·f(x|Ho)po < CII·f(x|H1)p1, implying

x ∈ C (i.e., reject Ho) if and only if f(x|H1)/f(x|Ho) > k ≡ CIpo/CIIp1.

The expression f(x|H1)/f(x|Ho) is termed the likelihood ratio. The optimal criterion is then to reject Ho if and only if the likelihood ratio exceeds a threshold k. The larger CI or po, the larger this threshold.

A classical statistical treatment of this problem will also pick a critical region C of x for which Ho will be rejected, and will do so by maximizing power π = ∫_C f(x|H1)dx subject to the constraint α = ∫_C f(x|Ho)dx. But this is accomplished by picking C = {x | f(x|H1)/f(x|Ho) > k}, where k is a constant chosen so the constraint is satisfied. To see why, observe that if C contains a little rectangle [x,x+δ·1], where δ is a tiny positive constant, then this rectangle contributes f(x|Ho)δ^n to meeting the constraint and f(x|H1)δ^n to power. The ratio f(x|H1)/f(x|Ho) then gives the rate at which power is produced per unit of type I error probability used up. The optimal critical region will start where this rate is the highest, and keep adding to C by decreasing the rate threshold k until the type I error probability constraint is met.
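As a concrete illustration (not from the text), suppose Ho and H1 are simple hypotheses under which a single observation x is N(0,1) or N(1,1), respectively. The likelihood ratio is monotone increasing in x, so the classical construction reduces to a cutoff on x chosen to exhaust the type I error probability; the sketch below, in Python with scipy.stats, carries out that calculation.

    from math import exp
    from scipy.stats import norm

    alpha = 0.05
    # f(x|H1)/f(x|H0) = exp(x - 1/2) is increasing in x, so the critical region
    # {LR > k} has the form {x > c}; choose c to use up exactly alpha.
    c = norm.ppf(1 - alpha)          # about 1.645
    k = exp(c - 0.5)                 # implied likelihood-ratio threshold
    power = 1 - norm.cdf(c - 1)      # P(x > c | H1), about 0.26
    print(c, k, power)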

The optimal decision rule for various prior probabilities and costs and the classical statistical test procedure trace out the same families of procedures, and will coincide when the critical likelihood ratio k in the two approaches is the same. In more general classical hypothesis testing situations where the alternative is compound, there is no longer an exact coincidence of the classical and statistical decision theory approaches to decisions. However, the likelihood ratio often remains a useful basis for constructing good test procedures. In many cases, a "best" test by some classical statistical criterion and a test utilizing the likelihood ratio criterion will be the same or nearly the same.

In general, we will consider a DGP which we maintain is a member of a family f(x,θ) indexed by a parameter θ. The null hypothesis is that the true value θo of θ is contained in a set N, and the


alternative is that it is contained in a set A, with A and N partitioning the universe Θ of possible values of θo. The value θe of θ that maximizes f(x,θ) over θ ∈ Θ is the maximum likelihood estimator. The theory of the maximum likelihood estimator given in Chapter 6 shows that it will have good statistical properties in large samples under mild regularity conditions. The value θoe that maximizes f(x,θ) over θ ∈ N is called the constrained maximum likelihood estimator subject to the null hypothesis. When the null hypothesis is true, the constrained maximum likelihood estimator will also have good statistical properties. Intuitively, the reason is that when the null hypothesis is true, the true parameter satisfies the hypothesis, and hence the maximum value of the constrained likelihood will be at least as high as the value of the likelihood at the true parameter. If an identification condition is met, the likelihood at the true parameter converges in probability to a larger number than the likelihood at any other parameter value. Then, the constrained maximum likelihood estimator must converge in probability to the true parameter. A rigorous proof of the properties of constrained estimators is given in Chapter 22.

A likelihood ratio critical region for the general testing problem is usually defined as a set of the form

C = {x | sup_{θ∈A} f(x,θ)/sup_{θ∈N} f(x,θ) > k}.

The likelihood ratio in this criterion is less than or equal to one when the maximum likelihood estimator of θo falls in N, and otherwise is greater than one. Then a critical region defined for some k > 1 will include the observed vectors x that are the least likely to have been generated by a DGP with a parameter in N. The significance level of the test is set by adjusting k.

Since sup_{θ∈Θ} f(x,θ)/sup_{θ∈N} f(x,θ) = max{1, sup_{θ∈A} f(x,θ)/sup_{θ∈N} f(x,θ)}, an equivalent expression for the critical region when k > 1 is

C = {x | sup_{θ∈Θ} f(x,θ)/sup_{θ∈N} f(x,θ) > k}.

This can also be expressed in terms of the log likelihood function,

C = {x | sup_{θ∈Θ} log f(x,θ) - sup_{θ∈N} log f(x,θ) > κ ≡ log k}.

Clearly, the log ratio in this expression equals the difference between the log likelihood evaluated at the maximum likelihood estimator and the log likelihood evaluated at the constrained maximum likelihood estimator. This difference is zero if the maximum likelihood estimator is in N, and is otherwise positive.

The analyst will often have available alternative testing procedures in a classical testing situation. For example, one procedure to test a hypothesis about a location parameter might be based on the sample mean, a second might be based on the sample median, and a third might be based on the likelihood ratio. Some of these procedures may be better than others in the sense of giving higher power for the same significance level. The ideal, as in the simple case, is to maximize the power given the significance level. When there is a compound alternative, so that power is a function of the alternative, one may be able to tailor the test to have high power against alternatives of particular importance. In a few cases, there will be a single procedure that will have uniformly best power against a whole range of alternatives. If so, this will be called a uniformly most powerful test.


[Figure: power curves for four test procedures, labeled A, B, C, and D, plotted against the true parameter value (horizontal axis, -2 to 2); the vertical axis is power, from 0 to 1.]

The figure above shows power functions for some alternative test procedures. The null hypothesis is that a parameter θ is zero. Power curves A, B, and C equal 0.05 when Ho: θ = 0 is true. Then, the significance level of these three procedures is α = 0.05. The significance level of D is much higher, 0.5. Compare the curves A and B. Since A lies everywhere above B and has the same significance level, A is clearly the superior procedure. A comparison like A and B most commonly arises when A uses more data than B; that is, A corresponds to a larger sample. However, it is also possible to get a picture like this when A and B are using the same sample, but B makes poor use of the information in the sample.

Compare curves A and C. Curve C has significance level α = 0.05, and has lower power than A against alternatives less than θ = 0, but better power against alternatives greater than θ = 0. Thus, A is a better test if we want to test against all alternatives, while C is a better test if we are mainly interested in alternatives to the right of θ = 0 (i.e., we want to test Ho: θ ≤ 0). Compare curves A and D. Curve D has high power, but at the cost of a high probability of a Type I error. Thus, A and D represent a tradeoff between Type I and Type II errors.

Finally, suppose we are most interested in the alternative H1: θ = 1.5. The procedure giving curve A has power 0.61 against this alternative, and hence has a reasonable chance of discriminating between Ho and H1. On the other hand, the procedure B has power 0.32, and much less chance of discriminating. We would conclude that the procedure A is a moderately satisfactory statistical test procedure, while B is of limited use.


7.5. HYPOTHESIS TESTING IN NORMAL POPULATIONS

This section provides a summary of hypothesis test calculations for standard setups involving data drawn from a normal population, including power calculations. Assume that we start from a simple random sample of size n, giving i.i.d. observations x1,...,xn. Recall from Chapter 6.3 that the log likelihood of a normal random sample is

L(x,µ,σ²) = -(n/2) Log(2π) - (n/2) Log σ² - (1/2) Σ_{i=1}^n (xi-µ)²/σ²

         = -(n/2) Log(2π) - (n/2) Log σ² - (n-1)s²/(2σ²) - n(x̄-µ)²/(2σ²) ≡ L(x̄,s²,µ,σ²),

where the sample mean x̄ = n^{-1} Σ_{i=1}^n xi and the sample variance s² = (n-1)^{-1} Σ_{i=1}^n (xi-x̄)² are unbiased estimators of µ and σ², respectively. If N denotes the set of parameter values (µ,σ²) consistent with a null hypothesis, then a likelihood ratio critical set for this hypothesis will take the form

C = {(x̄,s²) | sup_{(µ,σ²)∈Θ} L(x̄,s²,µ,σ²) - sup_{(µ,σ²)∈N} L(x̄,s²,µ,σ²) > κ}.

We consider a sequence of hypotheses and conditions. See Chapter 3.7 for the densities and other properties of the distributions used in this section, and Chapter 6.3 for the relation of these distributions to data from a normal population. The following table gives the statistical functions that are available in many econometrics software packages; the specific notation is that used in the Statistical Software Tools (SST) package. Tables at the end of most statistics texts can also be used to obtain values of the central versions of these distributions. The direct functions give the CDF probability for a specified argument, while the inverse functions give the argument that yields a specified probability.

 Distribution                    CDF                    Inverse CDF
 Normal                          cumnorm(x)             invnorm(p)
 Chi-Square                      cumchi(x,df)           invchi(p,df)
 F-Distribution                  cumf(x,df1,df2)        invf(p,df1,df2)
 T-Distribution                  cumt(x,df)             invt(p,df)
 Non-central Chi-Square          cumchi(x,df,δ)         NA
 Non-central F-Distribution      cumf(x,df1,df2,δ)      NA
 Non-central T-Distribution      cumt(x,df,λ)           NA


In this table, df denotes degrees of freedom, and λ and δ are non-centrality parameters. Inverse CDF's are not available for non-central distributions in most packages, and are not needed. In most statistical packages, values of these functions can either be printed out or saved for further calculations. For example, in SST, the command calc cumnorm(1.7) will print out the probability that a standard normal random variable is less than 1.7, the command calc p = cumnorm(1.7) will store the result of this calculation in the variable p for further use, and a subsequent command calc p will also print out its value.
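For readers working outside SST, the same calculations can be reproduced in any package that supplies these CDFs. The correspondence below with Python's scipy.stats is an assumption about the reader's software, not part of the original text; argument order follows the scipy conventions.

    from scipy.stats import norm, chi2, f, t, ncx2, ncf, nct

    # Approximate counterparts of the SST functions in the table above:
    #   cumnorm(x)            -> norm.cdf(x)           invnorm(p)      -> norm.ppf(p)
    #   cumchi(x,df)          -> chi2.cdf(x, df)       invchi(p,df)    -> chi2.ppf(p, df)
    #   cumf(x,df1,df2)       -> f.cdf(x, df1, df2)    invf(p,df1,df2) -> f.ppf(p, df1, df2)
    #   cumt(x,df)            -> t.cdf(x, df)          invt(p,df)      -> t.ppf(p, df)
    #   cumchi(x,df,delta)    -> ncx2.cdf(x, df, delta)
    #   cumf(x,df1,df2,delta) -> ncf.cdf(x, df1, df2, delta)
    #   cumt(x,df,lambda)     -> nct.cdf(x, df, lambda)
    print(norm.cdf(1.7))      # counterpart of: calc cumnorm(1.7)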

Problem 1: Testing the mean of a normal population that has known variance

Suppose a random sample of size n is drawn from a normal population with an unknown mean µ and a known variance σ². The null hypothesis is Ho: µ = µo, and the alternative is H1: µ ≠ µo. Verify that the likelihood ratio, max_µ f(x,µ)/f(x,µo), is an increasing function of (x̄ - µo)². Hence, a test equivalent to a likelihood ratio test can be based on (x̄ - µo)². From Chapter 6.3(8), one has the result that under the null hypothesis, the statistic n(x̄ - µo)²/σ² is distributed χ²(1). Alternately, from Chapter 6.3(5), the square root of this expression, n^{1/2}(x̄ - µo)/σ, has a standard normal distribution.

Using the Chi-square form of the statistic, the critical region will be values exceeding a critical level zc, where zc is chosen so that the selected significance level α satisfies Pr(χ²(1) ≤ zc) = 1 - α. For example, taking α = 0.05 yields zc = 3.84146. This comes from a statistical table, or from the SST command calc invchi(1-α,k), where α is the significance level and k is the degrees of freedom. The test procedure rejects Ho whenever

(1) n(x̄ - µo)²/σ² > zc = 3.84146.

Consider the power of the Chi-square test against an alternative such as µ = µ1 ≠ µo. The non-centrality parameter is

(2) δ = n(µ1-µo)²/σ².

For example, if µ1-µo = 1.2, σ² = 25, and n = 100, then δ = 1.44·100/25 = 5.76. The power is calculated from the non-central Chi-square distribution (with 1 degree of freedom), and equals the probability that a random draw from this distribution exceeds zc. This probability π is readily calculated using the SST command calc 1 - cumchi(zc,k,δ). In the example, π = calc 1 - cumchi(3.84146,1,5.76) = 0.67006. Then, a test with a five percent significance level has power of 67 percent against the alternative that the true mean is 1.2 units larger than hypothesized.

An equivalent test can be carried out using the standard normal distributed form n^{1/2}(x̄ - µo)/σ. The critical region will be values of this expression that in magnitude exceed a critical level wc, where wc is chosen for a specified significance level α so that a draw from a standard normal density has probability α/2 of being below -wc, and symmetrically a probability α/2 of being above +wc. One can find wc from statistical tables, or by using the SST command calc invnorm(1-α/2). For example, if α = 0.05, then wc = calc invnorm(0.975) = 1.95996. The test rejects Ho whenever

(3) n^{1/2}(x̄ - µo)/σ < -wc or n^{1/2}(x̄ - µo)/σ > wc.


For example, if n = 100, σ = 5, and µo = 0, the critical region for a test with significance level α = 0.05 is 10·x̄/5 < -1.95996 or 10·x̄/5 > +1.95996. Note that wc² = zc, so this test rejects exactly when the Chi-square test rejects. The power of the test above against the alternative µ = µ1 ≠ µo is the probability that the random variable n^{1/2}(x̄ - µo)/σ lies in the critical region when x̄ ~ N(µ1,σ²/n). In this case, n^{1/2}(x̄ - µ1)/σ ≡ Y is standard normal, and therefore n^{1/2}(x̄ - µo)/σ ≡ Y + λ, where

(4) λ = n^{1/2}(µ1 - µo)/σ.

Note that λ² = δ, where δ is given in (2). The probability of rejection in the left tail is Pr(n^{1/2}(x̄ - µo)/σ < -wc | µ = µ1) = Pr(Y < -wc - λ). For the right tail, Pr(n^{1/2}(x̄ - µo)/σ > wc | µ = µ1) = Pr(Y > wc - λ). Using the fact that the standard normal is symmetric, we then have

(5) π = Φ(-wc - λ) + 1 - Φ(wc - λ) ≡ Φ(-wc - λ) + Φ(-wc + λ).

This can be calculated using the SST command

π = calc cumnorm(-wc-λ) + cumnorm(-wc+λ).

For example, σ = 5, n = 100, µ1 - µo = 1.2, and wc = 1.95996 give λ = 2.4 and power π = calc cumnorm(-wc - 2.4) + cumnorm(-wc + 2.4) = 0.670. Note this is the same as the power of the Chi-square version of the test.
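The numbers in this example can be checked with any software providing the central and noncentral Chi-square and normal CDFs; the sketch below uses Python's scipy.stats as a stand-in for the SST commands.

    from scipy.stats import norm, chi2, ncx2

    n, sigma, diff, alpha = 100, 5.0, 1.2, 0.05
    delta = n * diff**2 / sigma**2                  # noncentrality, eq. (2): 5.76
    zc = chi2.ppf(1 - alpha, 1)                     # 3.84146
    power_chi = ncx2.sf(zc, 1, delta)               # about 0.670

    wc = norm.ppf(1 - alpha / 2)                    # 1.95996
    lam = n**0.5 * diff / sigma                     # eq. (4): 2.4
    power_norm = norm.cdf(-wc - lam) + norm.cdf(-wc + lam)   # eq. (5), also about 0.670
    print(power_chi, power_norm)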

Suppose that instead of testing the null hypothesis Ho: µ = µo against the alternative H1: µ ≠ µo, you want to test the one-sided hypothesis Ho: µ ≤ µo against the alternative H1: µ > µo. The likelihood ratio in this case is sup_{µ>µo} n(x,µ,σ²)/sup_{µ≤µo} n(x,µ,σ²), which is constant for x̄ ≤ µo and is monotone increasing in (x̄ - µo) for x̄ > µo. Hence, a test that rejects Ho for x̄ - µo large appears desirable. This suggests using a test based on the statistic n^{1/2}(x̄-µo)/σ, which is normal with variance one and has a non-positive mean under the null. Pick a critical level wc > 0 such that

sup_{µ≤µo} Prob(n^{1/2}(x̄-µo)/σ > wc | µ) = α.

Note that the sup is taken over all the possible true µ consistent with Ho, and that α is the selected significance level. The maximum probability of Type I error is achieved when µ = µo. (To see this, note that Prob(n^{1/2}(x̄-µo)/σ > wc) = Pr(Y ≡ n^{1/2}(x̄-µ)/σ > wc + n^{1/2}(µo-µ)/σ), where µ is the true value. Since Y is standard normal, this probability is largest over µ ≤ µo at µ = µo.) Then, wc is determined to give probability α that a draw from a standard normal exceeds wc. For example, if n = 100, α = 0.05, σ = 5, and Ho is that µ ≤ 0, then wc = calc invnorm(0.95) = 1.64485. The power of the test of µ ≤ µo = 0 against the alternative µ = µ1 = 1.2 is given by

(6) π = Pr(n^{1/2}(x̄-µo)/σ > wc | µ = µ1) = Pr(Y ≡ n^{1/2}(x̄-µ1)/σ > wc - λ) = 1 - Φ(wc - λ) ≡ Φ(-wc + λ) = calc cumnorm(-wc + λ),


where λ is given in (4). In the example, π = calc cumnorm(-1.64485 + 2.4) = 0.775. Hence, a test which has a probability of at most α = 0.05 of rejecting the null hypothesis when it is true has power 0.775 against the specific alternative µ1 = 1.2.
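The same one-sided calculation in scipy.stats (again a substitute for the SST commands):

    from scipy.stats import norm

    n, sigma, diff, alpha = 100, 5.0, 1.2, 0.05
    wc = norm.ppf(1 - alpha)         # 1.64485
    lam = n**0.5 * diff / sigma      # 2.4
    power = norm.cdf(-wc + lam)      # eq. (6), about 0.775
    print(wc, power)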

Problem 2. Testing the Mean of a Normal Population with Unknown Variance

This problem is identical to Problem 1, except that σ² must now be estimated. Use the estimator s² for σ² in the Problem 1 test statistics. From Chapter 6.3(8), the Chi-square test statistic with σ replaced by s, F = n(x̄ - µo)²/s², has an F-distribution with degrees of freedom 1 and n-1. Hence, to test Ho: µ = µo against the alternative H1: µ ≠ µo, find a critical level zc such that a specified significance level α equals the probability that a draw from F(1,n-1) exceeds zc. The SST function calc zc = invf(1-α,1,n-1) gives this critical level; it can also be found in standard tables. For n = 100 and α = 0.05, the critical level is zc = 3.93694.

The power of the test against an alternative µ1 is the probability that the statistic F exceeds zc. Under this alternative, F has a non-central F-distribution (from Chapter 3.9) with the non-centrality parameter δ = n(µ1 - µo)²/σ² given in (2). Then, the power is given by

(7) π = calc 1 - cumf(zc,1,n-1,δ).

In the example with µ1 - µo = 1.2 and σ² = 25, one has δ = 144/25, and the power is

(8) π = calc 1 - cumf(3.93694,1,99,144/25) = 0.662.

The non-centrality parameter is defined using the true σ² rather than the estimate s². Calculating power at an estimated non-centrality parameter δe = n(µ1 - µo)²/s² introduces some error -- you will evaluate the power curve at a point somewhat different than you would like. For most practical purposes, you do not need an exact calculation of power; you are more interested in whether it is 0.1 or 0.9. Then, the error introduced by this approximation can be ignored. In particular, for large sample sizes where the power against economically interesting alternatives is near one, this error is usually negligible. Note that δ/δe = s²/σ², so (n-1)δ/δe is distributed χ²(n-1). For the rare application where you really need to know how precise your power calculation is, you can form a confidence interval as follows: Given a "significance level" α, compute z1 = calc invchi(α/2,n-1) and z2 = calc invchi(1-α/2,n-1). Then, with probability 1 - α, δ1 ≡ z1δe/(n-1) < δ < z2δe/(n-1) ≡ δ2. The power π1 calculated at δ1 and the power π2 calculated at δ2 give a (1-α)-level confidence bound on the exact power. For example, α = 0.5, n = 100, µ1 - µo = 1.2, and s² = 25 imply δe = 144/25, z1 = calc invchi(.25,99) = 89.18, δ1 = 5.189, and π1 = calc 1 - cumf(3.93694,1,99,5.189) = 0.616. Also, z2 = calc invchi(.75,99) = 108.093, δ2 = 6.289, and π2 = calc 1 - cumf(3.93694,1,99,6.289) = 0.700. Then, with probability 0.5, the exact power for the alternative µ1 - µo = 1.2 is in the interval [0.616,0.700].

The test of Ho: µ = µo can be carried out equivalently using

(9) T = n^{1/2}(x̄ - µo)/s,


which by Chapter 6.3(7) has a t-distribution with n-1 degrees of freedom under Ho: µ = µo. For a significance level α, choose a critical level wc, and reject the null hypothesis when |T| > wc. The value of wc satisfies α/2 = t_{n-1}(-wc), and is given in standard tables, or in SST by wc = invt(1-α/2,n-1). For the example α = 0.05 and n = 100, this value is wc = calc invt(.975,99) = 1.9842.

The power of the test is calculated as in Problem 1, replacing the normal distribution by the non-central t-distribution: π = t_{n-1,λ}(-wc) + 1 - t_{n-1,λ}(wc), where λ = n^{1/2}(µ1 - µo)/σ as in equation (4). Points of the non-central t are not in standard tables, but are provided by an SST function, π = cumt(-wc,n-1,λ) + 1 - cumt(wc,n-1,λ). For the example, α = 0.05, n = 100, σ = 5, and µ1 - µo = 1.2 imply λ = 2.4, and this formula gives π = 0.662, the same as the power of the F form of the test.

The T-statistic (9) can be used to test the one-sided hypothesis Ho: µ ≤ µo. The hypothesis will be rejected if T > wc, where wc satisfies α = t_{n-1}(-wc), and is given in standard tables, or in SST by wc = invt(1-α,n-1). The power of the test is calculated in the same way as the one-sided test in Problem 1, with the non-central t-distribution replacing the normal: π = 1 - cumt(wc,n-1,λ).
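A sketch of the Problem 2 calculations in scipy.stats, substituting for the SST commands; the noncentral F and noncentral t forms of the two-sided test should give the same power, about 0.662, and the last two lines reproduce the confidence bound on the power discussed above.

    from scipy.stats import f, t, ncf, nct, chi2

    n, sigma2, diff, alpha = 100, 25.0, 1.2, 0.05
    delta = n * diff**2 / sigma2                        # 5.76
    zc = f.ppf(1 - alpha, 1, n - 1)                     # 3.93694
    power_F = ncf.sf(zc, 1, n - 1, delta)               # eqs. (7)-(8), about 0.662

    wc = t.ppf(1 - alpha / 2, n - 1)                    # 1.9842
    lam = delta**0.5                                    # 2.4
    power_t = nct.cdf(-wc, n - 1, lam) + nct.sf(wc, n - 1, lam)

    z1, z2 = chi2.ppf(0.25, n - 1), chi2.ppf(0.75, n - 1)    # bound with alpha = 0.5
    d1, d2 = z1 * delta / (n - 1), z2 * delta / (n - 1)
    print(power_F, power_t, ncf.sf(zc, 1, n - 1, d1), ncf.sf(zc, 1, n - 1, d2))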

Problem 3. Testing the Variance of a Normal Population with Unknown Mean

Suppose Ho: σ² = σo² versus the alternative H1 that this equality does not hold. Under the null, the statistic X ≡ (n-1)s²/σo² is distributed χ²(n-1). Then, a test with significance level α can be made by rejecting Ho if X < zc1 or X > zc2, where zc1 and zc2 are chosen so the probability is α/2 that a draw from χ²(n-1) is less than zc1, and α/2 that it is greater than zc2. These can be calculated using zc1 = calc invchi(α/2,n-1) and zc2 = calc invchi(1-α/2,n-1). To calculate the power of the test against the alternative H1: σ² = σ1², note that in this case (n-1)s²/σ1² = X·σo²/σ1² ≡ Y is χ²(n-1). Then,

π = 1 - Pr(zc1 ≤ X ≤ zc2 | σ² = σ1²) = 1 - Pr(zc1·σo²/σ1² ≤ Y ≤ zc2·σo²/σ1²)
  = calc cumchi(zc1·σo²/σ1²,n-1) + 1 - cumchi(zc2·σo²/σ1²,n-1).
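A sketch of the Problem 3 calculation in scipy.stats; the sample size and the hypothesized and alternative variances below are made-up illustrative values, not taken from the text.

    from scipy.stats import chi2

    n, alpha = 100, 0.05
    sigma0_sq, sigma1_sq = 1.0, 1.5          # hypothesized and alternative variances (illustrative)

    zc1 = chi2.ppf(alpha / 2, n - 1)
    zc2 = chi2.ppf(1 - alpha / 2, n - 1)

    ratio = sigma0_sq / sigma1_sq            # rescales X into a central chi-square under H1
    power = chi2.cdf(zc1 * ratio, n - 1) + chi2.sf(zc2 * ratio, n - 1)
    print(zc1, zc2, power)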

Problem 4. Testing the Equality of Unknown Variances in Two Populations

Suppose independent random samples of sizes ni are drawn from normal populations with means µi and variances σi², respectively, for i = 1,2. The null hypothesis is Ho: σ1² = σ2², and the alternative is σ1² ≠ σ2². For each population, we know from 3.6 that (ni-1)si²/σi² has a Chi-square distribution with ni-1 degrees of freedom. Further, we know that the ratio of two independent Chi-square distributed random variables, each divided by its degrees of freedom, has an F-distribution with these respective degrees of freedom. Then, R = s1²/s2² is distributed F(n1-1,n2-1) under Ho. One can form a critical region C = {R | R < cL or R > cU} that has significance level α by choosing the lower and upper tails cL and cU of the F-distribution so that each has probability α/2.

Under alternatives to the null, the ratio s1²/s2², multiplied by the ratio σ2²/σ1², has a central F(n1-1,n2-1)-distribution, and the power of the test is

π = 1 - Prob(cL ≤ R ≤ cU) = 1 - Prob(cL·σ2²/σ1² ≤ R·σ2²/σ1² ≤ cU·σ2²/σ1²)
  = F(cL·σ2²/σ1²; n1-1,n2-1) + 1 - F(cU·σ2²/σ1²; n1-1,n2-1).
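A sketch of the Problem 4 calculation in scipy.stats; the sample sizes and the variance ratio under the alternative are illustrative assumptions.

    from scipy.stats import f

    n1, n2, alpha = 50, 60, 0.05
    cL = f.ppf(alpha / 2, n1 - 1, n2 - 1)
    cU = f.ppf(1 - alpha / 2, n1 - 1, n2 - 1)

    var_ratio = 2.0                          # sigma1^2 / sigma2^2 under the alternative (illustrative)
    # R * (sigma2^2/sigma1^2) is central F(n1-1, n2-1) under the alternative
    power = f.cdf(cL / var_ratio, n1 - 1, n2 - 1) + f.sf(cU / var_ratio, n1 - 1, n2 - 1)
    print(cL, cU, power)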


Problem 5. Testing the Equality of Unknown Means in Two Populations with a Common Unknown Variance

Suppose independent random samples of sizes ni are drawn from normal populations with means µi for i = 1,2 and a common variance σ². The null hypothesis is Ho: µ1 = µ2, and the alternative is µ1 ≠ µ2. Then x̄1 - x̄2 is normally distributed with mean µ1 - µ2 and variance σ²(n1^{-1} + n2^{-1}). Further, (n1-1)s1²/σ² is chi-square with n1-1 degrees of freedom, (n2-1)s2²/σ² is chi-square with n2-1 degrees of freedom, and all three random variables are independent. Then ((n1-1)s1² + (n2-1)s2²)/σ² is chi-square with n1 + n2 - 2 degrees of freedom. It follows that

s² = (n1^{-1} + n2^{-1})((n1-1)s1² + (n2-1)s2²)/(n1 + n2 - 2)

is an unbiased estimator of σ²(n1^{-1} + n2^{-1}), with (n1 + n2 - 2)s²/σ²(n1^{-1} + n2^{-1}) distributed Chi-square with n1 + n2 - 2 degrees of freedom. Therefore, the statistic

(x̄1 - x̄2)/s = (x̄1 - x̄2)/[(n1^{-1} + n2^{-1})((n1-1)s1² + (n2-1)s2²)/(n1 + n2 - 2)]^{1/2}

is distributed under the null hypothesis with a T-distribution with n1 + n2 - 2 degrees of freedom. The power against an alternative µ1 ≠ µ2 is calculated exactly as in Problem 2, following (9), except the degrees of freedom is now n1 + n2 - 2 and the non-centrality parameter is

λ = (µ1 - µ2)/σ(n1^{-1} + n2^{-1})^{1/2}.
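A sketch of the Problem 5 power calculation in scipy.stats; the sample sizes, common standard deviation, and mean difference under the alternative are illustrative assumptions.

    from scipy.stats import t, nct

    n1, n2, alpha = 40, 60, 0.05
    sigma, mean_diff = 5.0, 2.0              # common std. dev. and mu1 - mu2 (illustrative)

    df = n1 + n2 - 2
    wc = t.ppf(1 - alpha / 2, df)
    lam = mean_diff / (sigma * (1 / n1 + 1 / n2) ** 0.5)   # noncentrality, as in the text

    power = nct.cdf(-wc, df, lam) + nct.sf(wc, df, lam)
    print(wc, lam, power)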

7.6. HYPOTHESIS TESTING IN LARGE SAMPLES

Consider data x = (x1,...,xn) obtained by simple random sampling from a population with density f(x,θo), where θo is a k×1 vector of unknown parameters contained in the interior of a set Θ. The sample DGP is f(x,θo) = Π_{i=1}^n f(xi,θo), and the log likelihood is Ln(x,θ) = Σ_{i=1}^n l(xi,θ), where l(x,θ) = log f(x,θ) is the log likelihood of an observation. Consider the maximum likelihood estimator Tn(x), given by the value of θ that maximizes Ln(x,θ). Under general regularity conditions like those given in Chapter 6.4, the maximum likelihood estimator is consistent and asymptotically normal. This implies specifically that n^{1/2}(Tn(x)-θo) →d Zo with Zo ~ N(0,J^{-1}) and J the Fisher information in an observation, J = E[∇θl(x,θo)][∇θl(x,θo)]'. The Chapter 3.1.18 rule for limits of continuous transformations implies n^{1/2}J^{1/2}(Tn(x)-θo) →d N(0,I), and hence that the quadratic form W(x,θo) ≡ n(Tn(x)-θo)'J(Tn(x)-θo) ≈ (Tn(x)-θo)'V(Tn(x))^{-1}(Tn(x)-θo) →d χ²(k), the Chi-square distribution with k degrees of freedom. When k = 1, this quadratic form equals the square of the difference between Tn(x) and θo, divided by the variance V(Tn(x)) of Tn(x). The square root of this expression, (Tn(x)-θo)/(V(Tn(x)))^{1/2}, converges in distribution to a standard normal.

Consider the null hypothesis Ho: θ = θo. When this null hypothesis is true, the quadratic form W(x,θo) has a limiting Chi-square distribution with k degrees of freedom. Then, a test of the hypothesis with a significance level α can be carried out by choosing a critical level c from the upper


tail of the χ²(k) distribution so that the tail has probability α, and rejecting Ho when W(x,θo) > c. We term W(x,θo) the Wald statistic.

Suppose an alternative H1: θ = θ1 to the null hypothesis is true. The power of the Wald test is the probability that the null hypothesis will be rejected when H1 holds. But in this case, n^{1/2}J^{1/2}(Tn(x)-θo) = n^{1/2}J^{1/2}(Tn(x)-θ1) + n^{1/2}J^{1/2}(θ1-θo), with the first term converging in distribution to N(0,I). For fixed θ1 ≠ θo, the second term blows up. This implies that the probability that n^{1/2}J^{1/2}(Tn(x)-θo) is small enough to accept the null hypothesis goes to zero, and the power of the test goes to one. A test with this property is called consistent, and consistency is usually taken to be a minimum requirement for a hypothesis testing procedure to be statistically satisfactory. A closer look at the power of a test in large samples is usually done by considering what is called local power. Suppose one takes a sequence of alternatives to the null hypothesis that get closer and closer to the null as sample size grows. Specifically, consider H1: θ = θo + λ/n^{1/2}. For this sequence of alternatives, the term n^{1/2}J^{1/2}(θ1-θo) = J^{1/2}λ is a constant, and we have the result that n^{1/2}J^{1/2}(Tn(x)-θo) →d N(J^{1/2}λ,I). This implies that (Tn(x)-θo)'(nJ)(Tn(x)-θo), the Wald statistic, converges in distribution to a noncentral Chi-square distribution with k degrees of freedom and a noncentrality parameter λ'Jλ. The local power of the test is the probability in the upper tail of this distribution above the critical level c for the Wald statistic. The local power will be a number between zero and one which provides useful information on the ability of the test to distinguish the null from nearby alternatives. In finite sample applications, the local power approximation can be used for a specific alternative θ1 of interest by taking λ = n^{1/2}(θ1-θo) and using the noncentral Chi-square distribution as described above.

In practice, we do not know the Fisher information J exactly, but must estimate it from the sample by

(10) Jen = En[∇θl(x,Tn)][∇θl(x,Tn)]' ≡ n^{-1} Σ_{i=1}^n [∇θl(xi,Tn)][∇θl(xi,Tn)]'.

The expression in (10) is termed the outer product of the score ∇θl(xi,Tn) of an observation. When there is a single parameter, this reduces to the square of ∇θl(xi,Tn); otherwise, it is a k×k array of squares and cross-products of the components of ∇θl(xi,Tn). From the theorem in Chapter 6.4, Jen →p J, and rule 1.17 in Chapter 4 implies that replacing J by Jen in the Wald test statistic does not change its asymptotic distribution.

In the discussion of maximum likelihood estimation in Chapter 6.4 and the proof of its asymptotic normality, we established that when θo is the true parameter,

(11) n^{1/2}(Tn(x)-θo) = J^{-1}∇θLn(x,θo)/n^{1/2} + op(1);

that is, the difference of the maximum likelihood estimator from the true parameter, normalized by n^{1/2}, equals the normalized score of the likelihood at θo, transformed by J^{-1}, plus asymptotically negligible terms. If we substitute (11) into the Wald statistic, we obtain LM(x,θo) = W(x,θo) + op(1), where

(12) LM(x,θo) = [∇θLn(x,θo)]'(nJ)^{-1}[∇θLn(x,θo)].


The statistic (12) is called the Lagrange Multiplier (LM) statistic, or the score statistic. The name Lagrange Multiplier comes from the fact that if we maximize Ln(x,θ) subject to the constraint θo - θ = 0 by setting up the Lagrangian Ln(x,θ) + λ'(θo-θ), we obtain the first-order condition λ = ∇θLn(x,θ), and hence LM(x,θo) = λ'(nJ)^{-1}λ. Because LM(x,θo) is asymptotically equivalent to the Wald statistic, it will have the same asymptotic distribution, so that the same rules apply for determining critical levels and calculating power. The Wald and LM statistics will have different numerical values in finite samples, and sometimes one will accept a null hypothesis when the other rejects. However, when sample sizes are large, their asymptotic equivalence implies that most of the time they will either both accept or both reject, and that they have the same power. In applications, J in (12) must be replaced by an estimate, either Jen from (10), or Joen = En[∇θl(x,θo)][∇θl(x,θo)]', in which the score is evaluated at the hypothesized θo. Both converge in probability to J, and substitution of either in (12) leaves the asymptotic distribution of the LM statistic unchanged. A major advantage of the LM form of the asymptotic test statistic is that it does not require that one compute the estimate Tn(x). Computation of maximum likelihood estimates can sometimes be difficult. In these cases, the LM statistic avoids the difficulty.

The generalized likelihood ratio criterion was suggested in a number of simple tests of hypotheses as a good general procedure for obtaining test statistics. This method rejects Ho if

(13) κ < max_θ Ln(x,θ) - Ln(x,θo),

where κ is a constant that is adjusted to give the desired significance level for the test. A Taylor's expansion of Ln(x,θo) about Tn(x) yields

(14) Ln(x,Tn(x)) - Ln(x,θo) = ∇θLn(x,Tn(x))(Tn(x)-θo) - (1/2)(Tn(x)-θo)'∇θθLn(x,θen)(Tn(x)-θo),

where θen is between θo and Tn(x). But ∇θLn(x,Tn(x)) = 0. Under the regularity conditions in Chapter 6.4, ∇θθLn(x,θen)/n →p -J. (To make the last statement rigorous, one needs to either establish that the convergence in probability of ∇θθLn(x,θ)/n to -J(θ) is uniform in θ, or expand ∇θθLn(x,θen)/n to first order about θo and argue that the first term goes in probability to -J and the second term goes in probability to zero.) Then, LR(x,θo) ≡ 2[Ln(x,Tn(x)) - Ln(x,θo)], termed the likelihood ratio statistic, satisfies

(15) LR(x,θo) = (Tn(x)-θo)'(nJ)(Tn(x)-θo) + op(1) ≡ W(x,θo) + op(1),

and the LR statistic is asymptotically equivalent to the Wald statistic. Therefore, the LR statistic will be asymptotically distributed Chi-square with k degrees of freedom, where k is the dimension of θo, and its local power is the same as that of the Wald statistic, and calculated in the same way.

The major advantage of the LR statistic is that its computation requires only the values of the log likelihood unrestricted and with the null imposed; it is unnecessary to obtain an estimate of J or perform any matrix calculations. We conclude that the three statistics in this trinity, the Wald, LM, and LR statistics, are all asymptotically equivalent and provide completely substitutable ways of testing a hypothesis using a large sample approximation.
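A sketch of the trinity for a model where everything has a closed form: testing Ho: θ = θo in an exponential model f(x,θ) = θe^{-θx}. The model, the simulated data, and the hypothesized value are chosen here for illustration and are not from the text; the point is only that the three statistics are built from different ingredients but are compared to the same χ²(1) critical level.

    import numpy as np
    from scipy.stats import chi2

    # Exponential model: l(x,theta) = log(theta) - theta*x, MLE Tn = 1/mean(x),
    # score dl/dtheta = 1/theta - x, Fisher information J = 1/theta^2.
    rng = np.random.default_rng(1)
    x = rng.exponential(scale=1.0, size=200)       # data generated with theta = 1
    theta0 = 1.2                                   # hypothesized value (illustrative)
    n = x.size

    Tn = 1.0 / x.mean()
    J_hat = 1.0 / Tn**2                            # estimate of J at the MLE

    def loglik(th):
        return np.sum(np.log(th) - th * x)         # sample log likelihood Ln(x, theta)

    wald = n * (Tn - theta0)**2 * J_hat            # Wald statistic
    score0 = np.sum(1.0 / theta0 - x)              # sample score at theta0
    lm = score0**2 / (n / theta0**2)               # LM (score) statistic with J at theta0
    lr = 2.0 * (loglik(Tn) - loglik(theta0))       # LR statistic

    crit = chi2.ppf(0.95, 1)                       # common critical level for all three
    print(wald, lm, lr, crit)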


7.7. EXERCISES

1. Use the data set nyse.txt in the class data area, and the variable RNYSE giving the daily rate of return on the New York Stock Exchange. For the purpose of this exercise, make the maintained hypothesis that the observations are independent and identically normally distributed. Let µ denote the population mean and σ² denote the population variance of RNYSE.

a. Test Ho: µ = 0.0003 versus H1: µ ≠ 0.0003 at significance level α = 0.04, and again at α = 0.01. What is the power of each of these tests against the alternative µ = 0.0005?

b. Test Ho: µ ≥ 0.0003 versus H1: µ < 0.0003 at significance level α = 0.05. What is the power of this test against the alternative µ = 0.0005?

c. Test Ho: σ2 = 0.0001 versus H1: σ2 ≠ 0.0001 at significance level α = 0.01. What is the power of this test against the alternative σ2 = 0.000095?

d. Some analysts claim that opening of international capital markets in the 1980's improved the productivity of capital in large multinational corporations, and this has in turn led to higher mean returns to equity. Make the maintained hypothesis that the variance of returns is constant over the full observation period in nyse.txt. Test the hypothesis that mean return after January 1, 1985 was the same as the mean return prior to that date, versus the alternative that it was not. Use α = 0.01.

e. Some analysts claim that the introduction of dynamic hedging strategies and electronic trading, beginning around January 1, 1985, has made the stock market more volatile than it was previously. Test the hypothesis that the variance in RNYSE after that date was higher than the variance before that date, versus the alternative that it was smaller. Use α = 0.02. Do not maintain the hypothesis of a common mean return in the two periods.
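
A computational note on Exercise 1 (my own sketch, not part of the exercise): since nyse.txt is not reproduced here, the fragment below uses simulated returns as a stand-in for RNYSE and shows how the two-sided test of part (a) and its approximate power might be computed, using the normal approximation that is reasonable for a sample as large as a daily return series; scipy is assumed to be available.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
r = rng.normal(loc=0.0004, scale=0.01, size=2500)   # stand-in for the RNYSE series
mu0, alpha = 0.0003, 0.04

n, xbar, s = len(r), r.mean(), r.std(ddof=1)
t = (xbar - mu0) / (s / np.sqrt(n))                 # test statistic for Ho: mu = mu0
crit = stats.norm.ppf(1 - alpha / 2)                # two-sided critical value (normal approx.)
print("reject Ho" if abs(t) > crit else "accept Ho", f"(t = {t:.3f})")

# Approximate power against mu = 0.0005: under this alternative the statistic is
# roughly normal with unit variance, centered at the standardized shift below.
shift = (0.0005 - mu0) / (s / np.sqrt(n))
power = stats.norm.cdf(-crit - shift) + 1.0 - stats.norm.cdf(crit - shift)
print(f"approximate power: {power:.3f}")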

2. The table gives the investment rate Xi in 8 developed countries. Assume that the Xi are i.i.d. draws from a normally distributed population.

Country        Ratio of Gross Fixed Capital Formation to GDP (Pct., 1993)
Japan          30.1
Germany        22.7
Netherlands    19.7
France         18.9
Canada         18.2
Italy          17.1
U.S.A.         16.1
U.K.           14.9

(a) Test the hypothesis that the population mean investment rate is no greater than 17.0, using a significance level of 5 percent (i.e., 95 percent confidence). Be specific about the test statistic, its distribution under the null, and the critical level you would use.

(b) Compute the power of this test against the alternative that the mean is 20.0. Be specific about the distribution that would be used for the power calculation. Give numerical values for the parameters of this distribution, substituting s for σ if necessary. Give a numerical value for the power.

3. Suppose a random sample of size 4 is drawn from a uniform distribution on [0,θ]. You want to test Ho: θ ≤ 2 versus H1: θ > 2 by rejecting the null if Max(Xn) > K. Find the value of K that gives significance level α = 0.05. Construct the power curve for this test.

4. A random sample X1,...,XN is drawn from a normal density. The variance is known to be 25. You want to test the hypothesis Ho: µ ≥ 2 versus the alternative H1: µ < 2 at significance level α = 0.01, and you would like to have power π = 0.99 against the alternative µ = 1. What sample size do you need?
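
Sample-size questions of this kind reduce to a standard formula: for a one-sided test with known σ, size α, and required power π against an alternative a distance Δ inside the alternative region, the smallest adequate N is the smallest integer with √N ≥ (z(1−α) + z(π))·σ/Δ, where z(·) denotes a standard normal quantile. A small sketch of that calculation (my own illustration; the numbers in the example call are arbitrary, not those of the exercise):

import math
from scipy import stats

def one_sided_sample_size(sigma, delta, alpha, power):
    """Smallest N giving a one-sided normal test of size alpha, with known sigma,
    the stated power against an alternative a distance delta inside H1."""
    z_alpha = stats.norm.ppf(1.0 - alpha)
    z_power = stats.norm.ppf(power)
    return math.ceil(((z_alpha + z_power) * sigma / delta) ** 2)

# Illustrative values only:
print(one_sided_sample_size(sigma=3.0, delta=0.5, alpha=0.05, power=0.90))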



5. Let X1,...,XN be a random sample from a density whose mean is µ and variance is σ2. Consider estimators of µ of the form m = aN1X1 + ... + aNNXN, where the aNn are non-random weights. Under what conditions on the weights is m unbiased? Among unbiased estimators of this form, what weights give minimum variance?

6. A husband and wife are both laid off when the local auto assembly plant closes, and begin searching for new jobs on the same day. The number of weeks Y the wife takes to find a job has a geometric density, fY(y) = p^(y-1)(1-p), for y = 1,2,..., where p is a parameter. The number of weeks X it takes the husband to find a job is independent of Y, and also has a geometric density, fX(x) = q^(x-1)(1-q), for x = 1,2,..., where q is a parameter. The parameters have the values p = 0.5 and q = 0.75. Useful facts about the geometric density f(z) = r^(z-1)(1-r) for z = 1,2,... are (i) E[(Z-1)(Z-2)···(Z-n)] = Σz≥1 (z-1)(z-2)···(z-n)·r^(z-1)(1-r) = (r/(1-r))^n · n! for n = 1,2,..., and (ii) Pr(Z > t) = Σz>t r^(z-1)(1-r) = r^t.

a. What is the expected value of the difference between the lengths of the unemployment spells of the husband and the wife?

b. If the wife is unemployed for at least 6 weeks, what is the expectation of the total number of person-weeks of unemployment insurance the couple will receive, assuming benefits continue for a person as long as he or she is unemployed?

c. What is the probability that the unemployment spell for the husband is greater than that for the wife?
d. What is the expected time until at least one member of the couple is employed?
e. What is the expected time until both husband and wife are employed?
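
The two "useful facts" quoted in Exercise 6 are standard properties of the geometric density f(z) = r^(z-1)(1-r): the factorial-moment identity E[(Z-1)(Z-2)···(Z-n)] = n!(r/(1-r))^n and the tail formula Pr(Z > t) = r^t. The quick Monte Carlo check below (my own sketch) verifies both; numpy's geometric sampler counts trials until the first success, which matches this parameterization when the success probability is set to 1-r.

import math
import numpy as np

rng = np.random.default_rng(3)
r = 0.5
z = rng.geometric(p=1.0 - r, size=200_000)     # P(Z=z) = r**(z-1) * (1-r), z = 1,2,...

# Fact (i), checked for n = 2: E[(Z-1)(Z-2)] = 2! * (r/(1-r))**2.
simulated = np.mean((z - 1) * (z - 2))
exact = math.factorial(2) * (r / (1.0 - r)) ** 2
print(f"factorial moment: simulated {simulated:.3f}, exact {exact:.3f}")

# Fact (ii), checked at t = 5: Pr(Z > 5) = r**5.
print(f"tail probability: simulated {np.mean(z > 5):.4f}, exact {r**5:.4f}")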

7. Let X1,...,XN be a random sample from a uniform distribution on [0,θ], where θ is an unknown parameter. Show that T = [(N+1)/N]·Max(Xn) is an unbiased estimator for θ. What is its variance? What is the asymptotic distribution of N(T-θ)?

8. You decide to become a bookmaker for the next Nobel prize in Economics. Three events will determine the outcomes of wagers that are placed:

U. The prize will go to a permanent resident of the U.S.
R. The prize will go to an economist politically more conservative than Milton Friedman.
A. The prize will go to someone over 70 years of age.

(a) You are given the following probabilities: P(A|U) = 5/6, P(A|R) = 4/5, P(A|R&U) = 6/7, P(U|A&R) = 3/4, P(R|A) = 4/7, P(R|A&U) = 3/5. Find P(R|U), P(A&R|U), P(A∪R|U).
(b) If, in addition, you are given P(U) = 3/5, find P(A), P(R), P(A&R), P(A&R&U).
(c) You want to sell a ticket that will pay $2 if one of the events U, R, A occurs, $4 if two occur, and $8 if all three occur. What is the minimum price for the ticket such that you will not have an expected loss?

9. If X is standard normal, what is the density of X? Of exp(X)? Of 1/X?

10. You wish to enter a sealed bid auction for a computer that has a value to you of $3K if you win the auction. You believe that each competitor will make a bid that is uniformly distributed on the interval [$2K,$3K].

(a) If you know that the number N of other bidders is 3, what is the probability that all competitors' bids are less than $2.9K? What should you bid to maximize your expected profit?

(b) Suppose the number N of other bidders is unknown, but you believe N is Poisson distributed with an expected value EN = 5. What is the probability that the maximum bid from your competitors is less than x? What should you bid to maximize expected profit?
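
A note on the structure of part (b) (my own remark, not part of the exercise): if each of N competitors independently bids below x with probability F(x) and N is Poisson with mean λ, then conditioning on N and using the Poisson probability generating function E[s^N] = exp(λ(s-1)) gives Pr(max bid < x) = E[F(x)^N] = exp(-λ(1-F(x))). The sketch below checks this identity by simulation for bids uniform on [$2K,$3K] and λ = 5.

import numpy as np

rng = np.random.default_rng(4)
lam, x = 5.0, 2.9                        # mean number of bidders; threshold in $K
F_x = x - 2.0                            # cdf of a U[2,3] bid evaluated at x

closed_form = np.exp(-lam * (1.0 - F_x)) # exp(-lambda*(1-F(x))) from the pgf identity

# Monte Carlo: draw N, then N uniform bids, and check whether all fall below x
# (when N = 0 there are no competing bids, so the event holds vacuously).
draws = 100_000
N = rng.poisson(lam=lam, size=draws)
all_below = np.array([rng.uniform(2.0, 3.0, size=k).max(initial=0.0) < x for k in N])
print(f"closed form {closed_form:.4f}, simulated {all_below.mean():.4f}")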