Springer Texts in Statistics
More information about this series at http://www.springer.com/series/417
Series Editors
R. DeVeaux
S.E. Fienberg
I. Olkin
Matthew A. Carlton • Jay L. Devore
Probability with Applications in Engineering, Science, and Technology
Second Edition
Editors
Matthew A. Carlton, Department of Statistics, California Polytechnic State University, San Luis Obispo, CA, USA
Jay L. Devore, Department of Statistics, California Polytechnic State University, San Luis Obispo, CA, USA
ISSN 1431-875X  ISSN 2197-4136 (eBook)
Springer Texts in Statistics
ISBN 978-3-319-52400-9  ISBN 978-3-319-52401-6 (eBook)
DOI 10.1007/978-3-319-52401-6
Library of Congress Control Number: 2017932278
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made.
Printed on acid-free paper
This Springer imprint is published by Springer Nature
The registered company is Springer International Publishing AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Additional material to this book can be downloaded from http://extras.springer.com.
© Springer International Publishing AG 2014, 2017
Preface
Purpose
Our objective is to provide a post-calculus introduction to the subject of probability that
• Has mathematical integrity and contains some underlying theory
• Shows students a broad range of applications involving real problem scenarios
• Is current in its selection of topics
• Is accessible to a wide audience, including mathematics and statistics majors (yes, there are a few
of the latter, and their numbers are growing), prospective engineers and scientists, and business
and social science majors interested in the quantitative aspects of their disciplines
• Illustrates the importance of software for carrying out simulations when answers to questions
cannot be obtained analytically
A number of currently available probability texts are heavily oriented toward a rigorous mathematical
development of probability, with much emphasis on theorems, proofs, and derivations. Even
when applied material is included, the scenarios are often contrived (many examples and exercises
involving dice, coins, cards, and widgets). So in our exposition we have tried to achieve a balance
between mathematical foundations and the application of probability to real-world problems. It is our
belief that the theory of probability by itself is often not enough of a “hook” to get students interested
in further work in the subject. We think that the best way to persuade students to continue their
probabilistic education beyond a first course is to show them how the methodology is used in practice.
Let’s first seduce them (figuratively speaking, of course) with intriguing problem scenarios and
applications. Opportunities for exposure to mathematical rigor will follow in due course.
Content
The book begins with an Introduction, which contains our attempt to address the following question:
“Why study probability?” Here we are trying to tantalize students with a number of intriguing
problem scenarios—coupon collection, birth and death processes, reliability engineering, finance,
queuing models, and various conundrums involving the misinterpretation of probabilistic information
(e.g., Benford’s Law and the detection of fraudulent data, birthday problems, and the likelihood of
having a rare disease when a diagnostic test result is positive). Most of the exposition contains
references to recently published results. It is not necessary or even desirable to cover very much of
this motivational material in the classroom. Instead, we suggest that instructors ask their students to
read selectively outside class (a bit of pleasure reading at the very beginning of the term should not be
an undue burden!). Subsequent chapters make little reference to the examples herein, and separating
out our “pep talk” should make it easier to cover as little or much as an instructor deems appropriate.
Chapter 1 covers sample spaces and events, the axioms of probability and derived properties,
counting, conditional probability, and independence. Discrete random variables and distributions are
the subject of Chap. 2, and Chap. 3 introduces continuous random variables and their distributions.
Joint probability distributions are the focus of Chap. 4, including marginal and conditional
distributions, expectation of a function of several variables, correlation, modes of convergence, the
Central Limit Theorem, reliability of systems of components, the distribution of a linear combination,
and some results on order statistics. These four chapters constitute the core of the book.
The remaining chapters build on the core in various ways. Chapter 5 introduces methods of
statistical inference—point estimation, the use of statistical intervals, and hypothesis testing. In
Chap. 6 we cover basic properties of discrete-time Markov chains. Various other random processes
and their properties, including stationarity and its consequences, Poisson processes, Brownian
motion, and continuous-time Markov chains, are discussed in Chap. 7. The final chapter presents
some elementary concepts and methods in the area of signal processing.
One feature of our book that distinguishes it from the competition is a section at the end of almost
every chapter that considers simulation methods for getting approximate answers when exact results
are difficult or impossible to obtain. Both the R software and Matlab are employed for this purpose.
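To give a flavor of what such a simulation involves, here is a minimal sketch for the birthday problem mentioned earlier. It is written in Python purely for illustration (the book's own simulation sections use R and Matlab), and the function names, the 365 equally likely birthdays, the number of trials, and the seed are all assumptions made for this sketch rather than anything taken from the text. In this case an exact answer is available, which makes it easy to see how close the simulated estimate comes.

```python
import random


def exact_birthday_probability(n: int, days: int = 365) -> float:
    """Exact P(at least two of n people share a birthday), assuming equally likely days."""
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def simulated_birthday_probability(
    n: int, trials: int = 20_000, days: int = 365, seed: int = 1
) -> float:
    """Monte Carlo estimate of the same probability: the fraction of trials with a match."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(n)]
        if len(set(birthdays)) < n:  # a repeated value means a shared birthday
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n = 23
    print("exact:    ", round(exact_birthday_probability(n), 4))
    print("simulated:", round(simulated_birthday_probability(n), 4))
```

The estimate is a sample proportion, so its standard error is roughly sqrt(p(1 - p)/trials); increasing the number of trials tightens it at the cost of run time.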
Another noteworthy aspect of the book is the inclusion of roughly 1100 exercises; the first four
core chapters together have about 700 exercises. There are numerous exercises at the end of each
section and also supplementary exercises at the end of every chapter. Probability at its heart is
concerned with problem solving. A student cannot hope to really learn the material simply by sitting
passively in the classroom and listening to the instructor. He/she must get actively involved in
working problems. To this end, we have provided a wide spectrum of exercises, ranging from
straightforward to reasonably challenging. It should be easy for an instructor to find enough problems
at various levels of difficulty to keep students gainfully occupied.
Mathematical Level
The challenge for students at this level should be to master the concepts and methods to a sufficient
degree that problems encountered in the real world can be solved. Most of our exercises are of this
type, and relatively few ask for proofs or derivations. Consequently, the mathematical prerequisites
and demands are reasonably modest. Mathematical sophistication and quantitative reasoning ability
are, of course, crucial to the enterprise. Univariate calculus is employed in the continuous distribution
calculations of Chap. 3 as well as in obtaining maximum likelihood estimators in the inference
chapter. But even here the functions we ask students to work with are straightforward—generally
polynomials, exponentials, and logs. A stronger background is required for the signal processing
material at the end of the book (we have included a brief mathematical appendix as a refresher for
relevant properties). Multivariate calculus is used in the section on joint distributions in Chap. 4 and
thereafter appears rather rarely. Exposure to matrix algebra is needed for the Markov chain material.
Recommended Coverage
Our book contains enough material for a year-long course, though we expect that many instructors
will use it for a single term (one semester or one quarter). To give a sense of what might be
reasonable, we now briefly describe three courses at our home institution, Cal Poly State University
(in San Luis Obispo, CA), for which this book is appropriate. Syllabi with expanded course outlines
are available for download on the book’s website at Springer.com.
Title:            Introduction to         Introduction to                 Probability and Random
                  Probability and         Probability Models              Processes for Engineers
                  Simulation

Main audience:    Statistics and math     Statistics and math             Electrical and computer
                  majors                  majors                          engineering majors

Prerequisites:    Univariate calculus,    Univariate calculus,            Multivariate calculus,
                  computer programming    computer programming,           continuous-time signals
                                          matrix algebra                  incl. Fourier analysis

Sections covered: 1.1–1.6                 1.1–1.6                         1.1–1.5
                  2.1–2.6, 2.8            2.1–2.5, 2.8                    2.1–2.5
                  3.1–3.4, 3.8            3.1–3.4, 3.8                    3.1–3.5
                  4.1–4.3, 4.5            4.1–4.3, 4.5, 4.8               4.1–4.3, 4.5, 4.7
                                          6.1–6.5                         7.1–7.3, 7.5–7.6
                                          7.5                             8.1–8.2
Both of the first two courses place heavy emphasis on computer simulation of random phenomena;
instructors typically have students work in R. As is evident from the lists of sections covered,
Introduction to Probability Models takes the earlier material at a faster pace in order to leave a few
weeks at the end for Markov chains and some other applications (typically reliability theory and a bit
about Poisson processes). In our experience, the computer programming prerequisite is essential for
students’ success in those two courses.
The third course listed, Probability and Random Processes for Engineers, is our university’s
version of the traditional “random signals and noise” course offered by many electrical engineering
departments. Again, the first four chapters are covered at a somewhat accelerated pace, with about
30–40% of the course dedicated to time and frequency representations of random processes (Chaps. 7
and 8). Simulation of random phenomena is not emphasized in our course, though we make liberal
use of Matlab for demonstrations.
We are able to cover as much material as indicated on the foregoing syllabi with the aid of a not-
so-secret weapon: we prepare and require that students bring to class a course booklet. The booklet
contains most of the examples we present as well as some surrounding material. A typical example
begins with a problem statement and then poses several questions (as in the exercises in this book).
After each posed question there is some blank space so the student can either take notes as the solution
is developed in class or else work the problem on his/her own if asked to do so. Because students have
a booklet, the instructor does not have to write as much on the board as would otherwise be necessary
and the student does not have to do as much writing to take notes. Both the instructor and the students
benefit.
We also like to think that students can be asked to read an occasional subsection or even section on
their own and then work exercises to demonstrate understanding, so that not everything needs to be
presented in class. For example, we have found that assigning a take-home exam problem that
requires reading about the Weibull and/or lognormal distributions is a good way to acquaint students
with them. But instructors should always keep in mind that there is never enough time in a course of
any duration to teach students all that we’d like them to know. Hopefully students will like the book
enough to keep it after the course is over and use it as a basis for extending their knowledge of
probability!
Acknowledgments
We gratefully acknowledge the plentiful feedback provided by the following reviewers: Allan Gut,
Murad Taqqu, Mark Schilling and Robert Heiny.
We very much appreciate the production services provided by the folks at SPi Technologies. Our
production editors, Sasireka. K and Maria David, did a first-rate job of moving the book through the
production process and were always prompt and considerate in communications with us. Thanks to
our copyeditors at SPi for employing a light touch and not taking us too much to task for our
occasional grammatical and stylistic lapses. The staff at Springer U.S. has been especially supportive
during both the developmental and production stages; special kudos go to Michael Penn and Rebekah
McClure.
A Final Thought
It is our hope that students completing a course taught from this book will feel as passionately about
the subject of probability as we still do after so many years of living with it. Only teachers can really
appreciate how gratifying it is to hear from a student after he/she has completed a course that the
experience had a positive impact and maybe even affected a career choice.
San Luis Obispo, CA Matthew A. Carlton
San Luis Obispo, CA Jay L. Devore
Contents
1 Probability
1.1 Sample Spaces and Events
1.1.1 The Sample Space of an Experiment
1.1.2 Events
1.1.3 Some Relations from Set Theory
1.1.4 Exercises: Section 1.1 (1–12)
1.2 Axioms, Interpretations, and Properties of Probability
1.2.1 Interpreting Probability
1.2.2 More Probability Properties
1.2.3 Determining Probabilities Systematically
1.2.4 Equally Likely Outcomes
1.2.5 Exercises: Section 1.2 (13–30)
1.3 Counting Methods
1.3.1 The Fundamental Counting Principle
1.3.2 Tree Diagrams
1.3.3 Permutations
1.3.4 Combinations
1.3.5 Exercises: Section 1.3 (31–49)
1.4 Conditional Probability
1.4.1 The Definition of Conditional Probability
1.4.2 The Multiplication Rule for P(A ∩ B)
1.4.3 The Law of Total Probability and Bayes’ Theorem
1.4.4 Exercises: Section 1.4 (50–78)
1.5 Independence
1.5.1 P(A ∩ B) When Events Are Independent
1.5.2 Independence of More than Two Events
1.5.3 Exercises: Section 1.5 (79–100)
1.6 Simulation of Random Events
1.6.1 The Backbone of Simulation: Random Number Generators
1.6.2 Precision of Simulation
1.6.3 Exercises: Section 1.6 (101–120)
1.7 Supplementary Exercises (121–150)
2 Discrete Random Variables and Probability Distributions
2.1 Random Variables
2.1.1 Two Types of Random Variables
2.1.2 Exercises: Section 2.1 (1–10)
2.2 Probability Distributions for Discrete Random Variables
2.2.1 A Parameter of a Probability Distribution
2.2.2 The Cumulative Distribution Function
2.2.3 Another View of Probability Mass Functions
2.2.4 Exercises: Section 2.2 (11–28)
2.3 Expected Value and Standard Deviation
2.3.1 The Expected Value of X
2.3.2 The Expected Value of a Function
2.3.3 The Variance and Standard Deviation of X
2.3.4 Properties of Variance
2.3.5 Exercises: Section 2.3 (29–48)
2.4 The Binomial Distribution
2.4.1 The Binomial Random Variable and Distribution
2.4.2 Computing Binomial Probabilities
2.4.3 The Mean and Variance of a Binomial Random Variable
2.4.4 Binomial Calculations with Software
2.4.5 Exercises: Section 2.4 (49–74)
2.5 The Poisson Distribution
2.5.1 The Poisson Distribution as a Limit
2.5.2 The Mean and Variance of a Poisson Random Variable
2.5.3 The Poisson Process
2.5.4 Poisson Calculations with Software
2.5.5 Exercises: Section 2.5 (75–89)
2.6 Other Discrete Distributions
2.6.1 The Hypergeometric Distribution
2.6.2 The Negative Binomial and Geometric Distributions
2.6.3 Alternative Definition of the Negative Binomial Distribution
2.6.4 Exercises: Section 2.6 (90–106)
2.7 Moments and Moment Generating Functions
2.7.1 The Moment Generating Function
2.7.2 Obtaining Moments from the MGF
2.7.3 MGFs of Common Distributions
2.7.4 Exercises: Section 2.7 (107–128)
2.8 Simulation of Discrete Random Variables
2.8.1 Simulations Implemented in R and Matlab
2.8.2 Simulation Mean, Standard Deviation, and Precision
2.8.3 Exercises: Section 2.8 (129–141)
2.9 Supplementary Exercises (142–170)
3 Continuous Random Variables and Probability Distributions
3.1 Probability Density Functions and Cumulative Distribution Functions
3.1.1 Probability Distributions for Continuous Variables
3.1.2 The Cumulative Distribution Function
3.1.3 Using F(x) to Compute Probabilities
3.1.4 Obtaining f(x) from F(x)
3.1.5 Percentiles of a Continuous Distribution
3.1.6 Exercises: Section 3.1 (1–18)
3.2 Expected Values and Moment Generating Functions
3.2.1 Expected Values
3.2.2 Moment Generating Functions
3.2.3 Exercises: Section 3.2 (19–38)
3.3 The Normal (Gaussian) Distribution
3.3.1 The Standard Normal Distribution
3.3.2 Non-standardized Normal Distributions
3.3.3 The Normal MGF
3.3.4 The Normal Distribution and Discrete Populations
3.3.5 Approximating the Binomial Distribution
3.3.6 Normal Distribution Calculations with Software
3.3.7 Exercises: Section 3.3 (39–70)
3.4 The Exponential and Gamma Distributions
3.4.1 The Exponential Distribution
3.4.2 The Gamma Distribution
3.4.3 The Gamma MGF
3.4.4 Gamma and Exponential Calculations with Software
3.4.5 Exercises: Section 3.4 (71–83)
3.5 Other Continuous Distributions
3.5.1 The Weibull Distribution
3.5.2 The Lognormal Distribution
3.5.3 The Beta Distribution
3.5.4 Exercises: Section 3.5 (84–100)
3.6 Probability Plots
3.6.1 Sample Percentiles
3.6.2 A Probability Plot
3.6.3 Departures from Normality
3.6.4 Beyond Normality
3.6.5 Probability Plots in Matlab and R
3.6.6 Exercises: Section 3.6 (101–111)
3.7 Transformations of a Random Variable
3.7.1 Exercises: Section 3.7 (112–128)
3.8 Simulation of Continuous Random Variables
3.8.1 The Inverse CDF Method
3.8.2 The Accept–Reject Method
3.8.3 Built-In Simulation Packages for Matlab and R
3.8.4 Precision of Simulation Results
3.8.5 Exercises: Section 3.8 (129–139)
3.9 Supplementary Exercises (140–172)
4 Joint Probability Distributions and Their Applications
4.1 Jointly Distributed Random Variables
4.1.1 The Joint Probability Mass Function for Two Discrete Random Variables
4.1.2 The Joint Probability Density Function for Two Continuous Random Variables
4.1.3 Independent Random Variables
4.1.4 More Than Two Random Variables
4.1.5 Exercises: Section 4.1 (1–22)
4.2 Expected Values, Covariance, and Correlation
4.2.1 Properties of Expected Value
4.2.2 Covariance
4.2.3 Correlation
4.2.4 Correlation Versus Causation
4.2.5 Exercises: Section 4.2 (23–42)
4.3 Properties of Linear Combinations
4.3.1 The PDF of a Sum
4.3.2 Moment Generating Functions for Linear Combinations
4.3.3 Exercises: Section 4.3 (43–65)
4.4 Conditional Distributions and Conditional Expectation
4.4.1 Conditional Distributions and Independence
4.4.2 Conditional Expectation and Variance
4.4.3 The Laws of Total Expectation and Variance
4.4.4 Exercises: Section 4.4 (66–84)
4.5 Limit Theorems (What Happens as n Gets Large)
4.5.1 Random Samples
4.5.2 The Central Limit Theorem
4.5.3 Other Applications of the Central Limit Theorem
4.5.4 The Law of Large Numbers
4.5.5 Exercises: Section 4.5 (85–102)
4.6 Transformations of Jointly Distributed Random Variables
4.6.1 The Joint Distribution of Two New Random Variables
4.6.2 The Joint Distribution of More Than Two New Variables
4.6.3 Exercises: Section 4.6 (103–110)
4.7 The Bivariate Normal Distribution
4.7.1 Conditional Distributions of X and Y
4.7.2 Regression to the Mean
4.7.3 The Multivariate Normal Distribution
4.7.4 Bivariate Normal Calculations with Software
4.7.5 Exercises: Section 4.7 (111–120)
4.8 Reliability
4.8.1 The Reliability Function
4.8.2 Series and Parallel Designs
4.8.3 Mean Time to Failure
4.8.4 Hazard Functions
4.8.5 Exercises: Section 4.8 (121–132)
4.9 Order Statistics
4.9.1 The Distributions of Yn and Y1
4.9.2 The Distribution of the ith Order Statistic
4.9.3 The Joint Distribution of the n Order Statistics
4.9.4 Exercises: Section 4.9 (133–142)
4.10 Simulation of Joint Probability Distributions and System Reliability
4.10.1 Simulating Values from a Joint PMF
4.10.2 Simulating Values from a Joint PDF
4.10.3 Simulating a Bivariate Normal Distribution
4.10.4 Simulation Methods for Reliability
4.10.5 Exercises: Section 4.10 (143–153)
4.11 Supplementary Exercises (154–192)
5 The Basics of Statistical Inference
5.1 Point Estimation
5.1.1 Estimates and Estimators
5.1.2 Assessing Estimators: Accuracy and Precision
5.1.3 Exercises: Section 5.1 (1–23)
xii Contents
5.2 Maximum Likelihood Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
5.2.1 Some Properties of MLEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 372
5.2.2 Exercises: Section 5.2 (24–36) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
5.3 Confidence Intervals for a Population Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375
5.3.1 A Confidence Interval for a Normal Population Mean . . . . . . . . . . . . . . 376
5.3.2 A Large-Sample Confidence Interval for μ . . . . . . . . . . . . . . . . . . . . . . 380
5.3.3 Software for Confidence Interval Calculation . . . . . . . . . . . . . . . . . . . . . 381
5.3.4 Exercises: Section 5.3 (37–50) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
5.4 Testing Hypotheses About a Population Mean . . . . . . . . . . . . . . . . . . . . . . . . . . 386
5.4.1 Hypotheses and Test Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386
5.4.2 Test Procedures for Hypotheses About a Population Mean μ . . . . . . . . . 388
5.4.3 P-Values and the One-Sample t Test . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
5.4.4 Errors in Hypothesis Testing and the Power of a Test . . . . . . . . . . . . . . 392
5.4.5 Software for Hypothesis Test Calculation . . . . . . . . . . . . . . . . . . . . . . . 395
5.4.6 Exercises: Section 5.4 (51–76) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 396
5.5 Inferences for a Population Proportion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
5.5.1 Confidence Intervals for p . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 401
5.5.2 Hypothesis Testing for p . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
5.5.3 Software for Inferences about p . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
5.5.4 Exercises: Section 5.5 (77–97) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 405
5.6 Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 409
5.6.1 The Posterior Distribution of a Parameter . . . . . . . . . . . . . . . . . . . . . . . 410
5.6.2 Inferences from the Posterior Distribution . . . . . . . . . . . . . . . . . . . . . . . 413
5.6.3 Further Comments on Bayesian Inference . . . . . . . . . . . . . . . . . . . . . . . 413
5.6.4 Exercises: Section 5.6 (98–106) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
5.7 Supplementary Exercises (107–138) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 416
6 Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
6.1 Terminology and Basic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423
6.1.1 The Markov Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
6.1.2 Exercises: Section 6.1 (1–10) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 428
6.2 The Transition Matrix and the Chapman–Kolmogorov Equations . . . . . . . . . . . . 431
6.2.1 The Transition Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 431
6.2.2 Computation of Multistep Transition Probabilities . . . . . . . . . . . . . . . . . 432
6.2.3 Exercises: Section 6.2 (11–22) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
6.3 Specifying an Initial Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 440
6.3.1 A Fixed Initial State . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 443
6.3.2 Exercises: Section 6.3 (23–30) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 444
6.4 Regular Markov Chains and the Steady-State Theorem . . . . . . . . . . . . . . . . . . . . 446
6.4.1 Regular Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446
6.4.2 The Steady-State Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 448
6.4.3 Interpreting the Steady-State Distribution . . . . . . . . . . . . . . . . . . . . . . . 450
6.4.4 Efficient Computation of Steady-State Probabilities . . . . . . . . . . . . . . . . 451
6.4.5 Irreducible and Periodic Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453
6.4.6 Exercises: Section 6.4 (31–43) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454
6.5 Markov Chains with Absorbing States . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 457
6.5.1 Time to Absorption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458
6.5.2 Mean Time to Absorption . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 461
6.5.3 Mean First Passage Times . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465
6.5.4 Probabilities of Eventual Absorption . . . . . . . . . . . . . . . . . . . . . . . . . . . 466
6.5.5 Exercises: Section 6.5 (44–58) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 469
Contents xiii
6.6 Simulation of Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
6.6.1 Exercises: Section 6.6 (59–66) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
6.7 Supplementary Exercises (67–82) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
7 Random Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
7.1 Types of Random Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
7.1.1 Classification of Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 493
7.1.2 Random Processes Regarded as Random Variables . . . . . . . . . . . . . . . . 493
7.1.3 Exercises: Section 7.1 (1–10) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 494
7.2 Properties of the Ensemble: Mean and Autocorrelation Functions . . . . . . . . . . . . 496
7.2.1 Mean and Variance Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 496
7.2.2 Autocovariance and Autocorrelation Functions . . . . . . . . . . . . . . . . . . . 499
7.2.3 The Joint Distribution of Two Random Processes . . . . . . . . . . . . . . . . . 502
7.2.4 Exercises: Section 7.2 (11–24) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503
7.3 Stationary and Wide-Sense Stationary Processes . . . . . . . . . . . . . . . . . . . . . . . . 504
7.3.1 Properties of Wide-Sense Stationary Processes . . . . . . . . . . . . . . . . . . . 508
7.3.2 Ergodic Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
7.3.3 Exercises: Section 7.3 (25–40) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514
7.4 Discrete-Time Random Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
7.4.1 Special Discrete Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 518
7.4.2 Exercises: Section 7.4 (41–52) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 520
7.5 Poisson Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 522
7.5.1 Relation to Exponential and Gamma Distributions . . . . . . . . . . . . . . . . . 524
7.5.2 Combining and Decomposing Poisson Processes . . . . . . . . . . . . . . . . . . 526
7.5.3 Alternative Definition of a Poisson Process . . . . . . . . . . . . . . . . . . . . . . 528
7.5.4 Nonhomogeneous Poisson Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
7.5.5 The Poisson Telegraphic Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 531
7.5.6 Exercises: Section 7.5 (53–72) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532
7.6 Gaussian Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 535
7.6.1 Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
7.6.2 Brownian Motion as a Limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
7.6.3 Further Properties of Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . 538
7.6.4 Variations on Brownian Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
7.6.5 Exercises: Section 7.6 (73–85) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
7.7 Continuous-Time Markov Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544
7.7.1 Infinitesimal Parameters and Instantaneous Transition Rates . . . . . . . . . . 546
7.7.2 Sojourn Times and Transitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
7.7.3 Long-Run Behavior of Continuous-Time Markov Chains . . . . . . . . . . . . 552
7.7.4 Explicit Form of the Transition Matrix . . . . . . . . . . . . . . . . . . . . . . . . . 554
7.7.5 Exercises: Section 7.7 (86–97) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556
7.8 Supplementary Exercises (98–114) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559
8 Introduction to Signal Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
8.1 Power Spectral Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
8.1.1 Properties of the Power Spectral Density . . . . . . . . . . . . . . . . . . . . . . . . 566
8.1.2 Power in a Frequency Band . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
8.1.3 White Noise Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570
8.1.4 Power Spectral Density for Two Processes . . . . . . . . . . . . . . . . . . . . . . 572
8.1.5 Exercises: Section 8.1 (1–21) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 573
8.2 Random Processes and LTI Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576
8.2.1 Statistical Properties of the LTI System Output . . . . . . . . . . . . . . . . . . . 577
xiv Contents
8.2.2 Ideal Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 580
8.2.3 Signal Plus Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583
8.2.4 Exercises: Section 8.2 (22–38) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586
8.3 Discrete-Time Signal Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
8.3.1 Random Sequences and LTI Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 591
8.3.2 Random Sequences and Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593
8.3.3 Exercises: Section 8.3 (39–50) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595
Appendix A: Statistical Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
Appendix B: Background Mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
Appendix C: Important Probability Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
Answers to Odd-Numbered Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 637
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639
Contents xv
Introduction: Why Study Probability?
Some of you may enjoy mathematics for its own sake—it is a beautiful subject which provides many
wonderful intellectual challenges. Of course students of philosophy would say the same thing about
their discipline, ditto for students of linguistics, and so on. However, many of us are not satisfied just
with aesthetics and mental gymnastics. We want what we’re studying to have some utility, some
applicability to real-world problems. Fortunately, mathematics in general and probability in particular
provide a plethora of tools for answering important professional and societal questions. In this
section, we’ll attempt to provide some preliminary motivation before forging ahead.
The initial development of probability as a branch of mathematics goes back over 300 years, when
it had its genesis in connection with questions involving games of chance. One of the earliest recorded
instances of probability calculation appeared in correspondence between the two very famous
mathematicians, Blaise Pascal and Pierre de Fermat. The issue was which of the following two
outcomes of die-tossing was more favorable to a bettor: (1) getting at least one 6 in four rolls of a fair
die (“fair” here means that each of the six outcomes 1, 2, 3, 4, 5, and 6 is equally likely to occur) or
(2) getting at least one pair of 6s when two fair dice are rolled 24 times in succession. By the end of
Chap. 1, you shouldn’t have any difficulty showing that there is a slightly better than 50-50 chance of
(1) occurring, whereas the odds are slightly against (2) occurring.
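Both probabilities follow from the complement rule: the chance of at least one success equals one minus the chance of no successes. The short Python sketch below is our own illustration of that arithmetic, not code from the text, and the variable names are arbitrary.

```python
from fractions import Fraction

# (1) At least one 6 in four rolls of a fair die:
# complement of "no 6 on any of the four rolls".
p_one_six = 1 - Fraction(5, 6) ** 4

# (2) At least one pair of 6s in 24 rolls of two fair dice:
# complement of "no double 6 on any of the 24 rolls".
p_double_six = 1 - Fraction(35, 36) ** 24

print(float(p_one_six))     # about 0.5177
print(float(p_double_six))  # about 0.4914
```

Exact fractions are used so that no rounding occurs until the final conversion to a decimal.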
Games of chance have continued to be a fruitful area for the application of probability methodology.
Savvy poker players certainly need to know the odds of being dealt various hands, such as a full
house or straight (such knowledge is necessary but not at all sufficient for achieving success in card
games, as such endeavors also involve much psychology). The same holds true for the game of
blackjack. In fact, in 1962 the mathematics professor Edward O. Thorp published the book Beat the Dealer; in it he employed probability arguments to show that as cards were dealt sequentially from a
deck, there were situations in which the likelihood of success favored the player rather than the dealer.
Because of this work, casinos changed the way cards were dealt in order to prevent card-counting
strategies from bankrupting them. A recent variant of this is described in the paper “Card Counting in
Continuous Time” (Journal of Applied Probability, 2012: 184-198), in which the number of decks
utilized is large enough to justify the use of a continuous approximation to find an optimal betting
strategy.
In the last few decades, game theory has developed as a significant branch of mathematics devoted
to the modeling of competition, cooperation, and conflict. Much of this work involves the use of
probability properties, with applications in such diverse fields as economics, political science, and
biology. However, especially over the course of the last 60 years, the scope of probability applications
has expanded well beyond gambling and games. In this section, we present some contemporary
examples of how probability is being used to solve important problems.
Software Use in Probability
Modern probability applications often require the use of a calculator or software. Of course, we rely
on machines to perform every conceivable computation from adding numbers to evaluating definite
integrals. Many calculators and most computer software packages even have built-in functions that
make a number of specific probability calculations more convenient; we will highlight these throughout
the text. But the real utility of modern software comes from its ability to simulate random
phenomena, which proves invaluable in the analysis of very complicated probability models. We will
introduce the key elements of probability simulation in Sect. 1.7 and then revisit simulation in a
variety of settings throughout the book.
Numerous software packages can be used to implement a simulation. We will focus on two:
Matlab and R. Matlab is a powerful engineering software package published by MathWorks; many
universities and technology companies have a license for Matlab. A freeware package called Octave
has been designed to implement the majority of Matlab functions using identical syntax; consult
http://www.gnu.org/software/octave/. (Readers using Mac OS or Windows rather than GNU/Linux
will find links to compatible versions of Octave on this same website.) R is a freeware statistical
software package maintained by a core user group. The R base package and numerous add-ons are
available at http://cran.r-project.org/.
Throughout this textbook, we will provide side-by-side Matlab and R code for both probability
computations and simulation. It is not the goal, however, to serve as a primer in either language
(certainly, some prior knowledge of elementary programming is required). Both software packages
have extensive help menus and active online user support groups. Readers interested in a more
thorough treatment of these software packages should consult Matlab Primer by Timothy A. Davis or
The R Book by Michael J. Crawley.
Modern Application of Classic Probability Problems
The coupon collector problem has been well known for decades in the probability community. As an
example, suppose each box of a certain type of cereal contains a small toy. The manufacturer of this
cereal has included a total of ten toys in its cereal boxes, with each box being equally likely to yield
one of the ten toys. Suppose you want to obtain a complete set of these toys for a young relative or
friend. Clearly you will have to purchase at least ten boxes, and intuitively it would seem as though
you might have to purchase many more than that. How many boxes would you expect to have to
purchase in order to achieve your goal? Methods from Chap. 4 can be used to show that the average
number of boxes required is 10(1 + 1/2 + 1/3 + ... + 1/10). If instead there are n toys, then
n replaces 10 in this expression. And when n is large, more sophisticated mathematical arguments
yield the approximation n(ln(n) + .58).
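To see how close the approximation is, the exact expression and the large-n formula can be evaluated side by side. The following Python sketch is our own illustration; the function names `expected_boxes` and `approx_boxes` are hypothetical and do not come from the text.

```python
import math
from fractions import Fraction

def expected_boxes(n):
    """Exact expected number of boxes for n equally likely toys:
    n * (1 + 1/2 + ... + 1/n)."""
    return n * sum(Fraction(1, k) for k in range(1, n + 1))

def approx_boxes(n):
    """Large-n approximation quoted in the text: n * (ln(n) + .58)."""
    return n * (math.log(n) + 0.58)

print(float(expected_boxes(10)))  # about 29.29
print(approx_boxes(10))           # about 28.83
```

For the ten-toy example, the exact answer is a little over 29 boxes. The approximation is already within a box of that value and, as the text indicates, is intended for large n.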
The article “A Generalized Coupon Collector Problem” (Journal of Applied Probability, 2011: 1081-1094) mentions applications of the classic problem to dynamic resource allocation, hashing in
computer science, and the analysis of delays in certain wireless communication channels (in this latter
application, there are n users, each receiving packets of data from a transmitter). The generalization
considered in the article involves each cereal box containing d different toys with the purchaser then
selecting the least collected toy thus far. The expected number of purchases to obtain a complete
collection is again investigated, with special attention to the case of n being quite large. An
application to the wireless communication scenario is mentioned.
Applications to Business
The article “Newsvendor-Type Models with Decision-Dependent Uncertainty” (Mathematical Methods of Operations Research, 2012, published online) begins with an overview of a class of
decision problems involving uncertainty. In the classical newsvendor problem, a seller has to choose
the amount of inventory to obtain at the beginning of a selling season. This ordering decision is made
only once, with no opportunity to replenish inventory during the season. The amount of demand D is
uncertain (what we will call in Chap. 2 a random variable). The cost of obtaining inventory is c per unit ordered, the sale price is r per unit, and any unsold inventory at the end of the season has a
salvage value of v per unit. The optimal policy, that which maximizes expected profit, is easily
characterized in terms of the probability distribution of D (this distribution specifies how likely it is
that various values of D will occur).
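As a rough illustration of how expected profit depends on the demand distribution, the Python sketch below evaluates expected profit for each candidate order quantity and picks the best one by direct comparison. It is our own sketch rather than the method of the cited article: the demand distribution and the values of c, r, and v are hypothetical, demand is assumed to be discrete, and the function names are invented for this example.

```python
def expected_profit(q, demand_pmf, c, r, v):
    """Expected profit from ordering q units, for a discrete demand
    distribution given as a dict {demand value: probability}."""
    total = 0.0
    for d, prob in demand_pmf.items():
        sold = min(q, d)  # cannot sell more than was ordered
        # revenue from sales + salvage of leftovers - cost of the order
        total += prob * (r * sold + v * (q - sold) - c * q)
    return total

def best_order(demand_pmf, c, r, v):
    """Order quantity, among the possible demand values, with the
    largest expected profit (found by direct comparison)."""
    return max(sorted(demand_pmf),
               key=lambda q: expected_profit(q, demand_pmf, c, r, v))

# Hypothetical numbers, chosen only for illustration.
demand_pmf = {10: 0.2, 20: 0.5, 30: 0.3}
print(best_order(demand_pmf, c=4.0, r=10.0, v=1.0))  # 20
```

With these made-up numbers, ordering 20 units gives an expected profit of 102, compared with 60 for 10 units and 99 for 30 units.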
In the revenue management problem, there are S units of inventory to sell. Each unit is sold for a
price of either r1 or r2 (r1 > r2). During the first phase of the selling season, customers arrive who will
buy at the price r2 but not at r1. In the second phase, customers arrive who will pay the higher price.
The seller wishes to know how much of the initial inventory should be held in reserve for the second
phase. Again the general form of the optimal policy that maximizes expected profit is easily
determined in terms of the distributions for demands in the two periods. The article cited in the
previous paragraph goes on to consider situations in which the distribution(s) of demand(s) must be
estimated from data and how such estimation affects decision making.
A cornerstone of probabilistic inventory modeling is a general result established more than
50 years ago: Suppose that the amount of inventory of a commodity is reviewed every T time periods
to decide whether more should be ordered. Under rather general conditions, it was shown that the
optimal policy—the policy that minimizes the long-run expected cost—is to order nothing if the
current level of inventory is at least an amount s but to order enough to bring the inventory level up to an amount S if the current level is below s. The values of s and S are determined by various costs, the
price of the commodity, and the nature of demand for the commodity (how customer orders and order
amounts occur over time).
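The ordering rule just described is simple enough to state as a function. The Python sketch below is our own illustration; the name `order_quantity` and the threshold values are hypothetical, and the sketch says nothing about how s and S would actually be chosen.

```python
def order_quantity(level, s, S):
    """Amount to order at a review under an (s, S) policy."""
    if level >= s:
        return 0          # inventory is at least s: order nothing
    return S - level      # inventory is below s: order up to S

# Hypothetical thresholds s = 20 and S = 50.
for level in (35, 20, 12):
    print(level, order_quantity(level, s=20, S=50))
# prints: 35 0, then 20 0, then 12 38
```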
The article “A Periodic-Review Base-Stock Inventory System with Sales Rejection” (Operations Research, 2011: 742-753) considers a policy appropriate when backorders are possible and lost sales
may occur. In particular, an order is placed every T time periods to bring inventory up to some level S. Demand for the commodity is filled until the inventory level reaches a sales rejection threshold M for
some M < S. Various properties of the optimal values of M and S are investigated.
Applications to the Life Sciences
Examples of the use of probability and probabilistic modeling can be found in many subdisciplines of
the life sciences. For example, Pseudomonas syringae is a bacterium that lives on leaf surfaces. The
article “Stochastic Modeling of Pseudomonas Syringae Growth in the Phyllosphere” (Mathematical Biosciences, 2012: 106-116) proposed a probabilistic (synonymous with “stochastic”) model called a
birth and death process with migration to describe the aggregate distribution of such bacteria and
determine the mechanisms which generated experimental data. The topic of birth and death processes
is considered briefly in Chap. 7 of our book.
Another example of such modeling appears in the article “Means and Variances in Stochastic
Multistage Cancer Models” (Journal of Applied Probability, 2012: 590-594). The authors discuss a widely used model of carcinogenesis in which division of a healthy cell may give rise to a healthy cell
and a mutant cell, whereas division of a mutant cell may result in two mutant cells of the same type or
possibly one of the same type and one with a further mutation. The objective is to obtain an
expression for the expected number of cells at each stage and also a quantitative assessment of how
much the actual number might deviate from what is expected (that is what “variance” does).
Epidemiology is the branch of medicine and public health that studies the causes and spread of
various diseases. Of particular interest to epidemiologists is how epidemics are propagated in one or
more populations. The general stochastic epidemic model assumes that a newly infected individual is
infectious for a random amount of time having an exponential distribution (this distribution is
discussed in Chap. 3) and during this infectious period encounters other individuals at times
determined by a Poisson process (one of the topics in Chap. 7). The article “The Basic Reproduction Number and the Probability of Extinction for a Dynamic Epidemic Model” (Mathematical Biosciences, 2012: 31-35) considers an extension in which the population of interest consists of a
fixed number of subpopulations. Individuals move between these subpopulations according to a
Markov transition matrix (the subject of Chap. 6) and infectives can only make infectious contact
with members of their current subpopulation. The effect of variation in the infectious period on the
probability that the epidemic ultimately dies out is investigated.
Another approach to the spread of epidemics is based on branching processes. In the simplest such
process, a single individual gives birth to a random number of individuals; each of these in turn gives
birth to a random number of progeny, and so on. The article “The Probability of Containment for
Multitype Branching Process Models for Emerging Epidemics” (Journal of Applied Probability, 2011: 173-188) uses a model in which each individual “born” to an existing individual can have one
of a finite number of severity levels of the disease. The resulting theory is applied to construct a
simulation model of how influenza spread in rural Thailand.
Applications to Engineering and Operations Research
We want products that we purchase and systems that we rely on (e.g., communication networks,
electric power grids) to be highly reliable—have long lifetimes and work properly during those
lifetimes. Product manufacturers and system designers therefore need to have testing methods that
will assess various aspects of reliability. In the best of all possible worlds, data bearing on reliability
could be obtained under normal operating conditions. However, this may be very time consuming
when investigating components and products that have very long lifetimes. For this reason, there has
been much research on “accelerated” testing methods which induce failure or degradation in a much
shorter time frame. For products that are used only a fraction of the time in a typical day, such as
home appliances and automobile tires, acceleration might entail operating continuously in time but
under otherwise normal conditions. Alternatively, a sample of units could be subjected to stresses
(e.g., temperature, vibration, voltage) substantially more severe than what is usually experienced.
Acceleration can also be applied to entities in which degradation occurs over time—stiffness of
springs, corrosion of metals, and wearing of mechanical components. In all these cases, probability
models must then be developed to relate lifetime behavior under such acceleration to behavior in
more customary situations. The article “Overview of Reliability Testing” (IEEE Transactions on Reliability, 2012: 282-291) gives a survey of various testing methodologies and models. The article
“A Methodology for Accelerated Testing by Mechanical Actuation of MEMS Devices” (Microelectronics Reliability, 2012: 1382-1388) applies some of these ideas in the context of predicting lifetimes
for micro-electro-mechanical systems.
An important part of modern reliability engineering deals with building redundancy into various
systems in order to decrease substantially the likelihood of failure. A k-out-of-n:G system works or is
good only if at least k amongst the n constituent components work or are good, whereas a k-out-of-n:F system fails if and only if at least k of the n components fail. The article “Redundancy Issues in
Software and Hardware Systems: An Overview” (Intl. Journal of Reliability, Quality, and Safety Engineering, 2011: 61-98) surveys these and various other systems that can improve the performance
of computer software and hardware. The so-called triple modular redundant systems, with 2-out-of-3:G
configuration, are now commonplace (e.g., Hewlett-Packard’s original NonStop server, and a
variety of aero, auto, and rail systems). The article “Reliability of Various 2-Out-of-4:G Redundant
Systems with Minimal Repair” (IEEE Transactions on Reliability, 2012: 170-179) considers using a
Poisson process with time-varying rate function to model how component failures occur over time so
that the rate of failure increases as a component ages; in addition, a component that fails undergoes
repair so that it can be placed back in service. Several failure modes for combined k-out-of-n systems
are studied in the article “Reliability of Combined m-Consecutive-k-out-of-n:F and Consecutive-kc-out-of-n:F Systems” (IEEE Transactions on Reliability, 2012: 215-219); these have applications in
the areas of infrared detecting and signal processing.
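To make the k-out-of-n:G definition concrete, the Python sketch below computes the probability that such a system works in the simplest setting. It rests on two assumptions of ours that the cited articles do not necessarily share: the components function independently, and each works with the same probability p. The function name `k_out_of_n_good` and the value p = 0.9 are hypothetical.

```python
from math import comb

def k_out_of_n_good(k, n, p):
    """P(at least k of n components work), assuming independent
    components that each work with probability p (a binomial sum)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j)
               for j in range(k, n + 1))

p = 0.9  # hypothetical component reliability
print(k_out_of_n_good(2, 3, p))  # 2-out-of-3:G system, about 0.972
```

Under these assumptions, a 2-out-of-3:G arrangement of components that each work 90% of the time works about 97.2% of the time. A k-out-of-n:F system can be handled with the same function, since it works exactly when at least n − k + 1 of its components work.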
A compelling reason for manufacturers to be interested in reliability information about their
products is that they can establish warranty policies and periods that help control costs. Many
warranties are “one dimensional,” typically characterized by an interval of age (time). However,
some warranties are “two dimensional” in that warranty conditions depend on both age and cumulative
usage; these are common in the automotive industry. The article “Effect of Use-Rate on System
Lifetime and Failure Models for 2D Warranty” (Intl. Journal of Quality and Reliability Management, 2011: 464-482) describes how certain bivariate probability models for jointly describing the behavior
of time and usage can be used to investigate the reliability of various system configurations.
The word queue is used chiefly by the British to mean “waiting line,” i.e., a line of customers or
other entities waiting to be served or brought into service. The mathematical development of models
for how a waiting line expands and contracts as customers arrive at a service facility, enter service,
and then finish began in earnest in the middle part of the 1900s and continues unabated today as new
application scenarios are encountered.
For example, the arrival and service of patients at some type of medical unit are often described by
the notation M/M/s, where the first M signifies that arrivals occur according to a Poisson process, the
second M indicates that the service time of each patient is governed by an exponential probability
distribution, and there are s servers available for the patients. The article “Nurse Staffing in Medical
Units: A Queueing Perspective” (Operations Research, 2011: 1320-1331) proposes an alternative
closed queueing model in which there are s nurses within a single medical unit servicing n patients,
where each patient alternates between requiring assistance and not needing assistance. The performance
of the unit is characterized by the likelihood that delay in serving a patient needing assistance
will exceed some critical threshold. A staffing rule based on the model and assumptions is developed;
the resulting rule differs significantly from the fixed nurse-to-patient staffing ratios mandated by the
state of California.
A variation on the medical unit situation just described occurs in the context of call centers, where
effective management entails a trade-off between operational costs and the quality of service offered
to customers. The article “Staffing Call Centers with Impatient Customers” (Operations Research, 2012: 461-474) considers an M/M/s queue in which customers who have to wait for service may
become frustrated and abandon the facility (don’t you sometimes feel like doing that in a doctor’s
office?). The behavior of such a system when n is large is investigated, with particular attention to the staffing principle that relates the number of servers to the square root of the workload offered to the
call center.
The methodology of queueing can also be applied to find optimal settings for traffic signals. The
article “Delays at Signalized Intersections with Exhaustive Traffic Control” (Probability in Engineering and Informational Sciences, 2012: 337-373) utilizes a “polling model,” which entails
multiple queues of customers (corresponding to different traffic flows) served by a single server in
cyclic order. The proposed vehicle-actuated rule is that traffic lights stay green until all lanes within a
group are emptied. The mean traffic delay is studied for a variety of vehicle interarrival-time
distributions in both light-traffic and heavy-traffic situations.
Suppose two different types of customers, primary and secondary, arrive for service at a facility
where the servers have different service rates. How should customers be assigned to the servers? The
article “Managing Queues with Heterogeneous Servers” (Journal of Applied Probability, 2011: 435-452) shows that the optimal policy for minimizing mean wait time has a “threshold structure”:
for each server, there is a different threshold such that a primary customer will be assigned to that
server if and only if the queue length of primary customers meets or exceeds the threshold.
Applications to Finance
The most explosive growth in the use of probability theory and methodology over the course of the
last several decades has undoubtedly been in the area of finance. This has provided wonderful career
opportunities for people with advanced degrees in statistics, mathematics, engineering, and physics
(the son-in-law of one of the authors earned a Ph.D. in mechanical engineering and taught for several
years, but then switched to finance). Edward O. Thorp, whom we previously met as the man who
figured out how to beat blackjack, subsequently went on to success in finance, where he earned much
more money managing hedge funds and giving advice than he could ever have hoped to earn in
academia (those of us in academia love it for the intangible rewards we get—psychic income, if
you will).
One of the central results in mathematical finance is the Black-Scholes theorem, named after the
economists Fischer Black and Myron Scholes, who discovered it (Scholes shared the 1997 Nobel prize in economics; Black had died in 1995). To get the flavor of what is involved here, a
bit of background is needed. Suppose the present price of a stock is $20 per share, and it is known that
at the end of 1 year, the price will either double to $40 or decrease to $10 per share (where those prices
are expressed in current dollars, i.e., taking account of inflation over the 1-year period). You can enter
into an agreement, called an option contract, that allows you to purchase y shares of this stock (for any value y) 1 year from now for the amount cy (again in current dollars). In addition, right now you can
buy x shares of the stock for 20x with the objective of possibly selling those shares 1 year from now.
The values x and y are both allowed to be negative; if, for example, x were negative, then you would
actually be selling shares of the stock now that you would have to purchase at either a cost of $40 per
share or $10 per share 1 year from now. It can then be shown that there is only one value of c, specifically 50/3, for which the gain from this investment activity is 0 regardless of the choices of
x and y and the value of the stock 1 year from now. If c is anything other than 50/3, then there is an
arbitrage, an investment strategy involving choices of x and y that is guaranteed to result in a
positive gain.
A general result called the Arbitrage Theorem specifies conditions under which a collection of
investments (or bets) has expected return 0 as opposed to there being an arbitrage strategy. The basis
for the Black-Scholes theorem is that the variation in the price of an asset over time is described by a
stochastic process called geometric Brownian motion (see Sect. 7.6). The theorem then specifies a fair
price for an option contract on that asset so that no arbitrage is possible.
Modern quantitative finance is very complex, and many of the basic ideas are unfamiliar to
most novices (like the authors of this text!). It is therefore difficult to summarize the content of
recently published articles as we have done for some other application areas. But a sampling
of recently published titles emphasizes the role of probability modeling. Articles that appeared in
the 2012 Annals of Finance included “Option Pricing Under a Stressed Beta Model” and “Stochastic
Volatility and Stochastic Leverage”; in the 2012 Applied Mathematical Finance, we found
“Determination of Probability Distribution Measures from Market Prices Using the Method of
Maximum Entropy in the Mean” and “On Cross-Currency Models with Stochastic Volatility and
Correlated Interest Rates”; the 2012 Quantitative Finance yielded “Probability Unbiased Value-at-Risk
Estimators” and “A Generalized Birth-Death Stochastic Model for High-Frequency Order Book
Dynamics.”
If the application of mathematics to problems in finance is of interest to you, there are now many
excellent masters-level graduate programs in quantitative finance. Entrance to these programs
typically requires a very solid background in undergraduate mathematics and statistics (including
especially the course for which you are using this book). Be forewarned, though, that not all
financially savvy individuals are impressed with the direction in which finance has recently moved.
Former Federal Reserve Chairman Paul Volcker was quoted not long ago as saying that the ATM
cash machine was the most significant financial innovation of the last 20 years; he has been a very
vocal critic of the razzle-dazzle of modern finance.
Probability in Everyday Life
In the hopefully unlikely event that you do not end up using probability concepts and methods in your
professional life, you still need to face the fact that ideas surrounding uncertainty are pervasive in our
world. We now present some amusing and intriguing examples to illustrate this.
The behavioral psychologists Amos Tversky and Daniel Kahneman spent much of their academic
careers carrying out studies to demonstrate that human beings frequently make logical errors when
processing information about uncertainty (Kahneman won a Nobel prize in economics for his work,
and Tversky would surely have also done so had the awards been given posthumously). Consider the
following variant of one Tversky-Kahneman scenario. Which of the following two statements is more
likely?
(A) Dr. D is a former professor.
(B) Dr. D is a former professor who was accused of inappropriate relations with some students;
an investigation substantiated the charges, and he was stripped of tenure.
T-K’s research indicated that many people would regard statement B as being more likely, since it
gives a more detailed explanation of why Dr. D is no longer a professor. However, this is incorrect.
Statement B implies statement A. One of our basic probability rules will be that if one event B is
contained in another event A (i.e., if B implies A), then the smaller event B can be no more likely to
occur or have occurred than the larger event A. After all, other possible explanations for A are that
Dr. D is deceased, that he is retired, or that he deserted academia for investment banking; all of
those plus B figure into the likelihood of A.
The survey article “Judgment under Uncertainty: Heuristics and Biases” (Science, 1974: 1124-1131) by T-K described a certain town served by two hospitals. In the larger hospital about 45 babies
are born each day, whereas about 15 are born each day in the smaller one. About 50% of births are
boys, but of course the percentage fluctuates from day to day. For a 1-year period, each hospital
recorded days on which more than 60% of babies born were boys. Each of a number of individuals
was then asked which of the following statements he/she thought was correct: (1) the larger hospital
recorded more such days, (2) the smaller hospital recorded more such days, or (3) the number of such
days was about the same for the two hospitals. Of the 95 participants, 21 chose (1), 21 chose (2), and
53 chose (3). In Chap. 5 we present a general result which implies that the correct answer is in fact (2),
because the sample percentage is less likely to stray from the true percentage (in this case about 50%)
when the sample size is large rather than small.
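The hospital comparison can be checked with a short calculation. The sketch below is not taken from the article; it assumes exactly 15 and 45 births per day (the text says only "about"), independent births, and a boy probability of exactly 0.5, so that the daily number of boys is binomial. The function name is ours.

```python
from math import comb


def prob_more_than_60pct_boys(births: int, p_boy: float = 0.5) -> float:
    """P(boys make up more than 60% of `births`) under a binomial model."""
    return sum(
        comb(births, k) * p_boy**k * (1 - p_boy) ** (births - k)
        for k in range(births + 1)
        if 10 * k > 6 * births  # strictly more than 60% boys, in exact integers
    )


small = prob_more_than_60pct_boys(15)  # assumed: exactly 15 births per day
large = prob_more_than_60pct_boys(45)  # assumed: exactly 45 births per day
print(f"smaller hospital: {small:.4f}, larger hospital: {large:.4f}")
```

Under these assumptions the smaller hospital exceeds 60% boys on roughly 15% of days, and the larger hospital does so noticeably less often, in line with answer (2).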
In case you think that mistakes of this sort are made only by those who are unsophisticated or
uneducated, here is yet another T-K scenario. Each of a sample of 80 physicians was presented with
the following information on treatment for a particular disease:
With surgery, 10% will die during treatment, 32% will die within a year, 66% will die within 5 years. With
radiation, 0% will die during treatment, 23% will die within a year, 78% will die within 5 years.
Each of the 87 physicians in a second sample was presented with the following information:
With surgery, 90% will survive the treatment, 68% will survive at least 1 year, and 34% will survive at least
5 years. With radiation, 100% will survive the treatment, 77% will survive at least 1 year, and 22% will survive
at least 5 years.
When each physician was asked to indicate whether he/she would recommend surgery or radiation
based on the supplied information, 50% of those in the first group said surgery whereas 84% of those
in the second group said surgery.
The distressing thing about this conclusion is that the information provided to the first group of
physicians is identical to that provided to the second group, but described in a slightly different way.
If the physicians were really processing information rationally, there should be no significant
difference between the two percentages.
It would be hard to find a book containing even a brief exposition of probability that did not
contain examples or exercises involving coin tossing. Many such scenarios involve tossing a “fair”
coin, one that is equally likely to result in H (head side up) or T (tail side up) on any particular toss.
Are real coins actually fair, or is there a bias of some sort? Various analyses have shown that the result
of a coin toss is predictable at least to some degree if initial conditions (position, velocity, angular
momentum) are known. In practice, most people who toss coins (e.g., referees in a football game
trying to determine which team will kick off and which will receive) are not conversant in the physics
of coin tossing. The mathematician and statistician Persi Diaconis, who was a professional magician
for 10 years prior to earning his Ph.D. and mastered many coin and card tricks, has engaged in
ongoing collaboration with other researchers to study coin tossing. One result of these investigations
was the conclusion based on physics that for a caught coin, there is a slight bias toward heads—about
.51 versus .49. It is not, however, clear under which real-world circumstances this or some other bias
will occur.
Simulation of fair-coin tossing can be done using a random number generator available in many
software packages (about which we’ll say more shortly). If the resulting random number is between
0 and .5, we say that the outcome of the toss was H, and if the number is between .5 and 1, then a T
occurred (there is an obvious modification of this to incorporate bias). Now consider the following
sequence of 200 Hs and Ts:
THTHTTTHTTTTTHTHTTTHTTHHHTHHTHTHTHTTTTHHTTHHTTHHHTHHHTTHHHTTTHHHTHHHHTTTHTHTHHHHTHTTTHHHTHHTHTTTHHTHHHTHHHHTTHTHHTHHHTTTHTHHHTHHTTTHHHTTTTHHHTHTHHHHTHTTHHTTTTHTHTHTTHTHHTTHTTTHTTTTHHHHTHTHHHTTHHHHHTHH
Did this sequence result from actually tossing a fair coin (equivalently, using computer simulation
as described), or did it come from someone who was asked to write down a sequence of 200 Hs and Ts
that he/she thought would come from tossing a fair coin? One way to address this question is to focus
on the longest run of Hs in the sequence of tosses. This run is of length 4 for the foregoing sequence.
Probability theory tells us that the expected length of the longest run in a sequence of n fair-coin
tosses is approximately log2(n) − 2/3. For n = 200, this formula gives an expected longest run of
length about 7. It can also be shown that there is less than a 10% chance that the longest run will have
a length of 4 or less. This suggests that the given sequence is fictitious rather than real, as in fact was
the case; see the very nice expository article “The Longest Run of Heads” (Mathematics Magazine, 1990: 196-207).
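The simulation recipe and the longest-run claims above can be sketched in a few lines. This is an illustrative sketch rather than the method of the cited article: the function names, the seed, and the number of trials are our own choices, and the exact probability is obtained by a simple dynamic program over the current run length.

```python
import math
import random


def longest_run(tosses: str, face: str = "H") -> int:
    """Length of the longest unbroken run of `face` in a string of tosses."""
    best = current = 0
    for toss in tosses:
        current = current + 1 if toss == face else 0
        best = max(best, current)
    return best


def simulate_tosses(n: int, rng: random.Random, p_heads: float = 0.5) -> str:
    """Toss a coin n times: a uniform number below p_heads counts as H."""
    return "".join("H" if rng.random() < p_heads else "T" for _ in range(n))


def prob_longest_run_at_most(n: int, k: int) -> float:
    """Exact P(longest run of Hs <= k) in n fair tosses.

    state[j] is the probability that no run has exceeded k so far and the
    sequence currently ends in exactly j consecutive Hs.
    """
    state = [1.0] + [0.0] * k
    for _ in range(n):
        new_state = [0.0] * (k + 1)
        new_state[0] = 0.5 * sum(state)        # a T resets the current run
        for j in range(k):
            new_state[j + 1] = 0.5 * state[j]  # an H extends the current run
        state = new_state                      # runs longer than k are dropped
    return sum(state)


n = 200
approx_mean = math.log2(n) - 2 / 3             # approximation quoted in the text
p_at_most_4 = prob_longest_run_at_most(n, 4)

rng = random.Random(2024)                      # arbitrary seed, for repeatability
trials = 2000
sim_mean = sum(longest_run(simulate_tosses(n, rng)) for _ in range(trials)) / trials

print(f"approximate mean: {approx_mean:.2f}, simulated mean: {sim_mean:.2f}")
print(f"P(longest run of Hs <= 4): {p_at_most_4:.3f}")
```

Passing a value other than 0.5 for `p_heads` gives the "obvious modification" for a biased coin mentioned earlier; the exact-probability function, as written, covers only the fair-coin case.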
As another example, consider giving a fair coin to each of the two authors of this textbook. Carlton
tosses his coin repeatedly until obtaining the sequence HTT. Devore tosses his coin repeatedly until
the sequence HTH is observed. Is Carlton’s expected number of tosses to obtain his desired sequence
the same as Devore’s, or is one expected number of tosses smaller than the other? Most students to
whom we have asked these questions initially answered that the two expected numbers should be the
same. But this is not true. Some rather tricky probability arguments can be used to show that Carlton’s
expected number of tosses is eight, whereas Devore expects to have to make ten tosses. Very
surprising, no? A bit of intuition makes this more plausible. Suppose Carlton merrily tosses away
until at some point he has just gotten HT. So he is very excited, thinking that just one more toss will
enable him to stop tossing the coin and move on to some more interesting pursuit. Unfortunately his
hopes are dashed because the next toss is an H. However, all is not lost, as even though he must
continue tossing, at this point he is partway toward reaching his goal of HTT. If Devore sees HT at
some point and gets excited by light at the end of the tunnel but then is crushed by the appearance of a
T rather than an H, he essentially has to start over again from scratch. The charming nontechnical
book Probabilities: The Little Numbers That Rule Our Lives by Peter Olofsson has more detail on this
and other probability conundrums.
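The two expected values can be reproduced with a short sketch. It does not follow the "rather tricky probability arguments" alluded to above; instead it uses a standard overlap rule for fair coins (add 2^k for every k at which the first k symbols of the pattern match its last k symbols) and then checks the result by simulation. Function names, seed, and trial count are our own.

```python
import random


def expected_tosses(pattern: str) -> int:
    """Expected number of fair-coin tosses until `pattern` first appears.

    Overlap rule: add 2**k for every k such that the first k symbols of
    the pattern equal its last k symbols.
    """
    return sum(
        2**k
        for k in range(1, len(pattern) + 1)
        if pattern[:k] == pattern[-k:]
    )


def simulated_mean_tosses(pattern: str, trials: int, rng: random.Random) -> float:
    """Average number of tosses needed to see `pattern`, by direct simulation."""
    total = 0
    for _ in range(trials):
        recent = ""  # the most recent len(pattern) tosses
        count = 0
        while not recent.endswith(pattern):
            recent = (recent + rng.choice("HT"))[-len(pattern):]
            count += 1
        total += count
    return total / trials


rng = random.Random(7)  # arbitrary seed, for repeatability
for pattern in ("HTT", "HTH"):
    exact = expected_tosses(pattern)
    approx = simulated_mean_tosses(pattern, 20_000, rng)
    print(f"{pattern}: exact {exact}, simulated {approx:.2f}")
```

HTH overlaps with itself (its first and last symbols are both H), which is what adds the extra 2 to Devore's expected count; HTT has no such overlap.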
One of the all-time classic probability puzzles that stump most people is called the Birthday Problem. Consider a group of individuals, all of whom were born in the same year (one that did not
have a February 29). If the group size is 400, how likely is it that at least two members of the group
share the same birthday? Hopefully a moment’s reflection will bring you to the realization that a
shared birthday here is a sure thing (100% chance), since there are only 365 possible birthdays for the
400 people. On the other hand, it is intuitively quite unlikely that there will be a shared birthday if the
group size is only five; in this case we would expect that all five individuals would have different
birthdays.
Clearly as the group size increases, it becomes more likely that two or more individuals will have
the same birthday. So how large does the group size have to be in order for it to be more likely than
not that at least two people share a birthday (i.e., that the likelihood of a shared birthday is more than
50%)? Which one of the following four group-size categories do you believe contains the correct
answer to this question?
(1) At least 100 (2) At least 50 but less than 100
(3) At least 25 but less than 50 (4) Fewer than 25
When we have asked this of students in our classes, a substantial majority opted for the first two
categories. Very surprisingly, the correct answer is category (4). In Chapter 1 we will show that with
as few as 23 people in the group, it is a bit more likely than not that at least two group members will
have the same birthday.
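The threshold of 23 can be verified directly. The following sketch assumes, as the standard treatment of the problem does, that the 365 birthdays are equally likely and that group members' birthdays are independent; the function name is ours.

```python
def prob_shared_birthday(group_size: int, days: int = 365) -> float:
    """P(at least two of `group_size` people share a birthday)."""
    if group_size > days:
        return 1.0  # more people than days: a match is certain
    p_all_distinct = 1.0
    for i in range(group_size):
        # The (i+1)th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


smallest = next(n for n in range(1, 366) if prob_shared_birthday(n) > 0.5)
print(f"smallest group size with probability above 50%: {smallest}")
for n in (5, 22, 23, 400):
    print(f"n = {n:3d}: {prob_shared_birthday(n):.4f}")
```

With these assumptions a group of 22 falls just short of a 50% chance and a group of 23 just exceeds it.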
Two people having the same birthday implies that they were born within 24 h of one another, but
the converse is not true; e.g., one person might be born just before midnight on a particular day and
another person just after midnight on the next day. This implies that it is more likely that two people
will have been born within 24 h of one another than it is that they have the same birthday. It follows
that a smaller group size than 23 is needed to make it more likely than not that at least two people will
have been born within 24 h of one another. In Sect. 4.9 we show how this group size can be
determined.
Two people in a group having the same birthday is an example of a coincidence, an accidental and
seemingly surprising occurrence of events. The fact that even for a relatively small group size it is
more likely than not that this coincidence will occur should suggest that coincidences are often not as
surprising as they might seem. This is because even if a particular coincidence (e.g., “graduated from
the same high school” or “visited the same small town in Croatia”) is quite unlikely, there are so many
opportunities for coincidences that quite a few are sure to occur.
Back to the follies of misunderstanding medical information: Suppose the incidence rate of a
particular disease in a certain population is 1 in 1000. The presence of the disease cannot be detected
visually, but a diagnostic test is available. The diagnostic test correctly detects 98% of all diseased
individuals (this is the sensitivity of the test, its ability to detect the presence of the disease), and 93% of non-diseased individuals test negative for the disease (this is the specificity of the test, an indicator of how specific the test is to the disease under consideration). Suppose a single individual randomly
selected from the population is given the test and the test result is positive. In light of this information,
how likely is it that the individual will have the disease?
First note that if the sensitivity and the specificity were both 100%, then it would be a sure thing
that the selected individual has the disease. The reason this is not a sure thing is that the test
sometimes makes mistakes. Which one of the following five categories contains the actual likelihood
of having the disease under the described conditions?
1. At least a 75% chance (quite likely)
2. At least 50% but less than 75% (moderately likely)
3. At least 25% but less than 50% (somewhat likely)
4. At least 10% but less than 25% (rather unlikely)
5. Less than 10% (quite unlikely)
Student responses to this question have overwhelmingly been in categories (1) or (2)—another
case of intuition going awry. The correct answer turns out to be category (5). In fact, even in
light of the positive test result, there is still only a bit more than a 1% chance that the individual is
diseased!
What is the explanation for this counterintuitive result? Suppose we start with 100,000 individuals
from the population. Then we’d expect 0.1% of those, or 100, to be diseased (from the 1 in 1000
incidence rate) and 99,900 to be disease free. From the 100 we expect to be diseased, we’d expect
98 positive test results (98% sensitivity). And from the 99,900 we expect to be disease free, we’d
expect 7% of those or 6993 to yield positive test results. Thus we expect many more false positives
than true positives. This is because the disease is quite rare and the diagnostic test is rather good but
not stunningly so. (In case you think our sensitivity and specificity are low, consider a certain D-dimer
test for the presence of a coronary embolism; its sensitivity and specificity are 88% and 75%,
respectively.)
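The expected-count argument above amounts to dividing the true positives by all positives. A minimal sketch of that arithmetic follows, using the incidence rate, sensitivity, and specificity stated in the text; the function and variable names are ours.

```python
def prob_disease_given_positive(
    prevalence: float, sensitivity: float, specificity: float
) -> float:
    """Share of positive test results that come from diseased individuals."""
    true_positive = prevalence * sensitivity
    false_positive = (1 - prevalence) * (1 - specificity)
    return true_positive / (true_positive + false_positive)


posterior = prob_disease_given_positive(
    prevalence=1 / 1000,  # 1 in 1000 incidence rate
    sensitivity=0.98,     # 98% of diseased individuals test positive
    specificity=0.93,     # 93% of non-diseased individuals test negative
)
print(f"P(disease | positive test) = {posterior:.4f}")
```

In terms of the 100,000 individuals, this is 98 true positives out of 98 + 6993 positive results in total.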
Later in Chapter 1 (Example 1.31) we develop probability rules which can be used to show that the
posterior probability of having the disease conditional on a positive test result is .0138—a bit over
1%. This should make you very cautious about interpreting the results of diagnostic tests. Before you
panic in light of a positive test result, you need to know the incidence rate for the condition under
consideration and both the sensitivity and specificity of the test. There are also implications for
situations involving detection of something other than a disease. Consider airport procedures that are
used to detect the presence of a terrorist. What do you think is the incidence rate of terrorists at a given
airport, and how sensitive and specific do you think detection procedures are? The overwhelming
majority of positive test results will be false, greatly inconveniencing those who test positive!
Here’s one final example of probability applied in everyday life: One of the following columns
contains the value of the closing stock index as of August 8, 2012, for each of a number of countries,
and the other column contains fake data obtained with a random number generator. Just by looking at
the numbers, without considering context, can you tell which column is fake and which is real?
Country        Column 1    Column 2
China              2264        3058
Japan              8881        9546
Britain            5846        7140
Canada           11,781        6519
Euro area           797         511
Austria            2053        4995
France             3438        2097
Germany            6966        4628
Italy            14,665        8461
Spain               722         598
Norway              480        1133
Russia             1445        4100
Sweden             1080        2594
Turkey           64,699      35,027
Hong Kong        20,066      42,182
India            17,601        3388
Pakistan         14,744      10,076
Singapore          3052        5227
Thailand           1214        7460
Argentina          2459        2159
⋮                     ⋮           ⋮
The key to answering this question is a result called Benford’s Law. Suppose you start reading
through a particular issue of a publication like the New York Times or The Economist, and each time
you encounter any number (the amount of donations to a particular political candidate, the age of an
actor, the number of members of a union, and so on), you record the first digit of that number. Possible
first digits are 1, 2, 3, . . ., or 9. In the long run, how frequently do you think each of these nine possible
first digits will be encountered? Your first thought might be that each one should have the same long-
run frequency, 1/9 (roughly 11%). But for many sets of numbers this turns out not to be the case.
Instead the long-run frequency is given by the formula log10[(x + 1)/x], which gives .301, .176, .125,
. . ., .051, .046 for x = 1, 2, 3, . . ., 8, 9. Thus a leading digit is much more likely to be 1, 2, or 3 than
7, 8, or 9.
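The nine frequencies quoted above follow directly from the formula, as this short sketch shows (the variable names are ours):

```python
import math

# Benford's Law: long-run frequency of leading digit x is log10((x + 1) / x).
benford = {x: math.log10((x + 1) / x) for x in range(1, 10)}

for digit, freq in benford.items():
    print(f"{digit}: {freq:.3f}")

low_digits = benford[1] + benford[2] + benford[3]
high_digits = benford[7] + benford[8] + benford[9]
print(f"P(1, 2, or 3) = {low_digits:.3f}, P(7, 8, or 9) = {high_digits:.3f}")
```

The nine frequencies sum to 1 because the logarithms telescope to log10(10), and the leading digit is 1, 2, or 3 about 60% of the time.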
Examination of the foregoing lists of numbers shows that the first column conforms much more
closely to Benford’s Law than does the second column. In fact, the first column is real, whereas the
second one is fake. For Benford’s Law to be valid, it is generally required that the set of numbers
under consideration span several orders of magnitude. It does not work, for example, with batting
averages of major league baseball players, most of which are between .200 and .299, or with fuel
efficiency ratings (miles per gallon) for automobiles, most of which are currently between 15 and 30.
Benford’s Law has been employed to detect fraud in accounting reports, and in particular to detect
fraudulent tax returns. So beware when you file your taxes next year!
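The comparison of the two columns can be made concrete by tallying leading digits. The sketch below uses only the 20 rows printed above (the original list continues beyond them), and the distance measure, a sum of absolute differences between observed and Benford proportions, is our own illustrative choice rather than a formal fraud-detection test.

```python
import math
from collections import Counter

# The 20 rows shown in the table; the full list in the original is longer.
column_1 = [2264, 8881, 5846, 11781, 797, 2053, 3438, 6966, 14665, 722,
            480, 1445, 1080, 64699, 20066, 17601, 14744, 3052, 1214, 2459]
column_2 = [3058, 9546, 7140, 6519, 511, 4995, 2097, 4628, 8461, 598,
            1133, 4100, 2594, 35027, 42182, 3388, 10076, 5227, 7460, 2159]

benford = {d: math.log10((d + 1) / d) for d in range(1, 10)}


def leading_digit_counts(values: list[int]) -> Counter:
    """How often each digit 1-9 occurs as the first digit of a value."""
    return Counter(int(str(v)[0]) for v in values)


def distance_from_benford(values: list[int]) -> float:
    """Sum of |observed proportion - Benford frequency| over digits 1-9."""
    counts = leading_digit_counts(values)
    return sum(abs(counts[d] / len(values) - benford[d]) for d in range(1, 10))


for name, column in (("Column 1", column_1), ("Column 2", column_2)):
    counts = leading_digit_counts(column)
    tally = [counts[d] for d in range(1, 10)]
    print(f"{name}: counts for digits 1-9 = {tally}, "
          f"distance = {distance_from_benford(column):.3f}")
```

In these 20 rows, 7 values in the first column begin with a 1, against only 2 in the second column, and the first column's leading-digit proportions sit closer to the Benford frequencies. A sample this small is suggestive rather than conclusive.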
This list of amusing probability appetizers could be continued for quite a while. Hopefully what
we have shown thus far has sparked your interest in knowing more about the discipline. So without
further ado . . .