
Lecture Notes 1

Probability and Random Variables

• Probability Spaces

• Conditional Probability and Independence

• Random Variables

• Functions of a Random Variable

• Generation of a Random Variable

• Jointly Distributed Random Variables

• Scalar detection


Probability Theory

• Probability theory provides the mathematical rules for assigning probabilities to outcomes of random experiments, e.g., coin flips, packet arrivals, noise voltage

• Basic elements of probability theory:

Sample space Ω: set of all possible “elementary” or “finest grain” outcomes of the random experiment

Set of events F : set of (all?) subsets of Ω; an event A ⊂ Ω occurs if the outcome ω ∈ A

Probability measure P: function over F that assigns probabilities to events according to the axioms of probability (see below)

• Formally, a probability space is the triple (Ω,F ,P)


Axioms of Probability

• A probability measure P satisfies the following axioms:

1. P(A) ≥ 0 for every event A in F

2. P(Ω) = 1

3. If A1, A2, . . . are disjoint events, i.e., Ai ∩ Aj = ∅ for all i ≠ j, then

P(⋃_{i=1}^∞ Ai) = ∑_{i=1}^∞ P(Ai)

• Notes:

P is a measure in the same sense as mass, length, area, and volume; all satisfy axioms 1 and 3

Unlike these other measures, P is bounded by 1 (axiom 2)

This analogy provides some intuition but is not sufficient to fully understand probability theory; other aspects such as conditioning and independence are unique to probability


Discrete Probability Spaces

• A sample space Ω is said to be discrete if it is countable

• Examples:

Rolling a die: Ω = {1, 2, 3, 4, 5, 6}

Flipping a coin n times: Ω = {H, T}ⁿ, sequences of heads/tails of length n

Flipping a coin until the first heads occurs: Ω = {H, TH, TTH, TTTH, . . .}

• For discrete sample spaces, the set of events F can be taken to be the set of all subsets of Ω, sometimes called the power set of Ω

• Example: For the coin flipping experiment,

F = {∅, {H}, {T}, Ω}

• F does not have to be the entire power set (more on this later)


• The probability measure P can be defined by assigning probabilities to individual outcomes (single outcome events {ω}) so that:

P({ω}) ≥ 0 for every ω ∈ Ω

∑_{ω∈Ω} P({ω}) = 1

• The probability of any other event A is simply

P(A) = ∑_{ω∈A} P({ω})

• Example: For the die rolling experiment, assign

P({i}) = 1/6 for i = 1, 2, . . . , 6

The probability of the event “the outcome is even,” A = {2, 4, 6}, is

P(A) = P({2}) + P({4}) + P({6}) = 3/6 = 1/2


Continuous Probability Spaces

• A continuous sample space Ω has an uncountable number of elements

• Examples:

Random number between 0 and 1: Ω = (0, 1 ]

Point in the unit disk: Ω = {(x, y) : x² + y² ≤ 1}

Arrival times of n packets: Ω = (0, ∞)ⁿ

• For continuous Ω, we cannot in general define the probability measure P by first assigning probabilities to outcomes

• To see why, consider assigning a uniform probability measure over (0, 1 ]

In this case the probability of each single outcome event is zero

How do we find the probability of an event such as A = [0.25, 0.75]?


• Another difference for continuous Ω: we cannot take the set of events F as the power set of Ω. (To learn why you need to study measure theory, which is beyond the scope of this course)

• The set of events F cannot be an arbitrary collection of subsets of Ω. It must make sense, e.g., if A is an event, then its complement Aᶜ must also be an event, the union of two events must be an event, and so on

• Formally, F must be a sigma algebra (σ-algebra, σ-field), which satisfies the following axioms:

1. ∅ ∈ F

2. If A ∈ F then Aᶜ ∈ F

3. If A1, A2, . . . ∈ F then ⋃_{i=1}^∞ Ai ∈ F

• Of course, the power set is a sigma algebra. But we can define smaller σ-algebras. For example, for rolling a die, we could define the set of events as

F = {∅, odd, even, Ω}


• For Ω = R = (−∞, ∞) (or (0, ∞), (0, 1), etc.), F is typically defined as the family of sets obtained by starting from the intervals and taking countable unions, intersections, and complements

• The resulting F is called the Borel field

• Note: Amazingly there are subsets in R that cannot be generated in this way! (Not ones that you are likely to encounter in your life as an engineer or even as a mathematician)

• To define a probability measure over a Borel field, we first assign probabilities to the intervals in a consistent way, i.e., in a way that satisfies the axioms of probability

For example, to define the uniform probability measure over (0, 1), we first assign P((a, b)) = b − a to all intervals

• In EE 278 we do not deal with sigma fields or the Borel field beyond (kind of) knowing what they are


Useful Probability Laws

• Union of Events Bound:

P(⋃_{i=1}^n Ai) ≤ ∑_{i=1}^n P(Ai)

• Law of Total Probability: Let A1, A2, A3, . . . be events that partition Ω, i.e., disjoint (Ai ∩ Aj = ∅ for i ≠ j) and ⋃_i Ai = Ω. Then for any event B

P(B) = ∑_i P(Ai ∩ B)

The Law of Total Probability is very useful for finding probabilities of sets
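• As a quick numerical check, here is a short Python sketch of both laws on the die-rolling space with the uniform measure (standard library only; the particular events are made up for illustration):

```python
from fractions import Fraction

# Die-rolling space with the uniform measure P({i}) = 1/6
omega = {1, 2, 3, 4, 5, 6}

def P(A):
    # Probability of an event A (a subset of omega) under the uniform measure
    return Fraction(len(A & omega), len(omega))

# Union of events bound: P(A1 ∪ A2) <= P(A1) + P(A2)
A1, A2 = {1, 2, 3}, {2, 4, 6}
print(P(A1 | A2), "<=", P(A1) + P(A2))   # 5/6 <= 1

# Law of total probability with the partition {odd, even}
odd, even = {1, 3, 5}, {2, 4, 6}
B = {1, 2, 3}
assert P(B) == P(odd & B) + P(even & B)
```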


Conditional Probability

• Let B be an event such that P(B) ≠ 0. The conditional probability of event A given B is defined to be

P(A |B) = P(A ∩ B) / P(B) = P(A, B) / P(B)

• The function P(· |B) is a probability measure over F , i.e., it satisfies the axioms of probability

• Chain rule: P(A, B) = P(A)P(B |A) = P(B)P(A |B) (this can be generalized to n events)

• The probability of event A given B , a nonzero probability event (the a posteriori probability of A), is related to the unconditional probability of A (the a priori probability) by

P(A |B) = (P(B |A) / P(B)) P(A)

This follows directly from the definition of conditional probability


Bayes Rule

• Let A1, A2, . . . , An be nonzero probability events that partition Ω, and let B be a nonzero probability event

• We know P(Ai) and P(B |Ai), i = 1, 2, . . . , n, and want to find the a posteriori probabilities P(Aj |B), j = 1, 2, . . . , n

• We know that

P(Aj |B) = (P(B |Aj) / P(B)) P(Aj)

• By the law of total probability

P(B) = ∑_{i=1}^n P(Ai, B) = ∑_{i=1}^n P(Ai)P(B |Ai)

• Substituting, we obtain Bayes rule

P(Aj |B) = (P(B |Aj) / ∑_{i=1}^n P(Ai)P(B |Ai)) P(Aj) , j = 1, 2, . . . , n

• Bayes rule also applies to a (countably) infinite number of events
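• To make the formula concrete, here is a small Python sketch for a three-event partition; the priors and likelihoods are hypothetical numbers chosen only for illustration:

```python
# Hypothetical priors P(Ai) and likelihoods P(B | Ai) for a 3-event partition
priors = [0.5, 0.3, 0.2]        # P(A1), P(A2), P(A3); they sum to 1
likelihoods = [0.1, 0.4, 0.8]   # P(B | A1), P(B | A2), P(B | A3)

# Law of total probability: P(B) = sum_i P(Ai) P(B | Ai)
p_b = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes rule: P(Aj | B) = P(B | Aj) P(Aj) / P(B)
posteriors = [l * p / p_b for p, l in zip(priors, likelihoods)]
print(posteriors, sum(posteriors))   # the posteriors sum to 1
```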


Independence

• Two events are said to be statistically independent if

P(A, B) = P(A)P(B)

• When P(B) ≠ 0, this is equivalent to

P(A |B) = P(A)

In other words, knowing whether B occurs does not change the probability of A

• The events A1, A2, . . . , An are said to be independent if for every subset {Ai1, Ai2, . . . , Aik} of the events,

P(Ai1, Ai2, . . . , Aik) = ∏_{j=1}^k P(Aij)

• Note: P(A1, A2, . . . , An) = ∏_{j=1}^n P(Aj) is not sufficient for independence


Random Variables

• A random variable (r.v.) is a real-valued function X(ω) over a sample space Ω, i.e., X : Ω → R

[Figure: X maps each outcome ω ∈ Ω to a real number X(ω)]

• Notation:

We use upper case letters for random variables: X, Y, Z, Φ, Θ, . . .

We use lower case letters for values of random variables: X = x means that random variable X takes on the value x, i.e., X(ω) = x where ω is the outcome


Specifying a Random Variable

• Specifying a random variable means being able to determine the probability that X ∈ A for any Borel set A ⊂ R, in particular, for any interval (a, b ]

• To do so, consider the inverse image of A under X , i.e., {ω : X(ω) ∈ A}

[Figure: the set A ⊂ R and its inverse image {ω : X(ω) ∈ A} ⊂ Ω under X]

• Since X ∈ A iff ω ∈ {ω : X(ω) ∈ A},

P(X ∈ A) = P({ω : X(ω) ∈ A}) = P{ω : X(ω) ∈ A}

Shorthand: P(set description) = P{set description}


Cumulative Distribution Function (CDF)

• We need to be able to determine P{X ∈ A} for any Borel set A ⊂ R, i.e., any set generated by starting from intervals and taking countable unions, intersections, and complements

• Hence, it suffices to specify P{X ∈ (a, b ]} for all intervals. The probability of any other Borel set can be determined by the axioms of probability

• Equivalently, it suffices to specify its cumulative distribution function (cdf):

FX(x) = P{X ≤ x} = P{X ∈ (−∞, x ]} , x ∈ R

• Properties of cdf:

FX(x) ≥ 0

FX(x) is monotonically nondecreasing, i.e., if a > b then FX(a) ≥ FX(b)

[Figure: a typical cdf FX(x), rising from 0 to 1 as x increases]

Limits: lim_{x→+∞} FX(x) = 1 and lim_{x→−∞} FX(x) = 0

FX(x) is right continuous, i.e., FX(a+) = lim_{x→a+} FX(x) = FX(a)

P{X = a} = FX(a) − FX(a−), where FX(a−) = lim_{x→a−} FX(x)

For any Borel set A, P{X ∈ A} can be determined from FX(x)

• Notation: X ∼ FX(x) means that X has cdf FX(x)


Probability Mass Function (PMF)

• A random variable is said to be discrete if FX(x) consists only of steps over a countable set X

[Figure: staircase cdf of a discrete random variable]

• Hence, a discrete random variable can be completely specified by the probability mass function (pmf)

pX(x) = P{X = x} for every x ∈ X

Clearly pX(x) ≥ 0 and ∑_{x∈X} pX(x) = 1

• Notation: We use X ∼ pX(x) or simply X ∼ p(x) to mean that the discrete random variable X has pmf pX(x) or p(x)


• Famous discrete random variables:

Bernoulli: X ∼ Bern(p) for 0 ≤ p ≤ 1 has the pmf

pX(1) = p and pX(0) = 1 − p

Geometric: X ∼ Geom(p) for 0 ≤ p ≤ 1 has the pmf

pX(k) = p(1 − p)^{k−1} , k = 1, 2, 3, . . .

Binomial: X ∼ Binom(n, p) for integer n > 0 and 0 ≤ p ≤ 1 has the pmf

pX(k) = (n choose k) p^k (1 − p)^{n−k} , k = 0, 1, . . . , n

Poisson: X ∼ Poisson(λ) for λ > 0 has the pmf

pX(k) = (λ^k / k!) e^{−λ} , k = 0, 1, 2, . . .

Remark: Poisson is the limit of Binomial for np = λ as n → ∞, i.e., for every k = 0, 1, 2, . . ., the Binom(n, λ/n) pmf

pX(k) → (λ^k / k!) e^{−λ} as n → ∞
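• The remark can be checked numerically; the following sketch (Python standard library, with λ = 3 chosen arbitrarily) evaluates the Binom(n, λ/n) pmf at a fixed k for growing n:

```python
from math import comb, exp, factorial

lam, k = 3.0, 2   # rate lambda and the point k at which the pmfs are compared

def binom_pmf(n, p, k):
    # pmf of Binom(n, p) at k
    return comb(n, k) * p**k * (1 - p)**(n - k)

poisson = lam**k / factorial(k) * exp(-lam)   # Poisson(lambda) pmf at k
for n in (10, 100, 1000, 10000):
    print(n, binom_pmf(n, lam / n, k))        # approaches the Poisson value
print("Poisson limit:", poisson)
```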


Probability Density Function (PDF)

• A random variable is said to be continuous if its cdf is a continuous function

[Figure: a continuous cdf, rising from 0 to 1 with no jumps]

• If FX(x) is continuous and differentiable (except possibly over a countable set), then X can be completely specified by a probability density function (pdf) fX(x) such that

FX(x) = ∫_{−∞}^x fX(u) du

• If FX(x) is differentiable everywhere, then (by definition of derivative)

fX(x) = dFX(x)/dx = lim_{∆x→0} (FX(x + ∆x) − FX(x)) / ∆x = lim_{∆x→0} P{x < X ≤ x + ∆x} / ∆x


• Properties of pdf:

fX(x) ≥ 0

∫_{−∞}^∞ fX(x) dx = 1

For any event (Borel set) A ⊂ R,

P{X ∈ A} = ∫_{x∈A} fX(x) dx

In particular,

P{x1 < X ≤ x2} = ∫_{x1}^{x2} fX(x) dx

• Important note: fX(x) should not be interpreted as the probability that X = x. In fact, fX(x) is not a probability measure since it can be > 1

• Notation: X ∼ fX(x) means that X has pdf fX(x)


• Famous continuous random variables:

Uniform: X ∼ U[a, b ] where a < b has pdf

fX(x) = 1/(b − a) if a ≤ x ≤ b, 0 otherwise

Exponential: X ∼ Exp(λ) where λ > 0 has pdf

fX(x) = λe^{−λx} if x ≥ 0, 0 otherwise

Laplace: X ∼ Laplace(λ) where λ > 0 has pdf

fX(x) = (λ/2) e^{−λ|x|}

Gaussian: X ∼ N (µ, σ²) with parameters µ (the mean) and σ² (the variance, σ is the standard deviation) has pdf

fX(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)}


The cdf of the standard normal random variable N (0, 1) is

Φ(x) = ∫_{−∞}^x (1/√(2π)) e^{−u²/2} du

Define the function Q(x) = 1 − Φ(x) = P{X > x} for X ∼ N (0, 1)

[Figure: standard normal pdf; Q(x) is the area of the tail to the right of x]

The Q(·) function is used to compute P{X > a} for any Gaussian r.v. X :

Given Y ∼ N (µ, σ²), we represent it using the standard X ∼ N (0, 1) as

Y = σX + µ

Then

P{Y > y} = P{X > (y − µ)/σ} = Q((y − µ)/σ)

The complementary error function is erfc(x) = 2Q(√2 x)
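• These relations are easy to evaluate numerically. A minimal sketch, assuming scipy is available (scipy's norm.sf is the survival function 1 − Φ(x), i.e., the Q function):

```python
from scipy.stats import norm

def Q(x):
    # Q(x) = 1 - Phi(x); norm.sf is the standard normal survival function
    return norm.sf(x)

# P{Y > y} for Y ~ N(mu, sigma^2) by standardization: Q((y - mu) / sigma)
mu, sigma, y = 1.0, 2.0, 4.0
print(Q((y - mu) / sigma))              # via the Q function
print(norm.sf(y, loc=mu, scale=sigma))  # computed directly; same value
```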


Functions of a Random Variable

• Suppose we are given a r.v. X with known cdf FX(x) and a function y = g(x). What is the cdf of the random variable Y = g(X)?

• We use

FY (y) = P{Y ≤ y} = P{x : g(x) ≤ y}

[Figure: a function y = g(x) with the set {x : g(x) ≤ y} marked on the x-axis]


• Example: Quadratic function. Let X ∼ FX(x) and Y = X². We wish to find FY (y)

[Figure: parabola y = x²; for y ≥ 0 the set {x : x² ≤ y} is the interval from −√y to √y]

If y < 0, then clearly FY (y) = 0. Consider y ≥ 0,

FY (y) = P{−√y < X ≤ √y} = FX(√y) − FX(−√y)

If X is continuous with density fX(x), then

fY (y) = (1/(2√y)) (fX(+√y) + fX(−√y))


• Remark: In general, let X ∼ fX(x) and let Y = g(X) with g differentiable. Then

fY (y) = ∑_{i=1}^k fX(xi) / |g′(xi)| ,

where x1, x2, . . . , xk are the solutions of the equation y = g(x) and g′(xi) is the derivative of g evaluated at xi
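• A quick Monte Carlo sketch of this formula (numpy and scipy assumed) for Y = X² with X ∼ N (0, 1): the solutions of y = x² are x = ±√y and |g′(±√y)| = 2√y, which recovers the expression derived above:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
y = 1.5   # point at which the density of Y = X^2 is evaluated

# Density from the change-of-variables formula
f_y = (norm.pdf(np.sqrt(y)) + norm.pdf(-np.sqrt(y))) / (2 * np.sqrt(y))

# Monte Carlo estimate: fraction of Y samples in a narrow bin around y
samples = rng.standard_normal(1_000_000) ** 2
h = 0.01
f_y_mc = np.mean(np.abs(samples - y) < h / 2) / h

print(f_y, f_y_mc)   # the two values should agree closely
```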


• Example: Limiter. Let X ∼ Laplace(1), i.e., fX(x) = (1/2)e^{−|x|}, and let Y be defined by the function of X shown in the figure. Find the cdf of Y

[Figure: limiter y = g(x), with g(x) = −a for x ≤ −1, g(x) = ax for −1 < x < 1, and g(x) = +a for x ≥ 1]

To find the cdf of Y , we consider the following cases

y < −a: Here clearly FY (y) = 0

y = −a: Here

FY (−a) = FX(−1) = ∫_{−∞}^{−1} (1/2)e^{x} dx = (1/2)e^{−1}


−a < y < a: Here

FY (y) = P{Y ≤ y} = P{aX ≤ y} = P{X ≤ y/a} = FX(y/a) = (1/2)e^{−1} + ∫_{−1}^{y/a} (1/2)e^{−|x|} dx

y ≥ a: Here FY (y) = 1

Combining the results, the following is a sketch of the cdf of Y

[Figure: cdf FY (y): 0 for y < −a, continuous and increasing on (−a, a), and 1 for y ≥ a, with jumps of (1/2)e^{−1} at y = ±a]


Generation of Random Variables

• Generating a r.v. with a prescribed distribution is often needed for performing simulations involving random phenomena, e.g., noise or random arrivals

• First let X ∼ F (x) where the cdf F (x) is continuous and strictly increasing. Define Y = F (X), a real-valued random variable that is a function of X

What is the cdf of Y ?

Clearly, FY (y) = 0 for y < 0, and FY (y) = 1 for y > 1

For 0 ≤ y ≤ 1, note that by assumption F has an inverse F⁻¹, so

FY (y) = P{Y ≤ y} = P{F (X) ≤ y} = P{X ≤ F⁻¹(y)} = F (F⁻¹(y)) = y

Thus Y ∼ U[ 0, 1 ], i.e., Y is a uniformly distributed random variable

• Note: F (x) does not need to be invertible. If F (x) = a is constant over some interval, then the probability that X lies in this interval is zero. Without loss of generality, we can take F⁻¹(a) to be the leftmost point of the interval

• Conclusion: We can generate a U[ 0, 1 ] r.v. from any continuous r.v.
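• A small numerical sketch of this conclusion (numpy and scipy assumed): feed Exp(1) samples through their own cdf and check that the result looks uniform:

```python
import numpy as np
from scipy.stats import expon

rng = np.random.default_rng(0)
x = expon.rvs(size=100_000, random_state=rng)  # X ~ Exp(1): continuous, strictly increasing cdf
y = expon.cdf(x)                               # Y = F(X)

# If Y ~ U[0, 1], its mean is 1/2 and its variance is 1/12
print(y.mean(), y.var())                       # ~0.5 and ~0.0833
```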


• Now, let’s consider the more useful scenario where we are given X ∼ U[ 0, 1 ] (a random number generator) and wish to generate a random variable Y with prescribed cdf F (y), e.g., Gaussian or exponential

[Figure: a cdf x = F (y) and its inverse y = F⁻¹(x)]

• If F is continuous and strictly increasing, set Y = F⁻¹(X). To show Y ∼ F (y),

FY (y) = P{Y ≤ y} = P{F⁻¹(X) ≤ y} = P{X ≤ F (y)} = F (y) ,

since X ∼ U[ 0, 1 ] and 0 ≤ F (y) ≤ 1


• Example: To generate Y ∼ Exp(λ), set

Y = −(1/λ) ln(1 − X)

• Note: F does not need to be continuous for the above to work. For example, to generate Y ∼ Bern(p), we set

Y = 0 if X ≤ 1 − p, and Y = 1 otherwise

[Figure: cdf x = F (y) of Bern(p): a step of height 1 − p at y = 0 and a step up to 1 at y = 1]

• Conclusion: We can generate a r.v. with any desired distribution from a U[0, 1] r.v.
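• Both constructions above translate directly into code; a minimal sketch, assuming numpy is available:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(size=100_000)   # X ~ U[0, 1], the random number generator

# Exponential via the inverse cdf: Y = -(1/lambda) ln(1 - X)
lam = 2.0
y_exp = -np.log(1 - x) / lam
print(y_exp.mean())             # ~ 1/lambda = 0.5

# Bernoulli(p) via the step cdf: Y = 0 if X <= 1 - p, else 1
p = 0.3
y_bern = (x > 1 - p).astype(int)
print(y_bern.mean())            # ~ p
```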


Jointly Distributed Random Variables

• A pair of random variables defined over the same probability space are specified by their joint cdf

FX,Y (x, y) = P{X ≤ x, Y ≤ y} , x, y ∈ R

FX,Y (x, y) is the probability of the shaded region of R²

[Figure: the quadrant {(u, v) : u ≤ x, v ≤ y} below and to the left of the point (x, y)]


• Properties of the cdf:

FX,Y (x, y) ≥ 0

If x1 ≤ x2 and y1 ≤ y2 then FX,Y (x1, y1) ≤ FX,Y (x2, y2)

lim_{y→−∞} FX,Y (x, y) = 0 and lim_{x→−∞} FX,Y (x, y) = 0

lim_{y→∞} FX,Y (x, y) = FX(x) and lim_{x→∞} FX,Y (x, y) = FY (y)

FX(x) and FY (y) are the marginal cdfs of X and Y

lim_{x,y→∞} FX,Y (x, y) = 1

• X and Y are independent if for every x and y

FX,Y (x, y) = FX(x)FY (y)


Joint, Marginal, and Conditional PMFs

• Let X and Y be discrete random variables on the same probability space

• They are completely specified by their joint pmf:

pX,Y (x, y) = P{X = x, Y = y} , x ∈ X , y ∈ Y

By axioms of probability,

∑_{x∈X} ∑_{y∈Y} pX,Y (x, y) = 1

• To find pX(x), the marginal pmf of X , we use the law of total probability

pX(x) = ∑_{y∈Y} pX,Y (x, y) , x ∈ X

• The conditional pmf of X given Y = y is defined as

pX|Y (x|y) = pX,Y (x, y) / pY (y) , pY (y) ≠ 0, x ∈ X

• Chain rule: pX,Y (x, y) = pX(x)pY |X(y|x) = pY (y)pX|Y (x|y)


• Independence: X and Y are said to be independent if for every (x, y) ∈ X × Y ,

pX,Y (x, y) = pX(x)pY (y) ,

which is equivalent to pX|Y (x|y) = pX(x) for every x ∈ X and y ∈ Y such that pY (y) ≠ 0
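• To make these definitions concrete, here is a small numpy sketch over a hypothetical joint pmf (the table values are invented for illustration):

```python
import numpy as np

# Hypothetical joint pmf p_{X,Y}(x, y): rows are x in {0, 1}, columns are y in {0, 1, 2}
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
assert np.isclose(p_xy.sum(), 1.0)   # entries sum to 1, as the axioms require

# Marginals via the law of total probability
p_x = p_xy.sum(axis=1)   # p_X(x) = sum over y
p_y = p_xy.sum(axis=0)   # p_Y(y) = sum over x

# Conditional pmf p_{X|Y}(x|y) = p_{X,Y}(x, y) / p_Y(y)
p_x_given_y = p_xy / p_y             # each column is divided by p_Y(y)
print(p_x, p_y)
print(p_x_given_y.sum(axis=0))       # each column sums to 1
```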


Joint, Marginal, and Conditional PDF

• X and Y are jointly continuous random variables if their joint cdf is continuous in both x and y

In this case, we can define their joint pdf, provided that it exists, as the function fX,Y (x, y) such that

FX,Y (x, y) = ∫_{−∞}^x ∫_{−∞}^y fX,Y (u, v) dv du , x, y ∈ R

• If FX,Y (x, y) is differentiable in x and y, then

fX,Y (x, y) = ∂²F (x, y)/∂x∂y = lim_{∆x,∆y→0} P{x < X ≤ x + ∆x, y < Y ≤ y + ∆y} / (∆x∆y)

• Properties of fX,Y (x, y):

fX,Y (x, y) ≥ 0

∫_{−∞}^∞ ∫_{−∞}^∞ fX,Y (x, y) dx dy = 1


• The marginal pdf of X can be obtained from the joint pdf via the law of total probability:

fX(x) = ∫_{−∞}^∞ fX,Y (x, y) dy

• X and Y are independent iff fX,Y (x, y) = fX(x)fY (y) for every x, y

• Conditional cdf and pdf: Let X and Y be continuous random variables with joint pdf fX,Y (x, y). We wish to define FY |X(y |X = x) = P{Y ≤ y |X = x}

We cannot define the above conditional probability as

P{Y ≤ y, X = x} / P{X = x}

because both numerator and denominator are equal to zero. Instead, we define conditional probability for continuous random variables as a limit

FY |X(y|x) = lim_{∆x→0} P{Y ≤ y |x < X ≤ x + ∆x}

= lim_{∆x→0} P{Y ≤ y, x < X ≤ x + ∆x} / P{x < X ≤ x + ∆x}

= lim_{∆x→0} (∫_{−∞}^y fX,Y (x, u) du) ∆x / (fX(x) ∆x)

= ∫_{−∞}^y (fX,Y (x, u) / fX(x)) du


• We then define the conditional pdf in the usual way as

fY |X(y|x) = fX,Y (x, y) / fX(x) if fX(x) ≠ 0

• Thus

FY |X(y|x) = ∫_{−∞}^y fY |X(u|x) du ,

which shows that fY |X(y|x) is a pdf for Y given X = x, i.e.,

Y |X = x ∼ fY |X(y|x)

• Independence: X and Y are independent if fX,Y (x, y) = fX(x)fY (y) for every (x, y)


Mixed Random Variables

• Let Θ be a discrete random variable with pmf pΘ(θ)

• For each Θ = θ with pΘ(θ) ≠ 0, let Y be a continuous random variable, i.e., FY |Θ(y|θ) is continuous for all θ. We define fY |Θ(y|θ) in the usual way

• The conditional pmf of Θ given y can be defined as a limit

pΘ|Y (θ|y) = lim_{∆y→0} P{Θ = θ, y < Y ≤ y + ∆y} / P{y < Y ≤ y + ∆y}

= lim_{∆y→0} pΘ(θ) fY |Θ(y|θ) ∆y / (fY (y) ∆y) = fY |Θ(y|θ) pΘ(θ) / fY (y)


Bayes Rule for Random Variables

• Bayes rule for pmfs: Given pX(x) and pY |X(y|x), then

pX|Y (x|y) = (pY |X(y|x) / ∑_{x′∈X} pY |X(y|x′) pX(x′)) pX(x)

• Bayes rule for densities: Given fX(x) and fY |X(y|x), then

fX|Y (x|y) = (fY |X(y|x) / ∫_{−∞}^∞ fY |X(y|u) fX(u) du) fX(x)

• Bayes rule for mixed r.v.s: Given pΘ(θ) and fY |Θ(y|θ), then

pΘ|Y (θ|y) = (fY |Θ(y|θ) / ∑_{θ′} pΘ(θ′) fY |Θ(y|θ′)) pΘ(θ)

Conversely, given fY (y) and pΘ|Y (θ|y), then

fY |Θ(y|θ) = (pΘ|Y (θ|y) / ∫ pΘ|Y (θ|y′) fY (y′) dy′) fY (y)


• Example: Additive Gaussian Noise Channel

Consider the following communication channel:

[Figure: Θ → ⊕ → Y , with noise Z ∼ N (0, N) added at the summing node]

The signal transmitted is a binary random variable Θ:

Θ = +1 with probability p, −1 with probability 1 − p

The received signal, also called the observation, is Y = Θ + Z , where Θ and Z are independent

Given Y = y is received (observed), find pΘ|Y (θ|y), the a posteriori pmf of Θ


Solution: We use Bayes rule

pΘ|Y (θ|y) = (fY |Θ(y|θ) / ∑_{θ′} pΘ(θ′) fY |Θ(y|θ′)) pΘ(θ)

We are given pΘ(θ):

pΘ(+1) = p and pΘ(−1) = 1 − p

and fY |Θ(y|θ) = fZ(y − θ):

Y |Θ = +1 ∼ N (+1, N) and Y |Θ = −1 ∼ N (−1, N)

Therefore

pΘ|Y (1|y) = (p/√(2πN)) e^{−(y−1)²/(2N)} / [ (p/√(2πN)) e^{−(y−1)²/(2N)} + ((1 − p)/√(2πN)) e^{−(y+1)²/(2N)} ]

= p e^{y/N} / (p e^{y/N} + (1 − p) e^{−y/N}) for −∞ < y < ∞
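• The closed form just derived is one line of code; a sketch (numpy assumed, and the function name is ours):

```python
import numpy as np

def posterior_plus_one(y, p, N):
    # P{Theta = +1 | Y = y} = p e^{y/N} / (p e^{y/N} + (1 - p) e^{-y/N})
    num = p * np.exp(y / N)
    return num / (num + (1 - p) * np.exp(-y / N))

# The posterior equals the prior at y = 0 and moves toward 1 as y grows
print(posterior_plus_one(0.0, 0.5, 1.0))   # 0.5
print(posterior_plus_one(2.0, 0.5, 1.0))   # ~0.982
```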


Scalar Detection

• Consider the following general digital communication system

[Figure: Θ ∈ {θ0, θ1} → noisy channel fY |Θ(y|θ) → Y → decoder → Θ̂(Y ) ∈ {θ0, θ1}]

where the signal sent is

Θ = θ0 with probability p, θ1 with probability 1 − p

and the observation (received signal) is

Y |Θ = θ ∼ fY |Θ(y|θ) , θ ∈ {θ0, θ1}

• We wish to find the estimate Θ̂(Y ) (i.e., design the decoder) that minimizes the probability of error:

Pe = P{Θ̂ ≠ Θ}
= P{Θ = θ0, Θ̂ = θ1} + P{Θ = θ1, Θ̂ = θ0}
= P{Θ = θ0}P{Θ̂ = θ1 |Θ = θ0} + P{Θ = θ1}P{Θ̂ = θ0 |Θ = θ1}


• We define the maximum a posteriori probability (MAP) decoder as

Θ̂(y) = θ0 if pΘ|Y (θ0|y) > pΘ|Y (θ1|y), and Θ̂(y) = θ1 otherwise

• The MAP decoding rule minimizes Pe, since

Pe = 1 − P{Θ̂(Y ) = Θ} = 1 − ∫_{−∞}^∞ fY (y) P{Θ̂(y) = Θ |Y = y} dy

and the integral is maximized when we pick the largest P{Θ̂(y) = Θ |Y = y} for each y, which is precisely the MAP decoder

• If p = 1/2, i.e., equally likely signals, using Bayes rule the MAP decoder reduces to the maximum likelihood (ML) decoder

Θ̂(y) = θ0 if fY |Θ(y|θ0) > fY |Θ(y|θ1), and Θ̂(y) = θ1 otherwise


Additive Gaussian Noise Channel

• Consider the additive Gaussian noise channel with signal

Θ = +√P with probability 1/2, −√P with probability 1/2

noise Z ∼ N (0, N) (Θ and Z are independent), and output Y = Θ + Z

• The MAP decoder is

Θ̂(y) = +√P if P{Θ = +√P |Y = y} / P{Θ = −√P |Y = y} > 1, and Θ̂(y) = −√P otherwise

Since the two signals are equally likely, the MAP decoding rule reduces to the ML decoding rule

Θ̂(y) = +√P if fY |Θ(y |+√P ) / fY |Θ(y |−√P ) > 1, and Θ̂(y) = −√P otherwise


• Using the Gaussian pdf, the ML decoder reduces to the minimum distance decoder

Θ̂(y) = +√P if (y − √P )² < (y − (−√P ))², and Θ̂(y) = −√P otherwise

From the figure, this simplifies to

Θ̂(y) = +√P if y > 0, and Θ̂(y) = −√P if y < 0

Note: The decision when y = 0 is arbitrary

[Figure: the conditional densities f(y |−√P ) and f(y |+√P ), centered at −√P and +√P ; the decision threshold is at y = 0]


• Now to find the minimum probability of error, consider

Pe = P{Θ̂(Y ) ≠ Θ}

= P{Θ = √P }P{Θ̂(Y ) = −√P |Θ = √P } + P{Θ = −√P }P{Θ̂(Y ) = √P |Θ = −√P }

= (1/2)P{Y ≤ 0 |Θ = √P } + (1/2)P{Y > 0 |Θ = −√P }

= (1/2)P{Z ≤ −√P } + (1/2)P{Z > √P }

= Q(√(P/N)) = Q(√SNR)

The probability of error is a decreasing function of P/N , the signal-to-noise ratio (SNR)
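• The result is easy to verify by simulation. The following sketch (numpy and scipy assumed; the values of P and N are arbitrary) runs the minimum distance decoder over the channel and compares the empirical error rate with Q(√SNR):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
P_sig, N = 1.0, 0.25   # signal power and noise variance: SNR = P/N = 4
n = 1_000_000

theta = rng.choice([np.sqrt(P_sig), -np.sqrt(P_sig)], size=n)  # equally likely signals
y = theta + rng.normal(scale=np.sqrt(N), size=n)               # Y = Theta + Z

theta_hat = np.where(y > 0, np.sqrt(P_sig), -np.sqrt(P_sig))   # minimum distance decoder
pe_mc = np.mean(theta_hat != theta)        # empirical probability of error
pe_theory = norm.sf(np.sqrt(P_sig / N))    # Q(sqrt(SNR)) ~ 0.0228
print(pe_mc, pe_theory)
```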
