Risk and Risk Aversion — © Jonathan Ingersoll 1 version: June 25, 2019


Chapter 2 — Risk and Risk Aversion

The previous chapter discussed risk and introduced the notion of risk aversion. This chapter examines those concepts in more detail. In particular, it answers questions such as: When can we say that one prospect is riskier than another, or that one agent is more averse to risk than another? How is risk related to statistical measures like variance?

Risk aversion is important in finance because it determines how much of a given risk a person is willing to take on. Consider an investor who might wish to commit some funds to a risky project. Suppose that each dollar invested will return $\tilde x$ dollars. Ignoring time and any other prospects for investing, the optimal decision for an agent with wealth $w$ is to choose the amount $k$ to invest to maximize $\mathbb{E}[u(w + k(\tilde x - 1))]$. The marginal benefit of increasing investment gives the first-order condition

$$0 = \frac{\partial}{\partial k}\,\mathbb{E}\big[u(w + k(\tilde x - 1))\big] = \mathbb{E}\big[u'(w + k(\tilde x - 1))\,(\tilde x - 1)\big]. \qquad (1)$$

At $k = 0$, the marginal benefit is $u'(w)\,\mathbb{E}[\tilde x - 1]$, which is positive for all increasing utility functions provided the risk has a better than fair return, paying back on average more than one dollar for each dollar committed. Therefore, all risk-averse agents who prefer more to less should be willing to take on some amount of the risk. How big a position they take depends on their risk aversion. Someone who is risk neutral, with constant marginal utility, would willingly take an unlimited position, because the right-hand side of (1) remains positive at any chosen level of $k$. For a risk-averse agent, marginal utility declines as prospects improve, so there will be a finite optimum. Conversely, if $\mathbb{E}[\tilde x] < 1$, risk-averse and risk-neutral agents would always choose $k^* = 0$.¹

We would like to expand upon this with results like the following: an agent who is more risk averse will invest less in any project than another agent who is less risk averse; the more risk a project has, the smaller is the optimal position for any risk-averse agent; and the greater the risk, the higher must be the expected outcome to induce the same level of commitment. To verify these claims, we obviously must have precise definitions of riskier and more risk averse. To start, we will consider only single-argument, monotonic increasing, concave utility functions. Also, we will ignore time; all risks are resolved immediately or, equivalently, all amounts are stated in future value terms. Also ignored are any pre-risk withdrawals from wealth. Finally, it is assumed that expectations exist. A sufficient condition for this is either bounded utility or bounded outcomes. Neither is necessary, however, and many of the examples will violate both conditions. All results should be interpreted to have the additional provision “provided the expectations exist.”
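To make the first-order condition (1) concrete, here is a minimal Python sketch that solves it by bisection. It assumes constant relative risk aversion, $u'(c) = c^{-\gamma}$, a discrete payoff $\tilde x$ with at least one outcome below 1, and no short sales. The function names and the two-point payoff are illustrative assumptions, not part of the text.

```python
def foc(k, w, outcomes, probs, gamma):
    """Right-hand side of (1), E[u'(w + k(x - 1)) (x - 1)], for CRRA marginal
    utility u'(c) = c**(-gamma) (gamma = 1 is log utility)."""
    return sum(p * (w + k * (x - 1)) ** (-gamma) * (x - 1)
               for x, p in zip(outcomes, probs))


def optimal_investment(w, outcomes, probs, gamma, tol=1e-12):
    """Solve (1) for k* >= 0 by bisection (assumes some outcome is below 1)."""
    if foc(0.0, w, outcomes, probs, gamma) <= 0:
        return 0.0  # E[x] <= 1: no positive position is worthwhile
    lo = 0.0
    # Wealth w + k(x - 1) must stay positive, so k < w / (1 - min(x)).
    hi = w / (1 - min(outcomes)) * (1 - 1e-12)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid, w, outcomes, probs, gamma) > 0:
            lo = mid  # marginal benefit still positive: invest more
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    outcomes, probs = [0.5, 2.0], [0.5, 0.5]  # hypothetical payoff, E[x] = 1.25
    for gamma in (1.0, 2.0, 5.0):
        print(gamma, optimal_investment(1.0, outcomes, probs, gamma))
```

For this hypothetical payoff with $w = 1$, log utility gives $k^* = 0.5$ and $\gamma = 2$ gives $k^* = 3\sqrt{2} - 4 \approx 0.243$, so the more risk-averse agent takes the smaller position. Bisection is appropriate because the right-hand side of (1) is decreasing in $k$ for any concave $u$.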

Risk conveys one of two related notions. The first is the uncertainty of outcomes: the risky prospect has more dispersed outcomes. This concept is purely statistical. The other notion is that risk is something that risk-averse agents do not like. However, not everything that is disliked by a risk averter is risk. Because utility is increasing, decreasing any outcome will reduce expected utility, yet decreasing very good outcomes may well reduce dispersion. Our definition of risk must take this into consideration. The solution to this dilemma is the one already adopted in the previous chapter and common in practice: define a risk premium relative to the expected payoff. The basic economic justification for using the expectation is that it is the only criterion for a risk-neutral agent with linear utility, and linear functions provide the closure on the set of concave, or risk-averse, functions.

¹ If it is possible to select $k < 0$, that is, to short the investment, then all risk-neutral or risk-averse investors would wish to do that. This would not be possible in equilibrium where investments are in zero or positive supply.

So when comparing among prospects with the same expectation, we will define risk as that which is disliked by risk averters. Whether or not something is disliked, however, depends upon the particular utility function used in the evaluation. So risk is a property that is defined for a class of utility functions. Naturally we would like a definition which is as broadly applicable as possible.

Risk: The Basics

Rothschild and Stiglitz defined risk and developed its properties for the class of concave utilities. That is, they asked and answered the question: Under what conditions do all risk-averse agents weakly prefer $\tilde x$ to $\tilde y$; i.e., when is it true that $\mathbb{E}[u(\tilde x)] \ge \mathbb{E}[u(\tilde y)]$ for all concave $u$? Asking this question for all concave utilities rather than for all increasing, concave utilities automatically equates the expectations, because both $u(x) = x$ and $u(x) = -x$ are weakly concave functions, and the first always prefers a higher mean while the second always prefers a lower mean. This is where we will begin our examination, though we will then look at increasing, concave utilities.

To reiterate, $\tilde x$ is said to be less risky than $\tilde y$ if

$$\mathbb{E}[u(\tilde x)] \ge \mathbb{E}[u(\tilde y)] \qquad (2)$$

for all concave (but not necessarily increasing) $u$. The random variable $\tilde x$ is strictly less risky than $\tilde y$ if the inequality in (2) is strict for all strictly concave utility functions.

Unlike the preference relation introduced in the previous chapter, this riskiness ordering is not complete. It is possible to find outcomes $\tilde x$ and $\tilde y$ and utility functions $u_1$ and $u_2$ such that $\mathbb{E}[u_1(\tilde x)] > \mathbb{E}[u_1(\tilde y)]$ while $\mathbb{E}[u_2(\tilde x)] < \mathbb{E}[u_2(\tilde y)]$, so neither may be said to be riskier, yet they are not equivalent either. For example, consider the random variables $\tilde x$ and $\tilde y$ with

$$\Pr[\tilde x = 0] = \tfrac14,\quad \Pr[\tilde x = 4] = \tfrac34,\qquad \Pr[\tilde y = 1] = \tfrac34,\quad \Pr[\tilde y = 9] = \tfrac14. \qquad (3)$$

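Both variables have an expected value of 3. For the utility function $u(x) = x^{1/2}$, the expected utilities of $\tilde x$ and $\tilde y$ are both 1.5. However, for the utility function $u(w) = w^\alpha/\alpha$ with $0 < \alpha < 1$, $\mathbb{E}[u(\tilde x)] - \mathbb{E}[u(\tilde y)] = (3 \cdot 4^\alpha - 3 - 9^\alpha)/(4\alpha)$, so $\tilde x$ is strictly preferred when $\tfrac12 < \alpha < 1$ and $\tilde y$ is strictly preferred for $\alpha < \tfrac12$. In fact, as $\alpha \to 0$ this difference diverges to $-\infty$, and for $\alpha \le 0$ (with $\alpha = 0$ read as log utility), $\mathbb{E}[u(\tilde x)] = -\infty$ because $\tilde x = 0$ with positive probability. So these two prospects cannot be ranked on riskiness in the class of all concave utility functions.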
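The numbers in the example of (3) can be checked directly. The sketch below uses the power utility $u(w) = w^\alpha/\alpha$ from the text and evaluates the preference gap for a few illustrative values of $\alpha$ in $(0, 1)$. The helper names are hypothetical.

```python
# Two-point prospects from equation (3), as (outcome, probability) pairs.
X = [(0.0, 0.25), (4.0, 0.75)]
Y = [(1.0, 0.75), (9.0, 0.25)]


def expectation(prospect, f=lambda v: v):
    """E[f(z)] for a discrete prospect given as (outcome, probability) pairs."""
    return sum(p * f(v) for v, p in prospect)


def power_utility(alpha):
    """u(w) = w**alpha / alpha, increasing and concave for 0 < alpha < 1."""
    return lambda w: w ** alpha / alpha


def preference_gap(alpha):
    """E[u(x)] - E[u(y)]; positive means x is preferred."""
    u = power_utility(alpha)
    return expectation(X, u) - expectation(Y, u)


if __name__ == "__main__":
    print(expectation(X), expectation(Y))  # both means are 3
    for alpha in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(alpha, preference_gap(alpha))
```

The gap is zero at $\alpha = \tfrac12$, positive above it, and negative below it, matching the text.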

Riskiness is a complete ordering only when the class of utilities or permitted random variables or both are restricted. The set of quadratic utility functions composes such a restricted class. For $u(x) = x - bx^2$ with $b > 0$, expected utility is $\mathbb{E}[u(\tilde x)] = \bar x - b\bar x^2 - b\,\mathrm{var}[\tilde x]$, where $\bar x \equiv \mathbb{E}[\tilde x]$. So for two random outcomes with the same expectation, the one with the smaller variance is preferred under quadratic utility. This ordering is obviously complete because risk can be measured by the variance, a single real number, and the greater-than ordering is complete over the real numbers.

If the random variables are restricted to be normally distributed, then variance again provides a complete ranking of riskiness. For a normally distributed variable with mean $\mu$ and standard deviation $\sigma$, expected utility is

$$\mathbb{E}[u(\tilde x)] = \int_{-\infty}^{\infty} u(\mu + \sigma e)\,\phi(e)\,de = \int_0^{\infty} \big[u(\mu + \sigma e) + u(\mu - \sigma e)\big]\,\phi(e)\,de, \qquad (4)$$

where $e$ is a standard normal variable with mean 0 and variance 1 and $\phi$ is its density. Increasing the standard deviation (and hence the variance) decreases expected utility:

$$\frac{\partial\,\mathbb{E}[u(\tilde x)]}{\partial \sigma} = \int_0^{\infty} \big[u'(\mu + \sigma e) - u'(\mu - \sigma e)\big]\,e\,\phi(e)\,de \le 0. \qquad (5)$$
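Equations (4) and (5) can be checked numerically. This sketch assumes exponential utility $u(x) = -e^{-ax}$, chosen only because its normal expected utility has the closed form $-e^{-a\mu + a^2\sigma^2/2}$. It evaluates the folded integral in (4) with a plain midpoint rule truncated at $e = 10$. Both the utility and the quadrature are assumptions of the sketch, not of the text.

```python
import math


def std_normal_pdf(e):
    """Standard normal density phi(e)."""
    return math.exp(-0.5 * e * e) / math.sqrt(2.0 * math.pi)


def expected_utility_normal(u, mu, sigma, upper=10.0, n=40000):
    """Folded integral in (4): int_0^inf [u(mu + sigma e) + u(mu - sigma e)] phi(e) de,
    truncated at e = upper and evaluated with the midpoint rule."""
    h = upper / n
    total = 0.0
    for i in range(n):
        e = (i + 0.5) * h
        total += (u(mu + sigma * e) + u(mu - sigma * e)) * std_normal_pdf(e)
    return total * h


def exp_utility(a):
    """Exponential utility u(x) = -exp(-a x), concave for a > 0."""
    return lambda x: -math.exp(-a * x)


if __name__ == "__main__":
    a, mu = 1.0, 1.0
    for sigma in (0.5, 1.0, 1.5):
        numeric = expected_utility_normal(exp_utility(a), mu, sigma)
        exact = -math.exp(-a * mu + 0.5 * a * a * sigma * sigma)
        print(sigma, numeric, exact)
```

The numerical values track the closed form, and expected utility falls as $\sigma$ rises from 0.5 to 1.5, as (5) requires.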

Because marginal utility is weakly decreasing, the final integrand is always negative or zero, so an increase in variance always weakly reduces expected utility.² Variance is a valid measure of riskiness in many models used in Finance, but even when there is a complete ordering for riskiness based on some statistic, variance may not play that role. Consider the class of utility functions

$$u(x; c) = x - c\,|x - x_0| \quad \text{for } 0 < c < 1. \qquad (6)$$

These utility functions are piecewise linear with a kink at $x = x_0$, where the slope decreases from $1 + c$ to $1 - c$. For this class, the expected utility of any gamble is

$$\mathbb{E}[u(\tilde x; c)] = \mathbb{E}[\tilde x] - c\,\mathbb{E}\big[|\tilde x - x_0|\big]. \qquad (7)$$

The expected utility for gambles with the same expectation is smaller whenever the expected absolute deviation of $\tilde x$ from $x_0$ is larger. This statistic provides a complete ordering but is not based on variance. Within this class, one random variable can be riskier than another even if it has a smaller variance.
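A small exact example illustrates the last point about (7). The two gambles below are hypothetical, with $x_0 = 0$ and $c = \tfrac12$. Both have mean 1, and gamble B has the smaller variance (4/9 versus 1). Its larger expected absolute deviation (6/5 versus 1), however, gives it the lower expected utility within this class (2/5 versus 1/2).

```python
from fractions import Fraction as Fr

# Hypothetical gambles with the same mean of 1, stored as {outcome: probability}.
A = {Fr(0): Fr(1, 2), Fr(2): Fr(1, 2)}
B = {Fr(-1): Fr(1, 10), Fr(11, 9): Fr(9, 10)}


def mean(d):
    return sum(v * p for v, p in d.items())


def variance(d):
    m = mean(d)
    return sum((v - m) ** 2 * p for v, p in d.items())


def abs_dev(d, x0):
    """E[|x - x0|], the statistic that ranks equal-mean gambles under (7)."""
    return sum(abs(v - x0) * p for v, p in d.items())


def kinked_eu(d, c, x0):
    """Expected utility (7) for u(x; c) = x - c|x - x0|."""
    return mean(d) - c * abs_dev(d, x0)


if __name__ == "__main__":
    for name, d in (("A", A), ("B", B)):
        print(name, mean(d), variance(d), abs_dev(d, 0), kinked_eu(d, Fr(1, 2), 0))
```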

Similar examples can be constructed for other measures of central tendency. While no single parameter like variance is sufficient to determine the risk of a random outcome for all concave utilities, it is true that a random variable that is riskier must have a variance at least as large. This is obvious because variance does measure riskiness for some members of the class, namely quadratic utilities.

While variance is not the correct measure of riskiness in all cases, the idea that increased dispersion creates more risk is valid when dispersion is defined properly. For $\tilde y = \tilde x + \tilde\varepsilon$ with $\mathbb{E}[\tilde\varepsilon] = 0$, it is natural to think of $\tilde\varepsilon$ as the added risk. Some obvious ways that might describe additional risk are (i) $\tilde\varepsilon$ is uncorrelated with $\tilde x$,³ (ii) $\tilde\varepsilon$ is a fair game with respect to $\tilde x$, that is, $\mathbb{E}[\tilde\varepsilon \mid \tilde x] = 0$, or (iii) $\tilde\varepsilon$ is independent of $\tilde x$. This list forms a natural hierarchy, as (iii) ⇒ (ii) ⇒ (i).

A lack of correlation between $\tilde x$ and $\tilde\varepsilon$ means $\mathrm{var}[\tilde x + \tilde\varepsilon] \ge \mathrm{var}[\tilde x]$, but that is insufficient to guarantee an increase in risk, as the following example illustrates:

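$$\begin{array}{cccc}
\Pr & \tilde x & \tilde\varepsilon & \tilde x + \tilde\varepsilon \\ \hline
\tfrac12 & 10 & -5 & 5 \\
\tfrac13 & 20 & 10 & 30 \\
\tfrac16 & 50 & -5 & 45
\end{array}
\qquad
\begin{aligned}
\mathbb{E}[\tilde\varepsilon] &= \tfrac12(-5) + \tfrac13(10) + \tfrac16(-5) = 0 \\
\mathbb{E}[\tilde x\,\tilde\varepsilon] &= \tfrac12 \cdot 10 \cdot (-5) + \tfrac13 \cdot 20 \cdot 10 + \tfrac16 \cdot 50 \cdot (-5) = 0
\end{aligned}
\qquad (8)$$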

The expectation of $\tilde\varepsilon$ is zero, and it is uncorrelated with $\tilde x$. Nevertheless, $\tilde y \equiv \tilde x + \tilde\varepsilon$ is not riskier than $\tilde x$ because it is preferred by the concave utility function

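$$u(x) = \begin{cases} x, & x \le 45 \\ 45 + 0.4(x - 45), & x \ge 45 \end{cases} \quad\Rightarrow\quad \mathbb{E}[u(\tilde x)] = 19.5, \quad \mathbb{E}[u(\tilde y)] = 20. \qquad (9)$$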
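The arithmetic behind (8) and (9) can be reproduced exactly with rational numbers. This Python sketch, with hypothetical variable names, confirms that $\tilde\varepsilon$ has zero mean and zero covariance with $\tilde x$. It also shows that the variance rises from 200 to 250 and that the kinked concave utility of (9) still prefers $\tilde y$.

```python
from fractions import Fraction as Fr

# Rows of (8) as (probability, x, epsilon).
ROWS = [(Fr(1, 2), 10, -5), (Fr(1, 3), 20, 10), (Fr(1, 6), 50, -5)]


def u(w):
    """Concave utility in (9): slope 1 up to 45, slope 0.4 above it."""
    return w if w <= 45 else 45 + Fr(2, 5) * (w - 45)


E_eps = sum(p * e for p, _, e in ROWS)
E_x = sum(p * x for p, x, _ in ROWS)
cov_x_eps = sum(p * x * e for p, x, e in ROWS) - E_x * E_eps
var_x = sum(p * (x - E_x) ** 2 for p, x, _ in ROWS)
var_y = sum(p * (x + e - E_x) ** 2 for p, x, e in ROWS)  # E[y] = E[x] since E[eps] = 0
EU_x = sum(p * u(x) for p, x, _ in ROWS)
EU_y = sum(p * u(x + e) for p, x, e in ROWS)

if __name__ == "__main__":
    print(E_eps, cov_x_eps, var_x, var_y, EU_x, EU_y)
```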

The fair game condition (ii) is sufficient for increased riskiness. Its sufficiency can be demonstrated by computing the expected utility of $\tilde y$ in two steps: first take the expectation conditional on $\tilde x$, and then apply Jensen's inequality to the concave utility:

$$\mathbb{E}[u(\tilde y)] \equiv \mathbb{E}[u(\tilde x + \tilde\varepsilon)] = \mathbb{E}\big[\mathbb{E}[u(\tilde x + \tilde\varepsilon) \mid \tilde x]\big] \le \mathbb{E}\big[u(\mathbb{E}[\tilde x + \tilde\varepsilon \mid \tilde x])\big] = \mathbb{E}[u(\tilde x)]. \qquad (10)$$

² The normal is not the only distribution for which this is true. From (5), it is clear it will be true for any symmetric distribution which can be standardized like the normal.

³ We certainly do not want $\tilde x$ and $\tilde\varepsilon$ to be negatively correlated, as that could reduce the variance and, therefore, could not increase risk. Positive correlation might be fine, but that would not provide a minimal description.

This means that independence (iii) is also sufficient, as independence guarantees (ii). While sufficient, the fair game condition is not necessary as stated. To illustrate, consider the random variable $\tilde x$ that takes on the values $\pm 1$ with equal probability. The random variable $\tilde y \equiv k\tilde x$ is clearly riskier for any $k > 1$ because

$$\mathbb{E}[u(k\tilde x)] = \tfrac12\big[u(-k) + u(k)\big] \le \tfrac12\big[u(-1) + u(1)\big] = \mathbb{E}[u(\tilde x)] \qquad (11)$$

for every concave $u$. However, the difference between the variables is $\tilde\varepsilon = \tilde y - \tilde x = (k - 1)\tilde x$, which clearly is not mean-independent of, or even uncorrelated with, $\tilde x$. The same is true for any other symmetric mean-zero random variable. What is both necessary and sufficient is that $\tilde y$ has the same distribution as $\tilde x + \tilde\varepsilon$ with $\mathbb{E}[\tilde\varepsilon \mid \tilde x] = 0$. This is commonly written as $\tilde y \overset{d}{=} \tilde x + \tilde\varepsilon$. It is clearly sufficient because (10) applies equally well. There need be no relation between the random variables, only between their distributions. So $\tilde y$ need only be distributed like, but not necessarily be equal to, $\tilde x + \tilde\varepsilon$.⁴

As we know that $k\tilde x$ is riskier than $\tilde x$, it must be possible to find a representation in the $\tilde x + \tilde\varepsilon$ form. Define

$$\tilde\varepsilon = \begin{cases} \pm(k - 1) & \text{with probability } \dfrac{k + 1}{2k} \\[6pt] \mp(k + 1) & \text{with probability } \dfrac{k - 1}{2k} \end{cases} \qquad \text{when } \tilde x = \pm 1. \qquad (12)$$
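Construction (12) can be verified mechanically. This sketch, with hypothetical function names, builds the conditional distribution of $\tilde\varepsilon$ for a given $k > 1$ using exact rationals. It confirms $\mathbb{E}[\tilde\varepsilon \mid \tilde x] = 0$ for both values of $\tilde x$ and tabulates the distribution of $\tilde x + \tilde\varepsilon$.

```python
from fractions import Fraction as Fr


def eps_given_x(k, x):
    """Conditional distribution of epsilon in (12) given x = +1 or -1,
    as a list of (value, probability) pairs; requires k > 1."""
    k = Fr(k)
    return [(x * (k - 1), (k + 1) / (2 * k)),
            (-x * (k + 1), (k - 1) / (2 * k))]


def check_construction(k):
    """Return (conditional means of epsilon, distribution of x + epsilon)."""
    cond_means, dist = {}, {}
    for x in (1, -1):  # x = +1 or -1, each with probability 1/2
        pairs = eps_given_x(k, x)
        cond_means[x] = sum(e * p for e, p in pairs)
        for e, p in pairs:
            dist[x + e] = dist.get(x + e, 0) + Fr(1, 2) * p
    return cond_means, dist


if __name__ == "__main__":
    print(check_construction(3))
```

For every $k > 1$ tried, the conditional means are zero and $\tilde x + \tilde\varepsilon$ equals $\pm k$ with probability one half each.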

In this construction, $\tilde x + \tilde\varepsilon = \pm k$ with equal probability, so it has the same distribution as $k\tilde x$ (and, therefore, the same distribution as $\tilde y$), and $\mathbb{E}[\tilde\varepsilon \mid \tilde x] = 0$, so $k\tilde x$ and $\tilde y$ are riskier than $\tilde x$.

The Rothschild-Stiglitz Theorem on Risk

Rothschild and Stiglitz formalized the discussion in the previous section to provide a general definition of increased riskiness. They showed that the fair game construction is both necessary and sufficient for one random variable to be less risky than another, that is, to be preferred under all concave utility functions. They also showed that a third equivalent statement is that the riskier variable has greater weight in the tails of its distribution.

Theorem 2.1: More Risky (Rothschild-Stiglitz).⁵ Let $\tilde x$ and $\tilde y$ be two random variables with the same expectation and cumulative distribution functions $F(x)$ and $G(y)$. Their combined domain is $[a, b] \equiv [\min(a_x, a_y), \max(b_x, b_y)]$. The following three definitions of “$\tilde x$ is weakly less risky than $\tilde y$” are equivalent.⁶

i) $\tilde x$ is preferred to $\tilde y$ by all risk averters: $\mathbb{E}[u(\tilde x)] \ge \mathbb{E}[u(\tilde y)]$ for all $u$ with $u'' \le 0$.

ii) $G$ has more weight in its tails than $F$: $T(t) \equiv \int_a^t [G(s) - F(s)]\,ds \ge 0$ for all $t$, and $T(b) = 0$.

iii) $\tilde y$ is distributed like $\tilde x$ plus noise: $\tilde y \overset{d}{=} \tilde x + \tilde\varepsilon$ with $\mathbb{E}[\tilde\varepsilon \mid \tilde x] = 0$.

⁴ Another way to state this result is $\tilde y = \tilde x' + \tilde\varepsilon$ with $\mathbb{E}[\tilde\varepsilon \mid \tilde x'] = 0$, where $\tilde x'$ has the same distribution as $\tilde x$ but need bear no other relation to it. With this restriction, some $\tilde x' + \tilde\varepsilon$ that equals $\tilde y$ can always be found.

⁵ Rothschild-Stiglitz riskiness is also called concave dominance. Similarly, first-order stochastic dominance is called monotone dominance, and second-order stochastic dominance is called monotone-concave dominance. Stochastic dominance is introduced later.

⁶ As noted in Rothschild and Stiglitz (1972), (ii) is not equivalent to the other conditions when the distributions are unbounded below. Ross (1971) provides a sufficient condition for the integral condition to be equivalent, but, unfortunately, it has no simple interpretation. The same is true in Theorem 2.5 below.

Proof: (i) ⇒ (ii): Consider the subset of increasing, concave functions $u_z(x) \equiv \min(x, z)$. The derivative of $u_z$ is 1 for $x < z$ and 0 for $x > z$. Integrating by parts,

$$\begin{aligned}
0 &\le \mathbb{E}[u_z(\tilde x) - u_z(\tilde y)] \equiv \int_a^b u_z(t)\,[dF(t) - dG(t)] \\
&= u_z(t)\,[F(t) - G(t)]\Big|_a^b - \int_a^b u_z'(t)\,[F(t) - G(t)]\,dt \\
&= 0 - \int_a^z [F(t) - G(t)]\,dt \equiv T(z).
\end{aligned} \qquad (13)$$

As this holds for all utility functions indexed by $z$, $T(z)$ must be nonnegative for all $z$. In addition, $T(b) = \int_a^b [G(t) - F(t)]\,dt = \mathbb{E}[\tilde x] - \mathbb{E}[\tilde y] = 0$ because the means are equal.

(ii) ⇒ (iii):⁷ To start, define a simple mean-preserving spread of $\tilde x$ as a random variable that moves all the probability mass in some interval $(x', x'')$ to the endpoints $x'$ and $x''$ in a way that preserves the mean. This definition is valid for both continuous and discrete distributions, and the construction is always possible for any interval in which there is positive probability mass. Moving all the probability mass to $x'$ reduces the mean, while moving all of it to $x''$ increases it, so some mixture preserves the mean.
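For discrete distributions, a simple mean-preserving spread is easy to compute. Condition (ii) of Theorem 2.1 can also be checked using the identity $\int_{-\infty}^t F(s)\,ds = \mathbb{E}[(t - \tilde x)^+]$. The sketch below assumes distributions stored as dictionaries from outcomes to exact probabilities. `simple_mps` and `tail_gap` are hypothetical names.

```python
from fractions import Fraction as Fr


def simple_mps(dist, lo, hi):
    """Simple mean-preserving spread of a discrete distribution {outcome: prob}:
    all mass strictly inside (lo, hi) moves to lo and hi, split to keep the mean."""
    inside = {v: p for v, p in dist.items() if lo < v < hi}
    spread = {v: p for v, p in dist.items() if not lo < v < hi}
    mass = sum(inside.values())
    if mass == 0:
        return spread  # nothing to move
    m = sum(v * p for v, p in inside.items()) / mass  # conditional mean on (lo, hi)
    spread[lo] = spread.get(lo, 0) + mass * (hi - m) / (hi - lo)
    spread[hi] = spread.get(hi, 0) + mass * (m - lo) / (hi - lo)
    return spread


def tail_gap(f, g, t):
    """T(t) in condition (ii) for discrete F (of x) and G (of y), using
    integral_a^t [G(s) - F(s)] ds = E[(t - y)^+] - E[(t - x)^+]."""
    def partial_integral(d):
        return sum(p * max(t - v, 0) for v, p in d.items())
    return partial_integral(g) - partial_integral(f)


# Hypothetical example: x uniform on {1, 2, 3, 4}, spread over the interval (1, 4).
X = {1: Fr(1, 4), 2: Fr(1, 4), 3: Fr(1, 4), 4: Fr(1, 4)}
Y = simple_mps(X, 1, 4)

if __name__ == "__main__":
    print(Y)
    print([tail_gap(X, Y, t) for t in sorted(set(X) | set(Y))])
```

For discrete distributions, $T$ is piecewise linear with kinks only at support points, so checking those points is enough.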

Define $F_I(z) \equiv \int_a^z F(x)\,dx$, and similarly $G_I(z)$. By assumption (ii), $G_I(a) = F_I(a) = 0$, $G_I(b) = F_I(b) = b - \mu$, where $\mu$ is the common mean, and $G_I(z) \ge F_I(z)$. Assume that the latter inequality is strict for $z \in (a, b)$. If it is not, the following construction can be applied separately to each interval in which the strict inequality does hold. Because $G_I$ and $F_I$ are integrals of increasing, nonnegative functions, they are both increasing, convex functions, as shown in Figure 2.1.

Now pick any point $z_0$ on $G_I$ and consider the tangent line $L$ to $G_I$ at $z_0$, extended until it intersects $F_I$ at points $z'$ and $z''$.⁸ These intersections must occur in the interval $[a, b]$ because $G_I = F_I$ at those two endpoints. Define $H_1(z) = F_I(z)$ for $z \le z'$ and $z \ge z''$, and $H_1(z) = L(z)$ for $z' < z < z''$. $H_1$ is the integral of some cumulative distribution function because it is increasing and convex. Furthermore, as its derivative is constant between $z'$ and $z''$, the cumulative distribution to which it belongs is also constant over that interval. This means that there is no probability mass within the interval $(z', z'')$ for the distribution $H_1$. In other words, the probability mass has been spread to the points $z'$ and $z''$. The kinks in $H_1$ at those points mark the atoms of added probability. Furthermore, writing $\tilde z_1$ for a random variable with this distribution and, with a slight abuse of notation, $H_1$ in (14) for its cumulative distribution function,

$$\mathbb{E}[\tilde x] - \mathbb{E}[\tilde z_1] = \int_a^b t\,[dF(t) - dH_1(t)] = t\,[F(t) - H_1(t)]\Big|_a^b - \int_a^b [F(t) - H_1(t)]\,dt = 0. \qquad (14)$$

The first term is zero because $F(a) = H_1(a) = 0$ and $F(b) = H_1(b) = 1$. The second term is zero because the integrals of $F$ and $H_1$ are equal at $a$ and $b$ by construction. So the underlying $H_1$ distribution has the same mean as the $F$ distribution. Thus, $H_1$ is a simple mean-preserving spread of $F$.

Figure 2.1: A Simple Mean-Preserving Spread
This figure illustrates a simple mean-preserving spread. These are the building blocks of the proof of Rothschild-Stiglitz riskiness.

⁷ The (ii) ⇒ (iii) portion of the original Rothschild-Stiglitz proof has a minor error in a supporting lemma, which was subsequently corrected. See Leshno, Levy, and Spector (1995) and Deb and Seo (2011). This proof outlines a substantially different construction, using an alternate notion of mean-preserving spread adopted by Machina and Pratt (1997).

⁸ For a discrete distribution or a distribution with atoms, there will be kinks in $G_I$. In this case, a “tangent” at a kink is any line that passes through the kink and otherwise remains below $G_I$.

$H_2$ can be constructed by picking a point between A and $z_0$ and a point between $z_0$ and B, then running their tangent lines until they intersect $H_1$. $H_2$ then represents a probability distribution that differs from $H_1$ by two simple mean-preserving spreads. Continuing in this fashion, $H_n \to G_I$ pointwise. So in the limit, the distribution $G$ is the same as the distribution $F$ plus a number of simple mean-preserving spreads. If $G$ and $F$ are discrete distributions, then the match can be achieved with a finite number of simple mean-preserving spreads.

(iii) ⇒ (i): This was proved in (10) for $\tilde y = \tilde x + \tilde\varepsilon$. As discussed there, it is irrelevant whether $\tilde y$ is equal to $\tilde x + \tilde\varepsilon$ or just distributed like that.

It was noted previously, but is worth mentioning again, that the “riskier” relation is not a complete ordering. Rather, it is a partial ordering.⁹ There are many pairs of distributions for which neither is riskier than the other. Furthermore, this does not mean that they are equally risky; their risks simply cannot be compared. Indeed, it is never correct to assert that two distributions are equally risky merely because neither is riskier than the other: two random variables can be equally risky only if they have the same distribution.

Theorem 2.2: Equally Risky Variables. Two random variables are equally risky, with each being weakly less risky than the other, if and only if they have identical probability distributions.

Proof: The sufficiency of identical distributions is obvious. The necessity follows directly from property (ii). If each of the two random variables is weakly less risky than the other, then for all $t$, $\int_a^t [G(s) - F(s)]\,ds \ge 0$ and $\int_a^t [F(s) - G(s)]\,ds \ge 0$. That can be true only if each integral is zero for all $t$, and this can be true only if $F(s) = G(s)$ everywhere.

Riskiness comparisons are only concerned with the marginal distributions. The joint distribution of the two (or more) random variables is irrelevant. In later chapters, the joint distributions will be of most interest. This is particularly true in portfolio analysis, where covariances and other relations are very important.

In the previous chapter, risk aversion was defined as a preference for the expected value of a random variable over the random variable itself. The comparison was made only between a risky prospect and a second with the same expectation that was free of risk. The Rothschild-Stiglitz definition of risk permits a stronger definition of risk aversion, namely the preference for the less risky of two alternatives with the same mean even if both are risky. This is obviously a stronger definition, as it includes all cases from the previous one. Nevertheless, Theorem 2.1 shows that these two definitions of risk aversion are identical under Expected Utility Theory. From condition (i), if a utility function is concave, and therefore risk averse in the Arrow-Pratt sense, it is also risk averse in the Rothschild-Stiglitz sense of preferring the less risky option.

⁹ The precise term for the “weakly riskier” relation is a partial ordering. A partial ordering is one that is reflexive, antisymmetric, and transitive. A relation $\succeq$ on a set $S$ is (i) reflexive if $x \succeq x$ for all $x \in S$, (ii) antisymmetric if $x \succeq y \Rightarrow y \not\succeq x$ for all distinct $x, y \in S$, and (iii) transitive if $x \succeq y$ and $y \succeq z \Rightarrow x \succeq z$. The “weakly riskier” ordering is a partial ordering on probability distributions but not on random variables: two random variables with the same distribution would each be weakly less risky than the other, violating antisymmetry. “Strictly riskier” is a strict partial ordering, which is irreflexive, antisymmetric, and transitive.

In other theories, there are alternative, narrower definitions of riskier for which these two concepts of risk aversion are not identical. That is, an agent with a concave utility function may not always pick the less risky of two gambles with the same expectation. One such notion of risk is comonotonic risk. This is the basis of Rank Dependent Utility, which is described briefly below and fully developed in Chapter 11.

Determining whether one random variable is riskier than another is not necessarily easy. Condition (i) is essentially the definition, and it is obviously impractical to compute the expected utility for all possible concave utilities. Condition (iii) can be used to show that one variable is riskier than another, but not finding such a relation does not necessarily mean there is none; it might only show a lack of imagination on our part. Condition (ii) can be used constructively; however, it might not be simple to show that the inequality holds for all $t$. Nevertheless, there are easily checked necessary conditions that must hold when one variable is riskier. As already mentioned, $\tilde y$ can be riskier than $\tilde x$ only if its variance is at least as large. Another necessary condition is that its minimum outcome cannot be higher than the minimum outcome for $\tilde x$. This is obvious from condition (ii). It also follows more intuitively from condition (i). Suppose the lower extents of the domains are ordered $a_x < a_y$. Then the utility function $u(z) \equiv -(z - a_x)^{-1}$ assigns $-\infty$ utility to $\tilde x$ but finite utility to $\tilde y$. As that utility function is concave, $\tilde x$ is not weakly preferred by all concave utilities, so $\tilde y$ cannot be riskier. Limiting arguments show that if $\tilde x$ and $\tilde y$ have the same lowest outcome, then only the one with the larger probability of this outcome can be riskier. If the lowest outcomes of $\tilde x$ and $\tilde y$ are equal and have the same probabilities, then $\tilde y$ is riskier only if its second-lowest outcome is smaller than that of $\tilde x$. These two restrictions on the low outcomes and their probabilities can be applied recursively until a difference is found.

The riskier random variable must also have a maximum outcome at least as high. This follows from symmetry. Recall that Rothschild-Stiglitz riskiness relates to the preferences of all concave utility functions and not just all increasing, concave utility functions. Another way to state this result is: If $\tilde y$ is riskier than $\tilde x$, then $-\tilde y$ is riskier than $-\tilde x$. This is obvious from condition (iii). So the minimum outcome of $-\tilde y$ cannot exceed the minimum outcome of $-\tilde x$; equivalently, the maximum outcome of $\tilde y$ is at least as large as the maximum outcome of $\tilde x$.
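The necessary conditions above make a cheap screen before the full test of condition (ii). The sketch below assumes discrete distributions stored as {outcome: probability} with exact rational probabilities. `weakly_riskier` is a hypothetical helper, not a library function. It classifies the pair in (3) as unranked and confirms that $3\tilde x$ is riskier than $\tilde x$ for $\tilde x = \pm 1$, as in (11).

```python
from fractions import Fraction as Fr


def moments(d):
    """Mean and variance of a discrete distribution {outcome: probability}."""
    m = sum(v * p for v, p in d.items())
    return m, sum((v - m) ** 2 * p for v, p in d.items())


def tail_gap(f, g, t):
    """T(t) = E[(t - y)^+] - E[(t - x)^+] for x ~ f and y ~ g (condition (ii))."""
    return (sum(p * max(t - v, 0) for v, p in g.items())
            - sum(p * max(t - v, 0) for v, p in f.items()))


def weakly_riskier(y, x):
    """True if y is weakly riskier than x in the Rothschild-Stiglitz sense."""
    (mx, vx), (my, vy) = moments(x), moments(y)
    if mx != my:
        return False  # the comparison needs equal means
    # Cheap necessary conditions: variance, lowest outcome, highest outcome.
    if vy < vx or min(y) > min(x) or max(y) < max(x):
        return False
    # Full condition (ii): T is piecewise linear with kinks at support points.
    return all(tail_gap(x, y, t) >= 0 for t in set(x) | set(y))


# The pair in (3) and the pair x = +/-1, y = 3x from (11).
X3, Y3 = {0: Fr(1, 4), 4: Fr(3, 4)}, {1: Fr(3, 4), 9: Fr(1, 4)}
X11, Y11 = {-1: Fr(1, 2), 1: Fr(1, 2)}, {-3: Fr(1, 2), 3: Fr(1, 2)}

if __name__ == "__main__":
    print(weakly_riskier(Y3, X3), weakly_riskier(X3, Y3))  # neither ranked
    print(weakly_riskier(Y11, X11), weakly_riskier(X11, Y11))
```

In (3), $\tilde y$ fails the lowest-outcome screen and $\tilde x$ fails the variance screen, so neither is riskier.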

In Economics, “riskier” is mostly interpreted in a Rothschild-Stiglitz sense. In Finance, this is also true with one major exception. In the mean-variance analysis underlying the Capital Asset Pricing Model of Chapter 5, “riskier” indicates a higher variance or, possibly, a higher covariance or beta. This terminology is also used outside that model. Usually the context will make clear which of the three interpretations is meant.

Risk and Symmetric Spreads

A simple way to create a probability distribution that is riskier than another is to add a symmetric spread:

$$f_2(x) = f_1(x) + h(x), \quad \text{where } f_2(x) \ge 0, \quad \int h(x)\,dx = 0, \quad \exists\, x^\circ : h(x^\circ - z) = h(x^\circ + z)\ \forall z. \qquad (15)$$

The first condition guarantees that the new density is nonnegative everywhere, as a density function must be. The second condition ensures that $f_2$ has a total mass of 1. The third condition shows that the spread is symmetric around $x^\circ$, which need not be the mean of the original distribution. For the spread to add risk, $h$ must also remove mass near $x^\circ$ and place it farther away: for some $z^* > 0$, $h(x^\circ + z) \le 0$ when $|z| < z^*$ and $h(x^\circ + z) \ge 0$ when $|z| > z^*$. The second random variable is then riskier because

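$$\begin{aligned}
\mathbb{E}[u(\tilde x_2)] &\equiv \int_{-\infty}^{\infty} u(x) f_2(x)\,dx = \int_{-\infty}^{\infty} u(x) f_1(x)\,dx + \int_{-\infty}^{\infty} u(x) h(x)\,dx \\
&= \mathbb{E}[u(\tilde x_1)] + \int_0^{\infty} \big[u(x^\circ + z) + u(x^\circ - z)\big]\,h(x^\circ + z)\,dz \le \mathbb{E}[u(\tilde x_1)] + 2u(x^\circ)\int_0^{\infty} h(x^\circ + z)\,dz.
\end{aligned} \qquad (16)$$

The equality in the second line follows from $h(x^\circ - z) = h(x^\circ + z)$. The inequality follows because $u$ is concave: $g(z) \equiv u(x^\circ + z) + u(x^\circ - z) - 2u(x^\circ)$ is zero at $z = 0$ and decreasing in $z$, so weighting it by a spread $h(x^\circ + z)$ that is negative for small $z$, positive for large $z$, and integrates to zero over $z > 0$ (as shown next) gives a nonpositive integral. The final integral is zero because $\int_0^{\infty} h(x^\circ + z)\,dz = \int_0^{\infty} h(x^\circ - z)\,dz$ by symmetry, and as the integral over the entire range is zero, each portion must be.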
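A discrete analogue of (15) and (16) can be checked directly. In this sketch, the distribution, the centre $x^\circ = 1$ (deliberately not the mean, which is 2), the spread $h$, and the four concave utilities are all hypothetical choices. Probabilities are exact rationals so that the validity checks on $f_2$ are exact.

```python
import math
from fractions import Fraction as Fr

# Hypothetical f1 with mean 2; the spread h is centred at x0 = 1, not at the mean.
F1 = {0: Fr(1, 10), 1: Fr(2, 10), 2: Fr(3, 10), 3: Fr(4, 10)}
X0 = 1
H = {0: Fr(1, 10), 1: Fr(-2, 10), 2: Fr(1, 10)}  # takes mass from x0, moves it outward


def add_spread(f, h, x0):
    """f2 = f1 + h as in (15), after checking the three conditions listed there."""
    f2 = {v: f.get(v, 0) + h.get(v, 0) for v in set(f) | set(h)}
    if min(f2.values()) < 0:
        raise ValueError("f2 must be nonnegative")
    if sum(h.values()) != 0:
        raise ValueError("h must sum to zero")
    if any(h.get(2 * x0 - v, 0) != p for v, p in h.items()):
        raise ValueError("h must be symmetric about x0")
    return f2


def expected(u, f):
    return sum(float(p) * u(v) for v, p in f.items())


F2 = add_spread(F1, H, X0)
CONCAVE = {
    "log(1 + w)": math.log1p,
    "-exp(-w)": lambda w: -math.exp(-w),
    "min(w, 1.5)": lambda w: min(w, 1.5),
    "-(w - 2)^2": lambda w: -(w - 2) ** 2,
}

if __name__ == "__main__":
    for name, u in CONCAVE.items():
        print(name, expected(u, F1), expected(u, F2))
```

Each concave utility assigns a lower expected value to $f_2$, while the mean is unchanged.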

Relative Riskiness when Expectations Differ

Riskiness has thus far only been compared between opportunities with the same expectation. Much of Finance deals with the trade-off between risk and expected returns, so we need to extend the notion of risk to opportunities with different expectations to examine this trade-off.

Obviously, we cannot say that $\tilde x$ is less risky than $\tilde y$ whenever $\mathbb{E}[u(\tilde x)] \ge \mathbb{E}[u(\tilde y)]$ for all increasing, concave utility functions. Utility functions “sufficiently close” to risk neutral will always prefer the outcome with the higher mean, so, by this definition, $\tilde x$ could be less risky than $\tilde y$ only if its expectation were at least as high. This is clearly not in keeping with our intuition that riskiness is something completely different from the expected outcome. Neither would it be particularly useful, as riskier prospects typically have larger expectations. In fact, the aforementioned condition, $\mathbb{E}[u(\tilde x)] \ge \mathbb{E}[u(\tilde y)]$ for all increasing, concave $u$, is the definition of second-order stochastic dominance, which is discussed later.

Suppose a risk-free opportunity is available with outcome x₀, and it is meaningful to talk about portfolio combinations. Then one solution to differing means is to say y is riskier than x if x₀ + λ(x − x₀), with λ ≡ (ℰ[y] − x₀)/(ℰ[x] − x₀), is less risky than y in a Rothschild-Stiglitz sense. This construction levers or de-levers x so that it has the same expectation as y. This definition has been used by Levy and others, but it is not the usual choice. For example, in the mean-variance model underlying the CAPM, the usual statement that a higher variance means more risk would not be correct.
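The levering step can be sketched as below, assuming discrete outcomes. The function name and the numbers are hypothetical. The only content taken from the text is the choice λ = (ℰ[y] − x₀)/(ℰ[x] − x₀).

```python
def lever_to_mean(outcomes, probs, target_mean, x0):
    """Return (lam, levered outcomes) for x0 + lam * (x - x0), where
    lam = (target_mean - x0) / (E[x] - x0), so the levered mean is target_mean."""
    mean_x = sum(p * v for v, p in zip(outcomes, probs))
    if mean_x == x0:
        raise ValueError("E[x] equals the risk-free outcome x0; x cannot be levered to a new mean")
    lam = (target_mean - x0) / (mean_x - x0)
    return lam, [x0 + lam * (v - x0) for v in outcomes]

# Hypothetical numbers: x pays 0.9 or 1.3 with equal probability, x0 = 1.0, E[y] = 1.2.
lam, levered = lever_to_mean([0.9, 1.3], [0.5, 0.5], target_mean=1.2, x0=1.0)
levered_mean = 0.5 * levered[0] + 0.5 * levered[1]
```

Here λ ≈ 2, so the position is levered two-for-one and pays about 0.8 or 1.6.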

A more common choice is to say that x is (strictly) less risky than y if x − ℰ[x] is (strictly) less risky than y − ℰ[y] by the Rothschild-Stiglitz definition. This is the sense in which their notion of riskiness will be used here. When comparing risky prospects with different expectations, there will generally be some trade-off between expectation and risk. This trade-off will form a large part of our study of portfolios. The general, though unfortunately only partial, criterion for analyzing such trade-offs is stochastic dominance.

Strong (Rothschild-Stiglitz) Risk Aversion

In the previous chapter, risk aversion was defined as the rejection of any risky prospect in favor of a safe prospect with the same expectation. There it was shown that concave utility functions are risk averse in this sense. With a general definition of risk now available, we can consider a stronger notion. A utility function is then risk averse if it never assigns a lower value to the less risky of two prospects with the same expectation. We have in fact already discussed this notion of risk aversion; it was given as condition (i) in Theorem 2.1. All concave utility functions are also risk averse in this stronger sense. In other words, under Expected Utility Theory there is no difference between the weak (Arrow-Pratt) and strong (Rothschild-Stiglitz) notions of risk aversion. In other contexts, however, the two concepts can be distinct. This is addressed in Chapter 11 on probability weighting.

First-Order Stochastic Dominance¹⁰

The previous section introduced the concept "x is preferred to y by all increasing, concave utility functions" only to dismiss it. That concept is not a good description of riskiness, but it is exactly the description of second-order stochastic dominance. To describe second-order dominance, we must first take two steps back to state-wise dominance. A random variable x is said to dominate y state-wise if x ≥ y. That is, x state-wise dominates y if y = x + η with η ≤ 0, so that outcome by outcome x is never smaller. It strictly state-wise dominates if it is sometimes larger.¹¹ Obviously, if x strictly dominates y, then it provides higher expected utility for any strictly increasing utility function. A bit less obviously, ℰ[u(w + x)] ≥ ℰ[u(w + y)] for any random w as well. This follows because the relation holds for any fixed w, so

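$$\mathcal{E}[u(\tilde w + \tilde x)] = \mathcal{E}\big[\mathcal{E}[u(\tilde w + \tilde x) \mid \tilde w]\big] \ge \mathcal{E}\big[\mathcal{E}[u(\tilde w + \tilde y) \mid \tilde w]\big] = \mathcal{E}[u(\tilde w + \tilde y)]. \tag{17}$$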

This means that even when facing other risks, any agent with increasing utility will always prefer the addition of x to the addition of y. If it is meaningful to take negative (or short) positions in the risks, any agent would also willingly add x financed by selling y. That is,

$$\mathcal{E}[u(\tilde w + \tilde x - \tilde y)] = \mathcal{E}[u(\tilde w - \tilde\eta)] \ge \mathcal{E}[u(\tilde w)]. \tag{18}$$

Dominance will be an important concept in our study of arbitrage in the next chapter.

First-order (or first-degree) stochastic dominance is similar to state-wise dominance. It is defined as

$$\tilde x \succeq_{\text{1SD}} \tilde y \iff \mathcal{E}[u(\tilde x)] \ge \mathcal{E}[u(\tilde y)] \quad \forall u \in \mathcal{U}_1 \tag{19}$$

where 𝒰₁ is the set of increasing utility functions. Because 𝒰₁ includes risk-neutral utility, first-order dominance requires that x have at least as large an expectation. The difference between first-order stochastic dominance and state-wise dominance can be illustrated by three gambles based on the throw of a single fair die:

| die outcome | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| gamble A | 5 | 5 | 7 | 10 | 10 | 10 |
| gamble B | 5 | 5 | 5 | 8 | 10 | 10 |
| gamble C | 9 | 5 | 7 | 10 | 10 | 5 |

¹⁰ First-order stochastic dominance is sometimes called monotone stochastic dominance. Similarly, second-order stochastic dominance can be called monotone, concave stochastic dominance.

¹¹ State-wise dominance is also called zero-order stochastic dominance. As with most comparisons in Finance, a weak rather than strict comparison is assumed unless explicitly stated otherwise. Strict dominance is usually denoted x > y, where the > sign is interpreted in the same sense as for vectors. That is, there is at least one state for which x_s > y_s; x need not always be strictly bigger. For continuous random variables, the statement is interpreted as holding with probability 1, Pr{x ≥ y} = 1. For strict dominance, in addition, Pr{x > y} > 0.


Gamble A dominates gamble B state-wise because it is larger when a 3 or a 4 is tossed and is otherwise the same. Gamble A does not dominate C state-wise, because C has a better result when a 1 is tossed. However, A first-order stochastically dominates C. Each outcome is equally likely, so A's payoffs of 10 and 5 when a 6 or a 1 is tossed beat C's outcomes of 5 and 9.
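The die example can be checked mechanically. In the sketch below (helper names hypothetical), state-wise dominance compares the gambles face by face. First-order dominance compares the two cumulative distributions, as in condition (ii) of Theorem 2.3. Both CDFs are step functions that jump only at outcomes, so comparing them at the outcomes covers every z.

```python
from fractions import Fraction

A = [5, 5, 7, 10, 10, 10]   # payoffs for die faces 1..6
B = [5, 5, 5, 8, 10, 10]
C = [9, 5, 7, 10, 10, 5]

def statewise_dominates(x, y):
    """x >= y on every die face."""
    return all(a >= b for a, b in zip(x, y))

def cdf(outcomes, z):
    """Pr{outcome <= z} when the six faces are equally likely."""
    return Fraction(sum(1 for v in outcomes if v <= z), len(outcomes))

def first_order_dominates(x, y):
    """G(z) >= F(z) for all z, with F the cdf of x and G the cdf of y."""
    points = set(x) | set(y)   # the step cdfs only change at these values
    return all(cdf(y, z) >= cdf(x, z) for z in points)
```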

The random variable statement for first-order stochastic dominance is $y \stackrel{d}{=} x + \eta$ with η ≤ 0, where $\stackrel{d}{=}$ denotes equality in distribution. The difference here is that when x and y are defined on the same set of events, x need not exceed y in each state. Rather, its distribution must simply be better, as described above.

Another way to describe first-order stochastic dominance is that the cumulative distribution for y is never less than that for x. So at any value z, G(z) ≡ Pr{y ≤ z} ≥ Pr{x ≤ z} ≡ F(z). This is the usual statement of first-order stochastic dominance. It is perhaps easier to recall the relation as Pr{y ≥ z} ≤ Pr{x ≥ z}, where the stochastically larger random variable has the larger probability. The next theorem formalizes how these descriptions are related.

Theorem 2.3: First-Order Stochastic Dominance. Let x and y be two random variables with cumulative distribution functions F(x) and G(y). Both are defined on the union of the domains of the two random variables, that is, on [a, b] ≡ [min(a_x, a_y), max(b_x, b_y)]. The random variable x first-order stochastically dominates y if and only if the following equivalent conditions hold:

i) First-order stochastic dominance: ℰ[u(x)] ≥ ℰ[u(y)] ∀u ∈ 𝒰₁

ii) Distribution description: G(z) − F(z) ≥ 0 ∀z

iii) Random variable description: $y \stackrel{d}{=} x + \eta$ with η ≤ 0.

Proof: This proof is constructed in three parts: (i) ⇒ (ii) ⇒ (iii) ⇒ (i).

(i) ⇒ (ii): For utility, consider the unit step function that takes the value 0 below z and 1 at or above z, $u_z(x) = \mathbb{1}_{x \ge z}$. For any choice of z this function is increasing, so from (i)

$$0 \le \mathcal{E}[u_z(\tilde x)] - \mathcal{E}[u_z(\tilde y)] \equiv \int_a^b \mathbb{1}_{t \ge z}\,[dF(t) - dG(t)] = 1 - F(z) - [1 - G(z)] = G(z) - F(z) \quad \forall z. \tag{20}$$

(ii) ⇒ (iii): Define the random variables y₁(x) ≡ G⁻¹(F(x)) and η(x) ≡ y₁(x) − x as functions of x. F(x) takes on the values 0 to 1 over the range a ≤ x ≤ b, so applying G⁻¹ gives y₁ the same distribution as y.¹² By assumption G(z) ≥ F(z), so G⁻¹(p) ≤ F⁻¹(p) for every p. Hence y₁ ≤ x and η ≤ 0. Therefore

$$\tilde y \stackrel{d}{=} \tilde y_1 = \tilde x + \tilde\eta \quad \text{with } \tilde\eta \le 0.$$

(iii) ⇒ (i): This portion is obvious. If y has the same distribution as x + η, then

$$\mathcal{E}[u(\tilde y)] = \mathcal{E}[u(\tilde x + \tilde\eta)] \le \mathcal{E}[u(\tilde x)] \tag{21}$$

because η ≤ 0.
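The quantile construction y₁ = G⁻¹(F(x)) can be illustrated with two distributions whose CDFs invert in closed form. Assume, purely for illustration, that x is exponential with mean 2 and y is exponential with mean 1. Then G(z) = 1 − e^{−z} ≥ 1 − e^{−z/2} = F(z), and analytically y₁ = x/2. The function names below are hypothetical.

```python
import math
import random

def F(x):
    """cdf of x: exponential with mean 2 (assumed example)."""
    return 1.0 - math.exp(-x / 2.0)

def G_inv(p):
    """Inverse cdf of y: exponential with mean 1 (assumed example)."""
    return -math.log1p(-p)

def couple(x):
    """Quantile coupling from the proof: y1 = G^{-1}(F(x)) and eta = y1 - x."""
    y1 = G_inv(F(x))
    return y1, y1 - x

rng = random.Random(0)                                   # fixed seed for reproducibility
draws = [rng.expovariate(0.5) for _ in range(100_000)]  # rate 0.5, so mean 2
pairs = [couple(x) for x in draws]
y1_mean = sum(y1 for y1, _ in pairs) / len(pairs)        # close to E[y] = 1
max_eta = max(eta for _, eta in pairs)                   # eta <= 0 draw by draw
```

Every draw has η ≤ 0, and the coupled variable has the distribution of y, as the proof requires.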

It should be stressed that conditions (ii) and (iii) are necessary as well as sufficient. Whenever neither x nor y first-order dominates the other, two increasing utility functions can be found such that one prefers x and the other prefers y.

First-order stochastic dominance is a property of the distributions as a whole. Unlike state-by-state dominance, it cannot be applied in many portfolio contexts. For example, if x dominates y state-wise, then for any w, w + x dominates w + y state-wise. This is not true for first-order stochastic dominance: $x \succeq_{\text{1SD}} y$ does not imply that $w + x \succeq_{\text{1SD}} w + y$.

For example, suppose x takes the values 0 and 10 with equal probability. Let y be an independent variable taking the value 0 with probability 2/3 and 6 with probability 1/3, and let w = −x. Then w + x ≡ 0, but w + y = 6 with probability 1/6 (whenever y = 6 and x = 0). So w + x does not stochastically dominate w + y.

¹² The inverse function G⁻¹ exists if y is a continuous random variable. A similar proof is valid even if the random variables are discrete or have atoms, but constructing the mapping requires tedious work.

A sufficient condition for first-order stochastic dominance to survive the addition of a random variable, w, is that w is independent of both of the original variables.
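The counterexample can be checked by enumerating the joint distribution exactly. The sketch assumes x and y are independent, as stated. It also adds a second, hypothetical background risk w that is independent of both, taking the values 0 or 5 with equal probability. In that case dominance survives for this instance. All helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

x_dist = {0: Fraction(1, 2), 10: Fraction(1, 2)}
y_dist = {0: Fraction(2, 3), 6: Fraction(1, 3)}

def cdf(dist, z):
    return sum((p for v, p in dist.items() if v <= z), Fraction(0))

def fsd(dx, dy):
    """x first-order dominates y iff G(z) >= F(z); both cdfs are step
    functions, so checking at the jump points covers every z."""
    return all(cdf(dy, z) >= cdf(dx, z) for z in set(dx) | set(dy))

def collect(pairs):
    """Turn (value, probability) pairs into a distribution dict."""
    out = {}
    for v, p in pairs:
        out[v] = out.get(v, Fraction(0)) + p
    return out

def convolve(d1, d2):
    """Distribution of the sum of two independent variables."""
    return collect((a + b, p * q) for (a, p), (b, q) in product(d1.items(), d2.items()))

# Dependent background risk w = -x (x and y independent of each other).
joint = [(xv, yv, px * py) for (xv, px), (yv, py) in product(x_dist.items(), y_dist.items())]
w_plus_x = collect((-xv + xv, p) for xv, yv, p in joint)
w_plus_y = collect((-xv + yv, p) for xv, yv, p in joint)

# Independent background risk (hypothetical): w = 0 or 5 with equal probability.
w_ind = {0: Fraction(1, 2), 5: Fraction(1, 2)}
```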

Theorem 2.4: First-Order Stochastic Dominance for Sums with Independence. If w is independent of x and y, then $x \succeq_{\text{1SD}} y$ implies $w + x \succeq_{\text{1SD}} w + y$.

Proof: As $x \succeq_{\text{1SD}} y$, ℰ[u(x)] ≥ ℰ[u(y)] for all increasing u. For fixed w, define v_w(x) ≡ u(x + w). Then v_w(x) is increasing for all w. Because x first-order stochastically dominates y, ℰ[v_w(x)] ≥ ℰ[v_w(y)] for all w. Because w is independent of x, these expectations are just the conditional expectations ℰ_x[v_w(x)] = ℰ[u(x + w) | w], and similarly for y. Therefore,

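$$\begin{aligned}
\mathcal{E}_{x,w}[u(\tilde w + \tilde x)] &= \mathcal{E}_w\big[\mathcal{E}_x[u(\tilde w + \tilde x) \mid \tilde w]\big] = \mathcal{E}_w\big[\mathcal{E}_x[v_{\tilde w}(\tilde x)]\big] \\
&\ge \mathcal{E}_w\big[\mathcal{E}_y[v_{\tilde w}(\tilde y)]\big] = \mathcal{E}_w\big[\mathcal{E}_y[u(\tilde w + \tilde y) \mid \tilde w]\big] = \mathcal{E}_{y,w}[u(\tilde w + \tilde y)].
\end{aligned} \tag{22}$$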

This holds for all increasing u, so w + x first-order stochastically dominates w + y, as required.

Likelihood Ratio Dominance

Likelihood ratio dominance, also called the monotone likelihood ratio property, is a special case of first-order stochastic dominance. It is used in many models in Finance and Economics when stochastic dominance is not a sufficiently strong property. Probability density f(x) likelihood ratio dominates density g(x) if

$$\frac{f(x_2)}{g(x_2)} \ge \frac{f(x_1)}{g(x_1)} \quad \forall\, x_2 > x_1, \tag{23}$$

or for differentiable densities

$$\frac{\partial}{\partial x}\left[\frac{f(x)}{g(x)}\right] \ge 0. \tag{24}$$

Because the likelihood ratio is increasing, it must take its lowest value at the smallest possible outcome and its highest value at the largest possible outcome. Both densities must integrate to one. So there must be a possibly degenerate range of values [x₀, x⁰] such that f(x) ≤ g(x) for x ≤ x₀, f(x) = g(x) for x₀ ≤ x ≤ x⁰, and f(x) ≥ g(x) for x ≥ x⁰. That a monotone likelihood ratio implies first-order stochastic dominance is verified as follows. From (23), f(z)g(x) ≥ f(x)g(z) ∀x < z, so

$$\int_{-\infty}^{z} f(z)\,g(x)\,dx \ge \int_{-\infty}^{z} f(x)\,g(z)\,dx \;\Rightarrow\; G(z)\,f(z) \ge F(z)\,g(z) \quad \forall z. \tag{25}$$

Similarly,

$$\int_{z}^{\infty} f(x)\,g(z)\,dx \ge \int_{z}^{\infty} f(z)\,g(x)\,dx \;\Rightarrow\; [1 - F(z)]\,g(z) \ge f(z)\,[1 - G(z)] \quad \forall z. \tag{26}$$

Combining,

$$\begin{aligned}
\frac{1 - F(z)}{1 - G(z)} \ge \frac{f(z)}{g(z)} \ge \frac{F(z)}{G(z)} \quad \forall z
&\;\Rightarrow\; G(z) - F(z)G(z) \ge F(z) - F(z)G(z) \quad \forall z \\
&\;\Rightarrow\; G(z) \ge F(z) \quad \forall z.
\end{aligned} \tag{27}$$

Monotone likelihood ratios are often assumed when first-order stochastic dominance is insufficient; for example, they are used in many principal-agent problems.
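A short sketch of (23) and its consequence for discrete densities follows. It assumes, as a hypothetical example, that f and g are Binomial(10, 0.6) and Binomial(10, 0.4) probabilities on 0, …, 10. For these the ratio f/g grows geometrically, so the likelihood ratio is monotone. The CDF comparison then confirms G ≥ F, the conclusion of (27). A small tolerance absorbs floating-point rounding in the cumulative sums.

```python
from itertools import accumulate
from math import comb

def binom_pmf(n, p):
    """Binomial(n, p) probabilities on 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def has_mlr(f, g):
    """True if f(x)/g(x) is nondecreasing on the common support, as in (23)."""
    ratios = [fk / gk for fk, gk in zip(f, g)]
    return all(r2 >= r1 for r1, r2 in zip(ratios, ratios[1:]))

def fsd_from_pmfs(f, g, tol=1e-12):
    """True if G(z) >= F(z) at every support point, i.e. f first-order dominates g."""
    return all(Gz >= Fz - tol for Fz, Gz in zip(accumulate(f), accumulate(g)))

f = binom_pmf(10, 0.6)   # hypothetical densities: f should dominate g
g = binom_pmf(10, 0.4)
```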


Second-Order Stochastic Dominance

Second-order stochastic dominance applies to preferences for all increasing, concave utility functions. It is defined as

$$\tilde x \succeq_{\text{2SD}} \tilde y \iff \mathcal{E}[u(\tilde x)] \ge \mathcal{E}[u(\tilde y)] \quad \forall u \in \mathcal{U}_2. \tag{28}$$

The set of increasing concave utilities 𝒰₂ is included in the set of increasing utilities 𝒰₁. So all random variables that exhibit first-order dominance also exhibit second-order dominance. Because 𝒰₂ also contains the risk-neutral utility u(x) = x, second-order stochastic dominance requires a weakly larger expectation. Second-order stochastic dominance can be thought of as a combination of first-order stochastic dominance and increased risk. The following theorem formalizes this notion.

Theorem 2.5: Second-Order Stochastic Dominance. Let x and y be two random variables with cumulative distribution functions F(x) and G(y) on [a, b] ≡ [min(a_x, a_y), max(b_x, b_y)]. The random variable x second-order stochastically dominates y if and only if the following equivalent conditions hold:

i) Second-order stochastic dominance: ℰ[u(x)] ≥ ℰ[u(y)] ∀u ∈ 𝒰₂

ii) Distribution description: $T(z) \equiv \int_a^z [G(t) - F(t)]\,dt \ge 0 \quad \forall z$

iii) Random variable description: $y \stackrel{d}{=} x + \eta + \varepsilon$ with $\eta \le 0$ and $\mathcal{E}[\varepsilon \mid x + \eta] = 0$.

Proof: (i) ⇒ (ii): This portion of the proof is almost identical to the first portion of the proof of Theorem 2.1 about risk. The integral in (13) remains valid. The difference is that the expectations of x and y need not be equal, so T(b) = ℰ[x] − ℰ[y] can be positive rather than zero.

(ii) ⇒ (iii): Again, this portion is similar to the second part of the proof of Theorem 2.3. Let $F^I(z) \equiv \int_a^z F(t)\,dt$ and $G^I(z) \equiv \int_a^z G(t)\,dt$ denote the integrated distribution functions, so that $T(z) = G^I(z) - F^I(z)$.

If $G^I(b) = F^I(b)$, then x and y have the same expectation, so η ≡ 0. In this case y is only riskier than x, and the proof of Theorem 2.1 applies exactly.

If $F^I(b) < G^I(b)$, then the random variable η is needed. All that is required is to choose η = W(x) so as to make $F^I(b) = G^I(b)$. If there is an interior point z* below b where $G^I(z) = F^I(z)$, then W(x) ≡ 0 for x ≤ z*. Above the last such point, $F^I$ need only be increased by an increasing concave function to make $G^I(b) = F^I(b)$. This describes η = W(x). A series of simple mean-preserving spreads can then be added to match $F^I$ to $G^I$.

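(iii) ⇒ (i):

$$\mathcal{E}[u(\tilde y)] \equiv \mathcal{E}[u(\tilde x + \tilde\eta + \tilde\varepsilon)] \le \mathcal{E}[u(\tilde x + \tilde\eta)] \le \mathcal{E}[u(\tilde x)] \quad \forall u \in \mathcal{U}_2. \tag{29}$$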

The first inequality follows because u is concave and x + η + ε is riskier than .x + η The second inequality follows because x + η is first-order dominated by x.

Although the proof uses both first-order dominance and riskiness, examples can be constructed in which x second-order stochastically dominates y even though x neither first-order stochastically dominates y nor is less risky than y. It is important that the two properties work together. One example is

| Pr | x | η | ε | y |
|---|---|---|---|---|
| 0.50 | 50 | −30 | 0 | 20 |
| 0.25 | 12 | −4 | 6 | 14 |
| 0.25 | 8 | 0 | −6 | 2 |

(30)

Clearly η is nonpositive while ε adds risk because

$$\mathcal{E}[\tilde\varepsilon \mid \tilde x + \tilde\eta = 20] = \mathcal{E}[\tilde\varepsilon \mid \tilde x + \tilde\eta = 8] = 0. \tag{31}$$


Therefore, y is second-order dominated. However, x does not first-order dominate y, because Pr{x ≤ 13} = 0.5 > Pr{y ≤ 13} = 0.25. Nor is x less risky than y, because var[x] = 402 > var[y] = 54.

It is also possible for x to be riskier than y yet first-order (and therefore second-order) dominate it. A simple example has x uniformly distributed on [2, 4] and y uniformly distributed on [0, 1]. Once the means have been subtracted, x is riskier even though it first-order dominates y. If the random variables are defined on the same state space, then x could dominate y state by state as well.

Also, the theorem says only that it is always possible to find a random variable representation like (iii). That is not the only representation that guarantees second-order dominance. Another one is

$$\tilde y \stackrel{d}{=} \tilde x + \tilde\eta + \tilde\varepsilon \quad \text{with } \tilde\eta \le 0 \text{ and } \mathcal{E}[\tilde\varepsilon \mid \tilde x] = 0. \tag{32}$$

The sufficiency of this representation can be demonstrated by applying riskiness and first-order dominance in the reverse of the order used in (29):

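$$\mathcal{E}[u(\tilde y)] \equiv \mathcal{E}[u(\tilde x + \tilde\eta + \tilde\varepsilon)] \le \mathcal{E}[u(\tilde x + \tilde\varepsilon)] \le \mathcal{E}[u(\tilde x)] \quad \forall u \in \mathcal{U}_2. \tag{33}$$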

This time the first inequality follows from first-order dominance while the second follows from reduced riskiness.
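The numbers quoted for example (30) can be verified exactly with rational arithmetic. The sketch assumes the signs η = (−30, −4, 0) and ε = (0, 6, −6), which reproduce y = (20, 14, 2). It computes T(z) from condition (ii) of Theorem 2.5 through the identity $\int_a^z F(t)\,dt = \mathcal{E}[(z - x)^+]$. T is piecewise linear between support points, so checking it at the support points is enough. The helper names are hypothetical.

```python
from fractions import Fraction as Fr

probs = [Fr(1, 2), Fr(1, 4), Fr(1, 4)]
x = [50, 12, 8]
eta = [-30, -4, 0]
eps = [0, 6, -6]
y = [a + b + c for a, b, c in zip(x, eta, eps)]

def mean(vals):
    return sum(p * v for p, v in zip(probs, vals))

def var(vals):
    m = mean(vals)
    return sum(p * (v - m) ** 2 for p, v in zip(probs, vals))

def prob_le(vals, z):
    return sum(p for p, v in zip(probs, vals) if v <= z)

def lpm1(vals, z):
    """E[(z - v)^+], which equals the integral of the cdf from a to z."""
    return sum(p * max(z - v, 0) for p, v in zip(probs, vals))

def tail_T(z):
    """T(z) = integral of G - F from a to z, with F the cdf of x and G that of y."""
    return lpm1(y, z) - lpm1(x, z)

def cond_mean_eps():
    """E[eps | x + eta] for each value of x + eta."""
    groups = {}
    for p, a, b, c in zip(probs, x, eta, eps):
        num, den = groups.get(a + b, (0, 0))
        groups[a + b] = (num + p * c, den + p)
    return {k: num / den for k, (num, den) in groups.items()}

support = sorted(set(x) | set(y))     # T is piecewise linear between these points
T_values = [tail_T(z) for z in support]
```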

Another characterization that guarantees second-order stochastic dominance applies when the cumulative distributions F and G cross only once, with G lying above F to the left of the crossing point. In this case, x second-order dominates y if and only if ℰ[x] ≥ ℰ[y].

As with first-order stochastic dominance, second-order stochastic dominance cannot be applied in many portfolio contexts. When x second-order stochastically dominates y, it does not necessarily follow that w + x second-order stochastically dominates w + y. Independence of w from both x and y is sufficient for this to be true.

Theorem 2.6: Second-Order Stochastic Dominance for Sums with Independence. If w is independent of x and y, then $x \succeq_{\text{2SD}} y$ implies $w + x \succeq_{\text{2SD}} w + y$.

Proof: The proof is identical to that of Theorem 2.4 when u is assumed to be increasing and concave.

The various notions of dominance, stochastic dominance, and risk are outlined in the table.

Higher-Order Stochastic Dominances

Third and higher-order stochastic dominances are defined in a similar fashion. Third-order stochastic dominance applies to the set 𝒰₃ of utility functions that are increasing, risk averse, and prudent, with u′(x) ≥ 0, u″(x) ≤ 0, and u‴(x) ≥ 0. Third-order stochastic dominance can be described by an integral of the tail condition (ii) in Theorem 2.5:

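$$\tilde x \succeq_{\text{3SD}} \tilde y \iff \mathcal{E}[\tilde x] \ge \mathcal{E}[\tilde y], \quad T(b) \ge 0, \quad \int_a^z T(t)\,dt \ge 0 \quad \forall z. \tag{34}$$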

The sufficiency of this condition can be demonstrated by applying integration by parts to (13):

$$\begin{aligned}
\mathcal{E}[u(\tilde x) - u(\tilde y)] &= u'(z)\,T(z)\Big|_a^b - \int_a^b u''(z)\,T(z)\,dz \\
&= u'(b)\,T(b) - u''(b)\int_a^b T(t)\,dt + \int_a^b u'''(v) \int_a^v T(t)\,dt\,dv \;\ge\; 0.
\end{aligned} \tag{35}$$


The first term is nonnegative because u′ ≥ 0 and, by assumption, T(b) ≥ 0. The second term is nonnegative because u″ ≤ 0 and the integral is nonnegative by assumption. The integrand in the third term is nonnegative because u‴ ≥ 0 and the inner integral is nonnegative by assumption; therefore, the integral is nonnegative.

To prove necessity, note that the first two conditions in (34) must hold for third-order stochastic dominance because the risk-neutral utility u(x) = x belongs to 𝒰₃, and T(b) = ℰ[x] − ℰ[y]. The last condition can be proved with the following set of utility functions:

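$$u(x; z) \equiv \begin{cases} zx - \tfrac12 x^2, & x < z \\ \tfrac12 z^2, & x > z \end{cases} \qquad u'(x; z) = \begin{cases} z - x, & x < z \\ 0, & x > z \end{cases} \qquad u''(x; z) = \begin{cases} -1, & x < z \\ 0, & x > z. \end{cases} \tag{36}$$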

This function is in 𝒰₃; therefore, from the first line in (35), for any z ≤ b

$$0 \le \mathcal{E}[u(\tilde x; z) - u(\tilde y; z)] = u'(t; z)\,T(t)\Big|_a^b - \int_a^b u''(t; z)\,T(t)\,dt = \int_a^z T(t)\,dt. \tag{37}$$

The evaluated term is zero because T(a) = u′(b; z) = 0. The remaining term verifies the last condition in (34). There is no known random variable description of third-order (or higher-order) stochastic dominance comparable to $y \stackrel{d}{=} x + \eta + \varepsilon$ for second-order stochastic dominance. Peter Fishburn (1985) did derive a set of six necessary, though not sufficient, conditions.

Continuing in the same fashion, stochastic dominance of order n applies to all utility functions whose derivatives alternate in sign up through the nth. The distribution conditions for nth-order stochastic dominance use repeated integrals of the tail function

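$$\begin{aligned}
\tilde x \succeq_{n\text{SD}} \tilde y \iff{}& \mathcal{E}[\tilde x] \ge \mathcal{E}[\tilde y], \quad T_k(b) \ge 0 \ \text{ for } k = 1, 2, \ldots, n-2, \quad T_{n-1}(z) \ge 0 \ \ \forall z, \\
&\text{where } T_n(z) \equiv \int_a^z T_{n-1}(t)\,dt \ \text{ and } \ T_0(z) \equiv G(z) - F(z).
\end{aligned} \tag{38}$$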

Repeated integration by parts verifies this condition

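$$\begin{aligned}
\mathcal{E}[u(\tilde x) - u(\tilde y)] &= u'(b)\,\big(\mathcal{E}[\tilde x] - \mathcal{E}[\tilde y]\big) - u''(b)\,T_2(b) + u'''(b)\,T_3(b) - u^{(4)}(b)\,T_4(b) + \cdots \\
&= u'(b)\,\big(\mathcal{E}[\tilde x] - \mathcal{E}[\tilde y]\big) + \sum_{k=2}^{n-1} (-1)^{k-1} u^{(k)}(b)\,T_k(b) + (-1)^{n-1}\int_a^b u^{(n)}(z)\,T_{n-1}(z)\,dz \;\ge\; 0.
\end{aligned} \tag{39}$$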

The first term is nonnegative because marginal utility is positive and x has at least as large an expectation. Each term in the sum is nonnegative because the derivatives and (−1)ᵏ⁻¹ both alternate in sign and T_k(b) ≥ 0. Similarly, the integrand is nonnegative at all z, so the integral is nonnegative.

The converse is also true. The necessity of this condition can be demonstrated by modifying the choice for u in the obvious way, so that the (n−1)th derivative equals (−1)ⁿ for x < z and 0 for x > z. Each higher order of stochastic dominance eliminates additional random variables as dominated. That is, the set of undominated prospects shrinks with each increase in n.
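The conditions in (38) can be checked numerically for discrete distributions. The sketch relies on an identity that follows from Cauchy's formula for repeated integration: T_k(z) = [ℰ[((z − y)⁺)ᵏ] − ℰ[((z − x)⁺)ᵏ]]/k! for k ≥ 1. For n ≥ 3, T_{n−1} can dip between support points, so the "for all z" condition is tested on a grid. This makes the result an approximation. The example pair is hypothetical: both have mean 2 and variance 3, x is skewed right, and y = 4 − x is its mirror image.

```python
from math import factorial

def lpm(dist, z, k):
    """k-th lower partial moment E[((z - x)^+)^k] of {outcome: probability}."""
    return sum(p * max(z - v, 0.0) ** k for v, p in dist.items())

def T(k, z, dx, dy):
    """T_k(z) for x ~ dx (cdf F) and y ~ dy (cdf G), using
    T_k(z) = [lpm_k(z; G) - lpm_k(z; F)] / k!  (valid for k >= 1)."""
    return (lpm(dy, z, k) - lpm(dx, z, k)) / factorial(k)

def mean(dist):
    return sum(p * v for v, p in dist.items())

def nth_order_dominates(n, dx, dy, grid_size=2001, tol=1e-12):
    """Check the conditions in (38) for n >= 2. For n >= 3 the 'for all z'
    condition is checked only on a grid, so the result is approximate."""
    if n < 2:
        raise ValueError("compare cdfs directly for first-order dominance")
    support = sorted(set(dx) | set(dy))
    a, b = support[0], support[-1]
    if mean(dx) < mean(dy) - tol:
        return False
    if any(T(k, b, dx, dy) < -tol for k in range(1, n - 1)):
        return False
    grid = [a + (b - a) * i / (grid_size - 1) for i in range(grid_size)] + support
    return all(T(n - 1, z, dx, dy) >= -tol for z in grid)

dx = {1: 0.75, 5: 0.25}    # hypothetical right-skewed prospect
dy = {-1: 0.25, 3: 0.75}   # its mirror image y = 4 - x
```

For this pair, x is not second-order dominant but does dominate in the third order.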

There is no simple characterization of third- or higher-order stochastic dominance in terms of random variables comparable to $y \stackrel{d}{=} x + \eta + \varepsilon$ for second-order stochastic dominance. However, it is known that if x dominates y in the nth order (and the two distributions differ), then some noncentral moment of order at most n must describe this preference. That is, for some k ≤ n, ℰ[xᵏ] ≠ ℰ[yᵏ], and for the smallest such k, (−1)ᵏ(ℰ[xᵏ] − ℰ[yᵏ]) < 0. The moments for k < n must be included in this description of nth-order stochastic dominance because stochastic dominance of order n always implies stochastic dominance of all orders higher than n.
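The moment condition can be illustrated with a small helper that finds the smallest k at which the noncentral moments differ and applies the sign test. The helper name and the example pair are hypothetical. The pair has equal means and variances, and its third moments are 32 and 20. This is only a necessary condition, so passing the sign test does not by itself establish dominance.

```python
def first_differing_moment(dx, dy, n, tol=1e-12):
    """Return (k, sign_ok) for the smallest k <= n with E[x^k] != E[y^k],
    where sign_ok is True when (-1)^k (E[x^k] - E[y^k]) < 0.
    Returns (None, None) if the first n moments all agree."""
    for k in range(1, n + 1):
        diff = sum(p * v**k for v, p in dx.items()) - sum(p * v**k for v, p in dy.items())
        if abs(diff) > tol:
            return k, (-1) ** k * diff < 0
    return None, None

dx = {1: 0.75, 5: 0.25}    # hypothetical: mean 2, variance 3, skewed right
dy = {-1: 0.25, 3: 0.75}   # hypothetical: mean 2, variance 3, skewed left
```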


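Comparison of Dominance, Stochastic Dominance, and Riskiness

| | Utility Condition | Random Variable Condition | Distribution Condition |
|---|---|---|---|
| Dominance | $\mathcal{E}[u(w + x)] \ge \mathcal{E}[u(w + y)]$ $\forall w$, all increasing u | $y = x + \xi$, $\xi \le 0$ | $x \ge y$ |
| 1st-order stochastic dominance | $\mathcal{E}[u(x)] \ge \mathcal{E}[u(y)]$, all increasing u | $y \stackrel{d}{=} x + \xi$, $\xi \le 0$ | $G(t) \ge F(t)$ $\forall t$ |
| 2nd-order stochastic dominance | $\mathcal{E}[u(x)] \ge \mathcal{E}[u(y)]$, all increasing concave u | $y \stackrel{d}{=} x + \xi + \varepsilon$, $\xi \le 0$, $\mathcal{E}[\varepsilon \mid x + \xi] = 0$ | $\Delta(T) \equiv \int_{-\infty}^{T} [G(t) - F(t)]\,dt \ge 0$ $\forall T$ |
| Rothschild-Stiglitz riskiness | $\mathcal{E}[u(x)] \ge \mathcal{E}[u(y)]$, all concave u | $y \stackrel{d}{=} x + \varepsilon$, $\mathcal{E}[\varepsilon \mid x] = 0$ | $\Delta(T) \ge 0$ $\forall T$, $\Delta(\infty) = 0$ |
| Nth-order stochastic dominance | $\mathcal{E}[u(x)] \ge \mathcal{E}[u(y)]$, all u with $(-1)^n u^{(n)} < 0$, n = 1, …, N | none known | $T_k(b) \ge 0$ for $k = 1, 2, \ldots, N-2$, $T_{N-1}(z) \ge 0$ $\forall z$, where $T_n(z) \equiv \int_a^z T_{n-1}(t)\,dt$ and $T_0(z) \equiv G(z) - F(z)$ |

The cumulative distribution functions are F(x) and G(y), respectively.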

Risk and Dominance with Quantiles and Lower Partial Moments

The development given in the previous sections is the traditional one. However, many results about stochastic dominance are easier to demonstrate and prove using the quantile method developed by Levy and Kroll (1979). The pth probability quantile of the distribution F(x) is the value of x for which F(x) = p, denoted F⁻¹(p).¹³ The stochastic dominances described by quantiles are

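$$\begin{aligned}
&\text{1st-order:} && F^{-1}(p) \ge G^{-1}(p) \quad \forall p \in [0,1] \\
&\text{2nd-order:} && \int_0^p [F^{-1}(q) - G^{-1}(q)]\,dq \ge 0 \quad \forall p \in [0,1] \\
&\text{3rd-order:} && \int_0^p \int_0^q [F^{-1}(r) - G^{-1}(r)]\,dr\,dq \ge 0 \quad \forall p \in [0,1] \ \text{ and } \ \int_0^1 [F^{-1}(p) - G^{-1}(p)]\,dp \ge 0.
\end{aligned} \tag{40}$$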

Higher-order stochastic dominances are defined similarly with repeated integrals.
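For two samples with the same number n of equally likely outcomes, the first two conditions in (40) become simple checks on sorted outcomes. F⁻¹ is constant on each interval ((i−1)/n, i/n], so ∫₀ᵖ F⁻¹ is piecewise linear with kinks at p = i/n. Comparing partial sums of the sorted outcomes is therefore exact. This sketch covers only that special case, and its helper names are hypothetical.

```python
from itertools import accumulate

def _check_sizes(xs, ys):
    if len(xs) != len(ys):
        raise ValueError("both samples need the same number of equally likely outcomes")

def quantile_fsd(xs, ys):
    """First-order condition in (40): F^{-1}(p) >= G^{-1}(p) for all p,
    i.e. the sorted outcomes of x are at least those of y, position by position."""
    _check_sizes(xs, ys)
    return all(a >= b for a, b in zip(sorted(xs), sorted(ys)))

def quantile_ssd(xs, ys):
    """Second-order condition in (40): the integral of F^{-1} - G^{-1} is
    piecewise linear with kinks at p = i/n, so comparing partial sums suffices."""
    _check_sizes(xs, ys)
    return all(sx >= sy for sx, sy in zip(accumulate(sorted(xs)), accumulate(sorted(ys))))

A = [5, 5, 7, 10, 10, 10]   # die gambles A and C from the table earlier in the chapter
C = [9, 5, 7, 10, 10, 5]
```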

¹³ If F is not continuous, then the quantile function is defined as Q(p) ≡ inf{x : F(x) ≥ p}. This is unambiguous because cumulative distribution functions are right continuous. With a slight abuse of notation, we will continue to use F⁻¹(p) to denote the quantiles.


The validity of these descriptions follows immediately from the tail definitions, because the quantile function is the inverse of the cumulative distribution function. Figure 2.2 illustrates the equivalence. The tail and quantile descriptions of second-order stochastic dominance involve the integrals of G(x) − F(x) and F⁻¹(p) − G⁻¹(p), as shown in condition (ii) of Theorem 2.5 and in (40), respectively. Both integrals compute the signed area between the two curves. Therefore, the two integrals up to any crossover point like (x_i, P_i) must be equal.

Now consider any point between crossovers, like P′, where F⁻¹(P′) > G⁻¹(P′). Clearly,

$$\int_0^{P'} [F^{-1}(p) - G^{-1}(p)]\,dp > \int_0^{P_i} [F^{-1}(p) - G^{-1}(p)]\,dp = \int_{-\infty}^{x_i} [G(t) - F(t)]\,dt \ge 0. \tag{41}$$

The first inequality follows because the left-hand side integral includes an extra portion where the integrand is positive. The final integral must be nonnegative if F second-order stochastically dominates G.

Similarly, consider any point like P″ where F⁻¹(P″) < G⁻¹(P″). Again,

$$\int_0^{P''} [F^{-1}(p) - G^{-1}(p)]\,dp > \int_0^{P_{i+1}} [F^{-1}(p) - G^{-1}(p)]\,dp = \int_{-\infty}^{x_{i+1}} [G(t) - F(t)]\,dt \ge 0. \tag{42}$$

In this case, the left-hand side integral excludes a portion where the integrand is negative.

Stochastic dominance can also be described using lower partial moments. The nth lower partial moment of the distribution F(x) is defined as

$$\mu_n(z; F) \equiv \int_a^z (z - x)^n\,dF(x). \tag{43}$$

Figure 2.2: Second Order Stochastic Dominance with Quantile Functions

This figure illustrates how second-order stochastic dominance can be defined using quantile functions.

The dominance restrictions on lower partial moments can be determined directly from the tail restrictions. Integrating by parts, the lower partial moments are

$$\begin{aligned}
\mu_1(z; F) &\equiv \int_a^z (z - x)\,dF(x) = (z - x)F(x)\Big|_a^z + \int_a^z F(x)\,dx = \int_a^z F(x)\,dx \\
\mu_2(z; F) &\equiv \int_a^z (z - x)^2\,dF(x) = (z - x)^2 F(x)\Big|_a^z + 2\int_a^z (z - x)F(x)\,dx.
\end{aligned} \tag{44}$$

Therefore, μ₁(z; G) ≥ μ₁(z; F) ∀z is equivalent to condition (ii) in Theorem 2.5 for second-order stochastic dominance. Similarly, μ₂(z; G) ≥ μ₂(z; F) ∀z is equivalent to the last condition in (34) for third-order stochastic dominance. In general, the nth lower partial moment is related to stochastic dominance of order n + 1.
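The integration by parts in (44) can be spot-checked by numerical quadrature. The sketch assumes an exponential distribution with mean 1 (so a = 0) and uses a plain midpoint rule, so agreement holds only up to quadrature error. It also compares exponentials with means 1 and 2. The mean-2 distribution first-order, and hence second-order, dominates the mean-1 distribution. So μ₁(z; G) for the mean-1 case should be at least μ₁(z; F) for the mean-2 case. All names are hypothetical.

```python
import math

def midpoint(f, lo, hi, n=20_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def exp_pdf(m):
    return lambda x: math.exp(-x / m) / m

def exp_cdf(m):
    return lambda x: 1.0 - math.exp(-x / m)

def lpm(pdf, z, n):
    """mu_n(z; F) = integral of (z - x)^n dF(x) over [0, z], as in (43) with a = 0."""
    return midpoint(lambda x: (z - x) ** n * pdf(x), 0.0, z)

pdf1, cdf1 = exp_pdf(1.0), exp_cdf(1.0)   # assumed example: exponential, mean 1
checks = []
for z in (0.5, 1.0, 3.0):
    mu1_direct = lpm(pdf1, z, 1)
    mu1_parts = midpoint(cdf1, 0.0, z)                                   # first line of (44)
    mu2_direct = lpm(pdf1, z, 2)
    mu2_parts = 2.0 * midpoint(lambda x: (z - x) * cdf1(x), 0.0, z)    # second line of (44)
    checks.append((z, mu1_direct, mu1_parts, mu2_direct, mu2_parts))
```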

Stochastic Dominance and Generalized Means

The lower partial moments give a complete description of stochastic dominance. Generalized means are easy to determine and provide necessary conditions for second-order stochastic dominance for positive random variables.¹⁴ Generalized means are also known as power means or Hölder means. The generalized α-mean for a positive random variable is defined as

$$G_\alpha(\tilde x) \equiv \left(\mathcal{E}[\tilde x^{\alpha}]\right)^{1/\alpha}. \tag{45}$$

The standard mean is ℰ[x] = G₁(x). Other special cases are the harmonic mean, $G_{-1}$; the root mean square, $G_2$; the geometric mean, $\lim_{\alpha \to 0} G_\alpha$; the minimum value, $\lim_{\alpha \to -\infty} G_\alpha$; and the maximum value, $\lim_{\alpha \to \infty} G_\alpha$.
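The special cases above can be computed directly from (45). The sketch uses a hypothetical positive variable that takes the values 1 and 4 with equal probability. α = 0 is handled as the geometric-mean limit exp(ℰ[ln x]). Large |α| stands in for the minimum and maximum limits. The means increase with α, in line with property iii) below.

```python
import math

def generalized_mean(values, probs, alpha):
    """G_alpha(x) = (E[x^alpha])^(1/alpha) for positive outcomes;
    alpha = 0 uses the geometric-mean limit exp(E[ln x])."""
    if any(v <= 0 for v in values):
        raise ValueError("outcomes must be positive")
    if alpha == 0:
        return math.exp(sum(p * math.log(v) for v, p in zip(values, probs)))
    return sum(p * v**alpha for v, p in zip(values, probs)) ** (1.0 / alpha)

values, probs = [1.0, 4.0], [0.5, 0.5]          # hypothetical positive random variable
alphas = [-50, -1, 0, 1, 2, 50]
means = {a: generalized_mean(values, probs, a) for a in alphas}
# harmonic 1.6, geometric 2.0, arithmetic 2.5, root mean square sqrt(8.5);
# alpha = -50 and 50 approach the minimum 1 and maximum 4
```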

All of these generalized means:

i) take values within the domain of x; that is, if a ≤ x ≤ b, then a ≤ $G_\alpha(x)$ ≤ b;

ii) are homogeneous of degree one, $G_\alpha(\lambda x) = \lambda G_\alpha(x)$ for λ > 0; and

iii) satisfy the relation that if α < δ, then $G_\alpha(x) \le G_\delta(x)$.

The last inequality is strict unless x is constant. The first two properties are obvious from the definitions. The last property follows from Jensen's inequality, because $f(x) \equiv x^{\alpha/\delta}$ is a concave function if 0 < α < δ.

A necessary though not sufficient condition for x to first-order stochastically dominate y is that x has higher generalized means for all α. A necessary though not sufficient condition for x to second-order stochastically dominate y is that x has higher generalized means for all α ≤ 1. These statements follow directly from the definitions of stochastic dominance. The function $\varphi(z) \equiv (\alpha z)^{1/\alpha}$ is strictly increasing wherever αz > 0. So for the CRRA utility function $u_\alpha(x) = x^\alpha/\alpha$,

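$$\mathcal{E}[u_\alpha(\tilde x)] \ge \mathcal{E}[u_\alpha(\tilde y)] \;\Rightarrow\; G_\alpha(\tilde x) = \left(\alpha\,\mathcal{E}[\tilde x^\alpha/\alpha]\right)^{1/\alpha} \ge \left(\alpha\,\mathcal{E}[\tilde y^\alpha/\alpha]\right)^{1/\alpha} = G_\alpha(\tilde y). \tag{46}$$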

For first-order stochastic dominance, the first inequality must hold for all increasing u, including all $u_\alpha$. For second-order stochastic dominance, it must hold for all increasing, concave u, including all $u_\alpha$ with α ≤ 1. Therefore, the generalized means must be ordered for all α in the first case and for all α ≤ 1 in the second.

Comparative Risk Aversion

With risk precisely defined, we can discuss risk aversion in more detail. A natural place to begin is by relating "riskier" to "more risk averse." One agent (or utility function) is naturally said to be more risk averse than another when the first always makes at least as safe a choice. Two such notions have been developed. The more common comparison is due to Arrow and Pratt. Under it, one agent is weakly more risk averse than another if, in every comparison between a safe and a risky choice, the first agent makes the safe choice whenever the second does. In other words, the less risk averse agent assigns a smaller (or equal) risk premium to every risk.

¹⁴ This also applies to continuous nonnegative random variables, provided the means are defined.


The second comparison is based on the Rothschild and Stiglitz definition of risk. Under it, one agent is weakly more risk averse than another if he chooses the less risky of two alternatives whenever the other agent does. As shown previously, these two notions of risk aversion are identical under EUT, so they will always identify the same utility functions as more risk averse.

Theorem 2.7: Weakly More Risk Averse (Arrow-Pratt). A twice-differentiable single-attribute utility function u₁ is weakly more risk averse than another, u₂, if the following four equivalent conditions hold:¹⁵

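$$\begin{aligned}
&\text{i)} && \pi_1(x, \tilde z) \ge \pi_2(x, \tilde z) \ \ \forall x, \tilde z, && \text{where } \mathcal{E}[u_i(x + \tilde z)] \equiv u_i(x + \bar z - \pi_i) \\
&\text{ii)} && A_1(x) \ge A_2(x) \ \ \forall x, && \text{where } A_i(x) \equiv -u_i''(x)/u_i'(x) \\
&\text{iii)} && \exists\, G \text{ with } G' > 0,\ G'' \le 0 \text{ such that } u_1(x) = G(u_2(x)) && \\
&\text{iv)} && g''(t) \le 0 \ \ \forall t, && \text{where } g(t) \equiv u_1(u_2^{-1}(t)).
\end{aligned}$$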

The utility function u₁ is strictly more risk averse than u₂ if each inequality is replaced by a strict inequality.¹⁶

Proof: A risky prospect z is accepted only if its expectation exceeds its risk premium. Therefore, if π₁ ≥ π₂ for all risks, then π₁ < ℰ[z] guarantees π₂ < ℰ[z]. So agent 2 accepts any gamble that agent 1 accepts; equivalently, agent 1 declines any gamble that agent 2 declines. This means that (i) describes a utility function that is more risk averse.

(i) ⇒ (ii): Consider the simple gamble z = ±h with equal probability. For small h, π_i ≈ ½A_i(x)h², so

$$A_1(x) = \lim_{h \to 0} \frac{2\pi_1}{h^2} \ge \lim_{h \to 0} \frac{2\pi_2}{h^2} = A_2(x). \tag{47}$$

(ii) ⇒ (iii): Because u₁ and u₂ are both strictly increasing and twice differentiable, there exists a twice-differentiable function G with positive first derivative such that u₁ = G(u₂). Differentiating twice gives

$$u_1' = G'(u_2)\,u_2', \qquad u_1'' = G''(u_2)\,[u_2']^2 + G'(u_2)\,u_2''. \tag{48}$$

Solving the second equation for G″ and using the first, we have

$$G''(u_2) = \frac{u_1'' - G'(u_2)\,u_2''}{[u_2']^2} = \frac{G'(u_2)}{u_2'}\left[\frac{u_1''}{u_1'} - \frac{u_2''}{u_2'}\right] = \frac{G'(u_2)}{u_2'}\,[-A_1 + A_2]. \tag{49}$$

Therefore, if A₁ ≥ A₂, then G″ ≤ 0.

(iii) ⇒ (i): Consider a generic gamble, z, with expectation ℰ[z] = z̄. Using Jensen's inequality,

$$\begin{aligned}
u_1(x + \bar z - \pi_1) &= \mathcal{E}[u_1(x + \tilde z)] = \mathcal{E}[G(u_2(x + \tilde z))] \\
&\le G(\mathcal{E}[u_2(x + \tilde z)]) = G(u_2(x + \bar z - \pi_2)) = u_1(x + \bar z - \pi_2).
\end{aligned} \tag{50}$$

Therefore, π₁ ≥ π₂ because u₁ is strictly increasing.

(ii) ⇔ (iv): Define t ≡ u₂(x), so x = u₂⁻¹(t). Then dt = u₂′(x)dx, so du₂⁻¹(t)/dt ≡ dx/dt = 1/u₂′(x).

Differentiating g twice gives

¹⁵ Conditions (i), (iii), and (iv) also apply in the obvious fashion to utility functions that are not differentiable. The function g must be concave, and G must be increasing and concave. Obviously, (ii) applies only when the Arrow-Pratt measure is defined, which requires twice-differentiable utility.

¹⁶ This is the usual meaning of "weakly more risk averse," as opposed to "strictly more risk averse." Less commonly, the phrase is used in contrast to Ross' "strongly more risk averse," described in Theorem 2.8.


$$g'(t) = \frac{u_1'(u_2^{-1}(t))}{u_2'(x)}, \qquad g''(t) = \left[\frac{u_1''(u_2^{-1}(t))}{u_2'(x)} - \frac{u_1'(u_2^{-1}(t))\,u_2''(x)}{[u_2'(x)]^2}\right]\frac{dx}{dt} = \frac{u_1'(x)}{[u_2'(x)]^2}\,\big[-A_1(x) + A_2(x)\big]. \tag{51}$$

The final equality follows as $u_2^{-1}(t) = x$. The final fraction is positive, so A1 ≥ A2 if and only if g″ ≤ 0. Changing the weak inequalities in (i) through (iv) to strict inequalities makes the inequalities in (47) through (51) strict as well.
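As a numerical sanity check on (47) and on the direction of the theorem, the sketch below computes exact risk premiums for the fair gamble ±h. It assumes CRRA utility, u(w) = w^(1−ρ)/(1−ρ), which is not part of the theorem but has the convenient Arrow-Pratt measure A(x) = ρ/x; the wealth level, gamble sizes, and helper names are illustrative choices. As h shrinks, 2π/h² should approach A(x), and the more concave function (larger ρ) should demand the larger premium.

```python
import math

def crra_u(w, rho):
    """CRRA utility w^(1-rho)/(1-rho); log utility when rho == 1."""
    if rho == 1:
        return math.log(w)
    return w ** (1 - rho) / (1 - rho)

def crra_u_inv(v, rho):
    """Inverse of crra_u, used to recover a certainty equivalent."""
    if rho == 1:
        return math.exp(v)
    return ((1 - rho) * v) ** (1 / (1 - rho))

def risk_premium(x, h, rho):
    """Premium for the fair gamble +/-h: u(x - pi) = E[u(x + z)]."""
    expected_u = 0.5 * crra_u(x + h, rho) + 0.5 * crra_u(x - h, rho)
    return x - crra_u_inv(expected_u, rho)

x = 10.0
for rho in (2.0, 4.0):
    arrow_pratt = rho / x  # A(x) = rho / x for CRRA utility
    for h in (1.0, 0.1, 0.01):
        pi = risk_premium(x, h, rho)
        print(f"rho={rho}, h={h}: 2*pi/h^2 = {2 * pi / h**2:.5f}, A(x) = {arrow_pratt:.5f}")
```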

So far we have discussed two notions of risk aversion: i) Arrow-Pratt, which says that a risky prospect is never chosen over a fixed payment equal to its expectation, and ii) Rothschild-Stiglitz, which says that no prospect is ever chosen over a less risky prospect with the same expectation. Each of these definitions of risk aversion has a corresponding notion of more risk averse. Although the two definitions of risk aversion are identical (at least under EUT), they do not come to the same conclusion about the magnitude of differences in risk aversion.

Theorem 2.7 demonstrated that if one utility function is more concave than another, then it demands a larger risk premium for bearing any risk in its entirety. But it does not follow that a more concave utility function always demands a larger risk premium for moving from one prospect to a prospect that is riskier in the Rothschild-Stiglitz sense. This can be illustrated with the following example.

There are two risky projects. The first, x, has two outcomes: a probability p of paying 20 and a probability 1 − p of paying 0. The second project, y, has the same probability, 1 − p, of paying 0 and a probability p/2 each of paying 25 and 15. The second project is riskier than the first in the Rothschild-Stiglitz sense because the added variation of ±5 in y when x = 20 is conditionally mean-zero noise.

The risk premium that an investor would pay to give up y in favor of x is the solution to $\mathbb{E}[u(\tilde y)] \equiv \mathbb{E}[u(\tilde x - \pi)]$. The figure shows the risk premium for two exponential utility functions with risk aversions a = 0.05 and 0.09. The premiums are plotted against the probability p of the higher outcome. The more risk averse utility function demands a higher risk premium than the less risk averse utility function only for p > 57.9%.

This result does not coincide with the intuition provided by the simple problem that is the basis for Theorem 2.7. The reason is that the risk premium here provides only partial insurance. It protects against the ±5 risk, but not against the risk of getting 0 instead of 20. The Arrow-Pratt theorem is applicable only to complete insurance.

Figure 2.3: Risk Premiums Illustrating Stronger Risk Aversion
This figure illustrates that Arrow-Pratt risk aversion is insufficient for an increase in risk to require a larger risk premium. The stronger measure of risk aversion due to Ross is required.
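The crossover in Figure 2.3 can be reproduced in closed form. Under exponential utility u(w) = −e^(−aw), initial wealth cancels from the premium defined by E[u(y)] = E[u(x − π)], so π = (1/a) ln(E[e^(−ay)]/E[e^(−ax)]). The sketch below uses the example's payoffs; the helper names, bisection bracket, and tolerance are illustrative choices, and it assumes (as the text's "only for p > 57.9%" suggests) that the two premium curves cross once.

```python
import math

def cara_premium(a, p):
    """Premium pi solving E[u(y)] = E[u(x - pi)] for u(w) = -exp(-a w).

    x pays 20 with prob p and 0 otherwise; y replaces the 20 with 15 or 25,
    each with prob p/2. Under CARA, initial wealth cancels.
    """
    ex = (1 - p) + p * math.exp(-20 * a)
    ey = (1 - p) + 0.5 * p * (math.exp(-15 * a) + math.exp(-25 * a))
    return math.log(ey / ex) / a

def crossover(a_low=0.05, a_high=0.09, lo=0.01, hi=0.99, tol=1e-10):
    """Bisect for the p at which the two premiums are equal."""
    gap = lambda p: cara_premium(a_high, p) - cara_premium(a_low, p)
    # gap < 0 for small p (the more risk-averse agent pays less), > 0 for large p
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for p in (0.3, 0.579, 0.58, 0.9):
    print(f"p={p}: pi(a=0.05)={cara_premium(0.05, p):.5f}, pi(a=0.09)={cara_premium(0.09, p):.5f}")
print(f"premiums cross at p = {crossover():.4f}")
```

The crossing lands between 57.9% and 58%, matching the figure.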

This problem can also be analyzed in the same way as the simple Arrow-Pratt problem. Because $\tilde y \overset{d}{=} \tilde x + \tilde\varepsilon$ with $\mathbb{E}[\tilde\varepsilon \mid \tilde x] = 0$, a Taylor expansion gives

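$$\begin{aligned}
\mathbb{E}[u(\tilde x - \pi_{y\to x})] &\equiv \mathbb{E}[u(\tilde y)] = \mathbb{E}[u(\tilde x + \tilde\varepsilon)]\\
\mathbb{E}[u(\tilde x)] - \pi_{y\to x}\,\mathbb{E}[u'(\tilde x)] &\approx \mathbb{E}[u(\tilde x)] + \mathbb{E}\big[u'(\tilde x)\,\mathbb{E}[\tilde\varepsilon \mid \tilde x]\big] + \tfrac12\,\mathbb{E}\big[u''(\tilde x)\,\mathbb{E}[\tilde\varepsilon^2 \mid \tilde x]\big]\\
\Rightarrow\quad \pi_{y\to x} &\approx \frac{\mathbb{E}\big[-u''(\tilde x)\operatorname{var}[\tilde\varepsilon \mid \tilde x]\big]}{2\,\mathbb{E}[u'(\tilde x)]}.
\end{aligned}\tag{52}$$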
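To see how well the second-order approximation (52) tracks the exact premium in this example, the sketch below evaluates both for exponential utility, an illustrative assumption under which −u″ = a²e^(−ax) and u′ = ae^(−ax). Here x̃ is 0 or 20, and var[ε̃ | x̃] is 0 and 25, respectively; the function names are hypothetical.

```python
import math

def exact_premium(a, p):
    """Exact CARA premium: pi = (1/a) ln(E[exp(-a y)] / E[exp(-a x)])."""
    ex = (1 - p) + p * math.exp(-20 * a)
    ey = (1 - p) + 0.5 * p * (math.exp(-15 * a) + math.exp(-25 * a))
    return math.log(ey / ex) / a

def approx_premium(a, p):
    """Approximation (52): E[-u''(x) var(eps|x)] / (2 E[u'(x)]) for u = -exp(-a w)."""
    # (value of x, probability, conditional variance of eps given x)
    states = [(0.0, 1 - p, 0.0), (20.0, p, 25.0)]
    num = sum(prob * a**2 * math.exp(-a * x) * v for x, prob, v in states)  # -u'' = a^2 e^{-ax}
    den = sum(prob * a * math.exp(-a * x) for x, prob, _ in states)         #  u'  = a e^{-ax}
    return num / (2 * den)

for a in (0.05, 0.09):
    for p in (0.3, 0.9):
        print(f"a={a}, p={p}: exact={exact_premium(a, p):.4f}, approx={approx_premium(a, p):.4f}")
```

In these cases the approximation stays within a few percent of the exact value and, like it, gives the more risk-averse agent the smaller premium when p is small.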

When x is not random, (52) is just the Arrow-Pratt result $\pi \approx \tfrac12 A(x)\operatorname{var}[\tilde\varepsilon]$. When x is random with a constant conditional variance for $\tilde\varepsilon$, then $\pi_{y\to x} \approx \tfrac12\operatorname{var}[\tilde\varepsilon] \times \mathbb{E}[-u''(\tilde x)]/\mathbb{E}[u'(\tilde x)]$, which is almost the same result.¹⁷ However, when the conditional variance depends on x, as it does in this example, the results can be quite different. The risk premium in (52) can be re-expressed as

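$$\pi_{y\to x} \approx \frac{\mathbb{E}\big[-u''(\tilde x)\operatorname{var}[\tilde\varepsilon \mid \tilde x]\big]}{2\,\mathbb{E}[u'(\tilde x)]} = \frac{\mathbb{E}\big[\operatorname{var}[\tilde\varepsilon \mid \tilde x]\big]}{2\,\mathbb{E}[u'(\tilde x)]} \cdot \frac{\mathbb{E}\big[-u''(\tilde x)\operatorname{var}[\tilde\varepsilon \mid \tilde x]\big]}{\mathbb{E}\big[\operatorname{var}[\tilde\varepsilon \mid \tilde x]\big]} = \tfrac12\operatorname{var}[\tilde\varepsilon] \times \frac{\mathbb{E}\big[-u''(\tilde x)\operatorname{var}[\tilde\varepsilon \mid \tilde x]\big] \big/ \mathbb{E}\big[\operatorname{var}[\tilde\varepsilon \mid \tilde x]\big]}{\mathbb{E}[u'(\tilde x)]}. \tag{53}$$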

The numerator of the final fraction is a weighted average of −u″(x), where the weights are the conditional variances at different values of x. For most utility functions, −u″(x) decreases with x, so if $\operatorname{var}[\tilde\varepsilon \mid \tilde x]$ increases with x, the smaller values of −u″(x) will be overweighted, leading to a risk premium that is smaller than predicted by the Arrow-Pratt measure. When −u″ decreases rapidly, the overweighting can be great enough to decrease the risk premium when risk aversion rises. In particular, for the example discussed above,

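$$\pi_{y\to x} \approx \frac{\mathbb{E}\big[-u''(\tilde x)\operatorname{var}[\tilde\varepsilon \mid \tilde x]\big]}{2\,\mathbb{E}[u'(\tilde x)]} = \frac{-(1-p)\,u''(100)\cdot 0 - \tfrac12 p\,u''(115)\cdot 25 - \tfrac12 p\,u''(125)\cdot 25}{2\big[(1-p)\,u'(100) + \tfrac12 p\,u'(115) + \tfrac12 p\,u'(125)\big]}. \tag{54}$$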

Because $\operatorname{var}[\tilde\varepsilon \mid \tilde x = 100] = 0$, the marginal benefit of the insurance depends only on the second derivative at the high outcomes. However, the cost of the insurance depends on the average marginal utility across all outcomes. As p decreases, the numerator decreases more rapidly than the denominator and in the limit goes to zero, while the denominator remains positive. The intuition that greater risk aversion leads to larger risk premiums even for partially insured risks can be restored with a stronger notion of risk aversion. That is provided in the following theorem.

Theorem 2.8: Strongly More Risk Averse (Ross). A twice-differentiable utility function u1 is strongly more risk averse than another, u2, if the following three equivalent conditions are true:

¹⁷ Note, however, that the ratio of expectations in the final line of (52) is not the expected value of the Arrow-Pratt measure.


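$$\begin{aligned}
&\text{i)}\quad \pi_1 \ge \pi_2 \quad \forall\, \tilde x, \tilde\varepsilon \text{ with } \mathbb{E}[\tilde\varepsilon \mid \tilde x] = 0, \text{ where } \mathbb{E}[u_i(\tilde x + \tilde\varepsilon)] \equiv \mathbb{E}[u_i(\tilde x - \pi_i)]\\
&\text{ii)}\quad \exists\, \lambda > 0 \text{ such that } \frac{u_1''(x)}{u_2''(x)} \ge \lambda \ge \frac{u_1'(y)}{u_2'(y)} \quad \forall\, x, y\\
&\text{iii)}\quad \exists\, \lambda > 0 \text{ and } G(x) \text{ with } G' \le 0,\ G'' \le 0 \text{ such that } u_1(x) = \lambda u_2(x) + G(x).
\end{aligned}\tag{55}$$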

The utility function u1 is strictly more risk averse than u2 if each inequality in (55) is replaced by a strict inequality.

Proof: (i) ⇒ (ii): As condition (i) holds for all gambles, we need only consider a simple one like that in the example. For the less risky project, the payoff is ℓ with probability 1 − p and h with probability p. The riskier project has the same low payoff with the same probability. The higher payoff is split into $x + \varepsilon = h \pm \varepsilon$, each with probability ½p. Then

$$\mathbb{E}[u(\tilde x + \tilde\varepsilon)] = (1-p)\,u(\ell) + \tfrac12 p\,u(h + \varepsilon) + \tfrac12 p\,u(h - \varepsilon) \equiv (1-p)\,u(\ell - \pi) + p\,u(h - \pi) = \mathbb{E}[u(\tilde x - \pi)]. \tag{56}$$

Differentiating implicitly with respect to ε gives

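$$\frac{\partial\pi(\varepsilon)}{\partial\varepsilon} = -\frac{\tfrac12 p\,[u'(h + \varepsilon) - u'(h - \varepsilon)]}{(1-p)\,u'(\ell - \pi(\varepsilon)) + p\,u'(h - \pi(\varepsilon))} \;\Rightarrow\; \left.\frac{\partial\pi(\varepsilon)}{\partial\varepsilon}\right|_{\varepsilon = 0} = 0. \tag{57}$$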

The second derivative is

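$$\frac{\partial^2\pi(\varepsilon)}{\partial\varepsilon^2} = -\frac{\tfrac12 p\,[u''(h + \varepsilon) + u''(h - \varepsilon)]}{(1-p)\,u'(\ell - \pi(\varepsilon)) + p\,u'(h - \pi(\varepsilon))} - \frac{\tfrac12 p\,\big[p\,u''(h - \pi(\varepsilon)) + (1-p)\,u''(\ell - \pi(\varepsilon))\big]\big[u'(h + \varepsilon) - u'(h - \varepsilon)\big]}{\big[(1-p)\,u'(\ell - \pi(\varepsilon)) + p\,u'(h - \pi(\varepsilon))\big]^2}\,\frac{\partial\pi(\varepsilon)}{\partial\varepsilon}. \tag{58}$$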

Therefore,

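$$\pi''(0) \equiv \left.\frac{\partial^2\pi}{\partial\varepsilon^2}\right|_{\varepsilon = 0,\ \pi(0) = 0} = \frac{-p\,u''(h)}{(1-p)\,u'(\ell - \pi(\varepsilon)) + p\,u'(h - \pi(\varepsilon))} = \frac{-p\,u''(h)}{(1-p)\,u'(\ell) + p\,u'(h)}. \tag{59}$$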

Condition (i) must hold for all ε and p. For very small ε, $\pi(\varepsilon) \approx \pi(0) + \pi'(0)\,\varepsilon + \tfrac12\pi''(0)\,\varepsilon^2 = \tfrac12\pi''(0)\,\varepsilon^2$. It follows, then, that π1 ≥ π2 for lotteries with all sizes of ε only if $\pi_1''(0) \ge \pi_2''(0)$. From (59),

$$\frac{u_1''(h)}{u_2''(h)} \ge \frac{p\,u_1'(h) + (1-p)\,u_1'(\ell)}{p\,u_2'(h) + (1-p)\,u_2'(\ell)}. \tag{60}$$

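This must be true for all h, ℓ, and p. Letting p → 0 in (60) shows that $u_1''(h)/u_2''(h) \ge u_1'(\ell)/u_2'(\ell)$ for every h and ℓ, so a constant λ can be chosen between the infimum of the left side and the supremum of the right side:

$$\frac{u_1''(h)}{u_2''(h)} \ge \lambda \ge \frac{u_1'(\ell)}{u_2'(\ell)} \quad \forall\, h, \ell. \tag{61}$$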

(ii) ⇒ (iii): Define $G(x) \equiv u_1(x) - \lambda u_2(x)$. Differentiating, $G' = u_1' - \lambda u_2' = u_2'(u_1'/u_2' - \lambda)$, so G′ ≤ 0 because $u_2' > 0$ and $u_1'/u_2' \le \lambda$. The second derivative is $G'' = u_1'' - \lambda u_2'' = u_2''(u_1''/u_2'' - \lambda)$, so G″ ≤ 0 because $u_2'' < 0$ and $u_1''/u_2'' \ge \lambda$; that is, G is weakly concave.

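(iii) ⇒ (i): In the following string, the first inequality follows from Jensen's inequality applied to the concave function G, conditional on $\tilde x$ (recall $\mathbb{E}[\tilde\varepsilon \mid \tilde x] = 0$). The second inequality follows because G is a decreasing function and π2 is nonnegative: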


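$$\begin{aligned}
\mathbb{E}[u_1(\tilde x - \pi_1)] &\equiv \mathbb{E}[u_1(\tilde x + \tilde\varepsilon)] = \lambda\,\mathbb{E}[u_2(\tilde x + \tilde\varepsilon)] + \mathbb{E}[G(\tilde x + \tilde\varepsilon)] \le \lambda\,\mathbb{E}[u_2(\tilde x + \tilde\varepsilon)] + \mathbb{E}[G(\tilde x)]\\
&\equiv \lambda\,\mathbb{E}[u_2(\tilde x - \pi_2)] + \mathbb{E}[G(\tilde x)] \le \lambda\,\mathbb{E}[u_2(\tilde x - \pi_2)] + \mathbb{E}[G(\tilde x - \pi_2)] \equiv \mathbb{E}[u_1(\tilde x - \pi_2)].
\end{aligned}\tag{62}$$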

Because $\mathbb{E}[u_1(\tilde x - \pi_1)] \le \mathbb{E}[u_1(\tilde x - \pi_2)]$ and u1 is an increasing function, it follows that π1 ≥ π2. Obviously, condition (ii) in this theorem implies condition (ii) in Theorem 2.7: setting y = x gives $u_1''/u_2'' \ge u_1'/u_2'$, that is, A1 ≥ A2. So the Arrow-Pratt definition includes more utility functions than the Ross definition.
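Condition (ii) of (55) can be checked on a grid for a concrete pair. The sketch below assumes two exponential utilities, u_i(w) = −e^(−a_i w), with a1 = 0.09 and a2 = 0.05 as in Figure 2.3, and a bounded wealth range chosen for illustration; the function name and grid size are hypothetical. Although u1 is more risk averse in the Arrow-Pratt sense at every wealth level, the Ross condition fails over a range as wide as [100, 125] (the outcomes appearing in (54)), consistent with the premium reversal in the figure, and holds only on a narrower range such as [100, 110].

```python
import math

def ross_condition_holds(a1, a2, lo, hi, n=1001):
    """Grid check of (55)(ii) for u_i(w) = -exp(-a_i w) on [lo, hi]:
    a lambda exists iff min_x u1''/u2'' >= max_y u1'/u2'."""
    grid = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    d1 = lambda a, w: a * math.exp(-a * w)          # u'(w)
    d2 = lambda a, w: -a * a * math.exp(-a * w)     # u''(w)
    min_second = min(d2(a1, w) / d2(a2, w) for w in grid)
    max_first = max(d1(a1, w) / d1(a2, w) for w in grid)
    return min_second >= max_first

print(ross_condition_holds(0.09, 0.05, 100.0, 125.0))  # False
print(ross_condition_holds(0.09, 0.05, 100.0, 110.0))  # True
```

For this CARA pair the check reduces to ln(a1/a2) ≥ (a1 − a2)(hi − lo), so any sufficiently wide range breaks the Ross ordering.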

To summarize, weak (Arrow-Pratt) risk aversion means $\mathbb{E}[u(\tilde x)] \le u(\mathbb{E}[\tilde x])$. Strong (Rothschild-Stiglitz) risk aversion means $\mathbb{E}[u(\tilde x + \tilde\varepsilon)] \le \mathbb{E}[u(\tilde x)]$ when $\mathbb{E}[\tilde\varepsilon \mid \tilde x] = 0$. These two ideas are identical under EUT. However, strongly (Ross) more risk averse means the risk premium is larger: $\pi_1 \ge \pi_2$ for all $\tilde\varepsilon$, where $\mathbb{E}[u_i(\tilde x + \tilde\varepsilon)] \equiv \mathbb{E}[u_i(\tilde x - \pi_i)]$. Weakly (Arrow-Pratt) more risk averse is the special case when $\tilde x$ is constant. These two ideas are distinct, as shown in Theorem 2.8.

Changing Risk Aversion

The concept of more risk averse can also be applied to a given utility function evaluated at different points. Utility can be more or less risk averse at different levels of wealth. For the utility of wealth, it is customarily assumed and generally experimentally verified that absolute risk aversion is decreasing. A utility function displays decreasing absolute risk aversion (DARA) if the risk premium for every gamble is decreasing in wealth. The definition of the risk premium is $\mathbb{E}[u(x + \tilde z)] = u(x - \pi_u(x))$. Differentiating totally with respect to x gives

$$\mathbb{E}[u'(x + \tilde z)] = u'(x - \pi_u(x))\,[1 - \pi_u'(x)] \;\Rightarrow\; \pi_u'(x) = \frac{u'(x - \pi_u(x)) - \mathbb{E}[u'(x + \tilde z)]}{u'(x - \pi_u(x))}. \tag{63}$$

Now define v(x) ≡ −u′(x). This is an increasing function because u is concave, so it can also be thought of as another utility function.¹⁸ Because the denominator in (63) is positive, the risk premium is decreasing with wealth if and only if $v(x - \pi_u(x)) \ge \mathbb{E}[v(x + \tilde z)]$. The right-hand side of that relation defines the v-utility risk premium, $\mathbb{E}[v(x + \tilde z)] \equiv v(x - \pi_v(x))$. So DARA is equivalent to the u risk premium being smaller than the v risk premium. By Theorem 2.7, this means that v is more risk averse than u and is therefore a concave transformation of it. So a utility function displays DARA if and only if $-u'(x) = g(u(x))$, where g is an increasing concave function. A utility function displays strict DARA if the inequalities here are strict, or g is strictly concave.

If u is thrice differentiable, then

$$A'(x) = -\frac{u'''(x)}{u'(x)} + \frac{[u''(x)]^2}{[u'(x)]^2} = -\frac{1}{[u'(x)]^2}\Big(u'(x)\,u'''(x) - [u''(x)]^2\Big). \tag{64}$$

For DARA, this must be negative, which can be true only if the third derivative of u is positive. In fact, the third derivative must be strictly positive even when utility displays local (or global) constant absolute risk aversion. However, a positive third derivative is not sufficient by itself for DARA. For example, the third derivative of $-e^{-aw}$ is positive, but its absolute risk aversion is constant, not strictly decreasing; and for quadratic utility, whose third derivative is zero, absolute risk aversion is increasing.
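Equation (64) makes these sign claims easy to check. The sketch below evaluates A′(x) from (64) using closed-form derivatives for three illustrative functions: CARA, CRRA, and quadratic utility. The parameter values and the helper name are arbitrary assumptions, and the quadratic is evaluated where it is still increasing.

```python
import math

def slope_of_A(d1, d2, d3, x):
    """A'(x) from (64): -u'''/u' + (u''/u')^2, given u', u'', u'''."""
    return -d3(x) / d1(x) + (d2(x) / d1(x)) ** 2

a, rho, b = 0.5, 3.0, 0.05  # illustrative parameters
cases = {
    # name: (u', u'', u''')
    "CARA": (lambda w: a * math.exp(-a * w),
             lambda w: -a**2 * math.exp(-a * w),
             lambda w: a**3 * math.exp(-a * w)),
    "CRRA": (lambda w: w ** -rho,
             lambda w: -rho * w ** (-rho - 1),
             lambda w: rho * (rho + 1) * w ** (-rho - 2)),
    "quadratic": (lambda w: 1 - b * w,   # increasing only for w < 1/b = 20
                  lambda w: -b,
                  lambda w: 0.0),
}
for name, (d1, d2, d3) in cases.items():
    print(f"{name}: u'''(5) = {d3(5.0):.5f}, A'(5) = {slope_of_A(d1, d2, d3, 5.0):.6f}")
```

CARA has a positive third derivative yet A′ = 0; CRRA gives A′ = −ρ/x²; the quadratic has u‴ = 0 and A′ > 0.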

An alternate characterization of DARA is that marginal utility is log-convex. If $\ln u'(x)$ is convex, then $-\ln u'(x)$ is concave, and

¹⁸ The utility function v need not be concave. However, it is proved just below that it must be if u is DARA.


$$0 \ge -\frac{d^2}{dx^2}\ln u'(x) = \frac{d}{dx}\left[\frac{-u''(x)}{u'(x)}\right] = \frac{dA(x)}{dx}. \tag{65}$$

In terms of preference relations, DARA can be defined as follows: any gamble weakly rejected at a wealth of x must be weakly rejected at any lower wealth x − k < x; that is,

$$x + \tilde y \precsim x \;\Rightarrow\; x - k + \tilde y \precsim x - k. \tag{66}$$

For strictly DARA, the weak preferences are replaced with strict preferences, and the inequalities in the previous equations are also strict.

Relative risk aversion is decreasing if

$$0 \ge \frac{dR(x)}{dx} = \frac{d[xA(x)]}{dx} = A(x) + x\,\frac{dA(x)}{dx}. \tag{67}$$

Relative risk aversion makes sense only when all outcomes are positive, so, assuming A > 0, strictly decreasing absolute risk aversion is a necessary condition for decreasing relative risk aversion. Conversely, increasing relative risk aversion is a necessary condition for increasing absolute risk aversion. However, it is possible to have decreasing absolute risk aversion along with decreasing, constant, or increasing relative risk aversion. The LRT utility function $(w - a)^\gamma/\gamma$ (with γ < 1) has decreasing absolute risk aversion, but relative risk aversion is decreasing, constant, or increasing as a is positive, zero, or negative, respectively.
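A quick finite-difference check of the LRT claim, assuming γ = 1/2 and evaluating at w = 20 (both arbitrary): for u(w) = (w − a)^γ/γ, u′ = (w − a)^(γ−1), so A(w) = (1 − γ)/(w − a) and R(w) = wA(w). The helper names are hypothetical.

```python
def lrt_measures(w, a, gamma):
    """A(w) and R(w) = w A(w) for u(w) = (w - a)^gamma / gamma, gamma < 1, w > a."""
    A = (1 - gamma) / (w - a)  # -u''/u' since u' = (w - a)^(gamma - 1)
    return A, w * A

def slopes(a, gamma=0.5, w=20.0, h=1e-5):
    """Central-difference slopes dA/dw and dR/dw."""
    A_up, R_up = lrt_measures(w + h, a, gamma)
    A_dn, R_dn = lrt_measures(w - h, a, gamma)
    return (A_up - A_dn) / (2 * h), (R_up - R_dn) / (2 * h)

for a in (5.0, 0.0, -5.0):
    dA, dR = slopes(a)
    print(f"a={a:+.0f}: dA/dw={dA:.6f}, dR/dw={dR:.6f}")
```

Analytically, dR/dw = −a(1 − γ)/(w − a)², so its sign is opposite to that of a, while dA/dw is always negative.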

There is mixed evidence on whether relative risk aversion is increasing or decreasing. Early experiments and surveys indicated that relative risk aversion was increasing; some later studies have shown the opposite. One problem in determining the change in relative risk aversion is the confounding effect that risk aversion has on wealth. Those who are less relatively risk averse are willing to invest more heavily in riskier prospects. If, as should be the case, risky prospects earn more on average than safe prospects, then those who are less risk averse will tend to become wealthier. Therefore, in a cross section of people, those who are wealthier will tend to display less risk aversion. It is difficult to tell if this is due to decreasing relative risk aversion or if their innately lower risk aversion led to their higher wealth.

Prudence

The third derivative of a utility function is called prudence. This concept, though not the name, was first introduced by Leland (1968). A risk averse utility function is prudent if marginal utility is convex, $u'''(x) > 0$, and imprudent if marginal utility is concave. The ratio

$$P(x) \equiv -\frac{u'''(x)}{u''(x)} \tag{68}$$

is referred to as absolute prudence. As with risk aversion, relative prudence is defined as xP(x). Prudence can be defined in terms of simple gambles as follows. A utility function displays (strict) prudence if the 50-50 lottery $(-k, \tilde x)$ is (strictly) preferred to the 50-50 lottery $(0, \tilde x - k)$ for all k > 0 and $\tilde x$ with $\mathbb{E}[\tilde x] = 0$. If this is true at all initial wealth levels, then the utility function is globally prudent (see Eeckhoudt and Schlesinger 2006). This shows that prudence is a preference for disaggregating a certain loss and a zero-mean risk; the utility function prefers the risk to be combined with the better outcome (0 rather than −k). There are many similarities between prudence and risk aversion. Most obviously from the definition in (68), prudence is the "risk aversion" of marginal utility. As shown below, prudence is related to the precautionary premium just as risk aversion is related to the risk premium. Also, its relation to the third moment of the distribution is similar to that between risk aversion and variance.

For exponential utility, absolute prudence is equal to absolute risk aversion. For utility with a constant relative risk aversion of ρ, relative prudence is constant at 1 + ρ. In general, the relation between Arrow-Pratt risk aversion and prudence can be determined by

$$A'(x) = -\frac{u'''(x)}{u'(x)} + A^2(x) = -A(x)P(x) + A^2(x) \;\Rightarrow\; P(x) = A(x) - \frac{A'(x)}{A(x)}. \tag{69}$$

Absolute prudence exceeds absolute risk aversion when the latter is decreasing, so DARA implies prudence.
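The identity (69) and the CRRA claim above can be verified directly. The sketch assumes CRRA marginal utility u′(x) = x^(−ρ) and evaluates the derivatives in closed form; ρ = 2, x = 4, and the helper name are arbitrary choices.

```python
def crra_measures(x, rho):
    """A, A', and P for CRRA marginal utility u'(x) = x^(-rho)."""
    d1 = x ** -rho
    d2 = -rho * x ** (-rho - 1)
    d3 = rho * (rho + 1) * x ** (-rho - 2)
    A = -d2 / d1
    P = -d3 / d2
    dA = -d3 / d1 + (d2 / d1) ** 2  # equation (64)
    return A, dA, P

x, rho = 4.0, 2.0
A, dA, P = crra_measures(x, rho)
print(P, A - dA / A, (1 + rho) / x)  # the three values agree
```

Because A′ < 0 here, P exceeds A, as the sentence above states.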

Leland showed that prudence is related to optimal savings. Consider the maximization problem for a saver with initial wealth w0, random time-1 income $\tilde y$, and time-additive utility. The problem and first-order condition are¹⁹

$$\max_s\; u_0(w_0 - s) + \mathbb{E}[u_1(s + \tilde y)] \qquad\qquad 0 = -u_0'(w_0 - s^*) + \mathbb{E}[u_1'(s^* + \tilde y)]. \tag{70}$$
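Leland's comparison can be illustrated by solving the first-order condition (70) numerically. The sketch assumes the same CRRA utility in both periods with u′(c) = c^(−ρ), ρ = 2, w0 = 10, and compares a certain income of 5 with a 50-50 income of 3 or 7 (a mean-preserving spread); all of these values and the function name are illustrative. Because u0′(w0 − s) rises with s while the expected marginal utility falls, bisection finds the unique root.

```python
def optimal_saving(w0, incomes, probs, rho=2.0, tol=1e-12):
    """Solve u0'(w0 - s) = E[u1'(s + y)] by bisection for CRRA
    marginal utility u'(c) = c^(-rho), same utility in both periods."""
    mu = lambda c: c ** -rho
    foc = lambda s: mu(w0 - s) - sum(p * mu(s + y) for y, p in zip(incomes, probs))
    lo = -min(incomes) + 1e-9   # keep time-1 consumption positive
    hi = w0 - 1e-9              # keep time-0 consumption positive
    # foc is negative near lo, positive near hi, and increasing in s
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if foc(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

s_safe = optimal_saving(10.0, [5.0], [1.0])
s_risky = optimal_saving(10.0, [3.0, 7.0], [0.5, 0.5])  # mean-preserving spread
print(s_safe, s_risky)
```

With certain income the saver splits consumption evenly (s = 2.5); with the riskier income, convex marginal utility pushes saving higher.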

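Now compare savings for two different incomes, with $\tilde y_2$ riskier than $\tilde y_1$: $\tilde y_2 \overset{d}{=} \tilde y_1 + \tilde\varepsilon$ with $\mathbb{E}[\tilde\varepsilon \mid \tilde y_1] = 0$. For a prudent investor, $u_1'$ is convex, so, by Jensen's inequality,

$$u_0'(w_0 - s_1^*) = \mathbb{E}[u_1'(s_1^* + \tilde y_1)] \le \mathbb{E}[u_1'(s_1^* + \tilde y_2)]. \tag{71}$$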

Because marginal utility is decreasing, an increase in saving will decrease the right-hand side and increase the left-hand side. Therefore, the optimal saving when income is riskier must be larger for a prudent investor.

Another feature of prudence is its relation to the third moment of a risk. Just as a utility function's second derivative is associated with variance, its third derivative is associated with skewness, or the third moment.²⁰ Using a three-term Taylor expansion, the expected utility of an uncertain outcome can be expressed as

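$$\mathbb{E}[u(\tilde x)] = u(\bar x) + u'(\bar x)\,\mathbb{E}[\tilde x - \bar x] + \tfrac12 u''(\bar x)\,\mathbb{E}[(\tilde x - \bar x)^2] + \tfrac16 u'''(\bar x)\,\mathbb{E}[(\tilde x - \bar x)^3] + \cdots \tag{72}$$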

The risk premium can be approximated by equating (72) and $u(\bar x - \pi) \approx u(\bar x) - u'(\bar x)\,\pi$:

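$$\pi \approx -\frac12\,\frac{u''(\bar x)}{u'(\bar x)}\operatorname{var}[\tilde x] - \frac16\,\frac{u'''(\bar x)}{u'(\bar x)}\operatorname{skew}[\tilde x] = A(\bar x)\Big[\tfrac12\operatorname{var}[\tilde x] - \tfrac16 P(\bar x)\operatorname{skew}[\tilde x]\Big] \tag{73}$$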

where $\operatorname{skew}[\tilde x] \equiv \mathbb{E}[(\tilde x - \bar x)^3]$. So a prudent investor demands a larger risk premium when the risk has a larger left tail (negative skewness). Similarly, if the fourth derivative is negative, then kurtosis is disliked, at least for small gambles. This approximate result continues for higher derivatives and moments. However, in all cases the relations between preferences for moments and derivatives of the utility function are guaranteed to be true only for small gambles. Brockett and Kahane (1992) show that if the derivatives of the utility function alternate in sign at least through n, it is always possible to find two random variables x and y with any specified orderings on their first n moments such that either ordering is preferred.

¹⁹ The interest rate on saving has been set to zero, but that is with no loss of generality. Interest received could be added to the argument of u1, or time-1 consumption and income could be measured in present value terms.
²⁰ In statistics, skewness generally refers to the third moment normalized by the cube of the standard deviation. In finance, skewness is often defined as the unnormalized third moment.

In particular, consider the two random variables

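$$\tilde x \equiv \begin{cases} 2.45 & \text{prob } 0.51\\ 7.5 & \text{prob } 0.49 \end{cases} \qquad \mathbb{E}[\tilde x] = 4.9245, \quad \operatorname{var}[\tilde x] = 6.373, \quad \mathbb{E}[(\tilde x - \bar x)^3] = 0.644$$

$$\tilde y \equiv \begin{cases} 0 & \text{prob } 0.12\\ 4.9 & \text{prob } 0.75\\ 10 & \text{prob } 0.13 \end{cases} \qquad \mathbb{E}[\tilde y] = 4.975, \quad \operatorname{var}[\tilde y] = 6.257, \quad \mathbb{E}[(\tilde y - \bar y)^3] = 1.719 \tag{74}$$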
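The moments and expected utilities quoted for (74) can be recomputed directly. The sketch below uses the distributions in (74); the quadratic utility w − 0.025w², which is increasing on [0, 10], is an added illustration not taken from the text, and the helper names are hypothetical.

```python
import math

x = [(2.45, 0.51), (7.5, 0.49)]                  # (value, probability)
y = [(0.0, 0.12), (4.9, 0.75), (10.0, 0.13)]

def moments(dist):
    """Mean, variance, and unnormalized third central moment."""
    mean = sum(v * p for v, p in dist)
    var = sum((v - mean) ** 2 * p for v, p in dist)
    third = sum((v - mean) ** 3 * p for v, p in dist)
    return mean, var, third

def expected_utility(dist, u):
    return sum(u(v) * p for v, p in dist)

cara = lambda w: -math.exp(-w)        # CARA with a = 1
root = math.sqrt                      # u(w) = w^(1/2)
quad = lambda w: w - 0.025 * w * w    # illustrative quadratic, increasing on [0, 10]

for name, dist in (("x", x), ("y", y)):
    m, v, t = moments(dist)
    print(f"{name}: mean={m:.4f} var={v:.3f} third={t:.3f} "
          f"cara={expected_utility(dist, cara):.3f} "
          f"sqrt={expected_utility(dist, root):.4f} "
          f"quad={expected_utility(dist, quad):.4f}")
```

CARA and square-root utility rank x first despite its worse moments, while the quadratic ranks y first.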

The random variable y has a higher expectation, smaller variance, and larger skewness. Nevertheless, for CARA utility with a = 1, $\mathbb{E}[u(\tilde x)] = -0.044 > \mathbb{E}[u(\tilde y)] = -0.126$. This is also true for the utility function $u(x) = x^{1/2}$. On the other hand, a quadratic or cubic utility function that was increasing and concave over the relevant range would obviously rank y above x.

Higher Order Derivatives and Completely Monotonic Utility

Most commonly used utility functions have derivatives that alternate in sign. This means that, subject to the small-gamble caution noted above, odd central moments are liked and even central moments are disliked. Not only is this true of common utility functions, it is essentially required. To be precise, if each of a utility function's first n derivatives has a constant sign on some semi-infinite range $(\underline{x}, \infty)$, then those signs must alternate: $u' > 0$, $u'' < 0$, …, $u^{(2n)} < 0$, $u^{(2n+1)} > 0$, or more succinctly, $f_i(x) \equiv (-1)^i u^{(i)}(x) < 0\ \ \forall\, i \le n$. This can be proved by induction.

Assume that the first n + 1 derivatives of u each have a constant sign on $(\underline{x}, \infty)$, with $f_i(x) < 0$ for i = 1, …, n, while $f_{n+1}$ breaks the pattern with $f_{n+1}(x) = -f_n'(x) > 0$. From the mean value theorem,

$$\exists\, x^* \in (x_1, x_2) \text{ such that } f_{n-1}(x_2) = f_{n-1}(x_1) + f_{n-1}'(x^*)(x_2 - x_1) = f_{n-1}(x_1) - f_n(x^*)(x_2 - x_1) > f_{n-1}(x_1) - f_n(x_1)(x_2 - x_1). \tag{75}$$

The last inequality follows because $f_n'(x) = -f_{n+1}(x) < 0$, so $f_n$ is a strictly decreasing function. The points of evaluation are arbitrary except for satisfying $x_2 > x_1 > \underline{x}$. The domain of x is unbounded by assumption, $f_{n-1}(x_1) < 0$, and $f_n(x_1) < 0$, so we can choose $x_2 > x_1 + f_{n-1}(x_1)/f_n(x_1)$. For this value, (75) says

$$f_{n-1}(x_2) > f_{n-1}(x_1) - f_n(x_1)(x_2 - x_1) > f_{n-1}(x_1) - f_n(x_1)\,\frac{f_{n-1}(x_1)}{f_n(x_1)} = 0, \tag{76}$$

which is a contradiction. Therefore, $f_{n+1}$ cannot be positive for all $x > \underline{x}$ as conjectured. If the derivatives of an infinitely differentiable function alternate in sign for all n, the function is called completely monotonic. Most commonly used utility functions are completely monotonic. Stochastic dominance is relatively easy to verify for completely monotonic utility. Bernstein's theorem says that a function f is completely monotonic if and only if there is a nonnegative measure h such that

$$f(x) = \int_0^\infty e^{-sx}\,h(s)\,ds; \tag{77}$$

in other words, f is the Laplace transform of some nonnegative function h. In terms of utility, this means that any completely monotonic utility function is a nonnegative linear combination of exponential utility functions

$$u(x) = -\int_0^\infty e^{-ax}\,h(a)\,da. \tag{78}$$
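A concrete instance of (78): with h(a) ≡ 1, the integral of e^(−ax) over a ∈ (0, ∞) equals 1/x, so u(x) = −1/x (CRRA with relative risk aversion 2) is a mixture of exponential utilities. The sketch below, an illustration under these assumptions, computes E[u] through the mixture as in (79), swapping the expectation and the integral, with a truncated trapezoid-rule integral (the truncation point, step count, and function name are arbitrary). It compares a sure 2 with a 50-50 gamble on 1 or 3.

```python
import math

def mixture_expected_utility(dist, h=lambda a: 1.0, a_max=60.0, steps=60000):
    """E[u(x)] for u(x) = -integral_0^inf exp(-a x) h(a) da, computed as in (79):
    integrate E[exp(-a x)] h(a) over a (trapezoid rule, truncated at a_max)."""
    da = a_max / steps
    total = 0.0
    for k in range(steps + 1):
        a = k * da
        weight = 0.5 if k in (0, steps) else 1.0
        expected_cara = sum(p * math.exp(-a * v) for v, p in dist)  # E[exp(-a x)]
        total += weight * expected_cara * h(a) * da
    return -total

sure = [(2.0, 1.0)]
gamble = [(1.0, 0.5), (3.0, 0.5)]
print(mixture_expected_utility(sure), -1 / 2)                     # mixture vs. direct -1/x
print(mixture_expected_utility(gamble), -(0.5 / 1 + 0.5 / 3))
```

Because every CARA function prefers the sure 2 (by Jensen's inequality), the mixture must as well, and the direct values −1/2 > −2/3 agree.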

So if x is preferred to y for all CARA utility functions then x is preferred for all completely monotonic utility functions as well.

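$$\mathbb{E}[u(\tilde x)] = \mathbb{E}\Big[-\int_0^\infty e^{-a\tilde x}\,h(a)\,da\Big] = -\int_0^\infty \mathbb{E}[e^{-a\tilde x}]\,h(a)\,da \ge -\int_0^\infty \mathbb{E}[e^{-a\tilde y}]\,h(a)\,da = \mathbb{E}\Big[-\int_0^\infty e^{-a\tilde y}\,h(a)\,da\Big] = \mathbb{E}[u(\tilde y)] \tag{79}$$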

The inequality follows because $\mathbb{E}[-e^{-a\tilde x}] \ge \mathbb{E}[-e^{-a\tilde y}]$ and $h(a) \ge 0$ for all a > 0 by assumption.

First-Order Risk Aversion

Arrow-Pratt risk aversion is also called second-order risk aversion, but do not be confused by this designation. It is second-order in a mathematical sense described in (80) below. In most models it is of first-order importance.

Second-order (Arrow-Pratt) risk aversion means that the risk premium of a zero-mean gamble scales with the square of the magnitude of the gamble. Consider a zero-mean gamble scaled by t. From the derivation of the risk premium,

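$$\mathbb{E}[u(x + t\tilde\varepsilon)] = u(x - \pi(x, t)) \;\Rightarrow\; \pi(x, t) \approx \tfrac12 A(x)\operatorname{var}[t\tilde\varepsilon] = \tfrac12 A(x)\,t^2\operatorname{var}[\tilde\varepsilon]. \tag{80}$$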

In this standard case, the risk premium increases with the square of the scaling variable, or, equivalently, it is proportional to the variance of the risk, at least for small risks.

Under first-order risk aversion, the risk premium is proportional to the first power of the scaling variable, or to the standard deviation of the risk. This will be true if the utility function has a kink at the current endowment. To illustrate, consider the simplest case: a piecewise linear utility function with a slope of 1 above x and λ > 1 below x, normalized so that u(x) = 0. The certainty equivalent of a zero-mean gamble clearly lies below x, so u(x − π) = −λπ and

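$$\begin{aligned}
u(x - \pi(x, t)) = -\lambda\pi(x, t) &= \mathbb{E}[u(x + t\tilde\varepsilon)] = \mathbb{E}[\max(t\tilde\varepsilon, 0)] + \lambda\,\mathbb{E}[\min(t\tilde\varepsilon, 0)]\\
&= \mathbb{E}[t\tilde\varepsilon] + (\lambda - 1)\,\mathbb{E}[\min(t\tilde\varepsilon, 0)] = (\lambda - 1)\,t\,\mathbb{E}[\min(\tilde\varepsilon, 0)]\\
\Rightarrow\quad \pi(x, t) &= t\,\mathbb{E}[\min(\tilde\varepsilon, 0)] \cdot \frac{1 - \lambda}{\lambda}.
\end{aligned}\tag{81}$$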
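The two scaling laws can be compared numerically for a fair ±1 gamble scaled by t. The sketch assumes CARA utility with a = 1 for the smooth case, where the exact premium is π = (1/a) ln cosh(at), and uses (81) with λ = 2 and E[min(ε̃, 0)] = −1/2 for the kinked case; the function names are hypothetical. Doubling t should roughly quadruple the smooth premium but exactly double the kinked one.

```python
import math

def cara_premium(t, a=1.0):
    """Exact premium for the fair gamble t*eps, eps = +/-1, under u = -exp(-a w)."""
    return math.log(0.5 * math.exp(-a * t) + 0.5 * math.exp(a * t)) / a

def kinked_premium(t, lam=2.0):
    """Premium from (81) with E[min(eps, 0)] = -1/2 for eps = +/-1."""
    return t * (-0.5) * (1 - lam) / lam

for t in (0.01, 0.02, 0.04):
    print(f"t={t}: second-order pi={cara_premium(t):.3e}, first-order pi={kinked_premium(t):.3e}")
```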

Here, for small risks, the risk premium is first order in t or, equivalently, proportional to the standard deviation of the risk (holding the distribution fixed). This scaling is typically true for any kinked utility function. However, it is entirely a local property. Evaluated at a different point, risk aversion is second order unless there is a kink there as well. And there definitely cannot be a kink at every point of an increasing concave function. The distinction between first- and second-order risk aversion might seem to be of only theoretical interest, but it does help explain some phenomena. There is also one particularly important consequence. If it is possible to take a portion of a gamble, then both the mean and the standard deviation scale with the portion accepted, while the variance scales with the square of the portion accepted. This means that any gamble with a positive mean will always be accepted at some scale under second-order risk aversion, but not necessarily under first-order risk aversion. To see this, take t as the fraction of the gamble accepted. If the gamble has a positive expectation µ > 0, then the expected gain and the second-order risk premium for taking the fraction t of the gamble are tµ and ½At²σ², respectively. So there is always a t sufficiently small that the mean gain exceeds the risk premium. But the standard deviation also scales proportionally with t, so under first-order risk aversion the mean and the first-order risk premium remain in fixed proportion to each other; the latter may exceed the former at every scale, and the gamble might never be accepted at any scale.

First-order risk aversion is used to explain the reluctance of most people to take on small risks even when favorable outcomes are likely. First-order risk aversion is not inconsistent with expected utility maximization, but a concave utility function cannot have a kink everywhere and still be increasing. Therefore, relying on first-order risk aversion as an explanation means accepting the coincidence that the current situation is atypically right at a kink. In a multi-period context, we would also require that the utility function be changing over time to relocate the kinks to the current level.

There is another situation that can be considered first-order risk aversion even when the utility function is twice differentiable. This occurs when the agent already faces some risk while considering adding some amount of a new risky prospect. The existing risk is $\tilde x \equiv \bar x + \tilde\varepsilon_x$, and the agent is considering adding an additional risky prospect $\tilde\eta = \bar\eta + \tilde\varepsilon_\eta$ at some scale t. The risk premium, π(t), required for the agent to take the new prospect at scale t is the solution to $\mathbb{E}[u(\tilde x + t\tilde\eta)] = \mathbb{E}[u(\tilde x + t\bar\eta - \pi(t))]$. Using a Taylor expansion around $\bar x + t\bar\eta$,

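$$\begin{aligned}
&\mathbb{E}\big[u(\cdot) + u'(\cdot)(\tilde\varepsilon_x + t\tilde\varepsilon_\eta) + \tfrac12 u''(\cdot)(\tilde\varepsilon_x + t\tilde\varepsilon_\eta)^2\big] \approx \mathbb{E}\big[u(\cdot) + u'(\cdot)(\tilde\varepsilon_x - \pi) + \tfrac12 u''(\cdot)(\tilde\varepsilon_x - \pi)^2\big]\\
&\Rightarrow\quad \tfrac12 u''(\cdot)\big[2t\operatorname{cov}(\tilde x, \tilde\eta) + t^2\operatorname{var}(\tilde\eta)\big] \approx -u'(\cdot)\,\pi + \tfrac12 u''(\cdot)\,\pi^2\\
&\Rightarrow\quad \pi(t) \approx \frac{u'(\cdot)}{u''(\cdot)}\left(1 - \left[1 + \left(\frac{u''(\cdot)}{u'(\cdot)}\right)^2\big[2t\operatorname{cov}(\tilde x, \tilde\eta) + t^2\operatorname{var}(\tilde\eta)\big]\right]^{1/2}\right) \approx A(\cdot)\big[t\operatorname{cov}(\tilde x, \tilde\eta) + \tfrac12 t^2\operatorname{var}(\tilde\eta)\big]
\end{aligned}\tag{82}$$

where (·) denotes evaluation at $\bar x + t\bar\eta$.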

So, provided the covariance is not zero, the risk premium is proportional to the amount adopted for very small t. Only for larger adoptions does the variance of the new risk become important, and its effect on the risk premium varies with the square of the amount adopted. Also, the measure of risk at small t is the covariance of the new risk with the existing one rather than the variance. In addition, if the covariance is negative, the risk premium is as well.
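The linear-in-t behavior in (82) can be seen in an exact calculation. The sketch assumes CARA utility with a = 0.1 and a hypothetical joint law in which ε̃_x and ε̃_η are each ±1 with correlation ρ, so cov = ρ and var(η̃) = 1; the function name is also hypothetical. Under these assumptions the exact slope π(t)/t as t → 0 works out to ρ·tanh(a), which is close to the A·cov = aρ of (82) when a is small.

```python
import math

def premium(t, rho, a=0.1):
    """Exact CARA premium for adding t*eta to an existing risk eps_x:
    E[exp(-a(eps_x + t eps_eta))] = exp(a pi) E[exp(-a eps_x)].
    eps_x and eps_eta are each +/-1 with correlation rho (hypothetical law)."""
    joint = [((1, 1), (1 + rho) / 4), ((-1, -1), (1 + rho) / 4),
             ((1, -1), (1 - rho) / 4), ((-1, 1), (1 - rho) / 4)]
    with_new = sum(p * math.exp(-a * (ex + t * ee)) for (ex, ee), p in joint)
    without = sum(p * math.exp(-a * ex) for (ex, _), p in joint)
    return (math.log(with_new) - math.log(without)) / a

for rho in (0.5, 0.0, -0.5):
    print(rho, [premium(t, rho) / t for t in (1e-2, 1e-4, 1e-6)])
```

With ρ = 0 the ratio π(t)/t shrinks toward zero (second-order behavior); with ρ ≠ 0 it settles at a nonzero constant whose sign matches the covariance.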

The Samuelson Paradox

Samuelson (1963) reports that a colleague of his declined to take an even bet of winning $200 or losing $100; however, he would have been willing to accept that bet if it were to be repeated 100 times. Samuelson showed that if his colleague were unwilling to accept the single bet at any wealth level, it was inconsistent with expected utility maximization that he should accept 100 such bets. This is known as the Samuelson Paradox. The basic reasoning is this. To reject a single favorable bet, risk aversion must exceed some number, a, and if that bet is rejected at all wealth levels, then A(x) > a for all x. But $A(x) = -d[\ln u'(x)]/dx$, so, for $x > x_0$,

$$\ln u'(x) = \ln u'(x_0) + \int_{x_0}^{x} d\ln u'(s) < \ln u'(x_0) - \int_{x_0}^{x} a\,ds = \ln u'(x_0) - a(x - x_0) \;\Rightarrow\; u'(x) < u'(x_0)\,e^{-a(x - x_0)}. \tag{83}$$

That is, marginal utility must decrease at least exponentially fast over the relevant region. This means that the utility benefits in the upper tail of the distribution cannot offset the downside risk for large gambles.

This paradox is not necessarily a rejection of expected utility maximization, even by Samuelson's colleague. The assumption that the gamble would be turned down at all wealth levels is not a minor one. If people have decreasing risk aversion, and they seem to, gambles rejected at low wealth levels might be accepted at higher wealth levels, and because a series of favorable bets increases expected wealth, the series can be accepted when a single bet is not.

However, risk aversion as measured by such questions does seem to be extreme. For example, if a person turns down a 50-50 bet that would lose $100 and gain $125 for any wealth level below $300,000, then the reasoning in (83) shows they would turn down a 50-50 gamble of losing $600 and winning $36 billion when their wealth is $290,000. This would seem to be an unlikely characterization of most people’s preferences.

First-order risk aversion is one explanation of this behavior. For example, a utility function that is piecewise linear with a kink at the current wealth will reject the single Samuelson gamble if the slope below the current wealth is more than twice the slope above it. But it will accept as few as two plays if the lower slope is less than three times as steep. Other, less extreme utility functions will do the same, provided marginal utility does not decrease too quickly.
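The kink arithmetic above can be checked by enumerating outcomes. The sketch assumes the Samuelson bet (win 200 or lose 100, 50-50) and a utility with slope 1 above current wealth and λ below it, normalized to 0 at current wealth; plays are independent, so the number of wins is binomial. The function name is hypothetical.

```python
from math import comb

def accepts(n_plays, lam, win=200.0, lose=100.0):
    """Kinked utility: slope 1 above current wealth, lam below (u = 0 there).
    Returns True if E[u] of n independent 50-50 plays is positive."""
    eu = 0.0
    for k in range(n_plays + 1):            # k = number of wins
        gain = k * win - (n_plays - k) * lose
        prob = comb(n_plays, k) / 2 ** n_plays
        eu += prob * (gain if gain >= 0 else lam * gain)
    return eu > 0

for lam in (1.9, 2.5, 3.1):
    print(lam, [accepts(n, lam) for n in (1, 2, 3)])
```

For one play E[u] = 100 − 50λ and for two plays E[u] = 150 − 50λ, which reproduces the thresholds of 2 and 3 in the text.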

The perplexing problem is not that Samuelson's colleague might have been extremely risk averse,²¹ but that most people also seem to display abnormally large risk aversion for small gambles. This means either that most people are coincidentally in a region of very high local risk aversion or that risk aversion is large everywhere. The first possibility, which relies on massive coincidence, is unlikely. The second possibility leads to Samuelson's argument, which has been expanded upon by others. The Samuelson paradox is now generally attributed to a reframing of a person's choices; that is, the utility function is not the same over time, but is constantly recentering on the current wealth level with first-order risk aversion applying at that point.

There is, however, another candidate explanation for the Samuelson Paradox, and that is a misunderstanding of the law of large numbers and diversification. Suppose the original win $125 or lose $100 bet is divided into n independent bets of win $125/n or lose $100/n on each. The resulting distribution is binomial with a constant mean of +12.5 regardless of the number of splits and a standard deviation of $112.5\,n^{-1/2}$. For a large enough n, this combined bet will eventually be evaluated above the status quo by any utility function that is twice differentiable. This follows from the basic risk premium result in equation (45) of Chapter 1. But that is not the combined bet discussed in the paradox. Instead, each separate bet is the same size, so together they have both a mean and a variance proportional to the number of separate bets. Using the same approximation, the extended bet would never be accepted if the original was not. Of course, that approximation assumes a small gamble, which this decidedly is not.
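The contrast between splitting and repeating can be made concrete with CARA utility, an illustrative assumption under which the certainty equivalent of independent bets adds up. The sketch uses a = 0.01 (arbitrary), the win 125 / lose 100 bet from the text, and a hypothetical helper name.

```python
import math

def cara_ce(n, scale, a=0.01, win=125.0, lose=100.0):
    """Certainty equivalent (CARA) of n independent 50-50 bets, each paying
    +win*scale or -lose*scale. Under CARA the CE of a sum of independent
    bets is the sum of their individual CEs."""
    one = -math.log(0.5 * math.exp(-a * win * scale) + 0.5 * math.exp(a * lose * scale)) / a
    return n * one

print(cara_ce(1, 1.0))          # single bet: rejected
print(cara_ce(100, 1.0))        # repeated 100 times: exactly 100 x the single CE
print(cara_ce(100, 1 / 100))    # split into 100 pieces: accepted
```

Splitting drives the certainty equivalent toward the 12.5 mean, while repetition only multiplies the (negative) single-bet value.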

21 To reject the single gamble, his absolute risk aversion must have been in excess of 0.005. If his wealth was $10,000, his relative risk aversion must have been in excess of 50. Typical estimates for relative risk aversion are in the single-digit to low double-digit range.

Further Notes

Other Statistical Risk Measures

There have been many other suggested statistical measures of risk in addition to variance or standard deviation.

(Lower) Semi-Variance is perhaps the most popular such measure used in academic research. Semi-variance is defined as

\[
S^2(x_0) \equiv \mathbb{E}\big[(x - x_0)^2 \mid x < x_0\big].
\]

The break point, \(x_0\), is typically set either at the mean or, when comparing risks, at a fixed common reference point. For example, the risk-free rate is often used with asset returns. Semi-deviation is the square root of the semi-variance. These measures are also called downside variance and downside deviation, because semi-variance can also mean the measure applied to the upper tail. Downside risk is another name, though that phrase is also used in a general sense with no specific statistical measure in mind. The upper semi-variance, \(\mathbb{E}[(x - x_0)^2 \mid x > x_0]\), is then called upside potential. Semi-variance is related to the lower partial second moment defined in (44), \(\mu_2(x_0) = S^2(x_0) \times \Pr\{x < x_0\}\). The difference between these two measures is that semi-variance ignores outcomes at or above \(x_0\), while the lower partial moment counts them as zero.

Like variance, semi-variance provides a complete ordering, as it assigns a real number to every random variable. Unlike variance, it does not increase the measured risk because of extreme outcomes above the reference point. Compared to the lower partial moment, however, it is a poor measure of risk. Consider two distributions: x takes on the value −90 with probability 10% and 10 with probability 90%; y takes on the values −90 with probability 10%, −6 with probability 30%, and 18 with probability 60%. Both random variables have expected values of zero, and y is obviously riskier, since it is obtained from x by spreading the outcome 10 into −6 and 18 without changing the mean. However, the semi-variance of x is 8100, while the semi-variance of y is lower at 2052. The second lower partial moments at a reference level of 0 are 810 and 820.8, and these do show the proper risk ordering, as they must.
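The semi-variance and lower-partial-moment figures in the example can be reproduced in a few lines. This is a minimal sketch, assuming discrete distributions given as parallel lists of outcomes and probabilities. The helper names are illustrative.

```python
def semi_variance(outcomes, probs, x0=0.0):
    """Lower semi-variance S^2(x0) = E[(x - x0)^2 | x < x0] (needs some mass below x0)."""
    below = [(x, p) for x, p in zip(outcomes, probs) if x < x0]
    p_below = sum(p for _, p in below)
    return sum(p * (x - x0) ** 2 for x, p in below) / p_below

def lower_partial_second_moment(outcomes, probs, x0=0.0):
    """mu_2(x0) = S^2(x0) * Pr{x < x0}: outcomes at or above x0 contribute zero."""
    return sum(p * (x - x0) ** 2 for x, p in zip(outcomes, probs) if x < x0)

x = ([-90, 10], [0.10, 0.90])
y = ([-90, -6, 18], [0.10, 0.30, 0.60])   # a mean-preserving spread of x
for name, dist in (("x", x), ("y", y)):
    print(name, semi_variance(*dist), lower_partial_second_moment(*dist))
```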

Mean Absolute Deviation (MAD) and semi-MAD are similar to variance and semi-variance but use absolute deviations: \(\text{MAD} \equiv \mathbb{E}\big[|x - \mathbb{E}[x]|\big]\) and \(\text{SMAD}(x_0) \equiv \mathbb{E}[x_0 - x \mid x < x_0]\). The latter is related to \(\mu_1(x_0, F)\) as defined in (44). Like variance, MAD penalizes large deviations on the upside, though not as severely. However, it lacks the tractability of variance, because the absolute value is not differentiable at zero.
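Applied to the same x and y as in the semi-variance example, MAD and semi-MAD behave like variance and semi-variance. MAD ranks y as riskier (21.6 against 18), while semi-MAD at \(x_0 = 0\) ranks x as riskier (90 against 27). This is a sketch with hypothetical helper names, assuming discrete distributions given as parallel lists.

```python
def mean_absolute_deviation(outcomes, probs):
    """MAD = E[|x - E[x]|]."""
    mean = sum(p * x for x, p in zip(outcomes, probs))
    return sum(p * abs(x - mean) for x, p in zip(outcomes, probs))

def semi_mad(outcomes, probs, x0=0.0):
    """SMAD(x0) = E[x0 - x | x < x0] (needs some mass below x0)."""
    below = [(x, p) for x, p in zip(outcomes, probs) if x < x0]
    return sum(p * (x0 - x) for x, p in below) / sum(p for _, p in below)

x = ([-90, 10], [0.10, 0.90])
y = ([-90, -6, 18], [0.10, 0.30, 0.60])
print(mean_absolute_deviation(*x), mean_absolute_deviation(*y))
print(semi_mad(*x), semi_mad(*y))
```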

The Gini Mean Difference is the expected absolute difference between all possible pairs of realizations of a random variable, \(\text{GMD} \equiv \int\!\!\int |x - x'|\, dF(x)\, dF(x')\).22 That is, the mean difference is the expected absolute difference between two independent draws from the distribution F. The mean difference can also be expressed as four times the covariance of a random variable with its cumulative distribution function, \(\text{GMD} = 4\operatorname{cov}[x, F(x)]\). As with semi-variance, a smaller mean difference is a necessary condition for one random variable both to be less risky in the Rothschild-Stiglitz sense and to second-order dominate another. In addition, variables with a higher mean and lower risk by this measure form a subset of all non-second-order dominated prospects.
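Both expressions for the mean difference can be evaluated for a discrete distribution. For discrete outcomes this sketch replaces F(x) in the covariance formula with the mid-point distribution function F*(v) = Pr{x < v} + ½Pr{x = v}. That substitution is an assumption of the sketch. For the two distributions from the semi-variance example both routes give the same values, 18 for x and 26.64 for y. The helper names are illustrative.

```python
from itertools import product

def gini_mean_difference(outcomes, probs):
    """GMD = E|x - x'| for two independent draws x, x' from the same distribution."""
    pairs = product(zip(outcomes, probs), repeat=2)
    return sum(p * q * abs(a - b) for (a, p), (b, q) in pairs)

def gmd_via_covariance(outcomes, probs):
    """4 cov[x, F*(x)] with the mid-point cdf F*(v) = Pr{x < v} + Pr{x = v}/2.
    Assumes the listed outcomes are distinct."""
    mid_cdf, below = {}, 0.0
    for v, p in sorted(zip(outcomes, probs)):
        mid_cdf[v] = below + p / 2
        below += p
    mean_x = sum(p * v for v, p in zip(outcomes, probs))
    mean_f = sum(p * mid_cdf[v] for v, p in zip(outcomes, probs))
    return 4 * sum(p * (v - mean_x) * (mid_cdf[v] - mean_f) for v, p in zip(outcomes, probs))

x = ([-90, 10], [0.10, 0.90])
y = ([-90, -6, 18], [0.10, 0.30, 0.60])
print(gini_mean_difference(*x), gmd_via_covariance(*x))
print(gini_mean_difference(*y), gmd_via_covariance(*y))
```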

Entropy, as used in information theory, describes how much information a single observation delivers on average. A related measure, also called entropy, can be used to describe the relative risk of strictly positive random variables. This measure is

\[
L(x) \equiv \ln \mathbb{E}[x] - \mathbb{E}[\ln x]. \tag{84}
\]

This is obviously a measure of relative risk rather than absolute risk because L(ax) = L(x) for any positive constant a.

The measure of entropy in information theory is

\[
H(x) = -\sum_i \pi(x_i) \log \pi(x_i) = -\mathbb{E}[\log \pi(x)]. \tag{85}
\]

In information theory, the logarithms are generally base 2, but any base can be used, as that only changes the measure by a constant multiplier. The entropy of distribution \(\pi^2\) relative to distribution \(\pi^1\) is \(D(\pi^2; \pi^1) \equiv -\sum_i \pi^1_i \log(\pi^2_i / \pi^1_i)\). Note that H and D are written as functions of the probabilities only; the actual outcomes of x are irrelevant. This is fine when measuring information, but obviously inappropriate when the variation in the outcomes is important, as in a measure of risk.

22 The mean difference should not be confused with the mean absolute deviation, \(\mathbb{E}\big[|x - \mathbb{E}[x]|\big]\). It is related to the normalized Gini coefficient as \(\text{MD} = 2\,\mathbb{E}[x]\,\text{Gini}\).

The risk measure L is a relative information entropy measure. Define the probabilities \(p_i \equiv \pi_i x_i / \mathbb{E}_\pi[x]\). Then

\[
D(\mathbf{p}; \boldsymbol{\pi}) = -\sum_i \pi_i \log(p_i / \pi_i) = -\sum_i \pi_i \big[\log(x_i) - \log \mathbb{E}_\pi[x]\big] = -\mathbb{E}_\pi[\log(x)] + \log \mathbb{E}_\pi[x], \tag{86}
\]

which is (84) when natural logarithms are used.
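Equations (84) and (86) can be checked numerically. The entropy risk measure should equal the relative entropy of the tilted probabilities \(p_i = \pi_i x_i / \mathbb{E}[x]\) against \(\pi\), and it should be unchanged when x is rescaled. This is a minimal sketch using natural logarithms and an arbitrary three-point distribution chosen for illustration.

```python
import math

def entropy_risk(outcomes, probs):
    """L(x) = ln E[x] - E[ln x] for a strictly positive random variable, eq. (84)."""
    mean = sum(p * x for x, p in zip(outcomes, probs))
    return math.log(mean) - sum(p * math.log(x) for x, p in zip(outcomes, probs))

def relative_entropy(p2, p1):
    """D(p2; p1) = -sum_i p1_i * log(p2_i / p1_i), natural logarithms."""
    return -sum(b * math.log(a / b) for a, b in zip(p2, p1))

outcomes, probs = [1.0, 2.0, 4.0], [0.25, 0.5, 0.25]
mean = sum(p * x for x, p in zip(outcomes, probs))
tilted = [p * x / mean for x, p in zip(outcomes, probs)]   # p_i = pi_i x_i / E[x]
print(entropy_risk(outcomes, probs), relative_entropy(tilted, probs))
```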

Value at Risk, or VaR, is the loss threshold that is exceeded with no more than some small fixed probability p. If z is a random variable measuring losses, with cumulative distribution H(z), this threshold is \(H^{-1}(1 - p)\). More commonly, VaR is stated in terms of the random gain or profit, x = −z, whose cumulative distribution is F(x) = 1 − H(−x) for continuous distributions. Then \(\text{VaR}_p \equiv F^{-1}(p)\), the p-quantile of the profit, which is the negative of the loss threshold. For discrete distributions, \(F^{-1}(p)\) is the smallest outcome x with \(F(x) \ge p\). VaR is used mostly as a risk management tool or for financial reporting. It is not commonly used in academic research except when specifically discussing such applications.

One problem with VaR is that it is not sub-additive, which is a desirable property of a risk measure. A risk measure, ρ, is sub-additive if \(\rho(ax + by) \le a\rho(x) + b\rho(y)\) for positive a and b. That is, mixing two risks together should provide some benefit of diversification. This is not true of VaR. For example, suppose \(x_1\) and \(x_2\) are independent and identically distributed random variables with \(\Pr\{x_i = -10\} = 4\%\), \(\Pr\{x_i = 0\} = 1\%\), and \(\Pr\{x_i = 5\} = 95\%\). Then \(\text{VaR}_{5\%}\) for either one is 0. However, \(\text{VaR}_{5\%}(x_1 + x_2) = -5\), because with probability \(1 - 0.96^2 = 7.84\%\) at least one of the x’s is −10, in which case the sum is at most −5.

Expected Shortfall (also known as conditional value at risk or average value at risk) measures not the loss that will be exceeded with probability p, but how bad the loss is on average when it is that bad. It is the average loss realized in the worst fraction p of cases:

\[
\text{ES}_p \equiv -\frac{1}{p}\int_0^p \text{VaR}_\pi \, d\pi.
\]

\(\text{ES}_p\) is sub-additive even though it is based on VaR.
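The VaR example and the sub-additivity of expected shortfall can be checked exactly with rational arithmetic. This is a minimal sketch, assuming discrete distributions stored as dictionaries from outcome to probability and VaR in the profit convention \(\text{VaR}_p = F^{-1}(p)\). The helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

def var_profit(dist, p):
    """VaR_p = F^{-1}(p): the smallest outcome whose cumulative probability reaches p."""
    cumulative = Fraction(0)
    for x in sorted(dist):
        cumulative += dist[x]
        if cumulative >= p:
            return x
    raise ValueError("p must lie in (0, 1]")

def expected_shortfall(dist, p):
    """ES_p = -(1/p) * integral_0^p VaR_q dq: the average loss over the worst fraction p."""
    remaining, total = p, Fraction(0)
    for x in sorted(dist):
        take = min(dist[x], remaining)   # probability mass used from this outcome
        total += take * x
        remaining -= take
        if remaining == 0:
            break
    return -total / p

def independent_sum(d1, d2):
    """Distribution of x1 + x2 for independent discrete x1 and x2."""
    out = {}
    for (a, pa), (b, pb) in product(d1.items(), d2.items()):
        out[a + b] = out.get(a + b, Fraction(0)) + pa * pb
    return out

x_i = {-10: Fraction(4, 100), 0: Fraction(1, 100), 5: Fraction(95, 100)}
p = Fraction(5, 100)
total = independent_sum(x_i, x_i)
print(var_profit(x_i, p), var_profit(total, p))                  # 0 -5
print(expected_shortfall(x_i, p), expected_shortfall(total, p))  # 8 139/25
```

VaR of the sum (−5) is worse than the sum of the individual VaRs (0 + 0), whereas expected shortfall of the sum (5.56) is below the sum of the individual shortfalls (8 + 8).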

Maximum Drawdown is the maximum loss from a peak to a trough before a new peak is achieved. It is usually expressed as a positive percentage of the peak. For the current maximum drawdown, a new peak may not yet have been achieved. This is purely a historical number, though forecasts could presumably be made. The Calmar Ratio is the ratio of the average annual return to the Maximum Drawdown. A similar measure is the MAR (Managed Accounts Report) Ratio, which is the Calmar ratio measured from the inception of a mutual fund or trading strategy.

Stronger Notions of Riskiness

In experiments, subjects do not always display risk aversion by making less risky choices. A primary example is when choices involve losses. Risk preference over losses is one part of Prospect Theory, which is the subject of Chapter 11. But risk preference has also been observed in other cases, particularly when the comparison is not simple. Risk preference is not inconsistent with expected utility theory, but it does run counter to the usual assumption in finance. Of course, opting for more risk in complicated problems could be simple confusion, but it has led to the development of alternative notions of risk that are included within the Rothschild-Stiglitz definition but are simpler and, therefore, less likely to be misunderstood.

Single Crossing Risk occurs when two cumulative distributions G(y) and F(x) defined on the range [a, b] have the same expectation, and there exist \(z_1\) and \(z_2\) with \(z_1 \le z_2\) such that


\[
\begin{aligned}
G(z) &> F(z) && \text{for } a < z < z_1, \\
G(z) &= F(z) && \text{for } z_1 \le z \le z_2, \\
G(z) &< F(z) && \text{for } z_2 < z < b.
\end{aligned}
\tag{87}
\]

The distributions therefore have a single crossing, occurring throughout the range \([z_1, z_2]\); this range could include either endpoint of the domain. If \(z_1 = z_2\), the single crossing occurs at that one point. Equivalently, G is single-crossing more risky than F if \(G^{-1}(p) < F^{-1}(p)\) for \(p < p^*\) and \(G^{-1}(p) \ge F^{-1}(p)\) for \(p > p^*\), where \(p^* \equiv F(z_1) = F(z_2) = G(z_1) = G(z_2)\). Both of these properties are easy to verify from a graph.
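On a grid of outcomes, condition (87) amounts to G − F starting out positive and changing sign exactly once. The sketch below, with hypothetical helper names, checks only that shape. It assumes the equal-means requirement is verified separately, and it uses cumulative probabilities in percent to keep the arithmetic exact.

```python
def sign_changes(diffs):
    """Number of sign changes in a sequence, ignoring zero entries."""
    signs = [d > 0 for d in diffs if d != 0]
    return sum(1 for s, t in zip(signs, signs[1:]) if s != t)

def single_crossing_riskier(G, F):
    """Grid version of (87): G - F is positive first, negative afterwards, one crossing.
    G and F are cumulative probabilities at the same outcomes."""
    diffs = [g - f for g, f in zip(G, F)]
    nonzero = [d for d in diffs if d != 0]
    return bool(nonzero) and nonzero[0] > 0 and sign_changes(diffs) == 1

# x on {1, 2, 3} with probabilities 25/50/25 %; y moves the middle mass to the ends.
F = [25, 75, 100]
G = [50, 50, 100]
print(single_crossing_riskier(G, F), single_crossing_riskier(F, G))
```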

It is straightforward to see that single-crossing risk is a special case of Rothschild-Stiglitz risk. Clearly

\[
\int_a^z [G(t) - F(t)]\, dt > 0 \qquad \forall\, z \in (a, z_1] \tag{88}
\]

because the integrand is strictly positive in the interval. Similarly,

\[
\int_a^z [G(t) - F(t)]\, dt = \int_a^b [G(t) - F(t)]\, dt - \int_z^b [G(t) - F(t)]\, dt > 0 \qquad \forall\, z \in [z_2, b). \tag{89}
\]

The first integral on the right-hand side is zero because the expectations are equal. The second integral is negative because its integrand is strictly negative in the interval. Between \(z_1\) and \(z_2\) the integrand is zero, so the integral keeps the positive value it reached at \(z_1\). Therefore, condition (ii) in Theorem 2.1 is satisfied.

However, not every distribution that is Rothschild-Stiglitz riskier is also single-crossing riskier. A simple example is given below.

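| Outcome (x, y) | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| g(y) | 30% | 5% | 55% | 0% | 10% |
| G(y) | 30% | 35% | 90% | 90% | 100% |
| f(x) | 25% | 15% | 45% | 10% | 5% |
| F(x) | 25% | 40% | 85% | 95% | 100% |
| G(·) − F(·) | 5% | −5% | 5% | −5% | 0% |
| \(T(z) \equiv \int_a^z [G(t) - F(t)]\, dt\) | 5% | 0% | 5% | 0% | 0% |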

The probabilities of the outcomes are g(y) and f(x); the cumulative distributions are G(y) and F(x). The next-to-last line shows that the cumulative distributions cross three times: between 1 and 2, between 2 and 3, and between 3 and 4. The final line shows that \(\mathbb{E}[x] = \mathbb{E}[y]\), because T(b) = T(5) = 0, and that y is riskier than x, because T(z), computed here as the running sum \(\sum_{t \le z} [G(t) - F(t)]\) since the outcomes are one unit apart, is everywhere nonnegative.
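The table can be regenerated to confirm the three crossings, the zero terminal value of T, and the equal means. This is a short sketch using integer percentages. T(z) is computed as the running sum described above, which assumes the outcomes are one unit apart.

```python
from itertools import accumulate

outcomes = [1, 2, 3, 4, 5]
g = [30, 5, 55, 0, 10]     # probabilities of y, in percent
f = [25, 15, 45, 10, 5]    # probabilities of x, in percent

G = list(accumulate(g))                   # cumulative distribution of y
F = list(accumulate(f))                   # cumulative distribution of x
diff = [a - b for a, b in zip(G, F)]      # G(z) - F(z)
T = list(accumulate(diff))                # T(z) as the running sum over t <= z

mean_x = sum(z * p for z, p in zip(outcomes, f)) / 100
mean_y = sum(z * p for z, p in zip(outcomes, g)) / 100
crossings = sum(1 for a, b in zip(diff, diff[1:]) if a * b < 0)
print(diff, T, mean_x, mean_y, crossings)
```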

Monotonic Riskiness is an even narrower definition of risk than single-crossing risk and, therefore, than Rothschild-Stiglitz riskiness. Two random variables are said to be comonotonic if they are both non-decreasing functions in their joint ordered state space. That is, x and ε are comonotonic if, for all pairs of states s and s′, \((x_s - x_{s'})(\varepsilon_s - \varepsilon_{s'}) \ge 0\). If x and ε are comonotonic and \(\mathbb{E}[\varepsilon] = 0\), then \(y \overset{d}{=} x + \varepsilon\) is monotonically riskier than x.23 Monotonic riskiness implies single-crossing riskiness. Because ε is monotonic with x, there is a (possibly null) range \((x', x'')\) for which \(\varepsilon = 0\), with \(\varepsilon < 0\) for \(x < x'\) and \(\varepsilon > 0\) for \(x > x''\),24 so the quantiles of y lie below those of x for low probabilities and above them for high probabilities.

23 In many contexts, y is said to be monotonically riskier than x only if \(y \equiv x + \varepsilon\), rather than merely being distributed in that fashion. The riskiness properties are the same, but there are additional advantages in modelling.

Monotonic riskiness can also be described directly in terms of the two cumulative distributions. G is monotonically riskier than F if \(G^{-1}(P) - G^{-1}(Q) \ge F^{-1}(P) - F^{-1}(Q)\) for all \(0 \le Q < P \le 1\). That is, for any quantile range from Q to P, the spread in the y outcomes is greater than the spread in the x outcomes.
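With n equally likely states, the quantile condition compares sorted outcomes: y is monotonically riskier than x when every spread between sorted y outcomes is at least the corresponding spread between sorted x outcomes. This is a sketch under that equal-probability assumption; the helper names and the four-state example are hypothetical.

```python
def comonotonic(a, b):
    """True if (a_s - a_t)(b_s - b_t) >= 0 for every pair of states s, t."""
    n = len(a)
    return all((a[s] - a[t]) * (b[s] - b[t]) >= 0 for s in range(n) for t in range(n))

def monotonically_riskier(y, x):
    """Equally likely states: every spread between sorted y outcomes is at least
    the matching spread between sorted x outcomes (the quantile-spread condition)."""
    ys, xs = sorted(y), sorted(x)
    n = len(xs)
    return all(ys[j] - ys[i] >= xs[j] - xs[i] for i in range(n) for j in range(i + 1, n))

x = [1, 2, 4, 7]                  # four equally likely states
eps_1 = [-2, -1, 1, 2]            # mean zero and moves with x
eps_2 = [2, -1, 1, -2]            # mean zero but not comonotonic with x
y_1 = [a + e for a, e in zip(x, eps_1)]
y_2 = [a + e for a, e in zip(x, eps_2)]
print(comonotonic(x, eps_1), monotonically_riskier(y_1, x))
print(comonotonic(x, eps_2), monotonically_riskier(y_2, x))
```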

As with Rothschild-Stiglitz riskiness, single-crossing riskiness and monotonic riskiness can be extended to variables with different expectations by first subtracting their means. So if \(x - \mathbb{E}[x]\) is single-crossing (or monotonically) less risky than \(y - \mathbb{E}[y]\), then x is said to be single-crossing (or monotonically) less risky than y.

Alternative Definitions of Stochastic Dominance

The alternative definitions of more risk lead to alternative definitions of stochastic dominance.

Single-Crossing and Monotonic Stochastic Dominance can be defined based on those two notions of risk. Single-crossing stochastic dominance occurs if the cumulative distributions cross once, with G above F to the left of the crossing, and \(\mathbb{E}[x - y] \ge 0\). Monotonic stochastic dominance occurs if \(y \overset{d}{=} x + \varepsilon\) with x and ε comonotonic and \(\mathbb{E}[\varepsilon] \le 0\). Both of these are special cases of second-order stochastic dominance, and monotonic stochastic dominance is a special case of single-crossing stochastic dominance.

DARA-Stochastic Dominance is developed in Vickson (1975). It is stochastic dominance for the set of utility functions that display decreasing absolute risk aversion. Recall that a positive third derivative is a necessary condition for DARA, while DARA obviously implies risk aversion. So DARA dominance is intermediate between second- and third-order stochastic dominance. The tail condition for DARA stochastic dominance is

\[
\int_a^z e^{-kt} \int_a^t [G(\tau) - F(\tau)]\, d\tau\, dt + k^{-1} e^{-kz}\, \mathbb{E}[x - y] \ge 0 \qquad \forall\, k > 0,\ \forall\, z. \tag{90}
\]

The double integral in (90) for k = 0 is the condition for third-order stochastic dominance.

Fractional Stochastic Dominance is defined by Müller et al. (2015) as stochastic dominance between first and second order. Stochastic dominance of order δ is defined in the usual way as

\[
x \succeq_{\delta\text{-SD}} y \iff \mathbb{E}[u(x)] \ge \mathbb{E}[u(y)] \quad \forall\, u \in \mathcal{U}_\delta, \tag{91}
\]

where \(\mathcal{U}_\delta\), for \(1 \le \delta \le 2\), is the class of increasing and continuously differentiable utility functions satisfying25

\[
0 \le (\delta - 1)\, u'(x_2) \le u'(x_1) \qquad \forall\, x_1 \le x_2. \tag{92}
\]

This restriction permits marginal utility to be increasing but limits the extent to which it can increase. Setting δ = 1 puts no restriction on marginal utility except that it be positive, so δ = 1 corresponds to first-order stochastic dominance. When δ = 2, marginal utility must be decreasing, so this corresponds to second-order stochastic dominance.

24 Because \(\mathbb{E}[\varepsilon] = 0\), ε must have both positive and negative realizations or trivially be identically 0.

25 The class \(\mathcal{U}_\delta\) can be defined more generally for non-differentiable utility functions as

\[
0 \le (\delta - 1)\, \frac{u(x_4) - u(x_3)}{x_4 - x_3} \le \frac{u(x_2) - u(x_1)}{x_2 - x_1} \qquad \forall\, x_1 < x_2 < x_3 < x_4.
\]

For F to dominate G, the integral condition equivalent to δ-order stochastic dominance is

\[
\int_a^z \max[F(t) - G(t), 0]\, dt \le (\delta - 1) \int_a^z \max[G(t) - F(t), 0]\, dt \qquad \forall\, z. \tag{93}
\]

There is also a corresponding change in densities, like the mean-preserving spread used to define riskiness. If F and G have a single crossing and F second-order dominates, but does not first-order dominate, G, then there is always a minimum δ greater than 1 and no greater than 2 for which (93) holds. This minimum δ can be thought of as an index of the “risk aversion” needed to make F dominate.
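The minimum δ in (93) can be computed for discrete distributions by tracking the two running integrals. This sketch assumes outcomes one unit apart, so each grid cell contributes its cdf gap. Because both integrals are piecewise linear, checking their ratio at the grid points suffices. In the hypothetical three-outcome example, x has probabilities 10%, 40%, 50% and y has 20%, 25%, 55% on the outcomes 1, 2, 3. Here F second-order but not first-order dominates G, and the sketch returns δ = 3/2.

```python
from fractions import Fraction

def min_fractional_delta(F, G):
    """Smallest delta in [1, 2] for which F delta-dominates G under condition (93).

    F, G: cdf values on a common grid of outcomes one unit apart (step functions).
    Returns None if no delta in [1, 2] works."""
    pos = Fraction(0)    # running integral of max(G - F, 0)
    neg = Fraction(0)    # running integral of max(F - G, 0)
    worst = Fraction(0)  # largest ratio neg / pos seen so far
    for f, g in zip(F, G):
        pos += max(g - f, 0)
        neg += max(f - g, 0)
        if neg > 0:
            if pos == 0:
                return None          # F rises above G first: no delta can help
            worst = max(worst, neg / pos)
    delta = 1 + worst
    return delta if delta <= 2 else None

F = [Fraction(1, 10), Fraction(1, 2), Fraction(1)]    # x: 10%, 40%, 50%
G = [Fraction(1, 5), Fraction(9, 20), Fraction(1)]    # y: 20%, 25%, 55%
print(min_fractional_delta(F, G))
```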

Levered Stochastic Dominance — Levy and Kroll (1979) extend the notion of stochastic dominance to compare portfolio combinations of risky and risk-free opportunities. This extended definition can find additional dominated risks. For example, if y is riskier than x but has a higher expectation, it cannot be dominated in the ordinary sense. However, if the combination of y and the risk-free opportunity that has the same expectation as x is still more risky than x, then x dominates in this portfolio sense. Levered Stochastic Dominance with a risk-free asset is defined as follows. Let x, y, and r denote two random outcomes and a safe outcome per dollar invested. Define \(x_\alpha \equiv \alpha x + (1 - \alpha) r\). That is, \(x_\alpha\) is the return when the fraction α of the investment is in x. Then x nth-order levered stochastic dominates y if there is some α for which \(x_\alpha\) nth-order stochastic dominates y.

The levered stochastically dominated y’s include all the stochastically dominated y’s of the same order and add some additional ones. In particular

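\[
\begin{array}{ccccccc}
\text{1SD} & \Rightarrow & \text{2SD} & \Rightarrow & \cdots & \Rightarrow & n\text{SD} \\
\Downarrow & & \Downarrow & & & & \Downarrow \\
\text{L1SDR} & \Rightarrow & \text{L2SDR} & \Rightarrow & \cdots & \Rightarrow & \text{L}n\text{SDR}
\end{array}
\]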
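Levered dominance can be tested by scanning a grid of portfolio weights α and checking ordinary second-order dominance of \(x_\alpha\) over y. This is a sketch with hypothetical helpers, assuming discrete distributions stored as dictionaries and exact rational arithmetic. In the example y is riskier than x and has a higher mean, so x does not dominate y directly. However, x levered to α = 2 (with r = 0) does dominate y.

```python
from fractions import Fraction

def cdf(dist, z):
    return sum((p for v, p in dist.items() if v <= z), Fraction(0))

def second_order_dominates(x, y):
    """True if integral_{-inf}^z [G(t) - F(t)] dt >= 0 for all z (F, G: cdfs of x, y).
    The cdfs are step functions, so the running area only needs checking at support points."""
    points = sorted(set(x) | set(y))
    area = Fraction(0)
    for left, right in zip(points, points[1:]):
        area += (cdf(y, left) - cdf(x, left)) * (right - left)
        if area < 0:
            return False
    return True

def levered(x, alpha, r):
    """Distribution of x_alpha = alpha * x + (1 - alpha) * r."""
    out = {}
    for v, p in x.items():
        w = alpha * v + (1 - alpha) * r
        out[w] = out.get(w, Fraction(0)) + p
    return out

def levered_dominance_alpha(x, y, r, alphas):
    """First alpha in `alphas` for which x_alpha second-order dominates y, else None."""
    for alpha in alphas:
        if second_order_dominates(levered(x, alpha, r), y):
            return alpha
    return None

x = {0: Fraction(1, 2), 1: Fraction(1, 2)}      # mean 1/2
y = {-1: Fraction(1, 2), 3: Fraction(1, 2)}     # mean 1, riskier
alphas = [Fraction(k, 4) for k in range(13)]     # alpha from 0 to 3 in steps of 1/4
print(second_order_dominates(x, y), levered_dominance_alpha(x, y, 0, alphas))
```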

Hazard Rate Dominance is a partial ordering that falls between likelihood ratio dominance and first-order stochastic dominance. Hazard rates are most often used with random variables measuring the time until a single event occurs. Let F(t) be the probability that the event occurs at or before time t, and f(t) the probability density of its occurrence. The hazard rate is the probability density that the event occurs at time t, conditional on its not having occurred previously: \(h(t) \equiv f(t)/[1 - F(t)]\). It can also be expressed as \(h(t) = -\,d \ln[1 - F(t)]/dt\). Distribution F hazard rate dominates distribution G if \(h_G(t) \ge h_F(t)\), or

\[
\frac{g(t)}{1 - G(t)} \ge \frac{f(t)}{1 - F(t)}. \tag{94}
\]

This follows immediately from the first inequality in (27), so likelihood ratio dominance implies hazard rate dominance. To verify that hazard rate dominance in turn implies first-order stochastic dominance, use the derivative definition to write

\[
F(t) = 1 - \exp\!\Big[-\int_0^t h_F(z)\, dz\Big] \le 1 - \exp\!\Big[-\int_0^t h_G(z)\, dz\Big] = G(t). \tag{95}
\]
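Relation (95) can be illustrated numerically by building each cumulative distribution from its hazard rate. The sketch assumes two illustrative hazards, a constant \(h_F = 1\) and a rising \(h_G(t) = 1 + t \ge h_F(t)\). It integrates them with the trapezoid rule, which is exact for these linear hazards. The function names are hypothetical.

```python
import math

def cdf_from_hazard(hazard, t, steps=1000):
    """F(t) = 1 - exp(-integral_0^t h(z) dz), with the integral by the trapezoid rule."""
    dz = t / steps
    integral = sum(0.5 * (hazard(i * dz) + hazard((i + 1) * dz)) * dz for i in range(steps))
    return 1.0 - math.exp(-integral)

def hazard_f(z):
    return 1.0          # constant hazard: exponential waiting time

def hazard_g(z):
    return 1.0 + z      # rising hazard, never below hazard_f

for t in (0.5, 1.0, 2.0):
    print(t, cdf_from_hazard(hazard_f, t), cdf_from_hazard(hazard_g, t))
```

The higher hazard makes the event happen sooner, so G lies above F everywhere, which is first-order dominance of F over G.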

Probability Premium: The Arrow-Pratt risk aversion measure can be used to determine the probability premium. The probability premium is defined for binary gambles paying ±h. It is the required difference between the probabilities of receiving +h and −h to just induce acceptance. Writing those probabilities as ½ + p(x, h) and ½ − p(x, h), so that the difference is 2p(x, h), the premium is defined by


\[
u(x) = \big[\tfrac12 + p(x, h)\big]\, u(x + h) + \big[\tfrac12 - p(x, h)\big]\, u(x - h). \tag{96}
\]

Using a second-order Taylor expansion,

\[
\begin{aligned}
u(x) &= \big(\tfrac12 + p\big)\big[u(x) + u'(x)h + \tfrac12 u''(x)h^2\big] + \big(\tfrac12 - p\big)\big[u(x) - u'(x)h + \tfrac12 u''(x)h^2\big] \\
\Rightarrow\ 0 &= 2p\,u'(x)\,h + \tfrac12 u''(x)\,h^2 \quad\Rightarrow\quad p(x, h) = \tfrac14 A(x)\,h.
\end{aligned}
\tag{97}
\]
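The quality of approximation (97) can be checked against the exact solution of (96), which is linear in p. This is a sketch assuming log utility, so that A(x) = 1/x, at wealth 100. The function names are illustrative.

```python
import math

def probability_premium(u, x, h):
    """Exact p solving u(x) = (1/2 + p) u(x + h) + (1/2 - p) u(x - h), as in (96)."""
    up, down = u(x + h), u(x - h)
    return (u(x) - 0.5 * (up + down)) / (up - down)

def approx_probability_premium(A, x, h):
    """Approximation (97): p(x, h) ~ A(x) h / 4, so the probability gap 2p ~ A(x) h / 2."""
    return A(x) * h / 4

def log_risk_aversion(x):
    return 1.0 / x      # A(x) = -u''/u' = 1/x for u = log

for h in (1.0, 10.0, 50.0):
    print(h, probability_premium(math.log, 100.0, h),
          approx_probability_premium(log_risk_aversion, 100.0, h))
```

For small h the two agree closely. For larger h they drift apart, as expected of a second-order expansion.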

Additional Characterizations of Utility

Some models make assumptions about utility beyond a positive third derivative.

Proper Risk Aversion — Pratt and Zeckhauser (1987): Proper risk aversion is the idea that risks tend to crowd each other out. Under DARA, any gamble rejected at a wealth of x must also be rejected at a lower wealth x + z with z < 0; that is,

\[
x + z \prec x \ \text{ and } \ x + y \prec x \ \Rightarrow\ x + z + y \prec x + z. \tag{98}
\]

Here the first antecedent is not really necessary; it simply says that utility is increasing. It is stated explicitly for analogy with proper risk aversion, which is defined in the same fashion when x and z are random:

\[
x + z \prec x \ \text{ and } \ x + y \prec x \ \Rightarrow\ x + z + y \prec x + z \tag{99}
\]

for all x, y, and z that are statistically independent. So proper risk aversion is an extension of DARA, and it goes beyond DARA in the following sense. If a properly risk-averse (and hence DARA) investor is just willing to take on either risk y or risk z, both independent of his current wealth, he will be unwilling to take on both together, even when they are independent of each other:

\[
x + z \sim x \ \text{ and } \ x + y \sim x \ \Rightarrow\ x + z + y \prec x. \tag{100}
\]

A necessary condition for proper risk aversion is that \(A''(W) \ge A'(W)\,A(W)\). A sufficient condition for proper risk aversion is that the derivatives of the utility function alternate in sign. As the latter is true of all standard utility functions, they all display proper risk aversion.

To illustrate, suppose u(x) = −1/x. Then A(x) = 2/x, which is decreasing, and the derivatives of u alternate in sign. Wealth is deterministic, fixed at 10. Gambles y and z each offer equal chances of gaining 5 or losing 2.5. So \(\mathbb{E}[u(x + y)] = \mathbb{E}[u(x + z)] = u(x) = -0.1\), and either one is just acceptable. But for both gambles together, \(\mathbb{E}[u(x + y + z)] = -0.1025\), so they will not be accepted together.
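The numbers in this example can be confirmed with exact rational arithmetic. This is a sketch with hypothetical helper names that simply enumerates the joint outcomes of independent gambles.

```python
from fractions import Fraction
from itertools import product

def u(w):
    return Fraction(-1) / w          # u(w) = -1/w, so A(w) = 2/w

def expected_utility(wealth, gambles):
    """E[u(wealth + sum of independent gambles)]; each gamble is a list of (payoff, prob)."""
    total = Fraction(0)
    for combo in product(*gambles):
        prob, final = Fraction(1), wealth
        for payoff, p in combo:
            prob *= p
            final += payoff
        total += prob * u(final)
    return total

half = Fraction(1, 2)
gamble = [(Fraction(5), half), (Fraction(-5, 2), half)]   # gain 5 or lose 2.5
wealth = Fraction(10)
print(expected_utility(wealth, []), expected_utility(wealth, [gamble]),
      expected_utility(wealth, [gamble, gamble]))          # -1/10 -1/10 -41/400
```

The value −41/400 is the −0.1025 quoted above.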

Standard Risk Aversion and Temperance — Kimball (1991, 1992): Standard risk aversion is a refinement of proper risk aversion. It replaces the first antecedent in (99), \(x + z \prec x\), with

\[
\mathbb{E}[u'(x + z)] \ge \mathbb{E}[u'(x)]. \tag{101}
\]

Standard risk aversion is equivalent to the combination of DARA and decreasing absolute prudence.

The fourth derivative of utility is associated with temperance. A utility function is temperate if its fourth derivative is negative and intemperate if its fourth derivative is positive. Just as DARA requires the third derivative of the utility function to be positive, decreasing absolute prudence requires the fourth derivative to be negative:


\[
0 \ge P'(w) = -\frac{u^{(4)}(w)}{u''(w)} + \frac{[u'''(w)]^2}{[u''(w)]^2}. \tag{102}
\]

The second term is positive, so prudence can be decreasing only if the fourth derivative is negative. Using a fourth-order expansion of utility as in (72), temperance can be associated with the fourth moment of the probability distribution, kurtosis. Kimball also showed that temperance means an investor shows moderation in accepting independent risks.
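As a check of (102), the sketch below evaluates P′(w) for power utility, u(w) = w^(1−γ)/(1−γ), whose derivatives are known in closed form. Power utility is chosen here only as an example. Its absolute prudence is P(w) = (γ + 1)/w, so the formula should return −(γ + 1)/w², and its fourth derivative is negative, meaning it is temperate. The function names are hypothetical.

```python
def power_utility_derivatives(w, gamma):
    """u', u'', u''', u'''' for u(w) = w**(1 - gamma) / (1 - gamma), gamma > 0.
    The same formulas give the derivatives of log utility when gamma = 1."""
    d1 = w ** (-gamma)
    d2 = -gamma * w ** (-gamma - 1)
    d3 = gamma * (gamma + 1) * w ** (-gamma - 2)
    d4 = -gamma * (gamma + 1) * (gamma + 2) * w ** (-gamma - 3)
    return d1, d2, d3, d4

def prudence_slope(d2, d3, d4):
    """P'(w) = -u''''/u'' + (u'''/u'')**2, the right-hand side of (102)."""
    return -d4 / d2 + (d3 / d2) ** 2

w, gamma = 2.0, 3.0
_, d2, d3, d4 = power_utility_derivatives(w, gamma)
print(d4 < 0, prudence_slope(d2, d3, d4), -(gamma + 1) / w**2)
```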

Completely Monotonic Utility — Brockett and Golden (1987): In discussing higher-order derivatives of a utility function, it was shown that if the utility function has a domain that is unbounded above and each derivative has a constant sign, then the derivatives must alternate in sign. If the utility function is infinitely differentiable with this alternating property, then it is said to be completely (or totally) monotonic. For example, the commonly used LRT utilities are all completely monotonic. The completely monotonic class can be called \(\mathcal{U}_\infty\); it is the class to which infinite-order stochastic dominance applies. Although nth-order stochastic dominance is typically harder to describe the larger is n, this is not true for \(\mathcal{U}_\infty\). Define the moment-generating function \(\Psi_z(t) \equiv \mathbb{E}[e^{-tz}]\) for any random variable z whose domain is bounded below. Then, provided \(\mathbb{E}[u(x)]\) and \(\mathbb{E}[u(y)]\) are finite,

\[
\mathbb{E}[u(x)] \ge \mathbb{E}[u(y)] \ \ \forall\, u \in \mathcal{U}_\infty \quad \text{iff} \quad \Psi_x(t) \le \Psi_y(t) \ \ \forall\, t > 0. \tag{103}
\]
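Condition (103) lends itself to a quick numerical screen. This sketch assumes discrete random variables and a finite grid of t values. Since (103) must hold for every t > 0, a grid can only refute dominance, not establish it. In the example a sure payoff of 1 is compared with a fair 50/50 gamble on 0 or 2. As a spot check, one completely monotonic utility, square-root power utility, is also evaluated. The function names are hypothetical.

```python
import math

def psi(outcomes, probs, t):
    """Psi_z(t) = E[exp(-t z)] for a discrete random variable."""
    return sum(p * math.exp(-t * z) for z, p in zip(outcomes, probs))

def passes_mgf_test(x, y, ts):
    """Grid check of (103): Psi_x(t) <= Psi_y(t) for every t in ts."""
    return all(psi(*x, t) <= psi(*y, t) for t in ts)

def expected_power_utility(dist, gamma=0.5):
    """E[w**gamma / gamma], one completely monotonic utility for 0 < gamma < 1."""
    outcomes, probs = dist
    return sum(p * v ** gamma / gamma for v, p in zip(outcomes, probs))

x = ([1.0], [1.0])               # a sure payoff of 1
y = ([0.0, 2.0], [0.5, 0.5])     # a fair 50/50 spread around 1
ts = [0.1 * k for k in range(1, 101)]
print(passes_mgf_test(x, y, ts), passes_mgf_test(y, x, ts))
print(expected_power_utility(x), expected_power_utility(y))
```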

Result (103) makes the determination of stochastic dominance a straightforward exercise for any pair of random variables with a known moment-generating function. All completely monotonic utility functions are positive combinations of exponential utility functions. That is, for every utility function in \(\mathcal{U}_\infty\), there is a nonnegative function ω(a) such that marginal utility is26

\[
u'(x) = \int_0^\infty e^{-ax}\, \omega(a)\, da. \tag{104}
\]

For example, \(\omega(a) = a^{-\gamma}\) with \(\gamma < 1\) gives

\[
u'(x) = \int_0^\infty a^{-\gamma} e^{-ax}\, da = \Gamma(1 - \gamma)\, x^{\gamma - 1}, \tag{105}
\]

where Γ(·) is the gamma function. Removing this constant with an allowed positive rescaling and integrating gives the familiar power utility function \(x^\gamma/\gamma\). This property is quite useful, for example, in the aggregation problem. When aggregated utility is \(U(x) = \sum_i \lambda_i u_i(x)\) for some set of positive weights \(\lambda_i\), marginal utility is

\[
U'(x) = \sum_i \lambda_i u_i'(x) = \sum_i \lambda_i \int_0^\infty e^{-ax}\, \omega_i(a)\, da = \int_0^\infty e^{-ax} \sum_i \lambda_i \omega_i(a)\, da \equiv \int_0^\infty e^{-ax}\, \Omega(a)\, da. \tag{106}
\]
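Equation (105) and the aggregation identity (106) can be verified by numerical quadrature. This is a sketch assuming \(\omega(a) = a^{-\gamma}\) with γ ≤ 1/2, so that after the substitution a = s² the integrand is bounded near zero. It uses a midpoint rule on a truncated range and illustrative weights λ = (0.3, 0.7). The function names are hypothetical.

```python
import math

def marginal_utility_from_weights(omega, x, s_max=12.0, steps=20000):
    """u'(x) = integral_0^inf exp(-a x) omega(a) da, as in (104).

    Uses the substitution a = s**2 (da = 2 s ds), which removes the a**(-gamma)
    singularity at 0 for gamma <= 1/2, and the midpoint rule on [0, s_max]."""
    ds = s_max / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * ds
        a = s * s
        total += math.exp(-a * x) * omega(a) * 2.0 * s * ds
    return total

def power_weight(gamma):
    """omega(a) = a**(-gamma), gamma < 1."""
    return lambda a: a ** (-gamma)

def mixed_weight(a):
    """Omega(a) = 0.3 * a**0 + 0.7 * a**(-0.5), the weight function of a two-person aggregate."""
    return 0.3 * power_weight(0.0)(a) + 0.7 * power_weight(0.5)(a)

x = 2.0
for gamma in (-1.0, 0.0, 0.5):
    numeric = marginal_utility_from_weights(power_weight(gamma), x)
    closed_form = math.gamma(1 - gamma) * x ** (gamma - 1)   # right-hand side of (105)
    print(gamma, numeric, closed_form)
print(marginal_utility_from_weights(mixed_weight, x))
```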

26 The utility function itself can be represented as \(u(x) = u(x_0) + \int_0^\infty e^{-a x_0}\big[1 - e^{-a(x - x_0)}\big]\, \omega(a)/a\; da\), where \(x_0\) is any value for which utility is finite; e.g., it cannot be 0 for log utility.


References

Fishburn, P. 1985. “Third-Degree Stochastic Dominance and Random Variables.” Economics Letters 19: 113-117.

Hadar, J., and W. R. Russell. 1969. “Rules for Ordering Uncertain Prospects.” American Economic Review 59: 25-34.

Kahneman, Daniel, and A. Tversky. 1979. “Prospect Theory: An Analysis of Decision under Risk.” Econometrica 47: 263-297.

Kreps, David. 1988. Notes on the Theory of Choice. Underground Classics in Economics. Westview Press.

Machina, Mark. 1982. “‘Expected Utility’ Analysis Without the Independence Axiom.” Econometrica 50: 277-323.

Pratt, John. 1964. “Risk-Aversion in the Small and in the Large.” Econometrica 32: 122-136.

Richard, Scott. 1975. “Multivariate Risk Aversion, Utility Independence, and Separable Utility Functions.” Management Science 22: 12-21.

Ross, Stephen. 1971. “Risk and Efficiency.” Unpublished working paper.

Ross, Stephen. 1981. “Some Stronger Measures of Risk Aversion in the Small and in the Large.” Econometrica 49: 621-638.

Rothschild, Michael, and Joseph Stiglitz. 1970. “Increasing Risk I: A Definition.” Journal of Economic Theory 2: 225-243.

Samuelson, Paul. 1963.

Shalit, H., and S. Yitzhaki. 1994. “Marginal Conditional Stochastic Dominance.” Management Science 40: 670-684.

Vickson, R. G. 1975. “Stochastic Dominance for Decreasing Absolute Risk Aversion.” Journal of Financial and Quantitative Analysis 10: 799-811.

von Neumann, John, and Oskar Morgenstern. 1944. Theory of Games and Economic Behavior. Princeton University Press.