Active Portfolio Management

Lectures

Richard R. Lindsey

Portfolio Choice

Individual:

1. Strictly prefers more to less (strictly increasing utility

function)

2. Risk averse

$w_0$: initial wealth

$r_f$: riskless interest rate

$\tilde r_j$: random return on the j-th risky asset

$a_j$: dollar investment in the j-th risky asset

$\tilde w$: uncertain end-of-period wealth


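End-of-period wealth is

$$\tilde w = \Big(w_0 - \sum_j a_j\Big)(1 + r_f) + \sum_j a_j(1 + \tilde r_j) = w_0(1 + r_f) + \sum_j a_j(\tilde r_j - r_f)$$

and the individual solves

$$\max_{\{a_j\}} E\Big[U\Big(w_0(1 + r_f) + \sum_j a_j(\tilde r_j - r_f)\Big)\Big]$$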


F.O.C.: $E[U'(\tilde w)(\tilde r_j - r_f)] = 0 \quad \forall j$

S.O.C.: $E[U''(\tilde w)(\tilde r_j - r_f)^2] < 0 \quad \forall j$

$U'(\cdot) > 0$: more preferred to less

$U''(\cdot) < 0$: concave utility or risk averse


Theorem: An individual who is risk averse and strictly prefers more to less will invest in risky assets iff the expected rate of return on at least one risky asset exceeds $r_f$.

Consider the case with a single risky asset:

F.O.C.: $E[U'(\tilde w)(\tilde r - r_f)] = 0$


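Claim:

$a^* > 0$ iff $E[\tilde r] - r_f > 0$

$a^* = 0$ iff $E[\tilde r] - r_f = 0$

$a^* < 0$ iff $E[\tilde r] - r_f < 0$

Consider the no investment case ($a = 0$, so end-of-period wealth $w_0(1 + r_f)$ is not random):

$$E[U'(w_0(1 + r_f))(\tilde r - r_f)] = U'(w_0(1 + r_f))\,(E[\tilde r] - r_f)$$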


$U'(\cdot) > 0$, so the sign is entirely determined by $E[\tilde r] - r_f$:

$E[\tilde r] - r_f > 0$: can increase utility by adding some of the risky asset

$E[\tilde r] - r_f < 0$: can increase utility by shorting some of the risky asset

$E[\tilde r] - r_f = 0$: utility is maximized

In the multi-asset case, to hold no risky assets or to short them requires

$$E[U'(w_0(1 + r_f))(\tilde r_j - r_f)] \le 0 \quad \forall j$$

And again

$$U'(w_0(1 + r_f))\,(E[\tilde r_j] - r_f) \le 0 \quad \forall j$$

Therefore, a risk averse individual with strictly increasing utility avoids any positive investment in risky assets only if none of the investments have a positive risk premium:

$$a_j \le 0 \;\; \forall j \quad \text{only if} \quad E[\tilde r_j] - r_f \le 0 \;\; \forall j$$

When one or more of the risky assets has a positive risk premium, the investor will have positive holdings in some risky assets:

$$\exists\, j : a_j > 0 \quad \text{if} \quad \exists\, j' : E[\tilde r_{j'}] - r_f > 0$$

Note that j and j' are not necessarily the same because with more than one risky asset, a positive risk premium on an asset does not necessarily mean a positive investment (e.g. 2 assets with a positive risk premium but one stochastically dominates the other).
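The single-asset claim can be checked numerically. The sketch below is only an illustration under assumptions that are not in the lecture: CARA utility U(w) = -exp(-gamma*w), a two-state excess return, and the hypothetical function names `foc` and `optimal_holding`. It solves the F.O.C. by bisection, which works because the F.O.C. is decreasing in a when U is concave.

```python
import math

def foc(a, w0, rf, excess, probs, gamma):
    """E[U'(w)(r - rf)] for the assumed CARA utility U(w) = -exp(-gamma * w)."""
    total = 0.0
    for x, p in zip(excess, probs):          # x is the excess return r - rf in one state
        w = w0 * (1 + rf) + a * x            # end-of-period wealth in that state
        total += p * gamma * math.exp(-gamma * w) * x
    return total

def optimal_holding(w0, rf, excess, probs, gamma=1.0, lo=-50.0, hi=50.0):
    """Bisection on the F.O.C.; the bracket [lo, hi] is an assumption of this sketch."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if foc(mid, w0, rf, excess, probs, gamma) > 0:
            lo = mid                         # marginal utility of adding risk is still positive
        else:
            hi = mid
    return 0.5 * (lo + hi)

a_pos = optimal_holding(1.0, 0.02, [0.10, -0.10], [0.6, 0.4])   # positive risk premium
a_neg = optimal_holding(1.0, 0.02, [0.10, -0.10], [0.4, 0.6])   # negative risk premium
a_zero = optimal_holding(1.0, 0.02, [0.10, -0.10], [0.5, 0.5])  # zero risk premium
```

With these inputs the holding is positive, negative, and (numerically) zero respectively, matching the three cases of the claim.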

Risk Aversion


Consider now the case with one risky asset and one riskless

asset.

For a monotonically increasing strictly concave (MISC) individual to invest all her wealth in the risky asset:

$$E[U'(w_0(1 + \tilde r))(\tilde r - r_f)] \ge 0$$

Take a 1st order Taylor series expansion of $U'(w_0(1 + \tilde r))$ around $U'(w_0(1 + r_f))$.

$$E[U'(w_0(1 + \tilde r))(\tilde r - r_f)] = U'(w_0(1 + r_f))\,E[\tilde r - r_f] + U''(w_0(1 + r_f))\,w_0\,E[(\tilde r - r_f)^2] + o(E[(\tilde r - r_f)^2])$$

Note that this is for a small risk.

The minimum risk premium to induce full investment is

$$E[\tilde r - r_f] \ge -\frac{U''(w_0(1 + r_f))}{U'(w_0(1 + r_f))}\,w_0\,E[(\tilde r - r_f)^2] = R_A(w_0(1 + r_f))\,w_0\,E[(\tilde r - r_f)^2]$$

This is known as the Arrow-Pratt measure of absolute risk aversion (the inverse of $R_A$ is the risk tolerance).

For small risks (or small changes in risk) it is a measure of the intensity of an individual's aversion to risk.

It is a measure of curvature (but since von Neumann-Morgenstern utility is unique only up to positive affine transformations, the 2nd derivative alone is not sufficient; it is normalized by the 1st derivative, $R_A(z) = -U''(z)/U'(z)$).

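Theorem:

$dR_A(z)/dz < 0 \;\; \forall z$: decreasing absolute risk aversion

$dR_A(z)/dz > 0 \;\; \forall z$: increasing absolute risk aversion

$dR_A(z)/dz = 0 \;\; \forall z$: constant absolute risk aversion

and

$da/dw_0 > 0 \;\; \forall w_0$ if $dR_A(z)/dz < 0 \;\; \forall z$

$da/dw_0 < 0 \;\; \forall w_0$ if $dR_A(z)/dz > 0 \;\; \forall z$

$da/dw_0 = 0 \;\; \forall w_0$ if $dR_A(z)/dz = 0 \;\; \forall z$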



Decreasing absolute risk aversion implies that the risky asset

is a normal good (i.e. the dollar demand increases as

wealth increases).

Increasing absolute risk aversion implies that the risky asset

is an inferior good (i.e. the dollar demand decreases as

wealth increases).

Constant absolute risk aversion implies that the dollar

demand is invariant with respect to wealth.

Absolute risk aversion is therefore related to the dollar demand for the risky asset.

But under decreasing absolute risk aversion, an individual may actually increase, hold constant, or decrease the proportion of wealth in the risky asset as wealth increases.

This brings us to the Arrow-Pratt measure of relative risk aversion:

$$R_R(z) = z\,R_A(z)$$

Theorem:

$\eta > 1$ if $dR_R(z)/dz < 0$ (relatively elastic)

$\eta = 1$ if $dR_R(z)/dz = 0$

$\eta < 1$ if $dR_R(z)/dz > 0$ (relatively inelastic)

where

$$\eta = \frac{da}{dw_0}\,\frac{w_0}{a}$$

is the wealth elasticity of demand.



η<1: the proportion of agent’s initial wealth invested in the

risky asset decreases as wealth increases

η=1: the proportion of agent’s initial wealth invested in the

risky asset is constant as wealth increases

η>1: the proportion of agent’s initial wealth invested in the

risky asset increases as wealth increases

Linear Risk Tolerance Utility


To get sharper results and closed form solution for securities

holdings, we need to specify the form of the utility

function. Most typically we use a class of utility function

known as linear risk tolerance (LRT) utilities or HARA

utilities (hyperbolic absolute risk aversion). These utility

functions satisfy state independence and time additivity.



Definition (linear risk tolerance utility): the time-additive and state-independent utility function U(·) satisfies linear risk tolerance if it solves the differential equation:

$$-\frac{U'(z)}{U''(z)} = \phi + \beta z$$

where φ and β are independent of z.

Note: every LRT utility function is identified by 2 parameters: the intercept φ and the slope β.



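This differential equation has three sets of solutions depending on the value of β:

(A) $\beta \ne 0, 1$: $U(z) \approx \frac{1}{\beta - 1}(\phi + \beta z)^{1 - 1/\beta}$, defined where $\phi + \beta z > 0$

(B) $\beta = 1$: $U(z) \approx \ln(\phi + z)$

(C) $\beta = 0$: $U(z) \approx -\phi \exp(-z/\phi)$, where $\phi > 0$

where ≈ means that the solutions are unique up to a positive linear transform.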
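As a quick sanity check on the three solution classes, the sketch below estimates the risk tolerance -U'(z)/U''(z) by finite differences and compares it with φ + βz. The function names, the step size and the parameter values are illustrative assumptions, and each case uses one representative utility from its class.

```python
import math

def lrt_utility(z, phi, beta):
    """One representative of each LRT class (defined where phi + beta*z > 0)."""
    if beta == 0:
        return -phi * math.exp(-z / phi)                      # class (C), needs phi > 0
    if beta == 1:
        return math.log(phi + z)                              # class (B)
    return (phi + beta * z) ** (1 - 1 / beta) / (beta - 1)    # class (A)

def risk_tolerance(z, phi, beta, h=1e-3):
    """-U'(z)/U''(z) estimated with central finite differences."""
    u = lambda x: lrt_utility(x, phi, beta)
    d1 = (u(z + h) - u(z - h)) / (2 * h)
    d2 = (u(z + h) - 2 * u(z) + u(z - h)) / (h * h)
    return -d1 / d2

# (phi, beta) pairs: a power case, the log case, the exponential case, the quadratic case
cases = [(1.0, 0.5), (2.0, 1), (1.5, 0), (4.0, -1)]
errors = [abs(risk_tolerance(2.0, phi, beta) - (phi + beta * 2.0)) for phi, beta in cases]
```

Each entry of `errors` should be close to zero, up to finite-difference error.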



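These three classes are:

(A) Generalized Power Utility (when $\phi \ne 0$)

$$R_A(z) = \frac{1}{\phi + \beta z}, \qquad \frac{dR_A(z)}{dz} = \frac{-\beta}{(\phi + \beta z)^2} < 0 \;\; (\text{for } \beta > 0)$$

$$R_R(z) = \frac{z}{\phi + \beta z}, \qquad \frac{dR_R(z)}{dz} = \frac{\phi}{(\phi + \beta z)^2}$$

which is

> 0 iff φ > 0

= 0 iff φ = 0

< 0 iff φ < 0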

Recall from Risk Aversion the theorem: $\eta > 1$ if $dR_R(z)/dz < 0$, $\eta = 1$ if $dR_R(z)/dz = 0$, and $\eta < 1$ if $dR_R(z)/dz > 0$, where $\eta = \frac{da}{dw_0}\frac{w_0}{a}$.



When φ = 0 we have power utility, which is CPRA or constant proportional (relative) risk aversion. Also known as iso-elastic utility.

The proportion of wealth in the risky asset is invariant to changes in wealth.

When β = −1 we have quadratic utility.



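(B) Generalized Log Utility (when $\phi \ne 0$)

$$R_A(z) = \frac{1}{\phi + z}, \qquad \frac{dR_A(z)}{dz} = \frac{-1}{(\phi + z)^2} < 0$$

$$R_R(z) = \frac{z}{\phi + z}, \qquad \frac{dR_R(z)}{dz} = \frac{\phi}{(\phi + z)^2}$$

which is

> 0 iff φ > 0

= 0 iff φ = 0

< 0 iff φ < 0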




When φ = 0 we have log utility, which is CPRA or constant proportional (relative) risk aversion. Also known as iso-elastic utility.

The proportion of wealth in the risky asset is invariant to changes in wealth.

Note that when φ = 0 we have $R_R(z) = 1$.



(C) Negative Exponential Utility

$$R_A(z) = \frac{1}{\phi}, \qquad \frac{dR_A(z)}{dz} = 0$$

Constant absolute risk aversion (CARA)

Dollar demand for risky assets is unaffected by changes in wealth (riskless borrowing or lending absorbs all changes).
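The wealth effects stated for log utility and for negative exponential utility can be illustrated with a two-state excess return, for which the F.O.C. has a closed-form solution. The formulas in `demand_log` and `demand_cara` below are derived for this sketch from the single-asset F.O.C. under that two-state assumption; the names and numbers are hypothetical.

```python
import math

def demand_log(w0, rf, x_up, x_dn, p):
    """Optimal dollar holding for U(w) = ln(w); x_up > 0 > x_dn are excess returns."""
    big_w = w0 * (1 + rf)
    mean_excess = p * x_up + (1 - p) * x_dn
    return -big_w * mean_excess / (x_up * x_dn)

def demand_cara(w0, rf, x_up, x_dn, p, gamma=2.0):
    """Optimal dollar holding for U(w) = -exp(-gamma * w); wealth drops out."""
    return math.log(p * x_up / ((1 - p) * -x_dn)) / (gamma * (x_up - x_dn))

def wealth_elasticity(demand, w0, *args, h=1e-4):
    """eta = (da/dw0) * (w0 / a), using a central finite difference."""
    da = (demand(w0 + h, *args) - demand(w0 - h, *args)) / (2 * h)
    return da * w0 / demand(w0, *args)

eta_log = wealth_elasticity(demand_log, 10.0, 0.02, 0.10, -0.08, 0.6)
eta_cara = wealth_elasticity(demand_cara, 10.0, 0.02, 0.10, -0.08, 0.6)
```

Under these assumptions `eta_log` is 1 (constant proportion of wealth in the risky asset) and `eta_cara` is 0 (constant dollar demand).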

Stochastic Dominance

Empirical observations and the corresponding properties of U(z):

Investors prefer more to less: $U'(z) > 0$

Investors are risk averse: $U''(z) < 0$

The risky asset is a normal good: $dR_A(z)/dz < 0$


We now want to relate these three properties of utility

functions to the properties of payoff distributions.

For example, one question we can ask is: Under what

circumstances can we unambiguously say that an

individual will prefer one risky asset to another if all

we know is that he prefers more to less?


We can answer questions like this using stochastic

dominance.

Note that stochastic dominance is:

1. Always a pairwise comparison.

2. Only a partial ordering among risky assets.

3. Much richer than what we will cover here (e.g. you can

develop much of modern portfolio theory just using

stochastic dominance).



Definition: First Order Stochastic Dominance

Let $F(x) = \Pr[X \le x]$, let $F_A(x)$ and $F_B(x)$ be different distributions, and suppose there exists a such that $F(a) = 0$.

If $F_A(x) - F_B(x) \le 0 \;\; \forall x$, and $< 0$ for some x,

then $X_A$ FSD $X_B$.




Definition: Second Order Stochastic Dominance

If $\int_a^t [F_A(x) - F_B(x)]\,dx \le 0 \;\; \forall t$, and $< 0$ for some t,

and $E[X_A] \ge E[X_B]$,

then $X_A$ SSD $X_B$.




Definition: Third Order Stochastic Dominance

If $\int_a^y \int_a^t [F_A(x) - F_B(x)]\,dx\,dt \le 0 \;\; \forall y$, and $< 0$ for some y,

and $E[X_A] \ge E[X_B]$ and $Var[X_A] \le Var[X_B]$,

then $X_A$ TSD $X_B$.






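Theorem: $X_A$ FSD $X_B$ ⇒ $X_A$ SSD $X_B$ ⇒ $X_A$ TSD $X_B$ (these are progressively weaker tests).

Theorem: $E[U(X_A)] > E[U(X_B)]$ for all U(·) (that are finite for all finite x) such that $U'(x) > 0$ everywhere iff $X_A$ FSD $X_B$ (i.e. prefers more to less).

Theorem: $E[U(X_A)] > E[U(X_B)]$ for all U(·) (that are finite for all finite x) such that $U'(x) > 0$ and $U''(x) < 0$ everywhere iff $X_A$ SSD $X_B$ (i.e. risk averse).

Theorem: $E[U(X_A)] > E[U(X_B)]$ for all U(·) (that are finite for all finite x) such that $U'(x) > 0$, $U''(x) < 0$ and $U'''(x) > 0$ everywhere iff $X_A$ TSD $X_B$.

Theorem: $E[U(X_A)] > E[U(X_B)]$ for all U(·) (that are finite for all finite x) such that $U'(x) > 0$, $U''(x) < 0$ and $dR_A(x)/dx < 0$ everywhere if $X_A$ TSD $X_B$ (i.e. risky asset is a normal good), since decreasing absolute risk aversion implies $U'''(x) > 0$.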



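Theorem: The following three statements are equivalent:

1. A FSD B

2. $F_A(x) \le F_B(x)$ for all x

3. $\tilde x_A = \tilde x_B + \alpha$ in distribution, where α ≥ 0

Theorem: The following three statements are equivalent:

1. A SSD B

2. $E[\tilde x_A] = E[\tilde x_B]$ and $\int_a^t [F_A(x) - F_B(x)]\,dx \le 0 \;\; \forall t$, and $< 0$ for some t

3. $\tilde x_B = \tilde x_A + \varepsilon$ in distribution, where $E[\varepsilon \mid \tilde x_A] = 0$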


Let’s consider an example

Which investment do we choose?


$X_1 = 1$ with probability 0.25, $X_1 = 4$ with probability 0.75

2




X

$E[X_1] = 3.25$, $Var[X_1] = 1.6875$

$E[X_2] = 3.25$, $Var[X_2] = 1.6875$



[Figure: cumulative distribution functions of X1 and X2]


Cannot have FSD because the cumulative distribution

functions cross.

No SSD because both distribution functions are admissible.

Definition: A distribution is admissible or efficient with

respect to a set of distribution functions, S, if it is not

dominated by a member of S.




[Figure: cumulative distribution functions of X1 and X2, with the function g(t) plotted below]


X2 TSD X1, so we would choose X2.

Note that this choice reflects a preference for skewness.

If you must take a risky gamble, do you prefer to take it

when wealth is high or low?
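The three dominance tests can be sketched for discrete distributions using the identity that the k-fold integral of the CDF up to t equals E[max(t - X, 0)^k]/k!. Everything below is illustrative: the function names are hypothetical, the check runs on a finite grid (so the third-order test is approximate), and the distribution used for X2 is an assumption chosen only because it has the same mean (3.25) and variance (1.6875) as X1.

```python
from math import factorial

def integrated_cdf(dist, t, k):
    """k-fold integral of the CDF of a discrete distribution {value: prob} up to t.
    k = 0 returns F(t); for k >= 1 it equals E[max(t - X, 0)^k] / k!."""
    if k == 0:
        return sum(p for x, p in dist.items() if x <= t)
    return sum(p * max(t - x, 0.0) ** k for x, p in dist.items()) / factorial(k)

def mean(dist):
    return sum(x * p for x, p in dist.items())

def dominates(a, b, order, n_grid=600, tol=1e-9):
    """True if a dominates b at the given order (1 = FSD, 2 = SSD, 3 = TSD)."""
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    grid = sorted(set(a) | set(b) | {lo + (hi - lo) * i / n_grid for i in range(n_grid + 1)})
    diffs = [integrated_cdf(a, t, order - 1) - integrated_cdf(b, t, order - 1) for t in grid]
    mean_ok = order == 1 or mean(a) >= mean(b) - tol
    # need the difference <= 0 everywhere and < 0 somewhere
    return mean_ok and max(diffs) <= tol and min(diffs) < -tol

x1 = {1.0: 0.25, 4.0: 0.75}    # X1 as given in the example
x2 = {2.5: 0.75, 5.5: 0.25}    # hypothetical X2 with the same mean and variance

for order, name in [(1, "FSD"), (2, "SSD"), (3, "TSD")]:
    print(name, "X1 over X2:", dominates(x1, x2, order), "| X2 over X1:", dominates(x2, x1, order))
```

For this assumed pair there is no first or second order dominance in either direction, while the positively skewed distribution dominates at third order.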


Riskiness of Distributions

This is a partial ordering of distributions.

Definition: Distribution Y is more risky than distribution X under any one of the following criteria:

1. Y=X+Z where E[Z|X]=0 and non-degenerate.

2. Y is obtained from X by the addition of a mean

preserving spread.

3. X is preferred to Y by all risk averters providing

E[X]=E[Y].

4. Var[Y] > Var[X] provided E[X]=E[Y].



Theorem: The partial orderings given by 1, 2, and 3 are

equivalent.

Theorem: The partial orderings given by 1, 2, 3, and 4 are

equivalent for normal distributions. (Reason: normals are

stable under addition if variances are finite.)


Bibliography

Huang, Chi-fu, and Robert Litzenberger, Foundations for

Financial Economics, North-Holland.

Levy, Haim, Stochastic Dominance: Investment Decision

Making under Uncertainty, Springer.

Ohlson, James, The Theory of Financial Markets and

Information, North-Holland.

Rothschild, M. and J. E. Stiglitz (1970). "Increasing Risk: I. A Definition." Journal of Economic Theory 2: 225-43.


Optimization: Definitions


Our optimization problems will take the form:

$$\max_x f(x) \text{ subject to } x \in S$$

where f is a function, x is an n-vector and S is a set of n-vectors. We call f the objective function, x the choice variable or control variable, and S the constraint set or opportunity set.



Definition: The value x* of the variable x solves the problem if

$$f(x^*) \ge f(x) \quad \forall x \in S$$

In this case, we say that x* is a maximizer of the function f subject to the constraint x ∈ S, and that f(x*) is the maximum (or maximum value) of the function f subject to the constraint.



A minimizer is defined analogously.

x1 is a local maximizer x2 is a minimizer

x3 is a maximizer x4 is a ?

x5 is a ?




Note that we can transform the objective function f with any strictly increasing function g. In other words, the set of solutions to the problem

$$\max_x f(x) \text{ subject to } x \in S$$

is identical to the set of solutions to the problem

$$\max_x g(f(x)) \text{ subject to } x \in S$$

This fact is sometimes useful since it may be easier to work with a transform of the objective function rather than the original function.



Minimization problems are just the maximization of the negative of the objective function:

$$\min_x f(x) \text{ subject to } x \in S$$

has the same set of solutions as

$$\max_x -f(x) \text{ subject to } x \in S$$



Note that a continuous function on a compact set (closed

and bounded) attains both a minimum and a maximum on

that set (this is the Extreme Value Theorem). This is a

sufficient condition for a maximum (and a minimum) to

exist.


Interior Optimum: One Variable


Proposition: (FOC) Let f be a differentiable function of a single variable defined on the interval I. If a point x* in the interior of I is a local or global maximizer or minimizer of f then f '(x*) = 0 (i.e. it is stationary).

Proposition: (SOC) Let f be a function of a single variable with continuous first and second derivatives, defined on the interval I. Suppose that x* is a stationary point of f in the interior of I (so that f '(x*) = 0).

1. If f "(x*) < 0 then x* is a local maximizer.

2. If x* is a local maximizer then f "(x*) ≤ 0.

3. If f "(x*) > 0 then x* is a local minimizer.

4. If x* is a local minimizer then f "(x*) ≥ 0.

Note: conditions 1 and 3 are sufficient conditions; conditions 2 and 4 are necessary conditions.

Interior Optimum: Many Variables


Proposition: (FOC) Let f be a differentiable function of n variables defined on the set S. If the point x in the interior of S is a local or global maximizer or minimizer of f then f i'(x) = 0 for i = 1, ..., n (i.e. it is stationary).

Proposition (SOC) Let f be a function of n variables with continuous partial derivatives of first and second order, defined on the set S. Suppose that x* is a stationary point of f in the interior of S (so that f i'(x*) = 0 for all i).

1. If H(x*) is negative definite then x* is a local maximizer.

2. If x* is a local maximizer then H(x*) is negative semidefinite.

3. If H(x*) is positive definite then x* is a local minimizer.

4. If x* is a local minimizer then H(x*) is positive semidefinite.

Note: conditions 1 and 3 are sufficient conditions; conditions 2 and 4 are necessary conditions.



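where H is the Hessian matrix

$$H = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 f}{\partial x_n \partial x_n} \end{pmatrix}$$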



An implication of this result is that if x* is a stationary point of f then

1. if H(x*) is negative definite then x* is a local maximizer

2. if H(x*) is negative semidefinite, but neither negative definite nor positive semidefinite, then x* is not a local minimizer, but might be a local maximizer

3. if H(x*) is positive definite then x* is a local minimizer

4. if H(x*) is positive semidefinite, but neither positive definite nor negative semidefinite, then x* is not a local maximizer, but might be a local minimizer

5. if H(x*) is neither positive semidefinite nor negative semidefinite then x* is neither a local maximizer nor a local minimizer.

A stationary point which is neither a maximizer nor a minimizer is called a saddle point (note that not all saddle points look like a saddle; for example, every point (0, y) is a saddle point of the function $f(x, y) = x^3$).
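The five cases above can be sketched for two variables by classifying a symmetric 2x2 Hessian through its eigenvalues. The function name, the tolerance and the returned labels are assumptions of this illustration, not part of the lecture.

```python
import math

def classify_hessian_2x2(fxx, fxy, fyy, tol=1e-12):
    """Classify the symmetric matrix [[fxx, fxy], [fxy, fyy]] by its two eigenvalues."""
    center = 0.5 * (fxx + fyy)
    radius = math.sqrt((0.5 * (fxx - fyy)) ** 2 + fxy ** 2)
    small, large = center - radius, center + radius
    if large < -tol:
        return "negative definite"        # stationary point is a local maximizer
    if small > tol:
        return "positive definite"        # stationary point is a local minimizer
    if small < -tol and large > tol:
        return "indefinite"               # neither a local maximizer nor a local minimizer
    if small < -tol:
        return "negative semidefinite"    # not a local minimizer, might be a local maximizer
    if large > tol:
        return "positive semidefinite"    # not a local maximizer, might be a local minimizer
    return "zero"                         # both semidefinite: the test is inconclusive

print(classify_hessian_2x2(-2.0, 0.0, -2.0))   # f = -x^2 - y^2 at the origin
print(classify_hessian_2x2(2.0, 0.0, -2.0))    # f = x^2 - y^2 at the origin
print(classify_hessian_2x2(0.0, 0.0, 0.0))     # f = x^3 at any point (0, y)
```

The last call shows why the Hessian alone cannot detect the saddle points of $x^3$: the matrix is zero there, so it is both positive and negative semidefinite.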

Global Optimum: One Variable


Proposition: Let f be a differentiable function defined on

the interval I, and let x be in the interior of I. Then:

1. if f is concave then x is a global maximizer of f in I if and only if x

is a stationary point of f

2. if f is convex then x is a global minimizer of f in I if and only if x

is a stationary point of f .

So if f is twice differentiable:

1. f "(z) ≤ 0 for all z ∈ I ⇒ [x is a global maximizer of f in I if and

only if f '(x) = 0]

2. f "(z) ≥ 0 for all z ∈ I ⇒ [x is a global minimizer of f in I if and only

if f '(x) = 0].


Global Optimum: Many Variables


Proposition: Suppose that the function f has continuous

partial derivatives in a convex set S and let x be in the

interior of S. Then:

1. if f is concave then x is a global maximizer of f in S if and only if

it is a stationary point of f .

2. if f is convex then x is a global minimizer of f in S if and only if it

is a stationary point of f .

So if f is twice differentiable:

1. H(z) is negative semidefinite for all z ∈ S ⇒ [x is a global maximizer

of f in S if and only if x is a stationary point of f ].

2. H(z) is positive semidefinite for all z ∈ S ⇒ [x is a global minimizer

of f in S if and only if x is a stationary point of f ].



Note the difference between this and the local optima:

Sufficient conditions for local maximizer: if x* is a

stationary point of f and the Hessian of f is negative

definite at x* then x* is a local maximizer of f.

Sufficient conditions for global maximizer: if x* is a

stationary point of f and the Hessian of f is negative

semidefinite for all values of x then x* is a global

maximizer of f.


Constrained Optimization: Equality


Usually it is not enough to consider solutions which

maximize (or minimize) a particular function (e.g. Diet

Coke can).

Instead, we want to find a solution which is subject to fixed,

outside constraints.

To solve these problems, we can use Lagrange multipliers.




Suppose that Monique and Carl are going swimming in the river, and they see each other in a field bounded by the river. Since it is such a hot day, they want to jump in the river as quickly as possible, but they want to do it together. At what point (P) on the riverbank should they meet?



In mathematical terms, if d(M,P) is the distance between M and P, they must solve the problem:

$$\min_P f(P) = d(M,P) + d(P,C)$$

subject to the constraint:

$$g(P) = 0$$



We can solve this graphically if we recall that ellipses are curves of constant f(P) (i.e. for every point P on an ellipse, the total distance from one focus of the ellipse to P and then to the other focus is the same). So we need to find an ellipse (with C and M as the foci) which is tangent to the riverbank.



Or, mathematically, the normal vector to the ellipse must

point in the same direction as the normal vector to the

river.




Recall that the gradient of a function f (which is written $\nabla f$) is a normal vector to a level curve (in two dimensions) or a level surface (in higher dimensions). The length of the normal vector doesn't matter; any constant multiple of the gradient is also a normal vector. In our case, we have two functions whose normal vectors are parallel, so:

$$\nabla f(P) = -\lambda \nabla g(P)$$

The unknown multiplier −λ is necessary because the magnitudes of the two gradients may be different.



Alternatively, we can approach the problem by considering the optimization problem and combining it with the constraint to form a new function called the Lagrangian or Lagrangian function:

$$\min_{P,\lambda} L(P,\lambda) = \min_{P,\lambda}\,[f(P) + \lambda g(P)]$$

and then we set:

$$\nabla L(P,\lambda) = 0$$



Proposition: Let f and g be continuously differentiable functions of two variables defined on the set S, let c be a number, and suppose that (x*, y*) is an interior point of S that solves the problem

$$\max_{x,y} f(x,y) \text{ subject to } g(x,y) = c$$

Suppose also that either

$$\frac{\partial g(x^*,y^*)}{\partial x} \ne 0 \quad \text{or} \quad \frac{\partial g(x^*,y^*)}{\partial y} \ne 0$$



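Then there is a unique number λ such that (x*, y*) is a stationary point of the Lagrangian

$$L(x,y) = f(x,y) - \lambda(g(x,y) - c)$$

That is, (x*, y*) satisfy the FOC

$$\frac{\partial L(x^*,y^*)}{\partial x} = \frac{\partial f(x^*,y^*)}{\partial x} - \lambda\frac{\partial g(x^*,y^*)}{\partial x} = 0$$

$$\frac{\partial L(x^*,y^*)}{\partial y} = \frac{\partial f(x^*,y^*)}{\partial y} - \lambda\frac{\partial g(x^*,y^*)}{\partial y} = 0$$

and the constraint

$$g(x^*,y^*) = c$$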



Algorithm for solving a two-variable maximization problem with an equality constraint.

Let f and g be continuously differentiable functions of two variables defined on a set S and let c be a number. If the problem

$$\max_{x,y} f(x,y) \text{ subject to } g(x,y) = c$$

has a solution, it may be found as follows.

A) Find all the values of (x, y, λ) for which

1. (x, y) is an interior point of S

2. (x, y, λ) satisfies the FOC and the constraint.

B) Find all the points (x, y) that satisfy g1'(x, y) = 0, g2'(x, y) = 0, and g(x, y) = c. (For most problems, there are no such values of (x, y). In particular, if g is linear there are no such values of (x, y).)

C) If the set S has any boundary points, find all the points that solve the problem $\max_{x,y} f(x,y)$ subject to the two conditions g(x, y) = c and (x, y) is a boundary point of S.

D) The points (x, y) you have found at which f(x, y) is largest are the maximizers of f.



Example: Consider the problem

$$\max_{x,y} xy \text{ subject to } x + y = 6$$

(Note that the objective function xy is defined on the set of all 2-vectors, which has no boundary. The constraint set is therefore not bounded, so the extreme value theorem does not imply that this problem has a solution.)

The Lagrangian is

$$L(x,y) = xy - \lambda(x + y - 6)$$



The FOC are

$$\frac{\partial L}{\partial x} = y - \lambda = 0, \qquad \frac{\partial L}{\partial y} = x - \lambda = 0$$

and the constraint is

$$x + y = 6$$

These equations have a unique solution, (x, y, λ) = (3, 3, 3). We have g'1(x, y) = 1 ≠ 0 and g'2(x, y) = 1 ≠ 0 for all (x, y), so we conclude that if the problem has a solution it is (x, y) = (3, 3).▄




Example: Consider the problem

$$\max_{x,y} x^2 y \text{ subject to } 2x^2 + y^2 = 3$$

(Note that the constraint set is compact and the objective function is continuous, so the extreme value theorem implies that this problem has a solution.)

The Lagrangian is

$$L(x,y) = x^2 y - \lambda(2x^2 + y^2 - 3)$$



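The FOC are

$$\frac{\partial L}{\partial x} = 2xy - 4\lambda x = 2x(y - 2\lambda) = 0$$

$$\frac{\partial L}{\partial y} = x^2 - 2\lambda y = 0$$

and the constraint is

$$2x^2 + y^2 - 3 = 0$$

(Note that the constraint could also be considered the FOC for the Lagrangian with respect to λ, the Lagrange multiplier.)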



To find the solutions of these three equations, first note that from the first equation we have either x = 0 or y = 2λ. We can check each possibility in turn.

x = 0: we have $y = 3^{1/2}$ and λ = 0, or $y = -3^{1/2}$ and λ = 0.

y = 2λ: we have $x^2 = y^2$ from the second equation, so either x = 1 or x = −1 from the third equation.

x = 1: either y = 1 and λ = 1/2, or y = −1 and λ = −1/2.

x = −1: either y = 1 and λ = 1/2, or y = −1 and λ = −1/2.



So, the FOC have six solutions:

1. (x, y, λ) = (0, $3^{1/2}$, 0), with f(x, y) = 0.

2. (x, y, λ) = (0, $-3^{1/2}$, 0), with f(x, y) = 0.

3. (x, y, λ) = (1, 1, 1/2), with f(x, y) = 1.

4. (x, y, λ) = (1, −1, −1/2), with f(x, y) = −1.

5. (x, y, λ) = (−1, 1, 1/2), with f(x, y) = 1.

6. (x, y, λ) = (−1, −1, −1/2), with f(x, y) = −1.

Now, g'1(x, y) = 4x and g'2(x, y) = 2y, so the only value of (x, y) for which g'1(x, y) = 0 and g'2(x, y) = 0 is (x, y) = (0, 0). At this point the constraint is not satisfied, so the only possible solutions of the problem are the solutions of the first-order conditions.

We conclude that the problem has two solutions, (x, y) = (1, 1) and (x, y) = (−1, 1).▄


2/3/2009, Richard R. Lindsey

Consider the problem

$$\max_{x,y} f(x,y) \text{ subject to } g(x,y) = c$$

and suppose we solve the problem for various values of c. Let the solution be (x*(c), y*(c)) with a Lagrange multiplier of λ*(c). Assume that the functions x*, y*, and λ* are differentiable and that g1'(x*(c), y*(c)) ≠ 0 or g2'(x*(c), y*(c)) ≠ 0, so that the first-order conditions are satisfied. Let f*(c) = f(x*(c), y*(c)).

Differentiate f*(c) with respect to c:

$$\frac{df^*(c)}{dc} = \frac{\partial f(x^*(c),y^*(c))}{\partial x}\frac{dx^*(c)}{dc} + \frac{\partial f(x^*(c),y^*(c))}{\partial y}\frac{dy^*(c)}{dc} = \lambda^*(c)\left[\frac{\partial g(x^*(c),y^*(c))}{\partial x}\frac{dx^*(c)}{dc} + \frac{\partial g(x^*(c),y^*(c))}{\partial y}\frac{dy^*(c)}{dc}\right]$$

(using the FOC). Note, however, that g(x*(c), y*(c)) = c for all c, so the derivatives of each side of this equality are the same for all c. That is

$$\frac{\partial g(x^*(c),y^*(c))}{\partial x}\frac{dx^*(c)}{dc} + \frac{\partial g(x^*(c),y^*(c))}{\partial y}\frac{dy^*(c)}{dc} = 1$$



Therefore

$$\frac{df^*(c)}{dc} = \lambda^*(c)$$

Or: the value of the Lagrange multiplier at the solution of the problem is equal to the rate of change in the maximal value of the objective function as the constraint is relaxed.

(Note that this follows directly from our use of the gradient earlier.)

So, in a utility maximization problem, the optimal value of the Lagrange multiplier measures the marginal utility of relaxing the constraint (for a budget constraint, the marginal utility of wealth), i.e. the shadow price of the constrained resource.
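The shadow-price interpretation can be illustrated with the earlier example, max xy subject to x + y = c, for which the FOC give x = y = λ = c/2. The sketch below compares a finite-difference estimate of df*(c)/dc with the multiplier; the function names and step size are assumptions of this illustration.

```python
def solve_xy_problem(c):
    """max xy s.t. x + y = c: the FOC y = lam, x = lam and the constraint give c/2 each."""
    x = y = lam = c / 2.0
    return x, y, lam

def max_value(c):
    """f*(c), the maximal value of the objective for a given constraint level c."""
    x, y, _ = solve_xy_problem(c)
    return x * y

def shadow_price_fd(c, h=1e-5):
    """Central finite-difference estimate of d f*(c) / dc."""
    return (max_value(c + h) - max_value(c - h)) / (2 * h)

x, y, lam = solve_xy_problem(6.0)
print((x, y, lam), shadow_price_fd(6.0))
```

At c = 6 the multiplier is 3, and the finite-difference derivative of the maximal value agrees with it up to rounding.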



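Sufficient conditions for a local optimum with two variables.

Consider the problem

$$\max_{x,y} f(x,y) \text{ subject to } g(x,y) = c$$

Suppose (x*, y*) and λ* satisfy the FOC:

$$\frac{\partial f(x^*,y^*)}{\partial x} - \lambda^*\frac{\partial g(x^*,y^*)}{\partial x} = 0, \qquad \frac{\partial f(x^*,y^*)}{\partial y} - \lambda^*\frac{\partial g(x^*,y^*)}{\partial y} = 0$$

and the constraint

$$g(x^*,y^*) = c$$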



Then

If D(x*, y*, λ*) > 0 then (x*, y*) is a local maximizer of f subject to the constraint g(x, y) = c.

If D(x*, y*, λ*) < 0 then (x*, y*) is a local minimizer of f subject to the constraint g(x, y) = c.

where D(x*, y*, λ*) is the determinant of the bordered Hessian of the Lagrangian:

$$D(x^*,y^*,\lambda^*) = \begin{vmatrix} 0 & g_x & g_y \\ g_x & f_{xx} - \lambda^* g_{xx} & f_{xy} - \lambda^* g_{xy} \\ g_y & f_{yx} - \lambda^* g_{yx} & f_{yy} - \lambda^* g_{yy} \end{vmatrix}$$

with subscripts denoting partial derivatives, all evaluated at (x*, y*).



Example: Consider again the problem

$$\max_{x,y} x^2 y \text{ subject to } 2x^2 + y^2 = 3$$

We previously found that there are six solutions to the FOC:

1. (x, y, λ) = (0, $3^{1/2}$, 0), with f(x, y) = 0.

2. (x, y, λ) = (0, $-3^{1/2}$, 0), with f(x, y) = 0.

3. (x, y, λ) = (1, 1, 1/2), with f(x, y) = 1.

4. (x, y, λ) = (1, −1, −1/2), with f(x, y) = −1.

5. (x, y, λ) = (−1, 1, 1/2), with f(x, y) = 1.

6. (x, y, λ) = (−1, −1, −1/2), with f(x, y) = −1.



Further, we found that solutions 3 and 5 are global maximizers and solutions 4 and 6 are global minimizers. The two remaining solutions of the FOC, (0, $3^{1/2}$) and (0, $-3^{1/2}$), are neither global maximizers nor global minimizers. Are they local maximizers or local minimizers?



The determinant of the bordered Hessian of the Lagrangian is

$$D(x,y,\lambda) = \begin{vmatrix} 0 & 4x & 2y \\ 4x & 2y - 4\lambda & 2x \\ 2y & 2x & -2\lambda \end{vmatrix}$$

The determinant is

$$-4x(-8\lambda x - 4xy) + 2y(8x^2 - 2y(2y - 4\lambda)) = 8(2\lambda(2x^2 + y^2) + y(4x^2 - y^2)) = 8(6\lambda + y(4x^2 - y^2))$$



(since $2x^2 + y^2 = 3$ at each solution, from the constraint). The value of the determinant at the two solutions is

(0, $3^{1/2}$, 0): $-8 \cdot 3^{3/2}$, so (0, $3^{1/2}$) is a local minimizer;

(0, $-3^{1/2}$, 0): $8 \cdot 3^{3/2}$, so (0, $-3^{1/2}$) is a local maximizer. ▄
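The bordered Hessian test for this example can be checked numerically. The sketch below evaluates the 3x3 determinant directly at each of the six FOC solutions; the function name and the cofactor expansion are illustrative choices of this sketch.

```python
import math

def bordered_hessian_det(x, y, lam):
    """Determinant of the bordered Hessian for max x^2 y s.t. 2x^2 + y^2 = 3."""
    m = [[0.0,   4 * x,           2 * y],
         [4 * x, 2 * y - 4 * lam, 2 * x],
         [2 * y, 2 * x,           -2 * lam]]
    # cofactor expansion along the first row
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

r3 = math.sqrt(3.0)
foc_solutions = [(0, r3, 0), (0, -r3, 0), (1, 1, 0.5), (1, -1, -0.5), (-1, 1, 0.5), (-1, -1, -0.5)]
for x, y, lam in foc_solutions:
    d = bordered_hessian_det(x, y, lam)
    kind = "local maximizer" if d > 0 else "local minimizer"
    print((x, y, lam), round(d, 4), kind)
```

The signs agree with the closed form $8(6\lambda + y(4x^2 - y^2))$ evaluated on the constraint.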



Proposition: Suppose that f and g are continuously differentiable functions defined on an open convex subset S of two-dimensional space and suppose that there exists a number λ* such that (x*, y*) is an interior point of S that is a stationary point of the Lagrangian

$$L(x,y) = f(x,y) - \lambda^*(g(x,y) - c)$$

Suppose further that g(x*, y*) = c. Then:

1. if L is concave (in particular if f is concave and λ*g is convex) then (x*, y*) solves the problem $\max_{x,y} f(x,y)$ subject to g(x, y) = c.

2. if L is convex (in particular if f is convex and λ*g is concave) then (x*, y*) solves the problem $\min_{x,y} f(x,y)$ subject to g(x, y) = c.

Envelope Theorem


Often we are interested in how the maximal value of a

function depends on its parameters.

Consider the unconstrained maximization problem:

$$\max_x f(x, a)$$

Assume that for any a the problem has a unique solution; denote this solution x*(a). Denote the maximum value of f, for any given value of a, by M*(a): M*(a) = f(x*(a), a). We call M* the value function.

Taking the derivative of M* using the chain rule:

$$\frac{dM^*(a)}{da} = \frac{\partial f(x^*(a),a)}{\partial x}\frac{dx^*(a)}{da} + \frac{\partial f(x^*(a),a)}{\partial a}$$

The first term is the indirect effect of how changing a affects the optimal choice of x and how that change in x affects the value of f. The second term is the direct effect of how changing a changes f holding x fixed at x*(a). This expression can be simplified by noticing that since x*(a) is the optimal choice for x at each value of a,

$$\frac{\partial f(x^*(a),a)}{\partial x} = 0$$

This means

$$\frac{dM^*(a)}{da} = \frac{\partial f(x^*(a),a)}{\partial a}$$

Or the change in the objective function adjusting optimally is equal to the change in the objective function when one doesn't adjust x.

In other words, the total derivative of f(x*(a), a) with respect to a is equal to the partial derivative of f(x, a) with respect to a, evaluated at the optimal choice of x.

This is known as the Envelope Theorem.
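A small numerical check of the envelope theorem, using a hypothetical objective f(x, a) = a*x - x**2 that is not from the lecture. Its maximizer is x*(a) = a/2, so the value function is M*(a) = a**2/4. The sketch compares the total derivative of M* with the partial derivative of f holding x fixed at x*(a).

```python
def f(x, a):
    """Hypothetical objective, concave in x: f(x, a) = a*x - x**2."""
    return a * x - x ** 2

def x_star(a):
    """Maximizer from the FOC a - 2x = 0."""
    return a / 2.0

def value(a):
    """Value function M*(a) = f(x*(a), a)."""
    return f(x_star(a), a)

def total_derivative(a, h=1e-5):
    """dM*/da, letting x adjust optimally as a changes."""
    return (value(a + h) - value(a - h)) / (2 * h)

def partial_derivative(a, h=1e-5):
    """Partial derivative of f with respect to a, holding x fixed at x*(a)."""
    x = x_star(a)
    return (f(x, a + h) - f(x, a - h)) / (2 * h)

print(total_derivative(3.0), partial_derivative(3.0))
```

Both derivatives equal x*(a) = a/2 for this objective, so at a = 3 they should agree at 1.5.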

Note that to compute the effect of changing a on x*(a), we differentiate the FOC

$$\frac{\partial f(x^*(a),a)}{\partial x} = 0$$

with respect to a:

$$\frac{\partial^2 f(x^*(a),a)}{\partial x^2}\frac{dx^*(a)}{da} + \frac{\partial^2 f(x^*(a),a)}{\partial x\,\partial a} = 0$$

$$\frac{dx^*(a)}{da} = -\frac{\partial^2 f(x^*(a),a)/\partial x\,\partial a}{\partial^2 f(x^*(a),a)/\partial x^2}$$

The sign of the denominator is negative by the SOC, therefore the sign of the expression is determined by the sign of the mixed partial in the numerator.

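Now consider the problem in which y is a parameter:

$$\max_x f(x,y) \text{ subject to } g(x,y) = 0$$

Then the Lagrangian is

$$L(x,y) = f(x,y) - \lambda g(x,y)$$

The envelope theorem states that the derivative of the value function with respect to y is

$$\frac{\partial L(x^*(y),y)}{\partial y} = \frac{\partial f(x^*(y),y)}{\partial y} - \lambda^*\frac{\partial g(x^*(y),y)}{\partial y}$$

Again, we only have to take into account the change in y, not the associated change in x.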

Example: Consider a utility maximization problem: $\max_x U(x)$ subject to p·x = w, where x is a vector (a bundle of goods), p is the price vector, and w is the consumer's wealth (a real number). Denote the solution of the problem by x*(p, w), and denote the value function by v, so that

$$v(p,w) = U(x^*(p,w)) \text{ for every } (p,w)$$

The function v is known as the indirect utility function.

By the envelope theorem

$$\frac{\partial v(p,w)}{\partial p_i} = -\lambda^*(p,w)\,x_i^*(p,w)$$

$$\frac{\partial v(p,w)}{\partial w} = \lambda^*(p,w)$$

Thus

$$x_i^*(p,w) = -\frac{\partial v(p,w)/\partial p_i}{\partial v(p,w)/\partial w}$$

This result is known as Roy's identity. ▄
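Roy's identity can be illustrated with a Cobb-Douglas utility, which is an assumption of this sketch and not part of the lecture. For U(x1, x2) = x1**alpha * x2**(1 - alpha) the demands are x1 = alpha*w/p1 and x2 = (1 - alpha)*w/p2, so the demand recovered from the indirect utility function can be compared with the known answer. All names and numbers are hypothetical.

```python
def demand(p1, p2, w, alpha=0.3):
    """Cobb-Douglas demands for U(x1, x2) = x1**alpha * x2**(1 - alpha)."""
    return alpha * w / p1, (1 - alpha) * w / p2

def indirect_utility(p1, p2, w, alpha=0.3):
    """v(p, w) = U(x*(p, w))."""
    x1, x2 = demand(p1, p2, w, alpha)
    return x1 ** alpha * x2 ** (1 - alpha)

def roy_demand_good1(p1, p2, w, alpha=0.3, h=1e-5):
    """x1* recovered as -(dv/dp1) / (dv/dw), using central finite differences."""
    dv_dp1 = (indirect_utility(p1 + h, p2, w, alpha)
              - indirect_utility(p1 - h, p2, w, alpha)) / (2 * h)
    dv_dw = (indirect_utility(p1, p2, w + h, alpha)
             - indirect_utility(p1, p2, w - h, alpha)) / (2 * h)
    return -dv_dp1 / dv_dw

print(demand(2.0, 5.0, 100.0)[0], roy_demand_good1(2.0, 5.0, 100.0))
```

With p1 = 2 and w = 100 the direct demand for good 1 is 15, and the value recovered through Roy's identity should match it up to finite-difference error.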

Mean-Variance Analysis: Intro


Mean-variance model for asset choice was developed by

Markowitz (1952 Journal of Finance).

Recalling our discussion of stochastic dominance, we can

see that, in general, investors should have MISC

preferences. In other words, they should exhibit a

preference for expected return and aversion to variance.

But for arbitrary distribution functions and utility functions

E[U(·)] cannot be expressed as a function of only mean

and variance.




To see this, take a Taylor series expansion around the expected end of period wealth:

$$U(\tilde w) = U(E[\tilde w]) + U'(E[\tilde w])(\tilde w - E[\tilde w]) + \frac{1}{2}U''(E[\tilde w])(\tilde w - E[\tilde w])^2 + \sum_{n=3}^{\infty}\frac{1}{n!}U^{(n)}(E[\tilde w])(\tilde w - E[\tilde w])^n$$



Taking the expectation:

$$E[U(\tilde w)] = U(E[\tilde w]) + \frac{1}{2}U''(E[\tilde w])\,Var[\tilde w] + \sum_{n=3}^{\infty}\frac{1}{n!}U^{(n)}(E[\tilde w])\,E[(\tilde w - E[\tilde w])^n]$$

Unless the last term is zero, we need more than the mean and variance.

Note that the last part of the last term is the nth central moment of $\tilde w$.



For arbitrary distributions, the mean-variance model can be motivated by assuming quadratic utility, $U(w) = w - \frac{b}{2}w^2$:

$$E[U(\tilde w)] = E[\tilde w] - \frac{b}{2}E[\tilde w^2] = E[\tilde w] - \frac{b}{2}\left((E[\tilde w])^2 + \sigma^2(\tilde w)\right)$$

There are no additional terms because the third and higher order derivatives are zero.
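The claim that quadratic utility depends only on the mean and variance can be illustrated with two discrete distributions that share both moments but differ in skewness. The first is X1 from the stochastic dominance example; the second is a hypothetical distribution chosen for this sketch to have the same mean (3.25) and variance (1.6875). The value of b and the function names are also assumptions.

```python
def expected_quadratic_utility(dist, b):
    """E[U(w)] for U(w) = w - (b/2) w^2 and a discrete distribution {value: prob}."""
    return sum(p * (w - 0.5 * b * w * w) for w, p in dist.items())

def mean_variance_form(dist, b):
    """The same quantity computed from the mean and variance alone."""
    mu = sum(p * w for w, p in dist.items())
    var = sum(p * (w - mu) ** 2 for w, p in dist.items())
    return mu - 0.5 * b * (mu ** 2 + var)

x1 = {1.0: 0.25, 4.0: 0.75}   # mean 3.25, variance 1.6875, negatively skewed
x2 = {2.5: 0.75, 5.5: 0.25}   # hypothetical: same mean and variance, positively skewed

b = 0.1                       # utility is increasing for w < 1/b = 10
for dist in (x1, x2):
    print(expected_quadratic_utility(dist, b), mean_variance_form(dist, b))
```

A quadratic-utility investor is indifferent between the two distributions even though they differ in skewness, which is one way to see what the mean-variance model leaves out.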



Problems with quadratic utility

Saturation (i.e. utility decreases as wealth increases after a certain

point).

Increasing absolute risk aversion (i.e. risky assets are inferior

goods).




For arbitrary preferences, the mean-variance model can be

motivated by assuming that rates of return on risky assets

are multivariate normal.

The normal is completely characterized by the mean and the

variance (all higher moments can be described as

functions of the first two moments).

Note: the lognormal is also characterized by the mean and

variance, but is not stable under addition.




Problems with normality

Unbounded

Inconsistent with limited liability

Inconsistent with economic theory (no place for negative

consumption)

Empirically, returns are not normal

Note: multivariate normality is sufficient for mean-variance analysis, but not necessary.



Although the mean-variance model is not a general model of asset choice, it holds a central role in finance due to its tractability and its richness of empirical predictions.

Mean-Variance Analysis: Basics


Assume that we have:

N ≥ 2 assets

frictionless markets

unlimited short selling

common knowledge about

expected returns

the variance-covariance structure

finite variances and unequal expectations

$\Omega$ = variance-covariance matrix of asset returns

$e = (e_1, \ldots, e_N)'$ = the vector of expected returns

▲▲▲▲▲▲▲▲▲▲▲▲



If we plot the variance and expected returns for all N

securities

▲▲▲▲▲▲▲▲▲▲▲▲



And then consider all possible portfolios of them

▲▲▲▲▲▲▲▲▲▲▲▲



We have the feasible set of portfolios in mean-variance

space (whose boundary is a parabola).

▲▲▲▲▲▲▲▲▲▲▲▲



Definition: A portfolio is a frontier portfolio if it has the

minimum variance among portfolios having the same

expected rate of return.

$$
E[r_p] = \sum_{i=1}^{N} w_i E[r_i] = w'e, \qquad w'\mathbf{1} = 1
$$

$$
Var[r_p] = \sum_{i=1}^{N}\sum_{j=1}^{N} w_i w_j \sigma_{ij} = w'\Omega w
$$

▲▲▲▲▲▲▲▲▲▲▲▲



A portfolio p is a frontier portfolio iff wp, the N-vector of

portfolio weights of p is the solution to:

$$
\min_{\{w\}} \ \frac{1}{2} w'\Omega w \quad \text{s.t.} \quad w'e = E[r_p] \ \text{ and } \ w'\mathbf{1} = 1
$$

▲▲▲▲▲▲▲▲▲▲▲▲



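Forming the Lagrangian and solving for the first order

conditions:

$$
\mathcal{L} = \frac{1}{2} w'\Omega w + \lambda\,(E[r_p] - w'e) + \gamma\,(1 - w'\mathbf{1})
$$

F.O.C.

$$
\frac{\partial \mathcal{L}}{\partial w} = \Omega w - \lambda e - \gamma \mathbf{1} = 0
$$

$$
\frac{\partial \mathcal{L}}{\partial \lambda} = E[r_p] - w'e = 0
$$

$$
\frac{\partial \mathcal{L}}{\partial \gamma} = 1 - w'\mathbf{1} = 0
$$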

▲▲▲▲▲▲▲▲▲▲▲▲



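Since Ω is positive definite, these first order conditions are

necessary and sufficient for a global optimum.

Solving the 1st FOC for the weights

$$
w_p = \lambda\,\Omega^{-1} e + \gamma\,\Omega^{-1}\mathbf{1}
$$

Premultiplying by the expected returns and using the 2nd FOC

$$
E[r_p] = \lambda\,(e'\Omega^{-1} e) + \gamma\,(e'\Omega^{-1}\mathbf{1})
$$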

▲▲▲▲▲▲▲▲▲▲▲▲



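Or premultiply the portfolio weights by a vector of 1’s and

use the 3rd FOC

$$
1 = \lambda\,(\mathbf{1}'\Omega^{-1} e) + \gamma\,(\mathbf{1}'\Omega^{-1}\mathbf{1})
$$

Define

$$
A = \mathbf{1}'\Omega^{-1} e = e'\Omega^{-1}\mathbf{1}, \qquad B = e'\Omega^{-1} e, \qquad C = \mathbf{1}'\Omega^{-1}\mathbf{1}, \qquad D = BC - A^2
$$

$$
M = \begin{pmatrix} B & A \\ A & C \end{pmatrix}
$$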

▲▲▲▲▲▲▲▲▲▲▲▲



Note: A, B, C, and D are just numbers. M contains

sufficient information to prove everything in efficient set

mathematics.

Solving for the Lagrange multipliers

$$
\lambda = \frac{C\,E[r_p] - A}{D}, \qquad \gamma = \frac{B - A\,E[r_p]}{D}
$$



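And substituting into our expression for wp gives

$$
w_p = \frac{C\,E[r_p] - A}{D}\,\Omega^{-1} e + \frac{B - A\,E[r_p]}{D}\,\Omega^{-1}\mathbf{1}
$$

$$
w_p = \frac{1}{D}\left[C\,\Omega^{-1} e - A\,\Omega^{-1}\mathbf{1}\right] E[r_p] + \frac{1}{D}\left[B\,\Omega^{-1}\mathbf{1} - A\,\Omega^{-1} e\right]
$$

$$
w_p = h\,E[r_p] + g
$$

Any frontier portfolio can be found this way since the expected return was arbitrary and this equation is a necessary and sufficient solution.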
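The closed form above can be sketched in a few lines of NumPy. The three-asset expected returns and covariance matrix below are made-up illustrative numbers (not the lecture's example), and the helper name `frontier_coefficients` is hypothetical.

```python
import numpy as np


def frontier_coefficients(e, omega):
    """Return (g, h) such that frontier weights are w_p = g + h * E[r_p]."""
    ones = np.ones_like(e)
    inv_e = np.linalg.solve(omega, e)        # Omega^{-1} e
    inv_1 = np.linalg.solve(omega, ones)     # Omega^{-1} 1
    A = ones @ inv_e
    B = e @ inv_e
    C = ones @ inv_1
    D = B * C - A ** 2
    g = (B * inv_1 - A * inv_e) / D
    h = (C * inv_e - A * inv_1) / D
    return g, h


# Hypothetical inputs, chosen only so that omega is positive definite.
e = np.array([0.08, 0.12, 0.15])
omega = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

g, h = frontier_coefficients(e, omega)
target = 0.10                 # required expected return E[r_p]
w = g + h * target            # frontier portfolio for that target
```

By construction g is a fully invested portfolio with zero expected return and h is a zero-investment portfolio with unit expected return, so `w` sums to one and earns exactly `target` for any choice of target.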

▲▲▲▲▲▲▲▲▲▲▲▲



Note that $g$ is the vector of portfolio weights corresponding

to a frontier portfolio with E[r]=0 and that $g + h$ is the

vector of portfolio weights corresponding to a frontier

portfolio with E[r]=1.

Claim: all frontier portfolios can be generated by forming

portfolios of the two frontier portfolios formed with

weights $g$ and $g + h$.

Note that it therefore follows that all frontier portfolios can

be formed from any two distinct frontier portfolios.

▲▲▲▲▲▲▲▲▲▲▲▲

Mean-Variance Analysis: Frontier


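The covariance between the returns of any two frontier

portfolios is

$$
Cov(r_p, r_q) = w_p'\Omega w_q = \frac{C}{D}\left(E[r_p] - \frac{A}{C}\right)\left(E[r_q] - \frac{A}{C}\right) + \frac{1}{C}
$$

Or the variance of any frontier portfolio can be found and

then we can write

$$
\frac{\sigma^2(r_p)}{1/C} - \frac{\left(E[r_p] - A/C\right)^2}{D/C^2} = 1
$$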

▲▲▲▲▲



Which is the equation of a hyperbola in SD-E[r] space with

center (0, A/C) and asymptotes

$$
E[r_p] = \frac{A}{C} \pm \sqrt{\frac{D}{C}}\;\sigma(r_p)
$$

The minimum variance portfolio is defined as the portfolio

having the minimum variance of all possible portfolios.

Note

$$
E[r_{MV}] = \frac{A}{C}, \qquad Var[r_{MV}] = \frac{1}{C}
$$
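A small numerical sketch of these frontier facts, again on made-up three-asset inputs (the numbers and variable names are illustrative assumptions): it builds the minimum variance portfolio as Ω⁻¹1/C and compares the hyperbola formula for the variance with the variance computed directly from the frontier weights.

```python
import numpy as np

# Hypothetical inputs (not the lecture's data).
e = np.array([0.08, 0.12, 0.15])
omega = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
ones = np.ones(3)

inv_e = np.linalg.solve(omega, e)
inv_1 = np.linalg.solve(omega, ones)
A, B, C = ones @ inv_e, e @ inv_e, ones @ inv_1
D = B * C - A ** 2

# Minimum variance portfolio: w_MV = Omega^{-1} 1 / C
w_mv = inv_1 / C
mean_mv = w_mv @ e              # should equal A / C
var_mv = w_mv @ omega @ w_mv    # should equal 1 / C


def frontier_variance(mu):
    """Variance of the frontier portfolio with expected return mu."""
    return 1.0 / C + (C / D) * (mu - A / C) ** 2


# Cross-check against the weights from the Lagrange multipliers.
mu = 0.12
lam = (C * mu - A) / D
gam = (B - A * mu) / D
w = lam * inv_e + gam * inv_1
var_direct = w @ omega @ w
```

Any frontier portfolio other than `w_mv` has a strictly larger variance, because the second term of `frontier_variance` is a positive multiple of the squared distance of `mu` from A/C.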

▲▲▲▲▲



Definition: Frontier

portfolios which have

expected rates of return

strictly greater than that

of the minimum variance

portfolio are called

efficient portfolios.

These are portfolios which

have the highest return

for a given variance.

▲▲▲▲▲



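Let $w_i$, $i = 1, \ldots, m$, be m frontier portfolios and

$\alpha_i$, $i = 1, \ldots, m$, be real numbers such that $\sum_{i=1}^{m} \alpha_i = 1$.

Then

$$
\sum_{i=1}^{m} \alpha_i w_i = \sum_{i=1}^{m} \alpha_i \left(g + h\,E[r_i]\right) = g + h \sum_{i=1}^{m} \alpha_i E[r_i]
$$

Therefore, any linear combination of frontier portfolios is on

the frontier.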

▲▲▲▲▲



If the i=1,…,m portfolios are efficient, and αi>0 for all i,

then

$$
\sum_{i=1}^{m} \alpha_i E[r_i] > \sum_{i=1}^{m} \alpha_i \frac{A}{C} = \frac{A}{C}
$$

Any convex combination of efficient portfolios is an

efficient portfolio (i.e. the set of efficient portfolios is a

convex set).

▲▲▲▲▲

Bibliography


Cornuejols and Tütüncü, Optimization Methods in Finance, Cambridge.

Huang and Litzenberger, Foundations for Financial Economics, North-Holland.

Intriligator, Mathematical Optimization and Economic Theory, Prentice-Hall.

Marsden and Tromba, Vector Calculus, Freeman.

Varian, Microeconomic Analysis, Norton.

Mean-Variance Analysis: Risk Free Rate


Everything we have done so far did not have a riskless asset.

Now consider N+1 assets, with $w_p$ equal to the portfolio

weights on the risky assets; $w_p$ is the solution to

$$
\min_{\{w\}} \ \frac{1}{2} w'\Omega w \quad \text{s.t.} \quad w'e + (1 - w'\mathbf{1})\,r_f = E[r_p]
$$



Which has the solution

$$
w_p = \Omega^{-1}(e - r_f\mathbf{1})\,\frac{E[r_p] - r_f}{B - 2A r_f + C r_f^2}
$$

$$
\sigma^2(r_p) = \frac{\left(E[r_p] - r_f\right)^2}{B - 2A r_f + C r_f^2}
$$
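The riskless-asset solution can be checked numerically. The sketch below uses made-up inputs (the expected returns, covariance matrix, riskless rate and target are all illustrative assumptions) and confirms that the resulting portfolio hits the target return and that its variance matches the formula above.

```python
import numpy as np

# Hypothetical inputs (not the lecture's data).
e = np.array([0.08, 0.12, 0.15])
omega = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
ones = np.ones(3)
rf = 0.03         # riskless rate
target = 0.10     # required expected return E[r_p]

excess = e - rf * ones
inv_excess = np.linalg.solve(omega, excess)   # Omega^{-1} (e - rf 1)
H = excess @ inv_excess                       # equals B - 2 A rf + C rf^2

w = inv_excess * (target - rf) / H            # weights on the risky assets
w_riskless = 1.0 - w.sum()                    # remainder goes to the riskless asset

port_mean = w @ e + w_riskless * rf
port_var = w @ omega @ w

# The same denominator built from the scalars A, B, C defined earlier.
A = ones @ np.linalg.solve(omega, e)
B = e @ np.linalg.solve(omega, e)
C = ones @ np.linalg.solve(omega, ones)
H_from_scalars = B - 2 * A * rf + C * rf ** 2
```

Note that the risky weights need not sum to one here: whatever is left over, positive or negative, is lent or borrowed at the riskless rate.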



There are three cases.

1. A/C>rf

▲▲▲▲▲▲



2. A/C<rf

▲▲▲▲▲▲



3. A/C=rf

Note: invest everything in the riskless asset and hold an arbitrage portfolio of risky assets whose weight sums to zero.

▲▲▲▲▲▲



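We can also write

$$
E[r_q] - r_f = \beta_{qp}\left(E[r_p] - r_f\right)
$$

which holds independent of the relationship between rf and

A/C

and

$$
r_q = (1 - \beta_{qp})\,r_f + \beta_{qp}\,r_p + \epsilon_{qp}, \qquad Cov(r_p, \epsilon_{qp}) = E[\epsilon_{qp}] = 0
$$

for any frontier portfolio p other than the riskless asset.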

Mean-Variance Analysis


Let’s return to our minimization problem:

$$
\min_{\{w\}} \ \frac{1}{2} w'\Omega w \quad \text{s.t.} \quad w'e = E[r_p] \ \text{ and } \ w'\mathbf{1} = 1
$$

There are alternative ways to pose this problem; for

example, we could rewrite the constraints as:

$$
Aw = b
$$

Where

$$
A = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ e_1 & e_2 & \cdots & e_N \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ E[r_p] \end{pmatrix}
$$

Note: If we wanted to include a riskless asset, we could also have N+1 assets

with one of the assets’ return equal to the risk-free rate.



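Forming the Lagrangian

$$
\mathcal{L} = \frac{1}{2} w'\Omega w + \lambda'(b - Aw)
$$

With FOC

$$
\frac{\partial \mathcal{L}}{\partial w} = \Omega w - A'\lambda = 0
$$

$$
\frac{\partial \mathcal{L}}{\partial \lambda} = b - Aw = 0
$$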



Solving now, from the first FOC

$$
w = \Omega^{-1} A'\lambda
$$

Substituting into the second FOC and solving for the

optimal weights gives

$$
w = \Omega^{-1} A' \left(A\,\Omega^{-1} A'\right)^{-1} b
$$



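Example: Assume that we have three stocks with the

following characteristics (what do you expect?)

$$
e = \begin{pmatrix} e_1 \\ e_2 \\ e_3 \end{pmatrix} = \begin{pmatrix} 0.100162 \\ 0.164244 \\ 0.182082 \end{pmatrix}
$$

$$
\Omega = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_{22} & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{pmatrix} = \begin{pmatrix} 0.100162 & 0.045864 & 0.005712 \\ 0.045864 & 0.210773 & 0.028283 \\ 0.005712 & 0.028283 & 0.066884 \end{pmatrix}
$$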



And that we want a 15% return on the portfolio (is this

feasible?). The constraints can be written

$$
A = \begin{pmatrix} 1 & 1 & 1 \\ 0.100162 & 0.164244 & 0.182082 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 \\ 0.15 \end{pmatrix}
$$



Now we can use the solution to find the optimal weights

$$
w = \Omega^{-1} A' \left(A\,\Omega^{-1} A'\right)^{-1} b = \begin{pmatrix} 0.3830 \\ 0.0397 \\ 0.5773 \end{pmatrix}
$$

▄
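The three-stock example can be reproduced with a few lines of NumPy. This is only a sketch: it assumes the lower triangle of Ω is the mirror image of the upper triangle shown on the slide (a covariance matrix is symmetric), and the variable names are illustrative.

```python
import numpy as np

# Expected returns and covariance matrix from the example
# (lower triangle filled in by symmetry).
e = np.array([0.100162, 0.164244, 0.182082])
omega = np.array([[0.100162, 0.045864, 0.005712],
                  [0.045864, 0.210773, 0.028283],
                  [0.005712, 0.028283, 0.066884]])

# Constraints A w = b: full investment and a 15% expected return.
A = np.vstack([np.ones(3), e])
b = np.array([1.0, 0.15])

# w = Omega^{-1} A' (A Omega^{-1} A')^{-1} b, using solves instead of inverses.
omega_inv_At = np.linalg.solve(omega, A.T)
w = omega_inv_At @ np.linalg.solve(A @ omega_inv_At, b)

print(np.round(w, 4))
```

The printed weights should agree with the solution reported above to about four decimals; using `np.linalg.solve` rather than forming explicit inverses is a common numerical precaution, not something the lecture prescribes.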



Do you see any problems or issues associated with the

solution to our portfolio problem?


There may be other constraints which must be imposed:

Diversification constraints

max or min

Short-sale constraints

Borrowing constraints

Leverage constraints

Tracking error constraints

Etc.

▲▲▲



For example, the Investment Company Act of 1940

Rule 12-d3 imposes certain investment constraints on

mutual funds:

Mutual funds cannot own more than 5% of other investment

companies (firms which derive more than 15% of revenue from

securities related activity)

If a mutual fund advertises as a diversified fund, it cannot hold

more than 5% of its assets in any company or hold more than

10% of the voting stock for any company for 75% of the fund

▲▲▲



This means that we may want to (or need to) place

additional constraints on our optimization. Further, these

constraints may be inequality constraints (for example a

short-sale constraint would be expressed as wi ≥ 0 for

all i.).

So, let’s revisit optimization – this time with inequality

constraints.

▲▲▲

Optimization with Inequalities


Consider a problem of the form

$$
\max_x f(x) \quad \text{subject to} \quad g_j(x) \le c_j \ \text{ for } j = 1, \ldots, m
$$

where f and gj for j = 1, ..., m are functions of n variables, x

= (x1, ..., xn), and cj for j = 1, ..., m are constants.

All of the problems we have studied so far can be put into

this form…



For equality constraints, we simply introduce two inequality

constraints for every equality. For example, the problem

$$
\max_x f(x) \quad \text{subject to} \quad g(x) = 0
$$

Can be written as

$$
\max_x f(x) \quad \text{subject to} \quad g(x) \le 0 \ \text{ and } \ -g(x) \le 0
$$



To start thinking about how to solve the general problem,

first consider the case with a single constraint

$$
\max_x f(x) \quad \text{subject to} \quad g(x) \le c
$$

There are two possible solutions for this problem, one where

the constraint is binding and the other is where the

constraint does not bind. In the latter case, where the

constraint is not binding for small changes in the

constraint, we say that the constraint is slack.



▲▲▲▲▲▲



As before, we define the Lagrangian by

$$
\mathcal{L}(x) = f(x) - \lambda\,(g(x) - c)
$$

From our previous analysis of problems with equality

constraints and problems with no constraints,

if g(x*) = c (as in the left-hand panel) and the constraint

satisfies a regularity condition, then L'i(x*) = 0 for all i

if g(x*) < c (as in the right-hand panel), then f i'(x*) = 0 for

all i.



In the first case (that is, if g(x*) = c) we have λ ≥ 0. Suppose, to the contrary, that λ < 0. Then we know that a small decrease in c raises the maximal value of f . That is, moving x* inside the constraint raises the value of f , contradicting the fact that x* is the solution of the problem.

In the second case, the value of λ does not enter the conditions, so we can choose any value for it. Given the interpretation of λ, setting λ = 0 makes sense. Under this assumption we have f i'(x) = L'i(x) for all x, so that L'i(x*) = 0 for all i.

▲▲▲▲▲▲



Thus in both cases we have L'i(x*) = 0 for all i, λ ≥ 0, and

g(x*) ≤ c. In the first case we have g(x*) = c and in the

second case λ = 0.

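We can combine the two cases by writing the conditions as

$$
\frac{\partial \mathcal{L}(x^*)}{\partial x_i} = 0 \ \text{ for } i = 1, \ldots, n
$$

$$
\lambda \ge 0, \quad g(x^*) \le c, \quad \text{and either } \lambda = 0 \ \text{ or } \ g(x^*) - c = 0
$$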



Alternatively, since the product of two numbers is zero if at

least one of them is zero, we can write

$$
\frac{\partial \mathcal{L}(x^*)}{\partial x_i} = 0 \ \text{ for } i = 1, \ldots, n
$$

$$
\lambda \ge 0, \quad g(x^*) \le c, \quad \text{and} \quad \lambda\,(g(x^*) - c) = 0
$$

Note that we have not ruled out the possibility that both λ = 0 and g(x*) = c.

The inequalities λ ≥ 0 and g(x*) ≤ c are called

complementary slackness conditions; at most one of these

conditions is slack (i.e. not an equality).



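For a problem with many constraints, we introduce a

multiplier for each constraint and obtain the Kuhn-Tucker

conditions. For the problem

$$
\max_x f(x) \quad \text{subject to} \quad g_j(x) \le c_j \ \text{ for } j = 1, \ldots, m
$$

The Kuhn-Tucker conditions are

$$
\frac{\partial \mathcal{L}(x^*)}{\partial x_i} = 0 \ \text{ for } i = 1, \ldots, n
$$

$$
\lambda_j \ge 0, \quad g_j(x^*) \le c_j, \quad \text{and} \quad \lambda_j\,(g_j(x^*) - c_j) = 0 \ \text{ for } j = 1, \ldots, m
$$

Where

$$
\mathcal{L}(x) = f(x) - \sum_{j=1}^{m} \lambda_j\,(g_j(x) - c_j)
$$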




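Example:

$$
\max_{x_1, x_2} \left[-(x_1 - 4)^2 - (x_2 - 4)^2\right] \quad \text{subject to} \quad x_1 + x_2 \le 4 \ \text{ and } \ x_1 + 3x_2 \le 9
$$

The Lagrangian is

$$
\mathcal{L}(x_1, x_2) = -(x_1 - 4)^2 - (x_2 - 4)^2 - \lambda_1 (x_1 + x_2 - 4) - \lambda_2 (x_1 + 3x_2 - 9)
$$

And the Kuhn-Tucker conditions are

$$
\begin{aligned}
-2(x_1 - 4) - \lambda_1 - \lambda_2 &= 0 \\
-2(x_2 - 4) - \lambda_1 - 3\lambda_2 &= 0 \\
x_1 + x_2 \le 4, \ \lambda_1 \ge 0, \ \text{and} \ \lambda_1 (x_1 + x_2 - 4) &= 0 \\
x_1 + 3x_2 \le 9, \ \lambda_2 \ge 0, \ \text{and} \ \lambda_2 (x_1 + 3x_2 - 9) &= 0
\end{aligned}
$$

▄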



We have seen that a solution x* of an optimization problem

with equality constraints is a stationary point of the

Lagrangian if the constraints satisfy a regularity condition

(∇g(x*) ≠ 0 in the case of a single constraint g(x) = c). In

an optimization problem with inequality constraints a

related regularity condition guarantees that a solution

satisfies the Kuhn-Tucker conditions. The weakest forms

of this regularity condition are difficult to verify. The next

result gives three alternative strong forms that are much

easier to verify.

▲▲▲▲



Proposition: Let f and gj for j = 1, ..., m be continuously

differentiable functions of many variables and let cj for j =

1, ..., m be constants. Suppose that x* solves the problem

$$
\max_x f(x) \quad \text{subject to} \quad g_j(x) \le c_j \ \text{ for } j = 1, \ldots, m
$$

Suppose that either each gj is concave

or each gj is convex and there is some x such that gj(x) < cj for j = 1, ..., m

or each gj is quasi-convex, ∇gj(x*) ≠ (0, ..., 0) for all j, and there is some x

such that gj(x) < cj for j = 1, ..., m.

Then there exists a unique vector λ = (λ1, ..., λm) such that

(x*, λ) satisfies the Kuhn-Tucker conditions.



Example of a quasi-convex

function which is not

convex.

Example of a function which

is not quasi-convex.

▲▲▲▲



Recall that a linear function is concave, so the conditions in

the result are satisfied if each constraint function is linear.

Note that the last part of the second and third conditions is

very weak: it requires only that some point strictly satisfy

all the constraints.

One way in which the conditions in the result may be

weakened is sometimes useful: the conditions on the

constraint functions need to be satisfied only by the

binding constraints—those for which gj(x*) = cj.

▲▲▲▲



We saw previously that for both an unconstrained

maximization problem and a maximization problem with

an equality constraint the first-order conditions are

sufficient for a global optimum when the objective and

constraint functions satisfy appropriate

concavity/convexity conditions. The same is true for an

optimization problem with inequality constraints.

Precisely, we have the following result.

▲▲▲▲▲▲



Proposition: Let f and gj for j = 1, ..., m be continuously

differentiable functions of many variables and let cj for j =

1, ..., m be constants. Consider the problem

$$
\max_x f(x) \quad \text{subject to} \quad g_j(x) \le c_j \ \text{ for } j = 1, \ldots, m
$$

Suppose that

f is concave

and gj is quasi-convex for j = 1, ..., m.

If there exists λ = (λ1, ..., λm) such that (x*, λ) satisfies the

Kuhn-Tucker conditions then x* solves the problem.



Corollary: The Kuhn-Tucker conditions are both necessary

and sufficient if the objective function is concave and

either

each constraint is linear

or each constraint function is convex and some vector of the

variables satisfies all constraints strictly.

But sometimes the condition that the objective function is

concave is too strong to be useful, for instance, we

generally assume that utility functions are quasi-concave,

in which case, the following result is useful.

▲▲▲▲▲▲



Proposition: Let f and gj for j = 1, ..., m be continuously differentiable functions of many variables and let cj for j = 1, ..., m be constants. Consider the problem

$$
\max_x f(x) \quad \text{subject to} \quad g_j(x) \le c_j \ \text{ for } j = 1, \ldots, m
$$

Suppose that f is twice differentiable and quasi-concave

and gj is quasi-convex for j = 1,...,m.

If there exists λ = (λ1, ..., λm) and a value of x* such that (x*, λ) satisfies the Kuhn-Tucker conditions and f 'i(x*) ≠ 0 for i = 1, ..., n then x* solves the problem.



Corollary: Suppose that the objective function is twice

differentiable and quasi-concave and every constraint is

linear. If x* solves the problem then there exists a unique

vector λ such that (x*, λ) satisfies the Kuhn-Tucker

conditions, and if (x*, λ) satisfies the Kuhn-Tucker

conditions and f 'i(x*) ≠ 0 for i = 1, ..., n then x* solves

the problem.

▲▲▲▲▲▲



Very Important!

If you have a minimization problem, remember that you can

transform it to a maximization problem by multiplying the

objective function by −1. Thus for a minimization

problem the condition on the objective function in the first

result above is that it be convex, and the condition in the

second result is that it be quasi-convex.

▲▲▲▲▲▲



Example: $\max_x \left[-(x - 2)^2\right]$ subject to x ≥ 1

Written in the standard format, this problem is

$\max_x \left[-(x - 2)^2\right]$ subject to 1 − x ≤ 0.

The objective function is concave and the constraint is linear. Thus the Kuhn-Tucker conditions are both necessary and sufficient: the set of solutions of the problem is the same as the set of solutions of the Kuhn-Tucker conditions.

The Lagrangian is $\mathcal{L}(x) = -(x - 2)^2 - \lambda(1 - x)$, so the Kuhn-Tucker conditions are

−2(x − 2) + λ = 0

x − 1 ≥ 0, λ ≥ 0, and λ(1 − x) = 0.

From the last condition we have either λ = 0 or x = 1.

x = 1: 2 + λ = 0, or λ = −2, which violates λ ≥ 0.

λ = 0: −2(x − 2) = 0; the only solution is x = 2.

Thus the Kuhn-Tucker conditions have a unique solution,

(x, λ) = (2, 0). Hence the problem has a unique solution

x = 2. ▄

▲▲



Example: $\max_x \left[-(x - 2)^2\right]$ subject to x ≥ 3

Written in the standard format, this problem is

$\max_x \left[-(x - 2)^2\right]$ subject to 3 − x ≤ 0.

As in the previous example, the objective function is concave and the constraint function is linear, so that the set of solutions of the problem is the set of solutions of the Kuhn-Tucker conditions.

The Lagrangian is $\mathcal{L}(x) = -(x - 2)^2 - \lambda(3 - x)$, so the Kuhn-Tucker conditions are

−2(x − 2) + λ = 0

x − 3 ≥ 0, λ ≥ 0, and λ(3 − x) = 0.

From the last condition we have either λ = 0 or x = 3.

x = 3: −2 + λ = 0, or λ = 2.

λ = 0: −2(x − 2) = 0; since x ≥ 3 this has no solution compatible with the other conditions.

Thus the Kuhn-Tucker conditions have a single solution, (x, λ) = (3, 2). Hence the problem has a unique solution, x = 3. ▄
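The case-by-case reasoning in these two one-variable examples can be written as a tiny routine. The sketch below handles the family max −(x − 2)² subject to x ≥ c; the function name `solve_kt` is hypothetical and the code simply enumerates the two complementary-slackness cases.

```python
def solve_kt(c):
    """Kuhn-Tucker solutions (x, lam) of: max -(x - 2)**2 subject to x >= c.

    Stationarity of L(x) = -(x - 2)**2 - lam * (c - x) is -2(x - 2) + lam = 0.
    """
    solutions = []

    # Case 1: lam = 0 (constraint slack), so -2(x - 2) = 0 and x = 2.
    x, lam = 2.0, 0.0
    if x >= c:                      # must still be feasible
        solutions.append((x, lam))

    # Case 2: x = c (constraint binding), so lam = 2(c - 2).
    x, lam = float(c), 2.0 * (c - 2.0)
    if lam >= 0 and (x, lam) not in solutions:
        solutions.append((x, lam))

    return solutions


first_example = solve_kt(1)    # constraint x >= 1
second_example = solve_kt(3)   # constraint x >= 3
```

For c = 1 only the slack case survives and for c = 3 only the binding case survives, matching the two worked examples.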

▲▲



These two examples illustrate a procedure for finding

solutions of the Kuhn-Tucker conditions that is useful in

many problems.

1. Look at the complementary slackness conditions, which

imply that either a Lagrange multiplier is zero or a

constraint is binding.

2. Check the implications of each case, using the other

equations.

In these two examples, this procedure is very easy to follow.

The following examples are more complicated.

▲




Example:

$$
\max_{x_1, x_2} \left[-(x_1 - 4)^2 - (x_2 - 4)^2\right] \quad \text{subject to} \quad x_1 + x_2 \le 4 \ \text{ and } \ x_1 + 3x_2 \le 9
$$

The objective function is concave and the constraints are

both linear, so the solutions of the problem are the

solutions of the Kuhn-Tucker conditions.



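We previously found the Kuhn-Tucker conditions,

$$
\begin{aligned}
-2(x_1 - 4) - \lambda_1 - \lambda_2 &= 0 \\
-2(x_2 - 4) - \lambda_1 - 3\lambda_2 &= 0 \\
x_1 + x_2 \le 4, \ \lambda_1 \ge 0, \ \text{and} \ \lambda_1 (x_1 + x_2 - 4) &= 0 \\
x_1 + 3x_2 \le 9, \ \lambda_2 \ge 0, \ \text{and} \ \lambda_2 (x_1 + 3x_2 - 9) &= 0
\end{aligned}
$$

What are the solutions of these conditions? Start by looking at the two conditions λ1(x1 + x2 − 4) = 0 and λ2(x1 + 3x2 − 9) = 0. These two conditions yield the following four cases.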



(1) x1 + x2 = 4 and x1 + 3x2 = 9:

In this case we have x1 = 3/2 and x2 = 5/2. Then the first two

equations are

5 − λ1 − λ2 = 0

3 − λ1 − 3λ2 = 0

which imply that λ1 = 6 and λ2 = −1, which violates the

condition λ2 ≥ 0. We can rule out this case.

▲▲▲▲▲▲



(2) x1 + x2 = 4 and x1 + 3x2 < 9, so that λ2 = 0:

Then the first two equations imply x1 = x2 = 2 and λ1 = 4.

All the conditions are satisfied, so

(x1, x2, λ1, λ2) = (2, 2, 4, 0) is a solution.

▲▲▲▲▲▲



(3) x1 + x2 < 4 and x1 + 3x2 = 9, so that λ1 = 0:

Then the first two equations imply x1 = 33/10 and x2 = 19/10 (with λ2 = 7/5),

violating x1 + x2 < 4. We can rule out this case.

▲▲▲▲▲▲



(4) x1 + x2 < 4 and x1 + 3x2 < 9, so that λ1 = λ2 = 0:

Then the first two equations imply x1 = x2 = 4, violating x1 + x2

< 4. We can rule out this case.

So (x1, x2, λ1, λ2) = (2, 2, 4, 0) is the single solution of the

Kuhn-Tucker conditions. Hence the unique solution of

the problem is (x1, x2) = (2, 2).▄
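The four cases can also be enumerated mechanically. The sketch below, with hypothetical names, loops over every choice of binding constraints, solves the resulting linear system (stationarity plus the binding constraints), and keeps only candidates that are feasible with non-negative multipliers. It treats a non-binding constraint as "≤" rather than strictly "<", a harmless simplification for this problem.

```python
import itertools
import numpy as np

# Constraints G x <= c for: x1 + x2 <= 4 and x1 + 3 x2 <= 9.
G = np.array([[1.0, 1.0],
              [1.0, 3.0]])
c = np.array([4.0, 9.0])


def kt_solutions(tol=1e-9):
    """Enumerate Kuhn-Tucker points of max -(x1-4)^2 - (x2-4)^2 s.t. G x <= c."""
    found = []
    for active in itertools.product([False, True], repeat=2):
        idx = [j for j in range(2) if active[j]]
        k = len(idx)
        # Unknowns: x1, x2 and one multiplier per binding constraint.
        # Stationarity: -2(x - 4) - G_active' lam = 0  <=>  2 x + G_active' lam = 8.
        K = np.zeros((2 + k, 2 + k))
        rhs = np.zeros(2 + k)
        K[:2, :2] = 2.0 * np.eye(2)
        rhs[:2] = 8.0
        for col, j in enumerate(idx):
            K[:2, 2 + col] = G[j]
            K[2 + col, :2] = G[j]
            rhs[2 + col] = c[j]
        sol = np.linalg.solve(K, rhs)
        x = sol[:2]
        lam = np.zeros(2)
        for col, j in enumerate(idx):
            lam[j] = sol[2 + col]
        # Keep the candidate only if it is feasible and lam >= 0.
        if np.all(G @ x <= c + tol) and np.all(lam >= -tol):
            found.append((x, lam))
    return found


solutions = kt_solutions()
```

Only the case in which the first constraint binds and the second is slack survives, which is the solution (2, 2, 4, 0) found by hand above.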

▲▲▲▲▲▲



Example: maxx,y xy subject to x + y ≤ 6, x ≥ 0, and y ≥ 0.

The objective function is twice-differentiable and quasi-

concave and the constraint functions are linear, so the

Kuhn-Tucker conditions are necessary and if ((x*, y*), λ*)

satisfies these conditions and no partial derivative of the

objective function at (x*, y*) is zero then (x*, y*) solves

the problem. Solutions of the Kuhn-Tucker conditions at

which all derivatives of the objective function are zero

may or may not be solutions of the problem (we need to check

the values of the objective function at these solutions).

▲▲▲▲



The Lagrangian is

$$
\mathcal{L}(x, y) = xy - \lambda_1 (x + y - 6) + \lambda_2 x + \lambda_3 y
$$

so the Kuhn-Tucker conditions are

y − λ1 + λ2 = 0

x − λ1 + λ3 = 0

λ1 ≥ 0, x + y ≤ 6, λ1(x + y − 6) = 0

λ2 ≥ 0, x ≥ 0, λ2x = 0

λ3 ≥ 0, y ≥ 0, λ3y = 0.

▲▲▲▲



(1) If x > 0 and y > 0 then λ2 = λ3 = 0, so that λ1 = x = y from

the first two conditions. Hence x = y = λ1 = 3 from the third

condition. These values satisfy all the conditions.

(2) If x = 0 and y > 0 then λ3 = 0 from the last condition and

hence λ1 = x = 0 from the second condition. But now from

the first condition λ2 = −y < 0, contradicting λ2 ≥ 0.

(3) If x > 0 and y = 0 then λ2 = 0, and a symmetric argument

yields a contradiction.

(4) If x = y = 0 then λ1 = 0 from the third set of conditions,

so that λ2 = λ3 = 0 from the first and second conditions. These

values satisfy all the conditions.

▲▲▲▲



We conclude that there are two solutions of the Kuhn-

Tucker conditions, (x, y, λ1, λ2, λ3) = (3, 3, 3, 0, 0) and

(0, 0, 0, 0, 0). The value of the objective function at (3, 3)

is greater than the value of the objective function at (0, 0),

so the solution of the problem is (3, 3). ▄
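A short check of this example: the helper below (a hypothetical name) tests whether a candidate satisfies the Kuhn-Tucker conditions listed above, and a coarse grid search over the feasible set confirms which of the two candidates actually maximizes xy. The grid step of 0.1 is an arbitrary choice for illustration.

```python
def satisfies_kt(x, y, l1, l2, l3, tol=1e-12):
    """Check the Kuhn-Tucker conditions for max xy s.t. x + y <= 6, x >= 0, y >= 0."""
    stationarity = abs(y - l1 + l2) <= tol and abs(x - l1 + l3) <= tol
    feasibility = x + y <= 6 + tol and x >= -tol and y >= -tol
    signs = min(l1, l2, l3) >= -tol
    slackness = (abs(l1 * (x + y - 6)) <= tol
                 and abs(l2 * x) <= tol
                 and abs(l3 * y) <= tol)
    return stationarity and feasibility and signs and slackness


# Both candidates found by hand satisfy the conditions...
interior_ok = satisfies_kt(3, 3, 3, 0, 0)
origin_ok = satisfies_kt(0, 0, 0, 0, 0)

# ...but only one of them maximizes the objective. Coarse grid search
# over feasible points (i/10, j/10) with x + y <= 6:
best_value, best_x, best_y = max(
    ((i / 10) * (j / 10), i / 10, j / 10)
    for i in range(61)
    for j in range(61)
    if i + j <= 60
)
```

The origin passes the Kuhn-Tucker test even though it is not a maximizer, which is why the sufficiency result for quasi-concave objectives requires the partial derivatives of the objective to be non-zero at the candidate.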

▲▲▲▲

Optimization Summary


Conditions under which FOC are necessary and sufficient:

Unconstrained Maximization Problems

If x* solves maxx f (x) then f 'i(x*) = 0 for i = 1, ..., n.

If f 'i(x*) = 0 for i = 1, ..., n and if f is concave then x*

solves maxx f (x).

▲▲▲▲



Equality Constrained Maximization Problems (one constraint)

If x* solves maxx f (x) subject to g(x) = c, and if

∇g(x*) ≠ (0,...,0), then there exists λ such that L'i(x*) = 0

for i = 1, ..., n and g(x*) = c.

If there exists λ such that L'i(x*) = 0 for i = 1, ..., n and

g(x*) = c and if f is concave and λg is convex then x*

solves maxx f (x) subject to g(x) = c.

▲▲▲▲



Inequality Constrained Maximization Problems

If x* solves maxx f (x) subject to gj(x) ≤ cj for j = 1, ..., m

and if {gj is concave for j = 1, ..., m} or {gj is convex for

j = 1, ..., m and there exists x such that gj(x) < cj for

j = 1, ..., m} or {gj is quasi-convex for j = 1, ..., m,

∇gj(x*) ≠ (0,...,0) for j = 1, ..., m, and there exists x such

that gj(x) < cj for j = 1, ..., m} then there exists (λ1,...,λm)

such that L'i(x*) = 0 for i = 1, ..., n and λj ≥ 0, gj(x*) ≤ cj,

and λj(gj(x*) − cj) = 0 for j = 1, ..., m.

▲▲▲▲



Inequality Constrained Maximization Problems

If there exists (λ1,...,λm) such that L'i(x*) = 0 for i = 1, ..., n

and λj ≥ 0, gj(x*) ≤ cj, and λj(gj(x*) − cj) = 0 for j = 1, ..., m,

where $\mathcal{L}(x) = f(x) - \sum_{j=1}^{m} \lambda_j (g_j(x) - c_j)$,

and if gj is quasi-convex for j = 1, ..., m and either {f is

concave} or {f is quasi-concave and twice differentiable

and ∇ f (x*) ≠ (0,...,0)} then x* solves maxx f (x) subject to gj(x) ≤ cj for

j = 1, ..., m.

▲▲▲▲


▲▲▲▲▲▲▲▲▲▲▲▲▲



3. A/C=rf

Note: invest everything in the riskless asset and hold an arbitrage portfolio of risky assets whose weight sums to zero.

▲▲



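Recall the expression for the optimal weights

$$
w_p = \Omega^{-1}(e - r_f\mathbf{1})\,\frac{E[r_p] - r_f}{B - 2A r_f + C r_f^2}
$$

Substituting rf=A/C and premultiplying by ι′ (the transposed vector of ones, written $\mathbf{1}'$ here), we get

$$
\begin{aligned}
\mathbf{1}'w_p &= \mathbf{1}'\Omega^{-1}\left(e - \frac{A}{C}\mathbf{1}\right)\frac{E[r_p] - r_f}{B - 2A r_f + C r_f^2} \\
&= \left(A - \frac{A}{C}\,C\right)\frac{E[r_p] - r_f}{B - 2A r_f + C r_f^2} \\
&= 0
\end{aligned}
$$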

▲▲

M-V Analysis Inequalities


Let’s return to our exploration of mean-variance analysis.

When we add inequality constraints to our problem, the

quadratic optimization problem generally does not have a

simple analytical solution. Instead, we must use

numerical methods to solve for the optimal portfolio

weighting.

▲▲

M-V Analysis Inequalities


State-of-the-art quadratic programming algorithms with inequality constraints use two kinds of approaches: (1) the active-set method or projection method, and (2) the interior point method.

Both of these approaches solve a series of sub-problems where there are only equality constraints. They differ only in how they arrange the order of those sub-problems. In the active-set method, you proceed along the boundary of the feasible set defined by the constraints. In the interior-point method, you proceed within the feasible set. (You can use Matlab’s functions e.g. quadprog).

Current implementations of interior methods often outperform active set methods in terms of speed. On the other hand, active set methods are more robust and better suited for warm starts, which are important for solving integer optimization problems (quadprog uses an active set method).

▲▲

M-V Analysis Inequalities: Example


Example: Let’s return to our earlier numerical example,

adding the restriction that we cannot short any of the

stocks. In addition, we will also add the constraint that

stock 2 must have a weight of at least 0.10. Our problem

can be written:

▲▲▲▲

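$$
\min_{w} \ \frac{1}{2} w'\Omega w \quad \text{s.t.} \quad Aw \ \{=, \ge, \le\} \ b \ \text{ row by row}
$$

Where

$$
A = \begin{pmatrix} 1 & 1 & 1 \\ 0.100162 & 0.164244 & 0.182082 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}
$$

And

$$
b = \begin{pmatrix} 1 \\ 0.15 \\ 0 \\ 0 \\ 0 \\ 0.90 \end{pmatrix}
$$

The first two rows hold as equalities (full investment and the 15% return target), rows three to five as ≥ (no short sales), and the last row as ≤.

Notice that to express the constraint that w2≥0.10, we used w1+w3≤0.90, which is equivalent because the weights sum to one. Sometimes

we need to reengineer our constraints to reach a solution.

The solution is

$$
w = \begin{pmatrix} 0.3699 \\ 0.1000 \\ 0.5301 \end{pmatrix}
$$

(using quadprog this took 1 iteration)

▄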
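To illustrate how an active-set style method reduces the inequality problem to equality-constrained sub-problems, here is a deliberately simplified sketch for this example. It is not quadprog's algorithm: it solves the equality-only problem, promotes any violated lower bound to an equality, and re-solves once, without the multiplier checks a real active-set solver performs. The function name and the symmetric fill of Ω are assumptions.

```python
import numpy as np


def solve_equality_qp(omega, A, b):
    """Solve min 0.5 w' Omega w  s.t.  A w = b  via w = Omega^{-1} A' (A Omega^{-1} A')^{-1} b."""
    omega_inv_At = np.linalg.solve(omega, A.T)
    return omega_inv_At @ np.linalg.solve(A @ omega_inv_At, b)


e = np.array([0.100162, 0.164244, 0.182082])
omega = np.array([[0.100162, 0.045864, 0.005712],
                  [0.045864, 0.210773, 0.028283],
                  [0.005712, 0.028283, 0.066884]])

A_eq = np.vstack([np.ones(3), e])      # full investment, 15% target return
b_eq = np.array([1.0, 0.15])
lower = np.array([0.0, 0.10, 0.0])     # no shorting; stock 2 at least 10%

# Step 1: ignore the bounds and solve the equality-constrained problem.
w = solve_equality_qp(omega, A_eq, b_eq)

# Step 2: treat any violated lower bound as binding and re-solve.
violated = [i for i in range(3) if w[i] < lower[i] - 1e-12]
if violated:
    rows = np.eye(3)[violated]         # one row per binding bound
    A_act = np.vstack([A_eq, rows])
    b_act = np.concatenate([b_eq, lower[violated]])
    w = solve_equality_qp(omega, A_act, b_act)

print(np.round(w, 4))
```

In this example only the bound on stock 2 is violated by the unconstrained solution, so a single re-solve is enough and the result should agree with the weights above to about four decimals. With more assets or more bounds, a proper quadratic programming routine is the safer choice.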

M-V Analysis


Congratulations!

Now you know how to do everything in portfolio analysis –

you just need to set up the appropriate problem.

Let’s consider a few alternatives…

M-V Analysis: Diversification Constraint


As discussed last time, there are sometimes regulatory

requirements for diversification. In addition, many portfolios

are required (by their managers/investors) to have minimum

and/or maximum investment limits in certain stocks, industries,

sectors, or asset classes. These types of problems can be

generally expressed:

$$
\min_{w} \ \frac{1}{2} w'\Omega w \quad \text{s.t.} \quad Aw = b \ \text{ and } \ w_l \le w \le w_u
$$

Where the vectors wl and wu represent lower and upper bounds.

M-V Analysis: Trading Volume


A typical constraint is one on trading volume. This constraint may be used for a large portfolio where you want to avoid price impact or for any portfolio where you want to control the liquidity risk of the portfolio.

$$
\min_{w} \ \frac{1}{2} w'\Omega w \quad \text{s.t.} \quad Aw = b \ \text{ and } \ w \le c\,x
$$

Where x is a vector of ADV in dollar terms and c is a constant for the threshold.

(e.g. for a $500 million portfolio limited to 10% of the ADV (in millions) of stock i: wi ≤ (0.1/500)xi) Can you generalize this?

M-V Analysis: Beta Exposure


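Sometimes it is desirable to match the beta of a benchmark portfolio:

$$
\min_{w} \ \frac{1}{2} w'\Omega w \quad \text{s.t.} \quad Aw = b \ \text{ and } \ w'\beta = \beta_{\text{benchmark}}
$$

Where:

$$
\beta = (\beta_1, \ldots, \beta_N)'
$$

(note that this will not bound the tracking error or asset specific risk – only the factor risk)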

M-V Analysis: Beta Exposure


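Or we can specify a range for the beta exposure:

$$
\min_{w} \ \frac{1}{2} w'\Omega w \quad \text{s.t.} \quad Aw = b \ \text{ and } \ \beta_{\text{lower limit}} \le w'\beta \le \beta_{\text{upper limit}}
$$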

M-V Analysis: Factor Exposure


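Or sometimes we are matching multiple factors:

$$
\min_{w} \ \frac{1}{2} w'\Omega w \quad \text{s.t.} \quad Aw = b \ \text{ and } \ \beta_{\text{lower limit}} \le B'w \le \beta_{\text{upper limit}}
$$

Where:

$$
B = \begin{pmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1K} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2K} \\ \vdots & \vdots & & \vdots \\ \beta_{N1} & \beta_{N2} & \cdots & \beta_{NK} \end{pmatrix}
$$

(NB: tilting)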

M-V Analysis: Tracking Error


Most professionals with a benchmark use a minimization of

tracking error when weighting stocks in the portfolio.

Two methods:

1. Minimize the tracking error for a given expected excess

return over the benchmark.

2. Maximize the expected excess return over the benchmark

without exceeding a maximum tracking error constraint.



Tracking error is generally defined as the standard deviation of the portfolio returns minus the benchmark returns:

$$
TE = StdDev(r_p - r_{\text{benchmark}}) = \sqrt{Var(r_p - r_b)}
$$

Consider the components of the variance

$$
Var(r_p - r_b) = Var(r_p) - 2\,Cov(r_p, r_b) + Var(r_b)
$$

The last term is beyond our control and the first term is what we "usually" minimize.



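Define

$$
\gamma = \begin{pmatrix} Cov(r_1, r_b) \\ \vdots \\ Cov(r_N, r_b) \end{pmatrix}
$$

And our problem becomes

$$
\min_{w} \ w'\Omega w - 2\,w'\gamma \quad \text{s.t.} \quad Aw = b \ \text{ and } \ w'\mu = \mu_p
$$

(with $\mu$ the vector of expected returns and $\mu_p$ the required portfolio return).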
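The variance decomposition behind this objective can be verified numerically. The sketch assumes the benchmark is itself a portfolio of the same N assets with weights `w_b`, so that the vector of covariances with the benchmark is Ω w_b; the covariance matrix and both weight vectors are made-up illustrative numbers.

```python
import numpy as np

# Hypothetical covariance matrix and weights (not the lecture's data).
omega = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
w_b = np.array([0.5, 0.3, 0.2])    # benchmark weights
w_p = np.array([0.4, 0.4, 0.2])    # portfolio weights

# Cov(r_i, r_b) for each asset i, assuming r_b = w_b' r.
cov_with_benchmark = omega @ w_b
var_b = w_b @ omega @ w_b

# Squared tracking error computed two ways.
active = w_p - w_b
te_sq_direct = active @ omega @ active
te_sq_decomposed = w_p @ omega @ w_p - 2 * w_p @ cov_with_benchmark + var_b

tracking_error = np.sqrt(te_sq_direct)
```

Because `var_b` does not depend on the portfolio weights, minimizing `te_sq_decomposed` over `w_p` is the same as minimizing the first two terms only, which is the objective written above.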

M-V Analysis: Tracking Error (Factors)


If we are dealing with multiple factors and want to minimize

tracking error, we note:

$$
r_i = \beta_{i1} f_1 + \cdots + \beta_{ij} f_j + \cdots + \beta_{iK} f_K + \epsilon_i
$$

$$
Var(r_i) = \beta_i'\,Var(f)\,\beta_i + Var(\epsilon_i)
$$

Where the vector f are the factors into which we have

decomposed returns and the residual terms for different

securities have covariance of zero.



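We can then write the variance-covariance matrix as

$$
\Omega = \begin{pmatrix} \beta_{1,1} & \cdots & \beta_{1,K} \\ \vdots & & \vdots \\ \beta_{N,1} & \cdots & \beta_{N,K} \end{pmatrix}
\begin{pmatrix} Var(f_1) & \cdots & Cov(f_1, f_K) \\ \vdots & & \vdots \\ Cov(f_K, f_1) & \cdots & Var(f_K) \end{pmatrix}
\begin{pmatrix} \beta_{1,1} & \cdots & \beta_{N,1} \\ \vdots & & \vdots \\ \beta_{1,K} & \cdots & \beta_{N,K} \end{pmatrix}
+ \begin{pmatrix} Var(\epsilon_1) & & 0 \\ & \ddots & \\ 0 & & Var(\epsilon_N) \end{pmatrix}
$$

Or

$$
\Omega = B\,Var(f)\,B' + Var(\epsilon)
$$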



B then represents the N by K matrix of factor exposures;

Var(f ) is a K by K matrix of factor premium variances and

Var(ε) is an N by N diagonal matrix of error variances.

The squared tracking error is then

$$
TE^2 = (w_p - w_b)'\,B\,Var(f)\,B'\,(w_p - w_b) + (w_p - w_b)'\,Var(\epsilon)\,(w_p - w_b)
$$

If we add any other relevant constraints, we can solve this

using our quadratic optimizer.

(note: we are now minimizing the tracking error)
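A minimal sketch of the factor-based tracking error, using made-up numbers for a three-asset, two-factor model (the exposures, factor covariances, residual variances and weights are all illustrative assumptions):

```python
import numpy as np

# Hypothetical factor model: N = 3 assets, K = 2 factors.
B = np.array([[1.0, 0.2],
              [0.9, -0.1],
              [1.2, 0.5]])                 # N x K factor exposures
var_f = np.array([[0.04, 0.005],
                  [0.005, 0.01]])          # K x K factor covariance matrix
var_eps = np.diag([0.02, 0.03, 0.025])     # N x N diagonal residual variances

# Full asset covariance matrix implied by the factor model.
omega = B @ var_f @ B.T + var_eps

w_b = np.array([0.5, 0.3, 0.2])            # benchmark weights
w_p = np.array([0.4, 0.4, 0.2])            # portfolio weights
active = w_p - w_b

# Squared tracking error split into factor and asset-specific parts.
te_sq_factor = active @ B @ var_f @ B.T @ active
te_sq_specific = active @ var_eps @ active
te_sq = te_sq_factor + te_sq_specific

# Active factor exposures B'(w_p - w_b), one per factor.
active_exposures = B.T @ active
```

The split is useful in practice because the factor part depends on the weights only through the K active exposures `B.T @ active`, so constraining or tilting those exposures acts directly on factor risk.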

M-V Analysis: Tracking Error (Tilting)


When we actually have specific values or weights for our

factor exposure, we can tilt the portfolio to those weights

by applying a constraint

$$
B'(w_p - w_b) = d
$$

Where B is as defined earlier and d is the vector

representing the tilt. For example, if we have five factors:

market, size, growth, country, and sector and we wanted to

overweight size and growth, we could use

$$
d = (0 \quad 0.1 \quad 0.1 \quad 0 \quad 0)'
$$

M-V Analysis: Tracking Error (Tilting)


The zeros in d make sure that the portfolio’s exposures to

market, country and sector are the same as the

benchmark’s, and the non-zero values make sure that the exposure

to size and growth will be higher than the benchmark’s by

0.1.

With factor tilting, the optimization problem becomes

$$
\min_{w_p} \ (w_p - w_b)'\,Var(r)\,(w_p - w_b) \quad \text{s.t.} \quad B'(w_p - w_b) = d \ \text{ and any other constraints}
$$

M-V Analysis: Tracking Error (Ghost)


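There may be cases in which you do not know what the

underlying securities in the benchmark are or their

weights. In this case, you would minimize the tracking

error with respect to the history of returns of the

benchmark. One possible approach is to minimize

$$
TE^2 = \begin{pmatrix} w_p \\ -1 \end{pmatrix}' \begin{pmatrix} B \\ \beta_b' \end{pmatrix} Var(f) \begin{pmatrix} B \\ \beta_b' \end{pmatrix}' \begin{pmatrix} w_p \\ -1 \end{pmatrix}
+ \begin{pmatrix} w_p \\ -1 \end{pmatrix}' \begin{pmatrix} Var(\epsilon) & 0 \\ 0 & Var(\epsilon_b) \end{pmatrix} \begin{pmatrix} w_p \\ -1 \end{pmatrix}
$$

Where βb is the benchmark’s factor exposure and εb is the

benchmark’s error term. Now that we have described the

tracking error, we continue as before.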

M-V Analysis: Tracking Error (Risk-Adj)


As indicated earlier, an alternative approach is to have a

maximum tracking error constraint and maximize the

expected return of the portfolio subject to that constraint.

We could write this as

$$
\max_{w} \ w'\mu \quad \text{s.t.} \quad Var(r_p - r_b) \le \sigma_x^2
$$

And any other constraints. Alternatively, if we did not have

a target mean or tracking error, we could use a tracking

error risk aversion parameter A and write

$$
\max_{w} \ w'\mu - A\,Var(r_p - r_b)
$$


Note that these two formulations are related. The set of

maximum-return portfolios obtained as we vary the

tracking error constraint is identical to the set of optimal

portfolios obtained as we vary the tracking-error risk

aversion parameter. In other words, we can always choose

parameters so the two formulations are equivalent. This

property may be useful for solving the optimization

problem depending on how our optimizer wants the

problem to be set.


M-V Analysis


Get the idea?

Once we know how to solve the portfolio optimization problem, everything else is just a wrinkle.

That doesn’t mean that it’s easy – what it means is that we have to figure out how to pose the problem that we want to solve in a manner in which we can solve it (with the help of an optimizer).

But, just for fun, let’s see if there is anything else we can learn.

M-V Analysis Utility


Notice that in the numerical example at the beginning of

class, we assumed that we wanted an expected return for

the portfolio of 15% and optimized to achieve that

objective. What makes this right?

▲▲▲

Theory would tell us that what we want to do is find the point on the efficient frontier which maximizes the investor’s utility.

Note that less risk averse investors will have “flatter” indifference curves.



In practice, we often use a modified approach to mean-

variance analysis in which we construct optimal portfolios

for different risk tolerance parameters (λ), and by varying

λ, find the efficient frontier.

In this approach, we trade off risk against return by

maximizing

$$
\max_{w} U = \max_{w} \left(\mu_p - \frac{1}{2\lambda}\sigma_p^2\right) = \max_{w} \left(w'\mu - \frac{1}{2\lambda} w'\Omega w\right)
$$

For various risk tolerances λ.



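Where

$$\sigma_{ij} = \mathrm{Cov}(R_i - c, R_j - c), \qquad \mu_i = E[R_i - c]$$

The unconstrained optimum is found using the FOC

$$\frac{dU}{dw} = \mu - \frac{1}{\lambda}\Sigma w = 0 \quad\Rightarrow\quad w^* = \lambda\Sigma^{-1}\mu$$

under the normal regularity conditions.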
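As a quick numerical check of the unconstrained optimum $w^* = \lambda\Sigma^{-1}\mu$, here is a minimal sketch assuming NumPy. The two-asset inputs and the function names are made up for illustration and are not from the lecture.

```python
import numpy as np

def unconstrained_weights(mu, sigma, lam):
    """Return w* = lam * Sigma^{-1} mu without forming the inverse explicitly."""
    return lam * np.linalg.solve(sigma, mu)

def utility(w, mu, sigma, lam):
    """Mean-variance utility U = w'mu - w'Sigma w / (2 lam)."""
    return w @ mu - (w @ sigma @ w) / (2.0 * lam)

# Hypothetical inputs: excess returns over cash, volatilities, correlation.
mu = np.array([0.04, 0.02])
vol = np.array([0.20, 0.10])
corr = np.array([[1.0, 0.3], [0.3, 1.0]])
sigma = np.outer(vol, vol) * corr
lam = 0.5  # risk tolerance

w_star = unconstrained_weights(mu, sigma, lam)
gradient = mu - sigma @ w_star / lam  # FOC: should be numerically zero
print("w* =", w_star, "FOC residual =", gradient)
```

Doubling `lam` doubles every weight, which is why varying λ traces out the frontier in the unconstrained case.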



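Or with equality constraints

$$\max_w U = \max_w \left( w'\mu - \frac{1}{2\lambda} w'\Sigma w \right) \quad\text{subject to}\quad Aw = b$$

Forming the standard Lagrangian (γ is the vector of Lagrange multipliers)

$$L = w'\mu - \frac{1}{2\lambda} w'\Sigma w - \gamma'(Aw - b)$$

FOC

$$\frac{\partial L}{\partial w} = \mu - \frac{1}{\lambda}\Sigma w - A'\gamma = 0$$

$$\frac{\partial L}{\partial \gamma} = Aw - b = 0$$

$$w^* = \lambda\Sigma^{-1}(\mu - A'\gamma), \qquad Aw^* = b$$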

M-V Analysis Utility/2-Fund Separation


Solving for the optimal weights

$$w^* = \Sigma^{-1}A'(A\Sigma^{-1}A')^{-1}b + \lambda\Sigma^{-1}\left(\mu - A'(A\Sigma^{-1}A')^{-1}A\Sigma^{-1}\mu\right)$$

Notice that the optimal solution is split into a constrained minimum-variance portfolio and a speculative portfolio. This is known as two-fund separation. The first term does not depend either on the expected returns or on the risk tolerance; it is the constrained minimum-variance portfolio. The second term depends on the expected returns and the investor’s risk tolerance.
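The two-fund split can be checked numerically. The sketch below assumes NumPy, a single full-investment constraint (A is a row of ones, b = 1) and made-up inputs; the function name is illustrative only.

```python
import numpy as np

def constrained_weights(mu, sigma, A, b, lam):
    """Two-fund solution of max w'mu - w'Sigma w/(2 lam) subject to A w = b."""
    sigma_inv = np.linalg.inv(sigma)
    M = np.linalg.inv(A @ sigma_inv @ A.T)          # (A Sigma^-1 A')^-1
    w_minvar = sigma_inv @ A.T @ M @ b              # constrained minimum-variance fund
    w_spec = lam * sigma_inv @ (mu - A.T @ M @ A @ sigma_inv @ mu)  # speculative fund
    return w_minvar, w_spec

# Hypothetical three-asset inputs.
mu = np.array([0.06, 0.04, 0.02])
vol = np.array([0.20, 0.12, 0.05])
corr = np.array([[1.0, 0.4, 0.1],
                 [0.4, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
sigma = np.outer(vol, vol) * corr
A = np.ones((1, 3))      # weights sum ...
b = np.array([1.0])      # ... to one

w_minvar, w_spec = constrained_weights(mu, sigma, A, b, lam=0.2)
w_star = w_minvar + w_spec
print("min-variance fund:", w_minvar)
print("speculative fund: ", w_spec)
print("A w* =", A @ w_star)
```

The speculative fund satisfies A w = 0, so it only reshuffles the budget and scales linearly with the risk tolerance.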

M-V Analysis Efficiency of Solution


A brief aside:

Note that constrained optimization reduces the efficiency of the

solution. A constrained solution must be less optimal than an

unconstrained solution (assuming that the constraint is

binding). The loss in efficiency can be measured as the

difference between a constrained and unconstrained solution.

But, not every difference between constrained and unconstrained

portfolios is statistically or economically significant. So we

might want to test whether there is a difference. One way to

test for significance is to use the Sharpe ratio (SR).


M-V Analysis Efficiency of Solution


Consider a simple case of running an unconstrained

optimization with k* assets and a constrained optimization

with k assets (k* > k). We can use

$$\frac{(N - k^*)(k^* - k)\left(SR^{*2} - SR^2\right)}{1 + SR^2} \sim F_{k^*,\,N-(k^*+k+1)}$$

Where the statistic is F-distributed and the Sharpe Ratio is

$$SR = \frac{r - r_f}{\sigma}$$

Asset-Liability Management


Now consider the problem when we also have stochastic

liabilities. In this case, we focus on the difference between

assets and liabilities. This is known as surplus. The

change in surplus depends directly on the returns of the

asset portfolio (Rp) as well as the liability returns (Rl).

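$$\Delta\text{Surplus} = \text{Assets}\times R_p - \text{Liabilities}\times R_l$$

We will express surplus returns as a change in surplus relative to assets

$$\frac{\Delta\text{Surplus}}{\text{Assets}} = R_p - \frac{\text{Liabilities}}{\text{Assets}}R_l = R_p - fR_l$$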



Where f is the ratio of liabilities to assets. If we set f = 1 and

Rl = c, we are back in the world without liabilities (or

where cash is our liability).

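If we want to use the same optimizer, we need to transform this problem into one of surplus, i.e. we need to express covariance in terms of surplus risk and expected returns in terms of the relative return of assets versus liabilities.

$$\max_w \left( w'\mu_S - \frac{1}{2\lambda} w'\Sigma_S w \right) \quad\text{subject to}\quad Aw = b$$

$$\Sigma_S = \begin{pmatrix} 1 & \cdots & 0 & -f \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & -f \end{pmatrix} \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1k} & \sigma_{1l} \\ \vdots & \ddots & \vdots & \vdots \\ \sigma_{k1} & \cdots & \sigma_{kk} & \sigma_{kl} \\ \sigma_{l1} & \cdots & \sigma_{lk} & \sigma_{ll} \end{pmatrix} \begin{pmatrix} 1 & \cdots & 0 & -f \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & 1 & -f \end{pmatrix}'$$

$$\mu_S = \begin{pmatrix} \mu_1 - f\mu_l \\ \vdots \\ \mu_k - f\mu_l \end{pmatrix} + (1 - f)c$$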
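The surplus transformation is just a linear map applied to the asset-plus-liability inputs. A minimal sketch, assuming NumPy, with the liability stored as the last row and column of the full covariance matrix; the function name and all numbers are hypothetical.

```python
import numpy as np

def surplus_inputs(mu, mu_l, sigma_full, f, c):
    """Map asset/liability inputs into surplus space.

    mu         : (k,) expected excess returns of the assets
    mu_l       : expected excess return of the liability
    sigma_full : (k+1, k+1) covariance of the k assets and the liability (last)
    f          : ratio of liabilities to assets
    c          : cash rate
    """
    k = len(mu)
    F = np.hstack([np.eye(k), -f * np.ones((k, 1))])  # each row is (0,..,1,..,0, -f)
    sigma_s = F @ sigma_full @ F.T
    mu_s = mu - f * mu_l + (1.0 - f) * c
    return mu_s, sigma_s

# Hypothetical inputs: two assets plus one liability.
vol = np.array([0.18, 0.06, 0.08])
corr = np.array([[1.0, 0.2, 0.3],
                 [0.2, 1.0, 0.7],
                 [0.3, 0.7, 1.0]])
sigma_full = np.outer(vol, vol) * corr
mu_s, sigma_s = surplus_inputs(np.array([0.05, 0.01]), 0.015, sigma_full, f=0.9, c=0.03)

w = np.array([0.6, 0.4])
surplus_var = w @ sigma_s @ w
print("surplus expected returns:", mu_s)
print("surplus volatility:", np.sqrt(surplus_var))
```

For fully invested weights, `w @ sigma_s @ w` equals the variance of a position that is long the assets and short f units of the liability, which is the surplus risk the optimizer should see.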



Now our solution is

$$w^*_S = \Sigma_S^{-1}A'(A\Sigma_S^{-1}A')^{-1}b + \lambda\Sigma_S^{-1}\left(\mu_S - A'(A\Sigma_S^{-1}A')^{-1}A\Sigma_S^{-1}\mu_S\right)$$

By varying the risk-tolerance parameter, we can trace out the surplus-efficient frontier.



The unconstrained (asset-only) frontier and the surplus-efficient frontier coincide if:

Liabilities are cash (or, equivalently, if assets have zero covariance with liabilities)

All assets have the same covariance with liabilities

There exists a liability-mimicking asset and it lies on the efficient frontier

The Investment Universe


The choice of the investment universe has a significant

impact on the outcome of portfolio construction. If we

constrain ourselves to NYSE equities, it is likely that our

optimizer will produce a solution skewed toward smaller

cap stocks (why?). If we add Nasdaq equities and foreign

equities, this is likely to change as the variance-covariance

structure changes.

In general, to avoid the accumulation of estimation errors,

we would like to limit our portfolio optimization to groups

of assets with high intragroup and low intergroup

correlations.



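In the two asset case, our unconstrained optimization $w^* = \lambda\Sigma^{-1}\mu$ produces

$$\begin{pmatrix} w_1^* \\ w_2^* \end{pmatrix} = \lambda \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}$$

$$\frac{dw_1^*}{d\mu_1} = \lambda\frac{\sigma_{22}}{\sigma_{11}\sigma_{22} - \sigma_{12}\sigma_{21}} = \lambda\frac{1}{\sigma_{11} - \rho^2\sigma_{11}} = \lambda\frac{1}{\sigma_{11}(1 - \rho^2)}$$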



As the correlation between the two assets approaches 1, the

portfolio weights will react very sensitively to changes in

means (or expected return estimates). As assets become

more similar, any expected return becomes increasingly

important for the allocation decision. Portfolio

optimization with highly correlated assets will almost

certainly lead to extreme and undiversified results.
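To see how quickly the sensitivity blows up, the sketch below evaluates $dw_1^*/d\mu_1$ for a few correlations. It assumes NumPy and two assets with equal (made-up) variances; the function name is illustrative.

```python
import numpy as np

def weight_sensitivity(sigma11, sigma22, rho, lam):
    """d w1*/d mu1 for the two-asset unconstrained optimum w* = lam * Sigma^{-1} mu."""
    sigma12 = rho * np.sqrt(sigma11 * sigma22)
    sigma = np.array([[sigma11, sigma12],
                      [sigma12, sigma22]])
    return lam * np.linalg.inv(sigma)[0, 0]

lam = 1.0
rhos = [0.0, 0.5, 0.9, 0.99]
sensitivities = [weight_sensitivity(0.04, 0.04, rho, lam) for rho in rhos]
for rho, s in zip(rhos, sensitivities):
    # Weight change in asset 1 per unit change in its expected return.
    print(f"rho = {rho:4.2f}   dw1/dmu1 = {s:10.1f}")
```

With these inputs a one percentage point change in the expected return of asset 1 moves its weight by `0.01 * sensitivity`, so the required precision of the return forecast rises sharply as ρ approaches 1.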

In the next homework set, I have you explore a method of reducing this

problem using cluster analysis.

Risk Decomposition


It is often useful to understand the sources of risk in our portfolio and how those risks are spread through it. To get at this, we can decompose risk in the following way.

Consider the standard deviation of portfolio returns

$$\sigma_p = (w'\Sigma w)^{1/2} = \left( \sum_i w_i^2\sigma_{ii} + \sum_i\sum_{j\ne i} w_i w_j\sigma_{ij} \right)^{1/2}$$

The first question we would like to address is how does portfolio risk change as we change the holdings of a particular asset?

Risk Decomposition


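What we need is the “marginal contribution to risk” (MCTR), which can be easily calculated

$$\mathrm{MCTR}_{k\times 1} = \frac{d\sigma_p}{dw} = \frac{\Sigma w}{\sigma_p}$$

Where the ith element in the k by 1 vector is

$$\frac{d\sigma_p}{dw_i} = \frac{w_i\sigma_{ii} + \sum_{j\ne i} w_j\sigma_{ij}}{\sigma_p} = \frac{\sigma_{ip}}{\sigma_p} = \beta_i\sigma_p$$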

Risk Decomposition


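Note that if we add the weighted MCTRs of all securities in the portfolio, we get the volatility of the portfolio

$$\sum_i w_i\frac{d\sigma_p}{dw_i} = \sum_i w_i\frac{\sigma_{ip}}{\sigma_p} = \sigma_p$$

as we would expect. If we divide this expression by the volatility of the portfolio, we get

$$\sum_i \frac{w_i}{\sigma_p}\frac{d\sigma_p}{dw_i} = \sum_i w_i\frac{\sigma_{ip}}{\sigma_p^2} = \sum_i w_i\beta_i = 1$$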

Risk Decomposition


Which shows that the percentage contributions to risk (PCTR), which add up to 100%, are equal to the weighted betas. This can be written as a vector

$$\mathrm{PCTR}_{k\times 1} = \frac{W}{\sigma_p}\frac{d\sigma_p}{dw}$$

Where W is a k by k diagonal matrix with portfolio weights on the diagonal. Each element of the vector PCTR is given by

$$\mathrm{PCTR}_i = \frac{w_i}{\sigma_p}\frac{d\sigma_p}{dw_i} = w_i\beta_i$$
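The decomposition above takes only a few lines to compute. A minimal sketch assuming NumPy; the three-asset weights, volatilities and correlations are invented for illustration.

```python
import numpy as np

def risk_decomposition(w, sigma):
    """Return portfolio volatility, MCTR vector and PCTR vector."""
    sigma_p = np.sqrt(w @ sigma @ w)
    mctr = sigma @ w / sigma_p        # d sigma_p / d w_i = beta_i * sigma_p
    beta = mctr / sigma_p             # beta_i = sigma_ip / sigma_p^2
    pctr = w * beta                   # weighted betas
    return sigma_p, mctr, pctr

# Hypothetical portfolio.
w = np.array([0.5, 0.3, 0.2])
vol = np.array([0.18, 0.10, 0.04])
corr = np.array([[1.0, 0.5, 0.2],
                 [0.5, 1.0, 0.3],
                 [0.2, 0.3, 1.0]])
sigma = np.outer(vol, vol) * corr

sigma_p, mctr, pctr = risk_decomposition(w, sigma)
print("volatility:", sigma_p)
print("MCTR:", mctr)
print("PCTR:", pctr, "sum =", pctr.sum())
```

The two identities from the slides serve as sanity checks: the weighted MCTRs add up to the portfolio volatility and the PCTRs add up to one.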

Bibliography


Huang and Litzenberger, Foundations for Financial

Economics, North-Holland.

Intriligator, Mathematical Optimization and Economic

Theory, Prentice-Hall.


Factor Risk Contributions


Last time we looked at risk decomposition of a portfolio.

Today we will assume that we can decompose the

uncertainty in asset returns into common factors.

Stocks are at least partly driven by characteristics like

industry, country, size, etc.

We can write the risk premium of a given stock as a

combination of these factor returns weighted by their

respective factor exposures.

$$r = Xf + u$$

Where r is a k by 1 vector of risk premia (asset return minus cash), X is a k by p matrix of factor exposures, f is a p by 1 vector of factor returns and u is a k by 1 vector of asset-specific returns which are both uncorrelated with factor returns and uncorrelated across assets.

The covariance matrix of excess returns can be expressed

$$E[rr'] = E[(Xf + u)(Xf + u)']$$

$$E[rr'] = E[Xff'X'] + E[Xfu'] + E[uf'X'] + E[uu'] = X\Sigma_{ff}X' + \Sigma_{uu}$$

Where Σff denotes the p by p covariance matrix of factor returns and Σuu is a k by k (diagonal) covariance matrix of asset-specific returns

We can now decompose the portfolio risk into a common and a specific part

$$\sigma_p^2 = w'X\Sigma_{ff}X'w + w'\Sigma_{uu}w$$

Using the same logic as last time, we get for the marginal factor contribution to risk MFCTR (a p by 1 vector)

$$\mathrm{MFCTR} = \frac{d\sigma_p}{d(X'w)} = \frac{\Sigma_{ff}X'w}{\sigma_p}$$
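A small sketch of the factor decomposition, assuming NumPy, three assets and two factors. The exposures, factor covariances and specific variances are hypothetical, and the function name is not from the lecture.

```python
import numpy as np

def factor_risk(w, X, sigma_ff, sigma_uu_diag):
    """Split portfolio variance into common and specific parts and return MFCTR."""
    exposures = X.T @ w                                  # p-vector of portfolio factor exposures
    common_var = exposures @ sigma_ff @ exposures        # w'X Sigma_ff X'w
    specific_var = w @ (sigma_uu_diag * w)               # w'Sigma_uu w with diagonal Sigma_uu
    sigma_p = np.sqrt(common_var + specific_var)
    mfctr = sigma_ff @ exposures / sigma_p               # d sigma_p / d (X'w)
    return sigma_p, common_var, specific_var, mfctr

# Hypothetical inputs: k = 3 assets, p = 2 factors.
w = np.array([0.4, 0.4, 0.2])
X = np.array([[1.0, 0.2],
              [0.8, -0.1],
              [0.3, 0.9]])
sigma_ff = np.array([[0.030, 0.004],
                     [0.004, 0.010]])
sigma_uu_diag = np.array([0.010, 0.015, 0.005])

sigma_p, common_var, specific_var, mfctr = factor_risk(w, X, sigma_ff, sigma_uu_diag)
print("volatility:", sigma_p)
print("share of variance from common factors:", common_var / sigma_p**2)
print("MFCTR:", mfctr)
```

Weighting MFCTR by the portfolio's factor exposures recovers the common part of risk, in the same way that weighting MCTR by asset weights recovers total risk.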

Implied View Analysis


So far, we have calculated the optimal portfolio weights

from given return expectations. But often we are working

with previously established portfolios and all we have are

the weights. How can we determine what the expectations

are and whether or not the weights make sense?

This is done using “reverse optimization”, which maps the positions into implicit return expectations.



In an unconstrained portfolio optimization, marginal risks

are traded off against marginal returns. A portfolio is

therefore optimal when the relationship between marginal

risks and marginal returns is the same for all assets in the

portfolio

Since the Sharpe ratio of the portfolio measures the relationship between incremental risk and return, we can express the relationship between marginal return and marginal risk as:

$$\mu = \frac{\mu_p}{\sigma_p}\cdot\frac{\Sigma w}{\sigma_p} = \beta\mu_p$$

Where the beta measures the sensitivity of an asset to movements of the portfolio:

$$\beta = \frac{\Sigma w}{\sigma_p^2}$$

Note that this follows from portfolio mathematics, not from an equilibrium condition, but if the portfolio were the market portfolio, the implied returns would be the returns that investors would need to hold the market portfolio.



This kind of analysis can be used to show investors whether

their return expectations are consistent with market

realities, i.e., whether they are over or under investing

their risk budget in particular areas and whether they are

investing in a way that is consistent with their views.




Let’s consider an example

Expected return 10% (5% excess); Volatility 8.97%; Sharpe ratio 0.57

Asset           Weight %   Return %   Volatility %
Equity              40         11          18
Absolute Rtn        15         12           8
Private Eqty        15         11           9
Real Estate          5         10          14
US Bonds            25          7           3
Non-US Bonds         0          8           8
Cash                 0          5           0



With a correlation matrix

1.0 0.0 0.5 0.5 0.3 0.3 0.0

0.0 1.0 0.0 0.0 0.0 0.0 0.0

0.5 0.0 1.0 0.5 0.3 0.3 0.0

0.5 0.0 0.5 1.0 0.5 0.3 0.0

0.3 0.0 0.3 0.5 1.0 0.8 0.0

0.3 0.0 0.3 0.3 0.8 1.0 0.0

0.0 0.0 0.0 0.0 0.0 0.0 1.0




We can compute the marginal contribution to risk using the equation from last time

$$\mathrm{MCTR}_i = \beta_i\sigma_p$$

We compute the MCTR for US Bonds as 0.014. What does this mean? Suppose instead of holding 25%, we invested 26%, then our total portfolio risk would change from 8.7948 to 8.8089

$$\frac{d\sigma_p}{dw_{\text{US Bonds}}} \approx \frac{\Delta\sigma_p}{\Delta w_{\text{US Bonds}}} = 8.8089 - 8.7948 = 0.0141$$



Or for the complete picture

Asset           PCTR %   MCTR    Implied Rtn %
Equity           79.1    0.174       9.84
Absolute Rtn      1.9    0.011       0.62
Private Eqty     10.2    0.060       3.39
Real Estate       4.8    0.085       4.80
US Bonds          4.0    0.014       0.80
Non-US Bonds      0.0    0.029       1.66
Cash              0.0    0.000       0.00

Biggest increase in risk would come from equities (already about 80%), smallest increase from Absolute Return (most diversifying).
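The reverse optimization can be sketched in a few lines of NumPy. The code below uses the weights, volatilities and correlation matrix of the example as printed (rounded), so the figures it produces may differ somewhat from the table; the variable names are my own.

```python
import numpy as np

assets = ["Equity", "Absolute Rtn", "Private Eqty", "Real Estate",
          "US Bonds", "Non-US Bonds", "Cash"]
w = np.array([40, 15, 15, 5, 25, 0, 0]) / 100.0
vol = np.array([18, 8, 9, 14, 3, 8, 0]) / 100.0
corr = np.array([
    [1.0, 0.0, 0.5, 0.5, 0.3, 0.3, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 1.0, 0.5, 0.3, 0.3, 0.0],
    [0.5, 0.0, 0.5, 1.0, 0.5, 0.3, 0.0],
    [0.3, 0.0, 0.3, 0.5, 1.0, 0.8, 0.0],
    [0.3, 0.0, 0.3, 0.3, 0.8, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
])
sigma = np.outer(vol, vol) * corr
excess_p = 0.05                       # portfolio excess return quoted in the example

sigma_p = np.sqrt(w @ sigma @ w)
mctr = sigma @ w / sigma_p            # marginal contribution to risk
beta = mctr / sigma_p                 # sensitivity to the portfolio
pctr = w * beta                       # percentage contribution to risk
implied = beta * excess_p             # implied excess return = Sharpe ratio * MCTR

for name, p, m, r in zip(assets, pctr, mctr, implied):
    print(f"{name:<13} PCTR {100 * p:5.1f}%   MCTR {m:6.3f}   implied {100 * r:5.2f}%")
```

By construction the weighted implied returns add back up to the portfolio excess return, which is a convenient consistency check on any reverse optimization.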






Implied excess return for Absolute Return strategies is much

lower than the forecast. This means that the investor is

underspending risk in this area.

For equities, the investor is overspending in the risk

allocation. A large allocation in a relatively

undiversifying asset requires large implied return to make

the portfolio optimal.

In this case, it is apparent that the investor’s implied return

for equities is much larger than historical experience.




View Optimization

This approach can be used iteratively where changes are

made to allocations or to forecasts until there is reasonable

correspondence between implied returns and expected

returns.

It can also be used to build a consensus view within a

portfolio team.

Note, however, that these views are for an unconstrained investor.


Correcting for Autocorrelation


Some asset classes appear to have much less risk than one

might commonly believe.

Corporate high yield

Hedge funds

If the risk for an asset class is underestimated, too much

capital will be allocated to that class.

Loss of efficiency in the portfolio.

Broader issue of societal allocations.




Positively autocorrelated returns (high returns tend to be

followed by high returns), show less historical volatility

than an uncorrelated series.

Where does autocorrelation come from?

Infrequent trading in illiquid securities.

Real estate

High yield

Hedge funds

Non-synchronous trading




One of the ways to check and correct for autocorrelation is known as the Blundell-Ward filter:

$$r_t^* = \frac{1}{1 - a_1}r_t - \frac{a_1}{1 - a_1}r_{t-1}$$

Which creates a new, transformed return series, r*, using the returns r at times t and t-1. The coefficient a1 is estimated from an autoregressive first-order (AR(1)) model:

$$r_t = a_0 + a_1 r_{t-1} + \varepsilon_t$$



Note that by applying this filter the mean is unchanged:

$$\bar r^* = \frac{1}{1 - a_1}\bar r - \frac{a_1}{1 - a_1}\bar r = \bar r$$

And the variance increases:

$$\sigma^2(r_t^*) = \frac{1 - a_1^2}{(1 - a_1)^2}\sigma^2(r_t)$$
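A minimal sketch of the filter on simulated data, assuming NumPy. The AR(1) coefficient is estimated by an ordinary least-squares fit with `np.polyfit`; the simulated series, its parameters and the function name are illustrative assumptions.

```python
import numpy as np

def blundell_ward(r):
    """Unsmooth a return series using an AR(1) coefficient estimated by OLS."""
    a1, a0 = np.polyfit(r[:-1], r[1:], 1)        # r_t = a0 + a1 * r_{t-1} + e_t
    r_star = (r[1:] - a1 * r[:-1]) / (1.0 - a1)  # filtered series (one observation shorter)
    return a1, r_star

# Simulate a positively autocorrelated ("smoothed") return series.
rng = np.random.default_rng(0)
n, true_a1 = 200_000, 0.5
eps = rng.normal(0.0, 0.01, n)
r = np.empty(n)
r[0] = eps[0]
for t in range(1, n):
    r[t] = 0.005 + true_a1 * r[t - 1] + eps[t]

a1, r_star = blundell_ward(r)
ratio = r_star.var() / r.var()
print("estimated a1:", a1)
print("mean before/after:", r.mean(), r_star.mean())
print("variance ratio:", ratio, "theory:", (1 - a1**2) / (1 - a1) ** 2)
```

With a coefficient near 0.5 the filtered variance is roughly three times the reported variance, which shows how much risk smoothing can hide.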



This approach can also be used to arrive at more realistic beta estimates.

Let’s consider an example using four hedge fund indices (convertible arbitrage, distressed debt, event-driven and macro) with the MSCI USA index as the market. We could run three types of regressions

$$r_{it} = \alpha + \beta_0 r_{mt} + \varepsilon_t$$

$$r_{it}^* = \alpha + \beta_0^* r_{mt}^* + \varepsilon_t$$

$$r_{it} = \alpha + \beta_0 r_{mt} + \beta_1 r_{m,t-1} + \beta_2 r_{m,t-2} + \beta_3 r_{m,t-3} + \varepsilon_t$$



Index           a1            β0     β*0    β0+β1+β2+β3
Convertible     0.55 (7.66)   0.09   0.22   0.25
Distressed      0.52 (6.86)   0.18   0.44   0.49
Event-Driven    0.28 (3.56)   0.29   0.38   0.38
Macro           0.18 (2.10)   0.29   0.37   0.52

The betas from ordinary regressions appear to

underestimate the true market exposure and therefore

overstate the diversifying effects associated with the

hedge funds.


Problems with the Covariance Matrix


The covariance matrix is a fundamental tool for our analysis,

so it is worthwhile spending a bit of time looking at its

properties.

Since this is intended to be a covariance matrix, it must be true that

$$w'\Sigma w \ge 0$$

for all w. In other words, it must be positive semi-definite. A necessary and sufficient condition for positive semi-definiteness (for symmetric matrices) is that all of the eigenvalues of Σ are positive or zero. (For a non-trivial covariance matrix, at least one eigenvalue is also greater than zero.)



However, we may find that we sometimes have negative eigenvalues when we have estimated our covariance matrix.

This can arise for several reasons:

Estimates are generated from time series of different lengths.

The number of observations is less than the number of assets or risk factors.

Two or more assets are collinear.



Consider the following:

$$\Sigma = \begin{pmatrix} 1.0 & 0.9 & -0.3 \\ 0.9 & 1.0 & 0.7 \\ -0.3 & 0.7 & 1.0 \end{pmatrix}$$

Where the variances have been standardized to 1.0 for simplicity.

The eigenvalues can be found

$$(e_1, e_2, e_3) = (2.0, 1.29, -0.3)$$



So this matrix is not positive semi-definite. One of the ways to fix this is to perform an adjustment to the matrix.

1. Find the smallest eigenvalue (here e3)

2. Create a minimum zero eigenvalue by shifting the covariance matrix, $\Sigma^* = \Sigma - e_3 I$, where I is an identity matrix.

3. Scale the resulting matrix by 1/(1 − e3) to enforce variances of 1:

$$\Sigma^{**} = \frac{1}{1 - e_3}\Sigma^*$$



For our example, the new adjusted matrix is

$$\Sigma^{**} = \begin{pmatrix} 1.0 & 0.69 & -0.23 \\ 0.69 & 1.0 & 0.54 \\ -0.23 & 0.54 & 1.0 \end{pmatrix}$$

With eigenvalues

$$(e_1, e_2, e_3) = (1.77, 1.22, 0)$$
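The three-step adjustment can be reproduced with NumPy's symmetric eigenvalue routine. This is a sketch under the assumption that the input is a correlation matrix (unit diagonal), as in the example; the function name is illustrative.

```python
import numpy as np

def shift_and_rescale(corr):
    """Shift the smallest eigenvalue to zero, then rescale back to unit variances."""
    e_min = np.linalg.eigvalsh(corr).min()       # step 1: smallest eigenvalue
    if e_min >= 0:
        return corr.copy()                       # already positive semi-definite
    shifted = corr - e_min * np.eye(len(corr))   # step 2: Sigma* = Sigma - e_min * I
    return shifted / (1.0 - e_min)               # step 3: Sigma** = Sigma* / (1 - e_min)

corr = np.array([[1.0, 0.9, -0.3],
                 [0.9, 1.0, 0.7],
                 [-0.3, 0.7, 1.0]])
fixed = shift_and_rescale(corr)
print("eigenvalues before:", np.linalg.eigvalsh(corr))
print("adjusted matrix:\n", fixed.round(2))
print("eigenvalues after: ", np.linalg.eigvalsh(fixed).round(2))
```

The shift leaves the eigenvectors untouched and shrinks every correlation by the same factor, so it is a blunt repair: it restores semi-definiteness but also changes correlations that were never a problem.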

Significance of the Inverse Covariance


Let’s turn to the economics of our unconstrained solution

$$w^* = \lambda\Sigma^{-1}\mu$$

If we run the regression of asset i against all other k-1 assets

$$r_i = a_i + \sum_{j\ne i}\beta_{ij}r_j + \varepsilon_i$$

The explanatory power of this regression is given as $R_i^2$.



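It can then be shown that

$$\Sigma^{-1} = \begin{pmatrix}
\dfrac{1}{\sigma_{11}(1 - R_1^2)} & -\dfrac{\beta_{12}}{\sigma_{11}(1 - R_1^2)} & \cdots & -\dfrac{\beta_{1k}}{\sigma_{11}(1 - R_1^2)} \\
-\dfrac{\beta_{21}}{\sigma_{22}(1 - R_2^2)} & \dfrac{1}{\sigma_{22}(1 - R_2^2)} & \cdots & -\dfrac{\beta_{2k}}{\sigma_{22}(1 - R_2^2)} \\
\vdots & \vdots & \ddots & \vdots \\
-\dfrac{\beta_{k1}}{\sigma_{kk}(1 - R_k^2)} & -\dfrac{\beta_{k2}}{\sigma_{kk}(1 - R_k^2)} & \cdots & \dfrac{1}{\sigma_{kk}(1 - R_k^2)}
\end{pmatrix}$$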



Which means that the optimal weight for asset i is

$$w_i^* = \lambda\frac{\mu_i - \sum_{j\ne i}\beta_{ij}\mu_j}{\sigma_{ii}(1 - R_i^2)}$$

The numerator is the excess return after regression hedging (i.e. the excess return after the reward for implicit exposure to other assets has been removed). This is equivalent to a in the regression.



Since σii is the total risk associated with asset i, the part of that risk that cannot be hedged away, σii(1 − Ri²), is the denominator of our expression.

In terms of the regression equation, this is the unexplained variance or the variance of the error term.

$$w_i^* = \lambda\frac{\mu_i - \sum_{j\ne i}\beta_{ij}\mu_j}{\sigma_{ii}(1 - R_i^2)}$$



Since the regression attempts to minimize the variance of

the errors – this means that the optimization will put

maximum weight into those assets that are similar to the

other assets (as a group) but have a small return

advantage. This property leads to implausible results

when estimation errors are taken into account.
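The regression reading of the inverse covariance matrix is easy to verify numerically. The sketch below, assuming NumPy and a made-up covariance matrix, rebuilds each row of $\Sigma^{-1}$ from the population regression of asset i on the other assets; the function name is hypothetical.

```python
import numpy as np

def inverse_from_regressions(sigma):
    """Rebuild Sigma^{-1} row by row from regressions of each asset on all others."""
    k = len(sigma)
    out = np.zeros((k, k))
    for i in range(k):
        others = [j for j in range(k) if j != i]
        # Population regression coefficients of asset i on the other assets.
        beta = np.linalg.solve(sigma[np.ix_(others, others)], sigma[others, i])
        # Unexplained variance: sigma_ii * (1 - R_i^2).
        resid_var = sigma[i, i] - sigma[i, others] @ beta
        out[i, i] = 1.0 / resid_var
        out[i, others] = -beta / resid_var
    return out

vol = np.array([0.20, 0.15, 0.10, 0.05])
corr = np.array([[1.0, 0.6, 0.3, 0.1],
                 [0.6, 1.0, 0.4, 0.2],
                 [0.3, 0.4, 1.0, 0.5],
                 [0.1, 0.2, 0.5, 1.0]])
sigma = np.outer(vol, vol) * corr

rebuilt = inverse_from_regressions(sigma)
print("max abs difference from the direct inverse:",
      np.abs(rebuilt - np.linalg.inv(sigma)).max())
```

An asset that the others explain almost perfectly has a tiny unexplained variance, so its row of the inverse, and hence its optimal weight, is inflated; this is the mechanism behind the implausible results mentioned above.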

Covariance in Good and Bad Times


Often we find that during times of market difficulty, correlations within an asset class increase. Sometimes this is stated, “In times of stress, all correlations go to one.”

Is the low correlation in a full sample covariance matrix just an artifact of reasonably positive correlation in normal times and of highly negative correlation in unusual times? Or is it a diversifying asset?

Investors may not want to bet on average correlation; they may actually have preferences that vary depending on the state of the world.



To address these types of issues, we may want to optimize our portfolio based upon our expectation of the occurrence of “normal” and “unusual” times.

To determine what are unusual times, we will define them according to their statistical distance from the mean vector

$$D_t = (r_t - \hat\mu)'\hat\Sigma^{-1}(r_t - \hat\mu) = d_t'\hat\Sigma^{-1}d_t$$

This statistic is distributed Chi-Squared with k degrees of freedom. If we define an unusual observation as the outer 10%, we can test each time period.



Notice that the distance is weighted by the inverse of the

covariance matrix. This means that we take into account

asset volatilities (the same deviation from the mean might

be significant for low-volatility series but not for high-

volatility series). Hence, outliers are not necessarily

associated with down markets.




We could now build a new covariance matrix weighted by our subjective (or estimated) probabilities.

$$\Sigma_{new} = p\,\lambda_{normal}\Sigma_{normal} + (1 - p)\,\lambda_{unusual}\Sigma_{unusual}$$

Where we have included the relative risk tolerance for each regime (note that these must be scaled so they sum to the actual risk tolerance of the investor).

Note that this analysis can be very sensitive to the inclusion of new assets since that may change which periods are usual and unusual. For that reason, it may be useful to define unusual times with respect to a core set of assets.
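A rough sketch of the regime split and the blended covariance matrix, assuming NumPy and simulated returns. Note one swap: instead of the Chi-Squared critical value, the cutoff here is the empirical 90th percentile of the distances, which flags the outer 10% directly. The function names, the probability p and the regime weights are all hypothetical.

```python
import numpy as np

def split_regimes(returns, outer_share=0.10):
    """Flag periods whose statistical distance from the mean is in the outer share."""
    mu_hat = returns.mean(axis=0)
    sigma_hat = np.cov(returns, rowvar=False)
    d = returns - mu_hat
    # D_t = d_t' Sigma^{-1} d_t for every period t.
    dist = np.einsum("ti,ij,tj->t", d, np.linalg.inv(sigma_hat), d)
    cutoff = np.quantile(dist, 1.0 - outer_share)   # empirical cutoff, not chi-squared
    return dist, dist > cutoff

def blended_covariance(returns, unusual, p_normal, lam_normal, lam_unusual):
    """Sigma_new = p * lam_n * Sigma_normal + (1 - p) * lam_u * Sigma_unusual."""
    cov_normal = np.cov(returns[~unusual], rowvar=False)
    cov_unusual = np.cov(returns[unusual], rowvar=False)
    return p_normal * lam_normal * cov_normal + (1.0 - p_normal) * lam_unusual * cov_unusual

# Simulated monthly returns for three assets (illustrative only).
rng = np.random.default_rng(42)
true_cov = np.array([[0.0025, 0.0010, 0.0002],
                     [0.0010, 0.0016, 0.0003],
                     [0.0002, 0.0003, 0.0004]])
returns = rng.multivariate_normal([0.008, 0.006, 0.003], true_cov, size=1000)

dist, unusual = split_regimes(returns)
sigma_new = blended_covariance(returns, unusual, p_normal=0.8, lam_normal=1.0, lam_unusual=1.0)
print("share of unusual periods:", unusual.mean())
print("blended covariance:\n", sigma_new)
```

Raising the weight on the unusual regime tilts the optimizer toward assets that hold up when the others move together, at the cost of efficiency in normal times.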

Estimation Error


We should be clear that everything that we have done so far

is predicated on a couple of things:

1. We are using expected returns – in other words,

forecasted returns for our assets.

2. We are using an expected variance-covariance structure

– in other words, forecasted for our universe of assets.

3. If the future deviates from our forecasts by a significant

amount, we will not have an optimal portfolio. (This is an

issue of performance measurement)


Estimation Error


As I have said, generally you will want to forecast the mean

in some manner (if we have time we will talk more about

this later in the course). Your forecast could be a simple

forecast (like last period’s return or the sample mean) or it

could be more complex (Delphi method; time series

forecast; multi-factor forecast).


Estimation Error


For the variance-covariance structure, one typically uses simple approaches like the estimated structure based upon the sample history, a 250 day moving average, or an exponentially weighted average. You can add complexity to this by embedding ARCH/GARCH processes or other generalizations, but remember that if you are not using a factor decomposition (and thereby reducing the space), you are now attempting to forecast a large number of variables for a problem of any size: with n assets there are

$$\frac{n^2 + n}{2}$$

distinct variances and covariances.

Estimation Error


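To review what I discussed last time, assume that we have an estimated mean of 10% and an estimated volatility of 20%.

Estimation error for the mean is given by

$$\frac{\sigma}{\sqrt{T}}$$

And the confidence interval is calculated as

$$\left[\hat\mu - z_\alpha\frac{\sigma}{\sqrt{T}},\ \hat\mu + z_\alpha\frac{\sigma}{\sqrt{T}}\right]$$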

Estimation Error


For the variance, Campbell, Lo and MacKinlay have shown

$$\sqrt{\mathrm{Var}(\hat\sigma^2)} = \sqrt{2}\,\sigma^2\left(\frac{T}{\Delta t} - 1\right)^{-1/2}$$

We can see from these expressions that the estimation error for the mean is affected by the length of the time series T and the estimation error for the variance is affected both by the length and by the frequency of sampling (∆t).

We also see this in the following tables:

Estimation Error


Effect of Sample Period on Estimation Error for Mean Returns

Estimation Period (yrs)   Estimation Error %   95% Confidence Interval %
 1                        20                   78
 5                         9                   35
10                         6                   25
20                         4                   18
50                         3                   11

Estimation Error


Effect of Sample Period on Estimation Error (%) for Variance

Estimation        Estimation Frequency
Period (yrs)      Daily    Weekly   Monthly   Quarterly
 1                0.35     0.79     1.71      3.27
 5                0.16     0.35     0.74      1.30
10                0.11     0.25     0.52      0.91
20                0.08     0.18     0.37      0.64
50                0.05     0.11     0.23      0.40
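Both tables can be regenerated from the two formulas with a volatility of 20%. The sketch below uses only the standard library; the observation counts per year (260 daily, 52 weekly, 12 monthly, 4 quarterly) are my assumption, chosen because they appear to match the rounded figures.

```python
import math

def mean_error(vol, years):
    """Estimation error of the mean: sigma / sqrt(T)."""
    return vol / math.sqrt(years)

def variance_error(vol, years, periods_per_year):
    """Estimation error of the variance: sqrt(2) * sigma^2 / sqrt(T/dt - 1)."""
    n_obs = years * periods_per_year          # T / dt
    return math.sqrt(2.0) * vol**2 / math.sqrt(n_obs - 1)

vol = 0.20
freqs = {"daily": 260, "weekly": 52, "monthly": 12, "quarterly": 4}  # assumed counts

print("years  mean error %  95% interval width %")
for years in (1, 5, 10, 20, 50):
    err = mean_error(vol, years)
    print(f"{years:5d}  {100 * err:12.1f}  {100 * 2 * 1.96 * err:20.1f}")

print("years  " + "  ".join(f"{name:>9}" for name in freqs))
for years in (1, 5, 10, 20, 50):
    row = [100 * variance_error(vol, years, n) for n in freqs.values()]
    print(f"{years:5d}  " + "  ".join(f"{x:9.2f}" for x in row))
```

The contrast is the point of the exercise: sampling more frequently shrinks the error in the variance but does nothing for the error in the mean, which depends only on the length of the history.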

What is more important – estimation error in the

mean or in the variance?

Currency in the Portfolio


When optimizing a portfolio, one often has to deal with a block structure. In other words, two or more blocks of assets (e.g. stocks and bonds, equities and currencies, active managers and passive strategies).

Often the correlation between blocks is ignored or set to zero and the problem is solved separately, or the problem is solved in a two-step process where one finds the “optimal” allocation for part of the problem and then finds the “optimal” allocation for the second part of the problem.



We will study this problem using currencies.

Optimal currency hedging is the subject of ongoing debate

between plan sponsors, asset managers and consultants.

We will consider asset returns (local return plus currency return minus domestic cash rate)

$$a_i = \frac{\Delta(p_i s_i)}{p_i s_i} - c_h$$



And currency returns (local cash rate plus currency return minus domestic cash rate)

$$e_i = \frac{\Delta s_i}{s_i} + c_i - c_h$$

The covariance matrix of asset and currency returns is assumed to follow the block structure

$$\Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ae} \\ \Sigma_{ea} & \Sigma_{ee} \end{pmatrix}$$



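Currency hedging takes the form of regression hedging where we regress asset returns against all currency returns:

$$a_i = \alpha_i + \beta_{i1}e_1 + \cdots + \beta_{ik}e_k + \varepsilon_i$$

Regression hedging can also be expressed in matrix terms as

$$\beta = \Sigma_{ee}^{-1}\Sigma_{ea}$$

Where β is

$$\beta = \begin{pmatrix} \beta_{11} & \beta_{12} & \cdots & \beta_{1k} \\ \beta_{21} & \beta_{22} & \cdots & \beta_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_{k1} & \beta_{k2} & \cdots & \beta_{kk} \end{pmatrix}$$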



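We can now define the variance in asset returns that remains unexplained by currency returns (this is the conditional variance of asset returns conditioned on currency returns)

$$\Sigma_{a|e} = \Sigma_{aa} - \beta'\Sigma_{ee}\beta$$

And write the inverse of the covariance matrix of asset and currency returns as

$$\Sigma^{-1} = \begin{pmatrix} \Sigma_{a|e}^{-1} & -\Sigma_{a|e}^{-1}\beta' \\ -\beta\Sigma_{a|e}^{-1} & \Sigma_{ee}^{-1} + \beta\Sigma_{a|e}^{-1}\beta' \end{pmatrix}$$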



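Where we use the results for the inverse of a partitioned matrix

$$\begin{pmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{pmatrix}^{-1} = \begin{pmatrix} D^{-1} & -D^{-1}P_{12}P_{22}^{-1} \\ -P_{22}^{-1}P_{21}D^{-1} & P_{22}^{-1} + P_{22}^{-1}P_{21}D^{-1}P_{12}P_{22}^{-1} \end{pmatrix}$$

$$D = P_{11} - P_{12}P_{22}^{-1}P_{21}$$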



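For example, checking the value of D

$$\begin{aligned}
D &= P_{11} - P_{12}P_{22}^{-1}P_{21} \\
&= \Sigma_{aa} - \Sigma_{ae}\Sigma_{ee}^{-1}\Sigma_{ea} \\
&= \Sigma_{aa} - \Sigma_{ae}\Sigma_{ee}^{-1}\Sigma_{ee}\Sigma_{ee}^{-1}\Sigma_{ea} \\
&= \Sigma_{aa} - (\Sigma_{ae}\Sigma_{ee}^{-1})\Sigma_{ee}(\Sigma_{ee}^{-1}\Sigma_{ea}) \\
&= \Sigma_{aa} - \beta'\Sigma_{ee}\beta \\
&= \Sigma_{a|e}
\end{aligned}$$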



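Now, defining

$$w = \begin{pmatrix} w_a \\ w_e \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_a \\ \mu_e \end{pmatrix}$$

And recalling the solution to the unconstrained optimization

$$w^* = \lambda\Sigma^{-1}\mu$$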



There are three solutions to our problem.

First is the simultaneous optimization or the joint full blown optimization (choosing the optimal asset and currency positions simultaneously):

$$w^*_{sim} = \begin{pmatrix} w^*_{a,sim} \\ w^*_{e,sim} \end{pmatrix} = \begin{pmatrix} \lambda\Sigma_{a|e}^{-1}\mu_a - \lambda\Sigma_{a|e}^{-1}\beta'\mu_e \\ \lambda\Sigma_{ee}^{-1}\mu_e - \beta w^*_{a,sim} \end{pmatrix}$$

This assumes that the manager has expertise over all assets and currencies.



Note that the optimal hedge positions for currency depend on the optimal asset positions, which are themselves affected by the presence of currencies in the portfolio. Also, the hedge positions have a speculative component driven by non-zero expected returns in currencies as well as a variance reduction component related to beta.

$$w^*_{e,sim} = \lambda\Sigma_{ee}^{-1}\mu_e - \beta w^*_{a,sim}$$



If currencies carry a positive risk premium (the currency return is, on average, greater than the interest rate differential), currencies will be included in the optimal portfolio because the first term will be positive.

Instead, let’s focus on the case (often assumed in practice) that currencies do not offer a significant risk premium. In this case, the solution becomes

$$w^*_{a,sim} = \lambda\Sigma_{a|e}^{-1}\mu_a, \qquad w^*_{e,sim} = -\beta w^*_{a,sim}$$



Suppose now that local asset returns are also uncorrelated

with currency returns. In that case, taking on currency

risk does not help to reduce (or hedge) asset risk and

currency risk would always be an add-on to asset risk.

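If local returns are not correlated with currency movements, the covariance between currency returns and foreign asset returns in home currency units contains solely the covariance between currencies.

$$\mathrm{Cov}\left(\frac{\Delta(p_i s_i)}{p_i s_i}, \frac{\Delta s_j}{s_j}\right) = \mathrm{Cov}\left(\frac{\Delta p_i}{p_i}, \frac{\Delta s_j}{s_j}\right) + \mathrm{Cov}\left(\frac{\Delta s_i}{s_i}, \frac{\Delta s_j}{s_j}\right) = \mathrm{Cov}\left(\frac{\Delta s_i}{s_i}, \frac{\Delta s_j}{s_j}\right)$$

Which in matrix terms becomes

$$\Sigma_{ea} = \Sigma_{ee}$$

or

$$\beta = \Sigma_{ee}^{-1}\Sigma_{ea} = \Sigma_{ee}^{-1}\Sigma_{ee} = I$$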



So the currency positions will completely hedge out the currency risk that arises from the unhedged asset positions (unitary hedging):

$$w^*_{a,sim} = \lambda\Sigma_{a|e}^{-1}\mu_a, \qquad w^*_{e,sim} = -w^*_{a,sim}$$



Now, suppose the opposite: that foreign asset returns (in home country currency) and currency returns are not correlated. Now we would have $\Sigma_{ea} = 0$ and $\beta = \Sigma_{ee}^{-1}\Sigma_{ea} = 0$, so our solution would be

$$w^*_{a,sim} = \lambda\Sigma_{aa}^{-1}\mu_a, \qquad w^*_{e,sim} = 0$$

Since the covariance of asset returns conditioned on currency returns would be

$$\Sigma_{a|e} = \Sigma_{aa} - \beta'\Sigma_{ee}\beta = \Sigma_{aa}$$



To summarize:

1. If currencies carry a risk premium, there will always be a

speculative aspect to currency exposure.

2. If currencies do not have a risk premium, we need to look at

currency exposure in terms of its ability to reduce asset risk:

a. Zero correlation between local returns and currency returns means

currencies add risk without return or diversification benefits.

b. Negative correlation between local returns and currency returns

makes currencies a hedge asset that reduces total portfolio risk.

c. Positive correlation between local returns and currency returns

would increase total portfolio risk. In that case, over-hedging

(short position in currency is greater than the long position in the

asset) is optimal.




Now consider the second approach, where we optimize asset positions in a first step and in a second step choose optimal currency positions conditional on the already established asset positions. This is known as partial optimization and the solution is

$$w^*_{par} = \begin{pmatrix} w^*_{a,par} \\ w^*_{e,par} \end{pmatrix} = \begin{pmatrix} \lambda\Sigma_{aa}^{-1}\mu_a \\ \lambda\Sigma_{ee}^{-1}\mu_e - \beta w^*_{a,par} \end{pmatrix}$$

Terms representing the conditional covariance drop out and there is no feedback of currency positions on asset positions. Total risk is controlled but currencies are managed independently.



The final option for constructing portfolios with currencies is simply separate optimization (also known as currency overlay)

$$w^*_{sep} = \begin{pmatrix} w^*_{a,sep} \\ w^*_{e,sep} \end{pmatrix} = \begin{pmatrix} \lambda\Sigma_{aa}^{-1}\mu_a \\ \lambda\Sigma_{ee}^{-1}\mu_e \end{pmatrix}$$

In this case currencies are completely independent and should be measured against their own benchmark.



I hope, by now, that it is obvious to you that these different

techniques are in decreasing order of efficiency (in other

words, decreasing utility).

Moreover, it should also be obvious that currencies are just a

proxy for any investible asset that you want as part of your

portfolio (hedge funds; foreign equity; private equity; real

estate; etc.). These three techniques can always be used

(and commonly are), but they are always in decreasing

efficiency.
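The ranking of the three approaches can be checked numerically. A minimal sketch assuming NumPy, two assets and two currencies, and invented inputs; utility is measured with the mean-variance objective $w'\mu - w'\Sigma w/(2\lambda)$, and the function names are illustrative.

```python
import numpy as np

def utility(w, mu, sigma, lam):
    """U = w'mu - w'Sigma w / (2 lam)."""
    return w @ mu - (w @ sigma @ w) / (2.0 * lam)

def currency_solutions(mu_a, mu_e, s_aa, s_ae, s_ee, lam):
    """Simultaneous, partial and separate solutions as stacked (asset, currency) vectors."""
    beta = np.linalg.solve(s_ee, s_ae.T)             # Sigma_ee^{-1} Sigma_ea
    s_cond = s_aa - beta.T @ s_ee @ beta             # Sigma_{a|e}
    we_spec = lam * np.linalg.solve(s_ee, mu_e)      # speculative currency demand
    wa_sim = lam * np.linalg.solve(s_cond, mu_a - beta.T @ mu_e)
    wa_par = lam * np.linalg.solve(s_aa, mu_a)
    return {
        "sim": np.concatenate([wa_sim, we_spec - beta @ wa_sim]),
        "par": np.concatenate([wa_par, we_spec - beta @ wa_par]),
        "sep": np.concatenate([wa_par, we_spec]),
    }

# Hypothetical inputs: first two entries are assets, last two are currencies.
vol = np.array([0.15, 0.18, 0.10, 0.12])
corr = np.array([[1.0, 0.4, 0.3, 0.2],
                 [0.4, 1.0, 0.2, 0.3],
                 [0.3, 0.2, 1.0, 0.4],
                 [0.2, 0.3, 0.4, 1.0]])
sigma = np.outer(vol, vol) * corr
mu = np.array([0.05, 0.06, 0.00, 0.01])
lam = 0.5

sols = currency_solutions(mu[:2], mu[2:], sigma[:2, :2], sigma[:2, 2:], sigma[2:, 2:], lam)
utils = {name: utility(w, mu, sigma, lam) for name, w in sols.items()}
for name in ("sim", "par", "sep"):
    print(name, sols[name].round(3), "utility:", round(utils[name], 5))
```

The simultaneous solution coincides with the full unconstrained optimum, the partial solution is the best currency position given fixed asset weights, and the separate solution ignores the cross-block covariance altogether, which is why utility can only fall along that sequence.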


Bibliography


Blundell and Ward, “Property Portfolio Allocation: A Multifactor Model”, Land Development Studies, 1987.

Chan and Hussey, “Marginal Contribution to the Sharpe Ratio”, Northwater Capital Management Inc., January 2009.

Chow, Jacquier, Kritzman, and Lowry, “Optimal Portfolios in Good Times and Bad”, Financial Analysts Journal, 1999.

Scholes and Williams, “Estimating Beta from Nonsynchronous Data”, Journal of Financial Economics, 1977.

Stevens, “On the Inverse of the Covariance Matrix in Portfolio Analysis”, Journal of Finance, 1998.

Bibliography


Campbell, Lo, and MacKinlay, The Econometrics of

Financial Markets, Princeton University Press, 1997.

Jorion, “Mean Variance Analysis of Currency Overlays”, Financial Analysts Journal, 1994.

Risk Revisited


So far we have often relied on an assumption (or

presumption) of normal returns. But we know that asset

returns are not normal and, therefore, the mean and

variance do not fully describe the characteristics of the

joint asset return distribution. Specifically, the risk and

the undesirable outcomes associated with the portfolio

cannot be adequately captured by the variance.

Let’s spend a bit of time looking at alternative portfolio risk

measures that are sometimes used in practice.

Risk Revisited


Generally speaking, there are two different types of risk

measures:

1. Dispersion Measures: consider both positive and

negative deviations from the mean, and treat those

deviations as equally risky.

2. Downside Measures: maximize the probability that the

portfolio return is above a certain minimal acceptable

level known as the benchmark or disaster level.

Dispersion: Standard Deviation


Of course, the best known and most used dispersion

measure is (for historical reasons) the foundation of

modern portfolio theory – standard deviation

$$\sigma_p = (w'\Sigma w)^{1/2} = \left( \sum_i w_i^2\sigma_{ii} + \sum_i\sum_{j\ne i} w_i w_j\sigma_{ij} \right)^{1/2}$$

Dispersion: Mean-Absolute Deviation


The mean-absolute deviation or MAD approach doesn’t use squared deviations, but absolute deviations

$$\mathrm{MAD}(r_p) = E\left| \sum_i w_i r_i - \sum_i w_i\mu_i \right|$$

Where

$$r_p = \sum_i w_i r_i$$

And ri is the return on the asset and μi is the expected return on the asset.

Dispersion: Mean-Absolute Deviation


The computation of optimal portfolios under MAD is

straightforward since the optimization problem is linear

and can be solved with standard linear programming

routines.

Note that it can be shown that if individual asset returns are multivariate normal

$$\mathrm{MAD}(r_p) = \sqrt{\frac{2}{\pi}}\,\sigma_p$$

Dispersion: Mean-Absolute Moment


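The mean-absolute moment (MAMq) of order q is defined by

$$\mathrm{MAM}_q(r_p) = \left( E\left[ \left| \sum_i w_i r_i - \sum_i w_i\mu_i \right|^q \right] \right)^{1/q}, \quad q \ge 1$$

Or

$$\mathrm{MAM}_q(r_p) = \left( E\left[ \left| r_p - E(r_p) \right|^q \right] \right)^{1/q}, \quad q \ge 1$$

Which is a straightforward generalization of the mean-standard deviation (q=2) and the mean-absolute deviation (q=1) approaches.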

Downside Measures


Now let’s turn to downside measures, where the objective is to have a portfolio return above a certain minimum – a safety first approach.

While these types of measures may have significant intuitive and theoretical appeal, they are often computationally more complicated to use in a portfolio context.

Downside risk measures of individual assets cannot be easily integrated into portfolio downside risk measures since their computation requires knowledge of the entire joint distribution of asset returns.

You usually have to resort to computationally intense nonparametric estimation, simulation, and optimization techniques.

Moreover, the estimation error for downside measures is usually higher than that for mean-variance approaches since we only use a portion of the original data – often just the tail of the empirical distribution.

Downside: Roy’s Safety First


Roy’s paper on safety first (the foundation of downside risk measures) was published in the same year (1952) as Markowitz’s paper (the foundation of Modern Portfolio Theory).

Under MPT, the investor makes a trade off between risk and

return where the final portfolio allocation depends on the

investor’s utility function. As you know, it can be hard, or

even impossible, to determine the investor’s actual utility

function.



Roy argued that an investor, rather than thinking in terms of

utility, first wants to make sure that a certain amount of

the principal is preserved. Thereafter, the investor decides

on a minimal acceptable return that achieves this principal

preservation.

In essence, the investor solves

$$\min_w \Pr(r_p \le r_0) \quad\text{subject to}\quad w'\mathbf{1} = 1$$

Where Pr is the probability function and rp is the portfolio return.



Of course, it would be unlikely that the investor would know the true probability function, but if we recall that Tchebycheff’s inequality (for a random variable x, mean μ and variance σ²) states that for any positive real number c

$$\Pr(|x - \mu| > c) \le \frac{\sigma^2}{c^2}$$

Then we can write

$$\Pr(r_p \le r_0) = \Pr(\mu_p - r_p \ge \mu_p - r_0) \le \frac{\sigma_p^2}{(\mu_p - r_0)^2}$$



Therefore, not knowing the probability function, the investor solves the approximation

$$\min_w \frac{\sigma_p}{\mu_p - r_0} \quad\text{subject to}\quad w'\mathbf{1} = 1$$

Note that if r0 is equal to the risk-free rate, then this optimization problem is equivalent to maximizing a portfolio’s Sharpe ratio.
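With only the full-investment constraint, the approximate safety-first problem has the familiar tangency-type closed form $w \propto \Sigma^{-1}(\mu - r_0\mathbf{1})$, provided the normalising sum is positive. The sketch below assumes NumPy and invented inputs, and checks the solution against random fully invested perturbations; the function names are hypothetical.

```python
import numpy as np

def safety_first_weights(mu, sigma, r0):
    """Fully invested weights minimising sigma_p / (mu_p - r0).

    Closed form, valid when sum(Sigma^{-1} (mu - r0)) > 0 and there are
    no constraints other than the budget constraint.
    """
    raw = np.linalg.solve(sigma, mu - r0)
    return raw / raw.sum()

def roy_ratio(w, mu, sigma, r0):
    """Objective sigma_p / (mu_p - r0); smaller is safer."""
    return np.sqrt(w @ sigma @ w) / (w @ mu - r0)

# Hypothetical inputs: expected total returns and a disaster level of 2%.
mu = np.array([0.08, 0.06, 0.04])
vol = np.array([0.20, 0.12, 0.05])
corr = np.array([[1.0, 0.3, 0.1],
                 [0.3, 1.0, 0.2],
                 [0.1, 0.2, 1.0]])
sigma = np.outer(vol, vol) * corr
r0 = 0.02

w_sf = safety_first_weights(mu, sigma, r0)
best = roy_ratio(w_sf, mu, sigma, r0)
print("weights:", w_sf.round(3))
print("sigma_p / (mu_p - r0):", round(best, 4))
print("Chebyshev bound on Pr(r_p <= r0):", round(min(best**2, 1.0), 4))
```

The last line makes the looseness of the bound visible: it is a worst case over all distributions with the given mean and variance, so it is the ranking of portfolios, not the probability itself, that carries the information.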

Downside: Semi-variance


Even in his 1959 book, Markowitz proposed the use of

semi-variance to correct for the fact that variance

penalizes over-performance and under-performance

equally.

Portfolio semi-variance is

$$\sigma_{p,\min}^2 = E\left[\left(\min\left(\sum_i w_i r_i - \sum_i w_i \mu_i,\ 0\right)\right)^2\right]$$

Downside: Lower Partial Moment


The lower partial moment risk measure is a generalization of semi-variance. The lower partial moment with power index q and a target rate of return r_0 is given by

$$\sigma_{r_p,q,r_0} = \left(E\left[\,\left|\min(r_p - r_0,\ 0)\right|^q\,\right]\right)^{1/q}$$

If we set q=2 and r_0 equal to the expected return, we get the semi-variance (strictly, its square root, because of the 1/q power).

Note, it can be shown that q=1 represents a risk neutral investor, 0<q<1 a risk seeking investor and q>1 a risk-averse investor.
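A minimal sketch of the sample version of this measure, assuming equally likely historical returns (the return series below is made up for illustration). With q = 2 and the target set to the sample mean, the result squared is the sample semi-variance.

```python
def lower_partial_moment(returns, r0, q):
    """Sample lower partial moment (E[|min(r - r0, 0)|^q])^(1/q), equal weights."""
    shortfalls = [abs(min(r - r0, 0.0)) ** q for r in returns]
    return (sum(shortfalls) / len(shortfalls)) ** (1.0 / q)

# Hypothetical return observations
returns = [-0.04, 0.01, 0.03, -0.02, 0.07]
mean_return = sum(returns) / len(returns)

# q = 2 with the mean as target: the square of this value is the semi-variance
semi_deviation = lower_partial_moment(returns, mean_return, 2)
semi_variance = semi_deviation ** 2

# q = 1 with a zero target: the average shortfall below zero
expected_shortfall_below_zero = lower_partial_moment(returns, 0.0, 1)
print(semi_variance, expected_shortfall_below_zero)
```

Only observations below the target contribute, which is why far fewer data points drive the estimate than in a variance calculation.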

Downside: Value at Risk


The best known downside risk measure is probably value at

risk (VaR), originally developed by JP Morgan. VaR is

related to the percentiles of loss distributions, and

measures the predicted maximum loss at a specified

probability level (for example 95%).

VaR can be defined as

$$\mathrm{VaR}_{1-\varepsilon}(r_p) = \min\{\, r \mid \Pr(-r_p \ge r) \le \varepsilon \,\}$$

Typical values of (1-ε) are 90%, 95%, and 99%.



Note that there are several equivalent ways to define VaR. The definition

$$\mathrm{VaR}_{1-\varepsilon}(r_p) = \min\{\, r \mid \Pr(-r_p \ge r) \le \varepsilon \,\}$$

emphasizes that r is the value such that the probability of a loss greater than r is less than ε.

An alternative (and equivalent) way to define VaR,

$$\mathrm{VaR}_{1-\varepsilon}(r_p) = \min\{\, r \mid \Pr(-r_p \le r) \ge (1-\varepsilon) \,\}$$

emphasizes that r is the value such that the probability that the loss is at most r is at least (1-ε).



There are many well known problems with VaR:

1. The common assumption of lognormal returns is problematic

when you have long and short positions.

2. It is not sub-additive (in other words, the risk of two

combined portfolios may not be less than the sum of the risks

of each), which means that diversification does not generally

hold.

3. When calculated from generated scenarios, VaR is a non-smooth and non-convex function with multiple stationary points, which makes it difficult to find a global optimum.

4. It does not take into account the magnitude of losses beyond

the VaR value.

Downside: Conditional Value at Risk


The problems with value at risk led to the development of

desirable properties for a risk measure. Risk measures

which satisfy these properties are known as coherent risk

measures.

A risk measure ρ is called a coherent measure of risk if it

satisfies:

1. Monotonicity: if X ≥ 0, then ρ(X) ≤ 0.

2. Subadditivity: ρ(X+Y) ≤ ρ(X)+ ρ(Y).

3. Positive Homogeneity: for any positive real number c,

ρ(cX) = cρ(X).

4. Translational invariance: for any real number c,

ρ(X+c) = ρ(X)-c.



These properties can be interpreted:

1. If there are only positive returns, then the risk should be non-

positive.

2. The risk of a portfolio of two assets should be less than or equal to the sum of the risks of the individual assets.

3. If the portfolio is increased c times, the risk becomes c times

larger.

4. Cash or another risk-free asset does not contribute to

portfolio risk.

Note that standard deviation is not a coherent measure since it violates the

monotonicity property. Semi-deviation type measures violate the

subadditivity condition. The four properties together are quite restrictive.



Conditional value at risk is a coherent risk measure defined as:

$$\mathrm{CVaR}_{1-\varepsilon}(r_p) = E\left[-r_p \mid -r_p \ge \mathrm{VaR}_{1-\varepsilon}(r_p)\right]$$

CVaR measures the expected amount of losses in the tail of the distribution of possible portfolio losses (beyond the portfolio VaR).

This is also known as expected shortfall, expected tail loss, or tail VaR.



Let’s consider some of the mathematical properties of

CVaR.

Let w be the vector denoting the number of shares of each

asset and y be a random vector describing the uncertain

outcomes of the economy (or the market variables). The

function f(w,y) (the loss function) represents the loss

associated with the portfolio vector w (Note that for each

w, the loss function is a one-dimensional random

variable). Finally, p(y) is the probability associated with

scenario y.



Now, assuming all random variables are discrete, the probability that the loss function does not exceed a certain value γ is given by the cumulative probability

$$\Psi(w,\gamma) = \sum_{\{y \,\mid\, f(w,y) \le \gamma\}} p(y)$$

Using this cumulative probability, we can write

$$\mathrm{VaR}_{1-\varepsilon}(w) = \min\{\gamma \mid \Psi(w,\gamma) \ge (1-\varepsilon)\}$$



Since CVaR of the losses of portfolio w is the expected value of the losses conditioned on the losses being in excess of VaR, we have

$$\mathrm{CVaR}_{1-\varepsilon}(w) = E\left(f(w,y) \mid f(w,y) \ge \mathrm{VaR}_{1-\varepsilon}(w)\right) = \frac{\sum_{\{y \,\mid\, f(w,y) \ge \mathrm{VaR}_{1-\varepsilon}(w)\}} p(y)\, f(w,y)}{\sum_{\{y \,\mid\, f(w,y) \ge \mathrm{VaR}_{1-\varepsilon}(w)\}} p(y)}$$
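The discrete definitions can be sketched in a few lines. The function below takes, for a fixed portfolio w, the list of scenario losses f(w, y) and their probabilities p(y); both inputs are hypothetical numbers chosen for the example, and the tail is conditioned on losses at or above VaR.

```python
def var_cvar(losses, probs, eps):
    """VaR and CVaR at confidence level 1 - eps for a discrete loss distribution."""
    pairs = sorted(zip(losses, probs))
    cumulative = 0.0
    var = pairs[-1][0]
    for loss, p in pairs:
        cumulative += p
        # Smallest loss level whose cumulative probability reaches 1 - eps
        if cumulative >= 1.0 - eps - 1e-12:
            var = loss
            break
    # Probability-weighted average of the losses at or beyond VaR
    tail = [(loss, p) for loss, p in pairs if loss >= var]
    tail_prob = sum(p for _, p in tail)
    cvar = sum(loss * p for loss, p in tail) / tail_prob
    return var, cvar

# Hypothetical scenario losses for one portfolio and their probabilities
losses = [-0.02, 0.00, 0.01, 0.03, 0.10]
probs = [0.30, 0.30, 0.20, 0.15, 0.05]

var_90, cvar_90 = var_cvar(losses, probs, eps=0.10)
print(var_90, cvar_90)
```

In this example the cumulative probability first reaches 90% at a loss of 0.03, and the average loss at or beyond that level is larger, as expected for CVaR.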





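The continuous equivalents of these formulas are

$$\Psi(w,\gamma) = \int_{f(w,y) \le \gamma} p(y)\, dy$$

$$\mathrm{VaR}_{1-\varepsilon}(w) = \min\{\gamma \mid \Psi(w,\gamma) \ge (1-\varepsilon)\}$$

$$\mathrm{CVaR}_{1-\varepsilon}(w) = E\left(f(w,y) \mid f(w,y) \ge \mathrm{VaR}_{1-\varepsilon}(w)\right) = \frac{1}{\varepsilon} \int_{f(w,y) \ge \mathrm{VaR}_{1-\varepsilon}(w)} f(w,y)\, p(y)\, dy$$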



Moreover, we see that

$$\mathrm{CVaR}_{1-\varepsilon}(w) = \frac{1}{\varepsilon} \int_{f(w,y) \ge \mathrm{VaR}_{1-\varepsilon}(w)} f(w,y)\, p(y)\, dy \;\ge\; \frac{1}{\varepsilon} \int_{f(w,y) \ge \mathrm{VaR}_{1-\varepsilon}(w)} \mathrm{VaR}_{1-\varepsilon}(w)\, p(y)\, dy = \mathrm{VaR}_{1-\varepsilon}(w)$$

since

$$\frac{1}{\varepsilon} \int_{f(w,y) \ge \mathrm{VaR}_{1-\varepsilon}(w)} p(y)\, dy = 1$$

In other words, CVaR is always at least as large as VaR, but it is a coherent risk measure (and VaR is not). Further, when the loss f(w,y) is convex in w, CVaR is a convex function of w, so every local minimum is a global minimum.

Note, however, we have a problem in that you need to have an analytical expression for VaR; this problem was solved by Rockafellar and Uryasev (2000).



Their idea is that instead of CVaR we can use the function

$$F(w,\xi) = \xi + \frac{1}{\varepsilon} \int_{f(w,y) \ge \xi} \left(f(w,y) - \xi\right) p(y)\, dy$$

Rockafellar and Uryasev prove the following:

1. $F(w,\xi)$ is a convex and continuously differentiable function in ξ.

2. $\mathrm{VaR}_{1-\varepsilon}(w)$ is a minimizer of $F(w,\xi)$.

3. The minimum value of $F(w,\xi)$ is $\mathrm{CVaR}_{1-\varepsilon}(w)$.



So we can find the optimal value of $\mathrm{CVaR}_{1-\varepsilon}(w)$ by solving the optimization problem

$$\min_{w,\xi} F(w,\xi)$$

If we denote $(w^*,\xi^*)$ as the solution to this optimization problem, then $F(w^*,\xi^*)$ is the optimal CVaR. The optimal portfolio is given by $w^*$ and the corresponding VaR is given by $\xi^*$.

In other words, we can compute the optimal CVaR without first calculating VaR.



In practice, the probability density function p(y) is not known or is difficult to estimate. Instead, we might have T different scenarios Y={y1,…,yT} that are sampled from the probability distribution or that have been obtained from computer simulations. Evaluating the auxiliary function $F(w,\xi)$ using the scenarios Y, we obtain

$$F^{Y}(w,\xi) = \xi + \frac{1}{\varepsilon T} \sum_{i=1}^{T} \max\left(f(w,y_i) - \xi,\ 0\right)$$
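To see the scenario version of the auxiliary function at work, the sketch below holds the portfolio fixed and minimizes over the threshold (called `xi` here) only. The twenty scenario losses are invented for illustration. Because the function is piecewise linear and convex in the threshold, it is enough to evaluate it at the scenario losses themselves. With εT = 2, the minimum value equals the average of the two worst losses, and the minimizing threshold lies between the third-worst and second-worst loss.

```python
def auxiliary_f(losses, xi, eps):
    """Scenario auxiliary function: xi + 1/(eps*T) * sum(max(loss_i - xi, 0))."""
    T = len(losses)
    return xi + sum(max(loss - xi, 0.0) for loss in losses) / (eps * T)

# Hypothetical losses f(w, y_i) of one fixed portfolio over T = 20 equally likely scenarios
losses = [0.01 * k for k in (-5, -3, -2, -1, 0, 0, 1, 1, 2, 2,
                             3, 3, 4, 4, 5, 6, 7, 8, 10, 14)]
eps = 0.10

# Convex and piecewise linear in xi, so a minimizer exists among the scenario losses
xi_star = min(losses, key=lambda xi: auxiliary_f(losses, xi, eps))
cvar_estimate = auxiliary_f(losses, xi_star, eps)

worst_two = sorted(losses)[-2:]
print(xi_star, cvar_estimate, sum(worst_two) / 2)
```

No separate VaR calculation was needed to obtain the tail average, which is the point of the construction.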



Therefore the optimization problem

$$\min_{w} \mathrm{CVaR}_{1-\varepsilon}(w)$$

takes the form

$$\min_{w,\xi}\ \xi + \frac{1}{\varepsilon T} \sum_{i=1}^{T} \max\left(f(w,y_i) - \xi,\ 0\right)$$



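This can also be written

$$\min_{w,\xi,z}\ \xi + \frac{1}{\varepsilon T} \sum_{i=1}^{T} z_i$$

subject to

$$z_i \ge 0, \quad i = 1,\dots,T$$

$$z_i \ge f(w,y_i) - \xi, \quad i = 1,\dots,T$$

along with any other constraints (like short sales), where z_i is an auxiliary variable for $\max\left(f(w,y_i) - \xi,\ 0\right)$.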



Under the assumption that f(w,y) is linear in w, the above

optimization is linear and can be solved using standard

linear programming techniques.



This representation of CVaR can also be used to construct other portfolio optimization problems. For example, the mean-CVaR optimization problem

$$\max_{w}\ \mu'w$$

subject to

$$\mathrm{CVaR}_{1-\varepsilon}(w) \le c_0$$

along with other constraints on w written as

$$w \in C_w$$



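This results in the following

$$\max_{w}\ \mu'w$$

subject to

$$\xi + \frac{1}{\varepsilon T} \sum_{i=1}^{T} z_i \le c_0$$

$$z_i \ge 0, \quad i = 1,\dots,T$$

$$z_i \ge f(w,y_i) - \xi, \quad i = 1,\dots,T$$

$$w \in C_w$$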



Palmquist, Uryasev, and Krokhmal provide us with an

example of the mean-CVaR approach.

They considered two-week returns for all of the stocks in the

S&P 100 from July 1, 1997 to July 8, 1999 for scenario

generation. Optimal portfolios were constructed solving

the mean-CVaR optimization approach for a two-week

horizon at different levels of confidence.



Note risk is the percent of the portfolio allowed to be put at risk.



It can be shown that for a normally distributed loss function,

the mean-variance and mean-CVaR frameworks generate

the same efficient frontier. However, when distributions

are non-normal, these two approaches can be significantly

different.

M-V optimization relies on deviations on both sides of the

mean, while M-CVaR relies only on the part of the

distribution which contributes to high losses.



Bibliography


Artzner, Delbaen, Eber, and Heath, “Coherent Measures of Risk”, Mathematical Finance, 1999.

Grootveld and Hallerbach, “Variance Versus Downside Risk: Is There Really That Much Difference?”, European Journal of Operational Research, 1999.

Krokhmal, Palmquist, and Uryasev, “Portfolio Optimization with Conditional Value-At-Risk Objective and Constraints”, Journal of Risk, 2002.

Markowitz, “Portfolio Selection”, Journal of Finance, 1952.

Rockafellar and Uryasev, “Optimization of Conditional Value-At-Risk”, Journal of Risk, 2000.

Roy, “Safety-First and the Holding of Assets”, Econometrica, 1952.

Uryasev, “Conditional Value-At-Risk: Optimization Algorithms and Applications”, Financial Engineering News, 2000.


Asset Allocation

Allocation between asset classes accounts for the major

portion of risk and return in a portfolio

Selection of specific instruments is a decision with smaller

influence on portfolio performance

Asset allocation should consider all financial aspects:

Current and future wealth, income, and financial needs

Financial goals

Taxes and tax advantaged investments

Liquidity (for unexpected needs)

Investors (all types) need customized strategies



Typical Financial Advice for Individuals


Questionnaires to assess investor’s risk aversion

E*Trade, Charles Schwab, Fidelity, Financial Engines, etc.

Risk aversion of the investor typically assumed to be CRRA

Choose from standardized portfolios

Conservative (20% stocks)

Dynamic (40% stocks)

Aggressive (60% stocks)

Is this customized?



Recently, so called life-cycle funds have been popular

Fidelity Freedom 2020

Asset allocation is purely time-dependent

Rule of thumb percent stock = 100 – age

But these strategies do not depend on wealth, expected

performance, cash flow, etc.

Dynamic Asset Allocation


In real life investors change their asset allocation as time

goes by and new information is available

In theory investors value wealth at the end of the planning

horizon (and along the way) using a specific utility

function and maximize expected utility

Fixed-mix strategies are optimal only under certain

conditions

In general, the optimal investment strategy is dynamic and

reflects real-life behavior



After a stock market correction (with significant losses in

the stock portion of the portfolio) an investor would:





Rebalance back to the original allocation (constant RRA)






Buy more stocks and assume a larger stock allocation than in the

original portfolio (increasing RRA)







Do nothing and keep the new stock allocation or sell stocks to assume

a smaller stock allocation than in the original portfolio (decreasing

RRA)



Samuelson (1969)

Optimal program for investment/consumption in each period

Backward dynamic programming (maximize discounted expected

utility over lifetime)

No bequest

One risky asset (iid) and one riskless

Power utility

Optimal to invest the same proportion of wealth in stocks

in every period, independent of wealth

Merton (1969) extended this to multiple risky assets and a

variety of bequest situations



Conflict between theoreticians and practitioners

Samuelson’s and Merton’s result is that under their

assumptions about the market and under constant relative

risk aversion, the consumption and investment decisions

are independent of each other; the optimal investment

decision is invariant with respect to the investment

horizon and with respect to wealth.



This is the same as an investment problem where you

maximize the utility of final wealth at the end of the

investment horizon, by allocating and reallocating at each

period along the way.

The result follows directly from the utility function used.

Myopic investment strategy.



Mossin (1968) attempted to isolate the class of utility

functions of terminal wealth which result in myopic utility

for intermediate periods.

Log utility for general asset distributions

Power utility for serially independent asset distributions

If there is a riskless asset – all HARA (linear risk tolerance) utility

functions



Hakansson (1971) showed that, for HARA utility, no myopic strategy is optimal except in the complete absence of restrictions on borrowing and short sales, such as:

A percent margin requirement

An absolute limit on borrowing

Lending that must be repaid

Therefore, under those restrictions, only power and log

utility functions can lead to myopic policies; furthermore

if there is serial correlation only log utility produces

myopic policies



More recently, numerical dynamic portfolio optimization

methods have been developed

Two methods

Stochastic programming

Stochastic dynamic programming (stochastic control)

Stochastic Programming


Efficiently solves the most general models

Transaction costs

Return distributions with serial dependence

Lends itself well to the more general asset liability model (ALM)

Traditionally uses scenario trees to represent possible

future events

Need to keep the tree thin for computational tractability

In later stages a very small number of scenarios are used to represent

the distribution (very thin sub-trees)

Emphasis is on obtaining a good first-stage solution rather than an

entire accurate policy

Stochastic Dynamic Programming


Used when focus is on obtaining optimal policies and transaction costs are not a primary issue.

Based on Bellman’s dynamic programming principle: an optimal policy has the property that, whatever the initial action, the remaining choices constitute an optimal policy with respect to the subproblem starting at the state that results from the initial action.

Closed form solutions exist for HARA utility functions.

For general monotone increasing and concave utility functions there are no analytical solutions, but the problem can be solved numerically when the state space is small.

Curse of dimensionality

Dynamic Portfolio Choice


Let’s extend the single-period utility maximization problem

to a multi-period setting.

Let:

t = 0,…, T be discrete time periods with T the investment

horizon

Rt be the random vector of asset returns in time periods t

yt = (y1,…, yN)t be the amount of money invested in the

different asset classes i = 1,…, N at time t

Scalars W0 and st, t = 0,…, T-1, represent the initial wealth

and possible cash flows (positive and negative) over time





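We can then write:

$$
\begin{aligned}
\max\ & E\left[U(\iota' y_T)\right] \\
\text{s.t.}\ & \iota' y_0 = W_0 + s_0 \\
& -R_{t-1}' y_{t-1} + \iota' y_t = s_t, \quad t = 1,\dots,T \\
& y_t \ge 0, \quad W_0, s_0, \dots, s_{T-1} \text{ given}, \quad s_T = 0
\end{aligned}
$$

where ι is a vector of ones.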



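As an aside, note that with time-additive utility we could also write

$$
\begin{aligned}
\max\ & E\left[\sum_{t=1}^{T} \delta^{t}\, U(\iota' y_t)\right] \\
\text{s.t.}\ & \iota' y_0 = W_0 + s_0 \\
& -R_{t-1}' y_{t-1} + \iota' y_t = s_t, \quad t = 1,\dots,T \\
& y_t \ge 0, \quad W_0, s_0, \dots, s_{T-1} \text{ given}, \quad s_T = 0
\end{aligned}
$$

where δ represents the discount factor.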



Back to our problem, defining x_t (for t = 0,…,T-1) as the vector of fractions invested in each asset class in each period, we write

$$x_t = \frac{y_t}{W_t + s_t}$$

where W_t is the wealth available each period before adding or deducting cash:

$$W_t = R_{t-1}' x_{t-1} \left(W_{t-1} + s_{t-1}\right)$$



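We can then write:

$$
\begin{aligned}
\max\ & E\left[U(W_T)\right] \\
\text{s.t.}\ & \iota' x_t = 1, \quad t = 0,\dots,T-1 \\
& W_{t+1} = R_t' x_t \left(W_t + s_t\right), \quad t = 0,\dots,T-1 \\
& y_t \ge 0, \quad W_0, s_0, \dots, s_{T-1} \text{ given}, \quad s_T = 0
\end{aligned}
$$

Here we can see that for serially independent asset returns, wealth is the single state variable connecting one period with the next.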



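Now we can write the problem as a dynamic programming recursion

$$
\begin{aligned}
U_t(W_t) = \max_{x_t}\ & E\left[U_{t+1}\left((W_t + s_t)\, R_t' x_t\right)\right] \\
\text{s.t.}\ & \iota' x_t = 1 \\
& A x_t = b \\
& l \le x_t \le u
\end{aligned}
$$

where $U_T(W_T) = U(W_T)$, $W_{t+1} = R_t' x_t (W_t + s_t)$, and $W_0, s_0, \dots, s_{T-1}$ are given with $s_T = 0$.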



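In practice, we need to resort to Monte Carlo simulation to estimate the expected utility of the single-period utility maximizing problem in each period.

Let $R_t^{\omega},\ \omega \in S_t,\ t = 1,\dots,T-1$, be samples of return distributions for each period t. We can represent the problem as:

$$
\begin{aligned}
\hat{U}_t(W_t) = \max_{x_t}\ & \frac{1}{|S_t|} \sum_{\omega \in S_t} \hat{U}_{t+1}\left((W_t + s_t)\, R_t^{\omega\prime} x_t\right) \\
\text{s.t.}\ & \iota' x_t = 1 \\
& A x_t = b, \quad l \le x_t \le u
\end{aligned}
$$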



Now the dynamic optimization problem can be solved using a backward dynamic programming recursion, conditioning on wealth.

Starting at T-1, parameterize wealth into K discrete levels $W_{T-1}^{k},\ k = 1,\dots,K$, and solve the T-1 problem K times using sample $S_{T-1}$, obtaining solutions $\hat{x}_{T-1}^{k}$.

We then use those solutions to obtain the T-2 solutions and continue “backward”. In period 0, the initial wealth is known and we conduct the final optimization using the period 1 value function.

In each period in the backward recursion, use a new sample generated from Monte Carlo.
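The recursion can be sketched for a deliberately small case. Everything specific below is an assumption for illustration: one risky asset plus a riskless asset, normally distributed risky returns, power utility with risk aversion 3, no cash flows, a grid search over the stock fraction in place of a proper optimizer, and linear interpolation (clamped at the grid ends) of the value function between wealth levels.

```python
import random

def crra_utility(wealth, gamma=3.0):
    """Power utility of terminal wealth (gamma is an illustrative choice)."""
    return wealth ** (1.0 - gamma) / (1.0 - gamma)

def interpolate(grid, values, w):
    """Piecewise-linear interpolation of a value function, clamped at the grid ends."""
    if w <= grid[0]:
        return values[0]
    if w >= grid[-1]:
        return values[-1]
    for k in range(1, len(grid)):
        if w <= grid[k]:
            share = (w - grid[k - 1]) / (grid[k] - grid[k - 1])
            return values[k - 1] + share * (values[k] - values[k - 1])

def solve_period(wealth_levels, sample, next_value, cash_flow, rf, fractions):
    """For each wealth level, choose the stock fraction with the best sample-average value."""
    policy, values = [], []
    for w in wealth_levels:
        best_x, best_v = fractions[0], -float("inf")
        for x in fractions:
            v = sum(next_value((w + cash_flow) * (1.0 + rf + x * (r - rf)))
                    for r in sample) / len(sample)
            if v > best_v:
                best_x, best_v = x, v
        policy.append(best_x)
        values.append(best_v)
    return policy, values

def backward_recursion(horizon, w0, wealth_levels, fractions, rf=0.03, n_samples=200, seed=0):
    """Backward pass from T-1 to 0; period 0 is solved only at the known initial wealth."""
    rng = random.Random(seed)
    next_value = crra_utility
    policies = {}
    for t in range(horizon - 1, -1, -1):
        # A new return sample in every period; returns are floored to keep wealth positive
        sample = [max(rng.gauss(0.08, 0.16), -0.9) for _ in range(n_samples)]
        levels = wealth_levels if t > 0 else [w0]
        policy, values = solve_period(levels, sample, next_value, 0.0, rf, fractions)
        policies[t] = policy
        if t > 0:
            next_value = lambda w, g=levels, v=values: interpolate(g, v, w)
    return policies

wealth_grid = [0.5 + 0.25 * k for k in range(11)]
stock_fractions = [i / 20 for i in range(21)]
policies = backward_recursion(3, 1.0, wealth_grid, stock_fractions)
print(policies[2])
```

With power utility and no cash flows, the last-period stock fraction comes out the same at every wealth level, which is the myopic result discussed earlier; other utility functions or non-zero cash flows would generally make the policy depend on wealth.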

Practical Utility


Represent utility as a piecewise exponential function with K pieces, where each piece i represents a certain absolute risk aversion γ_i, i = 1,…, K.

Let $\hat{W}_i,\ i = 1,\dots,K$, be discrete wealth levels representing the borders of each piece i, such that below $\hat{W}_i$ the risk aversion is γ_i and above $\hat{W}_i$ (until $\hat{W}_{i+1}$) the risk aversion is γ_{i+1} for all i = 1,…, K.

For each piece i represent utility by an exponential function

$$U_i(W_i) = a_i - b_i e^{-\gamma_i W_i}$$

With a first derivative with respect to wealth

$$\frac{\partial U_i(W_i)}{\partial W_i} = \gamma_i b_i e^{-\gamma_i W_i}$$

The γ_i are chosen to represent the desired function of risk aversion versus wealth.

The coefficients of the exponential functions for each piece i are found by matching both the function values and the first derivatives at the intersections $\hat{W}_i$. In other words, we fit a spline function.

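Thus at each wealth level $\hat{W}_i$, representing the border between risk aversion γ_i and γ_{i+1}, we have the following two equations

$$a_i - b_i e^{-\gamma_i \hat{W}_i} = a_{i+1} - b_{i+1} e^{-\gamma_{i+1} \hat{W}_i}$$

$$\gamma_i b_i e^{-\gamma_i \hat{W}_i} = \gamma_{i+1} b_{i+1} e^{-\gamma_{i+1} \hat{W}_i}$$

From which we calculate the coefficients (setting a_1 = 0 and b_1 = 1)

$$b_{i+1} = b_i \frac{\gamma_i}{\gamma_{i+1}} e^{(\gamma_{i+1} - \gamma_i) \hat{W}_i}$$

$$a_{i+1} = a_i - b_i \left(1 - \frac{\gamma_i}{\gamma_{i+1}}\right) e^{-\gamma_i \hat{W}_i}$$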
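The coefficient recursion is easy to check numerically. The sketch below assumes a list of border wealth levels and one more risk-aversion value than borders (one per piece); the particular numbers are made up. It builds the coefficients from a_1 = 0 and b_1 = 1 and then evaluates adjacent pieces at each border to confirm that values and slopes agree.

```python
import math

def fit_piecewise_exponential(gammas, borders):
    """Coefficients (a_i, b_i) of U_i(W) = a_i - b_i*exp(-gamma_i*W), matched at each border."""
    a, b = [0.0], [1.0]  # a_1 = 0 and b_1 = 1
    for i, w_hat in enumerate(borders):
        g, g_next = gammas[i], gammas[i + 1]
        b_next = b[i] * (g / g_next) * math.exp((g_next - g) * w_hat)
        a_next = a[i] - b[i] * (1.0 - g / g_next) * math.exp(-g * w_hat)
        a.append(a_next)
        b.append(b_next)
    return a, b

def utility(i, w, a, b, gammas):
    """Value of piece i at wealth w."""
    return a[i] - b[i] * math.exp(-gammas[i] * w)

def marginal_utility(i, w, b, gammas):
    """First derivative of piece i at wealth w."""
    return gammas[i] * b[i] * math.exp(-gammas[i] * w)

# Hypothetical risk aversions for three pieces and the two borders between them
gammas = [2.0, 2.5, 3.5]
borders = [0.5, 1.0]
a, b = fit_piecewise_exponential(gammas, borders)

for i, w_hat in enumerate(borders):
    value_gap = utility(i, w_hat, a, b, gammas) - utility(i + 1, w_hat, a, b, gammas)
    slope_gap = marginal_utility(i, w_hat, b, gammas) - marginal_utility(i + 1, w_hat, b, gammas)
    print(w_hat, value_gap, slope_gap)
```

Both gaps are zero up to rounding, so the pieces join smoothly while the absolute risk aversion steps from one level to the next.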



Example 1


Current wealth $100,000

Cash contributions (savings) of $15,000 per year

20 year investment horizon

US Stocks, International Stocks, Corporate Bonds,

Government Bonds, and Cash

        US Stocks   Int Stocks   Corp Bonds   Gvt Bonds   Cash
Mean      10.80       10.37         9.49        7.90      5.61
Std       15.72       16.75         6.57        4.89      0.70

Four utility functions

A: Exponential, absolute risk aversion = 2

B: Increasing relative risk aversion and decreasing absolute risk aversion
2.0 @ W of $0.25M and below, increasing to 3.5 @ W of $3.5M and above

C: Decreasing relative risk aversion and decreasing absolute risk aversion
8.0 @ W of $1.0M and below, decreasing to 1.01 @ W of $1.5M and above

D: Quadratic (downside)
Quadratic with linear penalty of 1000 for underperforming $1.0M

Recall from Lecture 2


Utility          CEW     Mean    Std     99%     95%
Exponential      1.412   1.564   0.424   0.770   0.943
Increasing RRA   1.440   1.575   0.452   0.771   0.937
Decreasing RRA   1.339   1.498   0.436   0.865   0.998
Quadratic        0.982   1.339   0.347   0.911   1.006

[Figure: four panels titled Exponential, Increasing RRA, Decreasing RRA, and Quadratic]

[Figure: pie charts of portfolio allocation by utility function. Chart values as extracted, in percent; the assignment of values to asset classes is assumed to follow the legend order.]

Utility          US Stock   Int Stock   Corp Bonds   Gvmt Bonds   Cash
Exponential        57.4       16.9        25.7          0           0
Increasing RRA     34         13.7        52.3          0           0
Decreasing RRA     10.6       10          67.2         12.2         0
Quadratic          53.2       16.4        30.4          0           0

[Figures: strategy plots for each utility function (Exponential, Increasing RRA, Decreasing RRA, Quadratic); the surviving panel labels are “1 to go”, “10 to go”, and “19 to go”]

Example 2


Now compare these dynamic strategies with six fixed-mix strategies:

US stocks only

Cash only

All asset classes equally weighted

Risk averse (conservative)

Medium risk (dynamic)

Risk prone (aggressive)

With the exception of equally weighted asset classes, all

strategies are the solution of the single period Markowitz

optimization.

Strategy           Mean    Std     99%     95%
US stocks          1.825   1.065   0.469   0.660
Cash               0.868   0.019   0.822   0.834
Equally weighted   1.349   0.301   0.799   0.920
Risk Averse        1.098   0.110   0.869   0.930
Medium Risk        1.538   0.407   0.825   0.975
Risk Prone         1.663   0.639   0.677   0.852

Example 2 CEW Improvement

Strategy       Exponential   Increasing RRA   Decreasing RRA   Quadratic
US stocks         9.61%           7.17%           96.12%         12.06%
Cash             62.79%          66.04%           56.08%         13.36%
Equally wtd      11.10%          12.30%           14.56%          2.03%
Risk averse      29.93%          32.42%           27.45%          1.03%
Medium risk       0.55%           0.76%            0.62%          1.19%
Risk Prone        1.63%           0.44%           23.72%          4.81%

Bibliography


Hakansson, “On Myopic Portfolio Policies, With and Without Serial Correlation of Yields”, Journal of Business, 1971.

Infanger, “Dynamic Asset Allocation Strategies Using a Stochastic Dynamic Programming Approach”, in Handbook of Asset and Liability Management, Volume 1, Zenios and Ziemba eds., 2006.

Merton, “Lifetime Portfolio Selection Under Uncertainty: the Continuous-time Case”, Review of Economics and Statistics, 1969.

Mossin, “Optimal Multiperiod Portfolio Policies”, Journal of Business, 1968.

Samuelson, “Lifetime Portfolio Selection by Dynamic Stochastic Programming”, Review of Economics and Statistics, 1969.


Characteristic Portfolios


Consider a single period problem with no rebalancing within

the period with the underlying assumptions:

There is a riskless asset

All first and second moments exist

It is not possible to build a fully invested portfolio that has zero

risk

The expected excess return on the fully invested portfolio with

minimum risk is positive.



Define a vector of asset attributes or characteristics (these could be betas, expected returns, earnings-to-price ratios, capitalization, membership in an economic sector, etc.)

$$a = (a_1, a_2, \dots, a_N)'$$

The exposure of portfolio $w_P$ to the attribute is $a'w_P$.


4/14/2009Richard R. Lindsey494

The characteristic portfolio uniquely captures the defining

attribute.

The characteristic portfolio machinery connects attributes and portfolios, and lets us identify a portfolio’s exposure to an attribute in terms of its covariance with the characteristic portfolio.

The process works both ways: we can start with a portfolio and find the attribute that the portfolio expresses most effectively.



Proposition 1

1. For any non-zero attribute a there is a unique portfolio $w_a$ that has minimum risk and unit exposure to the attribute. With Σ the covariance matrix of asset returns, the weights of the characteristic portfolio are:

$$w_a = \frac{\Sigma^{-1} a}{a'\Sigma^{-1} a}$$

Characteristic portfolios are not necessarily fully invested; they can have long and short positions, and may have significant leverage.



2. The variance of the characteristic portfolio $w_a$ is given by:

$$\sigma_a^2 = w_a'\Sigma w_a = \frac{1}{a'\Sigma^{-1} a}$$

3. The beta of all assets with respect to the characteristic portfolio $w_a$ is equal to a:

$$a = \frac{\Sigma w_a}{\sigma_a^2}$$



4. Consider two attributes a and d with characteristic portfolios $w_a$ and $w_d$. Let $a_d$ and $d_a$ be, respectively, the exposure of portfolio $w_d$ to characteristic a and the exposure of portfolio $w_a$ to characteristic d. The covariance of the characteristic portfolios satisfies

$$\sigma_{a,d} = a_d\, \sigma_a^2 = d_a\, \sigma_d^2$$



5. If κ is a positive scalar, then the characteristic portfolio of κa is $w_a/\kappa$. Because characteristic portfolios have unit exposure to the attribute, if we multiply the attribute by κ we will need to divide the characteristic portfolio by κ to preserve unit exposure.



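6. If characteristic a is a weighted combination of characteristics d and f, then the characteristic portfolio of a is a weighted combination of the characteristic portfolios of d and f; in particular, if

$$a = \kappa_d d + \kappa_f f$$

then

$$w_a = \frac{\kappa_d \sigma_a^2}{\sigma_d^2} w_d + \frac{\kappa_f \sigma_a^2}{\sigma_f^2} w_f$$

where

$$\frac{1}{\sigma_a^2} = \frac{\kappa_d a_d}{\sigma_d^2} + \frac{\kappa_f a_f}{\sigma_f^2}$$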



Proof

The holdings of the characteristic portfolio can be determined by solving for the portfolio with minimum risk given the constraint that the exposure to characteristic a equals 1.

$$\min_{w}\ w'\Sigma w \quad \text{s.t.} \quad w'a = 1$$

The first order conditions (absorbing the factor of 2 into the multiplier) are

$$w'a = 1$$

$$\Sigma w - \theta a = 0$$

where θ is the Lagrange multiplier.



The results are

$$\theta = \frac{1}{a'\Sigma^{-1} a}$$

and

$$w_a = \frac{\Sigma^{-1} a}{a'\Sigma^{-1} a}$$

which proves item 1. Item 2 can be verified using $w_a$ and the definition of portfolio variance. Item 3 can be verified using the definition of beta with respect to portfolio P as

$$\beta = \frac{\Sigma w_P}{\sigma_P^2}$$



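For item 4, note

$$\sigma_{a,d} = w_a'\Sigma w_d = \left\{\frac{a'\Sigma^{-1}}{a'\Sigma^{-1} a}\right\} \Sigma w_d = \frac{a'w_d}{a'\Sigma^{-1} a} = a_d\, \sigma_a^2$$

and

$$\sigma_{a,d} = w_a'\Sigma w_d = w_a'\Sigma \left\{\frac{\Sigma^{-1} d}{d'\Sigma^{-1} d}\right\} = \frac{w_a'd}{d'\Sigma^{-1} d} = d_a\, \sigma_d^2$$

Items 5 and 6 are straightforward.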
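Items 1 through 4 of Proposition 1 can be checked numerically. The sketch below uses NumPy with a made-up three-asset covariance matrix and two made-up attribute vectors; none of these numbers come from the lecture.

```python
import numpy as np

def characteristic_portfolio(cov, a):
    """Minimum-variance holdings with unit exposure to attribute a."""
    x = np.linalg.solve(cov, a)   # cov^{-1} a without forming the inverse
    return x / (a @ x)

# Hypothetical covariance matrix and attributes
cov = np.array([[0.0400, 0.0060, 0.0020],
                [0.0060, 0.0900, 0.0120],
                [0.0020, 0.0120, 0.0225]])
a = np.array([1.0, 1.0, 1.0])   # e.g. the "fully invested" attribute
d = np.array([0.8, 1.3, 1.0])   # e.g. a vector of betas

w_a = characteristic_portfolio(cov, a)
w_d = characteristic_portfolio(cov, d)
var_a = w_a @ cov @ w_a
var_d = w_d @ cov @ w_d

exposure = a @ w_a                  # item 1: unit exposure
asset_betas = cov @ w_a / var_a     # item 3: betas with respect to w_a equal a
covariance = w_a @ cov @ w_d        # item 4: equals a_d*var_a and d_a*var_d
print(exposure, asset_betas, covariance, (a @ w_d) * var_a, (d @ w_a) * var_d)
```

Solving the linear system rather than inverting the covariance matrix is a numerical convenience; the result is the same portfolio as in the closed-form expression.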



Example 1:

Suppose $\iota = (1, 1, \dots, 1)'$ is the attribute. Every portfolio’s exposure to ι measures the extent of its investment; if $\iota'w_P = 1$ then the portfolio is fully invested.

Portfolio C, the characteristic portfolio for attribute ι, is the minimum-risk fully invested portfolio:



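$$w_C = \frac{\Sigma^{-1}\iota}{\iota'\Sigma^{-1}\iota}$$

$$\sigma_C^2 = w_C'\Sigma w_C = \frac{1}{\iota'\Sigma^{-1}\iota}$$

$$\iota = \frac{\Sigma w_C}{\sigma_C^2}$$

Note every asset has a beta of 1 with respect to this portfolio, and the covariance of any fully invested portfolio with C is $\sigma_C^2$.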



Example 2

Suppose beta is the attribute, where beta is defined by some benchmark portfolio B

$$\beta = \frac{\Sigma w_B}{\sigma_B^2}$$

Then the benchmark is the characteristic portfolio of beta



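$$w_B = \frac{\Sigma^{-1}\beta}{\beta'\Sigma^{-1}\beta}$$

$$\sigma_B^2 = w_B'\Sigma w_B = \frac{1}{\beta'\Sigma^{-1}\beta}$$

So the benchmark is the minimum-risk portfolio with a beta of 1.

Note that the relationship between portfolios C and B is

$$\sigma_{B,C} = \iota_B\, \sigma_C^2 = \beta_C\, \sigma_B^2$$

where $\iota_B$ is the benchmark’s exposure to ι and $\beta_C$ is the beta of portfolio C.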



Proposition 2

Let q be the characteristic portfolio of the characteristic f (expected excess returns)

$$w_q = \frac{\Sigma^{-1} f}{f'\Sigma^{-1} f}$$

Then

a. The Sharpe ratio is

$$SR_q = \max\{SR_P \mid P\} = \left(f'\Sigma^{-1} f\right)^{1/2}$$



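b.

$$f_q = f'w_q = 1, \qquad \sigma_q^2 = \frac{1}{f'\Sigma^{-1} f}$$

c.

$$f = \frac{\Sigma w_q}{\sigma_q^2} = \frac{\Sigma w_q}{\sigma_q}\, SR_q$$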



d. If $\rho_{Pq}$ is the correlation between portfolios P and q, then

$$SR_P = \rho_{Pq}\, SR_q$$

e. The fraction of q invested in risky assets is given by

$$\iota_q = \frac{f_C\, \sigma_q^2}{\sigma_C^2}$$



Proof

For any portfolio $w_P$, the Sharpe ratio is $SR_P = f_P/\sigma_P$. For any positive constant κ, the portfolio with holdings $\kappa w_P$ will also have a Sharpe ratio equal to $SR_P$. Thus, to find the maximum Sharpe ratio, we can set the expected excess return to 1 and minimize risk. We can then minimize $w'\Sigma w$ subject to the constraint that $w'f = 1$. This is just the problem we solved to get $w_q$, the characteristic portfolio of f.

Items b and c are properties of the characteristic portfolio.



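For d, we use c:

$$f_P = f'w_P = \frac{w_P'\Sigma w_q}{\sigma_q}\, SR_q$$

$$SR_P = \frac{f_P}{\sigma_P} = \frac{w_P'\Sigma w_q}{\sigma_P \sigma_q}\, SR_q = \rho_{Pq}\, SR_q$$

And e follows from Proposition 1, item 4.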
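A short numerical check of items a and d, using NumPy. The covariance matrix, expected excess returns, and the comparison portfolio are invented for the example.

```python
import numpy as np

# Hypothetical covariance matrix and expected excess returns
cov = np.array([[0.0400, 0.0060, 0.0020],
                [0.0060, 0.0900, 0.0120],
                [0.0020, 0.0120, 0.0225]])
f = np.array([0.05, 0.08, 0.03])

def sharpe(w):
    """Expected excess return divided by standard deviation."""
    return (f @ w) / np.sqrt(w @ cov @ w)

# Characteristic portfolio of f
x = np.linalg.solve(cov, f)
w_q = x / (f @ x)
sr_q = sharpe(w_q)
max_sharpe = np.sqrt(f @ x)   # (f' cov^{-1} f)^(1/2)

# Any other portfolio P: its Sharpe ratio is its correlation with q times SR_q
w_p = np.array([0.5, 0.2, 0.3])
rho_pq = (w_p @ cov @ w_q) / np.sqrt((w_p @ cov @ w_p) * (w_q @ cov @ w_q))
sr_p = sharpe(w_p)
print(sr_q, max_sharpe, sr_p, rho_pq * sr_q)
```

Since a correlation cannot exceed 1, the last identity also shows why no portfolio can beat the Sharpe ratio of q.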



Proposition 3

Assume $f_C > 0$.

1. Portfolio q is net long:

$$\iota_q > 0$$

Let portfolio Q be the characteristic portfolio of $\iota_q f$. Portfolio Q is fully invested with holdings

$$w_Q = \frac{w_q}{\iota_q}$$

In addition $SR_Q = SR_q$, and for any portfolio P with a correlation $\rho_{PQ}$ with portfolio Q, we have

$$SR_P = \rho_{PQ}\, SR_Q$$



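2.

$$\frac{f_C}{\sigma_C^2} = \frac{f_Q}{\sigma_Q^2}$$

$$f = f_Q\, \beta_{\text{wrt } Q} = f_Q\, \frac{\Sigma w_Q}{\sigma_Q^2}$$

Note that this specifies exactly how Portfolio Q “explains” expected returns.

3.

$$\beta_Q = \frac{f_B\, \sigma_Q^2}{f_Q\, \sigma_B^2}$$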



4. If the benchmark is fully invested, $\iota_B = 1$, then

$$\beta_Q = \frac{\beta_C\, f_B}{f_C}$$

f



Portfolio A (characteristic portfolio for alpha)

Define alpha as $\alpha = f - \beta f_B$. Let $w_A$ be the characteristic portfolio for alpha, the minimum risk portfolio with alpha of 100% (note that this portfolio will have significant leverage). According to Proposition 1, item 6, we can express $w_A$ in terms of $w_B$ and $w_q$. From item 4, we see that the relationship between alpha and beta is

$$\sigma_{B,A} = \alpha_B\, \sigma_A^2 = \beta_A\, \sigma_B^2$$

However, $\alpha_B = 0$ by construction, so portfolios A and B are uncorrelated and $\beta_A = 0$.

Characteristic Portfolio of Alpha


Consider the characteristic portfolio for alpha where

$$\alpha = (\alpha_1, \alpha_2, \dots, \alpha_N)'$$

is the vector of forecasted expected residual returns, where the residual is relative to the benchmark portfolio. Since the alphas are forecasts of residual return, both the benchmark and the riskless asset have alphas of zero.

The portfolio weights are

$$w_A = \frac{\Sigma^{-1}\alpha}{\alpha'\Sigma^{-1}\alpha}$$

Portfolio A has an alpha of 1, $\alpha'w_A = 1$, and it has minimum risk among all portfolios with that property. The variance of portfolio A is

$$\sigma_A^2 = w_A'\Sigma w_A = \frac{1}{\alpha'\Sigma^{-1}\alpha}$$

In addition, we can define alpha in terms of Portfolio A

$$\alpha = \frac{\Sigma w_A}{\sigma_A^2}$$

Alpha


Looking forward (ex ante), α is a forecast of residual return. Looking backward (ex post), α is the average of the realized residual returns.

The term alpha (just like beta) comes from the use of linear regression

$$r_P(t) = \alpha_P + \beta_P\, r_B(t) + \varepsilon_P(t)$$

The residual returns from this regression are

$$\theta_P(t) = \alpha_P + \varepsilon_P(t)$$

“Realized alphas are for keeping score – the job of an active manager is to score – for that you need to forecast alpha”

Looking into the future, alpha is a forecast of residual return

$$\alpha_n = E[\theta_n]$$

Note that by definition, the benchmark portfolio always has a residual return of 0. Therefore the alpha of the benchmark portfolio must also be 0.

Similarly, the residual return for a riskless portfolio is also 0 and its alpha must be 0.

Information Ratio


While α is the primary measure of a portfolio’s excess return, another metric, the information ratio, is often used by professionals.

The information ratio adjusts the α for the portfolio’s residual risk and is written:

$$IR_P = \frac{\alpha_P}{\omega_P}$$

αP is predicted alpha; ωP is the predicted standard deviation of the residual.

Typically, we consider the ex-ante information ratio for making decisions and the ex-post information ratio for performance evaluation.

If ωP is 0, we set IRP equal to 0, and, in general, we define the information ratio IR as the largest possible value of IRP given alphas {αn}

$$IR = \max\{IR_P \mid P\}$$

Now, returning to Portfolio A (the characteristic portfolio for alpha), we note that it has several interesting properties

Proposition 4

1. Portfolio A has zero beta; therefore it typically has long and short positions

$$\beta_A = w_A'\beta = 0$$

2. Portfolio A has the maximum information ratio

$$IR_A = IR = \left(\alpha'\Sigma^{-1}\alpha\right)^{1/2} \ge IR_P \quad \text{for all } P$$

3. Portfolio A has total and residual risk equal to the inverse of IR.

$$\sigma_A = \omega_A = \frac{1}{IR}$$

4. Any portfolio P that can be written as

$$w_P = \beta_P w_B + \alpha_P w_A \quad \text{with } \alpha_P > 0$$

has IRP = IR.

5. Recall Portfolio Q, the characteristic portfolio of $\iota_q f$. This portfolio is a mixture of the benchmark and portfolio A:

$$w_Q = \beta_Q w_B + \alpha_Q w_A$$

with

$$\beta_Q = \frac{f_B\, \sigma_Q^2}{f_Q\, \sigma_B^2} \quad \text{and} \quad \alpha_Q = \frac{\sigma_Q^2}{f_Q\, \sigma_A^2}$$

Therefore IRQ = IR. The information ratio of Portfolio Q equals that of Portfolio A.

6. Total holdings in risky assets for Portfolio A are

$$\iota_A = \frac{\alpha_C\, \sigma_A^2}{\sigma_C^2}$$

7. Let $\theta_P$ be the residual return on any portfolio P. The information ratio of portfolio P is

$$IR_P = IR_Q \cdot \mathrm{Corr}\{\theta_P, \theta_Q\}$$

8. The maximum information ratio is related to portfolio Q’s maximum Sharpe ratio

$$IR = \frac{\alpha_Q}{\omega_Q} = SR\, \frac{\omega_Q}{\sigma_Q}$$

9. Alpha can be represented as

$$\alpha = IR\, \frac{\Sigma w_A}{\omega_A} = IR \cdot \mathrm{MCRR}_Q$$

So alpha is directly related to the marginal contribution to residual risk (MCRR) by the information ratio.

10. The Sharpe ratio of the benchmark is related to the maximal information ratio and Sharpe ratio

$$SR_B^2 = SR^2 - IR^2$$
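Several items of Proposition 4 can be verified numerically. The sketch below uses NumPy with an invented covariance matrix, expected excess returns, and benchmark weights; it builds beta and alpha from the definitions above, forms Portfolio A, and checks items 1, 3, and 10.

```python
import numpy as np

# Hypothetical inputs: covariance matrix, expected excess returns, benchmark holdings
cov = np.array([[0.0400, 0.0060, 0.0020],
                [0.0060, 0.0900, 0.0120],
                [0.0020, 0.0120, 0.0225]])
f = np.array([0.05, 0.08, 0.03])
w_b = np.array([0.5, 0.3, 0.2])

var_b = w_b @ cov @ w_b
beta = cov @ w_b / var_b          # asset betas with respect to the benchmark
f_b = f @ w_b                     # benchmark expected excess return
alpha = f - beta * f_b            # forecast residual returns

# Portfolio A: characteristic portfolio of alpha
x = np.linalg.solve(cov, alpha)
w_a = x / (alpha @ x)
ir = np.sqrt(alpha @ x)           # maximum information ratio
sigma_a = np.sqrt(w_a @ cov @ w_a)

beta_a = beta @ w_a               # item 1: zero beta
sr = np.sqrt(f @ np.linalg.solve(cov, f))   # maximum Sharpe ratio
sr_b = f_b / np.sqrt(var_b)                 # benchmark Sharpe ratio
print(beta_a, sigma_a, 1.0 / ir, sr ** 2, sr_b ** 2 + ir ** 2)
```

The benchmark's own alpha, `alpha @ w_b`, also comes out as zero, which is the "by construction" step used in the text.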

Fundamental Law of Active Management


A portfolio manager applies quantitative analysis to market

data to find and exploit the opportunities for excess return

hidden in market inefficiencies.

Quantitative analysis opens up the possibility of statistical

arbitrage if the methods and models used combine all

available information efficiently.

This is illustrated within the framework of the fundamental

law of active management (Grinold 1989; Grinold & Kahn

1997).




The fundamental law states that the information ratio (IR) is the product of the information coefficient (IC) and the square root of breadth (BR)

$$IR = IC \cdot \sqrt{BR}$$

Breadth is defined as the number of independent forecasts of exceptional return (think of breadth as the number of independent factors for which you make forecasts).

The information coefficient is the correlation of each forecast with the actual outcomes (here assumed to be the same for all forecasts).
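A small arithmetic sketch of the law and its inverse; the information coefficient and breadth values are illustrative only.

```python
import math

def information_ratio(ic, breadth):
    """Fundamental law: IR = IC * sqrt(BR)."""
    return ic * math.sqrt(breadth)

def breadth_needed(target_ir, ic):
    """Number of independent forecasts needed to reach a target IR at a given IC."""
    return (target_ir / ic) ** 2

# Hypothetical skill level and number of independent forecasts
ir_example = information_ratio(0.05, 100)
br_example = breadth_needed(0.5, 0.025)
print(ir_example, br_example)
```

Halving the information coefficient requires four times the breadth to keep the same information ratio, because breadth enters through a square root.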




This equation says that a higher information ratio can be

achieved by increasing the information coefficient or by

increasing the breadth.

IC can be increased by finding factors that are more

significant than those that are already in the model.

BR can be increased by finding more factors that are

uncorrelated (or relatively uncorrelated) with the existing

factors in the model.




Generally, for quantitative portfolio management, we use a model something like

$$r_{it} = \alpha_i + \beta_{i1} f_{1t} + \beta_{i2} f_{2t} + \dots + \beta_{iK} f_{Kt} + \varepsilon_{it}$$

The fundamental law basically assesses how well our model explains the stock-return process, and it expresses the equation’s goodness of fit as the product of the number of explanatory variables and each variable’s average contribution.




While the fundamental law can be expressed in different ways, there are certain general facts which always hold:

1. IR² approximately equals the goodness of fit (R²) of the forecasting equations.

2. The breadth is the number of explanatory variables in the forecasting equations.

3. IC² is the average contribution of each explanatory variable in increasing R².

4. When the benchmark is ignored and the risk-free rate is subtracted from the portfolio returns, IR is essentially the maximum Sharpe ratio one can achieve and the fundamental law decomposes the maximum Sharpe ratio into the number of explanatory variables and their average contribution.

Bibliography


Chincarini and Kim, Quantitative Equity Portfolio Management, 2006.

Grinold, “The Fundamental Law of Active Management”, Journal of Portfolio Management, 1989.

Grinold and Kahn, Active Portfolio Management, 2000.
