
Conditional Risk Mappings

Andrzej Ruszczyński∗ Alexander Shapiro†

February 21, 2004; revised April 26, 2005

Abstract

We introduce an axiomatic definition of a conditional convex risk mapping and we derive its properties. In particular, we prove a representation theorem for conditional risk mappings in terms of conditional expectations. We also develop dynamic programming relations for multistage optimization problems involving conditional risk mappings.

Key words: Risk, Convex Analysis, Conjugate Duality, Stochastic Optimization, Dynamic Programming, Multi-Stage Stochastic Programming.

1 Introduction

Models of risk and optimization problems involving these models have attracted considerable attention in recent years. One direction of research, associated with an axiomatic approach, was initiated by Kijima and Ohnishi [8]. The influential paper of Artzner, Delbaen, Eber and Heath [1] introduced the concept of coherent risk measures. Subsequently, this approach was developed by Föllmer and Schied [7], Cheredito, Delbaen and Kupper [5], Rockafellar, Uryasev and Zabarankin [15], Ruszczyński and Shapiro [17], and others. In [17] a general duality framework has been developed, which allows one to view earlier representation theorems for risk measures as special cases of the theory of conjugate duality in paired topological vector spaces. In the discussion below we follow the general setting and terminology of [17].

We assume that Ω is a measurable space equipped with a sigma algebra F of subsets of Ω, and that an uncertain outcome is represented by a measurable function X : Ω → R. We assume that the smaller the values of X, the better (for example,

∗Rutgers University, Department of Management Science and Information Systems, Piscataway, NJ 08854, USA, e-mail: [email protected]

†School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0205, USA, e-mail: [email protected]


X represents uncertain costs). Of course, our constructions can be easily adapted to the reverse situation.

If we introduce a space X of measurable functions on Ω, we can talk of a risk function as a mapping ρ : X → R (we can also consider risk functions with values in the extended real line). In our earlier work [17] we have refined and extended the analysis of [1, 6, 5, 7, 15], and we have derived, from a handful of axioms, fairly general properties of risk functions. Most importantly, we have analyzed optimization problems involving risk functions. In such a problem the uncertain outcome X results from our decision z, modeled as an element of some vector space Z. Formally X = F(z), where F : Z → X. The associated optimization problem takes on the form:

Min_{z∈S} ρ(F(z)),  (1.1)

where S is a convex subset of Z. In [17] we have derived optimality conditions and duality theory for problems of form (1.1).
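To make problem (1.1) concrete, the following sketch (not from the paper: the two-scenario data, the quadratic cost F, and the choice of Average Value-at-Risk as the risk function ρ are illustrative assumptions) solves a tiny instance by grid search and compares it with the risk-neutral expectation.

```python
import numpy as np

def avar(losses, probs, alpha):
    """Average Value-at-Risk of a discrete loss distribution, via the
    variational formula AV@R_alpha(X) = min_t { t + E[X - t]_+ / alpha }.
    (For a discrete distribution the minimum is attained on the support.)"""
    return min(t + probs @ np.maximum(losses - t, 0.0) / alpha for t in losses)

# Hypothetical decision problem: two scenarios, cost F(z) per scenario.
probs = np.array([0.9, 0.1])

def F(z):
    # The second (rare) scenario carries an extra cost 10*z.
    return np.array([(z - 1.0) ** 2, (z - 1.0) ** 2 + 10.0 * z])

grid = np.linspace(-2.0, 2.0, 401)                      # the feasible set S
z_best = min(grid, key=lambda z: avar(F(z), probs, 0.2))
z_neutral = min(grid, key=lambda z: probs @ F(z))       # risk-neutral comparison
# The risk-averse minimizer stays near z = 0, limiting the rare penalty.
print(z_best, z_neutral)
```

The risk-averse solution shifts toward decisions that cap the low-probability, high-cost scenario, while the risk-neutral expectation penalizes it only by its probability-weighted average.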

Our objective now is to analyze models of risk in a dynamic setting. Suppose that our information, decisions and costs are associated with stages t = 1, . . . , T. After each stage t, a sigma subalgebra Ft of F models the information available. We assume that these sigma subalgebras form a filtration: F1 ⊂ F2 ⊂ · · · ⊂ FT, with FT = F.

The cost incurred at stage t is represented by a function Xt ∈ Xt, where Xt is a space of measurable functions on (Ω,Ft). The total cost is thus

X = X1 + X2 + · · ·+ XT .

One way to model risk in problems involving such random outcomes would be to apply a certain risk function ρ(·) to the entire sum X. However, this would ignore the dynamic character of the problem in question, and the sequential nature of the decision making process. For these reasons we aim at developing conditional risk mappings that represent future risk from the point of view of the information available at the current stage.

Our approach, as well as that of Riedel [13], is different from the method of Artzner et al. [2]. In [2] an adapted sequence Xt, t = 1, . . . , T, is viewed as a measurable function on a new measurable space (Ω′,F ′), with Ω′ := Ω × {1, . . . , T}, and with the sigma algebra F ′ generated by sets of the form Bt × {t}, for all Bt ∈ Ft and t = 1, . . . , T. They then use the properties of coherent (scalar) risk measures on this new space to develop risk models in the dynamic setting. Our intention is to develop models suitable for sequential decision making, and eventually to extend dynamic programming equations to risk-averse problems.

The main issue here is our knowledge at the time when risk is evaluated. In the classical setting of multistage stochastic optimization, the main tool used to formulate the corresponding dynamic programming equations is the concept of conditional expectation. Given two sigma algebras F1 ⊂ F2 of subsets of Ω, with F1 representing our knowledge when the expectation is evaluated and F2 representing all events under consideration, the conditional expectation can be defined as a mapping from a space of F2-measurable functions into a space of F1-measurable functions. Of course, the conditional expectation mapping is linear. The basic idea of our approach is to extend the concept of conditional expectation to an appropriate class of convex mappings.

Together with the sigma algebras F1 ⊂ F2, we consider two linear (vector) spaces X1 and X2 of functions measurable with respect to F1 and F2. A conditional risk mapping is defined in section 2 as a convex, monotone and translation equivariant mapping ρX2|X1 : X2 → X1. In section 3 we extend our insights from [17] to derive a duality representation theorem for conditional risk mappings. Section 4 is devoted to the analysis of relations of conditional risk mappings and conditional expectations. In section 5 we consider a sequence of sigma algebras F1 ⊂ F2 ⊂ · · · ⊂ FT and the corresponding linear spaces Xt, t = 1, . . . , T, of measurable functions, and we analyze compositions of risk mappings of the form ρX2|X1 ◦ · · · ◦ ρXT−1|XT−2 ◦ ρXT|XT−1. Two practically important examples of conditional risk mappings are thoroughly analyzed in section 6. Finally, section 7 addresses the issue of risk measures for sequences, and develops dynamic programming equations for associated optimization problems.

2 Axioms of Conditional Risk Mappings

In order to construct dynamic models of risk we need to extend the concept of risk functions. We proceed as follows. Let F1 ⊂ F2 be sigma algebras of subsets of a set Ω, and X1 ⊂ X2 be linear spaces of real valued functions φ(ω), ω ∈ Ω, measurable with respect to F1 and F2, respectively.

Definition 1 We say that a mapping ρ : X2 → X1 is a conditional risk mapping if the following properties hold:

(A1) Convexity: If α ∈ [0, 1] and X, Y ∈ X2, then

αρ(X) + (1 − α)ρ(Y) ≽ ρ(αX + (1 − α)Y);

(A2) Monotonicity: If Y ≽ X, then ρ(Y) ≽ ρ(X);

(A3) Predictable Translation Equivariance: If Y ∈ X1 and X ∈ X2, then

ρ(X + Y ) = ρ(X) + Y.

The inequalities in (A1) and (A2) are understood componentwise, i.e., Y ≽ X means that Y(ω) ≥ X(ω) for every ω ∈ Ω. The above definition depends on the choice of the spaces X1 and X2. To emphasize this, we sometimes write ρX2|X1 for the conditional risk mapping. An example of a conditional risk mapping is the conditional expectation ρ(X) := E[X|F1], provided that E[X|F1] is an element of the space X1 for every X ∈ X2. We show in section 4 that, in general, the concept of conditional risk mappings is closely related to the notion of conditional expectation. This motivates the use of the adjective conditional in the name of these mappings.
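On a finite Ω the conditional expectation example can be checked directly. The sketch below (an illustration, not part of the paper; the four-point space, partition and probabilities are arbitrary choices) verifies axioms (A1)–(A3) numerically for E[·|F1].

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite Omega = {0,1,2,3}; F1 is generated by the partition {{0,1},{2,3}}.
P = np.array([0.1, 0.2, 0.3, 0.4])
blocks = [np.array([0, 1]), np.array([2, 3])]

def cond_exp(X):
    """E[X|F1] on a finite space: the P-weighted average over each atom of F1."""
    out = np.empty_like(X, dtype=float)
    for B in blocks:
        out[B] = P[B] @ X[B] / P[B].sum()
    return out

X, Y = rng.normal(size=4), rng.normal(size=4)
a = 0.3
# (A1) holds (with equality, since conditional expectation is linear):
assert np.allclose(cond_exp(a * X + (1 - a) * Y),
                   a * cond_exp(X) + (1 - a) * cond_exp(Y))
# (A2) monotonicity: max(X, Y) >= X pointwise, and cond_exp preserves the order:
assert np.all(cond_exp(np.maximum(X, Y)) >= cond_exp(X) - 1e-12)
# (A3) predictable translation equivariance, for an F1-measurable Y0:
Y0 = np.array([2.0, 2.0, -1.0, -1.0])   # constant on each atom
assert np.allclose(cond_exp(X + Y0), cond_exp(X) + Y0)
```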

Axioms (A1)–(A3) generalize the conditions introduced in [13] for dynamic risk measures in the case of a finite space Ω. We postulate convexity, rather than positive homogeneity, and we allow for a general measurable space Ω.

For each ω ∈ Ω, we associate with ρ the function

ρω(X) := [ρ(X)](ω), X ∈ X2. (2.1)

Assumptions (A1) and (A2) mean that, for every ω ∈ Ω, the function ρω : X2 → R is convex and monotone, respectively. Moreover, assumption (A3) implies that ρω(X + a) = ρω(X) + a for every X ∈ X2 and every a ∈ R, provided that the space X1 includes the constant functions (see the following condition (C′)). That is, ρω(·) satisfies the axioms of convex risk functions, as given in [7] and analyzed in our earlier paper [17]. In particular, if the sigma algebra F1 is trivial, i.e., F1 = {∅, Ω}, then any function X ∈ X1 is constant over Ω, and hence the space X1 can be identified with R. In that case ρ(·) becomes real valued and assumptions (A1)–(A3) become the axioms of convex (real valued) risk functions.

We assume that with each space Xi, i = 1, 2, is associated a linear space Yi of signed finite measures on (Ω,Fi) such that Y1 ⊂ Y2, and¹ ∫_Ω |X| d|µ| < +∞ for every X ∈ Xi and µ ∈ Yi. Then we can define the scalar product (bilinear form)

〈µ, X〉 := ∫_Ω X(ω) dµ(ω),  X ∈ Xi, µ ∈ Yi.  (2.2)

By PYi we denote the set of probability measures µ ∈ Yi, i.e., µ ∈ PYi if µ is nonnegative and µ(Ω) = 1. We assume that Xi and Yi are paired locally convex topological vector spaces. That is, Xi and Yi are endowed with respective topologies which make them locally convex topological vector spaces. Moreover, these topologies are compatible with the scalar product (2.2), i.e., every linear continuous functional on Xi can be represented in the form 〈µ, ·〉 for some µ ∈ Yi, and every linear continuous functional on Yi can be represented in the form 〈·, X〉 for some X ∈ Xi. In particular, we can endow each space Xi and Yi with its weak topology induced by its paired space. This will make Xi and Yi paired locally convex topological vector spaces provided that for any X ∈ Xi \ {0} there exists µ ∈ Yi such that 〈µ, X〉 ≠ 0, and for any µ ∈ Yi \ {0} there exists X ∈ Xi such that 〈µ, X〉 ≠ 0.

A natural choice of Xi, i = 1, 2, is the space of all bounded Fi-measurable functions X : Ω → R. In that case we can take Yi to be the space of all signed finite measures on (Ω,Fi). Another possible choice is Xi := Lp(Ω,Fi, P) for some positive (probability) measure P on (Ω,F2) and p ∈ [1, +∞]. Note that since F1 ⊂ F2, P is also a positive measure on (Ω,F1), and hence X1 ⊂ X2. We can then take Yi to be the linear space of measures ν which are absolutely continuous with respect to P and whose density (Radon–Nikodym derivative) h = dν/dP belongs to the space Lq(Ω,Fi, P), where q ≥ 1 is such that 1/p + 1/q = 1. In that case we identify Yi with Lq(Ω,Fi, P), and define the scalar product

〈h, X〉 := ∫_Ω X(ω)h(ω) dP(ω),  X ∈ Lp(Ω,Fi, P), h ∈ Lq(Ω,Fi, P).  (2.3)

¹For a signed measure µ we denote by |µ| the corresponding total variation measure, i.e., |µ| = µ+ + µ−, where µ = µ+ − µ− is the Jordan decomposition of µ.

Note that an element X ∈ Lp(Ω,Fi, P) (an element h ∈ Lq(Ω,Fi, P)) is a class of functions which are equal to each other for almost every (a.e.) ω ∈ Ω with respect to the measure P. The space Xi := Lp(Ω,Fi, P) is a Banach space and, for p ∈ [1, +∞), Yi := Lq(Ω,Fi, P) is its dual space of all continuous linear functionals on Xi. When dealing with Banach spaces we endow Xi and Yi := Xi∗ with the strong (norm) and weak∗ topologies, respectively. If Xi is a reflexive Banach space, i.e., Xi∗∗ = Xi, then Xi and Xi∗, both endowed with strong topologies, form paired spaces.

We assume throughout the paper that the space X2 is sufficiently large, so that the following assumption holds (recall that for X ∈ X2, the notation X ≽ 0 means that X(ω) ≥ 0 for all ω ∈ Ω).

(C) If µ ∈ Y2 is not nonnegative, then there exists X ∈ X2 such that X ≽ 0 and 〈µ, X〉 < 0.

The above condition ensures that the cone of nonnegative valued functions in X2 and the cone of nonnegative measures in Y2 are dual to each other. It is a mild technical requirement on the pairing of X2 and Y2. We use it in the key Theorem 1.

A measure µ is not nonnegative if µ(A) < 0 for some A ∈ F2. Therefore, condition (C) holds, for example, if the space X2 contains all functions 1lA(·), A ∈ F2, where 1lA(ω) = 1 for ω ∈ A and 1lA(ω) = 0 for ω ∉ A. For technical reasons we assume that this property holds also for the space X1:

(C′) For every B ∈ F1, the function 1lB belongs to the space X1.

We say that Y is an F1-step function if it can be represented in the form Y = ∑_{k=1}^K αk 1lBk, where Bk ∈ F1 and Bk ∩ Bl = ∅ if k ≠ l. Clearly, if αk ≥ 0, then the step function Y is nonnegative. By assumption (C′), the space X1 contains all F1-step functions, and in particular all constant functions.

It is said that the conditional risk mapping ρ is positively homogeneous if

ρ(αX) = αρ(X), for all X ∈ X2 and α > 0. (2.4)

In that case ρ(0) = 0, and for any Y ∈ X1 we have

ρ(Y ) = ρ(0 + Y ) = ρ(0) + Y = Y,

and hence ρ[ρ(X)] = ρ(X) for any X ∈ X2.


3 Conjugate Duality of Conditional Risk Mappings

We say that a mapping ρ : X2 → X1 is lower semicontinuous if, for every ω ∈ Ω, the corresponding function ρω : X2 → R is lower semicontinuous (in the considered topology of X2). Note that the function ρω(·) is real valued here, and hence if X2 is a Banach space, equipped with its strong (norm) topology, and ρω(·) is convex and lower semicontinuous, then actually ρω(·) is continuous. With ρω is associated its conjugate function

ρ∗ω(µ) := sup_{X∈X2} {〈µ, X〉 − ρω(X)}.  (3.5)

Note that the conjugate function ρ∗ω(·) can take the value +∞. Recall that the extended real valued function ρ∗ω(·) is said to be proper if ρ∗ω(µ) > −∞ for any µ ∈ Y2 and its domain

dom(ρ∗ω) := {µ ∈ Y2 : ρ∗ω(µ) < +∞}

is nonempty. Since it is assumed that ρω(·) is finite valued, we always have ρ∗ω(·) > −∞. We also use the notation ρ∗(µ, ω) for the function ρ∗ω(µ) in order to emphasize that it is a function of two variables, i.e., ρ∗ : Y2 × Ω → R. It has the following properties: for every ω ∈ Ω the function ρ∗(·, ω) is convex and lower semicontinuous, and for every µ ∈ Y2 the function ρ∗(µ, ·) is F1-measurable.

We denote by PY2 the set of all probability measures on (Ω,F2) which are in Y2. In particular, if Y2 := Lq(Ω,F2, P), then (identifying measures with their densities)

PY2 = {h ∈ Lq(Ω,F2, P) : ∫_Ω h(ω) dP(ω) = 1 and h(ω) ≥ 0 for a.e. ω ∈ Ω}.

With each ω ∈ Ω we associate a set of probability measures PY2|F1(ω) ⊂ PY2 defined as the set of all ν ∈ PY2 such that for every B ∈ F1

ν(B) = 1, if ω ∈ B;  ν(B) = 0, if ω ∉ B.  (3.6)

Note that ω is fixed here and B varies in F1. Condition (3.6) means that for every ω and every B ∈ F1 we know whether B happened or not, and can be written equivalently as ν(B) = 1lB(ω), ∀ω ∈ Ω. In particular, if F1 = {∅, Ω}, then PY2|F1(ω) = PY2 for all ω ∈ Ω.

We can now formulate the basic duality result for conditional risk mappings.

Theorem 1 Let ρ = ρX2|X1 be a lower semicontinuous conditional risk mapping satisfying assumptions (A1)–(A3). Then

ρω(X) = sup_{µ∈PY2|F1(ω)} {〈µ, X〉 − ρ∗(µ, ω)},  ω ∈ Ω, X ∈ X2,  (3.7)

where PY2|F1(ω) is the set of probability measures defined in (3.6), and ρ∗(µ, ω) is defined in (3.5). Conversely, suppose that a mapping ρ : X2 → X1 can be represented in form (3.7) for some (proper) function ρ∗ : Y2 × Ω → R. Then ρ is lower semicontinuous and satisfies conditions (A1)–(A3).

Proof. If assumptions (A1)–(A3) hold true, then ρω is a convex risk function. Since ρω is lower semicontinuous, it follows from the Fenchel–Moreau Theorem that

ρω(X) = sup_{µ∈PY2} {〈µ, X〉 − ρ∗ω(µ)},  X ∈ X2.  (3.8)

Conversely, if ρω can be represented in form (3.7) for some ρ∗ω, then ρ is lower semicontinuous and satisfies conditions (A1)–(A2). All these facts can be established by applying verbatim the proof of Theorem 2 in [17] to the function ρω. Therefore, the only issue that needs to be clarified is the restriction of dom(ρ∗ω) to PY2|F1(ω).

Let ω ∈ Ω be fixed and let µω ∈ dom(ρ∗ω), and hence ρ∗ω(µω) is finite. It follows from (A3) that for every Y ∈ X1 we have

ρ∗ω(µω) = sup_{X∈X2} {〈µω, X + Y〉 − ρω(X + Y)}
        = sup_{X∈X2} {〈µω, X〉 + 〈µω, Y〉 − ρω(X) − Y(ω)}
        = ρ∗ω(µω) + 〈µω, Y〉 − Y(ω).

Therefore 〈µω, Y〉 = Y(ω) for all Y ∈ X1. Setting Y := 1lB, where B ∈ F1, and noting that 〈µω, Y〉 = Eµω[1lB] = µω(B), we conclude that µω(B) = 1lB(ω) for all B ∈ F1. It follows that µω ∈ PY2|F1(ω), and hence dom ρ∗ω ⊆ PY2|F1(ω).

To prove the converse we only need to verify assumption (A3). Suppose that (3.7) holds true. Then every µω ∈ dom(ρ∗ω) is an element of PY2|F1(ω). Let Y := ∑_{k=1}^K αk 1lBk be an F1-step function. By assumption (C′), we have Y ∈ X1. Then

〈µω, Y〉 = ∑_{k=1}^K αk µω(Bk) = Y(ω).

Passing to the limit, we obtain 〈µω, Y〉 = Y(ω) for every F1-measurable Y. Therefore (3.7) implies that for every Y ∈ X1 and all ω ∈ Ω we have

[ρ(X + Y)](ω) = sup_{µ∈PY2|F1(ω)} {〈µ, X + Y〉 − ρ∗(µ, ω)}
             = sup_{µ∈PY2|F1(ω)} {〈µ, X〉 − ρ∗(µ, ω)} + Y(ω).

This is identical to (A3).


Let us provide a sufficient condition for the lower semicontinuity assumption. Recall that the space X2 is said to be a lattice if for any X1, X2 ∈ X2 the element X1 ∨ X2, defined as

[X1 ∨ X2](ω) := max{X1(ω), X2(ω)},  ω ∈ Ω,

belongs to X2. For every X ∈ X2 we can then define |X| ∈ X2 in a natural way, i.e., |X|(ω) = |X(ω)|, ω ∈ Ω. The space X2 is a Banach lattice if it is a Banach space and |X1| ≤ |X2| implies ‖X1‖ ≤ ‖X2‖. For example, every space X2 := Lp(Ω,F2, P), p ∈ [1, +∞], is a Banach lattice. We can remark here that the lower semicontinuity of ρ follows from conditions (A1)–(A2), if X2 has the structure of a Banach lattice. Note that [ρ(X)](·) is a finite valued function, and hence ρω(X) is finite for all ω ∈ Ω and all X ∈ X2. Direct application of [17, Proposition 1] yields the following result.

Proposition 1 Suppose that X2 is a Banach lattice and ρ : X2 → X1 satisfies assumptions (A1) and (A2). Then ρω(·) is continuous for all ω ∈ Ω.

Clearly, if ρ : X2 → X1 is positively homogeneous, then the corresponding function ρω is also positively homogeneous. Therefore, if ρ = ρX2|X1 is a positively homogeneous, lower semicontinuous, conditional risk mapping, then ρ∗(·, ω) is the indicator function of a closed convex set A(ω) ⊂ PY2|F1(ω), and hence

ρω(X) = sup_{µ∈A(ω)} 〈µ, X〉,  ω ∈ Ω, X ∈ X2.  (3.9)

We view ω ↦ A(ω) as a multifunction from Ω into the set PY2 of probability measures on (Ω,F2) which are included in Y2. Formula (3.9) was first derived in [13, Th. 1] for finite spaces Ω. Our results extend it to arbitrary measurable spaces.
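Representation (3.9) is easy to instantiate on a finite Ω. In the sketch below (illustrative data, not from the paper), A(ω) consists of two measures, each satisfying (3.6); the resulting ρ is a positively homogeneous conditional risk mapping whose values are F1-measurable.

```python
import numpy as np

# Finite Omega = {0,1,2,3}; F1 is generated by the atoms {0,1} and {2,3}.
atoms = [np.array([0, 1]), np.array([2, 3])]

# Row omega of each array is a measure in P_{Y2|F1}(omega): by (3.6) it is
# supported on the F1-atom containing omega, and rows within an atom coincide.
mu_a = np.array([[0.5, 0.5, 0.0, 0.0],
                 [0.5, 0.5, 0.0, 0.0],
                 [0.0, 0.0, 0.3, 0.7],
                 [0.0, 0.0, 0.3, 0.7]])
mu_b = np.array([[0.9, 0.1, 0.0, 0.0],
                 [0.9, 0.1, 0.0, 0.0],
                 [0.0, 0.0, 0.8, 0.2],
                 [0.0, 0.0, 0.8, 0.2]])

def rho(X):
    """[rho(X)](omega) = max over A(omega) = {mu_a(omega,·), mu_b(omega,·)} of <mu, X>."""
    return np.maximum(mu_a @ X, mu_b @ X)

X = np.array([1.0, -2.0, 3.0, 0.0])
out = rho(X)
for S in atoms:                       # rho(X) is F1-measurable: constant on atoms
    assert np.allclose(out[S], out[S][0])
assert np.allclose(rho(2.5 * X), 2.5 * out)   # positive homogeneity (2.4)
```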

The property of positive homogeneity can be strengthened substantially.

Proposition 2 Let ρ = ρX2|X1 be a positively homogeneous, lower semicontinuous conditional risk mapping. Suppose that for all B ∈ F1 and X ∈ X2, it holds that 1lBX ∈ X2. Then for every nonnegative F1-step function Y and every X ∈ X2, we have that Y X ∈ X2 and

ρ(Y X) = Y ρ(X).  (3.10)

Proof. Consider a set B ∈ F1 and any X ∈ X2. It follows from (3.9) that

ρω(X) = sup_{µ∈A(ω)} {〈µ, 1lBX〉 + 〈µ, 1lΩ\BX〉},  ω ∈ Ω.  (3.11)

If ω ∈ B then (3.6) implies that µ(Ω \ B) = 0 for all µ ∈ A(ω), and the second term at the right hand side of (3.11) vanishes. Hence

ρω(X) = sup_{µ∈A(ω)} 〈µ, 1lBX〉 = ρω(1lBX).  (3.12)


By a similar argument, ρω(1lBX) = 0 for all ω ∉ B. Thus

ρ(1lBX) = 1lBρ(X).  (3.13)

Consider now a nonnegative F1-step function Y := ∑_{k=1}^K αk 1lBk. Then Y X = ∑_{k=1}^K αk 1lBk X ∈ X2. It follows from (3.13) that for ω ∈ Bk the following chain of equations holds true:

ρω(Y X) = 1lBk(ω) ρω(Y X) = ρω(1lBk Y X) = ρω(αk 1lBk X).

Using positive homogeneity and (3.13) again, we obtain

ρω(Y X) = αk ρω(1lBk X) = αk ρω(X), if ω ∈ Bk.

This means that

ρω(Y X) = ∑_{k=1}^K αk 1lBk(ω) ρω(X) = Y(ω) ρω(X),  ω ∈ Ω.

This completes the proof.

Remark 1 In order to pass in (3.10) from step functions to general functions Y ∈ X1 we need some additional assumptions about the spaces involved. For example, consider Xi := Lp(Ω,Fi, P), i = 1, 2, where P is a probability measure on (Ω,F2) and p ∈ [1, +∞). Then for every F1-step function Y and every X ∈ X2, we have Y X ∈ X2. Moreover, the set of F1-step functions is dense in X1. Hence, by passing to the limit and using Proposition 1, we conclude that (3.10) is valid for all Y ∈ X1 and X ∈ X2, provided that Y X ∈ X2.

4 Conditional Expectation Representation

In this section we discuss relations between conditional risk mappings and conditional expectations. In Theorem 1 we have established the representation (3.7) of a conditional risk mapping. Our objective now is to analyze in more detail the set of probability measures A(ω) := dom(ρ∗ω). Recall that A(ω) ⊂ PY2|F1(ω), and that representation (3.7) reduces to (3.9) if the risk mapping is positively homogeneous. Definition (3.6) of the set PY2|F1(ω) means that every element µω of it is a certain probability measure on F2, which assigns value 1 or 0 to sets in F1, depending on whether ω is an element of the set or not. It is thus reasonable to ask if it is possible to represent these measures as conditional probability measures.

In order to gain an insight into this question, let us view A as a multifunction from Ω to PY2. For the sake of illustration we temporarily suppose that Ω is finite, say Ω = {ω1, . . . , ωN}, and that F2 contains all subsets of Ω. Consider a selection µω ∈ A(ω), i.e., µωi ∈ A(ωi), i = 1, . . . , N. Since A(ω) ⊂ PY2|F1(ω), we have of course that µω ∈ PY2|F1(ω). By the definition of the conditional risk mapping, the function ω ↦ ρω(X) is F1-measurable. Therefore, the selection µω is F1-measurable as well. The selection µω is a conditional probability measure, of some measure ν2 on (Ω,F2) with respect to the sigma algebra F1, if and only if the complete probability formula is valid:

ν2(B ∩ S) = ∑_{ω∈S} µω(B) ν2(ω)  for all S ∈ F1, B ∈ F2  (4.1)

(see, e.g., Billingsley [3, p. 430]). By noting that the left hand side of (4.1) is equal to ∑_{ω∈B∩S} ν2(ω), we can view (4.1) as a system of linear equations in the unknowns ν2(ωi), i = 1, . . . , N.

The question is whether system (4.1) has a solution ν2(ω) which is a probability measure. The answer is rather simple. Consider a set S ∈ F1. We have that µω ∈ PY2|F1(ω), and hence it follows by (3.6) that if ω ∉ S, then µω(S) = 0, and consequently µω(ω′) = 0 for any ω′ ∈ S. Let S1, . . . , SK ∈ F1 be sets generating F1, i.e., Si ∩ Sj = ∅ for i ≠ j, ∪_{i=1}^K Si = Ω, and if S is an F1-measurable subset of Si, i = 1, . . . , K, then either S = Si or S = ∅. For ω′ ∈ Si, i = 1, . . . , K, the value of µω′(ω) is constant in ω′, because the selection µω is F1-measurable and the set Si is indivisible. Let us denote this value by µSi(ω). Note that

∑_{ω∈Si} µSi(ω) = µω′(Si) = 1,  ω′ ∈ Si, i = 1, . . . , K,

where the last equality holds by (3.6). Now let us take an arbitrary probability measure ν1 on (Ω,F1) and define

ν2(ω) := ν1(Si) µSi(ω),  ω ∈ Si, i = 1, . . . , K.  (4.2)

Of course, the condition “ω ∈ Si” means that Si is the smallest F1-measurable set containing ω. Clearly ν2(ω) ≥ 0 and

∑_{ω∈Ω} ν2(ω) = ∑_{i=1}^K ν1(Si) ∑_{ω∈Si} µSi(ω) = ∑_{i=1}^K ν1(Si) = 1,

and hence ν2 is a probability measure.

Let us verify that equation (4.1) holds here. We have that for ω ∈ Sj, it follows by (3.6) that µω(B) = µω(B ∩ Sj), and hence

∑_{ω∈S} µω(B) ν2(ω) = ∑_{i=1}^K ν1(Si) (∑_{ω∈S∩Si} µSi(ω)) (∑_{ω′∈B∩Si} µSi(ω′)).

Now ∑_{ω∈S∩Si} µSi(ω) is equal to 1 if Si ⊂ S and is zero otherwise. It follows that

∑_{ω∈S} µω(B) ν2(ω) = ∑_{i:Si⊂S} ν1(Si) ∑_{ω∈B∩Si} µSi(ω) = ∑_{ω∈S∩B} ν2(ω).


We have shown that to every selection µω ∈ A(ω) there corresponds a probability measure ν2 on (Ω,F2) such that µω is the conditional probability measure of ν2. The measure ν2 is given explicitly by formula (4.2). In particular, we have that ν2(S) = ν1(S) for every S ∈ F1. This shows not only that µω can be represented as a conditional probability of some measure ν2, but also that we can fix the values of ν2 on the sigma algebra F1 by taking an arbitrary probability measure ν1 on (Ω,F1).
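The construction (4.2) and the complete probability formula (4.1) can be verified numerically on a small example (the partition, the selection µω and the measure ν1 below are hypothetical choices for illustration):

```python
import numpy as np
from itertools import chain, combinations

# Finite Omega = {0,1,2,3}; F1 is generated by the atoms S1={0,1}, S2={2,3}.
atoms = [[0, 1], [2, 3]]

# A measurable selection mu_omega: rows coincide within each atom, and each
# row is a probability measure supported on the atom containing omega.
mu = np.array([[0.25, 0.75, 0.0, 0.0],
               [0.25, 0.75, 0.0, 0.0],
               [0.0, 0.0, 0.6, 0.4],
               [0.0, 0.0, 0.6, 0.4]])

nu1 = [0.3, 0.7]    # an arbitrary probability measure on (Omega, F1), by atom

# Formula (4.2): nu2(omega) = nu1(S_i) * mu_{S_i}(omega) for omega in S_i.
nu2 = np.empty(4)
for i, S in enumerate(atoms):
    for w in S:
        nu2[w] = nu1[i] * mu[S[0], w]
assert np.isclose(nu2.sum(), 1.0)       # nu2 is a probability measure

def powerset(xs):
    return chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))

# Complete probability formula (4.1): nu2(B ∩ S) = sum_{w in S} mu_w(B) nu2(w).
for S in [[], [0, 1], [2, 3], [0, 1, 2, 3]]:      # all sets in F1
    for B in map(list, powerset(range(4))):        # all sets in F2
        lhs = sum(nu2[w] for w in B if w in S)
        rhs = sum(mu[w, B].sum() * nu2[w] for w in S)
        assert np.isclose(lhs, rhs)
```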

In order to extend the above analysis to a general setting, we proceed as follows.

Definition 2 We say that a multifunction M : Ω ⇒ PY2 is weakly∗ F1-measurable if for every X ∈ X2 the multifunction MX : Ω ⇒ R, defined as

MX(ω) := {〈µ, X〉 : µ ∈ M(ω)},

is F1-measurable. We say that a selection µω ∈ M(ω) is weakly∗ F1-measurable if for every X ∈ X2 the function ω ↦ 〈µω, X〉 is F1-measurable.

The multifunction ω ↦ A(ω), associated with representation (3.9), is weakly∗ F1-measurable. Indeed,

AX(ω) = [−ρω(−X), ρω(X)],

and hence F1-measurability of AX(·) follows from the fact that ρ(X) ∈ X1, which ensures F1-measurability of [ρ(X)](·) and [ρ(−X)](·). In the sequel, whenever speaking about measurability of multifunctions and their selections, we shall mean weak∗ measurability.

By Theorem 1, for all ω ∈ Ω, every measure µ ∈ A(ω) satisfies condition (3.6). Therefore, if µω = µ(ω) is a selection of A(ω), then

[µ(·)](S) = 1lS(·), for all S ∈ F1.  (4.3)

Moreover, if µ(ω) is weakly∗ F1-measurable, then [µ(·)](B) is F1-measurable, for every B ∈ F2.

Example 1 Consider ρ(X) := E[X|F1], X ∈ X2, where the conditional expectation is taken with respect to a probability measure P on (Ω,F2). It is assumed here that this conditional expectation is well defined for every X ∈ X2, and the space X1 is large enough such that it contains E[X|F1] for all X ∈ X2. Note that the function E[X|F1](·) is defined up to a set of P-measure zero, i.e., two versions of E[X|F1](·) can be different on a set of P-measure zero. The conditional expectation mapping ρ satisfies assumptions (A1)–(A3) and is a linear mapping. Representation (3.9) holds with A(ω) = {µ(ω)} being a singleton and µω = µ(ω) being a probability measure on (Ω,F2). By the definition of the conditional expectation, E[X|F1] is F1-measurable, and hence µω is weakly∗ F1-measurable. Considering X = 1lA for A ∈ F2, we see that

µω(A) = E[1lA|F1](ω) = [P(A|F1)](ω).  (4.4)

This means that µ(·) is the conditional probability of P with respect to F1 (see, e.g., Billingsley [3, pp. 430–431]). Clearly it satisfies (3.6).


Remark 2 The family of conditional risk mappings is closed under the operation of taking the maximum. That is, let ρν = ρνX2|X1, ν ∈ I, be a family of conditional risk mappings satisfying assumptions (A1)–(A3). Here I is an arbitrary set. Suppose, further, that for every X ∈ X2 the function

[ρ(X)](·) := sup_{ν∈I} [ρν(X)](·)  (4.5)

belongs to the space X1, and hence ρ maps X2 into X1. It is then straightforward to verify that the max-function ρ also satisfies assumptions (A1)–(A3). Moreover, if ρν, ν ∈ I, are lower semicontinuous, then ρ is also lower semicontinuous. In particular, let ρν(X) := Eν[X|F1], ν ∈ I, where I is a subset of the set of probability measures on (Ω,F2). Suppose that the corresponding max-function ρ is well defined, i.e., ρ maps X2 into X1. Then ρ is a lower semicontinuous, positively homogeneous, conditional risk mapping. We show below that, under mild regularity conditions, the converse is also true, i.e., a positively homogeneous conditional risk mapping can be represented as the maximum of a family of conditional expectations.

Remark 3 Let (I,G, Q) be a probability space and ρν = ρνX2|X1, ν ∈ I, be a family of conditional risk mappings satisfying assumptions (A1)–(A3). Suppose further that the integral mapping

[ρ(X)](ω) := ∫_I ρνω(X) dQ(ν)  (4.6)

is well defined for every X ∈ X2 and ω ∈ Ω, and ρ(X) is an element of X1. It is not difficult to see then that the integral mapping ρ : X2 → X1 also satisfies conditions (A1)–(A3). Moreover, if each ρν, ν ∈ I, is lower semicontinuous, then by Fatou’s lemma, ρ is also lower semicontinuous.

Let µω ∈ A(ω) be a weakly∗ F1-measurable selection. It follows that for any set B ∈ F2, the function ξ(ω) := µω(B) is F1-measurable. Consider a probability measure ν1 on (Ω,F1). Then the function ξ(ω) can be viewed as a random variable on the probability space (Ω,F1, ν1). We can now define a set function ν2 on (Ω,F2) as follows:

ν2(B) = ∫_Ω µω(B) dν1(ω),  B ∈ F2.  (4.7)

Proposition 3 Let µω ∈ A(ω) be a weakly∗ F1-measurable selection. Then for every probability measure ν1 on (Ω,F1) the set function (4.7) is a probability measure on (Ω,F2) which is equal to ν1 on F1, and is such that µω is the conditional probability of ν2 with respect to F1.

Proof. The set function ν2 is a probability measure, as an average of probability measures µω. For B ∈ F1, by virtue of (4.3), formula (4.7) yields:

ν2(B) = ν1(B),  B ∈ F1.


It remains to prove that µω is the conditional probability of ν2 with respect to the sigma subalgebra F1, i.e., that µω(B) = [ν2(B|F1)](ω) for any B ∈ F2 and a.e. ω. Let us consider sets B ∈ F2 and S ∈ F1. From (4.7) we obtain

ν2(B ∩ S) = ∫_Ω µω(B ∩ S) dν1(ω).  (4.8)

We have µω(B) = µω(B ∩ S) + µω(B \ S) and µω(B \ S) ≤ µω(Ω \ S). Since Ω \ S ∈ F1, it follows from (4.3) that µω(Ω \ S) = 0 for all ω ∈ S. Hence µω(B \ S) = 0 for all ω ∈ S, and thus µω(B) = µω(B ∩ S) for ω ∈ S; for ω ∉ S we have µω(B ∩ S) ≤ µω(S) = 0. Since ν2 agrees with ν1 on F1, equation (4.8) can therefore be rewritten as follows:

ν2(B ∩ S) = ∫_S µω(B) dν2(ω),  for all S ∈ F1 and B ∈ F2.  (4.9)

This is equivalent to the statement that µω is the conditional probability of ν2 with respect to F1 (see, e.g., Billingsley [3, p. 430]).

Recall that Xi and Yi are assumed to be paired locally convex topological vector spaces. It is said that Xi is separable if it has a countable dense subset.

Lemma 1 Suppose that the space X2 is separable and the representation (3.9) holds. Then there exists a countable family µiω, i ∈ N, of weakly∗ F1-measurable selections of A(ω) such that

[ρ(X)](ω) = sup_{i∈N} 〈µiω, X〉  (4.10)

for all X ∈ X2 and ω ∈ Ω.

Proof. Let εk ↓ 0 be a sequence of positive numbers and {Xn}n∈N be a dense subset of X2. For every k, n ∈ N consider the multifunction

Mk,n(ω) := {ν ∈ A(ω) : 〈ν, Xn〉 ≥ [ρ(Xn)](ω) − εk}.

This multifunction is weakly∗ F1-measurable and nonempty valued. Since X2 is separable, the multifunction Mk,n(·) admits a weakly∗ F1-measurable selection µk,n(·) (see [9]). By the definition of Mk,n we then have that

〈µk,n(ω), Xn〉 ≥ [ρ(Xn)](ω) − εk

for all k, n ∈ N and ω ∈ Ω. Since ρω(·) is lower semicontinuous for every ω ∈ Ω, it follows that

sup_{k,n} 〈µk,n(ω), X〉 ≥ [ρ(X)](ω),  ω ∈ Ω.

Because of (3.9) we also have that

[ρ(X)](ω) ≥ sup_{k,n} 〈µk,n(ω), X〉,  ω ∈ Ω.

Thus representation (4.10) follows with µiω := µk,n(ω), i = (k, n) ∈ N × N.


Remark 4 Under the assumptions of Lemma 1, we can also write the following representation:

[ρ(X)](ω) = sup_{µ(·)∈A(·)} 〈µ(ω), X〉, (4.11)

where the supremum is taken over all weakly∗ F1-measurable selections µ(ω) ∈ A(ω).

We can now formulate the main result of this section.

Theorem 2 Let ρ = ρX2|X1 be a positively homogeneous, lower semicontinuous, conditional risk mapping. Suppose that the space X2 is separable. Then for every probability measure ν on (Ω, F1) there exists a countable family νi ∈ PY2, i ∈ N, of probability measures on (Ω, F2) which agree with ν on F1 and are such that

ρω(·) = sup_{i∈N} Eνi[· | F1](ω), ω ∈ Ω. (4.12)

Proof. By Theorem 1, representation (3.9) holds. The assertion then follows by Lemma 1 together with Proposition 3.

Remark 5 If in Theorem 2 we remove the assumption that ρ is positively homogeneous, then by the above analysis, Theorem 1 implies the following extension of the representation (4.12):

ρω(·) = sup_{i∈N} { Eνi[· | F1](ω) − γi(ω) }, ω ∈ Ω, (4.13)

where

γi(ω) := sup_{X∈X2} { Eνi[X | F1](ω) − ρω(X) }. (4.14)

Remark 6 Assuming that the representation (4.12) holds, we have that for any Y ∈ X1, Y ≥ 0, and X ∈ X2 such that Y X ∈ X2,

ρω(Y X) = sup_{i∈N} Eνi[Y X | F1](ω) = Y(ω) sup_{i∈N} Eνi[X | F1](ω) = Y(ω) ρω(X), ω ∈ Ω.

That is, under the assumptions of Theorem 2, the result of Proposition 2 (i.e., equation (3.10)) holds for a general function Y ∈ X1 (compare with Remark 1).

5 Iterated Risk Mappings

In order to formulate dynamic programming equations for multistage stochastic optimization problems involving risk measures we need to consider compositions of several conditional risk mappings. In this section we give a preliminary analysis of this subject. Let F1 ⊂ F2 ⊂ F3 be sigma algebras, let X1 ⊂ X2 ⊂ X3 be respective spaces of measurable functions with dual spaces Y1 ⊂ Y2 ⊂ Y3, and let ρX3|X2 : X3 → X2 and ρX2|X1 : X2 → X1 be conditional risk mappings. (For any inclusion like X2 ⊂ X3, we assume that the topology of X2 is induced by the topology of X3.) Then it can be easily verified that the composite mapping ρX3|X1 : X3 → X1, defined by

ρX3|X1 := ρX2|X1 ∘ ρX3|X2, (5.1)

is also a conditional risk mapping.

Suppose that both conditional risk mappings at the right hand side of (5.1) are positively homogeneous and lower semicontinuous. We have then

[ρX3|X2(X)](ω) = sup_{µ2∈A2(ω)} 〈µ2, X〉, X ∈ X3, ω ∈ Ω, (5.2)

[ρX2|X1(Y)](ω) = sup_{µ1∈A1(ω)} 〈µ1, Y〉, Y ∈ X2, ω ∈ Ω, (5.3)

with the multifunctions A2 : Ω ⇒ PY3 and A1 : Ω ⇒ PY2 having closed convex values and being weakly∗ measurable with respect to F2 and F1, respectively. In order to analyze the composition (5.1) it is convenient to consider weakly∗ measurable selections µi(·) of the multifunctions Ai(·), i = 1, 2.

Proposition 4 Suppose that the space X3 is separable, and ρX2|X1 and ρX3|X2 are positively homogeneous, lower semicontinuous and satisfy conditions (A1)–(A3). Then the conditional risk mapping ρX3|X1 can be represented in the form

[ρX3|X1(X)](ω) = sup_{µ1∈A1(ω)} sup_{µ2(·)∈A2(·)} ∫Ω 〈µ2(ω), X〉 dµ1(ω), (5.4)

where the second sup operation at the right hand side of (5.4) is taken with respect to weakly∗ F2-measurable selections µ2(ω) ∈ A2(ω).

Proof. By (5.2) and (5.3) we have that, for every ω ∈ Ω,

[ρX3|X1(X)](ω) = sup_{µ1∈A1(ω)} ∫Ω sup_{µ2∈A2(ω)} 〈µ2, X〉 dµ1(ω).

By Lemma 1 (see Remark 4), we also have that

sup_{µ2∈A2(ω)} 〈µ2, X〉 = sup_{µ2(·)∈A2(·)} 〈µ2(ω), X〉, (5.5)

where the supremum in the right hand side of (5.5) is taken over all weakly∗ F2-measurable selections µ2(ω) ∈ A2(ω). Consequently,

[ρX3|X1(X)](ω) = sup_{µ1∈A1(ω)} ∫Ω sup_{µ2(·)∈A2(·)} 〈µ2(ω), X〉 dµ1(ω). (5.6)

Similarly to the proof of Lemma 1, we can now interchange the integral and ‘sup’ operators at the right hand side of (5.6), and hence (5.4) follows.


Remark 7 By Lemma 1 we have that actually it suffices to take the second supremum at the right hand side of (5.4) with respect to a countable number of weakly∗ F2-measurable selections µ2(ω) ∈ A2(ω).

Representation (5.4) means that ρX3|X1 can be written in form (3.9), with the set A(ω) formed by all measures µ ∈ Y3 representable in the form

µ(S) = ∫Ω [µ2(ω)](S) dµ1(ω), S ∈ F3, (5.7)

where µ2(·) ∈ A2(·) is a weakly∗ F2-measurable selection and µ1 ∈ A1(ω). We denote the multifunction A by A1 ∘ A2.
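The composition in (5.4) can be made concrete on a finite space. The sketch below uses hypothetical data; F1 is taken trivial and F2 discrete, so every assignment ω ↦ µ2(ω) is a measurable selection. It computes the nested value ρX2|X1(ρX3|X2(X)) and the right hand side of (5.4), which agree as Proposition 4 asserts.

```python
# Sketch of Proposition 4 / formula (5.4) on a finite space (hypothetical data).
# F1 is trivial and F2 = 2^Ω, so any assignment ω ↦ µ2(ω) ∈ A2(ω) is a
# weakly* F2-measurable selection.

import itertools

omega = [0, 1]
X = {0: 1.0, 1: -2.0}                            # X ∈ X3 (hypothetical values)

A2 = {0: [{0: 1.0, 1: 0.0}, {0: 0.5, 1: 0.5}],   # A2(ω): finite sets of
      1: [{0: 0.0, 1: 1.0}, {0: 0.5, 1: 0.5}]}   # probability vectors
A1 = [{0: 0.5, 1: 0.5}, {0: 0.8, 1: 0.2}]        # A1 (one set: F1 is trivial)

def pair(mu, Y):                                 # scalar product 〈µ, Y〉
    return sum(Y[w] * mu[w] for w in omega)

# nested composition (5.1): first ρ_{X3|X2}, then ρ_{X2|X1}
rho2X = {w: max(pair(m, X) for m in A2[w]) for w in omega}
nested = max(pair(m1, rho2X) for m1 in A1)

# right hand side of (5.4): sup over µ1 ∈ A1 and all selections ω ↦ µ2(ω)
flat = max(
    sum(pair(sel[w], X) * m1[w] for w in omega)
    for m1 in A1
    for sel in [dict(zip(omega, c)) for c in itertools.product(A2[0], A2[1])]
)
```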

Consider now a sequence of sigma algebras (a filtration) F1 ⊂ F2 ⊂ · · · ⊂ FT, with F1 = {∅, Ω} and FT = F. We define linear (locally convex topological vector) spaces X1 ⊂ · · · ⊂ XT of real valued functions on Ω such that all functions in Xt are Ft-measurable, together with the corresponding paired spaces Y1 ⊂ · · · ⊂ YT of measures, t = 1, . . . , T. Let ρXt|Xt−1, t = 2, . . . , T, be conditional risk mappings. Note that since F1 = {∅, Ω}, the space X1 consists of functions constant on Ω and can be identified with R, and hence ρX2|X1 is an (unconditional) risk function.

With the above sequence of conditional risk mappings we associate the following (unconditional) risk functions:

ρt := ρX2|X1 ∘ · · · ∘ ρXt−1|Xt−2 ∘ ρXt|Xt−1, t = 2, . . . , T. (5.8)

The recursive application of Proposition 4 yields the following result.

Theorem 3 Let ρXt+1|Xt, t = 1, . . . , T − 1, be positively homogeneous, lower semicontinuous, conditional risk mappings. Suppose that the spaces Xt, t = 2, . . . , T, are separable. Then for every X ∈ Xt, t = 2, . . . , T,

ρt(X) = sup_{µ∈A1∘···∘At−1} 〈µ, X〉, (5.9)

where each Aτ : Ω ⇒ PYτ+1 is weakly∗ Fτ-measurable and such that

[ρXτ+1|Xτ(X)](ω) = sup_{µ∈Aτ(ω)} 〈µ, X〉. (5.10)

Note that we always have (A1 ∘ A2) ∘ A3 = A1 ∘ (A2 ∘ A3), and therefore there is no ambiguity in the notation A1 ∘ · · · ∘ At−1.

Remark 8 Although formula (5.7) suggests a way of calculating the composition A1 ∘ · · · ∘ At−1 in the max-representation (5.9), its practical application is difficult. This seems to be not a drawback of the formula, but rather in the nature of the problem considered. In the classical setting of multistage stochastic programming the situation simplifies considerably if the underlying process satisfies the so-called between stages independence condition. That is what we discuss next.

Let ξ1, ..., ξT be a sequence of random vectors, ξt ∈ R^{dt}, on a probability space (Ω, F, P), representing the evolution of random data at times t = 1, ..., T. Let Ft be the sigma subalgebra of F generated by the random vector ξ[t] := (ξ1, ..., ξt), t = 1, ..., T. Clearly, the inclusions F1 ⊂ F2 ⊂ ... ⊂ FT hold. For p ∈ [1, +∞), we assume now that each space Xt is formed by functions of ξ[t] with finite p-th moment. That is, every X ∈ Xt can be represented in the form X(ω) = X(ξ[t](ω)) and ∫Ω |X|^p dP < +∞, i.e., Xt = Lp(Ω, Ft, P). We take then Yt := Lq(Ω, Ft, P) and use the corresponding scalar product of the form (2.3). With a slight abuse of notation we sometimes write an element X of Xt as X(ξ[t]) and an element h of Yt as h(ξ[t]). In this framework the set Aτ(ω) in the max-representation (5.10) is a function of ξ[τ]. It also makes sense here to talk about the “between stages independence” condition, in the sense that the random vectors ξt+1 and ξ[t] are independent for t = 1, ..., T − 1. Under this condition the dynamic programming equations, which will be discussed in section 7, simplify considerably.

6 Examples of Conditional Risk Mappings

In this section we discuss some examples of conditional risk mappings which can be considered as natural extensions of the corresponding examples of (real valued) risk functions (see [17]). We use the framework and notation of section 2, and take P to be a probability measure on (Ω, F2). Unless stated otherwise, all expectations and probability statements in this section are made with respect to P.

Example 2 Let Xi := Lp(Ω, Fi, P) and Yi := Lq(Ω, Fi, P), i = 1, 2, for some p ∈ [1, +∞). Consider

ρ(X) := E[X|F1] + c σp(X|F1), X ∈ X2, (6.1)

where c ≥ 0 and σp(·|F1) is the conditional upper semi-deviation:

σp(X|F1) := ( E[ (X − E[X|F1])_+^p | F1 ] )^{1/p}. (6.2)

If the sigma algebra F1 is trivial, then E[·|F1] = E[·] and σp(X|F1) becomes the upper semi-deviation of X of order p. Thus ρ is the conditional counterpart of the mean–pth-semideviation models of [10, 11].
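On a finite scenario set, (6.1)–(6.2) with p = 1 reduce to per-atom averages. The following sketch uses hypothetical scenario data (a four-point Ω of pairs (a, b), with F1 generated by the first coordinate a) and evaluates ρ atom by atom.

```python
# Discrete sketch of (6.1)-(6.2) with p = 1 (hypothetical scenario data).
# Ω consists of pairs (a, b); F1 is generated by the first coordinate a,
# so each value of a labels an F1-atom.

scenarios = {                      # (a, b) -> (probability, X(a, b))
    (0, 0): (0.25, 1.0), (0, 1): (0.25, 3.0),
    (1, 0): (0.25, 2.0), (1, 1): (0.25, 6.0),
}
c = 0.5                            # semideviation weight, c in [0, 1]

def cond_mean_semidev(a):
    """[ρ(X)](ω) on the atom {a}: E[X|a] + c E[(X - E[X|a])_+ | a]."""
    cell = [(p, x) for (i, _), (p, x) in scenarios.items() if i == a]
    total = sum(p for p, _ in cell)
    mean = sum(p * x for p, x in cell) / total
    semidev = sum(p * max(x - mean, 0.0) for p, x in cell) / total
    return mean + c * semidev

rho = {a: cond_mean_semidev(a) for a in (0, 1)}   # one value per F1-atom
```

As the formulas require, the result is an F1-measurable function: one number per atom of F1.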

Let us show that for c ∈ [0, 1], the above mapping ρ satisfies assumptions (A1)–(A3). Assumption (A3) can be verified directly. That is, if Y ∈ X1 and X ∈ X2, then

ρ(X + Y) = E[X + Y|F1] + c( E[ (X + Y − E[X + Y|F1])_+^p | F1 ] )^{1/p}
         = E[X|F1] + Y + c( E[ (X − E[X|F1])_+^p | F1 ] )^{1/p}
         = ρ(X) + Y.


In order to verify assumptions (A1) and (A2), consider the function ρω defined in (2.1). For ω ∈ Ω we can write

E[·|F1](ω) = Eµω[·], (6.3)

where µ(ω) = µω is the conditional probability of P with respect to F1 (see Example 1). Therefore, for any X ∈ X2 and ω ∈ Ω,

ρω(X) = Eµω[X] + c( Eµω[ (X − Eµω[X])_+^p ] )^{1/p}. (6.4)

We have that µω ∈ PY2|F1(ω) and its (conditional probability) density fω = dµω/dP has the following properties: fω ∈ Y2, fω ≥ 0, for any A ∈ F2 the function ω ↦ ∫A fω dP is F1-measurable and, moreover, for any B ∈ F1 the following equality holds:

∫B ∫A fω(ω′) dP(ω′) dP(ω) = P(A ∩ B).

We see that for a fixed ω the function ρω(X) is identical with the risk function analyzed in [17, Example 2]; the conditional measure µω plays the role of the probability measure. It follows from the analysis in [17] that, for c ∈ [0, 1], the function ρω(·) satisfies assumptions (A1) and (A2). Moreover, the representation

ρω(X) = sup_{γ∈A∗} ∫Ω γX dµω

holds with

A∗ = { γ = 1 + h − ∫Ω h dµω : ∫Ω h^q dµω ≤ c^q, h ≥ 0 }.

Since dµω = fω dP, we conclude that the representation (3.9) follows with

A(ω) = { g ∈ Y2 : g = fω(1 + h − E[fω h]), h ∈ cBq(ω), h ≥ 0 }, (6.5)

where

Bq(ω) := { h ∈ Y2 : E[h^q fω] ≤ 1 }.

Consider now the framework outlined in Remark 8. That is, there are two random vectors ξ1, ξ2, the sigma algebras F1 and F2 are generated by ξ1 and (ξ1, ξ2), respectively, an element X ∈ X2 is a function of (ξ1, ξ2), and the conditional expectations in (6.1) and (6.2) are taken with respect to the random vector ξ1. We have then that [ρ(X)](ξ1) is a function of ξ1. Now if ξ1 and ξ2 are independent and every X ∈ X2 is a function of ξ2 only, then the corresponding conditional expectations are independent of ξ1, and hence [ρ(X)](ξ1) is constant, and

ρ(X) = E[X] + c( E[ (X − E[X])_+^p ] )^{1/p}.

In that case ρ(X) can be viewed as the mean–pth-semideviation risk function.


Example 3 Let Xi := L1(Ω, Fi, P) and Yi := L∞(Ω, Fi, P), i = 1, 2. For constants ε1 > 0 and ε2 > 0, consider

ρ(X) := E[X|F1] + Φ(X|F1), X ∈ X2, (6.6)

where

[Φ(X|F1)](ω) := inf_{Z∈X1} E{ ε1[Z − X]_+ + ε2[X − Z]_+ | F1 }(ω). (6.7)

It is straightforward to verify that assumption (A3) holds here. Indeed, for X ∈ X2 and Y ∈ X1 we have

[Φ(X + Y|F1)](ω) = inf_{Z∈X1} E{ ε1[(Z − Y) − X]_+ + ε2[X − (Z − Y)]_+ | F1 }(ω).

By making the change of variables Z ↦ Z − Y, we obtain that Φ(X + Y|F1) = Φ(X|F1), and hence assumption (A3) follows.

Because of (6.3), we can write, as in the previous example, that

ρω(X) = Eµω[X] + inf_{Z∈X1} Eµω{ ε1[Z − X]_+ + ε2[X − Z]_+ } (6.8)
      = E[fω X] + inf_{Z∈X1} E{ ε1[fω Z − fω X]_+ + ε2[fω X − fω Z]_+ }, (6.9)

where fω = dµω/dP is the conditional density. We can continue now in a way similar to the analysis of Example 3 in [17]. We have that

E{ ε1[fω Z − fω X]_+ + ε2[fω X − fω Z]_+ } = sup_{h∈M} E[ h(fω X − fω Z) ], (6.10)

where

M := { h ∈ Y2 : −ε1 ≤ h(ω) ≤ ε2, a.e. ω ∈ Ω }. (6.11)

By substituting the right hand side of (6.10) into (6.9), we obtain

ρω(X) = E[fω X] + inf_{Z∈X1} sup_{h∈M} E[ h(fω X − fω Z) ].

Since the set M is compact in the weak∗ topology of Y2, we can interchange the ‘inf’ and ‘sup’ operators in the right hand side of the above equation. Also we have that

inf_{Z∈X1} E[ h fω Z ] = inf_{Z∈X1} E[ Z E[h fω|F1] ] = { 0, if E[h fω|F1] = 0; −∞, otherwise.

We obtain that

ρω(X) = E[fω X] + sup{ E[h fω X] : h ∈ M, E[h fω|F1] = 0 }.

It follows that for ε1 ∈ (0, 1] and ε2 > 0, assumptions (A1) and (A2) are satisfied, and representation (3.9) holds with

A(ω) = { g ∈ Y2 : g = (1 + h)fω, h ∈ M, E[h fω|F1] = 0 }, (6.12)


where M is defined in (6.11).

Since

ε1[Z − X]_+ + ε2[X − Z]_+ = ε1( Z + (1 − p)^{−1}[X − Z]_+ − X ),

where p := ε2/(ε1 + ε2), we have that

ρ(X) = (1 − ε1)E[X|F1] + ε1 CV@RX2|X1[X], (6.13)

where

CV@RX2|X1[X](ω) := inf_{Z∈X1} E{ Z + (1 − p)^{−1}[X − Z]_+ | F1 }(ω). (6.14)

Clearly, for ε1 = 1 we have that ρ(·) = CV@RX2|X1[·]. By the above analysis we obtain that for p ∈ (0, 1), CV@RX2|X1[·] is a positively homogeneous, continuous risk mapping. If F1 = {∅, Ω}, then CV@RX2|X1[·] becomes the Conditional Value at Risk function analyzed in [14, 15, 18]. For a nontrivial F1, the measure CV@RX2|X1 was analyzed in [12].
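For a discrete conditional distribution, the infimum in (6.14) is a one-dimensional problem on each F1-atom, attained at a p-quantile of the conditional distribution of X; it therefore suffices to search over the scenario values of X themselves. A sketch with hypothetical data:

```python
# Discrete sketch of the conditional CV@R formula (6.14) (hypothetical data).
# On each F1-atom the infimum over Z is attained at a p-quantile of the
# conditional distribution of X, so searching the scenario values suffices.

def cvar(dist, p):
    """inf_Z { Z + (1-p)^{-1} E[X - Z]_+ } for a finite distribution dist."""
    return min(
        z + sum(q * max(x - z, 0.0) for x, q in dist) / (1.0 - p)
        for z, _ in dist
    )

# conditional distribution of X on each F1-atom: list of (value, probability)
atoms = {
    "a0": [(1.0, 0.5), (3.0, 0.5)],
    "a1": [(2.0, 0.5), (6.0, 0.5)],
}
p = 0.5
cvar_cond = {a: cvar(dist, p) for a, dist in atoms.items()}
```

With p = 0.5 and two equally likely values per atom, the result on each atom is the mean of the worst half of the outcomes.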

7 Multistage Risk Optimization Problems

In order to construct risk models for multistage decision problems, we first introduce recursive risk models for sequences.

As in section 5, consider a sequence of sigma algebras F1 ⊂ F2 ⊂ · · · ⊂ FT, with F1 = {∅, Ω} and FT = F, and let X1 ⊂ · · · ⊂ XT be a corresponding sequence of linear spaces of Ft-measurable functions, t = 1, ..., T. Let ρXt|Xt−1 : Xt → Xt−1 be conditional risk mappings. Denote X := X1 × X2 × · · · × XT and X := (X1, X2, . . . , XT), where Xt ∈ Xt, t = 1, . . . , T, and define a function ρ : X → R as follows:

ρ(X) := X1 + ρX2|X1[ X2 + ρX3|X2( X3 + · · · + ρXT−1|XT−2[ XT−1 + ρXT|XT−1(XT) ] ) ]. (7.1)

Since F1 = {∅, Ω}, the space X1 can be identified with R, and hence ρ(X) is real valued. By assumption (A3) we have

XT−1 + ρXT|XT−1(XT) = ρXT|XT−1(XT−1 + XT).

Applying this formula for t = T, T − 1, . . . , 2 we obtain the equation

ρ(X) = ρT(X1 + · · · + XT), (7.2)

where, similarly to (5.8),

ρt := ρX2|X1 ∘ · · · ∘ ρXt|Xt−1, t = 2, ..., T. (7.3)


Since each conditional risk mapping ρXt|Xt−1 satisfies (A1)–(A3), it follows that the function ρT satisfies (A1)–(A3) as well. Moreover, if the conditional risk mappings ρXt|Xt−1 are positively homogeneous, then ρ is positively homogeneous. Assuming further that the spaces Xt are separable and the ρXt|Xt−1 are lower semicontinuous, we obtain by Theorem 3 that the following representation holds true:

ρ(X) = sup_{µ∈A} Eµ[X1 + · · · + XT], (7.4)

where the set A := A1 ∘ · · · ∘ AT−1 is given by the composition of the multifunctions Aτ : Ω ⇒ PYτ+1, τ = 1, . . . , T − 1, defined in equation (5.10) of Theorem 3.

It may be of interest to discuss the difference between our approach and a construction in Artzner et al. [2]. In [2] an adapted sequence Xt, t = 1, . . . , T, is viewed as a measurable function on a new measurable space (Ω′, F′), with Ω′ = Ω × {1, . . . , T}, and with the sigma algebra F′ generated by sets of the form Bt × {t}, for all Bt ∈ Ft and t = 1, . . . , T. Then representation (7.4), for some set A, can be derived from the axioms of coherent risk measures of [1]. In our setting these axioms correspond to the assumptions (A1)–(A3) for the trivial sigma algebra F1 = {∅, Ω′}, and to the positive homogeneity of the (unconditional) risk function ρ(X). Our approach is via axioms of conditional risk mappings, which allows for a specific analysis of the structure of the set A. This connects the theory of dynamic risk measures with the concept of conditional probability, which is crucial for the development of dynamic programming equations.

In applications, we frequently deal with random outcomes Xt ∈ Xt resulting from decisions zt in some stochastic system. In order to model this situation, we introduce linear spaces Zt of Ft-measurable functions² Zt : Ω → R^{nt} and consider functions ft : R^{nt} × Ω → R, t = 1, ..., T. With the functions ft we associate mappings Ft : Zt → Xt defined as follows:

[Ft(Zt)](ω) := ft(Zt(ω), ω), Zt ∈ Zt, ω ∈ Ω.

We assume that the functions ft(zt, ω) are random lower semicontinuous³, and that the mappings Ft are well defined, i.e., for every Zt ∈ Zt the function ft(Zt(·), ·) belongs to the space Xt, t = 1, ..., T. We say that the mapping Ft is convex if [Ft(·)](ω) is convex for all ω ∈ Ω. Then for every conditional risk mapping ρXt|Xt−1 satisfying (A1)–(A3), the function ρXt|Xt−1(Ft(·)) is convex in the sense that the function [ρXt|Xt−1(Ft(·))](ω) is convex for every ω ∈ Ω. This follows from assumptions (A1) and (A2) and can be shown in the same way as [17, Proposition 2].

Let Z = Z1 × Z2 × · · · × ZT, and let F : Z → X be defined as

F(Z) := (F1(Z1), . . . , FT(ZT)).

² Note that since F1 is trivial, the space Z1 coincides with R^{n1} and elements Z1 ∈ Z1 are n1-dimensional vectors.
³ Random lower semicontinuous functions are also called normal integrands (see Definition 14.27 in [16, p. 676]).


With the risk function ρ defined in (7.1) and the mapping F we can associate the function

ϑ(Z) := ρ(F(Z)) = F1(Z1) + ρX2|X1[ F2(Z2) + ρX3|X2( F3(Z3) + · · · + ρXT−1|XT−2[ FT−1(ZT−1) + ρXT|XT−1(FT(ZT)) ] ) ].

As discussed above, by the recursive application of [17, Proposition 2], it can be easily shown that ϑ(·) is a convex function. Also, by using (7.2) and (7.4) we can write

ϑ(Z) = ρT( F1(Z1) + F2(Z2) + · · · + FT(ZT) ) = sup_{µ∈A} ∫Ω [ f1(Z1) + f2(Z2(ω), ω) + · · · + fT(ZT(ω), ω) ] dµ(ω).

Suppose that we are given Ft-measurable, closed-valued multifunctions

Gt : R^{nt−1} × Ω ⇒ R^{nt}, t = 2, ..., T,

with G1 ⊂ R^{n1} being a fixed (deterministic) set. We define the set

S := { Z ∈ Z : Zt(ω) ∈ Gt(Zt−1(ω), ω), ω ∈ Ω, t = 1, . . . , T },

and consider the problem

Min_{Z∈S} ϑ(Z). (7.5)

We refer to problem (7.5) as the nested formulation of a multistage optimization problem. We shall derive dynamic programming equations for this problem.

In order to accomplish that, we need some mild technical assumptions. We assume that the spaces Xt are solid in the sense that for every two elements X′, X″ ∈ Xt and every Ft-measurable function Xt satisfying X′(·) ≤ Xt(·) ≤ X″(·), the function Xt is an element of Xt. For example, the spaces Lp(Ω, Ft, P), p ∈ [1, +∞], are solid. Furthermore, we assume that there exist elements X̄t ∈ Xt such that for all Z ∈ S we have Ft(Zt) ≤ X̄t, t = 1, . . . , T.

Remark 9 We also need the following result about interchangeability of the ‘min’ and ‘integral’ operators. Let (Ω, F) be a measurable space, X be a linear space of F-measurable functions X : Ω → R, and M be a linear space of F-measurable functions Z : Ω → R^n. It is said that the space M is decomposable if for every Z ∈ M and B ∈ F, and every bounded and F-measurable function W : Ω → R^n, the space M also contains the function V(·) := 1_{Ω\B}(·)Z(·) + 1_B(·)W(·) (Castaing and Valadier [4, p. 197], Rockafellar and Wets [16, p. 676]). For example, the spaces Lp(Ω, F, P; R^n), p ∈ [1, +∞), of F-measurable functions Z : Ω → R^n such that ∫Ω ‖Z‖^p dP < +∞, are decomposable.


Let the space M be decomposable and h : R^n × Ω → R be a random lower semicontinuous function. Then for every probability measure µ on (Ω, F) the following interchangeability formula holds:

∫Ω inf_{z∈R^n} h(z, ω) dµ(ω) = inf_{Z∈M} ∫Ω h(Z(ω), ω) dµ(ω) (7.6)

(Rockafellar and Wets [16, Theorem 14.60]). Now let ρ : X → R be a risk function. By using monotonicity of ρ, it is possible to extend the interchangeability formula (7.6) to risk functions as follows (cf. [17, Theorem 4]):

ρ( inf_{z∈R^n} h(z, ·) ) = inf_{Z∈M} ρ(HZ), (7.7)

where HZ(ω) := h(Z(ω), ω).
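On a finite space with a finite decision set, formula (7.6) can be verified by enumeration: integrating the pointwise minimum matches minimizing over all decision functions Z(·), precisely because Z may depend freely on ω. A sketch with hypothetical data:

```python
# Finite verification of the interchangeability formula (7.6) (hypothetical
# data): the integral of the pointwise minimum equals the minimum over all
# decision functions Z: Ω → R.

import itertools

omega_probs = {0: 0.2, 1: 0.3, 2: 0.5}
choices = [-1.0, 0.0, 1.0]            # finite stand-in for z ∈ R^n

def h(z, w):                          # a simple cost function h(z, ω)
    return (z - w) ** 2

# left hand side of (7.6): minimize pointwise, then integrate
lhs = sum(p * min(h(z, w) for z in choices) for w, p in omega_probs.items())

# right hand side: minimize over every function Z(·) from Ω to the choice set
rhs = min(
    sum(omega_probs[w] * h(Z[w], w) for w in omega_probs)
    for Z in (dict(zip(omega_probs, c))
              for c in itertools.product(choices, repeat=len(omega_probs)))
)
```

The best function Z(·) simply selects a pointwise minimizer at every ω, which is why the two sides coincide.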

Let us go back to the multistage problem (7.5). We assume that the spaces Zt, t = 1, ..., T, are decomposable. Problem (7.5) can be written in a more explicit form as follows:

Min_{Z1∈G1} Min_{Z2(·)∈G2(Z1,·)} · · · Min_{ZT(·)∈GT(ZT−1(·),·)} ρT[ F1(Z1) + F2(Z2) + · · · + FT(ZT) ]. (7.8)

Consider the minimization with respect to ZT in the above problem. Since the function ρT is a risk function, and in particular is monotone in the sense of (A2), and ZT is required to be only FT-measurable, the interchangeability formula (7.7) allows us to carry out this minimization inside the argument of ρT. We obtain the following equivalent formulation of (7.8):

Min_{Z1∈G1} Min_{Z2(·)∈G2(Z1,·)} · · · Min_{ZT−1(·)∈GT−1(ZT−2(·),·)} ρT[ F1(Z1) + F2(Z2) + · · · + FT−1(ZT−1) + inf_{zT∈GT(ZT−1(·),·)} fT(zT, ·) ].

Owing to FT-measurability of GT and random lower semicontinuity of fT(·, ·), the pointwise infimum inf_{zT∈GT(ZT−1(ω),ω)} fT(zT, ω) is FT-measurable (e.g., Rockafellar and Wets [16, Theorem 14.37]). By the assumption that XT is solid, this infimum (as a function of ω) is an element of XT.

Using the fact that

ρt := ρt−1 ∘ ρXt|Xt−1, t = 2, ..., T,

we can re-write the last problem as follows:

Min_{Z1∈G1} Min_{Z2(·)∈G2(Z1,·)} · · · Min_{ZT−1(·)∈GT−1(ZT−2(·),·)} ρT−1[ F1(Z1) + F2(Z2) + · · · + FT−1(ZT−1) + ρXT|XT−1( inf_{zT∈GT(ZT−1(·),·)} fT(zT, ·) ) ]. (7.9)


Our argument can now be repeated for T − 1, T − 2, . . . , 1. In order to simplify the notation, we define the following function:

QT(zT−1, ω) := [ρXT|XT−1( VT(zT−1) )](ω), (7.10)

where

[VT(zT−1)](ω) := inf_{zT∈GT(zT−1,ω)} fT(zT, ω). (7.11)

Repeating our analysis for t = T − 1, ..., 2, we move the minimization with respect to Zt inside the argument of ρt. We define

Qt(zt−1, ω) := [ρXt|Xt−1( Vt(zt−1) )](ω), (7.12)

where

[Vt(zt−1)](ω) := inf_{zt∈Gt(zt−1,ω)} { ft(zt, ω) + Qt+1(zt, ω) }. (7.13)

Of course, equations (7.12) and (7.13) for t can be combined into one equation:

[Vt(zt−1)](ω) = inf_{zt∈Gt(zt−1,ω)} { ft(zt, ω) + [ρXt+1|Xt( Vt+1(zt) )](ω) }. (7.14)

Finally, at the first stage we solve the problem

Min_{z1∈G1} Q2(z1), (7.15)

where Q2(z1) := ρX2|X1( V2(z1) ). Note again that the set G1 and the function Q2(z1) are deterministic, i.e., independent of ω.

We can interpret the functions Qt(zt−1, ω) as cost-to-go functions, and equations (7.12)–(7.13), or equivalently (7.14), as dynamic programming equations for the multistage risk optimization problem (7.5).

Suppose now that the conditional risk mappings ρXt|Xt−1 are lower semicontinuous and positively homogeneous. Then it follows from (3.9) that there exist convex closed sets At(ω) ⊂ PYt|Ft−1(ω) such that

[ρXt|Xt−1(Xt)](ω) = sup_{µt∈At(ω)} Eµt[Xt], ω ∈ Ω, Xt ∈ Xt. (7.16)

Substitution of (7.16) into (7.14) yields the following form of the dynamic programming equations:

[Vt(zt−1)](ω) = inf_{zt∈Gt(zt−1,ω)} { ft(zt, ω) + sup_{µt∈At(ω)} Eµt[ Vt+1(zt) ] }. (7.17)

The above dynamic programming equations provide a framework for extending the theory of multistage stochastic optimization problems to risk functions. The only difference is the presence of the additional ‘sup’ operation with respect to a set of conditional probabilities. This makes the dynamic programming equations more difficult to solve than in the expected value case, but then the problem being modeled is also much more difficult.
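The backward recursion (7.14), with the max-representation from (7.17) over a finite set of measures, can be sketched on a toy two-stage problem. All data below are hypothetical; the finite set `measures` plays the role of At(ω), here independent of ω because F1 is trivial, and the feasible sets are taken state-independent for brevity.

```python
# Toy backward recursion in the spirit of (7.14)/(7.17) (all data hypothetical).
# The conditional risk mapping is represented as a sup of expectations over a
# small set of probability measures, as in (7.16); F1 is trivial here.

measures = [{0: 0.5, 1: 0.5}, {0: 0.2, 1: 0.8}]   # stand-in for At(ω)

def f1(z1):                  # first-stage cost (deterministic)
    return 0.5 * z1

def f2(z1, z2, w):           # second-stage cost on scenario w
    return (z1 + z2 - 2 * w) ** 2

Z = [0, 1]                   # feasible sets G1 and G2 (state-independent here)

def V2(z1, w):               # value function (7.11): last-stage minimization
    return min(f2(z1, z2, w) for z2 in Z)

def Q2(z1):                  # cost-to-go (7.12) with the sup from (7.17)
    return max(sum(mu[w] * V2(z1, w) for w in (0, 1)) for mu in measures)

first_stage_value = min(f1(z1) + Q2(z1) for z1 in Z)   # problem (7.15)
```

Replacing `measures` by a singleton recovers the ordinary expected-value dynamic programming recursion; the extra ‘max’ is exactly the additional ‘sup’ operation discussed above.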


Remark 10 Suppose that the conditional risk mappings ρXt|Xt−1 are lower semicontinuous and positively homogeneous, and that every space Xt, t = 1, ..., T, is separable. We have then by Theorem 2 that there exist (countable) families Dt ⊂ PYt, t = 1, ..., T, of probability measures such that

[ρXt|Xt−1(Xt)](ω) = sup_{ν∈Dt} Eν[ Xt | Ft−1 ](ω). (7.18)

Note that equation (7.18) still holds if the set Dt is replaced by D∗t := cl[conv(Dt)], where the topological closure of the convex hull of Dt is taken in the paired topology of the space Yt. Since the set PYt is convex and closed in Yt, we have that D∗t ⊂ PYt. Substitution of (7.18) into (7.14), with Dt replaced by D∗t, yields the following form of the dynamic programming equations:

[Vt(zt−1)](ω) = inf_{zt∈Gt(zt−1,ω)} { ft(zt, ω) + sup_{ν∈D∗t} Eν[ Vt+1(zt) | Ft−1 ](ω) }. (7.19)

Suppose, further, that the problem is convex, i.e., the functions ft(·, ω) and the sets Gt(zt−1, ω) are convex for all ω ∈ Ω and zt−1. Then under various regularity conditions, the min-max problem in the right hand side of (7.19) has a saddle point (zt, νt) ∈ Gt(zt−1, ω) × D∗t. It follows then that an optimal solution of problem (7.5) satisfies the following system of dynamic equations:

[Vt(zt−1)](ω) = inf_{zt∈Gt(zt−1,ω)} { ft(zt, ω) + Eνt[ Vt+1(zt) | Ft−1 ](ω) }, (7.20)

where νt, t = 1, ..., T, can be viewed as worst case distributions. Moreover, for a given probability measure νt−1 on (Ω, Ft−1) we can construct Dt in such a way that every measure νt ∈ Dt, and hence every νt ∈ D∗t, coincides with νt−1 on Ft−1. In that way we can construct the worst case distributions νt in a consistent way, i.e., each νt coincides with νt−1 on Ft−1.

The dynamic programming equations simplify considerably if we assume the between stages independence condition. Following the framework of Remark 8, let ξ1, ..., ξT be a sequence of random vectors representing the evolution of the data. Suppose that each Xt ∈ Xt is a function of ξt, and that the objective functions ft(zt, ξt) and the multifunctions Gt(zt−1, ω) are actually functions of ξt. With a slight abuse of notation we denote them by Gt(zt−1, ξt) (it is also possible to consider dependence on all previous data ξ[t] = (ξ1, ..., ξt)). This implies that the cost-to-go function Qt(zt−1, ξt−1) is a function of zt−1 and ξt−1, and the value function Vt(zt−1, ξt) = [Vt(zt−1)](ξt) is a function of zt−1 and ξt (we change the notation in a corresponding way). Suppose that the between stages independence condition holds, i.e., ξt and ξ[t−1] are independent for t = 2, ..., T. Suppose, further, that [ρXt|Xt−1(Xt)](·) is constant for any Xt ∈ Xt, t = 2, ..., T (its values are constant functions on Ω). Under the between stages independence condition, this holds for the conditional mappings discussed in Examples 1–3. Then ρXt|Xt−1 maps every element of Xt into a constant, and hence ρXt|Xt−1 = ρt, where ρt is the composite mapping defined in (7.3). Furthermore, the cost-to-go functions

Qt(zt−1) = ρt( Vt(zt−1) ) (7.21)

are deterministic (independent of ξt−1), and equations (7.14) take on the form

Vt(zt−1, ξt) = inf_{zt∈Gt(zt−1,ξt)} { ft(zt, ξt) + ρt+1( Vt+1(zt) ) }. (7.22)

For illustration, let ρXt|Xt−1 be the mean–absolute-semideviation risk mapping of Example 2 (with p = 1). Then the corresponding cost-to-go function, defined in (7.12), can be written as

Qt(zt−1, ξt−1) = E[ Vt(zt−1, ξt) | ξt−1 ] + c E[ ( Vt(zt−1, ξt) − E[ Vt(zt−1, ξt) | ξt−1 ] )_+ | ξt−1 ].

If, moreover, ξt and ξt−1 are independent, then

Qt(zt−1) = E[ Vt(zt−1, ξt) ] + c E[ ( Vt(zt−1, ξt) − E[ Vt(zt−1, ξt) ] )_+ ].

Then the dynamic programming equations (7.22) become more transparent: at each stage we minimize the sum of the current cost ft(zt, ξt) and the (static) mean-semideviation risk function of the next value function Vt+1(zt).

Example 4 Consider the financial planning model governed by the equations

Σ_{i=1}^n zit = Wt and Σ_{i=1}^n ξi,t+1 zit = Wt+1, t = 0, . . . , T − 1,

where all zit ≥ 0. Here Wt denotes the wealth at stage t, and the zit are the positions in assets i = 1, . . . , n at stages t = 0, ..., T − 1. The random multipliers ξi,t+1 represent the changes in the investment values between stages t and t + 1. They are assumed to be non-negative.

Suppose that the between stages independence condition holds for the random process ξ1, ..., ξT, and that at every period t we want to maximize ρt[Wt], where ρt is a positively homogeneous risk function. For example, it may be the mean-semideviation or CV@R function. At the last stage, the value function VT(WT−1) is the optimal value of the problem

Max_{WT, zT−1} ρT(WT)
s.t. WT = Σ_{i=1}^n ξiT zi,T−1,
     Σ_{i=1}^n zi,T−1 = WT−1,
     zi,T−1 ≥ 0, i = 1, . . . , n. (7.23)


The wealth at the preceding stage, WT−1, is the parameter of this problem. Since ρT is positively homogeneous, we see that the optimal value is simply proportional to the wealth:

VT(WT−1) = WT−1 VT(1),

where VT(1) is the (nonnegative) optimal value of (7.23) for WT−1 = 1. We can use this fact at stage T − 2. Since ρT−1 is positively homogeneous and its argument, VT(WT−1), is linear, by a similar argument we conclude that

VT−1(WT−2) = VT(1) VT−1(1) WT−2,

where VT−1(1) is the optimal value of the problem obtained from (7.23) by replacing T with T − 1. Continuing in this way, we conclude that the optimal solution of the first stage problem is obtained by solving a problem of the form (7.23) with T = 1. That is, under the assumption of between stages independence, the optimal policy is myopic and employs single-stage risk models.
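The scaling VT(WT−1) = WT−1 VT(1) can be checked numerically. The sketch below uses a hypothetical two-asset, two-scenario model with a reward-form mean-semideviation criterion (a sign convention chosen only for this sketch, not taken from the paper): since the criterion is positively homogeneous, the optimal value is linear in the initial wealth.

```python
# Sketch of the homogeneity argument of Example 4 (hypothetical returns; the
# criterion is a reward-form mean-semideviation, an assumption of this sketch).
# A positively homogeneous criterion makes the optimal value linear in W.

returns = [(1.1, 0.9), (0.9, 1.3)]   # two equally likely scenarios, two assets
probs = [0.5, 0.5]
c = 0.5                               # semideviation weight

def crit(wealths):                    # E[W] - c E[(E[W] - W)_+]
    m = sum(p * w for p, w in zip(probs, wealths))
    sd = sum(p * max(m - w, 0.0) for p, w in zip(probs, wealths))
    return m - c * sd

def V(W, grid=101):                   # maximize over z >= 0 with sum(z) = W
    best = float("-inf")
    for k in range(grid):
        a = k / (grid - 1)
        z = (W * a, W * (1.0 - a))
        outcome = [sum(r * zi for r, zi in zip(rs, z)) for rs in returns]
        best = max(best, crit(outcome))
    return best

ratio = V(2.0) / V(1.0)               # equals 2 by positive homogeneity
```

Because the candidate allocations scale with W, every candidate value doubles when W doubles, so the grid search reproduces the proportionality exactly.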

Acknowledgement. The authors are indebted to Darinka Dentcheva for helpful discussions regarding weakly measurable selections of multifunctions.

References

[1] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath, Coherent measures of risk, Mathematical Finance 9 (1999) 203–228.

[2] P. Artzner, F. Delbaen, J.-M. Eber, D. Heath and H. Ku, Coherent multiperiod risk measurement, Manuscript, ETH Zurich, 2003.

[3] P. Billingsley, Probability and Measure, Wiley, New York, 1995.

[4] C. Castaing and M. Valadier, Convex Analysis and Measurable Multifunctions, Springer-Verlag, Berlin, 1977.

[5] P. Cheredito, F. Delbaen and M. Kupper, Coherent and convex risk measures for bounded cadlag processes, Working paper, ETH Zurich, 2003.

[6] F. Delbaen, Coherent risk measures on general probability spaces, Essays in Honour of Dieter Sondermann, Springer, 2002.

[7] H. Föllmer and A. Schied, Convex measures of risk and trading constraints, Finance and Stochastics 6 (2002) 429–447.

[8] M. Kijima and M. Ohnishi, Mean-risk analysis of risk aversion and wealth effects on optimal portfolios with multiple investment opportunities, Ann. Oper. Res. 45 (1993) 147–163.

[9] K. Kuratowski and C. Ryll-Nardzewski, A general theorem on selectors, Bull. Acad. Pol. Sc. 13 (1965) 397–403.

[10] W. Ogryczak and A. Ruszczynski, From stochastic dominance to mean–risk models: semideviations as risk measures, European Journal of Operational Research 116 (1999) 33–50.

[11] W. Ogryczak and A. Ruszczynski, On consistency of stochastic dominance and mean–semideviation models, Mathematical Programming 89 (2001) 217–232.

[12] G. Pflug and A. Ruszczynski, A risk measure for income processes, in: G. Szego (Ed.), Risk Measures for the 21st Century, John Wiley & Sons, 2004.

[13] F. Riedel, Dynamic coherent risk measures, Stochastic Processes and Their Applications 112 (2004) 185–200.

[14] R.T. Rockafellar and S.P. Uryasev, Optimization of conditional value-at-risk, The Journal of Risk 2 (2000) 21–41.

[15] R.T. Rockafellar, S. Uryasev and M. Zabarankin, Deviation measures in risk analysis and optimization, Research Report 2002-7, Department of Industrial and Systems Engineering, University of Florida.

[16] R.T. Rockafellar and R.J-B. Wets, Variational Analysis, Springer-Verlag, Berlin, 1998.

[17] A. Ruszczynski and A. Shapiro, Optimization of convex risk functions, E-print available at: http://www.optimization-online.org, 2004.

[18] A. Shapiro and S. Ahmed, On a class of minimax stochastic programs, SIAM Journal on Optimization 14 (2004) 1237–1249.
