VOLUMEN 11 NÚMERO 1 ENERO A JUNIO DE 2007 ISSN: 1870-6525
Morfismos Comunicaciones Estudiantiles Departamento de Matem´aticas Cinvestav
Editores Responsables • Isidoro Gitler • Jes´ us Gonz´alez
Consejo Editorial • Luis Carrera • Samuel Gitler • On´esimo Hern´andez-Lerma • Hector Jasso Fuentes • Miguel Maldonado • Ra´ ul Quiroga Barranco • Enrique Ram´ırez de Arellano • Enrique Reyes • Armando S´anchez • Mart´ın Solis • Leticia Z´arate
Editores Asociados • Ricardo Berlanga • Emilio Lluis Puebla • Isa´ıas L´opez • Guillermo Pastor • V´ıctor P´erez Abreu • Carlos Prieto • Carlos Renter´ıa • Luis Verde
Secretarias T´ecnicas • Roxana Mart´ınez • Laura Valencia ISSN: 1870 - 6525 Morfismos puede ser consultada electr´onicamente en “Revista Morfismos” en la direcci´on http://www.math.cinvestav.mx. Para mayores informes dirigirse al tel´efono 57 47 38 71. Toda correspondencia debe ir dirigida a la Sra. Laura Valencia, Departamento de Matem´aticas del Cinvestav, Apartado Postal 14-740, M´exico, D.F. 07000 o por correo electr´onico: laura@math.cinvestav.mx.
VOLUMEN 11 NÚMERO 1 ENERO A JUNIO DE 2007 ISSN: 1870-6525
Informaci´ on para Autores El Consejo Editorial de Morfismos, Comunicaciones Estudiantiles del Departamento de Matem´ aticas del CINVESTAV, convoca a estudiantes de licenciatura y posgrado a someter art´ıculos para ser publicados en esta revista bajo los siguientes lineamientos: • Todos los art´ıculos ser´ an enviados a especialistas para su arbitraje. No obstante, los art´ıculos ser´ an considerados s´ olo como versiones preliminares y por tanto pueden ser publicados en otras revistas especializadas. • Se debe anexar junto con el nombre del autor, su nivel acad´ emico y la instituci´ on donde estudia o labora. • El art´ıculo debe empezar con un resumen en el cual se indique de manera breve y concisa el resultado principal que se comunicar´ a. • Es recomendable que los art´ıculos presentados est´ en escritos en Latex y sean enviados a trav´ es de un medio electr´ onico. Los autores interesados pueden obtener el foron web mato LATEX 2ε utilizado por Morfismos en “Revista Morfismos” de la direcci´ http://www.math.cinvestav.mx, o directamente en el Departamento de Matem´ aticas del CINVESTAV. La utilizaci´ on de dicho formato ayudar´ a en la pronta publicaci´ on del art´ıculo. • Si el art´ıculo contiene ilustraciones o figuras, ´ estas deber´ an ser presentadas de forma que se ajusten a la calidad de reproducci´ on de Morfismos. • Los autores recibir´ an un total de 15 sobretiros por cada art´ıculo publicado.
• Los art´ıculos deben ser dirigidos a la Sra. Laura Valencia, Departamento de Matem´ aticas del Cinvestav, Apartado Postal 14 - 740, M´ exico, D.F. 07000, o a la direcci´ on de correo electr´ onico laura@math.cinvestav.mx
Author Information Morfismos, the student journal of the Mathematics Department of the Cinvestav, invites undergraduate and graduate students to submit manuscripts to be published under the following guidelines: • All manuscripts will be refereed by specialists. However, accepted papers will be considered to be “preliminary versions” in that authors may republish their papers in other journals, in the same or similar form. • In addition to his/her affiliation, the author must state his/her academic status (student, professor,...). • Each manuscript should begin with an abstract summarizing the main results.
• Morfismos encourages electronically submitted manuscripts prepared in Latex. Authors may retrieve the LATEX 2ε macros used for Morfismos through the web site http://www.math.cinvestav.mx, at “Revista Morfismos”, or by direct request to the Mathematics Department of Cinvestav. The use of these macros will help in the production process and also to minimize publishing costs. • All illustrations must be of professional quality.
• 15 offprints of each article will be provided free of charge.
• Manuscripts submitted for publication should be sent to Mrs. Laura Valencia, Departamento de Matem´ aticas del Cinvestav, Apartado Postal 14 - 740, M´ exico, D.F. 07000, or to the e-mail address: laura@math.cinvestav.mx
Lineamientos Editoriales “Morfismos” es la revista semestral de los estudiantes del Departamento de Matem´ aticas del CINVESTAV, que tiene entre sus principales objetivos el que los estudiantes adquieran experiencia en la escritura de resultados matem´ aticos. La publicaci´ on de trabajos no estar´ a restringida a estudiantes del CINVESTAV; deseamos fomentar tambi´en la participaci´ on de estudiantes en M´exico y en el extranjero, as´ı como la contribuci´ on por invitaci´ on de investigadores. Los reportes de investigaci´ on matem´ atica o res´ umenes de tesis de licenciatura, maestr´ıa o doctorado pueden ser publicados en Morfismos. Los art´ıculos que aparecer´ an ser´ an originales, ya sea en los resultados o en los m´etodos. Para juzgar ´esto, el Consejo Editorial designar´ a revisores de reconocido prestigio y con experiencia en la comunicaci´ on clara de ideas y conceptos matem´ aticos. Aunque Morfismos es una revista con arbitraje, los trabajos se considerar´ an como versiones preliminares que luego podr´ an aparecer publicados en otras revistas especializadas. Si tienes alguna sugerencia sobre la revista hazlo saber a los editores y con gusto estudiaremos la posibilidad de implementarla. Esperamos que esta publicaci´ on propicie, como una primera experiencia, el desarrollo de un estilo correcto de escribir matem´ aticas.
Morfismos
Editorial Guidelines “Morfismos” is the journal of the students of the Mathematics Department of CINVESTAV. One of its main objectives is for students to acquire experience in writing mathematics. Morfismos appears twice a year. Publication of papers is not restricted to students of CINVESTAV; we want to encourage students in Mexico and abroad to submit papers. Mathematics research reports or summaries of bachelor, master and Ph.D. theses will be considered for publication, as well as invited contributed papers by researchers. Papers submitted should be original, either in the results or in the methods. The Editors will assign as referees well–established mathematicians. Even though Morfismos is a refereed journal, the papers will be considered as preliminary versions which could later appear in other mathematical journals. If you have any suggestions about the journal, let the Editors know and we will gladly study the possibility of implementing them. We expect this journal to foster, as a preliminary experience, the development of a correct style of writing mathematics.
Morfismos
´ ˜ de existencia de Con este numero festejamos los primeros diez anos Morfismos. Con el apoyo decidido del personal acad´emico, estudiantil, t´ecnico y administrativo del Departamento de Matem´aticas del CINVES´ de los autores y revisores que activa TAV, as´ı como con la participacion y entusiastamente han colaborado durante este tiempo, Morfismos se ha ´ del conociubicado como un medio de la m´as alta calidad para la difusion miento acad´emico. Gracias a todos.
Feliz D´ecimo Aniversario, Morfismos!
We celebrate with this issue the tenth anniversary of Morfismos. Thanks to the support of staff and students at the Mathematics Department of CINVESTAV, as well as to the very valuable collaboration of authors and referees, nowadays Morfismos has been recognized as a high quality journal for the communication of results in Mathematics. Thanks to all of them.
Happy Tenth Anniversary, Morfismos!
Contenido Dice games and stochastic dynamic programming Henk Tijms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Average optimality for semi-Markov control processes Anna Ja´skiewicz and Andrzej S. Nowak . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Real options on consumption in a small open monetary economy: a stochastic optimal control approach Francisco Venegas-Mart´ınez . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Morfismos, Vol. 11, No. 1, 2007, pp. 1–14
Dice games and stochastic dynamic programming ∗ Henk Tijms
Abstract This paper uses dice games such as the game of Pig and the game of Hog to illustrate the powerful method of stochastic dynamic programming. Many students have difficulties in understanding the concepts and the solution method of stochastic dynamic programming, but using challenging dice games this understanding can be greatly enhanced and the essence of stochastic dynamic programming can be explained in a motivating way.
2000 Mathematics Subject Classification: 90C40, 91A15, 91A05. Keywords and phrases: stochastic dynamic programming, game of Pig, game of Hog, heuristic rule, optimal rule, two-person game, television game show.
1
Introduction
It is a great pleasure to make as Dutch mathematician this invited contribution about stochastic dynamic programming at the occasion of the 10th anniversary of the journal Morfismos of Mexican mathematics students: where a Dutch school of researchers made significant contributions to the field stochastic dynamic programming in the seventies and the eighties of the last century, it was in the nineties the Mexican school of researchers under leadership of professor On´esimo Hern´andezLerma that was very influential in the further development of the field. In this contribution we consider stochastic problems that are fun and instructive to work on. These problems are the dice game Pig and the ∗
Invited Article.
1
2
Henk Tijms
related dice game Hog. The game of Pig and the game of Hog are not only teaching treasures but involve challenging research problems as well. These control problems are of pedagogical use for • stochastic dynamic programming • Markov chains • game theory. The dice games of Pig and Hog are simple to describe, but it is not that simple at all to find the optimal strategies. Let us first describe the games. The game of Pig The game of Pig involves two players who in turn roll a die. The object of the game is to be the first player to reach 100 points. In each turn, a player repeatedly rolls a die until either a 1 is rolled or the player holds. If the player rolls a 1, the player gets a score zero for that turn and it becomes the opponent’s turn. If the player holds after having rolled a number other than 1, the total number of points rolled in that turn is added to the player’s total score and it becomes the opponent’s turn. At any time during a player’s turn, the player must choose between the two decisions “roll” or “hold”. The game of Hog The game of Hog (fast Pig) is a variation of the game of Pig in which players have only one roll per turn but may roll as many dice as desired. The number of dice a player chooses to roll can vary from turn to turn. The player’s score for a turn is zero if one or more of the dice come up with the face value 1. Otherwise, the sum of the face values showing on the dice is added to the player’s score. We will first analyze the single-player versions of the two stochastic control problems. For various optimality criteria in the single player problem, the stochastic dynamic programming approach for calculating an optimal control rule will discussed. The optimal control rule is rather complex and therefore its performance will also be compared with the performance of a simple heuristic rule.
Dice Games and Stochastic Dynamic Programming
2
3
The game of Pig
We first consider the single-player version of the game of Pig before we discuss the dynamic programming approach the case with two players. In the two-player’s case the goal is to be the first player reaching 100 points. For the single-player version the following two optimality criteria can be considered: • minimal expected number of turns to reach 100 points • maximal probability of reaching 100 points in a given number of turns. The optimal control rules can be calculated from the optimality equations from stochastic dynamic programming, but these optimal rules are rather complex and difficult to use in practice. Therefore we also consider the simple “hold at 20” heuristic and compare the performance of this heuristic with the performance of the optimal rule. The “hold at 20” rule is as follows: after rolling a number other than 1 in the current turn, the player holds that turn when the accumulated number of points during the turn is 20 or more. The rationale of this simple heuristic is easily explained. Suppose that k points have been accumulated so far in the current turn. If you roll again, the expected number of points you gamble away is 16 × k, while the expected number of additional points you gain is equal to 56 × 4, using the fact the expected value of the outcome of a roll of a die is 4 given that the outcome is not 1. The first value of k for which 16 × k ≥ 56 × 4 is k = 20. It turns out that the “hold at 20” heuristic performs very well the criterion is to minimize the expected number of turns to reach 100 points. As will be shown below, the expected value of the number of turns to reach 100 point is 12.367 when the “hold at 20” heuristic is used and this lies only 0.7% above the minimal expected value 12.545 that results when an optimal control rule is used. The situation is different for the criterion of maximizing the probability of reaching 100 points within a given number of turns. Under the “hold at 20” heuristic the probability of reaching 100 points within N turns has the respective values 0.0102, 0.0949, 0.3597, 0.7714, and 0.9429 for N =5, 7, 10, 15, and 20, whereas this probability has the maximum values 0.1038, 0.2198, 0.4654, 0.8322, and 0.9728 when an optimal rule is used. Thus, the “hold at 20” heuristic performs unsatisfactorily for the second optimality criterion.
4
2.1
Henk Tijms
Analysis of the heuristic rule
The analysis for the “hold at 20” heuristic is based on recurrence relations that are derived by arguments used to analyze absorbing Markov chains. Define µi as the expected value of the number of turns needed to reach a total score of 100 points when starting a new turn with a score of i points and using the “hold at 20” rule. The goal is to find µ0 . For a=0, 20, 21, 22, 23, 24, and 25, denote by α0,a the probability that the player will end up with exactly a points in any given turn under the “hold at 20” rule. Once the probabilities α0,a have been computed, we calculate the µi by a backwards recursion. By the law of conditional expectations, µi = 1 + µi α0,0 +
25 !
µi+a α0,a
for i = 99, 98 . . . , 0.
a=20
with the convention µk = 0 for k ≥ 100. Thus, initiating the recursion with µ99 = 1 + µ99 α0,0 , we compute successively µ99 , µ98 , . . . , µ0 . How to calculate the probabilities α0,a ? This goes along the same lines as the computation of the absorption probabilities in a Markov chain with absorbing states. For any fixed a, we use the intermediary probabilities αb,a for 0 ≤ b ≤ 19, where αb,a is defined as the probability that the current turn will end up with exactly a points when so far b points have been accumulated during the current turn and the “hold at 20 rule” is used. For a = 0, we find by the law of conditional probabilities that 6
αb,0
1 1 ! = + αb+j,0 6 6
for b = 19, 18, . . . , 0
j=2
with the convention αk,0 = 0 for k ≥ 20. For any a with 20 ≤ a ≤ 25, we find by conditioning that αb,a =
6 ! j=2
αb+j,a
1 6
for b = 19, 18, . . . , 0
with the convention αk,a = 1 for k = a and αk,a = 0 for k ≥ 20 and k ̸= a. Applying these recursion equations, we find α0,0 = 0.6245, α0,20 = 0.0997, α0,21 = 0.0950, α0,22 = 0.0742, α0,23 = 0.0542, α0,24 = 0.0352, and α0,25 = 0.0172. Next the value µ100 = 12.637 is calculated for the expected number of turns needed to reach 100 points if the “hold at 20 rule” is used.
Dice Games and Stochastic Dynamic Programming
5
How do we calculate the probability of reaching 100 points in no more than N turns under the “hold at 20” heuristic? To do so, we define Qn (i) for i < 100 and n ≥ 1 the probability Qn (i) as the probability of reaching 100 points in no more than n turns when the first turn is started with a score of i points and the “hold at 20 rule” is used . Also, let Qn (i) = 1 for any i ≥ 100 and n ≥ 1. If no more than a given number of N turns are allowed, the desired probability is QN (0). Using the law of conditional probabilities, it follows that the probabilities Qn (i) for n = 1, 2, . . . can be computed from the recursion Qn (i) = Qn−1 (i)α0,0 +
25 !
Qn−1 (i + a)α0,a
a=20
for i < 100 and n ≥ 1 with the boundary condition Q0 (j) = 1 for j ≥ 100 and Q0 (j) = 0 for j < 100.
2.2
Dynamic programming for the single-player version
In the optimality analysis of the single-player version, a state variable should be defined together with a value function. The state s of the system is defined by a pair s = (i, k), where i = the player’s score at the start of the current turn k = the number of points obtained so far in the current turn. We first consider the criterion of minimizing the expected number of turns to reach 100 points. For this criterion, the value function V (s) is defined by V (s) = the minimal expected value of the number of turns including the current turn to reach 100 points starting from state s We wish to compute V (0, 0) together with the optimal decision rule. This can be done from Bellman’s optimality equations. For k = 0, 6
!1 1 V (i, r). V (i, 0) = 1 + V (i, 0) + 6 6 r=2
For k ≥ 1 and i + k < 100, "
# 6 ! 1 1 V (i, k) = min V (i + k, 0), V (i, 0) + V (i, k + r) , 6 6 r=2
6
Henk Tijms
where V (i, k) = 0 for those (i, k) with i + k ≥ 100. The first term in the right side of the last equation corresponds to the decision “hold” and the second term corresponds to the decision “roll”. The optimality equation can be solved by the method of successive substitutions. Starting with V0 (s) = 0 for all s, the functions V1 (s), V2 (s), . . . are recursively computed from 6
!1 1 Vn−1 (i, r), Vn (i, 0) = 1 + Vn−1 (i, 0) + 6 6
n = 1, 2, . . .
r=2
and # 6 ! 1 1 Vn−1 (i, k + r) , Vn (i, k) = min Vn−1 (i + k, 0), V (i, 0) + 6 6 "
r=2
n = 1, 2, . . . .
By a basic result from the theory of stochastic dynamic programming (see for instance [1] or [2]), lim Vn (s) = V (s)
n→∞
for all s.
In the literature bounds are known for the difference |Vn (s) − V (s)|, providing a stopping criterion for the method of successive substitutions. Let us next consider the optimality criterion of maximizing the probability of reaching 100 points in no more than N turns with N a given integer. Then, we define for m = 0, 1, . . . , N the value function Pm (s) by Pm (s)
=
the maximal probability of reaching 100 points from state s if no more than m turns can be used including the current turn,
where Pm (s) = 1 for all s = (i, k) with i + k ≥ 100. The desired probability PN (0, 0) and the optimal decision rule can be calculated from Bellman’s optimality equation. For k = 0 and i = 99, 98, . . . , 0 6
!1 1 Pm (i, 0) = Pm−1 (i, 0) + Pm (i, r), 6 6 r=2
m = 1, . . . , N
Dice Games and Stochastic Dynamic Programming
7
and for i = 98, 97, . . . , 0 and k = 99 − i, . . . , 1 !
# 6 " 1 1 Pm (i, k) = min Pm−1 (i + k, 0), Pm−1 (i, 0) + Pm (i, k + r) , 6 6 r=2
1 ≤ m ≤ N.
The value functions P1 (s), P2 (s), . . . , PN (s) can be recursively calculated, using the fact that Pm (i, k) = 1 if i + k ≥ 100 and starting with $ 1 if i + k ≥ 100 P0 (i, k) = 0 if i + k < 100.
2.3
Dynamic programming for the two-players case
To conclude this section, we consider for the game of Pig the case of two players. The players alternate in taking turns rolling the die. The first player to reach 100 points is the winner. Since there is an advantage in going first in Pig, it is assumed that a toss of a fair coin decides which player begins in the game of Pig. Then, under optimal play of both players, each player has a probability of 50% of being the ultimate winner. But how to calculate the optimal decision rule. By the assumption that players alternate in taking turns rolling the die, the optimal decision rule can be computed by using standard dynamic programming techniques. In the final section of this paper we will consider a variant of the game of Hog in which in each round the two players have to decide simultaneously how many dice to roll, where the players cannot observe each other’s decision. Such a variant with simultaneous actions of both players in the same turn can also be considered for the game of Pig. Then, methods from standard dynamic programming cannot be longer used but instead one should use much more involved methods from game theory. The dynamic programming solution for the game of Pig with two players who alternate in taking turns proceeds as follows. The state s is defined by s = ((i, k), j), where (i, k) indicates that the player whose turn it is has a score i and has k points accumulated so far in the current turn and j indicates that the opponent’s score is j. Define the value function P (s) by P (s)
=
the probability of the player winning whose turn it is given that the present state is state s,
8
Henk Tijms
where P (s) is taken to be equal to 1 for those s = ((i, k), j) with i + k ≥ 100 and j < 100. To write down the optimality equations, we use the simple observation that the probability of a player winning after rolling a 1 or holding is one minus the probability that the other player will win beginning with the next turn. Thus, for state s = ((i, k), j) with k = 0, 6
!1 1 P ((i, 0), j) = [1 − P ((j, 0), i)] + P ((i, r), j). 6 6 r=2
For state s = ((i, k), j) with k ≥ 1 and i + k, j < 100, " P ((i, k), j)) = min 1 − P ((j, 0), i + k),
# 6 ! 1 1 [1 − P ((j, 0), i)] + P ((i, k + r), j) , 6 6 r=2
where the first expression in the right side of the last equation corresponds to the decision “hold” and the second expression corresponds to the decision “roll”. Using the method of successive substitution, these optimality equations can be numerically solved, yielding the optimal decision to take in any state s = ((i, k), j).
3
The game of Hog
We first give the analysis for the single-player version of the game. In the game of Hog (Fast Pig) the player has to decide in each turn how many dice to roll simultaneously. A similar heuristic as the “hold at 20” rule manifests itself in the game of Hog (Fast Pig). This heuristic is the “five dice” rule that prescribe to roll five dice in each turn. The rationale of this rule is as follows: five dice are the optimal number of dice to roll when the goal is to maximize the expected value of the score in a single turn. The expected value of the total score in a single turn with d dice is (1 − (5/6)d ) × 0 + (5/6)5 × 4d and this expression is maximal for d = 5. The number of turns needed to reach 100 points has the expected value 13.623 when the “five dice” rule is used, while the expected value of the number of turns needed to reach 100 points has the value 13.039 when an optimal decision rule is used. Again, a very good performance of the heuristic rule when the criterion is to minimize the expected number of
9
Dice Games and Stochastic Dynamic Programming
turns. However, the story is different when the criterion is to maximize the probability of reaching 100 points in no more than N turns with N given. This probability has the respective values 0.0056, 0.0610, 0.2759, 0.6993, and 0.9159 for N =5, 7, 10, 15, and 20 when the “five dice” rule is used, while the respective values are 0.0869, 0.1914, 0.4240, 0.8004, and 0.9631 under an optimal rule.
3.1
Analysis for the single-player version
For both the criterion of the expected number of turns to reach 100 points and the criterion of the probability to reach 100 points in a given number of turns, we will give a unified analysis that covers both the heuristic rule and the optimal rule. Instead of taking the state as the current score of the player, it is convenient to define the state as the number of points the player still needs to reach the goal when a new turn is about to begin. The decision d in any state s prescribes to roll simultaneously d dice. Denoting the set of possible decisions in state s by D(s), we can give a unified analysis by taking D(s) = {5} for the analysis of the “five dice” rule and taking D(s) = {1, 2, . . . , D} for the analysis of an optimal rule, where D is finite but large number. A key (d) ingredient in the computations are the probabilities qi to be defined by (d)
qi
=
the probability of obtaining i points in a turn when the decision is to roll d dice, (d)
To calculate this probabilities, we need the probability ri which is defined as the conditional probability that a roll of d dice gives i points given that no 1s are rolled. Using the fact that the conditional distribution of the outcome of the roll of a single die is uniformly distributed on the integers 2, . . . , 6 given that the outcome is not 1, it follows that (d) the ri can be recursively calculated from the convolution formula (d)
ri
=
6 ! 1 j=2
5
(d−1)
ri−j
for i = 2d, 2d + 1, . . . , 6d, (0)
(0)
(d)
and ri
= 0 otherwise, (d)
with the convention r0 = 1 and ri = 0 for i ̸= 0. Next, the qi follow from " #d " #d 5 5 (d) (d) (d) and qi = ri for i, d = 1, 2, . . . . q0 = 1 − 6 6
10
Henk Tijms
For the criterion of the expected number of turns to reach the goal, we define the value-function V (i) as the minimal expected number of additional turns to get i additional points when using the decision sets D(i) (in case D(i) = {5} for all i, the minimal expected number should of course be read as the expected number). The goal is to calculate V (100). Then, letting V (i) = 0 for i ≤ 0, we have the dynamic programming equation: ! # 6d " (d) 1 + q0 V (i) + V (i) = min qr(d) V (i − r) d∈D(i)
r=2d
or, equivalently, V (i) = min
d∈D(i)
!
1 (d)
1 − q0
$
1+
6d "
r=2d
qr(d) V (i − r)
%#
.
The function values V (i) can be computed recursively for i = 1, . . . , 100. For the criterion of the probability of reaching the goal within a given number of N turns, the value function Pm (i) is defined as the maximal probability to get i additional points when no more than m turns are allowed, where m runs from 1 to N . We wish to find PN (100). Letting Pm (i) = 1 for i ≤ 0, we have the dynamic programming equation: Pm (i) = min
d∈D(i)
!
(d) q0 Pm−1 (i)
+
6d "
qr(d) Pm−1 (i
r=2d
− r)
#
.
The recursion is initiated with the boundary condition P0 (i) = 1 for i ≤ 0 and P0 (i) = 0 for i > 0.
3.2
Analysis for the case of two players
To conclude this section, we consider for the game of Hog the original case of two players. The players alternate in taking turns rolling the die. The first player to reach 100 points is the winner. Since there is an advantage in going first in Hog, it is assumed that a toss of a fair coin decides which player begins in the game of Hog. The dynamic programming solution for the game of Hog with two players who alternate in taking turns proceeds as follows. The state defined as s = (i, j), where i indicates the number of points the player whose turn it is still needs for the winning score and j indicates the number of points the opponent
11
Dice Games and Stochastic Dynamic Programming
still needs for the winning score. Define the value function P (s) as the win probability of the player whose turn it is given that the present state is state s and both players act optimally in each turn. Then, for the states (i, j) with i, j > 0, the optimality equation is P (i, j) = max
d=1,...,D
!
(d)
q0 [1 − P (j, i)] +
6d "
r=2d
qr(d) [1 − P (j, i − r)]
#
,
with the convention P (j, k) = 0 for j > 0 and k ≤ 0, where D denotes the largest number of dice that can be rolled.
4
A game-theoretic problem
This section considers a variant of the game of Hog, where the two players have to take simultaneously a decision in each round of the game. This variant deals with a television game show and the problem is as follows. At the end of the television game show the two remaining contestants have to play a dice game. The contestants each sit behind a panel with a battery of buttons numbered as 1, 2, . . . , D, say D=10. In each stage of the game, both contestants must simultaneously press one of the buttons, where the contestants cannot observe each other’s decision. The number on the button pressed by the contestant is the number of dice that are thrown for the contestant. For each contestant the score of the throw for that contestant is added to his/her total, provided that none of the dice in that throw showed the outcome 1; otherwise no points are added to the current total of the candidate. The candidate who first reaches a total of 100 points is the winner. In case both candidates reach the goal of 100 points in the same move, the winner is the candidate who has the largest total. In the event of a tie, the winner is determined by a toss of a fair coin. At each stage of the game both candidates have full information about his/her own current total and the current total of the opponent. What does the optimal strategy look like? The computation and the structure of an optimal strategy is far more complicated than in the problems discussed before. The optimal rules for the decision problems considered before were deterministic, but the optimal strategy will involve randomized actions for the problem of the television game show. In zero-sum games randomization is a key ingredient of the optimal strategy.
12
Henk Tijms
We will give only an outline of the solution procedure. The rules of the game state that in each round the two players have to decide at the same moment upon the number of dice to use, so without seeing what the opponent is doing but knowing and using the scores so far. So, after a number of rounds player 1 still needs a points and player 2 needs b points. This describes the state of the system. If now player 1 decides to use k dice and player 2 uses l dice, then the state changes from (a, b) (k) (l) into (a − i, b − j) with probability qi qj . The game is a stochastic terminating zero-sum game. The value of the game is defined as the probability that player 1 will win minus the probability that player 2 will win, given that both players play optimally. Define ⎧ if a < b and a ≤ 0 ⎨ 1 0 if a = b ≤ 0 V (a, b) = ⎩ −1 if a > b and b ≤ 0. We want to determine V (a, b) for both a and b positive and the optimal, possibly randomized, actions that guarantee this value. The value of the game and the optimal moves of the two players can be computed by repeatedly solving the appropriate matrix games. Let x = (x1 , x2 , . . . , xD ) be a randomized move for player 1, i.e., player 1 rolls d dice with probability xd , where $ xd = 1. d
The first approach to think off is to recursively compute V (a, b) via a sequence of LP -problems, starting in (a, b) = (1, 1) and working backwards, step by step, until (a, b) = (G, G) with G = 100. This requires to solve the optimization problem: maximize V $ d
⎛
xd ⎝
$
i+j>0
subject to
(d) (l)
(d) (l)
⎞
qi qj V (a − i, b − j) + q0 q0 V ⎠ ≥ V, l = 1, . . . , D, xd ≥ 0, d = 1, . . . , D,
$
xd = 1 ,
d
where, for i + j > 0, the values V (a − i, b − j) have been computed before and hence are known. (V is unrestricted in sign.) However,
Dice Games and Stochastic Dynamic Programming
13
this optimization problem is not exactly an LP -problem because of the nonlinear term ! (d) (l) xd q0 q0 V. d
To make an LP -approach possible, we proceed as follows. Define V as the value of the game if it is played at most n times with a terminal reward 0, if after n steps the game has not yet reached the payoff-zone. Thus, V (0) (a, b) := 0 if a > 0 and b > 0. Also, define (n) (a, b)
V (n) (a, x, b, l) =
! d
xd
! i,j
(d) (l)
qi qj V (n−1) (a − i, b − j), n > 0 ,
with the convention that, for n ≥ 0 and a ≤ 0 or b ≤ 0, V (n) (a, b) = V (a, b). Then, in iteration n for state (a, b), the value of the game and an optimal move for player 1 can be obtained from the following LP -problem for a matrix game: maximize V
subject to
V (n) (a, x, b, l) ≥ V, l = 1, . . . , D, ! xd ≥ 0, d = 1, . . . , D, xd = 1. d
The optimal value V satisfies V = V (n) (a, b) and the optimal x(n) (a, b) represents an optimal move for player 1 in state (a, b) in iteration n. V (n) (a, x, b, l) converges exponentially fast to the value of the game, and x(n) is nearly optimal for n sufficiently large. Of course, for reasons of symmetry, the optimal move for player 2 in state (a, b) is the same as the optimal move for player 1 in state (b, a). The computations for an optimal strategy are formidable for larger values of D with D being the maximum number of dice that can be rolled. The computations reveal that the optimal strategy uses indeed randomized actions. For example, for the case of D = 5, player 1 uses 2, 4 or 5 dice with respective probabilities 0.172, 0.151 and 0.677 when player 1 still needs 1 point and player 2 still needs 3 points. Also, the numerical calculations reveal a kind of turnpike result: for states (i, j) sufficiently far from (0, 0) the players use non-randomized decisions only (for example in state (5,13) in which player 1 still needs 5 points and player 2 still needs 13 points, player 1 uses 4 dice and player 2 uses 5 dice when D = 5). It would be nice to have a theoretical proof of this intuitively obvious turnpike result
14
Henk Tijms
as well to have a theoretical proof of certain monotonicity properties of the optimal strategy. There are various modifications of the television game show possible. To mention a few: 1. Suppose that a player gets not only a score 0 but also loses all (or some of) the points collected so far if there is an outcome 1 in the throw of his dice. 2. Suppose the players know the outcomes of their own throws, but don’t know what the other player has been doing at all. This is a game with imperfect information. Is it possible to determine an optimal strategy? 3. Suppose that, in addition to the previous situation, you also know how many dice your opponent has used. This too is a game with imperfect information. Henk Tijms Department of Econometrics and Operations Research, Vrije University, Amsterdam, The Netherlands, tijms@feweb.vu.nl
References [1] Derman C., Finite State Markovian Decision Problems, Academic Press, New York, 1970. [2] Hern´andez-Lerma O., Adaptive Markov Control Processes, Springer Verlag, New York, 1989. [3] Neller T. W.; Presser C. G. M., Optimal play of the dice game Pig, The UMAP Journal 25 (2004), 25–47. See also the material on the website http://cs.gettysburg.edu/projects/pig/ [4] Tijms, H. C., Understanding Probability, Chance Rules in Everyday Life, 2nd edition, Cambridge University Press, Cambridge, 2007. [5] Tijms H. C.; Van der Wal J., A real-world stochastic two-person game, Probability in the Engineering and Informational Sciences 20 (2006), 599–608.
Morfismos, Vol. 11, No. 1, 2007, pp. 15–36
Average optimality for semi-Markov control processes ∗ Anna Ja´skiewicz
Andrzej S. Nowak
Abstract This paper is a survey of some recent results on the average cost optimality equation for semi-Markov control processes. We assume that the embedded Markov chain is V -geometric ergodic and show that there exist a solution to the average cost optimality equation as well as an (ε-)optimal stationary policy. Moreover, we also prove the equivalence of two optimality cost criteria: ratioaverage and time-average, in the sense that they lead to the same optimal costs and (ε-)optimal stationary policies.
2000 Mathematics Subject Classification: 90C40, 93E20, 60K15. Keywords and phrases: Borel state space, the average cost optimality equation.
1
Introduction
In this paper we deal with the ratio-average and time-average cost optimality criteria for semi-Markov control processes on a Borel space. First, we only assume that the one-step cost function is lower semianalytic, and the transition probability function satisfies certain ergodicity conditions. For such a model, a lower semianalytic solution to the optimality equation and a universally measurable ε-optimal stationary policy are shown to exist with respect to the ratio-average cost criterion. Next we indicate that additional regularity assumptions (either (B1) or (B2)) allow us to obtain either a Borel measurable or continuous solution to the optimality equation. Moreover, we show that in these cases there exists a Borel measurable optimal stationary policy. ∗
Invited article
15
16
Anna Ja´skiewicz and Andrzej S. Nowak
In order to establish the optimality equation we apply a fixed point theorem. The idea of using this method for semi-Markov control processes satisfying general ergodicity assumptions belongs to Vega-Amaya [26]. He solved the optimality equation in the case when regularity assumptions (B1) are satisfied. Then, his concept was applied by Ja´skiewicz [11] to semi-Markov models with lower semicontinuous cost functions and weakly continuous transition laws (assumptions (B2)). However, one can imagine examples of (semi-)Markov control processes with weakly continuous transition probabilities and one-step payoff functions, which are not neccessarily lower semicontinuous [12]. Moreover, there are examples of such models that meet neither conditions (B1) nor (B2). Nevertheless, the optimality equation can be still derived using a fixed point argument. This fact, in turn, allows us to consider the time-average cost criterion within such a general framework. Starting with the optimality equation we are able to obtain (ε-)optimal stationary policies and optimal costs with respect to the time-average cost criterion. The paper is organized as follows. First we recall certain terminology and facts concerning lower semianalytic functions and Borel as well as universally measurable selectors. Then we present the model and introduce our assumptions. In Section 4 we discuss a solution to the average cost optimality equation, whilst Section 5 is devoted to a study of the time-average cost criterion. We end this section with an example illustrating that the average cost optimality criteria may lead to different optimal policies and optimal costs.
2
Preliminaries
At the beginning we give the definitions of Borel, analytic and universally measurable sets and functions. For further and more complete terminology the reader is referred to [1]. Definition 1. We call X a Borel space, if X is a non-empty Borel subset of some Polish space, i.e., complete separable metric space, and it is endowed with σ-algebra B(X) of all its Borel subsets.
Let N N be the set of sequences of positive integers, equipped with the product topology. So N N is a Polish space. Let A be a separable metric space. Definition 2. A is called an analytic set or analytic space provided
Average optimality for semi-Markov control processes
17
there is a continuous function g on N N whose range g(N N ) is A. There are other equivalent definitions of analytic sets in a Borel space X. One posiibility is to define them as the projection on X of the Borel subsets of X × Y, where Y is some uncountable Borel space. Now let E be an analytic subset of an analytic space X and let p be any probability measure on the Borel subsets of X. Definition 3. E is universally measurable, if E is in the completion of the Borel σ-algebra with respect to every probability measure p. From now on let X and Y be Borel spaces. Let A(X) be the analytic σ-algebra and U(X) be the σ-algebra of all universally measurable subsets of X. Definition 4. We say that a function f : X "→ Y is analytically measurable [universally measurable] if f −1 (B) ∈ A(X) [f −1 (B) ∈ U(X)] for every B ∈ B(Y ).
We have B(X) ⊂ A(X) ⊂ U(X). Hence, every Borel measurable function is analytically measurable, and every analytically measurable function is universally measurable. Definition 5. Let B ⊂ X and f : B "→ R. If B is analytic and the set {x ∈ B : f (x) < c} is analytic for each c ∈ R, then f is said to be lower semianalytic (l.s.a.). Now we are in a position to recall some basic results on l.s.a. functions and universally measurable selectors. Lemma A. (Proposition 7.48 in [1]) Let f : X × Y "→ R be l.s.a., and q(dy|x) a Borel measurable stochastic kernel on Y given X. Then, the function f¯ : X "→ R defined by f¯(x) =
!
f (x, y)q(dy|x) Y
is l.s.a. Lemma B. (Jankov-von Neumann theorem) If K ⊂ X × Y is analytic, then there exists an analytically measurable function φ : projX (K) "→ Y such that Gr(φ) := {(x, y) : y = φ(x), x ∈ projX (K)} ⊂ K. For the proof the reader is referred to [1], p. 182. This lemma brings us to the following selection theorem for l.s.a. functions.
18
Anna Ja´skiewicz and Andrzej S. Nowak
Lemma C. Let K ⊂ X × Y be analytic and f : K #→ R be l.s.a. Define f ∗ : projX (K) #→ R by f ∗ (x) = inf f (x, y), y∈Y (x)
with Y (x) := {y ∈ Y : (x, y) ∈ K}. Then, the following holds (a)f ∗ is l.s.a. function, (b) the set I = {x ∈ projX (K)|for some yx ∈ Y (x), f (x, yx ) = f ∗ (x)} is universally measurable, and for every ε > 0 there exists a universally measurable function φ : projX (K) #→ Y such that Gr(φ) ⊂ K and for all x ∈ projX (K) f (x, φ(x)) = f ∗ (x), if x ∈ I,
and f (x, φ(x)) ≤ f ∗ (x) + ε, if x ̸∈ I.
Part (a) follows from the proof of Proposition 7.47 in [1], whilst part (b) is a consequence of Proposition 7.50 in [1]. Definition 6. Let K be a Borel set and projX (K) = X. It is said that K admits a graph (Borel measurable selection or uniformization), if there exists a Borel measurable function φ : X #→ Y such that φ(x) ∈ Y (x). It is worth mentioning that a Borel set K need not have a graph [2]. However, it is well-known that if Y (x) is σ-compact for each x ∈ X, then K contains a graph [3].
3
The model
A semi-Markov control process is described by the following objects: (i) The state space X is a standard Borel space. (ii) A is a Borel action space. (iii) K is a non-empty analytic subset of X × A. We assume that for each x ∈ X, the non-empty x-section A(x) = {a ∈ A : (x, a) ∈ K} of K represents the set of actions available in state x.
Average optimality for semi-Markov control processes
19
(iv) Q(·|x, a) is a regular transition measure from X ×A into R+ ×X, where R+ = [0, ∞). It is assumed that Q(D|x, a) is a Borel function on X × A for any Borel subset D ⊂ R+ × X and Q(·|x, a) is a probability measure on R+ × X for any x ∈ X, a ∈ A(x). Denote ˆ a) := Q([0, t] × X|x, ˆ a) Q(t, X|x, ˆ ⊂ X. If a ∈ A(x) is selected in state x, then for any Borel set X ˆ Q(t, X|x, a) is the joint probability that the sojourn time is not greater ˆ By H(·|x, a) denote a distribution than t ∈ R+ and the next state y ∈ X. of the sojourn time when the process is in state x and action a ∈ A(x) is selected, that is, H(t|x, a) = Q(t, X|x, a). Let τ (x, a) be the mean holding time, i.e., ! ∞
τ (x, a) =
tH(dt|x, a).
0
Put q(·|x, a) := Q(R+ , ·|x, a). Then, q is called the transition law of the embedded Markov process. Moreover, the distribution of the sojourn time and the next state are independent conditional on (x, a), i.e., ˆ a) = q(X|x, ˆ a)H(t|x, a). Q(t, X|x, (v) Let ci : K %→ R, i = 1, 2. Then, the expected one-step cost function c : K %→ R equals c(x, a) = c1 (x, a) + τ (x, a)c2 (x, a).
Here c1 is an immediate cost paid by the decision maker at the transition time and the cost c2 is incurred until the next transition occurs. Put T0 := 0. Let {Tn } denote a sequence of random decision (jump) epochs. If the initial state is x = x0 and the action a0 ∈ A(x) is selected, then the immediate cost c1 (x, a0 ) is incurred for the decision maker and the process remains in state x up to the time T = T1 − T0 = T1 . The cost c2 (x, a0 ) per unit time is incurred until the next transition occurs. Afterwards the system jumps to the state x1 according to the probability measure q(·|x, a0 ). The decision maker chooses again an action a1 ∈ A(x1 ) and the process remains in state x1 for a random time T2 − T1 . The cost c1 (x1 , a1 ) + (T2 − T1 )c2 (x1 , a1 ) is incurred and a new state x2 is generated according to the distribution q(·|x1 , a1 ). This situation repeats itself yielding a trajectory (x0 , a0 , t1 , x1 , a1 , t2 , . . .) of some stochastic process, where xn and an describe the state and the chosen action, respectively, on the nth stage of the process. Obviously, tn is a
20
Anna Ja´skiewicz and Andrzej S. Nowak
realization of the random variable Tn , and a distribution function of the random holding time Tn+1 − Tn is H(·|xn , an ). A policy is a sequence π = {πn } where πn (n ≥ 0) is a universally measurable stochastic kernel on A given (X × A × R+ )n × X satisfying πn (A(xn )|hn ) = 1 for any history hn = (x0 , a0 , t1 , . . . , xn ) of the process (Clearly, h0 = x0 .) By Π0 we denote the class of all policies. Let F 0 be the set of all universally measurable transition probabilities f from X to A such that f (x) ∈ A(x) for each x ∈ X. A stationary policy π is of the form π = {f, f, . . .}, where f ∈ F 0 . Thus, every stationary policy π = {f, f, . . .} can be identified with the mapping f ∈ F 0 . Since K is analytic, the Jankov-von Neumann theorem guarantees that there exists at least one f ∈ F 0 . Therefore, F 0 and Π0 are non-empty. Let Ω = (K × R+ )∞ be the space of all infinite histories of the process endowed with U (σ-algebra of universally measurable sets in Ω). According to Proposition 7.45 in [1], for any π ∈ Π and an initial state x0 = x ∈ X there exists a unique probability measure Pxπ defined on Ω. By Exπ we denote the expectation operator with respect to Pxπ . Let π ∈ Π0 , x ∈ X and t ≥ 0 be fixed. Put N (t) := max{n ≥ 0 : Tn ≤ t} as the counting process. By our assumptions, which are presented below, Pxπ (N (t) < ∞) = 1 (see Remark 2 in [10]). We shall consider the two average expected costs: - the ratio-average cost J(x, π) := lim sup n→∞
- the time-average cost j(x, π) := lim sup
Exπ
!"
#
Exπ
!"
#
n−1 k=0 c(xk , ak ) , " Exπ n−1 k=0 (τ (xk , ak ))
t→∞
N (t) k=0 c(xk , ak )
t
.
For functions J(x, π) and j(x, π) we define the optimal costs as J(x) := inf J(x, π), π∈Π0
j(x) := inf j(x, π). π∈Π0
A policy π ε is called ε-optimal with respect to the ratio-average cost criterion if J(x, π ε ) − ε ≤ J(x)
Average optimality for semi-Markov control processes
21
for all x ∈ X. In a similar way we define the ε-optimality with respect to the time-average cost criterion. Now we are in a position to introduce our assumptions. (B0) Basic assumptions: (i) the set K is analytic; (ii) there exist a constant B > 0 and a Borel measurable function V : X "→ [1, +∞) such that |c(x, a)| ≤ BV (x) and |τ (x, a)| ≤ BV (x) for every (x, a) ∈ K; (iii) the function τ is Borel measurable, whilst c is l.s.a on K. (GE) V -geometric ergodicity assumptions: (i) there exists a Borel set C ⊂ X such that for some λ ∈ (0, 1) and η > 0, we have !
X
V (y)q(dy|x, a) ≤ λV (x) + η1C (x)
for each (x, a) ∈ K; V is the function introduced in (B0); (ii) the function V is bounded on C, i.e., vC := sup V (x) < ∞; x∈C
(iii) there exist some δ ∈ (0, 1) and a probability measure µ concentrated on the Borel set C with the property that
q(D|x, a) ≥ δµ(D) for each Borel set D ⊂ C, x ∈ C and a ∈ A(x).
For any function u : X "→ R define the V-norm ∥u∥V := sup
x∈X
|u(x)| . V (x)
Under (GE) the embedded state process {xn } governed by a stationary policy f ∈ F 0 is a positive recurrent aperiodic Markov chain and there exists a unique invariant probability measure πf (consult Theorem 11.3.4 and page 116 in [16]). Moreover, by Theorem 2.3 in [17], {xn } is V -ergodic, that is, there exist θ > 0 and α ∈ (0, 1) such that (1)
! " "! " " n u(y)q (dy|x, f (x)) − u(y)πf (dy)" ≤ V (x)∥u∥V θαn " X
X
22
Anna Ja´skiewicz and Andrzej S. Nowak
for every u with ∥u∥V < ∞, and x ∈ X, n ≥ 1. Here q n (·|x, f (x)) denotes the n-stage transition probability induced by q and a stationary policy f. As an immediate consequence of (1), one can easily get ! c(x, f (x))πf (dx) , J(f ) := J(x, f ) = ! X
(2)
X
τ (x, f (x))πf (dx)
for every f ∈ F 0 .
Lemma 1. Let (GE) hold. Then (a) inf f ∈F 0 πf (C) ≥
(b) supf ∈F 0
!
X
1−λ η ;
V (y)πf (dy) ≤
η 1−λ ;
Proof. Let the process be governed by a stationary policy f ∈ F 0 . Integrating both sides of (GE, i) with respect to the invariant probability measure πf we get "
X
V (y)πf (dy) ≤ λ
"
X
V (y)πf (dy) + ηπf (C).
Now part (a) easily follows from the fact that V ≥ 1, whilst part (b) is a consequence of πf (C) ≤ 1. ✷ We also make two additional assumptions on the sojourn time T.
(R) Regularity condition: there exist κ > 0 and β < 1 such that for all x ∈ C and a ∈ A(x).
H(κ|x, a) ≤ β
(I) Uniform integrability condition: lim sup sup [1 − H(t|x, a)] = 0.
t→∞ x∈C a∈A(x)
Assumption (R) ensures that an infinite number of transitions does not occur in a finite time interval. Ross [20], Sennott [21] and Yushkevich [25] assume that assumption (R) holds for the whole state space. However, we require (R) to hold only for the states x ∈ C. This is because condition (GE) implies that the embedded Markov process governed by any policy returns to the set C within the finite number of transitions with probability 1. Therefore, we have to control its behaviour only on the set C. From (R) one can easily deduce that (3)
τ (x, a) ≥ κ(1 − β) for x ∈ C and a ∈ A(x).
For broader discussion of the assumptions the reader is referred to [7, 9, 13, 16, 17, 22].
Average optimality for semi-Markov control processes
4
23
The average cost optimality equation
We begin with an auxiliary result that enables us to replace the function V used in (GE) by a new function W. The W -norm defined below will play an essential role in the proofs of our main results. Lemma 2. Let assumption (GE) hold. Then, there exist a measurable function W > V and a constant λ′ ∈ (0, 1) such that !
X
′
W (y)q(dy|x, a) ≤ λ W (x) + δ1C (x)
!
W (y)µ(dy). C
Proof. Define W (x) := V (x) + ηδ . Then, simple calculations give !
!
"
#
η η ≤ λV (x) + + η1C (x) δ δ X " # ! λ + ηδ η + W (x) + 1C (x)δ V (y)µ(dy) . ≤ 1 + ηδ δ C
W (y)q(dy|x, a) = X
V (y)q(dy|x, a) +
Hence, the result holds with λ′ =
λ+ ηδ 1+ ηδ
.✷
For any function u : X #→ R we define the W -norm as ∥u∥W := sup
x∈X
|u(x)| . W (x)
From now on, we shall take into consideration the functions, which have finite W -norm. Moreover, note that iff ∥u∥W < ∞.
∥u∥V < ∞
Let L0W denote the set of all l.s.a. functions whose W -norm is finite. Note that L0W is a Banach space. For any (x, a) ∈ K set p(·|x, a) := q(·|x, a) − 1C (x)δµ(·). Observe that from Lemma 1 we have (4)
!
X
W (y)p(dy|k) =
!
X
W (y)p(dy|x, a) ≤ λ′ W (x).
Put g := inf J(f ). f ∈F 0
24
Anna Ja´skiewicz and Andrzej S. Nowak
From (B0), (GE, i) and (R) we conclude that g < ∞. Indeed, by (2) and (3) ! ! B X V (x)πf (dx) |c(x, f (x))|πf (dx) X ≤ . |J(f )| ≤ ! X
τ (x, f (x))πf (dx)
Now Lemma 1 yields
g≤
κ(1 − β)πf (C)
η B 1−λ
1−λ η κ(1
− β)
.
For any function u ∈ L0W define the operator T in the following way (5)
(Tu)(x) := inf
a∈A(x)
"
#
c(x, a) − gτ (x, a) +
$
u(y)p(dy|x, a) X
for all x ∈ X.
Theorem 1. Assume (B0, GE, R). (a) There exist a constant g ∗ and a function h ∈ L0W such that (6)
h(x) = inf
a∈A(x)
"
c(x, a) − g ∗ τ (x, a) +
#
h(y)q(dy|x, a) X
$
for all x ∈ X.(b) For any ε > 0 there exists a universally measurable function f ε ∈ F such that
(7) h(x) ≥ c(x, f ε (x)) − g ∗ τ (x, f ε (x)) +
#
X
h(y)q(dy|x, f ε (x)) − ε
for all x ∈ X.(c) Moreover, g ∗ = g = inf π∈Π0 J(x, π) and g ∗ ≥ J(f ε ) − ε. The proof of Theorem 1 is similar to that of Theorem 1 in [12] and makes use of an idea presented by Vega-Amaya [26]. We first notice that by (4) the operator T is contractive and maps the Banach space L0W into itself. This follows from our assumption (B0) and (4). Hence, from the Banach fixed point theorem there exists h ∈ L0W such that (8)
h(x) =
inf
a∈A(x)
"
c(x, a) − gτ (x, a) + $
#
#
h(y)q(dy|x, a)
X
−1C (x)δ h(y)µ(dy) . X
Clearly, if x ̸∈ C then (8) becomes (6). If, on the other hand, x ∈ C we define # h(x)µ(dx) d := −δ C
Average optimality for semi-Markov control processes
25
and are going to show that d = 0. On the contrary, we assume that d ̸= 0 and proceed along the same lines as in [12]. Therefore, optimality equation (6) is satisfied with g ∗ := g and the function h. Then, part (b) follows directly from Lemma C(b), whilst part (c) is an immediate consequence of a standard dynamic programming argument (see [7, 9]). Now we shall describe more specific results, when certain assumptions of regularity are imposed. Assumptions (B1) correspond to the model with a strongly continuous transition probability function, whilst conditions (B2) are in agreement with a semi-Markov control process, whose transition law is weakly continuous. (B1) Basic assumptions: (i) the set K is Borel and A(x) is compact for any x ∈ X;
(ii) for each x ∈ X, c(x, ·) is lower semicontinuous and τ (x, ·) is continuous on A(x); (iii) for each x ∈ X and Borel set D ⊂ X, the function q(D|x, ·) is continuous on A(x); (iv) there exists a constant B > 0 and a Borel measurable function V : X $→ [1, +∞) such that |c(x, a)| ≤ BV (x) and |τ (x, a)| ≤ BV (x) for every (x, a) ∈ K;
(v) for each x ∈ X, the function !
X
V (y)q(dy|x, ·)
is continuous on A(x). (B2) Basic assumptions: (i) the set K is Borel, A(x) is compact for any x ∈ X, and moreover, the set-valued mapping x $→ A(x) is upper semicontinuous, that is, {x ∈ X : A(x) ∩ D ̸= ∅} is closed for every closed set D in A; (ii) c is lower semicontinuous and τ is continuous on K; (iii) the transition law q is weakly continuous on K, that is, !
X
u(y)q(dy|x, a)
26
Anna Ja´skiewicz and Andrzej S. Nowak
is a continuous function of (x, a) ∈ K for each bounded and continuous function u; (iv) there exists a constant B > 0 and a continuous function V : X "→ [1, +∞) such that |c(x, a)| ≤ BV (x) and |τ (x, a)| ≤ BV (x) for every (x, a) ∈ K; (v) the function
!
X
is continuous in (x, a) ∈ K;
V (y)q(dy|·, ·)
" > 0 (recall that (vi) there exists an open set C" ⊂ C such that µ(C) µ is a probability measure on the set C).
Let Π [F ] be the set of all Borel measurable [stationary Borel measurable] policies. Note that since either (B1, i) or (B2, i) hold, then from Corollary 1 in [3] F is non-empty. Thus, let k ∈ F. Define g = inf J(f ). f ∈F
Remark 1. It is worth pointing out that under (B1, i) or (B2, i) the optimal costs within the classes F to F 0 are same, that is, g = inf J(f ) = inf J(f ).
(9)
f ∈F
f ∈F 0
Indeed, let f ∈ F 0 and D be any Borel subset of X. Then, by the definition of πf we have πf (D) =
!
X
q(D|x, f (x))πf (dx) =
! ! X
A
q(D|x, a)f (da|x)πf (dx).
ˆ ⊂ X and a Borel From Lemma 7.28(c) in [1], there exist a Borel set X ˆ ˆ ˆ measurable function f : X "→ A such that πf (X) = 1 and fˆ(x) = f (x) ˆ Now define for each x ∈ X. ∗
f (x) =
#
ˆ fˆ(x), x ∈ X, k(x), otherwise.
Hence, f ∗ ∈ F. Further, we observe that ν(D) =
!
X
q(D|x, f ∗ (x))ν(dx),
with πf = ν.
27
Average optimality for semi-Markov control processes
Since our assumptions (GE) imply the uniqueness of an invariant probability measure, we conclude that ν = πf ∗ . Therefore, (9) holds. An analogous conclusion may be drawn in Lemma 1. It allows to re-formulate the lemma with the set F instead of F 0 . Observe that from our discussion in Section 2 it follows that we cannot only assume K is a Borel set. This is because it may occur that F = ∅, and consequently inf f ∈F J(f ) = ∞. Therefore, we add the additional assumption: compactness of A(x), in order to apply Corollary 1 in [3]. By BW [LW ] we denote the space of all Borel measurable [lower semicontinuous] functions that have finite W -norm. Now we are ready to present our next results. Theorem 2. Assume (B1) [(B2)] and (GE, R). (a) There exist a constant g ∗ and a function h ∈ BW [h ∈ LW ] such that (10)
h(x) = inf
a∈A(x)
!
"
∗
c(x, a) − g τ (x, a) +
h(y)q(dy|x, a) X
#
for all x ∈ X. (b) There exists a Borel measurable function f$ ∈ F such that h(x) = c(x, f$(x)) − g ∗ τ (x, f$(x)) +
"
X
h(y)q(dy|x, f$(x))
for all x ∈ X. (c) Moreover, g ∗ = g = inf π∈Π J(x, π) and g ∗ = J(f$).
The average cost optimality equation for semi-Markov control processes with strongly continuous transition probability functions satisfying quite general ergodicity assumptions has been established in a few papers [8, 9, 26]. For instance, Hern´andez-Lerma and Luque-V´asquez [8] apply the so-called Schweitzer’s data transformation, Ja´skiewicz [9] examines auxiliary perturbed models, whilst Vega-Amaya [26] makes use of a fixed point theorem, which directly leads to the solution of the optimality equation. In particular, (under slightly different ergodicity assumptions) he defines the operator T as in (5). Since c and τ are Borel measurable functions, it is easy to observe that T maps the space BW into itself and is a contractive operator. Therefore, by the Banach fixed point theorem it follows that there exists a function h ∈ BW such that !
"
"
#
h(x)= inf c(x, a) − gτ (x, a) + h(y)q(dy|x, a) − 1C (x) h(y)µ(dy) . a∈A(x)
X
X
28
Anna Ja´skiewicz and Andrzej S. Nowak
Now it suffices to prove that ∫_X h(y) µ(dy) = 0. But this fact can be shown in much the same way as in the proofs of Theorems 3.5 and 3.6 in [26], applying Lemma 1. Hence, the optimality equation holds with the function h and the constant g* := g and, moreover, by our semicontinuity/compactness assumptions (B1) we may replace inf in (10) by min. The existence of a Borel measurable selector of the minima on the right-hand side of (10) follows from a measurable selection theorem [3]. Part (c) is a consequence of a dynamic programming argument. Here, we would like to emphasise that the idea of making use of a fixed point theorem to solve the optimality equation in this set-up belongs to Vega-Amaya [26].

As far as semi-Markov control processes with weakly continuous and V-geometrically ergodic transition probabilities are concerned, they were examined only in [11, 14]. A solution to the optimality equation was obtained in [11] by a fixed point argument. However, in contrast to the previous case, the Banach theorem cannot be applied directly. This is because the operator T need not map the space L_W into itself. To see this peculiarity, observe that the function k(x) := 1_C(x) ∫_X u(y) µ(dy) is only Borel measurable. Even if the set C were closed (or open), we would not know the sign of the integral ∫_X u(y) µ(dy) for different functions u ∈ L_W. Consequently, k(x) is not necessarily lower semicontinuous. Therefore, we first have to regularise/smooth an appropriate function in the following way:
    Φ_u(x, a) := lim inf_{x′→x, a′→a} [ c(x′, a′) − g τ(x′, a′) + ∫_X u(y) q(dy|x′, a′) − 1_C(x′) ∫_X u(y) µ(dy) ].
Then, the function Φ_u is lower semicontinuous on K, and the operator T̃ defined as

    (T̃u)(x) := inf_{a ∈ A(x)} Φ_u(x, a)

maps the space L_W into itself and is contractive. For these properties the reader is referred to Lemma 3.3 in [11]. Consequently, there exists a fixed point of T̃, that is, a function h ∈ L_W such that

(11)    h(x) = inf_{a ∈ A(x)} Φ_h(x, a)
             = inf_{a ∈ A(x)} lim inf_{x′→x, a′→a} [ c(x′, a′) − g τ(x′, a′) + ∫_X h(y) q(dy|x′, a′) − 1_C(x′) ∫_X h(y) µ(dy) ].
Now it suffices to prove that ∫_X h(y) µ(dy) = 0 and, at the same time, to dispose of the lim inf in (11). This fact has been shown in [11]. Thus, (10) holds with the function h and the constant g* = g. Parts (b) and (c) are obvious.

Generally, semi-Markov control processes with Feller transition probabilities require more delicate handling. Indeed, even for V-geometrically ergodic Markov decision models, for which the jump times occur at integer points, the issue is not a simple matter; see [6, 13, 15, 22]. For instance, Jaśkiewicz and Nowak [13] and Schäl [22] apply the Fatou lemma for weakly convergent measures, which only yields the optimality inequality. Küenle [15], on the other hand, introduces certain contraction operators that lead to a parametrized family of functional equations. Making use of some continuity and monotonicity properties of the solutions to these equations (with respect to the parameter), he obtains a lower semicontinuous solution of the optimality equation. In contrast to his approach, González-Trejo et al. [6] apply the Banach fixed point theorem directly. Nevertheless, their method has some disadvantages, namely, it requires stronger assumptions and excludes many interesting examples (see Remark 4(b) in [13]). For further interesting examples of (semi-)Markov control processes the reader is referred to [5, 6, 7, 8, 11, 18, 23] and references therein.
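To make the contraction mechanism concrete, the following minimal Python sketch iterates a discrete analogue of the operator T on a hypothetical finite model (three states, two actions). All numerical data are illustrative assumptions, and the constant g is treated as known, whereas in [26] it is characterized together with h:

    import numpy as np

    # Hypothetical 3-state, 2-action model; all data are illustrative.
    c   = np.array([[0.5, 1.0], [0.2, 0.8], [0.9, 0.4]])   # one-stage cost c(x, a)
    tau = np.array([[1.0, 1.2], [0.8, 1.0], [1.5, 1.1]])   # mean holding time tau(x, a)
    mu  = np.array([0.5, 0.5, 0.0])                        # probability measure mu on C
    C   = np.array([1.0, 1.0, 0.0])                        # indicator 1_C(x); C = {0, 1}
    q   = np.array([[[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]],    # q(.|x, a) >= 1_C(x) mu(.)
                    [[0.5, 0.5, 0.0], [0.5, 0.5, 0.0]],
                    [[0.3, 0.3, 0.4], [0.2, 0.2, 0.6]]])
    g = 0.3                                                # candidate average cost

    def T(u):
        # (Tu)(x) = min_a [c - g*tau + sum_y u(y) q(y|x,a)] - 1_C(x) sum_y u(y) mu(y)
        return (c - g * tau + q @ u).min(axis=1) - C * (mu @ u)

    h = np.zeros(3)
    for _ in range(1000):                                  # Banach iteration h <- Th
        h_new = T(h)
        if np.max(np.abs(h_new - h)) < 1e-12:
            break
        h = h_new
    print("approximate fixed point h:", h)

With the minorization q(·|x, a) ≥ 1_C(x) µ(·) built into these data, the iterates settle at a fixed point; this is the mechanism exploited in [26] and, after the Φ_u-regularisation, in [11].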
5
The time-average cost criterion
The following result is concerned with the equivalence of the ratio-average and time-average cost criteria. In general, these two criteria may have nothing to do with each other and may lead to different optimal policies and costs (see the Example at the end of this section).

Theorem 3. Assume (GE, R, I).
(a) If (B0) is satisfied and K admits a graph, then g* = inf_{π ∈ Π⁰} j(x, π), and for any ε0 > 0 the policy f^{ε0} ∈ F⁰ is ε0-optimal.
(b) If either (B1) or (B2) holds, then g* = inf_{π ∈ Π} j(x, π) = j(x, f̃).
The proof of Theorem 3 is based on the optional sampling theorem applied to appropriate sub- and supermartingales that are uniformly integrable. The property of uniform integrability is the main difficulty in the proof, and in order to overcome it one needs to employ some basic facts from renewal theory and certain consequences of V-geometric ergodicity. Let us mention that this issue was thoroughly studied in [10] under assumptions (B1) and in [14] under conditions (B2). Therefore, part (b) follows immediately from the theorem in [10] and Theorem 2 in [14]. As far as part (a) is concerned, its proof is similar to that of part (b). There are only two matters in question that require some explanation. Firstly, we claim that any universally measurable policy can be replaced by a Borel measurable one. This fact is formulated in Lemma 3 below. Secondly, we note that the proof of Theorem 2 in [14] is also valid for any Borel measurable q (not only weakly continuous). Hence, all lemmas in [10, 14] hold true within our general framework. We provide a rough idea of the proof.

Lemma 3. Assume that π ∈ Π and x ∈ X are fixed, and let d : X → R be a Borel measurable function such that ∥d∥_W < +∞. If K contains a graph, then
(a) there exists a Borel measurable semi-Markov policy π̃_x for which

    E_x^π d(x_n) = E_x^{π̃_x} d(x_n),    n = 0, 1, ...;

(b) the function

    D(x) := sup_{π ∈ Π⁰} E_x^π d(x_n) = sup_{π̃ ∈ Π̃} E_x^{π̃} d(x_n),    n = 0, 1, ...,

is universally measurable in x.
The proof of part (a) consists of two steps. We first follow Proposition 8.1 in [1], replacing π by a universally measurable semi-Markov policy. Next, this policy is superseded by a Borel measurable one (note that this can be done since K contains a graph). Part (b) is a consequence of part (a) and Lemma 7.1 in [24].

Proof of Theorem 3. Let F_n be the σ-algebra generated by all events up to the nth state. By (6) we infer that

    S_n = Σ_{k=0}^{n−1} ( c(x_k, a_k) − g* τ(x_k, a_k) ) + h(x_n)

is a submartingale with respect to F_n, and by (7)

    S̃_n = Σ_{k=0}^{n−1} ( c(x_k, f^ε(x_k)) − ε − g* τ(x_k, f^ε(x_k)) ) + h(x_n)
is a supermartingale with respect to F_n. From [10, 14] it follows that {S_n} and {S̃_n} are uniformly integrable, that is,

(I) E_x^π |S_{N(t)+1}| and E_x^{f^ε} |S̃_{N(t)+1}| are well defined;
(II) E_x^π[ |S_n|; N(t) ≥ n ] and E_x^{f^ε}[ |S̃_n|; N(t) ≥ n ] tend to 0 as n → ∞.

Now, applying the optional sampling theorem to {S_n} with the two stopping times 0 and N(t) + 1, we get

    h(x) ≤ E_x^π ( Σ_{k=0}^{N(t)} ( c(x_k, a_k) − g* τ(x_k, a_k) ) ) + E_x^π h(x_{N(t)+1}).
Simple rearrangements and the fact that

    E_x^π ( Σ_{k=0}^{N(t)} τ(x_k, a_k) ) = E_x^π T_{N(t)+1}

yield

(12)    g* (1/t) E_x^π T_{N(t)+1} ≤ (1/t) E_x^π ( Σ_{k=0}^{N(t)} c(x_k, a_k) ) + E_x^π ( h(x_{N(t)+1}) / t ) − h(x)/t.
From the proofs of Lemma 8 in [10] and Theorem 2 in [14], it follows that

    lim_{t→∞} (1/t) E_x^π T_{N(t)+1} = 1.

In addition, since h ∈ L_W, we have ∥h∥_V < +∞ (because ∥h∥_W < +∞). Hence, making use of Lemma 8 in [10] and Theorem 2 in [14], we conclude that

    lim_{t→∞} (1/t) E_x^π h(x_{N(t)+1}) = 0.
Letting t → ∞ in (12), we infer that

    g* ≤ lim sup_{t→∞} (1/t) E_x^π ( Σ_{k=0}^{N(t)} c(x_k, a_k) ) = j(x, π).
Since π ∈ Π is arbitrary, we have

(13)    g* ≤ inf_{π ∈ Π⁰} j(x, π).
Now consider S̃_n. Applying the optional sampling theorem to {S̃_n} with the two stopping times 0 and N(t) + 1, we obtain

(14)    ε E_x^{f^ε} N(t) / t + g* E_x^{f^ε} T_{N(t)+1} / t + h(x)/t ≥ (1/t) E_x^{f^ε} ( Σ_{k=0}^{N(t)} c(x_k, f^ε(x_k)) ) + E_x^{f^ε} h(x_{N(t)+1}) / t.
Letting t → ∞ in (14) and arguing in the same way as above, we deduce that

    lim sup_{t→∞} ε E_x^{f^ε} N(t) / t + g* ≥ j(x, f^ε).

Let M(t) be the renewal function that corresponds to an i.i.d. sequence of random variables, each with the following distribution:

    H(t) := β for t ∈ [0, κ),    H(t) := 1 for t ≥ κ,    H(t) = 0 for t < 0.
The constants κ and β were introduced in (R). From Lemma 6(b) in [10], it follows that

    E_x^{f^ε} N(t) ≤ θ_C M(t) + θ(x),

with

    θ(x) := (1/ln(1/λ)) ln( η V(x) / λ ) + 1_C(x),    θ_C := sup_{x ∈ C} θ(x)    (see (GE)).
Moreover,

    lim_{t→∞} M(t)/t = 1/(κ(1 − β))

(by Theorem 3.3.2(a) in [19]). Thus,

    lim sup_{t→∞} ε E_x^{f^ε} N(t) / t ≤ ε lim sup_{t→∞} ( θ_C M(t) + θ(x) ) / t = ε θ_C / (κ(1 − β)) =: ε0.
Let f^{ε0} := f^ε. We infer that g* ≥ j(x, f^{ε0}) − ε0. From the fact that ε > 0 is arbitrary (and so is ε0), we obtain

(15)    g* ≥ inf_{π ∈ Π⁰} j(x, π).
Combining (13) and (15) yields part (a). ✷

Remark 2. The proof of Theorem 3 reveals a surprising feature of the ε-optimal policy f^ε obtained in Theorem 1. Namely, it turns out that f^ε is ε0-optimal for the time-average cost criterion, where ε0 does not have to equal ε. However, ε0 can be expressed in terms of ε and other constants used in the assumptions.

We now present Example 3.2 in [4]. It shows that the two average optimal costs and the corresponding optimal policies may differ.

Example. Let X = {1, 2, 3}, A(1) = {c, s} and A(2) = A(3) = {s}. The mean holding times equal τ(1, c) = τ(1, s) = τ(2, s) = 1 and τ(3, s) = 2. The transition probabilities are given by q(2|1, s) = q(3|1, s) = 1/2 and q(2|2, s) = q(3|3, s) = q(1|1, c) = 1, whilst the one-step cost function is r(2, s) = 1, r(1, c) = 2/5, and 0 otherwise. In this model, there are two stationary policies: f(1) = c and d(1) = s. Let x0 = 1; then
    j(1, d) = 1/2,    J(1, d) = lim sup_{n→∞} ( (1/2)n ) / ( 1 + (3/2)n ) = 1/3,

and

    j(1, f) = 2/5,    J(1, f) = 2/5.

Therefore, for the time-average cost criterion, policy f is better than d, whereas for the ratio-average criterion d is better.
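These values can be checked numerically. The sketch below (hypothetical code, not taken from [4]) simulates the chain under d and estimates both criteria; under f the chain stays at state 1 with cost 2/5 and holding time 1, so trivially j(1, f) = J(1, f) = 2/5:

    import numpy as np

    # Hypothetical check of the Example. Policy d: choose s at state 1;
    # states 2 and 3 are then absorbing.
    rng = np.random.default_rng(1)

    def run(n_steps):
        cost = time = 0.0
        state = 1
        for _ in range(n_steps):
            if state == 1:
                time += 1.0                          # r(1, s) = 0, tau(1, s) = 1
                state = 2 if rng.random() < 0.5 else 3
            elif state == 2:
                cost += 1.0; time += 1.0             # r(2, s) = 1, tau(2, s) = 1
            else:
                time += 2.0                          # r(3, s) = 0, tau(3, s) = 2
        return cost, time

    n, reps = 1000, 2000
    tot_c = tot_t = 0.0
    ratios = []
    for _ in range(reps):
        cst, tm = run(n)
        tot_c += cst; tot_t += tm
        ratios.append(cst / tm)
    print("J(1, d) ~", tot_c / tot_t)     # ratio of expectations: ~ 1/3
    print("j(1, d) ~", np.mean(ratios))   # expectation of pathwise ratios: ~ 1/2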
Anna Jaśkiewicz
Institute of Mathematics and Computer Science, Wrocław University of Technology, Wyspiańskiego 27, 50-370 Wrocław, Poland.
ajaskiew@im.pwr.wroc.pl

Andrzej S. Nowak
Faculty of Mathematics, Computer Science, and Econometrics, University of Zielona Góra, Podgórna 50, 65-246 Zielona Góra, Poland.
a.nowak@wmie.uz.zgora.pl
References

[1] Bertsekas, D. P.; Shreve, S. E., Stochastic Optimal Control: The Discrete Time Case, Academic Press, New York, 1978.

[2] Blackwell, D., A Borel set not containing a graph, Ann. Math. Stat. 39 (1968), 1345-1347.

[3] Brown, L. D.; Purves, R., Measurable selections of extrema, Ann. Stat. 1 (1973), 902-912.

[4] Feinberg, E. A., Constrained semi-Markov decision processes with average rewards, Math. Methods Oper. Res. 39 (1994), 257-288.

[5] Feinberg, E. A.; Lewis, M. E., Optimality of four-threshold policies in inventory systems with customer returns and borrowing/storage options, Prob. Eng. Info. Sci. 19 (2005), 45-71.

[6] González-Trejo, J. I.; Hernández-Lerma, O.; Hoyos-Reyes, L. F., Minimax control of discrete-time stochastic systems, SIAM J. Control Optim. 41 (2003), 1626-1659.

[7] Hernández-Lerma, O.; Lasserre, J. B., Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, New York, 1999.

[8] Hernández-Lerma, O.; Luque-Vásquez, F., Semi-Markov control models with average costs, Appl. Math. 26 (1999), 315-331.

[9] Jaśkiewicz, A., An approximation approach to ergodic semi-Markov control processes, Math. Methods Oper. Res. 54 (2001), 1-19.

[10] Jaśkiewicz, A., On the equivalence of two expected average cost criteria for semi-Markov control processes, Math. Oper. Res. 29 (2004), 326-338.

[11] Jaśkiewicz, A., A fixed point approach to solve the average cost optimality equation for semi-Markov decision processes with Feller transition probabilities, Comm. Stat. - Theory and Methods 36 (2007), 2559-2575.

[12] Jaśkiewicz, A., Semi-Markov control processes with noncompact action spaces and discontinuous costs, in preparation (2007).
[13] Jaśkiewicz, A.; Nowak, A. S., On the optimality equation for average cost Markov control processes with Feller transition probabilities, J. Math. Anal. Appl. 316 (2006), 495-509.

[14] Jaśkiewicz, A.; Nowak, A. S., Optimality in Feller semi-Markov control processes, Oper. Res. Lett. 34 (2006), 713-718.

[15] Küenle, H.-U., On Markov games with average reward criterion and weakly continuous transition probabilities, SIAM J. Control Optim. 45 (2007), 2156-2168.

[16] Meyn, S. P.; Tweedie, R. L., Markov Chains and Stochastic Stability, Springer-Verlag, New York, 1993.

[17] Meyn, S. P.; Tweedie, R. L., Computable bounds for geometric convergence rates of Markov chains, Ann. Appl. Probab. 4 (1994), 981-1011.

[18] Nishimura, K.; Stachurski, J., Stochastic optimal policies when the discount rate vanishes, J. Econ. Dynamics Control 31 (2007), 1416-1430.

[19] Resnick, S. I., Adventures in Stochastic Processes, Birkhäuser, Boston, 1992.

[20] Ross, S. M., Applied Probability Models with Optimization Applications, Holden-Day, San Francisco, 1970.

[21] Sennott, L. I., Average cost semi-Markov decision processes and the control of queueing systems, Prob. Eng. Info. Sci. 3 (1989), 247-272.

[22] Schäl, M., Average optimality in dynamic programming with general state space, Math. Oper. Res. 18 (1993), 163-172.

[23] Stokey, N. L.; Lucas, R. E.; Prescott, E. C., Recursive Methods in Economic Dynamics, Harvard University Press, Massachusetts, 1989.

[24] Strauch, R. E., Negative dynamic programming, Ann. Math. Stat. 37 (1966), 871-890.

[25] Yushkevich, A., On semi-Markov controlled models with an average reward criterion, Theory Probab. Appl. 26 (1981), 796-803.
[26] Vega-Amaya, O., The average cost optimality equation: a fixed point approach, Bol. Soc. Mat. Mexicana 9 (2003), 185-195.
Morfismos, Vol. 11, No. 1, 2007, pp. 37–52
Real options on consumption in a small open monetary economy: a stochastic optimal control approach

Francisco Venegas-Martínez

Abstract
This paper aims to develop a stochastic model of a small open monetary economy where risk-averse agents have expectations of the exchange-rate dynamics driven by a mixed diffusion-jump process. The size of a possible exchange-rate depreciation is supposed to have an extreme value distribution of the Fréchet type. Under this framework, an analytical solution for the price of the real option of waiting when consumption can be delayed (a claim that is not traded) is derived. Finally, a Monte Carlo simulation experiment is carried out to obtain numerical approximations of the real option price.
2000 Mathematics Subject Classification: 49J20, 49K20, 93C20. Keywords and phrases: stochastic optimal control, partial differential equations, monetary economy, contingent claims, extreme values.
1
Introduction
Real options have been attracting increasing attention in economic theory and mathematical economics; see, for instance, Beck and Stockman (2005), who study money as a real option; Strobel (2005), who examines monetary integration and inflation preferences through real options; Henderson and Hobson (2002), who analyze real options with constant relative risk aversion; and Foote and Folta (2002), who deal with temporary labor as a real option, among others. The main issue associated with real options is how to value a non-traded contingent claim.¹
¹ We refer the reader to two classical books on real options: Dixit and Pindyck (1994) and Schwartz and Trigeorgis (2001).
In this paper, the underlying asset of the option is the price of money in terms of goods, that is, the consumer's purchasing power, which is a non-traded asset (there are no markets for trading purchasing power); therefore, the option also becomes a non-traded claim. The real option we will be valuing is the option to activate consumption when the purchasing power reaches a certain threshold at a given future date; otherwise the individual will have to wait. Even though there is no market for this contingent claim, its value provides an idea of how much the individual is willing to pay for activating consumption. It should be clear to the reader that this approach to valuing derivatives is different from the complete-market approach developed in the Black-Scholes-Merton theory, in which the contingent claim can be replicated by a portfolio that combines stock (available in a stock market) and bonds (available in a credit market).

In this research, by generalizing Henderson and Hobson's (2002) paper, we value the real option of waiting when consumption can be delayed in a small open monetary economy with a representative, competitive, and risk-averse consumer. To reach this goal, Merton's (1976) model is extended by including an extreme value distribution for the jump size of the underlying; an analytical solution for the price of the derivative is obtained. It should also be emphasized that the proposed valuation procedure differs from that in Venegas-Martínez (2005), which is based on Bayesian inference, by now using the von Neumann-Morgenstern expected utility framework, which, of course, provides further economic and financial intuition.

This paper develops a stochastic economy that explicitly recognizes the role of extreme or exceptional movements in the dynamics of the nominal exchange rate. It is assumed that the exchange-rate dynamics follows a mixed diffusion-jump process where the size of an upward jump is supposed to have an extreme value distribution of the Fréchet type. In this case, the underlying non-traded asset is the price of money in terms of goods. Using this stochastic setting and assuming identical rational consumers with logarithmic preferences (risk-averse individuals), the price of such a real option is characterized as the solution of a (partial) differential-integral equation with boundary conditions. In fact, we provide an analytical solution for the value of such a real option. Finally, several Monte Carlo simulation experiments are carried out to get numerical approximations of the real option price.

Even though this work was mainly intended for mathematicians (dealing with mathematical finance) and economists (dealing with
mathematical economics), it was not written in terms of definitions, theorems, propositions, and remarks. I should say that when I started writing this paper I did it in a prose style (there was no specific reason for that), and when I finished I realized that the result was a good story. I apologize and hope my colleagues enjoy this story as much as I did.

The paper is organized as follows. In the next section, we work out a one-good, cash-in-advance, stochastic economy where agents have expectations of the exchange-rate dynamics driven by a mixed diffusion-jump process and the size of a possible exchange-rate depreciation is supposed to have an extreme value distribution. In section 3, we undertake the consumer's decision problem. In section 4, we deal with valuing the real option of delaying consumption. In section 5, we provide numerical approximations of the real option price. Finally, in section 6, we present conclusions, acknowledge limitations, and make suggestions for further research.
2
Structure of the model
Let us consider a small open monetary economy populated by infinitely lived identical households in a world with a single internationally tradable consumption good. The main assumptions on the economy resemble those of Venegas-Martínez (2001), (2006a) and (2006b), and they will be described in what follows.
2.1
Purchasing power parity and exchange rate dynamics
We assume that the consumption good is freely traded and that its domestic price level, P_t, is determined by the purchasing power parity condition, namely,

(1)    P_t = P_t* e_t,

where P_t* is the foreign-currency price of the good in the rest of the world, and e_t is the nominal exchange rate. Throughout the paper we will assume, for convenience, that P_t* is equal to 1. We also suppose that the initial value of the exchange rate, e_0, is known and equal to 1. In what follows, we will suppose that the ongoing uncertainty in the dynamics of the expected exchange rate, and therefore in the inflation rate, is generated by a geometric Brownian motion combined with a
Poisson process where the size of a forward jump is driven by extreme value distributions of the Fréchet type, that is,

(2)    de_t/e_t = dP_t/P_t = µ dt + σ dW_t + Z dN_t,

where µ ∈ R, σ > 0, (W_t)_{t≥0} is a Brownian motion defined on a fixed probability space (Ω, F, P_W), and dN_t is a Poisson process with intensity parameter λ. From now on, it will be supposed that Cov(dW_t, dN_t) = 0. Even though it is easy to incorporate downward jumps by adding a second Poisson process in (2) multiplied by a Weibull distribution, for the sake of simplicity we will keep the analysis only for upward jumps (cf. Venegas-Martínez (2006c)). Moreover, extreme downward movements in the exchange rate or in inflation have never been observed; such a situation would be quite atypical. The size of an upward jump is defined by

    Z = 1/(1 − X^{−α}) − 1,    X > 0, α > 0,
    X = (Y − ν)/κ,    κ, ν > 0,

where Y is a Fréchet random variable with parameters α, ν and κ > 0. Clearly, the quantity Z remains positive. The cumulative distribution function of Y is given by

(3)    F_Y(y) = 0 for y < ν, and F_Y(y) = exp( −((y − ν)/κ)^{−α} ) for y ≥ ν.

The corresponding density of Y satisfies

(4)    f_Y(y) = F_Y(y) (α/κ) ((y − ν)/κ)^{−α−1},    y ≥ ν.

On the other hand, since the number of expected upward jumps in the exchange rate, per unit of time, follows a Poisson process dN_t with intensity λ, we have that

    P_N{one unit jump during dt} = P_N{dN_t = 1} = λ dt

and

    P_N{more than one unit jump during dt} = P_N{dN_t > 1} = o(dt),

so that

    P_N{no jump during dt} = 1 − λ dt + o(dt),

where o(dt)/dt → 0 as dt → 0.
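For intuition, one sample path of (2) can be simulated by inverse-transform sampling from (3). The following Python sketch uses illustrative (uncalibrated) parameter values and, as an implementation guard, resamples until X > 1 so that Z stays positive, in line with the claim above:

    import numpy as np

    # One illustrative path of (2); all parameter values are assumptions.
    rng = np.random.default_rng(7)
    mu_, sigma, lam, alpha = 0.03, 0.13, 0.5, 3.0
    dt, n = 1.0 / 252, 252

    def sample_Z():
        # Z = 1/(1 - X^(-alpha)) - 1 with X = (Y - nu)/kappa. By (3),
        # X = (-log U)^(-1/alpha) for U uniform on (0, 1); resample until
        # X > 1 so that Z stays positive (a guard for this sketch).
        while True:
            X = (-np.log(rng.random())) ** (-1.0 / alpha)
            if X > 1.0:
                return 1.0 / (1.0 - X ** (-alpha)) - 1.0

    e = 1.0                                      # e_0 = 1
    for _ in range(n):
        dW = np.sqrt(dt) * rng.standard_normal()
        jump = sample_Z() if rng.random() < lam * dt else 0.0   # Z dN_t
        e *= 1.0 + mu_ * dt + sigma * dW + jump
    print("simulated e_T =", e)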
2.2
A cash-in-advance constraint
Consider a cash-in-advance constraint of the Clower type:

(5)    ψ m_t = c_t,

where m_t is the demand for real cash balances, c_t is the demand for consumption, and ψ^{−1} > 0 is the time that money must be held in order to finance consumption. The constant ψ applies uniformly at all times t. Condition (5) is critical in linking the exchange-rate dynamics with consumption. An economic interpretation of a cash-in-advance constraint is that money is needed to buy consumption goods. Notice that when ψ = 1 the agent is forced to maintain his demand for money balances in the same proportion as demanded goods. Moreover, if we state the following link between m_t and c_s:

    m_t = ∫_t^{t+ψ^{−1}} c_s ds,

where ψ^{−1} > 0 is the time that money must be held to buy consumption goods, then

    m_t = ∫_t^{t+ψ^{−1}} c_s ds = c_t/ψ + o(1/ψ).

If the error term o(1/ψ) is neglected, it follows that m_t ψ = c_t, as in (5).
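A quick numerical check of the approximation m_t = c_t/ψ + o(1/ψ) can be run with any smooth consumption path; the path used below is a hypothetical choice:

    import numpy as np

    # Check that m_t = int_t^{t+1/psi} c_s ds equals c_t/psi up to o(1/psi).
    c = lambda s: 1.0 + 0.3 * np.sin(s)        # hypothetical consumption path
    t = 2.0
    for psi in (1.0, 10.0, 100.0):
        grid = np.linspace(t, t + 1.0 / psi, 10001)
        vals = c(grid)
        m = (grid[1] - grid[0]) * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
        print(psi, abs(m - c(t) / psi) * psi)  # scaled error, vanishes as psi grows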
2.3
The return rates of non-traded and traded assets
Let S_t = 1/P_t be the price of money in terms of goods, a non-traded asset, and let V = V(S_t, t) be the price of a European call option on S_t, a non-traded contingent claim. Suppose also that there is a real bond of price b_t that pays a constant real interest rate r (i.e., it pays r units of the consumption good per unit of time). Thus, the consumer's real wealth, x_t, is given by

(6)    x_t = S_t + V(S_t, t) + b_t,

where x_0 is exogenously determined. The stochastic rate of return of S_t, dR_S, is obtained by applying Itô's lemma to the inverse of the price level, with (2) as the underlying process, that is,

(7)    d(1/P_t) = [ −µP_t (1/P_t²) + (1/2) σ²P_t² (2/P_t³) ] dt − σP_t (1/P_t²) dW_t + [ (−X^{−α} + 1)(1/P_t) − 1/P_t ] dN_t
              = (1/P_t) [ (−µ + σ²) dt − σ dW_t − X^{−α} dN_t ].

Hence, the stochastic rate of return of S_t is given by

(8)    dR_S = (σ² − µ) dt − σ dW_t − X^{−α} dN_t.
Observe now that the stochastic rate of return of S_t, dR_S = dS_t/S_t, can be rewritten as²

(9)    dR_S = φ dt + σ dW_t + ξ dN_t,

where φ = σ² − µ and ξ = −X^{−α}. If V = V(S_t, t) denotes the value of the option, then Itô's lemma leads to

    dV = [ ∂V/∂t + φS_t ∂V/∂S_t + (1/2) σ²S_t² ∂²V/∂S_t² ] dt + σS_t (∂V/∂S_t) dW_t + [ V(S_t(ξ + 1), t) − V(S_t, t) ] dN_t,

or,

(10)    dV = φ_V V dt + σ_V V dW_t + ξ_V V dN_t,

where

    φ_V = (1/V) [ ∂V/∂t + φS_t ∂V/∂S_t + (1/2) σ²S_t² ∂²V/∂S_t² ],
    σ_V = (1/V) σS_t ∂V/∂S_t,
    ξ_V = (1/V) [ V(S_t(ξ + 1), t) − V(S_t, t) ].

² Another approach to the dynamics of the underlying asset can be found in Venegas-Martínez (2005).

3
The household's decision problem

The stochastic accumulation of the consumer's real wealth, in terms of the portfolio shares w_{1t} = S_t/x_t, w_{2t} = V/x_t, 1 − w_{1t} − w_{2t} = b_t/x_t, and consumption c_t, is given by

    dx_t = x_t w_{1t} dR_S + x_t w_{2t} dR_V + x_t (1 − w_{1t} − w_{2t}) r dt − c_t dt,

with x_0 exogenously determined. In this equation, dR_V ≡ dV/V. Thus, by substituting (9) and (10) in the above expression, the budget constraint can be rewritten as

(11)    dx_t = x_t [ (r + (γ − r)w_{1t} + (φ_V − r)w_{2t}) dt + (w_{1t}σ + w_{2t}σ_V) dW_t + (w_{1t}ξ + w_{2t}ξ_V) dN_t ],

where γ = σ² − µ − ψ = φ − ψ.
3.1
The utility index
The von Neumann-Morgenstern utility at time t = 0, v_0, of the competitive risk-averse consumer is assumed to have the time-separable form:

(12)    v_0 = E_0 [ ∫_0^∞ log(c_t) e^{−rt} dt ],

where E_0 is the conditional expectation given all available information at t = 0. To avoid unnecessarily complex dynamics in consumption, we assume that the agent's subjective discount rate is consistent with the constant real international rate of interest, r. We consider the logarithmic utility function in order to derive closed-form solutions and make the subsequent analysis more tractable.
3.2
The first-order conditions
The Hamilton-Jacobi-Bellman equation for the stochastic optimal control problem of maximizing utility, with log(c_t) = log(ψ x_t w_{1t}) and subject to (11), is given by

(13)    max_{w_{1t},w_{2t}} H(w_t; x_t, t) ≡ max_{w_{1t},w_{2t}} { log(ψ x_t w_{1t}) e^{−rt} + I_x(x_t, t) x_t [ r + (γ − r)w_{1t} + (φ_V − r)w_{2t} ] + I_t(x_t, t) + (1/2) I_{xx}(x_t, t) x_t² (w_{1t}σ + w_{2t}σ_V)² + λ E_ξ[ I(x_t(w_{1t}ξ + w_{2t}ξ_V + 1), t) − I(x_t, t) ] } = 0.

The first-order conditions for w_{1t} and w_{2t} are, respectively,

    H_{w_{1t}} = 0 and H_{w_{2t}} = 0.

We postulate I(x_t, t) in a time-separable form as

    I(x_t, t) = e^{−rt} [ β_1 log(x_t) + β_0 ],

where β_0 and β_1 are to be determined from (13). By substituting the above candidate in (13), we obtain

    max_{w_{1t},w_{2t}} H(w_{1t}, w_{2t}; x_t, t) ≡ max_{w_{1t},w_{2t}} { log(ψ x_t w_{1t}) + β_1 [ r + (γ − r)w_{1t} + (φ_V − r)w_{2t} ] − r [ β_1 log(x_t) + β_0 ] − (1/2) β_1 (w_{1t}σ + w_{2t}σ_V)² + λ β_1 E_ξ[ log(w_{1t}ξ + w_{2t}ξ_V + 1) ] } = 0.

If we now compute the first-order conditions, we find that the optimal values of w_{1t} and w_{2t} satisfy

    1/(β_1 w_{1t}) + λ E_ξ[ ξ / (w_{1t}ξ + w_{2t}ξ_V + 1) ] + γ − r = (w_{1t}σ + w_{2t}σ_V) σ

and

    λ E_ξ[ ξ_V / (w_{1t}ξ + w_{2t}ξ_V + 1) ] + φ_V − r = (w_{1t}σ + w_{2t}σ_V) σ_V.

So far we have not made any assumption on the parameter values. From now on, without loss of generality, we assume that γ = φ − r, that is, r = ψ.
4
Pricing the real option of waiting when consumption can be delayed
If we suppose a corner solution, w_{1t} = 1 and w_{2t} = 0, then

(14)    1/β_1 + λ E_ξ[ ξ/(ξ + 1) ] + γ − r = σ²

and

(15)    λ E_ξ[ ξ_V/(ξ + 1) ] + φ_V − r = σ σ_V.

In this case, it can be shown that β_1 = r^{−1}. After some simple computations, we find that equations (14) and (15) collapse into

(16)    φ = r + σ² − λ E_ξ[ ξ/(ξ + 1) ]

and

(17)    λ E_ξ[ ξ_V/(ξ + 1) ] + φ_V − r = σ σ_V.

From (17), it follows that

    λ E_ξ[ ( V(S_t(ξ + 1), t) − V(S_t, t) ) / (ξ + 1) ] + ∂V/∂t + φ S_t ∂V/∂S_t + (1/2) σ² S_t² ∂²V/∂S_t² − rV = σ² S_t ∂V/∂S_t.

If we now substitute (16) in the above equation, we get

(18)    λ E_ξ[ ( V(S_t(ξ + 1), t) − V(S_t, t) − ξ S_t ∂V/∂S_t ) / (ξ + 1) ] + ∂V/∂t + r S_t ∂V/∂S_t + (1/2) σ² S_t² ∂²V/∂S_t² − rV = 0.

We impose the boundary conditions V(0, t) = 0 and V(S_t, T) = max(S_t − K, 0), where K is the exercise price of the real option (the cost, in terms of goods, of delaying consumption until the "last minute" T). In such a case, without loss of generality, we may consider a finite planning horizon [0, T] in the expected utility expressed in (12). Notice that if f_ξ(·)
is the density function of ξ, then the presence of the expected value in the above equation, given by

    λ E_ξ[ ( V(S_t(1 + ξ), t) − V(S_t, t) ) / (ξ + 1) ] = λ ∫_{−∞}^{∞} ( V(S_t(1 + ξ), t) − V(S_t, t) ) / (ξ + 1) f_ξ(ξ) dξ,

produces in (18) a (partial) differential-integral equation. Notice that if ξ is constant in (18), by redefining λ as λ/(ξ + 1), we obtain Merton's (1976) formula. Finally, observe that when ξ = 0 or λ = 0, equation (18) reduces to the Black-Scholes (1973) second-order parabolic partial differential equation. Observe now that if we introduce the following change of variable:

    ζ = ((y − ν)/κ)^{−α},

then one of the expectation terms in (18) satisfies

    E_ξ[ ξ/(ξ + 1) ] = E[ X^{−α}/(X^{−α} − 1) ] = ∫_0^∞ ( [(y − ν)/κ]^{−α} / ( [(y − ν)/κ]^{−α} − 1 ) ) f_Y(y) dy = ∫_0^∞ ( ζ/(ζ − 1) ) e^{−ζ} dζ = −eΓ(−1, 1),

where Γ(−1, 1) = −Γ(0, 1) + e^{−1}, Γ(0, 0) = ∞, Γ(0, ∞) = 0, and Γ(0, 1) ≈ 2/9 (in fact, Γ(0, 1) = 0.219383934...). Here, Γ(a, b) denotes the incomplete Gamma function. In such a case, equation (18) can be transformed into

    λ E_ξ[ ( V(S_t(1 + ξ), t) − V(S_t, t) ) / (ξ + 1) ] + ∂V/∂t + (1/2) σ² S_t² ∂²V/∂S_t² + ( r + λ( (2/9)e − 1 ) ) S_t ∂V/∂S_t − rV = 0.

A possibility to determine V(S_t, t) consists in defining a sequence of random variables Y_n, each defined as the product of n independent and identically distributed random variables ξ + 1, with Y_0 = 1. In
other words, if {ξ_n}_{n∈N} is a sequence of independent and identically distributed random variables, we define

    Y_0 = 1,
    Y_1 = ξ_1 + 1,
    Y_2 = (ξ_1 + 1)(ξ_2 + 1),
    ...
    Y_n = ∏_{k=1}^{n} (ξ_k + 1),
    ...
In this case, the solution of equation (18) with the boundary conditions

    V(0, t) = 0 and V(S_t, T) = max(S_t − K, 0)

is given by

(19)    V(S_t, t) = Σ_{n=0}^{∞} E_ξ E_{Y_n} [ ( e^{−λ(T−t)/(ξ+1)} [λ(T−t)/(ξ+1)]^n / n! ) V_BS( S_t Y_n e^{−λE_ξ[ξ/(ξ+1)](T−t)}, t ) ],

where ξ is independent of {ξ_n}_{n∈N} and V_BS(·, ·) is the basic Black-Scholes solution. Indeed, consider

    V(S_t, t) = Σ_{n=0}^{∞} E_ξ E_{Y_n} [ P_{n,t} V_BS^{(n)} ],

where

    P_{n,t} = e^{−λ(T−t)/(ξ+1)} [λ(T−t)/(ξ+1)]^n / n!,    U_{n,t} = Y_n e^{−λE_ξ[ξ/(ξ+1)](T−t)},

and V_BS^{(n)} = V_BS(S_t U_{n,t}, t). In what follows, it will be convenient to introduce the notation

(20)    Q_{n,t} = S_t U_{n,t}.
In such a case,

(21)    ∂V/∂S_t = Σ_{n=0}^{∞} E_ξ E_{Y_n} [ P_{n,t} U_{n,t} ∂V_BS^{(n)}/∂Q_{n,t} ]

and

(22)    ∂²V/∂S_t² = Σ_{n=0}^{∞} E_ξ E_{Y_n} [ P_{n,t} U_{n,t}² ∂²V_BS^{(n)}/∂Q_{n,t}² ].

Moreover,

(23)    ∂V/∂t = λ E_ξ[ξ/(ξ+1)] Σ_{n=0}^{∞} E_ξ E_{Y_n} [ P_{n,t} Q_{n,t} ∂V_BS^{(n)}/∂Q_{n,t} ]
              + Σ_{n=0}^{∞} E_ξ E_{Y_n} [ P_{n,t} ∂V_BS^{(n)}/∂t ]
              + λ Σ_{n=0}^{∞} E_ξ E_{Y_n} [ P_{n,t} V_BS^{(n)} / (ξ+1) ]
              − λ Σ_{n=1}^{∞} E_ξ E_{Y_n} [ ( e^{−λ(T−t)/(ξ+1)} [λ(T−t)/(ξ+1)]^{n−1} / (n−1)! ) V_BS^{(n)} / (ξ+1) ].

Hence, by virtue of (22) and (23), we get

(24)    ∂V/∂t = λ E_ξ[ξ/(ξ+1)] S_t ∂V/∂S_t + Σ_{n=0}^{∞} E_ξ E_{Y_n} [ P_{n,t} ∂V_BS^{(n)}/∂t ] + λ E_ξ [ V(S_t, t)/(ξ+1) ]
              − λ Σ_{m=0}^{∞} E_ξ E_{Y_{m+1}} [ ( e^{−λ(T−t)/(ξ+1)} [λ(T−t)/(ξ+1)]^m / m! ) V_BS^{(m+1)} / (ξ+1) ].

Observe that the last term in the above equation can be written as

(25)    E_ξ [ V((ξ+1)S_t, t) / (ξ+1) ] = Σ_{n=0}^{∞} E_ξ E_{Y_n} [ P_{n,t} V_BS^{(n)}(Q_{n,t}(1+ξ), t) / (ξ+1) ]
                                      = Σ_{n=0}^{∞} E_ξ E_{Y_{n+1}} [ P_{n,t} V_BS^{(n+1)}(Q_{n+1,t}, t) / (ξ+1) ],

since Q_{n+1,t} and Q_{n,t}(ξ+1) are independent and identically distributed random variables. Therefore, equation (24) is transformed into

(26)    ∂V/∂t = Σ_{n=0}^{∞} E_ξ E_{Y_n} [ P_{n,t} ∂V_BS^{(n)}/∂t ] − λ E_ξ [ ( V(S_t(ξ+1), t) − V(S_t, t) − ξ S_t ∂V/∂S_t ) / (ξ+1) ].

From (21), (22) and (26), it follows that

(27)    ∂V/∂t + (1/2) σ² S_t² ∂²V/∂S_t² + r S_t ∂V/∂S_t − rV
        = Σ_{n=0}^{∞} E_ξ E_{Y_n} [ P_{n,t} ( ∂V_BS^{(n)}/∂t + (1/2) σ² Q_{n,t}² ∂²V_BS^{(n)}/∂Q_{n,t}² + r Q_{n,t} ∂V_BS^{(n)}/∂Q_{n,t} − r V_BS^{(n)} ) ]
        − λ E_ξ [ ( V(S_t(ξ+1), t) − V(S_t, t) − ξ S_t ∂V/∂S_t ) / (ξ+1) ].

Since

    ∂V_BS^{(n)}/∂t + (1/2) σ² Q_{n,t}² ∂²V_BS^{(n)}/∂Q_{n,t}² + r Q_{n,t} ∂V_BS^{(n)}/∂Q_{n,t} − r V_BS^{(n)} = 0

holds for all n ∈ N ∪ {0}, we deduce immediately that (19) is a solution of (18).
5
Numerical approximations
In order to obtain numerical approximations of (19), the quantity inside the mathematical expectations in (19),

(28)    M_{ξ,Y_n} = Σ_{n=0}^{1000} ( e^{−λ(T−t)/(ξ+1)} [λ(T−t)/(ξ+1)]^n / n! ) V_BS^{(n)},

is simulated by using the statistical software "Xtremes" (Reiss and Thomas, 2001) and Ripley's (1987) methodology for Monte Carlo simulations. Subsequently, we compute the average of 10,000 simulated values of M_{ξ,Y_n} to obtain, for different values of λ, approximate solutions of the real option of waiting when consumption can be delayed. To do this, let us first consider the parameter values in Table 1 for computing the basic Black-Scholes price V_BS^{(0)}. In Table 1, S_t stands for the price of money in terms of goods, K is the cost (in terms of goods) of delaying consumption until the last minute, r is the nominal interest rate, and T − t is the term. Units of S_t and K are given in money in terms of consumption goods.

    S_t      K        r      σ      T − t    V_BS^{(0)}
    42.00    41.00    0.11   0.13   0.25     2.436

Table 1: Parameter values of the benchmark Black-Scholes price.

Table 2 shows numerical approximations of the price of the real option obtained by Monte Carlo simulation for different values of λ, with E_ξ[ξ/(ξ+1)] = −eΓ(−1, 1). It is assumed, for simulation purposes, that ξ follows a Fréchet distribution with mean 0.01 and variance 0.001.

    λ            0.1      0.2      0.3      0.4      0.5
    V(S_t, t)    2.646    2.673    2.698    2.726    2.742

    λ            0.6      0.7      0.8      0.9      1.0
    V(S_t, t)    2.845    2.865    2.898    3.012    3.081

Table 2: Simulated prices of the real option.
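The benchmark value V_BS^{(0)} = 2.436 in Table 1 can be reproduced directly from the closed-form Black-Scholes formula; a minimal sketch:

    from math import exp, log, sqrt, erf

    # Closed-form Black-Scholes call (standard formula, not code from the paper).
    def black_scholes_call(S, K, r, sigma, T):
        N = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))      # standard normal cdf
        d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
        d2 = d1 - sigma * sqrt(T)
        return S * N(d1) - K * exp(-r * T) * N(d2)

    print(black_scholes_call(42.0, 41.0, 0.11, 0.13, 0.25))  # ~ 2.436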
It is important to point out that the option prices in Table 2 depend on the choices of the mean and variance of the random variable ξ. We may conclude, from Table 2 and the chosen mean and variance, that the price of the real option of waiting when consumption can be delayed increases as the average number of jumps per unit of time increases, since a growing λ raises the future opportunity cost of purchasing goods.
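A rough Monte Carlo approximation of the truncated series (28) can be coded in a few lines. In the sketch below the Fréchet draw of ξ (mean 0.01, variance 0.001 in the text) is replaced by a Gamma stand-in with the same mean and variance, and the truncation level is shortened, so the output is indicative only and will not match Table 2 exactly:

    import numpy as np
    from math import exp, log, sqrt, erf

    rng = np.random.default_rng(2007)
    S, K, r, sigma, T = 42.0, 41.0, 0.11, 0.13, 0.25
    E_ratio = -exp(1.0) * (exp(-1.0) - 0.219383934)   # E[xi/(xi+1)] = -e*Gamma(-1,1)

    def bs_call(S0):
        N = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))
        d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
        return S0 * N(d1) - K * exp(-r * T) * N(d1 - sigma * sqrt(T))

    def price(lam, n_sims=2000, n_max=60):
        total = 0.0
        for _ in range(n_sims):
            xi = rng.gamma(0.1, 0.1)               # stand-in: mean 0.01, var 0.001
            rate = lam * T / (xi + 1.0)
            weight, Y, acc = exp(-rate), 1.0, 0.0
            for n in range(n_max):
                if n > 0:
                    weight *= rate / n             # Poisson weights e^-m m^n / n!
                    Y *= 1.0 + rng.gamma(0.1, 0.1) # Y_n = prod_k (xi_k + 1)
                acc += weight * bs_call(S * Y * exp(-lam * E_ratio * T))
            total += acc
        return total / n_sims

    for lam in (0.1, 0.5, 1.0):
        print(lam, round(price(lam), 3))           # compare with Table 2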
6
Conclusions
We have developed a stochastic model of a small open monetary economy in which agents have expectations of the exchange-rate dynamics guided by a mixed diffusion-jump process. The size of a possible
exchange-rate depreciation is supposed to have an extreme value distribution of the Fréchet type. By using a logarithmic utility, we have derived an analytical solution for valuing the real option of waiting when consumption can be delayed, a claim that is not traded. The explicit solutions provided make it much easier to understand the key issues of extreme jumps in valuing contingent claims in a cash-in-advance economy. Finally, a Monte Carlo simulation was carried out to obtain approximate solutions of the real option price. It is worth mentioning that the derived results do not depend on the assumption of logarithmic utility, which is a limit case of the family of constant relative risk aversion utility functions. Needless to say, both nontradable and durable goods would provide more realistic assumptions and should be considered in extending, in further research, the real option of waiting when consumption can be delayed.

Acknowledgement
The author wishes to thank Arnold Zellner for helpful comments and suggestions. The author is solely responsible for opinions and errors.

Francisco Venegas-Martínez
Sección de Estudios de Posgrado e Investigación, Escuela Superior de Economía, Instituto Politécnico Nacional, Plan de Agua Prieta no. 66, Col. Plutarco Elías Calles, C.P. 11340, México, D.F.
fvenegas@ipn.mx
References

[1] Beck, S.; Stockman, D. R. (2005). Money as real options in a cash-in-advance economy. Economics Letters, 87, 337-345.

[2] Black, F.; Scholes, M. (1973). The pricing of options and corporate liabilities. The Journal of Political Economy, 81(3), 637-654.

[3] Dixit, A. K.; Pindyck, R. S. (1994). Investment under Uncertainty. Princeton University Press, Princeton, NJ.

[4] Foote, D. A.; Folta, T. B. (2002). Temporary workers as real options. Human Resource Management Review, 12, 579-597.
[5] Merton, R. C. (1976). Option pricing when underlying stock returns are discontinuous. Journal of Financial Economics, 3, 125-144.

[6] Henderson, V.; Hobson, D. G. (2002). Real options with constant relative risk aversion. Journal of Economic Dynamics and Control, 27, 329-355.

[7] Reiss, R. D.; Thomas, M. (2001). Statistical Analysis of Extreme Values. Second edition, Birkhäuser-Verlag, Basel, Switzerland.

[8] Ripley, B. D. (1987). Stochastic Simulation. Wiley, New York.

[9] Schwartz, E. S.; Trigeorgis, L. (2001). Real Options and Investment under Uncertainty. MIT Press, Cambridge, Massachusetts; London, England.

[10] Strobel, F. (2005). Monetary integration and inflation preferences: a real options analysis. European Economic Review, 49, 845-860.

[11] Venegas-Martínez, F. (2001). Temporary stabilization: a stochastic analysis. Journal of Economic Dynamics and Control, 25, 1429-1449.

[12] Venegas-Martínez, F. (2005). Bayesian inference, prior information on volatility, and option pricing: a maximum entropy approach. International Journal of Theoretical and Applied Finance, 8, 1-12.

[13] Venegas-Martínez, F. (2006a). Stochastic temporary stabilization: undiversifiable devaluation and income risks. Economic Modelling, 23(1), 157-173.

[14] Venegas-Martínez, F. (2006b). Fiscal policy in a stochastic temporary stabilization model: undiversifiable devaluation risk. Journal of World Economic Review, 1, 87-106.

[15] Venegas-Martínez, F. (2006c). Financial and Economic Risks (Derivative Products and Economic Decisions under Uncertainty). International Thomson Editors.
Morfismos, Student Communications of the Mathematics Department of CINVESTAV, was printed in March 2008 at the reproduction workshop of the same department, located at Av. IPN 2508, Col. San Pedro Zacatenco, México, D.F. 07300. The print run of 500 copies, on imported 36-kilogram opaline paper of 34 × 25.5 cm, has a Tintoretto green cover.
Technical support: Omar Hernández Orozco.
Contents

Dice games and stochastic dynamic programming
Henk Tijms .......................................................... 1

Average optimality for semi-Markov control processes
Anna Jaśkiewicz and Andrzej S. Nowak ............................... 15

Real options on consumption in a small open monetary economy: a stochastic optimal control approach
Francisco Venegas-Martínez ......................................... 37