Morfismos, Vol 16, No 2, 2012


VOLUME 16 NUMBER 2 JULY TO DECEMBER 2012 ISSN: 1870-6525


Morfismos, Departamento de Matemáticas, Cinvestav

Chief Editors • Isidoro Gitler • Jesús González

Associate Editors • Ruy Fabila • Ismael Hernández • Onésimo Hernández-Lerma • Héctor Jasso Fuentes • Sadok Kallel • Miguel Maldonado • Carlos Pacheco • Enrique Ramírez de Arellano • Enrique Reyes • Dai Tamaki • Enrique Torres Giese

Technical Support • Adriana Aranda Sánchez • Juan Carlos Castro Contreras • Irving Josué Flores Romero • Omar Hernández Orozco • Roxana Martínez • Laura Valencia

Morfismos is available at http://www.morfismos.cinvestav.mx. For further information, call +52 (55) 5747-3871. All correspondence should be addressed to Sra. Laura Valencia, Departamento de Matemáticas del Cinvestav, Apartado Postal 14-740, México, D.F. 07000, or sent by email to: morfismos@math.cinvestav.mx.






Morfismos, Volume 16, Number 2, July to December 2012, is a semiannual journal published by the Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (Cinvestav), through the Departamento de Matemáticas. Av. Instituto Politécnico Nacional No. 2508, Col. San Pedro Zacatenco, Delegación Gustavo A. Madero, C.P. 07360, D.F., Tel. 55-57473800, www.cinvestav.mx, morfismos@math.cinvestav.mx. Chief Editors: Drs. Isidoro Gitler Golwain and Jesús González Espino Barros. Reserva de Derechos No. 04-2012-011011542900-102, ISSN: 1870-6525, both granted by the Instituto Nacional del Derecho de Autor. Certificado de Licitud de Título No. 14729 and Certificado de Licitud de Contenido No. 12302, both granted by the Comisión Calificadora de Publicaciones y Revistas Ilustradas of the Secretaría de Gobernación. Printed by the Departamento de Matemáticas of Cinvestav, Avenida Instituto Politécnico Nacional 2508, Colonia San Pedro Zacatenco, C.P. 07360, México, D.F. This issue was printed in February 2013 with a print run of 50 copies. The opinions expressed by the authors do not necessarily reflect the position of the journal's editors. Total or partial reproduction of the contents and images of this publication without prior authorization from Cinvestav is strictly prohibited.



Information for Authors

The Editorial Board of Morfismos calls for papers on mathematics and related areas to be submitted for publication in this journal under the following guidelines:

• Manuscripts should fit into one of the following three categories: (a) papers covering the graduate work of a student, (b) contributed papers, and (c) invited papers by leading scientists. Each paper published in Morfismos will be posted with an indication of which of these three categories it belongs to.

• Papers in category (a) may be written in Spanish; all other papers proposed for publication in Morfismos shall be written in English, except those for which the Editorial Board decides to publish in another language.

• All received manuscripts will be refereed by specialists.

• In the case of papers covering the graduate work of a student, the author should provide the supervisor's name and affiliation, the date of completion of the degree, and the institution granting it.

• Authors may retrieve the LaTeX macros used for Morfismos through the web site http://www.math.cinvestav.mx, at "Revista Morfismos". Use of these macros helps expedite the production process of accepted papers.

• All illustrations must be of professional quality.

• Authors will receive the PDF file of their published paper.

• Manuscripts submitted for publication in Morfismos should be sent to the email address morfismos@math.cinvestav.mx.



Editorial Guidelines

Morfismos is the journal of the Mathematics Department of Cinvestav. One of its main objectives is to give advanced students a forum in which to publish their early mathematical writings and to build skills in communicating mathematics. Publication of papers is not restricted to students of Cinvestav; we want to encourage students in Mexico and abroad to submit papers. Mathematics research reports or summaries of bachelor's, master's, and Ph.D. theses of high quality will be considered for publication, as well as contributed and invited papers by researchers. All submitted papers should be original, either in the results or in the methods. The Editors will assign well-established mathematicians as referees, and the acceptance/rejection decision will be taken by the Editorial Board on the basis of the referee reports. Authors of Morfismos may choose to transfer the copyright of their works to Morfismos; in that case, the corresponding papers cannot be considered or sent for publication in any other printed or electronic medium. Only those papers for which Morfismos holds copyright will be submitted for review to international databases such as the American Mathematical Society's Mathematical Reviews and the European Mathematical Society's Zentralblatt MATH.





Contents

Markov processes in decision making: the contributions of Onésimo Hernández-Lerma
Francisco Venegas-Martínez . . . 1

Homotopy theory of non-orientable mapping class groups
Miguel A. Maldonado . . . 29

Variance optimality for controlled Markov-modulated diffusions
Beatris Adriana Escobedo-Trujillo and Carlos Octavio Rivera-Blanco . . . 51



Morfismos, Vol. 16, No. 2, 2012, pp. 1–28

Markov processes in decision making: the contributions of Onésimo Hernández-Lerma

Francisco Venegas-Martínez

Abstract

This paper reviews some of the contributions of Onésimo Hernández-Lerma to mathematics and, specifically, to the theory and practice of Markov processes in the decision making of rational agents. It highlights, in particular, his extensions, reformulations, and new approaches to Markov decision processes, stochastic games, Blackwell optimality, bias optimality, overtaking optimality, and stochastic optimal control.

2010 Mathematics Subject Classification: 91A15, 91A35, 49J20, 60J25, 93C05.

Keywords: Markov decision processes, stochastic games, stochastic optimal control.

1 Introduction

In recent decades, mathematical models of decision making by rational agents have undergone a series of profound changes and transformations. These changes have opened new paradigms that highlight the exposure of agents to different types of risk. This is, of course, an unavoidable necessity for properly modeling a contingent reality, and not merely one more sophistication in the mathematical treatment. In this sense, the research of Onésimo Hernández-Lerma has opened countless horizons for the theory of Markov processes and, as a consequence, has led to the use of more robust mathematical tools, which allow a better understanding of what agents do, and why they do it, when making decisions in environments of risk and uncertainty. In this context, Markov processes occupy a privileged place, whether for their technical virtues in modeling the decision-making process or for their potential richness in diverse applications. It is also important to note that the world crisis of 2007-2009 and its many lessons have been a determining factor in the various recent reformulations of the theory of Markov processes, making it unavoidable to rethink the modeling of risk and uncertainty in agents' decision-making processes. The demands of contingent reality themselves have motivated various extensions of existing theories and, on many occasions, the formulation of new theoretical paradigms has been necessary. In particular, Markov decision processes, stochastic games, Blackwell optimality, bias optimality, overtaking optimality, and stochastic optimal control have seen notable development in recent years, in large part due to the contributions of Onésimo Hernández-Lerma and his collaborators around the world.
This can be seen in: Prieto-Rumeau and Hernández-Lerma (2012) on Markov games and continuous-time Markov chains; Guo and Hernández-Lerma (2009) on continuous-time Markov decision processes; Hernández-Lerma and Lasserre (1996) on optimality criteria for controlled Markov processes in discrete time; Hernández-Lerma and Lasserre (1999) on controlled Markov processes in discrete time; Hernández-Lerma and Lasserre (2003) regarding Markov chains and invariant probabilities¹; Hernández-Lerma (1989) for his contributions to adaptive Markov processes; and Hernández-Lerma (1990) and (1994) for his contributions to Markov processes in discrete and continuous time, respectively.

1 See also the work of Hernández-Lerma and Lasserre (2001b) on Markov chains and Harris processes.



The training and trajectory of Onésimo Hernández-Lerma rest on his arduous and constant work in the development of mathematics. He obtained his Licenciatura (B.Sc.) in Physics and Mathematics at the Escuela Superior de Física y Matemáticas (ESFM) of the Instituto Politécnico Nacional (IPN) in 1970. He subsequently obtained his master's degree, in 1976, and his doctorate, in 1978, from the Division of Applied Mathematics at Brown University in the United States. He is currently Professor Emeritus and Head of the Departamento de Matemáticas of the Centro de Investigación y de Estudios Avanzados (CINVESTAV) of the IPN. For his recognized trajectory and his many important contributions to mathematics through research, the training of researchers, and scientific outreach, Onésimo Hernández-Lerma received, in 2009, the Thomson Reuters Award as the most-cited Mexican mathematician (around 1200 citations); in 2008, the Scopus Award from Elsevier for the high impact of his publications; also in 2008, the Presea Lázaro Cárdenas of the IPN, the highest distinction that institution grants a researcher; in 2003, an honorary doctorate (Doctor Honoris Causa) from the Universidad de Sonora (UNISON); and, in 2001, the Premio Nacional de Ciencias y Artes, awarded by the Government of Mexico through the Secretaría de Educación Pública. He has likewise been named a member of the Consejo Consultivo de Ciencias de la Presidencia de la República, and has been a member of the Sistema Nacional de Investigadores (SNI) at level III since 1993. Onésimo Hernández-Lerma's research areas are many; among them stand out stochastic optimal control, the theory of stochastic games, infinite linear programming, and Markov processes.
On these topics he has published around 140 research articles in strictly refereed specialized journals belonging to prestigious international indexes, as well as 11 books and monographs with widely recognized publishing houses. His work in graduate-level training has been exceptional: he has supervised 18 doctoral and 39 master's students. In addition, owing to his worldwide leadership, he has received postdoctoral visitors from France, Spain, and the People's Republic of China, among other countries. The purpose of the present work is to carry out a review, which does not claim to be exhaustive, of the contributions and reformulations of Onésimo Hernández-Lerma in Markov processes. Many of the recent contributions to Markov decision processes, stochastic games, and stochastic optimal control are due to Onésimo Hernández-Lerma and his coauthors around the world. A didactic advantage of his research on modeling the sequential decision making of rational agents, in discrete or continuous time and in environments with risk and uncertainty, is that all of his investigations provide a unified and consistent vision. This work is organized as follows: the next section reviews Markov decision processes; the third section studies Markov games; the fourth section analyzes bias optimality and overtaking optimality; the fifth section examines Blackwell optimality for controlled Markov diffusion processes; the sixth section reviews stochastic optimal control theory with Markov diffusion processes; finally, the seventh section provides conclusions, highlighting areas of opportunity for extending the theory of Markov processes.

2 Markov decision processes

There are many systems in the natural and social sciences in which future events have an associated probability distribution that depends only on the present; in such cases it may be appropriate to model them with Markov chains. Several questions arise about the behavior of a Markov chain: How does such a process evolve? Does it converge, in some sense, to a stationary state? How fast does it converge? These questions have been amply answered in the literature when the Markov chain has a finite number of states. But what happens when there are infinitely many states, countable or continuous? In this regard, Hernández-Lerma and Lasserre (2003) deal with homogeneous discrete-time Markov chains on arbitrary state spaces whose ergodic behavior is described by invariant probability measures. In particular, this section concentrates on controlled Markov processes in discrete time with a finite or infinite planning horizon. Many phenomena and situations of interest can be modeled within this scheme²; for example, consumption, production, and investment decisions, and the valuation of investment projects, whether in the short or the long run.

A relevant class of control processes is that of Markov control processes. The evolution of these processes in discrete time can be described as follows. The system starts in state i_0 = x; the controller then chooses an action (or control) a_0 = a, which generates a cost r(x, a) depending on the state and the control. The system then moves to a new state i_1 = y according to a transition law in which the future is determined only by the present. This procedure repeats, and the costs accumulate. We say that {i_n : n = 1, 2, …} is a discrete-time Markov control process if, for any strategy π (a function of the sequences of states and actions) and any n = 0, 1, …, the distribution at n + 1, given the whole history of the process up to n, depends only on the state and the action at n. The states and actions are collections of random variables defined on a suitable probability space, and the objective is to find a control policy that optimizes a performance criterion (in terms of expected values).

Next we present, briefly, the elements of a Markov decision process, abbreviated {S, A, K, q, r}. To simplify the exposition, the state space S is assumed discrete. Consider a controlled Markov chain in discrete time with:

i) A state space S, finite or countable.

ii) A measurable action space A, equipped with a σ-algebra A of subsets of A. The constraint set is represented by K = S × A.

iii) For each state i ∈ S, a set A(i) of available actions. These sets are assumed to be elements of A.

iv) A matrix of transition probabilities [q(j | i, a)]. For each i, j ∈ S and a ∈ A(i), the function q(j | i, a) is nonnegative and measurable, and Σ_{j∈S} q(j | i, a) = 1 for each i ∈ S and a ∈ A(i).

v) A function r : K → R, called the utility, reward, or cost, depending on the context.

2 This scheme is also known as discrete-time stochastic dynamic programming.
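As a concrete illustration of the model {S, A, K, q, r} and its evolution, the sketch below simulates a trajectory i_0, a_0, i_1, a_1, … under a stationary strategy and accumulates the discounted reward along the way. The two-state model, its transition law, and its rewards are hypothetical, chosen only for this example; they are not taken from the paper.

```python
import random

# Hypothetical two-state Markov decision model {S, A, K, q, r}.
S = [0, 1]                                         # state space
A = {0: ["stay", "move"], 1: ["stay", "move"]}     # admissible actions A(i)
# Transition law q(j | i, a): each row sums to 1 for every (i, a).
q = {
    (0, "stay"): [0.9, 0.1],
    (0, "move"): [0.2, 0.8],
    (1, "stay"): [0.1, 0.9],
    (1, "move"): [0.7, 0.3],
}
# One-step reward r(i, a).
r = {(0, "stay"): 1.0, (0, "move"): 0.0,
     (1, "stay"): 2.0, (1, "move"): 0.5}

def rollout(phi, i0, beta=0.95, horizon=200, seed=0):
    """Simulate i_0, a_0, i_1, ... under a stationary strategy phi
    and return the accumulated discounted reward."""
    rng = random.Random(seed)
    i, total, disc = i0, 0.0, 1.0
    for _ in range(horizon):
        a = phi[i]                    # a_n = phi(i_n), with phi(i) in A(i)
        total += disc * r[(i, a)]
        disc *= beta
        # draw i_{n+1} from q(. | i_n, a_n)
        i = rng.choices(S, weights=q[(i, a)])[0]
    return total

phi = {0: "stay", 1: "stay"}          # a stationary strategy
print(round(rollout(phi, i0=0), 3))
```

With a fixed seed the trajectory is reproducible, and the accumulated reward is bounded by max r / (1 − β), as the theory requires.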



Let H_n = S × (S × A)^n be the space of histories up to time n = 0, 1, …, and let H = ∪_{0≤n<∞} H_n be the space of all finite histories.

The spaces H_n and H are equipped with the σ-algebras generated by 2^S and A. A strategy π is a function that assigns to each history of states and actions h_n = (i_0, a_0, i_1, …, i_{n−1}, a_{n−1}, i_n) ∈ H_n, n = 0, 1, …, a probability measure π(· | h_n) defined on (A, A) satisfying the following conditions:

a) π(A(i_n) | h_n) = 1;

b) for any B ∈ A, the function π(B | ·) is measurable on H.

A Markov strategy φ is a sequence of functions φ_n : S → A, n = 0, 1, …, such that φ_n(i) ∈ A(i) for every i ∈ S. A Markov strategy φ is said to be (N, ∞)-stationary, where N = 0, 1, …, if φ_n(i) = φ_N(i) for every n = N + 1, N + 2, … and every i ∈ S. A (0, ∞)-stationary strategy is called, simply, stationary. Thus a stationary strategy is determined by a function φ : S → A such that φ(i) ∈ A(i), i ∈ S. A randomized stationary strategy φ is defined by its conditional distributions φ(· | i), i ∈ S, on (A, A), such that φ(A(i) | i) = 1 for every i ∈ S. Observe that in this "canonical" construction the state and action processes are collections of random variables. The set H_∞ of all state-action sequences (i_0, a_0, i_1, a_1, …, i_{n−1}, a_{n−1}, i_n, a_n, …), together with the corresponding product σ-algebra F, forms a measurable space (H_∞, F). Thus each strategy π and initial state i_0 = x induce a unique probability measure P_x^π on H_∞, whose expectation operator is denoted by E_x^π. The total discounted utility³ when the initial state is i and the strategy used is π is then given by

V(i | π) = E_i^π [ Σ_{t=0}^∞ β^t r(i_t, a_t) ],

where β ∈ (0, 1) is the discount factor. The value function of the problem is defined by

V(i) = sup_π V(i | π).

3 Various performance criteria can be used; see, for example, Feinberg (1982).
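On a finite model, the supremum defining V can be computed by value iteration: iterate the dynamic-programming (Bellman) operator until successive iterates are close in sup-norm, then take the greedy stationary strategy. The sketch below uses a hypothetical two-state model declared inline so that it is self-contained; the stopping threshold ε(1 − β)/(2β) is the standard contraction bound guaranteeing the greedy strategy is within ε of optimal.

```python
# Value iteration for V(i) = sup_pi V(i | pi) on a small finite model
# (hypothetical data; beta in (0, 1) is the discount factor).
S = [0, 1]
A = {0: ["stay", "move"], 1: ["stay", "move"]}
q = {(0, "stay"): [0.9, 0.1], (0, "move"): [0.2, 0.8],
     (1, "stay"): [0.1, 0.9], (1, "move"): [0.7, 0.3]}
r = {(0, "stay"): 1.0, (0, "move"): 0.0,
     (1, "stay"): 2.0, (1, "move"): 0.5}

def value_iteration(beta=0.95, eps=1e-6):
    V = {i: 0.0 for i in S}
    while True:
        # Bellman operator: (TV)(i) = max_a [ r(i,a) + beta * sum_j q(j|i,a) V(j) ]
        TV = {i: max(r[(i, a)] + beta * sum(q[(i, a)][j] * V[j] for j in S)
                     for a in A[i])
              for i in S}
        gap = max(abs(TV[i] - V[i]) for i in S)
        V = TV
        if gap < eps * (1 - beta) / (2 * beta):   # sup-norm stopping rule
            break
    # greedy (stationary) strategy attaining the maximum
    phi = {i: max(A[i], key=lambda a: r[(i, a)]
                  + beta * sum(q[(i, a)][j] * V[j] for j in S))
           for i in S}
    return V, phi

V, phi = value_iteration()
```

For these illustrative numbers the fixed point is V(0) ≈ 33.59 and V(1) ≈ 35.80, with the greedy strategy choosing "move" in state 0 and "stay" in state 1.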



Let ε be a nonnegative constant. A strategy π* is called ε-optimal if for every initial state i,

V(i | π*) ≥ V(i) − ε.

A 0-optimal strategy is called, simply, optimal. Within this framework, Hernández-Lerma (1989) considers partially observable stochastic control systems in discrete time. The author studies the nonparametric adaptive control problem over an infinite horizon under the total discounted reward criterion, provides conditions for an adaptive policy to be asymptotically optimal, and establishes conditions for uniformly approximating, almost surely, the optimal reward function. His work combines convergence results with adaptive and parametric stochastic control problems. Likewise, Hernández-Lerma (1986) provides discretization procedures for adaptive Markov control processes in discrete time, with a finite number of states and an infinite planning horizon, that depend on unknown parameters. In his investigation, the discretizations are combined with a coherent parameter-estimation scheme to obtain uniform approximations to the optimal value function, as well as to determine asymptotically optimal adaptive control policies. On the other hand, Hernández-Lerma (1985), under the discounted reward criterion and with a countable state space, studies semi-Markov decision processes that depend on unknown parameters. Since the true parameter values are uncertain, the author provides an iterative scheme for asymptotically determining the maximum total discounted reward.
The solutions adopt the nonstationary value-iteration scheme of Federgruen and Schweitzer (1981), combined with the principle of estimation and control for the adaptive control of semi-Markov processes of Schäl (1987).⁴ Likewise, Jasso-Fuentes and Hernández-Lerma (2007) study a general class of Markov diffusion processes under the expected average reward (also known as the ergodic reward) and provide some discount-"sensitive" criteria. These authors give conditions under which several optimality criteria are equivalent. Other related works include: Guo and Hernández-Lerma (2003a), who study controlled Markov chains in continuous time; Guo and Hernández-Lerma (2003b), who provide drift and monotonicity conditions for continuous-time Markov control processes under the average-payoff criterion; Guo and Hernández-Lerma (2003c), who analyze controlled Markov chains in continuous time under the discounted-payoff criterion; and Hernández-Lerma and Govindan (2001), who investigate nonstationary Markov control processes with discounted payoffs over an infinite horizon.

4 See also Hernández-Lerma and Marcus (1987).

Finally, it is important to note that Hernández-Lerma (1986) extends the iterative scheme introduced by White (1980) for a finite number of states, designed to approximate the value function of a Markov process with a countable set of states, to a countable multidimensional state space. Under the same assumptions as White (1980), the author provides an iterative scheme for asymptotically determining a discounted optimal policy, which in turn can be used to obtain a stationary optimal policy.⁵

3 Markov games in continuous time

En esta secci´ on se formaliza un juego estoc´astico de suma cero con dos jugadores en tiempo continuo y homog´eneo6 . Los elementos que conforman dicho juego se expresan en forma abreviada como {S, A, B, KA , KB , q, r}. Aqu´ı, S es el espacio de estados, el cual se supone numerable, y A y B son los espacios de acciones para los jugadores 1 y 2, respectivamente. Estos espacios se suponen espacios polacos (i.e., espacios m´etricos, separables y completos). Los conjuntos KA ⊂ S × A y KB ⊂ S × B son espacios de Borel que representan conjuntos de restricciones. Es decir, para cada estado i ∈ S, la i-secci´on en KA , a saber, A(i) := 5 Otros trabajos relacionados con el tema son Hern´ andez-Lerma (1985) y Hern´ andez-Lerma y Marcus (1984) y (1985). 6 En Hern´ andez-Lerma (1994) se encuentra una introducci´ on a los procesos markovianos de control en tiempo continuo. Asimismo, el caso discreto es tratado en Hern´ andez-Lerma y Lasserre (1999).



{a ∈ A | (i, a) ∈ K_A}, represents the set of admissible actions for player 1 in state i; similarly, the i-section of K_B, B(i) := {b ∈ B | (i, b) ∈ K_B}, represents the family of admissible actions for player 2 in state i. Now consider the Borel subset of S × A × B defined by K := {(i, a, b) | i ∈ S, a ∈ A(i), b ∈ B(i)}. The component q denotes the matrix [q(j | i, a, b)] of transition rates of the game, which satisfies q(j | i, a, b) ≥ 0 for all (i, a, b) ∈ K with i ≠ j, and is assumed conservative, that is,

Σ_{j∈S} q(j | i, a, b) = 0  ∀ (i, a, b) ∈ K,

and stable, that is,

q(i) := sup_{a∈A(i), b∈B(i)} q_i(a, b) < ∞  ∀ i ∈ S,

where q_i(a, b) = −q(i | i, a, b) for all a ∈ A(i) and b ∈ B(i). Moreover, q(j | i, a, b) is a measurable function on A × B for fixed i, j ∈ S. Finally, r : K → R is the reward (or utility) rate of player 1 (or the cost rate of player 2).

Players 1 and 2 continuously observe the current state of the system. Whenever the system is in state i ∈ S at time t ≥ 0, the players independently choose actions a ∈ A(i) and b ∈ B(i), according to admissible strategies introduced below. As a consequence, the following happens: (1) player 1 receives a reward r(i, a_t, b_t); (2) player 2 incurs a loss r(i, a_t, b_t) (the game is called zero-sum because whatever one player gains, the other inevitably loses); and (3) the system moves to a new state j ≠ i with a possibly nonhomogeneous transition function determined by the transition rates [q(j | i, a, b)]. The objective of player 1 is to maximize his reward, while that of player 2 is to minimize his cost or loss, with respect to some performance criterion V_α, which will be defined later.

Let X be a Polish space. Denote by B(X) its Borel σ-algebra and by P(X) the Borel space of probability measures on X, equipped with the topology of weak convergence. A Markov strategy for player 1, denoted by π^1, is a family {π_t^1, t ≥ 0} of stochastic kernels satisfying:



(1) For each t ≥ 0 and i ∈ S, π¹_t(· | i) is a probability measure on A such that π¹_t(A(i) | i) = 1; (2) For each E ∈ B(A) and i ∈ S, the map t → π¹_t(E | i) is Borel measurable on t ≥ 0. Without loss of generality, by (1), π¹_t(· | i) may also be viewed as a probability measure on A(i). The family of all Markov strategies of player 1 is denoted by Π¹_m. A Markov strategy π¹ = {π¹_t(· | i), t ≥ 0} ∈ Π¹_m is called stationary if for each i ∈ S there is a probability measure π¹(· | i) ∈ P(A(i)) such that π¹_t(· | i) = π¹(· | i) for all t ≥ 0; such a policy is denoted by {π¹(· | i), i ∈ S}. The set of all stationary strategies of player 1 is denoted by Π¹_s. The same notation is used for player 2, with P(B(i)) in place of P(A(i)).

For each pair of strategies (π¹, π²) := {(π¹_t, π²_t), t ≥ 0} ∈ Π¹_m × Π²_m, the transition and reward rates are defined, for each i, j ∈ S and t ≥ 0, as

q(j | i, t, π¹, π²) := ∫_{A(i)} ∫_{B(i)} q(j | i, a, b) π¹_t(da | i) π²_t(db | i)

and

r(t, i, π¹, π²) := ∫_{A(i)} ∫_{B(i)} r(i, a, b) π¹_t(da | i) π²_t(db | i).

In particular, when π¹ and π² are both stationary, these expressions are usually written q(j | i, π¹, π²) and r(i, π¹, π²), respectively. Now consider the matrix Q(t, π¹, π²) = [q(j | i, t, π¹, π²)]. A (possibly substochastic) transition function p(s, i, t, j, π¹, π²) for which Q(t, π¹, π²) is the matrix of transition rates, that is,

∂p(s, i, t, j, π¹, π²)/∂t |_{t=s} = q(j | i, s, π¹, π²)

for all i, j ∈ S and s ≥ 0, is called a Q-type process. A Q-type process p(s, i, t, j, π¹, π²) is called honest if Σ_{j∈S} p(s, i, t, j, π¹, π²) = 1 for all i ∈ S and t ≥ s ≥ 0.
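For stationary strategies with time-homogeneous, conservative and bounded rates, the Q-type process can be computed explicitly as the matrix exponential p(s, i, t, j) = [e^{(t−s)Q}]_{ij}. A minimal numerical sketch (the 3-state rate matrix is invented for illustration) checking honesty and the defining derivative condition:

```python
import numpy as np

# Hypothetical conservative rate matrix Q = [q(j | i)] for a 3-state chain
# under a fixed pair of stationary strategies: off-diagonal entries are
# transition rates and every row sums to zero.
Q = np.array([[-2.0, 1.5, 0.5],
              [ 0.3, -0.8, 0.5],
              [ 0.2,  0.6, -0.8]])

def expm(A, terms=60):
    """Matrix exponential via its power series (accurate for small matrices)."""
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

P = expm(0.7 * Q)                       # p(s, i, s + 0.7, j)
print(np.allclose(P.sum(axis=1), 1.0))  # honest: each row sums to 1 -> True

# Derivative condition: d/dt p(s, i, t, j) |_{t=s} = q(j | i).
h = 1e-6
print(np.allclose((expm(h * Q) - np.eye(3)) / h, Q, atol=1e-4))  # True
```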

In what follows, Π¹ and Π² are defined as subsets of Markov strategies that contain Π¹_s and Π²_s and satisfy a continuity condition on the corresponding transition rates for t ≥ 0 and for every strategy in Π¹ and Π². Thus q(j | i, t, π¹, π²) is continuous in t ≥ 0, for all i, j ∈ S and (π¹, π²) ∈ Π¹ × Π².

For each pair of strategies (π¹, π²) ∈ Π¹ × Π², initial data (s, i) ∈ 𝒮 := [0, ∞) × S, and a discount factor α > 0, the discounted payoff criterion Vα(s, i, π¹, π²) is defined as

Vα(s, i, π¹, π²) := ∫_s^∞ [ Σ_{j∈S} p(s, i, t, j, π¹, π²) r(t, j, π¹, π²) ] e^{−α(t−s)} dt.
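When both strategies are stationary and the rates are time-homogeneous and bounded, the defining integral has the closed form Vα(i) = [(αI − Q)^{−1} r]_i, the resolvent of Q. A sketch, with an invented two-state rate matrix and reward vector, comparing this closed form with direct quadrature of the integral above:

```python
import numpy as np

# Invented stationary-strategy data: conservative rate matrix Q and
# reward vector r(i) := r(i, pi1, pi2); alpha is the discount rate.
Q = np.array([[-1.0, 1.0], [0.4, -0.4]])
r = np.array([2.0, -1.0])
alpha = 0.5

# Closed form: V_alpha = (alpha*I - Q)^{-1} r.
V = np.linalg.solve(alpha * np.eye(2) - Q, r)

def expm(A, terms=60):
    out, term = np.eye(len(A)), np.eye(len(A))
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

# Quadrature of V_alpha(i) = int_0^inf e^{-alpha*t} [e^{tQ} r]_i dt,
# stepping the transition function e^{tQ} forward by e^{dt*Q}.
dt, T = 1e-3, 40.0
P_step, P, V_num = expm(dt * Q), np.eye(2), np.zeros(2)
for k in range(int(T / dt)):
    V_num += np.exp(-alpha * k * dt) * (P @ r) * dt
    P = P @ P_step
print(np.allclose(V, V_num, atol=1e-2))  # True
```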

The following two functions,

L(s, i) := sup_{π¹∈Π¹} inf_{π²∈Π²} Vα(s, i, π¹, π²)

and

U(s, i) := inf_{π²∈Π²} sup_{π¹∈Π¹} Vα(s, i, π¹, π²),

defined on 𝒮, are called the lower value and the upper value, respectively, of the discounted-payoff game. Clearly L(s, i) ≤ U(s, i) for all (s, i) ∈ 𝒮. If L(s, i) = U(s, i) for all (s, i) ∈ 𝒮, the common function is called the value of the game and is denoted by V. Suppose the game has a value V; then a strategy π*¹ in Π¹ is said to be optimal for player 1 if

inf_{π²∈Π²} Vα(s, i, π*¹, π²) = V(s, i) for all (s, i) ∈ 𝒮.

Similarly, π*² in Π² is optimal for player 2 if

sup_{π¹∈Π¹} Vα(s, i, π¹, π*²) = V(s, i) for all (s, i) ∈ 𝒮.

If π*k ∈ Πk is optimal for player k (k = 1, 2), the pair (π*¹, π*²) is called an optimal strategy. Regarding the above setting, it is worth noting that Guo and Hernández-Lerma (2005a) study zero-sum games for continuous-time Markov chains in which the rewards and the transition rates may be unbounded, under the discounted payoff criterion.





Under this criterion they obtain the value of the game and a pair of optimal stationary strategies by means of the Shapley (1953) optimality equation. Finally, they present a martingale characterization of a pair of optimal stationary strategies.

On the other hand, Guo and Hernández-Lerma (2005b) study two-person nonzero-sum games for continuous-time Markov chains under the discounted payoff criterion with Borel action spaces. The transition rates are possibly unbounded, and the payoff functions may have neither upper nor lower bounds. This work gives conditions guaranteeing the existence of Nash equilibria in stationary strategies. For the zero-sum case, they prove the existence of the value of the game and also provide a recursive way of computing it, or at least approximating it. These authors also show that if the transition rates are uniformly bounded, then a continuous-time game is equivalent, in a suitable sense, to a discrete-time Markov game.

Finally, Guo and Hernández-Lerma (2007) extend their work on two-person zero-sum games to continuous-time jump Markov processes with a discounted payoff criterion. The state and action spaces are Polish spaces (complete separable metric spaces), the transition rates may be unbounded, and the reward rates may have neither upper nor lower bounds. In this work the authors extend the results of Guo and Hernández-Lerma (2003d) to continuous-time Markov jump processes.

Suppose J_T(f) denotes the total expected reward over the time interval [0, T] when the control policy f is used, and let

g(f) := lim inf_{T→∞} (1/T) J_T(f)

be the corresponding average reward. If f and f′ are two policies such that J_T(f) = J_T(f′) + T^θ for all T > 0 and some θ ∈ (0, 1), then the two policies yield the same average reward even though their finite-horizon rewards differ. Thus the average reward criterion does not distinguish between the policies f and f′. To avoid this behavior, conditions are imposed under which the finite-horizon rewards of stationary policies are necessarily of the form

J_T(f) = T g(f) + h_f(·) + e(f, T),

where h_f(·) is the bias of f and e(f, T) is a residual term that tends to 0 as T → ∞. Consequently, if f and f′ are two stationary policies with the same average reward, then

J_T(f) − J_T(f′) = h_f(·) − h_{f′}(·) + [e(f, T) − e(f′, T)].

If, in addition, h_f(·) ≥ h_{f′}(·), then the policy f, which has the larger bias, eventually overtakes f′ in the sense that, for any given ε > 0, J_T(f) ≥ J_T(f′) − ε for all T sufficiently large. In other words, maximizing the bias function within the class of average-reward optimal policies yields the policy with the fastest growth.

In this regard, Escobedo-Trujillo, López-Barrientos and Hernández-Lerma (2012) deal with zero-sum stochastic differential games with long-run average payoffs. Their main objective is to give conditions for the existence and characterization of bias-optimal and overtaking equilibria. They first characterize the family of average-payoff optimal strategies; then, within this family, suitable conditions are imposed to determine the subfamilies of bias and overtaking equilibria. An essential step is to prove the existence of solutions to the average-reward optimality equations, which is done through the usual "vanishing discount" approach.

For their part, Álvarez-Mena and Hernández-Lerma (2006) consider noncooperative N-person stochastic games with discounted payoff criteria. The state space is assumed countable and the action sets are compact metric spaces. These authors obtain several important results. The first concerns the sensitivity, or approximation, of constrained games. The second shows the existence of Nash equilibria for constrained games with a finite state space (and compact action spaces). The third extends the existence conditions to a class of constrained games



that can be approximated by constrained games with a finite number of states and compact action spaces. Other relevant contributions to stochastic games include: Rincón-Zapatero (2004) and Rincón-Zapatero et al. (1998) and (2000), who characterize subgame-perfect Nash equilibria in differential games; Nowak (2003a) and (2003b), and Nowak and Szajowski (2003) and (2005), who analyze Nash equilibria of zero-sum and nonzero-sum stochastic games; and Neck (1985) and (1991), who studies differential games between the fiscal authority and the central bank.
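At each state, the Shapley optimality equation mentioned above evaluates the value of a one-shot zero-sum matrix game. A self-contained sketch (with an invented 2×2 payoff matrix having no pure saddle point) computing that value in closed form and verifying it by brute force:

```python
import numpy as np

# Zero-sum matrix game: player 1 (rows) maximizes, player 2 (columns) minimizes.
A = np.array([[3.0, -1.0],
              [-2.0, 4.0]])   # invented payoffs, no pure saddle point

# Closed form for 2x2 games without a saddle point.
a, b, c, d = A[0, 0], A[0, 1], A[1, 0], A[1, 1]
denom = a + d - b - c
value = (a * d - b * c) / denom    # = 1.0
p = (d - c) / denom                # optimal probability of row 0 (= 0.6)

# Brute-force check: value = max_p min_j [p, 1-p] @ A[:, j].
grid = np.linspace(0.0, 1.0, 100001)
worst = np.minimum(grid * A[0, 0] + (1 - grid) * A[1, 0],
                   grid * A[0, 1] + (1 - grid) * A[1, 1])
print(abs(worst.max() - value) < 1e-3)   # True
```

In a Shapley-type iteration for a discounted stochastic game, this one-shot value would be recomputed state by state, with payoffs combining r(i, a, b) and discounted continuation values.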

4 Bias optimality and overtaking optimality

Jasso-Fuentes and Hernández-Lerma (2008) give conditions for the existence of overtaking optimal policies for a general class of controlled diffusion processes. The characterization is lexicographic: first one identifies the class of so-called canonical policies and then, within this class, one looks for policies with some special feature, for instance canonical policies that, in addition, maximize the bias.8 On the other hand, Escobedo-Trujillo and Hernández-Lerma (2011) study controlled diffusions modulated by a Markov chain. A controlled diffusion modulated by a Markov chain is a stochastic differential equation of the form

dx(t) = b(x(t), ψ(t), u(t))dt + σ(x(t), ψ(t))dW_t, x(0) = 0, ψ(0) = i,

where ψ(t) is an irreducible continuous-time Markov chain with finite state space S = {1, 2, ..., N} and transition probabilities

P{ψ(s + dt) = j | ψ(s) = i} = q_ij dt + o(dt).

For states i ≠ j the quantity q_ij is the transition rate from i to j, while q_ii = −Σ_{j≠i} q_ij.

8 Strong overtaking optimality is a concept originally introduced by Ramsey (1928). A weaker notion was introduced independently by Atsumi (1965) and von Weizsäcker (1965).
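A minimal Euler-Maruyama sketch of such a Markov-modulated diffusion (all coefficients invented for illustration; the regime ψ switches the mean-reversion speed and the volatility of x(t)):

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented coefficients: regime psi in {0, 1} modulates drift and volatility.
b = lambda x, psi: (-1.0 if psi == 0 else -0.2) * x   # drift b(x, psi)
sigma = lambda x, psi: 0.2 if psi == 0 else 0.6       # volatility sigma(x, psi)
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])              # modulating-chain rates q_ij

dt, n = 1e-3, 5000
x, psi = 1.0, 0
for _ in range(n):
    # Regime switch with probability q_{psi,j} dt + o(dt).
    if rng.random() < -Q[psi, psi] * dt:
        psi = 1 - psi
    x += b(x, psi) * dt + sigma(x, psi) * np.sqrt(dt) * rng.normal()

# Both regimes are mean-reverting here, so the path stays bounded.
print(abs(x) < 10.0)   # True
```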



These authors give conditions for the existence and characterization of overtaking optimal policies. To this end, they first use the fact that the average-reward Hamilton-Jacobi-Bellman equation ensures that the family of canonical control policies is nonempty. Then, within this family, they characterize the canonical policies that maximize the bias and are overtaking optimal. Other important results on bias optimality and overtaking optimality can be found in Prieto-Rumeau and Hernández-Lerma (2006) and (2009). Likewise, Prieto-Rumeau and Hernández-Lerma (2005) deal with zero-sum continuous-time Markov games with a countable state space, arbitrary Borel action spaces, and possibly unbounded transition and reward (or cost) rates; they also analyze bias optimality and overtaking optimality criteria.
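A discrete-time analogue makes the decomposition J_T(f) = T g(f) + h_f + e(f, T) discussed above concrete: for an ergodic chain with transition matrix P and reward vector r, the gain is g = πr and the bias h solves the Poisson equation (I − P)h = r − g·1, normalized by πh = 0. A sketch with an invented two-state chain:

```python
import numpy as np

P = np.array([[0.9, 0.1], [0.5, 0.5]])   # invented transition matrix
r = np.array([1.0, 0.0])                 # invented reward vector

# Stationary distribution: solve pi = pi P with sum(pi) = 1.
A = np.vstack([P.T - np.eye(2), np.ones(2)])
pi = np.linalg.lstsq(A, np.array([0.0, 0.0, 1.0]), rcond=None)[0]
g = pi @ r                               # long-run average reward (gain)

# Bias h: Poisson equation (I - P) h = r - g*1, normalized so pi @ h = 0.
B = np.vstack([np.eye(2) - P, pi])
h = np.linalg.lstsq(B, np.append(r - g, 0.0), rcond=None)[0]

# Finite-horizon reward J_T(i) = sum_{t<T} [P^t r]_i, so J_T ~ T*g + h.
def J(T):
    tot, Pt = np.zeros(2), np.eye(2)
    for _ in range(T):
        tot += Pt @ r
        Pt = Pt @ P
    return tot

e = J(50) - 50 * g - h                   # residual e(f, T) -> 0 as T grows
print(np.allclose(e, 0.0, atol=1e-8))    # True
```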

5 Blackwell optimality for controlled diffusion processes

The most common optimality criteria for infinite-horizon optimal control problems are expected discounted utility and expected average utility. These two criteria have opposite aims: the former emphasizes short-run performance, since it vanishes over long time intervals, whereas the latter considers only asymptotic behavior, simply ignoring what happens on finite time intervals. As an alternative to these two extremes, one considers refinements of the average utility criterion such as overtaking optimality, bias optimality, and the so-called discount-sensitive criteria, which include m-discount optimality for an integer m ≥ −1 and Blackwell optimality for m = +∞. They are called "refinements" because they refer to control policies that optimize the average utility and, in addition, satisfy some further property. In this respect, it is worth noting that Jasso-Fuentes and Hernández-Lerma (2009) provide several of these refinements. These authors give conditions guaranteeing m-discount optimality for every integer m ≥ −1, as well as Blackwell optimality, when the controlled system is a Markov diffusion process of the form

dx(t) = b(x(t), u(t))dt + σ(x(t))dB(t) for all t ≥ 0, with x(0) = x,



where b(·, ·) : R^n × U → R^n and σ(·) : R^n → R^{n×d} are given functions satisfying a set of standard conditions, and B(·) is a d-dimensional Brownian motion. The set U ⊂ R^m is called the control (or action) set, and u(·) is a U-valued stochastic process representing the controller's action at each time t ≥ 0.

6 Optimal control with Markovian diffusion processes

This section states the general stochastic optimal control problem in which the constraints are Markovian diffusion processes, and formulates the dynamic programming technique leading to the nonlinear Hamilton-Jacobi-Bellman (HJB) partial differential equation, whose solution characterizes the optimal control and, with it, the trajectories of the variables that optimize the objective function.9 Following Hernández-Lerma (1994), stochastic optimal control is a mathematical technique for solving optimization problems for dynamical systems under uncertainty. The general mathematical model of the continuous-time stochastic optimal control problem is set up next. Consider a continuous-time dynamical system on a finite horizon [0, T]. First, take suitable functions µ(t, x, u) and σ(t, x, u) with

µ : R+ × R^n × R^k → R^n, σ : R+ × R^n × R^k → R^{n×d}.

For a point x_0 ∈ R^n, consider the stochastic differential equation

dX_t = µ(t, X_t, u_t)dt + σ(t, X_t, u_t)dW_t, X_0 = x_0,

where the n-dimensional process X_t is the state process to be controlled, the k-dimensional process u_t is the control process, whose proper choice will control X_t, and W_t is a d-dimensional Wiener process defined on a fixed filtered probability space (Ω, F, (F_t^W)_{t∈[0,T]}, P).

9 For more details on the continuous-time stochastic optimal control problem see, for example, Björk, Myhrman and Persson (1987).



Next, an admissible control rule is defined. To this end, consider the class of admissible control processes as processes whose value u_t at time t is adapted to the state process X_t, obtained through a function u(t, x),

u : R+ × R^n → R^k,

so that u_t = u(t, X_t); a function u defined in this way is called a feedback control rule or Markov strategy. Now impose on u the restriction that, for each t, u_t ∈ U ⊂ R^k, where U is the class of admissible controls. A control rule u(t, x) is admissible if:

1) u(t, x) ∈ U for all t ∈ R+ and all x ∈ R^n;

2) For any given initial point (t, x), the stochastic differential equation

dX_s = µ(s, X_s, u(s, X_s))ds + σ(s, X_s, u(s, X_s))dW_s, X_t = x,

has a unique solution.

Since the optimal control problem will be posed in a stochastic framework, and since the state process is n-dimensional, it is necessary to define the following functions and to state the fundamental theorem of stochastic calculus, Itô's lemma (for the case of n variables). For any control rule u, the functions µ^u and σ^u are defined by

µ^u(t, x) = µ(t, x, u(t, x)), σ^u(t, x) = σ(t, x, u(t, x)),

and are assumed to have continuous second derivatives. Itô's lemma for n state variables is stated next. Consider the function y = f(t, x), x = (x_1, x_2, ..., x_n), and the stochastic differential equations

dx_i = µ_i(x_i, t)dt + σ_i(x_i, t)dW_it, i = 1, 2, ..., n,



and any fixed vector u_t ∈ R^k; then, for any control rule u,

dy = [ ∂f(t, x)/∂t + Σ_{i=1}^n µ_i^u(x_i, t) ∂f(t, x)/∂x_i
+ (1/2) Σ_{i=1}^n Σ_{j=1}^n σ_i^u(x_i, t) σ_j^u(x_j, t) ρ_ij ∂²f(t, x)/∂x_j∂x_i ] dt
+ Σ_{i=1}^n σ_i^u(x_i, t) ∂f(t, x)/∂x_i dW_it,

where ρ_ij is the correlation coefficient between dW_jt and dW_it, so that ρ_ij dt = Cov(dW_it, dW_jt). Now, given a control rule u_t = u(t, X_t^u) with its corresponding controlled process X^u, the notation dX_t^u = µ^u dt + σ^u dW_t will be used. To define the objective function of the control problem, consider the functions F : R+ × R^n × R^k → R, given by (t, X_t^u, u_t) → F(t, X_t^u, u_t), and Φ : R^n → R, given by X_T^u → Φ(X_T^u), where F evaluates the performance of the system over time and Φ evaluates the terminal state. Both F and Φ are assumed to be of class C². The objective functional of the control problem is defined as J_0 : U → R,

J_0(u) = E[ ∫_0^T F(t, X_t^u, u_t)dt + Φ(X_T^u) | F_0 ],

where F_0 represents the information available at time t = 0. The control problem can be written as the maximization of the functional J_0(u) over u ∈ U. The optimal functional is defined by J̃_0 = max_{u∈U} J_0(u). If there exists an admissible control û such that J̃_0 = J_0(û), then û is said to be an optimal control for the given problem. For a fixed pair (t, x), with t ∈ [0, T] and x ∈ R^n, the control problem can be stated as

max_{u_s} E[ ∫_t^T F(s, X_s^u, u_s)ds + Φ(X_T^u) | F_t ]

subject to

dX_s^u = µ(s, X_s^u, u(s, X_s^u))ds + σ(s, X_s^u, u(s, X_s^u))dW_s, X_t = x,



and to the constraint u(s, y) ∈ U for all (s, y) ∈ [t, T] × R^n.

The value function J : R+ × R^n × U → R is defined by

J(t, x, u) = E[ ∫_t^T F(s, X_s^u, u_s)ds + Φ(X_T^u) | F_t ].

The optimal value function J̃ : R+ × R^n → R is defined by

J̃(t, x) = max_{u∈U} J(t, x, u).
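The feedback rules appearing in this formulation can be exercised with a simple Euler-Maruyama discretization. A hedged sketch with invented linear dynamics dX = (aX + u)dt + s dW and the hypothetical rule u(t, x) = −kx:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented controlled dynamics dX = (a*X + u) dt + s dW, with the
# hypothetical feedback (Markov) rule u(t, x) = -k*x, so u_t = u(t, X_t).
a, s, k = 0.5, 0.3, 2.0
u = lambda t, x: -k * x    # adapted to X_t by construction

dt, n = 1e-3, 4000
x = 1.0
for i in range(n):
    x += (a * x + u(i * dt, x)) * dt + s * np.sqrt(dt) * rng.normal()

# Closed-loop drift (a - k) x = -1.5 x is mean-reverting: X_t stays near 0.
print(abs(x) < 5.0)   # True
```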

The goal now is to characterize the value function at the optimal control through a partial differential equation, better known as the HJB partial differential equation.10 It should be stressed that the derivation of the HJB equation given below is informal, but illustrative. Suppose that: 1) an optimal control rule û exists; 2) the optimal value function J̃ is of class C².
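The Itô expansion used in the derivation that follows can be sanity-checked by Monte Carlo: for the scalar (invented) process dx = −θx dt + σ dW and f(x) = x², Itô's lemma gives d(x²) = (−2θx² + σ²)dt + 2σx dW, so m(t) = E[x_t²] solves m′ = −2θm + σ². A sketch:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented Ornstein-Uhlenbeck process dx = -theta*x dt + sigma dW.
theta, sigma, x0, T = 1.0, 0.5, 1.0, 1.0
n_steps, n_paths = 500, 20000
dt = T / n_steps

x = np.full(n_paths, x0)
for _ in range(n_steps):
    x += -theta * x * dt + sigma * np.sqrt(dt) * rng.normal(size=n_paths)

# Solution of m' = -2*theta*m + sigma^2 with m(0) = x0^2 (from Ito's lemma,
# after taking expectations: the dW term vanishes since dW ~ N(0, dt)).
m_exact = x0**2 * np.exp(-2 * theta * T) \
    + sigma**2 / (2 * theta) * (1 - np.exp(-2 * theta * T))
print(abs((x**2).mean() - m_exact) < 0.02)   # True
```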

Consider a fixed but arbitrary pair (t, x) ∈ (0, T) × R^n, and take a very small, in fact differential, increment dt ∈ R such that t < t + dt < T. Also consider a fixed but arbitrary control rule u. Then, by the definition of the optimal value function and the increment dt, one has the recursive relation

J̃(t, x) = max_{u∈U} J(t, x, u) = max_{u∈U} E[ ∫_t^T F(s, X_s^u, u_s)ds + Φ(X_T^u) | F_t ]
= max_{u∈U} E[ ∫_t^{t+dt} F(s, X_s^u, u_s)ds + ∫_{t+dt}^T F(s, X_s^u, u_s)ds + Φ(X_T^u) | F_t ]
= max_{u∈U} E[ ∫_t^{t+dt} F(s, X_s^u, u_s)ds + J̃(t + dt, X_t^u + dX_t^u) | F_t ].

In this expression, the mean value theorem of integral calculus is applied to the first summand and a Taylor series expansion to the second, which yields

J̃(t, x) = max_{u∈U} E[ F(t, X_t^u, u_t)dt + o(dt) + J̃(t, X_t^u) + dJ̃(t, X_t^u) + o(dt) | F_t ].

10 The HJB equation is the central result of optimal control theory. The corresponding equation in discrete time is known as the Bellman equation.
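The discrete-time counterpart mentioned in footnote 10, the Bellman equation, solves this same recursion by backward induction. A sketch on an invented 2-state, 2-action finite-horizon problem:

```python
import numpy as np

# Discrete-time counterpart of J(t,x) = max_u E[F dt + J(t+dt, X + dX)]:
# backward induction on an invented 2-state, 2-action problem.
F = np.array([[1.0, 0.0],                 # F[x, u]: running reward
              [0.0, 2.0]])
P = np.array([[[0.8, 0.2], [0.3, 0.7]],   # P[x, u, x']: transition probabilities
              [[0.5, 0.5], [0.1, 0.9]]])
Phi = np.array([0.0, 1.0])                # terminal reward Phi(x)

J = Phi.copy()
for step in range(5):                     # J_t(x) = max_u {F[x,u] + E[J_{t+1}]}
    J = np.max(F + np.einsum('xuy,y->xu', P, J), axis=1)
    if step == 0:
        print(np.allclose(J, [1.2, 2.9]))   # first backward step, by hand -> True
```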



After simplification,

0 = max_{u∈U} E[ F(t, X_t^u, u_t)dt + o(dt) + dJ̃(t, X_t^u) | F_t ].

Applying Itô's lemma in the above expression to obtain the stochastic differential of J̃ gives

0 = max_{u∈U} E[ F(t, X_t^u, u_t)dt + o(dt)
+ ( ∂J̃(t, X_t^u)/∂t + Σ_{i=1}^n µ_i^u(x_i, t) ∂J̃(t, X_t^u)/∂x_i
+ (1/2) Σ_{i=1}^n Σ_{j=1}^n σ_i^u(x_i, t) σ_j^u(x_j, t) ρ_ij ∂²J̃(t, X_t^u)/∂x_j∂x_i ) dt
+ Σ_{i=1}^n σ_i^u(x_i, t) ∂J̃(t, X_t^u)/∂x_i dW_it | F_t ],

where, as before, ρ_ij satisfies ρ_ij dt = Cov(dW_it, dW_jt). Taking expectations of the random terms in the above equation, and since dW_it ∼ N(0, dt), one obtains

0 = max_{u∈U} [ F(t, X_t^u, u) + ∂J̃(t, X_t^u)/∂t + Σ_{i=1}^n µ_i^u(x_i, t) ∂J̃(t, X_t^u)/∂x_i
+ (1/2) Σ_{i=1}^n Σ_{j=1}^n σ_i^u(x_i, t) σ_j^u(x_j, t) ρ_ij ∂²J̃(t, X_t^u)/∂x_j∂x_i ].

Since the analysis has been carried out at a fixed but arbitrary point, the equation holds for all (t, x) ∈ (0, T) × R^n, so that:

1) J̃ satisfies the HJB equation

0 = max_{u∈U} [ F(t, X_t^u, u) + ∂J̃(t, X_t^u)/∂t + Σ_{i=1}^n µ_i^u(x_i, t) ∂J̃(t, X_t^u)/∂x_i
+ (1/2) Σ_{i=1}^n Σ_{j=1}^n σ_i^u(x_i, t) σ_j^u(x_j, t) ρ_ij ∂²J̃(t, X_t^u)/∂x_j∂x_i ]

for all (t, x) ∈ (0, T) × R^n, with terminal condition J̃(T, x) = Φ(x) for all x ∈ R^n.



2) For each (t, x) ∈ (0, T) × R^n, the maximum in the HJB equation is attained at u = û(t, x).

From the HJB equation it follows that û is unique, since x and t are fixed and the functions F, J̃, µ_i^u, σ_i^u and σ_j^u are taken as given. If û ∈ U is assumed to be optimal, one obtains the following second-order partial differential equation in J̃:

0 = F(t, X_t^û, û) + ∂J̃(t, X_t^û)/∂t + Σ_{i=1}^n µ_i^û(x_i, t) ∂J̃(t, X_t^û)/∂x_i
+ (1/2) Σ_{j=1}^n Σ_{i=1}^n σ_i^û(x_i, t) σ_j^û(x_j, t) ρ_ij ∂²J̃(t, X_t^û)/∂x_j∂x_i.

Differentiating this equation with respect to the control variable u gives the following first-order condition (a necessary condition):

0 = ∂F(t, X_t^u, u)/∂u + ∂²J̃(t, X_t^u)/∂u∂t + Σ_{i=1}^n (∂/∂u)[ µ_i^u(x_i, t) ∂J̃(t, X_t^u)/∂x_i ]
+ (∂/∂u)[ (1/2) Σ_{j=1}^n Σ_{i=1}^n σ_i^u(x_i, t) σ_j^u(x_j, t) ρ_ij ∂²J̃(t, X_t^u)/∂x_j∂x_i ].

The above equation characterizes the optimal control û as a function of x, t and J̃, that is, û = û(t, x, J̃). To solve this equation and find the optimal control trajectory, the method of separation of variables is used, although it should be recalled that, in general, an explicit solution of the HJB equation is hard to obtain. Nevertheless, in several applications in the natural and social sciences the HJB equation has an analytical solution; see, for example, Merton (1990) and Hakansson (1970).
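As an illustration of the machinery just described, consider the scalar linear-quadratic case (all coefficients invented): dX = (aX + bu)dt + s dW, reward F = −(qx² + ru²) and terminal payoff Φ(x) = −mx². The quadratic ansatz J̃(t, x) = −P(t)x² − c(t) reduces the HJB equation to the Riccati ODE P′ = −q − 2aP + (b²/r)P², P(T) = m, with c′ = −s²P, c(T) = 0. The sketch below integrates the ODE backward and then checks the HJB residual numerically:

```python
import numpy as np

# Invented coefficients for dX = (a X + b u) dt + s dW, F = -(q x^2 + r u^2).
a, b, s, q, r, m, T = 0.3, 1.0, 0.4, 1.0, 0.5, 1.0, 1.0
n = 20000
dt = T / n
P = np.empty(n + 1); c = np.empty(n + 1)
P[n], c[n] = m, 0.0
for k in range(n, 0, -1):                  # Euler steps backward in time
    Pd = -q - 2 * a * P[k] + (b * b / r) * P[k] ** 2
    P[k - 1] = P[k] - dt * Pd
    c[k - 1] = c[k] - dt * (-s * s * P[k])

# HJB residual 0 = max_u [F + J_t + (a x + b u) J_x + (s^2 / 2) J_xx]
# at an interior point, maximizing over a grid of controls.
k0, x = n // 2, 0.7
Jt = -(P[k0 + 1] - P[k0 - 1]) / (2 * dt) * x**2 \
     - (c[k0 + 1] - c[k0 - 1]) / (2 * dt)
Jx, Jxx = -2 * P[k0] * x, -2 * P[k0]
us = np.linspace(-5, 5, 20001)
res = np.max(-(q * x**2 + r * us**2) + Jt + (a * x + b * us) * Jx
             + 0.5 * s**2 * Jxx)
print(abs(res) < 1e-3)   # True: the Riccati solution satisfies the HJB equation
```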

7 Conclusions

The way agents decide how to act requires a process of abstraction in which the individual chooses and organizes his actions, in his own benefit, according to a pre-established criterion, thereby drawing up a plan to anticipate possible undesired effects. This survey has reviewed the contributions of Onésimo Hernández-Lerma to the theory and practice of Markov processes. It highlights his recent advances, which have strengthened the potential and the technical virtues of Markov processes for modeling the decision-making of rational agents by incorporating more realistic dynamics for several (economic and financial) variables of interest. In particular, it emphasizes Hernández-Lerma's extensions and reformulations in Markov decision processes, stochastic games, Blackwell optimality for controlled diffusion processes, and stochastic optimal control in which the constraints are Markovian diffusion processes. Several topics of great research potential in systems controlled by Markov processes have been studied in depth by Hernández-Lerma, notably the refinements of average utility criteria such as overtaking optimality, bias optimality, and the so-called discount-sensitive criteria, which include Blackwell optimality. In this regard, the work of Hernández-Lerma and several coauthors on conditions for the existence and characterization of bias and overtaking optimal equilibria stands out, above all regarding the characterization of average-reward optimal strategies.

Francisco Venegas-Martínez
Escuela Superior de Economía
Instituto Politécnico Nacional
Plan de Agua Prieta, No. 66
Col. Plutarco Elías Calles
Del. Miguel Hidalgo
México, D. F. 11340, México
fvenegas1111@yahoo.com.mx

References

[1] Álvarez-Mena J.; Hernández-Lerma O., Existence of Nash equilibria for constrained stochastic games, Math. Methods Oper. Res. 63 (2006), 261–285.
[2] Atsumi H., Neoclassical growth and the efficient program of capital accumulation, Rev. Econ. Stud. 32 (1965), 127–136.
[3] Björk T.; Myhrman J.; Persson M., Optimal consumption with stochastic prices in continuous time, J. Appl. Probab. 24 No. 1 (1987), 35–47.



[4] Cerra V.; Saxena S. C., Did output recover from the Asian crisis?, IMF Staff Papers 52 (2005), 1–23.
[5] Escobedo-Trujillo B. A.; López-Barrientos J. D.; Hernández-Lerma O., Bias and overtaking equilibria for zero-sum stochastic differential games, J. Optim. Theory Appl. 153 No. 3 (2012), 662–687.
[6] Escobedo-Trujillo B. A.; Hernández-Lerma O., Overtaking optimality for controlled Markov-modulated diffusion, Optimization (2011), 1–22.
[7] Feinberg E. A., Controlled Markov processes with arbitrary numerical criteria, Theor. Probability Appl. 27 (1982), 486–503.
[8] Federgruen A.; Schweitzer P. J., Nonstationary Markov decision problems with converging parameters, J. Optim. Theory Appl. 34 (1981), 207–241.
[9] Goldfeld S. M.; Quandt R. E., A Markov model for switching regressions, J. Econom. 1 (1973), 3–16.
[10] Guo H.; Hsu W., A survey of algorithms for real-time Bayesian network inference, Joint Workshop on Real-Time Decision Support and Diagnosis Systems, Edmonton, Canada (2002).
[11] Guo X. P.; Hernández-Lerma O., Continuous-time controlled Markov chains, Ann. Appl. Probab. 13 (2003a), 363–388.
[12] Guo X. P.; Hernández-Lerma O., Drift and monotonicity conditions for continuous-time Markov control processes with an average criterion, IEEE Trans. Automat. Control 48 (2003b), 236–245.
[13] Guo X. P.; Hernández-Lerma O., Continuous-time controlled Markov chains with discounted rewards, Acta Appl. Math. 79 (2003c), 195–216.
[14] Guo X. P.; Hernández-Lerma O., Zero-sum games for continuous-time Markov chains with unbounded transition and average payoff rates, J. Appl. Probab. 40 No. 2 (2003d), 327–345.
[15] Guo X. P.; Hernández-Lerma O., Zero-sum continuous-time Markov games with unbounded transition and discounted payoff rates, Bernoulli 11 No. 6 (2005a), 1009–1029.



[16] Guo X. P.; Hernández-Lerma O., Nonzero-sum games for continuous-time Markov chains with unbounded discounted payoffs, J. Appl. Probab. 42 No. 2 (2005b), 303–320.
[17] Guo X. P.; Hernández-Lerma O., Zero-sum games for continuous-time jump Markov processes in Polish spaces: discounted payoffs, Adv. in Appl. Probab. 39 No. 3 (2007), 645–668.
[18] Guo X. P.; Hernández-Lerma O., Continuous-Time Markov Decision Processes: Theory and Applications, Springer-Verlag, New York, 2009.
[19] Hakansson N., Optimal investment and consumption strategies under risk for a class of utility functions, Econometrica 38 No. 5 (1970), 587–607.
[20] Hernández-Lerma O., Control óptimo y juegos estocásticos, Escuela de Matemática de América Latina y del Caribe, CIMAT, Guanajuato, México (2005).
[21] Hernández-Lerma O., Lectures on Continuous-Time Markov Control Processes, Aportaciones Matemáticas 3, Sociedad Matemática Mexicana, Mexico City (1994).
[22] Hernández-Lerma O., Lecture Notes on Discrete-Time Markov Control Processes, Departamento de Matemáticas, CINVESTAV-IPN, 1990.
[23] Hernández-Lerma O., Adaptive Markov Control Processes, Springer-Verlag, New York, 1989.
[24] Hernández-Lerma O., Finite-state approximations for denumerable multidimensional state discounted Markov decision processes, J. Math. Anal. Appl. 113 No. 2 (1986), 382–389.
[25] Hernández-Lerma O., Nonstationary value-iteration and adaptive control of discounted semi-Markov processes, J. Math. Anal. Appl. 112 (1985), 435–445.
[26] Hernández-Lerma O.; Marcus S. I., Optimal adaptive control of priority assignment in queueing systems, Systems Control Lett. 4 (1984), 65–72.



[27] Hernández-Lerma O.; Marcus S. I., Adaptive control of discounted Markov decision chains, J. Optim. Theory Appl. 46 (1985), 227–235.
[28] Hernández-Lerma O.; Marcus S. I., Adaptive policies for discrete-time stochastic systems with unknown disturbance distribution, Systems Control Lett. 9 (1987), 307–315.
[29] Hernández-Lerma O.; Marcus S. I., Nonparametric adaptive control of discrete-time partially observable stochastic systems, J. Math. Anal. Appl. 137 No. 2 (1989), 312–334.
[30] Hernández-Lerma O.; Govindan T. E., Nonstationary continuous-time Markov control processes with discounted costs on infinite horizon, Acta Appl. Math. 67 (2001), 277–293.
[31] Hernández-Lerma O.; Lasserre J. B., Discrete-Time Markov Control Processes, Springer-Verlag, New York, 1996.
[32] Hernández-Lerma O.; Lasserre J. B., Further Topics on Discrete-Time Markov Control Processes, Springer-Verlag, New York, 1999.
[33] Hernández-Lerma O.; Lasserre J. B., Zero-sum stochastic games in Borel spaces: average payoff criterion, SIAM J. Control Optim. 39 (2001a), 1520–1539.
[34] Hernández-Lerma O.; Lasserre J. B., Further criteria for positive Harris recurrence of Markov chains, Proc. Amer. Math. Soc. 129 No. 5 (2001b), 1521–1524.
[35] Hernández-Lerma O.; Lasserre J. B., Markov Chains and Invariant Probabilities, Birkhäuser, Basel, 2003.
[36] Jasso-Fuentes H.; Hernández-Lerma O., Ergodic control, bias and sensitive discount optimality for Markov diffusion processes, Stoch. Anal. Appl. 27 (2007), 363–385.
[37] Jasso-Fuentes H.; Hernández-Lerma O., Characterizations of overtaking optimality for controlled diffusion processes, Appl. Math. Optim. 57 (2008), 349–369.
[38] Jasso-Fuentes H.; Hernández-Lerma O., Blackwell optimality for controlled diffusion processes, J. Appl. Probab. 46 No. 2 (2009), 372–391.



[39] Merton R. C., Continuous-time finance, Rev. Econom. Statist. 51 No. 2 (1992), 247–257.
[40] Merton R. C., Continuous-Time Finance, Basil Blackwell, Cambridge, Massachusetts, 1990.
[41] Neck R., A differential game model of fiscal and monetary policies: conflict and cooperation, Optimal Control Theory and Economic Analysis, Second Viennese Workshop on Economic Applications of Control Theory, Vienna (1984), 2 (1985), 607–632.
[42] Neck R., Non-cooperative equilibrium solution for a stochastic dynamic game of economic stabilization policies, Dynamic Games in Economic Analysis, Lecture Notes in Control and Information Sciences 157 (1991).
[43] Nowak A. S., Zero-sum stochastic games with Borel state spaces, Proceedings of the NATO Advanced Study Institute, Stony Brook, New York (1999), 570 (2003a), 77–91.
[44] Nowak A. S., On a new class of nonzero-sum discounted stochastic games having stationary Nash equilibrium points, Int. J. Game Theory 32 (2003b), 121–132.
[45] Nowak A. S.; Szajowski P., On Nash equilibria in stochastic games of capital accumulation, Stochastic Games and Applications 9 (2003), 118–129.
[46] Nowak A. S.; Szajowski K., Advances in Dynamic Games, Annals of the International Society of Dynamic Games Vol. 7, Birkhäuser, Boston, 2005.
[47] Prieto-Rumeau T.; Hernández-Lerma O., Bias and overtaking equilibria for zero-sum continuous-time Markov games, Math. Meth. Oper. Res. 61 (2005), 437–454.
[48] Prieto-Rumeau T.; Hernández-Lerma O., Bias optimality for continuous-time controlled Markov chains, SIAM J. Control Optim. 45 (2006), 51–73.
[49] Prieto-Rumeau T.; Hernández-Lerma O., Variance minimization and the overtaking optimality approach to continuous-time controlled Markov chains, Math. Meth. Oper. Res. 70 (2009), 527–540.


28

Francisco Venegas-Mart´ınez

[50] Prieto-Rumeau T.; Hern´ andez-Lerma O. Selected Topics on Continuous-Time Controlled Markov Chains and Markov Games, ICP Advanced Texts in Mathematics, Vol. 5, World Scientific, 2012. [51] Ramsey F. P. A mathematical theory of savings, Economic Journal, 38 (1928), 543–559. [52] Rinc´ on-Zapatero J. P., Characterization of Markovian equilibria in a class of differential games, J. Econ. Dyn. Control, 28 (2004), 1243–1266. [53] Rinc´ on-Zapatero J. P.; Mart´ınez J.; Mart´ın-Herr´an G. New method to characterize subgame perfect Nash equilibria in differential games, J. Optim. Theory Appl., 96 (1998), 377–395. [54] Rinc´ on-Zapatero J. P.; Mart´ınez J.; Mart´ın-Herr´an G. Identification of efficient subgame-perfect Nash equilibria in a class of differential games, J. Optim. Theory Appl., 104 (2000), 235–242. [55] Shapley L. S. (1953). A Value for n-person Games, In: Contributions to the Theory of Games, volume II, H. W. Kuhn and A.W. Tucker (eds.). [56] Sch¨ al, M. Estimation and control in discounted stochastic dynamic programming, Stochastics, 20 (1987), 51–71. [57] Taylor S. J., Asset Price Dynamics, Volatility, and Prediction, Princeton University Press, Princeton, 2005. [58] Tierney L., Markov chains for exploring posterior distributions, Ann. Statist., 22 No. 4 (1994), 1701–1728. [59] von Weizs¨ acker C. C. Existence of optimal programs of accumulation for an infinite horizon, Rev. Econ. Stud., 32 (1965), 85–104. [60] White D. J., in: Recent Developments in Markov Decision Processes (ed. R. Hartley, L. C. Thomas and D. J. White), Academic Press, New York (1980).


.

Morfismos, Vol. 16, No. 2, 2012, pp. 29–49


Homotopy theory of non-orientable mapping class groups

Miguel A. Maldonado 1

Abstract

We give a homotopical approach to the theory of mapping class groups of surfaces with marked points. Using configuration spaces we construct Eilenberg-MacLane spaces K(π, 1) for the mapping class groups of the projective plane P2 and the Klein bottle IK. These spaces are closely related to the K(π, 1) spaces for the corresponding braid groups. Some cohomological consequences of this approach are presented.

2010 Mathematics Subject Classification: 55P20, 20F36, 37E30.
Keywords and phrases: mapping class groups, configuration spaces, braid groups.

1 Introduction

Let Sg be the compact orientable surface of genus g and let Diff+ (Sg ) denote the topological group of orientation-preserving self-diffeomorphisms of Sg . The mapping class group of Sg is the group Γ(Sg ) of isotopy classes in Diff+ (Sg ), that is, Γ(Sg ) = π0 Diff+ (Sg ).1

1 This paper is part of the author's Ph.D. thesis under the supervision of Prof. Miguel Xicotencatl. The thesis was presented at the Mathematics Department of the Cinvestav-IPN in December 2011.

This group


has been widely studied in the past few decades, especially due to its action on the Teichmüller space Tg of isotopy classes of complex structures on Sg . The quotient by this action is the moduli space Mg of Riemann surfaces. Since the space Tg is contractible and the action of Γ(Sg ) is properly discontinuous ([18]), there is an isomorphism on rational cohomology H ∗ (Mg ; Q) ≅ H ∗ (Γ(Sg ); Q). This isomorphism is one of the main motivations for studying the cohomology of mapping class groups.

On the other hand, knowing the cohomology groups of the mapping class group is also useful in the classification of surface bundles. A surface bundle or Sg -bundle is a fiber bundle whose fiber is Sg ; the structure group of Sg -bundles is Diff+ (Sg ). One way of determining whether two Sg -bundles are isomorphic is to consider characteristic classes, that is, classes in H ∗ (BDiff+ (Sg ); Z) which are natural with respect to bundle maps. For genus 0 and 1 this problem is solved (see [21]), and for genus g ≥ 2 it turns out that BDiff+ (Sg ) is the classifying space of Γ(Sg ), so characteristic classes of Sg -bundles are classes in H ∗ (Γ(Sg ); Z).

This work concentrates on the homotopy theory associated with mapping class groups when the surface is non-orientable and has a set of k distinguished points. These groups are denoted by Γk (S) and are called the punctured mapping class groups of the surface S. Some K(π, 1) spaces for these groups arise naturally from certain actions on the unordered configuration spaces Fk (S)/Σk . The motivation for this approach is the relation of configuration spaces with surface braid groups, so some constructions apply to these groups as well.

Section 2 begins with the definition of the groups Γk (S) and then a discussion of the construction of K(π, 1) spaces for these groups involving Borel constructions on configuration spaces.
Section 3 focuses on the development of basic tools for the theory of configuration spaces, including the Fadell-Neuwirth fibrations and the identification of the fundamental groups of Fk (S) and Fk (S)/Σk as braid groups of S. The case of P2 is tackled in Section 4 using the theory of orbit configuration spaces Fk (M ; G), with particular interest in the spaces Fk (S 2 ; Z2 ), as they are related to Fk (P2 ) via the covering map S 2 → P2 . This will be used to show that the SO(3)-Borel construction of Fk (P2 )/Σk is a K(Γk (P2 ), 1) space. Configuration spaces of the Klein bottle IK are considered in Section



5 to demonstrate that the space ESO(2) ×SO(2) Fk (IK)/Σk is a K(π, 1) space for Γ̃k (IK), where Γ̃k denotes the reduced mapping class group defined in Section 2. Here the SO(2)-action is given by rotations on the first coordinate and can be thought of as the restriction of the action of the group Diff0 (IK) of diffeomorphisms of IK isotopic to the identity. The work is concluded by recalling a method for calculating the additive structure of the cohomology of unordered configuration spaces Fk (M )/Σk when M is a surface. This method consists of recognizing the homology of configuration spaces as a part of the homology of a larger space C(M ; X), called the labelled configuration space or configuration space with parameters; the method is based on the classical work of C. F. Bödigheimer, F. Cohen and L. Taylor ([7]). It is useful for obtaining information about the cohomological structure of mapping class groups, although the details of the calculations will appear elsewhere.

2 The punctured mapping class group

The mapping class group Γ(S) of a compact surface S is the group of isotopy classes in Diff(S). Equivalently, one has Γ(S) = Diff(S)/Diff0 (S), where Diff0 (S) consists of the self-diffeomorphisms of S which are isotopic to the identity. As mentioned, we are interested in diffeomorphisms preserving a set of distinguished points. Let Qk ⊂ S denote a subset of cardinality k and consider the group

Diff+ (S; k) = {f ∈ Diff+ (S) | f (Qk ) = Qk }.

The punctured mapping class group of S is the group Γk (S) of isotopy classes of Diff+ (S; k), where the isotopies preserve the set Qk . As above, one can also define Γk (S) = Diff(S; k)/Diff0 (S; k). The punctured mapping class group arises in the study of the mapping class group of a surface and that of its branched coverings ([4]).

From the equivalent definitions above it is easy to see that the problem of determining the homotopy type of both Diff(S) and Diff0 (S) plays an important role in the theory of mapping class groups. One finds that, apart from a few cases, Diff0 (S) and Diff(S) do not have the



same homotopy type, which will have a great impact on the constructions made in this work. We thus consider a slightly different definition of Γk (S) and define the reduced mapping class group Γ̃k (S) as the group of path components of Diff0 (S) ∩ Diff(S; k), that is, Γ̃k (S) = π0 (Diff0 (S) ∩ Diff(S; k)). This group fits into an exact sequence of the form ([20])

1 −→ Γ̃k (S) −→ Γk (S) −→ Γ(S) −→ 1

We mention some low-genus cases for this sequence. First, recall that the mapping class group Γ(S 2 ) is trivial ([3]) since every orientation-preserving self-diffeomorphism of S 2 is isotopic to the identity. The sequence above gives the isomorphism Γ̃k (S 2 ) ≅ Γk (S 2 ). Now, since every self-diffeomorphism of P2 can be lifted to a self-diffeomorphism of S 2 , it follows that Γ(P2 ) consists of a single class and there is an isomorphism Γ̃k (P2 ) ≅ Γk (P2 ).

Recall that the Klein bottle IK can be obtained as the quotient of T 2 = S 1 × S 1 by identifying a pair (u1 , u2 ) with (−u1 , ū2 ), where ū2 denotes complex conjugation. The mapping class group Γ(IK) is isomorphic to Z2 ⊕ Z2 ([19]) and thus there is a sequence

1 −→ Γ̃k (IK) −→ Γk (IK) −→ Z2 ⊕ Z2 −→ 1

which exhibits Γ̃k (IK) as an index 4 subgroup of the punctured mapping class group.

As will be shown, our K(π, 1) constructions depend heavily on the homotopy type of Diff(S) and Diff0 (S), which is recorded next in a classical theorem of A. Gramain.

Theorem 2.1. Let N be a compact connected non-orientable surface, with or without boundary, and let Diff0 (N ) be the subgroup of Diff(N ) of diffeomorphisms isotopic to the identity. Then,

1. If N is P2 , then Diff0 (P2 ) = Diff(P2 ) is homotopy equivalent to SO(3).

2. If N is the Klein bottle or the Möbius band, then Diff0 (N ) is homotopy equivalent to SO(2).

3. For any other non-orientable surface N , Diff0 (N ) is contractible.



A more general statement of this theorem involving orientable surfaces is considered in [12], [17], but for most surfaces Diff0 (S) is contractible. The rest of the paper will concentrate on the construction of K(π, 1) spaces for the groups Γk (P2 ) and Γ̃k (IK), which will be done using the concept of orbit homotopy space or Borel construction recorded in the following.

Let G be a topological group acting on a space X. The Borel construction associated to this action is the quotient EG ×G X := (EG × X)/G, where EG is a contractible space with a free G-action and the G-action on the product is given by g · (e, x) = (eg −1 , gx). This construction appears in the field of equivariant cohomology, where the ordinary cohomology of EG ×G M is defined as the equivariant cohomology of a manifold M with a G-action.

The present work focuses on the Diff(S)-action on the configuration spaces Fk (S)/Σk ; the associated Borel constructions have a clear description using Theorem 2.1, as we will show. In the case of the group Diff(P2 ) there is a homotopy equivalence obtained from a diagram of fibrations

EDiff(P2 ) ×Diff(P2 ) Fk (P2 )/Σk ≃ ESO(3) ×SO(3) Fk (P2 )/Σk ,

where the diagonal action of SO(3) on Fk (P2 )/Σk is induced from the action by rotations on lines through the origin in R3 . Also from Theorem 2.1 there is a homotopy equivalence

EDiff0 (IK) ×Diff0 (IK) Fk (IK)/Σk ≃ ESO(2) ×SO(2) Fk (IK)/Σk ,

where the diagonal action of SO(2) is given by rotations on the first coordinate of IK = T 2 /≈ defined above. Finally, note that for non-orientable surfaces of genus g ≥ 2 there is a homotopy equivalence

EDiff0 (Ng ) ×Diff0 (Ng ) Fk (Ng )/Σk ≃ Fk (Ng )/Σk .

In the next sections we consider the constructions associated to P2 and IK and we prove they are indeed K(π, 1) spaces for the groups Γk (S).



This will be done using some tools from the theory of configuration spaces and basic properties of Borel constructions. These topics are introduced in the next section.

3 Surface configuration spaces

Let M be a closed manifold. For k ≥ 1 define the k-th configuration space of M as the subspace of M k given by

Fk (M ) = {(x1 , . . . , xk ) | xi ≠ xj for i ≠ j}.

For 1 ≤ m ≤ k consider the projection

pk,m : Fk (M ) −→ Fm (M ), (x1 , x2 , . . . , xk ) → (x1 , . . . , xm ),

and note that for a fixed x̂ = (x1 , . . . , xm ) the space p−1 k,m (x̂) is identified with Fk−m (M \Qm ), where Qm is a subset of cardinality m.

Theorem 3.1. The projection pk,m is a fibration with fiber Fk−m (M \Qm ).

These maps are called the Fadell-Neuwirth fibrations, after E. Fadell and L. Neuwirth, who introduced the theory in the classical paper [13]. These fibrations are one of the main tools in the theory of configuration spaces and are mainly used in inductive arguments. It is classically known that the fundamental group of Fk (R2 )/Σk is isomorphic to the classical Artin braid group Bk on k strands, and that for ordered configurations Fk (R2 ) one gets the pure braid group Pk ([14]). A basic analysis of these spaces shows they are Eilenberg-MacLane spaces K(π, 1). In general, the pure braid group of M is defined as Pk (M ) = π1 Fk (M ), and Bk (M ) = π1 Fk (M )/Σk is its associated braid group.

The theory of surface configuration spaces is particularly interesting due to a result of J. Birman ([2]) which states that the homomorphism

i∗ : Pk (M ) −→ (π1 M )k

induced by the natural inclusion i : Fk (M ) → M k is an isomorphism if dim M ≥ 3 and is an epimorphism for dim M = 2. This result shows



that the structure of Pk (M ) is totally determined by the geometry of M (expressed by the fundamental group). When M is a surface the group Pk (M ) exhibits a more complex structure, since it has the pure braid group Pk as a subgroup, arising from the canonical inclusion R2 ⊂ M . In fact, for a surface distinct from the 2-sphere and the projective plane, the kernel of the homomorphism i∗ above is the normal closure of Pk ([16]). From a homotopical point of view more can be said.

Theorem 3.2 ([13]). For a compact 2-manifold M that is neither the 2-sphere S 2 nor the projective plane P2 , the spaces Fk (M \Qm ) and Fk (M \Qm )/Σk are K(π, 1) spaces, for m ≥ 0.

This theorem shows that from a homotopical point of view questions about configuration spaces of surfaces may be reduced to algebraic questions about their associated braid groups. Configuration spaces for S 2 are not K(π, 1) since Fk (S 2 ) contains the higher homotopy of S 2 and SO(3); see [10]. For configurations on P2 the space ES 3 ×S 3 Fk (P2 ) is a K(π, 1) ([23]), where the S 3 -action is induced from the double cover S 3 → SO(3). This construction is the total space of the fibration

Fk (P2 ) −→ ES 3 ×S 3 Fk (P2 ) −→ BS 3 ,

whose long exact sequence in homotopy shows that there is an isomorphism πn Fk (P2 ) ≅ πn (S 3 ) for n ≥ 2, and thus Fk (P2 ) is not a K(π, 1) space. In order to obtain a K(π, 1) space in the S 2 and P2 cases one must consider a Borel construction on the associated configuration space.

Configuration spaces as homogeneous spaces. Note that the group Diff(S) of self-diffeomorphisms of S acts transitively on the configuration space Fk (S)/Σk . That is, given two configurations x̂, ŷ ∈ Fk (S)/Σk there is an element f ∈ Diff(S) such that f (x̂) = ŷ. Moreover, this diffeomorphism can be chosen to be isotopic to the identity ([20]). Also note that for a basepoint x̂ ∈ Fk (S)/Σk , the isotropy subgroup is precisely Diff(S; k).

On the other hand, recall that for a general compact manifold M the topological group Diff(M ) is a metrizable manifold modeled on a



Fréchet space ([12], [1]); hence it has the homotopy type of a CW-complex. In particular, for a compact surface S the group Diff(S) is locally compact, and its diagonal action on Fk (S)/Σk induces a homeomorphism Diff(S)/Diff(S; k) ≅ Fk (S)/Σk ([22]). Under these conditions the Borel construction EDiff(S) ×Diff(S) Fk (S)/Σk is homotopy equivalent to

EDiff(S) ×Diff(S) Diff(S)/Diff(S; k) ≃ EDiff(S)/Diff(S; k),

where the space on the right is a model for the classifying space of Diff(S; k). Note that if we consider the fundamental group of this Borel construction we get

π1 (EDiff(S) ×Diff(S) Fk (S)/Σk ) ≅ π1 BDiff(S; k) ≅ Γk (S).

This is the motivation for using Borel constructions to build Eilenberg-MacLane spaces for punctured mapping class groups. In the following sections we consider the cases of the projective plane P2 and the Klein bottle IK. In particular, we will prove that the spaces obtained at the end of Section 2 are indeed K(π, 1) spaces.

We close this section showing a relation between braid groups and mapping class groups via covering maps. Let SO(3) act on S 2 by rotations. For the diagonal action of SO(3) on the space Fk (S 2 )/Σk one has the following.

Proposition 3.3 ([9]). For k ≥ 3, ESO(3) ×SO(3) Fk (S 2 )/Σk is an Eilenberg-MacLane space of the type K(Γk (S 2 ), 1).

The universal cover of SO(3) can be identified with the quaternions of unit length, which is the 3-sphere S 3 . Thus the map φ : S 3 → SO(3) induces an S 3 -action on configurations of S 2 .

Corollary 3.4. For k ≥ 3 the space ES 3 ×S 3 Fk (S 2 )/Σk is an Eilenberg-MacLane space of the type K(Bk (S 2 ), 1).



Here the K(π, 1) part follows from the diagram of fibrations

X −→ ES 3 ×S 3 X −→ BS 3
| id      ↓           ↓
X −→ ESO(3) ×SO(3) X −→ BSO(3)

and the following lemma.

Lemma 3.5 ([23]). If ESO(3) ×SO(3) X is a K(π, 1), then ES 3 ×S 3 X is a K(π′, 1). Moreover, π′ ≅ π1 (X).

4 Orbit configuration spaces

Let G be a group acting freely on a connected manifold M . The k-th orbit configuration space of M ([24]) is the space Fk (M ; G) of k-tuples of points on distinct G-orbits:

Fk (M ; G) = {(m1 , . . . , mk ) ∈ M k | Gmi ∩ Gmj = ∅ for i ≠ j},

where Gm denotes the G-orbit of m. Note that F1 (M ; G) = M since the action is free, and for the trivial group e one has Fk (M ; e) = Fk (M ). In this context of group actions on manifolds there is also a version of the Fadell-Neuwirth fibrations.

Theorem 4.1. For 1 ≤ m ≤ k, the projection on the first m coordinates pk,m : Fk (M ; G) −→ Fm (M ; G) is a fibration with fiber Fk−m (M \Om ; G), where Om is the disjoint union of m distinct orbits.

There is another tool relating the quotient M/G of the G-action and the orbit configuration spaces Fk (M ; G). For a principal G-bundle π : M → M/G there is an induced principal Gk -bundle

π̃ : Fk (M ; G) −→ Fk (M/G),



where π̃(m1 , . . . , mk ) = (π(m1 ), . . . , π(mk )) and the action of Gk on Fk (M ; G) is given by

(g1 , g2 , . . . , gk ) · (m1 , m2 , . . . , mk ) = (g1 m1 , g2 m2 , . . . , gk mk ).

For the proof of these statements see [24]. As an example consider Z2 acting on S 2 via the antipodal map. By the comments above, there is a covering map

(Z2 )k −→ Fk (S 2 ; Z2 ) −→ Fk (P2 ),

which shows that one way of studying configurations on the projective plane is to consider the orbit configuration spaces for the 2-sphere. We will do this in the next paragraphs.

Let us consider the orbit configuration space for S 2 \On and note that F1 (S 2 \On ; Z2 ) = S 2 \On = S 2 \Q2n , where On = Q2n consists of n distinct pairs of antipodal points. We thus note that F1 (S 2 \On ; Z2 ) is a K(π, 1) space. Applying induction on k in the fibration

F1 (S 2 \On+k−1 ; Z2 ) −→ Fk (S 2 \On ; Z2 ) −→ Fk−1 (S 2 \On ; Z2 )

we obtain that Fk (S 2 \On ; Z2 ) is a K(π, 1).

Theorem 4.2. Let SO(3) act diagonally on Fk (S 2 ; Z2 ). Then, for k ≥ 2, the space ESO(3) ×SO(3) Fk (S 2 ; Z2 ) is a K(π, 1).

Proof. Consider the subspace W of F2 (S 2 ; Z2 ) given by pairs of orthogonal vectors of unit length. The group SO(3) acts on W transitively and freely, and the action commutes with the antipodal action of Z2 . Thus one has that W = SO(3) and the inclusion W −→ F2 (S 2 ; Z2 ) is an SO(3)-equivariant homotopy equivalence ([10]). The inclusion thus induces a homotopy equivalence between Borel constructions

ESO(3) ×SO(3) W −→ ESO(3) ×SO(3) F2 (S 2 ; Z2 ).



The space on the left is homotopy equivalent to ESO(3), which is contractible. Thus ESO(3) ×SO(3) F2 (S 2 ; Z2 ) is contractible. On the other hand, consider the fibration

Fk−2 (S 2 \O2 ; Z2 ) −→ Fk (S 2 ; Z2 ) −→ F2 (S 2 ; Z2 ),

where p denotes the projection Fk (S 2 ; Z2 ) → F2 (S 2 ; Z2 ). By the comments preceding the theorem, the fiber is a K(π, 1) and thus the map p induces isomorphisms

p∗ : πn Fk (S 2 ; Z2 ) −→ πn F2 (S 2 ; Z2 ), n ≠ 1.

Finally, consider the commutative diagram

Fk (S 2 ; Z2 ) −→ ESO(3) ×SO(3) Fk (S 2 ; Z2 ) −→ BSO(3)
| p                    ↓                      | id
F2 (S 2 ; Z2 ) −→ ESO(3) ×SO(3) F2 (S 2 ; Z2 ) −→ BSO(3)

where each row is a fibration. Since p induces isomorphisms on πn for n ≠ 1, it follows that

πn ESO(3) ×SO(3) Fk (S 2 ; Z2 ) = 0

for n ≠ 1, and the theorem follows.
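The geometric heart of the proof, that SO(3) acts freely and transitively on the space W of orthonormal pairs, can be checked concretely. The Python sketch below (an illustration added here, with hypothetical sample pairs w1, w2; it is not part of the original argument) completes each pair to a right-handed frame and recovers the rotation carrying one pair to the other.

```python
def cross(a, b):
    """Cross product, used to complete an orthonormal pair to a right-handed frame."""
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def as_matrix(cols):
    """3x3 row-major matrix with the given columns."""
    return tuple(tuple(cols[j][i] for j in range(3)) for i in range(3))

def transpose(A):
    return tuple(tuple(A[j][i] for j in range(3)) for i in range(3))

def matmul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
                 for i in range(3))

def apply(A, v):
    return tuple(sum(A[i][j] * v[j] for j in range(3)) for i in range(3))

def det(A):
    return sum(A[0][i] * (A[1][(i + 1) % 3] * A[2][(i + 2) % 3]
                          - A[1][(i + 2) % 3] * A[2][(i + 1) % 3]) for i in range(3))

# two points of W: ordered pairs of orthonormal unit vectors on S^2
w1 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
w2 = ((0.0, 0.0, 1.0), (0.6, 0.8, 0.0))

A = as_matrix(w1 + (cross(*w1),))   # right-handed frames: matrices in SO(3)
B = as_matrix(w2 + (cross(*w2),))
R = matmul(B, transpose(A))         # A is orthogonal, so R A = B, i.e. R(w1) = w2

# transitivity: R carries the pair w1 to the pair w2 ...
print(all(abs(apply(R, w1[i])[j] - w2[i][j]) < 1e-12
          for i in range(2) for j in range(3)))     # True
# ... and R lies in SO(3); freeness holds since a rotation fixing an
# orthonormal pair also fixes their cross product, hence is the identity.
print(abs(det(R) - 1.0) < 1e-12)                    # True
```

Sending a pair to the rotation constructed this way is exactly the identification W = SO(3) invoked in the proof.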

The main consequence of this theorem is a determination of the homotopy type of the Borel construction for the SO(3)-action on configurations of the projective plane. This result can also be proved directly from the identification of P2 with the Grassmann manifold O(3)/O(1) × O(2); that proof is given in [23].

Theorem 4.3. For k ≼ 2, the construction ESO(3) ×SO(3) Fk (P2 ) is a K(π, 1) space. Proof. Given the covering map π : Fk (S 2 ; Z2 ) → Fk (P2 ) let us consider the induced commutative diagram which each row is a fibration Fk (S 2 ; Z2 )

ESO(3) Ă— Fk (S 2 ; Z2 )

BSO(3)

ESO(3) Ă— Fk (P2 )

BSO(3)

SO(3)

Ď€

Fk (P2 )

SO(3)



Here, SO(3) acts diagonally on Fk (P2 ), considering P2 as the space of lines through the origin in R3 . Since π is a covering map, it induces an isomorphism on πi for all i ≠ 1. It follows that the vertical map in the middle also induces an isomorphism on πi for i ≠ 1. From the preceding theorem, it follows that ESO(3) ×SO(3) Fk (P2 ) is a K(π, 1).

Since 1 × Σk acts freely on ESO(3) ×SO(3) Fk (P2 ), it follows that

ESO(3) ×SO(3) Fk (P2 )/Σk

is a K(π, 1) with π isomorphic to Γk (P2 ), by the results in Section 3. This theorem is used in [23] and [10] to show that ES 3 ×S 3 Fk (P2 ) and ES 3 ×S 3 Fk (P2 )/Σk are K(π, 1) spaces for the pure braid group Pk (P2 ) and the (full) braid group Bk (P2 ), respectively.

Proposition 4.4 ([23]). For k ≥ 2, the space ES 3 ×S 3 Fk (P2 )/Σk is a K(Bk (P2 ), 1).

The proof of this proposition can be obtained from Lemma 3.5 above and a certain diagram of fibrations. Moreover, these methods can also be applied to obtain K(π, 1) spaces for the braid groups of the 2-sphere S 2 .
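The covering map (Z2 )k −→ Fk (S 2 ; Z2 ) −→ Fk (P2 ) that drives the proof of Theorem 4.3 can be illustrated numerically. The sketch below (an illustration added here, with hypothetical sample vectors; it is not part of the original argument) lifts a configuration of k = 3 lines in P2 to its 2^k preimages in Fk (S 2 ; Z2 ).

```python
from itertools import product

def neg(p):
    """Antipodal map on S^2."""
    return tuple(-x for x in p)

def lifts(lines):
    """All lifts of a configuration of lines in P^2 (each line recorded by a
    chosen unit vector) to tuples of points on S^2: one sign choice per line."""
    return [tuple(v if s == 1 else neg(v) for s, v in zip(signs, lines))
            for signs in product([1, -1], repeat=len(lines))]

def in_orbit_configuration_space(cfg):
    """Membership in F_k(S^2; Z_2): points lie on pairwise distinct Z_2-orbits."""
    return all(p != q and p != neg(q)
               for i, p in enumerate(cfg) for q in cfg[i + 1:])

# three pairwise distinct lines through the origin, i.e. a point of F_3(P^2)
lines = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.6, 0.8)]

L = lifts(lines)
print(len(L))                                           # 8 = 2**3 sheets
print(all(in_orbit_configuration_space(c) for c in L))  # True
print(len(set(L)) == 2 ** len(lines))                   # True: the lifts are distinct
```

The deck transformation group (Z2 )3 permutes these eight lifts freely, matching the principal (Z2 )k -bundle structure described in Section 4.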

5 Configurations on the Klein bottle

Let T 2 = S 1 × S 1 be the standard 2-dimensional torus and let IK = T 2 /∼ be the Klein bottle, where (u1 , u2 ) ∼ (−u1 , ū2 ) and ū2 denotes complex conjugation. Consider the action

θ : O(2) × T 2 −→ T 2 , (A, (u1 , u2 )) → (Au1 , det(A)u2 ),

where Au1 is matrix multiplication. Thus, there is a commutative diagram

O(2) × T 2 −→θ T 2
| 1×π           | π
O(2) × IK −→θ̂ IK

where π : T 2 → IK is the obvious quotient map, and θ̂ is the induced action on IK. The restriction of the action θ̂ above gives an SO(2)-action on IK, given by matrix multiplication on the first coordinate, leaving the second coordinate fixed:

SO(2) × IK −→ IK, (A, (u1 , u2 )) → (Au1 , u2 ).
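Two elementary facts used in this section can be checked numerically: the identification (u1 , u2 ) ∼ (−u1 , ū2 ) is a fixed-point-free involution on T 2 , and rotation of the first coordinate commutes with it, so the SO(2)-action descends to IK. The following Python sketch (a toy verification on random sample points, not part of the original text) tests both.

```python
import cmath
import random

def tau(p):
    """The gluing involution (u1, u2) -> (-u1, conj(u2)) on T^2 = S^1 x S^1."""
    u1, u2 = p
    return (-u1, u2.conjugate())

def rot(t, p):
    """Rotation of the first circle coordinate by the angle t (the SO(2)-action)."""
    u1, u2 = p
    return (cmath.exp(1j * t) * u1, u2)

random.seed(0)
pts = [(cmath.exp(1j * random.uniform(0, 2 * cmath.pi)),
        cmath.exp(1j * random.uniform(0, 2 * cmath.pi))) for _ in range(100)]

# tau is an involution with no fixed points (u1 = -u1 is impossible on S^1),
# so the quotient T^2 / ~ is a closed surface
print(all(tau(tau(p)) == p for p in pts))    # True
print(all(tau(p) != p for p in pts))         # True

# rotation commutes with the identification, hence descends to an action on IK
t = 1.234
ok = all(abs(rot(t, tau(p))[0] - tau(rot(t, p))[0]) < 1e-12
         and rot(t, tau(p))[1] == tau(rot(t, p))[1] for p in pts)
print(ok)                                    # True
```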

Consider the SO(2)-action induced on Fk (IK) and its associated Borel construction.

Lemma 5.1. Let k ≥ 1 and assume that ESO(2) ×SO(2) IK is a K(π, 1) space. Then the space ESO(2) ×SO(2) Fk (IK) is also a K(π, 1).

Remark. The assumption on the SO(2)-Borel construction for IK will be proven at the end of this section, so Lemma 5.1 actually holds for all k ≥ 1.

Proof. Consider the Fadell-Neuwirth fibration

IK\Q2 −→ F2 (IK\Q1 ) −→ IK\Q1 .

It follows from the homotopy exact sequence that F2 (IK\Q1 ) is a K(π, 1). Also, this space is the base space of the fibration

IK\Q3 −→ F3 (IK\Q1 ) −→ F2 (IK\Q1 ),

from which one gets that F3 (IK\Q1 ) is also a K(π, 1). Continuing with this process one can show that Fk (IK\Q1 ) is a K(π, 1) space. Now consider the fibration induced at the level of Borel constructions

Fk−1 (IK\Q1 ) −→ ESO(2) ×SO(2) Fk (IK) −→ ESO(2) ×SO(2) IK.

By hypothesis, the space ESO(2) ×SO(2) IK is a K(π, 1). Then the lemma follows from the associated homotopy exact sequence.



Corollary 5.2. For k ≥ 1, the space ESO(2) ×SO(2) Fk (IK)/Σk is a K(π, 1).

The rest of the section will be devoted to proving the following result, which was assumed in Lemma 5.1.

Lemma 5.3. The space ESO(2) ×SO(2) IK is a K(π, 1).

Proof. Consider the natural fibration

IK −→ ESO(2) ×SO(2) IK −→ BSO(2)

and its associated homotopy exact sequence

· · · −→ πi (IK) −→ πi (ESO(2) ×SO(2) IK) −→ πi (BSO(2)) −→ · · ·

Now, since πi (IK) is trivial for i ≥ 2 and BSO(2) is a K(Z, 2), the homotopy groups πi (ESO(2) ×SO(2) IK) are trivial for i ≥ 3. Notice that π2 (ESO(2) ×SO(2) IK) is trivial if and only if the boundary map ∂ in the following exact sequence is injective:

π2 (ESO(2) ×SO(2) IK) −→ π2 BSO(2) −→∂ π1 IK −→ π1 (ESO(2) ×SO(2) IK).

Since IK is a finite-dimensional K(π, 1), the fundamental group π1 IK is torsion-free2 and thus it suffices to prove that ∂ is nonzero. First consider the diagram

π2 (BSO(2)) −−∂−→ π1 (IK)
     ≅
π1 ΩBSO(2)
     ≅
π1 SO(2)

2 This result can be obtained as a consequence of Proposition 4.2 in [8], applied to the free resolution induced by the universal cover of a K(π, 1) space.



where the isomorphism at the top arises from the homotopy exact sequence of the path-loop fibration of BSO(2) and the isomorphism at the bottom is induced by the natural homotopy equivalence ΩBSO(2) ≃ SO(2). Via these isomorphisms the boundary map ∂ can be considered as a morphism on fundamental groups π1 SO(2) → π1 IK. On the other hand, recall there is an SO(2)-action on IK by rotation on the first coordinate: (u1 , u2 ) → (Au1 , u2 ). If (u0 , u0 ) denotes the base point of IK, this action induces a map θ : SO(2) → IK, given by evaluating rotations at (u0 , u0 ). We finish the proof by noting that ∂ is the map induced by θ at the level of fundamental groups,

θ∗ : π1 SO(2) −→ π1 IK,

which is clearly a non-zero homomorphism.

6 Cohomological considerations

The K(π, 1) spaces constructed previously can be used to give a homotopical approximation to the calculation of the cohomology of mapping class groups, and also of braid groups. This is done via the identification H ∗ (π) = H ∗ (K(π, 1)), for a commutative ring of coefficients. This approach has been considered in [20], [23], [6], obtaining remarkable information on the cohomological structure. In what follows we recall some of these results.

The 2-sphere. Let SO(3) act on S 2 by rotations and consider the associated diagonal action on Fk (S 2 )/Σk . The Borel construction for this action,

ESO(3) ×SO(3) Fk (S 2 )/Σk ,

is a K(π, 1) for the punctured mapping class group Γk (S 2 ), for k ≥ 2, as mentioned in Section 3. Moreover, with mod-2 coefficients this construction gives an isomorphism of H ∗ (BSO(3); F2 )-modules

H ∗ (Γ2k (S 2 ); F2 ) ≅ H ∗ (BSO(3); F2 ) ⊗ H ∗ (F2k (S 2 )/Σ2k ; F2 ),



where the cohomology H ∗ (F2k (S 2 )/Σ2k ; F2 ) can be expressed in terms of the cohomology of braid groups ([9]). Calculations with coefficients in the sign representation F(−1) and with mod-p coefficients can be obtained by considering certain models of function spaces involving the Borel construction above. See [6].

The projective plane. Recall that the K(π, 1) space obtained in Theorem 4.3 for Γk (P2 ) is the total space of a fibration with base BSO(3) and fiber Fk (P2 )/Σk . It turns out that its mod-2 cohomology spectral sequence collapses at the E2 -term ([20]) and thus there is an isomorphism of modules

H ∗ (Γk (P2 ); F2 ) ≅ H ∗ (BSO(3); F2 ) ⊗ H ∗ (Fk (P2 )/Σk ; F2 ).

The Klein bottle. The case of the Klein bottle is also treated in [20]. The Borel fibration for the SO(2)-action

Fk (IK)/Σk −→ ESO(2) ×SO(2) Fk (IK)/Σk −→ BSO(2)

has a collapsing spectral sequence on mod-2 cohomology, and thus the cohomology of Γ̃k (IK) can be expressed as a tensor product of modules:

H ∗ (Γ̃k (IK); F2 ) ≅ H ∗ (BSO(2); F2 ) ⊗ H ∗ (Fk (IK)/Σk ; F2 ).

The two isomorphisms above show that the mod-2 cohomology of punctured mapping classes is completely determined by the mod-2 cohomology of unordered configuration spaces of surfaces. In [20] this relation is used to obtain information about the cohomology of the groups Γk (P2 ) and Γ̃k (IK). We recall the method in what follows.

First one considers the homology of Fk (M )/Σk as part of the homology of a larger space called the labelled configuration space C(M ; X), which is defined for a CW-complex X with basepoint ∗ as the quotient

C(M ; X) = ( ⊔ j≥1 Fj (M ) ×Σj X j ) / ≈,

where the relation ≈ is generated by

(m1 , . . . , mj ; x1 , . . . , xj ) ≈ (m1 , . . . , mj−1 ; x1 , . . . , xj−1 ) if xj = ∗.



This space is filtered by the subspaces Ck (M ; X), given by all configurations of length ≤ k, and stably splits ([5]) as the wedge of the successive quotients Dk (M ; X) = Ck (M ; X)/Ck−1 (M ; X). Thus there is an isomorphism on reduced homology

H̃i C(M ; X) ≅ ⊕ k≥1 H̃i Dk (M ; X).

Moreover, in the case of X = S n there is an explicit description of the mod-2 homology of C(M ; S n ) as a graded vector space ([7]) in terms of the homology of iterated loop spaces:

(1) H∗ (C(M ; S n )) ≅ ⊗ q=0,...,m H∗ (Ωm−q S m+n )⊗βq

where βq is the q-th Betti number of M . Every factor on the right-hand side is an algebra with weights associated to its generators, and the reduced homology of Dk (M ; S n ) is the vector subspace generated by all elements of weight k. On the other hand, it is easy to see ([7]) that the space Dk (M ; S n ) is the Thom space of the n-fold sum of the k-dimensional vector bundle

η : Rk ×Σk Fk (M ) −→ Fk (M )/Σk ,

and thus there is an isomorphism

H∗ (Fk (M )/Σk ) ≅ H̃∗+kn (Dk (M ; S n )).

The additive structure of the homology of unordered configurations can then be found by counting generators of weight k. The isomorphism (1) above is valid for a general smooth compact manifold M of dimension m, with coefficients in a field of characteristic zero. If the coefficients are different from F2 , the proof of the isomorphism (1) requires m + n to be odd. When M is a surface, the mod-2 homology of the labelled configuration space C(M ; S n ) is given by the tensor product

H∗ (Ω2 S n+2 )⊗β0 ⊗ H∗ (ΩS n+2 )⊗β1 ⊗ H∗ (S n+2 )⊗β2 ,

where β0 , β1 and β2 are the mod-2 Betti numbers of M . More explicitly,

H∗ C(M ; S n ) ≅ F2 [y0 , y1 , . . . ]⊗β0 ⊗ F2 [x]⊗β1 ⊗ (F2 [u]/u2 )⊗β2 ,



where y0 , x and u are the fundamental classes in degrees n, n + 1 and n + 2, respectively, and yj = Q1^j (y0 ) = Q1 Q1 · · · Q1 (y0 ). Here Q1 is the first Dyer-Lashof operation ([11]). The weights of all generators are given by

ω(u) = 1, ω(yj ) = 2^j , ω(x^i ) = i, for i = 1, 2, . . .

Thus, for the genus g non-orientable closed surface Ng the isomorphism has the form

H∗ (C(Ng ; S n )) ≅ F2 [y0 , y1 , . . .] ⊗ F2 [x1 , x2 , . . . , xg ] ⊗ F2 [u]/u2 ,

and a basis for H̃q Dk (Ng ; S n ) consists of the monomials of degree q of the form

h = u^e x1^a1 · · · xg^ag y0^b0 y1^b1 y2^b2 · · · yr^br ,

for some r ≥ 0, e = 0, 1 and ai , bj ≥ 0, such that

ω(h) = e + Σ i=1..g ai + Σ j=0..r 2^j bj = k.

For k = 2, after counting the degrees of the monomials above one finds that

H_q(F_2(N_g)/Σ_2; F_2) =
  F_2                   if q = 0,
  F_2^(g+1)             if q = 1,
  F_2^(g(g+1)/2 + 1)    if q = 2,
  F_2^g                 if q = 3,
  0                     otherwise.

For k = 2 and g = 1, one has

H_q(F_2(P²)/Σ_2; F_2) =
  F_2     if q = 0,
  F_2²    if q = 1,
  F_2²    if q = 2,
  F_2     if q = 3,
  0       otherwise.
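The dimension count above is mechanical, so it can be double-checked by brute force. The following Python sketch (ours, not part of the original paper) enumerates the basis monomials h = u^e x_1^{a_1}···x_g^{a_g} y_0^{b_0}···y_r^{b_r} of weight k and tabulates them by the shifted degree q = deg(h) − kn, assuming deg u = n + 2, deg x_i = n + 1, and deg y_j = 2^j(n + 1) − 1 (from the standard degree formula deg Q_1(c) = 2 deg(c) + 1); the value of n drops out of q:

```python
from itertools import product

def dims(g, k, n=3):
    """Count weight-k basis monomials by homological degree q = deg(h) - k*n."""
    deg_u, deg_x = n + 2, n + 1                      # deg u = n+2, deg x_i = n+1
    js = [j for j in range(k.bit_length()) if 2 ** j <= k]
    deg_y = {j: 2 ** j * (n + 1) - 1 for j in js}    # deg y_j from Q_1 iteration
    counts = {}
    for e in (0, 1):                                 # u is exterior: u^2 = 0
        for a in product(range(k + 1), repeat=g):    # exponents of x_1, ..., x_g
            wa = e + sum(a)                          # omega(u) = omega(x_i) = 1
            if wa > k:
                continue
            for b in product(range(k + 1), repeat=len(js)):
                if sum(2 ** j * bj for j, bj in zip(js, b)) != k - wa:
                    continue                         # omega(y_j) = 2^j
                deg = (e * deg_u + sum(a) * deg_x
                       + sum(bj * deg_y[j] for j, bj in zip(js, b)))
                q = deg - k * n
                counts[q] = counts.get(q, 0) + 1
    return counts

print(dict(sorted(dims(g=1, k=2).items())))   # {0: 1, 1: 2, 2: 2, 3: 1}
```

For g = 1 this reproduces the table for P² above, and for general g it matches the displayed dimensions 1, g + 1, g(g+1)/2 + 1, g.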

This can be compared with the calculations in [15], where the ring structure of H^*(F_2(P²)/Σ_2) was obtained motivated by the symmetric topological complexity of real projective spaces P^n and the problem of embeddings P^n ⊂ R^m. The cohomology of F_2(P²)/Σ_2 is a truncated polynomial ring on two 1-dimensional variables, given in terms of the generator z ∈ H^1(P^∞).

Acknowledgement

The author would like to thank the referee for their valuable comments during the preparation of the present work.

Miguel A. Maldonado
Unidad Académica de Matemáticas,
Universidad Autónoma de Zacatecas,
mmaldonado@mate.reduaz.mx

References

[1] Antonelli P. L.; Burghelea D.; Kahn P. J., The non-finite homotopy type of some diffeomorphism groups, Topology, 11 (1972), 1–49.
[2] Birman J. S., On braid groups, Comm. Pure Appl. Math., 22 (1969), 21–72.
[3] Birman J. S., Braids, Links and Mapping Class Groups, Ann. of Math. Stud., 82, Princeton University Press, 1974.
[4] Birman J. S.; Hilden H. M., On isotopies of homeomorphisms of Riemann surfaces, The Annals of Mathematics, 97:3 (1973), 424–439.
[5] Bödigheimer C. F., Stable splittings of mapping spaces, Algebraic Topology, Seattle, Wash. (1985), Lecture Notes in Mathematics, 1286 (1987), 174–187.
[6] Bödigheimer C. F.; Cohen F.; Peim M., Mapping class groups and function spaces, Contemporary Mathematics, 271 (2001), 17–31.
[7] Bödigheimer C. F.; Cohen F.; Taylor L., On the homology of configuration spaces, Topology, 28:1 (1989), 111–123.
[8] Brown K. S., Cohomology of Groups, Graduate Texts in Mathematics, Springer Verlag, 1982.



[9] Cohen F. R., On the hyperelliptic mapping class groups, SO(3) and Spin^c(3), Amer. J. Math., 115 (1993), 389–434.
[10] Cohen F.; Pakianathan J., Configuration spaces and braid groups, Course Notes.
[11] Dyer E.; Lashof R. K., Homology of iterated loop spaces, Amer. J. Math., 84 (1962), 35–88.
[12] Earle C. J.; Eells J., A fibre bundle description of Teichmüller space, J. Differential Geometry, 3 (1969), 19–43.
[13] Fadell E.; Neuwirth L., Configuration spaces, Math. Scand., 10 (1962), 111–118.
[14] Fox R. H.; Neuwirth L., The braid groups, Math. Scand., 10 (1962), 119–126.
[15] González J., Symmetric topological complexity as the first obstruction in Goodwillie's Euclidean embedding tower for real projective spaces, Trans. Amer. Math. Soc., 363 (2011), 6713–6741.
[16] Goldberg C. H., An exact sequence of braid groups, Math. Scand., 33 (1972), 69–82.
[17] Gramain A., Le type d'homotopie du groupe des difféomorphismes d'une surface compacte, Ann. Scient. Éc. Norm. Sup., 6 (1973), 53–66.
[18] Imayoshi Y.; Taniguchi M., An Introduction to Teichmüller Spaces, Springer Verlag, Tokyo, 1992.
[19] Korkmaz M., Mapping class groups of non-orientable surfaces, Geom. Dedicata, 89 (2002), 109–133.
[20] Maldonado M. A., On the cohomology of mapping class groups for non-orientable surfaces, PhD thesis, Cinvestav-IPN, 2011.
[21] Morita S., Geometry of Characteristic Classes, Iwanami Series in Modern Mathematics, American Mathematical Society, 2001.
[22] tom Dieck T., Algebraic Topology, EMS Textbooks in Mathematics, European Mathematical Society, 2008.



[23] Wang J. H., On the braid groups for P², J. Pure Appl. Algebra, 166 (2002), 203–227.
[24] Xicoténcatl M., Orbit configuration spaces, infinitesimal braid relations in homology and equivariant loop spaces, PhD thesis, University of Rochester, 1997.



Morfismos, Vol. 16, No. 2, 2012, pp. 51–68

Variance optimality for controlled Markov-modulated diffusions*

Beatris Adriana Escobedo-Trujillo    Carlos Octavio Rivera-Blanco

Abstract

This paper concerns controlled Markov-modulated diffusions. Our main objective is to give conditions for the existence of optimal policies for the limiting average variance criterion. To this end, we use the fact that the family of average reward optimal policies is nonempty. Then, within this family, we search for policies that minimize the limiting average variance.

2010 Mathematics Subject Classification: 93E20, 60J60.
Keywords and phrases: variance optimality criterion, controlled Markov-modulated diffusions, exponential ergodicity.

1 Introduction

Using the fact that the family of average reward optimal policies is nonempty (see [5] for details), in this paper we study the existence of stationary policies that minimize the limiting average variance in the class of average optimal policies. Under our assumptions we extend to controlled switching diffusions the results in [6] on discrete-time Markov control processes.

A diffusion with Markovian switchings (also known as a piecewise diffusion, switching diffusion, or Markov-modulated diffusion) is a stochastic differential equation with coefficients depending on a continuous-time irreducible finite-state homogeneous Markov chain. The motivation to study switching diffusions is that recent studies suggest that

* The research of the first author (BAET) was supported by CONACyT scholarship 167588.



such processes are more general and appropriate for a wide variety of applications not covered by standard Markov diffusion models. For related references see [1, 4, 5, 12, 14, 15, 16].

The existence of optimal policies for Markov-modulated diffusions with the average reward criterion has been previously studied in the literature; see, for instance, [3, 4, 5]. But, as is well known, this criterion is very underselective because an average reward optimal policy may have an arbitrarily bad behavior for large but finite lengths of time. To avoid this situation, many authors consider more sensitive criteria such as the limiting average variance criterion; see [6, 7, 9, 10, 11, 13].

The paper is organized as follows. Section 2 introduces our assumptions, which lead to the notion of w- and w²-exponential ergodicity, a crucial tool for our results. In Section 3 we define the average optimality criterion we are interested in, and we summarize some results from [5] on the existence of solutions to the average reward problem, which is essentially our point of departure to analyze variance optimality. In Section 4 we define the variance optimality criterion and we prove the existence of variance optimal policies. In Section 5 we prove Theorem 4.4, which states that the limiting average variance equals a constant independent of the initial state. Our results are illustrated with an example in Section 6.

2 Model Definition and Ergodic Properties

The control system we are concerned with is the controlled Markov-modulated diffusion process

(1)  dx(t) = b(x(t), ψ(t), u(t)) dt + σ(x(t), ψ(t)) dW(t),

for t ≥ 0, x(0) = x, and ψ(0) = i, with coefficients depending on a continuous-time irreducible Markov chain ψ(·) with finite state space E = {1, 2, . . . , N} and transition probabilities

(2)  P(ψ(s + t) = j | ψ(s) = i) = q_ij t + o(t).

For states i ≠ j the number q_ij ≥ 0 is the transition rate from i to j, while q_ii := −Σ_{j≠i} q_ij. Moreover, in (1), b : R^n × E × U → R^n and σ : R^n × E → R^{n×d} are given functions, and W(·) is a d-dimensional standard Brownian motion independent of ψ(·). The stochastic process u(·) is a U-valued process called a control process, and the set U ⊂ R^m is called the control (or action) space.
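Although the paper is purely analytic, the dynamics (1)-(2) are easy to simulate. Below is a minimal Euler-Maruyama sketch with a two-state chain, approximating the jump mechanism (2) by one Bernoulli trial per time step; all parameters (the generator Q, the coefficients b and σ, and the policy f ≡ 0) are illustrative choices of ours, not taken from the paper.

```python
import random

def simulate(T=20.0, dt=0.01, x0=0.0, i0=0, seed=1):
    """Euler-Maruyama for dx = b(x, psi, u) dt + sigma(x, psi) dW,
    jointly with a two-state Markov chain psi (illustrative parameters)."""
    random.seed(seed)
    Q = [[-1.0, 1.0], [2.0, -2.0]]            # generator: rows sum to zero
    b = lambda x, i, u: (-1.0 if i == 0 else -2.0) * x + u
    sigma = lambda x, i: 0.5
    f = lambda x, i: 0.0                      # a stationary Markov policy u = f(x, i)
    x, i, xs = x0, i0, [x0]
    for _ in range(int(T / dt)):
        # chain jump: leave state i with probability -q_ii * dt, cf. (2)
        if random.random() < -Q[i][i] * dt:
            i = 1 - i
        dW = random.gauss(0.0, dt ** 0.5)
        x = x + b(x, i, f(x, i)) * dt + sigma(x, i) * dW
        xs.append(x)
    return xs, i

xs, i_last = simulate()
```

With both drifts stable (b(i) < 0) and u ≡ 0, trajectories stay bounded in practice, which is the behavior the Lyapunov condition of Assumption 2.4 below formalizes.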



Notation.

• For x ∈ R^n and a matrix A, we use the usual Euclidean norms

|x|² := Σ_k x_k²  and  |A|² := Σ_{k,l} A_{k,l}².

We will also denote by A' the transpose of a square matrix A.

• We use P^{x,i,u}(t, ·) to denote the transition probability of the process (x(·), ψ(·)), i.e.,

P^{x,i,u}(t, B × J) := P((x(t), ψ(t)) ∈ B × J | x(0) = x, ψ(0) = i)

for every Borel set B ⊂ R^n and J ⊂ E. The associated conditional expectation is written E^{x,i,u}(·).

Assumption 2.1.

(a) The control set U is compact.

(b) b(x, i, u) is continuous on R^n × E × U, and x → b(x, i, u) satisfies a Lipschitz condition uniformly in (i, u) ∈ E × U; that is, there exists a positive constant K_1 such that

max_{(i,u)∈E×U} |b(x, i, u) − b(y, i, u)| ≤ K_1 |x − y|  for all x, y ∈ R^n.

(c) There exists a positive constant K_2 such that, for each i ∈ E and x, y ∈ R^n,

|σ(x, i) − σ(y, i)| ≤ K_2 |x − y|.

(d) There exists a positive constant K_3 such that the matrix a(x, i) := σ(x, i)σ'(x, i) satisfies, for each i ∈ E and x, y ∈ R^n,

x' a(y, i) x ≥ K_3 |x|²  (uniform ellipticity).

Remark 2.2. The Lipschitz conditions on b and σ in Assumption 2.1 imply that b and σ satisfy a linear growth condition. That is, there exists a constant C ≥ K_1 + K_2 such that for all x ∈ R^n

sup_{(u,i)∈U×E} (|b(x, i, u)| + |σ(x, i)|) ≤ C(1 + |x|).



Control policies. For our present purposes, we can restrict ourselves to stationary Markov policies, defined as follows.

Definition 2.3. Let F be the family of measurable functions f : R^n × E → U. A control policy of the form u(t) := f(x(t), ψ(t)) for some f ∈ F and t ≥ 0 is called a stationary Markov policy. Actually, by an abuse of terminology, f itself will be referred to as a stationary Markov policy.

Infinitesimal generator. Let C²(R^n × E) be the space of real-valued continuous functions ν(x, i) on R^n × E which are twice continuously differentiable in x ∈ R^n for each i ∈ E. For ν ∈ C²(R^n × E) and u ∈ U, let

Qν(x, i) := Σ_{j=1}^{N} q_ij ν(x, j),

where Q = [q_ij] is the generator of the Markov chain ψ(·), and

L^u ν(x, i) := Σ_{k=1}^{n} (∂ν/∂x_k)(x, i) b_k(x, i, u) + (1/2) Σ_{k,l} a_kl(x, i) (∂²ν/∂x_k∂x_l)(x, i) + Qν(x, i),

where b_k is the k-th component of b, and a_kl is the (k, l)-component of the matrix a(·, ·) defined in Assumption 2.1(d). For each f ∈ F and (x, i) ∈ R^n × E let

(3)  L^f ν(x, i) := L^{f(x,i)} ν(x, i).
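For a scalar state (n = d = 1), the generator L^u above can be evaluated numerically: approximate the x-derivatives by central differences and add the chain-coupling term Qν. The sketch below uses made-up coefficients b, σ and a two-state Q purely for illustration; central differences are essentially exact on the quadratic test function used.

```python
def L(nu, x, i, u, b, sigma, Q, h=1e-5):
    """Numerical generator L^u of (x(t), psi(t)) in the scalar case n = d = 1."""
    d1 = (nu(x + h, i) - nu(x - h, i)) / (2 * h)          # dnu/dx
    d2 = (nu(x + h, i) - 2 * nu(x, i) + nu(x - h, i)) / h**2  # d^2nu/dx^2
    drift = d1 * b(x, i, u)
    diffusion = 0.5 * sigma(x, i) ** 2 * d2
    coupling = sum(Q[i][j] * nu(x, j) for j in range(len(Q)))  # Q nu(x, i)
    return drift + diffusion + coupling

# toy data (illustrative, not from the paper)
Q = [[-1.0, 1.0], [1.0, -1.0]]
b = lambda x, i, u: -x + u
sigma = lambda x, i: 1.0
nu = lambda x, i: x * x + i

print(L(nu, 2.0, 0, 0.5, b, sigma, Q))   # ≈ -4.0: 2x(b) + sigma^2/2 * 2 + Q nu
```

At (x, i, u) = (2, 0, 0.5) the exact value is 2·2·(−1.5) + 1 + 1 = −4, which the finite-difference evaluation reproduces up to roundoff.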

Under Assumption 2.1, for each stationary Markov policy f ∈ F there exists an almost surely unique strong solution of (1)-(2); see [12, pages 88–90]. On the other hand, even though x(t) itself is not necessarily Markov, it is well known that the joint process (x(·), ψ(·)) is Markov; see, for instance, [12, pp. 104–106]. The infinitesimal generator of the Markov process (x(t), ψ(t)) is L^f in (3) for each stationary Markov policy f ∈ F; see [12, page 48].

Recurrence and ergodicity. For the variance optimality criterion, we require the following second-order condition (a Lyapunov-like condition) that ensures the positive recurrence of the controlled Markov-modulated diffusion (1)-(2); see [5], [14] and [15].

Assumption 2.4. There exists a function w ∈ C²(R^n × E), with w ≥ 1, and constants p ≥ q > 0 such that



(i) lim_{|x|→∞} w(x, i) = +∞ for each i ∈ E, and

(ii) for each u ∈ U and (x, i) ∈ R^n × E,

(4)  L^u w²(x, i) ≤ −q w²(x, i) + p.

Under Assumption 2.4, for each f ∈ F the Markov process (x(·), ψ(·)) is Harris positive recurrent with a unique invariant probability measure µ_f(dx, i) (see [15]) for which

µ_f(w²) := Σ_{i=1}^{N} ∫_{R^n} w²(x, i) µ_f(dx, i) < ∞.

Definition 2.5. Let B_w(R^n × E) be the normed linear space of real-valued measurable functions ν on R^n × E with finite w-norm, which is defined as

‖ν‖_w := sup_{(x,i)∈R^n×E} |ν(x, i)| / w(x, i).

Remark 2.6. A consequence of Assumptions 2.1(d) and 2.4(ii) is that the second-order condition (4) implies the following first-order condition:

(5)  L^u w(x, i) ≤ −q_1 w(x, i) + p_1,

for the constants q_1 = q/2 and p_1 = p/2, where p and q are the constants given in Assumption 2.4. For details, see [7, Proposition 2.3].

Under Assumptions 2.1 and 2.4, Theorem 2.8 in [3] ensures that the controlled Markov-modulated diffusion (1)-(2) is uniformly w-exponentially ergodic; that is, there exist positive constants C and δ such that

(6)  sup_{f∈F} |E^{x,i,f}[ν(x(t), ψ(t))] − µ_f(ν)| ≤ C e^{−δt} ‖ν‖_w w(x, i)

for all (x, i) ∈ R^n × E, ν ∈ B_w(R^n × E), and t ≥ 0, where µ_f(ν) := Σ_{i=1}^{N} ∫_{R^n} ν(x, i) µ_f(dx, i).
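The bound (6) can be seen in its simplest form on the chain component alone: for a two-state chain with rates a (from 0 to 1) and b (from 1 to 0), the transition probabilities are explicit, and the gap |E_i ν(ψ(t)) − µ(ν)| decays exactly like e^{−λt} with λ = a + b. The following Python sketch (with hypothetical rates of ours) verifies that the ratio gap(t)/e^{−λt} is a constant, the C of (6):

```python
import math

def two_state(a, b, t, nu):
    """Exact E_i[nu(psi(t))] and mu(nu) for a two-state chain with
    jump rates a: 0 -> 1 and b: 1 -> 0 (closed-form transition matrix)."""
    lam = a + b
    mu = (b / lam, a / lam)                  # stationary distribution
    e = math.exp(-lam * t)
    p = [[mu[0] + mu[1] * e, mu[1] * (1 - e)],
         [mu[0] * (1 - e), mu[1] + mu[0] * e]]
    Ei = [p[i][0] * nu[0] + p[i][1] * nu[1] for i in (0, 1)]
    mu_nu = mu[0] * nu[0] + mu[1] * nu[1]
    return Ei, mu_nu, lam

for t in (0.5, 1.0, 2.0):
    Ei, mu_nu, lam = two_state(1.0, 2.0, t, (1.0, 3.0))
    gap = max(abs(v - mu_nu) for v in Ei)
    print(t, gap / math.exp(-lam * t))       # constant ratio: the C in (6)
```

Here λ = 3 is the spectral gap of the generator, playing the role of δ in (6).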

3 Average Optimality Criteria

Let r : R^n × E × U → R be a measurable function, which we call the reward rate. It satisfies the following conditions:



Assumption 3.1.

(a) The function r(x, i, u) is continuous on R^n × E × U and locally Lipschitz in x uniformly with respect to i ∈ E and u ∈ U; that is, for each R > 0 there exists a constant K(R) > 0 such that

sup_{(i,u)∈E×U} |r(x, i, u) − r(y, i, u)| ≤ K(R)|x − y|  for all |x|, |y| ≤ R.

(b) r(·, ·, u) is in B_w(R^n × E) uniformly in u; that is, there exists M > 0 such that for each (x, i) ∈ R^n × E

sup_{u∈U} |r(x, i, u)| ≤ M w(x, i).

Notation. For each Markov policy f ∈ F, x ∈ R^n and i ∈ E, we write r(x, i, f) := r(x, i, f(x, i)).

The following definition concerns the long-run average optimality criterion.

Definition 3.2. For each f ∈ F, (x, i) ∈ R^n × E, and T ≥ 0, let

J_T(x, i, f) := E^{x,i,f} [ ∫_0^T r(x(t), ψ(t), f) dt ].

The long-run expected average reward given the initial state (x, i) is

(7)  J(x, i, f) := lim inf_{T→∞} (1/T) J_T(x, i, f).

The function

J*(x, i) := sup_{f∈F} J(x, i, f)  for all (x, i) ∈ R^n × E

is referred to as the optimal gain or the optimal average reward. If there is a policy f* ∈ F for which J(x, i, f*) = J*(x, i) for all (x, i) ∈ R^n × E, then f* is called average optimal.

Remark 3.3. The following important results were proven in [3].



(1) The w-exponential ergodicity (6) gives that the long-run expected average reward (7) coincides with the constant g(f) defined by

(8)  g(f) := µ_f(r(·, ·, f)) = Σ_{i=1}^{N} ∫_{R^n} r(x, i, f) µ_f(dx, i)

for every f ∈ F. That is, g(f) = J(x, i, f).

(2) Under Assumptions 2.1, 2.4, and 3.1, Theorem 4.2 in [3] ensures the existence of optimal average reward policies. Denoting by g* the optimal average reward and by F_ao the family of average optimal policies, we have:

g* := sup_{f∈F} g(f) = sup_{f∈F} J(x, i, f)  for all (x, i) ∈ R^n × E.

(3) We define, for each f ∈ F, the bias of f as the function

(9)  h_f(x, i) := ∫_0^∞ [E^{x,i,f} r(x(t), ψ(t), f) − g(f)] dt  for (x, i) ∈ R^n × E.

Note that this function is finite-valued because (6) and Assumption 3.1(b) give, for all t ≥ 0,

(10)  |E^{x,i,f} r(x(t), ψ(t), f) − g(f)| ≤ e^{−δt} C M w(x, i).

Hence, by (9) and (10), the bias of f is such that |h_f(x, i)| ≤ δ^{−1} C M w(x, i), and so ‖h_f‖_w ≤ δ^{−1} C M. This means that the bias h_f is a finite-valued function and, in fact, it is in B_w(R^n × E).

(4) In addition, by Proposition 5.2 in [3] we know that for each f ∈ F, the pair (g(f), h_f) is the unique solution of the following Poisson equation:

(11)  g(f) = r(x, i, f) + L^f h_f(x, i)  for (x, i) ∈ R^n × E.
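In the finite-state chain analogue (dropping the diffusion component), the Poisson equation (11) reduces to the linear system g = r_i + (Qh)_i, which determines g and determines h up to an additive constant. The Python sketch below (with illustrative rates and rewards of ours, not from the paper) pins h at the last state and solves the resulting square system with a tiny Gaussian elimination:

```python
def solve(A, y):
    """Gaussian elimination with partial pivoting (pure Python)."""
    n = len(A)
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[r][n] / M[r][r] for r in range(n)]

def poisson(Q, r):
    """Chain analogue of (11): find (g, h) with g = r_i + (Q h)_i for all i,
    pinning h[n-1] = 0.  Unknowns: h_0, ..., h_{n-2}, g."""
    n = len(Q)
    A = [[Q[i][j] for j in range(n - 1)] + [-1.0] for i in range(n)]
    y = [-float(r[i]) for i in range(n)]
    sol = solve(A, y)
    return sol[-1], sol[:-1] + [0.0]

g, h = poisson([[-1.0, 1.0], [2.0, -2.0]], [0.0, 3.0])
print(g, h)   # 1.0 [-1.0, 0.0]
```

For this chain the stationary distribution is (2/3, 1/3), so g = (2/3)·0 + (1/3)·3 = 1, matching (8).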


4 Variance Optimality

In this section we study the existence of a stationary policy that minimizes the limiting average variance in the class F_ao of average optimal policies.

Remark 4.1. We define the normed linear space B_{w²}(R^n × E) of real-valued measurable functions ν on R^n × E with finite w²-norm, similarly to the normed linear space B_w(R^n × E), with w² in lieu of w.

Remark 4.2. Under Assumptions 2.1, 2.4, and 3.1, the process (x(·), ψ(·)) is uniformly w²-exponentially ergodic; i.e., there exist positive constants C and δ such that

sup_{f∈F} |E^{x,i,f}[ν(x(t), ψ(t))] − µ_f(ν)| ≤ C e^{−δt} ‖ν‖_{w²} w²(x, i)

for all (x, i) ∈ R^n × E, ν ∈ B_{w²}(R^n × E), and t ≥ 0. The proof of this result is similar to that given in [3, Theorem 2.8], with w² in lieu of w.

Definition 4.3. For each f in F, the limiting average variance of f given the initial state (x, i) ∈ R^n × E is the function

(12)  σ²(x, i, f) := lim_{T→∞} (1/T) E^{x,i,f} [ ( ∫_0^T r(x(t), ψ(t), f) dt − J_T(x, i, f) )² ].

The following theorem, which is proved in Section 5, states that the limiting average variance equals a constant.

Theorem 4.4. Under Assumptions 2.1 and 2.4, for each f in F and an arbitrary initial state (x, i) ∈ R^n × E, the limiting average variance σ²(x, i, f) equals the constant

(13)  σ²(f) := 2 Σ_{i=1}^{N} ∫_{R^n} (r(x, i, f) − g(f)) h_f(x, i) µ_f(dx, i).
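Formula (13) can be made concrete in the chain-only analogue: for a two-state chain with rates q01, q10 and state rewards r0, r1, solving the Poisson equation (11) by hand and substituting into (13) gives, by a short computation of ours (not in the paper), the closed form σ²(f) = 2 q01 q10 (r1 − r0)² / (q01 + q10)³. A Python sketch with hypothetical data:

```python
def avg_variance(q01, q10, r0, r1):
    """Chain analogue of (13): sigma^2(f) = 2 * sum_i mu_i (r_i - g) h_i,
    where h solves the Poisson equation g = r_i + (Q h)_i with h0 = 0."""
    lam = q01 + q10
    mu0, mu1 = q10 / lam, q01 / lam          # stationary distribution
    g = mu0 * r0 + mu1 * r1                  # average reward, cf. (8)
    h0, h1 = 0.0, (g - r0) / q01             # Poisson equation at state 0
    return 2 * (mu0 * (r0 - g) * h0 + mu1 * (r1 - g) * h1)

print(avg_variance(1.0, 1.0, 0.0, 1.0))      # 0.25
```

For the symmetric chain with rewards 0 and 1 this gives 2·1·1·1/8 = 1/4, and the result agrees with the closed form above for any rates.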

The following definition concerns the variance optimality criterion.

Definition 4.5. We say that a stationary policy f* is variance optimal if f* ∈ F_ao and, moreover,

(14)  σ²(f*) = min_{f∈F_ao} σ²(f).


Variance optimality for controlled Markov-modulated diffusions

We define for each (x, i) ∈ R^n × E the set

U*(x, i) := {u ∈ U | g = r(x, i, u) + L^u h(x, i)},

with g ∈ R and h ∈ C²(R^n × E) ∩ B_w(R^n × E). By [3, Lemma 6.2], for each (x, i) ∈ R^n × E, U*(x, i) is a nonempty compact set.

Proposition 4.6. Suppose that Assumptions 2.1, 2.4, and 3.1 are satisfied. Then

(i) There exist g, σ² ∈ R, h ∈ C²(R^n × E) ∩ B_w(R^n × E), and φ ∈ C²(R^n × E) ∩ B_{w²}(R^n × E) that satisfy the system of equations

(15)  g = max_{u∈U} {r(x, i, u) + L^u h(x, i)},

(16)  σ² = min_{u∈U*(x,i)} {2(r(x, i, u) − g)h(x, i) + L^u φ(x, i)}

for all (x, i) ∈ R^n × E.

(ii) A policy f* in F is variance optimal if and only if f* attains the maximum and the minimum in (15) and (16), respectively. The minimal limiting average variance σ²(f*) equals σ² in (16).

Proof. (i) The existence of a constant g and a function h ∈ C²(R^n × E) ∩ B_w(R^n × E) that satisfy (15) follows from Theorem 4.2 in [3]. The latter theorem also yields the existence of a stationary policy f ∈ F that attains the maximum on the right-hand side of (15), i.e.,

g = r(x, i, f) + L^f h(x, i)  for all (x, i) ∈ R^n × E.

Now suppose that f is in F_ao. Then by Proposition 5.3 in [3] the bias of f satisfies

(17)  h_f(x, i) = h(x, i) − µ_f(h)  for all (x, i) ∈ R^n × E.

Thus, using (8), the limiting average variance of f verifies

(18)  σ²(f) = 2 Σ_{i=1}^{N} ∫_{R^n} (r(x, i, f) − g(f)) h_f(x, i) µ_f(dx, i) = 2 Σ_{i=1}^{N} ∫_{R^n} (r(x, i, f) − g) h(x, i) µ_f(dx, i).


This implies that σ²(f) is the expected average reward of the policy f when the reward rate is the function r'(x, i, u) := 2(r(x, i, u) − g)h(x, i) for all (x, i) ∈ R^n × E. Hence, to find a solution of (16) we need to solve a new average reward control problem. This problem has the following components: the dynamic system (1), the action sets U*(x, i), and the reward rate r'(x, i, u). Note that

|r'(x, i, u)| = 2|(r(x, i, u) − g)h(x, i)|
            ≤ 2(M w(x, i) + g)|h(x, i)|
            ≤ 2(M w(x, i) + g)‖h‖_w w(x, i)
            ≤ 2w²(x, i)‖h‖_w (M + g),

where the first inequality holds by Assumption 3.1(b) and the second one holds since h ∈ B_w(R^n × E). Therefore r'(x, i, u) verifies Assumption 3.1(b) when w is replaced with w². The control problem with the above components satisfies Assumptions 2.1, 2.4 and 3.1 with w replaced by w². Hence, by [3, Theorem 4.2], there exists (σ², φ), with σ² ∈ R and φ ∈ C²(R^n × E) ∩ B_{w²}(R^n × E), that satisfies equation (16).

(ii) By [3, Theorem 4.2] there exists a policy f* ∈ F that attains the maximum in (15). Note that a stationary policy f* is in F_ao if and only if f*(x, i) is in U*(x, i) for all (x, i) ∈ R^n × E. Moreover, by [3, Lemma 6.2(a)], U*(x, i) is a compact set for each (x, i) ∈ R^n × E. Hence, by [3, Theorem 4.2], f* is variance optimal. Now, by (16) we have

(19)  σ² ≤ 2(r(x, i, f) − g)h(x, i) + L^f φ(x, i).

Then, by Dynkin's formula for diffusions with Markovian switchings [12, p. 48], for each T > 0 we obtain

E^{x,i,f} φ(x(T), ψ(T)) = φ(x, i) + E^{x,i,f} [ ∫_0^T L^f φ(x(s), ψ(s)) ds ],

and using (19),

E^{x,i,f} [ ∫_0^T 2(r(x(s), ψ(s), f) − g)h(x(s), ψ(s)) ds ]



≥ σ²T + φ(x, i) − E^{x,i,f} φ(x(T), ψ(T)).

Replacing (17) in the latter inequality gives

E^{x,i,f} [ ∫_0^T 2(r(x(s), ψ(s), f) − g)h_f(x(s), ψ(s)) ds ] + µ_f(h) E^{x,i,f} [ ∫_0^T 2(r(x(s), ψ(s), f) − g) ds ]
≥ σ²T + φ(x, i) − E^{x,i,f} φ(x(T), ψ(T)).

Thus, multiplying both sides of this inequality by T^{−1} and letting T → ∞, it follows from (13) and the w-exponential ergodicity (6) that σ²(f) ≥ σ² for all f ∈ F_ao. Hence, inf_{f∈F_ao} σ²(f) ≥ σ². Now let us consider f* in F_ao that minimizes (16); proceeding as above we obtain σ²(f*) = σ², and so

(20)  σ²(f*) = σ² ≤ inf_{f∈F_ao} σ²(f).

This completes the proof of part (ii).

Remark 4.7. Equation (15) is called the average reward Hamilton-Jacobi-Bellman equation; it is also known as the Bellman equation or the dynamic programming equation.

5 Proof of Theorem 4.4

To prove Theorem 4.4, first note that Dynkin's formula applied to h_f, the Poisson equation (11), and the exponential ergodicity (6) give that the total expected payoff of f ∈ F over the time interval [0, T], when the initial state is x ∈ R^n (recall Definition 3.2), can be written as

J_T(x, i, f) = T g(f) + h_f(x, i) + O(e^{−δT}),

where O(·) is a residual term converging to zero as T → ∞. Replacing this last equation in the limiting average variance (12), we obtain

(21)  σ²(x, i, f) = lim_{T→∞} (1/T) E^{x,i,f} [ ( ∫_0^T r(x(t), ψ(t), f) dt − T g(f) )² ].



We define, for f ∈ F and T ≥ 0,

(22)  Y^f(T) := ∫_0^T r(x(s), ψ(s), f) ds − T g(f).

Then, replacing (22) in (21) we have

(23)  σ²(x, i, f) = lim_{T→∞} (1/T) E^{x,i,f} [ (Y^f(T))² ].

Looking at equation (23), the first idea that comes to mind for the proof of Theorem 4.4 is to use the moment generating function associated to the process Y^f(T). However, a key problem with moment generating functions is that the moments, and the moment generating function itself, may not exist. By contrast, the characteristic function of the process Y^f(T) always exists, and thus may be used instead.

Hence, the main idea in the proof of Theorem 4.4 is to use the characteristic function C^z(T) := e^{izY^f(T)}, z ∈ R, of the process Y^f(T). We show, using Itô's formula, the Poisson equation (11), and integration by parts, that C^z(T) satisfies a certain integral equation. Then we consider the Taylor series of C^z(t), substitute it into the integral equation obtained, and find an expression for E^{x,i,f}[(Y^f(T))²] which gives the result. To begin, we need the following lemma.

Lemma 5.1. Let Y^f(·) be as in (22), and let h_f be the bias function defined in (9). Then

(24)  lim_{T→∞} (1/T) E^{x,i,f} [h_f(x(T), ψ(T)) Y^f(T)] = 0.

Proof. By Itô's lemma for semimartingales (see [8], Section 8.10, page 234, or [12], Section 1.8, page 48),

(25)  h_f(x(T), ψ(T)) = h_f(x, i) + ∫_0^T L^f h_f(x(s), ψ(s)) ds
      − ∫_0^T Σ_{j∈E} q_{ψ(s),j} [h_f(x(s), j) − h_f(x(s), ψ(s))] ds
      + ∫_0^T Σ_{k=1}^{n} Σ_{l=1}^{d} (∂h_f/∂x_k)(x(s), ψ(s)) σ_kl(x(s), ψ(s)) dW_l(s)
      + Σ_{0≤s<T} [h_f(x(s), ψ(s+)) − h_f(x(s), ψ(s))].



On the other hand, by Theorem 1 in [1] we obtain

(26)  Σ_{0≤s<T} [h_f(x(s), ψ(s+)) − h_f(x(s), ψ(s))]
      = ∫_0^T Σ_{j∈E} [h_f(x(s), j) − h_f(x(s), ψ(s))] (q_0 − v)(ds, j)
      + ∫_0^T Σ_{j∈E} q_{ψ(s),j} [h_f(x(s), j) − h_f(x(s), ψ(s))] ds,

where q_0 is the jump measure of ψ and v is the compensator of q_0. Replacing (26) in (25) and noting that h_f(x, i) satisfies the Poisson equation (11), we have

(27)  h_f(x(T), ψ(T)) = h_f(x, i) + ∫_0^T (r(x(s), ψ(s), f) − g(f)) ds
      + ∫_0^T Σ_{k=1}^{n} Σ_{l=1}^{d} (∂h_f/∂x_k)(x(s), ψ(s)) σ_kl(x(s), ψ(s)) dW_l(s)
      + ∫_0^T Σ_{j∈E} [h_f(x(s), j) − h_f(x(s), ψ(s))] (q_0 − v)(ds, j).

For notational ease we define

N_T := ∫_0^T Σ_{k=1}^{n} Σ_{l=1}^{d} (∂h_f/∂x_k)(x(s), ψ(s)) σ_kl(x(s), ψ(s)) dW_l(s)

and

M_T := ∫_0^T Σ_{j∈E} [h_f(x(s), j) − h_f(x(s), ψ(s))] (q_0 − v)(ds, j).

Next we multiply both sides of (27) by h_f(x(T), ψ(T)) and, taking expectations, we find that

(28)  E^{x,i,f}[h_f(x(T), ψ(T)) Y^f(T)] = E^{x,i,f}[h_f²(x(T), ψ(T))] − E^{x,i,f}[h_f(x(T), ψ(T))] h_f(x, i)
      − E^{x,i,f}[h_f(x(T), ψ(T)) N_T] − E^{x,i,f}[h_f(x(T), ψ(T)) M_T].

From the w²-exponential ergodicity, we have that

(29)  E^{x,i,f}[h_f²(x(T), ψ(T))]/T → 0  and  E^{x,i,f}[h_f(x(T), ψ(T))]/T → 0.


Now, by Remark 2.2 and Remark 3.3(3) we obtain that N_T is a square integrable martingale and, moreover,

E^{x,i,f} Σ_{0≤s<T} |h_f(x(s), ψ(s+)) − h_f(x(s), ψ(s))| < ∞.

Then it follows from Theorem 26.12 in [2] that M_T is a martingale and, furthermore, it can be shown that M_T is also a square integrable martingale. Then, applying the Cauchy-Schwarz inequality to the third and fourth summands on the right-hand side of (28), we get

(30)  (E^{x,i,f}[h_f(x(T), ψ(T)) N_T])² ≤ E^{x,i,f}[h_f²(x(T), ψ(T))] · E^{x,i,f}[N_T²],

(31)  (E^{x,i,f}[h_f(x(T), ψ(T)) M_T])² ≤ E^{x,i,f}[h_f²(x(T), ψ(T))] · E^{x,i,f}[M_T²].

Finally, the orthogonality property of the martingale differences of N_T and M_T yields

(32)  E^{x,i,f}[N_T²] = O(T)

and

(33)  E^{x,i,f}[M_T²] = O(T).

Therefore, (24) follows from (29)-(33).

Proof of Theorem 4.4. For z ∈ R we define the characteristic function of the process Y^f(T) in (22) as

(34)  C^z(T) := e^{izY^f(T)}.

Note that

(35)  dC^z(T) = iz dY^f(T) C^z(T) = iz[r(x(T), ψ(T), f) − g(f)] C^z(T) dT.

This implies

(36)  C^z(T) = 1 + iz ∫_0^T [r(x(t), ψ(t), f) − g(f)] C^z(t) dt.



As the bias function h_f of f ∈ F satisfies the Poisson equation (11), from (36) we obtain

(37)  C^z(T) = 1 − iz ∫_0^T L^f h_f(x(t), ψ(t)) C^z(t) dt.

Applying Itô's formula to h_f on the interval [0, T] we obtain

(38)  dh_f(x(t), ψ(t)) = L^f h_f(x(t), ψ(t)) dt − Σ_{j∈E} q_{ψ(t),j} [h_f(x(t), j) − h_f(x(t), ψ(t))] dt
      + Σ_{k=1}^{n} Σ_{l=1}^{d} (∂h_f/∂x_k)(x(t), ψ(t)) σ_kl(x(t), ψ(t)) dW_l(t)
      + d( Σ_{0≤s≤t} [h_f(x(s), ψ(s+)) − h_f(x(s), ψ(s))] ),

and multiplication of (38) by C^z(t) gives

(39)  L^f h_f(x(t), ψ(t)) C^z(t) dt = dh_f(x(t), ψ(t)) C^z(t)
      + C^z(t) Σ_{j∈E} q_{ψ(t),j} [h_f(x(t), j) − h_f(x(t), ψ(t))] dt
      − C^z(t) Σ_{k=1}^{n} Σ_{l=1}^{d} (∂h_f/∂x_k)(x(t), ψ(t)) σ_kl(x(t), ψ(t)) dW_l(t)
      − C^z(t) d( Σ_{0≤s≤t} [h_f(x(s), ψ(s+)) − h_f(x(s), ψ(s))] ).

Replacing (39) in (37) we have

(40)  C^z(T) = 1 − iz [ ∫_0^T dh_f(x(t), ψ(t)) C^z(t)
      + ∫_0^T C^z(t) Σ_{j∈E} q_{ψ(t),j} [h_f(x(t), j) − h_f(x(t), ψ(t))] dt
      − ∫_0^T C^z(t) Σ_{k=1}^{n} Σ_{l=1}^{d} (∂h_f/∂x_k)(x(t), ψ(t)) σ_kl(x(t), ψ(t)) dW_l(t)
      − ∫_0^T C^z(t) d( Σ_{0≤s≤t} [h_f(x(s), ψ(s+)) − h_f(x(s), ψ(s))] ) ].


Using integration by parts we get

(41)  ∫_0^T dh_f(x(t), ψ(t)) C^z(t) = h_f(x(T), ψ(T)) C^z(T) − h_f(x, i) + ∫_0^T h_f(x(t), ψ(t)) dC^z(t).

Therefore, replacing (41) in (40), taking expectations, and using the arguments in the proof of Lemma 5.1, we obtain

(42)  E^{x,i,f}[C^z(T)] = 1 − iz E^{x,i,f} [ h_f(x(T), ψ(T)) C^z(T) − h_f(x, i) + ∫_0^T h_f(x(t), ψ(t)) dC^z(t) ].

Finally, substituting (35) in (42) we obtain

E^{x,i,f}[C^z(T)] = 1 − iz E^{x,i,f} [ h_f(x(T), ψ(T)) C^z(T) − h_f(x, i) + ∫_0^T h_f(x(t), ψ(t)) iz [r(x(t), ψ(t), f) − g(f)] C^z(t) dt ].

Consider now the Taylor series of C^z(t) in the last equality:

E^{x,i,f} [ Σ_{k=0}^{∞} (izY^f(T))^k / k! ] = 1 − iz E^{x,i,f} [ h_f(x(T), ψ(T)) Σ_{k=0}^{∞} (izY^f(T))^k / k! − h_f(x, i)
      + ∫_0^T h_f(x(t), ψ(t)) iz [r(x(t), ψ(t), f) − g(f)] Σ_{k=0}^{∞} (izY^f(t))^k / k! dt ].

Equating second-order terms in z we have

(43)  E^{x,i,f}[(Y^f(T))²] = 2 E^{x,i,f}[h_f(x(T), ψ(T)) Y^f(T)] + 2 E^{x,i,f} [ ∫_0^T h_f(x(t), ψ(t)) [r(x(t), ψ(t), f) − g(f)] dt ].

It is easy to prove that h_f(x, i)[r(x, i, f(x, i)) − g(f)] is in B_{w²}(R^n × E) and also that x(·) is w²-exponentially ergodic (recall Remark 4.2). Hence, multiplying (43) by 1/T and letting T → ∞, the result (13) follows from Lemma 5.1 and the w²-exponential ergodicity of x(·).



6 An Example

Now we give an example to illustrate our results. This example is an extension of the one presented in [3]. Consider the scalar linear system

(44)  dx(t) = [b(ψ(t))x(t) + βu(t)] dt + σ dW(t),  x(0) = x, ψ(0) = i,

where b : E → R and the coefficients β, σ are given positive constants. The control u(t) takes values in the compact set U := [0, a], with a > 0. The controlled Markov-modulated diffusion (44) satisfies Assumption 2.1. Now let r(x, i, u) := u be the reward rate, and choose a function w(x, i) that satisfies Assumptions 2.4 and 3.1, respectively.

Our goal is to find stationary policies u(t) := f(x(t), ψ(t)) that minimize the limiting average variance. To this end, we first find the stationary policies that optimize the long-run expected average reward, and within this set we search for variance optimal policies. To do this, we use equation (15), which in the present case takes the form

(45)  g = max_{u∈[0,a]} { u + h_x(x, i)[b(i)x + βu] + (1/2)σ² h_xx(x, i) + Qh(x, i) }.

For the particular case when h_x(x, i) < −1/β for all i ∈ E and x ∈ R, the control policy f*(x, i) = 0 is the unique policy that attains the maximum in (45) (hence, it is the unique average optimal policy). Consequently, by uniqueness, f* is also variance optimal.
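As a sanity check of the drift condition (4) in this example, the following Python sketch evaluates L^u w² + q·w² − p on a grid for the illustrative choices w(x, i) = 1 + x², b(i) ≡ −1, β = σ = a = 1, q = 0.5 and p = 15; all of these numbers are hypothetical values of ours, not taken from the paper. Since this w does not depend on i, the coupling term Qw² vanishes, and the condition reduces to a one-dimensional inequality:

```python
def drift_gap(x, u, b=-1.0, beta=1.0, sig=1.0, q=0.5, p=15.0):
    """L^u w^2 + q*w^2 - p for w(x, i) = 1 + x^2 in the scalar system (44);
    Q w^2 = 0 here because w does not depend on the chain state i."""
    w2 = (1.0 + x * x) ** 2
    d1 = 4.0 * x * (1.0 + x * x)          # (w^2)'
    d2 = 4.0 * (1.0 + 3.0 * x * x)        # (w^2)''
    Lw2 = d1 * (b * x + beta * u) + 0.5 * sig ** 2 * d2
    return Lw2 + q * w2 - p

grid = [k / 10.0 for k in range(-100, 101)]
worst = max(drift_gap(x, u) for x in grid for u in (0.0, 0.5, 1.0))
print(worst <= 0.0)   # True: (4) holds with q = 0.5, p = 15 on this grid
```

Since drift_gap is linear in u, checking the endpoints of U = [0, 1] suffices, and the quartic term −3.5x⁴ dominates for large |x|, so the grid check is representative of the whole line.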

Acknowledgement

We wish to thank Dr. Onésimo Hernández-Lerma for his valuable comments on an early version of this paper.

B. A. Escobedo-Trujillo
Civil Engineering Faculty,
Universidad Veracruzana,
Coatzacoalcos, Ver., México,
Tel. (921) 218-77-83, 211-57-07,
bescobedo@uv.mx

C. O. Rivera-Blanco
Centro de Investigación en Recursos Energéticos y Sustentables,
Universidad Veracruzana,
Coatzacoalcos, Ver., México,
Tel. (921) 218-77-83, 211-57-07,
crivera@uv.mx

References

[1] Bäuerle N.; Rieder U., Portfolio optimization with Markov-modulated stock prices and interest rates, IEEE Trans. Automatic Control, 49 (2004), 442–447.



[2] Davis M. H. A., Markov Models and Optimization, Chapman & Hall, London, 1993.
[3] Escobedo-Trujillo B. A.; Hernández-Lerma O., Overtaking optimality for controlled Markov-modulated diffusions, J. Optimization (2011).
[4] Ghosh M. K.; Arapostathis A.; Marcus S. I., Optimal control of switching diffusions with applications to flexible manufacturing systems, SIAM J. Control Optim., 31 (1993), 1183–1204.
[5] Ghosh M. K.; Arapostathis A.; Marcus S. I., Ergodic control of switching diffusions, SIAM J. Control Optim., 35 (1997), 1962–1988.
[6] Hernández-Lerma O.; Vega-Amaya O.; Carrasco G., Sample-path optimality and variance-minimization of average cost Markov control processes, SIAM J. Control Optim., 38 (1999), 79–93.
[7] Jasso-Fuentes H.; Hernández-Lerma O., Optimal ergodic control of Markov diffusion processes with minimum variance, Stochastics, electronic version.
[8] Klebaner F. C., Introduction to Stochastic Calculus with Applications, second edition, Imperial College Press, London, 2005.
[9] Mandl P., On the variance in controlled Markov chains, Kybernetika (Prague), 7 (1971), 1–12.
[10] Mandl P., An application of Ito's formula to stochastic control systems, Lecture Notes in Mathematics, 294 (1972), 8–13.
[11] Mandl P., A connection between controlled Markov chains and martingales, Kybernetika (Prague), 9 (1973), 237–241.
[12] Mao X.; Yuan C., Stochastic Differential Equations with Markovian Switching, Imperial College Press, London, 2006.
[13] Prieto-Rumeau T.; Hernández-Lerma O., Variance minimization and overtaking optimality approach to continuous-time controlled Markov chains, Math. Meth. Oper. Res., 70 (2009), 527–540.
[14] Yin G.; Zhu C., On the notion of weak stability and related issues of hybrid diffusion systems, Nonlinear Analysis: Hybrid Systems, 1 (2007), 173–187.
[15] Zhu C.; Yin G., Asymptotic properties of hybrid diffusion systems, SIAM J. Control Optim., 46 (2007), 1155–1179.
[16] Yin G.; Zhu C., Hybrid Switching Diffusions: Properties and Applications, Springer, New York, 2010.


Morfismos is printed in the reproduction workshop of the Department of Mathematics of Cinvestav, located at Avenida Instituto Politécnico Nacional 2508, Colonia San Pedro Zacatenco, C.P. 07360, México, D.F. This issue was printed in February 2013. The print run, on imported 36-kilogram opaline paper of 34 × 25.5 cm, consists of 50 copies with a tintoretto green cover.

Technical support: Omar Hernández Orozco.


Contents - Contenido

Procesos markovianos en la toma de decisiones: contribuciones de Onésimo Hernández-Lerma
Francisco Venegas-Martínez .......... 1

Homotopy theory of non-orientable mapping class groups
Miguel A. Maldonado .......... 29

Variance optimality for controlled Markov-modulated diffusions
Beatris Adriana Escobedo-Trujillo and Carlos Octavio Rivera-Blanco .......... 51

