VOLUME 10 NUMBER 1 JANUARY TO JUNE 2006 ISSN: 1870-6525
Morfismos, Student Communications, Department of Mathematics, Cinvestav
Managing Editors • Isidoro Gitler • Jesús González
Editorial Board • Luis Carrera • Samuel Gitler • Onésimo Hernández-Lerma • Héctor Jasso Fuentes • Miguel Maldonado • Raúl Quiroga Barranco • Enrique Ramírez de Arellano • Enrique Reyes • Armando Sánchez • Martín Solis • Leticia Zárate
Associate Editors • Ricardo Berlanga • Emilio Lluis Puebla • Isaías López • Guillermo Pastor • Víctor Pérez Abreu • Carlos Prieto • Carlos Rentería • Luis Verde
Technical Secretaries • Roxana Martínez • Laura Valencia. ISSN: 1870-6525. Morfismos can be consulted electronically under "Revista Morfismos" at http://www.math.cinvestav.mx. For further information, call 50 61 38 71. All correspondence should be addressed to Mrs. Laura Valencia, Departamento de Matemáticas del Cinvestav, Apartado Postal 14-740, México, D.F. 07000, or sent by e-mail to laura@math.cinvestav.mx.
Information for Authors The Editorial Board of Morfismos, Student Communications of the Department of Mathematics of CINVESTAV, invites undergraduate and graduate students to submit articles for publication in this journal under the following guidelines: • All articles will be sent to specialists for refereeing. Nevertheless, articles will be regarded only as preliminary versions, and may therefore later be published in other specialized journals. • Along with the author's name, his/her academic status and the institution where he/she studies or works must be included. • The article must begin with an abstract stating briefly and concisely the main result to be communicated. • Articles should preferably be written in LaTeX and submitted electronically. Interested authors can obtain the LaTeX 2ε format used by Morfismos under "Revista Morfismos" at the web address http://www.math.cinvestav.mx, or directly from the Department of Mathematics of CINVESTAV. Using this format will help speed up publication of the article. • If the article contains illustrations or figures, these must be prepared so as to match the reproduction quality of Morfismos. • Authors will receive a total of 15 offprints of each published article.
• Articles should be addressed to Mrs. Laura Valencia, Departamento de Matemáticas del Cinvestav, Apartado Postal 14-740, México, D.F. 07000, or to the e-mail address laura@math.cinvestav.mx
Author Information Morfismos, the student journal of the Mathematics Department of the Cinvestav, invites undergraduate and graduate students to submit manuscripts to be published under the following guidelines: • All manuscripts will be refereed by specialists. However, accepted papers will be considered to be “preliminary versions” in that authors may republish their papers in other journals, in the same or similar form. • In addition to his/her affiliation, the author must state his/her academic status (student, professor,...). • Each manuscript should begin with an abstract summarizing the main results.
• Morfismos encourages electronically submitted manuscripts prepared in Latex. Authors may retrieve the LATEX 2ε macros used for Morfismos through the web site http://www.math.cinvestav.mx, at “Revista Morfismos”, or by direct request to the Mathematics Department of Cinvestav. The use of these macros will help in the production process and also to minimize publishing costs. • All illustrations must be of professional quality.
• 15 offprints of each article will be provided free of charge.
• Manuscripts submitted for publication should be sent to Mrs. Laura Valencia, Departamento de Matem´ aticas del Cinvestav, Apartado Postal 14 - 740, M´ exico, D.F. 07000, or to the e-mail address: laura@math.cinvestav.mx
Editorial Guidelines "Morfismos" is the semiannual journal of the students of the Department of Mathematics of CINVESTAV. One of its main objectives is for students to gain experience in writing mathematical results. Publication is not restricted to CINVESTAV students; we also wish to encourage the participation of students in Mexico and abroad, as well as invited contributions by researchers. Mathematics research reports or summaries of bachelor's, master's or doctoral theses may be published in Morfismos. Published articles will be original, either in their results or in their methods. To judge this, the Editorial Board will appoint referees of recognized standing, with experience in the clear communication of mathematical ideas and concepts. Although Morfismos is a refereed journal, papers will be regarded as preliminary versions that may later appear in other specialized journals. If you have any suggestions about the journal, let the Editors know and we will gladly study the possibility of implementing them. We hope that this publication will foster, as a first experience, the development of a correct style of writing mathematics.
Morfismos
Editorial Guidelines “Morfismos” is the journal of the students of the Mathematics Department of CINVESTAV. One of its main objectives is for students to acquire experience in writing mathematics. Morfismos appears twice a year. Publication of papers is not restricted to students of CINVESTAV; we want to encourage students in Mexico and abroad to submit papers. Mathematics research reports or summaries of bachelor, master and Ph.D. theses will be considered for publication, as well as invited contributed papers by researchers. Papers submitted should be original, either in the results or in the methods. The Editors will assign as referees well–established mathematicians. Even though Morfismos is a refereed journal, the papers will be considered as preliminary versions which could later appear in other mathematical journals. If you have any suggestions about the journal, let the Editors know and we will gladly study the possibility of implementing them. We expect this journal to foster, as a preliminary experience, the development of a correct style of writing mathematics.
Contents
A unified approach to continuous-time discounted Markov control processes
Tomás Prieto-Rumeau and Onésimo Hernández-Lerma . . . . . . . . . . . . . . . . . . . 1
On bounds for the stability number of graphs Isidoro Gitler and Carlos E. Valencia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Morfismos, Vol. 10, No. 1, 2006, pp. 1–40
A unified approach to continuous-time discounted Markov control processes*
Tomás Prieto-Rumeau
Onésimo Hernández-Lerma
Abstract In this paper we consider continuous-time Markov control processes with Borel state and action spaces. The performance criterion is the expected discounted reward over a finite or an infinite time horizon. The reward rates and the transition rates of the system are allowed to be unbounded. We propose conditions ensuring that the optimal discounted rewards of both the finite and the infinite horizon problems satisfy a dynamic programming optimality equation, and we also prove the existence of ε-optimal and optimal strategies. Finite horizon approximations to the infinite horizon problem are discussed. We illustrate our results by showing that our hypotheses are satisfied by some classes of controlled Markov chains and controlled diffusions.
2000 Mathematics Subject Classification: 93E20, 60J25, 90C40. Keywords and phrases: continuous-time Markov control processes, discounted optimality criterion, value iteration.
1 Introduction
This paper is concerned with continuous-time Markov control processes with values in a Borel space, and with Borel action spaces. The criterion to be maximized is the expected discounted reward over a time horizon, which may be either finite or infinite. The transition rates of the system and the reward rates may both be unbounded. Many particular cases of such continuous-time Markov control processes have been extensively studied in the previous literature. We may
*Invited Article. Research partially supported by CONACyT Grant 45693-F.
mention, for instance, controlled Markov chains [13], controlled jump processes [12, 22], piecewise deterministic controlled processes [1, 5] or controlled diffusions [10, 11, 18], among others. The usual approach to deal with such problems is by means of the dynamic programming equation or, in other words, by deriving a so-called optimality equation (or Hamilton-Jacobi-Bellman equation) that characterizes the optimal expected reward of the control problem. Then, from this optimality equation one can obtain the existence of “nearly” optimal strategies or even the existence of optimal strategies. However, in most of the existing works, the optimality equation is derived using a technique that is specific to the model under study. For instance, for controlled Markov chains, an explicit expression for the transition probabilities is obtained in [13]; for piecewise deterministic processes, a discrete-time embedded Markov process (determined by the jump times) is studied in [1]; for controlled diffusions, the properties of the associated second order differential operator are exploited (see [18]). Only a few references [6, 14, 15] analyze the discounted optimality equation for general Markov processes. Moreover, in [14, 15] only sufficient conditions are given. More precisely, it is shown that if a given function satisfies a certain optimality equation, then this function is the optimal expected reward. But no (necessary) results proving that the optimal expected discounted reward satisfies the optimality equation are given. For the infinite horizon case, Doshi [6] proposes a set of such necessary conditions for a model with bounded transition rates (that is, the generator of the process is defined only for bounded functions) and bounded reward rates. In this paper we place ourselves in the framework of general Markov control processes, and one of our goals is to generalize the results in [6] to a model with both unbounded transition and reward rates. 
To this end, we impose some Lyapunov or drift conditions on the generator of the process yielding useful ergodic properties. Note also that Doshi [6] proposes a policy improvement algorithm whose convergence is proved under a very severe condition, namely that the action space is finite. In this paper, rather than dealing with policy iteration, we study the value iteration algorithm, that is, we study finite horizon approximations to the infinite horizon control problem, and prove its convergence under hypotheses less restrictive than Doshi’s. Therefore, in short, one of the main contributions of this paper is to propose fairly general assumptions that cover all of the previously studied models, and so its applicability ranges from models of the
continuous-state type (e.g., diffusions) to models of the discontinuous type (e.g., Markov chains). Indeed, we illustrate our results by showing that our assumptions are implied by the standard ones imposed on controlled diffusion processes and on controlled Markov chains. To give a flavor of the general assumptions we make, let us mention that they are, roughly speaking, continuity and stability assumptions. The starting point is the well-known fact that a strategy that is optimal over a time horizon, say [0, m], is necessarily optimal over all time horizons I ⊆ [0, m]. (Note that this is a particular feature of the discounted reward criterion but not, for example, of the average reward criterion. For a general approach to average reward Markov control processes, the interested reader is referred to [7].) It is worth mentioning that other approaches to continuous-time Markov control processes have been proposed in the literature. These include time-discretization procedures, in which the continuous-time model is seen as the limit of discrete-time models [1, 2]. The continuous-time optimality equation is then derived from the discrete-time optimality equations. In this paper, we have preferred to follow a purely continuous-time approach. A linear programming formulation has also been proposed; see [16] for discrete-time models and [4, 19, 25] for their continuous-time counterpart. The idea is to see the optimal expected reward as the maximum of a linear programming problem over a suitably defined space of occupation measures. Optimal controls are then characterized as those attaining this maximum. Our approach is essentially different because the linear programming formulation does not yield an optimality equation, as ours does. Besides, optimal controls are defined in a weaker sense. More precisely, they are optimal controls for almost every initial state (see [25, Section 5]), whereas the approach in this paper yields optimality for every initial state.
When dealing with general Markov processes, a key issue is that of the formal definition of these Markov processes. They may be defined starting from the transition probability function and then deriving the Kolmogorov differential equations [9, 13]. They are also frequently defined using the so-called generator of the process, which is a sort of derivative of a semi-group operator [6, 14, 15]. In this paper, we have chosen the approach of the martingale characterization of the generator of the Markov process [19, 24]. Further details are given in Remark 2.1 in the next section. The rest of the paper is organized as follows. In Section 2 we introduce the continuous-time control model and give some basic definitions. Section 3 studies the finite horizon discounted problem. In Section 4 we analyze the infinite horizon problem and the value iteration approximations. Also in this section we compare our hypotheses to those in [6]. For clarity of exposition, the proofs of the results stated in Sections 3 and 4 are postponed to a later section. In Section 5 we illustrate our results by studying two particular cases: finite horizon controlled diffusions and infinite horizon controlled Markov chains. The rest of the material is technical. Section 6 proves several useful lemmas. In Section 7 we prove the results stated in Sections 3 and 4. Finally, in Section 8 we provide the proofs of the results in the examples section, Section 5.
2 The control model
In this section we briefly introduce the control model. Additional specifications are introduced in later sections, as needed. The state space is X and the action space is A. They are both assumed to be Borel spaces endowed with their respective Borel σ-algebras B(X) and B(A). For every x ∈ X, the set of admissible actions at x is the nonempty σ-compact Borel set A(x) ⊆ A. Let K := {(x, a) ∈ X × A : a ∈ A(x)}, assumed to be a measurable subset of the Borel space X × A. We also consider a measurable space (Ω, F).
Multifunctions. We now recall some terminology on multifunctions and measurable selectors (see [3], [16, Appendix D] or [23]). A set-valued function Ψ : X → 2^A, where Ψ(x) ≠ ∅ for every x ∈ X, is called a multifunction from X to A. Let Ψ⁻(B) := {x ∈ X : Ψ(x) ∩ B ≠ ∅} for B ⊆ A. If Ψ⁻(B) is closed for every closed set B ⊆ A, then Ψ is said to be upper semicontinuous. If Ψ⁻(B) is open for every open set B ⊆ A, then Ψ is said to be lower semicontinuous. The multifunction Ψ is continuous if it is both upper and lower semicontinuous. Finally, we say that Ψ is compact-valued if Ψ(x) is compact for every x ∈ X. A measurable function f : X → A such that f(x) ∈ Ψ(x) for each x ∈ X is called a measurable selector of the multifunction Ψ.
Strategies. Loosely speaking, a Markov strategy ϕ prescribes the action ϕ(t, x) ∈ A(x) to be chosen when the state of the system at time t ≥ 0 is x ∈ X. More precisely, a Markov strategy ϕ is a measurable function ϕ : [0, ∞) × X → A, where ϕ(t, x) ∈ A(x) for every t ≥ 0
and x ∈ X, such that for every s ≥ 0 and every initial state x ∈ X at time s there exists a probability measure on (Ω, F), denoted Pϕ,s,x, with corresponding expectation operator Eϕ,s,x, and a right-continuous Markov process {x(t, ϕ)}_{t≥s}, with x(s, ϕ) = x, satisfying that

v(x(t, ϕ)) − v(x) − ∫_s^t (L^{ϕ(u,x(u,ϕ))} v)(x(u, ϕ)) du,   for t ≥ s,
is a Pϕ,s,x-martingale for every measurable v ∈ Dϕ, where the operator L ≡ {(L^a v)(x)}_{(x,a)∈K} is the generator of the stochastic control process, and Dϕ is the domain of the generator of the process x(·, ϕ). This is the martingale characterization of the generator [19, 24]. If s = 0 then Pϕ,s,x and Eϕ,s,x will be denoted by Pϕ,x and Eϕ,x, respectively. The family of Markov strategies is denoted by Φ. If ϕ is a Markov strategy such that ϕ(t, x) = f(x) for some measurable function f : X → A and all t ≥ 0 and x ∈ X, then we say that ϕ is a stationary strategy. In this case we also identify ϕ with the time-independent function f. The set of stationary strategies is denoted by F. Observe that ϕ ∈ Φ is a measurable selector for the multifunction from [0, ∞) × X to A defined by (s, x) ↦ A(x), and, similarly, f ∈ F is a measurable selector for the multifunction from X to A defined by x ↦ A(x).
Remark 2.1 The above definition of the Markov control process may seem vague: indeed, the existence of the corresponding probability measure Pϕ,s,x is assumed yet not proved. Note, however, that in the papers [4, 19, 24], whose authors also propose the martingale characterization as the definition of the Markov process, the existence of a stationary Markov control process is established. Proving the existence of a nonstationary Markov control process is beyond the scope of this paper. In spite of this, for a given model, the existence of such nonstationary controls can be proved by imposing some specific assumptions. As an illustration, this is the case for controlled diffusions [18], where some regularity assumptions on the parameters of the stochastic differential equations are made, and for controlled Markov chains [9, 13], where continuity of the transition rates is assumed.
Sometimes we will also need to use the so-called extended generator of the process x(·, ϕ), where ϕ ∈ Φ.
Whereas the generator of the process is defined for functions with domain X, the extended generator is defined for "time-space" functions. Given a measurable function v : [0, ∞) × X → R, we say that v belongs to the domain of the extended generator of the process x(·, ϕ), denoted by v ∈ Dϕ, if

(1)   v(t, x(t, ϕ)) − v(s, x) − ∫_s^t (L^{ϕ(u,x(u,ϕ))} v)(u, x(u, ϕ)) du
for t ≥ s is a Pϕ,s,x-martingale for every s ≥ 0 and every initial state x ∈ X at time s. Observe that we use the notation L for both the generator and the extended generator.
Suppose that the reward rate is the Borel measurable function r : K → R and that a discount factor α > 0 is given. In this paper we shall deal with both finite and infinite horizon problems.
Finite horizon problems. Consider a finite horizon control problem, where the time horizon is [0, m], for m > 0. Given ϕ ∈ Φ, 0 ≤ s ≤ m and an initial state x ∈ X at time s, define the expected discounted reward of ϕ on [s, m] as

(2)   V^m(ϕ, s, x) := Eϕ,s,x ∫_s^m e^{−α(t−s)} r(x(t, ϕ), ϕ(t, x(t, ϕ))) dt.

The optimal reward of the control problem over the time interval [s, m] for a given initial state x ∈ X at time s is then defined as

(3)   V^{*m}(s, x) := sup_{ϕ∈Φ} V^m(ϕ, s, x),

assumed to be measurable (this measurability assumption is not restrictive in practice). We say that ϕ ∈ Φ is optimal for the control problem with finite horizon [0, m] if V^m(ϕ, s, x) = V^{*m}(s, x) for every x ∈ X and 0 ≤ s ≤ m, and that ϕ ∈ Φ is ε-optimal, where ε > 0, for the control problem with finite horizon [0, m] if V^m(ϕ, s, x) + ε ≥ V^{*m}(s, x) for every x ∈ X and 0 ≤ s ≤ m. Clearly, when dealing with finite horizon problems, the extended generator needs to be defined only for functions v : [0, m] × X → R, and then the martingale property (1) is only required for s ≤ t ≤ m.
Infinite horizon problems. For infinite horizon problems, given a Markov strategy ϕ ∈ Φ, define the (infinite horizon) expected discounted reward of ϕ when the initial state at time s ≥ 0 is x ∈ X as

(4)   V(ϕ, s, x) := Eϕ,s,x ∫_s^∞ e^{−α(t−s)} r(x(t, ϕ), ϕ(t, x(t, ϕ))) dt.
If s = 0 we simply write V(ϕ, 0, x) =: V(ϕ, x). The optimal expected discounted reward is given by

(5)   V^*(s, x) := sup_{ϕ∈Φ} V(ϕ, s, x),

which for s = 0 reduces to

(6)   V^*(x) := sup_{ϕ∈Φ} V(ϕ, x),

assumed to be measurable. In fact, it will be shown in Lemma 6.1 below that V^*(s, x) = V^*(x) for every x ∈ X and s ≥ 0. A Markov strategy ϕ ∈ Φ is optimal for the infinite horizon problem if V(ϕ, x) = V^*(x) for every x ∈ X, and it is said to be ε-optimal, for ε > 0, if V(ϕ, x) + ε ≥ V^*(x) for every x ∈ X. Our assumptions will ensure that the expressions defined above, i.e., (2)–(6), are well defined and finite.
Consider a given measurable function W : X → [1, ∞) such that the level sets {x ∈ X : W(x) ≤ C} have compact closure for every C ≥ 1. Such a function is called a Lyapunov function; it is also known as a moment or norm-like function. Let B_W be the Banach space of measurable functions v : X → R with norm

||v||_W := sup_{x∈X} |v(x)|/W(x) < ∞.
We will make the following assumption on the control model.
Assumption A. (a) The function W is in Dϕ for every ϕ ∈ Φ, and there exist constants 0 < c < α and b ≥ 0 such that

(L^a W)(x) ≤ cW(x) + b   for every (x, a) ∈ K.

(b) There exists a constant M > 0 such that

|r(x, a)| ≤ M W(x)   for each (x, a) ∈ K.
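To make the drift condition concrete, here is a hypothetical verification sketch for a one-dimensional controlled diffusion, using the standard diffusion generator (L^a v)(x) = b(x, a)v′(x) + ½σ²(x, a)v″(x) (the kind of model treated in Section 5); the linear-growth constant K below is an illustrative assumption, not a hypothesis of the paper.

```latex
% Hypothetical check of Assumption A for a one-dimensional controlled
% diffusion with the standard generator
%   (L^a v)(x) = b(x,a) v'(x) + (1/2) \sigma^2(x,a) v''(x),
% under the illustrative linear-growth bounds
%   |b(x,a)| <= K(1+|x|),   \sigma^2(x,a) <= K(1+x^2).
% Taking W(x) = 1 + x^2 (a Lyapunov function: W >= 1, compact level sets),
\[
  (L^a W)(x) = 2x\,b(x,a) + \sigma^2(x,a)
  \;\le\; 2K|x|(1+|x|) + K(1+x^2)
  \;\le\; 4K\,(1+x^2) = 4K\,W(x),
\]
% so A(a) holds with c = 4K and b = 0, provided the discount rate
% satisfies \alpha > 4K; A(b) then asks |r(x,a)| <= M(1+x^2), i.e.,
% at most quadratic growth of the reward rate.
```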
Assumption A(a) imposes a so-called Lyapunov or drift condition on the generator. A similar assumption is made in the papers [12, 13, 20], among others. This is a standard assumption when dealing with unbounded cost or reward rates because it imposes a growth condition on the reward rate; see Lemma 6.2 below and also, for instance, [17, Chapter 8] or [12, 13]. It is worth noting that the positivity of the constant c in Assumption A(a) is not strictly necessary. Indeed, if c < 0 then the bound obtained in Lemma 6.2 below should be modified accordingly (see [13, Theorem 3.1]) and our results would still be valid.
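The way the drift condition controls the discounted reward can be sketched as follows (a standard Gronwall-type argument; the precise statement and constants of Lemma 6.2 below may differ).

```latex
% Dynkin's formula (the martingale property applied to W) together with
% (L^a W)(x) <= c W(x) + b gives, via Gronwall's inequality,
\[
  E_{\phi,x}\, W(x(t,\phi)) \;\le\; e^{ct}\, W(x) + \frac{b}{c}\,(e^{ct}-1),
  \qquad t \ge 0,\ \phi \in \Phi .
\]
% Combined with |r(x,a)| <= M W(x) and c < \alpha, this yields
\[
  |V(\phi,x)| \;\le\; M \int_0^\infty e^{-\alpha t}\,
      E_{\phi,x} W(x(t,\phi))\,dt
  \;\le\; \frac{M\, W(x)}{\alpha-c} + \frac{M b}{\alpha(\alpha-c)} ,
\]
% so the rewards in (2)-(6) are well defined and finite, and V(\phi,\cdot)
% belongs to the space B_W.
```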
3 The finite horizon case
In this section we study the optimality equation for finite horizon problems, as well as the existence of ε-optimal or optimal strategies. First, we introduce some notation. Fix x ∈ X, a neighborhood N of x, and s ≥ 0. Given ϕ ∈ Φ, consider the Markov process {x(t, ϕ)}_{t≥s} with initial state x at time s. Define τ(x, N, ϕ, s) as the exit time of the process x(·, ϕ) from N, that is, τ(x, N, ϕ, s) := inf{t ≥ s : x(t, ϕ) ∉ N}. If s = 0 we will simply write

(7)   τ(x, N, ϕ) := τ(x, N, ϕ, 0).
Our next assumption imposes some continuity and stability conditions. Parts (a)–(c) in Assumption B below are minimal continuity requirements, whereas part (d) excludes "instantaneous jumps" of the state process; in other words, for each initial state x at time s, the state process x(·, ϕ) remains in a neighborhood N of x for a positive time period, uniformly in ϕ ∈ Φ. We will provide in Lemma 6.8 an easily verifiable sufficient condition for Assumption B(d).
Assumption B. (a) For each m > 0 the function (s, x) ↦ V^{*m}(s, x) is continuous on [0, m] × X and it belongs to Dϕ for every ϕ ∈ Φ.
(b) For each m > 0 the function

(s, x) ↦ sup_{a∈A(x)} {r(x, a) + (L^a V^{*m})(s, x)}
is continuous on [0, m] × X.
(c) The function a ↦ r(x, a) + (L^a V^{*m})(s, x) is continuous on A(x) for every fixed 0 ≤ s ≤ m and x ∈ X.
(d) Given x ∈ X, a neighborhood N of x, s ≥ 0 and δ > 0,

sup_{ϕ∈Φ} Eϕ,s,x [exp{−α min{τ(x, N, ϕ, s) − s, δ}}] < 1.
Concerning Assumption B(d), see Lemma 6.8. The next result is similar to [14, Theorem 5.1] or [15, Theorem 6.1]. Observe, however, that in these references the existence of a solution to the optimality equations (8)–(9) is assumed, whereas in Theorem 3.1 we prove the existence of this solution.
Theorem 3.1 Consider a control problem with finite horizon [0, m]. If Assumptions A and B are verified, then V^{*m} is the unique function v : [0, m] × X → R in ∩_{ϕ∈Φ} Dϕ satisfying Assumption B(c) that is a solution of the equation

(8)   αv(s, x) = sup_{a∈A(x)} {r(x, a) + (L^a v)(s, x)}   for 0 ≤ s < m, x ∈ X,

with terminal condition

(9)   v(m, x) = 0   for x ∈ X.
Moreover, for each given ε > 0, there exist ε-optimal strategies for the finite horizon problem on [0, m].
The following two corollaries propose conditions under which there exist optimal strategies. This is achieved either by requiring that Assumptions A and B hold and that A(x) is compact for every x ∈ X, or by relaxing Assumptions B(b) and B(c) and instead strengthening the hypotheses on the multifunction x ↦ A(x).
Corollary 3.2 Suppose that Assumptions A and B are verified and that the action set A(x) is compact for every x ∈ X. Then the conclusions of Theorem 3.1 remain valid and, moreover, there exist optimal strategies for the finite horizon problem.
Corollary 3.3 Suppose that Assumptions A, B(a), and B(d) are verified and, in addition, the function (s, x, a) ↦ r(x, a) + (L^a V^{*m})(s, x) is continuous on [0, m] × K. Assume further that the multifunction from X to A defined by x ↦ A(x) is continuous and compact-valued. Then the conclusions of Theorem 3.1 remain valid and, moreover, there exist optimal strategies for the finite horizon problem.
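As a numerical illustration of the finite horizon optimality equation (8) with terminal condition (9), the following sketch integrates the equation backward in time for a hypothetical two-state, two-action controlled Markov chain, whose generator has the standard form (L^a v)(x) = λ(x, a)(v(y) − v(x)), where y is the other state (chains of this type are treated in Section 5). All rates, rewards and step sizes below are invented illustration data, not taken from the paper.

```python
# Backward-Euler integration of the finite-horizon optimality equation (8),
#   alpha * v(s,x) = max_a { r(x,a) + (L^a v)(s,x) },   v(m,x) = 0,
# for a hypothetical two-state, two-action continuous-time Markov chain with
# generator (L^a v)(x) = lam(x,a) * (v(1-x) - v(x)).  All data are made up.

ALPHA = 1.0          # discount factor alpha > 0
M_HORIZON = 30.0     # time horizon [0, m]
H = 1e-3             # Euler step in s

LAM = [[0.5, 2.0], [1.0, 3.0]]   # jump rate out of state x under action a
R = [[1.0, 0.6], [2.0, 2.5]]     # reward rate r(x, a)

def hjb_rhs(v):
    """For each state x, compute max_a { r(x,a) + (L^a v)(x) }."""
    return [max(R[x][a] + LAM[x][a] * (v[1 - x] - v[x]) for a in (0, 1))
            for x in (0, 1)]

def solve_finite_horizon(m=M_HORIZON, h=H):
    """Integrate (8) backward from the terminal condition (9), v(m, .) = 0."""
    v = [0.0, 0.0]
    for _ in range(int(round(m / h))):
        rhs = hjb_rhs(v)
        # dv/ds = alpha*v - max_a{...}; one Euler step from s down to s - h
        v = [v[x] + h * (rhs[x] - ALPHA * v[x]) for x in (0, 1)]
    return v            # approximates V*m(0, .)

v0 = solve_finite_horizon()
residual = max(abs(ALPHA * v0[x] - hjb_rhs(v0)[x]) for x in (0, 1))
print(v0, residual)
```

Because the horizon is long relative to the discount rate, the terminal condition is essentially forgotten at s = 0, so v0 nearly satisfies the stationary version of the equation and the printed residual of (8) is negligible.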
4 The infinite horizon problem
In this section we study the infinite horizon control problem: the optimality equation and the existence of ε-optimal or optimal strategies. Actually, we consider two different approaches. First, we consider finite horizon approximations (which can also be described as a successive approximations approach) to the infinite horizon problem; that is, we explore the limiting behavior of the results in the previous section as the time horizon goes to infinity. This approach requires restrictive assumptions (see Assumptions C and D below), but it could be extremely helpful for practical purposes because, as shown in Theorem 4.1, the convergence of V^{*m}(0, ·) to V^* is exponential and, furthermore, given ε > 0, we can explicitly determine a time horizon m such that ||V^{*m}(0, ·) − V^*||_W ≤ ε (see Remark 4.2). In our second approach, under reasonably mild hypotheses (see Assumption E), we obtain the optimality equation and the existence of optimal policies for the infinite horizon problem but, unfortunately, we get no information on the finite horizon approximations. It should be noted that our results in this section prove the existence of a solution to the optimality equation, whereas in previous papers (e.g., [14, Theorem 6.2] or [15, Theorem 4.1]) the existence of such a solution was assumed.
The successive approximations approach. We will require the following assumption. (Recall the notation in (4).)
Assumption C. For every ϕ ∈ Φ, V(ϕ, ·, ·) is in Dϕ and, furthermore, the functions (s, x) ↦ V(ϕ, s, x) and (s, x) ↦ r(x, ϕ(s, x)) + (L^{ϕ(s,x)} V(ϕ, ·, ·))(s, x) are continuous on [0, ∞) × X.
Theorem 4.1 below deals with the value iteration approximations, i.e., the convergence of V^{*m} to V^* as m → ∞. In the literature on discrete-time control problems, it is well known that, under suitable conditions, the value iteration procedure converges geometrically, that is, V^{*m} converges at a geometric rate to V^*. The reason for this is that V^* is the fixed point of a contraction mapping; see, for instance, [17, Theorem 8.3.6]. Theorem 4.1 shows that this result is also true for continuous-time problems.
Theorem 4.1 If Assumptions A, B and C are verified, then, for every T > 0, V^{*m}(s, ·) converges exponentially to V^* uniformly on 0 ≤ s ≤ T in the W-norm as m → ∞; in symbols,

sup_{0≤s≤T} ||V^{*m}(s, ·) − V^*(·)||_W = O(e^{−(α−c)m})   as m → ∞.
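A back-of-the-envelope computation suggests why the rate e^{−(α−c)m} appears (this is only a sketch using the moment bound implied by Assumption A; the actual proof and the explicit constant appear in Section 7).

```latex
% Sketch: for s = 0 and any strategy \phi, V and V^m differ only in the
% reward earned after time m.  Using |r| <= M W (Assumption A(b)) and the
% moment bound  E_{\phi,x} W(x(t,\phi)) <= e^{ct} W(x) + (b/c)(e^{ct}-1)
% that follows from the drift condition A(a):
\[
  |V(\phi,x) - V^m(\phi,0,x)|
  \;\le\; M \int_m^\infty e^{-\alpha t}\, E_{\phi,x} W(x(t,\phi))\,dt
  \;\le\; \Bigl( W(x) + \tfrac{b}{c} \Bigr)\,
          \frac{M\, e^{-(\alpha-c)m}}{\alpha-c} .
\]
% The bound is uniform in \phi, so taking suprema over \Phi and dividing
% by W(x) >= 1 suggests ||V^{*m}(0,\cdot) - V^*||_W = O(e^{-(\alpha-c)m}),
% consistent with Theorem 4.1 (whose proof also handles 0 <= s <= T).
```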
Remark 4.2 In the proof of Theorem 4.1, we will derive an explicit expression for the term O(e^{−(α−c)m}) above depending only on known constants: b, c, α, M and T (see (38)). Therefore, for a given ε > 0, we can effectively determine a value of m for which sup_{0≤s≤T} ||V^{*m}(s, ·) − V^*(·)||_W ≤ ε.
As a consequence of Theorem 4.1, we obtain a sufficient condition for the continuity of V^*, which is sometimes required (see Assumption E(b) below).
Corollary 4.3 Suppose that the state space X is locally compact and that the Lyapunov function W is bounded on compact sets. If Assumptions A, B and C are satisfied, then V^* is continuous on X.
Theorem 4.1, together with the finite horizon optimality equation in Theorem 3.1, i.e.,

αV^{*m}(s, x) = sup_{a∈A(x)} {r(x, a) + (L^a V^{*m})(s, x)}   for 0 ≤ s < m, x ∈ X,

suggests that letting m → ∞ in this equation would yield the infinite horizon optimality equation (10) below with v = V^*. To this end, we must impose the following condition. We consider the family of functions on [0, 1] × X

(s, x) ↦ G_{f,m}(s, x) := (L^{f(x)} V^{*m})(s, x) − (L^{f(x)} V^*)(x),

parametrized by f ∈ F and the time horizon m ∈ N.
Assumption D. (a) The infinite horizon optimal reward V^* is in Dϕ for every ϕ ∈ Φ.
(b) The family of functions {G_{f,m}}_{f∈F, m∈N} is equicontinuous at (0, x) for every x ∈ X.
To prove the convergence of the policy iteration algorithm, a hypothesis similar to the equicontinuity condition in Assumption D(b) is made in [6, Assumption 8].
Theorem 4.4 If Assumptions A, B, C and D are satisfied, then V^* is the unique function v in (∩_{ϕ∈Φ} Dϕ) ∩ B_W satisfying Assumption B(c) that is a solution of the optimality equation

(10)   αv(x) = sup_{a∈A(x)} {r(x, a) + (L^a v)(x)}   for x ∈ X.
Moreover, for every ε > 0 there exist ε-optimal stationary strategies.
In connection with this theorem, see Corollary 4.6.
Direct approach to the infinite horizon problem. So far we have analyzed the successive approximations approach to the infinite horizon problem. Now we drop Assumptions C and D and, instead, propose a less restrictive assumption (Assumption E below) allowing us to derive the infinite horizon optimality equation directly, but we do not obtain information on the finite horizon approximations. (In Assumption E(d), recall the notation in (7).)
Assumption E. (a) The function V^* is in ∩_{ϕ∈Φ} Dϕ.
(b) The functions V^* and x ↦ sup_{a∈A(x)} {r(x, a) + (L^a V^*)(x)} are continuous on X.
(c) For each x ∈ X, a ↦ r(x, a) + (L^a V^*)(x) is continuous on A(x).
(d) Given x ∈ X and a neighborhood N of x,

sup_{ϕ∈Φ} Eϕ,x [e^{−ατ(x,N,ϕ)}] < 1.
Observe that Assumption B(d) (and hence the condition in Lemma 6.8) implies Assumption E(d). The following theorem gives the same conclusions as Theorem 4.4 although, for clarity of exposition, we prefer to state it separately.
Theorem 4.5 Suppose that Assumptions A and E hold. Then V^* is the unique function v in (∩_{ϕ∈Φ} Dϕ) ∩ B_W satisfying Assumption E(c) that is a solution of the optimality equation

αv(x) = sup_{a∈A(x)} {r(x, a) + (L^a v)(x)}   for x ∈ X.
Moreover, for every ε > 0 there exist ε-optimal stationary strategies.
The following corollaries are similar to Corollaries 3.2 and 3.3 on the finite-horizon problem.
Corollary 4.6 Suppose that either Assumptions A, B, C and D, or Assumptions A and E, are verified. Suppose also that A(x) is compact for every x ∈ X. Then the conclusions of Theorems 4.4 and 4.5 remain valid and, moreover, there exist optimal stationary strategies.
Corollary 4.7 Suppose that Assumptions A, E(a), and E(d) are verified and that the functions x ↦ V^*(x) and (x, a) ↦ r(x, a) + (L^a V^*)(x) are continuous on X and K, respectively. Assume also that the multifunction from X to A defined by x ↦ A(x) is continuous and compact-valued. Then the conclusions of Theorem 4.5 remain valid and, furthermore, there exist optimal stationary strategies.
To conclude this section, let us compare our assumptions to those in Doshi [6]. The proof techniques are essentially the same, combining continuity and stability features. There are, however, two important improvements. The first is that we allow the reward rates and the transition rates (i.e., the generator) to be unbounded; see [6, Definition 2.4] and [6, Assumption 1(d)]. Second, our stability condition (for instance, Assumption E(d)) is, by far, more general than the corresponding stability condition stated in Assumption 5 in [6, p. 1226]. Indeed, as shown in Lemma 6.8 below, a sufficient condition for Assumption E(d)
14
Tom´ as Prieto-Rumeau and On´esimo Hern´andez Lerma
is that, given x ∈ X, a neighborhood N of x and s ≥ 0, there exists δ0 > 0 such that
$$\inf_{\varphi\in\Phi} P_{\varphi,s,x}\{\tau(x,N,\varphi,s) \ge s+\delta_0\} > 0, \tag{11}$$
whereas [6, Assumption 5] would require the existence of δ0 > 0 with
$$\inf_{\varphi\in\Phi} P_{\varphi,s,x}\{\tau(x,N,\varphi,s) \ge s+\delta_0\}$$
arbitrarily close to one, which is of course much more restrictive than (11). The reason is that to prove a result such as Theorem 3.1, for instance, Doshi considers deterministic intervals, say [s, s+δ0], while in our proofs we deal with random intervals [s, τ(x,N,ϕ,s)].
5 Examples
Finite horizon controlled diffusions. The topic of controlled diffusions over finite intervals has been extensively studied by many authors, in particular by Krylov [18]. We are going to show that his hypotheses on the control model imply our assumptions and, therefore, that we can use the results in Section 3 to prove the existence of a solution to the optimality equation and the existence of optimal controls. For expositional ease, we analyze one-dimensional controlled diffusions, though the results may be generalized to the multidimensional case. Suppose that the state space is R and that the set of admissible controls is a compact Borel space A. We fix a time horizon [0,T], where T > 0. Let $\{W_t, \mathcal{F}_t\}_{t\ge 0}$ be a one-dimensional Brownian motion. We consider real-valued functions b(t,x,a) and σ(t,x,a), defined for 0 ≤ t ≤ T, x ∈ R and a ∈ A. Suppose that the state of the system is x at time t ∈ [0,T) and that we use the control a ∈ A. Then the infinitesimal evolution of the state of the system is determined by
$$dx = b(t,x,a)\,dt + \sigma(t,x,a)\,dW_t.$$
This formula will be given a formal definition in the next paragraph. Define $C^t$ as the family of continuous functions from [0,t] to R. Let $\mathcal{N}_t$, for t ≥ 0, be the minimal σ-algebra that contains the sets of the form $\{x \in C^t : x(s) \in \Gamma\}$ for s ∈ [0,t] and Γ ∈ B, the Borel σ-algebra of R. We say that ϕ := {ϕ(t,·)}_{0≤t≤T} is an admissible strategy
if $\varphi(t,\cdot) : C^t \to A$ is $\mathcal{N}_t$-measurable for all 0 ≤ t ≤ T and, moreover, for every s ∈ [0,T] and x ∈ R, there exists a solution of
$$x(t) = x + \int_0^t b(s+u, x(u), \varphi(u,x))\,du + \int_0^t \sigma(s+u, x(u), \varphi(u,x))\,dW_u$$
for 0 ≤ t ≤ T − s, which is adapted to $\{\mathcal{F}_t\}_{t\ge 0}$. Then we define x(t,ϕ) := x(t−s) for s ≤ t ≤ T, which has initial value x(s,ϕ) = x. Assumption CD below ensures the existence of such solutions. The probability measure $P_{\varphi,s,x}$ and the expectation operator $E_{\varphi,s,x}$ corresponding to {x(t,ϕ)}_{s≤t≤T} are given the same definitions as in Section 2. It follows that history-dependent strategies are also admissible. We denote by $\overline\Phi$ the set of admissible strategies, which contains Φ, the family of Markov strategies. We define $V^T(\varphi,s,x)$ as in (2), that is,
$$V^T(\varphi,s,x) := E_{\varphi,s,x}\left[\int_s^T e^{-\alpha(t-s)} r(x(t,\varphi), \varphi(t,x(t,\varphi)))\,dt\right] \tag{12}$$
(recall that r is the reward rate) and (cf. (3))
$$V^{*T}(s,x) := \sup_{\varphi\in\overline\Phi} V^T(\varphi,s,x) \tag{13}$$
for s ∈ [0,T] and x ∈ R. We impose the following Assumption CD (where CD stands for "controlled diffusions"), which is drawn from [18].

Assumption CD. (a) The functions b(t,x,a), σ(t,x,a) and r(x,a) are continuous with respect to (t,x,a) ∈ [0,T] × R × A, uniformly in a ∈ A. (b) There exist nonnegative constants K and p such that
$$|\sigma(t,x,a) - \sigma(t,y,a)| + |b(t,x,a) - b(t,y,a)| \le K|x-y|,$$
$$|\sigma(t,x,a)| + |b(t,x,a)| \le K(1+|x|) \quad \text{and} \quad |r(x,a)| \le K(1+|x|)^p$$
for all 0 ≤ t ≤ T, a ∈ A and x, y ∈ R.
(c) The functions b, σ and r have derivatives with respect to t ∈ [0,T] and second derivatives with respect to x ∈ R, for each a ∈ A. Assume also that these derivatives are continuous on [0,T] × R and that they are bounded by $K(1+|x|)^p$.
(d) For each R > 0 there exists some $\delta_R > 0$ such that $|\sigma(t,x,a)| \ge \delta_R$ for t ∈ [0,T], a ∈ A and |x| ≤ R.

These assumptions are quite standard in the theory of stochastic differential equations. By $V_t$, $V_x$ and $V_{xx}$ we denote the derivative with respect to t ∈ [0,T] and the first and second derivatives with respect to x ∈ R, respectively, of a real-valued function V defined on [0,T] × R. If $V_t$ and $V_{xx}$ are continuous on [0,T] × R we write $V \in C^{1,2}([0,T]\times\mathbb{R})$. In Theorem 3.1 we proved that Assumptions A and B imply the existence of a solution to the finite horizon optimality equation. Notice, however, that Assumption A was used only to ensure the finiteness of (12) and (13), which in fact can be derived from Assumption CD; see the footnote in [18, p. 131]. Therefore, it suffices to prove that Assumption CD implies Assumption B.

Theorem 5.1 Suppose that Assumption CD holds and that $V_t^{*T}$ and $V_{xx}^{*T}$ are continuous on [0,T] × R. Then Assumption B is satisfied and, therefore, $V^{*T}$ is the unique solution $v$ in $(\bigcap_{\varphi\in\Phi} D_\varphi) \cap C^{1,2}([0,T]\times\mathbb{R})$ of the equation
$$\alpha v(t,x) = \sup_{a\in A}\left\{r(x,a) + v_t(t,x) + b(t,x,a)\,v_x(t,x) + \tfrac{1}{2}\sigma^2(t,x,a)\,v_{xx}(t,x)\right\}$$
for 0 ≤ t < T, with v(T,x) = 0 for every x ∈ R. Moreover, there exists an optimal Markov strategy.

Observe that [18, Theorem 4.7.7] ensures the existence of $V_t^{*T}$ and $V_{xx}^{*T}$, but their continuity cannot be proved in such a general framework. However, for specific models the continuity of the derivatives can be shown; see, for instance, the one-dimensional controlled diffusion with two boundaries analyzed in [18, Section 1.4]. Furthermore, under Assumption CD, [18, Theorem 4.7.7] shows that $V^{*T}$ is a solution of the optimality equation almost everywhere in [0,T] × R. Also, under Assumption CD, a uniqueness result is proved in [18, Theorem 5.3.14]. In conclusion, we have reached the same results as in [18], but using a general approach.
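The finite horizon payoff (12) can be approximated numerically for a concrete strategy. The following sketch is illustrative only: the coefficients `b`, `sigma`, `r` and the Markov strategy `phi` below are toy choices of ours satisfying the growth and nondegeneracy bounds of Assumption CD, not examples taken from the text. It estimates $V^T(\varphi,s,x)$ by Euler–Maruyama discretization of the state equation combined with Monte Carlo averaging of the discounted reward:

```python
import math
import random

# Toy model data (our own choices, not from the paper); they satisfy the
# Lipschitz/linear-growth bounds of Assumption CD(b) and CD(d).
def b(t, x, a):      # drift: pull the state toward the chosen action
    return a - x

def sigma(t, x, a):  # constant, nondegenerate diffusion coefficient
    return 1.0

def r(x, a):         # running reward, polynomially bounded
    return -(x * x + a * a)

def phi(t, x):       # a Markov strategy: push toward 0 with bounded action
    return max(-1.0, min(1.0, -x))

def value_estimate(s, x0, T, alpha=1.0, dt=1e-3, n_paths=500, seed=0):
    """Monte Carlo / Euler-Maruyama estimate of the discounted reward (12)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        x, t, acc = x0, s, 0.0
        while t < T:
            a = phi(t, x)
            acc += math.exp(-alpha * (t - s)) * r(x, a) * dt
            x += b(t, x, a) * dt + sigma(t, x, a) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            t += dt
        total += acc
    return total / n_paths

v = value_estimate(s=0.0, x0=0.5, T=1.0)
```

Since the running reward above is nonpositive, the estimate is negative; refining `dt` and increasing `n_paths` trades computing time for bias and variance in the usual way.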
Infinite horizon controlled Markov chains. We consider a continuous-time controlled Markov chain with denumerable state space. Our approach is based on the paper [13]. Without loss of generality, we assume that the state space X is the set of nonnegative integers. Define q(y|x,a) as the transition rate from state x ∈ X to state y ∈ X when the action a ∈ A(x) is chosen. Note that we do not assume that the Borel space A(x) is σ-compact. Indeed, this hypothesis was needed only to derive the existence of a measurable selector as in [23, Corollary 4.3]. In our particular case, since X is discrete, any function f : X → A with f(x) ∈ A(x) is a measurable selector. Admissible strategies ϕ are defined as those for which $t \mapsto q_{xy}(t,\varphi) := q(y|x,\varphi(t,x))$ is continuous on [0,∞) for every x, y ∈ X. In [13] randomized strategies are also considered but, in fact, it is easily shown that we can restrict ourselves to deterministic strategies, as we did in Section 2. The following Assumption CMC (where CMC stands for "controlled Markov chains") is taken from [13].

Assumption CMC. (a) The transition rates are conservative and stable, that is,
$$\sum_{y\in X} q(y|x,a) = 0 \quad \text{and} \quad q(x) := \sup_{a\in A(x)}\{-q(x|x,a)\} < \infty$$
for every (x,a) ∈ K. (b) Assumption A (see Section 2) is satisfied. (c) For each x ∈ X, A(x) is compact and the functions r(x,a) (r is the reward rate), q(y|x,a) and $\sum_{y\in X} q(y|x,a)W(y)$ are continuous on A(x) for each x, y ∈ X. (d) There exist a nonnegative function W′ on X and constants c′ > 0, b′ ≥ 0 and M′ > 0 such that
$$\sum_{y\in X} q(y|x,a)W'(y) \le c'W'(x) + b' \quad \text{and} \quad q(x)W(x) \le M'W'(x)$$
for (x,a) ∈ K.
Observe that Assumptions A(1) and A(2) in [13] are omitted in Assumption CMC above because they are implied by the fact that W is a Lyapunov function. It is proved in [13] that Assumptions CMC(a) and CMC(b) imply that $V^*$ is the unique solution in $B_W$ of the infinite horizon optimality equation and that there exist ε-optimal stationary policies. Furthermore, if Assumptions CMC(c) and CMC(d) are satisfied, then there exists an optimal stationary policy. Let us now show that our assumptions in this paper are implied by Assumption CMC.

Theorem 5.2 (i) Suppose that Assumptions CMC(a) and CMC(b) are satisfied. Then Assumptions A and E are verified and, therefore, $V^*$ is the unique solution $v$ in $B_W$ of the equation
$$\alpha v(x) = \sup_{a\in A(x)}\Big\{r(x,a) + \sum_{y\in X} q(y|x,a)\,v(y)\Big\} \quad \text{for } x \in X$$
and, for every ε > 0, there exist ε-optimal stationary strategies. (ii) If, in addition, Assumptions CMC(c)–(d) are satisfied, then the hypotheses of Corollary 4.6 are verified and, therefore, there exist optimal stationary strategies.

This theorem shows that our general approach may be used to derive the result stated in [13, Theorem 3.2] for continuous-time controlled Markov chains.
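On a finite truncation of the state space, the optimality equation of Theorem 5.2 can be solved numerically by uniformization: dividing through by α + Λ, where Λ bounds the exit rates q(x), turns the equation into the fixed point of a contraction with modulus Λ/(α + Λ), so successive approximation converges. The sketch below is purely illustrative, with a toy three-state controlled queue whose rates and rewards are our own choices, not from the paper:

```python
# Minimal numerical sketch (toy data): solve
#     alpha*v(x) = max_a { r(x,a) + sum_y q(y|x,a) v(y) }
# on a truncated birth-death chain by uniformization + successive approximation.

alpha = 0.5            # discount rate
states = [0, 1, 2]     # truncated queue lengths
actions = [0.5, 2.0]   # available service rates (the control)
lam = 1.0              # arrival rate

def q(y, x, a):
    """Conservative transition rates of the controlled birth-death chain."""
    up = lam if x < 2 else 0.0    # arrival (blocked at the truncation level)
    down = a if x > 0 else 0.0    # service completion
    if y == x + 1:
        return up
    if y == x - 1:
        return down
    if y == x:
        return -(up + down)       # rows sum to zero (CMC(a))
    return 0.0

def r(x, a):
    """Holding cost plus service-effort cost, written as a (negative) reward."""
    return -(x + 0.2 * a)

Lam = lam + max(actions)          # uniformization constant, >= q(x) for all x

v = [0.0 for _ in states]
for _ in range(500):              # contraction with modulus Lam/(alpha+Lam)
    v = [max((r(x, a) + sum(q(y, x, a) * v[y] for y in states) + Lam * v[x])
             / (alpha + Lam) for a in actions) for x in states]

# Residual of the optimality equation at the computed fixed point
resid = max(abs(alpha * v[x]
                - max(r(x, a) + sum(q(y, x, a) * v[y] for y in states)
                      for a in actions)) for x in states)
```

At convergence the residual is at the level of floating-point roundoff, and the computed values decrease in the queue length, as one expects for increasing holding costs.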
6 Preliminary results
In this section we present several useful results. Although most of them are well known in the literature on continuous-time control models, we include them here for completeness.

Lemma 6.1 For every x ∈ X and s ≥ 0,
$$V^*(s,x) := \sup_{\varphi\in\Phi} V(\varphi,s,x) = \sup_{\varphi\in\Phi} V(\varphi,x) =: V^*(x).$$

Proof. For an arbitrary ϕ ∈ Φ define $\varphi_{+s} \in \Phi$ as $\varphi_{+s}(t,\cdot) := \varphi(s+t,\cdot)$ for t ≥ 0,
and $\varphi_{-s} \in \Phi$ as
$$\varphi_{-s}(t,\cdot) := \begin{cases}\varphi(0,\cdot) & \text{if } 0 \le t < s,\\[2pt] \varphi(t-s,\cdot) & \text{if } t \ge s.\end{cases}$$
It turns out that $V(\varphi,s,\cdot) = V(\varphi_{+s},\cdot)$ and $V(\varphi,\cdot) = V(\varphi_{-s},s,\cdot)$. Taking the supremum in these equalities yields the stated fact. $\Box$
Our next result, which is stated without proof, is a consequence of [20, Theorem 2.1(iii)]. See also [13, Theorem 3.1].

Lemma 6.2 Suppose that Assumption A(a) is verified and let ϕ ∈ Φ. Then for every s ≥ 0 and x ∈ X,
$$E_{\varphi,s,x} W(x(t,\varphi)) \le e^{c(t-s)} W(x) + \frac{b}{c}\left(e^{c(t-s)} - 1\right) \quad \text{for every } t \ge s.$$

This lemma, together with Assumption A(b), shows that the expressions (2)–(6) are well defined and finite. It also ensures that the functions we will deal with (e.g. V(ϕ,s,·), $V^*$, $V^{*m}(s,\cdot)$, etc.) are in $B_W$. We state without proof the standard "product derivative" property of the generator L (see, e.g., [14, Lemma 2.1]).

Lemma 6.3 Suppose that ϕ ∈ Φ and v : [0,∞) × X → R in $D_\varphi$ are given. For α > 0, define the function $\bar v : [0,\infty)\times X \to \mathbb{R}$ as $\bar v(t,x) := e^{-\alpha t} v(t,x)$. Then $\bar v$ is in $D_\varphi$ and
$$(L^{\varphi(t,x)} \bar v)(t,x) = e^{-\alpha t}(L^{\varphi(t,x)} v)(t,x) - \alpha e^{-\alpha t} v(t,x)$$
for x ∈ X and t ≥ 0.

Our following three lemmas are usually known as verification theorems.

Lemma 6.4 Suppose that Assumption A is verified. Fix a time horizon [0,m], where m > 0. Given ϕ ∈ Φ, if a function v : [0,m] × X → R in $D_\varphi$ satisfies, for all x ∈ X and 0 ≤ t < m,
$$\alpha v(t,x) = r(x,\varphi(t,x)) + (L^{\varphi(t,x)} v)(t,x), \qquad v(m,x) = 0, \tag{14}$$
then
$$v(s,x) = V^m(\varphi,s,x) \quad \text{for all } x \in X \text{ and } s \in [0,m]. \tag{15}$$
If (14) holds with inequality ≥ or ≤, then the equality in (15) is replaced with ≥ or ≤, respectively.

Proof. We fix s ∈ [0,m) and an initial state x ∈ X at time s. By (14) we have, for s ≤ t ≤ m,
$$\alpha v(t, x(t,\varphi)) = r(x(t,\varphi), \varphi(t,x(t,\varphi))) + (L^{\varphi(t,x(t,\varphi))} v)(t, x(t,\varphi)), \tag{16}$$
$$v(m, x(m,\varphi)) = 0. \tag{17}$$
Multiplying (16) by $e^{-\alpha(t-s)}$, integrating over the interval [s,m], and then taking expectation $E_{\varphi,s,x}$ gives
$$E_{\varphi,s,x}\left[\int_s^m \alpha e^{-\alpha(t-s)} v(t,x(t,\varphi))\,dt\right] = V^m(\varphi,s,x) + E_{\varphi,s,x}\left[\int_s^m e^{-\alpha(t-s)} (L^{\varphi(t,x(t,\varphi))} v)(t,x(t,\varphi))\,dt\right]. \tag{18}$$
Applying Lemma 6.3 to $\bar v : (t,x) \mapsto e^{-\alpha t} v(t,x)$, from (18) and using Dynkin's formula, we obtain
$$0 = V^m(\varphi,s,x) + e^{\alpha s}\left[e^{-\alpha m} E_{\varphi,s,x}[v(m,x(m,\varphi))] - e^{-\alpha s} v(s,x)\right],$$
and thus (recall (17)) $v(s,x) = V^m(\varphi,s,x)$. The proof when the equality is replaced with an inequality uses the same technique. $\Box$

In Lemma 6.5 below we state a verification theorem for the infinite horizon discounted payoff of a stationary strategy. The corresponding result for a nonstationary strategy is given in Lemma 6.6.

Lemma 6.5 Suppose that Assumption A is satisfied, and let f ∈ F. The infinite horizon discounted reward V(f,·) is the unique solution $v$ in $D_f \cap B_W$ of the equation
$$\alpha v(x) = r(x,f(x)) + (L^{f(x)} v)(x) \quad \text{for every } x \in X. \tag{19}$$
If (19) holds with inequality ≥ or ≤, then v ≥ V(f,·) or v ≤ V(f,·), respectively.
Proof. Lemma 6.2 implies that the function x ↦ r(x,f(x)) is in the domain of the α-resolvent of the Markov process x(·,f). Indeed,
$$E_{f,x}\left[\int_0^\infty e^{-\alpha t} |r(x(t,f), f(x(t,f)))|\,dt\right] \le M \int_0^\infty e^{-\alpha t} E_{f,x}[W(x(t,f))]\,dt < \infty.$$
As a consequence of [8, Lemma 4.3], it follows that V(f,·) ∈ $D_f$ is a solution of (19). Suppose now that $v \in D_f \cap B_W$ satisfies (19). Then
$$\alpha v(x(t,f)) = r(x(t,f), f(x(t,f))) + (L^{f(x(t,f))} v)(x(t,f)) \quad \text{for all } t \ge 0.$$
Multiplying this expression by $e^{-\alpha t}$, integrating over [0,T], then taking expectation $E_{f,x}$, and using Dynkin's formula as in Lemma 6.4, gives
$$v(x) = E_{f,x}\left[\int_0^T e^{-\alpha t} r(x(t,f), f(x(t,f)))\,dt\right] + e^{-\alpha T} E_{f,x}[v(x(T,f))].$$
By Lemma 6.2 and dominated convergence, it follows that
$$\lim_{T\to\infty} E_{f,x}\left[\int_0^T e^{-\alpha t} r(x(t,f), f(x(t,f)))\,dt\right] = V(f,x).$$
Also by Lemma 6.2, and recalling that v is in $B_W$, $e^{-\alpha T} E_{f,x}[v(x(T,f))]$ tends to zero as T → ∞. It follows that v = V(f,·). The result for the inequalities is derived similarly. $\Box$

Lemma 6.6 Suppose that Assumptions A, B(d) and C are verified. Given ϕ ∈ Φ, s ≥ 0 and x ∈ X, the following holds:
$$\alpha V(\varphi,s,x) = r(x,\varphi(s,x)) + (L^{\varphi(s,x)} V(\varphi,\cdot,\cdot))(s,x). \tag{20}$$
Proof. The proof technique is similar to that of Theorem 3.1 and, therefore, we skip some details. We proceed by contradiction. If the stated result does not hold, then there exist s ≥ 0 and x ∈ X such that for some ε > 0 either
$$\alpha V(\varphi,s,x) + 2\varepsilon \le r(x,\varphi(s,x)) + (L^{\varphi(s,x)} V(\varphi,\cdot,\cdot))(s,x) \tag{21}$$
or
$$\alpha V(\varphi,s,x) - 2\varepsilon \ge r(x,\varphi(s,x)) + (L^{\varphi(s,x)} V(\varphi,\cdot,\cdot))(s,x). \tag{22}$$
Suppose for instance that (21) holds. By the continuity Assumption C, there exist δ > 0 and a neighborhood N of x such that
$$\alpha V(\varphi,t,y) + \varepsilon \le r(y,\varphi(t,y)) + (L^{\varphi(t,y)} V(\varphi,\cdot,\cdot))(t,y) \tag{23}$$
for s ≤ t ≤ s+δ and y ∈ N. Define $\tau_\delta := \min\{\tau(x,N,\varphi,s), s+\delta\}$, and notice that Assumption B(d) implies that $E_{\varphi,s,x}[e^{-\alpha(\tau_\delta-s)}] < 1$. Inequality (23) gives
$$\alpha V(\varphi,t,x(t,\varphi)) + \varepsilon \le r(x(t,\varphi), \varphi(t,x(t,\varphi))) + (L^{\varphi(t,x(t,\varphi))} V(\varphi,\cdot,\cdot))(t,x(t,\varphi))$$
for s ≤ t ≤ $\tau_\delta$. Multiplying this expression by $e^{-\alpha(t-s)}$, integrating over $[s,\tau_\delta]$, taking expectation $E_{\varphi,s,x}$, and then using Dynkin's formula yields
$$\frac{\varepsilon}{\alpha}\left(E_{\varphi,s,x}[e^{-\alpha(\tau_\delta-s)}] - 1\right) \ge 0,$$
which is a contradiction. Similarly, one can prove that (22) cannot hold, and the stated result follows. $\Box$

Observe that, to prove the converse of Lemma 6.6 (cf. Lemma 6.5) using the same technique as in the proof of Lemma 6.5, we would need to impose some boundedness assumptions on the function v(·,x) solving (20). The next result deals with the existence of measurable maximizers for a multifunction. For its proof see, for instance, Theorems 1 and 2 in [3, Section VI.3].

Lemma 6.7 Suppose that the multifunction Ψ from X to A is continuous and compact-valued. Let g : K → R be a continuous function and define the function h as
$$h(x) := \max_{a\in\Psi(x)} g(x,a) \quad \text{for } x \in X.$$
Then there exists a measurable selector f : X → A such that h(x) = g(x,f(x)) for each x ∈ X and, moreover, h is continuous.
Finally, to conclude this section, we propose a more easily verifiable sufficient condition for Assumption B(d).

Lemma 6.8 Suppose that, given x ∈ X, a neighborhood N of x, and s ≥ 0, there exists $\delta_0 > 0$ such that
$$\inf_{\varphi\in\Phi} P_{\varphi,s,x}\{\tau(x,N,\varphi,s) \ge s+\delta_0\} > 0.$$
Then Assumption B(d) is verified.

Proof. Fix x ∈ X, a neighborhood N of x and s ≥ 0. Observe that the function
$$\delta \mapsto \inf_{\varphi\in\Phi} P_{\varphi,s,x}\{\tau(x,N,\varphi,s) \ge s+\delta\}$$
is decreasing, and so our hypothesis implies that
$$\inf_{\varphi\in\Phi} P_{\varphi,s,x}\{\tau(x,N,\varphi,s) \ge s+\delta\} > 0 \quad \text{for all } 0 < \delta \le \delta_0.$$
Now fix ϕ ∈ Φ and, to simplify the notation, let $D(\delta,\varphi) := \{\tau(x,N,\varphi,s) \ge s+\delta\}$. Note that $E_{\varphi,s,x}[\exp\{-\alpha\min\{\tau(x,N,\varphi,s)-s,\delta\}\}]$ equals
$$E_{\varphi,s,x}[I_{D(\delta,\varphi)}\, e^{-\alpha\delta}] + E_{\varphi,s,x}[I_{D^c(\delta,\varphi)} \exp\{-\alpha(\tau(x,N,\varphi,s)-s)\}].$$
As a consequence,
$$E_{\varphi,s,x}[\exp\{-\alpha\min\{\tau(x,N,\varphi,s)-s,\delta\}\}] \le 1 - (1-e^{-\alpha\delta})\, P_{\varphi,s,x}(D(\delta,\varphi)).$$
Hence
$$\sup_{\varphi\in\Phi} E_{\varphi,s,x}[\exp\{-\alpha\min\{\tau(x,N,\varphi,s)-s,\delta\}\}] < 1$$
for all $0 < \delta \le \delta_0$. Observe now that
$$\delta \mapsto \sup_{\varphi\in\Phi} E_{\varphi,s,x}[\exp\{-\alpha\min\{\tau(x,N,\varphi,s)-s,\delta\}\}]$$
is a decreasing function, from which we conclude that Assumption B(d) is satisfied. $\Box$
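The elementary inequality at the heart of this proof, $E[\exp\{-\alpha\min\{\tau-s,\delta\}\}] \le 1 - (1-e^{-\alpha\delta})P\{\tau-s \ge \delta\}$, can be checked numerically. In the sketch below we take τ − s exponentially distributed, an arbitrary choice made purely for illustration (the lemma assumes no particular law), so that both sides have closed forms to compare against a Monte Carlo estimate:

```python
import math
import random

# Check E[exp(-alpha*min(X, delta))] <= 1 - (1 - exp(-alpha*delta)) * P(X >= delta)
# for X := tau - s taken exponential(lam). Parameters are arbitrary toy values.
alpha, lam, delta = 1.0, 2.0, 0.5
rng = random.Random(42)

# Monte Carlo estimate of the left-hand side
n = 100_000
lhs_mc = sum(math.exp(-alpha * min(rng.expovariate(lam), delta))
             for _ in range(n)) / n

# Closed forms available in the exponential case:
#   E[exp(-alpha*min(X,delta))] = lam/(lam+alpha)*(1 - e^{-(lam+alpha)delta})
#                                 + e^{-(lam+alpha)delta}
lhs = lam / (lam + alpha) * (1 - math.exp(-(lam + alpha) * delta)) \
      + math.exp(-(lam + alpha) * delta)
rhs = 1 - (1 - math.exp(-alpha * delta)) * math.exp(-lam * delta)
```

With these values the exact left-hand side is about 0.74 against a bound of about 0.86, and, as in the proof, the bound stays strictly below 1 precisely because $P\{X \ge \delta\} > 0$.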
7 Proof of results in Sections 3 and 4
Proof of Theorem 3.1. It is obvious that $V^{*m}$ satisfies (9). To prove (8) we proceed as follows. (i) Show that
$$\alpha V^{*m}(s,x) \le \sup_{a\in A(x)}\{r(x,a) + (L^a V^{*m})(s,x)\} \tag{24}$$
for 0 ≤ s < m and x ∈ X. (ii) Prove that there exist ε-optimal strategies. (iii) Show that
$$\alpha V^{*m}(s,x) \ge \sup_{a\in A(x)}\{r(x,a) + (L^a V^{*m})(s,x)\}$$
for 0 ≤ s < m and x ∈ X. (iv) Finally, establish that $V^{*m}$ is the unique solution of (8)–(9) that satisfies Assumption B(c).

Proof of (i). We proceed by contradiction. If (24) does not hold, then there exist 0 ≤ s < m and x ∈ X such that
$$\alpha V^{*m}(s,x) > \sup_{a\in A(x)}\{r(x,a) + (L^a V^{*m})(s,x)\}.$$
Our continuity Assumptions B(a) and B(b) ensure that there exist ε > 0, δ ∈ (0, m−s) and a neighborhood N of x such that for every y ∈ N and t ∈ [s, s+δ],
$$\alpha V^{*m}(t,y) - \varepsilon > \sup_{a\in A(y)}\{r(y,a) + (L^a V^{*m})(t,y)\},$$
and thus for every ϕ ∈ Φ
$$\alpha V^{*m}(t,y) - \varepsilon \ge r(y,\varphi(t,y)) + (L^{\varphi(t,y)} V^{*m})(t,y) \tag{25}$$
for s ≤ t ≤ s+δ and y ∈ N. Let
$$\tau_\delta := \min\{\tau(x,N,\varphi,s),\, s+\delta\}, \tag{26}$$
where, for notational simplicity, we omit x, N, ϕ and s.
Hence, for s ≤ t ≤ $\tau_\delta$ the process {x(t,ϕ)} verifies
$$\alpha V^{*m}(t, x(t,\varphi)) - \varepsilon \ge r(x(t,\varphi), \varphi(t,x(t,\varphi))) + (L^{\varphi(t,x(t,\varphi))} V^{*m})(t, x(t,\varphi)).$$
Multiplying this expression by $e^{-\alpha(t-s)}$, integrating over the interval $[s,\tau_\delta]$, and then taking expectation $E_{\varphi,s,x}$ yields
$$E_{\varphi,s,x}\left[\int_s^{\tau_\delta} \alpha e^{-\alpha(t-s)} V^{*m}(t,x(t,\varphi))\,dt\right] - \frac{\varepsilon}{\alpha}\left(1 - E_{\varphi,s,x}[e^{-\alpha(\tau_\delta-s)}]\right) \ge E_{\varphi,s,x}\left[\int_s^{\tau_\delta} e^{-\alpha(t-s)}\big[r(x(t,\varphi), \varphi(t,x(t,\varphi))) + (L^{\varphi(t,x(t,\varphi))} V^{*m})(t,x(t,\varphi))\big]\,dt\right]. \tag{27}$$
Then observe that, by the Markov property,
$$V^m(\varphi,s,x) = E_{\varphi,s,x}\left[\int_s^{\tau_\delta} e^{-\alpha(t-s)} r(x(t,\varphi), \varphi(t,x(t,\varphi)))\,dt\right] + E_{\varphi,s,x}[e^{-\alpha(\tau_\delta-s)} V^m(\varphi, \tau_\delta, x(\tau_\delta,\varphi))], \tag{28}$$
and also that, defining $\bar V^{*m}(t,y) := e^{-\alpha t} V^{*m}(t,y)$ for 0 ≤ t ≤ m and y ∈ X, by Lemma 6.3 we have
$$(L^a \bar V^{*m})(t,y) = e^{-\alpha t}(L^a V^{*m})(t,y) - \alpha e^{-\alpha t} V^{*m}(t,y). \tag{29}$$
Hence, applying Dynkin's formula we obtain
$$E_{\varphi,s,x}[e^{-\alpha\tau_\delta} V^{*m}(\tau_\delta, x(\tau_\delta,\varphi))] - e^{-\alpha s} V^{*m}(s,x) = E_{\varphi,s,x}\left[\int_s^{\tau_\delta} (L^{\varphi(t,x(t,\varphi))} \bar V^{*m})(t, x(t,\varphi))\,dt\right]. \tag{30}$$
Substituting (28), (29) and (30) in (27) gives
$$V^{*m}(s,x) - V^m(\varphi,s,x) \ge \frac{\varepsilon}{\alpha}\left(1 - E_{\varphi,s,x}[e^{-\alpha(\tau_\delta-s)}]\right) + E_{\varphi,s,x}\left[e^{-\alpha(\tau_\delta-s)}\left(V^{*m}(\tau_\delta, x(\tau_\delta,\varphi)) - V^m(\varphi,\tau_\delta,x(\tau_\delta,\varphi))\right)\right],$$
which, by (3), implies that
$$V^{*m}(s,x) - V^m(\varphi,s,x) \ge \frac{\varepsilon}{\alpha}\left(1 - E_{\varphi,s,x}[e^{-\alpha(\tau_\delta-s)}]\right).$$
By Assumption B(d) there exists some η such that $E_{\varphi,s,x}[e^{-\alpha(\tau_\delta-s)}] \le \eta < 1$ for every ϕ ∈ Φ. We deduce that
$$V^{*m}(s,x) - V^m(\varphi,s,x) \ge \frac{\varepsilon}{\alpha}(1-\eta) \quad \text{for every } \varphi \in \Phi,$$
which contradicts the definition of $V^{*m}$ (see (3)). This establishes statement (i).

Proof of (ii). We fix an arbitrary ε > 0. The continuity of the function a ↦ $r(x,a) + (L^a V^{*m})(s,x)$ (Assumption B(c)), together with the fact that A(x) is σ-compact and that K is measurable, implies that the hypotheses of [23, Corollary 4.3] are fulfilled. So there exists a measurable selector or, in other words, there exists ϕ ∈ Φ such that
$$\alpha V^{*m}(t,y) \le r(y,\varphi(t,y)) + (L^{\varphi(t,y)} V^{*m})(t,y) + \alpha\varepsilon \tag{31}$$
for 0 ≤ t < m and y ∈ X (recall (24)) and, therefore,
$$\alpha(V^{*m} - \varepsilon\mathbf{1})(t, x(t,\varphi)) \le r(x(t,\varphi), \varphi(t,x(t,\varphi))) + (L^{\varphi(t,x(t,\varphi))}(V^{*m} - \varepsilon\mathbf{1}))(t, x(t,\varphi)) \tag{32}$$
for 0 ≤ t < m, where $\mathbf{1}$ is the constant function equal to 1. Multiplying expression (32) by $e^{-\alpha(t-s)}$, where 0 ≤ s < m, integrating over the interval t ∈ [s,m] and then taking expectation $E_{\varphi,s,x}$ (for arbitrary x ∈ X) yields, using standard arguments,
$$V^{*m}(s,x) - \varepsilon \le V^m(\varphi,s,x) + e^{-\alpha(m-s)}\left(E_{\varphi,s,x}[V^{*m}(m, x(m,\varphi))] - \varepsilon\right)$$
or, equivalently, taking into account (9) (cf. Lemma 6.4),
$$V^{*m}(s,x) \le V^m(\varphi,s,x) + \varepsilon(1 - e^{-\alpha(m-s)}) \quad \text{for } 0 \le s < m \text{ and } x \in X. \tag{33}$$
This shows that ϕ ∈ Φ is ε-optimal.

Proof of (iii). We also proceed by contradiction. Suppose that there exist x ∈ X and s ∈ [0,m) such that
$$\alpha V^{*m}(s,x) < \sup_{a\in A(x)}\{r(x,a) + (L^a V^{*m})(s,x)\}.$$
Therefore, by Assumption B, there exist β > 0, δ ∈ (0, m−s) and a neighborhood N of x such that
$$\alpha V^{*m}(t,y) + \beta \le \sup_{a\in A(y)}\{r(y,a) + (L^a V^{*m})(t,y)\}$$
for y ∈ N and s ≤ t ≤ s+δ. Fix arbitrary ε > 0 and choose ϕ ∈ Φ as in (31). Then
$$\alpha V^{*m}(t,y) + \beta \le r(y,\varphi(t,y)) + (L^{\varphi(t,y)} V^{*m})(t,y) + \alpha\varepsilon \tag{34}$$
for y ∈ N and s ≤ t ≤ s+δ. Observe that (34) is the same as (25) except for the inequality sign and the value of the constants. Hence, exactly as in the proof of (i) and with $\tau_\delta$ defined similarly (see (26)), we obtain
$$V^{*m}(s,x) - V^m(\varphi,s,x) \le \frac{\alpha\varepsilon-\beta}{\alpha}\left(1 - E_{\varphi,s,x}[e^{-\alpha(\tau_\delta-s)}]\right) + E_{\varphi,s,x}\left[e^{-\alpha(\tau_\delta-s)}\left(V^{*m}(\tau_\delta,x(\tau_\delta,\varphi)) - V^m(\varphi,\tau_\delta,x(\tau_\delta,\varphi))\right)\right].$$
Recalling (33), this yields
$$0 \le V^{*m}(s,x) - V^m(\varphi,s,x) \le \frac{\alpha\varepsilon-\beta}{\alpha}\left(1 - E_{\varphi,s,x}[e^{-\alpha(\tau_\delta-s)}]\right) + \varepsilon\left(E_{\varphi,s,x}[e^{-\alpha(\tau_\delta-s)}] - e^{-\alpha(m-s)}\right). \tag{35}$$
By Assumption B(d), there exists some η such that $E_{\varphi,s,x}[e^{-\alpha(\tau_\delta-s)}] \le \eta < 1$ for all ϕ ∈ Φ and thus, letting ε tend to 0 in (35), we obtain $0 \le -\frac{\beta}{\alpha}(1-\eta)$, which is not possible. Therefore, we have proved (iii).

Proof of (iv). Finally, the uniqueness of the solution of (8)–(9) follows from standard arguments. More precisely,
$$\alpha v(s,x) \ge \sup_{a\in A(x)}\{r(x,a) + (L^a v)(s,x)\} \quad \text{for } 0 \le s < m \text{ and } x \in X$$
and (9) imply that $V^{*m} \le v$ (see Lemma 6.4), whereas the reverse inequality gives (as in the proof of (ii), and recalling that v satisfies Assumption B(c)) the existence of strategies with payoff arbitrarily close to v, thus establishing that $v = V^{*m}$. $\Box$

Proof of Corollaries 3.2 and 3.3. These results are easily proved using [23, Corollary 4.3] and Lemma 6.7. $\Box$

Proof of Theorem 4.1. Fix a time horizon [0,m], with m > T, and an arbitrary strategy ϕ ∈ Φ. By Theorem 3.1 we have that, for 0 ≤ t < m and y ∈ X,
$$\alpha V^{*m}(t,y) \ge r(y,\varphi(t,y)) + (L^{\varphi(t,y)} V^{*m})(t,y). \tag{36}$$
Recalling Lemma 6.6 we also have
$$\alpha V(\varphi,t,y) = r(y,\varphi(t,y)) + (L^{\varphi(t,y)} V(\varphi,\cdot,\cdot))(t,y) \tag{37}$$
for 0 ≤ t < m and y ∈ X. Subtracting (37) from (36) gives, for 0 ≤ t < m and y ∈ X,
$$\alpha V^{*m}(t,y) - \alpha V(\varphi,t,y) \ge (L^{\varphi(t,y)}(V^{*m} - V(\varphi,\cdot,\cdot)))(t,y)$$
or, equivalently, $(L^{\varphi(t,y)} \bar V)(t,y) \le 0$, where we define $\bar V(t,y) := e^{-\alpha t}(V^{*m}(t,y) - V(\varphi,t,y))$ (recall Lemma 6.3). Using Dynkin's formula we derive that
$$E_{\varphi,s,x}\left[e^{-\alpha m}\left(V^{*m}(m,x(m,\varphi)) - V(\varphi,m,x(m,\varphi))\right)\right] \le e^{-\alpha s}\left(V^{*m}(s,x) - V(\varphi,s,x)\right).$$
By Theorem 3.1, we have $V^{*m}(m, x(m,\varphi)) = 0$ and thus, for every x ∈ X, 0 ≤ s ≤ T and ϕ ∈ Φ,
$$-E_{\varphi,s,x}[e^{-\alpha m} V(\varphi,m,x(m,\varphi))] \le e^{-\alpha s}\left(V^{*m}(s,x) - V(\varphi,s,x)\right).$$
Using properties of the conditional expectation, one can easily show that
$$E_{\varphi,s,x}[e^{-\alpha m} V(\varphi,m,x(m,\varphi))] = E_{\varphi,s,x}\left[\int_m^\infty e^{-\alpha t} r(x(t,\varphi), \varphi(t,x(t,\varphi)))\,dt\right].$$
Now, by Lemma 6.2, for x ∈ X and s ∈ [0,T],
$$\left|E_{\varphi,s,x}[e^{-\alpha m} V(\varphi,m,x(m,\varphi))]\right| \le W(x)\, H(T,m),$$
where
$$H(T,m) := M e^{cT}\left[\frac{(1 + (b/c))\, e^{-(\alpha-c)m}}{\alpha-c} + \frac{(b/c)\, e^{-\alpha m}}{\alpha}\right], \tag{38}$$
with b, c and M as in Assumption A. Therefore, for every ϕ ∈ Φ, x ∈ X and s ∈ [0,T],
$$V(\varphi,s,x) - V^{*m}(s,x) \le W(x)\, H(T,m)\, e^{\alpha T}. \tag{39}$$
Taking the supremum in (39) over ϕ ∈ Φ and recalling Lemma 6.1 gives
$$V^*(x) - V^{*m}(s,x) \le W(x)\, H(T,m)\, e^{\alpha T} \quad \text{for } x \in X \text{ and } 0 \le s \le T. \tag{40}$$
To prove the theorem we show that an inequality similar to (40) holds, but now with a lower bound. Fix a positive number ε and let ϕ be ε-optimal for the finite horizon problem over the time horizon [0,m], that is, $V^m(\varphi,s,x) + \varepsilon \ge V^{*m}(s,x)$ for every x ∈ X and s ∈ [0,m]. On the other hand,
$$V(\varphi,s,x) = V^m(\varphi,s,x) + E_{\varphi,s,x}\left[\int_m^\infty e^{-\alpha(t-s)} r(x(t,\varphi),\varphi(t,x(t,\varphi)))\,dt\right] \le V^*(x)$$
and, therefore, using the bound (39), we obtain
$$V^{*m}(s,x) - H(T,m)\, W(x)\, e^{\alpha T} \le V^*(x) + \varepsilon$$
for every x ∈ X, ε > 0 and 0 ≤ s ≤ T. Since ε is arbitrary, this yields, together with (40),
$$\sup_{0\le s\le T} \|V^{*m}(s,\cdot) - V^*\|_W \le H(T,m)\, e^{\alpha T} \quad \text{for } m > 0.$$
Since H(T,m) converges to zero when T is fixed and m → ∞, we obtain the stated result. $\Box$

Proof of Corollary 4.3. By Assumption B(a), $x \mapsto V^{*m}(0,x)$ is a continuous function. Since W is bounded on compact sets, it follows that $\{V^{*m}(0,\cdot)\}_{m>0}$ converges uniformly to $V^*$ on compact sets. Recalling that X is locally compact, this proves that $V^*$ is continuous. $\Box$

Remark 7.1 Let us make some final comments on these results. If we do not impose Assumption C then, in the proof of Theorem 4.1, expressions (36) and (37) would be valid only for stationary strategies f ∈ F. Thus, defining $\tilde V^*(x) := \sup_{f\in F} V(f,x)$ and recalling Lemma 6.5, inequality (40) would become
$$\tilde V^*(x) - V^{*m}(s,x) \le H(T,m)\, W(x)\, e^{\alpha T} \quad \text{for } x \in X \text{ and } 0 \le s \le T.$$
The rest of the proof of Theorem 4.1 remains valid, and then
$$-H(T,m)\, W(x)\, e^{\alpha T} \le V^*(x) - V^{*m}(s,x) \quad \text{for } x \in X \text{ and } 0 \le s \le T.$$
It follows that, if we knew in advance that $V^* = \tilde V^*$, then Theorem 4.1 would hold without imposing Assumption C. However, the (well-known) fact that $V^* = \tilde V^*$ is derived using the infinite horizon optimality equation, which has not yet been proved at this stage.

Proof of Theorem 4.4. First of all, let us prove that for every x ∈ X
$$\lim_{m\to\infty}\, \sup_{f\in F}\, \left|(L^{f(x)} V^{*m})(0,x) - (L^{f(x)} V^*)(x)\right| = 0. \tag{41}$$
We proceed by contradiction. Suppose that there exist x ∈ X, ε > 0 and $f_m \in F$ such that $|(L^{f_m(x)} V^{*m})(0,x) - (L^{f_m(x)} V^*)(x)| > \varepsilon$ for infinitely many values of m. Therefore, for infinitely many m, either
$$(L^{f_m(x)} V^{*m})(0,x) - (L^{f_m(x)} V^*)(x) > \varepsilon \tag{42}$$
or
$$(L^{f_m(x)} V^{*m})(0,x) - (L^{f_m(x)} V^*)(x) < -\varepsilon. \tag{43}$$
Suppose for instance that (42) holds. By Assumption D(b) there exist δ > 0 and a neighborhood N of x such that
$$(L^{f_m(y)} V^{*m})(t,y) - (L^{f_m(y)} V^*)(y) \ge \varepsilon/2 \tag{44}$$
for 0 ≤ t ≤ δ and y ∈ N, for infinitely many m. Observe that, by equicontinuity, neither δ nor N depends on m. Let $\tau_m := \min\{\tau(x,N,f_m), \delta\}$ and observe that (44) implies that for 0 ≤ t ≤ $\tau_m$
$$(L^{f_m(x(t,f_m))} V^{*m})(t, x(t,f_m)) - (L^{f_m(x(t,f_m))} V^*)(x(t,f_m)) \ge \varepsilon/2.$$
As a consequence of Dynkin's formula we obtain
$$E_{f_m,x}\left[V^{*m}(\tau_m, x(\tau_m,f_m)) - V^*(x(\tau_m,f_m))\right] - \left[V^{*m}(0,x) - V^*(x)\right] \ge \frac{\varepsilon}{2}\, E_{f_m,x}[\tau_m].$$
It follows from Theorem 4.1, the fact that $\tau_m \le \delta$, and Lemma 6.2 that the left-hand side of the above inequality is less than or equal to
$$\left(e^{c\delta} W(x) + \frac{b}{c}(e^{c\delta}+1)\right) H(\delta,m)\, e^{\alpha\delta} + W(x)\, H(0,m).$$
On the other hand, Assumption B(d) ensures that there exists some η > 0 such that $E_{f_m,x}[\tau_m] \ge \eta$ for all m. Therefore, for infinitely many m,
$$\left(e^{c\delta} W(x) + \frac{b}{c}(e^{c\delta}+1)\right) H(\delta,m)\, e^{\alpha\delta} + W(x)\, H(0,m) \ge \frac{\varepsilon\eta}{2},$$
which is not possible because H(δ,m) and H(0,m) tend to zero as m → ∞. A similar contradiction is obtained if (43) holds for infinitely many m. Therefore, (41) shows that, for fixed x ∈ X,
$$r(x,a) + (L^a V^{*m})(0,x) \to r(x,a) + (L^a V^*)(x)$$
uniformly in a ∈ A(x) as m → ∞, and thus
$$\lim_{m\to\infty}\, \sup_{a\in A(x)} \{r(x,a) + (L^a V^{*m})(0,x)\} = \sup_{a\in A(x)} \{r(x,a) + (L^a V^*)(x)\}.$$
This also yields that
$$a \mapsto r(x,a) + (L^a V^*)(x) \text{ is continuous on } A(x). \tag{45}$$
Hence, letting m → ∞ in the finite horizon optimality equation
$$\alpha V^{*m}(0,x) = \sup_{a\in A(x)} \{r(x,a) + (L^a V^{*m})(0,x)\}$$
shows that
$$\alpha V^*(x) = \sup_{a\in A(x)} \{r(x,a) + (L^a V^*)(x)\} \quad \text{for } x \in X.$$
The existence of ε-optimal strategies is derived, as in the proof of Theorem 3.1, using (45) and [23, Corollary 4.3]. The uniqueness of the solution is proved as in Theorem 3.1. This completes the proof of Theorem 4.4. $\Box$

Proof of Theorem 4.5. The proof mimics that of Theorem 3.1 except for some technical details arising from the fact that the optimality equation does not depend on the time component. However, for completeness of the exposition, we give a full proof of the theorem. We will show that $V^*$ satisfies (10) and then establish the uniqueness property. First of all, let us prove that
$$\alpha V^*(x) \le \sup_{a\in A(x)} \{r(x,a) + (L^a V^*)(x)\} \quad \text{for every } x \in X. \tag{46}$$
We proceed by contradiction. Suppose that there exists some x ∈ X such that
$$\alpha V^*(x) > \sup_{a\in A(x)} \{r(x,a) + (L^a V^*)(x)\}.$$
By Assumption E(b), there exist some ε > 0 and a neighborhood N of x with
$$\alpha V^*(y) - \varepsilon \ge \sup_{a\in A(y)} \{r(y,a) + (L^a V^*)(y)\} \quad \text{for every } y \in N. \tag{47}$$
Given arbitrary ϕ ∈ Φ and T > 0, define
$$\tau(\varphi,T) := \min\{\tau(x,N,\varphi),\, T\}, \tag{48}$$
where, for notational convenience, we have omitted x and N. By (47), for every 0 ≤ t ≤ τ(ϕ,T) we have
$$\alpha V^*(x(t,\varphi)) - \varepsilon \ge r(x(t,\varphi), \varphi(t,x(t,\varphi))) + (L^{\varphi(t,x(t,\varphi))} V^*)(x(t,\varphi)).$$
Multiplying both sides of the above inequality by $e^{-\alpha t}$, integrating over the interval [0, τ(ϕ,T)], and then taking expectation $E_{\varphi,x}$ yields
$$E_{\varphi,x}\left[\int_0^{\tau(\varphi,T)} \alpha e^{-\alpha t} V^*(x(t,\varphi))\,dt\right] - \frac{\varepsilon}{\alpha}\left(1 - E_{\varphi,x}[e^{-\alpha\tau(\varphi,T)}]\right) \ge E_{\varphi,x}\left[\int_0^{\tau(\varphi,T)} e^{-\alpha t}\big[r(x(t,\varphi),\varphi(t,x(t,\varphi))) + (L^{\varphi(t,x(t,\varphi))} V^*)(x(t,\varphi))\big]\,dt\right]. \tag{49}$$
Now observe that the Markov property gives
$$V(\varphi,x) = E_{\varphi,x}\left[\int_0^{\tau(\varphi,T)} e^{-\alpha t} r(x(t,\varphi),\varphi(t,x(t,\varphi)))\,dt\right] + E_{\varphi,x}[e^{-\alpha\tau(\varphi,T)} V(\varphi, x(\tau(\varphi,T),\varphi))]. \tag{50}$$
Observe also that the extended generator applied to the function $\bar V^*$ defined as
$$\bar V^*(t,y) := e^{-\alpha t} V^*(y) \quad \text{for } t \ge 0 \text{ and } y \in X$$
verifies (by Lemma 6.3), for t ≥ 0 and y ∈ X,
$$(L^{\varphi(t,y)} \bar V^*)(t,y) = e^{-\alpha t}(L^{\varphi(t,y)} V^*)(y) - \alpha e^{-\alpha t} V^*(y)$$
and also that, by Dynkin's formula,
$$E_{\varphi,x}[e^{-\alpha\tau(\varphi,T)} V^*(x(\tau(\varphi,T),\varphi))] - V^*(x) = E_{\varphi,x}\left[\int_0^{\tau(\varphi,T)} (L^{\varphi(t,x(t,\varphi))} \bar V^*)(t,x(t,\varphi))\,dt\right]. \tag{51}$$
Substituting (50) and (51) in the inequality (49) shows that
$$V^*(x) - V(\varphi,x) - \frac{\varepsilon}{\alpha}\left(1 - E_{\varphi,x}[e^{-\alpha\tau(\varphi,T)}]\right) \ge E_{\varphi,x}\left[e^{-\alpha\tau(\varphi,T)}\left(V^*(x(\tau(\varphi,T),\varphi)) - V(\varphi,x(\tau(\varphi,T),\varphi))\right)\right]. \tag{52}$$
Since $V^*(\cdot) \ge V(\varphi,\cdot)$, (52) gives
$$V^*(x) - V(\varphi,x) \ge \frac{\varepsilon}{\alpha}\left(1 - E_{\varphi,x}[e^{-\alpha\tau(\varphi,T)}]\right).$$
If T ↑ ∞ then τ(ϕ,T) ↑ τ(x,N,ϕ) and, by monotone convergence,
$$V^*(x) - V(\varphi,x) \ge \frac{\varepsilon}{\alpha}\left(1 - E_{\varphi,x}[e^{-\alpha\tau(x,N,\varphi)}]\right).$$
Therefore, by Assumption E(d), there exists some δ > 0 such that $V^*(x) - V(\varphi,x) \ge \delta$ for every ϕ ∈ Φ, which contradicts the definition of $V^*$. This establishes (46). Before proving the reverse inequality, that is,
$$\alpha V^*(x) \ge \sup_{a\in A(x)} \{r(x,a) + (L^a V^*)(x)\} \quad \text{for every } x \in X, \tag{53}$$
we first need to prove the existence of ε-optimal strategies. Given ε > 0, [23, Corollary 4.3] and Assumption E(c) ensure that there exists some f ∈ F for which
$$\alpha V^*(x) \le r(x,f(x)) + (L^{f(x)} V^*)(x) + \varepsilon\alpha \quad \text{for every } x \in X,$$
which implies (Lemma 6.5) that f is ε-optimal. Now we are ready to prove (53). Again, we proceed by contradiction, supposing that there exists some x ∈ X such that
$$\alpha V^*(x) < \sup_{a\in A(x)} \{r(x,a) + (L^a V^*)(x)\}.$$
By Assumption E(b), for some β > 0 and a neighborhood N of x,
$$\alpha V^*(y) + \beta \le \sup_{a\in A(y)} \{r(y,a) + (L^a V^*)(y)\} \quad \text{for every } y \in N.$$
Given arbitrary $\varepsilon > 0$, let $f \in F$ be such that $\alpha V^*(y) + \beta \le r(y,f(y)) + (L^{f(y)}V^*)(y) + \alpha\varepsilon$ for every $y \in N$. Exactly as before (only the inequality sign has changed) we can show that (cf. (52))
\[
V^*(x) - V(f,x) + \frac{\beta-\varepsilon}{\alpha}\big(1 - E_{f,x}[e^{-\alpha\tau(f,T)}]\big) \le E_{f,x}\big[e^{-\alpha\tau(f,T)}\,[V^*(x(\tau(f,T),f)) - V(f, x(\tau(f,T),f))]\big], \tag{54}
\]
where $\tau(f,T)$ is defined as in (48). Now, by $\varepsilon$-optimality, $V^*(\cdot) \le V(f,\cdot) + \varepsilon$ and so (54) implies
\[
V^*(x) - V(f,x) + \frac{\beta-\varepsilon}{\alpha}\big(1 - E_{f,x}[e^{-\alpha\tau(f,T)}]\big) \le \varepsilon E_{f,x}[e^{-\alpha\tau(f,T)}].
\]
Letting $T \to \infty$ and rearranging terms, we obtain
\[
V^*(x) - V(f,x) \le -\frac{\beta}{\alpha}\big(1 - E_{f,x}[e^{-\alpha\tau(x,N,f)}]\big) + \varepsilon\Big(\frac{1}{\alpha} + \frac{1+\alpha}{\alpha}\,E_{f,x}[e^{-\alpha\tau(x,N,f)}]\Big).
\]
Taking the $\limsup_{\varepsilon\to 0}$ in this expression, recalling that $f$ is $\varepsilon$-optimal and that (by Assumption E(d)) $E_{f,x}[e^{-\alpha\tau(x,N,f)}]$ is bounded away from 1 (uniformly in $f$), yields a contradiction. This establishes (53) which, together with (46), proves that $V^*$ satisfies (10). The uniqueness property is proved as in Theorem 4.4. $\Box$

Remark 7.2 Observe that Theorem 4.5 is verified if we replace the continuity condition on $V^*$ (Assumption E(b)) with the following conditions: (i) $V^*$ is lower semicontinuous, (ii) for every $f \in F$, $V(f,\cdot)$ is continuous. The reason for this is that proving (46) only requires lower semicontinuity of $V^*$. Then the existence of $\varepsilon$-optimal stationary strategies shows that $V(f_n,\cdot)$ converges uniformly to $V^*$ for some sequence $\{f_n\} \subseteq F$, thus establishing that $V^*$ is continuous, which is needed to prove (53).

Proof of Corollaries 4.6 and 4.7. These results are easily proved using [23, Corollary 4.3] and Lemma 6.7. $\Box$
8  Proof of results in Section 5
Finite horizon controlled diffusions.

Lemma 8.1 If Assumption CD holds, then Assumption B(a) is satisfied.

Proof. First of all, observe that $V^{*T}$ as defined in (13) is not the supremum of the expected reward over the family of Markov strategies, as defined in (3). However, as a consequence of [18, Theorem 5.1.2] (in particular, Assumption CD(d) is used),
\[
V^{*T}(s,x) = \sup_{\varphi\in\Phi} V^T(\varphi,s,x) \quad \text{for every } 0 \le s \le T \text{ and } x \in \mathbb{R}.
\]
It is worth noting that the same result can be obtained replacing Assumption CD(d) with other nondegeneracy conditions; see [18, p. 214]. By [18, Theorem 4.7.7], $V^{*T}$ is continuous on $[0,T]\times\mathbb{R}$, has bounded first derivative with respect to $t \in [0,T]$ and bounded second derivative with respect to $x \in \mathbb{R}$. Thus, $V^{*T} \in \cap_{\varphi\in\Phi} D_\varphi$ follows. $\Box$

Lemma 8.2 If Assumption CD is satisfied and, moreover, the function $V^{*T}$ is in $C^{1,2}([0,T]\times\mathbb{R})$, then Assumptions B(b) and B(c) hold.

Proof. The result is easily proved using Assumption CD and Lemma 6.7. $\Box$

Lemma 8.3 Assumption CD implies Assumption B(d).

Proof. Fix $x \in \mathbb{R}$, a neighborhood $N$ of $x$ and $s \ge 0$. To simplify the notation, and without loss of generality, we will assume that $s = 0$. Suppose that $(x-\varepsilon, x+\varepsilon) \subseteq N$ for some $\varepsilon > 0$. Let us prove that there exists $\delta > 0$ such that
\[
\inf_{\varphi\in\Phi} P_{\varphi,x}\{\tau(x,N,\varphi) > \delta\} > 0.
\]
Recall that
\[
x(t,\varphi) = x + \int_0^t b(s, x(s,\varphi), \varphi(s,x(s,\varphi)))\,ds + \int_0^t \sigma(s, x(s,\varphi), \varphi(s,x(s,\varphi)))\,dW_s
\]
for $0 \le t \le T$. Observe now that
\[
\{\tau(x,N,\varphi) \le \delta\} \subseteq \Big\{\sup_{0\le t\le\delta} |x(t,\varphi) - x| \ge \varepsilon\Big\},
\]
and that
\[
\Big\{\sup_{0\le t\le\delta} |x(t,\varphi) - x| \ge \varepsilon\Big\} \subseteq D_1(\delta) \cup D_2(\delta),
\]
where
\[
D_1(\delta) := \Big\{\int_0^\delta |b(s,x(s,\varphi),\varphi(s,x(s,\varphi)))|\,ds \ge \varepsilon/2\Big\}
\]
and
\[
D_2(\delta) := \Big\{\sup_{0\le t\le\delta}\Big|\int_0^t \sigma(s,x(s,\varphi),\varphi(s,x(s,\varphi)))\,dW_s\Big| \ge \varepsilon/2\Big\}.
\]
Suppose that $0 < \delta \le \varepsilon/4K$ (where $K$ is as in Assumption CD(b)), and so
\[
D_1(\delta) \subseteq \Big\{\int_0^\delta |x(s,\varphi)|\,ds \ge \varepsilon/4K\Big\} \subseteq \Big\{\int_0^\delta x^2(s,\varphi)\,ds \ge \varepsilon^2/16\delta K^2\Big\},
\]
where the first inclusion follows from Assumption CD(b) and the second one from Jensen's inequality. Therefore, from Chebyshev's inequality,
\[
P_{\varphi,x}(D_1(\delta)) \le \frac{16\delta K^2}{\varepsilon^2}\, E_{\varphi,x}\Big[\int_0^\delta x^2(s,\varphi)\,ds\Big].
\]
On the other hand, by the martingale property of the stochastic integrals and Itô's isometry (see [21, Corollary 3.2.6]),
\[
P_{\varphi,x}(D_2(\delta)) \le \frac{4K^2}{\varepsilon^2}\, E_{\varphi,x}\Big[\int_0^\delta (1+|x(s,\varphi)|)^2\,ds\Big] \le \frac{8\delta K^2}{\varepsilon^2} + \frac{8K^2}{\varepsilon^2}\, E_{\varphi,x}\Big[\int_0^\delta x^2(s,\varphi)\,ds\Big].
\]
Define now
\[
G(\varphi,x,\delta) := E_{\varphi,x}\Big[\int_0^\delta x^2(s,\varphi)\,ds\Big].
\]
So far we have shown that
\[
P_{\varphi,x}\{\tau(x,N,\varphi) \le \delta\} \le \frac{8\delta K^2}{\varepsilon^2} + \frac{16\delta K^2 + 8K^2}{\varepsilon^2}\, G(\varphi,x,\delta). \tag{55}
\]
Therefore, to prove the stated result, it suffices to show that
\[
\lim_{\delta\to 0}\, \sup_{\varphi\in\Phi}\, G(\varphi,x,\delta) = 0. \tag{56}
\]
It follows from standard arguments (see the proof of Theorem 5.2.1 in [21]) that
\[
E_{\varphi,x}[x^2(s,\varphi)] \le 4K^2(T+1)T + 4K^2(T+1)\int_0^s E_{\varphi,x}[x^2(u,\varphi)]\,du
\]
and, from Gronwall's inequality [21, p. 78],
\[
E_{\varphi,x}[x^2(s,\varphi)] \le 4K^2(T+1)T\, e^{4K^2(T+1)T} =: \Delta \quad \text{for } 0 \le s \le T,
\]
which proves that $G(\varphi,x,\delta) \le \Delta\delta$, and (56) follows. Lemma 6.8 together with (55) completes the proof. $\Box$

Proof of Theorem 5.1. The proof follows from Lemmas 8.1–8.3, Theorem 3.1 and Corollary 3.2. $\Box$

Infinite horizon controlled Markov chains.

Proof of Theorem 5.2. Proof of (i). Assumptions E(a) and E(b) are easily proved (recall that the state space $X$ is endowed with the discrete topology). Note, however, that the continuity of $a \mapsto r(x,a) + (L^a V^*)(x)$ stated in Assumption E(c) is not necessary, because this property was used to derive the existence of a measurable selector. To prove that Assumption E(d) holds, we shall prove that Lemma 6.8, which implies Assumption B(d), which in turn implies Assumption E(d), is satisfied. Let $x \in X$, $s \ge 0$ and $N = \{x\}$, which is a neighborhood of $x$. Then
\[
P_{\varphi,s,x}\{\tau(x,N,\varphi,s) \ge s + \delta_0\} = \exp\Big\{\int_s^{s+\delta_0} q_{xx}(u,\varphi)\,du\Big\} \ge \exp\{-q(x)\delta_0\},
\]
where $q(x)$ is defined as in Assumption CMC(a). Thus
\[
\inf_{\varphi\in\Phi} P_{\varphi,s,x}\{\tau(x,N,\varphi,s) \ge s + \delta_0\} > 0
\]
and so Lemma 6.8 is verified. This shows that Assumptions CMC(a)–(b) imply Assumption E.
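As an aside (not part of the paper), the displayed survival identity and the bound $\exp\{-q(x)\delta_0\}$ can be evaluated numerically for a concrete, hypothetical diagonal rate $q_{xx}(u,\varphi) = -(1+\sin^2 u)$, which is bounded below by $-q(x)$ with $q(x) = 2$:

```python
import math

# Survival probability of the holding time in state x for a time-inhomogeneous
# Markov chain: P{tau >= s + d0} = exp( integral_s^{s+d0} q_xx(u) du ).
def survival(q_xx, s, d0, steps=10000):
    """Numerically integrate q_xx over [s, s+d0] (trapezoid rule), then exponentiate."""
    h = d0 / steps
    total = sum((q_xx(s + i * h) + q_xx(s + (i + 1) * h)) / 2 * h for i in range(steps))
    return math.exp(total)

# Hypothetical diagonal rate, bounded below by -q(x) with q(x) = 2.
q_xx = lambda u: -(1.0 + math.sin(u) ** 2)
s, d0 = 0.5, 1.0
p = survival(q_xx, s, d0)
assert math.exp(-2.0 * d0) <= p < 1.0  # the bound exp(-q(x)*d0) from the proof
```

The rate function here is our own illustrative choice; any measurable $q_{xx}(\cdot,\varphi) \ge -q(x)$ would satisfy the same comparison.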
Proof of (ii). It suffices to show that Assumption E(c) is satisfied; that is, we must prove that for each $x \in X$ the function
\[
a \mapsto \sum_{y\in X} q(y|x,a)V^*(y) \tag{57}
\]
is continuous on $A(x)$. For each integer $p \ge 0$ we know from Assumption CMC(c) that $a \mapsto \sum_{0\le y\le p} q(y|x,a)V^*(y)$ is continuous on $A(x)$. Therefore, if we prove that
\[
\lim_{p\to\infty}\, \sup_{a\in A(x)} \Big|\sum_{y>p} q(y|x,a)V^*(y)\Big| = 0,
\]
then the continuity of (57) will follow. To this end, we will show that if $p \ge x$ then
\[
\lim_{p\to\infty}\, \sup_{a\in A(x)} \sum_{y>p} q(y|x,a)W(y) = 0. \tag{58}
\]
Observe that the decreasing sequence of continuous functions (indexed by $p \ge x$)
\[
a \mapsto \sum_{y>p} q(y|x,a)W(y)
\]
converges to zero and, therefore, by Dini's theorem, the convergence is uniform on the compact set $A(x)$, thus proving (58). This completes the proof. $\Box$

Tomás Prieto-Rumeau
Departamento de Estadística, Facultad de Ciencias, UNED, c/ Senda del Rey 9, 28040 Madrid, Spain, tprieto@ccia.uned.es
Onésimo Hernández-Lerma
Departamento de Matemáticas, CINVESTAV-IPN, Apartado Postal 14-740, México D.F. 07000, Mexico, ohernand@math.cinvestav.mx
References

[1] Almudevar A., A dynamic programming algorithm for the optimal control of Markov piecewise deterministic processes, SIAM J. Control Optim. 40 (2002), 525–539.

[2] Bensoussan A.; Robin M., On the convergence of the discrete time dynamic programming equation for general semigroups, SIAM J. Control Optim. 20 (1982), 722–746.
[3] Berge C., Topological Spaces, Macmillan, New York, 1963.

[4] Bhatt A. G.; Borkar V. S., Occupation measures for controlled Markov processes: characterization and optimality, Ann. Probab. 24 (1996), 1531–1562.

[5] Dempster M. A. H.; Ye J. J., Necessary and sufficient optimality conditions for control of piecewise deterministic Markov processes, Stoch. Stoch. Rep. 40 (1992), 125–145.

[6] Doshi B. T., Continuous-time control of Markov processes on an arbitrary state space: discounted rewards, Ann. Statist. 4 (1976), 1219–1235.

[7] Doshi B. T., Continuous-time control of Markov processes on an arbitrary state space: average return criterion, Stoch. Proc. Appl. 4 (1976), 55–77.

[8] Down D.; Meyn S. P.; Tweedie R. L., Exponential and uniform ergodicity of Markov processes, Ann. Probab. 23 (1995), 1671–1691.

[9] Feller W., On the integro-differential equations of purely discontinuous Markoff processes, Trans. Amer. Math. Soc. 48 (1940), 488–515.

[10] Fleming W. H.; Rishel R. W., Deterministic and Stochastic Optimal Control, Springer, New York, 1975.

[11] Fleming W. H.; Soner H. M., Controlled Markov Processes and Viscosity Solutions, Springer, New York, second edition, 2006.

[12] Guo X. P., Continuous-time Markov decision processes with discounted rewards: the case of Polish spaces, Math. Oper. Res. (2007), 73–87.

[13] Guo X. P.; Hernández-Lerma O., Continuous-time controlled Markov chains with discounted rewards, Acta Appl. Math. 79 (2003), 195–216.

[14] Hernández-Lerma O., Lectures on Continuous-Time Markov Control Processes, Sociedad Matemática Mexicana, Mexico City, 1994.

[15] Hernández-Lerma O.; Govindan T. E., Nonstationary continuous-time Markov control processes with discounted costs on infinite horizons, Acta Appl. Math. 67 (2001), 277–293.
[16] Hernández-Lerma O.; Lasserre J. B., Discrete-Time Markov Control Processes: Basic Optimality Criteria, Springer, New York, 1996.

[17] Hernández-Lerma O.; Lasserre J. B., Further Topics on Discrete-Time Markov Control Processes, Springer, New York, 1999.

[18] Krylov N. V., Controlled Diffusion Processes, Springer, New York, 1980.

[19] Kurtz T. G.; Stockbridge R. H., Existence of Markov controls and characterization of optimal Markov controls, SIAM J. Control Optim. 36 (1998), 609–653. Erratum, ibid. 37 (1998), 1310–1311.

[20] Meyn S. P.; Tweedie R. L., Stability of Markovian processes III: Foster-Lyapunov criteria for continuous-time processes, Adv. Appl. Prob. 25 (1993), 518–548.

[21] Øksendal B., Stochastic Differential Equations, Fifth Edition, Springer, New York, 1998.

[22] Pliska S. R., Controlled jump processes, Stoch. Proc. Appl. 3 (1975), 259–282.

[23] Rieder U., Measurable selection theorems for optimization problems, Manuscripta Math. 24 (1978), 115–131.

[24] Stockbridge R. H., Time-average control of martingale problems: existence of a stationary solution, Ann. Probab. 18 (1990), 190–205.

[25] Stockbridge R. H., Time-average control of martingale problems: a linear programming formulation, Ann. Probab. 18 (1990), 206–217.
Morfismos, Vol. 10, No. 1, 2006, pp. 41–58
On bounds for the stability number of graphs

Isidoro Gitler∗    Carlos E. Valencia
Abstract
Let G be a graph without isolated vertices and let α(G) be its stability number and τ(G) its covering number. In this paper we study the minimum number of edges a connected graph can have as a function of α(G) and τ(G). In particular we obtain the following lower bound:
\[
q(G) \ge \alpha(G) - c(G) + \Gamma(\alpha(G),\tau(G)),
\]
where c(G) is the number of connected components of G and
\[
\Gamma(a,t) = \min\Big\{\sum_{i=1}^{a}\binom{z_i}{2} \;:\; z_1+\cdots+z_a = a+t \text{ and } z_i \ge 0 \ \forall\, i=1,\dots,a\Big\},
\]
for a and t two arbitrary natural numbers. Also we prove that $\alpha(G) \le \tau(G)[1+\delta(G)]$, where $\delta(G) = \alpha(G) - \sigma_v(G)$ and $\sigma_v(G)$ is the $\sigma_v$-cover number of a graph, that is, the maximum natural number m such that every vertex of G belongs to a maximal independent set with at least m vertices.
2000 Mathematics Subject Classification: 05C69, 05C35. Keywords and phrases: q-minimal graph, stability number, covering number.
1  Preliminaries
∗ Invited Article. Authors partially supported by CONACyT grant 49835 and SNI.

Let $G = (V,E)$ be a graph with $|V| = n$ vertices and $|E| = q$ edges. If $U \subseteq V$ is a subset of vertices, then the induced subgraph on $U$, denoted
by $G[U]$, is the graph with $U$ as vertex set and whose edges are precisely the edges of $G$ with both ends in $U$.

A subset $A \subset V$ is a minimal vertex cover for $G$ if: (i) every edge of $G$ is incident with at least one vertex in $A$, and (ii) there is no proper subset of $A$ with the first property. If $A$ satisfies condition (i) only, then $A$ is called a vertex cover of $G$. The vertex covering number of $G$, denoted by $\tau(G)$, is the number of vertices in a minimum vertex cover of $G$, that is, the size of any smallest vertex cover in $G$. It is convenient to regard the empty set as a minimal vertex cover for a graph with all its vertices isolated.

A subset $M$ of $V$ is called a stable set if no two vertices in $M$ are adjacent. We call $M$ a maximal stable set if it is maximal with respect to inclusion. The stability number of a graph $G$ is given by
\[
\alpha(G) = \max\{|M| \;:\; M \subset V(G) \text{ is a stable set in } G\}.
\]
Note that a set of vertices in $G$ is a maximal stable set if and only if its complement is a minimal vertex cover for $G$. Thus we have $\alpha(G) + \tau(G) = n$.

A subset $W$ of $V$ is called a clique if any two vertices in $W$ are adjacent. We call $W$ maximal if it is maximal with respect to inclusion. The clique number of a graph $G$ is given by
\[
\omega(G) = \max\{|W| \;:\; W \subset V(G) \text{ is a clique in } G\}.
\]
Given a subset $U \subset V$, the neighbour set of $U$, denoted by $N(U)$, is defined as $N(U) = \{v \in V \mid v$ is adjacent to some vertex in $U\}$.
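As an aside (ours, not part of the paper), these invariants can be computed by exhaustive search on small graphs; the sketch below illustrates the identity $\alpha(G) + \tau(G) = n$ on the 5-cycle.

```python
from itertools import combinations

def is_stable(edges, subset):
    """True if no edge of the graph joins two vertices of `subset`."""
    s = set(subset)
    return not any(u in s and v in s for u, v in edges)

def stability_number(vertices, edges):
    """alpha(G): size of a largest stable set, found by exhaustive search."""
    for k in range(len(vertices), -1, -1):
        if any(is_stable(edges, c) for c in combinations(vertices, k)):
            return k

def covering_number(vertices, edges):
    """tau(G) = n - alpha(G): complements of maximum stable sets are minimum covers."""
    return len(vertices) - stability_number(vertices, edges)

# The 5-cycle C5: alpha = 2, tau = 3, and alpha + tau = n = 5.
V = list(range(5))
E = [(i, (i + 1) % 5) for i in range(5)]
print(stability_number(V, E), covering_number(V, E))  # -> 2 3
```

Exhaustive search is exponential in $n$ and is meant only to check small examples.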
2  The number of edges of a connected graph with fixed stability number
We give in Theorem 2.3 a lower bound for the number of edges of a graph $G$ as a function of the stability number $\alpha(G)$, the covering number $\tau(G)$ and the number of connected components $c(G)$ of $G$. This answers an open question posed by Ore in his book [6], and is a variant for connected graphs of a celebrated theorem of Turán [7]. For a graph $G = (V,E)$, we will denote by $q(G)$ the cardinality of the edge set $E(G)$ of $G$. We say that a connected graph $G$ is q-minimal if there is no graph $G'$ such that
(i) $G'$ is connected, (ii) $\alpha(G') = \alpha(G)$, (iii) $\tau(G') = \tau(G)$, and (iv) $q(G') < q(G)$.

Hence if $G$ is q-minimal, then either $\alpha(G) < \alpha(G-e)$ or $c(G) < c(G-e)$ for every edge $e$ of $G$ (note that $\alpha(G) < \alpha(G-e)$ if and only if $\tau(G) > \tau(G-e)$). That is, an edge of a q-minimal graph is either α-critical or a bridge. Therefore the blocks of a q-minimal graph are α-critical graphs. Here an edge $e$ of a graph $G$ is α-critical if $\alpha(G-e) = \alpha(G)+1$; $G$ is α-critical if all the edges of $G$ are α-critical, and $G$ is τ-critical if $\tau(G-v) = \tau(G)-1$ for all vertices $v$ of $G$.
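The α-critical condition is easy to test by brute force on small examples. The following check (ours, not the paper's) verifies that every edge of the odd cycle $C_5$ is α-critical:

```python
from itertools import combinations

def alpha(n, edges):
    """Stability number of a graph on vertices 0..n-1, by exhaustive search."""
    for k in range(n, -1, -1):
        for c in combinations(range(n), k):
            s = set(c)
            if not any(u in s and v in s for u, v in edges):
                return k

# Every edge of an odd cycle is alpha-critical: deleting it raises alpha by 1,
# and C5 has no bridges, so C5 is an alpha-critical graph.
n = 5
C5 = [(i, (i + 1) % n) for i in range(n)]
a = alpha(n, C5)                                # alpha(C5) = 2
for e in C5:
    assert alpha(n, [f for f in C5 if f != e]) == a + 1
```

Deleting any edge of $C_5$ leaves a path on five vertices, whose stability number is 3.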
In order to bound the number of edges we introduce the following numerical function. Let $a$ and $t$ be two natural numbers and let
\[
\Gamma(a,t) = \min\Big\{\sum_{i=1}^{a}\binom{z_i}{2} \;:\; z_1+\cdots+z_a = a+t \text{ and } z_i \ge 0 \ \forall\, i=1,\dots,a\Big\}.
\]
Lemma 2.1 Let $a$ and $t$ be natural numbers, then

(i) $\Gamma(a,t) = (a-s)\binom{r}{2} + s\binom{r+1}{2}$, where $a+t = ra+s$ with $0 \le s < a$.

(ii) $\Gamma(a-1,t) - \Gamma(a,t) \ge \frac{1}{2}\big(\lfloor\frac{a+t}{a}\rfloor^2 - \lfloor\frac{a+t}{a}\rfloor\big) \ge 0$ for all $a \ge 2$ and $t \ge 1$. Moreover, $\Gamma(a-1,t) - \Gamma(a,t) = \frac{1}{2}\big(\lfloor\frac{a+t}{a}\rfloor^2 - \lfloor\frac{a+t}{a}\rfloor\big)$ if and only if $\lfloor\frac{a+t}{a}\rfloor \ge \frac{t}{a-1}$, and $\lfloor\frac{a+t}{a}\rfloor^2 - \lfloor\frac{a+t}{a}\rfloor = 0$ if and only if $0 \le t < a$.

(iii) $\Gamma(a,t) - \Gamma(a,t-1) = 1 + \lfloor\frac{t-1}{a}\rfloor = \lceil\frac{t}{a}\rceil$ for all $a \ge 1$ and $t \ge 2$.

(iv) $\sum_{i=1}^{k}\Gamma(a_i,t_i) \ge \Gamma\big(\sum_{i=1}^{k}a_i, \sum_{i=1}^{k}t_i\big)$ for all $a_i \ge 1$ and $t_i \ge 1$. Furthermore, we have that $\Gamma(a_1,t_1) + \Gamma(a_2,t_2) = \Gamma(a_1+a_2, t_1+t_2)$ if and only if $\lceil\frac{t_1}{a_1}\rceil = \lceil\frac{t_2}{a_2}\rceil$.

(v) $\Big\lceil\frac{2(a-1+\Gamma(a,t))}{a+t}\Big\rceil = 1 + \lfloor\frac{t}{a}\rfloor + L$, where $L = -1$ if and only if $a = 1$; $L = 0$ whenever $a \ge 2$ and $a \mid t$; and $L = 1$ if either $1 \le t < a$ or $a+2 \le t < 2a$.
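The closed form of Lemma 2.1(i) can be checked against the defining minimization by exhaustive search over all compositions of $a+t$ into $a$ nonnegative parts; the sketch below is our verification, not part of the paper.

```python
from math import comb

def compositions(total, parts):
    """All tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def gamma_bruteforce(a, t):
    """Gamma(a, t): minimize sum of binom(z_i, 2) over z_1 + ... + z_a = a + t."""
    return min(sum(comb(z, 2) for z in zs) for zs in compositions(a + t, a))

def gamma_closed(a, t):
    """Closed form of Lemma 2.1(i): write a + t = r*a + s with 0 <= s < a."""
    r, s = divmod(a + t, a)
    return (a - s) * comb(r, 2) + s * comb(r + 1, 2)

for a in range(1, 6):
    for t in range(1, 8):
        assert gamma_bruteforce(a, t) == gamma_closed(a, t)
```

The minimum is attained by the almost-equal distribution ($a-s$ parts equal to $r$ and $s$ parts equal to $r+1$), which is what the proof of (i) establishes.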
Proof: (i) The case $a = 1$ is trivial. For $a \ge 2$ we will use the next result.

Claim 2.2 Let $n, m \ge 1$ be natural numbers with $n > m+1$, then
\[
\binom{n}{2} + \binom{m}{2} > \binom{n-1}{2} + \binom{m+1}{2}.
\]
Proof: It follows easily, since $\binom{n}{2} - \binom{n-1}{2} = n-1$. ✷

Let $a \ge 2$ and $t \ge 1$ be fixed natural numbers, let $(z_1,\dots,z_a) \in \mathbb{N}^a$ be such that $\sum_{i=1}^a z_i = a+t$, and let $L(z_1,\dots,z_a) = \sum_{i=1}^a \binom{z_i}{2}$. Now, if
\[
\{z_1,\dots,z_a\} \ne \{\underbrace{r,\dots,r}_{a-s},\ \underbrace{r+1,\dots,r+1}_{s}\},
\]
where $a+t = ra+s$ with $0 \le s < a$, then there exist $z_{i_1}$ and $z_{i_2}$ with $z_{i_1} > z_{i_2}+1$. Applying Claim 2.2 we have that
\[
L(z_1,\dots,z_a) > L(z_1,\dots,z_{i_1}-1,\dots,z_{i_2}+1,\dots,z_a) \ge \Gamma(a,t),
\]
and therefore we obtain the result. ✷

(ii) Let $a+t = ar+s$ with $r \ge 1$ and $0 \le s < a$; then
\[
a+t-1 = (a-1)(r+l) + s', \quad \text{where } r+s-1 = (a-1)l + s' \text{ with } l \ge 0 \text{ and } 0 \le s' < a-1.
\]
Using part (i), and after some algebraic manipulations, we obtain that
\[
2\big(\Gamma(a-1,t) - \Gamma(a,t)\big) = (r^2 - r) + (l^2 - l)(a-1) + 2ls'.
\]
Therefore $\Gamma(a-1,t) - \Gamma(a,t) \ge \frac{1}{2}\big(\lfloor\frac{a+t}{a}\rfloor^2 - \lfloor\frac{a+t}{a}\rfloor\big) \ge 0$, since $r, l, s' \ge 0$ and $u^2 - u \ge 0$ for all $u \ge 0$. Moreover, $\Gamma(a-1,t) - \Gamma(a,t) = \frac{1}{2}\big(\lfloor\frac{a+t}{a}\rfloor^2 - \lfloor\frac{a+t}{a}\rfloor\big)$ if and only if
\[
(l,s') = (0,s') \quad \text{or} \quad (l,s') = (1,0).
\]
These two possibilities imply that $r+s < a$ and $r+s = a$, respectively. Finally, it is clear that $\lfloor\frac{a+t}{a}\rfloor^2 - \lfloor\frac{a+t}{a}\rfloor = 0$ if and only if $0 \le t < a$.
(iii) Let $a+t-1 = ar+s$ with $r \ge 1$ and $0 \le s < a$; then
\[
a+t = \begin{cases} ar + (s+1) & \text{if } 0 \le s < a-1, \\ a(r+1) & \text{if } s = a-1, \end{cases}
\]
and by (i) we have that
\[
\Gamma(a,t) - \Gamma(a,t-1) = \begin{cases} \Big[(a-s-1)\binom{r}{2} + (s+1)\binom{r+1}{2}\Big] - \Big[(a-s)\binom{r}{2} + s\binom{r+1}{2}\Big] \\[6pt] a\binom{r+1}{2} - \Big[\binom{r}{2} + (a-1)\binom{r+1}{2}\Big] \end{cases}
= \binom{r+1}{2} - \binom{r}{2} = r = \Big\lfloor\frac{a+t-1}{a}\Big\rfloor.
\]

(iv) Follows directly from the definition of $\Gamma(a,t)$.

(v) Let $a+t = ar+s$ with $r \ge 1$ and $0 \le s < a$; then, by (i), we have that
\[
\Big\lceil\frac{2(a-1+\Gamma(a,t))}{a+t}\Big\rceil
= \Big\lceil\frac{2\big(a-1+(a-s)\binom{r}{2}+s\binom{r+1}{2}\big)}{a+t}\Big\rceil
= \Big\lceil\frac{2(a-1) + (a-s)r(r-1) + s(r+1)r}{a+t}\Big\rceil
\]
\[
= \Big\lceil\frac{2(a-1) + r(ar+s) - r(a-s)}{ar+s}\Big\rceil
= r + \Big\lceil\frac{2(a-1) - r(a-s)}{ar+s}\Big\rceil
= \Big\lfloor\frac{a+t}{a}\Big\rfloor + \Big\lceil\frac{2(a-1) - r(a-s)}{ar+s}\Big\rceil.
\]
Finally,
\[
L = \Big\lceil\frac{2(a-1) - r(a-s)}{ar+s}\Big\rceil = \begin{cases} -1 & \text{if } a = 1, \\ 0 & \text{if } a \ge 2 \text{ and } a \mid t, \\ 1 & \text{if either } r = 1 \text{ and } s \ge 1, \text{ or } r = 2 \text{ and } s \ge 2, \end{cases}
\]
since
\[
L \ge -1 \iff -2(ar+s) < r(s-a) + 2(a-1) \iff 2 < (a+s)(2+r) \iff a, r \ge 1,
\]
\[
L \ge 0 \iff -(ar+s) < r(s-a) + 2(a-1) \iff 2 < s(r+1) + 2a \iff s > 0, \text{ or } s = 0 \text{ and } a \ge 2,
\]
\[
L \ge 1 \iff 0 < r(s-a) + 2(a-1) \iff 0 < (a-1)(2-r) + r(s-1),
\]
and the last condition holds when $r = 1$ and $s \ge 1$, or $r = 2$ and $s \ge 2$. ✷
Theorem 2.3 ([4, Theorem 3.3]) Let $G$ be a graph, then
\[
q(G) \ge \alpha(G) - c(G) + \Gamma(\alpha(G), \tau(G)).
\]
Proof: We will use induction on $\tau(G)$, the covering number of $G$. For $\tau(G) = 1$ it is easy to see that the unique connected graphs with $\tau(G) = 1$ are the stars $K_{1,n}$ (with $\alpha(K_{1,n}) = n-1$), and the result follows, since
\[
q(K_{1,n}) = n-1 = (n-1) - 1 + 1 = \alpha(K_{1,n}) - c(K_{1,n}) + \Gamma(n-1, 1).
\]
In the same way it is easy to see that the unique graphs $G$ with $\alpha(G) = 1$ are the complete graphs $K_n$ (with $\tau(K_n) = n-1$). Since we have that
\[
q(K_n) = \binom{n}{2} = 1 - 1 + \binom{n}{2} = \alpha(K_n) - c(K_n) + \Gamma(1, n-1),
\]
it follows that the family of complete graphs satisfies the result.
Moreover, the graphs of both families are q-minimal. So we can assume that the result is true for all graphs with covering number at most $k$, for some $k \ge 1$. Let $G$ be a q-minimal graph with $\tau(G) = k+1$. Since $q(G) = \sum_{i=1}^s q(G_i)$, $\alpha(G) = \sum_{i=1}^s \alpha(G_i)$ and $\tau(G) = \sum_{i=1}^s \tau(G_i)$, where $G_1,\dots,G_s$ are the connected components of $G$, it follows from Lemma 2.1(iv) that we can assume without loss of generality that $G$ is connected and $\alpha(G) \ge 2$. Let $e$ be an edge of $G$ and consider the graph $G' = G - e$. We have two possibilities:
\[
\tau(G') = \tau(G) \quad \text{or} \quad \tau(G') = \tau(G) - 1.
\]
That is, an edge of $G$ is either a bridge or α-critical.
Case 1. First assume that $G$ has no bridges, that is, $G$ is an α-critical graph. Let $v$ be a vertex of $G$ of maximum degree. Since any α-critical graph is τ-critical, we have that $\tau(G-v) = \tau(G)-1$ and $\alpha(G-v) = \alpha(G)$; moreover, since α-critical graphs are blocks, $G-v$ is connected. Now, by the induction hypothesis we have that
\[
q(G-v) \ge \alpha(G) - 1 + \Gamma(\alpha(G), \tau(G)-1).
\]
Using the formula
\[
\sum_{i=1}^{\alpha(G-v)+\tau(G-v)} \deg(v_i) = 2q(G-v)
\]
we conclude that there must exist a vertex $v' \in V(G-v)$ with
\[
\deg(v') \ge \Big\lceil\frac{2q(G-v)}{\alpha(G-v)+\tau(G-v)}\Big\rceil \ge \Big\lceil\frac{2\big(\alpha(G)-1+\Gamma(\alpha(G),\tau(G)-1)\big)}{\alpha(G)+\tau(G)-1}\Big\rceil.
\]
Now by Lemma 2.1(iii) and (v) we have that
\[
q(G) = q(G-v) + \deg(v) \ge \alpha(G) - 1 + \Gamma(\alpha(G),\tau(G)-1) + \deg(v') \ge \alpha(G) - 1 + \Gamma(\alpha(G),\tau(G)). \tag{1}
\]
So, if the graph $G$ has an edge $e$ that is a bridge, we have that $c(G') = c(G) + 1 = 2$. Denote by $G_1$ and $G_2$ the connected components of $G - e$. We have two more cases:
Case 2. Assume that $\tau(G_1) > 0$ and $\tau(G_2) > 0$; then $\tau(G_1) \le k$ and $\tau(G_2) \le k$, and by the induction hypothesis we have that
\[
q(G_1) \ge \alpha(G_1) - 1 + \Gamma(\alpha(G_1), \tau(G_1)), \qquad q(G_2) \ge \alpha(G_2) - 1 + \Gamma(\alpha(G_2), \tau(G_2)).
\]
Using the above formulas and Lemma 2.1(iv), and noting that $\alpha(G) = \alpha(G_1) + \alpha(G_2)$ and $\tau(G) = \tau(G_1) + \tau(G_2)$, we have that
\[
q(G) = q(G_1) + q(G_2) + 1 \ge \alpha(G_1) - 1 + \alpha(G_2) - 1 + \Gamma(\alpha(G_1),\tau(G_1)) + \Gamma(\alpha(G_2),\tau(G_2)) + 1
\]
\[
= \alpha(G) - 1 + \Gamma(\alpha(G_1),\tau(G_1)) + \Gamma(\alpha(G_2),\tau(G_2)) \overset{\text{(iv)}}{\ge} \alpha(G) - 1 + \Gamma(\alpha(G),\tau(G)).
\]
Case 3. Assume that no bridge satisfies the above condition, that is, for every bridge of $G$ we have that $\tau(G_1) = 0$ or $\tau(G_2) = 0$. In this case $G$ must be equal to an α-critical graph $G_1$ with a vertex of $G_1$ being the center of a star $K_{1,l}$. Moreover, we have that $\tau(G) = \tau(G_1)$ and $\alpha(G) = l + \alpha(G_1)$, because $G_1$ is vertex-critical and therefore each vertex belongs to a minimum vertex cover. Now, using Case 1 and Lemma 2.1(ii), we obtain
\[
q(G) = l + q(G_1) \ge l + \big(\alpha(G_1) - 1 + \Gamma(\alpha(G_1), \tau(G_1))\big) = \alpha(G) - 1 + \Gamma(\alpha(G_1), \tau(G)) \overset{\text{(ii)}}{\ge} \alpha(G) - 1 + \Gamma(\alpha(G), \tau(G)). \qquad ✷
\]

3  A classification of q-minimal graphs
A 1-linking of a graph $G$ is a new graph $G'$ with the same vertex set as $G$, obtained from $G$ by adding the minimum number of edges such that $G'$ is connected. The graph $G$ is called the subjacent graph of the 1-linking $G'$, and the added edges are called the linking edges. Clearly a 1-linking $G'$ of a disconnected graph $G$ can be obtained by adding $c(G) - 1$ edges, where $c(G)$ is the number of connected components of $G$. This definition is equivalent to the one given in [1] of a tree-linking of a graph.
A graph $G$ is a Turán graph, denoted by $T(a,t)$, if $G$ is the disjoint union of $a-s$ complete graphs with $r$ vertices and $s$ complete graphs with $r+1$ vertices, where $a+t = ra+s$ with $0 \le s < a$. A graph $G$ with covering number $\tau(G) = t$ and stability number $\alpha(G) = a$ is said to be a transformed Turán graph, or TT graph, if either $G$ is isomorphic to $T(a,t)$, or $a \le t \le 2a$ and $G$ can be obtained from $T(a,t)$ by the following construction: take a positive integer $k$ such that $k \le \min\{k_2, k_3\}$, where $k_2, k_3$ denote the number of copies in $T(a,t)$ of $K_2$ and $K_3$, respectively; for every $1 \le i \le k$ replace $j_i$ copies of $K_2$ and one copy of $K_3$ by a cycle $C_{2j_i+3}$, where $j_1 + \cdots + j_k \le k_2$.

Given a 1-linking $G'$ of $G$, we define a leaf in $G'$ as a connected component $G_i$ of $G$ incident to a unique linking edge, or as a connected component $G_i$ with the property that there exists a unique vertex $v$ in $G_i$ such that all linking edges with one end in $G_i$ are incident to $v$.

Lemma 3.1 A graph $G$ is q-minimal if and only if $G$ is a 1-linking of a transformed Turán graph.

Proof: We will use double induction on the stability and covering numbers of the graph. For $\alpha(G) = 1$, $G$ must be a complete graph and the result is clear. Therefore we can assume that $G$ is a q-minimal graph with $\alpha(G) \ge 2$. If $G$ is not 2-connected, the result follows readily from the arguments used in Cases 2 and 3 of the proof of Theorem 2.3 and the induction hypothesis. Hence we can assume that $G$ is 2-connected, in fact, that $G$ is an α-critical graph. Therefore the proof will be complete if we prove that $G$ is an odd cycle. Let $G$ be a q-minimal and α-critical graph and let $v \in V(G)$ be a vertex of maximal degree.

Claim 3.2 $G \setminus v$ is q-minimal.
Proof: Assume that $G \setminus v$ is not q-minimal. Then, by the same arguments as those in Case 1 of the proof of Theorem 2.3 and Lemma 2.1, we have that
\[
q(G) = q(G\setminus v) + \deg(v) \ge q(G\setminus v) + \deg(v') \ge \alpha(G) + \Gamma(\alpha(G),\tau(G)-1) + \Big\lceil\frac{\alpha(G)+\tau(G)-1}{\alpha(G)}\Big\rceil \overset{\text{(iii)}}{\ge} \alpha(G) + \Gamma(\alpha(G),\tau(G)), \tag{2}
\]
which is a contradiction to the q-minimality of $G$. ✷

Since $G$ is α-critical, and in particular τ-critical, we have $\alpha(G) = \alpha(G \setminus v)$; that is, the set of vertices $N(v)$ must satisfy $N(v) \cap M \ne \emptyset$ for every maximum stable set $M$ of $G \setminus v$. Hence $\alpha(G \setminus N[v]) = \alpha(G) - 1$, where $N[v] = N(v) \cup \{v\}$. A set of vertices $N$ in $G \setminus v$ can be the set of neighbors of $v$ in $G$ if and only if $V(G \setminus v) \setminus N$ induces a subgraph $G'$ of $G \setminus v$ with $\alpha(G') = \alpha(G) - 1$. Moreover, $N$ is minimal under inclusion if and only if $G[V(G \setminus v) \setminus N]$ is maximal under inclusion. Now, since $G \setminus v$ is q-minimal, by the induction hypothesis $G \setminus v$ is a 1-linking of a TT graph. In this case it is easy to find the maximal induced subgraph $G'$ of $G \setminus v$ with $\alpha(G') = \alpha(G) - 1$.

Claim 3.3 Let $H$ be a TT graph with connected components $H_1, \dots, H_a$ and let $L$ be a 1-linking of $H$. Take $L'$ to be a maximal induced subgraph of $L$ with $\alpha(L') = \alpha(L) - 1$. Then we have that

(i) $L'$ is induced by the set of vertices $V(L) \setminus V(H_i)$, for some $H_i$ with $\alpha(H_i) = 1$, or
(ii) $L'$ is induced by the set of vertices $V(L) \setminus \{v_1,v_2,v_3\}$, where $\{v_1,v_2,v_3\}$ are vertices of an odd cycle $H_j$ such that $H_j \setminus \{v_1,v_2,v_3\}$ is a disjoint union of paths with an even number of vertices, or

(iii) $L'$ satisfies the following conditions: (1) $V(H_i) \cap V(L') \ne \emptyset$ for all $H_i$; (2) if $H_i$ is an odd cycle, then $V(H_i) \subset V(L')$; (3) if $H_i$ is a complete graph such that $V(H_i) \not\subset V(L')$, then for every $v \in V(H_i) \cap V(L')$ there exists at least one linking edge $e_v$ incident to $v$.

Proof: If $V(L') \cap V(H_i) = \emptyset$ for some $1 \le i \le a$ with $\alpha(H_i) = 1$, then $L' = L[V(H) \setminus V(H_i)]$, since $V(L') \subseteq V(H) \setminus V(H_i)$ and $\alpha(L[V(H) \setminus V(H_i)]) = \alpha(L) - 1$. Therefore we can assume that if $L' \ne L \setminus H_i$ with $\alpha(H_i) = 1$, then $V(H_i) \cap V(L') \ne \emptyset$ for all $H_i$ with $\alpha(H_i) = 1$. If $H_j$ is an odd cycle with $2m+1$ vertices, then, since all proper induced subgraphs of a cycle are paths $P_n$ with $\alpha(P_n) = \lceil n/2 \rceil$, we have $\alpha(H_j \setminus C) = \alpha(H_j) - 1$ for some $C \subset V(H_j)$ if and only if $H_j[C^c]$ is a disjoint union of three paths $P_{m_1}, P_{m_2}, P_{m_3}$ with $m_1, m_2, m_3 \ge 0$ even numbers such that $m_1+m_2+m_3 = 2(m-1)$. Therefore, either $L'$ is described as in (ii) or $V(H_j) \subset V(L')$ for all $H_j$ with $\alpha(H_j) \ge 2$. To finish, if $L'$ is not given by (i) or (ii), then we can assume that $V(L') \cap V(H_i) \ne \emptyset$ for all $1 \le i \le a$; moreover, if $H_j$ is an odd cycle then
$V(H_j) \subset V(L')$. Clearly, if $v \in V(H_i) \cap V(L')$ and $v$ is not incident to any linking edge, then $V(H_i) \subset V(L')$, because $\alpha(L') = \alpha(L[V(L') \cup V(H_i)])$. ✷

Applying Claim 3.3 to $G \setminus v$ it is easy to conclude that

• $G$ is a complete graph whenever $G \setminus N[v]$ is as in (i).

• $G$ is not q-minimal whenever $G \setminus N[v]$ is as in (ii).

Therefore it only remains to consider the case when $G \setminus N[v]$ satisfies the conditions given in (iii). Let $H_{i_0}$ be a complete graph that is a connected component of the subjacent graph of $G \setminus v$ (a TT graph) with $V(H_{i_0}) \not\subset V(G')$ (note that by Claim 3.3(iii) there exists at least one graph $H_i$ with this condition), and take $P = V(H_{i_0}) \cap V(G')$ and $Q = V(H_{i_0}) \setminus P$. Since $G \setminus v$ is q-minimal, we have that for all $u \in P$, $(G \setminus v) \setminus u$ is not connected. For each $u \in P$, let $G_u$ be the union of the connected components of $(G \setminus v) \setminus u$ such that $V(G_u) \cap V(H_{i_0}) = \emptyset$. Note that $G_u$ is an induced subgraph (a disjoint union of 1-linkings of TT graphs) of $G \setminus v$ joined to $u$ by linking edges, and that if $|V(H_{i_0})| \ge 2$, then $G_u$ is unique. Here we need to consider two cases. The first case is when $G_u$ is not a TT graph: if $S$ is a leaf of $G_u$ not joined to $u$ by a linking edge, then by the 2-connectivity of $G$ we have that $v$ must be incident with at least one vertex of $S$. In the other case, when $G_u$ is a TT graph, by the 2-connectivity of $G$ there exists at least one vertex $w$ of $G_u$ incident to $v$, and we can consider that $S = G_u$ is the only leaf of $G_u$.
[Figure: the graph $G \setminus v$, showing the component $H_{i_0}$ with vertex sets $P$ and $Q$, the subgraphs $G_r$, $G_s$ for $r, s \in P$, and leaves $S_1$, $S_2$.]
Moreover, by Claim 3.3(iii) we have that if $v_S$ is the unique vertex of $S$ such that all the linking edges with one end in $S$ are incident to $v_S$, then $v$ must be incident with all the vertices of $S \setminus v_S$. More precisely, we have that
\[
\deg(v) \ge |Q| + \sum_{u\in P}\ \sum_{H_j \in L(G_u)} (|H_j| - 1) \overset{(*)}{\ge} |H_{i_0}|, \tag{3}
\]
where $L(G_u)$ is either the set of leaves of $G_u$ not joined to $u$, when $G_u$ is not a TT graph, or equal to $\{G_u\}$ when $G_u$ is a TT graph. We have equality in $(*)$ if and only if all the leaves of $G_u$ are isomorphic to $K_2$ and, if $G_u$ is not a TT graph, $G_u$ has exactly two leaves.

Now, take $H_{i_0}$ with $|H_{i_0}| = k = \big\lceil |V(G\setminus v)|/\alpha(G) \big\rceil$. Using that
\[
\deg(v) = q(G) - q(G\setminus v) \le \Gamma(\alpha(G),\tau(G)) - \Gamma(\alpha(G),\tau(G)-1) = \Big\lceil\frac{|V(G)|}{\alpha(G)}\Big\rceil - 1 \le k
\]
and $\deg(v) \ge 2k - 2$ (by Equation (3)), $k$ must be equal to 2. A similar argument shows that $k = 2$ when we take $|H_{i_0}| = k = \big\lfloor |V(G\setminus v)|/\alpha(G) \big\rfloor$. Hence $H_{i_0} = K_2$, $\deg(v) = 2$, $\big\lfloor |V(G\setminus v)|/\alpha(G) \big\rfloor \le 3$, $|P| = 1$ and $G_u$ has only two leaves. Therefore $G$ must be an odd cycle, since $G_u$ must be a 1-linking of a TT graph whose components are all isomorphic to $K_2$. ✷
4  A relation between the stability and covering number
In this section we present some relations between two important invariants of a graph $G$: the stability number $\alpha(G)$ and the covering number $\tau(G)$. The origin of our interest in the study of these relations comes from monomial algebras. More precisely, the stability number $\alpha(G)$ of a graph $G$ is equal to the dimension of the Stanley-Reisner ring associated to $G$, and the covering number $\tau(G)$ of $G$ is equal to the height of the ideal associated to $G$. Finally, $\alpha(G) - \delta(G)$ is an upper bound for the depth of this ring.
From the algebraic point of view, an important class of rings is given by those rings whose dimension is equal to their depth; the rings in this class are called Cohen-Macaulay rings. A graph is Cohen-Macaulay if the Stanley-Reisner ring associated to it is Cohen-Macaulay. We have that if a graph $G$ is Cohen-Macaulay, then $\delta(G) = 0$; note that this is a necessary but not a sufficient condition. The family of graphs with $\delta(G) \ge 1$ corresponds to the Stanley-Reisner rings that have a large depth. Moreover, the dimension minus the depth is bounded below by $\delta(G)$, and hence $\delta(G)$ is a measure of how far these rings are from being Cohen-Macaulay.

The following results are in the spirit of [3]; in that paper the authors were motivated by bounding invariants for edge rings. In this paper we concentrate mainly on the combinatorial aspects of these bounds. The theorem below gives an idea of the class of graphs that are Cohen-Macaulay and of those graphs that are far from being Cohen-Macaulay. We thank N. Alon (private communication) for some useful suggestions in making the proof of this result simpler and more readable.

Theorem 4.1 Let $G$ be a graph without isolated vertices, then $\alpha(G) \le \tau(G)[1 + \delta(G)]$.

Proof: First, fix a minimal vertex cover $C$ with $\tau(G)$ vertices. Let $v \in C$; then there exists a maximal stable set $M'$ with $v \in M'$ and $|M'| \ge \sigma_v(G)$. Hence there exist a natural number $k \le \tau(G)$ and maximal stable sets $T_1, \dots, T_k$ with $|T_i| \ge \sigma_v(G)$ such that
\[
C \subset \bigcup_{i=1}^{k} T_i.
\]
Let $M = V \setminus C$ and take $C_i = C \cap T_i$ and $M_i = M \cap T_i$ for all $i = 1, \dots, k$. Since the graph $G$ does not have isolated vertices, for all $v \in M$ there exists an edge $e$ of $G$ with $e = \{v, v'\}$. Now, as $C = V(G) \setminus M$ and $C$ is a vertex cover, we have that $v' \in C$; that is,
\[
M = \bigcup_{i=1}^{k} \big(M \cap N(C_i)\big). \tag{4}
\]
Since $S_i = V(G) \setminus T_i = (C\setminus C_i) \cup (M\setminus M_i)$ is a minimal vertex cover with $|S_i| \le n - \sigma_v(G)$ for all $i = 1,\dots,k$, then
\[
|C\setminus C_i| + |M\setminus M_i| = |(C\setminus C_i) \cup (M\setminus M_i)| = |S_i| \le n - \sigma_v(G).
\]
Hence, as $M \cap N(C_i) \subseteq M \setminus M_i$, we have that
\[
|M \cap N(C_i)| \le |M\setminus M_i| \le n - \sigma_v(G) - |C\setminus C_i| = |C| + \alpha(G) - \sigma_v(G) - |C\setminus C_i| = |C_i| + \alpha(G) - \sigma_v(G) = |C_i| + \delta(G). \tag{5}
\]
Taking
\[
A_i = C_i \setminus \Big(\bigcup_{j=1}^{i-1} C_j\Big) \quad \text{and} \quad B_i = \big(M \cap N(C_i)\big) \setminus \bigcup_{j=1}^{i-1} \big(M \cap N(C_j)\big),
\]
we have that
\[
|C_i \setminus A_i| \le |M \cap N(C_i \setminus A_i)|, \tag{6}
\]
since if $|C_i \setminus A_i| > |M \cap N(C_i \setminus A_i)|$, then $\big(C \setminus (C_i \setminus A_i)\big) \cup \big(M \cap N(C_i \setminus A_i)\big)$ would be a vertex cover of cardinality $|C \setminus (C_i \setminus A_i)| + |M \cap N(C_i \setminus A_i)| < |C|$; a contradiction.
To finish the proof, we use the inequalities (5) and (6) to conclude that

|Bi| = |M ∩ N(Ci)| − |(M ∩ N(Ci)) ∩ ⋃_{j=1}^{i−1} (M ∩ N(Cj))|
     = |M ∩ N(Ci)| − |M ∩ N(Ci) ∩ N(⋃_{j=1}^{i−1} Cj)|
     ≤ |Ci| + α(G) − σv(G) − |M ∩ N(Ci ∩ ⋃_{j=1}^{i−1} Cj)|        (by (5))
     ≤ |Ci| + α(G) − σv(G) − |Ci \ Ai|                             (by (6))
     = |Ai| + α(G) − σv(G) = |Ai| + δ(G).

Therefore, by (4),

α(G) = |⋃_{i=1}^{k} (M ∩ N(Ci))| = Σ_{i=1}^{k} |Bi| ≤ Σ_{i=1}^{k} (|Ai| + δ(G)) ≤ |C| + τ(G)δ(G) = τ(G)[1 + δ(G)]. ✷
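On small graphs the invariants in Theorem 4.1 are easy to check by exhaustive search. The sketch below is our own illustration, not the authors' method; it assumes that σv(G) denotes the maximum size of a stable set containing v and that δ(G) = α(G) − min_v σv(G), and all function names are ours.

```python
from itertools import combinations

def is_stable(adj, S):
    """True if no two vertices of S are adjacent."""
    return all(v not in adj[u] for u, v in combinations(S, 2))

def alpha(adj):
    """Stability number alpha(G), by enumerating all vertex subsets."""
    return max(len(S) for k in range(len(adj) + 1)
               for S in combinations(adj, k) if is_stable(adj, S))

def sigma(adj, v):
    """Maximum size of a stable set containing v (assumed definition of sigma_v)."""
    return max(len(S) for k in range(1, len(adj) + 1)
               for S in combinations(adj, k) if v in S and is_stable(adj, S))

def tau(adj):
    """Vertex cover number; tau(G) = n - alpha(G) by Gallai's identity."""
    return len(adj) - alpha(adj)

def delta(adj):
    """delta(G) = alpha(G) - min_v sigma_v(G) (assumed definition)."""
    return alpha(adj) - min(sigma(adj, v) for v in adj)

# The path 0-1-2-3: a graph without isolated vertices.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
assert alpha(adj) == 2 and tau(adj) == 2 and delta(adj) == 0
assert alpha(adj) <= tau(adj) * (1 + delta(adj))   # Theorem 4.1
```

Here the bound is tight with δ(G) = 0, since the path on four vertices has a perfect matching.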
Remark 4.2 If δ(G) > 0, then we have that α(G) = τ (G)[1 + δ(G)] if and only if G is formed by a clique Kτ (G) with each vertex of this clique being the center of a star K1,δ(G)+1 . Furthermore, if δ(G) = 0 and α(G) = τ (G), then the graph has a perfect matching.
Figure 1: The graph formed by a clique Kτ(G) with each vertex of this clique being the center of a star K1,δ(G)+1.
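The extremal construction of Remark 4.2 can be verified computationally. The sketch below is our own illustration: `max_stable` and `clique_with_stars` are hypothetical helper names, and we again assume δ(G) = α(G) − min_v σv(G) with σv(G) the maximum size of a stable set containing v. It builds the graph of Figure 1 and checks that equality holds in Theorem 4.1.

```python
from itertools import combinations

def max_stable(adj, must=()):
    """Size of a largest stable set containing every vertex in `must` (brute force)."""
    V = list(adj)
    for k in range(len(V), -1, -1):
        for S in combinations(V, k):
            if set(must) <= set(S) and all(v not in adj[u]
                                           for u, v in combinations(S, 2)):
                return k

def clique_with_stars(t, d):
    """Clique on vertices 0..t-1; each clique vertex gets d+1 pendant leaves."""
    adj = {v: set(range(t)) - {v} for v in range(t)}
    nxt = t
    for c in range(t):
        for _ in range(d + 1):
            adj[c].add(nxt)
            adj[nxt] = {c}
            nxt += 1
    return adj

G = clique_with_stars(3, 2)                      # tau(G) = 3, delta(G) = 2
a = max_stable(G)                                # alpha(G): all 9 leaves are stable
t = len(G) - a                                   # tau(G), by Gallai's identity
d = a - min(max_stable(G, (v,)) for v in G)      # delta(G) (assumed definition)
assert (a, t, d) == (9, 3, 2)
assert a == t * (1 + d)                          # equality case of Remark 4.2
```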
Let

αcore(G) = ⋂ { Mi : Mi is a stable set with |Mi| = α(G) }    and    τcore(G) = ⋂ { Ci : Ci is a vertex cover with |Ci| = τ(G) },

be the intersection of all the maximum stable sets and of all the minimum vertex covers of G, respectively. A graph is τ-critical if τ(G \ v) < τ(G) for all the vertices v ∈ V(G); that is, a graph is τ-critical if and only if αcore(G) = ∅. Similarly, we have that G is a B-graph if and only if τcore(G) = ∅. We define Bα∩τ = V(G) \ (αcore(G) ∪ τcore(G)).

Proposition 4.3 Let G be a graph, then V(G) = αcore(G) ⊔ τcore(G) ⊔ Bα∩τ, furthermore

(i) N(αcore(G)) ⊆ τcore(G),
(ii) G[αcore(G)] is a trivial graph,

(iii) G[Bα∩τ] is both a τ-critical graph and a B-graph without isolated vertices.

Proof: Clearly αcore(G) ∩ τcore(G) = ∅. Now, since G[V(G) \ τcore(G)] is a B-graph, we have that αcore(G) ⊂ V(G) \ τcore(G) is the set of isolated vertices of G[V(G) \ τcore(G)]. Therefore N(αcore(G)) ⊆ τcore(G), proving (i). Hence G[αcore(G)] is a graph without edges, giving (ii). Finally, by the definition of Bα∩τ we obtain (iii). ✷

Example 4.4 To illustrate the previous result consider the following graph:
[Figure: a graph on the vertices v1, . . . , v7.]

Since α(G) = 3, τ(G) = 4 and {v3, v4, v5}, {v3, v4, v6}, {v3, v4, v7} are the maximum stable sets of G, we have that

• αcore(G) = {v3, v4},

• τcore(G) = {v1, v2},

• Bα∩τ = {v5, v6, v7}.

Remark 4.5 It is easy to see that if v is an isolated vertex, then v ∈ αcore(G). Similarly, if deg(v) > τ(G), then v does not belong to any stable set with α(G) vertices and therefore v ∈ τcore(G). Note that in general the induced graph G[Bα∩τ] is not necessarily connected.

Corollary 4.6 Let G be a graph, then α(G) − |αcore(G)| ≤ τ(G) − |τcore(G)|.
Proof: By Proposition 4.3 we have that G[Bα∩τ] is a B-graph, and hence δ(G[Bα∩τ]) = 0. Now, since α(G[Bα∩τ]) = α(G) − |αcore(G)| and τ(G[Bα∩τ]) = τ(G) − |τcore(G)|, applying Theorem 4.1 to G[Bα∩τ] we obtain that α(G) − |αcore(G)| ≤ τ(G) − |τcore(G)|.
✷
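Since the graph of Example 4.4 is given only by a drawing, the sketch below uses a different, hypothetical graph (the disjoint union of a path and a triangle) to illustrate αcore(G), τcore(G) and Bα∩τ and to check Corollary 4.6. It relies on the fact that the minimum vertex covers are exactly the complements of the maximum stable sets.

```python
from itertools import combinations

def stable_sets_of_size(adj, k):
    """All stable sets of exactly k vertices (brute force)."""
    return [set(S) for S in combinations(adj, k)
            if all(v not in adj[u] for u, v in combinations(S, 2))]

# Hypothetical example: disjoint union of the path 0-1-2 and the triangle 3-4-5.
adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
n = len(adj)

alpha = max(k for k in range(n + 1) if stable_sets_of_size(adj, k))
tau = n - alpha                                   # Gallai's identity
max_stables = stable_sets_of_size(adj, alpha)
min_covers = [set(adj) - S for S in max_stables]  # complements of maximum stable sets

alpha_core = set.intersection(*max_stables)       # in every maximum stable set
tau_core = set.intersection(*min_covers)          # in every minimum vertex cover
B = set(adj) - (alpha_core | tau_core)            # B_{alpha ∩ tau}

assert alpha_core == {0, 2} and tau_core == {1} and B == {3, 4, 5}
assert alpha - len(alpha_core) <= tau - len(tau_core)   # Corollary 4.6
```

Here α(G) = τ(G) = 3 and the corollary reads 3 − 2 ≤ 3 − 1, with the induced triangle G[Bα∩τ] being both τ-critical and a B-graph, as Proposition 4.3(iii) asserts.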
Remark 4.7 The bound of Corollary 4.6 improves the bound given in [5, Theorem 2.11] for the number of vertices in αcore (G). Their result states: If G is a graph of order n and α(G) > (n + k − min{1, |N (αcore (G))|})/2, for some k ≥ 1, then |αcore (G)| ≥ k + 1. Moreover, if (n + k − min{1, |N (αcore (G))|})/2 is even, then |αcore (G)| ≥ k + 2.
Notice that if α(G) ≥ n/2 + k′/2, our bound gives |αcore(G)| ≥ k′ + |τcore(G)|.
Remark 4.8 After this paper was submitted, the authors learned that Theorem 2.3 was also obtained independently in [2].

Isidoro Gitler
Departamento de Matemáticas, Centro de Investigación y de Estudios Avanzados del IPN, Apartado Postal 14–740, 07000 México City, D.F., igitler@math.cinvestav.mx
Carlos E. Valencia
Departamento de Matemáticas, Centro de Investigación y de Estudios Avanzados del IPN, Apartado Postal 14–740, 07000 México City, D.F., cvalenci@math.cinvestav.mx
References

[1] Bougard N.; Joret G., Turán's theorem and k-connected graphs, manuscript.

[2] Christophe J. et al., Linear inequalities among graph invariants: using GraPHedron to uncover optimal relationships, e-print available on Optimization Online.
[3] Gitler I.; Valencia C. E., Bounds for invariants of edge-rings, Comm. Algebra 33 (2005), 1603–1616.

[4] Gitler I.; Valencia C. E., Bounds for graph invariants, arXiv:math.CO/0510387.

[5] Levit V. E.; Mandrescu E., Combinatorial properties of the family of maximum stable sets of a graph, Discrete Appl. Math. 117 (2002), 149–161.

[6] Ore O., Theory of graphs, American Mathematical Society Colloquium Publications, Vol. XXXVIII, American Math. Society, Providence, R.I., 1962.

[7] Turán P., Eine Extremalaufgabe aus der Graphentheorie, Mat. Fiz. Lapok 48 (1941), 436–452.
Morfismos, Comunicaciones Estudiantiles del Departamento de Matemáticas del CINVESTAV, finished printing in June 2007 at the department's reproduction workshop, located at Av. IPN 2508, Col. San Pedro Zacatenco, México, D.F. 07300. The print run, on imported 36-kilogram opaline paper measuring 34 × 25.5 cm, consists of 500 copies with a Tintoretto green cover.
Technical support: Omar Hernández Orozco.
Contents

A unified approach to continuous-time discounted Markov control processes
Tomás Prieto-Rumeau and Onésimo Hernández-Lerma . . . . . . . . . . . . . . . . . . . 1
On bounds for the stability number of graphs Isidoro Gitler and Carlos E. Valencia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41