Copying prohibited. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. The papers and inks used in this product are eco-friendly.

Art. No 32217
ISBN 978-91-44-11529-0
Edition 3:1
© The Authors and Studentlitteratur 2005, 2016
www.studentlitteratur.se
Studentlitteratur AB, Lund

Cover illustration: Emil Gustavsson
Cover design by Francisco Ortega

Printed by Holmbergs i Malmö AB, Sweden 2016
Contents

I Introduction

1 Modelling and classification
  1.1 Modelling of optimization problems
  1.2 A quick glance at optimization history
  1.3 Classification of optimization models
  1.4 Conventions
  1.5 Applications and modelling examples
  1.6 On optimality conditions
  1.7 Soft and hard constraints
    1.7.1 Definitions
    1.7.2 A derivation of the exterior penalty function
  1.8 A road map through the material
  1.9 Background and didactics
  1.10 Illustrating the theory
  1.11 Notes and further reading

II Fundamentals

2 Analysis and algebra—A summary
  2.1 Reductio ad absurdum
  2.2 Linear algebra
  2.3 Analysis

3 Convex analysis
  3.1 Convexity of sets
  3.2 Polyhedral theory
    3.2.1 Convex hulls
    3.2.2 Polytopes
    3.2.3 Polyhedra
    3.2.4 Fourier Elimination
    3.2.5 Farkas' Lemma
    3.2.6 Minkowski–Weyl Theorem
  3.3 Convex functions
  3.4 Application: the projection of a vector onto a convex set
  3.5 Notes and further reading

III Optimality Conditions

4 An introduction to optimality conditions
  4.1 Local and global optimality
  4.2 Existence of optimal solutions
    4.2.1 A classic result
    4.2.2 ∗Non-standard results
    4.2.3 Special optimal solution sets
  4.3 Optimality conditions for unconstrained optimization
  4.4 Optimality conditions for optimization over convex sets
  4.5 Near-optimality in convex optimization
  4.6 Applications
    4.6.1 Continuity of convex functions
    4.6.2 The Separation Theorem
    4.6.3 Euclidean projection
    4.6.4 Fixed point theorems
  4.7 Notes and further reading

5 Optimality conditions
  5.1 Relations between optimality conditions and CQs at a glance
  5.2 A note of caution
  5.3 Geometric optimality conditions
  5.4 The Fritz John conditions
  5.5 The Karush–Kuhn–Tucker conditions
  5.6 Proper treatment of equality constraints
  5.7 Constraint qualifications
    5.7.1 Mangasarian–Fromovitz CQ (MFCQ)
    5.7.2 Slater CQ
    5.7.3 Linear independence CQ (LICQ)
    5.7.4 Affine constraints
  5.8 Sufficiency of the KKT conditions under convexity
  5.9 Applications and examples
  5.10 Notes and further reading

6 Lagrangian duality
  6.1 The relaxation theorem
  6.2 Lagrangian duality
    6.2.1 Lagrangian relaxation and the dual problem
    6.2.2 Global optimality conditions
    6.2.3 Strong duality for convex programs
    6.2.4 Strong duality for linear and quadratic programs
    6.2.5 Two illustrative examples
  6.3 Differentiability properties of the dual function
    6.3.1 Subdifferentiability of convex functions
    6.3.2 Differentiability of the Lagrangian dual function
  6.4 ∗Subgradient optimization methods
    6.4.1 Convex problems
    6.4.2 Application to the Lagrangian dual problem
    6.4.3 The generation of ascent directions
  6.5 ∗Obtaining a primal solution
    6.5.1 Differentiability at the optimal solution
    6.5.2 Everett's Theorem
  6.6 ∗Sensitivity analysis
    6.6.1 Analysis for convex problems
    6.6.2 Analysis for differentiable problems
  6.7 Applications
    6.7.1 Electrical networks
    6.7.2 A Lagrangian relaxation of the traveling salesman problem
  6.8 Notes and further reading

IV Linear Programming

7 Linear programming: An introduction
  7.1 The manufacturing problem
  7.2 A linear programming model
  7.3 Graphical solution
  7.4 Sensitivity analysis
    7.4.1 An increase in the number of large pieces available
    7.4.2 An increase in the number of small pieces available
    7.4.3 A decrease in the price of the tables
  7.5 The dual of the manufacturing problem
    7.5.1 A competitor
    7.5.2 A dual problem
    7.5.3 Interpretations of the dual optimal solution

8 Linear programming models
  8.1 Linear programming modelling
  8.2 The geometry of linear programming
    8.2.1 Standard form
    8.2.2 Basic feasible solutions and the Representation Theorem
    8.2.3 Adjacent extreme points
  8.3 Notes and further reading

9 The simplex method
  9.1 The algorithm
    9.1.1 A BFS is known
    9.1.2 A BFS is not known: phase I & II
    9.1.3 Alternative optimal solutions
  9.2 Termination
  9.3 Computational complexity
  9.4 Notes and further reading

10 LP duality and sensitivity analysis
  10.1 Introduction
  10.2 The linear programming dual
    10.2.1 Canonical form
    10.2.2 Constructing the dual
  10.3 Linear programming duality theory
    10.3.1 Weak and strong duality
    10.3.2 Complementary slackness
    10.3.3 An alternative derivation of the dual linear program
  10.4 The dual simplex method
  10.5 Sensitivity analysis
    10.5.1 Perturbations in the objective function
    10.5.2 Perturbations in the right-hand side coefficients
    10.5.3 Addition of a variable or a constraint
  10.6 Column generation in linear programming
    10.6.1 The minimum cost multi-commodity network flow problem
    10.6.2 The column generation principle
    10.6.3 An algorithm instance
  10.7 Notes and further reading

V Algorithms

11 Unconstrained optimization
  11.1 Introduction
  11.2 Descent directions
    11.2.1 Introduction
    11.2.2 Newton's method and extensions
    11.2.3 Least-squares problems and the Gauss–Newton algorithm
  11.3 The line search problem
    11.3.1 A characterization of the line search problem
    11.3.2 Approximate line search strategies
  11.4 Convergent algorithms
  11.5 Finite termination criteria
  11.6 A comment on non-differentiability
  11.7 Trust region methods
  11.8 Conjugate gradient methods
    11.8.1 Conjugate directions
    11.8.2 Conjugate direction methods
    11.8.3 Generating conjugate directions
    11.8.4 Conjugate gradient methods
    11.8.5 Extension to non-quadratic problems
  11.9 A quasi-Newton method: DFP
  11.10 Convergence rates
  11.11 Implicit functions
  11.12 Notes and further reading

12 Feasible-direction methods
  12.1 Feasible-direction methods
  12.2 The Frank–Wolfe algorithm
  12.3 The simplicial decomposition algorithm
  12.4 The gradient projection algorithm
  12.5 Application: traffic equilibrium
    12.5.1 Model analysis
    12.5.2 Algorithms and a numerical example
  12.6 Algorithms given by closed maps
    12.6.1 Algorithmic maps
    12.6.2 Closed maps
    12.6.3 A convergence theorem
    12.6.4 Additional results on composite maps
    12.6.5 Convergence of some algorithms defined by closed descent maps
  12.7 Active set methods
    12.7.1 The methods of Zoutendijk, Topkis, and Veinott
    12.7.2 A primal–dual active set method for polyhedral constraints
    12.7.3 Rosen's gradient projection algorithm
  12.8 Reduced gradient algorithms
    12.8.1 Introduction
    12.8.2 The reduced gradient algorithm
    12.8.3 Convergence analysis
  12.9 Notes and further reading

13 Constrained optimization
  13.1 Penalty methods
    13.1.1 Exterior penalty methods
    13.1.2 Interior penalty methods
    13.1.3 Computational considerations
    13.1.4 Applications and examples
  13.2 Sequential quadratic programming
    13.2.1 Introduction
    13.2.2 A penalty-function based SQP algorithm
    13.2.3 A numerical example on the MSQP algorithm
    13.2.4 On recent developments in SQP algorithms
  13.3 A summary and comparison
  13.4 Notes and further reading

VI Exercises

14 Exercises
  Chapter 1: Modelling and classification
  Chapter 3: Convexity
  Chapter 4: An introduction to optimality conditions
  Chapter 5: Optimality conditions
  Chapter 6: Lagrangian duality
  Chapter 8: Linear programming models
  Chapter 9: The simplex method
  Chapter 10: LP duality and sensitivity analysis
  Chapter 11: Unconstrained optimization
  Chapter 12: Feasible direction methods
  Chapter 13: Constrained optimization

15 Answers to the exercises
  Chapter 1: Modelling and classification
  Chapter 3: Convexity
  Chapter 4: An introduction to optimality conditions
  Chapter 5: Optimality conditions
  Chapter 6: Lagrangian duality
  Chapter 8: Linear programming models
  Chapter 9: The simplex method
  Chapter 10: LP duality and sensitivity analysis
  Chapter 11: Unconstrained optimization
  Chapter 12: Optimization over convex sets
  Chapter 13: Constrained optimization

References

Index
III Convex analysis
This chapter is devoted to the theory of convex sets and functions. Section 3.1 provides the definition of a convex set and some examples of convex as well as non-convex sets. Section 3.2 is devoted to the theory of polyhedra, introducing important notions such as convex and affine hulls, and including the Representation Theorem, which has a crucial role in linear programming. The first main building block when deriving these results is Farkas' Lemma and several relatives thereof, all dealing with the relationship between the solvability of two related linear systems of equations and inequalities; such results are known as "theorems of the alternative". Section 3.3 is devoted to characterizations of convex functions utilizing the differentiability properties of the function in question, and in particular establishes a characterization of the convexity of a function through the convexity of its epigraph. Section 3.4 applies the theory to the projection of a vector onto a convex set.
3.1 Convexity of sets
Definition 3.1 (convex set) Let S ⊆ R^n. The set S is convex if

x^1, x^2 ∈ S  =⇒  λx^1 + (1 − λ)x^2 ∈ S  for all λ ∈ (0, 1)

holds. A set S is convex if, from everywhere in S, all other points of S are "visible." Figure 3.1 illustrates a convex set. Two non-convex sets are shown in Figure 3.2.

Example 3.2 (convex and non-convex sets) By using the definition of a convex set, the following can be established:
Figure 3.1: A convex set. (For the intermediate vector λx^1 + (1 − λ)x^2 shown, the value of λ is ≈ 1/2.)

Figure 3.2: Two non-convex sets.
(a) The set R^n is a convex set.
(b) The empty set is a convex set.
(c) The set { x ∈ R^n | ‖x‖ ≤ a } is convex for every value of a ∈ R. [Note: ‖·‖ here denotes any vector norm, but we will almost always use the 2-norm,

‖x‖_2 := √(∑_{j=1}^{n} x_j^2).

We will not write the index 2, but instead use the 2-norm implicitly whenever writing ‖·‖.]
(d) The set { x ∈ R^n | ‖x‖ = a } is non-convex for every a > 0.
(e) The set {0, 1, 2} is non-convex. (The second illustration in Figure 3.2 is a case of a non-convex set of integral points in R^2.)
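The membership test in Definition 3.1 is easy to probe numerically. The following Python sketch (our illustration, not part of the book) samples convex combinations of points in the Euclidean unit ball of item (c), which always stay inside, and exhibits the failure for the sphere of item (d).

```python
import numpy as np

rng = np.random.default_rng(0)

def in_ball(x, a=1.0):
    # Membership test for the set { x | ||x||_2 <= a } of item (c).
    return np.linalg.norm(x) <= a + 1e-12

# Sample pairs of points in the unit ball and random lambda in (0, 1);
# every convex combination stays in the ball, as Definition 3.1 requires.
for _ in range(1000):
    x1, x2 = rng.normal(size=3), rng.normal(size=3)
    x1 /= max(1.0, np.linalg.norm(x1))   # rescale into the ball if needed
    x2 /= max(1.0, np.linalg.norm(x2))
    lam = rng.uniform(0.0, 1.0)
    assert in_ball(lam * x1 + (1.0 - lam) * x2)

# The sphere of item (d) fails the test: the midpoint of two antipodal
# points is the origin, whose norm is 0, not 1.
e = np.array([1.0, 0.0, 0.0])
print(np.linalg.norm(0.5 * e + 0.5 * (-e)))   # 0.0
```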
Proposition 3.3 (convex intersection) Suppose that S_k, k ∈ K, is any collection of convex sets. Then, the intersection

S := ∩_{k∈K} S_k

is a convex set.

Proof. Let both x^1 and x^2 belong to S. (If two such points cannot be found, then the result holds vacuously.) Then, x^1 ∈ S_k and x^2 ∈ S_k for all k ∈ K. Take λ ∈ (0, 1). Then, λx^1 + (1 − λ)x^2 ∈ S_k for all k ∈ K, by the convexity of the sets S_k. So, λx^1 + (1 − λ)x^2 ∈ ∩_{k∈K} S_k = S.
3.2 Polyhedral theory

3.2.1 Convex hulls
Consider the set V := {v^1, v^2}, where v^1, v^2 ∈ R^n and v^1 ≠ v^2. A set naturally related to V is the line in R^n through v^1 and v^2 [see Figure 3.3(b)], that is,

{ λv^1 + (1 − λ)v^2 | λ ∈ R } = { λ_1 v^1 + λ_2 v^2 | λ_1, λ_2 ∈ R; λ_1 + λ_2 = 1 }.

Another set naturally related to V is the line segment between v^1 and v^2 [see Figure 3.3(c)], that is,

{ λv^1 + (1 − λ)v^2 | λ ∈ [0, 1] } = { λ_1 v^1 + λ_2 v^2 | λ_1, λ_2 ≥ 0; λ_1 + λ_2 = 1 }.

Motivated by these examples we define the affine hull and the convex hull of a finite set in R^n.

Definition 3.4 (affine hull) The affine hull of a set V ⊆ R^n, denoted aff V, is the smallest affine subspace of R^n that includes V.

Definition 3.5 (convex hull) The convex hull of a set V ⊆ R^n, denoted conv V, is the smallest convex set that includes V.

Remark 3.6 Defining the convex hull of a set V ⊆ R^n as the smallest set containing V may seem ambiguous; after all, we have not yet shown that such a set even exists! However, letting F be the family of convex sets in R^n containing V, we may write conv V = ∩_{C∈F} C, since this is a convex set (by Proposition 3.3) and by construction no smaller convex set can contain V, since such a set would itself be a member of the family F. The same remark applies to affine hulls, where, to show that the affine hull is well defined, we need to prove the analogue of Proposition 3.3 for affine subspaces. This is left as an exercise.

A convex combination of a set of points is a "weighted average" of those points, which is formalized in the following definition.

Definition 3.7 (affine combinations, convex combinations) An affine combination of the points {v^1, . . . , v^k} is a vector v satisfying
v = ∑_{i=1}^{k} λ_i v^i,

where ∑_{i=1}^{k} λ_i = 1. If the weights additionally satisfy λ_i ≥ 0 for all i = 1, . . . , k, the vector v is called a convex combination. Note that the number of points in the sum defining convex and affine combinations is always assumed to be finite.
Figure 3.3: (a) The set V. (b) The set aff V. (c) The set conv V.
Example 3.8 (affine hull, convex hull) (a) The affine hull of three or more points in R^2 not all lying on the same line is R^2 itself. The convex hull of five points in R^2 is shown in Figure 3.4 (observe that the "corners" of the convex hull of the points are some of the points themselves). (b) The affine hull of three points not all lying on the same line in R^3 is the plane through the points. (c) The affine hull of an affine space is the space itself, and the convex hull of a convex set is the set itself.

A more algebraic description of convex hulls is given by the following proposition.

Proposition 3.9 Let V ⊆ R^n. Then, conv V is the set of all convex combinations of points of V.
Figure 3.4: The convex hull of five points in R^2.
Proof. Let Q be the set of all convex combinations of points of V. We first establish that conv V ⊆ Q. To see this, we simply note that V ⊆ Q, and that Q is clearly convex. Since conv V is the smallest set with this property, conv V ⊆ Q.

To establish that Q ⊆ conv V, we proceed with an induction argument. Let

x = ∑_{i=1}^{m} α_i x^i,      (3.1)

where α_i ≥ 0 for i = 1, . . . , m, ∑_{i=1}^{m} α_i = 1, and x^i ∈ V for all i. For m = 1, we have x = x^1 for some x^1 ∈ V; clearly, x ∈ conv V. Suppose then that for some m ≥ 1, any vector x given by (3.1), under the given conditions on the weights α_i, belongs to conv V, and let

x = ∑_{i=1}^{m+1} α_i x^i,

with α_i ≥ 0 for i = 1, . . . , m + 1, ∑_{i=1}^{m+1} α_i = 1, and x^i ∈ V for all i. Without loss of generality we assume that α_{m+1} < 1. We may then write x = (1 − α_{m+1})z + α_{m+1}x^{m+1}, where

z = ∑_{i=1}^{m} (α_i / (1 − α_{m+1})) x^i.

As α_i/(1 − α_{m+1}) ≥ 0 for i = 1, . . . , m, and ∑_{i=1}^{m} α_i/(1 − α_{m+1}) = 1, it follows from the induction hypothesis that z ∈ conv V. Since conv V is convex and x^{m+1} ∈ V ⊆ conv V, the vector x, being a convex combination of z and x^{m+1}, belongs to conv V. Combining the inclusions gives Q = conv V.
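Proposition 3.9 also yields a computational membership test: x ∈ conv V exactly when the linear system ∑_i λ_i v^i = x, ∑_i λ_i = 1, λ ≥ 0 is feasible. The following Python sketch (our illustration, not from the book; the data are made up since the points of Figure 3.4 are not given numerically) checks this with scipy.optimize.linprog.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, V):
    """Return True if x is a convex combination of the columns of V.

    V is an (n, k) array whose columns are the points v^1, ..., v^k.
    Feasibility LP: find lambda >= 0 with V @ lambda = x and
    sum(lambda) = 1; the objective is identically zero.
    """
    n, k = V.shape
    A_eq = np.vstack([V, np.ones((1, k))])   # rows: V lambda = x, 1^T lambda = 1
    b_eq = np.append(x, 1.0)
    res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, None)] * k, method="highs")
    return res.status == 0                   # status 0: a feasible lambda exists

# Made-up instance: the four corners of the unit square in R^2.
V = np.array([[0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
print(in_convex_hull(np.array([0.5, 0.5]), V))   # True: centre of the square
print(in_convex_hull(np.array([1.5, 0.5]), V))   # False: outside the square
```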
A similar result holds for affine hulls.

Proposition 3.10 Let V ⊆ R^n. Then, aff V is the set of all affine combinations of points of V.

Proof. The proof is similar to that of Proposition 3.9; we leave it as an exercise.

We can summarize the above two propositions as follows: for any V ⊆ R^n we have

conv V = { ∑_{i=1}^{k} λ_i v^i | v^1, . . . , v^k ∈ V; ∑_{i=1}^{k} λ_i = 1; λ_1, . . . , λ_k ≥ 0 },

aff V = { ∑_{i=1}^{k} λ_i v^i | v^1, . . . , v^k ∈ V; ∑_{i=1}^{k} λ_i = 1 }.

We emphasize that the sums in the above two expressions are assumed to be finite.

Proposition 3.9 shows that every point of the convex hull of a set can be written as a convex combination of points from the set. It tells, however, nothing about how many points are required. This is the content of Carathéodory's Theorem.

Theorem 3.11 (Carathéodory's Theorem) Let x ∈ conv V, where V ⊆ R^n. Then, x can be expressed as a convex combination of n + 1 or fewer points of V.

Proof. From Proposition 3.9 it follows that x = λ_1 a^1 + · · · + λ_m a^m for some a^1, . . . , a^m ∈ V and λ_1, . . . , λ_m ≥ 0 such that ∑_{i=1}^{m} λ_i = 1. We assume that this representation of x is chosen so that x cannot be expressed as a convex combination of fewer than m points of V. It follows that no two of the points a^1, . . . , a^m are equal and that λ_1, . . . , λ_m > 0. We prove the theorem by showing that m ≤ n + 1.

Assume that m > n + 1. Then the set {a^1, . . . , a^m} must be affinely dependent, so there exist α_1, . . . , α_m ∈ R, not all zero, such that ∑_{i=1}^{m} α_i a^i = 0^n and ∑_{i=1}^{m} α_i = 0. Let ε > 0 be such that λ_1 + εα_1, . . . , λ_m + εα_m are non-negative with at least one of them zero (such an ε exists since the λ's are all positive and at least one of the α's must be negative). Then, x = ∑_{i=1}^{m} (λ_i + εα_i)a^i, and if terms with zero coefficients are omitted then this is a representation of x with fewer than m points; this is a contradiction.
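The proof of Theorem 3.11 is constructive: as long as more than n + 1 points carry positive weight, find an affine dependence among them and shift the weights until at least one vanishes. A Python sketch of this reduction follows (our illustration with made-up data; the function name caratheodory_reduce is ours).

```python
import numpy as np

def caratheodory_reduce(points, weights, tol=1e-10):
    """Thin a convex combination sum_i w_i p^i down to at most n+1 points.

    points:  (m, n) array whose rows are p^1, ..., p^m in R^n
    weights: (m,) array with w >= 0 and sum(w) = 1
    Returns (points, weights) representing the same vector x.
    """
    P, w = np.asarray(points, float), np.asarray(weights, float)
    while len(w) > P.shape[1] + 1:
        # Affine dependence: alpha with sum_i alpha_i p^i = 0 and
        # sum_i alpha_i = 0, i.e. a nullspace vector of [P^T; 1^T].
        M = np.vstack([P.T, np.ones(len(w))])
        alpha = np.linalg.svd(M)[2][-1]      # right singular vector, M @ alpha ~ 0
        neg = alpha < -tol
        if not np.any(neg):                  # ensure some alpha_i < 0 (sum is 0)
            alpha, neg = -alpha, -alpha < -tol
        # Largest eps keeping w + eps*alpha >= 0; one component hits zero.
        eps = np.min(-w[neg] / alpha[neg])
        w = w + eps * alpha
        keep = w > tol                       # drop the vanished terms
        P, w = P[keep], w[keep]
    return P, w

# Hypothetical data: a point of R^2 written with five points; at most
# three remain after the reduction, as Theorem 3.11 guarantees.
P = np.array([[1, 0], [-1, 0], [0, 1], [0, -1], [0.5, 0.5]], dtype=float)
w = np.array([0.25, 0.25, 0.2, 0.2, 0.1])
x = w @ P
P2, w2 = caratheodory_reduce(P, w)
print(len(w2) <= 3, np.allclose(w2 @ P2, x))   # True True
```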
Again, a similar result holds for affine hulls; we state it below and leave the proof as an exercise.

Proposition 3.12 Let x ∈ aff V, where V ⊆ R^n. Then, x can be expressed as an affine combination of n + 1 or fewer points of V.
3.2.2 Polytopes
We are now ready to define the geometrical object polytope.

Definition 3.13 (polytope) A subset P of R^n is a polytope if it is the convex hull of finitely many points in R^n.

Example 3.14 (polytopes) (a) The set shown in Figure 3.4 is a polytope. (b) A cube and a tetrahedron are polytopes in R^3.

We next show how to characterize a polytope as the convex hull of its extreme points.

Definition 3.15 (extreme point) A point v of a convex set P is called an extreme point if whenever v = λx^1 + (1 − λ)x^2 holds, where x^1, x^2 ∈ P and λ ∈ (0, 1), then v = x^1 = x^2.

Example 3.16 (extreme points) The set shown in Figure 3.3(c) has the extreme points v^1 and v^2. The set shown in Figure 3.4 has the extreme points v^1, v^2, and v^3. The set shown in Figure 3.3(b) does not have any extreme points.

Lemma 3.17 Let V := {v^1, . . . , v^k} ⊂ R^n and let P be the polytope conv V. Then, each extreme point of P lies in V.

Proof. Assume that w ∉ V is an extreme point of P. We have that w = ∑_{i=1}^{k} λ_i v^i for some λ_i ≥ 0 such that ∑_{i=1}^{k} λ_i = 1. At least one of the coefficients λ_i must be nonzero, say λ_1. If λ_1 = 1 then w = v^1 (a contradiction, since w ∉ V), so λ_1 ∈ (0, 1). We have that

w = λ_1 v^1 + (1 − λ_1) ∑_{i=2}^{k} (λ_i / (1 − λ_1)) v^i.

Since ∑_{i=2}^{k} λ_i/(1 − λ_1) = 1 we have that ∑_{i=2}^{k} (λ_i/(1 − λ_1)) v^i ∈ P, but w is an extreme point of P, so w = v^1, a contradiction.
Proposition 3.18 Let V := {v^1, . . . , v^k} ⊂ R^n and let P be the polytope conv V. Then P is equal to the convex hull of its extreme points.

Proof. Let Q be the set of extreme points of P. If v^i ∈ Q for all i = 1, . . . , k then we are done, so assume that v^1 ∉ Q. Then v^1 = λu + (1 − λ)w for some λ ∈ (0, 1) and u, w ∈ P, u ≠ w. Further, u = ∑_{i=1}^{k} α_i v^i and w = ∑_{i=1}^{k} β_i v^i for some α_1, . . . , α_k, β_1, . . . , β_k ≥ 0 such that ∑_{i=1}^{k} α_i = ∑_{i=1}^{k} β_i = 1. Hence,

v^1 = λ ∑_{i=1}^{k} α_i v^i + (1 − λ) ∑_{i=1}^{k} β_i v^i = ∑_{i=1}^{k} (λα_i + (1 − λ)β_i) v^i.

It must hold that α_1, β_1 ≠ 1, since otherwise u = w = v^1, a contradiction. Therefore,

v^1 = ∑_{i=2}^{k} ((λα_i + (1 − λ)β_i) / (1 − (λα_1 + (1 − λ)β_1))) v^i,

and since ∑_{i=2}^{k} (λα_i + (1 − λ)β_i)/(1 − λα_1 − (1 − λ)β_1) = 1 it follows that conv V = conv (V \ {v^1}). Similarly, every v^i ∉ Q can be removed, and we end up with a set T ⊆ V such that conv T = conv V and T ⊆ Q. On the other hand, from Lemma 3.17 we have that every extreme point of the set conv T lies in T, and since conv T = conv V it follows that Q is the set of extreme points of conv T, so Q ⊆ T. Hence, T = Q and we are done.
3.2.3 Polyhedra
Closely related to the polytope is the polyhedron. We will show that every polyhedron is the sum of a polytope and a polyhedral cone. In Section 3.2.6 we show that a set is a polytope if and only if it is a bounded polyhedron.

Definition 3.19 (polyhedron) A subset P of R^n is a polyhedron if there exist a matrix A ∈ R^{m×n} and a vector b ∈ R^m such that P = { x ∈ R^n | Ax ≤ b } holds.

The importance of polyhedra is obvious, since the set of feasible solutions of every linear programming problem (see Chapters 7 and 8) is a polyhedron.
Example 3.20 (polyhedra) (a) Figure 3.5 shows the bounded polyhedron P := { x ∈ R^2 | x_1 ≥ 2; x_1 + x_2 ≤ 6; 2x_1 − x_2 ≤ 4 }. (b) The unbounded polyhedron P := { x ∈ R^2 | x_1 + x_2 ≥ 2; x_1 − x_2 ≤ 2; 3x_1 − x_2 ≥ 0 } is shown in Figure 3.6.
Figure 3.5: Illustration of the bounded polyhedron P := { x ∈ R^2 | x_1 ≥ 2; x_1 + x_2 ≤ 6; 2x_1 − x_2 ≤ 4 }.

Often it is hard to decide whether a point in a convex set is an extreme point or not. This is not the case for the polyhedron, since there is an algebraic characterization of the extreme points of such a set. Given an x̃ ∈ { x ∈ R^n | Ax ≤ b } we refer to the rows of Ax̃ ≤ b that are fulfilled with equality as the equality subsystem of Ax̃ ≤ b, and denote it by Ãx̃ = b̃; that is, (Ã, b̃) consists of the rows i ∈ {1, . . . , m} of (A, b) such that A_{i·}x̃ = b_i, where A_{i·} is the ith row of A. The number of rows in (Ã, b̃) is denoted by m̃.

Theorem 3.21 (algebraic characterization of extreme points) Let x̃ ∈ P = { x ∈ R^n | Ax ≤ b }, where A ∈ R^{m×n} has rank A = n and b ∈ R^m. Further, let Ãx̃ = b̃ be the equality subsystem of Ax̃ ≤ b. Then x̃ is an extreme point of P if and only if rank à = n.
Figure 3.6: Illustration of the unbounded polyhedron P := { x ∈ R^2 | x_1 + x_2 ≥ 2; x_1 − x_2 ≤ 2; 3x_1 − x_2 ≥ 0 }.
Proof. [=⇒] Suppose that x̃ is an extreme point of P. If Ax̃ < b then x̃ + ε1^n, x̃ − ε1^n ∈ P if ε > 0 is sufficiently small. But x̃ = (1/2)(x̃ + ε1^n) + (1/2)(x̃ − ε1^n), which contradicts that x̃ is an extreme point; so at least one of the rows in Ax̃ ≤ b is fulfilled with equality. If Ãx̃ = b̃ is the equality subsystem of Ax̃ ≤ b and rank à ≤ n − 1, then there exists a w ≠ 0^n such that Ãw = 0^m̃, so x̃ + εw, x̃ − εw ∈ P if ε > 0 is sufficiently small. But x̃ = (1/2)(x̃ + εw) + (1/2)(x̃ − εw), which contradicts that x̃ is an extreme point. Hence, rank à = n.

[⇐=] Assume that rank à = n. Then x̃ is the unique solution to Ãx = b̃. If x̃ is not an extreme point of P it follows that x̃ = λu + (1 − λ)v for some λ ∈ (0, 1) and u, v ∈ P, u ≠ v. This yields that λÃu + (1 − λ)Ãv = b̃, and since Ãu ≤ b̃ and Ãv ≤ b̃ it follows that Ãu = Ãv = b̃, which contradicts that x̃ is the unique solution to Ãx = b̃. Therefore x̃ must be an extreme point.
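Theorem 3.21 suggests a brute-force procedure for listing all extreme points of P = { x ∈ R^n | Ax ≤ b }: for each choice of n linearly independent rows of A, solve the corresponding equality system and keep the solution if it is feasible. The Python sketch below (our illustration, not from the book) does this for the polyhedron of Figure 3.5; its search over all row subsets is exactly the counting argument of Corollary 3.22 below.

```python
import itertools
import numpy as np

def extreme_points(A, b, tol=1e-9):
    """Enumerate the extreme points of { x in R^n | A x <= b } by brute
    force over all n-row subsets (Theorem 3.21 / Corollary 3.22); only
    sensible for small m and n."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    m, n = A.shape
    points = []
    for rows in itertools.combinations(range(m), n):
        A_sub, b_sub = A[list(rows)], b[list(rows)]
        if np.linalg.matrix_rank(A_sub) < n:
            continue                        # rows linearly dependent
        x = np.linalg.solve(A_sub, b_sub)   # unique candidate point
        if np.all(A @ x <= b + tol):        # keep it only if feasible
            points.append(x)
    if not points:
        return np.zeros((0, n))
    return np.unique(np.round(points, 9), axis=0)

# The bounded polyhedron of Figure 3.5: x1 >= 2, x1 + x2 <= 6, 2x1 - x2 <= 4.
A = np.array([[-1.0, 0.0], [1.0, 1.0], [2.0, -1.0]])
b = np.array([-2.0, 6.0, 4.0])
print(extreme_points(A, b))   # the three corners (2,0), (2,4), (10/3, 8/3)
```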
Corollary 3.22 Let A ∈ R^{m×n} and b ∈ R^m. The number of extreme points of the polyhedron P := { x ∈ R^n | Ax ≤ b } is finite.

Proof. The theorem implies that the number of extreme points of P never exceeds the number of ways in which n objects can be chosen from a set of m objects, that is, the number of extreme points is less than or equal to the binomial coefficient m!/(n!(m − n)!). We are done.

Remark 3.23 Since the number of extreme points is finite, the convex hull of the extreme points of a polyhedron is a polytope. For example, the polytope P in Figure 3.5 is defined by three extreme points.

Definition 3.24 (cone) A subset C of R^n is a cone if λx ∈ C whenever x ∈ C and λ > 0.

Example 3.25 (cone) (a) The set { x ∈ R^n | Ax ≤ 0^m }, where A ∈ R^{m×n}, is a cone. Since this set is a polyhedron, this type of cone is usually called a polyhedral cone. (b) Figure 3.7(a) illustrates a polyhedral convex cone in R^2. Figure 3.7(b) illustrates a non-convex cone in R^2; as it is non-convex, it is not a polyhedral cone.

Figure 3.7: (a) A convex cone in R^2. (b) A non-convex cone in R^2.

We have arrived at the most important theorem of this section, namely the Representation Theorem, which tells us that every polyhedron is the sum of a polytope and a polyhedral cone. The Representation Theorem will have great importance in the linear programming theory in Chapter 8.

Theorem 3.26 (Representation Theorem) Let A ∈ R^{m×n} and b ∈ R^m. Let Q := { x ∈ R^n | Ax ≤ b }, let P denote the convex hull of the extreme points of Q, and let C := { x ∈ R^n | Ax ≤ 0^m }. If rank A = n then

Q = P + C := { x ∈ R^n | x = u + v for some u ∈ P and v ∈ C }.

In other words, every polyhedron (that has at least one extreme point) is the sum of a polytope and a polyhedral cone.

Proof. Let x̃ ∈ Q and let Ãx̃ = b̃ be the corresponding equality subsystem of Ax̃ ≤ b. We prove the theorem by induction on the rank of Ã.

If rank à = n it follows from Theorem 3.21 that x̃ is an extreme point of Q, so x̃ ∈ P + C, since 0^n ∈ C. Now assume that x̃ ∈ P + C for all x̃ ∈ Q with k ≤ rank à ≤ n, and choose an x̃ ∈ Q with rank à = k − 1. Then there exists a w ≠ 0^n such that Ãw = 0^m̃. If |λ| is sufficiently small it follows that x̃ + λw ∈ Q. (Why?) If x̃ + λw ∈ Q for all λ ∈ R we must have Aw = 0^m, which implies rank A ≤ n − 1, a contradiction. Suppose that there exists a largest λ⁺ such that x̃ + λ⁺w ∈ Q. If Ā(x̃ + λ⁺w) = b̄ is the equality subsystem of A(x̃ + λ⁺w) ≤ b, then we must have rank Ā ≥ k. (Why?) By the induction hypothesis it then follows that x̃ + λ⁺w ∈ P + C. On the other hand, if x̃ + λw ∈ Q for all λ ≥ 0 then Aw ≤ 0^m, so w ∈ C. Similarly, if x̃ + λ(−w) ∈ Q for all λ ≥ 0 then −w ∈ C, and if there exists a largest λ⁻ such that x̃ + λ⁻(−w) ∈ Q then x̃ + λ⁻(−w) ∈ P + C.

Above we obtained a contradiction if neither λ⁺ nor λ⁻ exists. If only one of them exists, say λ⁺, then x̃ + λ⁺w ∈ P + C and −w ∈ C, and it follows that x̃ ∈ P + C. Otherwise, if both λ⁺ and λ⁻ exist, then x̃ + λ⁺w ∈ P + C and x̃ + λ⁻(−w) ∈ P + C, and x̃ can be written as a convex combination of these points, which gives x̃ ∈ P + C. We have shown that x̃ ∈ P + C for all x̃ ∈ Q with k − 1 ≤ rank à ≤ n, and the theorem follows by induction.
Example 3.27 (illustration of the Representation Theorem) Figure 3.8(a) shows a bounded polyhedron. The interior point x̃ can be written as a convex combination of the extreme point x^5 and the point v on the boundary, that is, there is a λ ∈ (0, 1) such that

x̃ = λx^5 + (1 − λ)v.

Further, the point v can be written as a convex combination of the extreme points x^2 and x^3, that is, there exists a µ ∈ (0, 1) such that

v = µx^2 + (1 − µ)x^3.

This gives that

x̃ = λx^5 + (1 − λ)µx^2 + (1 − λ)(1 − µ)x^3,

and since λ, (1 − λ)µ, (1 − λ)(1 − µ) ≥ 0 and

λ + (1 − λ)µ + (1 − λ)(1 − µ) = 1

holds, we have that x̃ lies in the convex hull of the extreme points x^2, x^3, and x^5.

Figure 3.8(b) shows an unbounded polyhedron. The interior point x̃ can be written as a convex combination of the extreme point x^3 and the point v on the boundary, that is, there exists a λ ∈ (0, 1) such that

x̃ = λx^3 + (1 − λ)v.

The point v lies on the half-line { x ∈ R^2 | x = x^2 + µ(x^1 − x^2); µ ≥ 0 }. All the points on this half-line are feasible, which gives that if the polyhedron is given by { x ∈ R^2 | Ax ≤ b } then

A(x^2 + µ(x^1 − x^2)) = Ax^2 + µA(x^1 − x^2) ≤ b,  µ ≥ 0.

But then we must have that A(x^1 − x^2) ≤ 0^2, since otherwise some component of µA(x^1 − x^2) tends to infinity as µ tends to infinity. Therefore x^1 − x^2 lies in the cone C := { x ∈ R^2 | Ax ≤ 0^2 }. Now there exists a µ ≥ 0 such that

v = x^2 + µ(x^1 − x^2),

and it follows that

x̃ = λx^3 + (1 − λ)x^2 + (1 − λ)µ(x^1 − x^2),

so since (1 − λ)µ ≥ 0 and x^1 − x^2 ∈ C, x̃ is the sum of a point in the convex hull of the extreme points and a point in the polyhedral cone C.

Note that the representation of a vector x̃ in a polyhedron is normally not uniquely determined; in the case of Figure 3.8(a), for example, we can also represent x̃ as a convex combination of x^1, x^4, and x^5.

Note finally that the unbounded polyhedron in (b) can be represented as the sum of the convex hull of the vectors x^1, x^2, x^3, and x^4, and the cone (or conical hull) of the two extreme directions of the set, which may be represented by the two vectors x^1 − x^2 and x^4 − x^3.
Figure 3.8: Illustration of the Representation Theorem (a) in the bounded case, and (b) in the unbounded case.
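The decomposition asserted by the Representation Theorem can also be computed: writing x̃ = ∑_i λ_i x^i + v with λ ≥ 0, ∑_i λ_i = 1, and Av ≤ 0^m is a linear feasibility problem. The following Python sketch (our illustration; the instance is made up and is not the polyhedron of Figure 3.8) uses scipy.optimize.linprog.

```python
import numpy as np
from scipy.optimize import linprog

def represent(x, E, A):
    """Split x as E @ lam + v with lam >= 0, sum(lam) = 1 and A v <= 0,
    i.e. the decomposition Q = P + C of Theorem 3.26.

    E is (n, p); its columns are the extreme points of Q = { x | A x <= b }.
    Returns (lam, v), or None if no such decomposition exists.
    """
    n, p = E.shape
    m = A.shape[0]
    # Variables z = (lam_1, ..., lam_p, v_1, ..., v_n); v is free.
    A_eq = np.block([[E, np.eye(n)],
                     [np.ones((1, p)), np.zeros((1, n))]])
    b_eq = np.append(x, 1.0)
    A_ub = np.hstack([np.zeros((m, p)), A])          # encodes A v <= 0
    res = linprog(c=np.zeros(p + n), A_ub=A_ub, b_ub=np.zeros(m),
                  A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, None)] * p + [(None, None)] * n,
                  method="highs")
    return (res.x[:p], res.x[p:]) if res.status == 0 else None

# Made-up instance: Q = { x in R^2 | x >= 0, x1 + x2 >= 1 } has the extreme
# points (1,0) and (0,1) and the recession cone C = { v | v >= 0 }.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [-1.0, -1.0]])
E = np.array([[1.0, 0.0],
              [0.0, 1.0]])
lam, v = represent(np.array([2.0, 3.0]), E, A)
print(lam, v)   # one valid split, e.g. (2,3) = 1*(1,0) + 0*(0,1) + (1,3)
```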
3.2.4 Fourier Elimination
Given a system of linear inequalities, a natural question to ask is whether or not it has a solution. A simple method to answer this question was devised by Fourier [Fou1826]. Fourier's method is similar to Gaussian elimination, performing row operations to eliminate one variable at a time. In addition, it will be seen in Section 3.2.5 that Fourier elimination can be used to derive Farkas' Lemma, with important consequences in the subsequent discussions.

Let A ∈ R^{m×n} and b ∈ R^m be given. We are interested in determining if the system Ax ≤ b has a solution. Our first step to answer the question is to reduce this question to another question for a system with n − 1 variables. Specifically, we aim to derive necessary and sufficient conditions for which, given a vector (x̄_1, . . . , x̄_{n−1})^T ∈ R^{n−1}, there exists x̄_n ∈ R such that (x̄_1, . . . , x̄_n)^T satisfies Ax ≤ b. Let I := {1, . . . , m}, and define

I⁺ := { i ∈ I : a_{in} > 0 },  I⁻ := { i ∈ I : a_{in} < 0 },  I⁰ := { i ∈ I : a_{in} = 0 }.

For each i ∈ I⁺ ∪ I⁻, we divide the inequality ∑_{j=1}^{n−1} a_{ij}x_j + a_{in}x_n ≤ b_i by |a_{in}|. This results in the following system, which is equivalent to Ax ≤ b:
Polyhedral theory Ax ≤ b:
n−1 j=1
n−1 j=1
n−1 j=1
a′ij xj + xn
≤ b′i ,
i ∈ I+
a′ij xj − xn
≤ b′i ,
i ∈ I−
a′ij xj
≤ b′i ,
i ∈ I0
(3.2)
where a′_{ij} = a_{ij}/|a_{in}| and b′_i = b_i/|a_{in}| for i ∈ I⁺ ∪ I⁻. For each pair i ∈ I⁺ and k ∈ I⁻, we sum the two inequalities indexed by i and k, and we add the resulting inequality to the system in (3.2). In addition, we remove the inequalities indexed by I⁺ and I⁻ from (3.2). This results in the following system:

∑_{j=1}^{n−1} (a′_{ij} + a′_{kj}) x_j ≤ b′_i + b′_k,  i ∈ I⁺, k ∈ I⁻,      (3.3)
∑_{j=1}^{n−1} a_{ij} x_j ≤ b_i,  i ∈ I⁰.
Since the inequalities in (3.3) are implied by the ones in (3.2), and the system in (3.2) is equivalent to Ax ≤ b, if (x̄_1, . . . , x̄_{n−1}, x̄_n)^T satisfies Ax ≤ b then (x̄_1, . . . , x̄_{n−1})^T satisfies (3.3). It turns out that the converse also holds, as explained by the following theorem.

Theorem 3.28 A vector (x̄_1, . . . , x̄_{n−1})^T satisfies (3.3) if and only if there exists x̄_n such that (x̄_1, . . . , x̄_{n−1}, x̄_n)^T satisfies Ax ≤ b.

Proof. The "if" statement was already discussed. For the converse, assume there is a vector (x̄_1, . . . , x̄_{n−1})^T satisfying (3.3). Note that the first set of inequalities in (3.3) can be rewritten as

∑_{j=1}^{n−1} a′_{kj} x_j − b′_k ≤ b′_i − ∑_{j=1}^{n−1} a′_{ij} x_j,  i ∈ I⁺, k ∈ I⁻.      (3.4)

Let l := max_{k∈I⁻} { ∑_{j=1}^{n−1} a′_{kj} x̄_j − b′_k } and u := min_{i∈I⁺} { b′_i − ∑_{j=1}^{n−1} a′_{ij} x̄_j }, where we define l := −∞ if I⁻ = ∅ and u := +∞ if I⁺ = ∅. Since (x̄_1, . . . , x̄_{n−1})^T satisfies (3.3), we have that l ≤ u. Therefore, there exists x̄_n such that l ≤ x̄_n ≤ u. The inequalities l ≤ x̄_n and x̄_n ≤ u imply that the vector (x̄_1, . . . , x̄_{n−1}, x̄_n)^T satisfies the first two sets of inequalities in (3.2), respectively. Therefore, the vector (x̄_1, . . . , x̄_{n−1}, x̄_n)^T satisfies the entire system (3.2), which is equivalent to Ax ≤ b.

Because of Theorem 3.28, the problem of finding a solution to Ax ≤ b is reduced to finding a solution to the system in (3.3) with n − 1 variables (one fewer than the system Ax ≤ b). Continuing the reduction pattern,
for a given system Ax ≤ b, whether or not it has a solution can be determined by Fourier's elimination method as follows:

Fourier's elimination method

Step 0 Given a system of linear inequalities Ax ≤ b, let A^n := A, b^n := b;

Step 1 For p = n, . . . , 1, eliminate variable x_p from A^p x ≤ b^p with the elimination procedure described above to obtain A^{p−1} x ≤ b^{p−1} (with the example of A^{n−1} x ≤ b^{n−1} given in (3.3));

Step 2 Consider the system A^0 x ≤ b^0 without variables, which is equivalent to 0 ≤ b^0. Declare Ax ≤ b to have a solution if and only if all entries in b^0 are nonnegative.

Remark 3.29
1. At each iteration, Fourier's elimination method removes |I⁺| + |I⁻| inequalities and adds |I⁺| × |I⁻| inequalities. In the worst case the number of inequalities is approximately squared with each iteration. Therefore, after eliminating p variables, the number of inequalities may be exponential in p.
2. If the matrix A and the vector b have only rational entries, then all coefficients in (3.3) are rational.
3. Every inequality of A^i x ≤ b^i is a nonnegative combination of inequalities of Ax ≤ b.

Example 3.30 Consider the following system A^3 x ≤ b^3 of linear inequalities in three variables:

− x_1 ≤ −1,
− x_2 ≤ −1,
− x_3 ≤ −1,
− x_1 − x_2 ≤ −3,
− x_1 − x_3 ≤ −3,
− x_2 − x_3 ≤ −3,
x_1 + x_2 + x_3 ≤ 6.
Applying Fourier's procedure to eliminate variable x_3, we obtain the system A^2 x ≤ b^2, with the appearance

− x_1 ≤ −1,
− x_2 ≤ −1,
− x_1 − x_2 ≤ −3,
x_1 + x_2 ≤ 5,
x_2 ≤ 3,
x_1 ≤ 3,
where the last three inequalities are obtained from A^3 x ≤ b^3 by summing the third, fifth, and sixth inequality, respectively, with the last inequality. We continue to eliminate variable x_2 to obtain A^1 x ≤ b^1 as

− x_1 ≤ −1,
x_1 ≤ 3,
x_1 ≤ 4,
0 ≤ 2,
0 ≤ 2,
− x_1 ≤ 0,
where the last four inequalities in A^1 x ≤ b^1 are obtained by summing up appropriate inequalities in A^2 x ≤ b^2. For example, the third inequality in A^1 x ≤ b^1 results from summing the second and the fourth inequalities in A^2 x ≤ b^2. Finally, A^0 x ≤ b^0 is

0 ≤ 3 − 1,
0 ≤ 4 − 1,
0 ≤ 3,
0 ≤ 4,
0 ≤ 2,
0 ≤ 2.
It can be verified that A^0 x ≤ b^0 is feasible. Therefore, the original system A^3 x ≤ b^3 has at least one solution.

A solution can now be found by backward substitution. The system A^1 x ≤ b^1 is equivalent to 1 ≤ x_1 ≤ 3. Since x_1 can take any value in this interval, we choose x̄_1 = 3. Substituting x_1 = 3 in A^2 x ≤ b^2, we obtain 1 ≤ x_2 ≤ 2. If we choose x̄_2 = 1 and substitute x_2 = 1 and x_1 = 3 in A^3 x ≤ b^3, we obtain x_3 = 2. This gives the solution x̄ = (3, 1, 2)^T.

As mentioned in Remark 3.29.3, every inequality of A^0 x ≤ b^0 is a nonnegative combination of the inequalities in Ax ≤ b. For example, the inequality 0 ≤ 4 results from u^T Ax ≤ u^T b with u = (0, 1, 1, 1, 1, 0, 2)^T ≥ 0^7. This observation will be useful in the proof of Farkas' Lemma in the next section.
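The bookkeeping in Example 3.30 is easy to mechanize. The following Python sketch of Fourier's elimination method (our illustration, not from the book) eliminates one variable at a time exactly as in Steps 0 to 2 above; no redundant inequalities are removed, so the exponential growth noted in Remark 3.29 applies.

```python
import numpy as np

def eliminate_last(A, b):
    """One step of Fourier elimination: project { x | A x <= b }
    onto its first n-1 coordinates (eliminate x_n)."""
    pos = A[:, -1] > 0            # I^+
    neg = A[:, -1] < 0            # I^-
    zero = A[:, -1] == 0          # I^0
    # Normalize rows in I^+ and I^- so the x_n coefficient is +1 or -1.
    scale = np.where(zero, 1.0, np.abs(A[:, -1]))
    An, bn = A / scale[:, None], b / scale
    rows, rhs = [An[zero, :-1]], [bn[zero]]
    for i in np.where(pos)[0]:    # sum each I^+ row with each I^- row
        for k in np.where(neg)[0]:
            rows.append((An[i, :-1] + An[k, :-1])[None, :])
            rhs.append([bn[i] + bn[k]])
    return np.vstack(rows), np.concatenate(rhs)

def is_feasible(A, b):
    """Decide whether A x <= b has a solution, by full elimination."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    while A.shape[1] > 0:
        A, b = eliminate_last(A, b)
    return np.all(b >= 0)         # final test '0 <= b^0' of Step 2

# The system A^3 x <= b^3 of Example 3.30:
A = np.array([[-1, 0, 0], [0, -1, 0], [0, 0, -1], [-1, -1, 0],
              [-1, 0, -1], [0, -1, -1], [1, 1, 1]], dtype=float)
b = np.array([-1, -1, -1, -3, -3, -3, 6], dtype=float)
print(is_feasible(A, b))          # True, as found in the example
```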
NICLAS ANDRÉASSON is a licentiate of technology in applied mathematics from Chalmers University of Technology, Gothenburg, Sweden.

ANTON EVGRAFOV is a doctor of philosophy in applied mathematics from Chalmers University of Technology, Gothenburg, Sweden, and is presently a professor at the Department of Mathematical Sciences at the Norwegian University of Science and Technology, Trondheim, Norway.

MICHAEL PATRIKSSON is a professor of applied mathematics at Chalmers University of Technology, Gothenburg, Sweden.

EMIL GUSTAVSSON is a doctor of philosophy in mathematics from the University of Gothenburg, Sweden.

ZUZANA NEDĔLKOVÁ is a licentiate of technology in applied mathematics at Chalmers University of Technology, Gothenburg, Sweden.

KIN CHEONG SOU is a research associate in applied mathematics at Chalmers University of Technology, Gothenburg, Sweden.

MAGNUS ÖNNHEIM is a licentiate of technology in applied mathematics at Chalmers University of Technology, Gothenburg, Sweden.
An Introduction to Continuous Optimization: Foundations and Fundamental Algorithms

Optimization, or mathematical programming, is a fundamental subject within decision science and operations research, in which mathematical decision models are constructed, analyzed, and solved. The book provides a basis for the analysis of optimization models and of candidate optimal solutions for continuous optimization models. Elementary but rigorous mathematics and linear algebra underlie the workings of convexity and duality, and of necessary/sufficient local/global optimality conditions for continuous optimization problems. Natural algorithms are developed from these optimality conditions, and their most important convergence characteristics are analyzed. The book answers many more "Why?" and "Why not?" than "How?" questions.

The authors provide lecture, exercise and reading material for a first course on continuous optimization and mathematical programming. It is geared towards third-year students, and has already been used for nearly ten years. The preface describes the main changes to the first edition. The book can be used in mathematical optimization courses at engineering, economics, and business schools. It is a perfect starting point for anyone who wishes to develop an understanding of the subject before actually applying it.

Art.nr 32217
Third edition
studentlitteratur.se
ISBN 978-91-44-11529-0