An Introduction to Continuous Optimization
Foundations and Fundamental Algorithms
Optimization, or mathematical programming, is a fundamental subject within decision science and operations research, in which mathematical decision models are constructed, analyzed, and solved.
The book can be used in mathematical optimization courses at any mathematics, engineering, economics, or business school. It is a perfect starting book for anyone who wishes to develop his/her understanding of the subject of optimization, before actually applying it.
Art.nr 32217
Second edition
ISBN 978-91-44-06077-4
www.studentlitteratur.se
The book provides lecture, exercise and reading material for a first course on continuous optimization and mathematical programming, geared towards third-year students, and has already been used as such for nearly ten years. The preface to the second edition describes the main changes made since the first, 2005, edition.
The book’s focus lies on providing a basis for the analysis of optimization models and of candidate optimal solutions for continuous optimization models. The main part of the mathematical material therefore concerns the analysis and linear algebra that underlie the workings of convexity and duality, and necessary/sufficient local/global optimality conditions for continuous optimization problems. Natural algorithms are then developed from these optimality conditions, and their most important convergence characteristics are analyzed. The book answers many more questions of the form “Why?” and “Why not?” than “How?”. We use only elementary mathematics in the development of the book, yet are rigorous throughout.
Niclas Andréasson is a licentiate of technology in applied mathematics at Chalmers University of Technology, Gothenburg. Anton Evgrafov is a doctor of philosophy in applied mathematics from Chalmers University of Technology, Gothenburg, and is presently an associate professor at the Department of Applied Mathematics and Computer Science, Technical University of Denmark, Lyngby. Michael Patriksson is a professor in applied mathematics at Chalmers University of Technology, Gothenburg. Emil Gustavsson is a licentiate of technology in applied mathematics at the University of Gothenburg. Magnus Önnheim is a licentiate of technology in applied mathematics at Chalmers University of Technology, Gothenburg.
Niclas Andréasson, Anton Evgrafov, Michael Patriksson
with
Emil Gustavsson, Magnus Önnheim
Copying prohibited
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher.
The papers and inks used in this product are eco-friendly.
Art. No 32217
ISBN 978-91-44-06077-4
Edition 2:1
© The Authors and Studentlitteratur 2005, 2013
www.studentlitteratur.se
Studentlitteratur AB, Lund
Cover illustration: Emil Gustavsson
Cover design by Francisco Ortega
Printed by Eurographic Danmark A/S, Denmark 2013
Preface to the First Edition
The present book has been developed from course notes written by the third author, and continuously updated and used in optimization courses during the past several years at Chalmers University of Technology, Göteborg (Gothenburg), Sweden.

A note to the instructor: The book serves to provide lecture and exercise material in a first course on optimization for second- to fourth-year students at the university. The book’s focus lies on providing a basis for the analysis of optimization models and of candidate optimal solutions, especially for continuous (even differentiable) optimization models. The main part of the mathematical material therefore concerns the analysis and algebra that underlie the workings of convexity and duality, and necessary/sufficient local/global optimality conditions for unconstrained and constrained optimization problems. Natural algorithms are then developed from these principles, and their most important convergence characteristics analyzed. The book answers many more questions of the form “Why/why not?” than “How?” This choice of focus is in contrast to books mainly providing numerical guidelines as to how these optimization problems should be solved. The number of algorithms for linear and nonlinear optimization problems (the two main topics covered in this book) is kept quite low; those that are discussed are considered classical, and serve to illustrate the basic principles for solving such classes of optimization problems and their links to the fundamental theory of optimality. Any course based on this book therefore should add project work on concrete optimization problems, including their modelling, analysis, solution by practical algorithms, and interpretation.

A note to the student: The material assumes some familiarity with linear algebra, real analysis, and logic. In linear algebra, we assume an active knowledge of bases, norms, and matrix algebra and calculus. In real analysis, we assume an active knowledge of sequences, the basic topology of sets, real- and vector-valued functions and their calculus of differentiation. We also assume a familiarity with basic predicate logic, since the understanding of proofs requires it. A summary of the most important background topics is found in Chapter 2, which also serves as an introduction to the mathematical notation. The student is advised to refresh any unfamiliar or forgotten material of this chapter before reading the rest of the book.

We use only elementary mathematics in the main development of the book; sections of supplementary material that provide an outlook into more advanced topics and that require more advanced methods of presentation are kept short, typically lack proofs, and are also marked with an asterisk. A detailed road map of the contents of the book’s chapters is provided at the end of Chapter 1. Each chapter ends with a selected number of exercises which either illustrate the theory and algorithms with numerical examples or develop the theory slightly further. In Appendix 15 solutions are given, in a few cases in detail.

In our work on this book we have benefited from discussions with Docent Ann-Brith Strömberg, presently at the Department of Mathematical Sciences at Chalmers University of Technology, as well as Dr. Fredrik Altenstedt, formerly at mathematics at Chalmers University of Technology, and currently at Jeppesen Systems AB. We thank the heads of undergraduate studies at mathematics, Göteborg University and Chalmers University of Technology, Jan-Erik Andersson and Sven Järner, respectively, for reducing our teaching duties while preparing this book. We also thank Yumi Karlsson for helping us by typesetting a main part of the first draft based on the handwritten notes; after the fact, we now realize that having been helped with this first draft made us confident that such a tremendous task as that of writing a textbook would actually be possible.
Finally, we thank all the students who gave us critical remarks on the first versions during 2004 and 2005.
Göteborg, May 2005
Niclas Andréasson, Anton Evgrafov, Michael Patriksson
Preface to the Second Edition
This second edition introduces several areas and items that were not included in the first edition, as well as several corrections. A brief summary of these changes is given next. Chapter 1 includes a discussion on the diet problem, in addition to that on the staff planning problem, in order to introduce linear programming very early on. Figure 1.1 now has the terminating box “Implementation”, whereas the original one had an infinite loop! Chapter 3 has been enriched by several new results on separating and supporting hyperplanes, and the associated theory of convex cones and their polar sets. Thanks to this study of separating hyperplanes, Theorem 5.17 on the necessity of the Fritz John conditions now has a complete proof. The end of Chapter 5 also includes a summary of the fascinating story of the development of the Karush–Kuhn–Tucker conditions. The sensitivity analysis in linear programming has been expanded with a discussion in Section 10.5.3 on the addition of a variable or a constraint, as well as an introduction to column generation based on the example of the minimum cost multi-commodity network flow problem (Section 10.6). Chapter 11 includes a brief discussion on Gauss–Newton methods for least-squares problems. Chapter 12 has changed its name from “Optimization over convex sets” to “Feasible-direction methods,” in order to reflect the fact that the scope is now wider: from essentially polyhedral sets to general closed sets (which, however, most often will be assumed to be convex). In particular, we have added new sections on algorithms defined by closed descent maps, an algorithm principle which was devised and analyzed mainly in the 1960s, and which is a quite elegant means to describe iterative methods. We also utilize this principle to contrast established convergent methods (such as the Frank–Wolfe method) and failed attempts (such as the algorithm of Zoutendijk). We have also added a brief discussion on reduced gradient methods, which are relatives of the simplex method; in their original statement they are not convergent, but a small adjustment results in a closed descent map and hence a convergent method. Exercises and their solutions are now placed at the end of the book, rather than at the end of each chapter.

The first edition from 2005 has been used in the teaching of several courses at Chalmers University of Technology and the University of Gothenburg. We wish to thank all the students who have given us remarks on the book. We would also like to thank Dr. Kin Cheong Sou for remarks and corrections on the first edition.
Göteborg, October 2013
Michael Patriksson, Emil Gustavsson, Magnus Önnheim
Contents

I Introduction

1 Modelling and classification
1.1 Modelling of optimization problems
1.2 A quick glance at optimization history
1.3 Classification of optimization models
1.4 Conventions
1.5 Applications and modelling examples
1.6 On optimality conditions
1.7 Soft and hard constraints
1.7.1 Definitions
1.7.2 A derivation of the exterior penalty function
1.8 A road map through the material
1.9 On the background of this book and a didactics statement
1.10 Illustrating the theory
1.11 Notes and further reading

II Fundamentals

2 Analysis and algebra—A summary
2.1 Reductio ad absurdum
2.2 Linear algebra
2.3 Analysis

3 Convex analysis
3.1 Convexity of sets
3.2 Polyhedral theory
3.2.1 Convex hulls
3.2.2 Polytopes
3.2.3 Polyhedra
3.2.4 The Separation Theorem and Farkas’ Lemma
3.2.5 Supporting hyperplanes and the separation of two convex sets
3.2.6 Convex cones and polarity
3.3 Convex functions
3.4 Application: the projection of a vector onto a convex set
3.5 Notes and further reading

III Optimality Conditions

4 An introduction to optimality conditions
4.1 Local and global optimality
4.2 Existence of optimal solutions
4.2.1 A classic result
4.2.2 ∗Non-standard results
4.2.3 Special optimal solution sets
4.3 Optimality conditions for unconstrained optimization
4.4 Optimality conditions for optimization over convex sets
4.5 Near-optimality in convex optimization
4.6 Applications
4.6.1 Continuity of convex functions
4.6.2 The Separation Theorem
4.6.3 Euclidean projection
4.6.4 Fixed point theorems
4.7 Notes and further reading

5 Optimality conditions
5.1 Relations between optimality conditions and CQs at a glance
5.2 A note of caution
5.3 Geometric optimality conditions
5.4 The Fritz John conditions
5.5 The Karush–Kuhn–Tucker conditions
5.6 Proper treatment of equality constraints
5.7 Constraint qualifications
5.7.1 Mangasarian–Fromovitz CQ (MFCQ)
5.7.2 Slater CQ
5.7.3 Linear independence CQ (LICQ)
5.7.4 Affine constraints
5.8 Sufficiency of the KKT conditions under convexity
5.9 Applications and examples
5.10 Notes and further reading

6 Lagrangian duality
6.1 The relaxation theorem
6.2 Lagrangian duality
6.2.1 Lagrangian relaxation and the dual problem
6.2.2 Global optimality conditions
6.2.3 Strong duality for convex programs
6.2.4 Strong duality for linear and quadratic programs
6.2.5 Two illustrative examples
6.3 Differentiability properties of the dual function
6.3.1 Subdifferentiability of convex functions
6.3.2 Differentiability of the Lagrangian dual function
6.4 ∗Subgradient optimization methods
6.4.1 Convex problems
6.4.2 Application to the Lagrangian dual problem
6.4.3 The generation of ascent directions
6.5 ∗Obtaining a primal solution
6.5.1 Differentiability at the optimal solution
6.5.2 Everett’s Theorem
6.6 ∗Sensitivity analysis
6.6.1 Analysis for convex problems
6.6.2 Analysis for differentiable problems
6.7 Applications
6.7.1 Electrical networks
6.7.2 A Lagrangian relaxation of the traveling salesman problem
6.8 Notes and further reading

IV Linear Programming

7 Linear programming: An introduction
7.1 The manufacturing problem
7.2 A linear programming model
7.3 Graphical solution
7.4 Sensitivity analysis
7.4.1 An increase in the number of large pieces available
7.4.2 An increase in the number of small pieces available
7.4.3 A decrease in the price of the tables
7.5 The dual of the manufacturing problem
7.5.1 A competitor
7.5.2 A dual problem
7.5.3 Interpretations of the dual optimal solution

8 Linear programming models
8.1 Linear programming modelling
8.2 The geometry of linear programming
8.2.1 Standard form
8.2.2 Basic feasible solutions and the Representation Theorem
8.2.3 Adjacent extreme points
8.3 Notes and further reading

9 The simplex method
9.1 The algorithm
9.1.1 A BFS is known
9.1.2 A BFS is not known: phase I & II
9.1.3 Alternative optimal solutions
9.2 Termination
9.3 Computational complexity
9.4 Notes and further reading

10 LP duality and sensitivity analysis
10.1 Introduction
10.2 The linear programming dual
10.2.1 Canonical form
10.2.2 Constructing the dual
10.3 Linear programming duality theory
10.3.1 Weak and strong duality
10.3.2 Complementary slackness
10.3.3 An alternative derivation of the dual linear program
10.4 The dual simplex method
10.5 Sensitivity analysis
10.5.1 Perturbations in the objective function
10.5.2 Perturbations in the right-hand side coefficients
10.5.3 Addition of a variable or a constraint
10.6 Column generation in linear programming
10.6.1 The minimum cost multi-commodity network flow problem
10.6.2 The column generation principle
10.6.3 An algorithm instance
10.7 Notes and further reading

V Algorithms

11 Unconstrained optimization
11.1 Introduction
11.2 Descent directions
11.2.1 Introduction
11.2.2 Newton’s method and extensions
11.2.3 Least-squares problems and the Gauss–Newton algorithm
11.3 The line search problem
11.3.1 A characterization of the line search problem
11.3.2 Approximate line search strategies
11.4 Convergent algorithms
11.5 Finite termination criteria
11.6 A comment on non-differentiability
11.7 Trust region methods
11.8 Conjugate gradient methods
11.8.1 Conjugate directions
11.8.2 Conjugate direction methods
11.8.3 Generating conjugate directions
11.8.4 Conjugate gradient methods
11.8.5 Extension to non-quadratic problems
11.9 A quasi-Newton method: DFP
11.10 Convergence rates
11.11 Implicit functions
11.12 Notes and further reading

12 Feasible-direction methods
12.1 Feasible-direction methods
12.2 The Frank–Wolfe algorithm
12.3 The simplicial decomposition algorithm
12.4 The gradient projection algorithm
12.5 Application: traffic equilibrium
12.5.1 Model analysis
12.5.2 Algorithms and a numerical example
12.6 Algorithms given by closed maps
12.6.1 Algorithmic maps
12.6.2 Closed maps
12.6.3 A convergence theorem
12.6.4 Additional results on composite maps
12.6.5 Convergence of some algorithms defined by closed descent maps
12.7 Active set methods
12.7.1 The methods of Zoutendijk, Topkis, and Veinott
12.7.2 A primal–dual active set method for polyhedral constraints
12.7.3 Rosen’s gradient projection algorithm
12.8 Reduced gradient algorithms
12.8.1 Introduction
12.8.2 The reduced gradient algorithm
12.8.3 Convergence analysis
12.9 Notes and further reading

13 Constrained optimization
13.1 Penalty methods
13.1.1 Exterior penalty methods
13.1.2 Interior penalty methods
13.1.3 Computational considerations
13.1.4 Applications and examples
13.2 Sequential quadratic programming
13.2.1 Introduction
13.2.2 A penalty-function based SQP algorithm
13.2.3 A numerical example on the MSQP algorithm
13.2.4 On recent developments in SQP algorithms
13.3 A summary and comparison
13.4 Notes and further reading

VI Exercises

14 Exercises
Chapter 1: Modelling and classification
Chapter 3: Convexity
Chapter 4: An introduction to optimality conditions
Chapter 5: Optimality conditions
Chapter 6: Lagrangian duality
Chapter 8: Linear programming models
Chapter 9: The simplex method
Chapter 10: LP duality and sensitivity analysis
Chapter 11: Unconstrained optimization
Chapter 12: Feasible direction methods
Chapter 13: Constrained optimization

15 Answers to the exercises
Chapter 1: Modelling and classification
Chapter 3: Convexity
Chapter 4: An introduction to optimality conditions
Chapter 5: Optimality conditions
Chapter 6: Lagrangian duality
Chapter 8: Linear programming models
Chapter 9: The simplex method
Chapter 10: LP duality and sensitivity analysis
Chapter 11: Unconstrained optimization
Chapter 12: Optimization over convex sets
Chapter 13: Constrained optimization

References

Index
3 Convex analysis
This chapter is devoted to the theory of convex sets and functions. Section 3.1 provides the definition of a convex set and some examples of convex as well as non-convex sets. Section 3.2 is devoted to the theory of polyhedra, introducing important notions such as convex and affine hulls, and including the Representation Theorem, which plays a crucial role in linear programming. The first main building block when deriving these results is the Separation Theorem 3.29 (which concerns the existence of a hyperplane that separates a convex set from a point lying outside of the set); based on it, we derive Farkas’ Lemma and several relatives thereof, all dealing with the relationships between the solvability of two related linear systems of equations and inequalities, which are known as “theorems of the alternative.” The second main building block is Theorem 3.46, which in this section is used to establish Gordan’s Theorem 3.38. The section also includes results on supporting hyperplanes and polar sets. Section 3.3 is devoted to characterizations of convex functions utilizing the differentiability properties of the function in question, and in particular establishes a characterization of the convexity of a function through the convexity of its epigraph.
3.1 Convexity of sets
Definition 3.1 (convex set) Let $S \subseteq \mathbb{R}^n$. The set $S$ is convex if
$$x^1, x^2 \in S, \ \lambda \in (0, 1) \implies \lambda x^1 + (1 - \lambda) x^2 \in S$$
holds.
A set $S$ is convex if, from everywhere in $S$, all other points of $S$ are “visible.” Figure 3.1 illustrates a convex set.
Figure 3.1: A convex set $S$ with the points $x^1$, $x^2$, and $\lambda x^1 + (1 - \lambda) x^2$. (For the intermediate vector shown, the value of $\lambda$ is ≈ 1/2.)
Two non-convex sets are shown in Figure 3.2.
Figure 3.2: Two non-convex sets.
Example 3.2 (convex and non-convex sets) By using the definition of a convex set, the following can be established:
(a) The set $\mathbb{R}^n$ is a convex set.
(b) The empty set is a convex set.
(c) The set $\{ x \in \mathbb{R}^n \mid \|x\| \le a \}$ is convex for every value of $a \in \mathbb{R}$. [Note: $\|\cdot\|$ here denotes any vector norm, but we will almost always use the 2-norm,
$$\|x\|_2 := \sqrt{\sum_{j=1}^{n} x_j^2}.$$
We will not write the index 2, but instead use the 2-norm implicitly whenever writing $\|\cdot\|$.]
(d) The set $\{ x \in \mathbb{R}^n \mid \|x\| = a \}$ is non-convex for every $a > 0$.
(e) The set $\{0, 1, 2\}$ is non-convex. (The second illustration in Figure 3.2 is a case of a non-convex set of integral points in $\mathbb{R}^2$.)
Proposition 3.3 (convex intersection) Suppose that $S_k$, $k \in K$, is any collection of convex sets. Then, the intersection
$$S := \bigcap_{k \in K} S_k$$
is a convex set.

Proof. Let both $x^1$ and $x^2$ belong to $S$. (If two such points cannot be found, then the result holds vacuously.) Then, $x^1 \in S_k$ and $x^2 \in S_k$ for all $k \in K$. Take $\lambda \in (0, 1)$. Then, $\lambda x^1 + (1 - \lambda) x^2 \in S_k$, $k \in K$, by the convexity of the sets $S_k$. So, $\lambda x^1 + (1 - \lambda) x^2 \in \bigcap_{k \in K} S_k = S$.
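Definition 3.1 suggests a simple numerical experiment: given a membership test for a set $S$ and some sample points known to lie in $S$, search for $x^1, x^2$ and $\lambda \in (0, 1)$ whose combination leaves $S$. Such a search can only refute convexity, never prove it. The Python sketch below applies it to sets from Example 3.2 and to an intersection in the spirit of Proposition 3.3; all function names, the half-space $\{x \mid x_1 \ge 0\}$, and the finite grid of $\lambda$ values are hypothetical choices made for illustration.

```python
import itertools
import math

def find_violation(member, samples, lambdas=(0.25, 0.5, 0.75)):
    """Search for x1, x2 in `samples` (assumed to lie in S) and lam in (0, 1)
    such that lam*x1 + (1 - lam)*x2 is NOT in S according to `member`.
    Returns a witness (x1, x2, lam), or None if no violation was found."""
    for x1, x2 in itertools.combinations(samples, 2):
        for lam in lambdas:
            z = tuple(lam * a + (1 - lam) * b for a, b in zip(x1, x2))
            if not member(z):
                return x1, x2, lam
    return None

def norm(x):
    # Euclidean 2-norm, used implicitly in Example 3.2
    return math.sqrt(sum(t * t for t in x))

def in_ball(x, a=1.0):
    return norm(x) <= a + 1e-12          # Example 3.2(c): { x | ||x|| <= a }

def on_sphere(x, a=1.0):
    return abs(norm(x) - a) <= 1e-12     # Example 3.2(d): { x | ||x|| = a }

def in_halfspace(x):
    return x[0] >= 0.0                   # hypothetical extra convex set { x | x_1 >= 0 }

def in_intersection(x):
    # Proposition 3.3: the intersection of two convex sets
    return in_ball(x) and in_halfspace(x)

# Eight points on the unit circle in R^2 (they lie both in the ball and on the sphere)
circle = [(math.cos(2 * math.pi * k / 8), math.sin(2 * math.pi * k / 8)) for k in range(8)]
half_circle = [p for p in circle if p[0] >= 0.0]

print(find_violation(in_ball, circle))                               # None
print(find_violation(on_sphere, circle) is not None)                 # True
print(find_violation(lambda x: x[0] in (0, 1, 2), [(0,), (1,), (2,)]))  # ((0,), (1,), 0.25)
print(find_violation(in_intersection, half_circle))                  # None
```

A returned witness is a genuine certificate of non-convexity, whereas `None` only means that no counterexample was found among the sampled pairs.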
3.2 Polyhedral theory
3.2.1 Convex hulls
Consider the set $V := \{v^1, v^2\}$, where $v^1, v^2 \in \mathbb{R}^n$ and $v^1 \neq v^2$. A set naturally related to $V$ is the line in $\mathbb{R}^n$ through $v^1$ and $v^2$ [see Figure 3.3(b)], that is,
$$\{ \lambda v^1 + (1 - \lambda) v^2 \mid \lambda \in \mathbb{R} \} = \{ \lambda_1 v^1 + \lambda_2 v^2 \mid \lambda_1, \lambda_2 \in \mathbb{R};\ \lambda_1 + \lambda_2 = 1 \}.$$
Another set naturally related to $V$ is the line segment between $v^1$ and $v^2$ [see Figure 3.3(c)], that is,
$$\{ \lambda v^1 + (1 - \lambda) v^2 \mid \lambda \in [0, 1] \} = \{ \lambda_1 v^1 + \lambda_2 v^2 \mid \lambda_1, \lambda_2 \ge 0;\ \lambda_1 + \lambda_2 = 1 \}.$$
Motivated by these examples we define the affine hull and the convex hull of a set in $\mathbb{R}^n$.

Definition 3.4 (affine hull) The affine hull of a set $V \subseteq \mathbb{R}^n$, denoted $\operatorname{aff} V$, is the smallest affine subspace of $\mathbb{R}^n$ that includes $V$.

Definition 3.5 (convex hull) The convex hull of a set $V \subseteq \mathbb{R}^n$, denoted $\operatorname{conv} V$, is the smallest convex set that includes $V$.

Remark 3.6 Defining the convex hull of a set $V \subseteq \mathbb{R}^n$ as the smallest convex set containing $V$ may seem ambiguous; after all, we have not yet shown that such a set even exists! However, letting $\mathcal{F}$ be the family of convex sets in $\mathbb{R}^n$ containing $V$ (a nonempty family, since $\mathbb{R}^n \in \mathcal{F}$), we may write $\operatorname{conv} V = \bigcap_{C \in \mathcal{F}} C$, since this is a convex set (by Proposition 3.3), and by construction no smaller convex set can contain $V$, since then it would be a member of the family $\mathcal{F}$. The same remark applies to affine hulls, where to show that the affine hull is well defined, we need to prove the analogue of Proposition 3.3 for affine subspaces. This is left as an exercise.
A convex combination of a set of points is a “weighted average” of those points, which is formalized in the following definition.

Definition 3.7 (affine combinations, convex combinations) An affine combination of the points $\{v^1, \dots, v^k\}$ is a vector $v$ satisfying
$$v = \sum_{i=1}^{k} \lambda_i v^i,$$
where $\sum_{i=1}^{k} \lambda_i = 1$. If the weights additionally satisfy $\lambda_i \ge 0$ for all $i = 1, \dots, k$, the vector $v$ is called a convex combination. Note that the number of points in the sum defining convex and affine combinations is always assumed to be finite.
Figure 3.3: (a) The set $V = \{v^1, v^2\}$. (b) The set $\operatorname{aff} V$. (c) The set $\operatorname{conv} V$.
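For two points, the sets in Figure 3.3(b) and (c) are both described by $\lambda v^1 + (1 - \lambda) v^2$; only the admissible range of $\lambda$ differs. The Python sketch below (the function `classify` and the sample points are hypothetical) recovers $\lambda$ for a given $x$ with exact rational arithmetic and reports whether $x$ lies on the segment $\operatorname{conv} V$, only on the line $\operatorname{aff} V$, or on neither.

```python
from fractions import Fraction

def classify(x, v1, v2):
    """Return (lam, kind) with x = lam*v1 + (1 - lam)*v2, where kind is 'conv'
    (lam in [0, 1]) or 'aff' (on the line only); (None, None) if x is off the line.
    Assumes v1 != v2, as in the text."""
    x, v1, v2 = ([Fraction(t) for t in p] for p in (x, v1, v2))
    d = [a - b for a, b in zip(v1, v2)]           # direction v1 - v2
    # x - v2 = lam * (v1 - v2): solve in a coordinate where v1 and v2 differ
    j = next(i for i, di in enumerate(d) if di != 0)
    lam = (x[j] - v2[j]) / d[j]
    if any(xi != lam * a + (1 - lam) * b for xi, a, b in zip(x, v1, v2)):
        return None, None                         # x is not on the line through v1, v2
    return lam, ("conv" if 0 <= lam <= 1 else "aff")

v1, v2 = (0, 0), (2, 1)
print(classify((1, Fraction(1, 2)), v1, v2))   # (Fraction(1, 2), 'conv')
print(classify((4, 2), v1, v2))                # (Fraction(-1, 1), 'aff')
print(classify((1, 1), v1, v2))                # (None, None)
```

Exact fractions are used so that the test "is $x$ on the line" is not blurred by rounding errors.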
Example 3.8 (affine hull, convex hull) (a) The affine hull of three or more points in $\mathbb{R}^2$ not all lying on the same line is $\mathbb{R}^2$ itself. The convex hull of five points in $\mathbb{R}^2$ is shown in Figure 3.4 (observe that the “corners” of the convex hull of the points are some of the points themselves). (b) The affine hull of three points not all lying on the same line in $\mathbb{R}^3$ is the plane through the points. (c) The affine hull of an affine space is the space itself, and the convex hull of a convex set is the set itself.

A more algebraic description of convex hulls is given by the following proposition.
Figure 3.4: The convex hull of five points $v^1, \dots, v^5$ in $\mathbb{R}^2$.
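Example 3.8(a) observes that the corners of the convex hull of finitely many points in $\mathbb{R}^2$ are some of the points themselves. A standard way to compute those corners in the plane, not taken from this book, is Andrew’s monotone chain algorithm, sketched below in Python. The five coordinates are made up for illustration and do not reproduce Figure 3.4.

```python
def cross(o, a, b):
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn o -> a -> b
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def hull_corners(points):
    """Corners of conv(points) in counter-clockwise order (Andrew's monotone chain).
    Points in the interior or in the middle of an edge are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                      # lower chain, left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # upper chain, right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # the last point of each chain is the first point of the other one
    return lower[:-1] + upper[:-1]

# Five hypothetical points; (2, 1) lies inside the quadrilateral spanned by the other four
V = [(0, 0), (4, 0), (2, 1), (1, 3), (4, 3)]
print(hull_corners(V))   # [(0, 0), (4, 0), (4, 3), (1, 3)]
```

The returned corners always form a subset of the input points, which is what Lemma 3.17 below asserts for the extreme points of a polytope. Using `<= 0` in the turn test also discards collinear points in the middle of an edge, so only genuine corners are reported.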
Proposition 3.9 Let $V \subseteq \mathbb{R}^n$. Then, $\operatorname{conv} V$ is the set of all convex combinations of points of $V$.

Proof. Let $Q$ be the set of all convex combinations of points of $V$. We first establish that $\operatorname{conv} V \subseteq Q$. To see this, we simply note that $V \subseteq Q$, and that $Q$ is clearly convex. Since $\operatorname{conv} V$ is the smallest convex set with this property, $\operatorname{conv} V \subseteq Q$. To establish that $Q \subseteq \operatorname{conv} V$, we proceed with an induction argument. Let
$$x = \sum_{i=1}^{m} \alpha_i x^i, \qquad (3.1)$$
where $\alpha_i \ge 0$ for $i = 1, \dots, m$, $\sum_{i=1}^{m} \alpha_i = 1$, and $x^i \in V$ for all $i$. For $m = 1$, we have $x = x^1$ for some $x^1 \in V$; clearly, $x \in \operatorname{conv} V$. Suppose then that for some $m \ge 1$, any vector $x$ given by (3.1), under the given conditions on the weights $\alpha_i$, belongs to $\operatorname{conv} V$, and let
$$x = \sum_{i=1}^{m+1} \alpha_i x^i,$$
with $\alpha_i \ge 0$ for $i = 1, \dots, m + 1$, $\sum_{i=1}^{m+1} \alpha_i = 1$, and $x^i \in V$ for all $i$. Without loss of generality we assume that $\alpha_{m+1} < 1$ (if $\alpha_{m+1} = 1$, then $x = x^{m+1} \in V \subseteq \operatorname{conv} V$). We may then write $x = (1 - \alpha_{m+1}) z + \alpha_{m+1} x^{m+1}$, where
$$z = \sum_{i=1}^{m} \frac{\alpha_i}{1 - \alpha_{m+1}} x^i.$$
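As $\alpha_i/(1 - \alpha_{m+1}) \ge 0$ for $i = 1, \dots, m$, and $\sum_{i=1}^{m} \alpha_i/(1 - \alpha_{m+1}) = 1$, it follows from the induction hypothesis that $z \in \operatorname{conv} V$. Since also $x^{m+1} \in V \subseteq \operatorname{conv} V$ and $\operatorname{conv} V$ is convex, $x \in \operatorname{conv} V$. Combining the inclusions gives $Q = \operatorname{conv} V$.

A similar result holds for affine hulls.

Proposition 3.10 Let $V \subseteq \mathbb{R}^n$. Then, $\operatorname{aff} V$ is the set of all affine combinations of points of $V$.

Proof. The proof is similar to that of Proposition 3.9; we leave it as an exercise.

We can summarize the above two propositions as follows: for any $V \subseteq \mathbb{R}^n$ we have
$$\operatorname{conv} V = \left\{ \sum_{i=1}^{k} \lambda_i v^i \;\middle|\; v^1, \dots, v^k \in V,\ \sum_{i=1}^{k} \lambda_i = 1,\ \lambda_1, \dots, \lambda_k \ge 0 \right\},$$
$$\operatorname{aff} V = \left\{ \sum_{i=1}^{k} \lambda_i v^i \;\middle|\; v^1, \dots, v^k \in V,\ \sum_{i=1}^{k} \lambda_i = 1 \right\}.$$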
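The induction step in the proof of Proposition 3.9 is constructive. A convex combination of $m + 1$ points is a convex combination of two points, $z$ and $x^{m+1}$, where $z$ is itself a convex combination of $m$ points. The sketch below follows that recursion. It assumes exact rational weights (`fractions.Fraction`) so that the condition $\sum_i \alpha_i = 1$ can be tested exactly. The function name `combine_by_induction` is hypothetical.

```python
from fractions import Fraction


def combine_by_induction(points, weights):
    """Evaluate sum_i weights[i] * points[i] as in the proof of Proposition 3.9:
    x = (1 - a) * z + a * x_last, with a = weights[-1] and z a convex
    combination of the remaining points."""
    weights = [Fraction(w) for w in weights]
    if any(w < 0 for w in weights) or sum(weights) != 1:
        raise ValueError("weights must be nonnegative and sum to one")
    if len(points) == 1:
        return tuple(Fraction(c) for c in points[0])  # base case m = 1: x = x^1
    a = weights[-1]
    if a == 1:
        # All other weights are zero, so x is the last point itself.
        return tuple(Fraction(c) for c in points[-1])
    # Rescaled weights alpha_i / (1 - alpha_{m+1}) are nonnegative and sum to one.
    z = combine_by_induction(points[:-1], [w / (1 - a) for w in weights[:-1]])
    return tuple((1 - a) * zc + a * pc for zc, pc in zip(z, points[-1]))


pts = [(0, 0), (4, 0), (0, 4)]
lam = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
x = combine_by_induction(pts, lam)
direct = tuple(sum(l * p[j] for l, p in zip(lam, pts)) for j in range(2))
print(x, x == direct)
```

The recursion and the direct weighted sum agree. The recursion only exists to mirror the proof, in which each step combines just two points of $\operatorname{conv} V$.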
We emphasize that the sums in the above two expressions are assumed to be finite. Proposition 3.9 shows that every point of the convex hull of a set can be written as a convex combination of points from the set. It says nothing, however, about how many points are required. This is the content of Carathéodory’s Theorem.

Theorem 3.11 (Carathéodory’s Theorem) Let $x \in \operatorname{conv} V$, where $V \subseteq \mathbb{R}^n$. Then, $x$ can be expressed as a convex combination of $n + 1$ or fewer points of $V$.

Proof. From Proposition 3.9 it follows that $x = \lambda_1 a^1 + \cdots + \lambda_m a^m$ for some $a^1, \dots, a^m \in V$ and $\lambda_1, \dots, \lambda_m \ge 0$ such that $\sum_{i=1}^{m} \lambda_i = 1$. We assume that this representation of $x$ is chosen so that $x$ cannot be expressed as a convex combination of fewer than $m$ points of $V$. It follows that no two of the points $a^1, \dots, a^m$ are equal and that $\lambda_1, \dots, \lambda_m > 0$. We prove the theorem by showing that $m \le n + 1$. Assume that $m > n + 1$. Then the set $\{a^1, \dots, a^m\}$ must be affinely dependent (more than $n + 1$ points in $\mathbb{R}^n$ always are), so there exist $\alpha_1, \dots, \alpha_m \in \mathbb{R}$, not all zero, such that $\sum_{i=1}^{m} \alpha_i a^i = 0^n$ and $\sum_{i=1}^{m} \alpha_i = 0$. Since the $\alpha$’s are not all zero and sum to zero, at least one of them is negative, so we may let
$$\varepsilon := \min \{ \lambda_i / (-\alpha_i) \mid \alpha_i < 0 \} > 0.$$
Then $\lambda_1 + \varepsilon\alpha_1, \dots, \lambda_m + \varepsilon\alpha_m$ are nonnegative, sum to one, and at least one of them is zero. Moreover, $x = \sum_{i=1}^{m} (\lambda_i + \varepsilon\alpha_i) a^i$,
and if terms with zero coefficients are omitted, then this is a representation of $x$ as a convex combination of fewer than $m$ points of $V$; this is a contradiction.

Again, a similar result holds for affine hulls. We state it below and leave the proof as an exercise.

Proposition 3.12 Let $x \in \operatorname{aff} V$, where $V \subseteq \mathbb{R}^n$. Then, $x$ can be expressed as an affine combination of $n + 1$ or fewer points of $V$.
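The proof of Theorem 3.11 can be read as an algorithm. While more than $n + 1$ points are used, it does three things:

1. Find an affine dependence $\alpha$.
2. Move the weights along $\alpha$ by the step $\varepsilon = \min\{\lambda_i/(-\alpha_i) \mid \alpha_i < 0\}$, the largest step that keeps them nonnegative.
3. Drop the points whose weight becomes zero.

The sketch below uses exact rational arithmetic. It finds the affine dependence as a linear dependence among the lifted vectors $(a^i, 1) \in \mathbb{R}^{n+1}$. The helper names `null_vector` and `caratheodory_reduce` are hypothetical.

```python
from fractions import Fraction


def null_vector(vectors):
    """Nonzero c with sum_j c[j] * vectors[j] = 0, or None if the vectors are
    linearly independent (exact Gauss-Jordan elimination)."""
    k, d = len(vectors), len(vectors[0])
    # Matrix whose columns are the given vectors.
    M = [[Fraction(vectors[j][i]) for j in range(k)] for i in range(d)]
    pivots, r = [], 0
    for c in range(k):
        p = next((i for i in range(r, d) if M[i][c] != 0), None)
        if p is None:
            continue  # column c corresponds to a free variable
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [v / piv for v in M[r]]
        for i in range(d):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(k) if c not in pivots]
    if not free:
        return None
    coeffs = [Fraction(0)] * k
    coeffs[free[0]] = Fraction(1)  # set one free variable to 1, the others to 0
    for row, c in enumerate(pivots):
        coeffs[c] = -M[row][free[0]]
    return coeffs


def caratheodory_reduce(points, weights):
    """Rewrite x = sum weights[i] * points[i] using at most n + 1 of the points."""
    pairs = [(tuple(p), Fraction(w)) for p, w in zip(points, weights) if w > 0]
    n = len(pairs[0][0])
    while len(pairs) > n + 1:
        # Affine dependence: sum alpha_i a^i = 0 and sum alpha_i = 0.
        alpha = null_vector([list(p) + [1] for p, _ in pairs])
        # alpha is nonzero and sums to zero, so some alpha_i < 0.
        eps = min(w / -a for (_, w), a in zip(pairs, alpha) if a < 0)
        pairs = [(p, w + eps * a) for (p, w), a in zip(pairs, alpha)]
        pairs = [(p, w) for p, w in pairs if w > 0]  # drop zero weights
    return pairs


pts = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
reduced = caratheodory_reduce(pts, [Fraction(1, 5)] * 5)
print(reduced)
```

Each pass removes at least one point, since the index attaining the minimum in $\varepsilon$ gets weight exactly zero. The loop therefore terminates, and the represented point $x$ never changes.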
3.2.2 Polytopes
We are now ready to define the geometric object called a polytope.

Definition 3.13 (polytope) A subset $P$ of $\mathbb{R}^n$ is a polytope if it is the convex hull of finitely many points in $\mathbb{R}^n$.

Example 3.14 (polytopes) (a) The set shown in Figure 3.4 is a polytope. (b) A cube and a tetrahedron are polytopes in $\mathbb{R}^3$.

We next show how to characterize a polytope as the convex hull of its extreme points.

Definition 3.15 (extreme point) A point $v$ of a convex set $P$ is called an extreme point if whenever $v = \lambda x^1 + (1 - \lambda) x^2$ holds, where $x^1, x^2 \in P$ and $\lambda \in (0, 1)$, then $v = x^1 = x^2$.

Example 3.16 (extreme points) The set shown in Figure 3.3(c) has the extreme points $v^1$ and $v^2$. The set shown in Figure 3.4 has the extreme points $v^1$, $v^2$, and $v^3$. The set shown in Figure 3.3(b) does not have any extreme points.

Lemma 3.17 Let $V := \{v^1, \dots, v^k\} \subset \mathbb{R}^n$ and let $P$ be the polytope $\operatorname{conv} V$. Then, each extreme point of $P$ lies in $V$.

Proof. Assume that $w \notin V$ is an extreme point of $P$. We have that $w = \sum_{i=1}^{k} \lambda_i v^i$ for some $\lambda_i \ge 0$ such that $\sum_{i=1}^{k} \lambda_i = 1$. At least one of the coefficients $\lambda_i$ must be nonzero, say $\lambda_1$. If $\lambda_1 = 1$, then $w = v^1$ (a contradiction), so $\lambda_1 \in (0, 1)$. We have that
$$w = \lambda_1 v^1 + (1 - \lambda_1) \sum_{i=2}^{k} \frac{\lambda_i}{1 - \lambda_1} v^i.$$
Since $\sum_{i=2}^{k} \lambda_i/(1 - \lambda_1) = 1$, we have that $\sum_{i=2}^{k} \frac{\lambda_i}{1 - \lambda_1} v^i \in P$. But $w$ is an extreme point of $P$, so $w = v^1$, a contradiction.
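Proposition 3.18 Let $V := \{v^1, \dots, v^k\} \subset \mathbb{R}^n$ and let $P$ be the polytope $\operatorname{conv} V$. Then $P$ is equal to the convex hull of its extreme points.

Proof. Let $Q$ be the set of extreme points of $P$. If $v^i \in Q$ for all $i = 1, \dots, k$, then we are done, so assume that $v^1 \notin Q$. Then $v^1 = \lambda u + (1 - \lambda) w$ for some $\lambda \in (0, 1)$ and $u, w \in P$, $u \neq w$. Further, $u = \sum_{i=1}^{k} \alpha_i v^i$ and $w = \sum_{i=1}^{k} \beta_i v^i$ for some $\alpha_1, \dots, \alpha_k, \beta_1, \dots, \beta_k \ge 0$ such that $\sum_{i=1}^{k} \alpha_i = \sum_{i=1}^{k} \beta_i = 1$. Hence,
$$v^1 = \lambda \sum_{i=1}^{k} \alpha_i v^i + (1 - \lambda) \sum_{i=1}^{k} \beta_i v^i = \sum_{i=1}^{k} \bigl(\lambda\alpha_i + (1 - \lambda)\beta_i\bigr) v^i.$$
It must hold that $\alpha_1, \beta_1 \neq 1$, since otherwise $u = w = v^1$, a contradiction. Therefore,
$$v^1 = \sum_{i=2}^{k} \frac{\lambda\alpha_i + (1 - \lambda)\beta_i}{1 - \bigl(\lambda\alpha_1 + (1 - \lambda)\beta_1\bigr)} v^i,$$
and since $\sum_{i=2}^{k} \bigl(\lambda\alpha_i + (1 - \lambda)\beta_i\bigr)/\bigl(1 - \lambda\alpha_1 - (1 - \lambda)\beta_1\bigr) = 1$, the point $v^1$ lies in $\operatorname{conv}(V \setminus \{v^1\})$. It follows that $\operatorname{conv} V = \operatorname{conv}(V \setminus \{v^1\})$. Similarly, every $v^i \notin Q$ can be removed, and we end up with a set $T \subseteq V$ such that $\operatorname{conv} T = \operatorname{conv} V$ and $T \subseteq Q$. On the other hand, from Lemma 3.17 we have that every extreme point of the set $\operatorname{conv} T$ lies in $T$. Since $\operatorname{conv} T = \operatorname{conv} V$, $Q$ is the set of extreme points of $\operatorname{conv} T$, so $Q \subseteq T$. Hence, $T = Q$, and $P = \operatorname{conv} Q$.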
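For a finite set of distinct points in the plane, two results combine into a membership test. Lemma 3.17 and the pruning argument in the proof of Proposition 3.18 (together with the distinctness of the points) say that the extreme points of $\operatorname{conv} V$ are exactly the points $v \in V$ with $v \notin \operatorname{conv}(V \setminus \{v\})$. Carathéodory’s Theorem with $n = 2$ then reduces membership in a convex hull to checking single points, segments, and triangles.

The sketch below is brute force. It assumes distinct points with integer (or `Fraction`) coordinates so that the orientation tests are exact. All function names are hypothetical, and the cost grows like $O(k^4)$, so it is meant for illustration only.

```python
from itertools import combinations


def cross(o, a, b):
    """z-component of (a - o) x (b - o); zero iff o, a, b are collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_segment(p, a, b):
    """True if p lies on the closed segment [a, b]."""
    return (cross(a, b, p) == 0
            and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def in_triangle(p, a, b, c):
    """True if p lies in conv{a, b, c} for a nondegenerate triangle."""
    d = (cross(a, b, p), cross(b, c, p), cross(c, a, p))
    # p is inside or on the boundary iff the three signs do not disagree.
    return not (min(d) < 0 < max(d))


def in_hull(p, others):
    """Caratheodory with n = 2: p is in conv(others) iff it is in the convex
    hull of at most three of them (a point, a segment, or a triangle)."""
    return (p in others
            or any(in_segment(p, a, b) for a, b in combinations(others, 2))
            or any(cross(a, b, c) != 0 and in_triangle(p, a, b, c)
                   for a, b, c in combinations(others, 3)))


def extreme_points(V):
    """Points v of V (distinct, exact coordinates) not in conv(V without v)."""
    return [v for v in V if not in_hull(v, [w for w in V if w != v])]


V = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (2, 0)]
print(extreme_points(V))  # [(0, 0), (4, 0), (4, 4), (0, 4)]
```

The interior point $(2, 2)$ and the edge point $(2, 0)$ are pruned, as in the proof of Proposition 3.18. The remaining corners are the set $Q$ whose convex hull is $P$.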
3.2.3 Polyhedra
Closely related to the polytope is the polyhedron. We will show that every polyhedron is the sum of a polytope and a polyhedral cone. In the next subsection we show that a set is a polytope if and only if it is a bounded polyhedron.

Definition 3.19 (polyhedron) A subset $P$ of $\mathbb{R}^n$ is a polyhedron if there exist a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $b \in \mathbb{R}^m$ such that $P = \{ x \in \mathbb{R}^n \mid Ax \le b \}$ holds.

Polyhedra are important because the set of feasible solutions of every linear programming problem (see Chapters 7 and 8) is a polyhedron.
Example 3.20 (polyhedra) (a) Figure 3.5 shows the bounded polyhedron $P := \{ x \in \mathbb{R}^2 \mid x_1 \ge 2;\ x_1 + x_2 \le 6;\ 2x_1 - x_2 \le 4 \}$. (b) The unbounded polyhedron $P := \{ x \in \mathbb{R}^2 \mid x_1 + x_2 \ge 2;\ x_1 - x_2 \le 2;\ 3x_1 - x_2 \ge 0 \}$ is shown in Figure 3.6.
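Definition 3.19 makes membership in a polyhedron a finite check of $m$ linear inequalities. The sketch below applies it to Example 3.20(b). It assumes the constraints are first brought into the form $Ax \le b$ by multiplying each “$\ge$” row by $-1$. The function name `in_polyhedron` is hypothetical.

```python
def in_polyhedron(A, b, x):
    """True if x satisfies every row of A x <= b."""
    return all(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i
               for row, b_i in zip(A, b))


# Example 3.20(b) written as A x <= b ('>=' rows multiplied by -1).
A = [[-1, -1],   # x1 + x2 >= 2
     [1, -1],    # x1 - x2 <= 2
     [-3, 1]]    # 3 x1 - x2 >= 0
b = [-2, 2, 0]

print(in_polyhedron(A, b, (1, 1)))  # True: all three rows hold
print(in_polyhedron(A, b, (0, 0)))  # False: x1 + x2 >= 2 fails
# P is unbounded: the points t * (1, 1) stay feasible for every t >= 1.
print(all(in_polyhedron(A, b, (t, t)) for t in (1, 10, 1000)))  # True
```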
Figure 3.5: Illustration of the bounded polyhedron $P := \{ x \in \mathbb{R}^2 \mid x_1 \ge 2;\ x_1 + x_2 \le 6;\ 2x_1 - x_2 \le 4 \}$.

Often it is hard to decide whether a point in a convex set is an extreme point or not. This is not the case for polyhedra, since there is an algebraic characterization of their extreme points. Given $\tilde{x} \in \{ x \in \mathbb{R}^n \mid Ax \le b \}$, we refer to the rows of $A\tilde{x} \le b$ that are fulfilled with equality as the equality subsystem of $A\tilde{x} \le b$, and denote it by $\tilde{A}\tilde{x} = \tilde{b}$. That is, $(\tilde{A}, \tilde{b})$ consists of the rows $i \in \{1, \dots, m\}$ of $(A, b)$ such that $A_{i\cdot}\tilde{x} = b_i$, where $A_{i\cdot}$ is the $i$th row of $A$. The number of rows in $(\tilde{A}, \tilde{b})$ is denoted by $\tilde{m}$.

Theorem 3.21 (algebraic characterization of extreme points) Let $\tilde{x} \in P = \{ x \in \mathbb{R}^n \mid Ax \le b \}$, where $A \in \mathbb{R}^{m \times n}$ has $\operatorname{rank} A = n$ and $b \in \mathbb{R}^m$. Further, let $\tilde{A}\tilde{x} = \tilde{b}$ be the equality subsystem of $A\tilde{x} \le b$. Then $\tilde{x}$ is an extreme point of $P$ if and only if $\operatorname{rank} \tilde{A} = n$.
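Theorem 3.21 turns the extreme-point question into a rank computation. The sketch below applies it to the bounded polyhedron of Example 3.20(a). The polyhedron is written as $Ax \le b$ with the row $x_1 \ge 2$ replaced by $-x_1 \le -2$, so $\operatorname{rank} A = 2 = n$ as the theorem requires.

The sketch also lists the extreme points. An extreme point has $n$ linearly independent active rows, so it solves some nonsingular $n \times n$ subsystem formed by $n$ rows of $Ax = b$. The sketch solves every such subsystem and keeps the feasible solutions. This enumeration is exponential in general and is meant only for small examples. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def dot(row, x):
    return sum(Fraction(a) * xj for a, xj in zip(row, x))


def is_feasible(A, b, x):
    return all(dot(row, x) <= bi for row, bi in zip(A, b))


def rank(rows):
    """Rank by exact Gaussian elimination (0 for an empty list of rows)."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * v for a, v in zip(m[i], m[r])]
        r += 1
    return r


def is_extreme_point(A, b, x):
    """Theorem 3.21: a feasible x is extreme iff its equality subsystem has rank n."""
    if not is_feasible(A, b, x):
        raise ValueError("x is not in P")
    active = [row for row, bi in zip(A, b) if dot(row, x) == bi]
    return rank(active) == len(x)


def solve_square(M, rhs):
    """Solve M y = rhs exactly (Gauss-Jordan); None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for c in range(n):
        p = next((i for i in range(c, n) if aug[i][c] != 0), None)
        if p is None:
            return None
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * v for a, v in zip(aug[i], aug[c])]
    return tuple(row[n] for row in aug)


def enumerate_extreme_points(A, b):
    """Feasible solutions of all nonsingular n x n subsystems of A x = b."""
    n = len(A[0])
    found = set()
    for rows in combinations(range(len(A)), n):
        x = solve_square([A[i] for i in rows], [b[i] for i in rows])
        if x is not None and is_feasible(A, b, x):
            found.add(x)
    return found


# Example 3.20(a): x1 >= 2, x1 + x2 <= 6, 2 x1 - x2 <= 4.
A = [[-1, 0], [1, 1], [2, -1]]
b = [-2, 6, 4]
print(sorted(enumerate_extreme_points(A, b)))
print(is_extreme_point(A, b, (2, 4)), is_extreme_point(A, b, (2, 2)))  # True False
```

For this triangle the enumeration returns the three corners $(2, 0)$, $(2, 4)$, and $(10/3, 8/3)$. The boundary point $(2, 2)$ has only one active row, so $\operatorname{rank}\tilde{A} = 1 < 2$ and it is not extreme.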
Niclas Andréasson is a licentiate of technology in applied mathematics at Chalmers University of Technology, Gothenburg. Anton Evgrafov is a doctor of philosophy in applied mathematics from Chalmers University of Technology, Gothenburg, and is presently an associate professor at the Department of Applied Mathematics and Computer Science, Technical University of Denmark, Lyngby. Michael Patriksson is a professor in applied mathematics at Chalmers University of Technology, Gothenburg. Emil Gustavsson is a licentiate of technology in applied mathematics at the University of Gothenburg. Magnus Önnheim is a licentiate of technology in applied mathematics at Chalmers University of Technology, Gothenburg.
Niclas Andréasson, Anton Evgrafov, Michael Patriksson, with Emil Gustavsson and Magnus Önnheim
2013-11-08 10.58