SOLUTIONS MANUAL FOR Finite Dimensional Linear Algebra
by Mark S. Gockenbach Michigan Technological University
Contents

Errata for the first printing  1

2 Fields and vector spaces  5
2.1 Fields  5
2.2 Vector spaces  10
2.3 Subspaces  12
2.4 Linear combinations and spanning sets  16
2.5 Linear independence  18
2.6 Basis and dimension  21
2.7 Properties of bases  29
2.8 Polynomial interpolation and the Lagrange basis  37
2.9 Continuous piecewise polynomial functions  43

3 Linear operators  47
3.1 Linear operators  47
3.2 More properties of linear operators  52
3.3 Isomorphic vector spaces  56
3.4 Linear operator equations  64
3.5 Existence and uniqueness of solutions  67
3.6 The fundamental theorem; inverse operators  73
3.7 Gaussian elimination  81
3.8 Newton's method  82
3.9 Linear ordinary differential equations  83
3.10 Graph theory  85
3.11 Coding theory  86
3.12 Linear programming  87

4 Determinants and eigenvalues  91
4.1 The determinant function  91
4.2 Further properties of the determinant function  95
4.3 Practical computation of det(A)  96
4.5 Eigenvalues and the characteristic polynomial  100
4.6 Diagonalization  104
4.7 Eigenvalues of linear operators  108
4.8 Systems of linear ODEs  114
4.9 Integer programming  116

5 The Jordan canonical form  117
5.1 Invariant subspaces  117
5.2 Generalized eigenspaces  122
5.3 Nilpotent operators  129
5.4 The Jordan canonical form of a matrix  134
5.5 The matrix exponential  145
5.6 Graphs and eigenvalues  147

6 Orthogonality and best approximation  149
6.1 Norms and inner products  149
6.2 The adjoint of a linear operator  156
6.3 Orthogonal vectors and bases  161
6.4 The projection theorem  165
6.5 The Gram-Schmidt process  172
6.6 Orthogonal complements  177
6.7 Complex inner product spaces  181
6.8 More on polynomial approximation  184
6.9 The energy inner product and Galerkin's method  187
6.10 Gaussian quadrature  189
6.11 The Helmholtz decomposition  191

7 The spectral theory of symmetric matrices  193
7.1 The spectral theorem for symmetric matrices  193
7.2 The spectral theorem for normal matrices  198
7.3 Optimization and the Hessian matrix  202
7.4 Lagrange multipliers  204
7.5 Spectral methods for differential equations  206

8 The singular value decomposition  209
8.1 Introduction to the SVD  209
8.2 The SVD for general matrices  212
8.3 Solving least-squares problems using the SVD  216
8.4 The SVD and linear inverse problems  219
8.5 The Smith normal form of a matrix  221

9 Matrix factorizations and numerical linear algebra  223
9.1 The LU factorization  223
9.2 Partial pivoting  227
9.3 The Cholesky factorization  229
9.4 Matrix norms  230
9.5 The sensitivity of linear systems to errors  233
9.6 Numerical stability  236
9.7 The sensitivity of the least-squares problem  238
9.8 The QR factorization  240
9.9 Eigenvalues and simultaneous iteration  242
9.10 The QR algorithm  245

10 Analysis in vector spaces  247
10.1 Analysis in R^n  247
10.2 Infinite-dimensional vector spaces  249
10.3 Functional analysis  251
10.4 Weak convergence  252
Errata for the first printing

The following corrections will be made in the second printing of the text, expected in 2011. The solutions manual is written as if they have already been made.

Page 65, Exercise 14: belongs in Section 2.7.
Page 65, Exercise 16: should read "(cf. Exercise 2.3.21)", not "(cf. Exercise 2.2.21)".
Page 71, Exercise 9(b): Z_4^5 should be Z_5^4.
Page 72, Exercise 11: "over V" should be "over F".
Page 72, Exercise 15: "i = 1, 2, . . . , k" should be "j = 1, 2, . . . , k" (twice).
Page 79, Exercise 1: "x_3 = 2" should be "x_3 = 3".
Page 82, Exercise 14(a): "Each A_i and B_i has degree 2n + 1" should read "A_i, B_i ∈ P_{2n+1} for all i = 0, 1, . . . , n".
Page 100, Exercise 11: "K : C[a, b] → C[a, b]" should be "K : C[c, d] → C[a, b]".
Page 114, Line 9: "L : F^n → R^m" should be "L : F^n → F^m".
Page 115, Exercise 8: "S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} X = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}." should be "S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, X = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}."
Page 116, Exercise 17(b): "F mn" should be "F mn".
Page 121, Exercise 3: "T : R^4 → R^3" should be "T : R^4 → R^4".
Page 124, Exercise 15: "T : X/ker(L) → R(U)" should be "T : X/ker(L) → R(L)".
Page 124, Exercise 15: "T([x]) = T(x) for all [x] ∈ X/ker(L)" should be "T([x]) = L(x) for all [x] ∈ X/ker(L)".
Page 129, Exercise 4(b): A period is missing at the end of the sentence.
Page 130, Exercise 8: "L : Z_3^3 → Z_3^3" should read "L : Z_5^3 → Z_5^3".
Page 130, Exercise 13(b): "T defines . . ." should be "S defines . . .".
Page 131, Exercise 15: "K : C[a, b] × C[c, d] → C[a, b]" should be "K : C[c, d] → C[a, b]".
Page 138, Exercise 7(b): "define" should be "defined".
Page 139, Exercise 12: In the last line, "sp{x_1, x_2, . . . , x_n}" should be "sp{x_1, x_2, . . . , x_k}".
Page 139, Exercise 12: The proposed plan for the proof is not valid. Instead, the instructions should read: Choose vectors x_1, . . . , x_k ∈ X such that {T(x_1), . . . , T(x_k)} is a basis for R(T), and choose a basis {y_1, . . . , y_ℓ} for ker(T). Prove that {x_1, . . . , x_k, y_1, . . . , y_ℓ} is a basis for X. (Hint: First show that ker(T) ∩ sp{x_1, . . . , x_k} is trivial.)
Page 140, Exercise 15: In the displayed equation, |A_ii should be |A_ii|.
Page 168: Definition 132 defines the adjacency matrix of a graph, not the incidence matrix (which is something different). The correct term (adjacency matrix) is used throughout the rest of the section. (Change "incidence" to "adjacency" in three places: the title of Section 3.10.1, Page 168 line -2, Page 169 line 1.)
Page 199, Equation (3.41d): "x_1, x_2 ≤ 0" should be "x_1, x_2 ≥ 0".
Page 204, Exercise 10: "α_1, . . . , α_k ∈ R" should be "α_1, . . . , α_k ≥ 0". Also, C should not be boldface in the displayed formula.
Page 221, Exercise 9: "m > n" should be "m < n".
Page 242, Corollary 194: "for each i = 1, 2, . . . , t" should be "for each i = 1, 2, . . . , m".
Page 251, Exercise 18(e): The displayed equation defining w as the block vector with blocks 0 and v should end with a period rather than a comma.
Page 256, Exercise 13: First line should read "Let X be a finite-dimensional vector space over C with basis. . .". References in part (b) to F^{n×n}, F^{k×k}, F^{k×ℓ}, F^{ℓ×ℓ} should be replaced with C^{n×n}, etc. Also, in part (b), "Prove that [T]_X" should be replaced with "Prove that [T]_{X,X}".
Page 264, Exercise 3: Add "Assume {p, q} is linearly independent."
Page 271, Exercise 3: ". . . we introduced the incidence matrix . . ." should be ". . . we introduced the adjacency matrix . . .".
Page 282, Exercise 6: S = sp{(1, 3, −3, 2), (3, 7, −11, −4)} should be S = sp{(1, 4, −1, 3), (4, 7, −19, 3)}.
Page 282, Exercise 7(b): "N(A) ∩ col(A)" should be "N(A) ∩ col(A) = {0}".
Page 283, Exercise 12: "Lemma 5.1.2" should be "Lemma 229".
Page 306, Example 252: "B = {p_0, D(p_0), D^2(p_0)} = {x^2, 2x, 2}" should be "B = {D^2(p_0), D(p_0), p_0} = {2, 2x, x^2}". Also, "[T]_{B,B}" should be "[D]_{B,B}" (twice). Similarly, A should be defined as {2, −1 + 2x, 1 − x + x^2} and "[T]_{A,A}" should be "[D]_{A,A}".
Page 308, Exercise 3: "Suppose X is a vector space. . ." should be "Suppose X is a finite-dimensional vector space. . .".
Page 311, Line 7: "corresponding to λ" should be "corresponding to λ_i".
Page 316, Exercise 6(f): Should end with a ";" instead of a ".".
Page 317, Exercise 15: "ker((T − λI)^2) = ker(A − λI)" should be "ker((T − λI)^2) = ker(T − λI)".
Page 322, displayed equation (5.21): The last line should read v_r = λv_r.
Page 325, Exercise 9: "If U(t_0) is singular, say U(t)c = 0 for some c ∈ C^n, c ≠ 0" should be "If U(t_0) is singular, say U(t_0)c = 0 for some c ∈ C^n, c ≠ 0".
Page 331, Line 16: ". . . is at least t + 1" should be ". . . is at least s + 1".
Page 356, Exercise 9: ". . . such that {x_1, x_2, x_3, x_4}." should be ". . . such that {x_1, x_2, x_3, x_4} is an orthogonal basis for R^4."
Page 356, Exercise 13: ". . . be a linearly independent subset of V" should be ". . . be an orthogonal subset of V".
Page 356, Exercise 14: ". . . be a linearly independent subset of V" should be ". . . be an orthogonal subset of V".
Pages 365–368: Miscellaneous exercises 1–21 should be numbered 2–22.
Page 365, Exercise 6 (should be 7): ". . . under the L^2(0, 1) norm" should be ". . . under the L^2(0, 1) inner product".
Page 383, Line 1: "col(T)" should be "col(A)" and "col(T)^⊥" should be "col(A)^⊥".
Page 383, Exercise 3: ". . . a basis for R^4" should be ". . . a basis for R^3".
Page 384, Exercise 6: "basis" should be "bases".
Page 385, Exercise 14: "Exercise 6.4.13" should be "Exercise 6.4.1". "That exercise also" should be "Exercise 6.4.13".
Page 385, Exercise 15: "See Exercise 6.4" should be "See Exercise 6.4.14".
Page 400, Exercise 4: f̃(x) = f(a + ((b − a)/2)(t + 1)) should be f̃(x) = f(a + ((b − a)/2)(x + 1)).
Page 410, Exercise 1: The problem should specify ℓ = 1, k(x) = x + 1, f(x) = −4x − 1.
Page 411, Exercise 6: "u(ℓ) = 0." should be "u(ℓ) = 0" (i.e., there should not be a period after 0).
Page 424, Exercise 1: ". . . prove (1)" should be ". . . prove (6.50)".
Page 432, Exercise 9: "G^{−1/2} is the inverse of G^{−1/2}" should be "G^{−1/2} is the inverse of G^{1/2}".
Page 433, Exercise 16: ". . . so we will try to estimate the values u(x_1), u(x_2), . . . , u(x_n)" should be ". . . so we will try to estimate the values u(x_1), u(x_2), . . . , u(x_{n−1})".
Page 438, Exercise 3: ". . . define T : R^n → F^n" should be ". . . define T : F^n → F^n".
Page 448, Exercise 8: In the formula for f, "−200x21 x2" should be "−200x21 x2". Also, (−1.2, 1) should be (1, 1).
Page 453, Exercise 6: Add: "Assume ∇g(x^{(0)}) has full rank."
Page 475, Exercise 10: "A = GH" should be "A = GQ".
Page 476, Exercise 15(a): The displayed formula
    ||A||_F = ( Σ_{i=1}^m Σ_{j=1}^n A_{ij}^2 )^{1/2} for all A ∈ C^{m×n}
should be
    ||A||_F = ( Σ_{i=1}^m Σ_{j=1}^n |A_{ij}|^2 )^{1/2} for all A ∈ C^{m×n}.
Page 476, Exercise 15: No need to define ||C||_F again.
Page 501, last paragraph: The text fails to define k ≡ ℓ (mod p) for general k, ℓ ∈ Z. The following text should be added: "In general, for k, ℓ ∈ Z, we say that k ≡ ℓ (mod p) if p divides k − ℓ, that is, if there exists m ∈ Z with k = ℓ + mp. It is easy to show that, if r is the congruence class of k ∈ Z, then p divides k − r, and hence this is consistent with the earlier definition. Moreover, it is a straightforward exercise to show that k ≡ ℓ (mod p) if and only if k and ℓ have the same congruence class modulo p."
Page 511, Theorem 381: "A_{ij}^{(k)} = A_{ij}" should be "M_{ij}^{(k)} = A_{ij}".
Page 516, Exercise 8: "A^{(1)}, A^{(2)}, . . . , A^{(n−1)}" should be "M^{(1)}, M^{(2)}, . . . , M^{(n−1)}".
Page 516, Exercise 10: n^2/2 − n/2 should be n^2 − n.
Page 523, Exercise 6(b): ". . . the columns of AP are. . ." should be ". . . the columns of AP^T are. . .".
Page 535, Theorem 401: ||A||_1 should be ||A||_∞ (twice).
Page 536, Exercise 1: ". . . be any matrix norm. . ." should be ". . . be any induced matrix norm. . .".
Page 554, Exercise 4: ". . . b is consider the data . . ." should be ". . . b is considered the data . . .".
Page 554, Exercise 5: ". . . b is consider the data . . ." should be ". . . b is considered the data . . .".
Page 563, Exercise 7: "Let v ∈ R^m be given and define α = ±||x||_2, let {u_1, u_2, . . . , u_m} be an orthonormal basis for R^m, where u_1 = x/||x||_2 . . ." should be "Let v ∈ R^m be given, define α = ±||v||_2, x = αe_1 − v, u_1 = x/||x||_2, and let {u_1, u_2, . . . , u_m} be an orthonormal basis for R^m, . . .".
Page 571, Exercise 3: "Prove that the angle between A^k v_0 and x_1 converges to zero as k → ∞" should be "Prove that the angle between A^k v_0 and sp{x_1} = E_A(λ_1) converges to zero as k → ∞."
Page 575, line 15: 3n^2 − n should be 3n^2 + 2n − 5.
Page 575, line 16: "n square roots" should be "n − 1 square roots".
Page 580, Exercise 3: ". . . requires 3n^2 − n arithmetic operations, plus the calculation of n square roots, . . ." should be ". . . requires 3n^2 + 2n − 5 arithmetic operations, plus the calculation of n − 1 square roots, . . .".
Page 585, line 19: "original subsequence" should be "original sequence".
Page 585, line 20: "original subsequence" should be "original sequence".
Page 604, Exercise 4: "Theorem 4" should be "Theorem 451".
Page 608, line 18: ". . . exists a real number. . ." should be ". . . exists as a real number. . .".
Chapter 2
Fields and vector spaces

2.1 Fields
1. Let F be a field.

(a) We wish to prove that −1 ≠ 0. Let us argue by contradiction, and assume that −1 = 0. Then α + (−1) = α for all α ∈ F; in particular, 1 + (−1) = 1. But 1 + (−1) = 0 by definition of −1, and therefore 1 = 0 must hold. This contradicts the definition of a field (which states that 1 and 0 are distinct elements), and hence −1 = 0 cannot hold in a field.
(b) It need not be the case that −1 ≠ 1; in fact, in Z_2, 1 + 1 = 0, and therefore −1 = 1.

2. Let F be a field. We wish to show that the multiplicative identity 1 is unique. Let us suppose that γ ∈ F satisfies αγ = α for all α ∈ F. We then have 1 = 1 · γ (since γ is a multiplicative identity), and also γ = 1 · γ (since 1 is a multiplicative identity). This implies that γ = 1, and hence the multiplicative identity is unique.

3. Let F be a field and let α ∈ F be nonzero. We wish to show that the multiplicative inverse of α is unique. Suppose β ∈ F satisfies αβ = 1. Then, multiplying both sides of the equation by α^{-1}, we obtain α^{-1}(αβ) = α^{-1} · 1, or (α^{-1}α)β = α^{-1}, or 1 · β = α^{-1}. It follows that β = α^{-1}, and thus α has a unique multiplicative inverse.

4. Let F be a field, and suppose α, β, γ ∈ F.

(a) We have (−α) + α = 0; since the additive inverse of −α is unique, this implies that α = −(−α).
(b) Using the associative and commutative properties of addition, we can rewrite (α + β) + (−α + (−β)) as (α + (−α)) + (β + (−β)) = 0 + 0 = 0. Therefore, −(α + β) = −α + (−β).
(c) As in the last part, we can use commutativity and associativity to show that (α − β) + (−α + β) = 0.
(d) We have αβ + α(−β) = α(β + (−β)) = α · 0 = 0, and this proves that −(αβ) = α(−β).
(e) This follows from the previous result and the commutative property of multiplication.
(f) Applying the first, fourth, and fifth results, we have (−α)(−β) = −(α(−β)) = −(−(αβ)) = αβ.
(g) Assume α ≠ 0. Then α(α^{-1} + (−α)^{-1}) = αα^{-1} + α(−α)^{-1} = 1 − (−α)(−α)^{-1} = 1 − 1 = 0. Since α ≠ 0, α(α^{-1} + (−α)^{-1}) = 0 implies that α^{-1} + (−α)^{-1} = 0, which in turn implies that (−α)^{-1} = −(α^{-1}).
(h) Using the associative and commutative properties of multiplication, we can rewrite (αβ)(α^{-1}β^{-1}) as (αα^{-1})(ββ^{-1}) = 1 · 1 = 1. This shows that α^{-1}β^{-1} = (αβ)^{-1}.
(i) Using the definition of subtraction, the distributive property, and the fourth property above, we have α(β − γ) = α(β + (−γ)) = αβ + α(−γ) = αβ + (−(αγ)) = αβ − αγ.
(j) This is proved in the same way as the previous result.
5. (a) By definition, i^2 = (0 + 1 · i)(0 + 1 · i) = (0 · 0 − 1 · 1) + (0 · 1 + 1 · 0)i = −1 + 0 · i = −1.

(b) We will prove only the existence of multiplicative inverses, the other properties of a field being straightforward (although possibly tedious) to verify. Let a + bi be a nonzero complex number (which means that a ≠ 0 or b ≠ 0). We must prove that there exists a complex number c + di such that (a + bi)(c + di) = (ac − bd) + (ad + bc)i equals 1 = 1 + 0 · i. This is equivalent to the pair of equations ac − bd = 1, ad + bc = 0. If a ≠ 0, then the second equation yields d = −bc/a; substituting into the first equation yields ac + b^2 c/a = 1, or c = a/(a^2 + b^2). Then d = −bc/a = −b/(a^2 + b^2). This solution is well-defined even if a = 0 (since in that case b ≠ 0), and it can be directly verified that it satisfies (a + bi)(c + di) = 1 in that case also. Thus each nonzero complex number has a multiplicative inverse.

6. Let F be a field, and let α, β, γ ∈ F with γ ≠ 0. Suppose αγ = βγ. Then, multiplying both sides by γ^{-1}, we obtain (αγ)γ^{-1} = (βγ)γ^{-1}, or α(γγ^{-1}) = β(γγ^{-1}), which then yields α · 1 = β · 1, or α = β.

7. Let F be a field and let α, β be elements of F. We wish to show that the equation α + x = β has a unique solution. The proof has two parts. First, if x satisfies α + x = β, then adding −α to both sides shows that x must equal −α + β = β − α. This shows that the equation has at most one solution. On the other hand, x = −α + β is a solution since α + (−α + β) = (α − α) + β = 0 + β = β. Therefore, α + x = β has a unique solution, namely, x = −α + β.

8. Let F be a field, and let α, β ∈ F. We wish to determine if the equation αx = β always has a unique solution. The answer is no; in fact, there are three possible cases. First, if α = 0 and β = 0, then αx = β is satisfied by every element of F, and there are multiple solutions in this case. Second, if α = 0, β ≠ 0, then αx = β has no solution (since αx = 0 for all x ∈ F in this case). Third, if α ≠ 0, then αx = β has the unique solution x = α^{-1}β. (Existence and uniqueness is proved in this case as in the previous exercise.)

9. Let F be a field. Let α, β, γ, δ ∈ F, with β, δ ≠ 0. We wish to show that

    α/β + γ/δ = (αδ + βγ)/(βδ),   (α/β) · (γ/δ) = (αγ)/(βδ).

Using the definition of division, the commutative and associative properties of multiplication, and finally the distributive property, we obtain

    α/β + γ/δ = αβ^{-1} + γδ^{-1}
              = (α · 1)β^{-1} + (γ · 1)δ^{-1}
              = (α(δδ^{-1}))β^{-1} + (γ(ββ^{-1}))δ^{-1}
              = ((αδ)δ^{-1})β^{-1} + ((γβ)β^{-1})δ^{-1}
              = (αδ)(δ^{-1}β^{-1}) + (γβ)(β^{-1}δ^{-1})
              = (αδ)(δβ)^{-1} + (γβ)(βδ)^{-1}
              = (αδ)(βδ)^{-1} + (γβ)(βδ)^{-1}
              = ((αδ) + (γβ))(βδ)^{-1}
              = (αδ + βγ)/(βδ).

Similarly,

    (α/β) · (γ/δ) = (αβ^{-1})(γδ^{-1}) = ((αβ^{-1})γ)δ^{-1} = (α(β^{-1}γ))δ^{-1} = (α(γβ^{-1}))δ^{-1}
                  = ((αγ)β^{-1})δ^{-1} = (αγ)(β^{-1}δ^{-1}) = (αγ)(βδ)^{-1} = (αγ)/(βδ).

Now assuming that β, γ, δ ≠ 0, we wish to show that

    (α/β)/(γ/δ) = (αδ)/(βγ).
Using the fact that (δ^{-1})^{-1} = δ, we have

    (α/β)/(γ/δ) = (αβ^{-1})(γδ^{-1})^{-1} = (αβ^{-1})(γ^{-1}δ) = ((αβ^{-1})γ^{-1})δ = (α(β^{-1}γ^{-1}))δ
                = (α(βγ)^{-1})δ = α((βγ)^{-1}δ) = α(δ(βγ)^{-1}) = (αδ)(βγ)^{-1} = (αδ)/(βγ).

10. Let F be a field, and let α ∈ F be given. We wish to prove that, for any β_1, . . . , β_n ∈ F, α(β_1 + · · · + β_n) = αβ_1 + · · · + αβ_n. We argue by induction on n. For n = 1, the result is simply αβ_1 = αβ_1. Suppose that for some n ≥ 2, α(β_1 + · · · + β_{n−1}) = αβ_1 + · · · + αβ_{n−1} for any β_1, . . . , β_{n−1} ∈ F. Let β_1, . . . , β_{n−1}, β_n ∈ F. Then

    α(β_1 + · · · + β_n) = α((β_1 + · · · + β_{n−1}) + β_n) = α(β_1 + · · · + β_{n−1}) + αβ_n = αβ_1 + · · · + αβ_{n−1} + αβ_n.

(In the last step, we applied the induction hypothesis, and in the step preceding that, the distributive property of multiplication over addition.) This shows that α(β_1 + · · · + β_n) = αβ_1 + · · · + αβ_n, and the general result now follows by induction.

11. (a) The space Z is not a field because multiplicative inverses do not exist in general. For example, 2 ≠ 0, yet there exists no n ∈ Z such that 2n = 1.
(b) The space Q of rational numbers is a field. Assuming the usual definitions for addition and multiplication, all of the defining properties of a field are straightforward to verify.
(c) The space of positive real numbers is not a field because there is no additive identity: for any z ∈ (0, ∞), x + z > x for all x ∈ (0, ∞).

12. Let F = {(α, β) : α, β ∈ R}, and define addition and multiplication on F by (α, β) + (γ, δ) = (α + γ, β + δ), (α, β) · (γ, δ) = (αγ, βδ). With these definitions, F is not a field because multiplicative inverses do not exist. It is straightforward to verify that (0, 0) is an additive identity and (1, 1) is a multiplicative identity. Then (1, 0) ≠ (0, 0), yet (1, 0) · (α, β) = (α, 0) ≠ (1, 1) for all (α, β) ∈ F. Since F contains a nonzero element with no multiplicative inverse, F is not a field.

13. Let F = (0, ∞), and define addition and multiplication on F by x ⊕ y = xy, x ⊙ y = x^{ln y}. We wish to show that F is a field. Commutativity and associativity of addition follow immediately from these properties for ordinary multiplication of real numbers. Obviously 1 is an additive identity, and the additive inverse of x ∈ F is its reciprocal 1/x. The properties of multiplication are less obvious, but note that x ⊙ y = x^{ln y} = e^{ln(x) ln(y)}, and this formula makes both commutativity and associativity easy to verify. We also see that e is a multiplicative identity: x ⊙ e = x^{ln e} = x^1 = x for all x ∈ F. For any x ∈ F with x ≠ 1, y = e^{1/ln(x)} is a multiplicative inverse. Finally, for any x, y, z ∈ F,

    x ⊙ (y ⊕ z) = x ⊙ (yz) = x^{ln(yz)} = x^{ln(y)+ln(z)} = x^{ln(y)} x^{ln(z)} = (x ⊙ y)(x ⊙ z) = (x ⊙ y) ⊕ (x ⊙ z).

Thus the distributive property holds. (A numerical check of these properties appears after Exercise 14 below.)

14. Suppose F is a set on which are defined two operations, addition and multiplication, such that all the properties of a field are satisfied except that addition is not assumed to be commutative. We wish to show that, in fact, addition must be commutative, and therefore F must be a field. We first note that it is possible to prove that 0 · γ = 0, −1 · γ = −γ, and −(−γ) = γ for all γ ∈ F without invoking commutativity of addition. Moreover, for all α, β ∈ F, −β + (−α) = −(α + β) since

    (α + β) + (−β + (−α)) = ((α + β) + (−β)) + (−α) = (α + (β + (−β))) + (−α) = (α + 0) + (−α) = α + (−α) = 0.

We therefore conclude that −1 · (α + β) = −β + (−α) for all α, β ∈ F. But, by the distributive property, −1 · (α + β) = −1 · α + (−1) · β = −α + (−β), and therefore −α + (−β) = −β + (−α) for all α, β ∈ F.
Applying this property to −α, −β in place of α, β, respectively, yields α + β = β + α for all α, β ∈ F , which is what we wanted to prove.
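Although the manual itself contains no code, the exotic field of Exercise 13 lends itself to a quick numerical sanity check. The following Python sketch (a supplementary illustration using only the standard library, not part of the original solution) samples random elements of (0, ∞) and verifies each field axiom up to floating-point rounding.

# Numerical spot-check of the field F = (0, infinity) from Exercise 13, where
# x (+) y = xy and x (.) y = x**ln(y).  Supplementary sketch only.
import math
import random

def add(x, y):                      # the "addition" of Exercise 13
    return x * y

def mul(x, y):                      # the "multiplication" of Exercise 13
    return x ** math.log(y)

def close(a, b, tol=1e-9):
    return abs(a - b) <= tol * max(1.0, abs(a), abs(b))

random.seed(0)
for _ in range(1000):
    x, y, z = (math.exp(random.uniform(-2.0, 2.0)) for _ in range(3))
    assert close(add(x, y), add(y, x))                           # (+) commutes
    assert close(mul(x, y), mul(y, x))                           # (.) commutes
    assert close(mul(x, mul(y, z)), mul(mul(x, y), z))           # (.) associates
    assert close(add(x, 1.0), x)                                 # 1 is the additive identity
    assert close(add(x, 1.0 / x), 1.0)                           # 1/x is the additive inverse
    assert close(mul(x, math.e), x)                              # e is the multiplicative identity
    if abs(math.log(x)) > 0.1:                                   # x away from 1 has inverse e^(1/ln x)
        assert close(mul(x, math.exp(1.0 / math.log(x))), math.e)
    assert close(mul(x, add(y, z)), add(mul(x, y), mul(x, z)))   # distributivity
print("All sampled axioms hold for the field of Exercise 13.")

Such a check cannot replace the proof, but it catches mistakes in the identification of the identities and inverses quickly.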
15. (a) In Z_2 = {0, 1}, we have 0 + 0 = 0, 1 + 0 = 0 + 1 = 1, and 1 + 1 = 0. This shows that −0 = 0 (as always) and −1 = 1. Also, 0 · 0 = 0 · 1 = 1 · 0 = 0, 1 · 1 = 1, and 1^{-1} = 1 (as in any field).

(b) The addition and multiplication tables for Z_3 = {0, 1, 2} are

+ 0 1 2
0 0 1 2
1 1 2 0
2 2 0 1
· 0 1 2
0 0 0 0
1 0 1 2
2 0 2 1
We see that −0 = 0, −1 = 2, −2 = 1, 1^{-1} = 1, 2^{-1} = 2. The addition and multiplication tables for Z_5 = {0, 1, 2, 3, 4} are

+ 0 1 2 3 4
0 0 1 2 3 4
1 1 2 3 4 0
2 2 3 4 0 1
3 3 4 0 1 2
4 4 0 1 2 3
· 0 1 2 3 4
0 0 0 0 0 0
1 0 1 2 3 4
2 0 2 4 1 3
3 0 3 1 4 2
4 0 4 3 2 1
We see that −0 = 0, −1 = 4, −2 = 3, −3 = 2, −4 = 1, 1^{-1} = 1, 2^{-1} = 3, 3^{-1} = 2, 4^{-1} = 4.

16. Let p be a positive integer that is prime. We wish to show that Z_p is a field. The commutativity of addition and multiplication in Z_p follow immediately from the commutativity of addition and multiplication of integers, and similarly for associativity and the distributive property. Obviously 0 is an additive identity and 1 is a multiplicative identity. Also, 1 ≠ 0. For any α ∈ Z_p, the integer p − α, regarded as an element of Z_p, satisfies α + (p − α) = 0 in Z_p; therefore every element of Z_p has an additive inverse. It remains only to prove that every nonzero element of Z_p has a multiplicative inverse. Suppose α ∈ Z_p, α ≠ 0. Since Z_p has only finitely many elements, α, α^2, α^3, . . . cannot all be distinct; there must exist positive integers k, ℓ, with k > ℓ, such that α^k = α^ℓ in Z_p. This means that the integers α^k, α^ℓ satisfy α^k = α^ℓ + np for some positive integer n, which in turn yields α^k − α^ℓ = np, or α^ℓ(α^{k−ℓ} − 1) = np. Now, a basic theorem from number theory states that if a prime p divides a product of integers, then it must divide one of the integers in the product. In this case, p must divide α or α^{k−ℓ} − 1. Since 0 < α < p, p does not divide α, and therefore p divides α^{k−ℓ} − 1. Therefore, α^{k−ℓ} − 1 = sp, where s is a positive integer; this is equivalent to α^{k−ℓ} = 1 in Z_p. Finally, this means that α · α^{k−ℓ−1} = 1 in Z_p, and therefore α has a multiplicative inverse, namely, α^{k−ℓ−1}. This completes the proof that Z_p is a field.

17. Let p be a positive integer that is not prime. It is easy to see that 1 is a multiplicative identity in Z_p. Since p is not prime, there exist integers m, n satisfying 1 < m, n < p and mn = p. But then, if m and n are regarded as elements of Z_p, m, n ≠ 0 and mn = 0, which is impossible in a field. Therefore, Z_p is not a field when p is not prime.

18. Let F be a finite field.

(a) Consider the elements 1, 1 + 1, 1 + 1 + 1, . . . in F. Since F contains only finitely many elements, there must exist two terms in this sequence that are equal, say 1 + 1 + · · · + 1 (ℓ terms) and 1 + 1 + · · · + 1 (k terms), where k > ℓ. We can then add −1 to both sides ℓ times to show that 1 + 1 + · · · + 1 (k − ℓ terms) equals 0 in F. Since at least one term of the sequence 1, 1 + 1, 1 + 1 + 1, . . . equals 0, we can define n to be the smallest integer greater than 1 such that 1 + 1 + · · · + 1 = 0 (n terms). We call n the characteristic of the field.
(b) Given that the characteristic of F is n, for any α ∈ F, we have α + α + · · · + α = α(1 + 1 + · · · + 1) = α · 0 = 0 if the sum has n terms.
(c) We now wish to show that the characteristic n is prime. Suppose, by way of contradiction, that n = kℓ, where 1 < k, ℓ < n. Define α = 1 + 1 + · · · + 1 (k terms) and β = 1 + 1 + · · · + 1 (ℓ terms). Then αβ = 1 + 1 + · · · + 1 (n terms), so that αβ = 0. But this implies that α = 0 or β = 0, which contradicts the definition of the characteristic n. This contradiction shows that n must be prime.
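The tables of Exercise 15 and the inverse construction in the proof of Exercise 16 can both be produced mechanically. The Python sketch below (a supplementary illustration, not part of the original solutions) builds the tables modulo a prime p and computes each inverse exactly as in the proof: find k > ℓ with α^k = α^ℓ in Z_p, then α^{k−ℓ−1} is the inverse.

# Supplementary sketch for Exercises 15-16: tables and inverses in Z_p.
p = 5  # any prime

def table(op):
    # operation table of Z_p, one list per row
    return [[op(a, b) % p for b in range(p)] for a in range(p)]

def inverse(a):
    # pigeonhole argument from Exercise 16
    assert a % p != 0
    seen = {}                         # power value -> exponent at first occurrence
    value, exponent = a % p, 1
    while value not in seen:
        seen[value] = exponent
        value = (value * a) % p
        exponent += 1
    l, k = seen[value], exponent      # a^k = a^l in Z_p with k > l
    return pow(a, k - l - 1, p)       # a^(k-l-1) is the multiplicative inverse

print("addition table mod", p)
for row in table(lambda a, b: a + b):
    print(row)
print("multiplication table mod", p)
for row in table(lambda a, b: a * b):
    print(row)
for a in range(1, p):
    inv = inverse(a)
    assert (a * inv) % p == 1
    print(a, "has inverse", inv, "in Z_" + str(p))

Running it with p = 5 reproduces the tables and inverses listed above; changing p to any other prime gives the corresponding field.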
19. We are given that H represents the space of quaternions and the definitions of addition and multiplication in H. The first two parts of the exercise are purely computational.

(a) i^2 = j^2 = k^2 = −1, ij = k, ik = −j, jk = i, ji = −k, ki = j, kj = −i, ijk = −1.
(b) x x̄ = x̄ x = x_1^2 + x_2^2 + x_3^2 + x_4^2.
(c) The additive identity in H is 0 = 0 + 0i + 0j + 0k. The additive inverse of x = x_1 + x_2 i + x_3 j + x_4 k is −x = −x_1 − x_2 i − x_3 j − x_4 k.
(d) The calculations above show that multiplication is not commutative; for instance, ij = k, ji = −k.
(e) It is easy to verify that 1 = 1 + 0i + 0j + 0k is a multiplicative identity for H.
(f) If x ∈ H is nonzero, then x x̄ = x_1^2 + x_2^2 + x_3^2 + x_4^2 is also nonzero. It follows that

    x̄/(x x̄) = x_1/(x x̄) − (x_2/(x x̄)) i − (x_3/(x x̄)) j − (x_4/(x x̄)) k

is a multiplicative inverse for x:

    x · (x̄/(x x̄)) = (x x̄)/(x x̄) = 1.

20. In order to solve the first two parts of this problem, it is convenient to prove the following result. Suppose F is a field under operations +, ·, G is a nonempty subset of F, and G is a field under operations ⊕, ⊙. Moreover, suppose that for all x, y ∈ G, x + y = x ⊕ y and x · y = x ⊙ y. Then G is a subfield of F. We already know (since the operations on F reduce to the operations on G when the operands belong to G) that G is closed under + and ·. We have to prove that the additive and multiplicative identities of F belong to G, which we will do by showing that 0_F = 0_G (that is, the additive identity of F equals the additive identity of G under ⊕) and 1_F = 1_G (which has the analogous meaning). To prove the first, notice that 0_G + 0_G = 0_G ⊕ 0_G since 0_G ∈ G, and therefore 0_G + 0_G = 0_G. Adding the additive inverse (in F) −0_G to both sides of this equation yields 0_G = 0_F. A similar proof shows that 1_F = 1_G. Thus 0_F, 1_F ∈ G. We next show that if x ∈ G and −x denotes the additive inverse of x in F, then −x ∈ G. We write ⊖x for the additive inverse of x in G. We have x ⊕ (⊖x) = 0, which implies that x + (⊖x) = 0. But then, adding −x to both sides, we obtain ⊖x = −x, and therefore −x ∈ G. Similarly, if x ∈ G, x ≠ 0, and x^{-1} denotes the multiplicative inverse of x in F, then x^{-1} ∈ G. This completes the proof.
(a) We wish to show that R is a subfield of C. It suffices to prove that addition and multiplication in C reduce to the usual addition and multiplication in R when the operands are real numbers. If x, y ∈ R ⊂ C, then (x + 0i) + (y + 0i) = (x + y) + (0 + 0)i = (x + y) + 0i = x + y. Similarly, (x + 0i)(y + 0i) = (xy − 0 · 0) + (x · 0 + 0 · y)i = xy + 0i = xy. Thus the operations on C reduce to the operations on R when the operands are elements of R, and therefore R is a subfield of C.

(b) We now wish to show that C is a subfield of H by showing that the operations of H reduce to the operations on C when the operands belong to C. Let x = x_1 + x_2 i, y = y_1 + y_2 i belong to C, so that x = x_1 + x_2 i + 0j + 0k, y = y_1 + y_2 i + 0j + 0k can be regarded as elements of H. By definition,

    x + y = (x_1 + x_2 i + 0j + 0k) + (y_1 + y_2 i + 0j + 0k) = (x_1 + y_1) + (x_2 + y_2)i + (0 + 0)j + (0 + 0)k = (x_1 + y_1) + (x_2 + y_2)i,

    xy = (x_1 + x_2 i + 0j + 0k)(y_1 + y_2 i + 0j + 0k)
       = (x_1 y_1 − x_2 y_2 − 0 · 0 − 0 · 0) + (x_1 y_2 + x_2 y_1 + 0 · 0 − 0 · 0)i + (x_1 · 0 − x_2 · 0 + 0 · y_1 + 0 · y_2)j + (x_1 · 0 + x_2 · 0 − 0 · y_2 + 0 · y_1)k
       = (x_1 y_1 − x_2 y_2) + (x_1 y_2 + x_2 y_1)i.

Thus both operations on H reduce to the usual operations on C, which shows that C is a subfield of H.
(c) Consider the subset S = {a + bi + cj : a, b, c ∈ R} of H. We wish to determine whether S is a subfield of H. In fact, S is not a subfield because it is not closed under multiplication. For example, i, j ∈ S, but ij = k ∉ S.
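The quaternion identities used in Exercises 19 and 20 are easy to confirm computationally. The Python sketch below (a supplementary check, not part of the original manual) represents a quaternion x = x_1 + x_2 i + x_3 j + x_4 k as a 4-tuple and implements the product quoted in the expansion above.

# Supplementary check for Exercises 19-20: quaternion arithmetic on 4-tuples.
def qmul(x, y):
    x1, x2, x3, x4 = x
    y1, y2, y3, y4 = y
    return (x1*y1 - x2*y2 - x3*y3 - x4*y4,
            x1*y2 + x2*y1 + x3*y4 - x4*y3,
            x1*y3 - x2*y4 + x3*y1 + x4*y2,
            x1*y4 + x2*y3 - x3*y2 + x4*y1)

def conj(x):
    return (x[0], -x[1], -x[2], -x[3])

one, i, j, k = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

# Exercise 19(a): the defining relations
assert qmul(i, i) == qmul(j, j) == qmul(k, k) == (-1, 0, 0, 0)
assert qmul(i, j) == k and qmul(j, i) == (0, 0, 0, -1)      # ij = k, ji = -k
assert qmul(qmul(i, j), k) == (-1, 0, 0, 0)                 # ijk = -1

# Exercise 19(b), (f): x times its conjugate, and the inverse formula
x = (1.0, -2.0, 3.0, 0.5)
norm2 = sum(c*c for c in x)
assert qmul(x, conj(x)) == (norm2, 0, 0, 0)
inv = tuple(c / norm2 for c in conj(x))
assert all(abs(a - b) < 1e-12 for a, b in zip(qmul(x, inv), one))

# Exercise 20(b): products of elements of C (zero j and k parts) stay in C
a, b = (2.0, 3.0, 0.0, 0.0), (-1.0, 4.0, 0.0, 0.0)
assert qmul(a, b)[2:] == (0.0, 0.0)
print("Quaternion identities verified.")

The noncommutativity observed in part (d) is visible in the second assertion: swapping the arguments of qmul changes the sign of the k component.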
2.2 Vector spaces
1. Let F be a field, and let V = {0}, with addition and scalar multiplication on V defined by 0 + 0 = 0, α · 0 = 0 for all α ∈ F. We wish to prove that V is a vector space over F. This is a straightforward verification of the defining properties. The commutative property of addition is vacuous, since V contains a single element. We have (0 + 0) + 0 = 0 + 0 = 0 + (0 + 0), so the associative property holds. The definition 0 + 0 = 0 shows both that 0 is an additive identity and that 0 is the additive inverse of 0, the only vector in V. Next, for all α, β ∈ F, we have α(β · 0) = α · 0 = 0 = (αβ) · 0, so the associative property of scalar multiplication is satisfied. Also, α(0 + 0) = α · 0 = 0 = 0 + 0 = α · 0 + α · 0 and (α + β) · 0 = 0 = α · 0 + β · 0, so both distributive properties hold. Finally, 1 · 0 = 0 by definition, so the final property of a vector space holds. Thus V is a vector space over F.

2. Let F be an infinite field, and let V be a nontrivial vector space over F. We wish to show that V contains infinitely many vectors. By definition, V contains a nonzero vector u. It suffices to show that, for all α, β ∈ F, α ≠ β implies αu ≠ βu, since then V contains the infinite subset {αu : α ∈ F}. Suppose α, β ∈ F and αu = βu. Then αu − βu = 0, that is, (α − β)u = 0. Since u ≠ 0 by assumption, this implies that α − β = 0 by Theorem 5. Thus αu = βu implies α = β, which completes the proof.

3. Let V be a vector space over a field F.

(a) Suppose z ∈ V is an additive identity. Then z + 0 = z (since 0 is an additive identity) and 0 + z = 0 (since z is an additive identity). Then z = z + 0 = 0 + z = 0, which shows that 0 is the only additive identity in V.
(b) Let u ∈ V. If u + v = 0, then −u + (u + v) = −u + 0, which implies that (−u + u) + v = −u, or 0 + v = −u, or finally v = −u. Thus the additive inverse −u of u is unique.
(c) Suppose u, v ∈ V. Then (u + v) + (−u + (−v)) = ((u + v) + (−u)) + (−v) = (u + (v + (−u))) + (−v) = (u + (−u + v)) + (−v) = ((u + (−u)) + v) + (−v) = (0 + v) + (−v) = v + (−v) = 0. By the preceding result, this shows that −u + (−v) = −(u + v).
(d) Suppose u, v, w ∈ V and u + v = u + w. Then −u + (u + v) = −u + (u + w), which implies that (−u + u) + v = (−u + u) + w, or 0 + v = 0 + w, or finally v = w.
(e) Suppose α ∈ F and 0 is the zero vector in V. Then α0 + α0 = α(0 + 0) = α0; adding −(α0) to both sides yields α0 = 0, as desired.
(f) Suppose α ∈ F, u ∈ V, and αu = 0. If α ≠ 0, then α^{-1} exists and α^{-1}(αu) = α^{-1} · 0, which implies (α^{-1}α)u = 0 (applying the last result), which in turn yields 1 · u = 0, or finally u = 0. Therefore, αu = 0 implies that α = 0 or u = 0.
(g) Suppose u ∈ V. Then 0 · u + 0 · u = (0 + 0) · u = 0 · u. Adding −(0 · u) to both sides yields 0 · u = 0. We then have u + (−1)u = 1 · u + (−1)u = (1 + (−1))u = 0 · u = 0, which shows that (−1)u = −u.

4. We are to prove that if F is a field, then F^n is a vector space over F. This is a straightforward verification of the defining properties of a vector space, which follow in this case from the analogous properties of the field F. The details are omitted.

5. We are to prove that F[a, b] (the space of all functions f : [a, b] → R) is a vector space over R. Like the last exercise, this straightforward verification is omitted.

6. (a) Let p be a prime and n a positive integer. Since each of the n components of x ∈ Z_p^n can take on any of the p values 0, 1, . . . , p − 1, there are p^n distinct vectors in Z_p^n.
(b) The elements of Z_2^2 are (0, 0), (0, 1), (1, 0), (1, 1). We have (0, 0) + (0, 0) = (0, 0), (0, 0) + (0, 1) = (0, 1), (0, 0) + (1, 0) = (1, 0), (0, 0) + (1, 1) = (1, 1), (0, 1) + (0, 1) = (0, 0), (0, 1) + (1, 0) = (1, 1), (0, 1) + (1, 1) = (1, 0), (1, 0) + (1, 0) = (0, 0), (1, 0) + (1, 1) = (0, 1), (1, 1) + (1, 1) = (0, 0).

7. (a) The elements of P_1(Z_2) are the polynomials 0, 1, x, 1 + x, which define distinct functions on Z_2. We have 0 + 0 = 0, 0 + 1 = 1, 0 + x = x, 0 + (1 + x) = 1 + x, 1 + 1 = 0, 1 + x = 1 + x, 1 + (1 + x) = x, x + x = (1 + 1)x = 0x = 0, x + (1 + x) = 1 + (x + x) = 1, (1 + x) + (1 + x) = (1 + 1) + (x + x) = 0 + 0 = 0.
(b) Nominally, the elements of P_2(Z_2) are 0, 1, x, 1 + x, x^2, 1 + x^2, x + x^2, 1 + x + x^2. However, since these elements are interpreted as functions mapping Z_2 into Z_2, it turns out that the last four functions equal the first four. In particular, x^2 = x (as functions), since 0^2 = 0 and 1^2 = 1. Then 1 + x^2 = 1 + x, x + x^2 = x + x = 0, and 1 + x + x^2 = 1 + 0 = 1. Thus we see that the function spaces P_2(Z_2) and P_1(Z_2) are the same.
(c) Let V be the vector space consisting of all functions from Z_2 into Z_2. To specify f ∈ V means to specify the two values f(0) and f(1). There are exactly four ways to do this: f(0) = 0, f(1) = 0 (so f(x) = 0); f(0) = 1, f(1) = 1 (so f(x) = 1); f(0) = 0, f(1) = 1 (so f(x) = x); and f(0) = 1, f(1) = 0 (so f(x) = 1 + x). Thus we see that V = P_1(Z_2).

8. Let V = (0, ∞), with addition ⊕ and scalar multiplication ⊙ defined by u ⊕ v = uv for all u, v ∈ V and α ⊙ u = u^α for all α ∈ R and all u ∈ V. We will prove that V is a vector space over R. First of all, ⊕ is commutative and associative (because multiplication of real numbers has these properties). For all u ∈ V, u ⊕ 1 = u · 1 = u, so there is an additive identity. Also, if u ∈ V, then 1/u ∈ V satisfies u ⊕ (1/u) = u(1/u) = 1, so each vector has an additive inverse. Next, if α, β ∈ R and u, v ∈ V, then α ⊙ (β ⊙ u) = α ⊙ (u^β) = (u^β)^α = u^{αβ} = (αβ) ⊙ u, so the associative property of scalar multiplication holds. Also, α ⊙ (u ⊕ v) = α ⊙ (uv) = (uv)^α = u^α v^α = (α ⊙ u) ⊕ (α ⊙ v) and (α + β) ⊙ u = u^{α+β} = u^α u^β = (α ⊙ u) ⊕ (β ⊙ u). Thus both distributive properties hold. Finally, 1 ⊙ u = u^1 = u. This completes the proof that V is a vector space over R. (A numerical check of Exercises 8 and 9 appears at the end of this section.)

9. Let V = R^2 with the usual scalar multiplication and the following nonstandard vector addition: u ⊕ v = (u_1 + v_1, u_2 + v_2 + 1) for all u, v ∈ R^2. It is easy to check that commutativity and associativity of ⊕ hold, that (0, −1) is an additive identity, and that each u = (u_1, u_2) has an additive inverse, namely, (−u_1, −u_2 − 2). Also, α(βu) = (αβ)u for all u ∈ V, α, β ∈ R (since scalar multiplication is defined in the standard way). However, if α ∈ R, then α(u ⊕ v) = α(u_1 + v_1, u_2 + v_2 + 1) = (αu_1 + αv_1, αu_2 + αv_2 + α), while αu ⊕ αv = (αu_1, αu_2) ⊕ (αv_1, αv_2) = (αu_1 + αv_1, αu_2 + αv_2 + 1), and these are unequal if α ≠ 1. Thus the first distributive property fails to hold, and V is not a vector space over R. (In fact, the second distributive property also fails.)

10. Let V = R^2 with the usual scalar multiplication, and with addition defined by u ⊕ v = (α_1 u_1 + β_1 v_1, α_2 u_2 + β_2 v_2), where α_1, α_2, β_1, β_2 ∈ R are fixed. We wish to determine what values of α_1, α_2, β_1, β_2 will make V a vector space over R. We first note that ⊕ is commutative if and only if α_1 = β_1, α_2 = β_2. We therefore redefine u ⊕ v as (α_1 u_1 + α_1 v_1, α_2 u_2 + α_2 v_2) = (α_1(u_1 + v_1), α_2(u_2 + v_2)).
Next, we have

    (u ⊕ v) ⊕ w = (α_1^2 u_1 + α_1^2 v_1 + α_1 w_1, α_2^2 u_2 + α_2^2 v_2 + α_2 w_2),
    u ⊕ (v ⊕ w) = (α_1 u_1 + α_1^2 v_1 + α_1^2 w_1, α_2 u_2 + α_2^2 v_2 + α_2^2 w_2).

From this, it is easy to show that (u ⊕ v) ⊕ w = u ⊕ (v ⊕ w) for all u, v, w ∈ R^2 if and only if α_1^2 = α_1 and α_2^2 = α_2, that is, if and only if α_1 = 0 or α_1 = 1, and similarly for α_2. However, if α_1 = 0 or α_2 = 0, then no additive identity can exist. For suppose α_1 = 0. Then u ⊕ v = (0, α_2(u_2 + v_2)) for all u, v ∈ V, and no z ∈ V can satisfy u ⊕ z = u if u_1 ≠ 0. Similarly, if α_2 = 0, then no additive identity can exist. Therefore, if V is to be a vector space over R, then we must have α_1 = β_1 = α_2 = β_2 = 1, and V reduces to R^2 under the usual vector space operations.

11. Suppose V is the set of all polynomials (over R) of degree exactly two, together with the zero polynomial. Addition and scalar multiplication are defined on V in the usual fashion. Then V is not a vector space over R because it is not closed under addition. For example, 1 + x + x^2 ∈ V, 1 + x − x^2 ∈ V, but (1 + x + x^2) + (1 + x − x^2) = 2 + 2x ∉ V.

12. (a) We wish to find a function lying in C(0, 1) but not in C[0, 1]. A suitable function with a discontinuity at one of the endpoints provides an example. For example, f(x) = 1/x satisfies f ∈ C(0, 1) and
f ∉ C[0, 1], as does f(x) = 1/(1 − x) or f(x) = 1/(x − x^2). A different type of example is provided by f(x) = sin(1/x).

(b) The function f(x) = |x| belongs to C[−1, 1] but not to C^1[−1, 1].

13. Let V be the space of all infinite sequences of real numbers, and define {x_n} + {y_n} = {x_n + y_n}, α{x_n} = {αx_n}. The proof that V is a vector space is a straightforward verification of the defining properties, no different than for R^n, and will not be given here.

14. Let V be the set of all piecewise continuous functions f : [a, b] → R, with addition and scalar multiplication defined as usual for functions. We wish to show that V is a vector space over R. Most of the properties of a vector space are automatically satisfied by V because it is a subset of the space of all real-valued functions on [a, b], which is known to be a vector space. Specifically, commutativity and associativity of addition, the associative property of scalar multiplication, the two distributive laws, and the fact that 1 · u = u for all u ∈ V are all obviously satisfied. Moreover, the 0 function is continuous and hence by definition piecewise continuous, and therefore 0 ∈ V. It remains only to show that V is closed under addition and scalar multiplication (then, since −u = −1 · u for any function u, each function u ∈ V must have an additive inverse in V). Let u ∈ V, α ∈ R, and suppose u has points of discontinuity x_1 < x_2 < · · · < x_{k−1}, where x_1 > x_0 = a and x_{k−1} < x_k = b. Then u is continuous on each interval (x_{i−1}, x_i), i = 1, 2, . . . , k, and therefore, by a simple theorem of calculus (any multiple of a continuous function is continuous), αu is also continuous on each (x_{i−1}, x_i). The one-sided limits of αu at x_0, x_1, . . . , x_k exist since, for example,

    lim_{x→x_i^+} αu(x) = α lim_{x→x_i^+} u(x)

(and similarly for left-hand limits). Therefore, αu is piecewise continuous and therefore αu ∈ V. Now suppose u, v belong to V. Let {x_1, x_2, . . . , x_{ℓ−1}} be the union of the sets of points of discontinuity of u and of v, ordered so that a = x_0 < x_1 < · · · < x_{ℓ−1} < x_ℓ = b. Then, since both u and v are continuous at all other points in (a, b), u + v is continuous on every interval (x_{i−1}, x_i). Also, at each x_i, either lim_{x→x_i} u(x) exists (if u is continuous at x_i, that is, if x_i is a point of discontinuity only for v), or the one-sided limits lim_{x→x_i^+} u(x) and lim_{x→x_i^−} u(x) both exist. In the first case, the two one-sided limits exist (and are equal), so in any case the two one-sided limits exist. The same is true for v. Thus, for each x_i, i = 0, 1, . . . , ℓ − 1,

    lim_{x→x_i^+} (u(x) + v(x)) = lim_{x→x_i^+} u(x) + lim_{x→x_i^+} v(x),

and similarly for the left-hand limits at x_1, x_2, . . . , x_ℓ. This shows that u + v is piecewise continuous, and therefore belongs to V. This completes the proof.

15. Suppose U and V are vector spaces over a field F, and define addition and scalar multiplication on U × V by (u, v) + (w, z) = (u + w, v + z), α(u, v) = (αu, αv). We wish to prove that U × V is a vector space over F. In fact, the verifications of all the defining properties of a vector space are straightforward. For instance, (u, v) + (w, z) = (u + w, v + z) = (w + u, z + v) = (w, z) + (u, v) (using the commutativity of addition in U and V), and therefore addition in U × V is commutative. Note that the additive identity in U × V is (0, 0), where the first 0 is the zero vector in U and the second is the zero vector in V. We will not verify the remaining properties here.
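As promised in Exercise 8, here is a small numerical check of the nonstandard operations in Exercises 8 and 9. It is a supplementary Python sketch (standard library only, not part of the original manual): it samples the axioms for V = (0, ∞) and exhibits the failure of the distributive law for the shifted addition of Exercise 9.

# Supplementary check for Exercises 8 and 9 of Section 2.2.
import math
import random

# Exercise 8: V = (0, infinity), u (+) v = u*v, alpha (.) u = u**alpha.
def vadd(u, v):
    return u * v

def smul(alpha, u):
    return u ** alpha

def close(a, b, tol=1e-9):
    return abs(a - b) <= tol * max(1.0, abs(a), abs(b))

random.seed(1)
for _ in range(1000):
    u, v = math.exp(random.uniform(-1, 1)), math.exp(random.uniform(-1, 1))
    a, b = random.uniform(-3, 3), random.uniform(-3, 3)
    assert close(vadd(u, 1.0), u)                                    # 1 is the zero vector
    assert close(vadd(u, 1.0 / u), 1.0)                              # 1/u is the additive inverse
    assert close(smul(a, smul(b, u)), smul(a * b, u))                # alpha(beta u) = (alpha beta)u
    assert close(smul(a, vadd(u, v)), vadd(smul(a, u), smul(a, v)))  # alpha(u+v) = alpha u + alpha v
    assert close(smul(a + b, u), vadd(smul(a, u), smul(b, u)))       # (alpha+beta)u = alpha u + beta u
    assert close(smul(1.0, u), u)                                    # 1u = u

# Exercise 9: R^2 with u (+) v = (u1+v1, u2+v2+1) and the usual scalar multiple.
def shifted_add(u, v):
    return (u[0] + v[0], u[1] + v[1] + 1.0)

def scale(alpha, u):
    return (alpha * u[0], alpha * u[1])

u, v, alpha = (1.0, 2.0), (3.0, -4.0), 2.0
lhs = scale(alpha, shifted_add(u, v))                 # gives (8, -2)
rhs = shifted_add(scale(alpha, u), scale(alpha, v))   # gives (8, -3)
print("Exercise 8 axioms hold on random samples.")
print("Exercise 9:", lhs, "differs from", rhs, "so the distributive law fails.")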
2.3 Subspaces
1. Let V be a vector space over F . (a) Let S = {0}. Then 0 ∈ S, S is closed under addition since 0 + 0 = 0 ∈ S, and S is closed under scalar multiplication since α · 0 = 0 ∈ S for all α ∈ F . Thus S is a subspace of V . (b) The entire space V is a subspace of V since 0 ∈ V and V is closed under addition and scalar multiplication by definition.
2. Suppose we adopt an alternate definition of subspace, in which "0 ∈ S" is replaced with "S is nonempty." We wish to show that the alternate definition is equivalent to the original definition. If S is a subspace according to the original definition, then 0 ∈ S, and therefore S is nonempty. Hence S is a subspace according to the alternate definition. Conversely, suppose S satisfies the alternate definition. Then S is nonempty, so there exists x ∈ S. Since S is closed under scalar multiplication and addition, it follows that −x = −1 · x ∈ S, and hence 0 = −x + x ∈ S. Therefore, S satisfies the original definition of subspace.

3. Let V be a vector space over R, and let v ∈ V be nonzero. We wish to prove that S = {0, v} is not a subspace of V. If S were a subspace, then 2v would lie in S. But 2v ≠ 0 by Theorem 5, and 2v ≠ v (since otherwise adding −v to both sides would imply that v = 0). Hence 2v ∉ S, and therefore S is not a subspace of V.

4. We wish to determine which of the given subsets are subspaces of Z_2^3. Notice that since Z_2 = {0, 1}, if S contains the zero vector, then it is automatically closed under scalar multiplication. Therefore, we need only check whether the given subset contains (0, 0, 0) and is closed under addition. (A brute-force verification appears at the end of this section.)

(a) S = {(0, 0, 0), (1, 0, 0)}. This set contains (0, 0, 0) and is closed under addition since (0, 0, 0) + (0, 0, 0) = (0, 0, 0), (0, 0, 0) + (1, 0, 0) = (1, 0, 0), and (1, 0, 0) + (1, 0, 0) = (0, 0, 0). Thus S is a subspace.
(b) S = {(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)}. This set contains (0, 0, 0) and it can be verified that it is closed under addition (for instance, (0, 1, 0) + (1, 0, 1) = (1, 1, 1), (1, 1, 1) + (0, 1, 0) = (1, 0, 1), etc.). Thus S is a subspace.
(c) S = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)} is not a subspace because it is not closed under addition: (1, 0, 0) + (0, 1, 0) = (1, 1, 0) ∉ S.

5. Suppose S is a subset of Z_2^n. We wish to show that S is a subspace of Z_2^n if and only if 0 ∈ S and S is closed under addition. Of course, the "only if" direction is trivial. The other direction follows as in the preceding exercise: If 0 ∈ S, then S is automatically closed under scalar multiplication, since 0 and 1 are the only elements of the field Z_2, and 0 · v = 0 for all v ∈ S, 1 · v = v for all v ∈ S.

6. Define S = {x ∈ R^2 : x_1 ≥ 0, x_2 ≥ 0}. Then S is not a subspace of R^2, since it is not closed under scalar multiplication. For instance, (1, 1) ∈ S but −1 · (1, 1) = (−1, −1) ∉ S.

7. Define S = {x ∈ R^2 : ax_1 + bx_2 = 0}, where a, b ∈ R are constants. We will show that S is a subspace of R^2. First, (0, 0) ∈ S, since a · 0 + b · 0 = 0. Next, suppose x ∈ S and α ∈ R. Then ax_1 + bx_2 = 0, and therefore a(αx_1) + b(αx_2) = α(ax_1 + bx_2) = α · 0 = 0. This shows that αx ∈ S, and therefore S is closed under scalar multiplication. Finally, suppose x, y ∈ S, so that ax_1 + bx_2 = 0 and ay_1 + by_2 = 0. Then a(x_1 + y_1) + b(x_2 + y_2) = (ax_1 + bx_2) + (ay_1 + by_2) = 0 + 0 = 0, which shows that x + y ∈ S, and therefore that S is closed under addition. This completes the proof.

8. (a) The set A = {x ∈ R^2 : x_1 = 0 or x_2 = 0} is closed under scalar multiplication but not addition. Closure under scalar multiplication holds since if x_1 = 0, then (αx)_1 = αx_1 = α · 0 = 0, and similarly for the second component. The set is not closed under addition; for instance, (1, 0), (0, 1) ∈ A, but (1, 0) + (0, 1) = (1, 1) ∉ A.
(b) The set Q = {x ∈ R^2 : x_1 ≥ 0, x_2 ≥ 0} is closed under addition but not scalar multiplication. Since (1, 1) ∈ Q but −1 · (1, 1) = (−1, −1) ∉ Q, we see that Q is not closed under scalar multiplication. On the other hand, if x, y ∈ Q, so that x_1, x_2, y_1, y_2 ≥ 0, we see that (x + y)_1 = x_1 + y_1 ≥ 0 + 0 = 0 and (x + y)_2 = x_2 + y_2 ≥ 0 + 0 = 0. This shows that x + y ∈ Q, and therefore Q is closed under addition.

9. Let V be a vector space over a field F, let u ∈ V, and define S = {αu : α ∈ F}. We will show that S is a subspace of V. First, 0 ∈ S because 0 = 0 · u. Next, suppose x ∈ S and β ∈ F. Since x ∈ S, there exists α ∈ F such that x = αu. Therefore, βx = β(αu) = (βα)u (using the associative property of scalar multiplication), which shows that βx belongs to S. Thus S is closed under scalar multiplication. Finally, suppose x, y ∈ S; then there exist α, β ∈ F such that x = αu, y = βu, and x + y = αu + βu = (α + β)u
by the second distributive property. Therefore S is closed under addition, and we have shown that S is a subspace.

10. Let R be regarded as a vector space over R. We wish to prove that R has no proper subspaces. It suffices to prove that if S is a nontrivial subspace of R, then S = R. So suppose S is a nontrivial subspace, which means that there exists x ≠ 0 belonging to S. But then, given any y ∈ R, y = (yx^{-1})x belongs to S because S is closed under scalar multiplication. Thus R ⊂ S, and hence S = R.

11. We wish to describe all proper subspaces of R^2. We claim that every proper subspace of R^2 has the form {αx : α ∈ R}, where x ∈ R^2 is nonzero (geometrically, such a set is a line through the origin). To prove this, let us suppose S is a proper subspace of R^2. Then there exists x ∈ S, x ≠ 0. Since S is closed under scalar multiplication, every vector of the form αx, α ∈ R, must belong to S. Therefore, S contains the set {αx : α ∈ R}. Let us suppose that there exists y ∈ S such that y cannot be written as y = αx for some α ∈ R. In this case, we argue that every z ∈ R^2 belongs to S, and hence S is not a proper subspace of R^2. To justify this conclusion, we first note that, since y is not a multiple of x, x_1 y_2 − x_2 y_1 ≠ 0. Let z ∈ R^2 be given and consider the equation αx + βy = z. It can be verified directly that α = (y_2 z_1 − y_1 z_2)/(x_1 y_2 − x_2 y_1), β = (x_1 z_2 − x_2 z_1)/(x_1 y_2 − x_2 y_1) satisfy this equation, from which it follows that z ∈ S (since S is closed under addition and scalar multiplication). Therefore, if S contains any vector not lying in {αx : α ∈ R}, then S consists of all of R^2, and S is not a proper subspace of R^2.

12. We wish to find a proper subspace of R^3 that is not a plane. One such subspace is the x_1-axis: S = {x ∈ R^3 : x_2 = x_3 = 0}. It is easy to verify that S is a subspace of R^3, and geometrically, S is a line. More generally, using the results of Exercise 10, we can show that {αx : α ∈ R}, where x ≠ 0 is a given vector, is a proper subspace of R^3. Such a subspace represents a line through the origin.

13. Consider the subset R^n of C^n. Although R^n contains the zero vector and is closed under addition, it is not closed under scalar multiplication, and hence is not a subspace of C^n. Here the scalars are complex numbers (since C^n is a vector space over C), and, for example, (1, 0, . . . , 0) ∈ R^n, i ∈ C, and i(1, 0, . . . , 0) = (i, 0, . . . , 0) does not belong to R^n.

14. Let S = {u ∈ C[a, b] : u(a) = u(b) = 0}. Then S is a subspace of C[a, b]. The zero function clearly belongs to S. Suppose u ∈ S and α ∈ R. Then (αu)(a) = αu(a) = α · 0 = 0, and similarly (αu)(b) = 0. It follows that αu ∈ S, and S is closed under scalar multiplication. If u, v ∈ S, then (u + v)(a) = u(a) + v(a) = 0 + 0 = 0, and similarly (u + v)(b) = 0. Therefore S is closed under addition, and we have shown that S is a subspace of C[a, b].

15. Let S = {u ∈ C[a, b] : u(a) = 1}. Then S is not a subspace of C[a, b] because the zero function does not belong to S.

16. Let S = {u ∈ C[a, b] : ∫_a^b u(x) dx = 0}. We will show that S is a subspace of C[a, b]. First, since the integral of the zero function is zero, we see that the zero function belongs to S. Next, suppose u ∈ S and α ∈ R. Then ∫_a^b (αu)(x) dx = ∫_a^b αu(x) dx = α ∫_a^b u(x) dx = α · 0 = 0, and therefore αu ∈ S. Finally, suppose u, v ∈ S. Then ∫_a^b (u + v)(x) dx = ∫_a^b (u(x) + v(x)) dx = ∫_a^b u(x) dx + ∫_a^b v(x) dx = 0 + 0 = 0. This shows that u + v ∈ S, and we have proved that S is a subspace of C[a, b].

17. Let V be the vector space of all (infinite) sequences of real numbers.

(a) Define Z = {{x_n} ∈ V : lim_{n→∞} x_n = 0}. Clearly the zero sequence converges to zero, and hence belongs to Z. If {x_n} ∈ Z and α ∈ R, then lim_{n→∞} αx_n = α lim_{n→∞} x_n = α · 0 = 0, which implies that α{x_n} = {αx_n} belongs to Z, and therefore Z is closed under scalar multiplication. Now suppose {x_n}, {y_n} both belong to Z. Then lim_{n→∞}(x_n + y_n) = lim_{n→∞} x_n + lim_{n→∞} y_n = 0 + 0 = 0. Therefore {x_n} + {y_n} = {x_n + y_n} belongs to Z, Z is closed under addition, and we have shown that Z is a subspace of V.
(b) Define S = {{x_n} ∈ V : Σ_{n=1}^∞ x_n < ∞}. From calculus, we know that if Σ_{n=1}^∞ x_n converges, then so does Σ_{n=1}^∞ αx_n = α Σ_{n=1}^∞ x_n for any α ∈ R. Similarly, if Σ_{n=1}^∞ x_n and Σ_{n=1}^∞ y_n converge, then so does Σ_{n=1}^∞ (x_n + y_n) = Σ_{n=1}^∞ x_n + Σ_{n=1}^∞ y_n. Using these facts, it is straightforward to show that S is closed under addition and scalar multiplication. Obviously the zero sequence belongs to S.
(c) Define L = {{x_n} ∈ V : Σ_{n=1}^∞ x_n^2 < ∞}. Here it is obvious that the zero sequence belongs to L and that L is closed under scalar multiplication. To prove that L is closed under addition, notice that, for any x, y ∈ R, (x − y)^2 ≥ 0 and (x + y)^2 ≥ 0 together imply that |xy| ≤ (x^2 + y^2)/2. It follows that (x_n + y_n)^2 = x_n^2 + 2x_n y_n + y_n^2 ≤ 2(x_n^2 + y_n^2). It follows that if Σ_{n=1}^∞ x_n^2 and Σ_{n=1}^∞ y_n^2 both converge, then so does Σ_{n=1}^∞ (x_n + y_n)^2, with Σ_{n=1}^∞ (x_n + y_n)^2 ≤ 2 Σ_{n=1}^∞ x_n^2 + 2 Σ_{n=1}^∞ y_n^2. From this we see that L is closed under addition, and thus L is a subspace.

By a common theorem of calculus, we know that if Σ_{n=1}^∞ x_n converges, then lim_{n→∞} x_n = 0, and the same is true if Σ_{n=1}^∞ x_n^2 converges. Therefore, S and L are subspaces of Z. However, the converse of this result is not true (if the sequence converges to zero, this does not imply that the corresponding series converges). Therefore, S and L are proper subspaces of Z. We know that L is not a subspace of S; for instance, Σ_{n=1}^∞ (1/n^2) converges, but Σ_{n=1}^∞ (1/n) does not, which shows that {1/n} belongs to L but not to S. Also, S is not a subspace of L, since {(−1)^n/√n} belongs to S (by the alternating series test) but not to L.

18. Let V be a vector space over a field F, and let X and Y be subspaces of V.

(a) We will show that X ∩ Y is also a subspace of V. First of all, since 0 ∈ X and 0 ∈ Y, it follows that 0 ∈ X ∩ Y. Next, suppose x ∈ X ∩ Y and α ∈ F. Then, by definition of intersection, x ∈ X and x ∈ Y. Since X and Y are subspaces, both are closed under scalar multiplication and therefore αx ∈ X and αx ∈ Y, from which it follows that αx ∈ X ∩ Y. Thus X ∩ Y is closed under scalar multiplication. Finally, suppose x, y ∈ X ∩ Y. Then x, y ∈ X and x, y ∈ Y. Since X and Y are closed under addition, we have x + y ∈ X and x + y ∈ Y, from which we see that x + y ∈ X ∩ Y. Therefore, X ∩ Y is closed under addition, and we have proved that X ∩ Y is a subspace of V.
(b) It is not necessarily the case that X ∪ Y is a subspace of V. For instance, let V = R^2, and define X = {x ∈ R^2 : x_2 = 0}, Y = {x ∈ R^2 : x_1 = 0}. Then X ∪ Y is not closed under addition, and hence is not a subspace of R^2: (1, 0) ∈ X ⊂ X ∪ Y and (0, 1) ∈ Y ⊂ X ∪ Y; however, (1, 0) + (0, 1) = (1, 1) ∉ X ∪ Y.

19. Let V be a vector space over a field F, and let S be a nonempty subset of V. Define T to be the intersection of all subspaces of V that contain S.

(a) We wish to show that T is a subspace of V. First, 0 belongs to every subspace of V that contains S, and therefore 0 belongs to the intersection T. Next, suppose x ∈ T and α ∈ F. Then x belongs to every subspace of V containing S. Since each of these subspaces is closed under scalar multiplication, it follows that αx also belongs to each subspace, and therefore αx ∈ T. Therefore, T is closed under scalar multiplication. Finally, suppose x, y ∈ T. Then both x and y belong to every subspace of V containing S. Since each subspace is closed under addition, it follows that x + y belongs to every subspace of V containing S. Therefore x + y ∈ T, T is closed under addition, and we have shown that T is a subspace.
(b) Now suppose U is any subspace of V containing S. Then U is one of the sets whose intersection defines T, and therefore every element of T belongs to U by definition of intersection. It follows that T ⊂ U. This means that T is the smallest subspace of V containing S.

20. Let V be a vector space over a field F, and let S, T be subspaces of V. Define S + T = {s + t : s ∈ S, t ∈ T}. We wish to show that S + T is a subspace of V. First of all, 0 ∈ S and 0 ∈ T because S and T are subspaces. Therefore, 0 = 0 + 0 ∈ S + T. Next, suppose x ∈ S + T and α ∈ F. Then, by definition of S + T, there exist s ∈ S, t ∈ T such that x = s + t. Since S and T are subspaces, they are closed under scalar multiplication, and therefore αs ∈ S and αt ∈ T. It follows that αx = α(s + t) = αs + αt ∈ S + T. Thus S + T is closed under scalar multiplication. Finally, suppose x, y ∈ S + T. Then there exist s_1, s_2 ∈ S,
CHAPTER 2. FIELDS AND VECTOR SPACES
16
t1 , t2 ∈ T such that x = s1 + t1 , y = s2 + t2 . Since S and T are closed under addition, we see that s1 + s2 ∈ S, t1 + t2 ∈ T , and therefore x + y = (s1 + t1 ) + (s2 + t2 ) = (s1 + s2 ) + (t1 + t2 ) ∈ S + T . It follows that S + T is closed under addition, and we have shown that S + T is a field.
2.4
Linear combinations and spanning sets
1. Write u1 = (−1, −2, 4, −2), u2 = (0, 1, −5, 4). (a) With v = (−1, 0, −6, 6), the equation α1 u1 + α2 u2 = v has a (unique) solution: α1 = 1, α2 = 2. This shows that v ∈ sp{u1 , u2 }. (b) With v = (1, 1, 1, 1), the equation α1 u1 + α2 u2 = v has no solution, and therefore v ∈ sp{u1 , u2 }. 2. Let S = sp{ex , e−x } ⊂ C[0, 1]. (a) The function f (x) = cosh(x) belongs to S because cosh(x) = (1/2)ex + (1/2)e−x . (b) The function f (x) = 1 does not belong to S because there are no scalars α1 , α2 satisfying α1 ex + α2 e−x = 1 for all x ∈ [0, 1]. To prove this, note that any solution α1 , α2 would have to satisfy the equations that result from substituting any three values of x from the interval [0, 1]. For instance, if we choose x = 0, x = 1/2, x = 1, then we obtain the equations α1 + α2 = 1, α1 e
1/2
+ α2 e−1/2 = 1,
α1 e + α2 e−1 = 1. A direct calculation shows that this system is inconsistent. Therefore no solution α1 , α2 exists, and f ∈ S. 3. Let S = sp{1 + 2x + 3x2 , x − x2 } ⊂ P2 . (a) There is a (unique) solution α1 = 2, α2 = 1 to α1 (1 + 2x + 3x2 ) + α2 (x − x2 ) = 2 + 5x + 5x2 . Therefore, 2 + 5x + 5x2 ∈ S. (b) There is no solution α1 , α2 to α1 (1 + 2x + 3x2) + α2 (x − x2 ) = 1 − x + x2 . Therefore, 1 − x + x2 ∈ S. 4. Let u1 = (1 + i, i, 2), u2 = (1, 2i, 2 − i), and define S = sp{u1 , u2 } ⊂ C3 . The vector v = (2 + 3i, −2 + 2i, 5 + 2i) belongs to S because 2u1 + iu2 = v. 5. Let S = sp{(1, 2, 0, 1), (2, 0, 1, 2)} ⊂ Z43 . (a) The vector (1, 1, 1, 1) belongs to S because 2(1, 2, 0, 1) + (2, 0, 1, 2) = (1, 1, 1, 1). (b) The vector (1, 0, 1, 1) does not belong to S because α1 (1, 2, 0, 1) + α2 (2, 0, 1, 2) = (1, 0, 1, 1) has no solution. 6. Let S = sp{1 + x, x + x2 , 2 + x + x2 } ⊂ P3 (Z3 ). (a) If p(x) = 1 + x + x2 , then 0(1 + x) + 2(x + x2 ) + 2(2 + x + x2 ) = p(x), and therefore p ∈ S. (b) Let q(x) = x3 . Recalling that P3 (Z3 ) is a space of polynomials functions, we notice that q(0) = 0, q(1) = 1, q(2) = 2, which means that q(x) = x for all x ∈ Z3 . We have 1(1 + x) + 2(x + x2 ) + 1(2 + x + x2 ) = x = q(x), and therefore q ∈ S. 7. Let u = (1, 1, −1), v = (1, 0, 2) be vectors in R3 . We wish to show that S = sp{u, v} is a plane in R3 . First note that if S = {x ∈ R3 : ax1 + bx2 + cx3 = 0}, then (taking x = u, x = v) we see that a, b, c must satisfy a + b − c = 0, a + 2c = 0. One solution is a = 2, b = −3, c = −1. We will now prove that S = {x ∈ R3 : 2x1 − 3x2 − x3 = 0}. First, suppose x ∈ S. Then there exist α, β ∈ R such that x = αu+βv = α(1, 1, −1)+β(1, 0, 2) = (α+β, α, −α+2β), and 2x1 −3x2 −x3 = 2(α+β)−3α−(−α+2β) =
2.4. LINEAR COMBINATIONS AND SPANNING SETS
17
2α + 2β − 3α + α − 2β = 0. Therefore, x ∈ {x ∈ R3 : 2x1 − 3x2 − x3 = 0}. Conversely, suppose x ∈ {x ∈ R3 : 2x1 − 3x2 − x3 = 0}. If we solve the equation αu + βv = x, we see that it has the solution α = x2 , β = x1 −x2 , and therefore x ∈ S. (Notice that x2 (1, 1, −1)+(x1 −x2 )(1, 0, 2) = (x1 , x2 , 2x1 −3x2 ), and the assumption 2x1 − 3x2 − x3 = 0 implies that 2x1 − 3x2 = x3 .) This completes the proof. 8. The previous exercise does not hold true for every choice of u, v ∈ R3 . For instance, if u = (1, 1, 1), v = (2, 2, 2), then S = sp{u, v} is not a plane; in fact, S is easily seen to be the line passing through (0, 0, 0) and (1, 1, 1). 9. Let v1 = (1, −2, 1, 2), v2 = (−1, 1, 2, 1), and v3 = (−7, 9, 8, 1) be vectors in R4 , and let S = sp{v1 , v2 , v3 }. Suppose x ∈ S, say x = β1 v1 + β2 v2 + β3 v3 = (β1 − β2 − 7β3 , −2β1 + β2 + 9β3, β1 + 2β2 + 8β3 , 2β1 + β2 + β3 ). The equation α1 v1 + α2 v2 = (β1 − β2 − 7β3 , −2β1 + β2 + 9β3 , β1 + 2β2 + 8β3 , 2β1 + β2 + β3 ) has a unique solution, namely, α1 = β1 − 2β3 , α2 = β2 + 5β3 . This shows that x is a linear combination of v1 , v2 alone. Alternate solution: We can solve α1 v1 + α2 v2 + α3 v3 = 0 to obtain α1 = −2, α2 = 5, α3 = −1, which means that −2v1 + 5v2 − v3 = 0 or v3 = −2v1 + 5v2 . Now suppose x ∈ S, say x = β1 v1 + β2 v2 + β3 v3 . It follows that x = β1 v1 + β2 v2 + β3 (−2v1 + 5v2 ) = (β1 − 2β3 )v1 + (β2 + 5β3 )v2 , and therefore x can be written as a linear combination of v1 and v2 alone. 10. Let u1 = (1, 1, 1), u2 = (1, −1, 1), u3 = (1, 0, 1), and define S1 = sp{u1 , u2 }, S2 = sp{u1 , u2 , u3 }. We wish to prove that S1 = S2 . We first note that if x ∈ S1 , then there exists scalars α1 , α2 such that x = α1 u1 + α2 u2 . But then x can be written as x = α1 u1 + α2 u2 + 0 · u3 , which shows that x is a linear combination of u1 , u2 , u3 , and hence x ∈ S2 . Conversely, suppose that x ∈ S2 , say x = β1 u1 + β2 u2 + β3 u3 = (β1 + β2 + β3 , β1 − β2 , β1 + β2 + β3 ). We wish to show that x can be written as a linear combination of u1 , u2 alone, that is, that there exist scalars α1 , α2 such that α1 u1 + α2 u2 = x. A direct calculation shows that this equation has a unique solution, namely, α1 = β1 + β3 /2, α2 = β2 + β3 /2. This shows that x ∈ sp{u1 , u2 } = S1 , and the proof is complete. (The second part of the proof can be done as in the previous solution, by first showing that u3 = (1/2)u1 + (1/2)u2 .) 11. Let S = sp{(−1, −3, 3), (−1, −4, 3), (−1, −1, 4)} ⊂ R3 . We wish to determine if S = R3 or if S is a proper subspace of R3 . Given an arbitrary x ∈ R3 , we solve α1 (−1, −1, 3) + α2 (−1, −4, 3) + α3 (−1, −1, 4) = (x1 , x2 , x3 ) and find that there is a unique solution, namely, α1 = −13x1 + x2 − 3x3 , α2 = 9x1 − x2 + 2x3 , α3 = 3x1 + x3 . This shows that every x ∈ R3 lies in S, and therefore S = R3 . 12. Let S = sp{(−1, −5, 1), (3, 14, −4), (1, 4, −2)}. Given an arbitary x ∈ R3 , if we try to solve α1 (−1, −5, 1)+ α2 (3, 14, −4) + α3(1, 4, −2) = (x1 , x2 , x3 ), we find that there is a solution if and only if 6x1 − x2 + x3 = 0. Since not all x ∈ R3 satisfy this condition, S is a proper subspace of R3 . 13. Let S = sp{1 − x, 2 − 2x + x2 , 1 − 3x2 } ⊂ P2 . We wish to determine if S is a proper subspace of P2 . Given any p ∈ P2 , say p(x) = c0 + c1 x + c2 x2 , we try to solve α1 (1 − x) + α2 (2 − 2x + x2 ) + α3 (1 − 3x2 ) = c0 + c1 x + c2 x2 . We find that there is a unique solution, α1 = −6c0 − 7c1 − 2c2 , α2 = 3c0 + 3c1 + c2 , α3 = c0 + c1 . Therefore, each p ∈ P2 belongs to S, and therefore S = P2 . 14. 
Suppose V is a vector space over a field F and S is a subspace of V . We wish to prove that u1 , . . . , uk ∈ S, α1 , . . . , αk ∈ F imply that α1 u1 + · · · + αk uk ∈ S. We argue by induction on k. For k = 1, we have that α1 u1 ∈ S because S is a subspace and therefore closed under scalar multiplication. Now suppose that, for some k ≥ 2, α1 u1 + · · · + αk−1 uk−1 ∈ S for any u1 , . . . , uk−1 ∈ S, α1 , . . . , αk−1 ∈ F . Let u1 , . . . , uk ∈ S, α1 , . . . , αk ∈ F be arbitrary. Then α1 u1 + . . . + αk uk = (α1 u1 + · · · + αk−1 uk−1 ) + αk uk . By the induction hypothesis, α1 u1 + · · · + αk−1 uk−1 ∈ S, and αk uk ∈ S because S is closed under scalar multiplication. But then α1 u1 + . . . + αk uk = (α1 u1 + · · · + αk−1 uk−1 ) + αk uk ∈ S because S is closed under addition. Therefore, by induction, the result holds for all k ≥ 1, and the proof is complete. 15. Let V be a vector space over a field F , and let u ∈ V , u = 0, α ∈ F . We wish to prove that sp{u} = sp{u, αu}. First, if x ∈ sp{u}, then x = βu for some β ∈ F , in which case we can write x = βu + 0(αu), which shows that x also belongs to sp{u, αu}. Conversely, if x ∈ sp{u, αu}, then there exist scalars β, γ ∈ F such that x = βu + γ(αu). But then x = (β + γα)u, and therefore x ∈ sp{u}. Thus sp{u} = sp{u, αu}.
CHAPTER 2. FIELDS AND VECTOR SPACES
18
16. Let V be a vector space over a field F , and suppose x, u1 , . . . , uk , v1 , . . . , v are vectors in V . Assume x ∈ sp{u1 , . . . , uk } and uj ∈ sp{v1 , . . . , v } for j = 1, . . . , k. We wish to show that x ∈ sp{v1 , . . . , v }. Since uj ∈ sp{v1 , . . . , v }, there exist scalars βj,1 , . . . , βj, such that uj = βj,1 v1 + · · · + βj, v . This is true for each uj , j = 1, . . . , k. Also, x ∈ sp{u1 , . . . , uk }, so there exist α1 , . . . , αk ∈ F such that x = α1 u1 + · · · + αk uk . It follows that x = α1 (β1,1 v1 + · · · + β1, v ) + α2 (β2,1 v1 + · · · + β2, v ) + · · · + αk (βk,1 v1 + · · · + βk, v ) = α1 β1,1 v1 + · · · + α1 β1, v + α2 β2,1 v1 + · · · + α2 β2, v + · · · + αk βk,1 v1 + · · · + αk βk, v = (α1 β1,1 + α2 β2,1 + · · · + αk βk,1 )v1 + (α1 β1,2 + α2 β2,2 + · · · + αk βk,2 )v2 + · · · + (α1 β1, + α2 β2, + · · · + αk βk, )v . This shows that x ∈ sp{v1 , . . . , v }. 17. (a) Let V be a vector space over R, and let u, v be any two vectors in V . We wish to prove that sp{u, v} = sp{u + v, u − v}. We first suppose that x ∈ sp{u + v, u − v}, say x = α(u + v) + β(u − v). Then x = αu + αv + βu − βv = (α + β)u + (α − β)v, which shows that x ∈ sp{u, v}. Conversely, suppose that x ∈ sp{u, v}, say x = αu + βv. We notice that u = (1/2)(u + v) + (1/2)(u − v), and v = (1/2)(u + v) − (1/2)(u − v). Therefore, x = α((1/2)(u + v) + (1/2)(u − v)) + β((1/2)(u + v) − (1/2)(u − v)) = (α/2 + β/2)(u + v) + (α/2 − β/2)(u − v), which shows that x ∈ sp{u + v, u − v}. Therefore, x ∈ sp{u, v} if and only if x ∈ sp{u + v, u − v}, and hence the two subspaces are equal. (b) The result just proved does not necessarily hold if V is a vector space over an arbitrary field F . More specifically, the first part of the proof is always valid, and therefore sp{u + v, u − v} ⊂ sp{u, v} always holds. However, it is not always possible to write u and v in terms of u + v and u − v, and therefore sp{u, v} ⊂ sp{u + v, u − v} need not hold. For example, if F = Z2 , V = Z22 , u = (1, 0), v = (0, 1), then we have u + v = (1, 1) and u − v = (1, 1) (since −1 = 1 in Z2 ). It follows that sp{u + v, u − v} = {(0, 0), (1, 1)}, and hence u, v ∈ sp{u + v, u − v}, which in turn means that sp{u, v} ⊂ sp{u + v, u − v}.
2.5
Linear independence
1. Let V be a vector space over a field F , and let u1 , u2 ∈ V . We wish to prove that {u1 , u2 } is linearly dependent if and only if one of these vectors is a multiple of the other. Suppose first that {u, v} is linearly dependent. Then there exist scalars α1 , α2 , not both zero, such that α1 u1 + α2 u2 = 0. Suppose α1 = 0; −1 then α−1 1 exists, and we have α1 u1 + α2 u2 = 0 ⇒ α1 u1 = −α2 u2 ⇒ u1 = −α1 α2 u2 . Therefore, in this case, u1 is a multiple of u2 . Similarly, if α2 = 0, we can show that u2 is a multiple of u1 . Conversely, suppose one of u1 , u2 is a multiple of the other, say u1 = αu2 . We can then write u1 − αu2 = 0, or 1 · u1 + (−α)u2 = 0, which, since 1 = 0, shows that {u1 , u2 } is linearly dependent. A similar proof shows that if u2 is a multiple of u1 , then {u1 , u2 } is linearly dependent. This completes the proof. 2. Let V be a vector space over a field F , and suppose v ∈ V . We wish to prove that {v} is linearly independent if and only if v = 0. First, if v = 0, then αv = 0 implies that α = 0 by Theorem 5. It follows that {v} is linearly independent if v = 0. On the other hand, if v = 0, then 1 · v = 0, which shows that {v} is linearly dependent (there is a nontrivial solution to αv = 0). Thus {v} is linearly independent if and only if v = 0. 3. Let V be a vector space over a field F , and let u1 , . . . , un ∈ V . Suppose ui = 0 for some i, 1 ≤ i ≤ n, and define scalars α1 , . . . , αn ∈ F by αk = 0 if k = i, αi = 1. Then α1 u1 + · · · + αn un = 0 · u1 + · · · + 0 · ui−1 + 1 · 0 + 0 · ui+1 + · · · + 0 · un = 0, and hence there is a nontrivial solution to α1 u1 + · · · + αn un = 0. This shows that {u1 , . . . , un } is linearly dependent. 4. Let V be a vector space over a field F , let {u1 , . . . , uk } be a linearly independent subspace of V , and assume v ∈ V , v ∈ sp{u1 , . . . , uk }. We wish to show that {u1 , . . . , uk , v} is also linearly independent. We argue by contradiction and assume that {u1 , . . . , uk , v} is linearly dependent. Then there exist scalars
2.5. LINEAR INDEPENDENCE
19
α1 , . . . , αk , β, not all zero, such that α1 u1 +· · ·+αk uk +βv = 0. We now consider two cases. First, if β = 0, then not all of α1 , . . . , αk are zero, and we see that α1 u1 + · · · + αk uk = α1 u1 + · · · + αk uk + 0 · v = 0. This contradicts the fact that {u1 , . . . , uk } is linearly independent. Second, if β = 0, then we can solve α1 u1 + · · · + αk uk + βv = 0 to obtain v = −β −1 α1 u1 − · · · − β −1 αk uk , which contradicts that v ∈ sp{u1 , . . . , uk }. Thus, in either case, we obtain a contradiction, and the proof is complete. 5. We wish to determine whether each of the following sets is linearly independent or not. (a) The set {(1, 2), (1, −1)} ⊂ R2 is linearly independent by Exercise 1, since neither vector is a multiple of the other. (b) The set {(−1, −1, 4), (−4, −4, 17), (1, 1, −3)} is linearly dependent. Solving α1 (−1, −1, 4) + α2 (−4, −4, 17) + α3 (1, 1, −3) = 0 shows that α1 = 5, α2 = −1, α3 = 1 is a nontrivial solution. (c) The set {(−1, 3, −2), (3, −10, 7), (−1, 3, −1)} is linearly independent. Solving α1 (−1, 3, −2) + α2 (3, −10, 7) + α3 (−1, 3, −1) = 0 shows that the only solution is α1 = α2 = α3 = 0. 6. We wish to determine whether each of the following sets of polynomials is linearly independent or not. (a) The set {1 − x2 , x + x2 , 3 + 3x − 4x2 } ⊂ P2 is linearly independent since the only solution to α1 (1 − x2 ) + α2 (x + x2 ) + α3 (3 + 3x − 4x2 ) = 0 is α1 = α2 = α3 = 0. (b) The set {1 + x2 , 4 + 3x2 + 3x3 , 3 − x + 10x3 , 1 + 7x2 − 18x3 } ⊂ P3 is linearly dependent. Solving α1 (1 + x2 ) + α2 (4 + 3x2 + 3x3 ) + α3 (3 − x + 10x3 ) + α4 (1 + 7x2 − 18x3 ) = 0 yields a nontrivial solution α1 = −25, α2 = 6, α3 = 0, α4 = 1. 7. The set {ex , e−x , cosh (x)} ⊂ C[0, 1] is linearly dependent since (1/2)ex + (1/2)e−x − cosh (x) = 0 for all x ∈ [0, 1]. 8. The subset {(0, 1, 2), (1, 2, 0), (2, 0, 1)} of Z33 is linearly dependent because 1 · (0, 1, 2) + 1 · (1, 2, 0) + 1 · (2, 0, 1) = (0, 0, 0). 9. We wish to show that {1, x, x2 } is linearly dependent in P2 (Z2 ). The equation α1 · 1 + α2 x + α3 x2 = 0 has the nontrivial solution α1 = 0, α2 = 1, α3 = 1. To verify this, we must simply verify that x + x2 is the zero function in P2 (Z2 ). Substituting x = 0, we obtain 0 + 02 = 0 + 0 = 0, and with x = 1, we obtain 1 + 12 = 1 + 1 = 0. 10. The set {(i, 1, 2i), (1, 1+i, i), (1, 3+5i, −4+3i)} ⊂ C3 is linearly dependent, because α1 (i, 1, 2i)+α2(1, 1+ i, i) + α3 (1, 3 + 5i, −4 + 3i) = (0, 0, 0) has the nontrivial solution α1 = −2i, α2 = −3, α3 = 1. 11. We have already seen that {(3, 2, 2, 3), (3, 2, 1, 2), (3, 2, 0, 1)} ⊂ R4 is linearly dependent, because (3, 2, 2, 3)− 2(3, 2, 1, 2) + (3, 2, 0, 1) = (0, 0, 0, 0). (a) We can solve this equation for any one of the vectors in terms of the other two; for instance, (3, 2, 2, 3) = 2(3, 2, 1, 2) − (3, 2, 0, 1). (b) We can show that (−3, −2, 2, 1) ∈ sp{(3, 2, 2, 3), (3, 2, 1, 2), (3, 2, 0, 1)} by solving α1 (3, 2, 2, 3) + α2 (3, 2, 1, 2) + α3 (3, 2, 0, 1) = (−3, −2, 2, 1). One solution is (−3, −2, 2, 1) = 3(3, 2, 2, 3) − 4(3, 2, 1, 2). Substituting (3, 2, 2, 3) = 2(3, 2, 1, 2) − (3, 2, 0, 1), we obtain another solution: (−3, −2, 2, 1) = 2(3, 2, 1, 2) − 3(3, 2, 0, 1).
20
CHAPTER 2. FIELDS AND VECTOR SPACES
12. We wish to show that {(−1, 1, 3), (1, −1, −2), (−3, 3, 13)} ⊂ R3 is linearly dependent by writing one of the vectors as a linear combination of the others. We will try to solve for the third vector in terms of the other two. (There is an element of trial and error involved here: Even if the three vectors form a linearly independent set, there is no guarantee that this will work; it could be, for instance, that the first two vectors form a linearly dependent set and the third vector does not lie in the span of the first two.) Solving α1 (−1, 1, 3) + α2(1, −1, −2) = (−3, 3, 13) yields a unique solution: (−3, 3, 13) = 7(−1, 1, 3) + 4(1, −1, −2). This shows that the set is linearly dependent. Alternate solution: We begin by solving α1 (−1, 1, 3) + α2 (1, −1, −2) + α3 (−3, 3, 13) = (0, 0, 0) to obtain 7(−1, 1, 3)+4(1, −1, −2)−(−3, 3, 13) = (0, 0, 0). We can easily solve this for the third vector: (−3, 3, 13) = 7(−1, 1, 3) + 4(1, −1, −2). 13. We wish to show that {p1 , p2 , p3 }, where p1 (x) = 1 − x2 , p2 (x) = 1 + x − 6x2 , p3 (x) = 3 − 2x2 , is linearly independent and spans P2 . We first verify that the set is linearly independent by solving α1 (1−x2 )+α2 (1+x−6x2 )+α3 (3−2x2 ) = 0. This equation is equivalent to the system α1 +α2 +3α2 = 0, α2 = 0, −α1 − 6α2 − 2α3 = 0, and a direct calculation shows that the only solution is α1 = α2 = α3 = 0. To show that the set spans P2 , we take an arbitrary p ∈ P2 , say p(x) = c0 + c1 x + c2 x2 , and solve α1 (1 − x2 ) + α2 (1 + x − 6x2 ) + α3 (3 − 2x2 ) = c0 + c1 x + c2 x2 . This is equivalent to the system α1 + α2 + 3α2 = c0 , α2 = c1 , −α1 − 6α2 − 2α3 = c2 . There is a unique solution: α1 = −2c0 − 16c1 − 3c2 , α2 = c1 , α3 = c0 +5c1 +c2 . This shows that p ∈ sp{p1 , p2 , p3 }, and, since p was arbitrary, that {p1 , p2 , p3 } spans all of P2 . 14. Let V be a vector space over a field F and let {u1 , . . . , uk } be a linearly independent subset of V . Suppose u, v ∈ V , {u, v} is linearly independent, and u, v ∈ sp{u1 , . . . , uk }. We wish to determine whether {u1 , . . . , uk , u, v} is necessarily linearly independent. In fact, this set need not be linearly independent. For example, take V = R4 , F = R, k = 3, and u1 = (1, 0, 0, 0), u2 = (0, 1, 0, 0), u3 = (0, 0, 1, 0). With u = (0, 0, 0, 1), v = (1, 1, 1, 1), we see immediately that {u, v} is linearly independent (neither vector is a multiple of the other), and that neither u nor v belongs to sp{u1 , u2 , u3 }. Nevertheless, {u1 , u2 , u3 , u, v} is linear dependent because v = u1 + u2 + u3 + u. 15. Let V be a vector space over a field F , and suppose S and T are subspaces of V satisfying S ∩ T = {0}. Suppose {s1 , . . . , sk } ⊂ S and {t1 , . . . , t } ⊂ T are both linearly independent sets. We wish to prove that {s1 , . . . , sk , t1 , . . . , t } is linearly independent. Suppose scalars α1 , . . . , αk , β1 , . . . , β satisfy α1 s1 + · · · + αk sk + β1 t1 + · · · + β t = 0. We can rearrange this equation to read α1 s1 + · · · + αk sk = −β1 t1 − · · · − β t . If v is the vector represented by these two expressions, then v ∈ S (since v is a linear combination of s1 , . . . , sk ) and v ∈ T (since v is a linear combination of t1 , . . . , t ). But the only vector in S ∩ T is the zero vector, and hence α1 s1 + · · · + αk sk = 0, −β1 t1 − · · · − β t = 0. The first equation implies that α1 = · · · = αk = 0 (since {s1 , . . . , sk } is linearly independent), while the second equation implies that β1 = · · · = β = 0 (since {t1 , . . . , t } is linearly independent). 
Therefore, α1 s1 +· · ·+αk sk +β1 t1 +· · ·+β t = 0 implies that all the scalars are zero, and hence {s1 , . . . , sk , t1 , . . . , t } is linearly independent. 16. Let V be a vector space over a field F , and let {u1 , . . . , uk } and {v1 , . . . , v } be two linearly independent subsets of V . We wish to find a condition that implies that {u1 , . . . , uk , v1 , . . . , v } is linearly independent. By the previous exercise, a sufficient condition for {u1 , . . . , uk , v1 , . . . , v } to be linearly independent is that S = sp{u1 , . . . , uk }, T = sp{v1 , . . . , v } satisfy S ∩ T = {0}. We will prove that this condition is also necessary. Suppose {u1 , . . . , uk } and {v1 , . . . , v } are linearly independent subsets of V , and that {u1 , . . . , uk , v1 , . . . , v } is also linearly independent. Define S and T as above. If x ∈ S ∩ T , then there exist scalars α1 , . . . , αk ∈ F such that x = α1 u1 + · · ·+ αk uk (since x ∈ S), and also scalars β1 , . . . , β ∈ F such that x = β1 v1 + · · · + β v (since x ∈ T ). But then α1 u1 + · · · + αk uk = β1 v1 + · · · + β v , which implies that α1 u1 + · · ·+ αk uk − β1 v1 − · · ·−β v = 0. Since {u1 , . . . , uk , v1 , . . . , v } is linearly independent by assumption, this implies that α1 = · · · = αk = β1 = · · · = β = 0, which in turn shows that x = 0. Therefore S ∩ T = {0}, and the proof is complete. 17. (a) Let V be a vector space over R, and suppose {x, y, z} is a linearly independent subset of V . We wish to show that {x + y, y + z, x + z} is also linearly independent. Let α1 , α2 , α3 ∈ R satisfy α1 (x + y) +
2.6. BASIS AND DIMENSION
21
α2 (y + z) + α3 (x + z) = 0. This equation is equivalent to (α1 + α3 )x + (α1 + α2 )y + (α2 + α3 )z = 0. Since {x, y, z} is linearly independent, it follows that α1 + α3 = α1 + α2 = α2 + α3 = 0. This system can be solved directly to show that α1 = α2 = α3 = 0, which proves that {x + y, y + z, x + z} is linearly independent. (b) We now show, by example, that the previous result is not necessarily true if V is a vector space over some field F = R. Let V = Z32 , and define x = (1, 0, 0), y = (0, 1, 0), and z = (0, 0, 1). Obviously {x, y, z} is linearly independent. On the other hand, we have (x + y) + (y + z) + (x + z) = (1, 1, 0)+(0, 1, 1)+(1, 0, 1) = (1+0+1, 1+1+0, 0+1+1) = (0, 0, 0), which shows that {x+y, y+z, x+z} is linearly dependent. 18. Let U and V be vector spaces over a field F , and define W = U × V . Suppose {u1 , . . . , uk } ⊂ U and {v1 , . . . , v } ⊂ V are linearly independent. We wish to show that {(u1 , 0), . . . , (uk , 0), (0, v1 ), . . . , (0, v )} is also linearly independent. Suppose α1 , . . . , αk , β1 , . . . , β ∈ F satisfy α1 (u1 , 0) + · · · + αk (uk , 0) + β1 (0, v1 ) + · · · + β (0, v ) = (0, 0). This reduces to (α1 u1 + · · · + αk uk , β1 v1 + · · · + β v ) = (0, 0), which holds if and only if α1 u1 + · · · + αk uk = 0 and β1 v1 + · · · + β v = 0. Since {u1 , . . . , uk } is linearly independent, the first equation implies that α1 = · · · = αk = 0, and, since {v1 , . . . , v } is linearly independent, the second implies that β1 = · · · = β = 0. Since all the scalars are necessarily zero, we see that {(u1 , 0), . . . , (uk , 0), (0, v1 ), . . . , (0, v )} is linearly independent. 19. Let V be a vector space over a field F , and let u1 , u2 , . . . , un be vectors in V . Suppose a nonempty subset of {u1 , u2 , . . . , un }, say {ui1 , . . . , uik }, is linearly dependent. (Here 1 ≤ k < n and i1 , . . . , ik are distinct integers each satisfying 1 ≤ ij ≤ n.) We wish to prove that {u1 , u2 , . . . , un } itself is linearly dependent. By assumption, there exist scalars αi1 , . . . , αik ∈ F , not all zero, such that αi1 ui1 + · · · + αik uik = 0. For each i ∈ {1, . . . , n} \ {i1 , . . . , ik }, define αi = 0. Then we have α1 u1 + · · ·+ αn un = 0 + αi1 ui1 + · · ·+ αik uik = 0, and not all of α1 , . . . , αn are zero since at least one αij is nonzero. This shows that {u1 , . . . , un } is linearly dependent. 20. Let V be a vector space over a field F , and suppose {u1 , u2 , . . . , un } is a linearly independent subset of V . We wish to prove that every nonempty subset of {u1 , u2 , . . . , un } is also linearly independent. The result to be proved is simply the contrapositive of the statement in the previous exercise, and therefore holds by the previous proof. 21. Let V be a vector space over a field F , and suppose {u1 , u2 , . . . , un } is linearly dependent. We wish to prove that, given any i, 1 ≤ i ≤ n, either ui is a linear combination of u1 , . . . , ui−1 , ui+1 , . . . , un or these vectors form a linearly dependent set. By assumption, there exist scalars α1 , . . . , αn ∈ F , not all zero, such that α1 u1 + · · · + αi ui + · · · + αn un = 0. We now consider two cases. If αi = 0, the we can solve the latter −1 −1 −1 equation for ui to obtain ui = −α−1 i α1 u1 −· · ·−αi αi−1 ui−1 −αi αi+1 ui+1 −· · ·−αi αn un . In this case, ui is a linear combination of the remaining vectors. The second case is that αi = 0, in which case at least one of α1 , . . . , αi−1 , αi+1 , . . . , αn is nonzero, and we have α1 u1 +· · ·+αi−1 ui−1 +αi+1 ui+1 +· · ·+αn un = 0. 
This shows that {u1 , . . . , ui−1 , ui+1 , . . . , un } is linearly dependent.
2.6
Basis and dimension
1. Suppose {v1 , v2 , . . . , vn } is a basis for a vector space V . (a) We wish to show that if any vj is removed from the basis, the resulting set of n − 1 vectors does not span V and hence is not a basis. This follows from Theorem 24: Since {v1 , v2 , . . . , vn } is linearly independent, no vj , j = 1, 2, . . . , n, can be written as a linear combination of the remaining vectors. Therefore, vj ∈ sp{v1 , . . . , vj−1 , vj+1 , . . . , vn }, which proves the desired result. (b) Now we wish to show that if any vector u ∈ V , u ∈ {v1 , v2 , . . . , vn }, is added to the basis, the resulting set of n + 1 vectors is linearly dependent. This is immediate from Theorem 34: Since
CHAPTER 2. FIELDS AND VECTOR SPACES
22
the dimension of V is n, every set containing more than n vectors is linearly dependent. Since {v1 , v2 , . . . , vn , u} contains n + 1 vectors, it must be linearly dependent. 2. Consider the following vectors in R3 : v1 = (−1, 4, −2), v2 = (5, −20, 9), v3 = (2, −7, 6). We wish to determine if {v1 , v2 , v3 } is a basis for R3 . If we solve α1 v1 + α2 v2 + α3 v3 = x for an arbitrary x ∈ R3 , we find a unique solution: α1 = 57x1 + 12x2 − 5x3 , α2 = 10x1 + 2x2 − x3 α3 = 4x1 + x2 . By Theorem 28, this implies that {v1 , v2 , v3 } is a basis for R3 . 3. We now repeat the previous exercise for the vectors v1 = (−1, 3, −1), v2 = (1, −2, −2), v3 = (−1, 7, −13). If we try to solve α1 v1 + α2 v2 + α3 v3 = x for an arbitrary x ∈ R3 , we find that this equation is equivalent to the following system: −α1 + α2 − α3 = x1 α2 + 4α3 = 3x1 + x2 0 = 8x1 + 3x2 + x3 . Since this system is inconsistent for most x ∈ R3 (the system is consistent only if x happens to satisfy 8x1 + 3x2 + x3 = 0), {v1 , v2 , v3 } does not span R3 and therefore is not a basis. 4. Let S = sp{ex , e−x } be regarded as a subspace of C(R). We will show that {ex , e−x }, {cosh (x), sinh (x)} are two different bases for S. First, to verify that {ex , e−x } is a basis, we merely need to verify that it is linearly independent (since it spans S by definition). This can be done as follows: If c1 ex + c2 e−x = 0, where 0 represents the zero function, then the equation must hold for all values of x ∈ R. So choose x = 0 and x = ln 2; then c1 and c2 must satisfy c1 + c2 = 0, 1 2c1 + c2 = 0. 2 It is straightforward to show that the only solution of this system is c1 = c2 = 0, and hence {ex , e−x } is linearly independent. Next, since cosh (x) =
1 x 1 −x 1 1 e + e , cosh (x) = ex − e−x , 2 2 2 2
we see that cosh (x), sinh (x) ∈ S = sp{ex , e−x }. We can verify that {cosh (x), sinh (x)} is linearly independent directly: If c1 cosh (x) + c2 sinh (x) = 0, then, substituting x = 0 and x = ln 2, we obtain the system 5 3 1 · c1 + 0 · c2 = 0, c1 + c2 = 0, 4 4 and the only solution is c1 = c2 = 0. Thus {cosh (x), sinh (x)} is linearly independent. Finally, let f be any function in S. Then, by definition, f can be written as f (x) = α1 ex + α2 e−x for some α1 , α2 ∈ R. We must show that f ∈ sp{cosh (x), sinh (x)}, that is, that there exist c1 , c2 ∈ R such that c1 cosh (x) + c2 sinh (x) = f (x). This equation can be manipulated as follows: c1 cosh (x) + c2 sinh (x) = f (x) 1 x 1 −x 1 x 1 −x e + e e − e + c2 = α1 ex + α2 e−x ⇔ c1 2 2 2 2 1 1 1 1 c1 + c2 e x + c1 − c2 e−x = α1 ex + α2 e−x . ⇔ 2 2 2 2
2.6. BASIS AND DIMENSION
23
Since {{ex, e−x } is linearly independent, Theorem 26 implies that the last equation can hold only if 1 1 1 1 c1 + c2 = α1 , c1 − c2 α2 . 2 2 2 2 This last system has a unique solution: c1 = α1 + α2 , c2 = α1 − α2 . This shows that f ∈ sp{cosh (x), sinh (x)}, and the proof is complete. 5. Let p1 (x) = 1 − 4x + 4x2, p2 (x) = x + x2 , p3 (x) = −2 + 11x − 6x2. We will determine whether {p1 , p2 , p3 } is a basis for P2 or not by solving α1 p1 (x) + α2 p2 (x) + α3 p3 (x) = c0 + c1 x + c2 x2 , where c0 + c1 x + c2 x2 is an arbitrary element of P2 . A direct calculation shows that there is a unique solution for α1 , α2 , α3 : α1 = 17c0 + 2c1 − 2c2 , α2 = 3c2 − 2c1 − 20c0 , α3 = 8c0 + c1 − c2 . By Theorem 28, it follows that {p1 , p2 , p3 } is a basis for P2 . 6. Let p1 (x) = 1 − x2 , p2 (x) = 2 + x, p3 (x) = x + 2x2 . We will determine whether {p1 , p2 , p3 } a basis for P2 by trying to solve α1 p1 (x) + α2 p2 (x) + α3 p3 (x) = c0 + c1 x + c2 x2 , where c0 , c1 , c2 are arbitrary real numbers. This equation is equivalent to the system α1 + 2α2 = c0 , α2 + α3 = c1 , −α1 + 2α3 = c2 . Solving this system by elimination leads to α1 + 2α2 = c0 , α2 + α3 = c1 , 0 = c0 − 2c1 + c2 , which is inconsistent for most values of c0 , c1 , c2 . Therefore, {p1 , p2 , p3 } does not span P2 and hence is not a basis for P2 . 7. Consider the subspace S = sp{p1 , p2 , p3 , p4 , p5 } of P3 , where p1 (x) = −1 + 4x − x2 + 3x3 , p2 (x) = 2 − 8x + 2x2 − 5x3 , p3 (x) = 3 − 11x + 3x2 − 8x3 , p4 (x) = −2 + 8x − 2x2 − 3x3 , p5 (x) = 2 − 8x + 2x2 + 3x3 . (a) The set {p1 , p2 , p3 , p4 , p5 } is linearly dependent (by Theorem 34) because it contains five elements and the dimension of P3 is only four. (b) As illustrated in Example 39, we begin by solving α1 p1 (x) + α2 p2 (x) + α3 p3 (x) + α4 p4 (x) + α5 p5 (x) = 0; this is equivalent to the system −α1 + 2α2 + 3α3 − 2α4 + 2α5 = 0, 4α1 − 8α2 − 11α3 + 8α4 − 8α5 = 0, −α1 + 2α2 + 3α3 − 2α4 + 2α5 = 0, 3α1 − 5α2 − 8α3 − 3α4 + 3α5 = 0,
CHAPTER 2. FIELDS AND VECTOR SPACES
24 which reduces to
α1 = 16α4 − 16α5 , α2 = 9α4 − 9α5 , α3 = 0. Since there are nontrivial solutions, {p1 , p2 , p3 , p4 , p5 } is linearly dependent (which we already knew), but we can deduce more than that. By taking α4 = 1, α5 = 0, we see that α1 = 16, α2 = 9, α3 = 0, α4 = 1, α5 = 0 is one solution, which means that 16p1 (x) + 9p2 (x) + p4 (x) = 0 ⇒ p4 (x) = −16p1 (x) − 9p2 (x). This shows that p4 ∈ sp{p1 , p2 } ⊂ sp{p1 , p2 , p3 }. Similarly, taking α4 = 0, α5 = 0, we find that −16p1 (x) − 9p2 (x) + p5 (x) = 0 ⇒ p5 (x) = 16p1 (x) + 9p2 (x), and hence p5 ∈ sp{p1 , p2 } ⊂ sp{p1 , p2 , p3 }. It follows from Lemma 19 that sp{p1 , p2 , p3 , p4 , p5 } = sp{p1 , p2 , p3 }. Our calculations above show that {p1 , p2 , p3 } is linearly independent (if α4 = α5 = 0, then also α1 = α2 = α3 = 0). Therefore, {p1 , p2 , p3 } is a linearly independent spanning set of S and hence a basis for S. 8. We wish to find a basis for sp{(1, 2, 1), (0, 1, 1), (1, 1, 0)} ⊂ R3 . We will name the vectors v1 , v2 , v3 , respectively, and begin by testing the linear independence of {v1 , v2 , v3 }. The equation α1 v1 + α2 v2 + α3 v3 = 0 is equivalent to α1 + α3 = 0, 2α2 + α2 + α3 = 0, α1 + α2 = 0, which reduces to α1 = −α3 , α2 = α3 . One solution is α1 = −1, α2 = 1, α3 = 1, which shows that −v1 + v2 + v3 = 0, or v3 = v1 − v2 . This in turns shows that sp{v1 , v2 , v3 } = sp{v1 , v2 } (by Lemma 19). Clearly {v1 , v2 } is linearly independent (since neither vector is a multiple of the other), and hence {v1 , v2 } is a basis for sp{v1 , v2 , v3 }. 9. We wish to find a basis for S = sp{(1, 2, 1, 2, 1), (1, 1, 2, 2, 1), (0, 1, 2, 0, 2)} in Z53 . The equation α1 (1, 2, 1, 2, 1) + α2 (1, 1, 2, 2, 1) + α3 (0, 1, 2, 0, 2) = (0, 0, 0, 0, 0) is equivalent to the system α1 + α2 = 0, 2α1 + α2 + α3 = 0, α1 + 2α2 + 2α3 = 0, 2α1 + 2α2 = 0, α1 + α2 + 2α3 = 0. Reducing this system by Gaussian elimination (in modulo 3 arithmetic), we obtain α1 = α2 = α3 = 0, which shows that the given vectors form a linearly independent set and therefore a basis for S.
2.6. BASIS AND DIMENSION
25
10. We will show that {1 + x + x2 , 1 − x + x2 , 1 + x + 2x2 } is a basis for P2 (Z3 ) by showing that there is a unique solution to α1 1 + x + x2 + α2 1 − x + x2 + α3 1 + x + 2x2 = c0 + c1 x + c2 x2 . We first note that 1 − x + x2 = 1 + 2x + x2 in P2 (Z3 ), so we can write our equation as α1 1 + x + x2 + α2 1 + 2x + x2 + α3 1 + x + 2x2 = c0 + c1 x + c2 x2 . We rearrange the previous equation in the form (α1 + α2 + α3 ) + (α1 + 2α2 + α3 )x + (α1 + α2 + 2α3 )x2 = c0 + c1 x + c2 x2 . Since the polynomials involved are of degree 2 and the field Z3 contains 3 elements, this last equation is is equivalent to the system α1 + α2 + α3 = c0 , α1 + 2α2 + α3 = c1 , α1 + α2 + 2α3 = c2 (cf. the discussion on page 45 of the text). Applying Gaussian elimination (modulo 3) shows that there is a unique solution: α1 = c0 + 2c1 + c2 , α2 = 2c0 + c1 , α3 = 2c0 + c2 . This in turn proves (by Theorem 28) that the given polynomials form a basis for P2 (Z3 ). 11. Suppose F is a finite field with q distinct elements. (a) Assume n ≤ q − 1. We wish to show that {1, x, x2 , . . . , xn } is a linearly independent subset of Pn (F ). (Since {1, x, x2 , . . . , xn } clearly spans Pn (F ), this will show that it is a basis for Pn (F ), and hence that dim(Pn (F )) = n + 1 in the case that n ≤ q − 1.) The desired conclusion follows from the discussion on page 45 of the text. If c0 · 1 + c1x + · · · + cn xn = 0 (where 0 is the zero function), then every element of F is a root of c0 · 1 + c1 x + · · · + cn xn . Since F contains more than n elements and a nonzero polynomial of degree n can have at most n distinct roots, this is impossible unless c0 = c1 = . . . = cn = 0. Thus {1, x, . . . , xn } is linearly independent. (b) Now suppose that n ≥ q. The reasoning above shows that {1, x, x2 , . . . , xq−1 } is linearly independent in Pn (F ) (c0 · 1 + c1 x + · · · + cq−1 xq−1 has at most q − 1 distinct roots, and F contains more than q − 1 elements, etc.). This implies that dim (Pn (F )) ≥ q in the case n ≥ q. 12. Suppose V is a vector space over a field F , and S, T are two n-dimensional subspaces of V . We wish to prove that if S ⊂ T , then in fact S = T . Let {s1 , s2 , . . . , sn } be a basis for S. Since S ⊂ T , this implies that {s1 , s2 , . . . , sn } is a linearly independent subset of T . We will now show that {s1 , s2 , . . . , sn } also spans T . Let t ∈ T be arbitrary. Since T has dimension n, the set {s1 , s2 , . . . , sn , t} is linearly dependent by Theorem 34. But then, by Lemma 33, t must be a linear combination of s1 , s2 , . . . , sn (since no sk is a linear combination of s1 , s2 , . . . , sk−1 ). This shows that t ∈ sp{s1 , s2 , . . . , sn }, and hence we have shown that {s1 , s2 , . . . , sn } is a basis for T . But then T = sp{s1 , s2 , . . . , sn } = S, as desired. 13. Suppose V is a vector space over a field F , and S, T are two finite-dimensional subspaces of V with S ⊂ T . We are asked to prove that dim(S) ≤ dim(T ). Let {s1 , s2 , . . . , sn } be a basis for S. Since S ⊂ T , it follows that {s1 , s2 , . . . , sn } is a linearly independent subset of T , and hence, by Theorem 34, any basis for T must have at least n vectors. It follows that dim(T ) ≥ n = dim(S).
CHAPTER 2. FIELDS AND VECTOR SPACES
26
14. (Note: This exercise belongs in Section 2.7 since the most natural solution uses Theorem 43.) Let V be a vector space over a field F , and let S and T be finite-dimensional subspaces of V . We wish to prove that dim(S + T ) = dim(S) + dim(T ) − dim(S ∩ T ). We know from Exercise 2.3.19 that S ∩ T is a subspace of V , and since it is a subset of S, dim(S ∩ T ) ≤ dim(S). Since S is finite-dimensional by assumption, it follows that S ∩ T is also finite-dimensional, and therefore either S ∩ T = {0} or S ∩ T has a basis. Suppose first that S ∩ T = {0}, so that dim(S ∩ T ) = 0. Let {s1 , s2 , . . . , sm } be a basis for S and {t1 , t2 , . . . , tn } be a basis for T . We will show that {s1 , . . . , sm , t1 , . . . , tn } is a basis for S + T , from which it follows that dim(S + T ) = m + n = m + n − 0 = dim(S) + dim(T ) − dim(S ∩ T ). The set {s1 , . . . , sm , t1 , . . . , tn } is linearly independent by Exercise 2.5.15. Given any v ∈ S + T , there exist s ∈ S, t ∈ T such that v = s + t. But since s ∈ S, there exist scalars α1 , . . . , αm ∈ F such that s = α1 s1 + · · · + αm sm . Similarly, since t ∈ T , there exist β1 , . . . , βn ∈ F such that t = β1 t1 + · · · + βn tn . But then v = s + t = α1 s1 + · · · + αm sm + β1 t1 + · · · + βn tn , which shows that v ∈ sp{s1 , . . . , sm , t1 , . . . , tn }. Thus we have shown that {s1 , . . . , sm , t1 , . . . , tn } is a basis for S + T , which completes the proof in the case that S ∩ T = {0}. Now suppose S ∩ T is nontrivial, with basis {v1 , . . . , vk }. Since S ∩ T is a subset of S, {v1 , . . . , vk } is a linearly independent subset of S and hence, by Theorem 43, can be extended to a basis {v1 , . . . , vk , s1 , . . . , sp } of S. Similarly, {v1 , . . . , vk } can be extended to a basis {v1 , . . . , vk , t1 , . . . , tq } of T . We will show that {v1 , . . . , vk , s1 , . . . , sp , t1 , . . . , tq } is a basis of S + T . Then we will have dim(S) = k + p, dim(T ) = k + q, dim(S ∩ T ) = k and dim(S + T ) = k + p + q = (k + p) + (k + q) − k = dim(S) + dim(T ) − dim(S ∩ T ), as desired. First, suppose v ∈ S + T . Then, by definition of S + T , there exist s ∈ S and t ∈ T such that v = s + t. Since {v1 , . . . , vk , s1 , . . . , sp } is a basis for S, there exist scalars α1 , . . . , αk , β1 , . . . , βp ∈ F such that s = α1 v1 + · · · + αk vk + β1 s1 + · · · + βp sp . Similarly, there exist γ1 , . . . , γk , δ1 , . . . , δq ∈ F such that t = γ1 v1 + · · · + γk vk + δ1 s1 + · · · + δq sq . But then v = s + t = α1 v1 + · · · + αk vk + β1 s1 + · · · + βp sp + γ1 v1 + · · · + γk vk + δ1 s1 + · · · + δq sq = (α1 + γ1 )v1 + · · · + (αk + γk )vk + β1 s1 + · · · + βp sp + δ 1 t1 + · · · + δ q tq ∈ sp{v1 , . . . , vk , s1 , . . . , sp , t1 , . . . , tq }. This shows that {v1 , . . . , vk , s1 , . . . , sp , t1 , . . . , tq } spans S+T . Now suppose α1 , . . . , αk , β1 , . . . , βp , γ1 , . . . , γq ∈ F satisfy α1 v1 + · · · + αk vk + β1 s1 + · · · + βp sp + γ1 t1 + · · · + γq tq = 0. This implies that α1 v1 + · · · + αk vk + β1 s1 + · · · + βp sp = −γ1 t1 − · · · − γq tq .
2.6. BASIS AND DIMENSION
27
The vector on the left belongs to S, while the vector on the right belongs to T ; hence both vectors (which are really the same) belong to S ∩ T . But then −γ1 t1 − · · · − γq tq can be written in terms of the basis {v1 , . . . , vk } of S ∩ T , say −γ1 t1 − · · · − γq tq = δ1 v1 + · · · + δk vk . But this gives two representations of the vector −γ1 t1 − · · · − γq tq ∈ T in terms of the basis {v1 , . . . , vk , t1 , . . . , tq }. Since each vector in T must be uniquely represented as a linear combination of the basis vectors, this is possible only if γ1 = · · · = γq = δ1 = · · · = δk = 0. But then α1 v1 + · · · + αk vk + β1 s1 + · · · + βp sp = 0, and the linear independence of {v1 , . . . , vk , σ1 , . . . , sp } implies that α1 = · · · = αk = β1 = · · · = βp = 0. We have thus shown that {v1 , . . . , vk , s1 , . . . , sp , t1 , . . . , tq } is linearly independent, which completes the proof. 15. Let V be a vector space over a field F , and let S and T be finite-dimensional subspaces of V . Consider the four subspaces X1 = S, X2 = T, X3 = S + T, X4 = S ∩ T. For every choice of i, j with 1 ≤ i < j ≤ 4, we wish to determine if dim(Xi ) ≤ dim(Xj ) or dim(Xi ) ≥ dim(Xj ) (or neither) must hold. First of all, since S and T are arbitrary subspaces, it is obvious that there need be no particular relationship between the dimensions of S and T . However, S ⊂ S + T since each s ∈ S can be written as s = s + 0 ∈ S + T (0 ∈ T because every subspace contains the zero vector). Therefore, by Exercise 13, dim(S) ≤ dim(S +T ). By the same reasoning, T ⊂ S + T and hence dim(T ) ≤ dim(S + T ). Next, S ∩ T ⊂ S, S ∩ T ⊂ T , and hence dim(S ∩ T ) ≤ dim(S), dim(S ∩ T ) ≤ dim(T ). Finally, we have S ∩ T ⊂ S ⊂ S + T , and hence dim(S ∩ T ) ≤ dim(S + T ). 16. Let V be a vector space over a field F , and suppose S and T are subspaces of V satisfying S ∩ T = {0}. Suppose {s1 , s2 , . . . , sk } ⊂ S and {t1 , t2 , . . . , t } ⊂ T are bases for S and T , respectively. We wish to prove that {s1 , s2 , . . . , sk , t1 , t2 , . . . , t } is a basis for S + T . This was done in the course of proving the result in Exercise 14. 17. Let U and V be vector spaces over a field F , and let {u1 , . . . , un } and {v1 , . . . , vm } be bases for U and V , respectively. We are asked to prove that {(u1 , 0), . . . , (un , 0), (0, v1 ), . . . , (0, vm )} is a basis for U × V . First, let (u, v) be an arbitrary vector in U × V . Then u ∈ U and there exist α1 , . . . , αn ∈ F such that u = α1 u1 + · · · + αn un . Similarly, v ∈ V and there exist β1 , . . . , βm ∈ F such that v = β1 v1 + · · · + βm vm . It follows that (u, v) = (u, 0) + (0, v) = (α1 u1 + · · · + αn un , 0) + (0, β1 v1 + · · · + βm vm ) = α1 (u1 , 0) + · · · αn (un , 0) + β1 (0, v1 ) + · · · + βm (0, vm ). This shows that {(u1 , 0), . . . , (un , 0), (0, v1 ), . . . , (0, vm )} spans U × V . Next, suppose α1 , . . . , αn ∈ F , β1 , . . . , βm ∈ F satisfy α1 (u1 , 0) + · · · αn (un , 0) + β1 (0, v1 ) + · · · + βm (0, vm ) = 0.
CHAPTER 2. FIELDS AND VECTOR SPACES
28 Since the zero vector ∈ U × V is (0, 0), this yields
(α1 u1 + · · · + αn un , 0) + (0, β1 v1 + · · · + βm vm ) = (0, 0), which is equivalent to α1 u1 + · · · + αn un = 0, β1 v1 + · · · + βm vm = 0. Since both {u1 , . . . , un } and {v1 , . . . , vm } are linearly independent, it follows that α1 = · · · = αn = β1 = · · · = βm = 0. This shows that {(u1 , 0), . . . , (un , 0), (0, v1 ), . . . , (0, vm )} is linearly independent, and the proof is complete. 18. We will prove that the number of elements in a finite field must be pn , where p is a prime number and n is a positive integer. Let F be a finite field. (a) Let p be the characteristic of F . Then 0, 1, 1 + 1, 1 + 1 + 1, . . . , 1 + 1 + · · · + 1 (p − 1 terms in the last sum) are distinct elements of F , while 1 + 1 + · · · + 1 (p terms) is 0. We will write 2 = 1 + 1, 3 = 1 + 1 + 1, and so forth, thus labeling p distinct elements of F , namely, 0, 1, . . . , p − 1. We can then show that {0, 1, 2, . . . , p − 1} ⊂ F is a subfield of F isomorphic to Zp . Writing out a formal proof is difficult, because the symbols 0, 1, 2, . . . , p − 1 have now three different meanings (they are elements of Z, elements of Zp , and now elements of F ). For the purposes of this proof, we will temporarily write 0F , 1F , . . . , (p − 1)F for the elements of F , 0Zp , 1Zp , . . . , (p − 1)Zp for the elements of Zp , and 0, 1, . . . , p − 1 for the elements of Z. Let us define G = {0F , 1F , . . . , (p − 1)F } and φ : G → Zp by φ(kF ) = kZp . If k + < p, then kF + F is the sum of k + copies of 1F , which is (k + )F by definition. Similarly, kZp + Zp = (k + )Zp by definition of addition in Zp . Therefore, φ(kF + F ) = φ((k + )F ) = (k + )Zp = kZp + Zp = φ(kF ) + φ( F ). On the other hand, if 0 ≤ k, ≤ p − 1 and k + ≥ p, then kF + F is the sum of k + copies of 1F , which can be written (by the associative property of addition) as the sum of p copies of 1F plus the sum of k + − p copies of 1F . This reduces to 0F + (k + − p)F = (k + − p)F . Similarly, by the definition of addition in Zp , kZp + Zp = (k + − p)Zp , and therefore, in this case also, we see φ(kF + F ) = φ((k + − p)F ) = (k + − p)Zp = kZp + Zp = φ(kF ) + φ( F ). Therefore, φ preserves addition. Now, by the distributive law, kF F can be written as the sum of k copies of 1F (a careful proof of this would require induction). If k = qp + r, where q ≥ 0 and 0 ≤ r ≤ p − 1, then, by the associative property of addition, we can write kF F as the sum of q + 1 sums, the first q of them consisting of p copies of 1F (and thus each equalling 0F ) and the last consisting of r copies of 1F . It follows that kF F = rF . By definition of multiplication in Zp , we simiarly have kZp Zp = rZp , and hence φ(kF F ) = φ(rF ) = rZp = kZp Zp = φ(kF )φ( F ). Therefore, φ also preserves multiplication, and we have shown that G and Zp are isomorphic as fields. (b) We now drop the subscripts and identify Zp with the subfield G of F . We wish to show that that F is a vector space over Zp . We already know that addition in F is commutative, associative, and has an identity, and that each element of F has an additive inverse in F . The associative property of scalar multiplication and the two distributive properties of scalar multiplication reduce to the associative property of multiplication and the distributive property in F . Finally, 1 · u = u for all u ∈ F since 1 is nothing more than the multiplicative identity in F . This verifies that F is a vector field over Zp .
2.7. PROPERTIES OF BASES
29
(c) Since F has only a finite number of elements, it must be a finite-dimensional vector space over Zp . Let the dimension be n, and let {f1 , . . . , fn } be a basis for F . Then every element of F can be written uniquely in the form α1 f1 + α2 f2 + · · · + αn fn , (2.1) where α1 , α2 , . . . , αn ∈ Zp . Conversely, for each choice of α1 , α2 , . . . , αn in Zp , (2.1) defines an element of F . Therefore, the number of elements of F is precisely the number of different ways to choose α1 , α2 , . . . , αn ∈ Zp , which is pn .
2.7
Properties of bases
1. Consider the following vectors in R3 : v1 = (1, 5, 4), v2 = (1, 5, 3), v3 = (17, 85, 56), v4 = (1, 5, 2), v5 = (3, 16, 13). (a) We wish to show that {v1 , v2 , v3 , v4 , v5 } spans R3 . Given an arbitrary x ∈ R3 , the equation α1 v1 + α2 v2 + α3 v3 + α4 v4 + α5 v5 = x is equivalent to the system α1 + α2 + 17α3 + α4 + 3α5 = x1 , 5α1 + 5α2 + 85α3 + 5α4 + 16α5 = x2 , 4α1 + 3α2 + 56α3 + 2α4 + 13α5 = x3 . Applying Gaussian elimination, this system reduces to α1 = 17x1 − 4x2 + x3 − 5α3 + α4 , α2 = x2 − x1 − x3 − 12α3 − 2α5 , α5 = x2 − 5x1 . This shows that there are solutions regardless of the value of x; that is, each x ∈ R3 can be written as a linear combination of v1 , v2 , v3 , v4 , v5 . Therefore, {v1 , v2 , v3 , v4 , v5 } spans R3 . (b) Now we wish to find a subset of {v1 , v2 , v3 , v4 , v5 } that is a basis for R3 . According to the calculations given above, each x ∈ R3 can be written as a linear combination of {v1 , v2 , v5 } (just take α3 = α4 = 0 in the system solved above). Since dim(R3 ) = 3, any three vectors spanning R3 form a basis for R3 (by Theorem 45). Hence {v1 , v2 , v5 } is a basis for R3 . 2. Consider the following vectors in R4 : u1 = (1, 3, 5, −1), u2 = (1, 4, 9, 0), u3 = (4, 9, 7, −5). (a) We wish to show that {u1 , u2 , u3 } is linearly independent, which we do by solving α1 u1 + α2 u2 + α3 u3 = 0. This is equivalent to the system α1 + α2 + 4α3 = 0, 3α1 + 4α2 + 9α3 = 0, 5α1 + 9α2 + 7α3 = 0, −α1 − 5α3 = 0. Applying Gaussian elimination, we find that the only solution is α1 = α2 = α3 = 0. Thus {u1 , u2 , u3 } is linearly independent. (b) Since sp{u1 , u2 , u3 } is a three-dimensional subspace of R4 , and hence a very small part of R4 , almost every vector in R4 does not belong to sp{u1 , u2 , u3 } and hence would be a valid fourth vector for a basis. We choose u4 = (0, 0, 0, 1), and test whether {u1 , u2 , u3 , u4 } is linearly independent. A direct calculation shows that α1 u1 + α2 u2 + α3 u3 + α4 u4 = 0 has only the trivial solution, and hence (by Theorem 45) {u1 , u2 , u3 , u4 } is a basis for R4 .
CHAPTER 2. FIELDS AND VECTOR SPACES
30 3. Let p1 (x) = 2 − 5x, p2 (x) = 2 − 5x + 4x2 .
(a) Obviously {p1 , p2 } is linearly independent, because neither polynomial is a multiple of the other. (b) Now we wish to find a polynomial p3 ∈ P2 such that {p1 , p2 , p3 } is a basis for P2 . Since sp{p1 , p2 } is a two-dimensional subspace of the three-dimensional space P2 , almost any polynomial will do; we choose p3 (x) = 1. We then test for linear independence by solving c1 p1 (x) + c2 p2 (x) + c3 p3 (x) = 0. This equation is equivalent to (2c1 + 2c2 + c3 ) + (−5c1 − 5c2 )x + 4c2 x2 = 0, which in turn is equivalent to the system 2c1 + 2c2 + c3 = 0, −5c1 − 5c2 = 0, 4c2 = 0. A direct calculation shows that the only solution is c1 = c2 = c3 = 0, and hence {p1 , p2 , p3 } is linearly independent. It follows from Theorem 45 that {p1 , p2 , p3 } is a basis for R3 . 4. Define p1 , p2 , p3 , p4 , p5 ∈ P2 by p1 (x) = x, p2 (x) = 1 + x, p3 (x) = 3 + 5x, p4 (x) = 5 + 8x, p5 (x) = 3 + x − x2 . (a) We first show that {p1 , p2 , p3 , p4 , p5 } spans P2 . Given an arbitrary q(x) = a0 + a1 x + a2 x2 in P2 , the equation c1 p1 (x) + c2 p2 (x) + c3 p3 (x) + c4 p4 (x) + c5 p5 (x) = q(x) is equivalent to (c2 + 3c3 + 5c4 + 3c5 ) + (c1 + c2 + 5c3 + 8c4 + c5 )x − c5 x2 = a0 + a1 x + a2 x2 , and hence to the system c2 + 3c3 + 5c4 + 3c5 = a0 , c1 + c2 + 5c3 + 8c4 + c5 = a1 , −c5 = a2 . Applying Gaussian elimination, we obtain the reduced system c1 = a1 − a0 − 2a2 − 2c3 − 3c4 , c2 = a0 + 3a2 − 3c3 − 5c4 , c5 = −a2 . We see that, regardless of the values of a0 , a1 , a2 , there is a solution, and hence {p1 , p2 , p3 , p4 , p5 } spans P2 . (b) We now find a subset of {p1 , p2 , p3 , p4 , p5 } that forms a basis for P2 . From the above calculation, we see that every q ∈ P2 can be written as a linear combination of p1 , p2 , p5 . Since dim(P2 ) = 3, Theorem 45 implies that {p1 , p2 , p5 } is a basis for P2 . 5. Let u1 = (1, 4, 0, −5, 1), u2 = (1, 3, 0, −4, 0), u3 = (0, 4, 1, 1, 4) be vectors in R5 . (a) To show that {u1 , u2 , u3 } is linearly independent, we solve the equation α1 u1 + α2 u2 + α3 u3 = 0, which is equivalent to the system α1 + α2 = 0, 4α1 + 3α2 + 4α3 = 0, α3 = 0, −5α1 − 4α2 + α3 = 0, α1 + 4α3 = 0. A direct calculation shows that this system has only the trivial solution.
2.7. PROPERTIES OF BASES
31
(b) To extend {u1 , u2 , u3 } to a basis for R5 , we need two more vectors. We will try u4 = (0, 0, 0, 1, 0) and u5 = (0, 0, 0, 0, 1). We solve α1 u1 + α2 u2 + α3 u3 + α4 u4 + α5 u5 = 0 and find that the only solution is the trivial one. This implies that {u1 , u2 , u3 , u4 , u5 } is linearly independent and hence, by Theorem 45, a basis for R5 . 6. Consider the following vectors in R5 : ⎤ ⎤ ⎤ ⎡ ⎡ 1 −1 1 ⎢ 2 ⎥ ⎢ 3 ⎥ ⎢ 7 ⎥ ⎥ ⎥ ⎥ ⎢ ⎢ ⎢ ⎥ , u2 = ⎢ 2 ⎥ , u3 = ⎢ 2 ⎥ , 0 u1 = ⎢ ⎥ ⎥ ⎥ ⎢ ⎢ ⎢ ⎣ 1 ⎦ ⎣ 1 ⎦ ⎣ 3 ⎦ 1 −1 1 ⎤ ⎤ ⎡ ⎡ 1 2 ⎢ −2 ⎥ ⎢ 10 ⎥ ⎥ ⎥ ⎢ ⎢ ⎥ ⎥ ⎢ u4 = ⎢ ⎢ −1 ⎥ , u5 = ⎢ 3 ⎥ . ⎣ 1 ⎦ ⎣ 6 ⎦ 1 2 ⎡
Let S = sp{u1 , u2 , u3 , u4 , u5 }. We wish to find a subset of {u1 , u2 , u3 , u4 , u5 } that is a basis for S. We let x be an arbitrary vector in R5 and solve α1 u1 + α2 u2 + α3 u3 + α4 u4 + α5 u5 = x. This equation is equivalent to the system α1 − α2 + α3 + α4 + 2α5 = x1 , 2α1 + 3α2 + 7α3 − 2α4 + 10α5 = x2 , 2α2 + 2α3 − α4 + 3α5 = x3 , α1 + α2 + 3α3 + α4 + 6α5 = x4 , α1 − α2 + α3 + α4 + 2α5 = x5 . Applying Gaussian elimination, this system is equivalent to 3 1 x1 + x3 − x4 − 2α3 − 3α5 , 2 2 1 1 α2 = − x1 + x4 − α3 − 2α5 , 2 2 α4 = −x1 − x3 + x4 − α5 , 3 7 0 = − x1 + x2 − 4x3 + x4 , 2 2 0 = −x1 + x5 . α1 =
We see first of all that S is a proper subspace of R5 , since x ∈ S unless 7 3 − x1 + x2 − 4x3 + x4 = 0, −x1 + x5 = 0. 2 2 We also see that any x ∈ S can be represented as a linear combination of u1 , u2 , u4 by taking α3 = α5 = 0 in the above equations. Finally, it can be verified directly that {u1 , u2 , u4 } is linearly independent and hence a basis for S. 7. Consider the following polynomials in P3 : p1 (x) = 1 − 4x + x2 + x3 , p2 (x) = 3 − 11x + x2 + 4x3 , p3 (x) = −x + 2x2 − x3 , p4 (x) = −x2 + 2x3 , p5 (x) = 5 − 18x + 2x2 + 5x3 .
CHAPTER 2. FIELDS AND VECTOR SPACES
32
We wish to determine the dimension of S = sp{p1 , p2 , p3 , p4 , p5 }. We solve c1 p1 (x) + c2 p2 (x) + c3 p3 (x) + c4 p4 (x) + c5 p5 (x) = 0 to determine the linear dependence relationships among these vectors (notice that we already know that {p1 , p2 , p3 , p4 , p5 } is linearly dependent because the dimension of P3 is only 4). This equation is equivalent to the system c1 + 3c2 + 5c5 = 0, −4c1 − 11c2 − c3 − 18c5 = 0, c1 + c2 + 2c3 − c4 + 2c5 = 0, 2c1 + 4c2 − c3 + 2c4 + 5c5 = 0. Applying Gaussian elimation yields 5 1 c1 = 0, c2 = − c5 , c3 = c5 , c4 = c5 . 3 3 Choosing c5 = 1, we see that 5 1 − p2 (x) + p3 (x) + p4 (x) + p5 (x) = 0. 3 3 We can solve this equation for p5 (x), which shows that p5 ∈ sp{p2 , p3 , p4 } ⊂ sp{p1 , p2 , p3 , p4 }. Therefore, S = sp{p1 , p2 , p3 , p4 }. The above calculations also show that the only solution of c1 p1 (x) + c2 p2 (x) + c3 p3 (x) + c4 p4 (x) + c5 p5 (x) = 0 with c5 = 0 is the trivial solution, that is, the only solution of c1 p1 (x) + c2 p2 (x) + c3 p3 (x) + c4 p4 (x) = 0 is the trivial solution. Thus {p1 , p2 , p3 , p4 } is linearly independent and hence a basis for S. (This also shows that S is four-dimensional and hence equals all of P3 .) 8. Let S = sp{v1 , v2 , v3 , v4 } ⊂ C3 , where v1 = (1 − i, 3 + i, 1 + i), v2 = (1, 1 − i, 3), v3 = (i, −2 − 2i, 2 − i), v4 = (2 − i, 7 + 3i, 2 + 5i). We will find a basis for S. We begin by solving α1 v1 + α2 v2 + α3 v3 + α4 v4 = 0, which is equivalent to the system (1 − i)α1 + α2 + iα3 + (2 − i)α4 = 0, (3 + i)α1 + (1 − i)α2 + (−2 − 2i)α3 + (7 + 3i)α4 = 0, (1 + i)α1 + 3α2 + (2 − i)α3 + (2 + 5i)α4 = 0. Applying Gaussian elimination yields α1 = α3 − 2α4 , α2 = −α3 − iα4 . Taking α3 = 1, α4 = 0, we see that v3 can be written as a linear combination of v1 , v2 , and taking α3 = 0, α4 = 1, we see that v4 can be written as a linear combination of v1 , v2 . Thus both v3 and v4 belong to sp{v1 , v2 }, which shows that S = sp{v1 , v2 }. Since neither v1 nor v2 is a multiple of the other, we see that {v1 , v2 } is linearly independent and hence is a basis for S. 9. Consider the vectors u1 = (3, 1, 0, 4) and u2 = (1, 1, 1, 4) in Z45 . (a) It is obvious that {u1 , u2 } is linearly independent, since neither vector is a multiple of the other. (b) To extend {u1 , u2 } to a basis for Z45 , we must find vectors u3 , u4 such that {u1 , u2 , u3 , u4 } is linearly independent. We try u3 = (0, 0, 1, 0) and u4 = (0, 0, 0, 1). A direct calculation then shows that α1 u1 + α2 u2 + α3 u3 + α4 u4 = 0 has only the trivial solution. Therefore {u1 , u2 , u3 , u4 } is linearly independent and hence, since dim(Z45 ) = 4, it is a basis for Z45 .
2.7. PROPERTIES OF BASES
33
10. Let S = sp{v1 , v2 , v3 } ⊂ Z33 , where v1 = (1, 2, 1), v2 = (2, 1, 2), v3 = (1, 0, 1). We wish to find a subset of {v1 , v2 , v3 } that is a basis for S. As usual, we proceed by solving α1 v1 + α2 v2 + α3 v3 = 0 to find the linear dependence relationships (if any). This equation is equivalent to the system α1 + 2α2 + α3 = 0, 2α1 + α2 = 0, α1 + 2α2 + α3 = 0, which reduces, by Gaussian elimination, to α1 = α2 , α3 = 0. It follows that v1 + v2 = 0, or v2 = 2v1 (notice that −1 = 2 in Z3 ). Therefore sp{v1 , v3 } = sp{v1 , v2 , v3 } = S. It is obvious that {v1 , v3 } is linearly independent (since neither vector is a multiple of the other), and therefore {v1 , v3 } is a basis for S. 11. Let F be a field. We will show how to produce different bases for a nontrivial, finite-dimensional vector space over V . (a) Let V be a 1-dimensional vector space over F , and let {u1 } be a basis for V . Then {αu1 } is a basis for V for any α = 0. To prove this, we first note that u1 is nonzero since {u1 } is linearly independent by assumption; therefore α1 (αu1 ) = 0 implies that (α1 α)u1 = 0 and hence (by Theorem 5, part 6) that α1 α = 0. Since α = 0, this in turn yields α1 = 0, and hence {αu1 } is linearly independent. v ∈ V . Since {u1 } is a basis for V , there exists β ∈ F such that βu1 = v. But then Now−1suppose (αu1 ) = v, which shows that {αu1 } spans V . Thus {αu1 } is a basis for V . βα (b) Now let V be a 2-dimensional vector space over F , and let {u1 , u2 } be a basis for V . We wish to prove that {αu1 , βu1 +γu2 } is a basis for V for any α = 0, γ = 0. First, suppose α1 (αu1 )+α2 (βu1 +γu2 ) = 0. We can rewrite this equation as (αα1 + βα2 )u1 + (γα2 )u2 = 0 and, since {u1 , u2 } is linearly independent, this implies that αα1 + βα2 = 0, γα2 = 0. Since γ = 0 by assumption, the second equation implies that α2 = 0. Then the first equation simplifies to αα1 = 0, which implies (since α = 0) that α1 = 0. This shows that {αu1 , βu1 + γu2 } is linearly independent. Now suppose v ∈ V . Since {u1 , u2 } is a basis for V , there exist β1 , β2 ∈ F such that v = β1 u1 +β2 u2 . We wish to find α1 , α2 ∈ F such that α1 (αu1 ) + α2 (βu1 + γu2 ) = v. We can rearrange this last equation to read (αα1 + βα2 )u1 + (γα2 ) = v, which then yields (αα1 + βα2 )u1 + (γα2 ) = β1 u1 + β2 u2 . Since v has a unique representation as a linear combination of the basis vectors u1 , u2 , it follows that αα1 + βα2 = β1 , γα2 = β2 . This system can be solved to yield a unique solution: α1 = α−1 (β1 − βγ −1 β2 ), α2 = γ −1 β2 . This shows that v ∈ sp{αu1 , βu1 + γu2 }, and hence that {αu1 , βu1 + γu2 } spans V . Therefore, {αu1 , βu1 + γu2 } is a basis for V .
CHAPTER 2. FIELDS AND VECTOR SPACES
34
(c) Let V be a vector space over F with basis {u1 , . . . , un }. We wish to generalize the previous parts of this exercise to show how to produce a collection of different bases for V . We choose any scalars αij , i = 1, 2, . . . , n, j = 1, 2, . . . , i, with αii = 0 for all i = 1, 2, . . . , n. Then the set {α11 u1 , α21 u1 + α22 u2 , . . . , αn1 u1 + αn2 u2 + · · · + αnn un }
(2.2)
is a basis for V. We can prove this by induction on n. We have already done the case n = 1 (and also n = 2). Let us assume that the construction leads to a basis if the dimension of the vector space is n − 1, and suppose that V has dimension n, with basis {u1, u2, . . . , un}. By the induction hypothesis,

{α11 u1, α21 u1 + α22 u2, . . . , α_{n−1,1} u1 + · · · + α_{n−1,n−1} u_{n−1}}   (2.3)

is a basis for S = sp{u1, u2, . . . , u_{n−1}}. We now show that (2.2) is a basis for V by showing that each v ∈ V can be uniquely represented as a linear combination of the vectors in (2.2). So let v be any vector in V, say v = β1 u1 + β2 u2 + · · · + βn un. Notice that

βn un = βn α_{nn}^{−1}(α_{n1} u1 + · · · + α_{nn} un) − (βn α_{nn}^{−1} α_{n1} u1 + · · · + βn α_{nn}^{−1} α_{n,n−1} u_{n−1}).

The vector

βn α_{nn}^{−1} α_{n1} u1 + · · · + βn α_{nn}^{−1} α_{n,n−1} u_{n−1}

belongs to S and, by the induction hypothesis, can be written uniquely as

γ1(α11 u1) + γ2(α21 u1 + α22 u2) + · · · + γ_{n−1}(α_{n−1,1} u1 + · · · + α_{n−1,n−1} u_{n−1}).

Also,

β1 u1 + · · · + β_{n−1} u_{n−1} = δ1(α11 u1) + δ2(α21 u1 + α22 u2) + · · · + δ_{n−1}(α_{n−1,1} u1 + · · · + α_{n−1,n−1} u_{n−1}).

Putting this all together, we obtain

v = β1 u1 + · · · + β_{n−1} u_{n−1} + βn un
  = δ1(α11 u1) + δ2(α21 u1 + α22 u2) + · · · + δ_{n−1}(α_{n−1,1} u1 + · · · + α_{n−1,n−1} u_{n−1}) + βn α_{nn}^{−1}(α_{n1} u1 + · · · + α_{nn} un) − (γ1(α11 u1) + γ2(α21 u1 + α22 u2) + · · · + γ_{n−1}(α_{n−1,1} u1 + · · · + α_{n−1,n−1} u_{n−1}))
  = (δ1 − γ1)(α11 u1) + (δ2 − γ2)(α21 u1 + α22 u2) + · · · + (δ_{n−1} − γ_{n−1})(α_{n−1,1} u1 + · · · + α_{n−1,n−1} u_{n−1}) + βn α_{nn}^{−1}(α_{n1} u1 + · · · + α_{nn} un).

This shows that each v can be written as a linear combination of the vectors in (2.2). Uniqueness follows from the induction hypothesis and the fact that there is only one way to write βn un as a multiple of α_{n1} u1 + · · · + α_{nn} un plus a vector from S.
12. We wish to prove that every nontrivial subspace of a finite-dimensional vector space has a basis (and hence is finite-dimensional). Let V be a finite-dimensional vector space, let dim(V) = n, and suppose S is a nontrivial subspace of V. Since S is nontrivial, it contains a nonzero vector v1. Then either {v1} spans S, or there exists v2 ∈ S \ sp{v1}. In the first case, {v1} is a basis for S (since any set containing a single nonzero vector is linearly independent by Exercise 2.5.2). In the second case, {v1, v2} is linearly independent by Exercise 2.5.4. Either {v1, v2} spans S, in which case it is a basis for S, or we can find v3 ∈ S \ sp{v1, v2}. We continue to add vectors in this fashion until we obtain a basis for S. We know that the process will end with a linearly independent spanning set for S, containing at most n vectors, because S ⊂ V, and any linearly independent set with n vectors spans all of V by Theorem 45.

13. Let F be a finite field with q distinct elements, and let n be a positive integer, n ≥ q. We wish to prove that dim(Pn(F)) = q by showing that {1, x, . . . , x^{q−1}} is a basis for Pn(F). We have already seen (in Exercise 2.6.11) that {1, x, . . . , x^{q−1}} is linearly independent. Let α1, α2, . . . , αq denote the q distinct elements of F, and consider the following vectors in F^q:

v1 = (1, 1, . . . , 1), v2 = (α1, α2, . . . , αq), v3 = (α1^2, α2^2, . . . , αq^2), . . . , vq = (α1^{q−1}, α2^{q−1}, . . . , αq^{q−1}).
The equation c1 v1 + c2 v2 + · · · + cq vq = 0 is equivalent to the q equations

c1 · 1 + c2 αi + · · · + cq αi^{q−1} = 0, i = 1, 2, . . . , q,

which collectively are equivalent to the statement

c1 · 1 + c2 x + · · · + cq x^{q−1} = 0 for all x ∈ F.

Since {1, x, . . . , x^{q−1}} is linearly independent, this equation implies that c1 = c2 = · · · = cq = 0, and hence we have shown that {v1, v2, . . . , vq} is linearly independent in F^q. Since we know that dim(F^q) = q, Theorem 45 implies that {v1, v2, . . . , vq} also spans F^q. Now let p be any polynomial in Pn(F), and define

u = (p(α1), p(α2), . . . , p(αq)) ∈ F^q.

Since {v1, v2, . . . , vq} spans F^q, there exist scalars c1, c2, . . . , cq ∈ F such that c1 v1 + c2 v2 + · · · + cq vq = u. This last equation is equivalent to

c1 · 1 + c2 αi + · · · + cq αi^{q−1} = p(αi), i = 1, 2, . . . , q,

and hence to

c1 · 1 + c2 x + · · · + cq x^{q−1} = p(x) for all x ∈ F.

This shows that p ∈ sp{1, x, . . . , x^{q−1}}, and hence that {1, x, . . . , x^{q−1}} spans Pn(F) and is a basis for Pn(F). Thus dim(Pn(F)) = q, as desired.

14. Let V be an n-dimensional vector space over a field F, and suppose S and T are subspaces of V satisfying S ∩ T = {0}. Suppose that {s1, s2, . . . , sk} is a basis for S, {t1, t2, . . . , tℓ} is a basis for T, and k + ℓ = n. We wish to prove that {s1, s2, . . . , sk, t1, t2, . . . , tℓ} is a basis for V. This follows immediately from Theorem 45, since we have already shown in Exercise 2.5.15 that {s1, s2, . . . , sk, t1, t2, . . . , tℓ} is linearly independent.

15. Let V be a vector space over a field F, and let {u1, . . . , un} be a basis for V. Let v1, . . . , vk be vectors in V, and suppose

vj = α_{1,j} u1 + · · · + α_{n,j} un, j = 1, 2, . . . , k.
Define the vectors x1, . . . , xk in F^n by

xj = (α_{1,j}, . . . , α_{n,j}), j = 1, 2, . . . , k.

(a) We first prove that {v1, . . . , vk} is linearly independent if and only if {x1, . . . , xk} is linearly independent. We will do this by showing that c1 v1 + · · · + ck vk = 0 in V is equivalent to c1 x1 + · · · + ck xk = 0 in F^n. Then the first equation has only the trivial solution if and only if the second equation does, and the result follows. The proof is a direct manipulation, for which summation notation is convenient:

Σ_{j=1}^{k} cj vj = 0 ⇔ Σ_{j=1}^{k} cj (Σ_{i=1}^{n} αij ui) = 0
⇔ Σ_{j=1}^{k} Σ_{i=1}^{n} cj αij ui = 0
⇔ Σ_{i=1}^{n} Σ_{j=1}^{k} cj αij ui = 0
⇔ Σ_{i=1}^{n} (Σ_{j=1}^{k} cj αij) ui = 0.

Since {u1, . . . , un} is linearly independent, the last equation is equivalent to

Σ_{j=1}^{k} cj αij = 0, i = 1, 2, . . . , n,

which, by definition of xj and of addition in F^n, is equivalent to

Σ_{j=1}^{k} cj xj = 0.

This completes the proof.

(b) Now we show that {v1, . . . , vk} spans V if and only if {x1, . . . , xk} spans F^n. Since each vector in V can be represented uniquely as a linear combination of u1, . . . , un, there is a one-to-one correspondence between V and F^n:

w = c1 u1 + · · · + cn un ∈ V ←→ x = (c1, . . . , cn) ∈ F^n.

Mimicking the manipulations in the first part of the exercise, we see that

Σ_{j=1}^{k} cj vj = w ⇔ Σ_{j=1}^{k} cj xj = x.

Thus the first equation has a solution for every w ∈ V if and only if the second equation has a solution for every x ∈ F^n. The result follows.

16. Consider the polynomials p1(x) = −1 + 3x + 2x^2, p2(x) = 3 − 8x − 4x^2, and p3(x) = −1 + 4x + 5x^2 in P2. We wish to use the result of the previous exercise to determine if {p1, p2, p3} is linearly independent. The standard basis for P2 is {u1, u2, u3} = {1, x, x^2}. In terms of this basis, p1, p2, p3 correspond to

x1 = (−1, 3, 2), x2 = (3, −8, −4), x3 = (−1, 4, 5) ∈ R^3,

respectively. A direct calculation shows that c1 x1 + c2 x2 + c3 x3 = 0 has only the trivial solution. Therefore {x1, x2, x3} and {p1, p2, p3} are both linearly independent.
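The "direct calculation" in Exercise 16 can be delegated to a computer. A minimal sketch (Python/NumPy is an assumption here, not part of the text) verifies that the matrix whose columns are x1, x2, x3 is nonsingular, so the only solution of c1 x1 + c2 x2 + c3 x3 = 0 is the trivial one.

```python
import numpy as np

# Coordinate vectors of p1, p2, p3 with respect to {1, x, x^2}; columns are x1, x2, x3.
X = np.array([[-1,  3, -1],
              [ 3, -8,  4],
              [ 2, -4,  5]], dtype=float)

print(np.linalg.det(X))           # approximately -1 (nonzero), so only the trivial solution
print(np.linalg.matrix_rank(X))   # 3, i.e., {x1, x2, x3} is linearly independent
```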
2.8 Polynomial interpolation and the Lagrange basis
1. (a) The Lagrange polynomials for the interpolation nodes x0 = 1, x1 = 2, x2 = 3 are

L0(x) = (x − 2)(x − 3)/((1 − 2)(1 − 3)) = (1/2)(x − 2)(x − 3),
L1(x) = (x − 1)(x − 3)/((2 − 1)(2 − 3)) = −(x − 1)(x − 3),
L2(x) = (x − 1)(x − 2)/((3 − 1)(3 − 2)) = (1/2)(x − 1)(x − 2).
(b) The quadratic polynomial interpolating (1, 0), (2, 2), (3, 1) is

p(x) = 0L0(x) + 2L1(x) + L2(x)
     = −2(x − 1)(x − 3) + (1/2)(x − 1)(x − 2)
     = −(3/2)x^2 + (13/2)x − 5.

2. (a) The Lagrange polynomials for the interpolation nodes x0 = −2, x1 = −1, x2 = 0, x3 = 1, x4 = 2 are

L0(x) = (x + 1)x(x − 1)(x − 2)/((−2 + 1)(−2 − 0)(−2 − 1)(−2 − 2)) = (1/24)x(x + 1)(x − 1)(x − 2),
L1(x) = (x + 2)x(x − 1)(x − 2)/((−1 + 2)(−1 − 0)(−1 − 1)(−1 − 2)) = −(1/6)x(x + 2)(x − 1)(x − 2),
L2(x) = (x + 2)(x + 1)(x − 1)(x − 2)/((0 + 2)(0 + 1)(0 − 1)(0 − 2)) = (1/4)(x + 2)(x + 1)(x − 1)(x − 2),
L3(x) = (x + 2)(x + 1)x(x − 2)/((1 + 2)(1 + 1)(1 − 0)(1 − 2)) = −(1/6)x(x + 2)(x + 1)(x − 2),
L4(x) = (x + 2)(x + 1)x(x − 1)/((2 + 2)(2 + 1)(2 − 0)(2 − 1)) = (1/24)x(x + 2)(x + 1)(x − 1).
(b) Using the Lagrange basis, we find the interpolating polynomial passing through the points (−2, 10), (−1, −3), (0, 2), (1, 7), (2, 18) to be

p(x) = 10L0(x) − 3L1(x) + 2L2(x) + 7L3(x) + 18L4(x)
     = (5/12)x(x + 1)(x − 1)(x − 2) + (1/2)x(x + 2)(x − 1)(x − 2) + (1/2)(x + 2)(x + 1)(x − 1)(x − 2) − (7/6)x(x + 2)(x + 1)(x − 2) + (3/4)x(x + 2)(x + 1)(x − 1).

A tedious calculation shows that p(x) = x^4 − x^3 − x^2 + 6x + 2, which is the same result obtained in Example 48.

3. Consider the data (1, 5), (2, −4), (3, −4), (4, 2). We wish to find the cubic polynomial interpolating these points.

(a) Using the standard basis, we write p(x) = c0 + c1 x + c2 x^2 + c3 x^3. The equations p(1) = 5, p(2) = −4, p(3) = −4, p(4) = 2
are equivalent to the system

c0 + c1 + c2 + c3 = 5,
c0 + 2c1 + 4c2 + 8c3 = −4,
c0 + 3c1 + 9c2 + 27c3 = −4,
c0 + 4c1 + 16c2 + 64c3 = 2.

Gaussian elimination yields

c0 = 26, c1 = −28, c2 = 15/2, c3 = −1/2,

and thus

p(x) = 26 − 28x + (15/2)x^2 − (1/2)x^3.

(b) The Lagrange polynomials for these interpolation nodes are
L0(x) = (x − 2)(x − 3)(x − 4)/((1 − 2)(1 − 3)(1 − 4)) = −(1/6)(x − 2)(x − 3)(x − 4),
L1(x) = (x − 1)(x − 3)(x − 4)/((2 − 1)(2 − 3)(2 − 4)) = (1/2)(x − 1)(x − 3)(x − 4),
L2(x) = (x − 1)(x − 2)(x − 4)/((3 − 1)(3 − 2)(3 − 4)) = −(1/2)(x − 1)(x − 2)(x − 4),
L3(x) = (x − 1)(x − 2)(x − 3)/((4 − 1)(4 − 2)(4 − 3)) = (1/6)(x − 1)(x − 2)(x − 3),
and the interpolating polynomial is

p(x) = 5L0(x) − 4L1(x) − 4L2(x) + 2L3(x)
     = −(5/6)(x − 2)(x − 3)(x − 4) − 2(x − 1)(x − 3)(x − 4) + 2(x − 1)(x − 2)(x − 4) + (1/3)(x − 1)(x − 2)(x − 3).

A tedious calculation shows that this is the same polynomial computed in the first part.

4. Let {L0, L1, . . . , Ln} be the Lagrange basis constructed on the interpolation nodes x0, x1, . . . , xn ∈ F. We wish to prove that, for all p ∈ Pn(F),

p(x) = p(x0)L0(x) + p(x1)L1(x) + · · · + p(xn)Ln(x).

Let q(x) be the polynomial on the right. By the definition of the Lagrange polynomials, we see that q(xi) = p(xi) for i = 0, 1, . . . , n. If we define the polynomial r(x) = p(x) − q(x), then we see that r has n + 1 roots: r(xi) = p(xi) − q(xi) = p(xi) − p(xi) = 0, i = 0, 1, . . . , n. However, r is a polynomial of degree at most n, and therefore this is impossible unless r is the zero polynomial (compare the discussion on page 45 in the text). This shows that q = p, as desired.

5. We wish to write p(x) = 2 + x − x^2 as a linear combination of the Lagrange polynomials constructed on the nodes x0 = −1, x1 = 1, x2 = 3. The graph of p passes through the points (−1, p(−1)), (1, p(1)), (3, p(3)), that is, (−1, 0), (1, 2), (3, −4). The Lagrange polynomials are

L0(x) = (x − 1)(x − 3)/((−1 − 1)(−1 − 3)) = (1/8)(x − 1)(x − 3),
L1(x) = (x + 1)(x − 3)/((1 + 1)(1 − 3)) = −(1/4)(x + 1)(x − 3),
L2(x) = (x + 1)(x − 1)/((3 + 1)(3 − 1)) = (1/8)(x + 1)(x − 1),
and therefore,

p(x) = 0L0(x) + 2L1(x) − 4L2(x) = −(1/2)(x + 1)(x − 3) − (1/2)(x + 1)(x − 1).

6. Let F be a field and suppose (x0, y0), (x1, y1), . . . , (xn, yn) are points in F^2. We wish to show that the polynomial interpolation problem has at most one solution, assuming the interpolation nodes x0, x1, . . . , xn are distinct. This follows from reasoning we have seen before. If p, q ∈ Pn(F) both interpolate the given data and p ≠ q, then r = p − q is a nonzero polynomial of degree at most n having n + 1 roots x0, x1, . . . , xn. But the only polynomial of degree n (or less) having n + 1 roots is the zero polynomial, a contradiction; thus p = q and there is at most one interpolating polynomial.

7. Let F be a field and suppose (x0, y0), (x1, y1), . . . , (xn, yn) are points in F^2. We wish to show that the polynomial interpolation problem has at most one solution, assuming the interpolation nodes x0, x1, . . . , xn are distinct. Suppose p, q ∈ Pn(F) both interpolate the data, and let {L0, L1, . . . , Ln} be the basis for Pn(F) of Lagrange polynomials for the given interpolation nodes. Both p and q can be written in terms of this basis:

p = Σ_{i=0}^{n} αi Li, q = Σ_{i=0}^{n} βi Li.

Now, we know that the Lagrange polynomials satisfy Li(xj) = 1 if j = i and Li(xj) = 0 if j ≠ i. It follows that

p(xj) = Σ_{i=0}^{n} αi Li(xj) = αj, q(xj) = Σ_{i=0}^{n} βi Li(xj) = βj.

But, since p and q interpolate the given data, p(xj) = q(xj) = yj. This shows that αj = βj, j = 0, 1, . . . , n, and hence that p = q.

8. Suppose x0, x1, . . . , xn are distinct real numbers. We wish to prove that, for any real numbers y0, y1, . . . , yn, the system

c0 + c1 x0 + c2 x0^2 + · · · + cn x0^n = y0,
c0 + c1 x1 + c2 x1^2 + · · · + cn x1^n = y1,
...
c0 + c1 xn + c2 xn^2 + · · · + cn xn^n = yn

has a unique solution c0, c1, . . . , cn. This follows immediately from our work on interpolating polynomials: c0, c1, . . . , cn solves the given system if and only if p(x) = c0 + c1 x + · · · + cn x^n interpolates the data (x0, y0), (x1, y1), . . . , (xn, yn). Since there is a unique interpolating polynomial p ∈ Pn, and since this polynomial can be uniquely represented in terms of the standard basis {1, x, . . . , x^n}, it follows that the given system of equations has a unique solution.

9. We wish to represent every function f : Z2 → Z2 by a polynomial in P1(Z2). There are exactly four different functions f : Z2 → Z2, as defined in the following table:

x    f1(x)   f2(x)   f3(x)   f4(x)
0    0       1       0       1
1    0       0       1       1
We have f1 (x) = 0 (the zero polynomial), f2 (x) = 1 + x, f3 (x) = x, and f4 (x) = 1.
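Because P1(Z2) contains only four polynomials, the representations in Exercise 9 can also be found by brute force. A small sketch (Python assumed; no software is specified in the text) matches each value table against a + bx evaluated modulo 2.

```python
# Each function Z_2 -> Z_2 is given by its value table (f(0), f(1)).
tables = {"f1": (0, 0), "f2": (1, 0), "f3": (0, 1), "f4": (1, 1)}

for name, (y0, y1) in tables.items():
    for a in (0, 1):                 # candidate polynomial a + b*x in P1(Z_2)
        for b in (0, 1):
            if (a % 2, (a + b) % 2) == (y0, y1):
                print(name, "is represented by", f"{a} + {b}x")
# Output: f1 -> 0 + 0x, f2 -> 1 + 1x, f3 -> 0 + 1x, f4 -> 1 + 0x, matching the solution above.
```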
10. The following table defines three functions mapping Z3 → Z3. We wish to find a polynomial in P2(Z3) representing each one.

x    f1(x)   f2(x)   f3(x)
0    1       0       2
1    2       0       2
2    0       1       1

We can compute f1, f2, and f3 as interpolating polynomials. The Lagrange polynomials for the given interpolation nodes are

L0(x) = (x − 1)(x − 2)/((0 − 1)(0 − 2)) = 2x^2 + 1,
L1(x) = x(x − 2)/((1 − 0)(1 − 2)) = 2x^2 + 2x,
L2(x) = x(x − 1)/((2 − 0)(2 − 1)) = 2x^2 + x.
(Here we have used the arithmetic of Z3 to simplify the polynomials: −1 = 2, −2 = 1, 2−1 = 2, etc.). We then have f1 (x) = L0 (x) + 2L1 (x) = 2x2 + 1 + x2 + x = 1 + x, f2 (x) = L2 (x) = 2x2 + x, f3 (x) = 2L0 (x) + 2L1 (x) + L2 (x) = x2 + 2 + x2 + x + 2x2 + x = x2 + 2x + 2. 11. Consider a secret sharing scheme in which five individuals will receive information about the secret, and any two of them, working together, will have access to the secret. Assume that the secret is a twodigit integer, and that p is chosen to be 101. The degree of the polynomial will be one, since then the polynomial will be uniquely determined by two data points. Let us suppose that the secret is N = 42 and we choose the polynomial to be p(x) = N + c1 x, where c1 = 71 (recall that c1 is chosen at random). We also choose the five interpolation nodes at random to obtain x1 = 9, x2 = 14, x3 = 39, x4 = 66, and x5 = 81. We then compute y1 = p(x1 ) = 42 + 71 · 9 = 75, y2 = p(x2 ) = 42 + 71 · 14 = 26, y3 = p(x3 ) = 42 + 71 · 39 = 84, y4 = p(x4 ) = 42 + 71 · 66 = 82, y5 = p(x5 ) = 42 + 71 · 81 = 36 (notice that all arithmetic is done modulo 101). The data points, to be distributed to the five individuals, are (9, 75), (14, 26), (39, 84), (66, 82), (81, 36). 12. An integer N satisfying 1 ≤ N ≤ 256 represents a secret to be shared among five individuals. Any three of the individuals are allowed access to the information. The secret is encoded in a polynomial p, according to the secret sharing scheme described in Section 2.8.1, lying in P2 (Z257 ). Suppose three of the individuals get together, and their data points are (15, 13), (114, 94), and (199, 146). We wish to determine the secret. We begin by finding the Lagrange polynomials for the interpolation nodes 15, 114,
and 199:

L0(x) = (x − 114)(x − 199)/((15 − 114)(15 − 199)) = 58x^2 + 93x + 205,
L1(x) = (x − 15)(x − 199)/((114 − 15)(114 − 199)) = 74x^2 + 98x + 127,
L2(x) = (x − 15)(x − 114)/((199 − 15)(199 − 114)) = 125x^2 + 66x + 183.
The interpolating polynomial p is p(x) = 13L0(x) + 94L1(x) + 146L2(x). We need only compute p(0):

p(0) = 13L0(0) + 94L1(0) + 146L2(0) = 13 · 205 + 94 · 127 + 146 · 183 = 201.

Thus the secret is N = 201. (The polynomial p(x) simplifies to p(x) = 3x^2 + 11x + 201.)

13. We wish to solve the following interpolation problem: Given v1, v2, d1, d2 ∈ R, find p ∈ P3 such that p(0) = v1, p(1) = v2, p'(0) = d1, p'(1) = d2.

(a) If we represent p as p(x) = c0 + c1 x + c2 x^2 + c3 x^3, then the given conditions p(0) = v1, p'(0) = d1, p(1) = v2, p'(1) = d2 are equivalent to the system

c0 = v1, c1 = d1, c0 + c1 + c2 + c3 = v2, c1 + 2c2 + 3c3 = d2.

It is straightforward to solve this system:

c0 = v1, c1 = d1, c2 = 3v2 − 3v1 − 2d1 − d2, c3 = 2v1 − 2v2 + d1 + d2.

(b) We now find the special basis {q1, q2, q3, q4} of P3 satisfying the following conditions:

q1(0) = 1, q1'(0) = 0, q1(1) = 0, q1'(1) = 0,
q2(0) = 0, q2'(0) = 1, q2(1) = 0, q2'(1) = 0,
q3(0) = 0, q3'(0) = 0, q3(1) = 1, q3'(1) = 0,
q4(0) = 0, q4'(0) = 0, q4(1) = 0, q4'(1) = 1.

We can use the result of the first part of this exercise to write down the solutions immediately:

q1(x) = 1 − 3x^2 + 2x^3, q2(x) = x − 2x^2 + x^3, q3(x) = 3x^2 − 2x^3, q4(x) = −x^2 + x^3.

In terms of the basis {q1, q2, q3, q4}, the solution to the interpolation problem is p(x) = v1 q1(x) + d1 q2(x) + v2 q3(x) + d2 q4(x).
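The endpoint conditions defining q1, . . . , q4, and the resulting solution of the interpolation problem, are easy to sanity-check numerically. A minimal sketch (Python/NumPy assumed; the test data v1, d1, v2, d2 below are arbitrary values, not taken from the text):

```python
from numpy.polynomial import Polynomial as P

# Basis from Exercise 13(b); coefficients listed in increasing degree.
q1 = P([1, 0, -3,  2])   # 1 - 3x^2 + 2x^3
q2 = P([0, 1, -2,  1])   # x - 2x^2 + x^3
q3 = P([0, 0,  3, -2])   # 3x^2 - 2x^3
q4 = P([0, 0, -1,  1])   # -x^2 + x^3

for q in (q1, q2, q3, q4):
    dq = q.deriv()
    print(q(0.0), dq(0.0), q(1.0), dq(1.0))   # the four rows form the 4x4 identity pattern

# Part (a): p = v1*q1 + d1*q2 + v2*q3 + d2*q4 solves the interpolation problem.
v1, d1, v2, d2 = 2.0, -1.0, 3.0, 0.5          # hypothetical test data
p = v1*q1 + d1*q2 + v2*q3 + d2*q4
print(p(0.0), p.deriv()(0.0), p(1.0), p.deriv()(1.0))   # 2.0 -1.0 3.0 0.5
```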
14. We are given n + 1 interpolation nodes, x0, x1, . . . , xn. Given v0, v1, . . . , vn ∈ R, d0, d1, . . . , dn ∈ R, we wish to find p ∈ P2n+1 such that p(xi) = vi, p'(xi) = di, i = 0, 1, . . . , n. We define (in terms of the Lagrange polynomials L0, L1, . . . , Ln for the nodes x0, x1, . . . , xn)

Ai(x) = (1 − 2(x − xi)Li'(xi)) Li(x)^2, Bi(x) = (x − xi)Li(x)^2

for i = 0, 1, . . . , n.

(a) Since each Lagrange polynomial Li has degree exactly n, we see that Bi has degree 2n + 1 for each i = 0, 1, . . . , n, while Ai has degree 2n + 1 if Li'(xi) ≠ 0 and degree 2n if Li'(xi) = 0. Thus, in every case, Ai, Bi ∈ P2n+1 for all i = 0, 1, . . . , n.

(b) If j ≠ i, then Li(xj) = 0 and Ai(xj) = (1 − 2(xj − xi)Li'(xi))Li(xj)^2 = (1 − 2(xj − xi)Li'(xi)) · 0 = 0. If j = i, then Li(xj) = 1 and Ai(xj) = (1 − 2(xi − xi)Li'(xi))Li(xi)^2 = 1 · 1 = 1. Therefore,

Ai(xj) = 1 if i = j, Ai(xj) = 0 if i ≠ j.

Also,

Ai'(x) = 2(1 − 2(x − xi)Li'(xi))Li(x)Li'(x) − 2Li'(xi)Li(x)^2.

If j ≠ i, then Li(xj) = 0 and therefore Ai'(xj) = 0 (since both terms contain a factor of Li(xj)). If j = i, then Li(xj) = 1 and Ai'(xj) simplifies to Ai'(xj) = 2Li'(xj) − 2Li'(xj) = 0. Thus Ai'(xj) = 0, j = 0, 1, . . . , n.

(c) Since either xj − xi = 0 (if j = i) or Li(xj) = 0 (if j ≠ i), it follows that Bi(xj) = 0 for all j = 0, 1, . . . , n. Now,

Bi'(x) = Li(x)^2 + 2(x − xi)Li(x)Li'(x).

If j ≠ i, then

Bi'(xj) = Li(xj)^2 + 2(xj − xi)Li(xj)Li'(xj) = 0 − 0 = 0

since Li(xj) = 0. Also, Bi'(xi) = Li(xi)^2 + 2(xi − xi)Li(xi)Li'(xi) = 1 − 0 = 1. Therefore,

Bi'(xj) = 1 if i = j, Bi'(xj) = 0 if i ≠ j.
(d) We now wish to prove that {A0, . . . , An, B0, . . . , Bn} is a basis for P2n+1. Since dim(P2n+1) = 2n + 2 and the proposed basis contains 2n + 2 elements, it suffices to prove that the set spans P2n+1. Given any p ∈ P2n+1, define vj = p(xj), dj = p'(xj), j = 0, 1, . . . , n. Then

q(x) = Σ_{i=0}^{n} vi Ai(x) + Σ_{i=0}^{n} di Bi(x)
agrees with p at each xj, and q' agrees with p' at each xj (see below). Define r = p − q. Then r is a polynomial of degree at most 2n + 1, and r(xj) = r'(xj) = 0 for j = 0, 1, . . . , n. Using elementary properties of polynomials, this implies that r(x) can be factored as r(x) = f(x)(x − x0)^2(x − x1)^2 · · · (x − xn)^2, where f(x) is a polynomial. But then deg(r(x)) ≥ 2n + 2 unless f = 0. Since we know that deg(r(x)) ≤ 2n + 1, it follows that f = 0, in which case r = 0 and hence q = p. Thus p ∈ sp{A0, . . . , An, B0, . . . , Bn}. This proves that the given set is a basis for P2n+1. Using the properties of A0, A1, . . . , An, B0, B1, . . . , Bn derived above, the solution to the Hermite interpolation problem is

p(x) = Σ_{i=0}^{n} vi Ai(x) + Σ_{i=0}^{n} di Bi(x).

To verify this, notice that

p(xj) = Σ_{i=0}^{n} vi Ai(xj) + Σ_{i=0}^{n} di Bi(xj).

Every term in the second sum vanishes, as do all terms in the first sum except vj Aj(xj) = vj · 1 = vj. Thus p(xj) = vj. Also,

p'(xj) = Σ_{i=0}^{n} vi Ai'(xj) + Σ_{i=0}^{n} di Bi'(xj).

Now every term in the first sum vanishes, as do all terms in the second sum except dj Bj'(xj) = dj · 1 = dj. Therefore p'(xj) = dj, and p is the desired interpolating polynomial.
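The construction of Exercise 14 can be reproduced directly from the Lagrange polynomials. The sketch below (Python/NumPy assumed; the nodes and data are arbitrary test values, not from the text) builds Ai and Bi as defined above and checks the interpolation conditions p(xj) = vj, p'(xj) = dj.

```python
import numpy as np
from numpy.polynomial import Polynomial as P

nodes = np.array([0.0, 0.5, 1.5, 2.0])     # hypothetical nodes (n = 3)
v = np.array([1.0, -2.0, 0.5, 3.0])        # hypothetical values p(x_i)
d = np.array([0.0,  1.0, -1.0, 2.0])       # hypothetical derivatives p'(x_i)

def lagrange(i):
    """Lagrange polynomial L_i for the given nodes."""
    others = np.delete(nodes, i)
    return P.fromroots(others) / float(np.prod(nodes[i] - others))

p = P([0.0])
for i, xi in enumerate(nodes):
    Li = lagrange(i)
    Ai = (1 - 2*P([-xi, 1.0])*Li.deriv()(xi)) * Li**2   # A_i = (1 - 2(x - x_i)L_i'(x_i)) L_i^2
    Bi = P([-xi, 1.0]) * Li**2                          # B_i = (x - x_i) L_i^2
    p = p + float(v[i])*Ai + float(d[i])*Bi

print(np.allclose(p(nodes), v))            # True: p(x_j) = v_j
print(np.allclose(p.deriv()(nodes), d))    # True: p'(x_j) = d_j
```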
2.9 Continuous piecewise polynomial functions
1. The following table shows the maximum errors obtained in approximating f(x) = e^x on the interval [0, 1] by polynomial interpolation and by piecewise linear interpolation, each on a uniform grid of n + 1 nodes (n subintervals; the interpolating polynomial has degree n).

n     Poly. interp. err.   PW linear interp. err.
1     2.1187 · 10^−1       2.1187 · 10^−1
2     1.4420 · 10^−2       6.6617 · 10^−2
3     9.2390 · 10^−4       3.2055 · 10^−2
4     5.2657 · 10^−5       1.8774 · 10^−2
5     2.6548 · 10^−6       1.2312 · 10^−2
6     1.1921 · 10^−7       8.6902 · 10^−3
7     4.8075 · 10^−9       6.4596 · 10^−3
8     1.7565 · 10^−10      4.9892 · 10^−3
9     5.8575 · 10^−12      3.9692 · 10^−3
10    1.8119 · 10^−13      3.2328 · 10^−3

For this example, polynomial interpolation is very effective.

2. The following table shows the maximum errors obtained in approximating f(x) = 1/(1 + x^2) on the interval [−5, 5] by polynomial interpolation and by piecewise linear interpolation, each on a uniform grid of n + 1 nodes.
n     Poly. interp. err.   PW linear interp. err.
1     9.6154 · 10^−1       9.6154 · 10^−1
2     6.4615 · 10^−1       4.1811 · 10^−1
3     7.0701 · 10^−1       7.3529 · 10^−1
4     4.3836 · 10^−1       1.8021 · 10^−1
5     4.3269 · 10^−1       5.0000 · 10^−1
6     6.1695 · 10^−1       6.2304 · 10^−2
7     2.4736 · 10^−1       3.3784 · 10^−1
8     1.0452               6.3898 · 10^−2
9     3.0028 · 10^−1       2.3585 · 10^−1
10    1.9156               6.7431 · 10^−2
Here we see that polynomial interpolation is not effective (cf. Figure 2.6 in the text). Also, we see that piecewise linear interpolation is much more effective with n even, since then there is an interpolation node at x = 0, which is the peak of the graph of f.

3. We now repeat the previous two exercises, using piecewise quadratic interpolation instead of piecewise linear interpolation. We use a uniform grid of 2n + 1 nodes.

(a) Here the function is f(x) = e^x on [0, 1].

n     Poly. interp. err.   PW quad. interp. err.
1     1.4416 · 10^−2       1.4416 · 10^−2
2     5.2637 · 10^−5       2.2079 · 10^−3
3     1.1921 · 10^−7       7.0115 · 10^−4
4     1.7565 · 10^−10      3.0632 · 10^−4
5     1.8119 · 10^−13      1.6017 · 10^−4
Once again, polynomial interpolation is quite effective for this function. We only go to n = 5, since for larger n we reach the limits of the finite precision arithmetic (in standard double precision).

(b) Here f(x) = 1/(1 + x^2) on [−5, 5].

n     Poly. interp. err.   PW quad. interp. err.
1     6.4615 · 10^−1       6.4615 · 10^−1
2     4.3818 · 10^−1       8.5472 · 10^−2
3     6.1667 · 10^−1       2.3570 · 10^−1
4     1.0452               9.7631 · 10^−2
5     1.9156               8.5779 · 10^−2
6     3.6629               7.4693 · 10^−2
7     7.1920               3.4671 · 10^−2
8     1.4392 · 10^1        4.7781 · 10^−2
For n > 8, it is not possible to solve the linear system defining the polynomial interpolant accurately in standard double precision arithmetic. Once again, for this example, we see the advantage of using piecewise polynomials to approximate f.

4. Let x0, x1, . . . , xn define a uniform mesh on [a, b] (that is, xi = a + ih, i = 0, 1, . . . , n, where h = (b − a)/n). We wish to prove that

max_{x ∈ [a, b]} |(x − x0)(x − x1) · · · (x − xn)| / (n + 1)! ≤ h^{n+1} / (2(n + 1)).

Let x ∈ [a, b] be given. If x equals any of the nodes x0, x1, . . . , xn, then the left-hand side is zero, and thus the inequality holds. So let us assume that x ∈ (x_{i−1}, xi) for some i = 1, 2, . . . , n. The distance from x to the nearer of x_{i−1}, xi is at most h/2 and the distance to the further of the two is at most h. If we then list the remaining n − 1 nodes in order of distance from x (nearest to furthest), we see that the
distances are at most 2h, 3h, . . . , nh. Therefore,

|(x − x0)(x − x1) · · · (x − xn)| = |x − x0||x − x1| · · · |x − xn| ≤ (h/2)(h)(2h) · · · (nh) = (1/2) n! h^{n+1}.

Therefore,

|(x − x0)(x − x1) · · · (x − xn)| / (n + 1)! ≤ (1/2) n! h^{n+1} / (n + 1)! = h^{n+1} / (2(n + 1)).
Since this holds for each x ∈ [a, b], the proof is complete.

5. We wish to derive a bound on the error in piecewise quadratic interpolation in the case of a uniform mesh. We use the notation of the text, and consider an arbitrary element [x_{2i−2}, x_{2i}] (with x_{2i} − x_{2i−2} = h). Let f ∈ C^3[a, b] be approximated on this element by the quadratic q that interpolates f at x_{2i−2}, x_{2i−1}, x_{2i}. Then the remainder term is given by

f(x) − q(x) = (f^{(3)}(cx)/3!) (x − x_{2i−2})(x − x_{2i−1})(x − x_{2i}), x ∈ [x_{2i−2}, x_{2i}],

where cx ∈ [x_{2i−2}, x_{2i}]. We assume |f^{(3)}(x)| ≤ M for all x ∈ [a, b]. We must maximize |(x − x_{2i−2})(x − x_{2i−1})(x − x_{2i})| on [x_{2i−2}, x_{2i}]. This is equivalent, by a simple change of variables, to maximizing |p(x)| on [0, h], where p(x) = x(x − h/2)(x − h). This is a simple problem in single-variable calculus, and we can easily verify that

|p(x)| ≤ h^3/(12√3) for all x ∈ [0, h].

Therefore,

|f(x) − q(x)| ≤ (|f^{(3)}(cx)|/3!) · h^3/(12√3) = (|f^{(3)}(cx)|/(72√3)) h^3 for all x ∈ [x_{2i−2}, x_{2i}].

Since this is valid over every element in the mesh, we obtain

|f(x) − q(x)| ≤ (M/(72√3)) h^3 for all x ∈ [a, b].
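The error tables in Exercises 1 and 2 can be reproduced approximately with a short script. The sketch below (Python/NumPy assumed; not part of the text) takes n to mean the polynomial degree, i.e., n + 1 equally spaced nodes, and estimates each maximum error on a fine sampling grid, so the last digits may differ slightly from the tables; np.polyfit may also warn about conditioning for the larger degrees on [−5, 5].

```python
import numpy as np

def max_errors(f, a, b, n, samples=10001):
    """Max error of degree-n interpolation and of piecewise linear
    interpolation, both on the n+1 equally spaced nodes x_0, ..., x_n."""
    x = np.linspace(a, b, n + 1)
    s = np.linspace(a, b, samples)                 # fine grid used to estimate the maximum
    coeffs = np.polyfit(x, f(x), n)                # interpolating polynomial of degree n
    e_poly = np.max(np.abs(f(s) - np.polyval(coeffs, s)))
    e_pwl  = np.max(np.abs(f(s) - np.interp(s, x, f(x))))   # piecewise linear interpolant
    return e_poly, e_pwl

for n in range(1, 11):
    print(n, *max_errors(np.exp, 0.0, 1.0, n))                      # Exercise 1 (f(x) = e^x)
for n in range(1, 11):
    print(n, *max_errors(lambda t: 1/(1 + t**2), -5.0, 5.0, n))     # Exercise 2 (Runge function)
```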
Chapter 3

Linear operators

3.1 Linear operators
1. Let m, b be real numbers, and let f : R → R be defined by f(x) = mx + b. If b is nonzero, then this function is not linear. For example, f(2 · 1) = f(2) = 2m + b, while 2f(1) = 2(m + b) = 2m + 2b. Since b ≠ 0, 2m + b ≠ 2m + 2b, which shows that f is not linear. On the other hand, if b = 0, then f is linear:

f(x + y) = m(x + y) = mx + my = f(x) + f(y) for all x, y ∈ R,
f(ax) = m(ax) = a(mx) = af(x) for all a, x ∈ R.

Thus we see that f(x) = mx + b is linear if and only if b = 0.

2. We wish to prove that f : R → R defined by f(x) = x^2 is not linear. This is simple; for example, f(1 + 1) = f(2) = 4, while f(1) + f(1) = 1 + 1 = 2. (In general, f(x + y) = x^2 + 2xy + y^2 = f(x) + f(y) + 2xy, which does not equal f(x) + f(y) unless at least one of x, y is zero.)

3. Let L : R^3 → R^3 be defined by the following conditions: (a) L is linear; (b) L(e1) = (1, 2, 1); (c) L(e2) = (2, 0, −1); (d) L(e3) = (0, −2, −3). Here {e1, e2, e3} is the standard basis for R^3. We wish to show that there is a matrix A ∈ R^{3×3} such that L(x) = Ax for all x ∈ R^3. For any x ∈ R^3, we have x = x1 e1 + x2 e2 + x3 e3. Since L is linear, it follows that L(x) = x1 L(e1) + x2 L(e2) + x3 L(e3). On the other hand, by definition of matrix-vector multiplication, if A ∈ R^{3×3}, then Ax = x1 A1 + x2 A2 + x3 A3, where A1, A2, A3 are the columns of A. Comparing these two formulas, we see that if we define A to be the matrix whose columns are L(e1), L(e2), L(e3), then L(x) = Ax for all x ∈ R^3. Thus the desired matrix is

A = [ 1   2   0
      2   0  −2
      1  −1  −3 ].
4. Consider the operator M : Pn → Pn+1 defined by M(p) = q, where q(x) = xp(x) (that is, M(p)(x) = xp(x)). We wish to show that M is linear. If p, q ∈ Pn, then M(p + q)(x) = x((p + q)(x)) = x(p(x) + q(x)) = xp(x) + xq(x) = M(p)(x) + M(q)(x). Thus M(p + q) = M(p) + M(q) for all p, q ∈ Pn. If p ∈ Pn and α ∈ R, then M(αp)(x) = x((αp)(x)) = x(αp(x)) = xαp(x) = α(xp(x)) = αM(p)(x), which shows that M(αp) = αM(p). This proves that M is linear. (Notice how we used the definitions of addition and scalar multiplication for polynomials: (p + q)(x) = p(x) + q(x), (αp)(x) = αp(x).)

5. We wish to determine which of the following real-valued functions defined on R^n is linear.

(a) f : R^n → R, f(x) = Σ_{i=1}^{n} xi.
(b) g : R^n → R, g(x) = Σ_{i=1}^{n} |xi|.
(c) h : R^n → R, h(x) = Π_{i=1}^{n} xi.
The function f is linear:

f(x + y) = Σ_{i=1}^{n} (x + y)i = Σ_{i=1}^{n} (xi + yi) = Σ_{i=1}^{n} xi + Σ_{i=1}^{n} yi = f(x) + f(y),
f(αx) = Σ_{i=1}^{n} (αx)i = Σ_{i=1}^{n} αxi = α Σ_{i=1}^{n} xi = αf(x).

However, g and h are both nonlinear. For instance, if x ≠ 0, then g(−x) ≠ −g(x) (in fact, g(−x) = g(x) for all x ∈ R^n). Also, if no component of x is zero, then h(2x) = 2^n h(x) ≠ 2h(x) (of course, we are assuming n > 1).

6. Let A : C[a, b] → C^1[a, b] be defined by the condition A(f) = F, where F' = f (that is, A(f) is an antiderivative of f). We wish to determine whether A is linear or nonlinear. Actually, this is something of a trick question, since A is not well-defined. Each f ∈ C[a, b] has infinitely many antiderivatives (any two differing by a constant). Therefore, to define A we have to specify a specific antiderivative. A natural choice would be

A(f)(x) = ∫_a^x f(s) ds.
With this definition, A is linear:

A(f + g)(x) = ∫_a^x (f + g)(s) ds = ∫_a^x (f(s) + g(s)) ds = ∫_a^x f(s) ds + ∫_a^x g(s) ds = A(f)(x) + A(g)(x),

A(αf)(x) = ∫_a^x (αf)(s) ds = ∫_a^x αf(s) ds = α ∫_a^x f(s) ds = α(A(f)(x)).

Thus A(f + g) = A(f) + A(g) for all f, g ∈ C[a, b] and A(αf) = αA(f) for all f ∈ C[a, b] and α ∈ R.
7. Let Q : Pn → Pn+1 be defined by

Q(c0 + c1 x + · · · + cn x^n) = c0 x + (c1/2)x^2 + · · · + (cn/(n + 1))x^{n+1}.

We will show that Q is linear. Actually, Q is the operator A of the previous exercise, with a = 0, restricted to the subspace Pn of C[0, b]. A direct proof is straightforward:

Q((a0 + a1 x + · · · + an x^n) + (c0 + c1 x + · · · + cn x^n))
  = Q((a0 + c0) + (a1 + c1)x + · · · + (an + cn)x^n)
  = (a0 + c0)x + ((a1 + c1)/2)x^2 + · · · + ((an + cn)/(n + 1))x^{n+1}
  = (a0 x + (a1/2)x^2 + · · · + (an/(n + 1))x^{n+1}) + (c0 x + (c1/2)x^2 + · · · + (cn/(n + 1))x^{n+1})
  = Q(a0 + a1 x + · · · + an x^n) + Q(c0 + c1 x + · · · + cn x^n),

and

Q(α(a0 + a1 x + · · · + an x^n)) = Q((αa0) + (αa1)x + · · · + (αan)x^n)
  = (αa0)x + ((αa1)/2)x^2 + · · · + ((αan)/(n + 1))x^{n+1}
  = α(a0 x + (a1/2)x^2 + · · · + (an/(n + 1))x^{n+1})
  = αQ(a0 + a1 x + · · · + an x^n).

8. Let L : Pn → P2n−1 be defined by L(p) = pp'. Then L is nonlinear; for example, if p is nonconstant, then L(2p) = (2p)(2p') = 4pp' = 4L(p) ≠ 2L(p). One can also verify that L(p + q) ≠ L(p) + L(q) for most polynomials p and q.

9. (a) If A ∈ C^{2×3}, x ∈ C^3 are defined by A =
1+i 2−i
1 − i 2i 1 + 2i 3
then Ax = (b) If A ∈ Z3×3 , x ∈ Z32 are defined by 2
⎡
1 A=⎣ 1 0 then
⎤ 3 , x = ⎣ 2+i ⎦, 1 − 3i
12 + 4i 9 − 7i
⎡
.
⎤ ⎤ ⎡ 1 1 0 0 1 ⎦, x = ⎣ 1 ⎦, 1 1 1 ⎤ 0 Ax = ⎣ 0 ⎦ . 0 ⎡
10. (a) We wish to solve Ax = b, where A ∈ R3×3 , b ∈ R3 are defined by ⎤ ⎤ ⎡ ⎡ 4 1 −2 −4 A = ⎣ 5 −11 −15 ⎦ , b = ⎣ 23 ⎦ . −14 −2 6 −1
CHAPTER 3. LINEAR OPERATORS
50
By the definition of matrix-vector multiplication, Ax = b is equivalent to the system x1 − 2x2 − 4x3 = 4, 5x1 − 11x2 − 15x3 = 23, −2x1 + 6x2 − x3 = −14,
which can be solved by Gaussian elimination to yield x1 = −2, x2 = −3, x3 = 0, that is, x = (−2, −3, 0). (b) Now define A ∈ Z3×3 , b ∈ Z33 by 3 ⎡
2 A=⎣ 1 1
⎤ ⎡ ⎤ 0 1 2 0 1 ⎦, b = ⎣ 1 ⎦. 2 2 0
The system Ax = b is equivalent to 2x1 + x2 + 2x3 = 0, x1 + x3 = 1, x1 + 2x2 = 2, We solve this by Gaussian elimination (using arithmetic modulo 3) to find x1 = 0, x2 = 1, x3 = 1, or x = (0, 1, 1). 11. Let k : [a, b] × [c, d] → R be continuous (that is k(s, t) is a continuous function of s ∈ [a, b] and t ∈ [c, d]), and define K : C[c, d] → C[a, b] by d K(x)(s) =
k(s, t)x(t) dt, s ∈ [a, b].
c
The linearity of the operator K follows from the linearity of integration: d k(s, t)(x + y)(t) dt
K(x + y)(s) = c
d k(s, t)(x(t) + y(t)) dt
= c
d (k(s, t)x(t) + k(s, t)y(t)) dt
= c
d
d k(s, t)x(t) dt +
= c
= K(x)(s) + K(y)(s)
k(s, t)y(t) dt c
3.1. LINEAR OPERATORS
51
and d k(s, t)(αx)(t) dt
K(αx)(s) = c
d k(s, t)(αx(t)) dt
= c
d αk(s, t)x(t) dt
= c
d k(s, t)x(t) dt
=α c
= αK(x)(s). 12. Let F be a field and suppose A ∈ F m×n , B ∈ F n×p . By definition of matrix-matrix multiplication, we have (AB)j = ABj (that is, the jth column of AB is A times the jth column of B. It follows that n n (AB)ij = (ABj )i =
Bkj Ak k=1
= i
Aik Bkj . k=1
(In the last step, we used the fact that vector addition and scalar multiplication are defined componentwise in F m .) 13. We now wish to give a formula for the ith row of AB, assuming A ∈ F m×n , B ∈ F n×p . As pointed out in the text, we have a standard notation for the columns of a matrix, but no standard notation for the rows. Let us suppose that the rows of B are r1 , r2 , . . . , rn . Then n n (AB)ij =
Aik (rk )j = k=1
Aik rk k=1
j
(once again using the componentwise definition of the operations). This shows that the ith row of AB is n
Aik rk , k=1
that is, the linear combination of the rows of B, with the weights taken from the ith row of A. 14. Suppose X and U are vector spaces over a field F and L : X → U is linear. Suppose L is constant, say L(x) = v for all x ∈ X, where v ∈ U is a fixed vector. Since L(0) = 0 by Theorem 57, we see that v must be the zero vector. Thus the only constant linear operator is the zero operator. 15. Let X and U be vector spaces over a field F , and let L : X → U be linear. We wish to prove that L(α1 x1 + · · · + αk xk ) = α1 L(x1 ) + · · · + αk L(xk ) for all x1 , . . . , xk ∈ X and all α1 , . . . , αk ∈ F . We argue by induction on k. For k = 1, this follows directly from the definition of linearity: L(α1 x1 ) = α1 L(x1 ). Now suppose that the result holds for any linear combination of k − 1 vectors, and let x1 , . . . , xk ∈ X and α1 , . . . , αk ∈ F . Then we can write α1 x1 + · · · + αk xk = (α1 x1 + · · · + αk−1 xk−1 ) + αk xk .
Therefore, since L is linear,
L(α1 x1 + · · · + αk xk ) = L((α1 x1 + · · · + αk−1 xk−1 ) + αk xk ) = L(α1 x1 + · · · + αk−1 xk−1 ) + L(αk xk ) = L(α1 x1 + · · · + αk−1 xk−1 ) + αk L(xk ). By the induction hypothesis, we have L(α1 x1 + · · · + αk−1 xk−1 ) = α1 L(x1 ) + · · · + αk−1 L(xk−1 ), and thus L(α1 x1 + · · · + αk xk ) = α1 L(x1 ) + · · · + αk−1 L(xk−1 ) + αk L(xk ). This completes the proof.
3.2
More properties of linear operators
1. Let A be an m × n matrix with real entries, and suppose n > m. We wish to prove that Ax = 0 has a nontrivial solution x ∈ Rn . Let the columns of A be A1 , A2 , . . . , An ; then Ax = x1 A1 + x2 A2 + · · · + xn Am . This is a linear combination of n vectors in Rm . Since n > m, {A1 , . . . , An } is linearly dependent, and hence there is a nontrivial solution to x1 A1 + x2 A2 + · · · + xn Am = 0. But this solution is equivalent to a nonzero vector x ∈ Rn such that Ax = 0. 2. Let R : R2 → R2 be the rotation of angle θ about the origin (a positive θ indicates a counterclockwise rotation). (a) Geometrically, it is obvious that R is linear. First of all, consider α > 0; then αx points in the same direction as x, with a different length (unless α = 1). The transformation R rotates both x and αx by the same angle, and thus R(x) and R(αx) point in the same direction. Moreover, R preserves lengths, so x and R(x) have the same length, as do αx and R(αx). Since the length of αx is α times the length of x, it follows that R(αx) is α times the length of R(x); that is, R(αx) = αR(x). A similar argument shows that R(αx) = αR(x) if α < 0. Now, if x, y ∈ R2 , then x + y is defined, geometrically, by the parallelogram rule (see Figure 2.2 in the text). Since every point of the plane is rotated by the same angle under R, it follows that the entire parallogram is rotated rigidly by R, and hence R(x + y) = R(x) + R(y). (b) We now use Theorem 66 to find the matrix A such that R(x) = Ax for all x ∈ R2 . Using elementary trigonometry, the vector e1 = (1, 0) is rotated to (cos (θ), sin (θ)) and the vector e2 = (0, 1) is rotated to (− sin (θ), cos (θ)). Therefore, cos (θ) − sin (θ) . A= sin (θ) cos (θ) 3. Consider the linear operator mapping R2 into itself that sends each vector (x, y) to its projection onto the x-axis, namely, (x, 0). We see that e1 is mapped to itself and e2 = (0, 1) is mapped to (0, 0). Therefore, the matrix representing the linear operator is 1 0 . A= 0 0 4. A (horizontal) shear acting on the plane maps a point (x, y) to the point (x + ry, y), where r is a real number. Under this operator, e1 is mapped to (1, 0) and e2 is mapped to (r, 1). Thus the linear operator is represented by the matrix 1 r . A= 0 1
53
5. Consider the linear operator L : Rn → Rn defined by L(x) = rx, where r > 0 is a fixed number. Since L(ei ) = rei for all i = 1, 2, . . . , n, we see that L(x) = Ax for all x ∈ Rn , where ⎡ ⎤ r 0 ··· 0 ⎢ 0 r ··· 0 ⎥ ⎢ ⎥ A=⎢ . . . .. ⎥ . . . . ⎣ . . . . ⎦ 0 0
···
r
6. Let w = α + iβ be a fixed complex number and define f : C → C by f (z) = wz. (a) Regarding C as a vector space (over the field C), it is obvious that f is linear (this follows from the field properties): f (y + z) = w(y + z) = wy + wz = wf (y) + wf (z), f (αz) = w(αz) = α(wz). (b) Now regard the set C as identical with R2 , writing (x, y) for x + iy. Let w = α + iβ. Then f (1) = w = α + iβ, so (1, 0) is mapped to (α, β). Also, f (i) = wi = −β + iα, and therefore (0, 1) is mapped to (−β, α). It follows that f is represented by the matrix α −β . A= β α 7. Let X and U be vector spaces over a field F . (a) Suppose L : X → U is linear and α ∈ F . Then, for any x, y ∈ X and β ∈ F , (αL)(x + y) = α(L(x + y)) = α(L(x) + L(y)) = αL(x) + αL(y), (αL)(βx) = α(L(βx)) = α(βL(x)) = (αβ)L(x) = β(αL(x)) = β((αL)(x)). This shows that αL is also linear. (b) Now suppose that L : X → U and M : X → U are both linear. For any x, y ∈ X and α ∈ F , (M + L)(x + y) = M (x + y) + L(x + y) = (M (x) + M (y)) + (L(x) + L(y)) = (M (x) + L(x)) + (M (y) + L(y)) = (M + L)(x) + (M + L)(y), (M + L)(αx) = M (αx) + L(αx) = αM (x) + αL(x) = α(M (x) + L(x)) = α(M + L)(x). This shows that M + L is also a linear operator. 8. The discrete Fourier transform (DFT) is the mapping F : CN → CN defined by N −1
(F(x))n =
1 xj e−2πinj/N , n = 0, 1, . . . , N − 1, N j=0
√ where i is the complex unit (i = −1). We wish to find the matrix A ∈ CN ×N such that F (x) = Ax for all x ∈ CN . As usual, we proceed by determining F (ek ) for k = 0, 1, . . . , N − 1. Since (ek )j = 1 if j = k and 0 otherwise, we have 1 1 (F(ek ))n = e−2πink/N = e−(2k)nπi/N . N N
54 It follows that
1 (1, 1, . . . , 1), N 1 −2πi/N −4πi/N F (e1 ) = 1, e ,e , . . . , e−2(N −1)πi/N , N 1 −4πi/N −8πi/N 1, e F(e2 ) = ,e , . . . , e−4(N −1)πi/N , N .. .. . . 2 1 1, e−2(N −1)πi/N , e−4(N −1)πi/N , . . . , e−2(N −1) πi/N F (eN −1 ) = N F (e0 ) =
and therefore F is represented by the matrix ⎡
1 ⎢ 1 1 ⎢ ⎢ 1 A= ⎢ N ⎢ .. ⎣ . 1
⎤
e−4πi/N e−8πi/N .. .
··· ··· ··· .. .
e−2(N −1)πi/N e−4(N −1)πi/N .. .
e−4(N −1)πi/N
···
e−2(N −1) πi/N
1
1
e−2πi/N e−4πi/N .. . e−2(N −1)πi/N
1
⎥ ⎥ ⎥ ⎥. ⎥ ⎦
2
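The matrix representation of the DFT derived in Exercise 8 can be checked against a library FFT. The sketch below (Python/NumPy assumed; not part of the text) builds A entrywise and compares Ax with fft(x)/N; the division by N accounts for the 1/N factor in the definition used here, which NumPy's FFT does not include.

```python
import numpy as np

N = 8
n = np.arange(N)
A = np.exp(-2j * np.pi * np.outer(n, n) / N) / N   # A[n, k] = e^{-2*pi*i*n*k/N} / N

rng = np.random.default_rng(0)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

print(np.allclose(A @ x, np.fft.fft(x) / N))       # True: A represents the DFT as defined above
```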
9. Let x ∈ RN be denoted as x = (x0 , x1 , . . . , xN −1 ). Given x, y ∈ RN , the convolution of x and y is the vector x ∗ y ∈ RN defined by N −1
xm yn−m , n = 0, 1, . . . , N − 1.
(x ∗ y)n = m=0
In this formula, y is regarded as defining a periodic vector of period N ; therefore, if n − m < 0, we take yn−m = yN +n−m . The linearity of the convolution operator is obvious: N −1
((x + z) ∗ y)n =
N −1
(x + z)m yn−m = m=0
(xm + zm )yn−m m=0 N −1
=
(xm yn−m + zm yn−m ) m=0 N −1
=
N −1
xm yn−m + m=0
zm yn−m m=0
= (x ∗ y)m + (z ∗ y)m , N −1
((αx) ∗ y)n =
N −1
(αx)m yn−m = m=0
N −1
(αxm )yn−m = m=0
α(xm yn−m ) m=0 N −1
=α
xm yn−m m=0
= α(x ∗ y)n . Therefore, if y is fixed and F : RN → RN , then F (x + z) = F (x) + F (z) for all x, z ∈ RN and F (αx) = αF (x) for all x ∈ RN , α ∈ R. This proves that F is linear. Next, notice that, if ek is the kth standard basis vector, then F (ek )n = yn−k , n = 0, 1, . . . , N − 1.
55
It follows that F (e0 ) = (y0 , y1 , . . . , YN −1 ), F (e1 ) = (yN −1 , y0 , . . . , YN −2 ), F (e2 ) = (yN −2 , yN −1 , . . . , YN −3 ), .. .. . . F (eN −1 ) = (y1 , y2 , . . . , Y0 ). Therefore, F (x) = Ax for all x ∈ RN , where ⎡ y0 yN −1 ⎢ y1 y0 ⎢ ⎢ y2 y1 A=⎢ ⎢ .. .. ⎣ . . yN −1 yN −2
yN −2 yN −1 y0 .. .
··· ··· ··· .. .
yN −3
···
⎤ y1 y2 ⎥ ⎥ y3 ⎥ ⎥. .. ⎥ . ⎦ y0
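The circulant structure found in Exercise 9 is easy to confirm numerically. The sketch below (Python/NumPy assumed; not part of the text) computes x ∗ y from the defining sum, forms the matrix with entries A_{nm} = y_{(n−m) mod N}, and checks that both agree (and that both agree with the FFT-based circular convolution).

```python
import numpy as np

N = 6
rng = np.random.default_rng(1)
x, y = rng.standard_normal(N), rng.standard_normal(N)

# Convolution by the defining sum, with y extended periodically.
conv = np.array([sum(x[m] * y[(n - m) % N] for m in range(N)) for n in range(N)])

# Circulant matrix: column k is F(e_k), i.e., A[n, k] = y[(n - k) mod N].
idx = np.arange(N)
A = y[(idx[:, None] - idx[None, :]) % N]

print(np.allclose(conv, A @ x))                                                 # True
print(np.allclose(conv, np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))))   # True
```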
10. Let L : C 2 (R) → C(R) be the differential operator L = D2 + ω 2 I, where ω > 0 is a real number. (a) We wish to determine which of the functions x(t) = sin (ωt), x(t) = cos (ωt), x(t) = eωt , x(t) = e−ωt satisfy L(x) = 0. A direct calculation shows that the first two functions satisfy x (t) = −ω 2 x(t), and therefore L(x) = 0. However, the last two functions satisfy x (t) = ω 2 x(t), and therefore L(x) = 2ω 2 x = 0. (b) The function x(t) = sin (ωt) + t2 can be written as x = x1 + y, where x1 (t) = sin (ωt) and y(t) = t2 . Since L is linear, we have L(x) = L(x1 + y) = L(x1 ) + L(y) = L(y) (we have already shown that L(x1 ) = 0). Now, y (t) = 2 and therefore L(y)(t) = 2 + ω 2 t2 . This shows that L(x) = f . On the other hand, a direct calculation shows that x(t) = t2 sin (ωt) satisfies (L(x))(t) = 2 sin (ωt) + 4ωt cos (ωt) = ω 2 t2 + 2. Thus the second function does not satisfy L(x) = f . 11. Consider the operator F : C 2 (a, b) → C(a, b) defined by (F (x))(t) = p(t)x (t) + q(t)x (t) + r(t)x(t), where p, q, and r are continuous functions (p, q, r ∈ C(a, b)). The operator F is linear: (F (x1 + x2 ))(t) = p(t)(x1 + x2 ) (t) + q(t)(x1 + x2 ) (t) + r(t)(x1 + x2 )(t) = p(t)(x 1 (t) + x 2 (t)) + q(t)(x 1 (t) + x 2 (t))+ r(t)(x1 (t) + x2 (t)) = p(t)x 1 (t) + p(t)x 2 (t) + q(t)x 1 (t) + q(t)x 2 (t)+ r(t)x1 (t) + r(t)x2 (t) = p(t)x 1 (t) + q(t)x 1 (t) + r(t)x1 (t)+ p(t)x 2 (t) + q(t)x 2 (t) + r(t)x2 (t) = (F (x1 ))(t) + (F (x2 ))(t), (F (αx))(t) = p(t)(αx) (t) + q(t)(αx) (t) + r(t)(αx)(t) = p(t)(αx (t)) + q(t)(αx (t)) + r(t)(αx(t)) = αp(t)x (t) + αq(t)x (t) + αr(t)x(t) = α(p(t)x (t) + q(t)x (t) + r(t)x(t)) = α(F (x))(t). This shows that F (x1 + x2 ) = F (x1 ) + F (x2 ) for all x1 , x2 ∈ C 2 (a, b) and F (αx) = αF (x) for all x ∈ C 2 (a, b), α ∈ R.
56
12. Let F be a field and A ∈ F m×n , B ∈ F n×p . We wish to show that (AB)T = B T AT . We note that (AT )ij = Aji and similarly for other matrices. Therefore,
T
(AB)
n
ij
= (AB)ji =
Ajk Bki , k=1
B T AT ij =
n
n
(B T )ik (AT )kj = k=1
Bki Ajk . k=1
This shows that ((AB)T )ij = (B T AT )ij for each i, j, and therefore (AB)T = B T AT . 13. Let K : C[c, d] → C[a, b] be defined by d (K(x))(s) =
k(s, t)x(t) dt, a ≤ s ≤ b,
c
where k : [a, b] × [c, d] → R is continuous. We establish a rectangular grid on the domain of k by choosing numbers a < s1 < s2 < · · · < sm < b, c < t1 < t2 < · · · < tn < d, where Δs = (b − a)/m, a + (i − 1)Δs ≤ si ≤ a + iΔs, Δt = (d − c)/n, c + (j − 1)Δt ≤ tj ≤ c + jΔt. We then approximate a function x ∈ C[c, d] by a vector X ∈ Rn , where Xj = x(tj ). The integral defining y = K(x) can be approximated by a Riemann sum in terms of X and the values of k at the nodes of the grid: . y(si ) =
m
k(si , tj )x(tj )Δt, i = 1, 2, . . . , m. j=1
If we define A ∈ Rm×n by Aij = k(si , tj )Δt, then we have . y(si ) =
m
m
Aij x(tj ) = j=1
Aij Xj = (AX)i , j=1
. where Xj = x(tj ). Therefore Y = AX approximates y in the sense that Yi = y(si ), i = 1, 2, . . . , m.
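The discretization described in Exercise 13 can be illustrated concretely. In the sketch below (Python/NumPy assumed), the kernel k(s, t) = e^{−|s−t|}, the function x(t) = sin(πt), and the midpoint choice of the nodes si, tj are all hypothetical choices, not taken from the text; the point is only that Y = AX approximates y = K(x) up to the Riemann-sum error.

```python
import numpy as np

a, b, c, d = 0.0, 1.0, 0.0, 1.0
m, n = 40, 60
ds, dt = (b - a) / m, (d - c) / n
s = a + (np.arange(1, m + 1) - 0.5) * ds       # midpoints s_i of the s-subintervals
t = c + (np.arange(1, n + 1) - 0.5) * dt       # midpoints t_j of the t-subintervals

k = lambda s_, t_: np.exp(-np.abs(s_ - t_))    # hypothetical kernel
A = k(s[:, None], t[None, :]) * dt             # A_ij = k(s_i, t_j) * dt
X = np.sin(np.pi * t)                          # X_j = x(t_j)
Y = A @ X                                      # Y_i approximates y(s_i) = (K(x))(s_i)

# Compare against a much finer Riemann sum for y(s_i).
M = 20000
tf = c + (np.arange(1, M + 1) - 0.5) * ((d - c) / M)
y_ref = (k(s[:, None], tf[None, :]) * np.sin(np.pi * tf)).sum(axis=1) * ((d - c) / M)
print(np.max(np.abs(Y - y_ref)))               # small: the discretization error of the coarse grid
```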
3.3
Isomorphic vector spaces
1. (a) The function f : R → R, f (x) = 2x + 1 is invertible since f (x) = y has a unique solution for each y ∈ R: y−1 f (x) = y ⇔ 2x + 1 = y ⇔ x = . 2 We see that f −1 (y) = (y − 1)/2. (b) The functionf : R → (0, ∞), f (x) = ex is also invertible, and the inverse is f −1 (y) = ln (y): eln (y) = y for all y ∈ R, ln (ex ) = x for all x ∈ R. (c) The function f : R2 → R2 , f (x) = (x1 + x2 , x1 − x2 ) is invertible since f (x) = y has a unique solution for each y ∈ R2 : f (x) = y ⇔
x1 + x2 = y1 , x1 − x2 = y2
We see that f −1 (y) =
⇔
2 x1 = y1 +y 2 , y1 −y2 x2 = 2 .
y 1 + y2 y1 − y2 , 2 2
.
57
(d) The function f : R2 → R2 , f (x) = (x1 − 2x2 , −2x1 + 4x2 ) fails to be invertible, and in fact is neither injective nor surjective. For example, f (0) = 0 and also f ((2, 1)) = (0, 0) = 0, which shows that f is not injective. The equation f (x) = (1, 1) has no solution: f (x) = (1, 1) ⇔
x1 − 2x2 = 1, −2x1 + 4x2 = 1
⇔
x1 − 2x2 = 1, 0 = 3.
This shows that f is not surjective. 2. Let X, Y be sets and let f : X → Y be invertible. Suppose g : Y → X satisfies g(f (x)) = x for all x ∈ X and f (g(y)) = y for all y ∈ Y . We wish to show that g = f −1 . Actually, we need only one of the hypotheses. Suppose g satisfies g(f (x)) = x for all x ∈ X. Given any y ∈ Y , define x = f −1 (y). Then g(f (x)) = x ⇔ g(f (f −1 (y))) = f −1 (y) ⇔ g(y) = f −1 (y). Since this holds for all y ∈ Y , we see that g = f −1 . On the other hand, suppose we have f (g(y)) = y for all y ∈ Y . For any given y ∈ Y , we can apply f −1 to both sides of this equation to obtain f −1 (f (g(y))) = f −1 (y) ⇔ g(y) = f −1 (y). Once again, we see that g = f −1 . 3. See the solution of the previous exercise. 4. Let X, Y , and Z be sets, and suppose f : X → Y , g : Y → Z are bijections. We wish to prove that g ◦ f is a bijection mapping X onto Z. First, let z ∈ Z. Since g is surjective, there exists y ∈ Y such that g(y) = z. Also, f is surjective, so there exists x ∈ X such that f (x) = y. But then (g ◦ f )(x) = g(f (x)) = g(y) = z. This shows that g ◦ f is surjective. Now suppose x1 , x2 ∈ X and (g ◦ f )(x1 ) = (g ◦ f )(x2 ). This is equivilent to g(f (x1 )) = g(f (x2 )), Since g is injective, this implies that f (x1 ) = f (x2 ), which in turn implies that x1 = x2 since f is injective. Thus (g ◦ f )(x1 ) = (g ◦ f )(x2 ) implies that x1 = x2 , and therefore g ◦ f is injective. We have (g ◦ f )−1 = f −1 ◦ g −1 . To prove this, it suffices by Theorem 73 to verify that (f −1 ◦ g −1 )(g ◦ f )(x) = x for all x ∈ X. But, for any x ∈ X, (f −1 ◦ g −1 )(g ◦ f )(x) = f −1 (g −1 (g(f (x)))) = f −1 (f (x)) = x, and the proof is complete. 5. Let X, Y , and Z be sets, and suppose f : X → Y , g : Y → Z are given functions. (a) If f and g ◦ f are invertible, then g is invertible. We can prove this using the previous exercise and the fact that f −1 is invertible. We have g = (g ◦ f ) ◦ f −1 : ((g ◦ f ) ◦ f −1 )(y) = g(f (f −1 (y))) = g(y) for all y ∈ Y. Therefore, since g ◦ f and f −1 are invertible, the previous exercise implies that g is invertible. (b) Similarly, if g and g ◦ f are invertible, then f is invertible since f = g −1 ◦ (g ◦ f ).
58
(c) If we merely know that g ◦ f is invertible, we cannot conclude that either f or g is invertible. For example, let f : R → R2 , g : R2 → R be defined by f (x) = (x, 0) for all x ∈ R and g(y) = y1 + y2 for all y ∈ R2 , respectively. Then (g ◦ f )(x) = g(f (x)) = g((x, 0)) = x + 0 = x, and it is obvious that g ◦ f is invertible (in fact, (g ◦ f )−1 = g ◦ f )). However, f is not surjective and g is not injective, so neither is invertible. 6. Consider the linear operator M : Pn → Pn+1 defined by (M (p))(x) = xp(x). We wish to find the matrix representing M , using the standard bases for both Pn and Pn+1 . We will write Sn = {1, x, . . . , xn } and Sn+1 = {1, x, . . . , xn+1 }. Now, M maps xi to xi+1 , and therefore maps 1 to x, x to x2 , . . . , xn to xn+1 . The columns of [M ]Sn ,Sn+1 are therefore (0, 1, 0, 0, . . . , 0), (0, 0, 1, 0, . . . , 0), . . . , (0, 0, 0, 0, . . . , 1) ⎡
and hence
0 ⎢ 1 ⎢ ⎢ 0 ⎢ [M ]Sn ,Sn+1 = ⎢ 0 ⎢ ⎢ .. ⎣ .
0 0 1 0 .. .
0 ···
0 0 7. Let L : R2 → R2 be defined by L(x) = Ax, where 1 A= 1
··· ··· ··· ··· .. .
0 0 0 1 .. .
1 1
⎤ 0 0 ⎥ ⎥ 0 ⎥ ⎥ . 0 ⎥ ⎥ .. ⎥ . ⎦ 1
.
Let S be the standard basis for R2 , S = {(1, 0), (0, 1)}, and let X be the alternate basis {x1 , x2 } = {(1, 1), (1, 2)}. We wish to find [L]X ,X . We first find [y]X for an arbitrary y ∈ R2 : y1 1 1 + c2 = y = c1 x1 + c2 x2 ⇔ c1 1 2 y2 ⇔
c 1 + c2 = y 1 , c1 + 2c2 = y2
⇔
c1 = 2y1 − y2 , c2 = y 2 − y 1 .
Thus [y]X = (2y1 − y2 , y2 − y1 ). We then have [L(x1 )]X = [Ax1 ]X = [(2, 2)]X = (2, 0), [L(x2 )]X = [Ax2 ]X = [(3, 3)]X = (3, 0), which together imply that
[L]X ,X =
2 3 0 0
.
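Since the columns of P = [x1 x2] are the coordinates of the basis vectors x1, x2 with respect to the standard basis, the matrix [L]X,X found in Exercise 7 can also be computed as P^{−1}AP. A minimal sketch (Python/NumPy assumed; not part of the text):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])
P = np.array([[1.0, 1.0],          # columns are x1 = (1, 1) and x2 = (1, 2)
              [1.0, 2.0]])

# [x]_X = P^{-1} x and [L(x)]_X = P^{-1} A x, so [L]_{X,X} = P^{-1} A P.
print(np.linalg.solve(P, A @ P))   # [[2. 3.], [0. 0.]], as computed above
```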
8. Consider two bases for R3 : S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}, X = {(1, 1, 1), (0, 1, 1), (0, 0, 1)}. Let I : R3 → R3 be the identity operator: Ix = x for all x ∈ R3 . We wish to find each of the following: [I]S,S , [I]S,X , [I]X ,S , [I]X ,X . If A is the matrix representing I with respect to two given bases, then the jth column of A consists of the coordinates of the jth vector from the first basis (that is, the basis for
3.3. ISOMORPHIC VECTOR SPACES
59
the domain) with respect to the second basis (the basis for the co-domain) (cf. formula (3.8) in the text and the discussion that follows). Now, ⎤ ⎡ 1 0 0 [I(ej )]S = [ej ]S = ej ⇒ [I]S,S = ⎣ 0 1 0 ⎦ . 0 0 1 Next, [I(e1 )]X = [e1 ]X = (1, −1, 0), [I(e2 )]X = [e2 ]X = (0, 1, −1), [I(e3 )]X = [e3 ]X = (0, 0, 1), ⎤ 1 0 0 1 0 ⎦. [I]S,X = ⎣ −1 0 −1 1 ⎡
and therefore
Next, [I(x1 )]S = [x1 ]S = (1, 1, 1), [I(x2 )]S = [x2 ]S = (0, 1, 1), [I(x3 )]S = [x3 ]S = (0, 0, 1), ⎡
and therefore
1 0 [I]X ,S = ⎣ 1 1 1 1
⎤ 0 0 ⎦. 1
Finally, [I(x1 )]X = [x1 ]X = (1, 0, 0), [I(x2 )]X = [x2 ]X = (0, 1, 0), [I(x3 )]X = [x3 ]X = (0, 0, 1), ⎡
and therefore
1 [I]X ,X = ⎣ 0 0
⎤ 0 0 1 0 ⎦. 0 1
9. We wish to find the matrix [L]X ,X of Example 80 by solving the equation [L]X ,X [x]X = [L(x)]X ,X for all x ∈ R2 directly. We write B ∈ R2×2 for [L]X ,X . We already know that 1 1 2 x1 + 2 x2 , [x]X = 1 1 2 x1 − 2 x2
and L(x) = Ax =
2x1 + x2 x1 + 2x2
⇒ [L(x)]X =
3
3 2 x1 + 2 x2 1 1 2 x1 − 2 x2
.
Therefore, B[x]X = [L(x)]X is equivalent to 3 3 B11 12 x1 + 12 x2 + B12 12 x1 − 12 x2 2 x1 + 2 x2 1 = 1 1 B21 2 x1 + 12 x2 + B22 12 x1 − 12 x2 2 x1 − 2 x2 " ! B11 +B12 3 3 12 x1 + B11 −B x2 2 2 2 x1 + 2 x2 = ⇒ . 1 1 B21 +B22 22 x1 + B21 −B x2 2 x1 − 2 x2 2 2
60
Since this equation must hold for all x1 , x2 , it is equivalent to four equations for the four unknowns B11 , B12 , B21 , B22 : B11 + B12 3 B11 − B12 3 B21 + B22 1 B21 − B22 1 = , = , = , =− . 2 2 2 2 2 2 2 2 It is easy to solve these equations to find B11 = 3, B12 = 0, B21 = 0, B22 = 1, that is, 3 0 . [L]X ,X = 0 1 10. Let T : P2 → P2 be defined by (T (p))(x) = p(x − 1). We will prove that T is linear and find [T ]M,M , where M is the standard basis for P2 : M = {1, x, x2 }. First, we have (T (p1 + p2 ))(x) = (p1 + p2 )(x − 1) = p1 (x − 1) + p2 (x − 1) = (T (p1 )(x) + T (p2 )(x)), (T (αp))(x) = (αp)(x − 1) = αp(x − 1) = α(T (p))(x). Thus T is linear. Now, T maps the constant polynomial 1 to itself, x to x − 1, and x2 to (x − 1)2 = x2 −2x+1. The coordinate vectors of these three images are (1, 0, 0), (−1, 1, 0), and (1, −2, 1), respectively. Therefore, ⎤ ⎡ 1 −1 1 1 −2 ⎦ . [T ]M,M = ⎣ 0 0 0 1 11. The operator T from the previous exercise is an isomorphism. We already have shown that it is linear, so we must show that it is surjective and bijective. Given an q ∈ P2 , define p(x) = q(x + 1). Then (T (p))(x) = p(x − 1) = q(x + 1 − 1) = q(x), and hence T (p) = q. Since q was arbitrary, this shows that T is surjective. Now suppose that p, q ∈ P2 and T (p) = T (q). This means that T (p) and T (q) are equal as functions, and hence, for every x ∈ R, (T (p))(x + 1) = (T (q))(x + 1). But then, by definition of T , we have p(x) = q(x) for all x ∈ R, and hence p = q. This shows that T is surjective, and the proof is complete. 12. Let X = {(1, 0, 0), (1, 1, 0), (1, 1, 1)} ⊂ Z32 . We denote the vectors in X by x1 , x2 , x3 , respectively. We can prove that X is a basis for Z32 by showing that there is a unique solution c1 , c2 , c3 to c1 x1 + c2 x2 + c3 x3 = y for each y ∈ Z32 . This equation is equivalent to the system c 1 + c2 + c3 = y 1 , c 2 + c3 = y 2 , c3 = y 3 . It is easy to verify that the unique solution is c1 = y1 + y2 , c2 = y2 + y3 , and c3 = y3 . It then follows that, for any y ∈ Z32 , [y]X = (y1 + y2 , y2 + y3 , y3 ). 13. Let X be the basis for Z32 from the previous exercise, let A ∈ Z3×3 be defined by 2 ⎤ ⎡ 1 1 0 A = ⎣ 1 0 1 ⎦, 0 1 1 and define L : Z32 → Z32 by L(x) = Ax. We wish to find [L]X ,X . The columns of this matrix are [L(x1 )]X , [L(x2 )]X , [L(x3 )]X . We have [L(x1 )]X = [(1, 1, 0)]X = (0, 1, 0), [L(x2 )]X = [(0, 1, 1)]X = (1, 0, 1), [L(x3 )]X = [(0, 0, 0)]X = (0, 0, 0),
61 ⎡
and hence
0 1 [L]X ,X ⎣ 1 0 0 1
⎤ 0 0 ⎦. 0
14. Consider the basis S = {1, x, x2 } for P3 (Z3 ) (recall that we proved in Exercise 2.7.13 that dim(P3 (Z3 )) = 3, and that S is a basis). We wish to find # $ c0 + c1 x + c2 x2 + c3 x3 S , where c0 , c1 , c2 , c3 are arbitrary elements of Z3 . We begin by expressing x3 in terms of the elements of S: The equation a0 + a1 x + a2 x2 = x3 means that a0 + a1 x + a2 x2 = x3 for all x ∈ Z3 , and since Z3 contains only the three elements 0, 1, 2, this is equivalent to the system a0 = 0, a1 + a2 + a3 = 1, a1 + 2a2 + a3 = 2. We solve this system to find the unique solution a0 = 0, a1 = 1, a2 = 0, which shows that, as functions on Z3 , x3 and x are equal. Therefore, c0 + c1 x + c2 x2 + c3 x3 = c0 + c1 x + c2 x2 + c3 x = c0 + (c1 + c3 )x + c2 x2 , which shows that
$ # c0 + c1 x + c2 x2 + c3 x3 S = (c0 , c1 + c3 , c2 ).
15. Let X, Y , and Z be vector spaces over a field F , and suppose X, and Y are isomorphic and Y and Z are also isomorphic. We wish to prove that X and Z are isomorphic. Let L : X → Y and M : Y → Z be isomorphisms. We have already seen that the composition of linear operators is linear (Theorem 58) and that the composition of bijections is bijective (Exercise 3.3.4). Therefore, M L is a linear bijection mapping X onto Z. This proves that X and Z are isomorphic. 16. Let X and Y be n-dimensional vectors spaces over a field F , and let {x1 , x2 , . . . , xn }, {y1 , y2 , . . . , yn } be bases for X, Y , respectively. We wish to prove that E : X → Y defined by E(α1 x1 + · · · + αn xn ) = α1 y1 + · · · + αn yn is linear. Let u, v ∈ X; then we can write u = α1 x1 + · · · + αn xn , v = β1 x1 + · · · + βn xn for some scalars α1 , . . . , αn , β1 , . . . , βn ∈ F . Then u + v = (α1 + β1 )x1 + · · · + (αn + βn )xn , and therefore E(u + v) = (α1 + β1 )y1 + · · · + (αn + βn )yn = α1 y1 + β1 y1 + · · · + αn yn + βn yn = α1 y1 + · · · + αn yn + β1 y1 + · · · + βn yn = E(u) + E(v). Similarly, for any γ ∈ F and u = α1 x1 + · · · + αn xn ∈ X, we have γu = (γα1 )x1 + · · · + (γαn )xn , and therefore E(γu) = (γα1 )y1 + · · · + (γαn )yn = γ(α1 y1 + · · · + αn yn ) = γE(u). Thus E is linear, as desired.
62
17.(a,b) Let F be a field. We are asked to show that F m×n is a vector space over F , where addition and scalar multiplication are defined entrywise: (A + B)ij = Aij + Bij , (αA)ij = αAij . Each element of F m×n consists of mn scalars from F ; although we normally visualize these scalars in a rectangular array, there is no difference (in terms of addition and scalar multiplication) from listing the scalars in a one-dimensional list. It is trivial to verify the defining properties of a vector space (just as it is trivial in the case of F n ). We define L : Rm×n → F mn by L(A) = v, where vk = Aij , where k = (j − 1)m + i, 1 ≤ i ≤ m, 1 ≤ j ≤ n. (A little thought shows that for each k = 1, 2, . . . , mn, there is a unique choice of i and j satisfying the above relationship. This is a variation on the usual division algorithm; cf. Theorem 479 in Appendix A.) It is easy to see that L is invertible and L−1 (v) = A, where Aij = v(j−1)m+i . Also, L is linear: L(A + B)k = (A + B)ij (where k = (j − 1)m + i, 1 ≤ i ≤ m, 1 ≤ j ≤ n) = Aij + Bij = L(A)k + L(B)k = (L(A) + L(B))k L(αA)k = (αA)ij (where k = (j − 1)m + i, 1 ≤ i ≤ m, 1 ≤ j ≤ n) = αAij = αL(A)k = (αL(A))k . Thus L is an isomorphism, F m×n is isomorphic to F mn , and the dimension of F m×n is mn. A basis for F m×n consists of the matrices E1 , E2 , . . . , Emn mapping to the standard basis vectors in F mn . Each such matrix contains a single entry of 1, with all other entries equal to 0: 1, if (j − 1)m + i = k, (Ek )ij = 0, otherwise. (c) Let X and U be vector spaces over a field F , and suppose the dimensions of X and U are n and m, respectively. We wish to prove that L(X, U ) is isomorphic to F m×n . Let X = {x1 , . . . , xn } and U = {u1 , . . . , um } be bases for X and U , respectively. We define P : L(X, U ) → F m×n by P (L) = [L]X ,U . Recall that [L]X ,U is defined by the equation [L]X ,U [x]X = [L(x)]U for all x ∈ X. We have seen that for each L ∈ L(X, U ), there exists a unique A ∈ F m×n such that A[x]X = [L(x)]U for all x ∈ X (namely, the matrix A whose columns are [L(x1 )]U , . . . , [L(xn )]U ). Thus P is well-defined. Now, we have [(K + L)(x)]U = [K(x) + L(x)]U = [K(x)]U + [L(x)]U since the mapping from a vector to its coordinates, u → [u]U , is linear. Therefore, [(K + L)(x)]U = [K(x)]U + [L(x)]U = [K]X ,U [x]X + [L]X ,U [x]X = ([K]X ,U + [L]X ,U )[x]X .
63
This holds for all K, L ∈ L(X, U ). Similar reasoning shows that [(αL)(x)]U = (α[L]X ,U )[x]X for all L ∈ L(X, U ), α ∈ F . It follows that P is a linear operator. Finally, given any A ∈ F m×n , we can define L = Q(A) ∈ L(X, U ) implicitly by [L(x)]U = A[x]X for all x ∈ X. It is straightforward to prove that this defines L uniquely. It is then immediate that P (Q(A)) = A for all A ∈ F m×n and Q(P (L)) = L for all L ∈ L(X, U ). Thus P is invertible, and hence L(X, U ) is isomorphic to F m×n .

18. Let U = R, regarded as a vector space over R, and let V be the vector space of Exercise 2.2.8. We will show that U and V are isomorphic by showing that F : U → V defined by F (x) = ex is an isomorphism from U to V . For any x, y ∈ R, we have F (x + y) = ex+y = ex ey = ex ⊕ ey = F (x) ⊕ F (y). Also, for any α ∈ R, x ∈ R,

F (αx) = eαx = (ex )^α = α ⊙ ex = α ⊙ F (x),

where ⊙ denotes the scalar multiplication defined on V .
Thus F is linear. We know from calculus that F is a bijection (the inverse is F −1 (u) = ln (u)), and hence F is an isomorphism.

19. We wish to determine if the operator D defined in Example 79 is an isomorphism. In fact, D is not an isomorphism since it is not injective. For example, if p(x) = x and q(x) = x + 1, then p ≠ q but D(p) = D(q).

20. Let F be a field and let X and U be finite-dimensional vector spaces over F . Let X = {x1 , x2 , . . . , xn } and U = {u1 , u2 , . . . , um } be bases for X and U , respectively. Let A ∈ F m×n be given. The unique linear operator L : X → U such that [L]X ,U = A is defined by

L(α1 x1 + · · · + αn xn ) = Σ_{i=1}^{m} ( Σ_{j=1}^{n} Aij αj ) ui
for all α1 x1 + · · · + αn xn ∈ X. Notice that since X is a basis for X, each x ∈ X can be written uniquely as x = α1 x1 + · · · + αn xn , and hence L is well-defined. It is easy to verify directly that L is linear. Moreover,

L(α1 x1 + · · · + αn xn ) = Σ_{i=1}^{m} (Aα)i ui ,
which shows that [L(x)]U = A[x]X for all x ∈ X, that is, that [L]X ,U = A. Finally, if M : X → U also satisfies [M ]X ,U = A, then we must have

M (α1 x1 + · · · + αn xn ) = Σ_{i=1}^{m} (Aα)i ui = L(α1 x1 + · · · + αn xn )
for all α1 x1 + · · · + αn xn ∈ X, which shows that M = L.
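The isomorphism of Exercise 17 is just column stacking, which is easy to experiment with numerically. In the following Python sketch the names vec and unvec are ad hoc, and zero-based indexing turns k = (j − 1)m + i into k = jm + i; the checks confirm that the stacking map is linear and invertible, so F m×n and F mn are isomorphic.

import numpy as np

def vec(A):
    # Stack the columns of A: entry (i, j) goes to position j*m + i
    # (the zero-based version of k = (j - 1)m + i).
    m, n = A.shape
    return A.reshape(m * n, order="F")

def unvec(v, m, n):
    # Inverse map: rebuild the m x n matrix from the stacked vector.
    return v.reshape((m, n), order="F")

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 2)), rng.standard_normal((3, 2))
alpha = 2.5

# Linearity: vec(A + B) = vec(A) + vec(B) and vec(alpha*A) = alpha*vec(A).
assert np.allclose(vec(A + B), vec(A) + vec(B))
assert np.allclose(vec(alpha * A), alpha * vec(A))

# Invertibility: unvec undoes vec, so the map is a bijection.
assert np.allclose(unvec(vec(A), 3, 2), A)
print(vec(A))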
3.4 Linear operator equations
1. Suppose L : R3 → R3 is linear, b ∈ R3 is given, and y = (1, 0, 1), z = (1, 1, −1) are two solutions to L(x) = b. We are asked to find two more solutions to L(x) = b. We know that y − z = (0, −1, 2) satisfies L(y − z) = 0. Therefore, z + α(y − z) satisfies L(z + α(y − z)) = L(z) + αL(y − z) = b + α · 0 = b for all α ∈ R. Thus two more solutions of L(x) = b are z + 2(y − z) = (1, 1, −1) + (0, −2, 4) = (1, −1, 3), z + 3(y − z) = (1, 1, −1) + (0, −3, 6) = (1, −2, 5).

2. Let L : C 2 (R) → C(R) be a linear differential operator, and let f ∈ C(R) be defined by f (t) = 2(1 − t)et . Suppose x1 (t) = t2 et and x2 (t) = (t2 + 1)et are solutions of L(x) = f . We wish to find two more solutions of L(x) = f . Let z(t) = x2 (t) − x1 (t) = (t2 + 1)et − t2 et = et ; then L(z) = L(x2 − x1 ) = L(x2 ) − L(x1 ) = f − f = 0. It follows that x(t) = x1 (t) + αz(t) = t2 et + αet satisfies L(x) = L(x1 + αz) = L(x1 ) + αL(z) = f + α · 0 = f for all α ∈ R. Therefore, in particular, x(t) = t2 et + 2et , x(t) = t2 et + 3et are two more solutions of L(x) = f .

3. Suppose T : R4 → R4 has kernel sp{(1, 2, 1, 2), (−2, 0, 0, 1)}. Suppose further that y = (1, 2, −1, 1), b = (3, 1, −2, −1) satisfy T (y) = b. We wish to find three distinct solutions, each different from y, to T (x) = b. We know that any vector x = y + αu + βv, where {u, v} is the given spanning set for ker(T ) and α, β ∈ R, also solves T (x) = b. Therefore, three more solutions to T (x) = b are x = y + u = (2, 4, 0, 3), x = y + v = (−1, 2, −1, 2), x = y + u + v = (0, 4, 0, 4).

4. Let T and b be defined as in the previous exercise. We wish to determine if z = (0, 4, 0, 1) is a solution of T (x) = b. Since the solution set of T (x) = b is y + ker(T ), we must determine if z can be written as z = y + αu + βv. This equation is equivalent to the system α − 2β = −1, 2α = 2, α = 1, 2α + β = 0. It is easy to see that there is no solution to this system (the first two equations imply that α = β = 1, but these values do not satisfy the fourth equation), and hence that z is not a solution to T (x) = b.

5. Let L : R3 → R3 satisfy ker(L) = sp{(1, 1, 1)} and L(u) = v, where u = (1, 1, 0) and v = (2, −1, 2). A vector x satisfies L(x) = v if and only if there exists α such that x = u + αz (where z = (1, 1, 1)), that is, if and only if x − u = αz for some α ∈ R.
(a) For x = (1, 2, 1), x − u = (0, 1, 1), which is not a multiple of z. Thus L(x) ≠ v.
(b) For x = (3, 3, 2), x − u = (2, 2, 2) = 2z. Thus L(x) = v.
(c) For x = (−3, −3, −2), x − u = (−4, −4, −2), which is not a multiple of z. Thus L(x) ≠ v.
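The membership test used in Exercises 3 and 4 reduces to a small linear system in α and β, which can be checked numerically. The following sketch (using NumPy, with the vectors u, v, y, z taken from those exercises) confirms that z − y does not lie in ker(T ), while a genuine solution such as y + u + v does pass the test.

import numpy as np

# Data from Exercises 3 and 4: ker(T) = sp{u, v}, particular solution y.
u = np.array([1.0, 2.0, 1.0, 2.0])
v = np.array([-2.0, 0.0, 0.0, 1.0])
y = np.array([1.0, 2.0, -1.0, 1.0])
z = np.array([0.0, 4.0, 0.0, 1.0])

# z solves T(x) = b iff z - y is a combination alpha*u + beta*v.
K = np.column_stack([u, v])            # 4x2 matrix whose columns span ker(T)
coef = np.linalg.lstsq(K, z - y, rcond=None)[0]
residual = np.linalg.norm(K @ coef - (z - y))
print(coef, residual)                  # residual > 0, so z - y is not in ker(T)

# By contrast, a genuine solution such as y + u + v passes the test.
w = y + u + v
coef_w = np.linalg.lstsq(K, w - y, rcond=None)[0]
print(np.linalg.norm(K @ coef_w - (w - y)))   # essentially zero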
6. (a) Let X and U be vector spaces over a field F , and let L : X → U be linear. Suppose that b, c ∈ U are given, y ∈ X satisfies L(y) = b, and z ∈ X satisfies L(z) = c. Then, for any β, γ ∈ F , L(βy + γz) = βL(y) + γL(z) = βb + γc, and hence x = βy + γz solves L(x) = βb + γc.
(b) Suppose F : R3 → R3 is a linear operator, and define b = (1, 2, 1), c = (1, 0, 1). Assume y = (1, 1, 0) satisfies F (y) = b, while z = (2, −1, 1) satisfies F (z) = c. Let d = (1, 4, 1). We can find β, γ ∈ R such that d = βb + γc, namely, β = 2, γ = −1. It follows that x = 2y − z = (0, 3, −1) satisfies F (x) = F (2y − z) = 2F (y) − F (z) = 2b − c = d.

7. Let A ∈ Z3×3 , b ∈ Z32 be defined by

A = [ 1 1 0
      1 0 1
      0 1 1 ],    b = (1, 0, 1).
If we solve Ax = 0 directly, we obtain two solutions: x = (0, 0, 0) or x = (1, 1, 1). Thus the solution space of Ax = 0 is {(0, 0, 0), (1, 1, 1)}. If we solve Ax = b, we also obtain two solutions: x = (0, 1, 0) or x = (1, 0, 1). If L : Z32 → Z32 is the linear operator defined by L(x) = Ax for all x ∈ Z32 , then ker(L) = {(0, 0, 0), (1, 1, 1)} and the solution set of L(x) = b is {(0, 1, 0), (1, 0, 1)} = (0, 1, 0) + ker(L), as predicted by Theorem 90.

8. Let A ∈ Z3×3 , b ∈ Z33 be defined by

A = [ 1 2 0
      2 0 1
      0 1 2 ],    b = (1, 1, 1).
The solution set of Ax = 0 is {(0, 0, 0), (1, 1, 1), (2, 2, 2)}, while the solution set of Ax = b is {(2, 1, 0), (0, 2, 1), (1, 0, 2)} = (2, 1, 0) + {(0, 0, 0), (1, 1, 1), (2, 2, 2)}. If L : Z33 → Z33 is defined by L(x) = Ax for all x ∈ Z33 , then we see that ker(L) = {(0, 0, 0), (1, 1, 1), (2, 2, 2)}, while the solution set of L(x) = b is (2, 1, 0) + ker(L). This is consistent with Theorem 90. 9. Let V be a vector space over a field F , let x̂ ∈ V , and let S be a subspace of V . We wish to show that if x̃ ∈ x̂ + S, then x̃ + S = x̂ + S. We first notice that x̃ ∈ x̂ + S implies that there exists r ∈ S such that x̃ = x̂ + r, which is equivalent to x̂ = x̃ − r. Now suppose that y ∈ x̂ + S. Then there exists s ∈ S such that y = x̂ + s. But then y = (x̃ − r) + s = x̃ + (s − r). Since s, r ∈ S and S is a subspace, s − r also belongs to S. It follows that y ∈ x̃ + S, and we have shown that x̂ + S ⊂ x̃ + S. On the other hand, suppose y ∈ x̃+S, say y = x̃+s, s ∈ S. We then have y = (x̂+r)+s = x̂+(r +s), and r + s ∈ S since r, s ∈ S and S is a subspace. Therefore y ∈ x̂ + S, and we have shown that x̃ + S ⊂ x̂ + S. It follows that x̃ + S = x̂ + S. The interpretation of this result is that, in the representation x̂ + ker(T ) of the solution set of T (x) = b, it does not matter which particular solution x̂ is chosen. 10. Let D : C 1 (R) → C(R) be the derivative operator: D(F ) = F . (a) As is well-known, D(F ) = 0 if and only if F is a constant function. Thus ker(D) is the space of constant functions. We can represent this as ker(D) = sp{1}, where 1 represents the constant function F (x) = 1.
(b) We usually write indefinite integrals in the form f (x) dx = F (x) + C, where C is the constant of integration. Indefinite integration is equivalent to solving the linear operator equation D(F ) = f . According to Theorem 90, the solution set is F + ker(D), where F is any particular antiderivative of f . In the above formula, C can be thought of as an arbitrary element of ker(D), and hence the usual formula expresses the conclusion of Theorem 90. 11. Let F : R2 → R be defined by F (x) = x21 + x22 − 1. (Here the co-domain is regarded as a one-dimensional vector space.) The following statements would be true if F were linear: (a) The range of F is a subspace of R. (b) The solution set of F (x) = 0 (that is, {x ∈ R2 : F (x) = 0}) is a subspace of R2 . (c) The solution set of F (x) = c (where c ∈ R is a given constant) is x̂ + {x ∈ R2 : F (x) = 0}, where x̂ is any one solution to F (x) = c. Since F is nonlinear, these conclusions are not guaranteed to hold. In fact, the first conclusion fails, since R(F ) = [−1, ∞) is not a subspace of R. The second conclusion fails since the solution set of F (x) = 0 consists of all points lying on the circle of radius 1 centered at the origin, which is not a subspace (for instance, it does not contain the √ zero vector). The third conclusion also fails, since the solution set of F (x) = c is the circle of radius 1 + c centered at the origin, if c ≥ −1, or the empty set if c < −1. None of these sets is the unit circle translated by a fixed vector. 12. Let X and U be vector spaces over a field F , let T : X → U be linear, and suppose u ∈ U is a nonzero vector. We wish to determine if the solution set S to T (x) = u a subspace of X. It is not, and in fact does not satisfy any of the properties of a subspace. For instance, 0 ∈ S since T (0) = 0 = u. Also, S is not closed under scalar multiplication, since if x ∈ S, then T (2x) = 2T (x) = 2u = u (since u = 0). Similarly, it is easy to show that S is not closed under addition. 13. Let V be a vector space over a field F , and let S be a proper subspace of V . We wish to prove that the relation ∼ defined by u ∼ v if and only if u − v ∈ S is an equivalence relation on V . First, for any u ∈ V , u − u = 0 ∈ S, and hence u ∼ u. Second, if u ∼ v, then u − v ∈ S, which implies that v − u = −(u − v) ∈ S, and hence that v ∼ u. Finally, if u ∼ v and v ∼ w, then u − v, v − w ∈ S. But then u − w = (u − v) + (v − w) ∈ S, which proves that u ∼ w. Thus ∼ is an equivalence relation on V . 14. Let V be a vector space over a field F , and let S be a proper subspace of V . For any vector u ∈ V , let [u] denote the equivalence class of u under the equivalence relation defined in the previous exercise. We denote the set of all equivalence classes by V /S (the quotient space of V over S) and define addition and scalar multiplication on V /S by [u] + [v]
= [u + v] for all [u], [v] ∈ V /S,
α[u] = [αu] for all [u] ∈ V /S, α ∈ F. (a) We first prove that addition and scalar multiplication are well-defined. Suppose u1 ∼ u2 , v1 ∼ v2 , so that [u1 ] = [u2 ] and [v1 ] = [v2 ]. We must prove that [u1 + v1 ] = [u2 + v2 ], which is equivalent to (u1 + v1 ) ∼ (u2 + v2 ) and hence to (u1 + v1 ) − (u2 + v2 ) ∈ S. By definition of ∼, we have u1 − u2 = s ∈ S, v1 − v2 = t ∈ S. Hence (u1 + v1 ) − (u2 + v2 ) = u1 + v1 − u2 − v2 = (u1 − u2 ) + (v1 − v2 ) = s + t ∈ S. Thus [u1 + v1 ] = [u2 + v2 ], and hence addition is well-defined. Now suppose [u] = [v] and α ∈ F . By definition, we have u − v = s ∈ S, and hence αu − αv = α(u − v) = αs ∈ S, which shows that αu ∼ αv and hence that [αu] = [αv]. Thus scalar addition is also well-defined.
(b) We now wish to prove that V /S, defined above, is a vector space under the given operations. Commutativity and associativity of addition in V /S follow immediately from the commuatativity and associativity of addition in V . It is obvious from the definition of addition in V /S that [0] is an additive identity and that [−v] is an additive inverse for [v]. For all α, β ∈ F and v ∈ V , we have α(β[v]) = α[βv] = [α(βv)] = [(αβ)v] = (αβ)[v] and (α + β)[v] = [(α + β)v] = [αv + βv] = [αv] + [βv] = α[v] + β[v]. Also, for α ∈ F and u, v ∈ V , α([u] + [v]) = α[u + v] = [α(u + v)] = [αu + αv] = [αu] + [αv] + α[u] + α[v]. Finally, 1 · [u] = [1 · u] = [u] for all u ∈ V . Thus we see that all the vector space properties of V /S follow immediately from the fact that V is a vector space. 15. Now let X and U be vector spaces over a field F , and let L : X → U be a linear operator. Define T : X/ker(L) → R(L) by T ([x]) = L(x) for all [x] ∈ X/ker(L). (a) We first prove that T is a well-defined linear operator. To prove that T is well-defined, we must show that if [x] = [y], then T ([x]) = T ([y]). By definition, [x] = [y] if and only if x − y ∈ ker(L). But then L(x − y) = 0 ⇒ L(x) − L(y) = 0 ⇒ L(x) = L(y) ⇒ T ([x]) = T ([y]). Thus T is well-defined. Now, for any [x], [y] ∈ X/ker(L), we have T ([x] + [y]) = T ([x + y]) = L(x + y) = L(x) + L(y) = T ([x]) + T ([y]), and, for α ∈ F and [x] ∈ X/ker(L), we have T (α[x]) = T ([αx]) = L(αx) = αL(x) = αT ([x]). Thus T is linear. (b) We now prove that T is bijective and hence an isomorphism. First, suppose [x], [y] ∈ X/ker(L) and T ([x]) = T ([y]). Then L(x) = L(y), which, by linearity, implies that L(x − y) = 0 and hence that x − y ∈ ker(L). But then, by definition, [x] = [y]. This shows that T is injective. Now suppose u ∈ R(L), say u = L(x). Then, by definition of T , T ([x]) = L(x) = u, and hence u ∈ R(T ). This shows that T is surjective, and the proof is complete. (c) Let u ∈ R(L) be given, and let x̂ ∈ X be a solution to L(x) = u. Then, L(x) = u if and only if x ∈ x̂ + ker(L). But x ∈ x̂ + ker(L) is equivalent to x = x̂ + s for some s ∈ ker(L), which in turn is equivalent to x − x̂ ∈ ker(L) or [x] = [x̂]. It follows that [x̂] is precisely the solution set of L(x) = u. (This also shows that [x̂] = x̂ + ker(L), a fact that we did not note above.) Thus there is a one-to-one correspondence between v ∈ U and elements of X/ker(L), each of which is the solutions set of L(x) = v.
3.5 Existence and uniqueness of solutions
1. We wish to give an example of a matrix operator T : Rn → Rn that is not bijective. The simplest example would be T (x) = Ax, where A is the zero matrix (that is, the matrix whose entries are all zero). Such a T maps every input to the zero vector, and hence is not injective. To construct a less trivial example, let u ∈ Rn (n > 1) be any nonzero vector, and define A ∈ Rn×n by A = [u|u| · · · |u] (every column of A equals u). Then, for all x ∈ Rn , T (x) = Ax is a multiple of u, which shows that T cannot be surjective.
CHAPTER 3. LINEAR OPERATORS 2. Each part of this exercise describes an operator with certain properties. We wish to determine if such an operator can exist. (a) There cannot exist a linear operator T : R3 → R2 that is nonsingular. An operator T is nonsingular if and only if it is injective, and if T is injective then the dimension of the domain cannot exceed that of the co-domain (Theorem 93). (b) We can easily construct a linear operator T : R2 → R3 that is nonsingular, such as T (x) = (x1 , x2 , 0). Clearly x = 0 implies that T (x) = 0. (c) There cannot exist a linear operator T : R3 → R2 having full rank. For such any linear operator T : R3 → R2 , rank(T ) = dim(R(T )) ≤ dim(R2 ) = 2, while T has full rank if and only if rank(T ) = dim(R3 ) = 3. (d) The linear operator T : R2 → R3 described above (T (x) = (x1 , x2 , 0)) has full rank since rank(T ) = dim(R(T )) = 2 = dim(R2 ). (e) The same operator T : R2 → R3 described in the previous part is nonsingular but not invertible. Indeed, no linear operator from R2 into R3 is invertible by Corollary 103. (f) There are many invertible linear operators T : R2 → R2 . The simplest is the identity operator: T (x) = x for all x ∈ R2 . 3. Each part of this exercise describes an operator with certain properties. We wish to determine if such an operator can exist. (a) We wish to find a linear operator T : R3 → R2 such that T (x) = b has a solution for all b ∈ R2 . Any surjective operator will do, such as T : R3 → R2 defined by T (x) = (x1 , x2 ). (b) No linear operator T : R2 → R3 has the property that T (x) = b has a solution for all b ∈ R3 . This would imply that T is surjective, but then Theorem 99 would imply that dim(R3 ) ≤ dim(R2 ), which is obviously not true. (c) Every linear operator T : R3 → R2 has the property that, for some b ∈ R2 , the equation T (x) = b has infinitely many solutions. In fact, since dim(R2 ) < dim(R3 ), any such T : R3 → R2 fails to be injective and hence has a nontrivial kernel. Therefore, T (x) = 0 necessarily has infinitely many solutions. (d) We wish to construct a linear operator T : R2 → R3 such that, for some b ∈ R3 , the equation T (x) = b has infinitely many solutions. We will take T defined by T (x) = (x1 − x2 , x1 − x2 , 0). Then, for every α ∈ R, x = (α, α) satisfies T (x) = 0. (e) We wish to find a linear operator T : R2 → R3 with the property that T (x) = b does not have a solution for all b ∈ R3 , but when there is a solution, it is unique. Any nonsingular T will do, since R(T ) is necessarily a proper subspace of R3 (that is, T : R2 → R3 cannot be surjective), and nonsingularity implies that T (x) = b has a unique solution for each b ∈ R(T ) (by Definition 96 and Theorem 92). For example, T : R2 → R3 defined by T (x) = (x1 , x2 , 0) has the desired properties. 4. Each part of this exercise describes an operator with certain properties. We wish to determine if such an operator can exist. (a) The linear operator T : Z32 → Z22 defined by T (x) = (x1 , x2 ) is surjective. (b) No linear operator T : Z22 → Z32 can be surjective (by Theorem 99). (c) No linear operator T : Z33 → Z23 can be injective (by Theorem 93). (d) The linear operator T : Z23 → Z33 defined by T (x) = (x1 , x2 , 0) is injective. (e) Every linear operator T : Z22 → Z32 fails to be surjective, as we already pointed out above. The operator T : Z22 → Z32 defined by T (x) = (x1 , x2 , 0) is injective.
5. Define M : R4 → R3 by

M (x) = ( x1 + 3x2 − x3 − x4 , 2x1 + 7x2 − 2x3 − 3x4 , 3x1 + 8x2 − 3x3 − 16x4 ).
We wish to find the rank and nullity of M . We begin by finding the kernel of M . The equation M (x) = 0 is equivalent to the system x1 + 3x2 − x3 − x4 = 0, 2x1 + 7x2 − 2x3 − 3x4 = 0, 3x1 + 8x2 − 3x3 − 16x4 = 0. Gaussian elimination yields x1 = x3 , x2 = 0, x4 = 0. This shows that every element of the kernel is of the form (α, 0, α, 0); that is, ker(M ) = sp{(1, 0, 1, 0)} and hence the nullity of M is 1. A direct calculation shows that we can solve M (x) = y, which is equivalent to the system x1 + 3x2 − x3 − x4 = y1 , 2x1 + 7x2 − 2x3 − 3x4 = y2 , 3x1 + 8x2 − 3x3 − 16x4 = y3 , for every y ∈ R3 . Therefore, R(M ) = R3 , and hence rank(M ) = dim(R(M )) = 3. 6. Define T : Pn → Pn+1 by T (p)(x) = xp(x). We wish to find the rank and nullity of T . Let p ∈ Pn , p(x) = c0 + c1 x + c2 x2 + · · · + cn xn . Then (T (p))(x) = c0 x + c1 x2 + c2 x3 + · · · + cn xn+1 . Since {x, x2 , . . . , xn+1 } is linearly independent, we see that T (p) = 0 if and only if c0 = c1 = . . . = cn = 0, that is, if and only if p = 0. Thus ker(T ) = {0} and the nullity of T is 0. We also see that R(T ) = sp{x, x2 , . . . , xn+1 }, which shows that rank(T ) = n + 1. 7. Define S : Pn → Pn by S(p)(x) = p(2x + 1). We wish to find the rank and nullity of S. We first find the kernel of S. We have that S(p) = 0 if and only if p(2x + 1) = 0 for all x ∈ R. Consider an arbitrary y ∈ R and define x = (y − 1)/2. Then y−1 p(2x + 1) = 0 ⇒ p 2 · + 1 = 0 ⇒ p(y) = 0. 2 Since p(y) = 0 for all y ∈ R, it follows that p must be the zero polynomial, and hence the kernel of S is trivial. Thus nullity(S) = 0. Now consider any q ∈ Pn , and define p ∈ Pn by p(x) = q((x − 1)/2). Then (S(p))(x) = p(2x + 1) = q(
(2x + 1 − 1)/2 ) = q(x).
This shows that S is surjective, and hence rank(S) = dim(Pn ) = n + 1. 8. This problem has a typo; the operator should be defined on Z35 , not Z33 (note the 4 appearing in the definition of L).
If we solve the problem as written, we interpret 4 as equal to 1 in Z3 . Note also that −2 = 1 in Z3 , so the definition of L can be written as L(x) = (x1 + x2 + x3 , x1 + x2 + x3 , x1 + x2 + x3 ). We then see that L(x) = (x1 + x2 + x3 )(1, 1, 1) for all x ∈ Z33 , that is, R(L) = sp{(1, 1, 1)}. It follows that rank(L) = 1. Also, L(x) = 0 if and only if x1 + x2 + x3 = 0, which holds if and only if x1 = 2x2 + 2x3 , or x = (2α + 2β, α, β) for α, β ∈ Z3 . Therefore, ker(L) = sp{(2, 1, 0), (2, 0, 1)} and nullity(L) = 2.

If we define L : Z35 → Z35 , then we have −2 = 3 and L(x) = (x1 + x2 + x3 , x1 + 3x2 + x3 , x1 + 4x2 + x3 ). We can manipulate this to show that L(x) = (x1 + x3 )(1, 1, 1) + x2 (1, 3, 4). It follows that R(L) = sp{(1, 1, 1), (1, 3, 4)} and therefore rank(L) = 2. The equation L(x) = 0 is equivalent to the system x1 + x2 + x3 = 0, x1 + 3x2 + x3 = 0, x1 + 4x2 + x3 = 0, which reduces under Gaussian elimination to x1 = 4x3 , x2 = 0. Therefore, every solution of L(x) = 0 is of the form x = α(4, 0, 1), ker(L) = sp{(4, 0, 1)}, and nullity(L) = 1.

9. Let p be a prime number. Given a positive integer n, define T : Zp^{n+1} → Pn (Zp ) by T (c) = c1 + c2 x + c3 x2 + · · · + cn+1 xn . We wish to determine the values of n for which T is an isomorphism. We know that dim(Zp^{n+1}) = n + 1, while by Exercises 2.6.11 and 2.7.13, dim(Pn (Zp )) = n + 1 if p ≥ n + 1 and dim(Pn (Zp )) = p if p < n + 1. Therefore, if p < n + 1, that is, if n > p − 1, then T is not an isomorphism because dim(Pn (Zp )) ≠ dim(Zp^{n+1}). On the other hand, if n ≤ p − 1, then Exercise 2.6.11 shows that {1, x, . . . , xn } is a basis for Pn (Zp ), and it follows that T is an isomorphism (equivalent to the isomorphism E in the proof of Theorem 76).

10. We wish to determine if the following statement is a theorem: Let X and U be vector spaces over a field F , and let T : X → U be linear. Then {x1 , x2 , . . . , xn } ⊂ X is linearly independent if and only if {T (x1 ), T (x2 ), . . . , T (xn )} ⊂ U is linearly independent. In fact, this is not a theorem. We know that if {T (x1 ), T (x2 ), . . . , T (xn )} is linearly independent, then {x1 , x2 , . . . , xn } is linearly independent (see the proof of Theorem 99). However, the converse is not true. For an extreme example, let X be any nontrivial vector space, let {x1 , x2 , . . . , xn } be a linearly independent subset of X, and define T : X → X by T (x) = 0 for all x ∈ X. Then T is linear but {T (x1 ), T (x2 ), . . . , T (xn )} is certainly not linearly independent since T (xi ) = 0 for all i = 1, 2, . . . , n. To generate more examples, let T : X → U be any singular linear operator and let {x1 , x2 , . . . , xn } be a basis for ker(T ). Then, once again, {T (x1 ), T (x2 ), . . . , T (xn )} fails to be linearly independent.
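The finite-field computations in Exercise 8 can be confirmed by brute force. The sketch below (an illustrative check, with the operator on Z35 written out explicitly) enumerates all 125 vectors and counts the kernel and range, giving nullity 1 and rank 2 as found above.

from itertools import product

p = 5
def L(x):
    x1, x2, x3 = x
    return ((x1 + x2 + x3) % p, (x1 + 3*x2 + x3) % p, (x1 + 4*x2 + x3) % p)

domain = list(product(range(p), repeat=3))
kernel = [x for x in domain if L(x) == (0, 0, 0)]
image = {L(x) for x in domain}

# |ker L| = 5 and |R(L)| = 25, so nullity(L) = 1 and rank(L) = 2,
# in agreement with ker(L) = sp{(4, 0, 1)} and R(L) = sp{(1, 1, 1), (1, 3, 4)}.
print(len(kernel), len(image))
assert (4, 0, 1) in kernel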
11. Suppose X and U are vector spaces over a field F , with U finite-dimensional, and L : X → U is linear. Let {u1 , u2 , . . . , um } be a basis for U and assume that, for each j, L(x) = uj has a solution x ∈ X. We wish to prove that L is surjective. For each j = 1, 2, . . . , m, let xj ∈ X satisfy L(xj ) = uj . Given any u ∈ U , there exist α1 , α2 , . . . , αm such that u = α1 u1 + α2 u2 + · · ·+ αm um . Define x = α1 x1 + α2 x2 + · · ·+ αm xm ; then L(x) = L(α1 x1 + α2 x2 + · · · + αm xm ) = α1 L(x1 ) + α2 L(x2 ) + · · · + αm L(xm ) = α1 u1 + α2 u2 + · · · + αm um = u. Thus, for each u ∈ U , there exists x ∈ X such that L(x) = u. This proves that L is surjective. 12. Let S : Pn−1 → Pn be defined by
x
(S(p))(x) = 0 2
p(t) dt = a0 x +
a1 2 an−1 n x + ...+ x . 2 n
Obviously R(S) = sp{x, x , . . . , x }. Since Pn = sp{1, x, x2 , . . . , xn }, every element of Pn can be written in the form q(x) = c0 + c1 x + · · · + cn xn . n
We see that q ∈ R(S) if c0 = 0 and q ∈ R(S) if c0 = 0. 13. (a) Suppose X and U are finite-dimensional vector spaces over a field F and T : X → U is an injective linear operator. Then R(T ) is a subspace of U , and we can define T1 : X → R(T ) by T1 (x) = T (x) for all x ∈ X. Obviously T1 is surjective, and it is injective since T is injective. Thus T1 is an isomorphism between X and R(T ). (b) Consider the operator S : Pn−1 → Pn of Example 100 (and the previous exercise), which we have seen to be injective. By the previous part of this exercise, S defines an isomorphism between Pn−1 and R(S) = sp{x, x2 , . . . , xn } ⊂ Pn . 14. Let V be a vector space over a field F , and let S, T be subspaces of V . Define a linear operator L : S × T → S + T by L((s, t)) = s + t. (a) We wish to prove that ker(L) is isomorphic to S ∩ T . Let us define M : S ∩ T → ker(L) by M (s) = (s, −s). Obviously M is well-defined since (s, −s) really does belong to ker(L). Also, it is straightforward to show that M is linear (we omit the details). If s1 , s2 ∈ S ∩T and M (s1 ) = M (s2 ), then (s1 , −s1 ) = (s2 , −s2 ), and therefore s1 = s2 . Thus M is injective. Now suppose (s, t) ∈ ker(L). Then s + t = 0, or t = −s, and hence (s, t) = (s, −s). If we prove that s ∈ S ∩ T , then it will follow that M (s) = (s, t), and hence that M is surjective. But s ∈ S, t ∈ T by assumption, and since T is a subspace, it follows that −t also belongs to T . Since s = −t, we see that s ∈ S, s = −t ∈ T ; that is, s ∈ S ∩ T . This completes the proof that M is surjective. Since M is both injective and surjective, it is bijective, and hence an isomorphism. (b) Suppose S ∩ T = {0}. We wish to prove that S × T is isomorphic to S + T by proving that L is an isomorphism. Since ker(L) is isomorphic to S ∩ T = {0}, it follows that ker(L) is trivial and hence that L is injective. Also, given any s + t ∈ S + T , we have s + t = L((s, t)), and therefore L is surjective. Thus L is bijective and hence an isomorphism. 15. Let K : C[c, d] → C[a, b] be an integral operator of the form d K(x) = y, y(s) = k(s, t)x(t) dt, a ≤ s ≤ t c
Suppose the kernel k has the special form

k(s, t) = Σ_{i=1}^{N} fi (s)gi (t),
where fi ∈ C[a, b], gi ∈ C[c, d] for i = 1, 2, . . . , N . Then, given any x ∈ C[c, d], we have

(K(x))(s) = ∫_c^d k(s, t)x(t) dt = ∫_c^d ( Σ_{i=1}^{N} fi (s)gi (t) ) x(t) dt = Σ_{i=1}^{N} ∫_c^d fi (s)gi (t)x(t) dt = Σ_{i=1}^{N} ( ∫_c^d gi (t)x(t) dt ) fi (s).
This shows that K(x) ∈ sp{f1 , f2 , . . . , fN }. Hence R(K) ⊂ sp{f1 , f2 , . . . , fN } and rank(K) = dim(R(K)) ≤ N . Notice that the above argument does not automatically prove that R(K) = sp{f1 , f2 , . . . , fN }. However, this is true if {g1 , g2 , . . . , gN } is linearly independent. We will sketch a proof, although a complete proof needs material from later sections. To show that R(K) = sp{f1 , f2 , . . . , fN }, we need to show that, given any c1 , c2 , . . . , cN ∈ R, there exists x ∈ C[c, d] such that

∫_c^d gi (t)x(t) dt = ci , i = 1, 2, . . . , N.

In fact, we can show that there exists x ∈ sp{g1 , g2 , . . . , gN } such that this condition holds. It is convenient to define T : sp{g1 , g2 , . . . , gN } → RN by

(T (x))i = ∫_c^d gi (t)x(t) dt, i = 1, 2, . . . , N.
Then we must show that T is surjective. In fact, T is bijective, which follows from Theorem 273 in Section 6.2. Assuming this fact, we obtain R(K) = sp{f1 , f2 , . . . , fN }. Then rank(K) = N holds if {f1 , f2 , . . . , fN } is linearly independent. Thus, rank(K) = N holds if both {g1 , g2 , . . . , gN } and {f1 , f2 , . . . , fN } are linearly independent. 16. Every real number x can be written uniquely in a decimal expansion of the form x = . . . xk xk−1 . . . x0 .x−1 x−2 . . . , where each digit xi belongs to {0, 1, 2, . . . , 9}, xi = 0 for only finitely many positive integers i, and xi = 0 for infinitely many negative integers i. Define f : R2 → R by f (x, y) = . . . xk yk xk−1 yk−1 . . . x0 y0 .x−1 y−1 x−2 y−2 . . . . We wish to prove that f is a bijection. We notice that for any (x, y) ∈ R2 , z = f (x, y) has digits if k is even, yk/2 , zk = x(k−1)/2 , if k is odd. Given z ∈ R, then, we define (x, y) ∈ R2 by xk = z2k+1 , yk = z2k , k ∈ Z. Then f (x, y) = z, which shows that f is surjective. Also, f (x, y) = f (u, v) if and only if x, u have the same decimal digits and y, v have the same decimal digits, that is, if and only if x = u and y = v. Thus f is injective.
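The rank bound in Exercise 15 can be illustrated by discretizing a degenerate kernel; in the sketch below, the choices N = 2, f1 (s) = 1, f2 (s) = s, g1 (t) = 1, g2 (t) = t and the interval [0, 1] are illustrative assumptions rather than data from the exercise. The numerical rank of the discretized operator is 2, consistent with rank(K) ≤ N .

import numpy as np

# Degenerate kernel k(s, t) = f1(s)g1(t) + f2(s)g2(t) on [0, 1] x [0, 1].
n = 50
s = np.linspace(0.0, 1.0, n)
t = np.linspace(0.0, 1.0, n)
w = np.full(n, 1.0 / n)                      # simple quadrature weights

f = [np.ones_like(s), s]                     # f1(s) = 1, f2(s) = s
g = [np.ones_like(t), t]                     # g1(t) = 1, g2(t) = t
k = sum(np.outer(fi, gi) for fi, gi in zip(f, g))

# Discretized operator: (Kx)(s_i) ~ sum_j k(s_i, t_j) x(t_j) w_j.
K = k * w

# Numerical rank is at most N = 2, matching rank(K) <= N.
singular_values = np.linalg.svd(K, compute_uv=False)
numerical_rank = int(np.sum(singular_values > 1e-10 * singular_values[0]))
print(numerical_rank)                        # prints 2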
3.6 The fundamental theorem; inverse operators
1. Let X and U be n-dimensional vector spaces over a field F , and let T : X → U be linear. We wish to show that T is surjective if and only if it is injective. We know that T is surjective if and only if R(T ) = U , which is equivalent to rank(T ) = dim(R(T )) = dim(U ) = n. Also, T is injective if and only if ker(T ) = {0}, which is equivalent to nullity(T ) = 0. By the fundamental theorem, we know that rank(T ) + nullity(T ) = n, which shows that rank(T ) = n if and only if nullity(T ) = 0, that is, T is surjective if and only if it is injective. 2. Let X and U be vector spaces over a field F , and let T : X → U be an invertible linear operator. We wish to show that T −1 is also linear. Let u, v ∈ U . Since T is surjective, there exist x, y ∈ X with T (x) = u,T (y) = v, namely, x = T −1 (u), y = T −1 (v). By linearity of T , it follows that T (x + y) = u + v, and hence that T −1 (u + v) = x + y = T −1 (u) + T −1 (v). Thus T −1 satisfies the first property of linearity. Now suppose α ∈ R and u ∈ U . Define x = T −1 (u), so that T (x) = u. Then, by linearity of T , T (αx) = αu, which implies that T −1 (αu) = αx = αT −1 (u). This proves that the second property of linearity also holds. 3. Let F be a field, let A ∈ F n×n , and let T : F n → F n be defined by T (x) = Ax. We first wish to show that A is invertible if and only if T is invertible. We begin by noting that if M ∈ F n×n and M x = x for all x ∈ F n , then M = I. This follows because each linear operator mapping F n into F n is represented by a unique matrix (with respect to the standard basis on F n ). The condition M x = x for all x ∈ F n shows that M represents the identity operator, as does the identity matrix I; hence M = I. Now suppose that T is invertible, and let B be the matrix of T −1 under the standard basis. We then have (AB)x = A(Bx) = T (T −1(x)) = x for all x ∈ F n ⇒ AB = I and
(BA)x = B(Ax) = T −1 (T (x)) = x for all x ∈ F n ⇒ BA = I.
This shows that A is invertible and that B = A−1 . Conversely, suppose that A is invertible, and define S : F n → F n by S(x) = A−1 x. Then S(T (x)) = A−1 (Ax) = (A−1 A)x = Ix = x for all x ∈ F n and
T (S(x)) = A(A−1 x) = (AA−1 )x = Ix = x for all x ∈ F n .
This shows that T is invertible and that S = T −1 . Notice that the above also shows that if A is invertible, then A−1 is the matrix defining T −1 . 4. Let F be a field and suppose A ∈ F n×n , B ∈ F n×n satisfy BA = I. We wish to show that A is invertible and B = A−1 . We first note that Ax = 0 ⇒ B(Ax) = B0 ⇒ (BA)x = 0 ⇒ Ix = 0 ⇒ x = 0. Therefore, A defines an injective linear operator and hence, by the fundamental theorem, an invertible linear operator. But then, by Theorem 115, A is invertible. Also, BA = I ⇒ (BA)A−1 = IA−1 ⇒ B(AA−1 ) = A−1 ⇒ BI = A−1 ⇒ B = A−1 . This completes the proof.
5. Suppose A, B ∈ Rn×n are both invertible. We wish to prove that AB is also invertible and (AB)−1 = B −1 A−1 . This is a direct verification of the definition of invertibility for matrices: (B −1 A−1 )(AB) = B −1 (A−1 A)A = B −1 IA = B −1 B = I, (AB)(B −1 A−1 ) = A(BB −1 )A−1 = AIA−1 = AA−1 = I. By Definition 114, this completes the proof. 6. Let M : Pn → Pn+1 be defined by (M (p))(x) = xp(x). We wish to determine if M is surjective and find R(M ) if it is not. Similarly, we wish to determine if M is injective and find ker(M ) if it is not. We know that M cannot be surjective, since the dimension of the domain is less than the dimension of the co-domain (cf. Theorem 99). In Exercise 3.5.6, we saw that R(M ) = sp{x, x2 , . . . , xn+1 } (hence rank(M ) = n + 1). The fundamental theorem now implies that nullity(M ) = 0, and hence M is injective. 7. We repeat the previous exercise for the operators defined below. (a) M : R2 → R3 defined by M (x) = Ax, where ⎤ 1 1 A = ⎣ 1 0 ⎦. 0 1 ⎡
Since the dimension of the domain is less than the dimension of the co-domain, Theorem 99 implies that M is not surjective. The range of M is spanned by the columns of A (which are linearly independent), so R(M ) = sp{(1, 1, 0), (1, 0, 1)} and rank(M ) = 2. By the fundamental theorem, we see that nullity(M ) = 0; thus ker(M ) is trivial and M is injective.
(b) M : R3 → R2 defined by M (x) = Ax, where

A = [ 1 2 1
      1 0 −1 ].
Since the dimension of the domain is greater than the dimension of the co-domain, Theorem 93 implies that M cannot be injective. A direct calculation shows that ker(M ) = sp{(1, −1, 1)}, and hence nullity(M ) = 1. By the fundamental theorem, we see that rank(M ) = 2 = dim(R2 ). Hence M must be surjective. 8. We now repeat Exercise 6 for the following operators: (a) M : Z23 → Z33 defined by M (x) = Ax, where
A = [ 1 2
      2 1
      1 2 ].
Since the dimension of the domain is less than the dimension of the co-domain, Theorem 99 implies that M is not surjective. Moreover, the second column of A is the additive inverse of the first, so the range of M is spanned by a single vector, the first column of A: R(M ) = sp{(1, 2, 1)}. Then rank(M ) = 1 and the fundamental theorem implies that ker(M ) has dimension 1, so M is not injective either. A direct calculation shows that ker(M ) = sp{(1, 1)}.
(b) M : Z33 → Z23 defined by M (x) = Ax, where

A = [ 1 0 2
      1 2 1 ].
Since the dimension of the domain is greater than the dimension of the co-domain, Theorem 93 implies that M cannot be injective. A direct calculation shows that ker(M ) = sp{(1, 2, 1)}, and hence nullity(M ) = 1. The fundamental theorem then implies that rank(M ) = 2 = dim(Z23 ), and hence M must be surjective. 9. Let X and U be finite-dimensional vector spaces over a field F , let X = {x1 , x2 , . . . , xn } and U = {u1 , u2 , . . . , um } be bases for X and Y respectively, and let T : X → U be linear. We wish to prove that T is invertible if and only if [T ]X ,U is an invertible matrix. We will write A = [T ]X ,U ∈ F m×n . Notice that under either hypothesis, m = n must hold, and we will assume this in what follows. First suppose that T is invertible, and let B = [T −1 ]U ,X . For all x ∈ X, we have (BA)[x]X ) = B(A[x]X ) = B[T (x)]U = [T −1 (T (x))]X = [x]X . For all u ∈ F n , there exists x ∈ X with [x]X = u, and therefore we have (BA)u = u for all u ∈ F n . Thus BA = I, which shows (by Theorem 118) that A is invertible (and B = A−1 ). Conversely, suppose A is invertible, and define S : U → X by [S(u)]X = A−1 [u]U for all u ∈ U. (In other words, S(u) = v1 x1 + · · · + vn xn , where v ∈ F n is defined by v = A−1 [u]U .) For all x ∈ X, we have [S(T (x))]X = A−1 [T (x)]U = A−1 (A[x]X ) = (A−1 A)[x]X = I[x]X = [x]X . This in turn implies that S(T (x)) = x for all x ∈ X. A similar calculation shows that T (S(u)) = u for all u ∈ U , and hence T is invertible (and S = T −1 ). 10. Suppose X and U are n-dimensional vector spaces over a field F , and let T : X → U be linear. (a) If there exists S : U → X such that S(T (x)) = x for all x ∈ X, then T is invertible and S = T −1 . (b) If there exists S : U → X such that T (S(u)) = u for all u ∈ U , then T is invertible and S = T −1 . Both results follow from Theorem 73 once we know that T is invertible. Suppose first that S(T (x)) = x for all x ∈ X. Then, if x1 , x2 ∈ X and T (x1 ) = T (x2 ), we see that S(T (x1 )) = S(T (x2 )) and hence that x1 = x2 . Therefore T is injective and by Corollary 105, T is invertible. Next suppose that T (S(u)) = u for all u ∈ U . Then, given any u ∈ U , T (x) = u for x = S(u). It follows that T is surjective, and once again, Corollary 105 implies that T is invertible. This completes the proof. 11. We wish to determine whether the following statement is true or not. Let X and U be vector spaces over a field F , where X and U are finite-dimensional, and let T : X → U be linear. Then T is nonsingular if and only if T has full rank. In fact, this statement is true. We know that T is nonsingular if and only if nullity(T ) = 0, while T has full rank if and only if rank(T ) = dim(X). By the fundamental theorem, rank(T ) + nullity(T ) = dim(X), which shows that rank(T ) = dim(X) if and only if nullity(T ) = 0. 12. Let X and U be vector spaces over a field F , let X be finite-dimensional, and let T : X → U be linear. We wish to prove that rank(T ) + nullity(T ) = dim(X). Choose vectors x1 , . . . , xk in X such that {T (x1 ), . . . , T (xk )} is a basis for R(T ). We know that {x1 , . . . , xk } must be linearly independent (see the proof of Theorem 99). Next choose a basis {y1 , . . . , y } for ker(T ). We will show that {x1 , . . . , xk , y1 , . . . , y } is a basis for X; then it follows that dim(X) = k + = rank(T ) + nullity(T ),
and the proof will be complete. First we show that {x1 , . . . , xk , y1 , . . . , y } is linearly independent. Let x ∈ ker(T ) ∩ sp{x1 , . . . , xk }, say x = c1 x1 + · · · + ck xk . Then, since x ∈ ker(T ), T (x) = 0 and hence c1 T (x1 ) + · · · + ck T (xk ) = T (c1 x1 + · · · + ck xk ) = 0. Since {T (x1 ), . . . , T (xk )} is linearly independent, this implies that c1 = . . . = ck = 0, that is, x = 0. Therefore, ker(T ) ∩ sp{x1 , . . . , xk } = {0}. Now, for any scalars α1 , . . . , αk , β1 , . . . , β ∈ F , α1 x1 + · · · + αk xk + β1 y1 + · · · + β y = 0 ⇒ T (α1 x1 + · · · + αk xk + β1 y1 + · · · + β y ) = 0 ⇒ T (α1 x1 + · · · + αk xk ) = −T (β1 y1 + · · · + β y ) = 0 ⇒ α1 x1 + · · · + αk xk ∈ ker(T ) ⇒ α1 x1 + · · · + αk xk = 0 ⇒ α1 = . . . αk = 0. We then see that β1 y1 + · · · + β y = 0, which implies that β1 = · · · = β = 0 as well. Therefore, {x1 , . . . , xk , y1 , . . . , y } is linearly independent. Now suppose z ∈ X. We have T (z) ∈ R(T ), and therefore, for some scalars α1 , . . . , αk ∈ F , T (z) = α1 T (x1 ) + · · · + αk T (xk ) ⇒ T (z − (α1 x1 + · · · + αk xk )) = 0 ⇒ z − (α1 x1 + · · · + αk xk ) ∈ ker(T ) ⇒ z − (α1 x1 + · · · + αk xk ) = β1 y1 + · · · + β y for some β1 , . . . , β ∈ F . But then z = α1 x1 + · · · + αk xk + β1 y1 + · · · + β y , and we have shown that {x1 , . . . , xk , y1 , . . . , y } spans X. 13. Let F be a field and suppose A ∈ F m×n , B ∈ F n×p . We wish to show that that rank(AB) ≤ rank(A). For all x ∈ F p , we have (AB)x = A(Bx), where Bx ∈ F n . It follows that (AB)x ∈ R(A) for all x ∈ F p , which shows that R(AB) ⊂ R(A). Hence, by Exercise 2.6.13, dim(R(AB)) ≤ dim(R(A)), that is, rank(AB) ≤ rank(A). 14. Consider the following initial value problem: Find x ∈ C 1 (a, b) satisfying x (t) + p(t)x(t) = f (t), a < t < b,
(3.1)
x(t0 ) = 0.
(3.2)
Here p ∈ C(a, b), f ∈ C(a, b), and a < t0 < b. We will consider p and t0 fixed throughout this problem. (a) Let us define S = {x ∈ C 1 (a, b) : x(t0 ) = 0} and L : S → C(a, b) by L(x) = x + px. Notice that S is a subspace of C 1 (a, b). Then the IVP (3.1–3.2) is equivalent to L(x) = f . (b) It is easy to verify that L is linear: L(x + y) = (x + y) + p(x + y) = x + y + px + py = (x + px) + (y + py) = L(x) + L(y), L(αx) = (αx) + p(αx) = αx + αpx = α(x + px) = αL(x). Since, by the existence and uniqueness theorem for IVPs, L(x) = f has a unique solution x for each f ∈ C(a, b), it follows that L is invertible, and hence x = L−1 (f ). By Theorem 108, L−1 is linear, and hence the mapping from f to x is linear.
(c) If we now change the initial condition to x(t0 ) = x0 , where x0 ≠ 0, then the relationship between f and x is no longer linear. To see this, we first find y ∈ C 1 (a, b) such that y′(t) + p(t)y(t) = 0 for all t ∈ (a, b) and y(t0 ) = x0 . Then, for any f ∈ C(a, b), if x̂ = L−1 (f ), then the solution of x′(t) + p(t)x(t) = f (t), a < t < b, x(t0 ) = x0 is x = x̂ + y = L−1 (f ) + y (this is easy to verify directly). Thus we see that the mapping from f to x is not linear in this case, but rather "affine" (linear plus a constant), comparable to F : Rn → Rm defined by F (x) = Ax + b for some A ∈ Rm×n , b ∈ Rm , b ≠ 0 (cf. Exercise 3.1.1).

15. Suppose A ∈ Cn×n is strictly diagonally dominant:

|Aii | > Σ_{j≠i} |Aij |, i = 1, 2, . . . , n.

We wish to prove that A is nonsingular. Suppose x ∈ Cn satisfies x ≠ 0 and Ax = 0, and let i be the index with the property that |xi | ≥ |xk | for all k = 1, 2, . . . , n. Then

(Ax)i = Σ_{j=1}^{n} Aij xj = Aii xi + Σ_{j≠i} Aij xj .

Since (Ax)i = 0, we obtain

Aii xi = − Σ_{j≠i} Aij xj .

But

| Σ_{j≠i} Aij xj | ≤ Σ_{j≠i} |Aij ||xj | ≤ Σ_{j≠i} |Aij ||xi | = |xi | Σ_{j≠i} |Aij | < |xi ||Aii |.
This is a contradiction, which shows that x cannot be nonzero. Thus the only solution of Ax = 0 is x = 0, which shows that A must be nonzero. 16. We wish to prove the following theorem: Let X and U be vector spaces over a field F , and let T : X → U be linear. (a) There exists a left inverse S of T if and only if T is injective (and hence nonsingular). (b) There exists a right inverse S of T if and only if T is surjective. First, suppose that there exists a left inverse S : U → X of T . Then x1 , x2 ∈ X, T (x1 ) = T (x2 ) ⇒ S(T (x1 )) = S(T (x2 )) ⇒ x1 = x2 . Thus T is injective. Conversely, suppose T is injective. Let {x1 , . . . , xn } be a basis for X; then {T (x1 ), . . . , T (xn )} is a basis for R(T ) (cf. the proof of Theorem 93). We extend {T (x1 ), . . . , T (xn )} to a basis {T (x1 ), . . . , T (xn ), un+1 , . . . , um }
of U and define S : U → X by
S(α1 T (x1 ) + · · · + αn T (xn ) + αn+1 un+1 + · · · + αm um ) = α1 x1 + · · · + αn xn . It is easy to prove that S is linear. For any x = α1 x1 + · · · + αn xn in X, we have S(T (α1 x1 + · · · + αn xn )) = S(α1 T (x1 ) + · · · + αn T (xn )) = α1 x1 + · · · + αn xn by definition of S. Thus S is a left inverse of T . Suppose that there exists a right inverse S : U → X of T . Then, for any u ∈ U , x = S(u) satisfies T (x) = T (S(u)) = u. This shows that T is surjective. Conversely, suppose T is surjective. Let {u1 , . . . , um } be a basis for U , and choose x1 , . . . , xm ∈ X such that T (xi ) = ui , i = 1, . . . , m. It follows that {x1 , . . . , xm } must be linearly independent (cf. the proof of Theorem 99). Extend {x1 , . . . , xm } to a basis {x1 , . . . , xm , xm+1 , . . . , xn } of X. Now defined S : U → X by S(α1 u1 + · · · + αm um ) = α1 x1 + · · · + αm xm . Then S is a right inverse of T : T (S(α1 u1 + · · · + αm um )) = T (α1 x1 + · · · + αm xm ) = α1 T (x1 ) + · · · + αm T (xm ) = α1 u1 + · · · + αm um . Since every u ∈ U can be represented in the form u = α1 u1 + · · · + αm um , this proves the desired result. 17. We wish to prove the following theorem: Let A ∈ F m×n , where F is a field. (a) There exists a left inverse B of A if and only if N (A) is trivial. (b) There exists a right inverse B of A if and only if col(A) = F m . This follows from the previous exercise. Let T : F n → F m be defined by T (x) = Ax. If A has a left inverse B, define S : F m → F n by S(u) = Bu. Then S is a left inverse of T , as is easy to verify, hence T is injective, which implies that N (A) = ker(T ) is trivial. Conversely, if N (A) is trivial, then T is injective, which implies that T has a left inverse S : F m → F n . Let B be the matrix representing S (with respect to the standard bases). Then (BA)x = B(Ax) = S(T (x)) = x for all x ∈ F n , which shows that BA = I and hence that B is a left inverse of A. The proof of the second part of the theorem follows similarly from the second part of the previous exercise. 18. Let F be a field and suppose A ∈ F n×n . (a) We wish to prove that if there exists a matrix B ∈ F n×n such that AB = I, where I ∈ F n×n is the identity matrix, then A is invertible and B = A−1 . The hypothesis means that B is a right inverse of A, which implies by the previous exercise that col(A) = F n , that is, that rank(A) = n. Hence A is invertible (see page 137 of the text). Moreover, B = A−1 (just multiply both sides of AB = I on the left by A−1 ). (b) Next, we show that if there exists a matrix B ∈ F n×n such that BA = I, where I ∈ F n×n is the identity matrix, then A is invertible and B = A−1 . The hypothesis means that B is a left inverse of A, which means that N (A) is trivial and hence A is invertible. Once again, we obtain B = A−1 , this time by multiplying both sides of BA = I on the right by A−1 . 19. Consider the operators D : Pn → Pn−1 and S : Pn−1 → Pn defined by x D(p) = p , (S(p))(x) = p(t) dt. 0
By the fundamental theorem of calculus,

(D(S(p)))(x) = d/dx ∫_0^x p(t) dt = p(x)

for all p ∈ Pn−1 , and hence D is a left inverse of S. On the other hand,

(S(D(p)))(x) = ∫_0^x p′(t) dt = p(x) − p(0).
Thus S(D(p)) = p unless p(0) = 0, which does not hold for all p ∈ Pn . Thus D is not a right inverse of S. 20. Let M : Pn → Pn+1 be the operator defined in Exercise 6. (a) Since M is injective, it has a left inverse, namely, L : Pn+1 → Pn by L(a0 + a1 x + · · · + an+1 xn+1 ) = a1 + a2 x + · · · + an+1 xn . (b) The operator M is not surjective, so it has no right inverse. 21. (a) Let M : R2 → R3 be the operator of Exercise 7(a). Since M is not surjective, it has no right inverse. However, M is injective, so it does have a left inverse. We can construct it using the reasoning in the solution of Exercise 16(a). We see that {(1, 1, 0), (1, 0, 1)} is a basis for col(M ), and we extend this to the basis {(1, 1, 0), (1, 0, 1), (0, 0, 1)} of R3 . Let B11 B12 B13 . B= B21 B22 B23 We wish to find the six entries of B so that ⎤ ⎡ ⎤ ⎡ ⎤ ⎡ 0 1 1 0 0 1 ⎦ ⎣ ⎦ ⎣ ⎦ ⎣ . , B 0 = , B 0 = B 1 = 0 1 0 1 1 0 This gives us six (simple) equations for the six unknown entries of B, and we easily find 0 1 0 . B= 1 −1 0 Thus a left inverse L : R3 → R2 of M is defined by L(x) = Bx. (We can verify this by verifying that BA = I.) (b) Now let M : R3 → R2 be the operator of Exercise 7(b). Since M is not injective, it has no left inverse. However, M is surjective, and hence it does have a right inverse. We can construct it using the reasoning in the solution of Exercise 16(b). It is easy to find x1 , x2 ∈ R3 such that M (x1 ) = e1 , M (x2 ) = e2 , namely, x1 = (0, 1/2, 0), x2 = (1, −1/2, 0). We then define L : R3 → R2 by L(e1 ) = x1 , L(e2 ) = x2 , that is, L(u) = Bu, where ⎡ ⎤ 0 1 B = ⎣ 12 − 12 ⎦ . 0 0 Then L is a right inverse of M (we can verify this by verifying that AB = I). 22. Let X, Y , and Z be vector spaces over a field F , and let T : X → Y and S : Y → Z be linear operators. (a) Suppose S and T have left inverses R and P , respectively (that is, P (T (x)) = x for all x ∈ X and R(S(y)) = y for all y ∈ Y ). Then P R is a left inverse for ST : (P R)(ST )(x) = P (R(S(T (x)))) = P (T (x)) = x for all x ∈ X.
(b) Now suppose that S and T have right inverses R and P , respectively (so that T (P (y)) = y for all y ∈ Y and S(R(z)) = z for all z ∈ Z). Then P R is a right inverse for ST : (ST )(P R)(z) = S(T (P (R(z)))) = S(R(z)) = z for all z ∈ Z.

In the following exercises, we assume that X is a finite-dimensional vector space over a field F , and X = {x1 , . . . , xn } and Y = {y1 , . . . , yn } are two bases for X.

23. Since Y is a basis for X, each of the basis vectors xj ∈ X can be written uniquely as

xj = Σ_{i=1}^{n} Cij yi

for scalars C1j , C2j , . . . , Cnj ∈ F . Let C ∈ F n×n be defined by the scalars Cij , i, j = 1, 2, . . . , n. We wish to show that for all x ∈ X, C[x]X = [x]Y . Let us write α = [x]X . Then

x = Σ_{j=1}^{n} αj xj = Σ_{j=1}^{n} αj ( Σ_{i=1}^{n} Cij yi ) = Σ_{i=1}^{n} ( Σ_{j=1}^{n} Cij αj ) yi = Σ_{i=1}^{n} (Cα)i yi .

But x = Σ_{i=1}^{n} (Cα)i yi implies that Cα = [x]Y , which is what we wanted to prove.
24. Since the change of coordinate matrix C is square, we can prove that it is invertible by proving that N (C) is trivial. Suppose Cα = 0 and define x = Σ_{j=1}^{n} αj xj . Then [x]Y = Cα = 0, which shows that x = 0 · y1 + · · · + 0 · yn = 0. This in turn shows that α1 = . . . = αn = 0 (since {x1 , . . . , xn } is linearly independent). Therefore C is nonsingular and hence invertible. Note that this implies that [x]X = C −1 [x]Y for all x ∈ X.
−1
BC or, equivalently, B = CAC −1 .
26. Now suppose L : X → U is linear, where X and U are two finite-dimensional vector spaces. Suppose X has bases {x1 , . . . , xn } and Y = {y1 , . . . , yn }, while U has bases U = {u1 , . . . , um } and V = {v1 , . . . , vm }. Let C and D be the change of coordinate matrices from X to Y and from U to V, respectively, so that C[x]X = [x]Y for all x ∈ X, D[u]U = [u]V for all u ∈ U. Then, for all x ∈ X, B[x]Y = [L(x)]V ⇒ BC[x]X = D[L(x)]U ⇒ D−1 BC[x]X = [L(x)]U , which implies that A = D−1 BC. Equivalently, B = DAC −1 .
3.7 Gaussian elimination
1. Suppose A ∈ F m×n , and row reduction, applied to Ax = 0, produces a general solution x = α1 v1 + α2 v2 + . . . + αk vk , where xi1 , xi2 , . . . , xik are the free variables, the parameter αj corresponds to xij (that is, xij = αj ), and v1 , v2 , . . . , vk ∈ F n . We wish to prove that {v1 , v2 , . . . , vk } is linearly independent. We will write xp1 , xp2 , . . . , xp for the remaining (non-free, that is, constrained) variables (so k + = n must hold). We will assume that Gaussian elimination with back substitution (in augmented matrix form) reduces [A|0] to [M |0], where any rows of zeros have been dropped. Then M has rows, one corresponding to each constrained variable. We will index the rows of M by ip1 , . . . , ip (instead of by 1, . . . , ). Now, Ax = 0 if and only if M x = 0, and this last equation is solved to yield k
xpq = − Σ_{j=1}^{k} Mpq ,ij xij , q = 1, . . . , ℓ.

This expresses the constrained variables in terms of the free variables. We then define xij = αj , j = 1, . . . , k, which yields

xpq = − Σ_{j=1}^{k} Mpq ,ij αj , q = 1, . . . , ℓ,
and from this we can find the vectors v1 , . . . , vk that span the null space of A: ⎧ 1, r = ij , ⎨ 0, r = i1 , . . . , ik , r = ij , (vj )r = ⎩ −Mpq ,ij , r = p 1 , . . . , p . Then we see explicitly that, for each j = 1, . . . , k, exactly one of v1 , . . . , vk has a 1 for the ij th component and the rest have 0 for that component. Therefore, (α1 v1 + · · · + αk vk )ij = αj , and hence α1 v1 + · · · + αk vk = 0 implies that αj = 0 for each j = 1, . . . , k. Thus {v1 , . . . , vk } is linearly independent, and hence a basis for N (A). 2. For each given system of equation, we apply the row reduction algorithm. (a) The system has a unique solution, x = (1, 4, −2). The coefficient matrix A has rank 3 and nullity 0. (b) The system is inconsistent; Gaussian elimination reduces the augmented matrix to ⎡ ⎤ −2 −9 −12 1 8 ⎢ 0 −3 −12 −9 12 ⎥ ⎢ ⎥. ⎣ 0 0 −3 −3 3 ⎦ 0 0 0 0 40 The coefficient matrix A has rank 3 and nullity 1; a basis for N (A) is {(2, 1, −1, 1)}. (c) The system has infinitely many solutions; one is (0, 2, 0). The rank of the coefficient matrix A is 2 and the nullity is 1; a basis for N (A) is {(−3, −3, 1)}. (d) The system has infinitely many solutions; one is (−3, 0, 0). The rank of the coefficient matrix A is 1 and the nullity is 2; a basis for N (A) is {(3, 1, 0), (−1, 0, 1)}. (e) The system has infinitely many solutions; one is (−2, 0, 0, 0). The rank of the coefficient matrix A is 2 and the nullity is 2; a basis for N (A) is {(−5, 3, 1, 0), (−1, 0, 0, 1)}.
(f) The system has a unique solution, x = (4, −2, 1, 3). The coefficient matrix A has rank 4 and nullity 0.
(g) The system has infinitely many solutions; one is (1, 1, 0). The rank of the coefficient matrix A is 2 and the nullity is 1; a basis for N (A) is {(3, 0, 1)}.

3. The matrix A has no inverse; its rank is only 2.

4. The inverse of A is

A−1 = [ −19 1 −3
        −11 0 −2
          6 0  1 ].

5. The system has a unique solution x = (0, 1, 1) ∈ Z32 .

6. The inverse of A is

A−1 = [ 2 1 1
        1 1 1
        0 1 2 ].

7. The matrix A is not invertible; its rank is only 2.

8. The inverse of A is

A−1 = [ 1/5 − (2/5)i      1/10 + (3/10)i
        1/10 + (3/10)i    3/10 − (1/10)i ].

3.8 Newton's method
1. The two solutions of (3.13) are

( √((−1 + √5)/2), (−1 + √5)/2 )  and  ( −√((−1 + √5)/2), (−1 + √5)/2 ).

2. Consider the following system of nonlinear equations: x1^3 + x1 − x2 + 1 = 0, x2^2 − 3x2 − x1 − 2 = 0.
(a) The two equations are graphed in the figure shown below; we see that there are two solutions, approximately (1, 4) and (−1, −0.5).
(b) We apply Newton's method to solve F (x) = 0, where

F (x) = ( x1^3 + x1 − x2 + 1, x2^2 − 3x2 − x1 − 2 ),

to find that the two solutions are (1.18145, 3.83055), (−0.814090, −0.353621) (to six digits).

[Figure: graphs of the two equations on [−10, 10] × [−10, 10], intersecting near (1, 4) and (−1, −0.5).]
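A sketch of the computation in part (b): each Newton step solves J(x)s = −F (x), where J is the Jacobian of F , and the iteration is started from the two estimates read off the graph. (The code below is an illustration of the method, not the text's own program.)

import numpy as np

def F(x):
    x1, x2 = x
    return np.array([x1**3 + x1 - x2 + 1.0,
                     x2**2 - 3.0*x2 - x1 - 2.0])

def J(x):
    x1, x2 = x
    return np.array([[3.0*x1**2 + 1.0, -1.0],
                     [-1.0, 2.0*x2 - 3.0]])

def newton(x0, tol=1e-12, maxit=50):
    x = np.array(x0, dtype=float)
    for _ in range(maxit):
        s = np.linalg.solve(J(x), -F(x))   # Newton step: J(x) s = -F(x)
        x = x + s
        if np.linalg.norm(s) < tol:
            break
    return x

# Starting from the graphical estimates (1, 4) and (-1, -0.5):
print(newton([1.0, 4.0]))     # approx (1.18145, 3.83055)
print(newton([-1.0, -0.5]))   # approx (-0.814090, -0.353621)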
3. We apply Newton's method to solve F (x) = 0, where F : R3 → R3 is defined by

F (x) = ( x1^2 + x2^2 + x3^2 − 1, x1^2 + x2^2 − x3, 3x1 + x2 + 3x3 ).

There are two solutions: (−0.721840, 0.311418, 0.6180340), (−0.390621, −0.682238, 0.6180340) (to six digits).

4. We apply Newton's method to solve F (x) = 0, where F : R4 → R4 is defined by

F (x) = ( x1^2 + x2^2 + x3^2 + x4^2 − 1, x2 − x1^2, x3 − x2^2, x4 − x3^2 ).

There are two solutions: (±0.752437, 0.566161, 0.320538, 0.102745).

5. Newton's method for f (x) = 0, f : R → R, takes the form

xk+1 = xk − f (xk )/f ′(xk ), k = 0, 1, 2, . . . .

If f (x) = cos (x) − x and x0 = 0.5, Newton's method converges to x = 0.739085.
3.9 Linear ordinary differential equations
1. If x1 (t) = ert , x2 (t) = tert , where r ∈ R, then the Wronskian matrix is rt e 0 t0 ert0 W = . rert0 ert0 + rt0 ert0 Choosing t0 = 0, we obtain
W = [ 1 0
      r 1 ],
which is obvious nonsingular. Thus {ert , tert } is a linearly independent subset of C(R). 2. If x1 (t) = ert cos (θt), x2 (t) = ert sin (θt), where r, θ ∈ R, then the Wronskian matrix is ert0 cos (θt0 ) ert0 sin (θt0 ) . W = rert0 cos (θt0 ) − θert0 sin (θt0 ) rert0 sin (θt0 ) + θert0 cos (θt0 ) Choosing t0 = 0, we obtain
W = [ 1 0
      r θ ],
which is obvious nonsingular as long as θ = 0. Thus {ert cos (θt), tert sin (θt)} is a linearly independent subset of C(R). 3. (a) The general solution of x + 4x + 4x = 0 is x(t) = c1 e−2t + c2 te−2t .
(b) The general solution of x′′ + 2x′ + 2x = 0 is
x(t) = c1 e−t cos (t) + c2 e−t sin (t). (c) The general solution of x + 3x + 2x = 0 is x(t) = c1 e−t + c2 e−2t . 4. We have already seen that the general solution of x + 3x + 2x = 0 is x(t) = c1 e−t + c2 e−2t . Given that x(t) = sin (t) solves x + 3x + 2x = sin (t) + 3 cos (t), the general solution of this ODE is x(t) = sin (t) + c1 e−t + c2 e−2t . 5. We must find c1 , c2 such that x(t) = sin (t) + c1 e−t + c2 e−2t satisfies x(0) = 1, x (0) = 0. The solution is c1 = 1, c2 = 0, so x(t) = sin (t) + e−t solves the given IVP. 6. Let x1 , x2 , . . . , xn be the special solutions of αn x(n) + αn−1 x(n−1) + · · · + a1 x + α0 x = 0 (k)
satisfying xj^(k)(0) = 1 if k = j − 1 and xj^(k)(0) = 0 otherwise. We wish to prove that, for any values of c1 , c2 , . . . , cn , the function x = c1 x1 + c2 x2 + · · · + cn xn solves the IVP

αn x(n) + αn−1 x(n−1) + · · · + α1 x′ + α0 x = 0, x(0) = c1 , x′(0) = c2 , . . . , x(n−1) (0) = cn .

By linearity, we already know that x satisfies the ODE, since each xi satisfies this equation. It remains only to verify the initial conditions. We have

x(k) = c1 x1^(k) + c2 x2^(k) + · · · + cn xn^(k) ,

and therefore

x(k) (0) = c1 x1^(k) (0) + c2 x2^(k) (0) + · · · + cn xn^(k) (0).
The only nonzero term on the right is ck+1 xk+1 (0) = ck+1 . Thus x(k) (0) = ck+1 , k = 0, 1, . . . , n − 1, as desired. 7. Consider the set {x1 , x2 , x3 } ⊂ C(R), where x1 (t) = t, x2 (t) = t2 , x3 (t) = t3 . (a) The Wronskian matrix of x1 , x2 , x3 at t0 = 1 is ⎡
W = [ 1 1 1
      1 2 3
      0 2 6 ].
A direct calculation shows that ker(W ) = {0}, and hence W is nonsingular. By Theorem 129, this implies that {x1 , x2 , x3 } is linearly independent. (b) The Wronskian matrix of x1 , x2 , x3 at t0 = 0 is ⎤ ⎡ 0 0 0 W = ⎣ 1 0 0 ⎦, 0 2 0 which is obviously singular. Theorem 130 states that, if x1 , x2 , x3 are all solutions of a third-order linear ODE, then W is nonsingular for all t0 if and only if {x1 , x2 , x3 } is linearly independent. In this example, {x1 , x2 , x3 } is linearly independent, but W is singular for t0 = 0. This does not violate Theorem 130, because x1 , x2 , x3 are not solutions of a common third-order ODE.
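A quick numerical check of Exercise 7 (an illustration only): the Wronskian matrix at t0 = 1 is nonsingular, while the one at t0 = 0 is singular even though {t, t2 , t3 } is linearly independent.

import numpy as np

W1 = np.array([[1.0, 1.0, 1.0],   # (x1, x2, x3) at t0 = 1
               [1.0, 2.0, 3.0],   # first derivatives at t0 = 1
               [0.0, 2.0, 6.0]])  # second derivatives at t0 = 1
W0 = np.array([[0.0, 0.0, 0.0],   # the same construction at t0 = 0
               [1.0, 0.0, 0.0],
               [0.0, 2.0, 0.0]])

print(np.linalg.det(W1))  # 2.0, so W1 is nonsingular
print(np.linalg.det(W0))  # 0.0, so W0 is singular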
3.10 Graph theory
1. Let G be a graph, and let vi , vj be two nodes in VG . Since (AG^ℓ)ij is the number of walks of length ℓ joining vi and vj , the distance between vi and vj is the smallest ℓ (ℓ = 1, 2, 3, . . .) such that (AG^ℓ)ij ≠ 0.

2. Let G be a graph. We wish to prove that diam(G) is the smallest value of d such that, for each i ≠ j, there exists ℓ such that 1 ≤ ℓ ≤ d and (AG^ℓ)ij ≠ 0. But this follows immediately from the previous exercise: If, for each i ≠ j, there exists ℓ ≤ d with (AG^ℓ)ij ≠ 0, then d(vi , vj ) ≤ d for all pairs vi , vj . By definition, the diameter of G is the maximum d(vi , vj ), which equals the smallest d such that d(vi , vj ) ≤ d for all i ≠ j.

3. Let G be a graph, and let AG be the adjacency matrix of G. We wish to prove that (AG^2)ii is the degree of vi for each i = 1, 2, . . . , n. Since AG is symmetric ((AG )ij = (AG )ji for all i, j = 1, 2, . . . , n), we have

(AG^2)ii = Σ_{j=1}^{n} (AG )ij^2 .
Now, (AG )2ij is 1 if an edge joins vi and vj , and it equals 0 otherwise. Thus A2G ii counts the number of edges having vi as an endpoint. 4. Let G be a complete graph with n vertices. (a) The adjacency matrix AG is the n × n matrix with each diagonal entry equal to 0 and each nondiagonal entry equal to 1. (b) We have
(AG^k)ij = ((n − 1)^k − (−1)^k)/n if i ≠ j,   and   (AG^k)ii = ((n − 1)^k − (−1)^k)/n + (−1)^k.
This shows how many walks there are from v_i to v_j (i ≠ j) and from v_i back to itself (i = j). This formula can be discovered and proved as follows: By generating a number of examples, we notice that (A_G^k)_{ij} is constant for all i ≠ j, and also that (A_G^k)_{ii} is constant for all i. Let us call these quantities r_k and s_k. We further notice that r_k and s_k satisfy the following recurrence relations: r_1 = 1, r_2 = n - 2, r_{k+1} = (n - 2)r_k + (n - 1)r_{k-1}, k = 2, 3, ...; s_1 = 0, s_2 = n - 1, s_{k+1} = (n - 2)s_k + (n - 1)s_{k-1}, k = 2, 3, .... We can solve these recurrences to find the above formulas. It is then straightforward, using the simple form of A_G, to verify the formulas by induction. The induction proof is simplified by the following notation: e_i ∈ R^n is the usual standard basis vector, and a_i ∈ R^n is the vector with every component equal to 1 except for the ith component, which equals 0. Then the ith column (and the ith row) of A_G is a_i, and the ith column (and the ith row) of A_G^k is r_k a_i + s_k e_i. Using

a_i \cdot e_j = \begin{cases} 1, & i \ne j, \\ 0, & i = j, \end{cases} \qquad a_i \cdot a_j = \begin{cases} n - 2, & i \ne j, \\ n - 1, & i = j, \end{cases}

it is easy to compute (A_G^{k+1})_{ij} in terms of r_k and s_k, and the induction proof proceeds smoothly.
5. Suppose G and H are graphs, each having n nodes. Suppose there exists a permutation matrix P ∈ R^{n×n} such that A_G = P A_H P^T. We wish to show that G and H are isomorphic. Let P be defined by the permutation τ_1, ..., τ_n of 1, ..., n, that is, suppose P_{ij} = 1 if j = τ_i and P_{ij} = 0 if j ≠ τ_i. Then a straightforward calculation shows that A_G = P A_H P^T ⇒ (A_G)_{ij} = (A_H)_{τ_i, τ_j}, i, j = 1, ..., n.
Let the nodes of G be v_1, ..., v_n and the nodes of H be u_1, ..., u_n. The above formula shows that v_i, v_j are adjacent in G if and only if u_{τ_i}, u_{τ_j} are adjacent in H. This shows that φ : V_G → V_H defined by φ(v_i) = u_{τ_i} is a graph isomorphism.
6. The desired permutation matrix P is
P = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.

3.11
Coding theory
1. Consider the code C of Example 141. (a) We wish to show that c = (0, 1, 0, 1, 0, 0) is not a codeword. From the given generator matrix G, we see that mG = c is equivalent to the equations m_1 = 0, m_2 = 1, m_3 = 0, m_4 = 1, m_1 + m_3 + m_4 = 0, m_1 + m_2 + m_4 = 0. But these equations are inconsistent (the first four yield m = (0, 1, 0, 1), and this m does not satisfy the fifth equation). (b) Now we wish to show that there is not a unique codeword closest to c = (0, 1, 0, 1, 0, 0). In fact, there are two codewords whose distance from c is 1: c_1 = (0, 1, 0, 1, 1, 0) and c_2 = (0, 1, 1, 1, 0, 0). These codewords correspond to the messages m_1 = (0, 1, 0, 1) and m_2 = (0, 1, 1, 1), respectively.
2. The following message is received: 010110 001101 010110 010001 010001 111111 010110 000000 001010 000111. It is known that the code of Example 141 is used. We wish to decode and find the message. Let B be the 10 × 6 binary matrix whose rows are the above codewords. Solving MG = B yields

M = \begin{bmatrix}
0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
1 & 1 & 1 & 1 \\
0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}.

Combining every two rows to get one ASCII code, we obtain 01010011, 01010100, 01001111, 01010000, 00100001, which is the ASCII code for "STOP!"
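The decoding in Exercises 1 and 2 can be mimicked in a few lines of Python (a sketch, not part of the original solution). The generator matrix G below is the one implied by the parity equations written out in Exercise 1(a); if Example 141's matrix differs, only G needs to be changed.

```python
import numpy as np
from itertools import product

# Generator matrix implied by the parity equations above:
# mG = (m1, m2, m3, m4, m1+m3+m4, m1+m2+m4) over Z_2.
G = np.array([[1,0,0,0,1,1],
              [0,1,0,0,0,1],
              [0,0,1,0,1,0],
              [0,0,0,1,1,1]])

codewords = {tuple(np.dot(m, G) % 2): m for m in product([0, 1], repeat=4)}

def decode(block):
    """Return the message(s) of the codeword(s) nearest to `block` in Hamming distance."""
    dists = {c: sum(b != x for b, x in zip(block, c)) for c in codewords}
    best = min(dists.values())
    return [codewords[c] for c, d in dists.items() if d == best], best

msgs, dist = decode((0, 1, 0, 1, 0, 0))
print(dist, msgs)   # distance 1; two nearest codewords: messages (0,1,0,1) and (0,1,1,1)
```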
3. The following message is received: 010110 000101 010110 010110 010001 100100 010110 010001. It is known that the code of Example 141 is used. Let B be the 8 × 6 binary matrix whose rows are the above codewords. If we try to solve MG = B, we obtain

M = \begin{bmatrix}
0 & 1 & 0 & 1 \\
x & x & x & x \\
0 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0
\end{bmatrix},

where the second "codeword" cannot be decoded because it is not a true codeword (that is, mG = b_2 is inconsistent, where b_2 = 000101). In fact, both 000111 (= mG for m = 0001) and 001101 (= mG for m = 0011) are distance 1 from b_2. This means that the first ASCII character, with code 0101xxxx, could be 01010001 (Q) or 01010011 (S). The remaining characters are 01010101 (U), 01001001 (I), 01010100 (T), so the message is either "QUIT" or "SUIT".
4. The following message is received: 0000101 0001110 0100101 0100101 0101011 1000011 0100101 0101011. It is known that the code of Example 142 is used. (a) Solving MG = B, where B is the 8 × 7 matrix whose rows are the blocks of the received message, we find

M = \begin{bmatrix}
x & x & x & x \\
0 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0 \\
1 & 0 & 0 & 1 \\
0 & 1 & 0 & 1 \\
0 & 1 & 0 & 0
\end{bmatrix}.

The first block, b_1 = 0000101, does not equal any codeword (mG = b_1 is inconsistent), and hence there is an error in the received message. (b) However, there is a single codeword at distance 1 from b_1, namely b = 0100101, corresponding to m = 0101. Thus we decode b_1 as 0101 and obtain the following message in ASCII: 01010001, 01010101, 01001001, 01010100. This spells "QUIT".
3.12
Linear programming
1. The feasible set for LP (3.41) (page 199 of the text) is shown in the figure below. [Figure omitted: the feasible region in the (x_1, x_2)-plane.]
Notice that the set is unbounded toward the upper right.
2. (a) The feasible set for the LP is shown in the figure below. [Figure omitted.]
(b) The optimal solution is (x_1, x_2) = (5/2, 11/2), with z = 27/2. (c) There are 10 basic solutions: (0, 0)*, (7, 0)*, (8, 0), (30, 0), (6, 2)*, (40/9, 46/9), (5/2, 11/2)*, (0, 6)*, (0, 8), (0, 14). The basic feasible solutions are indicated with an asterisk. (d) There are five basic feasible solutions. Using the simplex method with the usual starting point ((x_1, x_2) = (0, 0)), we move to (x_1, x_2) = (0, 6) (x_2 enters the basis, x_4 leaves) and then to (x_1, x_2) = (5/2, 11/2) (x_1 enters, x_3 leaves).
3. Consider the LP from the previous exercise, with c changed to c = (2, 1). The new LP differs from the last in that the contours of the objective function are now parallel to one of the faces of the feasible set. This suggests that the optimal solution is not unique. Indeed, if we apply the simplex method, starting with (x_1, x_2) = (0, 0), we move to (x_1, x_2) = (0, 6) (x_2 enters the basis, x_4 leaves), then to (x_1, x_2) = (5/2, 11/2) (x_1 enters, x_3 leaves), and then to (x_1, x_2) = (6, 2) (x_4 enters, x_5 leaves). At this point, the equation for z reads z = 14 - x_5, which shows that (6, 2) is optimal, but also that the nonbasic variable x_3 does not affect the objective function. We can bring x_3 into the basis, which forces x_2 to become 0 and leave the basis. We obtain the BFS (7, 0), which is also optimal, with the same value of z. Indeed, any point on the line segment with endpoints (6, 2) and (7, 0) is optimal (although only the endpoints are basic feasible solutions).
4. (a) The feasible set for the LP is shown in the figure below. [Figure omitted.]
Notice that the origin is infeasible. Therefore, there is not an obvious BFS to use to start the simplex method (we cannot simply set the original variables to zero and solve for the slack variables), and therefore we must use the two-phase simplex method. (b) Using the two-phase method, we find that (x_1, x_2, x_3, x_4, x_5) = (2, 0, 0, 4, 2) is feasible. We then apply the ordinary simplex method to obtain the optimal solution (x_1, x_2, x_3, x_4, x_5) = (1, 3, 2, 0, 0). (c) The vertices of the feasible region visited by the simplex method are (2, 0), (4, 0), and (1, 3). This corresponds to the choice of x_3 entering on the first iteration (x_5 leaves), and x_2 entering on the second iteration (x_4 leaves).
5. The LP is unbounded; every point of the form (x_1, x_2) = (3 + t, t/3), t ≥ 0, is feasible, and for such points z increases without bound as t → ∞.
6. The optimal solution is (x_1, x_2, x_3) = (0, 2, 2), where z = 10.
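Optima like those reported in Exercises 2-6 are easy to double-check with an off-the-shelf solver; the snippet below is a generic sketch using scipy.optimize.linprog, with placeholder coefficients (c, A_ub, b_ub below are illustrative, not the book's data, which is stated in the text).

```python
import numpy as np
from scipy.optimize import linprog

# Placeholder LP: maximize c.x subject to A_ub x <= b_ub, x >= 0.
# Replace c, A_ub, b_ub with the coefficients of the exercise being checked.
c = np.array([3.0, 2.0])
A_ub = np.array([[1.0, 1.0],
                 [2.0, 1.0]])
b_ub = np.array([8.0, 10.0])

# linprog minimizes, so negate c to maximize; the solver finds a starting
# feasible point internally, so no two-phase setup is needed by hand.
res = linprog(-c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(c), method="highs")
print(res.x, -res.fun)   # optimal vertex and optimal objective value
```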
7. Define H = {x ∈ R^n : r · x ≤ β}, where r ∈ R^n and β ∈ R are given. We wish to show that H is a convex set. Let x_1, x_2 belong to H and assume α_1, α_2 ∈ (0, 1), α_1 + α_2 = 1. Then r · (α_1 x_1 + α_2 x_2) = α_1 (r · x_1) + α_2 (r · x_2) ≤ α_1 β + α_2 β = (α_1 + α_2)β = β, since α_1 + α_2 = 1 by assumption. This shows that α_1 x_1 + α_2 x_2 ∈ H, and therefore that H is convex.
8. Let A be an arbitrary index set, and assume that C_α is a convex subset of R^n for each α ∈ A. We wish to prove that

C = \bigcap_{\alpha \in A} C_\alpha

is also convex. Let x_1, x_2 ∈ C and α_1, α_2 ≥ 0, α_1 + α_2 = 1. Then, by definition of C, x_1, x_2 ∈ C_α for each α ∈ A, and, since each C_α is convex, it follows that α_1 x_1 + α_2 x_2 ∈ C_α for all α ∈ A. But this implies that α_1 x_1 + α_2 x_2 belongs to the intersection of the C_α's, that is, α_1 x_1 + α_2 x_2 ∈ C. Thus C is convex.
9. Consider an LP in standard equality form, and let S be the feasible set of the LP (which we know is a polyhedron). Assume that rank(A) = m. We wish to prove that x is an extreme point of S if and only if x is a BFS of the LP. Let us suppose first that x ∈ S is a BFS. Then we can partition {1, 2, ..., n} into subsets B and N such that N contains n - m indices, {A_i : i ∈ B} is linearly independent (that is, A_B is invertible), and x_i = 0 for all i ∈ N. We then have x_B = A_B^{-1} b. Now suppose u, v ∈ S and α, β ∈ (0, 1) satisfy α + β = 1, αu + βv = x. We then have, in particular, αu_N + βv_N = x_N = 0. Since u_N, v_N ≥ 0 and α, β > 0, this implies that u_N = v_N = 0. But then Au = b ⇒ A_B u_B + A_N u_N = b ⇒ A_B u_B = b ⇒ u_B = A_B^{-1} b = x_B. This shows that u = x, and similarly v = x. Therefore x is an extreme point of S.
Conversely, suppose x ∈ S is not a BFS. We will show that x cannot be an extreme point. If x is not a BFS, then for all index sets B, N ⊂ {1, 2, ..., n} with m and n - m elements, respectively, B ∩ N = ∅, and such that A_B is invertible, it holds that x_N ≠ 0. In particular, we must be able to choose B and N such that x_B > 0 and x_N ≥ 0, x_N ≠ 0. Choose any index j ∈ N such that x_j > 0 and define y ∈ R^n by

y_i = x_i for i ∈ N, i ≠ j, \quad y_j = x_j + t, \quad and \quad y_B = A_B^{-1} b - A_B^{-1} A_N y_N.

Notice that y_N = x_N + te, where e is a standard basis vector in R^{n-m}. We then have

y_B = A_B^{-1} b - A_B^{-1} A_N x_N - t A_B^{-1} A_N e = x_B - t A_B^{-1} A_N e.

Since x_j > 0 and x_B > 0, there exists s > 0 sufficiently small that y ≥ 0 for all t with |t| ≤ s. Now define u = y with t = s and v = y with t = -s. By construction, u and v both belong to S. We have

(1/2)u_N + (1/2)v_N = (1/2)(x_N + se) + (1/2)(x_N - se) = x_N

and

(1/2)u_B + (1/2)v_B = (1/2)(x_B - sA_B^{-1} A_N e) + (1/2)(x_B + sA_B^{-1} A_N e) = x_B,

that is, x = (1/2)u + (1/2)v. Since u ≠ x and v ≠ x by construction, this proves that x is not an extreme point.
10. Let C be a convex subset of R^n. Suppose x_1, ..., x_k ∈ C, α_1, ..., α_k ≥ 0, α_1 + ··· + α_k = 1. We wish to prove that α_1 x_1 + ··· + α_k x_k belongs to C. We can argue by induction on k. By definition of convex set, the result holds for k = 2. Suppose that for any x_1, ..., x_{k-1} ∈ C, α_1, ..., α_{k-1} ≥ 0, α_1 + ··· + α_{k-1} = 1, we have α_1 x_1 + ··· + α_{k-1} x_{k-1} ∈ C. Now suppose x_1, ..., x_k ∈ C, α_1, ..., α_k ≥ 0, α_1 + ··· + α_k = 1. We can assume α_k ≠ 1, since otherwise the desired result is immediate. Now,

\alpha_1 x_1 + \cdots + \alpha_k x_k = (1 - \alpha_k)\left(\frac{\alpha_1}{1 - \alpha_k} x_1 + \cdots + \frac{\alpha_{k-1}}{1 - \alpha_k} x_{k-1}\right) + \alpha_k x_k.

We have

\frac{\alpha_1}{1 - \alpha_k} + \cdots + \frac{\alpha_{k-1}}{1 - \alpha_k} = \frac{\alpha_1 + \cdots + \alpha_{k-1}}{1 - \alpha_k} = 1,

and clearly α_i/(1 - α_k) ≥ 0 for i = 1, ..., k - 1. Therefore,

\frac{\alpha_1}{1 - \alpha_k} x_1 + \cdots + \frac{\alpha_{k-1}}{1 - \alpha_k} x_{k-1} \in C

by the induction hypothesis, and then

\alpha_1 x_1 + \cdots + \alpha_k x_k = (1 - \alpha_k)\left(\frac{\alpha_1}{1 - \alpha_k} x_1 + \cdots + \frac{\alpha_{k-1}}{1 - \alpha_k} x_{k-1}\right) + \alpha_k x_k \in C

since 1 - α_k ≥ 0, α_k ≥ 0, (1 - α_k) + α_k = 1, and C is convex. This completes the proof by induction.
11. If we apply the simplex method to the LP of Example 158, using the smallest subscript rule to choose both the entering and leaving variables, the method terminates after 7 iterations with an optimal solution of x = (1, 0, 1, 0) and z = 1. The basic variables change as follows: {x_5, x_6, x_7} → {x_1, x_6, x_7} → {x_1, x_2, x_7} → {x_3, x_2, x_7} → {x_3, x_4, x_7} → {x_5, x_4, x_7} → {x_5, x_1, x_7} → {x_5, x_1, x_3}.
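Exercises 2(c), 9, and 11 all revolve around basic solutions of an LP in standard equality form; the brute-force enumeration below is a sketch with illustrative data (not the book's LP) showing how the basic solutions, and in particular the basic feasible solutions, i.e. the extreme points, can be listed directly.

```python
import numpy as np
from itertools import combinations

def basic_solutions(A, b, tol=1e-9):
    """Enumerate basic solutions of Ax = b by choosing m linearly independent columns."""
    m, n = A.shape
    out = []
    for B in combinations(range(n), m):
        AB = A[:, B]
        if abs(np.linalg.det(AB)) < tol:
            continue                      # columns not independent: no basic solution here
        x = np.zeros(n)
        x[list(B)] = np.linalg.solve(AB, b)
        out.append((B, x, bool((x >= -tol).all())))   # feasible iff x >= 0
    return out

# Illustrative data: x1 + x2 + s1 = 4, x1 + 3x2 + s2 = 6 (s1, s2 slack variables).
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [1.0, 3.0, 0.0, 1.0]])
b = np.array([4.0, 6.0])
for B, x, feasible in basic_solutions(A, b):
    print(B, np.round(x, 3), "BFS" if feasible else "infeasible")
```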
Chapter 4
Determinants and eigenvalues
4.1
The determinant function
1. Define

A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}.

We have det(A) = det(A_1, A_2, A_3) = det(e_3, e_1, e_2) = -det(e_1, e_3, e_2) = det(e_1, e_2, e_3) = 1 (using Theorem 161, part 4, twice). Similarly, det(B) = det(B_1, B_2, B_3) = det(e_3, e_2, e_1) = -det(e_1, e_2, e_3) = -1.
2. Define
A = \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{bmatrix}.

Then det(A) = det(e_3, e_2, e_4, e_1) = -det(e_3, e_2, e_1, e_4) = det(e_1, e_2, e_3, e_4) = 1. Similarly, det(B) = det(e_4, e_3, e_1, e_2) = -det(e_1, e_3, e_4, e_2) = det(e_1, e_3, e_2, e_4) = -det(e_1, e_2, e_3, e_4) = -1.
3. Define
A = \begin{bmatrix} a & 0 & 0 & 0 \\ 0 & b & 0 & 0 \\ 0 & 0 & c & 0 \\ 0 & 0 & 0 & d \end{bmatrix}, \quad B = \begin{bmatrix} a & b & c & d \\ 0 & e & f & g \\ 0 & 0 & h & i \\ 0 & 0 & 0 & j \end{bmatrix}.
Then
det(A) = det(ae1 , be2 , ce3 , de4 ) = adet(e1 , be2 , ce3 , de4 ) = abdet(e1 , e2 , ce3 , de4 ) = abcdet(e1 , e2 , e3 , de4 ) = abcddet(e1 , e2 , e3 , e4 ) = abcd. Here we have repeatedly used the second part of the definition of the determinant. To compute det(B), we note that we can add any multiple of one column to another without changing the determinant (the third part of the definition of the determinant). We have det(B) = det(ae1 , be1 + ee2 , ce1 + f e2 + he3 , de1 + ge2 + ie3 + je4 ) = adet(e1 , be1 + ee2 , ce1 + f e2 + he3 , de1 + ge2 + ie3 + je4 ) = adet(e1 , ee2 , f e2 + he3 , ge2 + ie3 + je4 ) = aedet(e1 , e2 , f e2 + he3 , ge2 + ie3 + je4 ) = aedet(e1 , e2 , he3 , ie3 + je4 ) = aehdet(e1 , e2 , e3 , ie3 + je4 ) = aehdet(e1 , e2 , e3 , je4 ) = aehjdet(e1 , e2 , e3 , e4 ) = aehj. 4. Define
A = \begin{bmatrix} 1 & 0 & \lambda & 0 \\ 0 & 1 & \lambda & 0 \\ 0 & 0 & \lambda & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & a & 0 \\ 0 & 1 & b & 0 \\ 0 & 0 & c & 0 \\ 0 & 0 & d & 1 \end{bmatrix}.

Reasoning as in the previous exercise, we have det(A) = det(e_1, e_2, λe_1 + λe_2 + λe_3, e_4) = det(e_1, e_2, λe_2 + λe_3, e_4) = det(e_1, e_2, λe_3, e_4) = λ det(e_1, e_2, e_3, e_4) = λ. The determinant of B is computed similarly: det(B) = det(e_1, e_2, ae_1 + be_2 + ce_3 + de_4, e_4) = det(e_1, e_2, be_2 + ce_3 + de_4, e_4) = det(e_1, e_2, ce_3 + de_4, e_4) = det(e_1, e_2, ce_3, e_4) = c det(e_1, e_2, e_3, e_4) = c.
5. Consider the permutation τ = (4, 3, 2, 1) ∈ S_4. We can write τ = [2, 3][1, 4] or τ = [3, 4][2, 4][2, 3][1, 4][1, 3][1, 2]. Since τ is the product of an even number of transpositions, we see that σ(τ) = 1.
6. The permutation τ = (2, 5, 3, 1, 4) ∈ S_5 can be written as τ = [1, 5][4, 5][1, 2] or τ = [3, 5][4, 5][1, 4][1, 3][1, 2]. Since τ is the product of an odd number of transpositions, we see that σ(τ) = -1.
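The signatures computed in Exercises 5 and 6 can be checked by counting inversions, since σ(τ) = (-1)^{number of inversions}; a short sketch (not part of the original solution):

```python
from itertools import combinations

def signature(tau):
    """sigma(tau) for a permutation in one-line notation, via the inversion count."""
    inversions = sum(1 for i, j in combinations(range(len(tau)), 2) if tau[i] > tau[j])
    return 1 if inversions % 2 == 0 else -1

print(signature((4, 3, 2, 1)))      # 1, matching Exercise 5
print(signature((2, 5, 3, 1, 4)))   # -1, matching Exercise 6
```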
7. Let A ∈ F^{2×2}, where F is any field. There are two permutations in S_2, namely, (1, 2) and (2, 1) (with σ((1, 2)) = 1 and σ((2, 1)) = -1). We therefore have

det(A) = σ((1, 2))A_{1,1}A_{2,2} + σ((2, 1))A_{2,1}A_{1,2} = A_{1,1}A_{2,2} - A_{2,1}A_{1,2},

which expresses det(A) explicitly in terms of the entries of A.
8. Let A ∈ F^{3×3}, where F is any field. There are 6 permutations in S_3, listed below with their signatures: σ((1, 2, 3)) = σ((2, 3, 1)) = σ((3, 1, 2)) = 1, σ((1, 3, 2)) = σ((2, 1, 3)) = σ((3, 2, 1)) = -1. We therefore have

det(A) = σ((1, 2, 3))A_{1,1}A_{2,2}A_{3,3} + σ((2, 3, 1))A_{2,1}A_{3,2}A_{1,3} + σ((3, 1, 2))A_{3,1}A_{1,2}A_{2,3} + σ((1, 3, 2))A_{1,1}A_{3,2}A_{2,3} + σ((2, 1, 3))A_{2,1}A_{1,2}A_{3,3} + σ((3, 2, 1))A_{3,1}A_{2,2}A_{1,3}
= A_{1,1}A_{2,2}A_{3,3} + A_{2,1}A_{3,2}A_{1,3} + A_{3,1}A_{1,2}A_{2,3} - A_{1,1}A_{3,2}A_{2,3} - A_{2,1}A_{1,2}A_{3,3} - A_{3,1}A_{2,2}A_{1,3}.

9. Let A ∈ F^{2×2}, b ∈ F^2. Let us suppose that Gaussian elimination is applied to solve Ax = b. Writing the system in augmented matrix form, we have

\left[ \begin{array}{cc|c} A_{11} & A_{12} & b_1 \\ A_{21} & A_{22} & b_2 \end{array} \right] \rightarrow \left[ \begin{array}{cc|c} A_{11} & A_{12} & b_1 \\ 0 & A_{22} - \frac{A_{21}}{A_{11}}A_{12} & b_2 - \frac{A_{21}}{A_{11}}b_1 \end{array} \right] = \left[ \begin{array}{cc|c} A_{11} & A_{12} & b_1 \\ 0 & A_{11}^{-1}(A_{11}A_{22} - A_{21}A_{12}) & b_2 - \frac{A_{21}}{A_{11}}b_1 \end{array} \right].

This calculation assumes that A_{11} ≠ 0. In this case, we see that there is a unique solution for all b ∈ F^2 if and only if det(A) = A_{11}A_{22} - A_{21}A_{12} is nonzero. If A_{11} = 0, then the system reduces to A_{12}x_2 = b_1, A_{21}x_1 + A_{22}x_2 = b_2, and it is obvious that there is a unique solution if and only if A_{12} and A_{21} are both nonzero, that is, if and only if det(A) = -A_{21}A_{12} ≠ 0. Thus, in either case, A is nonsingular if and only if det(A) ≠ 0.
10. Let F be a field and let A_1, A_2, ..., A_n, B be vectors in F^n. Suppose {A_1, A_2, ..., A_n} is linearly dependent and det satisfies Definition 160. We wish to show that det(A_1, ..., A_j + B, ..., A_n) = det(A_1, ..., B, ..., A_n). We consider two cases. First, suppose A_j can be written in terms of A_1, ..., A_{j-1}, A_{j+1}, ..., A_n, say

A_j = \sum_{i \ne j} \lambda_i A_i.

Then, by Theorem 161, part 1,

det(A_1, ..., A_j + B, ..., A_n) = det\left(A_1, ..., A_j + B - \sum_{i \ne j} \lambda_i A_i, ..., A_n\right) = det(A_1, ..., B, ..., A_n),

as desired.
If Aj is not a linear combination of A1 , . . . , Aj−1 , Aj+1 , . . . , An , then the set {A1 , . . . , Aj−1 , Aj+1 , . . . , An } must be linearly dependent. In this case, both {A1 , . . . , Aj−1 , Aj + B, Aj+1 , . . . , An } and {A1 , . . . , Aj−1 , B, Aj+1 , . . . , An } are linearly dependent, and hence both det(A1 , . . . , Aj + B, . . . , An ), det(A1 , . . . , B, . . . , An ) equal 0. Thus the result holds in this case also. 11. Let n be a positive integer, and let i and j be integers satisfying 1 ≤ i, j ≤ n, i = j. For any τ ∈ Sn , define τ by τ = τ [i, j] (that is, τ is the composition of τ and the transposition [i, j]. Finally, define f : Sn → Sn by f (τ ) = τ . We wish to prove that f is a bijection. First, let γ ∈ Sn , and define θ ∈ Sn by θ = γ[i, j]. Then f (θ) = (γ[i, j])[i, j] = γ([i, j][i, j]). It is obvious that [i, j][i, j] is the identity permutation, and hence f (θ) = γ. This shows that f is surjective. Similarly, if θ1 , θ2 ∈ Sn and f (θ1 ) = f (θ2 ), then θ1 [i, j] = θ2 [i, j]. But then θ1 [i, j] = θ2 [i, j] ⇒ (θ1 [i, j])[i, j] = (θ2 [i, j])[i, j] ⇒ θ1 ([i, j][i, j]) = θ2 ([i, j][i, j]) ⇒ θ1 = θ2 . This shows that f is injective, and hence bijective. 12. Let τ = (j1 , j2 , . . . , jn ) ∈ Sn . This permutation can be written as the product of transpositions, and each transposition of A1 , . . . , An changes the determinant by a factor of −1. Therefore, det(Aj1 , Aj2 , . . . , Ajn ) = σ(τ )det(A1 , A2 , . . . , An ) (the sign changes if the number of transpositions is odd and stays the same if the number of transpositions is even). 13. Suppose that the following, alternate definition of the determinant is adopted: Let F be a field and n a n positive integer. The determinant function det : (F n ) → F is defined by the following properties: (a) If {e1 , e2 , . . . , en } is the standard basis for F n , then det(e1 , e2 , . . . , en ) = 1. (b) For any A1 , A2 , . . . , An ∈ F n and any i = j, 1 ≤ i, j ≤ n, det(A1 , . . . , Aj , . . . , Ai , . . . , An ) = −det(A1 , . . . , Ai , . . . , Aj , . . . , An ). (c) For any A1 , A2 , . . . , An ∈ F n , any λ ∈ F , and any j, 1 ≤ j ≤ n, det(A1 , . . . , Aj−1 , λAj , Aj+1 , . . . , An ) = λdet(A1 , . . . , An ). (d) For any A1 , A2 , . . . , An , Bj ∈ F n , and any j, 1 ≤ j ≤ n, det(A1 , . . . , Aj + Bj , . . . , An ) =
det(A1 , . . . , Aj , . . . , An ) + det(A1 , . . . , Bj , . . . , An ).
Prove that this alternate definition is equivalent to Definition 160. We have already seen that if det satisfies Definition 160, then it satisfies the properties of the alternate definition (see Theorem 161). Let us supppose, then, that det satisfies the alternate definition. Two parts of Definition 160 are included in the alternate definition; we need merely prove that for any A1 , A2 , . . . , An ∈ F n , any λ ∈ F , and any i = j, 1 ≤ i, j ≤ n, det(A1 , . . . , Aj−1 , Aj + λAi , Aj+1 , . . . , An ) = det(A1 , . . . , An ). By properties (d) and (c) above, we have det(A1 , . . . , Aj−1 , Aj + λAi , Aj+1 , . . . , An ) = det(A1 , . . . , Aj , . . . , An ) + det(A1 , . . . , λAi , . . . , An ) = det(A1 , . . . , Aj , . . . , An ) + λdet(A1 , . . . , Ai , . . . , An ). The last determinant is det(A1 , . . . , Ai , . . . , Ai , . . . , An ) (the vector Ai appears as both the ith and jth inputs). By property (b) above, if we interchange the ith and jth inputs, the sign of the determinant must change; since interchanging the ith and jth inputs in det(A1 , . . . , Ai , . . . , Ai , . . . , An ) does not change the result, it must be the case that det(A1 , . . . , Ai , . . . , Ai , . . . , An ) = 0. Hence det(A1 , . . . , Aj−1 , Aj + λAi , Aj+1 , . . . , An ) = det(A1 , . . . , Aj , . . . , An ), as desired. This completes the proof.
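The complete (permutation) expansion used in Exercises 7 and 8 is easy to implement directly; the sketch below (not part of the original solution) sums σ(τ)A_{τ(1),1}···A_{τ(n),n} over all permutations and agrees with NumPy's determinant.

```python
import numpy as np
from itertools import permutations

def sign(perm):
    inv = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inv % 2 else 1

def det_by_complete_expansion(A):
    """det(A) as the sum over permutations tau of sigma(tau) * A[tau(1),1] * ... * A[tau(n),n]."""
    n = A.shape[0]
    total = 0.0
    for tau in permutations(range(n)):
        term = sign(tau)
        for j in range(n):
            term *= A[tau[j], j]
        total += term
    return total

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(det_by_complete_expansion(A))   # 1*4 - 3*2 = -2, matching Exercise 7's formula
print(np.linalg.det(A))               # same value, up to rounding
```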
4.2
Further properties of the determinant function
1. It is not the case that if A ∈ R^{n×n} and all of the diagonal entries of A are zero, then det(A) = 0. For example,

A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \Rightarrow det(A) = -1

and

A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix} \Rightarrow det(A) = 2.
2. Let F be a field and let A ∈ F n×n . We wish to prove that AT is singular if and only if A is singular. This is simple using the determinant: AT is singular if and only if det(AT ) = 0. But det(AT ) = det(A), and det(A) = 0 if and only if A is singular. Thus AT is singular if and only if A is singular. 3. Let F be a field and let A ∈ F n×n . We wish to show that AT A is singular if and only if A is singular. We have det(AT A) = det(AT )det(A) = det(A)2 , since det(AT ) = det(A). It follows that det(AT A) = 0 if and only if det(A) = 0, that is, AT A is singular if and only if A is singular. 4. Let n be a positive integer. (a) We wish to show that σ τ −1 = σ(τ ) for all τ ∈ Sn . We first note that each transposition is its own inverse: [i, j][i, j] = ι, where we write ι for the identity permutation: ι = (1, 2, . . . , n). We can write τ as the product of transpositions: τ = [i1 , j1 ][i2 , j2 ] · · · [ik , jk ]. It follows that τ −1 = [ik , jk ] · · · [i2 , j2 ][i1 , j1 ]. Since the products representing τ and τ −1 have the same number of factors, it follows that σ(τ −1 ) = σ(τ ).
(b) Define f : Sn → Sn by f (τ ) = τ −1 . We wish to prove that f is a bijection. For any τ ∈ Sn , f (τ −1 ) = τ since (τ −1 )−1 = τ (this is a general fact about invertible functions). Thus f is surjective. If τ, γ ∈ Sn and f (τ ) = f (γ), then τ −1 (i) = γ −1 (i) for all i = 1, 2, . . . , n ⇒ γ(τ −1 (i)) = γ(γ −1 (i)) for all i = 1, 2, . . . , n ⇒ γ(τ −1 (i)) = i for all i = 1, 2, . . . , n. By Theorem 73 of Section 3.3, this last condition implies that γ = (τ −1 )−1 = τ . Therefore, f is injective, and we have proved that f is bijective. 5. Let F be a field, let A be any matrix in F n×n , and let X ∈ F n×n be invertible. Then, using the product rule for determinants and the fact that det(X −1 ) = det(X)−1 , we have det(X −1 AX) = det(X −1 )det(A)det(X) = det(X)−1 det(A)det(X) = det(A). 6. Let F be a field and let A, B ∈ F n×n . We wish to prove that AB is singular if and only if A is singular or B is singular. We have det(AB) = det(A)det(B), and therefore det(AB) = 0 if and only if det(A) = 0 or det(B) = 0. Since a square matrix is singular if and only if its determinant is zero, this yields the desired result. 7. Let F be a field and suppose x1 , . . . , xn ∈ F n , A ∈ F n×n . We wish to compute det(Ax1 , . . . , Axn ) in terms of det(x1 , . . . , xn ). The vectors Ax1 , . . . , Axn are the columns of AX, where X = [x1 | · · · |xn ]. Therefore, det(Ax1 , . . . , Axn ) = det(AX) = det(A)det(X) = det(A)det(x1 , . . . , xn ). 8. Let A ∈ F m×n , B ∈ F n×m , where m > n. We wish to prove that det(AB) = 0. Since B has more columns than rows, there exists x ∈ Rm , x = 0, such that Bx = 0 (see Exercise 3.2.1). But then (AB)x = 0, x = 0, and this implies that AB is singular, which in turn implies that det(AB) = 0. 9. Let A ∈ F m×n , B ∈ F n×m , where m < n. We will show by example that both det(AB) = 0 and det(AB) = 0 are possible. First, note that ⎤ ⎡ 1 1 4 4 1 2 1 ⎦ ⎣ , B = 1 1 ⇒ AB = A= 2 2 1 0 1 1 1 ⇒ det(AB) = 4 · 2 − 2 · 4 = 0. On the other hand, A=
1 2 1 0
1 1
⎤ 1 1 6 , B = ⎣ 2 0 ⎦ ⇒ AB = 2 1 1 ⎡
2 2
⇒ det(AB) = 6 · 2 − 2 · 2 = 8.
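The two examples in Exercise 9, as reconstructed above, can be checked in a couple of lines; a sketch:

```python
import numpy as np

A = np.array([[1, 2, 1],
              [1, 0, 1]])                  # 2 x 3, so m = 2 < n = 3
B1 = np.ones((3, 2), dtype=int)            # first choice of B
B2 = np.array([[1, 1], [2, 0], [1, 1]])    # second choice of B

print(A @ B1, round(np.linalg.det(A @ B1)))   # [[4 4],[2 2]], det = 0
print(A @ B2, round(np.linalg.det(A @ B2)))   # [[6 2],[2 2]], det = 8
```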
4.3
Practical computation of det(A)
1. (a) The determinant is −30. (b) The determinant is −1. (c) The determinant is 24. 2. (a) The determinant is 0 and the matrix is singular.
(b) The determinant is 1 and the matrix is nonsingular. (c) The determinant is 0 and the matrix is singular. 3. We wish to prove Theorem 175. First, let A ∈ F n×n be given, where F is a field, and let B be the matrix obtained by interchanging columns 1 and j of A (j > 1). Then n
det(A) = −det(B) = −
(−1)i+1 Bi,1 det(B (i,1) )
i=1 n
=
(−1)i+2 Ai,j det(B (i,1) ).
i=1
Now, B (i,1) is the matrix obtained from B by deleting the ith row and first column. Let Ã1 , . . . , Ãn ∈ F n−1 be the columns of A with the ith rows removed. Then B (i,1) = [Ã2 |Ã3 | · · · |Ãj−1 |Ã1 |Ãj+1 | · · · |Ãn ]. Therefore, B (i,1) is the matrix obtained from A(i,j) by reordering the first j −1 columns: (1, 2, . . . , j −1) → (2, 3, . . . , j − 1, 1). This requires j − 2 interchanges, and hence det(B (i,1) = (−1)j−2 det(A(i,j) ). It follows that n
n
(−1)i+2 Ai,j (−1)j−2 det(A(i,j) ) =
det(A) = i=1
(−1)i+j Ai,j det(A(i,j) ),
i=1
as desired. The second part of Theorem 175 follows immediately from the facts that det(AT ) = det(A) and that cofactor expansion along a row of A is equal to cofactor expansion along a column of AT . 4. Let A ∈ F n×n be nonsingular, where F is a field, and let b ∈ F n . (a) To reduce A to upper triangular form requires n − 1 steps of Gaussian elimination. The ith step requires n−i operations of the type “add a multiple of one row to another.” Each of these operations requires one division (to compute the multiplier)and n−i each of multiplications and additions. Thus the total is n−1
, - n−1 , 2 2(n − i)2 + (n − i) = 2j + j
i=1
j=1 n−1
=2
j2 +
j=1
n−1
j j=1
n(2n − 1)(n − 1) n(n − 1) + 6 2 2 3 1 2 1 = n − n − n. 3 2 6 =2·
(b) If we simultaneously reduce [A|b] to [U |c], where U is upper triangular, we use the operations counted above, plus n − 1 steps to modify the right-hand side. The ith step requires 2(n − i) operations, so the total additional operations is n−1
n−1
2(n − i) = 2 i=1
j=1
j = n2 − n.
Finally, to reduce [U |c] to [I|x] requires n steps; the ith requires 1 division and n − i each of multiplications and divisions. The total is n
n−1
{2(n − i) + 1} = 2 i=1
j + n = n(n − 1) + n = n2 .
j=1
The grand total, then, of reducing [A|b] to [I|x] is 2 3 1 2 1 2 3 7 n − n − n + n2 − n + n2 = n3 + n2 − n. 3 2 6 3 2 6 5. Let A ∈ F n×n be nonsingular, where F is a field, and let b ∈ F n . We wish to count the arithmetic operations required to solve Ax = b by Cramer’s rule (assuming the determinants are computed by row reduction). To compute a determinant by row reduction requires reducing it to triangular form and multiplying the diagonal entries. Using the results of the previous exercise, this yields an operation count of 2 3 1 2 1 2 1 5 n − n − n + n − 1 = n3 − n2 + n − 1. 3 2 6 3 2 6 To apply Cramer’s rule, we need to compute n + 1 determinants, for a total operation count of 2 4 1 3 1 2 1 n + n + n − n − 1. 3 6 3 6 6. Let F be a field and let I ∈ F n×n be the identity matrix. We wish to prove that, for any x ∈ F n and any integer i between 1 and n, det(Ii (x)) = xi . Recall that adding a multiple of one column to another does not change a determinant. We have det(Ii (x)) = det(e1 , . . . , x, . . . , en ) ⎛ = det ⎝e1 , . . . , ⎛ = det ⎝e1 , . . . ,
n
⎞
xj ej , . . . , en ⎠
j=1
⎞
n
xj ej , . . . , en ⎠
xj ej − j=1
j =i
= det (e1 , . . . , xi ei , . . . , en ) = xi det (e1 , . . . , ei , . . . , en ) = xi . 7. Suppose A ∈ Rn×n is invertible and has integer entries, and assume det(A) = ±1. Notice that the determinant of a matrix with integer entries is obviously an integer. We can compute the jth column of A−1 by solving Ax = ej . By Cramer’s rule, (A−1 )ij =
det(Ai (ej )) = ±det(Ai (ej )), i = 1, . . . , n. det(A)
Since det(Ai (ej )) is an integer, so is (A−1 )ij . This holds for all i, j, and hence A−1 has integer entries. 8. As noted in the previous exercise, (A−1 )ij =
det(Ai (ej )) . det(A)
The matrix Ai (ej ) is A with its ith column replaced by ej , and using cofactor expansion down the ith column of Ai (ej ) yields det(Ai (ej )) = (−1)i+j det(A(j,i) ).
Therefore,

(A^{-1})_{ij} = \frac{(-1)^{i+j} \det(A^{(j,i)})}{\det(A)}, \quad i, j = 1, \ldots, n.

With

A = \begin{bmatrix} 0 & -1 & -3 \\ -1 & -2 & -1 \\ 0 & 2 & 7 \end{bmatrix},

we obtain det(A) = -1 and

A^{-1} = \begin{bmatrix} 12 & -1 & 5 \\ -7 & 0 & -3 \\ 2 & 0 & 1 \end{bmatrix}.
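The cofactor formula in Exercises 7 and 8 is easy to run numerically; the sketch below rebuilds A^{-1} entry by entry for the matrix above and confirms that every entry is an integer (as it must be, since det(A) = -1).

```python
import numpy as np

def inverse_by_cofactors(A):
    """(A^{-1})_{ij} = (-1)^{i+j} det(A^{(j,i)}) / det(A), A^{(j,i)} = A with row j, column i deleted."""
    n = A.shape[0]
    d = np.linalg.det(A)
    inv = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, j, axis=0), i, axis=1)
            inv[i, j] = (-1) ** (i + j) * np.linalg.det(minor) / d
    return inv

A = np.array([[0, -1, -3],
              [-1, -2, -1],
              [0, 2, 7]], dtype=float)
print(np.round(inverse_by_cofactors(A)))   # [[12 -1 5], [-7 0 -3], [2 0 1]]
print(np.round(np.linalg.det(A)))          # -1
```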
9. The solution is x = (2, 1, 1). 10. We wish to solve cos (x)u1 (x) + sin (x)u2 (x)
= 0,
− sin (x)u1 (x) + cos (x)u2 (x)
= x,
which can be written in matrix-vector form as Au(x) = b(x), where u(x) = (u1 (x), u2 (x)), b(x) = (0, x), and cos (x) sin (x) . A= − sin (x) cos (x) We have det(A) = cos2 (x) + sin2 (x) = 1, and then it easily follows from Cramer’s rule that u1 (x) = −x sin (x), u2 (x) = x cos (x). 11. Let F be a field, let x0 , x1 , . . . , xn be distinct elements of F , and let V ∈ F (n+1)×(n+1) be the matrix ⎡ ⎤ 1 x0 x20 · · · xn0 ⎢ 1 x1 x21 · · · xn1 ⎥ ⎢ ⎥ V =⎢ . .. .. .. ⎥ . .. ⎣ .. . . . . ⎦ 1
xn
x2n
···
xnn
The matrix V is called a Vandermonde matrix. The purpose of this exercise is to derive a formula for det(V ) and thereby show that V is nonsingular. We will write V (x0 , x1 , . . . , xn ) for the Vandermonde matrix defined by the numbers x0 , x1 , . . . , xn . (a) We can show that det(V (x0 , x1 , . . . , xn )) = (Πni=1 (xi − x0 )) det(V (x1 , x2 , . . . , xn )). Because the notation is difficult, we will show this explicitly for n = 3, which should make it clear how to do the general case: % % % 1 x0 x20 x30 % % % % 1 x1 x21 x31 % % det(V (x0 , x1 , x2 , x3 )) = %% 2 3 % % 1 x2 x22 x23 % % 1 x3 x3 x3 % % % % 1 % x0 x20 x30 % % % 0 x1 − x0 x21 − x20 x31 − x30 % % = %% 2 2 3 3 %. % 0 x2 − x0 x22 − x02 x23 − x03 % % 0 x3 − x0 x3 − x0 x3 − x0 %
The last step results from subtracting multiples of the first row from the other rows, which does not change the determinant. Then, performing a cofactor expansion down the first column, we obtain det(V (x0 , x1 , x2 , x3 )) % % % x1 − x0 x21 − x20 x31 − x30 % % % = %% x2 − x0 x22 − x20 x32 − x30 %% % x3 − x0 x23 − x20 x33 − x30 % % % 1 % = (x1 − x0 )(x2 − x0 )(x3 − x0 ) %% 1 % 1
x1 + x0 x2 + x0 x3 + x0
% x21 + x0 x1 + x20 %% x22 + x0 x2 + x20 %% . x23 + x0 x3 + x20 %
Next, we add −x0 times column 1 to column 2 and −x20 times column 1 to column 3 to obtain det(V (x0 , x1 , x2 , x3 ))
% % 1 % = (x1 − x0 )(x2 − x0 )(x3 − x0 ) %% 1 % 1
x1 x2 x3
% x21 + x0 x1 %% x22 + x0 x2 %% . x23 + x0 x3 %
Then, adding −x0 times column 2 to column 3 yields det(V (x0 , x1 , x2 , x3 ))
% % 1 x1 % = (x1 − x0 )(x2 − x0 )(x3 − x0 ) %% 1 x2 % 1 x3 or
% x21 %% x22 %% , x23 %
det(V (x0 , x1 , x2 , x3 )) = Π3i=1 (xi − x0 ) det(V (x1 , x2 , x3 )).
The proof in the case of a general n > 1 proceeds similarly. (b) We have
% % 1 det(V (x0 , x1 )) = %% 1
% x0 %% n = x1 − x0 = Πn−1 j=0 Πi=1 (xi − xj ) x1 %
(where n = 1). Assuming that, for any n ≥ 1 and any nodes x0 , . . . , xn−1 , n−1 det(V (x0 , x1 , . . . , xn−1 )) = Πn−2 j=0 Πi=j+1 (xi − xj )
(this is the induction hypothesis), we have n det(V (x1 , . . . , xn ) = Πn−1 j=1 Πi=j+1 (xi − xj ).
Therefore, using the first part of the exercise, det(V (x0 , . . . , xn )) = (Πni=1 (xi − x0 )) det(V (x1 , . . . , xn )) n = (Πni=1 (xi − x0 )) Πn−1 j=1 Πi=j+1 (xi − xj ) n = Πn−1 j=0 Πi=j+1 (xi − xj ).
This completes the proof by induction.
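The product formula det(V(x_0, ..., x_n)) = \prod_{j<i}(x_i - x_j) derived in this exercise can be spot-checked numerically; a sketch with arbitrary distinct nodes:

```python
import numpy as np
from itertools import combinations
from math import prod

def vandermonde(xs):
    """V[i][k] = xs[i]**k, the Vandermonde matrix of the nodes xs."""
    return np.array([[x ** k for k in range(len(xs))] for x in xs], dtype=float)

xs = [2.0, -1.0, 0.5, 3.0]             # arbitrary distinct nodes
V = vandermonde(xs)
formula = prod(xs[i] - xs[j] for j, i in combinations(range(len(xs)), 2))
print(np.linalg.det(V), formula)       # the two values agree, up to rounding
```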
4.5
Eigenvalues and the characteristic polynomial
1. (a) The eigenvalues are λ1 = 1 (algebraic multiplicity 2) and λ2 = −1 (algebraic multiplicity 1). Bases for the eigenspaces are {(0, 1, 0), (1, 0, 2)} and {(4, 0, 7)}, respectively.
(b) The eigenvalues are λ1 = 2 (algebraic multiplicity 2) and λ2 = 1 (algebraic multiplicity 1). Bases for the eigenspaces are {(1, 1, −2)} and {(5, 5, −9)}, respectively. 2. For each of the following real matrices, find the eigenvalues and a basis for each eigenspace. (a) The only eigenvalue is λ = 2 (algebraic multiplicity 3). A basis for the eigenspace is {(1, 4, 0), (0, 1, 1)}. (b) The eigenvalues are λ1 = 1 + 2i (algebraic multiplicity 1) and λ2 = 1 − 2i (algebraic multiplicity 1). Bases for the eigenspaces are {(−i, 1)} and {(i, 1)}, respectively. 3. (a) The eigenvalues are λ1 = 0 (algebraic multiplicity 1) and λ2 = 1 (algebraic multiplicity 1). Bases for the eigenspaces are {(1, 1)} and {(1, 2)}, respectively. (b) The matrix A has no eigenvalues. The characteristic polynomial is pA (r) = r3 + r2 + 1, which has no roots in Z2 . 4. (a) The eigenvalues are λ1 = −4 + i (algebraic multiplicity 1) and λ2 = 6 + i (algebraic multiplicity 1). Bases for the eigenspaces are {(−3 + 4i, 5)} and {(5, 3 + 4i)}, respectively. (b) The eigenvalues are λ1 = −1, λ2 = 2, and λ3 = 5 (each with algebraic multiplicity 1). Bases for the eigenspaces are {(−1 + 3i, 2, −2i)}, {(−1 − 3i, 5, −3 + i)}, and {(2, 2, 3 + i)}, respectively. 5. Suppose A ∈ Rn×n has a real eigenvalue λ and a corresponding eigenvector z ∈ Cn . We wish to show that either the real or imaginary part of z is an eigenvector of A. We are given that Az = λz, z = 0. Write z = x + iy, x, y ∈ Rn . Then Az = λz ⇒ A(x + iy) = λ(x + iy) ⇒ Ax + iAy = λx + iλy ⇒ Ax = λx and Ay = λy. Since z = 0, it follows that x = 0 or y = 0. If x = 0, then the real part of z is an eigenvector for A corresponding to λ; otherwise, y = 0 and the imaginary part of z is an eigenvector. 6. Let A ∈ R2×2 be defined by
A=
a b
b c
,
where a, b, c ∈ R. (a) The characteristic polynomial of A is pA (r) = r2 − (a + c)r + ac − b2 , and hence the eigenvalues of A are given by the quadratic formula: . a + c ± (a − c)2 + 4b2 λ= . 2 Since (a − c)2 + 4b2 ≥ 0 for all a, b, c, it follows that the eigenvalues are real. (b) The matrix A has a multiple eigenvalue if and only if (a − c)2 + 4b2 = 0, which holds if and only if a = c and b = 0, that is, if and only if A has the form a 0 . A= 0 a 7. Let A ∈ R2×2 be defined by
A=
a c
b d
,
where a, b, c, d ∈ R. Then the characteristic polynomial of A is pA (r) = r2 − (a + d)r + ad − bc,
and the eigenvalues are given by

\lambda = \frac{a + d \pm \sqrt{(a - d)^2 + 4bc}}{2}.
(a) The matrix A has a multiple eigenvalue if and only if (a − d)2 + 4bc = 0, in which case the eigenvalue is λ = (a + d)/2. There are three case in which this holds. First, we could have b = 0 and a = d: a 0 . A= c a Second, we could have c = 0 and a = d: A=
a 0
b a
.
Finally, if b = 0, c = 0, then bc < 0 must hold (that is, b and c have opposite signs) and (a−d)2 +4bc = 0 yields c = −(a − d)2 /(4b): " ! a b 2 . A= d − (a−d) 4b Note that b = 0, c = 0, (a − d)2 + 4bc = 0 together imply that a = d. (b) Now we wish to determine, in each of the three cases described above, whether the eigenspace of λ = a + d has dimension one or two. In the first case (b = 0, d = a), the eigenspace is onedimensional if c = 0 and two-dimensional if c = 0. Similarly, in the second case (c = 0, d = a), EA (λ) is one-dimensional if b = 0 and two-dimensional if b = 0. In the third case, the eigenspace is one-dimensional (since it is easy to verify that λI − A is not the zero matrix, which is the only 2 × 2 matrix with a two-dimensional null space). 8. Let A ∈ Rn×n , where n is odd. Then the characteristic polynomial pA (r) is a polynomial with real coefficients and odd degree. By a standard theorem, pA must have a real root. (Using elementary calculus, pA is a continuous function and either lim pA (r) = −∞ and lim pA (r) = ∞
r→−∞
r→∞
or lim pA (r) = ∞ and lim pA (r) = −∞.
r→−∞
r→∞
In either case, the intermediate value theorem implies that pA (r) takes on every real number, including zero. One can also give an algebraic proof.) 9. Let q(r) = rn + cn−1 rn−1 + · · · + c0 be an arbitrary polynomial with coefficients in a field F , and let ⎡
0 0 ⎢ 1 0 ⎢ ⎢ A=⎢ 0 1 ⎢ .. ⎣ .
0 0 0 .. .
··· ··· ··· .. .
−c0 −c1 −c2 .. .
0 0
···
1
−cn−1
⎤ ⎥ ⎥ ⎥ ⎥. ⎥ ⎦
We wish to prove that pA (r) = q(r). We argue by induction on n. The case n = 1 is trivial, since A is 1 × 1 in that case: A = [−c0 ], |rI − A| = r + c0 = q(r). Suppose the result holds for polynomials of degree
n − 1, let q(r) = rn + cn−1 rn−1 + . . . + c0 , and let A be defined as above. Then % % % % r 0 0 ··· c0 % % % % −1 r 0 · · · c 1 % % % % 0 −1 r · · · c 2 |rI − A| = % %, % % .. . . . .. .. .. % % . % % % 0 0 · · · −1 r + cn−1 % and cofactor expansion along the first row yields % % % r % 0 ··· c1 % % % −1 r · · · % c2 % % |rI − A| = r % . %+ .. % .. ... % . % % % 0 · · · −1 r + cn−1 % % % % % −1 r ··· % % % % −1 r · · · % % % % n+1 . . .. .. (−1) c0 % %. % % % −1 r %% % % −1 % By the induction hypothesis, the first determinant is c1 + c2 r + · · · + cn−1 rn−2 + rn−1 , and the second is simply (−1)n−1 . Thus pA (r) = |rI − A| = r c1 + c2 r + · · · + cn−1 rn−2 + rn−1 + (−1)n+1 c0 (−1)n−1 = c1 r + c2 r2 + · · · + cn−1 rn−1 + rn + c0 = q(r). This completes the proof by induction. 10. Let A ∈ Cn×n , and let the eigenvalues of A be λ1 , λ2 , . . . , λn , listed according to multiplicity. Then pA (r) = (r − λ1 )(r − λ2 ) · · · (r − λn ). Multiplying this out yields pA (r) = rn − (λ1 + λ2 + · · · + λn )rn−1 + · · · + (−1)n λ1 λ2 · · · λn (we have no need to determine the other coefficients). Notice that pA (0) = (−1)n λ1 λ2 · · · λn . On the other hand, pA (r) = det(rI − A), so pA (0) = det(−A) = (−1)n det(A) (the last step follows from part 2 of the definition of the determinant, Definition 160). This shows that det(A) = λ1 λ2 · · · λn . Also, by looking at the complete expansion of the determinant det(rI − A), we see that there is exactly one term that contains a polynomial of degree greater than n − 2, namely, the term corresponding to the identity permutation: (r − A11 )(r − A22 ) · · · (r − Ann ) = rn − (A11 + A22 + · · · + Ann )rn−1 + · · · . Every other term in the complete expansion must contain at least two off-diagonal entries from rI − A, and the off-diagonal entries do not contain the variable r. Hence the remaining terms from the complete expansion only affect the coefficients of degree n − 2 or less in pA (r), and therefore pA (r) = rn − (A11 + A22 + · · · + Ann )rn−1 + · · · . This shows that A11 + A22 + · · · + Ann = λ1 + λ2 + · · · + λn , that is, tr(A) = λ1 + λ2 + · · · + λn .
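Exercises 9 and 10 can both be illustrated numerically: build the companion matrix of a polynomial, check that its eigenvalues are the polynomial's roots, and check that det and trace equal the product and sum of the eigenvalues. A sketch with an arbitrary cubic:

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of q(r) = r^n + c_{n-1} r^{n-1} + ... + c_0, coeffs = [c_0, ..., c_{n-1}]."""
    n = len(coeffs)
    A = np.zeros((n, n))
    A[1:, :-1] = np.eye(n - 1)        # subdiagonal of ones
    A[:, -1] = -np.asarray(coeffs)    # last column holds -c_0, ..., -c_{n-1}
    return A

# q(r) = r^3 - 6r^2 + 11r - 6 = (r - 1)(r - 2)(r - 3), so coeffs = [-6, 11, -6].
A = companion([-6.0, 11.0, -6.0])
lam = np.linalg.eigvals(A)
print(np.sort(lam.real))                     # approximately [1, 2, 3]
print(np.prod(lam).real, np.linalg.det(A))   # det(A) = product of eigenvalues = 6
print(np.sum(lam).real, np.trace(A))         # tr(A) = sum of eigenvalues = 6
```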
11. Let F be a field and let A belong to F n×n . By definition, pAT (r) = det(rI − AT ) = det((rI − A)T ) = det(rI − A), where the last step follows from Theorem 167. Since pA (r) = det(rI − A), this shows that pAT = pA , and hence that A and AT have the same eigenvalues. 12. Let A ∈ F n×n be invertible. This implies in particular that λ = 0 is not an eigenvalue of A (if Ax = 0 · x = 0, x = 0, then A is singular and not invertible). Suppose λ ∈ F , x ∈ F n satisfy Ax = λx, x = 0. Multiplying both sides of the eigenvalue equation by A−1 yields x = λ(A−1 x). Since λ = 0, we can also multiply both sides by λ−1 to obtain A−1 x = λ−1 x. Since x = 0, this shows that x is also an eigenvector of A−1 . The corresponding eigenvalue is λ−1 . 13. The characteristic polynomial of A is pA (r) = r2 − 5r − 2, while the characteristic polynomial of U is pU (r) = r2 + r − 2. 14. Let A ∈ F m×n . The purpose of this exercise is to show that the nonzero eigenvalues of AT A and AAT are the same. (a) Let λ = 0 be an eigenvalue of AT A, say AT Ax = λx, where x ∈ F n is nonzero. Then, multiplying both sides of the eigenvalue equation by A, we obtain A(AT Ax) = A(λx) ⇒ (AAT )(Ax) = λ(Ax). Since λ = 0, Ax = 0 (otherwise, the equation AT Ax = λx would not hold). Therefore, λ is an eigenvalue of AAT with eigenvector Ax. (b) Let λ = 0 be an eigenvalue of AAT , say AAT x = λx, where x ∈ F m is nonzero. Then, multiplying both sides of the eigenvalue equation by AT , we obtain AT (AAT x) = AT (λx) ⇒ (AT A)(AT x) = λ(AT x). Since λ = 0, AT x = 0 (otherwise, the equation AAT x = λx would not hold). Therefore, λ is an eigenvalue of AT A with eigenvector AT x. (c) If λ = 0, then AT Ax = λx implies AT Ax = 0, and hence it is possible that Ax = 0 (in fact, as Exercise 6.4.1 shows, it is necessary that Ax = 0). Since the zero vector cannot be an eigenvector, the above reasoning fails to show that 0 is an eigenvalue (d) For the given A, the eigenvalues of AT A are 0, 1, 2, while the eigenvalues of AAT are 1, 2.
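Exercise 14's conclusion, that A^T A and AA^T share their nonzero eigenvalues, is easy to see numerically; the matrix below is an arbitrary 2 × 3 example (the exercise's own matrix is not reproduced here).

```python
import numpy as np

A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])       # arbitrary 2 x 3 example

eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))   # 3 eigenvalues, including 0
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))   # 2 eigenvalues
print(eig_AtA)   # [0., 1., 2.] for this A
print(eig_AAt)   # [1., 2.] -- the nonzero eigenvalues coincide
```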
4.6
Diagonalization
1. Let A ∈ F n×n and let λ1 , λ2 be distinct eigenvalues of A. We wish to prove that Eλi (A) ∩ Eλj (A) = {0}. Suppose x ∈ Eλi (A) ∩ Eλj (A). Then Ax = λ1 x and Ax = λ2 x, which implies that λ1 x = λ2 x and hence that (λ1 − λ2 )x = 0. Since λ1 − λ2 = 0 by hypothesis, this implies that x = 0, which is what we wanted to prove. 2. A is diagonalizable: A = XDX −1 , where −1 1 −1 0 . , X= D= 1 1 0 3 3. A is diagonalizable: A = XDX −1 , where √ √ √ 1−i 2 √0 , X = i 2 −i 2 . D= 1 1 0 1+i 2
4. A is diagonalizable: A = XDX −1 , where ⎤ ⎡ ⎡ 1 2 0 0 0 ⎦, X = ⎣ 1 D = ⎣ 0 −1 1 0 0 −1
⎤ −1 −1 1 0 ⎦. 0 1
5. A is diagonalizable: A = XDX −1 , where ⎤ ⎡ ⎡ 0 1 0 0 0 ⎦, X = ⎣ 1 D=⎣ 0 2 0 0 0 −1
⎤ −1 0 0 3 ⎦. 1 1
6. A is diagonalizable: A = XDX −1 , where ⎤ ⎤ ⎡ ⎡ 1 i −i 1 0 0 D = ⎣ 0 −i 0 ⎦ , X = ⎣ 0 −1 − i −1 + i ⎦ . 1 1 1 0 0 i 7. A is diagonalizable: A = XDX −1 , where D=
0 0 0 1
0 1 1 1
, X=
.
8. A is not diagonalizable. It has a single eigenvalue, λ = 1, and the dimension of EA (1) is only 1.
9. Let A=
a 0
b c
,
where a, b, c are elements of a given field F . The eigenvalues of A are a and c. If a = c, then A has two distinct eigenvalues and hence two linearly independent eigenvectors; that is, there is a basis for F 2 consisting of eigenvectors of A. Thus A is diagonalizable in this case. On the other hand, if a = c, then A has a single eigenvalue of algebraic multiplicity 2. If b = 0, then 0 −b = sp{(1, 0)} EA (a) = N (aI − A) = N 0 0 has dimension only 1. In this case, there is not a basis of F 2 consisting of eigenvectors of A, and A is not diagonalizable. 10. Let F be a field and let A ∈ F n×n . Suppose λ1 , . . . , λm are distinct eigenvalues of A and, for each (i) (i) i = 1, 2, . . . , m, {x1 , . . . , xki } is a linearly independent set of eigenvectors of A corresponding to λi . We wish to show that (1) (1) (2) (2) (m) (m) x1 , . . . , xk1 , x1 , . . . , xk2 , . . . , x1 , . . . , xkm (4.1) is linearly independent. We first note that if (i)
(i)
yi ∈ sp{x1 , . . . , xki }, i = 1, . . . , m, then y1 + · · · + ym = 0 implies yi = 0 for all i = 1, . . . , m. For otherwise, y1 + · · · + ym = 0 would imply that eigenvectors corresponding to distinct eigenvalues can be linearly dependent, contradicting Theorem (i) (i) 193. Now suppose α1 , . . . , αki ∈ F , i = 1, . . . , t satisfy (1) (1)
(1) (1)
(2) (2)
(2) (2)
(m) (m)
α1 x1 + · · · αk1 xk1 + α1 x1 + · · · + αk2 xk2 + · · · + α1 x1
(m) (m)
+ · · · + αkm xkm = 0.
Define y_i = α_1^{(i)} x_1^{(i)} + \cdots + α_{k_i}^{(i)} x_{k_i}^{(i)} for each i = 1, ..., m. Then y_1 + \cdots + y_m = 0, and since y_i ∈ sp{x_1^{(i)}, ..., x_{k_i}^{(i)}}
yi ∈ sp{x1 , . . . , xki } for each i, it follows from our above reasoning that yi = 0 for each i. But then the linear independence (i) (i) (i) (i) of {x1 , . . . , xki } implies that α1 = · · · = αki = 0. This holds for all i = 1, . . . , m, and hence we have shown that (4.1) is linearly independent. 11. Let F be a field. We wish to show that the relation “is similar to” is an equivalence relation on F n×n . First, if A ∈ F n×n , then I −1 AI = A, where I is the identity matrix, and thus A is similar to itself. Next, if A, B ∈ F n×n and A is similar to B, then there exists an invertible X ∈ F n×n such that X −1 AX = B. But then XBX −1 = A, or Y −1 BY = A for Y = X −1 . Since Y is obviously invertible, this shows that B is similar to A. Finally, if A, B, C ∈ F n×n with A similar to B and B similar to C, then there exist invertible matrices X, Y ∈ F n×n such that X −1 AX = B and Y −1 BY = C. But then Y −1 X −1 AXY = C, or (XY )−1 A(XY ) = C. Since XY is invertible, this shows that A is similar to C, and completes the proof. 12. Let F be a field and suppose A ∈ F n×n has n distinct eigenvalues. Then, by Theorem 193, there is a linearly independent subset of F n consisting of n eigenvectors of A. But such a set is a basis for F n . Since there is a basis for F n consisting of eigenvectors of A, A is diagonalizable (by the construction on page 243 of the text). 13. Let F be a finite field. We will show that F is not algebraically closed by constructing a polynomial p(x) with coefficients in F such that p(x) = 0 for all x ∈ F . Let the elements of F be α1 , α2 , . . . , αq . Define p(x) = (x − α1 )(x − α2 ) · · · (x − αq ) + 1. Then p is a polynomial of degree q, and p(x) = 1 for all x ∈ F . Thus p(x) has no roots, which shows that F is not algebraically closed. 14. Let F be a field and suppose A, B ∈ F n×n are diagonalizable. Suppose further that the two matrices have the same eigenvectors. We wish to prove that A and B commute. Let X ∈ F n×n is an invertible matrix whose columns are eigenvectors of A. Then X −1 AX = D is diagonal. Since every eigenvector of A is also an eigenvector of B, it follows that X −1 BX = C is also diagonal. But then we can also write A = XDX −1 and B = XCX −1 , and hence AB = XDX −1 XCX −1 = XDCX −1 = XCDX −1 = XCX −1 XDX −1 = BA. Here we have used the fact that diagonal matrices always commute. This proves that A and B commute. 15. Let F be a field and suppose A ∈ F n×n satisfies A = XDX −1 , where D ∈ F n×n is diagonal and X ∈ F n×n is invertible. Then A2 = XDX −1 XDX −1 = XDDX −1 = XD2 X −1 , A3 = A2 A = XD2 X −1 XDX −1 = XD2 DX −1 = XD3 X −1 , and, in general, Ak = XDk X −1 . It is expensive to multiply two n × n matrices, requiring about 2n3 arithmetic operations. However, multiplying two diagonal matrices requires only n operations. Therefore, if n and k are very large, it is much faster to compute XDk X −1 instead of computing Ak directly. 16. Let F be a field and suppose A ∈ F n×n is diagonalizable. We wish to prove that pA (A) = 0. From the previous exercise, we know that Ak = XDk X −1 for each positive integer k. Hence, if pA (r) = c0 + c1 r + . . . + cn−1 rn−1 + rn ,
then pA (A) = c0 I + c1 A + . . . + cn−1 An−1 + An = c0 XX −1 + c1 XDX −1 + . . . + cn−1 XDn−1 X −1 + XDn X −1 = X c0 I + c1 D + . . . + cn−1 Dn−1 + Dn X −1 = XpA (D)X −1 . The matrix D is diagonal and its diagonal entries are the eigenvalues λ1 , . . . , λn of A (listed according to multiplicity). Since (Dk )ii = λki for each positive integer k, it follows that pA (D) is the diagonal matrix with diagonal entries pA (λ1 ), . . . , pA (λn ). But pA (λi ) = 0 for all i = 1, . . . , n (since the eigenvalues of A are the roots of pA ). Thus pA (D) is the zero matrix, from which it follows that pA (D) = 0, as desired. 17. Let A ∈ F n×n be diagonalizable, where F is a field, and let λ ∈ F be an eigenvalue of A. We will prove that N ((A − λI)2 ) = N (A − λI). (a) If (A − λI)x = 0, then (A − λI)2 x = (A − λI)(A − λI)x = (A − λI)0 = 0. This shows that N (A − λI) ⊂ N ((A − λI)2 ). (b) Let A = XDX −1 , where D ∈ F n×n is diagonal, and assume that X = [X1 |X2 ], where the columns of X1 form a basis for Eλ (A), say X1 = [x1 | · · · |x ], and the columns of X2 are eigenvectors corresponding to eigenvalues of A unequal to λ, say X2 = [y1 | · · · |yk ], where Ayi = μi yi , μi = λ, i = 1, . . . , k. If N ((A − λI)2 ) ⊂ N (A − λI), then there exists a vector u ∈ F n such that (A − λI)u = 0, (A − λI)2 u = 0.
(4.2)
We can write u = Xz = X1 w + X2 v. Notice that (A − λI)X1 = 0 since each column of X1 is an eigenvector corresponding to λ. Therefore, (A − λI)u = (A − λI)X1 w + (A − λI)X2 v = (A − λI)X2 v and similarly (A − λI)2 u = (A − λI)2 X2 v. Thus (4.2) implies that (A − λI)X2 v = 0, (A − λI)2 X2 v = 0. (c) However, for each yi , (A − λI)yi = Ayi − λyi = μi yi − λyi = (μi − λ)yi , and similarly (A − λI)2 yi = (μi − λ)2 yi . Therefore, (A − λI)2 X2 v = 0 ⇒ (A − λI)2 (v1 y1 + · · · + vk yk ) = 0 ⇒ v1 (A − λI)2 y1 + · · · + vk (A − λI)2 yk = 0 ⇒ v1 (μ1 − λ)2 y1 + · · · + vk (μk − λ)2 yk = 0. Since {y1 , . . . , yk } is linearly independent and μi − λ = 0 for each i, this implies that v1 = . . . = vk = 0, that is, that v = 0. But this contradicts that (A − λI)X2 v = 0. This contradiction shows that N ((A − λI)2 ) ⊂ N (A − λI) must hold. 18. Let
⎡
1 0 A = ⎣ −5 1 0 0
⎤ 0 1 ⎦ 1
and notice that r = 1, x = (0, 1, 0) = e2 form an eigenpair of A.
(a) We write x1 = x and extend {x1 } to a basis {x1 , x2 , x3 } for R3 , where x2 = e1 , x3 = e3 . (b) Define X = [x1 |x2 |x3 ] = [e2 |e1 |e3 ]. We have
⎡
1 −5 1 B = X −1 AX = ⎣ 0 0 0
⎤ 1 0 ⎦. 1
In terms of the notation of the proof of Theorem 199, we have v = (−5, 1) and 1 0 . C= 0 1 (c) We have
⎤ ⎤ ⎡ ⎤⎡ 1 1 1 −5 1 1 0 ⎦ ⎣ 0 ⎦ = ⎣ 0 ⎦ = 1 · e1 . Be1 = ⎣ 0 0 0 0 0 1 ⎡
(d) It is easy to see that z = (0, 1, 5) is another eigenvector of B. We write 0 , z= u where u = (1, 5). It is straightforward to verify that Cu = 1 · u. (e) Another eigenvector of C is v = (0, 1). Obviously {u, v} is linearly independent (since v is not a multiple of u). We define 0 w= . v Note, however, w is not an eigenvector of B. Indeed, B has a single (independent) eigenvector having the first component equal to zero.
4.7
Eigenvalues of linear operators
1. Let T : R3 → R3 be defined by T (x) = (ax1 + bx2 , bx1 + ax2 + bx3 , bx2 + ax3 ), where a, b ∈ R are constants. Notice that T (x) = Ax for all x ∈ R3 , where ⎤ ⎡ a b 0 A = ⎣ b a b ⎦. 0 b a √ √ The eigenvalues of A are a, a+ 2b, a− 2b. If b = 0, then A is already diagonal, which means that [T ]X ,X is diagonal if X is the standard basis. If b = 0, then A (and hence T ) has three distinct eigenvalues, and hence three linearly independent eigenvectors. It follows that there exists a basis X such that [T ]X ,X is diagonal. 2. Let T : R3 → R3 be defined by T (x) = (x1 + x2 + x3 , x2 , x1 + x3 ). Notice that T (x) = Ax for all x ∈ R3 , where
⎡
1 A=⎣ 0 1
⎤ 1 1 1 0 ⎦. 0 1
The eigenvalues and eigenvectors of A are λ1 = 0, λ2 = 1, λ3 = 2 and x1 = (−1, 0, 1), x2 = (0, −1, 1), x3 = (1, 0, 1). If we define X = {x1 , x2 , x3 }, then T (xi ) = λi xi , which shows that ⎤ ⎡ ⎤ ⎡ λ1 0 0 0 0 0 0 ⎦ = ⎣ 0 1 0 ⎦. [T ]X ,X = ⎣ 0 λ2 0 0 2 0 0 λ3 3. Let T : R3 → R3 be defined by T (x) = (x1 + x2 + x3 , ax2 , x1 + x3 ), where a ∈ R is a constant. We wish to determine the values of a for which T is diagonalizable. We have T (x) = Ax for all x ∈ R3 , where ⎤ ⎡ 1 1 1 A = ⎣ 0 a 0 ⎦. 1 0 1 The eigenvalues of A are λ1 = 0, λ2 = a, λ3 = 2. If a = 0 and a = 2, then A (and hence T ) has three distinct eigenvalues and hence is diagonalizable. If a = 0 or a = 2, direct calculation shows that A is defective in each case, and hence T is not diagonalizable. Thus T is diagonalizable if and only if a = 0 and a = 2. 4. Let T : P2 → P2 be defined by T (p)(x) = p(ax + b), where a = 0, a = 1, and suppose p1 (x) = 1, p2 (x) =
b b2 2b + x, p3 (x) = x + x2 . + 2 a−1 (a − 1) a−1
Then obviously T (p1 ) = p1 = 1 · p1 . We have ab b + ax + b = + ax = a T (p2 )(x) = a−1 a−1
b + x = ap2 (a). a−1
Finally, b2 2b (ax + b) + (axb )2 + (a − 1)2 a−1 a2 b 2 2a2 b x + a2 x2 = + (a − 1)2 a−1 b2 2b 2 2 x + x = a2 p3 (x). =a + (a − 1)2 a−1
T (p3 )(x) =
5. Let T : P2 → P2 be defined by T (p)(x) = p(x + b), where b = 0. If p1 (x) = 1, p2 (x) = x, p3 (x) = x2 , then T (p1 )(x) = 1 ⇒ T (p1 ) = p1 , T (p2 )(x) = x + b ⇒ T (p2 ) = bp1 + p2 , T (p3 )(x) = (x + b)2 ⇒ T (p3 ) = b2 p1 + 2bp2 + p3 . Therefore
⎡
1 b A = [T ]S,S = ⎣ 0 1 0 0
⎤ b2 2b ⎦ 1
(where S is the standard basis). By inspection, A (and hence T ) has a single eigenvalue λ = 1. Moreover, a straightforward calculation shows that EA (λ) = sp{(1, 0, 0)}. Hence A is defective, and therefore so is [T ]U ,U for any basis U (since [T ]U ,U is similar to A for every basis U of P2 ). This shows that, for every basis U of P2 , [T ]U ,U is not diagonalizable.
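The defectiveness found in Exercise 5 can be confirmed numerically from the matrix [T]_{S,S} computed above; a sketch with the sample value b = 2 (any nonzero b behaves the same way):

```python
import numpy as np

b = 2.0   # any nonzero shift
A = np.array([[1.0, b, b**2],
              [0.0, 1.0, 2*b],
              [0.0, 0.0, 1.0]])       # matrix of p(x) -> p(x + b) on P2 in the basis {1, x, x^2}

lam = np.linalg.eigvals(A)
print(lam)                                   # all eigenvalues equal 1
print(np.linalg.matrix_rank(np.eye(3) - A))  # rank 2, so dim E_A(1) = 1: A is defective
```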
6. Let T : Pn → Pn be defined by T (p)(x) = xp (x). We see that T maps xk to kxk for each k = 0, 1, . . . , n. Thus λ = k, p(x) = xk is an eigenpair for each k = 0, 1, . . . , n. Since T has at most n + 1 eigenvalues, we have found all of them, and we see that pT (r) = r(r − 1)(r − 2) · · · (r − n). (We also see that [T ]S,S is the diagonal matrix with diagonal entries 0, 1, 2, . . . , n, where S is the standard basis for Pn .) 7. Let D : Pn → Pn be the differentiation operator: D(p) = p for all p ∈ Pn . Let S = {1, x, x2 , . . . , xn } be the standard basis for Pn , and notice that D maps xk to kxk−1 . This means that [D]S,S is extremely simple: ⎤ ⎡ 0 1 0 0 ··· 0 ⎢ 0 0 2 0 ··· 0 ⎥ ⎥ ⎢ ⎢ 0 0 0 3 ··· 0 ⎥ ⎥ ⎢ . [D]S,S = ⎢ . . . . . . . ... ⎥ ⎥ ⎢ .. .. .. .. ⎥ ⎢ ⎣ 0 0 0 0 ··· n ⎦ 0 0 0 0 ··· 0 We see that λ = 0 is the only eigenvalue of D, and a straightforward calculation shows that the only eigenvector of [D]S,S is e1 ∈ Rn+1 , that is, the only eigenvector of D is the constant polynomial p(x) = 1. 8. Consider the operator T defined in Example 204 and the alternate basis B = {1 − x, 1 + x, 1 − x2 } for P2 . Writing q1 (x) = 1 − x, q2 (x) = 1 + x, q3 (x) = 1 − x2 , we have 1−b+a 1−b−a q1 (x) + q2 (x), 2 2 1+b−a 1+b+a q1 (x) + q2 (x), T (q2 )(x) = 1 + ax + b = 2 2 T (q3 )(x) = 1 − (ax + b)2 T (q1 )(x) = 1 − ax − b =
= which shows that
1 − b2 + 2ab − a2 1 − b2 − 2ab − a2 q1 (x) + q2 (x) + a2 q3 (x), 2 2 ⎡ ⎢ [T ]B,B = ⎣
1−b+a 2 1−b−a 2
1+b−a 2 1+b+a 2
0
0
1−b2 +2ab−a2 2 1−b2 −2ab−a2 2 2
⎤ ⎥ ⎦.
a
A direct calculation shows that the eigenpairs of [T ]B,B are λ1 = 1, v1 = (1, 1, 0), λ2 = a, v2 = (1 − a + b, a + b − 1, 0), λ3 = a2 , v3 = (−(1 − a + b)2 , −(1 − a − b)2 , 2(a − 1)2 ). The corresponding eigenvectors of T are computes as follows: For λ1 = 1: p1 (x) = q1 (x) + q2 (x) = 2. For λ2 = a:
p2 (x) = (1 − a + b)q1 (x) + (a + b − 1)q2 (x) = 2(a − 1)
b +x . a−1
For λ3 = a2 : p3 (x) = −(1 − a + b)2 q1 (x) − (1 − a − b)2 q2 (x) + 2(a − 1)2 q3 (x) 2b b2 2 x + x = −2(a − 1)2 . + (a − 1)2 a−1
Each of these eigenvectors of T is a scalar multiple of the corresponding eigenvector found in Example 207, and hence the results are equivalent. 9. Let L : C3 → C3 be defined by L(z) = (z1 , 2z2 , z1 + z3 ). Then L(z) = Az for all z ∈ C3 , where ⎡
1 0 A=⎣ 0 2 1 0
⎤ 0 0 ⎦. 1
The eigenvalues of A are the diagonal entries, λ1 = 1 (with algebraic multiplicity 2) and λ2 = 2. A straightforward calculation shows that EA (1) = sp{(0, 0, 1)}, and hence A is defective and not diagonalizable. Notice that A = [L]S,S , where S is the standard basis for C3 . Since [L]X ,X is either diagonalizable for every basis X or diagonalizable for no basis X , we see that [L]X ,X is not diagonalizable for every basis X of C3 . 10. Let T : C3 → C3 be defined by T (x) = (3x1 + x3 , 8x1 + x2 + 4x3 , −9x1 + 2x2 + x3 ). Then the matrix of T under the standard basis is ⎡
3 A=⎣ 8 −9
⎤ 0 1 1 4 ⎦. 2 1
The eigenvalues of A are λ1 = 1 and λ2 = 2. The second eigenvalue has algebraic multiplicity 2 but geometric multiplicity only 1: EA (2) = sp{(−1, −4, 1)}. Thus A is not diagonalizable, which shows that [T ]X ,X cannot be diagonalizable for any basis X of C3 . 11. Let T : Z22 → Z22 be defined by T (x) = (0, x1 + x2 ). Then T (x) = Ax for all x ∈ Z2 , where 0 0 . A= 1 1 The eigenvalues of A are the diagonal entries, λ1 = 0 and λ2 = 1. Corresponding eigenvectors are x1 = (1, 1) and x2 = (0, 1), respectively. It follows that if X = {x1 , x2 }, then [T ]X ,X is diagonal: 0 0 . [T ]X ,X = 0 1 12. Let T : Z33 → Z33 be defined by T (x) = (2x1 + x2 , x1 + x2 + x3 , x2 + 2x3 ). Then the matrix of T with respect to the standard basis is ⎤ ⎡ 2 1 0 A = ⎣ 1 1 1 ⎦. 0 1 2 The characteristic polynomial of A is pA (r) = r2 (r + 1), which shows that λ1 = 0 and λ2 = 2 are the eigenvalues of A (and hence of T ). The first eigenvalue has algebraic multiplicity 2, but EA (0) = sp{(1, 1, 1)}; hence A is defective and not diagonalizable, and it follows that [T ]U ,U is not diagonalizable for any basis U.
13. Let X be a finite-dimensional vector space over C with basis X = {u1 , . . . , uk , v1 , . . . , v }. Let U and V be the subspaces sp{u1 , . . . , uk } and sp{v1 , . . . , v }, respectively. Let T : X → X be a linear operator with the property T (u) ∈ U for each u ∈ U . (We say that U is invariant under T in this case.) (a) We wish to prove that there exists an eigenvector of T belonging to U . We define S : U → U by S(u) = T (u) for all u ∈ U . By Theorem 209, S has an eigenvalue/eigenvector pair, say λ ∈ C, w ∈ U , w = 0, S(w) = λw. Then w ∈ X and T (w) = S(w) = λw. This shows that w ∈ U is an eigenvector of T . (b) Now we prove that [T ]X ,X is a block upper triangular matrix, that is, it has the form A B ∈ Cn×n , [T ]X ,X = 0 C where A ∈ Ck×k , B ∈ Ck× , C ∈ C × , and 0 is the × k matrix of all zeros. We can partition [T ]X ,X to obtain A B ∈ Cn×n , [T ]X ,X = D C where A, B, C have the dimensions given above, and D ∈ C ×k . We must show that D = 0. Now, if we write M = [T ]X ,X and X = {x1 , . . . , xk , xk+1 , . . . , xn } (that is, xi = ui for i = 1, . . . , and xk+i = vi for i = 1, . . . , ), then M [x]X = [T (x)]X n ⇒ M ej = [T (xj )]X implies that T (xj ) = M1j x1 + · · · + Mkj xk + Mk+1,j xk+1 + · · · + Mk+ ,j xk+ . Using the definition of A, B, C, D given above, this can be written as T (uj ) = A1j u1 + · · · + Akj uk + D1j v1 + · · · + D j v . However, T (uj ) ∈ U , and the representation of a vector in terms of a basis is unique, so we must have T (uj ) = A1j u1 + · · · + Akj uk + 0 · v1 + · · · + 0 · v , that is, Dij = 0 for i = 1, . . . , and any j = 1, . . . , k. Thus D is the zero matrix and [T ]X ,X is block upper triangular, as desired. (c) We can now show that if V is also invariant under T , then [T ]X ,X is block diagonal. This is equivalent to proving that B = 0 also holds. The proof is essentially the same as in part (b). We have T (vj ) = M1,k+j x1 + · · · + Mk+ ,k+j xk+ = B1j u1 + · · · + Bkj uk + C1j v1 + · · · + C j v . Assuming T (vj ) ∈ V , the coefficients B1j , . . . , Bkj in this last equation must be zero. Since this holds for all j = 1, . . . , , it follows that B is the zero matrix. 14. Let A ∈ Cn×n be an arbitrary matrix. We will prove by induction that A is similar to an upper triangular matrix B. The case n = 1 is trivial, since every 1 × 1 matrix is upper triangular. Let us assume that every matrix in C(n−1)×(n−1) is similar to an upper triangular matrix. Let A ∈ Cn×n be given. We know that A has an eigenvalue/eigenvector pair, say λ ∈ C, x1 ∈ Cn . We can extend {x1 } to a basis {x1 , x2 , . . . , xn } of Cn . We then define X = [x1 |x2 | · · · |xn ]. We have AX = [Ax1 |Ax2 | · · · |Axn ] = [λx1 |Ax2 | · · · |Axn ] ⇒ X −1 AX = [λX −1 x1 |X −1 Ax2 | · · · |X −1 Axn ].
Since X^{-1}X = I and X^{-1}x1 is the first column of X^{-1}X, it follows that X^{-1}x1 = e1, and thus the first column of X^{-1}AX is λe1. We can therefore partition X^{-1}AX as
X^{-1}AX = [ λ  v
             0  B ],
where v ∈ C^{n−1} and B ∈ C^{(n−1)×(n−1)}. By the induction hypothesis, there exists an invertible matrix Y ∈ C^{(n−1)×(n−1)} such that Y^{-1}BY = T is upper triangular. We can then write
[ 1  0         [ λ  v    [ 1  0      [ λ  vY            [ λ  vY
  0  Y^{-1} ]    0  B ]    0  Y ]  =   0  Y^{-1}BY ]  =    0  T ]
(here vY is interpreted as a 1 × (n − 1) matrix v times the (n − 1) × (n − 1) matrix Y). If we write
Z = [ 1  0
      0  Y ],
then we see that
Z^{-1}X^{-1}AXZ = [ λ  vY
                    0  T ],
and this last matrix is upper triangular because T is upper triangular. Since Z^{-1}X^{-1}AXZ = (XZ)^{-1}A(XZ), this shows that A is similar to an upper triangular matrix, which completes the proof.
15. Suppose V is a finite-dimensional vector space over C and T : V → V is linear. Let V = {v1, . . . , vn} be any basis for V, and define A = [T]V,V. By the previous exercise, there exists an invertible matrix X ∈ C^{n×n} such that X^{-1}AX is upper triangular. Now define U = {u1, . . . , un} by uj = Σ_{i=1}^n Xij vi. Notice that, for any a1, . . . , an ∈ C,
Σ_{j=1}^n aj uj = Σ_{j=1}^n aj Σ_{i=1}^n Xij vi = Σ_{i=1}^n Σ_{j=1}^n Xij aj vi = Σ_{i=1}^n ( Σ_{j=1}^n Xij aj ) vi.
This equation shows that [v]V = X[v]U for all v ∈ V. Now, consider [T]U,U. We can manipulate its definition as follows:
[T]U,U [v]U = [T(v)]U ⇒ [T]U,U X^{-1}[v]V = X^{-1}[T(v)]V ⇒ X[T]U,U X^{-1}[v]V = [T(v)]V.
This holds for all v ∈ V, and shows that [T]V,V = X[T]U,U X^{-1} or, equivalently,
[T]U,U = X^{-1}[T]V,V X.
Thus [T ]U ,U is upper triangular. Note that we cannot prove this result (or the result of the previous exercise) for an arbitrary field. We need an algebraically closed field like C to be able to conclude that every linear operator (or every matrix) has an eigenvalue/eigenvector pair.
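The inductive argument of Exercise 14 is constructive, and it is essentially how a Schur-type triangularization can be computed. The following NumPy sketch (an illustration added here, not part of the original solution; the function name and the orthonormal basis extension via QR are choices made for convenience) carries out the recursion on the matrix of Exercise 16 below. In practice one would call a library routine such as scipy.linalg.schur instead.

```python
import numpy as np

def triangularize(A):
    """Return X with X^{-1} A X upper triangular, following Exercise 14's induction."""
    n = A.shape[0]
    if n == 1:
        return np.eye(1, dtype=complex)
    _, vecs = np.linalg.eig(A)
    x1 = vecs[:, 0]                                   # one eigenvector of A
    # extend x1 to a basis of C^n (here an orthonormal one, obtained by QR)
    Q, _ = np.linalg.qr(np.column_stack([x1[:, None], np.eye(n)]))
    M = Q.conj().T @ A @ Q                            # = [[lam, v], [0, B]] up to roundoff
    B = M[1:, 1:]
    Z = np.eye(n, dtype=complex)
    Z[1:, 1:] = triangularize(B)                      # induction on the smaller block
    return Q @ Z

A = np.array([[3, 0, 1], [8, 1, 4], [-9, 2, 1]], dtype=float)
X = triangularize(A)
T = np.linalg.inv(X) @ A @ X
print(np.allclose(np.tril(T, -1), 0, atol=1e-8))      # True: T is upper triangular
```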
16. Let T : C^3 → C^3 be the matrix operator defined by the matrix
A = [  3  0  1
       8  1  4
      −9  2  1 ].
An eigenvalue of A is λ = 2, and a corresponding eigenvector is x1 = (−1, −4, 1). We define x2 = (0, 1, 0), x3 = (0, 0, 1), and
X = [ −1  0  0
      −4  1  0
       1  0  1 ].
Then
X^{-1}AX = [ 2  0  −1
             0  1   0
             0  2   2 ],
and the matrix B from the solution of Exercise 14 is
B = [ 1  0
      2  2 ].
We now repeat this process with the matrix B. An eigenpair is 2, (0, 1). We extend {(0, 1)} to the basis {(0, 1), (1, 0)} and define
Y = [ 0  1
      1  0 ].
Then
Y^{-1}BY = [ 2  2
             0  1 ].
Finally, we define
Z = [ 1  0  0
      0  0  1
      0  1  0 ],
and then
Z^{-1}X^{-1}AXZ = [ 2  −1  0
                    0   2  2
                    0   0  1 ].
Let X be the basis consisting of the columns of
XZ = [ −1  0  0
       −4  0  1
        1  1  0 ].
Then [T ]X ,X = Z −1 X −1 AXZ is upper triangular.
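As a quick numerical check of the computation in Exercise 16, the following NumPy snippet (added here for illustration, not part of the original solution) reproduces X^{-1}AX and the final upper triangular matrix (XZ)^{-1}A(XZ).

```python
import numpy as np

A = np.array([[3, 0, 1], [8, 1, 4], [-9, 2, 1]], dtype=float)
X = np.array([[-1, 0, 0], [-4, 1, 0], [1, 0, 1]], dtype=float)
Z = np.array([[1, 0, 0], [0, 0, 1], [0, 1, 0]], dtype=float)

print(np.round(np.linalg.inv(X) @ A @ X, 10))          # [[2, 0, -1], [0, 1, 0], [0, 2, 2]]
XZ = X @ Z
print(np.round(np.linalg.inv(XZ) @ A @ XZ, 10))        # [[2, -1, 0], [0, 2, 2], [0, 0, 1]]
```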
4.8 Systems of linear ODEs
1. Let λ = α + iβ ∈ C. Then
e^{λt} = e^{(α+iβ)t} = e^{αt+iβt} = e^{αt} e^{iβt} = e^{αt}(cos(βt) + i sin(βt)),
and therefore
d/dt [e^{λt}] = αe^{αt}(cos(βt) + i sin(βt)) + e^{αt}(−β sin(βt) + iβ cos(βt))
             = e^{αt}(α cos(βt) + iα sin(βt) − β sin(βt) + iβ cos(βt))
             = e^{αt}(α cos(βt) − β sin(βt) + i(α sin(βt) + β cos(βt)))
             = e^{αt}(α + iβ)(cos(βt) + i sin(βt))
             = λe^{αt}(cos(βt) + i sin(βt)) = λe^{λt}.
2. Let A ∈ C^{n×n} be given, and suppose that λ ∈ C, x ∈ C^n form an eigenpair of A. Then, since x is constant with respect to t, we have
u′(t) = d/dt [e^{λt} x] = (d/dt e^{λt}) x = λe^{λt} x.
On the other hand, e^{λt} is a scalar, and so Au(t) = A(e^{λt} x) = e^{λt} Ax = e^{λt}(λx) = λe^{λt} x. Thus we see that u′ = Au.
3. Let μ, θ ∈ R and p, q ∈ R^n, and assume {p, q} is linearly independent. We wish to prove that
{ e^{μt}(cos(θt)p − sin(θt)q), e^{μt}(sin(θt)p + cos(θt)q) }
is a linearly independent subset of C(R; R^n). Notice that, for any x, y ∈ C(R; R^n), if {x(t0), y(t0)} is linearly independent (in R^n) for some t0 ∈ R, then {x, y} is linearly independent in C(R; R^n). This is because
c1 x + c2 y = 0 ⇒ c1 x(t) + c2 y(t) = 0 for all t ∈ R ⇒ c1 x(t0) + c2 y(t0) = 0.
Therefore, a nontrivial solution of c1 x + c2 y = 0 is also a nontrivial solution of c1 x(t0) + c2 y(t0) = 0, which shows that {x, y} cannot be linearly dependent if {x(t0), y(t0)} is linearly independent. Now, if
x(t) = e^{μt}(cos(θt)p − sin(θt)q),  y(t) = e^{μt}(sin(θt)p + cos(θt)q),
then {x(0), y(0)} = {p, q}, which is linearly independent by assumption. Thus {x, y} is linearly independent, as desired. Notice that if λ = μ + iθ is a complex eigenvalue of A ∈ R^{n×n} and x = p + iq is a corresponding eigenvector, then {p, q} must be linearly independent, since otherwise {p + iq, p − iq} would be linearly dependent, which is impossible since p + iq and p − iq are eigenvectors corresponding to the distinct eigenvalues μ + iθ, μ − iθ.
4. Let D ∈ C^{n×n} be diagonal and X ∈ C^{n×n} be invertible, and define A = XDX^{-1}. We wish to prove that e^{tD} and e^{tA} are invertible matrices for all t ∈ R. Since e^{tA} = Xe^{tD}X^{-1}, the invertibility of e^{tA} follows immediately from the invertibility of e^{tD}. Moreover, e^{tD} is a diagonal matrix with the nonzero diagonal entries e^{d11 t}, . . . , e^{dnn t}, and hence e^{tD} is invertible.
5. Let A ∈ R^{2×2} be defined by
A = [ 1  4
      9  1 ].
We wish to solve the IVP u′ = Au, u(0) = v, where v = (1, 2). The eigenvalues of A are λ1 = 7 and λ2 = −5. Corresponding eigenvectors are (2, 3) and (2, −3), respectively. Therefore, the general solution of the ODE u′ = Au is
x(t) = c1 e^{7t} (2, 3) + c2 e^{−5t} (2, −3).
We solve x(0) = (1, 2) to obtain c1 = 7/12, c2 = −1/12, and thus the solution of the IVP is
x(t) = (7/12) e^{7t} (2, 3) − (1/12) e^{−5t} (2, −3).
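The solution of Exercise 5 is easy to check numerically. The short sketch below (added here for illustration; the helper x(t) simply encodes the formula above) verifies the initial condition and compares a central-difference approximation of x′(t) with Ax(t).

```python
import numpy as np

A = np.array([[1.0, 4.0], [9.0, 1.0]])

def x(t):
    return (7/12)*np.exp(7*t)*np.array([2.0, 3.0]) - (1/12)*np.exp(-5*t)*np.array([2.0, -3.0])

print(np.allclose(x(0.0), [1.0, 2.0]))            # initial condition x(0) = (1, 2)
t, h = 0.3, 1e-6
deriv = (x(t + h) - x(t - h)) / (2*h)             # finite-difference approximation of x'(t)
print(np.allclose(deriv, A @ x(t), rtol=1e-5))    # x'(t) = A x(t) to within roundoff
```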
6.
e^{tA} = [ e^{−3t}/6 + e^{3t}/2 + 1/3    e^{−3t}/3 − 1/3            e^{−3t}/6 − e^{3t}/2 + 1/3
           e^{−3t}/3 − 1/3               2e^{−3t}/3 + 1/3           e^{−3t}/3 − 1/3
           e^{−3t}/6 − e^{3t}/2 + 1/3    e^{−3t}/3 − 1/3            e^{−3t}/6 + e^{3t}/2 + 1/3 ].
7.
e^{tA} = [  e^t cos(2t)   e^t sin(2t)
           −e^t sin(2t)   e^t cos(2t) ].
8. Consider the initial value problem
u1′ = u1 + u2 + u3,   u1(0) = 1,
u2′ = u2 + u3,        u2(0) = 0,
u3′ = u3,             u3(0) = 1.
This can be written as u′ = Au, u(0) = u0, where
A = [ 1  1  1
      0  1  1
      0  0  1 ].
The matrix A is defective, having a single eigenvalue λ = 1, with EA (λ) = sp{(1, 0, 0)}. Since A is not diagonalizable, we cannot apply the methods of this section.
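A short NumPy check (added here for illustration) confirms that A has the single eigenvalue 1 with a one-dimensional eigenspace, so A is indeed defective.

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
print(np.linalg.eigvals(A))                       # all eigenvalues equal 1
rank = np.linalg.matrix_rank(A - np.eye(3))
print(3 - rank)                                   # dim E_A(1) = 1 < 3, so A is defective
```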
4.9 Integer programming
1. Each of the matrices is totally unimodular by Theorem 219. The sets S1 and S2 are given below. (a) S1 = {1, 2}, S2 = {3, 4}. (b) S1 = {1, 2, 3, 4}, S2 = ∅. (c) S1 = {2, 3}, S2 = {1, 4}.
2. An optimal solution has factory 1 shipping 350 units to store 1 and 150 to store 2, factory 2 shipping all 250 units to store 1, and factory 3 shipping all 400 units to store 2. (However, the optimal solution is not unique.) The optimal cost is $14,700.
3. Let G be a bipartite graph with VG = V1 ∪ V2, where each edge in EG joins one vertex from V1 to a vertex in V2. Let us write V1 = {vi1, . . . , vik}, V2 = {vj1, . . . , vjℓ}, where k + ℓ = n and n is the number of vertices in G. Now, consider the vertex-edge incidence matrix A of G, and define S1 = {i1, . . . , ik}, S2 = {j1, . . . , jℓ}. Then each column of A contains exactly two nonzeros, each of which is 1, and one of the nonzeros belongs to each of S1, S2. Therefore, by Theorem 219, A is totally unimodular (a small numerical check of an example appears at the end of this section).
4. Let A be the node-arc incidence matrix of a digraph G with n vertices, and define S1 = {1, 2, . . . , n}, S2 = ∅. Then each column of A contains two nonzeros, a 1 and a −1, both of which belong to S1. Therefore, by Theorem 219, A is totally unimodular.
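For small matrices, total unimodularity can also be checked directly from the definition: every square submatrix has determinant −1, 0, or 1. The brute-force sketch below is an illustration added here (it is not the Theorem 219 criterion); it is applied to the vertex-edge incidence matrix of the bipartite graph K_{2,2}, in the spirit of Exercise 3.

```python
import numpy as np
from itertools import combinations

def is_totally_unimodular(A, tol=1e-9):
    """Brute-force check; exponential cost, so only for small examples."""
    m, n = A.shape
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                d = np.linalg.det(A[np.ix_(rows, cols)])
                if min(abs(d), abs(abs(d) - 1)) > tol:   # d is neither 0 nor +-1
                    return False
    return True

# vertex-edge incidence matrix of K_{2,2}: rows = u1, u2, w1, w2; columns = edges u_i w_j
A = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
print(is_totally_unimodular(A))   # True, as Exercise 3 predicts
```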
Chapter 5
The Jordan canonical form

5.1 Invariant subspaces
1. (a) S is not invariant under A (in fact, A does not map either basis vector into S). (b) T is invariant under A.
2. Let T : P2 → P2 be defined by T(p)(x) = p(2x + 1). If p ∈ P1, then p can be written as p(x) = a + bx and T(p)(x) = a + b(2x + 1) = (a + b) + 2bx, which shows that T(p) ∈ P1. Thus P1 is invariant under T.
3. Let S = sp{(1, 3, 1), (1, −1, 0)} and T = sp{(0, 1, 2), (1, −1, 1)} be subspaces of R^3. Then Theorem 226 shows that R^3 is not the direct sum of S and T, since dim(R^3) = 3 ≠ 4 = dim(S) + dim(T).
4. Let S = sp{(0, −2, 1, 3), (−4, 1, 3, −5)}, T = sp{(4, −4, 3, 5), (4, −9, 6, 4)} be subspaces of R^4. It can be verified that {(0, −2, 1, 3), (−4, 1, 3, −5), (4, −4, 3, 5), (4, −9, 6, 4)} is linearly independent and hence spans R^4. This shows that every vector in R^4 can be written as the sum of a vector in S and a vector in T. Moreover, dim(R^4) = dim(S) + dim(T). Therefore, by Theorem 226, R^4 is the direct sum of S and T.
5. Let A ∈ R^{3×3} be defined by
A = [  3  0  −1
      −6  1   3
       2  0   0 ],
and let S = sp{s1, s2}, where s1 = (0, 1, 0), s2 = (1, 0, 1). A direct calculation shows that
As1 = s1,   As2 = −3s1 + 2s2.
It follows that S is invariant under A. We extend {s1, s2} to a basis {s1, s2, s3} of R^3 by defining s3 = (0, 0, 1), and define X = [s1|s2|s3]. Then
X^{-1}AX = [ 1  −3   3
             0   2  −1
             0   0   1 ]
is block upper triangular (in fact, simply upper triangular).
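The invariance claim in Exercise 5 can be confirmed numerically: S is invariant under A exactly when appending As1, As2 to s1, s2 does not increase the rank. A small NumPy check (added here for illustration):

```python
import numpy as np

A = np.array([[3.0, 0.0, -1.0], [-6.0, 1.0, 3.0], [2.0, 0.0, 0.0]])
S = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])   # columns s1 = (0,1,0), s2 = (1,0,1)

print(np.linalg.matrix_rank(S),
      np.linalg.matrix_rank(np.hstack([S, A @ S])))   # both 2, so A S is contained in S
```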
6. Let A ∈ R^{4×4} be defined by
A = [   8  0    3    3
       12  2    6    6
      −49  5  −14  −13
        3  3    3    2 ],
and let S = sp{s1, s2}, where s1 = (1, 4, −1, 3) and s2 = (4, 7, −19, 3). We will show that S is invariant under A, and find an invertible matrix X ∈ R^{4×4} such that X^{-1}AX is block upper triangular. A direct calculation shows that
As1 = (10/3)s1 + (8/3)s2,   As2 = −(8/3)s1 − (10/3)s2,
which proves that S is invariant under A. We extend {s1, s2} to a basis X = {s1, s2, s3, s4} by defining s3 = (1, 0, 1, 0), s4 = (0, 1, 0, 1), and then define X = [s1|s2|s3|s4]. We obtain
X^{-1}AX = [ 10/3   −8/3   −4/3   5/3
              8/3  −10/3   10/3   1/3
              0      0     −1     0
              0      0      0    −1 ],
which is block upper triangular, as expected. 7. Let A ∈ R4×4 be defined by
A = [  8  −4  −1    5
      16  −8   3   10
       0   0  −1    0
      −2   1   6   −1 ].
We wish to determine if there exists a subspace R of R^4 such that R^4 is the direct sum of R and N = N(A).
(a) According to Corollary 228 and Theorem 230, it suffices to determine if N(A^2) = N(A) holds. A direct calculation shows that N(A) = sp{(1, 2, 0, 0)}, while N(A^2) = sp{(1, 2, 0, 0), (−1, 0, 0, 2)}. Thus N(A^2) ≠ N(A), and hence there is no such subspace R.
(b) Let z = (1, 2, 0, 0) ∈ N(A). If we solve Ax = z, we find that there are infinitely many solutions; one is x = (−1/2, 0, 0, 1). This shows that z ∈ col(A), and hence that N(A) ∩ col(A) is nontrivial. Hence, by Corollary 228, there is no such subspace R.
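The dimension counts used in Exercise 7 can be reproduced with a rank computation; the following NumPy lines (an added illustration) confirm that dim N(A) = 1 while dim N(A^2) = 2.

```python
import numpy as np

A = np.array([[8, -4, -1, 5],
              [16, -8, 3, 10],
              [0, 0, -1, 0],
              [-2, 1, 6, -1]], dtype=float)

dim_null  = 4 - np.linalg.matrix_rank(A)
dim_null2 = 4 - np.linalg.matrix_rank(A @ A)
print(dim_null, dim_null2)   # 1 and 2, so N(A^2) != N(A) and no complement R exists
```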
8. Let
A = [ −1  −1  1
      −2   1  2
      −4  −1  4 ].
(a) A direct calculation shows that N(A^2) = N(A) = sp{(1, 0, 1)}. Therefore, by Theorem 230 and Corollary 228, R = col(A) is invariant under A, and R^3 is the direct sum of N = N(A) and R.
(b) If we write z = (1, 0, 1) ∈ N(A), then we find that Ax = z is inconsistent. Therefore, z ∉ col(A). Since {z} spans N(A), this shows that N(A) ∩ col(A) = {0}, which by Corollary 228 shows that col(A) is invariant under A, and that R^3 is the direct sum of N(A) and col(A).
9. Let U be a finite-dimensional vector space over a field F, and let T : U → U be a linear operator. Let U = {u1, u2, . . . , un} be a basis for U and define A = [T]U,U. Suppose X ∈ F^{n×n} is an invertible matrix, and define J = X^{-1}AX. Finally, define
vj = Σ_{i=1}^n Xij ui,   j = 1, 2, . . . , n.
(a) We wish to show that V = {v1, v2, . . . , vn} is a basis for U. Since n = dim(U), it suffices to prove
that V is linearly independent. First notice that
Σ_{j=1}^n cj vj = Σ_{j=1}^n cj Σ_{i=1}^n Xij ui = Σ_{i=1}^n Σ_{j=1}^n Xij cj ui = Σ_{i=1}^n ( Σ_{j=1}^n Xij cj ) ui = Σ_{i=1}^n (Xc)i ui.
Also, Σ_{i=1}^n (Xc)i ui = 0 implies that Xc = 0 since U is linearly independent. Therefore,
Σ_{j=1}^n cj vj = 0 ⇒ Σ_{i=1}^n (Xc)i ui = 0 ⇒ Xc = 0 ⇒ c = 0,
where the last step follows from the fact that X is invertible. Thus V is linearly independent. (b) Now we wish to prove that [T ]V,V = J. The calculation above shows that if [v]V = c, then [v]U = Xc, that is, [v]U = X[v]V . Therefore, for all v ∈ V , [T ]V,V [v]V = [T (v)]V ⇔ [T ]V,V X −1 [v]U = X −1 [T (v)]U ⇔ X[T ]V,V X −1 [v]U = [T (v)]U , which shows that [T ]U ,U = X[T ]V,V X −1 , or [T ]V,V = X −1 [T ]U ,U X = X −1 AX = J, as desired. 10. We wish to prove Theorem 226. As the theorem contains several different results, it is helpful to first prove some preliminary lemmas. We are given a finite-dimensional vector space V over a field F , and subspaces S1 , S2 , . . . , St of V . For each subspace Si , we can find a basis (i)
Vi = {v1^{(i)}, . . . , vki^{(i)}}. We define V = V1 ∪ · · · ∪ Vt. Then we can prove the following:
(a) V spans V if and only if V = S1 + · · · + St. To prove this, suppose first that V = S1 + · · · + St, and let v ∈ V. Then there exist si ∈ Si, i = 1, . . . , t, such that v = s1 + · · · + st. Each si can be written as a linear combination of the vectors in Vi, and the sum of s1, . . . , st can therefore be written as a linear combination of the vectors in V. Therefore, V spans V.
Conversely, suppose V spans V, and let v ∈ V. Then, since V spans V, there exist scalars αj^{(i)}, j = 1, . . . , ki, i = 1, . . . , t, such that
v = Σ_{i=1}^t Σ_{j=1}^{ki} αj^{(i)} vj^{(i)}.
If we write si = Σ_{j=1}^{ki} αj^{(i)} vj^{(i)}, then si ∈ Si for each i, and v = Σ_{i=1}^t si. Since v was chosen to be an arbitrary vector in V, this shows that V = S1 + · · · + St.
(b) V is linearly independent if and only if
si ∈ Si, i = 1, . . . , t,  s1 + · · · + st = 0  ⇒  s1 = · · · = st = 0.     (5.1)
To prove this, first suppose V is linearly independent. If si ∈ Si, i = 1, . . . , t, and s1 + · · · + st = 0, then we can write each si in terms of the basis for Si:
si = Σ_{j=1}^{ki} αj^{(i)} vj^{(i)}.
Then
Σ_{i=1}^t si = 0 ⇒ Σ_{i=1}^t Σ_{j=1}^{ki} αj^{(i)} vj^{(i)} = 0
⇒ αj^{(i)} = 0, j = 1, . . . , ki, i = 1, . . . , t
since V is linearly independent. But then si = 0 for each i, and (5.1) is satisfied. Conversely, suppose (5.1) is satisfied, and suppose
Σ_{i=1}^t Σ_{j=1}^{ki} αj^{(i)} vj^{(i)} = 0
for some scalars αj^{(i)} ∈ F, j = 1, . . . , ki, i = 1, . . . , t. We can then define
si = Σ_{j=1}^{ki} αj^{(i)} vj^{(i)},   i = 1, . . . , t,
and obtain s1 + · · · + st = 0. By (5.1), it follows that each si = 0. But then, since si is defined as a (i) linear combination of the basis for Si , it follows that αj = 0 for j = 1, . . . , ki . This holds for each i = 1, . . . , t, and therefore V is linearly independent. (c) V is a basis for V if and only if V is the direct sum of S1 , . . . , St . To prove this, first suppose that V is a basis for V . Then, since V spans V , we have V = S1 + · · · + St . Hence each v ∈ V can be written in the form v = s1 + · · · + s2 , si ∈ S, i = 1, . . . , t. Suppose also v = s 1 + · · · + s 2 , s i ∈ S, i = 1, . . . , t. We then have s1 + · · · + st = s 1 + · · · + s t ⇒ (s1 − s 1 ) + · · · + (st − s t ) = 0. We have si − s i ∈ Si for i = 1, . . . , t since each Si is a subspace. Since V is linearly independent, (5.1) holds, and hence si − s i = 0, i = 1, . . . , t. This shows that the representation v = s1 + · · · + st , si ∈ Si , i = 1, . . . , t, is unique, and hence that V is the direct sum of S1 , . . . , St . Conversely, suppose V is the direct sum of S1 , . . . , St . Then (5.1) must hold, since 0 ∈ V has a unique representation v = s1 + · · · + st , si ∈ Si , i = 1, . . . , t, namely, si = 0, i = 1, . . . , t. It follows from the result above that V is linearly independent. Also, V spans V since V = S1 + · · · + St . Therefore, V is a basis for V . We can now prove Theorem 226. Suppose first that V is the direct sum of S1 , . . . , St . By definition, this implies that V = S1 + · · · + St . Also, from above, we know that V is a basis for V and hence linearly independent, which implies that (5.1) holds. Finally, since V is a basis for V , we see that dim(V ) = k1 + · · · + kt = dim(S1 ) + · · · + dim(St ). This proves the first part of the theorem. Now we wish to prove several converses, which are straightforward given the lemmas proved above. (a) If V = S1 + · · · + St and (5.1) holds, then V spans V (by the first lemma) and is linearly independent (by the second lemma), and hence V is a basis for V . But then the third lemma proves that V is the direct sum of S1 , . . . , St . (b) If V = S1 + · · · + St and dim(V ) = dim(S1 ) + · · · + dim(St ), then V spans V (by the first lemma), and the number of vectors in V equals the dimension of V . It follows from Theorem 45 that V is a basis for V , and hence V is the direct sum of S1 , . . . , St (by the third lemma). (c) If (5.1) holds and dim(V ) = dim(S1 ) + · · · + dim(St ), then V is linearly independent (by the second lemma), and the number of vectors in V equals the dimension of V . It follows from Theorem 45 that V is a basis for V , and hence V is the direct sum of S1 , . . . , St (by the third lemma).
11. Let V be a finite-dimensional vector space over a field F , suppose {x1 , x2 , . . . , xn } is a basis for V , and let k be an integer satisfying 1 ≤ k ≤ n − 1. We wish to prove that V is the direct sum of S = {x1 , . . . , xk } and T = sp{xk+1 , . . . , xn }. Given any v ∈ V , there exists scalars c1 , . . . , cn ∈ F such that v = c1 x1 + · · · + cn xn = (c1 x1 + · · · + ck xk ) + (ck+1 xk+1 + · · · + cn xn ). Since c1 x1 + · · · + ck xk ∈ S and ck+1 xk+1 + · · · + cn xn ∈ T , this shows that V = S + T . Moreover, dim(V ) = dim(S) + dim(T ), and therefore Theorem 226 shows that V is the direct sum of S and T . 12. Let F be a field, A ∈ F n×n , λ ∈ F , and suppose S is a subspace of F n . Suppose further that S is invariant under A − λI. We wish to show that S is also invariant under A. If x ∈ S, then (A − λI)x ∈ S since S is invariant under A − λI. But (A − λI)x = Ax − λx, and hence there exists s ∈ S such that Ax−λx = s, or Ax = λx+s. Since x and s both belong to the subspace S, so does the linear combination λx + s, which shows that Ax ∈ S. This holds for all x ∈ S, and hence S is invariant under A. 13. Suppose F is a field, A ∈ F n×n , N is a subspace of F n that is invariant under A, and F n is the direct sum of N and col(A). We wish to prove that N must be N (A). This follows from the fact that N ∩ col(A) = {0} (which is guaranteed by Theorem 226). For if x ∈ N , then Ax ∈ col(A) by definition of col(A), and Ax ∈ N since N is invariant under A. But then Ax ∈ N ∩ col(A) = {0}, that is, Ax = 0, and hence x ∈ N (A). Thus we have shown that N ⊂ N (A). Moreover, by the fundamental theorem of linear algebra, we know that dim(N (A)) + dim(col(A)) = n. On the other hand, since F n is the direct sum of N and col(A), dim(N ) + dim(col(A)) = n by Theorem 226. Therefore dim(N ) = dim(N (A)), and then N ⊂ N (A) implies that, in fact, N = N (A). 14. Suppose F is a field and A ∈ F n×n is diagonalizable. Let λ1 , λ2 , . . . , λt be the distinct eigenvalues of A. We wish to prove that col(A − λ1 I) = Eλ2 (A) + · · · + Eλt (A). First note that any x ∈ F n can be written as x = x1 + x2 + · · · + xt (this follows from the fact that A is diagonalizable, and hence there is a basis of F n consisting of eigenvectors of A). Then t
(A − λ1 I)x = Σ_{i=1}^t (A − λ1 I)xi = Σ_{i=2}^t (A − λ1 I)xi   (since (A − λ1 I)x1 = 0)
            = Σ_{i=2}^t (λi − λ1)xi ∈ Eλ2(A) + · · · + Eλt(A).
This shows that col(A − λ1 I) ⊂ Eλ2(A) + · · · + Eλt(A). Now suppose x = x2 + · · · + xt, xi ∈ Eλi(A), i = 2, . . . , t. Define
y = Σ_{i=2}^t (λi − λ1)^{-1} xi;
then
(A − λ1 I)y = Σ_{i=2}^t (λi − λ1)^{-1}(A − λ1 I)xi = Σ_{i=2}^t (λi − λ1)^{-1}(λi − λ1)xi = Σ_{i=2}^t xi = x.
This shows that Eλ2 (A) + · · · + Eλt (A) ⊂ col(A − λ1 I), and completes the proof.
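The identity col(A − λ1 I) = Eλ2(A) + · · · + Eλt(A) of Exercise 14 can be illustrated numerically. The sketch below (added here; the matrix is constructed for the test and is not from the book) builds a diagonalizable A with eigenvalues 1, 2, 2 and compares the two subspaces by rank.

```python
import numpy as np

X = np.array([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
A = X @ np.diag([1.0, 2.0, 2.0]) @ np.linalg.inv(X)   # diagonalizable, eigenvalues 1, 2, 2

C  = A - 1.0 * np.eye(3)     # col(C) should equal E_2(A)
E2 = X[:, 1:]                # a basis of the eigenspace for lambda = 2
r1 = np.linalg.matrix_rank(C)
r2 = np.linalg.matrix_rank(np.hstack([C, E2]))
print(r1, r2)                # both 2: the two 2-dimensional column spaces coincide
```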
15. The following statement is not a theorem: Let V be a vector space over a field F and let S1, S2, . . . , Sk be subspaces of V. If V = S1 + S2 + · · · + Sk and Si ∩ Sj = {0} for all i, j = 1, 2, . . . , k, i ≠ j, then V is the direct sum of S1, S2, . . . , Sk. For example, let V = R^3, and define S1 = {x ∈ R^3 : x3 = 0}, S2 = {x ∈ R^3 : x2 = 0, x3 = x1}, S3 = {x ∈ R^3 : x1 = x2 = 0}. Then S1 ∩ S2 = {0}, S1 ∩ S3 = {0}, and S2 ∩ S3 = {0}. Also, R^3 = S1 + S2 + S3 obviously holds (in fact, R^3 = S1 + S3 holds). Nevertheless, R^3 is not the direct sum of S1, S2, S3 since dim(R^3) = 3 ≠ 4 = dim(S1) + dim(S2) + dim(S3) (cf. Theorem 226).
5.2 Generalized eigenspaces
1. Let F be a field and let A ∈ F^{n×n}. Suppose A has the form
A = [ B  C
      0  D ],
where B ∈ F^{k×k}, C ∈ F^{k×(n−k)}, and D ∈ F^{(n−k)×(n−k)}. We wish to prove that det(A) = det(B)det(D) and pA(r) = pB(r)pD(r). We will write ℓ = n − k. We first note that
[ B  C ; 0  D ] = [ Ik  0 ; 0  D ] [ B  C ; 0  Iℓ ],
where Ik is the k × k identity matrix, and similarly for Iℓ. We will prove, by induction on k, that
det([ Ik  0 ; 0  D ]) = det(D).
If k = 1, we have
det([ 1  0 ; 0  D ]) = 1 · det(D),
where the last step follows by cofactor expansion along the first row. Thus the desired result holds for k = 1. Now suppose it holds for k = t − 1, where t ≥ 2. We can regroup the blocks and write
[ It  0 ; 0  D ] = [ It−1  0 ; 0  D′ ],   where D′ = [ 1  0 ; 0  D ].
The induction hypothesis then yields
det([ It  0 ; 0  D ]) = det(D′) = det([ 1  0 ; 0  D ]) = 1 · det(D) = det(D),
again using cofactor expansion along the first row. Thus the result holds for k = t and, by induction, it therefore holds for all k. This completes the proof. A similar inductive proof yields
det([ B  C ; 0  Iℓ ]) = det(B),
and we see that
det([ B  C ; 0  D ]) = det([ Ik  0 ; 0  D ]) det([ B  C ; 0  Iℓ ]) = det(B)det(D),
as desired. It then follows immediately that
pA(r) = det([ rI − B  −C ; 0  rI − D ]) = det(rI − B)det(rI − D) = pB(r)pD(r).
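The two identities of Exercise 1 are easy to spot-check numerically for random blocks; the NumPy sketch below (an added illustration) compares det(A) with det(B)det(D), and compares the spectra, which amounts to comparing the characteristic polynomials.

```python
import numpy as np

rng = np.random.default_rng(0)
k, l = 3, 4
B = rng.standard_normal((k, k))
C = rng.standard_normal((k, l))
D = rng.standard_normal((l, l))
A = np.block([[B, C], [np.zeros((l, k)), D]])

print(np.isclose(np.linalg.det(A), np.linalg.det(B) * np.linalg.det(D)))        # True
eigs_A  = np.sort_complex(np.linalg.eigvals(A))
eigs_BD = np.sort_complex(np.concatenate([np.linalg.eigvals(B), np.linalg.eigvals(D)]))
print(np.allclose(eigs_A, eigs_BD))                                             # True
```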
2. Let A be block upper triangular matrix: ⎡
B11 ⎢ 0 ⎢ A=⎢ . ⎣ ..
B12 B22 .. .
··· ··· .. .
⎤ B1t B2t ⎥ ⎥ .. ⎥ . . ⎦
0
0
···
Btt
A straightforward proof, by induction on the number of blocks, shows that det(A) = det(B11 )det(B22 ) · · · det(Btt ). The key observation is that % % B11 % % 0 % % .. % . % % 0
B12 B22 .. .
··· ··· .. .
0
···
% % % B B1t %% %% 11 0 B2t %% %% . .. % = % .. . %% %% 0 Btt % %% 0 % % B11 % % 0 % =% . % .. % % 0
B12 B22 .. .
··· ··· .. .
B1,t−1 B2,t−1 .. .
0 0
··· ···
Bt−1,t−1 0
B12 B22 .. .
··· ··· .. .
B1,t−1 B2,t−1 .. .
0
···
% % % % % % % % Bt−1,t %% Btt % B1t B2t .. .
% % % % % % det(Btt ) % % Bt−1,t−1 %
= det(B11 ) · · · det(Bt−1,t−1 )det(Btt ), where the second step follows from Exercise 1 and the last step from the induction hypothesis. Since rI − A is block upper triangular, with diagonal blocks rI − Bii , it follows that pA (r) = det(rI − A) = det(rI − B11 ) · · · det(rI − Btt ) = pB11 (r) · · · pBtt (r). The desired results about the eigenvalues of A now follow immediately: The set of eigenvalues of A is the union of the sets of eigenvalues of B11 , B22 , . . . , Btt , and the multiplicity of each eigenvalue λ of A is the sum of the multiplicities of λ as an eigenvalue of B11 , B22 , . . . , Btt . 3. Let F be a field and let A ∈ F n×n . (a) If p(r), q(r) ∈ F [r], then p(A) and q(A) commute. This follows from the obvious fact that Ai and Aj commute for all nonnegative integers i and j. Suppose p(r) = a0 + a1 r + · · · + ak rk ,
124 q(r) = b0 + b1 r + · · · + b r . Then ⎛ k
i
ai A
p(A)q(A) = i=0
⎝
⎞ j⎠
bj A j=0
k
i
ai A
= j=0
k
bj Aj =
i=0
ai Ai bj Aj
j=0 i=0 k
bj Aj ai Ai
= j=0 i=0
bj Aj
= j=0
⎛ =⎝
k
ai Ai i=0
⎞
j⎠
i
bj A j=0
k
ai A
.
i=0
(b) If S is a subspace of F n that is invariant under A, then S is also invariant under p(A) for all p(r) ∈ F [r]. This follows because x ∈ S implies Ax ∈ S, which in turn implies A2 x = A(Ax) ∈ S (again using the fact that S is invariant under A), which then implies that A3 x = A(A2 x) ∈ S, and so forth. In general, then, Ai x ∈ S for all x ∈ S and i = 0, 1, 2, . . .. Therefore, for any x ∈ S, k k ai Ai
p(A)x = i=0
ai Ai x
x= i=0
is a linear combination of x, Ax, . . . , Ak x ∈ S, and hence p(A)x ∈ S since S is a subspace. This completes the proof. 4. (a) For the given matrix A, pA (r) = (r − 1)3 (r + 1). (b) By inspection, λ = 1 is an eigenvalue of A with algebraic multiplicity 3. (c) By direct calculation, dim(N (A − I)) = 1, dim(N ((A − I)2 )) = 2, dim(N ((A − I)3 )) = 3, and dim(N ((A − I)4 )) = 3. Therefore, the smallest value of k such that dim(N ((A − I)k+1 )) = dim(N ((A − I)k )) is k = 3. (d) From the above results, dim(N ((A − I)k )) = 3 = m. 5. For the given A ∈ R5×5 , we have pA (r) = (r − 1)3 (r + 1)2 , so the eigenvalues are λ1 = 1 and λ2 = −1. Direct calculation shows that dim(N (A − I)) = 1, dim(N ((A − I)2 )) = 2, dim(N ((A − I)3 )) = 3, dim(N ((A − I)4 )) = 3 and
dim(N (A + I)) = 1, dim(N ((A + I)2 )) = 2, dim(N ((A + I)3 )) = 2.
These results show that the generalized eigenspaces are N ((A − I)3 ) and N ((A + I)2 ). Bases for these subspaces are N ((A − I)3 ) = sp{(0, 1, 0, 0, 0), (0, 0, 1, 0, 0), (0, 0, 0, 1, 0)}, N ((A + I)2 ) = sp{(−1, 0, 12, 4, 0), (1, 4, 0, 0, 2)}. A direct calculation shows that the union of the two bases is linearly independent and hence a basis for R5 , and it then follows from Theorem 226 that R5 = N ((A − I)3 ) + N ((A + I)2 ). 6. The eigenvalues of A are λ1 = 1 and λ2 = −1, and the generalized eigenspaces of A are N ((A − I)2 ) = sp{(0, 1, 0), (0, 0, 1)}, N (A + I) = sp{(1, 3, 0)}.
125 ⎡
We define
0 X =⎣ 1 0 ⎡
Then
1 X −1 AX = ⎣ 0 0
⎤ 0 1 0 3 ⎦. 1 0 ⎤ 1 0 1 0 ⎦, 0 −1
which is block diagonal. 7. For the given A ∈ R4×4 , pA (r) = (r − 2)3 (r − 1). (a) The eigenvalues of A are λ1 = 2, with algebraic multiplicity 3, and λ2 = 1, with algebraic multiplicity 1. (b) We have dim(N (A − 2I)) = 1, dim(N ((A − 2I)2 )) = 2, dim(N ((A − 2I)3 )) = 3, dim(N ((A − 2I)4 )) = 3 and
dim(N (A − I)) = 1, dim(N ((A − I)2 )) = 1.
Therefore, the generalized eigenspaces are N ((A − 2I)3 ) and N (A − I). 8. Let F be a field, let λ be an eigenvalue of A, let k be any positive integer, and define N = N ((A − λI)k ). Suppose x ∈ N and x is an eigenvector of A, say Ax = μx, x = 0. Then, since x ∈ N , it follows that (A − λI)k x = 0. But, since x is an eigenvector of A, we have (A − λI)x = Ax − λx = μx − λx = (μ − λ)x. Similarly, (A − λI)2 x = (μ − λ)2 x, and so forth. It follows that (A − λI)k x = (μ − λ)k x, and then we have (A − λI)k x = 0 ⇒ (μ − λ)k x = 0. Since x = 0, this implies that (μ − λ)k = 0, that is, μ = λ. Thus if x ∈ N is an eigenvector, the corresponding eigenvalue must be λ. 9. Let F be a field, let λ ∈ F be an eigenvalue of A ∈ F n×n , and suppose that the algebraic and geometric multiplicities of λ are equal, say to m. We wish to show that N ((A − λI)2 ) = N (A − λI). By Theorem 235, there exists a positive integer k such that dim(N ((A − λI)k+1 )) = dim(N ((A − λI)k )) and dim(N ((A − λI)k )) = m. We know that N (A − λI) ⊂ N ((A − λI)2 ) ⊂ · · · ⊂ N ((A − λI)k ), and, by hypothesis, dim(N (A − λI)) = m = dim(N ((A − λI)k )). This implies that N (A − λI) = N ((A − λI)k ), and hence that N ((A − λI)2 ) = N (A − λI) (since N (A − λI) ⊂ N ((A − λI)2 ) ⊂ N ((A − λI)k )). 10. Let F be a field and let A ∈ F n×n be defined by 1, j > i, Aij = 0, j ≤ i. Since A is upper triangular with diagonal entries all equal to 0, it follows that pA (r) = rn , and A has a single eigenvalue, λ = 0. The eigenvectors corresponding to λ are the nonzero solutions of Ax = 0, which represents the following n − 1 equations: n
xj = 0, i = 1, 2, . . . , n − 1. j=i+1
The last equation (i = n − 1) implies that xn = 0; substituting this into the next-to-last equation yields xn−1 = 0, and so forth. The conclusion is that x2 = x3 = · · · = xn = 0, and the only nonzero solution of Ax = 0 is x = e1 . We thus see that dim(A − λI) = 1, and we also see that A has a single generalized eigenspace (since each eigenspace must contain a distinct eigenvector), namely, N (Ak ) for some positive integer k. Recall that k is defined to be the smallest positive integer such that N (Ak+1 ) = N (Ak ). Moreover, since F n is the direct sum of the generalized eigenspaces of A, we see that F n = N (Ak ). Now consider the vectors en , Aen , . . . , An−1 en , An en . We will show that An−1 en = 0, An en = 0, and also that An = 0. This will show that k = n. For each j = 2, 3, . . . , n, we have j−1
Aej =
ei i=1
(this follows immediately from the special form of A). We claim that n−j
aij ei , j = 1, 2, . . . , n − 1,
Aj en =
(5.2)
i=1
where aij is a positive integer for each i, j. From above, this holds for j = 1 with aij = 1 for i = 1, . . . , n−1. Assume (5.2) holds for some j, 1 ≤ j ≤ n − 2. Then n−j j+1
A
en =
n−j i−1
aij Aen = i=1
aij e . i=1 =1
Now, for any expression βi in i, and any fixed j, 1 ≤ j ≤ n − 1, we have n−j−1 n−j
n−j i−1
βi = i=1 =1
βi . =1
i= +1
Therefore, n−j−1 n−j j+1
A
en =
n−j−1
n−j
aij e = =1
i= +1
aij
=1
e .
i= +1
This proves that (5.2) is satisfied for j + 1 in place of j, and completes the proof of (5.2) by induction. It follows that Aj en = 0 for j = 0, 1, . . . , n − 1, and An en = 0 (since An−1 en is a multiple of e1 ). A similar argument shows that An−j en−j = 0 for j = 0, 1, . . . , n − 1, and hence An x = 0 for all x ∈ F n , that is, An is the zero matrix. Thus N (An−1 ) is a proper subspace of F n and N (An ) = F n . This proves that k = n, as desired. 11. Let A ∈ F n×n , and suppose λ ∈ F is an eigenvalue of A, x ∈ F n is a corresponding eigenvector, and p(r) ∈ F [r]. We wish to prove that p(A)x = p(λ)x. Let p(r) = c0 + c1 r + · · · + ck rk . We have Ax = λx, A2 x = AAx = λAx = λ2 x, A3 x = AA2 x = λ2 Ax = λ3 x, and so forth; thus in general, Ai x = λi x. Then k k ci Ai
p(A)x = i=0
ci Ai x =
x= i=0
ci λi x = i=0
k
k
ci λi
x = p(λ)x.
i=0
This completes the proof. 12. Let F be a field, assume A ∈ F n×n , and let mA (r) be factored into irreducible monic polynomials, as in Appendix 5.2.1: mA (r) = p1 (r)k1 p2 (r)k2 · · · pt (r)kt .
(a) Define i to be the smallest positive integer such that N (pi (A) i +1 ) = N (pi (A) i ). Clearly pi (A)j x = 0 implies that pi (A)j+1 x = 0, and hence N (pi (A)j ) ⊂ N (pi (A)j+1 ) for all positive integers j. The dimension of N (pi (A)j ) cannot increase without bound as j increases, since these spaces are subspaces of a finite dimensional space. Therefore, for some j ≥ 1, N (pi (A)j+1 ) = N (pi (A)j ), and hence i is well-defined. (b) We will prove by induction that N (pi (A) i +s ) = N (pi (A) i ) for all s ≥ 1. By definition of i , this holds for s = 1. Suppose N (pi (A) i +s ) = N (pi (A) i ), and let x ∈ N (pi (A) i +s+1 ). Then pi (A) i +s+1 x = 0 ⇒ pi (A) i +s pi (A)x = 0 ⇒ pi (A)x ∈ N (pi (A) i +s ) = N (pi (A) ) ⇒ x ∈ N (pi (A) i +1 ) = N (pi (A) ). This shows that N (pi (A) i +s+1 ) ⊂ N (pi (A) i ). Since the reverse inclusion is obvious, the proof by induction is complete. (c) Now we wish to prove that i = ki for each i. Suppose first that ki < i . Then there exists x ∈ N (pi (A)ki +1 ) \ N (pi (A)ki ). By Corollary 244, we can write x = x1 + · · · + xt , xj ∈ N (pj (A)kj ) for all j = 1, . . . , t,
(5.3)
pi (A)x = pi (A)x1 + · · · + pi (A)xt .
(5.4)
which implies that For all j = i, pi (A)xj ∈ N (pj (A)kj ) (since pi (A) and pj (A)kj commute). Now, pi (A)xi ∈ N (pi (A)ki ), which implies that it is linearly independent of vectors in N (pj (A)kj ), j = i. Therefore, (5.4) implies that pi (A)xj = 0 for all j = i. But then xj ∈ N (pi (A)) ⊂ N (pi (A)ki ). Since also xj ∈ N (pj (A)kj ), it follows that xj = 0 for j = i. Now (5.3) yields x = xi ∈ N (pi (A)ki ), which contradicts the original definition of x. This contradiction shows that ki < i is impossible. On the other hand, suppose i < ki . Then N (pi (A)ki ) = N (pi (A) i ), from which it follows that p1 (A)k1 · · · pi−1 (A)ki−1 pi (A) i pi+1 (A)ki+1 · · · pt (A)kt x = 0 for all x ∈ F n . But this implies that p(A) = 0, where p(r) = p1 (r)k1 pi−1 (r)ki−1 pi (r) i pi+1 (r)ki+1 · · · pt (r)kt and deg(p(r)) < deg(mA (r)), a contradiction. Thus i < ki is also impossible, and we see that ki = i must hold. 13. Let F be a field and suppose A ∈ F n×n has distinct eigenvalues λ1 , . . . , λt . We wish to show that A is diagonalizable if and only if mA (r) = (r − λ1 ) · · · (r − λt ). First, suppose A is diagonalizable and write p(r) = (r − λ1 ) · · · (r − λt ). Every vector x ∈ Rn can be written as a linear combination of eigenvectors of A, from which it is easy to prove that p(A)x = 0 for all x ∈ F n (recall that the factors (A − λi I) commute with one another). Hence p(A) is the zero matrix. It follows from Theorem 239 that mA (r) divides p(r). But every eigenvalue of A is a root of mA (r), and hence p(r) divides mA (r). Since both p(r) and mA (r) are monic, it follows that mA (r) = p(r), as desired.
Conversely, suppose the minimal polynomial of A is mA (r) = (r − λ1 ) · · · (r − λt ). But then Corollary 244 implies that F n is the direct sum of the subspaces N (A − λ1 I), . . . , N (A − λt I), which are the eigenspaces EA (λ1 ), . . . , EA (λt ). Since F n is the direct sum of the eigenspaces, it follows that there is a basis of F n consisting of eigenvectors of A (see the proof of Theorem 226 given in Exercise 5.1.10), and hence A is diagonalizable. 14. Let F be a field and let A ∈ F n×n . Assume that x is a cyclic vector for A, that is, that sp{x, Ax, A2 x, . . . , An−1 x} = F n . Since dim(F n ) = n, this implies that X = {x, Ax, . . . , An−1 x} is linearly independent and a basis for F n . (a) Since X is a basis for F n , there exist unique scalars c0 , c1 , . . . , cn−1 ∈ F such that An x = c0 x + c1 Ax + · · · + cn−1 An−1 x. (b) Define X = [x|Ax| · · · |An−1 x]. Then AX = [Ax|A2 x| · · · |An−1 x|An x] = [Ax|A2 x| · · · |An−1 x|c0 x + c1 Ax + · · · + cn−1 An−1 x] ⎡ ⎤ 0 0 ··· 0 c0 ⎢ 1 0 ··· 0 c1 ⎥ ⎢ ⎥ ⎢ 0 1 ··· 0 c2 ⎥ =X⎢ ⎥ ⎢ .. .. . . .. .. ⎥ ⎣ . . . . . ⎦ 0 0 · · · 1 cn−1 = XC. Since X is invertible, this implies that X −1 AX = C. Notice that by Exercise 4.5.9, pC (r) = rn − cn−1 rn − · · · − c1 r − c0 . (c) Since A is similar to C, we have pA (r) = pC (r) = rn − cn−1 rn − · · · − c1 r − c0 by Theorem 198. (d) If deg(mA (r)) = k < n, then mA (A)x = 0 implies that Ak x ∈ sp{x, Ax, . . . , Ak−1 x}. But this is impossible, since {x, Ax, . . . , An−1 x} is linearly independent. Hence deg(mA (r)) ≥ n = deg(pA (r)), which implies that mA (r) = pA (r). 15. Let F be a field and let c0 , c1 , . . . , cn−1 be elements of F . Define A ∈ F n×n by ⎡ ⎤ 0 0 ··· 0 −c0 ⎢ 1 0 ··· 0 −c1 ⎥ ⎢ ⎥ ⎢ 0 1 ··· 0 −c2 ⎥ A=⎢ ⎥. ⎢ .. .. . . ⎥ .. .. ⎣ . . ⎦ . . . 0
0 ···
1
−cn−1
As noted in the previous exercise, pA (r) = rn + cn−1 rn−1 + · · · + c1 r + c0 .
(a) By inspection, we see that Ae1 = e2 , A2 e1 = Ae2 = e3 , . . . , An−1 e1 = Aen−1 = en . It follows that {e1 , Ae1 , . . . , An−1 e1 } = {e1 , e2 , . . . , en }, which spans F n . Thus e1 is a cyclic vector for A. (b) If deg(mA (r)) = k < n, then mA (A)e1 = 0 implies that ek+1 = Ak e1 is a linear combination of {e1 , Ae1 , . . . , Ak−1 e1 } = {e1 , e2 , . . . , ek }, which we know is not the case. Therefore, deg(mA (r)) ≥ n = deg(pA (r)). which implies that mA (r) = pA (r).
5.3 Nilpotent operators
1. Let D : P2 → P2 be defined by D(p) = p′, and consider the bases B = {2, 2x, x^2} and A = {2, −1 + 2x, 1 − x + x^2} of P2. We have
D : 2 → 0,  D : 2x → 2,  D : x^2 → 2x.
It follows that [D]B,B maps e1 to 0, e2 to e1, and e3 to e2; that is,
[D]B,B = [ 0  1  0
           0  0  1
           0  0  0 ].
Similarly,
D : 2 → 0, D : −1 + 2x → 2, D : 1 − x + x2 → −1 + 2x.
Hence [D]A,A also maps e1 to 0, e2 to e1 , and e3 to e2 , and therefore [D]A,A = [D]B,B . 2. Suppose X is a vector space over a field F , and suppose T : X → X is a nilpotent linear operator of index k. Then there exists a vector x ∈ X such that T k−1 (x) = 0 and T k (x) = 0. It follows that y = T k−1 (x) = 0 and T (y) = 0. Thus 0 is an eigenvalue of T . Also, T k is the zero operator, and therefore the only eigenvalue of T k is 0. If λ ∈ F is an eigenvalue of T with eigenvector x ∈ X, then T k (x) = λk x, which shows that λk is an eigenvalue of T k . Since 0 is the only eigenvalue of T k , then there cannot be a nonzero eigenvalue of T . 3. Suppose X is a finite-dimensional vector space over a field F , and suppose T : X → X is a nilpotent linear operator. Let I : X → X be the identity operator. We wish to show that I + T is invertible. We first show that I + T is nonsingular. If x ∈ X and (I + T )(x) = 0, then I(x) + T (x) = 0, or T (x) = −I(x) = −x = −1 · x. Since −1 is not an eigenvalue of T by the previous exercise, it follows that x must be the zero vector. Hence I + T is nonsingular. Since X is finite-dimensional, this implies that I + T is invertible by Corollary 105. 4. This exercise gives an alternate proof of the result of the preceding exercise, one that works even if X is infinite-dimensional. Let T : X → X be a nilpotent linear operator, where X is a vector space over a field F . Let k be the index of nilpotency of T , and let I : X → X be the identity operator. Define S : X → X by S = I − T + T 2 − · · · + (−1)k−1 T k−1 . Then, for any x ∈ X, (I + T )(S(x)) = S(x) + T (S(x)) = x − T (x) + T 2 (x) − · · · + (−1)k−1 T k−1 (x)+ T (x) − T 2 (x) + · · · + (−1)k−2 T k−1 (x) + (−1)k T k (x) = x + (−1)k T k (x).
Since T k (x) = 0, we see that (I + T )(S(x)) = x. On the other hand, S((I + T )(x)) = S(x) + S(T (x)) = x − T (x) + T 2 (x) − · · · + (−1)k−1 T k−1 (x)+ T (x) − T 2 (x) + · · · + (−1)k−2 T k−1 (x) + (−1)k T k−1 (x) = x + (−1)k T k−1 (x) = x. Since S((I + T )(x)) = x for all x ∈ X and (I + T )(S(x)) = x for all x ∈ X, we see that I + T is invertible and S = (I + T )−1 . 5. Let A ∈ Cn×n be nilpotent. We wish to prove that the index of nilpotency of A is at most n. By Exercise 2, the only eigenvalue of A is λ = 0, which means that the characteristic polynomial of A must be pA (r) = rn . By the Cayley-Hamilton theorem, pA (A) = 0, and hence An = 0. It follows that the index of nilpotency of A is at most n. 6. Suppose A ∈ Cn×n and the only eigenvalue of A is λ = 0. Then the characteristic polynomial of A must be pA (r) = rn . By the Cayley-Hamilton theorem, pA (A) = 0, and hence An = 0. It follows that A is nilpotent of index k for some k satisfying 1 ≤ k ≤ n. 7. (a) Suppose A ∈ Rn×n and Aij > 0 for all i, j. Then it is obvious that every entry of Ak is positive for each k ≥ 1 (one could give a rigorous proof by induction). This implies that Ak = 0 for all k ≥ 1, and hence A is not nilpotent. (b) The conditions A ∈ Rn×n , Aij ≥ 0 for all i, j are not enough to guarantee that A is nilpotent. For example, ⎤ ⎡ 0 1 1 A=⎣ 0 0 1 ⎦ 0 0 0 satisfies these conditions and is nilpotent. However, if we impose the additional condition that tr(A) > 0 (that is, A has at least one nonzero diagonal entry), then A has a nonzero eigenvalue (since tr(A) is the sum of the eigenvalues of A) and then by Exercise 2, A is not nilpotent. 8. Suppose A ∈ R4×4 and the only eigenvalue of A is λ = 0. According to the previous exercise, A is nilpotent of index k for some k satisfying 1 ≤ k ≤ 4. We wish to determine all of the possibilities for the dimensions of N (A), N (A2 ), . . . , N (Ak−1 ). • If k = 1, then A = 0, dim(N (A)) = 4. This is the only possibility. • If k = 2, then we have at least one chain x1 , Ax1 of nonzero vectors. We could have a second chain x2 , Ax2 , in which case dim(N (A)) = 2, dim(n(A2 )) = 4, and A is similar to ⎡
1 ⎢ 0 ⎢ ⎣ 0 0
1 1 0 0
0 0 1 0
⎤ 0 0 ⎥ ⎥. 1 ⎦ 1
It is also possible that there is only one such chain; then dim(N (A)) = 3, dim(N (A2 )) = 4, and A is similar to ⎤ ⎡ 1 1 0 0 ⎢ 0 1 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 1 0 ⎦. 0 0 0 1
• If k = 3, there must be a chain x1 , Ax1 , A2 x1 of nonzero vectors. In this case, the fourth basis vector must be an independent vector from N (A). Thus dim(N (A)) = 2, dim(N (A2 )) = 3, dim(N (A3 )) = 4, and A is similar to ⎤ ⎡ 1 1 0 0 ⎢ 0 1 1 0 ⎥ ⎥ ⎢ ⎣ 0 0 1 0 ⎦. 0 0 0 1 This is the only possibility. • If k = 4, then there is a chain x1 , Ax1 , A2 x1 , A3 x1 of nonzero vectors, we have dim(N (A)) = 1, dim(N (A2 )) = 2, dim(N (A3 )) = 3, dim(N (A4 )) = 4, and A is similar to ⎤ ⎡ 1 1 0 0 ⎢ 0 1 1 0 ⎥ ⎥ ⎢ ⎣ 0 0 1 1 ⎦. 0 0 0 1 This is the only possibility. 9. Suppose A ∈ Rn×n has 0 as its only eigenvalue, so that it must be nilpotent of index k for some k satisfying 1 ≤ k ≤ 5. The possibilities are • If k = 1, then A = 0 and dim(N (A)) = 5. • If k = 2, there is at least one chain x1 , Ax1 of nonzero vectors. There could be a second such chain, x2 , Ax2 , in which case the fifth basis vector must come from N (A), we have dim(N (A)) = 3, dim(N (A2 )) = 5, and A is similar to ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 0 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 1 0 ⎥. ⎥ ⎢ ⎣ 0 0 0 1 0 ⎦ 0 0 0 0 1 It is also possible that there is only one such chain, in which case dim(N (A)) = 4, dim(N (A2 )) = 5, and A is similar to ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 0 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 0 0 ⎥. ⎥ ⎢ ⎣ 0 0 0 1 0 ⎦ 0 0 0 0 1 • If k = 3, there is a chain x1 , Ax1 , A2 x1 of nonzero vectors. There are two possibilities for the other vectors needed for the basis. There could be a chain x2 , Ax2 (with A2 x2 0), in which case dim(N (A)) = 2, dim(N (A2 )) = 4, and dim(N (A3 )) = 5, and A is similar to ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 1 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 0 0 ⎥. ⎥ ⎢ ⎣ 0 0 0 1 1 ⎦ 0 0 0 0 1 Altenatively, there could be two more independent vectors in N (A), in which case dim(N (A)) = 3, dim(N (A2 )) = 4, dim(N (A3 )) = 5, and A is similar to ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 1 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 0 0 ⎥. ⎥ ⎢ ⎣ 0 0 0 1 0 ⎦ 0 0 0 0 1
• If k = 4, then there is a chain x1 , Ax1 , A2 x1 , A3 x1 of nonzero vectors, and there must be a second independent vector in N (A). Then dim(N (A)) = 2, dim(N (A2 )) = 3, dim(N (A3 )) = 4, dim(N (A4 )) = 5, and A is similar to ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 1 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 1 0 ⎥. ⎥ ⎢ ⎣ 0 0 0 1 0 ⎦ 0 0 0 0 1 • If k = 5, then there is a chain x1 , Ax1 , A2 x1 , A3 x1 , A4 x1 of nonzero vectors, dim(N (A)) = 1, dim(N (A2 )) = 2, dim(N (A3 )) = 3, dim(N (A4 )) = 4, dim(N (A5 )) = 5, and A is similar to ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 1 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 1 0 ⎥. ⎥ ⎢ ⎣ 0 0 0 1 1 ⎦ 0 0 0 0 1 10. Let A be the nilpotent matrix
⎤ 1 −2 −1 −4 ⎢ 1 −2 −1 −4 ⎥ ⎥. A=⎢ ⎣ −1 2 1 4 ⎦ 0 0 0 0 ⎡
A direct calculation shows that N (A) has dimension 3, and that x1 = (1, 0, 0, 0) does not belong to N (A). We have Ax1 = (1, 1, −1, 0), and x2 , x3 ∈ N (A), x2 = (2, 1, 0, 0), x3 = (4, 0, 0, 1) have the property that {Ax1 , x2 , x3 } is a basis for N (A). We define X = [Ax1 |x1 |x2 |x3 ]; then X is invertible and ⎤ ⎡ 0 1 0 0 ⎢ 0 0 0 0 ⎥ ⎥ X −1 AX = ⎢ ⎣ 0 0 0 0 ⎦. 0 0 0 0 11. Now let
⎤ 1 75 76 −62 ⎢ 1 −2 −1 −4 ⎥ ⎥ A=⎢ ⎣ −1 14 13 −5 ⎦ . 0 16 16 −12 ⎡
A direct calculation shows that N (A) has dimension 1; this is enough to show that the index of nilpotency must be 4, and that there is a single chain x1 , Ax1 , A2 x1 , A3 x3 . We must choose x1 ∈ R3 \ N (A3 ). Since ⎤ ⎡ 0 −4 −4 3 ⎢ 0 −4 −4 3 ⎥ ⎥, A3 = ⎢ ⎣ 0 4 4 −3 ⎦ 0 0 0 0 we see that x1 = (0, 1, 1, 1) does not belong to N (A3 ). We define ⎤ ⎡ −5 −4 89 0 ⎢ −5 1 −7 1 ⎥ ⎥ X = [A3 x1 |A2 x1 |Ax1 |x1 ] = ⎢ ⎣ 5 −1 22 1 ⎦ , 0 0 20 1
5.3. NILPOTENT OPERATORS
133 ⎡
and then
0 ⎢ 0 −1 X AX = ⎢ ⎣ 0 0
1 0 0 0
⎤ 0 0 ⎥ ⎥. 1 ⎦ 0
0 1 0 0
⎤ −4 4 −12 6 10 ⎢ −21 21 −63 58 −27 ⎥ ⎥ ⎢ ⎢ 1 −3 6 −11 ⎥ A = ⎢ −1 ⎥. ⎣ 6 −6 18 −15 3 ⎦ 2 −2 6 −5 1 ⎡
12. Let
A direct calculation shows that dim(N (A)) = 3, dim(N (A2 )) = 4, and dim(N (A5 )) = 5. This shows that the index of nilpotency is 3, there is a single chain x1 , Ax1 , A2 x1 of nonzero vectors, and the basis is completed with two more vectors from N (A). We find that x1 = (0, 0, 0, 2, 1) ∈ N (A3 ) \ N (A2 ), and that {A2 x1 , x2 , x3 } is a basis for N (A), where x2 = (1, 1, 0, 0, 0), x3 = (−3, 0, 1, 0, 0). We then define ⎤ ⎡ 4 22 0 1 −3 ⎢ 21 89 0 1 0 ⎥ ⎥ ⎢ ⎥, 1 1 0 0 1 X = [A2 x1 |Ax1 |x1 |x2 |x3 ] = ⎢ ⎥ ⎢ ⎣ −6 −27 2 0 0 ⎦ −2 −9 1 0 0 and obtain
⎡
0 ⎢ 0 ⎢ X −1 AX = ⎢ ⎢ 0 ⎣ 0 0
1 0 0 0 0
0 1 0 0 0
0 0 0 0 0
⎤ 0 0 ⎥ ⎥ 0 ⎥ ⎥. 0 ⎦ 0
13. Suppose X is a finite-dimensional vector space over a field F , and T : X → X is linear. (a) If T is nilpotent, then T is singular. This follows from Exercise 2, which shows that 0 is an eigenvalue of T . Hence there is a nonzero vector x such that T (x) = 0, and hence T is singular. (b) If T is singular, T need not be nilpotent. For instance, let T : R3 → R3 be defined by T (x) = Ax for all x ∈ R3 , where ⎤ ⎡ 1 2 3 A = ⎣ 0 2 1 ⎦. 0 0 0 Then 0 is an eigenvalue of T , and hence T is singular. However, 1 and 2 are also eigenvalues of T , and therefore, by Exercise 2, T cannot be nilpotent. 14. Let X be a finite-dimensional vector space over a field F , and let S : X → X, T : X → X be linear operators. (a) If S and T are nilpotent, then S + T need not be nilpotent. For example, let S, T be the matrix operators defined by ⎤ ⎤ ⎡ ⎡ 0 0 0 0 1 1 A = ⎣ 0 0 1 ⎦, B = ⎣ 1 0 0 ⎦, 1 1 0 0 0 0 respectively. The matrices A and B are nilpotent, but ⎡ 0 1 A+B =⎣ 1 0 1 1
⎤ 1 1 ⎦ 0
is not (the eigenvalues are −1, −1, 2). The same is for S, T , and S + T .
(b) If S and T are nilpotent, then ST need not be nilpotent. Let S and T be the operators defined in the first part of this exercise. We have ⎤ ⎡ 2 1 0 AB = ⎣ 1 1 0 ⎦ , 0 0 0 which is not nilpotent. Therefore, ST is not nilpotent. 15. Let F be a field and suppose A ∈ F n×n is nilpotent. We wish to prove that det(I + A) = 1. By Theorem 251 and the following discussion, we know there exists an invertible matrix X ∈ F n such that X −1 AX is upper triangular with zeros on the diagonal. But then X −1 (I + A)X = I + X −1 AX is upper triangular with ones on the diagonal. From this, we conclude that det(X −1 (I + A)X) = 1. Since I + A is similar to X −1 (I + A)X, it has the same determinant, and hence det(I + A) = 1.
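The examples in Exercises 14 and 15 can be verified directly; the following NumPy lines (added for illustration) use the matrices A and B from Exercise 14(a).

```python
import numpy as np

A = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]], dtype=float)   # nilpotent
B = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0]], dtype=float)   # nilpotent

print(np.allclose(np.linalg.matrix_power(A, 3), 0))    # True: A^3 = 0
print(np.linalg.eigvals(A + B))                        # -1, -1, 2 (up to ordering): A + B is not nilpotent
print(np.isclose(np.linalg.det(np.eye(3) + A), 1.0))   # True: det(I + A) = 1, as in Exercise 15
```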
5.4 The Jordan canonical form of a matrix
1. Let
⎡
1 A=⎣ 0 0
⎤ 1 1 1 1 ⎦ ∈ R3×3 . 0 1
The only eigenvalue of A is λ = 1, and dim(N (A − I)) = 1. Therefore, there must be a single chain of the form (A − I)2 x1 , (A − I)x1 , x1 , where x1 ∈ N ((A − I)3 ) \ N ((A − I)2 ). We have ⎤ ⎡ 0 0 1 (A − I)2 = ⎣ 0 0 0 ⎦ , 0 0 0 and thus it is easy to see that x1 = (0, 0, 1) ∈ N ((A − I)2 ). We define ⎤ ⎡ 1 1 0 X = [(A − I)2 x1 |(A − I)x1 |x1 ] = ⎣ 0 1 0 ⎦ , 0 0 1 ⎡
and then
1 1 X −1 AX = ⎣ 0 1 0 0 2. Let
⎡
0 0 A = ⎣ −2 2 −1 0
⎤ 0 1 ⎦. 1
⎤ 2 1 ⎦ ∈ R3×3 . 3
The characteristic polynomial of A is pA (r) = (r − 1)(r − 2)2 . The eigenspace of λ1 = 2 is only onedimensional, so there is a chain of the form x1 , (A − 2I)x1 , and then an eigenvector x2 of λ2 = 1. We find that x1 = (1, 0, 1) ∈ N ((A − 2I)2 ) \ N (A − 2I), and x2 = (2, 3, 1) is an eigenvector corresponding to λ2 = 1. We therefore define ⎤ ⎡ 0 1 2 X = [(A − 2I)x1 |x1 |x2 ] = ⎣ −1 0 3 ⎦ 0 1 1
135 ⎤ 0 0 ⎦. 1
⎡
and obtain
2 1 X −1 AX = ⎣ 0 2 0 0
3. Let T : P3 → P3 be defined by T (p) = p + p for all p ∈ P3 . First, we compute the matrix of T in the standard basis S = {1, x, x2 , x3 }. We have T : 1 → 1, T : x → x, T : x2 → 2 + x2 , T : x3 → 6x + x3 , so
⎡
1 ⎢ 0 A = [T ]S,S = ⎢ ⎣ 0 0
0 1 0 0
2 0 1 0
⎤ 0 6 ⎥ ⎥. 0 ⎦ 1
The characteristic polynomial of A is pA (r) = (r−1)4 , and a direct calculation shows that dim(N (A−I)) = 2 and dim(N ((A− I)2 )) = 4. Therefore, there must be two chains of the form x1 , (A− I)x1 , x2 , (A− I)x2 , where x1 , x2 are independent vectors in N ((A − I)2 ) \ N (A − I). We find that x1 = (0, 0, 1, 0) and x2 = (0, 0, 0, 1) satisfy this condition, so we define ⎤ ⎡ 2 0 0 0 ⎢ 0 0 6 0 ⎥ ⎥ X = [(A − I)x1 |x1 |(A − I)x2 |x2 ] = ⎢ ⎣ 0 1 0 0 ⎦. 0 0 0 1 Then
⎡
1 ⎢ 0 −1 X AX = ⎢ ⎣ 0 0
1 1 0 0
0 0 1 0
⎤ 0 0 ⎥ ⎥. 1 ⎦ 1
The basis of P3 corresponding to X is X = {2, x2 , 6x, x3 }, and [T ]X ,X = X −1 AX is in Jordan canonical form. 4. Let A be a 4 × 4 matrix. For each of the following characteristic polynomials, we will determine the possible Jordan canonical forms. (a) If pA (r) = (r − 1)(r − 2)(r − 3)(r − 4), then A has 4 distinct eigenvalues and hence 4 linearly independent eigenvectors. It follows that A is diagonalizable, and the Jordan canonical form is the diagonal matrix with diagonal entries 1, 2, 3, 4. (b) If pA (r) = (r − 1)2 (r − 2)(r − 3), the A has 3 distinct eigenvalues and hence at least 3 linearly independent eigenvectors. There are two possibilities. If dim(N (A − I)) = 2, then A in fact has 4 linearly independent eigenvalues, and the Jordan canonical form is the diagonal matrix with diagonal entries 1, 1, 2, 3. In this case, dim(N (A − I)) = 2 and dim(N (A − 2I)) = dim(N (A − 3I)) = 1. The other possibility is that dim(N (A − I)) = 1, dim(N ((A − I)2 )) = 2 (as well as dim(N (A − 2I)) = dim(N (A − 3I)) = 1), and the Jordan canonical form of A is ⎤ ⎡ 1 1 0 0 ⎢ 0 1 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 2 0 ⎦. 0 0 0 3
CHAPTER 5. THE JORDAN CANONICAL FORM (c) If pA (r) = (r − 1)2 (r − 2)2 , then there are four possibilities. We could have dim(N (A − I)) = 2 or dim(N (A−I)) = 1, dim(N ((A−I)2 )) = 2, and similarly dim(N (A−2I)) = 2 or dim(N (A−2I)) = 1, dim(N ((A − 2I)2 )) = 2, These four possibilities lead to four possible Jordan canonical forms: ⎤ ⎡ 1 0 0 0 ⎢ 0 1 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 2 0 ⎦ , dim(N (A − I)) = dim(N (A − 2I)) = 2, 0 0 0 2 ⎤ ⎡ 1 1 0 0 ⎢ 0 1 0 0 ⎥ 2 ⎥ ⎢ ⎣ 0 0 2 0 ⎦ , dim(N (A − I)) = 1, dim(N ((A − I) ) = dim(N (A − 2I)) = 2, 0 0 0 2 ⎤ ⎡ 1 0 0 0 ⎢ 0 1 0 0 ⎥ 2 ⎥ ⎢ ⎣ 0 0 2 1 ⎦ , dim(N (A − I)) = 2, dim(N (A − 2I)) = 1, dim(N ((A − 2I) )) = 2, 0 0 0 2 ⎤ ⎡ 1 1 0 0 ⎢ 0 1 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 2 1 ⎦ , dim(N (A − I)) = dim(N (A − 2I)) = 1, 0 0 0 2 dim(N ((A − I)2 )) = dim(N ((A − 2I)2 )) = 2. (d) If pA (r) = (r − 1)3 (r − 2), then dim(N (A − 2I)) = 1 and there are three possibilities for λ1 = 1: dim(N (A − I)) = 3 or dim(N (A − I)) = 2, dim(N ((A − I)2 )) = 3 or dim(N (A − I)) = 1, dim(N ((A − I)2 )) = 2, dim(N ((A − I)3 )) = 3. These three possibilites lead to the following three possible Jordan canonical forms, respectively: ⎤ ⎤ ⎡ ⎤ ⎡ ⎡ 1 1 0 0 1 1 0 0 1 0 0 0 ⎢ 0 1 0 0 ⎥ ⎢ 0 1 0 0 ⎥ ⎢ 0 1 1 0 ⎥ ⎥ ⎥ ⎢ ⎥ ⎢ ⎢ ⎣ 0 0 1 0 ⎦, ⎣ 0 0 1 0 ⎦, ⎣ 0 0 1 0 ⎦. 0 0 0 2 0 0 0 2 0 0 0 2 (e) If pA (r) = (r − 1)4 , then there are five possibilities: dim(N (A − I)) = 4, which yields a diagonal Jordan canonical form (equal to the identity matrix in this case); dim(N (A − I)) = 3, dim(N ((A − I)2 )) = 4, which leads to the Jordan canonical form ⎤ ⎡ 1 1 0 0 ⎢ 0 1 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 1 0 ⎦; 0 0 0 1 dim(N (A − I)) = 2, dim(N ((A − I)2 )) = 4, which leads to the Jordan canonical form ⎤ ⎡ 1 1 0 0 ⎢ 0 1 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 1 1 ⎦; 0 0 0 1 dim(N (A − I)) = 2, dim(N ((A − I)2 )) = 3, dim(N ((A − I)3 )) = 4, which leads to the Jordan canonical form ⎤ ⎡ 1 1 0 0 ⎢ 0 1 1 0 ⎥ ⎥ ⎢ ⎣ 0 0 1 0 ⎦; 0 0 0 1
or dim(N (A − I)) = 1, dim(N ((A − I)2 )) = 2, dim(N ((A − I))3 ) = 3, dim(N ((A − I)4 )) = 4, which leads to the Jordan canonical form ⎤ ⎡ 1 1 0 0 ⎢ 0 1 1 0 ⎥ ⎥ ⎢ ⎣ 0 0 1 1 ⎦. 0 0 0 1 5. Let A ∈ R4×4 be defined by
⎤ −3 1 −4 −4 ⎢ −17 1 −17 −38 ⎥ ⎥ A=⎢ ⎣ −4 −1 −3 −14 ⎦ . 4 0 4 10 ⎡
Then pA (r) = (r − 1)3 (r − 2). A direct calculation shows that dim(N (A − I)) = 1, dim(N ((A − I)2 )) = 2, dim(N ((A − I)3 )) = 3 (and, of course, dim(N (A − 2I)) = 1. Therefore, the Jordan canonical form of A is ⎤ ⎡ 1 1 0 0 ⎢ 0 1 1 0 ⎥ ⎥ ⎢ ⎣ 0 0 1 0 ⎦. 0 0 0 2 6. Let A be a 5 × 5 matrix. For each of the following characteristic polynomials, we will determine the possible Jordan canonical forms. (a) If pA (r) = (r − 1)(r − 2)(r − 3)(r − 4)(r − 5), then A has 5 distinct eigenvalues and hence 5 linearly independent eigenvectors. It follows that A is diagonalizable, and the Jordan canonical form is the diagonal matrix with diagonal entries 1, 2, 3, 4, 5. (b) If pA (r) = (r − 1)2 (r − 2)(r − 3)(r − 4), then A has four distinct eigenvalues and therefore at least four linearly independent eigenvectors. There are two possibilities. If dim(N (A − I)) = 2, then A is diagonalizable and the Jordan canonical form is the diagonal matrix with diagonal entries 1, 1, 2, 3, 4. If dim(N (A− I)) = 1, then dim(N ((A− I)2 )) = 2 and there is a single chain x1 , (A− I)x1 of nonzero vectors. We let x2 , x3 , x4 be eigenvectors corresponding to the eigenvalues 2, 3, 4, respectively. The Jordan canonical form is ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 0 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 2 0 0 ⎥. ⎥ ⎢ ⎣ 0 0 0 3 0 ⎦ 0 0 0 0 4 (c) If pA (r) = (r − 1)2 (r − 2)2 (r − 3), then there are a total of four possibilities since dim(N (A − I)) can be either 1 or 2, and similarly for dim(N (A − 2I)). The four possible Jordan canonical forms are ⎤ ⎡ 1 0 0 0 0 ⎢ 0 1 0 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 2 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 2 0 ⎦ 0 0 0 0 3 (dim(N (A − I)) = dim(N (A − 2I)) = 2, dim(N (A − 3I)) = 1), ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 0 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 2 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 2 0 ⎦ 0 0 0 0 3
(dim(N (A − I)) = 1, dim(N ((A − I)2 )) = 2, dim(N (A − 2I)) = 2, dim(N (A − 3I)) = 1), ⎡
1 ⎢ 0 ⎢ ⎢ 0 ⎢ ⎣ 0 0
0 1 0 0 0
0 0 2 0 0
0 0 1 2 0
⎤ 0 0 ⎥ ⎥ 0 ⎥ ⎥ 0 ⎦ 3
(dim(N (A − I)) = 2, dim(N (A − 2I)) = 1, dim(N ((A − 2I)2 )) = 2, dim(N (A − 3I)) = 1), ⎡
1 ⎢ 0 ⎢ ⎢ 0 ⎢ ⎣ 0 0
1 1 0 0 0
0 0 2 0 0
0 0 1 2 0
⎤ 0 0 ⎥ ⎥ 0 ⎥ ⎥ 0 ⎦ 3
(dim(N (A − I)) = 1, dim(N (A − I)) = 1, dim(N (A − 2I)) = 1, dim(N ((A − 2I)2 )) = 2, dim(N (A − 3I)) = 1). (d) If pA (r) = (r − 1)3 (r − 2)(r − 3), then there are three possible Jordan canonical forms for A: ⎡
1 ⎢ 0 ⎢ ⎢ 0 ⎢ ⎣ 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 2 0
⎤ 0 0 ⎥ ⎥ 0 ⎥ ⎥ 0 ⎦ 3
(dim(N (A − I)) = 3, dim(N (A − 2I)) = 1, dim(N (A − 3I)) = 1), ⎡
1 ⎢ 0 ⎢ ⎢ 0 ⎢ ⎣ 0 0
1 1 0 0 0
0 0 1 0 0
0 0 0 2 0
⎤ 0 0 ⎥ ⎥ 0 ⎥ ⎥ 0 ⎦ 3
(dim(N (A − I)) = 2, dim(N ((A − I)2 )) = 3, dim(N (A − 2I)) = 1, dim(N (A − 3I)) = 1), ⎡
1 ⎢ 0 ⎢ ⎢ 0 ⎢ ⎣ 0 0
1 1 0 0 0
0 1 1 0 0
0 0 0 2 0
⎤ 0 0 ⎥ ⎥ 0 ⎥ ⎥ 0 ⎦ 3
(dim(N (A−I)) = 1, dim(N ((A−I)2 )) = 2, dim(N ((A−I)3 )) = 3, dim(N (A−2I)) = 1, dim(N (A− 3I)) = 1). (e) If pA (r) = (r − 1)3 (r − 2)2 , then there are six different possible Jordan canonical forms for A (since there are three choices for the basis corresponding to λ1 = 1 and two choices for the basis corresponding to λ2 = 2): ⎤ ⎡ 1 0 0 0 0 ⎢ 0 1 0 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 2 0 ⎦ 0 0 0 0 2
5.4. THE JORDAN CANONICAL FORM OF A MATRIX (dim(N (A − I)) = 3, dim(N (A − 2I)) = 2), ⎡ 1 ⎢ 0 ⎢ ⎢ 0 ⎢ ⎣ 0 0
0 1 0 0 0
0 0 1 0 0
0 0 0 2 0
⎤ 0 0 ⎥ ⎥ 0 ⎥ ⎥ 1 ⎦ 2
(dim(N (A − I)) = 3, dim(N (A − 2I)) = 1, dim(N ((A − 2I)2 )) = 2), ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 0 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 2 0 ⎦ 0 0 0 0 2 (dim(N (A − I)) = 2, dim(N ((A − I)2 )) = 3, dim(N (A − 2I)) = 2), ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 0 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 2 1 ⎦ 0 0 0 0 2 (dim(N (A − I)) = 2, dim(N ((A − I)2 )) = 3, dim(N (A − 2I)) = 1, dim(N ((A − 2I)2 )) = 2), ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 1 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 2 0 ⎦ 0 0 0 0 2 (dim(N (A − I)) = 1, dim(N ((A − I)2 )) = 2, dim(N ((A − I)3 )) = 3, dim(N (A − 2I)) = 2), ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 1 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 2 1 ⎦ 0 0 0 0 2 (dim(N (A−I)) = 1, dim(N ((A−I)2 )) = 2, dim(N ((A−I)3 )) = 3, dim(N (A−2I)) = 1, dim(N ((A− 2I)2 )) = 2). (f) If pA (r) = (r − 1)4 (r − 2), then there are five different possible Jordan canonical forms for A: ⎤ ⎡ 1 0 0 0 0 ⎢ 0 1 0 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 1 0 ⎦ 0 0 0 0 2 (dim(N (A − I)) = 4, dim(N (A − 2I)) = 1), ⎡ 1 ⎢ 0 ⎢ ⎢ 0 ⎢ ⎣ 0 0
1 1 0 0 0
0 0 1 0 0
0 0 0 1 0
⎤ 0 0 ⎥ ⎥ 0 ⎥ ⎥ 0 ⎦ 2
(dim(N (A − I)) = 3, dim(N ((A − I)2 )) = 4, dim(N (A − 2I)) = 1), ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 0 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 1 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 1 0 ⎦ 0 0 0 0 2 (dim(N (A − I)) = 2, dim(N ((A − I)2 )) = 4, dim(N (A − 2I)) = 1), ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 1 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 1 0 ⎦ 0 0 0 0 2 (dim(N (A − I)) = 2, dim(N ((A − I)2 )) = 3, dim(N ((A − I)3 )) = 4, dim(N (A − 2I)) = 1). ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 1 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 1 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 1 0 ⎦ 0 0 0 0 2 (dim(N (A − I)) = 1, dim(N ((A − I)2 )) = 2, dim(N ((A − I)3 )) = 3, dim(N ((A − I)4 )) = 4, dim(N (A − 2I)) = 1). (g) If pA (r) = (r − 1)5 , then there are seven different possible Jordan canonical forms for A: ⎤ ⎡ 1 0 0 0 0 ⎢ 0 1 0 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 0 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 1 0 ⎦ 0 0 0 0 1 (dim(N (A − I)) = 5),
1 1 0 0 0
0 0 1 0 0
0 0 0 1 0
⎤ 0 0 ⎥ ⎥ 0 ⎥ ⎥ 0 ⎦ 1
(dim(N (A − I)) = 4, dim(N ((A − I)2 )) = 5), ⎡ 1 1 ⎢ 0 1 ⎢ ⎢ 0 0 ⎢ ⎣ 0 0 0 0
0 0 1 0 0
0 0 1 1 0
⎤ 0 0 ⎥ ⎥ 0 ⎥ ⎥ 0 ⎦ 1
(dim(N (A − I)) = 3, dim(N ((A − I)2 )) = 5), ⎡ 1 1 ⎢ 0 1 ⎢ ⎢ 0 0 ⎢ ⎣ 0 0 0 0
0 1 1 0 0
0 0 0 1 0
⎤ 0 0 ⎥ ⎥ 0 ⎥ ⎥ 0 ⎦ 1
⎡
1 ⎢ 0 ⎢ ⎢ 0 ⎢ ⎣ 0 0
(dim(N (A − I)) = 3, dim(N ((A − I)2 )) = 4, dim(N ((A − I)3 )) = 5), ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 0 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 1 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 1 1 ⎦ 0 0 0 0 1 (dim(N (A − I)) = 2, dim(N ((A − I)2 )) = 4, dim(N ((A − I)3 )) = 5), ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 1 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 1 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 1 0 ⎦ 0 0 0 0 1 (dim(N (A − I)) = 2, dim(N ((A − I)2 )) = 3, dim(N ((A − I)3 )) = 4, dim(N ((A − I)4 )) = 5), ⎤ ⎡ 1 1 0 0 0 ⎢ 0 1 1 0 0 ⎥ ⎥ ⎢ ⎢ 0 0 1 1 0 ⎥ ⎥ ⎢ ⎣ 0 0 0 1 1 ⎦ 0 0 0 0 1 (dim(N (A − I)) = 1, dim(N ((A − I)2 )) = 2, dim(N ((A − I)3 )) = 3, dim(N ((A − I)4 )) = 4, dim(N ((A − I)5 )) = 5). 7. Let A ∈ R5×5 be defined by
⎤ −7 1 24 4 7 ⎢ −9 4 21 3 6 ⎥ ⎥ ⎢ −2 −1 11 2 3 ⎥ A=⎢ ⎥. ⎢ ⎣ −7 13 −18 −6 −8 ⎦ 3 −5 6 3 5 ⎡
Then pA(r) = (r − 1)^3 (r − 2)^2. A direct calculation shows that dim(N(A − I)) = 2, dim(N((A − I)^2)) = 3, dim(N(A − 2I)) = 1, and dim(N((A − 2I)^2)) = 2. Therefore, the Jordan canonical form of A is
⎡ 1 1 0 0 0 ⎤
⎢ 0 1 0 0 0 ⎥
⎢ 0 0 1 0 0 ⎥
⎢ 0 0 0 2 1 ⎥
⎣ 0 0 0 0 2 ⎦ .
8. In defining the Jordan canonical form, we list the vectors in an eigenvector/generalized eigenvector chain in the following order: (A − λi I)^{rj−1} xi,j, (A − λi I)^{rj−2} xi,j, . . . , xi,j. If we list the vectors in the order xi,j, (A − λi I)xi,j, . . . , (A − λi I)^{rj−1} xi,j instead, each Jordan block is transposed: λ still appears on the diagonal, but the 1's move from the superdiagonal to the subdiagonal, so the blocks take the form
J = ⎡ λ              ⎤
    ⎢ 1  λ           ⎥
    ⎢    1  λ        ⎥
    ⎢       ⋱  ⋱     ⎥
    ⎣          1  λ  ⎦ .
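For readers who want to check nullity computations like the one in Exercise 7, the block structure can be verified symbolically. The following sketch uses SymPy (which the text itself does not use) to recompute dim(N((A − λI)^k)) for the matrix of Exercise 7 and to ask for the Jordan form directly; it assumes the entries above were transcribed correctly.

```python
import sympy as sp

# The matrix from Exercise 7 (entries as transcribed above).
A = sp.Matrix([
    [-7,  1,  24,  4,  7],
    [-9,  4,  21,  3,  6],
    [-2, -1,  11,  2,  3],
    [-7, 13, -18, -6, -8],
    [ 3, -5,   6,  3,  5],
])
I5 = sp.eye(5)

# dim N((A - lambda*I)^k) = 5 - rank((A - lambda*I)^k) determines the block sizes.
for lam in (1, 2):
    for k in (1, 2):
        nullity = 5 - ((A - lam * I5) ** k).rank()
        print(f"dim N((A - {lam}I)^{k}) = {nullity}")

# SymPy can also compute the Jordan canonical form directly.
P, J = A.jordan_form()
sp.pprint(J)
```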
9. Let λ ∈ C be an eigenvalue of A ∈ C^{n×n}. We wish to prove that the number of Jordan blocks corresponding to λ equals the geometric multiplicity of λ. According to the discussion on page 310 of the text, there is one eigenvector corresponding to λi for each Jordan block Ji,j (namely, (A − λi I)^{ri,j−1} xi,j, using the notation from the text). This proves that the geometric multiplicity of λi is at least equal to the number of Jordan blocks. On the other hand, if the geometric multiplicity of λi were strictly greater than the number si of Jordan blocks corresponding to λi, then the eigenvectors (A − λi I)^{ri,1−1} xi,1, . . . , (A − λi I)^{ri,si−1} xi,si cannot span N(A − λi I). Let yi ∈ N(A − λi I) be an eigenvector that cannot be written as a linear combination of the above eigenvectors. Now, yi belongs to the generalized eigenspace corresponding to λi, and therefore

yi = Σ_{j=1}^{si} Σ_{k=0}^{ri,j−1} αj,k (A − λi I)^k xi,j

for certain scalars αj,k, j = 1, . . . , si, k = 0, . . . , ri,j − 1. We then have

(A − λi I)yi = Σ_{j=1}^{si} Σ_{k=0}^{ri,j−1} αj,k (A − λi I)^{k+1} xi,j
⇒ 0 = Σ_{j=1}^{si} Σ_{k=0}^{ri,j−2} αj,k (A − λi I)^{k+1} xi,j

(since (A − λi I)^{ri,j} xi,j = 0, the last term in each inner sum is zero). The vectors in this last linear combination form a linearly independent set, and hence αj,k = 0, j = 1, . . . , si, k = 0, . . . , ri,j − 2. But then this implies that

yi = Σ_{j=1}^{si} αj,ri,j−1 (A − λi I)^{ri,j−1} xi,j,
contradicting the assumption above (that yi was independent of the eigenvectors already in the basis). This contradiction shows that the geometric multiplicity cannot be larger than the number of Jordan blocks, which completes the proof.
10. Let λ ∈ C be given and let

J = ⎡ λ  1              ⎤
    ⎢    λ  1           ⎥
    ⎢       ⋱  ⋱        ⎥  ∈ C^{k×k}
    ⎢          λ  1     ⎥
    ⎣             λ     ⎦

be a corresponding Jordan block. Then J^m is upper triangular, with (J^m)_{ij}, j = i, i + 1, . . . , k, equal to the first k − i + 1 terms of the sequence

λ^m, C(m,1) λ^{m−1}, C(m,2) λ^{m−2}, . . . , C(m,m−1) λ, 1

(where C(m,r) denotes the binomial coefficient m!/(r!(m − r)!)). If m + 1 < k − i + 1 (so that the above sequence does not fill out the row), then the remaining entries in the row are all 0. More precisely,

(J^m)_{ij} = 0 if j < i or j > i + m,   and   (J^m)_{ij} = C(m, j−i) λ^{m−(j−i)} if i ≤ j ≤ i + m.
For example, with k = 6, we have

J^3 = ⎡ λ^3  3λ^2  3λ    1     0     0    ⎤
      ⎢ 0    λ^3   3λ^2  3λ    1     0    ⎥
      ⎢ 0    0     λ^3   3λ^2  3λ    1    ⎥
      ⎢ 0    0     0     λ^3   3λ^2  3λ   ⎥
      ⎢ 0    0     0     0     λ^3   3λ^2 ⎥
      ⎣ 0    0     0     0     0     λ^3  ⎦

and

J^6 = ⎡ λ^6  6λ^5  15λ^4  20λ^3  15λ^2  6λ    ⎤
      ⎢ 0    λ^6   6λ^5   15λ^4  20λ^3  15λ^2 ⎥
      ⎢ 0    0     λ^6    6λ^5   15λ^4  20λ^3 ⎥
      ⎢ 0    0     0      λ^6    6λ^5   15λ^4 ⎥
      ⎢ 0    0     0      0      λ^6    6λ^5  ⎥
      ⎣ 0    0     0      0      0      λ^6   ⎦ .
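The binomial-coefficient pattern for powers of a Jordan block is easy to check symbolically. Below is a small sketch (SymPy assumed; not part of the text) that builds a k × k Jordan block with a symbolic λ and compares J^m with the formula from the exercise.

```python
import sympy as sp

lam = sp.symbols('lambda')
k, m = 6, 3

# k x k Jordan block: lambda on the diagonal, 1's on the superdiagonal.
J = sp.Matrix(k, k, lambda i, j: lam if i == j else (1 if j == i + 1 else 0))

# (J^m)_{ij} = C(m, j-i) * lambda^(m-(j-i)) for i <= j <= i+m, and 0 otherwise.
formula = sp.Matrix(k, k, lambda i, j: sp.binomial(m, j - i) * lam**(m - (j - i))
                    if 0 <= j - i <= m else 0)

print(sp.simplify(J**m - formula) == sp.zeros(k, k))   # True
```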
11. Let A ∈ C^{n×n} and let J be the Jordan canonical form of A. We wish to prove that pA(J) = 0. Since J is block diagonal, it suffices to prove that pA(Ji,j) = 0 for each Jordan block Ji,j of J. So we will assume, without loss of generality, that J is a k × k Jordan block corresponding to the eigenvalue λ. Then λ is a root of pA(r) of multiplicity at least k, and hence pA^{(i)}(λ) = 0 for i = 0, 1, . . . , k − 1. Let us write

pA(r) = Σ_{m=0}^{n} cm r^m;

then

pA(J) = Σ_{m=0}^{n} cm J^m.

Using the results of the previous exercise, we see that (pA(J))_{ij} = 0 if j < i. Also, if j ≥ i, then

(pA(J))_{ij} = Σ_{m=0}^{n} cm (J^m)_{ij}
            = Σ_{m=j−i}^{n} C(m, j−i) cm λ^{m−(j−i)}
            = Σ_{m=j−i}^{n} [m! / ((m − (j−i))! (j−i)!)] cm λ^{m−(j−i)}
            = [1/(j−i)!] Σ_{m=j−i}^{n} m(m − 1) · · · (m − (j−i) + 1) cm λ^{m−(j−i)}
            = [1/(j−i)!] pA^{(j−i)}(λ).
Since 0 ≤ j − i ≤ k − 1, we see that (pA(J))_{ij} = 0 in every case, and the proof is complete.
12. Let B ∈ R^{2×2} be defined by

B = ⎡ 1    1 ⎤
    ⎣ ε^2  1 ⎦ ,

where ε > 0. The characteristic polynomial of B is pB(r) = (r − 1)^2 − ε^2, from which it follows immediately that the eigenvalues of B are λ1 = 1 + ε, λ2 = 1 − ε. A direct calculation then shows that x1 = (1, ε) and x2 = (1, −ε) are eigenvectors of B corresponding to λ1 and λ2, respectively.
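Exercise 12 illustrates how a small perturbation ε splits a defective eigenvalue. The following numerical sketch (NumPy assumed; B as reconstructed above) shows the eigenvalues 1 ± ε and the nearly parallel eigenvectors (1, ±ε) for a small ε.

```python
import numpy as np

eps = 1e-4
B = np.array([[1.0,    1.0],
              [eps**2, 1.0]])

w, V = np.linalg.eig(B)
print(w)             # approximately 1 + eps and 1 - eps
print(V / V[0, :])   # columns scale to (1, eps) and (1, -eps)
```

As ε → 0 the two eigenvectors coalesce and B approaches a 2 × 2 Jordan block.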
13. Suppose A, B ∈ C^{n×n}, A and B commute, and B has n distinct eigenvalues λ1, . . . , λn. We wish to prove that A and B are simultaneously diagonalizable. For each i = 1, . . . , n, let xi ∈ C^n be an eigenvector of B corresponding to λi. Since λ1, . . . , λn are distinct, {x1, . . . , xn} is linearly independent, and hence X = [x1| · · · |xn] is invertible. We then see that X^{−1}BX is diagonal. Now, Bxi = λi xi ⇒ ABxi = λi Axi ⇒ BAxi = λi Axi. (In the last step, we use the fact that A and B commute.) Thus Axi ∈ EB(λi), and since EB(λi) is one-dimensional and spanned by xi, it follows that Axi = μi xi for some scalar μi. In either case (μi = 0 or μi ≠ 0), xi is an eigenvector of A. This holds for each i = 1, 2, . . . , n, and hence the columns of X are eigenvectors of A; that is, X diagonalizes A. Thus A and B are simultaneously diagonalizable.

14. Suppose S is a nontrivial subspace of C^n that is invariant under A. Then T : S → S, T(x) = Ax for all x ∈ S, is well-defined and linear. By Theorem 209, there exists an eigenpair λ ∈ C, x ∈ S of T with x ≠ 0. But then T(x) = λx implies that Ax = λx. Since x is nonzero, this implies that x is an eigenvector of A. Thus S contains an eigenvector of A.

15. Suppose A ∈ C^{n×n} is diagonalizable, S ⊂ C^n is a nontrivial subspace that is invariant under A, and λ is an eigenvalue of A with a corresponding eigenvector belonging to S. Define T : S → S by T(x) = Ax. We wish to prove that ker((T − λI)^2) = ker(T − λI). By Theorem 230, it suffices to prove that R(T − λI) ∩ ker(T − λI) = {0}. Since A is diagonalizable, we have N((A − λI)^2) = N(A − λI) and hence, by Theorem 230, col(A − λI) ∩ N(A − λI) = {0}. By definition of T, we have R(T − λI) ⊂ col(A − λI) and ker(T − λI) ⊂ N(A − λI), and hence R(T − λI) ∩ ker(T − λI) ⊂ col(A − λI) ∩ N(A − λI) = {0}. This completes the proof.

16. Suppose A ∈ C^{n×n} is diagonalizable and S is a nontrivial invariant subspace for A. Define T as in the previous exercise. By the previous exercise, ker((T − λI)^2) = ker(T − λI) for every eigenvalue λ of T. By Theorem 234, it follows that T is diagonalizable, that is, that there is a basis X = {x1, . . . , xk} of S such that [T]_{X,X} is diagonal. This implies that, for each i = 1, . . . , k, ei = [xi]_X is an eigenvector of [T]_{X,X}, and hence that each xi is an eigenvector of A:
[T]_{X,X}[xi]_X = [T(xi)]_X and [T]_{X,X}[xi]_X = λi[xi]_X ⇒ [T(xi)]_X = λi[xi]_X = [λi xi]_X ⇒ T(xi) = λi xi ⇒ Axi = λi xi.
Therefore, X is a basis for S consisting of eigenvectors of A.

17. Suppose A, B ∈ C^{n×n} commute. Then, for any eigenspace EA(λ) of A,
x ∈ EA(λ) ⇒ Ax = λx ⇒ BAx = λBx ⇒ ABx = λBx ⇒ Bx ∈ EA(λ).
This shows that EA(λ) is invariant under B.

18. Suppose A, B ∈ C^{n×n} are diagonalizable and commute. Let λ1, . . . , λt be the distinct eigenvalues of A, and define Ni = EA(λi). Then, by the previous exercise, each Ni is invariant under B and hence, by Exercise 16, there is a basis Xi for Ni consisting of eigenvectors of B. Notice that since Xi ⊂ Ni, each vector in Xi is also an eigenvector of A. Then, since C^n is the direct sum of N1, . . . , Nt (this holds because A is diagonalizable), X = ∪_{i=1}^{t} Xi is a basis of C^n, and X consists of eigenvectors for both A and B. Thus A and B are simultaneously diagonalizable.

19. One example is
A = ⎡ 1 1 0 ⎤        B = ⎡ 2 1 0 ⎤
    ⎢ 0 1 1 ⎥ ,          ⎢ 0 2 1 ⎥ .
    ⎣ 0 0 1 ⎦            ⎣ 0 0 2 ⎦
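A quick check of this example (assuming the two matrices displayed above were reconstructed correctly; SymPy is assumed, and is not used in the text): A and B commute, but neither is diagonalizable, so they cannot be simultaneously diagonalized.

```python
import sympy as sp

A = sp.Matrix([[1, 1, 0], [0, 1, 1], [0, 0, 1]])
B = sp.Matrix([[2, 1, 0], [0, 2, 1], [0, 0, 2]])

print(A * B == B * A)            # True: A and B commute
print(A.is_diagonalizable())     # False
print(B.is_diagonalizable())     # False
```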
5.5 The matrix exponential
1. Let A ∈ C^{n×n}, and let U : R → C^{n×n} be any fundamental solution of U′ = AU. We wish to prove that U(t)U(0)^{−1} = e^{tA}. By the existence and uniqueness theory for linear ODEs, it suffices to prove that Z(t) = U(t)U(0)^{−1} satisfies Z′ = AZ, Z(0) = I. The second condition is immediate from the definition of Z. We also have

Z′(t) = d/dt [U(t)U(0)^{−1}] = U′(t)U(0)^{−1} = AU(t)U(0)^{−1} = AZ(t)

(the second step follows because U(0)^{−1} is a constant matrix). This completes the proof.

2. We wish to verify that (5.22) is the general solution to (5.21). We first verify that (5.22) is a solution. For j = 1, . . . , r − 1, we have

vj(t) = e^{λt} [ cj + cj+1 t + cj+2 t^2/2! + · · · + cr t^{r−j}/(r − j)! ]
⇒ vj′(t) = λ e^{λt} [ cj + cj+1 t + cj+2 t^2/2! + · · · + cr t^{r−j}/(r − j)! ] + e^{λt} [ cj+1 + cj+2 t + · · · + cr t^{r−j−1}/(r − j − 1)! ]
⇒ vj′(t) = λ vj(t) + vj+1(t),

and thus equation j is satisfied. For j = r, the equation is obviously satisfied: vr(t) = cr e^{λt} ⇒ vr′(t) = λ cr e^{λt} = λ vr(t).

3. We wish to find the matrix exponential e^{tA} for the matrix given in Exercise 5.4.5. The similarity transformation defined by

X = ⎡  1   20  −9   0 ⎤
    ⎢  0    1   0  −4 ⎥
    ⎢ −1  −20   0  −2 ⎥
    ⎣  0    0   4   1 ⎦

puts A in Jordan canonical form:

X^{−1}AX = J = ⎡ 1 1 0 0 ⎤
               ⎢ 0 1 1 0 ⎥
               ⎢ 0 0 1 0 ⎥
               ⎣ 0 0 0 2 ⎦ .

We have

e^{tJ} = ⎡ e^t  te^t  (t^2/2)e^t  0      ⎤
         ⎢ 0    e^t   te^t        0      ⎥
         ⎢ 0    0     e^t         0      ⎥
         ⎣ 0    0     0           e^{2t} ⎦ ,
and e^{tA} = X e^{tJ} X^{−1} is

⎡ e^t(1 − 4t − t^2/2)            te^t    e^t(−4t − t^2/2)               e^t(−4t − t^2)                ⎤
⎢ e^t(16 − t) − 16e^{2t}         e^t     e^t(16 − t) − 16e^{2t}         e^t(36 − 2t) − 36e^{2t}       ⎥
⎢ e^t(8 + 4t + t^2/2) − 8e^{2t}  −te^t   e^t(9 + 4t + t^2/2) − 8e^{2t}  e^t(18 + 4t + t^2) − 18e^{2t} ⎥
⎣ −4e^t + 4e^{2t}                0       −4e^t + 4e^{2t}                −8e^t + 9e^{2t}               ⎦ .
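The computation in Exercise 3 can be reproduced symbolically. The sketch below (SymPy assumed; not part of the text) rebuilds A from X and J as transcribed above and lets SymPy evaluate the matrix exponential; the result should agree, up to simplification, with the matrix displayed above.

```python
import sympy as sp

t = sp.symbols('t')
X = sp.Matrix([[ 1,  20, -9,  0],
               [ 0,   1,  0, -4],
               [-1, -20,  0, -2],
               [ 0,   0,  4,  1]])
J = sp.Matrix([[1, 1, 0, 0],
               [0, 1, 1, 0],
               [0, 0, 1, 0],
               [0, 0, 0, 2]])

A = X * J * X.inv()                 # recover A from its Jordan decomposition
etA = sp.simplify((t * A).exp())    # matrix exponential of t*A
sp.pprint(etA)
```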
4. We wish to find the matrix exponential of the matrix from Exercise 5.4.7. The Jordan canonical form of A is

J = ⎡ 1 1 0 0 0 ⎤
    ⎢ 0 1 0 0 0 ⎥
    ⎢ 0 0 1 0 0 ⎥
    ⎢ 0 0 0 2 1 ⎥
    ⎣ 0 0 0 0 2 ⎦ ,

and

e^{tJ} = ⎡ e^t  te^t  0    0       0       ⎤
         ⎢ 0    e^t   0    0       0       ⎥
         ⎢ 0    0     e^t  0       0       ⎥
         ⎢ 0    0     0    e^{2t}  te^{2t} ⎥
         ⎣ 0    0     0    0       e^{2t}  ⎦ .
Space does not permit the display of e^{tA}.
5. Let A, B ∈ C^{n×n}.
(a) We first show that if A and B commute, then so do e^{tA} and B. We define U(t) = e^{tA}B − Be^{tA}. Then
U′(t) = Ae^{tA}B − BAe^{tA} = Ae^{tA}B − ABe^{tA} = A(e^{tA}B − Be^{tA}) = AU(t).
Also, U(0) = IB − BI = B − B = 0. Thus U satisfies U′ = AU, U(0) = 0. But the unique solution of this IVP is U(t) = 0; hence e^{tA}B = Be^{tA}.
(b) We now use the preceding result to show that if A and B commute, then e^{t(A+B)} = e^{tA}e^{tB} holds. We define U(t) = e^{tA}e^{tB}. Then
U′(t) = Ae^{tA}e^{tB} + e^{tA}Be^{tB} = Ae^{tA}e^{tB} + Be^{tA}e^{tB} = (A + B)e^{tA}e^{tB}
(notice how we used the first part of the exercise), and U(0) = I. But e^{t(A+B)} also satisfies the IVP U′ = (A + B)U, U(0) = I. Hence e^{t(A+B)} and e^{tA}e^{tB} must be equal.
(c) Let
A = ⎡ 0 1 ⎤ ,    B = ⎡ 0 1 ⎤ .
    ⎣ 1 0 ⎦         ⎣ 0 0 ⎦

Then

e^{tA} = ⎡ (e^{2t} + 1)/(2e^t)   (e^{2t} − 1)/(2e^t) ⎤ ,    e^{tB} = ⎡ 1  t ⎤ ,
         ⎣ (e^{2t} − 1)/(2e^t)   (e^{2t} + 1)/(2e^t) ⎦               ⎣ 0  1 ⎦

and

e^{A+B} = ⎡ (e^{√2} + e^{−√2})/2       (e^{√2} − e^{−√2})/√2 ⎤
          ⎣ (e^{√2} − e^{−√2})/(2√2)   (e^{√2} + e^{−√2})/2  ⎦ .

It is easy to verify that e^A e^B ≠ e^{A+B}.
6. Let A ∈ C^{n×n} be invertible. Since A and −A commute, the previous exercise implies that e^{tA}e^{−tA} = e^{t(A−A)} = e^{t·0} = I, and hence (e^{tA})^{−1} = e^{−tA}.
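Part 5(c) can also be checked numerically. The sketch below (SciPy assumed; not part of the text) uses scipy.linalg.expm to confirm that e^A e^B and e^{A+B} differ for the non-commuting pair A, B above.

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
B = np.array([[0.0, 1.0],
              [0.0, 0.0]])

print(np.allclose(expm(A) @ expm(B), expm(A + B)))   # False: A and B do not commute
```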
7. Let A ∈ C^{n×n}, v ∈ C^n be given, and define u(t) = e^{tA}v. Since, by definition, e^{tA} = I for t = 0, we have u(0) = v. Also,
u′(t) = d/dt [e^{tA}v] = (d/dt e^{tA}) v
since v is a constant vector. Then d/dt e^{tA} = Ae^{tA} yields u′(t) = Ae^{tA}v = Au(t). This shows that u solves the IVP u′ = Au, u(0) = v.

8. Let A ∈ C^{n×n} be given. The existence and uniqueness theory for ordinary differential equations implies that, given any t0 ∈ R and v0 ∈ C^n, the IVP u′ = Au, u(t0) = v0 has a unique solution u : R → C^n. Moreover, the theory guarantees that u is real-valued (that is, u : R → R^n) if A ∈ R^{n×n} and v0 ∈ R^n. It follows that, for each i = 1, . . . , n, there is a unique solution ui of u′ = Au, u(0) = ei, where ei is the ith standard basis vector in C^n. Moreover, ui is real-valued if A ∈ R^{n×n}. Define U(t) = [u1(t)| · · · |un(t)]; then
U′(t) = [u1′(t)| · · · |un′(t)] = [Au1(t)| · · · |Aun(t)] = A[u1(t)| · · · |un(t)] = AU(t).
Also, U(0) = [u1(0)| · · · |un(0)] = [e1| · · · |en] = I. Therefore, there exists a solution U of U′ = AU, U(0) = I, and U(t) ∈ R^{n×n} for all t if A ∈ R^{n×n}. Suppose V is another solution of U′ = AU, U(0) = I, and define vi(t) to be the ith column of V(t). Then V(0) = I implies vi(0) = ei. Also,
V′(t) = AV(t) ⇔ [v1′(t)| · · · |vn′(t)] = [Av1(t)| · · · |Avn(t)],
and therefore V′ = AV implies that vi′ = Avi. It follows that vi solves the IVP u′ = Au, u(0) = ei. Since this IVP has a unique solution, it follows that vi = ui for all i = 1, . . . , n, and therefore that V = U. This completes the proof.

9. Let U : R → C^{n×n} be the unique solution to U′ = AU, U(0) = I, where A ∈ C^{n×n} is given. We wish to show that U(t) is nonsingular for all t ∈ R. We will argue by contradiction and assume there exists t0 ∈ R such that U(t0) is singular. Then there exists c ∈ C^n, c ≠ 0, such that U(t0)c = 0. It follows that u(t) = U(t)c solves the IVP u′ = Au, u(t0) = 0. However, the constant function u(t) ≡ 0 also solves this IVP, and by the existence and uniqueness theorem, the two solutions must be equal. Therefore, U(t)c = 0 for all t, and hence U(t) is singular for all t. But this contradicts the fact that U(0) = I, and proves the desired result.
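Exercises 7–9 tie e^{tA} to the initial value problem u′ = Au. A small numerical sketch (SciPy assumed; the particular matrix A below is an arbitrary test case, not one taken from the text) compares an ODE solver against e^{tA}v.

```python
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[ 0.0,  1.0],
              [-2.0, -3.0]])     # arbitrary test matrix (assumed, for illustration)
v = np.array([1.0, 0.0])

sol = solve_ivp(lambda t, u: A @ u, (0.0, 1.0), v, rtol=1e-10, atol=1e-12)
print(sol.y[:, -1])              # numerical solution of u' = Au, u(0) = v, at t = 1
print(expm(A) @ v)               # e^{A} v; the two should agree closely
```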
5.6 Graphs and eigenvalues
1. The adjacency matrix is
AG = ⎡ 0 1 0 1 ⎤
     ⎢ 1 0 1 0 ⎥
     ⎢ 0 1 0 1 ⎥
     ⎣ 1 0 1 0 ⎦ ,
and its eigenvalues are 0, 0, −2, 2. 2. The adjacency matrix is
AG = ⎡ 0 0 0 1 0 ⎤
     ⎢ 0 0 0 1 1 ⎥
     ⎢ 0 0 0 1 0 ⎥
     ⎢ 1 1 1 0 0 ⎥
     ⎣ 0 1 0 0 0 ⎦ ,
and its eigenvalues are 0, ±√(2 + √2), ±√(2 − √2). The graph G is bipartite (with V1 = {v1, v2, v3}, V2 = {v4, v5}) and, as guaranteed by Theorem 261, the nonzero eigenvalues come in pairs of the form ±λ.
3. The adjacency matrix is
AG = ⎡ 0 1 1 0 0 0 0 ⎤
     ⎢ 1 0 1 0 0 0 0 ⎥
     ⎢ 1 1 0 0 0 0 0 ⎥
     ⎢ 0 0 0 0 1 0 1 ⎥
     ⎢ 0 0 0 1 0 1 0 ⎥
     ⎢ 0 0 0 0 1 0 1 ⎥
     ⎣ 0 0 0 1 0 1 0 ⎦ ,
and its eigenvalues are 2, 2, 0, 0, −1, −1, −2. Notice that the adjacency matrix shows that every vertex has degree 2, and hence G is 2-regular. Theorem 263 states that the largest eigenvalue of G should be 2, as indeed it is. Also, since the multiplicity of λ = 2 is 2, G must have two connected components. It does; the vertices v1, v2, v3 form one connected component, and the vertices v4, v5, v6, v7 form the other.
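The spectral statements in Exercises 1–3 are easy to confirm numerically. Here is a sketch (NumPy assumed; not part of the text) for the graph of Exercise 3; the largest eigenvalue 2 appears with multiplicity equal to the number of connected components.

```python
import numpy as np

# Adjacency matrix from Exercise 3: a triangle on v1, v2, v3 and a 4-cycle on v4,...,v7.
AG = np.array([
    [0, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0, 1, 0],
], dtype=float)

print(np.round(np.linalg.eigvalsh(AG), 6))   # -2, -1, -1, 0, 0, 2, 2
```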
4. (a) Let G be a bipartite graph with an odd number of vertices. We wish to prove that 0 is an eigenvalue of G. This follows immediately from Theorem 261: All nonzero eigenvalues of G occur in pairs of the form ±λ, and hence G has an even number of nonzero eigenvalues. Since G has an odd number of vertices, the adjacency matrix AG is n × n, where n is odd, and hence has an odd number of eigenvalues (counted according to multiplicity). Thus AG must have at least one zero eigenvalue.
(b) Consider the graph G with VG = {v1, v2, v3, v4}, EG = {{v1, v3}, {v2, v4}}. Then
AG = ⎡ 0 0 1 0 ⎤
     ⎢ 0 0 0 1 ⎥
     ⎢ 1 0 0 0 ⎥
     ⎣ 0 1 0 0 ⎦ .
The eigenvalues of G are −1, −1, 1, 1. Thus we see that 0 need not be an eigenvalue of a bipartite graph with an even number of vertices.
5. Let G be the graph defined by VG = {v1, v2, v3, v4, v5} and EG = {{v1, v2}, {v1, v3}, {v1, v4}, {v1, v5}, {v2, v3}}. It is easy to see that the diameter of G is 2, but G has 5 distinct eigenvalues: 0, −1, 2.3429, −1.8136, 0.47068 (the last three are approximate).
6. Let P be a permutation matrix. Then there exists a permutation τ ∈ Sn such that Pij = 1 if j = τ(i) and Pij = 0 if j ≠ τ(i). We then have
(P^T P)_{ij} = Σ_{k=1}^{n} (P^T)_{ik} P_{kj} = Σ_{k=1}^{n} P_{ki} P_{kj},
which equals 1 if i = j (the single nonzero term being the k with τ(k) = i = j) and 0 if i ≠ j.
This shows that P^T P = I, and hence that P^T = P^{−1}.
7. The adjacency matrices of the two graphs are
⎡ 0 0 0 0 0 ⎤      ⎡ 0 1 1 1 1 ⎤
⎢ 0 0 1 0 1 ⎥      ⎢ 1 0 0 0 0 ⎥
⎢ 0 1 0 1 0 ⎥ ,    ⎢ 1 0 0 0 0 ⎥ .
⎢ 0 0 1 0 1 ⎥      ⎢ 1 0 0 0 0 ⎥
⎣ 0 1 0 1 0 ⎦      ⎣ 1 0 0 0 0 ⎦
A direct calculation shows that the eigenvalues of both graphs are −2, 0, 0, 0, 2, and hence the two graphs are cospectral.
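The cospectrality claim in Exercise 7 can be verified directly (NumPy assumed; the adjacency matrices are the ones reconstructed above, a 4-cycle plus an isolated vertex and the star K_{1,4}).

```python
import numpy as np

# The 4-cycle plus an isolated vertex, and the star K_{1,4}.
A1 = np.array([[0, 0, 0, 0, 0],
               [0, 0, 1, 0, 1],
               [0, 1, 0, 1, 0],
               [0, 0, 1, 0, 1],
               [0, 1, 0, 1, 0]], dtype=float)
A2 = np.array([[0, 1, 1, 1, 1],
               [1, 0, 0, 0, 0],
               [1, 0, 0, 0, 0],
               [1, 0, 0, 0, 0],
               [1, 0, 0, 0, 0]], dtype=float)

print(np.round(np.linalg.eigvalsh(A1), 6))   # -2, 0, 0, 0, 2
print(np.round(np.linalg.eigvalsh(A2), 6))   # -2, 0, 0, 0, 2
```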
Chapter 6
Orthogonality and best approximation
6.1 Norms and inner products
1. Let V be a normed vector space over R, and suppose u, v ∈ V with v = αu, α ≥ 0. Then u + v = u + au = (1 + a)u = (1 + a) u = u +a u = u + au = u + v (note the repeated use of the second property of a norm from Definition 265). Thus, if v = au, a ≥ 0, then equality holds in the triangle inequality ( u + v = u + v ). 2. Let V be a vector space over R, and let ·, · be an inner product on V . We wish to show that if either u or v is zero, then u, v = 0. Without loss of generality, let us assume that v = 0. Then v = 0 · v, and u, v = u, 0 · v = 0 u, v = 0, as desired. 3. (a) We wish to prove that n
|xi | for all x ∈ Rn
x 1= i=1
defines a norm on R . Clearly x 1 ≥ 0 for all x ∈ Rn , and x 1 = 0 if and only if xi = 0 for all i = 1, . . . , n, that is x 1 = 0 if and only if x = 0. Thus the first property of a norm is satisfied. Let x ∈ Rn and α ∈ R. Then n
n
n
n
|αxi | =
αx 1 = i=1
|α||xi | = |α| i=1
|xi | = |α| x 1 , i=1
and we see that the second property is satisfied as well. Finally, the triangle inequality holds for the absolute value function on R: |α + β| ≤ |α| + |β| for all α, β ∈ R. Therefore, for any x, y ∈ Rn , n
n
|xi + yi | ≤
x+y 1 = i=1
n
(|yi | + |xi |) = i=1
n
|xi | + i=1
|yi | i=1
= ‖x‖1 + ‖y‖1, and hence the triangle inequality holds for ‖·‖1. This completes the proof that ‖·‖1 defines a norm on R^n.
(b) Now we wish to prove that
x ∞ = max{|xi | : i = 1, 2, . . . , n} for all x ∈ Rn defines a norm on Rn . Obviously x ∞ ≥ 0 for all x ∈ Rn , and x ∞ = 0 if and only if |xi | = 0 for all i = 1, . . . , n, that is, if and only if x = 0. Thus the first property of a norm is satisfied. If x ∈ Rn and α ∈ R, then αx ∞ = max{|αxi | : i = 1, 2, . . . , n} = max{|α||xi | : i = 1, 2, . . . , n} = |α| max{|xi | : i = 1, 2, . . . , n} = |α| x ∞ . Thus the second property is satisfied as well. Now suppose x, y ∈ Rn . Then x + y ∞ = max{|xi + yi | : i = 1, 2, . . . , n} ≤ max{|xi | + |yi | : i = 1, 2, . . . , n} ≤ max{|xi | : i = 1, 2, . . . , n}+ max{|yi | : i = 1, 2, . . . , n} = x ∞ + y ∞. The crucial step is the third; if k is the index such that |xk + yk | = max{|xi | + |yi | : i = 1, 2, . . . , n}, then we have |xk | ≤ max{|xi | : i = 1, 2, . . . , n}, |yk | ≤ max{|yi | : i = 1, 2, . . . , n}, and hence max{|xi | + |yi | : i = 1, 2, . . . , n} ≤ max{|xi | : i = 1, 2, . . . , n} + max{|yi | : i = 1, 2, . . . , n}. Thus
·
∞ satisfies the triangle inequality. This completes the proof.
4. (a) Obviously x ∞ ≤ x 2 for all x ∈ Rn . Notice that x
2 1 =
n
2 |xi |
n
=
i=1
n
|xi |2 + 2
|xi ||xj | ≥ j =i
i=1
|xi |2 = x 22 .
i=1
This shows that x 2 ≤ x 1 for all x ∈ Rn . √ (b) We wish to show that x 1 ≤ n x 2 for all x ∈ Rn . Define v ∈ Rn by vi = 1 if xi ≥ 0 and vi = −1 if xi < 0. Then n
n
|xi | =
x 1= i=1
vi xi = v · x ≤ v 2 x 2 i=1
√ where the last step follows from the Cauchy-Schwarz inequality. Since v 2 = n, the result follows. √ (c) Finally, we wish to show that x 2 ≤ n x ∞ for all x ∈ Rn . Let x ∈ Rn be given. We have |xi | ≤ x ∞ for i = 1, . . . , n, and hence x 22 =
n i=1
|xi |2 ≤
n
x 2∞ = n x 2∞ .
i=1
The result follows by taking the square root of both sides.
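The norm inequalities of Exercise 4 can be spot-checked numerically; the sketch below (NumPy assumed; not part of the text) evaluates the three norms of a random vector and tests the chain ‖x‖∞ ≤ ‖x‖2 ≤ ‖x‖1, ‖x‖1 ≤ √n ‖x‖2, ‖x‖2 ≤ √n ‖x‖∞.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10)
n = x.size

one = np.linalg.norm(x, 1)
two = np.linalg.norm(x, 2)
inf = np.linalg.norm(x, np.inf)

print(inf <= two <= one)          # Exercise 4(a)
print(one <= np.sqrt(n) * two)    # Exercise 4(b)
print(two <= np.sqrt(n) * inf)    # Exercise 4(c)
```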
5. We wish to derive relationships among the L1 (a, b), L2 (a, b), and L∞ (a, b) norms. Three such relationships exist: f 1 ≤ (b − a) f ∞ for all f ∈ L∞ (a, b), √ f 2 ≤ b − a f ∞ for all f ∈ L∞ (a, b), √ f 1 ≤ b − a f 2 for all f ∈ L2 (a, b). The first two are simple; we have |f (x)| ≤ f ∞ for (almost) all x ∈ [a, b], and hence b f 1= f
2 2 =
a b
|f (x)| dx ≤
b
f ∞ dx = (b − a) f ∞ ,
a
2
|f (x)| dx ≤
a
b a
f 2∞ dx = (b − a) f 2∞ .
To prove the third relationship, define v(x) = 1 for all x ∈ [a, b]. Then b
|f (x)| dx =
a
b a
|v(x)||f (x)| dx = |v|, |f | 2 ≤ v 2 f 2 ,
where the last step follows from the Cauchy-Schwarz inequality. Since v 2 = follows.
√
b − 1, the desired result
Note that it is not possible to bound f ∞ by a multiple of either f 1 or f 2 , nor to bound f 2 by a multiple of f 1 . 6. Let
·
be defined by x = |x1 | + |x2 | + · · · + |xn−1 | for all x ∈ Rn .
Then · is not a norm on Rn since en = 0 but en = 0 (where en is the nth standard basis vector in Rn ). Thus · does not satisfy the first property of a norm. 7. Let
·
be defined by n
x =
x2i for all x ∈ Rn .
i=1
Then, if x = 0,
n
2x = i=1
Thus
·
(2xi )2 =
n i=1
n
4x2i = 4
x2i = 4 x = 2 x .
i=1
does not satisfy the second property of a norm.
8. The path from x = (0, 0) to y = (1, 0) and then to z = (1, 1) has length 2 in the 1 norm (that is, y − x 1 = 1, z − y 1 = 1). Also, the path from x to v = (0, 1) and then to z also has length 2 ( v − x 1 = 1, z − v 1 = 1). Therefore, the shortest path from x to z is not unique when distance is measured in the 1 norm. 9. The following graph shows the (boundaries of the) unit balls in the 2 norm (solid curve), the 1 norm (dotted curve), and the ∞ norm (dashed curve).
Notice that 1 unit ball is contained in the other two, which is consistent with x 1 ≥ x 2 , x ∞ , while the 2 unit ball is contained in the ∞ unit ball, consistent with x ∞ ≤ x 2 . 10. The H 1 (a, b) inner product is defined by f, g H 1 =
b
{f (x)g(x) + f (x)g (x)} dx.
a
(a) We wish to prove that the H 1 inner product is an inner product on C 1 [a, b]. We have f, g H 1 =
b
{f (x)g(x) + f (x)g (x)} dx =
b
a
a
{g(x)f (x) + g (x)f (x)} dx = g, f H 1 ,
and therefore the inner product is symmetric. Also, αf + βg, h H 1 =
b
{(αf (x) + βg(x))h(x) + (αf (x) + βg (x))h (x)} dx
a
b =
{αf (x)h(x) + βg(x)h(x) + αf (x)h (x) + βg (x)h (x)} dx
a
b =
{α(f (x)h(x) + f (x)h (x)) + β(g(x)h(x) + g (x)h (x))} dx
a
b
=α
{f (x)h(x) + f (x)h (x)} dx + β
a
b
{g(x)h(x) + g (x)h (x)} dx
a
= α f, h H 1 + β g, h H 1 . Thus the inner product is bilinear. Next, f, f H 1 =
b
,
f (x)2 + f (x)2 dx ≥ 0,
a
and f, f H 1 = 0 implies that f (as well as f ) is equal to the zero function (this holds because f is continuous—if f were nonzero at any point, then by continuity f (x) would be nonzero for all x in a small interval around that point, which would imply the integral of f (x)2 would be positive). This proves that the third property of an inner product holds, and the proof is complete. (b) Suppose is a small positive number and ω is a large positive number, say ω = nπ, where n is a large positive integer. Then f L2 = √ , 2 . 2 2 f H1 = √ n π + 1. 2 Therefore, f L2 is small for all n, but f H 1 is large if n is large enough. 11. Suppose V is an inner product space and · is the norm defined by the inner product ·, · on V . Then, for all u, v ∈ V , u + v 2 + u − v 2 = u + v, u + v + u − v, u − v = u, u + 2 u, v + v, v + u, u − 2 u, v + v, v = 2 u, u + 2 v, v = 2 u 2 + 2 v 2. Thus the parallelogram law holds in V .
If u = (1, 1) and v = (1, −1) in R2 , then a direct calculation shows that u + v 21 + u − v 21 = 8, 2 u 21 + 2 v 21 = 16, ·
and hence the parallelogram law does not hold for product. For the same u and v, we have
1 . Therefore,
·
1 cannot be defined by an inner
u + v 2∞ + u − v 2∞ = 8, 2 u 2∞ + 2 v 2∞ = 4, and hence the parallelogram law does not hold for · ∞ . Therefore, · ∞ cannot be defined by an inner product. 12. Suppose A ∈ Rn×n is a nonsingular matrix, and let · be a norm on Rn . We wish to show that x A = Ax defines another norm on Rn . For all x, we have x A = Ax ≥ 0, and x A=0 ⇒
Ax = 0 ⇒ Ax = 0 ⇒ x = 0,
where the last step follows from the fact that A is nonsingular. Therefore, · A satisfies the first property of a norm. Next, for all α ∈ R and x ∈ Rn , we have αx A = A(αx) = αAx = |α| Ax = |α| x A . Thus the second property of a norm is satisfied. Finally, if x, y ∈ Rn , then x + y A = A(x + y) = Ax + Ay ≤ Ax + Ay = x A + y A , and thus
·
A satisfies the triangle inequality. This completes the proof that
·
A is a norm.
13. Let λ1 , λ2 , . . . , λn be positive real numbers. We wish to prove that n
x, y =
λi xi yi i=1
defines an inner product on Rn . First, for any x, y ∈ Rn , n
x, y =
n
λi yi xi = y, x .
λi xi yi = i=1
i=1
Thus the first property of an inner product is satisfied. Next, if α, β ∈ R and x, y, z ∈ Rn , then n
n
αx + βy, z =
λi (αxi + βyi )zi = i=1
(αλi xi zi + βλi yi zi ) i=1 n
=α
n
λi xi zi + β i=1
λi yi zi i=1
= α x, z + β y, z . Therefore, ·, · satisfies the second property of an inner product. Finally, n
x, x =
λi x2i ≥ 0 (since λi > 0 for all i).
i=1
Also, since λi x2i ≥ 0 and λi > 0 for each i, x, x = 0 implies that x2i = 0 for all i, and hence that x = 0. This completes the proof that ·, · is an inner product on Rn .
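A weighted inner product of this kind is easy to implement; the sketch below (NumPy assumed, with the weights and vectors chosen only for illustration) defines it and spot-checks symmetry and positivity for one choice of positive weights.

```python
import numpy as np

def weighted_inner(x, y, lam):
    """<x, y> = sum_i lam_i * x_i * y_i, where every lam_i > 0."""
    return float(np.sum(lam * x * y))

lam = np.array([1.0, 2.0, 0.5])
x = np.array([1.0, -1.0, 2.0])
y = np.array([0.0, 3.0, 1.0])

print(weighted_inner(x, y, lam) == weighted_inner(y, x, lam))   # symmetry
print(weighted_inner(x, x, lam) > 0)                            # positivity for x != 0
```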
14. Let ·, · be defined as in the preceding exercise, but suppose λj ≤ 0 for some j, 1 ≤ j ≤ n. Then, if ej is the jth standard basis vector, we have ej = 0 and ej , ej = λj ≤ 0, contradicting the third property of an inner product. Thus ·, · is not an inner product in this case. 15. Let U and V be vector spaces over R with norms of the following is a norm on U × V .
· U and
· V , respectively. We will prove that each
(a) Define (u, v) = u U + v V for all (u, v) ∈ U × V . Obviously (u, v) ≥ 0 for all (u, v) ∈ U × V , and, if (u, v) = 0, then u U = v V = 0, which in turn implies that u = 0, v = 0, that is, (u, v) = (0, 0). Therefore, the first property of a norm is satisfied. Next, if α ∈ R and (u, v) ∈ U × V , then α(u, v) = (αu, αv) = αu U + αv V = |α| u U + |α| v V = |α| ( u U + v V ) = |α| (u, v) , and therefore the second property is satisfied. Finally, if (u, v), (w, z) belong to U × V , then (u, v) + (w, z) = (u + w, v + z) = u+w U + v+z V ≤ ( u U + w U) + ( v V + z V ) ≤ ( u U + v V)+( w U + z V) = (u, v) + (w, z) . Thus the triangle inequality is satisfied, and the proof is complete. . u 2U + v 2V for all (u, v) ∈ U ×V . Obviously (u, v) ≥ 0 for all (u, v) ∈ U ×V , (b) Define (u, v) = and, if (u, v) = 0, then u U = v V = 0, which in turn implies that u = 0, v = 0, that is, (u, v) = (0, 0). Therefore, the first property of a norm is satisfied. Next, if α ∈ R and (u, v) ∈ U × V , then / α(u, v) = (αu, αv) = αu 2U + αv 2V / = α2 ( u 2U + v 2V ) / = |α| u 2U + v 2V = |α| (u, v) , and therefore the second property is satisfied. Finally, if (u, v) and (w, z) belong to U × V , then (u, v) + (w, z) 2 = (u + w, v + z) = u + w 2U + v + z 2V ≤ ( u U + w U )2 + ( v V + z V )2 = u 2U + w 2U + 2 u U w U + v 2V + z 2V + 2 v V z V = u 2U + v 2V + w 2U + z 2V + 2( u U w U + v V z V ).
Now, if we apply the Cauchy-Schwarz inequality to the R2 vectors ( u U , v V ) and ( w U , z V ), we obtain / / u U w U+ v V z V ≤ u 2U + v 2V w 2U + z 2V = (u, v) Therefore, (u, v) + (w, z) 2 ≤
(w, z) .
u 2U + v 2V + w 2U + z 2V + 2 (u, v) (w, z)
= (u, v) 2 + (w, v) 2 + 2 (u, v)
(w, z)
2
= ( (u, v) + (w, z) ) , and the triangle inequality is satisfied. This completes the proof. (c) Define (u, v) = max{ u U , v V } for all (u, v) ∈ U × V . Obviously (u, v) ≥ 0 for all (u, v) ∈ U × V , and, if (u, v) = 0, then u U = v V = 0, which in turn implies that u = 0, v = 0, that is, (u, v) = (0, 0). Therefore, the first property of a norm is satisfied. Next, if α ∈ R and (u, v) ∈ U × V , then α(u, v) = (αu, αv) = max{ αu U , αv V } = max{|α| u U , |α| v V } = |α| max{ u U , v V } = |α| (u, v) , and therefore the second property is satisfied. Finally, if (u, v), (w, z) belong to U × V , then (u, v) + (w, z) = (u + w, v + z) = max{ u + w U , v + z V } ≤ max{ u U + w U , v V + z V }. Now, we have u U , v V ≤ max{ u U , v V } = (u, v) and w U , z V ≤ max{ w U , z V } = (w, z) . It follows that u U + w U ≤ (u, v) + (w, z) , v V + z V ≤ (u, v) + (w, z) , which implies that (u, v) + (w, z) ≤ max{ u U + w U , v V + z V } ≤ (u, v) + (w, z) . Thus the triangle inequality is satisfied, and the proof is complete. 16. Let U and V be vector spaces over R with inner products ·, · U and ·, · V , respectively. We wish to show that (u, v), (w, z) = u, w U + v, z V for all (u, v), (w, z) ∈ U × V defines an inner product on U × V . For any (u, v), (w, z) in U × V , (u, v), (w, z) = u, w U + v, z V = w, u U + z, v V = (w, z), (u, v) ,
and the first property of an inner product is satisfied. If (u, v), (w, z), (x, y) belong to U ×V and α, β ∈ R, then (α(u, v) + β(w, z), (x, y) = (αu + βw, αv + βz), (x, y) = αu + βw, x U + αv + βz, y V = α u, x U + β w, x U + α v, y V + β z, y V = α( u, x U + v, y V ) + β( w, x U + z, y V ) = α (u, v), (x, y) + β (w, z), (x, y) . This shows that the second property of an inner product is satisfied. Finally, if (u, v) ∈ U × V , then (u, v), (u, v) = u, u U + v, v ≥ 0, since u, u U , v, v V ≥ 0. Also, if (u, v), (u, v) = 0, then both u, u U and v, v V must be zero, and hence u = 0, v = 0. But this implies that (u, v) = (0, 0), and hence the third property of an inner product is satisfied. This completes the proof.
6.2 The adjoint of a linear operator
1. Let X and U be finite-dimensional inner product spaces over R, and let T : X → U be linear. We wish ∗ to prove that (T ∗ ) = T . We write S = T ∗ ; then, by the definition, S ∗ : X → U is the unique linear operator satisfying x, S(u) X = S ∗ (x), u U for all x ∈ X, u ∈ U. By definition of T ∗ , we have
x, T ∗ (u) X = T (x), u U for all x ∈ X, u ∈ U. Since S = T ∗ , we see that T satisfies the defining property of S ∗ , and hence T = S ∗ , that is, T = (T ∗ )∗ . 2. Suppose X, U , and W are inner product spaces over R and T : X → U and S : U → W are linear. We wish to show that (ST )∗ = T ∗ S ∗ . For all x ∈ X, w ∈ W , we have (ST )(x), w W = S(T (x)), w W = T (x), S ∗ (w) U = x, T ∗ (S ∗ (w)) X
= x, (T ∗ S ∗ )(w)) X .
But, by definition, (ST )∗ is the unique linear operator from W into X satisfying (ST )(x), w W = x, (ST )∗ (w) X for all x ∈ X, w ∈ W. This shows that (ST )∗ = T ∗ S ∗ . 3. Let A ∈ Rm×n and B ∈ Rn×p . We wish to prove that (AB)T = B T AT . This follows from the previous exercise if we define T : Rp → Rn by T (x) = Bx for all x ∈ Rp and S : Rn → Rm by S(y) = Ay for all y ∈ Rn . Then S(y) · z = (Ax) · z = x · (AT z) for all y ∈ Rn , z ∈ Rm , which shows that S ∗ is defined by S ∗ (z) = AT z. Similarly, T ∗ is defined by T ∗ (x) = B T x, and the result follows. We can also give a direct proof. We have (AB)T ij = (AB)ji =
Σ_{k=1}^{n} A_{jk} B_{ki} = Σ_{k=1}^{n} (B^T)_{ik} (A^T)_{kj} = (B^T A^T)_{ij}.
This holds for all i = 1, . . . , m and j = 1, . . . , p, and hence (AB)T = B T AT .
4. Let X and U be finite-dimensional inner product spaces over R and assume that T : X → U is an invertible linear operator. We wish to shows that T ∗ is also invertible, and that ∗ (T ∗ )−1 = T −1 . Since T is invertible, X and U must have the same dimension, and therefore it suffices to prove that T ∗ ((T −1 )∗ (x)) = x for all x ∈ X (cf. Exercise 3.6.10). For all x, y ∈ X, we have 1 0 1 0 1 0 ∗ T ((T −1 )∗ (x)), y X = (T −1 )∗ (x), T (y) X = x, T −1 (T (y)) X = x, y X . By Corollary 275, this implies that T ∗ ((T −1 )∗ (x)) = x for all x ∈ X, which shows that T ∗ is invertible and that (T ∗ )−1 = (T −1 )∗ . 5. Suppose A ∈ Rn×n is invertible. We wish to prove that AT is also invertible and that (AT )−1 = (A−1 )T . This follows from the previous exercise, but we can also give a direct proof. We have T −1 T A (A ) ij =
Σ_{k=1}^{n} (A^T)_{ik} ((A^{−1})^T)_{kj} = Σ_{k=1}^{n} A_{ki} (A^{−1})_{jk} = Σ_{k=1}^{n} (A^{−1})_{jk} A_{ki}
= A−1 A ji = Iji = Iij . Since this holds for all i, j = 1, . . . , n, we see that AT (A−1 )T = I. By Theorem 118, this shows that AT is invertible and that (AT )−1 = (A−1 )T . 6. Suppose A, B belong to Rm×n and y · Ax = y · Bx for all x ∈ Rn , y ∈ Rm . We wish to prove that A = B. Applying Corollary 275, the hypothesis implies that Ax = Bx for all x ∈ Rn . But then Aej = Bej for each standard basis vector ej in Rn , which yields Aj = Bj for j = 1, . . . , n. Since A and B have the same columns, it follows that A = B. 7. Let V be an inner product space over R, and let x ∈ V . If x = 0, then, for any y ∈ V , x, y = 0 · x, y = 0 x, y = 0. On the other hand, if x, y = 0 for all y ∈ V , then, in particular, x, x = 0. By the definition of inner product, this implies that x = 0. Thus we see that x, y = 0 for all y ∈ V if and only if x = 0. Now suppose x, y ∈ V . Then, from above, x − y, v = 0 for all v ∈ V if and only if x − y = 0. Equivalently, x, v = y, v for all v ∈ V if and only if x = y. 8. Let D : P2 → P1 be defined by D(p) = p . We wish to find D∗ , assuming that the L2 (0, 1) inner product is imposed on both P1 and P2 . We will follow the technique of Example 277, which uses the proof of Theorem 276. We write S1 = {p1 , p2 } = {1, x} and S2 = {q1 , q2 , q3 } = {1, x, x2 } for the standard bases for P1 and P2 , respectively. The Gram matrix is then ⎡ ⎤ 1 12 13 ⎢ ⎥ G = ⎣ 21 13 14 ⎦ . 1 3
1 4
1 5
The matrix M ∈ R3×2, where Mij = D(pi), qj P1, is
M = ⎡ 0  0   ⎤
    ⎢ 1  1/2 ⎥
    ⎣ 1  2/3 ⎦ .
Then, as shown in the proof of Theorem 276,
[D∗]_{S1,S2} = G^{−1}M = ⎡ −6    2  ⎤
                         ⎢ 12  −24  ⎥
                         ⎣  0   30  ⎦ ,
which shows that D∗ maps a0 + a1x to (−6a0 + 2a1) + (12a0 − 24a1)x + 30a1x^2.
9. Let M : P2 → P3 be defined by M(p) = q, where q(x) = xp(x). We wish to find M∗, assuming that the L^2(0, 1) inner product is imposed on both P2 and P3. Following the technique of Example 277 (and the previous exercise), we compute the matrix N, defined by Nij = ⟨M(pi), qj⟩_{P3}, where S1 = {p1, p2, p3} = {1, x, x^2} and S2 = {q1, q2, q3, q4} = {1, x, x^2, x^3}:
N = ⎡ 1/2  1/3  1/4  1/5 ⎤
    ⎢ 1/3  1/4  1/5  1/6 ⎥
    ⎣ 1/4  1/5  1/6  1/7 ⎦ .
The Gram matrix is the same as in the previous exercise, and we obtain
[M∗]_{S2,S1} = G^{−1}N = ⎡ 0  0  1/20   3/35  ⎤
                         ⎢ 1  0  −3/5  −32/35 ⎥
                         ⎣ 0  1  3/2    12/7  ⎦ .
This shows that M∗ maps a0 + a1x + a2x^2 + a3x^3 to
(1/20)a2 + (3/35)a3 + (a0 − (3/5)a2 − (32/35)a3)x + (a1 + (3/2)a2 + (12/7)a3)x^2.
10. Suppose A ∈ R^{n×n} is a symmetric positive definite matrix. We wish to prove that ⟨x, y⟩_A = x · Ay for all x, y ∈ R^n defines an inner product on R^n. First, for any x, y ∈ R^n,
⟨x, y⟩_A = x · Ay = (A^T x) · y = Ax · y = y · Ax = ⟨y, x⟩_A
(note the use of the symmetry of A). Thus ⟨·, ·⟩_A is symmetric. If x, y, z belong to R^n and α, β are real numbers, then
⟨αx + βy, z⟩_A = (αx + βy) · Az = α(x · Az) + β(y · Az) = α⟨x, z⟩_A + β⟨y, z⟩_A,
which shows that ⟨·, ·⟩_A is bilinear. Finally, since A is positive definite, we have ⟨x, x⟩_A = x · Ax > 0 for all x ≠ 0, and ⟨x, x⟩_A is obviously 0 if x is the zero vector. Therefore, the third property of an inner product is satisfied, and the proof is complete.
(a) First, suppose x1 , x2 ∈ R(T ∗ ) satisfy S(x1 ) = S(x2 ). Since x1 , x2 ∈ R(T ∗ ), there exist u1 , u2 ∈ U such that x1 = T ∗ (u1 ), x2 = T ∗ (u2 ). Then we have S(x1 ) = S(x2 ) ⇒ T (x1 ) = T (x2 ) ⇒ T (T ∗ (u1 )) = T (T ∗(u2 )) ⇒ T (T ∗ (u1 − u2 )) = 0 ⇒ u1 − u2 , T (T ∗ (u1 − u2 )) U = 0
⇒ T ∗ (u1 − u2 ), T ∗ (u1 − u2 ) X = 0 ⇒ T ∗ (u1 − u2 ) = 0
⇒ T ∗ (u1 ) = T ∗ (u2 ) ⇒ x1 = x2 . Therefore, S is injective. (b) Since S injective, Theorem 93 implies that dim(R(T )) ≥ dim(R(T ∗ )). This results holds for any linear operator mapping one finite-dimensional inner product space to another, and hence it applies to the operator T ∗ . Hence dim(R(T ∗ )) ≥ dim(R((T ∗ )∗ )). Since (T ∗ )∗ = T , it follows that dim(R(T ∗ )) ≥ dim(R(T )), and therefore that dim(R(T ∗ )) ≥ dim(R(T )) (that is, rank(T ∗ ) = rank(T )). (c) Now we see that S is an injective linear operator mapping one finite-dimensional vector space to another of the same dimension. It follows from Corollary 105 that S is also surjective and hence is an isomorphism. 12. Let X and U be finite-dimensional inner product spaces over R, and let T : X → U be linear. We wish to prove that (a) T is injective if and only if T ∗ is surjective; (b) T is surjective if and only if T ∗ is injective. Using the fundamental theorem of linear algebra and the result of the previous exercise, we have T is injective ⇔ nullity(T ) = 0 ⇔ rank(T ) = dim(X) ⇔ rank(T ∗ ) = dim(X) ⇔ T ∗ is surjective. Thus T is injective if and only if T ∗ is surjective. Moreover, since this applies to every such linear operator, it applies to T ∗ , and hence T ∗ is injective if and only if T is surjective. 13. Let X and U be finite-dimensional inner product spaces over R, let X and U be bases for X and U , respectively, and let T : X → U be linear. We wish to find a formula for [T ∗ ]U ,X in terms of [T ]X ,U and the Gram matrices for X and U. Assume X = {x1 , . . . , xn }, U = {u1 , . . . , um }, and write GX for the gram matrix for X and similarly for GU . Let us choose arbitrary x ∈ X, u ∈ U , and write α = [x]X , β = [u]U , γ = [T ∗ (u)]X , δ = [T (x)]U . Then
2m T (x), u U =
δi u i , i=1 m
3
m
βj u j j=1
m
δi βj ui , uj U
= i=1 j=1
= δ · GU β.
U
We have [T ]X ,U [x]X = [T (x)]U , and hence δ = [T ]X ,U α. Therefore, T (x), u U = ([T ]X ,U α) · GU β = α · [T ]TX ,U GU β. Similarly, 2 n
∗
3
n
x, T (u) X =
αi xi , i=1
γj xj j=1
X
= α · GX γ = α · (GX [T ∗ ]U ,X β) . Therefore,
α · [T ]TX ,U GU β = α · (GX [T ∗ ]U ,X β) for all α ∈ Rn , β ∈ Rm .
This shows that GX [T ∗ ]U ,X = [T ]TX ,U GU , that is, T [T ∗ ]U ,X = G−1 X [T ]X ,U GU .
14. Let f : X → R be linear, where X is a finite-dimensional inner product space over R. We wish to prove that there exists a unique u ∈ X such that f (x) = x, u for all x ∈ X. Let us first assume that such a u exists. Let X = {x1 , . . . , xn } be a basis for X, and write u as n u = j=1 βj xj . Then 3 2 n x, u = f (x) for all x ∈ X ⇒
j=1
2 ⇒
= f (x) for all x ∈ X
βj xj
x,
3
n
xi ,
βj xj
= f (xi ) for all i = 1, . . . , n
j=1 n
xi , xj βj = f (xi ) for all i = 1, . . . , n
⇒ j=1
⇒ Gβ = v, where v ∈ Rn is defined by vi = f (xi ), i = 1, . . . , n and G is the Gram matrix for X . Therefore, if such a u exists, it is defined by u = nj=1 βj xj , where β = G−1 v. (In particular, this proves the uniqueness of u.) Now suppose β = G−1 v, where v is defined as above, and define u = n i=1 αi xi ∈ X, 3 2 n n x, u =
αi xi , i=1 n n
βj xj j=1
αi βj xi , xj
= i=1 j=1
= α · Gβ = α·v n
αi f (xi ) = f
= i=1
Therefore, u satisfies the desired condition.
n αi xi i=1
= f (x).
n j=1 βj xj .
Then, for any x =
15. Let f : P2 → R be defined by f(p) = p′(0). We wish to find q ∈ P2 such that f(p) = ⟨p, q⟩_2, where ⟨·, ·⟩_2 is the L^2(0, 1) inner product. We compute q using the method described in the previous exercise. Let S = {p1, p2, p3} = {1, x, x^2} be the standard basis for P2. Then the Gram matrix for S is
G = ⎡ 1    1/2  1/3 ⎤
    ⎢ 1/2  1/3  1/4 ⎥
    ⎣ 1/3  1/4  1/5 ⎦ .
The vector v = (f (p1 ), f (p2 ), f (p3 )) is v = (0, 1, 0), and β = G−1 v = (−36, 192, −180). Thus q(x) = −36 + 192x − 180x2 .
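The representer q in Exercise 15 can be recomputed with a few lines of symbolic code (SymPy assumed; not part of the text): build the Gram matrix of {1, x, x^2} under the L^2(0, 1) inner product, form the vector of functional values, and solve.

```python
import sympy as sp

x = sp.symbols('x')
basis = [sp.Integer(1), x, x**2]

# Gram matrix G_ij = <p_i, p_j> in L^2(0, 1).
G = sp.Matrix(3, 3, lambda i, j: sp.integrate(basis[i] * basis[j], (x, 0, 1)))
v = sp.Matrix([sp.diff(p, x).subs(x, 0) for p in basis])   # f(p) = p'(0)

beta = G.solve(v)
q = sum(b * p for b, p in zip(beta, basis))
print(sp.expand(q))   # -36 + 192*x - 180*x**2
```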
6.3 Orthogonal vectors and bases
1. Let x, y be two vectors in R^2. By the law of cosines,
‖x − y‖2^2 = ‖x‖2^2 + ‖y‖2^2 − 2‖x‖2 ‖y‖2 cos(θ)
(see Figure 6.2 in the text). Since ‖·‖2 is defined by the dot product, we also have
x − y 22 = (x − y) · (x − y) = x · x − 2x · y + y · y = x 22 + y 22 − 2x · y. Comparing these two expressions for x − y 22 , we see that −2x · y = −2 x 2 y 2 cos (θ), or x · y = x 2 y 2 cos (θ), which is what we wanted to prove. 2. The equation α1 u1 + α2 u2 + α3 u3 = v is equivalent to Aα = v, where ⎤ ⎡ 1 1 1 0 −2 ⎦ . A=⎣ 1 1 −1 1 A direct calculation, using Gaussian elimination, shows that α = (1, 1, −1), that is, v = u1 + u2 − u3 . 3. The equation α1 p1 + α2 p2 + α3 p3 = q is equivalent to 1 1 α1 − α2 + α3 + (α2 − α3 )x + α3 x2 = 3 + 2x + x2 , 2 6 which in turn is equivalent to the system 1 1 α1 − α2 + α3 = 3, 2 6 α2 − α3 = 2, α3 = 1. The solution is α = (13/3, 3/1), and thus q(x) =
(13/3) p1(x) + 3 p2(x) + p3(x).
4. We use the trigonometric identity sin (α) sin (β) =
(1/2)[cos(α − β) − cos(α + β)],
which yields
1 sin (mπx) sin (kπx) dx 0
1 = 2
1 0
{cos ((m − k)πx) − cos ((m + k)πx)} dx
%1 1 sin ((m − k)πx) sin ((m + k)πx) %% − = % 2 (m − k)π (m + k)π 0 1 sin ((m − k)π) sin ((m + k)π) − =0 = 2 (m − k)π (m + k)π since sin(θ) = 0 for every integer multiple of π. This shows that {sin (πx), sin (2πx), . . . , sin (nπx)} is an orthogonal subset of L2 (0, 1). 5. Consider the following vectors in R4 : u1 = (1, 1, 1, 1), u2 = (1, −1, 1, −1), u3 = (1, 0, −1, 0), u4 = (0, −1, 0, 1). A direct calculation shows that ui · uj = 0 for i = j, with u1 · u1 = 4, u2 · u2 = 4, u3 · u3 = 2, u4 · u4 = 2. Therefore, v · u1 v · u2 v · u3 v · u4 u1 + u2 + u3 + u4 u1 · u1 u2 · u2 u3 · u3 u4 · u4 10 −2 −2 2 u1 + u2 + u3 + u4 = 4 4 2 2 5 1 = u1 − u2 − u3 + u4 . 2 2
v=
6. Consider the following quadratic polynomials: p1 (x) = 1+x+x2 , p2 (x) = 65−157x+65x2, p3 (x) = −21+ 144x − 150x2 . A direct calculation shows that pi , pj 2 = 0 if i = j, while pi , pi 2 is 37/10, 1591/2, 129 for i = 1, 2, 3, respectively (here ·, · 2 is the L2 (0, 1) inner product). For each standard basis function qi (x) = xi , i = 0, 1, 2, we have qi (x) =
qi , p1 2 qi , p2 2 qi , p3 2 p1 (x) + p2 (x) + p3 (x). p1 , p1 2 p2 , p2 2 p3 , p3 2
In particular, 55 49 1 p1 (x) + p2 (x) + p3 (x), 111 4773 129 65 1 p1 (x) − p2 (x), x= 222 222 47 55 1 p1 (x) − p2 (x) − p3 (x). x2 = 222 9546 129 1=
7. Consider the functions ex and e−x to be elements of C[0, 1], and regard C[0, 1] as an inner product space under the L2 (0, 1) inner product. Define S = sp{ex , e−x }. We wish to find an orthogonal basis {f1 , f2 } for S. We will take f1 (x) = ex . The function f2 must be of the form f (x) = c1 ex + c2 e−x and satisfy 1 f (x)ex dx = 0. 0
This last condition leads to the equation 1 1 1 c1 e2x dx + c2 1 dx = 0 ⇔ (e2 − 1)c1 + c2 = 0. 2 0 0 One solution is c1 = 2, c2 = 1 − e2 . Thus if f2 (x) = 2ex + (1 − e2 )e−x , then {f1 , f2 } is an orthonal basis for S. 8. Consider the following subspace of R4 : S = sp{u1 , u2 }, where u1 = (1, 1, 1, −1), u2 = (−2, −2, −1, 2). We wish to find an orthogonal basis for S. We will choose v1 = u1 and look for v2 in the form v2 = c1 u1 +c2 u2 . Then v1 · v2 = 0 ⇔ u1 · (c1 u1 + c2 u2 ) = 0 ⇔ (u1 · u1 )c1 + (u1 · u2 )c2 = 0. One solution is c1 = −u1 · u2 = 7, c2 = u1 · u1 = 4, which yields v2 = 7u1 + 4u2 = (−1, −1, 3, 1). Then {v1 , v2 } is an orthogonal basis for S. 9. Notice that {x1 , x2 , x3 }, where x1 = (1, 0, 1, 0), x2 = (1, 1, −1, 1), x3 = (−1, 2, 1, 0), is an orthogonal subset of R4 . We wish to find a fourth vector x4 ∈ R4 such that {x1 , x2 , x3 , x4 } is orthogonal. We need a nonzero solution to the system x1 · x = 0, x2 · x = 0, x3 · x = 0, which is equivalent to Ax = 0, where ⎡
1 0 A=⎣ 1 1 −1 2
⎤ 1 0 −1 1 ⎦ . 1 0
Thus any nonzero element in the null space of A will do; one such vector is x4 = (−1, −1, 1, 3). 10. Let {x1 , . . . , xk }, where 1 ≤ k < n, be an orthogonal set in Rn . Let A ∈ Rk×n be the matrix whose rows are the vectors x1 , . . . , xk . Any nonzero solution of Ax = 0 is orthogonal to each of the rows of A, that is, to each xi , i = 1, . . . , k. Notice that N (A) is nontrivial, since A has more columns than rows. 11. If U = {u1 , u2 , . . . , un } is an orthogonal basis for an inner product space, then the Gram matrix of U is diagonal, with diagonal entries ui , ui , i = 1, . . . , n. If U is orthonormal, then the Gram matrix of U is the n × n identity matrix. 12. Let {x1 , x2 , . . . , xn } be an orthonormal set in Rn , and define X = [x1 | · · · |xn ]. We have 1 , i = j, (X T X)ij = xi · xj = 0 , i = j, which implies that X T X = I. Since X is square, Theorem 118 implies that X is invertible and X T = X −1 . Then it also follows that XX T = I, which means that the columns of X T , that is, the rows of X, form an orthonormal set. 13. Let V be an inner product space over R, and let {u1 , . . . , uk } be an orthogonal subset of V . We wish to prove that v ∈ V belongs to sp{u1 , . . . , uk } if and only if k
v=
v, uj uj . u j , uj j=1
The “if” direction is obvious; if v can be written in the above form, then it is a linear combination of u1 , . . . , uj (since v, uj / uj , uj is a scalar for ech j), and hence v ∈ sp{u1 , . . . , uk }. On the other hand, suppose v ∈ sp{u1 , . . . , uk }. Then Theorem 284 implies that k
v=
v, uj uj . u j , uj j=1
14. Let V be an inner product space over R, let {u1 , . . . , uk } be an orthogonal subset of V , and define S = sp{u1 , . . . , uk }. (a) Suppose v ∈ V and v ∈ S, and define k
w=v−
v, uj uj . uj , uj j=1
Then
k
ui , w = ui , v −
v, uj ui , uj . uj , uj j=1
Since {u1 , . . . , uk } is an orthogonal set, it follows that ui , uj is zero for j = i, and therefore ui , w = ui , v −
v, ui ui , ui = ui , v − ui , v = 0. ui , ui
Thus w is orthogonal to each basis vector of S. Now, for any u ∈ S, there exist α1 , . . . , αk ∈ R such that u = α1 u1 + · · · + αk uk . We then have 2 3 k k w, u =
w,
αi ui i=1
αi w, ui .
= i=1
Since w, ui = 0 for each i = 1, . . . , k, this shows that w, u = 0, and hence w is orthogonal to every vector in S. (b) Now we wish to show that, if v ∈ V , v ∈ S, then then 4 4 4 k 4 4 v, uj 4 4 u v >4 j4 . 4 4 j=1 uj , uj 4 Let w be the vector defined above. Since k
ṽ =
v, uj uj u j , uj j=1
belongs to S, we see that w and ṽ are orthogonal, and also that v = ṽ +w. But then the Pythagorean theorem yields v 2 = ṽ 2 + w 2 . Since w = 0 when v ∈ S, this implies that v 2 > ṽ 2 and hence that v > ṽ , which is what we wanted to prove. 15. Let V be an inner product space over R, and let u, v be vectors in V . (a) Assume u and v are nonzero. We wish to prove that v ∈ sp{u} if and only if | u, v | = u First suppose v ∈ sp{u}. Then, by Exercise 13, 4 4 4 v, u 4 | v, u | v, u | v, u | 4 v= u⇒ v =4 u . 4 u, u 4 = u 2 u = u, u u
v .
This yields u v = | v, u |, as desired. On the other hand, if v ∈ sp{u}, then, by the previous exercise, 4 4 4 v, u 4 | v, u | 4 u4 = , v >4 u, u 4 u and hence u v > | v, u |. Thus v ∈ sp{u} if and only if u v = | v, u |. (b) If u and v are nonzero, then the first part of the exercise shows that equality holds in the CauchySchwarz inequality if and only if v ∈ sp{u}, that is, if and only if v is a multiple of u. If v = 0 or u = 0, then equality trivially holds in the Cauchy-Schwarz inequality (both sides are zero). Thus we can say that equality holds in the Cauchy-Schwarz inequality if and only if u = 0 or v is a multiple of u. 16. Let V be an inner product space over R. We wish to determine when equality holds in the triangle inequality. For any u, v ∈ V , we have u + v = u + v ⇔ u + v 2 = ( u + v )2 ⇔ u 2 + 2 u, v + v 2 = u 2 + 2 u ⇔ u, v = u v .
v + v 2
By the previous exercise, it is necessary that u = 0 or v be a multiple of u, say v = αu. In this case, u, v = u
⇔ α u, v = |α| u 2 ⇔ α = |α|.
v
Thus we see that equality holds in the triangle inequality applied to u, v ∈ V if and only if u = 0 or v is a nonnegative multiple of u.
6.4 The projection theorem
1. Let A ∈ Rm×n . (a) We wish to prove that N (AT A) = N (A). First, if x ∈ N (A), then Ax = 0, which implies that AT Ax = 0, and hence that x ∈ N (AT A). Thus N (A) ⊂ N (AT A). Conversely, suppose x ∈ N (AT A). Then AT Ax = 0 ⇒ x · AT Ax = 0 ⇒ (Ax) · (Ax) = 0 ⇒ Ax = 0. Therefore, x ∈ N (A), and we have shown that N (AT A) ⊂ N (A). This completes the proof. (b) If A has full rank, then the null space of A is trivial (by the fundamental theorem of linear algebra) and hence so is the null space of N (AT A). Since AT A is square, this shows that AT A is invertible. (c) Thus, if A has full rank, then AT A is invertible and, for any y ∈ Rm there is a unique solution x = (AT A)−1 AT y of the normal equations AT Ax = AT y. Thus, by Theorem 291, there is a unique least-squares solution to Ax = y. 2. Let A ∈ Rm×n and y ∈ Rm be given, and assume that A has full rank. Then {A1 , A2 , . . . , An } is a basis for col(A), and, for all i, j = 1, . . . , n, m
m
T
Aki Akj = Ai · Aj .
T
(A A)ij =
(A )ik Akj = k=1
k=1
This shows that AT A is the Gram matrix for {A1 , A2 , . . . , An }. Also, n
(AT y)i =
n
Aki yk = Ai · y.
(AT )ik yk = k=1
k=1
Therefore, AT y is the vector called b in Theorem 289, and AT Ay = AT b is the system of equations from part 3 of that theorem.
3. Let S be the one-dimensional subspace of R3 spanned by the vector u = (1, 2, −1). We wish to find the best approximation from S to v = (1, 0, 1). The best approximation is projS v =
(v · u / u · u) u = 0
since v · u = 0 in this case. Thus v is orthogonal to S, and the vector in S closest to v is the zero vector. 4. Let x1 = (0, 1, 1) and x2 = (1, 1, 2), and define S = sp{x1 , x2 } ⊂ R3 . We wish to find the projection of v = (1, 1, 1) onto S. The solution is u = c1 x1 + c2 x2 , where Gc = b and 2 3 3 . , b= G= 4 3 6 This yields c = (0, 2/3), and therefore u = (2/3)x2 = (2/3, 2/3, 4/3). 5. Let V = C[0, 1], regarded as an inner product space under the L2 (0, 1) inner product. Define S = sp{φ1 , φ2 , φ3 } ⊂ V , where φ1 (x) = sin (πx), φ2 (x) = sin (2πx), and φ3 (x) = sin (3πx). We wish to find the best approximation to f (x) = x(1 − x) from S. We showed in Exercise 6.3.4(b) that {φ1 , φ2 , φ3 } is orthogonal. Therefore, the best approximation is f, φ2 2 f, φ3 2 f, φ1 2 φ1 (x) + φ2 (x) + φ3 (x) φ1 , φ1 2 φ2 , φ2 2 φ3 , φ3 2 3 8 = 3 sin (πx) + 0 · sin (2πx) + sin (3πx) π 27π 3 3 8 = 3 sin (πx) + sin (3πx). π 27π 3 6. Let V = C[0, 1], regarded as an inner product space under the L2 (0, 1) inner product. Define the following subspaces of V : S1
= sp{sin(πx), sin(2πx), sin(3πx)} and S2 = sp{1, cos(πx), cos(2πx)}.
Define f ∈ V by f(x) = x. We wish to compute the best approximation to f from both S1 and S2. The given bases for S1 and S2 are orthogonal (see the next exercise), and therefore the projections are computed as in the previous exercise. The best approximation from S1 is
(2/π) sin(πx) − (1/π) sin(2πx) + (2/(3π)) sin(3πx),
while the best approximation from S2 is
1/2 − (4/π^2) cos(πx).
The graph below shows f (the dotted line) together with the approximations from S1 (the solid curve) and S2 (the dashed curve).
Neither approximation is very good, but the second is noticeably better than the first.
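The two projections in Exercise 6 can be reproduced symbolically. The sketch below (SymPy assumed; not part of the text) projects f(x) = x onto the orthogonal basis of S2 using the usual formula for projection onto an orthogonal set.

```python
import sympy as sp

x = sp.symbols('x')
f = x
S2 = [sp.Integer(1), sp.cos(sp.pi * x), sp.cos(2 * sp.pi * x)]

def ip(g, h):
    """L^2(0, 1) inner product."""
    return sp.integrate(g * h, (x, 0, 1))

proj = sum(ip(f, phi) / ip(phi, phi) * phi for phi in S2)
print(sp.simplify(proj))   # 1/2 - 4*cos(pi*x)/pi**2
```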
7. (a) In Exercise 6.3.4(b), we showed that {sin (πx), sin (2πx), . . . , sin (nπx)} is orthogonal under the L2 (0, 1) inner product. (b) Let us write ψ0 (x) = 1 and ψk (x) = cos (kπx), k = 1, . . . , n. A direct calculation shows that ⎧ ⎨ 1, i = j = 0, 1 , i = j > 1, fk , f 2 = ⎩ 2 0, otherwise. Therefore, {1, cos (πx), cos (2πx), . . . , cos (nπx)} is orthogonal under the L2 (0, 1) inner product. 8. Consider the following data points: (−2, 14.6), (−1, 4.3), (0, 1.1), (1, 0.3), (2, 1.9). We wish to find a quadratic polynomial p(x) = c0 + c1 x + c2 x2 that fits the data as nearly as possible, using the method of least-squares. We wish to solve the equations c0 + c1 xi + c2 x2i = yi , i = 1, 2, 3, 4, 5 in the least-squares sense. These equations are equivalent to the system M c = y, where ⎤ ⎡ ⎤ ⎡ 14.6 1 −2 4 ⎢ 4.3 ⎥ ⎢ 1 −1 1 ⎥ ⎥ ⎢ ⎥ ⎢ ⎥ ⎥ ⎢ 0 0 ⎥, y = ⎢ M =⎢ 1 ⎢ 1.1 ⎥ . ⎣ ⎦ ⎣ 1 0.3 ⎦ 1 1 1.9 1 2 4 . The solution is c = (0.69714, −2.9400, 1.8714), and thus the least-squares quadratic is q(x) = 0.69714 − 2.9400x + 1.8714x2. 9. Consider the following data points: (0, 3.1), (1, 1.4), (2, 1.0), (3, 2.2), (4, 5.2), (5, 15.0). We wish to find the function of the form f (x) = a1 ex + a2 e−x that fits the data as nearly as possible in the least-squares sense. We wish to solve the equations a1 exi + a2 e−xi = yi , i = 1, 2, 3, 4, 5, 6 in the least-squares sense. These equations are equivalent to the system M a = y, where ⎤ ⎡ ⎤ ⎡ x 3.1 e 1 e−x1 ⎢ 1.4 ⎥ ⎢ ex2 e−x2 ⎥ ⎥ ⎢ ⎥ ⎢ x ⎢ 1.0 ⎥ ⎢ e 3 e−x3 ⎥ ⎥. ⎢ ⎥ ⎢ M = ⎢ x4 −x4 ⎥ , y = ⎢ ⎥ ⎢ 2.2 ⎥ ⎢ ex e−x ⎥ ⎣ 5.2 ⎦ ⎣ e 5 e 5 ⎦ 15.0 ex6 e−x6 . The solution is a = (0.10013, 2.9878), and the approximating function is 0.10013ex + 2.9878e−x. 10. Let S be a finite-dimensional subspace of an inner product space V , let {u1 , . . . , un } be a basis for S, and suppose v ∈ V , w ∈ S satisfy v − w, ui = 0 for all i = 1, . . . , n. We wish to show that v − w, z = 0 for all z ∈ S. If z ∈ S, then there exist α1 , . . . , αn ∈ R such that z = ni=1 αi ui . But then 3 2 n n v − w, z =
v − w,
αi ui i=1
αi v − w, ui = 0
= i=1
since each v − w, ui is zero. Thus v − w, z = 0 for all z ∈ S, and the proof is complete.
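Least-squares fits like the one in Exercise 8 reduce to the normal equations A^T A x = A^T y discussed in this section; numerically one would normally call a least-squares routine rather than form the normal equations. A sketch (NumPy assumed; not part of the text), using the data of Exercise 8:

```python
import numpy as np

# Data points from Exercise 8.
xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
ys = np.array([14.6, 4.3, 1.1, 0.3, 1.9])

# Fit c0 + c1*x + c2*x^2 in the least-squares sense.
M = np.column_stack([np.ones_like(xs), xs, xs**2])
c, *_ = np.linalg.lstsq(M, ys, rcond=None)
print(c)   # approximately [0.69714, -2.94, 1.8714]
```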
11. Let A ∈ Rm×n and y ∈ Rm be given. We wish to show that the system AT Ax = AT y always has a solution. By Theorem 291, x ∈ Rn solves AT Ax = AT b if and only if x solves min{ Ax − y 2 : x ∈ Rn }. This last problem is equivalent to projecting y onto col(A), that is, to finding the best approximation to y from col(A). By the projection theorem, we know that the best approximation problem has a unique solution Ax ∈ col(A). Hence AT Ax = AT b has a solution x ∈ Rn . 12. In the previous exercise, the projection theorem guarantees that there is a unique Ax ∈ col(A) closest to y. However, if N (A) is nontrivial, then there are infinitely many y ∈ Rn with Ay = Ax. Thus, while the vector Ax is unique (as guaranteed by the projection theorem), the vector x may not be unique. 13. Assume A ∈ Rm×n . We know that S : col(AT ) → col(A) defined by S(x) = Ax is an isomorphism (Exercise 6.2.11). (a) By Exercise 6.2.11, R : col((AT )T ) → col(AT ) defined by R(y) = AT y is an isomorphism. Since (AT )T = A, we see that R maps col(A) onto col(AT ), and this gives another isomorphism between col(A) and col(AT ). (b) Now let y ∈ Rm be given. Then AT y ∈ col(AT ). Bu the above results, there is a unique z ∈ col(A) such that AT z = AT y. Since z ∈ col(A), there exists x ∈ Rn such that Ax = z, and hence such that AT Ax = AT y. (c) The vector z ∈ col(A) is unique, as noted above, but there may be many u ∈ Rn such that Au = z. Thus x may not be unique. 14. Let A ∈ Rm×n and y ∈ Rm . (a) In the previous exercise, we showed that there exists a unique z ∈ col(A) such that AT z = AT y. Since col(AT ) is isomorphic to col(A), there exists a unique x ∈ col(AT ) such that Ax = z. This vector x is the only vector in col(AT ) satisfying the equation AT Ax = AT y. (b) Now let x̂ ∈ Rn be any solution to AT Ax = AT y, and define x to be the projection of x̂ onto col(AT ). We wish to prove that x is the unique solution of AT Ax = AT b that belongs to col(AT ). First we show that x is a solution of AT Ax = AT b. By the projection theorem, x̂ − x is orthogonal to every vector in col(AT ); that is (x̂ − x) · w = 0 for all w ∈ col(AT ) ⇔ (x̂ − x) · AT v = 0 for all v ∈ Rm ⇔ A(x̂ − x) · v = 0 for all v ∈ Rm ⇔ A(x̂ − x) = 0 ⇔ Ax̂ = Ax. It follows that AT Ax = AT Ax̂ = AT y. Now suppose x̃ ∈ col(AT ) also satisfies AT Ax̃ = AT y. We then have AT Ax̃ = AT Ax ⇒ AT A(x̃ − x) = 0 ⇒ AT A(x̃ − x) · u = 0 for all u ∈ Rn ⇒ A(x̃ − x) · Au = 0 for all u ∈ Rn ⇒ A(x̃ − x) · A(x̃ − x) = 0 ⇒ A(x̃ − x) = 0 ⇒ Ax̃ = Ax. Since A defines a bijection from col(AT ) onto col(A), this shows that x̃ must equal x, and we have proved the uniqueness of x. 15. Let A ∈ Rm×n , where m < n and rank(A) = m. Let y ∈ Rm .
(a) Since rank(A) = dim(col(A)), the fact that rank(A) = m proves that col(A) = Rm . Thus Ax = y has a solution for all y ∈ Rm . Moreover, by Theorem 93, N (A) is nontrivial (a linear operator cannot be injective unless the dimension of the co-domain is at least as large as the dimension of the domain), and hence Ax = y has infinitely many solutions. (b) Consider the matrix AAT ∈ Rm×m . We know from Exercise 1(a) (applied to AT ) that N (AAT ) = N (AT ). By Exercise 6.2.11, rank(AT ) = rank(A) = m, and hence nullity(AT ) = 0 by the fundamental theorem of linear algebra. Since N (AAT ) = N (AT ) is trivial, this proves that the square matrix AAT is invertible. −1 (c) If x = AT AAT y, then −1 −1 Ax = A AT AAT y = (AAT ) AAT y = y, and hence x is a solution of Ax = y. 16. Suppose V is a finite-dimensional inner product space over R, S is a finite-dimensional subspace of V , and P : V → V is defined by P (v) = projS v for all v ∈ V . We call P the orthogonal projection operator onto S. (a) We first prove that P is linear. We first note that, for any u ∈ V , P (u) is the unique vector in S satisfying u − P (u), s = 0 for all s ∈ S. Now suppose u, v ∈ V , and w = P (u), z = P (v). Then w, z ∈ S and u − w, s = 0 for all s ∈ S, v − z, s = 0 for all s ∈ S. Adding these equations yields u − w, s + v − z, s = 0 or (u + v) − (w + z), s = 0 for all s ∈ S. This last equation implies, by the definition of P and the projection theorem, that w + z = P (u + v), that is, that P (u + v) = P (u) + P (v). Similarly, if u ∈ V , α ∈ R, and w = P (u), then u − w, s = 0 for all s ∈ S ⇒ α u − w, s = 0 for all s ∈ S ⇒ α(u − w), s = 0 for all s ∈ S ⇒ αu − αw, s = 0 for all s ∈ S ⇒ P (αu) = αw ⇒ P (αu) = αP (u). Thus we have shown that P is a linear operator. (b) By definition, P (u) ∈ S for all u ∈ V , and therefore the vector in S closest to P (u) is obviously P (u) itself. This implies that P (P (u)) = P (u), that is, P 2 (u) = P (u). Since this holds for all u ∈ V , this shows that P 2 = P . (c) Let u, v ∈ V be given. By definition of P , we have u − P (u), s = 0 for all s ∈ S; since P (v) ∈ S, this implies that u − P (u), P (v) = 0, and hence that u, P (v) = P (u), P (v) . Similar reasoning shows that v, P (u) = P (v), P (u) . But then u, P (v) = P (u), P (v) = P (v), P (u) = v, P (u) = P (u), v . Since this holds for all u, v ∈ V , this shows that P is self-adjoint.
17. Suppose V is a finite-dimensional inner product space over R, and assume that P : V → V satisfies P 2 = P and P ∗ = P . We wish to prove that there exists a subspace S of V such that P (v) = projS v for all v ∈ V . We define S = R(P ). If we prove that u − P (u), s = 0 for all s ∈ S, then, by the projection theorem, it will follow that P (u) = projS u for all u ∈ V , and the proof will be complete. Let s ∈ S be arbitrary; then, by definition of S, there exists v ∈ V such that s = P (v). But then u − P (u), s = u − P (u), P (v) = P (u − P (u)), v 0 1 = P (u) − P 2 (u), v = P (u) − P (u), v = 0, v = 0. Thus P is the orthogonal projection operator onto S. 18. Let A ∈ Rm×n have full rank. Then we know from Exercise 1 that AT A is invertible. Define P : Rm → Rm by P (y) = A(AT A)−1 AT y for all y ∈ Rm . We will show that P is the orthogonal projection operator onto col(A). Since P (y) ∈ col(A) for all y ∈ Rm , it suffices to prove that (y − P (y)) · z = 0 for all z ∈ col(A). Let y ∈ Rm be arbitrary, and let z be any element of col(A). Then there exists x ∈ Rn such that z = Ax, and therefore (y − P (y)) · z = y − A(AT A)−1 AT y · Ax = AT y − A(AT A)−1 AT y · x = AT y − AT A(AT A)−1 AT y · x = AT y − AT y · x = 0 · x = 0. This completes the proof. Notice that one could also show directly that P ∗ = P and P 2 = P by proving T 2 A(AT A)−1 AT = A(AT A)−1 AT and A(AT A)−1 AT = A(AT A)−1 AT . 19. Let V be a vector space over a field F , and let P : V → V . Suppose P is a projection operator, that is, suppose P 2 = P . (a) If I : V → V is the identity operator, then, for any v ∈ V , (I − P )2 (v) = (I − P )((I − P )(v)) = I((I − P )(v)) − P ((I − P )(v)) = I(v − P (v)) − P (v − P (v)) = v − P (v) − (P (v) − P 2 (v)) = v − P (v) − (P (v) − P (v)) = v − P (v) = (I − P )(v). Since this holds for all v ∈ V , this shows that (I − P )2 = I − P , and hence that I − P is a projection operator. (b) Now define S = R(P ), T = R(I − P ).
i. First we prove that S ∩ T = {0}. If z ∈ S ∩ T , then there exist u, v ∈ V such that z = P (u) = (I − P )(v). Then we have P (u) = v − P (v). Applying P to both sides of this equation yields P 2 (u) = P (v − P (v)) = P (v) − P 2 (v). But then, since P 2 = P , we have P (u) = 0. Since z = P (u), this shows that z = 0 and hence that S ∩ T = {0}. ii. Next, we find ker(P ) and ker(I − P ). Notice that P ((I − P )(v)) = P (v − P (v)) = P (v) − P 2 (v) = P (v) − P (v) = 0, and hence T = R(I − P ) ⊂ ker(P ). Conversely, if v ∈ ker(P ), then P (v) = 0 ⇒ v − P (v) = v ⇒ (I − P )(v) = v, and therefore v ∈ R(I − P ). Thus ker(P ) = R(I − P ) = T . A similar proof shows that ker(I − P ) = R(P ) = S. iii. Now we show that V is the direct sum of S and T . Let v ∈ V be given. Then v = P (v) + v − P (v) = P (v) + (I − P )(v). Since P (v) ∈ S and (I − P )(v) ∈ T , this shows that V = S + T . Moreover, since S ∩ T = {0}, it follows from Theorem 226 that V is the direct sum of S and T . 20. Let A ∈ Rm×n have full rank. We know from the preceding results that the matrix B = I −A(AT A)−1 AT defines a projection operator onto col(B). (a) By the previous exercise, we know that col(B) = N A(AT A)−1 AT . We obviously have N (AT ) ⊂ N A(AT A)−1 AT . Conversely, suppose y ∈ N A(AT A)−1 AT . Then A(AT A)−1 AT y = 0 ⇒ (AAT y) · A(AT A)−1 AT y = 0 ⇒ (AT y) · AT A(AT A)−1 AT y = 0 ⇒ (AT y) · AT y = 0 ⇒ AT y = 0 ⇒ y ∈ N (AT ).
Thus N (AT ) = N A(AT A)−1 AT , and we have shown that col(B) = N (AT ). (b) We already know that B defines a projection operator (since the matrix A(AT A)−1 AT defines a projection operator P , and B is the matrix of I − P ). Thus B 2 = B. It suffices to prove that B T = B (since then the projection operator defined by B is self-adjoint and we can apply Exercise 17). We have T T B T = I − A(AT A)−1 AT = I T − A(AT A)−1 AT = I − (AT )T ((AT A)−1 )T AT . Now, we have seen that ((AT A)−1 )T = ((AT A)T )−1 , and (AT A)T = AT A. Thus ((AT A)−1 )T = (AT A)−1 . Also, (AT )T = A. Therefore, B T = I − A(AT A)−1 AT = B, and the proof is complete. (c) Since A(AT A)−1 AT defines the (orthogonal) projection onto col(A) and I − A(AT A)−1 AT defines the (orthogonal) projection onto N (AT ), Exercise 19(b)iii implies that every y ∈ Rm can be written uniquely as y = r + z, r ∈ col(A) and z ∈ N (AT ). 21. Consider real numbers x0 , . . . , xn satisfying a ≤ x0 < · · · < xn ≤ b. Define P : V → V , where V = C[a, b], by the condition that p = P (u) is the unique polynomial of degree at most n interpolating the function u at the points x0 , x1 , . . . , xn (that is, p(xi ) = u(xi ) for i = 0, 1, . . . , n). To prove that P is a projection operator, we must simply prove that P 2 = P , that is, that P (P (u)) = P (u) for all u ∈ C[a, b]. By definition, P (u) is the unique polynomial in Pn interpolating u at x0 , . . . , xn , and P (P (u)) is the unique polynomial in Pn interpolating P (u) at x0 , . . . , xn . Since u and P (u) have the same values at x0 , . . . , xn , it follows that P (u) and P (P (u)) are the same polynomial. This holds for all u ∈ C[a, b], and hence P2 = P.
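A quick numerical illustration of Exercises 18–20 (a sketch assuming NumPy; the matrix A below is an arbitrary full-column-rank example): the matrix P = A(A^T A)^{-1}A^T should satisfy P² = P and P^T = P, and B = I − P should map into N(A^T), so that every y splits as y = Py + By with Py ∈ col(A) and By ∈ N(A^T).

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 3))                      # full column rank with probability 1
P = A @ np.linalg.solve(A.T @ A, A.T)                # orthogonal projector onto col(A)
B = np.eye(6) - P                                    # projector onto N(A^T)

print(np.allclose(P @ P, P), np.allclose(P, P.T))    # P^2 = P and P^T = P
print(np.allclose(A.T @ B, np.zeros((3, 6))))        # columns of B lie in N(A^T)

y = rng.standard_normal(6)
print(np.allclose(P @ y + B @ y, y))                 # y = r + z, r in col(A), z in N(A^T)
```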
22. Let x1 , x2 , . . . , xn be linearly independent vectors in Rm (n < m), and let S = sp{x1 , x2 , . . . , xn }. If we define A = [x1 | · · · |xn ] ∈ Rm×n , then A has full rank (since its columns are linearly independent), and S = col(A). Therefore, by Exercise 18, A(AT A)−1 AT is the matrix representing the orthogonal projection operator onto S. If {x1 , x2 , . . . , xn } is an orthonormal basis for S, then AT A = I and the matrix simplifies to AAT .
6.5
The Gram-Schmidt process
1. Let V be an inner product space over R, and let u ∈ V, u ≠ 0. The one-dimensional subspace S spanned by u can be regarded as a line in V, and {u} is (trivially) an orthogonal basis for S. Therefore,
proj_S v = (⟨v, u⟩/⟨u, u⟩) u.
2. According to Example 297, {p1, p2, p3}, where p1(x) = 1, p2(x) = x − 1/2, p3(x) = x² − x + 1/6, is an orthogonal basis for P2. Hence, the best approximation to f(x) = sin(πx) from P2 (relative to the L²(0,1) norm) is
(⟨f, p1⟩₂/⟨p1, p1⟩₂) p1(x) + (⟨f, p2⟩₂/⟨p2, p2⟩₂) p2(x) + (⟨f, p3⟩₂/⟨p3, p3⟩₂) p3(x)
= (2/π)·1 + 0·(x − 1/2) + (60(π² − 12)/π³)(x² − x + 1/6)
= (12π² − 120)/π³ + ((720 − 60π²)/π³) x + ((60π² − 720)/π³) x²
≈ −0.050465 + 4.1225x − 4.1225x².
This is the same result found in Example 296.
3. We wish to find an orthogonal basis for P3, regarded as a subspace of L²(0,1). We already know that {1, x − 1/2, x² − x + 1/6} is an orthogonal basis for P2. Labeling these polynomials as p1, p2, p3, respectively, and writing q(x) = x³, we compute
p4(x) = q(x) − (⟨q, p1⟩/⟨p1, p1⟩) p1(x) − (⟨q, p2⟩/⟨p2, p2⟩) p2(x) − (⟨q, p3⟩/⟨p3, p3⟩) p3(x)
= x³ − (1/4)·1 − (9/10)(x − 1/2) − (3/2)(x² − x + 1/6)
= x³ − (3/2)x² + (3/5)x − 1/20.
Then {p1, p2, p3, p4} is an orthogonal basis for P3.
4. The best approximation to f(x) = sin(πx) from P3, relative to the L²(0,1) norm, is
(⟨f, p1⟩₂/⟨p1, p1⟩₂) p1(x) + (⟨f, p2⟩₂/⟨p2, p2⟩₂) p2(x) + (⟨f, p3⟩₂/⟨p3, p3⟩₂) p3(x) + (⟨f, p4⟩₂/⟨p4, p4⟩₂) p4(x).
Since ⟨f, p4⟩₂ = 0, this turns out to be the same quadratic polynomial computed in Exercise 2.
5. (a) The best cubic approximation, in the L²(−1,1) norm, to the function f(x) = e^x is the polynomial q(x) = α1 + α2 x + α3 x² + α4 x³, where Gα = b, G is the Gram matrix, and b is defined by bi = ⟨f, pi⟩₂, where pi(x) = x^{i−1}, i = 1, 2, 3, 4:
G = [  2    0   2/3   0
       0   2/3   0   2/5
      2/3   0   2/5   0
       0   2/5   0   2/7 ],
b = ( e − e⁻¹, 2e⁻¹, e − 5e⁻¹, 16e⁻¹ − 2e ).
We obtain
α = ( 33/(4e) − 3e/4,  105e/4 − 765/(4e),  15e/4 − 105/(4e),  1295/(4e) − 175e/4 ).
(b) Applying the Gram-Schmidt process to the standard basis {1, x, x², x³}, we obtain the orthogonal basis
{ 1, x, x² − 1/3, x³ − (3/5)x }.
(c) Using the orthogonal basis for P3, we compute the best approximation to f(x) = e^x from P3 (relative to the L²(−1,1) norm) to be
(e − e⁻¹)/2 + (3/e) x + (15e/4 − 105/(4e))(x² − 1/3) + (1295/(4e) − 175e/4)(x³ − (3/5)x).
6. Let S be the subspace of R⁴ spanned by u1 = (1, 0, 1, −1), u2 = (1, 0, 0, 1), u3 = (1, 2, 1, 1).
(a) Applying the Gram-Schmidt process to {u1, u2, u3} yields {v1, v2, v3}, where v1 = u1, v2 = u2 (this holds because u1 and u2 are already orthogonal), and v3 = (−1/3, 2, 2/3, 1/3).
(b) proj_S x = (4/3) v1 + (1/2) v2 + (17/14) v3 = ( 10/7, 17/7, 15/7, −3/7 ).
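The Gram-Schmidt computation in Exercise 6(a) can be reproduced numerically. The short NumPy sketch below (the helper proj is ours, not from the text) recovers v3 = (−1/3, 2, 2/3, 1/3).

```python
import numpy as np

u1 = np.array([1.0, 0.0, 1.0, -1.0])
u2 = np.array([1.0, 0.0, 0.0,  1.0])
u3 = np.array([1.0, 2.0, 1.0,  1.0])

def proj(v, w):
    """Orthogonal projection of v onto the span of w."""
    return (v @ w) / (w @ w) * w

v1 = u1
v2 = u2 - proj(u2, v1)            # u1 and u2 are already orthogonal, so v2 = u2
v3 = u3 - proj(u3, v1) - proj(u3, v2)
print(v2)                          # [1, 0, 0, 1]
print(v3)                          # approximately [-1/3, 2, 2/3, 1/3]
```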
7. Let S be the subspace of R5 spanned by u1 = (1, 1, 1, 1, 1), u2 = (1, −1, 1, −1, 1), u3 = (2, 0, 1, 1, 1), u4 = (1, −2, 0, −1, 1). (a) Applying the Gram-Schmidt process to {u1 , u2 , u3 , u4 } yields the orthogonal basis {v1 , v2 , v3 , v4 } for S, where v1 = (1, 1, 1, 1, 1), v2 = (4/5, −6/5, 4/5, −6/5, 4/5), v3 = (2/3, −1/2, −1/3, 1/2, −1/3), v4 = (−1/7, −1/7, −3/7, 1/7, 4/7). (b) We have projS x = (5/4, 9/4, 7/4, 7/4, 1). 8. Let S be the subspace of C[0, 1] spanned by ex and e−x . Regard C[0, 1] as an inner product space under the L2 (0, 1) inner product. (a) Applying the Gram-Schmidt process to {ex , e−x }, we obtain the orthogonal basis {g1 , g2 }, where g1 (x) = ex and 2 ex . g2 (x) = e−x − 2 e −1 (b) The best approximation to f (x) = x from S (with respect to the L2 (0, 1) norm) is F (x) = c1 g1 (x) + c2 g2 (x), where 2 4e − 6e2 − 4e3 + 2e4 . c1 = 2 , c2 = e −1 1 + e4 − 6e2 9. Let P be the plane in R3 defined by the equation 3x − y − z = 0.
(a) A basis for P is {(1, 3, 0), (1, 0, 3)}; applying the Gram-Schmidt process to this basis yields the orthogonal basis {(1, 3, 0), (9/10, −3/10, 3)}. (b) The projection of u = (1, 1, 1) onto P is (8/11, 12/11, 12/11). 10. Suppose f ∈ C[a, b] and pn is the best approximation, in the L2 (a, b) norm, to f from Pn . We wish to prove that, for all n = 0, 1, 2, . . ., f − pn+1 2 ≤ f − pn 2 . By definition, we have f − pn+1 2 ≤ f − q 2 for all q ∈ Pn+1 . Since pn ∈ Pn ⊂ Pn+1 , it follows that f − pn+1 2 ≤ f − pn 2 , as desired. 11. (a) If we apply the Gram-Schmidt process to the standard basis {1, x, x2 }, using the H 1 (0, 1) inner product, we obtain the same orthogonal basis as we do with the L2 (0, 1) inner product: 5 1 2 1 . 1, x − , x − x + 2 6 (b) The best approximation to f (x) = sin (πx) from P2 in the H 1 (0, 1) norm is 60(11π 2 + 12) 2 − π 61π 3
1 2 . x −x+ 2
(c) The earlier result has a smaller error in the L² norm, which is visible in the following graph (the absolute error in the H¹ approximation is the solid curve, the absolute error in the L² approximation is the dashed curve):
(Figure: absolute errors of the two quadratic approximations on [0, 1].)
However, the error in the derivative is smaller for the H¹ approximation:
(Figure: absolute errors in the derivatives on [0, 1].)
(The solid curve is the absolute error in the derivative for the H 1 approximation, while the dashed curve is the absolute error in the derivative for the L2 approximation.) 12. We have already seen that the following sets are orthogonal under the L2 (0, 1) inner product: {sin (πx), sin (2πx), . . . , sin (nπx)}, {1, cos (πx), cos (2πx), . . . , cos (nπx)}. Thus
∫₀¹ sin(nπx) sin(mπx) dx = ∫₀¹ cos(nπx) cos(mπx) dx = 0,  m ≠ n,
and also ∫₀¹ cos(nπx) dx = 0 for all positive n. If ⟨·,·⟩ represents the H¹(0,1) inner product, φn(x) = sin(nπx), ψn(x) = cos(nπx), then
⟨φn, φm⟩ = ∫₀¹ [ sin(nπx) sin(mπx) + mnπ² cos(nπx) cos(mπx) ] dx
= ∫₀¹ sin(nπx) sin(mπx) dx + mnπ² ∫₀¹ cos(nπx) cos(mπx) dx = 0.
Similarly,
⟨ψn, ψm⟩ = ∫₀¹ [ cos(nπx) cos(mπx) + mnπ² sin(nπx) sin(mπx) ] dx
= ∫₀¹ cos(nπx) cos(mπx) dx + mnπ² ∫₀¹ sin(nπx) sin(mπx) dx = 0.
Finally, since the derivative of the constant function 1 is zero, we have
⟨ψ0, ψn⟩ = ∫₀¹ cos(nπx) dx = 0.
This proves that the above sets are also orthogonal with respect to the H¹(0,1) inner product.
13. Define an inner product on C[0,1] by
⟨f, g⟩ = ∫₀¹ (1 + x) f(x) g(x) dx.
(a) We will first verify that ⟨·,·⟩ really does define an inner product on C[0,1]. For all f, g ∈ C[0,1], we have
⟨f, g⟩ = ∫₀¹ (1 + x) f(x) g(x) dx = ∫₀¹ (1 + x) g(x) f(x) dx = ⟨g, f⟩,
and thus the first property of an inner product is satisfied. If f, g, h ∈ C[0,1] and α, β ∈ R, then
⟨αf + βg, h⟩ = ∫₀¹ (1 + x)(αf(x) + βg(x)) h(x) dx
= ∫₀¹ { α(1 + x) f(x) h(x) + β(1 + x) g(x) h(x) } dx
= α ∫₀¹ (1 + x) f(x) h(x) dx + β ∫₀¹ (1 + x) g(x) h(x) dx
= α⟨f, h⟩ + β⟨g, h⟩.
This verifies the second property of an inner product. Finally, for any f ∈ C[0,1],
⟨f, f⟩ = ∫₀¹ (1 + x) f(x)² dx ≥ 0
(since (1 + x) f(x)² ≥ 0 for all x ∈ [0,1]). Also, if ⟨f, f⟩ = 0, then (1 + x) f(x)² = 0 for all x ∈ [0,1]. Since 1 + x > 0 for all x ∈ [0,1], this implies that f(x)² ≡ 0, or f(x) ≡ 0. Therefore, ⟨f, f⟩ = 0 if and only if f = 0, and we have verified that ⟨·,·⟩ defines an inner product on C[0,1].
(b) Applying the Gram-Schmidt process to the standard basis yields the orthogonal basis
{ 1, x − 5/9, x² − (68/65)x + 5/26 }.
14. Let {u1, u2, ..., un} be a basis for an inner product space V. Consider the following algorithm: Initially set ûi = ui for all i = 1, 2, ..., n. For each k = 1, 2, ..., n − 1, replace the value of ûi as follows:
ûi ← ûi − (⟨ûi, ûk⟩/⟨ûk, ûk⟩) ûk,  i = k + 1, ..., n.
We wish to show that this procedure produces an orthogonal basis for V. To write out the proof, we will have to label the values of the vectors at each iteration. Accordingly, we define ûi,0 = ui, i = 1, ..., n. For each k = 1, ..., n − 1, we define
ûi,k = ûi,k−1,  i = 1, ..., k,
ûi,k = ûi,k−1 − (⟨ûi,k−1, ûk,k⟩/⟨ûk,k, ûk,k⟩) ûk,k,  i = k + 1, ..., n.
We wish to prove that {û1,n−1 , . . . , ûn,n−1 } is orthogonal. We will do this by proving by induction that, for all k = 1, . . . , n − 1, {û1,k , . . . , ûk,k } is orthogonal (6.1) and ûi,k , ûj,k = 0 if 1 ≤ j ≤ k, k + 1 ≤ i ≤ n.
(6.2)
We first verify that (6.1) and (6.2) hold for k = 1. Condition (6.1) simplifies to the statement that {û1,1 } is orthogonal, which is vacuously true (every vector in the set is orthogonal to every other vector). We also have, for 2 ≤ i ≤ n, ûi,1 , û1,1 = ûi,0 , û1,1 −
ûi,0 , û1,1 û1,1 , û1,1 = 0, û1,1 , û1,1
which verifies (6.2). Now suppose (6.1), (6.2) hold with k − 1 in place of k. Then {û1,k , . . . , ûk−1,k } = {û1,k−1 , . . . , ûk−1,k−1 } is orthogonal by (6.1), and ûk,k = ûk,k−1 is orthogonal to each of û1,k−1 , . . . , ûk−1,k−1 by (6.2). This proves that {û1,k , . . . , ûk,k } is orthogonal, and thus (6.1) holds. Now let i, j satisfy 1 ≤ j ≤ k, k + 1 ≤ i ≤ n. We have ûi,k , ûj,k = ûi,k−1 , ûj,k −
ûi,k−1 , ûk,k ûk,k ûj,k . ûk,k , ûk,k
If j < k, then ûi,k−1 , ûj,k = ûi,k−1 , ûj,k−1 = 0 and ûk,k , ûj,k = ûk,k−1 , ûj,k−1 = 0, both by (6.2), and hence ûi,k , ûj,k = 0. If j = k, then ûi,k , ûk,k = ûi,k−1 , ûk,k −
ûi,k−1 , ûk,k ûk,k ûk,k = 0. ûk,k , ûk,k
This proves that (6.2) holds, and completes the proof by induction. Finally, (6.1) and (6.2), for k = n − 1, show that {û1,n−1, ..., ûn−1,n−1} is orthogonal and ûn,n−1 is orthogonal to each of û1,n−1, ..., ûn−1,n−1. This proves that {û1,n−1, ..., ûn,n−1} is orthogonal, as desired.
15. Let wn be the projection of pn(x) = x^n onto Pn−1 (regarded as a subspace of L²(0,1)), n = 1, 2, 3, ..., N. The following table shows n versus the relative error ‖pn − wn‖₂/‖pn‖₂ in wn as an approximation to pn:

n    ‖pn − wn‖₂/‖pn‖₂
1    5.0000 · 10⁻¹
2    1.6667 · 10⁻¹
3    5.0000 · 10⁻²
4    1.4286 · 10⁻²
5    3.9683 · 10⁻³
6    1.0823 · 10⁻³
7    2.9138 · 10⁻⁴
8    7.7700 · 10⁻⁵
9    2.0568 · 10⁻⁵
10   5.4125 · 10⁻⁶
11   1.4176 · 10⁻⁶
12   3.6980 · 10⁻⁷
13   9.6148 · 10⁻⁸
14   2.4927 · 10⁻⁸
15   6.4467 · 10⁻⁹
16   1.6637 · 10⁻⁹
17   4.2852 · 10⁻¹⁰
18   1.1019 · 10⁻¹⁰
19   2.8292 · 10⁻¹¹
20   7.2544 · 10⁻¹²
We see that pn lies more nearly in Pn−1 as n increases.
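The in-place algorithm of Exercise 14 translates directly into code. The sketch below (NumPy assumed; orthogonalize is our own helper name) applies the k-loop/i-loop update to a random basis of R^5 and checks that the resulting vectors are pairwise orthogonal.

```python
import numpy as np

def orthogonalize(U):
    """In-place Gram-Schmidt of Exercise 14: at step k, every later vector has
    its component along the (already final) k-th vector removed."""
    V = [np.asarray(u, dtype=float).copy() for u in U]
    n = len(V)
    for k in range(n - 1):
        for i in range(k + 1, n):
            V[i] -= (V[i] @ V[k]) / (V[k] @ V[k]) * V[k]
    return V

rng = np.random.default_rng(4)
V = orthogonalize(rng.standard_normal((5, 5)))    # 5 random basis vectors of R^5
G = np.array([[v @ w for w in V] for v in V])     # matrix of pairwise inner products
print(np.allclose(G, np.diag(np.diag(G))))        # off-diagonal entries vanish
```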
6.6
Orthogonal complements
1. Let S = sp{(1, 2, 1, −1), (1, 1, 2, 0)}. We wish to find a basis for S ⊥ . Let A ∈ R2×4 be the matrix whose rows are the given vectors (the basis vectors for S). Then x ∈ S ⊥ if and only if Ax = 0; that is, S ⊥ = N (A). A direct calculation shows that S ⊥ = N (A) = {(−3, 1, 1, 0), (−1, 1, 0, 1)}. 2. Let S be a plane in R3 that passes through the origin. Then, by Theorem 306, R3 is the direct sum of S and S ⊥ . This implies, since dim(S) = 2, that dim(S ⊥ ) = 1, and hence there exists v = 0 such that S ⊥ = sp{v} = {αv : α ∈ R}. This shows that S ⊥ is a line through the origin. 3. A basis for col(A) is {(1, 1, 2), (2, 3, 8), (−1, 2, 9)}, which shows that col(A) is three-dimensional and hence that col(A) = R3 . It follows that N (AT ) = col(A)⊥ must be trivial, and a direct calculation verifies this: the only solution of AT y = 0 is y = 0. Therefore, N (AT ) has no basis. 4. Bases for N (A) and col(AT ) are {(24, −5, 1)} and {(1, 4, −4), (1, 3, −9)}, respectively. Direct calculation verifies that (24, −5, 1) · (1, 4, −4) = (24, −5, 1) · (1, 3, −9) = 0. Also, a direct calculation verifies that {(24, −5, 1), (1, 4, −4), (1, 3, −9)} is linearly independent and hence a basis for R3 .
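For Exercise 1, S^⊥ = N(A) can also be obtained numerically from the SVD of A: the right singular vectors belonging to zero singular values form an orthonormal basis for the null space. A sketch (NumPy assumed):

```python
import numpy as np

A = np.array([[1.0, 2.0, 1.0, -1.0],
              [1.0, 1.0, 2.0,  0.0]])      # rows are the given spanning vectors of S

U, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
null_basis = Vt[rank:]                     # rows: orthonormal basis of N(A) = S^perp
print(null_basis.shape)                    # (2, 4), so dim S^perp = 2
print(np.allclose(A @ null_basis.T, 0.0))  # every basis vector is orthogonal to S

# the basis {(-3,1,1,0), (-1,1,0,1)} found above spans the same subspace:
for v in [np.array([-3.0, 1, 1, 0]), np.array([-1.0, 1, 0, 1])]:
    print(np.allclose(v, null_basis.T @ (null_basis @ v)))  # v equals its projection onto N(A)
```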
5. (a) Since N (A) is orthogonal to col(AT ), it suffices to orthogonalize the basis of col(AT ) by (a single step of) the Gram-Schmidt process, yielding {(1, 4, −4), (−16/33, −97/33, −101/33)}. Then {(24, −5, 1), (1, 4, −4), (−16/33, −97/33, −101/33)} is an orthogonal basis for R3 . (b) A basis for N (AT ) is {(1, −1, 1, 0), (−2, 1, 0, 1)} and a basis for col(A) is {(1, 1, 0, 1), (4, 3, −1, 5)}. Applying the Gram-Schmidt process to each of the bases individually yields {(1, −1, 1, 0), (−1, 0, 1, 1)} and {(1, 1, 0, 1), (0, −1, −1, 1)}, respectively. The union of these two bases, {(1, 1, 0, 1), (−1, 0, 1, 1), (1, 1, 0, 1), (0, −1, −1, 1)}, is an orthogonal basis for R4 . 6. Let D : P2 → P2 be defined by D(p) = p . We wish to find bases for the four fundamental subspaces of D, assuming that the L2 (0, 1) inner product is imposed on P2 . We have ker(D) = sp{1} and R(D) = sp{1, x}. We can find a basis for ker(D∗ ) = R(D)⊥ by finding a polynomial orthogonal to both 1 and x; we obtain ker(D∗ ) = sp{1 − 6x + 6x2 }. Similarly, R(D∗ ) = ker(D)⊥ , and hence R(D∗ ) = sp{−6 + 12x, 2 − 24x + 30x2 }. 7. Let V be an inner product space over R, and let S be a nonempty subset of V . We wish to prove that S ⊥ is a subspace of V . First, notice that 0, s = 0 for all s ∈ S, which shows that 0 ∈ S ⊥ . Next, suppose u, v ∈ S ⊥ . Then, for any s ∈ S, u + v, s = u, s + v, s , and both u, s and v, s are 0 since u, v ∈ S ⊥ . Thus u + v, s = 0 for all s ∈ S, which shows that u + v ∈ S ⊥ . Thus S ⊥ is closed under addition. Finally, suppose v ∈ S ⊥ and α ∈ R. Then, for any s ∈ S, αv, s = α v, s , and v, s = 0 since v ∈ S ⊥ . Therefore αv, s = 0 for all s ∈ S, which implies that αv ∈ S ⊥ and hence that S ⊥ is closed under scalar multiplication. Thus S ⊥ is a subspace. 8. Let V be an inner product space over R, and let S, T be orthogonal subspaces of V . We wish to prove that S ∩ T = {0}. Suppose v ∈ S ∩ T . Since v ∈ S and v ∈ T and S, T are orthogonal, it follows that v, v = 0 (since every vector in S is orthogonal to every vector in T ). But then, by the definition of an inner product, v = 0. This proves the desired result. 9. (a) Let A ∈ Rn×n be symmetric. Then N (A)⊥ = col(A) and col(A)⊥ = N (A). (b) It follows that y ∈ col(A) if and only if y ∈ N (A)⊥ ; in other words, Ax = y has a solution if and only if Az = 0 ⇒ y · z = 0. 10. Let V be vector space over a field F , and let S be a nonempty subset of V . Define sp(S) to be the set of all finite linear combinations of elements of S. (a) We wish to prove that sp(S) is a subspace of V . First, since S is nonempty, there exists v ∈ S, and 0 = 0 · v is a finite linear combination; this shows that 0 ∈ sp(S). Next, suppose u, v ∈ sp(S). Then each of u and v can be written as a linear combination of a finite number of vectors from S, say u = α1 s1 + · · · + αk sk , v = β1 t1 + · · · + β t , which implies that u + v = α1 s1 + · · · + αk sk + β1 t1 + · · · + β t , and hence u + v is a finite linear combination of vectors in S. Finally, if u ∈ S and α ∈ R, then u = α1 s1 + · · · + αk sk for some positive integer k and α1 , . . . , αk ∈ R, s1 , . . . , sk ∈ S. Then αu = α(α1 s1 + · · · + αk sk ) = (αα1 )s1 + · · · + (ααk )sk , and hence αu is a finite linear combination of vectors in S, that is, αu ∈ sp(S). This completes the proof that sp(S) is a subspace of V .
(b) Now suppose T is any subspace of V such that S ⊂ T . By Theorem 12, every finite linear combination of vectors in S ⊂ T belongs to T , which shows that sp(S) ⊂ T . Thus sp(S) is the smallest subspace of V that contains S. 11. Let V be a finite-dimensional inner product space over R, and let S be a nonempty subset of V . We wish ⊥ to prove that S ⊥ = sp(S). First notice that, if s ∈ S, then, by definition, v, s = 0 for all v ∈ S ⊥ . But this implies that s ∈ (S ⊥ )⊥ . It follows that S ⊂ (S ⊥ )⊥ . Therefore, since (S ⊥ )⊥ is a subspace of V , the previous exercise implies that sp(S) ⊂ (S ⊥ )⊥ . Conversely, suppose x ∈ (S ⊥ )⊥ , and define y = projsp(S) x. Then, by the projection theorem, x − y ∈ sp(S)⊥ . Now, it is straightforward to show that sp(S)⊥ = S ⊥ . (Clearly, if v ∈ sp(S)⊥ , then v is orthogonal to every vector in sp(S), including every vector in S. Thus v ∈ S ⊥ . Conversely, if v ∈ S ⊥ and x ∈ sp(S), then x is a finite linear combination of vectors in S, and v is orthogonal to each of those vectors and hence to x. Thus v ∈ sp(S)⊥ .) Thus we see that x − y ∈ S ⊥ . However, x ∈ (S ⊥ )⊥ by assumption, and y ∈ sp(S) ⊂ (S ⊥ )⊥ . Since (S ⊥ )⊥ is a subspace of V , this implies that x − y ∈ (S ⊥ )⊥ . By Lemma 302, S ⊥ ∩ (S ⊥ )⊥ = {0}, which shows that x − y = 0. Thus x = y ∈ sp(S), which shows that x ∈ sp(S). Therefore, (S ⊥ )⊥ ⊂ sp(S), and the proof is complete. 12. Let X and U be finite-dimensional inner product spaces over R, and let x1 , . . . , xk be vectors in X such that {T (x1 ), . . . , T (xk )} is a basis for R(T ). Suppose α1 , . . . , αk ∈ R satisfy α1 T ∗ (T (x1 )) + · · · + αk T ∗ (T (xk )) = 0. Then α1 T ∗ (T (x1 )) + · · · + αk T ∗ (T (xk )) = 0 ⇒ T ∗ (T (α1 x1 + · · · + αk xk )) = 0 ⇒ α1 x1 + · · · + αk xk , T ∗ (T (α1 x1 + · · · + αk xk )) = 0 ⇒ T (α1 x1 + · · · + αk xk ), T (α1 x1 + · · · + αk xk ) = 0 ⇒ T (α1 x1 + · · · + αk xk ) = 0 ⇒ α1 T (x1 ) + · · · + αk T (xk ) = 0 ⇒ α1 = · · · = αk = 0. This shows that {T ∗ (T (x1 )), . . . , T ∗ (T (xk ))} is a linearly independent subset of R(T ∗ ), and hence that rank(T ∗ ) ≥ rank(T ). This result holds for all linear operators mapping one finite-dimensional inner product space into another, and hence it holds for T ∗ : U → X, yielding rank((T ∗ )∗ ) ≥ rank(T ∗ ). Since (T ∗ )∗ = T , this implies that rank(T ) ≥ rank(T ∗ ), and hence that rank(T ) = rank(T ∗ ). This completes the proof. 13. Let A ∈ Rm×n and y ∈ Rm be given. Suppose A is singular and y ∈ col(A). (a) We wish to show that there is a unique solution x of Ax = y such that x ∈ col(AT ). This follows from Exercise 6.4.13, which proved that there is a unique least-squares solution x of Ax = y that lies in col(AT ). Since y ∈ col(A), Ax − y 2 must be 0, that is, Ax = y must hold. (b) The solution set of Ax = y can be written as x + N (A). Hence, if x is any solution of Ax = y, then there exists z ∈ N (A) such that x = x + z. Since col(AT ), N (A) are orthogonal, the vectors x, z are orthogonal and hence the Pythagorean theorem implies that x 22 = x 22 + z 22 . If x = x, then z = 0, and hence x 2 > x 2 . This shows that x is the minimum-norm solution of Ax = y. 14. Let A ∈ Rm×n , y ∈ Rm . By Exercise 6.4.1, the set of all least-squares solutions of Ax = y is x̂ + N (A), where x̂ ∈ Rn is any one least-squares solution. Exercise 6.4.13 shows that there is a unique least-squares solution x that lies in col(AT ). Therefore, the set of all least-squares solutions of Ax = y is x + N (A). 
If x is any least-squares solution of Ax = y, then there exists z ∈ N (A) such that x = x + z. Since col(AT ), N (A) are orthogonal, the vectors x, z are orthogonal and hence the Pythagorean theorem implies that x 22 = x 22 + z 22 . If x = x, then z = 0, and hence x 2 > x 2 . This shows that x is the minimum-norm least-squares solution of Ax = y.
15. Let A ∈ Rm×n be given. We define an operator S : Rm → Rn as follows: x = S(y) is the minimum-norm least-squares solution of Ax = y. By the previous exercises, S is well-defined. (a) By the previous exercises, S can be uniquely defined as follows: x = S(y) if and only if AT Ax = AT y and x ∈ col(AT ). Suppose x1 = S(y1 ), x2 = S(y2 ). Then, since AT Ax1 = AT y1 , AT Ax2 = AT y2 , AT A(x1 + x2 ) = AT Ax1 + AT Ax2 = AT y1 + AT y2 = AT (y1 + y2 ). Also, since x1 , x2 ∈ col(AT ) and col(AT ) is a subspace, it follows that x1 + x2 ∈ col(AT ). Thus we have shown that x1 + x2 = S(y1 + y2 ), that is, S(y1 + y2 ) = S(y1 ) + S(y2 ). Similarly, if x = S(y) and α ∈ R, then AT A(αx) = αAT Ax = αAT y = AT (αy) and αx ∈ col(AT ) since x ∈ col(AT ) and col(AT ) is a subspace. Thus S(αy) = αS(y), and we have shown that S is a linear operator. It follows that there is a matrix A† ∈ Rn×m such that S(y) = A† y for all y ∈ Rm . (b)
i. If A ∈ Rn×n is nonsingular, then A† = A−1 . ii. If A ∈ Rm×n , m > n, has full rank, then A† = (AT A)−1 AT . iii. If A ∈ Rm×n , m < n, has rank m, then A† = AT (AAT )−1 .
16. Using the characterization of S from the previous exercise, we see that 0 = S(y) if and only if AT A0 = AT y, that is, if and only if AT y = 0. Thus N (A† ) = N (AT ). 17. We wish to prove that col(A† ) = col(AT ). From the previous exercises, we know that col(A† ) ⊂ col(AT ). Suppose x ∈ col(AT ), and define y = Ax. Then x satisfies AT Ax = AT y, x ∈ col(AT ), from which it follows that x = S(y), that is, x = A† y. This shows that col(AT ) ⊂ col(A† ), and completes the proof. 18. Now we wish to prove that AA† is the matrix defining the orthogonal projection onto col(A). By definition, x = A† y is a least-squares solution of Ax = y, and hence Ax = AA† y is the closest vector in col(A) to y. This shows that AA† y = projcol(A) y, as desired. 19. Next, we wish to prove that A† A is the matrix defining the orthogonal projection onto col(AT ). Let x ∈ Rn be given. We know that col(A† ) = col(AT ), and hence A† Ax ∈ col(AT ). Also, the previous exercise implies that AA† y = y for all y ∈ col(A), and hence AA† Ax = Ax. This implies that A(x − A† Ax) = Ax − AA† Ax = Ax − Ax = 0, and hence x − A† Ax ∈ N (A) = col(AT )⊥ . By the projection theorem, this implies that A† Ax = projcol(AT ) x, and the proof is complete. 20. (a) Exercise 18 implies that AA† y = y for all y ∈ col(A), that is, AA† Ax = Ax for all x ∈ Rn . Thus AA† A = A. (b) We know that AA† y = projcol(A) y, and hence y − AA† y ∈ col(A)⊥ = N (AT ) = N (A† ). It follows that A† (y − AA† y) = 0 for all y ∈ Rm , that is, A† y = A† AA† y for all y ∈ Rm . This proves that A† AA† = A† . (c) By Exercise 19, A† A is the matrix representing an orthogonal projection operator. It follows from Exercise 6.4.15 that A† A is symmetric, that is, A† A = (A† A)T . Direct proof: We have x − A† Ax ∈ col(AT )⊥ for all x ∈ Rn , and hence x · z = (A† Ax) · z for all z ∈ col(AT ) = col(A† ). Therefore, for any x1 , x2 ∈ Rn , (A† Ax1 ) · x2 = (A† Ax1 ) · (A† Ax2 ) = x1 · (A† Ax2 ). This shows that A† A is a symmetric matrix, that is, that (A† A)T = A† A. (d) Since AA† is the matrix of an orthogonal projection operator, AA† = (AA† )T by Exercise 6.4.15. Direct proof: We have y − AA† y ∈ col(A)⊥ for all y ∈ Rm , and hence y · z = (AA† x) · z for all z ∈ col(A). Therefore, for any y1 , y2 ∈ Rm , (AA† y1 ) · y2 = (AA† y1 ) · (AA† y2 ) = y1 · (AA† y2 ). This shows that AA† is a symmetric matrix, that is, that (AA† )T = AA† .
21. We wish to prove that the unique matrix B ∈ Rn×m satisfying ABA = A, BAB = B, BA = (BA)T , AB = (AB)T
(6.3)
is B = A† . We already know that B = A† satisfies these conditions, so let us suppose B ∈ Rn×m satisfies (6.3). We begin by proving the N (B) = N (AT ) and N (B T ) = N (A). We have AB = B T AT , so N (AB) = N (B T AT ). Obviously N (B) ⊂ N (AB). Conversely, ABy = 0 ⇒ BABy = 0 ⇒ By = 0 (since BAB = B), and hence N (AB) ⊂ N (B). Therefore, N (AB) = N (B). Similarly, we have N (AT ) ⊂ N (B T AT ), and B T AT y = 0 ⇒ AT B T AT y = 0 ⇒ (ABA)T y = 0 ⇒ AT y = 0 (since ABA = A). Thus N (B T AT ) ⊂ N (AT ), and therefore N (B T AT ) = N (AT ). Since we showed above that N (AB) = N (B T AT ), it follows that N (B) = N (AT ). A similar proof shows that N (B T ) = N (A). By Corollary 310, we also obtain col(B T ) = col(A) and col(B) = col(AT ). Now, to prove that B = A† , it suffices to prove that By is a least-squares solution of Ax = y and By ∈ col(AT ) for all y ∈ Rm . Notice that, for any y ∈ Rm , By = B(Ax + y − Ax) = By + B(y − Ax), where x is any least-squares solution of Ax = y. With such an x, we have y − Ax ∈ N (AT ) = N (B), and hence By = BAx. But then AT ABy = AT ABAx = AT Ax = AT y (using both ABA = A and AT Ax = AT y, the latter following from the fact that x is a least-squares solution of Ax = y). Thus By satisfies the normal equations and hence is a least-squares solution of Ax = y. Since we showed above that col(B) = col(AT ), we see that By ∈ col(AT ). This completes the proof that B = A† . 22. We wish to determine if (A† )† = A holds for all A ∈ Rm×n . This identity does hold, as shown by the result of the previous exercise. The conditions (6.3) that characterize A† are symmetric in A and A† ; therefore, if we regard A† as the original matrix, then (6.3) shows that A is the pseudoinverse of A† .
6.7
Complex inner product spaces
1. The projection of v onto S is
w = ( 2/3 − (4/9)i, 4/9 + (2/3)i, 1/3 + (4/9)i ).
2. Applying the Gram-Schmidt process yields the orthogonal basis ⎧⎡ ⎤⎫ ⎤ ⎡ 10 7 ⎤ ⎡ 21 4 ⎪ ⎪ − 9 + 2i − i −i ⎨ ⎬ 275 5 ⎥ ⎢ ⎥ 8 53 1 ⎣ 1 + 4i ⎦ , ⎢ + 89 i ⎦ , ⎣ − 275 − 275 i ⎦ . ⎣ − 18 ⎪ ⎪ ⎩ ⎭ 181 168 0 i 275 + 55 i 3. If m = n, then 0 imπx inπx 1 e ,e = 2
1
eimπx e−inπx dx
−1
1 =
−1
ei(m−n)πx dx
%1 % 1 ei(m−n)πx %% i(m − n)π −1 1 i(m−n)π e = − e−i(m−n)π = 0. i(m − n)π
=
The last equality follows from the fact that e^{iθ} is 2π-periodic. This shows that the given set S is orthogonal. (Notice that the calculation is valid also if n = 0, in which case the second function is the constant function 1.)
4. The best approximation to f from S is
g(x) = −(i/(2π)) e^{−2πix} + (i/π) e^{−πix} + 0·1 − (i/π) e^{πix} + (i/(2π)) e^{2πix},
which simplifies to
g(x) = (2 sin(πx) − sin(2πx))/π.
5. The best approximation to f from P2 is π 2 − 12 1 1 2i 24 2 − 2 x− + . x − x + π π 2 60π 3 6 6. The least-squares solution to Ax = b (found by solving the normal equations) is 5 + 28 − 13 13 i x= . 6 11 13 + 13 i 7. The least-squares solution to Ax = b (found by solving the normal equations) is 43 − 14 − 2i x= . 1 − 10 7 − 7i 8. Let V be a complex inner product space, and suppose u, v ∈ V satisfy u, v = 0. Then u + v 2 = u + v, u + v = u, u + v, u + u, v + v, v . By assumption, u, v = 0, and v, u = u, v = 0; therefore u + v 2 = u, u + v, v = u 2 + v 2 . The proof that u + v 2 = u 2 + v 2 is similar. 9. (a) Let u = (1, 1), v = (1, −1 + i) ∈ C2 . Then a direct calculation shows that u + v 22 = u 22 + v 22 and u, v 2 = −i = 0. (b) Suppose V is a complex inner product space. If u, v ∈ V and u + v 2 = u 2 + v 2 , then u, u + v, v + u, v + v, u = u, u + v, v ⇔ u, v + v, u = 0 ⇔ u, v + u, v = 0. For any z = x + iy ∈ C, z + z = 0 is equivalent to 2x = 0 or, equivalently, x = 0. Thus u + v 2 = u 2 + v 2 holds if and only if the real part of u, v is zero. 10. Let V be an inner product space over C, let S be a finite-dimensional subspace of V , and let v ∈ V be given. We first prove the following result: A vector w ∈ S satisfies v − w < v − y for all y ∈ S, y = w, if and only if Re v − w, z = 0 for all z ∈ S. To see this, choose any z ∈ S, z = 0, and note that y = w + tz belongs to S for all t ∈ R. We have v − y 2 = v − w − tz 2 = v − w − tz, v − w − tz = v − w, v − w − t v − w, z − t z, v − w + t2 z, z = v − w 2 − t( v − w, z + v − w, z ) + t2 z 2 = v − w 2 − 2tRe v − w, z + t2 z 2 .
Similar to the proof of the theorem in the real case, we see that v − w < v − (w + tz) for all t ∈ R, t = 0, if and only if Re v − w, z = 0. Since every y ∈ S can be represented in the form y = w + tz for some z ∈ S, t ∈ R (just take t = 1 and z = y − w), it follows that v − w < v − y for all y ∈ S, y = w, if and only if Re v − w, z = 0 for all z ∈ S. Next notice that if z ∈ S, then iz also belongs to S, and v − w, iz = −i v − w, z ⇒ Re v − w, iz = Re(−i v − w, z ) = Im( v − w, z ). It follows that Re v − w, z = 0 for all z ∈ S is equivalent to v − w, z = 0 for all z ∈ S. We have now proved part 2 of Theorem 316. The proofs of parts 1 and 3 follow just as in the real case (Theorem 289). 11. Let V and W be finite-dimensional inner product spaces over C, and let L : V → W be linear. We wish to prove that there is a unique linear operator K : W → V such that L(v), w W = v, K(w) V for all v ∈ V, w ∈ W.
(6.4)
We will show that, given bases V, W of V , W , respectively, there is a unique B ∈ Cn×m such that K : W → V defined by [K]W,V = B satisfies (6.4). First, let v ∈ V , w ∈ W be given, and define α = [v]V ∈ Cn , β = [w]W ∈ Cm . Then m 3 2 n L(v), w W =
L
αi vi i=1
,
βj wj j=1
W
m
n
αi βj L(vi ), wj W
= i=1 j=1
⎛
n
αi ⎝
= i=1 n
⎛
i=1
L(vi ), wj W βj ⎠
j=1 m
αi ⎝
=
⎞
m
⎞ wj , L(vi ) W βj ⎠.
j=1
Thus L(v), w W = α, M β Cn , where M ∈ Cn×m is defined by Mij = wj , L(vi ) W , i = 1, . . . , n, j = 1, . . . , m. Now let B ∈ Cn×m be given and define K : W → V by [K]W,V = B. Then 3 2 n n v, K(w) V =
αi vi , i=1 n
(Bβ)j vj j=1
V
n
αi (Bβ)j vi , vj V
= i=1 j=1 n
= i=1 n
= i=1
⎛
αi ⎝ ⎛ αi ⎝
n
⎞ vi , vj V (Bβ)j ⎠
j=1 n
⎞ vj , vi V (Bβ)j ⎠
j=1
= α, GBβ Cn ,
where G is the Gram matrix for the basis V. Thus we see that (6.4) holds if and only if n
α, M β C = α, GBβ Cn for all α ∈ Cn , β ∈ Cm . This is true if and only if GB = M . Since G is invertible, we finally see that (6.4) holds if and only if B = G−1 M . Thus L∗ exists, is unique, and is defined by [L∗ ]W,V = G−1 M . 12. Let X be a finite-dimensional inner product space over C, and suppose that f : X → C is linear. We wish to prove that there exists a unique u ∈ X such that f (x) = x, u for all x ∈ X.
(6.5)
Let X = {x1 , . . . , xn } be a basis for X. We first assume that u exists, and write u = (6.5) implies that, for each i = 1, . . . , n, 3 2 n f (xi ) =
xi ,
n j=1 αj xj .
Then
αj xj j=1
n
xi , xj αj
= j=1 n
xj , xi αj
= j=1
= (Gα)i , or (Gα)i = f (xi ), where G is the Gram matrix for X . Define b ∈ Cn by bi = f (xi ) and α ∈ Cn by n α = G−1 b. Then we have shown that if there exists u ∈ V such that (6.5) holds, then u = j=1 αj xj . In particular, there is at most one such u. Now define u by u = j=1 αj xj , where α = G−1 b and G, b are defined as above. Then any x ∈ X can n be written as x = i=1 γi xi , and ⎛ ⎞ 3 2 n
n
x, u =
n
γj xj , j=1
n
αi xi i=1
n
n
γj αi xj , xi =
= j=1 i=1
i=1
n
⎝
xj , xi γj ⎠ αi
j=1
= Gγ, α Cn = γ, Gα Cn = γ, b Cn n n =
γi f (xi ) = f i=1
γi xi
= f (x).
i=1
Thus u satisfies (6.4), and we have proved the existence of u. 13. Let A ∈ Cn×n be Hermitian and suppose λ ∈ C, x ∈ Cn satisfy Ax = λx, x = 0. Then λ x, x Cn = λx, x Cn = Ax, x Cn ⇒ λ =
Ax, x Cn . x, x Cn
By Theorem 319, Ax, x Cn ∈ R, and x, x Cn ∈ R by the definition of an inner product. This proves that λ ∈ R, that is, every eigenvalue of a Hermitian matrix is real.
6.8
More on polynomial approximation
1. (a) The best quadratic approximation to f in the (unweighted) L²(−1,1) norm is
(e − e⁻¹)/2 + (3/e) x + ((15e − 105e⁻¹)/4)(x² − 1/3).
(b) The best quadratic approximation to f in the weighted L²(−1,1) norm is (approximately)
1.2660659 + 1.1303182x + 0.27149534(2x² − 1)
(note that the integrals were computed numerically). The following graph shows the error in the L² approximation (solid curve) and the error in the weighted L² approximation (dashed curve):
(Figure: absolute errors of the two quadratic approximations on [−1, 1].)
We see that the weighted L² approximation has the smaller maximum error.
2. (a) The best cubic approximation to f in the (unweighted) L²(−1,1) norm is
(e − e⁻¹)/2 + (3/e) x + ((15e − 105e⁻¹)/4)(x² − 1/3) + ((1295e⁻¹ − 175e)/4)(x³ − (3/5)x).
(b) The best cubic approximation to f in the weighted L²(−1,1) norm is (approximately)
1.2660659 + 1.1303182x + 0.27149534(2x² − 1) + 0.044336850(4x³ − 3x)
(note that the integrals were computed numerically). The following graph shows the error in the L² approximation (solid curve) and the error in the weighted L² approximation (dashed curve):
(Figure: absolute errors of the two cubic approximations on [−1, 1].)
We see that the weighted L² approximation has the smaller maximum error.
3. (a) The orthogonal basis for P3 on [−1, 1] is
{ 1, x, x² − 1/3, x³ − (3/5)x }.
Transforming this to the interval [0, π], we obtain the following orthogonal basis for P3 as a subspace of L²(0, π):
{ 1, (2/π)t − 1, (4/π²)t² − (4/π)t + 2/3, (8/π³)t³ − (12/π²)t² + (24/(5π))t − 2/5 }.
The best cubic approximation to f(t) = sin(t) on [0, π] is
p(t) = 2/π + (15(π² − 12)/π³)((4/π²)t² − (4/π)t + 2/3).
(b) The orthogonal basis for P3 on [−1, 1], in the weighted L² inner product, is
{ 1, x, 2x² − 1, 4x³ − 3x }.
Transforming this to the interval [0, π], we obtain the following orthogonal basis for P3 as a subspace of L²(0, π):
{ 1, (2/π)t − 1, (8/π²)t² − (8/π)t + 1, (32/π³)t³ − (48/π²)t² + (18/π)t − 1 }.
The best cubic approximation to f(t) = sin(t) on [0, π], in the weighted L² norm, is (approximately)
q(t) = 0.47200122 − 0.49940326((8/π²)t² − (8/π)t + 1)
(the integrals were computed numerically).
(c) The following graph shows the error in the ordinary L² approximation (solid curve) and the error in the weighted L² approximation (dashed curve):
(Figure: absolute errors of p(t) and q(t) on [0, π].)
The second approximation has a smaller maximum error. 4. Let f ∈ C[a, b] and define f˜ ∈ C[−1, 1] by f˜(x) = f
b−a (x + 1) . a+ 2
Let p̃ be the best approximation, in the (unweighted) L2 (−1, 1) norm, to f˜ from Pn . Define p ∈ Pn by 2 (t − a) . p(t) = p̃ −1 + b−a We wish to prove that p is the best approximation to f , in the L2 (a, b) norm, from Pn . Let q be any element of Pn , and define q̃ ∈ Pn by b−a (x + 1) . q̃(x) = q a + 2 Then we know that f˜ − p̃ L2 (−1,1) ≤ f˜ − q̃ L2 (−1,1) , that is, 1 1 (f˜(x) − p̃(x))2 dx ≤ (f˜(x) − q̃(x))2 dx. −1
−1
Changing variables in both integrals (x = −1 + (2/(b − a))(t − a)) yields b b 2 2 2 (f (x) − p(x)) dx ≤ (f (x) − q(x))2 dx b−a a b−a a (since the change of variables transforms f˜ to f , p̃ to p, and q̃ to q). But this implies f − p L2 (a,b) ≤ f − q L2 (a,b) . Since this holds for all q ∈ Pn , it follows that p is the best approximation to f in the L2 (a, b) norm. 5. (a) Applying the Gram-Schmidt process to the standard basis for P2 yields the following orthogonal basis: {1, x − 1, x2 − 4x + 2}. (b) The best approximation from P2 , relative to the given inner product, to f (x) = x3 is 6 + 18(x − 1) + 9(x2 − 4x + 2) = 9x2 − 18x + 6.
6.9
The energy inner product and Galerkin’s method
1. Consider the BVP
−d/dx( k(x) du/dx ) = f(x), 0 < x < ℓ,
u(0) = 0, u(ℓ) = 0,
where ℓ = 1, k(x) = x + 1, and f(x) = −4x − 1. The exact solution is u(x) = x² − x. A direct calculation shows that the stiffness matrix K is given by
Kii = 2/h + 2i,  K_{i,i+1} = −1/h − i − 1/2
(with K symmetric and Kij = 0 unless j = i − 1 or j = i or j = i + 1), and the load vector F is given by Fi =
3i 1 4i + 5 − − , i = 1, . . . , n − 1. 500h 25 5
Comparing U = K⁻¹F (whose components are the approximate nodal values of u) to the exact u shows that the solution is highly accurate in this case.
2. (a) The stiffness matrix has either three or five nonzeros per row, with
K_{2i−1,2i−1} = 16/(3h),  K_{2i−1,2i−2} = K_{2i−1,2i} = −8/(3h)
and
K_{2i,2i} = 14/(3h),  K_{2i,2i−1} = K_{2i,2i+1} = −8/(3h),  K_{2i,2i−2} = K_{2i,2i+2} = 1/(3h).
The load vector F ∈ R^{2n−1} is defined by
F_{2i−1} = (8 cos(πh − πhi) − 8 cos(πhi) + πh(4 sin(πh − πhi) − 4 sin(πhi)))/(π³h²),
F_{2i} = (6 sin(πhi))/(π²h) + (2 cos(πh) sin(πhi))/(π²h) − (8 sin(πh) sin(πhi))/(π³h²)
for i = 1, ..., n − 1.
(b) Using n = 10 elements, we compute the piecewise quadratic approximation to the exact solution u(x) = sin(πx)/π². The following graph shows the errors in the computed nodal values, which are on the order of 10⁻⁷:
(Figure: nodal errors of the piecewise quadratic approximation on [0, 1].)
(It is interesting to notice that the nodal values at endpoint nodes are much more accurate than the nodal values at midpoint nodes.) 3. We wish to explain why the energy inner product dv du a(u, v) = k(x) (x) (x) dx dx dx 0 defines an inner product on V = {v : [0, ] → R | v(0) = v( ) = 0}. We recall that k is assumed to be positive: k(x) > 0 for all x ∈ [0, ]. It is easy to prove that a(·, ·) is symmetric, bilinear, and satisfies a(v, v) ≥ 0 for all v ∈ V ; the only interesting point is showing that a(v, v) = 0 only if v = 0. We have 2 dv a(v, v) = 0 ⇒ (x) k(x) dx = 0; dx 0
since the integrand is nonnegative, this last equation is possible only if the integrand is identically zero:
k(x) (dv/dx(x))² = 0 for all x ∈ [0, ℓ].
Moreover, since k is positive on [0, ℓ], we see that dv/dx must be identically zero on [0, ℓ], which implies that v is a constant function. Since v ∈ V, we know that v(0) = 0, and hence v is the zero function. Thus a(v, v) = 0 implies that v = 0, and the proof is complete.
4. (a) The midpoints are x_{2i−1}, i = 1, ..., n. We have
ψ_{2i−1}(x) = −(4/h²)(x − ih + h)(x − ih) for x_{2i−2} ≤ x ≤ x_{2i}, and ψ_{2i−1}(x) = 0 otherwise.
(b) The element endpoints are x_{2i}, i = 1, ..., n − 1 (we ignore x0 = 0 and x_{2n} = ℓ because the boundary conditions are imposed at those points). We have
ψ_{2i}(x) = (2/h²)(x − ih + h)(x − ih + h/2) for x_{2i−2} ≤ x ≤ x_{2i},
ψ_{2i}(x) = (2/h²)(x − ih − h/2)(x − ih − h) for x_{2i} ≤ x ≤ x_{2i+2},
and ψ_{2i}(x) = 0 otherwise.
5. The variational form of the BVP is to find u ∈ V such that
∫₀^ℓ [ k(x) du/dx(x) dv/dx(x) + p(x) u(x) v(x) ] dx = ∫₀^ℓ f(x) v(x) dx for all v ∈ V.
If the basis for Vn is {φ1, ..., φn}, then Galerkin's method results in the system (K + M)U = F, where K and F are the same stiffness matrix and load vector as before, and M is the mass matrix:
Mij = ∫₀^ℓ p(x) φj(x) φi(x) dx,  i, j = 1, ..., n.
6. We wish to find the approximate solution of
−d²u/dx² + u = sin(πx), 0 < x < 1,
u(0) = 0, u(1) = 0
on a uniform mesh with ten elements, using piecewise linear functions. According to the last exercise, the nodal values of the approximate solution are found by solving (K + M)U = F, where K and F are the stiffness matrix and load vector derived in the text, and M is the mass matrix. In this case (piecewise linear finite elements on a uniform mesh), M is the symmetric tridiagonal matrix satisfying
Mii = 2h/3,  M_{i,i+1} = h/6.
The following graph shows the piecewise linear approximation to u:
(Figure: the computed piecewise linear approximation to u on [0, 1].)
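Exercise 6 is easy to reproduce numerically. The sketch below (NumPy assumed; the element count n = 10 and the composite trapezoid rule used for the load integrals are our own choices) assembles the tridiagonal stiffness and mass matrices described above, solves (K + M)U = F, and compares the nodal values with the exact solution u(x) = sin(πx)/(π² + 1) of this BVP.

```python
import numpy as np

n = 10                                   # number of elements
h = 1.0 / n
nodes = np.linspace(0.0, 1.0, n + 1)
m = n - 1                                # number of interior nodes

K = (1.0/h) * (2*np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1))   # stiffness (k(x) = 1)
M = (h/6.0) * (4*np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1))   # mass: 2h/3 diag, h/6 off-diag

# load vector F_i = integral of sin(pi x) phi_i(x), via a fine composite trapezoid rule
F = np.zeros(m)
for i in range(1, n):
    s = np.linspace(nodes[i-1], nodes[i+1], 401)
    hat = 1.0 - np.abs(s - nodes[i]) / h          # hat function phi_i on its support
    g = np.sin(np.pi * s) * hat
    F[i-1] = np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(s))

U = np.linalg.solve(K + M, F)
u_exact = np.sin(np.pi * nodes[1:-1]) / (np.pi**2 + 1)
print(np.max(np.abs(U - u_exact)))       # small nodal error
```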
7. The projection of f onto V can be computed just as described by the projection theorem. The projection n−1 is i=1 Vi φi , where V ∈ Rn−1 satisfies the system M V = B. The Gram matrix is called the mass matrix in this context (see the solution of Exercise 5): Mij = The vector B is defined by Bi =
0
φj (x)φi (x) dx, i, j = 1, . . . , n − 1.
0 f (x)φi (x) dx, i = 1, . . . , n − 1.
8. Let u : [0, ] → R be smooth and let a(·, ·) be the energy inner product defined by the weight function k(x) = 1: u (x)v (x) dx. a(u, v) = 0
Let Vn be the space of all continuous piecewise linear functions defined on a (possibly nonuniform) mesh on [0, ]. The mesh elements are [x0 , x1 ], [x1 , x2 ], . . . , [xn−1 , xn ], where 0 = x0 < x1 < · · · < xn = . Let p be the piecewise linear interpolant of u. We wish to prove that p is the best approximation to u from Vn under the energy norm. By the projection theorem, it suffices to prove that a(u − p, v) = 0 for all v ∈ Vn . For any v ∈ Vn , we have n
a(u − p, v) =
xi
(u (x) − p (x))v (x) dx
i=1 xi−1 n
x (u(x) − p(x))v (x)|xii−1 −
= i=1
xi
(u(x) − p(x))v (x) dx .
xi−1
Since v is piecewise linear, it is linear on [xi−1 , xi ], and hence v (x) = 0 for all x ∈ (xi−1 , xi ). Also, v (xi−1 ) = v (xi ) (notice that we are using limx→x− v (x) for v (xi−1 ), and similarly for v (xi )). If we i−1 write ci = v (xi−1 ) = v (xi ), then we have xi
(u (x) − p (x))v (x) dx = ci (u(xi ) − p(xi ) − u(xi−1 ) + p(xi−1 )) = 0
xi−1
since p(xi ) = u(xi ), p(xi−1 ) = u(xi−1 ) (because p interpolates u at the nodes). This holds for all i, and hence we have shown that a(u − p, v) = 0 for all v ∈ Vn . This completes the proof.
6.10
Gaussian quadrature
1. We wish to find the Gaussian quadrature rule with n = 3 quadrature nodes (on the reference interval [−1, 1]). We know that the nodes are the roots of the third orthogonal polynomial, p3(x). Therefore, the nodes are x1 = −√(3/5), x2 = 0, x3 = √(3/5). To find the weights, we need the Lagrange polynomials defined by these nodes, which are
L1(x) = (5/6) x (x − √(3/5)),  L2(x) = −(5/3) x² + 1,  L3(x) = (5/6) x (x + √(3/5)).
Then
w1 = ∫_{−1}^{1} L1(x) dx = 5/9,
w2 = ∫_{−1}^{1} L2(x) dx = 8/9,
w3 = ∫_{−1}^{1} L3(x) dx = 5/9.
2. The needed orthogonal polynomials are 1, x, x² − 2/5, x³ − 9x/14. The quadrature rules are defined by the following weights and nodes:
n = 1: x1 = 0, w1 = 8/3,
n = 2: x1 = −√(2/5), x2 = √(2/5), w1 = w2 = 4/3,
n = 3: x1 = −3/√14, x2 = 0, x3 = 3/√14, w1 = 112/135, w2 = 136/135, w3 = 112/135.
3. Let w(x) = 1/√(1 − x²). We wish to find the weighted Gaussian quadrature rule
∫_{−1}^{1} w(x) f(x) dx ≈ Σ_{i=1}^{n} wi f(xi),
where n = 3. The nodes are the roots of the third orthogonal polynomial under this weight function, which is T3(x) = 4x³ − 3x. The nodes are thus x1 = −√(3/4), x2 = 0, x3 = √(3/4). The corresponding Lagrange polynomials are
L1(x) = (2/3) x (x − √(3/4)),  L2(x) = −(4/3) x² + 1,  L3(x) = (2/3) x (x + √(3/4)).
The weights are then
w1 = ∫_{−1}^{1} w(x) L1(x) dx = π/3,
w2 = ∫_{−1}^{1} w(x) L2(x) dx = π/3,
w3 = ∫_{−1}^{1} w(x) L3(x) dx = π/3.
4. Suppose distinct nodes x1, x2, ..., xn are given and the quadrature rule
∫_{−1}^{1} f(x) dx ≈ Σ_{i=1}^{n} wi f(xi)
is exact for all f ∈ P_{n−1}. Let L1, ..., Ln be the Lagrange polynomials corresponding to these nodes. Since each Lj belongs to P_{n−1}, we have
∫_{−1}^{1} Lj(x) dx = Σ_{i=1}^{n} wi Lj(xi) = wj
(since Lj(xi) = 0 for i ≠ j and Lj(xj) = 1). Thus the weights are uniquely determined by the requirement that the quadrature rule be exact for f ∈ P_{n−1}.
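The three-point rule of Exercise 1 matches the nodes and weights produced by NumPy's Gauss-Legendre routine; a short sketch:

```python
import numpy as np

nodes, weights = np.polynomial.legendre.leggauss(3)
print(nodes)     # approximately [-0.77459667, 0, 0.77459667] = [-sqrt(3/5), 0, sqrt(3/5)]
print(weights)   # approximately [0.55555556, 0.88888889, 0.55555556] = [5/9, 8/9, 5/9]

# a 3-point Gauss rule is exact for polynomials of degree <= 2n - 1 = 5; test it on x^4
approx = np.sum(weights * nodes**4)
print(np.isclose(approx, 2.0 / 5.0))   # exact integral of x^4 over [-1, 1] is 2/5
```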
6.11
The Helmholtz decomposition
1. Let Ω be a domain in R3 , and let φ, u be a scalar field and a vector field, respectively, defined on Ω. We have 3 3 3 3 ∂ui ∂ ∂φ ∂ui ∂φ ∇ · (φu) = = (φui ) = ui + φ ui + φ = ∇φ · u + φ∇ · u. ∂xi ∂xi ∂xi ∂xi ∂xi i=1 i=1 i=1 i=1 2. Let φ : Ω → R be a scalar field, where Ω is a domain in R3 . Then ∂φ ∂φ ∂φ ∇φ = , , ∂x1 ∂x2 ∂x3
and ∇ × ∇φ =
∂2φ ∂2φ ∂2φ ∂2φ ∂ 2φ ∂2φ − , − , − ∂x2 ∂x3 ∂x3 ∂x2 ∂x3 ∂x1 ∂x1 ∂x3 ∂x1 ∂x2 ∂x2 ∂x1
.
Since φ is assumed to be smooth, mixed partial derivatives are equal (∂ 2 φ/∂xi ∂xj = ∂ 2 φ/∂xj ∂xi ), and hence we see that ∇ × ∇φ = 0. 3. Let φ : Ω → R be a smooth scalar field. Then 3
∂ ∇ · ∇φ = ∂x i i=1
∂φ ∂xi
3
=
∂2φ ∂2φ ∂2φ ∂2φ = + + . 2 ∂xi ∂x21 ∂x22 ∂x23 i=1
4. Let φ : Ω → R, v : Ω → R3 be smooth scalar and vector fields, respectively. (a) We have
∇×v =
∂v2 ∂v1 ∂v3 ∂v2 ∂v1 ∂v3 − , − , − ∂x2 ∂x3 ∂x3 ∂x1 ∂x1 ∂x2
,
and therefore ∇·∇×v =
∂ 2 v2 ∂ 2 v1 ∂ 2 v3 ∂ 2 v2 ∂ 2 v1 ∂ 2 v3 − + − + − . ∂x1 ∂x2 ∂x1 ∂x3 ∂x2 ∂x3 ∂x2 ∂x1 ∂x3 ∂x1 ∂x3 ∂x2
Since v is smooth, mixed partial derivatives are equal, which implies that all the terms cancel. Thus ∇ · ∇ × v = 0. (b) It follows that
(∇ × v) · n =
Ω
∂Ω
∇·∇×v = 0
(since ∇ · ∇ × v = 0). (c) Thus, if φ, v satisfy u = ∇φ + ∇ × v, then u·n= {∇φ · n + (∇ × v) · n} = ∂Ω
∂Ω
∂Ω
∇φ · n +
(∇ × v) · n =
∂Ω
∇φ · n. ∂Ω
Chapter 7
The spectral theory of symmetric matrices
7.1
The spectral theorem for symmetric matrices
1. Let A ∈ Rm×n . Then (AT A)T = AT (AT )T = AT A and, for every x ∈ Rn , x · AT Ax = (Ax) · (Ax) = Ax 22 ≥ 0. Therefore, AT A is symmetric and positive semidefinite. 2. Let A ∈ Rn×n be symmetric. First assume that A is positive semidefinite, and let λ ∈ R, x ∈ Rn be an eigenvalue/eigenvector pair of A. Then λ(x · x) = (λx) · x = (Ax) · x ≥ 0 ⇒ λ =
((Ax) · x)/(x · x) ≥ 0.
This shows that every eigenvalue of A is nonnegative. Conversely, suppose all the eigenvalues of A are nonnegative. Let the eigenvalues of A (listed according to multiplicity) be λ1, ..., λn, and let x1, ..., xn be corresponding eigenvectors, chosen so that {x1, ..., xn} is orthonormal. Then A = XDX^T, where D is the diagonal matrix with diagonal entries λ1, ..., λn and X = [x1| ··· |xn]. For any y ∈ R^n, write u = X^T y. Then
y · Ay = y · (XDX^T y) = (X^T y) · D(X^T y) = u · Du = Σ_{i=1}^{n} λi ui².
Since λi ≥ 0 and ui² ≥ 0 for all i, this shows that y · (Ay) ≥ 0. This holds for all y ∈ R^n, and hence A is positive semidefinite.
3. Let A ∈ R^{n×n} be symmetric and positive definite. Define ⟨x, y⟩ = x · Ay for all x, y ∈ R^n. We wish to show that ⟨·,·⟩ is an inner product on R^n. For any x, y ∈ R^n, we have ⟨x, y⟩ = x · Ay = Ax · y = y · Ax = ⟨y, x⟩, and thus ⟨·,·⟩ is symmetric. Next, if x, y, z ∈ R^n and α, β ∈ R, then
⟨αx + βy, z⟩ = (αx + βy) · Az = α(x · Az) + β(y · Az) = α⟨x, z⟩ + β⟨y, z⟩.
This shows that ⟨·,·⟩ is bilinear. Finally, because A is positive definite, we have ⟨x, x⟩ = x · Ax ≥ 0 for all x ∈ R^n and ⟨x, x⟩ = 0 if and only if x = 0. Thus we have shown that ⟨·,·⟩ is an inner product.
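Exercises 1 and 2 can be illustrated numerically: for any A, the matrix A^T A is symmetric with nonnegative eigenvalues, and its quadratic form is nonnegative. A sketch (NumPy assumed; A is a random example):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
M = A.T @ A                                      # symmetric positive semidefinite (Exercise 1)

print(np.allclose(M, M.T))                       # M is symmetric
print(np.all(np.linalg.eigvalsh(M) >= -1e-12))   # all eigenvalues nonnegative (Exercise 2)

x = rng.standard_normal(3)
print(x @ M @ x >= 0)                            # quadratic form is nonnegative
```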
4. Let U ∈ Rn×n be an orthogonal matrix. Then, for any x, y ∈ Rn , (U x) · (U y) = x · (U T U y) = x · (Iy) = x · y. Also, for any x ∈ Rn , Ux 2 =
. √ (U x) · (U x) = x · x = x 2 .
Thus multiplication by U preserves both norms and dot products. 5. Suppose A ∈ Rn×n satisfies
(Ax) · (Ay) = x · y for all x, y ∈ Rn .
Then x · (AT Ay) = x · y for all x, y ∈ Rn ⇒ AT Ay = y for all y ∈ Rn (by Corollary 275) ⇒ AT A = I. The last step follows from the uniqueness of the matrix representing a linear operator on Rn . Since AT A = I, we see that A is orthogonal. 6. Suppose A ∈ Rn×n satisfies Ax 2 = x 2 for all x ∈ Rn . In this case also, we can show that A is orthogonal. For all x, y ∈ Rn , we have A(x + y) 22 = x + y 22 ⇒
Ax + Ay 22 = x + y 22
⇒ (Ax + Ay) · (Ax + Ay) = (x + y) · (x + y) ⇒ (Ax) · (Ax) + 2(Ax) · (Ay) + (Ay) · (Ay) = x · x + 2x · y + y · y ⇒ Ax 22 + 2(Ax) · (Ay) + Ay 22 = x 22 + 2x · y + y 22 ⇒ (Ax) · (Ay) = x · y. Thus multiplication by A preserves inner products and, by the previous exercise, A must be orthogonal. 7. Let A ∈ Cn×n be a Hermitian positive definite matrix, and let λ1 , . . . , λn be the eigenvalues of A (listed according to multiplicity). Then all the eigenvalues of A are positive (by Theorem 342), and there exists a unitary matrix U ∈ Cn×n such that U ∗ AU = D is diagonal with diagonal entries √ √ λ1 , . . . , λn . Define B = U D1/2 U ∗ , where D1/2 is the diagonal matrix with diagonal entries λ1 , . . . , λn . We then have B 2 = U D1/2 U ∗ U D1/2 U ∗ = U D1/2 D1/2 U ∗ = U DU ∗ = A, and B is the square root of A. 8. Let X be a finite-dimensional inner product space over C, and let T : X → X be a self-adjoint linear operator. (a) If λ ∈ C, x ∈ X is an eigenvalue/eigenvector pair of T (T (x) = λx, x = 0), then λ x, x = λx, x = T (x), x = x, T (x) = x, λx = λ x, x . This shows that (λ − λ) x, x = 0. Since x = 0, x, x = 0, and hence λ = λ must hold, which implies that λ is real. Thus T has only real eigenvalues. (b) Now suppose λ1 , λ2 ∈ R are distinct eigenvalues of T , with corresponding eigenvectors x1 , x2 . Then λ1 x1 , x2 = λ1 x1 , x2 = T (x1 ), x2 = x1 , T (x2 ) = x1 , λ2 x2 = λ2 x1 , x2 , which shows that (λ1 − λ2 ) x1 , x2 = 0. Since λ1 = λ2 , it follows that x1 , x2 = 0, and hence eigenvectors corresponding to distinct eigenvalues are orthogonal.
9. Let X be a finite-dimensional inner product space over R with basis X = {x1 , . . . , xn }, and assume that T : X → X is a self-adjoint linear operator (T ∗ = T ). Define A = [T ]X ,X . Let G be the Gram matrix for the basis X , and define B ∈ Rn×n by B = G1/2 AG−1/2 , where G1/2 is the square root of G (see Exercise 7) and G−1/2 is the inverse of G1/2 . (a) We wish to prove that B is symmetric. Let x, y ∈ X be given, and define α = [x]X , β = [y]X . Then, since [T (x)]X = A[x]X = Aα, we have T (x) = ni=1 (Aα)i xi . Therefore, 3 2 n n n n T (x), y =
(Aα)i xi , i=1
(Aα)i βj xi , xj
=
βj xj j=1
i=1 j=1 n
=
⎛
(Aα)i ⎝
i=1
⎞
n
Gij βj ⎠
j=1
= (Aα) · Gβ = α · (AT G)β. Similarly, [T (y)]X = Aβ, and an analogous calculation shows that x, T (y) = α · GAβ. Since T (x), y = x, T (y) , it follows that α · (AT G)β = α · GAβ for all α, β ∈ Rn , and hence GA = AT G. This implies that (GA)T = AT GT = AT G = GA (using the fact that G is symmetric), and hence GA is symmetric. (Thus the fact that T is self-adjoint does not imply that A is symmetric, but rather than GA is symmetric.) Now, it is easy to see that G−1/2 is symmetric, and if C is symmetric, then so is XCX T for any square matrix X. Therefore, G−1/2 GA(G−1/2 )T = G−1/2 GAG−1/2 = G1/2 AG−1/2 is symmetric, which is what we wanted to prove. (b) If λ, u is an eigenvalue/eigenvector pair of B, then Bu = λu ⇒ G1/2 AG−1/2 u = λu ⇒ AG−1/2 u = λG−1/2 u. Since u = 0 and G−1/2 is nonsingular, G−1/2 u is also nonzero, and hence λ, G−1/2 u is an eigenpair of A. (c) Since B is symmetric, there exists an orthonormal basis {u1 , . . . , un } of Rn consisting of eigenvectors of B. Let λ1 , . . . , λn be the corresponding eigenvalues. From above, we know that {G−1/2 u1 , . . . , G−1/2 un } is a basis of Rn consisting of eigenvectors of A. Define {y1 , . . . , yn } ⊂ X by [yi ]X = G−1/2 ui , that is, n yi = G−1/2 ui xk , i = 1, . . . , n. k
k=1
Notice that
2 n yi , yj =
G−1/2 ui
k=1 n n
= k=1 =1
n k
G−1/2 ui
k
3 −1/2
G
xk ,
=1
G−1/2 uj
= G−1/2 ui · G G−1/2 uj = ui · G−1/2 GG−1/2 uj 1, i = j, = ui · uj = 0, i = j.
uj
x
xk , x
CHAPTER 7. THE SPECTRAL THEORY OF SYMMETRIC MATRICES
196
This shows that {y1 , . . . , yn } is an orthonormal basis for X. Also, [T (yi )]X = A[yi ]X = AG−1/2 ui = λi G−1/2 ui = λi [yi ]X = [λi yi ]X , which proves that T (yi ) = λi yi . Thus each yi is an eigenvector of T , and we have proved that there exists an orthonormal basis of X consisting of eigenvectors of T . 10. Let A ∈ Cn×n be Hermitian, and suppose λ ∈ C is an eigenvalue of A. We wish to prove that λ is real. Let x ∈ Cn be an eigenvalue corresponding to λ. Then x = 0 and λ x, x Cn = λx, x Cn = Ax, x Cn = x, Ax Cn = x, λx Cn = λ x, x Cn , which implies that (λ − λ) x, x Cn = 0. Since x, x Cn = 0, this implies that λ = λ, and hence λ ∈ R. 11. The proof of Theorem 338 is exactly the same as the proof of Theorem 331, with a few changes in notation. The eigenvalue λ still belongs to R, but now the vectors x1 , . . . , xn lie in Cn , the dot products xi · Axj are replaced by inner products xi , Axj Cn , and the matrix X is unitary, with B defined by X ∗ AX. Otherwise, the proof proceeds virtually unchanged. 12. Let A ∈ Cn×n be Hermitian, let λ1 , λ2 ∈ R be distinct eigenvalues of A, and let x1 , x2 ∈ Cn be corresponding eigenvectors. Then λ1 x1 , x2 Cn = λ1 x1 , x2 Cn = Ax1 , x2 Cn = x1 , Ax2 Cn = x1 , λ2 x2 Cn = λ2 x1 , x2 Cn , which shows that (λ1 − λ2 ) x1 , x2 Cn = 0. Since λ1 = λ2 , it follows that x1 , x2 Cn = 0, and hence eigenvectors corresponding to distinct eigenvalues are orthogonal. 13. Let A ∈ Cn×n be Hermitian, and let λ1 , . . . , λn ∈ R be the eigenvalues of A, listed according to multiplicity. For each multiple eigenvalue λ (if any), Theorem 338 shows that the geometric multiplicity of λ equals its algebraic multiplicity. Hence if the algebraic mutiplicity of λ is k (which means that λ appears k times in the list λ1 , . . . , λn , then we can find k linearly independent eigenvectors corresponding to λ, and then we can replace them with an orthonormal basis for EA (λ) by applying the Gram-Schmidt process. If we do this for each eigenspace, we obtain eigenvectors x1 , . . . , xn corresponding to λ1 , . . . , λn . Since eigenvectors corresponding to distinct eigenvalues are automatically orthogonal, it follows that {x1 , . . . , xn } is an orthonormal basis for Cn consisting of eigenvectors of Cn . Finally, by defining X = [x1 | · · · |xn ], it follows that X is unitary (since {x1 , . . . , xn } is orthonormal), and A = XDX ∗ , as desired. 14. Let A ∈ Cn×n be Hermitian. By Theorem 340, there exists a unitary matrix X ∈ Cn×n such that X ∗ AX = D is diagonal, and the diagonal entries of D are the eigenvalues λ1 , . . . , λn of A. For any u ∈ Cn , we have u, Au Cn = u, XDX ∗u Cn = X ∗ u, DX ∗ u Cn = v, Dv Cn =
n
λi vi2 ,
i=1 ∗
∗
where v = X u. Since X is nonsingular, v = 0 if and only if u = 0. The formula n
u, Au Cn =
λi vi2
i=1
shows immediately that A is positive semidefinite if all the eigenvalues of A are nonnegative. Also, if all the eigenvalues of A are positive and u = 0, then v = 0, implying that at least one vi = 0, and hence the above formula shows that u, Au Cn > 0. Therefore, if all the eigenvalues of A are positive, then A is positive definite.
Conversely, notice that, if X = [x_1| · · · |x_n] and hence x_i is the (normalized) eigenvector corresponding to λ_i, then
    ⟨x_i, Ax_i⟩ = ⟨x_i, λ_i x_i⟩ = λ_i⟨x_i, x_i⟩ = λ_i.
If A is positive semidefinite, this shows that λ_i ≥ 0 for all i, and if A is positive definite, it shows that λ_i > 0 for all i. This proves Theorem 342.
15. Let X be an inner product space over C, and let X = {x_1, . . . , x_n} be a basis for X. We wish to show that the Gram matrix G corresponding to X is Hermitian and positive definite. By definition, G_{ij} = ⟨x_j, x_i⟩_X and G_{ji} = ⟨x_i, x_j⟩_X, and therefore G_{ji} = \overline{G_{ij}} for all i, j, that is, G^∗ = G. This shows that G is Hermitian. Now, given any α ∈ C^n, define x = Σ_{i=1}^{n} α_i x_i ∈ X, and notice that x = 0 if and only if α = 0. Then
    ⟨α, Gα⟩ = Σ_{i=1}^{n} α_i \overline{(Gα)_i} = Σ_{i=1}^{n} Σ_{j=1}^{n} α_i \overline{G_{ij}} \overline{α_j}
            = Σ_{i=1}^{n} Σ_{j=1}^{n} α_i \overline{α_j} ⟨x_i, x_j⟩_X
            = ⟨ Σ_{i=1}^{n} α_i x_i, Σ_{j=1}^{n} α_j x_j ⟩_X = ⟨x, x⟩_X.
Therefore, ⟨α, Gα⟩ > 0 for all α ∈ C^n, α ≠ 0, which shows that G is positive definite.
16. Let L ∈ R^{(n−1)×(n−1)} be the tridiagonal matrix
    L = (1/h²) tridiag(−1, 2, −1),
that is, L_{ii} = 2/h², L_{i,i+1} = L_{i+1,i} = −1/h², and all other entries equal to zero.
(a,b) For each k = 1, 2, . . . , n − 1, define U^{(k)} ∈ R^{n−1} by
    U^{(k)}_i = sin(kπx_i),   i = 1, 2, . . . , n − 1.
We will show that U^{(k)} is an eigenvector of L for each k. The trigonometric identities sin(α ± β) = sin(α)cos(β) ± sin(β)cos(α) will be useful. Writing x_i = ih, we have
    −U^{(k)}_{i−1} + 2U^{(k)}_i − U^{(k)}_{i+1}
        = −sin(kπih − kπh) + 2 sin(kπih) − sin(kπih + kπh)
        = −sin(kπih)cos(kπh) + cos(kπih)sin(kπh) + 2 sin(kπih) − sin(kπih)cos(kπh) − cos(kπih)sin(kπh)
        = (2 − 2 cos(kπh)) sin(kπih)
        = (2 − 2 cos(kπh)) U^{(k)}_i.
This shows that (LU^{(k)})_i = (λ_k U^{(k)})_i for i = 2, . . . , n − 2, where
    λ_k = (2 − 2 cos(kπh))/h².
Moreover, the above calculation is valid for i = 1 and i = n − 1 as well, since U^{(k)}_0 and U^{(k)}_n—which should be omitted from the calculation—are zero by the formula above. Thus we have shown that U^{(k)} is an eigenvector of L corresponding to the eigenvalue λ_k, k = 1, . . . , n − 1.
(c) We wish to show that λ_k > 0 for all k and all n. The smallest λ_k is
    λ_1 = (2 − 2 cos(π/n))/h².
Since cos(π/n) is strictly less than 1, we see that λ_1 > 0, regardless of the value of n > 1. (A simple Taylor expansion of cos(π/n) about 0 shows that λ_1 → π² as n → ∞.)
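The eigenvalue formula above is easy to confirm numerically. The following short script is only an illustrative sketch (the grid size n = 8 and the use of NumPy are arbitrary choices, not part of the exercise): it builds L and compares its eigenvalues with λ_k = (2 − 2 cos(kπh))/h².

    import numpy as np

    n = 8
    h = 1.0 / n
    m = n - 1  # L is (n-1) x (n-1)
    L = (np.diag(2.0 * np.ones(m)) - np.diag(np.ones(m - 1), 1)
         - np.diag(np.ones(m - 1), -1)) / h**2

    k = np.arange(1, n)
    predicted = (2.0 - 2.0 * np.cos(k * np.pi * h)) / h**2

    computed = np.sort(np.linalg.eigvalsh(L))
    print(np.allclose(computed, np.sort(predicted)))  # True
    print(predicted[0], np.pi**2)                     # lambda_1 approaches pi^2 as n grows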
7.2 The spectral theorem for normal matrices
1. The matrix A is normal since AT A = AAT = 2I. We have A = U DU ∗ , where " ! √i √i − 1−i 0 2 2 . , D= U = √1 √1 0 1+i 2 2 2. The matrix A is normal since
⎤ 2 1 −1 AT A = AAT = ⎣ 1 2 1 ⎦. −1 1 0 ⎡
We have A = U DU ∗ , where ⎡ √1 ⎢ U =⎣
3 √1 3 √1 3
1 − 2√ + 12 i 3 1 − 2√ + 12 i 3 √1 3
1 − 2√ − 12 i 3 1 − 2√ − 12 i 3 √1 3
⎤
⎡ ⎤ 0 0 0 √ ⎥ ⎦ , D = ⎣ 0 − 3i √0 ⎦ . 3i 0 0
3. Let A ∈ F n×n be given, define T : F n → F n by T (x) = Ax for all x ∈ F n , and let X = {x1 , · · · , xn } be a basis for F n . We wish to prove that [T ]X ,X = X −1 AX, where X = [x1 | · · · |xn ]. Notice that for any y ∈ F n , y = X(X −1 y), which implies that [y]X = X −1 y. Therefore, since T (xi ) = Axi , we have [T (xi )]X = X −1 Axi . Since [T (xi )]X is the ith column of [T ]X ,X , we see that [T ]X ,X = [X −1 Ax1 | · · · |X −1 Axn ] = X −1 A[x1 | · · · |xn ] = X −1 AX, as desired. 4. Let A ∈ Cn×n be normal and λ ∈ C. Then (A − λI)∗ (A − λI) = (A∗ − λI)(A − λI) = A∗ A − λA − λA∗ + |λ|2 I, (A − λI)(A − λI)∗ = (A − λI)(A∗ − λI) = AA∗ − λA∗ − λA + |λ|2 I. Since A∗ A = AA∗ by assumption, we see that (A − λI)∗ (A − λI) = (A − λI)(A − λI)∗ , and hence A − λI is normal. 5. Let A ∈ Cn×n . (a) Suppose there exists a unitary matrix X ∈ Cn×n and a diagonal matrix D ∈ Rn×n such that A = XDX ∗ . Then, since D ∈ Rn×n , D∗ = DT = D, and therefore A∗ = (XDX ∗ )∗ = (X ∗ )∗ D∗ X ∗ = XDX ∗ = A. Thus A is Hermitian.
(b) Suppose there exists a unitary matrix X ∈ Cn×n and a diagonal matrix D ∈ Cn×n such that A = XDX ∗ . If D is the diagonal matrix with diagonal entries λ1 , . . . , λn ∈ C, then D∗ is the diagonal matrix with diagonal entries λ1 , . . . , λn , and D∗ D = DD∗ , with both products equal to the diagonal matrix with diagonal entries |λ1 |2 , . . . , |λn |2 . Therefore, A∗ A = XD∗ X ∗ XDX ∗ = XD∗ DX ∗ = XDD∗ X ∗ = XDX ∗ XD∗ X ∗ = AA∗ , and thus A is normal. 6. Let A ∈ R2×2 be defined by
    A = [ a  b
          c  d ].
We wish to find necessary and sufficient conditions on a, b, c, d ∈ R for A to be normal. The equation A^T A = AA^T is equivalent to the three equations
    a² + b² = a² + c²,   c² + d² = b² + d²,   ac + bd = ab + cd,
which reduce to
    b² = c²,   ac + bd = ab + cd.
The first equation implies that c = ±b. If c = b, then A is symmetric and hence normal. If c = −b, then the second equation above reduces to −ab + bd = ab − bd or, equivalently, b(a − d) = 0. Thus either b = c = 0 (in which case A is symmetric, as in the first case) or b = 0, c = −b, and d = a. We thus find two classes of normal matrices in R2×2 : a b a b . , −b a b d 7. Let A ∈ Rn×n be skew-symmetric. (a) Since AT A = AAT = −A2 , it follows that A is normal (b) Hence there exist a unitary matrix X ∈ Cn×n and a diagonal matrix D ∈ Cn×n such that A = XDX ∗ , or, equivalently, D = X ∗ AX. We then have D∗ = X ∗ A∗ X = X ∗ (−A)X = −X ∗ AX = −D. Therefore, each diagonal entry λ of D, that is, each eigenvalue of A, satisfies λ = −λ. This implies that λ is of the form iθ, θ ∈ R. Thus a skew-symmetric matrix has only purely imaginary eigenvalues. 8. (a) If A ∈ Rn×n is orthogonal, then AT A = I = AAT , and hence A is normal. (b) If A ∈ Cn×n is unitary, then A∗ A = I = AA∗ , and hence A is normal. 9. Let A ∈ Cn×n be upper triangular (Aij = 0 for i > j) and normal. We wish to prove that A is a diagonal matrix. Since A is upper triangular, its eigenvalues are its diagonal entries, which are real numbers. Since A is normal, there exist a unitary matrix X ∈ Cn×n and a diagonal matrix D ∈ Rn×n such that A = XDX ∗ (note that D ∈ Rn×n , rather than D ∈ Cn×n as usual, because the eigenvalues of A are known to be real). But then A∗ = XD∗ X ∗ = XDX ∗ = A, which implies that A is Hermitian. However, a real Hermitian matrix is symmetric, and a symmetric upper triangular matrix is diagonal. This completes the proof. 10. Suppose A, B ∈ Cn×n are normal and commute (AB = BA). Then, by Exercise 5.4.18, A and B are simultaneously diagonalizable; that is, there exists a unitary matrix X ∈ Cn×n such that X ∗ AX = D and X ∗ BX = C are both diagonal matrices in Cn×n . We then have AB = XDX ∗ XCX ∗ = X(DC)X ∗ , (AB)∗ = X(DC)∗ X ∗ , which proves that AB and (AB)∗ are simultaneously diagonalizable. By Exercise 4.6.14, this proves that AB and (AB)∗ commute, which in turn implies that AB is normal.
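A quick numerical check of the two classes found above (a sketch only; the values of a, b, d and the use of NumPy are arbitrary choices for illustration):

    import numpy as np

    def is_normal(A, tol=1e-12):
        return np.allclose(A.T @ A, A @ A.T, atol=tol)

    a, b, d = 1.3, -0.7, 2.1
    print(is_normal(np.array([[a, b], [b, d]])))          # True  (symmetric class)
    print(is_normal(np.array([[a, b], [-b, a]])))         # True  (second class)
    print(is_normal(np.array([[1.0, 2.0], [3.0, 4.0]])))  # False (generic 2x2 matrix)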
11. Suppose A, B ∈ Cn×n are normal and commute (AB = BA). Using the notation of the previous exercise, we have A + B = XDX ∗ + XCX ∗ = X(D + C)X ∗ , (A + B)∗ = X(D + C)∗ X ∗ . Since A + B and (A + B)∗ are simultaneously diagonalizable, it follows that they commute and hence A + B is normal. 12. Suppose A ∈ Cn×n is normal. We wish to prove that there exists a polynomial p ∈ C[r] of degree at most n − 1 such that A∗ = p(A). By the results in Section 2.8, we know that there exists a polynomial p of degree at most n − 1 such that p(λ) = λ for all eigenvalues λ of A. (To see this, let λ1 , . . . , λt be the distinct eigenvalues of A. Then t ≤ n, and Section 2.8 shows that there is a unique polynomial p of degree t − 1 such that p(λi ) = λi for i = 1, . . . , t.) Since A is normal, there exist a unitary matrix X ∈ Cn×n and a diagonal matrix D ∈ Cn×n such that A = XDX ∗ . The diagonal entries of D are the eigenvalues of A, and it follows that p(D) = D = D∗ . It follows that p(A) = p(XDX ∗ ) = Xp(D)X ∗ = XD∗ X ∗ = A∗ . 13. Suppose A ∈ Cn×n satisfies A∗ x 2 = Ax 2 for all x ∈ Cn . We wish to show that A is normal. We have x, AA∗ x Cn = x, A∗ Ax Cn for all x ∈ Cn . Therefore, for every x, y ∈ Cn , x + y, AA∗ (x + y) Cn = x + y, A∗ A(x + y Cn
⇒ ⟨x, AA^∗x⟩ + ⟨x, AA^∗y⟩ + ⟨y, AA^∗x⟩ + ⟨y, AA^∗y⟩ = ⟨x, A^∗Ax⟩ + ⟨x, A^∗Ay⟩ + ⟨y, A^∗Ax⟩ + ⟨y, A^∗Ay⟩
⇒ ⟨x, AA^∗y⟩ + ⟨y, AA^∗x⟩ = ⟨x, A^∗Ay⟩ + ⟨y, A^∗Ax⟩
⇒ Re⟨x, AA^∗y⟩ = Re⟨x, A^∗Ay⟩.
The last step follows from the fact that λ + \overline{λ} = 2 Re λ for all complex numbers λ. We have shown that Re⟨x, AA^∗y⟩ = Re⟨x, A^∗Ay⟩ for all x, y ∈ C^n. Now, note that Re(−iλ) = Im λ. Hence, replacing y by iy in the above identity, we obtain Re(−i⟨x, AA^∗y⟩) = Re(−i⟨x, A^∗Ay⟩) for all x, y ∈ C^n, that is,
    Im⟨x, AA^∗y⟩ = Im⟨x, A^∗Ay⟩ for all x, y ∈ C^n.
Thus it follows that x, AA∗ y Cn = x, A∗ Ay Cn for all x, y ∈ Cn . But this implies that AA∗ y = A∗ Ay for all y ∈ Cn , and hence that AA∗ = A∗ A. Thus we have proved that A is normal. 14. Let V be an inner product space over R or C, and let u, v be nonzero vectors in V . (a) For all x ∈ V , (u ⊗ v)(x) = x, v u ∈ sp{u}. This shows that R(u ⊗ v) = sp{u} and rank(u ⊗ v) = 1. (b) Define W = sp{v}⊥ , and note that, since u = 0, (u ⊗ v)(x) = 0 ⇔ x, v u = 0 ⇔ x, v = 0 ⇔ x ∈ W. Thus λ = 0 is an eigenvalue of u ⊗ v, and the associated eigenspace is W . Since R(u ⊗ v) = sp{u}, if there is a nonzero eigenvalue, the only associated eigenvector could be u. We have (u⊗v)u = u, v u, so there are two cases: • u ∈ W (that is, u, v = 0). The only eigenvalue is 0, with eigenspace W ; • u ∈ W (that is, u, v = 0). The eigenvalues are λ = 0, λ = u, v , with eigenspaces W and sp{u}, respectively. (c) Let us assume that dim(V ) = n and write L = u ⊗ v. If u ∈ W , then 0 is the only eigenvalue, which yields pL (r) = rn , det(L) = 0, and tr(L) = 0 (since det(L) and tr(L) are the product and sum of the eigenvalues, respectively). If u ∈ W , then pL (r) = rn−1 (r − u, v ), det(L) = 0, and tr(L) = u, v .
(d) For any x, y ∈ V, we have
    ⟨L(x), y⟩ = ⟨⟨x, v⟩u, y⟩ = ⟨x, v⟩⟨u, y⟩ = ⟨x, ⟨y, u⟩v⟩ = ⟨x, (v ⊗ u)(y)⟩.
This shows that (u ⊗ v)^∗ = v ⊗ u.
15. Let A ∈ F^{m×n} and B ∈ F^{n×p}, where F represents R or C. We wish to find a formula for the product AB in terms of outer products of the columns of A and the rows of B. Let A = [c_1| · · · |c_n] and let r_1, . . . , r_n be the rows of B. Then
    AB = Σ_{k=1}^{n} c_k r_k^T.
To verify this, notice that (c_k r_k^T)_{ij} = A_{ik}B_{kj}, and hence
    ( Σ_{k=1}^{n} c_k r_k^T )_{ij} = Σ_{k=1}^{n} (c_k r_k^T)_{ij} = Σ_{k=1}^{n} A_{ik}B_{kj} = (AB)_{ij}.
This holds for all i, j, and thus verifies the formula above. It follows that the linear operator defined by AB can be written as
    Σ_{k=1}^{n} c_k ⊗ r_k.
16. Let V be an inner product space over R and let u ∈ V have norm one. Define T : V → V by T = I − 2u ⊗ u, where I : V → V is the identity operator. It is easy to show that I^∗ = I, and hence T^∗ = (I − 2u ⊗ u)^∗ = I^∗ − 2(u ⊗ u)^∗. In Exercise 14, we saw that (u ⊗ u)^∗ = u ⊗ u, and hence T^∗ = I − 2u ⊗ u = T. Thus T is self-adjoint. We then have
    T^∗T = (I − 2u ⊗ u)(I − 2u ⊗ u) = I − 2u ⊗ u − 2u ⊗ u + 4(u ⊗ u)² = I − 4u ⊗ u + 4(u ⊗ u)².
Now, for any x ∈ V,
    (u ⊗ u)²(x) = (u ⊗ u)((u ⊗ u)(x)) = (u ⊗ u)(⟨x, u⟩u) = ⟨x, u⟩(u ⊗ u)(u) = ⟨x, u⟩⟨u, u⟩u = ⟨x, u⟩u = (u ⊗ u)(x),
since ⟨u, u⟩ = 1. This shows that (u ⊗ u)² = u ⊗ u, and hence that
    T^∗T = I − 4u ⊗ u + 4(u ⊗ u)² = I − 4u ⊗ u + 4u ⊗ u = I.
This shows that T is orthogonal.
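In R^n with the dot product, the operator of Exercise 16 is the matrix I − 2uu^T, and its properties can be checked directly. The following is a minimal numerical sketch (the dimension and the random test vector are arbitrary; NumPy is assumed):

    import numpy as np

    rng = np.random.default_rng(0)
    u = rng.standard_normal(5)
    u /= np.linalg.norm(u)                   # make u a unit vector

    T = np.eye(5) - 2.0 * np.outer(u, u)
    print(np.allclose(T, T.T))               # symmetric (self-adjoint)
    print(np.allclose(T.T @ T, np.eye(5)))   # orthogonal: T^T T = I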
7.3 Optimization and the Hessian matrix
1. Suppose A ∈ R^{n×n} and define A_sym = (1/2)(A + A^T). We have
    (A_sym)^T = ((1/2)(A + A^T))^T = (1/2)(A^T + A) = A_sym,
and hence A_sym is symmetric. Also, for any x ∈ R^n,
    x · A_sym x = x · (1/2)(A + A^T)x = (1/2)(x · Ax) + (1/2)(x · A^T x)
                = (1/2)(x · Ax) + (1/2)(Ax · x)
                = (1/2)(x · Ax) + (1/2)(x · Ax) = x · Ax.
2. Let A ∈ R^{n×n} be symmetric and positive semidefinite, but not positive definite (that is, assume that A is singular). Suppose b ∈ col(A) and define q : R^n → R by
    q(x) = (1/2)x · Ax + b · x + c.
Let x∗ be a solution to Ax = −b. We wish to show that every vector in x∗ + N(A) is a global minimizer of q. Substituting b = −Ax∗ in the definition of q(x), we obtain
    q(x) = (1/2)x · Ax − x · Ax∗ + c = (1/2)(x − x∗) · A(x − x∗) + c − (1/2)x∗ · Ax∗.
Since A is positive semidefinite, (x − x∗) · A(x − x∗) ≥ 0 for all x ∈ R^n. It follows that q(x) ≥ q(x∗) for all x ∈ R^n, and hence x∗ is a global minimizer of q. Also, if z ∈ N(A), then
    q(x∗ + z) = (1/2)(x∗ + z − x∗) · A(x∗ + z − x∗) + c − (1/2)x∗ · Ax∗
              = (1/2)z · Az + c − (1/2)x∗ · Ax∗
              = c − (1/2)x∗ · Ax∗ = q(x∗),
since Az = 0. Thus x∗ + z is a global minimizer, and we have shown that every vector in x∗ + N(A) is a global minimizer of q.
3. Let A ∈ R^{n×n} be symmetric and positive semidefinite, but not positive definite. Define q : R^n → R by
    q(x) = (1/2)x · Ax + b · x + c.
Suppose b ∉ col(A). Let x∗ be a least-squares solution of Ax = −b, and define y = −b − Ax∗. Then we have −b = Ax∗ + y and, by the projection theorem, y ∈ col(A)^⊥ = N(A^T) = N(A). We can now write
    q(x) = (1/2)x · Ax − x · (Ax∗ + y) + c
         = (1/2)x · Ax − x · Ax∗ − x · y + c
         = (1/2)(x − x∗) · A(x − x∗) − x · y + c − (1/2)x∗ · Ax∗.
It follows that 1 ∗ (x + αy − x∗ ) · A(x∗ + αy − x∗ ) − x∗ · y − αy · y+ 2 1 c − x∗ · Ax∗ 2 1 1 = y · Ay − αy · y + c − x∗ · Ax∗ − c∗ · y 2 2 1 = − αy · y + c − x∗ · Ax∗ − c∗ · y 2
q(x∗ + αy) =
Thus q(x∗ + αy) = c̃ − α y 22 , where c̃ is constant with respect to α. Since b ∈ col(A), it follows that y = 0, and hence q(x∗ + αy) → −∞ as α → ∞. 4. Let f : Rn → R be smooth, let x∗ , p ∈ Rn , and define φ : R → R by φ(α) = f (x∗ + αp). By the chain rule for functions of several variables (thinking of f (x∗ + αp) as f (x∗1 + αp1 , . . . , x∗n + αpn )), we have n
n
∂f ∗ ∂f ∗ d (x + αp) (x∗i + αpi ) = (x + αp)pi φ (α) = ∂x dα ∂x i i i=1 i=1
= ∇f (x∗ + αp) · p and φ (α) =
n
n
n
n
∂2f d (x∗ + αp)pi (x∗j + αpj ) ∂x ∂x dα j i j=1 i=1
∂2f (x∗ + αp)pi pj ∂x ∂x j i j=1 i=1 n n ∂2f ∗ = pi (x + αp)pj ∂xj ∂xi i=1 i=1 =
= p · ∇2 f (x∗ + αp)p. 5. The eigenvalues of A are λ = −1, λ = 3. Since A is indefinite, q has no global minimizer (or global maximizer). 6. The eigenvalues of A are λ = 1, λ = 3. Since A is positive definite, q has a unique global minimizer, x∗ = −A−1 b = (2, −3). 7. The eigenvalues of A are λ = 0, λ = 5, and therefore A is positive semidefinite and singular. The vector b belongs to col(A), and therefore, by Exercise 2, every vector in x∗ + N (A), where x∗ is any solution of Ax = −b, is a global minimizer of q. (In other words, every solution of Ax = −b is a global minimizer.) We can take x∗ = (−1, 0), and N (A) = sp{(2, −1)}. 8. Let f : R2 → R be defined by f (x) = 100x22 − 200x21 x2 + 100x41 + x21 − 2x1 + 1. We have
    ∇f(x) = ( −400(x_2 − x_1²)x_1 + 2(x_1 − 1),  200(x_2 − x_1²) ),
    ∇²f(x) = [ −400(x_2 − x_1²) + 800x_1² + 2   −400x_1
                −400x_1                          200      ].
It is easy to see that ∇f(x) = 0 has a unique solution, x∗ = (1, 1), so f has a unique stationary point. We have
    ∇²f(x∗) = [  802   −400
                −400    200 ].
A direct calculation shows that the eigenvalues of ∇²f(x∗) are both positive; therefore, ∇²f(x∗) is positive definite and x∗ is a local minimizer of f.
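The "direct calculation" above can be reproduced with a few lines of NumPy. This is only a sketch of the check (the gradient and Hessian formulas are those stated in the solution; NumPy is an assumption):

    import numpy as np

    def grad(x1, x2):
        return np.array([-400.0 * (x2 - x1**2) * x1 + 2.0 * (x1 - 1.0),
                         200.0 * (x2 - x1**2)])

    def hess(x1, x2):
        return np.array([[-400.0 * (x2 - x1**2) + 800.0 * x1**2 + 2.0, -400.0 * x1],
                         [-400.0 * x1, 200.0]])

    print(grad(1.0, 1.0))                      # [0. 0.]  -- (1, 1) is a stationary point
    print(np.linalg.eigvalsh(hess(1.0, 1.0)))  # both eigenvalues positive (about 0.4 and 1001.6)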
7.4 Lagrange multipliers
1. Suppose x∗ is a local minimizer of min f(x) s.t. g(x) = 0 and a regular point of the constraint g(x) = 0. Then there exists a Lagrange multiplier λ∗ ∈ R^m satisfying ∇f(x∗) = ∇g(x∗)λ∗. Since x∗ is a regular point of g(x) = 0, ∇g(x∗) has full rank, implying that the columns of ∇g(x∗) are linearly independent and hence that N(∇g(x∗)) is trivial. Therefore, the equation ∇g(x∗)λ = ∇f(x∗) has at most one solution, which proves that λ∗ is unique.
2. The maximizer of f is (−3, −3, 18), with Lagrange multiplier (6/5, 1/5) and f(x) = 18. The minimizer of f is (2, 2, 8), with Lagrange multiplier (4/5, −1/5) and f(x) = 8.
3. The maximizer is x ≈ (−0.058183, 0.73440, −0.67622), with Lagrange multiplier λ ≈ (3.1883, −5.7454) and f(x) ≈ 17.923, while the minimizer is x ≈ (0.67622, −0.73440, 0.058183), with Lagrange multiplier λ ≈ (0.81171, −5.7454) and f(x) ≈ 14.077.
4. Let f : R³ → R and g : R³ → R² be defined by
    f(x) = x_1² + (x_2 − 1)² + x_3²,   g(x) = ( x_2² − x_3,  2x_2² − x_3 ).
(a) If g(x) = 0, then x_3 = x_2² and also x_3 = 2x_2², which implies that x_2² = 2x_2², and hence that x_2 = x_3 = 0. It is now easy to see that g(x) = 0 if and only if x_2 = x_3 = 0, that is, if and only if x lies on the x_1-axis.
(b) For each feasible x, f(x) = x_1² + 1, and therefore the global minimizer occurs for x∗ = (x∗_1, 0, 0) = (0, 0, 0).
(c) We have
    ∇f(x) = ( 2x_1, 2(x_2 − 1), 2x_3 ) ⇒ ∇f(x∗) = ( 0, −2, 0 )
and
    ∇g(x) = [  0      0
              2x_2   4x_2
              −1     −1   ]  ⇒  ∇g(x∗) = [  0    0
                                             0    0
                                            −1   −1 ].
It is obvious that there is no solution to ∇g(x∗)λ = ∇f(x∗). This does not contradict the theory developed in the text, since ∇g(x∗) fails to have full rank.
5. Let A ∈ R^{n×n} be symmetric, and define f : R^n → R, g : R^n → R by f(x) = x · Ax, g(x) = x · x − 1. Notice that g(x) = 0 if and only if ‖x‖_2 = 1. Consider the NLP
    min f(x) s.t. g(x) = 0.
Then, if x is a minimizer or maximizer of this NLP, there exists λ ∈ R such that ∇f (x) = ∇g(x)λ ⇔ 2Ax = (2x)λ ⇔ Ax = λx. This shows that the stationary points are the eigenvectors of A, and the Lagrange multipliers are the corresponding eigenvalues. Moreover, if Ax = λx, x 2 = 1, then f (x) = x · Ax = x · (λx) = λ(x · x) = λ. This shows that the maximum value of f (x), over all vectors of norm one, is the largest eigenvalue of A, and the minimum is the smallest eigenvalue of A. If we order the eigenvalues of A as λ1 ≤ · · · ≤ λn , then we have x 2 = 1 ⇒ λ1 ≤ f (x) ≤ λn , that is, x 2 = 1 ⇒ λ1 ≤ x · Ax ≤ λn . Now, for any x ∈ R , x = 0, the vector x/ x 2 has norm one, and hence n
    λ_1 ≤ (x/‖x‖_2) · A(x/‖x‖_2) ≤ λ_n.
Multiplying through by x 22 yields λ1 x 22 ≤ x · Ax ≤ λn x 22 for all x ∈ Rn (this inequality was derived for x = 0, but it obviously holds for x = 0 as well). 6. Consider the following family of nonlinear programs indexed by u in Rm : min
f (x)
s.t. g(x) = u, where f : Rn → R and g : Rn → Rm . Let x = x(u) be a local minimizer of this NLP, where x is assumed to be a smooth function of u, and define p(u) = f (x(u)). We assume that x∗ = x(0) is a regular point of the constraint g(x) = 0 (that is, that ∇g(x∗ ) has full rank), and define λ∗ to be the Lagrange multiplier corresponding to x∗ . Then, by the chain rule, we have ∇p(u) = ∇x(u)∇f (x(u)) ⇒ ∇p(0) = ∇x(0)∇f (x∗ ). Since g(x(u)) = u for all u, differentiating both sides of this equation yields ∇x(u)∇g(x(u)) = I, or ∇x(0)∇g(x∗ ) = I. Since ∇f (x∗ ) = ∇g(x∗ )λ∗ , It follows that ∇p(0) = ∇x(0)∇f (x∗ ) = ∇x(0)∇g(x∗ )λ∗ = Iλ∗ = λ∗ , as desired. 7. The minimizer, and associated Lagrange multiplier, of f (x) subject to g(x) = u is √ √ √ √ 1+u 1+u 1+u 3 . ,− √ ,− √ , λ(u) = − √ x(u) = − √ 2 1+u 3 3 3 √ √ We have p(u) = f (x(u)) = − 3 1 + u. Therefore, by direct calculation, √ √ 3 3 ∇p(u) = − √ . ⇒ ∇p(0) = − 2 2 1+u √ On the other hand, the Lagrange multiplier associated with x∗ = x(0) is λ∗ = λ(0) = − 3/2. Thus ∇p(0) = λ∗ , as implied by the previous exercise.
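The conclusion of Exercise 5—that λ_1‖x‖² ≤ x · Ax ≤ λ_n‖x‖² with the extremes attained at eigenvectors—is easy to test numerically. The script below is an illustrative sketch only (the symmetric test matrix and random vectors are arbitrary; NumPy is assumed):

    import numpy as np

    rng = np.random.default_rng(1)
    B = rng.standard_normal((5, 5))
    A = 0.5 * (B + B.T)                  # arbitrary symmetric test matrix
    lam = np.linalg.eigvalsh(A)          # lambda_1 <= ... <= lambda_n

    X = rng.standard_normal((5, 1000))   # 1000 random test vectors (columns)
    q = np.einsum('ij,ij->j', X, A @ X)  # x . A x for each column
    nrm2 = np.einsum('ij,ij->j', X, X)   # ||x||^2 for each column
    print(np.all(lam[0] * nrm2 <= q + 1e-10) and np.all(q <= lam[-1] * nrm2 + 1e-10))  # True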
7.5 Spectral methods for differential equations
1. The exact solution is u(x) = (x − x²)/2, and the solution obtained by the method of Fourier series is
    u(x) = Σ_{n=1}^{∞} [2(1 − (−1)^n)/(n³π³)] sin(nπx).
The following graph shows the exact solution u (the solid curve), together with the partial Fourier series with 1 (the dashed curve) and 4 terms (the dotted curve).
[Figure: the exact solution (solid curve) together with the 1-term (dashed) and 4-term (dotted) partial Fourier series on [0, 1].]
Notice that the partial Fourier series with 4 terms is already indistinguishable from the exact solution on this scale.
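This observation can be quantified with a short script (a sketch only; the grid of evaluation points is an arbitrary choice, and NumPy is assumed): it evaluates the partial sums of the series above and reports the maximum deviation from the exact solution.

    import numpy as np

    x = np.linspace(0.0, 1.0, 201)
    u_exact = (x - x**2) / 2.0

    def partial_sum(x, terms):
        n = np.arange(1, terms + 1)
        coeff = 2.0 * (1.0 - (-1.0)**n) / (n**3 * np.pi**3)
        return np.sin(np.pi * np.outer(x, n)) @ coeff

    print(np.max(np.abs(u_exact - partial_sum(x, 1))))  # error with 1 term
    print(np.max(np.abs(u_exact - partial_sum(x, 4))))  # error with 4 terms is already ~1e-3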
2. The exact solution is u(x) = 1 + (e − 1)x − e^x, and the solution obtained by the method of Fourier series is
    u(x) = Σ_{n=1}^{∞} [2(1 − (−1)^n e)/(nπ(1 + n²π²))] sin(nπx).
The following graph shows the exact solution u (the solid curve), together with the partial Fourier series with 1 (the dashed curve) and 4 terms (the dotted curve).
[Figure: the exact solution (solid curve) together with the 1-term (dashed) and 4-term (dotted) partial Fourier series on [0, 1].]
As in the previous exercise, the partial Fourier series with 4 terms is indistinguishable from the exact solution on this scale.
3. Consider the operator M : C²_D[0, 1] → C[0, 1] defined by M(u) = −u″ + u.
(a) For any u, v ∈ C²_D[0, 1], we have
    ⟨M(u), v⟩_2 = ∫₀¹ (−u″(x) + u(x))v(x) dx
                = −∫₀¹ u″(x)v(x) dx + ∫₀¹ u(x)v(x) dx
                = −u′(x)v(x)|₀¹ + ∫₀¹ u′(x)v′(x) dx + ∫₀¹ u(x)v(x) dx
                = ∫₀¹ u′(x)v′(x) dx + ∫₀¹ u(x)v(x) dx   (since v(0) = v(1) = 0)
                = u(x)v′(x)|₀¹ − ∫₀¹ u(x)v″(x) dx + ∫₀¹ u(x)v(x) dx
                = −∫₀¹ u(x)v″(x) dx + ∫₀¹ u(x)v(x) dx   (since u(0) = u(1) = 0)
                = ∫₀¹ u(x)(−v″(x) + v(x)) dx
                = ⟨u, M(v)⟩_2.
This shows that M is a symmetric operator. Also, notice from the above calculation that
    ⟨M(u), u⟩_2 = ∫₀¹ (u′(x))² dx + ∫₀¹ (u(x))² dx,
which shows that ⟨M(u), u⟩_2 > 0 for all u ∈ C²_D[0, 1], u ≠ 0. Then, if λ is an eigenvalue of M and u is a corresponding eigenfunction with ⟨u, u⟩_2 = 1, then
    λ = λ⟨u, u⟩_2 = ⟨λu, u⟩_2 = ⟨M(u), u⟩_2 > 0,
which shows that all the eigenvalues of M are positive.
(b) It is easy to show that the eigenvalues of M are n²π² + 1, n = 1, 2, . . ., with corresponding eigenfunctions sin(nπx) (the calculation is essentially the same as in Section 7.5.1).
4. The solution is
    u(x) = Σ_{n=1}^{∞} [2/(nπ(1 + n²π²))] sin(nπx).
5. Define V = {u ∈ C²[0, ℓ] : u′(0) = u′(ℓ) = 0} and K : V → C[0, ℓ] by K(u) = −u″.
(a) We have
K(u), v 2 =
(−u (x))v(x) dx
0
= − u (x)v(x)|0 +
=
u (x)v (x) dx
0
u (x)v (x) dx (since u (0) = u ( ) = 0)
0
= u(x)v (x)|0 −
=−
u(v)v (x) dx
0
u(v)v (x) dx (since v (0) = v ( ) = 0)
0
= u, K(v) 2 . This proves that K is symmetric. (b) From the above calculation, we see that K(u), u 2 =
0
(u (x))2 dx ≥ 0,
which shows that all eigenvalues of K are nonnegative. If u is the constant function 1, then u satisfies the boundary conditions and K(u) = 0 = 0 · u. Thus 0 is an eigenvalue of K. Also, if v is any other eigenfunction of K corresponding to λ = 0, then the above calculation shows that (v (x))2 dx = 0, 0
which implies that v (x) ≡ 0, and hence v is constant. Thus there is only one (independent) eigenfunction correspoding to λ = 0. (c) The positive eigenvalues of K are λn = n2 π 2 , n = 1, 2, . . ., and the corresponding eigenfunctions are ψn (x) = cos (nπx). 2 6. Let L : CD [0, ] → C[0, ] be defined by L(v) = −v . Suppose L(v1 ) = λ1 v1 , L(v2 ) = λ2 v2 , where λ1 = λ2 and v1 , v2 = 0. Then
λ1 v1 , v2 2 = λ1 v1 , v2 2 = L(v1 ), v2 2 = v1 , L(v2 ) 2 = v1 , λ2 v2 2 = λ2 v1 , v2 2 , which shows that (λ1 − λ2 ) v1 , v2 2 = 0. Since λ1 − λ2 = 0, it follows that v1 , v2 2 = 0. 2 2 7. Let L : CD [0, ] → C[0, ] be defined by L(v) = −v , and suppose v ∈ CD [0, ], v = 0, L(v) = λv. We 2 can assume that v, v 2 = 1 (since otherwise we could divide v by its L norm), and thus, by (7.17) in the text, λ = λ v, v 2 = λv, v 2 = L(v), v 2 = (v (x))2 dx ≥ 0. 0
This shows that λ ≥ 0. Moreover, if λ = 0, then it follows that (v (x))2 dx = 0, 0
2
which implies that the integrand (v (x)) is identically 0. But this implies that v is a constant function, and since v satisfies v(0) = 0, it follows that v is the zero function, contradicting that v = 0. Thus λ = 0 is impossible, and we have proven that λ > 0.
Chapter 8

The singular value decomposition

8.1 Introduction to the SVD
1. The result is
⎡
√1 √6 √2 3 √1 6
⎢ U =⎣
− √13 √1 3 − √13
− √12
⎤
⎥ 0 ⎦.
√1 2
The matrices Σ and V are unchanged. 2. The SVD of A is U ΣV T , where !
√1 2 √1 2
U=
− √12
"
, Σ=
√1 2
! √ √1 2 10 √ 0 , V = √25 10 0 5
− √25
" .
√1 5
In outer product form, we have √ A = 2 10 =2
1 1
!
√1 2 √1 2
#
"
<
1 2
$
+
=
√2 5
√1 5
+
−1 1
#
√
! 10
−2 1
− √12 √1 2
$
"
<
√1 5
− √25
=
.
3. The SVD of A is U ΣV T , where ⎡ ⎢ U =⎣
√1 2 √1 2
0
⎡ 1 ⎤ ⎤ ⎡ √ √ √1 0 4 3 0 0 6 2 √ ⎢ ⎥ 2 1 0 − √2 ⎦ , Σ = ⎣ 11 0 ⎦ , V = ⎣ √6 0 √1 0 0 0 −1 0 6
− √111 − √111 √3 11
The outer product form simplifies to ⎤ 1 # A = 2⎣ 1 ⎦ 1 0 ⎡
2 1
$
209
⎤ 0 # + ⎣ 0 ⎦ −1 −1 −1 ⎡
3
$
.
√7 66 − √466 √1 66
⎤ ⎥ ⎦.
4. The SVD of A is U ΣV T , where ⎡ 1 √1 ⎢ ⎢ U =⎢ ⎣ ⎡
2 1 2 1 2 1 2
− √16
2
0 0
0 √2 6 − √16
− √12 1 2 1 2 1 2 1 2
0 ⎢ √1 ⎢ 6 V =⎢ ⎣ − √26 √1 6
0 − √12 0 √1 2
⎤
√ 4 6 ⎥ ⎢ 0 ⎥ ⎥, Σ = ⎢ ⎣ ⎦ 0 0 ⎤
1 √ 2√3 − 23 1 √ 2 3 1 √ 2 3 √ − 23 1 √ 2 3 1 √ 2 3 1 √ 2 3
⎡
⎥ ⎥ ⎥. ⎦
The outer product form is ⎤ ⎡ ⎡ ⎤ 1 1 ⎥# ⎢ ⎢ 1 ⎥# $ ⎥ 0 1 −2 1 + 3 ⎢ 0 ⎥ 1 1 A = 2⎢ ⎣ 0 ⎦ ⎣ 1 ⎦ −1 1 5. We have
A=
1 2
#
1
0 1
$
⎤ 0 0 ⎥ ⎥, 0 ⎦ 0
0 √0 6 2 √0 0 2 3 0 0
=
√
! 10
⎤ −1 $ ⎢ 0 ⎥# ⎥ 1 1 +⎢ ⎣ 2 ⎦ 0 −1 ⎡
√1 5 √2 5
"
<
√1 2
√1 2
0
−1 0 1
$
.
= .
Thus A = U ΣV T , where ! U=
√1 5 √2 5
− √25 √1 5
"
√ 10 0 , Σ= 0 0
6. If U, V ∈ R3×3 are any orthogonal matrices and ⎡
2 0 Σ=⎣ 0 1 0 0
0 0
⎡ √1
2
, V =⎣ 0
√1 2
√1 2
0 − √12
⎤ 0 1 ⎦. 0
⎤ 0 0 ⎦, 1
then A = U ΣV T is a 3 × 3 matrix whose singular values are σ1 = 2, σ2 = 1, σ3 = 1. For example, we can take ⎡ ⎤ ⎤ ⎡ √1
⎢ 3 U = ⎣ √13 √1 3
and then
√1 6 − √26 √1 6
√1 2
⎢ 0 ⎥ ⎦,V = ⎣
− √12 ⎡ ⎢ A=⎣
4 3 1 3 1 3
1 3 1 3 4 3
√1 3 √1 3 − √13
√1 6 √1 6 √2 6
√1 2 − √12
⎥ ⎦,
0
⎤ − 31 ⎥ − 34 ⎦ . − 31
$ # 7. If the columns of A are A1 , . . . , An , U = A1 −1 A1 | · · · | An −1 An , V = I, and Σ is the diagonal matrix with diagonal entries A1 , . . . , An , then A = U ΣV T is the SVD of A. 8. Let A ∈ Cn×n . (a) If A has distinct singular values σ1 > σ2 > · · · > σn , and A = U ΣV T is the SVD of A, then AT A = V Σ2 V T , so that the columns of V are eigenvectors of AT A corresponding to the eigenvalues σ1 , . . . , σn . Since the eigenvalues are distinct, each eigenspace is one-dimensional, and there are
only two choices for each (normalized) eigenvector, say ±vi , i = 1, . . . , n. Thus there are 2n choices for V , Σ is uniquely determined by the condition that σ1 > · · · > σn , and U is determined by σi ui = Avi . This shows that if σn > 0, then U is uniquely determined by V . If σn = 0 (notice that since the singular values are distinct, at most one singular value is 0), then u1 , . . . , un−1 are uniquely determined, but there are two choices for un (from the one-dimensional space sp{u1 , . . . , un−1 }⊥ ). (b) Suppose there exist integers i, i + 1, . . . , i + k such that σi = σi+1 = · · · = σi+k , where σ1 , σ2 , . . . , σn are the singular values of A. Then σi2 is an eigenvalue of multiplicity k + 1 of the symmetric matrix AT A, and there are infinitely many choices of an orthonormal basis {vi , . . . , vi+k } of the k+1-dimensional eigenspace EAT A (σi2 ). If σi = 0, then the corresponding columns ui , . . . , ui+k of U are uniquely determined by σi ui = Avi , but if σi = 0 (in which case i + k = n since the singular values are ordered), then the choice of an orthonormal basis for sp{u1 , . . . , ui−1 }⊥ can be made in infinitely many different ways. 9. Let A ∈ Rn×n be a diagonal matrix. We wish to find the SVD of A. We will first suppose that the diagonal entries λ1 , . . . , λn of A are all nonnegative, and choose a permutation τ ∈ Sn such that λτ (1) ≥ λτ (2) ≥ · · · λτ (n) ≥ 0. We then define U = V = [eτ (1) | · · · |eτ (n) ], where e1 , . . . , en are the standard basis vectors, and Σ to be the diagonal matrix with diagonal entries λτ (1) , . . . , λτ (n) . Then U Σ = [λτ (1) eτ (1) | · · · |λτ (n) eτ (n) ]. Notice that 1, i = τ (j), Vij = 0, i = τ (j),
and T
(V )ij = Vji =
1, j = τ (i), 0, j = τ (i).
Therefore, the jth column of V T is eτ −1 (j) . The product of U Σ and eτ −1 (j) is simply column τ −1 (j) of U Σ, which is λτ (τ −1 (j)) eτ (τ −1(j)) = λj ej . Thus U ΣV T = [λ1 e1 | · · · |λn en ] = A, which shows that A = U ΣV T is the SVD of A. If the diagonal entries of A are not all positive,# we simply choose the permutation $ τ so that |λτ (1) | ≥ · · · |λτ (n) | ≥ 0. We define V as above, and U = sgn(λτ (1) )eτ (1) | · · · |sgn(λτ (n) )eτ (n) , where sgn(λ) is 1 if λ ≥ 0 and −1 if λ < 0. Then we still obtain U Σ = [λτ (1) eτ (1) | · · · |λτ (n) eτ (n) ], and therefore A = U ΣV T still holds. 10. Let A ∈ Cn×n be a diagonal matrix. Choose a permutation τ ∈ Sn such that |λτ (1) | ≥ · · · ≥ |λτ (1) | ≥ 0. For each j, define * λτ (j) |λτ (j) | , λj = 0, αj = 1, λj = 0, and define U = [α1 eτ (1) | · · · |αn eτ (n) ]. The columns of U are orthogonal (being multiples of the standard basis vectors), and each has norm one. Thus U is unitary. Also, αj |λτ (j) | = λτ (j) for each j, and if we define Σ to be the diagonal matrix with diagonal entries |λτ (1) |, . . . , |λτ (n) |, then U Σ = [λτ (1) eτ (1) | · · · |λτ (n) eτ (n) ]. With V = [eτ (1) | · · · |eτ (n) ], we obtain U ΣV T = A (see the previous exercise). 11. Suppose A ∈ Cn×n is invertible and A = U ΣV ∗ is the SVD of A. (a) We have (U ΣV ∗ )∗ = V Σ∗ U ∗ = V ΣU ∗ (since Σ is a real, diagonal matrix), and hence A∗ = V ΣU ∗ is the SVD of A. (b) Since A is invertible, all the diagonal entries of Σ are positive, and hence Σ−1 exists. We have (U ΣV ∗ )−1 = V Σ−1 U ∗ ; however, the diagonal entries of Σ−1 , σ1−1 , . . . , σn−1 , are ordered from smallest to largest. We obtain the SVD of A−1 by re-ordering. Define W = [vn | · · · |v1 ], Z = [un | · · · |u1 ], and let T be the diagonal matrix with diagonal entries σn−1 , . . . , σ1−1 . Then A−1 = W T Z ∗ is the SVD of A−1 .
(c) The SVD of A−∗ is ZT W ∗ , where W , T , and Z are defined above. In outer product form, A∗ =
n
σi (vi ⊗ ui ), A−1 =
i=1
n
σi−1 (vi ⊗ ui ), A−∗ =
i=1
n
σi−1 (ui ⊗ vi ).
i=1
12. Let A ∈ Rn×n be symmetric. (a) If A is positive definite, then the SVD is simply the spectral decomposition of A, with one caveat. Let A = V DV T , where the columns of V are orthonormal eigenvectors of A and D is a diagonal matrix whose diagonal entries are the corresponding eigenvalues of A, ordered so that λ1 ≥ · · · ≥ λn ≥ 0 (this is the caveat). Then the SVD of A is A = U ΣV T , where U = V and Σ = D. (b) If A has one or more negative eigenvalues, we order the eigenvalues and eigenvectors of A so that |λ1 | ≥ · · · ≥ |λn | ≥ 0. The columns of V are the corresponding eigenvectors v1 , . . . , vn , and the jth column of U will be vi if λj ≥ 0 and −vj if λj < 0. We define Σ to be the diagonal matrix with diagonal entries |λ1 |, . . . , |λn |. Then A = U ΣV T is the SVD of A. 13. Let A ∈ Cn×n be normal, and let A = XDX ∗ be the spectral decomposition of A. Assume the eigenvalues λ1 , . . . , λn and eigenvectors x1 , . . . , xn (that is, the diagonal entries of D and the columns of X) have already been ordered so that |λ1 | ≥ · · · |λn | ≥ 0. Define V = X, let Σ be the diagonal matrix with diagonal entries |λ1 |, . . . , |λn |, and let the jth column of U be (λj /|λj |)xj if λj = 0 and xj if λj = 0. Then, in either case, |λj |Uj = λj xj , and thus U Σ = XD. It follows that A = U ΣV ∗ is the SVD of A.
8.2 The SVD for general matrices
1. The SVD of A (in outer product form) is
A=
√
⎡ √1 ⎤ 10 ⎣
2
0 ⎦
<
√1 2
√1 5
√2 5
=
⎤ ⎡ 0 < √ + 5 ⎣ 1 ⎦ − √25 0
√1 5
= .
From this we see that N (A) is trivial and has no basis, and bases for col(A) and col(AT ) are ⎧⎡ 1 ⎤ ⎡ ⎤⎫ 0 ⎬ *! √1 " ! − √2 "> ⎨ √2 5 5 ⎣ 0 ⎦,⎣ 1 ⎦ , , , √2 √1 ⎭ ⎩ √1 5 5 0 2 respectively. The basis for N (AT ) is determined from the fact that N (AT ) = col(A)⊥ . A suitable basis for N (AT ) is ⎧⎡ ⎤⎫ √1 ⎬ ⎨ 2 ⎣ 0 ⎦ . ⎭ ⎩ − √12 2. The SVD of A is A = U ΣV T , where ! V =
√1 2 √1 2
√1 2 − √12
⎡
"
, Σ=
4 0 √ 0 2 2
⎢ ⎢ , U =⎢ ⎣
√1 2
0 0 − √12
1 2 1 2 1 2 1 2
1 2 − 21 − 21 1 2
0 √1 2 − √12
0
⎤ ⎥ ⎥ ⎥. ⎦
We see that N (A) is trivial and hence has no basis, and √ √ col(A) = sp{(1/ 2, 0, 0, −1/ 2), (1/2, 1/2, 1/2, 1/2)}, √ √ N (AT ) = sp{(1/2, −1/2, −1/2, 1/2), (0, 1/ 2, −1/ 2, 0)}, √ √ √ √ col(AT ) = sp{(1/ 2, 1/ 2), (1/ 2, −1/ 2)}. 3. We have projcol(A) b = (1, 5/2, 5/2, 4) , projN (AT ) b = (0, −1/2, 1/2, 0) . 4. Referring to the solution of Exercise 8.1.3, if U = [u1 |u2 |u3 ], V = [v1 |v2 |v3 ], then {u1 , u2 } is a basis for col(A), {u3 } is a basis for N (AT ), {v1 , v2 } is a basis for col(AT ), and {v3 } is a basis for N (A). 5. Referring to the solution of Exercise 8.1.4, if U = [u1 |u2 |u3 |u4 ], V = [v1 |v2 |v3 |v4 ], then {u1 , u2 , u3 } is a basis for col(A), {u4 } is a basis for N (AT ), {v1 , v2 , v3 } is a basis for col(AT ), and {v4 } is a basis for N (A). 6. Let A ∈ Cm×n have exactly r nonzero singular values. Then we can write r
σi (ui ⊗ vi ),
A= i=1
and {u1 , . . . , ur }, {v1 , . . . , vr } are orthonormal bases for col(A), col(AT ), respectively. It follows that dim(col(A)) = dim(col(AT )) = r, and hence dim(N (A)) = n − r, dim(N (AT )) = m − r. 7. Let u ∈ Rm and v ∈ Rn be given, and define A = uv T . If u or v is the zero vector, then A = 0, all the singular values of A are 0, and an SVD of A is Im 0m×n InT , where Im , In are identity matrices of size m × m and n × n, respectively, and 0m×n is the m × n zero matrix. Now assume that u = 0, v = 0. We have A= u 2 v 2
u u 2
v v 2
T ,
which is the outer product form of an SVD of A. The matrix A has one positive singular value, σ1 = u 2 v 2 , and the rest of the singular values are 0: σ2 = · · · = σmin{m,n} = 0. We define v1 = v/ v 2 and extend {v1 } to an orthonormal basis {v1 , . . . , vn } for Rn . Similarly, we define u1 = u/ u 2 and extend {u1 } to an orthonormal basis {u1 , . . . , um } of Rm . If we then define U = [u1 | · · · |um ], V = [v1 | · · · |vn ], and let Σ be the m × n diagonal matrix whose diagonal entries are u 2 v 2 , 0, . . . , 0, then A = U ΣV T is an SVD of A. 8. Let u ∈ Rn have Euclidean norm one, and define A = I − 2uuT . We know from Exercise 7.2.16 that A is symmetric. If v is orthogonal to u, then Av = v − 2(v · u)u = v − 0 = v, and hence every vector in W = sp{u}⊥ is an eigenvector corresponding to the eigenvalue λ = 1. Since dim(W ) = n − 1, we can find an orthonormal basis {v1 , . . . , vn−1 } for W . We also have Au = u − 2(u · u)u = −u, and hence u is an eigenvector corresponding to the eigenvalue λ = −1. We know from Exercise 8.1.12 how to construct the SVD from the spectral decomposition of A (when A is symmetric). All the singular values of A (which are the absolute values of the eigenvalues of A when A is symmetric) are 1, so we define Σ = I. Define vn = u and V = [v1 | · · · |vn ], and let U = [v1 | · · · |vn−1 | − vn ]. Then, by the solution of Exercise 8.1.12, A = U ΣV T . We can verify this directly as follows. We have T − vn vnT . U ΣV T = v1 v1T + · · · + vn−1 vn−1
Since V V T = I and V V T = v1 v1T + · · · + vn vnT , we see that U ΣV T = (I − vn vnT ) − vn vnT = I − 2vn vnT = I − 2uuT = A, as desired.
9. Let A ∈ √ Rm×n be nonsingular. We wish to compute min{ Ax 2 : x ∈ Rn , x 2 = 1}. Note that Ax 2 = x · AT Ax. By Exercise 7.4.5, the minimum value of x · AT Ax, where x 2 = 1, is the smallest eigenvalue of AT A, which is σn2 . Therefore, min{ Ax 2 : x ∈ Rn , x 2 = 1} = σn , and the value of x yielding the minimum is vn , the right singular vector corresponding to σn , the smallest singular value. 10. Let A ∈ Cn×n be arbitrary, and let A = U ΣV ∗ be the SVD of A. Then A = U V ∗ V ΣV ∗ = QH, where Q = U V ∗ ∈ Cn×n and H = V ΣV ∗ ∈ Cn×n . By inspection, H is symmetric and positive semidefinite (its eigenvalues are the singular values of A), and Q∗ Q = V U ∗ U V ∗ = V V ∗ = I, which shows that Q is unitary. Similarly, we can write A = U ΣU ∗ U V ∗ = GQ, where G = U ΣU ∗ and Q = U V ∗ (as above). Once again, we see immediately that G is symmetric positive semidefinite. 11. Let A ∈ Cm×n , and let A = U ΣV ∗ be the SVD of A. By Exercise 7.2.4, V ∗ x 2 = x 2 for all x ∈ Cn . For any x ∈ Cn , define y = V ∗ x. Then, assuming A has exactly r positive singular values, Ax
2 2 =
4 r 4
U Σy
4 2 2 =4 4
i=1
42 4 4 σi yi ui 4 = 4 2
r
σi2 yi2 ≤ σ12
i=1
r
yi2 ≤ σ12 y 22 = σ12 x 22 .
i=1
Taking the square root of both sides yields Ax 2 ≤ σ1 x 2 . 12. Let A ∈ Cm×n be nonsingular, and let A = U ΣV ∗ be the SVD of A. The nonsingularity of A implies that m ≥ n and also that all n singular values of A are positive. By Exercise 7.2.4, V ∗ x 2 = x 2 for all x ∈ Cn . For any x ∈ Cn , define y = V ∗ x. Then, 42 4 4 4 n 4 4 σi yi ui 4 = Ax 22 = U Σy 22 = 4 4 4 2
i=1
n
σi2 yi2 ≥ σn2
i=1
n
yi2 = σn2 y 22 = σn2 x 22 .
i=1
Taking the square root of both sides yields Ax 2 ≥ σn x 2 . 13. (a) Let Σ ∈ Cm×n be a diagonal matrix. We wish to find Σ† . Let r be the number of positive singular values of Σ, and note that r = m, r = n, or r < m and r < n are all possible. For all y ∈ Cn , we have Σy − b
2 2 =
m
2
r
((Σy)i − bi ) = i=1
(σi yi − bi )2 + C,
i=1
m 2 2 where C = 0 if r = m and C = i=r+1 bi if r < m. This formula shows that Σy − b 2 is −1 minimized by any y ∈ Cn satisfying yi = σi bi , i = 1, . . . , r. Notice that the values of yr+1 , . . . , yn
are undetermined if r < n; however, the minimum-norm solution results from taking yi = 0 for i = r + 1, . . . , n if r < n. (Therefore, if r = n, there is a unique least-squares solution, as we have seen before, whereas if r < n, there is a unique minimum-norm least-squares solution.) It follows that Σ† is the diagonal matrix with diagonal entries σ1−1 , . . . , σr−1 and the rest of the diagonal entries (if any) equal to 0. (b) Now let A ∈ Cn×n , b ∈ Cm be given, and let A = U ΣV ∗ be the SVD of A. Then, using the fact that a unitary matrix preserves norms, Ax − b 22 = U Σv ∗ x − b 2 = U (ΣV ∗ x − U ∗ b) 22 = ΣV ∗ x − U ∗ b 22 . Perform the change of variables y = V ∗ x and note that y 2 = V ∗ x 2 = x 2 for all x ∈ Rn . We then have Ax − b 22 = Σy − U ∗ b 22 , and the minimum-norm least squares solution is y = Σ† U ∗ b, or x = V Σ† U ∗ b (the fact that y 2 = x 2 is critical for concluding that the minimum-norm y corresponds to the minimum-norm x). This shows that A† = V Σ† U ∗ , where Σ† is computed above.
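The formula A† = V Σ†U∗ can be checked directly. The script below is only a sketch (the rank-deficient test matrix and right-hand side are arbitrary, the tolerance used to decide which singular values are "zero" is one reasonable choice among many, and NumPy is assumed):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))   # rank-deficient 6x5
    b = rng.standard_normal(6)

    U, s, Vh = np.linalg.svd(A, full_matrices=True)
    tol = max(A.shape) * np.finfo(float).eps * s[0]
    s_dag = np.array([1.0 / si if si > tol else 0.0 for si in s])

    Sigma_dag = np.zeros((A.shape[1], A.shape[0]))
    Sigma_dag[:len(s), :len(s)] = np.diag(s_dag)
    A_dag = Vh.T @ Sigma_dag @ U.T            # A is real, so conjugate transpose = transpose

    x = A_dag @ b                             # minimum-norm least-squares solution
    print(np.allclose(A_dag, np.linalg.pinv(A)))         # True
    print(np.allclose(A.T @ (A @ x - b), 0, atol=1e-8))  # the normal equations hold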
14. Let m > n and suppose A ∈ Rm×n has full rank. Let the SVD of A be A = U ΣV T and note that, by the assumptions, the singular values of A are σ1 ≥ · · · ≥ σn > 0. Since m > 0, Σ has the form Σ1 , Σ= 0 where Σ1 is the n × n matrix with σ1 , . . . , σn on the diagonal, and 0 is the (m − n) × n matrix of zeros. (a) We have AT A = V ΣT U T U ΣV T = V ΣT ΣV T , and ΣT Σ = Σ21 , which is diagonal and invertible (since the diagonal entries are all positive). ThereT fore, (AT A)−1 = V Σ−2 1 V , and −2 T T T T T A(AT A)−1 AT = U ΣV T V Σ−2 1 V V Σ U = U ΣΣ1 Σ U .
We have T ΣΣ−2 1 Σ =
Σ1 0
ˆ We will call this last matrix I. A(AT A)−1 AT .
Σ−2 1
#
$
0
Σ1
=
Σ1 Σ−2 1 Σ1 0
0 0
=
I 0
0 0
.
ˆ T . This is the SVD of We then have A(AT A)−1 AT = U IU
(b) If we now write U = [U1 |U2 ], where U1 ∈ Rm×n , U2 ∈ Rm×(m−n) , then T U1 I 0 b A(AT A)−1 AT b = [U1 |U2 ] 0 0 U2T T U1 b I 0 = [U1 |U2 ] 0 0 U2T b T U1 b = [U1 |U2 ] 0 n
(ui · b)ui .
= U1 U1T b = i=1
It follows that T
−1
A(A A)
T
A b
n
2 2 =
2
m
(ui · b) ≤ i=1
(ui · b)2 = U T b 22 = b 22 .
i=1
Thus A(AT A)−1 AT b 2 ≤ b 2 for all b ∈ Rm . 15. (a) Let U ∈ Cm×m , V ∈ Cn×n be unitary. We wish to prove that U A F = A F and AV F = A F for all A ∈ Cm×n . We begin with two preliminary observations. By definition of the Frobenius norm, for any A ∈ Cm×n , m n n A 2F =
|Aij |2
j=1
i=1
Aj 22 ,
= j=1
where Aj is the jth column of A. Also, it is obvious that AT F = A F for all A ∈ Cm×n . We thus have U A 2F =
n
(U A)j 2F =
j=1
n
j=1
U Aj 2F =
n
Aj 2F = A 2F
j=1
and AV as desired.
F =
V T AT F = AT F = A F ,
(b) Let A ∈ Cm×n be given, and let r be a positive integer with r < rank(A). We wish to find the matrix B ∈ Cm×n of rank r such that A − B F is as small as possible. If we define B = U Σr V T , where Σr ∈ Rm×n is the diagonal matrix with diagonal entries σ1 , . . . , σr , 0, . . . , 0, then A − B F = U ΣV ∗ − U Σr V ∗ F = U (Σ − Σr )V ∗ F = (Σ − Σr )V ∗ F = Σ − Σr F / 2 = σr+1 + · · · + σtw , where t ≤ min{m, n} is the rank of A. Notice that the rank of a matrix is the number / of positive singular values, so rank(B) = r, as desired. Thus we can make A−B F as small as
2 σr+1 + · · · + σtw .
Moreover, for any B ∈ Cm×n , we have A − B 2F = Σ − U ∗ BV 2F m n 2 Σij − (U ∗ BV )ij = i=1 j=1 t
=
m
t
i=1
j=1 j = i
2
(σi − (U ∗ BV )ii ) +
i=1
(U ∗ BV )2ij +
m
n
(U ∗ BV )2ij .
i=1 j=t+1
Now, we are free to choose all the entries of U ∗ BV (since U ∗ and V are invertible, given any C ∈ Cm×n , there exists a unique B ∈ Cm×n with U ∗ BV = C) to make the above sum as small as possible. Since all three summations are nonnegative, we should choose (U ∗ BV )ij = 0 for i = 1, . . . , m, j = 1, . . . , n, j = i for i = 1, . . . , t. This causes the second two summations to vanish, and yields A − B 2F =
t
2
(σi − (U ∗ BV )ii ) .
i=1 ∗
The rank of U BV (and hence the rank of B) is the number of nonzero diagonal entries (U ∗ BV )11 , . . . , (U ∗ BV )tt . Since the rank of B must be r, it is clear that U ∗ BV = Σr , where Σr is defined above, will make A − B F as small as possible. This shows that B = U Σr V ∗ is the desired matrix. 16. Let A ∈ Cm×n be given, and let A = U ΣV ∗ be the SVD of A. Then, using the results of the previous exercise, / A F = U ΣV ∗ F = ΣV ∗ F = Σ F =
σ12 + · · · + σt2 ,
where t = min{m, n} is the number of singular values of A.
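Both identities—the Frobenius-norm formula and the error of the best rank-r approximation—can be confirmed numerically. A minimal sketch (arbitrary test matrix and rank; NumPy assumed):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((7, 5))
    U, s, Vh = np.linalg.svd(A, full_matrices=False)

    r = 2
    B = U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]      # truncated SVD, rank r

    print(np.isclose(np.linalg.norm(A - B, 'fro'), np.sqrt(np.sum(s[r:]**2))))  # True
    print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))          # True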
8.3 Solving least-squares problems using the SVD
1. The least-squares solution is x = (1/10, 1/5). 2. The least-squares solution is x = (3/4, −3/4). 3. The matrix A has two positive singular values, σ1 = 6 and σ2 = 2. The corresponding right singular vectors are ⎡ 2 ⎤ ⎡ ⎤ √1 − 3 2 ⎢ ⎢ ⎥ ⎥ v1 = ⎣ 23 ⎦ , v2 = ⎣ √12 ⎦ 1 0 3
and the left singular vectors are
⎤ − √12 ⎢ 0 ⎥ ⎥ ⎥ ⎢ ⎥ ⎥ , u2 = ⎢ 1 ⎥ . ⎣ √2 ⎦ ⎦ 0
⎡ 1 ⎤ ⎢ ⎢ u1 = ⎢ ⎣
2 1 2 1 2 1 2
⎡
The minimum-norm least-squares solution to Ax = b is then given by x=
u1 · b u2 · b v1 + v2 . σ1 σ2
(a) x = (2/3, −1/3, 1/12); (b) x = (13/18, −5/18, 1/9); (c) x = (0, 0, 0) (b is orthogonal to col(A)). 4. Suppose A ∈ Rm×n has SVD A = U ΣV T , and we write U = [U1 |U2 ], where the columns of U1 form a basis for col(A) and the columns of U2 form a basis for N (AT ). Let us write r = rank(A), so that U1 = [u1 | · · · |ur ] and U2 = [ur+1 | · · · |um ]. Then {ur+1 , . . . , um } is an orthonormal basis for col(A)⊥ = N (AT ), and hence m (uj · b)uj .
projN (AT ) b = j=r+1
Since U2T b = (ur+1 ·b, . . . , um ·b), this is equivalent to projN (AT ) b = U2 U2T b. Moreover, since {ur+1 , . . . , um } is an orthonormal set, the Pythagorean theorem implies that 4 4 4U2 U2T b42 = 2
m
2
(uj · b)uj 2 = j=r+1
4 4 4 4 Thus 4U2 U2T b42 = 4U2T b42 for all b ∈ Rm .
m
(uj · b)2 uj 22 =
j=r+1
m
4 42 (uj · b)2 = 4U2T b42 .
j=r+1
5. Suppose A ∈ Rm×n has rank m and that A = U ΣV T is the SVD of A. Define matrices V1 = [v1 | · · · |vm ], V2 = [vm+1 | · · · |vn ] (where v1 , v2 , . . . , vn are the columns of V ), and Σ1 = diag(σ1 , . . . , σm ) (where σ1 , σ2 , . . . , σm are the singular values of A). Notice that Σ has the form Σ = [Σ1 |0], where here 0 is the m × (n − m) zero matrix. Also, since rank(A) = m, col(A) = Rm and Ax = b has a solution for all b ∈ Rm . T n−m , is the general solution to Ax = b. We have (a) We wish to show that x = V1 Σ−1 1 U b + V2 z, z ∈ R m
m
σi ui viT ⇒ Ax =
A= i=1
and
σi (vi · x)ui i=1
m
m
σi vi uTi ⇒ AT y =
AT = i=1
σi (ui · y)vi , i=1
which shows that {v1 , . . . , vm } is an orthonormal basis for col(AT ). Since {v1 , . . . , vm , vm+1 , . . . , vn } is an orthonormal basis for Rn , it follows that {vm+1 , . . . , vn } is an orthonormal basis for col(AT )⊥ = N (A), and hence N (A) = {V2 z : z ∈ Rn−m }. Next, notice that T T V1 V1 V1 I T , V1 = = V V1 = 0 V2T V2T V1 and hence −1 T T T AV1 Σ−1 1 U b = U ΣV V1 Σ1 U b = U [Σ1 |0]
I 0
−1 T T T Σ−1 1 U b = U Σ1 Σ1 U b = U U b = b.
T n−m has the form x = x̂ + v, where x̂ is one solution to Ax = b Thus x = V1 Σ−1 1 U b + V2 z, z ∈ R and v is an arbitrary vector in N (A). It follows that x is the general solution of Ax = b.
(b) For any u ∈ Rm , z ∈ Rn−m , V1 u is orthogonal to V2 z (since (V1 u) · (V2 z) = u · V1T V2 z, and V1T V2 is the zero matrix because every column of V1 is orthogonal to every column of V2 ). Therefore, by the Pythagorean theorem, V1 u + V2 z 22 = V1 u 22 + V2 z 22 . It follows that
4 4 4 4 4V1 Σ−1 U T b + V2 z 42 = 4V1 Σ−1 U T b42 + V2 z 2 , 1 1 2 2 2
and clearly the solution of minimum norm is obtained by choosing z = 0. 6. Let A ∈ Rm×n with rank(A) = n < m, and let A = U ΣV T be the SVD of A. Since rank(A) = n, A has n positive singular values, and we can partition U as U = [U1 |U2 ], U1 ∈ Rm×n , U2 ∈ Rm×(n−m) and Σ as Σ1 , Σ= 0 where Σ1 ∈ Rn×n is a diagonal matrix with positive diagonal entries. Therefore, Σ1 V T = U 1 Σ1 V T A = U ΣV T = [U1 |U2 ] 0 and
AT A = V Σ1 U1T U1 Σ1 V T = V Σ21 V T
(U1T U1 = I since the columns of U1 are orthonormal). Therefore, since V is orthogonal and Σ21 is invertible, (AT A)−1 = V Σ−2 1 VT and hence −2 −1 T T T T (AT A)−1 AT = V Σ−2 1 V V Σ1 U 1 = V Σ1 Σ1 U 1 = V Σ1 U 1 . T Finally, for any b ∈ Rm , (AT A)−1 AT b = V Σ−1 1 U1 b.
7. Let A ∈ Rm×n have rank r, and let σ1 , . . . , σr be the positive singular values of A, with corresponding right singular vectors v1 , . . . , vr ∈ Rn and left singular vectors u1 , . . . , ur ∈ Rm . Then, for all x ∈ Rn , r
σi (vi · x)ui
Ax = i=1
and, for all b ∈ Rm , A† b =
r
ui · b vi . σi i=1
8. Let A ∈ Rm×n be singular. We have seen that Ax = b has a unique least-squares solution in col(AT ) r ui ·b (cf. Exercise 6.6.14). By the previous exercise, x = i=1 σi vi is the minimum-norm least-squares solution of Ax = b (using the usual notation for the singular values and singular vectors of A, and assuming rank(A) = r). By Theorem 361, we see that this vector x lies in col(AT ), and hence x is the unique least-squares solution belonging to that subspace. 9. Let A ∈ Rm×n . (a) Suppose A has a right inverse B ∈ Rn×m . Then, for any b ∈ Rm , A(Bb) = (AB)b = Ib = b, which shows that, for each b ∈ Rm , Ax = b has a solution x = Bb. This in turn shows that col(A) = Rm , and hence that rank(A) = m. By the rank theorem, this implies that the rows of A are linearly independent. Conversely, if the rows of A are linearly independent, then rank(A) = m by the rank theorem, and hence Ax = b has a solution for each b ∈ Rm . This implies that any least-squares solution of Ax = b is actually a solution of Ax = b, and hence (AA† )b = A(A† b) = b for all b ∈ Rm . Therefore, AA† = I, and A† is a right inverse of A.
(b) With m ≤ n, we have a reduced SVD of A of the form A = U Σ1 V1T , where U ∈ Rm×m is orthogonal, Σ1 ∈ Rm×m , and V1 ∈ Rn×m has orthonormal columns. If A has linearly independent rows, then, T by the first part of this exercise, A† is a right inverse of A. By Exercise 5, A† = V1 Σ−1 1 U , and T hence V1 Σ−1 U is a right inverse of A. 1 10. Let A ∈ Rm×n . (a) If A has a left inverse B ∈ Rn×m , then Ax = 0 ⇒ BAx = B0 ⇒ Ix = 0 ⇒ x = 0. Therefore, N (A) is trivial and the fundamental theorem implies that rank(A) = n. This implies that A has linearly independent columns. Conversely, if A has linearly independent columns, then AT A is invertible, and B = (AT A)−1 AT satisfies BA = (AT A)−1 AT A = I. Thus A has a left inverse. (b) With n ≤ m, the reduced SVD of A has the form A = U1 Σ1 V T , where V ∈ Rn×n is orthogonal, Σ1 ∈ Rn×n , and U1 ∈ Rm×n has orthonormal columns. Assume that A has linearly independent T columns. Then, by Exercise 6 and the first part of this exercise, (AT A)−1 AT = V Σ−1 1 U1 is a left inverse of A.
8.4 The SVD and linear inverse problems
1. (a) Let A ∈ Rm×n be given, let I ∈ Rn×n be the identity matrix, and let be a positive number. For any b̂ ∈ Rm , we can solve the equation
A I
x=
b̂ 0
(8.1)
in the least-square sense, which is equivalent to minimizing 4 42 4 42 4 A 4 4 b̂ 4 4 4 = 4 Ax − b̂ 4 . x − 4 I 4 4 4 0 x 2 2 Now, for any Euclidean vector w ∈ Rk , partitioned as w = (u, v), u ∈ Rp , v ∈ Rq , p + q = k, we have w 22 = u 22 + v 22 , and hence 4 4 4 Ax − b̂ 42 4 4 = Ax − b̂ 2 + x 2 = Ax − b̂ 2 + 2 x 2 . 2 2 2 2 4 4 x 2 Thus solving (8.1) in the least-squares sense is equivalent to choosing x ∈ Rn to minimize Ax − b̂ 22 + 2 x 22 . (b) We have
A I
T =
#
AT
I
$
,
which implies that
A I
T
A I
T
2
= A A + I,
A I
T
b̂ 0
= AT b̂.
Hence the normal equations for (8.1) take the form (AT A + 2 I)x = AT b̂.
(c) We have
A I
x=0 ⇒
Ax x
=
0 0
,
which yields x = 0, that is, x = 0. Thus the matrix is nonsingular and hence, by the fundamental theorem, it has full rank. It follows from Exercise 6.4.1 that AT A + 2 I is invertible, and hence there is a unique solution x to (AT A + 2 I)x = AT b̂. 2. (a) If ≥ σ > 0, then σ 1 σ σ/ ≤ . ≤ 2 = σ 2 + 2 (b) Suppose ≥
√ σ > 0. Then σ/ 2 ≤ 1 and hence σ
=
σ 2 + 2
σ/ 2 1 σ 2 ≤ = 1. 1 1+
3. With the errors drawn from a normal distribution with mean zero and standard deviation 10−4 : • truncated SVD works best with k = 3 singular values/vectors; • Tikhonov regularization works best with around 10−3 . 4. With the errors drawn from a normal distribution with mean zero and standard deviation 10−4 : • truncated SVD works best with k singular values/vectors for k equal to 3 or 4 or 5 (depending on the particular example); • Tikhonov regularization works best with between 10−3 and 2 · 10−3 (depending on the particular example). 5. Suppose the kernel of a Fredholm integral operator of the first kind has the form N
k(t, s) =
fm (t)gm (s). m=1
If we define K ∈ Rn×n by Kij = Δsk(ti , sj ) (as in Example 364), then we have N
N
Kij =
Δsfm (ti )gm (sj ) = m=1
Δs Fm GTm ij ,
m=1
where Fm , Gm ∈ Rn , (Fm )i = fm (ti ), (Gm )j = gm (sj ). It follows that N
ΔsFm GTm ,
K= m=1
and hence, for any X ∈ Rn , N
Δs(Gm · X)Fm ∈ sp{F1 , . . . , FN }.
KX = m=1
Therefore, no matter how large the value of n, rank(K) ≤ N , and hence K has at most N nonzero singular values. Thus, according to the discussion on pages 483–484 of the text, the equation KX = Y should be regarded as a rank-deficient least-squares problem rather than a linear inverse problem.
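The two regularization strategies discussed in this section can be sketched in a few lines of code. Everything below is an illustrative setup only—the Gaussian kernel, grid size, noise level, truncation index k, and regularization parameter ε are arbitrary choices, not the data of Exercises 3 and 4—but it shows how the SVD is used in both truncated SVD and Tikhonov regularization:

    import numpy as np

    rng = np.random.default_rng(5)
    n = 60
    t = np.linspace(0.0, 1.0, n)
    K = np.exp(-20.0 * (t[:, None] - t[None, :])**2) / n   # smooth, ill-conditioned kernel matrix
    x_true = np.sin(np.pi * t)
    y = K @ x_true + 1e-4 * rng.standard_normal(n)          # noisy data

    U, s, Vh = np.linalg.svd(K)
    c = U.T @ y

    k = 10                                                  # truncated SVD: keep k components
    x_tsvd = Vh[:k, :].T @ (c[:k] / s[:k])

    eps = 1e-3                                              # Tikhonov: filter factors s/(s^2+eps^2)
    x_tik = Vh.T @ (s * c / (s**2 + eps**2))

    print(np.linalg.norm(x_tsvd - x_true) / np.linalg.norm(x_true))
    print(np.linalg.norm(x_tik - x_true) / np.linalg.norm(x_true))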
8.5 The Smith normal form of a matrix
1. We have A = U SV , where ⎡
4 U =⎣ 6 3
2. We have A = U SV , where ⎡
1 U =⎣ 6 1
3. We have A = U SV , where ⎡
4 U =⎣ 5 7
2 1 2
⎤ ⎡ 1 −1 0 ⎦, S = ⎣ 0 0 −1
⎤ ⎡ 1 0 0 −1 0 ⎦ , S = ⎣ 0 0 −1 1 ⎤ ⎡ 1 0 1 0 1 ⎦, S = ⎣ 0 0 −1 0
⎤ ⎡ 1 0 0 0 3 0 ⎦, V = ⎣ 0 1 0 0 0 15
⎤ −1 4 ⎦. 1
⎤ ⎤ ⎡ 5 7 5 0 0 2 0 ⎦ , V = ⎣ 12 18 13 ⎦ . 2 3 2 0 12
⎡ ⎤ 2 0 0 3 0 ⎦, V = ⎣ 1 0 0 0
1 0 0
⎤ 4 7 ⎦. 1
4. We wish to prove that each of the elementary matrices Ps described on pages 495–496 is unimodular. 1. Ps is a diagonal matrix; each diagonal entry is 1 except the ith, which is −1. Since Ps is diagonal, its determinant is the product of the diagonal entries, which is −1. 2. Ps is equal to the identity matrix, with the 0 in the i, j entry, j < i, replaced by a nonzero integer p. Here Ps is a simple lower triangular matrix with all ones on the diagonal; its determinant is the product of the diagonal entries, which is 1. 3. Ps is equal to the identity matrix, with rows i and j interchanged. By Corollary 168, det(Ps ) = −det(I) = −1. Thus, in each case, det(Ps ) = ±1, and hence Ps is unimodular. 5. (a) If A ∈ Zn×n is unimodular, then det(A) = ±1. By Corollary 166, det(A−1 ) = det(A)−1 = ±1, and hence A−1 is also unimodular. (b) If A, B ∈ Zn×n are unimodular, then det(AB) = det(A)det(B) by Theorem 163, and hence det(AB) = (±1)(±1) = ±1. This shows that AB is also unimodular. 6. Consider the computation of the unimodular matrices U and V in the Smith decomposition A = U SV . We have U = P −1 = (Pk Pk−1 · · · P1 )−1 = P1−1 P2−1 · · · Pk−1 . We can therefore compute U by beginning with U = I and successively replacing it with U = P1−1 , U = P1−1 P2−1 , and so forth, until we obtain U = P1−1 P2−1 · · · Pk−1 . We wish to explain how to rightmultiply a given matrix by Ps−1 , when Ps is any of the three possible elementary matrices, and thus how to compute U without first computing P and then P −1 . We consider the three possible elementary matrices: 1. Ps is a diagonal matrix; each diagonal entry is 1 except the ith, which is −1. In this case, Ps−1 = Ps , and hence right-multiplying B by Ps−1 is equivalent to multiplying the ith column of B by −1. 2. Ps is equal to the identity matrix, with the 0 in the i, j entry, j < i, replaced by a nonzero integer p. We wish to compute V = BPs−1 , where B is given. This equation is equivalent to V Ps = B, which implies Bk = Vk , k = j, Bj = Vj + pVi .
Solving for the columns of V yields
Vk = Bk , k = j, Vj = Bj − pVi = Bj − pBi . Thus right-multiplying B by Ps−1 is equivalent to subtracting p times column i from B from column j. 3. Ps is equal to the identity matrix, with rows i and j interchanged. Notice that Ps is its own inverse, and we can also characterize Ps as the identity matrix with columns i and j interchanged. It follows that right-multiplying B by Ps−1 is equivalent to interchanging columns i and j of B. 7. Suppose a, b, c, d ∈ Z with a ≡ c (mod p) and b ≡ d (mod p). Then there exist m, n ∈ Z such that a = c + mp, b = d + np. Therefore, a + b = (c + mp) + (d + np) = (c + d) + (m + n)p ⇒ a + b ≡ c + d (mod p) and ab = (c + mp)(d + np) = cd + (dm + cn + mnp)p ⇒ ab ≡ cd (mod p). 8. (a) Let A ∈ Zn×n and let S ∈ Zn×n be the Smith normal form of A, with nonzero diagonal entries d1 , d2 , . . . , dr . The rank of S is obviously r ({e1 , . . . , er } is a basis for col(S)), and we have A = U SV , where U , V are unimodular and hence invertible. Now, for each j = 1, . . . , n, there exists xj ∈ Zn such that V xj = ej (the fact that xj belongs to Zn follows from Cramer’s rule and the fact that V is unimodular). It follows that, for j = 1, . . . , r, Axj = U SV xj = U Sej = U (dj ej ) = dj Uj , where Uj is the jth column of U . It follows that sp{U1 , . . . , Ur } ⊂ col(A), and hence rank(A) ≥ r. On the other hand, for j = r + 1, . . . , n, Axj = U SV xj = U Sej = U 0 = 0. Therefore, sp{xj+1 , . . . , xn } ⊂ N (A). Since {V xj+1 , . . . , V xn } = {ej+1 , . . . , en } is linearly independent, it follows that {xj+1 , . . . , xn } is linearly independent. Thus nullity(A) ≥ n − r. Since rank(A) + nullity(A) = n, rank(A) ≥ r, nullity(A) ≥ n − r, it follows that rank(A) = r (and nullity(A) = n − r), which is what we wanted to prove. (b) Let B ∈ Zn×n and let T ∈ Zn×n be the Smith normal form over Zp of B, with nonzero diagonal p p entries e1 , e2 , . . . , es . Then the same proof as in the first part of this exercise shows that rank(B) = s. (Notice that the vectors xj now belong to Znp since Cramer’s rule is valid over an arbitrary field.)
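A quick numerical check of Exercise 4 (a sketch with concrete, arbitrarily chosen indices and multiplier; NumPy is assumed): each of the three kinds of elementary matrices has determinant +1 or −1, and hence is unimodular.

    import numpy as np

    n, i, j, p = 4, 2, 0, 3                   # arbitrary size, indices, and integer multiplier

    P1 = np.eye(n); P1[i, i] = -1             # negate row i
    P2 = np.eye(n); P2[i, j] = p              # add p times row j to row i (j < i)
    P3 = np.eye(n); P3[[i, j]] = P3[[j, i]]   # interchange rows i and j

    for P in (P1, P2, P3):
        print(round(np.linalg.det(P)))        # -1, 1, -1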
Chapter 9

Matrix factorizations and numerical linear algebra

9.1 The LU factorization

1.
1 0 0 ⎢ 3 1 0 L=⎢ ⎣ −1 −4 1 −2 0 5 2.
⎡
1 0 ⎢ −4 1 L=⎢ ⎣ −3 −3 4 −1
0 0 1 −5
3 2 1 −1 0 1 0 0
⎤ 4 3 ⎥ ⎥. −1 ⎦ 1
⎡ ⎤ 1 −2 2 0 ⎢ 0 −1 4 0 ⎥ ⎥, U = ⎢ ⎣ 0 0 1 0 ⎦ 0 0 0 1
⎤ 3 −2 ⎥ ⎥. 5 ⎦ 1
⎤ ⎡ 1 0 ⎢ 0 0 ⎥ ⎥, U = ⎢ ⎣ 0 0 ⎦ 0 1
3. Let L ∈ Rn×n be lower triangular and invertible. We wish to prove that L−1 is also lower triangular. We will argue by induction on n. The case n = 1 is trivial, so suppose the result holds for (n − 1) × (n − 1) matrices. If L ∈ Rn×n is lower triangular and invertible, then we can write a 0 , L= w L1 where a ∈ R, a = 0, w ∈ Rn−1 , and L1 ∈ R(n−1)×(n−1) is lower triangular and invertible. Since L is invertible, L−1 exists, and we can write b u −1 , L = v M where b ∈ R, u, v ∈ Rn−1 , and M ∈ R(n−1)×(n−1) . Then LL−1 = I implies au ab 1 0 . = 0 I bw + L1 v wuT + L1 M We see that ab = 1, which implies that b = a−1 , and au = 0, which yields u = 0 since a = 0. Then −1 T bw+L1 v = 0 implies that L1 v = −a−1 w, or v = −a−1 L−1 1 w, and wu +L1 M = I, u = 0 yields M = L1 . −1 By the induction hypothesis, L1 is lower triangular, and we see that 0 a−1 L−1 = −1 −a−1 L−1 1 w L1 is also lower triangular. This completes the proof by induction. 223
4. Let L, M ∈ Rn×n be lower triangular. Then n
(LM )ij =
Lik Mkj . k=1
Since Lik = 0 for k > i and Mkj = 0 for k < j, it follows that i
(LM )ij =
Lik Mkj , k=j
where now we include only the nonzero terms in the sum. If j > i, then there are no nonzero terms in the sum, that is, L_{ik}M_{kj} = 0 for all k. Thus (LM)_{ij} = 0 for all j > i, and hence LM is lower triangular.
5. Let L_k be defined by equation (9.12) in the text (page 509), and let M_k be the matrix labeled L_k^{−1} in equation (9.12) (page 510). We wish to prove that L_kM_k = I, that is, that M_k = L_k^{−1}. Left-multiplying M_k by L_k has the effect of leaving the first k rows of M_k unchanged, and adding −ℓ_{jk} times row k to row j for j > k. The rows of M_k are e_1, . . . , e_k, ℓ_{k+1,k}e_k + e_{k+1}, . . . , ℓ_{n,k}e_k + e_n. Thus, the jth row of L_kM_k is e_j for j = 1, . . . , k, and it is −ℓ_{jk}e_k + (ℓ_{jk}e_k + e_j) = e_j for j = k + 1, . . . , n. In other words, for all j, the jth row of L_kM_k is e_j, and hence L_kM_k = I, as desired.
6. Let L_k^{−1} ∈ R^{n×n} be defined by
(L_k^{-1})_{ii} = 1 for all i,  (L_k^{-1})_{ik} = ℓ_{ik} for i = k + 1, ..., n,

and all other entries zero; that is, L_k^{-1} is the identity matrix with ℓ_{k+1,k}, ..., ℓ_{n,k} inserted below the diagonal in column k. We wish to prove that L_{n−m}^{-1} L_{n−m+1}^{-1} ··· L_{n−1}^{-1} is given by

( L_{n−m}^{-1} L_{n−m+1}^{-1} ··· L_{n−1}^{-1} )_{ij} =
  0,        j > i,
  1,        j = i,
  ℓ_{ij},   n − m ≤ j < i,
  0,        j < min{n − m, i}.        (9.1)

We will argue by induction on m. For m = 1, formula (9.1) implies that L_{n−1}^{-1} is the identity matrix with the (n, n − 1) entry changed from 0 to ℓ_{n,n−1}, which is correct. Let us assume the result holds for m − 1, where m ≥ 2:

( L_{n−m+1}^{-1} L_{n−m+2}^{-1} ··· L_{n−1}^{-1} )_{ij} =
  0,        j > i,
  1,        j = i,
  ℓ_{ij},   n − m + 1 ≤ j < i,
  0,        j < min{n − m + 1, i}.

We will write B = L_{n−m+1}^{-1} L_{n−m+2}^{-1} ··· L_{n−1}^{-1}. Notice that left-multiplying B by L_{n−m}^{-1} leaves the first n − m rows of B unchanged, and adds ℓ_{i,n−m} times row n − m of B to row i of B, i = n − m + 1, ..., n. By the induction hypothesis, row n − m of B is simply e_{n−m}, so left-multiplying B by L_{n−m}^{-1} simply adds ℓ_{i,n−m} to the (i, n − m) entry of B. The induction hypothesis implies that the (i, n − m) entry of B is 0, so the effect of left-multiplying B by L_{n−m}^{-1} is simply to insert ℓ_{i,n−m} in the (i, n − m) entry, i = n − m + 1, ..., n. This verifies (9.1), and completes the proof by induction.
7. Let

L = [ 1 0; ℓ 1 ],  U = [ u v; 0 w ],

and notice that

LU = [ u v; ℓu ℓv + w ].

(a) We wish to show that there do not exist matrices L, U of the above forms such that LU = A, where

A = [ 0 1; 1 1 ].

This is straightforward, since LU = A implies u = 0 (comparing the (1,1) entries) and also ℓu = 1 (comparing the (2,1) entries). No choice of ℓ, u, v, w can make both of these true, and hence there do not exist such L and U.

(b) If

A = [ 0 1; 0 1 ],

then LU = A is equivalent to u = 0, v = 1, and ℓ + w = 1. There are infinitely many choices of ℓ and w that will work, and hence there exist infinitely many L, U satisfying LU = A.

8. We wish to prove that if A ∈ R^{n×n} has a unique LU factorization, then M^{(1)}, M^{(2)}, ..., M^{(n−1)} are all nonsingular. We first prove the following preliminary result: If A ∈ R^{n×n} is nonsingular and has an LU factorization, then the LU factorization is unique. We argue by induction on n. The result is obvious for n = 1 (since there is only one 1 × 1 unit lower triangular matrix). Suppose the result holds for all (n − 1) × (n − 1) matrices, and let A ∈ R^{n×n}. Suppose A = LU, where L is unit lower triangular and U is upper triangular. We can partition L and U as follows:

L = [ L̃ 0; ℓ^T 1 ],  U = [ Ũ u; 0 U_{nn} ],

and then LU = A implies that

[ L̃Ũ  L̃u; ℓ^T Ũ  ℓ·u + U_{nn} ] = [ M^{(n−1)}  b; c^T  A_{nn} ],

or

L̃Ũ = M^{(n−1)},  L̃u = b,  Ũ^T ℓ = c,  ℓ·u + U_{nn} = A_{nn}.        (9.2)

In particular, we see that L̃Ũ is an LU factorization of M^{(n−1)}. Notice that U must be nonsingular since A is nonsingular. Since det(U) = det(Ũ) U_{nn}, we see that Ũ is also nonsingular. The matrix L̃ is also nonsingular (since it is unit lower triangular), and the equations (9.2) then show that L and U are uniquely determined: L̃ and Ũ are unique by the induction hypothesis, and the equations (9.2) then determine ℓ, u, and U_{nn}:

u = L̃^{-1} b,  ℓ = Ũ^{-T} c,  U_{nn} = A_{nn} − ℓ·u.

This completes the proof by induction.

We can now prove the desired result. Suppose A ∈ R^{n×n} has a unique LU factorization. Referring to (9.2), it is easy to see that Ũ must be nonsingular, since otherwise we could produce a different LU factorization of A (choose any ℓ̂ ∈ N(Ũ^T), ℓ̂ ≠ 0, replace ℓ by ℓ + ℓ̂, and replace U_{nn} by A_{nn} − ℓ·u − ℓ̂·u). But then M^{(n−1)} has the LU factorization M^{(n−1)} = L̃Ũ, and M^{(n−1)} is nonsingular since both L̃ and Ũ are nonsingular. The desired result now follows by a simple induction argument.

9. Let A ∈ R^{n×n} have an LU factorization. It requires n − 1 steps to compute L and U, where step j produces zeros below the diagonal in the jth column of A (thinking now of A as an array that is updated at each step until it is upper triangular). Step j requires n − j substeps: compute the multiplier ℓ_{ij} and subtract ℓ_{ij} times row j from row i, for i = j + 1, ..., n. Computing ℓ_{ij} requires one division, and subtracting ℓ_{ij} times row j from row i requires 2(n − j) operations (n − j multiplications and n − j
subtractions—notice that we need not operate on the first j entries in the rows, since the result is already known to be zero). Thus step j requires a total of (n − j)(2(n − j) + 1) = 2(n − j)² + (n − j) operations. The total operation count for computing L and U is

Σ_{j=1}^{n−1} [ 2(n − j)² + (n − j) ] = 2 Σ_{j=1}^{n−1} (n − j)² + Σ_{j=1}^{n−1} (n − j) = 2 Σ_{k=1}^{n−1} k² + Σ_{k=1}^{n−1} k
  = 2 · (n − 1)n(2n − 1)/6 + (n − 1)n/2
  = (2/3)n³ − (1/2)n² − (1/6)n,

which is what we were to prove.

10. Suppose L ∈ R^{n×n} is unit lower triangular, and we wish to solve Lc = b. Using forward substitution, we have

c_i = b_i − Σ_{j=1}^{i−1} L_{ij} c_j,  i = 1, ..., n.

Thus computing c_i requires 2(i − 1) operations, and the total operation count for computing c is

Σ_{i=1}^{n} 2(i − 1) = 2 ( Σ_{i=1}^{n} i − Σ_{i=1}^{n} 1 ) = 2 ( n(n + 1)/2 − n ) = n² − n,

as desired.

11. Computing A^{-1} is equivalent to solving Ax = e_j for j = 1, 2, ..., n. If we compute the LU factorization and then solve the n systems LUx = e_j, j = 1, ..., n, the operation count is

( (2/3)n³ − (1/2)n² − (1/6)n ) + n(2n² − n) = (8/3)n³ − (3/2)n² − (1/6)n.

We can reduce the above operation count by taking advantage of the zeros in the vectors e_j. Solving Lc = e_j takes the following form:

c_i = 0,  i = 1, ..., j − 1,
c_j = 1,
c_i = − Σ_{k=j}^{i−1} L_{ik} c_k,  i = j + 1, ..., n.

Thus solving Lc = e_j requires Σ_{i=j+1}^{n} 2(i − j) = (n − j)² + (n − j) operations. The total for solving all n of the lower triangular systems Lc = e_j, j = 1, ..., n, is

Σ_{j=1}^{n} [ (n − j)² + (n − j) ] = (1/3)n³ − (1/3)n

(instead of n(n² − n) = n³ − n if we ignore the structure of the right-hand side). We still need n³ operations to solve the n upper triangular systems Ux_j = L^{-1}e_j, where x_j is the jth column of A^{-1} (since we perform back substitution, there is no simplification from the fact that the first j − 1 entries of L^{-1}e_j are zero). Hence the total is

( (2/3)n³ − (1/2)n² − (1/6)n ) + ( (1/3)n³ − (1/3)n ) + n³ = 2n³ − (1/2)n² − (1/2)n.

Notice the reduction in the leading term from (8/3)n³ to 2n³.
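The substitution steps counted in Exercises 9–11 translate directly into code. A minimal Python sketch (not from the text; the example system is hypothetical):

```python
import numpy as np

def forward_sub_unit(L, b):
    """Solve Lc = b, with L unit lower triangular, by forward substitution."""
    n = len(b)
    c = np.zeros(n)
    for i in range(n):
        # i multiplications, i-1 additions, 1 subtraction: 2(i-1) operations
        # in the 1-based count used in Exercise 10.
        c[i] = b[i] - L[i, :i] @ c[:i]
    return c

def back_sub(U, c):
    """Solve Ux = c, with U upper triangular, by back substitution."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

# Hypothetical example: solve (LU)x = b and compare with the product matrix.
L = np.array([[1., 0., 0.], [2., 1., 0.], [4., 3., 1.]])
U = np.array([[2., 1., 1.], [0., 1., 1.], [0., 0., 2.]])
b = np.array([1., 2., 3.])
x = back_sub(U, forward_sub_unit(L, b))
assert np.allclose((L @ U) @ x, b)
```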
9.2 Partial pivoting
1. The solution is x = (2, 1, −1); partial pivoting requires interchanging rows 1 and 2 on the first step, and interchanging rows 2 and 3 on the second step. 2. The solution is x = (1, 2, 1, 1); partial pivoting requires interchanging rows 1 and 3 on the first step, rows 2 and 3 on the second step, and rows 3 and 4 on the third step. 3. (a) fmin = 0.10 · · · 0 · 10emin = 10emin −1 . (b) fmax = 0.99 · · · 9 · 10emax = 10emax (1 − 10−t ). (c) Let x be a real number satisfying fmin ≤ x ≤ fmax , and let fl(x) be the floating point number closest to x. We can write x as x = m · 10e , where m ∈ [0.1, 1) and emin ≤ e ≤ emax . We have |x − fl(x)| ≤ 5 · 10−t−1 · 10e and |x| ≥ 10−1 · 10e ; therefore, |x − fl(x)| 5 · 10−t−1 · 10e ≤ = 5 · 10−t . |x| 10−1 · 10e 4. Cramer’s rule yields x̂ = (1.000, 1.000) for Example 382 and x̂ = (2.000, 0.000) for Example 384. 5. Suppose A ∈ Rn×n has an LU decomposition, A = LU . We know that the determinant of a square matrix is the product of the eigenvalues, and also that the determinant of a triangular matrix is the product of the diagonal entries. We therefore have det(A) = λ1 λ2 · · · λn , where λ1 , λ2 , . . . , λn are the eigenvalues of A (listed according to multiplicity), and also det(A) = det(LU ) = det(L)det(U ) = (1 1 · · · 1)(U11 U22 · · · Unn ) = U11 U22 · · · Unn . This shows that λ1 λ2 · · · λn = U11 U22 · · · Unn . 6. Suppose {e1 , e2 , . . . , en } is the standard basis for Rn and (i1 , i2 , . . . , in ) is a permutation of (1, 2, . . . , n). Let P be the permutation matrix with rows ei1 , ei2 , . . . , ein . (a) Let A be an n × m matrix. The kth row of P A is the linear combination of the rows of A, where the weights in the linear combination are the components of the kth row of P . Since the kth row of P is simply eik , the kth row of P A is just the ik th row of A. Thus the rows of P A are the rows of A, permuted according to the given permutation (i1 , i2 , . . . , in ). (b) Let A be an m × n matrix with columns A1 , A2 , . . . , An ∈ Rm . The kth row of AP T is a linear combination of the columns of A, where the weights in the linear combination are the components of the kth column of P T , which is eik . Thus the kth column of AP T is simply the ik th column of A, and therefore AP T = [Ai1 |Ai2 | · · · |Ain ]. 7. Let P ∈ Rn×n be an elementary permutation, that is, a permutation matrix defined by a transposition τ . Notice that a transposition τ is its own inverse: τ (τ (i)) = i for all i = 1, . . . , n. We have n
(P P)_{ij} = Σ_{k=1}^{n} P_{ik} P_{kj}.

Since P_{ik} = 1 if and only if k = τ(i), this reduces to (P P)_{ij} = P_{τ(i),j}. Since P_{τ(i),j} = 1 if and only if j = τ(τ(i)) = i, it follows that P P = I. If P is a general permutation matrix, that is, if τ is not a transposition, then τ(τ(i)) = i does not hold for all i, and P P ≠ I. For example,

P = [ 0 1 0; 0 0 1; 1 0 0 ]  ⇒  P P = [ 0 0 1; 1 0 0; 0 1 0 ] ≠ I.
8. Let A ∈ Rn×n be reduced to upper triangular form by a sequence of row interchanges and elementary operations: U = Ln−1 Pn−1 Ln−2 Pn−2 · · · L1 P1 A. Define P = Pn−1 Pn−2 · · · P1 and L̃k = Pn−1 Pn−2 · · · Pk+1 Lk Pk+1 Pk2 · · · Pn−1 , k = 1, 2, . . . , n − 1. (a) We wish to prove that L̃n−1 L̃n−2 · · · L̃1 P A = Ln−1 Pn−1 Ln−2 Pn−2 · · · L1 P1 A = U . We have L̃1 P = Pn−1 · · · P2 L1 P2 · · · Pn−1 Pn−1 · · · P2 P1 . From Exercise 7, we know that P P = I for each elementary permutation matrix, and hence P2 · · · Pn−1 Pn−1 · · · P2 P1 = P2 · · · Pn−2 Pn−2 · · · P2 P1 = . . . = P2 P2 P1 = P1 . Thus L̃1 P = Pn−1 · · · P2 L1 P1 . Similarly, L̃2 L̃1 P = Pn−1 · · · P3 L2 P3 · · · Pn−1 Pn−1 · · · P2 L1 P1 = Pn−1 · · · P3 L2 P2 L1 P1 , .. .. . . L̃k · · · L̃1 P = Pn−1 · · · Pk+1 Lk Pk · · · L2 P2 L1 P1 , .. .. . . L̃n−1 · · · L̃1 P = Ln−1 Pn−1 · · · L2 P2 L1 P1 . This proves the desired result. (b) Now we wish to show that each L̃k is an elementary matrix, namely, equal to the identity matrix with nonzeros below the k, k entry. We will show that if L is any elementary matrix of this form, then so is Pt LPt for any t > k. Notice that Pt is an elementary permutation matrix with the property that right-multiplication by Pt interchanges rows t and s ≥ t. Left-multiplication by Pt interchanges columns t and s. Now, by assumption, row t of L is t,k ek + et , and row s is s,k ek + es . Then each row of Pt L equals the corresponding row of L, except that row t becomes s,k ek + es and row s becomes t,k ek + et . Then, since t, s > k, columns t and s of Pt L are es and et , respectively, and columns t and s of Pt LPt are et and es , respectively. We then see that row t of Pt LPt is s,k ek + et and row s is t,k ek + es . Thus, the only difference between L and Pt LPt is that the number t,k and s,k have been exchanged in column k, and it follows that Pt LPt is an elementary matrix of the same form as is L. From this we see that L̃k = Pn−1 Pn−2 · · · Pk+1 Lk Pk+1 Pk2 · · · Pn−1 is an elementary matrix of the same form as Lk . (c) We have −1 −1 U = L̃n−1 L̃n−2 · · · L̃1 P A ⇒ P A = L̃−1 1 L̃2 · · · L̃n−1 U, −1 −1 or P A = LU , where L = L̃−1 1 L̃2 · · · L̃n−1 . From Theorems 378 and 379, we know that L is lower triangular.
9. On the first step, partial pivoting requires that rows 1 and 3 be interchanged. No interchange is necessary on step 2, and rows 3 and 4 must be interchanged on step 3. Thus

P = [ 0 0 1 0; 0 1 0 0; 0 0 0 1; 1 0 0 0 ].

The LU factorization of PA is PA = LU, where

L = [ 1 0 0 0; 0.5 1 0 0; 0.25 −0.5 1 0; −0.25 −0.1 0.25 1 ],
U = [ 6 2 −1 4; 0 3 1 −2; 0 0 2 1; 0 0 0 1 ].
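As a quick check of the PA = LU convention used here, the following sketch (illustrative only; the matrix is an arbitrary assumption, not the one from the exercise) runs SciPy's pivoted LU factorization and confirms the defining properties, including the fact that partial pivoting keeps all multipliers bounded by 1.

```python
import numpy as np
from scipy.linalg import lu

A = np.array([[1., 2., -1., 3.],
              [2., 4., 1., 0.],
              [6., 2., -1., 4.],
              [-1., 1., 2., 2.]])

P, L, U = lu(A)                      # SciPy returns A = P @ L @ U
assert np.allclose(P.T @ A, L @ U)   # equivalently, (P^T) A = L U as in the text
assert np.allclose(np.diag(L), 1.0)              # L is unit lower triangular
assert np.all(np.abs(L) <= 1 + 1e-12)            # multipliers bounded by 1
```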
9.3 The Cholesky factorization
1. (a) The Cholesky factorization is A = RᵀR, where

R = [ 1 −3 2; 0 2 −2; 0 0 2 ].

(b) We also have A = LDLᵀ, where L = (D^{-1/2}R)ᵀ and the diagonal entries of D are the diagonal entries of R:

D = [ 1 0 0; 0 2 0; 0 0 2 ],  L = [ 1 0 0; −3 √2 0; 2 −√2 √2 ].

Alternatively, we can write A = LU, where U = DR and L = (D^{-1}R)ᵀ:

L = [ 1 0 0; −3 1 0; 2 −1 1 ],  U = [ 1 −3 2; 0 4 −4; 0 0 4 ].

2. (a) The Cholesky factorization is A = RᵀR, where

R = [ 1 1 2 −1; 0 2 −2 2; 0 0 1 −1; 0 0 0 2 ].

(b) We also have A = LDLᵀ, where L = (D^{-1/2}R)ᵀ and the diagonal entries of D are the diagonal entries of R:

D = [ 1 0 0 0; 0 2 0 0; 0 0 1 0; 0 0 0 2 ],  L = [ 1 0 0 0; 1 √2 0 0; 2 −√2 1 0; −1 √2 −1 √2 ].

Alternatively, we can write A = LU, where U = DR and L = (D^{-1}R)ᵀ:

L = [ 1 0 0 0; 1 1 0 0; 2 −1 1 0; −1 1 −1 1 ],  U = [ 1 1 2 −1; 0 4 −4 4; 0 0 1 −1; 0 0 0 4 ].

3. The matrix A is not positive definite. Applying the given algorithm for computing the Cholesky factorization, we find R₁₁ = 1, R₁₂ = −1, R₁₃ = 3, R₂₂ = 2, R₂₃ = 1. However, the equation for R₃₃ is

R₁₃² + R₂₃² + R₃₃² = A₃₃ ⇔ 3² + 1² + R₃₃² = 1,

which has no (real) solution.

4. The total operation count is

(1/3)n³ + (1/2)n² + (1/6)n.

This counts a square root as a single operation. Actually, computing a square root is much more expensive than the other operations, so it might be preferable to report the operation count as (1/3)n³ + (1/2)n² − (5/6)n operations plus n square roots.
5. Let A ∈ R^{n×n} be SPD. The algorithm described on pages 527–528 of the text (that is, the equations derived on those pages) shows that there is a unique upper triangular matrix R with positive diagonal entries such that RᵀR = A. (The only freedom in solving those equations for the entries of R lies in choosing the positive or negative square root when computing R_{ii}. If R_{ii} is constrained to be positive, then the entries of R are uniquely determined.)

6. Suppose A ∈ R^{n×n} is symmetric and there exists an upper triangular matrix R ∈ R^{n×n} with positive diagonal entries such that A = RᵀR. If x ∈ R^n, x ≠ 0, then Rx ≠ 0 (since R is nonsingular—its determinant is the product of its diagonal entries), and therefore

x · Ax = x · RᵀRx = (Rx) · (Rx) = ‖Rx‖₂² > 0.

Thus A is positive definite.

7. Let A ∈ R^{n×n} be SPD. We wish to derive an algorithm for computing the factorization A = LDLᵀ, where D is a diagonal matrix with positive diagonal entries and L is unit lower triangular. We have (LD)_{ij} = D_{jj} L_{ij}, and hence

(LDLᵀ)_{ij} = Σ_{k=1}^{n} D_{kk} L_{ik} L_{jk} = Σ_{k=1}^{min{i,j}} D_{kk} L_{ik} L_{jk},

where the index k is restricted to be no more than min{i, j} because L is lower triangular, and hence L_{ik} L_{jk} = 0 for k > min{i, j}. Bearing in mind that L_{ii} = 1 for all i, we have

(LDLᵀ)_{ii} = A_{ii} ⇒ D_{ii} = A_{ii} − Σ_{k=1}^{i−1} D_{kk} L_{ik}²

and

(LDLᵀ)_{ij} = A_{ij} ⇒ L_{ji} = D_{ii}^{-1} ( A_{ij} − Σ_{k=1}^{i−1} D_{kk} L_{ik} L_{jk} ),  j = i + 1, ..., n.

This allows us to compute L one column at a time. (Note that these equations simplify to D₁₁ = A₁₁ and L_{j1} = A_{1j}/D₁₁, j = 2, ..., n.)
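The equations in Exercise 7 translate directly into a short routine. A minimal sketch (assuming an SPD input; the test matrix is an arbitrary illustration, not from the text):

```python
import numpy as np

def ldlt(A):
    """LDL^T factorization of an SPD matrix, following the equations derived above."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for i in range(n):
        d[i] = A[i, i] - np.sum(d[:i] * L[i, :i] ** 2)          # D_ii
        for j in range(i + 1, n):
            L[j, i] = (A[i, j] - np.sum(d[:i] * L[i, :i] * L[j, :i])) / d[i]   # L_ji
    return L, d

A = np.array([[4., 2., -2.], [2., 10., 2.], [-2., 2., 6.]])      # SPD example
L, d = ldlt(A)
assert np.allclose(L @ np.diag(d) @ L.T, A)
```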
9.4 Matrix norms
1. Let ‖·‖ be any induced matrix norm on R^{n×n}. If λ ∈ C, x ∈ Cⁿ, x ≠ 0, is an eigenvalue/eigenvector pair of A, then

‖Ax‖ ≤ ‖A‖ ‖x‖ ⇒ ‖λx‖ ≤ ‖A‖ ‖x‖ ⇒ |λ| ‖x‖ ≤ ‖A‖ ‖x‖ ⇒ |λ| ≤ ‖A‖

(the last step follows from the fact that x ≠ 0). Since this holds for every eigenvalue of A, it follows that

ρ(A) = max{ |λ| : λ is an eigenvalue of A } ≤ ‖A‖.

2. Recall that ‖A‖_F can be written as
‖A‖_F² = Σ_{j=1}^{n} ‖A_j‖₂²,

where A₁, ..., A_n are the columns of A and ‖·‖₂ is the Euclidean norm. Therefore, if A, B ∈ R^{n×n},

‖AB‖_F² = Σ_{j=1}^{n} ‖(AB)_j‖₂² = Σ_{j=1}^{n} ‖AB_j‖₂² ≤ Σ_{j=1}^{n} ‖A‖_F² ‖B_j‖₂² = ‖A‖_F² Σ_{j=1}^{n} ‖B_j‖₂² = ‖A‖_F² ‖B‖_F²,

which shows that ‖AB‖_F ≤ ‖A‖_F ‖B‖_F.
3. Let ‖·‖ be a norm on R^n, and denote the induced matrix norm on R^{m×n} by the same symbol. Let us define

‖A‖_* = sup{ ‖Ax‖/‖x‖ : x ∈ R^n, x ≠ 0 }.

For any x ∈ R^n, x ≠ 0, the vector ‖x‖^{-1}x has norm one. Therefore,

‖Ax‖ = ‖x‖ ‖A(‖x‖^{-1}x)‖ ≤ ‖A‖ ‖x‖

(since ‖Ay‖ ≤ ‖A‖ for all vectors y of norm one). It follows that ‖A‖_* ≤ ‖A‖. Conversely, if x ∈ R^n, ‖x‖ = 1, then

‖Ax‖ = ‖Ax‖/‖x‖ ≤ ‖A‖_*,

and it follows that ‖A‖ ≤ ‖A‖_*. Thus ‖A‖_* = ‖A‖.

4. Let A ∈ R^{m×n} be given, and let x ∈ R^n satisfy ‖x‖_∞ = 1. Then, in particular, |x_j| ≤ 1 for all j = 1, ..., n. We have

‖Ax‖_∞ = max{ |(Ax)_i| : i = 1, ..., m } = max{ | Σ_{j=1}^{n} A_{ij} x_j | : i = 1, ..., m }
        ≤ max{ Σ_{j=1}^{n} |A_{ij} x_j| : i = 1, ..., m }
        ≤ max{ Σ_{j=1}^{n} |A_{ij}| : i = 1, ..., m }.

This shows that ‖A‖_∞ is no more than the maximum absolute row sum of A. Now let k satisfy 1 ≤ k ≤ m and

Σ_{j=1}^{n} |A_{kj}| = max{ Σ_{j=1}^{n} |A_{ij}| : i = 1, ..., m }.

Define x ∈ R^n by the condition that x_j = 1 if A_{kj} ≥ 0 and x_j = −1 otherwise. Then ‖x‖_∞ = 1, A_{kj} x_j = |A_{kj}| for all j = 1, ..., n, and (Ax)_k = Σ_{j=1}^{n} |A_{kj}|. This shows that ‖Ax‖_∞ is no less than the maximum absolute row sum of A. Thus we have shown that

‖Ax‖_∞ ≤ max{ Σ_{j=1}^{n} |A_{ij}| : i = 1, ..., m }

for all x ∈ R^n with ‖x‖_∞ = 1, and

‖Ax‖_∞ ≥ max{ Σ_{j=1}^{n} |A_{ij}| : i = 1, ..., m }

for at least one x ∈ R^n with ‖x‖_∞ = 1. It follows that

‖A‖_∞ = sup{ ‖Ax‖_∞ : x ∈ R^n, ‖x‖_∞ = 1 } = max{ Σ_{j=1}^{n} |A_{ij}| : i = 1, ..., m }.
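The characterization of ‖A‖_∞ in Exercise 4, and the maximizing vector constructed in the proof, are easy to check numerically (an illustrative sketch; the random matrix is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 7))

max_row_sum = np.max(np.sum(np.abs(A), axis=1))
assert np.isclose(np.linalg.norm(A, np.inf), max_row_sum)

# The maximizing x from the argument: x_j = sign(A_kj) for the row k of largest absolute sum.
k = np.argmax(np.sum(np.abs(A), axis=1))
x = np.where(A[k] >= 0, 1.0, -1.0)
assert np.isclose(np.linalg.norm(A @ x, np.inf), max_row_sum)
```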
5. Let ‖·‖ be a vector norm on R^n. We wish to prove that the induced matrix norm ‖·‖ satisfies the definition of a norm relative to the vector space R^{m×n}. By definition, it is obvious that ‖A‖ ≥ 0 for all A ∈ R^{m×n} and that ‖A‖ = 0 if A is the zero matrix. Moreover, if A is not the zero matrix, say A_j ≠ 0, then ‖Ae_j‖ > 0, which shows that ‖A‖ > 0. Thus the first property of a norm is satisfied. The second property is easily verified: If α ∈ R, then, for all x ∈ R^n, ‖(αA)x‖ = ‖α(Ax)‖ = |α| ‖Ax‖, and it follows immediately that ‖αA‖ = |α| ‖A‖. Finally, we must verify the triangle inequality. If A, B ∈ R^{m×n}, then

‖A + B‖ = sup{ ‖(A + B)x‖ : x ∈ R^n, ‖x‖ = 1 } = sup{ ‖Ax + Bx‖ : x ∈ R^n, ‖x‖ = 1 }
        ≤ sup{ ‖Ax‖ + ‖Bx‖ : x ∈ R^n, ‖x‖ = 1 }
        ≤ sup{ ‖Ax‖ : x ∈ R^n, ‖x‖ = 1 } + sup{ ‖Bx‖ : x ∈ R^n, ‖x‖ = 1 } = ‖A‖ + ‖B‖.

The next-to-last step holds because sup{f(x) + g(x) : x ∈ S} ≤ sup{f(x) : x ∈ S} + sup{g(x) : x ∈ S} for any set S and any functions f : S → R, g : S → R. Therefore, the triangle inequality holds, and we have verified that ‖·‖ is a norm on R^{m×n}.

6. Let A ∈ R^{n×n} be symmetric. Then there exists a diagonal matrix D ∈ R^{n×n} and an orthogonal matrix X ∈ R^{n×n} such that A = XDXᵀ. We can order the columns of X so that the diagonal entries of D are the eigenvalues of A ordered from largest to smallest: λ₁ ≥ ··· ≥ λ_n. Given any x ∈ R^n, define y = Xᵀx, and notice that ‖y‖₂ = ‖x‖₂ since Xᵀ is orthogonal. We have

x · Ax = x · (XDXᵀx) = (Xᵀx) · D(Xᵀx) = y · Dy = Σ_{i=1}^{n} λ_i y_i².

We have

Σ_{i=1}^{n} λ_i y_i² ≤ λ₁ Σ_{i=1}^{n} y_i² = λ₁ ‖y‖₂² = λ₁ ‖x‖₂²,

and similarly λ_n ‖x‖₂² ≤ Σ_{i=1}^{n} λ_i y_i². Therefore,

λ_n ‖x‖₂² ≤ x · Ax ≤ λ₁ ‖x‖₂².

7. Let A ∈ R^{m×n}. We wish to prove that ‖Aᵀ‖₂ = ‖A‖₂. This follows immediately from Theorem 403 and Exercise 4.5.14. Theorem 403 implies that ‖A‖₂ = √(λ_max(AᵀA)) and ‖Aᵀ‖₂ = √(λ_max(AAᵀ)). By Exercise 4.5.14, AᵀA and AAᵀ have the same nonzero eigenvalues, and hence λ_max(AᵀA) = λ_max(AAᵀ), from which it follows that ‖A‖₂ = ‖Aᵀ‖₂.

8. Let A ∈ R^{n×n} be invertible. Then AᵀA is SPD and hence the smallest eigenvalue λ_n of AᵀA is positive. Now, ‖A^{-1}‖₂ = √(λ_max(A^{-T}A^{-1})) = √(λ_max((AAᵀ)^{-1})). The positive eigenvalues of AAᵀ are the same as the positive eigenvalues of AᵀA; since both AᵀA and AAᵀ are nonsingular, all of the eigenvalues of AAᵀ are positive and λ_n is the smallest eigenvalue of AAᵀ. The eigenvalues of (AAᵀ)^{-1} are the reciprocals of the eigenvalues of AAᵀ, and it follows that λ_n^{-1} is the largest eigenvalue of A^{-T}A^{-1} = (AAᵀ)^{-1}. Therefore, ‖A^{-1}‖₂ = √(λ_n^{-1}) = 1/√λ_n. Since the singular values of A are σ₁ = √λ₁, ..., σ_n = √λ_n, we can also write ‖A^{-1}‖₂ = 1/σ_n, where σ_n is the smallest singular value of A.
9. Let ‖·‖ denote any norm on R^n and also the corresponding induced matrix norm, and let A ∈ R^{n×n} be invertible. For any x ∈ R^n, we have x = A^{-1}(Ax), which implies ‖x‖ ≤ ‖A^{-1}‖ ‖Ax‖. Rearranging this inequality yields

‖Ax‖ ≥ ‖x‖ / ‖A^{-1}‖.
9.5 The sensitivity of linear systems to errors
1. Let A ∈ Rn×n be nonsingular. (a) Suppose b ∈ Rn is given. Choose c ∈ Rn such that −1 5 A−1 c A x = sup : x ∈ Rn , x = 0 = A−1 , c x and notice that A−1 c = A−1 c . (Such a c exists. For the common norms—the 1 , ∞ , and Euclidean norms—we have seen in the text how to compute such a c; for an arbitrary norm, it can be shown that such a c exists, although some results from analysis concerning continuity and compactness are needed.) Define b̂ = b + c and x = A−1 b, x̂ = A−1 (b + c). Then x̂ − x = A−1 (b + c) − A−1 b = A−1 c = A−1
c = A−1
b̂ − b .
Notice that, for the Euclidean norm · 2 and induced matrix norm, c should be chosen to be a right singular vector corresponding to the smallest singular value of A. (b) Let A ∈ Rn×n be given, and let x, c ∈ Rn be chosen so that 5 Ay Ax n = sup : y ∈ R , y = 0 = A , x y 5 A−1 c A−1 y n = sup : y ∈ R , y = 0 = A−1 . c y Define b = Ax, b̂ = b + c, and x̂ = A−1 (b + c) = x + A−1 c. We then have b = Ax = A and
x
x̂ − x = A−1 c = A−1
It follows that
⇒
x =
b A
c = A−1
x̂ − x A−1 b̂ − b = = A b x
A−1
A
2. Let A ∈ R
n×n
be invertible and suppose E ∈ R
n×n
satisfies
E <
1 , A−1
b̂ − b . b̂ − b . b
where · denotes any induced matrix norm. We wish to show that A + E is invertible and derive a bound on (A + E)−1 . We have A + E = A(I + A−1 E), and A−1 E ≤ A−1
A−1 = 1. A−1
E <
Therefore, Theorem 405 implies that I + A−1 E is invertible, and (I + A−1 E)−1 ≤
1 1 ≤ 1 − A−1 E 1 − A−1
.
E
It follows that A + E = A(I + A−1 E) is invertible (since the product of invertible matrices is invertible), and (A + E)−1 = (I + A−1 E)−1 A−1 ≤ (I + A−1 E)−1 ≤
A−1 1 − A−1
E
.
A−1
3. Suppose A ∈ Rn×n is nonsingular and  ∈ Rn×n , x, x̂ ∈ Rn , and b ∈ Rn satisfy Ax = b, Âx̂ = b. We then have Âx̂ = Ax ⇒ Âx̂ − Ax = 0 ⇒ Âx̂ − Ax̂ + Ax̂ − Ax = 0 ⇒ Ax̂ − Ax = −(Âx̂ − Ax̂) ⇒ x̂ − x = −A−1 (Âx̂ − Ax̂) ⇒ x̂ − x ≤ A−1
 − A
⇒
x̂ − x ≤ A−1 x̂
 − A
⇒
x̂ − x ≤ A x̂
⇒
 − A x̂ − x ≤ cond(A) . x̂ A
A−1
x̂
 − A A
4. Let A ∈ Rn×n be nonsingular, and let · denote any norm on Rn and the corresponding induced matrix norm. We have cond(A) = A A−1 , and A = max
Ax . x
A−1 = max
A−1 y . y
x=0
Also, y =0
−1
If we write x = A
−1
y, then (since A
A−1
is nonsingular) x = 0 if and only if y = 0, and y = Ax. Therefore, −1 x Ax = max = max . x=0 Ax x=0 x
It is easy to show that, for any set S and any f : S → R satisfying f (s) > 0 for all s ∈ S, maxx∈S f (x)−1 = −1 (minx∈S f (x)) . Therefore, 1 A−1 = , minx=0 Ax x and hence cond(A) = 5. Let A ∈ Rn×n be invertible, and let norm.
·
maxx=0 minx=0
Ax x Ax x
.
denote any norm on Rn and the corresponding induced matrix
(a) Let B ∈ Rn×n be any singular matrix. We have seen (Exercise 9.4.9) that Ax ≥ x / A−1 for all x ∈ Rn . Let x ∈ N (B) with x = 1. Then A − B ≥ (A − B)x = Ax ≥ (b) It follows that
1 x = . A−1 A−1
A−B A 1/ A−1 ≥ inf A 1 . = cond(A) inf
5 : B ∈ Rn×n , det(B) = 0 5 : B∈R
n×n
, det(B) = 0
(c) Consider the special case of the Euclidean norm on Rn and induced norm · 2 on Rn×n . Let A = U ΣV T be the SVD of A, and define A = U Σ V T , where Σ is the diagonal matrix with diagonal entries σ1 , . . . , σn−1 , 0 (σ1 ≥ · · · σn−1 ≥ σn > 0 are the singular values of A). Then A − A 2 = U ΣV T − U Σ V T 2 = U (Σ − Σ )V T 2 = Σ − Σ 2 1 = σn = A−1 ( Σ − Σ 2 is the largest singular value of Σ − Σ , which is a diagonal matrix with a single nonzero entry, σn ). It follows that A − A 2 1 1 . = = −1 A 2 A 2 A cond2 (A) 2 Hence the inequality derived in part (b) is an equality in this case. 6. Let A ∈ Rn×n be a nonsingular matrix, and consider the problem of computing y = Ax from x ∈ Rn (x is regarded as the data and y is regarded as the solution). Suppose we have a noisy value of x, x̂, and we compute ŷ = Ax̂. Then ŷ − y = Ax̂ − Ax ≤ A x̂ − x . Also, x = A−1 y implies that x ≤ A−1
y , and therefore
ŷ − y ≤ A A−1 y or
ŷ − y ≤ A y
A−1
x̂ − x , x
x̂ − x = cond(A) x
x̂ − x . x
Thus the condition number for matrix multiplication is the same as the condition number for solving Ax = y for x. 7. Let A ∈ Rm×n be nonsingular, suppose x, x̂ ∈ Rn and y, ŷ ∈ Rm satisfy y = Ax, ŷ = Ax̂. Then ŷ − y 2 = Ax̂ − Ax 2 ≤ A 2 x̂ − x 2 = σ1 x̂ − x 2 , where σ1 is the largest singular value of A. Also, y 2 = Ax 2 ≥ σn x 2 , where σn is the smallest singular value of A (by Exercise 8.2.12). It follows that ŷ − y 2 σ1 x̂ − x 2 ≤ . y 2 σn x 2 Notice that, in the case that A is square and invertible, σ1 /σn is the usual condition number of A. 8. Let x ∈ Rn be fixed, and consider the problem of computing y = Ax, where A ∈ Rn×n is regarded as the data and y as the solution. Assume A is limited to nonsingular matrices. Let  be a noisy value of A, and define ŷ = Âx. We wish to bound the relative error in ŷ in terms of the relative error in Â. We have y = Ax, ŷ = Âx, which imply that ŷ − y = Âx − Ax = ( − A)x, and hence that ŷ − y ≤  − A x . We have seen (Exercise 9.4.9) that y = Ax ≥ x / A−1 . Putting these facts together, we obtain  − A x ŷ − y ≤ = A−1 x y −1 A
⇒
 − A = A
A−1
 − A A
 − A ŷ − y ≤ cond(A) . y A
Thus we see that the usual condition number also measures the conditioning of the matrix-vector multiplication problem.
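A small experiment illustrating the role of cond(A) in the bounds of this section (a sketch with arbitrary random data, not part of the original solutions; all choices here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
b = A @ x

# Perturb b and compare the relative-error amplification with cond(A) (2-norm throughout).
db = 1e-8 * rng.standard_normal(n)
x_hat = np.linalg.solve(A, b + db)

lhs = np.linalg.norm(x_hat - x) / np.linalg.norm(x)
rhs = np.linalg.cond(A) * np.linalg.norm(db) / np.linalg.norm(b)
assert lhs <= rhs * (1 + 1e-6)   # the bound ||x_hat - x||/||x|| <= cond(A) ||b_hat - b||/||b||
```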
9.6 Numerical stability
1. (a) Suppose x and y are two real numbers and x̂ and ŷ are perturbations of x and y, respectively. We have x̂ŷ − xy = x̂ŷ − x̂y + x̂y − xy = x̂(ŷ − y) + (x̂ − x)y, which implies that |x̂ŷ − xy| ≤ |x̂||ŷ − y| + |x̂ − x||y|. This gives a bound on the absolute error in approximating xy by x̂ŷ. Dividing by xy yields |x̂ŷ − xy| |x̂| |ŷ − y| |x̂ − x| ≤ + . |xy| |x| |y| |x| This yields a bound on the relative error in x̂ŷ in terms of the relative errors in x̂ and ŷ, although it would be preferable if the bound did not contain x̂ (except in the expression for the relative error in x̂). We can manipulate the bound as follows: |x̂ŷ − xy| |x̂| |ŷ − y| |x̂ − x| |ŷ − y| |x̂ − x| |x̂| |ŷ − y| ≤ + = + + −1 |xy| |x| |y| |x| |y| |x| |x| |y| % % % |ŷ − y| |ŷ − y| |x̂ − x| %% |x̂| + +% − 1%% ≤ |y| |x| |x| |y| |ŷ − y| |x̂ − x| ||x̂| − |x|| |ŷ − y| = + + |y| |x| |x| |y| |ŷ − y| |x̂ − x| |x̂ − x| |ŷ − y| + + . ≤ |y| |x| |x| |y| When the relative errors in x̂ and ŷ are small (that is, much less than 1), then their product is much smaller, and we see that the relative error in x̂ŷ as an approximation to xy is approximately bounded by the sum of the errors in x̂ and ŷ. (b) If x and y are floating point numbers, then fl(xy) = xy(1 + ), where | | ≤ u. Therefore, fl(xy) is the exact product of x̃ and ỹ, where x̃ = x and ỹ = y(1 + ). This shows that the computed product is the exact product of nearby numbers, and therefore that floating point multiplication is backward stable. 2. (a) Suppose x and y are two real numbers and x̂ and ŷ are perturbations of x and y, respectively. We have (x̂ + ŷ) − (x + y) = (x̂ − x) + (ŷ − y) ⇒ |(x̂ + ŷ) − (x + y)| ≤ |x̂ − x| + |ŷ − y| |(x̂ + ŷ) − (x + y)| |x̂ − x| |ŷ − y| ≤ + |x + y| |x + y| |x + y| |x| |x̂ − x| |y| |ŷ − y| |(x̂ + ŷ) − (x + y)| ≤ + . ⇒ |x + y| |x + y| |x| |x + y| |y|
⇒
From this bound on the relative error in x̂ + ŷ as an approximation of x + y, we see that the conditioning of the problem is determined by the ratios |x|/|x + y| and |y|/|x + y|. If |x + y| is not too small in comparison to |x|, |y| (in particular, if x and y have the same sign), then the computation of x + y is well-conditioned. However, if x and y have opposite signs and are approximately equal in magnitude, so that |x + y| is very small compared to |x|, |y|, then the problem is ill-conditioned. (b) If now x and y are floating point numbers, then fl(x + y) = (x + y)(1 + ) = x(1 + ) + y(1 + ) = x̃ + ỹ, where | | ≤ u and x̃ = x(1 + ), ỹ = y(1 + ). This shows that floating point addition is backward stable.
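The conditioning analysis in Exercise 2(a) is easy to observe directly in double precision; the numbers below are illustrative choices, not from the text.

```python
# Same-sign addition: the relative error stays near unit roundoff.
x, y = 0.1, 0.2
print(abs((x + y) - 0.3) / 0.3)        # ~2e-16

# Opposite signs, nearly equal magnitudes: |x + y| << |x|, |y|.
x, y = 1.0 + 1e-12, -1.0
exact = 1e-12
print(abs((x + y) - exact) / exact)    # ~1e-4: rounding x alone destroys most digits of the sum
```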
3. We wish to prove that the growth factor ρn in Gaussian elimination with partial pivoting is bounded by 2n−1 . Let A ∈ Rn×n and use the notation of Section 9.2.3 of the text: A(1) = A, A(2) = L1 P1 A, A(3) = L2 P2 L1 P1 A, and so forth. With this notation, U = A(n) is upper triangular. Moreover, we will assume that the necessary row interchanges are applied to A before the factorization is done, that is, that we apply Gaussian elimination to P A, where P = Pn−1 · · · P1 . (All the numbers appearing in A(k) are the same, whether the row interchanges are applied first or at each step, although the numbers appear in different entries of A(k) . Thus this assumption involves no loss of generality.) Therefore, A(1) becomes P A, and then A(k) = Lk−1 · · · L1 P A. We then have (k+1)
Aij
(k+1)
(for all other values of i, j, Aij
(k)
= Aij −
(k)
Aik
(k)
Akk
(k)
Akj , i, j = k + 1, . . . , n
(k)
= Aij ). By assumption, % % % A(k) % % ik % % (k) % ≤ 1, %A % kk
and hence
% % % % % % % (k) % % % % % % % % % % (k+1) % % (k) % % Aik % % (k) % % (k) % % (k) % % % %Aij % ≤ %Aij % + % (k) % %Akj % ≤ %Aij % + %Akj % ≤ 2 max %A(k) pq % : p, q = 1, . . . , n . %A % kk
This holds for all i, j, and therefore % % % % % % % (k) % max %A(k+1) : p, q = 1, . . . , n ≤ 2 max % %A pq pq % : p, q = 1, . . . , n . This shows that the maximum entry in A(k) grows by at most a factor of 2 at each step; since Gaussian elimination requires n − 1 steps, we see that the growth factor is bounded by 2n−1 . 4. Given A ∈ Rn×n , x ∈ Rn×n , we wish to analyze the errors arising in computing y = Ax in floating point arithmetic. (a) Let ŷ be the computed value of Ax. Each component of ŷ is the result of a dot product. Let ri denote the ith row of A, and recall that if α, β are floating point numbers, then the computed value of αβ is αβ(1 + δ), where |δ| ≤ u, and similarly for α + β. Computing ri · x requires 2n operations, each of which produces a rounding error. We will adopt the convention of labeling every rounding error by δ, even though each error has a different value. This causes no harm, since the only fact we use is that each rounding error is bounded: |δ| ≤ u. With this convention, the computation of ri · x proceeds as follows: Ai1 x1 (1 + δ) → (Ai1 x1 (1 + δ) + Ai2 x2 (1 + δ))(1 + δ) → ((Ai1 x1 (1 + δ) + Ai2 x2 (1 + δ))(1 + δ) + Ai3 x3 (1 + δ))(1 + δ) → .... Examining this pattern, we see that the final computed value of ri · x can be written as 2 Ai1 x1 Πnk=1 (1 + δ) + Ai2 x2 Πnk=1 (1 + δ) + Ai3 x3 Πn−1 k=1 (1 + δ) + · · · + Ain xn Πk=1 (1 + δ),
which we can write as Ai1 x1 + Ai2 x2 + Ai3 x3 + · · · + Ain xn + Ai1 x1 (Πnk=1 (1 + δ) − 1) + Ai2 x2 (Πnk=1 (1 + δ) − 1) + 2 Ai3 x3 Πn−1 k=1 (1 + δ) − 1 + · · · + Ain xn Πk=1 (1 + δ) − 1 .
If we define δAi1 = Ai1 (Πnk=1 (1 + δ) − 1) , δAi2 = Ai2 (Πnk=1 (1 + δ) − 1) , .. .. . . 2 δAin = Ain Πk=1 (1 + δ) − 1 , then we have ŷ = Ax + δAx = (A + δA)x, as desired. (b) Next, we need an upper bound on |δAij |. We use the following result: If nu ≤ 0.01 and |δi | ≤ u for i = 1, . . . , n, then |Πni=1 (1 + δi ) − 1| ≤ 1.01nu (see Higham [19], Lemma 3.4). We obtain, for each i, j, |δAij | ≤ 1.01nu |Aij | ≤ 1.01nu A ∞. (These bounds are intentionally rather crude—that is, not very tight—to allow for a manageable result.) It follows that δA ∞ ≤ 1.01n2 u A ∞ (recall that δA ∞ is the maximum absolute row sum). (c) Using the result of Exercise 9.5.8, ŷ − y ∞ δA ∞ ≤ cond(A) ≤ 1.01n2ucond(A). y ∞ A ∞
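A rough numerical check of the bound in part (c), treating the double-precision product as exact and float32 as the working precision (all parameter choices here are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

y = A @ x                                                    # "exact" product
y_hat = (A.astype(np.float32) @ x.astype(np.float32)).astype(np.float64)

u = np.finfo(np.float32).eps / 2                             # unit roundoff of float32
rel_err = np.linalg.norm(y_hat - y, np.inf) / np.linalg.norm(y, np.inf)
bound = 1.01 * n**2 * u * np.linalg.cond(A, np.inf)
print(rel_err, bound)
assert rel_err <= bound     # the observed error sits well inside the (crude) bound
```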
9.7 The sensitivity of the least-squares problem
1. Let A ∈ Rm×n have rank n, and suppose the singular value decomposition of A is A = U ΣV T . Then AT A = V ΣT U T U ΣV T = V ΣT ΣV T ⇒ (AT A)−1 = V (ΣT Σ)−1 V T and therefore
(AT A)−1 AT = V (ΣT Σ)−1 V T V ΣT U T = V (ΣT Σ)−1 ΣT U T .
The matrix (ΣT Σ)−1 ΣT is an n × m diagonal matrix with diagonal entries σ1−1 , σ2−1 , . . . , σn−1 , where σ1 ≥ σ2 ≥ · · · ≥ σn > 0 are the singular values of A. It follows that the singular values of (AT A)−1 AT −1 are σn−1 ≥ σn−1 ≥ · · · ≥ σ1−1 > 0. By Corollary 404 in the text, it follows that (AT A)−1 AT 2 = σn−1 . 2. Let · be any matrix norm, and let E ∈ Rn×n satisfy E < 1. By Theorem 405, we know that I + E is invertible. We have (I + E)(I − E) = I − E 2 = (I + E)(I + E)−1 − E 2 ⇒ (I + E)(I + E)−1 = (I + E)(I − E) + E 2 ⇒ (I + E)−1 = (I − E) + (I + E)−1 E 2 , and therefore (I + E)−1 − (I − E) = (I + E)−1 E 2 ≤ (I + E)−1 ≤
E 2
E 2 1− E
(using the bound on (I + E)−1 from Theorem 405). If we assume E ≤ 1/2, we obtain (I + E)−1 − (I − E) ≤ 2 E 2 , and thus (I + E)−1 = I − E + O( E 2 ).
3. Let A ∈ Rm×n , b ∈ Rm be given, and let x be a least-squares solution of Ax = b. We have b = Ax + (b − Ax) ⇒ b · Ax = (Ax) · (Ax) + (b − Ax) · Ax = Ax 22 , since (b − Ax) · Ax = 0 (b − Ax is orthogonal to col(A)). It follows that b 2 Ax 2 cos (θ) = Ax 22 ⇒
Ax 2 = b 2 cos (θ),
where θ is the angle between Ax and b. Also, since Ax, b − Ax are orthogonal, the Pythagorean theorem implies b 22 = Ax 22 + b − Ax 22 . Dividing both sides by b 22 yields 1= Therefore,
Ax 22 b − Ax 22 b − Ax 22 2 + ⇒ 1 = cos (θ) + . b 22 b 22 b 22 b − Ax 22 = sin2 (θ) ⇒ b 22
b − Ax 2 = b 2 sin (θ).
4. Let A ∈ Rm×n , b ∈ Rm be given, and assume m < n, rank(A) = m. The minimum-norm solution of Ax = b is x = AT (AAT )−1 b. If b̂ is a perturbation of b, the corresponding solution is x̂ = AT (AAT )−1 b̂, and x̂ − x ≤ AT (AAT )−1 b̂ − b . Also, b = Ax ⇒
b ≤ A
x
⇒
x ≥
b . A
Combining the two inequalities yields x̂ − x AT (AAT )−1 ≤ b x
b̂ − b
= A
AT (AAT )−1
A
b̂ − b . b
This shows that the condition number of the problem is A AT (AAT )−1 . In the case of the 2-norm, −1 we have A 2 = σ1 , AT (AAT )−1 2 = σm (where σ1 and σm are the largest and smallest singular values of A, respectively), and we obtain x̂ − x σ1 b̂ − b ≤ . x σm b Notice that cond2 (A) = σ1 /σm if A is a square, nonsingular m × m matrix. 5. Let A ∈ Rm×n , b ∈ Rm be given, assume m > n, rank(A) = r < n, and consider the problem of finding the minimum-norm solution least-squares solution x to Ax = b when b is consider the data. The minimum-norm least squares solution is x = A† b =
r
ui · b vi , σi i=1
where σ1 ≥ · · · ≥ σr > 0 are the nonzero singular values of A, and v1 , . . . , vr , u1 , . . . , ur are the corresponding right and left singular vectors, respectively. We have x = A† b, x̂ = A† b̂ ⇒ x̂ − x = A† (b̂ − b) ⇒
x̂ − x 2 ≤ A† 2 b̂ − b 2 .
We also have Ax 2 = b 2 cos (θ), where θ is the angle between b and col(A) (see Exercise 3), and hence b 2 cos (θ) ≤ A 2 x 2 ⇒
x 2≥
b 2 cos (θ) . A 2
It follows the previous two inequalities that A† 2 b̂ − b 2 x̂ − x 2 ≤ ⇒ b 2 cos (θ) x 2 A 2
x̂ − x 2 A 2 A† 2 b̂ − b 2 ≤ . x 2 cos (θ) b 2
Finally, we know that A 2 = σ1 , and the formula for A† shows that A† 2 = σr−1 . Thus σ1 x̂ − x 2 b̂ − b 2 ≤ . x 2 σr cos (θ) b 2
9.8 The QR factorization
1. Let x, y ∈ R3 be defined by x = (1, 2, 1) and y = (2, 1, 1). We define x−y u= = x−y 2
⎡ 0 1 1 − √ , √ , 0 , Q = I − 2uuT = ⎣ 1 2 2 0
1 0 0
⎤ 0 0 ⎦. 1
Then Qx = y. 2. If x = (1, 2, 1) and y = (1, 0, 1), then no Householder transformation can satisfy Qx = y because x 2 = y 2 . A Householder transformation is orthogonal, and orthogonal transformations preserve norms. 3. We have A = QR, where ⎤ −0.42640 0.65970 0.61885 Q = ⎣ −0.63960 −0.70368 0.30943 ⎦ , 0.63960 −0.26388 0.72199 ⎤ ⎡ −4.6904 −3.8376 2.7716 0 2.0671 −2.1110 ⎦ R=⎣ 0 0 9.2828 ⎡
(correct to the digits shown). The Householder vectors are u1 = (0.84451, 0.37868, −0.37868), u2 = (−0.99987, 0.015968). 4. The QR factorization of A is A = QR, where ⎤ ⎡ −0.30151 0.28719 0.79438 −0.060560 0.43806 ⎢ −0.60302 −0.47865 −0.10929 0.60236 0.18020 ⎥ ⎥ ⎢ ⎥, −0.60302 0.36377 0.019286 −0.10097 −0.70247 Q=⎢ ⎥ ⎢ ⎣ 0.30151 0.55523 0.017449 0.77061 −0.081589 ⎦ −0.30151 0.49780 −0.59694 −0.17161 0.52489 ⎡ ⎤ −3.3166 2.1106 −2.1106 ⎢ 0 −4.7482 −0.51694 ⎥ ⎢ ⎥ 0 0 −4.3907 ⎥ R=⎢ ⎢ ⎥. ⎣ 0 0 0 ⎦ 0 0 0 The Householder vectors are u1 = (0.80669, 0.37376, 0.37376, −0.18688, 0.18688), u2 = (0.89769, −0.12850, −0.34631, −0.24021), u3 = (0.84175, −0.010293, 0.53976).
5. Let A ∈ Rm×n have full rank, and define Sk = sp{A1 , . . . , Ak }, k = 1, 2, . . . , n. Equations (9.33) show that A1 = A1 2 Q1 , A2 = (A2 · Q1 )Q1 + A2 − projS1 A2 2 Q2 , .. .. . . n−1
(An · Qj )Qj + An − projSn−1 An 2 Qn .
An = j=1
Thus, A = Q0 R0 , where R0 ∈ Rn×n has i, j entry ⎧ Aj · Qi , j > i, ⎨ Aj − projSj−1 Aj 2 , j = i, rij = ⎩ 0, j < i. 6. Let {u1 , u2 , . . . , um } be an orthonormal basis for Rm , let X = [u1 |u2 | · · · |um ], and let D ∈ Rm×m be the diagonal matrix whose diagonal entries are −1, 1, . . . , 1. Finally, define U = XDX T . (a) We have XD = [−u1 |u2 | · · · |um ] and therefore U = (XD)X T = −u1 uT1 + u2 uT2 + · · · + um uTm . (b) We also have XX T = I, which implies that u1 uT1 + u2 uT2 + · · · + um uTm = I. (c) It follows that U = −u1 uT1 + (I − u1 uT1 ) = I − 2u1 uT1 . (d) We have U T = (XDX T )T = XDX T = U , and U T U = XDX T XDX T = XD2 X T = XIX T = XX T = I (since D2 = I), and hence U is symmetric and orthogonal. 7. Let v ∈ Rm be given, define α = ± v 2 , x = αe1 − v, u1 = x/ x 2 , and let {u1 , u2 , . . . , um } be an orthonormal basis for Rm , S = {u1 }⊥ = sp{u2 , . . . , um }. We wish to prove that there exists y ∈ S such that v = y − (1/2)x, αe1 = y + (1/2)x. Let us define y = v + (1/2)x. Then 1 1 1 1 1 y = v + (αe1 − v) = v + αe1 − v = v + αe1 2 2 2 2 2 and 1 1 1 1 1 1 αe1 − x = αe1 − (αe1 − v) = αe1 − αe1 + v = αe1 + v = y. 2 2 2 2 2 2 Thus y satisfies both v = y − (1/2)x and αe1 = y + (1/2)x. It remains to show that y ∈ S, that is, that y · u1 = 0. Since u1 = x/ x 2 , it suffices to show that y · x = 0: 1 1 y·x= v+ x ·x= v·x+ x·x 2 2 1 = v · (αe1 − v) + (αe1 − v) · (αe1 − v) 2 1 2 1 = αv1 − v · v + α − αv1 + v · v = 0 2 2 (since α2 = v · v). This completes the proof. 8. Let A ∈ Rm×n have full rank.
n−1
(4(m − k + 1)2 + m − k + 1)
k=1
4 11 37 = 4m n − 4mn + n3 + 11mn − 3m2 − n2 + n − 5m − 2. 3 2 6 2
2
The signifant part of this expression is 4m2 n − 4mn2 + (4/3)n3 . Note that this reduces to simply (4/3)n3 if m = n.
9.9 Eigenvalues and simultaneous iteration
1. We apply n iterations of the power method, normalizing the approximate eigenvectors in the Euclidean norm at each iteration, and estimating the eigenvalue by λ = (x · Ax)/(x · x) = x · Ax. We use a random starting vector. With n = 10, we have x0 = (0.37138, −0.22558, 1.1174), x10 = (0.40779, −0.8165, 0.40871), . λ = 3.9999991538580488. With n = 20, we obtain x0 = (−1.0891, 0.032557, 0.55253), x20 = (−0.40825, 0.8165, −0.40825), . λ = 3.9999999999593752. It seems clear that the dominant eigenvalue is λ = 4.
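For reference, a bare-bones power iteration like the one used above can be written as follows (a sketch; the test matrix is a hypothetical symmetric matrix with eigenvalues 4, 2, 1, not the matrix from the exercise):

```python
import numpy as np

def power_method(A, x0, iters=50):
    """Normalized power iteration; returns the eigenvalue estimate and the final iterate."""
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        x = A @ x
        x /= np.linalg.norm(x)
    return x @ (A @ x), x          # Rayleigh quotient (x has unit norm)

Q, _ = np.linalg.qr(np.random.default_rng(5).standard_normal((3, 3)))
A = Q @ np.diag([4.0, 2.0, 1.0]) @ Q.T        # symmetric with known spectrum
lam, x = power_method(A, np.random.default_rng(6).standard_normal(3))
assert abs(lam - 4.0) < 1e-8
```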
2. We use simultaneous iteration with a random initial orthogonal matrix Q(0) . We judge the progress by looking at the off-diagonal entries of (Q(k) )T AQ(k) (which should converge to zero). Here are the results: ⎡ ⎤ −8.2644 · 10−1 −5.1733 · 10−1 −2.2217 · 10−1 4.4323 · 10−1 ⎦ , Q(0) = ⎣ 3.8699 · 10−1 −8.0857 · 10−1 −1 −1 −4.0894 · 10 2.8033 · 10 8.6844 · 10−1 ⎡ ⎤ −4.0850 · 10−1 −7.0765 · 10−1 5.7650 · 10−1 Q(10) = ⎣ 8.1650 · 10−1 −9.7820 · 10−4 5.7735 · 10−1 ⎦ , −4.0800 · 10−1 7.0656 · 10−1 5.7820 · 10−1 ⎡ ⎤ 4.0000 −7.0291 · 10−4 8.6678 · 10−7 2.0000 1.1973 · 10−3 ⎦ , (Q(10) )T AQ(10) = ⎣ −7.0291 · 10−4 −7 8.6678 · 10 1.1973 · 10−3 1.0000 ⎡ ⎤ −1 −1 −4.0825 · 10 −7.0711 · 10 5.7735 · 10−1 Q(20) = ⎣ 8.1650 · 10−1 −9.5527 · 10−7 5.7735 · 10−1 ⎦ , −4.0825 · 10−1 7.0711 · 10−1 5.7735 · 10−1 ⎡ ⎤ 4.0000 −6.8643 · 10−7 8.2701 · 10−13 2.0000 1.1692 · 10−6 ⎦ , (Q(20) )T AQ(10) = ⎣ −6.8643 · 10−7 −13 8.2681 · 10 1.1692 · 10−6 1.0000 ⎡ ⎤ −1 −1 −4.0825 · 10 −7.0711 · 10 5.7735 · 10−1 Q(30) = ⎣ 8.1650 · 10−1 −9.3288 · 10−10 5.7735 · 10−1 ⎦ , −4.0825 · 10−1 7.0711 · 10−1 5.7735 · 10−1 ⎡ ⎤ 4.0000 −6.7034 · 10−10 1.1102 · 10−16 2.0000 1.1418 · 10−9 ⎦ . (Q(30) )T AQ(10) = ⎣ −6.7034 · 10−10 −16 3.8858 · 10 1.1418 · 10−9 1.0000 The eigenvalues appear to be 4, 2, and 1. 3. Suppose A ∈ Cn×n is diagonalizable, with eigenvalues λ1 , λ2 , . . . , λn . Assume |λ1 | > |λ2 | ≥ · · · ≥ |λn |. Let corresponding eigenvectors be x1 , x2 , . . . , xn , and assume that v0 ∈ Cn satisfies v0 = α1 x1 +. . .+αn xn with α1 = 0. We wish to prove that the angle between Ak v0 and sp{x1 } = EA (λ1 ) converges to zero as k → ∞. We will assume for convenience that xj 2 = 1 for j = 1, . . . , n. We have n
Ak v0 =
αj λkj xj , j=1
and hence, if θk is the angle between Ak v0 and x1 , n
cos (θk ) =
1 αj λkj x1 · xj . Ak v0 2 j=1
It suffices to show that | cos (θk )| → 1, since then the angle between Ak v0 and EA (λ1 ) converges to zero. We assume for now that λ1 > 0. Now, by the triangle inequality and reverse triangle inequality, we have n
n
|α1 ||λ1 |k −
|αj ||λj |k ≤ Ak v0 2 ≤ |α1 ||λ1 |k + j=2
which implies that n
|α1 | −
|αj j=2
and hence that
|αj ||λj |k , j=2
% %k % %k n k % λj % % λj % % % % % ≤ A v0 2 ≤ |α1 | + |α j % λ1 % % λ1 % λk1 j=2 Ak v0 2 → |α1 | as k → ∞. λk1
Therefore,
n
cos (θk ) =
λkj α1 λk1 + α x1 · xj j Ak v0 2 j=2 Ak v0 2
=
n λkj λk1 α1 λk1 + αj k x1 · xj k k A v0 2 j=2 λ1 A v0 2
→
α1 = ±1 as k → ∞. |α1 |
This completes the proof in the case λ1 > 0. If λ1 < 0, a slight modification of the above proof shows that A2p v0 2 A2p+1 v0 2 → |α1 |, → −|α1 | as p → ∞, 2p λ1 λ2p+1 1 and hence it still follows that | cos (θk )| → 1. 4. Let A ∈ Rn×n have complex conjugate eigenvalues λ = μ + θi, λ = μ − θi, each of multiplicity one. We wish to prove that there is a two-dimensional subspace of Rn , corresponding to λ, λ, that is invariant under A. We will show that there is a basis of Eλ (A) + Eλ (A) that consists of two real vectors x, y ∈ Rn . We will then prove that S = sp{x, y} ⊂ Rn is invariant under A. Suppose z ∈ Cn , z = 0, is an eigenvector corresponding to λ; then Az = λz, Az = λz. Let us write z = x + iy, x, y ∈ Rn . We then have Az = λz ⇒ A(x + iy) = (μ + iθ)(x + iy) ⇒ Ax = μx − θy, Ay = θu + μv. Notice that these last two equations imply that sp{x, y} is invariant under A, so it suffices to prove that {x, y} is linearly independent. We are assuming that λ is a complex (nonreal) eigenvalue, and hence θ = 0. We argue by contradiction and suppose {x, y} is linearly independent, say y = αx, where α ∈ R. We then have z = x + iαx = (1 + iα)x = ωx, ω = 1 + αi = 0. It follows that Az = A(ωx) = ωAx and λz = λωx and therefore, since Ax = λz, ωAx = λωx, or Ax = λx. But Ax ∈ Rn and λx ∈ Rn (since λ ∈ R and x ∈ Rn ), a contradiction. Thus {x, y} is linearly independent, and the proof is complete. 5. Let A ∈ Rn×n . We wish to prove that there exists an orthogonal matrix Q such that QT AQ is block upper triangular, with each diagonal block of size 1 × 1 or 2 × 2. If all the eigenvalues of A are real, then the proof of Theorem 413 can be given using only real numbers and vectors, and the result is immediate. Therefore, let λ, λ be a complex conjugate pair of eigenvalues of A. By an induction argument similar to that in the proof of Theorem 413, it suffices to prove that there exists an orthogonal matrix Q̂ ∈ Rn×n such that T B , Q̂T AQ̂ = 0 C where T ∈ R2×2 has eigenvalues λ, λ. Suppose z = x + iy, z = 0, x, y ∈ Rn , satisfies Az = λz. In the solution of the previous exercise, we saw that {x, y} is linearly independent, so define S = sp{x, y} ⊂ Rn and Ŝ = sp{x, y} ⊂ Cn (Ŝ = sp{z, z}). Let {q1 , q2 } be an orthonormal basis for S, and extend it to an orthonormal basis {q1 , q2 , . . . , qn } for Rn . Define Q̂1 = [q1 |q2 ], Q̂2 = [q3 | · · · |qn ], and Q̂ = [Q̂1 |Q̂2 ]. We then have " " ! ! Q̂T1 Q̂T1 AQ̂1 Q̂T1 AQ̂2 T Q̂ AQ̂ = [AQ̂1 |AQ̂2 ] = . Q̂T2 Q̂T2 AQ̂1 Q̂T2 AQ̂2 Since both columns of AQ̂1 belong to S and each column of Q̂2 belongs to S ⊥ , it follows that Q̂T2 AQ̂1 = 0. Thus it remains only to prove that the eigenvalues of T = Q̂T1 AQ̂1 are λ, λ. We see that η ∈ C , u ∈ C2 form an eigenpair of T if and only if T u = ηu ⇔ Q̂T1 AQ̂1 u = ηu ⇔ A(Q̂1 u) = η(Q̂1 u). Since both z and z can be written as Q̂1 u for u ∈ C2 , it follows that both λ and λ are eigenvalues of T ; moreover, since T is 2 × 2, these are the only eigenvalues of T . This completes the proof.
9.10 The QR algorithm
1. Two steps are required to reduce A to upper Hessenberg form. The result is ⎡ ⎤ −2.0000 4.8507 · 10−1 −1.1021 · 10−1 8.6750 · 10−1 ⎢ −4.1231 3.1765 1.3851 −7.6145 · 10−1 ⎥ ⎥ H=⎢ ⎣ 0 −1.4240 1.9481 1.6597 · 10−1 ⎦ 0 0 8.9357 · 10−1 −3.1246 and the vectors defining the two Householder transformations are u1 = (8.6171 · 10−1 , −2.8146 · 10−1 , 4.2219 · 10−1 ), u2 = (9.3689 · 10−1 , 3.4964 · 10−1 ). 2. Three plane rotations are required to compute the QR factorization of H: ⎡
− √12 ⎢ − √1 ⎢ 2 Q1 = ⎢ ⎣ 0 0
√1 2 − √12
0 0
0 0 1 0
⎤ ⎡ 0 1 0 0 ⎢ 0 0 ⎥ 0 1 ⎥ ⎥ , Q2 = ⎢ ⎣ 0 −1 0 0 ⎦ 0 0 0 1
⎡ ⎤ 1 0 ⎢ 0 0 ⎥ ⎥ , Q3 = ⎢ ⎢ 0 ⎦ ⎣ 0 1 0
0 1
0 0
0
− 31 √ 2 2 3
0
0 0
√ −232 − 31
⎤ ⎥ ⎥ ⎥. ⎦
Then H = QR, where Q = QT1 QT2 QT3 and ⎡ ⎤ 4.2426 2.8284 −7.0711 · 10−1 7.0711 · 10−1 ⎢ 0 1.0000 −3.0000 −2.0000 ⎥ ⎥ R=⎢ ⎣ 0 0 2.1213 2.1213 ⎦ 0 0 0 −3.0000 3. Let H be an n × n upper Hessenberg matrix. To compute the QR factorization of H using plane rotations requires n − 1 steps. Step j requires 5 operations plus the computation of a square root to compute c and s. Then, multiplying Pj−1 · · · P1 H by Pj requires 6(n − j) operations. The grand total is n−1
{6(n − j) + 5} = 3n2 + 2n − 5
j=1
operations, plus n − 1 square roots. 4. (a) If H ∈ Rn×n is symmetric and tridiagonal, then we can compute the QR factorization of H using n − 1 plane rotations. The jth step requires 5 operations to compute c and s (and η), and 8 more to multiply the rest of the two rows that are affected by the plane rotation (these rows only contain three nonzeros). The last step requires only 11 operations because the last row contains one fewer nonzero. The grand total is 13(n − 1) + 2 = 13n − 11 operations, plus n − 1 square roots. (b) Now we wish to count the number of operations for each iteration of the QR algorithm, applied to a symmetric tridiagonal matrix. We already know that the QR factorization requires 13n − 11 operations. Moreover, the upper triangular factor Ri+1 has nonzeros only on the main diagonal and T first two super diagonals. We must compute Ai+1 = Ri+1 P1T · · · Pn−1 . Multiplying on the right T by Pj only affects columns j and j + 1, and takes only 16 operations since columns j and j + 1 contain only four nonzeros each. Columns 1 and 2 contain fewer nonzeros and so multiplying by P1T and P2T requires only 8 and 14 operations, respectively. The total operation count for computing T Ai+1 = Ri+1 P1T · · · Pn−1 is 16(n − 1) + 10 = 16n − 6 operations, and the total for one step of the QR algorithm (computing Ai+1 from Ai ) is 29n − 17 arithmetic operations, plus n − 1 square roots.
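A sketch of the Hessenberg QR step analyzed in Exercises 3–4, using explicit 2×2 plane rotations (illustrative only; a practical implementation would store just the rotation parameters c and s rather than forming Q):

```python
import numpy as np

def hessenberg_qr(H):
    """QR factorization of an upper Hessenberg matrix by n-1 plane (Givens) rotations."""
    n = H.shape[0]
    R = H.astype(float).copy()
    Q = np.eye(n)
    for j in range(n - 1):
        a, b = R[j, j], R[j + 1, j]
        r = np.hypot(a, b)
        if r == 0:
            continue
        c, s = a / r, b / r
        G = np.array([[c, s], [-s, c]])          # rotates (a, b) to (r, 0)
        R[j:j+2, j:] = G @ R[j:j+2, j:]          # affects only rows j, j+1
        Q[:, j:j+2] = Q[:, j:j+2] @ G.T          # accumulate Q = G_1^T ... G_{n-1}^T
    return Q, R

H = np.triu(np.random.default_rng(7).standard_normal((5, 5)), -1)   # upper Hessenberg
Q, R = hessenberg_qr(H)
assert np.allclose(Q @ R, H) and np.allclose(np.tril(R, -1), 0)
```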
5. The inequality
is equivalent to
|λk+1 | |λk+1 − μ| < |λk − μ| |λk |
(9.3)
|λk+1 − μ| |λk − μ| < , |λk+1 | |λk |
and hence (9.3) holds if and only if the relative error in μ as an estimate of λk+1 is less than the relative error in μ as an estimate of λk .
Chapter 10
Analysis in vector spaces

10.1 Analysis in R^n
1. Let {x(k) } be a sequence in Rn and suppose x(k) → x ∈ Rn . Let i be an integer, 1 ≤ i ≤ n. We wish (k) to prove that the sequence {xi } of real numbers converges to the real number xi . Let > 0 be given. (k) Since x → x, there exists a positive integer N such that x(k) − x ∞ < for all k ≥ N . By definition, (k) (k) (k) |xi − xi | ≤ x(k) − x ∞ , and hence |xi − xi | < for all k ≥ N . This proves that xi → xi . · , be a norm on X. We first note that
2. Let X be a vector space over R, and let
c1 x ≤ x ≤ c2 x for all x ∈ X holds with c1 = c2 = 1, and hence · is equivalent to itself. If · equivalent to · , then there exist positive constants c1 , c2 such that
∗ is another norm on X
that is
c1 x ≤ x ∗ ≤ c2 x for all x ∈ X. But then
c−1 x ∗ ≤ x ≤ c−1 x ∗ for all x ∈ X, 2 1
and hence · is equivalent to · ∗ . Finally, suppose · # is a third norm on X, · ∗ is equivalent to · , and · # is equivalent to · ∗ . Then there exist positive constants c1 , c2 , γ1 , γ2 such that c1 x ≤ x ∗ ≤ c2 x for all x ∈ X and γ1 x ∗ ≤ x # ≤ γ2 x ∗ for all x ∈ X. Combining these inequalities, we obtain γ1 c1 x ≤ x # ≤ γ2 c2 x for all x ∈ X, and hence relation.
·
# is equivalent to
· . Thus we have shown that equivalence of norms is an equivalence
3. Let · and · ∗ be two norms on Rn . Since · and · ∗ are equivalent, there exist positive constants c1 , c2 such that c1 x ≤ x ∗ ≤ c2 x for all x ∈ Rn . Suppose that xk → x under · , and let > 0. Then there exists a positive integer N such that xk − x < /c2 for all k ≥ N . It follows that xk − x ∗ ≤ c2 xk − x < c2 = c2 for all k ≥ N . Therefore, xk → x under
·
∗.
Conversely, if xk → x under · ∗ and > 0 is given, there exists a positive integer N such that xk − x ∗ < c1 for all k ≥ N . It follows that xk − x ≤ c−1 xk − x ∗ < c−1 1 1 c1 = for all k ≥ N . Therefore, xk → x under · . 247
4. Let · and · ∗ be two norms on Rn , and let S be a nonempty subset of Rn . Since · and · ∗ are equivalent, there exist positive constants c1 , c2 such that c1 x ≤ x ∗ ≤ c2 x for all x ∈ Rn . Suppose S is open under · , and let x ∈ X. There exists > 0 such that y − x < implies that y ∈ S. But then y − x ∗ < c1 implies that y − x ≤ c−1 y − x ∗ < c−1 1 1 c1 = and hence that y ∈ S. This shows that S is open under · ∗ . Conversely, suppose S is open under · ∗ , and let x ∈ S. There exists > 0 such that y − x ∗ < −1 implies that y ∈ S. Then y − x ≤ c−1 2 implies that y − x ∗ ≤ c2 y − x < c2 c2 = and hence that y ∈ S. This shows that S is open under · . 5. Let · and · ∗ be two norms on Rn , and let S be a nonempty subset of Rn . By the previous exercise, Rn \ S is open under · if and only if it is open under · ∗ . It follows immediately that S is closed under · if and only if S is closed under · ∗ . 6. Let · and · ∗ be two norms on Rn . Since · and · ∗ are equivalent, there exist positive constants c1 , c2 such that c1 x ≤ x ∗ ≤ c2 x for all x ∈ Rn . Let S be a nonempty subset of Rn . Suppose that x is an accumulation point of S under · . Then, for any > 0, there are infinitely many elements y of −1 −1 S satisfying y − x < c−1 2 . But y − x < c2 implies that y − x ∗ ≤ c2 y − x < c2 c2 = , and hence infinitely many elements y of S satisfy y − x ∗ < . This prove that x is an accumulation point of S under · ∗ . Conversely, suppose that x is an accumulation point of S under · ∗ . For any > 0, there exist infinitely many elements y of S satisfying y − x ∗ ≤ c1 . But y − x ∗ ≤ c1 implies that y − x ≤ c−1 y−x ∗ < 1 c−1 c = , which shows that infinitely many elements y of S satisfy y − x < . This shows that x is 1 1 an accumulation point of S under · . 7. Let · and · ∗ be two norms on Rn , let S be a nonempty subset of Rn , let f : S → Rn be a function, and let y be an accumulation point of S. Since · and · ∗ are equivalent, there exist positive constants c1 , c2 such that c1 x ≤ x ∗ ≤ c2 x for all x ∈ Rn . Suppose first that limx→y f (x) = L under · , and let > 0 be given. Then there exists δ > 0 such that if x ∈ S and x − y < δ, then |f (x) − L| < . But then x − y ∗ < c1 δ ⇒ y − x ≤ c−1 x − y < c−1 c1 δ = δ. Therefore, if x ∈ S and x − y ∗ < c1 δ, 1 it follows that x − y < δ, and hence that |f (x) − L| < . This shows that limx→y f (x) = L under · ∗ . Conversely, suppose that limx→y f (x) = L under · ∗ , and let > 0 be given. Then there exists δ > 0 such that if x ∈ S and x − y ∗ < δ, then |f (x) − L| < . But then x − y < c−1 2 δ ⇒
y − x ∗ ≤ c2 x − y < c2 c−1 2 δ = δ.
Therefore, if x ∈ S and x − y < c−1 2 δ, it follows that x − y ∗ < δ, and hence that |f (x) − L| < . This shows that limx→y f (x) = L under · . 8. Let · and · ∗ be two norms on Rn , let S be a nonempty subset of Rn , let f : S → Rn be a function, and let y be a point in S. Since · and · ∗ are equivalent, there exist positive constants c1 , c2 such that c1 x ≤ x ∗ ≤ c2 x for all x ∈ Rn . Suppose first that f is continuous at y under · , and let > 0 be given. Then there exists δ > 0 such that if x ∈ S and x − y < δ, then |f (x) − f (y)| < . But x − y ∗ < c1 δ ⇒
x − y ≤ c−1 x − y ∗ < c−1 1 1 c1 δ = δ,
and therefore if x ∈ S and x − y ∗ < c1 δ, it follows that x − y < δ and hence that |f (x) − f (y)| < . This proves that f is continuous at y under · ∗ . Conversely, suppose f is continuous at y under · ∗ , and let > 0 be given. Then there exists δ > 0 such that if x ∈ S and x − y ∗ < δ, then |f (x) − f (y)| < . But x − y < c−1 2 δ ⇒
x − y ∗ ≤ c2 x − y < c2 c−1 2 δ = δ,
and therefore if x ∈ S and x − y < c−1 2 δ, it follows that x − y ∗ < δ and hence that |f (x) − f (y)| < . This proves that f is continuous at y under · .
9. Let · and · ∗ be two norms on Rn , and let S be a nonempty subset of Rn . Since · and · ∗ are equivalent, there exist positive constants c1 , c2 such that c1 x ≤ x ∗ ≤ c2 x for all x ∈ Rn . Suppose first that S is bounded under · , say x ≤ M for all x ∈ S. Then x ∗ ≤ c2 x ≤ c2 M for all x ∈ S, and hence S is bounded under · ∗ . x ∗ ≤ c−1 Conversely, suppose S is bounded under · ∗ , say x ∗ ≤ M for all x ∈ S. Then x ≤ c−1 1 1 M for all x ∈ S, and hence S is bounded under · . 10. Let · and · ∗ be two norms on Rn , and let S be a nonempty subset of Rn . We wish to prove that S is sequentially compact under · if and only if S is sequentially compact under · ∗ . This follows directly from Exercise 3. If S is sequentially compact under one norm, then every sequence in S has a subsequence converging, under that norm, to an element of S. However, by Exercise 3, any such subsequence converges also under the other norm, which shows that S is also sequentially compact under the second norm. 11. Let · and · ∗ be two norms on Rn , and let {xk } be a sequence in Rn . Since · and · ∗ are equivalent, there exist positive constants c1 , c2 such that c1 x ≤ x ∗ ≤ c2 x for all x ∈ Rn . Suppose first that {xk } is Cauchy under · , and let > 0 be given. Then there exists a positive integer N such that m, n ≥ N implies that xm − xn < c−1 2 . But then m, n ≥ N implies that xm − xn ∗ ≤ c2 xm − xn < c2 c−1 2 = , and hence {xk } is Cauchy under
·
∗.
Conversely, suppose {xk } is Cauchy under · ∗ , and let > 0 be given. Then there exists a positive integer N such that m, n ≥ N implies that xm − xn ∗ < c1 . But then m, n ≥ N implies that xm − xn ≤ c−1 xm − xn ∗ < c−1 1 1 c1 = , and hence {xk } is Cauchy under
· .
12. Let · and · ∗ be two norms on Rn . We wish to prove that Rn is complete under · if and only if Rn is complete under · ∗ . This follows immediately from Exercises 3 and 11. If Rn is complete under · , then every sequence that is Cauchy under · converges, under · , to an element of Rn . But a sequence that is Cauchy under · ∗ is also Cauchy under · by Exercise 11 and hence converges under · . By Exercise 3, the sequence also converges under · ∗ . It follows that Rn is complete under · ∗ . Similarly, if Rn is complete under
·
∗ , it follows that R
n
is complete under
· .
13. Let X be a vector space with norm · , and suppose {xk } is a sequence in X converging to x ∈ X under · . Given any > 0, there exists a positive integer N such that n ≥ N implies that xn − x < /2. But then m, n ≥ N ⇒ xm − xn ≤ xm − x + x − xn < + = . 2 2 This shows that {xk } is Cauchy under · .
10.2 Infinite-dimensional vector spaces
1. Let X be a vector space over ℝ, and let ‖·‖ be a norm on X. We wish to show that ‖·‖ is a continuous function. Let x be any vector in X and let ε > 0 be given. If y ∈ X satisfies ‖y − x‖ < ε, then, by the reverse triangle inequality, |‖y‖ − ‖x‖| ≤ ‖y − x‖ < ε. This proves that ‖·‖ is continuous at x. Since this holds for every x ∈ X, ‖·‖ is a continuous function.
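The reverse triangle inequality |‖y‖ − ‖x‖| ≤ ‖y − x‖ used above is easy to spot-check numerically; the sketch below (an illustration only, assuming NumPy and the Euclidean norm) does so on random pairs.

```python
import numpy as np

# Spot-check of | ||y|| - ||x|| | <= ||y - x|| (Euclidean norm, random pairs).
rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.standard_normal(4)
    y = rng.standard_normal(4)
    assert abs(np.linalg.norm(y) - np.linalg.norm(x)) <= np.linalg.norm(y - x) + 1e-12
print("reverse triangle inequality holds on all sampled pairs")
```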
2. Suppose f belongs to C[a, b], {f_k} is a sequence in C[a, b], and ‖f_k − f‖_∞ → 0 as k → ∞. Let ε > 0 be given. By definition, there exists a positive integer N such that k ≥ N implies that ‖f_k − f‖_∞ < ε. Since ‖f_k − f‖_∞ = max{|f_k(x) − f(x)| : x ∈ [a, b]}, it follows that |f_k(x) − f(x)| ≤ ‖f_k − f‖_∞ for all x ∈ [a, b]. Therefore, k ≥ N implies |f_k(x) − f(x)| < ε for all x ∈ [a, b]. This shows that {f_k} converges uniformly to f on [a, b].

3. Suppose {f_k} is a Cauchy sequence in C[a, b] (under the L∞ norm) that converges pointwise to f : [a, b] → ℝ. We wish to prove that f_k → f in the L∞ norm. By Theorem 442, C[a, b] is complete under ‖·‖_∞, and hence there exists a function g ∈ C[a, b] such that ‖f_k − g‖_∞ → 0 as k → ∞. By the previous exercise, {f_k} converges uniformly to g and hence, in particular, g(x) = lim_{k→∞} f_k(x) for all x ∈ [a, b] (cf. the discussion on page 593 in the text). However, by assumption, f(x) = lim_{k→∞} f_k(x) for all x ∈ [a, b]. This proves that g(x) = f(x) for all x ∈ [a, b], that is, that g = f. Thus ‖f_k − f‖_∞ → 0 as k → ∞.

4. Let f_k : [0, 1] → ℝ be defined by f_k(x) = x^k. We wish to prove that {f_k} is Cauchy under the L²(0, 1) norm but not under the C[0, 1] norm. A direct calculation shows that ‖f_k‖_{L²(0,1)} = 1/√(2k + 1), and therefore ‖f_k − 0‖_{L²(0,1)} = ‖f_k‖_{L²(0,1)} → 0 as k → ∞. Thus {f_k} converges to the zero function under the L²(0, 1) norm, and hence {f_k} is Cauchy under the L²(0, 1) norm. On the other hand, let N be any positive integer (no matter how large), and define x_k = (1/2)^{1/k}. Then x_k ∈ (0, 1) for all k, and

|f_N(x_k) − f_k(x_k)| = |(1/2)^{N/k} − 1/2| → |1 − 1/2| = 1/2 as k → ∞.

Therefore, for every positive integer N, there exist k > N and x ∈ (0, 1) such that |f_N(x) − f_k(x)| is arbitrarily close to 1/2, in particular, such that |f_N(x) − f_k(x)| > 1/4. Hence there exists k > N such that ‖f_N − f_k‖_∞ > 1/4. It follows that {f_k} is not Cauchy under ‖·‖_∞.
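Both claims in Exercise 4 are easy to observe numerically. The sketch below (an illustration only, assuming NumPy; the grid size is arbitrary) estimates ‖f_k‖_{L²(0,1)} by a Riemann sum and compares it with 1/√(2k + 1), and then shows that the sup-norm distance between f_N and f_k does not shrink.

```python
import numpy as np

# f_k(x) = x^k on [0, 1].  The L2(0,1) norms shrink like 1/sqrt(2k+1), while the
# sup-norm distance between f_N and f_k stays bounded away from zero.
x = np.linspace(0.0, 1.0, 200001)
for k in (5, 50, 500):
    l2_estimate = np.sqrt(np.mean(x ** (2 * k)))      # Riemann-sum estimate of ||f_k||_{L2}
    print(k, l2_estimate, 1.0 / np.sqrt(2 * k + 1))   # the two columns agree
N = 10
for k in (100, 1000):
    print(N, k, np.max(np.abs(x ** N - x ** k)))      # well above 1/4 for large k
```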
5. We wish to show that C¹[a, b] is not complete under ‖·‖_∞ by finding a sequence {f_k} in C¹[a, b] that is Cauchy under ‖·‖_∞ but converges to a function that does not belong to C¹[a, b]. One such example is the following: take [a, b] = [−1, 1] and define

f_k(x) = √(x² + 1/k),  f(x) = |x|.

For any k_2 > k_1, we have

|√(x² + 1/k_1) − √(x² + 1/k_2)| = √(x² + 1/k_1) − √(x² + 1/k_2)
  = (1/k_1 − 1/k_2)/(√(x² + 1/k_1) + √(x² + 1/k_2))
  ≤ (1/k_1 − 1/k_2)/(√(1/k_1) + √(1/k_2))
  = 1/√k_1 − 1/√k_2 ≤ 1/√k_1

for all x ∈ [−1, 1]. From this inequality, it is easy to show that {f_k} is Cauchy under ‖·‖_∞, and hence {f_k} converges uniformly to some continuous function. Moreover, it is obvious that {f_k} converges pointwise to f(x) = |x|. It then follows from Exercise 3 that f_k → f under ‖·‖_∞. However, f ∉ C¹[−1, 1].
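Since the maximum of f_k(x) − |x| over [−1, 1] occurs at x = 0 and equals 1/√k, the uniform convergence of f_k to |x| can also be seen numerically; the sketch below (illustration only, assuming NumPy) evaluates the uniform error on a grid.

```python
import numpy as np

# f_k(x) = sqrt(x^2 + 1/k) on [-1, 1] versus f(x) = |x|.
# The uniform error is attained at x = 0 and equals 1/sqrt(k), so f_k -> f uniformly,
# even though the limit |x| fails to be differentiable at 0.
x = np.linspace(-1.0, 1.0, 100001)
for k in (10, 100, 10000):
    fk = np.sqrt(x ** 2 + 1.0 / k)
    print(k, np.max(np.abs(fk - np.abs(x))), 1.0 / np.sqrt(k))
```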
10.3 Functional analysis
1. Let V be a normed vector space over ℝ and let f ∈ V*. Define

L_1 = sup{|f(v)| : v ∈ V, ‖v‖ ≤ 1},
L_2 = inf{M > 0 : |f(v)| ≤ M for all v ∈ V, ‖v‖ ≤ 1}.

We wish to prove that L_1 = L_2. For any ε > 0, there exists v ∈ V, ‖v‖ ≤ 1, such that |f(v)| > L_1 − ε. This immediately implies that L_2 ≥ |f(v)| > L_1 − ε; since this holds for all ε > 0, it follows that L_2 ≥ L_1. Conversely, suppose ε > 0. Then, by definition of L_1, we have |f(v)| < L_1 + ε for all v ∈ V, ‖v‖ ≤ 1. It then follows from the definition of L_2 that L_2 ≤ L_1 + ε. Since this holds for all ε > 0, we can conclude that L_2 ≤ L_1, and thus we have shown that L_1 = L_2, as desired.

2. Let S be any set, let f : S → ℝ, g : S → ℝ be given functions, and define

L_1 = sup{f(x) + g(x) : x ∈ S},  L_2 = sup{f(x) : x ∈ S} + sup{g(x) : x ∈ S}.

We wish to show that L_1 ≤ L_2. For any ε > 0, by definition of L_1, there exists x ∈ S such that f(x) + g(x) > L_1 − ε. We then have

L_2 = sup{f(x) : x ∈ S} + sup{g(x) : x ∈ S} ≥ f(x) + g(x) > L_1 − ε.

Since this holds for all ε > 0, it follows that L_2 ≥ L_1, as desired.

3. Let V be a normed vector space. We wish to prove that V* is complete. Let {f_k} be a Cauchy sequence in V*, let v ∈ V, v ≠ 0, be fixed, and let ε > 0 be given. Then, since {f_k} is Cauchy, there exists a positive integer N such that ‖f_n − f_m‖_{V*} < ε/‖v‖ for all m, n ≥ N. By definition of ‖·‖_{V*}, it follows that |f_n(v) − f_m(v)| < ε for all m, n ≥ N. This proves that {f_k(v)} is a Cauchy sequence of real numbers and hence converges. We define f(v) = lim_{k→∞} f_k(v). Since v was an arbitrary nonzero element of V (and f_k(0) = 0 for all k, so we set f(0) = 0), this defines f : V → ℝ. Moreover, it is easy to show that f is linear:

f(αv) = lim_{k→∞} f_k(αv) = lim_{k→∞} αf_k(v) = α lim_{k→∞} f_k(v) = αf(v),
f(u + v) = lim_{k→∞} f_k(u + v) = lim_{k→∞} (f_k(u) + f_k(v)) = lim_{k→∞} f_k(u) + lim_{k→∞} f_k(v) = f(u) + f(v).

We can also show that f is bounded. Since {f_k} is Cauchy under ‖·‖_{V*}, it is easy to show that {‖f_k‖_{V*}} is a bounded sequence of real numbers, that is, that there exists M > 0 such that ‖f_k‖_{V*} ≤ M for all k. Therefore, if v ∈ V, ‖v‖ ≤ 1, then

|f(v)| = |lim_{k→∞} f_k(v)| = lim_{k→∞} |f_k(v)| ≤ M,

since |f_k(v)| ≤ ‖f_k‖_{V*}‖v‖ ≤ M for all k. Thus f is bounded, and hence f ∈ V*. Finally, we must show that f_k → f under ‖·‖_{V*}, that is, that ‖f_k − f‖_{V*} → 0 as k → ∞. Let ε > 0 be given. Since {f_k} is Cauchy under ‖·‖_{V*}, there exists a positive integer N such that ‖f_m − f_n‖_{V*} < ε/2 for all m, n ≥ N. We will show that |f_n(v) − f(v)| < ε for all v ∈ V, ‖v‖ ≤ 1, and all n ≥ N, which then implies that ‖f_n − f‖_{V*} ≤ ε for all n ≥ N and completes the proof. For any v ∈ V, ‖v‖ ≤ 1, and all n, m ≥ N, we have

|f_n(v) − f(v)| ≤ |f_n(v) − f_m(v)| + |f_m(v) − f(v)| < ε/2 + |f_m(v) − f(v)|.

Moreover, since f_m(v) → f(v), there exists m ≥ N such that |f_m(v) − f(v)| ≤ ε/2. Therefore, it follows that |f_n(v) − f(v)| < ε for all n ≥ N. This holds for all v ∈ V with ‖v‖ ≤ 1 (notice that N is independent of v), and the proof is complete.

4. Let V be a normed vector space and let f belong to V*. By definition of ‖f‖_{V*}, we have |f(v)| ≤ ‖f‖_{V*} for all v ∈ V, ‖v‖ ≤ 1. Therefore, for any v ∈ V, v ≠ 0, we have |f(v/‖v‖)| ≤ ‖f‖_{V*}. Since f is linear, this implies |f(v)|/‖v‖ ≤ ‖f‖_{V*}, or

|f(v)| ≤ ‖f‖_{V*}‖v‖ for all v ∈ V, v ≠ 0.

Since the inequality obviously holds for v = 0, this completes the proof.
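In the finite-dimensional setting, Exercises 1 and 4 can be made concrete: on ℝⁿ with the Euclidean norm, the functional f(v) = a · v has dual norm ‖f‖_{V*} = ‖a‖_2 (the supremum over the unit ball is attained at v = a/‖a‖_2), and the bound |f(v)| ≤ ‖f‖_{V*}‖v‖ is the Cauchy-Schwarz inequality. A small numerical sketch (illustration only, assuming NumPy; the vector a is arbitrary):

```python
import numpy as np

# f(v) = a . v on R^n with the Euclidean norm: the dual norm is ||a||_2, attained
# at v = a/||a||_2, and |f(v)| <= ||f||_{V*} ||v|| is the Cauchy-Schwarz inequality.
rng = np.random.default_rng(2)
a = rng.standard_normal(6)
dual_norm = np.linalg.norm(a)
print(dual_norm, abs(a @ (a / dual_norm)))        # these two values coincide
for _ in range(1000):
    v = rng.standard_normal(6)
    assert abs(a @ v) <= dual_norm * np.linalg.norm(v) + 1e-10
print("|f(v)| <= ||f|| ||v|| holds on all sampled v")
```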
5. Suppose H is a Hilbert space and S is a subspace of H that fails to be closed. Define the closure of S by

cl(S) = { lim_{k→∞} v_k : {v_k} ⊂ S, {v_k} converges }.

Notice that S ⊂ cl(S), since each v ∈ S is the limit of the sequence {v_k} defined by v_k = v for all k. It can be shown that cl(S) is a closed subspace and that cl(S)^⊥ = S^⊥. It follows that (S^⊥)^⊥ = (cl(S)^⊥)^⊥ = cl(S).

6. We wish to prove that the vector u ∈ H satisfying the conclusions of the Riesz representation theorem is unique. Suppose f ∈ H* is given and u, w ∈ H satisfy

f(v) = ⟨v, u⟩, f(v) = ⟨v, w⟩ for all v ∈ H.

Then ⟨v, u⟩ = ⟨v, w⟩ for all v ∈ H, or ⟨v, u − w⟩ = 0 for all v ∈ H. Taking v = u − w, this implies that u − w = 0, and hence that u = w.

7. Let X, U be Hilbert spaces, and let T : X → U be linear. We wish to prove that T is continuous if and only if T is bounded. Suppose first that T is bounded, say ‖T(x)‖_U ≤ M‖x‖_X for all x ∈ X, and let y ∈ X be fixed. Then, for every x ∈ X, we have

T(x) − T(y) = T(x − y) ⇒ ‖T(x) − T(y)‖_U ≤ M‖x − y‖_X.

Therefore, given any ε > 0, if δ = ε/M, then

‖x − y‖_X < δ ⇒ ‖T(x) − T(y)‖_U < Mε/M = ε.

This proves that T is continuous at y. Since y was arbitrary, we see that T is continuous. Conversely, suppose T is continuous. Then T is continuous at 0 in particular, and therefore, with ε = 1, there exists δ > 0 such that ‖x‖_X < δ implies that ‖T(x)‖_U < 1. Therefore, for any δ′ < δ, ‖x‖_X ≤ δ′ implies that ‖T(x)‖_U < 1, and hence ‖T(x)‖_U ≤ (1/δ′)‖x‖_X for all x ∈ X. This shows that T is bounded.

8. Let X, U be Hilbert spaces, and let T : X → U be linear and bounded. Notice that x ↦ ⟨T(x), u⟩_U defines a bounded linear functional on X (linearity is easy to verify, and we have |⟨T(x), u⟩_U| ≤ ‖T‖‖u‖_U‖x‖_X for all x ∈ X). Therefore, by the Riesz representation theorem, for each u ∈ U there exists a unique vector y_u ∈ X such that

⟨T(x), u⟩_U = ⟨x, y_u⟩_X for all x ∈ X.

Since y_u is unique, we can define an operator S : U → X by S(u) = y_u. We first verify that S is linear. We have

⟨x, S(αu)⟩_X = ⟨T(x), αu⟩_U = α⟨T(x), u⟩_U = α⟨x, S(u)⟩_X = ⟨x, αS(u)⟩_X.

Since this holds for all x ∈ X, it follows that S(αu) = αS(u). Similarly, for all x ∈ X,

⟨x, S(u + v)⟩_X = ⟨T(x), u + v⟩_U = ⟨T(x), u⟩_U + ⟨T(x), v⟩_U = ⟨x, S(u)⟩_X + ⟨x, S(v)⟩_X = ⟨x, S(u) + S(v)⟩_X.

This implies that S(u + v) = S(u) + S(v). Therefore, S is a linear operator. By the Riesz representation theorem, ‖S(u)‖_X equals the norm of the linear functional x ↦ ⟨T(x), u⟩_U. Since |⟨T(x), u⟩_U| ≤ ‖T‖‖u‖_U‖x‖_X for all x ∈ X, we see that the norm of this functional is at most ‖T‖‖u‖_U, and hence ‖S(u)‖_X ≤ ‖T‖‖u‖_U. This shows that S is bounded, and ‖S‖ ≤ ‖T‖. This completes the proof. (Note: We call S = T* the adjoint of T. Since T = (T*)*, we see that ‖T‖ ≤ ‖T*‖ also holds, and hence ‖T*‖ = ‖T‖.)
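In the finite-dimensional case T(x) = Ax, with the Euclidean inner products, the operator S constructed above is simply S(u) = Aᵀu, and ‖S‖ = ‖T‖ is the largest singular value of A. The sketch below (illustration only, assuming NumPy; the matrix is random) verifies ⟨T(x), u⟩ = ⟨x, S(u)⟩ and compares the two operator norms.

```python
import numpy as np

# T(x) = A x with the Euclidean inner products: the adjoint is S(u) = A^T u,
# and ||S|| = ||T|| (both equal the largest singular value of A).
rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
for _ in range(100):
    x = rng.standard_normal(3)
    u = rng.standard_normal(4)
    assert np.isclose((A @ x) @ u, x @ (A.T @ u))    # <T(x), u>_U = <x, S(u)>_X
print(np.linalg.norm(A, 2), np.linalg.norm(A.T, 2))  # equal operator norms
```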
10.4 Weak convergence
1. Suppose H is an infinite-dimensional Hilbert space and S is a subset of H with a nonempty interior. We wish to show that S is not sequentially compact. We already know that the unit ball of H is not compact,
and hence there exists a sequence {x_k}, with ‖x_k‖ ≤ 1 for all k, that has no convergent subsequence. Since S has a nonempty interior, it contains a closed ball B̄_r(y) for some y ∈ S and r > 0. Define, for each k = 1, 2, . . ., y_k = y + r x_k. Then ‖y_k − y‖ = r‖x_k‖ ≤ r and hence y_k ∈ B̄_r(y) ⊂ S for all k. Now suppose a subsequence {y_{k_j}} converges to some z ∈ B̄_r(y). We can write z = y + rw for some w ∈ H with ‖w‖ ≤ 1, and then

‖y_{k_j} − z‖ → 0 ⇒ r‖x_{k_j} − w‖ → 0 ⇒ ‖x_{k_j} − w‖ → 0.

Thus {x_{k_j}} converges to w, a contradiction. Thus {y_k} cannot have a convergent subsequence, and it follows that S is not sequentially compact.

2. Let H be a Hilbert space over ℝ, let S be a closed and bounded subset of H, and suppose {x_k} is a sequence in S. We wish to show that there exist a subsequence {x_{k_j}} and a vector x ∈ H such that x_{k_j} → x weakly. Since S is bounded, it is contained in some ball B_R(0), R > 0. Define {y_k} ⊂ B̄_1(0) by y_k = R^{-1}x_k. Then, since B̄_1(0) is weakly compact, there exist y ∈ B̄_1(0) and a subsequence {y_{k_j}} of {y_k} such that y_{k_j} → y weakly. Then {x_{k_j}} satisfies x_{k_j} = R y_{k_j} for all j, and we claim that {x_{k_j}} converges weakly to Ry. To verify this, let z ∈ H be arbitrary; then

⟨x_{k_j}, z⟩ − ⟨Ry, z⟩ = ⟨x_{k_j} − Ry, z⟩ = ⟨R y_{k_j} − Ry, z⟩ = R⟨y_{k_j} − y, z⟩ → 0.

This completes the proof.

3. Let H be a Hilbert space over ℝ, and suppose {x_k} is a sequence in H converging strongly to x ∈ H. We wish to prove that x_k → x weakly. For any z ∈ H, we have

|⟨x_k, z⟩ − ⟨x, z⟩| = |⟨x_k − x, z⟩| ≤ ‖x_k − x‖‖z‖

by the Cauchy-Schwarz inequality. Since x_k → x strongly, ‖x_k − x‖ → 0. This proves that |⟨x_k, z⟩ − ⟨x, z⟩| → 0, that is, that ⟨x_k, z⟩ → ⟨x, z⟩. Since z was arbitrary, this proves that x_k → x weakly.

4. Let {α_k} be a sequence of real numbers.

(a) By definition, lim inf_{k→∞} α_k = lim_{k→∞} inf{α_l : l ≥ k}. Assume first that L = lim inf_{k→∞} α_k is a real number. Define a subsequence {α_{k_j}} as follows. First choose α_{k_1} such that

α_{k_1} < inf{α_l : l ≥ 1} + 1/2.

Then, for each j ≥ 2, choose k_j > k_{j−1} such that

α_{k_j} < inf{α_l : l ≥ k_{j−1} + 1} + 1/2^j.

We have inf{α_l : l ≥ k_{j−1} + 1} → L as j → ∞, and 1/2^j → 0 as j → ∞. Since

inf{α_l : l ≥ k_{j−1} + 1} ≤ α_{k_j} < inf{α_l : l ≥ k_{j−1} + 1} + 1/2^j for all j,

it follows that α_{k_j} → L, as desired. If L = −∞, there must be a subsequence {α_{k_j}} such that α_{k_j} → −∞ (since inf{α_l : l ≥ k} must equal −∞ for all k in this case). If L = ∞, then inf{α_l : l ≥ k} → ∞ as k → ∞, and hence α_k → ∞ since α_k ≥ inf{α_l : l ≥ k} for all k.
(b) By definition, lim sup_{k→∞} α_k = lim_{k→∞} sup{α_l : l ≥ k}. Assume first that L = lim sup_{k→∞} α_k is a real number. Define a subsequence {α_{k_j}} as follows. First choose α_{k_1} such that

α_{k_1} > sup{α_l : l ≥ 1} − 1/2.

Then, for each j ≥ 2, choose k_j > k_{j−1} such that

α_{k_j} > sup{α_l : l ≥ k_{j−1} + 1} − 1/2^j.

We have sup{α_l : l ≥ k_{j−1} + 1} → L as j → ∞, and 1/2^j → 0 as j → ∞. Since

sup{α_l : l ≥ k_{j−1} + 1} − 1/2^j < α_{k_j} ≤ sup{α_l : l ≥ k_{j−1} + 1} for all j,

it follows that α_{k_j} → L, as desired. If L = ∞, there must be a subsequence {α_{k_j}} such that α_{k_j} → ∞ (since sup{α_l : l ≥ k} must equal ∞ for all k in this case). If L = −∞, then sup{α_l : l ≥ k} → −∞ as k → ∞, and hence α_k → −∞ since α_k ≤ sup{α_l : l ≥ k} for all k.

(c) Suppose {α_{k_j}} is a convergent subsequence of {α_k}. We have

inf{α_l : l ≥ k_j} ≤ α_{k_j} ≤ sup{α_l : l ≥ k_j} for all j.

The expression on the left converges to lim inf_{k→∞} α_k, while the expression on the right converges to lim sup_{k→∞} α_k. Therefore,

lim inf_{k→∞} α_k ≤ lim_{j→∞} α_{k_j} ≤ lim sup_{k→∞} α_k.

(d) If {α_k} is convergent, then every subsequence of {α_k} must converge to the same limit. It follows from the first two parts of this exercise that

lim inf_{k→∞} α_k = lim sup_{k→∞} α_k = lim_{k→∞} α_k.
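A concrete sequence illustrating parts (a)-(d): for α_k = (−1)^k(1 + 1/k), the lim inf is −1 and the lim sup is +1, attained along the odd- and even-indexed subsequences respectively, and the sequence itself does not converge. The sketch below (illustration only, assuming NumPy) computes a late tail of the sequence.

```python
import numpy as np

# alpha_k = (-1)^k (1 + 1/k), k = 1, 2, ...:  lim inf = -1, lim sup = +1.
k = np.arange(1, 100001)
alpha = (-1.0) ** k * (1.0 + 1.0 / k)
tail = alpha[1000:]                      # inf/sup over a late tail of the sequence
print(tail.min(), tail.max())            # close to -1 and +1
print(alpha[1::2][-1], alpha[0::2][-1])  # even-k terms -> +1, odd-k terms -> -1
```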
5. Let V be a normed linear space over ℝ, let x be any vector in V, and let S = B_r(x) = {y ∈ V : ‖y − x‖ < r}, where r > 0. Then, for any y, z ∈ S and α, β ∈ [0, 1] with α + β = 1, we have

‖αy + βz − x‖ = ‖αy + βz − αx − βx‖ = ‖α(y − x) + β(z − x)‖ ≤ α‖y − x‖ + β‖z − x‖ < αr + βr = r.

This shows that αy + βz ∈ S, and hence that S is convex.

6. Let f : ℝⁿ → ℝ be a convex function, suppose f is continuously differentiable, and let x, y ∈ ℝⁿ be given. By the chain rule, we see that

lim_{h→0⁺} (f(y + h(x − y)) − f(y))/h = ∇f(y) · (x − y).

The convexity of f implies that, for all h ∈ (0, 1),

f(y + h(x − y)) − f(y) = f(hx + (1 − h)y) − f(y) ≤ hf(x) + (1 − h)f(y) − f(y) = h(f(x) − f(y)),
and hence

lim_{h→0⁺} (f(y + h(x − y)) − f(y))/h ≤ lim_{h→0⁺} (f(x) − f(y)) = f(x) − f(y).

It follows that f(x) − f(y) ≥ ∇f(y) · (x − y), that is, f(x) ≥ f(y) + ∇f(y) · (x − y).

7. Let f : ℝⁿ → ℝ be convex and continuously differentiable, and let x, y ∈ ℝⁿ be given. By the previous exercise,

f(x) ≥ f(y) + ∇f(y) · (x − y),
f(y) ≥ f(x) + ∇f(x) · (y − x).

Adding these two inequalities yields

f(x) + f(y) ≥ f(y) + f(x) + ∇f(y) · (x − y) + ∇f(x) · (y − x).

Canceling the common terms and rearranging yields (∇f(x) − ∇f(y)) · (x − y) ≥ 0.

8. Suppose f : ℝⁿ → ℝ is continuously differentiable and satisfies f(x) ≥ f(y) + ∇f(y) · (x − y) for all x, y ∈ ℝⁿ. We wish to prove that f is convex. For any x, y ∈ ℝⁿ and α, β ∈ [0, 1] with α + β = 1, we have

f(x) ≥ f(αx + βy) + ∇f(αx + βy) · (x − (αx + βy)),
f(y) ≥ f(αx + βy) + ∇f(αx + βy) · (y − (αx + βy)).

Multiplying the first inequality by α and the second by β and adding the results yields

αf(x) + βf(y) ≥ (α + β)f(αx + βy) + α∇f(αx + βy) · (x − (αx + βy)) + β∇f(αx + βy) · (y − (αx + βy))
  = f(αx + βy) + ∇f(αx + βy) · (αx + βy − (α + β)(αx + βy))
  = f(αx + βy).

This proves that f is convex.
9. Let H be a Hilbert space over ℝ, let S be a subset of H, and let f : S → ℝ be continuous. Then, given any x ∈ S and any sequence {x_k} in S converging to x, the sequence {f(x_k)} converges to f(x). It follows from Theorem 467 that

f(x) = lim inf_{k→∞} f(x_k) = lim sup_{k→∞} f(x_k),

which certainly implies that

f(x) ≤ lim inf_{k→∞} f(x_k) and f(x) ≥ lim sup_{k→∞} f(x_k).
Thus f is both lower and upper semicontinuous.
10. Let H be a Hilbert space, and let {u_k} be an orthonormal sequence in H:

⟨u_j, u_k⟩ = 1 if j = k, ⟨u_j, u_k⟩ = 0 if j ≠ k.

(a) For each n, define S_n = sp{u_1, . . . , u_n}. Then

proj_{S_n} x = Σ_{k=1}^n ⟨x, u_k⟩ u_k and ‖proj_{S_n} x‖² = Σ_{k=1}^n |⟨x, u_k⟩|².

By the projection theorem, we know that proj_{S_n} x and x − proj_{S_n} x are orthogonal, and of course x = proj_{S_n} x + (x − proj_{S_n} x). It follows that

‖proj_{S_n} x‖² + ‖x − proj_{S_n} x‖² = ‖x‖² ⇒ ‖proj_{S_n} x‖² ≤ ‖x‖² ⇒ Σ_{k=1}^n |⟨x, u_k⟩|² ≤ ‖x‖².

Since this holds for all n, it follows that Σ_{k=1}^∞ |⟨x, u_k⟩|² converges and

Σ_{k=1}^∞ |⟨x, u_k⟩|² ≤ ‖x‖².

(b) Since Σ_{k=1}^∞ |⟨x, u_k⟩|² converges, it follows that ⟨x, u_k⟩ → 0 as k → ∞. This holds for all x ∈ H, and hence u_k → 0 weakly as k → ∞.
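Bessel's inequality and the weak convergence u_k → 0 can be observed numerically for the orthonormal sequence u_k(x) = √(2/π) sin(kx) in L²(0, π). The sketch below (illustration only, assuming NumPy; integrals are approximated by Riemann sums, and g(x) = x is an arbitrary choice) shows the coefficients ⟨g, u_k⟩ decaying and their squared partial sums staying below ‖g‖² = π³/3.

```python
import numpy as np

# u_k(x) = sqrt(2/pi) sin(kx) is an orthonormal sequence in L^2(0, pi).
# For g(x) = x, the coefficients <g, u_k> decay like 1/k (consistent with
# u_k -> 0 weakly), and the partial sums of |<g, u_k>|^2 stay below
# ||g||^2 = pi^3/3, as Bessel's inequality requires.
x = np.linspace(0.0, np.pi, 200001)
dx = x[1] - x[0]
g = x
coeffs = np.array([np.sum(g * np.sqrt(2.0 / np.pi) * np.sin(k * x)) * dx
                   for k in range(1, 201)])
print(coeffs[:3], coeffs[-1])                 # decaying coefficients
print(np.sum(coeffs ** 2), np.pi ** 3 / 3.0)  # partial sum <= ||g||^2
```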
11. Let X, U be Hilbert spaces, and let T : X → U be linear and bounded. We wish to prove that T is weakly continuous, meaning that

{x_k} ⊂ X, x_k → x weakly ⇒ T(x_k) → T(x) weakly.

Let {x_k} ⊂ X satisfy x_k → x weakly. Then, for any u ∈ U,

⟨T(x_k), u⟩_U = ⟨x_k, T*(u)⟩_X → ⟨x, T*(u)⟩_X = ⟨T(x), u⟩_U,

which shows that T(x_k) → T(x) weakly.