IEEE CONTROL SYSTEMS LETTERS, VOL. 5, NO. 2, APRIL 2021
A Class of High Order Tuners for Adaptive Systems

Joseph E. Gaudio, Graduate Student Member, IEEE, Anuradha M. Annaswamy, Fellow, IEEE, Michael A. Bolender, Eugene Lavretsky, Fellow, IEEE, and Travis E. Gibson
Abstract—Parameter estimation algorithms using higher order gradient-based methods are increasingly sought after in machine learning. Such methods, however, may become unstable when regressors are time-varying. Inspired by techniques employed in adaptive systems, this letter proposes a new variational perspective to derive four higher order tuners with provable stability guarantees. This perspective includes concepts based on higher order tuners and normalization, and allows stability to be established for problems with time-varying regressors. The stability analysis builds on a novel technique stemming from symplectic mechanics, which links Lagrangians and Hamiltonians to the underlying Lyapunov stability analysis, and is provided for common linear-in-parameter models.

Index Terms—Adaptive systems, uncertain systems.
I. INTRODUCTION

MODIFICATIONS to gradient-based parameter update methods have been actively researched within both the machine learning and adaptive systems communities for decades, for optimization and control in the presence of uncertainties. Of particular note is the seminal higher order gradient method proposed by Nesterov [1], which has received significant attention not only in the optimization community [2] but also in the neural network learning community [3] due to its potential for accelerated learning. Variants of Nesterov's higher order method have become the standard for training deep neural networks [3]. To gain insight into Nesterov's
Manuscript received March 10, 2020; revised May 12, 2020; accepted June 1, 2020. Date of publication June 16, 2020; date of current version June 30, 2020. This work was supported in part by the Air Force Research Laboratory, Collaborative Research and Development for Innovative Aerospace Leadership, Thrust 3—Control Automation and Mechanization under Grant FA 8650-16-C-2642, and in part by the Boeing Strategic University Initiative. Recommended by Senior Editor M. Guay. (Corresponding author: Joseph E. Gaudio.) Joseph E. Gaudio and Anuradha M. Annaswamy are with the Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139 USA (e-mail: jegaudio@mit.edu; aanna@mit.edu). Michael A. Bolender is with the Autonomous Control Branch, Air Force Research Laboratory, Wright-Patterson AFB, OH 45433 USA (e-mail: michael.bolender@us.af.mil). Eugene Lavretsky is with the BR&T, The Boeing Company, Huntington Beach, CA 92647 USA (e-mail: eugene.lavretsky@boeing.com). Travis E. Gibson is with the Department of Pathology, Harvard Medical School, Boston, MA 02115 USA (e-mail: tegibson@bwh.harvard.edu). Digital Object Identifier 10.1109/LCSYS.2020.3002513
method, which is a difference equation, several recent results have leveraged a variational approach showing that, in continuous time, there exists a broad class of higher order methods with fast convergence rates [4]. In all of the aforementioned work, while the parameter update algorithm is time-varying, the regressors in the problem statement are assumed to be constant. While almost all problems in adaptive control have time-varying regressors, machine learning research has focused, by and large, on constant regressors (see [4] and references therein). Any application of machine learning techniques to safety-critical problems will necessarily require the consideration of time-varying regressors, where the input features vary with time, as in time-series prediction, recurrent networks with time-varying inputs, and online learning and optimization [5], [6]. While much of the adaptive systems community has focused on first order parameter update laws [7]–[9], one notable exception is the "high-order tuner" proposed by Morse [10], which has been useful in providing stable algorithms for time-delay systems [11] in the adaptive control setting. The algorithms we develop are based on these high-order tuners, are applicable both to machine learning and adaptive control problems, and use a variational perspective to unite the algorithm derivation and Lyapunov stability analysis. We begin with a discussion of algorithm parameterization, where a regressor-based parameterization is proposed in place of the time-based parameterization common in machine learning methods [4]. Two higher order tuner algorithms, one of which is considered in [10], are then derived from a unified Lagrangian approach which relates the potential, kinetic, and damping characteristics of the algorithm.
We proceed to a discussion of the novelty of the proposed implementation of the derived algorithms, obtained by splitting a second order ordinary differential equation (ODE) into two first order ODEs and relating Lagrangians to Hamiltonians. The Hamiltonian perspective allows for a discussion of symplectic forms of the equations, which in turn allows for the design of two additional higher order tuners amenable to symplectic discretization techniques [12], [13]. A detailed stability analysis follows, in which loss functions and Lyapunov functions are connected to the variational perspectives; it is provided for two classes of adaptive systems. The main contributions of this letter are (i) the derivation of a class of high-order tuners (HT) that are proved to be
2475-1456 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
stable even with time-varying regressors, three of which are new, and (ii) a unified variational perspective for all tuners. Preliminary results for one of these high order tuners were reported in [14], [15]. Contributions (i) and (ii) lead to two important insights: (a) there are several continuous-time HT that are canonical and stable, but (b) only some of them are implementable. These insights are vital to the design of stable discrete-time HT that have the potential to lead to accelerated learning [1], [16].

II. ERROR MODELS IN ADAPTIVE SYSTEMS

Adaptive systems are commonly represented in the form of differential and algebraic equations containing two errors. The first error e(t) ∈ Rⁿ represents a tracking or estimation error, e(t) = x̂(t) − x(t), where x̂ is an estimation/tracking state and x is the plant state variable. The second error θ̃(t) ∈ R^N represents a parameter estimation error, which may be stated as the difference between a designed parameter estimate θ and a true value θ* in the form θ̃(t) = θ(t) − θ*(t). Error models that relate these two errors often provide insight into efficient and stable designs for the adjustment of parameter estimates [7], [17]. In this letter, we focus on the class of error models of the form

  ė(t) = g₁(e(t), φ(t), θ(t), θ*(t))
  e_y(t) = g₂(e(t), φ(t), θ(t), θ*(t)),     (1)
where the input-output data, in the form of a regressor φ(t) ∈ R^N and output error e_y(t) ∈ R^p, are measurable at each time t. A loss function L_t(θ(t)) based on measurable signals is commonly proposed to be minimized by adjusting the parameter estimate θ. The subscript t in L_t(θ(t)) denotes the remaining time dependence due to e(t), φ(t), and θ*(t). Two types of gradient-based update laws, without and with normalization, are commonly proposed to adjust θ. With a user-defined gain γ > 0, the standard update takes the form

  θ̇(t) = −γ ∇_θ L_t(θ(t)).     (2)

The normalized update takes the form

  θ̇(t) = −(γ/N_t) ∇_θ L_t(θ(t)),     (2′)
where N_t is a known/designed normalization signal that ensures boundedness of signals in the closed-loop system when the regressor φ cannot be assumed to be bounded a priori [7]–[9]. A common choice is N_t = 1 + φᵀ(t)φ(t). Lyapunov function techniques are commonly employed to design the specific form of ∇_θ L_t(θ(t)) and certify stability of the system in (1) with either (2) or (2′). For the remainder of this letter, notation of time dependence will be omitted when it is clear from the context, and ‖·‖ represents the 2-norm.

III. HIGHER ORDER TUNER DERIVATION FROM A VARIATIONAL PERSPECTIVE

We begin with a common variational perspective in order to derive our higher order algorithms. In particular, the Bregman Lagrangian from [4, eq. 1] is restated as

  L(θ, θ̇, t) = e^{ᾱ_t+γ̄_t} ( D_h(θ + e^{−ᾱ_t} θ̇, θ) − e^{β̄_t} L(θ) ),

where D_h is the Bregman divergence defined with a distance-generating function h as D_h(y, x) = h(y) − h(x) − ∇h(x)ᵀ(y − x). For ease of exposition,
we will use the squared Euclidean norm h(x) = (1/2)‖x‖² in the Bregman divergence, thus resulting in the Lagrangian

  L(θ, θ̇, t) = e^{ᾱ_t+γ̄_t} ( (1/2) e^{−2ᾱ_t} ‖θ̇‖² − e^{β̄_t} L(θ) ).     (3)

This Lagrangian can be seen to weight potential energy (loss) L(θ) versus kinetic energy (1/2)‖θ̇‖², with a term which adjusts the damping. The user-defined time-dependent parameters (ᾱ_t, β̄_t, γ̄_t) are chosen to result in different algorithms by appropriately weighting each term in the Lagrangian (see [4] for some choices in common machine learning algorithms). As discussed in Section II, the loss function in adaptive systems is in general time dependent. In order to design algorithms robust to the time-varying loss L_t(θ), we take the core idea from adaptive systems theory of choosing the parameterization (ᾱ_t, β̄_t, γ̄_t in (3)) as a function of a normalization signal N_t. To derive higher order update laws which correspond with the standard (2) and normalized (2′) update laws, we provide two different Lagrangian formulations parameterized by user-defined gains γ, β > 0 and N_t. The first Lagrangian, which corresponds with (2), is chosen with ᾱ_t = ln(βN_t), β̄_t = ln(γ/(βN_t)), and γ̄_t = ∫_{t₀}^t βN_s ds, as

  L(θ, θ̇, t) = e^{∫_{t₀}^t βN_s ds} ( (1/(βN_t)) (1/2)‖θ̇‖² − γ L_t(θ) ).     (4)

The second Lagrangian, which corresponds with (2′), is chosen with ᾱ_t = 0, β̄_t = ln(γβ/N_t), and γ̄_t = β(t − t₀), as

  L(θ, θ̇, t) = e^{β(t−t₀)} ( (1/2)‖θ̇‖² − (γβ/N_t) L_t(θ) ).     (4′)

The Lagrangians in (4) and (4′) represent the central idea which will produce the first two higher order algorithms in this letter. From a Lagrangian, a functional may be defined as J(θ) = ∫_T L(θ, θ̇, t) dt, where T is an interval of time. To minimize this functional, a necessary condition from the calculus of variations [18] is that the Lagrangian solves the Euler-Lagrange equation: (d/dt)(∂L/∂θ̇ (θ, θ̇, t)) = ∂L/∂θ (θ, θ̇, t).
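As a quick sanity check on this choice of h (a small script of ours, not from the letter), the Bregman divergence with h(x) = (1/2)‖x‖² collapses to half the squared Euclidean distance:

```python
import numpy as np

# For h(x) = (1/2)||x||^2, the Bregman divergence
# D_h(y, x) = h(y) - h(x) - grad_h(x)^T (y - x)
# reduces algebraically to (1/2)||y - x||^2.
rng = np.random.default_rng(0)
x, y = rng.standard_normal(5), rng.standard_normal(5)

h = lambda z: 0.5 * z @ z
grad_h = lambda z: z                 # gradient of (1/2)||z||^2

D_h = h(y) - h(x) - grad_h(x) @ (y - x)
assert np.isclose(D_h, 0.5 * np.sum((y - x) ** 2))
```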
Using (4), the second order differential equation resulting from the application of the Euler-Lagrange equation is

  θ̈ + (βN_t − Ṅ_t/N_t) θ̇ = −γβN_t ∇_θ L_t(θ).     (5)

The second order differential equation in (5) may be implemented without the time derivative of the normalization signal Ṅ_t using two first-order differential equations as

  ϑ̇ = −γ ∇_θ L_t(θ)
  θ̇ = −β(θ − ϑ)N_t,     (6)

which coincides with the high-order tuner proposed in [10]. In a similar procedure, the application of the Euler-Lagrange equation using the second Lagrangian in (4′) leads to

  θ̈ + βθ̇ = −(γβ/N_t) ∇_θ L_t(θ),     (5′)

which is equivalent to

  ϑ̇ = −(γ/N_t) ∇_θ L_t(θ)
  θ̇ = −β(θ − ϑ).     (6′)
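To illustrate (6) in operation, the following forward-Euler sketch (our construction; the gains, regressor, initial conditions, and step size are arbitrary choices, not from the letter) runs the tuner on a linear regression error model with a time-varying regressor and checks that the Lyapunov function used later in Section V does not grow:

```python
import numpy as np

# High order tuner (6) on e_y = (theta - theta_star)^T phi
# with a bounded, time-varying regressor phi(t).
gamma, beta = 1.0, 2.0
mu = 2.0 * gamma / beta                      # mu >= 2*gamma/beta (cf. Theorem 2)
theta_star = np.array([2.0, -1.0])
theta, vartheta = np.zeros(2), np.zeros(2)

def lyap(theta, vartheta):                   # Lyapunov function (20)
    return (np.sum((vartheta - theta_star) ** 2)
            + np.sum((theta - vartheta) ** 2)) / gamma

dt, T = 1e-3, 20.0
V0 = lyap(theta, vartheta)
for k in range(int(T / dt)):
    t = k * dt
    phi = np.array([np.sin(t), np.cos(t)])   # time-varying regressor
    N = 1.0 + mu * phi @ phi                 # normalization signal (cf. (19))
    e_y = (theta - theta_star) @ phi
    vartheta = vartheta + dt * (-gamma * phi * e_y)        # gradient step
    theta = theta + dt * (-beta * (theta - vartheta) * N)  # normalized filter
V_final = lyap(theta, vartheta)

assert V_final < V0       # the Lyapunov function decreases along the flow
```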
Fig. 1. High order tuner block diagram.

TABLE I: Comparison of parameterizations.
Equations (6) and (6′) are the two classes of higher order tuners which correspond with the first order methods in (2) and (2′), respectively, and form the focus of this letter. While (6) is proposed in [10], (6′) is new. It can be seen that the first of the two equations in each of (6) and (6′) is identical to the first-order updates in (2) and (2′). The second equation represents a first order filter of the output of the gradient estimate. Figure 1 shows a block diagram of this scheme, where it can be noted that the number of integrations in the high order updates in (6), (6′) is double that of (2), (2′).

Remark 1: Normalization N_t is always present in the higher order tuners of (6) and (6′), and is essential for the proof of stability in Section V-C. The gains γ > 0 and β > 0 adjust the gradient and filter components, respectively. From the second order ODE perspective, the variable β represents "damping" of the algorithm and γ weights the forcing term.

Remark 2: Choosing a regressor-dependent parameterization of N_t also enables the stability proof, especially in the presence of time-varying regressors. In contrast, the time-dependent parameterization normally employed in machine learning (e.g., [4]) has not been proven to be stable with time-varying regressors (see Table I for a comparison). It can be noted that the Heavy Ball method of Polyak [19], which has constant gains, has also not been proven stable in this setting.

Remark 3: It should be noted that for the Lagrangian in (4) the second "ideal scaling condition" (γ̄̇_t = e^{ᾱ_t}) in [4, eq. 2b] holds, but the first "ideal scaling condition" (β̄̇_t ≤ e^{ᾱ_t}) in [4, eq. 2a] does not need to hold in general. For the Lagrangian in (4′), neither scaling condition is required to hold. In this sense, the results of this letter are applicable to a larger class of algorithms than considered in [4], in particular for problems with a time dependent loss L_t(θ).

IV. VARIATIONAL AND SYMPLECTIC PERSPECTIVES

In this section we explore insights afforded by the variational perspective utilized to derive the high-order tuners in (6) and (6′). In particular, we provide insight into the novelty of splitting the second order differential equations in (5) and (5′) into two first order differential equations using the variable ϑ. Due to space limitations, we mainly restrict the discussion to the Lagrangian in (4) and the resulting updates in (5), (6). A comparable analysis can be provided for (4′), (5′), and (6′).
We begin by noting that the Lagrangians provided in Section III are not unique. The Lagrangian

  L(θ, θ̇, t) = exp(∫_{t₀}^t [βN_s − Ṅ_s/N_s] ds) ( (1/2)‖θ̇‖² − γβN_t L_t(θ) )

can be shown to generate the second order ODE in (5), where the damping is located in the usual exponential term. Note, however, that the time derivative of the normalization N_t is in general not known. Thus this Lagrangian and the second order equation in (5) are not implementable in general. Crucially, the dependence on Ṅ_t can be eliminated using the variable ϑ in the splitting of (6). In order to demonstrate the novelty of ϑ, we proceed in the following sections to a discussion of Hamiltonians, canonical variables, and symplectic concepts.

A. Hamiltonian Formulation

Whereas Lagrangians are functions of the coordinate variable θ and velocity θ̇, Hamiltonians are functions of the coordinate variable θ and a conjugate momentum p. Using (4), the conjugate variable may be expressed as

  p = ∂L/∂θ̇ = e^{∫_{t₀}^t βN_s ds} (1/(βN_t)) θ̇,     (7)

where this change of variables is often referred to as a Legendre transformation [18]. Using the conjugate variable p, the Hamiltonian H = pᵀθ̇ − L may be defined as follows in the original coordinate variable θ and velocity θ̇:

  H(θ, θ̇, t) = e^{∫_{t₀}^t βN_s ds} ( (1/(βN_t)) (1/2)‖θ̇‖² + γ L_t(θ) ).

In these variables, the Hamiltonian can be seen to be the sum of potential and kinetic energies, with the same damping term as (4). The Hamiltonian may be expressed in the canonical phase space variables (θ, p) as

  H(θ, p, t) = βN_t e^{−∫_{t₀}^t βN_s ds} (1/2)‖p‖² + γ e^{∫_{t₀}^t βN_s ds} L_t(θ).     (8)

The canonical Hamiltonian equations of motion may then be stated as follows, using the canonical variables (θ, p), and correspond to the third class of high-order tuners proposed in this letter¹:

  θ̇ = ∂H/∂p = βN_t e^{−∫_{t₀}^t βN_s ds} p
  ṗ = −∂H/∂θ = −γ e^{∫_{t₀}^t βN_s ds} ∇_θ L_t(θ).     (9)

The system of two equations in (9) is canonical as it arises from a Hamiltonian, and thus preserves the symmetries of Hamiltonian systems [18]. Furthermore, this system of two equations is implementable, as no time derivative of the normalization, Ṅ_t, is required. A fourth implementable higher order tuner may be derived by starting with the Lagrangian in (4′) using a similar procedure:

  θ̇ = ∂H/∂p = e^{−β(t−t₀)} p
  ṗ = −∂H/∂θ = −(γβ/N_t) e^{β(t−t₀)} ∇_θ L_t(θ).     (9′)

Due to space limitations we restrict our discussion to (9).

¹Reference [20] also remarks on a structure similar to (8) and (9).
Even though (9) is implementable, the exponentially increasing and decaying terms are numerically undesirable. It is easy to show that these exponential terms can be removed through the change of coordinates

  ϑ = θ + e^{−∫_{t₀}^t βN_s ds} p = θ + (1/(βN_t)) θ̇.     (10)
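The effect of (10) can be checked numerically; in this sketch (ours; the gains, regressor, horizon, and tolerance are our choices) the canonical system (9) and the tuner (6) are integrated from initial conditions matched through (10), and the resulting θ trajectories agree:

```python
import numpy as np

# Integrate (9) and (6) side by side with forward Euler; the coordinate
# change (10) implies both produce the same theta trajectory when
# p(t0) and vartheta(t0) are matched (here both start at zero).
gamma, beta, mu = 1.0, 2.0, 1.0
theta_star = np.array([1.5, -0.5])

def grad_loss(theta, phi):
    return phi * ((theta - theta_star) @ phi)   # gradient of squared loss

dt, T = 1e-4, 1.0
th9, p, I = np.zeros(2), np.zeros(2), 0.0       # I = int_{t0}^t beta*N_s ds
th6, vth = np.zeros(2), np.zeros(2)
for k in range(int(T / dt)):
    t = k * dt
    phi = np.array([np.sin(t), np.cos(t)])
    N = 1.0 + mu * phi @ phi
    # canonical equations (9)
    th9_dot = beta * N * np.exp(-I) * p
    p_dot = -gamma * np.exp(I) * grad_loss(th9, phi)
    # high order tuner (6)
    vth_dot = -gamma * grad_loss(th6, phi)
    th6_dot = -beta * (th6 - vth) * N
    th9 = th9 + dt * th9_dot; p = p + dt * p_dot; I = I + dt * beta * N
    th6 = th6 + dt * th6_dot; vth = vth + dt * vth_dot

err = float(np.max(np.abs(th9 - th6)))
assert err < 5e-2      # trajectories coincide up to discretization error
```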
More importantly, this change of coordinates directly transforms (9) into the first class of high-order tuners in (6). Given that (6) has no exponential terms and therefore is easily implementable, it motivates the question of the significance of the variable ϑ, which is a specific combination of θ, θ̇, and N_t. In particular, it is important to understand the relation between ϑ, stability, and Hamiltonians. For this purpose, as a first step, we explore canonical variables in the next section.

B. Generating Functions and Canonical Variables

A change of variables from the original canonical variables (θ, p) to new canonical variables (ξ, ϑ), where ξ = ξ(θ, p, t), ϑ = ϑ(θ, p, t), and where ϑ is given by (10), can be accomplished using a generating function Φ as follows:

  Φ(θ, ϑ, t) = e^{∫_{t₀}^t βN_s ds} ( (1/2)‖ϑ − θ*‖² − (1/2)‖θ − ϑ‖² ).     (11)

From this generating function we obtain the following change of variables by taking partial derivatives [18]:

  ξ = ∂Φ/∂ϑ = e^{∫_{t₀}^t βN_s ds} θ̃,   i.e., θ̃ = e^{−∫_{t₀}^t βN_s ds} ξ,
  p = ∂Φ/∂θ = e^{∫_{t₀}^t βN_s ds} (ϑ − θ),   i.e., ϑ = θ + e^{−∫_{t₀}^t βN_s ds} p.     (12)

It can be noted, however, that a new coordinate ξ ≠ θ results. With the generating function in (11), change of variables (12), and original Hamiltonian (8), a new Hamiltonian H̄ = H + ∂Φ/∂t may be defined in terms of the new variables as [18]

  H̄ = βN_t e^{∫_{t₀}^t βN_s ds} (1/2)‖ϑ − θ*‖² + γ e^{∫_{t₀}^t βN_s ds} L_t(e^{−∫_{t₀}^t βN_s ds} ξ + θ*).

With this new Hamiltonian, the canonical equations in (ξ, ϑ) may be stated as

  ξ̇ = ∂H̄/∂ϑ = βN_t e^{∫_{t₀}^t βN_s ds} (ϑ − θ*)
  ϑ̇ = −∂H̄/∂ξ = −γ e^{∫_{t₀}^t βN_s ds} ∇_ξ L_t(e^{−∫_{t₀}^t βN_s ds} ξ + θ*).

It can be noted again that the change of variables from p to ϑ (as desired) resulted in another change, of θ to ξ. Since ξ depends on the unknown parameter θ*, the high-order tuner in the new variables (ξ, ϑ) is no longer implementable.

C. Canonical Transformations

Given that the canonical variables (ξ, ϑ) resulted in non-implementable equations, it may be asked whether any canonical transformation of (θ, p) to (θ, ϑ) exists. The following theorem sheds light on this matter.

Theorem 1 ([18], [21]): A transformation ξ = ξ(θ, p, t) and ϑ = ϑ(θ, p, t) is canonical if and only if

  {ξᵢ, ξⱼ}_{θ,p} = 0,   {ϑᵢ, ϑⱼ}_{θ,p} = 0,   {ξᵢ, ϑⱼ}_{θ,p} = δᵢⱼ,

where {·, ·}_{θ,p} are Poisson brackets defined as

  {F, G}_{θ,p} = Σᵢ ( ∂F/∂θᵢ ∂G/∂pᵢ − ∂F/∂pᵢ ∂G/∂θᵢ ),

xᵢ denotes the ith element of x, and δᵢⱼ is the Kronecker delta.
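At any frozen time t, the maps in (12) are linear in (θ, p), so the bracket conditions of Theorem 1 reduce to products of constant Jacobians and can be checked directly (our sketch; the scalar a stands for the positive value e^{∫βN_s ds} at that instant):

```python
import numpy as np

# With a = exp(int beta*N_s ds) > 0 frozen at time t, (12) reads
#   xi = a*(theta - theta_star),   vartheta = theta + p/a,
# both linear, so Poisson brackets are products of constant Jacobians.
n = 3
a = 2.7                                     # any positive scalar, a != 1
J_xi_th, J_xi_p = a * np.eye(n), np.zeros((n, n))
J_vt_th, J_vt_p = np.eye(n), (1.0 / a) * np.eye(n)

# {F_i, G_j} = sum_k dF_i/dth_k dG_j/dp_k - dF_i/dp_k dG_j/dth_k
def bracket(JF_th, JF_p, JG_th, JG_p):
    return JF_th @ JG_p.T - JF_p @ JG_th.T

# (xi, vartheta) satisfies all three conditions of Theorem 1
assert np.allclose(bracket(J_xi_th, J_xi_p, J_xi_th, J_xi_p), 0)
assert np.allclose(bracket(J_vt_th, J_vt_p, J_vt_th, J_vt_p), 0)
assert np.allclose(bracket(J_xi_th, J_xi_p, J_vt_th, J_vt_p), np.eye(n))

# (theta, vartheta) does not: {theta_i, vartheta_j} = delta_ij / a != delta_ij
B = bracket(np.eye(n), np.zeros((n, n)), J_vt_th, J_vt_p)
assert np.allclose(B, np.eye(n) / a) and not np.allclose(B, np.eye(n))
```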
The following two corollaries confirm the results of the previous section and provide a negative result.

Corollary 1: The transformation from (θ, p) to (ξ, ϑ) in (12) is a canonical transformation (where ϑ is as in (10)).

Corollary 2: The transformation from (θ, p) to (θ, ϑ) with ϑ in (10) is not canonical.

We now remark on the significance of the results of Sections IV-A and IV-B.

Remark 4: The special choice of splitting the second order differential equation in (5) using the variables (θ, ϑ) as in (6) was shown to result in an implementable update law without extraneous exponential terms (as in Figure 1). While the Lagrangian perspective generates a second order ODE, a set of two first order ODEs may be generated directly from the Hamiltonian perspective as in (9). In light of the significance of the specific choice of the variable ϑ, a generating function approach was proposed to provide a canonical transformation which includes the variable ϑ as desired in (12). While this transformation is canonical, the resulting set of equations is not implementable. The question thus remains whether there exists any canonical transformation resulting in a Hamiltonian system for which the variables (θ, ϑ) are canonical. Corollary 2 answers this in the negative. This implies that the choice of splitting the second order differential equation in (5) using the variables (θ, ϑ) is unique. This choice of variables will be further shown to be instrumental in proving stability in Section V-C. Before proceeding to a discussion of stability, we explore a unique discretization technique afforded by the Hamiltonian formulation of the canonical (θ, p) system in (9).

D. Symplectic Discretization

Returning to the original Hamiltonian in (8), which resulted in implementable canonical variables (θ, p) in (9), it can be noted that the Hamiltonian is non-autonomous (time-dependent) due to the normalization signal N_t.
To eliminate the time dependence, the Hamiltonian can be lifted to a higher dimension by including an additional coordinate variable τ and conjugate momentum E. The lifted Hamiltonian H̃(θ, τ, p, E) = H(θ, p, τ) + E may then be expressed as

  H̃ = βN_τ e^{−∫_{t₀}^τ βN_s ds} (1/2)‖p‖² + γ e^{∫_{t₀}^τ βN_s ds} L_τ(θ) + E.     (13)

The extended-system canonical equations are thus of the form

  θ̇ = ∂H̃/∂p = βN_τ e^{−∫_{t₀}^τ βN_s ds} p,   τ̇ = ∂H̃/∂E = 1,
  ṗ = −∂H̃/∂θ = −γ e^{∫_{t₀}^τ βN_s ds} ∇_θ L_τ(θ),   Ė = −∂H̃/∂τ.     (14)
Given that the equations in (14) are canonical equations of motion which arise from an autonomous Hamiltonian H̃, symplectic properties of the Hamiltonian are preserved [18]. The preservation of symplectic properties allows for the implementation of symplectic discretization schemes which preserve symmetry properties in the discretization, as stated in [12], [13], [20]. Such discretization methods have recently been shown to result in stable discrete algorithms with large step sizes which allow for fast convergence rates [13].
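As a minimal illustration of why symplectic schemes are attractive (our example on the harmonic oscillator H = (p² + q²)/2, not the discretization of (14) itself), symplectic (semi-implicit) Euler keeps the energy error bounded while explicit Euler drifts without bound:

```python
# Harmonic oscillator: q_dot = p, p_dot = -q, energy E = (q^2 + p^2)/2.
h, steps = 0.05, 2000
q_s, p_s = 1.0, 0.0          # symplectic Euler state
q_e, p_e = 1.0, 0.0          # explicit Euler state
E0 = 0.5
drift_s = drift_e = 0.0
for _ in range(steps):
    p_s = p_s - h * q_s      # symplectic (semi-implicit) Euler:
    q_s = q_s + h * p_s      #   update q with the *new* momentum
    q_e, p_e = q_e + h * p_e, p_e - h * q_e   # explicit Euler
    drift_s = max(drift_s, abs(0.5 * (q_s**2 + p_s**2) - E0))
    drift_e = max(drift_e, abs(0.5 * (q_e**2 + p_e**2) - E0))

assert drift_s < 0.05        # bounded O(h) energy oscillation
assert drift_e > 1.0         # energy grows by (1 + h^2) every step
```

Here explicit Euler multiplies q² + p² by exactly (1 + h²) per step, while symplectic Euler exactly conserves the modified quantity q² + p² − hpq, which pins the energy error to O(h).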
V. STABILITY ANALYSIS FOR COMMON ERROR MODELS

Four classes of higher order tuners were derived in the previous sections from a unified variational perspective, each in the form of a system of two coupled first order ODEs. Of these, the first two classes ((6), (6′)) were shown to be readily implementable, while the second two ((9), (9′)) are implementable but numerically ill-conditioned. All four classes will be shown to be stable in this section using the same Lyapunov function. We examine these high-order tuners in the context of two common error models in adaptive systems which arise in linear regression and model reference adaptive control (MRAC). We discuss the loss function L_t(θ) in each setting and provide stability analysis using the higher order tuner in (6). We also comment on the stability analyses for the higher order tuners in (6′), (9), (9′), as well as on the Lyapunov function used in [4]. For notational convenience we consider the single output setting in linear regression and the single input state feedback MRAC problem statement. The results of this section can be extended to multiple inputs and multiple outputs in a straightforward manner.

A. Linear Regression

Many problems in adaptive estimation and control may be expressed as y = θ*ᵀφ, where θ*, φ ∈ R^N represent the unknown parameter and the measurable regressor. The variable y ∈ R represents the measurable output. Given that θ* is unknown, we formulate an estimator ŷ = θᵀφ, where ŷ ∈ R is the estimated output and the unknown parameter is estimated with θ ∈ R^N. Define the output error as

  e_y = ŷ − y = θ̃ᵀφ,     (15)
where θ̃ = θ − θ* is the parameter estimation error. The goal is to design a rule to adjust the parameter estimate θ in a continuous manner using knowledge of φ and e_y such that e_y converges towards zero. To do so, a squared loss function

  L_t(θ) = (1/2)e_y² = (1/2)θ̃ᵀφφᵀθ̃     (16)

is commonly considered [7]. Using (15), the gradient of this loss function with respect to θ can be expressed in an implementable closed form as ∇_θ L_t(θ) = φe_y.

B. Model Reference Adaptive Control

While the error model in linear regression (15) is algebraic, the error model for model reference adaptive control of LTI systems takes the form of the dynamical system [7]

  ė = Ae + bθ̃ᵀφ,     (17)

where A ∈ R^{n×n} is a known Hurwitz matrix and b ∈ Rⁿ is a known input matrix. The goal is to design a rule to adjust the parameter estimate θ in a continuous manner using knowledge of φ and e such that e converges towards zero. A stability-based rule is often proposed for this purpose and is generated from the loss function

  L_t(θ) = (d/dt){eᵀPe/2} + eᵀQe/2,     (18)

where P = Pᵀ ∈ R^{n×n} is a positive definite matrix that solves the equation AᵀP + PA = −Q, with a positive definite matrix Q = Qᵀ ∈ R^{n×n}. Using (17), the gradient of this loss function with respect to θ can be expressed in an implementable closed form as ∇_θ L_t(θ) = φeᵀPb.

Remark 5: Comparing the loss function in (18) to that in (16), the extra terms account for energy storage in the error model dynamics in (17).

C. Stability Analysis

In this section we state the main stability and convergence results for the higher order algorithms derived in this letter. For this analysis, the normalization signal is chosen as

  N_t = 1 + μφᵀφ,     (19)

where μ > 0 is a user-defined gain. With this choice of normalization we now proceed to the main theorems.

Assumption 1: The unknown parameter θ* is a constant.

Theorem 2: Under Assumption 1, for the linear regression model in (15) with loss in (16), the higher order tuner in (6) with normalization in (19) and μ ≥ 2γ/β (w.l.o.g.) results in (ϑ − θ*) ∈ L∞ and (θ − ϑ) ∈ L∞. If in addition it is assumed that φ, φ̇ ∈ L∞, then lim_{t→∞} e_y(t) = 0, lim_{t→∞}(θ(t) − ϑ(t)) = 0, lim_{t→∞} ϑ̇(t) = 0, and lim_{t→∞} θ̃̇(t) = 0.

Proof: Consider the candidate Lyapunov function inspired by the higher order tuner approach in [11], stated as

  V = (1/γ)‖ϑ − θ*‖² + (1/γ)‖θ − ϑ‖².     (20)

Using (6), (15), (16), and (19) with μ ≥ 2γ/β, the time derivative of (20) may be bounded as

  V̇ ≤ −(2β/γ)‖θ − ϑ‖² − ‖e_y‖² − [‖e_y‖ − 2‖θ − ϑ‖‖φ‖]² ≤ 0.

Thus it can be concluded that V is a Lyapunov function with (ϑ − θ*) ∈ L∞ and (θ − ϑ) ∈ L∞. Integrating V̇ from t₀ to ∞: ∫_{t₀}^∞ ‖e_y‖² dt ≤ −∫_{t₀}^∞ V̇ dt = V(t₀) − V(∞) < ∞, thus e_y ∈ L₂. Likewise, ∫_{t₀}^∞ (2β/γ)‖θ − ϑ‖² dt ≤ −∫_{t₀}^∞ V̇ dt = V(t₀) − V(∞) < ∞, thus (θ − ϑ) ∈ L₂ ∩ L∞. Furthermore, ‖θ − ϑ‖²_{L₂} ≤ γV(t₀)/(2β), where ‖θ − ϑ‖²_{L₂} → 0 as β → ∞. If in addition φ ∈ L∞, then from (15), e_y ∈ L₂ ∩ L∞, and from (6), ϑ̇, θ̃̇ ∈ L₂ ∩ L∞. If additionally φ̇ ∈ L∞, then from the time derivative of (15), ė_y ∈ L∞, and from the time derivative of (6), ϑ̈, θ̃̈ ∈ L∞; thus from Barbalat's lemma, lim_{t→∞} e_y(t) = 0, lim_{t→∞}(θ(t) − ϑ(t)) = 0, lim_{t→∞} ϑ̇(t) = 0, and lim_{t→∞} θ̃̇(t) = 0. ∎

Theorem 3: Under Assumption 1, for the MRAC model in (17) with loss in (18), the higher order tuner in (6) with normalization in (19), Q ≥ 2I solving AᵀP + PA = −Q, and μ ≥ 2γ‖Pb‖²/β (w.l.o.g.) results in e ∈ L∞, (ϑ − θ*) ∈ L∞, and (θ − ϑ) ∈ L∞. If it is assumed that φ ∈ L∞,² then lim_{t→∞} e(t) = 0. If also φ̇ ∈ L∞,³ then lim_{t→∞}(θ(t) − ϑ(t)) = 0, lim_{t→∞} ϑ̇(t) = 0, and lim_{t→∞} θ̃̇(t) = 0.

Proof: Consider the candidate Lyapunov function inspired by the higher order tuner approach in [11], stated as

  V = (1/γ)‖ϑ − θ*‖² + (1/γ)‖θ − ϑ‖² + eᵀPe.     (21)
Using (6), (17), (18), and (19) with μ ≥ 2γ‖Pb‖²/β and Q ≥ 2I solving AᵀP + PA = −Q, the time derivative of (21) may be bounded as

  V̇ ≤ −(2β/γ)‖θ − ϑ‖² − ‖e‖² − [‖e‖ − 2‖Pb‖‖θ − ϑ‖‖φ‖]² ≤ 0.

Thus it can be concluded that V is a Lyapunov function with e ∈ L∞, (ϑ − θ*) ∈ L∞, and (θ − ϑ) ∈ L∞. By integrating V̇ from t₀ to ∞: ∫_{t₀}^∞ ‖e‖² dt ≤ −∫_{t₀}^∞ V̇ dt = V(t₀) − V(∞) < ∞, thus e ∈ L₂ ∩ L∞. Likewise, ∫_{t₀}^∞ (2β/γ)‖θ − ϑ‖² dt ≤ −∫_{t₀}^∞ V̇ dt = V(t₀) − V(∞) < ∞, thus (θ − ϑ) ∈ L₂ ∩ L∞. Furthermore, it can again be concluded that ‖θ − ϑ‖²_{L₂} ≤ γV(t₀)/(2β). If in addition φ ∈ L∞,² then from (17) ė ∈ L∞, and thus from Barbalat's lemma, lim_{t→∞} e(t) = 0. Additionally, from (6), ϑ̇, θ̃̇ ∈ L₂ ∩ L∞. If additionally φ̇ ∈ L∞,³ then from the time derivative of (6), ϑ̈, θ̃̈ ∈ L∞, and thus from Barbalat's lemma, lim_{t→∞}(θ(t) − ϑ(t)) = 0, lim_{t→∞} ϑ̇(t) = 0, and lim_{t→∞} θ̃̇(t) = 0. ∎

Remark 6: The algorithms provided in this letter can be considered as online learning algorithms for which constant (O(1)) regret bounds can be provided. See [14] for a discussion of constant regret bounds in adaptive systems.

Remark 7: We note that while we focus in this section on models that are linear in the parameters, nonlinearly parameterized models and other loss functions can be analyzed using similar Lyapunov stability approaches [17], [20].

Remark 8: A complementary proof of stability for the alternate higher order tuner in (6′) for linear regression as in Theorem 2 can be provided using the same Lyapunov function in (20). In this setting the time derivative is

  V̇ ≤ (1/N_t) { −(2β/γ)‖θ − ϑ‖² − ‖e_y‖² − [‖e_y‖ − 2‖θ − ϑ‖‖φ‖]² }.

Remark 9: Using the canonical system (θ, p) as in (9) for Theorem 2, the candidate Lyapunov function in (20) may be restated with the change of variables in (10) as

  V = (1/γ)‖θ̃ + e^{−∫_{t₀}^t βN_s ds} p‖² + (1/γ)‖e^{−∫_{t₀}^t βN_s ds} p‖²,

with time derivative

  V̇ ≤ −(2β/γ) e^{−2∫_{t₀}^t βN_s ds} ‖p‖² − ‖e_y‖² − [‖e_y‖ − 2 e^{−∫_{t₀}^t βN_s ds} ‖p‖‖φ‖]².

This formulation, however, contains exponentially decaying terms, thus further motivating the use of the Lyapunov function in (20) using (θ, ϑ).

Remark 10: In a similar manner, for the update in (9′) in the setting of Theorem 2, the candidate Lyapunov function in (20) is restated with the canonical system (θ, p) as

  V = (1/γ)‖θ̃ + (1/β)e^{−β(t−t₀)} p‖² + (1/γ)‖(1/β)e^{−β(t−t₀)} p‖²,

with time derivative

  V̇ ≤ (1/N_t) { −(2β/γ)(1/β²)e^{−2β(t−t₀)}‖p‖² − ‖e_y‖² − [‖e_y‖ − 2(1/β)e^{−β(t−t₀)}‖p‖‖φ‖]² }.

Remark 11: The stability analysis in [4] cannot be used to certify stability of the higher order tuner in (5). In particular, the candidate Lyapunov function proposed for stability in [4, eq. 8] is restated as V = D_h(θ*, θ + e^{−ᾱ_t}θ̇) + e^{β̄_t}(L(θ) − L(θ*)). Using the same parameterization as in (4) with squared loss (16) for the linear regression error model (15) considered in Section V-A, the function V is restated as

  V = (1/2)‖θ̃ + (1/(βN_t))θ̇‖² + (γ/(βN_t))(1/2)e_y².

Using the higher order algorithm (5), error equation (15), and normalization (19), the time derivative may be expressed as

  V̇ = −γ e_y² (1 + μφᵀφ̇/(βN_t²)) + (γ/(βN_t)) e_y θ̃ᵀφ̇,

which is sign indefinite due to φ̇.

²As is common in adaptive control, φ is a continuously differentiable function of the plant state x. It was proved that e ∈ L∞, with x̂ ∈ L∞ by design. Thus with x = x̂ − e, φ ∈ L∞ by construction.
³In adaptive control with φ a continuously differentiable function of the plant state x, φ̇ ∈ L∞ as ẋ = x̂̇ − ė is bounded by construction.
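The inequality in Theorem 3 can be exercised on a toy MRAC instance (ours, not from the letter: A = −I is Hurwitz, and Q = 2I gives P = I from AᵀP + PA = −Q, so ∇_θL_t = φeᵀb); simulating (17) with the tuner (6) shows the Lyapunov function (21) decreasing:

```python
import numpy as np

# Toy instance of Theorem 3: A = -I, Q = 2*I, hence P = I and Pb = b.
gamma, beta = 1.0, 2.0
b = np.array([0.0, 1.0])
mu = 2.0 * gamma * (b @ b) / beta       # mu >= 2*gamma*||Pb||^2 / beta
theta_star = np.array([1.0, -2.0])
theta, vartheta = np.zeros(2), np.zeros(2)
e = np.array([1.0, 0.0])

def V(theta, vartheta, e):              # Lyapunov function (21) with P = I
    return (np.sum((vartheta - theta_star) ** 2)
            + np.sum((theta - vartheta) ** 2)) / gamma + e @ e

dt, T = 1e-3, 10.0
V0 = V(theta, vartheta, e)
for k in range(int(T / dt)):
    t = k * dt
    phi = np.array([np.sin(t), 1.0])    # bounded time-varying regressor
    N = 1.0 + mu * phi @ phi            # normalization (19)
    e_dot = -e + b * ((theta - theta_star) @ phi)   # error model (17), A = -I
    vartheta_dot = -gamma * phi * (e @ b)           # gradient step, Pb = b
    theta_dot = -beta * (theta - vartheta) * N
    e = e + dt * e_dot
    vartheta = vartheta + dt * vartheta_dot
    theta = theta + dt * theta_dot

assert V(theta, vartheta, e) < V0       # (21) decreases along the flow
```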
VI. CONCLUSION

This letter presents several high order tuners (HT) that are proved to be stable even when the regressors are time-varying, a feature common in adaptive control but often ignored in machine learning problems. We also present a unified variational perspective for all of the tuners. The key insight gained from these results is that several continuous-time HT are canonical and stable, but only some of them are implementable. These insights are vital for moving on to the design of stable discrete-time HT, which may realize accelerated learning results as in [1], [16].

REFERENCES

[1] Y. Nesterov, "A method of solving a convex programming problem with convergence rate O(1/k²)," Soviet Math. Doklady, vol. 27, no. 3, pp. 372–376, 1983.
[2] A. Beck and M. Teboulle, "A fast iterative shrinkage-thresholding algorithm for linear inverse problems," SIAM J. Imag. Sci., vol. 2, no. 1, pp. 183–202, Jan. 2009.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2012, pp. 1097–1105.
[4] A. Wibisono, A. C. Wilson, and M. I. Jordan, "A variational perspective on accelerated methods in optimization," Proc. Nat. Acad. Sci. USA, vol. 113, no. 47, pp. E7351–E7358, Nov. 2016.
[5] M. I. Jordan and T. M. Mitchell, "Machine learning: Trends, perspectives, and prospects," Science, vol. 349, no. 6245, pp. 255–260, 2015.
[6] E. Hazan, "Introduction to online convex optimization," Found. Trends Optim., vol. 2, nos. 3–4, pp. 157–325, 2016.
[7] K. S. Narendra and A. M. Annaswamy, Stable Adaptive Systems. Mineola, NY, USA: Dover, 2005.
[8] G. C. Goodwin and K. S. Sin, Adaptive Filtering Prediction and Control. Englewood Cliffs, NJ, USA: Prentice-Hall, 1984.
[9] P. Ioannou and J. Sun, Robust Adaptive Control. Upper Saddle River, NJ, USA: Prentice-Hall, 1996.
[10] A. S. Morse, "High-order parameter tuners for the adaptive control of linear and nonlinear systems," in Systems, Models and Feedback: Theory and Applications. Boston, MA, USA: Birkhäuser, 1992, pp. 339–364.
[11] S. Evesque, A. M. Annaswamy, S. Niculescu, and A. P. Dowling, "Adaptive control of a class of time-delay systems," J. Dyn. Syst. Meas. Control, vol. 125, no. 2, pp. 186–193, Jun. 2003.
[12] E. Hairer, C. Lubich, and G. Wanner, Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations. Heidelberg, Germany: Springer, 2006.
[13] M. Betancourt, M. I. Jordan, and A. C. Wilson, "On symplectic optimization," 2018. [Online]. Available: https://arxiv.org/abs/1802.03653
[14] J. E. Gaudio, T. E. Gibson, A. M. Annaswamy, M. A. Bolender, and E. Lavretsky, "Connections between adaptive control and optimization in machine learning," in Proc. IEEE 58th Conf. Decis. Control (CDC), Dec. 2019, pp. 1–18.
[15] J. E. Gaudio, T. E. Gibson, A. M. Annaswamy, and M. A. Bolender, "Provably correct learning algorithms in the presence of time-varying features using a variational perspective," 2019. [Online]. Available: https://arxiv.org/abs/1903.04666
[16] J. E. Gaudio, A. M. Annaswamy, J. M. Moreu, M. A. Bolender, and T. E. Gibson, "Accelerated learning with robustness to adversarial regressors," 2020. [Online]. Available: https://arxiv.org/abs/2005.01529
[17] A.-P. Loh, A. M. Annaswamy, and F. P. Skantze, "Adaptation in the presence of a general nonlinear parameterization: An error model approach," IEEE Trans. Autom. Control, vol. 44, no. 9, pp. 1634–1652, Sep. 1999.
[18] H. Goldstein, C. Poole, and J. Safko, Classical Mechanics. San Francisco, CA, USA: Addison Wesley, 2002.
[19] B. T. Polyak, "Some methods of speeding up the convergence of iteration methods," USSR Comput. Math. Math. Phys., vol. 4, no. 5, pp. 1–17, Jan. 1964.
[20] N. M. Boffi and J.-J. E. Slotine, "Higher-order algorithms and implicit regularization for nonlinearly parameterized adaptive control," 2020. [Online]. Available: https://arxiv.org/abs/1912.13154
[21] L. N. Hand and J. D. Finch, Analytical Mechanics. Cambridge, U.K.: Cambridge Univ. Press, 1998.