Review of Adaptive Neural Networks & Some Applications
Steve Rogers srogers@isr.us 304.368.9300
Overview
• Adaptive Neural Networks have become more popular due to their ability to approximate a large array of dynamics. The ability to adapt is accomplished by means of a set of tuning rules.
• Adaptive Neural Networks are used for control & system identification/prediction, table look-ups, fault detection, and optimization due to their generalization ability. Tuning rules for adaptive neural networks have featured Lyapunov-based approaches in recent years. Although these have some desirable qualities, they have led to complex tuning procedures. Tuning rules should be simple and provide for rapid, reliable convergence.
• Adaptive Neural Networks possess learning, adaptation, and classification capabilities.
Neural Networks Decision Points

Advantages
• Capable of learning complex nonlinear systems
• Code/algorithms available
• Can be used for either fixed or adaptive applications, including control & system identification/prediction, table look-ups, and fault detection
• May handle arbitrary inputs, unlike linear systems
• Can treat the system to be identified as a 'black box', i.e., doesn't require knowledge of 1st principles
• Can be used in conjunction with other conventional methods

Disadvantages
• Requires specialized knowledge related to the algorithm
• Difficult to validate, especially adaptive systems, because the weights are not deterministic
• Solution may be available from other conventional methods
• Convergence may be to a local minimum & not the global solution
Neural Network Components
• Neurons: also known as nodes, neurons are the basic computing elements of a network.
• Connections: define the relationships of neurons within the network.
• Weights: used to determine whether a neuron activates; can also be used as the activation of a neuron.
• Activation Function: the function used to determine the output of a neuron.
• Adaptive Algorithm: controls the learning process of the network.
A key point is that arbitrary measurements & derived measurements may be inputs.
Neural Network Structure
A neuron in the NN takes the weighted sum of its inputs and uses it as the input to the neuron's activation function, which then produces an output signal.
Common Activation Functions:
Binary Sigmoid: f(x) = 1 / (1 + e^(−σ·x))
Bipolar Sigmoid: f(x) = 2 / (1 + e^(−σ·x)) − 1
Gaussian Radial: f(x) = e^(−x²)
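For reference, each of these activations is a one-line function; a minimal Matlab sketch (the slope value σ = 2 and the plotting range are illustrative assumptions):

% Common activation functions as anonymous Matlab functions
sigma = 2;                                      % example slope parameter
binary_sig  = @(x) 1./(1 + exp(-sigma*x));      % binary sigmoid, output in (0,1)
bipolar_sig = @(x) 2./(1 + exp(-sigma*x)) - 1;  % bipolar sigmoid, output in (-1,1)
gauss_rad   = @(x) exp(-x.^2);                  % Gaussian radial
x = -3:0.1:3;
plot(x, [binary_sig(x); bipolar_sig(x); gauss_rad(x)]), grid on
legend('binary sigmoid','bipolar sigmoid','Gaussian radial')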
Adaptive Radial Basis Function Neural Networks (RBFN)
• RBFNs are two-layer networks whose outputs are a linear combination of the hidden layer functions
• Typical RBFN equations are:
f(ξ) = w0ᵀ + Σ (k = 1..h) wkᵀ φk(ξ)

φk(ξ) = exp( −(1/σk²) ‖ξ − µk‖² )
• where ξ is the input vector of the network, h indicates the total number of hidden neurons, and µk and σk refer to the center and width of the kth hidden neuron. ‖·‖ is the Euclidean norm. The function f(·) is the output of the RBFN, which represents the network approximation to the actual output. The coefficient wk is the connection weight vector of the kth hidden neuron to the output neurons and w0 is the bias term.
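To make the notation concrete, a minimal Matlab sketch of the RBFN output equation above; the sizes (h = 5 hidden neurons, 2 inputs, 1 output) and all numeric values are illustrative assumptions rather than values from the slides:

% RBFN forward pass: f(xi) = w0 + sum over k of w_k * phi_k(xi)
h   = 5;                       % number of hidden neurons (assumed)
xi  = [0.3; -0.7];             % input vector (assumed 2-dimensional)
mu  = randn(2, h);             % centers mu_k, one column per hidden neuron
sig = 0.5*ones(1, h);          % widths sigma_k
w   = randn(h, 1);             % connection weights w_k
w0  = 0.1;                     % bias term
phi = zeros(h, 1);
for k = 1:h
    phi(k) = exp(-norm(xi - mu(:,k))^2 / sig(k)^2);   % hidden layer outputs
end
f = w0 + w.'*phi;              % network output (scalar-output case)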
Typical RBFN Architecture
Adaptive Update Schemes
• Any good identification scheme that utilizes the RBFN should satisfy two criteria:
  1) the parameters of the RBFN are tuned properly to satisfy stability and performance needs
  2) the parameter adaptive law should be efficient enough to allow real-time operation
• The RAN (resource allocating network) was developed to tune all the RBF parameters and incorporated a growth feature. MRAN also includes a pruning feature. Other tuning rules only adjusted the connection weight vector and left the center and width vectors fixed.
• A Lyapunov-derived tuning rule is:
χ(n+1) = χ(n) + η Π(n) P e(n)
• where χ is the vector of parameters to be tuned, including the connection weights, centers, and widths, η is the user-selected learning rate (positive scalar), Π(n) is the gradient of the function with respect to the parameter vector χ evaluated at χ(n), and e(n) is the error vector.
• P is the solution of the Lyapunov equation PA + AᵀP = −Q, where Q is a user-selected positive definite matrix and A is a user-selected Hurwitz stable matrix (all eigenvalues in the open left half plane).
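As a sketch of how the pieces fit together, one update step of this rule might look as follows in Matlab; it assumes the Control System Toolbox lyap function is available, and the matrices, sizes, and gains are illustrative placeholders rather than values from the slides:

% One step of the Lyapunov-derived tuning rule: chi(n+1) = chi(n) + eta*Pi(n)*P*e(n)
A   = [-2 1; 0 -3];            % user-selected Hurwitz matrix (eigenvalues in the LHP)
Q   = eye(2);                  % user-selected positive definite matrix
P   = lyap(A.', Q);            % solves A.'*P + P*A = -Q
eta = 0.05;                    % learning rate
chi = zeros(4, 1);             % parameters (weights, centers, widths); placeholder size
Pi_n = randn(4, 2);            % gradient of the network w.r.t. chi at chi(n); placeholder
e_n  = [0.2; -0.1];            % error vector
chi = chi + eta*Pi_n*P*e_n;    % parameter update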
Update Schemes
• Another common approach for tuning rules is the gradient scheme:

w = w + µw e φ − kw µw e w
µ = µ + µξ e w φ · 2(ξ − µ)/σ² − kµ µξ e µ
σ² = σ² + µσ e w φ · ‖ξ − µ‖²/σ⁴ − kσ µσ e σ²

φ(ξ) = exp( −‖ξ − µ‖²/σ² ),   e_i = y_i − ŷ_i(ξ_i)

where µw, µξ, and µσ are learning rates and kw, kµ, and kσ are small leakage gains.
• The 3rd term in each update moves the discrete pole away from the unit circle, i.e., away from being a pure integrator. Although this may slow down convergence, it improves stability and should remove oscillations.
• Note that all parameters are tuned in the above gradient approach.
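A minimal Matlab sketch of one such gradient step for a single Gaussian unit with scalar output, including the leakage (3rd) terms; the gains and signal values are illustrative assumptions:

% One gradient-plus-leakage update for a single RBF unit
xi = [0.4; 0.1]; mu = [0; 0]; sig2 = 1; w = 0.2;   % current input, center, width, weight
y  = 0.9;                                          % measured output
phi = exp(-norm(xi - mu)^2/sig2);                  % basis function value
e   = y - w*phi;                                   % output error
mw = 0.1; mmu = 0.05; msig = 0.05;                 % learning rates
kw = 0.01; kmu = 0.01; ksig = 0.01;                % leakage gains (move the pole off the unit circle)
w    = w    + mw*e*phi                            - kw*mw*e*w;
mu   = mu   + mmu*e*w*phi*2*(xi - mu)/sig2        - kmu*mmu*e*mu;
sig2 = sig2 + msig*e*w*phi*norm(xi - mu)^2/sig2^2 - ksig*msig*e*sig2;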
Radial Basis Function Block Diagrams
Control Circuit for LC update
• The bottom part of the figure shows how a control structure may be inserted into the linear combiner (LC). The simplest control structure is the standard learning rate µ. A proportional-integral (PI) structure is the next simplest controller; it has the form Kp(s + a)/s, which gives another integrator plus a zero. Note also that Kp may be combined with µ, i.e., Kp = µ.
• Any control structure may be used, including lead-lag Kl(s + a)/(s + b), PID, servo-type PID, etc. A discrete-time sketch of the PI update follows the figure below.
RBF With Controller Update Mechanism
[Figure: the error e between the measured output y and the RBF/sigmoid network output yhat drives the PI block Kp(s + a)/s; its output χdot is integrated by 1/s to produce the parameter vector χ that updates the network weights.]
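A minimal discrete-time sketch of the PI update path in the figure, using a backward-Euler discretization; the gains, sample time, and stand-in error signal are assumptions:

% PI-driven weight update path: e -> Kp*(s+a)/s -> chidot -> 1/s -> chi
Kp = 0.5; a = 2; Ts = 0.01;        % PI gain, zero location, sample time (assumed)
chi = 0; Ie = 0;                   % linear combiner weight & PI integrator state
for n = 1:500
    e = exp(-0.01*n)*sin(0.1*n);   % stand-in for the network output error at step n
    Ie = Ie + Ts*e;                % integrator inside the PI block
    chidot = Kp*(e + a*Ie);        % PI output, i.e. Kp*(s+a)/s acting on e
    chi = chi + Ts*chidot;         % final 1/s integration updates the weight chi
end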
Optimization Gain Results
[Figure: optimization gain results for candidate update controllers: lead-lag K1(s + a)/(s + b), lead-lag K2(s + a1)/(s + b1), and PI Kp(s + Ki)/s.]
Data Plots
[Figure: data plots for the Kp(s + a)/(s + b) and Kp(s + Ki)/s update controllers.]
Use of Neural Networks for Control Enhancements of Existing Systems
[Figure: an NN add-on wrapped around the existing controller and gas turbine plant, using the set points and measurements already available in the data stream.]
A neural network may be added to an existing system & make use of the current data stream to enhance it. The add-on is non-intrusive to the existing system and takes advantage of the existing control system's capabilities. The NN add-on could focus on any deficiencies of the existing system. Most current research NN prototypes are handled in this fashion.
NN Control of Systems with Jumps: friction, deadzones, backlash, & hysteresis
• Add-on to existing continuous controller
• Modify the usual activation function by adding a jump function
Common Activation Functions (continuous):
Binary Sigmoid: f(x) = 1 / (1 + e^(−σ·x))
Bipolar Sigmoid: f(x) = 2 / (1 + e^(−σ·x)) − 1
Gaussian Radial: f(x) = e^(−x²)

Jump functions:
g(x) = 0 for x < 0;  g(x) = 1 / (1 + e^(−σ·x)) for x ≥ 0
g(x) = 0 for x < 0;  g(x) = 2 / (1 + e^(−σ·x)) − 1 for x ≥ 0
g(x) = 0 for x < 0;  g(x) = e^(−x²) for x ≥ 0
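A small Matlab sketch of augmenting a continuous sigmoid with one of these jump bases; the weights on the two bases are illustrative assumptions:

% Continuous sigmoid plus a jump basis for modelling a discontinuity at x = 0
sigma = 2;
f = @(x) 1./(1 + exp(-sigma*x));               % continuous binary sigmoid
g = @(x) (x >= 0).*(1./(1 + exp(-sigma*x)));   % jump function: 0 for x < 0, sigmoid for x >= 0
wc = 1.0; wj = 0.4;                            % weights on the continuous & jump bases (assumed)
x = -2:0.01:2;
out = wc*f(x) + wj*g(x);                       % augmented activation with a jump at x = 0
plot(x, out), grid on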
System Identification
[Figure: the input x[n] drives both the unknown system, whose output is d[n], and the adaptive component, whose output is y[n]; the error e[n] = d[n] − y[n] drives the adaptation.]
The adaptive component successfully models the system when e[n] converges to a small value. If the model coefficients change drastically, an anomaly may be declared.
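As a minimal illustration of the diagram, the sketch below uses a linear FIR adaptive component with an LMS update; an adaptive neural network would slot into the same error-driven loop, and the simulated system, filter order, and step size are assumptions:

% System identification: adapt a FIR model of an unknown system with LMS
N = 2000; x = randn(N,1);                 % input x[n]
b_true = [0.5 -0.3 0.2];                  % "unknown" system (simulation stand-in only)
d = filter(b_true, 1, x);                 % unknown system output d[n]
M = 3; w = zeros(M,1); mu = 0.05;         % adaptive component: 3-tap FIR & step size
e = zeros(N,1);
for n = M:N
    xn = x(n:-1:n-M+1);                   % regressor of recent inputs
    y  = w.'*xn;                          % adaptive component output y[n]
    e(n) = d(n) - y;                      % error e[n]
    w = w + mu*e(n)*xn;                   % LMS update
end
% e[n] settling to a small value indicates a good model;
% a sudden jump in the coefficients w would flag an anomaly.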
System ID with Adaptive Neural Networks
• Adaptive components are usually used in conjunction with conventional components because of instability concerns. They are used to 'pick up the slop' remaining from the conventional component.
• Multi-Layer Perceptrons (MLP) may be used for system identification or prediction.
• Numerous structures & update law options exist.
• Sigma Pi structure (first equations below): Ci are input vectors, β is a Kronecker product, W is a set of weights, Ue is an error function (in this case a PI control output), and G/B are defined by the application.
• Single Hidden Layer structure (remaining equations below): W/V are weights to be updated adaptively, & the other parameters are defined by the application system.
• The MLP used here is explained in the following sheets.
output = Wᵀ β(C1, C2, C3),   dW/dt = −G( Ue B + L ‖Ue‖ W )
Ue is a PI function of the error e, built from Kp, Ki, and ∫e.

output_k = bw θwk + Σ (j = 1..n2) wjk σj(zj),   k = 1, …, n3
σj(zj) = σ( bv θvj + Σ (i = 1..n1) vij xi )
dŴ/dt = −{ (σ̂ − σ̂z V̂ᵀ x) ςᵀ + λ ‖ς‖ Ŵ } Γw
dV̂/dt = −{ x (ςᵀ Ŵᵀ σ̂z) + λ ‖ς‖ V̂ } Γv
Γw > 0, Γv > 0, λ > 0
where σ̂z denotes the Jacobian of σ̂ with respect to z and ς is the error signal.
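A compact Matlab sketch of one Euler step of the single-hidden-layer update above, with scalar gains standing in for Γw and Γv; the layer sizes, gains, and error signal are illustrative assumptions:

% One Euler step of the single hidden layer adaptive update
n1 = 3; n2 = 6; n3 = 1;                           % input, hidden, output sizes (assumed)
x  = randn(n1, 1);                                % network input
Vh = 0.01*randn(n1, n2); Wh = 0.01*randn(n2, n3); % current estimates Vhat & What
z     = Vh.'*x;                                   % hidden layer pre-activations
sig   = tanh(z);                                  % sigma-hat
sig_z = diag(1 - tanh(z).^2);                     % sigma-hat_z: Jacobian of sigma w.r.t. z
zeta  = 0.1;                                      % error signal (scalar output; placeholder)
Gw = 5; Gv = 5; lam = 0.1; Ts = 0.001;            % adaptation gains, e-mod gain, time step
Wdot = -((sig - sig_z*Vh.'*x)*zeta.' + lam*norm(zeta)*Wh)*Gw;
Vdot = -(x*(zeta.'*Wh.'*sig_z) + lam*norm(zeta)*Vh)*Gv;
Wh = Wh + Ts*Wdot;  Vh = Vh + Ts*Vdot;            % updated weight estimates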
System ID Example with MLP

MLP (Multi-Layer Perceptron) diagram
[Figure: a two-input (p1, p2), three-hidden-node, single-output MLP; input weights vij and biases θi feed the hidden nodes n1i with activation f, and the hidden outputs c1i are combined through weights wi and bias λ into the output node n21 with output a.]

MLP equations:
c11 = f(n11) = f(v11 p1 + v12 p2 + θ1)
c12 = f(n12) = f(v21 p1 + v22 p2 + θ2)
c13 = f(n13) = f(v31 p1 + v32 p2 + θ3)
a = f(n21) = f(w1 c11 + w2 c12 + w3 c13 + λ)
MLP general equations:
a_J = f_J( Σ (i = 1..S) f_i( Σ (n = 1..R) v_in p_n + θ_i ) w_i + λ_J )
a_(J×1) = f_(J×1)( w_(J×S) f_(S×1)( v_(S×R) p_(R×1) + θ_(S×1) ) + λ_(J×1) )
MLP general update laws: together with the general equations above, the update laws below completely define the MLP. Note that α is a scalar learning rate. The Matlab implementation is shown on the following sheet.
w_i(k+1) = w_i(k) − α ∂F̂(k)/∂w_i(k)
λ_J(k+1) = λ_J(k) − α ∂F̂(k)/∂λ_J(k)
v_i,n(k+1) = v_i,n(k) − α ∂F̂(k)/∂v_i,n(k)
θ_i(k+1) = θ_i(k) − α ∂F̂(k)/∂θ_i(k)
F̂(k) = ‖error(k)‖² = ( t(k) − a(k) )²
Matlab Code

% mlp_example.m
%
clear *
N = 500; cycles = 4;
x = sin(cycles*2*pi*[0:N-1]/N);
lb = -0.7; ub = 0.6; gain = 2;
init = 1;
for i = 1:N
  if x(i) > ub, y(i) = gain*x(i)^5;
  elseif x(i) < lb, y(i) = gain*x(i)^5;
  else y(i) = sign(x(i))*x(i)^2;
  end
  yhat(i) = MLP_recurArray([init, x(i), y(i)]);
  init = 0;
end
figure(1)
subplot(211)
err = y(:) - yhat(:);
errnorm = norm(err);
plot([x(:), y(:), yhat(:)]), grid on
title(['MLP estimation of sinusoid, error = ', num2str(errnorm)])
subplot(212)
plot(err), grid on
ylabel('error')
MLP function code

function yout = MLP_recurArray(in)
% MLP backpropagation learning for a single hidden layer
% W is the output layer weight vector, V the hidden layer weight matrix
% With N interior nodes the MLP equations are:
%   O = W*tanh(V*I)
% giving the two update equations (gradient plus momentum):
%   W = W + mu*err*tanh(V*I)'
%   V = V + mu*err*(sech(V*I).^2).*W'*I'
% N is the number of interior nodes
% m is the number of input taps, including the bias signal
persistent X
N = 10; m = 5; my = 5;
init = in(1); u = in(2); y = in(3);
% Initialize W & V
if init == 1 || isempty(X)
  X.W = zeros(1,N); X.dW = X.W;
  X.V = rand(N, m+my+N)/10000; X.dV = zeros(size(X.V));
  X.in = [1; u*ones(m-1,1); y*ones(my,1); zeros(N,1)];  % bias, u taps, y taps, hidden feedback
  X.predslow = y;
end
mu = .09; bet = .1;                           % learning rate & momentum gain
G = tanh(X.V*X.in);                           % hidden layer outputs
out = X.W*G;                                  % network output
err = y - out;
nextW = X.W + mu*err*G' + bet*X.dW;           % output weight update
sec2h = sech(X.V*X.in); sec2h = sec2h.*sec2h; % tanh derivative
nextV = X.V + mu*err*sec2h.*X.W'*X.in' + bet*X.dV;  % hidden weight update
% shift the tapped-delay input vector: bias, u history, y history, hidden outputs
X.in = [1; u; X.in(2:m-1); y; X.in(2+m+1:2+m+my-1); G];
X.dW = nextW - X.W; X.dV = nextV - X.V;
X.W = nextW; X.V = nextV;
yout = out;
Results
[Figure: top plot, 'MLP estimation of sinusoid, error = 10.3689', showing x, y, & yhat over 500 samples; bottom plot showing the error norm & weight norm over the same run. Fluctuation of the weights indicates that a better model structure is needed.]
x is the input sinusoid, y is the output signal, which is a nonlinear combination of sinusoids, & yhat is the MLP tracking signal. The bottom plot shows the stability & error performance.
Predictive Filters
[Figure: the signal passes through a delay z^(−n) into the adaptive filter, whose output is the signal estimate; the error between the current signal and the estimate tunes the filter. An adaptive filter copy, driven by the undelayed signal, produces the signal prediction. z is the discrete delay operator & n is the number of delays.]
The block entitled adaptive filter may be replaced by an arbitrarily structured filter. The adaptive filter copy is updated each time step. This same concept can be applied to an adaptive neural network. Note that many adaptive components (unless otherwise guaranteed stable) are used in conjunction with conventional components to ensure the stability of the adaptive component.
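A minimal linear version of this predictive-filter structure, using a normalized-LMS one-step-ahead predictor; the test signal, filter order, and step size are assumptions, and an adaptive neural network could replace the linear filter:

% One-step-ahead prediction with an adaptive FIR filter
N = 2000; t = (1:N).';
s = sin(2*pi*t/50) + 0.05*randn(N,1);     % signal to be predicted (stand-in)
M = 8; w = zeros(M,1); mu = 0.01;         % adaptive filter taps & step size
pred = zeros(N,1); e = zeros(N,1);
for n = M+1:N
    xn = s(n-1:-1:n-M);                   % delayed samples z^-1 ... z^-M
    pred(n) = w.'*xn;                     % signal estimate / prediction
    e(n) = s(n) - pred(n);                % prediction error
    w = w + mu*e(n)*xn/(xn.'*xn + 1e-6);  % normalized LMS update (the "filter copy")
end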
Fault Detection Concepts
• Actuator nonlinearities – deadband, backlash, & hysteresis. Conventional and adaptive neural networks can estimate the jump discontinuities.
• Instrument faults – excessive noise, dead sensor, drift, and bias. Simple statistics handle the 1st two & system ID handles the last two.
• Parameter estimation for process fault detection. Changes in coefficients may be used for fault detection.
• Hopfield neural networks may be used for principal component analysis (PCA), which is used in data-driven fault detection.
Continuous Instrumentation Diagnostics for Accuracy/Precision & Life-Cycle Maintenance
• Sensor faults. Monitoring is data validation or cross-checking of sensor data. There are 4 types of anomalies from typical analog sensors: dead, excessive noise, drift, & offset.
• Dead or excessive noise can be detected & isolated using standard deviations of the individual sensor data stream. The standard deviation is compared to the statistics of common sensors throughout the plant.
• Drift or offset may also be caused by something in the process being measured; therefore, detection/isolation must be model based.
• Drift or offset fault detection model equations can be based on performance criteria, heat/mass balance equations, or other model structures. Fault detection parameters are derived from the equations. Any change indicates an anomaly which can then be investigated. Kalman filters are frequently used to estimate the fault parameters in stochastic systems, although other nonlinear systems including neural networks may be used as well. Typical equations and fault indicators derived from an electric pump system & heat exchanger follow.
Instrument Fault Types: Excessive Noise & Dead Sensor
[Figure: sensor value vs. time (seconds) - the top trace shows a dead sensor fault & the bottom trace shows an excessive noise fault.]
Technical Approach for Dead/Noisy Sensors: Sensor Fault Detection Filter Banks
[Figure: filter bank for one sensor channel. The raw signal x passes through a low pass filter to give the smoothed signal y; the residual x − y is rectified (Abs) and low pass filtered to produce the fault indicator s. Across a group of sensors, the typical distribution of s flags a possible dead sensor at the low end and a possible noise failure at the high end.]
Note that the low pass filter blocks may be of arbitrary structure and may be fixed or adaptive neural networks or linear networks; these will be determined by operational experience.
This algorithm will process the raw engineering converted data that comes from each sensor. ‘s’ is the output signal that is sent to the decision logic. Sensors will be grouped by type, service, criticality, etc., as appropriate.
The fault decision compares s against two thresholds, Sds (possible dead sensor) & Snf (possible noise failure). Note that Sds & Snf will be refined by operational experience.
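A minimal Matlab sketch of the filter-bank indicator for one sensor channel, using first-order low pass filters; the stand-in data, filter constants, and thresholds Sds/Snf are assumptions to be refined by operational experience, as noted above:

% Filter-bank fault indicator for a single sensor channel
x  = randn(1000,1);              % stand-in for the raw engineering-converted sensor data
a1 = 0.95; a2 = 0.99;            % smoothing constants for the two low pass filters (assumed)
y  = filter(1-a1, [1 -a1], x);   % low pass smoothed signal
r  = abs(x - y);                 % rectified residual
s  = filter(1-a2, [1 -a2], r);   % low passed residual magnitude, the indicator s
Sds = 0.01; Snf = 0.5;           % decision thresholds (to be refined operationally)
if mean(s) < Sds
    disp('possible dead sensor')
elseif mean(s) > Snf
    disp('possible noise failure')
end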
Instrument Fault Types: Drift & Offset
[Figure: sensor value vs. time (seconds) - the top trace shows a drift fault & the bottom trace shows an offset fault.]
Proposed Observer Solution for Drift/Offset Sensor Fault
• The observers process the raw engineering data 'yi' (output measurements) and 'u' (input measurements) that come from each sensor.
• An estimated value of all the output measurements is sent to a set of rules for decision making.
• If a residual is greater than a threshold, a fault is indicated.
• This is the basis for an approach using neural networks. Note that the observer blocks may have arbitrary structures. Each observer is made unique by varying its input vectors; therefore, the differences between them become fault indicators.
[Figure: observer bank. The process input u and the outputs y1 … yq from the q sensors feed q observers; each observer produces estimates of all q outputs, which are passed to logic blocks that issue the decisions. Possible states of sensor health: nominal, suspect, & failed.]
Pump 1 Fluid Schematic with sensors & formulas
[Figure: pump fluid schematic, inlet → filter (dpf) → gas trap (dpg) → pump (dpp) → check valve → outlet, with an accumulator; instrumented with differential pressure sensors, an absolute pressure sensor (psia), flowmeters (fm), a temperature sensor (T), and a quantity sensor (sensor tags LATI02SR0101P through LATI02SR0501Q and LATI02FM0001R/LATI02FM0002R).]
Pump Indicators:
1) Zf = dpf/pph^2 (filter resistance)
2) Zg = dpg/pph^2 (gas trap resistance)
3) Impeller specific speed = rpm*pph^0.5/(dpp^0.75)
4) Suction specific speed = rpm*pph^0.5/(psia^0.75)
   electric watts1 = amps*volts
   electric watts2 = amps*4.3825*krpm
   hydraulic watts = pph*psid/(60*8.34*2.298)
5) pump efficiency = hydraulic watts/electric watts
6) a1 = dpp − function(impeller specific speed)*pph; a1 should be close to zero except in a fault condition
7) vc = Amps/krpm (pump ratio)
8) load = pph*dpp/(krpm*krpm) (pump load ratio)
where the left-hand sides of the above 8 equations are indicator parameters.
Temperature sensor T: LATI02SR0001T. Quantity sensor: LATI02SR0501Q.
Electrical measurements: Amps = LATI21FC0001C/10, volts = LATI21FC0001V, krpm = LATI21FC0003U/(255*20000).
Pump Dynamic Equations (used for estimation):
1) Ampsdot = −(R2/L2)*Amps − (psi/L2)*krpm
2) krpmdot = (psi/J)*Amps − (hth/J)*krpm
3) dppdot = hnn*pph^2 + hww*krpm^2
4) pphdot = −(hrr/ab)*pph^2 + dpp/ab
where R2, L2, psi, J, hth, hnn, hww, hrr, and ab are indicator parameters which can be determined.
ISS MTL & LTL PPA Equations Table

Pump Indicators (algorithm for each: adaptive or low pass filter)
1) Zf - filter performance - sensors: dpf, pph
2) Zg - gas trap performance - sensors: dpg, pph
3) impeller spec. speed (iss) - pump performance - sensors: krpm, pph, dpp
4) suction spec. speed (sss) - pump performance - sensors: krpm, pph, psia
5) pump efficiency (pe) - pump performance - sensors: Amps, volts, pph, dpp
6) a1 - pump performance - sensors: dpp, krpm, pph
7) vc - pump motor performance - sensors: amps, krpm
8) load - pump motor performance - sensors: pph, dpp, krpm

Pump Dynamic Indicators (algorithm for each: adaptive filter)
1) R2, L2, psi - motor performance - sensors: amps, krpm
2) psi, J, hth - motor performance - sensors: amps, krpm
3) hnn, hww - pump performance - sensors: pph, krpm
4) hrr, ab - pump performance - sensors: pph, dpp
Sensor Fault Matrix - PPA equations
Note that the algorithms may be adaptive neural networks as well as linear adaptive filters.
[Table: sensor fault matrix cross-referencing the sensors (dpf, pph, dpg, krpm, dpp, psia, Amps, volts, T) against the indicator parameters (Zf, Zg, iss, sss, pe, a1, vc, load, R2, L2, psi, J, hth, hnn, hww, hrr, ab); an asterisk marks each sensor used by each indicator.]
On-Line Estimation of Deadband, Backlash, & Hysteresis in a Control Element

Deadband schematic: [Figure: the command v passes through a deadband with slopes mr, ml and breakpoints br, bl to produce u, which drives the control element & plant output y.]

Deadband equations:
u(t) = mr(v(t) − br)  if v(t) ≥ br
u(t) = 0              if bl < v(t) < br
u(t) = ml(v(t) − bl)  if v(t) ≤ bl

Backlash schematic: [Figure: the command v passes through a backlash with slope m and limits cl, cr to produce u, which drives the control element & plant output y.]

Backlash equations:
u(t) = m(v(t) − cl)   if v(t) ≤ cl
u(t) = m(v(t) − cr)   if v(t) ≥ cr
u(t) = u(t − 1)       if cl < v(t) < cr
The hysteresis schematic is more complicated than deadband or backlash & is not shown here. The general approach for parameter estimation is shown below. The types of nonlinearities are usually known by inspection.
[Figure: the command v passes through the nonlinearity to produce u, which drives the control element & plant output y; a parameter estimator (Kalman filter) uses v, u, & y to estimate mr, ml, m, br, bl, cl, cr, etc.]
The estimated deadband parameters may be used for 2 purposes:
• on-line control loop audits
• plant control
Deadband Model Parameter Estimation
Backlash Model Parameter Estimation
Matlab code
% matlab deadband code
% At each sample, select the regressor according to the sign of the deadband output udb
if udb(i) > 0
  [p1,Pdb,err(i)] = KalmanF(p1,Pdb,udb(i),[v(i) -1 0 0]');
end
if udb(i) < 0
  [p1,Pdb,err(i)] = KalmanF(p1,Pdb,udb(i),[0 0 v(i) -1]');
end

function [param,P,err] = KalmanF(param,P,y,x)
% Kalman filter / recursive least squares update of the parameter vector
niter = 10;
Q = 0.05*eye(size(P));                    % process noise keeps the filter adaptive
for i = 1:niter
  err = y - x'*param;                     % innovation
  k = P*x/(1 + x'*P*x);                   % gain
  P = (eye(size(P)) - k*x')*P + Q;        % covariance update
  param = param + k*err;                  % parameter update
end
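For context, a small hypothetical driving loop that generates synthetic deadband data and calls the routine above; the parameter interpretation p1 = [mr; mr*br; ml; ml*bl] is inferred from the deadband equations and regressors, and the signal and initial covariance values are assumptions:

% Generate deadband data and estimate its parameters with the routine above
mr = 1.2; br = 0.3; ml = 0.9; bl = -0.4;          % true deadband (simulation only)
N = 500; v = 2*sin(2*pi*(1:N)/100);               % exciting command signal
udb = zeros(1,N);
udb(v >= br) = mr*(v(v >= br) - br);              % deadband output
udb(v <= bl) = ml*(v(v <= bl) - bl);
p1 = zeros(4,1); Pdb = 100*eye(4); err = zeros(1,N);
for i = 1:N
    if udb(i) > 0
        [p1,Pdb,err(i)] = KalmanF(p1,Pdb,udb(i),[v(i) -1 0 0]');
    elseif udb(i) < 0
        [p1,Pdb,err(i)] = KalmanF(p1,Pdb,udb(i),[0 0 v(i) -1]');
    end
end
% p1 approaches [mr; mr*br; ml; ml*bl], so br = p1(2)/p1(1) and bl = p1(4)/p1(3)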
Examples of applications for active control of noise and vibration
• Control of aircraft interior noise by use of lightweight vibration sources on the fuselage and acoustic sources inside the fuselage.
• Reduction of helicopter cabin noise by active vibration isolation of the rotor and gearbox from the cabin.
• Reduction of noise radiated by ships and submarines by active vibration isolation of interior mounted machinery (using active elements in parallel with passive elements) and active reduction of vibratory power transmission along the hull, using vibration actuators on the hull.
• Reduction of internal combustion engine exhaust noise by use of acoustic control sources at the exhaust outlet or by use of high intensity acoustic sources mounted on the exhaust pipe and radiating into the pipe at some distance from the exhaust outlet.
• Reduction of low frequency noise radiated by industrial noise sources such as vacuum pumps, forced air blowers, cooling towers and gas turbine exhausts, by use of acoustic control sources.
• Lightweight machinery enclosures with active control for low frequency noise reduction.
• Control of tonal noise radiated by turbo-machinery (including aircraft engines).
• Reduction of low frequency noise propagating in air conditioning systems by use of acoustic sources radiating into the duct airway.
• Reduction of electrical transformer noise either by using a secondary, perforated lightweight skin surrounding the transformer and driven by vibration sources or by attaching vibration sources directly to the transformer tank. Use of acoustic control sources for this purpose is also being investigated, but a large number of sources are required to obtain global control.
• Reduction of noise inside automobiles using acoustic sources inside the cabin and lightweight vibration actuators on the body panels.
• Active headsets and earmuffs.
Acoustic Concept 1
[Figure: feedforward active noise control. The noise source's primary noise is picked up by a reference microphone x(n); the ANC drives a canceling loudspeaker with y(n); an error microphone measures the residual e(n).]
ANC is active noise control, which includes an adaptive component. Main components are:
• an error microphone for each direction
• a reference microphone
• a canceling loudspeaker for each direction
y(n) is the loudspeaker signal that minimizes the e(n) signal.
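A highly simplified Matlab sketch of the feedforward concept, using a plain LMS filter and ignoring the secondary (loudspeaker-to-error-microphone) path that a practical FxLMS controller would have to model; all signals and gains are illustrative assumptions:

% Simplified feedforward ANC: adapt y(n) from the reference x(n) to minimize e(n)
N = 4000;
x = sin(2*pi*0.02*(1:N)') + 0.1*randn(N,1);   % reference microphone signal
Ppath = [0 0.8 0.4 0.2]';                     % primary acoustic path (simulation stand-in)
d = filter(Ppath, 1, x);                      % primary noise reaching the error microphone
M = 8; w = zeros(M,1); mu = 0.01;             % controller taps & step size
e = zeros(N,1);
for n = M:N
    xn = x(n:-1:n-M+1);                       % recent reference samples
    y  = w.'*xn;                              % anti-noise sent to the canceling loudspeaker
    e(n) = d(n) - y;                          % residual measured at the error microphone
    w = w + mu*e(n)*xn;                       % LMS update driven by e(n)
end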
Acoustic Concept 2
[Figure: active noise control without a reference microphone. Primary noise from the noise source reaches the error microphone e(n); the ANC drives the canceling loudspeaker with y(n).]
ANC is active noise control. The ANC includes an adaptive algorithm that learns the system in order to create an 'anti-noise' signal at the canceling loudspeaker. Components are:
• an error microphone for each direction
• a canceling loudspeaker for each direction
y(n) is the loudspeaker signal that minimizes the e(n) signal.