Neural Network Tutorial


Review of Adaptive Neural Networks & Some Applications

Steve Rogers srogers@isr.us 304.368.9300



Overview

• Adaptive neural networks have become more popular due to their ability to approximate a large array of dynamics. The ability to adapt is accomplished by means of a set of tuning rules.
• Adaptive neural networks are used for control & system identification/prediction, table look-ups, fault detection, and optimization due to their generalization ability. Tuning rules for adaptive neural networks have featured Lyapunov-based approaches in recent years; although these have some desirable qualities, they have led to complex tuning procedures. Tuning rules should be simple and provide for rapid, reliable convergence.
• Adaptive neural networks possess learning, adaptation, and classification capabilities.



Neural Networks Decision Points

Advantages
• Capable of learning complex nonlinear systems
• Code/algorithms are readily available
• Usable for either fixed or adaptive applications, including control, system identification/prediction, table look-ups, and fault detection
• May handle arbitrary inputs, unlike linear systems
• Can treat the system to be identified as a 'black box', i.e., does not require knowledge of first principles
• Can be used in conjunction with other conventional methods

Disadvantages
• Requires specialized knowledge related to the algorithm
• Difficult to validate, especially for adaptive systems, because the weights are not deterministic
• A solution may already be available from other conventional methods
• Convergence may be to a local minimum rather than the global solution




Neural Network Components

• Neurons: also known as nodes, neurons are the basic computing elements of a network
• Connections: define the relationship of each neuron to the rest of the network
• Weights: determine whether a neuron activates, and can also be used as the activation of a neuron
• Activation function: the function used to determine the output of a neuron
• Adaptive algorithm: controls the learning process of the network

A key point is that arbitrary measurements & derived measurements may be inputs.



Neural Network Structure

A neuron in the NN takes the weighted sum of its inputs and uses it as the input signal to the neuron's activation function, which then produces an output signal.

Common Activation Functions

Binary sigmoid:   f(x) = 1 / (1 + e^(−σ·x))
Bipolar sigmoid:  f(x) = 2 / (1 + e^(−σ·x)) − 1
Gaussian radial:  f(x) = e^(−x²)
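A minimal MATLAB sketch of the three activation functions (the slope sigma and the plotting range are illustrative assumptions):

% Common activation functions (sigma is an assumed slope parameter)
sigma = 1;
binary_sigmoid  = @(x) 1 ./ (1 + exp(-sigma.*x));     % output in (0,1)
bipolar_sigmoid = @(x) 2 ./ (1 + exp(-sigma.*x)) - 1; % output in (-1,1)
gauss_radial    = @(x) exp(-x.^2);                    % peaks at 1 for x = 0
x = linspace(-5, 5, 201);
plot(x, binary_sigmoid(x), x, bipolar_sigmoid(x), x, gauss_radial(x)), grid on
legend('binary sigmoid', 'bipolar sigmoid', 'Gaussian radial')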



Adaptive Radial Basis Function Neural Networks (RBFN)

• RBFNs are two-layer networks whose outputs are a linear combination of the hidden-layer functions
• Typical RBFN equations are:

f(ξ) = w0ᵀ + Σ (k = 1..h) wkᵀ φk(ξ)

φk(ξ) = exp( −(1/σk²) ‖ξ − µk‖² )

where ξ is the input vector of the network, h indicates the total number of hidden neurons, and µk and σk refer to the center and width of the kth hidden neuron. ‖·‖ is the Euclidean norm. The function f(·) is the output of the RBFN, which represents the network approximation to the actual output. The coefficient wk is the connection weight vector from the kth hidden neuron to the output neurons, and w0 is the bias term.
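A minimal MATLAB sketch of this output equation (the network sizes and all parameter values are illustrative assumptions):

% RBFN forward pass: f = w0 + sum_k wk*phi_k(xi)
h = 5; d = 2;               % assumed number of hidden neurons and input dimension
mu  = randn(d, h);          % centers, one column per hidden neuron
sig = ones(1, h);           % widths
w   = randn(1, h);          % connection weights (single output)
w0  = 0.1;                  % bias term
xi  = randn(d, 1);                             % input vector
phi = exp(-sum((xi - mu).^2, 1) ./ sig.^2);    % Gaussian hidden-layer outputs
f   = w0 + w*phi'                              % network output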



Typical RBFN Architecture



Adaptive Update Schemes

Any good identification scheme that utilizes an RBFN should satisfy two criteria:
1) the parameters of the RBFN are tuned properly to satisfy stability and performance needs
2) the parameter adaptive law is efficient enough to allow real-time operation

The RAN (resource allocating network) was developed to tune all of the RBF parameters and incorporates a growth feature; MRAN also includes a pruning feature. Other tuning rules adjust only the connection weight vector and leave the center and width vectors fixed. A Lyapunov-derived tuning rule is:

χ(n+1) = χ(n) + η Π(n) P e(n)

where χ is the vector of parameters to be tuned, including the connection weights, centers, and widths; η is the user-selected learning rate (a positive scalar); and Π(n) is the gradient of the function with respect to the parameter vector χ, evaluated at χ(n).

P is the solution of the Lyapunov-derived equation Q = −(PA + AᵀP). Q is a user-selected positive definite matrix; A is a user-selected Hurwitz stable matrix (all eigenvalues in the open left half-plane). e(n) is the error vector.
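A minimal MATLAB sketch of one tuning step (the dimensions, A, Q, gradient, and error values are illustrative assumptions; lyap solves the Lyapunov equation):

% Lyapunov-derived parameter update: chi(n+1) = chi(n) + eta*Pi(n)*P*e(n)
n_e = 2; n_chi = 4;           % assumed error and parameter dimensions
A = -eye(n_e);                % user-selected Hurwitz stable matrix
Q = eye(n_e);                 % user-selected positive definite matrix
P = lyap(A', Q);              % solves A'*P + P*A + Q = 0, i.e. Q = -(P*A + A'*P)
eta  = 0.05;                  % learning rate
chi  = zeros(n_chi, 1);       % parameters: weights, centers, widths
Pi_n = randn(n_chi, n_e);     % gradient of network output w.r.t. chi (placeholder)
e_n  = randn(n_e, 1);         % current error vector
chi  = chi + eta*Pi_n*P*e_n;  % one tuning step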



Update Schemes

• Another common approach for tuning rules is the gradient update with leakage:

w = w + µw e φ − kw µw e w
µ = µ + µµ e w φ · 2(ξ − µ)/σ² − kµ µµ e µ
σ² = σ² + µσ e w φ · ‖ξ − µ‖²/σ⁴ − kσ µσ e σ²

φ(ξ) = exp( −‖ξ − µ‖²/σ² ),   eᵢ = yᵢ − ŷᵢ(ξᵢ)

• The third (leakage) term moves the discrete pole away from the unit circle, i.e., away from being a pure integrator. Although this may slow down convergence, it improves stability and should remove oscillations.
• Note that all parameters are tuned in the above gradient approach.
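A minimal MATLAB sketch of one such step for a single Gaussian unit (learning rates, leakage gains, and data are illustrative assumptions; the center and width gradients follow the reconstructed equations above):

% One gradient-with-leakage step for a single Gaussian hidden unit
mu_w = 0.05; mu_mu = 0.01; mu_s = 0.01;  % learning rates
k_w = 1e-3; k_mu = 1e-3; k_s = 1e-3;     % leakage gains
xi = randn(2,1);                         % input vector
w = 0.5; mu = zeros(2,1); s2 = 1;        % weight, center, width squared
y = 1.0;                                 % measured output (placeholder)
phi = exp(-norm(xi - mu)^2/s2);          % Gaussian output
e   = y - w*phi;                         % estimation error
w_new  = w  + mu_w*e*phi                        - k_w*mu_w*e*w;
mu_new = mu + mu_mu*e*w*phi*2*(xi - mu)/s2      - k_mu*mu_mu*e*mu;
s2_new = s2 + mu_s*e*w*phi*norm(xi - mu)^2/s2^2 - k_s*mu_s*e*s2;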



Radial Basis Function Block Diagrams



Control Circuit for LC Update

• The bottom part of the figure shows how a control structure may be inserted into the linear combiner (LC) update. The simplest control structure is the standard learning rate µ. A proportional-integral (PI) structure is the next simplest controller; it has the form

Kp (s + a) / s

which gives another integrator plus a zero. Note also that Kp may be combined with µ, i.e., Kp = µ.
• Any control structure may be used, including lead-lag Kl (s + a)/(s + b), PID, servo-type PID, etc.

[Figure: RBF with controller update mechanism. The error e = y − yhat drives a PI block Kp(s + a)/s; the PI output multiplies the regressor ϕ (the sigmoid/RBF outputs) to form χdot, which is integrated (1/s) to produce the parameter vector χ.]
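A minimal discrete-time MATLAB sketch of replacing the scalar learning rate with a PI block in the weight update (gains, sample time, and signals are illustrative assumptions; the PI term Kp(s + a)/s is discretized with a simple Euler integrator):

% PI-enhanced linear-combiner update: chidot = PI(e)*phi, chi = (1/s)*chidot
Kp = 0.1; a = 2.0; Ts = 0.01;     % assumed PI gains and sample time
chi = zeros(3,1);                 % linear-combiner weights
eInt = 0;                         % PI integrator state
for n = 1:500
    phi  = randn(3,1);            % regressor (hidden-layer outputs), placeholder
    y    = 1.0;                   % measured output, placeholder
    e    = y - chi'*phi;          % tracking error
    eInt = eInt + Ts*e;           % integral of e
    u    = Kp*(e + a*eInt);       % Kp*(s + a)/s acting on e
    chi  = chi + Ts*u*phi;        % chidot = u*phi, Euler-integrated
end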



Optimization Gain Results

[Figure: optimized-gain results for update-loop controller structures K1 (s + a)/(s + b), K2 (s + a1)/(s + b1), and Kp (s + Ki)/s.]


Data Plots

[Figure: data plots comparing the update-loop controllers Kp (s + a)/(s + b) and Kp (s + Ki)/s.]


Use of Neural Networks for Control Enhancements of Existing Systems

[Figure: an NN add-on sits alongside the existing controller of a gas turbine, tapping the set points and measurements already in the data stream.]

A neural network may be added to an existing system & make use of the current data stream to enhance it. This makes the add-on non-intrusive to the existing system while taking advantage of the existing control system capabilities. The NN add-on can focus on any deficiencies of the existing system. Most current research NN prototypes are handled in this fashion.


NN Control of Systems with Jumps: friction, deadzones, backlash, & hysteresis

• Add-on to an existing continuous controller
• Modify the usual activation function by adding a jump function (a sketch follows the equations below)

Common (continuous) activation functions:

Binary sigmoid:   f(x) = 1 / (1 + e^(−σ·x))
Bipolar sigmoid:  f(x) = 2 / (1 + e^(−σ·x)) − 1
Gaussian radial:  f(x) = e^(−x²)

Corresponding jump functions:

g(x) = 0 for x < 0;   1 / (1 + e^(−σ·x)) for x ≥ 0
g(x) = 0 for x < 0;   2 / (1 + e^(−σ·x)) − 1 for x ≥ 0
g(x) = 0 for x < 0;   e^(−x²) for x ≥ 0
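A minimal MATLAB sketch of a jump-augmented activation (the output weights and slope are illustrative assumptions):

% Augmented activation: continuous part f(x) plus weighted jump part g(x)
sigma = 1;
f = @(x) 1 ./ (1 + exp(-sigma.*x));               % binary sigmoid (continuous)
g = @(x) (x >= 0) .* (1 ./ (1 + exp(-sigma.*x))); % jump function, 0 for x < 0
wf = 1.0; wg = 0.5;                               % assumed output weights
act = @(x) wf.*f(x) + wg.*g(x);                   % jump-augmented activation
x = linspace(-3, 3, 201);
plot(x, act(x)), grid on, title('jump-augmented activation')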



System Identification

[Figure: the input x[n] drives both the unknown system, producing d[n], and the adaptive component, producing y[n]; the error e[n] = d[n] − y[n] drives the adaptation.]

The adaptive component successfully models the system when e[n] converges to a small value. If the model coefficients then change drastically, an anomaly may be declared.
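A minimal MATLAB sketch of this arrangement with an LMS-adapted FIR filter standing in for the adaptive component (the unknown system, filter length, and step size are illustrative assumptions):

% System identification with an LMS-adapted FIR model
b_true = [0.5 -0.3 0.2];        % unknown system (placeholder FIR)
M = 3; w = zeros(M,1);          % adaptive filter length and coefficients
mu = 0.05;                      % LMS step size
N = 2000; x = randn(N,1);       % input x[n]
d = filter(b_true, 1, x);       % unknown-system output d[n]
xbuf = zeros(M,1);
for n = 1:N
    xbuf = [x(n); xbuf(1:M-1)]; % recent-input regressor
    e = d(n) - w'*xbuf;         % e[n] = d[n] - y[n]
    w = w + mu*e*xbuf;          % LMS update
end
% after convergence w is near b_true'; a later drastic change in w flags an anomaly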


System ID with Adaptive Neural Networks

• Adaptive components are usually used in conjunction with conventional components because of stability concerns; they 'pick up the slop' remaining from the conventional component.
• Multi-layer perceptrons (MLP) may be used for system identification or prediction. There are numerous structure and update-law options.
• Sigma-Pi structure: the Ci are input vectors, β is a Kronecker product, W is a set of weights, Ue is an error function (in this case a PI control output), and G and B are defined by the application.
• Single-hidden-layer structure: W and V are weights to be updated adaptively, and the other parameters are defined by the application system. The MLP used here is explained in the following sheets.

Sigma-Pi equations:

output = Wᵀ β(C1, C2, C3)
dW/dt = −G ( Ue β + λ‖Ue‖ W )
Ue = PI control output formed from the error e (gains Kp and Ki)

Single-hidden-layer equations:

output_k = bw θwk + Σ (j = 1..n2) wjk σj(zj),   k = 1, …, n3
σj(zj) = σ( bv θvj + Σ (i = 1..n1) vij xi )

dŴ/dt = −{ (σ̂ − σ̂z V̂ᵀ x̄) ςᵀ + λ‖ς‖ Ŵ } Γw
dV̂/dt = −{ x̄ ςᵀ Ŵᵀ σ̂z + λ‖ς‖ V̂ } Γv
Γw > 0, Γv > 0, λ > 0



System ID Example with MLP

[Figure: MLP (multi-layer perceptron) with inputs p1 and p2, hidden weights vij and biases θi feeding three hidden nodes n11, n12, n13, and output weights w1, w2, w3 with bias λ feeding the output node a.]

MLP equations:

c11 = f(n11) = f(v11 p1 + v12 p2 + θ1)
c12 = f(n12) = f(v21 p1 + v22 p2 + θ2)
c13 = f(n13) = f(v31 p1 + v32 p2 + θ3)
a = f(n21) = f(w1 c11 + w2 c12 + w3 c13 + λ)

MLP general equations:

aJ = fJ( Σ (i = 1..S) fi( Σ (n = 1..R) vin pn + θi ) wi + λJ )
a(J×1) = f(J×1)( W(J×S) f(S×1)( V(S×R) p(R×1) + θ(S×1) ) + λ(J×1) )

MLP general update laws (α is a scalar learning rate; these equations completely define the MLP, and the Matlab implementation is shown in the following sheet):

wi(k+1) = wi(k) − α ∂F̂(k)/∂wi(k)
θi(k+1) = θi(k) − α ∂F̂(k)/∂θi(k)
λJ(k+1) = λJ(k) − α ∂F̂(k)/∂λJ(k)
vi,n(k+1) = vi,n(k) − α ∂F̂(k)/∂vi,n(k)

F̂(k) = error(k)² = ( t(k) − a(k) )²



Matlab Code

% mlp_example.m -- driver: build a nonlinear signal and track it with the MLP
clear
N = 500; cycles = 4;
x = sin(cycles*2*pi*[0:N-1]/N);
lb = -0.7; ub = 0.6; gain = 2;
init = 1;
for i = 1:N
    if x(i) > ub, y(i) = gain*x(i)^5;
    elseif x(i) < lb, y(i) = gain*x(i)^5;
    else y(i) = sign(x(i))*x(i)^2;
    end
    yhat(i) = MLP_recurArray([init, x(i), y(i)]);
    init = 0;
end
figure(1)
subplot(211)
err = y(:) - yhat(:);
errnorm = norm(err);
plot([x(:), y(:), yhat(:)]), grid on
title(['MLP estimation of sinusoid, error = ', num2str(errnorm)])
subplot(212)
plot(err), grid on
ylabel('error')

function yout = MLP_recurArray(in)
% MLP backpropagation learning for a single hidden layer.
% W holds the output-layer weights; V the hidden-layer weights.
% With N interior nodes the network is:  out = W*tanh(V*in)
% and the two gradient updates (momentum added below) are:
%   W = W + mu*err*tanh(V*in)'
%   V = V + mu*err*(sech(V*in).^2 .* W')*in'
% N is the number of interior nodes; m is the number of input taps
% (including the bias signal) and my the number of output taps.
persistent X
N = 10; m = 5; my = 5;
init = in(1); u = in(2); y = in(3);
% Initialize W & V; the input vector is laid out as
% [bias; u history (m-1 taps); y history (my taps); recurrent G (N taps)]
if init == 1 || isempty(X)
    X.W = zeros(1,N);
    X.dW = X.W;
    X.V = rand(N, m+my+N)/10000;
    X.dV = zeros(size(X.V));
    X.in = [1; u*ones(m-1,1); y*ones(my,1); zeros(N,1)];
    X.predslow = y;
end

mu = .09; bet = .1;                      % learning rate and momentum gain
G = tanh(X.V*X.in);                      % hidden-layer outputs
out = X.W*G;                             % network output
err = y - out;
nextW = X.W + mu*err*G' + bet*X.dW;      % output-layer update with momentum
sec2h = sech(X.V*X.in);
sec2h = sec2h.*sec2h;                    % derivative of tanh
nextV = X.V + mu*err*(sec2h.*X.W')*X.in' + bet*X.dV;  % hidden-layer update
% shift the tapped-delay input: new u and y samples push out the oldest taps
X.in = [1; u; X.in(2:m-1); y; X.in(m+1:m+my-1); G];
X.dW = nextW - X.W;
X.dV = nextV - X.V;
X.W = nextW;
X.V = nextV;
yout = out;


MLP function code



Results

[Figure, top: "MLP estimation of sinusoid, error = 10.3689"; traces x, y, and yhat over samples 0-500.]

[Figure, bottom: errornorm and wtnorm over samples 0-500. Fluctuation of the weights indicates that a better model structure is needed.]

x is the input sinusoid, y is the output signal (a nonlinear combination of sinusoids), & yhat is the MLP tracking signal. The bottom plot shows the stability & error performance.



Predictive Filters

[Figure: the signal passes through a delay Z⁻ᴺ (Z is the discrete delay operator, N the number of delays) into an adaptive filter whose output is the signal estimate; the error between the signal and its estimate tunes the filter, and an adaptive filter copy runs on the undelayed signal to produce the signal prediction.]

The block entitled 'adaptive filter' may be replaced by a filter of arbitrary structure, and the adaptive filter copy is updated each time step. The same concept can be applied to an adaptive neural network. Note that many adaptive components (unless otherwise guaranteed stable) are used in conjunction with conventional components to ensure the stability of the adaptive component. A minimal sketch follows.
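A minimal MATLAB sketch of the predictor, using an LMS FIR filter as the adaptive block (filter length, delay, signal, and step size are illustrative assumptions):

% N-step-ahead predictor: train on the delayed signal, predict with a copy
Nd = 5;               % number of delays (prediction horizon)
M  = 8;               % adaptive FIR length
mu = 0.01;            % LMS step size
L  = 2000;
s  = sin(2*pi*0.01*(1:L)) + 0.05*randn(1,L);  % example signal
w  = zeros(M,1);
pred = zeros(1,L);
for k = M+Nd : L
    xold = s(k-Nd : -1 : k-Nd-M+1)';  % delayed regressor (through Z^-N)
    e = s(k) - w'*xold;               % error between signal and estimate
    w = w + mu*e*xold;                % adapt the filter
    xnew = s(k : -1 : k-M+1)';        % undelayed regressor
    pred(k) = w'*xnew;                % adaptive filter copy: N-step prediction
end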



Fault Detection Concepts

• Actuator nonlinearities: deadband, backlash, & hysteresis. Conventional and adaptive neural networks estimate the jump discontinuities.
• Instrument faults: excessive noise, dead sensor, drift, and bias. Simple statistics handle the first two; system ID handles the last two.
• Parameter estimation for process fault detection: changes in coefficients may be used for fault detection.
• Hopfield neural networks may be used for principal component analysis (PCA), which is used in data-driven fault detection.



Continuous Instrumentation Diagnostics for Accuracy/Precision & Life-Cycle Maintenance

• Sensor fault monitoring is data validation, or cross-checking of sensor data. There are four types of anomalies in typical analog sensors: dead sensor, excessive noise, drift, & offset.

A dead sensor or excessive noise can be detected & isolated using the standard deviation of the individual sensor data stream, compared to the statistics of similar sensors throughout the plant.

Drift or offset may also be caused by something in the process being measured; therefore, detection/isolation must be model based.

Drift/offset fault detection model equations can be based on performance criteria, heat/mass balance equations, or other model structures. Fault detection parameters are derived from the equations; any change indicates an anomaly which can then be investigated. Kalman filters are frequently used to estimate the fault parameters in stochastic systems, although other nonlinear estimators, including neural networks, may be used as well. Typical equations and fault indicators derived from an electric pump system & heat exchanger follow.



Instrument Fault Types: Excessive Noise & Dead Sensor

[Figure: sensor value vs. time (seconds); the bottom trace shows an excessive-noise fault, the top trace a dead-sensor fault.]


Technical Approach for Dead/Noisy Sensors: Sensor Fault Detection Filter Banks

[Figure: each raw signal x enters a filter bank; a low-pass filter produces the smoothed signal y, and the residual x − y passes through Abs( ) and a second low-pass filter to give the indicator s. A typical distribution of s across a group of sensors shows possible dead sensors at very low s and possible noise failures at high s.]

This algorithm processes the raw engineering-converted data from each sensor; s is the output signal sent to the decision logic. Sensors are grouped by type, service, criticality, etc., as appropriate. The fault decision compares s against a dead-sensor threshold Sds and a noise-failure threshold Snf. Note that the low-pass filter blocks may be of arbitrary structure (fixed or adaptive neural networks, or linear filters), and that Sds & Snf will be refined by operational experience. A minimal sketch follows.
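A minimal MATLAB sketch of the filter bank for one sensor (the filter constants and the thresholds Sds/Snf are illustrative assumptions):

% Sensor fault indicator: s = LPF(|x - LPF(x)|)
a1 = 0.95; a2 = 0.99;          % low-pass pole locations (assumed)
Sds = 0.01; Snf = 0.5;         % dead-sensor / noise-failure thresholds (assumed)
x = 1 + 0.1*randn(1,1000);     % raw sensor stream (placeholder)
y = filter(1-a1, [1 -a1], x);  % low-pass smoothed signal
r = abs(x - y);                % absolute residual
s = filter(1-a2, [1 -a2], r);  % smoothed residual -> fault indicator
if s(end) < Sds
    disp('possible dead sensor')
elseif s(end) > Snf
    disp('possible noise failure')
end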



Instrument Fault Types: Drift & Offset

[Figure: sensor value vs. time (seconds); the bottom trace shows an offset fault, the top trace a drift fault.]


Proposed Observer Solution for Drift/Offset Sensor Faults

• The observers process the raw engineering data yi (output measurements) and u (input measurements) from each sensor.
• An estimated value of all the output measurements is sent to a set of rules for decision making.
• If a residual is greater than a threshold, a fault is indicated.
• This is the basis for an approach using neural networks. Note that the observer blocks may have arbitrary structures. Each observer is made unique by varying its input vector; therefore, the differences between the observers become fault indicators. A minimal sketch of the decision logic follows the diagram.

[Figure: the input u drives the process, whose q sensors produce y1 … yq. A bank of observers, Observer 1 … Observer q, each estimates all of the outputs (yij is observer j's estimate of yi); each observer's residuals feed a logic block that issues a decision. Possible states of sensor health: nominal, suspect, failed.]
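A minimal MATLAB sketch of the residual-threshold decision logic (the measurements, estimates, and threshold are illustrative assumptions; the observers themselves could be Kalman filters or neural networks):

% Decision logic for an observer bank: residual above threshold -> fault
q = 3;                        % number of sensors
y    = [1.02; 0.98; 1.50];    % measured outputs (placeholder)
yhat = [1.00; 1.00; 1.00];    % observer estimates (placeholder)
thr  = 0.2;                   % residual threshold (assumed)
res  = abs(y - yhat);
health = repmat("nominal", q, 1);
health(res > 0.5*thr) = "suspect";   % borderline residual
health(res > thr) = "failed";        % clear violation
disp(table((1:q)', res, health, 'VariableNames', {'sensor','residual','health'}))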



Pump 1 Fluid Schematic with Sensors & Formulas

[Figure: pump fluid schematic from inlet through filter (dpf), gas trap (dpg), pump (dpp), check valve, and accumulator to outlet. Instrumentation: absolute pressure sensor LATI02SR0101P (psia); differential pressure sensors LATI02SR0201P, LATI02SR0301P, LATI02SR0401P (dpg, dpf, dpp); quantity sensor LATI02SR0501Q; flowmeter fm (LATI02FM0001R, LATI02FM0002R); temperature sensor LATI02SR0001T (T); and motor signals Amps = LATI21FC0001C/10, volts = LATI21FC0001V, krpm = LATI21FC0003U/(255*20000).]

Pump indicators:
1) Zf = dpf/pph^2 (filter resistance)
2) Zg = dpg/pph^2 (gas trap resistance)
3) impeller specific speed = rpm*pph^0.5/dpp^0.75
4) suction specific speed = rpm*pph^0.5/psia^0.75
5) pump efficiency = hydraulic watts/electric watts, where
   electric watts1 = amps*volts
   electric watts2 = amps*4.3825*krpm
   hydraulic watts = pph*psid/(60*8.34*2.298)
6) a1 = dpp − function(impeller specific speed)*pph; a1 should be close to zero except in a fault condition
7) vc = Amps/krpm (pump ratio)
8) load = pph*dpp/(krpm*krpm) (pump load ratio)

The left-hand sides of the above 8 equations are indicator parameters.

Pump dynamic equations used for estimation:
1) Ampsdot = −(R2/L2)*Amps − (psi/L2)*krpm
2) krpmdot = (psi/J)*Amps − (hth/J)*krpm
3) dppdot = hnn*pph^2 + hww*krpm^2
4) pphdot = −(hrr/ab)*pph^2 + dpp/ab
where R2, L2, psi, J, hth, hnn, hww, hrr, and ab are indicator parameters which can be determined.
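A minimal MATLAB sketch computing the indicator parameters from one record of sensor data (the numeric values are placeholders, and the function of impeller specific speed in indicator 6 is an assumed lookup):

% Pump indicator parameters from one data record (placeholder values)
dpf = 2.0; dpg = 1.0; dpp = 30; psia = 14.7; psid = 30;
pph = 500; krpm = 12; amps = 3.2; volts = 120;

Zf  = dpf/pph^2;                 % filter resistance
Zg  = dpg/pph^2;                 % gas trap resistance
iss = krpm*pph^0.5/dpp^0.75;     % impeller specific speed
sss = krpm*pph^0.5/psia^0.75;    % suction specific speed
ew1 = amps*volts;                % electric watts (measured volts)
ew2 = amps*4.3825*krpm;          % electric watts (back-EMF form)
hw  = pph*psid/(60*8.34*2.298);  % hydraulic watts
pe  = hw/ew1;                    % pump efficiency
f_iss = @(s) 0.05*s;             % assumed lookup: function(impeller spec. speed)
a1  = dpp - f_iss(iss)*pph;      % near zero unless faulted
vc  = amps/krpm;                 % pump ratio
load_ratio = pph*dpp/krpm^2;     % pump load ratio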


ISS MTL & LTL PPA Equations Table

Pump indicator equations:

Equation                        Algorithm                    PPA area of fault detection   Sensors
1) Zf                           adaptive or low-pass filter  filter performance            dpf, pph
2) Zg                           adaptive or low-pass filter  gas trap performance          dpg, pph
3) impeller spec. speed (iss)   adaptive or low-pass filter  pump performance              krpm, pph, dpp
4) suction spec. speed (sss)    adaptive or low-pass filter  pump performance              krpm, pph, psia
5) pump efficiency (pe)         adaptive or low-pass filter  pump performance              Amps, volts, pph, dpp
6) a1                           adaptive or low-pass filter  pump performance              dpp, krpm, pph
7) vc                           adaptive or low-pass filter  pump motor performance        amps, krpm
8) load                         adaptive or low-pass filter  pump motor performance        pph, dpp, krpm

Pump dynamic indicator equations:

Equation   Parameters    Algorithm        Area of fault detection   Sensors
1)         R2, L2, psi   adaptive filter  motor performance         amps, krpm
2)         psi, J, hth   adaptive filter  motor performance         amps, krpm
3)         hnn, hww      adaptive filter  pump performance          pph, krpm
4)         hrr, ab       adaptive filter  pump performance          pph, dpp

Note that the algorithms may be adaptive neural networks as well as linear adaptive filters.

Sensor fault matrix (PPA equations): [Table: marks which of the sensors dpf, pph, dpg, krpm, dpp, psia, Amps, volts, and T participate in each indicator parameter (Zf, Zg, iss, sss, pe, a1, vc, load, R2, L2, psi, J, hth, hnn, hww, hrr, ab); each indicator flags the sensors that appear in its defining equation.]


On-Line Estimation of Deadband, Backlash, & Hysteresis in Control Elements

[Deadband schematic: the command v passes through a deadband (slopes mr and ml, breakpoints br and bl) to produce u, which drives the control element & plant to give the output y.]

Deadband equations:

u(t) = mr(v(t) − br)   if v(t) ≥ br
u(t) = 0               if bl < v(t) < br
u(t) = ml(v(t) − bl)   if v(t) ≤ bl

[Backlash schematic: the command v passes through a backlash (slope m, crossing points cl and cr) to produce u, which drives the control element & plant to give the output y.]

Backlash equations:

u(t) = m(v(t) − cl)   if v(t) ≤ cl
u(t) = m(v(t) − cr)   if v(t) ≥ cr
u(t) = u(t − 1)       if cl < v(t) < cr
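A minimal MATLAB sketch implementing the two nonlinearities exactly as stated above (parameter values are illustrative; such a function can generate the u data consumed by the Kalman estimation code later in this section):

% Deadband: piecewise linear with a dead zone between bl and br
mr = 1.2; ml = 0.9; br = 0.3; bl = -0.2;   % assumed parameters
db = @(v) mr*(v - br).*(v >= br) + ml*(v - bl).*(v <= bl);

% Backlash: slope m with memory between the crossings cl and cr
m = 1.0; cl = -0.25; cr = 0.25;
v = sin(2*pi*(0:499)/250);                 % test command
u = zeros(size(v));
for t = 2:numel(v)
    if v(t) <= cl,     u(t) = m*(v(t) - cl);
    elseif v(t) >= cr, u(t) = m*(v(t) - cr);
    else               u(t) = u(t-1);      % hold previous output
    end
end
plot(v, u), grid on, title('backlash input-output trajectory')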

The hysteresis schematic is more complicated than deadband or backlash & is not shown here. The general approach for parameter estimation is shown below: the command v and the output u of the nonlinearity (ahead of the control element & plant producing y) feed a parameter estimator (Kalman filter) that estimates mr, ml, m, br, bl, cl, cr, etc. The types of nonlinearities are usually known by inspection.

The estimated deadband parameters may be used for 2 purposes:
• on-line control loop audits
• plant control



Deadband Model Parameter Estimation



Backlash Model Parameter Estimation



Matlab code

% matlab deadband code: estimate [mr, mr*br] and [ml, ml*bl] from (v,u) data
if udb(i) > 0
    % active above br: udb = mr*v - mr*br, regressor [v -1 0 0]
    [p1, Pdb, err(i)] = KalmanF(p1, Pdb, udb(i), [v(i) -1 0 0]');
end
if udb(i) < 0
    % active below bl: udb = ml*v - ml*bl, regressor [0 0 v -1]
    [p1, Pdb, err(i)] = KalmanF(p1, Pdb, udb(i), [0 0 v(i) -1]');
end

function [param, P, err] = KalmanF(param, P, y, x)
% Scalar-measurement Kalman/RLS parameter update with process noise Q
niter = 10;
Q = 0.05*eye(size(P));
for i = 1:niter
    err = y - x'*param;              % innovation
    k = P*x/(1 + x'*P*x);            % Kalman gain (unit measurement noise)
    P = (eye(size(P)) - k*x')*P + Q; % covariance update plus process noise
    param = param + k*err;           % parameter update
end



Examples of Applications for Active Control of Noise and Vibration

• Control of aircraft interior noise by use of lightweight vibration sources on the fuselage and acoustic sources inside the fuselage.
• Reduction of helicopter cabin noise by active vibration isolation of the rotor and gearbox from the cabin.
• Reduction of noise radiated by ships and submarines by active vibration isolation of interior mounted machinery (using active elements in parallel with passive elements) and active reduction of vibratory power transmission along the hull, using vibration actuators on the hull.
• Reduction of internal combustion engine exhaust noise by use of acoustic control sources at the exhaust outlet or by use of high-intensity acoustic sources mounted on the exhaust pipe and radiating into the pipe at some distance from the exhaust outlet.
• Reduction of low frequency noise radiated by industrial noise sources such as vacuum pumps, forced air blowers, cooling towers and gas turbine exhausts, by use of acoustic control sources.
• Lightweight machinery enclosures with active control for low frequency noise reduction.
• Control of tonal noise radiated by turbo-machinery (including aircraft engines).
• Reduction of low frequency noise propagating in air conditioning systems by use of acoustic sources radiating into the duct airway.
• Reduction of electrical transformer noise, either by using a secondary perforated lightweight skin surrounding the transformer and driven by vibration sources, or by attaching vibration sources directly to the transformer tank. Use of acoustic control sources for this purpose is also being investigated, but a large number of sources is required to obtain global control.
• Reduction of noise inside automobiles using acoustic sources inside the cabin and lightweight vibration actuators on the body panels.
• Active headsets and earmuffs.



Acoustic Concept 1

[Figure: a noise source emits the primary noise down a duct; a reference microphone picks up x(n), a canceling loudspeaker injects y(n), and an error microphone measures e(n), all connected to the ANC block.]

ANC is active noise control, which includes an adaptive component. The main components are:
• an error microphone for each direction
• a reference microphone
• a canceling loudspeaker for each direction

y(n) is the loudspeaker signal that minimizes the e(n) signal. A minimal sketch follows.
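A minimal MATLAB sketch of the adaptive component as a plain LMS canceller (the acoustic path and step size are illustrative assumptions; a practical ANC would use filtered-x LMS with a model of the speaker-to-error-microphone path):

% Feedforward ANC sketch: adapt W so y(n) cancels the primary noise at e(n)
P = [0 0 0.9 0.4];          % primary path, reference mic -> error mic (assumed)
M = 8; W = zeros(M,1);      % adaptive FIR controller
mu = 0.01; N = 4000;
x = randn(1,N);             % reference microphone signal x(n)
d = filter(P, 1, x);        % primary noise at the error microphone
xbuf = zeros(M,1);
for n = 1:N
    xbuf = [x(n); xbuf(1:M-1)];
    y = W'*xbuf;            % canceling loudspeaker signal y(n)
    e = d(n) - y;           % error microphone: residual noise e(n)
    W = W + mu*e*xbuf;      % LMS update driven by e(n)
end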



Acoustic Concept 2

[Figure: as in Concept 1 but without the reference microphone; the canceling loudspeaker y(n) and error microphone e(n) connect directly to the ANC block.]

ANC is active noise control. Here the ANC includes an adaptive algorithm that learns the system in order to create 'anti-noise' at the canceling loudspeaker. The components are:
• an error microphone for each direction
• a canceling loudspeaker for each direction

y(n) is the loudspeaker signal that minimizes the e(n) signal.


