
Prediction of Electrocoagulation Removal of Trivalent Chromium Using Neural Networks with a Bayesian Regularization Technique

Mohamed NOHAIR (a), Hassan CHAAIR (b), Khalid DIGUA (b), Mohssine ELMORRAKCHI (a), M. AZZI (c)

(a) Laboratoire Catalyse, Chimiométrie & Environnement, Faculté des Sciences et Techniques de Mohammedia, Morocco
(b) Laboratoire de Génie des Procédés, Faculté des Sciences et Techniques de Mohammedia, Morocco
(c) Laboratoire d'Interface, Matériaux et Environnement, Faculté des Sciences Ain Chok, Casablanca, Morocco

nohairmohamed@yahoo.fr

Abstract

Physical processes influencing the ability of the electrocoagulation process to remove Cr(III) from aqueous solutions are highly complex and uncertain, and it is difficult to elaborate a deterministic model because many factors influence the process, such as pH, potential, time, and temperature. Accurate modelling of chromium removal from water is important, as it has implications for water quality and the lives that depend on it. Here a model based on neural network modelling is formulated to develop a quantitative relationship between Cr(III) removal by the electrocoagulation technique and the influencing variables. The results show that the elaborated model is robust and reliable and gives satisfactory results, allowing the electrocoagulation removal of Cr(III) to be predicted with high success. The statistical method used for deriving the model was a classical three-layer feed-forward neural network trained by the back-propagation method and the Levenberg-Marquardt algorithm implemented in MATLAB's neural network toolbox. The predictive ability of the ANN model was tested by the leave-20%-out (L20%O) cross-validation method, demonstrating the superior quality of the neural model. The neural network possessed a 4:4:1 architecture with a sigmoid activation function. The model produced a cross-validation correlation coefficient r between calculated and observed values of about 0.99, while the cross-validation standard deviation s is equal to 1.7.

Keywords

Electrocoagulation; Cr(III) Removal; Neural Network Modelling; Levenberg-Marquardt Algorithm; Cross-Validation Method

Introduction

The relationships between variables in chemistry are almost always very complicated and highly nonlinear. One of the most appropriate methods to handle this seems to be Artificial Neural Networks (ANNs) [1, 2]; this method is very powerful in dealing with nonlinear relationships, and a large number of publications have underlined the interest of using ANNs instead of linear statistical models. Starting from input variables, ANNs have the capacity to predict the output variable, but the mechanisms that occur within the network are often ignored, which is why ANNs are considered black boxes. The principles of this approach are explained here through a case study dealing with the design of a robust model for simulating the removal of Cr(III) by the electrocoagulation method with an aluminium anode as electrode material. The influencing factors considered in this study are the same as those studied in a previous work [3], in which pH, electrolysis potential, electrolysis time, and temperature were related to Cr(III) removal by means of a fractional central composite design; the energy consumption and the aluminium remaining in solution were also considered in that study. A large body of work has underlined the interest of using electrocoagulation to remove Cr(III) from aqueous solutions [4-7], and it has been successfully used to treat a variety of industrial wastewaters. The goal of this method is to form flocs of metal hydroxides within the effluent to be cleaned by electrodissolution of soluble anodes. Three main processes occur during electrocoagulation: electrolytic reactions at the surface of the electrodes, formation of coagulants in the aqueous phase, and adsorption of soluble or colloidal pollutants on the coagulants followed by their removal by sedimentation or flotation.



Electrocoagulation has the potential to be the distinct economical and environmental choice for the treatment of wastewater for the following reasons:

• It requires simple equipment.
• It is easily operable.
• Its cost is low.
• No chemical addition is required.

More explanation about the process is given in many specialized papers [3]. There are many factors influencing chromium removal from aqueous solutions by the electrocoagulation technique, among them pH, applied potential, application time, and temperature. These operating conditions are associated with various aspects of the main reactions at the aluminium electrodes:

$$\text{Anode (I):}\quad \mathrm{Al \rightarrow Al^{3+} + 3e^-}$$

$$\text{Cathode (II):}\quad \mathrm{2H_2O + 2e^- \rightarrow H_2 + 2OH^-}$$

The Al3+(aq) and OH- ions generated by reactions (I) and (II) react to form various monomeric and polymeric species which finally transform into Al(OH)3 according to complex precipitation kinetics [8]. The present paper is divided into two main sections: the first describes the database used in our study and reviews the statistical model obtained from the fractional central composite design; the second presents the neural network model and illustrates its predictive ability and good generalization, even though the data set is small.

Dataset

The dataset for the removal of Cr(III) from aqueous solutions in the present investigation is a set of thirty-one experiments arranged according to a fractional central composite design. The complete list of responses and the levels attributed to each variable for each experiment are given in Tables 1 and 2.

TABLE 1 NATURAL AND CODED VARIABLES

Natural variable       | Coded variables x1, x2, x3 and x4
                       |   -2  |   -1  |    0  |    1  |    2
x1 = pH                |    2  |    3  |    4  |    5  |    6
x2 = potential (V)     |  2.5  |  4.5  |  6.5  |  8.5  | 10.5
x3 = time (min)        |   10  |   15  |   20  |   25  |   30
x4 = temperature (°C)  |   15  |   20  |   25  |   30  |   35

TABLE 2 THE EXPERIMENTAL DATA FOR CHROMIUM REMOVAL AND THEIR 95% CONFIDENCE INTERVALS

No. |  x1 |  x2 |  x3 |  x4 | %E experimental | Lower limit | Upper limit
  1 |  -1 |  -1 |  -1 |  -1 |      7.07       |    5.61     |    8.53
  2 |   1 |  -1 |  -1 |  -1 |     36.66       |   35.20     |   38.12
  3 |  -1 |   1 |  -1 |  -1 |     58.70       |   57.24     |   60.16
  4 |   1 |   1 |  -1 |  -1 |     84.56       |   83.10     |   86.02
  5 |  -1 |  -1 |   1 |  -1 |     50.10       |   48.64     |   51.56
  6 |   1 |  -1 |   1 |  -1 |     70.44       |   68.98     |   71.90
  7 |  -1 |   1 |   1 |  -1 |     75.61       |   74.15     |   77.07
  8 |   1 |   1 |   1 |  -1 |     90.35       |   88.89     |   91.81
  9 |  -1 |  -1 |  -1 |   1 |     13.61       |   12.15     |   15.07
 10 |   1 |  -1 |  -1 |   1 |     45.32       |   43.86     |   46.78
 11 |  -1 |   1 |  -1 |   1 |     72.28       |   70.82     |   73.74
 12 |   1 |   1 |  -1 |   1 |     93.32       |   91.86     |   94.78
 13 |  -1 |  -1 |   1 |   1 |     54.60       |   53.14     |   56.06
 14 |   1 |  -1 |   1 |   1 |     71.23       |   69.77     |   72.69
 15 |  -1 |   1 |   1 |   1 |     82.05       |   80.59     |   83.51
 16 |   1 |   1 |   1 |   1 |     92.83       |   91.37     |   94.29
 17 |  -2 |   0 |   0 |   0 |     33.63       |   32.17     |   35.09
 18 |   2 |   0 |   0 |   0 |     80.42       |   78.96     |   81.88
 19 |   0 |  -2 |   0 |   0 |     12.26       |   10.80     |   13.72
 20 |   0 |   2 |   0 |   0 |     88.28       |   86.82     |   89.74
 21 |   0 |   0 |  -2 |   0 |     53.34       |   51.88     |   54.80
 22 |   0 |   0 |   2 |   0 |     91.20       |   89.74     |   92.66
 23 |   0 |   0 |   0 |  -2 |     64.62       |   63.16     |   66.08
 24 |   0 |   0 |   0 |   2 |     74.22       |   72.76     |   75.68
 25 |   0 |   0 |   0 |   0 |     75.45       |   73.99     |   76.91
 26 |   0 |   0 |   0 |   0 |     73.48       |   72.02     |   74.94
 27 |   0 |   0 |   0 |   0 |     72.32       |   70.86     |   73.78
 28 |   0 |   0 |   0 |   0 |     73.27       |   71.81     |   74.73
 29 |   0 |   0 |   0 |   0 |     73.14       |   71.68     |   74.60
 30 |   0 |   0 |   0 |   0 |     75.82       |   74.36     |   77.28
 31 |   0 |   0 |   0 |   0 |     72.93       |   71.47     |   74.39

The chromium removal efficiency is computed from the Cr(III) remaining in the solution, which is determined by atomic absorption spectrophotometry. It is calculated as:

$$\%\,\mathrm{Removal\ efficiency} = \frac{C_i - C_f}{C_i} \times 100,$$

where $C_i$ and $C_f$ represent the initial and the final chromium concentrations, respectively.
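As a minimal illustration (a sketch in Python; the function names, the mg/L unit, and the `CENTER`/`STEP` arrays are ours, read off Table 1), the efficiency calculation and the natural-to-coded mapping used in the scaling discussed next can be written as:

```python
import numpy as np

def removal_efficiency(c_initial, c_final):
    """%E = (Ci - Cf) / Ci * 100, with Ci and Cf in the same units (e.g. mg/L)."""
    return (c_initial - c_final) / c_initial * 100.0

# Centers and steps taken from Table 1; coded level = (natural - center) / step.
CENTER = np.array([4.0, 6.5, 20.0, 25.0])   # pH, potential (V), time (min), temperature (degC)
STEP   = np.array([1.0, 2.0,  5.0,  5.0])

def to_coded(natural):
    """Map natural variables onto the coded [-2, 2] design levels of Table 1."""
    return (np.asarray(natural, dtype=float) - CENTER) / STEP

print(to_coded([2.0, 10.5, 30.0, 15.0]))   # -> [-2.  2.  2. -2.]
```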

Before analysis, it is often useful to scale the inputs so that they always fall within a specified range, here [-2; 2]; overestimating the effect of variables with large values is then avoided. The random error is estimated from experiments 25 to 31, which correspond to the center of the experimental domain; the error is assessed by the following relationship:

$$s^2 = \frac{1}{n-1}\sum_i (y_i - \bar{y})^2, \qquad s = 1.30,$$

where $y_i$ and $\bar{y}$ represent the observed values at the center points and their average value.

Responses are assumed to follow a normal distribution with the same variance. For a risk of 5%, the confidence interval can be calculated easily; for all proposed models it is equal to:

$$\pm t_{0.025}\,\frac{s}{\sqrt{n}} = \pm 1.46, \qquad t_{0.025} = 2.98 \ \mathrm{and} \ n = 7.$$
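These values can be checked numerically from the center-point runs of Table 2; the sketch below (our own, reusing the paper's stated t-value) yields s ≈ 1.33 and a half-width ≈ 1.49, close to the reported 1.30 and ±1.46:

```python
import numpy as np

# %E at the seven center points (experiments 25-31, Table 2).
center_runs = np.array([75.45, 73.48, 72.32, 73.27, 73.14, 75.82, 72.93])

s = center_runs.std(ddof=1)          # unbiased estimate: s^2 = sum((y - ybar)^2) / (n - 1)
half_width = 2.98 * s / np.sqrt(7)   # +/- t * s / sqrt(n), with the paper's t = 2.98

print(f"s = {s:.2f}, CI half-width = {half_width:.2f}")
```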

$$y_{\mathrm{calc}} = y_{\mathrm{obs}} + \Delta + \epsilon$$

Δ and ε correspond respectively to the deviation from the model and the random error. The regression model (a second-order polynomial) corresponding to the composite design can be written as the following quadratic equation:

$$\hat{y} = b_0 + \sum_i b_i x_i + \sum_{i<j} b_{ij} x_i x_j + \sum_i b_{ii} x_i^2 \qquad (\mathrm{eq.}\ 1)$$
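Eq. 1 can be fitted by ordinary least squares; a sketch follows (our naming and column ordering; one conventional layout of the design matrix, not necessarily the one used in [3]):

```python
import numpy as np
from itertools import combinations

def quadratic_design_matrix(X):
    """Columns: 1, x_i, x_i * x_j (i < j), x_i^2 -- the model of eq. 1."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(k), 2)]
    cols += [X[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)

# With X the (31, 4) coded settings and y the (31,) observed %E from Table 2:
# b, *_ = np.linalg.lstsq(quadratic_design_matrix(X), y, rcond=None)
```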

Based on the analysis done by means of the Student t-test, using the common confidence interval for all effects, the linear and quadratic effects of all variables are found highly significant, except the quadratic effect of time. Likewise, all interaction effects of the variables are found significant except the interaction between pH and temperature. To test whether the model offers good generalisation, we divided the dataset into two subsets: the first is used to train the model and the second, chosen by a random function, to test its performance. We computed both the correlation coefficient and the standard deviation between the observed and predicted values:

$$R^2 = 1 - \frac{\sum (y_{\mathrm{obs}} - y_{\mathrm{calc}})^2}{\sum (y_{\mathrm{obs}} - y_{\mathrm{mean}})^2}, \qquad s^2 = \frac{\sum (y_{\mathrm{obs}} - y_{\mathrm{calc}})^2}{n} \qquad (\mathrm{eq.}\ 2)$$
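Eq. 2 translates directly into code; a small helper (function names are ours) used in the comparisons below:

```python
import numpy as np

def r_squared(y_obs, y_calc):
    """R^2 = 1 - SS_res / SS_tot, as in eq. 2."""
    y_obs, y_calc = np.asarray(y_obs), np.asarray(y_calc)
    ss_res = np.sum((y_obs - y_calc) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rms_error(y_obs, y_calc):
    """s = sqrt(sum((y_obs - y_calc)^2) / n), as in eq. 2."""
    y_obs, y_calc = np.asarray(y_obs), np.asarray(y_calc)
    return np.sqrt(np.mean((y_obs - y_calc) ** 2))
```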

Both y and n represent the Cr(III) removal efficiency and the size of the data set. The regression equations are the following:

$$y_{\mathrm{calc}} = 0.774 + 0.898\,y_{\mathrm{obs}}, \qquad R = 0.99;\ s = 1.760 \ (\mathrm{training\ set} = 23\ \mathrm{data})$$

$$y_{\mathrm{calc}} = 0.581 + 1.000\,y_{\mathrm{obs}}, \qquad R = 0.99;\ s = 2.160 \ (\mathrm{test\ set} = 8\ \mathrm{data})$$

Neural Networks Analysis

Regression based on neural networks [9] is widely used for its ability to model complex nonlinear relationships without any prior assumptions about the nature of the relationships, and this represents its greatest advantage. The solution of a problem is not explicitly encoded in the network but is learned by supplying examples of previously solved problems to the network. After the network has learned to solve the example problems, it is said to be trained. New data from the same knowledge domain can then be input to the trained neural network, which then outputs a solution. From a practical point of view, an artificial neural network is simply a computer program that transforms an m-variable input into an n-variable output. Artificial neural networks (ANNs) appear to be very promising for obtaining models that relate structural features to different properties of a process. The computational neural network used in this study was a three-layer (input-hidden-output), fully connected, feed-forward network [10] (Figure 1). The input layer contains one node for each variable. The output layer has one node generating the scaled estimated value of the chromium removal. Although there are neither theoretical nor empirical rules to determine the number of hidden layers or the number of neurons in a layer, one layer seems to be sufficient in most applications of ANNs. Networks with biases, a sigmoid layer, and a linear output layer are capable of approximating any function with a finite number of discontinuities [9]. To do this, input vectors and the corresponding target vectors are used to train a network until it can approximate a function relating them in a training phase.
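For concreteness, a minimal forward pass through the 4:4:1 architecture of Figure 1 is sketched below (our own rendering with random initial weights, not the MATLAB toolbox implementation; `forward`, `W`, `bh`, `V`, `bo` are our names):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, W, bh, V, bo):
    """4:4:1 network: sigmoid hidden layer, linear output node (Figure 1).

    x : (4,) scaled inputs; W : (4, 4) input-to-hidden weights; bh : (4,) hidden biases;
    V : (4,) hidden-to-output weights; bo : scalar output bias.
    """
    h = sigmoid(W @ x + bh)   # hidden-layer activations
    return V @ h + bo         # scaled estimate of the %E removal

rng = np.random.default_rng(0)
W, bh = rng.uniform(-0.5, 0.5, (4, 4)), rng.uniform(-0.5, 0.5, 4)
V, bo = rng.uniform(-0.5, 0.5, 4), 0.0
print(forward(np.array([-1.0, 1.0, 1.0, -1.0]), W, bh, V, bo))
```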

FIG. 1 A THREE-LAYER FEED-FORWARD NEURAL NETWORK. W AND V ARE THE WEIGHT MATRICES CONNECTING THE INPUT VARIABLES TO THE OUTPUT THROUGH THE HIDDEN LAYER.

The training phase was realized using the standard back-propagation method [11], a generalization of the Widrow-Hoff learning rule [12] to multiple-layer networks and nonlinear differentiable transfer functions. Standard back-propagation is a gradient descent algorithm, in which the network weights are moved along the



negative of the gradient of the performance function J (equation (3)). The term back-propagation refers to the manner in which the gradient (equations (4) and (5)) is computed for nonlinear multilayer networks.

$$J(w) = \sum_{k=1}^{N} \left(y^k - g(x^k, w)\right)^2 = \sum_{k=1}^{N} J^k(w) \qquad (3)$$

where y is the observed target and g(x, w) the calculated target. The gradient of J is

$$\frac{\partial J^k}{\partial w_{ij}} = \frac{\partial J^k}{\partial v_i}\,\frac{\partial v_i}{\partial w_{ij}} = \delta_i^k\, x_j^k \qquad (4)$$

where

$$\delta_i^k = \frac{\partial J^k}{\partial v_i} = -2\left(y^k - g(x^k, w)\right)\frac{\partial g(x^k, w)}{\partial v_i} \qquad (5)$$

is the partial gradient related to neuron i. The same approach is applied to the hidden neurons; the parameters are modified following the learning rule of the back-propagation method (equation (6)):

$$w(i) = w(i-1) - \mu_i\,\nabla J(w(i-1)) \qquad (6)$$

μi and i represent respectively the learning step and the training iteration. The weights of the connections between the neurons were initially assigned uniformly random values. During training, the weights and biases of the network are iteratively adjusted to minimize the network performance function J. Training was followed by examining the RMS error (root mean square, that is, the square root of the average squared residual, eq. 2) for the total set, and was stopped when there was no further improvement in the test-set RMS error. We also computed both the correlation coefficient and the standard deviation between the observed and predicted values. The algorithm explained previously represents the simplest implementation of back-propagation learning. In several cases, this algorithm has been widely criticized for slow convergence, inefficiency, and lack of robustness, as the network can get stuck in a shallow local minimum. To avoid convergence problems and to reduce the training time, one can use algorithms based on steepest descent with momentum [10]. Momentum allows a network to respond not only to the local gradient but also to recent trends in the error surface; acting like a low-pass filter, it allows the network to ignore small features in the error surface. Momentum is added to back-propagation learning by making weight changes equal to the sum of a fraction of the last weight change and the new change suggested by the back-propagation rule. The performance of the algorithm is very sensitive to the proper setting of the learning rate, and the optimal learning rate changes during the training process as the algorithm moves across the performance surface. We use a steepest descent algorithm that allows the learning rate to change during the process, with momentum training. There are also algorithms based on the conjugate gradient, in which a search is performed along conjugate directions; this generally produces faster convergence than steepest descent. Among them, the Levenberg-Marquardt algorithm [13], a combination of the known gradient descent and Gauss-Newton methods, was designed to approach second-order training speed. Levenberg-Marquardt has been applied here because the algorithm is stable and efficient. We have used in this study the neural network package included in the MATLAB toolbox (MATLAB 7.0) [14]. Methods such as back-propagation neural nets still have some problems, principal among these being overtraining and overfitting. Overfitting results from the use of too many adjustable parameters in modelling the training data. It can be avoided by the use of a validation set and the early-stopping technique; the purpose is to choose a model capable of offering good generalization for prediction. In this technique the available data are divided into three subsets. The first subset is the training set, which is used for computing the gradient and updating the network weights and biases. The second subset is the validation set; the error on the validation set is monitored during the training process. The validation error normally decreases during the initial phase of training, as does the training-set error. However, when the network begins to overfit the data, the error on the validation set typically begins to rise. When the validation error increases for a specified number of iterations, the training is stopped, and the weights and biases at the minimum of the validation error are returned. An application randomly divides input vectors and target vectors into three sets as follows (a sketch of this protocol is given after the list):

• 60% are used for training.
• 20% are used to validate that the network is generalizing and to stop training before overfitting.
• The last 20% are used as a completely independent test of network generalization.
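The following sketch combines the momentum update and the early-stopping protocol just described (a simplified rendering of ours, not the MATLAB toolbox routine; `forward` and `grad` are assumed user-supplied callables for the network output and the gradient of J):

```python
import numpy as np

def train_with_momentum(X, y, forward, grad, w0, lr=0.05, beta=0.9,
                        patience=20, max_iter=5000, seed=0):
    """Gradient descent with momentum and early stopping on a validation set.

    forward(X, w) -> predictions; grad(X, y, w) -> dJ/dw for J = sum of squared errors.
    A sketch of the 60/20/20 protocol described above; all names are ours.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr, n_va = int(0.6 * len(X)), int(0.2 * len(X))
    tr, va = idx[:n_tr], idx[n_tr:n_tr + n_va]   # remaining 20% held out for testing
    w, velocity = w0.copy(), np.zeros_like(w0)
    best_w, best_err, wait = w.copy(), np.inf, 0
    for _ in range(max_iter):
        velocity = beta * velocity - lr * grad(X[tr], y[tr], w)
        w = w + velocity                          # momentum variant of the eq. (6) update
        val_err = np.mean((y[va] - forward(X[va], w)) ** 2)
        if val_err < best_err:
            best_w, best_err, wait = w.copy(), val_err, 0
        else:
            wait += 1
            if wait >= patience:                  # validation error keeps rising: stop early
                break
    return best_w
```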

An optimal network architecture is obtained by varying the number of hidden neurons. This was carried out iteratively: the size of the hidden layer was varied from 1 to 20 hidden units, and 10 networks were trained for each architecture. Plotting the error for the whole set against the number of hidden units allows the optimal architecture to be determined. Starting from one neuron in the hidden layer, the statistical indices of the correlation between experimental and predicted values of the Cr(III) removal for the whole data set improve as the number of neurons increases; the optimal number of neurons retained in the hidden layer was four. No improvement was observed when this number increased beyond four; the results are given in Table 3 (a sketch of this scan follows the table).

TABLE 3 STATISTICAL ANALYSIS FOR DIFFERENT ARCHITECTURES (NUMBER OF NEURONS IN THE HIDDEN LAYER VARIES FROM 2 UP TO 20)

Hidden neurons |      2     |      4      |      6      |     8      |     10     |     15     |     20
(R², s)        | (98%; 2.5) | (99%; 1.68) | (99%; 2.12) | (98%; 2.3) | (98%; 2.3) | (98%; 2.5) | (98%; 2.6)

R² and s represent the squared correlation coefficient and the standard deviation between estimated and observed values of the Cr(III) removal for the entire data set (31 experiments).
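The architecture scan can be expressed as a short loop (a sketch; `train_network` stands for an assumed training routine returning a predictor, and `rms_error` is the eq. 2 metric defined earlier):

```python
import numpy as np

def scan_hidden_sizes(X, y, train_network, rms_error, sizes=range(1, 21), repeats=10):
    """Train `repeats` networks per hidden-layer size; keep the mean whole-set RMS error."""
    results = {}
    for n_hidden in sizes:
        errs = [rms_error(y, train_network(X, y, n_hidden, seed=r)(X))
                for r in range(repeats)]
        results[n_hidden] = np.mean(errs)
    # Return the size with the lowest mean error plus the full profile (Table 3).
    return min(results, key=results.get), results
```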

We have also used the leave-20%-out cross-validation technique as a criterion for checking the quality of the model. In this procedure, 20% of the whole data set is selected out at a time: the data set is sorted in increasing order and experiments are extracted at equal intervals. For every selection, the model is built with the remaining 80% of the examples and then used for prediction. This procedure is repeated five times, so that every pattern appears in a prediction set once and only once. The data set is not large enough for a proper investigation by varying the validation set, and the choice of the validation set is very important because it should be representative of all the points in the training set. For that reason we use the Bayesian regularization technique [15, 16], which does not require a validation data set separate from the training data set; it uses all the data. It consists of penalizing high weight values by modifying the cost function J. This technique forces the parameters (the weights) not to take high values, and consequently avoids overfitting. There is no need for a validation set, since the application of Bayesian statistics provides a network with maximum generalization. A sketch of the penalized cost is given below the figure.

FIG. 2 CALCULATED Cr(III) REMOVAL EFFICIENCY VALUES AS A FUNCTION OF OBSERVED VALUES FOR THE WHOLE DATA SET (THE DATA REPRESENT THE FIVE JOINED DATA SETS). REGRESSION LINE: y = 1.006x − 0.129, R² = 0.995.
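A minimal rendering of the modified cost function (our sketch with a fixed penalty weight `alpha` for illustration; MacKay's scheme [15] re-estimates the penalty from the data during training):

```python
import numpy as np

def penalized_cost(y_obs, y_calc, w, alpha=0.01):
    """Regularized objective in the spirit of Bayesian regularization:

    F = J + alpha * sum(w^2): the data misfit J of eq. (3) plus a penalty that
    keeps the weights small and thereby discourages overfitting.
    """
    J = np.sum((np.asarray(y_obs) - np.asarray(y_calc)) ** 2)
    return J + alpha * np.sum(np.asarray(w) ** 2)
```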



FIG. 3 THE PROFILE OF THE ERROR DISTRIBUTION (RESIDUAL ERROR PLOTTED AGAINST OBSERVATION NUMBER).

The joined results from the cross-validation technique for the estimated Cr(III) removal values give information on the prediction ability. A linear regression between experimental and predicted values leads to a cross-validation coefficient R² between the experimental and computed values of the Cr(III) removal equal to 0.99, while the cross-validation standard deviation s is equal to 1.72 (Figures 2 and 3). The plot in Figure 3 shows the residuals for all the data; the 95% confidence intervals about these residuals are plotted as error bars. The first observation is that only a few cases present an error greater than 2. The residuals are the difference between the observed and predicted values. They are useful for detecting failures in the model assumptions, since they correspond to the errors in the model equation; by assumption, these errors have independent normal distributions with mean zero and constant variance. The regression model corresponding to the composite design [3] takes into account both the linear and quadratic effects of the variables, but it is observed that the quadratic effect of one of them (time) is not significant, and the remaining quadratic terms affect the Cr(III) removal process only weakly in comparison with the linear terms. As seen in the linear regression model, the electrolysis potential is the factor of prime importance, followed by pH and time; the temperature contributes poorly. We analyze the relative contribution of the linear effect of each variable by computing the ratio of its regression coefficient to the sum of the coefficients of the entire equation (Table 4). It is assessed by the following relationship:

$$\frac{b_i}{b_1 + b_2 + b_3 + b_4} \times 100$$

b1, b2, b3 and b4 correspond to the regression coefficients of the influencing factors.

TABLE 4 PROFILE OF CONTRIBUTION OF INFLUENCING FACTORS

                          |   pH   | Potential |  Time  | Temperature
Relative contribution (%) | 25.42  |   43.55   | 24.19  |    6.82
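The contribution profile follows directly from the formula above (a sketch; the fitted coefficients themselves are not listed in the text, so the demo values are illustrative only):

```python
import numpy as np

def relative_contributions(b):
    """b_i / (b_1 + ... + b_4) * 100 for the linear regression coefficients."""
    b = np.asarray(b, dtype=float)
    return b / b.sum() * 100.0

# Illustrative: any coefficient vector with proportions 25.42 : 43.55 : 24.19 : 6.82
# reproduces the Table 4 percentages.
print(relative_contributions([2.542, 4.355, 2.419, 0.682]))
```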

A linear model elaborated from the experimental design is largely sufficient to give an average contribution profile of the influencing factors and the operating conditions for optimizing the removal of trivalent chromium. This can be done by also considering the energy consumption and the aluminium remaining in solution during the electrocoagulation treatment. The choice of pH has to respect the pH of the effluent from which the trivalent chromium is removed, such as the effluents of textile, tannery, and other industries using chromium. On the other hand, the neural network analysis is robust and allows the prediction of Cr(III) removal with high success without prior assumptions.

Conclusion

This paper serves as a contribution to the use of neural network modelling in the chemical sciences. The merits of this methodology are discussed through an example concerning Cr(III) removal from water. Artificial neural networks (ANNs) are employed here to investigate the relationships between chromium removal and the operating conditions (pH, electrolysis potential, electrolysis time, and temperature), with emphasis on the Bayesian regularization technique. The efficiency of the neural model established here is not demonstrated in comparison with



other models. Based on only thirty-one experiments from a fractional central composite design, very satisfactory results are obtained by testing the predictive ability and generalisation of the Bayesian approach with the cross-validation technique. The statistical parameters of the models are very good: the model explains more than 98% of the total variance, with a standard deviation below 1.8 for the whole data set.

REFERENCES

[1] J. Zupan and J. Gasteiger, Neural Networks for Chemistry and Drug Design, Wiley-VCH, Weinheim, Germany, 1997.
[2] J. Devillers, Neural Networks in QSAR and Drug Design, Academic Press, London, 1996.
[3] Z. Zeroual et al., Optimizing the removal of trivalent chromium by electrocoagulation using experimental design, Chem. Eng. J. 148 (2009) 488-495.
[4] M. Panayotova, J. Fritsch, Treatment of wastewater from the lead-zinc ore processing industry, J. Environ. Sci. Health A31 (9) (1996) 2155-2165.
[5] J. C. Donini, J. Khan, J. Szynkarczuk, T. A. Hassan, K. L. Kar, The operating cost of electrocoagulation, Can. J. Chem. Eng. 72 (1994) 1007-1012.
[6] X. Chen, G. Chen, P. L. Yue, Separation of pollutants from restaurant wastewater by electrocoagulation, Sep. Purif. Technol. 31 (3-4) (1995) 275-283.
[7] M. Murugananthan, G. Bhaskar Raju, S. Prabhakar, Removal of sulfide, sulfate and sulfite ions by electrocoagulation, J. Hazard. Mater. B109 (2004) 37-44.
[8] A. Gurses, M. Yalçin, C. Dogar, Electrocoagulation of some reactive dyes: a statistical investigation of some electrochemical variables, Waste Manage. 22 (2002) 491-499.
[9] G. Dreyfus, J. Martinez, M. Samuelides, M. Gordon, F. Badran, S. Thiria, L. Hérault, Réseaux de neurones: Méthodologie et applications, Éditions Eyrolles, 2002.
[10] M. Rudolph, On topology, size and generalization of non-linear feed-forward neural networks, Neurocomputing 16 (1997) 1-22.
[11] M. H. Hassoun, Fundamentals of Artificial Neural Networks, MIT Press, Cambridge, 1995.
[12] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1994.
[13] M. T. Hagan, M. B. Menhaj, Training feedforward networks with the Marquardt algorithm, IEEE Trans. Neural Networks 5 (6) (1994) 989-993.
[14] MATLAB 7.0, The MathWorks, Inc., Natick, MA, 2002.
[15] D. J. C. MacKay, Bayesian interpolation, Neural Computation 4 (1992) 415-447.
[16] A. Menchero, R. M. Diez, D. R. Insua, P. Müller, Bayesian analysis of nonlinear autoregression models based on neural networks, Neural Computation 17 (2005) 453-485.


