Ijcot v6p309 by Ssrg Journals

International Journal of Computer & Organization Trends – Volume 6 Number 1 – Mar 2014

HLSRGM Based SPRT: MMLE Dr. R. Satya Prasad#1, S. Janardhana Rao#2, Dr. G. Krishna Mohan#3 1

Associate Professor, Dept. of Computer Science & Engg., Acharya Nagrjuna University, Guntur, Andhrapradesh, India. 2

Team Lead, JKC, APSFKNW, ieg, India. Reader, Dept. of Computer Science, P.B.Siddhartha College, Vijayawada, Andhrapradesh, India.

Abstract—Sequential Analysis of Statistical science could be adopted in order to decide upon the reliability / unreliability of the developed software very quickly. The procedure adopted for this is, Sequential Probability Ratio Test (SPRT). It is designed for continuous monitoring. The likelihood based SPRT proposed by Wald is very general and it can be used for many different probability distributions. The parameters are estimated using Modified Maximum Likelihood Estimation (MMLE). In the present paper, the HLSRGM (Half Logistic Software Reliability Growth model) is used on five sets of existing software reliability data and analyzed the results. Keywords— HLSRGM, MMLE, Decision lines, Software testing, Software failure data.

I. INTRODUCTION

assumed that the average number of recorded failures in a given time interval is directly proportional to the length of the interval and the random number of failure occurrences in the interval is explained by a Poisson process then we know that the probability equation of the stochastic process representing the failure occurrences is given by a Homogeneous Poisson Process with the expression P  N  t   n  

e t  t 

(1.1) n! Stieber (1997) observes that if classical testing strategies are used, the application of software reliability growth models may be difficult and reliability predictions can be misleading. However, he observes that statistical methods can be successfully applied to the failure data. He demonstrated his observation by applying the wellknown sequential probability ratio test (SPRT) of Wald (1947) for a software failure data to detect unreliable software components and compare the reliability of different software versions. In this paper a popular SRGM HLSRGM is considered and adopted the principle of Stieber (1997) in detecting unreliable software components in order to accept or reject the developed software. The theory proposed by Stieber (1997) is presented in Section 2 for a ready reference. Extension of this theory to the SRGM – HLSRGM is presented in Section 3. Application of the decision rule to detect unreliable software with respect to the proposed SRGM is given in Section 4. Analysis of the application of the SPRT on five data sets and conclusions drawn are given in Section 5 and 6 respectively.

Wald's procedure is particularly relevant if the data is collected sequentially. Sequential Analysis is different from Classical Hypothesis Testing were the number of cases tested or collected is fixed at the beginning of the experiment. In Classical Hypothesis Testing the data collection is executed without analysis and consideration of the data. After all data is collected the analysis is done and conclusions are drawn. However, in Sequential Analysis every case is analyzed directly after being collected, the data collected upto that moment is then compared with certain threshold values, incorporating the new information obtained from the freshly collected case. This approach allows one to draw conclusions during the data collection, and a final conclusion can possibly be reached at a much earlier stage as is the case in Classical Hypothesis Testing. The advantages of Sequential Analysis are easy to see. As data collection can be terminated after fewer cases and decisions taken earlier, the savings in terms of human life and misery, and financial savings, might be considerable. II. WALD'S SEQUENTIAL TEST FOR A POISSON PROCESS In the analysis of software failure data it is often The sequential probability ratio test (SPRT) was deal with either Time Between Failures or failure count in a given time interval. If it is further developed by A.Wald at Columbia University in

ISSN: 2249-2593

http://www.ijcotjournal.org

Page 37

International Journal of Computer & Organization Trends – Volume 6 Number 1 – Mar 2014 1943. Due to its usefulness in development work on military and naval equipment it was classified as ‘Restricted’ by the Espionage Act (Wald, 1947). A big advantage of sequential tests is that they require fewer observations (time) on the average than fixed sample size tests. SPRTs are widely used for statistical quality control in manufacturing processes. An SPRT for homogeneous Poisson processes is described below. Let {N(t),t 0} be a homogeneous Poisson process with rate ‘’. In our case, N(t)= number of failures up to time ‘ t’ and ‘ ’ is the failure rate (failures per unit time ). Suppose that a system is on test (for example a software system, where testing is done according to a usage profile and no faults are corrected) and that to estimate its failure rate ‘ ’. We can not expect to estimate ‘ ’ precisely. But we want to reject the system with a high probability if our data suggest that the failure rate is larger than 1 and accept it with a high probability,

realizations N (t1 ), N (t2 ),........N (tK ) Q1 Simplification of Q0

N(t). gives

N t 

  Q1  exp(0  1 )t   1  Q0  0  The decision rule of SPRT is to decide in favor of 1 , in favor of 0 or to continue by observing the number of failures at a later time than 't' according Q as 1 is greater than or equal to a constant say A, Q0 less than or equal to a constant say B or in between the constants A and B. That is, we decide the given software product as unreliable, reliable or continue the test process with one more observation in failure data, according as Q1 (2.3) A Q0 Q1 (2.4) B if it’s smaller than 0 . As always with statistical Q0 tests, there is some risk to get the wrong answers. Q1 (2.5) A So we have to specify two (small) numbers ‘α’ and B  Q0 ‘β’, where ‘α’ is the probability of falsely rejecting The approximate values of the constants A and B the system. That is rejecting the system even if λ ≤ 1   , B 0 . This is the "producer’s" risk. β is the are taken as A   1 probability of falsely accepting the system .That is Where ‘  ’ and ‘  ’ are the risk probabilities as accepting the system even if λ ≥ 1 . This is the defined earlier. A simplified version of the above

“consumer’s” risk. With specified choices of 0 and

decision processes is to reject the system as 1 such that 0 < 0 < 1 , the probability of finding unreliable if N(t) falls for the first time above the line NU  t   a.t  b2 (2.6) N(t) failures in the time span (0,t ) with 1 , 0 as To accept the system to be reliable if N(t) falls the failure rates are respectively given by for the first time below the line N t  e1t  1t  N L  t   a.t  b1 (2.7) (2.1) Q1  N (t )! To continue the test with one more observation

e0t  0t 

N t 

on (t, N(t)) as the random graph of [t, N(t)] is between the two linear boundaries given by N (t )! equations (2.6) and (2.7) where Q   The ratio 1 at any time ’t’ is considered as a (2.8) a 1 0 Q0  1  log   measure of deciding the truth towards 0 or 1 ,  0  given a sequence of time instants say t1  t2  t3  ........  tK and the corresponding

Q0 

ISSN: 2249-2593

(2.2)

http://www.ijcotjournal.org

Page 38

International Journal of Computer & Organization Trends – Volume 6 Number 1 – Mar 2014 1    log      b1    log  1   0  1    log     b2    log  1   0 

(2.9)

mean value function the probability equation of a such a process is P  N (t)  Y 

(2.10)

The parameters  ,  , 0 and 1 can be chosen in several ways. One way suggested by Stieber (1997) .log  q  .log  q   is 0  , 1  q where q  1 q 1 q 1 0

 m(t ) 

.em(t ) , y  0,1,2,   

y! Depending on the forms of m(t) we get various Poisson processes called NHPP. We may write e m1 (t ) . m1 (t )  Q1  N (t )!

Q0 

N (t )

e  m0 (t ) . m0 (t ) 

N (t )

N (t )! Where, m1 (t ) , m0 (t ) are values of the mean value If 0 and 1 are chosen in this way, the slope of function at specified sets of its parameters NU (t) and NL (t) equals λ. The other two ways of indicating reliable software and unreliable software choosing 0 and 1 are from past projects and from respectively. Let P , P be values of the NHPP at 0 1 part of the data to compare the reliability of two specifications of b say b , b where b  b  0 1 0 1 different functional areas. respectively. It can be shown that for our models III. HLSRGM m  t  at b1 is greater than that at b0 . Symbolically One simple class of finite failure NHPP model is m0  t   m1  t  . Then the SPRT procedure is as the HLSRGM, assuming that the failure intensity is follows: proportional to the number of faults remaining in Q Accept the system to be reliable if 1  B the software describing an exponential failure curve. Q 0 It has two parameters. Where, ‘a’ is the expected N (t )  m ( t ) 1 total number of faults in the code and ‘b’ is the e . m (t )  i.e.,  m (t ) 1 B shape factor defined as, the rate at which the failure N (t) 0 e . m ( t )   0 rate decreases. The cumulative distribution function bt    1  e  .The expected log    m1 (t )  m0 (t ) of the model is: F  t   1   bt   i.e., N (t )  (4.1) 1 e  log m1 (t )  log m0 (t ) number of faults at time ‘t’ is denoted Decide the system to be unreliable and reject if  bt by m (t )  a 1  e  , a  0, b  0, t  0 . Q 1 A 1  e  bt  Q0 The MML estimation of parameters ‘a’ and ‘b’  1   for the considered model is explained in log   m (t )  m0 (t )   1  Satyaprasad et. al., (2011). i.e., N (t )  (4.2) log m1 (t )  log m0 (t ) IV. SEQUENTIAL TEST FOR SRGMS Continue the test procedure as long as In Section II, for the Poisson process we know    1  that the expected value of N(t) = λt called the log log  m1(t) m0(t)  m1(t) m0(t) 1        average number of failures experienced in time (4.3)  N(t) 

logm1(t) logm0 (t) logm1(t) logm0(t) 't' .This is also called the mean value function of the Substituting the appropriate expressions of the Poisson process. On the other hand if we consider a Poisson process with a general function m(t) as its respective mean value function – m(t) of HLSRGM,

ISSN: 2249-2593

http://www.ijcotjournal.org

Page 39

International Journal of Computer & Organization Trends – Volume 6 Number 1 – Mar 2014 we get the respective decision rules and are given in followings lines Acceptance region:



N (t ) 



 2a eb0t  eb1t      log    t b  b  1    1  eb0t  e b1t  e  0 1  

either side of estimate of b obtained through a Data Set to apply SPRT such that b0 < b < b1. Assuming the value of   0.001 , the choices are given in the following table. TABLE 5.1: ESTIMATES OF A, B & SPECIFICATIONS OF B0, B1 FOR TIME

(4.4)

 1  eb1t   1  e b0t   log  b1t   b0t    1  e  1  e  

DOMAIN

Rejection region:



N (t ) 



 2a e b0t  e b1t  1      log    t b  b    1  e b0t  eb1t  e  0 1  

(4.5)

 1  e b1t   1  eb0t   log  b1t   b0 t    1  e  1  e  

Continuation region:





bt bt1 0      2a e e log  bt bt tb b  0 1 0 1 1 1e e e 

1ebt1 1ebt0  log bt1  bt0  1e 1e 



Estimate of ‘b’ 0.004021 0.005104 0.006709 0.018925

XIE NTDS IBM LYU

Estimate of ‘a’ 31.62190 36.06863 17.38641 32.53707

0.003021 0.004104 0.005709 0.017925

0.005021 0.006104 0.007709 0.019925

38.04483

0.005231

0.004231

0.006231

Data Set



bt bt1 0  1  2a e e log  bt bt tb b  0 1 0 1    1e e e  (4.6) Nt() 1ebt1 1ebt0  log bt1  bt0  1e 1e 

It may be noted that in the above model the decision rules are exclusively based on the strength of the sequential procedure (, ) and the values of the respective mean value functions namely, m0 (t ) , m1 (t ) . If the mean value function is linear in ‘t’ passing through origin, that is, m(t) = λt the decision rules become decision lines as described by Stieber (1997). In that sense equations (4.1), (4.2), (4.3) can be regarded as generalizations to the decision procedure of Stieber (1997). The applications of these results for live software failure data are presented with analysis in Section 5.

Using the selected b0 , b1 and subsequently the m0 (t ), m1 (t ) for the model, it is calculated the decision rules given by Equations 4.4 and 4.5, sequentially at each ‘t’ of the data sets taking the strength ( α, β ) as (0.05, 0.2). These are presented for the model in Table 5.2. The following consolidated table reveals the iterations required to come to a decision about the software of each Data Set. TABLE 5.2: SPRT ANALYSIS FOR 5 DATA SETS OF TIME DOMAIN DATA Data Set

Xie

V. SPRT ANALYSIS OF LIVE DATA SETS

The developed SPRT methodology is for a software failure data which is of the form [t, N(t)]. Where, N(t) is the failure number of software system or its sub system in ‘t’ units of time. In this section we evaluate the decision rules based on the considered mean value function for Five different data sets of the above form, borrowed from (Xie, 2002), (Pham, 2006) and (LYU,1996). The procedure adopted in estimating the parameters is a MMLE. Based on the estimates of the parameter ‘b’ in each mean value function, we have chosen the specifications of b0  b  , b1  b  equidistant on

ISSN: 2249-2593

NTDS

IBM

N(t)

30.02 31.46 53.93 55.29 58.72 71.92 77.07 80.9 101.9 114.87 115.34 9 21 32 36 43 45 50 58 63 70 71 77 78 10 19 32 43 58

1 2 3 4 5 6 7 8 9 10 11 1 2 3 4 5 6 7 8 9 10 11 12 13 1 2 3 4 5

http://www.ijcotjournal.org

Acceptance region (≤) -1.342698 -1.267039 -0.166009 -0.104056 0.049878 0.611889 0.818353 0.967340 1.717214 2.125902 2.139960 -3.130824 -2.143404 -1.307213 -1.018950 -0.534218 -0.400259 -0.074025 0.422698 0.717732 1.111369 1.165786 1.482922 1.534234 -4.634744 -4.179723 -3.590548 -3.151740 -2.636489

Rejection Region (≥) 7.201955 7.279608 8.423895 8.489201 8.651966 9.252971 9.476814 9.639516 10.476656 10.949528 10.966060 7.782102 8.786362 9.649855 9.950868 10.461547 10.603770 10.952326 11.489711 11.813266 12.250869 12.311952 12.671122 12.729779 9.795210 10.278280 10.938841 11.466755 12.146432

Decision

Reject

Continue

Page 40

International Journal of Computer & Organization Trends – Volume 6 Number 1 – Mar 2014

LYU

70 88 103 125 150 169 199 231 256 296 0.5 1.7 4.5 7.2 10 13 14.8 15.7 17.1 20.6 24 25.2 26.1 27.8 29.2 31.9 35.1 37.6 39.6 44.1 47.6 52.8 60 70.7 3.183 6.883 11.55 16.383 21.217 27.633 37.133 47.3 53.383 59.883 64.467 70.467 83.8 103.967 110.75 111.583

6 7 8 9 10 11 12 13 14 15 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

-2.289491 -1.870631 -1.608845 -1.357710 -1.248933 -1.282180 -1.523307 -2.019126 -2.570507 -3.748940 -14.577397 -14.218605 -13.424252 -12.713367 -12.031234 -11.360371 -10.986704 -10.807797 -10.539827 -9.923674 -9.396380 -9.226543 -9.104625 -8.886935 -8.719864 -8.428185 -8.133359 -7.940449 -7.809238 -7.587367 -7.483326 -7.436374 -7.577969 -8.219828 -3.715630 -3.363946 -2.932672 -2.500304 -2.082085 -1.548537 -0.802457 -0.060144 0.357157 0.781529 1.067734 1.426382 2.160625 3.115518 3.396889 3.430116

12.661193 13.393503 13.974797 14.794471 15.701716 16.391182 17.512828 18.802015 19.913057 21.957983 26.364610 26.729848 27.566605 28.354719 29.154758 29.994898 30.491586 30.738071 31.119236 32.061675 32.965828 33.282939 33.520241 33.967480 34.335069 35.043064 35.882564 36.540419 37.068938 38.269052 39.216853 40.656549 42.732025 46.055192 7.472559 7.826120 8.261731 8.700905 9.128292 9.677657 10.454823 11.240584 11.688938 12.150697 12.465896 12.865790 13.705180 14.855007 15.211822 15.254662

VI. CONCLUSION.

The above consolidated table of HLSRGM as exemplified for five Data Sets indicates that the model is performing well in arriving at a decision. The model has given a decision of rejection for 3 Data Sets i.e. Xie, NTDS and S2 at 11th , 13th and 16th instances respectively and a decision of continue for 2 Data Sets i.e. IBM and LYU. Therefore, we may conclude that, applying SPRT on data sets we can come to an early conclusion of reliability / unreliability of software. REFERENCES [1] [2] [3] [4]

[5] Continue [6] [7]

Lyu, M.R. (1996). “The hand book of software reliability engineering”, McGrawHill & IEEE Computer Society press. Pham. H., (2006). “System software reliability”, Springer. Satyaprasad (2007).”Half logistic Software reliability growth model “Ph.D Thesis of ANU, India. Satyaprasad, R., Ramchand, H. R. and Kantam, R. R. L., (May, 2011). “Software Reliability Measuring using Modified Maximum Likelihood Estimation and SPC”, International Journal of Computer Applications, Volume 21– No.7. Stieber, H.A. (1997). “Statistical Quality Control: How To Detect Unreliable Software Components”, Proceedings the 8th International Symposium on Software Reliability Engineering, 8-12. Wald. A., 1947. “Sequential Analysis”, John Wiley and Son, Inc, New York. Xie, M., Goh. T.N., Ranjan.P. (2002). “Some effective control chart procedures for reliability monitoring” -Reliability engineering and System Safety 77 143 -150.

Author’s profile: Dr. R. Satya Prasad Received Ph.D. degree in Computer Science in the faculty of Engineering in 2007 from Acharya Nagarjuna University, Andhra Pradesh. He received gold medal from Acharya Nagarjuna University for his outstanding performance in a first rank in Masters Degree. He is S2 Reject currently working as Associative Professor in the Department of Computer Science & Engineering, Acharya Nagarjuna University. His current research is focused on Software Engineering. He published 70 research papers in National & International Journals. S. Janardhana Rao Presently working at JKC, Andhra Pradesh From the Table 5.2, a decision of either to accept, Society for Knowledge Networks/ reject the system or continue is reached much in Institute for Electronic Governance advance of the last time instant of the data. (APSFKNW) as a Team Lead for the Academic Program on Oracle Academy, Worked 5 years in Admin section for tracking and smooth functioning of JKC files online, Worked 2 years at Aditya Engineering College as JKC Coordinator.

ISSN: 2249-2593

http://www.ijcotjournal.org

Page 41

International Journal of Computer & Organization Trends â&#x20AC;&#x201C; Volume 6 Number 1 â&#x20AC;&#x201C; Mar 2014 He is pursuing his Ph.D under the guidance of Dr. R.Satyaprasad at Acharya Nagarjuna University. Dr. G. Krishna Mohan, working as a Reader and Head in the Department of Computer Science, P.B.Siddhartha College, Vijayawada. He obtained his M.C.A degree from Acharya Nagarjuna University, M.Tech(CSE) from Jawaharlal Nehru Technological University, Kakinada, M.Phil from Madurai Kamaraj University and Ph.D(CSE) from Acharya Nagarjuna University. He qualified, AP State Level Eligibility Test. His research interests lies in Data Mining and Software Engineering. He published 23 research papers in various National and International journals.

ISSN: 2249-2593

http://www.ijcotjournal.org

Page 42