Development in Earth Science, Volume 4, 2016 www.seipub.org/des doi: 10.14355/des.2016.04.001
Intercomparison of Probability Distributions for Extreme Value Analysis of Rainfall under Missing Data Scenario

N. Vivekanandan
Central Water and Power Research Station, Pune, Maharashtra, India

Abstract
Assessment of extreme rainfall is one of the important parameters for planning, design and management of hydraulic structures at the project site. It can be obtained through Extreme Value Analysis (EVA) of rainfall by fitting probability distributions to the data series of annual 1-day maximum rainfall. This paper illustrates the adoption of Gumbel (EV1), Frechet (EV2), 2-parameter Log Normal (LN2) and Log Pearson Type-3 (LP3) distributions in EVA for Tohana. The Method of Moments (MoM) and the Maximum Likelihood Method (MLM) are used for determination of parameters of the EV1, EV2, LN2 and LP3 distributions. In addition, the Order Statistics Approach (OSA) is used for determination of parameters of EV1 and EV2. The adequacy of fitting of the probability distributions is evaluated by Goodness-of-Fit tests, viz., Anderson-Darling and Kolmogorov-Smirnov, and by a diagnostic test using the D-index. By considering the design-life of the structure, the study suggests that the extreme rainfall estimated using the LP3 (MLM) distribution could be used for design purposes.

Keywords
Anderson-Darling, D-index, Extreme Value Analysis, Kolmogorov-Smirnov, Log Pearson, Rainfall
Introduction
Rainfall frequency analysis plays an important role in the hydrologic and economic evaluation of water resources projects. It helps to estimate the return periods and their corresponding event magnitudes, thereby creating reasonable design criteria. The basic problem in rainfall studies is an information problem, which can be approached through Extreme Value Analysis (EVA) of rainfall. As the distribution of rainfall varies over space and time, it is required to analyze data covering long periods and recorded at various locations to obtain reliable information [1].

Out of a number of probability distributions that are adopted in frequency analysis, Gumbel (EV1), Frechet (EV2), 2-parameter Log Normal (LN2) and Log Pearson Type-3 (LP3) are extensively used for EVA of rainfall. Based on the applicability, standard parameter estimation procedures, viz., the Method of Moments (MoM) and the Maximum Likelihood Method (MLM), are generally used for determination of parameters [2]. In addition to MoM and MLM, the Atomic Energy Regulatory Board (AERB) guidelines [3] describe that the Order Statistics Approach (OSA) can also be considered for determination of parameters of EV1 and EV2 distributions. The AERB guidelines also describe that, although a number of methods are available for parameter estimation, the OSA estimates are popular owing to their lower bias and minimum variance.

In the recent past, a number of studies have been carried out by researchers adopting probability distributions for EVA of rainfall. Lee [4] expressed that the Pearson Type-3 (PR3) distribution is better suited amongst five distributions studied for analyzing the rainfall distribution characteristics of the Chia-Nan plain area. Bhakar et al. [5] studied the frequency analysis of consecutive days maximum rainfall at Banswara, Rajasthan, India. The study by Saf et al. [6] revealed that the PR3 distribution is better suited for modelling of extreme values in the Antalya and Lower-West Mediterranean sub-regions, whereas the Generalized Logistic distribution is better suited for the Upper-West Mediterranean sub-region. Mujere [7] applied the EV1 distribution for modelling flood data for the Nyanyadzi River, Zimbabwe. Baratti et al. [8] carried out flood frequency analysis (FFA) on seasonal and annual time scales for the Blue Nile River adopting the EV1 distribution. Esteves [9] applied the EV1 distribution to estimate the extreme rainfall depths at different rain-gauge stations in southeast United Kingdom. Olumide et al. [10] applied normal and EV1 distributions for prediction of rainfall and runoff at the Tagwai dam site in Minna, Nigeria. They also expressed that the normal distribution is better suited for rainfall prediction while Log-Gumbel is better suited for runoff. Rasel and Hossain [11] applied the EV1 distribution for development of intensity
duration frequency curves for seven divisions in Bangladesh.

Generally, when different probability distributions are used for EVA, a common problem that arises is how to determine which model fits best for a given set of data. This can be evaluated by quantitative assessment using Goodness-of-Fit (GoF) and diagnostic tests. GoF tests such as Anderson-Darling (A2) and Kolmogorov-Smirnov (KS) are applied for checking the adequacy of fitting of probability distributions to the rainfall data. A diagnostic test (using the D-index) is applied for the selection of the most suitable probability distribution for estimation of extreme (say, 1-day maximum) rainfall. Research efforts thus exist on assessing extreme rainfall as a design parameter, and the present work is an effort in this direction. This paper details the procedures involved in assessing the suitable probability distribution for estimation of extreme rainfall through GoF and diagnostic tests, with an illustrative example.

Methodology
The objective of the study is to assess the adequacy of the Probability Density Function (PDF) for EVA of rainfall. In this context, the steps followed for data processing, validation and analysis are:
(i) prepare the Annual 1-day Maximum Rainfall (AMR) series from the daily rainfall data;
(ii) select the PDFs for EVA (say, EV1, EV2, LN2 and LP3);
(iii) select parameter estimation methods (say, MoM, MLM and OSA) wherever applicable;
(iv) select quantitative GoF and diagnostic tests; and
(v) conduct EVA and analyse the results obtained thereof.
The PDF and quantile estimator ($X_T$) of the distributions are presented in Table 1.

TABLE 1. PDF AND QUANTILE ESTIMATOR OF PROBABILITY DISTRIBUTIONS
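| Distribution | PDF | Quantile estimator ($X_T$) |
|---|---|---|
| EV1 | $f(X; \alpha, \beta) = \dfrac{1}{\beta}\, e^{-(X-\alpha)/\beta}\, e^{-e^{-(X-\alpha)/\beta}}$, $\beta > 0$ | $X_T = \alpha + Y_T\,\beta$ |
| EV2 | $f(X; \beta, \gamma) = \dfrac{\gamma}{\beta} \left(\dfrac{\beta}{X}\right)^{\gamma+1} e^{-(\beta/X)^{\gamma}}$, $\beta > 0$ | $X_T = \beta\, e^{Y_T/\gamma}$ |
| LN2 | $f(X; \mu, \sigma) = \dfrac{1}{\sigma X \sqrt{2\pi}} \exp\left(-\dfrac{1}{2}\left(\dfrac{\ln(X)-\mu}{\sigma}\right)^{2}\right)$, $X > 0$, $\sigma > 0$ | $X_T = e^{\mu + K_P\,\sigma}$ |
| LP3 | $f(X; \alpha, \beta, \gamma) = \dfrac{1}{\beta X\, \Gamma(\gamma)} \left(\dfrac{\ln(X)-\alpha}{\beta}\right)^{\gamma-1} e^{-(\ln(X)-\alpha)/\beta}$, $\beta, \gamma > 0$ | $X_T = \exp\left((\alpha + \beta\gamma) + K_P\,\beta\sqrt{\gamma}\right)$ |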
In Table 1, the symbols $\mu$ and $\sigma$ represent the mean and standard deviation of the log-transformed series of recorded data; $\alpha$, $\beta$ and $\gamma$ denote the location, scale and shape parameters of the distributions, respectively. For the EV1 and EV2 distributions, the reduced variate ($Y_T$) for a given return period ($T$) is defined by $Y_T = -\ln(-\ln(1 - (1/T)))$, while in the mathematical representation of LN2 and LP3, $K_P$ denotes the frequency factor corresponding to the probability of exceedance. The Coefficient of Skewness ($C_S$) is $C_S = 0.0$ for LN2, whereas for LP3 $C_S$ is based on the log-transformed series of the recorded data [12].
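The quantile estimators of Table 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the FORTRAN code used in the study: the function names are hypothetical, the parameter values in the usage lines are invented for illustration, and for LN2 the frequency factor $K_P$ is taken as the standard normal variate at non-exceedance probability $1 - 1/T$, which is what $C_S = 0$ implies.

```python
import math
from statistics import NormalDist


def reduced_variate(T):
    """Reduced variate Y_T = -ln(-ln(1 - 1/T)) for return period T (years)."""
    return -math.log(-math.log(1.0 - 1.0 / T))


def ev1_quantile(alpha, beta, T):
    """EV1 estimator X_T = alpha + Y_T * beta (location alpha, scale beta)."""
    return alpha + reduced_variate(T) * beta


def ev2_quantile(beta, gamma, T):
    """EV2 estimator X_T = beta * exp(Y_T / gamma) (scale beta, shape gamma)."""
    return beta * math.exp(reduced_variate(T) / gamma)


def ln2_quantile(mu, sigma, T):
    """LN2 estimator X_T = exp(mu + K_P * sigma); K_P assumed standard normal."""
    k_p = NormalDist().inv_cdf(1.0 - 1.0 / T)
    return math.exp(mu + k_p * sigma)


# Hypothetical parameter values, chosen only to exercise the functions.
for T in (2, 10, 100):
    print(T,
          round(reduced_variate(T), 4),
          round(ev1_quantile(55.0, 31.0, T), 1),
          round(ev2_quantile(50.0, 3.0, T), 1),
          round(ln2_quantile(4.2, 0.5, T), 1))
```

The LP3 estimator is left out because its frequency factor depends on the skewness of the log-transformed series and is usually read from tables or approximated, which the paper does not spell out.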
Goodness-of-Fit Tests
Generally, the A2 test is applied for checking the adequacy of fitting of EV1 and EV2 distributions. The test can be extended to LN2 and LP3, but the procedures involved are more complex. In view of this, the KS test is widely applied for the purpose of quantitative assessment. Theoretical descriptions of the GoF tests are as follows.

The A2 test statistic is defined as:

$$A^2 = -N - \frac{1}{N}\sum_{i=1}^{N}\left\{(2i-1)\ln(Z_i) + (2N+1-2i)\ln(1-Z_i)\right\} \qquad (1)$$

Here, $Z_i = F(X_i)$ for $i = 1, 2, 3, \ldots, N$ with $X_1 < X_2 < \ldots < X_N$, $F(X_i)$ is the Cumulative Distribution Function (CDF) of the ith sample ($X_i$) and $N$ is the sample size. The KS test statistic is defined as:
$$KS = \max_{i=1}^{N}\left(F_e(X_i) - F_D(X_i)\right) \qquad (2)$$
Here, $F_e(X_i)$ is the empirical CDF of $X_i$ and $F_D(X_i)$ is the CDF of $X_i$ derived from the fitted PDF. In this study, the Weibull plotting position formula is used for computation of the empirical CDF. The theoretical values of the A2 and KS test statistics for different sample sizes ($N$) at the 5% significance level are available in the technical note on "Goodness-of-Fit Tests for Statistical Distributions" by Charles Annis [13].

Test criteria: If the computed value of the GoF test statistic given by the distribution is less than the theoretical value at the desired significance level, then the distribution is assumed to be suitable for EVA of rainfall at that level of significance.

Diagnostic Test
Sometimes the GoF test results do not offer a conclusive inference, which poses a bottleneck to the user in selecting the suitable PDF for application. In such cases, a diagnostic test is applied in addition to the GoF tests for making the inference. The selection of the most suitable probability distribution for EVA of rainfall is performed through the D-index test (USWRC), which is defined as:

$$\text{D-index} = \frac{1}{\bar{X}}\sum_{i=1}^{6}\left|X_i - X_i^{*}\right| \qquad (3)$$
Here, $\bar{X}$ is the average value of the recorded data, whereas $X_i$ and $X_i^{*}$ (i = 1 to 6) are the six highest recorded values and the corresponding values estimated by each PDF. The distribution having the least D-index is considered the better suited distribution for EVA of rainfall [14].

Application
EVA of rainfall was carried out to estimate extreme rainfall for different return periods adopting four PDFs, viz., EV1, EV2, LN2 and LP3. MoM, MLM and OSA were used for determination of parameters of the EV1 and EV2 distributions, whereas MoM and MLM were used for the LN2 and LP3 distributions. Daily rainfall data (with missing values) for the period 1951 to 2011 was used. The series of AMR was extracted from the daily rainfall data and used for EVA. From the scrutiny of the rainfall data, it was observed that the data for five years (1958 to 1960, 1966 and 1967) are missing. The data for the missing years were therefore imputed by the series maximum value (i.e., 158.8 mm) as per AERB guidelines, and the entire data set was used for EVA. The descriptive statistics of the AMR data series, viz., average, standard deviation, coefficient of variation, coefficient of skewness and coefficient of kurtosis, are found to be 73.2 mm, 39.6 mm, 54.1%, 0.932 and 0.119 respectively.

Results and Discussions

TABLE 2. EXTREME RAINFALL (MM) ESTIMATES WITH SE (MM) USING EV1 AND EV2 DISTRIBUTIONS
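| Return period (year) | EV1 MoM X_T | EV1 MoM SE | EV1 MLM X_T | EV1 MLM SE | EV1 OSA X_T | EV1 OSA SE | EV2 MoM X_T | EV2 MoM SE | EV2 MLM X_T | EV2 MLM SE | EV2 OSA X_T | EV2 OSA SE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 66.7 | 4.7 | 66.8 | 4.6 | 66.8 | 4.8 | 58.0 | 4.7 | 60.3 | 5.0 | 58.1 | 4.7 |
| 5 | 101.8 | 7.8 | 101.5 | 7.8 | 102.6 | 7.6 | 94.3 | 12.7 | 112.2 | 15.1 | 105.2 | 14.1 |
| 10 | 125.0 | 10.6 | 124.5 | 10.5 | 126.4 | 10.0 | 130.2 | 23.6 | 169.3 | 30.6 | 155.8 | 28.2 |
| 20 | 147.3 | 13.3 | 146.6 | 13.2 | 149.1 | 12.5 | 177.5 | 40.7 | 251.1 | 57.7 | 227.3 | 52.1 |
| 50 | 176.1 | 17.1 | 175.1 | 17.0 | 178.6 | 15.7 | 264.9 | 78.9 | 418.5 | 124.7 | 370.3 | 110.4 |
| 100 | 197.7 | 19.9 | 196.5 | 19.8 | 200.7 | 18.2 | 357.6 | 126.1 | 613.5 | 216.5 | 534.0 | 188.4 |
| 200 | 219.2 | 22.7 | 217.9 | 22.5 | 222.7 | 20.7 | 482.2 | 197.8 | 898.3 | 368.3 | 768.9 | 315.3 |
| 500 | 247.6 | 26.5 | 246.0 | 26.3 | 251.7 | 24.1 | 715.4 | 350.5 | 1485.4 | 727.6 | 1243.9 | 609.3 |
| 1000 | 269.0 | 29.4 | 267.2 | 29.1 | 273.6 | 26.6 | 963.9 | 533.5 | 2172.3 | 1202.1 | 1789.2 | 990.1 |
| 2000 | 290.5 | 32.2 | 288.5 | 31.9 | 295.6 | 29.5 | 1298.6 | 997.3 | 3176.5 | 2439.4 | 2573.3 | 1976.2 |
| 5000 | 318.8 | 36.0 | 316.6 | 35.7 | 324.5 | 32.5 | 1925.5 | 1371.3 | 5248.7 | 3737.7 | 4159.9 | 2962.4 |
| 10000 | 340.3 | 38.8 | 337.8 | 38.5 | 346.5 | 35.0 | 2593.9 | 2037.7 | 7674.0 | 6028.6 | 5982.3 | 4699.6 |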
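As a rough cross-check of the EV1 (MoM) column in Table 2, the standard textbook moment relations for the Gumbel distribution ($\beta = \sqrt{6}\,s/\pi$ and $\alpha = \bar{X} - 0.5772\,\beta$) can be applied to the reported mean (73.2 mm) and standard deviation (39.6 mm). This is a minimal sketch under the assumption that the study used these relations; the paper does not state them, and the function names below are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def ev1_mom_parameters(mean, std):
    """Textbook moment estimates of the EV1 location (alpha) and scale (beta)."""
    beta = math.sqrt(6.0) * std / math.pi
    alpha = mean - EULER_GAMMA * beta
    return alpha, beta


def ev1_quantile(alpha, beta, T):
    """EV1 estimator X_T = alpha + Y_T * beta, with Y_T = -ln(-ln(1 - 1/T))."""
    y_t = -math.log(-math.log(1.0 - 1.0 / T))
    return alpha + y_t * beta


# Descriptive statistics of the AMR series as reported in the paper (mm).
alpha, beta = ev1_mom_parameters(mean=73.2, std=39.6)
for T in (2, 100, 10000):
    print(T, round(ev1_quantile(alpha, beta, T), 1))
```

The values obtained this way lie within about 1 mm of the tabulated EV1 (MoM) estimates for the 2-, 100- and 10000-year return periods. The small gaps may come from rounding of the reported statistics or from details of the original program that the paper does not describe.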
TABLE 3. EXTREME RAINFALL (MM) ESTIMATES WITH SE (MM) USING LN2 AND LP3 DISTRIBUTIONS
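| Return period (year) | LN2 MoM X_T | LN2 MoM SE | LN2 MLM X_T | LN2 MLM SE | LP3 MoM X_T | LP3 MoM SE | LP3 MLM X_T | LP3 MLM SE |
|---|---|---|---|---|---|---|---|---|
| 2 | 64.4 | 4.2 | 63.4 | 4.5 | 64.6 | 6.2 | 65.3 | 6.0 |
| 5 | 98.7 | 7.4 | 100.5 | 8.2 | 101.3 | 10.9 | 101.3 | 10.6 |
| 10 | 123.3 | 10.9 | 127.8 | 12.1 | 126.9 | 15.7 | 125.6 | 15.1 |
| 20 | 148.3 | 14.8 | 155.9 | 16.7 | 152.0 | 21.3 | 148.9 | 20.3 |
| 50 | 182.4 | 21.0 | 194.9 | 24.1 | 185.3 | 29.5 | 178.9 | 27.8 |
| 100 | 209.5 | 26.2 | 226.2 | 30.6 | 210.7 | 36.5 | 201.3 | 33.9 |
| 200 | 237.7 | 32.2 | 259.2 | 37.9 | 236.5 | 43.9 | 223.5 | 40.5 |
| 500 | 277.1 | 40.9 | 305.8 | 48.7 | 271.2 | 54.7 | 252.7 | 49.6 |
| 1000 | 308.6 | 48.3 | 343.4 | 58.0 | 298.1 | 63.3 | 274.7 | 56.8 |
| 2000 | 341.5 | 56.4 | 383.1 | 68.3 | 324.0 | 72.4 | 296.7 | 64.4 |
| 5000 | 380.7 | 66.4 | 435.0 | 81.0 | 356.6 | 83.3 | 325.7 | 74.1 |
| 10000 | 410.4 | 77.9 | 475.0 | 95.9 | 382.7 | 95.6 | 347.7 | 82.3 |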
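As a small worked illustration of how a design value can be read from Table 3, adding one standard error to the 10000-year LP3 (MLM) estimate reproduces the 430 mm figure quoted in the conclusions. The subscripted notation is shorthand adopted here, not the paper's own symbols.

```latex
X_{10000} + SE_{10000} = 347.7\ \text{mm} + 82.3\ \text{mm} = 430.0\ \text{mm}
```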
FIGURE 1. ESTIMATED 1‐DAY MAXIMUM RAINFALL USING EV1, LN2 AND LP3 DISTRIBUTIONS WITH RECORDED DATA FOR TOHANA
FIGURE 2. ESTIMATED 1‐DAY MAXIMUM RAINFALL USING EV2 DISTRIBUTION WITH RECORDED DATA FOR TOHANA
Based on the parameter estimation procedures [15] of the EV1, EV2, LN2 and LP3 distributions, computer code was developed in FORTRAN and used for EVA of rainfall. These programs compute the distribution parameters and the extreme rainfall estimates with standard error for different return periods, and also compute the GoF test statistics and the D-index. The estimated Extreme Rainfall ($X_T$) with Standard Error (SE) computed from the four probability distributions is presented in Tables 2 and 3.

From Tables 2 and 3, it was observed that, for return periods of 5 years and above, the estimated extreme rainfall using EV2 (MLM) is consistently higher than the corresponding values of the EV1, LN2 and LP3 distributions. Also, from Table 2, it was noted that there is no significant difference between the estimated extreme rainfall when MoM and MLM are considered for determination of parameters of the EV1 distribution. From Table 3, it was noted that the estimated rainfall using LN2 (MLM) is relatively higher than the corresponding values of LN2 (MoM) and LP3 (MoM and MLM). The EVA results obtained from the EV1, LN2 and LP3 distributions were used to develop the probability plots presented in Figure 1. Similarly, the probability plots of estimated extreme rainfall obtained from the EV2 distribution are presented in Figure 2.

Analysis Based on GoF Tests
The adequacy of fitting of the four PDFs adopted in EVA of rainfall was assessed through the GoF tests, viz., A2 and KS, as described above. The GoF test results are presented in Table 4.

TABLE 4. COMPUTED AND THEORETICAL VALUES OF GOF TESTS STATISTIC USING EV1, EV2, LN2 AND LP3 DISTRIBUTIONS FOR TOHANA
| GoF test | EV1 MoM | EV1 MLM | EV1 OSA | EV2 MoM | EV2 MLM | EV2 OSA | LN2 MoM | LN2 MLM | LP3 MoM | LP3 MLM | Theoretical value at 5% level |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A2 | 0.587 | 0.598 | 0.581 | 1.665 | 0.950 | 1.055 | 0.389 | 0.403 | 0.436 | 0.459 | 0.757 |
| KS | 0.071 | 0.071 | 0.068 | 0.137 | 0.096 | 0.121 | 0.085 | 0.075 | 0.066 | 0.080 | 0.171 |
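Statistics of the kind listed in Table 4 follow directly from Eqs. (1) and (2). The sketch below is illustrative only and is not the author's FORTRAN code: the sample and the EV1 parameters are hypothetical, the function names are invented, and the empirical CDF is taken as the Weibull plotting position $i/(N+1)$. The KS function follows Eq. (2) as written (signed difference); the `absolute` flag switches to the absolute difference used in many textbooks.

```python
import math


def ev1_cdf(x, alpha, beta):
    """EV1 (Gumbel) CDF with location alpha and scale beta."""
    return math.exp(-math.exp(-(x - alpha) / beta))


def anderson_darling(z):
    """A2 of Eq. (1); z holds F(X_i) for the sample sorted in ascending order."""
    n = len(z)
    total = 0.0
    for i, zi in enumerate(z, start=1):
        total += (2 * i - 1) * math.log(zi)
        total += (2 * n + 1 - 2 * i) * math.log(1.0 - zi)
    return -n - total / n


def kolmogorov_smirnov(z, absolute=False):
    """KS of Eq. (2), with the empirical CDF F_e(X_i) = i / (N + 1)."""
    n = len(z)
    diffs = [i / (n + 1) - zi for i, zi in enumerate(z, start=1)]
    if absolute:
        diffs = [abs(d) for d in diffs]
    return max(diffs)


# Hypothetical sample (mm) and hypothetical EV1 parameters, for illustration only.
sample = sorted([41.0, 55.2, 60.8, 72.5, 90.1, 118.4, 158.8])
z = [ev1_cdf(x, alpha=55.0, beta=31.0) for x in sample]
print(f"A2 = {anderson_darling(z):.3f}, KS = {kolmogorov_smirnov(z):.3f}")
```

Each computed value would then be compared with the theoretical value for the sample size at the chosen significance level, as in the last column of Table 4.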
From the GoF test results, the following observations were made:
i) The A2 test supported the use of the EV1, LN2 and LP3 distributions for EVA of rainfall.
ii) The A2 test did not support the selection of the EV2 distribution for EVA of rainfall.
iii) The KS test confirmed the applicability of the EV1, EV2, LN2 and LP3 distributions for EVA of rainfall.

Analysis Based on Diagnostic Test
A diagnostic test (using the D-index) was used for the selection of the most suitable probability distribution for estimation of rainfall. The D-index values computed from the EV1, EV2, LN2 and LP3 distributions using different parameter estimation methods are given in Table 5.

TABLE 5. D-INDEX VALUES OF EV1, EV2, LN2 AND LP3 DISTRIBUTIONS
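| Diagnostic test | EV1 MoM | EV1 MLM | EV1 OSA | EV2 MoM | EV2 MLM | EV2 OSA | LN2 MoM | LN2 MLM | LP3 MoM | LP3 MLM |
|---|---|---|---|---|---|---|---|---|---|---|
| D-index | 1.756 | 1.762 | 1.738 | 4.986 | 12.882 | 10.079 | 1.940 | 1.762 | 2.127 | 1.604 |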
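The D-index of Eq. (3) is simple to compute once the six highest recorded values and their fitted counterparts are available. The sketch below is a hedged illustration with invented numbers. The function name is hypothetical, and how the estimated values are matched to the six highest observations (for example, through their plotting positions) is an assumption left to the caller, since the paper does not describe it.

```python
def d_index(recorded, estimated_for_top6):
    """D-index of Eq. (3): sum of |X_i - X_i*| over the six highest recorded
    values, divided by the mean of the whole recorded series.

    estimated_for_top6 must list the fitted values in the same order as the
    six highest recorded values (largest first).
    """
    if len(estimated_for_top6) != 6 or len(recorded) < 6:
        raise ValueError("need at least six records and exactly six estimates")
    mean = sum(recorded) / len(recorded)
    top6 = sorted(recorded, reverse=True)[:6]
    return sum(abs(x - xs) for x, xs in zip(top6, estimated_for_top6)) / mean


# Invented numbers, for illustration only (not the Tohana series).
recorded = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
estimated = [72.0, 58.0, 50.0, 41.0, 30.0, 19.0]
print(d_index(recorded, estimated))
```

Repeating the calculation for each distribution and estimation method and picking the smallest value mirrors the selection rule applied to Table 5.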
From Table 5, it was noted that the D-index value of LP3 (MLM) is the minimum when compared with the corresponding values of the other distributions with different parameter estimation methods.

Conclusions
The paper presents the study carried out for estimation of extreme rainfall at Tohana adopting the EV1, EV2, LN2 and LP3 distributions with applicable parameter estimation methods. The following conclusions were drawn from the study:
i) For return periods of 5 years and above, the estimated extreme rainfall by the EV2 (MLM) distribution was consistently higher than the corresponding values of the other three distributions adopted in EVA of rainfall.
ii) There was no significant difference between the estimated extreme rainfall using EV1 (MoM) and EV1 (MLM).
iii) The suitability of the probability distributions was evaluated by GoF (using A2 and KS) and diagnostic (using D-index) tests.
   a) The A2 test results suggest the use of the EV1, LN2 and LP3 distributions for EVA of rainfall.
   b) The KS test results confirm the applicability of the EV1, EV2, LN2 and LP3 distributions for EVA of rainfall.
iv) The D-index value of the LP3 (MLM) distribution was found to be the minimum when compared to the corresponding values of the EV1, EV2 and LN2 distributions.
v) The trend lines of the fitted curves using the LP3 (MLM) distribution show that the estimated extreme rainfall values are within the line of agreement of the recorded data.
vi) On the basis of quantitative and qualitative assessment, it was suggested that the LP3 (MLM) distribution could be used for estimation of extreme rainfall at Tohana.
vii) By considering the design-life of the structure, the 10000-year return period Mean + 1 SE value of 430 mm obtained from the LP3 (MLM) distribution (where Mean denotes the estimated extreme rainfall and SE the Standard Error) is recommended for design purposes.

ACKNOWLEDGEMENTS
The author is grateful to Dr. M.K. Sinha, Director, Central Water and Power Research Station, Pune, for providing the research facilities to carry out the study. The author is thankful to M/s Nuclear Power Corporation of India Limited, Mumbai for supply of rainfall data.

REFERENCES
[1] B. Singh, D. Rajpurohit, A. Vasishth and J. Singh, "Probability analysis for estimation of annual one day maximum rainfall of Jhalarapatan area of Rajasthan, India", Plant Archives, Vol. 12, No. 2, pp. 1093-1100, 2012.
[2] A.N. Celik, "On the distributional parameters used in assessment of the suitability of wind speed probability density functions", Energy Conversion and Management, Vol. 45, No. 11 & 12, pp. 1735-1747, 2004.
[3] AERB, Extreme values of meteorological parameters, Atomic Energy Regulatory Board Safety Guide No. AERB/NF/SG/S-3, 2008.
[4] B.H. Lee, D.J. Ahn, H.G. Kim and Y.C. Ha, "An estimation of the extreme wind speed using the Korea wind map", Renewable Energy, Vol. 42, No. 1, pp. 4-10, 2012.
[5] S.R. Bhakar, A.K. Bansal, N. Chhajed and R.C. Purohit, "Frequency analysis of consecutive days maximum rainfall at Banswara, Rajasthan, India", ARPN Journal of Engineering and Applied Sciences, Vol. 1, No. 1, pp. 64-67, 2006.
[6] B. Saf, F. Dikbas and M. Yasar, "Determination of regional frequency distributions of floods in West Mediterranean River Basins in Turkey", Fresenius Environmental Bulletin, Vol. 16, No. 10, pp. 1300-1308, 2007.
[7] N. Mujere, "Flood frequency analysis using the Gumbel distribution", Journal of Computer Science and Engineering, Vol. 3, No. 7, pp. 2774-2778, 2011.
[8] E. Baratti, A. Montanari, A. Castellarin, J.L. Salinas, A. Viglione and A. Bezzi, "Estimating the flood frequency distribution at seasonal and annual time scales", Hydrology and Earth System Sciences, Vol. 16, No. 12, pp. 4651-4660, 2012.
[9] L.S. Esteves, "Consequences to flood management of using different probability distributions to estimate extreme rainfall", Journal of Environmental Management, Vol. 115, No. 1, pp. 98-105, 2013.
[10] B.A. Olumide, M. Saidu and A. Oluwasesan, "Evaluation of best fit probability distribution models for the prediction of rainfall and runoff volume (Case Study Tagwai Dam, Minna-Nigeria)", Engineering and Technology, Vol. 3, No. 2, pp. 94-98, 2013.
[11] M. Rasel and S.M. Hossain, "Development of rainfall intensity duration frequency equations and curves for seven divisions in Bangladesh", International Journal of Scientific & Engineering Research, Vol. 6, No. 5, pp. 96-101, 2015.
[12] B. Bobee and F. Ashkar, The Gamma family and derived distributions applied in hydrology, Water Resources Publications, 1991.
[13] P.E. Charles Annis, Goodness-of-Fit tests for statistical distributions, [http://www.statisticalengineering.com/goodness.html], 2009.
[14] USWRC, Guidelines for determining flood flow frequency, United States Water Resources Council Bulletin No. 17B, 1981.
[15] A.R. Rao and K.H. Hameed, Flood Frequency Analysis, CRC Publications, Washington, New York, 2000.