(asce)0733 9437(1993)119 3(429)


CRITERIA FOR EVALUATION OF WATERSHED MODELS

Downloaded from ascelibrary.org by UNB - Universidade de Braslia on 05/24/17. Copyright ASCE. For personal use only; all rights reserved.

By the ASCE Task Committee on Definition of Criteria for Evaluation of Watershed Models of the Watershed Management Committee, Irrigation and Drainage Division

ABSTRACT: This report addresses the problem practicing engineers face as they try to evaluate the usefulness of watershed models for solving engineering problems. The report addresses the need for more complete parameter descriptions, unrealistic data needs, documentation, testing, and the lack of uniform criteria for evaluating a model's performance. The report makes recommendations for using some basic statistical measures to describe the performance of the models.

INTRODUCTION

Engineers and scientists depend on technical papers published in journals to keep them abreast of new developments from the research community. Practicing engineers in particular frequently find journal papers lacking in several areas so that they are generally not usable for engineering applications. These areas include:

• Inadequate description of parameters, parameter selection, or discussion of the legitimate range of parameters.
• Unavailability of the types of data needed to set up and run the model (i.e., a model developed for an instrumented research basin).
• Inadequate documentation.
• Inadequate testing of the model over a range of conditions, physiographic regions, and climatological regions.
• Inadequate discussion of how well the model performs either using the test data or in comparison with other models and procedures.

However, journal papers do serve a very useful role in disseminating new ideas and research results to the scientific and engineering community. Often it is only when one tries to use the results of new models or tries to duplicate the published results that the shortcomings of papers published in journals become evident.

Some of the problems listed have been caused in part by the rapid proliferation of computer models dealing with all aspects of hydrology. Each model is justified by the authors by pointing out weaknesses in or lack of existing techniques for addressing a particular hydrologic problem. To some extent, the development of the large number of computer models has been a response to an expanded role of hydrology into such areas as non-point source pollution, atmospheric modeling, and climatic modeling. In other cases it has been the inevitable response to relatively cheap computer time and a graduate student's timetable.
However, in almost all cases, these models have been developed in response to the great potential that simulation provides in studying natural systems; that is, simulation allows a large number of scenarios to be completed in a few minutes as compared with an empirical approach that would literally take decades.

Although there have been a multitude of watershed and hydrologic models developed in the past several decades, there do not appear to be commonly accepted standards for evaluating the reliability of these models. There is a great need to define the criteria for evaluation of watershed models clearly so that potential users have a basis with which they can select the model best suited to their needs. In response to these needs, the Watershed Management Committee of the Irrigation and Drainage Division of ASCE authorized a task committee to define the issues, review the literature, and define criteria that can be used to evaluate models. This report summarizes the committee's work on this subject, and presents recommendations for quantitative criteria to be included with every published paper on development and use of hydrologic and watershed models.

Note. Discussion open until November 1, 1993. To extend the closing date one month, a written request must be filed with the ASCE Manager of Journals. The manuscript for this paper was submitted for review and possible publication on July 23, 1992. This paper is part of the Journal of Irrigation and Drainage Engineering, Vol. 119, No. 3, May/June, 1993. ISSN 0733-9437/93/0003-0429/$1.00 + $.15 per page. Paper No. 4386.

J. Irrig. Drain Eng., 1993, 119(3): 429-442

SCOPE

There are three major limitations to the presentation of hydrologic models and their results in today's technical literature. They can be summarized as follows.

1. When presenting model results, the model developers typically do not provide consistent or standard statistical evaluation criteria to assist the readers or users in determining how well their model reproduces the measured data and how well their model compares to other models.
2. Models are not tested for a wide range of conditions (both physical locations and a wide range of runoff conditions) so that the potential user can evaluate where and when a particular model should be applied and how different models work for different conditions.
3. The models reported in journals are usually not documented well enough for a potential user to apply a model effectively and to evaluate the results.

There is an abundance of methods to evaluate the performance of mathematical models depending on the type of model, data available for testing, and the ultimate purpose for the model. All too often, a model concept is developed in a technical paper, and its success is demonstrated by matching some of the data from which it was developed. Seldom, if ever, does one take the time and effort to test the model thoroughly with data from several sites. Perhaps more important, one seldom seeks data from a variety of soil-topographic-climate regimes that could truly evaluate a model and perhaps set some limits for the conditions under which it will and will not provide accurate predictions.

To some extent the reward system that most of us subscribe to inherently discourages this effort. Some reasons are as follows. First, most model development is accomplished by academicians and government researchers. The reward system for these people is based to a large extent on publishing new research accomplishments, and simple testing of past concepts and models is not considered to be scholarly pursuit.
University tenure and peer reviews have historically given testing and evaluation little weight in an overall evaluation. A second reason is that such work is not as interesting as trying new concepts and models. Finally, good testing is a lot of work that, although useful to other hydrologists, will probably not reinforce previous claims.

Because of these reasons and the fact that practitioners searching the literature for a model to apply to a specific problem have essentially no criteria for evaluating how "good" a model is, this committee is recommending that authors publishing technical papers on the development of a hydrologic model include, at a minimum, a few simple and easily applied statistics. This is the case for statistical and deterministic models and for event and continuous models. This recommendation applies to both snowmelt and rainfall-runoff models.

In addition to the simple statistics that are described in detail in the next section, authors are strongly encouraged to address, in detail, the necessity and procedure for calibration of the model, per se, and for the parameters, in general. The model validation procedures should also be discussed in some detail, especially addressing the changes in model parameters that were necessary to obtain an adequate fit, and what criteria were used to guide the selection of the final values.

Authors are encouraged to present the results of a "blind" application, that is, an application in which the model is used with a data set that is different from the one used in model development. In addition, in a blind application, model parameters and data are not changed from their originally chosen values; that is, the model is tested in the prediction mode, not the fitting mode.

One additional area that authors are encouraged to address is the signal-to-noise concept, which is an awareness that hydrologic data are not measured with 100% accuracy. In many cases the accuracy of the inputs (e.g., rainfall or temperature) is not known, and model results are compared to measured output, usually streamflow, which also has an associated measurement error. What the author needs to address is the effect of inaccuracies of measured parameters on the resultant model simulations. This is in effect a statement of model sensitivity to the data.

Thus, the scope of this task-committee report is an attempt to put the documentation of hydrologic models on more comparable ground. The idea is to present procedures that all researchers can use when publishing their results.

REVIEW OF LITERATURE

The task committee reviewed a large number of papers concerning criteria for evaluating mathematical model performance. In every paper in which results of mathematical modeling are presented, some criteria had been used to evaluate the model's performance. Evaluations almost always involved a comparison of the model's output to some corresponding measured variable.

It was the intent of the literature review to gather those papers published since 1980 that addressed objective and/or statistical methods to evaluate and compare mathematical model performance. Key papers of this kind that were located include: (1) Cavadias and Morin (1986), who suggested combining simulated discharges from two or more models and adopted numerical criteria to evaluate model performance; (2) Cavadias and Morin (1988), who presented a method for calculating confidence intervals for using the mean-squared forecast error to evaluate forecasting models; (3) Green and Stephenson (1986), who discussed the adequacy of using goodness of fit for observed and simulated hydrographs to evaluate model performance; and (4) Troutman (1985), who suggested treating model errors as random variables so that useful and appropriate statistical techniques could be applied. In addition, Burges (1984) proposed using an accurate physically based model to assess the performance of simpler rainfall-runoff models.

Case studies in which two or more models were evaluated and compared with or without objective, explicit criteria were also sought. Key examples of this type of paper include Hawley et al. (1980), Jennings (1976), Loague and Freeze (1985), and Pathak et al. (1984).

Papers that addressed one or more steps of the modeling procedure and the consequences for model performance were also included. Papers concerning calibration procedures, validation procedures, model structure, model parameterization, and data error sources used to calibrate and validate models were collected. Doyle and Miller (1980), Gorgens (1983), Gupta and Sorooshian (1985), Hendrickson et al. (1988), Isabel and Villeneuve (1986), and Schrader et al. (1980) all discussed the effect of calibration procedures on model accuracy and performance. The effect of validation procedures on model accuracy and performance was discussed by Doyle and Miller (1980) and Schrader et al. (1980).

Key papers that addressed the importance of accurate input data include Garklavs and Oberg (1986), concerning calculation of rainfall excess; Hawley et al. (1980) and Rango and Martinec (1981), concerning quality of data used in snowmelt models; and McCuen and Bondelid (1983), concerning Soil Conservation Service (SCS) unit hydrograph peak rate factors. Lusby and Lichty (1983) discussed the difficulty in obtaining accurate infiltration and related soil parameters to use in simulation models. Linsley (1986) provided evidence that the observed data, against which simulated data are compared, are also subject to bias and error. Ditmars et al. (1987), Klemes (1986), and Smith and Amisial (1982) summarized and discussed sources of error in the modeling process.

Papers that addressed individual models applied to a single watershed were not sought. However, several have been collected as examples. Examples of case studies in which a single model was applied to several or more watersheds and the results compared were sought and collected. Abstracts of unpublished dissertations that pertained to evaluating the performance of models, e.g., Chen (1985), Barlas (1985), and Desh-Ashtiani (1986), were collected. These were mostly of a conceptual or theoretical nature but may provide a basis for discussion of novel statistical approaches.

RECOMMENDATIONS

Hydrologic models are used most frequently to simulate or predict flows either on a continuous basis or for a particular event. In all cases the model-computed flow is compared with the measured flow. It is recommended that both visual and statistical comparisons between model-computed and measured flows be made whenever data are presented. The visual comparison, which often takes the form of graphic plots of simulated and observed flows, is a necessary first step in an evaluation. This first step provides a general overview of model performance and provides an overall feeling for model capabilities. When the performance of various models is compared or the performance of a single model is evaluated for different years, a quantitative assessment is needed, which can be met using one or more statistical goodness-of-fit criteria. The appropriate criteria to select will depend on the application; however, it is recommended that the number of goodness-of-fit criteria be kept to a minimum.

Criteria for Continuous Hydrographs

In a recent project conducted by the World Meteorological Organization (WMO) ("Intercomparison" 1986), numerous statistical criteria were proposed for evaluating results of continuous hydrograph modeling. Martinec and Rango (1989) have analyzed results of the WMO project and have recommended that only a very few of the quantitative measures be used in combination with a graphic plot of the simulated and measured hydrographs. Martinec and Rango (1989) recommend that the criteria should be as simple as possible. The deviation of runoff volumes, $D_v$, is one goodness-of-fit criterion:

$$D_v\,(\%) = \frac{V - V'}{V} \times 100 \tag{1}$$

where $V$ = the measured yearly or seasonal runoff volume; and $V'$ = the model-computed yearly or seasonal runoff volume. $D_v$ can take any value; however, the smaller its absolute value, the better the model results are. $D_v$ would equal zero for a perfect model. The use of $D_v$ provides an immediate complement to a visual inspection of the continuous hydrographs.

The second basic goodness-of-fit criterion is the Nash-Sutcliffe coefficient, $R^2$ (Nash and Sutcliffe 1970):

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (Q_i - Q_i')^2}{\sum_{i=1}^{n} (Q_i - \bar{Q})^2} \tag{2}$$

where $Q_i$ = the measured daily discharge; $Q_i'$ = the computed daily discharge; $\bar{Q}$ = the average measured discharge; and $n$ = the number of daily discharge values. The $R^2$ values can vary from 0 to 1, with 1 indicating a perfect fit. Computationally, $R^2$ could be negative, but this becomes rather meaningless as far as interpretation of results is concerned. For $R^2 = 0$, the interpretation can be made that the model is predicting no better than using the average of the observed data. Furthermore, a shortcoming of the Nash-Sutcliffe statistic occurs in periods of low flow. If the daily measured flows approach the average value, the denominator of (2) goes to zero and $R^2$ approaches $-\infty$ with only minor model mispredictions. This statistic works best when the coefficient of variation for the observed data set is large.

The average measured discharge is determined either from the year or period in question (then $R^2$ can also be called the coefficient of determination) or from earlier years as a long-term average. Martinec and Rango (1989) recommend using $\bar{Q}$ from the year or season in question to avoid unrealistically high values of $R^2$ in low runoff years. The $R^2$ value is a measure of how well the daily simulated and measured flows correspond.

A third possible criterion is the coefficient of gain from the daily mean, DG ("Intercomparison" 1986), where DG can vary between 0 and 1, with 1 being a perfect model:

$$DG = 1 - \frac{\sum_{i=1}^{n} (Q_i - Q_i')^2}{\sum_{i=1}^{n} (Q_i - \bar{Q}_i)^2} \tag{3}$$

where $\bar{Q}_i$ = the average measured discharge for each day of the period. As opposed to $R^2$, where the model results are compared to "no model" (or $\bar{Q}$), DG compares model results with daily mean discharge values (or $\bar{Q}_i$), which vary throughout a year or season. This set of $\bar{Q}_i$ values is also called a peasant's model (Nash, unpublished note, 1980) or a seasonal model (Garrick et al. 1978). Both $\bar{Q}$ and $\bar{Q}_i$ are also statistical evaluations and can be considered as primitive models for comparison purposes. It is valuable to know if there are certain years in which a model cannot outperform these primitive models.

Although many other goodness-of-fit criteria are available, it is recommended that for continuous flow modeling the evaluations be limited to no more than $D_v$, $R^2$, and DG. Martinec and Rango (1989) show how these three criteria can be effectively combined. Special situations may require special goodness-of-fit criteria not covered by these three criteria. For such situations, additional criteria are described by WMO ("Intercomparison" 1986).
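As an illustration only, the three continuous-hydrograph criteria of Eqs. (1)-(3) can be sketched in a few lines of Python. The function names and the five-day series below are hypothetical and are not data from this report; the sketch also assumes a constant time step, so that summed daily discharges can stand in for the runoff volumes $V$ and $V'$.

```python
from statistics import fmean


def _sum_sq(a, b):
    """Sum of squared differences between two equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def deviation_of_runoff_volumes(measured_volume, computed_volume):
    """Eq. (1): D_v in percent, from measured V and computed V'."""
    return (measured_volume - computed_volume) / measured_volume * 100.0


def nash_sutcliffe(measured, computed):
    """Eq. (2): R^2, with the mean of the measured series as reference."""
    q_bar = fmean(measured)
    reference = [q_bar] * len(measured)
    return 1.0 - _sum_sq(measured, computed) / _sum_sq(measured, reference)


def coefficient_of_gain(measured, computed, daily_mean):
    """Eq. (3): DG, with the mean discharge for each day as reference."""
    return 1.0 - _sum_sq(measured, computed) / _sum_sq(measured, daily_mean)


# Hypothetical five-day series (illustrative values only).
measured = [10.0, 20.0, 40.0, 30.0, 20.0]
computed = [12.0, 18.0, 38.0, 33.0, 16.0]
daily_mean = [15.0, 25.0, 30.0, 25.0, 15.0]  # assumed long-term mean per day

# Assumption: constant time step, so sums of discharge are proportional to volume.
d_v = deviation_of_runoff_volumes(sum(measured), sum(computed))
r2 = nash_sutcliffe(measured, computed)
dg = coefficient_of_gain(measured, computed, daily_mean)
print(f"Dv = {d_v:.2f}%  R2 = {r2:.3f}  DG = {dg:.3f}")
# prints: Dv = 2.50%  R2 = 0.929  DG = 0.815
```

Eqs. (2) and (3) share the same numerator and differ only in the reference series in the denominator, which is why the sketch uses one helper for both. A computed series that simply repeats the mean of the measured series gives $R^2 = 0$, matching the "no model" interpretation in the text.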

Continuous Hydrograph Examples

As a demonstration of using the performance evaluation criteria recommended for use with continuous event models, snowmelt-runoff simulations were run on the Rio Grande basin near Del Norte, Colo., using the snowmelt-runoff model (SRM) (Martinec and Rango 1989). The Rio Grande basin is in the San Juan Mountains of southwestern Colorado, and has an area of 3,419 km² and an elevation range from 2,432 to 4,215 m. For model input, temperature and precipitation were obtained from Del Norte, and snow-cover data were obtained from Landsat images of the basin. Three general types of vegetation are present in the basin: the mixed conifer-aspen band (up to about 2,900 m), the spruce-fir band (from about 2,900 m to 3,350 m), and the alpine region (above 3,350 m). The basin is typical of the region where a seasonal snowpack starts to accumulate in late October and reaches the seasonal maximum around April 1. The highest daily peak of snowmelt runoff usually occurs in early June.

The relatively simple SRM is similar to many other snowmelt models in that it employs a degree-day approach to calculate snowmelt. It is different from the other snowmelt models in that it employs satellite data to delineate the area of the basin covered by snow where the degree-day algorithm is applied. In the case presented here, the model was applied to simulate daily streamflow for the period April 1-September 30 in two vastly different years. The year 1977 was the all-time recorded low runoff year, the result of an extreme drought; 1979 was a much above normal runoff year with an above average snowpack.

The 1977 and 1979 results are shown in Figs. 1 and 2, respectively. These figures show the measured hydrograph versus the SRM-computed hydrograph along with the three recommended evaluation criteria. For 1977, the $D_v$, $R^2$, and DG values are -7.11%, 0.48, and 0.99, respectively. For 1979, the $D_v$, $R^2$, and DG values are 0.02%, 0.95, and 0.89, respectively.
As can be seen from examination of the $D_v$ and $R^2$ criteria for the two years, model performance is better in the above-normal runoff year (1979) than in the drought year (1977). Examination of the DG criterion and the graphic plots, however, shows that the model simulation in 1977 is also useful despite the decline in the $D_v$ and $R^2$ statistical criteria.

FIG. 1. Snowmelt Runoff Simulation for 1977 on Rio Grande Basin (3,419 km²) above Del Norte, Colo., Using SRM [measured and computed daily hydrographs, April through September; $D_v$ = -7.11%, $R^2$ = 0.48, DG = 0.99]

FIG. 2. Snowmelt Runoff Simulation for 1979 on Rio Grande Basin (3,419 km²) above Del Norte, Colo., Using SRM [measured and computed daily hydrographs, April through September; $D_v$ = 0.02%, $R^2$ = 0.95, DG = 0.89]

Criteria for Single Events

A review of criteria for use in single-event modeling is given by Green and Stephenson (1986). The objectives of single-event modeling are somewhat more diverse than continuous flow modeling, and as a result the potential criteria are more numerous. Generally, the objectives of single-event modeling are the determination of peak flow rate, flow volumes, and hydrograph shape and timing. Green and Stephenson (1986) recommend that peak flow rates be evaluated by using a simple percent error in peak (PEP):

$$\mathrm{PEP} = \frac{Q_{ps} - Q_{po}}{Q_{po}} \times 100 \tag{4}$$

where $Q_{ps}$ = the simulated peak flow rate; and $Q_{po}$ = the observed peak flow rate. For volumetric assessment, a simple comparison using a measure such as $D_v$ is sufficient. For assessing the shape of a simulated hydrograph, a simple sum of squares of the residuals, $G$, is proposed:

$$G = \sum_{i=1}^{n} \left[ Q_o(t_i) - Q_s(t_i) \right]^2 \tag{5}$$

where $Q_o(t)$ = the measured flow rate at time $t$; and $Q_s(t)$ = the simulated flow rate at time $t$. For an overall goodness-of-fit or measure of hydrograph shape from a number of events, the total sum of squared residuals (TSSR) or the total sum of absolute residuals (TSAR) is recommended by Green and Stephenson (1986):

$$\mathrm{TSSR} = \sum_{j=1}^{m} \left\{ \sum_{i=1}^{n} \left[ Q_o(t_i) - Q_s(t_i) \right]^2 \right\}_j \tag{6}$$

where $n$ = the number of pairs of ordinates compared in a single event; and $m$ = the number of events.

$$\mathrm{TSAR} = \sum_{j=1}^{m} \left\{ \sum_{i=1}^{n} \left| Q_o(t_i) - Q_s(t_i) \right| \right\}_j \tag{7}$$

Green and Stephenson also recommended that graphic plots be used in combination with the statistical criteria. In addition to these, it should be emphasized that evaluation of a single event would be unacceptable. Authors must show results for several events.
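The single-event criteria of Eqs. (4)-(7) can be sketched in the same way. This is a minimal illustration, not the procedure used by Green and Stephenson (1986): the function names and the two short events below are hypothetical, and each event is assumed to be a pair of equal-length lists of observed and simulated ordinates on a common time base.

```python
def _check_lengths(observed, simulated):
    """Both hydrographs must be sampled at the same ordinates."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated hydrographs must have equal length")


def percent_error_in_peak(observed, simulated):
    """Eq. (4): PEP in percent, from the simulated and observed peak flows."""
    q_po = max(observed)
    q_ps = max(simulated)
    return (q_ps - q_po) / q_po * 100.0


def sum_squared_residuals(observed, simulated):
    """Eq. (5): G for a single event."""
    _check_lengths(observed, simulated)
    return sum((qo - qs) ** 2 for qo, qs in zip(observed, simulated))


def sum_absolute_residuals(observed, simulated):
    """Inner sum of Eq. (7) for a single event."""
    _check_lengths(observed, simulated)
    return sum(abs(qo - qs) for qo, qs in zip(observed, simulated))


def total_over_events(events, per_event):
    """Outer sum over the m events in Eqs. (6) and (7)."""
    return sum(per_event(obs, sim) for obs, sim in events)


# Two hypothetical events as (observed, simulated) pairs; illustrative values only.
events = [
    ([0.0, 2.0, 4.0, 2.0, 0.0], [0.0, 1.0, 3.0, 3.0, 1.0]),
    ([0.0, 1.0, 2.0, 1.0], [0.0, 2.0, 2.0, 0.0]),
]

pep = [percent_error_in_peak(obs, sim) for obs, sim in events]
tssr = total_over_events(events, sum_squared_residuals)   # Eq. (6)
tsar = total_over_events(events, sum_absolute_residuals)  # Eq. (7)
print(pep, tssr, tsar)
# prints: [-25.0, 0.0] 6.0 6.0
```

In this sketch PEP compares the largest ordinate of each hydrograph regardless of when it occurs, so it is insensitive to timing, whereas $G$, TSSR, and TSAR compare ordinates at matching times and therefore depend on how the two hydrographs are aligned.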

Example of Single-Event Model Criteria

As a demonstration example, all five criteria suggested for single-event models, plus the $R^2$ criterion for continuous-event models, were applied to data from a now abandoned U.S. Department of Agriculture-Agricultural Research Service (USDA-ARS) rangeland watershed west of Albuquerque, N.M. This watershed (designated as 47.002) (WSII) has an area of 16.4 ha (40.5 acres) and two primary soil types: a sandy loam (65% of the area) and a clay loam (35% of the area). There is only about 20% cover on most of the area. Runoff results from short-duration, high-intensity thunderstorms that occur during the summer monsoon period.

Although the watershed has been modeled using complex distributed-parameter, kinematic-wave routing models, a simple linear-reservoir hydrologic-routing model was applied for this example. This simple model was a reasonable choice compared to other techniques (Sabol and Ward 1985). The model requires Green and Ampt infiltration equation parameters, time of concentration, and rainfall interception, which were estimated as hydraulic conductivity of 0.98 cm/h (0.385 in./hr), capillary suction of 1.08 cm (2.74 in.), soil porosity of 0.55, initial soil saturation of 0.31, time of concentration of 9 min (Kirpich 1940), and rainfall interception of 0.004 cm (0.01 in.).

Ward (1985) applied the model to seven runoff events. For application of the evaluation criteria, only three events were modeled as shown in Table 1. As an indicator of the severity of the storms, the back-calculated values of the SCS curve number were in the range of 85-95. There are some timing differences between measured and simulated runoff that are considered in the application of the criteria. More information about the storms and resultant runoffs can be found in USDA-ARS published data ("Hydrologic Data" 1973).

TABLE 1. Characteristics of Test Storms

Event date (1)      Rainfall (mm) (2)   Duration (min) (3)   Peak intensity (mm/h) (4)
August 24, 1957     31.8                60                   137.2
June 10, 1966       30.5                35                   193.0
August 13, 1967     30.0                50                   144.8

Results

An independent test of the model was made. The results are listed in Table 2. No attempt was made to adjust the infiltration or other parameters to produce a better fit to the data because this was a blind application of the model to the data. The measured and computed hydrographs were discretized into 1-min time steps for consistent comparisons. The hydrographs for the events are plotted in Figs. 3-5. The various model evaluation criteria were computed for these hydrographs, as listed in Table 3.

In Table 3, the zero time is set at the start of rainfall. This choice of synchronization time can lead to mismatch of the peak flows, as seen in Figs. 3 and 4. Because the $G$, TSAR, and $R^2$ criteria are dependent on a realistic synchronization, different time-shifting approaches were investigated. It is recommended that if there is an apparent synchronization error (Green and Stephenson 1986), then the measured and computed hydrographs should be synchronized to match the peak (or largest peak) flow. Table 4 demonstrates the effect of synchronization on the various criteria affected by time shifts.

Because the $R^2$ criterion measures magnitude and timing errors, a modification was made to the application of the criterion. Before the $R^2$ criterion was applied to the hydrographs, the measured and the computed hydrographs were converted to unit hydrographs. The shifted unit hydrographs were then subjected to the $R^2$ criterion as listed in the last column of Table 4. By unitizing the hydrographs, the effect of errors in computed and measured volumes is avoided.

This example indicates the importance of synchronizing the measured and computed hydrographs before time-based criteria are applied. The example also demonstrates that unitizing the hydrographs before they are compared can lead to significantly different and perhaps more comparable results. If a computed hydrograph is time-shifted before comparison, the nature and rationale for the time shift should be clearly stated.

TABLE 2. Measured and Simulated Peak Flows and Runoff Volumes

                    Peak Discharge (cm/s)            Runoff Volume (1,000 cm³)
Event date (1)      Measured (2)    Simulated (3)    Measured (4)    Simulated (5)
August 24, 1957     3.23            1.76             3.17            1.63
June 10, 1966       2.21            2.94             1.62            1.58
August 13, 1967     2.61            1.84             2.29            1.69

FIG. 3. Rainfall-Runoff Simulation for June 10, 1966, Storm; ARS Watershed 47.002 near Albuquerque, N.M. [measured and predicted hydrographs versus time since start of rainfall (minutes)]

FIG. 4. Rainfall-Runoff Simulation for August 13, 1967, Storm; ARS Watershed 47.002 near Albuquerque, N.M. [measured and predicted hydrographs versus time since start of rainfall (minutes)]

FIG. 5. Rainfall-Runoff Simulation for August 24, 1957, Storm; ARS Watershed 47.002 near Albuquerque, N.M. [measured and predicted hydrographs versus time since start of rainfall (minutes)]

TABLE 3. Calculated Evaluation Criteria Values: Time Synchronized on Start of Rainfall

Event date (1)      PEP (%) (2)   DV (%) (3)   G^a [(cm/s)²] (4)   TSAR (cm/s) (5)   R² (6)
August 24, 1957     -45.2         -48.6        17.5                28.3              0.58
June 10, 1966       33.3          42.5         11.8                19.9              0.54
August 13, 1967     -29.3         -26.3        26.7                28.1              0.33
Total                                          56.0                76.3

^a Total of G values is TSSR.

TABLE 4. Calculated Evaluation Criteria Values: Time Synchronized on Peak Flow; Hydrographs Shifted as Indicated

                                                                                   R²
Event date (1)      Shift (min) (2)   G^a [(cm/s)²] (3)   TSAR [m³/s (cfs)] (4)   Shift^b (5)   Unit^c (6)
August 24, 1957                       14.9                0.7277 (25.7)           0.65          0.80
June 10, 1966                         11.8                0.5635 (19.9)           0.54          0.85
August 13, 1967                       5.7                 0.3058 (10.8)           0.86          0.95
Total                                 32.4                1.5971 (56.4)

^a Total of G values is TSSR.
^b Time-shifted hydrographs.
^c Unit hydrographs.

RECOMMENDATIONS FOR IMPLEMENTATION

It is not enough for a group such as this task committee to write and perhaps publish a report. If indeed this subject is of value, there need to be specific steps taken to see that the recommendations are carried out. One way this can be done is to make use of the recommended criteria mandatory, in the same manner that journals require use of the SI units. This could be done at least theoretically for ASCE journals, but we as an ASCE task committee really have no say in journals published by other societies, government groups, and universities. There is, of course, a danger in requiring a minimum of statistical criteria in that the author may choose an alternative journal rather than be bothered by the additional work or risk of exposing his or her model to quantitative scrutiny.

However, there are routes we can take to expand this simple requirement to include other journals and organizations. Many of us are members of more than one society, and we can at least attempt to similarly influence their editorial boards. The same may be possible for government agencies that have a major impact on hydrology. Beyond this we can try to influence journal and handbook editors to direct peer reviewers to look for statistical evaluation in a manuscript, and if not present, recommend it be added before a paper can be accepted. This is perhaps the best route to take for implementing the other recommended criteria (blind tests, signal-to-noise tests, and calibration), especially if these fall into the desirable but not mandatory case.

Certainly, government agencies that publish handbooks and manuals can make certain basic statistical and testing criteria mandatory for a model to be published in a specific series. In fact, each agency could have a prestige series that would ensure that certain statistics, testing, and documentation have been accomplished.

SUMMARY AND CONCLUSIONS

Currently, the evaluation of watershed models is subjective. Uniform criteria have not been proposed for comparing the performance of various models in simulating runoff hydrographs. After an extensive literature review, the committee has determined that statistical criteria are available for more objectively assessing model performance. In addition to a graphical plot, the committee proposes three simple evaluation criteria for continuous-hydrograph modeling and four for single-event modeling, as follows.

Continuous-hydrograph modeling:
1. Deviation of runoff volumes, Dv
2. Nash-Sutcliffe coefficient, R²
3. Coefficient of gain from the daily mean, DG

Single-event hydrograph modeling:
1. Simple percent error in peak, PEP
2. Sum of squared residuals, G
3. Total sum of squared residuals, TSSR
4. Total sum of absolute residuals, TSAR

Examples have been prepared that show how combinations of these criteria can be applied for continuous snowmelt runoff and single-event, rainfall-runoff hydrographs.

ACKNOWLEDGMENTS

The Watershed Management Committee of the ASCE Irrigation and Drainage Division established the Task Committee on Definition of Criteria for Evaluation of Watershed Models. This report is the result of the committee's efforts which included chair Chuck Leaf, Ted Engman, E. Bruce Jones, Al Rango, Tim J. Ward, and Steven Van Vactor, who aided in the literature search and compiled the annotated bibliography used to support the recommendations in this paper. (Note: There is a limited number of copies of the annotated bibliography developed as background for this paper. Individual copies may be obtained by writing to Ted Engman, Code 974, NASA/Goddard Space Flight Center, Greenbelt, MD 20771.)

APPENDIX. REFERENCES

Barlas, Y. (1985). "Validation of system dynamics models with a sequential procedure involving multiple quantitative methods," PhD thesis, Georgia Institute of Technology, Atlanta, Ga.
Burges, S. J. (1984). "Rainfall-runoff model validation: The need for unambiguous tests." Bridge between control science and technology (Proc. 9th Triennial World Congress IFAC), Budapest, Hungary, Jul. 2-6.
Cavadias, G., and Morin, G. (1986). "Combination of simulated discharges of hydrological models: Application of the WMO intercomparison of conceptual models of snowmelt runoff." Nordic Hydrol., 17(1), 21-30.
Cavadias, G., and Morin, G. (1988). "Approximate confidence intervals for verification criteria of the WMO intercomparison of snowmelt runoff models." Hydrol. Sci. J., 33(4), 69-77.
Chen, B.-C. H. (1985). "A statistical validation procedure for discrete simulation models over experimental regions," PhD thesis, Syracuse University, Syracuse, N.Y.
Desh-Ashtiani, M. (1986). "Confidence interval estimation of the steady-state mean value of a simulation output process," PhD dissertation, University of Southern California, Los Angeles, Calif.
Ditmars, J. D., Adams, E. E., Bedford, K. W., and Ford, D. E. (1987). "Performance evaluation of surface water transport and dispersion models." J. Hydr. Engrg., ASCE, 113(8), 961-980.
Doyle, W. H., and Miller, J. E. (1980). "Calibration of a distributed routing rainfall-runoff model at four urban sites near Miami, Florida." Water Resources Investigations Report 80-1, U.S. Geological Survey, St. Louis, Mo.
Garklavs, G., and Oberg, K. A. (1986). "Effect of rainfall excess calculations on modeled hydrograph accuracy and unit-hydrograph parameters." Water Resour. Bull., 22(4), 565-572.
Garrick, M., Cunnane, C., and Nash, J. E. (1978). "A criterion of efficiency for rainfall-runoff models." J. Hydrol., 36, 375-381.
Gorgens, A. H. M. (1983). "Reliability of calibration of a monthly rainfall-runoff model: The semi-arid case." Hydrol. Sci. J., 28(4), 485-498.
Green, I. R. A., and Stephenson, D. (1986). "Criteria for comparison of single event models." Hydrol. Sci. J., 31(3), 395-411.
Gupta, V. K., and Sorooshian, S. (1985). "The automatic calibration of conceptual catchment models using derivative-based optimization algorithms." Water Resour. Res., 21(4), 473-485.
Hawley, M. E., McCuen, R. H., and Rango, A. (1980). "Comparison of models for forecasting snowmelt runoff volumes." Water Resour. Bull., 16(5), 914-920.
Hendrickson, J. D., Sorooshian, S., and Brazil, L. E. (1988). "Comparison of Newton-type and direct search algorithms for calibration of conceptual rainfall-runoff models." Water Resour. Bull., 24(5), 691-700.
"Hydrologic data for experimental agricultural watersheds in the U.S." (1973). USDA Miscellaneous Publication 1420, U.S. Dept. of Agric., Washington, D.C.
"Intercomparison of models of snowmelt runoff." (1986). Operational Hydrology Report No. 23, World Meteorological Organization, Geneva, Switzerland.
Isabel, D., and Villeneuve, J. P. (1986). "Importance of the convergence criterion in the automatic calibration of hydrologic models." Water Resour. Res., 22(10), 1367-1370.
Jennings, M. E. (1976). "Comparison of the predictive accuracy of models of urban flow and water-quality processes." Proc. Nat. Symp. on Urban Hydrol., Hydr. and Sediment Control, B. J. Barfield, ed., Lexington, Ky., Jul. 27-29.
Klemes, V. (1986). "Operational testing of hydrologic simulation models." Hydrol. Sci. J., 31(1), 13-24.


Kirpich, P. Z. (1940). "Time of concentration of small agricultural watersheds." Civ. Engrg., ASCE, 10(6), 362.
Linsley, R. K. (1986). "Flood estimates: How good are they?" Water Resour. Res., 22(9), 159S-164S.
Loague, K. M., and Freeze, R. A. (1985). "A comparison of rainfall-runoff modeling techniques on upland catchments." Water Resour. Res., 21(2), 229-248.
Lusby, G. C., and Lichty, R. W. (1983). "Use of rainfall-simulator data in precipitation-runoff modeling studies." Water Resources Investigations Report 83-4159, U.S. Geological Survey, Denver, Colo.
Martinec, J., and Rango, A. (1989). "Merits of statistical criteria for the performance of hydrological models." Water Resour. Bull., 25(2), 421-432.
McCuen, R. H., and Bondelid, T. R. (1983). "Estimating unit hydrograph peak rate factors." J. Irrig. Drain. Engrg., ASCE, 109(2), 238-250.
Nash, J. E., and Sutcliffe, J. V. (1970). "River flow forecasting through conceptual models, part 1 - A discussion of principles." J. Hydrol., 10(3), 282-290.
Pathak, C. S., Crow, F. R., and Bengtson, R. L. (1984). "Comparative performance of two runoff models on grassland watersheds." Trans., American Society of Agricultural Engineers, 27(2), 397-402.
Rango, A., and Martinec, J. (1981). "Accuracy of snowmelt runoff simulation." Nordic Hydrol., 12(4/5), 265-274.
Sabol, G. V., and Ward, T. J. (1985). "Santa Barbara hydrograph with Green-Ampt infiltration." Proc. 1985 ASCE Watershed Management Symp.--Watershed Management in the Eighties, ASCE, 84-91.
Shrader, M. L., McCuen, R. H., and Rawls, W. J. (1980). "The effect of data independence in model calibration and model testing." Water Resour. Bull., 16(1), 49-55.
Smith, R. A., and Amisial, R. A. (1982). "Comparative analysis of various rainfall-runoff models." Int. Symp. on Hydrometeorology, American Water Resources Association, Denver, Colo., Jun. 13-17.
Troutman, B. M. (1985). "Errors and parameter estimation in precipitation-runoff modeling: I. Theory." Water Resour. Res., 21(8), 1195-1213.
Ward, T. J. (1985). "Watershed modeling using 'standard' Green-Ampt infiltration parameters--a case study," presented at the workshop A National Survey: Selected Problems and Solutions in Applied Hydrology and Hydrogeology, American Institute of Hydrology, Minneapolis, Minn., May 16-17.
