US Department of Commerce



Journal of Hydrology 418–419 (2012) 17–48; doi:10.1016/j.jhydrol.2011.08.056


Phase 2 of the Distributed Model Intercomparison Project (DMIP 2) was formulated primarily as a mechanism to help guide the US National Weather Service (NWS) as it expands its use of spatially distributed watershed models for operational river, flash flood, and water resources forecasting. The overall purpose of DMIP 2 was to test many distributed models with operational quality data with a view towards meeting NWS operational forecasting needs. At the same time, DMIP 2 was formulated as an experiment that could be leveraged by the broader scientific community as a platform for testing, evaluating, and improving the science of spatially distributed models.

This paper presents the key results of the DMIP 2 experiments conducted for the Oklahoma region, which included comparison of lumped and distributed model simulations generated with uncalibrated and calibrated parameters, water balance tests, routing and soil moisture tests, and simulations at interior locations. Simulations from 14 independent groups and 16 models are analyzed. As in DMIP 1, the participant simulations were evaluated against observed hourly streamflow data and compared with simulations generated by the NWS operational lumped model. A wide range of statistical measures is used to evaluate model performance on both a run-period and an event basis. A noteworthy improvement in DMIP 2 was the combined use of two lumped models to form the benchmark for event improvement statistics, where improvement was measured in terms of runoff volume, peak flow, and peak timing for between 20 and 40 events in each basin.

Results indicate that spatially distributed models calibrated to perform well at basin outlets generally also perform well at interior points whose drainage areas span a wide range of scales. Two of the models provided reasonable estimates of soil moisture versus depth over a wide geographic domain and through a period containing two severe droughts. In several parent and interior basins, a few uncalibrated spatially distributed models achieved better goodness-of-fit statistics than other calibrated distributed models, highlighting the strength of those model structures combined with their a priori parameters. In general, calibration at basin outlets alone did not greatly improve relative model performance beyond that established using uncalibrated a priori parameters. Further, results from the experiment for returning DMIP 1 participants reinforce the need for stationary data for model calibration: in some cases, the improvements of distributed models over lumped models were not realized when the models were calibrated using inconsistent precipitation data from DMIP 1. Event-averaged improvement of distributed models over the combined lumped benchmark was measured in terms of runoff volume, peak flow, and peak timing for between 20 and 40 events. The percentages of model-basin pairs showing positive distributed model improvement at basin outlets and interior points were 18%, 24%, and 28%, respectively, for these three quantities, compared with 14%, 33%, and 22% in DMIP 1. While this may not seem a large gain over DMIP 1, the DMIP 2 values were based on more precipitation–runoff events, more model-basin combinations (148 versus 51), more interior ungauged points (9 versus 3), and a benchmark comprising two lumped model simulations.
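The notion of "positive distributed model improvement" over a lumped benchmark can be sketched as follows. This is an illustrative assumption, not the paper's exact formulation: improvement here is taken as the percent reduction in absolute event error (e.g., in peak flow) relative to the lumped benchmark, averaged over events; the function names and example values are hypothetical.

```python
# Illustrative sketch (assumed formulation, not the DMIP 2 definition):
# percent improvement of a distributed model over a lumped benchmark,
# computed per event and averaged across events for one quantity
# such as peak flow.

def event_improvement(obs, lumped, distributed):
    """Percent reduction in absolute error relative to the lumped benchmark."""
    err_lumped = abs(lumped - obs)
    err_dist = abs(distributed - obs)
    if err_lumped == 0.0:
        return 0.0  # benchmark already exact; no improvement defined
    return 100.0 * (err_lumped - err_dist) / err_lumped

def average_improvement(events):
    """Event-averaged improvement over (obs, lumped, distributed) tuples."""
    scores = [event_improvement(o, l, d) for o, l, d in events]
    return sum(scores) / len(scores)

# Hypothetical peak flows (m^3/s) for three events:
# (observed, lumped benchmark, distributed model)
events = [(120.0, 90.0, 110.0), (80.0, 95.0, 85.0), (200.0, 170.0, 160.0)]
score = average_improvement(events)
# A positive score marks this model-basin pair as showing
# "positive distributed model improvement" for this quantity.
print(score > 0)
```

Under this sketch, a model-basin pair counts toward the reported percentages (e.g., the 24% for peak flow) whenever its event-averaged score is positive for that quantity.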

In addition, we propose a set of statistical measures that can be used to guide the calibration of distributed and lumped models for operational forecasting.