Agronomy and Horticulture Department



Published in Field Crops Research 254 (2020) 107825 doi:10.1016/j.fcr.2020.107825


Copyright © 2020 Elsevier B.V. Used by permission.


Large databases containing producer field-level yield and management records can be used to identify causes of yield gaps. A relevant question is how to account for the diverse biophysical background (i.e., climate and soil) across fields and years, which can confound the effect of a given management practice on yield. Here we evaluated two approaches to group producer fields based on biophysical attributes: (i) a technology extrapolation domain spatial framework (‘TEDs’) that delineates regions with similar (long-term average) annual weather and soil water storage capacity and (ii) clusters based on field-specific soil properties and weather during each crop phase in each year. As a case study, we used yield and management data collected from 3462 rainfed fields sown with soybean across the North Central US (NC-US) during four growing seasons (2014–2017). In the TED approach, each field was assigned to one of 18 TEDs according to its geographic location. In the cluster approach, fields were grouped into clusters based on similarity of in-season weather and soil. To evaluate how the number of clusters would affect the results, fields were grouped separately into 5, 10, 18, and 30 clusters. The two stratification approaches (TEDs and clusters) were compared on their ability to explain the observed yield variation and the yield response to key management factors (sowing date and foliar fungicide and/or insecticide application). Failing to stratify producer fields by their biophysical background ignored management by environment (M×E) interactions, leading to spurious relationships and results that are not relevant at the local level. In the case of the cluster approach, a fine stratification (18 and 30 clusters) explained a larger portion of the yield variance compared with a coarse stratification (5 and 10 clusters).
However, for our case study in the NC-US region, we did not find strong evidence that the data-rich clustering approach outperformed the TEDs in its ability to explain yield variation and identify M×E interactions. Only the stratification into 30 clusters showed a slightly improved ability to explain yield variation compared with the TEDs. In addition, the clustering approach had important trade-offs, including large data requirements and difficulty scaling results to different regions and over time. The choice of the stratification method should be based on objectives, data availability, and expected variation in yield due to erratic weather across regions and years.
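
As an illustration of what "explaining yield variation" by stratification can mean, the following minimal sketch computes the share of total yield variance accounted for by group means (the between-group sum of squares divided by the total sum of squares). This is an assumed, simplified metric, not necessarily the statistic used in the study; the function name `variance_explained`, the yield values, and the group labels are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def variance_explained(yields, groups):
    """Share of total yield variance accounted for by group means."""
    if len(yields) != len(groups):
        raise ValueError("yields and groups must have the same length")
    grand_mean = mean(yields)
    total_ss = sum((y - grand_mean) ** 2 for y in yields)
    if total_ss == 0:
        return 0.0  # no yield variation to explain

    # Collect the yields that fall in each stratum (TED or cluster).
    by_group = defaultdict(list)
    for y, g in zip(yields, groups):
        by_group[g].append(y)

    # Between-group sum of squares: group size times the squared
    # deviation of the group mean from the grand mean.
    between_ss = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in by_group.values()
    )
    return between_ss / total_ss


# Hypothetical yields (Mg/ha) for six fields under two made-up stratifications.
yields = [3.0, 3.2, 4.0, 4.2, 5.0, 5.2]
coarse = ["A", "A", "A", "B", "B", "B"]  # two strata
fine = ["a", "a", "b", "b", "c", "c"]    # three strata

coarse_r2 = variance_explained(yields, coarse)
fine_r2 = variance_explained(yields, fine)
print(f"coarse: {coarse_r2:.3f}, fine: {fine_r2:.3f}")
```

Splitting existing strata into finer ones can never lower this raw share, so a fair comparison between stratifications with different numbers of groups would need to account for the number of groups, for example with an adjusted statistic or cross-validation.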