Statistics, Department of



Ray, S., Jarquin, D., & Howard, R. (2022). Comparing artificial-intelligence techniques with state-of-the-art parametric prediction models for predicting soybean traits. The Plant Genome, e20263.


Open access.


Soybean [Glycine max (L.) Merr.] is a significant source of protein and oil and is also widely used as animal feed. Thus, developing lines that are superior in yield, protein, and oil content is important for feeding the ever-growing population. Compared with high-cost phenotyping, genotyping is both cost- and time-efficient for breeders, because evaluating new lines in different environments (location–year combinations) is expensive. Several genomic prediction (GP) methods have been developed to use marker and environment data effectively to predict yield or other relevant phenotypic traits of crops. Our study compares four methods in terms of their accuracy in predicting grain yield, oil, and protein: a conventional GP method (genomic best linear unbiased predictor [GBLUP]), a kernel method (Gaussian kernel [GK]), an artificial-intelligence (AI) method (deep learning [DL]), and a hybrid method that emulates a DL model using a kernel method (an arc-cosine kernel [AK]). The data come from the soybean nested association mapping experiment (1,379 genotypes tested in six environments, all genotypes in all environments). The relative performance of the four methods varied with the response variable and with whether the model included genotype × environment interaction (G×E) effects. GBLUP consistently showed better performance, whereas GK and AK followed a pattern similar to that of GBLUP, and DL performed slightly worse than the other three methods in most cases; however, this may also be attributable to suboptimal hyperparameters. The DL method performed notably worse than the other three methods in the presence of G×E effects.
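
To make the three kernel-based methods more concrete, the following is a minimal sketch, not the authors' implementation, of how the corresponding similarity matrices between genotypes might be built from a marker matrix. It assumes markers coded 0/1/2, a standardized cross-product for the GBLUP relationship matrix, a Gaussian kernel scaled by the median squared distance with a hypothetical `bandwidth` parameter, and the single-layer arc-cosine kernel of order 1. The function names and the toy data are illustrative only.

```python
import numpy as np


def linear_kernel(X):
    """GBLUP-style relationship matrix G = Z Z^T / p (assumed form).

    Z is the marker matrix with each column centered and scaled;
    monomorphic (zero-variance) markers are dropped.
    """
    Z = X - X.mean(axis=0)
    sd = Z.std(axis=0)
    keep = sd > 0
    Z = Z[:, keep] / sd[keep]
    return Z @ Z.T / Z.shape[1]


def gaussian_kernel(X, bandwidth=1.0):
    """Gaussian kernel exp(-bandwidth * d_ij^2 / median(d^2)).

    `bandwidth` is a hypothetical tuning parameter for this sketch.
    """
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    scale = np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-bandwidth * d2 / scale)


def arc_cosine_kernel(X):
    """Arc-cosine kernel of order 1 with one hidden layer.

    AK_ij = (1/pi) * |x_i| * |x_j| * (sin(t) + (pi - t) * cos(t)),
    where t is the angle between marker vectors x_i and x_j.
    """
    norms = np.linalg.norm(X, axis=1)
    outer = np.outer(norms, norms)
    cos_t = np.clip(X @ X.T / outer, -1.0, 1.0)
    t = np.arccos(cos_t)
    return outer * (np.sin(t) + (np.pi - t) * cos_t) / np.pi


# Toy data (made up): 6 genotypes scored at 50 markers coded 0/1/2.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(6, 50)).astype(float)

G = linear_kernel(X)       # covariance structure assumed for GBLUP
GK = gaussian_kernel(X)    # Gaussian kernel
AK = arc_cosine_kernel(X)  # arc-cosine kernel emulating one hidden layer
```

In a kernel-based prediction model, each of these matrices would typically play the role of the covariance structure of the genotype effects. Deeper arc-cosine kernels can be obtained by applying the same formula recursively to the previous layer's kernel, which this sketch omits.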