Selection of the Bandwidth Parameter in a Bayesian Kernel Regression Model for Genomic-Enabled Prediction

abstract

  • One of the most widely used kernel functions in genomic-enabled prediction is the Gaussian kernel. Selection of the bandwidth parameter for kernel regression has generally been based on cross-validation. We propose a Bayesian method for estimating the bandwidth parameter h of a Gaussian kernel as the modal component of the joint posterior distribution of h and the form parameter. We present a theory for the Bayesian selection of h in a Transformed Gaussian Kernel (TGK) model and its application in two plant breeding datasets (maize and wheat) that were already predicted using the kernel averaging (KA) model in the context of Reproducing Kernel Hilbert Spaces (RKHS KA). We also compared the prediction accuracy of the proposed method with a model that also uses a Gaussian kernel and estimates the bandwidth parameter using a restricted maximum likelihood method (GK REML). Results for the wheat dataset show that the predictive ability of TGK was at least as good as the predictive ability of model RKHS KA, with TGK showing a significantly smaller Predictive Mean Squared Error (PMSE) than the other two approaches. The TGK model was a statistically better predictor than methods GK REML and RKHS KA in terms of mean PMSE and mean correlations in seven (out of 17) trait-environment combinations in the wheat dataset. Fewer differences were found between models for the maize data; the TGK model generally had prediction accuracy similar or inferior to that of GK REML and RKHS KA in various analyses. The superiority of GK REML over TGK based on mean PMSE was clear in seven maize traits.
  • One of the most widely used kernel functions in genomic-enabled prediction is the Gaussian kernel. Usually, selection of the bandwidth parameter for kernel regression is based on cross-validation. In this study, we propose a Bayesian method for selecting the bandwidth parameter h of a Gaussian kernel as the mode of its posterior distribution. We present a theory for the Bayesian selection of h in a Transformed Gaussian Kernel (TGK) model and its application in two genomic plant breeding data sets (maize and wheat) that were already predicted using the kernel averaging (KA) method within the context of Reproducing Kernel Hilbert Spaces (RKHS KA). We also compared the prediction accuracy of the proposed method (TGK) with a model that uses a Gaussian kernel (GK) and estimates the bandwidth parameter using a restricted maximum likelihood method (GK REML). Results for the wheat data set show that the predictive ability of TGK was on average 3% higher than the predictive ability of model RKHS KA, with TGK showing a smaller Predictive Mean Squared Error (PMSE) than the other two approaches. The advantages of the TGK model over GK REML in terms of PMSE were clear for one trait in nine environments. For the maize data set, the TGK model had slightly better prediction accuracy than methods RKHS KA and GK REML.
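
For orientation, the cross-validation baseline that both abstracts contrast with the Bayesian approach can be sketched as follows. This is an illustrative sketch only, not the TGK, GK REML, or RKHS KA procedures. Several things here are assumptions that do not come from the abstracts: the kernel parameterization K[i, j] = exp(-h * d_ij^2), the scaling of squared distances by their median, the fixed ridge penalty `lam`, and the hypothetical function names `gaussian_kernel`, `cv_pmse`, and `select_bandwidth_cv`.

```python
import numpy as np


def gaussian_kernel(X, h):
    """Gaussian kernel K[i, j] = exp(-h * d_ij^2) from a marker matrix X.

    Assumption: squared Euclidean distances are divided by their median
    off-diagonal value so that h is on a comparable scale across data sets.
    """
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # clip tiny negative values from round-off
    d2 = d2 / np.median(d2[np.triu_indices_from(d2, k=1)])
    return np.exp(-h * d2)


def cv_pmse(K, y, lam=1.0, n_folds=5, seed=0):
    """Mean squared prediction error of kernel ridge regression under k-fold CV."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    errors = []
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        K_tt = K[np.ix_(train, train)]
        # Solve (K_tt + lam * I) alpha = y_train, then predict the held-out lines.
        alpha = np.linalg.solve(K_tt + lam * np.eye(len(train)), y[train])
        pred = K[np.ix_(test, train)] @ alpha
        errors.append(np.mean((y[test] - pred) ** 2))
    return float(np.mean(errors))


def select_bandwidth_cv(X, y, grid, lam=1.0, n_folds=5, seed=0):
    """Return the h in `grid` with the smallest cross-validated PMSE, plus all scores."""
    scores = [cv_pmse(gaussian_kernel(X, h), y, lam, n_folds, seed) for h in grid]
    return grid[int(np.argmin(scores))], scores
```

A call such as `select_bandwidth_cv(X, y, [0.1, 0.5, 1.0, 2.0])` returns the grid value with the lowest cross-validated PMSE. The grid search refits the model once per candidate h and per fold, which is the cost that estimating h from a posterior distribution or by REML avoids.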

publication date

  • 2015