Genome-wide prediction of discrete traits using bayesian regressions and machine learning

biomed - Oscar González-Recio, Selma Forni


12 pages

English


Description

Background: Genomic selection has gained much attention; its main goal is to increase predictive accuracy and genetic gain in livestock using dense marker information. Most methods dealing with the large p (number of covariates), small n (number of observations) problem have dealt only with continuous traits, but many important traits in livestock are recorded in a discrete fashion (e.g. pregnancy outcome, disease resistance). It is necessary to evaluate alternatives to analyze discrete traits in a genome-wide prediction context.

Methods: This study presents two threshold versions of Bayesian regressions (Bayes A and Bayesian LASSO) and two machine learning algorithms (boosting and random forest) to analyze discrete traits in a genome-wide prediction context. These methods were evaluated using simulated and field data to predict yet-to-be observed records. Performances were compared based on the models' predictive ability.

Results: The simulation showed that machine learning had some advantages over Bayesian regressions when a small number of QTL regulated the trait under pure additivity. However, differences were small and disappeared with a large number of QTL. Bayesian threshold LASSO and boosting achieved the highest accuracies, whereas Random Forest presented the highest classification performance. Random Forest was the most consistent method in detecting resistant and susceptible animals; its phi correlation was up to 81% greater than that of the Bayesian regressions. Random Forest outperformed other methods in correctly classifying resistant and susceptible animals in the two pure swine lines evaluated. Boosting and Bayes A were more accurate with crossbred data.

Conclusions: The results of this study suggest that the best method for genome-wide prediction may depend on the genetic basis of the population analyzed. All methods were less accurate at correctly classifying intermediate animals than extreme animals. Among the different alternatives proposed to analyze discrete traits, machine learning showed some advantages over Bayesian regressions. Boosting with a pseudo Huber loss function showed high accuracy, whereas Random Forest produced more consistent results and an interesting predictive ability. Nonetheless, the best method may be case-dependent, and an initial evaluation of different methods is recommended to deal with a particular problem.
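The abstract compares classifiers by their phi correlation between observed and predicted classes. As a minimal sketch of that metric, assuming a binary resistant/susceptible coding (1/0), the standard phi coefficient of a 2x2 confusion table can be computed as below. The function name and the example labels are hypothetical illustrations, not data or code from the paper, and the excerpt does not state how the authors handled intermediate animals.

```python
import math


def phi_coefficient(y_true, y_pred):
    """Phi correlation between two binary (0/1) label sequences.

    Computed from the 2x2 confusion table as
    (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when a row or column of the table is empty,
    because the coefficient is undefined in that case.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical labels: 1 = susceptible, 0 = resistant.
observed = [1, 1, 0, 0]
predicted = [1, 0, 0, 0]
print(phi_coefficient(observed, predicted))
```

A value of 1 means perfect agreement, 0 means no association, and -1 means every prediction is inverted, which makes the coefficient more informative than raw accuracy when the two classes are unbalanced.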

Information

Published by	biomed
Published on	01 January 2011
Number of reads	0
Language	English

Excerpt

González-Recio and Forni Genetics Selection Evolution 2011, 43 :7 http://www.gsejournal.org/content/43/1/7

Genetics Selection Evolution

RESEARCH, Open Access

Genome-wide prediction of discrete traits using bayesian regressions and machine learning

Oscar González-Recio 1*, Selma Forni 2


Background

The availability of thousands of markers from high-throughput genotyping platforms offers an exciting prospect to predict the outcome of complex traits in animal breeding using genomic information (the so-called genomic selection) and in personalized medicine. Besides production and other functional traits, genomic selection offers a novel challenge for discovering genetic variants affecting important diseases in humans, plants and livestock, and also for breeding resistant individuals to improve farm profitability.

The statistical treatment of the genetic basis of these traits is not straightforward because multiple genes, gene by gene interactions and gene by environment interactions underlie most complex traits and diseases. Capturing all marker signals is currently challenging. Besides the large p small n problem, the statistical treatment of the categorical nature of a trait may increase parameterization. So far, methods dealing with genome-assisted evaluations have focused on traits expressed or recorded in a continuous and Gaussian manner [1-3]. However, other traits (e.g. disease, survival) are generally

* Correspondence: gonzalez.oscar@inia.es
1 INIA, Ctra La Coruña km 7.5, 28040 Madrid, Spain
Full list of author information is available at the end of the article

© 2011 González-Recio and Forni; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
