Background: A key challenge in metabonomics is to uncover quantitative associations between multidimensional spectroscopic data and biochemical measures used for disease risk assessment and diagnostics. Here we focus on clinically relevant estimation of lipoprotein lipids by ¹H NMR spectroscopy of serum.

Results: A Bayesian methodology, with a biochemical motivation, is presented for a real ¹H NMR metabonomics data set of 75 serum samples. Lipoprotein lipid concentrations were independently obtained for these samples via ultracentrifugation and specific biochemical assays. The Bayesian models were constructed by Markov chain Monte Carlo (MCMC) and they showed remarkably good quantitative performance, the predictive R-values being 0.985 for the very low density lipoprotein triglycerides (VLDL-TG), 0.787 for the intermediate, 0.943 for the low, and 0.933 for the high density lipoprotein cholesterol (IDL-C, LDL-C and HDL-C, respectively). The modelling produced a kernel-based reformulation of the data, the parameters of which coincided with the well-known biochemical characteristics of the ¹H NMR spectra; particularly for VLDL-TG and HDL-C the Bayesian methodology was able to clearly identify the most characteristic resonances within the heavily overlapping information in the spectra. For IDL-C and LDL-C the resulting model kernels were more complex than those for VLDL-TG and HDL-C, probably reflecting the severe overlap of the IDL and LDL resonances in the ¹H NMR spectra.

Conclusion: The systematic use of Bayesian MCMC analysis is computationally demanding. Nevertheless, the combination of high-quality quantification and the biochemical rationale of the resulting models is expected to be useful in the field of metabonomics.
Open Access
Research
A novel Bayesian approach to quantify clinical variables and to determine their spectroscopic counterparts in ¹H NMR metabonomic data
Aki Vehtari*¹, Ville-Petteri Mäkinen¹, Pasi Soininen², Petri Ingman³, Sanna M Mäkelä⁴, Markku J Savolainen⁴, Minna L Hannuksela⁴, Kimmo Kaski¹ and Mika Ala-Korpela*¹
Address: ¹Laboratory of Computational Engineering, Systems Biology and Bioinformation Technology, Helsinki University of Technology, P.O. Box 9203, FI-02015 HUT, Finland, ²Department of Chemistry, University of Kuopio, P.O. Box 1627, FI-70211 Kuopio, Finland, ³Department of Chemistry, Instrument Centre, Vatselankatu 2, FI-20014 University of Turku, Turku, Finland and ⁴Department of Internal Medicine, Clinical Research Center, University of Oulu, P.O. Box 5000, FI-90014 Oulu, Finland
Email: Aki Vehtari* aki.vehtari@hut.fi; Ville-Petteri Mäkinen vmakine2@lce.hut.fi; Pasi Soininen pasi.soininen@uku.fi; Petri Ingman petri.ingman@utu.fi; Sanna M Mäkelä sanna.makela@oulu.fi; Markku J Savolainen markku.savolainen@oulu.fi; Minna L Hannuksela minna.hannuksela@oulu.fi; Kimmo Kaski kimmo.kaski@hut.fi; Mika Ala-Korpela* mika.alakorpela@hut.fi
* Corresponding authors
from Probabilistic Modeling and Machine Learning in Structural and Systems Biology, Tuusula, Finland. 17–18 June 2006
Published: 3 May 2007
BMC Bioinformatics 2007, 8(Suppl 2):S8