benchmark
Benchmarking of linear and nonlinear approaches for QSPR studies of metal complexation with ionophores
Igor V. Tetko,1,2* Vitaly P. Solov'ev,3 Alexey V. Antonov,1 Xiaojun Yao,4 Jean Pierre Doucet,4 Botao Fan,4 Frank Hoonakker,5 Denis Fourches,5 Piere Jost,5 Nicolas Lachiche,5 and Alexandre Varnek,5
1 - GSF - National Centre for Environment and Health, Institute for Bioinformatics (MIPS), 85764 Neuherberg, Germany
2 - Institute of Bioorganic & Petrochemistry, National Ukrainian Academy of Sciences, 02094, Kyiv, Ukraine, http://www.vcclab.org
3 - Institute of Physical Chemistry, Russian Academy of Sciences, Leninskiy prospect 31a, 119991 Moscow, Russia
4 - Université Paris 7-Denis Diderot, ITODYS-CNRS UMR 7086, 1, rue Guy de la Brosse, Paris 75005, France
5 - Laboratoire d'Infochimie, UMR 7551 CNRS, Université Louis Pasteur, 4, rue B. Pascal, Strasbourg 67000, France

Objectives
Can we predict complexation constants using QSPR?
What are the best descriptors?
What are the best methods?
Do non-linear methods add some value?
Can we compare results of different methods in an objective way?

Data Sets
logK1 (Ag+): 161 molecules
logK1 (Eu3+): 241 molecules
logβ2 (Ag+): 112 molecules
logβ2 (Eu3+): 81 molecules

Descriptors
[Structure drawings of example ligands]

SEQUENCES
ATOMS and BONDS (AB): O=C-C-N; C-C-N; C-N; O=C-C; C=O; C-C
ATOMS (A): O C C N; C C N; C N; O C C; C O; C C
BONDS (B): [illegible]

AUGMENTED ATOMS
ATOMS and BONDS (AB): C (-C) (-O) (=O)
ATOMS (A): C (C) (O) (O) or (Hy) Csp2 (Csp3)(Osp3)(Osp2)

(C) E-state indices
atom   index name    value
1      dO            11.09
1      dO(acid)      11.09
2      dssC          1.02
3      sOH           9.08
3      sOH(acid)     9.08
4      ssCH2         0.229
5      sssN          1.64
5      sssN(al)      1.64
6      ssCH2         0.305
7      ssCH2         0.305

(D) Atom-type E-state indices and counts
no   index name    value    count
1    SdO           44.35    4
2    SdO(acid)     44.35    4
3    SdssC         4.08     4
4    SsOH          36.30    4
5    SsOH(acid)    36.30    4
6    SssCH2        1.52     [illegible]
7    SsssN         6.57     4
8    SsssN(al)     6.57     4

Analyzed approaches
Associative Neural Network (ASNN) http://www.vcclab.org/lab/asnn
Singular Value Decomposition (MLRA/SVD) http://infochim.u-strasbg.fr/recherche/isida/
Radial Basis Function Network (RBFN) http://www.cs.waikato.ac.nz/~ml/weka
Maximal Margin Linear Programming Method (MMLP) http://mips.gsf/proj/mdcs
k-Nearest Neighbor Method (kNN)
Support Vector Machine (SVM) http://www.csie.ntu.edu.tw/~cjlin/libsvm/

[Figure panels: Traditional plot; Regression Error Curve]
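The MLRA/SVD entry in the list of approaches refers to multiple linear regression solved through a singular value decomposition. The following is a minimal NumPy sketch of that idea, not the ISIDA implementation: the function names `fit_mlr_svd` and `predict_mlr`, the `rcond` cut-off, and the synthetic descriptor matrix are all assumptions made for illustration.

```python
import numpy as np

def fit_mlr_svd(X, y, rcond=1e-10):
    """Least-squares fit of y ~ b0 + X @ b using the SVD pseudo-inverse."""
    # Prepend a column of ones so that coef[0] is the intercept.
    X_aug = np.column_stack([np.ones(len(X)), X])
    U, s, Vt = np.linalg.svd(X_aug, full_matrices=False)
    # Drop near-zero singular values (collinear descriptors) for stability.
    keep = s > rcond * s.max()
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    return Vt.T @ (s_inv * (U.T @ y))

def predict_mlr(coef, X):
    """Apply fitted coefficients (intercept first) to a descriptor matrix."""
    return coef[0] + X @ coef[1:]

# Hypothetical data: 30 "molecules" described by 3 fragment counts.
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(30, 3)).astype(float)
true_coef = np.array([2.0, 0.5, -1.0, 0.25])   # made-up intercept and slopes
y = true_coef[0] + X @ true_coef[1:]           # noise-free target

coef = fit_mlr_svd(X, y)
print(np.round(coef, 3))
```

Discarding small singular values is one common way to keep the fit stable when fragment counts are strongly correlated; whether the original software does this is not stated on the poster.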
Data Analysis: double 5-fold cross-validation
[Workflow diagram] Generation of descriptors
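A double (nested) 5-fold cross-validation can be sketched as follows: an outer loop holds out 1/5 of the molecules for prediction, and an inner 5-fold loop on the remaining 4/5 selects the model. This is only an illustration under stated assumptions: the learner is a plain k-nearest-neighbor regressor, model selection means choosing `k` by inner RMSE, and the data are synthetic. The poster does not specify these details.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Predict by averaging the targets of the k nearest training points."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def fold_indices(n, n_folds, rng):
    """Split a random permutation of range(n) into n_folds index arrays."""
    return np.array_split(rng.permutation(n), n_folds)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def double_cv(X, y, k_values=(1, 3, 5), n_folds=5, seed=0):
    """Outer loop predicts each held-out fold; inner loop selects k."""
    rng = np.random.default_rng(seed)
    y_pred = np.empty_like(y, dtype=float)
    for test_idx in fold_indices(len(y), n_folds, rng):
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        X_tr, y_tr = X[train_idx], y[train_idx]
        # Inner cross-validation uses the training part only.
        inner_folds = fold_indices(len(y_tr), n_folds, rng)
        scores = []
        for k in k_values:
            inner_pred = np.empty_like(y_tr, dtype=float)
            for val_idx in inner_folds:
                fit_idx = np.setdiff1d(np.arange(len(y_tr)), val_idx)
                inner_pred[val_idx] = knn_predict(
                    X_tr[fit_idx], y_tr[fit_idx], X_tr[val_idx], k)
            scores.append(rmse(y_tr, inner_pred))
        best_k = k_values[int(np.argmin(scores))]
        # The held-out fold is never seen during model selection.
        y_pred[test_idx] = knn_predict(X_tr, y_tr, X[test_idx], best_k)
    return y_pred

# Synthetic example: 60 points, 2 descriptors, linear target.
rng_demo = np.random.default_rng(1)
X_demo = rng_demo.normal(size=(60, 2))
y_demo = 2.0 * X_demo[:, 0] - X_demo[:, 1]
y_cv = double_cv(X_demo, y_demo)
print("outer-loop RMSE:", rmse(y_demo, y_cv))
```

Because the test fold plays no part in choosing `k`, the outer-loop error is an estimate of predictive performance rather than of fit quality.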
Testing of Statistical Significance
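One way to test whether method A is significantly better than method B is a paired bootstrap over molecules. The sketch below is an assumption about how such a test could be set up, not a description of the procedure used for the poster: molecules are resampled with replacement, the RMSE of both methods is recomputed on each resample, and the p-value is the fraction of resamples in which A is not better than B. The function name and the synthetic predictions are hypothetical.

```python
import numpy as np

def bootstrap_p_value(y_true, pred_a, pred_b, n_boot=2000, seed=0):
    """Fraction of bootstrap resamples in which RMSE(A) >= RMSE(B)."""
    rng = np.random.default_rng(seed)
    sq_err_a = (y_true - pred_a) ** 2
    sq_err_b = (y_true - pred_b) ** 2
    n = len(y_true)
    # Each row holds the indices of one resample; the same molecules are
    # used for both methods, which keeps the comparison paired.
    idx = rng.integers(0, n, size=(n_boot, n))
    rmse_a = np.sqrt(sq_err_a[idx].mean(axis=1))
    rmse_b = np.sqrt(sq_err_b[idx].mean(axis=1))
    return float(np.mean(rmse_a >= rmse_b))

# Synthetic example: one accurate and one inaccurate set of predictions.
rng = np.random.default_rng(42)
y_true = rng.normal(size=100)
pred_good = y_true + rng.normal(scale=0.3, size=100)
pred_poor = y_true + rng.normal(scale=1.0, size=100)

p_good_vs_poor = bootstrap_p_value(y_true, pred_good, pred_poor)
p_poor_vs_good = bootstrap_p_value(y_true, pred_poor, pred_good)
print(p_good_vs_poor, p_poor_vs_good)
```

With `n_boot` resamples the smallest non-zero p-value is `1 / n_boot`, so a statement such as p<0.001 needs more than 1000 resamples.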
[Figure: REC plot, x axis: abs error] REC allows results from several methods to be compared on one plot.

[Workflow diagram labels: fragmental, E-state counts and E-state values descriptors; 4/5 training set (internal cross-validation, model selection); 1/5 test set prediction; statistical evaluation]

Statistical assessment of results

METHOD    REC      RMSE    MAE     curve color
SVM       0.11     2.46    1.65    green
KNN       0.124    2.79    1.85    cyan
RBFN      0.132    3.07    1.98    blue
MMLP      0.142    3.89    2.22    brown
ASNN      [illegible]
SVD       [illegible]
AVERAGE   0.274    5.19    4.13    gray

[Figure caption] Experimental versus predicted values for models of logK1(Ag+) and logβ2(Eu3+). Despite the apparent difference in quality of the two models, the outlying molecules in each model can be easily observed.

Bootstrap significance test
BOOTSTRAP: asnn > mmlp average p<0.001
BOOTSTRAP: knn > average p<0.001
BOOTSTRAP: svm > mmlp average p<0.001
BOOTSTRAP: weka > average p<0.001
BOOTSTRAP: svd > average p<0.001
BOOTSTRAP: mmlp > average p<0.001

Kolmogorov-Smirnov (KS)
KS: svd != asnn 0.0081
KS: svd != svm 0.0147
KS: svd != weka 0.0258
KS: svd != average p<0.0001
KS: asnn != average p<0.0001
KS: svm != mmlp 0.0258
KS: svm != average p<0.0001
KS: knn != average p<0.0001
KS: weka != average p<0.0001
KS: mmlp != average p<0.0001

Statistical analysis provides an objective comparison of different methods.

[Figure: Comparison of Methods]

[Figure: Comparison of Descriptors; axis labels: SMF fragments, E-state counts, E-state values] Percentage of best models (y axis) calculated using the corresponding descriptor system and all methods.
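A regression error characteristic (REC) curve plots, for each error tolerance, the fraction of molecules predicted within that tolerance. It is therefore the empirical cumulative distribution of the absolute errors, and the Kolmogorov-Smirnov statistic between two methods is the largest vertical gap between their REC curves. The sketch below illustrates both quantities with NumPy. Applying KS to absolute errors is an assumption here, and the function names and numbers are made up for illustration.

```python
import numpy as np

def rec_curve(y_true, y_pred):
    """Return sorted absolute errors and the fraction of points within each."""
    abs_err = np.sort(np.abs(np.asarray(y_true) - np.asarray(y_pred)))
    accuracy = np.arange(1, len(abs_err) + 1) / len(abs_err)
    return abs_err, accuracy

def ks_statistic(err_a, err_b):
    """Largest vertical distance between two empirical error distributions."""
    a = np.sort(np.abs(err_a))
    b = np.sort(np.abs(err_b))
    grid = np.concatenate([a, b])
    # Empirical CDF value at x is the share of errors that are <= x.
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Tiny made-up example: four molecules, true value 0 for all of them.
tolerance, fraction = rec_curve(np.zeros(4), np.array([0.5, -1.0, 2.0, 0.0]))
print(tolerance, fraction)

ks_same = ks_statistic([0.1, 0.4, 0.9], [0.1, 0.4, 0.9])
ks_disjoint = ks_statistic([0.0, 1.0], [2.0, 3.0])
print(ks_same, ks_disjoint)
```

SciPy's `scipy.stats.ks_2samp` computes the same two-sample statistic and also returns a p-value.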
[Figure axis labels: E-state indices; all descriptors]

[Figure caption] Percentage of models (y axis) as a function of the number of top-ranked significant models (x axis). For each data set we selected [...] models contributed using each method. Calculations were performed using MLRA (1), RBF NN (2), kNN (3), MMLP (4), ASNN (5), averaging of all ISIDA models (6), averaging of the five first-ranked ISIDA models (7) and SVM (8).

Conclusions
Models based on fragments (SMF, E-state counts) > E-state indices
Non-linear approaches > multiple linear regression (MLRA) (p<0.05)
But ensemble of several MLRA ≈ non-linear approaches
No significant differences in performance of non-linear models
SVM and ASNN provided the largest number of "best" models
kNN was the fastest method
Acknowledgement
IVT was supported by an Invited Professor position from Université Louis Pasteur. Part of this work was performed in the framework of the French-Russian collaborative project GDRE "SupraChem".