Protein–protein docking benchmark version 4.0


PROTEINS: STRUCTURE • FUNCTION • BIOINFORMATICS

Howook Hwang,¹ Thom Vreven,¹ Joël Janin,² and Zhiping Weng¹*

¹ Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605
² Yeast Structural Genomics, IBBMC, Université Paris-Sud, CNRS UMR 8619, 91405-Orsay, France

ABSTRACT

We updated our protein–protein docking benchmark to include complexes that became available since our previous release. As before, we only considered high-resolution complex structures that are nonredundant at the family–family pair level, for which the X-ray or NMR unbound structures of the constituent proteins are also available. Benchmark 4.0 adds 52 new complexes to the 124 cases of Benchmark 3.0, representing an increase of 42%. Thus, Benchmark 4.0 provides 176 unbound–unbound cases that can be used for protein–protein docking method development and assessment. Seventeen of the newly added cases are enzyme–inhibitor complexes, and we found no new antigen–antibody complexes. Classifying the new cases according to expected difficulty for protein–protein docking algorithms gives 33 rigid body cases, 11 cases of medium difficulty, and 8 cases that are difficult. Benchmark 4.0 listings and processed structure files are publicly accessible at http://zlab.umassmed.edu/benchmark/

Proteins 2010; 78:3111–3114. © 2010 Wiley-Liss, Inc.

Key words: protein–protein docking; protein complexes; protein–protein interactions; complex structure.

Additional Supporting Information may be found in the online version of this article.
The authors state no conflict of interest.
Grant sponsor: NIH; Grant number: R01 GM084884.
*Correspondence to: Zhiping Weng, Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Room 1010, Lazare Research Building, 364 Plantation St, Worcester, MA 01605. E-mail: zhiping.Weng@umassmed.edu
Received 20 May 2010; Revised 29 June 2010; Accepted 2 July 2010
Published online 23 July 2010 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/prot.22830

INTRODUCTION

During the last decade, the computational protein–protein docking field has advanced considerably. In part, this is due to the efforts of making algorithms available to the community through web servers and/or downloadable packages,[1–8] the community-wide CAPRI experiment,[9] and the development of publicly available benchmarks of protein–protein complexes.[10,11]

A protein–protein docking benchmark provides the community with a set of nonredundant protein–protein complexes for which the complex structure and the constituent unbound structures are available. A benchmark forms a subset of the Protein Data Bank (PDB)[12] and provides a standard dataset that can be used for systematic comparison of docking algorithms. The quantity and diversity of interactions covered in a benchmark can be improved by tracking updates in the PDB. Eight years ago, we introduced the first protein–protein docking benchmark,[10] and we updated it twice, in 2005 (Benchmark 2.0)[13] and in 2008 (Benchmark 3.0).[14] Recently, Kastritis and Bonvin collected experimentally measured protein–protein binding affinities (Kd values) for 81 test cases in Benchmark 3.0.[15] Since the last release, the number of entries in the PDB has increased by more than 13,000, which enables us to release a new update to the Benchmark.

MATERIALS AND METHODS

Data collection

We collected candidate structures from the PDB in a semiautomatic way, with the same resolution cutoff for X-ray structures (3.25 Å) and the same chain-length cutoff (minimum of 30 residues) as described earlier.[10,13,14] Unlike in the previous release, we now also consider structures determined by nuclear magnetic resonance (NMR) for the unbound forms of the proteins. We still excluded NMR structures for complexes, to preclude the possibility that they were generated with the aid of docking algorithms. We used the biological assembly information from the PDB to distinguish crystal contacts from biological complexes. This initial pass yielded 47,767 unbound structures and 8654 complex structures that represent hetero complexes of at least two interacting chains. The unbound forms of both binding partners were available for 1667 complex structures, and we used the Structural Classification of Proteins (SCOP) database[16] (version 1.75) to check this set for redundancy.
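The candidate filter described in the Data collection paragraph can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Entry` record, its field names, and the method labels are hypothetical, and it assumes that the 30-residue minimum applies to every chain and that a resolution of exactly 3.25 Å still passes.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_XRAY_RESOLUTION = 3.25  # Å, cutoff stated in the text
MIN_CHAIN_LENGTH = 30       # residues, cutoff stated in the text


@dataclass
class Entry:
    """Hypothetical summary of one PDB entry (field names are illustrative)."""
    pdb_id: str
    method: str                   # assumed labels: "X-RAY" or "NMR"
    resolution: Optional[float]   # Å; None when not applicable (NMR)
    chain_lengths: List[int] = field(default_factory=list)
    is_complex: bool = False      # True for a bound (complex) structure


def passes_cutoffs(entry: Entry) -> bool:
    """Apply the resolution, chain-length, and NMR rules to one entry."""
    # Assumption: every chain must reach the minimum length.
    if not entry.chain_lengths or min(entry.chain_lengths) < MIN_CHAIN_LENGTH:
        return False
    if entry.method == "X-RAY":
        return entry.resolution is not None and entry.resolution <= MAX_XRAY_RESOLUTION
    if entry.method == "NMR":
        # NMR is accepted for unbound forms only; NMR complexes are excluded
        # so that no bound structure could have been built with docking.
        return not entry.is_complex
    return False  # other experimental methods are not considered in this sketch


if __name__ == "__main__":
    candidates = [
        Entry("demo1", "X-RAY", 2.1, [212, 58], is_complex=True),
        Entry("demo2", "X-RAY", 3.6, [150]),
        Entry("demo3", "NMR", None, [76]),
        Entry("demo4", "NMR", None, [76, 40], is_complex=True),
    ]
    kept = [e.pdb_id for e in candidates if passes_cutoffs(e)]
    print(kept)  # ['demo1', 'demo3']
```

Keeping the cutoffs as named constants makes it easy to rerun the same filter on a later PDB snapshot, which is how each Benchmark release extends the previous one.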


Table I
New Cases in the Protein–Protein Docking Benchmark 4.0

| Complex | Cat.ᵃ | PDB ID 1 | Protein 1 | PDB ID 2ᵇ | Protein 2 | RMSD (Å) | ΔASA (Å²)ᶜ |
|---|---|---|---|---|---|---|---|
| Rigid body (33) | | | | | | | |
| 1CLV_A:I | E | 1JAE_A | α-Amylase | 1QFD_A(1) | α-Amylase inhibitor | 0.86 | 2086 |
| 1FLE_E:I | E | 9EST_A | Elastase | 2REL_A(4) | Elafin | 1.02 | 1762 |
| 1GL1_A:I | E | 1K2I_1 | α-Chymotrypsin | 1PMC_A(6) | Protease inhibitor LCMI II | 1.21 | 1590 |
| 1GXD_A:C | E | 1CK7_A | proMMP2 type IV collagenase | 1BR9_A | Metalloproteinase inhibitor 2 | 1.39 | 2445 |
| 1JTG_B:A | E | 3GMU_B | β-Lactamase inhibitory protein | 1ZG4_A | β-Lactamase TEM-1 | 0.49 | 2599 |
| 1OC0_A:B | E | 1B3K_A | Plasminogen activator inhibitor-1 | 2JQ8_A(4) | Vitronectin Somatomedin B domain | 1 | 1312 |
| 1OYV_A:I | E | 1SCD_A | Subtilisin Carlsberg | 1PJU_A | Two-headed tomato inhibitor-II | 0.7 | 1929 |
| 1OYV_B:I | E | 1SCD_A | Subtilisin Carlsberg | 1PJU_A | Two-headed tomato inhibitor-II | 0.5 | 1279 |
| 2ABZ_B:E | E | 3I1U_A | Carboxypeptidase A1 | 1ZFI_A(1) | Leech carboxypeptidase inhibitor | 0.9 | 1443 |
| 2J0T_A:D | E | 966C_A | MMP1 Interstitial collagenase | 1D2B_A(20) | Metalloproteinase inhibitor 1 | 1.23 | 1476 |
| 2OUL_A:B | E | 3BPF_A | Falcipain 2 | 2NNR_A | Chagasin | 0.53 | 1932 |
| 3SGQ_E:I | E | 2QA9_E | Streptogrisin B | 2OVO_A | Ovomucoid inhibitor third domain | 0.39 | 1210 |
| 1FCC_AB:C | O | 1FC1_AB | Fc domain of IgG1 MO6 | 2IGG_A(3) | Strep. protein G C2 fragment | 0.93 | 1354 |
| 1FFW_A:B | O | 3CHY_A | Chemotaxis protein CheY | 1FWP_A | Chemotaxis protein CheA | 1.43 | 1166 |
| 1H9D_A:B | O | 1EAN_A | Runx1 domain of CBFα1 | 1ILF_A(1) | Dimerization domain of CBF-β | 1.32 | 2121 |
| 1HCF_AB:X | O | 1B98_AM | Neurotrophin-4 | 1WWB_X | TrkB-d5 growth factors receptor | 0.88 | 2135 |
| 1JWH_CD:A | O | 3EED_AB | Casein kinase II β chain | 3C13_A | Casein kinase II α chain | 1.27 | 1451 |
| 1OFU_XY:A | O | 1OFT_AB | SulA (PA3008) | 2VAW_A | Cell division protein FtsZ | 1.1 | 1583 |
| 1PVH_A:B | O | 1BQU_A | IL6 receptor β chain D2-D3 domains | 1EMR_A | Leukemia inhibitory factor | 0.34 | 1403 |
| 1RV6_VW:X | O | 1FZV_AB | PlGF receptor binding domain | 1QSZ_A | Flt1 protein domain 2 | 1.09 | 1625 |
| 1US7_A:B | O | 2FXS_A | Heat shock protein 82 N-ter domain | 2W0G_A | HSP90 co-chaperone CDC37 C-ter domain | 1.06 | 1095 |
| 1WDW_BD:A | O | 1V8Z_AB | Tryptophan synthase β chain 1 | 1GEQ_A | Tryptophan synthase α chain | 1.29 | 3147 |
| 1XU1_ABD:T | O | 1U5Y_ABD | TNF domain of APRIL | 1XUT_A(11) | TNF receptor superfamily member 13B TACI CRD2 domain | 1.3 | 1696 |
| 1ZHH_A:B | O | 1JX6_A | Autoinducer 2-binding periplasmic protein LuxP | 2HJE_A | Autoinducer 2 sensor kinase/phosphatase LuxQ | 1.31 | 2189 |
| 2A5T_A:B | O | 1Y20_A | NMDA receptor R1–4A subunit ligand-binding core | 2A5S_A | NMDA receptor R2A subunit ligand-binding core | 1.28 | 1892 |
| 2A9K_A:B | O | 1U90_A | Ras-related protein Ral-A | 2C8B_X | Mono-ADP-ribosyltransferase C3 | 0.85 | 1750 |
| 2B4J_AB:C | O | 1BIZ_AB | Integrase (HIV-1) | 1Z9E_A(1) | PC4 and SFRS1 interacting protein | 0.99 | 1273 |
| 2FJU_B:A | O | 2ZKM_X | Phospholipase β2 | 1MH1_A | Rac GTPase | 1.04 | 1245 |
| 2G77_A:B | O | 1FKM_A | GTPase-activating protein Gyp1 | 1Z06_A | Ras-related protein Rab-33B | 1.75 | 2524 |
| 2OOR_AB:C | O | 1L7E_AB | NAD(P) transhydrogenase subunit α part 1 | 1E3T_A | NAD(P) transhydrogenase subunit β | 1.42 | 2065 |
| 2VDB_A:B | O | 3CX9_A | Serum albumin | 2J5Y_A | Peptostreptococcal albumin-binding protein GA module | 0.47 | 1797 |
| 3BP8_AB:C | O | 1Z6R_AB | Mlc transcription regulator | 3BP3_A | PTS glucose-specific enzyme EIICB | 0.45 | 1390 |
| 3D5S_A:C | O | 1C3D_A | Complement C3d fragment | 2GOM_A | Fibrinogen-binding protein C-ter domain | 0.56 | 1620 |
| Medium difficulty (11) | | | | | | | |
| 1JIW_P:I | E | 1AKL_A | Alkaline metalloproteinase | 2RN4_A(1) | Proteinase inhibitor | 2.07 | 1997 |
| 4CPA_A:I | E | 8CPA_A | Carboxypeptidase A | 1H20_A(9) | Potato carboxypeptidase inhibitor | 1.97 | 1175 |
| 1LFD_B:A | O | 5P21_A | Ras | 1LXD_A | RalGDS Ras-interacting domain | 1.79 | 1167 |
| 1MQ8_A:B | O | 1IAM_A | ICAM-1 domains 1–2 | 1MQ9_A | Integrin αL I domain | 1.76 | 1252 |
| 1R6Q_A:C | O | 1R6C_X | Clp protease subunit ClpA | 2W9R_A | Clp protease adaptor protein ClpS | 1.67 | 1651 |
| 1SYX_A:B | O | 1QGV_A | Spliceosomal U5 15 kDa protein | 1L2Z_A(1) | CD2 receptor binding protein 2 C-ter fragment | 1.64 | 1292 |
| 2AYO_A:B | O | 2AYN_A | Ubiquitin carboxyl-terminal hydrolase 14 | 2FCN_A | Ubiquitin | 1.62 | 3026 |
| 2J7P_A:D | O | 1NG1_A | SRP GTPase Ffh | 2IYL_D | Cell division protein FtsY | 1.93 | 3008 |
| 2OZA_B:A | O | 3HEC_A | MAP kinase 14 | 3FYK_X | MAP kinase-activated protein kinase 2 | 1.89 | 6247 |
| 2Z0E_A:B | O | 2D1I_A | Cysteine protease Atg4B | 1V49_A(1) | Microtubule-associated proteins 1A/1B light chain 3B | 2.15 | 2477 |
| 3CPH_G:A | O | 3CPI_G | Ras-related protein Sec4 | 1G16_A | Rab GDP-dissociation inhibitor | 2.12 | 1684 |
| Difficult (8) | | | | | | | |
| 1F6M_A:C | E | 1CL0_A | Thioredoxin reductase | 2TIR_A | Thioredoxin 1 | 4.9 | 1821 |
| 1ZLI_A:B | E | 1KWM_A | Carboxypeptidase B | 2JTO_A(6) | Tick carboxypeptidase inhibitor | 2.53 | 2083 |
| 2O3B_A:B | E | 1ZM8_A | NucA nuclease | 1J57_A | NuiA nuclease inhibitor | 3.13 | 1675 |
| 1JK9_B:A | O | 1QUP_A | CCS metallochaperone | 2JCW_A | SOD1 superoxide dismutase | 4.87 | 2130 |
| 1JZD_AB:C | O | 1JZO_AB | DsbC disulfide bond isomerase | 1JPE_A | DsbD disulfide bond isomerase | 2.71 | 2026 |
| 1ZM4_A:B | O | 1N0V_C | Elongation factor 2 | 1XK9_A | Diphtheria toxin A catalytic domain | 2.94 | 1554 |
| 2I9B_E:A | O | 1YWH_A | Urokinase plasminogen activator surface receptor | 2I9A_A | Urokinase-type plasminogen activator | 3.79 | 2370 |
| 2IDO_A:B | O | 1J54_A | DNA polymerase III ε exonuclease domain | 1SE7_A(1) | HOT protein (P1 phage) | 2.79 | 1953 |

ᵃ Complex category labels: E = Enzyme/Inhibitor or Enzyme/Substrate; O = Other.
ᵇ NMR model numbers are shown in parentheses.
ᶜ Change in accessible surface area (ΔASA) upon complex formation, defined as the ASA of Protein 1 plus the ASA of Protein 2 minus the ASA of the complex. ASA is calculated using NACCESS.
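Footnote c defines ΔASA as simple arithmetic on three accessible surface areas. A minimal sketch, assuming the three total ASA values in Å² have already been computed (for example with NACCESS, as in the footnote) and that each partner's ASA is measured on its coordinates taken from the complex, which the footnote does not state; the function name is illustrative.

```python
def delta_asa(asa_protein1: float, asa_protein2: float, asa_complex: float) -> float:
    """Change in accessible surface area on complex formation, in Å².

    Follows Table I, footnote c: ASA(Protein 1) + ASA(Protein 2) - ASA(Complex).
    Assumption: all three inputs are total ASA values computed on the same
    coordinates (each partner extracted from the complex).
    """
    buried = asa_protein1 + asa_protein2 - asa_complex
    if buried < 0:
        # With both partners taken from the complex, the complex cannot expose
        # more surface than the two partners combined, so a negative value
        # signals mismatched inputs.
        raise ValueError(f"negative ΔASA ({buried:.1f} Å²); check the input ASA values")
    return buried


# Hypothetical totals, for illustration only (not taken from Table I):
print(delta_asa(9800.0, 5400.0, 13200.0))  # 2000.0
```

Because ΔASA counts surface buried on both sides of the interface, it is roughly twice the area buried on either partner alone.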

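Table I groups the new cases into the difficulty classes defined in the Classification subsection, which combine I-RMSD with fnon-nat. The sketch below shows one way to compute these quantities and assign a class. It is an illustration under stated assumptions, not the Benchmark's code: interface Cα coordinates of the unbound and bound forms are assumed to be already paired residue by residue, the superposition uses the Kabsch algorithm, a contact is represented as a hypothetical (partner 1 residue, partner 2 residue) pair, and fnon-nat is taken as the share of contacts in the superposed unbound structures that are absent from the native complex (the usual CAPRI-style convention, assumed here).

```python
import numpy as np


def kabsch_rmsd(mobile: np.ndarray, target: np.ndarray) -> float:
    """RMSD (Å) between paired (N, 3) coordinates after optimal superposition."""
    p = mobile - mobile.mean(axis=0)   # remove translation
    q = target - target.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)  # SVD of the 3x3 covariance matrix
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def contact_fractions(native: set, model: set) -> tuple[float, float]:
    """Return (fnat, fnon_nat) for two sets of residue-residue contacts."""
    fnat = len(native & model) / len(native) if native else 0.0
    fnon_nat = len(model - native) / len(model) if model else 0.0
    return fnat, fnon_nat


def classify(i_rmsd: float, fnon_nat: float) -> str:
    """Difficulty class from the Benchmark thresholds (I-RMSD in Å)."""
    if i_rmsd > 2.2:
        return "difficult"
    if i_rmsd > 1.5:
        return "medium"
    # I-RMSD <= 1.5 Å: rigid body unless too many non-native contacts
    return "rigid body" if fnon_nat <= 0.4 else "medium"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    bound = rng.normal(scale=5.0, size=(40, 3))  # synthetic interface Cα atoms
    angle = np.deg2rad(30.0)
    rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
    unbound = bound @ rz.T + np.array([3.0, -1.0, 2.0])  # pure rigid-body move
    print(kabsch_rmsd(unbound, bound))  # close to zero

    native = {(1, 10), (2, 11), (3, 12), (4, 13)}
    model = {(1, 10), (2, 11), (5, 14)}
    fnat, fnon_nat = contact_fractions(native, model)  # 0.5 and 1/3
    print(classify(1.2, fnon_nat))  # rigid body
```

Checking I-RMSD against 2.2 Å first and then against 1.5 Å means fnon-nat only matters for low-RMSD cases, which mirrors the wording of the three criteria.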

We checked this set for redundancy at the family level: two complexes were deemed redundant if both proteins in one complex were in the same SCOP families as the two proteins in the other complex, respectively. This yielded 109 complexes that were nonredundant with the complexes in the previous release of the Benchmark and amongst themselves. PDB entries without a SCOP unique identifier (sunid)[17] were excluded from the bound candidate list to remove possible redundancy. Finally, we used literature information to eliminate obligate complexes,[18] which further reduced the list to 52 complexes.

When we found multiple candidates for an unbound structure, we selected one structure based on a combination of several considerations: highest sequence similarity with the bound structure, highest resolution, and lowest number of missing residues in the protein–protein interface area. For NMR entries, which provide an ensemble of models, we selected the model with the lowest interface root-mean-square deviation (I-RMSD; defined below) from the bound form. The final structure files on the benchmark website include the cofactors that were present in the original PDB files and, in the case of an NMR structure, all the models that were provided in the original file.

Classification

As for the previous releases of the Benchmark, we classify the new entries according to their expected difficulty for protein–protein docking algorithms, based on the structural difference between the bound and the unbound forms of the binding partners:[14]

Rigid body: I-RMSD ≤ 1.5 Å and fnon-nat ≤ 0.4
Medium difficulty: 1.5 Å < I-RMSD ≤ 2.2 Å, or I-RMSD ≤ 1.5 Å and fnon-nat > 0.4
Difficult: I-RMSD > 2.2 Å

We define I-RMSD as the RMSD between the unbound and the bound structures, superposed onto each other, calculated using the Cα atoms of the interface residues of both binding partners. In line with Mendez et al.,[19] fnat and fnon-nat are the fractions of native residue contacts and non-native residue contacts, respectively, of the superposed unbound structures.

RESULTS AND DISCUSSION

The 52 new cases are listed in Table I. The entire updated Benchmark is reported in Supporting Information Table S1. 1OYV is a 1:2 complex of a two-headed inhibitor and subtilisin.[20] We split this complex into two Benchmark cases, representing the interaction between chain A of subtilisin and chain I (inhibitor) and the interaction between chain B of subtilisin and chain I, respectively. In addition to the aforementioned properties, the tables also report the change in accessible surface area (ASA) on complexation, which is a measure of the size of the interface between the binding partners.

Benchmark 4.0 includes 121 rigid body cases (33 new), 30 cases of medium difficulty (11 new), and 25 difficult cases (8 new). By biochemical function, there are 52 enzyme–inhibitor complexes (17 new), 25 antibody–antigen complexes, and 99 complexes with other functions (35 new); we did not find new antibody–antigen complexes. In this update of the Benchmark, we included 16 cases that involve NMR unbound structures. Among them, 11 cases are classified as rigid body, four as medium difficulty, and one as difficult. Thus, the expected difficulty for docking algorithms using NMR structures in the Benchmark is similar to that using X-ray structures. Had we considered NMR structures for the bound complexes, we would have included seven more cases (1GGR, 1J6T, 1O2F, 1P9D, 1UR6, 2ODG, and 3EZA). Although one can argue that the exclusion of complex NMR structures from the Benchmark should be decided case by case, we decided to simply leave them all out, as their inclusion would only lead to a small increase in the size of the Benchmark.

Table II summarizes the average I-RMSD, fnat, and fnon-nat for the different classes of docking difficulty.

Table II
Statistics of the Three Classes of Difficulty in the Entire Benchmark 4.0 and the New Cases (in Parentheses)

| Class | I-RMSD (Å) | fnat | fnon-nat | Number |
|---|---|---|---|---|
| Rigid body | 0.90 (1.12) | 0.79 (0.80) | 0.21 (0.19) | 121 (33) |
| Medium | 1.76 (1.86) | 0.63 (0.66) | 0.35 (0.27) | 30 (11) |
| Difficult | 3.76 (3.45) | 0.51 (0.60) | 0.51 (0.41) | 25 (8) |

The numbers in Table II indicate that the new cases in Benchmark 4.0 (in parentheses) generally have higher I-RMSD for rigid body cases and cases of medium difficulty, which suggests that the new test cases are more challenging for computational docking. Also, the fraction of rigid body cases among the new cases is 0.63 (33 of 52), somewhat lower than the 0.71 in Benchmark 3.0 (88 of 124). Thus, the new cases are expected to be more difficult for protein–protein docking algorithms, and this must be taken into account when assessing docking algorithms, as performance will depend on the benchmark version used.

In summary, Benchmark 4.0 adds 52 new cases. Compared with the previous release, a larger share of the new cases falls outside the rigid body class, and the new rigid body and medium-difficulty cases show larger conformational changes upon binding. This is especially useful for the development of protein–protein docking algorithms that incorporate protein flexibility, a problem that has recently received much attention[21] but still remains a major challenge.

REFERENCES

1. Vakser IA. Protein docking for low-resolution structures. Protein Eng 1995;8:371–377.


2. Comeau SR, Gatchell DW, Vajda S, Camacho CJ. ClusPro: an automated docking and discrimination method for the prediction of protein complexes. Bioinformatics 2004;20:45–50.
3. Mandell JG, Roberts VA, Pique ME, Kotlovyi V, Mitchell JC, Nelson E, Tsigelny I, Ten Eyck LF. Protein docking using continuum electrostatics and geometric fit. Protein Eng 2001;14:105–113.
4. Chen R, Li L, Weng Z. ZDOCK: an initial-stage protein-docking algorithm. Proteins 2003;52:80–87.
5. Ritchie DW, Kozakov D, Vajda S. Accelerating and focusing protein-protein docking correlations using multi-dimensional rotational FFT generating functions. Bioinformatics 2008;24:1865–1873.
6. Dominguez C, Boelens R, Bonvin AM. HADDOCK: a protein-protein docking approach based on biochemical or biophysical information. J Am Chem Soc 2003;125:1731–1737.
7. de Vries SJ, van Dijk M, Bonvin AM. The HADDOCK web server for data-driven biomolecular docking. Nat Protoc 5:883–897.
8. Lyskov S, Gray JJ. The RosettaDock server for local protein-protein docking. Nucleic Acids Res 2008;36(Web Server issue):W233–W238.
9. Janin J, Henrick K, Moult J, Eyck LT, Sternberg MJ, Vajda S, Vakser I, Wodak SJ. CAPRI: a critical assessment of predicted interactions. Proteins 2003;52:2–9.
10. Chen R, Mintseris J, Janin J, Weng Z. A protein-protein docking benchmark. Proteins 2003;52:88–91.
11. Gao Y, Douguet D, Tovchigrechko A, Vakser IA. DOCKGROUND system of databases for protein recognition studies: unbound structures for docking. Proteins 2007;69:845–851.


12. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic Acids Res 2000;28:235–242.
13. Mintseris J, Wiehe K, Pierce B, Anderson R, Chen R, Janin J, Weng Z. Protein-protein docking benchmark 2.0: an update. Proteins 2005;60:214–216.
14. Hwang H, Pierce B, Mintseris J, Janin J, Weng Z. Protein-protein docking benchmark version 3.0. Proteins 2008;73:705–709.
15. Kastritis PL, Bonvin AM. Are scoring functions in protein-protein docking ready to predict interactomes? Clues from a novel binding affinity benchmark. J Proteome Res 2010;9:2216–2225.
16. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 1995;247:536–540.
17. Lo Conte L, Brenner SE, Hubbard TJ, Chothia C, Murzin AG. SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Res 2002;30:264–267.
18. Mintseris J, Weng Z. Structure, function, and evolution of transient and obligate protein-protein interactions. Proc Natl Acad Sci USA 2005;102:10930–10935.
19. Mendez R, Leplae R, De Maria L, Wodak SJ. Assessment of blind predictions of protein-protein interactions: current status of docking methods. Proteins 2003;52:51–67.
20. Barrette-Ng IH, Ng KK, Cherney MM, Pearce G, Ryan CA, James MN. Structural basis of inhibition revealed by a 1:2 complex of the two-headed tomato inhibitor-II and subtilisin Carlsberg. J Biol Chem 2003;278:24062–24071.
21. Zacharias M. Accounting for conformational changes during protein-protein docking. Curr Opin Struct Biol 2010;20:180–186.