TUTORIAL
Insilicos Proteomic Pipeline (IPP)
for Windows

IPP Version 1.0, 2006.
Note: Screenshots may vary from the IPP
build you are using since the application is still
in development.

This document was assembled and edited by Bryan Prazen (bryanp@insilicos.com) of
Insilicos LLC (www.insilicos.com). It uses text from multiple documents written by
members of the Institute for Systems Biology (ISB) and the Seattle Proteome Center (SPC).
Go to www.insilicos.com/ipp.html for the latest version of this tutorial.

Table of Contents
1 Introduction
1.1 About Insilicos Proteomic Pipeline
1.2 Systems Requirements
1.3 About this Tutorial
1.4 Who Should Use this Tutorial?
2 Getting Started
2.1 Downloading IPP
2.2 Installing IPP
2.3 IPP license
2.4 DOS reminders
2.5 Configuring the Graphical User Interface (GUI)
3 Tutorial Data
3.1 Getting the Tutorial Data
3.2 Unpacking and Storing the IPP Tutorial Data
4 IPP Tutorial
4.1 Creating Summary HTML Files
4.2 Opening the GUI
4.2 Creating pepXML Files
4.3 Analyze Peptides
4.3.1 PepXML Viewer
4.3.2 PeptideProphet
4.3.3 XPRESS
4.3.4 ASAPRatio
4.3.5 Libra
4.3.6 Analysis
4.4 Evaluating the Results
4.4.1 PeptideProphet Results
4.4.2 XPRESS Results
4.4.3 ASAPRatio Results
4.4.4 Reviewing Processed Data
4.5 Protein Analysis
4.9 Exporting Data
5 Beyond this Tutorial
5.1 Creating mzXML Files
6 Automation
7 Getting Help
8 Glossary
8 References

1 Introduction

This tutorial covers how to apply the Insilicos Proteomic Pipeline (IPP) for protein identification and quantitation to a data set that has already been searched using SEQUEST. Although it should be helpful to anyone interested in statistical and quantitative analysis of proteomics data from tandem mass spectrometry, this tutorial was designed for scientists who run SEQUEST searches on their tandem mass spectrometry data at a mass spectrometry facility and would like to take their analysis a step further. It shows one way to run the IPP tools so that previously searched data can be analyzed further, and so that future data first searched with SEQUEST can be statistically evaluated, quantified and organized with the IPP tools at one’s own desk when time on a computer with a licensed copy of a search engine is limited.

1.1 About Insilicos Proteomic Pipeline

Insilicos Proteomic Pipeline (IPP) is a data analysis pipeline for LC/MS/MS proteomics data. IPP is a version of the highly successful open source proteomics data analysis software known as the Trans-Proteomic Pipeline (TPP). IPP takes all the data analysis power of TPP and bundles it in a package that is easier to install, faster and more reliable. IPP includes modules for validating database search results, quantifying isotopically labeled samples, and validating protein identifications, as well as tools for viewing raw LC/MS data, peptide identification results, and protein identification results. The XML backbone of the pipeline enables uniform analysis of LC/MS/MS data generated on a wide variety of mass spectrometer types, with peptides assigned by a wide variety of database search engines.

If you have questions about IPP or have general feedback, please contact the IPP discussion group at http://groups.google.com/group/InsilicosIPP or http://www.insilicos.com/support.html. Comments and corrections on this document are welcome at ipp@insilicos.com.

1.2 Systems Requirements

IPP is currently available only for Windows; a Linux version is in development. IPP runs under Windows 2000 or Windows XP. A web browser such as Internet Explorer or Firefox is also required. Including the IPP software, the tutorial requires about 900MB of hard drive space: IPP itself requires approximately 190MB, and the remaining space is needed to store and manipulate the data. For future IPP analyses, remember that IPP requires mass spectrometer data to be saved in the mzXML or mzData format. mzXML and mzData are instrument-independent data formats used by data analysis software such as IPP and by data repositories. mzXML was developed by the Institute for Systems Biology, and mzData by the HUPO Proteomics Standards Initiative (PSI). Unfortunately, storing data in both the manufacturer-specific format and an instrument-independent format will require more than twice as much storage space.
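Before downloading the tutorial data, it can be worth checking that the target drive has room for it. The helper below is a minimal, hypothetical sketch (it is not part of IPP) that uses Python's standard `shutil.disk_usage` to compare free space with the roughly 900MB quoted above; treating 1MB as 2**20 bytes is an assumption.

```python
import shutil

# Approximate space needed for IPP plus the tutorial data, from the text above.
TUTORIAL_MB = 900

BYTES_PER_MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes


def has_enough_space(path: str, required_mb: int = TUTORIAL_MB) -> bool:
    """Return True if the drive holding `path` has at least `required_mb` free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_mb * BYTES_PER_MB
```

For example, `has_enough_space("C:\\")` checks the C: drive, where IPP and the tutorial data are stored by default.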

1.3 About this Tutorial

This guide uses the following typographical conventions: bold indicates commands or steps that the user must complete; small italics are used for notes containing information that is not required to complete this tutorial.

1.4 Who Should Use this Tutorial?

This tutorial is written for anyone who has a general interest in learning about one method to identify and quantify peptides and proteins using mass spectrometry. We have attempted to write this tutorial so that the user does not need an extraordinary knowledge of proteomics, biology, chemistry, mass spectrometry, or software engineering. Also, this tutorial does not require any software or data that is not easily available on the web, and it does not require any previous experience with the analysis of proteomics data. This tutorial should also be of use to those who are very familiar with proteomics data analysis but do not have a great deal of experience with IPP or TPP.

2 Getting Started

2.1 Downloading IPP

IPP can be downloaded from http://www.insilicos.com/IPP_download.html

2.2 Installing IPP

To install IPP, double-click the IPP executable that you downloaded. Read the license and continue if you agree. The IPP installer will set up the pipeline and the web server used to interact with it. The installer will also give you the option of installing InsilicosViewer, an agile data viewer for mass spectrometric proteomics experiments. InsilicosViewer can be very useful for publishing MS data and troubleshooting the analytical process. After installation, a shortcut to the Insilicos Proteomics Pipeline will be placed on your desktop.

The IPP GUI is a web application that calls a set of command line utilities, which generate results in XML format for viewing in a web browser. A web server is therefore required on the computer where IPP is installed. If you don't have one, the IPP installer will install one for you. You do not need to do any extra setup, but you should be aware that a web server is part of the IPP installation.

2.3 IPP license

IPP licenses are sold as yearly subscriptions, and academic discounts are available. Quotes are available by writing to getipp@insilicos.com. When you install and run IPP, you will notice language about sending a code to Insilicos at getipp@insilicos.com. IPP will operate without a license, but to get the full benefit of the Insilicos performance enhancements and extra features you will need one. After you send your code to getipp@insilicos.com, you will receive a response that contains a code. At the computer where IPP is installed, choose Run from the Windows Start menu, type "xinteract -licensecheck", then paste the code starting with "Key: ..."

Alternatively, you can copy the key to a file named "license.key" in the IPP installation directory (usually "c:\Inetpub\ipp-bin\license.key").
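The file-based alternative can be scripted if you manage several IPP installations. This is a minimal, hypothetical sketch: the function name is illustrative, and it assumes that license.key should contain exactly the text you received (starting with "Key:"), with surrounding whitespace trimmed.

```python
from pathlib import Path

# Usual install location given in the tutorial; adjust if IPP lives elsewhere.
DEFAULT_IPP_BIN = Path(r"c:\Inetpub\ipp-bin")


def write_license_key(key_text: str, ipp_bin: Path = DEFAULT_IPP_BIN) -> Path:
    """Save the key received from Insilicos as license.key in the IPP directory."""
    key_text = key_text.strip()  # assumption: leading/trailing whitespace is not part of the key
    # The tutorial says the code you receive starts with "Key:".
    if not key_text.startswith("Key:"):
        raise ValueError('expected the key text to start with "Key:"')
    target = Path(ipp_bin) / "license.key"
    target.write_text(key_text, encoding="utf-8")
    return target
```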

2.4 DOS reminders

With the introduction of the proteomics pipeline GUI, IPP does not require much use of the command line. This section is included because the pipeline tools can also be run in a DOS environment, and high-throughput proteomics facilities find that automating commands there can save operator time. If you are old enough to remember the dark days of DOS, you will have no problem running IPP from DOS. If not, we have included a few commands to make you feel at home. First of all, the DOS shell can be opened from the Run item in the Start menu:
Click Start.
Click Run.
Type cmd in the box labeled ‘Open:’.
Below are a few commands that will help you find your way around the DOS environment.

dir: lists the files in a directory.
cd: changes directory; cd .. moves you up one directory level.
md: makes a directory.
move: moves a file to a different directory.
program: typing the name of a program displays its reference information. For components of the pipeline, this will often show the syntax needed to run the program and the options associated with it.

To copy text from the DOS shell, first highlight the text with the mouse, put the mouse over the DOS shell window bar, right-click, select Edit, and then select Copy. To paste text, put the cursor in the desired location, put the mouse over the DOS shell window bar, right-click, select Edit, and then select Paste.

Wildcards: * and ? are wildcard characters in the DOS shell. For example, the command dir raft4???.html lists all the .html files in the directory that start with raft4 and have 3 characters after the ‘4’ and before the ‘.’. The * wildcard is more general: it matches any number of characters, including none. dir raft4041.* lists all the files that start with ‘raft4041.’. Wildcards can be used in most DOS shell commands.
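To experiment with how ? and * patterns select file names, Python's standard `fnmatch` module applies similar rules. Treat this as an illustration rather than an exact model of DOS: `fnmatch` follows Unix shell conventions, and cmd.exe differs in some edge cases. The file names below are made up for the example.

```python
from fnmatch import fnmatch

# Hypothetical directory listing.
names = [
    "raft4041.html",
    "raft4243.html",
    "raft4041.xml",
    "raft40411.html",
    "raft5051.html",
]


def matching(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidate names that match a ?/* wildcard pattern."""
    return [name for name in candidates if fnmatch(name, pattern)]


# '?' matches exactly one character: three characters between '4' and '.'
print(matching("raft4???.html", names))  # ['raft4041.html', 'raft4243.html']
# '*' matches any run of characters, including none
print(matching("raft4041.*", names))  # ['raft4041.html', 'raft4041.xml']
```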

2.5 Configuring the Graphical User Interface (GUI)

For this tutorial and future data analysis, all data should be stored in C:\Inetpub\wwwroot\ISB\. Each experiment can be stored in an individual folder at this location, like our tutorial folder.

Setting up an account: IPP’s pipeline GUI comes with one user account, which has guest as both the user name and password. Under most circumstances there will be no need for another account, but for this tutorial we will make one for the fun of it. Open the DOS shell (choose Run from the Windows Start menu and type cmd), then type:
cd c:\inetpub\ipp-bin\users\
md tutorial
cd tutorial
crypt isbTPPspc IPP > .password
You have just created the password ‘IPP’ for the user ‘tutorial’.
NOTE: To add a different user name, create an ipp-bin/users/NEWUSER/ directory and run crypt isbTPPspc NEWPASSWORD > .password from that directory. In these examples, "isbTPPspc" is the crypt key. This can be changed by altering the tpp_gui.pl code.
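If you need to create several GUI accounts, the same steps can be scripted. The following is a minimal sketch, not part of IPP: it assumes IPP's crypt utility is on the PATH and behaves as in the commands above (printing the encrypted password to standard output), and the helper names are hypothetical.

```python
import subprocess
from pathlib import Path

CRYPT_KEY = "isbTPPspc"  # default crypt key used in the commands above


def password_command(password: str, key: str = CRYPT_KEY) -> list[str]:
    """Argument list equivalent to typing: crypt <key> <password>."""
    return ["crypt", key, password]


def create_gui_user(users_dir: Path, username: str, password: str,
                    key: str = CRYPT_KEY) -> Path:
    """Make users_dir/username and save crypt's output there as .password."""
    user_dir = Path(users_dir) / username
    user_dir.mkdir(parents=True, exist_ok=True)  # like: md <username>
    password_file = user_dir / ".password"
    with password_file.open("w") as out:
        # Same as running "crypt <key> PASSWORD > .password" inside user_dir.
        subprocess.run(password_command(password, key), stdout=out,
                       check=True, cwd=user_dir)
    return password_file
```

For the tutorial account this would be `create_gui_user(Path(r"c:\inetpub\ipp-bin\users"), "tutorial", "IPP")`; pass `key=` if you changed the crypt key in tpp_gui.pl.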

3 Tutorial Data

3.1 Getting the Tutorial Data

This tutorial uses a data set containing proteins that co-purified with lipid raft plasma membrane domains isolated from control and stimulated Jurkat human T cells. The analysis of similar data can be found in: “The Application of New Software Tools to Quantitative Protein Profiling Via Isotope-coded Affinity Tag (ICAT) and Tandem Mass Spectrometry: II. Evaluation of Tandem Mass Spectrometry Methodologies for Large-Scale Protein Analysis, and the Application of Statistical Tools for Data Analysis and Interpretation”, Priska D. von Haller, Eugene Yi, Samuel Donohoe, Kelly Vaughn, Andrew Keller, Alexey I. Nesvizhskii, Jimmy Eng, Xiao-jun Li, David R. Goodlett, Ruedi Aebersold, and Julian D. Watts, Mol Cell Proteomics 2003 2: 428-442. The data used in this tutorial is not the same data described in the publication, but the same scientists collected it using the same sample preparation and mass spectrometry procedures. Analysis was done on an LCQ Classic. The samples were ICAT labeled (Old-ICAT, light = d0 442, heavy = d8 450), separated by cation exchange chromatography, purified on avidin cartridges, separated by µLC, and measured by MS/MS. The tandem mass spectra were then analyzed using SEQUEST, and this tutorial begins with the analysis of the SEQUEST results. Only a portion of the data from the raft experiment is used, in order to save time and hard drive space. Because the data has already been searched, you do not need a SEQUEST license on the computer used for this tutorial. Download the data at http://www.insilicos.com/data/tutorial.exe
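The label masses quoted above determine where the light and heavy versions of the same peptide appear relative to each other. Since the ICAT reagent attaches to cysteine residues, each label adds 450 − 442 = 8 Da to the heavy form, so a peptide with n labeled cysteines observed at charge z shows a light/heavy pair separated by 8n/z in m/z. The helper below is a minimal sketch of that arithmetic using the nominal masses from the text; it is not an IPP function, and it assumes every cysteine is labeled.

```python
# Nominal Old-ICAT reagent masses quoted in the tutorial (Da).
LIGHT_TAG_DA = 442.0  # d0 label
HEAVY_TAG_DA = 450.0  # d8 label


def icat_pair_spacing(labeled_cysteines: int, charge: int) -> float:
    """m/z distance between the light and heavy forms of one peptide.

    Assumes every cysteine carries one label, so the mass difference is
    (heavy - light) per cysteine, divided by the ion's charge.
    """
    if labeled_cysteines < 1 or charge < 1:
        raise ValueError("need at least one labeled cysteine and a positive charge")
    return (HEAVY_TAG_DA - LIGHT_TAG_DA) * labeled_cysteines / charge


# A doubly charged peptide with one cysteine: the pair is 4 m/z units apart.
print(icat_pair_spacing(1, 2))  # 4.0
```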

3.2 Unpacking and Storing the IPP Tutorial Data

It is important that all the data analyzed with IPP be stored in specific locations. For security reasons, IPP can only see data located under the C:\Inetpub\wwwroot\ISB directory. The downloaded data is contained in a self-extracting WinZip archive. Click on tutorial.exe and the files should extract to the C:\Inetpub\wwwroot\ISB directory. You should now have a folder named ‘tutorial’ that contains mzXML data for 6 LC runs, folders containing the .out and .dta files, a sequest.params file, and a folder containing a FASTA database.
NOTE: To analyze data from your own experiments, you will need to search the data and convert the raw data to mzXML format. These steps are covered in the last section, Beyond this Tutorial.
The dbase folder needs to be somewhere that IPP can find it. Move the dbase folder to C:\Inetpub\wwwroot using Windows Explorer.
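Since IPP only reads experiment data stored under C:\Inetpub\wwwroot\ISB, a quick check before starting an analysis can save a confusing failure later. This is a hypothetical helper, not part of IPP; it uses Python's `pathlib.PureWindowsPath` so it works on any machine, compares path components case-insensitively as Windows does, and simply rejects paths containing "..", since it does not resolve them.

```python
from pathlib import PureWindowsPath

# Root directory IPP reads experiment data from (see above).
IPP_DATA_ROOT = PureWindowsPath(r"C:\Inetpub\wwwroot\ISB")


def is_visible_to_ipp(path: str) -> bool:
    """True if `path` lies at or below the IPP data root (case-insensitive)."""
    parts = PureWindowsPath(path).parts
    # '..' is not resolved by PureWindowsPath, so refuse such paths outright.
    if ".." in parts:
        return False
    root = [part.lower() for part in IPP_DATA_ROOT.parts]
    candidate = [part.lower() for part in parts]
    return candidate[: len(root)] == root
```

For example, `is_visible_to_ipp(r"c:\inetpub\wwwroot\isb\tutorial")` returns True, while a folder such as `C:\Data\tutorial` would not be visible to IPP.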

4 IPP Tutorial

4.1 Creating Summary HTML Files

Before we get started with the GUI, we need to run one command outside it. This is because the GUI assumes that the SEQUEST search will be done from the GUI. Because we decided not to require SEQUEST for this tutorial, we will first convert the tutorial’s search results into a format the GUI can read using a text command. After analysis with SEQUEST or TurboSEQUEST, each tandem mass spectrum from a liquid chromatography (LC) experiment produces an individual .out file. The first step in analyzing the tutorial results is to collect the results from each LC separation. The Out2Summary program collates the .out files into a single HTML file for each LC separation. The original raft data contains 24 separate LC separations. For speed and portability, this tutorial analyzes only 6 of them, and the data from these 6 separations will be combined and analyzed as a single experiment. Start by changing the DOS shell's directory to your working directory for the tutorial. Type or copy the following command into the DOS shell:
cd c:\inetpub\wwwroot\isb\tutorial
NOTE: You do not need to worry about capitalization for commands of this type in DOS.

Out2Summary must be run for each LC separation. Type or copy (yes, you can copy multiple commands at once):
out2summary raft4041 > raft4041.html
out2summary raft4243 > raft4243.html
out2summary raft4445 > raft4445.html
out2summary raft4647 > raft4647.html
out2summary raft4849 > raft4849.html
out2summary raft5051 > raft5051.html
This process will take a few minutes.
NOTE: The “>” operator redirects output that would otherwise go to the screen into the file named raftXXXX.html.
NOTE: In future analyses, the base name used for the .html file should match the base name used for the mzXML data (as above) if you want the instrument information to be passed to the IPP tools.
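Because the six out2summary commands differ only in the run name, they are easy to generate. The sketch below is one hypothetical way to run them from Python instead of typing them; it assumes out2summary is on the PATH and is run from the tutorial directory, and it reproduces the ">" redirection by sending standard output to the matching .html file.

```python
import subprocess
from pathlib import Path

# The six LC separations analyzed in this tutorial.
RUNS = ["raft4041", "raft4243", "raft4445", "raft4647", "raft4849", "raft5051"]


def summary_jobs(runs: list[str]) -> list[tuple[list[str], str]]:
    """Pair each out2summary command with the HTML file that receives its output."""
    return [(["out2summary", run], f"{run}.html") for run in runs]


def run_out2summary(workdir: Path, runs: list[str] = RUNS) -> None:
    """Run 'out2summary <run> > <run>.html' for each run, stopping on the first error."""
    for argv, html_name in summary_jobs(runs):
        with open(Path(workdir) / html_name, "w") as out:
            subprocess.run(argv, stdout=out, check=True, cwd=workdir)
```

Calling `run_out2summary(Path(r"c:\inetpub\wwwroot\isb\tutorial"))` would then produce the six raftXXXX.html files.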

4.2 Opening the GUI

The IPP pipeline GUI can be opened by clicking the Insilicos Proteomics Pipeline shortcut that was created on your desktop during installation, or by selecting “Insilicos Proteomics Pipeline” under “Insilicos Proteomics Pipeline” in the Windows Start menu. Alternatively, you can open your favorite web browser and paste this link into the navigation bar: http://localhost/ipp-bin/tpp_gui.pl Log in as ‘tutorial’ and use ‘IPP’ as the password.

This tutorial is written from the point of view of a researcher viewing data on the computer where the IPP tools are running. The IPP results can also be viewed from another computer through a web browser. For instance, if the computer running IPP has the IP address 10.0.2.2 on your local network, you can point the browser on another computer in your network to http://10.0.2.2/ipp-bin/ to view the results.
NOTE: To obtain the IP address of your computer, type ipconfig in the command shell.
NOTE: Currently the network server capability is only fully functional with a licensed version of IPP.
At this point you will be in the “Home” tab of the proteomics pipeline GUI. The Home tab contains information about IPP and the structure of the GUI, along with a pull-down menu that lets you choose between SEQUEST and Mascot. The default is SEQUEST, which is what this tutorial uses, so no input is necessary under this tab.
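The local and network addresses above differ only in the host name, so a small helper can build either one and check whether the IPP web server is answering. This is a hypothetical sketch using Python's standard `urllib`; it assumes that the GUI script path /ipp-bin/tpp_gui.pl, given earlier for the local machine, also applies when browsing from another computer.

```python
from urllib.request import urlopen

GUI_PATH = "/ipp-bin/tpp_gui.pl"


def gui_url(host: str = "localhost") -> str:
    """Build the pipeline GUI address for this machine or one on the network."""
    return f"http://{host}{GUI_PATH}"


def gui_is_up(host: str = "localhost", timeout: float = 5.0) -> bool:
    """Return True if the IPP web server answers the GUI address with HTTP 200."""
    try:
        with urlopen(gui_url(host), timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False
```

For example, `gui_is_up("10.0.2.2")` would test the network address used in the example above.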

4.2 Creating pepXML Files

For this tutorial we begin with data that has already been searched, and convert the search results to the pepXML common file format. This keeps the tutorial instrument independent and means it requires no software beyond IPP. Click on “Analysis Pipeline”. This will display six tabs, which activate different parts of the pipeline. The first tab is Home, which contains information about IPP. The second tab is used to convert data from different spectrometers into mzXML, and the third tab is used to search the data. We will start with the fourth tab. Your next step is to convert the search results from .html to the pepXML format. pepXML is a file format for storing database search results at the peptide level. A great advantage of pepXML is that its format is independent of the search engine. pepXML converters are currently available for SEQUEST, Mascot, COMET and X!Tandem results, and the Mascot software also contains a pepXML exporter.

NOTE: In the near future look for the mzIdent file format that will be a Human Proteome Organisation (HUPO) standard based on pepXML.

Select the ‘pepXML’ tab in the GUI interface.

Select the ‘Add Files’ button.

Using the directory selector on the right side, navigate to the tutorial directory.

Select ‘View’ for one of the .html files.

This command opens another window that contains the SEQUEST search results for all of the spectra in a given LC run.

NOTE: This window can also be accessed at http://localhost/ISB/Tutorial/raft4041.html.

Go back to the Main GUI page.

Check the select box to the left of each of the 6 .html files.

Press the ‘Select’ button.

In the updated window,

Press ‘Add Files’ under the ‘Specify Sequest Parameters File’ section.

Check the sequest.params file and press ‘Select’.

There is no need to select any of the options and the enzyme should already be set as trypsin.

Press ‘Convert to PepXML’.

This command will take a moment to run. You will need to update the page by clicking the text “UPDATE THIS PAGE”. When the command has completed, you will get 6 copies of the message “Command Successful”.

As an alternative, this same command can be run from the command line. From the DOS shell, in the tutorial directory, type:
Sequest2XML <file_name.html> -Psequest.params
for each of the six raft????.html files in the tutorial directory. For example:
Sequest2XML raft4041.html -Psequest.params
Sequest2XML raft4243.html -Psequest.params
etc.
NOTE: When analyzing your own data, the working directory must contain the summary .html and .mzXML files, as well as the SEQUEST results (as .tgz archives or in subdirectories), for Sequest2XML to work.
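Rather than typing one Sequest2XML command per file, the summary files can be collected with the same raft????.html pattern. This is a minimal sketch under the assumptions that Sequest2XML is on the PATH and accepts exactly the arguments shown above; the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def sequest2xml_commands(workdir: Path, params: str = "sequest.params") -> list[list[str]]:
    """Build one Sequest2XML command per raft????.html summary file in workdir."""
    # Path.glob treats '?' as exactly one character, like the DOS pattern.
    summaries = sorted(p.name for p in Path(workdir).glob("raft????.html"))
    return [["Sequest2XML", name, f"-P{params}"] for name in summaries]


def convert_all(workdir: Path) -> None:
    """Run each command from the tutorial directory, stopping at the first failure."""
    for argv in sequest2xml_commands(workdir):
        subprocess.run(argv, check=True, cwd=workdir)
```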

4.3 Analyze Peptides

Now that you have successfully converted your data to the pepXML format, select the ‘Analyze Peptides’ tab at the top of the GUI. Press the ‘Add Files’ button and navigate to the tutorial directory. At this point let’s view the search results in the pepXML format.

4.3.1 PepXML Viewer

Click the ‘View’ link next to raft4041.xml.
NOTE: This window can also be accessed at http://localhost/ISB/Tutorial/raft4041.xml.

A new window containing a pepXML viewer will open. From here you can generate a Pep3D image of the LC/MS data, view the complete SEQUEST output for any spectrum, look at the spectra with the matching ions highlighted, see the peptide in relation to the protein it is part of, and BLAST the protein. At the top of the pepXML viewer there is a SEQUEST link. Click this link to view the SEQUEST parameters. Just under the SEQUEST link are buttons to generate a Pep3D image and to save the image. Click on the ‘Generate a Pep3D image’ button. Pep3D images can be very useful for assessing the quality of LC-MS/MS data. The Pep3D map has mass channels on one axis and chromatographic time on the other.
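Conceptually, a Pep3D-style map is a two-dimensional grid: signal is binned by chromatographic time along one axis and by m/z along the other, and each cell is shaded by its total intensity. The sketch below illustrates only that binning idea. It is not IPP's Pep3D implementation; the input layout (retention time in minutes, m/z, intensity) and the bin widths are assumptions, and reading the points out of an mzXML file is left out.

```python
from collections import defaultdict

# Hypothetical input layout: (retention_time_min, mz, intensity) per data point.
Point = tuple[float, float, float]


def pep3d_grid(points: list[Point], rt_bin: float, mz_bin: float) -> dict[tuple[int, int], float]:
    """Sum intensities into (time bin, m/z bin) cells, the layout a Pep3D-style map draws."""
    grid: dict[tuple[int, int], float] = defaultdict(float)
    for rt, mz, intensity in points:
        cell = (int(rt // rt_bin), int(mz // mz_bin))
        grid[cell] += intensity
    return dict(grid)
```

Each key of the returned dictionary is one pixel of the map; a plotting library could then render higher totals as darker cells.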