Molecular & Cellular Proteomics 4:1189-1193, 2005.
© 2005 by The American Society for Biochemistry and Molecular Biology, Inc.
From the Mass Spectrometry Facility, University of California San Francisco, San Francisco, California 94143-0446 and the ¶ Department of Biological Sciences, Stanford University, Stanford, California 94305-0155
| ABSTRACT |
|---|
In database searches of large datasets there is always a long list of spectra that have not been matched to anything by the search engine. There are a number of reasons why these may not match, including poor quality spectra, spectra of peptides containing modifications that were not considered in the search, or peptides that were formed by non-specific cleavages when a certain enzyme cleavage specificity was defined in the search engine. Also the data analyzed by search engines are not the raw data but rather centroided peak list data, which are not always completely representative of the raw data.
These unmatched spectra are typically ignored despite the possibility that they could contain important information. A summary of the complications in automated peptide and protein identification has been published recently (9). Hence a number of groups have developed statistical analysis programs of search results to better define the reliability of the reported matches (10–13).
There are many groups publishing results from large scale mass spectrometric analyses using different combinations of mass spectrometers and search engines. Unfortunately if a researcher uses one particular combination of tools it can be difficult to assess the quality of the data in studies using different instrument and search engine combinations. Hence there is a drive toward making the raw data itself available so that one can independently assess results and, if desired, reanalyze the results using an alternative searching strategy (14).
In this study we present data from a multidimensional LC-MSMS experiment where we analyzed all acquired spectra manually. From this we are able to report exactly what these unmatched spectra actually constitute. We think this information is important for understanding where there are currently problems with these automated search strategies and to indicate areas where, with further refinement, this list of unmatched spectra could be reduced. The dataset submitted here was acquired on a QqTOF¹ geometry instrument, a QSTAR Pulsar (MDS Sciex/Applied Biosystems). A dataset of a multidimensional LC-MSMS experiment created on an ion trap, LCQ-DECA (Thermo), has already been published in this journal (15). Here we present a QSTAR dataset for comparison. Second to ion traps, QqTOF geometry instruments are the major type of instrument used for large scale proteomic analyses. This dataset submission will allow comparisons of the relative merits of data acquired on each instrument type.
| EXPERIMENTAL PROCEDURES |
|---|
α-factor exposure for 3 h or at M phase using 20 µg/ml nocodazole for 3 h, and then interacting proteins were isolated as published previously (17). Proteins from each cell state (about 5–10 µg/cell state) were labeled with the cleavable ICAT reagent (Applied Biosystems, Foster City, CA) and analyzed essentially following our published protocol for ICAT of low level samples (18). Briefly, proteins were denatured in 9 M urea and reduced with trichloroethylphosphine, and then cysteines of G1 phase-arrested proteins were alkylated with light ICAT reagent, while M phase proteins were alkylated with isotopically heavy reagent. After tryptic digestion, peptides were separated by strong cation exchange using a Beckman Gold HPLC system equipped with an analytical flow upgrade. Separation was achieved using a 2.1 × 10-mm polysulfoethyl A column (PolyLC) where Buffer A was 30% ACN, 0.05% formic acid and Buffer B was Buffer A containing 400 mM NH4Cl. Six fractions were collected, and each of these was successively passed through the biotin affinity cartridge (Applied Biosystems ICAT kit). Each flow-through was collected separately, and then all ICAT peptides were eluted into one fraction using 30% ACN, 0.4% trifluoroacetic acid. ICAT tags were cleaved in 95% trifluoroacetic acid. Each fraction was desalted by reverse phase cleanup (Zip Tips, Millipore) and then analyzed by reverse phase LC-MSMS. Reverse phase chromatography was performed using an Ultimate HPLC system and a Famos autosampler (both LC-Packings). Separation was achieved using a 75-µm × 150-mm Pepmap column (LC-Packings) at a flow rate of 300 nl/min. Buffer A was 0.1% formic acid, while Buffer B was acetonitrile, 0.1% formic acid. The gradient separation was 5–40% B over 105 min.
As peptides eluted off the column they were introduced on line into an ESI-QqTOF instrument (QSTAR) and were analyzed using data-dependent switching between MS and MSMS modes; after a 1-s MS spectrum up to three multiply charged precursor ions could be selected for 2-s CID spectra acquisition. After a given precursor was selected, dynamic exclusion was used for the next 60 s to prevent its subsequent reselection.
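The acquisition logic described above (up to three multiply charged precursors per survey scan, each excluded from reselection for 60 s) can be illustrated with a minimal Python sketch. This is not the instrument vendor's algorithm: the class name `PrecursorSelector`, the ranking of candidates by intensity, and the m/z matching tolerance `mz_tol` are assumptions made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PrecursorSelector:
    """Toy data-dependent precursor picker with dynamic exclusion."""
    max_precursors: int = 3      # up to three precursors per MS survey scan
    exclusion_s: float = 60.0    # dynamic exclusion window, in seconds
    mz_tol: float = 0.1          # ASSUMED m/z tolerance for matching excluded ions
    _excluded: list = field(default_factory=list)  # (mz, expiry_time) pairs

    def _is_excluded(self, mz, now):
        return any(abs(mz - ex_mz) <= self.mz_tol and now < expiry
                   for ex_mz, expiry in self._excluded)

    def select(self, peaks, now):
        """peaks: iterable of (mz, intensity, charge); now: scan time in s."""
        # Forget exclusion entries whose window has expired.
        self._excluded = [(m, t) for m, t in self._excluded if t > now]
        # Only multiply charged ions are eligible; ranking by intensity
        # is an assumption, the text does not state the ranking rule.
        candidates = sorted((p for p in peaks if p[2] >= 2),
                            key=lambda p: p[1], reverse=True)
        chosen = []
        for mz, _intensity, _charge in candidates:
            if len(chosen) == self.max_precursors:
                break
            if self._is_excluded(mz, now):
                continue
            chosen.append(mz)
            self._excluded.append((mz, now + self.exclusion_s))
        return chosen


selector = PrecursorSelector()
peaks = [(500.3, 100, 2), (600.4, 90, 3), (700.5, 80, 2),
         (800.6, 70, 2), (400.2, 200, 1)]
first = selector.select(peaks, now=0.0)    # three most intense multiply charged ions
second = selector.select(peaks, now=7.0)   # earlier picks are still excluded
third = selector.select(peaks, now=61.0)   # first exclusions have expired
```

In this toy run the singly charged ion at m/z 400.2 is never selected despite being the most intense peak, and the fourth-ranked ion is only fragmented in the second cycle, once the three stronger ions are under exclusion.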
Peak lists of MSMS spectra from each LC-MS run were created using the Mascot.dll script (version 1.4) within Analyst. These were searched using "Batch Tag," a new piece of software in the latest in-house developmental version of Protein Prospector (for further details see Ref. 19). Those spectra that did not return a high confidence result were manually analyzed by looking at the raw spectra in the Analyst software by interpreting amino acid sequence tags and searching in MS-Homology (Protein Prospector) or by closer examination of the results from the Batch Tag search and assessment of whether the ions observed are those one would predict to be most intense on the basis of the sites of amino acid cleavages (e.g. cleavage N-terminal to a proline or C-terminal to an aspartic acid).
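As a rough illustration of the manual check described above, the following Python sketch computes singly charged y-ion m/z values for a peptide and flags the backbone bonds where cleavage is expected to be favored (N-terminal to proline, C-terminal to aspartic acid). The function names `y_ions` and `favored_bonds` and the example peptide are hypothetical; this is not part of Batch Tag or Protein Prospector, and it ignores modifications and higher charge states.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def y_ions(peptide):
    """Return singly charged y-ion m/z values, y1 first."""
    ions = []
    total = WATER + PROTON
    for residue in reversed(peptide):
        total += RESIDUE_MASS[residue]
        ions.append(total)
    return ions[:-1]  # the full-length species is the precursor, not a fragment


def favored_bonds(peptide):
    """List (bond, reason); bond i lies between residues i and i+1 (1-based)."""
    favored = []
    for i in range(1, len(peptide)):
        if peptide[i] == "P":
            favored.append((i, "N-terminal to Pro"))
        elif peptide[i - 1] == "D":
            favored.append((i, "C-terminal to Asp"))
    return favored


example = "SDLPK"                 # hypothetical peptide, for illustration only
fragments = y_ions(example)       # y1..y4
hot_spots = favored_bonds(example)
```

For a peptide of length n, cleavage at bond i produces the y ion y(n - i), so the flagged bonds indicate which entries of `fragments` one would expect to be among the most intense.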
| RESULTS |
|---|
Approximately 2000 of these spectra gave confident results, and these were verified only by a cursory look at plots of the ions observed and what they were matched to. The majority of these matches were on the basis of an extensive "y" ion series. The other ~1300 spectra were manually analyzed in more extensive detail to determine whether the peptides could be de novo interpreted and, if not, why a peptide could not be confidently assigned.
Following this comprehensive analysis of the dataset we could confidently assign 2368 spectra to predicted tryptic peptides that we felt a search engine should be able to identify when allowing for the modifications of oxidized methionines, protein N-terminal acetylation, and pyroglutamate formation from N-terminal glutamine residues. This left 901 spectra that for various reasons one would not expect the search engine to make a confident match. The reasons for this are summarized in Table I and reported graphically in Fig. 1.
A total of 51 spectra were of modified peptides. The majority of these were either peptides where an asparagine had become deamidated to an aspartic acid or were from the trypsin, which is methylated to reduce chymotryptic activity and minimize autolysis (20). However, there was also a peptide that had an internal disulfide intact, thus having a molecular mass 2 Da less than the peptide with free sulfhydryl groups. A peptide from elongation factor 1α was identified that had a methylated lysine. This lysine 30 is a known site of modification (7).
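The mass shifts implied by the modifications mentioned in this section can be worked out from elemental composition changes. The short Python sketch below does this with standard monoisotopic element masses; the dictionary layout and the helper name `delta_mass` are illustrative assumptions, not taken from any search engine.

```python
# Monoisotopic masses (Da) of the elements involved.
MONO = {"H": 1.00782503, "C": 12.0, "N": 14.0030740, "O": 15.9949146}

# Net elemental change for each modification discussed in the text.
MODS = {
    "deamidation (Asn -> Asp)": {"N": -1, "H": -1, "O": 1},
    "methylation (Lys)": {"C": 1, "H": 2},
    "intact internal disulfide": {"H": -2},
    "oxidation (Met)": {"O": 1},
    "N-terminal acetylation": {"C": 2, "H": 2, "O": 1},
    "pyroglutamate from N-terminal Gln": {"N": -1, "H": -3},
}


def delta_mass(composition_change):
    """Monoisotopic mass shift (Da) for a net elemental change."""
    return sum(MONO[element] * count
               for element, count in composition_change.items())


shifts = {name: delta_mass(change) for name, change in MODS.items()}
for name, shift in shifts.items():
    print(f"{name:36s} {shift:+.4f} Da")
```

The disulfide entry reproduces the roughly 2 Da deficit noted above, while deamidation shifts the peptide by just under 1 Da, which is easy to confuse with a misassigned monoisotopic peak if the search does not allow for it.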
A number of spectra could not be assigned because of problems in the creation of the peak list used for searching. The data are acquired as profile data but are converted to centroid data for database searching. Errors in assigning the peak charge state and in recognizing the monoisotopic peak after this centroiding process lead to incorrect information about the parent ion mass, and thus the peptide will not be identified. Both of these problems were most common in spectra of components of relatively high mass (2500 Da or higher) and were mainly caused by poor ion statistics on weak monoisotopic peaks. Jagged peak shapes lead to multiple spikes being labeled on one isotopic peak, which the software then interprets as part of a highly charged ion rather than as part of the same isotope profile as the second and third isotopes (Fig. 3).
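The two failure modes described here can be illustrated numerically. The Python sketch below estimates charge from the spacing between centroided isotope peaks and converts m/z to neutral mass; it is a simplified illustration, not the peak-picking logic of Analyst or the Mascot.dll script, and the function names, the `max_charge` limit, and the example m/z values are assumptions.

```python
PROTON = 1.007276
ISOTOPE_SPACING = 1.00335  # approximate 13C - 12C mass difference, Da


def charge_from_spacing(mz_peaks, max_charge=6):
    """Estimate charge from the mean spacing of adjacent centroided peaks."""
    spacings = [b - a for a, b in zip(mz_peaks, mz_peaks[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    return min(range(1, max_charge + 1),
               key=lambda z: abs(ISOTOPE_SPACING / z - mean_spacing))


def neutral_mass(mz, z):
    """Neutral monoisotopic mass from m/z and charge."""
    return (mz - PROTON) * z


# Hypothetical 3+ ion of a ~2700 Da peptide: clean isotope envelope.
clean = [901.0073, 901.3417, 901.6762]
z_clean = charge_from_spacing(clean)

# Same envelope, but a jagged first peak yields an extra centroid
# halfway to the second isotope, halving the apparent spacing.
jagged = [901.0073, 901.1745, 901.3417]
z_jagged = charge_from_spacing(jagged)

# Choosing the second isotope as "monoisotopic" shifts the parent mass.
mass_error = neutral_mass(clean[1], z_clean) - neutral_mass(clean[0], z_clean)
```

In this example the spurious centroid makes a 3+ ion look like a 6+ ion, and picking the wrong isotope peak shifts the computed parent mass by about 1 Da; either error is enough to put the true peptide outside a typical parent mass tolerance.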
| DISCUSSION |
|---|
This study is not reporting the results of a database search but a manual analysis of what we think a search engine could theoretically achieve on this dataset. For analysis of how search engines perform on this dataset, see the accompanying study (19). As these results are on the basis of manual assignments, there is inherently a subjectivity to the results. For example, 313 spectra were categorized as being unassignable fragmentation spectra of peptides. Their lack of assignment is due to an inability to determine with personal confidence an identity for the spectrum. This was in general due to there being very few ions in the spectrum, although some spectra contained several fragment ions of which many were clearly not derived from a peptide; i.e. the spectrum was a mixture of fragmentation of a peptide and a chemical contaminant.
Through the manual analysis of all the data we have been able to assess the quality of data acquired on a QSTAR mass spectrometer. This analysis has also highlighted some of the problems with the data produced. Although this dataset cannot be taken as completely representative of all data acquired on this type of instrument, it does show that the data are typically information-rich and that a high percentage of the data should be assignable.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Published, MCP Papers in Press, May 27, 2005, DOI 10.1074/mcp.D500001-MCP200
¹ The abbreviation used is: QqTOF, quadrupole selecting, quadrupole collision cell, time-of-flight.
* This work was supported by National Institutes of Health National Center for Research Resources Grants RR01614 and RR15804 and NHLBI Grant HL074005-03 and by the Vincent J. Coates Foundation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
To whom correspondence should be addressed: University of California San Francisco, 521 Parnassus Ave., Rm. C-18, San Francisco, CA 94143-0446. Tel.: 415-476-5189; Fax: 415-502-1655; E-mail: robertc{at}itsa.ucsf.edu