Originally published In Press as doi:10.1074/mcp.M500233-MCP200 on November 30, 2005.
Molecular & Cellular Proteomics 5:497-509, 2006.
© 2006 by The American Society for Biochemistry and Molecular Biology, Inc.
Research
Improved Classification of Mass Spectrometry Database Search Results Using Newer Machine Learning Approaches*
Peter J. Ulintz¶, Ji Zhu||, Zhaohui S. Qin**, and Philip C. Andrews
From the National Resource for Proteomics and Pathways, Bioinformatics Program, ||Department of Statistics, and **Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan 48109
Manual analysis of mass spectrometry data is a current bottleneck in high throughput proteomics. In particular, the need to manually validate the results of mass spectrometry database searching algorithms can be prohibitively time-consuming. Development of software tools that attempt to quantify the confidence in the assignment of a protein or peptide identity to a mass spectrum is an area of active interest. We sought to extend work in this area by investigating the potential of recent machine learning algorithms both to improve the accuracy of these approaches and to serve as a flexible framework for accommodating new data features. Specifically, we demonstrated the ability of boosting and random forest approaches to improve the discrimination of true hits from false positive identifications in the results of mass spectrometry database search engines compared with thresholding and other machine learning approaches. We accommodated additional attributes obtainable from database search results, including a factor addressing proton mobility. Performance was evaluated using publicly available electrospray data and a new collection of MALDI data generated from purified human reference proteins.
¶ To whom correspondence should be addressed: University of Michigan, 300 North Ingalls Bldg., Rm. 1196, Ann Arbor, MI 48109. Tel. and Fax: 734-647-0951; E-mail: pulintz{at}umich.edu
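The contrast the abstract draws between score thresholding and boosting can be illustrated with a minimal sketch. This is not the authors' implementation: the five toy search results, the feature names `score` and `delta_score`, and all values are invented for illustration, and the boosting variant shown (discrete AdaBoost over one-feature decision stumps) is one common form that may differ from the one used in the paper. Random forests are not shown.

```python
import math

# Toy search results as (score, delta_score); label +1 = true hit, -1 = false positive.
# Feature names and values are hypothetical, chosen only to make the point visible.
X = [(3.0, 0.30), (3.0, 0.05), (1.5, 0.30), (1.5, 0.05), (1.0, 0.02)]
y = [1, 1, 1, -1, -1]


def stump_predict(stump, row):
    """A stump is (feature index, threshold, polarity): one threshold on one feature."""
    feature, threshold, polarity = stump
    return polarity if row[feature] >= threshold else -polarity


def best_stump(X, y, weights):
    """Return the stump with the lowest weighted error, together with that error."""
    best, best_err = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                err = sum(w for row, label, w in zip(X, y, weights)
                          if stump_predict(stump, row) != label)
                if err < best_err:
                    best, best_err = stump, err
    return best, best_err


def adaboost(X, y, n_rounds):
    """Discrete AdaBoost over decision stumps; returns a list of (alpha, stump)."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        stump, err = best_stump(X, y, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified examples and down-weight correct ones, then renormalize.
        weights = [w * math.exp(-alpha * label * stump_predict(stump, row))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble, row):
    """Weighted vote of all stumps; ties go to +1."""
    vote = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if vote >= 0 else -1


def accuracy(predict, X, y):
    return sum(predict(row) == label for row, label in zip(X, y)) / len(X)


# Baseline: the single best threshold on any one feature (unweighted).
single, _ = best_stump(X, y, [1.0 / len(X)] * len(X))
boosted = adaboost(X, y, n_rounds=3)
threshold_acc = accuracy(lambda row: stump_predict(single, row), X, y)
boosted_acc = accuracy(lambda row: ensemble_predict(boosted, row), X, y)
print(threshold_acc, boosted_acc)
```

On this toy set no single threshold separates the two classes (the best one classifies four of five results correctly), while three boosted stumps combining both features classify all five. These are training-set figures on invented data and say nothing about the performance reported in the paper.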
