Abstract
Typical analyses of mass spectrometry data identify only amino acid sequences that exist in reference databases. This restricts the discovery of new peptides, such as those that contain uncharacterized mutations or originate from unexpected processing of RNAs and proteins. De novo peptide sequencing approaches address this limitation but often suffer from low accuracy and require extensive validation by experts. Here, we develop SMSNet, a deep learning-based de novo peptide sequencing framework that achieves >95% amino acid accuracy while retaining good identification coverage. Applying SMSNet to landmark proteomics and peptidomics studies reveals over 10,000 previously uncharacterized HLA antigens and phosphopeptides and, in conjunction with database-search methods, expands the coverage of peptide identification by almost 30%. SMSNet's ability to accurately identify new peptides would make it an invaluable tool for future proteomics and peptidomics studies, including tumor neoantigen discovery, antibody sequencing, and proteome characterization of non-model organisms.
- Software
- Deep learning
- Bioinformatics searching
- De novo sequencing
- Mass spectrometry
- Peptides*
- Phosphoproteome
Footnotes
Author contributions: K.K. and S.S. performed research; K.K., H.-Y.T., and S.S. analyzed data; K.K., E.C., and S.S. wrote the paper; H.-Y.T. and D.W.S. contributed new reagents/analytic tools; E.C. and S.S. designed research; H.-Y.T. and D.W.S. contributed mass spectrometry data.
- Received July 2, 2019.
- Revision received September 9, 2019.
- Accepted October 4, 2019.
- Published under license by The American Society for Biochemistry and Molecular Biology, Inc.