Molecular & Cellular Proteomics

Technological Innovation and Resources

A Meta-proteogenomic Approach to Peptide Identification Incorporating Assembly Uncertainty and Genomic Variation

Sujun Li, Haixu Tang and Yuzhen Ye
Molecular & Cellular Proteomics August 9, 2019, First published on May 29, 2019, 18 (8 suppl 1) S183-S192; https://doi.org/10.1074/mcp.TIR118.001233
Sujun Li
School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN
Haixu Tang
School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN
Yuzhen Ye
School of Informatics, Computing and Engineering, Indiana University, Bloomington, IN
  • For correspondence: yye@indiana.edu

Figures

  • Fig. 1.

    An overview of the pipeline for peptide/protein identification from metaproteomic MS/MS data. The pipeline integrates two approaches: the Graph2Pro approach, which uses assembly uncertainties, and the new variant-aware approach, Var2Pep.

  • Fig. 2.

    Comparison of peptide identification by the different approaches on the ocean data sets. The barplot shows the total number of unique peptides identified from six ocean metaproteomic MS/MS data sets, by different approaches.

  • Fig. 3.

    Comparison of peptide identification results by the different approaches on the wastewater data sets. The barplot shows the total number of unique peptides identified from three wastewater samples (SD3, SD6, and SD7), using either matching metagenomic data alone (MG), or both metagenomic and metaproteomic data (MGMT) as the reference.

  • Fig. 4.

    Selected examples of identified proteins and variants supported by metaproteomic MS/MS data (ocean water sample CS51 and wastewater sample SD3). The plots depict the regions in the proteins that are supported by identified peptides. The black lines at the top represent the proteins, with boxes in different colors showing predicted PFAM domains in these proteins (the orange box represents the SBP_bac_5 domain, the gray box represents DUF2815, and the blue boxes represent the SLH domains). The green lines below the protein lines represent MS/MS-supported peptides from each protein (no mismatches), and the red lines represent peptide variants that share similarity with the protein, with the number of mismatches indicated by the bar on the left.
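    The mismatch counts in Fig. 4 can be illustrated with a minimal sketch. It assumes an ungapped, position-by-position comparison of a peptide variant against a protein; the function names and the example sequences are hypothetical, and the caption does not state how the pipeline itself aligns variants to proteins.

```python
from typing import Tuple


def count_mismatches(peptide: str, protein: str, start: int) -> int:
    """Count positions where the peptide differs from the protein segment
    beginning at `start` (ungapped comparison, an assumption of this sketch)."""
    segment = protein[start:start + len(peptide)]
    if start < 0 or len(segment) != len(peptide):
        raise ValueError("peptide does not fit inside the protein at this offset")
    return sum(a != b for a, b in zip(peptide, segment))


def best_placement(peptide: str, protein: str) -> Tuple[int, int]:
    """Slide the peptide along the protein and return (start, mismatches)
    for the ungapped placement with the fewest mismatches.
    Ties are resolved in favor of the leftmost placement."""
    if not peptide or len(peptide) > len(protein):
        raise ValueError("peptide must be non-empty and no longer than the protein")
    best_start, best_count = 0, len(peptide) + 1
    for start in range(len(protein) - len(peptide) + 1):
        mismatches = count_mismatches(peptide, protein, start)
        if mismatches < best_count:
            best_start, best_count = start, mismatches
    return best_start, best_count


# Hypothetical sequences, for illustration only.
protein_seq = "MKTAYIAKQR"
exact_peptide = "TAYIAK"    # would be drawn as a green line (no mismatches)
variant_peptide = "TAYLAK"  # would be drawn as a red line (one mismatch)

exact_hit = best_placement(exact_peptide, protein_seq)
variant_hit = best_placement(variant_peptide, protein_seq)
```

    In this toy case both peptides are placed at the same offset, and only the variant differs from the protein at one residue.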

  • Fig. 5.

    Var2Pep detected variations that are preferred substitutions among homologous proteins. The y axis shows the BLOSUM score of pairs of amino acids. The “all” box is for all pairs of possible amino acids (excluding pairs of identical residues), whereas the other box is for variations found by Var2Pep in Protein366.

Tables

    Table I. Summary of the read-based MS/MS search databases for the ocean data sets

                                                                          BSt            CS
    Number of peptides in metapeptides database (obtained from Ref. 30)   15,911,893     19,194,693
    Number of mismatched/unmatched reads                                  148,502,311    221,702,454
    Number of peptides in Var2Pep database                                2,702,655      4,891,690

    • Note: BSt, Bering strait; CS, Chukchi sea.

    Table II. Summary of peptide identification from MS/MS spectra for selected ocean datasets

                                BSt45 (90,072 spectra)               CS51 (100,588 spectra)
    Approach                    PSMs (%)          Unique peptides    PSMs (%)          Unique peptides
    Reads based                 7795 (8.6%)       2958               15,167 (15.1%)    5652
    Contigs based (S*)          1892 (2.10%)      817                6526 (6.49%)      2442
    Graph2Pro (S*)              12,728 (14.1%)    4542               26,576 (26.4%)    9932
    Contigs based (M*)          8631 (9.6%)       3367               17,427 (17.3%)    6388
    Graph2Pro (M*)              15,172 (16.8%)    5857               29,072 (28.9%)    11,463
    Graph2Pro (M*) + Var2Pep    15,913 (17.7%)    6240               31,145 (31.0%)    12,380

    • Note: S* represents SOAPdenovo2, M* represents MegaHit, and PSM stands for peptide spectrum match. BSt stands for Bering strait, and CS stands for Chukchi sea. Graph2Pro (S*) and Graph2Pro (M*) represent using the assembly graph from SOAPdenovo2 (S*) and MegaHit (M*), respectively, as the reference in Graph2Pro. In all cases, the FDR (false discovery rate) was estimated using a target-decoy search approach, and a cutoff of 1% at the spectrum level was applied. This table shows the results for only two datasets. See Fig. 2 and supplementary Data File S1 for results of all data sets.
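    The spectrum-level target-decoy filtering mentioned in the note can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each PSM carries a score where higher is better, that the FDR is estimated as decoys/targets above a score threshold (one common estimator among several), and that the function name `psms_at_fdr` and the toy scores are hypothetical.

```python
from typing import List, Tuple

# A PSM is represented here as (score, is_decoy); higher scores are better.
PSM = Tuple[float, bool]


def psms_at_fdr(psms: List[PSM], fdr_cutoff: float = 0.01) -> List[PSM]:
    """Return the target PSMs accepted at the given spectrum-level FDR.

    The FDR at each rank is estimated as (#decoys / #targets) among all
    PSMs scoring at least as well; the longest ranked prefix whose
    estimate stays within the cutoff is accepted.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    accepted_rank = 0
    for rank, (_, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= fdr_cutoff:
            accepted_rank = rank
    # Decoy hits are used only for estimation and are not reported.
    return [p for p in ranked[:accepted_rank] if not p[1]]


# Toy data, for illustration only.
toy_psms = [(9.0, False), (8.0, False), (7.0, True), (6.0, False)]
strict = psms_at_fdr(toy_psms, fdr_cutoff=0.01)
loose = psms_at_fdr(toy_psms, fdr_cutoff=0.5)
```

    With the strict cutoff only the two PSMs ranked above the first decoy are kept, whereas the loose cutoff admits all three target PSMs.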

    Table III. Comparison of additional spectra and unique peptides identified by searching against the Var2Pep database using the separate search approach (our approach) versus cascaded search

                 Separate (our approach)    Cascaded
    Data set     PSMs      Peptides         PSMs      Peptides
    CS51         2073      917              1598      629
    CS52         2282      816              1692      533
    CS53         1969      625              1466      392
    BSt45        741       383              495       221
    BSt46        651       318              390       173
    BSt47        640       322              418       192

    • Note: PSM stands for peptide spectrum match. BSt, Bering strait; CS, Chukchi sea.

    Table IV. Comparison of the performance using metagenome assemblies from MegaHit and MetaSpades as the reference on the SD3-MG dataset

                                         MegaHit          MetaSpades
    Number of contigs                    111,739          446,773
    Total bases (MB)                     78               166
    Contig only (PSM/peptide)            43650 (11173)    40161 (10337)
    Graph2Pro (PSM/peptide)              63178 (17452)    57760 (17651)
    Graph2Pro + Var2Pep (PSM/peptide)    74317 (21662)    71366 (22043)

Additional Files

  • Supplemental Data

    • Supplementary Data 3 - Proteins identified from the CS51 dataset and their annotations
    • Supplementary Data 1 - Peptide identification results for the ocean datasets
    • Supplementary Figure 4 - Comparison of peptide identification results using FDR at spectrum and peptide level for the SD3-MG dataset.
    • Supplementary Figure 3 - Comparison of the different approaches on all wastewater datasets, using FDR <= 0.01 at peptide level.
    • Supplementary Figure 2 - Comparison of peptide identification results using FDR at spectrum and peptide level for the CS51 dataset.
    • Supplementary Figure 1 - Comparison of the different approaches on all ocean water datasets, using FDR <= 0.01 at peptide level.
© 2019 American Society for Biochemistry and Molecular Biology

MCP Print ISSN 1535-9476 Online ISSN 1535-9484
