Quantitative Metaproteomics and Activity-based Protein Profiling of Patient Fecal Microbiome Identifies Host and Microbial Serine-type Endopeptidase Activity Associated With Ulcerative Colitis

Open Access. Published: January 13, 2022. DOI: https://doi.org/10.1016/j.mcpro.2022.100197

      Highlights

      • Identified 176 significantly altered protein groups between healthy and UC patients.
      • Serine-type endopeptidase activity is overrepresented in UC patients.
      • Fluorophosphonate ABPP shows that endopeptidases are active in fecal samples.
      • ABPP enrichment helps identify additional putative serine hydrolases in samples.
      • De novo sequencing used to estimate number of MS2 spectra unidentified by ComPIL.

      Abstract

      The gut microbiota plays an important yet incompletely understood role in the induction and propagation of ulcerative colitis (UC). Organism-level efforts to identify UC-associated microbes have revealed the importance of community structure, but less is known about the molecular effectors of disease. We performed 16S rRNA gene sequencing in parallel with label-free data-dependent LC-MS/MS proteomics to characterize the stool microbiomes of healthy (n = 8) and UC (n = 10) patients. Comparisons of taxonomic composition between techniques revealed major differences in community structure, partially attributable to the additional detection of host, fungal, viral, and food peptides by metaproteomics. Differential expression analysis of metaproteomic data identified 176 significantly altered protein groups between healthy and UC patients. Gene ontology analysis revealed several enriched functions, with serine-type endopeptidase activity overrepresented in UC patients. Using a biotinylated fluorophosphonate probe and streptavidin-based enrichment, we show that serine endopeptidases are active in patient fecal samples and that additional putative serine hydrolases are detectable by this approach compared with unenriched profiling. Finally, as metaproteomic databases expand, they are expected to asymptotically approach completeness. Using ComPIL and de novo peptide sequencing, we estimate the size of the probable peptide space left unidentified by our large database approach (the “dark peptidome”) to establish a rough benchmark for database sufficiency. Despite high variability inherent in patient samples, our analysis yielded a catalog of differentially enriched proteins between healthy and UC fecal proteomes. This catalog provides a clinically relevant jumping-off point for further molecular-level studies aimed at identifying the microbial underpinnings of UC.

      Abbreviations:

      ABPP (activity/affinity-based protein profiling), BLAST (basic local alignment search tool), CD (Crohn’s disease), ComPIL (comprehensive protein identification library), DNA (deoxyribonucleic acid), FP (fluorophosphonate), GI (gastrointestinal tract), GO (gene ontology), GWAS (genome-wide association studies), IBD (inflammatory bowel disease), LC-MS/MS (liquid chromatography tandem mass spectrometry), MAG (metagenome assembled genome), MS (mass spectrometry), NCBI (National Center for Biotechnology Information), PBS (phosphate buffered saline), PCA (principal component analysis), PPI (protein–protein interaction), PSM (peptide-spectrum match), QIIME (quantitative insights into microbial ecology), rRNA (ribosomal ribonucleic acid), STRING (search tool for recurring instances of neighboring genes), UC (ulcerative colitis)
      Inflammatory bowel disease (IBD) is a chronic medical condition characterized by relapsing inflammation of the gastrointestinal (GI) tract. This disease is broadly divisible into two categories based on where inflammation occurs. In ulcerative colitis (UC), inflammation is restricted to the large intestine, while in Crohn’s disease (CD), inflammation can occur anywhere along the GI tract (
      • Baumgart D.C.
      • Carding S.R.
      Inflammatory bowel disease: Cause and immunobiology.
      ,
      • Baumgart D.C.
      • Sandborn W.J.
      Inflammatory bowel disease: Clinical aspects and established and evolving therapies.
      ). In addition to reduced life expectancy, IBD patients can suffer dramatic reductions in quality of life and are at increased risk of developing gastrointestinal tract malignancy (
      • Bernstein C.N.
      • Blanchard J.F.
      • Kliewer E.
      • Wajda A.
      Cancer risk in patients with inflammatory bowel disease: A population-based study.
      ). The incidence and prevalence of IBD in developed countries have steadily increased in the last few decades, making this disease a public health concern with a potentially heavy cost burden due to a requirement for long-term management (
      • Molodecky N.A.
      • Soon I.S.
      • Rabi D.M.
      • Ghali W.A.
      • Ferris M.
      • Chernoff G.
      • Benchimol E.I.
      • Panaccione R.
      • Ghosh S.
      • Barkema H.W.
      • Kaplan G.G.
      Increasing incidence and prevalence of inflammatory bowel diseases with time, based on systematic review.
      ). Targeted cures for UC and CD are highly desirable, but the search for such treatments is hampered by our incomplete understanding of disease development. Genome-wide association studies (GWAS) have identified over 200 genetic loci associated with UC and CD, but these conditions are polygenic, and the identified loci explain only a minor portion of disease incidence (
      • Jostins L.
      • Ripke S.
      • Weersma R.K.
      • Duerr R.H.
      • McGovern D.P.
      • Hui K.Y.
      • Lee J.C.
      • Schumm L.P.
      • Sharma Y.
      • Anderson C.A.
      • Essers J.
      • Mitrovic M.
      • Ning K.
      • Cleynen I.
      • Theatre E.
      • et al.
      Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease.
      ,
      • Liu J.Z.
      • van Sommeren S.
      • Huang H.
      • Ng S.C.
      • Alberts R.
      • Takahashi A.
      • Ripke S.
      • Lee J.C.
      • Jostins L.
      • Shah T.
      • Abedian S.
      • Cheon J.H.
      • Cho J.
      • Dayani N.E.
      • Franke L.
      • et al.
      Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations.
      ,
      • de Lange K.M.
      • Moutsianas L.
      • Lee J.C.
      • Lamb C.A.
      • Luo Y.
      • Kennedy N.A.
      • Jostins L.
      • Rice D.L.
      • Gutierrez-Achury J.
      • Ji S.G.
      • Heap G.
      • Nimmo E.R.
      • Edwards C.
      • Henderson P.
      • Mowat C.
      • et al.
      Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease.
      ). Concordance rates of about 30% for CD and 15% for UC among monozygotic twins suggest a significant nongenetic contribution to disease development (
      • Brant S.R.
      Update on the heritability of inflammatory bowel disease: The importance of twin studies.
      ). Because our gut microbes are in perpetual contact with our GI tracts, they constitute important but ill-defined environmental variables that many studies have implicated in IBD development. The triggers of IBD are unknown, but its progression is hypothesized to be amplified by inappropriate host–microbe interactions that lead to dysbiosis and, eventually, observable gross pathology (
      • Dalal S.R.
      • Chang E.B.
      The microbial basis of inflammatory bowel diseases.
      ,
      • Nishida A.
      • Inoue R.
      • Inatomi O.
      • Bamba S.
      • Naito Y.
      • Andoh A.
      Gut microbiota in the pathogenesis of inflammatory bowel disease.
      ,
      • Ni J.
      • Wu G.D.
      • Albenberg L.
      • Tomov V.T.
      Gut microbiota and IBD: Causation or correlation?.
      ,
      • Manichanh C.
      • Borruel N.
      • Casellas F.
      • Guarner F.
      The gut microbiota in IBD.
      ,
      • Caruso R.
      • Lo B.C.
      • Nunez G.
      Host-microbiota interactions in inflammatory bowel disease.
      ).
      Efforts to identify potential microbial drivers of IBD are ongoing but stymied by the immense taxonomic complexity of the gut microbiota. The gut harbors hundreds of distinct species per individual, which can change over time and with perturbations to host lifestyle or xenobiotic exposure (
      • Qin J.
      • Li R.
      • Raes J.
      • Arumugam M.
      • Burgdorf K.S.
      • Manichanh C.
      • Nielsen T.
      • Pons N.
      • Levenez F.
      • Yamada T.
      • Mende D.R.
      • Li J.
      • Xu J.
      • Li S.
      • Li D.
      • et al.
      A human gut microbial gene catalog established by metagenomic sequencing.
      ,
      • Almeida A.
      • Mitchell A.L.
      • Boland M.
      • Forster S.C.
      • Gloor G.B.
      • Tarkowska A.
      • Lawley T.D.
      • Finn R.D.
      A new genomic blueprint of the human gut microbiota.
      ,
      • Lozupone C.A.
      • Stombaugh J.I.
      • Gordon J.I.
      • Jansson J.K.
      • Knight R.
      Diversity, stability, and resilience of the human gut microbiota.
      ). Campaigns to characterize and monitor the gut microbiota frequently utilize amplicon and metagenomic sequencing technologies, which can provide information about microbial community structure, genetic potential, and transcriptional activities (
      • Gill S.R.
      • Pop M.
      • Deboy R.T.
      • Eckburg P.B.
      • Turnbaugh P.J.
      • Samuel B.S.
      • Gordon J.I.
      • Relman D.A.
      • Fraser-Liggett C.M.
      • Nelson K.E.
      Metagenomic analysis of the human distal gut microbiome.
      ,
      • Franzosa E.A.
      • Hsu T.
      • Sirota-Madi A.
      • Shafquat A.
      • Abu-Ali G.
      • Morgan X.C.
      • Huttenhower C.
      Sequencing and beyond: Integrating molecular ‘omics’ for microbial community profiling.
      ). As the collective size of shotgun metagenomic sequence space has expanded from these efforts, so too have the opportunities for liquid chromatography tandem mass spectrometry (LC-MS/MS)-based metaproteomics (
      • Wilmes P.
      • Bond P.L.
      The application of two-dimensional polyacrylamide gel electrophoresis and downstream analyses to a mixed community of prokaryotic microorganisms.
      ), which relies on protein reference databases constructed from translated genome sequences (
      • Eng J.K.
      • McCormack A.L.
      • Yates J.R.
      An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database.
      ). Host protein profiling is a straightforward process given the relative completeness of host (e.g., human or mouse) genome assemblies, but microbiome protein profiling is more difficult due to the high strain diversity and presence of unculturable microbes in the gut (
      • Stewart E.J.
      Growing unculturable bacteria.
      ). Sample-matched MAGs (metagenome assembled genomes) delivering good spectrum match rates have become an effective solution but can be cost-limiting and require expertise (
      • Almeida A.
      • Mitchell A.L.
      • Boland M.
      • Forster S.C.
      • Gloor G.B.
      • Tarkowska A.
      • Lawley T.D.
      • Finn R.D.
      A new genomic blueprint of the human gut microbiota.
      ,
      • Kang D.D.
      • Froula J.
      • Egan R.
      • Wang Z.
      MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities.
      ,
      • Chen L.-X.
      • Anantharaman K.
      • Shaiber A.
      • Eren A.M.
      • Banfield J.F.
      Accurate and complete genomes from metagenomes.
      ,
      • Pasolli E.
      • Asnicar F.
      • Manara S.
      • Zolfo M.
      • Karcher N.
      • Armanini F.
      • Beghini F.
      • Manghi P.
      • Tett A.
      • Ghensi P.
      • Collado M.C.
      • Rice B.L.
      • DuLong C.
      • Morgan X.C.
      • Golden C.D.
      • et al.
      Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle.
      ,
      • Nayfach S.
      • Shi Z.J.
      • Seshadri R.
      • Pollard K.S.
      • Kyrpides N.C.
      New insights from uncultivated genomes of the global human gut microbiome.
      ). In addition, manual reference database curation has proven to be an important consideration in metaproteomics but becomes computationally burdensome as community diversity expands (
      • Tanca A.
      • Palomba A.
      • Fraumene C.
      • Pagnozzi D.
      • Manghina V.
      • Deligios M.
      • Muth T.
      • Rapp E.
      • Martens L.
      • Addis M.F.
      • Uzzau S.
      The impact of sequence database choice on metaproteomic results in gut microbiota studies.
      ). To address this problem, we developed the Comprehensive Protein Identification Library (ComPIL), a large and scalable proteomics database generally intended for metaproteomics studies (
      • Chatterjee S.
      • Stupp G.S.
      • Park S.K.R.
      • Ducom J.-C.
      • Yates J.R.
      • Su A.I.
      • Wolan D.W.
      A comprehensive and scalable database search system for metaproteomics.
      ). In its current iteration (ComPIL 2.0), it houses >4.8 billion unique, tryptic peptides derived from >113 million bacterial, archaeal, viral, and eukaryotic parent protein sequences assembled from public sequencing repositories (
      • Park S.K.R.
      • Jung T.
      • Thuy-Boun P.S.
      • Wang A.Y.
      • Yates J.R.
      • Wolan D.W.
      ComPIL 2.0: An updated comprehensive metaproteomics database.
      ). With periodic incorporation of new sequences from shotgun metagenomics repositories, we envision that ComPIL will help enable interlaboratory consonance in the global interpretation and communication of bottom-up metaproteomics results. In addition to enabling the direct, large-scale observation of proteins in a complex mixture, LC-MS/MS-based metaproteomics techniques obviate a requirement for intact cells, facilitate the observation of posttranslational protein modifications, and enable functional interrogation of new or incompletely annotated proteins through such cognate techniques as activity- and affinity-based protein profiling (ABPP) (
      • Jessani N.
      • Cravatt B.F.
      The development and application of methods for activity-based protein profiling.
      ,
      • Cravatt B.F.
      • Wright A.T.
      • Kozarich J.W.
      Activity-based protein profiling: From enzyme chemistry to proteomic chemistry.
      ).
      Relative to metagenomics, LC-MS/MS-based metaproteomics is less commonly applied, and it is rarely employed in IBD studies. In fact, the first large-scale endeavor to identify proteins from a microbial biofilm community was disclosed by Banfield et al. only in 2005 (
      • Ram R.J.
      • Verberkmoes N.C.
      • Thelen M.P.
      • Tyson G.W.
      • Baker B.J.
      • Blake 2nd, R.C.
      • Shah M.
      • Hettich R.L.
      • Banfield J.F.
      Community proteomics of a natural microbial biofilm.
      ,
      • Tyson G.W.
      • Chapman J.
      • Hugenholtz P.
      • Allen E.E.
      • Ram R.J.
      • Richardson P.M.
      • Solovyev V.V.
      • Rubin E.M.
      • Rokhsar D.S.
      • Banfield J.F.
      Community structure and metabolism through reconstruction of microbial genomes from the environment.
      ,
      • VerBerkmoes N.C.
      • Denef V.J.
      • Hettich R.L.
      • Banfield J.F.
      Functional analysis of natural microbial consortia using community proteomics.
      ). In 2009, Jansson et al. leveraged high-resolution LC-MS/MS to demonstrate the viability of large-scale metaproteomics in fecal samples collected from a twin pair (
      • Verberkmoes N.C.
      • Russell A.L.
      • Shah M.
      • Godzik A.
      • Rosenquist M.
      • Halfvarson J.
      • Lefsrud M.G.
      • Apajalahti J.
      • Tysk C.
      • Hettich R.L.
      • Jansson J.K.
      Shotgun metaproteomics of the human distal gut microbiota.
      ). The aforementioned study demonstrated for the first time that bottom-up LC-MS/MS-based proteomics technology is suitable for such a complex environment and that it could generate a model of the gut microbiome that is orthogonal to that produced by metagenomics. Since then, several groups have deployed bottom-up metaproteomics to investigate the etiology of IBD in humans by examining patient fecal extracts, intestinal biopsy tissue, and/or blood samples (
      • Erickson A.R.
      • Cantarel B.L.
      • Lamendella R.
      • Darzi Y.
      • Mongodin E.F.
      • Pan C.
      • Shah M.
      • Halfvarson J.
      • Tysk C.
      • Henrissat B.
      • Raes J.
      • Verberkmoes N.C.
      • Fraser C.M.
      • Hettich R.L.
      • Jansson J.K.
      Integrated metagenomics/metaproteomics reveals human host-microbiota signatures of Crohn’s disease.
      ,
      • Juste C.
      • Kreil D.P.
      • Beauvallet C.
      • Guillot A.
      • Vaca S.
      • Carapito C.
      • Mondot S.
      • Sykacek P.
      • Sokol H.
      • Blon F.
      • Lepercq P.
      • Levenez F.
      • Valot B.
      • Carré W.
      • Loux V.
      • et al.
      Bacterial protein signals are associated with Crohn’s disease.
      ,
      • Zhang X.
      • Ning Z.
      • Mayne J.
      • Deeke S.A.
      • Li J.
      • Starr A.E.
      • Chen R.
      • Singleton R.
      • Butcher J.
      • Mack D.R.
      • Stintzi A.
      • Figeys D.
      In vitro metabolic labeling of intestinal microbiota for quantitative metaproteomics.
      ,
      • Heintz-Buschart A.
      • May P.
      • Laczny C.C.
      • Lebrun L.A.
      • Bellora C.
      • Krishna A.
      • Wampach L.
      • Schneider J.G.
      • Hogan A.
      • de Beaufort C.
      • Wilmes P.
      Integrated multi-omics of the human gut microbiome in a case study of familial type 1 diabetes.
      ,
      • Zhang X.
      • Chen W.
      • Ning Z.
      • Mayne J.
      • Mack D.
      • Stintzi A.
      • Tian R.
      • Figeys D.
      Deep metaproteomics approach for the study of human microbiomes.
      ,
      • Zhang X.
      • Deeke S.A.
      • Ning Z.
      • Starr A.E.
      • Butcher J.
      • Li J.
      • Mayne J.
      • Cheng K.
      • Liao B.
      • Li L.
      • Singleton R.
      • Mack D.
      • Stintzi A.
      • Figeys D.
      Metaproteomics reveals associations between microbiome and intestinal extracellular vesicle proteins in pediatric inflammatory bowel disease.
      ,
      • Mills R.H.
      • Vázquez-Baeza Y.
      • Zhu Q.
      • Jiang L.
      • Gaffney J.
      • Humphrey G.
      • Smarr L.
      • Knight R.
      • Gonzalez D.J.
      Evaluating metagenomic prediction of the metaproteome in a 4.5-year study of a patient with Crohn’s disease.
      ,
      • Blakely-Ruiz J.A.
      • Erickson A.R.
      • Cantarel B.L.
      • Xiong W.
      • Adams R.
      • Jansson J.K.
      • Fraser C.M.
      • Hettich R.L.
      Metaproteomics reveals persistent and phylum-redundant metabolic functional stability in adult human gut microbiomes of Crohn’s remission patients despite temporal variations in microbial taxa, genomes, and proteomes.
      ,
      • Lloyd-Price J.
      • Arze C.
      • Ananthakrishnan A.N.
      • Schirmer M.
      • Avila-Pacheco J.
      • Poon T.W.
      • Andrews E.
      • Ajami N.J.
      • Bonham K.S.
      • Brislawn C.J.
      • Casero D.
      • Courtney H.
      • Gonzalez A.
      • Graeber T.G.
      • Hall A.B.
      • et al.
      Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases.
      ,
      • Lehmann T.
      • Schallert K.
      • Vilchez-Vargas R.
      • Benndorf D.
      • Puttker S.
      • Sydor S.
      • Schulz C.
      • Bechmann L.
      • Canbay A.
      • Heidrich B.
      • Reichl U.
      • Link A.
      • Heyer R.
      Metaproteomics of fecal samples of Crohn’s disease and ulcerative colitis.
      ). Additionally, several groups including our own have paired traditional proteomic profiling with ABPP techniques in microbiome samples to detect and annotate key protein functionalities often undetectable without preenrichment (
      • Mayers M.D.
      • Moon C.
      • Stupp G.S.
      • Su A.I.
      • Wolan D.W.
      Quantitative metaproteomics and activity-based probe enrichment reveals significant alterations in protein expression from a mouse model of inflammatory bowel disease.
      ,
      • Whidbey C.
      • Sadler N.C.
      • Nair R.N.
      • Volk R.F.
      • DeLeon A.J.
      • Bramer L.M.
      • Fansler S.J.
      • Hansen J.R.
      • Shukla A.K.
      • Jansson J.K.
      • Thrall B.D.
      • Wright A.T.
      A probe-enabled approach for the selective isolation and characterization of functionally active subpopulations in the gut microbiome.
      ,
      • Parasar B.
      • Zhou H.
      • Xiao X.
      • Shi Q.
      • Brito I.L.
      • Chang P.V.
      Chemoproteomic profiling of gut microbiota-associated bile salt hydrolase activity.
      ,
      • Jariwala P.B.
      • Pellock S.J.
      • Goldfarb D.
      • Cloer E.W.
      • Artola M.
      • Simpson J.B.
      • Bhatt A.P.
      • Walton W.G.
      • Roberts L.R.
      • Major M.B.
      • Davies G.J.
      • Overkleeft H.S.
      • Redinbo M.R.
      Discovering the microbial enzymes driving drug toxicity with activity-based protein profiling.
      ).
      The insights garnered from previous metaproteomics studies are valuable for forming a consensus about the constellation of IBD-related environmental factors. We aim to contribute to this nascent pool by presenting a combined 16S rRNA gene amplicon sequencing and metaproteomics analysis of fecal samples from healthy volunteers and ulcerative colitis patients to identify novel proteins associated with health or disease. Using a pipeline that incorporates a novel, strong-acid sample preparation procedure (
      • Wang A.Y.
      • Thuy-Boun P.S.
      • Stupp G.S.
      • Su A.I.
      • Wolan D.W.
      Triflic acid treatment enables LC-MS/MS analysis of insoluble bacterial biomass.
      ), label-free high-resolution LC-MS/MS, and the ComPIL database coupled to the ProLuCID/SEQUEST search engine (
      • Eng J.K.
      • McCormack A.L.
      • Yates J.R.
      An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database.
      ,
      • Xu T.
      • Venable J.D.
      • Park S.K.
      • Cociorva D.
      • Lu B.
      • Liao L.
      • Wohlschlegel J.
      • Hewel J.
      • Yates J.R.
      ProLuCID, a fast and sensitive tandem mass spectra-based protein identification program.
      ,
      • Xu T.
      • Park S.K.
      • Venable J.D.
      • Wohlschlegel J.A.
      • Diedrich J.K.
      • Cociorva D.
      • Lu B.
      • Liao L.
      • Hewel J.
      • Han X.
      • Wong C.
      • Fonslow B.
      • Delahunty C.
      • Gao Y.
      • Shah H.
      • et al.
      ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity and specificity.
      ), we identify 176 protein groups and several gene ontology (GO) terms enriched in either cohort (
      • Ashburner M.
      • Ball C.A.
      • Blake J.A.
      • Botstein D.
      • Butler H.
      • Cherry J.M.
      • Davis A.P.
      • Dolinski K.
      • Dwight S.S.
      • Eppig J.T.
      • Harris M.A.
      • Hill D.P.
      • Issel-Tarver L.
      • Kasarskis A.
      • Lewis S.
      • et al.
      Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium.
      ). We show that proteomics can provide a more complete picture of gut microbiome taxonomy that includes host, microbial, and even dietary proteins. Using ABPP, we demonstrate that not only are microbiome proteins enzymatically active after collection, but additional proteomic depth can be achieved using ABPP enrichment strategies. Finally, using de novo peptide sequencing tools, we provide a means for estimating the size of database-elusive peptide space in our LC-MS/MS data, enabling a rough estimation of metaproteome completeness. This measure can help shape future decision-making processes regarding the need for additional shotgun metagenomic work to support a given metaproteomics study.

      Experimental Procedures

      Patient Sample Collection

      Collection and use of all patient samples were approved by the Office for the Protection of Research Subjects at Scripps Research and Scripps Green Hospital (IRB protocol IRB-14-6352). Written informed consent was obtained from all subjects in accordance with the Declaration of Helsinki.
      Volunteers collected their own stool samples using standardized in-home collection kits and were instructed to freeze specimens at −20 °C immediately. Samples were stored in the provided consumer-grade −20 °C minifreezers immediately after collection, transported by courier on dry ice, and stored in laboratory-grade freezers at −20 °C until microbial extraction. Collected stool samples were highly heterogeneous in color, texture, and viscosity both before and after microbial extraction.

      Microbe Extraction

      Stool samples were thawed to room temperature, diluted in PBS (pH 7.4), vortexed thoroughly to yield slurries, and then centrifuged at 100 × g for 1 min. The flocculent upper layer was extracted, filtered over 70 μm nylon mesh cell strainers to remove large, recalcitrant masses, and then centrifuged at 8000 × g for 5 min to pellet. Pellets were rinsed twice with PBS, then resuspended in PBS to a density of 100 mg wet microbial pellet per 500 μl of suspension.
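
      The resuspension density above (100 mg wet microbial pellet per 500 μl of suspension) is a simple proportion. A minimal Python sketch of that arithmetic follows; the function name and the example pellet mass are hypothetical and are not part of the published protocol.

```python
def suspension_volume_ul(pellet_mg: float,
                         ref_mass_mg: float = 100.0,
                         ref_volume_ul: float = 500.0) -> float:
    """Final suspension volume (in microliters) for a given wet pellet mass.

    The defaults encode the density stated in the text:
    100 mg wet microbial pellet per 500 ul of suspension.
    """
    if pellet_mg <= 0:
        raise ValueError("pellet mass must be positive")
    return pellet_mg * ref_volume_ul / ref_mass_mg


# Hypothetical example: a 250 mg wet pellet
print(suspension_volume_ul(250.0))  # 1250.0
```

      Note that the result is the total volume of the suspension, not the volume of PBS added to the pellet.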

      DNA Extraction, 16S rRNA Gene Sequencing, and Data Processing

      Microbial DNA was extracted from thawed fecal microbe aliquots using a fecal/soil extraction kit (Zymo Research, Irvine, CA, USA). In total, 50 to 100 ng of DNA per patient sample was submitted to the Scripps Research genomics core for next-generation sequencing, which was performed using the MiSeq platform (Illumina Inc). For taxonomy based on 16S rRNA gene amplicon sequencing, we targeted the bacterial 16S V4 region using a 300 bp paired-end approach aiming for 100,000 reads per sample. Reads were taxonomically mapped using QIIME2 (and associated plug-ins) and classifiers created from the SILVA 132 database (
      • Bolyen E.
      • Rideout J.R.
      • Dillon M.R.
      • Bokulich N.A.
      • Abnet C.C.
      • Al-Ghalith G.A.
      • Alexander H.
      • Alm E.J.
      • Arumugam M.
      • Asnicar F.
      • Bai Y.
      • Bisanz J.E.
      • Bittinger K.
      • Brejnrod A.
      • Brislawn C.J.
      • et al.
      Reproducible, interactive, scalable, and extensible microbiome data science using QIIME 2.
      ,
      • Quast C.
      • Pruesse E.
      • Yilmaz P.
      • Gerken J.
      • Schweer T.
      • Yarza P.
      • Peplies J.
      • Glöckner F.O.
      The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools.
      ). For access to raw data, see Zenodo repository (
      • Thuy-Boun P.S.
      • Wolan D.W.
      Quantitative metaproteomics and activity-based protein profiling of patient fecal microbiota identifies host and microbial proteins associated with ulcerative colitis.
      ). For detailed methods, see Additional File 1.

      Protein Extraction, Protein Digestion, Proteomics Data Collection

      Protein was extracted according to a previously described protocol (
      • Wang A.Y.
      • Thuy-Boun P.S.
      • Stupp G.S.
      • Su A.I.
      • Wolan D.W.
      Triflic acid treatment enables LC-MS/MS analysis of insoluble bacterial biomass.
      ). Extracted protein was resuspended in H2O, and concentration was measured by BCA assay (Thermo Fisher). Extracted microbiome protein (100 μg) was reduced, alkylated, trypsinized, and desalted (ZipTip C18, MilliporeSigma) prior to LC-MS/MS analysis. Desalted peptides (1 μg) were separated using a 4-h C18 gradient by nano-flow liquid chromatography coupled to an Orbitrap Fusion Tribrid (Thermo Fisher) operating in data-dependent mode. MS1 and MS2 spectra were recorded in the Orbitrap at 120K and 30K resolution, respectively. For detailed methods, see Additional File 1.

      Proteomics Data Analysis
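
      This section removes protein groups with too many missing values before differential expression analysis, using three rules: a group is dropped if (1) both conditions contain only null values, (2) one condition contains null values and the other has <4 non-null values, or (3) both conditions have <4 non-null values each. The Python sketch below is one reading of those rules and is not the authors' code (the published analysis used the DEP package in R). It assumes that "contained null values" in rule (2) means "contained only null values"; the function names are hypothetical.

```python
import math

MIN_VALID = 4  # threshold from the text: "<4 non-null values"


def count_valid(values):
    """Count entries that are neither None nor NaN."""
    return sum(
        1 for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    )


def keep_protein_group(healthy, uc, min_valid=MIN_VALID):
    """Return True if a protein group survives the missing-value filter.

    healthy, uc: per-sample intensities for one protein group, with
    None or NaN marking a missing value.
    """
    h = count_valid(healthy)
    u = count_valid(uc)
    # Rule 1: both conditions contain only null values.
    rule1 = h == 0 and u == 0
    # Rule 2 (assumed reading): one condition is entirely null and the
    # other has fewer than min_valid non-null values.
    rule2 = (h == 0 and u < min_valid) or (u == 0 and h < min_valid)
    # Rule 3: both conditions have fewer than min_valid non-null values.
    rule3 = h < min_valid and u < min_valid
    return not (rule1 or rule2 or rule3)


# Hypothetical intensities for 8 healthy and 10 UC samples.
healthy = [1.2e6, 9.8e5, 1.1e6, 1.4e6, None, None, None, None]
uc = [None] * 10
print(keep_protein_group(healthy, uc))  # True
```

      Under this reading, the three rules together reduce to keeping a protein group only when at least one condition has four or more non-null values.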

      We collected a total of 2,829,920 MS2 spectra across all 18 patient samples. These spectra were searched against the ComPIL 2.0 database (4.8 billion unique tryptic peptides from >225 million forward and reverse protein sequences) (
      • Chatterjee S.
      • Stupp G.S.
      • Park S.K.R.
      • Ducom J.-C.
      • Yates J.R.
      • Su A.I.
      • Wolan D.W.
      A comprehensive and scalable database search system for metaproteomics.
      ,
      • Park S.K.R.
      • Jung T.
      • Thuy-Boun P.S.
      • Wang A.Y.
      • Yates J.R.
      • Wolan D.W.
      ComPIL 2.0: An updated comprehensive metaproteomics database.
      ) using the ProLuCID/SEQUEST search engine (
      • Eng J.K.
      • McCormack A.L.
      • Yates J.R.
      An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database.
      ,
      • Xu T.
      • Venable J.D.
      • Park S.K.
      • Cociorva D.
      • Lu B.
      • Liao L.
      • Wohlschlegel J.
      • Hewel J.
      • Yates J.R.
      ProLuCID, a fast and sensitive tandem mass spectra-based protein identification program.
      ,
      • Xu T.
      • Park S.K.
      • Venable J.D.
      • Wohlschlegel J.A.
      • Diedrich J.K.
      • Cociorva D.
      • Lu B.
      • Liao L.
      • Hewel J.
      • Han X.
      • Wong C.
      • Fonslow B.
      • Delahunty C.
      • Gao Y.
      • Shah H.
      • et al.
      ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity and specificity.
      ). In total, 523,155 (18.5%) MS2 spectra were mapped to 54,378 distinct peptides at a 1% peptide false discovery rate (two-peptide-per-protein minimum) using a target-decoy strategy (
      • Elias J.E.
      • Gygi S.P.
      Target-decoy search strategy for increased confidence in large-scale protein identification by mass spectrometry.
      ). In total, 576,625 protein sequences were identified and clustered into 95,000 protein groups at a 95% sequence similarity cutoff using CD-HIT (
      • Li W.
      • Jaroszewski L.
      • Godzik A.
      Clustering of highly homologous sequences to reduce the size of large protein database.
      ,
      • Li W.
      • Godzik A.
      Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences.
      ). Quantification at the MS1 level was performed using FlashLFQ with a match-between-runs strategy enabled (10 ppm precursor tolerance, 15 min window) (
      • Millikin R.J.
      • Solntsev S.K.
      • Shortreed M.R.
      • Smith L.M.
      Ultrafast peptide label-free quantification with FlashLFQ.
      ). Peptide MS1 area-under-the-curve intensities were mapped to protein groups. Intensities belonging to peptides that mapped to >1 protein group were excluded. Protein groups with too many missing values were then removed, namely if (1) both conditions contained only null values, (2) one condition contained null values and the other contained <4 non-null values, or (3) both conditions contained <4 non-null values each. After this filter, 4622 protein groups remained for differential expression analysis, which was performed using Limma as part of the DEP package in the R statistical environment (
      • Ritchie M.E.
      • Phipson B.
      • Wu D.
      • Hu Y.
      • Law C.W.
      • Shi W.
      • Smyth G.K.
      Limma powers differential expression analyses for RNA-sequencing and microarray studies.
      ,
      • Zhang X.
      • Smits A.H.
      • van Tilburg G.B.
      • Ovaa H.
      • Huber W.
      • Vermeulen M.
      Proteome-wide identification of ubiquitin interactions using UbIA-MS.
      ). Protein groups were annotated with GO terms using InterProScan; these annotations were used for GO enrichment analysis in the GOstats package. GO relative abundance analysis was performed before removal of protein groups with too many missing values (
      • Jones P.
      • Binns D.
      • Chang H.Y.
      • Fraser M.
      • Li W.
      • McAnulla C.
      • McWilliam H.
      • Maslen J.
      • Mitchell A.
      • Nuka G.
      • Pesseat S.
      • Quinn A.F.
      • Sangrador-Vegas A.
      • Scheremetjew M.
      • Yong S.Y.
      • et al.
      InterProScan 5: Genome-scale protein function classification.
      ,
      • Burge S.
      • Kelly E.
      • Lonsdale D.
      • Mutowo-Muellenet P.
      • McAnulla C.
      • Mitchell A.
      • Sangrador-Vegas A.
      • Mulder N.
      • Hunter S.
      Manual GO annotation of predictive protein signatures: The InterPro approach to GO curation.
      ,
      • Falcon S.
      • Gentleman R.
      Using GOstats to test gene lists for GO term association.
      ). Peptides were mapped to their respective taxa of origin using Unipept (
      • Mesuere B.
      • Devreese B.
      • Debyser G.
      • Aerts M.
      • Vandamme P.
      • Dawyndt P.
      Unipept: Tryptic peptide-based biodiversity analysis of metaproteome samples.
      ,
      • Mesuere B.
      • Debyser G.
      • Aerts M.
      • Devreese B.
      • Vandamme P.
      • Dawyndt P.
      The Unipept metaproteomics analysis pipeline.
      ,
      • Singh R.G.
      • Tanca A.
      • Palomba A.
      • Van der Jeugt F.
      • Verschaffelt P.
      • Uzzau S.
      • Martens L.
      • Dawyndt P.
      • Mesuere B.
      Unipept 4.0: Functional analysis of metaproteome data.
      ). Note that ideally, metaproteomics should be performed in conjunction with metagenomic sequencing to generate matched customized proteome databases. Protein provenance could more confidently be traced in this scenario enabling greater precision during taxonomy analysis. In the absence of matched metagenomic data, Unipept can provide peptide-based taxonomic mapping information but may do so with less accuracy. Peptides were then mapped to taxa to construct relative abundance tables and plots. Finally, de novo peptide sequencing was performed in Novor (
      • Needleman S.B.
      • Wunsch C.D.
      A general method applicable to search for similarities in amino acid sequence of 2 proteins.
      ) and database-de novo peptide comparisons were performed using the Scikit-bio Python library. For more detailed methods, see Additional File 1. For access to LC-MS/MS data, see PRIDE repository PXD022433 (
      • Thuy-Boun P.S.
      • Wolan D.W.
      Ulcerative colitis human gut microbiome.
      ). For de novo datasets, protein fasta files, and protein group files (CD-HIT clusters), see Zenodo repository (
      • Thuy-Boun P.S.
      • Wolan D.W.
      Quantitative metaproteomics and activity-based protein profiling of patient fecal microbiota identifies host and microbial proteins associated with ulcerative colitis.
      ) 10.5281/zenodo.5717460.
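      The missing-value filter described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and the use of `None` for a missing intensity are hypothetical, and rule (2) is read here as "one condition contains only null values," which is an assumption.

```python
from typing import Optional, Sequence


def count_valid(values: Sequence[Optional[float]]) -> int:
    """Count non-null intensities for one condition."""
    return sum(v is not None for v in values)


def should_remove(
    healthy: Sequence[Optional[float]],
    uc: Sequence[Optional[float]],
    min_valid: int = 4,
) -> bool:
    """Return True if a protein group has too many missing values.

    Rules, as read from the text (assumed interpretation):
      1. both conditions contain only null values;
      2. one condition is entirely null and the other has < min_valid values;
      3. both conditions have < min_valid non-null values.
    """
    n_h = count_valid(healthy)
    n_u = count_valid(uc)
    both_null = n_h == 0 and n_u == 0
    one_null_other_sparse = (n_h == 0 and n_u < min_valid) or (
        n_u == 0 and n_h < min_valid
    )
    both_sparse = n_h < min_valid and n_u < min_valid
    return both_null or one_null_other_sparse or both_sparse
```

      Under this reading, rules (1) and (2) are special cases of rule (3), so a protein group is retained whenever at least one condition has four or more non-null values.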

      Experimental Design and Statistical Rationale

      Single time-point stool samples from 18 patient volunteers were collected and analyzed. Healthy (n = 8, M/F ratio = 5:3, mean age = 44 years) and UC (n = 10, M/F ratio = 8:2, mean age = 46 years) volunteers were a mixture of males and females between the ages of 21 and 76 years (global mean age = 45 years, global median age = 39 years) at the time of enrollment. UC patients presented with mild to severe symptoms and a range of Mayo scores (0–9) during enrollment. Individuals with BMI values >60, recent antibiotic use (<3 months prior to sample collection), severe diarrheal illnesses, or Clostridium difficile infections were excluded.
      For unenriched proteomics experiments, eight healthy biological replicates were compared against ten UC biological replicates (one technical replicate per biological replicate). Analyses did not rely on isotopic labels or internal standards. Instead, total protein and peptide concentrations were measured and normalized by BCA assay prior to LC-MS/MS. We chose to normalize samples by protein concentration to simplify comparisons, as collected patient stool samples were highly variable in volume, hydration, and consistency. This implies that in this study, proteins/protein group compositional fractions rather than absolute protein concentrations were compared between patients. ProLuCID/SEQUEST and DTASelect were used for peptide identification at a peptide-level FDR setting of 1% using a target-decoy strategy (
      • Eng J.K.
      • McCormack A.L.
      • Yates J.R.
      An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database.
      ,
      • Xu T.
      • Venable J.D.
      • Park S.K.
      • Cociorva D.
      • Lu B.
      • Liao L.
      • Wohlschlegel J.
      • Hewel J.
      • Yates J.R.
      ProLuCID, a fast and sensitive tandem mass spectra-based protein identification program.
      ,
      • Xu T.
      • Park S.K.
      • Venable J.D.
      • Wohlschlegel J.A.
      • Diedrich J.K.
      • Cociorva D.
      • Lu B.
      • Liao L.
      • Hewel J.
      • Han X.
      • Wong C.
      • Fonslow B.
      • Delahunty C.
      • Gao Y.
      • Shah H.
      • et al.
      ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity and specificity.
      ,
      • Elias J.E.
      • Gygi S.P.
      Target-decoy search strategy for increased confidence in large-scale protein identification by mass spectrometry.
      ,
      • Tabb D.L.
      • McDonald W.H.
      • Yates J.R.
      DTASelect and contrast: Tools for assembling and comparing protein identifications from shotgun proteomics.
      ). Quantification was performed by MS1 peak intensity using a match-between-runs strategy. All protein sequences were grouped/clustered at a 95% sequence similarity cutoff using CD-HIT, then peptide intensities were mapped to protein groups/clusters (
      • Li W.
      • Godzik A.
      Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences.
      ). Peptides mapping to >1 group/cluster were excluded. Peptide intensities were summed within protein groups/clusters unless otherwise noted. Protein group/cluster intensities were normalized using the Limma/DEP package function “normalize_vsn” (variance stabilizing normalization) within the R statistical computing environment (
      • Ritchie M.E.
      • Phipson B.
      • Wu D.
      • Hu Y.
      • Law C.W.
      • Shi W.
      • Smyth G.K.
      Limma powers differential expression analyses for RNA-sequencing and microarray studies.
      ,
      • Zhang X.
      • Smits A.H.
      • van Tilburg G.B.
      • Ovaa H.
      • Huber W.
      • Vermeulen M.
      Proteome-wide identification of ubiquitin interactions using UbIA-MS.
      ). Protein group differential enrichment testing was performed using the “test_diff” function within the Limma/DEP package within the R statistical computing environment (
      • Ritchie M.E.
      • Phipson B.
      • Wu D.
      • Hu Y.
      • Law C.W.
      • Shi W.
      • Smyth G.K.
      Limma powers differential expression analyses for RNA-sequencing and microarray studies.
      ,
      • Zhang X.
      • Smits A.H.
      • van Tilburg G.B.
      • Ovaa H.
      • Huber W.
      • Vermeulen M.
      Proteome-wide identification of ubiquitin interactions using UbIA-MS.
      ). Differential testing q-values were calculated using the “qvalue” package in the R statistical environment, and a threshold of q < 0.1 was chosen as a relevant cutoff value for further analysis and discussion (
      • Storey J.D.
      • Tibshirani R.
      Statistical significance for genomewide studies.
      ). Note that at stricter q-value thresholds, several protein groups were still found to be significant (68 protein groups at q < 0.01, 55 protein groups at q < 0.001), but the modest q < 0.1 threshold was chosen to enable a broader view of potential disease-associated protein functions, with the acknowledged caveat that approximately 18 of the 176 significantly enriched protein groups are expected to be false positives. Please see Additional File 1 for more details.
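      The estimate of roughly 18 false positives follows from the definition of the q-value as the expected proportion of false discoveries among all features called significant at that threshold. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def expected_false_discoveries(n_significant: int, q_threshold: float) -> float:
    """Upper-bound estimate of false positives among significant calls.

    At a q-value cutoff of q_threshold, at most about q_threshold of the
    features called significant are expected to be false discoveries.
    """
    return n_significant * q_threshold


# 176 protein groups called significant at q < 0.1
print(round(expected_false_discoveries(176, 0.1)))  # prints 18 (17.6 rounded)
```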
      For FP-probe-enriched proteomics experiments, one healthy biological replicate and two UC biological replicates were analyzed by LC-MS/MS (one technical replicate for each sample). The stool samples used for these FP-enriched experiments were identical to patient stool used earlier for unenriched experiments. These FP-enriched experiments were qualitative and not statistically powered.

      Results

      Metaproteomics Yields a More Comprehensive Taxonomic Diversity Than 16S rRNA Gene Analysis

      At the kingdom level, LC-MS/MS-based proteomics identified peptides mapping to bacteria (38.3%), eukaryotes (43.5%) (including host), archaea (<0.1%), and viruses as well as a significant proportion of unassigned peptides (18.2%) [Additional File 1, supplemental Fig. S1]. In contrast, a majority of 16S reads (>99%) were attributable to bacteria with a much smaller proportion attributable to archaea (<0.1%) and <0.1% remaining unassigned [Additional File 1, supplemental Fig. S1].
      Large discrepancies in taxonomic resolution begin to emerge at the phylum level. By LC-MS/MS proteomics, we identified 46 phyla, including Ascomycota, Basidiomycota, Spirochaetes, Chordata, and Streptophyta in addition to the eight identified by 16S rRNA gene sequencing alone (Euryarchaeota, Actinobacteria, Bacteroidetes, Cyanobacteria, Firmicutes, Fusobacteria, Proteobacteria, and Verrucomicrobia) (Fig. 1). Firmicutes account for only 22.7% of the sample composition according to LC-MS/MS; however, this phylum dominates the composition of patient microbiota according to 16S amplicon sequencing and accounts for 86.2% of all reads. Interestingly, the abundance of Bacteroidetes was relatively minimal by 16S rRNA gene sequencing (except for samples H5, UC9, UC15, UC23), yielding a Bacteroidetes:Firmicutes ratio of approximately 1:100 (Fig. 1). In contrast, Bacteroidetes account for an average of 4.2% of the microbiota peptide content by LC-MS/MS-based proteomics, yielding a Bacteroidetes:Firmicutes ratio of approximately 1:5. The majority of peptides identified via metaproteomics originate from the Chordata phylum and are presumably host-derived. Additionally, a significant number of peptides are derived from the Streptophyta and are attributable to a variety of dietary plants, including Solanum tuberosum (potato), Sesamum indicum (sesame), Theobroma cacao (chocolate), Zea mays (corn), and Oryza sativa (rice) among many others [see Additional File 2]. Where comparable, we posit that differences in DNA extraction efficiencies (e.g., Gram+ versus Gram-), differences in metabolic/secretory activities, and shared tryptic peptides between microbes likely contribute to the discrepancy in taxonomic compositions between 16S gene sequencing and proteomics methods.
      Fig. 1. Relative abundance plots for all 18 patient samples. Comparison of microbiome phylum-level taxonomy by 16S rRNA gene amplicon sequencing (lower 18 bars; based on sequence counts) against LC-MS/MS bottom-up proteomics (upper 18 bars; based on peptide intensities, note that shared peptides were grouped in the “Unassigned/unknown” category). LC-MS/MS, liquid chromatography tandem mass spectrometry.
      The trend of increased taxonomic diversity across microbiota samples observed by LC-MS/MS proteomics over 16S amplicon sequencing is conserved at each classification tier [Additional File 1, supplemental Figs. S1–S7]. At the species level, we detected a total of 848 species/strains by LC-MS/MS and only 38 by amplicon sequencing. Despite the predicted increase in diversity by LC-MS/MS, an average of 85.9% of the identifiable peptides are not mappable to a particular species. This observation is due to the redundancy and conservation of microbial proteins across distinct and divergent species. Thus, taxonomic predictions based on the peptide composition in samples become increasingly difficult with more granular classification levels.

      176 Protein Groups Are Significantly Altered Between the Healthy and UC Cohorts

      Differential expression analysis yielded 176 protein groups (from host, microbes, and diet) significantly altered between healthy and UC patients (p ≤ 0.005 and q < 0.1), with 65 groups enriched in healthy volunteers and 111 protein groups enriched in the UC volunteers (Fig. 2A and Additional File 3). Principal components analysis (PCA) of the dataset revealed a modest but distinguishable separation between the proteomic composition of healthy and UC fecal samples (Fig. 2B), which is in agreement with the Euclidean distance matrix generated for the same dataset (Fig. 2C).
      Fig. 2. Differential expression and protein network analysis of host and microbial proteins. A, volcano plot depicting differentially detected protein groups from patient fecal samples; whole plot (left), zoomed-in plot (right), red = significant (q < 0.1, foldchange > 2), green = not significant (q > 0.1, foldchange > 2), gray = N.S. (q > 0.1, foldchange ≤ 2), numbers adjacent to red points represent unique “ClusterID” values corresponding to labels. B, principal component analysis of all 18 patient samples across all differentially tested contrasts; red = healthy individuals, blue = UC patients. C, Euclidean distance plot of all 18 patient samples across all differentially tested contrasts, blue = more similarity, orange/red = less similarity. D, STRING protein network analysis of host protein groups differentially enriched in UC patients’ fecal samples at medium confidence (0.400); edges: known interactions (aqua: from curated database, magenta: experimentally determined), predicted interactions (green: gene neighborhood, red: gene fusions, blue: gene co-occurrence), others (chartreuse: text mining, black: coexpression, lavender: protein homology). UC, ulcerative colitis.
      Twenty-nine of the 111 protein groups significantly enriched in UC fecal samples were host-derived; however, no host protein groups were enriched (q-value < 0.1) in the healthy individuals. STRING analysis of significantly enriched UC host proteins yielded 26 edges among 25 nodes and a highly significant protein–protein interaction (PPI) enrichment p-value of 1.84 × 10⁻¹² at medium confidence (0.400), suggesting a strong association between tested proteins (Fig. 2D and Additional File 4) (
      • Szklarczyk D.
      • Gable A.L.
      • Lyon D.
      • Junge A.
      • Wyder S.
      • Huerta-Cepas J.
      • Simonovic M.
      • Doncheva N.T.
      • Morris J.H.
      • Bork P.
      • Jensen L.J.
      • von Mering C.
      STRING v11: Protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets.
      ). At highest confidence (0.900), 13 edges between 25 nodes were found, again with a significant PPI enrichment p-value of 2.0 × 10⁻¹¹. At medium confidence, the most significant Reactome pathways (FDR < 0.001) were neutrophil degranulation, innate immune system, antimicrobial peptides, immune system, and metal sequestration by antimicrobial proteins. GO biological process terms (FDR < 3.3 × 10⁻⁸; regulated exocytosis, neutrophil degranulation, secretion by cell, transport, leukocyte mediated immunity, and antimicrobial humoral response) and GO cellular component terms (FDR < 2 × 10⁻⁷; cytoplasmic vesicle lumen, secretory granule, secretory granule lumen, vesicle, cytoplasmic vesicle part, and cytoplasmic vesicle) associated with host proteins all support the assertion that host immune-related secretory events are prevalent in the gastrointestinal tracts of UC patients. Interestingly, because no host proteins were significantly enriched across healthy fecal samples, we posit that host-centric biological pathways associated with colitis occur in addition to rather than in lieu of processes associated with homeostasis.
      More than half of the nonhost protein groups have limited to no annotations despite being significantly altered. For example, 37 of 65 and 48 of 82 nonhost protein groups enriched in the healthy and UC cohorts, respectively (p ≤ 0.005 and q < 0.1), were poorly annotated in the ComPIL database (i.e., no annotation, annotated as hypothetical proteins or as domain of unknown function-containing proteins). While additional BLAST homology searches and InterProScan analyses were performed on each poorly annotated protein group, we were unable to make significant additional annotations for 11 of 37 protein groups enriched in the healthy cohort and 15 of 48 protein groups enriched in the ulcerative colitis cohort [see Additional File 4] (
      • Jones P.
      • Binns D.
      • Chang H.Y.
      • Fraser M.
      • Li W.
      • McAnulla C.
      • McWilliam H.
      • Maslen J.
      • Mitchell A.
      • Nuka G.
      • Pesseat S.
      • Quinn A.F.
      • Sangrador-Vegas A.
      • Scheremetjew M.
      • Yong S.Y.
      • et al.
      InterProScan 5: Genome-scale protein function classification.
      ,
      • Burge S.
      • Kelly E.
      • Lonsdale D.
      • Mutowo-Muellenet P.
      • McAnulla C.
      • Mitchell A.
      • Sangrador-Vegas A.
      • Mulder N.
      • Hunter S.
      Manual GO annotation of predictive protein signatures: The InterPro approach to GO curation.
      ,
      • Altschul S.F.
      • Gish W.
      • Miller W.
      • Myers E.W.
      • Lipman D.J.
      Basic local alignment search tool.
      ). Such protein groups represent interesting targets for structural and biochemical validation, as further inquiry could elucidate their possible roles in the propagation of inflammatory or anti-inflammatory processes.
      Significantly enriched and annotated nonhost protein groups among both healthy and UC cohorts had predicted and/or biochemically verified functions ranging from metabolism (glyceraldehyde-3-phosphate dehydrogenase and translation elongation factor Tu) to defense (type II secretion system protein). Notable nonhost entries that were increased in healthy volunteers include an acid-soluble spore protein (WP_071120403.1), methylene tetrahydrofolate reductase (SRS064276.159392-T1-C), and fruit bromelain (BROM1_ANACO). The enriched small spore protein is the only entry in its protein group and originates from the recently described bacterium Romboutsia timonensis, whose depletion has been associated with colorectal cancer incidence (
      • Ricaboni D.
      • Maihe M.
      • Khelaifia S.
      • Raoult D.
      • Million M.
      Romboutsia timonensis, a new species isolated from the human gut.
      ,
      • Mangifesta M.
      • Mancabelli L.
      • Milani C.
      • Gaiani F.
      • de'Angelis N.
      • de'Angelis G.L.
      • van Sinderen D.
      • Ventura M.
      • Turroni F.
      Mucosal microbiota of intestinal polyps reveals putative biomarkers of colorectal cancer.
      ). The methylene tetrahydrofolate reductase protein group enriched in the healthy cohort contains 29 members possessing similarity scores in the range of 98.6 to 100%. By BLAST analysis, these reductases likely originate from the Lachnospiraceae family of bacteria, and their examination could provide a glimpse into the microbial B-vitamin economy that importantly underpins host homeostasis, as humans are unable to de novo synthesize many essential B vitamins (
      • Engevik M.A.
      • Morra C.N.
      • Röth D.
      • Engevik K.
      • Spinler J.K.
      • Devaraj S.
      • Crawford S.E.
      • Estes M.K.
      • Kalkum M.
      • Versalovic J.
      Microbial metabolic capacity for intestinal folate production and modulation of host folate receptors.
      ,
      • Sharma V.
      • Rodionov D.A.
      • Leyn S.A.
      • Tran D.
      • Iablokov S.N.
      • Ding H.
      • Peterson D.A.
      • Osterman A.L.
      • Peterson S.N.
      B-vitamin sharing promotes stability of gut microbial communities.
      ,
      • Rossi M.
      • Amaretti A.
      • Raimondi S.
      Folate production by probiotic bacteria.
      ). Interestingly, fruit bromelain detected in four of eight healthy patient fecal extracts is a pineapple-derived cysteine protease we did not expect to encounter (
      • Hale L.P.
      • Chichlowski M.
      • Trinh C.T.
      • Greer P.K.
      Dietary supplementation with fresh pineapple juice decreases inflammation and colonic neoplasia in IL-10-deficient mice with colitis.
      ,
      • Fitzhugh D.J.
      • Shan S.Q.
      • Dewhirst M.W.
      • Hale L.P.
      Bromelain treatment decreases neutrophil migration to sites of inflammation.
      ). This protease is commonly sold as an over-the-counter supplement or as a component of meat tenderizers, and its detection may be an artifact introduced through patients’ diets.
      With respect to protein groups enriched in UC patients, notable annotated nonhost entries include hyaluronan glucosaminidase (CLONEX_02131), a transglycosylase SLT domain-containing protein (HMPREF0462_0704), and a metallohydrolase (WP_081140786.1). The enriched hyaluronan glucosaminidase group contains four members with almost identical sequences (99.95–100% identity). These proteins likely originate from the Lachnospiraceae family members Tyzzerella or Coprococcus. Hyaluronan is a high-molecular-weight carbohydrate component of the human extracellular matrix that can serve as an inflammatory/injury signal for host immune receptors when degraded by host hyaluronidases (
      • de la Motte C.A.
      Hyaluronan in intestinal homeostasis and inflammation: Implications for fibrosis.
      ). Hyaluronan glucosaminidase activity may exacerbate host inflammatory processes and contribute to UC-related inflammation, as well as afford microbes the ability to infiltrate host barriers. The transglycosylase SLT (lytic) domain-containing protein group includes 70 members with sequence identities of 95.71 to 100% compared with the Helicobacter pylori (H. pylori) enzyme. These enzymes catalyze the nonhydrolytic intramolecular cyclization of N-acetylmuramyl residues on bacterial cell walls propagating the cell wall remodeling process (
      • Chaput C.
      • Labigne A.
      • Boneca I.G.
      Characterization of Helicobacter pylori lytic transglycosylases Slt and MltD.
      ,
      • Dik D.A.
      • Marous D.R.
      • Fisher J.F.
      • Mobashery S.
      Lytic transglycosylases: Concinnity in concision of the bacterial cell wall.
      ). For H. pylori, previous studies demonstrate the importance of transglycosylase-generated cell wall muropeptide fragments in inducing host inflammation, which can in turn promote gut colonization (
      • Viala J.
      • Chaput C.
      • Boneca I.G.
      • Cardona A.
      • Girardin S.E.
      • Moran A.P.
      • Athman R.
      • Mémet S.
      • Huerre M.R.
      • Coyle A.J.
      • DiStefano P.S.
      • Sansonetti P.J.
      • Labigne A.
      • Bertin J.
      • Philpott D.J.
      • et al.
      Nod1 responds to peptidoglycan delivered by the Helicobacter pylori cag pathogenicity island.
      ,
      • Wyckoff T.J.
      • Taylor J.A.
      • Salama N.R.
      Beyond growth: Novel functions for bacterial cell wall hydrolases.
      ). Finally, the enriched metallohydrolase originates from Pantoea latae and is annotated as a nonpeptide amide C–N bond hydrolase that may inactivate amide-containing molecules such as lactams, which are contained in an important class of antibiotics (
      • Wright G.D.
      Bacterial resistance to antibiotics: Enzymatic degradation and modification.
      ,
      • Blair J.M.A.
      • Webber M.A.
      • Baylay A.J.
      • Ogbolu D.O.
      • Piddock L.J.V.
      Molecular mechanisms of antibiotic resistance.
      ). A complete list of protein groups found differentially expressed in this patient cohort can be found in the supplement [see Additional File 4].

      Serine-type Endopeptidase Activity is Significantly Enriched in UC Samples

      For GO term relative abundance analysis, we used mean peptide intensities for peptides within the same protein group to account for comparisons between proteins of different lengths. Of the 8538 quantifiable protein groups selected for relative abundance plotting, we identified 575, 394, and 85 terms for the molecular function, biological process, and cellular component GO namespaces, respectively. In general, GO relative abundance breakdowns between all samples for each namespace appear similar by unweighted (count-based) assembly (Fig. 3B and Additional File 1, supplemental Figs. S8–S10). However, when weighted by corresponding ion intensities, GO term relative abundances between samples differ dramatically (Fig. 3A and Additional File 1, supplemental Figs. S8–S10). The “None” and “Other” categories occupy the largest areas both by unweighted and weighted assembly for all three GO namespaces. With respect to molecular function, global relative abundances (when averaged over all samples) for the terms glutamate-cysteine ligase activity (GO: 0004357), aminopeptidase activity (GO: 0004177), and serine-type endopeptidase activity (GO: 0004252) expand 31-, 18-, and 14-fold, respectively, going from an unweighted to a weighted assembly. Conversely, global relative abundances for the terms enoyl-[acyl-carrier-protein] reductase (NADH) activity (GO: 0004318) and mismatched DNA binding (GO: 0030983) contract >50-fold going from an unweighted to a weighted assembly. Similar relative abundance expansions and contractions going from unweighted to weighted assemblies were observed for the biological process and cellular component namespaces [see Additional File 1, supplemental Figs. S9 and S10].
      Fig. 3. GO molecular function relative abundance plots for all 18 patient samples. Comparison of microbiota GO molecular function breakdown by LC-MS/MS using either (A) weighted measures (upper 18 bars; peptide intensity-based; to control for protein length, each GO term’s constituent protein group/cluster contributes the mean intensity of its constituent peptides, see for more detail) or (B) unweighted measures (lower 18 bars; count-based; each GO term’s constituent protein group/cluster contributes one count). Bar segments represent the proportion of each GO term’s intensity or count relative to the total intensity or count (respectively) for each patient sample. Loosely, (A) represents GO terms as a function of protein copy number and (B) represents GO terms as a function of protein sequence diversity. GO, gene ontology; LC-MS/MS, liquid chromatography tandem mass spectrometry.
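      The difference between the weighted and unweighted assemblies can be illustrated with a short Python sketch. It is not the authors' pipeline: the function name, the dictionary layout, and the toy intensities are hypothetical, and placing protein groups without any GO term into a "None" category is an assumption based on the category names mentioned in the text.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def go_relative_abundance(groups: List[dict], weighted: bool) -> Dict[str, float]:
    """Relative abundance of GO terms within one sample.

    Each protein group contributes either the mean intensity of its
    peptides (weighted) or a single count (unweighted) to every GO term
    it carries. Totals are then scaled to sum to 1.
    """
    totals: Dict[str, float] = defaultdict(float)
    for group in groups:
        contribution = mean(group["peptide_intensities"]) if weighted else 1.0
        # Groups with no annotation go to a "None" bucket (assumption).
        for term in group["go_terms"] or ["None"]:
            totals[term] += contribution
    grand_total = sum(totals.values())
    return {term: value / grand_total for term, value in totals.items()}


# Toy sample (invented values): one abundant serine-type endopeptidase group
# and two low-abundance aminopeptidase groups.
sample = [
    {"go_terms": ["GO:0004252"], "peptide_intensities": [9e6, 7e6]},
    {"go_terms": ["GO:0004177"], "peptide_intensities": [1e6]},
    {"go_terms": ["GO:0004177"], "peptide_intensities": [1e6, 1e6]},
]

unweighted = go_relative_abundance(sample, weighted=False)
weighted = go_relative_abundance(sample, weighted=True)
fold_change = weighted["GO:0004252"] / unweighted["GO:0004252"]
```

      In this toy sample the serine-type endopeptidase term holds one third of the count-based share but most of the intensity-based share. This is the kind of expansion described in the text when a few highly abundant protein groups dominate a term.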
      The large 31-fold unweighted-to-weighted relative abundance expansion of glutamate-cysteine ligase activity could be attributed to one protein group with one member (WP_027345637.1). This enzyme originates from Hamadaea tsunoensis and catalyzes a key step in the synthesis of glutathione, a key antioxidant for the microbiota (
      • Million M.
      • Armstrong N.
      • Khelaifia S.
      • Guilhot E.
      • Richez M.
      • Lagier J.C.
      • Dubourg G.
      • Chabriere E.
      • Raoult D.
      The antioxidants glutathione, ascorbic acid, and uric acid maintain butyrate production by human gut clostridia in the presence of oxygen in vitro.
      ). The 18-fold unweighted-to-weighted relative abundance expansion observed for the aminopeptidase activity term originated from 19 protein groups with one very dominant protein group (WP_027209280.1) representing 96.2% of the GO term namespace’s relative abundance. This predicted M18 family protease originates from Butyrivibrio hungatei but currently has no structural or biochemical annotation. Finally, the 14-fold expansion observed for the serine-type endopeptidase activity GO term is attributable to 38 protein groups with the majority share originating from host (85.7%) and minor shares originating from pig (13.5%) and microbes (0.8%). The serine-type endopeptidase from pig is an artifact, as it originates from the sequencing grade porcine trypsin used to generate peptides for LC-MS/MS analysis. Interestingly, human chymotrypsin-like elastase 3A (CEL3A_Human) and chymotrypsin-C (CTRC_human) are more abundant than porcine trypsin, accounting for 35.8% and 14.9% of the GO term share, respectively. Other prominent protein groups (less abundant than porcine trypsin) include chymotrypsin-like elastase 3B (CEL3B_human), cathepsin G (CATG_human), and trypsin-1 (TRY1_human), which comprise 11.3%, 5.1%, and 3.0% of the serine-type endopeptidase activity GO term, respectively.
      We performed GO enrichment analysis in the R GOstats package with GO terms corresponding to differentially expressed protein groups serving as the enriched GO term set and GO terms mapping to all nondifferentially expressed protein groups as the “universe” set (
      • Falcon S.
      • Gentleman R.
      Using GOstats to test gene lists for GO term association.
      ). GO terms overrepresented in either healthy or UC cohorts for each GO namespace (p < 0.01) are listed in the supplement [see Additional File 1, supplemental Figs. S11–S18].
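      GOstats tests for overrepresentation with a hypergeometric test and reports an odds ratio for each term. The sketch below shows that calculation in plain Python under the conventional formulation, in which the universe contains the selected set; the function and parameter names are hypothetical, and this is an illustration of the statistic rather than a reimplementation of the package.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n_selected: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N          -- protein groups in the universe
    K          -- universe groups annotated with the GO term
    n_selected -- differentially enriched groups (the selected set)
    k          -- selected groups annotated with the GO term
    """
    upper = min(K, n_selected)
    favorable = sum(
        comb(K, i) * comb(N - K, n_selected - i) for i in range(k, upper + 1)
    )
    return favorable / comb(N, n_selected)


def odds_ratio(k: int, n_selected: int, K: int, N: int) -> float:
    """Sample odds ratio from the 2x2 table (selected vs. not, term vs. not)."""
    selected_without = n_selected - k
    unselected_with = K - k
    unselected_without = N - K - selected_without
    return (k * unselected_without) / (selected_without * unselected_with)
```

      A term is reported as overrepresented when the p-value falls below the chosen cutoff (p < 0.01 in the text); the odds ratio indicates how strongly the term is concentrated in the selected set.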
      For the healthy patient cohort, nine molecular function GO terms were enriched including hydrolase activity (hydrolyzing O-glycosyl compounds) (GO:0016787), cysteine-type peptidase activity (GO:0008234), rRNA binding (GO:0019843), and oxidoreductase activity (acting on iron–sulfur proteins as donors) (GO:0016730) among the most significantly enriched (p < 0.001, odds ratio >10). The biological process namespace contained nine GO terms enriched in healthy patients with polysaccharide catabolic process (GO:0000272), homeostatic process (GO:0042592), sulfur amino acid metabolic process (GO:0000096), nitrogen fixation (GO:0009399), and asexual sporulation (GO:0030436) among the most significantly enriched terms (p < 0.001, odds ratio >20). Only three GO cellular component terms were found to be enriched in healthy patients including oxidoreductase complex (GO:1990204), vanadium–iron nitrogenase complex (GO:0016613), and endospore-forming forespore (GO:0042601) (p < 0.001, odds ratio >100). Together the three GO namespaces strongly associate carbohydrate processing and microbial sporulation activities with healthy patients.
      For the ulcerative colitis cohort, 21 terms were enriched in the molecular function GO namespace including calcium ion binding (GO:0005509), peptidase activity (acting on L-amino acid peptides) (GO:0008233), lipid binding (GO:0008289), catalytic activity (acting on a protein) (GO:0140096), serine-type endopeptidase activity (GO:0004252), serine hydrolase activity (GO:0017171), and calcium-dependent phospholipid binding (GO:0005544) (p < 0.001, odds ratio >10) [see Additional File 5]. The biological process namespace contains 18 entries including proteolysis (GO:0006508), aromatic amino acid family metabolic processes (GO:0009072), propionate metabolic process (methylcitrate cycle) (GO:0019679), acetate metabolic process (GO:0006083), short-chain fatty acid metabolic process (GO:0046459), antibacterial humoral response (GO:0019731), antifungal humoral response (GO:0019732), regulation of cytokine production (GO:0001817), and response to fungus (GO:0009620) among the most enriched terms (p < 0.003, odds ratio >10). The enriched cellular component term list was much shorter with only five entries, including extracellular space (GO:0005615), organelle lumen (GO:0043233), endoplasmic reticulum lumen (GO:0005788), intracellular membrane-bound organelle (GO:0043231), and extracellular region (GO:0005576) (p < 0.008, odds ratio >10). Together, these enriched GO terms associate the extracellular space/secretome, antimicrobial host responses and serine protease activity with UC patients in our study [see Additional File 1, supplemental Figs. S11 and S17]. Interestingly, serine proteases, and more broadly serine hydrolases, are an intensely studied class of enzymes with exquisitely selective mechanism-based fluorophosphonate probes described in the literature (
      • Liu Y.
      • Patricelli M.P.
      • Cravatt B.F.
      Activity-based protein profiling: The serine hydrolases.
      ). These probes present an opportunity to better examine this class of enzyme in the context of human UC.
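      The enrichment statistics quoted above (p-values and odds ratios per GO term) can be illustrated with a small calculation. The following is a minimal sketch only, assuming a one-sided hypergeometric test against a background of all detected protein groups and a plain 2x2 sample odds ratio; the function name and the example counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment(k, n, K, N):
    """Upper-tail hypergeometric p-value and sample odds ratio for one GO term.

    N: protein groups in the background
    K: background groups annotated with the term
    n: protein groups in the enriched set
    k: enriched-set groups annotated with the term
    """
    # P(X >= k) when drawing n groups without replacement from N,
    # of which K carry the annotation
    p_value = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    ) / comb(N, n)
    # 2x2 table: rows = in enriched set / not, columns = annotated / not
    a, b = k, n - k
    c, d = K - k, (N - K) - (n - k)
    odds_ratio = float("inf") if b * c == 0 else (a * d) / (b * c)
    return p_value, odds_ratio


# Hypothetical counts chosen for illustration only
p, odds = hypergeom_enrichment(k=8, n=100, K=20, N=2000)
print(f"p = {p:.2e}, odds ratio = {odds:.1f}")
```

      With these made-up counts, a term expected to appear about once in the enriched set appears eight times, which is the kind of imbalance that produces a small p-value together with an odds ratio above 10.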

      ABPP Confirms the Presence of Active Serine-type Endopeptidases and Identifies Previously Undetected Serine-type Endopeptidases

      We treated three patient fecal samples with a biotinylated fluorophosphonate-based (FP-based) serine-reactive probe to label active serine hydrolases and thereby establish whether these enzymes were active in the samples (Fig. 4A) (
      • Liu Y.
      • Patricelli M.P.
      • Cravatt B.F.
      Activity-based protein profiling: The serine hydrolases.
      ). Biotinylated fluorophosphonate probe (FP probe)-labeled proteins were enriched with streptavidin-agarose beads, visualized by Western blot (Fig. 4B) and identified by LC-MS/MS analysis using a previously described two-step large-to-focused database search strategy (
      • Jagtap P.
      • Goslinga J.
      • Kooren J.A.
      • McGowan T.
      • Wroblewski M.S.
      • Seymour S.L.
      • Griffin T.J.
      A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies.
      ). Analysis of the LC-MS/MS data revealed several hundred noncontaminant protein sequences. These sequences were clustered into 104 distinct protein groups (95% similarity cutoff using CD-HIT) and further reduced to 63 protein groups by condensing highly homologous host proteins. Of note, 27 host-derived (Fig. 4C) and 35 nonhost-derived (Fig. 4D) protein groups were identified.
      Fig. 4. Targeting serine hydrolases via activity-based protein profiling. A, FP probe structure and reaction schematic for covalent attachment to nucleophilic active-site serine in hydrolases. B, Western blot of three patient fecal lysates treated and enriched with FP probe followed by streptavidin bead enrichment visualized with fluorophore-conjugated streptavidin. Host (C) and nonhost (D) proteins from patient fecal samples enriched by FP probe and detected by LC-MS/MS (blue = protein detected in corresponding patient sample, orange = protein not detected in corresponding patient sample, green highlights = annotated proteases). FP, fluorophosphonate; LC-MS/MS, liquid chromatography tandem mass spectrometry.
      The majority of host-derived protein groups labeled and enriched with the FP probe were also identified within the unenriched LC-MS/MS datasets [see Additional File 4]. Fourteen of the 27 FP probe-enriched host proteins are known serine hydrolases, including the chymotrypsin-like elastase family (2A, 2B, 3A, 3B), cathepsin G, dipeptidyl peptidase 4, neutrophil elastase, myeloblastin, trypsin 1, and phospholipase A2 (Fig. 4C). Enrichment of these particular hydrolases over all other host proteins provides confidence that the FP probe is selective for nucleophilic serine hydrolases in the tremendously complex fecal protein matrix. Aside from demonstrating that the hydrolases are active, these results also suggest that this fraction of proteases remains uninhibited by the antiproteolytic proteins often found in the gut (
      • Silverman G.A.
      • Bird P.I.
      • Carrell R.W.
      • Church F.C.
      • Coughlin P.B.
      • Gettins P.G.
      • Irving J.A.
      • Lomas D.A.
      • Luke C.J.
      • Moyer R.W.
      • Pemberton P.A.
      • Remold-O'Donnell E.
      • Salvesen G.S.
      • Travis J.
      • Whisstock J.C.
      The serpins are an expanding superfamily of structurally similar but functionally diverse proteins.
      ,
      • Kriaa A.
      • Jablaoui A.
      • Mkaouar H.
      • Akermi N.
      • Maguin E.
      • Rhimi M.
      Serine proteases at the cutting edge of IBD: Focus on gastrointestinal inflammation.
      ,
      • Uchiyama K.
      • Naito Y.
      • Takagi T.
      • Mizushima K.
      • Hirai Y.
      • Hayashi N.
      • Harusato A.
      • Inoue K.
      • Fukumoto K.
      • Yamada S.
      • Handa O.
      • Ishikawa T.
      • Yagi N.
      • Kokura S.
      • Yoshikawa T.
      Serpin B1 protects colonic epithelial cell via blockage of neutrophil elastase activity and its expression is enhanced in patients with ulcerative colitis.
      ).
      Most nonhost proteins are microbial in origin, with the exception of streptavidin and porcine trypsin introduced during sample preparation (Fig. 4D). The most promising FP probe-susceptible proteins, as determined by sequence analysis, include protease Do entries (DegP), S9 family peptidases, and a beta-lactamase. Interestingly, of the ten identified nonhost serine hydrolase protein groups, only one (SRS019397.59782-T1-C) was detected without FP probe enrichment, demonstrating the utility of chemical enrichment strategies for identifying novel proteins in a complex environment. Of the 167,554 MS2 spectra collected across all FP-enriched LC-MS/MS datasets, only 5352 (3.2%) were assigned by ComPIL database searches. There is a strong likelihood that other serine hydrolase-derived peptides are present in our microbiota samples but remain unidentified because of the incompleteness problem associated with metaproteomics database searching (
      • Tanca A.
      • Palomba A.
      • Fraumene C.
      • Pagnozzi D.
      • Manghina V.
      • Deligios M.
      • Muth T.
      • Rapp E.
      • Martens L.
      • Addis M.F.
      • Uzzau S.
      The impact of sequence database choice on metaproteomic results in gut microbiota studies.
      ). Unfortunately, the techniques for database-independent, high-confidence identification of these peptides and their parent protein sequences are currently not well established.

      De Novo Peptide Sequencing Enables Glimpses into the Dark Peptidome

      Of the 2,829,920 MS2 fragmentation spectra we collected overall, 523,155 (18.5%) were matched to a corresponding peptide with a 1% peptide false positive rate using a target-decoy search strategy paired with the ComPIL database. The modest number of peptide spectrum matches we observed is likely attributable to [1] a loss in filtering sensitivity that often accompanies database expansion (
      • Jagtap P.
      • Goslinga J.
      • Kooren J.A.
      • McGowan T.
      • Wroblewski M.S.
      • Seymour S.L.
      • Griffin T.J.
      A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies.
      ,
      • Kumar P.
      • Johnson J.E.
      • Easterly C.
      • Mehta S.
      • Sajulga R.
      • Nunn B.
      • Jagtap P.D.
      • Griffin T.J.
      A sectioning and database enrichment approach for improved peptide spectrum matching in large, genome-guided protein sequence databases.
      ) and [2] the perennially incomplete databases associated with metaproteomics. We posit that a nontrivial portion of unmatched MS2 spectra maps to either known or unknown peptide sequences, and we aim to estimate the size of this unmatched MS2 spectrum space, or “the dark peptidome,” using a complementary de novo peptide sequencing approach (
      • Ma B.
      • Johnson R.
      De novo sequencing and homology searching.
      ).
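      The 1% peptide false positive rate mentioned above rests on target-decoy filtering. The sketch below shows one common way such a cutoff can be estimated; it is an illustration under stated assumptions (decoys/targets as the error estimate, a single score per PSM where higher is better), not the filtering procedure used in this study, and the function name and toy scores are hypothetical.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score at which the running FDR estimate is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs; higher scores are better.
    The FDR at a given score is estimated as (#decoys) / (#targets) among
    all PSMs scoring at or above it. Returns None if no score qualifies.
    """
    targets = decoys = 0
    best_cutoff = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Toy data (hypothetical): 250 high-scoring targets, 3 decoys, 48 more targets
toy_psms = (
    [(score, False) for score in range(1000, 750, -1)]
    + [(score, True) for score in (750, 749, 748)]
    + [(score, False) for score in range(747, 699, -1)]
)
print(score_cutoff_at_fdr(toy_psms))
```

      Because decoy hits accumulate faster in a larger search space, a bigger database generally pushes this cutoff upward, which is one way to picture the loss in filtering sensitivity described above.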
      We subjected MS2 spectra from all patient fecal sample LC-MS/MS datasets to de novo peptide sequencing using the Novor algorithm (
      • Ma B.
      Novor: Real-time peptide de novo sequencing software.
      ). Novor attempts to deduce peptide sequence from MS2 fragmentation spectra, generating a de novo peptide spectrum match (PSM) and an accompanying confidence score (Novor score; higher scores indicate better predicted matches). Where available, we paired de novo PSMs with their corresponding database PSMs (ComPIL2-assigned) and calculated an additional Novor-ComPIL2 similarity score (de novo database similarity score) based on the Needleman–Wunsch alignment algorithm (raw scores were scaled to 100, where 100 represents a perfect match) (
      • Needleman S.B.
      • Wunsch C.D.
      A general method applicable to search for similarities in amino acid sequence of 2 proteins.
      ). We used these values to construct overlapping histograms [Fig. 5A and Additional File 1, supplemental Figs. S19 and S20] and joint plots [Fig. 5B and Additional File 1, supplemental Figs. S21 and S22] depicting possible unidentified peptide space in patient fecal microbiota samples.
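      To make the de novo database similarity score more concrete, the sketch below computes a Needleman–Wunsch global alignment score between two peptide strings and rescales it to 0–100. The scoring values (match +1, mismatch -1, gap -1), the clipping of negative scores to 0, the normalization by the longer peptide, and the treatment of Ile and Leu as equivalent are all assumptions made for illustration; the source states only that raw scores were scaled so that 100 represents a perfect match.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of a and b (Needleman-Wunsch, linear gap penalty)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def similarity_0_to_100(de_novo, database, match=1):
    """Scale the raw score so identical peptides give 100; negatives clip to 0."""
    # Leu and Ile are isobaric, so MS2 data alone cannot tell them apart
    a = de_novo.replace("I", "L")
    b = database.replace("I", "L")
    best_possible = match * max(len(a), len(b))
    if best_possible == 0:
        return 0.0
    raw = needleman_wunsch_score(a, b, match=match)
    return 100.0 * max(raw, 0) / best_possible


# Hypothetical peptides for illustration
print(similarity_0_to_100("PEPTIDEK", "PEPTIDEK"))
print(similarity_0_to_100("PEPTIDEK", "PEPTADEK"))
```

      Under these assumed parameters, a single substitution in an eight-residue peptide lowers the score from 100 to 75, so partially correct de novo reads still register as strongly similar to their database counterparts.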
      Fig. 5. Estimating the size of database-elusive peptide (“dark peptidome”) space via de novo peptide sequencing. A, overlapping histograms of all H2 patient sample MS2 spectra by Novor score (0–100, x-axis); darkest green area represents MS2 correctly assigned by Novor determined by comparison to ComPIL (database) result; lightest green area represents MS2 without ComPIL peptide assignments; dashed vertical red line represents Novor score = 75 cutoff and MS2 to the right of this line were used to estimate the size of unassigned peptide space. B, joint plot of H2 patient sample MS2 spectra depicting correlation between Novor score (0–100) and Novor-ComPIL similarity score (0–100). C, number of MS2 spectra from all patient samples with Novor scores >75 that likely represent peptides but do not have ComPIL peptide matches (“dark peptidome”). With duplicates removed, the total number of proteins detected between all 18 samples is 576,625. ComPIL, comprehensive protein identification library.
      ComPIL database-assigned PSMs were nonuniformly distributed along the Novor score axis with a larger proportion of database PSMs grouped near the high-confidence de novo sequencing Novor scores (
      • Thuy-Boun P.S.
      • Wolan D.W.
      Ulcerative colitis human gut microbiome.
      ,
      • Tabb D.L.
      • McDonald W.H.
      • Yates J.R.
      DTASelect and contrast: Tools for assembling and comparing protein identifications from shotgun proteomics.
      ,
      • Storey J.D.
      • Tibshirani R.
      Statistical significance for genomewide studies.
      ,
      • Szklarczyk D.
      • Gable A.L.
      • Lyon D.
      • Junge A.
      • Wyder S.
      • Huerta-Cepas J.
      • Simonovic M.
      • Doncheva N.T.
      • Morris J.H.
      • Bork P.
      • Jensen L.J.
      • von Mering C.
      STRING v11: Protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets.
      ,
      • Altschul S.F.
      • Gish W.
      • Miller W.
      • Myers E.W.
      • Lipman D.J.
      Basic local alignment search tool.
      ,
      • Ricaboni D.
      • Maihe M.
      • Khelaifia S.
      • Raoult D.
      • Million M.
      Romboutsia timonensis, a new species isolated from the human gut.
      ,
      • Mangifesta M.
      • Mancabelli L.
      • Milani C.
      • Gaiani F.
      • de'Angelis N.
      • de'Angelis G.L.
      • van Sinderen D.
      • Ventura M.
      • Turroni F.
      Mucosal microbiota of intestinal polyps reveals putative biomarkers of colorectal cancer.
      ,
      • Engevik M.A.
      • Morra C.N.
      • Röth D.
      • Engevik K.
      • Spinler J.K.
      • Devaraj S.
      • Crawford S.E.
      • Estes M.K.
      • Kalkum M.
      • Versalovic J.
      Microbial metabolic capacity for intestinal folate production and modulation of host folate receptors.
      ,
      • Sharma V.
      • Rodionov D.A.
      • Leyn S.A.
      • Tran D.
      • Iablokov S.N.
      • Ding H.
      • Peterson D.A.
      • Osterman A.L.
      • Peterson S.N.
      B-vitamin sharing promotes stability of gut microbial communities.
      ,
      • Rossi M.
      • Amaretti A.
      • Raimondi S.
      Folate production by probiotic bacteria.
      ,
      • Hale L.P.
      • Chichlowski M.
      • Trinh C.T.
      • Greer P.K.
      Dietary supplementation with fresh pineapple juice decreases inflammation and colonic neoplasia in IL-10-deficient mice with colitis.
      ,
      • Fitzhugh D.J.
      • Shan S.Q.
      • Dewhirst M.W.
      • Hale L.P.
      Bromelain treatment decreases neutrophil migration to sites of inflammation.
      ,
      • de la Motte C.A.
      Hyaluronan in intestinal homeostasis and inflammation: Implications for fibrosis.
      ,
      • Chaput C.
      • Labigne A.
      • Boneca I.G.
      Characterization of Helicobacter pylori lytic transglycosylases Slt and MltD.
      ,
      • Dik D.A.
      • Marous D.R.
      • Fisher J.F.
      • Mobashery S.
      Lytic transglycosylases: Concinnity in concision of the bacterial cell wall.
      ,
      • Viala J.
      • Chaput C.
      • Boneca I.G.
      • Cardona A.
      • Girardin S.E.
      • Moran A.P.
      • Athman R.
      • Mémet S.
      • Huerre M.R.
      • Coyle A.J.
      • DiStefano P.S.
      • Sansonetti P.J.
      • Labigne A.
      • Bertin J.
      • Philpott D.J.
      • et al.
      Nod1 responds to peptidoglycan delivered by the Helicobacter pylori cag pathogenicity island.
      ,
      • Wyckoff T.J.
      • Taylor J.A.
      • Salama N.R.
      Beyond growth: Novel functions for bacterial cell wall hydrolases.
      ,
      • Wright G.D.
      Bacterial resistance to antibiotics: Enzymatic degradation and modification.
      ,
      • Blair J.M.A.
      • Webber M.A.
      • Baylay A.J.
      • Ogbolu D.O.
      • Piddock L.J.V.
      Molecular mechanisms of antibiotic resistance.
      ,
      • Million M.
      • Armstrong N.
      • Khelaifia S.
      • Guilhot E.
      • Richez M.
      • Lagier J.C.
      • Dubourg G.
      • Chabriere E.
      • Raoult D.
      The antioxidants glutathione, ascorbic acid, and uric acid maintain butyrate production by human gut clostridia in the presence of oxygen in vitro.
      ,
      • Liu Y.
      • Patricelli M.P.
      • Cravatt B.F.
      Activity-based protein profiling: The serine hydrolases.
      ,
      • Jagtap P.
      • Goslinga J.
      • Kooren J.A.
      • McGowan T.
      • Wroblewski M.S.
      • Seymour S.L.
      • Griffin T.J.
      A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies.
      ,
      • Silverman G.A.
      • Bird P.I.
      • Carrell R.W.
      • Church F.C.
      • Coughlin P.B.
      • Gettins P.G.
      • Irving J.A.
      • Lomas D.A.
      • Luke C.J.
      • Moyer R.W.
      • Pemberton P.A.
      • Remold-O'Donnell E.
      • Salvesen G.S.
      • Travis J.
      • Whisstock J.C.
      The serpins are an expanding superfamily of structurally similar but functionally diverse proteins.
      ,
      • Kriaa A.
      • Jablaoui A.
      • Mkaouar H.
      • Akermi N.
      • Maguin E.
      • Rhimi M.
      Serine proteases at the cutting edge of IBD: Focus on gastrointestinal inflammation.
      ,
      • Uchiyama K.
      • Naito Y.
      • Takagi T.
      • Mizushima K.
      • Hirai Y.
      • Hayashi N.
      • Harusato A.
      • Inoue K.
      • Fukumoto K.
      • Yamada S.
      • Handa O.
      • Ishikawa T.
      • Yagi N.
      • Kokura S.
      • Yoshikawa T.
      Serpin B1 protects colonic epithelial cell via blockage of neutrophil elastase activity and its expression is enhanced in patients with ulcerative colitis.
      ,
      • Kumar P.
      • Johnson J.E.
      • Easterly C.
      • Mehta S.
      • Sajulga R.
      • Nunn B.
      • Jagtap P.D.
      • Griffin T.J.
      A sectioning and database enrichment approach for improved peptide spectrum matching in large, genome-guided protein sequence databases.
      ,
      • Ma B.
      • Johnson R.
      De novo sequencing and homology searching.
      ,
      • Ma B.
      Novor: Real-time peptide de novo sequencing software.
      ,
      • Browne H.P.
      • Forster S.C.
      • Anonye B.O.
      • Kumar N.
      • Neville B.A.
      • Stares M.D.
      • Goulding D.
      • Lawley T.D.
      Culturing of ‘unculturable’ human microbiota reveals novel taxa and extensive sporulation.
      ,
      • Fuks G.
      • Elgart M.
      • Amir A.
      • Zeisel A.
      • Turnbaugh P.J.
      • Soen Y.
      • Shental N.
      Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling.
      ,
      • Johnson J.S.
      • Spakowicz D.J.
      • Hong B.-Y.
      • Petersen L.M.
      • Demkowicz P.
      • Chen L.
      • Leopold S.R.
      • Hanson B.M.
      • Agresta H.O.
      • Gerstein M.
      • Sodergren E.
      • Weinstock G.M.
      Evaluation of 16S rRNA gene sequences for species and strain-level microbiome analysis.
      ) (Fig. 5A). While perfect agreement between de novo and database PSMs was rare, a large proportion of database PSMs possessed strong similarity to de novo PSMs, a relationship best depicted by a Novor score versus Novor-ComPIL2 similarity score joint plot (Fig. 5B). Thus, it is reasonable to expect that above a conservative cutoff value (Novor score = 75), significant numbers of MS2 spectra without database assignments correspond to peptides that either are not contained in the ComPIL2 database or were rejected due to our high search-filter stringency. Based on this assessment, we estimate that an average of 14,075 MS2 spectra with a Novor score of 75 or greater remain unidentified per sample (Fig. 5C). This corresponds to approximately 9% of all MS2 spectra per patient sample; assigning these spectra could increase global identifications by approximately 50% relative to the 18.5% currently matched. Note that 9% is a lower-limit estimate of unidentified MS2 spectra because it is based only on MS2 spectra with Novor scores >75. A significant fraction of MS2 spectra with Novor scores <75 was identified by ComPIL, which further suggests that the upper limit of unidentified MS2 spectra is much larger than 9% (Fig. 5A, left of red vertical dashed line). BLAST analysis of several de novo PSMs that lack corresponding database PSMs returned reasonable, high-similarity matches to microbial peptides from the NCBI nonredundant database, supporting this assertion. Below a Novor score of 75, there are likely many unidentified peptide-derived MS2 spectra; however, these are likely intermixed with many non-peptide MS2 spectra.
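      The percentages in this section can be reproduced from the reported totals. The short calculation below is a sketch that assumes all 18 samples (8 healthy, 10 UC) contribute equally to the per-sample average and that every high-scoring unassigned spectrum could in principle become an identification; the variable names are illustrative.

```python
total_ms2 = 2_829_920      # MS2 spectra collected across all samples
matched_ms2 = 523_155      # spectra matched by ComPIL at a 1% peptide false positive rate
n_samples = 18             # 8 healthy + 10 UC (assumed to contribute equally)
dark_per_sample = 14_075   # mean unassigned spectra per sample with Novor score >= 75

matched_fraction = matched_ms2 / total_ms2
mean_ms2_per_sample = total_ms2 / n_samples
dark_fraction = dark_per_sample / mean_ms2_per_sample
# Relative gain if every "dark" spectrum were converted into an identification
relative_gain = dark_fraction / matched_fraction

print(f"matched by database search: {matched_fraction:.1%}")
print(f"dark peptidome (Novor >= 75): {dark_fraction:.1%}")
print(f"potential relative increase in identifications: {relative_gain:.0%}")
```

      The relative increase comes out slightly below one half, consistent with the "approximately 50%" figure, because the dark fraction (about 9%) is roughly half of the matched fraction (about 18.5%).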

      Discussion

      16S rRNA gene amplicon sequencing has been a workhorse technique for microbiota studies over the last decade due in large part to the simplicity of material extraction and the broad availability of resources needed to generate meaningful data. In contrast, the use of LC-MS/MS-based metaproteomics profiling has been sparser, for the opposite reasons. While functional proteomic interrogation of the microbiome by LC-MS/MS was our main goal, we considered it important to contrast our proteomics-based taxonomy findings with those generated by 16S sequencing, a technique more familiar to the microbiome research community. Gut microbiome taxonomy viewed through the lens of 16S gene analysis differs from that viewed through LC-MS/MS-based proteomics, as expected, but also in some unexpected ways (
      • Verberkmoes N.C.
      • Russell A.L.
      • Shah M.
      • Godzik A.
      • Rosenquist M.
      • Halfvarson J.
      • Lefsrud M.G.
      • Apajalahti J.
      • Tysk C.
      • Hettich R.L.
      • Jansson J.K.
      Shotgun metaproteomics of the human distal gut microbiota.
      ). At the phylum level, we expected to see a similar configuration by both techniques with the exception that proteomics would include a small concession for host proteins. Unexpectedly, we observed both host and diet-derived (potato, rice, corn, etc.) proteins in great relative abundance to microbial proteins. Our analyzed samples were pre-enriched for microbes by filtration, differential centrifugation, and several washing steps, yet nearly half of the detected proteome was mapped back to host or dietary plant proteins. Another unexpected finding was a discrepancy between the relative abundance ratios of Bacteroidetes and Firmicutes. This discrepancy could be artifactual and originate from differences in DNA extraction efficiencies between microbes, but it could also be the result of biologically relevant phenomena. For example, overrepresentation of Firmicutes by 16S sequencing could stem from an abundance of Firmicutes cells that are metabolically inactive (spores) relative to Bacteroidetes cells (
      • Browne H.P.
      • Forster S.C.
      • Anonye B.O.
      • Kumar N.
      • Neville B.A.
      • Stares M.D.
      • Goulding D.
      • Lawley T.D.
      Culturing of ‘unculturable’ human microbiota reveals novel taxa and extensive sporulation.
      ). This would be in agreement with our finding that the asexual sporulation GO term is enriched in healthy patient samples. Finally, we found that at more granular taxonomic strata [see Additional File 1, supplemental Figs. S1–S7], the number of identifiable organisms was unexpectedly greater for proteomics (using unique peptides as a proxy) than for 16S sequencing. By proteomics, we observed peptides originating from hundreds of organisms at the species level versus several dozen by 16S. Note that in our case, a standard 16S V4 analysis by paired-end short-read sequencing was performed; however, other higher-resolution techniques are becoming more accessible (
      • Fuks G.
      • Elgart M.
      • Amir A.
      • Zeisel A.
      • Turnbaugh P.J.
      • Soen Y.
      • Shental N.
      Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling.
      ,
      • Johnson J.S.
      • Spakowicz D.J.
      • Hong B.-Y.
      • Petersen L.M.
      • Demkowicz P.
      • Chen L.
      • Leopold S.R.
      • Hanson B.M.
      • Agresta H.O.
      • Gerstein M.
      • Sodergren E.
      • Weinstock G.M.
      Evaluation of 16S rRNA gene sequences for species and strain-level microbiome analysis.
      ,
      • Callahan B.J.
      • Wong J.
      • Heiner C.
      • Oh S.
      • Theriot C.M.
      • Gulati A.S.
      • McGill S.K.
      • Dougherty M.K.
      High-throughput amplicon sequencing of the full-length 16S rRNA gene and single-nucleotide resolution.
      ). Expectedly, the proportion of uniquely mappable peptides progressively decreased at more granular taxonomic levels such that at the species level, about 75% of all peptide intensity could not be mapped to a particular species. Because peptides are proxies for both taxonomy and function, this observation hints at a functional redundancy among microbes in the gut, which could be better examined by differential expression and GO term analysis of proteomics data.
      One of the leading motivations for performing differential expression analyses on microbiome samples is to identify specific biomarkers or disease-associated microbial proteins for further examination. Toward this goal, we identified 176 protein groups significantly enriched (q < 0.1) in either healthy or UC volunteers, with a major share originating from microbes. Interestingly, no host proteins were identified as significantly enriched in the fecal extracts of healthy volunteers, while several were found enriched in the fecal extracts of UC patients. Among the host proteins enriched in UC patients, we identified the calibrating entry, protein S100-A9 (p < 0.004, q < 0.07), a component of fecal calprotectin and an established IBD biomarker (
      • Tibble J.A.
      • Sigthorsson G.
      • Bridger S.
      • Fagerhol M.K.
      • Bjarnason I.
      Surrogate markers of intestinal inflammation are predictive of relapse in patients with inflammatory bowel disease.
      ,
      • D'Haens G.
      • Ferrante M.
      • Vermeire S.
      • Baert F.
      • Noman M.
      • Moortgat L.
      • Geens P.
      • Iwens D.
      • Aerden I.
      • Van Assche G.
      • Van Olmen G.
      • Rutgeerts P.
      Fecal calprotectin is a surrogate marker for endoscopic lesions in inflammatory bowel disease.
      ). According to STRING and reactome analyses, many host proteins enriched in UC patients are also inflammation-aligned lending more credibility to the prospect that the enriched proteins we have identified are truly UC-associated. For a comprehensive list of enriched protein groups, see Additional File 4. While most enriched protein groups had some annotation, a significant portion had little to none. This finding presents an exciting opportunity for the structural and biochemical study of novel sequences. Given the enormous number of domains of unknown function (DUF) and unknown function-type proteins catalogued from microbiome metagenomic sequencing efforts, we are faced with a prioritization problem wherein the most disease-relevant sequences are obscured by less impactful ones (
      • Jaroszewski L.
      • Li Z.
      • Krishna S.S.
      • Bakolitsa C.
      • Wooley J.
      • Deacon A.M.
      • Wilson I.A.
      • Godzik A.
      Exploration of uncharted regions of the protein universe.
      ). LC-MS/MS-based proteomics appears in this context to be an important tool for identifying sequences that are both expressed and biologically relevant, which will help focus our future studies. Lastly, it is important to point out that poorly annotated proteins (i.e. proteins without GO assignments) factor weakly or not at all into broader functional analyses like GO enrichment simply due to the nature of enrichment testing (i.e. hypergeometric). Therefore, novel sequences without known biological- or disease-relevance are important to eventually characterize.
      Within the detected microbiome proteome, known functional diversity is high with several hundred molecular function GO terms represented. A flat depiction of molecular function wherein a 1-sequence-1-count paradigm is applied reveals a consistent relative abundance configuration between all samples. We reasoned that count-based GO term depictions effectively reveal molecular diversity as each GO term’s constituent protein group/cluster equally contributes to a term’s size. However, this measure alone fails to capture material abundance. To depict material abundance, we have instead weighted GO terms by the peptide intensities (a very loose proxy for protein copy number) of their constituent protein groups/clusters (see Additional File 1 for details regarding this calculation procedure). Interestingly, when this paradigm is applied, a very different picture emerges. The relative abundance of many molecular function GO terms shifts, sometimes dramatically. One of the most conspicuous terms to us was “serine-type endopeptidase activity,” which expanded an average of 14-fold among all patient samples going from count-based to intensity-based representation. Additionally, we found this same term enriched in UC patient fecal samples by hypergeometric testing, warranting a closer inspection of the protein groups that contribute to this term. We found that the major contributors were host-derived serine proteases (85.7%) such as chymotrypsin-C and the chymotrypsin-like elastase family, with minor contributions from porcine trypsin (13.5%) (an artifact of sample preparation) and microbial serine proteases (0.8%). The high relative abundance of serine proteases in fecal samples is not surprising given that they are important components of host digestive enzyme cocktails secreted into the gut lumen. We were, however, surprised to find both host and microbial serpins, which are known active-site directed suicide inhibitors for serine/cysteine proteases, in fecal samples. This observation suggests that there might be important regulatory host–microbe cross talk with respect to proteolytic activity in the gut. Even with protease and serpin abundances in hand, it would still be difficult to determine which serine proteases, and what fraction of them, remained active upon fecal sample collection. To identify active serine proteases, we treated fecal samples with an active-site directed serine-hydrolase selective chemical probe (FP probe) for labeling, enrichment, and target identification via LC-MS/MS (ABPP) (
      • Jessani N.
      • Cravatt B.F.
      The development and application of methods for activity-based protein profiling.
      ,
      • Cravatt B.F.
      • Wright A.T.
      • Kozarich J.W.
      Activity-based protein profiling: From enzyme chemistry to proteomic chemistry.
      ,
      • Liu Y.
      • Patricelli M.P.
      • Cravatt B.F.
      Activity-based protein profiling: The serine hydrolases.
      ). We examined three patient fecal samples (one healthy, two UC) and qualitatively found human chymotrypsin-like elastases 3A and 3B and chymotrypsin-C enriched and therefore active in all samples. For one UC sample (UC2), we identified additional FP probe-enriched host proteases including cathepsin G, chymotrypsin-like elastase 2A and 2B, dipeptidyl peptidase 4, neutrophil elastase, and trypsin 1, lending support to the idea that aberrantly increased protease activity is associated with IBD (
      • Kriaa A.
      • Jablaoui A.
      • Mkaouar H.
      • Akermi N.
      • Maguin E.
      • Rhimi M.
      Serine proteases at the cutting edge of IBD: Focus on gastrointestinal inflammation.
      ,
      • Baugh M.D.
      • Perry M.J.
      • Hollander A.P.
      • Davies D.R.
      • Cross S.S.
      • Lobo A.J.
      • Taylor C.J.
      • Evans G.S.
      Matrix metalloproteinase levels are elevated in inflammatory bowel disease.
      ). In addition to host proteases, we identified several microbial proteases from all three patient samples upon FP probe enrichment. Surprisingly, we found that nine of ten microbial proteases were not detected by LC-MS/MS at all without FP probe enrichment. This finding suggests that there are likely many microbial proteases expressed in the gut microbiota and that they are likely below the limit of detection by most current sampling and LC-MS/MS profiling strategies. We speculate that this sentiment also holds true for other low-abundance, high-impact protein functionalities, underscoring the importance of pre-enrichment strategies for future proteomics studies.
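      The contrast drawn earlier in this discussion between count-based and intensity-weighted GO term representation can be sketched in a few lines. This is not the calculation procedure described in Additional File 1; it is a simplified illustration that assumes each protein group carries a set of GO terms and a single summed peptide intensity, and the function name, field names, and toy values are hypothetical.

```python
from collections import defaultdict


def go_term_shares(protein_groups, weight_by_intensity=False):
    """Relative abundance of each GO term across protein groups.

    protein_groups: list of dicts with 'go_terms' (iterable of GO IDs)
    and 'intensity' (summed peptide intensity for the group).
    Count mode gives every group a weight of 1; intensity mode weights
    each group by its summed peptide intensity.
    """
    totals = defaultdict(float)
    for group in protein_groups:
        weight = group["intensity"] if weight_by_intensity else 1.0
        for term in group["go_terms"]:
            totals[term] += weight
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {term: 0.0 for term in totals}
    return {term: value / grand_total for term, value in totals.items()}


# Toy data (hypothetical intensities): one abundant protease, three low-abundance groups
groups = [
    {"go_terms": ["GO:0004252"], "intensity": 8.0e6},  # serine-type endopeptidase activity
    {"go_terms": ["GO:0005509"], "intensity": 1.0e6},  # calcium ion binding
    {"go_terms": ["GO:0005509"], "intensity": 0.5e6},
    {"go_terms": ["GO:0019843"], "intensity": 0.5e6},  # rRNA binding
]
by_count = go_term_shares(groups)
by_intensity = go_term_shares(groups, weight_by_intensity=True)
fold_change = by_intensity["GO:0004252"] / by_count["GO:0004252"]
print(f"count share: {by_count['GO:0004252']:.2f}, "
      f"intensity share: {by_intensity['GO:0004252']:.2f}, "
      f"fold change: {fold_change:.1f}")
```

      In this toy case a single highly abundant protease group makes its GO term look several-fold larger under intensity weighting than under counting, which mirrors the direction of the shift reported for serine-type endopeptidase activity.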
      In addition to sampling limits, many microbial peptides that are sampled by LC-MS/MS are liable to go undetected due in large part to database-completeness limitations. We attempted to estimate the number of peptide-likely fragmentation spectra per LC-MS/MS experiment using a de novo sequencing tool (Novor) in order to define a rough boundary around the amount of unassigned peptide space captured by the mass spectrometer but unidentified by our database workflow. Homology searching of high-confidence peptide-like fragmentation spectra revealed high numbers of exact and near-matches to peptide sequences in the NCBI nonredundant database. Though de novo peptide sequencing coupled to homology searching can help capture database-elusive peptides in more defined contexts (
      • Ma B.
      • Johnson R.
      De novo sequencing and homology searching.
      ), we were reluctant to rely more heavily on this strategy without a stringent methodology for distinguishing between sequencing errors and truly homologous peptides, especially in a context as taxonomically diverse as the gut microbiota. An additional layer of difficulty rests in determining how to treat de novo only peptides that are constituents of completely unknown parent protein sequences. Deep genome sequencing and the use of custom MAGs are an obvious path forward especially as long-read technologies become more accurate and accessible (
      • Almeida A.
      • Mitchell A.L.
      • Boland M.
      • Forster S.C.
      • Gloor G.B.
      • Tarkowska A.
      • Lawley T.D.
      • Finn R.D.
      A new genomic blueprint of the human gut microbiota.
      ,
      • Kang D.D.
      • Froula J.
      • Egan R.
      • Wang Z.
      MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities.
      ,
      • Chen L.-X.
      • Anantharaman K.
      • Shaiber A.
      • Muran Eren A.
      • Banfield J.F.
      Accurate and complete genomes from metagenomes.
      ,
      • Pasolli E.
      • Asnicar F.
      • Manara S.
      • Zolfo M.
      • Karcher N.
      • Armanini F.
      • Beghini F.
      • Manghi P.
      • Tett A.
      • Ghensi P.
      • Collado M.C.
      • Rice B.L.
      • DuLong C.
      • Morgan X.C.
      • Golden C.D.
      • et al.
      Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle.
      ,
      • Nayfach S.
      • Shi Z.J.
      • Seshadri R.
      • Pollard K.S.
      • Kyrpides N.C.
      New insights from uncultivated genomes of the global human gut microbiome.
      ,
      • Frank J.A.
      • Pan Y.
      • Tooming-Klunderud A.
      • Eijsink V.G.H.
      • McHardy A.C.
      • Nederbragt A.J.
      • Pope P.B.
      Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data.
      ,
      • Bishara A.
      • Moss E.L.
      • Kolmogorov M.
      • Parada A.E.
      • Weng Z.
      • Sidow A.
      • Dekas A.E.
      • Batzoglou S.
      • Bhatt A.S.
      High-quality genome sequences of uncultured microbes by assembly of read clouds.
      ,
      • Moss E.L.
      • Maghini D.G.
      • Bhatt A.S.
      Complete, closed bacterial genomes from microbiomes using nanopore sequencing.
      ,
      • Kolmogorov M.
      • Bickhart D.M.
      • Behsaz B.
      • Gurevich A.
      • Rayko M.
      • Shin S.B.
      • Kuhn K.
      • Yuan J.
      • Polevikov E.
      • Smith T.P.L.
      • Pevzner P.A.
      metaFlye: Scalable long-read metagenome assembly using repeat graphs.
      ). Notably, long-read technologies are expected to yield more contiguous genome assemblies, thus accelerating the functional annotation of novel sequences by making it easier to contextualize them within their respective genomes. For peptides/proteins that elude this approach, however (low-abundance microbes, heavily posttranslationally modified peptides/proteins, nonribosomal peptides/proteins, etc.), perhaps genomics-agnostic middle- and top-down proteomics sequencing could be applied (
      Catherman et al., Top down proteomics: Facts and perspectives; Cristobal et al., Toward an optimized workflow for middle-down proteomics
      ). We anticipate that the expanded use of ABPP techniques in the microbiota will enrich for many protein sequences not contained in large databases such as ComPIL, and robust high-throughput methods for identifying these whole novel protein sequences will be needed.
      In summary, we identified 176 discrete host and microbial protein groups differentially enriched between healthy and UC patients. Our analysis revealed several protein functions associated with ulcerative colitis, with the function “serine-type endopeptidase activity” featuring prominently. We also identified host and microbial serine protease inhibitors in concert with serine proteases. Using an activity-based chemical tagging strategy, we were able to enrich for serine hydrolases/proteases and showed that these enzymes are still active in the gut despite the presence of active-site directed protease inhibitors. This strategy also revealed the presence of previously undetected serine proteases, demonstrating the utility of activity-based tagging for the amplification of low-abundance proteins. Finally, we paired our database metaproteomics strategy with de novo peptide sequencing to estimate the size of high-confidence peptide space in our samples that remains unidentified despite the use of a large comprehensive database. Our data suggest that, as a lower bound, an average of at least 9% of the fragmentation spectra collected per run likely correspond to peptides but remain unmatched.
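
      The lower-bound estimate described above can be illustrated with a minimal sketch. It is not the analysis code used in this study: the `Run` container, the 0-100 de novo score scale, the `min_score` cutoff of 80, and the toy scan numbers are all assumptions chosen for illustration. The idea is simply to count, per run, the MS2 spectra that receive a high-confidence de novo sequence but no database match, divide by all MS2 spectra collected, and average across runs.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, FrozenSet, Iterable


@dataclass
class Run:
    """Hypothetical per-run summary (names are illustrative, not from the study)."""
    total_ms2: int                    # all MS2 spectra collected in the run
    db_matched: FrozenSet[int]        # scan numbers identified by the database search
    de_novo_scores: Dict[int, float]  # scan number -> de novo confidence score (assumed 0-100)


def dark_fraction(run: Run, min_score: float = 80.0) -> float:
    """Fraction of all MS2 spectra that look peptide-like by de novo score
    but have no database match. min_score is an assumed cutoff."""
    if run.total_ms2 <= 0:
        raise ValueError("total_ms2 must be positive")
    dark = [
        scan
        for scan, score in run.de_novo_scores.items()
        if score >= min_score and scan not in run.db_matched
    ]
    return len(dark) / run.total_ms2


def mean_dark_fraction(runs: Iterable[Run], min_score: float = 80.0) -> float:
    """Average the per-run dark fraction across runs."""
    return mean(dark_fraction(r, min_score) for r in runs)


# Toy data only; these numbers are invented and are not results from the paper.
runs = [
    Run(
        total_ms2=1000,
        db_matched=frozenset(range(0, 300)),
        de_novo_scores={
            **{s: 90.0 for s in range(250, 400)},  # confident de novo calls
            **{s: 30.0 for s in range(400, 450)},  # low-confidence calls, ignored
        },
    ),
    Run(
        total_ms2=2000,
        db_matched=frozenset(range(0, 500)),
        de_novo_scores={s: 85.0 for s in range(400, 600)},
    ),
]

if __name__ == "__main__":
    print(f"mean dark fraction: {mean_dark_fraction(runs):.1%}")
```

      Because spectra that fail the score cutoff are excluded even when they derive from real peptides, a calculation of this shape can only understate the unidentified peptide space, which is why the result is read as a lower bound.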

      Data Availability

      The datasets generated and analyzed during the current study are available in the Proteomics Identification Database (PRIDE) repository under project PXD022433 (Thuy-Boun and Wolan, Ulcerative colitis human gut microbiome) and in the Zenodo data repository at https://doi.org/10.5281/zenodo.5717460 (Thuy-Boun and Wolan, Quantitative metaproteomics and activity-based protein profiling of patient fecal microbiota identifies host and microbial proteins associated with ulcerative colitis).

      Supplemental data

      This article contains supplemental data.

      Conflict of interest

      The authors declare no competing interests.

      Acknowledgments

      We thank G. Tsaprailis and C. Sharager-Tapia for assistance with mass spectrometry data collection. We thank S. K. R. Park, T. Jung, and J. R. Yates for assistance with mass spectrometry data analysis. We thank H. Rosen, R. L. Wiseman, and L. L. Lairson for access to instrumentation.

      Funding and additional information

      This work was supported by NIH R21AI139744 (to D. W. W.) and Boehringer Ingelheim (to D. W. W., A. I. S., and W. J. C.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

      Author contributions

      P. S. T.-B., A. Y. W., and J. H. X. data curation; P. S. T.-B., S. C., G. S. S., A. I. S., and D. W. W. formal analysis; P. S. T.-B., A. Y. W., and J. H. X. investigation; P. S. T.-B., W. J. C., and D. W. W. methodology; A. C.-M. and W. J. C. resources; P. S. T.-B. and D. W. W. writing—original draft; P. S. T.-B., A. Y. W., A. C.-M., J. H. X., S. C., G. S. S., A. S., W. J. C., and D. W. W. writing—review and editing.

      References

        • Baumgart D.C.
        • Carding S.R.
        Inflammatory bowel disease: Cause and immunobiology.
        Lancet. 2007; 369: 1627-1640
        • Baumgart D.C.
        • Sandborn W.J.
        Inflammatory bowel disease: Clinical aspects and established and evolving therapies.
        Lancet. 2007; 369: 1641-1657
        • Bernstein C.N.
        • Blanchard J.F.
        • Kliewer E.
        • Wajda A.
        Cancer risk in patients with inflammatory bowel disease: A population-based study.
        Cancer. 2001; 91: 854-862
        • Molodecky N.A.
        • Soon I.S.
        • Rabi D.M.
        • Ghali W.A.
        • Ferris M.
        • Chernoff G.
        • Benchimol E.I.
        • Panaccione R.
        • Ghosh S.
        • Barkema H.W.
        • Kaplan G.G.
        Increasing incidence and prevalence of inflammatory bowel diseases with time, based on systematic review.
        Gastroenterology. 2012; 142: 46-54
        • Jostins L.
        • Ripke S.
        • Weersma R.K.
        • Duerr R.H.
        • McGovern D.P.
        • Hui K.Y.
        • Lee J.C.
        • Schumm L.P.
        • Sharma Y.
        • Anderson C.A.
        • Essers J.
        • Mitrovic M.
        • Ning K.
        • Cleynen I.
        • Theatre E.
        • et al.
        Host-microbe interactions have shaped the genetic architecture of inflammatory bowel disease.
        Nature. 2012; 491: 119-124
        • Liu J.Z.
        • van Sommeren S.
        • Huang H.
        • Ng S.C.
        • Alberts R.
        • Takahashi A.
        • Ripke S.
        • Lee J.C.
        • Jostins L.
        • Shah T.
        • Abedian S.
        • Cheon J.H.
        • Cho J.
        • Dayani N.E.
        • Franke L.
        • et al.
        Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations.
        Nat. Genet. 2015; 47: 979-986
        • de Lange K.M.
        • Moutsianas L.
        • Lee J.C.
        • Lamb C.A.
        • Luo Y.
        • Kennedy N.A.
        • Jostins L.
        • Rice D.L.
        • Gutierrez-Achury J.
        • Ji S.G.
        • Heap G.
        • Nimmo E.R.
        • Edwards C.
        • Henderson P.
        • Mowat C.
        • et al.
        Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease.
        Nat. Genet. 2017; 49: 256-261
        • Brant S.R.
        Update on the heritability of inflammatory bowel disease: The importance of twin studies.
        Inflamm. Bowel Dis. 2011; 17: 1-5
        • Dalal S.R.
        • Chang E.B.
        The microbial basis of inflammatory bowel diseases.
        J. Clin. Invest. 2014; 124: 4190-4196
        • Nishida A.
        • Inoue R.
        • Inatomi O.
        • Bamba S.
        • Naito Y.
        • Andoh A.
        Gut microbiota in the pathogenesis of inflammatory bowel disease.
        Clin. J. Gastroenterol. 2018; 11: 1-10
        • Ni J.
        • Wu G.D.
        • Albenberg L.
        • Tomov V.T.
        Gut microbiota and IBD: Causation or correlation?.
        Nat. Rev. Gastroenterol. Hepatol. 2017; 14: 573-584
        • Manichanh C.
        • Borruel N.
        • Casellas F.
        • Guarner F.
        The gut microbiota in IBD.
        Nat. Rev. Gastroenterol. Hepatol. 2012; 9: 599-608
        • Caruso R.
        • Lo B.C.
        • Nunez G.
        Host-microbiota interactions in inflammatory bowel disease.
        Nat. Rev. Immunol. 2020; 20: 411-426
        • Qin J.
        • Li R.
        • Raes J.
        • Arumugam M.
        • Burgdorf K.S.
        • Manichanh C.
        • Nielsen T.
        • Pons N.
        • Levenez F.
        • Yamada T.
        • Mende D.R.
        • Li J.
        • Xu J.
        • Li S.
        • Li D.
        • et al.
        A human gut microbial gene catalog established by metagenomic sequencing.
        Nature. 2010; 464: 59-65
        • Almeida A.
        • Mitchell A.L.
        • Boland M.
        • Forster S.C.
        • Gloor G.B.
        • Tarkowska A.
        • Lawley T.D.
        • Finn R.D.
        A new genomic blueprint of the human gut microbiota.
        Nature. 2019; 568: 499-504
        • Lozupone C.A.
        • Stombaugh J.I.
        • Gordon J.I.
        • Jansson J.K.
        • Knight R.
        Diversity, stability, and resilience of the human gut microbiota.
        Nature. 2012; 489: 220-230
        • Gill S.R.
        • Pop M.
        • Deboy R.T.
        • Eckburg P.B.
        • Turnbaugh P.J.
        • Samuel B.S.
        • Gordon J.I.
        • Relman D.A.
        • Fraser-Liggett C.M.
        • Nelson K.E.
        Metagenomic analysis of the human distal gut microbiome.
        Science. 2006; 312: 1355-1359
        • Franzosa E.A.
        • Hsu T.
        • Sirota-Madi A.
        • Shafquat A.
        • Abu-Ali G.
        • Morgan X.C.
        • Huttenhower C.
        Sequencing and beyond: Integrating molecular ‘omics’ for microbial community profiling.
        Nat. Rev. Microbiol. 2015; 13: 360-372
        • Wilmes P.
        • Bond P.L.
        The application of two-dimensional polyacrylamide gel electrophoresis and downstream analyses to a mixed community of prokaryotic microorganisms.
        Environ. Microbiol. 2004; 6: 911-920
        • Eng J.K.
        • McCormack A.L.
        • Yates J.R.
        An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database.
        J. Am. Soc. Mass Spectrom. 1994; 5: 976-989
        • Stewart E.J.
        Growing unculturable bacteria.
        J. Bacteriol. 2012; 194: 4151-4160
        • Kang D.D.
        • Froula J.
        • Egan R.
        • Wang Z.
        MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities.
        PeerJ. 2015; 3: e1165
        • Chen L.-X.
        • Anantharaman K.
        • Shaiber A.
        • Eren A.M.
        • Banfield J.F.
        Accurate and complete genomes from metagenomes.
        Genome Res. 2020; 30: 315-333
        • Pasolli E.
        • Asnicar F.
        • Manara S.
        • Zolfo M.
        • Karcher N.
        • Armanini F.
        • Beghini F.
        • Manghi P.
        • Tett A.
        • Ghensi P.
        • Collado M.C.
        • Rice B.L.
        • DuLong C.
        • Morgan X.C.
        • Golden C.D.
        • et al.
        Extensive unexplored human microbiome diversity revealed by over 150,000 genomes from metagenomes spanning age, geography, and lifestyle.
        Cell. 2019; 176: 649-662
        • Nayfach S.
        • Shi Z.J.
        • Seshadri R.
        • Pollard K.S.
        • Kyrpides N.C.
        New insights from uncultivated genomes of the global human gut microbiome.
        Nature. 2019; 568: 505-510
        • Tanca A.
        • Palomba A.
        • Fraumene C.
        • Pagnozzi D.
        • Manghina V.
        • Deligios M.
        • Muth T.
        • Rapp E.
        • Martens L.
        • Addis M.F.
        • Uzzau S.
        The impact of sequence database choice on metaproteomic results in gut microbiota studies.
        Microbiome. 2016; 4: 51
        • Chatterjee S.
        • Stupp G.S.
        • Park S.K.R.
        • Ducom J.-C.
        • Yates J.R.
        • Su A.I.
        • Wolan D.W.
        A comprehensive and scalable database search system for metaproteomics.
        BMC Genomics. 2016; 17: 642
        • Park S.K.R.
        • Jung T.
        • Thuy-Boun P.S.
        • Wang A.Y.
        • Yates J.R.
        • Wolan D.W.
        ComPIL 2.0: An updated comprehensive metaproteomics database.
        J. Proteome Res. 2019; 18: 616-622
        • Jessani N.
        • Cravatt B.F.
        The development and application of methods for activity-based protein profiling.
        Curr. Opin. Chem. Biol. 2004; 8: 54-59
        • Cravatt B.F.
        • Wright A.T.
        • Kozarich J.W.
        Activity-based protein profiling: From enzyme chemistry to proteomic chemistry.
        Annu. Rev. Biochem. 2008; 77: 383-414
        • Ram R.J.
        • Verberkmoes N.C.
        • Thelen M.P.
        • Tyson G.W.
        • Baker B.J.
        • Blake 2nd, R.C.
        • Shah M.
        • Hettich R.L.
        • Banfield J.F.
        Community proteomics of a natural microbial biofilm.
        Science. 2005; 308: 1915-1920
        • Tyson G.W.
        • Chapman J.
        • Hugenholtz P.
        • Allen E.E.
        • Ram R.J.
        • Richardson P.M.
        • Solovyev V.V.
        • Rubin E.M.
        • Rokhsar D.S.
        • Banfield J.F.
        Community structure and metabolism through reconstruction of microbial genomes from the environment.
        Nature. 2004; 428: 37-43
        • VerBerkmoes N.C.
        • Denef V.J.
        • Hettich R.L.
        • Banfield J.F.
        Functional analysis of natural microbial consortia using community proteomics.
        Nat. Rev. Microbiol. 2009; 7: 196-205
        • Verberkmoes N.C.
        • Russell A.L.
        • Shah M.
        • Godzik A.
        • Rosenquist M.
        • Halfvarson J.
        • Lefsrud M.G.
        • Apajalahti J.
        • Tysk C.
        • Hettich R.L.
        • Jansson J.K.
        Shotgun metaproteomics of the human distal gut microbiota.
        ISME J. 2009; 3: 179-189
        • Erickson A.R.
        • Cantarel B.L.
        • Lamendella R.
        • Darzi Y.
        • Mongodin E.F.
        • Pan C.
        • Shah M.
        • Halfvarson J.
        • Tysk C.
        • Henrissat B.
        • Raes J.
        • Verberkmoes N.C.
        • Fraser C.M.
        • Hettich R.L.
        • Jansson J.K.
        Integrated metagenomics/metaproteomics reveals human host-microbiota signatures of Crohn’s disease.
        PLoS One. 2012; 7: e49138
        • Juste C.
        • Kreil D.P.
        • Beauvallet C.
        • Guillot A.
        • Vaca S.
        • Carapito C.
        • Mondot S.
        • Sykacek P.
        • Sokol H.
        • Blon F.
        • Lepercq P.
        • Levenez F.
        • Valot B.
        • Carré W.
        • Loux V.
        • et al.
        Bacterial protein signals are associated with Crohn’s disease.
        Gut. 2014; 63: 1566-1577
        • Zhang X.
        • Ning Z.
        • Mayne J.
        • Deeke S.A.
        • Li J.
        • Starr A.E.
        • Chen R.
        • Singleton R.
        • Butcher J.
        • Mack D.R.
        • Stintzi A.
        • Figeys D.
        In vitro metabolic labeling of intestinal microbiota for quantitative metaproteomics.
        Anal. Chem. 2016; 88: 6120-6125
        • Heintz-Buschart A.
        • May P.
        • Laczny C.C.
        • Lebrun L.A.
        • Bellora C.
        • Krishna A.
        • Wampach L.
        • Schneider J.G.
        • Hogan A.
        • de Beaufort C.
        • Wilmes P.
        Integrated multi-omics of the human gut microbiome in a case study of familial type 1 diabetes.
        Nat. Microbiol. 2016; 2: 16180
        • Zhang X.
        • Chen W.
        • Ning Z.
        • Mayne J.
        • Mack D.
        • Stintzi A.
        • Tian R.
        • Figeys D.
        Deep metaproteomics approach for the study of human microbiomes.
        Anal. Chem. 2017; 89: 9407-9415
        • Zhang X.
        • Deeke S.A.
        • Ning Z.
        • Starr A.E.
        • Butcher J.
        • Li J.
        • Mayne J.
        • Cheng K.
        • Liao B.
        • Li L.
        • Singleton R.
        • Mack D.
        • Stintzi A.
        • Figeys D.
        Metaproteomics reveals associations between microbiome and intestinal extracellular vesicle proteins in pediatric inflammatory bowel disease.
        Nat. Commun. 2018; 9: 2873
        • Mills R.H.
        • Vázquez-Baeza Y.
        • Zhu Q.
        • Jiang L.
        • Gaffney J.
        • Humphrey G.
        • Smarr L.
        • Knight R.
        • Gonzalez D.J.
        Evaluating metagenomic prediction of the metaproteome in a 4.5-year study of a patient with Crohn’s disease.
        mSystems. 2019; 4: e00337-18
        • Blakely-Ruiz J.A.
        • Erickson A.R.
        • Cantarel B.L.
        • Xiong W.
        • Adams R.
        • Jansson J.K.
        • Fraser C.M.
        • Hettich R.L.
        Metaproteomics reveals persistent and phylum-redundant metabolic functional stability in adult human gut microbiomes of Crohn’s remission patients despite temporal variations in microbial taxa, genomes, and proteomes.
        Microbiome. 2019; 7: 18
        • Lloyd-Price J.
        • Arze C.
        • Ananthakrishnan A.N.
        • Schirmer M.
        • Avila-Pacheco J.
        • Poon T.W.
        • Andrews E.
        • Ajami N.J.
        • Bonham K.S.
        • Brislawn C.J.
        • Casero D.
        • Courtney H.
        • Gonzalez A.
        • Graeber T.G.
        • Hall A.B.
        • et al.
        Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases.
        Nature. 2019; 569: 655-662
        • Lehmann T.
        • Schallert K.
        • Vilchez-Vargas R.
        • Benndorf D.
        • Puttker S.
        • Sydor S.
        • Schulz C.
        • Bechmann L.
        • Canbay A.
        • Heidrich B.
        • Reichl U.
        • Link A.
        • Heyer R.
        Metaproteomics of fecal samples of Crohn’s disease and ulcerative colitis.
        J. Proteomics. 2019; 201: 93-103
        • Mayers M.D.
        • Moon C.
        • Stupp G.S.
        • Su A.I.
        • Wolan D.W.
        Quantitative metaproteomics and activity-based probe enrichment reveals significant alterations in protein expression from a mouse model of inflammatory bowel disease.
        J. Proteome Res. 2017; 16: 1014-1026
        • Whidbey C.
        • Sadler N.C.
        • Nair R.N.
        • Volk R.F.
        • DeLeon A.J.
        • Bramer L.M.
        • Fansler S.J.
        • Hansen J.R.
        • Shukla A.K.
        • Jansson J.K.
        • Thrall B.D.
        • Wright A.T.
        A probe-enabled approach for the selective isolation and characterization of functionally active subpopulations in the gut microbiome.
        J. Am. Chem. Soc. 2019; 141: 42-47
        • Parasar B.
        • Zhou H.
        • Xiao X.
        • Shi Q.
        • Brito I.L.
        • Chang P.V.
        Chemoproteomic profiling of gut microbiota-associated bile salt hydrolase activity.
        ACS Cent. Sci. 2019; 5: 867-873
        • Jariwala P.B.
        • Pellock S.J.
        • Goldfarb D.
        • Cloer E.W.
        • Artola M.
        • Simpson J.B.
        • Bhatt A.P.
        • Walton W.G.
        • Roberts L.R.
        • Major M.B.
        • Davies G.J.
        • Overkleeft H.S.
        • Redinbo M.R.
        Discovering the microbial enzymes driving drug toxicity with activity-based protein profiling.
        ACS Chem. Biol. 2020; 15: 217-225
        • Wang A.Y.
        • Thuy-Boun P.S.
        • Stupp G.S.
        • Su A.I.
        • Wolan D.W.
        Triflic acid treatment enables LC-MS/MS analysis of insoluble bacterial biomass.
        J. Proteome Res. 2018; 17: 2978-2986
        • Xu T.
        • Venable J.D.
        • Park S.K.
        • Cociorva D.
        • Lu B.
        • Liao L.
        • Wohlschlegel J.
        • Hewel J.
        • Yates J.R.
        ProLuCID, a fast and sensitive tandem mass spectra-based protein identification program.
        Mol. Cell. Proteomics. 2006; 5: S174
        • Xu T.
        • Park S.K.
        • Venable J.D.
        • Wohlschlegel J.A.
        • Diedrich J.K.
        • Cociorva D.
        • Lu B.
        • Liao L.
        • Hewel J.
        • Han X.
        • Wong C.
        • Fonslow B.
        • Delahunty C.
        • Gao Y.
        • Shah H.
        • et al.
        ProLuCID: An improved SEQUEST-like algorithm with enhanced sensitivity and specificity.
        J. Proteomics. 2015; 129: 16-24
        • Ashburner M.
        • Ball C.A.
        • Blake J.A.
        • Botstein D.
        • Butler H.
        • Cherry J.M.
        • Davis A.P.
        • Dolinski K.
        • Dwight S.S.
        • Eppig J.T.
        • Harris M.A.
        • Hill D.P.
        • Issel-Tarver L.
        • Kasarskis A.
        • Lewis S.
        • et al.
        Gene ontology: Tool for the unification of biology. The Gene Ontology Consortium.
        Nat. Genet. 2000; 25: 25-29
        • Bolyen E.
        • Rideout J.R.
        • Dillon M.R.
        • Bokulich N.A.
        • Abnet C.C.
        • Al-Ghalith G.A.
        • Alexander H.
        • Alm E.J.
        • Arumugam M.
        • Asnicar F.
        • Bai Y.
        • Bisanz J.E.
        • Bittinger K.
        • Brejnrod A.
        • Brislawn C.J.
        • et al.
        Reproducible, interactive, scalable, and extensible microbiome data science using QIIME 2.
        Nat. Biotechnol. 2019; 37: 852-857
        • Quast C.
        • Pruesse E.
        • Yilmaz P.
        • Gerken J.
        • Schweer T.
        • Yarza P.
        • Peplies J.
        • Glöckner F.O.
        The SILVA ribosomal RNA gene database project: Improved data processing and web-based tools.
        Nucleic Acids Res. 2013; 41: D590-D596
        • Thuy-Boun P.S.
        • Wolan D.W.
        Quantitative metaproteomics and activity-based protein profiling of patient fecal microbiota identifies host and microbial proteins associated with ulcerative colitis.
        Zenodo. 2020; https://doi.org/10.5281/zenodo.5717460
        • Elias J.E.
        • Gygi S.P.
        Target-decoy search strategy for increased confidence in large-scale protein identification by mass spectrometry.
        Nat. Methods. 2007; 4: 207-214
        • Li W.
        • Jaroszewski L.
        • Godzik A.
        Clustering of highly homologous sequences to reduce the size of large protein database.
        Bioinformatics. 2001; 17: 282-283
        • Li W.
        • Godzik A.
        Cd-hit: A fast program for clustering and comparing large sets of protein or nucleotide sequences.
        Bioinformatics. 2006; 22: 1658-1659
        • Millikin R.J.
        • Solntsev S.K.
        • Shortreed M.R.
        • Smith L.M.
        Ultrafast peptide label-free quantification with FlashLFQ.
        J. Proteome Res. 2018; 17: 386-391
        • Ritchie M.E.
        • Phipson B.
        • Wu D.
        • Hu Y.
        • Law C.W.
        • Shi W.
        • Smyth G.K.
        Limma powers differential expression analyses for RNA-sequencing and microarray studies.
        Nucleic Acids Res. 2015; 43: e47
        • Zhang X.
        • Smits A.H.
        • van Tilburg G.B.
        • Ovaa H.
        • Huber W.
        • Vermeulen M.
        Proteome-wide identification of ubiquitin interactions using UbIA-MS.
        Nat. Protoc. 2018; 13: 530-550
        • Jones P.
        • Binns D.
        • Chang H.Y.
        • Fraser M.
        • Li W.
        • McAnulla C.
        • McWilliam H.
        • Maslen J.
        • Mitchell A.
        • Nuka G.
        • Pesseat S.
        • Quinn A.F.
        • Sangrador-Vegas A.
        • Scheremetjew M.
        • Yong S.Y.
        • et al.
        InterProScan 5: Genome-scale protein function classification.
        Bioinformatics. 2014; 30: 1236-1240
        • Burge S.
        • Kelly E.
        • Lonsdale D.
        • Mutowo-Muellenet P.
        • McAnulla C.
        • Mitchell A.
        • Sangrador-Vegas A.
        • Mulder N.
        • Hunter S.
        Manual GO annotation of predictive protein signatures: The InterPro approach to GO curation.
        Database. 2012; https://doi.org/10.1093/database/bar068
        • Falcon S.
        • Gentleman R.
        Using GOstats to test gene lists for GO term association.
        Bioinformatics. 2007; 23: 257-258
        • Mesuere B.
        • Devreese B.
        • Debyser G.
        • Aerts M.
        • Vandamme P.
        • Dawyndt P.
        Unipept: Tryptic peptide-based biodiversity analysis of metaproteome samples.
        J. Proteome Res. 2012; 11: 5773-5780
        • Mesuere B.
        • Debyser G.
        • Aerts M.
        • Devreese B.
        • Vandamme P.
        • Dawyndt P.
        The Unipept metaproteomics analysis pipeline.
        Proteomics. 2015; 15: 1437-1442
        • Singh R.G.
        • Tanca A.
        • Palomba A.
        • Van der Jeugt F.
        • Verschaffelt P.
        • Uzzau S.
        • Martens L.
        • Dawyndt P.
        • Mesuere B.
        Unipept 4.0: Functional analysis of metaproteome data.
        J. Proteome Res. 2019; 18: 606-615
        • Needleman S.B.
        • Wunsch C.D.
        A general method applicable to the search for similarities in the amino acid sequence of two proteins.
        J. Mol. Biol. 1970; 48: 443-453
        • Thuy-Boun P.S.
        • Wolan D.W.
        Ulcerative colitis human gut microbiome.
        Proteomics Identification Database (PRIDE). 2020; (PXD022433)
        • Tabb D.L.
        • McDonald W.H.
        • Yates J.R.
        DTASelect and contrast: Tools for assembling and comparing protein identifications from shotgun proteomics.
        J. Proteome Res. 2002; 1: 21-26
        • Storey J.D.
        • Tibshirani R.
        Statistical significance for genomewide studies.
        Proc. Natl. Acad. Sci. U. S. A. 2003; 100: 9440-9445
        • Szklarczyk D.
        • Gable A.L.
        • Lyon D.
        • Junge A.
        • Wyder S.
        • Huerta-Cepas J.
        • Simonovic M.
        • Doncheva N.T.
        • Morris J.H.
        • Bork P.
        • Jensen L.J.
        • von Mering C.
        STRING v11: Protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets.
        Nucleic Acids Res. 2019; 47: D607-D613
        • Altschul S.F.
        • Gish W.
        • Miller W.
        • Myers E.W.
        • Lipman D.J.
        Basic local alignment search tool.
        J. Mol. Biol. 1990; 215: 403-410
        • Ricaboni D.
        • Mailhe M.
        • Khelaifia S.
        • Raoult D.
        • Million M.
        Romboutsia timonensis, a new species isolated from the human gut.
        New Microbes New Infect. 2016; 12: 6-7
        • Mangifesta M.
        • Mancabelli L.
        • Milani C.
        • Gaiani F.
        • de'Angelis N.
        • de'Angelis G.L.
        • van Sinderen D.
        • Ventura M.
        • Turroni F.
        Mucosal microbiota of intestinal polyps reveals putative biomarkers of colorectal cancer.
        Sci. Rep. 2018; 8: 13974
        • Engevik M.A.
        • Morra C.N.
        • Röth D.
        • Engevik K.
        • Spinler J.K.
        • Devaraj S.
        • Crawford S.E.
        • Estes M.K.
        • Kalkum M.
        • Versalovic J.
        Microbial metabolic capacity for intestinal folate production and modulation of host folate receptors.
        Front. Microbiol. 2019; 10: 2305
        • Sharma V.
        • Rodionov D.A.
        • Leyn S.A.
        • Tran D.
        • Iablokov S.N.
        • Ding H.
        • Peterson D.A.
        • Osterman A.L.
        • Peterson S.N.
        B-vitamin sharing promotes stability of gut microbial communities.
        Front. Microbiol. 2019; 10: 1485
        • Rossi M.
        • Amaretti A.
        • Raimondi S.
        Folate production by probiotic bacteria.
        Nutrients. 2011; 3: 118-134
        • Hale L.P.
        • Chichlowski M.
        • Trinh C.T.
        • Greer P.K.
        Dietary supplementation with fresh pineapple juice decreases inflammation and colonic neoplasia in IL-10-deficient mice with colitis.
        Inflamm. Bowel Dis. 2010; 16: 2012-2021
        • Fitzhugh D.J.
        • Shan S.Q.
        • Dewhirst M.W.
        • Hale L.P.
        Bromelain treatment decreases neutrophil migration to sites of inflammation.
        Clin. Immunol. 2008; 128: 66-74
        • de la Motte C.A.
        Hyaluronan in intestinal homeostasis and inflammation: Implications for fibrosis.
        Am. J. Physiol. Gastrointest. Liver Physiol. 2011; 301: G945-G949
        • Chaput C.
        • Labigne A.
        • Boneca I.G.
        Characterization of Helicobacter pylori lytic transglycosylases Slt and MltD.
        J. Bacteriol. 2007; 189: 422
        • Dik D.A.
        • Marous D.R.
        • Fisher J.F.
        • Mobashery S.
        Lytic transglycosylases: Concinnity in concision of the bacterial cell wall.
        Crit. Rev. Biochem. Mol. Biol. 2017; 52: 503-542
        • Viala J.
        • Chaput C.
        • Boneca I.G.
        • Cardona A.
        • Girardin S.E.
        • Moran A.P.
        • Athman R.
        • Mémet S.
        • Huerre M.R.
        • Coyle A.J.
        • DiStefano P.S.
        • Sansonetti P.J.
        • Labigne A.
        • Bertin J.
        • Philpott D.J.
        • et al.
        Nod1 responds to peptidoglycan delivered by the Helicobacter pylori cag pathogenicity island.
        Nat. Immunol. 2004; 5: 1166-1174
        • Wyckoff T.J.
        • Taylor J.A.
        • Salama N.R.
        Beyond growth: Novel functions for bacterial cell wall hydrolases.
        Trends Microbiol. 2012; 20: 540-547
        • Wright G.D.
        Bacterial resistance to antibiotics: Enzymatic degradation and modification.
        Adv. Drug Deliv. Rev. 2005; 57: 1451-1470
        • Blair J.M.A.
        • Webber M.A.
        • Baylay A.J.
        • Ogbolu D.O.
        • Piddock L.J.V.
        Molecular mechanisms of antibiotic resistance.
        Nat. Rev. Microbiol. 2015; 13: 42-51
        • Million M.
        • Armstrong N.
        • Khelaifia S.
        • Guilhot E.
        • Richez M.
        • Lagier J.C.
        • Dubourg G.
        • Chabriere E.
        • Raoult D.
        The antioxidants glutathione, ascorbic acid, and uric acid maintain butyrate production by human gut clostridia in the presence of oxygen in vitro.
        Sci. Rep. 2020; 10: 7705
        • Liu Y.
        • Patricelli M.P.
        • Cravatt B.F.
        Activity-based protein profiling: The serine hydrolases.
        Proc. Natl. Acad. Sci. U. S. A. 1999; 96: 14694-14699
        • Jagtap P.
        • Goslinga J.
        • Kooren J.A.
        • McGowan T.
        • Wroblewski M.S.
        • Seymour S.L.
        • Griffin T.J.
        A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies.
        Proteomics. 2013; 13: 1352-1357
        • Silverman G.A.
        • Bird P.I.
        • Carrell R.W.
        • Church F.C.
        • Coughlin P.B.
        • Gettins P.G.
        • Irving J.A.
        • Lomas D.A.
        • Luke C.J.
        • Moyer R.W.
        • Pemberton P.A.
        • Remold-O'Donnell E.
        • Salvesen G.S.
        • Travis J.
        • Whisstock J.C.
        The serpins are an expanding superfamily of structurally similar but functionally diverse proteins.
        J. Biol. Chem. 2001; 276: 33293-33296
        • Kriaa A.
        • Jablaoui A.
        • Mkaouar H.
        • Akermi N.
        • Maguin E.
        • Rhimi M.
        Serine proteases at the cutting edge of IBD: Focus on gastrointestinal inflammation.
        FASEB J. 2020; 34: 7270-7282
        • Uchiyama K.
        • Naito Y.
        • Takagi T.
        • Mizushima K.
        • Hirai Y.
        • Hayashi N.
        • Harusato A.
        • Inoue K.
        • Fukumoto K.
        • Yamada S.
        • Handa O.
        • Ishikawa T.
        • Yagi N.
        • Kokura S.
        • Yoshikawa T.
        Serpin B1 protects colonic epithelial cell via blockage of neutrophil elastase activity and its expression is enhanced in patients with ulcerative colitis.
        Am. J. Physiol. Gastrointest. Liver Physiol. 2012; 302: G1163-G1170
        • Kumar P.
        • Johnson J.E.
        • Easterly C.
        • Mehta S.
        • Sajulga R.
        • Nunn B.
        • Jagtap P.D.
        • Griffin T.J.
        A sectioning and database enrichment approach for improved peptide spectrum matching in large, genome-guided protein sequence databases.
        J. Proteome Res. 2020; 19: 2772-2785
        • Ma B.
        • Johnson R.
        De novo sequencing and homology searching.
        Mol. Cell. Proteomics. 2012; 11 (O111.014902)
        • Ma B.
        Novor: Real-time peptide de novo sequencing software.
        J. Am. Soc. Mass Spectrom. 2015; 26: 1885-1894
        • Browne H.P.
        • Forster S.C.
        • Anonye B.O.
        • Kumar N.
        • Neville B.A.
        • Stares M.D.
        • Goulding D.
        • Lawley T.D.
        Culturing of ‘unculturable’ human microbiota reveals novel taxa and extensive sporulation.
        Nature. 2016; 533: 543-546
        • Fuks G.
        • Elgart M.
        • Amir A.
        • Zeisel A.
        • Turnbaugh P.J.
        • Soen Y.
        • Shental N.
        Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling.
        Microbiome. 2018; 6: 17
        • Johnson J.S.
        • Spakowicz D.J.
        • Hong B.-Y.
        • Petersen L.M.
        • Demkowicz P.
        • Chen L.
        • Leopold S.R.
        • Hanson B.M.
        • Agresta H.O.
        • Gerstein M.
        • Sodergren E.
        • Weinstock G.M.
        Evaluation of 16S rRNA gene sequences for species and strain-level microbiome analysis.
        Nat. Commun. 2019; 10: 5029
        • Callahan B.J.
        • Wong J.
        • Heiner C.
        • Oh S.
        • Theriot C.M.
        • Gulati A.S.
        • McGill S.K.
        • Dougherty M.K.
        High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution.
        Nucleic Acids Res. 2019; 47: e103
        • Tibble J.A.
        • Sigthorsson G.
        • Bridger S.
        • Fagerhol M.K.
        • Bjarnason I.
        Surrogate markers of intestinal inflammation are predictive of relapse in patients with inflammatory bowel disease.
        Gastroenterology. 2000; 119: 15-22
        • D'Haens G.
        • Ferrante M.
        • Vermeire S.
        • Baert F.
        • Noman M.
        • Moortgat L.
        • Geens P.
        • Iwens D.
        • Aerden I.
        • Van Assche G.
        • Van Olmen G.
        • Rutgeerts P.
        Fecal calprotectin is a surrogate marker for endoscopic lesions in inflammatory bowel disease.
        Inflamm. Bowel Dis. 2012; 18: 2218-2224
        • Jaroszewski L.
        • Li Z.
        • Krishna S.S.
        • Bakolitsa C.
        • Wooley J.
        • Deacon A.M.
        • Wilson I.A.
        • Godzik A.
        Exploration of uncharted regions of the protein universe.
        PLoS Biol. 2009; 7: e1000205
        • Baugh M.D.
        • Perry M.J.
        • Hollander A.P.
        • Davies D.R.
        • Cross S.S.
        • Lobo A.J.
        • Taylor C.J.
        • Evans G.S.
        Matrix metalloproteinase levels are elevated in inflammatory bowel disease.
        Gastroenterology. 1999; 117: 814-822
        • Frank J.A.
        • Pan Y.
        • Tooming-Klunderud A.
        • Eijsink V.G.H.
        • McHardy A.C.
        • Nederbragt A.J.
        • Pope P.B.
        Improved metagenome assemblies and taxonomic binning using long-read circular consensus sequence data.
        Sci. Rep. 2016; 6: 25373
        • Bishara A.
        • Moss E.L.
        • Kolmogorov M.
        • Parada A.E.
        • Weng Z.
        • Sidow A.
        • Dekas A.E.
        • Batzoglou S.
        • Bhatt A.S.
        High-quality genome sequences of uncultured microbes by assembly of read clouds.
        Nat. Biotechnol. 2018; 36: 1067-1075
        • Moss E.L.
        • Maghini D.G.
        • Bhatt A.S.
        Complete, closed bacterial genomes from microbiomes using nanopore sequencing.
        Nat. Biotechnol. 2020; 38: 701-707
        • Kolmogorov M.
        • Bickhart D.M.
        • Behsaz B.
        • Gurevich A.
        • Rayko M.
        • Shin S.B.
        • Kuhn K.
        • Yuan J.
        • Polevikov E.
        • Smith T.P.L.
        • Pevzner P.A.
        metaFlye: Scalable long-read metagenome assembly using repeat graphs.
        Nat. Methods. 2020; 17: 1103-1110
        • Catherman A.D.
        • Skinner O.S.
        • Kelleher N.L.
        Top down proteomics: Facts and perspectives.
        Biochem. Biophys. Res. Commun. 2014; 445: 683-693
        • Cristobal A.
        • Marino F.
        • Post H.
        • van den Toorn H.W.P.
        • Mohammed S.
        • Heck A.J.R.
        Toward an optimized workflow for middle-down proteomics.
        Anal. Chem. 2017; 89: 3318-3325