Molecular & Cellular Proteomics 5:987-997, 2006.
© 2006 by The American Society for Biochemistry and Molecular Biology, Inc.

From the
Institute of Biochemistry, ** Institute of Bioinformatics, 
Institute of Biotechnology in Medicine, and ¶¶ Department of Biotechnology and Laboratory Science in Medicine, National Yang Ming University, * Taipei City Hospital, Taipei 112, Taiwan, Republic of China,
Institute for Systems Biology, Seattle, Washington 98103, ¶ Zymogenetics, Inc., Seattle, Washington 98012, || Institute for Molecular Systems Biology, ETH Hönggerberg and Faculty of Natural Sciences, University of Zurich, Zurich CH-8093, Switzerland, and 
Department of Medicinal Chemistry, University of Washington, Seattle, Washington 98195
| ABSTRACT |
|---|
The genomes of Halobacterium species are extremely unstable (7-9). Early studies of Halobacterium sp. NRC-1 (also known as Halobacterium halobium) and the closely related Halobacterium salinarium discovered unusually high spontaneous mutation frequencies of 0.01% for the production of bacteriorhodopsin- or bacterioruberin-deficient phenotypes and a more striking 1% for partial or total gas vesicle-deficient phenotypes. The species is also noteworthy for the large number of insertion sequence elements harbored in this unstable genome. Molecular genetic analysis of the bacteriorhodopsin- and gas vesicle-deficient mutants established a relationship between the high mutation rates and insertion sequence-mediated insertional inactivation or deletion of structural or regulatory genes (10-16). Upon completion of the genome sequence, DNA analysis revealed 91 copies of insertion sequence elements belonging to 12 families in the Halobacterium sp. NRC-1 genome (2).
Mass spectrometry is a powerful technology for protein identification in the postgenomic era. Our previous shotgun peptide sequencing analysis of the Halobacterium sp. NRC-1 membrane and soluble proteomes identified a total of 426 unique proteins representing approximately one-fifth of the predicted proteome (2, 17). Among these, 232 were identified predominantly in the soluble fraction, 165 were in the membrane fraction, and 29 were in both fractions. Metabolic reconstruction found 103 of the identified proteins could be matched to enzymes in 52 metabolic pathways found in the Kyoto Encyclopedia of Genes and Genomes (KEGG)1 database (www.genome.ad.jp).
Here we report the systems analysis of the Halobacterium sp. NRC-1 soluble proteome identified by two-dimensional liquid chromatography coupled with tandem mass spectrometry. The analysis was facilitated by the BMSorter2 tool that incorporates the protein identification results into biological networks. In addition to being able to document the proteins identified in this study, we demonstrate the power of integrative analysis of proteomic data in the analysis of biological networks.
| MATERIALS AND METHODS |
|---|
Fractionation of Peptides
An aliquot of 3 mg of soluble proteins was digested with 60 µg of sequencing grade modified trypsin (Promega, Madison, WI) in a total volume of 3 ml of a solution containing 50 mM ammonium bicarbonate (pH 8.3) at 37 °C overnight. The resulting peptides were fractionated by strong cation exchange chromatography as described previously (19). The fractionated peptides were desalted using 96-well format spin columns containing silica C18 matrix (Nested Group, Southborough, MA) according to the following procedure. First, the matrix was washed twice by filling each well with 200 µl of 0.4% acetic acid and centrifuging at 770 × g for 2 min. After loading the peptide samples (approximately 200 µl/well), the plate was incubated at room temperature for 30 min and then centrifuged as above to remove the buffer. The reversed phase C18-bound peptides were washed three times with 200 µl of 0.4% acetic acid and eluted with 200 µl of a solution containing 3 volumes of acetonitrile and 1 volume of 0.4% acetic acid. The eluates were vacuum-dried, and each peptide sample was resuspended in 10 µl of 0.4% acetic acid before analysis by LC-MS/MS.
Mass Spectrometric Analysis of Peptides
The desalted tryptic peptides were analyzed using an LCQ-DECA ion trap tandem mass spectrometer (Thermo Finnigan, San Jose, CA) coupled with a C18 trap ESI-emitter/micro-liquid chromatography column as described previously (20).
Computational Analysis of Mass Spectra
Tandem mass spectra were analyzed using the SEQUEST (21) algorithm to search against the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI) Halobacterium sp. NRC-1 proteome database (2). The searches were performed with a peptide mass tolerance of 3 daltons without enzyme specification. The other parameters were left as default. The SEQUEST outputs were consolidated to a single hypertext file and further analyzed using the proteomic data analysis pipeline (Institute for Systems Biology; www.systemsbiology.org). As part of this pipeline, the programs PeptideProphet and ProteinProphet (22, 23) assign probability values to each peptide and protein identification, respectively, that indicate the likelihood that the respective analyte has been identified correctly.
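To illustrate what a search "without enzyme specification" at a 3-dalton peptide mass tolerance implies, the following minimal sketch enumerates every contiguous subsequence of a protein whose mass falls within the tolerance of a precursor mass. It is not the SEQUEST algorithm, which also scores fragment ion spectra; the function names and the toy sequence are assumptions made for illustration only.

```python
# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide


def peptide_mass(sequence):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def candidate_peptides(protein, precursor_mass, tolerance=3.0):
    """All subsequences within the mass tolerance, with no cleavage rule."""
    hits = []
    for start in range(len(protein)):
        mass = WATER
        for end in range(start, len(protein)):
            mass += RESIDUE_MASS[protein[end]]
            if mass > precursor_mass + tolerance:
                break  # extending the peptide only makes it heavier
            if abs(mass - precursor_mass) <= tolerance:
                hits.append(protein[start:end + 1])
    return hits


# Hypothetical toy protein; the target is one of its internal peptides.
toy_protein = "MKTAYIAKQR"
hits = candidate_peptides(toy_protein, peptide_mass("TAYIAK"))
```

Because no cleavage rule restricts the candidates, a wide tolerance admits more sequences per spectrum, which is one reason the identifications are subsequently assigned probabilities.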
Biomodular Analysis of Proteomic Data
BMSorter is a bioinformatic tool developed to facilitate systems analysis of proteins detected by tandem mass spectrometry. It is a set of common gateway interface (CGI) scripts written in Perl that provide access to the analysis results via a web browser. The scripts require an environment with an Apache web server and the Perl modules Bioperl-1.4, CGI-3.04, and GD-2.16. BMSorter constructed a list of expressed proteins together with the interactions between the proteins. It acquired the protein list with identification probabilities from the ProteinProphet output file and the pathway information, in the form of a template of interactions, from the KEGG database. Proteins and interactions were integrated by cross-referencing the pathway maps from the KEGG PATHWAYS database with the enzyme/gene information from the KEGG LIGAND database. The tool set organized the proteomic data and KEGG pathway information in a summary table. Each row of the table summarizes one pathway (or biomodule) and contains the total number of reactions in the reference pathway, the number of Halobacterium sp. NRC-1-specific reactions, the percentage of NRC-1 reactions, the number of identified proteins with ProteinProphet probabilities (P) ≥ 0.9, and the percentage of identified proteins, plus hyperlinks to the pathway protein list tables and to the pathway maps with the ProteinProphet identification probabilities indicated.
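The structure of one summary-table row can be sketched as follows. This is a hedged illustration in Python and not the BMSorter Perl code: the class, field, and function names are hypothetical, the toy numbers are invented, and the percentage of identified proteins is assumed here to be taken relative to the proteins assigned to the pathway.

```python
from dataclasses import dataclass


@dataclass
class PathwayRow:
    """One row of a pathway summary table (hypothetical layout)."""
    name: str
    reference_reactions: int   # reactions in the reference pathway
    organism_reactions: int    # reactions assigned to the organism
    pathway_proteins: int      # organism proteins mapped to the pathway
    identified_proteins: int   # of those, proteins with P >= cutoff

    @property
    def percent_reactions(self):
        if self.reference_reactions == 0:
            return 0.0
        return 100.0 * self.organism_reactions / self.reference_reactions

    @property
    def percent_identified(self):
        if self.pathway_proteins == 0:
            return 0.0
        return 100.0 * self.identified_proteins / self.pathway_proteins


def summarize(name, reference_reactions, organism_reactions,
              proteins, probabilities, cutoff=0.9):
    """Count pathway proteins whose identification probability passes the cutoff."""
    identified = sum(1 for p in proteins if probabilities.get(p, 0.0) >= cutoff)
    return PathwayRow(name, reference_reactions, organism_reactions,
                      len(proteins), identified)


# Invented example: four pathway proteins, two identified above the cutoff.
row = summarize(
    "toy pathway", 40, 20,
    ["protA", "protB", "protC", "protD"],
    {"protA": 1.0, "protB": 0.95, "protC": 0.42},
)
```

Proteins absent from the probability table are treated as not identified, which matches the idea of overlaying identification results on a fixed pathway template.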
Plotting the Metabolic Network
The publicly available Cytoscape tool (www.cytoscape.org) was used to display the protein identification information on the amino acid metabolism and citrate cycle pathways in terms of enzyme-metabolite interaction networks (24). The Java programs HalKGMLtoSIF.java and HalnoaFactory.java were developed to process the KEGG mark-up language (KGML) files of the KEGG pathways and the proteomic data, respectively. The HalKGMLtoSIF.java program was used to parse the extensible markup language (XML) structure of the KGML files to generate the Cytoscape simple interaction format (SIF) and node attribute files. The HalnoaFactory.java program was used to define the node properties (protein names and ProteinProphet probability values) in the node attribute files.
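The KGML-to-SIF conversion described above can be sketched in a few lines. The example below is written in Python rather than Java and is not the HalKGMLtoSIF.java program; it assumes the usual KGML layout in which gene entries carry a reaction attribute and reaction elements list substrate and product children. The embedded sample document, the gene identifier hal:GENE_A, and the edge labels substrate_of and produces are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Minimal hand-written KGML-like sample; identifiers are illustrative only.
KGML_SAMPLE = """<pathway name="path:hal00020" org="hal" number="00020">
  <entry id="1" name="hal:GENE_A" type="gene" reaction="rn:R00351"/>
  <entry id="2" name="cpd:C00036" type="compound"/>
  <entry id="3" name="cpd:C00158" type="compound"/>
  <reaction id="1" name="rn:R00351" type="irreversible">
    <substrate id="2" name="cpd:C00036"/>
    <product id="3" name="cpd:C00158"/>
  </reaction>
</pathway>"""


def kgml_to_sif(kgml_text):
    """Return SIF lines linking metabolites to the enzymes of each reaction."""
    root = ET.fromstring(kgml_text)

    # Map each reaction identifier to the gene products that catalyze it.
    enzymes = {}
    for entry in root.findall("entry"):
        if entry.get("type") == "gene" and entry.get("reaction"):
            for reaction_id in entry.get("reaction").split():
                enzymes.setdefault(reaction_id, []).extend(entry.get("name").split())

    # Emit one tab-delimited "source <relation> target" line per edge.
    lines = []
    for reaction in root.findall("reaction"):
        for reaction_id in reaction.get("name").split():
            for gene in enzymes.get(reaction_id, []):
                for substrate in reaction.findall("substrate"):
                    lines.append(f"{substrate.get('name')}\tsubstrate_of\t{gene}")
                for product in reaction.findall("product"):
                    lines.append(f"{gene}\tproduces\t{product.get('name')}")
    return lines


sif_lines = kgml_to_sif(KGML_SAMPLE)
```

Each SIF line names a source node, a relationship type, and a target node, so enzymes and metabolites become alternating nodes in the displayed network.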
| RESULTS |
|---|
Protein Identification
The SEQUEST output data were analyzed with the PeptideProphet and ProteinProphet software to evaluate and assign the peptide and protein identification probabilities, respectively. A total of 888 proteins were identified with P between 0.9 and 1.0. Among these were 562 proteins with annotated functions, 189 conserved hypothetical proteins, and 137 hypothetical proteins of unknown function (Supplemental Tables I and II). The distributions of probabilities and functional categories of identified proteins are summarized in Fig. 1. A total of 681 and 147 proteins were identified with a probability of 1.0 and 0.99, respectively. The estimated number of false positive identifications among N proteins with a probability value of P is equal to N × (1.0 − P). The total number of false positive protein identifications was approximately 5.
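The false positive estimate can be worked through with the two probability bins reported above. In this sketch only the counts 681 and 147 and the total of 888 come from the text; the probabilities of the remaining 60 proteins are not given here, so the code computes only an upper bound for them by assuming the lowest admissible value, P = 0.9.

```python
def expected_false_positives(counts_by_probability):
    """Sum N x (1.0 - P) over bins of proteins sharing a probability P."""
    return sum(n * (1.0 - p) for p, n in counts_by_probability.items())


# Bins reported in the text: 681 proteins at P = 1.0 and 147 at P = 0.99.
reported_bins = {1.0: 681, 0.99: 147}
from_reported = expected_false_positives(reported_bins)

# The other proteins lie between 0.9 and 0.98; their exact values are not
# listed in this section, so P = 0.9 is assumed to obtain an upper bound.
remaining = 888 - sum(reported_bins.values())
upper_bound = from_reported + expected_false_positives({0.9: remaining})

print(f"from reported bins: {from_reported:.2f}")
print(f"upper bound for all 888 proteins: {upper_bound:.2f}")
```

The reported bins contribute about 1.5 expected false positives, and the bound for the whole set stays below 8.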
Transcription
The transcription of Halobacterium sp. NRC-1 genes is driven by a eukaryotic RNA polymerase II-like system. It consists of 12 subunits (A, B', B'', C, D, E', E'', H, K, L, M, and N) encoded by genes located at six loci. Except for the H and K subunits, all the subunits were identified by tandem mass spectrometry with P ≥ 0.9.
Transcription initiation in Archaea requires the binding of the TATA box-binding protein (TBP) and transcription factor B (TFB) to the promoter region (25-27). Genome analysis identified the presence of six tbp and seven tfb genes (2, 28, 29). Only two TATA box-binding proteins and one Tfb protein, namely the chromosomal TbpE and TfbG and the minichromosomal TbpB, were identified by mass spectrometric analysis (Supplemental Table I). Two peptides of TbpE (P = 1) were found a total of 14 times. The TbpB protein (P = 1), encoded by both pNRC100 and pNRC200, had a peptide identified once. The only detectable Tfb, TfbG (P = 1), had one peptide identified 10 times. The transcription initiation factor IIE α subunit (TfeA) had three peptides identified a total of 17 times. In addition, one peptide each from the termination-antitermination factors NusA and NusG was also identified.
Among the 27 transcription regulators predicted from the genome sequence (2), 12 were identified in this study (Supplemental Table I). These included the putative regulator ArcR, expressed from a gene in the arcRACB gene cluster on pNRC200 that also encodes arginine deiminase (ArcA), carbamate kinase (ArcC), and catabolic ornithine transcarbamylase (ArcB), all of which are required for fermentative growth using the arginine deiminase pathway (30). The phosphate transport system regulatory protein PhoU and the phosphate regulatory protein homolog Prp1, encoded by genes in the phosphate transporter phoU-pstB2A2C2-phoX gene cluster and the juxtaposed downstream gene prp1, were identified. The putative transcription regulators ArsC, CinR, SirR, Trh1, Trh4, Trh5, and Trh7 and Hox-like transcription regulators (Hlx1 and Hlx2) were also detected by mass spectrometry.
Translation
A total of 117 proteins have been predicted to be associated with translation in Halobacterium sp. NRC-1 (2). Among these, 92 (78.6%) were identified in this study (Supplemental Table I). These included the ribosome structural proteins and proteins involved in aminoacyl-tRNA synthesis, translation initiation, elongation, and release of synthesized polypeptides.
Except for the asparaginyl- and glutaminyl-tRNA synthetases, all other synthetase-coding genes have been found in the Halobacterium sp. NRC-1 genome, and these included the duplicated tryptophanyl-tRNA synthetase genes (trpS1 and trpS2) (2). Proteome analysis indicated the expression of almost all synthetases with the exception of the TrpS2 protein. In addition, the identification of amidotransferase subunits GatA, GatB1, GatB2, and GatC suggested that Halobacterium sp. NRC-1 could synthesize all necessary aminoacyl-tRNAs for protein synthesis.
All of the nine eukaryotic-like translation initiation factors eIF-1A (Eif1a1 and Eif1a2); eIF-2 subunits α (Eif2a), β (Eif2b), and γ (Eif2g); eIF-2B subunits α (Eif2ba) and δ (Eif2bd); ATP-dependent RNA helicase homologs eIF-4A (Eif4a) and eIF-5A (Eif5A); and bacterial-like initiation factor IF-2 (InfB) were identified. Also identified were translation elongation factors eEF1 subunits α (Eef1a) and β (Eef1b) and eEF-2 (Eef2) and peptide chain release factor eRF1 (Erf1).
The Halobacterium sp. NRC-1 genome encodes 32 large and 25 small ribosomal subunit protein-coding genes (2). Proteome analysis indicated the expression of 27 (84.4%) large subunit and 23 (92%) small subunit proteins. Ribosomal subunit proteins that were not identified were Rpl10e, Rpl24e, Rpl39e, Rpl40e, Rpl44e, Rps14p, and Rps27e.
Energy and Amino Acid Metabolism
Proteome analysis indicated the expression of an almost complete set of the predicted citric acid cycle enzymes and 48 of the 89 amino acid metabolic enzymes (Supplemental Table I). These included arginine deiminase (ArcA), ornithine transcarbamylase (ArcB), and carbamate kinase (ArcC), presumably involved in fermentative arginine degradation via the arginine deiminase pathway (30). There are 16 amino acid metabolic/degradation pathways listed in the KEGG database. Eleven pathways had three to nine proteins identified, whereas the alanine-aspartate metabolism and valine-leucine-isoleucine degradation pathways had 13 proteins, and the arginine-proline, glutamate, and glycine-serine-threonine metabolic pathways had 16 proteins identified by mass spectrometry analysis (Table I).
The amino acids that may be used in energy production were evaluated by systems analysis of the proteomic data on the amino acid metabolism pathways and citrate cycle (tricarboxylic acid cycle) networks (Fig. 3). The results suggest that up to eight amino acids may be involved in energy production. The amino acids aspartate, asparagine, proline, and arginine may enter the tricarboxylic acid cycle via oxaloacetate, fumarate, or oxoglutarate. Glutamate and glutamine, and glycine and serine may enter the cycle via oxoglutarate and oxaloacetate, respectively. Arginine may be used to produce ATP by the arginine deiminase pathway. Although a complete set of histidine utilization enzyme-coding genes (hutG, -H, -I, and -U) is present in the genome, none of the corresponding proteins were identified suggesting that histidine was not used to produce energy via the citrate cycle. The absence of detectable tryptophan biosynthesis enzymes TrpA, -B, -C, -D1, -E1, -F, and -G1 suggested that tryptophan was not being synthesized when the cells were harvested for proteome analysis. In addition, the interaction network suggested that the synthesis of the acidic amino acids aspartate and glutamate may occur to meet the demand for acidic amino acids that are needed as building blocks for the highly acidic proteome.
| DISCUSSION |
|---|
In this study, 888 proteins of the soluble proteome were identified with P ≥ 0.9. This can be compared with our previous study (17) that identified 255 proteins (P > 0.9) in the soluble proteome using only reversed phase C18 fractionation and gas phase fractionation of precursor ions at multiple m/z windows (20) over a total of 53 MS analyses; our new approach gave an approximately 3.5-fold increase in protein identification. Proteome analysis revealed the expression of more than one-third of the genes predicted from the Halobacterium sp. NRC-1 genome, which consists of 2,413 unique protein-coding genes. Of the 888 proteins identified at high confidence level (P > 0.9), 562 were expressed from genes with annotated functions, and 326 were expressed from genes of unknown function. Among the predicted proteins with defined functions, a significant fraction of those detected by mass spectrometry were involved in vital life processes, namely 48 (53.9%) of 89 amino acid metabolism; 38 (66.7%) of 57 nucleotide metabolism; 31 (56.4%) of 55 cofactor metabolism; 74 (66.7%) of 111 energy metabolism; 34 (57.6%) of 59 cell envelope components; 35 (29.2%) of 120 transport; 71 (52.2%) of 136 cellular processes; 28 (42.4%) of 66 DNA replication, repair, and recombination; 18 (51.4%) of 35 transcription; 12 (42.9%) of 28 regulation; and 92 (78.6%) of 117 translation-related proteins or enzymes.
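The coverage figures quoted above follow directly from the detected and predicted counts, as the short check below shows. The counts are taken from the text; the dictionary layout and the function name are only an illustrative choice.

```python
# (detected, predicted) protein counts per functional category, from the text.
COVERAGE = {
    "amino acid metabolism": (48, 89),
    "nucleotide metabolism": (38, 57),
    "cofactor metabolism": (31, 55),
    "energy metabolism": (74, 111),
    "cell envelope components": (34, 59),
    "transport": (35, 120),
    "cellular processes": (71, 136),
    "DNA replication, repair, and recombination": (28, 66),
    "transcription": (18, 35),
    "regulation": (12, 28),
    "translation": (92, 117),
}


def percent(detected, predicted):
    """Coverage as a percentage rounded to one decimal place."""
    return round(100.0 * detected / predicted, 1)


for category, (detected, predicted) in COVERAGE.items():
    print(f"{category}: {detected}/{predicted} = {percent(detected, predicted)}%")

# Overall coverage and fold increase relative to the earlier study.
overall = percent(888, 2413)
fold_increase = round(888 / 255, 1)
```

The overall value is the basis for the statement that more than one-third of the predicted genes were found to be expressed.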
Halobacterium sp. NRC-1 is an aerobic chemoorganotroph usually cultured in a complex medium in the laboratory (3, 18). A minimal medium described for Halobacterium species contains all but five (aspartate, asparagine, glutamine, histidine, and tryptophan) of the 20 amino acids (31). Genome analysis suggested that several amino acids, including arginine and aspartate, may enter the citric acid cycle via the metabolic intermediates 2-oxoglutarate and oxaloacetate, respectively, as a source of energy (2, 30). An integrative analysis of the mass spectrometry-identified proteins in the amino acid metabolism and citrate cycle network suggested that up to eight amino acids (arginine, aspartate, asparagine, glutamate, glutamine, glycine, proline, and serine), but not histidine, may be involved in energy production via the citrate cycle under laboratory culture conditions. Furthermore, it also suggested that aspartate and glutamate can be synthesized to meet the high demand for acidic amino acids during protein synthesis. Although network analysis suggested the presence of a complete tryptophan synthesis pathway (Fig. 3), the lack of evidence of expression of seven trp genes, aroC, psc, VNG1244C, and VNG1245C suggested that tryptophan was not synthesized under our culture conditions.
Halobacterium sp. NRC-1 lives in harsh environments, which may expose the cells to extended UV irradiation, periodic changes of temperature, fluctuation of nutrient supplies, etc. Proteome analysis revealed the basal level expression of the bacterial-type excision repair system proteins UvrA, UvrB, and UvrD; eukaryote-type excision-repair proteins Rad2, Rad3b, and RadA1; and the photolyase Phr1; these should allow the cells to repair UV damage and tolerate a high level of sunlight exposure (2, 32, 33). The mass spectrometry results also supported the expression of thermosome subunits A (CctA) and B (CctB) and heat shock proteins Hsp1, Hsp5, DnaJ, DnaK, GrpE, and Lon, consistent with a lifestyle in an extreme environment. In addition, proteome analysis demonstrated the presence of chemotaxis proteins CheA, CheB, CheC1, CheC2, CheR, CheW1, CheW2, and CheY and 11 signal transducers, Htr1 to Htr6, Htr10, and Htr12 to Htr15. The expression of these proteins suggests that the cells are equipped with the essential biomolecules for rapid adaptation to the dynamic conditions found in such extreme environments (Supplemental Table I).
An examination of this and three other proteome analysis results suggests that at least 40% of predicted proteins are expressed in Halobacterium sp. NRC-1 (2, 17, 34, 35). A total of 972 (P > 0.9) unique proteins have been identified in this and our previous (17) proteome analyses. Using a ProteinProphet probability of 0.9 as the cutoff, most of the proteins (89 of 162 membrane, 218 of 226 soluble, and 26 of 29 membrane/soluble fraction proteins) found in our previous analysis were also identified in this study. The proteome analysis of a closely related strain, H. salinarium, resulted in the identification of 802 proteins in the soluble proteome (34) and 141 proteins in purified membrane (35). The H. salinarium proteome analyses detected approximately 680 unique proteins classified as reliable or trusted identifications. Among these, 459 proteins were found in both strains at a high confidence level, suggesting that more proteins may be expressed in Halobacterium sp. NRC-1 under standard laboratory culture conditions.
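The combined total of 972 proteins can be reproduced from the overlap counts given above: the proteins of the previous analysis that were not found again are added to the 888 proteins of this study. All numbers come from the text; the variable names are illustrative.

```python
# Previous analysis (P > 0.9): (proteins found, of which re-identified here).
PREVIOUS = {
    "membrane": (162, 89),
    "soluble": (226, 218),
    "membrane/soluble": (29, 26),
}
THIS_STUDY = 888
PREDICTED_PROTEINS = 2413

# Proteins seen only in the previous analysis.
only_previous = sum(found - shared for found, shared in PREVIOUS.values())

# Union of both analyses and its share of the predicted proteome.
combined = THIS_STUDY + only_previous
percent_of_proteome = round(100.0 * combined / PREDICTED_PROTEINS, 1)

print(only_previous, combined, percent_of_proteome)
```

The result agrees with both the stated total of 972 unique proteins and the estimate that at least 40% of the predicted proteins are expressed.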
In addition to the expression level, at least two other factors, protein size and gene copy number, may affect protein identification in proteome analysis. Generally speaking, the number of tryptic peptides generated from each protein is proportional to its size. The larger the protein, the higher the number of tryptic peptides and thus the higher the probability that the protein will be identified by mass spectrometry. This may explain why the relatively small housekeeping large ribosome subunit proteins L24E (62 aa), L39E (50 aa), L40E (47 aa), and L44E (61 aa); small ribosome subunit proteins S14P (52 aa) and S27E (57 aa); and DNA-directed RNA polymerase subunits N (66 aa) and K (60 aa) were not identified by mass spectrometry. On the other hand, it may not be necessary to express more than one copy of duplicated genes to carry out a certain biochemical function. This may be why only one of the histidinol-phosphate transaminase and tryptophanyl-tRNA synthetase gene products, HisC1 and TrpS1, respectively, but not HisC2 and TrpS2, was detected by mass spectrometry.
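The dependence of peptide number on protein size can be made concrete with a simple in silico digest. The sketch below applies the common rule that trypsin cleaves after lysine or arginine except when the next residue is proline; the minimum peptide length of six residues is an assumed detectability cutoff, and the example sequence is invented.

```python
def tryptic_peptides(sequence, min_length=6):
    """Cleave after K or R (not before P) and keep peptides of usable length."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return [p for p in peptides if len(p) >= min_length]


# Invented sequence: a longer protein of similar composition yields
# proportionally more candidate peptides for detection.
short_protein = "AAAAAAKRPGGGGKTT"
long_protein = short_protein * 5

print(len(tryptic_peptides(short_protein)), len(tryptic_peptides(long_protein)))
```

A protein of only 50 to 60 residues offers few peptides that pass such a length filter, so missing a single spectrum can be enough to leave it unidentified.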
As in many other organisms with completed genome sequences, the functions of a significant fraction of the predicted genes have yet to be identified. In this study, we documented the identification of 888 proteins, among which 326 were translated from genes of no defined function. Tandem mass spectrometry is a very useful tool for detecting expressed proteins. The reliability of most identified proteins can be confirmed through examination of mass spectra. In our proteome studies, a suggested list of expressed proteins that contains more than 40% of the predicted proteome was established to facilitate the design of future experiments such as the systematic functional analysis of all predicted genes in the genome.
| FOOTNOTES |
|---|
The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Published, MCP Papers in Press, February 23, 2006, DOI 10.1074/mcp.M500367-MCP200
1 The abbreviations used are: KEGG, Kyoto Encyclopedia of Genes and Genomes; p, PeptideProphet probability; P, ProteinProphet probability; CGI, common gateway interface; SIF, simple interaction format; TBP, TATA box-binding protein; TFB, transcription factor B; aa, amino acids.
2 The software BMSorter, HalKGMLtoSIF.java, and HalnoaFactory.java are available upon request to wvng{at}ym.edu.tw.
* This work was supported in part by Grant NSC932314-B010014 from the National Science Council and an Aim for the Top University grant from the Ministry of Education in Taiwan (to W. V. N.).
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
|||| To whom correspondence should be addressed: Inst. of Biotechnology in Medicine, National Yang Ming University, 155 Li Nong St., Section 2, Taipei, Taiwan 112, Republic of China. Tel.: 886-2-2826-7321; Fax: 886-2-2826-4092; E-mail: wvng{at}ym.edu.tw