Molecular & Cellular Proteomics 5:811-823, 2006.
© 2006 by The American Society for Biochemistry and Molecular Biology, Inc.

From the Department of Biotechnology, Genome Analysis Division, National Institute of Technology and Evaluation (NITE), 2-49-10 Nishihara, Shibuya, Tokyo 151-0066, Japan
| ABSTRACT |
|---|
Many microorganisms, in particular Archaea, live in extreme habitats characterized by high temperature, acidic pH, or high salt concentration, and many of them possess unusual cellular and molecular properties. They are collectively termed "extremophiles" and could potentially serve as valuable resources for novel biotechnological applications. Nonetheless there are few existing industrial applications in which either archaeal biomass or archaeal enzymes are used. This is partly due to the lack of data on the expression of individual genes predicted from genome analysis. Such data are best obtained by proteome analysis.
Aeropyrum pernix K1 is an aerobic hyperthermophilic crenarchaeon isolated in 1993 from a coastal solfataric thermal vent in Kodakara-jima Island of Kagoshima, Japan. It grows optimally at 90–95 °C (1). Many of the thermostable enzymes of this archaeon are expected to be useful for a variety of industrial applications. The complete genomic sequence of A. pernix K1 was established in 1999, and 2,700 ORFs were predicted from the sequence of nearly 1.67 Mb in size. The data were made available to the public through DDBJ/EMBL/GenBank™ as well as the "Database of the Genomes Analyzed at NITE" (DOGAN).1 About 1,600 of the predicted 2,700 ORFs were hypothetical (2). Moreover the number of predicted ORFs is much larger than those of other Archaea and bacteria with similar genome sizes, casting doubt over the authenticity of the predicted ORFs. Natale et al. (3) reannotated the A. pernix K1 genome using the Clusters of Orthologous Groups of Proteins database and reported the total number of its protein-coding genes to be 1,871. Similarly the current RefSeq contains an annotation reported by Pruitt et al. (4) in which 1,841 proteins were predicted in the A. pernix genome, and Guo et al. (5) re-evaluated the A. pernix K1 annotation and inferred a total of 1,610 ORFs as potential protein-coding genes. The confusion concerning the annotation of the A. pernix K1 genome is one of the factors that might have hindered widespread utilization of A. pernix K1 enzymes, many of which are expected to possess excellent thermostability.
There is an additional problem: according to the genomic and proteomic analyses performed to date, ATG is the most common initiation codon, and GTG and TTG are used in less than 10% of bacterial genes (6, 7). In contrast, of some 2,700 ORFs predicted in the genome of A. pernix K1, 43% were deduced to be initiated with ATG and 57% were deduced to be initiated with GTG, which differs greatly from other species. Furthermore in A. pernix K1, genes initiated with TTG were reported (8) even though TTG has not been reported as an initiation codon in other organisms.
The problems described above can only be experimentally clarified by performing proteome analysis. For this purpose, we adopted four methods to maximize the number of detected proteins. Consequently we were able to identify 704 proteins, including 19 that were derived from the genomic regions in which no ORFs were predicted previously (2). The results suggest at the same time that the number of predicted ORFs in the current version of DOGAN is largely overestimated due to the inclusion of ORFs for non-conserved hypothetical proteins with molecular mass of 10–20 kDa. Furthermore amino-terminal amino acid sequences of 134 proteins were determined from which we were able to establish that surprisingly TTG is the most predominant initiation codon in A. pernix K1.
| EXPERIMENTAL PROCEDURES |
|---|
Protein Preparation
Two-dimensional (2D)-PAGE and 1D-SDS-PAGE-LC-MS/MS
A. pernix K1 cells were suspended in an extraction buffer (67% acetic acid containing 33 mM MgCl2) and disrupted by sonication at 4 °C. Cell debris was removed by centrifugation, and 4 volumes of 20 mM DTT in acetone were added to the supernatant. The mixture was stored at −20 °C, and the protein precipitates were collected by centrifugation and dried.
MD-LC-MS/MS
A. pernix K1 cells were suspended in distilled water and lysed by homogenization in S-203 (AS ONE, Osaka, Japan) for 30 s on ice.
2D-PAGE
Protein Separation by 2D-PAGE
IEF was performed on either 180-mm IPG strips with the pH range of 3–10 (Amersham Biosciences) or IPG ReadyStrips with the pH range of 3–6 or 5–8 (Bio-Rad). Protein samples were dissolved in a lysis buffer containing 7 M urea, 2 M thiourea, 4% CHAPS, 50 mM DTT, 40 mM Tris, and 0.2% carrier ampholyte and incubated at room temperature for 1 h. The first dimensional separation was performed on an IPGphor IEF apparatus (Amersham Biosciences). IPG strips loaded with 100 µg of protein were electrofocused first at 200 V for 1 h, then at a linear gradient of 200–4,000 V for 6 h, and finally at 8,000 V to achieve a total of 60 kV-h. After IEF, the strips were equilibrated with an equilibration buffer containing 6 M urea, 30% glycerol, 2% SDS, 50 mM Tris-HCl (pH 6.8), and 1% DTT for 30 min. SDS-PAGE was then carried out on 12 or 16% polyacrylamide gels (20 × 20 × 0.1 cm). Proteins were visualized by staining with Coomassie Brilliant Blue R-250 (CBB) (Nacalai Tesque, Kyoto, Japan).
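The focusing program can be checked with a little arithmetic. The following Python sketch estimates how long the final 8,000 V step would need to run; it assumes that the 60 kV-h total includes the two earlier steps and that the linear gradient contributes its mean voltage, neither of which is stated explicitly in the text. The function name is hypothetical.

```python
def final_step_hours(target_vh=60_000.0):
    """Estimate the duration of the final 8,000 V step (assumptions noted above)."""
    step1_vh = 200.0 * 1.0                   # 200 V held for 1 h
    ramp_vh = (200.0 + 4000.0) / 2.0 * 6.0   # linear 200-4,000 V ramp over 6 h
    remaining_vh = target_vh - step1_vh - ramp_vh
    return remaining_vh / 8000.0             # hours at a constant 8,000 V


hours_at_8kv = final_step_hours()
```

Under these assumptions the first two steps contribute 12.8 kV-h, leaving roughly 47 kV-h for the final step.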
Radical-free and Highly Reducing (RFHR)-2D-PAGE
The method of Wada (9) was largely followed. Protein samples were dissolved in a lysis buffer containing 8 M urea and 0.2 M mercaptoethanol and incubated at 40 °C for 30 min. Sample charging electrophoresis was carried out with 100 µg of protein on an 8% polyacrylamide gel containing 8 M urea, 40 mM KOH, and 0.37% acetic acid at 100 V for 30 min on an NA1450 apparatus (Nihon Eido, Tokyo, Japan). Subsequently the first dimensional separation was performed on an 8% polyacrylamide gel containing 8 M urea, 400 mM Tris, 500 mM boric acid, and 21.5 mM EDTA-2Na at 100 V for 15 h on an NA1460 apparatus (Nihon Eido). The second dimensional separation was then carried out on an 18% polyacrylamide gel containing 8 M urea, 50 mM KOH, and 5% acetic acid (16 × 16 × 0.2 cm) at 100 V for 30 h. Proteins were visualized with CBB as described above.
Enzymatic Digestion for 2D-PAGE-MALDI-TOF MS
In-gel digestion with modified trypsin (sequencing grade, Promega, Madison, WI) and sample spotting for MALDI-TOF MS were performed with the Investigator ProPrep automatic digestion and spotting system (Genomic Solutions, Huntingdon, UK) according to the manufacturer's protocols with some modifications. The CBB-stained protein spots were excised from the gel and washed with 25 mM NH4HCO3 and acetonitrile at room temperature. The proteins were reduced with 10 mM DTT in 25 mM NH4HCO3 at 60 °C for 10 min and alkylated with 40 mM iodoacetamide in 25 mM NH4HCO3 at room temperature for 35 min. The dried gel pieces were rehydrated and incubated in 25 mM NH4HCO3 containing modified trypsin at 37 °C for 4 h. Formic acid (3%) was added to stop the enzymatic reaction, and the resultant peptides were concentrated, desalted by passing through a µ-C18 ZipTip (Millipore, Billerica, MA), mixed with a matrix solution of 50% acetonitrile saturated with α-cyano-4-hydroxycinnamic acid (Sigma), and air-dried on the target plate.
Mass Spectrometry of 2D-PAGE-MALDI-TOF MS
The resulting peptide mixture was subjected to analysis on an Auto-Flex instrument (Bruker Daltonics, Bremen, Germany) with α-cyano-4-hydroxycinnamic acid as the matrix and operated in the reflector mode. Calibration was performed in the external mode using a peptide calibration standard kit (Bruker Daltonics). For peptide assignment the mass spectrum data were analyzed using the MASCOT database search program (Matrix Science Ltd., London, UK) in the peptide mass fingerprinting mode against the database of putative proteins of A. pernix K1 containing the data for 2,694 ORFs as well as against the translation of the entire genomic sequence in all phases.
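Searching against "the translation of the entire genomic sequence in all phases" amounts to a six-frame translation of the genome. The Python sketch below illustrates the idea only; the function names are hypothetical, the standard codon assignments are assumed, and it is not meant to represent how MASCOT builds its search space internally.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3))


def six_frame_translate(genome):
    """Return translations of the three forward and three reverse frames."""
    forward = genome.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for label, strand in (("+", forward), ("-", reverse)):
        for offset in range(3):
            frames[f"{label}{offset + 1}"] = translate(strand[offset:])
    return frames


frames = six_frame_translate("ATGGCCTAA")  # toy sequence, not A. pernix data
```

Stop codons appear as `*` in each frame, so candidate peptides can be taken from the stretches between them.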
1D-SDS-PAGE-LC-MS/MS Analysis
Protein Separation by 1D-SDS-PAGE
Protein samples were dissolved in a lysis buffer containing 7 M urea, 2 M thiourea, 4% CHAPS, 50 mM DTT, and 40 mM Tris and incubated at room temperature for 1 h. Subsequently a 6× concentrated electrophoresis loading buffer containing 0.35 M Tris-HCl (pH 6.8), 10% SDS, 30% glycerol, and 9.3% DTT was added. 1D-SDS-PAGE was performed on 10% separating polyacrylamide gels with a Tris-Tricine running buffer containing 0.1 M Tris, 0.1 M Tricine, and 0.1% SDS.
Enzymatic Digestion for 1D-SDS-PAGE-LC-MS/MS
After CBB staining, the gels were sliced into 5-mm-thick pieces from the top band (>116 kDa) to the bottom line (4.4 kDa). In-gel digestion with modified trypsin was performed using Investigator ProPrep for 6 h and stopped with 3% formic acid. The resulting peptide mixtures were eluted from the gel and dried by evaporation. The peptides were diluted with 0.02% formic acid containing 0.005% heptafluorobutyric acid (HFBA) and 2% acetonitrile.
Mass Spectrometry of 1D-SDS-PAGE-LC-MS/MS
To analyze peptides, 2D-LC was combined with nano-ESI-MS/MS. The analysis was performed with a Finnigan LCQ DECA XP Plus ion trap mass spectrometer (ThermoElectron, San Jose, CA) coupled with an LC MAGIC 2002 system (Michrom Bioresources, Auburn, CA) through a nanoelectrospray ion source (AMR Inc., Tokyo, Japan) (10). The system was fitted with a strong cation exchange peptide trap column of 1.0-mm inner diameter and 8-mm length (Michrom Bioresources) for the first dimensional chromatography, a Peptide CapTrap (Michrom Bioresources) for desalting and concentration, a C18 reverse phase column (50-mm length and 0.2-mm inner diameter; Michrom Bioresources) for the second dimensional chromatography, and a Pico Tip (New Objective, Woburn, MA) as the electrosprayer. The solvents used for strong cation exchange were 0.02% formic acid containing 0.005% HFBA and 2% acetonitrile with either 0, 25, 50, 75, 100, 150, 250, or 500 mM HCOONH4 (used in eight steps). Desalting was performed with a mixture of 0.1% trifluoroacetic acid, 2% acetonitrile, and 98% water. For reverse phase chromatography, Buffer A (0.1% formic acid, 0.005% HFBA, 2% acetonitrile, and 98% water) and Buffer B (0.1% formic acid, 0.005% HFBA, 90% acetonitrile, and 10% water) were used to form a gradient of 5–65% of Buffer B in 20 min at a flow rate of 1 µl/min.
The mass spectrometer was operated in data-dependent MS/MS mode with dynamic exclusion over the m/z range of 450–2000, and the ions were selected for CID with automatic data-dependent settings. The MS/MS spectra were converted with SEQUEST™ Browser (ThermoElectron) into peak list files, which were then searched with the MASCOT database search program in the MS/MS mode against the A. pernix K1 genomic data. The criteria adopted for protein identification were either 1) that at least three peptides with ion score 20 or higher match or 2) that at least one peptide with ion score 40 or higher matches.
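The two identification criteria can be expressed as a simple predicate. The following Python sketch is an illustration only; the function name and the input format (a list of ion scores of the peptides matched to one protein) are assumptions, not part of the MASCOT output format.

```python
def is_identified(ion_scores):
    """Apply the two acceptance criteria to the peptides matched to one protein.

    ion_scores: ion scores of all peptides matched to the protein (assumed input).
    """
    n_score_20 = sum(1 for s in ion_scores if s >= 20)  # criterion 1 candidates
    n_score_40 = sum(1 for s in ion_scores if s >= 40)  # criterion 2 candidates
    return n_score_20 >= 3 or n_score_40 >= 1
```

For example, a protein matched by three peptides scoring 21, 25, and 30 passes criterion 1, whereas a protein matched by two peptides scoring 25 and 30 passes neither.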
MD-LC-MS/MS Analysis
Protein Separation by MD-LC
Proteins of a whole cell lysate were separated by off-line 2D-LC and 2D-LC-nano-ESI-MS/MS. The first dimensional chromatography was performed with a self-packed strong anion exchange (SAX) column prepared in a glass chromatography tube of 8-mm inner diameter and 100-mm length. Trimethylaminopropyl-bonded silica gel (BONDESIL-SAX, 40 µm, Varian, Palo Alto, CA) was used to fill the column. Buffers used were 20 mM Tris-HCl, pH 7.0 (Buffer C), and 20 mM Tris-HCl, pH 7.0, with 1 M NaCl (Buffer D). Proteins were eluted with Buffer C from 0 to 5 min, with a linear gradient of Buffer C to Buffer D from 6 to 25 min, and with Buffer D from 26 to 32 min. The flow rate was 2 ml/min, and the eluate was collected into eight 8-ml fractions. The fractions were concentrated to 0.5 ml by partial lyophilization (EYELA FD-81, Tokyo Rikakikai, Tokyo, Japan). The second dimensional chromatography was performed with a gel permeation chromatography (GPC) column (Bioassist G2SWXL, TOSOH, Tokyo, Japan). A 200-µl aliquot of each of the concentrated SAX fractions was successively injected into a column system consisting of a guard column (TOSOH) and two GPC columns. Elution was performed with Buffer E (0.1 M sodium phosphate, pH 7.0) at a flow rate of 0.5 ml/min, and fractions were collected every 3.5 min starting from 18 min until 60 min (12 fractions). In this way a total of 96 fractions (8 × 12) were obtained.
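The fraction scheme can be reproduced with a short calculation. In the Python sketch below, the GPC collection windows follow directly from the text; the SAX windows are an assumption derived from the flow rate (8 ml at 2 ml/min is 4 min per fraction, taken here to cover the whole 32-min run). The function name is hypothetical.

```python
def fraction_windows(start_min, stop_min, width_min):
    """Return consecutive (start, end) collection windows in minutes."""
    n = round((stop_min - start_min) / width_min)
    return [(start_min + i * width_min, start_min + (i + 1) * width_min)
            for i in range(n)]


# First dimension (SAX): assumed 4-min windows across the 32-min run.
sax_windows = fraction_windows(0, 32, 4)
# Second dimension (GPC): every 3.5 min from 18 to 60 min, as stated.
gpc_windows = fraction_windows(18, 60, 3.5)

total_fractions = len(sax_windows) * len(gpc_windows)
```

Each SAX fraction is run separately on the GPC columns, so the total is the product of the two counts.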
Enzymatic Digestion for MD-LC-MS/MS
Samples in each fraction were reduced by incubating in 5 mM DTT at 60 °C for 30 min, alkylated with 15 mM iodoacetamide in the dark and at room temperature for 30 min, and digested with modified trypsin at 37 °C for 1 h. The samples were adjusted to pH 4 with trifluoroacetic acid, desalted with a C18 reverse phase column, and evaporated. The digests were dissolved in a mixture of 0.02% formic acid, 0.005% HFBA, 2% acetonitrile, and 98% water prior to MS analysis.
In-column Enzymatic Digestion
Proteins retained on the SAX column were treated with 0.1% RapiGest (Waters, Milford, MA) in 5 mM DTT and incubated at 60 °C for 30 min. The proteins were alkylated and digested with modified trypsin at 37 °C for 15 h. The digests were eluted with a mixture of 0.1% trifluoroacetic acid, 5% methanol, and 94.9% water; desalted with a C18 reverse phase column; and evaporated. The digests were dissolved in 0.02% formic acid with 0.005% HFBA, 2% acetonitrile, and 98% water prior to MS analysis.
Amino-terminal Amino Acid Sequence Analysis
Protein spots on 2D-PAGE were electroblotted onto a PVDF membrane (Sequi-Blot PVDF membrane, Bio-Rad) with a semidry blotting apparatus (Bio Craft, Tokyo, Japan). The blotted membrane was stained with CBB. Singly stained spots were excised from the PVDF membrane and applied to a protein sequencer (model Procise 491cLC, Applied Biosystems, Foster City, CA) if the staining intensity appeared to be strong enough for sequencing. For weakly stained spots, two to six excised spots were combined by repeating 2D-PAGE and then applied to the protein sequencer. Edman reactions were performed according to the manufacturer's instructions. To identify each protein, the amino acid sequences obtained were compared with the predicted amino acid sequence data translated from the genomic sequence of A. pernix K1.
Miscellaneous
The genomic sequence data of A. pernix K1 along with the data for 2,694 annotated ORFs were downloaded from DOGAN (www.bio.nite.go.jp/dogan/Top). Additional A. pernix K1 data for 1,610 annotated ORFs were downloaded from the home page of Tianjin University BioInfomatics Centre (TUBIC) (tubic.tju.edu.cn/Aper/).
| RESULTS AND DISCUSSION |
|---|
Molecular Mass Distribution
The molecular mass distribution of proteins predicted from the genomic sequence of A. pernix K1 and those experimentally identified by 2D-PAGE, 1D-SDS-PAGE-LC-MS/MS, and MD-LC-MS/MS was compared in groups of 10 kDa up to and higher than 120 kDa (Fig. 5A). Although 49.4% of the proteins were predicted in the molecular mass range of 10–20 kDa in the genome analysis, only 22.9% were actually observed in the same range. Consequently the number of predicted proteins in this molecular mass range in the current version of DOGAN appears to be overrepresented. The average molecular mass of the proteins identified by 2D-PAGE was 32.4 kDa, whereas it was 34.8 kDa by 1D-SDS-PAGE-LC-MS/MS or MD-LC-MS/MS. With the latter two methods, it is possible to identify proteins with a larger molecular mass, whereas such is not the case with 2D-PAGE as the separation of larger proteins becomes poorer. In any event, about half of the proteins predicted from the genomic data in the 10–20-kDa range appear to be incorrectly assigned.
The five largest protein-coding ORFs of A. pernix K1 are APE0620, APE0609, APE0057, APE1340, and APE1213. Of them, the products of APE1340 and APE0609 were identified in our proteome analysis. The former has homology to the reverse gyrase of Pyrococcus furiosus that was recently experimentally proven to be necessary for the growth of this archaeon at high temperature (11). The APE0609 protein is similar to a surface layer protein of Staphylothermus marinus. The surface layer protein of S. marinus forms a complex with a protease that is likely to play a role in taking up external peptides and proteins. A protein similar to the protease of this complex is encoded by APE0607, which was identified in our analysis. This protein is likely to have a similar function in A. pernix.
The remaining proteins were not identified in our analysis most likely because they are membrane proteins as they appear to possess a transmembrane domain. APE1213 is a paralogue of APE0607, but its expression might be different from the latter. The function of APE0620 and APE0057 remains to be investigated.
Isoelectric Point Distribution
The pI values of the identified and predicted proteins were compared with each other in the pI range from 3 to 13 (Fig. 5B). The average pI values of identified proteins were 7.25 (2D-PAGE), 7.55 (1D-SDS-PAGE-LC-MS/MS), and 7.70 (MD-LC-MS/MS), whereas the value for the predicted proteins was calculated to be 8.68. In the pI range between 5 and 7, the proteins predicted from the genomic data are much fewer than those identified by proteome analysis, whereas proteins in the high pI range (>10) show an opposite distribution pattern. Also proteins identified by 2D-PAGE were much more likely to be distributed in the pI range of 5–7 than those identified by the other methods, although the reason for this is not clear.
Hydropathy Distribution
The GRAVY score indicates the hydrophilicity or hydrophobicity of a protein (12); it is calculated as the arithmetic mean of the hydropathy indices of all amino acids in a protein. About 70% of the predicted proteins were concentrated in the neutral range (−0.4 to 0.4). On the other hand, 85% of the experimentally identified proteins were distributed in the same range regardless of the identification method used. The results indicate, therefore, that a large portion of the proteins of A. pernix are in the neutral GRAVY score range (Fig. 5C). The averages of the GRAVY score for identified proteins were 0.15 (2D-PAGE), 0.13 (1D-SDS-PAGE-LC-MS/MS), and 0.16 (MD-LC-MS/MS), whereas that of the predicted proteins was 0.02.
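As an illustration of the calculation, the following Python sketch computes a GRAVY score as the mean hydropathy index over a sequence. It assumes the Kyte-Doolittle hydropathy scale, which is the conventional basis for GRAVY; the function name and the toy sequences are hypothetical and are not A. pernix data.

```python
# Kyte-Doolittle hydropathy indices (assumed scale for this sketch).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Mean hydropathy index of a protein sequence (one-letter codes)."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    return sum(HYDROPATHY[aa] for aa in protein) / len(protein)


def in_neutral_range(score, limit=0.4):
    """True if the score falls within the neutral range of -limit to +limit."""
    return -limit <= score <= limit
```

A positive score indicates a predominantly hydrophobic protein and a negative score a predominantly hydrophilic one.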
Protein Class Distribution
Proteins predicted from the genome of A. pernix K1 as well as those identified by 2D-PAGE, 1D-SDS-PAGE-LC-MS/MS, and MD-LC-MS/MS were grouped into six functional classes (Fig. 5D) and compared. More experimentally identified proteins were found to be categorized in "metabolism" and "genetic information processing" than those predicted from the genome analysis, whereas a distinctly large proportion of predicted proteins were categorized in "non-conserved hypothetical proteins." With respect to the distribution pattern of proteins in the six protein classes, differences among the methods used were marginal.
Codon Usage Pattern
It is known that a characteristic bias in codon usage exists in each species of organisms (13). To examine whether and to what extent differences in codon usages exist between the experimentally identified ORFs and the ORFs predicted from the A. pernix K1 genomic sequence, codon usages in individual ORFs were plotted against the categories of proteins described above, namely molecular mass, pI, hydropathy, and protein class.
An example is shown in Fig. 6: in A, the codon usage patterns of proteins categorized by their molecular mass are shown, and in B, similar patterns of proteins categorized by their protein class are shown. As described above, a large proportion of predicted proteins were classified in the molecular mass range of 10–20 kDa. Indeed many of them were found to deviate from the average use of TCC, whereas predicted proteins larger than 40 kDa appear to match well with those of experimentally confirmed proteins. Therefore, it seems that the usage patterns of various codons will serve as good tools to evaluate whether a particular ORF predicted from the genomic sequence is likely to be a true gene or not. Indeed this is one of the bases on which algorithms for the prediction of genes/ORFs in the genomic sequence data rely. A similar analysis was performed with respect to protein classes as shown in Fig. 6B. The patterns of experimentally identified versus predicted ORFs were found to be quite different when proteins categorized as "non-conserved hypothetical proteins" were analyzed. Interestingly such a clear difference shown in Fig. 6, A and B, was not observed when a similar analysis was performed with proteins categorized by their pI and hydropathy values.
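The kind of per-ORF codon usage value discussed here can be sketched in a few lines of Python. This is a minimal illustration under the assumption that usage is expressed as the fraction of all codons in an ORF; the function names and the toy sequence are hypothetical, and the exact normalization used for Fig. 6 is not specified in the text.

```python
from collections import Counter


def codon_usage(orf):
    """Return the relative frequency of each codon in an in-frame ORF."""
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    counts = Counter(orf[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def codon_fraction(orf, codon):
    """Fraction of the codons in an ORF that are the given codon (e.g. TCC)."""
    return codon_usage(orf).get(codon.upper(), 0.0)


usage = codon_usage("TTGTCCTCCGCC")  # toy ORF fragment
```

Comparing such per-ORF values against the average over experimentally confirmed ORFs gives a simple measure of how far a predicted ORF deviates from the typical pattern.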
An alternative approach was to omit the first dimensional separation and apply an enriched membrane protein fraction directly to 1D-SDS-PAGE-LC-MS/MS. Most of the bands on the 1D-SDS-PAGE gel consisted of multiple proteins, but HPLC coupled with ESI tandem mass spectrometry is powerful enough to analyze a mixture of the derived peptides, so conventional tryptic digestion of proteins followed by mass spectrometric analysis led to the identification of each protein in the mixture. This method is called the "shotgun method" (16), and it gives a considerable advantage in the characterization of membrane proteins because the separation is targeted at the peptides rather than the proteins, so that solubility problems that are often encountered with hydrophobic proteins can largely be alleviated. A disadvantage of the shotgun method is that intensive computational analysis of the entire data set is always required, and no information regarding the charge of the intact protein can be obtained.
MD-LC-MS/MS is a third alternative method we adopted in which proteins were separated by MD-LC (10, 17, 18), digested with a specific enzyme, and ionized with ESI, and then their mass spectra were measured. In regular MD-LC systems proteins and peptides are separated according to a variety of their properties, such as pI, relative molecular mass, and hydrophobicity (17). However, a disadvantage of the MD-LC-MS/MS method is that not all of the peptide fragments could be detected, and their quantity is low. Also the sensitivity of detection will progressively decrease as the number of fractions increases.
Comparison with the Results Obtained by Other Researchers
Guo et al. (5) examined the genomic data of A. pernix K1 and reported that 1,610 ORFs can be recognized as such (tubic.tju.edu.cn/Aper/). Therefore, their data were compared with the proteins experimentally identified. Of the 704 identified proteins, 692 were included as ORFs predicted by Guo et al. (5), but the remaining 12 were not. The molecular mass distribution of the proteins derived from TUBIC ORFs is very similar to the identified proteins as shown in Fig. 5A. However, their other characteristics slightly but significantly deviate from those of the proteins we experimentally characterized (Fig. 5, B–D).
Assignment of the Codons for Translation Initiation
Of the 134 proteins whose amino-terminal sequences were experimentally determined, 50 were found to possess Met at their amino terminus. By comparing the nucleotide sequences corresponding to the amino-terminal Met, seven of them were found to possess ATG, and 14 others contained GTG. In addition, to our surprise, 29 others were found to contain TTG at the position of the amino-terminal Met. Subsequently we looked for candidate initiation codons based on the amino-terminal amino acid sequence data of the remaining 84 proteins that did not possess Met at their amino terminus. With 80 of them, a putative initiation codon was found in their immediate upstream, i.e. 39 of them were with TTG, 29 were with ATG, and 12 were with GTG, respectively. Because A. pernix K1 possesses an ORF encoding a protein homologous to methionine aminopeptidase, we interpreted the results to indicate that the amino-terminal Met of these 80 proteins was removed post-translationally by the putative methionine aminopeptidase.
With the remaining four proteins, however, candidate initiation codons were not found in the immediate upstream. Of these, APE0079 has homology to S-adenosylmethionine decarboxylase proenzyme 2 that is known to be post-translationally processed into an α and a β chain. Similarly APE0521 has homology to a protease subunit of the proteasome of Methanococcus jannaschii, the amino-terminal region of which is likely to be processed. APE2072 has homology to the thermosome β subunit of Thermoplasma acidophilum. By a "shotgun" mass spectrometry analysis, a peptide containing the detected amino terminus of APE2072 as well as 13 others matching the upstream region were detected (data not shown). Therefore, the detected amino terminus of APE2072 is likely to be that of a processed protein, although the nature of the processing remains to be clarified further. The genomic nucleotide sequence present in the immediate upstream of APE2493 is ATA. However, because ATA is not likely to serve as a translational initiator, it may be that the amino-terminal amino acid sequence corresponding to APE2493 was similarly processed, although the processing has not been elucidated yet.
To summarize the data for translational initiation codons corresponding to the 130 sequences other than the four proteins mentioned above, TTG was found to be most frequent (52%), whereas ATG and GTG, respectively, were found in 28 and 20% of the cases. Of the 130 ORFs, six proteins were found to be derived from the region in which no ORFs were previously assigned. Of the remaining 124 ORFs, the initiation codons of 89 (72%) were different from the positions that were assigned previously (2).
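The percentages follow from combining the two groups of proteins described above: the 50 that retained the amino-terminal Met and the 80 from which it was presumably removed. The short Python tally below reproduces the arithmetic using only the counts given in the text; the variable names are illustrative.

```python
# Counts reported for proteins with Met at the amino terminus (50 in total).
with_met = {"ATG": 7, "GTG": 14, "TTG": 29}
# Counts for proteins whose amino-terminal Met was presumably removed (80 in total).
met_removed = {"ATG": 29, "GTG": 12, "TTG": 39}

totals = {codon: with_met[codon] + met_removed[codon] for codon in with_met}
n_proteins = sum(totals.values())
percent = {codon: round(100 * n / n_proteins) for codon, n in totals.items()}

# Share of the 124 previously annotated ORFs whose start position changed.
reassigned_percent = round(100 * 89 / 124)
```

The tally gives 68 TTG, 36 ATG, and 26 GTG starts among the 130 proteins, consistent with the reported 52, 28, and 20%.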
Characteristics of the Region Upstream of the Putative Initiation Codons
The mechanism of transcriptional initiation in Archaea has been speculated to be more closely related to that of eukaryotes (19, 20). However, three groups of transcription-associated proteins have been identified in Archaea: one group more similar to prokaryotes, another group more similar to eukaryotes, and a third group more similar to both prokaryotes and eukaryotes. Several homologues of bacterial transcriptional factors (21–24) have been identified in Archaea, and Tolstrup et al. (25) have shown that the translation process of internal genes of operons in Archaea was similar to that in bacteria.
To characterize the genomic regions likely to function in translational initiation in A. pernix K1, the nucleotide frequency of the sequences surrounding the ORFs for the 130 proteins mentioned above was analyzed according to the method of Xiu-Feng et al. (26). The ORFs were categorized into two groups: in Group 1 the ORFs in question are not well separated from their immediate upstream neighbor, whereas in Group 2 they are more than 50 bp away from each other. In the region preceding the 130 ORFs, there is a G box at position −10 upstream of the initiation codon with a typical sequence of GGTG regardless of the ORF category, whereas ORFs of Group 1 harbor in addition an AT box at the −42 position upstream of the initiation codon and a weak C box at the −35 position (Fig. 7, A and B).
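A positional nucleotide frequency analysis of this kind can be sketched as follows. This Python example is a simple per-position count over aligned upstream sequences and is not claimed to reproduce the method of Xiu-Feng et al. (26); the function name, the alignment convention (last base at position −1), and the toy sequences are assumptions.

```python
from collections import Counter


def upstream_frequencies(upstreams):
    """Per-position base frequencies for equal-length upstream sequences.

    Sequences are assumed to be aligned so that their last base sits at
    position -1, immediately before the initiation codon.
    """
    length = len(upstreams[0])
    if any(len(s) != length for s in upstreams):
        raise ValueError("all upstream sequences must have the same length")
    freqs = {}
    for i in range(length):
        counts = Counter(s[i].upper() for s in upstreams)
        position = i - length  # runs from -length to -1
        freqs[position] = {b: counts.get(b, 0) / len(upstreams) for b in "ACGT"}
    return freqs


toy = upstream_frequencies(["GGTG", "GGTG", "AGTG", "GCTG"])  # not real data
```

Positions at which one base dominates, such as a run of G-rich columns, would show up as a conserved box in a plot of these frequencies.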
For the experimental identification of translational initiation codons, a large scale amino-terminal sequencing of Synechocystis sp. strain PCC6803 was performed by Sazuka and Ohara (27, 28) in which amino-terminal sequences of 234 protein spots were analyzed. The initiation codons in Synechocystis sp. were thus identified, suggesting that ATG was most predominant (88%) followed by GTG (7%) and TTG (3%). A similar set of data for Aquifex aeolicus, Archaeoglobus fulgidus, Bacillus subtilis, Borrelia burgdorferi, Chlamydia trachomatis, Escherichia coli, Haemophilus influenzae, Helicobacter pylori, M. jannaschii, Methanobacterium thermoautotrophicum, Mycobacterium tuberculosis, Mycoplasma genitalium, Mycoplasma pneumoniae, Rickettsia prowazekii, Synechocystis sp., and Treponema pallidum was summarized by Rocha et al. (29) based on the genomic sequences of individual organisms as shown in Table II along with the data for Aeropyrum pernix K1, Corynebacterium efficiens, Pyrococcus horikoshii OT3, Staphylococcus aureus N315, Streptomyces avermitilis, and Sulfolobus tokodaii that were taken from DOGAN.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Published, January 17, 2006
Published, MCP Papers in Press, February 2, 2006, DOI 10.1074/mcp.M500312-MCP200
1 The abbreviations used are: DOGAN, Database of the Genomes Analyzed at NITE; NITE, National Institute of Technology and Evaluation; 2D, two-dimensional; 1D, one-dimensional; MD, multidimensional; SAX, strong anion exchange; GPC, gel permeation chromatography; HFBA, heptafluorobutyric acid; RFHR, radical-free and highly reducing; CBB, Coomassie Brilliant Blue R-250; Tricine, N-[2-hydroxy-1,1-bis(hydroxymethyl)ethyl]glycine; TUBIC, Tianjin University BioInfomatics Centre.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
To whom correspondence should be addressed. Tel.: 81-3-3481-1936; Fax: 81-3-3481-8951; E-mail: yamazaki-shuji@nite.go.jp
| REFERENCES |
|---|