Molecular & Cellular Proteomics 2:1271-1283, 2003.
© 2003 by The American Society for Biochemistry and Molecular Biology, Inc.
From the Core Facility for Protein Analysis and Department of Molecular Biology, Max Planck Institute for Infection Biology, 10117 Berlin, and the Institute for Microbiology, Ernst Moritz Arndt University Greifswald, 17487 Greifswald, Germany
| ABSTRACT |
|---|
γ-glutamyltranspeptidase was demonstrated.
Half of the world population is infected by H. pylori (4), a human pathogen that resides in the stomach and is a causative agent of chronic inflammation of the gastric mucosa. It was estimated that about 10% of the infections lead to severe pathological consequences such as atrophic gastritis, gastric and duodenal ulcers, adenocarcinoma, or mucosa-associated lymphoid tissue lymphoma (5).
Immunologically relevant proteins have been sought among the complete cellular proteins (1), surface proteins (6, 7), and secreted proteins (8) and by immunoproteomics (9–13). These studies used classical proteomics combining high resolution two-dimensional electrophoresis and peptide mass fingerprinting. At present this technology seems to be limited to the identification of about 500–700 spots of the H. pylori pattern. To identify a higher percentage of the open reading frames at the proteome level, prefractionations (6, 8, 14), complementary technologies (15), or improvements in the sensitivity of mass spectrometry and 2-DE methods are promising. Another approach is to evaluate the data obtained by PMFs more comprehensively. The number of mass peaks usually detected clearly exceeds the number of peaks assigned to the main component of a 2-DE-separated spot. The question is whether the information content of the remaining peaks is sufficient to identify minor components of spots, which could contribute to a higher coverage of the proteome by the 2-DE/MALDI-MS approach. In a first report we demonstrated the application of a software program named MS-Screener together with cluster analysis starting with an H. pylori dataset of 480 PMFs obtained by manual spot picking, digestion, and peak detection (16).
Here we present data from a procedure with automated spot picking, digestion, peak detection, and database search. For further evaluation we applied a new version of MS-Screener comprising elimination of contaminants, detection of neighbor spot contamination, cluster analysis, and identification of minor components of a spot. The antigenic proteins detected in spots identified in former investigations (10, 11) were analyzed in detail to assure that the antigenicity is caused by the formerly identified protein and not by a minor component. As special cases a dimerization of alkyl hydroperoxide reductase (HP1563), the degradation pattern of GroEL, and the fragmentation of γ-glutamyltranspeptidase were elucidated.
| EXPERIMENTAL PROCEDURES |
|---|
Two-dimensional Electrophoresis
H. pylori lysate proteins were separated using a 23 × 30-cm high resolution gel system with a resolution power of up to 5,000 spots (1, 17). For the first dimension an ampholyte mix of pI 2–11 was used, no alkylation was performed, and the second dimension ranged from 5 to 130 kDa. For preparative gels 250 µg of protein were loaded, and the gels were stained with Coomassie Brilliant Blue (CBB) G-250 for 5 days (18).
In-gel Digest
In a large section of the 2-DE gel (20–80 kDa, whole pI range) 384 different spots with staining intensities ranging from very weak to very high were excised in triple replicates. For this purpose a spot cutter (Proteome Works™, Bio-Rad) with a picker head of 2-mm diameter was used. Cut spots were transferred into 96-well microtiter plates. The tryptic digest with subsequent spotting on a matrix-assisted laser desorption ionization target was carried out automatically with the Ettan™ spot-handling work station (Amersham Biosciences) using the following protocol. The gel pieces were washed twice with 100 µl of a solution of 50% CH3CN and 50% 50 mM NH4HCO3 for 30 min and washed once with 100 µl of 75% CH3CN for 10 min. After drying at 37 °C for 17 min 10 µl of trypsin solution containing 20 ng/µl trypsin (Promega, Madison, WI) was added and incubated at 37 °C for 120 min. For extraction, gel pieces were covered with 60 µl of 0.1% trifluoroacetic acid in 50% CH3CN and incubated for 30 min at 40 °C. The peptide-containing supernatant was transferred into a new microtiter plate, and the extraction was repeated with 40 µl of the same solution. The supernatants were dried at 40 °C for 220 min. The dry residue was dissolved in 3 µl of 0.5% trifluoroacetic acid in 50% CH3CN, and 0.4 µl of this solution was directly spotted onto the matrix-assisted laser desorption ionization target. Then 0.4 µl of a saturated α-cyano-4-hydroxycinnamic acid solution in 70% CH3CN was added and mixed with the sample by aspirating the mixture five times. The samples were allowed to dry on the target for 10–15 min before measurement in MALDI-TOF.
MALDI-TOF Mass Spectrometry
The MALDI-TOF measurement was carried out on the 4700 Proteomics Analyzer (Applied Biosystems, Foster City, CA). This instrument is designed for high throughput measurement, being automatically able to measure the samples, calibrate the spectra, and process the data using the 4700 Explorer™ software. The spectra were recorded in a mass range from 900 to 3,700 Da with a focus mass of 2,000 Da. For one main spectrum 20 subspectra with 100 shots/subspectrum were accumulated using a random search pattern. If the autolytic fragment of trypsin with the monoisotopic (M + H)+ m/z at 2,211.104 reached a signal-to-noise ratio (S/N) of at least 10, an internal calibration was automatically performed as one-point calibration using this peak. If the automatic mode failed, manual calibration was applied. After calibration peak lists were created by using the "peak-to-mascot" script of the 4700 Explorer™ software. Settings were a mass range from 900 to 3,500 Da, a peak density of 50 peaks/200 Da, a minimal area of 0, and a maximum of 200 peaks/spot. Three different peak lists were created for an S/N ratio of 5, 7, and 10, respectively. For confirmation of selected peaks MALDI-TOF/TOF spectra were recorded manually.
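The peak list settings above can be illustrated with a minimal sketch. It is not the "peak-to-mascot" script itself: the `Peak` type, the `make_peak_list` function, and the S/N values in the example spectrum are hypothetical, the peak density setting (50 peaks/200 Da) is left out, and keeping the highest-S/N peaks when more than 200 remain is an assumption.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    mz: float  # m/z of the detected peak
    sn: float  # signal-to-noise ratio reported for the peak


def make_peak_list(peaks: List[Peak], min_sn: float,
                   mz_min: float = 900.0, mz_max: float = 3500.0,
                   max_peaks: int = 200) -> List[Peak]:
    """Keep peaks inside the mass range whose S/N reaches min_sn."""
    kept = [p for p in peaks if p.sn >= min_sn and mz_min <= p.mz <= mz_max]
    # Assumption: when more than max_peaks remain, keep the highest-S/N ones.
    kept = sorted(kept, key=lambda p: p.sn, reverse=True)[:max_peaks]
    return sorted(kept, key=lambda p: p.mz)


# Illustrative spectrum; the S/N values are invented for this example.
spectrum = [Peak(850.4, 40.0), Peak(1045.6, 6.0), Peak(1500.8, 8.5),
            Peak(2211.104, 25.0), Peak(3600.0, 12.0)]

# One peak list per S/N threshold, mirroring the three lists described above.
peak_lists = {sn: make_peak_list(spectrum, sn) for sn in (5, 7, 10)}
for sn, plist in peak_lists.items():
    print(sn, [p.mz for p in plist])
```

Raising the S/N threshold shortens the list, which is the trade-off between sensitivity and noise that the three parallel peak lists are meant to cover.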
Database Searches
Identification of spots was done via batch mode using the Mascot protein identification system (Matrix Science Ltd., London, UK) in-house applying the recent H. pylori 26695 protein database downloaded from The Institute for Genomic Research (TIGR, www.tigr.org/). Optimal search parameters were 30 ppm peptide mass tolerance, fixed oxidation of methionine, and 1 missed trypsin cleavage. The criterion for reliable identification was a significant Mascot score >45 (p < 0.05) (19).
Data Analysis with MS-Screener
To realize an iterative data analysis (16) for large datasets we have developed a new MS-Screener version using the Java 2 standard edition 1.4.1 software development kit (J2SE1.4.1 SDK, java.sun.com/). This Java tool consists of 126 different program classes and was integrated in a user-friendly graphical user interface (GUI). To integrate a plot view the JFreeChart class library, version 0.9.8, was applied (www.jfree.org/jfreechart/index.html). The software runs under LINUX, Solaris, and Microsoft Windows and comprises a setup function for all operating systems. MS-Screener has the ability to import different ASCII file types like .pkm (GRAMS), .pkt, .txt (Data Explorer), and .dta (SEQUEST). Contaminant searches, calculation of half-decimal places, elimination of contaminants, screening of common masses, their rankings, and the generation of matrices for hierarchical agglomerative cluster analyses can now easily be carried out in one work set. To find common contaminants in the complete dataset a mass tolerance (interval width) of 30 ppm and a threshold of 5% were applied. Masses that exceeded this threshold were eliminated from the peak lists. To calculate the half-decimal place rule an absolute standard deviation of 0.12 Da was applied, and outlier masses were marked and extracted into a separate table. Another function of MS-Screener allowed us to generate binary or non-binary interval matrices, which include all intensity values of peak intervals as zero/one or real intensity counts, respectively. In the present study we used an interval width of 30 ppm, and about 1,600 intervals were calculated based on 384 spectra for one gel. Using these matrices hierarchical agglomerative cluster analyses were performed using the statistical programming environment R (www.r-project.org).
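The contaminant screening and the binary interval matrix can be sketched as follows. This is an illustration under stated assumptions, not the MS-Screener implementation (which is written in Java): all function names are hypothetical, the 30 ppm intervals are realized as fixed logarithmic bins, and a mass counts as a contaminant when its interval occurs in at least 5% of the spectra.

```python
import math
from collections import Counter


def interval_index(mz, width_ppm=30.0):
    """Index of the bin containing mz; every bin has a relative width of width_ppm."""
    return math.floor(math.log(mz) / math.log1p(width_ppm * 1e-6))


def find_contaminant_intervals(peak_lists, width_ppm=30.0, threshold=0.05):
    """Return the intervals that occur in at least `threshold` of all spectra."""
    counts = Counter()
    for peaks in peak_lists:
        # A set ensures each interval is counted once per spectrum.
        counts.update({interval_index(mz, width_ppm) for mz in peaks})
    min_spectra = threshold * len(peak_lists)
    return {idx for idx, n in counts.items() if n >= min_spectra}


def remove_contaminants(peak_lists, contaminants, width_ppm=30.0):
    """Drop every peak that falls into a contaminant interval."""
    return [[mz for mz in peaks
             if interval_index(mz, width_ppm) not in contaminants]
            for peaks in peak_lists]


def binary_interval_matrix(peak_lists, width_ppm=30.0):
    """Rows are spectra, columns are occupied intervals, entries are 0 or 1."""
    columns = sorted({interval_index(mz, width_ppm)
                      for peaks in peak_lists for mz in peaks})
    col_of = {idx: j for j, idx in enumerate(columns)}
    matrix = [[0] * len(columns) for _ in peak_lists]
    for i, peaks in enumerate(peak_lists):
        for mz in peaks:
            matrix[i][col_of[interval_index(mz, width_ppm)]] = 1
    return matrix, columns


# Toy dataset: 40 spectra sharing the trypsin peak plus one unique mass each.
spectra = [[2211.104, 1000.0 + 3 * i] for i in range(40)]
contaminants = find_contaminant_intervals(spectra)
cleaned = remove_contaminants(spectra, contaminants)
matrix, columns = binary_interval_matrix(spectra)
```

Fixed bins are simple but can split a mass that scatters around a bin boundary; a production tool would need to handle that case. The resulting matrix is the kind of input that a hierarchical agglomerative clustering routine can work on.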
| RESULTS |
|---|
The automated recording and one-point calibration (using the autolytic trypsin peptide m/z 2,211.104) of mass spectra yielded peak masses that allowed database searches with an optimal mass tolerance of 30 ppm. The error distribution showed a slight slope to negative errors for smaller masses. This could have been avoided by applying a two-point calibration. However, many spectra did not contain a second sufficiently intense autolytic peak of trypsin (e.g. m/z 1,045.6). The described error distribution remained stable over all measured spectra.
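To make the 30 ppm tolerance concrete, the following minimal sketch converts it into an absolute mass window at the calibration peak. The function names are illustrative, and the measured value 2,211.170 is invented for the example.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def tolerance_window(theoretical_mz, tol_ppm=30.0):
    """Absolute m/z window that corresponds to a ppm tolerance."""
    delta = theoretical_mz * tol_ppm * 1e-6
    return theoretical_mz - delta, theoretical_mz + delta


# At the trypsin autolysis peak, 30 ppm is roughly +/- 0.066 Da.
low, high = tolerance_window(2211.104)
example_error = ppm_error(2211.170, 2211.104)  # hypothetical measurement
print(round(low, 3), round(high, 3), round(example_error, 1))
```

Because the window scales with mass, the same 30 ppm corresponds to about ±0.03 Da at m/z 1,000 and about ±0.1 Da at m/z 3,500.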
An interesting finding was that possible doubly charged peaks occurred in rare cases of PMFs. Such peaks were conspicuous with regard to the half-decimal place rule, and isotopic patterns showed peak distances of m/z 0.5. For example, in spot 433d (HP0410) the peptide containing amino acids 31–49 (QQHNNTGESVELHFHYPIK) was found with high intensity as a singly charged peptide with m/z 2,278.1 and as a doubly charged peptide with m/z 1,139.5. This peptide contains three histidines as potential additional proton acceptors.
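The relation between the two observed signals can be checked with a short calculation. This is a sketch with illustrative function names; the proton mass and the 13C/12C mass difference are standard constants.

```python
PROTON = 1.007276       # mass of a proton in Da
C13_SPACING = 1.003355  # mass difference between 13C and 12C in Da


def mz_at_charge(singly_charged_mz, z):
    """m/z of the [M + zH]z+ ion, given the m/z of the [M + H]+ ion."""
    neutral_mass = singly_charged_mz - PROTON
    return (neutral_mass + z * PROTON) / z


def isotope_spacing(z):
    """Distance between neighboring isotope peaks for charge state z."""
    return C13_SPACING / z


doubly_charged = mz_at_charge(2278.1, 2)
print(round(doubly_charged, 2), round(isotope_spacing(2), 2))
```

The computed value of about 1,139.55 agrees with the reported peak at m/z 1,139.5, and the isotope spacing of about 0.5 matches the observed peak distances.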
To evaluate the reliability of the automated identifications we searched for contradictions in identifications of replicate spots. Here only nine spots (3%) were found to be differently identified with significant scores within the three replicates. When looking closer into these contradictions three spots were found to be imprecisely picked in a densely "populated" area. Two spots contained minor components caused by smearing from neighboring spots that were erroneously identified as major components. Furthermore, three spots contained mixtures of proteins where either one of them was identified as the major component. Only one spectrum was erroneously assigned to a protein.
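The search for contradictions among replicates can be sketched as follows. The data layout, the function names, and the spot and ORF labels are hypothetical; only the score criterion (Mascot score > 45) is taken from the text.

```python
MASCOT_THRESHOLD = 45  # significance criterion used in this study


def significant_ids(replicate_hits, threshold=MASCOT_THRESHOLD):
    """ORFs identified with a significant score in any replicate of a spot."""
    return {orf for orf, score in replicate_hits
            if orf is not None and score > threshold}


def find_contradictions(identifications):
    """Spots whose replicates yield more than one significant identification."""
    contradictions = {}
    for spot, hits in identifications.items():
        ids = significant_ids(hits)
        if len(ids) > 1:
            contradictions[spot] = ids
    return contradictions


def identified_in_at_least_one_gel(identifications):
    """Spots with a significant identification in one or more replicates."""
    return {spot for spot, hits in identifications.items()
            if significant_ids(hits)}


# Hypothetical top hits (ORF, Mascot score) for three replicate gels per spot.
example = {
    "spot_1": [("ORF_A", 80), ("ORF_A", 95), ("ORF_A", 60)],
    "spot_2": [("ORF_A", 70), ("ORF_B", 66), ("ORF_A", 50)],
    "spot_3": [(None, 0), ("ORF_C", 30), ("ORF_C", 52)],
}
print(find_contradictions(example))
print(sorted(identified_in_at_least_one_gel(example)))
```

Only the contradictory spots then need manual inspection, for example to decide whether imprecise picking or a protein mixture is the cause.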
Another way to evaluate the automated procedure was to compare identification data with manually produced data using a Voyager Elite (Applied Biosystems) (20). Sequence coverages and Mascot scores of 28 arbitrarily chosen spots with medium to low staining intensities were compared (Table I). For this purpose automatically acquired (S/N 7, contaminants removed) and manually measured peak lists (manual peak detection, contaminants also removed (16)) were compared using similar database searches with the Mascot protein identification system. Search parameters were similar apart from the peptide mass tolerances (automatic, 30 ppm; manual, 100 ppm) and possible methionine oxidations (automatic, fixed; manual, variable). For medium staining intensities both methods obtained comparable results. For the more interesting weakly stained spots, however, most spots showed more matched peaks, higher sequence coverages, and higher Mascot scores using the manual procedure.
A threshold of 5% (≥48 of 960 evaluable spectra) was used to define contaminant peaks in the dataset (Fig. 3 and Table II). Sixty-one masses were found to be contaminants; 12 of these were trypsin autolytic peaks, and 12 were matrix cluster peaks (α-cyano-4-hydroxycinnamic acid, Na+, and K+ clusters were outliers of the half-decimal place rule; cluster masses were calculated as described by Keller and Li (21)). Four peaks were unknown outliers of the half-decimal place rule, three peaks belong to the most intense peaks of GroEL, seven were erroneously labeled isotope peaks because of low peak intensities, and the remaining peaks were unknown. It is important to note that no keratin peaks were found. After removing the 5% most frequently occurring masses in the dataset the identification of spots was improved by 3% to 78% of all spots identified in at least one gel (using optimal parameters). A list of all identified spots is found in the supplemental table. As expected, removal of contaminants improved spot identification for most spots except for very intensely stained ones; three masses of GroEL were misleadingly defined as contaminants (Table II). Even though this most widely spread protein was identified in only 15 different spots (3.9% of spots), these three peptide masses occurred in more than 6% of the spectra.
subunit (HP0073, nine spots), and chaperone and heat shock protein GroEL (HP0010, 15 spots). On average we found 1.6 different protein species/ORF in our dataset. With 15 different spots GroEL occurred most frequently (Fig. 4, left). Interestingly, three groups of GroEL spots occurred in the 2-DE gels: a main spot group (five spots), one group with lower MW and more acidic pI (two spots), and one group with lower MW and more basic pI (six spots). Evidence was found that the second group is N-terminally truncated because one peptide found in the main spot (amino acids 13–20) was not found in the spectra of this spot group. The third group we assume to be C-terminally truncated GroEL protein species because seven peptides (comprising amino acids 425–522) were not found in the spectra of these spots. In-silico calculated and apparent (according to gel position) MWs and pIs were in good agreement (data not shown).
γ-glutamyltranspeptidase, Ggt, HP1118) were spots 347 and 494. By comparing the sequences covered by the PMFs (Fig. 6) these spots appeared to be two fragments of the protein whose sequences were mutually exclusive. In-silico MW and pI calculation of the protein fragments with assumed cleavage at amino acid 370 resulted in similar values compared with the spot positions. Spot 347 is positioned at pI/MW coordinates 9.0/40.0, and amino acids 1–370 were calculated to have 9.5/39.8. For spot 494 we found 6.7/20.0 according to position, and 6.3/21.0 was calculated for amino acids 371–567. A spot containing the whole protein (theoretical mass of 61.2 kDa) was not found in the gels. Therefore, we assume that the entire γ-glutamyltranspeptidase content of the cell is processed into two subunits.
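The in-silico MW calculation for fragments produced by a single cleavage can be sketched as below. The function name and the 20-residue toy sequence are placeholders (the HP1118 sequence is not reproduced here), the residue masses are standard average values, and the pI calculation is omitted.

```python
WATER = 18.01528  # average mass of H2O in Da

# Standard average residue masses in Da.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}


def fragment_mass(sequence, start, end):
    """Average mass of residues start..end (1-based, inclusive) as a free chain."""
    residues = sequence[start - 1:end]
    return sum(RESIDUE_MASS[aa] for aa in residues) + WATER


# Toy sequence and cleavage site, used only to show the bookkeeping.
toy_sequence = "MKRGSWLAVTQEHFDPYNIC"
cleavage_after = 12

whole = fragment_mass(toy_sequence, 1, len(toy_sequence))
n_fragment = fragment_mass(toy_sequence, 1, cleavage_after)
c_fragment = fragment_mass(toy_sequence, cleavage_after + 1, len(toy_sequence))
print(round(n_fragment, 1), round(c_fragment, 1), round(whole, 1))
```

Hydrolysis of one peptide bond adds one water, so the two fragment masses sum to the mass of the intact chain plus about 18 Da. With the real sequence the same function would be called for residues 1–370 and 371–567.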
Completion of the Proteome of H. pylori 26695
Another aspect of this study was the continuation of the proteome exploration of H. pylori. Here we identified 298 spots (78% of the spots measured), which represent 183 different ORFs. Twenty-four of these ORFs have not been identified before as compared with the dataset of our group to be published (Table V). Among these, four spot identifications conflict with the manual results presumably because of spot-picking tolerances in densely spotted areas or because spots contain protein mixtures.
| DISCUSSION |
|---|
To take the most advantage of such a dataset it was shown to be helpful to optimize the peak detection and identification parameters. Not too many possible modifications should be used because Mascot scores will fall with increasing amounts of possible peptide masses. Even more important, however, is to take advantage of the recording of replicate datasets, which will further improve the rate of identification considerably. Performing searches in triple replicates, we were able to achieve an identification rate of 75% for spots that were finally identified in at least one gel.
A good approach to assess the reliability of automatic identifications is the search for contradictions in identifications of replicate spots. Such differences may be caused by spot-picking tolerances, incidental differences in the automatic procedure, or unsuitable database search parameters. We found only 3% of spots to be inconsistently identified in the three replicate datasets. Many of these spots were positioned in densely spotted areas, and their inconsistent identification may therefore be more a problem of picking or of protein mixtures in spots than of erroneous database search results. Spots lying side by side and containing different proteins will merge into one another even when the protein concentration in the merging zone is below the detection limit of the staining. Small variances in spot picking can in such cases coincidentally cause different identifications for one spot. The same is true for spots that contain mixtures of proteins with similar concentrations. Only one of 298 identifications was incorrect, which shows that identification was highly reliable. Additionally, the use of the exclusive identification criterion of a significant Mascot score of 45 (for use of The Institute for Genomic Research (TIGR) H. pylori 26695 database, p < 0.05) appeared to be trustworthy. It was not necessary to consider sequence coverages or number of matched peptides. The fact that only 3% of spots were inconsistently identified also showed the high reproducibility of spot patterns in our 2-DE gels.
An important aspect of this study was the comparison of automatic and manual procedures of data generation and identification. We chose 28 exemplary spots, which were identified automatically as well as manually (Table I). By comparing these results it became evident that differences for spots of medium staining intensity were negligible, whereas differences between identifications of faint spots were noticeable. After removal of contaminant peaks (discussed below) manually generated spectra of faint spots contained on average more peaks, and also more peaks were matched to the given protein. The same holds true for sequence coverages and Mascot scores. These results were probably caused by the fact that the manual procedure could be adapted for individual spots with low protein contents. Automatic procedures cannot be adjusted to all the spots in a gel where protein contents differ by several orders of magnitude. Consequently, manual measurements and data analyses are still powerful means to investigate faint spots.
We have developed the software MS-Screener, which not only is able to improve identification but also can be used for exhaustive data analysis. The new user-friendly graphical user interface allows the import of data in the form of ASCII files, calculates and removes contaminants, calculates and plots half-decimal places, screens and ranks for certain masses in spectra, and enables the generation of intervalized peak intensity matrices for further statistical analyses (Fig. 2). This tool was successfully utilized to improve identification, to analyze protein distributions, and to find minor components especially in H. pylori antigens as discussed below.
The removal of contaminants resulted in an improvement of the identification rate by 3% to 78% of spectra that were identified at least in one of the replicate gels. Sixty-one masses were found in ≥5% of the 960 spectra and were therefore defined to be contaminants, i.e. peaks that were not specific for a certain spot (Table II and Fig. 3). Contaminant masses were assigned to be matrix clusters, trypsin autolytic products, or peptides from the most frequently found protein GroEL. Seven masses were erroneously labeled isotope peaks. For peaks with low intensity the peak-labeling algorithm of peak-to-Mascot picked the more intense second isotopic peak instead of the monoisotopic. In these cases the monoisotopic masses were found in other spectra and appear also in the contaminants list. Although the source of the other contaminant masses is unknown, no keratin peaks were found. In our previous study (16), in which in 480 manually acquired and analyzed PMFs 69 contaminant masses in the comparable mass range of 900–3,500 Da were found, 47 masses were assigned to keratins. In another recent study of 118 spectra (23), 71 contaminants in the range of 900–3,500 Da were found in ≥5% of spectra, and 53 of these were keratins. These results show that although a comparable number of contaminants were found the use of fully automated spot picking, digesting, and spotting can be highly efficient to avoid contaminations with keratin.
The fact that one spot does not contain one protein but rather one protein can be distributed in several spots in the form of different protein species is well illustrated by the heat shock protein GroEL. This most widely distributed protein in our dataset was identified in 15 different spots (Fig. 4, left). Within these spots evidence was found that two were N-terminally truncated and that six were C-terminally truncated. The reasons for the exact spot positions within these groups (modifications, differences in lengths of truncations, or conformational differences) were not figured out. A further nine spots were found to contain three of the five most intense peptide masses of GroEL (Table III) and may therefore most likely contain low amounts of GroEL. In six spots GroEL was a minor component because they were identified to contain a different protein, and in three spots (not identified) this protein may be a minor or major component. These findings raise the question as to whether minor components represent co-migrating proteins, e.g. by protein-protein interactions during electrophoresis or in vivo, or represent just contaminations. It is important to notice that the GroEL peptide distribution was not fully reproduced within the three replicates. This might be caused by differences among the gel runs, or it might be a consequence of the low GroEL content within these spots so that in some cases these peptides might have fallen below the detection limit. Another possibility was that the criterion to find at least three out of the five most intense GroEL peaks was not rigid enough and that not all of these spots truly contain this protein. According to the identification results on average 1.6 spots/ORF were found in our dataset.
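The screening criterion for GroEL as a possible minor component (at least three of the five most intense peptide masses present) can be sketched as below. The function names are illustrative, and the marker masses are placeholders, not the Table III values; the 30 ppm tolerance is carried over from the database search settings as an assumption.

```python
def count_marker_matches(spectrum, marker_masses, tol_ppm=30.0):
    """Number of marker masses that have a peak within tol_ppm in the spectrum."""
    matched = 0
    for marker in marker_masses:
        tolerance = marker * tol_ppm * 1e-6
        if any(abs(mz - marker) <= tolerance for mz in spectrum):
            matched += 1
    return matched


def may_contain_protein(spectrum, marker_masses, min_matches=3, tol_ppm=30.0):
    """True if at least min_matches marker masses occur in the spectrum."""
    return count_marker_matches(spectrum, marker_masses, tol_ppm) >= min_matches


# Placeholder marker masses; NOT the GroEL masses listed in Table III.
markers = [1000.50, 1200.60, 1500.75, 1800.90, 2100.05]

spectrum_hit = [950.47, 1000.51, 1500.76, 1650.80, 2100.04]  # three markers
spectrum_miss = [1000.51, 1200.70, 1700.85, 2100.04]         # two markers
print(may_contain_protein(spectrum_hit, markers),
      may_contain_protein(spectrum_miss, markers))
```

Raising `min_matches` makes the criterion more rigid, at the cost of missing spots in which some marker peptides fall below the detection limit.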
The protein alkyl hydroperoxide reductase (HP1563) was found in eight different spots. According to the position in the gel one spot had an apparent molecular weight that was double the weight of the main spot of the protein. Evidence was found that this spot contained dimers of the protein because cysteine-containing peptides were not found in the dimer spot (Fig. 5). This finding raised the question of whether these dimers exist in vivo or were artifacts of the two-dimensional gel electrophoresis. Artificial dimerization during the run of the second dimension can be ruled out because there was no smearing to be seen on the gels. As dimerization has little effect on the pI it could have taken place during the first dimension when the active concentration of the reducing agent dithiothreitol decreased. Alternatively, dimers may have been formed in vivo, and the concentration of dithiothreitol in the sample buffer was not sufficient to reduce all disulfide bonds because only a small part of the protein content of the main spot, according to the staining intensities, was found to be dimerized. The fact that no dimers were found from other proteins supports the idea that dimerization could have taken place in vivo. This finding is also verified by the fact that other members of the peroxiredoxin family form homodimers or even decamers (24).
The protein γ-glutamyltranspeptidase (HP1118) was identified in two distinct spots, which were positioned far apart. The PMF-covered amino acid sequences of these spots were exclusive; their combined apparent masses added up to the theoretical mass calculated from the ORF so that we concluded that two fragments occurred (Fig. 6). Although both spots were only weakly antigenic in our immunoblots (11) the protein is known to be a virulence- and apoptosis-inducing factor of H. pylori that occurs in the form of two fragments (25, 26). Additionally, it was hypothesized that the protein is membrane-associated (25), and here the first 36 amino acids were not covered in the PMFs so that a cleavage of a signal sequence might have occurred.
A certain protein can not only be found in several spots, but a spot can also contain several proteins in the form of protein mixtures (similar amounts of protein), as minor components, or in the form of neighbor spot contaminants. In immunoproteomics antibody recognition of proteins separated on 2-DE blots is detected. Because highly specific antibodies may recognize very small amounts of protein it cannot be ruled out that minor components of spots might be detected instead of the major component. Therefore, one has to be sensitive to the identification of such antigens. Here we closely investigated 24 known antigenic spots as to whether they contained minor components using MS-Screener and hierarchical clustering (Table IV). Nine spots possibly contain other components, six were supposedly free of such components, and a further nine contained unknown peaks. From the nine spots first mentioned seven contained peptide masses from known antigens. However, only three spots showed concurrent recognition of the spot and the main spot of its minor component in our immunoblots (11) and might therefore have had an influence on the antigenicity. Two of these spots contain major components that were also found in other "clean" antigenic spots. Consequently, only one protein (spot 278, protease HP1012) remains that could have erroneously been assigned to be antigenic.
As mentioned above, the antigenic protein GroEL was identified in 15 different spots, and three of the five most intense peaks were found in a further nine spots. Twenty-one of these spots were recognized conjointly in the immunoblots (see Fig. 4 for the 15 GroEL-identified spots). For these spots no evidence for differential antigenicities of different protein species was found.
In our recent study (11) we identified five different groups of patients by hierarchical clustering of immunoblot data. One criterion for the definition of two groups was the recognition of a spot cohort (spots 225, 226, 231, 232, 233, and 234), which was now identified to contain species of GroEL that are supposedly C-terminally truncated. Because GroEL is a highly conserved antigen and all of its known protein species were conjointly recognized by the sera of the patients, the biological relevance of these two patient groups remains unclear. Spot 154, which was a candidate for differential immunogenicity of different protein species of AtpA in the study mentioned above, was here identified to contain a mix of GroEL and AtpA. Several spots in this region (spots 154–157) lie side by side and contain either one of these proteins or mixtures of both so that in this case the identification that depends on spot assignment between immunoblots and 2-DE gels remains uncertain. A differentiation between GroEL and AtpA could be obtained by incubation of recombinant proteins with patient sera.
A dataset of 960 PMFs was used to compare automatic and manual data acquisition and investigate protein distributions in 2-DE gels. Large datasets can quickly be generated and identified with automatic procedures. For this purpose it is highly recommended to investigate replicate datasets to raise the rate of identification and improve reliability. Additionally, optimization of peak detection and database search parameters as well as calculation and removal of contaminants were shown to be advantageous. Manual measurements remain valuable, especially for faint spots where procedures can be adapted individually. In addition we confirmed that immunoproteomics is a powerful hypothesis-free approach to find antigen candidates given that spot identification is performed cautiously.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Published, MCP Papers in Press, September 29, 2003, DOI 10.1074/mcp.M300077-MCP200
1 The abbreviations used are: 2-DE, two-dimensional electrophoresis; MW, molecular weight; PMF, peptide mass fingerprint; S/N, signal-to-noise ratio; MALDI, matrix-assisted laser desorption ionization; MS, mass spectrometry; TOF, time-of-flight; TOF/TOF, tandem time-of-flight; ORF, open reading frame; CBB, Coomassie Brilliant Blue; CHAPS, 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonic acid.
* This work was supported by Grants 031U/107 and 031U/207 from the Bundesministerium für Bildung und Forschung of Germany. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
¶ To whom correspondence should be addressed: Max Planck Institute for Infection Biology, Campus Charité Mitte, Schumannstrasse 21/22, 10117 Berlin, Germany. Tel.: 49-30-450578167; Fax: 49-30-28460507; E-mail: krah{at}mpiib-berlin.mpg.de