Complementary Analysis of the Mycobacterium tuberculosis Proteome by Two-dimensional Electrophoresis and Isotope-coded Affinity Tag Technology *

Classical proteomics combined two-dimensional gel electrophoresis (2-DE) for the separation and quantification of proteins in a complex mixture with mass spectrometric identification of selected proteins. More recently, the combination of liquid chromatography (LC), stable isotope tagging, and tandem mass spectrometry (MS/MS) has emerged as an alternative quantitative proteomics technology. We have analyzed the proteome of Mycobacterium tuberculosis, a major human pathogen comprising about 4,000 genes, by (i) 2-DE and mass spectrometry (MS) and by (ii) the isotope-coded affinity tag (ICAT) reagent method and MS/MS. The data obtained by either technology were compared with respect to their selectivity for certain protein types and classes and with respect to the accuracy of quantification. Initial datasets of 60,000 peptide MS/MS spectra and 1,800 spots for the ICAT-LC/MS and 2-DE/MS methods, respectively, were reduced to 280 and 108 conclusively identified and quantified proteins, respectively. ICAT-LC/MS showed a clear bias for high Mr proteins and was complemented by the 2-DE/MS method, which showed a preference for low Mr proteins and also identified cysteine-free proteins that were transparent to the ICAT-LC/MS method. Relative quantification between two strains of the M. tuberculosis complex also revealed that the two technologies provide complementary quantitative information; whereas the ICAT-LC/MS method quantifies the sum of the protein species of one gene product, the 2-DE/MS method quantifies at the level of resolved protein species, including post-translationally modified and processed polypeptides. Our data indicate that different proteomic technologies applied to the same sample provide complementary types of information that contribute to a more complete understanding of the biological system studied.

Classical proteomics studies of Mycobacterium tuberculosis have combined two-dimensional electrophoresis (2-DE) with mass spectrometry (MS) and have revealed about 1,800 distinct spots separated by 2-DE (1). About 350 of these were identified, and the comparison of the protein patterns of virulent and attenuated strains identified several proteins that are being studied further as potential vaccine candidates (1,2). More than two million deaths/year and eight million new infections are caused by M. tuberculosis, the bacterium responsible for tuberculosis (3). Genomics, transcriptomics, and proteomics, which form the rational basis for developing new therapeutic and preventive strategies, have been applied to combat this disease. The complete genome of M. tuberculosis comprises about 4,000 genes, which were classified into six protein classes and 30 subclasses (4).
The smallest unit of the proteome, the protein species, is defined by its chemical structure (5). Therefore, each modification of a protein leads to a new protein species. These are successfully resolved by 2-DE if they differ by at least one charge or by at least several hundred daltons in mass. Quantification by 2-DE requires a high degree of pattern reproducibility, which is difficult to achieve in a multistep and parallel procedure.
To alleviate limitations of the 2-DE/MS method, internally standardized gel-free quantitative proteomics methods have more recently been developed. Of these the prototypical method is isotope-coded affinity tag (ICAT) reagent labeling and tandem mass spectrometry (MS/MS) (6). Proteins contained in two sample mixtures are covalently labeled with the isotopically light or heavy form of the ICAT reagents, respectively, and the samples are combined and proteolyzed. After purification of the labeled peptides via the affinity tag that is part of the reagents, they are analyzed by LC-MS/MS. The peptides eluting from the final chromatography step are subjected to data-dependent MS/MS, providing for peptide quantification based on the relative signal intensities of the heavy and light forms of a particular peptide detected in the MS scan and for peptide identification by MS/MS and sequence database searching. This approach has been successfully applied to a wide variety of biological samples (7-14). Higher mass accuracy and manual peptide selection without the time pressure from data-dependent procedures were achieved by combining ICAT labeling with MALDI quadrupole time-of-flight mass spectrometry (15). Furthermore, the labeling protocols have been optimized (16), resulting in a robust quantitative technology.
Here we compare the results obtained by analyzing a microbial pathogen comprising about 4,000 genes using the  (1), cellular proteins were dissolved in 9 M urea, 70 mM dithiothreitol, 2% carrier ampholytes Servalyte 2-4 (Serva, Heidelberg, Germany), and protease inhibitors (N-p-tosyl-L-lysine chloromethyl ketone, leupeptin, E64, pepstatin A; 25 μM) (1). Carrier ampholyte isoelectric focusing was combined with SDS-polyacrylamide gel electrophoresis (1). Gels measuring 23 cm × 30 cm were used. For analytical gels (gel width of 0.75 mm) stained with silver and for preparative gels (gel width of 1.5 mm) to be stained with Coomassie Brilliant Blue G-250, 60 μg and up to 600 μg of protein, respectively, were applied at the anodic side of the gel. The 2-DE technique used has a resolving power of 5,000 spots. Subtractive analysis was performed manually, and some of the clear intensity differences were quantified by the program TOPSPOT (available for download free of charge from www.mpiib-berlin.mpg.de/2D-PAGE/).
MALDI-MS-Proteins separated by 2-DE were identified by peptide mass fingerprinting after in-gel tryptic digestion as described previously (1). A Voyager Elite mass spectrometer (Perseptive Biosystems, Framingham, MA) was used with a mass accuracy of 30 ppm after internal calibration. Proteins were identified by MS-FIT or MASCOT (Matrix Science Ltd., London, United Kingdom, www.matrixscience.com) database searches. The identified proteins had a sequence coverage higher than 30%. For comparison with the ICAT results the dataset described earlier (1, 2) was used. Spot positions and identities with additional information and hyperlinks to sequence and pathway databases are available at www.mpiib-berlin.mpg.de/2D-PAGE/.
ICAT Reagent Labeling and Chromatography-Total cell extract was prepared for the two strains, and proteins were dissolved in buffer (6 M urea, 0.05% SDS, 5 mM Tris, pH 8.3, 5 mM EDTA) to obtain a protein concentration of 700 μg/200 μl and reduced with 5 mM tributylphosphine for 30 min at 37°C. After addition of 350 nmol ICAT reagent to each sample (~0.5 nmol of ICAT/μg of protein; final ICAT concentration, 1.75 mM) proteins were incubated for 90 min in the dark at room temperature with gentle stirring. After incubation, dithiothreitol was added to a final concentration of 10 mM to quench residual free reagent. Labeled samples were mixed, diluted 4-fold with water so that the final urea concentration was 1.5 M, and digested with 1:25 trypsin/protein for 5 h at 37°C. To remove the remaining ICAT reagent and other contaminants and to separate peptides, the resulting peptide mixture was combined, acidified to pH 3, and loaded on a 4.6 mm × 200 mm Polysulfoethyl A cation-exchange chromatography column (Poly LC Inc., Columbia, MD) with 5-μm particles and 300-Å pores at a flow rate of 800 μl/min. The column was washed with buffer A (5 mM KH2PO4, 25% acetonitrile, pH 3.0), and the peptides were eluted with buffer B (20 mM KH2PO4, 350 mM KCl, 25% acetonitrile, pH 3.0). The column was developed over a 50-min dual-step salt gradient.
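The labeling quantities quoted above are mutually consistent, which can be checked with a few lines of arithmetic. The following is a minimal sketch only; the function and variable names are illustrative and not part of the original protocol.

```python
# Consistency check of the labeling quantities stated in the protocol.
# All names here are illustrative; the numbers are taken from the text.

def concentration_mM(amount_nmol: float, volume_ul: float) -> float:
    """nmol per microliter is numerically equal to mmol per liter (mM)."""
    return amount_nmol / volume_ul

protein_ug = 700.0   # protein per sample
volume_ul = 200.0    # sample volume
icat_nmol = 350.0    # ICAT reagent added per sample
urea_M = 6.0         # urea concentration in the labeling buffer
dilution = 4.0       # dilution factor before digestion

icat_mM = concentration_mM(icat_nmol, volume_ul)   # final ICAT concentration
icat_per_ug = icat_nmol / protein_ug               # reagent-to-protein ratio
urea_after_dilution_M = urea_M / dilution          # urea during digestion

print(icat_mM, icat_per_ug, urea_after_dilution_M)
```

The three computed values correspond to the 1.75 mM, 0.5 nmol/μg, and 1.5 M figures given in the protocol.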
The collected fractions were dried by speed-vac and resuspended in 2× phosphate-buffered saline, pH 7.2, for avidin purification using a self-packed UltraLink monomeric avidin column (Pierce) with 400 μl of packed beads in a glass Pasteur pipette. The column was washed with water, and biotinylated peptides were eluted with 0.3% formic acid. Avidin column eluent was dried down by speed-vac, and the pellet was resuspended in reverse phase buffer A (5% acetic acid, 0.005% heptafluorobutyric acid). The biotinylated peptides were analyzed using reverse phase capillary chromatography (75 μm, 10-cm self-packed C18 column, Monitor, Column Engineering, Ontario, Canada) at a flow rate of 250 nl/min.
Ion Trap-MS-Peptide identification by collision-induced dissociation was carried out by data-dependent precursor ion selection using the dynamic exclusion option on a Finnigan LCQ ion trap mass spectrometer. The MS/MS spectra were searched against a protein sequence database (M. tuberculosis H37Rv, ftp://ftp.sanger.ac.uk/pub/tb/sequences/TB.pep; 3,924 entries) using the SEQUEST software tool, and the abundance ratios of isotopically labeled cysteinyl peptide pairs were calculated using the XPRESS program (14).
Only strictly tryptic cysteinyl peptides and peptides with an Xcorr score value higher than 3 were used for identification. The first 800 identifications from SEQUEST, sorted by Xcorr, were verified by spot check using MASCOT. Also, quantification of the first 800 cysteine-containing peptides was inspected manually. The SEQUEST results and the manual changes were implemented and stored in a relational MySQL database, and a Web interface was developed to ask multicriteria questions for intelligent data searches (www.mpiib-berlin.mpg.de/2D-PAGE/, functional classification tool). The database entries fulfilling these search criteria are reported by a browser such as Netscape or Internet Explorer.
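The acceptance criteria stated above can be sketched as a simple filter. This is an illustration under stated assumptions, not the software used in the study: the record layout (`PeptideHit`) is hypothetical, the three criteria are assumed to be combined with a logical AND, and the rule that trypsin does not cleave before proline is ignored for brevity.

```python
from dataclasses import dataclass

@dataclass
class PeptideHit:
    """Hypothetical record for one SEQUEST peptide assignment."""
    sequence: str      # peptide sequence, one-letter code
    prev_residue: str  # residue preceding the peptide in the protein, '-' at the N terminus
    xcorr: float       # SEQUEST cross-correlation score

def is_strictly_tryptic(hit: PeptideHit) -> bool:
    """Both termini must result from cleavage after lysine or arginine."""
    n_term_ok = hit.prev_residue in ("K", "R", "-")
    c_term_ok = hit.sequence[-1] in ("K", "R")
    return n_term_ok and c_term_ok

def passes_filter(hit: PeptideHit, min_xcorr: float = 3.0) -> bool:
    """Cysteinyl, strictly tryptic, and Xcorr above the threshold (assumed AND)."""
    return "C" in hit.sequence and is_strictly_tryptic(hit) and hit.xcorr > min_xcorr

# Invented example hits, for illustration only.
hits = [
    PeptideHit("ACDEFK", "R", 3.5),  # accepted
    PeptideHit("ADEFK", "K", 4.0),   # rejected: no cysteine
    PeptideHit("ACDEFG", "K", 3.8),  # rejected: C terminus is not K or R
    PeptideHit("ACDEFR", "G", 4.2),  # rejected: N-terminal cleavage is not tryptic
    PeptideHit("MCDEK", "-", 2.9),   # rejected: Xcorr not above 3
]
accepted = [h.sequence for h in hits if passes_filter(h)]
print(accepted)
```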

RESULTS
Data Amount and Reduction-The M. tuberculosis H37Rv genome has been predicted to comprise about 4,000 genes (4). However, it cannot be expected that proteins representing every predicted gene are present in a bacterial culture grown under specific conditions. From M. tuberculosis H37Rv cells in late exponential phase about 1,800 distinct protein spots were resolved on silver-stained 2-DE gels (1). Analysis of the same samples by ICAT-LC/MS identified about 60,000 peptides. Both datasets contained redundancies. Because of post-translational modifications and processing, many genes are represented on 2-DE gels by multiple spots. In the ICAT experiment a particular peptide may occur in different chromatographic fractions and will therefore be sequenced multiple times. To obtain comparable datasets data reduction was necessary for both approaches. The data analysis flow chart is shown in Fig. 1.
The criterion to have comparable datasets was the accessibility of data. In 2-DE gels the best accessible proteins are within the high intensity spots, which resulted in high identification scores. In ICAT-LC/MS, where the abundance cannot be directly measured, the accessibility is also determined by scoring factors. Therefore, as a comparable dataset the most intense spots from a 2-DE gel and the most reliably identified proteins from the ICAT analysis were chosen. The 560 spots initially identified by 2-DE/MS and present in both strains were listed in order of decreasing silver-staining intensity. Of these the 160 most intense spots were chosen for further analysis. After removing redundancies caused by post-translational modifications, 108 unique proteins constituted the final 2-DE dataset that was used throughout the study (Table I).
The primary ICAT-LC/MS data contained numerous cysteine-free peptides and peptides missing lysines or arginines at their C terminus. These primary data were reduced to 2,000 cysteinyl peptides with double tryptic termini. The 800 peptides with the highest SEQUEST scores were further manually evaluated, resulting in 280 uniquely identified proteins (Table  II). This final ICAT-MS dataset was used throughout the study. Within the 108 and 280 proteins identified by 2-DE/MS and ICAT-LC/MS, respectively, 27 were common to both (Fig. 2). For peptide quantification, the same 800 spectra were evaluated manually, and if necessary, quantification calculated by the XPRESS program was corrected manually by adjusting the scan intervals, the tolerance, and the predicted shift of light and heavy peptide forms.
All the identified proteins were inserted into a 2D-PAGE database (www.mpiib-berlin.mpg.de/2D-PAGE). Within this database the position of the identified spots on 2-DE gels, including M r and pI values, MS data, and other characteristics, is available. Links to sequence, classification, and pathway databases provide further information. The relational structure of the database is constructed to allow flexible data mining procedures (18).
Protein Class Biases of 2-DE/MS and ICAT-LC/MS Proteomics-The genome of M. tuberculosis was classified into six functional classes (Fig. 3) (4) and further subdivided into 30 subclasses comprising 80 protein families. Three of these protein families (respiration, IS elements, and PE family) were further classified into subfamilies. To identify protein classes that were over- or underrepresented in the two datasets for either method the ratio of the number of identified proteins of each class and the total number of proteins was calculated. These ratios were compared with the ratio of the total number of members of each family and the total number of predicted proteins (normalized ratio).
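The comparison described above amounts to contrasting an observed fraction with the fraction expected from the genome. A minimal sketch follows; the function name and the example counts are hypothetical and chosen only to illustrate the calculation (3,924 is the number of database entries and 280 the size of the ICAT dataset quoted in the text).

```python
def class_representation(identified_in_class: int, identified_total: int,
                         predicted_in_class: int, predicted_total: int):
    """Return (observed fraction, normalized fraction, observed/normalized).

    A quotient above 1 suggests overrepresentation of the class in the
    dataset; a quotient below 1 suggests underrepresentation.
    """
    observed = identified_in_class / identified_total
    normalized = predicted_in_class / predicted_total
    return observed, normalized, observed / normalized

# Hypothetical class: 28 of 280 identified proteins, 200 of 3,924 predicted proteins.
observed, normalized, quotient = class_representation(28, 280, 200, 3924)
print(f"observed {observed:.3f}, normalized {normalized:.3f}, quotient {quotient:.2f}")
```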

represented (Fig. 3). Both 2-DE/MS and ICAT-LC/MS approaches revealed similar numbers/class. However, at the resolution of the protein subclass level (Fig. 4), the ICAT-LC/MS method compared with the 2-DE/MS method preferentially identified proteins of the subclasses "central intermediary metabolism," "polyketide and non-ribosomal peptide synthesis," "degradation of macromolecules," "cell envelope," and "transport/binding proteins." In contrast, 2-DE preferred relative to the ICAT method the protein subclasses "lipid biosynthesis," "synthesis and modification of macromolecules," "chaperones/heat-shock proteins," "protein and peptide secretion," and "adaptations and atypical conditions." Both methods overestimated relative to the normalized ratio the subclasses "degradation," "energy metabolism," "amino acid biosynthesis," and "lipid biosynthesis and detoxification" and underestimated "cell envelope," "transport/binding proteins," and "virulence." None of the proteins in the subclasses "cell division," "IS elements, repeated sequences, and phage," "PE and PPE families," "cytochrome P450 enzymes," "cyclases," and "chelatases" were experimentally verified by either of the two methods within the reduced datasets. With the restricted number of experimentally identified proteins a more detailed analysis at the protein family or subfamily level was not considered. These data suggest that either method as executed in this study was significantly biased in favor of highly abundant proteins.
Molecular Mass Bias-To estimate the bias of either method for the identification of proteins of a particular mass range the percentages of proteins predicted from the genome of M. tuberculosis H37Rv and of proteins identified by 2-DE/MS and ICAT-LC/MS were sorted by Mr in bins of 10-kDa width and into an additional bin for proteins >100 kDa (Fig. 5a). With 61.7% most of the proteins were predicted in the Mr range between 10 and 40 kDa. 
In the range between 10 and 60 kDa 3,300 of 3,924 (84%) proteins were predicted. With respect to the Mr of identified proteins the 2-DE and ICAT methods were complementary. For the low mass range (10-30 kDa) proteins identified by 2-DE/MS were clearly overrepresented, whereas for all the mass ranges >30 kDa proteins identified by ICAT-LC/MS were overrepresented. The overrepresentation was most pronounced for proteins >100 kDa relative to both 2-DE/MS and normalized percentages. Both methods underestimated proteins with an Mr <10 kDa, and furthermore, 2-DE gave a poor representation of proteins with Mr >60 kDa.
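The binning used for the molecular mass comparison (10-kDa bins plus one bin for proteins above 100 kDa) can be sketched as follows. The function names and the example masses are illustrative assumptions; how the original analysis treated values falling exactly on a bin boundary is not stated, so the convention below is a choice made for this sketch.

```python
from collections import Counter

def mass_bin(mr_kda: float, width: int = 10, top: int = 100) -> str:
    """Assign a molecular mass (kDa) to a 10-kDa bin or to the '>100' overflow bin."""
    if mr_kda > top:
        return f">{top}"
    # Clamp so that a mass of exactly 100 kDa falls into the 90-100 bin.
    index = min(int(mr_kda // width), top // width - 1)
    lower = index * width
    return f"{lower}-{lower + width}"

def bin_percentages(masses_kda):
    """Percentage of proteins per mass bin."""
    counts = Counter(mass_bin(m) for m in masses_kda)
    total = len(masses_kda)
    return {label: 100.0 * n / total for label, n in counts.items()}

# Invented masses, for illustration only.
example = bin_percentages([5, 15, 15, 150])
print(example)
```

Applying the same function to the predicted proteome and to each experimental dataset gives percentage distributions that can be compared bin by bin.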
Isoelectric Point Biases-To study biases of the two methods for certain pI ranges, the pI range from 3 to 13 was divided into bins of 1 pI unit, and proteins were assigned to these bins based on the calculated pI value of the unmodified polypeptide chain (Fig. 5b). The pI values were calculated by summing the pI values of the single amino acids and dividing them by the number of the amino acids (19). The relative distribution of the normalized mycobacterial proteins showed a low number of proteins in the pI range of 7-8, an observation that has also been described for several other organisms (20-22). Both methods overrepresented proteins between pI 4 and 6, with 2-DE/MS showing stronger representation for pI range 4-5 and ICAT-LC/MS for pI range 5-6. More than 50% of all proteins identified by either the ICAT-LC/MS or 2-DE/MS method had a calculated pI between 5 and 6, and more than 70% had a calculated pI between 5 and 7. Proteins of pI ranges >6 were underrepresented by both methods, and proteins of predicted pI values >11 were not detected at all.
Hydropathy Bias-Hydrophobic proteins often cause problems for protein analysis due to insolubility in commonly used solvents and loss by binding to surfaces. A measure for hydrophilicity and hydrophobicity is the grand average hydropathy (GRAVY) score, which is calculated as the arithmetic mean of the hydropathic indices of all amino acids in a protein (23). This index was used to compare the number of normalized proteins and those identified by ICAT-LC/MS and 2-DE/MS in specific hydropathic ranges (Fig. 5c). Fifty-nine percent of the normalized mycobacterial proteins have a hydrophobic character, and 41% have a hydrophilic character. The hydropathy distribution has the shape of a Gaussian curve with a maximum of more than 50% of the
Quantification of Differences in Protein Abundance between M. tuberculosis H37Rv and M. bovis BCG-Absolute quantification of proteins is a difficult task and at best achieved by amino acid composition analysis (24). Therefore, most proteomic experiments attempt to obtain relative quantitative information by determining the abundance ratios of proteins present in two samples. In the 2-DE/MS method this is achieved by comparing optical densities of matched spots in 2-DE gels, and in the ICAT-LC/MS method it is achieved by calculating the ratios of signal intensities for pairs of differentially isotopically labeled peptides. Comparing virulent with attenuated strains of the M. tuberculosis complex with the classical proteomics approach, we detected 32 spots only present in the virulent strains and obvious intensity differences by visual inspection (1,2). Reproducibility of sample handling is improved by the ICAT-LC/MS approach because aliquots from two biological samples are mixed directly after labeling and treated identically throughout the whole analysis procedure, the two samples thus serving as mutual internal standards.
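For reference, the GRAVY score used in the hydropathy comparison above is simply the mean hydropathic index over all residues of a sequence. The sketch below assumes that the hydropathic indices are those of the widely used Kyte-Doolittle scale; whether this is exactly the scale behind reference 23 is an assumption, and the function name is illustrative.

```python
# Kyte-Doolittle hydropathic indices (assumed scale, one-letter amino acid code).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean hydropathic index of all residues."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[residue] for residue in sequence.upper()) / len(sequence)

# Short invented sequences, for illustration only.
print(round(gravy("AR"), 2))    # one hydrophobic and one strongly hydrophilic residue
print(round(gravy("LIVA"), 3))  # predominantly hydrophobic residues
```

Positive scores indicate a more hydrophobic, negative scores a more hydrophilic average character.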
A higher number of proteins of different abundance in the two strains were detected by ICAT-LC/MS. Proteins showing intensity differences by more than 3-fold were extracted from Table II and are summarized in Table III. After quantification by the XPRESS tool, the calculated ratios were manually verified for the 280 proteins extracted, and if necessary, they were corrected. This analysis revealed three proteins as unique for the virulent H37Rv strain. These three proteins are potential vaccine candidates. Two of them, Rv0223c and Rv1513, were already predicted by genomic analyses (25) as belonging to regions deleted in BCG strains. Rv1513 is a conserved hypothetical protein of unknown function and is specific to M. tuberculosis but absent from the wild type M. bovis genome, as are genes of the flanking clusters of Rv1506c and Rv1516c. Even though a considerable number of flanking genes are annotated as "conserved hypotheticals," this gene cluster may be involved in carbohydrate metabolism because Rv1516c is a sugar transferase homologue and because gmdA, epiA, and Rv1520, which code for a GDP-mannose 4,6-dehydratase, a nucleotide-sugar epimerase, and another putative sugar transferase, respectively, are part of the same cluster. Rv0223c is a probable aldehyde dehydrogenase that is present in the genome of M. bovis wild type but not of the leprosy agent Mycobacterium leprae, a mycobacterial species with large genetic deletions when compared with M. tuberculosis (26). The other ± variant, Rv0570, was newly detected by ICAT-LC/MS and is a ribonucleotide reductase present in the genome of wild type M. bovis but absent from M. leprae. Furthermore, 20 proteins were increased and 23 decreased by more than 3-fold in M. tuberculosis H37Rv compared with M. bovis BCG (Table III). 
The genes of these proteins, of which only a small subset was also detected by the 2-DE/MS approach, are present in BCG, although probably down-regulated or not induced by so far unknown regulatory mechanisms active in M. tuberculosis under the same culture conditions. Furthermore, because this analysis was performed with mycobacterial samples from one type of growth condition, i.e. logarithmic growth phase in 7H9 fully complemented medium, the proteins down- or up-regulated in BCG in comparison to M. tuberculosis may not be controlled in the same manner in either of the two mycobacterial strains. The proteins up-regulated in BCG in comparison with M. tuberculosis, although present in the latter strain, are of various functional categories. Of particular interest were the proteins coded by genes Rv2935 and Rv2940c, which belong to a cluster of genes involved in glycolipid synthesis including synthesis of polyketides and mycocerosates, and Rv1527c, which is located in a cluster of genes involved in polyketide and lipid synthesis. These data suggest that under the specific growth conditions used in this study the lipid synthesis pathway in BCG is regulated differently than that of M. tuberculosis. As a certain mycocerosate and its synthesis machinery have been implicated in virulence in M. tuberculosis, these data are of potential clinical importance (27).
(Table III), we sought to explain this observation. One reason may be that the number of identified proteins is low in comparison to the expressed proteins in M. tuberculosis (Fig. 2) so that the chance of finding numerous proteins common to both datasets is low. To search for further reasons we analyzed the 27 proteins both approaches identified as quantitatively regulated. In 2-DE gels the spots of these 27 proteins were quantified with the help of the evaluation software TOPSPOT. 
The intensity values of spots from gels derived from four independent sample preparations for each mycobacterial strain were averaged. Accepting a divergence of 30%, only eight of the 27 proteins showed the same intensity difference detected by 2-DE and ICAT-LC/MS (Table  IV). One example is shown in Fig. 6. Enoyl-CoA hydratase (Rv0905) showed an intensity relationship of 1:1.12 and 1:1. the peptides of the protein covered 62% of the peptide mass fingerprint. All of the five most intense peaks belonged to Rv0905. The remaining peaks comprised contamination from matrix, trypsin, or stain. No additional protein reached an acceptable scoring factor, suggesting that Rv0905 represents the only protein in this spot (28). These data indicate that for the simple situation encountered for Rv0905, i.e. probable single spot pattern and no co-migrating proteins, the two methods provided comparable results.
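The 30% acceptance criterion can be expressed as a relative difference between the two ratios. The text does not state how the divergence was computed, so the definition below (difference relative to the larger of the two ratios) is an assumption, as are the function name and the example values other than the 1:1.12 ratio quoted for Rv0905.

```python
def ratios_agree(ratio_2de: float, ratio_icat: float, tolerance: float = 0.30) -> bool:
    """True if two abundance ratios differ by no more than `tolerance`,
    measured relative to the larger ratio (assumed definition)."""
    larger = max(ratio_2de, ratio_icat)
    return abs(ratio_2de - ratio_icat) / larger <= tolerance

# Ratios expressed as H37Rv relative to BCG (BCG = 1).
print(ratios_agree(1.12, 1.0))  # small divergence, counted as agreement
print(ratios_agree(1.0, 0.5))   # 50% divergence, counted as disagreement
```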

Comparison of Quantification by ICAT-LC/MS and 2-DE
More complex situations resulted in a divergence of the ratios obtained by the two methods. Four spots contained more than one protein. Here, only ICAT-LC/MS allowed us to distinguish which one(s) of the proteins present in the spot changed their abundance. Therefore, quantification by 2-DE of spots comprising a mixture of two or more proteins is not useful unless stable isotope-tagging methods are also used (17). One spot with an intensity ratio of 1:1 between BCG and H37Rv contained two proteins, phosphoenolpyruvate carboxykinase (Rv0211) and an ATPase of ATP:ADP antiporter family (Rv2115c) (Fig. 7). ICAT-LC/MS analysis revealed a ratio of 1:0.68 for Rv0211 and 1:1.22 for Rv2115c, clearly distinguishing between these two proteins. In these four cases only ICAT-LC/MS allowed us to quantify the protein amount. For five genes, the corresponding proteins were detected in different spots. In these cases, each protein species could be independently quantified by the 2-DE/MS method, whereas the ICAT-LC/MS method quantified the sum of the protein species as a single protein. Succinyl-CoA synthase α chain (Rv0952) was identified in three 2-DE spots (Fig. 8). Each of them had a different intensity relationship between BCG and H37Rv of 1:0.41, 1:1, and −/+. ICAT-LC/MS did not distinguish between these three protein species and calculated a 1:0.7 relationship. Adding up the intensities of all 2-DE spots containing Rv0952, a ratio of 1:1 was calculated between BCG and H37Rv, approaching the ICAT-LC/MS result. These cases exemplify the potential of 2-DE to separate proteins at the protein species level, a capability that is usually not provided by the ICAT-LC/MS method. This is of particular importance if differential post-translational modification leads to different gel electrophoretic mobility. 
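The difference between quantifying individual protein species and quantifying their sum can be illustrated with a small calculation. The spot intensities below are invented so that the per-spot ratios match those quoted for Rv0952 (1:0.41, 1:1, and −/+); they are not measured values, the function names are illustrative, and −/+ is assumed to mean absent in BCG and present in H37Rv.

```python
from typing import List, Optional, Tuple

Spot = Tuple[float, float]  # (BCG intensity, H37Rv intensity)

def spot_ratios(spots: List[Spot]) -> List[Optional[float]]:
    """H37Rv/BCG ratio per spot; None where the spot is absent in BCG."""
    return [h37rv / bcg if bcg > 0 else None for bcg, h37rv in spots]

def summed_ratio(spots: List[Spot]) -> float:
    """H37Rv/BCG ratio after adding up all spots of one gene product."""
    total_bcg = sum(bcg for bcg, _ in spots)
    total_h37rv = sum(h37rv for _, h37rv in spots)
    return total_h37rv / total_bcg

# Hypothetical intensities for three protein species of one gene product.
rv0952_spots = [(100.0, 41.0), (50.0, 50.0), (0.0, 59.0)]

print(spot_ratios(rv0952_spots))   # species-level view, as resolved by 2-DE
print(summed_ratio(rv0952_spots))  # gene-product-level view, comparable to ICAT-LC/MS
```

The species-level ratios differ widely although the summed ratio is 1:1, which is the situation in which the two methods report different but complementary numbers.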
For a further 10 proteins for which the ratios determined by 2-DE/MS and ICAT-LC/MS differed, the reasons for the discrepancy are presently unknown. It is likely that additional data, e.g. the identification of additional spots representing a particular gene product or the detection of additional proteins in specific 2-DE gel spots, will help to clarify the apparent discrepancies. These data clearly show that the 2-DE/MS and ICAT-LC/MS cannot be expected to and in fact do not provide identical quantitative results. Only in cases where a protein exists as a single protein species or where all protein species are represented by one spot containing only one protein will quantification in both methods be comparable. This was only true for 22% of the cases investigated.

DISCUSSION
As was shown for the ribosomal proteome the resolution of a complete proteome is feasible provided the complexity is not too high (5). Problems with dynamic range of protein concentration, sensitivity, and resolution increase with the complexity of the samples to be analyzed. In addition, intrinsic properties of certain classes of proteins can make their separation and/or detection difficult. Limitations for the analysis of basic and hydrophobic proteins by 2-DE have been described previously (5,20). The presence of cysteine in the protein is a prerequisite for ICAT analysis as proteins lacking cysteines cannot be quantified. From the mycobacterial genome 81% of the proteins contain at least one cysteine. Including the limitation by the MS measurement to peptides between Mr 500 and 4,000 the percentage of accessible proteins is reduced to 75%. Cysteine content in the protein subclasses chaperones/heat-shock proteins and PE and PPE families is significantly reduced (4). Only five of 16 predicted chaperones/heat-shock proteins contain at least one cysteine. 
Heat-shock proteins like DnaK, GroEL2, GroES, and HspX are represented as intense spot series by 2-DE but are not present within the ICAT-labeled proteins. Instead, LC/MS peptides of these proteins were found with high scores within the unlabeled fraction of the 60,000 peptides. Fig. 4 illustrates the lack of the subclasses polyketide and non-ribosomal peptide synthesis, central intermediary metabolism, and degradation of macromolecules by 2-DE/MS analysis, whereas by ICAT-LC/MS all of these three subclasses were overrepresented. The subclass polyketide and non-ribosomal peptide synthesis contained 41 proteins, and 21 of them had an Mr >100,000. This may be the reason for absence within the 2-DE patterns and overrepresentation in ICAT-LC/MS. The other two protein subclasses contain only one protein with an Mr >100,000, leaving the reason unresolved as to why these subclasses were not represented by 2-DE/MS.
Membrane proteins were also underrepresented by the ICAT-LC/MS approach, although an earlier report described the analysis of 491 microsomal proteins and therefore showed compatibility of the method with the analysis of poorly soluble proteins (14). The underrepresentation of membrane proteins in this study is therefore most likely explained by the protein solubilization conditions used. The underrepresented protein classes "other" and "unknowns" could contain falsely assigned genes. We have shown by 2-DE/MS (29) that gene prediction programs have neglected six genes in M. tuberculosis. The only way to avoid falsely predicted genes is to accept a gene as such only after obtaining evidence for its existence at the protein level. Genes in the genome lists should be annotated by a comment "confirmed at the protein level" with reference.
Proteins can only be accurately quantified by 2-DE/MS if the spot intensity is in the linear range of the staining method used and if the spot consists only of one protein species. Furthermore, we show that quantitative results obtained by the two methods cannot be compared directly. quantitative proteomic methods, and the optimal use of either method depends on the biological objectives.
It should be noted that both methods are undergoing significant improvements. To avoid the quantification problems caused by non-parallel separation and identification in the classical 2-DE/MS proteomics an ICAT-2-DE/MS strategy was developed (17). The advantages of protein species separation are combined with parallel quantification and quantification of several proteins within one spot. ICAT-2-DE/MS will not allow the quantification of large proteins by the parallel process and needs again to be complemented by ICAT-LC/MS. Fluorescence labeling by difference gel electrophoresis (30) replaces optical density measurements with fluorescence measurements, thereby increasing the dynamic range of optical measurements. Experience will show whether the labeling process with difference gel electrophoresis, ICAT, or others can be performed with 100% yield for all proteins. Possibly even these methods will have to be regarded as complementary because of the different amino acids that are labeled. The ICAT-LC/MS method has been advanced by the development of second generation isotope-tagging reagents that are characterized by the use of 12C/13C as the stable isotope, resulting in the elimination of the chromatographic shift of deuterated tags, and by the introduction of an acid-cleavable linker that leads to improved MS/MS spectral quality due to the reduced size of the isotope tag. Furthermore, advanced software tools for the analysis of LC-MS/MS data have been developed that significantly accelerate the pace and increase the consistency of data analysis (31,32). Collectively, these recent technical advances suggest that the performance of quantitative proteomic technologies is rapidly approaching the stage at which the routine and complete analysis of cellular proteomes will become reality. 
* This work was supported by European Union Grant QLK2CT200001536; Bundesministerium für Bildung und Forschung (Germany) Grant 031U107A/207A; NHLBI, National Institutes of Health Contract N01-HV-28179; and NCI, National Institutes of Health Grant 1 R33 CA93302-01. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.