Chromatographic Isolation of Methionine-containing Peptides for Gel-free Proteome Analysis

A novel gel-free proteomic technology was used to identify more than 800 proteins from 50 million Escherichia coli K12 cells in a single analysis. A peptide mixture is first obtained from a total unfractionated cell lysate, and only the methionine-containing peptides are isolated and identified by mass spectrometry and database searching. The sorting procedure is based on the concept of diagonal chromatography but adapted for highly complex mixtures. Statistical analysis predicts that we have identified more than 40% of the expressed proteome, including soluble and membrane-bound proteins. In addition to highly abundant proteins, we also detected low copy number components such as the E. coli lactose operon repressor, illustrating the high dynamic range. The method is about 100 times more sensitive than two-dimensional gel-based methods and is fully automated. Its strongest point, however, is the flexibility of the peptide sorting chemistry, which may extend the technique to quantitative proteomics of virtually every class of peptides containing modifiable amino acids, such as phosphopeptides and amino-terminal peptides, adding a new dimension to future proteome research.

a narrow-bore reverse-phase ZORBAX® 300SB-C18 column (2.1-mm inner diameter × 150 mm; Agilent Technologies, Waldbronn, Germany) coupled to an Agilent 1100 Series Capillary LC system under the control of the Agilent ChemStation software modules. Following injection of the sample, a solvent gradient was developed at a constant flow rate of 80 µl/min. First, the column was rinsed for 10 min with 0.1% trifluoroacetic acid in water (Baker HPLC analyzed, Mallinckrodt Baker B.V., Deventer, The Netherlands) (solvent A), followed by a linear gradient over 100 min to 100% solvent B (70% acetonitrile (Baker HPLC analyzed) in 0.1% trifluoroacetic acid), i.e. an increase of 1% solvent B/min. We refer to this reverse-phase HPLC separation as the primary run. Starting at 40 min (corresponding to 30% solvent B), peptides were collected in a microtiter plate in 48 fractions of 1 min (80 µl) each using the Agilent 1100 Series fraction collector. Fractions separated by 12 min were pooled as described in Table I and dried in a centrifugal vacuum concentrator. The dried fractions were redissolved in 70 µl of 1% trifluoroacetic acid in water and placed in the Agilent 1100 Series Well plate sampler. Methionine oxidation was carried out in the injector compartment by transferring 14 µl of a freshly prepared aqueous 3% H2O2 solution to the vial containing the peptide mixture. The reaction proceeded for 30 min at 30 °C, after which the sample was immediately injected onto the reverse-phase HPLC column. Under these experimental conditions, methionine-sulfoxide (Met-SO)-containing peptides elute in a time window from 7 to 1 min before the bulk of the unmodified peptides; they were collected in eight subfractions per primary fraction (Table I and Fig. 1). Subfractions with the same subscript derived from the same secondary run were pooled (e.g. subfractions 12₁, 24₁, 36₁, and 48₁ of run 2A; see Table I and Fig. 1).
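The timing in the protocol above can be cross-checked with a few lines of Python. This is a minimal sketch that assumes an idealised, perfectly linear gradient with no dwell volume, and it infers the pooling rule (fractions n, n + 12, n + 24 and n + 36 in one pool) from the 12-min spacing and the pools cited in the text; the function names are illustrative only.

```python
# Idealised primary-run timing (assumption: no dwell volume, perfectly linear ramp).
RINSE_MIN = 10.0             # 10-min rinse in solvent A
RAMP_MIN = 100.0             # 0 -> 100% solvent B over 100 min (1% B/min)
FIRST_FRACTION_START = 40.0  # collection starts at 40 min
N_FRACTIONS = 48             # 48 fractions of 1 min each
POOL_SPACING = 12            # fractions 12 min apart are pooled


def percent_b(t_min: float) -> float:
    """Percentage of solvent B delivered at time t (min) after injection."""
    if t_min <= RINSE_MIN:
        return 0.0
    return min(100.0, (t_min - RINSE_MIN) * 100.0 / RAMP_MIN)


def fraction_window(n: int) -> tuple[float, float]:
    """(start, end) in min of 1-based primary fraction n."""
    start = FIRST_FRACTION_START + (n - 1)
    return (start, start + 1.0)


def pools() -> list[list[int]]:
    """Group the 48 fractions into 12 pools of fractions separated by 12 min."""
    return [list(range(k, N_FRACTIONS + 1, POOL_SPACING))
            for k in range(1, POOL_SPACING + 1)]


if __name__ == "__main__":
    print(percent_b(40.0))      # 30.0, i.e. 30% solvent B at the first fraction
    print(fraction_window(48))  # (87.0, 88.0)
    print(pools()[7])           # [8, 20, 32, 44], the pool used in Fig. 1C
```

The pool containing fractions 12, 24, 36 and 48 is the one whose subfractions are cited for run 2A; how pools map onto run letters is defined by Table I and is not reproduced here.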
Mass Spectrometry-Matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS) analysis was carried out on one-quarter of each set of pooled subfractions as described previously (15).
For liquid chromatography-tandem mass spectrometry (LC-MS/MS) identification of methionine-containing peptides, we used 75% of the volume of the pooled subfractions, which were dried in a centrifugal vacuum concentrator and redissolved in 45 µl of 0.05% formic acid in 2:98 acetonitrile:water (by volume). This solution was split into two equal parts, which were used for two consecutive LC-MS/MS runs. Per run, 20 µl was loaded onto a 0.3-mm inner diameter × 5-mm trapping column (PepMap, LC Packings, Amsterdam, The Netherlands) at a flow rate of 20 µl/min of solvent A. By valve switching, the trapping column was back-flushed, the sample was loaded onto a nanoscale reverse-phase C18 column (75-µm inner diameter × 150-mm PepMap column, LC Packings), and a binary solvent gradient was started. The solvent delivery system was run at a constant flow of 60 µl/min, and a 1:300 flow splitter (Accurate, LC Packings) directed 200 nl/min of solvent into the nanocolumn. Peptides were eluted from the stationary phase with a gradient from 0 to 100% solvent B over 50 min. The outlet of the nanocolumn was connected in-line to a distal metal-coated fused-silica PicoTip™ needle (PicoTip™ FS360-20-10-D-C7, New Objective, Inc., Woburn, MA) placed in front of the inlet of a Q-TOF mass spectrometer (Micromass UK Ltd., Cheshire, UK). Automated data-dependent acquisition on the Q-TOF mass spectrometer was initiated 20 min after the start of the solvent gradient. The acquisition parameters were chosen such that only doubly charged ions were selected for fragmentation. After the first LC-MS/MS run, a mass exclusion list was created containing the precursor masses of all peptides identified using Mascot (16). This exclusion list was then used for the second LC-MS/MS analysis of the remaining half of the material. The same procedure was used for all other pooled subfractions.
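The flow-split arithmetic and the two-pass exclusion strategy can be sketched as follows. This is a hypothetical illustration, not the MassLynx implementation: the (m/z, charge) survey layout and the 0.1 m/z exclusion tolerance are assumptions.

```python
from bisect import bisect_left


def nanocolumn_flow_nl_per_min(pump_ul_per_min: float, split_ratio: int) -> float:
    """Flow (nl/min) reaching the nanocolumn behind a 1:split_ratio splitter."""
    return pump_ul_per_min * 1000.0 / split_ratio


def is_excluded(mz: float, exclusion: list[float], tol: float) -> bool:
    """True if mz lies within +/- tol of an entry in the sorted exclusion list."""
    i = bisect_left(exclusion, mz - tol)
    return i < len(exclusion) and exclusion[i] <= mz + tol


def select_precursors(survey, identified_mz, tol=0.1):
    """Return doubly charged precursors not identified in the first run.

    survey: iterable of (mz, charge) pairs from a survey scan (hypothetical layout).
    identified_mz: precursor m/z values identified in the first-run database search.
    tol: assumed exclusion tolerance in m/z units.
    """
    exclusion = sorted(identified_mz)
    return [(mz, z) for mz, z in survey
            if z == 2 and not is_excluded(mz, exclusion, tol)]


if __name__ == "__main__":
    print(nanocolumn_flow_nl_per_min(60.0, 300))  # 200.0
    survey = [(500.25, 2), (650.30, 2), (700.10, 3), (812.40, 2)]
    print(select_precursors(survey, identified_mz=[650.35]))
    # [(500.25, 2), (812.4, 2)]
```

Keeping the exclusion list sorted lets each candidate be checked with one binary search instead of a scan over all previously identified masses.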
Peptide and Protein Identification-The collision-induced dissociation spectra obtained in each LC-MS/MS run were automatically converted to a Mascot (www.matrixscience.com)-compatible format (pkl format) using ProteinLynx, available in the Micromass MassLynx software (version 3.4). Per Mascot search, a maximum of 300 collision-induced dissociation peak lists were merged and searched against a newly created database containing only E. coli methionine-containing tryptic peptides, derived from the E. coli K12 proteome (Refs. 17 and 18 and ftp.expasy.ch/databases/complete-proteomes/ECOLI.dat). The following search parameters were used: enzyme, trypsin; maximum number of missed cleavages, 1; variable modifications, oxidation (Met), N-formyl (protein), and pyroglutamate formation (amino-terminal Glu and Gln); peptide tolerance, 0.3 Da; MS/MS tolerance, 0.25 Da; and peptide charge, 2+. Only MS/MS spectra that exceeded Mascot's significance threshold were retained.
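A Met-peptide search database of the kind described above could be generated as in the sketch below. It assumes the common trypsin rule (cleavage C-terminal to Lys or Arg, not before Pro), unmodified cysteine, monoisotopic residue masses, sequences made only of the 20 standard residues, and the 780-2400 Da window quoted in the Results; the N-formyl and pyroglutamate variable modifications are not modeled, and all function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def cleavage_sites(seq: str) -> list[int]:
    """Positions after which trypsin cuts: C-terminal to K/R, not before P."""
    return [i + 1 for i, aa in enumerate(seq[:-1])
            if aa in "KR" and seq[i + 1] != "P"]


def digest(seq: str, missed: int = 1) -> list[str]:
    """All tryptic peptides with 0..missed missed cleavages."""
    bounds = [0] + cleavage_sites(seq) + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + missed + 2, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


def met_peptide_database(proteins: dict[str, str], lo: float = 780.0,
                         hi: float = 2400.0) -> dict[str, set[str]]:
    """Map each Met-containing tryptic peptide within [lo, hi] Da to its proteins."""
    db: dict[str, set[str]] = {}
    for acc, seq in proteins.items():
        for pep in digest(seq):
            if "M" in pep and lo <= peptide_mass(pep) <= hi:
                db.setdefault(pep, set()).add(acc)
    return db


if __name__ == "__main__":
    print(digest("GMKAAR"))  # ['GMK', 'GMKAAR', 'AAR']
    print(sorted(met_peptide_database({"toy": "GMKAAR"}, lo=0.0)))  # ['GMK', 'GMKAAR']
```

Mapping each peptide to a set of accessions keeps track of peptides shared by several proteins, which matters when peptide hits are later rolled up into protein identifications.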
A second, independent method was developed to match experimental spectra with peptide sequences. It quantifies the probability that a spectrum matches a peptide sequence by combining two independent scoring methods. As for Mascot, the reference was the list of methionine-containing tryptic peptides (0 or 1 missed cleavage) derived from the complete E. coli K12 proteome. Optional modifications were oxidation of methionine and formation of pyroglutamic acid from amino-terminal glutamate and glutamine. For each peptide sequence, a list of b and y ions was derived to obtain a theoretical spectrum. Each experimental spectrum containing at least six peaks was scored for similarity with each theoretical spectrum of matching precursor ion mass.
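The construction of theoretical spectra can be illustrated with a short sketch that derives singly charged b and y ions and applies the six-peak and precursor-mass filters. It is an assumption-laden illustration rather than the Procorr™ code: only Met oxidation is modeled (all Met residues of a peptide either oxidized or not), pyroglutamate formation is ignored, and the 0.3 tolerance from the Mascot settings is applied to the doubly charged precursor m/z.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
OXIDATION = 15.994915  # Met -> Met-SO


def residue_masses(peptide: str, oxidized_met: bool) -> list[float]:
    """Residue masses, adding +15.9949 Da to every Met if oxidized_met is True."""
    return [RESIDUE_MASS[aa] + (OXIDATION if oxidized_met and aa == "M" else 0.0)
            for aa in peptide]


def theoretical_spectrum(peptide: str, oxidized_met: bool = False) -> list[float]:
    """Sorted m/z values of singly charged b1..b(n-1) and y1..y(n-1) ions."""
    masses = residue_masses(peptide, oxidized_met)
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)           # b_i
        ions.append(sum(masses[-i:]) + WATER + PROTON)  # y_i
    return sorted(ions)


def precursor_mz(peptide: str, charge: int = 2, oxidized_met: bool = False) -> float:
    """m/z of the [M + zH]z+ precursor ion."""
    return (sum(residue_masses(peptide, oxidized_met)) + WATER
            + charge * PROTON) / charge


def candidates(peaks: list[float], observed_mz: float, database: list[str],
               tol: float = 0.3) -> list[tuple[str, bool]]:
    """(peptide, oxidized) pairs worth scoring against one 2+ spectrum."""
    if len(peaks) < 6:  # spectra with fewer than six peaks are not scored
        return []
    hits = []
    for pep in database:
        for ox in ((False, True) if "M" in pep else (False,)):
            if abs(precursor_mz(pep, 2, ox) - observed_mz) <= tol:
                hits.append((pep, ox))
    return hits
```

Each surviving candidate would then be compared with the experimental spectrum by the two scores described in the next paragraph.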
The scoring scheme thus combines two distinct scoring methods, both of which quantify how likely it is that the match between the experimental and the theoretical spectrum arises by chance. The first score takes into account the number of matching peaks relative to the size of the spectra; a G-test is used to evaluate the probability that the observed number of matches differs from expectation. The second score focuses on the amplitude of the matching peaks relative to the intensities within the spectra and indicates how far the correlation coefficient between the spectra deviates from expectation. The final score is the product of the two intermediate scores. Because this algorithm combines a probabilistic approach with signal correlation, we named it Procorr™. Procorr™ is accessible at penyfan.rug.ac.be/procorr/, and details will be published elsewhere.² The same MS/MS spectra were subjected to Procorr™ analysis, and the matches that differed significantly from expectation were retained (confidence level of 98%). Since the number of identified proteins is ~2.5 times lower than the number of identified spectra, a confidence level of 98% for spectrum identification corresponds to a confidence level of at least 95% for protein identification.
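Because the details of Procorr™ are unpublished, the sketch below only illustrates the general idea of multiplying a count-based G-test score by a correlation-based score. Everything specific in it is an assumption: the chance-match probability `p_random`, the use of pre-binned intensity vectors, and the Fisher z-transform as the "deviation from expectancy" of the correlation coefficient. It should not be read as the actual Procorr™ algorithm.

```python
import math


def match_count_confidence(n_peaks: int, n_matched: int, p_random: float) -> float:
    """1 - p-value of a 1-df G-test that n_matched exceeds the chance expectation.

    p_random (0 < p_random < 1) is an assumed probability that one experimental
    peak matches a theoretical fragment by chance.
    """
    expected = n_peaks * p_random
    if n_matched <= expected:
        return 0.0  # no excess of matches over chance
    observed = (n_matched, n_peaks - n_matched)
    expect = (expected, n_peaks - expected)
    g = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expect) if o > 0)
    return 1.0 - math.erfc(math.sqrt(g / 2.0))  # chi-square (1 df) upper tail


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def correlation_confidence(x: list[float], y: list[float]) -> float:
    """1 - one-sided p-value that r > 0 (Fisher z), intensities on shared m/z bins."""
    m = len(x)
    if m < 4:
        return 0.0
    r = max(-0.999999, min(0.999999, pearson(x, y)))  # keep atanh finite
    z = math.atanh(r) * math.sqrt(m - 3)
    return 1.0 - 0.5 * math.erfc(z / math.sqrt(2.0))


def combined_score(n_peaks, n_matched, p_random, exp_int, theo_int) -> float:
    """Product of the two intermediate scores."""
    return (match_count_confidence(n_peaks, n_matched, p_random)
            * correlation_confidence(exp_int, theo_int))
```

Under these assumptions, a spectrum-peptide match would be retained when `combined_score(...)` reaches 0.98, mirroring the 98% confidence level used in the text.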

RESULTS
The Concept of COFRADIC™-Our gel-free proteome approach starts from a protein cell lysate that is digested with trypsin. The resulting peptide mixture may contain 50,000 or, most likely, even more different components. From this complex mixture, we select a subset of peptides that is highly representative of the parent proteins originally present in the lysate. In this respect our approach resembles the isotope-coded affinity tag technique (3) and the covalent chromatography method (5), which also select a subset of representative peptides. Technically, however, we do not use tagging chemistry combined with affinity selection; instead, we apply a specific chemical or enzymatic modification to peptides containing rare amino acids, thereby altering their chromatographic properties. When such a modification reaction is carried out between two consecutive, identical chromatographic runs, the altered peptides change elution times in the second run, whereas the non-modified peptides elute at the same, predictable positions. This strategy, in which a modification between two identical runs induces a shift in the chromatographic behavior of the modified components, was previously applied to peptide mixtures derived from single proteins and was called "diagonal chromatography" (13). Its name and concept were derived from "diagonal electrophoresis," which was introduced in a series of elegant articles by Hartley and colleagues (19).
In Fig. 1 we illustrate the adaptation of diagonal chromatography to the sorting of subsets of peptides from very complex mixtures. During the first chromatographic step (run 1), peptides are separated and collected in fractions of appropriate time intervals (Fig. 1A). A specific modification reaction is then carried out in every fraction, altering the properties of a subset of peptides. Every fraction is then rerun under the same chromatographic conditions (the secondary run, run 2). The altered peptides now shift in run 2 relative to their original positions in run 1, whereas the unaltered peptides do not (Fig. 1B). The shifted peptides can then be collected for analysis.
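A toy simulation makes the diagonal principle concrete. The peptides, retention times and the 2-min-per-Met shift below are invented for illustration; only the collection window from 7 to 1 min before the fraction is taken from the protocol.

```python
from dataclasses import dataclass


@dataclass
class Peptide:
    sequence: str
    rt_min: float  # retention time (min) in the primary run


def hydrophilic_shift(p: Peptide) -> float:
    """Hypothetical shift (min) after Met -> Met-SO: 2 min per Met, capped at 7.

    Peptides without Met are not modified and therefore do not shift.
    """
    return min(7.0, 2.0 * p.sequence.count("M"))


def sort_fraction(peptides: list[Peptide], start: float, end: float,
                  window: tuple[float, float] = (7.0, 1.0)) -> list[str]:
    """Rerun one primary fraction [start, end) and keep the peptides that elute
    between window[0] and window[1] min before the start of the fraction."""
    earliest, latest = start - window[0], start - window[1]
    collected = []
    for p in peptides:
        if not start <= p.rt_min < end:
            continue  # not part of this primary fraction
        rt_secondary = p.rt_min - hydrophilic_shift(p)  # identical second run
        if earliest <= rt_secondary < latest:
            collected.append(p.sequence)
    return collected


if __name__ == "__main__":
    peps = [Peptide("AAMLK", 50.3), Peptide("AALLK", 50.6),
            Peptide("MMGR", 50.1), Peptide("GGLK", 51.2)]
    print(sort_fraction(peps, 50.0, 51.0))  # ['AAMLK', 'MMGR']
```

The Met-free peptide AALLK elutes at the same time in both runs and falls outside the window, while GGLK belongs to the next primary fraction and is never rerun with this one.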
The number of secondary runs, which in principle equals the number of fractions collected in the primary run, can be reduced by combining primary fractions. This is done such that the shifted peptides of a given fraction do not overlap with the non-shifted peptides of the other fractions in the pool. Depending on the extent of the shifts, four or more primary fractions can be combined, reducing the number of secondary runs by the same factor (Fig. 1C). The entire sorting procedure can be shortened further, if necessary, by using two or more columns operating in synchronous mode. The unaltered peptides are mostly discarded, while the sorted peptides are either analyzed on-line by mass spectrometry or collected for identification in a ternary LC-MS-coupled system. The procedure, in which fractions of the first chromatographic step are combined, modified, and run in a diagonal chromatographic manner, is therefore called COFRADIC™ (combined fractional diagonal chromatography).
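The non-overlap condition for combining fractions can be written as an interval check. This sketch assumes 1-min fractions and Met-SO shifts between 1 and 7 min, as reported for the methionine sort; under these assumptions pooled fractions must be at least 8 min apart, so the 12-min spacing used here leaves a margin. The function names are illustrative.

```python
def secondary_intervals(starts, width=1.0, min_shift=1.0, max_shift=7.0):
    """Half-open intervals occupied in one secondary run by a pool of fractions.

    For each pooled fraction starting at s: the unmodified bulk [s, s + width)
    and the collection window for shifted peptides [s - max_shift, s - min_shift).
    """
    intervals = []
    for s in starts:
        intervals.append((s, s + width))
        intervals.append((s - max_shift, s - min_shift))
    return sorted(intervals)


def is_valid_pool(starts, **kwargs) -> bool:
    """True if no collection window overlaps a bulk or another window.

    With intervals sorted by start, pairwise disjointness reduces to checking
    that each interval ends before the next one begins.
    """
    iv = secondary_intervals(starts, **kwargs)
    return all(prev_end <= nxt_start
               for (_, prev_end), (nxt_start, _) in zip(iv, iv[1:]))


def min_spacing(width=1.0, max_shift=7.0) -> float:
    """Smallest spacing that keeps the next fraction's window clear of the
    previous fraction's unmodified bulk."""
    return width + max_shift


if __name__ == "__main__":
    print(is_valid_pool([40.0, 52.0, 64.0, 76.0]))  # True  (12-min spacing)
    print(is_valid_pool([40.0, 47.0]))              # False (7-min spacing)
    print(min_spacing())                            # 8.0
```

With 48 one-minute fractions and a 12-min spacing, each secondary run handles 48 / 12 = 4 fractions, matching the 12 secondary runs of the protocol.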
Application of COFRADIC™ to the E. coli Proteome-An analysis of the predicted proteomes of different model organisms revealed that methionine-containing peptides provided the best representation of the predicted proteins. For instance, in the E. coli proteome, between 95.8 and 99.7% of the predicted proteins contain at least one methionine residue (depending on whether the initiator methionine is counted), whereas only 85.4% contain cysteine. The same trend is observed in other model organisms (data not shown). We therefore decided to select Met-containing peptides and used the oxidation of methionine to Met-SO as the sorting vehicle, since a Met-SO-containing peptide is more hydrophilic than its non-oxidized counterpart.
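The coverage figures above amount to counting proteins that contain a given residue. The sketch below uses invented toy sequences; for a real proteome the sequences would first have to be parsed from a file such as ECOLI.dat, which is not shown.

```python
def residue_coverage(proteins: dict[str, str], residue: str,
                     skip_initiator_met: bool = False) -> float:
    """Fraction of proteins containing `residue` at least once.

    With skip_initiator_met=True, a leading Met is ignored, mimicking removal
    of the initiator methionine.
    """
    hits = 0
    for seq in proteins.values():
        body = seq[1:] if skip_initiator_met and seq.startswith("M") else seq
        hits += residue in body  # True counts as 1
    return hits / len(proteins)


if __name__ == "__main__":
    toy = {"a": "MKLVAC", "b": "MAGLK", "c": "MSTMR", "d": "MCGR"}
    print(residue_coverage(toy, "M"))        # 1.0
    print(residue_coverage(toy, "M", True))  # 0.25
    print(residue_coverage(toy, "C"))        # 0.5
```

The two Met values bracket the range reported in the text: counting the initiator methionine gives the upper bound, ignoring it the lower bound.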
COFRADIC™ was therefore used to sort the Met peptides present in a tryptic digest of a total, unfractionated, 4 M urea extract of 50 × 10⁶ E. coli cells. Forty-eight fractions of 80 µl (1 min) each were collected during the primary reverse-phase HPLC run. The first fraction (number 1) was taken between 40 and 41 min and the last (number 48) between 87 and 88 min after the start of the run (Fig. 1A). In every fraction, the methionine peptides were converted to their sulfoxide derivatives by a simple oxidation step. Conditions were established under which neither Cys nor Trp residues were oxidized and Met residues were not converted to their sulfones.

Fig. 1. [...] In every fraction a subset of peptides is modified by a specific reaction. The modified peptides acquire altered chromatographic properties (here illustrated by a shift toward more hydrophilic positions). When the peptides of the treated fractions are rerun in the same chromatographic system, the unmodified peptides elute at the same positions, while the modified peptides show a hydrophilic shift and elute in front of the bulk of unmodified peptides. The shifted peptides are collected for identification (B). To reduce the number of secondary runs, fractions of the primary run can be combined in such a way that the shifted peptides of a given fraction do not overlap with the non-modified peptides of the other combined fractions. In the theoretical example shown, fractions 8, 20, 32, and 44 are combined and subjected together to the modification reaction (C). The sorted peptides can be directed to further analysis (while unmodified peptides are discarded) and are collected each time in subfractions (example: 20₁, 20₂, ..., 20₈). The UV traces (absorption at 214 nm) are derived from peptide mixtures from a total E. coli trypsin digest. mAU, milliabsorbance units.
Under the chromatographic conditions used, the oxidized Met peptides generally display a hydrophilic shift of 1 to 7 min. The extent and range of the hydrophilic shifts were similar for early- and late-eluting peptides, so the same time shifts and peptide-selection intervals could be used throughout the entire secondary run.
The sorted Met-SO peptides were collected during a 6-min-wide interval starting 7 min before the elution time of the unaltered peptides. This window is thus six times wider than the 1-min interval in which the same peptides eluted during the primary run. This is an important aspect of COFRADIC™: the sorted peptides elute in a less compressed manner, which facilitates their identification by subsequent LC-MS/MS analysis.
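In the COFRADIC™ mode (Table I), the 48 primary fractions were combined into 12 pools of four fractions each, with fractions separated by 12 min pooled together. The entire sorting procedure thus includes one primary run followed by 12 secondary runs (2A-2L), which can be completed in less than 24 h.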
An aliquot (one-quarter) of the combined secondary subfractions was analyzed by MALDI-TOF-MS, in which Met-SO peptides could be recognized by their typical neutral loss of methanesulfenic acid (CH3SOH, a loss of 64 atomic mass units). In this way, we detected at least 1720 different tryptic peptides, 1618 of which contained at least one oxidized methionine residue (data not shown). Thus, fewer than 6% of the sorted peptides (102 of 1720) either were not recognized as Met peptides owing to a lack of the specific fragmentation or did not contain methionine and slipped through the sorting process.
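Screening MALDI spectra for the Met-SO signature reduces to finding peak pairs separated by the mass of methanesulfenic acid (CH3SOH, about 63.998 Da). In this sketch the 0.5 Da tolerance is an assumption, and both peaks are assumed to be reported at their true m/z; the second function simply reproduces the "fewer than 6%" arithmetic.

```python
# Monoisotopic mass of methanesulfenic acid, CH3SOH (= CH4OS).
CH3SOH = 12.0 + 4 * 1.00782503 + 15.99491462 + 31.97207117


def neutral_loss_pairs(peaks: list[float], loss: float = CH3SOH,
                       tol: float = 0.5) -> list[tuple[float, float]]:
    """(parent, fragment) peak pairs separated by `loss` +/- tol."""
    ordered = sorted(peaks)
    pairs = []
    for i, parent in enumerate(ordered):
        for fragment in ordered[:i]:
            if abs((parent - fragment) - loss) <= tol:
                pairs.append((parent, fragment))
    return pairs


def unrecognized_fraction(n_detected: int, n_met_so: int) -> float:
    """Share of sorted peptides without a detected Met-SO signature."""
    return (n_detected - n_met_so) / n_detected


if __name__ == "__main__":
    print(round(CH3SOH, 4))                                 # 63.9983
    print(neutral_loss_pairs([1000.5, 936.5, 1200.0]))      # [(1000.5, 936.5)]
    print(round(unrecognized_fraction(1720, 1618), 4))      # 0.0593
```

The quadratic pair search is adequate for MALDI peak lists of a few hundred peaks; a sorted two-pointer scan would be the obvious change for much larger lists.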
For individual peptide and protein identification, we analyzed the remaining three-quarters of the material by LC-MS/MS, carrying out 2 × 96 ternary runs in an automated manner. The resulting spectra were searched against an E. coli K12 database consisting only of Met-containing peptides. This database contained 31,746 peptides in the mass range between 780 and 2400 Da and was generated from the predicted K12 proteome allowing 0 or 1 missed tryptic cleavage. Restricting the database in this way was justified by the low number of non-methionine peptides found in the initial MALDI-MS screening (see above).
The Mascot search algorithm assigned 2167 MS/MS spectra [...] (Table II). The same MS/MS spectra were also analyzed with Procorr™, an in-house-developed peptide identification algorithm, providing an overall confidence of 98% for 1350 peptides and 807 different proteins. In this case, at most 43 spectra (0.02 × 2147 spectra), and thus at most 43 proteins, could have been falsely assigned (Table II). In total, 872 different proteins were identified: 689 proteins were found by both algorithms and are therefore highly reliable, 118 were found with Procorr™ but not with Mascot, and 65 were identified by Mascot only. The complete protein list is provided in Supplemental Table III. A classification of the 872 proteins according to major functional categories or important pathways is represented as a virtual cell in Fig. 2. Further data analysis reveals a number of important aspects of the COFRADIC™ approach (Table II). Using the same number of E. coli cells, we identified 86 different proteins with a conventional two-dimensional gel MALDI-TOF-MS approach (not shown). Thus COFRADIC™ has a much higher sensitivity and coverage range than classical methods.

Table I. The 48 fractions of the primary run are combined in 12 pools of four fractions each. The numbers of the 12 secondary runs are shown in the first column and the fraction numbers in the second column. The third and fourth columns indicate the time intervals (min after the start of the HPLC runs) during which the primary fractions and the sorted Met-SO peptides, respectively, were collected.
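The protein counts reported above are internally consistent, which can be verified with simple set arithmetic. The toy accession sets in the usage line are invented; only the numbers 689, 118, 65 and 2147 come from the text, and the helper names are illustrative.

```python
def overlap_summary(mascot: set[str], procorr: set[str]) -> dict[str, int]:
    """Counts of proteins found by both search engines, by one only, and in total."""
    return {
        "both": len(mascot & procorr),
        "procorr_only": len(procorr - mascot),
        "mascot_only": len(mascot - procorr),
        "total": len(mascot | procorr),
    }


def expected_false_assignments(n_spectra: int, confidence: float) -> int:
    """Number of accepted spectra expected to be wrong at a given confidence level."""
    return round((1.0 - confidence) * n_spectra)


# Counts reported in the text.
BOTH, PROCORR_ONLY, MASCOT_ONLY = 689, 118, 65
assert BOTH + PROCORR_ONLY == 807                # Procorr total
assert BOTH + MASCOT_ONLY == 754                 # Mascot total (807 - 53)
assert BOTH + PROCORR_ONLY + MASCOT_ONLY == 872  # union of both searches

if __name__ == "__main__":
    print(overlap_summary({"a", "b", "c"}, {"b", "c", "d", "e"}))
    # {'both': 2, 'procorr_only': 2, 'mascot_only': 1, 'total': 5}
    print(expected_false_assignments(2147, 0.98))  # 43
```

The Mascot total of 754 recovered here is the figure implied by the statement in the Discussion that Procorr™ identified 53 more proteins than Mascot.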
We also identified a small but significant percentage of putative integral membrane proteins (13.1% of the inner membrane and 22.6% of the outer membrane components), which is much higher than with conventional methods. We detected 26 proteins with a hydrophobicity (GRAVY) index (20) larger than 0.3, whereas previous methods detected only two members of this class of proteins (see Supplemental Table III). As mentioned previously (2), membrane proteins are often detected via the tryptic peptides released from their extramembranous parts, which are accessible to the protease.
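The GRAVY index is the average Kyte-Doolittle hydropathy over a sequence. A minimal sketch, assuming the standard Kyte-Doolittle values (which we take to be the scale of Ref. 20) and the 0.3 threshold used above; the toy sequences and function names are illustrative.

```python
# Kyte-Doolittle hydropathy values; GRAVY is their average over the sequence.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathy of a protein sequence."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def hydrophobic_proteins(proteins: dict[str, str], threshold: float = 0.3) -> list[str]:
    """Accessions whose GRAVY index exceeds `threshold` (0.3 in the text)."""
    return sorted(acc for acc, seq in proteins.items() if gravy(seq) > threshold)


if __name__ == "__main__":
    print(hydrophobic_proteins({"p1": "IIVV", "p2": "KKDD"}))  # ['p1']
```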
Although COFRADIC™ reduces the complexity of the original peptide mixture approximately 5-fold, the flux of peptides entering the ion source of the mass spectrometer is still too high for every peptide to be detected individually. Consequently, proteins represented by a large number of Met peptides are expected to be detected with a higher probability than proteins with few methionines. Such a bias toward Met-rich proteins is indeed observed when the percentage of identified proteins is related to the number of methionines in these proteins: we observe a nearly linear increase from 18% for the total predicted proteome to 43% for proteins with 10 or more Met residues (data not shown). The percentage keeps increasing and reaches a plateau at 60% for proteins with more than 17 methionines (the latter value is statistically weak because of the low number of proteins involved). Based on these observations, and because it seems unlikely that Met-rich proteins are represented differently in the cell lysate than Met-poor proteins, we assume that approximately the same percentage (about 50%) of all predicted E. coli proteins is expressed and thus detectable in our system. Relating the overall identification rate to this expressed fraction, we most likely identified ~37% of the proteins actually present in the cell lysate.
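The Met-count analysis described above can be sketched as a binned identification rate. The input pairs are invented toy data, not the E. coli results; pooling all proteins with 10 or more Met residues into one bin follows the text, and the second function expresses the final step of relating the overall identification rate to the estimated expressed fraction.

```python
from collections import defaultdict


def identification_rate_by_met(proteins, cap: int = 10) -> dict[int, float]:
    """Fraction of proteins identified, binned by Met count (counts >= cap pooled).

    proteins: iterable of (met_count, identified) pairs.
    """
    totals: dict[int, int] = defaultdict(int)
    hits: dict[int, int] = defaultdict(int)
    for met_count, identified in proteins:
        key = min(met_count, cap)
        totals[key] += 1
        hits[key] += int(identified)
    return {k: hits[k] / totals[k] for k in sorted(totals)}


def share_of_expressed_identified(identified_fraction: float,
                                  expressed_fraction: float) -> float:
    """Identified proteins as a share of the (estimated) expressed proteome."""
    return identified_fraction / expressed_fraction


if __name__ == "__main__":
    toy = [(1, False), (1, True), (12, True), (15, True), (10, False)]
    print(identification_rate_by_met(toy))  # {1: 0.5, 10: 0.6666666666666666}
```

A rate that keeps climbing with Met count and then levels off is what motivates reading the plateau as an estimate of the fraction of the predicted proteome that is actually expressed.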

DISCUSSION
Our proteome approach is peptide-based. The methionine peptides are selected by two repeated reverse-phase HPLC runs with an oxidation step in between, and there are no protein premodification steps. This very simple procedure guarantees a high overall sensitivity because peptide losses due to sample manipulation are limited, although there is still room for improvement, for instance by omitting vacuum-drying steps, which cause sample loss, and by downscaling the columns used for the chromatographic isolation. The COFRADIC™ analysis was carried out on an unfractionated extract of 50 million E. coli cells, corresponding in volume and protein content to ~50,000 hepatocytes. COFRADIC™ thus offers a powerful tool to study the protein profiles of biological samples that could not previously be analyzed by two-dimensional gel electrophoresis. For instance, very small groups of cells with defined biological functions, such as small biopsies, early stages of embryonic development, or even parts of individual cells, can be studied. COFRADIC™ detects both very abundant proteins, such as ribosomal proteins, and proteins known to be expressed at very low copy number, e.g. the lac repressor. Thus proteins present in ratios of 1:10,000 or more can now be detected simultaneously, illustrating the high dynamic range of our technology. However, methionine COFRADIC™, like most other described peptide-based proteomic technologies, does not allow the global study of protein modification, protein processing/degradation, or different protein isoforms. Typically, these topics have been addressed by separating protein mixtures on two-dimensional gels followed by Western blotting with specific antibodies.
Nevertheless, the concept of COFRADIC™ allows the isolation of other representative peptides if these can be specifically modified. We have recently altered the sorting chemistry such that the amino-terminal peptides of all proteins present in a mixture can be isolated.³ These peptides allow the analysis of protein amino-terminal processing on a global scale in a gel-free manner. Similarly, we are developing a sorting chemistry to specifically isolate phosphorylated peptides from protein digests.
In our analysis, we detected an unexpectedly large number of membrane proteins. This is a particular property of the peptide-based approach and was previously noticed by the group of J. Yates (2) using the MudPIT approach. Indeed, proteins can be trimmed in situ at their extramembranous parts, generating a set of hydrophilic peptides, some of which can serve as signature peptides. This approach, in which COFRADIC™ may play a crucial role, may offer a valuable alternative to procedures in which membrane proteins are isolated using new types of detergents or novel extraction protocols prior to gel separation (21,22).
Several peptide-based proteome approaches have recently been described. The MudPIT technology of the group of J. Yates (2) identified more than 1500 different proteins from yeast. This impressive number was reached by accumulating data from three prefractionated lysates, each containing up to 400 µg of protein, which represents at least 100 times the amount of starting material used in our studies. In addition, MudPIT does not include a presorting step, which makes it very difficult to resolve the large number of peptides and limits experiment-to-experiment reproducibility.
COFRADIC™ uses a preselection step to reduce the number of peptides, similar to the isotope-coded affinity tag approach (3) and the covalent chromatography method (5). However, COFRADIC™ is much more versatile than previous methods because, in principle, any peptide carrying a group that can be specifically and quantitatively modified can be sorted. In the example shown here, we used one of the simplest modification reactions in protein chemistry: the conversion of a methionine side chain to its more hydrophilic sulfoxide derivative. A similar one-step reaction could be proposed for the selection of cysteine peptides; for instance, the reduction of mixed disulfides (-S-S-R) to the more hydrophilic free thiol groups would provoke a hydrophilic shift of SH-containing peptides during the second chromatographic step. An additional advantage is that all these sorting protocols can be carried out by the same robots in a fully automated manner.

Fig. 2. [...] (20,21). The schematic also shows the protein distribution in major cellular compartments. ABC, ATP-binding cassette.
While peptide-based proteomics clearly offers high sensitivity, broad protein coverage, and full automation, protein identification depends, more than in conventional two-dimensional gel approaches, on the confidence with which peptides are identified. Both the quality of the MS/MS fragmentation spectra of the individual peptides and the algorithms used to interpret these spectra are therefore of utmost importance.
To increase confidence in protein identification, we used Mascot as the first search algorithm and, in addition, a second, in-house-developed algorithm, Procorr™. Because the latter combines several parameters, we could apply more stringent criteria while still identifying more proteins. With Procorr™ we identified at least 807 different proteins with at least 95% probability, 53 more than with Mascot. Taken together, the two algorithms identify 872 different proteins, which fall into three categories: those identified by both algorithms, those identified only by Procorr™ with a peptide probability score of at least 98%, and those identified only by Mascot with a peptide probability score of 95%. These 872 proteins represent almost 40% of the estimated expressed E. coli proteome as calculated from the identification rate of Met-rich proteins (see above). Fig. 3 compares the distribution curves of the acidic proteins detected by COFRADIC™ with those of the total predicted proteome and of the proteins reported in the SWISS-2DPAGE database. COFRADIC™ detects more than four times as many proteins. The difference between the two data sets is most striking for the basic proteins, many of which are found by COFRADIC™ but are missing from two-dimensional gel approaches.
For differential quantitative analysis, we can use isotope labeling of the peptide COOH terminus. This trypsin-catalyzed incorporation of oxygen from water has been known for some time (see, for instance, Ref. 23) but has only recently been applied to peptide and protein quantification (24). The procedure fits extremely well into the COFRADIC™ protocol because it requires no additional labeling reactions or purifications. The procedure and its application will be the subject of a separate article,⁴ which will address basic questions such as the quantitative aspects of oxygen incorporation, possible back-exchange during the peptide sorting process, co-elution of ¹⁶O and ¹⁸O isopeptides, and exact measurement of the ratio of the isotope variants.
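For orientation, the expected spacing of a ¹⁶O/¹⁸O peptide pair follows directly from the isotope masses. This sketch assumes that both carboxyl oxygens are exchanged (a +4.008 Da shift) and computes only an uncorrected intensity ratio; correcting for overlap with the natural isotope envelope of the ¹⁶O form, which matters when a 2+ pair is only about 2 m/z apart, is left out. The function names are illustrative.

```python
O16 = 15.9949146  # monoisotopic mass of 16O (Da)
O18 = 17.9991610  # monoisotopic mass of 18O (Da)
SHIFT_PER_18O = O18 - O16  # ~2.0042 Da per exchanged oxygen


def labeled_mz(mz_16o: float, charge: int, n_18o: int = 2) -> float:
    """Expected m/z of the 18O-labeled form of a peptide ion observed at mz_16o."""
    return mz_16o + n_18o * SHIFT_PER_18O / charge


def naive_ratio(intensity_18o2: float, intensity_16o: float) -> float:
    """Uncorrected 18O2/16O intensity ratio (ignores isotope-envelope overlap)."""
    return intensity_18o2 / intensity_16o


if __name__ == "__main__":
    print(round(labeled_mz(800.0, 2), 4))  # 802.0042
    print(naive_ratio(50.0, 100.0))        # 0.5
```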
In conclusion, we have demonstrated that COFRADIC™ constitutes a valid alternative for peptide-based proteomics. It is very sensitive and provides broad protein coverage, including abundant and rare, large and small, and acidic, basic, and hydrophobic proteins. In the E. coli proteome, we identified 872 different proteins with very high probability scores. This number could have been considerably larger but was restricted by the limited capacity for peptide ion selection of the mass spectrometer used in this study. We are therefore confident that novel high-throughput instruments may provide complete coverage of the peptides and thus full coverage of the expressed proteome.
COFRADIC™ offers more to proteomics than just a methionine peptide-sorting technology; it is a general concept that may become an indispensable tool for future proteomics. The high sensitivity of COFRADIC™-based proteome analysis allows the analysis of minute amounts of biological material, which was until now not possible with "classical" proteomic technologies. COFRADIC™ will make it possible to carry out targeted forms of proteomics, such as detecting and measuring protein cleavage and processing, or post-translational modifications, in total cellular lysates.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.