An Integrated, Directed Mass Spectrometric Approach for In-depth Characterization of Complex Peptide Mixtures

LC-MS/MS has emerged as the method of choice for the identification and quantification of protein sample mixtures. For very complex samples such as complete proteomes, the most commonly used LC-MS/MS method, data-dependent acquisition (DDA) precursor selection, is of limited utility. The limited scan speed of current mass spectrometers along with the highly redundant selection of the most intense precursor ions generates a bias in the pool of identified proteins toward those of higher abundance. A directed LC-MS/MS approach that alleviates the limitations of DDA precursor ion selection by decoupling peak detection and sequencing of selected precursor ions is presented. In the first stage of the strategy, all detectable peptide ion signals are extracted from high resolution LC-MS feature maps or aligned sets of feature maps. The selected features or a subset thereof are subsequently sequenced in sequential, non-redundant directed LC-MS/MS experiments, and the MS/MS data are mapped back to the original LC-MS feature map in a fully automated manner. The strategy, implemented on an LTQ-FT MS platform, allowed the specific sequencing of 2,000 features per analysis and enabled the identification of more than 1,600 phosphorylation sites using a single reversed phase separation dimension without the need for time-consuming prefractionation steps. Compared with conventional DDA LC-MS/MS experiments, a substantially higher number of peptides could be identified from a sample, and this increase was more pronounced for low intensity precursor ions.

Over the past decade, MS has emerged as the method of choice for the identification and quantification of proteins in very complex biological samples (1). In the most widely used implementation, referred to as shotgun proteomics, protein samples are first digested, and the resulting peptide mixtures are then chromatographically separated and finally sequenced by automated MS/MS. Because of its conceptual and experimental simplicity, the shotgun approach has become a very popular method for the identification of proteins in a wide range of biological samples and, in combination with stable isotope labeling, for quantitative proteomics studies (2-4). Recent technical improvements in MS instrumentation, database searching, and result validation as well as advances in database annotation now make it possible to routinely identify hundreds to a few thousand proteins in complex biological samples (5-8).
Despite this impressive progress, shotgun proteomics is not yet capable of characterizing whole proteomes and presents obvious biases, among them the discrimination against protein species of low abundance (5,8). This is primarily a consequence of two factors: the limited sequencing speed of current LC-ESI-MS/MS systems, which cannot analyze every precursor ion detected in complex samples, and the redundant selection of a subset of precursor ions even if precautions like dynamic exclusion are applied. Therefore, even in repeat analyses of the same sample, exhaustive identification of the low intensity precursors is not achieved (9-11).
In contrast to these approaches based on data-dependent acquisition (DDA) precursor ion selection, directed peptide sequencing provides the advantage of focusing the MS/MS analysis on non-redundant and information-rich precursor ions, thereby better managing the analysis time and increasing the depth of analysis (12,13). In this regard, a two-stage strategy by which all MS1 features that represent peptides are extracted from LC-MS maps and subsequently subjected to targeted sequencing, in principle, should lead to the identification of all detectable precursors (14). Because the acquisition of MS1 and MS2 spectra is naturally decoupled in MALDI-MS/MS, this platform is well suited for directed sequencing and has been applied to selectively analyze differential expression or modifications of proteins (15,16).
The same principle is also applicable to ESI-MS, which has the potential to provide much higher sequencing speed in routine applications compared with MALDI-MS/MS. Because the peptides are not "immobilized" on the sample plate, repeat injections of the same sample are required: the first to detect the MS1 features and subsequent ones for directed sequencing. Naturally the decoupling of feature detection and sequencing demands highly reproducible elution times and high mass accuracy. A directed sequencing strategy has already been applied on high mass accuracy ESI instruments to specifically sequence peptides of single proteins in complex mixtures (17) and for the detection of low abundance peptide species (18). However, the number of targeted peptide ion masses was limited to a few hundred per run, a much lower number of sequencing attempts than modern ESI-MS/MS instruments are capable of performing in DDA mode. The number of targeted precursors could only be increased by sample consuming and time consuming multiple LC-MS/MS analyses of the same mixture.
In the present study, a high performance LTQ-FT-ICR mass spectrometer that allows segmentation of inclusion mass lists by LC elution time was used for directed sequencing. Each inclusion list contained the m/z and elution time of the targeted precursors and was divided into segments of 3-5 min, thereby increasing the number of possible target masses to 3,000 in a 1-hour LC gradient. The strategy was supported by software tools to (i) automatically extract peptide features from MS1 maps and to align features over multiple LC-MS/MS patterns (13), (ii) generate inclusion lists from the identified features, (iii) control directed sequencing of the features on the inclusion list by sequential LC-MS/MS analyses of the same sample, and (iv) map the MS/MS data obtained back to the initial feature list. The potential of the directed MS/MS approach was evaluated by in-depth characterization of complex peptide and phosphopeptide mixtures obtained from Drosophila melanogaster lysates. The data demonstrated the high specificity and reproducibility of the method to identify a higher number of peptides from the same sample in a lower number of LC-MS/MS runs compared with standard DDA LC-MS/MS analysis, specifically in the class of low intensity precursor ions.

MATERIALS AND METHODS
Cell Culture and Phosphopeptide Enrichment-All chemicals, if not otherwise mentioned, were bought at the highest available purity from Sigma-Aldrich.
Cell Culture, Lysis, and Protein Digestion-D. melanogaster Kc167 cells were grown in Schneider's Drosophila medium (Invitrogen) supplemented with 10% fetal calf serum, 100 units of penicillin (Invitrogen), and 100 μg/ml streptomycin (Invitrogen) in an incubator at 25°C. To increase the degree of phosphorylation in the Drosophila proteins, different batches of cells were pooled that were either growing in rich medium, growing in serum-starved medium, treated for 30 min with 100 nM rapamycin (LC Laboratories, Woburn, MA), treated for 30 min with 100 nM insulin, or treated for 30 min with 100 nM calyculin A. Then the cells were washed with ice-cold PBS and resuspended in ice-cold lysis buffer containing 10 mM HEPES, pH 7.9, 1.5 mM MgCl2, 10 mM KCl, 0.5 mM DTT, and a protease inhibitor mixture (Roche Applied Science). To preserve protein phosphorylation, several phosphatase inhibitors were added to a final concentration of 20 nM calyculin A, 200 nM okadaic acid, 4.8 μM cypermethrin (all bought from Merck KGaA), 2 mM vanadate, 10 mM sodium pyrophosphate, 10 mM NaF, and 5 mM EDTA, respectively. After a 10-min incubation on ice, cells were lysed by homogenization in a Dounce homogenizer. Cell debris and nuclei were removed by centrifugation for 10 min at 4°C at 5,500 × g. Then the cytoplasmic and membrane fractions were separated by ultracentrifugation at 100,000 × g for 60 min at 4°C. The proteins of the cytosolic fraction (supernatant) were subjected to acetone precipitation. The protein pellets were resolubilized in 3 mM EDTA, 20 mM Tris-HCl, pH 8.3, and 8 M urea. The disulfide bonds of the proteins were reduced with tris(2-carboxyethyl)phosphine at a final concentration of 12.5 mM at 37°C for 1 h. The produced free thiols were alkylated with 40 mM iodoacetamide at room temperature for 1 h.
The solution was diluted with 20 mM Tris-HCl (pH 8.3) to a final concentration of 1.0 M urea and digested with sequencing grade modified trypsin (Promega, Madison, Wisconsin) at 20 μg/mg of protein overnight at 37°C. Peptides were desalted on a C18 Sep-Pak cartridge (Waters) and dried in a SpeedVac. Finally 1 μg of peptide sample was utilized for each LC-MS/MS experiment.
Phosphopeptide Isolation-Phosphopeptides were isolated using TiO2 affinity enrichment as described recently (20). 1 μg of the phosphopeptide sample was subjected to each LC-MS/MS analysis.
Reversed Phase HPLC-Peptide samples were analyzed on an Agilent 1100 microflow system (Agilent Technologies) connected to a 7-tesla Finnigan LTQ-FT-ICR instrument (Thermo Electron, Bremen, Germany) equipped with a nanoelectrospray ion source (Thermo Electron). Peptides were separated on an RP HPLC column (150 μm × 15 cm) packed in house with C18 resin (Magic C18 AQ 5 μm; Michrom BioResources, Auburn, CA) using a linear gradient from 98% solvent A (0.15% formic acid) and 2% solvent B (98% acetonitrile, 2% water, and 0.15% formic acid) to 30% solvent B over 60 min (for cytosol digest) and 90 min (for phosphopeptide-enriched samples) at a flow rate of 1.2 μl/min.
Mass Spectrometry-In DDA mode, each MS1 scan (acquired in the ICR cell) was followed by CID (acquired in the LTQ part) of the three (for feature extraction) and five (for comparison of DDA and directed LC-MS/MS) most abundant precursor ions with dynamic exclusion for 30 s. Only MS1 signals exceeding 150 counts were allowed to trigger MS2 scans with wide band activation enabled. Total cycle time was ~1-1.5 s. For MS1, 10^6 ions were accumulated in the ICR cell over a maximum time of 500 ms and scanned at a resolution of 100,000 full-width at half-maximum (at 400 m/z). MS2 and MS3 spectra were acquired using the normal scan mode, a target setting of 10^4 ions, and an accumulation time of 250 ms. Singly charged ions and ions with unassigned charge state were excluded from triggering MS2 events. The normalized collision energy was set to 30%, and one microscan was acquired for each spectrum. For phosphopeptide analysis, the mass spectrometer automatically switched between MS, MS2, and neutral loss-dependent MS3 acquisition. Data-dependent settings were chosen to trigger an MS3 scan when a neutral loss of 97.97, 48.99, 32.66, 24.5, or 19.6 Da was detected among the 10 most intense fragment ions.
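The five neutral-loss values listed above are consistent with the loss of phosphoric acid (H3PO4) observed at precursor charge states 1+ to 5+. The following minimal Python sketch illustrates that relationship; the function name and the interpretation of the five values as charge-dependent shifts are our assumptions, not settings taken from the instrument software.

```python
# Monoisotopic mass of H3PO4 (3 H + P + 4 O); assumed here to be the neutral
# that is lost from phosphoserine/phosphothreonine peptides during CID.
H3PO4_MASS = 3 * 1.007825 + 30.973762 + 4 * 15.994915


def neutral_loss_mz(charge: int, loss_mass: float = H3PO4_MASS) -> float:
    """Return the m/z shift between a precursor and its neutral-loss ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return loss_mass / charge


# Shifts for charge states 1+ to 5+ (hypothetical helper dictionary).
neutral_losses = {z: neutral_loss_mz(z) for z in range(1, 6)}

for z, shift in neutral_losses.items():
    print(f"{z}+: {shift:.2f}")
```

The computed shifts agree with the values quoted in the text to within about 0.01 to 0.02 m/z units; the small differences reflect rounding in the reported settings.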
Peak Detection-First the data of the initial three LC-MS (mapping) runs (raw format) was converted to the profile mzXML format (21). Then the in-house developed software system SuperHirn (22) was used for (i) detection, (ii) deisotoping, (iii) peak integration, and (iv) alignment of detected features over multiple LC-MS patterns. Peak intensities were measured by calculating peak areas from extracted ion chromatograms of each MS signal. Highly stringent criteria were applied to filter the detected peaks for peptide signals. Specifically the algorithm searches for peak patterns matching isotope distributions typical for peptides within the m/z value range under investigation. Peaks had to be detected in at least two subsequent MS1 scans and with a minimum of three isotopic peaks to be considered. Only peaks that could be found in at least two LC-MS runs were considered, and singly charged masses were excluded. Finally a list of the relevant features was generated and used to build mass inclusion lists for directed MS sequencing.
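The acceptance criteria described above can be summarized as a simple predicate. The sketch below is illustrative only and is not the SuperHirn implementation; the `Feature` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical summary of one detected LC-MS feature."""
    mz: float
    charge: int
    n_consecutive_scans: int  # consecutive MS1 scans containing the peak
    n_isotopic_peaks: int     # isotopic peaks matched to a peptide pattern
    n_runs_detected: int      # aligned LC-MS runs in which the feature was found


def passes_filter(f: Feature) -> bool:
    """Apply the criteria stated in the text for keeping a feature."""
    return (
        f.n_consecutive_scans >= 2   # seen in at least two subsequent MS1 scans
        and f.n_isotopic_peaks >= 3  # at least three isotopic peaks
        and f.n_runs_detected >= 2   # found in at least two LC-MS runs
        and f.charge >= 2            # singly charged masses are excluded
    )


def build_feature_list(features):
    """Return only the features eligible for the inclusion lists."""
    return [f for f in features if passes_filter(f)]
```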
Generation of Inclusion Lists-To make the generation of inclusion lists less time-consuming, less prone to human error, and easier to reproduce we developed the "Inclusion List Builder" software. The Inclusion List Builder has a rich graphical user interface and is implemented as a plug-in for the Prequips platform (Seattle Proteome Center).
The table containing all features extracted from initial LC-MS runs is imported through the Prequips data provider interface and converted into a so-called "master table" by the Inclusion List Builder. The master table contains the m/z ratios, retention times, averaged peak areas, and charge states for all features identified by the SuperHirn algorithm. Through interactive application of filters to feature attributes, inclusion lists are created as subsets of the master table and segmented by retention time. This is necessary because the number of features present in the table usually exceeds the number of possible sequencing cycles that the mass spectrometer can acquire in a single run. After segmentation the inclusion lists are exported as tables (.csv file format) that can be read by the MS instrument software.
In the study presented here, the following software settings were used for the directed analysis of the peptide mixtures. The 9,680 features extracted from the three map runs analyzing the Drosophila lysate digest were split by their intensity into five bins, each consisting of 2,000 masses (1,680 for the last bin containing the least intense features) using the following average peak area thresholds: very high, 2.6 × 10^9 to 3.0 × 10^7; high, 3.0 × 10^7 to 1.4 × 10^7; medium, 1.4 × 10^7 to 8.1 × 10^6; low, 8.1 × 10^6 to 4.9 × 10^6; and very low, 4.9 × 10^6 to 5.3 × 10^5. Each subset was further clustered into 3- or 5-min segments using the following elution time values: 0, 27.5, 32.5, 37.5, 40.5, 43.5, 46.5, 49.5, 52.5, 55.5, 58.5, 61.5, 66.5, 71.5, and 80 min. The start and stop time of each 5-min (or 3-min) time segment was extended by 2.4 (1.4) and 2.5 (1.5) min, respectively, to compensate for variations in retention time. Because the instrument software requires a few seconds to load/delete the new/old masses in each time segment, a delay of 6 s was implemented for each start time. For directed analysis of the phosphopeptide-enriched sample (90-min gradient), the features were clustered using the following elution time bins: 0, 22.5, 27.5, 32.5, 37.5, 42.5, 47.5, 52.5, 57.5, 62.5, 67.5, 72.5, 77.5, 82.5, 87.5, 95, and 110 min. In summary, this setup allowed, assuming equal distribution, the inclusion of up to 3,000 features in a single directed LC-MS/MS analysis using a 60-min gradient. It is important to note that applying shorter time segments could further increase this number.
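The retention time segmentation described above can be sketched in Python as follows. This is not the Inclusion List Builder code. The boundary values are taken from the text, whereas the function names, the data layout, and the rule that every segment longer than 3 min receives the 2.4/2.5-min extension are assumptions made for illustration. The 6-s loading delay is not applied separately here because the text does not state whether it is already contained in the shorter start extension.

```python
from bisect import bisect_right

# Segment boundaries in minutes for the 60-min gradient (values from the text).
BOUNDARIES = [0, 27.5, 32.5, 37.5, 40.5, 43.5, 46.5, 49.5,
              52.5, 55.5, 58.5, 61.5, 66.5, 71.5, 80]


def acquisition_window(start, stop):
    """Widen a nominal segment to tolerate retention time variation.

    Assumption: 3-min segments are extended by 1.4/1.5 min and all longer
    segments by 2.4/2.5 min (start/stop); the start is clipped at 0 min.
    """
    pre, post = (1.4, 1.5) if stop - start <= 3.0 else (2.4, 2.5)
    return max(0.0, start - pre), stop + post


def segment_features(features, boundaries=BOUNDARIES):
    """Group (mz, rt_min) tuples by segment.

    Returns a dict mapping each widened (start, stop) window to the list of
    features whose retention time falls into the nominal segment.
    """
    segments = {}
    for mz, rt in features:
        i = bisect_right(boundaries, rt) - 1
        if not 0 <= i < len(boundaries) - 1:
            continue  # feature elutes outside the segmented range
        window = acquisition_window(boundaries[i], boundaries[i + 1])
        segments.setdefault(window, []).append((mz, rt))
    return segments
```

With 14 segments per 60-min gradient, counting the features per window gives a quick check that no segment exceeds the number of masses the instrument can hold at one time.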
Directed MS Sequencing of Features-The generated inclusion lists (.csv file format) were directly imported into the global mass list parent ion table of the MS operating software (Xcalibur 2.0 SR1, Thermo Electron) and activated. Basically the settings for targeted LC-MS/MS were similar to those described above with a few modifications. First the dynamic exclusion mass window that is also setting the m/z tolerance for the inclusion list masses was narrowed to ±10 ppm for all directed analyses with enabled monoisotopic precursor selection and to ±5 ppm when this option was turned off. Ion signals for which no charge could be assigned were also allowed to trigger MS2 scans. The dynamic exclusion time was reduced to 10 s to acquire multiple MS2 scans for each feature. For directed sequencing in preview off mode, this option was disabled, and the resolution of the MS1 scan in the ICR cell was reduced to 50,000. For sequencing features of low abundance, the monoisotopic precursor selection option was disabled, and to minimize unspecific sequencing, the threshold required for triggering MS2 events was raised from 150 to 3,000 counts.
Database Searching-All acquired MS2 and MS3 spectra were searched against the Drosophila Flybase protein database (D. melanogaster, release 4.3; March 2006; 19,645 entries), which also contained the protein sequences of bovine trypsin and human keratins, using the Bioworks (Version 3.2) software (Thermo Electron, San Jose, CA). The search criteria were as follows: full tryptic specificity was required (cleavage after lysine or arginine residues unless followed by proline); two missed cleavages were allowed; carbamidomethylation (Cys) was set as fixed modification; oxidation (Met) and, if required, phosphorylation (Ser/Thr/Tyr) were applied as variable modifications; and mass tolerance of the precursor ion and the fragment ions was 10 ppm and 0.8 Da, respectively. In addition to this, each data set was searched against a decoy Flybase protein database (Version 4.3) as described previously to assess the number of false positive peptide identifications (23). Based on this approach, the error rate was set to a maximum of 1% (error rate = 2 × percentage of decoy hits) using the following thresholds: peptide probability <0.01 and final score >0.2 for unphosphorylated and >0.4 for phosphorylated peptides. For each peptide sequence identified, all matching gene numbers (Flybase gene ID (FBgn)) and protein accession entries (computed gene (CG)) were determined and displayed. In addition to this, Occam's razor logic as implemented in Protein Prophet (24) was applied to calculate the number of identified proteins (CG entries). In brief, redundant protein entries were removed by clustering peptides matching to multiple members of a protein family to a single protein group and considered as a single identification. Furthermore when multiple proteins shared a peptide sequence, it was only assigned to the protein identified with the highest number of peptide assignments.
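The error rate estimate and the score thresholds given above can be written as two small functions. The sketch is a simplified illustration: the function names are hypothetical, and taking the percentage of decoy hits relative to all accepted hits is our assumption because the text does not define the denominator.

```python
def estimated_error_rate(n_decoy_hits: int, n_total_hits: int) -> float:
    """Error rate in percent, computed as 2 x percentage of decoy hits.

    Assumption: the percentage is taken relative to all accepted hits.
    """
    if n_total_hits <= 0:
        raise ValueError("n_total_hits must be positive")
    return 2.0 * 100.0 * n_decoy_hits / n_total_hits


def passes_thresholds(peptide_probability: float, final_score: float,
                      phosphorylated: bool) -> bool:
    """Apply the thresholds quoted in the text for accepting a peptide."""
    min_score = 0.4 if phosphorylated else 0.2
    return peptide_probability < 0.01 and final_score > min_score
```

For example, 5 decoy hits among 1,000 accepted hits would correspond to an estimated error rate of 1%, the maximum allowed in this study.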
In-house software was used for the calculation of non-redundant phosphorylation sites present in the data sets obtained as recently reported (25).
For comprehensive intensity comparison of the peptides identified by DDA and directed LC-MS/MS analysis (see Fig. 3B), the MS1 signal intensities were determined using the Bioworks software to also include peptides exclusively identified in DDA mode. Therefore, the peak height of every single peptide precursor ion was determined by applying the following parameters: mass range of 0.03 Da, intensity threshold of minimal 1,000, and three smoothing points allowed. The intensity of the same peptide ions detected in multiple runs was averaged.
Assignment of Identified Peptides to Inclusion Lists-An Excel table (in tab-delimited format) containing the search results from the directed LC-MS/MS runs was imported into the Prequips software. Spectral data in mzXML format was loaded to obtain access to the corresponding retention times. All identified peptides were mapped to one or more features in the master table through the MS2 scans associated with them.
For the mapping algorithm we define a scan as the triple (corrected retention time $t'_R$, feature mass $m$, and charge $z$), where the feature mass $m$ is defined as follows.
$$m = \frac{\text{precursor neutral mass} + (z \times 1.00727)}{z} \quad \text{(Eq. 1)}$$
The corrected retention time $t'_R$ is computed from the measured retention time $t_R$ as
$$t'_R = a \cdot t_R + b$$
Because all LC-MS/MS experiments were performed using the same C18 column, resulting in minimal shifts in feature elution times, no corrections of $t_R$ were required. Therefore, $a$ was set to 1, and the value for $b$ was set to 0. The following algorithm was applied for each identified peptide p.
1. For every scan s associated with p, find all features f in the master table that fulfill each of the following conditions:
$$|t_R(f) - t'_R(s)| \le \tau, \qquad \frac{|m(f) - m(s)|}{m(s)} \times 10^6 \le \mu$$
The retention time tolerance $\tau$ (in min) and the mass tolerance $\mu$ (in ppm) define windows of $2\tau$ and $2\mu$ centered on $t'_R(s)$ and $m(s)$, respectively.
2. Map p to all f fulfilling the conditions. A single peptide can be mapped to the same feature multiple times if it is associated with more than one scan. Also more than one distinct peptide can be mapped to a feature. The peptide with the lowest retention time and mass difference is automatically selected as the best hit. The investigator can manually assign any other of the peptides mapped as the best hit for a given feature if there is supporting evidence.
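The two mapping steps above can be sketched in Python as follows. This is a minimal illustration and not the Prequips implementation. The record types and function names are hypothetical, the tolerances are left as required arguments because no values are given at this point in the text, the charge-state check is an assumption, and combining the two differences into one normalized score for the best hit is likewise an assumption.

```python
from dataclasses import dataclass

PROTON_MASS = 1.00727  # value used in Eq. 1


@dataclass(frozen=True)
class Feature:
    mz: float
    rt: float      # retention time in min
    charge: int


@dataclass(frozen=True)
class Scan:
    neutral_mass: float  # precursor neutral mass of the identified peptide
    rt: float            # measured retention time in min
    charge: int


def scan_mz(scan: Scan) -> float:
    """Feature mass m of a scan according to Eq. 1."""
    return (scan.neutral_mass + scan.charge * PROTON_MASS) / scan.charge


def corrected_rt(rt: float, a: float = 1.0, b: float = 0.0) -> float:
    """Linear retention time correction; a = 1 and b = 0 mean no correction."""
    return a * rt + b


def map_peptide(scans, features, rt_tol_min, mass_tol_ppm):
    """Return (feature, scan, rt difference, ppm difference) for every match."""
    hits = []
    for s in scans:
        m, t = scan_mz(s), corrected_rt(s.rt)
        for f in features:
            d_rt = abs(f.rt - t)
            d_ppm = abs(f.mz - m) / m * 1e6
            # The charge comparison is an assumption, not stated in the text.
            if d_rt <= rt_tol_min and d_ppm <= mass_tol_ppm and f.charge == s.charge:
                hits.append((f, s, d_rt, d_ppm))
    return hits


def best_hit(hits, rt_tol_min, mass_tol_ppm):
    """Pick the hit with the smallest normalized rt + mass difference."""
    return min(hits, key=lambda h: h[2] / rt_tol_min + h[3] / mass_tol_ppm,
               default=None)
```

A peptide associated with several scans simply contributes several entries to `hits`, which mirrors the statement that a peptide can be mapped to the same feature multiple times.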
For mapping identified features back to the initial inclusion list, the theoretical m/z values and the elution time of the identified peptides were matched with the precursor masses and elution times in the inclusion list. A mass tolerance of 0.01 Da and a time tolerance of 1 min were allowed. Because no precolumn was used during LC, very hydrophilic peptides, which were not or only partially retained by the C18 resin, showed very inconsistent retention times. Therefore, all peptides having LC retention times of less than 30 min were matched only by their m/z and charge values.
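The matching rule described above, including the special handling of early eluting peptides, might be expressed as follows. The function and parameter names are hypothetical, and applying the 0.01-Da tolerance directly to the m/z values is our reading of the text.

```python
def matches_inclusion_entry(pep_mz, pep_rt, pep_charge,
                            entry_mz, entry_rt, entry_charge,
                            mz_tol=0.01, rt_tol=1.0, early_cutoff=30.0):
    """Decide whether an identified peptide matches an inclusion list entry.

    Peptides eluting before `early_cutoff` (min) are matched by m/z and
    charge only because their retention times were inconsistent; all other
    peptides are matched by m/z and elution time.
    """
    if abs(pep_mz - entry_mz) > mz_tol:
        return False
    if pep_rt < early_cutoff:
        return pep_charge == entry_charge
    return abs(pep_rt - entry_rt) <= rt_tol
```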

The Directed Precursor Ion Selection Work Flow-
The general work flow of the directed LC-MS/MS analysis described in this study is outlined in Fig. 1.
Thus the construction of the inclusion list required that the number of features from two overlapping segments did not exceed 500. In addition the lower number of features early and late in the gradient allowed the generation of longer segments with a higher time tolerance in those regions. The long time segment at the beginning is essential for targeting hydrophilic peptides because these molecules do not, or only weakly, bind to the RP LC column and consequently showed considerable shifts in elution times between different runs. For samples that showed a common feature distribution, a maximum of 3,000 features could be sequenced per hour (~1 feature/s). It is worth mentioning that this number could be increased further by shortening the time segments.
5. Feature identification. The MS2 data obtained from the directed LC-MS/MS analyses were searched against protein databases and, together with the corresponding mzXML files, mapped back to the master table using the Prequips software.
Reproducibility of LC-MS-The success of an inclusion list experiment strongly depended on the reproducibility of both LC and MS performances. Supplemental Fig. S1 shows that variations in retention time were below 20 s, and mass accuracy was better than 2 ppm. These variations are well within the time and m/z tolerances of 2.5 min and 10 ppm, respectively, used for directed MS2. Naturally the time and m/z tolerances should be kept as small as possible to minimize random MS sequencing events.

Optimization of MS Parameters for Improved Identification of Features-
To optimize the number of features identified in a targeted precursor ion selection experiment we generated a peptide mixture from D. melanogaster Kc167 cells and subjected aliquots to repeat analyses in which relevant experimental parameters were varied.
Cells (10^8) were harvested and disrupted, and the cytosolic fraction was isolated. The proteins were alkylated and trypsinized, and three aliquots, each containing 1 μg of peptides, were analyzed as described above (Fig. 1). A total of 9,680 unique features could be detected (see supplemental Table S3) that were imported into the master table. The features were grouped by decreasing intensity into five inclusion lists, each containing around 2,000 features, and subjected to directed LC-MS/MS analysis.
Subsequently the influence of two important parameters for triggering MS2 in the LTQ-FT mass spectrometer used, the preview mode and the monoisotopic precursor selection (MPS), respectively, was evaluated. With preview mode on (default mode), the instrument acquires a low resolution MS1 spectrum (prescan) that takes around 20% of the total scan time and continues scanning the ions for high resolution while the LTQ is performing MS2 scans of selected ions in parallel (Fig. 2C). This significantly increases MS2 scan rates, but peptide ions for MS2 are selected from low resolution MS1 scans. By disabling the preview mode and reducing the resolution to 50,000 for MS1, the mass accuracy of the peptide ions selected on line for directed MS2 could be improved. Consequently 8% (13%) more features were identified (sequenced) from the Kc167 cell sample from the five inclusion lists with preview mode disabled ( Fig. 2A and supplemental Tables S1, S4, and S5). Despite the slightly longer cycle times required (Fig. 2D), the speed of the MS instrument used was still sufficient to selectively sequence up to 2,000 features in a 60-min gradient (supplemental Table S1). It is important to note that the improvements were particularly pronounced for features with lower intensities. Additionally the MS2 selection efficiency of peptide ions of low abundance could be further improved by disabling MPS, which makes every single isotopic peak in the MS1 scans accessible for triggering MS2. As a consequence, the yield of identified low abundance peptide signals could be further improved from 33 to 40% and 26 to 35% for low and very low intensity features, respectively ( Fig. 2A).
It is worth mentioning that increasing the number of ions into the ICR cell for MS1 scans also slightly increased the numbers of identified low abundance ions, however, to a lower extent than by disabling MPS (data not shown). This is due to the fact that high ion numbers in the ICR cell lead to space charge effects that lower peptide mass accuracy (26,27). Therefore, some features might fall out of the m/z tolerances applied and would not be sequenced if the ICR cell is filled with a high number of ions.
Using the optimized MS parameters, around 80% of all detected features were selected for sequencing, and 48% of those could be confidently identified, resulting in the identification of 3,931 unique peptides and 793 non-redundant gene products (supplemental Tables S1 and S6). In addition, the occurrence of random sequencing of MS signals not selected for directed MS2 was very low because only about 11.6% (516 peptides) of all identified peptides in the five inclusion list runs could not be mapped back to the master feature list. This demonstrates that the whole process starting with feature extraction from high resolution MS1 maps followed by directed sequencing using an inclusion list protocol and mapping the MS2 data back to the features is very effective and specific.
Comparison of Data-dependent and Directed LC-MS/MS-To assess the performance of the directed LC-MS/MS approach in comparison with CID experiments using DDA, we compared the data obtained from applying the protocols optimized as described above with the results of five repeat LC-MS/MS runs using DDA. The sample was a tryptic digest of Kc167 cell cytosolic fraction, and in each analysis 1 μg of total peptide mass was injected. To increase the number of MS2 spectra acquired, the instrument was programmed to randomly select for CID the five most intense precursor ion signals detected in a survey scan in the LC-MS/MS experiment with DDA, and the peptides identified by either strategy were compared.
As shown in Fig. 3A (blue line), 2,493 unique peptides were identified in the first DDA LC-MS/MS run, but then the contribution of additional LC-MS/MS runs of the same sample to the overall number of peptides decreased rapidly, eventually identifying only 1,083 (43.4%) additional peptides in the four subsequent LC-MS/MS runs (supplemental Tables S2 and S7). This effect was even more apparent at the protein level where only 111 (21.0%) new gene products could be discovered by the additional four LC-MS/MS experiments. Notably the number of identified peptides over all five DDA runs was highly consistent, ranging from 2,461 to 2,493 (supplemental Tables S2 and S7). This clearly demonstrates a strong sequencing redundancy in shotgun LC-MS/MS in agreement with observations obtained from other recent studies (9-11).
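The cumulative curves in Fig. 3A count each peptide only once across repeat runs. A minimal sketch of that bookkeeping is given below; the function names are hypothetical, and any peptide sets passed in are placeholders rather than data from this study.

```python
def cumulative_unique(runs):
    """Return the running number of unique peptides after each run.

    `runs` is an ordered iterable of collections of peptide sequences.
    """
    seen, totals = set(), []
    for peptides in runs:
        seen.update(peptides)
        totals.append(len(seen))
    return totals


def percent_gain(first_run_count: int, additional: int) -> float:
    """Additional identifications expressed relative to the first run."""
    return 100.0 * additional / first_run_count


# Numbers quoted in the text: 1,083 additional peptides relative to 2,493.
dda_gain = percent_gain(2493, 1083)
print(f"{dda_gain:.1f}%")
```

Applied to the values quoted above, the gain of the four additional DDA runs evaluates to the 43.4% stated in the text.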
In contrast, the directed approach presented here allowed the non-redundant sequencing of every single feature in only one LC-MS/MS run, and therefore, the degree of novel peptide identifications in each analysis was much higher, resulting in a much steeper curve (Fig. 3A, orange line). Here using the optimized MS parameters shown above (preview mode and MPS disabled; Fig. 2), 2,746 (237%) additional peptides could be determined after the first directed LC-MS/MS run, resulting in a total of 3,931 unique peptide identifications. Using nonredundant sequencing also allowed the identification of a significant number of additional proteins in each directed run compared with DDA LC-MS/MS (supplemental Table S1). Eventually a higher number of peptides and proteins were identified in five directed runs than in five DDA LC-MS/MS analyses despite the much lower number of sequencing events, clearly indicating the higher information content of the ions extracted off line over DDA LC-MS/MS (supplemental Tables S1 and S2). Interestingly the INL curve (Fig. 3A) is only slightly flattening with decreased feature signal intensity, which suggests that potentially even more peptides can be identified by the directed approach if less stringent peak picking criteria were applied and additional inclusion list runs were carried out on the same sample.
It is important to point out that current mass spectrometers offer possibilities to sequence additional low abundance peptides in DDA mode (28). However, most of them drastically reduce the number of MS1 scans acquired and therefore lower reliable quantification of complex peptide mixtures. Conversely directed LC-MS/MS enables the identification of low abundance peptides without affecting MS1 performance and thus quantification accuracy.
Combination of DDA and Directed LC-MS/MS-As shown in Fig. 3A, three inclusion lists were necessary to identify as many peptides/proteins as in a single DDA LC-MS/MS experiment. Therefore, excluding already selected features from the initial LC-MS run for the generation of MS1 maps and specifically targeting the remaining features would be expected to reduce the number of directed LC-MS/MS runs required to sequence every detectable feature. We applied a combined strategy of two initial DDA runs with subsequent directed runs to the Kc167 cell sample described above. As shown in Fig. 3A, the results of more than 5,000 sequencing attempts of the first two DDA LC-MS/MS analyses could be mapped back to the feature list, thereby reducing the number of features from 9,680 to 4,323 that were specifically targeted in two additional LC-MS/MS experiments. The first directed run already identified as many novel peptides as the three additional DDA runs together, and eventually 4,273 unique peptides corresponding to 804 different proteins could be identified after only four LC-MS/MS runs (Fig. 3A, green line, and supplemental Table S8). In total, a higher number of peptides could be identified with less analytical effort and time by the combined approach than by the DDA or directed approach alone. Notably the high performance hybrid LTQ-FT mass spectrometer used for our experiments is very well suited for the combined approach because the ICR cell enables the acquisition of high resolution MS1 data while, in parallel, the LTQ part can collect MS2 data without sacrificing time, resolution, or sensitivity for MS1 data acquisition (Fig. 2C).
To evaluate the intensity distribution of the peptides identified by both DDA alone (Fig. 3A, blue line) and in combination with directed (Fig. 3A, green line) LC-MS/MS, the precursor ion abundances of these peptides were calculated and compared. To obtain an unbiased intensity determination of all identified precursor ions, their base peak intensities were calculated using the Bioworks software tool. As shown in Fig. 3B, most of the peptide ions identified in DDA mode had intensities between 10^6 and 2 × 10^6 counts (Fig. 3B, blue line). By contrast, the peak maximum of the distribution of peptides exclusively identified by the two inclusion list runs is clearly shifted toward precursor ions with lower abundance. The apex of the distribution of the peptides determined by the inclusion list in the combined approach was around 2 × 10^5 to 4 × 10^5 and thus about 5 times less intense than that of the DDA alone strategy (Fig. 3B, red line). In combination with the data obtained from the first two DDA runs, the intensity of peptides determined by the combined approach was much less biased toward high abundance signals and more equally distributed (Fig. 3B, green line). This clearly indicates that in complex peptide mixtures a higher number of peptide ions with low intensities could be identified by directed than by DDA LC-MS/MS approaches.
Reproducibility of Directed LC-MS/MS Sequencing-To evaluate the reproducibility of the directed approach for both feature selection and identification, 200 identified features from each of the five intensity groups (very high to very low) were combined and reanalyzed five times. Overall the selected features spanned more than 3 orders of magnitude in signal intensity. The data are shown in Table I and supplemental Fig. S2. Of the 1,000 features targeted, more than 85% did trigger sequence attempts, and around 70% could be confidently identified in all five LC-MS/MS runs. Furthermore of the 1,000 features on the respective inclusion list, 984 did trigger an MS2 scan event, and 945 could be assigned to the correct peptide sequence in at least one run. Because the directed approach requires each feature to be detected on line, the selection of features for sequencing strongly depends on their signal intensity. Whereas all of the 200 features in the group with the highest signal intensities were selected for MS2 in all five repeat LC experiments, that number decreased to 148 (74%) in the group with the lowest signal intensities (supplemental Fig. S2B). A similar trend was observable for the number of correctly identified peptides albeit with a steeper decline at lower feature intensities. However, the MS2 spectra obtained can be useful for side-by-side comparison of correctly assigned spectra to confirm the feature identity (29) even if the spectral quality is not sufficient for the assignment of a peptide sequence by a database search engine.
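The tallies reported here (features selected in every run, in at least one run, or never) can be computed from per-run sets of feature identifiers. The following sketch assumes such sets are available; the function name and the data layout are hypothetical.

```python
def reproducibility_summary(runs, targeted_ids):
    """Tally how consistently targeted features were hit across repeat runs.

    `runs` is a list with one set of feature ids per LC-MS/MS run, each set
    holding the features that triggered an MS2 scan (or were identified).
    """
    hits = {fid: sum(fid in run for run in runs) for fid in targeted_ids}
    return {
        "in_all_runs": sum(n == len(runs) for n in hits.values()),
        "in_at_least_one_run": sum(n >= 1 for n in hits.values()),
        "never": sum(n == 0 for n in hits.values()),
    }
```

Applying the same function separately to the features of each intensity bin would give the per-bin trend described for the highest and lowest intensity groups.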
Directed Analysis of a Complex Phosphopeptide Mixture-Phosphopeptides, specifically those phosphorylated at serine or threonine residues, are notoriously difficult to identify because of their specific fragmentation patterns in ion trap instruments.
To evaluate the performance of directed sequencing for the identification of phosphopeptides we applied the combined directed approach described above (Fig. 3A, green line) to a sample consisting of phosphopeptides isolated by TiO₂ affinity chromatography (20, 30) from a cytosolic fraction of Kc167 cells. The sample was separated using an RP LC gradient that was extended by 30 min compared with the one used for non-phosphorylated peptides to compensate for the longer MS acquisition times applied for phosphopeptide analysis (MS2 followed by MS3 of the neutral loss peak of −98 Da corresponding to the loss of phosphate). To maximize the number of identified features for a specific number of LC-MS runs, the experiment was designed such that in every run a different set of peptides was selected. Specifically the following protocol was applied. First the sample was analyzed by DDA LC-MS/MS, selecting precursor ions at [M + 2H]²⁺ in the first run and ions at charge states higher than [M + 2H]²⁺ in the second run, respectively. Then all MS1 peaks were extracted from the high resolution MS1 maps obtained in those first two runs and filtered as described above. A total of more than 7,776 features that aligned over both runs were detected (supplemental Table S9) about half of which were already sequenced by the two initial LC-MS/MS runs and excluded from the following directed LC-MS/MS. After generating a series of inclusion lists, the remaining 4,000 features were subjected to directed sequencing in two additional LC-MS/MS experiments. To further increase the number of detected phosphopeptides, the 2,000 most abundant features showing a neutral loss peak but that could not be confidently assigned to a peptide sequence were reanalyzed in one additional LC-MS/MS run using a different MS sequencing method especially designed for phosphopeptide analysis (31).

a Feature intensity bins based on feature area determined by SuperHirn (very high, 2.6 × 10⁹-3 × 10⁷; high, 3.0-1.4 × 10⁷; medium, 1.4 × 10⁷-8.1 × 10⁶; low, 8.1-4.9 × 10⁶; and very low, 4.9 × 10⁶-5.3 × 10⁵).
b The same 1,000 features were targeted in each run, including 200 features of each intensity bin.
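The neutral loss criterion used in the phosphopeptide protocol can be illustrated with a short check on an MS2 peak list. Loss of phosphoric acid (H3PO4, monoisotopic mass 97.9769 Da) shifts a precursor of charge z by 97.9769/z on the m/z axis. The sketch below is an assumption-laden illustration: the function name, the 0.5 m/z tolerance, and the 50% relative intensity cutoff are placeholders and not the instrument settings used in this study.

```python
PHOSPHORIC_ACID_DA = 97.9769  # monoisotopic mass of H3PO4


def has_neutral_loss_peak(precursor_mz, charge, peaks,
                          tol_mz=0.5, min_rel_intensity=0.5):
    """Check an MS2 peak list for a dominant precursor - H3PO4 signal.

    `peaks` is a list of (m/z, intensity) pairs. The tolerance and the
    relative-intensity cutoff are placeholder values for this sketch.
    """
    if not peaks:
        return False
    expected_mz = precursor_mz - PHOSPHORIC_ACID_DA / charge
    base_peak = max(intensity for _, intensity in peaks)
    return any(
        abs(mz - expected_mz) <= tol_mz
        and intensity >= min_rel_intensity * base_peak
        for mz, intensity in peaks
    )
```

Features that pass such a test but yield no confident peptide assignment are the kind of candidates that were collected for reanalysis in the additional run.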
Every directed LC-MS/MS run identified a considerable number of previously unidentified peptides. In total, after five LC-MS/MS runs, more than 1,600 phosphorylation sites could be identified (Table II and supplemental Table S10 and Fig. S3). Interestingly 1,500 (87.2%) of the 1,721 identified peptides carried at least one phosphate group confirming a high specificity for phosphopeptides of the TiO₂ affinity enrichment. Of the 1,628 phosphorylation events detected, the exact site of phosphorylation of 1,204 sites could be determined with a probability of more than 90% (ΔCN > 0.1) (32). The distribution of phosphorylated amino acids was similar to that in other studies (20,33) with most sites found on serine (82.7%) followed by threonine (15.4%) and tyrosine (1.9%) residues. In addition to this, the majority of phosphopeptides identified contained one phosphogroup (92.1%), whereas only 110 (7.3%) and nine (0.6%) were phosphorylated on two or three different residues, respectively. More importantly, compared with the DDA runs, 65% additional protein phosphorylation sites were identified by the three directed LC-MS/MS runs (runs 3-5) of which 107 were identified by reanalyzing features that showed a neutral loss of phosphate (−98 Da) during CID (run 5). Compared with a single DDA LC-MS/MS analysis of this sample that detected a total of 720 phosphorylation sites (supplemental Table S11 The value of the additional information obtained with directed LC-MS/MS can be demonstrated by the increased protein phosphorylation coverage of the Wnt signaling pathway, which is implicated in the genesis of cancer (34,35). Table III shows all eight identified phosphoproteins and their 13 phosphorylation sites that could be assigned to this pathway (36).
Whereas five phosphoproteins and seven phosphorylation sites could be detected by the two initial DDA LC-MS/MS analyses (runs 1 and 2), the three following directed LC-MS/MS runs (runs 3-5) exclusively detected three phosphoproteins and six phosphorylation sites that increased the overall coverage of the pathway by 60 and 85.7%, respectively. For example, phosphorylation of ATP-dependent helicase SWR1 (gene name CG5899) at serines 169, 172, and 841 could be determined with high confidence only by additional directed LC-MS/MS. Although the precursor ion signal intensities for fragment ion spectra acquired by directed sequencing were generally lower than those of the DDA runs, their quality was high. For instance, the complete y-ion series, with the exception of y1 and y2, of the doubly phosphorylated peptide "K2DQVYDpSDDpSDSEMSTK2M" (where pS is phosphoserine) could be assigned from its fragment spectrum (Fig. 4B). Interestingly the precursor ion was not picked for fragmentation in the first two runs, although it is only 5 times less abundant than the highest peak present in the corresponding MS1 spectrum (Fig. 4A). Obviously the sequencing speed of the mass spectrometer was not sufficient to acquire MS2 spectra from the high number of co-eluting peptide ions present in the MS1 spectrum. This agrees very well with our results obtained from the sample consisting of non-phosphorylated peptides (Fig. 3B).

b Numbers calculated from Bioworks 3.2 search results using an in-house software tool (25).

DISCUSSION
This study describes a reproducible, sensitive, and integrated computational and mass spectrometric method for directed sequencing of a high number of peptide ions within complex mixtures. In contrast to concomitant MS1 and MS2 data acquisition during DDA ESI-LC-MS/MS, the directed approach described decouples MS1 and MS2 spectra collection. In a first step, potential peptide ion signals are extracted off line from the pattern generated in initial LC-MS/MS runs. For this task we developed the software tools SuperHirn and Prequips. In a second step, such features were subjected to directed sequencing in subsequent LC-MS/MS runs, and the identified features were automatically mapped back to the list of previously detected features. As we make the software tools developed for this method publicly available and because it uses generic data formats (21,24,37) for which converters for various MS instruments are available (Seattle Proteome Center), the presented approach is applicable to any high performance MS platform that allows direct sequencing by inclusion list. It is important to point out that high resolution scans are required only for MS1 data acquisition and not for MS2 sequencing. Notably hybrid MS instruments, like the LTQ-FT instrument used in this study, are preferred because they provide the unique advantage of parallel MS1 and MS2 spectra acquisition. Therefore, MS2 data can already be obtained in the initial LC-MS/MS runs with minimal impairment of the speed and sensitivity for acquiring high quality MS1 maps. Compared with current DDA LC-MS/MS, the described directed strategy provides several advantages.
First, the off-line feature extraction facilitates the identification of peptides with low intensity precursor ion signals.
Whereas peak selection in DDA LC-MS/MS analysis is based on one single MS1 scan, off-line peptide ion detection allows summing up all isotopic signals of an eluting peptide ion. This simplifies the selection of potential peptide ions based on the distribution of isotopic clusters as well as the determination of charge state and monoisotopic precursor mass. Moreover the alignment of detected peptide ions over multiple runs helps to distinguish between peptide-derived signals and chemical and electronic noise. This significantly improved the generation of a feature list with a high content of peptide ions. As shown above, the directed sequencing of these information-rich features resulted in a higher number of unique peptides identified compared with on-line CID by DDA. It is worth mentioning that the number of MS2 scans acquired by the directed approach was 3 times lower than by DDA LC-MS/MS, considerably reducing the efforts for subsequent data analysis.
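Two of the off-line steps mentioned here, charge state determination from the isotopic cluster and summation of isotopic signals over the elution profile, can be sketched briefly. Neighboring isotopic peaks of an ion with charge z are spaced by about 1.003355/z on the m/z axis (the 13C-12C mass difference divided by the charge). The function names, the maximum charge of 6, and the 0.01 m/z tolerance are assumptions for this example and do not describe how SuperHirn performs these steps.

```python
C13_C12_DELTA = 1.003355  # mass difference between 13C and 12C in Da


def infer_charge(isotope_mzs, max_charge=6, tol_mz=0.01):
    """Infer the charge state from the spacing of isotopic peaks.

    Returns the charge whose expected spacing (1.003355 / z) matches the mean
    observed spacing within `tol_mz`, or None when nothing fits.
    """
    if len(isotope_mzs) < 2:
        return None
    mzs = sorted(isotope_mzs)
    spacing = (mzs[-1] - mzs[0]) / (len(mzs) - 1)
    for z in range(1, max_charge + 1):
        if abs(spacing - C13_C12_DELTA / z) <= tol_mz:
            return z
    return None


def summed_feature_intensity(scans):
    """Sum all isotopic peak intensities of one peptide ion over its elution.

    `scans` holds one list of isotopic peak intensities per MS1 scan.
    """
    return sum(sum(isotope_intensities) for isotope_intensities in scans)
```

Because the summed signal draws on every scan across the elution peak, a feature that is weak in any single MS1 scan can still stand out from noise, which is the advantage over single-scan peak selection described above.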
Second, in directed LC-MS/MS, the CID parameters can be adjusted and optimized according to the feature properties. For instance, a higher number of very intense features could be identified in one LC-MS/MS experiment by applying shorter ion accumulation and scan times, whereas longer gating and scan times were used for low abundance peptide ions. It is important to point out that the directed approach offers the opportunity of reanalyzing all features that could not be assigned to a peptide sequence in the first analysis using optimized MS parameters. As shown by the results obtained in this study, more than 100 additional phosphorylation sites were determined by iterative analysis of phosphopeptides that showed a neutral loss peak but could not be confidently identified on the basis of their MS2 or MS3 spectra using the multistage activation mode for MS/MS data acquisition (31). The application of optimized CID parameters for each feature not only allows for the identification of more peptides but can also be used to confirm peptide ions with questionable identity. Third, the directed approach offers the possibility of selecting for CID precursor ions with specific properties. These include the charge state of a peptide that is most likely to yield an informative fragment ion spectrum, particular patterns after stable isotope labeling (13,15,16,38), characteristic isotope distributions generated by tagging selected functional groups with suitable reagents (39), both isotopic versions of crosslinked peptides (40), and redundant identification of peptides of interest over multiple samples (e.g. time course experiments) (13).
For analyzing complex mixtures, the most important improvement of directed versus DDA LC-MS/MS results from the fact that the directed approach copes with the problem of "undersampling" during LC-MS/MS analysis, meaning that not all precursor ions present in the MS1 scans can be sequenced using DDA-based LC-MS/MS (8-11). This results in the preferred identification of peptides of high abundance (Fig. 3A). Conversely using the directed approach, the sequencing speed of the MS instrument used is no longer a limiting factor with respect to the undersampling problem, virtually allowing any detectable peptide ion in a sample to be MS-sequenced. Indeed a higher number of peptides were identified by using the inclusion list-based approach with most having a precursor intensity 5 times lower than that of the majority of peptides determined by the DDA strategy. The exclusive identification of low intensity peptide ions by directed LC-MS/MS was particularly pronounced for MS1 scans containing large numbers of peptide ions, clearly indicating incomplete sequencing by on-line DDA LC-MS/MS because of the limited MS sequencing speed and a more thorough sequencing of precursor ions by the directed approach.
Nonetheless the number of peptides identified by directed LC-MS/MS decreased rapidly with lower MS1 signal intensities. This can be ascribed to the fact that very low intensity peaks do not reach the detection limit in all replicate runs, show higher mass tolerances, and are consequently difficult to detect and to align across multiple maps. A higher dynamic range in the MS1 scans would be desirable; however, this is limited by the loading capacity of the ICR cell (26,27). Certainly the dynamic intensity range of detected peptide ions can be extended by additional time-consuming sample fractionation steps (5,19), which can be used in combination with the directed approach. More importantly, each feature needs to be detected on line by the MS instrument to trigger MS2. As shown above, changing the MS detection parameters increased this number, but for a considerable fraction of low abundance features detected off line, no MS2 scan was acquired. Therefore, further improvements in on-line monoisotopic peak detection of peptide-derived precursor ions could definitely increase the number of CIDs for low abundance peptide ions.
In conclusion, directed sequencing of non-redundant, high quality features enables the identification of a higher number of peptides with less analytical effort than current DDA LC-MS/MS-based methods. For instance, more than 1,600 phosphorylation sites could be identified using a single dimension reversed phase separation without the need for time-consuming sample prefractionation steps. The implemented software tools used are freely available and compatible with generic, publicly accessible data formats and therefore applicable to most high performance LC-MS (MS1 level) platforms. With the increasing availability of reproducible high performance LC-MS/MS systems and its capability to identify features of interest specifically and redundantly over a wide intensity range, the directed approach presented here is well suited for in-depth and high throughput characterization of complex protein samples and will find wide application in future LC-MS/MS-based proteome studies.