Unique Ion Signature Mass Spectrometry, a Deterministic Method to Assign Peptide Identity

The growing use of selected reaction monitoring (SRM) mass spectrometry in proteomic analyses led us to investigate how to identify peptides by SRM using only a minimal number of fragment ions. By using a computational model of the SRM work flow we computed the potential interferences from other peptides in a given proteome. From these results, we selected the deterministic SRM addresses that contained sufficient information to confer peptide and protein identity that we termed unique ion signatures (UIS). We computationally showed that UIS comprised of only two transitions are diagnostic for >99% of Escherichia coli proteins and >96% of human proteins that possess a sequence-unique peptide. We demonstrated an example of experimental use of UIS using a modified SRM methodology to profile the E. coli tricarboxylic acid cycle from a single injection of cell lysate. In addition, we showed the potential of UIS to form the first functionally orthogonal approach to validate peptide assignments obtained from conventional analyses of MS/MS spectra. The UIS methodology is a novel deterministic peptide identification method for MS/MS spectra based on information content. These robust theoretical assays will have widespread use when integrated with previously collected MS/MS data and conventional proteomics technologies.


Molecular & Cellular Proteomics 8:2051-2062, 2009.
Shotgun proteomic analyses using multidimensional LC/MS/MS show great capacity for rapid protein analysis. This is arguably the most prevalent work flow for high throughput comparative proteomics, utilizing information-dependent acquisition (IDA) to acquire MS/MS triggered by the signals generated from incoming peptides (1-3). Despite the utility and widespread use of this approach, there remain inherent problems including a relatively high level of ambiguous and false peptide assignments (~5%) as well as high numbers of unassigned mass spectra (4-6). The reason for this level of ambiguity stems in part from the non-deterministic nature of the identification algorithms. Without the use of reference standards the only way to know with absolute certainty that a spectrum was generated by a given peptide is for the spectrum to contain a fragment pattern that conclusively demonstrates the presence of each amino acid. Unfortunately this level of coverage is extremely rare in proteomics data.
More recently, selected reaction monitoring (SRM) or multiple reaction monitoring (MRM) mass spectrometry methods have been deployed for proteomic analyses (7-20). This has occurred as proteomics has matured from a discovery-oriented discipline into a more targeted and quantitative field. The method is conventionally conducted using triple quadrupole mass spectrometers where two rounds of mass selection provide excellent fidelity and sensitivity to monitor one or more predetermined target peptides generally in the context of a complex sample such as a cell lysate. Using this approach the mass spectrometer continually monitors the selected precursor ion m/z (Q1) and a subsequent product ion m/z (Q3) from the target analyte. SRM experiments can be used to conduct several rounds of these scans targeting different product ions in an attempt to bolster the confidence that the Q1 → Q3 transitions monitor the intended analyte with fidelity. A key point of contrast with IDA experiments is the need to preselect target analytes for monitoring. This can be achieved by harvesting data from previous discovery-based experiments or by in silico predictions such as MRM-initiated detection and sequencing (MIDAS) (10,12). Regardless, the key underlying principle of SRM in proteomics applications is that the selected set of precursor and product ions contains sufficient information to proxy for the target peptide and thereby its protein of origin. Given that proteomics SRM experiments are conducted with a minimal set of transitions, one must accept that a degree of uncertainty resides in any such assay. To date, the magnitude of this uncertainty has not been studied. This remains a key point even with MS instruments capable of conducting subsequent full MS/MS scans triggered by SRM (e.g. QTrap) as these are lower sensitivity scans that may contain insufficient fragmentation data to conclusively confer peptide identity.
The problem of interference is also present in SRM experiments. To achieve acceptable sensitivity a large Q1 m/z window (±0.3-1.0 m/z) is needed. This in turn allows other peptides with similar Q1 m/z and elution properties to interfere with detection of the desired target. The frequency of these interferences would likely increase as the complexity of the sample increases, creating a greater likelihood of false positives. Clearly this is not an unexpected result as conventional peptide identification strategies utilizing tandem MS result in some false assignments. Therefore, it would be unreasonable to expect that SRM assays that typically utilize fewer product ions than MS/MS experiments would not also encounter similar interference (21).
In this study we investigated the information content of SRM assays and in doing so exposed the potential redundancy. Computational simulations of the experiment enabled us to demonstrate that directed selection of SRM precursor and product ions can avoid the pitfalls of interference by selecting ion combinations that uniquely map to target peptides within the context of the simulation. We used these unique ion signatures (UIS) in a proof of concept study to direct SRM data acquisition for the exclusive detection of enzymes in the Escherichia coli tricarboxylic acid cycle. In addition, given that UIS have been calculated to uniquely define target peptides in the experimental context, we demonstrated the applicability of UIS as an orthogonal validation of peptide identity for traditional MS/MS experiments.

EXPERIMENTAL PROCEDURES
Calculation of Unique Ion Signatures-Protein sequences for the nominated proteomes were downloaded from Swiss-Prot release 56.1. We determined a set of variables used to calculate UIS including: the order of the UIS (the number of Q3 values, UIS|r), use of trypsin for proteolysis, the number of possible missed cleavage sites (one for the proteome-wide calculation and two for the E. coli tricarboxylic acid cycle calculation), variable modifications of methionine, the number of allowed charge states (1+, 2+, and 3+), and the number of heavy isotopes to consider (+1, ..., +5 amu). Using this description all the possible peptides were generated, X was substituted for isoleucine and leucine, and the peptides were then mapped into a set. If the peptide being loaded was already present in the set it was marked as redundant and excluded as a candidate. From this set the peptides that contain no inappropriate cleavage residues, are non-redundant in the proteome, and fall within a 300-2000 m/z domain are candidates for potential UIS addresses. For each candidate peptide, all charge states up to 3+ within a given tolerance (e.g. ±1 m/z) were pooled. From the pooled peptides, the product ions of the candidate peptide are generated (i ions), and all the possible combinations of Q3 m/z were considered. For a UIS|r (r indicating the number of Q3 values) the number of candidate addresses is given by $\binom{i}{r} = i!/(r!\,(i-r)!)$. These candidates are then challenged with all the combinations of product ions for each of the peptides in the pool. These challenge ions are specified for each experiment but may (22). All ions listed are considered with the charge states appropriate for the calculation. Non-unique peptides were removed by determining whether all Q3 values in a combination have a counterpart challenge combination where the ions are within a tolerance (e.g. ±1 m/z) of a candidate combination.
All remaining peptide product ions are considered unique and comprise a UIS consisting of a Q1 value and at least one Q3 value. For an example parameter file see supplemental Table S2.
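As an illustration only, the candidate-generation step described above (digestion, substitution of X for isoleucine and leucine, and exclusion of redundant peptides) might be sketched as follows. The function names, the toy sequences, and the rule that trypsin does not cleave before proline are assumptions of this sketch and are not taken from the software used in this study; modifications, charge states, and the m/z filter are omitted.

```python
def cleave(protein):
    """Split a sequence after K or R (assumed rule: no cleavage before P)."""
    pieces, start = [], 0
    for i, residue in enumerate(protein):
        following = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and following != "P":
            pieces.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        pieces.append(protein[start:])  # C-terminal piece without K/R
    return pieces


def tryptic_peptides(protein, missed_cleavages=1):
    """All peptides spanning up to `missed_cleavages` internal cleavage sites."""
    pieces = cleave(protein)
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def sequence_unique_candidates(proteome, missed_cleavages=1):
    """Return peptides (I/L written as X) that occur only once in the proteome.

    `proteome` is a hypothetical mapping of protein name -> sequence.
    """
    seen, redundant = set(), set()
    for sequence in proteome.values():
        for peptide in tryptic_peptides(sequence, missed_cleavages):
            key = peptide.replace("I", "X").replace("L", "X")
            if key in seen:
                redundant.add(key)  # already loaded: mark as redundant
            seen.add(key)
    return seen - redundant
```

For example, with the two toy sequences `{"P1": "AAIKGGR", "P2": "AALKCCR"}` and no missed cleavages, the peptides AAIK and AALK collapse to the same key (AAXK) and are excluded, leaving only GGR and CCR as candidates.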
Estimating the Likelihood of SRM Redundancy-The E. coli and human proteomes were downloaded from Swiss-Prot release 56.1. A computer program was written to digest the selected proteome in silico, allowing for up to two missed cleavage sites and conditional oxidation of methionine. The peptides were then mapped into a set with the redundant peptides noted. A list of 500 randomly selected proteins for each proteome was then digested in silico, and the sequence-unique peptides were listed. For each of the sequence-unique peptides the charge state was set to 2+, and the b and y ions in the m/z range of 300-2000 were generated. The set of all possible combinations of these ions was created. Each peptide in the proteome with a Q1 or an isotopic Q1 indistinguishable from the candidate's Q1 value, accounting for charges of 1+, 2+, and 3+ and isotopic contributions of up to +5 amu, was used to make sets of b and y series challenge ions. If the SRM ions were present in a set of challenge ions, that combination of SRMs was marked as redundant. For this comparison a Q1 tolerance of ±0.6 m/z and a Q3 tolerance of ±1.0 m/z were used. Once all the possible SRM ion combinations were checked, the probability of choosing a redundant combination was computed by dividing the number of redundant combinations by the total number of SRM ion combinations for that peptide. The average of these probabilities was then computed as an estimate of the expected likelihood of redundancy in SRM analysis.
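The redundancy estimate described above can be sketched for a single candidate peptide as shown below. This is a minimal illustration, not the program used here: the function names are hypothetical, the interfering peptides are assumed to have been selected already by the Q1 criterion, and each interfering peptide is represented simply by a list of its challenge ion m/z values.

```python
from itertools import combinations
from statistics import mean


def is_redundant(combo, challenge_ions, q3_tol=1.0):
    """True if every Q3 in `combo` is matched by some ion of one interfering peptide."""
    return all(
        any(abs(q3 - ion) <= q3_tol for ion in challenge_ions)
        for q3 in combo
    )


def redundancy_probability(candidate_ions, interfering_peptides, r, q3_tol=1.0):
    """Fraction of r-ion combinations that are shared with at least one interferer."""
    combos = list(combinations(candidate_ions, r))
    if not combos:
        raise ValueError("r exceeds the number of candidate product ions")
    redundant = sum(
        any(is_redundant(combo, ions, q3_tol) for ions in interfering_peptides)
        for combo in combos
    )
    return redundant / len(combos)


def expected_redundancy(per_peptide_probabilities):
    """Average the per-peptide probabilities, as done for the sampled proteins."""
    return mean(per_peptide_probabilities)
```

Combinations for which `is_redundant` is false against every interfering peptide are, in the terminology of this work, the UIS of order r for that candidate under the stated tolerances.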
Cell Culture-E. coli K-12 (MG1655) was grown in LB medium to mid-log phase (A600 = 1.2) and collected by centrifugation. The cells were washed with 50 mM Tris/HCl, pH 8.0, and then resuspended in 50 mM ammonium bicarbonate, pH 8.5, supplemented with protease inhibitors. The cells were lysed using a French press operated at 12,000 p.s.i., and then the supernatant was collected following centrifugation at 2000 × g.
Sample Preparation-1 ml of the E. coli lysate was adjusted to 8 M urea in 50 mM ammonium bicarbonate, pH 8.5, and reduced with tris(2-carboxyethyl)phosphine (5 mM) at room temperature for 1 h. Proteins were alkylated in 10 mM IAA for 1 h in the dark. The sample was diluted 1:10 with 50 mM ammonium bicarbonate and then digested with trypsin (20 µg) at 37°C for 18 h. The digest was concentrated and desalted using a 1-ml solid-phase extraction cartridge. Peptides were gravity-loaded onto a pre-equilibrated cartridge, desalted with 5 ml of 0.1% TFA, and then eluted with 5 ml of 80% acetonitrile, 0.1% TFA. Acetonitrile was removed by centrifugal evaporation to reduce the volume of the eluent to ~0.5 ml.
Liquid Chromatography and Mass Spectrometry Analysis-Digested protein samples were analyzed using a 4000 QTrap hybrid triple quadrupole/linear ion trap mass spectrometer (Applied Biosystems, Foster City, CA) operating in positive ion mode. Peptides were separated by nanoflow liquid chromatography using an Eksigent 2D LC system (Eksigent Technologies, Dublin, CA). Digested samples were analyzed by injecting 10 µl of the digest onto a precolumn (Captrap 0.5 × 2 mm, Michrom BioResources Inc., Auburn, CA) for preconcentration with 95:5 mobile phase A:mobile phase B (mobile phase A: 2% (v/v) acetonitrile containing 0.1% (v/v) formic acid; mobile phase B: 80% (v/v) acetonitrile containing 0.1% (v/v) formic acid) at 10 µl/min. Peptides were then separated using a ProteCol C18 column (300 Å, 3 µm, 150 µm × 10 cm; SGE Analytical Sciences, Ringwood, Victoria, Australia). Peptides were eluted from the column using a linear gradient from 95:5 mobile phase A:mobile phase B to 45:55 mobile phase A:mobile phase B over 120 min at a flow rate of 600 nl/min. The LC eluent was subjected to positive ion nanoflow analysis using a NanoSpray II source equipped with a MicroIonSpray II spray head. Column eluent was directed into the MicroIonSpray II spray head via coupling to a distal coated PicoTip fused silica spray tip (360-µm outer diameter, 75-µm inner diameter, 15-µm-diameter emitter orifice; New Objective, Woburn, MA). Samples were analyzed using an ion spray voltage, heater interface temperature, curtain gas flow, and nebulizing gas flow of 2.1 kV, 150°C, 18, and 12, respectively. Collision energy (CE) was determined using the following equation: CE = slope × (m/z) + intercept, where slope = 0.050 and intercept = 5.5 for 2+ precursor ions. MS data were searched against all E. coli entries in the Swiss-Prot database (version 53.2) using MASCOT (Matrix Science, London, UK) allowing for one missed cleavage, alkylation of cysteine (IAA), and oxidation of methionine.
False discovery rates were determined by searching the MS data against a reversed E. coli database.
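For convenience, the linear collision energy relation given above for 2+ precursor ions can be written as a one-line function. The function name is hypothetical; the default slope and intercept are the values stated in the text, and values for other charge states are not given here and would need to be supplied by the user.

```python
def collision_energy(precursor_mz, slope=0.050, intercept=5.5):
    """CE = slope * (m/z) + intercept; defaults are the 2+ values from the text."""
    return slope * precursor_mz + intercept
```

With these defaults, a 2+ precursor at m/z 600 corresponds to a CE setting of approximately 35.5.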
IDA-IDA experiments utilized an enhanced MS survey scan (m/z 350-1500) followed by three data-dependent product ion scans of the three most intense precursor ions. Precursor ions were fragmented a maximum of two times before being excluded for 2 min.
MIDAS-MIDAS experiments were used in an attempt to identify peptides for each protein in the tricarboxylic acid cycle. MRM transitions were designed for each protein in the tricarboxylic acid cycle using the MIDAS Workflow Designer (Version 1.0.0, Applied Biosystems). Enhanced product ion scans (MS/MS) were triggered when individual MRM signals exceeded 300 counts/s. A list of MRM transitions was obtained by taking the amino acid sequence for each protein and theoretically digesting the sequence in silico. MRM transitions included the potential variable modifications of oxidation of methionine and alkylation of cysteine (IAA). MRM transitions with a maximum of two variable modifications per peptide were considered. Q1 values for tryptic peptides (2+ and 3+ charge states and no missed cleavages) between m/z 350 and 1500 were determined by the MIDAS Workflow Designer program, and Q3 values were the first 1+ product y ion above the 2+ or 3+ precursor ion. Precursor ions were fragmented a maximum of two times prior to being excluded for 2 min. MRM experiments were conducted for each protein in the tricarboxylic acid cycle using unit resolution settings for Q1 and Q3.
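The Q3 selection rule stated above (the first singly charged y ion above the precursor m/z) can be illustrated with the following sketch. It is not the MIDAS Workflow Designer implementation: the function names are hypothetical, standard monoisotopic residue masses are assumed, and no variable modifications are applied.

```python
# Standard monoisotopic residue masses (Da); unmodified residues only.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def precursor_mz(peptide, charge):
    """m/z of the intact peptide (Q1) at the given charge state."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def y_ions(peptide):
    """Singly charged y ions y1 .. y(n-1), in order of increasing length."""
    ions, running = [], WATER + PROTON
    for aa in reversed(peptide[1:]):  # exclude the full-length fragment
        running += RESIDUE_MASS[aa]
        ions.append(running)
    return ions


def first_y_above_precursor(peptide, charge=2):
    """Return (Q1, Q3) where Q3 is the first 1+ y ion above Q1, or None."""
    q1 = precursor_mz(peptide, charge)
    for y in y_ions(peptide):
        if y > q1:
            return q1, y
    return q1, None
```

Calling `first_y_above_precursor("LDGLSDAFSVFR", 2)` returns the doubly charged precursor m/z for that peptide together with the smallest singly charged y ion that lies above it.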
UIS-SRM Scanning-UIS-SRM experiments conducted on a 4000 QTrap utilized two SRM transitions (UIS|2 = Q1 → Q3a and Q1 → Q3b) to detect the target peptide. Wherever possible, UIS experiments utilized a primary Q3 value corresponding to the highest intensity product ion that constituted a UIS and a secondary Q3 value corresponding to the second most intense product ion that constituted the UIS for each peptide candidate. Additional scans utilizing UIS other than the first and second most intense product ion pairs were also assessed wherever possible. Overlay of the extracted ion chromatogram of the Q3 product ions indicated detection of UIS. UIS assays were validated by triggering a product ion scan (MS/MS) when individual SRM signals exceeded 300 counts/s.
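One possible way to express the choice of primary and secondary Q3 values is sketched below. The ranking rule used here (order candidate pairs by their more intense ion, then by their less intense ion) is an assumption made for illustration, as are the function name and the input layout: a list of precomputed UIS|2 pairs and a mapping from product ion m/z to a previously observed intensity.

```python
def pick_uis_pair(uis_pairs, intensity):
    """Choose the UIS|2 pair with the most intense ions; return (primary, secondary).

    uis_pairs: iterable of (q3a, q3b) tuples that constitute a UIS.
    intensity: dict mapping Q3 m/z -> observed intensity (missing ions count as 0).
    """
    def pair_rank(pair):
        strong, weak = sorted((intensity.get(q3, 0.0) for q3 in pair), reverse=True)
        return (strong, weak)

    best = max(uis_pairs, key=pair_rank)
    # Primary Q3 is the more intense ion of the chosen pair.
    primary, secondary = sorted(best, key=lambda q3: intensity.get(q3, 0.0), reverse=True)
    return primary, secondary
```

Other rankings, for example the product of the two intensities, would be equally defensible; the sketch only shows that the selection is restricted to ion pairs that already constitute a UIS.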

RESULTS
Computational Simulation to Assess SRM Assay Redundancy-We developed a computational simulation of an SRM experimental work flow as typically conducted on a triple quadrupole MS instrument. First we considered that each protein was present at equal abundance and calculated all possible peptides that would be formed by trypsin digestion (m/z range 300-2000) from a proteome considering charge states of 1+, 2+, and 3+ and allowing for up to two missed cleavages. We next determined the precursor and corresponding product ion (b and y ions) masses that would be generated by CID. Those precursor m/z values within an m/z isolation window defined by a seed peptide were combined into a bin. Isotopic abundance was also taken into account as this causes some peptide isotopes to fall within the isolation mass window. For every bin, each peptide was considered, and its product ions were challenged with the product ions from all other peptides residing in the bin. This process allowed us to calculate SRM assay redundancy for each peptide in the chosen proteome.
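The binning step described above can be sketched as follows. This is an illustration under stated assumptions rather than the simulation code itself: peptides are represented by their neutral monoisotopic masses in a hypothetical dictionary, the spacing between isotopic peaks is approximated as 1.00335 Da, and the default Q1 tolerance of ±0.6 m/z is taken from the redundancy estimate in "Experimental Procedures."

```python
PROTON = 1.007276
ISOTOPE_SPACING = 1.00335  # approximate mass difference between isotopic peaks


def precursor_mzs(neutral_mass, charges=(1, 2, 3), max_isotopes=5):
    """Yield (charge, isotope index, m/z) for each charge state and isotopic peak."""
    for z in charges:
        for k in range(max_isotopes + 1):
            yield z, k, (neutral_mass + k * ISOTOPE_SPACING + z * PROTON) / z


def build_bin(seed_mz, peptide_masses, q1_tol=0.6, mz_range=(300.0, 2000.0)):
    """Names of peptides with any charge/isotope form inside the seed's Q1 window.

    peptide_masses: dict mapping peptide name -> neutral monoisotopic mass (Da).
    """
    members = set()
    for name, mass in peptide_masses.items():
        for _, _, mz in precursor_mzs(mass):
            in_range = mz_range[0] <= mz <= mz_range[1]
            if in_range and abs(mz - seed_mz) <= q1_tol:
                members.add(name)
                break  # one matching form is enough to place the peptide in the bin
    return members
```

Each peptide placed in the bin would then contribute its product ions as challenge ions against the seed peptide, as described in the text.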
To explore potential SRM assay redundancy we randomly sampled 500 E. coli and 500 human proteins (>12,000 peptides/data point), selected sequence-unique peptides, then evaluated the number of redundant SRM assays for each peptide as a function of the number of SRM transitions, and computed the likelihood that a randomly chosen assay would be redundant (Fig. 1). Fig. 1 shows that standard SRM analysis using a single transition (Q1, Q3 pair) for a given peptide had no power to resolve peptide identity when considered in the wider context of a proteome. Even for highly abundant proteins of the E. coli tricarboxylic acid cycle, extracted ion chromatograms (XICs) from SRM transitions showed the presence of multiple peptide signals with high intensity (supplemental Fig. S1). An example is shown in Fig. 2 for the SRM Table I by computationally determining the number of peptides that shared a single transition (Table I). Table I shows 10s to 100s of peptides for each targeted SRM transition. For a further discussion on the issue of ion interference in SRM assays see Sherman et al. (21). Clearly there is a significant level of redundancy for single SRM transitions. A common approach to address the problem is to combine multiple transitions; however, as these are normally selected because of favorable fragmentation without consideration of m/z redundancy, this does not solve the problem. This is illustrated in Fig. 1, which shows that even when combining up to five randomly selected product ions there remains considerable likelihood of assay redundancy.
Unique Ion Signatures Are Non-redundant SRM Assays-Using the simulation described previously we observed many instances where particular combinations of m/z ions were non-redundant (Fig. 3). We term these ion combinations UIS as they map exclusively to a given peptide in a proteome under the defined conditions. We observed that two SRMs (Q1 → Q3a and Q1 → Q3b; together they comprise the UIS (Q1, Q3a, and Q3b) and are hereafter referred to as UIS|2) were necessary and sufficient to define peptide identity in this simulation. These coordinates comprise the set of UIS|2 and provide proteome coverage for the proteins that contain one or more sequence-unique peptides of >99 and >96% for the E. coli and Homo sapiens Swiss-Prot proteomes, respectively, with a Q1 tolerance of ±0.8 amu (Fig. 4A). Interestingly, at this Q1 tolerance there are many UIS|2 per protein (estimated average of 26 in E. coli and 16 in humans) (Fig. 4B). Given that there are numerous UIS per protein and individual peptides may possess multiple UIS, the likelihood of experimentally observing at least one unique proteotypic peptide per protein is favorable.
Evaluation of More Stringent UIS Simulations-The simulation described above considers typical experimental conditions that have been reported in publications to date for SRM work flows. In these experiments, Q3 product ions used in SRM transitions are either y or b ions. Given that gas-phase peptide fragmentation sometimes yields ion species other than y and b ions we evaluated the impact this would have on defining UIS. In this simulation we considered loss and gain of water and ammonia from certain y and b ions, presence of multiply charged product ions, a ions, and peeling ions (22,23). As would be expected, the consideration of additional ions negatively impacted the number of UIS|2 addresses (Fig. 4A). Given that when an additional ion is added, the number of potential UIS addresses scales with the binomial coefficient, we considered whether UIS addresses with three product ions (i.e. UIS|3) would improve proteome-wide coverage when additional ion series were included in the simulation. Fig. 4A shows that UIS|3 addresses restored >99% proteome coverage for proteins containing one or more sequence-unique peptides even when numerous ion series were considered. In fact, the average number of UIS|3 per protein was greater than 1500 for either E. coli or H. sapiens proteomes (Fig. 4B).
UIS Profiling of E. coli Tricarboxylic Acid Cycle-As a practical, proof of principle example of UIS for targeted proteome profiling we analyzed enzymes of the E. coli tricarboxylic acid cycle (Table I). We applied a combination of both IDA and MIDAS acquisition methods to detect tricarboxylic acid cycle peptides (supplemental Table S1) and matched these to an E. coli UIS atlas that was precalculated for each of the 18 tricarboxylic acid cycle target proteins. UIS-SRM assays using UIS|2 were then selected for each protein and combined into a single MS acquisition method, and an aliquot of trypsin-digested E. coli cell lysate was analyzed by LC/MS/MS (Table I). Clear evidence of UIS|2 detection was apparent when the extracted ion chromatograms of each Q3 m/z were overlaid (Fig. 5). As a further confirmation step we used the SRM signal to trigger MS/MS in the 4000 QTrap and searched these data using MASCOT. Fig. 6 displays the combined UIS-SRM scans detecting 13 of the 18 tricarboxylic acid cycle proteins from a single injection of cell lysate. Using this approach enzymes for each step of the tricarboxylic acid cycle were identified by UIS and validated by MS/MS. In this case MS/MS was conducted only as a validation step, although in principle this is a redundant step when utilizing UIS (supplemental Fig. S2). Thus UIS presents a novel identification strategy for triple quadrupole instruments. A key benefit of using an SRM work flow is that data acquisition is faster than in IDA, and sensitivity is greater if MS/MS scans are not required for peptide identification.
FIG. 2. An example of experimental SRM redundancy. The XIC resulting from targeting a single SRM transition to detect LDGLSDAFSVFR from the protein succinate dehydrogenase iron-sulfur subunit is shown. The significant number of peaks results from interference from the sample.
Of the five tricarboxylic acid cycle proteins not observed in our analysis, we did not detect any UIS candidate peptides from SucB, SdhC, and SdhD using either IDA or MIDAS acquisition methods. SdhC and SdhD are small hydrophobic transmembrane proteins that were most likely not extracted given our sample preparation methods. Peptides from FumB and FumC were detected by MIDAS, but the FumB peptides did not possess UIS because these peptide sequences are also present in FumA. The FumC peptide detected by MIDAS contained a single UIS; however, the b6 product ion that constituted the UIS was not detected using the UIS assay nor could it be readily observed in the MS/MS scan. It is important to note that failure to detect some UIS candidate peptides such as FumC is not a flaw of UIS methodology per se but rather a result of poor detection of the necessary Q3 product ion whose intensity is governed by the physicochemical properties of the specific peptide.
UIS for Validation of Peptide Identity from MS/MS-A valuable additional use of UIS is to underpin a functionally orthogonal method to validate peptide assignments obtained from MS/MS spectra. As a proof of concept demonstration we used MS/MS spectra acquired using the Universal Proteomics Standard, a mixture of 48 human proteins, that was analyzed by IDA on a QSTAR XL mass spectrometer and searched with MASCOT using conditions described previously (24). 36 proteins were identified with a p value <0.05 and appropriate ion score. We computed UIS|2 for these 36 proteins and then searched the MASCOT output for the presence of the ions needed to exclusively identify the peptides proposed by MASCOT. Ions that comprise the UIS were detected in 32 of the 36 proteins proposed by MASCOT, providing a facile mechanism to orthogonally validate the MS/MS assignments (Fig. 7). The four proteins that were not confirmed lacked sufficient intensity of the key product ions that were required to validate these assignments using UIS. The inability to validate these four proteins does not necessarily mean an incorrect assignment by MASCOT as this algorithm relies on the presence of numerous product ions unlike UIS that uses the minimal essential set. However, the intersection of UIS validation and MASCOT assignments sets a new standard for compelling evidence of true positive peptide assignments. Additionally in supplemental Fig. S3 we show evidence of using UIS to rescue two assignments from MS/MS spectra that were poorly informative for MASCOT and therefore were assigned poor expectation values by MASCOT. In isolation, these low scoring spectra would be unassigned, but as they intersect with UIS, they should be considered accurate.

DISCUSSION
Computational methods for peptide identification are key to proteomics because of the sheer volume of data generated by experiments. We used computational simulation and provided experimental evidence to show that the undirected selection of SRMs to monitor proteins in a proteome will likely result in a significant percentage of assays with ambiguous results because of interference from non-target peptides that share the same SRMs (Figs. 1 and 2). However, we demonstrated by using curated proteomes generated by experimental investigation that a solution to this predicament is available. Our approach was to use the assumptions made in the simulation as a hypothesis for the content of the proteome. It should be noted that it is important to accurately mirror the experimental conditions in the simulations as they are fundamental to the results of the simulations. Thus, any UIS that are shown to be false indicate discordance between the assumptions and the experiment.
FIG. 4. A, the effect of Q1 tolerance on the percentage of proteins that are addressed by UIS. UIS|1, the red curve, clearly demonstrates that the use of a single transition in a complex mixture is unsuitable for proteome analysis. The addition of a second transition into the same computational context, UIS|2 shown in blue, significantly increases the number of UIS resulting in sufficient coverage of the proteome. When considering a greater set of ions that may interfere with UIS the number of UIS|2 addresses, indicated in yellow, declines. Introduction of a third fragment ion, UIS|3, overcomes this problem leading to sufficient UIS|3 addresses to restore UIS coverage to the entire proteome. Of note is that the order of the UIS (the number of fragment ions) has far greater impact than the Q1 tolerance. B, the mean coverage of UIS per protein in E. coli. The blue line (UIS|2 b&y) shows the mean number of UIS per protein and the impact of Q1 tolerance. The yellow line (UIS|2 all) displays the impact of increasing the number of types of challenge ions, and the green line (UIS|3 all) shows how the numbers recover when the order of the UIS is increased. C, distribution of UIS by protein mass. The figure illustrates that the number of UIS per protein corresponds with the molecular weight of the protein. Interestingly if one were intentionally targeting a lower molecular weight protein a higher order UIS may be desirable, increasing the likelihood that one of the UIS could be experimentally observed and used as an assay.
Computationally UIS occur at surprisingly high frequency in each proteome, enabling detection of at least one peptide in >99 and >96% of the E. coli and human proteomes, respectively. We found that these computations achieve this coverage using only two transitions (UIS|2). A database of UIS, named ProteomeDB, is currently being made available online. There is currently no robust method that accurately predicts product ion intensity; thus we do not have the ability to predict which UIS product ions will be present experimentally. Nonetheless there are various strategies that could be adopted to increase the likelihood of detecting appropriate product ions, including use of simple rules (e.g. selection of proline), sophisticated evaluation of peptide physicochemical properties and predicted fragmentation based on these properties (25-27), use of data repositories (28), or direct empirical methods. One approach would be to select ions based on their membership in multiple UIS, that is, to select the ions that are present in multiple UIS, thus providing a level of redundancy. Additionally if the UIS of a given order (UIS|n) do not provide an observable ion signature then by simply increasing the number of transitions by one (UIS|n+1), the binomial coefficient dictates that approximately an order of magnitude more addresses are generated, likely providing one that is readily observable. To illustrate, a peptide having 20 ions and using two product ions (UIS|2) results in 20 × 19/2 = 190 possible coordinates, or for UIS|3 the result is 20 × 19 × 18/(3 × 2) = 1140 possible coordinates. The significance of these observations is profound for large scale proteome profiling, and given a high predicted frequency of UIS occurrence this raises the likelihood that a significant portion of the proteome will be "MS-observable" using the sensitive detection methodology provided by SRM. Furthermore these MS-observable UIS could be considered a higher order proteotypic peptide as they are not only sequence-unique but are non-redundant in the m/z domain for a given computation.
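The counting argument above follows directly from the binomial coefficient and can be checked with a few lines of code. The function name is hypothetical; `math.comb` is part of the Python standard library (version 3.8 and later).

```python
from math import comb


def uis_address_count(n_ions, order):
    """Number of candidate Q3 combinations for a peptide with n_ions product ions."""
    return comb(n_ions, order)


# Candidate addresses for a peptide with 20 product ions at orders 1 to 4.
counts = {order: uis_address_count(20, order) for order in range(1, 5)}
```

For 20 product ions this gives 190 candidate addresses at order 2 and 1140 at order 3, matching the worked figures in the text. These are candidate addresses only; how many of them are unique depends on the interfering peptides in the simulation.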
We expect that there is considerable utility in using UIS for validation of peptide identities obtained from conventional analysis of MS/MS data. As this approach is functionally orthogonal to conventional probability-based methods it adds confidence to any assignments that are consistent between these approaches. Any lack of concordance between the two methods is not grounds for rejecting the conventional assignment given that numerous product ions are often used in deriving these assignments. Furthermore a higher order UIS might be present in the MS/MS scan. As we demonstrated in Fig. 7, UIS can also be used to interrogate spectra from low scoring assignments that are below reporting criteria thresholds. Provided that the spectra contain UIS, confident peptide assignments can be made for peptides that have non-uniform fragmentation patterns. This may prove of immense value for proteome profiling given that some estimates suggest that up to 50% of all MS/MS spectra are unassigned (6).
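The validation idea discussed above, checking whether an acquired MS/MS spectrum contains every product ion of a UIS, might be sketched as follows. The function name, the peak-list layout (pairs of m/z and intensity), and the intensity threshold are assumptions of this sketch; the default tolerance of ±1.0 m/z mirrors the Q3 tolerance used in the simulations.

```python
def spectrum_contains_uis(peaks, uis_q3, q3_tol=1.0, min_intensity=0.0):
    """True if every Q3 of the UIS has a matching peak above min_intensity.

    peaks: iterable of (mz, intensity) tuples from one MS/MS spectrum.
    uis_q3: the Q3 m/z values that make up the UIS for the proposed peptide.
    """
    peaks = list(peaks)
    return all(
        any(abs(mz - q3) <= q3_tol and intensity > min_intensity
            for mz, intensity in peaks)
        for q3 in uis_q3
    )
```

A spectrum that passes this check for the UIS of the peptide proposed by a search engine would count as concordant between the two approaches; a spectrum that fails it is not thereby shown to be misassigned, for the reasons given above.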
It is important to recognize that the results presented here are only applicable in the context of the simulation, a key parameter of which is the database. If a protein in the database is composed exclusively of peptides found elsewhere in the database (e.g. isoforms, evolutionarily related proteins, etc.) then there are no UIS for those proteins. Additionally giving equal consideration to the presence of each peptide in the proteome is a key variable that penalizes some ion combinations from obtaining UIS status. Clearly this does not truly represent the in vivo situation; yet without an accurate method to account for abundance levels and expression patterns relevant to the sample, this variable cannot be reduced. We have previously considered the effect of using LC retention time to overcome SRM assay redundancy (21). Additionally here we conducted an "order-of-magnitude" analysis to compare the power of (i) LC retention time or (ii) use of an additional SRM Q3 product ion to eliminate assay redundancy (supplemental analysis). Several assumptions were made regarding peptide distribution in the LC time and m/z domain; each of these assumptions was made to favor LC retention time, i.e. the use of a uniform peptide distribution. This analysis indicates that use of LC retention time is 30 times less likely to eliminate redundancy than the use of an additional Q3 product ion. That is, a UIS|2 plus retention time is an order of magnitude less effective than using a UIS|3 without retention time. If one desired to include peptide retention time to reduce redundancy this does provide benefit but may be difficult to accurately predict. Nonetheless as LC separation is an integral component in proteomic analysis a rapid path to UIS implementation might involve the use of LC retention time coupled with appropriate MS/MS reference libraries and possibly isotopic reference peptides for optimum robustness.
FIG. 6. [...] Table I. Each UIS|2 is indicated by black dots above the paired co-eluting peak. C, barcode representation of the E. coli tricarboxylic acid cycle obtained by UIS scans in B. The representation was calculated as a function of the product of Q3a and Q3b ion intensities for each UIS. Colored bars correspond to peptides detected by UIS in B.
FIG. 7. Validation of MASCOT-assigned identities by UIS. A-C show examples where a MASCOT identity was assigned and then used to retrieve the corresponding UIS for that peptide. On the right is the list of proteins that MASCOT identified that were validated by UIS.
There are three primary components to MS-based peptide identification, namely 1) signal, 2) noise, and 3) information content. The main thrust of the current work addresses information content. Further development of MS-based peptide identification would benefit from decoupling signal from noise for which many possible solutions could be adapted from the field of signal analysis (29,30). Optimized methods to deal with noise will provide added confidence in UIS identification and are an important future direction that will likely need instrument-specific solutions.