*mixture* tandem mass (MS/MS) spectrum. These spectra tend to elude current computational tools because of the ubiquitous assumption that each spectrum is generated from only one peptide. Therefore, tools that consider multiple peptide matches to each MS/MS spectrum can potentially improve the relatively low spectrum identification rate often observed in proteomics experiments. More importantly, data-independent acquisition protocols *promoting* the cofragmentation of multiple precursors are emerging as alternative methods that can greatly improve the throughput of peptide identifications, but their success also depends on the availability of algorithms to identify multiple peptides from each MS/MS spectrum. Here we address a fundamental question in the identification of mixture MS/MS spectra: determining the statistical significance of multiple peptides matched to a given MS/MS spectrum. We propose the MixGF generating function model to rigorously compute the statistical significance of peptide identifications for mixture spectra and show that this approach improves the sensitivity of current mixture spectra database search tools by ≈30–390%. Analysis of multiple data sets with MixGF reveals that in complex biological samples the number of identified mixture spectra can be as high as 20% of all the identified spectra and the number of unique peptides identified only in mixture spectra can be up to 35.4% of those identified in single-peptide spectra.


*peptide* that generated the spectrum? However, it is increasingly being recognized that this assumption that each MS/MS spectrum comes from *only one* peptide is often not valid. Several recent analyses show that as many as 50% of the MS/MS spectra collected in typical proteomics experiments come from more than one peptide precursor (

*e.g.* low reproducibility (


*i.e.* a PSM score of 50 may be good for one spectrum but poor for a different spectrum). Ideally, a scoring function will give high scores to all true PSMs and low scores to false PSMs regardless of the peptide or spectrum being considered. However, in practice, some spectra may receive higher scores than others simply because they have more peaks or their precursor mass results in more peptide candidates being considered from the sequence database (

*pair* of peptides matched from the database to a given mixture spectrum M (*i.e.* the significance of the top peptide–peptide spectrum match (PPSM)). As such, MixGF determines the probability that a random pair of peptides (out of all possible peptides within parent mass tolerance) will match a given mixture spectrum with a score at least as high as that of the top-scoring PPSM.

*p* values are on the order of $10^{-9}$ or lower and thus there is typically a lack of sufficient data points to accurately model the tail of the score distribution (

*et al.*(

*et al.*(

## MATERIALS AND METHODS

### Spectral Probability for a Mixture Spectrum

*P* matched to a spectrum *S* with score *T* is determined by the probability that a random peptide *R* (out of all possible peptides), when matched to *S*, has a score greater than or equal to *T*: $\Pr(\mathit{Score}(R, S) \ge T)$, where $\mathit{Score}(R, S)$ is a scoring function for a peptide-spectrum match. From here on, we will refer to this as the *Single-peptide probability* in order to distinguish it from the other definitions introduced below. Analogously, to compute the statistical significance of a particular peptide pair (*P*, *Q*) matched to a mixture spectrum (*M*) with a score of *T*, we are interested in two statistical questions: (1) *Joint probability* ≡ $\Pr(\mathit{Score}(R_1, R_2, M) \ge T)$: the probability that a random peptide pair ($R_1$, $R_2$) (out of all possible peptide pairs), when matched to *M*, yields a score greater than or equal to *T*; and (2) *Conditional probability* ≡ $\Pr(\mathit{Score}(R_1, R_2, M) \ge T \mid R_1 = P)$: given a peptide *P*, the probability that a random peptide $R_2$ (out of all possible peptides) together with *P*, when matched to *M*, yields a score greater than or equal to *T*.

Intuitively, a peptide–peptide spectrum match (PPSM) can fall into three categories: (1) *Correct match*: both peptides are correct matches; (2) *Half-correct match*: one peptide is correct and the other peptide is an incorrect match; and (3) *Incorrect match*: both peptides are incorrect matches. We are interested in separating the correct matches from incorrect and half-correct matches. The definitions above address this question in two steps. The joint probability assesses the chance that two random peptides have the same or higher score than a given match. When this probability is very low, at least one peptide is a statistically significant match to the spectrum (*i.e.* it is a correct or half-correct match). Once we assume that at least one peptide is a true match, the conditional probability assesses whether the second peptide is also a statistically significant match (*i.e.* correct matches). In summary, one is looking for PPSMs with both a low joint probability and a low conditional probability.

### Scoring Function for Mixture Spectrum

*V* = $\nu_1 \ldots \nu_N$ with *N* elements, where $\nu_i$ is the sum of the intensities of all the peaks with mass between *i* − 0.5 and *i* + 0.5, and the parent mass is defined as the sum of the masses of all amino acids in the peptide that generated the spectrum. A prefix residue mass (PRM) spectrum is a transformation of an MS/MS spectrum into a scored version *S* = $s_1 \ldots s_N$ using a probabilistic model as described before (

*i* of the PRM spectrum is a score $s_i$ that represents the log-likelihood that the peptide from which the spectrum was generated contains a prefix mass *i* (

*P*, its prefix masses are defined by the amino acid masses for each peptide prefix. For a peptide *P* of length *n* with prefix masses $p_1 \ldots p_n$, we define its parent mass as $p_n$, and the score of matching peptide *P* to a spectrum is the sum of all the scores at its theoretical prefix masses in the PRM spectrum:

$$\mathit{Score}(P, S) = \sum_{i=1}^{n} s_{p_i}$$

*P* to *S*, the precursor charge state for *S* is determined such that the parent mass of *P* is equal to that of *S* within the specified mass error tolerance.
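As a rough illustration of the single-peptide scoring rule described above, the sketch below scores a peptide against a PRM spectrum by summing the entries at its prefix masses. The residue-mass table (nominal integer masses for five amino acids), the list-based PRM representation, and the function names are assumptions made for this example; they are not taken from the MixGF implementation.

```python
from typing import List, Sequence

# Nominal (integer) residue masses in Da for a few amino acids.
# Illustrative subset only; a real search would cover all 20 residues.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99}


def prefix_masses(peptide: str) -> List[int]:
    """Return the cumulative residue masses p_1..p_n; p_n is the parent mass."""
    masses: List[int] = []
    total = 0
    for residue in peptide:
        total += RESIDUE_MASS[residue]
        masses.append(total)
    return masses


def score_peptide(peptide: str, prm: Sequence[float]) -> float:
    """Sum the PRM scores s_i at every prefix mass of the peptide.

    prm[i] holds the score s_i for integer mass bin i. Prefix masses that
    fall beyond the end of the spectrum contribute nothing, which is an
    assumption of this sketch rather than a documented rule.
    """
    return sum(prm[m] for m in prefix_masses(peptide) if m < len(prm))
```

With a toy PRM spectrum that has scores 2.0, 1.5, and 3.0 at bins 57, 128, and 215, the peptide `GAS` hits all three bins, whereas `GSA` misses the middle one and scores lower.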

*M*, we construct two PRM spectra, $M_H$ and $M_L$, each generated using the corresponding scoring models for high- and low-abundance peptides present in a mixture spectrum. As shown in MixDB (

*M*) against a pair of peptides (*P*, *Q*) we assume that the first peptide (*P*) is the high-abundance peptide. Thus, the score of a pair of peptides (*P*, *Q*) against a mixture spectrum *M* will be the sum of scoring *P* with $M_H$ and scoring *Q* with $M_L$:

$$\mathit{Score}(P, Q, M) = \mathit{Score}(P, M_H) + \mathit{Score}(Q, M_L)$$

*P* is the same as a prefix mass of *Q*, only the bin with the higher score is considered and the other peptide gets a score of zero for that particular mass position:
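The pair-scoring rule, including the handling of shared prefix masses, can be sketched as follows. This is a minimal illustration under stated assumptions: the two PRM spectra are represented as dictionaries from integer mass bin to score, missing bins are treated as zero, and the function name `score_pair` is hypothetical.

```python
from typing import Dict, Iterable


def score_pair(prefixes_p: Iterable[int], prefixes_q: Iterable[int],
               m_high: Dict[int, float], m_low: Dict[int, float]) -> float:
    """Score peptides P and Q against the two PRM spectra of a mixture.

    P is scored with m_high (high-abundance model) and Q with m_low
    (low-abundance model). At a prefix mass shared by both peptides only
    the higher of the two bin scores is counted.
    """
    p_set, q_set = set(prefixes_p), set(prefixes_q)
    shared = p_set & q_set
    # Bins hit by only one of the two peptides count in full.
    total = sum(m_high.get(m, 0.0) for m in p_set - shared)
    total += sum(m_low.get(m, 0.0) for m in q_set - shared)
    # Shared bins: keep the higher score, the other peptide gets zero there.
    total += sum(max(m_high.get(m, 0.0), m_low.get(m, 0.0)) for m in shared)
    return float(total)
```

When the two peptides share no prefix mass, the result reduces to the plain sum of the two single-peptide scores.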

### Computing Spectral Probabilities

$J_M$ be a three-dimensional dynamic programming matrix where each element $J_M(p, q, T)$ stands for the joint probability that a pair of peptides *P*, *Q* with parent masses *p* and *q* matches *M* with a score greater than or equal to *T*. This means that *P* matches $M_H$ up to the *p*-th bin and *Q* matches $M_L$ up to the *q*-th bin. The following recurrence can then be used to compute the joint probability:

*a*, $a_1$, and $a_2$ denote amino acids; *mass*(*a*) denotes the mass of an amino acid; *prob*(*a*) denotes the probability that a particular amino acid occurs in a peptide; and recall that $M_H$ and $M_L$ are the PRM spectra defined in the previous section. When considering all possible peptide sequences this probability is uniform and has a value of $1/20$ for each of the 20 standard amino acids. To better reflect the amino acid composition observed in real protein sequences, we can also define this probability by computing the frequency of each amino acid in the protein sequence database against which the spectra are searched. To start the computation of the recurrence, we initialize $J_M(0, 0, 0) = 1$ and $J_M(p, q, s) = 0$ for all entries where *p* or *q* is smaller than the smallest mass of an amino acid or *s* is less than zero.
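The two choices for *prob*(*a*) mentioned above, uniform or estimated from the searched database, could be computed along the following lines. The function name and the decision to ignore non-standard residue letters are assumptions of this sketch.

```python
from collections import Counter
from typing import Dict, Iterable

STANDARD_AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_probabilities(proteins: Iterable[str],
                             uniform: bool = False) -> Dict[str, float]:
    """Return prob(a) for each of the 20 standard amino acids.

    With uniform=True every residue gets 1/20. Otherwise the probability is
    the residue's relative frequency in the given protein sequences; letters
    outside the standard alphabet (e.g. 'X') are skipped.
    """
    if uniform:
        return {aa: 1 / 20 for aa in STANDARD_AMINO_ACIDS}
    counts = Counter(
        residue
        for sequence in proteins
        for residue in sequence
        if residue in STANDARD_AMINO_ACIDS
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found in the input sequences")
    return {aa: counts[aa] / total for aa in STANDARD_AMINO_ACIDS}
```

Either dictionary can then serve as the per-residue weight in a dynamic programming recurrence over prefix masses.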

*P*, *Q*) matched to a spectrum *M* with score *T*, we define that peptides *P* and *Q* contribute $T_P$ and $T_Q$ to the total score, respectively. Assuming that peptide *P* was matched to *M*, we define a two-dimensional dynamic programming matrix $C_M$ where each element $C_M(q, T \mid P)$ represents the conditional probability that a peptide with parent mass *q* together with *P* matches *M* with a score greater than or equal to *T*. To compute this probability, we first modify $M_L$ by setting all the bins corresponding to a prefix mass of *P* to zero if $M_H$ has a higher score at the same location. Then the conditional probability can be computed using the following recursion:

$C_M(0, T_P \mid P)$. The base case starts at score $T_P$ rather than zero because the first peptide *P* already contributes $T_P$ to the total score.
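The recursion itself is not reproduced in this text, so the following is only one plausible reading of the procedure, written as a sketch under explicit assumptions: PRM scores and masses are integers, the table stores the probability of reaching each exact score (a tail sum then gives the probability of a score of at least *T*), and all names are hypothetical. It follows the two steps stated above: zero the $M_L$ bins at prefix masses of *P* where $M_H$ scores higher, then run a one-peptide dynamic program whose base case starts at $T_P$.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence


def conditional_probability(m_high: Sequence[int], m_low: Sequence[int],
                            prefixes_p: Iterable[int], t_p: int,
                            q_mass: int, threshold: int,
                            residue_mass: Dict[str, int],
                            residue_prob: Dict[str, float]) -> float:
    """Probability that a random peptide of parent mass q_mass, together
    with the fixed peptide P, reaches a total score >= threshold.

    m_high and m_low are indexed by integer mass bin and must both cover
    bins 0..q_mass. Residue masses must be positive integers.
    """
    # Step 1: zero the M_L bins at prefix masses of P where M_H scores higher.
    low = list(m_low)
    for m in prefixes_p:
        if m < len(low) and m_high[m] > low[m]:
            low[m] = 0

    # Step 2: table[m][s] is the probability that a random prefix of mass m
    # has accumulated exactly score s, starting from P's contribution t_p.
    table: List[Dict[int, float]] = [defaultdict(float) for _ in range(q_mass + 1)]
    table[0][t_p] = 1.0
    for m in range(1, q_mass + 1):
        for aa, aa_mass in residue_mass.items():
            if aa_mass > m:
                continue
            for s, pr in table[m - aa_mass].items():
                table[m][s + low[m]] += residue_prob[aa] * pr

    # Step 3: tail sum over all scores at or above the threshold.
    return sum(pr for s, pr in table[q_mass].items() if s >= threshold)
```

The running time is proportional to the parent mass times the number of distinct scores times the alphabet size, which is consistent with the statement that the conditional probability can be computed far more cheaply than a table over two parent masses.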

*P*, *Q*) matched to a spectrum *M*, we compute their respective single-peptide probabilities, and the peptide with the lower (*i.e.* statistically more significant) single-peptide probability is designated as the first peptide. The dynamic programming method described above assumes that peptide fragment ions have integer masses. However, this is not appropriate for data sets with high mass accuracy in the MS/MS spectra. The details of how to extend this method for high mass accuracy data are described in the Supplementary Material. The current implementation of MixGF considers the set of all unmodified peptides or peptide pairs when computing the conditional and joint probabilities; however, as shown in unpublished work MSGF+ (

### Approximating Joint Probability

*e.g.* quadratic for two peptides), making it difficult to generalize to cases with more than two peptides. Thus, it is desirable to find a way to efficiently approximate this probability. To derive this approximation we borrow an intuition from the definition of conditional probability, where the joint probability of two random events ($R_1$, $R_2$) is equal to the probability of one event times the conditional probability of the second event given the first event:

$$\Pr(R_1, R_2) = \Pr(R_1)\,\Pr(R_2 \mid R_1)$$

$\Pr(\mathit{Score}(R_1, M) \ge T_P)$ of finding a random peptide $R_1$ that matches *M* with a score equal to or better than $T_P = \mathit{Score}(P, M)$? and (2) once we find a first peptide *P*, what is the probability $\Pr(\mathit{Score}(R_1, R_2, M) \ge T \mid R_1 = P)$ of finding a random peptide $R_2$ that together with *P* scores equal to or higher than *T* when matched to *M*? Note that the first question is just the single-peptide probability and the second question is the conditional probability. Therefore, we can define the following approximation:

$$\Pr(\mathit{Score}(R_1, R_2, M) \ge T) \approx \Pr(\mathit{Score}(R_1, M) \ge T_P) \times \Pr(\mathit{Score}(R_1, R_2, M) \ge T \mid R_1 = P)$$

*Product probability*. This formulation is not exactly equivalent to the definition of joint probability because it fixes $R_1 = P$ in the conditional probability term (where *P* is the first peptide in the PPSM) and thus does not explicitly consider the dependences between all possible *pairs* of peptides that can be matched to the mixture spectrum. However, both the single-peptide probability and the conditional probability can be computed efficiently in linear time, and we show in the next section that this approximation is sufficiently accurate for our main use of the joint probability, namely to separate correct from incorrect matches to mixture spectra.

### Classification of Matches

*M*: (1) *No-match*: *M* does not match any peptide in the database; (2) *Single-peptide match*: *M* matches one peptide in the database; and (3) *Mixture match*: *M* matches a pair of peptides in the database. Every query spectrum is initially assumed to be a putative mixture spectrum and is assigned to its top-scoring PPSM. Then a two-step procedure is used to separate true mixture matches from false mixture matches. At the first stage, all PPSMs with joint probability less than a threshold are accepted. Then PPSMs with conditional probability less than a second threshold are accepted as Mixture matches. The probability thresholds are chosen to enforce a selected false discovery rate (FDR, see next section). Next, all the remaining spectra that do not pass either probability threshold are reconsidered as single-peptide spectra. Each PPSM is converted into a PSM by considering the first peptide as the match to the spectrum. Single-peptide probabilities are computed for all PSMs and a probability threshold is determined to enforce a selected FDR for single-peptide spectra. A graphical illustration of this classification procedure is provided in Fig. 1.
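The decision flow of this classification procedure can be summarized in a few lines. The sketch assumes that the three probabilities have already been computed for the top-scoring PPSM and that the thresholds come from the FDR step; it also reads "do not pass either probability threshold" as "fail at least one of the two mixture stages". The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TopMatch:
    """Probabilities attached to the top-scoring PPSM of one spectrum."""
    joint_p: float        # joint (or product) probability of the pair
    conditional_p: float  # conditional probability given the first peptide
    single_p: float       # single-peptide probability of the first peptide


def classify(match: TopMatch, joint_threshold: float,
             conditional_threshold: float, single_threshold: float) -> str:
    """Return 'mixture', 'single', or 'none' for one query spectrum."""
    # Stage 1 and 2: both mixture thresholds must be met.
    if (match.joint_p < joint_threshold
            and match.conditional_p < conditional_threshold):
        return "mixture"
    # Fallback: treat the first peptide alone as a candidate PSM.
    if match.single_p < single_threshold:
        return "single"
    return "none"
```

A half-correct match typically has a low joint probability but a high conditional probability, so it falls through to the single-peptide branch, where only its first peptide is evaluated.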

### Estimation of False Discovery Rates

*TT* to be the number of PPSMs where both peptide matches are from the target database; *TD* or *DT* to be the number of cases where one peptide is from the target and the other peptide is from the decoy database; and *DD* to be the number of cases where both peptides are from the decoy database, the two FDRs mentioned above can be computed using the following formulae:

and

where *T* is the number of PSMs from the target database and *D* is the number of PSMs from the decoy database. $FDR_{Single}$ enforces the FDR for Single-peptide matches, whereas $FDR_{Joint}$ and $FDR_{Conditional}$ enforce the FDR for Mixture matches. All three FDRs operate at the PSM level (a Mixture match is essentially treated as two PSMs; see Supplementary Material). Therefore, to enforce a global FDR of 1% for all matches returned by MixGF, all three FDR thresholds were set to 1%.
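The FDR formulae are not reproduced in this text. As a hedged illustration of how a probability threshold can be chosen for the single-peptide case only, the sketch below uses the common target-decoy estimate *D*/*T* (decoy PSMs divided by target PSMs among the accepted matches). Treating this as the single-peptide estimator is an assumption, the function name is hypothetical, and the joint and conditional estimators based on *TT*, *TD*, *DT*, and *DD* are deliberately not guessed at here.

```python
from itertools import groupby
from typing import Iterable, Optional, Tuple


def threshold_at_fdr(psms: Iterable[Tuple[float, bool]],
                     max_fdr: float) -> Optional[float]:
    """Find the largest probability cutoff whose estimated FDR is <= max_fdr.

    psms holds (probability, is_decoy) pairs; lower probabilities are more
    significant. PSMs with probability <= the returned cutoff are accepted.
    Returns None when no cutoff satisfies the bound.
    """
    best: Optional[float] = None
    targets = decoys = 0
    ordered = sorted(psms, key=lambda item: item[0])
    # Process ties together so that the cutoff never splits equal probabilities.
    for prob, group in groupby(ordered, key=lambda item: item[0]):
        for _, is_decoy in group:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            best = prob
    return best
```

The same scan could be repeated for each of the three probability types to obtain the three 1% thresholds mentioned above, provided the appropriate estimator is substituted for `decoys / targets`.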

### Data sets and Data Processing

*Yeast data set* (

*Saccharomyces cerevisiae* that was analyzed on an LTQ Orbitrap XL mass spectrometer (Thermo Fisher Scientific), and MS/MS spectra were acquired using a data-dependent scanning mode in which each full MS scan (*m*/*z* 300–2000) was acquired on the Orbitrap at resolution 60,000, followed by eight MS/MS scans collected on the LTQ. Two Human data sets were also analyzed. The *Human-L* data set (

*m*/*z* 350–1500 with a resolution of 60,000. The two most intense ions were fragmented in the linear ion trap using CID and ETD. The *Human-H* data set (

*i.e.* same peptides with different charge states are counted as different ions), whereas the human spectral library contains 343,301 unique peptide ions. The protein sequence databases used were the SGD yeast protein database (*ver. 5/8/2009*) and the Human protein database (downloaded from NCBI RefSeq, *ver. 10/29/2010*).

## RESULTS

### Separating True and False Mixture Spectrum Matches

*a priori* the peptides that generated each simulated mixture spectrum, we can extract the top-scoring correct, half-correct, and incorrect matches returned by MixDB and compute their joint and conditional probabilities. As shown in Fig. 2*A*, joint probability performs very well when separating correct matches from incorrect matches, but there is considerable overlap between the joint probability of correct matches and that of half-correct matches (see Fig. 2*B*). Further investigation of cases in the overlap region shows that for correct matches both peptides usually contribute moderate scores to the final combined score, whereas for half-correct matches the correct peptide often contributes a very high score, so that even when it is paired with an incorrect match the resulting combined score still yields a low joint probability. Intuitively, in order to separate half-correct matches from correct matches we need to look for cases that have a high combined score and in which both peptides contribute significantly to the total score. The concept of conditional probability defined above aims to address exactly this question: is the score of the peptide pair (*P*, *Q*) significantly higher than that of the single peptide *P*? As illustrated in Fig. 2*C*, conditional probability is indeed better at separating correct matches from half-correct matches. Therefore, a two-step procedure is used to separate correct matches from false matches: at the first stage of MixGF, joint probability is used to filter out incorrect matches, and then conditional probability is used to filter out half-correct matches.

### Approximating Joint by the Product of Conditional Probability

*P* is fixed, there are fewer opportunities for false positive matches to achieve high scores and thus the resulting spectral probability can be smaller in such cases. However, the range of probabilities where this underestimation occurs is well below the range where incorrect matches tend to occur. Therefore, for the purpose of separating correct matches from incorrect matches, using the approximation is nearly equivalent to computing the exact joint probability. As shown in Fig. 3*B*, correct matches and incorrect matches remain very well separated using either the product or the joint probability. In addition, the product probability can be computed much more efficiently than the joint probability. In practice the average run time for joint probability is 205.7 s per PPSM, whereas the average time to compute the product probability is only 0.12 s on a Windows 7 machine with an Intel Xeon E5430 CPU, a ≈1700-fold speedup. This puts MixGF computation time on a scale similar to that of searching for the top-scoring PPSMs with MixDB, which takes 0.49 s per spectrum on the Yeast data set and 1.08 s on the Human data sets with a 3.0 Th precursor mass tolerance.
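For reference, the quoted speedup follows directly from the two average run times reported above:

$$\frac{205.7\ \text{s}}{0.12\ \text{s}} \approx 1714 \approx 1.7 \times 10^{3}$$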

### Joint and Product Probability Improve the Detection of Mixture Spectra

*i.e.* worse) than those for correct matches to single-peptide spectra. This is because the presence of a second peptide in mixture spectra allows more peptides to match the spectrum with a high score. However, for false matches the single-peptide probability distribution remains comparable for both single-peptide and mixture spectra because they are random matches in either case. Therefore, the distributions of single-peptide probabilities for correct and incorrect matches should be less well separated for mixture spectra than for single-peptide spectra. To show this we used the simulated mixture spectra where the first peptide is mixed with a second peptide at 100%, 50%, and 30% of the first peptide's total intensity and then computed the single-peptide probability, joint probability, and product probability for the correct matches as well as the top-scoring incorrect matches. The performance of each probability function in separating correct from incorrect matches is shown in Table I. As expected, when the second peptide is at relatively low abundance (*i.e.* 30%), the FDR-controlled performance of single-peptide probability is nearly identical to that of joint probability because the mixture spectra are more similar to single-peptide spectra than at higher second-peptide abundances. However, as we increase the relative abundance of the second peptide, joint probability performs considerably better at separating correct matches from incorrect matches. Thus, we expect that as mixture spectra with more peptides become more common in experiments, joint probability and its product probability approximation will substantially improve our ability to identify mixture spectra.

| Mixture coefficient | Probability | FDR 1% | FDR 2% | FDR 5% |
|---|---|---|---|---|
| α = 1.0 | Single-peptide probability | 71.9 | 74.7 | 81.8 |
| α = 1.0 | Joint probability | 93.4 | 94.7 | 96.6 |
| α = 1.0 | Product probability | 93.6 | 94.0 | 96.7 |
| α = 0.5 | Single-peptide probability | 85.7 | 87.5 | 92.0 |
| α = 0.5 | Joint probability | 93.5 | 94.3 | 96.0 |
| α = 0.5 | Product probability | 92.6 | 94.1 | 96.1 |
| α = 0.3 | Single-peptide probability | 89.4 | 90.8 | 92.8 |
| α = 0.3 | Joint probability | 90.2 | 91.8 | 93.8 |
| α = 0.3 | Product probability | 90.4 | 91.7 | 93.8 |

### Identification of Mixture Spectra in Complex Biological Samples

| Data set | Method | Identified spectra: single | Identified spectra: mixture | Identified spectra: total | Identified peptides: single | Identified peptides: mixture | Identified peptides: total |
|---|---|---|---|---|---|---|---|
| Yeast | ProbIDtree | 21807 | 504 | 22311 | 4826 | 495 | 4936 |
| Yeast | MixDB | 25033 | 748 | 25778 | 5702 | 895 | 5924 |
| Yeast | MixGF | 28022 | 1320 | 29342 | 6315 | 1398 | 6637 |
| Yeast | MSGFDB | 26657 | n/a | 26657 | 5752 | n/a | 5752 |
| Yeast | M-SPLIT* | 28417 | 2053 | 30470 | 5997 | 2033 | 6684 |
| Human-L | ProbIDtree | 28614 | 1433 | 30036 | 8479 | 1675 | 9153 |
| Human-L | MixDB | 38855 | 5420 | 44275 | 13021 | 5735 | 15298 |
| Human-L | MixGF | 39701 | 7052 | 46783 | 13027 | 6982 | 16080 |
| Human-L | MSGFDB | 46137 | n/a | 46137 | 14027 | n/a | 14027 |
| Human-L | M-SPLIT* | 49585 | 8425 | 58023 | 16504 | 8300 | 19826 |
| Human-H | ProbIDtree** | – | – | – | – | – | – |
| Human-H | MixDB | 34790 | 5395 | 40185 | 10317 | 4325 | 12350 |
| Human-H | MixGF | 35760 | 8462 | 44222 | 10280 | 6707 | 13824 |
| Human-H | MSGFDB | 46674 | n/a | 46674 | 12202 | n/a | 12202 |
| Human-H | M-SPLIT* | 45680 | 10935 | 56615 | 12447 | 7988 | 16363 |

## DISCUSSION

*one-peptide-one-spectrum* assumption when designing the next generation of computational tools for identifying MS/MS spectra, as instruments advance and higher-complexity samples are analyzed in less time in proteomic experiments. This also illustrates the potential of emerging data acquisition protocols (

## REFERENCES

- Large-scale analysis of the yeast proteome by multidimensional protein identification technology. *Nat. Biotechnol.* 2001; 19: 242-247
- A high-quality catalog of the Drosophila melanogaster proteome. *Nat. Biotechnol.* 2007; 25: 576-583
- Mass spectrometry-based proteomics. *Nature.* 2003; 422: 198-207
- More than 100,000 detectable peptide species elute in single shotgun proteomics runs but the majority is inaccessible to data-dependent LC MS/MS. *J. Proteome Res.* 2011; 10: 1785-1793
- Precursor-ion mass re-estimation improves peptide identification on hybrid instruments. *J. Proteome Res.* 2008; 7: 4031-4039
- Detection of co-eluted peptides using database search methods. *Biol. Direct.* 2008; 3: 27
- Quantifying the impact of chimera MS/MS spectra on peptide identification in large-scale proteomics studies. *J. Proteome Res.* 2010; 9: 4152-4160
- Peptide identification by database search of mixture tandem mass spectra. *Mol. Cell. Proteomics.* 2011; (10 10.12, M111-010017)
- Identification of tryptic peptides from large databases using multiplexed tandem mass spectrometry: simulations and experimental results. *Proteomics.* 2003; 3: 1279-1286
- Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra. *Nat. Methods.* 2004; 1: 39-45
- UPLC/MSE; a new approach for generating molecular fragment information for biomarker structure elucidation. *Rapid Commun. Mass Sp.* 2006; 20: 1989-1994
- Use of an integrated MS–multiplexed MS/MS data acquisition strategy for high-coverage peptide mapping studies. *Rapid Commun. Mass Sp.* 2007; 21: 730-744
- Precursor acquisition independent from ion count: how to dive deeper into the proteomics ocean. *Anal. Chem.* 2009; 81: 6481-6488
- Proteomics on an Orbitrap benchtop mass spectrometer using all-ion fragmentation. *Mol. Cell. Proteomics.* 2010; 9: 2252
- Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. *Mol. Cell. Proteomics.* 2012; (11 0111-016717)
- Repeatability and reproducibility in proteomic identifications by liquid chromatography–tandem mass spectrometry. *J. Proteome Res.* 2009; 9: 761-776
- Improving protein and proteome coverage through data-independent multiplexed peptide fragmentation. *J. Proteome Res.* 2010; 9: 3621-3637
- ProbIDtree: an automated software program capable of identifying multiple peptides from a single collision-induced dissociation spectrum collected by a tandem mass spectrometer. *Proteomics.* 2005; 5: 4096-4106
- Database searching and accounting of multiplexed precursor and product ion spectra from the data independent analysis of simple and complex peptide mixtures. *Proteomics.* 2009; 9: 1696-1719
- Deconvolution of mixture spectra from ion-trap data-independent-acquisition tandem mass spectrometry. *Anal. Chem.* 2009; 82: 833-841
- Peptide identification from mixture tandem mass spectra. *Mol. Cell. Proteomics.* 2010; 9: 1476-1485
- Andromeda: a peptide search engine integrated into the MaxQuant environment. *J. Proteome Res.* 2011; 10: 1794-1805
- Semi-supervised learning for peptide identification from shotgun proteomics data sets. *Nat. Methods.* 2007; 4: 923-925
- Spectral probabilities and generating functions of tandem mass spectra: a strike against decoy databases. *J. Proteome Res.* 2008; 7: 3354-3363
- Semisupervised model-based validation of peptide identifications in mass spectrometry-based proteomics. *J. Proteome Res.* 2007; 7: 254-265
- A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. *J. Proteomics.* 2010; 73: 2092-2123
- Statistical calibration of the SEQUEST XCorr function. *J. Proteome Res.* 2009; 8: 2106-2113
- Quality assessments of peptide–spectrum matches in shotgun proteomics. *Proteomics.* 2011; 11: 1086-1093
- A hypergeometric probability model for protein identification and validation using tandem mass spectral data and protein sequence databases. *Anal. Chem.* 2003; 75: 3792-3798
- Open mass spectrometry search algorithm. *J. Proteome Res.* 2004; 3: 958-964
- A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. *Anal. Chem.* 2003; 75: 768-774
- Assigning spectrum-specific p-values to protein identifications by mass spectrometry. *Bioinformatics.* 2011; 27: 1128-1134
- Statistical characterization of a 1D random potential problem, with applications in score statistics of MS-based peptide sequencing. *Physica A.* 2008; 387: 6538-6544
- Spectral dictionaries: integrating de novo peptide sequencing with database search of tandem mass spectra. *Mol. Cell. Proteomics.* 2009; 8: 53
- De novo peptide sequencing via tandem mass spectrometry. *J. Comput. Biol.* 1999; 6: 327-342
- Kim, S., Pevzner, P. A., Universal database search tool for mass spectrometry. Submitted for publication.
- Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. *Nat. Methods.* 2007; 4: 207-214
- Network-assisted protein identification and data interpretation in shotgun proteomics. *Mol. Syst. Biol.* 2009; 5: 303
- The generating function of CID, ETD, and CID/ETD pairs of tandem mass spectra: applications to database search. *Mol. Cell. Proteomics.* 2010; 9: 2840-2852
- Deconvolution of mixture spectra and increased throughput of peptide identification by utilization of intensified complementary ions formed in tandem mass spectrometry. *J. Proteome Res.* 2013; 12: 3362-3371
- Eds. Stein, S. E., Rudnick, P. A., NIST Peptide Tandem Mass Spectra Libraries. Yeast Peptide Mass Spectral Reference Data, ion trap, 2009, National Institute of Standards and Technology, Gaithersburg, MD, 20899
- Development and validation of a spectral library searching method for peptide identification from MS/MS. *Proteomics.* 2007; 7: 655-667
- Understanding the improved sensitivity of spectral library searching over sequence database searching in proteomics data analysis. *Proteomics.* 2011; 11: 1075-1085
- Accurate peptide fragment mass analysis: multiplexed peptide identification and quantification. *J. Proteome Res.* 2012; 11: 1621-1632

## Article info

### Footnotes

Author contributions: J.W., P.E.B., and N.B. designed research; J.W. performed research; J.W. and N.B. analyzed data; J.W., P.E.B., and N.B. wrote the paper.

### User license

Creative Commons Attribution (CC BY 4.0)