Targeted Peptide Measurements in Biology and Medicine: Best Practices for Mass Spectrometry-based Assay Development Using a Fit-for-Purpose Approach

Adoption of targeted mass spectrometry (MS) approaches such as multiple reaction monitoring (MRM) to study biological and biomedical questions is well underway in the proteomics community. Successful application depends on the ability to generate reliable assays that uniquely and confidently identify target peptides in a sample. Unfortunately, a wide range of criteria is currently being applied to claim that an assay has been successfully developed. There is no consensus on what criteria are acceptable and little understanding of the impact of variable criteria on the quality of the results generated. Publications describing targeted MS assays for peptides frequently do not contain sufficient information for readers to establish confidence that the tests work as intended or to be able to apply the tests described in their own labs. Guidance must be developed so that targeted MS assays with established performance can be widely distributed and applied by many laboratories worldwide. To begin to address the problems and their solutions, a workshop was held at the National Institutes of Health with representatives from the multiple communities developing and employing targeted MS assays. Participants discussed the analytical goals of their experiments and the experimental evidence needed to establish that the assays they develop work as intended and are achieving the required levels of performance. Using this “fit-for-purpose” approach, the group defined three tiers of assays distinguished by their performance and extent of analytical characterization. Computational and statistical tools useful for the analysis of targeted MS results were described. Participants also detailed the information that authors need to provide in their manuscripts to enable reviewers and readers to clearly understand what procedures were performed and to evaluate the reliability of the peptide or protein quantification measurements reported.
This paper presents a summary of the meeting and recommendations.

• In the field of proteomics, umbrella terms like "MRM/SRM" and "targeted MS" can convey the erroneous message that the results are unquestionably correct with respect to what is being detected and how much is present
• New methods (e.g., HR-PRM) are blurring the lines between targeted quantification and discovery proteomics: what do we need to watch out for?
• Many targeted-MS papers are being published without documenting what was done and what results were obtained to justify the claims made, making it difficult for reviewers and readers to assess quality and reliability
• Design an assay with low technical variation that can be used to assess biological variation in a large number of samples by precise, relative quantitation
• Prove feasibility of >100-plex (34 proteins) assays in plasma
• Improve LOD and LOQ by depleting abundant proteins
• Evaluate true quantitative accuracy and digestion recovery using heavy labeled proteins
• Conduct blinded verification study to assess accuracy, precision, and reproducibility across multiple sites and instrument platforms
• Evaluate system suitability test in the context of this large-scale inter-lab study

Advantages and Disadvantages of This Approach
Advantages:
• Peptide ID is straightforward
• Interferences can be objectively determined
• Quantitation is based on a single transition
• Protein standards bring us closer to "accuracy"
• Multiplexing allows us to target many analytes in one injection
Disadvantages:
• External calibration is not used
• Cost of internal standards
• Accurately quantifying internal standards
• Quantitation is based on a single transition
• Protein standards are hard to come by
• Data quality must be high to quantify all analytes in one injection
• Assumptions regarding digestions are made
• Curves can take 3+ days to complete

Study 9.1 - Blinded Sample Results
• Results across 13 sites showed good reproducibility
• Some sites showed differences in precision and accuracy
• Use of the system suitability test and curves will help identify sources of variation

"Fit-for-purpose"

Minimum Information Needed in Publications
• Complete disclosure of LC-MS methods used
 - Dwell time, cycle time, RT window, AGC target, etc.
• Statistics or "calculations" section in the Methods
 - LODs, LOQs, precision, accuracy, response curves, etc.
 - Cite methods of calculation
• How is quantitation being done?
 - Number of transitions used for quantitation
 - Peptides per protein: combined or reported separately
 - What assumptions are made?
 - How is the curve being used?
• Address interferences
 - How are data evaluated for interferences?

Targeted MRM/MS Experiments

Our Goals
• To rapidly and precisely quantitate a multiplexed panel of proteins in human biofluids using an absolute quantitative proteomic strategy.
• To verify and validate the candidate disease biomarkers toward clinical use.
Projected Users
• Researchers performing verification and validation of quantitative proteomic assays in academic and industrial laboratories.

• We use stable isotope-labeled standard (SIS) peptides.
• 13C/15N-labeled standards are chemically identical to their unlabeled counterparts and are distinguishable by mass only.
• 3 transitions/peptide empirically targeted for RT verification and interference screening in the control.
• 1 transition/peptide in the final MRM method.

Q/A
• How else can the target peptide identities be confirmed?

Ionization Suppression
• Correct for ion suppression and matrix effects through the use of 13C/15N-labeled analogs of the unlabeled analytes (peptides in our case).
• SIS peptides behave identically to their NAT counterparts in terms of chromatographic retention, electrospray ionization, and gas-phase fragmentation.
• Produce same pattern of product ions.
• Distinguishable by precursor and/or product ion m/z only.
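The mass offset created by these labels is easy to compute. The sketch below (illustrative, not from the source) shows the expected m/z spacing between heavy and light signals for the two common C-terminal labels on tryptic peptides ending in Arg or Lys:

```python
# Illustrative sketch (not from the source): m/z offset between a SIS (heavy)
# peptide and its endogenous (light) counterpart for common C-terminal labels.

LABEL_SHIFTS_DA = {
    "R": 10.00827,  # [13C6,15N4]-Arg: 6*(13C - 12C) + 4*(15N - 14N)
    "K": 8.01420,   # [13C6,15N2]-Lys: 6*(13C - 12C) + 2*(15N - 14N)
}

def heavy_mz_offset(c_term_residue: str, charge: int) -> float:
    """m/z spacing between the heavy and light precursor (or y-ion) signals."""
    return LABEL_SHIFTS_DA[c_term_residue] / charge

# A doubly charged Arg-terminated tryptic peptide: light and heavy precursors
# are separated by ~5.004 m/z, easily resolved by a triple quadrupole.
print(round(heavy_mz_offset("R", charge=2), 4))
```

Because the shift divides by charge, higher charge states place the heavy signal closer to the light one in m/z, though still fully resolved for these labels.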

Question
• If labeled internal standards are so effective at alleviating ion suppression, why then is protein quantitation with them not yet universal?
Cost of the standards is a deterrent, while the reproducibility and transferability of the technique are misconceived limitations.

Analyte Plex-level per Injection
• We typically target >100 peptides in a single run.
• 348 peptides (149 plasma proteins) were recently targeted in a quantitative MRM analysis.
• Robustness is independent of plex-level (average CVs <10% for signal and <0.05% for RT).
• Equivalent protein concentrations and LOQs are obtained.

Q/A
• Is the ability to reach higher plex-levels a current technological limitation?
It appears so. The 348 peptides targeted in a single run push the current limits, whereby cycle times are <1 s and dwell times are sufficient to obtain ~10 points across the chromatographic profile of each reconstructed peptide peak.

• We target each peptide's highest-responding, interference-free MRM transition and multiple peptides per protein.
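The trade-off described above between plex-level, cycle time, dwell time, and points across the peak can be sketched with back-of-envelope arithmetic; the interscan delay and all numbers below are illustrative assumptions, not values from the source:

```python
# Illustrative arithmetic for SRM scheduling: dwell time per transition shrinks
# as more transitions are monitored concurrently within a fixed cycle time,
# which itself must stay short enough to sample ~10 points per peak.
# The interscan delay and all numbers are assumptions for illustration.

def dwell_time_ms(cycle_time_s, concurrent_transitions, interscan_delay_ms=3.0):
    """Approximate dwell time (ms) per transition for a given cycle time."""
    budget_ms = cycle_time_s * 1000.0 - interscan_delay_ms * concurrent_transitions
    return budget_ms / concurrent_transitions

def points_across_peak(peak_width_s, cycle_time_s):
    """Data points sampled across a chromatographic peak."""
    return peak_width_s / cycle_time_s

# 100 concurrent transitions in a 1 s cycle leave ~7 ms dwell per transition,
# and a 1 s cycle yields ~12 points across a 12 s wide peak.
print(dwell_time_ms(1.0, 100), points_across_peak(12.0, 1.0))
```

Retention-time scheduling raises the achievable plex-level precisely because it caps the number of *concurrent* transitions, not the total.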
• Peptide standard curves are generated with control biofluid.
• Merits: multiplexing ability and sample throughput.
• Chromatographic details.
• Acquisition parameters and MRM transition lists.

Verification of Results
• Qualification and quantitation criteria.

Question
• Should raw data be uploaded into public databases?
Yes.

• 50-100 pg/mL with IgY14
• 0.3-1 ng/mL without IgY14

LOQ (S/N>10):
• 50-300 pg/mL with IgY14
• 0.5-5 ng/mL without IgY14

Experimental goals: To sensitively and precisely quantify the concentration(s) of peptide(s) in a tryptic digest of a complex protein matrix, but not the accurate amount of the protein in the original sample.

Applications: Verification of low-abundance biomarker candidates (e.g., fusion protein products); highly sensitive quantification of proteins and PTMs (e.g., phosphorylation) in systems biology.

Q/A: Effects of concatenation/pooling
 What is the typical range of analyte plex-level per injection? What is the impact of the plex-level used on the performance of the PRISM-SRM assays?
 Typical plex-level in PRISM-SRM: 5-30 (can be much larger)
 Concatenation to >10 fractions has minimal impact
 Accuracy is largely unaffected
 Impact on precision is evident for LLOQ-level analytes
 Impact on sensitivity is matrix- and separation-dependent:
• potential non-characteristic loss in pooling and concentration steps
• less impact in depleted plasma (vs. non-depleted plasma)
• pooling followed by short separation diminishes signal by 1/3 to 2/3

Q/A: Quantification method
 Explain the method of quantification.
 Peptides are quantified through light/heavy ratios (peak area) and response curves
 All transitions are checked for potential interference (e.g., AuDIT)
 The most sensitive transition is used for quantification
 "Crude" peptides can be used for comparative studies

 Discuss the capabilities and limitations of the approach.
 Strength: use of internal standards mitigates many issues that could affect quantitation accuracy (e.g., sample prep and instrument analysis reproducibility)
 Limitations: 1) synthetic heavy peptide standards are required for PRISM-SRM; 2) cost is high if purified and accurately quantified internal standards are used; 3) 3-5 weeks lead time for peptide synthesis; 4) not "true" absolute quantitation

Q/A: Use of standard curves
 If you generate standard curves (calibration or response curve), explain how you use them to assess the quantitative accuracy of the assay.
 The slope and y-intercept from the curve regression are used in calculating the analyte concentration in a sample (process replicates for at least one of the data points)
 External calibration can be used, but is not required
 Correlation to other measurements (e.g., ELISA) can be made

 In-house programs are used for peptide selection from our own data repository
 Skyline and in-house algorithms are used for transition selection (e.g., analysis of Orbitrap LC-MS/MS data) and CE optimization (e.g., direct infusion, CE ramping in scheduled SRM)
 Skyline is used for LC-SRM scheduling and method generation
 Skyline is used for data visualization and quantification
 Skyline is used for data sharing (Panorama is being tested)

 How do you account for suppression of ionization in your quantification method?
 Because heavy isotope-labeled internal standards are used throughout PRISM-SRM analysis, we do not need to account for suppression of ionization

Q/A: "Bottom-up" protein quantification
 Can you provide a useful estimate or accurately determine the amount of protein in the matrix based on the measured levels of peptides? Explain how/why. Indicate experimental parameters such as number of peptides per protein and the criteria/computational tools applied.
 The amount of the proteins in the matrix can be estimated:
• use the protein and heavy isotope-labeled peptides when building the response curve (2-3 peptides; spike-in samples go through the entire sample prep process)
• if a protein standard is not available, measure the same actual clinical samples using both PRISM-SRM and other measurements (e.g., ELISA) and establish a calibration curve

 If you have multiple peptides from the same protein and each gives a different answer for the extrapolated protein level, how do you deal with this?
 Make sure there is no potential PTM site (e.g., phosphorylation, N-glycosylation)
 Check protein isoform information
 Check if there is a potential motif that inhibits trypsin digestion
 Cheer up, some great discovery might have been made!
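The quantification scheme described in this section, light/heavy peak-area ratios converted to concentration via the slope and y-intercept of a response-curve regression, can be sketched minimally; the slope, intercept, and peak areas below are invented for illustration:

```python
# Minimal sketch of ratio-based quantification: the light/heavy peak-area
# ratio is converted to concentration via the slope and y-intercept of the
# response-curve regression. All numbers are invented for illustration.

def peak_area_ratio(light_area, heavy_area):
    """Endogenous (light) over internal-standard (heavy) peak area."""
    return light_area / heavy_area

def concentration_from_curve(ratio, slope, intercept):
    """Invert the response curve, ratio = slope * conc + intercept."""
    return (ratio - intercept) / slope

# Hypothetical response curve: slope 0.02 per ng/mL, intercept 0.01.
ratio = peak_area_ratio(light_area=15400.0, heavy_area=308000.0)
print(concentration_from_curve(ratio, slope=0.02, intercept=0.01))  # ng/mL
```

Because the ratio, not the raw peak area, is regressed against concentration, run-to-run variation that affects both channels equally cancels out.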

Level of Multiplexed Analysis
 We typically do not use LC/MS/MS for de novo hypothesis generation; rather, we use it to understand biology surrounding a target that is of interest to our portfolio.
 Most assays have fewer than 10 analytes; targeted MRM assays are developed using the fewest analytes possible to interrogate the target biology.
 The influence of plex-level on assay robustness/performance has not been investigated.
 Formal acceptance criteria are not applied in early stages.
 LLOQ is based on precision and accuracy (if a protein standard is available).
 LOD is considered to be 3x the standard deviation of the blank.
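The acceptance rules above can be made concrete. This sketch assumes LOD = 3x the standard deviation of replicate blanks (as stated); the 20% CV and bias limits in the LLOQ check are an illustrative assumption, as are all measurements:

```python
# Sketch of the stated LOD rule (3x the standard deviation of the blank) and
# a precision/accuracy-based LLOQ check; the 20% CV and bias limits are an
# illustrative assumption, and all measurements are invented.
import statistics

def lod_from_blanks(blank_signals):
    """LOD as 3x the standard deviation of replicate blank measurements."""
    return 3 * statistics.stdev(blank_signals)

def passes_lloq(replicate_concs, nominal, max_cv=0.20, max_bias=0.20):
    """An LLOQ candidate passes if replicate CV and bias stay within limits."""
    mean = statistics.mean(replicate_concs)
    cv = statistics.stdev(replicate_concs) / mean
    bias = abs(mean - nominal) / nominal
    return cv <= max_cv and bias <= max_bias

blanks = [0.8, 1.1, 0.9, 1.2, 1.0]
print(round(lod_from_blanks(blanks), 3))
print(passes_lloq([9.1, 10.4, 9.8, 10.9], nominal=10.0))
```

Note that the bias check only applies when a characterized protein standard supplies a trustworthy nominal value, matching the caveat in the text.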

Q2: Confidence in Protein Assignment
Explain how you establish confidence that what is being measured is the analyte of interest (e.g., match to spectra of an internal standard, match to reference spectra from discovery experiments, RT, etc.). How do these methods differ from "Discovery Proteomics" using data-dependent or data-independent experiments?

Confidence in Protein Assignment
Targeted MRM  MRM methods are not prepared without peptide standards.  We synthesize standards for all surrogate peptides of interest and use full scan MS/MS, RT, and exact m/z (if available) to confirm assignments.
 BLAST searches are performed to verify the uniqueness of all surrogate peptides used.

Non-Targeted Label-Free Analysis
 De novo hypothesis generation work done using label-free methods on an Orbitrap; MS1 used for quantification with data-dependent MS/MS for peptide identification.
 Workflow for discovery proteomics using label-free methodology has been published. 1,2 Measures taken to confirm protein assignments are discussed.

Method of Quantification
 Labeled protein IS rarely used (availability/purity).
 SIL-IS prepared for all surrogate peptides, typically with 2-3 flanking residues.
 One to three transitions monitored, depending on matrix complexity and method of clean-up.
 Extra transitions often used for confirmation only.
 Relative assay 1 - peptide std curves
 Definitive assay 1 - protein std curves
 Protein standards are well characterized.
 Peptide stds are qualified by AAA (amino acid analysis) for peptide content (on a fit-for-purpose basis).
 For definitive assays, spike-recovery and std addition are used to qualify a surrogate matrix.

Q7: Ionization Suppression
How do you account for suppression of ionization in your quantification method?

Ionization Suppression
 Not a major issue owing to high reliance on SIL internal standards.
 Anti-peptide IP is employed wherever possible over conventional methods (faster method development and reduced ion suppression).
 We do not specifically quantify ion suppression; it is therefore rolled up into our estimation of spike-recovery.
 Single biggest concern comes from detergents needed for protein handling and IP. We usually modify the sample preparation method to address this issue (change detergent, increase washing steps, etc.).

Assay Qualification
 Depends on the use of the assay (fit-for-purpose).
 Both pre-study and in-study validation are considered.
 Pre-study: Total error of 30% is considered the default for acceptance
- Not applied to early work
- Accuracy measured by spike-recovery using a protein std (definitive)
- Accuracy can be estimated from peptide std or QCs (relative)
 In-study: bracketed std curves used to assess run acceptance.
 For later-stage work, qualification of the surrogate matrix is performed:
1. Std addition - do the surrogate and authentic matrix give similar slopes?
2. Dilutional linearity - does the signal for an authentic biological sample decrease linearly when diluted in surrogate matrix?
3. Spiked recovery - is recovery within ±20% of theoretical?
 Important to address sources of determinate error (e.g., stability or PAE).
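The pre-study acceptance arithmetic above can be sketched as follows. Total error is computed here as |% bias| + % CV, which is one common convention (the slide does not state the exact formula); the 30% total-error default and the ±20% spike-recovery window are taken from the text, while the QC data are invented:

```python
# Sketch of pre-study acceptance: total error computed as |% bias| + % CV
# (one common convention; the exact formula is not given in the source),
# checked against the 30% default, plus the +/-20% spike-recovery window.
import statistics

def total_error_pct(measured, nominal):
    mean = statistics.mean(measured)
    bias_pct = abs(mean - nominal) / nominal * 100.0
    cv_pct = statistics.stdev(measured) / mean * 100.0
    return bias_pct + cv_pct

def accept_assay(measured, nominal, max_total_error_pct=30.0):
    return total_error_pct(measured, nominal) <= max_total_error_pct

def recovery_ok(observed, theoretical, tol=0.20):
    """Spiked recovery within +/-20% of theoretical."""
    return abs(observed - theoretical) / theoretical <= tol

qc = [92.0, 101.0, 96.0, 99.0]  # invented replicate QC results, nominal 100
print(round(total_error_pct(qc, 100.0), 1), accept_assay(qc, 100.0))
print(recovery_ok(observed=88.0, theoretical=100.0))
```

Combining bias and imprecision into one number is what distinguishes a total-error criterion from separate accuracy and precision cutoffs.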

Q10: Author recommendations
What information do authors need to provide in their manuscripts/ supplement to enable reviewers and readers to understand what was done and to be able to judge the confidence of the measurements made?

Author recommendations
 Sufficient detail needed to allow replication of work.  More extensive supplemental sections recommended.  Authors should include discussion on how/why they selected the level of validation used (explain fit-for-purpose rationale).
 Surrogate matrix qualification should be documented.  Consistent nomenclature should be adopted for key terms regarding validation or quality assessment.
 Validation data should be included in publication of measurements.

6/4/13
 Nature 496, 398 (2013) - Editorial announcement: "To ease the interpretation and improve the reliability of published results we will more systematically ensure that key methodological details are reported, and we will give more space to methods sections. We will examine statistics more closely and encourage authors to be transparent, for example by including their raw data. Central to this initiative is a checklist intended to prompt authors to disclose technical and statistical information in their submissions, and to encourage referees to consider aspects important for research reproducibility (go.nature.com/oloeip)."

Biomarker Verification
Recommendations for Tier 2
 Tier 2 assays do not need to be in full control; however, the major contributions to assay variation should be identified during development and addressed as an assay progresses.
 Total error of 30% is recommended for assay acceptance.
 Methods can be either relative or definitive based on the standards used.
 Assay precision should be measured for all methods and can be used for LLOQ assignment.
 Accuracy (bias) is measured in definitive methods by spike-recovery and requires a characterized protein standard.
 Spiked recovery using interleaving validation dilutions can be used to estimate bias without a protein std (relative methods).
 Parallelism should be demonstrated between the surrogate matrix used to prepare standards (and QCs) and the authentic biological matrix.
 Total error profile should be considered when selecting the method for curve fitting.

Recommendations for Tier 2 -continued
 SIL peptide internal standards should be made for each analyte.
 IS with flanking residues recommended, but not required.
 Std curves must have a minimum of 5 points.
 In-study validation:
- Duplicate std curves (bracketed) recommended
- QC samples optional
 Selectivity blanks should be included in each run.
 More than one transition per analyte, depending on sample prep and matrix.
 Fit-for-purpose experiments to address pre-analytical effects need to be considered to avoid large sources of determinate error.
• Typically 1 protein analyzed (sometimes 2, rarely 3 or more)
• Typically 2-3 peptides per protein (sometimes 1, rarely more than 3)

Targeted Peptide Measurements in Biology and Medicine: Best Practices for Assay Development Using a "Fit-for-Purpose" Approach
-Decide upfront what to do with the data from several peptides
Step 1: Generation of SRM assays for target proteins

Selectivity of PRM Analyses
Step 2: Testing the detectability of proteins in the target sample
Step 3: Preparing the final quantification method
Step 4: Data analysis of large-scale SRM datasets

Example applications:

Response of M. tuberculosis to hypoxia in a time-course experiment
Quantification of cancer-associated proteins in a case-control study

M. tuberculosis
Detection of 2,884 proteins across 4 orders of magnitude in unfractionated Mtb digest

Cancer-associated proteins
Detection of 162 proteins across 5 orders of magnitude in depleted plasma
• Absolute label-free abundance estimates
• Estimated abundances from PeptideAtlas
• Integration and scoring of all detected peak groups

2. ID confidence
• SRM assay coordinates derived from confidently identified spectra
• mQuest/mProphet analysis to match to assay coordinates and internal standard (if available)
• FDR cutoff < 1% for confident identification

Receptor tyrosine kinase/non-receptor tyrosine kinase panel

Q. Explain your method of quantification, how many transitions you monitor, and which ones are chosen to quantify.
Synthetic standard:
- obtain retention time
- fragment ion intensity

Q. Can you provide a useful estimate or accurately determine the amount of protein in the matrix based on the measured levels of peptides? Explain how/why. Indicate experimental parameters such as number of peptides per protein and the criteria/computational tools applied. If you have multiple peptides from the same protein and each gives a different answer for the extrapolated protein level, how do you deal with this?

Q. What software and analytical tools do you use in your studies and why?
§ Skyline
§ QuaSAR § AuDIT § Prototype data analysis package in "R" for CV and ICC determina9on  Control Disease        single test results ... obtained under different conditions ... may be expected to lie with a specified probability". There may be difficulties in carrying out studies of reproducibility in many areas of medical interest. For example, the gestational age of a newborn baby could not be determined at different times of year or in different places. However, when it is possible to vary conditions, observers, instruments, etc., the methods described above will be appropriate provided the effects are random. When effects are fixed, for example when comparing an inexperienced observer and an experienced observer, the approach used to compare different methods, described below, should be used.

Comparison of methods
The main emphasis in method comparison studies clearly rests on a direct comparison of the results obtained by the alternative methods. The question to be answered is whether the methods are comparable to the extent that one might replace the other with sufficient accuracy for the intended purpose of measurement. The obvious first step, one which should be mandatory, is to plot the data. We first consider the unreplicated case, comparing methods A and B. Plots of this type are very common and often have a regression line drawn through the data. The appropriateness of regression will be considered in more detail later, but whatever the merits of this approach, the data will always cluster around a regression line by definition, whatever the agreement. For the purposes of comparing the methods the line of identity (A = B) is much more informative, and is essential to get a correct visual assessment of the relationship. An example of such a plot is given in Figure 1, where data comparing two methods of measuring systolic blood pressure are shown. Although this type of plot is very familiar and in frequent use, it is not the best way of looking at this type of data, mainly because much of the plot will often be empty space. Also, the greater the range of measurements the better the agreement will appear to be. It is preferable to plot the difference between the methods (A - B) against (A + B)/2, the average. Figure 2 shows the data from Figure 1 replotted in this way. From this type of plot it is much easier to assess the magnitude of disagreement (both error and bias), spot outliers, and see whether there is any trend, for example an increase in A - B for high values. This way of plotting the data is a very powerful way of displaying the results of a method comparison study. It is closely related to the usual plot of residuals after model fitting, and the patterns observed may be similarly varied.
In the example shown (Figure 2) there was a significant relationship between the method difference and the size of measurement (r = 0.45, n = 25, P = 0.02). This test is equivalent to a test of equality of the total variances of measurements obtained by the two methods (Pitman, 1939; see Snedecor and Cochran, 1967, pp. 195-7).

What is a good assay?
◆ LOQ and LOD, dynamic range

• Linear calibration curves for every monitored transition for the peptide;
• Log plots to visualize all raw data points, and identify trends.
§ Calibration curves are fitted using robust, weighted regression.
• Tables provide:
 - Slope and intercept along with confidence intervals
§ Calibration curves can be based on:
 • Concentration (= PAR * IS Concentration)
 • Peak area ratio (= Analyte Peak Area / IS Peak Area)
 • Analyte peak intensity (area)

LOD distribution for all peptides at a site
§ Plots show the LOD distribution for peptides at each site.
§ Identical samples with a prespecified SOP were analyzed on 14 different instruments spanning 9 laboratories.
§ Data were processed using QuaSAR and plots created from QuaSAR output.
• QuaSAR analysis + plots took less than a day!

Summary of QuaSAR features
§ QuaSAR implements a comprehensive, easy-to-use pipeline for MRM-MS data analysis including:
• Statistics: For every monitored peptide
 - Coefficient of variation (CV)
 - Regression slope and intercept (with confidence intervals)
 - Interference detection, and
 - Limits of detection (LOD) and quantification (LOQ)
• Visualization: Succinct visual summaries of various results including reproducibility, interferences, and detection limits are generated.
• Interpretation: Statistics and visualization are integrated to enable effective data interpretation, understanding and insight.
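As an illustration of the curve-fitting step, here is a minimal 1/x-weighted least-squares calibration fit of peak area ratio (PAR) against concentration; the weighting scheme and data points are illustrative assumptions, not QuaSAR's actual implementation:

```python
# Illustrative 1/x-weighted least-squares calibration fit of peak area ratio
# (PAR) vs concentration; the weighting and data are assumptions, not
# QuaSAR's actual implementation.

def weighted_linfit(x, y, w):
    """Closed-form weighted least squares for y = slope * x + intercept."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

conc = [1.0, 5.0, 10.0, 50.0, 100.0]       # spiked concentrations (ng/mL)
par = [0.021, 0.099, 0.205, 1.010, 1.980]  # analyte/IS peak area ratios
slope, intercept = weighted_linfit(conc, par, w=[1.0 / c for c in conc])
print(round(slope, 4), round(intercept, 4))
```

Weighting by 1/x keeps the low end of the curve from being dominated by the much larger absolute residuals at high concentration, which matters when the LLOQ sits near the bottom of the range.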

Proteomics and Biomarker Discovery
QuaSAR can promote standardization of data analysis in manuscripts
§ Use of QuaSAR results in reproducible data analysis:
• QuaSAR version + parameter settings completely specify the analysis
• Easy replication of data analysis at other sites/laboratories
§ Custom analysis would require authors to specify:
• Determination and use of response/calibration curves
• Interference detection and action taken
• Assessment of assay precision

3) Analyte Identification
• Explain how you establish confidence that what is being measured is the analyte of interest (e.g., match to spectra of an internal standard, match to reference spectra from discovery experiments, RT, etc.).
• How do these methods differ from "Discovery Proteomics" using data-dependent or data-independent experiments?

10.06.2013

Workflows
• Label-free
• 2-plex
• "Spike in" (typically SIS)
 - Peak detection on internal standard
 - Peak scoring on endogenous
• "Label"
 - Peak detection and scoring on both channels

Method -Scoring
Scores are optimally combined using semi-supervised machine learning (Reiter et al., Nature Methods, 2011).