Data-Independent Acquisition Guidelines
These guidelines are for studies where comprehensive fragmentation data is acquired independently of eluting components, with the goal of global analysis of a mixture. This strategy is commonly known as Data-Independent Acquisition (DIA), and contrasts with other strategies for acquiring peptide fragmentation data that are based on detection of eluting precursor ions, often referred to as Data-Dependent Acquisition (DDA), or strategies where a subset of components is specifically targeted using single/multiple/parallel reaction monitoring (SRM/MRM/PRM). There are different guidelines for these other strategies (1,2), and authors should adhere to the guidelines appropriate to their acquisition approach.
It is very important that authors also review the checklist document that has to be completed on the submission site when submitting a manuscript.
Experimental Section
Experimental Design and Statistical Rationale:
Authors must include a subsection in the Experimental Methods section with the header “Experimental Design and Statistical Rationale”. In this section, clearly state:
- the numbers and types of sample conditions employed
- the total number of samples analyzed and described in Results
- the number of technical, process and/or biological replicates performed. If no replicate analyses were performed, clearly state why this should be considered acceptable for your study
- rationale for the choice of sample numbers, replicates, etc.
- if retention time standards or other spiked protein or peptide standards were used
- the methods used for sample acquisition order randomization (if appropriate for your study)
- if a library was created as part of the work for subsequent peptide detection, the number and type (biological/technical) of samples used for library creation
- Describe algorithms or programs used for data processing and statistical analysis. Fully describe and/or reference the statistical tests used for subsequent data analyses and provide reasons for choice of statistical tests used.
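As an illustration of the acquisition order randomization item above, the following is a minimal sketch of a blocked randomization, assuming a simple design with equal-sized conditions. The function name, sample names and seed are hypothetical and are not prescribed by these guidelines; any documented, reproducible scheme would serve the same purpose.

```python
import random

def blocked_run_order(samples_by_condition, seed=0):
    """Return a run order built from blocks that each hold one sample per
    condition, with the order shuffled inside every block."""
    rng = random.Random(seed)  # fixed seed so the reported order is reproducible
    # Shuffle samples within each condition so block membership is random.
    shuffled = {c: rng.sample(s, len(s)) for c, s in samples_by_condition.items()}
    n_blocks = max(len(s) for s in shuffled.values())
    order = []
    for i in range(n_blocks):
        block = [s[i] for s in shuffled.values() if i < len(s)]
        rng.shuffle(block)  # randomize condition order inside the block
        order.extend(block)
    return order

# Hypothetical sample names, for illustration only.
samples = {
    "control": ["ctrl_1", "ctrl_2", "ctrl_3"],
    "treated": ["trt_1", "trt_2", "trt_3"],
}
order = blocked_run_order(samples, seed=42)
for position, name in enumerate(order, start=1):
    print(position, name)
```

Reporting the seed and the resulting order alongside the method lets a reader reconstruct the exact acquisition sequence.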
Data Acquisition
There are several different acquisition strategies for DIA data. Authors should provide all parameters they think are important to help evaluate the results. These include:
- Whether MS1 data was acquired, and if so, the m/z range(s).
- Whether m/z range was fractionated for fragmentation analysis; and if so, parameter of separation (m/z / mobility), number of windows, whether overlapping windows were acquired, and total cycle time.
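The window parameters listed above can be made concrete with a small calculation. The sketch below assumes equal-width windows and a fixed scan time per window; the m/z range, window count, overlap and scan times are hypothetical values chosen only for illustration, and real methods (for example variable-width windows) would need a different layout.

```python
def isolation_windows(mz_start, mz_end, n_windows, overlap=0.0):
    """Split [mz_start, mz_end] into n_windows equal-width isolation windows.
    Each window is widened by `overlap` m/z in total (half on each side)."""
    step = (mz_end - mz_start) / n_windows
    windows = []
    for i in range(n_windows):
        lower = mz_start + i * step - overlap / 2
        upper = mz_start + (i + 1) * step + overlap / 2
        windows.append((lower, upper))
    return windows

def cycle_time_s(n_windows, ms2_scan_s, ms1_scan_s=0.0):
    """Total cycle time: one optional MS1 scan plus one MS2 scan per window."""
    return ms1_scan_s + n_windows * ms2_scan_s

# Hypothetical method: 24 windows over 400-1000 m/z with 1 m/z overlap.
windows = isolation_windows(400.0, 1000.0, n_windows=24, overlap=1.0)
print("first window:", windows[0])
print("last window:", windows[-1])
print("cycle time (s):", cycle_time_s(24, ms2_scan_s=0.05, ms1_scan_s=0.1))
```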
Methods for DIA Data Analysis
Software for analyzing DIA data can be roughly split into two strategies: those that try to match a peptide to an individual spectrum (spectrum-centric) and those that try to detect a given peptide somewhere in a data file (peptide-centric). For each strategy it is possible to match against either a protein database or a spectral library, but most spectrum-centric analyses query against a protein database and peptide-centric approaches mostly make use of spectral libraries. The guidelines below are divided based on the analysis strategy. Some workflows may use a combination of both approaches, in which case both sets of guidelines are applicable.
For Spectrum-Centric DIA Analysis:
Peak List Generation: State the method and/or program (including version number and/or date) used to create the peak lists.
- List parameters used in the creation of this peak list, particularly any processing which might affect the quality of the subsequent database search. Examples include smoothing, signal-to-noise thresholding, charge state assignment or de‐isotoping, de-multiplexing, relative contribution of the detected charge states (m/z or drift separated) of a peptide to a product ion spectrum.
- Define how the retention/drift time and intensity of ions in the peak list file are assigned.
- State the maximum number of precursor peak lists an observed fragment ion can be included in.
- In cases where additional customized processing of the collections of peak lists has been performed, e.g. clustering or filtering, the method and/or program (including version number) should be referenced.
Search Engine: The name AND version (or release date) of all programs used for database searching must be provided.
- Sequence Database or Spectral Library: The name AND version (or release date) of all sequence database(s) or spectral libraries used must be listed. If a database or library was compiled in‐house, a complete description of the source of the sequences or spectra, and of the software used for library generation, is required. The number of entries actually searched from each database or library must be included. If the database or library used is very small (< 1000 entries) or excludes common contaminants, justification must be specifically provided since this may generate misleading assignments and an inaccurate false discovery rate estimate.
- Enzyme specificity: A description of all enzymes used to generate peptides, including the number of missed and non‐specific cleavages (e.g. semi‐tryptic) permitted, must be listed.
- Fixed modification(s): A list of all modifications considered (including residue specificity) must be given.
- Variable modifications: A list of all modifications considered (including residue specificity) must be given. If fixed or variable modifications were not specified, this should be stated.
- Mass tolerance for precursor and fragment ions (if this is a user-definable setting; some software automatically determines this).
- Known contaminants excluded: All omitted peaks from pre‐designated contaminants (or if any of these fragments are used for calibration) must be identified.
- Threshold score/expectation value: Criteria used for accepting individual spectra should be stated along with a justification.
- False Discovery Rates at Peptide, Protein and Batch levels: For large scale experiments, report the results of any additional statistical analyses that estimate a measure of identification certainty for the dataset, or allow a determination of the false discovery rate, e.g., the results of decoy searches or other computational approaches.
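To show what a decoy-based false discovery rate estimate can look like, here is a minimal sketch of target-decoy q-value calculation. It assumes equal numbers of target and decoy candidates, a score where higher is better, and the simple estimator FDR = decoys / targets; search engines and post-processors typically use more refined estimators, so this is an illustration and not the method of any particular program.

```python
def target_decoy_qvalues(scores, is_decoy):
    """Return one q-value per match, in input order (higher score = better).
    FDR at a threshold is estimated as (#decoys) / (#targets) above it."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: lowest FDR at this threshold or any more permissive one.
    qvalues = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvalues[order[rank]] = running_min
    return qvalues

# Hypothetical scores and decoy flags, for illustration only.
scores = [10, 9, 8, 7, 6, 5]
is_decoy = [False, False, True, False, False, True]
print(target_decoy_qvalues(scores, is_decoy))
```

Whatever estimator is used, the manuscript should state it, together with the level (peptide, protein, batch) at which it was applied.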
For Peptide-Centric DIA Analysis:
Spectral Libraries
For all libraries, the number of spectral entries and the number of proteins these cover (target and decoy) must be reported. For libraries of small size (<1000 entries), justification for the validity of using such a small search space must be provided.
If the library was created as part of this study:
- If created from DDA data, then the DDA MS/MS guidelines must be completed for this data.
- If public data was used to compile the library, location where the raw data can be obtained / downloaded from.
- Software used for library generation (including version number)
- When multiple spectra for a peptide are available;
- if a representative spectrum was added to the library, criteria for its choice; e.g. best scoring, most confident modification site localization.
- if a composite spectrum was created in the library, then parameters used for merging of spectra
- Whether only a subset of the identified peptides were used in library creation; e.g. removal of unmodified or modified peptides
- Whether specific peaks (e.g. precursor ion) were removed from the spectra
- Whether thresholding was applied to the spectrum (e.g. minimum S/N, maximum number of peaks per spectrum retained)
- Estimated FDR of entries in library; including method of estimation. If results from multiple analyses are being combined in the library, software/method used for FDR control when combining.
If a public library was used for data analysis:
- Version number of library. Provide literature citation if available.
- Location from which the library can be obtained / downloaded.
- What additional metadata about spectra in the library was used; e.g. retention time, ion mobility.
- Whether the library was further processed; e.g. subsetted; peak lists thresholded.
If predicted spectra were used:
- Software used to create spectra (including version number)
- Parameters used for deciding which peptides to include in the library (e.g. source of protein sequences; enzyme specificity assumed; what modifications were included; ranges of peptide lengths / masses included…).
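The peptide-selection parameters listed above (enzyme specificity, missed cleavages, length range) can be illustrated with a short in silico digestion. This is a minimal sketch assuming the common trypsin rule (cleave C-terminal to K or R, but not before P); the function name, default values and example sequence are hypothetical, and prediction tools may apply different rules.

```python
import re

def tryptic_peptides(sequence, missed_cleavages=1, min_len=7, max_len=30):
    """Return the sorted set of tryptic peptides for a protein sequence."""
    # Zero-width split after K or R unless the next residue is P
    # (splitting on empty matches requires Python 3.7 or later).
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = set()
    for start in range(len(fragments)):
        for n_missed in range(missed_cleavages + 1):
            end = start + n_missed + 1
            if end > len(fragments):
                break
            peptide = "".join(fragments[start:end])
            if min_len <= len(peptide) <= max_len:
                peptides.add(peptide)
    return sorted(peptides)

# Hypothetical toy sequence; min_len lowered so the short pieces are visible.
print(tryptic_peptides("AAAKGGGRPCCCKDDD", missed_cleavages=1, min_len=1))
```

Each of the arguments in this sketch corresponds to a parameter that should be reported when a predicted library is used.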
If the library contains decoys:
- How many decoy entries are included (relative to the number of target entries)
- How were these assigned to decoy proteins (to allow protein-level FDR estimation)?
- How were these decoy spectra created?
Matching of Data to Spectral Library
- Name and version number of software used for peptide-centric analysis
- Was precursor detection attempted?
- If so, how was precursor information used?
- What mass tolerance was used for matching precursor ions?
- Was retention time or ion mobility used to assist identification?
- If so, how was this used; e.g. was a window around the predicted time/mobility applied?
- What method was used to align retention times between acquisitions, or describe tests used to assess retention time reproducibility.
- Was chromatogram peak shape used as a parameter in scoring results? If so, how?
- How many peaks were used for identifying individual peptides? (for some software this may be a range)
- Criteria for selection of these peaks; e.g. relative intensity in library spectrum, must be above a certain mass; must be within a certain mass range…
- Mass tolerance employed for fragment ions when matching to library.
- If modification sites are reported, method for assessing site localization reliability
- False Discovery Rates at Peptide, Protein and/or Batch levels: Report the results of any additional statistical analyses that estimate a measure of identification certainty for the dataset, or allow a determination of the false discovery rate, e.g., the results of decoy searches or other computational approaches.
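To make the fragment mass tolerance item above concrete, the sketch below matches library fragment m/z values to observed peaks within a tolerance expressed in parts per million. It assumes a nearest-peak rule and a symmetric ppm window; the function names and m/z values are hypothetical, and peptide-centric software generally combines such matching with chromatographic and intensity information.

```python
def ppm_error(observed_mz, reference_mz):
    """Signed mass error of an observed m/z relative to a reference, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6

def match_fragments(library_mzs, observed_mzs, tol_ppm):
    """Map each library fragment m/z to the closest observed m/z within
    tol_ppm, or to None when no observed peak falls inside the tolerance."""
    matches = {}
    for lib in library_mzs:
        best = min(observed_mzs, key=lambda obs: abs(obs - lib), default=None)
        if best is not None and abs(ppm_error(best, lib)) <= tol_ppm:
            matches[lib] = best
        else:
            matches[lib] = None
    return matches

# Hypothetical values: one fragment matches at about 8 ppm, one is unmatched.
result = match_fragments([500.0, 700.0], [500.004, 650.0], tol_ppm=10)
matched = sum(1 for v in result.values() if v is not None)
print(result, "matched:", matched, "unmatched:", len(result) - matched)
```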
Results Section
Peptide and Protein Reporting
Depending on the focus of the study, results may be most appropriately reported at either the peptide or protein level. A table of results must be provided, either in the main manuscript, or if large, as a supplementary file submitted to the journal with the manuscript.
For results reported at the protein level this table must include:
- Protein accession number
- Count of the number of distinct peptide sequences assigned to each protein: When computing this number, multiple matches to peptides with the same primary sequence should be counted as a single distinct peptide, including multiple matches that represent different precursor charge states or modification states. Any alternative assumptions must be justified.
- If identified by library searching, number of distinct peptide sequences in the library that are from this protein.
- For any proteins identified by a single distinct peptide sequence, peptide-level information must additionally be provided, along with annotated spectra or chromatograms (whichever is more appropriate; see below).
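The counting rule for distinct peptide sequences described above can be sketched as follows. The example assumes that modifications are annotated in square brackets or parentheses within the peptide string, which is only one of several conventions in use; the accessions and peptides are hypothetical.

```python
import re
from collections import defaultdict

def strip_modifications(peptide):
    """Remove bracketed or parenthesised modification annotations,
    e.g. 'PEPT[+80]IDEK' -> 'PEPTIDEK'. The notation is an assumption."""
    return re.sub(r"\[[^\]]*\]|\([^)]*\)", "", peptide)

def distinct_peptides_per_protein(matches):
    """matches: iterable of (protein_accession, annotated_peptide, charge).
    Charge states and modification states collapse to one primary sequence."""
    sequences = defaultdict(set)
    for accession, peptide, _charge in matches:
        sequences[accession].add(strip_modifications(peptide))
    return {acc: len(seqs) for acc, seqs in sequences.items()}

# Hypothetical matches: P1 has three matches to PEPTIDEK but two distinct sequences.
matches = [
    ("P1", "PEPTIDEK", 2),
    ("P1", "PEPTIDEK", 3),
    ("P1", "PEPT[+80]IDEK", 2),
    ("P1", "LVNELTEFAK", 2),
    ("P2", "AM(ox)SDEK", 2),
]
print(distinct_peptides_per_protein(matches))
```

Proteins with a count of one in such a table are those for which the additional peptide-level information is required.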
For results reported at the peptide level, the results table must include:
- Protein accession number
- All peptide sequences assigned.
- Precursor charge and observed mass/charge (if MS1 data used).
- All modifications observed.
- For peptide-centric analysis, the number of matched and unmatched fragments and a statistical measure of quality of match to the spectrum in the library
- For spectrum-centric analysis the score(s), and/or statistical measure associated with the individual peptide identification.
- If the identified peptide contains a biological modification, a measure of reliability for the modification site localization must be reported (or it must be indicated that site localization reliability was not assessed).
- For reporting of peptides with biological post-translational modifications, or proteins identified on basis of a single unique peptide (which is not encouraged), the ability to view annotated spectra or chromatograms for these identifications must be made available. This can be achieved by:
- Submission of all data and search results to a public results repository that is equipped with a viewer, prior to submission of the manuscript to the journal.
- Submission of data and search results in a file format that allows visualization of the spectra using a freely‐available viewer
Please see http://www.mcponline.org/page/content/annotated-spectra for more details about how annotated spectra from different software can be provided.
In the general results section, we encourage authors to report the percentage of total ion current observed in the mass spectrometry data that is explained by the interpretation of the data through whichever software was used. Authors should also state how the percentage was determined or estimated.
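One possible way to estimate the percentage of total ion current explained is sketched below. It assumes that each fragment-level peak has already been flagged as assigned or unassigned by the analysis software, which is a simplification; the function name and intensities are hypothetical, and other definitions (for example at the chromatogram level) are equally valid as long as the one used is stated.

```python
def percent_tic_explained(peaks):
    """peaks: iterable of (intensity, is_assigned) pairs.
    Returns the share of summed intensity carried by assigned peaks, in %."""
    total = 0.0
    explained = 0.0
    for intensity, is_assigned in peaks:
        total += intensity
        if is_assigned:
            explained += intensity
    if total == 0:
        raise ValueError("no ion current recorded")
    return 100.0 * explained / total

# Hypothetical peaks: 700 of 1000 intensity units are assigned to peptides.
print(percent_tic_explained([(100, True), (300, False), (600, True)]))
```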
Please see the document entitled “Information on how to provide access to annotated spectra” for details on how to comply.
Quantification
Manuscripts presenting quantitative proteomic results from mass spectrometric analyses must provide the following information:
- All relevant quantification data (as part of the peptide and/or protein identification tables), along with a description of how the raw data were processed to produce these measurements (e.g. whether MS1 or MS2 ions were used for quantification).
- A complete description of post-processing steps, such as outlier rejection, filtering with respect to identification scores or CV, and categorical exclusion of data by thresholds (e.g., based on signal to noise or minimum ion counts).
- Number of peptides used for quantification of each protein (if different from number used for identification).
- A description of how the analytical reliability of measurements was validated using technical replicates and statistical methods. Citation of standard methods or specialized software may be used. However, it is essential to demonstrate that the data contained in the manuscript do conform to the hypotheses made by the models.
- A description of how the biological reliability of measurements was validated using biological replicates, statistical methods, independent experiments, etc. Studies based on a single biological experiment are generally not acceptable (except as a dataset to test bioinformatic systems). If a biological replicate from the same source cannot be performed (e.g. patient sample), a large enough number of similar biological samples, appropriately justified, must be analyzed in order to enable sound conclusions.
- A description of how co-eluting peak interferences were handled for quantification
- If modification site localization is reported, software used to evaluate localization reliability.
- Proper estimates of uncertainty and the methods used for the error analysis.
Quantification of many proteins or peptides generally results in the need to use some form of multiple hypothesis testing correction. Whenever possible, confidence in protein quantification should be provided for each individual protein rather than the global dataset. Any conclusions drawn or hypotheses generated from the quantitative data in the manuscript must be in concert with the determined estimate of uncertainty.
- A description of the way multiple isoforms in a protein group were quantified.
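As an example of the multiple hypothesis testing correction mentioned above, the following is a minimal sketch of the Benjamini-Hochberg procedure for adjusting per-protein p-values. It is shown only as one widely used option, not as a required method; the p-values are hypothetical, and in practice an established statistics package would normally be used and cited.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four proteins.
for p, q in zip([0.01, 0.04, 0.03, 0.20], benjamini_hochberg([0.01, 0.04, 0.03, 0.20])):
    print(f"p={p:.2f} adjusted={q:.4f}")
```

Reporting the adjusted value for each protein, rather than a single dataset-wide figure, is in line with the per-protein confidence recommended above.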
Data Submission to a Public Repository
All mass spectrometric output files in the original instrument vendor file format must be deposited, at the time of first submission of the paper, in a publicly accessible site that is independent of the authors' control (e.g. any of the ProteomeXchange resources). If spectral libraries were created as part of the study, then the raw data used to create these should also be deposited (unless it is already publicly available, in which case the location to download should be referenced), as well as the libraries created (both target and decoy). The spectral library data should preferentially be deposited as a separate submission to make it easier to reference. Repositories generally require a username and password for access to the submitted dataset. This information must be provided to the Editors at the time of submission to the journal, and it will be made available to Reviewers as part of the manuscript review process. Data conversion to an open format such as mzML is encouraged if software capable of reading the instrument vendor file format is not widely available. In all cases, the spectra are expected to be provided in a form as close to the raw data as possible, prior to any processing that might affect the quality of subsequent interpretation.
In addition, a file must be submitted as supplementary material (and also to the repository to which the raw data was submitted) that maps the relationship between each raw data file, intermediate processed file and results file, and identifies which are biological, technical or process replicates. All software analyses must be documented with the corresponding version of the software.
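The mapping file described above could, for instance, be a tab-separated table with one row per raw file. The sketch below is one possible layout, not a format required by the journal or by any repository; the column names, file names and software label are hypothetical placeholders.

```python
import csv
import io

# Hypothetical column layout; adapt to the study and repository.
FIELDS = ["raw_file", "processed_file", "results_file",
          "sample", "replicate_type", "software_version"]
ALLOWED_REPLICATE_TYPES = {"biological", "technical", "process"}

def write_manifest(rows, handle):
    """Write the mapping rows as a tab-separated table with a header line."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row["replicate_type"] not in ALLOWED_REPLICATE_TYPES:
            raise ValueError(f"unknown replicate type: {row['replicate_type']}")
        writer.writerow(row)

# Placeholder file names and software label, for illustration only.
rows = [
    {"raw_file": "run01.raw", "processed_file": "run01.mzML",
     "results_file": "results.tsv", "sample": "ctrl_1",
     "replicate_type": "biological", "software_version": "ExampleTool 1.0"},
    {"raw_file": "run02.raw", "processed_file": "run02.mzML",
     "results_file": "results.tsv", "sample": "ctrl_1",
     "replicate_type": "technical", "software_version": "ExampleTool 1.0"},
]
buffer = io.StringIO()
write_manifest(rows, buffer)
print(buffer.getvalue())
```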
Requests for exemptions (or delays not related to technical problems) from this requirement must be made in writing to the Data Management Editor [[email protected]] at the time of submission. Embargoed deposits must be made publicly available at the time of publication.
Further information regarding this requirement can be obtained by contacting [email protected].