Acquiring and Analyzing Data Independent Acquisition Proteomics Experiments without Spectrum Libraries

Highlights: Rapid DIA-only library building with gas-phase fractionation. Recommended DIA acquisition strategies with staggered windows and forbidden zones. Optimized DIA instrument settings for several Thermo Orbitrap instruments. Data analysis tutorial using open-source DIA software.

Data independent acquisition (DIA) is an attractive alternative to standard shotgun proteomics methods for quantitative experiments. However, most DIA methods require collecting exhaustive, sample-specific spectrum libraries with data dependent acquisition (DDA) to detect and quantify peptides. In addition to working with non-human samples, studies of splice junctions, sequence variants, or simply working with small sample yields can make developing DDA-based spectrum libraries impractical. Here we illustrate how to acquire, queue, and validate DIA data without spectrum libraries, and provide a workflow to efficiently generate DIA-only chromatogram libraries using gas-phase fractionation (GPF). We present best-practice methods for collecting DIA data using Orbitrap-based instruments and develop an understanding for why DIA using an Orbitrap mass spectrometer should be approached differently than when using time-of-flight instruments. Finally, we discuss several methods for analyzing DIA data without libraries.

Shotgun proteomics (1) using liquid chromatography (LC) inline with tandem mass spectrometry (MS/MS) is enabling a revolution in the study of large-scale systems biology (2). Although the most common approach to large-scale proteomics relies on data-dependent acquisition (DDA) (3), data-independent acquisition (DIA) (4,5) is emerging as a powerful tool for studying the proteome (6). DIA workflows attempt to acquire the same precursor isolation windows repeatedly across the elution profile of a peptide. Unlike parallel reaction monitoring (PRM) (7), which targets specific peptides, DIA targets wide, evenly spaced precursor isolation windows that are tiled across an m/z range of interest. Originally Venable et al. (4) envisioned DIA as a method to detect peptides without requiring a precursor signal. Consequently, the original methods focused on acquiring MS/MS with narrow precursor windows (10 m/z) at 3 Hz with an approximate 35 s cycle time. This approach generally allowed them to acquire at least one MS/MS spectrum within the elution profile of each peak, but quantitation could only be performed on precursor ions in interspersed MS spectra. Modern Orbitrap and ToF instruments can collect MS/MS at 10 Hz or faster and allow for PRM-like quantitation using MS/MS spectra.
DIA MS/MS differ from DDA MS/MS in several ways. First, DDA MS/MS are triggered by the presence of a Top-N intense MS1 ion, whereas DIA MS/MS are triggered systematically regardless of precursor ion intensities. This means that although many MS/MS contain either no peptides or signal that is so low that it is uninterpretable, DIA always contains at least one MS/MS near the apex of the peptide signal, and cannot suffer from the data-dependent ion selection problem of triggering a poor MS/MS from the shoulder of a precursor peak. Second, DDA fragmentation is performed with a charge-state optimized collision energy, whereas DIA fragmentation uses a fixed specified collision energy that may not fragment peptides the same way. Because higher charged peptides typically fragment best with lower collision energies, there can be a tradeoff between over- and under-fragmenting classes of peptides. Because of this, it is important that DDA methods match DIA settings when building spectral libraries and use the same collision energy settings in both acquisition methods (8,9). Third, DDA precursor selection and fragmentation theoretically contain only one precursor species per MS/MS, whereas DIA fragments all coeluting peptides within a specified precursor m/z range. Additionally, DDA analysis methods seldom consider multiple peptides per MS/MS, which commonly occur in congested regions of LC gradients (10) or with complicated or mixed proteomes such as ocean, soil, or gut metaproteomics studies (11).
Despite the wide appeal of DIA for quantitative proteomics, one drawback is that many approaches commonly require generating comprehensive DDA-based spectrum libraries (12) before interpreting any DIA data (9,13,14). Although this approach produces high-performance libraries with instrument-specific fragmentation and retention times (15), it does so at the expense of instrument time, sample, and significant effort offline fractionating that sample. However, several approaches have been developed to detect peptides directly from DIA experiments (16-18), and here we demonstrate how to use them to successfully acquire and analyze DIA experiments without spectrum libraries using a DIA-only workflow (19). This document is designed to build intuition to aid in decision making before starting a DIA experiment. Specifically, we focus on collecting DIA data, constructing DIA-only libraries, and analyzing DIA data with open-source software using ProteoWizard (20), Skyline (21) and EncyclopeDIA (19). Although much of this intuition is transferable to ToF-based experiments, here we use DIA methods for Orbitraps as examples. Nevertheless, the principles emphasized here, such as gas-phase fractionation, are also valid on other instrument platforms. Finally, it should be noted that other open-source (e.g. DIA-Umpire (17)) and commercial (e.g. ProteinLynx Global Server (16), Scaffold DIA or Spectronaut Pulsar) software for detecting peptides without spectrum libraries can also be used.

EXPERIMENTAL PROCEDURES
HeLa Cell Culture and Sample Preparation-We cultured HeLa S3 cervical cancer cells (ATCC, Manassas, Virginia) at 37°C and 5% CO2 in Dulbecco's modified Eagle's medium (DMEM) with added L-glutamine, 10% fetal bovine serum (FBS), and 0.5% strep/penicillin. We washed cells three times with 4°C phosphate-buffered saline (PBS) and froze them using liquid nitrogen. We lysed frozen cells in 9 M urea, 50 mM Tris (pH 8), 75 mM NaCl, and protease inhibitors (Roche, Basel, Switzerland, Complete-mini EDTA-free). After scraping, we probe sonicated cells for 2 × 15 s, then incubated them for 20 min on ice, followed by 10 min of centrifugation at 21,000 × g and 4°C. We estimated supernatant protein content using bicinchoninic acid. We reduced proteins with 5 mM dithiothreitol for 30 min at 55°C, alkylated with 10 mM iodoacetamide in the dark for 30 min at room temperature, and quenched with an additional 5 mM dithiothreitol for 15 min at room temperature. We diluted proteins to 1.8 M urea and digested with sequencing grade trypsin (Pierce, Waltham, Massachusetts) overnight at an estimated 1:50 enzyme to substrate ratio, before quenching with 10% trifluoroacetic acid to achieve approximately pH 2. We desalted the resulting peptides with 100 mg tC18 SepPak cartridges (Waters, Milford, Massachusetts) and dried them with vacuum centrifugation. We brought the peptides to 1 µg/3 µl in 0.1% formic acid and 2% acetonitrile prior to mass spectrometry acquisition.
Liquid Chromatography-We analyzed peptides with a Waters NanoAcquity UPLC coupled with a Q-Exactive HF tandem mass spectrometer (Thermo, Waltham, Massachusetts). We used an in-house pulled column created from 75 µm inner diameter fused silica capillary packed with 3 µm ReproSil-Pur C18 beads (Dr. Maisch, Ammerbuch, Germany) to 300 mm, coupled with a Kasil fritted trap column created from 150 µm inner diameter capillary that was packed with the same C18 beads to 25 mm. Solvent A was 0.1% formic acid in water, whereas solvent B was 0.1% formic acid in 98% acetonitrile. For each injection, we loaded ~1 µg peptides and separated them using a 90-min gradient from 5 to 35% B, followed by a 40 min washing gradient.
Experimental Design and Statistical Rationale-This study is focused on interpreting results from different DIA acquisition strategies, rather than any biological interpretation. As such, technical triplicate experiments of the single-injection runs were acquired from the same HeLa lysate using a block-based data acquisition strategy.
DIA Data Analysis-The library-free data analysis workflow used in this manuscript is fully described as a step-by-step walkthrough in supplementary Note 1. Briefly, we demultiplexed (22) staggered DIA data with 10 ppm accuracy in ProteoWizard (20) (version 3.0.1908). We used EncyclopeDIA (version 0.9.0) to search the resulting demultiplexed mzMLs. Walnut, a reimplementation of the PECAN FASTA search engine in EncyclopeDIA, was configured to search reviewed human proteins from Uniprot Swissprot downloaded December 13, 2019 (20379 total entries), with the default settings: fixed cysteine carbamidomethylation, 10 ppm precursor and fragment tolerances, using Y ions, and assuming full tryptic digestion with up to 1 missed cleavage. Library searching in EncyclopeDIA using both the Pan-Human Library (23) and chromatogram libraries was also configured with the same default settings, except both B and Y ions were used and library tolerances were also configured to 10 ppm. EncyclopeDIA search results were filtered to a 1% peptide-level FDR using Percolator 3.1 and then filtered again to a 1% protein-level FDR assuming protein grouping parsimony.

RESULTS AND DISCUSSION
Designing A Balanced DIA Measurement Strategy-The intent of DIA is to measure as much of the proteome as possible, while still maintaining quantitative rigor. Balancing compromises between these two goals is critical for successful experiments. These compromises manifest as a result of three competing objectives: (a) maximize the total precursor range of targeted peptides, (b) maximize the number of points measured across every chromatographic peak, and (c) minimize the number of interfering peptides in each window.
The Precursor Range of Targeted Peptides-Although it is impractical to measure every possible tryptic peptide in a proteome, the total precursor range can be optimized to target most peptides. For example, although some peptides produce more intense signals below 400 m/z or above 900 m/z, we find that 93% of peptides in the Pan-Human library (23) can be observed within that range (Fig. 1). Assuming a fixed cycle time, narrowing our focus to this range allows us to collect narrower precursor isolation windows, thus lowering signal interference for any given peptide. However, this same range only encapsulates 77% of the phosphopeptides in a human phosphopeptide library (24) of similar scope, suggesting that it is important to match the precursor range to the proteome of interest if specific peptides or modifications are targeted.
The Number of Points Across Chromatographic Peaks-Because quantitative measurements are made at the fragment-level, it is imperative that there are sufficient fragment ion measurements for each precursor isolation window to appropriately represent the peptide peak shape. Following the conventional practice of quantitative mass spectrometry (25), most DIA tools use trapezoidal rule-based integration to measure peak area. Although generally robust, significant measurement errors can be caused simply by undersampling across the shape of the peak (Fig. 2A). Based on a model sampling at fixed intervals across Gaussian distributions (Fig. 2B), we estimate that restricting measurements to a minimum of nine points sufficiently limits bias caused by trapezoidal integration to below an average of 1% (Fig. 2C). In general, we recommend attempting to achieve a minimum average of 10 points to describe a peak to ensure that faster eluting peptides at the beginning and end of the chromatographic gradient are adequately measured.
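The undersampling bias described above can be reproduced with a short simulation. This is a minimal sketch, not the authors' actual model: the function names, the ±4σ sampling span, and the sweep over sampling phase offsets are our own illustrative choices.

```python
import math

def trapezoid_area(xs, ys):
    """Composite trapezoidal rule over sampled points."""
    return sum((y0 + y1) / 2 * (x1 - x0)
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))

def worst_case_error(n_points, sigma=1.0, span=4.0, n_phases=40):
    """Worst-case relative area error when a Gaussian peak is sampled with
    n_points evenly spaced measurements, over a sweep of phase offsets
    (i.e. where the sampling grid lands relative to the peak apex)."""
    true_area = sigma * math.sqrt(2 * math.pi)  # analytic Gaussian area
    spacing = 2 * span * sigma / (n_points - 1)
    worst = 0.0
    for k in range(n_phases):
        phase = spacing * k / n_phases
        xs = [-span * sigma + phase + i * spacing for i in range(n_points)]
        ys = [math.exp(-x * x / (2 * sigma * sigma)) for x in xs]
        worst = max(worst, abs(trapezoid_area(xs, ys) - true_area) / true_area)
    return worst

for n in (5, 9, 15):
    print(n, round(worst_case_error(n), 4))
```

With five points the worst-case error is on the order of a few percent, while nine or more points keeps it well under 1%, consistent with the recommendation above.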
Several data acquisition variables can be adjusted to achieve this requirement: total precursor isolation range, scan rate, and peptide elution width. Average peak elution widths are dependent on the liquid chromatography setup and can be determined by looking at past runs (DIA or DDA) using a fixed gradient. The necessary cycle time is the ratio of the average peak width and the minimum number of points needed to describe a peak (typically 10):

cycle time [s] = average peak width [s] / minimum points across peak

The optimal scan rate is instrument specific. Combined with the estimated cycle time, it is possible to determine the relationship between total precursor range and the fixed precursor isolation width (or average width if using variable width windows):

average precursor isolation width [m/z] = total precursor range [m/z] / (cycle time [s] × scan rate [Hz])

This calculation assumes equal transmission of ions across the entire precursor isolation window, and no ions outside that window. It should be noted that no Q1 quadrupole produces a true square-wave transmission, and that some researchers increase the precursor isolation width to add small margins on either side that account for loss of sensitivity at each window edge. We find that this is typically not necessary for instruments that employ a hyperbolic segmented Q1 quadrupole (e.g. for Thermo instruments, the QE+, QE-HF, QE-HFX, and Fusion Lumos), as long as the window boundaries are placed in "forbidden zones" (discussed below and in supplementary Note 2). However, small margins may help with other Q1 designs.
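The two relationships above compose into a single back-of-the-envelope calculation. This is a minimal sketch; the function name and the example numbers (30 s peaks, 10 points per peak, a 400-1000 m/z range, 20 Hz scanning) are illustrative, not prescriptions from the text.

```python
def dia_window_width(peak_width_sec, points_per_peak, total_range_mz, scan_rate_hz):
    """Derive the average precursor isolation width from the formulas above."""
    cycle_time = peak_width_sec / points_per_peak  # seconds available per cycle
    scans_per_cycle = cycle_time * scan_rate_hz    # MS/MS scans that fit in one cycle
    return total_range_mz / scans_per_cycle        # m/z covered per window

# e.g. 30 s peaks, 10 points/peak, a 600 m/z total range, 20 Hz scanning:
print(dia_window_width(30, 10, 600, 20))  # -> 10.0 m/z windows
```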
The Cost of Interfering Peptides-At first it might appear logical to increase the total precursor range to be as large as possible to measure the greatest number of peptides. Although we previously observed that most tryptic peptides could be detected within the total precursor range of 400-900 m/z, there are some peptides for which the most intense charged ion falls outside that range, and others that are rarely (if ever) observed in that range. As the total precursor range increases, the precursor isolation width also increases, and with that the number of interfering peptides. We find that at some point, interference caused by wider precursor isolation widths outweighs any benefit gained from sampling more peptides. To demonstrate this, we analyzed the same tryptic peptide sample generated from a HeLa lysate using five different acquisition schemes that were scaled to keep cycle time constant (Fig. 3).

FIG. 1. The distribution of peptide precursor m/z in the Pan-Human library (purple) and in the HeLa phosphopeptide library (green) differ significantly, indicating the importance of setting precursor isolation width ranges appropriately for each experiment. The bin size is 10 m/z.

FIG. 2. A, Trapezoidal quantitation can produce poor measurements with only five points (or fewer). B, Error (shaded in blue) in trapezoidal quantitation (red dashed lines) typically cancel out when measuring a Gaussian peak (black solid line) with eight to nine points across the peak. C, The average percent deviation with 95% error bars in actual/calculated area at different number of points across a Gaussian peak. The shaded regions indicate poorly quantified peptides with 5 or fewer transitions (red), moderately quantified peptides with between 6 and 8 transitions (yellow), and well quantified peptides with 9 or more transitions (green).
We found that although the Pan-Human library (23) contains peptides from 400-1200 m/z, increasing the total precursor range from 500-900 m/z did not actually provide any meaningful increase in sensitivity when searching the resulting files with EncyclopeDIA, and indeed we found that the number of detected peptides starts to drop beyond 400-1000 m/z. However, when searching a sample-specific library, where retention times and fragmentation patterns are tuned specifically for the instrumentation and chromatographic setup used in this specific experiment, we found that we detected more peptides at wider windows, presumably because of increasing the precision for matching peptide signatures.
Generating a Windowing Scheme-After deciding on a precursor range, the next step is to build a windowing scheme with an inclusion list. A more detailed review of various inclusion list approaches can be found elsewhere (9); here we will focus on general best practices for window width and placement. In addition to "normal" fixed-window DIA windowing schemes, there are several ways to improve peptide selectivity for the majority of peptides while maintaining a reasonable cycle time (Fig. 4). One approach is to use variable-width windows (26), which adjust the window-width based on the expected number of precursors in those windows. This approach has the advantage of producing narrow precursor isolation windows in regions of the mass range where there are the most peptide precursors. Conversely, windows in areas of the mass range with few precursors are often quite wide (as much as >50 m/z), and this can have a detrimental effect on the ability to differentiate coeluting precursors with similar peptide sequences.
Staggering precursor isolation windows from cycle to cycle (22) is a compressed sensing strategy (27) to achieve higher precursor specificity through undersampling (Fig. 5). Other applications of compressed sensing to achieve higher specificity include higher order multiplexing with MSX (28), and using a scanning quadrupole (29,30). These windowing schemes take advantage of collecting peptide signals from multiple regions of the precursor space in the same MS/MS, and using other MS/MS nearby in time with different precursor space regions to computationally deconvolve signal specific to each individual region. Staggered window and MSX deconvolution are now built-in features in the freely available, open-source software tool ProteoWizard (20), while interpreting data from some scanning quadrupole windowing strategies is possible through Skyline (21).
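A staggered scheme can be sketched as two window lists offset by half a window width that alternate cycle to cycle. This is a hypothetical helper (the function name is ours, and it assumes even integer m/z widths); after demultiplexing, the effective isolation width is half the acquired width.

```python
def staggered_windows(start_mz, stop_mz, width):
    """Two interleaved DIA window sets offset by width // 2, alternated
    cycle to cycle so demultiplexing halves the effective window width."""
    offset = width // 2
    cycle_a = [(mz, mz + width) for mz in range(start_mz, stop_mz, width)]
    cycle_b = [(mz, mz + width) for mz in range(start_mz - offset, stop_mz, width)]
    return cycle_a, cycle_b

# Matches the Fig. 5 example: a 600-624 m/z window whose neighboring
# cycles acquire 588-612 m/z (and 612-636 m/z) windows.
a, b = staggered_windows(600, 648, 24)
print(a[0], b[0])  # (600, 624) (588, 612)
```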
Most quadrupoles do not have perfectly rectangular isolation efficiency over the complete isolation window and DIA methods must account for the fact that transmission efficiency is lower at the window boundaries. The easiest way to do this is to add window margins (typically ±0.5 m/z) to each precursor isolation window, and this works well for normal or variable width windows. However, these margins can significantly complicate the mathematics behind compressed sensing strategies, and coupling these approaches is not recommended.
Another strategy to account for non-rectangular quadrupole efficiency is to take advantage of peptide mass defects. Because peptides are all made of the same components (e.g. amino acids), they have an "averagine" elemental composition (31) and there are regions of m/z values ("forbidden zones") where it is impossible to find a peptide precursor (32) (Fig. 6). Using an inclusion list with windows bordered by forbidden zones maximizes the transmission of precursor ions in the window range. By placing a window edge at one of these forbidden zones where no precursor can possibly exist, the quadrupole transmission edge effects are less pronounced. The forbidden zone width and sparsity changes across the m/z range, but is typically ±0.1 m/z. This matches the isolation window transmission falloff on instruments that utilize a hyperbolic segmented Q1 quadrupole (33).
Practically, this entails shifting the width and edges of the windows off of the chosen integer value (for example, a 24 m/z window from 400 m/z to 424 m/z) so that the window boundaries where quadrupole ion transmission suffers fall in forbidden zones where no precursor can exist (e.g. 400.43 m/z to 424.44 m/z). Forbidden zones for +2H and +3H peptides can be expressed as:

forbidden zone [m/z] = ceil(nominal mass / optimal m/z increment) × optimal m/z increment + optimal m/z constant

where the optimal m/z increment is 1.00045475 and the constant is 0.25 (34,35). This formula is based on the same principles that motivate the 1.0005 m/z bin constant used in SEQUEST (36), and is common to Skyline and EncyclopeDIA. Both programs are capable of generating inclusion lists using these principles by using Skyline's Edit Isolation Scheme feature (21) and EncyclopeDIA's Window Scheme Wizard (19). It should be noted that the constant should be modified when analyzing samples enriched in PTMs, such as phosphorylation, because each phosphate introduces a different mass defect pattern (discussed in more detail in supplementary Note 2). In this case, phospho-enrichment shifts the constant to 0.18.
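The formula above translates directly into code. This is a minimal sketch (the function name is ours); it reproduces the 400.43/424.44 m/z window boundaries used as the example in the text.

```python
import math

INCREMENT = 1.00045475  # optimal m/z increment for tryptic peptides
CONSTANT = 0.25         # optimal m/z constant (shifts to 0.18 for phospho-enriched samples)

def forbidden_zone_boundary(nominal_mz, constant=CONSTANT):
    """Shift a nominal window edge onto the nearest forbidden-zone center."""
    return math.ceil(nominal_mz / INCREMENT) * INCREMENT + constant

# A nominal 400-424 m/z window becomes 400.43-424.44 m/z:
print(round(forbidden_zone_boundary(400), 2), round(forbidden_zone_boundary(424), 2))
```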

Constructing DIA-Only Libraries
Leveraging prior knowledge in the form of libraries is the most common approach for detecting peptides in DIA experiments (37-40). Most commonly, libraries are constructed from fractionated DDA data, at the cost of significant effort, sample quantity, and instrument time. Additionally, relying on DDA-based spectral libraries as prior knowledge for DIA-based experiments assumes that DDA MS/MS are reasonable representations of DIA MS/MS. This assumption can fail even when the DDA-based spectral library is collected on the same instrumentation platform at the same time as the DIA measurements. First, DIA co-fragments all peptides with the same collision energy, regardless of charge state. Typically, researchers must configure the instrument to fragment peptides assuming a specific charge state, and because DDA MS/MS are charge-state optimized for collision energy, the fragmentation patterns may not match. Second, offline fractionation simplifies the matrix each individual peptide sees, relative to the unfractionated sample. Consequently, retention times can shift between the library and quantitative DIA injections because of missing on-column interactions between peptides (41). Finally, DDA spectral libraries from off-line fractionated samples do not result in MS/MS spectra that reflect possible co-fragmentation interferences that would occur in the original unfractionated sample, again because the matrix has been simplified. Regardless, this strategy of generating DDA-based libraries to accompany DIA acquisitions remains the most frequently used approach to analyzing DIA data.

FIG. 5. Schematics for (A) staggered DIA windows, where (B) common peaks in the previous and next scan cycles can be used to computationally cut the precursor isolation window in half. Here the Time = T0 MS/MS scan acquired with a 600-624 m/z precursor isolation window is used as an example. Green ions, which correspond to peaks that can be found in the 588-612 m/z windows in the previous and next scans, can only come from peptides between 600-612 m/z. Similarly, blue ions can only come from peptides between 612-624 m/z. Red ions are found in both sets of previous and next scans and are fractionally allocated to both computationally demultiplexed MS/MS, whereas orange ions are not found in any neighboring scans and can be considered noise.
FIG. 6. Frequency of precursor m/z in +2H and +3H peptides between 600 and 610 m/z in Uniprot Swissprot human tryptic peptides (blue) echoes the frequency of all precursors found in the Pan-Human library (red). Black circles indicate "forbidden zones" where no precursors are likely to exist, which make for excellent DIA precursor isolation window boundaries.
Analyzing DIA Data Sets Without Spectrum Libraries-Circumstances often dictate whether generating a new sample-specific DDA-based spectrum library is feasible for each experiment. Although human samples can be interpreted with the Pan-Human library (23), very few other species-specific public libraries exist that are designed for DIA analysis. Global libraries, such as NIST (42), MassIVE-KB (43), and PeptideAtlas (44) typically do not contain calibrated or indexed retention times (45) (iRTs), making reuse for DIA impractical. In addition to working with non-human samples, other research interests (such as the study of splice junctions, sequence variants, or simply working with small sample yields) can make developing DDA-based spectrum libraries impractical. An alternative strategy is to predict spectrum libraries from FASTA databases using machine learning (46,47), although care must be taken to appropriately correct for false discoveries when searching such large libraries.
One way to address these disadvantages is to identify peptides directly from DIA experiments. Early on, DIA data was analyzed using typical database search engines (4,48) such as SEQUEST (36) but new approaches take advantage of the repetitive MS/MS measurements in DIA. Two major classes of tools have emerged to detect and quantify peptides from DIA experiments. Spectrum-centric analysis tools attempt to demultiplex several peptide signals from the same MS/MS spectra (16-18), by time-aligning elution peaks for both fragment and precursor ions. Fragment ions that co-vary across retention time are likely to come from the same peptide, and matching precursor ions indicate the potential masses for that peptide. These time-aligned ions are converted into demultiplexed "pseudo" spectra that usually represent a single peptide and can be interpreted with any database searching engine. A powerful benefit for this approach is that it can leverage a wealth of downstream MS/MS software because the pseudo spectra effectively resemble DDA data. In contrast, peptide-centric analysis (49) looks for specific peptides across all spectra in a precursor isolation window. PECAN (18) (PEptide-Centric Analysis) is an implementation of this approach that attempts to detect peptide sequences in FASTA databases by scanning raw files across retention time for groups of fragment ions that could have resulted from those sequences. Rather than generating predicted fragmentation models (46,47), PECAN scores target peptides by considering the frequency of observing sequence-specific fragment ions at random in a background FASTA database. Some regions of the retention time gradient produce more fragment ions than others, so PECAN performs the same task with a subset of decoys that represent how peptides might score by random chance, and subtracts this background score at each retention time point.
A series of feature scores are calculated for high scoring peptide-retention time matches (in contrast to peptide-spectrum matches) and these features are aggregated and FDR corrected using Percolator (50). Walnut (19) is a reimplementation of PECAN that makes minor modifications to the original PECAN scoring algorithm to improve speed and memory consumption.
Building DIA-only Libraries With Gas-phase Fractionation-Another strategy is to use additional DIA injections to build DIA-only libraries. Libraries that are constructed with DIA data can act as better priors, particularly when the DIA data is collected on the exact experimental sample and on the exact instrument platform. Gas-phase fractionation (51) (GPF) was originally coupled with DIA in the PAcIFIC approach (48) to improve peptide detection in complex samples by injecting the same sample multiple times, where each injection focused on different precursor isolation ranges. When experiment-representative DIA-based libraries are generated from gas-phase fractionated, narrow-window DIA acquisitions of a pooled reference sample, they are called chromatogram libraries (19). We find that 6× fractions, each covering 100 m/z, provide parallel reaction monitoring (7) (PRM) quality data (2 m/z precursor isolation windows) for every peptide between 400-1000 m/z (Fig. 4F).
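The GPF fraction plan described above is simple to enumerate. This is a minimal sketch (the function name is ours) splitting the recommended 400-1000 m/z range into six 100 m/z injections:

```python
def gpf_fractions(start_mz=400, stop_mz=1000, n_fractions=6):
    """Split a precursor range into equal gas-phase fraction injections."""
    width = (stop_mz - start_mz) // n_fractions
    return [(start_mz + i * width, start_mz + (i + 1) * width)
            for i in range(n_fractions)]

print(gpf_fractions())  # [(400, 500), (500, 600), ..., (900, 1000)]
```

Each returned range would then be tiled internally with narrow (e.g. 4 m/z staggered, 2 m/z effective) precursor isolation windows.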
Preparing the chromatogram library sample requires skimming a small aliquot from a representative set of the experimental samples and pooling them together. Although spectrum libraries are typically generated by strong cation exchange or high-pH reverse-phase liquid chromatography coupled with DDA, a chromatogram library is generated by gas-phase fractionation coupled with DIA (see supplementary Note 2) (48). Separating the library sample by GPF rather than fractionation using chemical interactions preserves the sample matrix, granting more accurate library retention times and more representative fragmentation patterns. In addition, peptide fragmentation patterns are more similar when comparing DIA data to DIA libraries versus DDA libraries, due in part to differences in collision energy estimation assumptions made by the two approaches.
Tradeoffs of Chromatogram Libraries Over Spectral Libraries-GPF involves no additional sample preparation; instead, GPF injects the same sample multiple times, each time focusing the precursor isolation range on a fraction of the overall precursor range of interest. This approach has been shown to improve sensitivity in modern DIA workflows (18), but comes at the expense of requiring additional injections and sample. Any peptides present in only a few samples or present in low abundance in a subset of samples will be diluted below the level of detection, which may mean that the peptide will be missing in the chromatogram library. Although this may be a limitation, it is important to remember that most peptides of very low abundance in a pooled, narrow-isolation-window gas-phase fractionated injection will not likely be detectable in a wide-isolation-window, unfractionated injection. Also, as not all peptides present in the pool are present in each individual sample, searches with chromatogram libraries must be rescored and FDR corrected as with DDA-based libraries.
For many quantitative experiments, adding an additional 6× GPF-DIA injections of a sample pool to a long queue of biological samples adds negligible work for substantial gains in the number of detected peptides. Additionally, because only 6 µg of pooled sample are required, it is possible to construct the pool by mixing the remaining volume left by the autosampler pickup after the single-injection DIA runs. Although this is not optimal in terms of run order (the GPF-DIA injections must be run after all of the single-injection DIA measurements are performed), it is extremely efficient at conserving sample material. However, in smaller experiments with few quantitative samples, these injections can be impractical. In these cases, we recommend using GPF-DIA to collect quantitative samples, similar to the experimental workflow presented in Ting et al. (18). For example, in an experiment with only 3 biological samples, it may make more sense to collect 2× GPF-DIA injections (e.g. 400-700 m/z and 700-1000 m/z) per sample (6 total runs), rather than 6× GPF-DIA injections of a pool plus single-injection DIA for each sample (9 total runs). Unlike with single-injection DDA, we find that there is little technical variation between single-injection DIA runs, and that these 2× GPF-DIA approaches can be preferable to performing technical replicates with DDA. In an extreme case for performing exhaustive detection work on a single biological sample (no quantitative comparisons), we find that PECAN analysis of 6 GPF-DIA injections (essentially constructing a chromatogram library for each sample) can perform comparably to 6× strong cation exchange or high-pH reverse-phase fractionated DDA injections with dramatically less sample material and preparation effort (supplementary Note 2) (41).
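The injection-count arithmetic in the example above can be checked with a trivial helper; this is a hypothetical sketch (the function and parameter names are ours), not a planning tool from the paper.

```python
def runs_needed(n_samples, gpf_per_sample=0, pool_gpf=0):
    """Total injections: either gpf_per_sample GPF-DIA runs per sample, or
    one single-injection DIA run per sample plus pool_gpf runs of a pooled
    chromatogram-library sample."""
    per_sample = gpf_per_sample if gpf_per_sample else 1
    return n_samples * per_sample + pool_gpf

# 3 samples with 2x GPF-DIA each (no pool):
print(runs_needed(3, gpf_per_sample=2))  # -> 6 runs
# 3 single-injection samples plus a 6x GPF-DIA pooled library:
print(runs_needed(3, pool_gpf=6))        # -> 9 runs
```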

Recommended Data Acquisition Methods for Orbitrap Instruments
Rationale for Orbitrap Parameter Settings-Thermo Orbitrap instruments use parallel ion accumulation and ion analysis, where the Orbitrap analyzes ions for the current spectrum while the instrument accumulates ions for the next spectrum. Consequently, the time used for collecting the Orbitrap transient, which is dictated by the resolution setting, must be balanced with the time used for accumulating ions, which is dictated by the Automatic Gain Control (AGC) target and the maximum Ion Inject Time (max IIT). The AGC target setting describes the target ion population the instrument will accumulate before the MS/MS event. The max IIT setting is the maximum time the instrument will spend accumulating ions before the MS/MS event. These two parameters, AGC and max IIT, together define the time it takes to accumulate ions for each MS/MS event. Either ion accumulation stops because of hitting the AGC target, or ion accumulation stops because of reaching the max IIT.
With DIA MS/MS events, we want to optimize for collecting the greatest number of ions in the allotted time without incurring space-charge effects, to increase fragment-level quantification accuracy. Consequently, we set the AGC target higher than for typical DDA MS/MS events, which are optimized to collect just enough peptide sequence ions to make a successful search engine match as fast as possible. As with DDA methods, we set the max IIT to roughly match the time it takes to collect an Orbitrap transient. Narrower precursor isolation windows (e.g. for building chromatogram libraries) require more time to collect enough ions for a successful MS/MS, and thus it makes sense to increase the resolution setting to match the required higher max IITs. In addition to increased mass accuracy, longer transients increase signal-to-noise, resulting in higher quality quantitative measurements at the cost of fewer MS/MS per duty cycle.
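Because accumulation and analysis run in parallel, each MS/MS scan costs roughly the longer of the transient time and the fill time, and the cycle time scales with the number of isolation windows. A back-of-the-envelope check that a method still samples each chromatographic peak often enough might look like this (the transient length and peak width below are illustrative assumptions, not instrument specifications):

```python
def cycle_time_s(n_windows: int, transient_s: float, max_iit_s: float) -> float:
    # With parallel accumulation, each scan costs the longer of the
    # Orbitrap transient and the ion fill time.
    return n_windows * max(transient_s, max_iit_s)

def points_per_peak(peak_width_s: float, cycle_s: float) -> float:
    # Number of MS/MS sampling points across a chromatographic peak.
    return peak_width_s / cycle_s

# e.g. 25 windows, a ~64 ms transient, 55 ms max IIT:
cycle = cycle_time_s(25, 0.064, 0.055)
# roughly a 1.6 s duty cycle, giving ~18 points across a 30 s peak
print(cycle, points_per_peak(30.0, cycle))
```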
Adjusting these settings is important when using different precursor ranges and isolation window schemes, as these produce different populations of ions within the average isolation window. We typically recommend the 400–1000 m/z precursor range with Orbitrap mass spectrometers running at either 10 or 20 Hz as a good tradeoff between breadth and selectivity for most unenriched, tryptically digested proteomics experiments (Tables I and II). Fig. 7 indicates where these specific settings are entered in the method editors of QE- and Fusion-class instruments. Although QE-HFX and Fusion Lumos instruments can collect high-quality DIA MS/MS at 20 Hz with 8 m/z or larger windows, we find that these instruments still benefit from acquiring at a slower 10 Hz with the 4 m/z windows used by GPF-DIA because of the smaller population of ions in each scan (data not shown). Given these parameters, we provide examples of several general inclusion window strategies with which to cover a precursor range, which we will refer to as normal, variable, and staggered (Table III). We provide more detailed windows for other window widths and GPF-DIA windows in supplementary Table S1, and other windowing schemes for experiments with PTMs such as phosphorylation can be created in EncyclopeDIA's Window Scheme Wizard.
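As an illustration of the staggered concept, the following sketch generates two interleaved cycles of wide windows offset by half a window width. This is a simplification: published schemes additionally nudge window boundaries into "forbidden zones" where tryptic peptide precursors rarely fall, which this sketch omits.

```python
def staggered_windows(lo_mz: float, hi_mz: float, width: float):
    # Cycle A tiles the precursor range with `width` m/z windows;
    # cycle B repeats the tiling shifted by half a window, so that
    # demultiplexing can recover width/2 effective windows.
    n = int((hi_mz - lo_mz) / width)
    cycle_a = [(lo_mz + i * width, lo_mz + (i + 1) * width) for i in range(n)]
    cycle_b = [(lo + width / 2, hi + width / 2) for lo, hi in cycle_a]
    return cycle_a, cycle_b

a, b = staggered_windows(400, 1000, 24)
print(len(a), a[0], b[0])  # 25 (400, 424) (412.0, 436.0)
```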
We find that because of the relatively long scan times required by Orbitraps, staggered windows produce the highest quality results in our hands. Because of the filtering inherent in Fourier transform analysis, Orbitraps usually produce MS/MS spectra with relatively low noise, which makes staggered-window deconvolution more accurate. However, staggered windows may not work as well for time-of-flight (ToF) instruments because of their poor noise rejection. Instead, we recommend variable-width windows with small margins as specified by Zhang et al. (26) to take advantage of the high scan speeds ToF mass spectrometers typically provide. Either way, because of software limitations, it is not advisable to use an isolation scheme with both staggered windows and margins.
Precursor spectra are required by DDA to trigger MS/MS. In contrast, with DIA it is possible to detect high-confidence peptides without seeing any precursor signal (18), or even without acquiring MS1 precursor spectra at all (37). However, with Orbitrap instruments, we recommend saving periodic precursor spectra because they are collected by the instrument to aid in the AGC calculation regardless of whether they are written to the raw file. In addition, MS1 precursor spectra improve detection rates with EncyclopeDIA, which does not require seeing precursor signal for each peptide but can use that signal when present to calculate several scores, including the fit of the isotopic distribution. For gas-phase fractions, we recommend acquiring precursor spectra covering only the range of the MS/MS precursor isolation windows. We find that by focusing on regions of the precursor space, we observe more sensitive precursor signals for the peptides of interest, similar to windowed accurate mass and tag approaches such as BoxCar (52).
Queueing-Accurate measured retention times contribute powerful prior knowledge to chromatogram libraries, so it is crucial that the retention times measured in the chromatogram library can be precisely aligned to the wide-window, single-shot quantitative samples. As gas-phase fractions typically do not include the same peptides, unstable retention times between these injections complicate retention time alignment between the fractions in a way that cannot be corrected with the addition of iRT (45) peptides. Because gas-phase fractions subset peptides by mass-to-charge, the only peptides that are present in multiple fractions are different charge states of the same peptide (supplementary Note 2). Without the same peptides detected across gas-phase fractions, retention time alignment cannot easily be performed between them. Therefore, it is important with GPF that retention times between each fraction are as stable as possible. There are two strategies to ensure stable chromatographic retention times: column conditioning and library sandwich queueing.
Column conditioning simply means running several samples after equilibrating a new column. Retention time is sensitive to changes in column age, so it is best to first condition the column to a sample with similar matrix complexity as the experimental samples. As the column ages and is exposed to the sample matrix, retention times should stabilize. In practice, tracking retention time stabilization can be done by choosing a handful of endogenous peptides or spiking a synthetic set of peptides into the conditioning sample and tracking the retention times and peak shapes over each injection before queueing any experimental samples.
Second, the order in which the library and the quantitative samples are acquired can help capture the average retention time in the GPF library. We recommend running the chromatogram library pool in the middle of the experimental sample queue, "sandwiched" on either side by the unpooled, single-shot quantitative samples (Fig. 8). Because the pooled sample is an empirical average of the unpooled, single-shot quantitative samples, running at least 6 of the single-shot quantitative samples before running the pooled library sample gives the most stable retention times. Although retention times may shift between the first acquisition and the last, the order in which peptides elute typically does not. Because retention time ordering stays constant on the same column, it is easy to precisely align peptide retention times between two runs on the same column with EncyclopeDIA, including samples acquired early in the column's lifetime. However, this process assumes that there are no retention time deviations between the GPF-DIA library-building injections, which cannot easily be aligned to each other (supplementary Note 2). Consequently, we find it better to run the GPF-DIA injections in the middle of the acquisition queue, after the column's retention properties have stabilized.
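A sandwich queue can be written down mechanically; this sketch (with hypothetical run names) places the GPF library injections mid-queue, after enough single-shot runs have conditioned the column:

```python
def sandwich_queue(samples, gpf_runs, min_leading=6):
    # Run at least `min_leading` single-shot quantitative samples
    # first, then the GPF chromatogram-library injections, then the
    # remaining quantitative samples.
    split = max(min_leading, len(samples) // 2)
    return samples[:split] + gpf_runs + samples[split:]

queue = sandwich_queue([f"sample_{i}" for i in range(1, 13)],
                       [f"gpf_{i}" for i in range(1, 7)])
print(queue[5:8])  # ['sample_6', 'gpf_1', 'gpf_2']
```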
Verifying Raw Data Quality-The raw data quality of DIA data can be assessed either qualitatively, as in shotgun proteomics, or quantitatively, as in targeted proteomics, before any peptide detection or quantification is performed. There are three ways to quickly assess data quality: comparing file sizes, inspecting the total ion current (TIC) trace shape, and evaluating ion inject times (IIT). The most basic assessment is noting file size across the runs. Quantitative single-shot samples should all produce similarly sized files; likewise, each gas-phase fraction should be roughly similar in file size. If any file is substantially smaller, this may indicate a sample or acquisition issue that should be investigated further. Next, the TIC profile ideally forms a right-angled plateau with no obvious spikes in the gradient. Visual evaluation of the TIC can be performed by opening the acquisition file in vendor-specific software like XCalibur (Thermo) or Analyst (SCIEX), or with tools like SeeMS (ProteoWizard) and RawMeat (Vast Scientific). Finally, the IIT across a DIA experiment ideally is not affected by the precursor isolation windows. Plots of the IIT across retention time and isolation windows can be constructed with RawDiag (53) or EncyclopeDIA's built-in Raw File Browser function (supplementary Note 1). Although IIT should ideally remain unchanged across retention time, it is common to observe that the average IIT forms a U shape across the gradient: more ions elute in the middle of the gradient, so the instrument needs less time to accumulate enough ions to trigger the MS/MS, whereas at the beginning and end of the gradient fewer ions elute, so the instrument spends more time filling the ion trap and is more likely to trigger only upon reaching the maximum IIT.
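The file-size check is easy to automate. A minimal sketch operating on a mapping of run name to file size (in practice, sizes would come from `os.stat` on the raw files; run names and the 50% tolerance are illustrative):

```python
from statistics import median

def flag_small_runs(sizes: dict, tolerance: float = 0.5):
    # Flag runs whose file size falls well below the median of the
    # group -- a quick screen for failed injections or truncated runs.
    group_median = median(sizes.values())
    return sorted(name for name, size in sizes.items()
                  if size < tolerance * group_median)

sizes_mb = {"run_01": 950, "run_02": 1010, "run_03": 310, "run_04": 980}
print(flag_small_runs(sizes_mb))  # ['run_03']
```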

Interpreting DIA Data
Data File Preparation-Acquisition files from the mass spectrometer are first converted using MSConvert (supplementary Note 1). For data acquired using the staggered window scheme discussed above, the files require a computational demultiplexing step before they can be processed by data analysis software. Demultiplexing separates the staggered precursor isolation windows into their effective parts; for example, the first few cycles in the staggered scheme described in Table III (D, E) are computationally demultiplexed into 12 m/z effective windows such that the converted output file contains isolation windows 400–412 m/z, 412–424 m/z, 424–436 m/z, etc. This requires only one additional parameter flag during the MSConvert step.
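The signal deconvolution itself is handled by the conversion software, but the window bookkeeping is simple: the overlap of the two staggered cycles defines effective windows of half the acquired width. A sketch of the boundary arithmetic:

```python
def effective_windows(lo_mz: int, hi_mz: int, staggered_width: int):
    # Staggered windows offset by half a width demultiplex into
    # effective windows of width staggered_width // 2.
    eff = staggered_width // 2
    return [(mz, mz + eff) for mz in range(lo_mz, hi_mz, eff)]

print(effective_windows(400, 436, 24))
# [(400, 412), (412, 424), (424, 436)]
```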
Constructing a Chromatogram Library-A chromatogram library is generated from the gas-phase fractionated, narrow-window acquisitions of the pooled reference sample. Peptide detections made in these GPF, narrow-window acquisitions define the set of all possible peptide detections that can be made in the wide-window acquisitions. Although these acquisitions can be analyzed by searching against a spectrum library, here we describe searching against a FASTA proteome with Walnut, a performance-optimized implementation of the PECAN algorithm (supplementary Note 1).

Library-free Analysis of DIA Proteomics Experiments
Walnut detects peptides in the pooled reference sample acquisitions by searching a "target" proteome FASTA in the context of a "background" proteome FASTA. In global proteome experiments, the target and background proteomes will be the same. In enriched or targeted proteome experiments, the target proteome should contain only the proteins the researcher is interested in, whereas the background proteome should contain all proteins possibly present in the sample. For example, an experiment investigating global protein abundances in yeast should use the yeast reference proteome as both the target and background; for an experiment focused on changes in mitochondrial respiration in yeast, the target proteome would be the yeast mitochondrial proteins whereas the background would be the full yeast reference proteome.
When searching for all peptides in a FASTA database, we suggest limiting searches narrowly to only peptides and proteins that are likely to be present in the samples. Unlike with DDA, where the search space is proportional to the number of acquired MS/MS, with peptide-centric DIA the search space is proportional to the number of peptides considered. Consequently, we do not recommend searching for PTMs, such as oxidation or phosphorylation, unless the experiment is enriched for oxidized or phosphorylated peptides. Similarly, we recommend only searching for ions that are likely to be present, either by limiting Walnut searches to y-ions only when using beam-type fragmentation, or by limiting spectral predictions to only ions predicted to be above an appropriate intensity threshold.
FDR Control-When using Walnut to generate a chromatogram library, the gas-phase fractions are searched against the target FASTA and an equal number of decoys. The detections thresholded at a 1% peptide-level FDR in each gas-phase fraction are combined into a single chromatogram library, which is additionally controlled to a global 1% peptide-level FDR.
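The core of target-decoy FDR thresholding can be sketched as follows. This is a deliberately simplified estimator (decoys passing divided by targets passing at a descending score cutoff); production tools such as Percolator and EncyclopeDIA use more refined q-value calculations.

```python
def score_cutoff_at_fdr(scores, is_target, fdr=0.01):
    """Lowest score cutoff whose estimated FDR (decoys/targets among
    passing hits) stays at or below `fdr`; None if no cutoff works."""
    cutoff = None
    targets = decoys = 0
    for score, target in sorted(zip(scores, is_target), reverse=True):
        if target:
            targets += 1
        else:
            decoys += 1
        if targets and decoys / targets <= fdr:
            cutoff = score
    return cutoff

scores = [10, 9, 8, 7, 6]
labels = [True, True, False, True, True]  # True = target, False = decoy
print(score_cutoff_at_fdr(scores, labels, fdr=0.5))  # 6
```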
In a chromatogram library workflow as described here, the size of the library is sample-specific, as opposed to repository-based libraries, which may include proteins and peptides not expressed in the current experiment. Using a sample-specific library decreases the statistical burden of false discovery rate (FDR) control by reducing the size, comprehensiveness, and heterogeneity of the peptides and proteins represented in the library. However, even with the smaller statistical burden of a sample-specific chromatogram library, it is important to control FDR at each of the levels represented in a quantitative DIA experiment (54). Not only is the library FDR controlled; in this workflow, each quantitative single-injection DIA file searched is itself additionally FDR controlled, and each quantitative experiment is also globally FDR controlled at the 1% peptide and 1% protein levels.
Library searching assumes that all library entries are correct, and incorrect entries can propagate as true positives in target/decoy analysis (55). This concern is partially mitigated by multiple levels of filtering when building the library, and may benefit from additional filtering at a protein-level FDR. Further work is necessary to validate and improve FDR estimates for library searching in DIA experiments.
Refining Transitions for Quantification-EncyclopeDIA calculates a global interference score for transitions across all wide-window samples in the experiment and only uses the set of best-scoring, interference-free transitions to integrate and sum for peptide quantification. Using this method for automated transition refinement, we see that as more interference-free fragments are required for quantification, the reproducibility of peptide quantification improves. We find that the peak area quantifications for peptides with just one or two interference-free transitions are extremely variable, with median coefficients of variation (CV) greater than 50% (Fig. 9). When we require at least three interference-free transitions, the median peak area CV improves to 20%. Requiring five or more interference-free transitions brings even the upper-quartile peak area CV below 20%.
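The CV metric used above reduces to a few lines; a sketch using hypothetical peak areas:

```python
from statistics import mean, median, stdev

def cv_percent(areas):
    # Coefficient of variation of a peptide's peak areas across runs.
    return 100.0 * stdev(areas) / mean(areas)

def median_cv(peptide_areas: dict):
    # Median CV over all peptides in the experiment
    # (peptide_areas maps peptide -> peak areas across replicates).
    return median(cv_percent(areas) for areas in peptide_areas.values())

print(cv_percent([100.0, 110.0, 90.0]))  # 10.0
```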
As with other targeted proteomics approaches, the chromatogram library approach integrates signal over the chromatographic region where a peptide is detected by placing integration boundaries on either side of a peak group. Background is subtracted from each peak group's signal area, and finally the background-subtracted peak areas are TIC normalized. For peptides whose signal does not exceed the background noise, the peak area is reported as zero rather than "NaN". Using zero in quantification instead of NaN reflects the systematic sampling of DIA: peptides that are not detected in every sample of a DIA experiment are not missing at random, but rather are either present but below the limit of quantification, present but below the limit of detection, or truly absent from the sample. Without further validation experiments (56), it is not possible to discern which scenario describes the failure to detect a peptide in a given sample.
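These quantification steps reduce to a short transformation; a sketch, using the summed background-corrected areas of a run as a stand-in for its TIC:

```python
def normalize_run(peak_areas, backgrounds):
    # Background-subtract each peak group, report zero (not NaN) when
    # the signal does not exceed background, then TIC-normalize the run.
    corrected = [max(area - bg, 0.0)
                 for area, bg in zip(peak_areas, backgrounds)]
    tic = sum(corrected)
    return [area / tic for area in corrected] if tic else corrected

# The last peptide's signal does not exceed background, so it is
# reported as 0.0 rather than NaN:
print(normalize_run([10.0, 5.0, 2.0], [2.0, 1.0, 3.0]))
```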

Current Limitations
The Tradeoff of Mass and Temporal Separation-Many peptides generate the same fragment ions, either because of sequence similarity or because of shared post-translational modifications.

Longitudinal Studies and Batch Effects-Although a single pooled library sample is appropriate for most basic research purposes, there are some experimental scenarios where acquiring multiple chromatogram libraries may be best practice. For example, experimental designs with 3+ sample groups may benefit from a multiple chromatogram library strategy, in which each sample group is pooled for a sample type-specific chromatogram library. Large experiments spanning multiple columns, whether because of planned instrument maintenance or unplanned column changes because of column clogs, may benefit from treating each column as a separate set of experiments and calibrating to normalize between them (60). To note, a chromatogram library should be collected on each column used in an experiment. This requirement can be satisfied either by pooling the representative sample with enough volume to acquire chromatogram libraries on the same sample on multiple columns, or by designing the acquisition queue such that the chromatogram library for each column reflects the sample replicate blocks acquired on that column. Although it is possible to share chromatogram libraries between multiple columns, we typically do not recommend it because slight variations between columns can change peptide retention time ordering. This reordering may be correctable with peptide-by-peptide alignment using tools like DIAlignR (61), but cannot be corrected using standard global alignment.

CONCLUSIONS

DIA is a powerful technique for analyzing quantitative experiments, and GPF-DIA can be a useful, low-effort method for deep fractionation for the purposes of detecting peptides.
Here we present "best practices" for DIA methods on Orbitrap mass spectrometers and the intuition needed to modify them to suit specific types of experiments. In addition, we discuss how to collect DIA-only libraries from a pool using GPF-DIA, with an example developed as a tutorial in supplementary Note 1, and when it makes sense to deploy this approach. We also provide supplementary Note 2, which contains answers to frequently asked questions about this approach. This DIA-only method of library generation can make DIA an attractive alternative to DDA, even when sample is limited or spectrum library building is impractical. Finally, library-free data analysis workflows can be performed using open-source, GUI-based tools as discussed here, or with vendor-supported commercial software.

DATA AVAILABILITY
The raw data are available at MassIVE (MSV000084531) at https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=b0c33f6caba04dd7a17a18e01e35071d.

Conflict of interest-The authors have declared a conflict of interest. B.C.S. is a founder and shareholder in Proteome Software, which operates in the field of proteomics. The MacCoss Lab at the University of Washington has a sponsored research agreement with Thermo Fisher Scientific, the manufacturer of the instrumentation used in this research. Additionally, M.J.M. is a paid consultant for Thermo Fisher Scientific.