High-speed analysis of large sample sets – how can this key aspect of the omics be achieved?

High-speed analysis of large (prote)omics sample sets at the rate of thousands or millions of samples per day on a single platform has been a challenge since the beginning of proteomics. For many years, electrospray ionisation (ESI)-based mass spectrometry (MS) methods have dominated proteomics due to their high sensitivity and great depth in analysing complex proteomes. However, despite improvements in speed, ESI-based MS methods are fundamentally limited by their sample introduction, which excludes off-line sample preparation/fractionation because of the time required to switch between individual samples or sample fractions; they therefore depend on the speed of on-line sample preparation methods such as liquid chromatography. Laser-based ionisation methods can move from one sample to the next without these limitations and are mainly restricted by the speed of modern sample stages, i.e. 10 ms or less between samples. This speed matches the data acquisition speed of modern high-performing mass spectrometers, while the pulse repetition rate of the lasers (>1 kHz) provides a sufficient number of desorption/ionisation events for successful ion signal detection from each sample at this stage speed. Other advantages of laser-based ionisation methods include a generally higher tolerance to sample additives and contamination compared to ESI MS, and the contact-less and pulsed nature of the laser used for desorption, which reduces the risk of cross-contamination. Furthermore, new developments in matrix-assisted laser desorption/ionisation (MALDI) have expanded its analytical capabilities, enabling it to fully exploit high-performing hybrid mass analysers and their strengths in sensitivity and MS/MS analysis by generating an ESI-like stable yield of multiply charged analyte ions. Thus, these new developments and the intrinsically high speed of

The 'omics', in particular proteomics, have tremendously benefitted from the arrival of modern mass spectrometry (MS) with its unrivalled performance in sensitivity while providing high specificity and superior multiplexing due to its exquisitely high resolution in mass separation. Simultaneous and accurate detection of numerous forms of biomolecules is easily achievable in one MS experiment. This biomolecular detection sensitivity has been exploited and further improved over the years, also in combination with up-stream sample fractionation methods, lowering the limits of detection and expanding the number of identified and quantified proteins as well as other biomolecules (metabolites, lipids, etc.). It has led to a race for higher proteome coverages with records being frequently broken as exemplified by work in the areas of phosphoproteomics (1)(2)(3)(4)(5) and blood plasma proteomics (6)(7)(8).
The invention and commercial manufacturing of ever newer and faster mass spectrometers were crucial for these advances into the depth of many proteomes. Orbitrap technology (9,10) and ion mobility spectrometry (11,12) are good examples of novel concepts of ion manipulation that supported this development. In combination with faster MS/MS and ion detection, faster signal read-outs and new MS/MS strategies such as data-independent acquisition (DIA) (11,13), higher proteome coverages have been obtained at increasing speed. In addition, further improvements in protein labelling methods (e.g. greater multiplexing (14)) and separation techniques (e.g. UHPLC (15,16)) have helped to speed up the analysis of complex samples (17). Thus, new MS hardware and methods as well as advances in up-stream separation/fractionation techniques have arguably resulted in fast large-scale proteomics.

Downloaded from https://www.mcponline.org by guest on November 5, 2020
However, the impact of these advances is greatest for in-depth proteomics and the analysis of extensively processed samples that often undergo complex sample preparation protocols (1,6,11). These protocols typically rely on (nano)HPLC separation and frequently on further up-stream sample purification/extraction such as filter-aided sample preparation, and can take a minimum of 3 days from proteolytic digestion to data evaluation (11). For the analysis of sub-proteomes, these protocols can be even more complex, including protein depletion if there is a large protein abundance range, as in blood (6), or specific peptide/protein enrichment by affinity purification, as is the case in phosphoproteomics (1). For obvious reasons, the number of such thoroughly prepared samples is limited, simply because they are prepared in specialised labs, of which only a few exist.
Some of the latest advances in deep proteome analysis now allow the confident analysis of >10 (HeLa cell line) proteomes per day at a depth of nearly 8,000 identified proteins using TMT labelling/multiplexing (4), while new data-mining software based on neural networks can substantially increase the number of confident precursor peptide identifications in a DIA bottom-up proteomics approach (18). Both strategies readily enable an in-depth analysis (>5,000 proteins; >50,000 peptide precursors) of 10-20 (HeLa) proteomes per day on a single MS instrument. Further improvements in this area can be expected and might one day allow the analysis of 100 or more proteomes per day, though in the near future most likely only at the expense of proteome coverage.
However, there seems to be no major movement towards extremely fast omic analysis of several samples per second for extremely large sample sets, i.e. millions or even billions of samples, despite the realisation that baseline abundances of specific proteoforms or other biomolecular species can vary substantially amongst (healthy) individuals, in particular in the human population (19)(20)(21), which for diagnostic purposes ultimately demands frequent longitudinal sampling of all individuals in a given population. Only this type of large-scale sampling and subsequent (prote)omic analysis will provide the much-needed data for advancing the understanding of population-wide proteome changes and for exploiting protein/proteome analysis for improved diagnostics and therapeutics. For population-wide preventive medicine, frequent measurements of an individual's proteome (or subsets of it) will be the next crucial step in clinical proteomics, with the potential to fulfil the promises of the much-heralded future of personalised and precision medicine.
Unfortunately, the scale and speed for this type of proteomics is far from achievable with the analytical tools currently employed, especially with those for in-depth proteome analysis. Thus, it seems reasonable to consider a departure from exclusively focusing on these tools and the quest for comprehensiveness.
With regard to MS-based methods, the coupling of slow separation/fractionation methods such as chromatography to electrospray ionisation (ESI) has served the first decades of proteomics well. Many proteomes have been qualitatively and (to a lesser extent) quantitatively catalogued and compared in depth, though typically only from a few biological replicates and time points, as the price for in-depth analysis is an analysis time of hours or, at best, tens of minutes per sample (18). Even multiplexing using current labelling methodologies cannot provide the means for the analysis of millions, let alone billions, of samples per day.
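The scale gap described above can be made concrete with simple throughput arithmetic. The following Python sketch is illustrative only: it assumes a single instrument running back-to-back around the clock with no downtime, and the 30-minute run time and 16-plex labelling factor used in the example are hypothetical values, not figures from the cited studies.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def samples_per_day(run_time_s: float, plex: int = 1) -> float:
    """Samples analysed per day on one instrument running back-to-back.

    run_time_s: instrument time per (multiplexed) run, in seconds.
    plex: samples pooled per run via isobaric labelling (1 = label-free).
    """
    return plex * SECONDS_PER_DAY / run_time_s


def max_run_time_s(target_per_day: float, plex: int = 1) -> float:
    """Longest run time per (multiplexed) run that still meets target_per_day."""
    return plex * SECONDS_PER_DAY / target_per_day


if __name__ == "__main__":
    # Hypothetical example: a 30-minute LC-MS run, label-free vs. 16-plex.
    print(samples_per_day(30 * 60))           # 48.0
    print(samples_per_day(30 * 60, plex=16))  # 768.0
    # Run time required for one million samples per day with 16-plex labelling.
    print(max_run_time_s(1e6, plex=16))       # 1.3824
```

Under these assumptions, even a 16-plex design would need a complete LC-MS run roughly every 1.4 s to reach a million samples per day, far shorter than the run times of tens of minutes to hours discussed above.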
Consequently, the low speed of up-stream sample fractionation and of ESI needs to be addressed. With respect to ESI, a handful of groups have considered improvements to its sample introduction and thus its sample-to-sample speed. These efforts have led to sample analysis rates of seconds per sample (22,23), in some cases even up to 6 samples per second, though with the caveat of a relatively convoluted (micro)fluidic sample introduction system (24). However, the latter proof-of-principle study was performed with the small drug molecule dextromethorphan and its primary metabolite dextrorphan rather than with peptides or proteins as analytes. While up to 6 samples per second is extremely fast compared to conventional ESI MS analysis, it could well be its practical limit, given that this rough calculation assumes a continuous sample supply, as might be possible using a conveyor belt set-up. However, the up-stream and on-line nature of conveyor belt technology would realistically require several sample preparation stations along the conveyor belt, with all the disadvantages of a complex on-line multi-station up-stream sample preparation system. A more practical scenario is the use of microtiter plates as a standard format, which provides a truly off-line and scalable sample preparation system that can be set up with the number of commercial sample preparation stations needed to feed the mass spectrometer at the required sample throughput. In this case, additional time for changing plates needs to be added. Using modern robotics, microtiter plates can easily be changed within 5 seconds, and with high-density formats such as 1536-well microtiter plates and a laser-based sample-to-sample time of 10 ms, rates of approximately 6.5 million samples analysed per day should be feasible on one laser-based MS instrument. Roughly 1,000-2,000 of these analytical platforms would therefore be sufficient for analysing 1 sample per human being per day.
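The plate-based throughput estimate above can be reproduced with a few lines of arithmetic. This Python sketch assumes the 10 ms sample-to-sample time quoted earlier for modern sample stages, together with the 5 s plate change and 1536-well plates described in the paragraph above; the world population of about 7.8 billion used for the instrument count is an assumed value (roughly the 2020 figure), and downtime, calibration and data handling are ignored. It also checks that a laser pulsing at 1 kHz delivers about 10 shots per sample at this stage speed.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s

# Assumed value (not from the text): world population of roughly 7.8 billion.
WORLD_POPULATION = 7.8e9


def plate_samples_per_day(wells: int = 1536,
                          s_per_sample: float = 0.010,
                          plate_change_s: float = 5.0) -> float:
    """Samples per day for back-to-back plates, each costing
    wells * s_per_sample seconds of analysis plus plate_change_s seconds."""
    seconds_per_plate = wells * s_per_sample + plate_change_s
    return wells * SECONDS_PER_DAY / seconds_per_plate


def continuous_samples_per_day(samples_per_s: float) -> float:
    """Samples per day with an uninterrupted sample supply (e.g. a conveyor belt)."""
    return samples_per_s * SECONDS_PER_DAY


def instruments_needed(total_per_day: float, per_instrument: float) -> int:
    """Instruments required (rounded up) to reach total_per_day."""
    return math.ceil(total_per_day / per_instrument)


def shots_per_sample(rep_rate_hz: float = 1000.0, s_per_sample: float = 0.010) -> float:
    """Laser pulses available per sample at a given repetition rate and dwell time."""
    return rep_rate_hz * s_per_sample


if __name__ == "__main__":
    laser = plate_samples_per_day()
    print(f"laser/plates: {laser:,.0f} samples/day")  # roughly 6.5 million
    print(f"ESI at 6/s:   {continuous_samples_per_day(6):,.0f} samples/day")  # 518,400
    print(f"instruments:  {instruments_needed(WORLD_POPULATION, laser)}")  # about 1,200
    print(f"shots/sample: {shots_per_sample():.0f}")  # 10
```

Without the 5 s plate change, the same 10 ms per sample would give 8.64 million samples per day, so under these assumptions plate handling costs roughly a quarter of the theoretical capacity; this is why high-density plate formats matter.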
Data transfer, processing and further mining at these high data acquisition speeds might then present the next challenge, which will also depend on the exact use of metadata and database searching. Even more challenging will be the logistics of individual sample collection and delivery to the laboratory at this scale. Interestingly, these questions have recently become highly topical as part of the COVID-19 testing response and calls for developing future delivery systems such as small drone deliveries (cf. UK Research and Innovation's Future Flight Challenge). Nonetheless, given current performance data, moving away from ESI- to laser-based MS analyses for greater speed is likely to lead to a loss of overall sensitivity, even if the same or similar sample pre-fractionation methods were employed and MALDI was used as the softest and most sensitive laser-based ionisation technique (28,29). For proteomics analysis, ESI-based methods are currently the most sensitive, providing the greatest proteome coverages, particularly in combination with on-line LC separation. However, compared to ESI, MALDI has competitive advantages in three important analytical areas, namely scalability, speed and sample flexibility. Sample flexibility, i.e. flexibility with regard to the overall sample conditions, is important if these conditions need to be adjusted to provide an optimal environment for the analyte to be ionised and detected or, equally important, if sample conditions are sub-optimal but would take a lot of time to 'clean up' for best ionisation results. The latter would have a direct impact on the time needed for sample preparation. In combination with its advantages in scalability, which are partly a result of off-line sample preparation, and speed (due to the above-mentioned laser characteristics), MALDI is ideal for high-speed biomolecular analysis of extremely large sample sets.
Taking into account its good performance in low-speed, in-depth proteomic analysis (and MS imaging), MALDI is probably the most versatile proteomic tool.
A relative comparison of the analytical performance of ESI, DESI, (conventional solid-state) MALDI and liquid AP-MALDI in six important areas (sensitivity, scalability, speed, signal stability, sample flexibility, and structural elucidation) is shown in Figure 2.
Employing MALDI instead of ESI and exploiting its capabilities in high-speed MS profiling of large sample sets undoubtedly results in a reduction in the number of biomolecules that will be detected. Nevertheless, recent developments have shown that fast MALDI MS profiling has gained further depth. The use of heated atmospheric pressure (AP) ion sources on hybrid mass analysers and of liquid MALDI sample preparation methods that facilitate the production of multiply charged proteinaceous analyte ions have significantly contributed to these developments (30). These recent advances add new functionalities to (prote)omic profiling by MALDI MS, which in its earlier form was significantly less advanced and ultimately unsuccessful in accurate disease diagnostics (cf. SELDI (31)). In combination, they could soon make it sufficiently sensitive for the detection of important peptide/protein panels and other molecular biomarkers that can be further exploited for clinical diagnostics as well as for understanding the underlying (systems) biology (30). Some protein digest replicate analyses at around 5-10% or less compared to 10-25% or more for solid MALDI, particularly for low-purity samples (39). The second main advance, the production of ESI-like multiply charged ions, allows the use of high-performing hybrid mass analysers with a typically small m/z range, as found in Q-TOF and hybrid Orbitrap instruments. It therefore enables post-source MS/MS peptide sequencing in the same way as for ESI-generated peptide ions, without any indication of a difference in fragmentation related to the origin of these ions (37,40); for in-source fragmentation, however, fragment ion generation appears to depend on the exact matrix being used and therefore differs from ESI (41). This new and unique feature in MALDI MS was exploited by Hale et al. for identifying discriminative protein fragments in liquid AP-MALDI MS profiling for the accurate detection of bovine mastitis from extremely small amounts of milk (30). Interestingly, liquid AP-MALDI MS profiling on a Q-TOF instrument also revealed another advantage of this MALDI/Q-TOF combination, showing that small molecules such as metabolites and lipids can be effectively co-detected together with larger peptides and proteins in a single spectrum (30,42). In conventional (vacuum) MALDI MS using axial TOF instruments, usually only one or the other can be effectively analysed due to the difference in acquisition modes employed for small molecules (high-resolution reflectron mode) and larger molecules such as proteins (linear mode with increased laser energy, and thus extremely high amounts of ions in the lower m/z range, which are normally suppressed before and/or at the detection step). Thus, liquid AP-MALDI MS provides a greater range of accessible analytes that can be detected in the same spectrum, making comprehensive biomolecular profiling by MALDI MS potentially more powerful than that offered by current methods using conventional (solid-state and vacuum) MALDI MS profiling, as available in commercial MALDI MS biotyping platforms.
The potential of these new developments in MALDI, in addition to the well-known fundamental advantages of a laser-based analytical method (e.g. in speed), has so far attracted little attention. With these step changes being made in MALDI MS-based methods, it now seems a good time to (re)consider extremely high-speed and large-scale proteomics, with all its potential for gaining additional information for biological systems analysis and disease diagnostics.
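Why ESI-like multiply charged ions matter for hybrid analysers with a small m/z range can be illustrated with the standard relation for protonated ions, m/z = (M + z × m_p) / z, where M is the neutral mass, z the charge and m_p the proton mass. The Python sketch below uses a hypothetical 10 kDa protein and an illustrative m/z window of 100-2,000; actual analyser ranges differ between instruments, and the z_max cap is an arbitrary search limit.

```python
PROTON_MASS_DA = 1.007276  # proton mass in Da


def mz_protonated(neutral_mass_da: float, z: int) -> float:
    """m/z of the [M + zH]z+ ion for a neutral mass M (in Da)."""
    if z < 1:
        raise ValueError("charge state must be a positive integer")
    return (neutral_mass_da + z * PROTON_MASS_DA) / z


def charge_states_in_range(neutral_mass_da: float, mz_min: float,
                           mz_max: float, z_max: int = 60) -> list[int]:
    """Charge states whose [M + zH]z+ ions fall inside an m/z window."""
    return [z for z in range(1, z_max + 1)
            if mz_min <= mz_protonated(neutral_mass_da, z) <= mz_max]


if __name__ == "__main__":
    print(round(mz_protonated(10_000.0, 1), 3))   # 10001.007: outside the window
    print(round(mz_protonated(10_000.0, 10), 3))  # 1001.007: inside the window
    print(charge_states_in_range(10_000.0, 100.0, 2_000.0)[:3])  # [6, 7, 8]
```

A singly charged 10 kDa ion, the typical outcome of conventional MALDI, falls well outside such a window, whereas charge states from 6+ upward fall inside it.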
In this context, it has to be noted that the recent large influx of new acronyms for supposedly novel (laser) ionisation techniques, without scrutiny of the technique's analytical usefulness and, in many cases, its novelty, let alone of the need for creating a new acronym, has not been helpful. In fact, it confuses the field and makes it difficult to identify true advances. As a result, many groups are deterred from further exploring truly novel and advantageous developments in this area.
Finally, non-MS proteomic tools such as immunoassays and enzymatic activity assays may in many cases seem to be good alternatives for large-scale and high-speed proteomics (43,44). In virtually all cases, however, the highly targeted nature of these assays and the time often needed for the necessary reactions and read-outs disqualify these methods as serious competitors. In many cases, issues with respect to specificity, traceability and development costs are further aspects that ultimately render them uncompetitive. Nevertheless, it has to be noted that one strong advantage of such methods is the possibility of point-of-care and in-field applications, which is still a weak point for all current MS-based (prote)omic analyses.
In conclusion, after the advent of modern MS in the omics and its stellar rise as the analytical method of choice, MS has further advanced in areas of its obvious strength, such as high sensitivity and specificity in biomolecular detection, mainly as a result of further improvements in mass analysers (including greater mass measurement resolution and accuracy), sample preparation methods and separation/fractionation techniques. ESI-based methods have been the obvious choice for most of these advances and at the centre of further improvements. As a direct result, some areas such as high-speed and large-scale (prote)omic analysis of large numbers of biological samples have remained challenging and have therefore been somewhat neglected. However, there have been advances in these areas, and it is now feasible to undertake (prote)omic analysis of millions of samples at much higher speed using non-ESI-based methods. These analyses might never provide the depth of LC-ESI MS/MS with all the improvements made over decades (e.g. nanoESI, UHPLC, multiplex labelling, …) but will be able to provide some depth in biomolecular detection, partly benefitting from the same MS hardware that also improved ESI-based analyses. Importantly, these non-ESI-based methods can offer the speed and scalability that are still missing for satisfying the requirements of population-wide and longitudinal sample collections.
Achieving the latter will not only help the advancement of personalised and precision medicine but also provide invaluable data with regard to population-wide and environment-/time-specific changes of the proteome. While critical voices might argue that the biomolecular coverage will never be as great as with LC-ESI-based MS analyses, it would seem grossly negligent to ignore the additional richness of omic and system information that large sample sets collected over various dimensions can offer. Even if it will never be possible to analyse the entire proteome by these methods (as will probably never be possible with an ESI-based MS method either), there are now clear indications that the analytical depth is sufficient to pursue (prote)omic analyses at a speed of thousands or more samples per day (34,37). The analysis of a million samples per day on a single MS platform is also well within the capabilities of laser-based ionisation techniques, though here it remains to be seen whether more than a hundred biomolecular species can be analysed at this speed. Early data obtained by the laser-based method of liquid AP-MALDI using high-performing hybrid mass analysers are encouraging and indicate that analysis at this speed and depth is entirely possible in the foreseeable future (34). Importantly, the analytical depth, signal robustness and sample flexibility of this new approach are well beyond those of early MS profiling methods using conventional, solid-state MALDI on axial TOF mass analysers. These extremely fast analyses would then be able to fill many of the gaps in the field of proteomics that can currently only be served, with severe limitations, by the commonly employed but much slower ESI-based proteomic tools.