^{ 1}

*m/z*

*m/z*value and spatial location can then be plotted as pixels in a false-color image, called an ion image, that displays the spatial distribution of the analyte associated with that

*m/z*value. MS imaging has been shown to be promising in a wide range of biological applications such as molecular histology of tissue, whole body sections, bacterial films, etc. (

*m/z*binning or resampling. Much progress has already been made in the preprocessing of MS imaging data. Many mature tools for preprocessing mass spectra already exist (

## Related Work

*i.e.*assigning pixels to predefined classes, and segmentation (for unsupervised experiments),

*i.e.*assigning pixels to newly discovered segments. A number of methods for these purposes already exist.

- Veselkov K.A.
- Mirnezami R.
- Strittmatter N.
- Goldin R.D.
- Kinross J.
- Speller A.V.
- Abramov T.
- Jones E.A.
- Darzi A.
- Holmes E.
- Nicholson J.K.
- Takats Z.

*Proc. Natl. Acad. Sci. U.S.A.*2014; 111: 1216-1221

*e.g.*using the Pearson correlation between a segment and the single ion images (

*et al.*(

## Contribution of the Manuscript

*et al.*We show that, for unsupervised segmentation, the spatial probabilistic modeling provides better quality segmentation. It characterizes the probability of segment membership for each pixel and allows us to quantify and visualize the uncertainty of segmentation for each pixel. Moreover, statistical regularization aids interpretation by automatically selecting subsets of the spectral features, such that each subset defines each segment. Statistical regularization also enables data-driven selection of the number of segments. Similarly, for supervised image classification probabilistic modeling characterizes the probability of predefined tissue class membership for each pixel and aids interpretation by automatically selecting subsets of spectral features that define each class.

## EXPERIMENTAL PROCEDURES

### Unsupervised Segmentation: Pig Fetus Cross-Section

*A*is an optical image of the Hematoxylin and eosin (H&E)-stained tissue section showing the general morphology of the pig fetus, including major organs such as the brain, heart, and liver.

*m/z*range. The images were cropped to remove non-informative spectra originating from the glass slide. The cropped dataset consisted of 4,959 mass spectra with 10,200 spectral features. The mass spectra were normalized to a common total ion current, and peak picking was performed to reduce the dataset to 143 peaks. All data processing and analysis were performed using Cardinal (

### Unsupervised Segmentation: Cardinal painting with Known Segmentation

*m/z*range. The dataset consisted of 12,600 mass spectra with 10,800 spectral features. Mass spectra were normalized to a common total ion current, and peak picking was performed to reduce the dataset to 51 peaks. All data processing and analysis were performed using Cardinal.

### Unsupervised Segmentation: Rodent Brain Images of Varying Quality

*A*. Mass spectra were acquired on a Bruker Autoflex III MALDI-TOF mass spectrometer over the 2,500 to 25,000

*m/z*range. The images were cropped to remove noninformative spectra, and only the 2,500 to 10,000

*m/z*range was used. The reduced dataset consisted of 20,185 mass spectra with 3,045 spectral features. The mass spectra were normalized to a common total ion current, and baseline correction was performed using ClinProTools. Cardinal was thereafter used to perform peak picking to reduce the dataset to 80 peaks. Except for baseline correction and normalization, all data processing and analysis were done in Cardinal.

*B*. This experiment produced high-quality spectra but with a moderate amount of experimental noise. Mass spectra were acquired on a Thermo Finnigan LTQ linear ion trap mass spectrometer with a DESI ion source over the 200–1,000

*m/z*range. The images were cropped to remove noninformative spectra. The cropped dataset consisted of 8,950 mass spectra with 9,600 spectral features. The mass spectra were normalized to a common total ion current, and peak picking was performed to further reduce the dataset to 123 peaks. All data processing and analysis were done in Cardinal.

*C*. This dataset features a high degree of experimental noise. Mass spectra were acquired using an AB Sciex MALDI TOF/TOF 5800 System over the 4,000 to 20,000

*m/z*range. The images were cropped to remove noninformative spectra. The cropped dataset consisted of 4,923 mass spectra with 22,667 spectral features. The mass spectra were normalized to a common total ion current, smoothed, and baseline corrected. Peak picking was then performed to reduce the dataset to 57 peaks. All data processing and analysis were done in Cardinal.

### Supervised Segmentation: Human Renal Cell Carcinoma

*m/z*range. The images were cropped to remove noninformative spectra originating from the glass slide. The cropped dataset consisted of 6,077 mass spectra with 10,200 spectral features. Individual tissue samples consisted of between 972 to 3,564 mass spectra per matched pair. The mass spectra were normalized to a common total ion current and resampled to unit resolution, resulting in 850 spectral features. All data processing and analysis were performed using Cardinal. Supplemental Fig. 3 shows single ion images for 215.3

*m/z*, which is an ion known to be more abundant in normal tissue, and Supplemental Fig. 4 shows single ion images for 885.7

*m/z*, which is known to be more abundant in cancerous tissue (

*m/z*along the edge of the cancerous tissue in sample UH0505 12 (Supplemental Fig. 3B). We will use this dataset to demonstrate the ability of the proposed framework to perform classification, while selecting spectral features important in distinguishing the disease condition.

## RESULTS

### Overview and notation

*m*= 1, …,

*M*denote the index of the biological sample,

*i.e.*a slide with one (or several) tissues. On slide

*m*, the experiment collects

*Nm*spectra at

*Nm*total pixel locations. Therefore, over all the samples, the experiment contains

*N*= Σ

_{m=1}

^{M}

*Nm*spectra and pixels.

*i, j*) denote the location of a pixel on a sample

*m*. We do not assume that the samples are rectangular in shape, so the indices (

*i, j*) are arbitrary. However, we do assume (

*i*+ δ

*i*,

*j*+ δ

*j*) describes the location of a pixel (δ

*i*, δ

*j*) away on the same sample. We assume that the spectra acquired at these locations have been processed, so that every spectrum has the same

*P*features, defined as a picked peak or a binned

*m/z*range. We also assume that the pixel intensities are normalized so that spectra are comparable across pixels and across samples. Then, denote the spectrum acquired at a pixel location (

*i, j*) on sample

*m*as x

*ijm*= {x

*ijmp, p*= 1, …,

*P*}. In other words, spectrum x

*ijm*is a vector of scalar intensities

*xijmp*for

*P*spectral features.

*K*classes (for supervised classification) or segments (for unsupervised segmentation). For supervised classification, the class membership is known, for example, from annotation by a pathologist, and the statistical goal of the experiment is to classify each pixel to one of these classes in a supervised manner based on its spectrum. Alternatively, for unsupervised segmentation, the class membership is unknown, and the statistical goal of the experiment is to discover these classes from the spectra in an unsupervised manner. Let

*Nk*denote the number of spectra, and the number of pixel locations, assigned to class

*k*=

*1*, …,

*K*by an unsupervised or a supervised procedure.

*k*as x̄

*k*and the overall mean spectrum as x̄. That is, x̄

*k*is a vector of

*P*scalar intensities

*x̄kp*, which are the mean intensities for spectral feature

*p*, over spectra from all pixel locations assigned to class

*k*, and x̄ is a vector of

*P*scalar intensities

*x̄p*, which are the overall mean intensities for spectral feature

*p*, over all spectra.

### Proposed Statistical Framework for Supervised Classification and Unsupervised Segmentation of Mass Spectrometry Images

#### Characterization of Classes and Segments by Their Shrunken Centroids

*k*. Here, we propose that each class (or segment) is better represented using shrunken centroids, from the method of nearest shrunken centroids by Tibshirani

*et al.*(

*et al.*and calculate the class (or segment) centroids, x̄

*k*and use statistical regularization to shrink the centroids toward the overall centroid x̄. We then calculate the

*t*-statistic for spectral feature

*p*for class (or segment)

*k*as

Here τ̂

*p*is the pooled estimate of the within-class standard deviation for feature

*p*. The number $\sqrt{\frac{1}{{N}_{k}}-\frac{1}{{\displaystyle {\sum}_{k=1}^{K}{N}_{k}}}}$ makes the denominator equal to the estimated standard error of the numerator. Second, we apply the soft thresholding operator ()

_{+}to shrink the

*t*-statistics toward 0.

and

*s*is the shrinkage parameter. Larger values of

*s*lead to a larger number of

*t*-statistics

*t*′

_{kp}to be set to 0. Finally, we define the intensities of the shrunken centroids for each feature

*p*for each class (or segment)

*k*as

so that

*x̄′k*= {x′

*kp*=

*1*,…,

*P*} is the shrunken centroid for class (or segment)

*k*.

*k*here can be viewed as adjusted mean spectra of the

*K*classes (or segments), where the intensities have been adjusted toward the overall mean spectrum. Therefore, the characteristic mean spectrum for a class (or segment) should differ from the overall mean spectrum only for those spectral features that are truly characteristic of the class (or segment). Spectral features that are not meaningfully different from the overall mean spectrum will have intensities set to the overall mean intensity for that feature.

#### Selection of Informative Features

*t′kp*calculated in Equation 2 are well suited for selecting informative features. The spectral features with

*t′kp*> 0 are systematically enriched for class (or segment)

*k*. Likewise, spectral features with

*t′kp*<

*0*are systematically absent from class (or segment)

*k*, as compared with the overall mean spectrum. Spectral features with

*t′kp*= 0 are noninformative, as only the features with

*t′kp*≠ 0 matter when assigning a pixel's spectrum to class (or segment)

*k*.

#### Spatially Aware (SA) and Spatially Aware Structurally Adaptive (SASA) Distances

*ijm*and x

*i′j′m*, which depends on the spectra from pixels within a neighborhood of (

*i, j*) and ([i′, j′]). The authors showed that this approach is beneficial, as it produces better quality segmentations, as compared with naïve methods that do not account for the spatial relationships between pixels (

*r*, the distance between two spectra is defined as

_{δiδj}(

*Xijm*,

*Xi′j′m′)*are spatial weights of the neighbors. The exact definition of these weights results in either a spatially aware (SA) distance or a spatially aware structurally adaptive (SASA) distance. In the SA distance, the weights are defined as

which are Gaussian weights independent of the spectra and only depend on the neighborhood. Using Gaussian weights, which decrease with the distance δ

_{i}

^{2}+ δ

_{i}

^{2}from the neighborhood center, is a natural choice because it assumes that pixels further away from each other are less related than pixels that are closer together. In the SASA distance, the weights are defined as

where

which are adaptive weights that downweight neighborhood locations where the spectra are very different from the neighborhood center. This is designed to preserve edges between morphological regions and small details in local structure, which could otherwise be lost due to oversmoothing by the ordinary Gaussian weights. The term λ is set empirically to the half of the norm of the difference between the two most differing spectra in the neighborhood.

#### Defining the SA and SASA Distances to the Shrunken Centroid of a Class or Segment

where defining the α

_{δiδj}using the Gaussian weights as in Equation 5 results in our version of the SA distance, and using adaptive weights defined as

with β

_{δiδj}(

*xijm*) as in Equation 7 results in our version of the SASA distance. We normalize the weights in both cases so that they sum to 1.

_{δiδj}rather than two, reflecting this difference. In the case of supervised classification, we will use this distance to classify pixels according their spectrum's similarity to the shrunken centroids of known classes. In the case of unsupervised image segmentation, we will use this distance to iteratively update the pixels assigned to discovered segments.

*s*. Moreover, for unsupervised segmentation, the selection of the number of segments

*K*is also required. The procedure for selecting these parameters and their effect and implications will be described further in Selection of Parameters.

#### Assignment of Segment or Class Probabilities to Pixels

*K*classes has a prior probability π

*k*and is modeled as a multivariate Gaussian distribution. All classes are assumed to share a common diagonal within-class covariance matrix. This leads to a straightforward way to calculate probabilities for individual observations belonging to a class using Gaussian likelihoods. By analogy, we calculate a discriminant score based on the SA or SASA distances from each spectrum x

*ijm*to each of the shrunken centroids x̄

*k*as

where, as before, τ̂

*p*is the pooled within-class standard deviation for feature

*p*. We typically estimate the prior probabilities empirically as π̂

*k*=

*Nk*/

*N*. If the training data are not representative of the population, different priors could be used. Because we are using spatial distances that incorporate spectra from multiple pixels, the discriminant scores cannot be interpreted directly as following Gaussian distributions. However, we empirically demonstrate below that the technique still produces good results in practice. Therefore, we further follow Tibshirani et al. by calculating class probabilities for each spectrum x

*ijm*for each class (or segment)

*k*as

*p̂k*(x

*ijm*).

*K*is the maximum number of segments, and the segments are initialized randomly or with another segmentation procedure. We typically use π

*k*=

*1/K*, but a semi-supervised procedure could be developed that uses different priors and a different initialization procedure. During each iteration of the segmentation, a pixel is updated as belonging to the segment with the highest

*p̂k*(x

*ijm*) using Equation 11.

#### Selection of Parameters

*s*, and, for unsupervised segmentation, the number of segments

*K*.

*s*can be selected by cross-validation. Specifically, given a set of

*M*biological samples on

*M*slides, each slide is viewed as an experimental unit for cross-validation. For experiments with a small number of biological replicates,

*M*-fold (

*i.e.*leave-one-sample-out) cross-validation can be performed. Within each fold (or sample) of cross-validation, fit spatial shrunken centroids for a range of values of

*s*. The final selected values of

*s*is the one that maximizes the overall classification accuracy on the left-out samples. This is illustrated for the human RCC experiment in Supplemental Fig. 10.

*s*, and the number of segments

*K*. First, spurious segments tend to be defined by noninformative features. When those are removed through statistical regularization, the spurious segments become empty. They have

*Nk*= 0, and are, in fact, removed. Second, excessive regularization can remove some informative features, and this results in the loss of the correct segments. We balance the regularization and the number of segments by creating segmentations for multiple values of

*s*and

*K*and then plotting the relationship between

*s*and the number of nonempty segments. We illustrate this using experimental data in Section 3.4.2 and in Supplementary Section 2.2.2.

#### Algorithm and Implementation

*E*and Fig. 3

*F*took 51 and 52 s, respectively. The segmentations for the cardinal painting (51 peaks and 12,600 pixels) in Supplemental Fig. 5E and Supplemental Fig. 5F took 67 and 48 s, respectively. The cross-validation for the human RCC dataset (850 features and 6,077 pixels) in Fig. 10 took 69 s.

#### Evaluation

#### Spatial Probabilistic Modeling Improves the Quality of Segmentation over Per-Pixel Segmentation

*A*,

*k*-means clustering was applied to the peak-picked spectra, resulting in a noisy segmentation. The heart is not assigned to a unique segment. Figure 3

*B*shows

*k*-means clustering applied to the first five principal components of the peak-picked spectra, which also results in a noisy segmentation, again without the heart represented as a unique segment. Figure 3

*C*and Fig. 3

*D*show the spatially aware clustering and spatially aware structurally adaptive clustering of Alexandrov and Kobarg (

*E*and Fig. 3

*F*show the proposed spatial shrunken centroids segmentation method with SA and SASA distances, which produce clean segmentations comparable to those in Fig. 3

*C*and Fig. 3

*D*. The number of segments for these methods was initialized to 20 and resulted in six segments in the final segmentations, as described in the section Statistical Regularization Enables Data-Driven Selection of the Number of Segments for Unsupervised Experiments. In addition, the segmentations in Fig. 3

*E*and Fig. 3

*F*are more similar to each other than those in Fig. 3

*C*and Fig. 3

*D*, suggesting that the proposed spatial shrunken centroids method produces more consistent results across different types of spatial smoothing.

#### Statistical Regularization Enables Data-Driven Selection of the Number of Segments for Unsupervised Experiments

*E*is illustrated in Fig. 4. To select an appropriate segmentation, we initialize spatial shrunken centroids with different numbers of starting segments

*K*(

*e.g.*15 and 20) while increasing the shrinkage parameter s (

*e.g.*from 0 to 9 in increments of 3). Appropriate segmentations are those that result in a comparable number of predicted segments from different numbers of starting of segments. This suggests that all extraneous segments have been dropped, and the remaining segments explain true biology. For the pig fetus dataset in Fig. 4, this occurs after about

*s*= 3. We also look for the predicted number of segments to stabilize (stop decreasing) as

*s*increases. In Fig. 4, this occurs around

*s*= 6. The segmentation in Fig. 3

*E*above corresponds to

*r*= 2,

*K*= 20,

*s*= 6.

#### Feature Selection Aids Interpretation by Automatically Selecting Spectral Features Associated with Differentiating Each Segment from Others

*E*, the selected spectral features using the proposed spatial shrunken centroids segmentation method are shown in Fig. 5. Feature selection is shown for the brain, heart, and liver segments, along with their corresponding

*t*-statistics and top-ranked single ion images. Note that each unsupervised or supervised segment is characterized by its own reduced subset of informative features. Some features may be found informative for multiple segments, and some features may be found informative for no segment. For the brain segment, 49 spectral features were systematically enriched, and 54 features were systematically absent. For the heart segment, seven spectral features were systematically enriched, and one feature was systematically absent. For the liver segment, 41 spectral features were systematically enriched, and 74 features were systematically absent. Compared with the brain and liver segments, the heart had very few spectral features associated with it.

#### Probabilistic Modeling Allows for Characterization and Visual Inspection of Uncertainty in Segment Membership in Unsupervised Experiments

*s*), and the “best” segmentations were plotted in Figs. 6

*D*-6

*F*using the criteria described in section Statistical Regularization Enables Data-Driven Selection of the Number of Segments for Unsupervised Experiments for selecting an appropriate number of segments. This resulted in five segments for both the rat brain (R1) with little noise, three segments for the mouse brain (R2) with moderate noise, and three segments for the mouse brain (R3) with strong noise. For the strongly noisy mouse brain (R3), there was no clearly appropriate parameter set, as shown in Fig. 6

*C*. Even for

*K*= 10 starting segments,

*s*= 0 resulted in only two predicted segments, and the predicted number of segments actually increased to three temporarily as

*s*was increased before dropping to two again. This reflects the lower quality of the information in this brain dataset.

*s*was further increased past the point of stabilization until the predicted number of segments eventually dropped to only two segments. That is, more and more spectral features were excluded from the segmentation until the remaining ones only explained two segments. These segmentations are plotted in Figs. 6

*G*–6

*I*. For the rat brain (R1) with little noise, this occurred at

*s*= 25. For the mouse brain (R2) with moderate noise, this occurred at

*s*= 28. For the mouse brain (R3) with strong noise, this occurred at

*s*= 0, reflecting the lesser amount of information in the data.

#### Classification in Supervised Experiments

*C*. Unlike PLS-DA and O-PLS-DA, which use all features, making interpretation difficult, spatial shrunken centroids only uses the features that best distinguish each class. Among the selected features, the top ion associated with cancerous tissue was 885.7

*m/z*, which is known to be more abundant in cancer (

*m/z*, which is known to be more abundant in normal tissue (

*B*, the tumor tissue (

*left*) shows an indistinct border of normal tissue along the left side, and in Fig. 7

*D*, the normal tissue (

*right*) shows an indistinct border of tumor tissue along the left side. These borders are defined by ions known to be associated with cancer and normal tissue (

## DISCUSSION

*k*-means clustering of the mass spectra or

*k*-means clustering of their principal components. It outputs probabilities of segment membership and therefore helps characterize and visualize the uncertainty in the segmentation. It automatically selects the total number of segments, as well as subsets of informative features that define each segment to provide more interpretable results. For supervised classification, spatial shrunken centroids achieves similar accuracy as compared with commonly used methods such as PLS-DA and O-PLS-DA. However, similarly to the unsupervised segmentation, it characterizes and visualizes the uncertainty of segment membership and subsets of informative features that define each class.

## Acknowledgments

## Supplementary Material

## REFERENCES

- MALDI imaging mass spectrometry: Statistical data analysis and current computational challenges.
*BMC Bioinformatics.*2012; 13: S11 - Testing for presence of known and unknown molecules in imaging mass spectrometry.
*Bioinformatics.*2013; 29: 2335-2342 - Spatial segmentation of imaging mass spectrometry data with edge-preserving image denoising and clustering.
*J. Proteome Res.*2010; 9: 6535-6546 - Efficient spatial segmentation of large imaging mass spectrometry datasets with spatially aware clustering.
*Bioinformatics.*2011; 27: i230-8 - Cardinal: An R package for statistical analysis of mass spectrometry-based imaging experiments.
*Bioinformatics.*2015; - MALDI imaging combined with hierarchical clustering as a new tool for the interpretation of complex human cancers.
*J. Proteome Res.*2008; 7: 5230-5236 - Multivariate statistical identification of human bladder carcinomas using ambient ionization imaging mass spectrometry.
*Chemistry.*2011; 17: 2897-2902 - Multivariate statistical differentiation of renal cell carcinomas based on lipidomic analysis by ambient ionization imaging mass spectrometry.
*Anal. Bioanal. Chem.*2010; 398: 2969-2978 - Molecular assessment of surgical-resection margins of gastric cancer by mass-spectrometric imaging.
*Proc. Natl. Acad. Sci. U.S.A.*2014; 111: 2436-2441 - Direct detection of peptides and small proteins in fingermarks and determination of sex by MALDI mass spectrometry profiling.
*Analyst.*2012; 137: 4686-4692 - The Use of Mass Spectrometry Imaging to Predict Treatment Response of Patient-Derived Xenograft Models of Triple-Negative Breast Cancer.
*J. Proteome Res.*2015; 14: 1069-1075 - Spatial and spectral correlations in MALDI mass spectrometry images by clustering and multivariate analysis.
*Anal. Chem.*2005; 77: 6118-6124 Sarkari, S., Kaddi, C. D., Bennett, R. V., Fernández, F. M., and Wang, M. D., (2014) Comparison of clustering pipelines for the analysis of mass spectrometry imaging data,

- Diagnosis of multiple cancer types by shrunken centroids of gene expression.
*Proc. Natl. Acad. Sci. U.S.A.*2002; 99: 6567-6572 - Class prediction by nearest shrunken with applications to DNA microarrays.
*Statistical Science.*2003; 18: 104-117 - Chemo- informatic strategy for imaging mass spectrometry-based hyperspectral profiling of lipid signatures in colorectal cancer.
*Proc. Natl. Acad. Sci. U.S.A.*2014; 111: 1216-1221 - The evolving field of imaging mass spectrometry and its impact on future biological research.
*J. Mass Spectrom.*2011; 46: 209-222 - EXIMS: An improved data analysis pipeline based on a new peak picking method for exploring imaging mass spectrometry data.
*Bioinformatics.*2015; 31: 3198-3206 - Desorption electrospray ionization (DESI) mass spectrometry: A brief introduction and overview.
*Current Separations Drug Development.*2007; 22: 11-14 - Mass spectrometry imaging under ambient conditions.
*Mass Spectrom. Rev.*2013; 32: 218-243 - Comparison of public peak detection algorithms for MALDI mass spectrometry data analysis.
*BMC Bioinformatics.*2009; 10: 4

## Article info

### Publication history

### Footnotes

Author contributions: L.S.E. and C.R.F. designed the research; L.S.E., C.R.F., and S.M.v. performed the research; K.D.B. and A.H. contributed new analytic tools; K.D.B. analyzed data; K.D.B., A.H., and O.V. wrote the paper; and P.M., M.S., and O.V. co-PI.

### Identification

### Copyright

### User license

Creative Commons Attribution (CC BY 4.0) |## Permitted

- Read, print & download
- Redistribute or republish the final article
- Text & data mine
- Translate the article
- Reuse portions or extracts from the article in other works
- Sell or re-use for commercial purposes

Elsevier's open access license policy