Novel MMP-9 Substrates in Cancer Cells Revealed by a Label-free Quantitative Proteomics Approach

Matrix metalloproteinase-9 (MMP-9) is implicated in tumor metastasis as well as a variety of inflammatory and pathological processes. Although many substrates for MMP-9, including components of the extracellular matrix, soluble mediators such as chemokines, and cell surface molecules have been identified, we undertook a more comprehensive proteomics-based approach to identify new substrates to further understand how MMP-9 might contribute to tumor metastasis. Previous proteomics approaches to identify protease substrates have depended upon differential labeling of each sample. Instead we used a label-free quantitative proteomics approach based on ultraperformance LC-ESI-high/low collision energy MS. Conditioned medium from a human metastatic prostate cancer cell line, PC-3ML, in which MMP-9 had been down-regulated by RNA interference was compared with that from the parental cells. From more than 200 proteins identified, 69 showed significant alteration in levels after depletion of the protease (>±2-fold), suggesting that they might be candidate substrates. Levels of six of these (amyloid-β precursor protein, collagen VI, leukemia inhibitory factor, neuropilin-1, prostate cancer cell-derived growth factor (PCDGF), and protease nexin-1 (PN-1)) were tested in the conditioned media by immunoblotting. There was a strong correlation between results by ultraperformance LC-ESI-high/low collision energy MS and by immunoblotting giving credence to the label-free approach. Further information about MMP-9 cleavage was obtained by comparison of the peptide coverage of collagen VI in the presence and absence of MMP-9 showing increased sensitivity of the C- and N-terminal globular regions over the helical regions. Susceptibility of PN-1 and leukemia inhibitory factor to MMP-9 degradation was confirmed by in vitro incubation of the recombinant proteins with recombinant MMP-9. The MMP-9 cleavage sites in PN-1 were sequenced. 
This study provides a new label-free method for degradomics cell-based screening leading to the identification of a series of proteins whose levels are affected by MMP-9, some of which are clearly direct substrates for MMP-9 and become candidates for involvement in metastasis.

Molecular & Cellular Proteomics 7: 2215-2228, 2008.
Matrix metalloproteinase-9 (MMP-9), or gelatinase B, was originally identified as a secreted enzyme that could degrade collagens, especially collagen IV (1,2). These early observations have since been refined as it became apparent that denatured collagens I and IV as well as gelatin are efficiently cleaved but that intact collagens I and IV are poor substrates for the enzyme (3,4). Indeed MMP-9 fails to promote cellular invasion of natural basement membranes (5). Although MMP-9 may not be a determining factor for invasion of collagen IV-rich basement membranes, collagen IV degradation by MMP-9 has been shown to be involved in angiogenesis. For instance, the reduced levels of antiangiogenic peptides derived from collagen IV in mice genetically deficient in MMP-9 reflect the decreased turnover of collagen IV in the absence of MMP-9 (6). Many non-collagenous proteins have also been identified as MMP-9 substrates, including cell surface molecules and secreted proteins. These include molecules involved in cell adhesion, cell surface receptors, cytokines, protease inhibitors, angiogenic factors, and growth factors. The cleavage of many of these substrates affects the cellular phenotype and is associated with several pathological conditions in mice (7,8).
In cancers MMP-9 produced by the host contributes to squamous cancer progression by activation of the angiogenic switch and recruitment of macrophages or hematopoietic stem cells to the pre-existing metastatic niche (9,10). MMP-9 made by the tumor cell also contributes to cancer progression (11). In some cases metastatic potential is associated with increased MMP-9 levels, and down-regulation of MMP-9 depresses metastasis (11). However, a protective role for MMPs in cancer progression has also been demonstrated (12). For example, MMP-9 can also result in decreased angiogenesis because of the release of extracellular endostatin in a breast cancer model, therefore contributing to tumor regression (13).
Despite the abundant evidence supporting a role for MMP-9 in promotion of tumor metastasis, the nature of the substrates targeted by MMP-9 that contribute to this process remains largely unknown. As a prelude to defining the targets of MMP-9 activity in cancer metastasis, we developed a new method to identify MMP-9 substrates in cancer cells. MMP-9 is a member of the matrix metalloproteinase family of zinc-dependent endopeptidases that characteristically have a conserved catalytic domain with Zn2+ complexed to three histidines at the active site (11,14,15). A cysteine residue in a propeptide domain of MMPs interacts with the Zn2+ in the catalytic site, inhibiting activity and maintaining latency (14). Although cleavage of the propeptide domain is a common mechanism for MMP activation and acquisition of enzymatic activity, evidence has shown that catalytic activity can be acquired without cleavage of the propeptide, in particular in pro-MMP-9 (16,17). MMP-9 and its close homolog MMP-2 (gelatinase A) possess a gelatin-binding domain, which comprises three fibronectin-like repeats that provide binding to and specificity for denatured collagen (4). Like all MMPs, MMP-9 also contains a hemopexin-like domain, which is connected to the catalytic domain by a highly glycosylated hinge region. The function of the hemopexin-like domain of MMP-9 in catalysis has not been elucidated. However, the complex of pro-MMP-9 with TIMP-1 involves binding of the inhibitor to the hemopexin-like domain (18). The glycosylation of the hinge region affects activity indirectly through influences on cell surface binding and internalization (19,20). Different cell lines produce different glycosylation patterns (21). These factors further complicate the ability to predict valid substrates in a cellular context. MMP-9, like many MMPs, is secreted but can also bind the cell surface through collagen (α2) IV chains (2), CD44 (22), and the lipoprotein receptor-related protein receptor (23).
Identification of substrates for proteases in general poses a series of technical problems. Prediction of substrates based on cleavage of phage or peptide libraries in vitro has often overestimated sites because many sequences that are capable of being cleaved are not accessible in the intact protein or exposed to the enzyme in its cellular locations (24 -26). A more successful approach to identify new protease substrates includes the use of mass spectrometry and in particular quantitative proteomics approaches to compare protein levels after alterations in the protease activity because the cleavage of substrates is assessed under live conditions (26,27). In these experiments, cell culture supernatants from cells with a differential expression level of the protease under study are examined for enrichment of proteins that can be considered as candidate substrates (28). Initial studies of this kind included the use of stable isotope labeling methodologies to quantify peptide and protein levels by mass spectrometry more accurately (27). For instance, isotope-coded affinity tags (ICAT) that react with cysteine residues were used to identify novel substrates for the matrix metalloproteinase MT1-MMP (29). A more recent study applied isobaric tags for relative and absolute quantitation (iTRAQ) and mass spectrometry to uncover novel substrates for MMP-2 (30). This approach revealed a significantly higher number of potential substrates because this mass tag is not restricted to cysteine but is reactive to free amines (peptide N termini) (30). However, both approaches require the attachment of isotope labels to peptide material prior to MS analysis (29,30), which may introduce an uncertainty factor for quantitation because the tagging reactions may not be complete. As an alternative, peak height and ion counts observed in liquid chromatography runs can also be used to determine levels of biomolecules (31). 
This method has been problematic for proteomics analysis because of the low reproducibility of nano-LC runs and the requirement for MS to MS/MS switching to obtain sequence information, resulting in low quality chromatograms (32). The development of chromatographic columns with smaller particle size as well as LC pumps with higher pressure limits and nanoflow delivery capacity permitted much improved chromatography performance, referred to as ultraperformance LC (UPLC) (31-33). The recent development of nano-UPLC coupled to ESI-MS/MS now allows the analysis of low abundance biological samples (31,33). Initial results demonstrate that the enhanced chromatography performance leads to significant gains in the collection of information on biomolecules such as peptides and proteins (sequence coverage and post-translational modifications) (33,34). In addition, the level of reproducibility in nano-UPLC allows quantitative assessment of changes between samples with higher precision without the need for using stable isotope-based techniques, in particular when combined with high speed MS-MS/MS switching capabilities (31,32) or parallel parent and fragment ion analysis, MSE (35,36). In this study, we used nano-UPLC coupled to high/low collision energy MS (MSE)-based label-free quantitative shotgun proteomics to reveal novel MMP-9 substrates present in prostate cancer cell culture supernatants and confirmed the susceptibility of several proteins to MMP-9 using independent biochemical methods.

EXPERIMENTAL PROCEDURES
Cell Culture and Conditioned Medium Collection-PC-3ML cells were maintained in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum at 37°C with 5% CO2. To collect conditioned medium, plated PC-3ML cells were washed in serum-free DMEM three times and then incubated with serum-free DMEM with or without 10 μM BB-94 (British Biotech), 100 nM phorbol 12-myristate 13-acetate (PMA; Sigma), or MMP-9 inhibitor I (Calbiochem) for 24 h. After harvest, the conditioned medium was centrifuged at 1000 × g to remove the cell debris and then concentrated 10-15 times through a Centricon concentrator device. For mass spectrometry, the proteins from concentrated conditioned medium were extracted through chloroform-methanol precipitation (37), and protein concentration was measured and normalized for in-solution tryptic digestion (38). For immunoblot and zymography, the loading of concentrated medium was normalized according to cell numbers and protein content.
shRNA-Sequences coding for shRNAs directed against MMP-9 were cloned into either pcDNA3 or IMG-800 vectors and tested for the ability to reduce the expression and activity of MMP-9 in the PC-3ML cell line. In general, 1 μg of shRNA MMP-9 was transfected into PC-3ML cells using the FuGENE transfection reagent (Roche Applied Science) according to the manufacturer's protocol. The cell colonies were then selected with 400 μg/ml G418 for a number of passages until the shRNA was stably expressed. Conditioned medium was collected to test for the reduction of MMP-9 activity. The sequence of the final shRNA MMP-9 applied for mass spectrometry was CAAGTGGCACCACCACAACAT (39).
Analysis by Nano-UPLC-MSE Tandem Mass Spectrometry-Collected and precipitated material from cell supernatants was subjected to in-solution trypsin digestion as described previously (38). Digested samples were desalted and concentrated using Sep-Pak C18 column cartridges (Waters) according to the manufacturer's instructions. Samples were analyzed by nano-UPLC-MSE using a 75-μm-inner diameter × 25-cm C18 nanoAcquity™ UPLC™ column (1.7-μm particle size; Waters) and a 90-min gradient: 2-45% solvent B (solvent A: 99.9% H2O, 0.1% formic acid; solvent B: 99.9% acetonitrile, 0.1% formic acid) on a Waters nanoAcquity UPLC system (final flow rate, 250 nl/min; 7000 p.s.i.) coupled to a Waters QTOFpremier™ tandem mass spectrometer. Data were acquired in high definition MSE mode (low collision energy, 4 eV; high collision energy ramping from 15 to 40 eV, switching every 1.5 s) and processed with ProteinLynx Global Server (PLGS; version 2.2.5; Waters) to reconstruct MS/MS spectra by combining all masses with identical retention times. Each sample was analyzed in triplicate runs. The mass accuracy of the raw data was corrected using Glu-fibrinopeptide (200 fmol/μl, 700 nl/min flow rate, 785.8426 Da [M + 2H]2+) that was infused into the mass spectrometer as a lock mass during sample analysis. Low and high collision energy MS data were calibrated at intervals of 30 s. The raw data sets were processed including deisotoping and deconvolution, and peak lists were generated on the basis of assigning precursor ions and fragments based on similar retention times (35,36). A Swiss-Prot database (release 51.0, October 2006, 241,242 entries) was used for database searches of each triplicate run with the following parameters: peptide tolerance, 15 ppm; fragment tolerance, 0.015 Da; trypsin missed cleavages, 1; and variable modifications, carbamidomethylation and Met oxidation.
Alternatively MS/MS spectra (peak lists) were also searched against the Swiss-Prot database (release 54.0; July 2007; number of entries, 276,256) using Mascot version 2.2 (Matrix Science, London, UK) and the following parameters: peptide tolerance, 0.2 Da; 13C = 2; fragment tolerance, 0.1 Da; missed cleavages, 3; and instrument type, ESI-Q-TOF. The interpretation and presentation of MS/MS data were performed according to published guidelines (40). In addition, individual MS/MS spectra for peptides with a Mascot Mowse (molecular weight search) score lower than 40 (expect value <0.015) were inspected manually and included in the statistics only if a series of at least four continuous y or b ions was observed. Protein identification was also based on the assignment of at least two peptides. False positive rates for the data sets generated in this study were evaluated using Mascot and a randomized database based on Swiss-Prot (decoy function) and are illustrated in supplemental Table S3. The local "in-house" Mascot server used for this study is supported and maintained by the Computational Biology Research Group at the University of Oxford.
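The acceptance rule for low-scoring spectra described above can be illustrated with a minimal Python sketch. It assumes that the matched y and b ions of a spectrum are available as lists of ion indices; the function names and the input layout are hypothetical and are not part of Mascot or PLGS.

```python
def longest_run(indices):
    """Length of the longest run of consecutive integers in `indices`."""
    best = 0
    current = 0
    previous = None
    for i in sorted(set(indices)):
        # Extend the run if this index directly follows the previous one.
        current = current + 1 if previous is not None and i == previous + 1 else 1
        best = max(best, current)
        previous = i
    return best


def accept_peptide(score, y_ions, b_ions, score_cutoff=40.0, min_run=4):
    """Hypothetical helper mirroring the rule in the text.

    Spectra scoring at or above the cutoff are accepted directly; spectra
    below it are accepted only if at least `min_run` continuous y or b
    ions were matched.
    """
    if score >= score_cutoff:
        return True
    return max(longest_run(y_ions), longest_run(b_ions)) >= min_run
```

In the study this inspection was performed manually; the sketch only encodes the stated criterion.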
Quantitative Analysis-The analysis of quantitative changes in protein abundance, which is based on measuring peptide ion peak intensities observed in low collision energy mode in a triplicate set, was carried out using Waters Expression Analysis Software (WEPS™), which is part of PLGS 2.2.5 (Expression version 2). For normalization, each sample was spiked with 125 fmol of tryptic digest of α-enolase as an internal standard. Included were all protein hits that were identified with a confidence of >95%. Identical peptides from each triplicate set per sample were clustered based on mass precision (<15 ppm, typically ~5 ppm) and a retention time tolerance of <0.25 min using the clustering software included in PLGS 2.2.5. Protein scores were increased when the same peptide assignments were made in more than one replicate run. To avoid potential errors due to redundancies in assignments, searches were performed using the non-redundant Swiss-Prot database (as described above). Furthermore if two or more distinct proteins shared an identical peptide but were found to be regulated differently, then the quantitation algorithm did not include the peptide in question. To allow for this, peptide probabilities are always softened slightly by the PLGS software prior to quantitation. Because of this, the contributions from peptides with even 100% probability of presence were suppressed to avoid potential errors in quantitation. Normalization of the data sets was performed based on the spiked standard, but very similar results were obtained using the PLGS "autonormalization" function (not shown).
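As an illustration of the clustering tolerances and the spiked-standard normalization described above, the following Python sketch matches two peptide features by mass (<15 ppm) and retention time (<0.25 min) and computes a knockdown/control ratio after scaling each run by its internal standard. It is a simplified sketch under stated assumptions, not the PLGS algorithm: the function names, the (mass, retention time) tuple layout, and the use of a simple mean over replicates are all hypothetical.

```python
from statistics import mean


def ppm_error(observed, reference):
    """Signed mass deviation of `observed` from `reference` in ppm."""
    return (observed - reference) / reference * 1e6


def matches(feature_a, feature_b, ppm_tol=15.0, rt_tol_min=0.25):
    """True if two (mass, retention_time_min) features fall within both tolerances."""
    mass_a, rt_a = feature_a
    mass_b, rt_b = feature_b
    return abs(ppm_error(mass_a, mass_b)) < ppm_tol and abs(rt_a - rt_b) < rt_tol_min


def normalized_ratio(kd, kd_standard, control, control_standard):
    """Knockdown/control ratio of a peptide intensity.

    Each replicate intensity is first divided by the intensity of the
    spiked internal standard measured in the same run (assumed layout:
    one value per replicate run in each list).
    """
    kd_norm = mean(i / s for i, s in zip(kd, kd_standard))
    control_norm = mean(i / s for i, s in zip(control, control_standard))
    return kd_norm / control_norm
```

A ratio above 1 from `normalized_ratio` corresponds to accumulation of the peptide when MMP-9 is depleted.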
Protease Nexin-1 (PN-1) Peptide Mapping-The determination of MMP-9 cleavage sites in PN-1 as described for Fig. 7 was performed by comparing in-solution tryptic digests of PN-1 control with PN-1 incubated with recombinant MMP-9. Non-tryptic cleavage sites that were not observed in the control sample were ranked based on the individual Mascot peptide scores and divided into high probability (Mascot peptide score >40) and low probability (Mascot peptide score <40). The peak list files of one representative experiment are available as supplemental material (CTRL_MMP9KD1/2/3 and MMP9KD1/2/3).
Zymography-MMP-9 and MMP-2 enzyme activity was assayed by gelatin zymography as described previously (41). In brief, concentrated conditioned medium was separated by electrophoresis, under non-reducing conditions, through an 8% polyacrylamide gel co-polymerized with 1 mg/ml gelatin (Sigma). Gels were washed in 2.5% Triton X-100 solution three times and then incubated overnight at room temperature in developing buffer containing 50 mM Tris-HCl, pH 7.5, 5 mM CaCl2, 5 μM ZnCl2, and 1% Triton X-100. Gels were subsequently stained with 0.25% Coomassie Brilliant Blue R-250 and then destained with 50 mM Tris-HCl (pH 7.4) and 2% Triton X-100. Enzymatic activity attributed to MMP-9 and MMP-2 was visualized in the zymogram as clear bands against a dark background. Human recombinant MMP-9 and MMP-2 proteins (10 ng) (Calbiochem) were loaded as controls.
In Vitro Cleavage Assay-Human active MMP-9, MMP-2, or pro-MMP-9 (Calbiochem) was tested for the ability to cleave human recombinant LIF (Chemicon) or PN-1 protein (R&D Systems) in vitro. Briefly recombinant proteins were mixed in the assay buffer (50 mM Tris/HCl, pH 7.5, 150 mM NaCl, 10 mM CaCl2, 0.05% Brij35, 100 μg/ml BSA, 0.5 mM 4-(2-aminoethyl)benzenesulfonyl fluoride, 1 μg/ml pepstatin, and 1 μg/ml leupeptin) and incubated at 37°C. At different time points, reactions were terminated with 10 mM EDTA. The digestion products were subjected to 15% polyacrylamide electrophoresis, transferred to PVDF membranes, and immunoblotted with anti-LIF or anti-PN-1 antibodies. Alternatively the resulting cleavage products were analyzed by peptide mapping using LC-MS/MS as described above. Data analysis was performed using Mascot with the parameters described above except that no enzyme restriction was applied.

MMP-9 Levels in Tumor Cells-To select a tumor cell system to use for identification of potential substrates for MMP-9, several invasive tumor cell lines were screened for the expression and activity of MMP-9 through gelatin zymography (Fig. 1A). From this screening, we selected PC-3ML, a highly invasive and metastatic subline of PC-3 (human prostate cancer cell line), which expresses MMP-9 but little MMP-2 (Fig. 1A).
Two strategies were undertaken to regulate MMP-9 activity in PC-3ML cells: pharmacological inhibition and RNA interference. An expression vector coding for shRNA against MMP-9 was transfected and stably expressed (Fig. 1C); cells displayed significant loss of MMP-9 (Fig. 1D). Even after treatment with PMA, a potent transcriptional activator for several MMPs including MMP-9 (42), the level of MMP-9 released by the knockdown (KD) line remained undetectable, whereas it was significantly increased in PC-3ML (Fig. 1D). Two commercial compounds, BB-94 and an MMP-9 inhibitor, were tested for inhibition of MMP-9 activity in vitro. Both reduced the activity of MMP-2 and MMP-9. In particular, BB-94 almost completely ablated the activity of both gelatinases at a concentration of 10 μM (Fig. 1B).
Identification of MMP-9 Substrate Candidates by Nano-UPLC-MSE Analysis-Serum-free conditioned medium was collected from PC-3ML cells after extensive washing (five times) to remove as much serum as possible. Medium was collected from control cells, cells treated with BB-94, and cells stably expressing shRNA against MMP-9 followed by sample concentration using a 10-kDa molecular mass cutoff spin filter. Concentrated material was precipitated and subjected to in-solution trypsin digestion. Digested material was analyzed using a label-free quantitative proteomics approach (Fig. 2). This method entails separation by nano-UPLC and analysis by an on-line Q-TOF tandem mass spectrometer. The presence of several hundred proteins in cell culture supernatants gives rise to thousands of peptides after proteolytic digestion, which usually require multiple chromatographic steps for efficient separation, such as MudPIT (33). In our approach, the use of a 25-cm reversed phase column packed with small particles (1.7 μm) combined with ultrahigh pressure (~7000 p.s.i.) allowed for adequate separation in a single chromatography run (Figs. 2 and 3A).
A current limitation of data collection is determined by the acquisition speed in data-dependent acquisition (DDA) mode, which requires switching from MS to MS/MS to obtain both ion counts of precursor ions and fragmentation information in a tandem mass spectrometer. This usually leads to insufficient collection of data points to accurately define peak shapes in MS mode because of fragment data collection in MS/MS mode at the same time (43). In addition, for a complex mixture, only a fraction of precursor ions will be selected for sequencing because of the sequential acquisition of MS- and MS/MS-based spectra (43). To overcome this, we used the recently developed MSE technology, which is based on continuous switching between low and high collision energy to collect precursor ion masses (low collision) and fragment ions (high collision) at intervals as short as 1 s (35,36). In this way, approximately 5-10 times more information can be obtained from a single LC run as compared with data-dependent switching modes (43). The assignment of precursor ions to their associated fragment ions is based on their equivalent retention time profiles, allowing the recreation of MS/MS spectra (Fig. 2). In addition, in the same chromatography run, sufficient data points can be collected in low collision energy mode to define a precursor ion peak shape that is adequate for quantitation by measuring its peak height. The principle of relative quantitation without previous incorporation of stable isotope tags of amino acids is based on measuring peak intensities of individual peptide masses obtained in separate chromatography runs for each sample (Fig. 3, A and B). Nano-UPLC is now sufficiently reproducible to accomplish this task (Fig. 3B). Each mass peak (precursor or fragment ion) is therefore defined by its exact mass and retention time (EMRT).

Fig. 2. A label-free quantitative proteomics approach for MMP-9 substrate degradomics. Cell culture supernatants of PC-3ML control and those treated either with BB-94 inhibitor or induction of MMP-9 RNA interference (RNAi) knockdown were harvested and subjected to in-solution trypsin digestion. Samples were separated by reversed phase nano-UPLC using a 25-cm 1.7-μm-particle size C18 column coupled to a Q-TOF tandem mass spectrometer (shown as base peak intensity liquid chromatogram (BPI LC)). MS data acquisition was performed using MSE technology, which is based on continuous switching between low and high collision energy to collect precursor ion masses (low collision) and fragment ions (high collision) at intervals as short as 0.5 s. Low collision MS provides information about precursor ions (indicated as M1 and M2) and their retention times revealed by extracted ion chromatograms (EIC). High collision energy MS provides the masses for fragment ions (indicated as F1 and F2). Association of fragment ions to their corresponding precursor ion is determined by their similar retention times (obtained from extracted ion chromatograms) and allows the creation of MS/MS spectra that can be used to obtain sequences and protein identification.

Fig. 3. A, chromatographic separation of cell culture supernatants digested with trypsin on a nano-UPLC C18 reversed phase column using a 90-min gradient (base peak intensity chromatogram). B, relative quantitation is based on peak intensities observed in base peak intensity chromatograms as shown for peaks in the range of 87-92 min (black and red). C, a comparison between proteolytic digests obtained from PC-3ML control (ctrl) and MMP-9 knockdown cell culture supernatants reveals the presence of ~30,000 mass peaks defined as EMRTs with different intensities obtained in a single chromatographic run. Along the diagonal axis are the log values of peptide ion intensities that were present at equal levels between the two samples, and differentially expressed peaks are present above or below the axis, representing tryptic peptides from proteins that were either increased or reduced upon MMP-9 knockdown, respectively. The adequacy of relative quantitation is refined by analyzing the samples in triplicate runs, allowing the detection of changes in expression above the relative ratios of 1.1 and below 0.9 with high significance (p < 0.01; dotted lines). norm, normalized; KD, knockdown. D, the availability of triplicate data sets provides information about the data quality, such as replicate rates between repeated runs (left panel) and the confidence and reproducibility of protein identification (right panel). Under these conditions, 200 proteins were identified in all three runs (triplicates) and are therefore of high confidence. 800 proteins were assigned in two of the three repeats and are therefore of intermediate confidence, and 3700 proteins were identified in one replicate, representing low confidence hits (see also supplemental material for additional information).
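The retention-time-based association of fragment ions with precursor ions can be illustrated with a small Python sketch. It assumes each ion is summarized as a (mass, apex retention time in minutes) pair and assigns every fragment to the precursor with the nearest apex, provided the difference is within a tolerance. The function name, the data layout, and the 0.05-min tolerance are hypothetical choices for illustration; the actual PLGS processing is more elaborate and is not reproduced here.

```python
def assign_fragments(precursors, fragments, rt_tol_min=0.05):
    """Group fragment ions with precursors that share a retention time.

    precursors, fragments: lists of (mass, apex_retention_time_min).
    `precursors` is assumed to be non-empty.
    Returns {precursor_mass: sorted list of fragment masses}.
    """
    spectra = {mass: [] for mass, _ in precursors}
    for frag_mass, frag_rt in fragments:
        # Pick the precursor whose apex retention time is closest.
        nearest = min(precursors, key=lambda p: abs(p[1] - frag_rt))
        if abs(nearest[1] - frag_rt) <= rt_tol_min:
            spectra[nearest[0]].append(frag_mass)
    return {mass: sorted(frags) for mass, frags in spectra.items()}
```

Fragments that do not co-elute with any precursor within the tolerance are left unassigned, which is one simple way to keep unrelated ions out of a reconstructed MS/MS spectrum.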
A comparison between proteolytic digests obtained from media of control and MMP-9 knockdown PC-3ML cells revealed the presence of ~30,000 EMRTs with different intensities (Fig. 3C) obtained in a single chromatographic separation step. Along the diagonal axis are the peptide ions that were present at similar levels in each of the two samples, and differentially expressed peaks are present above or below the axis, representing tryptic peptides from proteins that were either increased or reduced upon MMP-9 knockdown, respectively. The adequacy of relative quantitation was refined by analyzing the samples in triplicate runs, allowing the detection of changes in expression above the relative ratios of 1.1 and below 0.9 with high significance (p < 0.01; Fig. 3C, Table I, and supplemental material). The availability of triplicate data sets can also provide information about the data quality, such as replicate rates between repeated runs (Fig. 3D, left panel) and the confidence and reproducibility of protein identification (Fig. 3D, right panel). Under these conditions, 200 proteins were identified in all three runs (triplicates) and are therefore of high confidence. 800 proteins were assigned in two of the three repeats and are therefore of intermediate confidence.
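The replicate-based confidence tiers can be expressed as a short Python sketch that counts the number of triplicate runs in which each protein was identified. The function name and the input layout (one collection of protein names per run) are assumptions for illustration.

```python
from collections import Counter


def confidence_tiers(runs):
    """Label each protein by the number of replicate runs identifying it.

    runs: a sequence of exactly three collections of protein names.
    Returns {protein: "high" | "intermediate" | "low"}.
    """
    if len(runs) != 3:
        raise ValueError("this sketch assumes exactly three replicate runs")
    # set() guards against a protein being listed twice within one run.
    counts = Counter(protein for run in runs for protein in set(run))
    labels = {3: "high", 2: "intermediate", 1: "low"}
    return {protein: labels[n] for protein, n in counts.items()}
```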
A subset of the identified proteins including known MMP-9 substrates and novel substrate candidates is described in Table I. For instance, known substrates such as collagens I, II, and IV and integrin β1 (1, 2, 44) were found to be reduced in media from MMP-9 knockdown PC-3ML cells. Galectin 3, which was shown previously to be processed by MMP-9 (45), was found to be significantly reduced when MMP activity was inhibited pharmacologically. In the same screen, a number of novel substrate candidates were detected. Collagen VI (α1 and α2), laminin β1, MMP-13, protease nexin-1, neuropilin, PCDGF, clathrin, integrin α3, and CD166 appeared to accumulate when MMP-9 levels were reduced. In addition, collagen α2 I and fibronectin were found to be decreased in the absence of active MMP-9, suggesting that they may be candidates for MMP-9-dependent cell surface shedding (Table I). TIMP-1, a naturally occurring inhibitor of MMP-9 that tightly binds to it, may have been reduced in amount simply because of decreased MMP-9.
Validation of Changes in Protein Levels after Down-regulation of MMP-9 -We used several criteria to select candidate proteins for biochemical validation as MMP-9 substrates. First, we chose proteins that have been identified as secreted, membrane-bound, or on the cell surface, recognizing that this might exclude proteins for which current information is incomplete. Second, only proteins that scored above 40 for significance were used. Third, an MMP-9 siRNA:control (KD/C) ratio >1 was set for selecting candidate substrates, the implication being that true substrates would accumulate when MMP-9 activity was inhibited. This would exclude any surface protein that is released or shed into the medium by MMP-9 because its ratio would decrease after MMP-9 down-regulation. We did not choose candidates from the BB-94-generated list. Fourth, we required that a good commercial antibody be available to facilitate validation. Fifth, we chose proteins whose levels had large differences in amount as well as those with only modest changes. These criteria led us to test APP, collagen VI, LIF, neuropilin-1, PCDGF, and PN-1 for protein levels in the conditioned medium generated from control and treated cells by Western blotting.
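The first four selection criteria can be summarized as a simple filter, sketched below in Python. The `Candidate` record and its field names are hypothetical; the localization and antibody fields stand in for judgments that were made from the literature and from reagent availability rather than computed. The fifth criterion (covering both large and modest changes) was a matter of choice among the filtered proteins and is not encoded.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    score: float               # identification score from the proteomics screen
    kd_over_control: float     # MMP-9 knockdown / control (KD/C) ratio
    extracellular: bool        # secreted, membrane-bound, or cell surface
    antibody_available: bool   # good commercial antibody exists


def select_for_validation(candidates, min_score=40.0):
    """Names of candidates meeting the score, ratio, localization, and antibody criteria."""
    return [
        c.name
        for c in candidates
        if c.extracellular
        and c.score > min_score
        and c.kd_over_control > 1.0   # true substrates should accumulate on knockdown
        and c.antibody_available
    ]
```

For example, an entry with a score of 55.69 and a KD/C ratio of 2.6 (the values reported for LIF) passes the filter, whereas a protein with a KD/C ratio below 1, as expected for a protein shed by MMP-9, does not.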
In this set of experiments, we sought to compare the results from the proteomics determination of differences with changes seen on immunoblotting. APP pathologically accumulates and forms aggregates in patients with Alzheimer disease (46). Interestingly MMP-9 and MMP-2 have been shown to cleave APP in vitro and in vivo (47-49). APP was found at a slightly increased level in the conditioned medium from the MMP-9 KD cell line compared with the control (Fig. 2) consistent with the MS KD/C ratio of 1.1 (Table I). There was a more marked decrease in the amount in conditioned medium from the BB-94-treated cell line also consistent with the ratio of 0.3. Collagen IV, whose denatured form has been well established as a classic substrate for MMP-9, appeared in the list. We chose, however, to test the potentially novel substrate collagen VI, a less well characterized collagen. α1 and α2 chains of collagen VI accumulated in the medium from KD cells consistent with the ratio of 1.5. The proteomics results in the BB-94-treated cells with a ratio of ~0.8 were the only example of proteomics data not confirmed by Western blotting with more and not less collagen VI detected (Table I and Fig. 4). LIF scored at 55.69 and had a KD/C ratio of 2.6, consistent with immunoblotting results. LIF is a cytokine that promotes the long term maintenance of embryonic stem cells by suppressing spontaneous differentiation. Neuropilin-1, a transmembrane protein with a large extracellular domain (50), accumulated in the conditioned medium of the MMP-9 KD cell line (Fig. 4), consistent with an MS score of 123 and KD/C ratio of 1.9. The fragment (75 kDa) released into conditioned medium appears to be a soluble isoform of neuropilin-1 (51). Neuropilin-1 was predicted by the proteomics screen to be unaltered in conditioned medium after BB-94 treatment, and this was confirmed by Western blotting.
PCDGF, a secreted protein also known as granulins precursor or GP88, increased in both conditioned medium and whole cell lysates from MMP-9 KD cells and in conditioned medium from BB-94-treated cells. PN-1 also accumulated in the conditioned medium from both MMP-9 KD cells (score, 40.83; ratio, >10) and BB-94-treated cells (score, 81.46; ratio, 2.0) by proteomics analysis (Table I). Thus, the changes in levels noted in the proteomics screens were largely confirmed by immunoblotting.
We examined the complexity of the system in response to other manipulations that perturb protease activity. BB-94, as indicated above, is a nonspecific inhibitor of MMPs. Thus, in a simple system the results of down-regulation of MMP-9 might be paralleled by its inhibition by a nonspecific inhibitor such as BB-94. In fact, however, these results did not track together as measured either by proteomics or by immunoblotting. Similarly, overexpression of MMP-9 induced by PMA, a treatment that has many effects in addition to up-regulation of MMP-9, might in a simple system exaggerate the difference with the MMP-9 down-regulated cells. However, this was not the case, indicating that the regulation allowing accumulation of proteins in the conditioned medium was much more complex. Thus, to determine the validity of proteins as MMP-9 substrates, we sought to determine whether candidates could be cleaved by MMP-9 through biochemical means.
Processing of Collagen VI Is Dependent on MMP-9 - One of the most abundant proteins found in PC-3ML cell culture supernatants was collagen VI (Table I). This collagen type appears to affect early mammary tumor progression by modulating the tumor/stroma microenvironment (52). We noted a wide range of relative abundance ratios for individual tryptic peptides derived from collagen VI that correlated with their location within different domains (Fig. 5). Peptides located at the N and C termini and outside the triple helix domain strongly accumulated when MMP-9 levels were reduced, suggesting MMP-9-dependent processing under normal conditions (Fig. 5, A and B).
Table I footnotes: f, PLGS score, sequence coverage, and number of MS/MS spectra derived from the MMP-9 siRNA experiment. g, PLGS score, sequence coverage, and number of MS/MS spectra derived from the BB-94 inhibitor experiment. h, Accession number for the mouse protein sequence.
MMP-9 Cleaves LIF and PN-1 in Vitro - To test whether the candidate substrates could be directly digested by MMP-9, cleavage assays were set up in vitro using active MMP-9 enzyme and recombinant proteins. MMP-9 cleaved LIF and PN-1 (Fig. 6) in a dose- and time-dependent manner. PN-1 remained intact after incubation with either pro-MMP-9 or active MMP-2 (Fig. 6A), demonstrating the specificity of the MMP-9 cleavage. The MMP-9-specific cleavage sites on PN-1 were identified through proteomics approaches. The cleavage products (5% of total reaction) were resolved by SDS-PAGE; gels showed three predominant PN-1-reactive fragments at around 37, 33, and 12 kDa (Fig. 7A) but also a number of intermediate breakdown fragments (Fig. 7B). To map candidate cleavage sites, the reaction mixture (50%) was processed by tryptic in-solution digestion. The sequence coverage obtained was up to 89%, and potential MMP-9-specific cleavage sites were identified from peptides with a non-tryptic terminus (lacking a C-terminal Arg or Lys, or not preceded by one of these residues) that were present in MMP-9-treated samples only (Fig. 7C). Peptide mapping suggests the 37-kDa fragment to be 59-398 (37,515 Da), the 33-kDa fragment to be either 59-367 (33,947 Da) or 108-398 (32,355 Da), and the 12-kDa fragment to be 255-367 (12,100 Da) or 1-107 (11,700 Da; Fig. 7C). Evidence for the presence of the latter fragment was obtained by Q-TOF-MS analysis ([M + 3H]³⁺, 3901.13 Da; data not shown).
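The two steps of the site-mapping logic above can be sketched in a few lines: flagging peptide termini that trypsin cannot explain, and checking that a triply protonated ion is consistent with a fragment mass. This is an illustrative sketch only. The function names and the toy sequence are hypothetical (the sequence is not PN-1), the proline rule for trypsin is ignored for simplicity, and the quoted value of 3901.13 is read here as the m/z of the [M + 3H]³⁺ ion.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def neutral_mass(mz, charge):
    """Neutral mass implied by an [M + zH]z+ ion observed at the given m/z."""
    return charge * (mz - PROTON_MASS)


def nontryptic_termini(protein, start, end):
    """Return which ends ("N", "C") of the peptide spanning residues
    start..end (1-based, inclusive) are not explained by trypsin cleavage
    after Lys or Arg. Simplification: the Lys/Arg-Pro exception is ignored."""
    flagged = []
    # N-terminal end is tryptic if the peptide starts the protein
    # or the preceding residue is Lys/Arg.
    if start > 1 and protein[start - 2] not in "KR":
        flagged.append("N")
    # C-terminal end is tryptic if the peptide ends the protein
    # or its last residue is Lys/Arg.
    if end < len(protein) and protein[end - 1] not in "KR":
        flagged.append("C")
    return flagged


# Hypothetical toy sequence for illustration only (not PN-1).
toy = "MKTAYIAKQRGLSVDE"

print(nontryptic_termini(toy, 3, 8))    # fully tryptic peptide TAYIAK
print(nontryptic_termini(toy, 5, 8))    # N terminus not preceded by K/R
print(nontryptic_termini(toy, 11, 14))  # C terminus does not end in K/R

# Triply charged ion at m/z 3901.13, compared with the 1-107 fragment (11,700 Da).
print(round(neutral_mass(3901.13, 3), 2))
```

Under these assumptions the triply charged ion corresponds to a neutral mass of about 11,700 Da, in line with the mass given for the 1-107 fragment. A peptide flagged at either end and seen only in MMP-9-treated samples would be a candidate marker of a protease-generated terminus.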
Based on these results, MMP-9 appears to cleave PN-1 predominantly at the sites PHGI58↓A59 and I107↓V108, therefore preferentially before small aliphatic amino acids such as Ile, Gly, and Val, which is consistent with other studies that examined MMP-9 cleavage of non-collagenous substrates (53,54). In particular, the cleavage site in the PN-1 sequence at position 58/59 corresponds to the major MMP-9-specific cleavage consensus sequence PXX↓Hy(S/T) (Hy is a hydrophobic amino acid) as described previously (54).
DISCUSSION
In this study, we used a label-free quantitative proteomics approach to identify novel substrates for MMP-9 in cancer cells and confirmed changes in abundance levels and cleavage of some of them using biochemical assays and peptide mapping. Whereas mass spectrometry-based quantitation has been routinely used to measure changes in levels of small molecules, it has been less trivial to perform relative and absolute quantitation of protein levels in biological samples by MS (32). This was due in part to the low reliability and reproducibility of nano-LC as well as fluctuations in electrospray ionization that affect the MS acquisition quality (32). In addition, LC-MS analysis of complex biological samples (proteolytic digests) can give rise to ion suppression, which complicates the retrieval of quantitative information based on ion intensities (32). These shortcomings have been overcome by the introduction of isotope labels such as ¹⁵N labeling, stable isotope labeling with amino acids in cell culture (SILAC), ICAT, isotope-coded protein labels (ICPL), or iTRAQ that allow the mixing of samples prior to analysis to minimize variations during the data acquisition process (55). However, the quality of data derived from approaches that include introduction of isotopes depends on the yield of metabolic labeling or incorporation of thiol/amine-reactive chemical groups.
In addition, the use of isotope-containing compounds can be costly for biological experiments (26). Therefore, a label-free approach remains attractive for such large scale comparisons of peptide and protein content.
Two recent technical developments contributed to major improvements that render this method applicable to such studies. First, UPLC, initially developed for more rapid separation solutions to reduce costs and increase throughput, has been introduced to nano-LC-based methods for proteomics (31). Elevated chromatography performance by nano-UPLC coupled to fast MS/MS significantly increases sensitivity and in-depth analysis of peptides and proteins (31,33). Analysis by conventional nano-LC-MS as compared with nano-UPLC-MS has revealed the superior resolution power of the latter technology (43).² Samples may contain several hundred proteins that result in more than 10,000 peptide species after proteolytic digestion. A common strategy to deconvolute and analyze a biological sample of this complexity was the use of multidimensional protein identification technology (MudPIT), which comprises two different chromatography steps (56). Such experiments are usually very time-consuming, require a significant amount of instrument time and complex data analysis, and are therefore feasible only in a laboratory that is equipped with a large infrastructure. In addition, multidimensional chromatography approaches are usually associated with sample loss (33,57). In many cases, it represents a significant effort to prepare the amounts of biological material required to conduct such an analysis.
Figure legend (immunoblotting). Conditioned medium and whole cell lysates collected from PC-3ML cells treated as indicated were subjected to electrophoresis, transferred to PVDF membranes, and probed with the appropriate antibodies. Cell lysate aliquots were made and normalized by intracellular protein concentration. The loading of concentrated conditioned medium was normalized according to cell numbers and intracellular proteins.
Here we found that nano-UPLC may provide a significant advantage, namely that such complex biological samples can be analyzed with fewer chromatography steps, which saves considerable time and effort in conducting such an experiment.
Second, the development of MSᴱ allows the collection of 5-10 times more precursor ions and fragmentation data as compared with DDA modes (35,36,43) because of a sequential low and high collision energy data acquisition cycle (Fig. 3). MSᴱ-based data acquisition permits the collection of sufficient data points in low collision mode for the quantification of peak ion intensities and, at the same time, acquisition of fragmentation data in high collision mode for protein identification (Fig. 3). The successful analysis of this type of data was dependent on the development of specific software (PLGS) (36,43). The assignment of precursor and its corresponding fragment ions is possible because of stringent settings for EMRT clusters (typically around 5 ppm and below ±0.25 min) even in complex mixtures (35,36). Therefore, the mass accuracy achieved with this system reduces redundancy of multiple peptide assignments. In addition, when fragmentation data can be matched to peptide sequences that are observed in more than one protein that share similar probabilities, PLGS will exclude these data from further quantitative analyses to avoid potential errors in determining the relative abundance ratios of these proteins.
FIG. 5. A, collagen VI sequence coverage obtained by nano-UPLC-MSᴱ analysis of trypsin-digested cell culture supernatants from control (underlined) and MMP-9 KD samples (black letters). B, relative quantitation of individual tryptic peptides derived from collagen VI observed in control (Ctrl) and MMP-9 KD cell culture supernatants. MMP-9-sensitive areas were defined as those in which tryptic peptides had relative abundance ratios (MMP-9 KD/control (Ctrl)) of more than 10. Mass intensity values were chosen only from peptides that were identified with a high confidence PLGS score (>95%) and that were observed in at least two of the triplicate runs. C, MMP-9-sensitive areas of collagen VI are predominantly located in the C- and N-terminal globular regions (VWA, von Willebrand factor domain), whereas the triple helix domain appears to be more stable (Col, collagen triple helix domain), indicating differential processing of collagen VI by MMP-9. The domain structure of human collagen VI (accession number P12109) was obtained from the Pfam Website.
Panel B entries recoverable from the figure (position; peptide sequence; MMP-9 KD:Ctrl ratio):
28-51; AVAFQDCPVDLFFVLDTSESVALR; >10
52-61; LKPYGALVDK; >10
83-105; NLVWNAGALHYSDEVEIIQGLTR; >10
126-134; GTYTDCAIK; 0.25
135-148; GLEQLLVGGSHLK; 2.5
186-199; (sequence and ratio not recoverable)
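The tolerance criterion quoted for EMRT clusters (about 5 ppm in mass and within ±0.25 min in retention time) can be illustrated with a minimal sketch. The function names and the example mass and retention time values below are hypothetical, and the code is not a description of how PLGS implements its matching; it only shows what it means for two features to agree within both tolerances.

```python
def ppm_error(observed_mass, reference_mass):
    """Signed mass difference in parts per million relative to the reference."""
    return (observed_mass - reference_mass) / reference_mass * 1e6


def same_emrt(mass_a, rt_a, mass_b, rt_b, ppm_tol=5.0, rt_tol_min=0.25):
    """True if two (mass, retention time) features agree within both tolerances."""
    mass_ok = abs(ppm_error(mass_a, mass_b)) <= ppm_tol
    rt_ok = abs(rt_a - rt_b) <= rt_tol_min
    return mass_ok and rt_ok


# Hypothetical features: mass in Da, retention time in minutes.
print(same_emrt(1500.006, 32.30, 1500.000, 32.10))  # 4 ppm, 0.2 min apart
print(same_emrt(1500.009, 32.30, 1500.000, 32.10))  # 6 ppm apart
print(same_emrt(1500.006, 32.50, 1500.000, 32.10))  # 0.4 min apart
```

Requiring agreement in both dimensions is what limits ambiguous assignments: a feature that matches in mass but elutes too far away, or co-elutes but differs by more than a few ppm, is not treated as the same component.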
We noted that fragmentation spectra created via MSᴱ were of lower quality than DDA-based MS/MS spectra because collision energies were adapted to mass and charge state in the latter method, whereas high collision energy ramping between 15 and 40 eV was used in MSᴱ throughout the entire analysis. As a consequence, data-dependent acquisition appears to be more appropriate for the detection of post-translational modifications. However, the protein sequence coverage in DDA mode is generally lower than that observed with MSᴱ.² The PLGS software takes into account some of the shortcomings of MSᴱ data and gives more weight to the mass accuracy of the peptide precursor and fragment ions in addition to considering their retention times (EMRT clusters), whereas Mascot, at least the version used for this study, mainly considers the number of matched fragment ions, which favors data with high fragmentation quality obtained from data-dependent MS analysis experiments. For these reasons, the PLGS search engine is better suited to the interpretation of MSᴱ data. This is exemplified by the detection of the proteins neuropilin and LIF as well as their relative changes in abundance, which were confirmed by immunoblotting (Fig. 4). Interrogation of the MS data with Mascot failed to detect these proteins (Table I). Analysis of samples in triplicate improved the confidence level of assignments by PLGS based on multiple identifications in more than one replicate in some cases (Fig. 3D). PLGS analysis detected MMP-9-dependent changes in abundance levels of collagens II, IV, and XI, all of which are known to be MMP-9 substrates (1, 2, 4, 58), whereas Mascot searches did not assign confident scores for any of these proteins (Table I).
The problem with proteomics data sets is that in the range below high confidence scores the false positive rate can increase considerably (supplemental Table S3), but these data cannot simply be discarded because they may contain biologically significant information. Future data curation and evaluation processes will certainly benefit from combining different search engines with complementary algorithms and selection criteria to further improve the accuracy of MS data interpretation.
Proteomics analysis of conditioned medium from PC-3ML cells after knockdown of MMP-9 revealed at least 100 proteins with variations in level. The variations detected were sometimes small (<2-fold) but were confirmed for substrates by Western blotting in several cases (Fig. 4). A possible reason for this is the use of a knockdown approach, which compares endogenous levels of MMP-9 expression without genetically engineered activation and so avoids artifacts due to overexpression. The complexity of changing the expression of this one protease was surprisingly large. Several of the resultant changes were of proteases or their inhibitors that could lead to a cascade of effects. We cannot know how many of the affected proteins were altered in level as a result of direct action by MMP-9 or of indirect effects. Some of the proteins identified are known substrates of MMP-9, and we confirmed that several others could be cleaved by MMP-9 in vitro. Nonetheless, this type of analysis may provide an initial snapshot and allow the construction of maps of protease interactions and networks.
The MMP-9 that we detected in the medium of the PC-3ML cells was 92 kDa, the proform, whereas the most active forms of MMP-9 are cleaved at the N terminus. However, MMP-9 has been shown to be active in its proform (16). In other cases MMP-9 activity has been detected at the cell surface without cleaved MMP-9 detection in the medium (59-61). Fridman and co-workers (62) reported cleaved cell surface MMP-9 that was a minority of the total MMP-9. We cannot be sure in this system whether activation by cleavage was required for the proteolysis we detected.
The selection of substrates by MMP-9 is not a simple matter of recognition of a linear amino acid sequence. The triple helical domains of collagens tend to be resistant to MMP-9 (63). A detailed examination of the susceptible peptides in collagen VI is consistent with this pattern (Fig. 5). The cleavage sites sequenced in protease nexin-1 do not clearly conform to the identified MMP-9 cleavage sites in collagenous substrates, although the site at position 59 has proline in the P3 position as has been identified for MMP-9 cleavage of triple helical substrates (54). The exact components that allow MMP-9 to recognize substrates are not yet clear. Some of the sites in PN-1 have similarities to the sequences termed Group II constructed from recognition of cleavage sites in phage libraries. Group II consists of Gly-Leu↓(Lys/Arg). The ultimate determination of the sites to be utilized by MMP-9 awaits confirmation by mutation analysis.
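For readers who wish to screen a sequence for the Group II pattern as it is written here, a minimal sketch using a regular expression is given below. The pattern name, the function name, and the toy sequence are hypothetical, the scissile bond is placed after Leu simply because that is how the motif is written in the text, and a match indicates only sequence similarity, not demonstrated cleavage.

```python
import re

# Gly-Leu followed by Lys or Arg; the lookahead keeps the Lys/Arg out of the match.
GROUP_II = re.compile(r"GL(?=[KR])")


def group_ii_sites(sequence):
    """Return 1-based positions of the Leu residue in each Gly-Leu-(Lys/Arg) match."""
    # m.start() is the 0-based index of Gly, so Leu is at 1-based position start + 2.
    return [m.start() + 2 for m in GROUP_II.finditer(sequence)]


# Hypothetical toy sequence for illustration only (not PN-1).
toy_sequence = "AAGLKTTGLAGLR"

print(group_ii_sites(toy_sequence))
```

In the toy sequence the Gly-Leu pair followed by Ala is not reported, which reflects the requirement for a basic residue after the Leu. As noted above, linear motifs alone do not determine substrate selection, so any hit would still need experimental confirmation.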
Taken together, UPLC-MSᴱ provides a label-free quantitative proteomics approach that was used to discover candidate MMP-9 substrates. Subsequent biochemical analysis and identification of cleavage patterns suggested that some of these proteins are likely to be MMP-9 substrates. This method proved to be robust and allowed the identification of peptide differences of relatively small magnitude, such as 2-fold or lower. Our results provide the framework to examine the functional role of these substrates in cancer metastasis.
Acknowledgments-We thank Dr. Holger Kramer for expert assistance with the sample analysis by tandem mass spectrometry and members of the group for insightful discussion. We are grateful to Dr. Ghislain Opdenakker for the kind comments on the manuscript. We also acknowledge the services of the Computational Biology Research Group at the University of Oxford in this project.
* This work was supported, in whole or in part, by National Institutes of Health grants (to D. X. and R. J. M.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
§§ Supported by CRUK and Medical Research Council (MRC) grants. To whom correspondence may be addressed.
FIG. 7. Mapping MMP-9 cleavage sites on PN-1. A, recombinant PN-1 (500 ng) was incubated with 100 ng of active MMP-9 at 37°C for 20 min or 1 h. 5% of the cleaved products were resolved by 15% SDS-PAGE and immunoblotted with anti-PN-1 antibody. B, second independent experiment as described in A except using a different batch of recombinant MMP-9 (Calbiochem). C, sequence coverage obtained by in-solution trypsin digestion of the samples above followed by LC-MS/MS analysis as described under "Experimental Procedures." Peptides detected by MS are highlighted, and potential MMP-9 cleavage sites are indicated by triangles.