Identification of Sites of Mannose 6-Phosphorylation on Lysosomal Proteins*

Most newly synthesized soluble lysosomal proteins contain mannose 6-phosphate (Man-6-P), a specific carbohydrate modification that is recognized by Man-6-P receptors (MPRs) that direct targeting to the lysosome. A number of proteomic studies have focused on lysosomal proteins, exploiting the fact that Man-6-P-containing forms can be purified by affinity chromatography on immobilized MPRs. These studies have identified many known lysosomal proteins as well as many proteins not previously classified as lysosomal. The latter are of considerable biological interest with potential implications for lysosomal function and as candidates for lysosomal storage diseases of unknown etiology. However, a significant problem in interpreting the biological relevance of such proteins has been in distinguishing true Man-6-P glycoproteins from simple contaminants and from proteins associated with true Man-6-P glycoproteins (e.g. protease inhibitors and lectins). In this report, we describe a mass spectrometric approach to the verification of Man-6-phosphorylation based upon LC-MS of MPR-purified proteolytic glycopeptides. This provided a useful tool in validating novel MPR-purified proteins as true Man-6-P glycoproteins and also allowed identification of low abundance components not observed in the analysis of the total Man-6-P glycoprotein mixture. In addition, this approach allowed the global mapping of 99 Man-6-phosphorylation sites from 44 known lysosomal proteins purified from mouse and human brain. This information is likely to provide useful insights into protein determinants for this modification and may be of significant value in protein engineering approaches designed to optimize protein delivery to the lysosome in therapeutic applications such as gene and enzyme replacement therapies.

The greatest challenge in "whole cell" proteomic approaches to understanding cellular function is the fact that protein identification techniques are greatly limited by both the large number of individual proteins and isoforms and the wide range of their abundance. Multidimensional sample fractionation can increase proteome coverage in such global approaches, but complexity still remains a significant problem. One approach that addresses this challenge is the proteomic characterization of subcellular organelles (for a review, see Ref. 1). Although by definition more narrow than global studies, the great reduction in sample complexity allows for higher relative proteome coverage. In addition, proteomic analysis of organelles provides useful insights into the biological function of the individual proteins and the organelle in question.
One cellular organelle that has been the focus of a number of proteomic studies is the lysosome, a membrane-delimited compartment that contains over 50 soluble hydrolytic enzymes and associated accessory proteins that play a major role in the catabolism of macromolecules within the cell (2) and are important in such processes as endocytosis, autophagy, and antigen presentation. Lysosomes and lysosomal proteins are of considerable biomedical importance, and defects in lysosomal proteins result in over 40 human genetic diseases, the lysosomal storage disorders. In addition, alterations in the lysosomal system have been implicated in more widespread diseases including cancer, Alzheimer disease, rheumatoid arthritis, and atherosclerosis.
A number of proteomic studies (3-9) rely upon the fact that it is possible to distinguish soluble lysosomal proteins from other classes of proteins based on a specific post-translational modification that is required for lysosomal targeting. Newly synthesized lysosomal proteins differ from other classes of glycoproteins in that they bear one or more N-linked oligosaccharides containing mannose 6-phosphate (Man-6-P). This carbohydrate modification is recognized within the cell by two Man-6-P receptors (MPRs) that direct the vesicular trafficking of the newly synthesized lysosomal proteins to a prelysosomal compartment. The Man-6-P marker is removed in most cell types but in others, e.g. neurons, it is retained to some extent (3,10). In affinity purification protocols, immobilized purified MPRs can be used to bind Man-6-P glycoproteins that can be specifically eluted using free Man-6-P.
As expected, many of the proteins isolated by MPR affinity chromatography are known lysosomal proteins. However, this approach has also led to the tentative lysosomal classification of both novel proteins and previously characterized cellular proteins that were not originally assigned lysosomal function. These proteins are of considerable interest as potential novel components of the lysosomal system, and their identification as such will clearly extend understanding of this organelle. In addition, novel lysosomal proteins may represent candidates for defective genes in lysosomal storage diseases of unknown etiology, and this is an approach that has led to the identification of two human disease genes (11,12). However, a significant difficulty in interpretation of the results of MPR affinity chromatography lies in distinguishing true Man-6-P glycoproteins from proteins that do not contain this modification. Some contaminants simply represent highly abundant proteins that, although greatly diminished, are still detected using highly sensitive analytical methods. Other contaminants represent non-lysosomal proteins that associate with and copurify with true Man-6-P glycoproteins. These include protease inhibitors bound to lysosomal proteases, lectins that recognize the carbohydrate moieties of lysosomal proteins, and molecular chaperones.
Bioinformatic approaches can help in the classification of lysosomal proteins and identification of these false positives (7). Some MPR-purified proteins, for example, can be eliminated because they lack potential N-linked glycosylation sites (e.g. protease inhibitors cystatin B and cystatin C). Others can be considered unlikely because they are not predicted to contain signal sequences (e.g. the cytoplasmic lectin F-box only protein 2). However, these sorts of bioinformatic approaches are imperfect and are likely to generate false negatives. For example, acid sphingomyelinase, a known lysosomal Man-6-P glycoprotein, is not predicted to contain a signal sequence using SignalP 3.0 (13). In addition, although such approaches are useful in eliminating some false positives, they are of limited value as many non-lysosomal proteins contain both potential N-linked glycosylation sites and predicted signal sequences (7). Thus, a direct method to verify the presence of the Man-6-P modification would be extremely valuable in these studies.
Previously we attempted to directly validate the presence of Man-6-P by fractionation of the proteins purified by MPR affinity chromatography by two-dimensional gel electrophoresis and detection of proteins containing Man-6-P using a radiolabeled MPR probe (12,7). However, a significant technical problem with this approach was that many spots were found to contain more than one protein species, thus the source of a positive signal was frequently ambiguous. In addition, a conceptual problem with this approach is that many lysosomal proteins consist of multiple subunits or chains of which some contain Man-6-P and some do not. Another approach has been to express and purify candidate proteins containing His tags and demonstrate Man-6-P-dependent binding to MPRs in vitro and Man-6-P-dependent internalization and lysosomal targeting in cell lines (8), but again this is an indirect approach that cannot distinguish between true Man-6-P glycoproteins and potential contaminants that have been identified by virtue of interactions with true Man-6-P glycoproteins. In addition, none of these approaches provides any information with respect to the actual sites of the Man-6-P modification, and each must be pursued on an individual, rather than global, basis.
A recent focus of interest in the proteomic field is the use of glycopeptide capture to identify glycosylated peptides from N-linked glycoproteins using either affinity capture with lectins (14,15) or chemical capture (16). In this report, we describe a sensitive and unambiguous method to globally identify peptides containing Man-6-P, and we used this approach to validate candidates for novel lysosomal proteins. In addition, we provide a database for Man-6-phosphorylation sites in known lysosomal proteins that is likely to prove valuable in terms of both the understanding of lysosomal cell biology and in bioengineering of lysosomal proteins for therapeutic applications.

EXPERIMENTAL PROCEDURES
Purification of Human and Mouse Brain Man-6-P Glycoproteins-Man-6-P glycoproteins were purified from normal human brain autopsy samples and from brain samples collected from mixed strain BALBc/129SvEv/C57BL6 mice as described previously (7).
LC-MS/MS-Reduction and alkylation of MPR-purified proteins were as described previously (7), and digestion with trypsin (Promega, Madison, WI), Glu-C (Wako Pure Chemical Industries, Osaka, Japan), and Lys-C (Roche Applied Science) was conducted overnight at 37°C in 50 mM ammonium bicarbonate. Digestion with Glu-C was also performed in 100 mM sodium phosphate, pH 7.8. Proteolytic digests of MPR-purified proteins were loaded onto a 300-nm × 5-cm Pepmap C18 trap column (Dionex/LC Packings, Amsterdam, the Netherlands) and desalted for 5 min using 0.1% formic acid at a flow rate of 25 µl/min. Peptides were then back-flushed onto a fritless nanoscale column (75 µm × 15 cm) packed with Poros 10, R2 (Applied Biosystems, Foster City, CA) and eluted using a linear 2-45% gradient of solvent B (0.1% formic acid in acetonitrile) in solvent A (0.1% formic acid) over 40 min (tryptic digest of gel slices) or 150 min (solution tryptic digest of affinity-purified proteins) with a flow rate of 300 nl/min using an Ultimate nano-LC system (Dionex/LC Packings). Peptides were analyzed by ESI-MS/MS using an LTQ ion trap mass spectrometer (ThermoFinnigan, San Jose, CA) equipped with a nanospray source (Proxeon Biosystems, Odense, Denmark). Each MS scan was followed by MS/MS of the 15 most intense ions with a dynamic exclusion of 2 min and a repeat count of two. For analysis of MPR-purified Man-6-P glycopeptides, the sample was first loaded onto the trap column and washed with 0.1% formic acid for 40 min at a flow rate of 25 µl/min before back-flushing to the nanoscale column. Peptides were eluted using a gradient of 2-45% B over 50 min. Each MS scan (m/z = 400-1800) was followed by subsequent zoom scans and MS/MS scans of the four most abundant ions with a dynamic exclusion of 1 min and repeat count of two. Peak lists were generated from raw files using TurboSequest. Parameters were: peptide mass range of 500-5000, threshold intensity of 1000, precursor mass tolerance of 1.4 m/z, minimum ion count of 50, and automatic charge state determination.
Database Searching-Searching of the human (ENSEMBL 28.35a.1, National Center for Biotechnology Information (NCBI) 35, May 2005) and mouse (ENSEMBL, NCBIm34, May 2005) databases was conducted using a local implementation of X! Tandem version 2005.02.1, which is open source software for protein identification using MS/MS data (17). For the unfractionated proteolytic digest mixtures, LTQ data were searched using a precursor ion mass error of +4 and -0.5 Da and a fragment mass error of 0.4 Da. Errors in assignment of monoisotopic mass were not permitted. Cysteine carbamidomethylation was specified as a complete modification, and methionine oxidation was a permitted variable modification during development of the preliminary model with one missed cleavage site allowed. Methionine oxidation and deamidation at asparagine and glutamine residues were allowed during model refinement. Database searching for deglycosylated peptides was as described for the unfractionated proteolytic digests except that mass increments on asparagine as a result of deglycosylation were allowed both as a potential modification and as a potential motif in both preliminary modeling and in model refinement. For Endo H, an N-acetylhexosamine linked to asparagine remains after oligosaccharide cleavage that results in a 203-Da increment (N+203). For PNGase F, conversion of asparagine to aspartic acid after oligosaccharide removal results in a 1-Da increment (N+1). The X! Tandem output was modified to generate Excel spreadsheets containing all peptides identified, their sequences, and sites and types of modification (e.g. position and number of N+1 and N+203 modifications resulting from deglycosylation with PNGase F and Endo H, respectively).
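To illustrate the modification logic described above, the following minimal Python sketch locates NX(S/T) motifs in a peptide sequence and labels a mass increment observed on an asparagine as N+1 or N+203. It is not the X! Tandem implementation. The function names, the monoisotopic increments (0.9840 Da for Asn to Asp conversion and 203.0794 Da for N-acetylhexosamine, both standard values rather than values stated in this report), and the reuse of the 0.4-Da tolerance are assumptions made for illustration only.

```python
import re

# Monoisotopic mass increments in Da (standard values, assumed here).
HEXNAC = 203.0794     # residual N-acetylhexosamine left by Endo H (N+203)
DEAMIDATION = 0.9840  # Asn -> Asp conversion left by PNGase F (N+1)

# A lookahead is used so that overlapping motifs (e.g. NNST) are all found.
SEQUON = re.compile(r"(?=N[A-Z][ST])")


def sequon_positions(peptide):
    """Return 0-based indices of Asn residues that begin an NX(S/T) motif."""
    return [match.start() for match in SEQUON.finditer(peptide)]


def classify_site(peptide, index, delta_mass, tol=0.4):
    """Label a mass increment on the Asn at `index`.

    Returns (label, in_sequon), where label is "N+203", "N+1" or None and
    in_sequon says whether that Asn lies within an NX(S/T) motif.
    """
    if peptide[index] != "N":
        raise ValueError("residue at index is not asparagine")
    if abs(delta_mass - HEXNAC) <= tol:
        label = "N+203"
    elif abs(delta_mass - DEAMIDATION) <= tol:
        label = "N+1"
    else:
        label = None
    return label, index in sequon_positions(peptide)


if __name__ == "__main__":
    # Hypothetical peptide with adjacent asparagines in an NNX(S/T) context.
    print(sequon_positions("ANNSTK"))
    print(classify_site("ANNSTK", 2, 203.08))
```

An N+1 label returned with `in_sequon` false corresponds to the kind of assignment that would require manual inspection, because deamidation and deglycosylation by PNGase F produce the same mass increment.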
Criteria for Protein Identification-In our analysis of the unfractionated MPR affinity-purified mixtures, the threshold score for acceptance of non-lysosomal proteins was set as the lowest score observed for a known lysosomal protein. This resulted in a highly stringent cutoff of protein log(e) ~ -16.5. The threshold score for acceptance of purified Man-6-P glycopeptides from proteins not previously classified as lysosomal that were also identified in the unfractionated mixture was set as the lowest score observed for a peptide derived from a known lysosomal species (peptide log(e) ~ -1.3 and -1.1 for human and mouse, respectively), although spectra of log(e) > -3 were also inspected manually and removed from our analysis if ambiguous. Although these thresholds for assignment are relatively low, identification of the corresponding protein in the unfractionated mixture together with the observation of N+203 in the context of a potential N-linked glycosylation site (NX(S/T)) greatly increased the confidence of assignment. Some peptides were observed from proteins not previously classified as lysosomal that were not observed by LC-MS analysis of the unfractionated mixtures. In these cases, these spectra were inspected manually and accepted only if a contiguous ion series included the modified NX(S/T) and if the peptide log(e) score was less than -2.
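The peptide-level acceptance rules above can be summarized as a small decision function. The sketch below is one reading of those rules, not the procedure actually used. The function name, the three outcome labels, and the treatment of the approximate cutoffs as exact values are assumptions, and the flag `ion_series_covers_site` stands in for the outcome of manual inspection.

```python
# Peptide log(e) cutoffs quoted in the text (treated here as exact values).
PEPTIDE_CUTOFF = {"human": -1.3, "mouse": -1.1}


def review_glycopeptide(log_e, species, protein_in_unfractionated,
                        ion_series_covers_site=False):
    """Return "accept", "inspect" or "reject" for one peptide assignment.

    log_e: peptide log(e) score (more negative means more confident).
    protein_in_unfractionated: True if the parent protein was also
        identified in the unfractionated mixture.
    ion_series_covers_site: result of manual inspection, i.e. whether a
        contiguous ion series includes the modified NX(S/T).
    """
    if protein_in_unfractionated:
        if log_e > PEPTIDE_CUTOFF[species]:
            return "reject"
        # Weaker spectra (log(e) > -3) were inspected manually.
        return "inspect" if log_e > -3 else "accept"
    # Protein seen only in the glycopeptide fraction: stricter criteria.
    if log_e < -2 and ion_series_covers_site:
        return "accept"
    return "reject"


if __name__ == "__main__":
    print(review_glycopeptide(-5.0, "human", True))
    print(review_glycopeptide(-2.5, "mouse", False, ion_series_covers_site=True))
```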
Purification and Identification of Proteolytic Peptides Containing Man-6-P-Eppendorf (Westbury, NY) BioPur tubes were used throughout. MPR affinity-purified proteins were digested with trypsin (Promega), Lys-C (Calbiochem), or with a combination of trypsin and Glu-C (Calbiochem). Ten micrograms of peptides in ammonium bicarbonate were heated twice for 10 min at 100°C to inactivate proteases, lyophilized, dissolved in water, lyophilized twice to remove trace ammonium bicarbonate, and then dissolved in 300 µl of PBS containing 1 mM Pefabloc (Pentafarm, Basel, Switzerland). Peptides were transferred to another tube containing a 50-µl bed volume of Affi-Gel 15 (Bio-Rad) containing sCI-MPR coupled at a density of 1 mg/ml and incubated with rocking for 60 min on ice. The peptide-sCI-MPR/Affi-Gel mixture was then placed in a microcolumn at 4°C containing an additional 50-µl bed volume of sCI-MPR/Affi-Gel 15. The flow-through was reloaded 10 times, and then the microcolumn was placed in a 1.5-ml microcentrifuge tube and centrifuged for ~2 s in a bench-top microcentrifuge to remove flow-through but not completely dry the resin. The resin was washed with 500 µl of PBS, with 500 µl of PBS containing 10 mM glucose 6-phosphate, and finally with 500 µl of 20 mM sodium phosphate buffer, pH 6.9, and then centrifuged again to remove excess buffer. The beads were resuspended in 100 µl of 20 mM sodium phosphate buffer, pH 6.9, containing 10 mM Man-6-P and incubated on ice for 5 min. The column was then centrifuged briefly, and the eluate was reloaded three times. The column was then centrifuged to dryness (10 s in microcentrifuge), washed with 50 µl of water to remove any remaining glycopeptides, and centrifuged again. Pooled eluates were concentrated to 50 µl in a vacuum concentrator resulting in a final buffer concentration of 40 mM sodium phosphate buffer, pH 6.9.
To deglycosylate the eluted peptides, 0.5 µl of Endo H (1000 units/µl) or PNGase F (500 units/µl) (New England Biolabs) was added, and the sample was incubated at 37°C for 2 h. Another aliquot of glycosidase was then added, and the incubation was repeated.
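The final buffer concentration quoted for the pooled eluates follows from simple mass balance, sketched below in Python. The calculation assumes that the 100-µl Man-6-P eluate in 20 mM sodium phosphate and the 50-µl water wash are recovered completely and that no solute is lost during vacuum concentration. The function name is hypothetical.

```python
def concentration_after_pooling(volumes_ul, concentrations_mm, final_volume_ul):
    """Concentration (mM) after pooling fractions and reducing the volume.

    volumes_ul and concentrations_mm are parallel lists, one entry per
    pooled fraction. Volume in µl multiplied by concentration in mM gives
    the amount of solute in nmol.
    """
    total_nmol = sum(v * c for v, c in zip(volumes_ul, concentrations_mm))
    return total_nmol / final_volume_ul


if __name__ == "__main__":
    # 100 µl of 20 mM sodium phosphate plus a 50-µl water wash, reduced to 50 µl.
    print(concentration_after_pooling([100, 50], [20, 0], 50))
```

Under the same assumptions, the 10 mM Man-6-P in the eluate would be concentrated to 20 mM.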

RESULTS
In our previous study of proteins purified from human brain by MPR affinity chromatography, we analyzed a tryptic digest of the total mixture. In this study, we digested proteins purified from human brain using several different proteolytic strategies (digestion with trypsin, Lys-C, Glu-C, and a mixture of trypsin and Glu-C) to provide wider proteome coverage. In addition, we also analyzed proteins purified from mouse brain after digestion with trypsin. Data for human brain were compiled from over 20 individual LC-MS/MS experiments on solution digests of MPR affinity-purified proteins. Data for mouse brain were obtained from individual LC-MS/MS runs on tryptic digests of 29 consecutive gel slices from 100 µg of MPR affinity-purified proteins fractionated in a single lane by SDS-PAGE. Forty-eight known lysosomal proteins were identified from human brain (Table I), representing the 44 proteins identified previously (7) as well as four additional known lysosomal proteins (galactocerebrosidase, cathepsin B, GM2 activator, and palmitoyl-protein thioesterase 2). In mouse, 42 known lysosomal proteins were found (Table I), all of which were observed in the human sample with the exception of cathepsin S. In total, analysis of proteins purified from mouse and human brain resulted in the identification of 49 known lysosomal proteins.
Forty-three proteins that are not currently thought to be lysosomal were identified in human brain (Table II), and of these, 13 were also identified in mouse brain. An additional 22 proteins not currently thought to be lysosomal were found only in mouse brain (Table II). Identification of proteins from both sources not only increases the confidence of peptide assignment but also identifies them as proteins that are less likely to represent simple contaminants. Conversely, proteins that are species-specific might be regarded as more likely to represent contaminants, although differences in the datasets from mouse and human may also reflect the fact that the human sample was more extensively characterized than the mouse sample (see "Experimental Procedures").
A number of the mouse and human proteins that are not currently considered lysosomal have known or predicted biological properties that are consistent with lysosomal function. For example, acid sphingomyelinase-like phosphodiesterase 3A, epididymis-specific α-mannosidase, and plasma α-fucosidase are similar to known lysosomal proteins. Cat eye syndrome critical region 1, phospholipase D3, and hypothetical proteins LOC196463 and FLJ22662 are predicted to have hydrolytic activities; this is typical of most soluble lysosomal proteins. Candidates for novel lysosomal proteins are indicated in Table II, but these assignments are tentative, reflecting the difficulties in classifying MPR affinity-purified proteins when the presence of Man-6-phosphorylation is not verified.
Our approach to the identification of sites of N-linked oligosaccharides containing Man-6-P was as follows. 1) A mixture of proteins was purified from human and mouse brain by affinity chromatography on immobilized MPRs. 2) Aliquots of the mixture were digested with trypsin or other proteases. 3) A portion of each digest was analyzed by LC-MS/MS, and results were collated to identify components of the mixture. 4) Ten micrograms of each digest were loaded onto a microcolumn of immobilized MPR. 5) The column was washed, and Man-6-P glycopeptides were eluted with free Man-6-P. 6) Man-6-P glycopeptides were enzymatically deglycosylated. 7) Peptides were analyzed by LC-MS/MS. 8) Spectra were assigned to peptides using X! Tandem. 9) Data were analyzed for the presence of mass modifications at asparagine residues that are indicative of deglycosylation and inspected to determine whether they fell within the context of an N-linked glycosylation motif. For the Endo H-deglycosylated human brain glycopeptides, data were compiled from 16 individual LC-MS/MS runs under different sample load and protease digestion conditions. Endo H-deglycosylated mouse brain glycopeptide data were compiled from two LC-MS/MS experiments using trypsin digestion. Prior to LC-MS/MS analysis of purified Man-6-P glycopeptides, carbohydrate residues must be removed, and two enzymes, PNGase F and Endo H, were used here. PNGase F will remove all N-linked oligosaccharides, resulting in conversion of asparagine to aspartate with a +1-Da increment in monoisotopic peptide mass. Endo H removes high mannose and some hybrid-type oligosaccharides leaving a residual N-acetylglucosamine coupled to asparagine with a signature mass increment of +203 Da. Examples of spectra assigned to peptides with the N+203 modification are shown in Fig. 1.

Table II, footnote a: Proteins with known or predicted biological function that may be consistent with lysosomal function.

FIG. 1. Spectra of peptides containing potential N-linked glycosylation sites. MS/MS spectra were also assigned to the peptide containing Asn 135 after analysis of the unfractionated total purified mixture from human brain without deglycosylation (bottom panel) indicating that glycosylation at this site is heterogeneous. Comparison of the middle and bottom panels (i.e. spectra assigned to the NPC2 peptide containing Asn 135 with or without glycosylation) clearly illustrates the N+203 modification. For each spectrum, the y ion series (red) is shown together with two prominent b ions (blue) found in the spectra assigned to the peptide containing Asn 135 that also show the presence of the N+203 mass increment. In addition, both spectra representing the glycosylated peptides (upper and middle panels) display a prominent peak representing a doubly charged ion that is unfragmented along the peptide backbone after neutral loss of the 203 m/z N-acetylglucosamine (green).
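The position of the neutral loss peak mentioned for Fig. 1 can be estimated with a short calculation. The Python sketch below assumes that the ion keeps its charge when the residual N-acetylglucosamine is lost, and it uses standard monoisotopic values (1.007276 Da for the proton, 203.0794 Da for the N-acetylhexosamine residue) that are not stated in this report. The function names and the example mass are hypothetical.

```python
PROTON = 1.007276   # monoisotopic proton mass in Da (standard value)
HEXNAC = 203.0794   # monoisotopic N-acetylhexosamine residue mass in Da


def mz(neutral_mass, charge):
    """m/z of a peptide of the given neutral mass carrying `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


def mz_after_hexnac_loss(precursor_mz, charge):
    """m/z of the same-charge ion after neutral loss of one HexNAc."""
    return precursor_mz - HEXNAC / charge


if __name__ == "__main__":
    # Hypothetical doubly charged N+203 peptide with a neutral mass of 1998.0 Da.
    precursor = mz(1998.0, 2)
    print(precursor, mz_after_hexnac_loss(precursor, 2))
```

For a doubly charged ion the loss therefore appears as a shift of about 101.5 m/z units rather than 203.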
Preliminary experiments revealed significant problems with the use of N+1 after deglycosylation with PNGase F as a mass marker for N-linked oligosaccharides when data are analyzed in an automated manner. False positive errors can occur when N+1 is assigned to peptides that have not been deglycosylated. For example, we analyzed a total digest of human brain MPR affinity-purified proteins that was not deglycosylated with PNGase F (in which the N+1 modification should not be found if it represents an unambiguous marker for deglycosylation) and found that ~8.9% of peptides were assigned N+1, and of these, ~0.4% were assigned N+1 in the context of a potential N-linked glycosylation site. In these cases, assignment of N+1 could simply result from errors in spectra interpretation, or they could result from chemical or biological deamidation of asparagine residues. Spectral assignment errors are less likely to occur in analyses of PNGase F-deglycosylated Man-6-P glycopeptides because a high resolution zoom scan protocol was used to accurately determine parent ion mass, but deamidation in the context of NX(S/T) would remain indistinguishable from deglycosylation. Thus, it is clear that difficulties in determination of the source of the N+1 modification are still likely to represent a potential problem when larger datasets are analyzed automatically. Errors in assignment of the N+1 modification can also result in false negative errors when peptides are actually deglycosylated but when the N+1 modification is assigned to the wrong asparagine residue. When we analyzed MPR affinity-purified human brain glycopeptides after deglycosylation with PNGase F, we found that of 823 automated assignments of N+1, in 86 (representing 16 individual peptides) the modification was assigned outside of NX(S/T) (Tables III and IV). Of these, eight assignments were to three peptides that did not contain an NX(S/T) sequence, and assignment of N+1 in these cases probably resulted from deamidation or spectra interpretation error. The remaining 78 assignments were to 13 peptides that contained an NX(S/T) motif, and manual inspection of the spectra in these cases suggested that the N+1 modification was incorrectly automatically assigned to an asparagine residue outside of this sequence. In 58 of these 78, the asparagine to which N+1 was automatically assigned lay adjacent to an asparagine within the context of a potential N-linked glycosylation site (i.e. in an NNX(S/T) or NN(S/T) sequence). Thus, although manual inspection of the fragmentation and sequence data allowed reassignment of the N+1 modification to an asparagine residue that is consistent with glycosylation in most of these peptides, this error remains a significant source of false negatives when data from peptides deglycosylated with PNGase F are analyzed in an automated manner. It should be emphasized that both the false positive and false negative errors described here with the assignment of N+1 may be specific to the particular combination of instrument and/or analysis software used in this study and may be less of a problem when instruments providing higher mass accuracy and/or resolution are used.
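The counts reported above for PNGase F can be checked for internal consistency and expressed as rates. The Python sketch below uses only the numbers quoted in the text; the variable and function names are hypothetical.

```python
# Counts quoted in the text for PNGase F-deglycosylated glycopeptides.
TOTAL_N1 = 823        # automated N+1 assignments
OUTSIDE_SEQUON = 86   # N+1 assigned outside NX(S/T), from 16 peptides
NO_SEQUON = 8         # assignments to 3 peptides lacking any NX(S/T)
MISPLACED = 78        # assignments to 13 peptides that contain NX(S/T)
ADJACENT = 58         # misplaced onto the Asn adjacent to the sequon Asn


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # The two subclasses should account for every out-of-motif assignment.
    assert NO_SEQUON + MISPLACED == OUTSIDE_SEQUON
    assert 3 + 13 == 16
    print(round(percent(OUTSIDE_SEQUON, TOTAL_N1), 1))  # share outside NX(S/T)
    print(round(percent(ADJACENT, MISPLACED), 1))       # share on adjacent Asn
```

On these figures, roughly one in ten automated N+1 assignments fell outside the motif, and about three-quarters of the misplaced assignments involved adjacent asparagines.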
Given these problems with the automated assignment of N+1, we also examined data from peptides deglycosylated with Endo H for the same type of potential errors. In terms of false positives, no peptides derived from known lysosomal proteins were assigned N+203 when data derived from a non-deglycosylated sample were analyzed automatically. In terms of false negatives, the N+203 modification was assigned to a total of 2451 peptides of which only one was assigned the N+203 modification outside of NX(S/T). In this case, N+203 was assigned to an asparagine adjacent to NX(S/T) (Tables III and IV), and this assignment error was easily detected and corrected manually. This is a significantly lower incidence of false negative errors than observed with PNGase F, thus we chose to conduct subsequent analyses with Endo H. This glycosidase also has advantages for our purposes in that the N+203 modification occurs only by deglycosylation and because Endo H specifically cleaves only those classes of carbohydrate that are capable of receiving the Man-6-P modification (see below and "Discussion"). A number of peptides deglycosylated with Endo H were assigned N+1 both in NX(S/T) and outside of this motif (Tables III and IV). N+1 modification of these peptides could reflect spectra interpretation error or deamidation, although the possibility of a low level of contaminant PNGase F-like activity within the Endo H preparation cannot be excluded.
During the synthesis of N-linked oligosaccharides containing Man-6-P, mannose residues in the α1,6 branch are normally phosphorylated before mannose residues positioned within the α1,3 branch (18) as part of a highly ordered process reflecting the spatial organization of the oligosaccharide processing enzymes in the ER, transitional elements, and the Golgi apparatus. Phosphorylation of mannose residues within the α1,6 branch prevents high mannose-type units from being converted to Endo H-resistant structures by impairing α-mannosidase action. However, a mutant mouse lymphoma cell line that has a glycosylation defect resulting in the synthesis of aberrant Endo H-resistant N-linked glycans (19) that have a truncated core mannose α1,6 branch (20) is still able to confer the Man-6-P modification to the α1,3 branch of the oligosaccharide (21). Therefore, although most Man-6-P-containing glycans from normal mouse cells are Endo H-sensitive (22,23), we were concerned that in brain there may exist some oligosaccharides containing Man-6-P that are Endo H-resistant. We therefore compared the efficiency of Endo H and PNGase F in removing Man-6-P oligosaccharides. We deglycosylated human brain MPR-purified proteins with each enzyme, fractionated the digest by SDS-PAGE, transferred to nitrocellulose, and used radiolabeled MPR as an affinity probe to detect proteins retaining Man-6-P (Fig. 2). Treatment with either PNGase F or Endo H resulted in a complete loss of detectable Man-6-P glycoproteins indicating that Endo H-resistant Man-6-P oligosaccharides are not present.
In our analysis of Endo H-deglycosylated MPR affinity-purified human brain glycopeptides, 2973 peptides were assigned to known lysosomal proteins, and of these, 522 (17.6%) lacked N+203. These peptides likely represent abundant peptides that are not fully removed but are presumably depleted by affinity purification. To estimate the extent of this depletion, we measured their abundance in both the unfractionated digest of purified brain proteins and in the purified Man-6-P glycopeptide fraction by the integration of extracted ion chromatograms representing the different charge states of select contaminant peptides (peaks were assigned on the basis of retention time and MS/MS). Many of the contaminant peaks were abundant in the unfractionated digest but present at levels approximating background in the Man-6-P glycopeptide fraction and were thus not quantifiable. For peptides whose abundance could be estimated in the Man-6-P glycopeptide fraction, their levels were found to range from 0.06 to 0.45% of starting levels, representing a significant depletion (more than ~200-fold).
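The depletion estimate above is a direct conversion from percent remaining to fold depletion, as the following Python sketch shows. It uses only the figures quoted in the text; the function names are hypothetical.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def fold_depletion(percent_remaining):
    """Fold depletion implied by the percentage of starting material left."""
    return 100.0 / percent_remaining


if __name__ == "__main__":
    # 522 of 2973 peptides from known lysosomal proteins lacked N+203.
    print(round(percent(522, 2973), 1))
    # Contaminant peptides remained at 0.06 to 0.45% of starting levels.
    print(round(fold_depletion(0.45)), round(fold_depletion(0.06)))
```

The least depleted peptide (0.45% remaining) sets the lower bound of about 220-fold, which is consistent with the stated depletion of more than ~200-fold.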
In total, 99 different sites of Man-6-phosphorylation were identified from the mouse and human MPR affinity-purified mixtures after deglycosylation with Endo H (Table V) corresponding to 42 of the 49 known lysosomal proteins identified in the unfractionated mixtures (Table I). For a number of these lysosomal proteins, the locations of N-linked oligosaccharides that contain Man-6-P have been experimentally determined in site-directed mutagenesis experiments, allowing us to evaluate our approach. Human GUSB (see Table I for gene nomenclature) is glycosylated at all four potential N-linked glycosylation sites, but only two are important for intracellular trafficking and are thus thought to contain Man-6-P (24). The single GUSB peptide identified here contains one of these two N-linked glycosylation sites. ASAH has six potential N-linked glycosylation sites of which the N-terminal five are used, and sites 1, 3, and 5 from the N terminus are thought to contain Man-6-P (25); all three were identified here. Similarly Man-6-P glycopeptides corresponding to known sites for Man-6-phosphorylation were also found for HEXA (26), CLN1 (27), CTSL (28), PPGB (29), CTSC (30), and HEXB (31). In some cases, we also identified glycosylation sites that were not found to contain Man-6-P in previous studies. Most likely, these sites are Man-6-phosphorylated at levels below the threshold of detection of site-directed mutagenesis studies, but it is possible that these differences reflect differences in the source of lysosomal proteins (cell culture medium versus brain). Regardless it is clear that our global results and earlier studies of Man-6-phosphorylation of individual proteins are in excellent agreement.

TABLE V Sites of mannose 6-phosphorylation in Endo H-deglycosylated brain MPR affinity-purified proteins that are known to have lysosomal function
In most cases, N-linked glycosylation sites are conserved between mouse and humans; however, in some cases, this is not the case and non-conserved sites are indicated by "Not present." Blank entries indicate that spectra were not assigned to the peptide containing the indicated site of Man-6-phosphorylation. The score representing the most confident peptide assignment (lowest log(e)) is indicated in parentheses.
Columns: Protein; Human site (score); Mouse site (score)
a Sites of Man-6-phosphorylation contained within peptides that were also assigned during analysis of the non-deglycosylated, unfractionated proteolytic digests of purified brain proteins. The presence of oligosaccharides at these sites is therefore heterogeneous. One spectrum was assigned to murine palmitoyl-protein thioesterase 2, a protein not identified in the unfractionated proteolytic digests. In this case, we could not determine whether there exists glycosylation heterogeneity at the site of Man-6-phosphorylation.
b Potential sites of Man-6-phosphorylation that were found on doubly glycosylated peptides for which singly glycosylated equivalents were not identified. These sites clearly contain Endo H-sensitive carbohydrates (indicated by the N+203 modification), but assignments of Man-6-phosphorylation should be regarded as tentative.
Of the 99 Man-6-P glycosylation sites identified in mouse and human brain, 34 were assigned in both species, 14 were assigned only in mouse, and 51 were assigned only in human. (The degree of similarity is probably higher between the two species as the data here likely reflect the more extensive analysis of the human versus mouse sample; see "Experimental Procedures"). The majority of the N-linked glycosylation sites containing Man-6-phosphorylated oligosaccharides are conserved between mice and humans, although 10 were not, and these are indicated in Table V.
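The species breakdown of these sites can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts reported in the text; the variable names are illustrative, and the per-species totals it derives are the ones quoted in the Discussion (85 human and 48 mouse sites).

```python
# Counts of Man-6-P sites on known lysosomal proteins, as reported in the text.
both_species = 34  # assigned in both mouse and human
mouse_only = 14    # assigned only in mouse
human_only = 51    # assigned only in human

# Every site falls into exactly one of the three groups.
total_sites = both_species + mouse_only + human_only
# Per-species totals include the shared sites.
human_sites = both_species + human_only
mouse_sites = both_species + mouse_only

print(total_sites, human_sites, mouse_sites)
```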
In the analysis of the human brain sample, a small number of peptides were identified that contained more than one NX(S/T) motif with an N+203 residue modification, indicating the presence of two Endo H-sensitive N-linked carbohydrates. In such cases, either or both carbohydrates can contain Man-6-P. For some of these doubly glycosylated peptides, glycosylation was found to be heterogeneous, and we also found the same peptide containing only a single N+203 modification, allowing us to unambiguously assign one of the two potential Man-6-phosphorylation sites. In other cases, cleavage with alternate proteases between the two potential sites of Man-6-phosphorylation allowed unambiguous assignment of at least one site. Some potential Man-6-P sites were identified only in doubly glycosylated peptides. Although it is likely that both of such sites actually contain Man-6-P, we cannot formally prove this, and such cases are indicated in Tables V and VI.
A number of the MPR-purified proteins that have not been assigned lysosomal function or localization were found to contain Man-6-phosphorylated oligosaccharides. These are listed in Table VI and described individually below.
Acid Sphingomyelinase-like Phosphodiesterase 3A (SMPDL3A)-SMPDL3A was first identified in a yeast two-hybrid screen for proteins interacting with a bladder tumor suppressor (32). SMPDL3A is 29% identical and 44% similar to acid sphingomyelinase, a known lysosomal protein, and we identified a single, conserved site of Man-6-phosphorylation in both mouse and human brain. This sequence similarity together with the observation that it contains Man-6-P makes SMPDL3A an excellent candidate for a protein with novel lysosomal function.
Acyloxyacyl Hydrolase (AOAH)-A tryptic Man-6-P glycopeptide derived from AOAH was found here, although the protein itself was not detected in the total unfractionated mixture of MPR affinity-purified proteins from human brain. AOAH is a highly conserved lipase found in immune cells of myeloid lineage that deacylates bacterial lipopolysaccharides and other substrates (33). AOAH is also found in urine where it is secreted by proximal tubule cells and taken up by bladder cells in a process that in vitro is inhibited by Man-6-P (34), which is in accordance with the identification of a Man-6-P glycopeptide here. The hydrolytic function of AOAH, partial similarity to lysosomal sphingolipid activator proteins (saposins), presence of Man-6-P, and the fact that the protein is processed upon cellular uptake (presumably upon MPR-mediated delivery to the lysosome (35)) make AOAH another promising lysosomal candidate.
Arylsulfatase G (6330406P08Rik)-A single site of Man-6-phosphorylation was identified for predicted protein arylsulfatase G in mouse brain.
Biotinidase (BTD)-BTD (36) generates free biotin, a cofactor required for four cellular carboxylases, by hydrolysis of biocytin or biotinyl peptides derived from the diet or from the turnover of biotinylated cellular proteins. BTD deficiency causes multiple carboxylase deficiency in humans. We identified two sites of mannose 6-phosphorylation in BTD, and a potential lysosomal role for BTD is supported by the observations that BTD has an acidic pH optimum (37) and that biotinylated carboxylases are known to be degraded in the lysosome (38). Immunohistochemical detection of BTD reveals a punctate cytoplasmic distribution that is consistent with lysosomal localization, although results reported from subcellular fractionation were interpreted to suggest a microsomal localization for the bulk of cellular BTD activity (38).
Calumenin (CALU)-CALU is a calcium binding protein that is both secreted and found within the cell throughout the secretory system (39). The single Man-6-P site identified here in mouse brain is known to be glycosylated (39).
Cat Eye Syndrome Critical Region 1 (CECR1)-Two sites of Man-6-phosphorylation were identified for CECR1, a protein of unknown function with some similarity to invertebrate growth factors and to adenosine deaminase (40). CECR1 has a predicted signal sequence and maps to a region that is duplicated in a developmental disorder, cat eye syndrome. Overexpression of CECR1 has been suggested to play a role in this disease (40).
Cellular Repressor of E1A-stimulated Gene Expression (CREG)-Two sites of Man-6-phosphorylation were identified for CREG. The presence of Man-6-P-containing oligosaccharides was inferred in a previous study in which CREG was shown to bind to the CI-MPR (41).
Epididymis-specific α-Mannosidase (MAN2B2)-Two Man-6-phosphorylation sites were observed for MAN2B2. This observation together with its catalytic function suggests that this protein almost certainly has lysosomal function.

GDP-fucose Protein O-Fucosyltransferase 2 (POFUT2)-POFUT2 is an uncharacterized protein with some resemblance (22% identical, 37% similar) to protein O-fucosyltransferase 1 (POFUT1), a soluble protein thought to be localized within the endoplasmic reticulum (42). A single conserved site of Man-6-phosphorylation was found in both mouse and human POFUT2.
Hypothetical Protein LOC196463 (LOC196463)-LOC196463 has been identified as a potential Man-6-P glycoprotein in human brain (7) and is a predicted protein of unknown function with some similarity to Dictyostelium discoideum phospholipase B. We found here that five of the six potential N-linked glycosylation sites contain Man-6-P. LOC196463 has also been identified recently in neuromelanin granules from human brain that are subcellular compartments rich in lysosomal proteins (43) and as an MPR affinity-purified protein secreted from murine MPR-deficient fibroblasts (8). These observations together with the finding that LOC196463 contains numerous Man-6-phosphorylated oligosaccharides are highly suggestive that this protein actually represents a lysosomal protein. Two of the sites of Man-6-phosphorylation of LOC196463 were conserved between mouse and human. LOC196463 and FLJ22662 (see below) may well represent a novel class of lysosomal phospholipases.

TABLE VI
Sites of mannose 6-phosphorylation in Endo H-deglycosylated brain MPR affinity-purified proteins that are not known to have lysosomal function
In most cases, N-linked glycosylation sites are conserved between mouse and human; however, in some cases they are not, and non-conserved sites are indicated by "Not present." Blank entries indicate that spectra were not assigned to the peptide containing the indicated site of Man-6-phosphorylation. The score representing the most confident peptide assignment (lowest log(e)) is indicated in parentheses.

Protein    Human

a Sites of Man-6-phosphorylation contained within peptides that were also assigned during analysis of the non-deglycosylated, unfractionated proteolytic digests of purified brain proteins. The presence of oligosaccharides at these sites is therefore heterogeneous.
b Some spectra were assigned to proteins that were not identified in the unfractionated proteolytic digests. In these cases, we could not determine whether there exists glycosylation heterogeneity at the sites of Man-6-phosphorylation.

Hypothetical Protein FLJ22662 (FLJ22662)-FLJ22662 is a predicted protein observed previously (7) that has considerable sequence similarity to LOC196463 (33% identical, 48% similar). A single Man-6-phosphorylation site was identified here in FLJ22662 purified from human brain.
Mammalian Ependymin-related Protein; Up-regulated in Colon Cancer I (EPDR1)-EPDR1 was first identified as a potential Man-6-P glycoprotein in rat brain (3) and is a protein of unknown function that bears some similarity to fish ependymins, which are soluble proteins reported to be associated with extracellular matrix (44) and involved in memory consolidation (45). The human ortholog has been proposed to be a transmembrane protein (46), but here we verified in both mouse and human that EPDR1 is a soluble protein that contains Man-6-P. A lysosomal function for EPDR1 is supported by the observation that this protein, like many known lysosomal proteins, is also identified in neuromelanin granules from human brain (43) and by the fact that it is detected as an MPR affinity-purified protein that is secreted from MPR-deficient mouse fibroblasts (8).
Microsomal Stress 70 Protein ATPase (STCH)-STCH is a microsomal membrane-associated member of the heat shock protein 70 family (47) found here to contain a single N-linked glycosylation site containing Man-6-P. Given its likely function as an endoplasmic reticulum chaperone, the significance of Man-6-phosphorylation of STCH is not clear.
Myelin-associated Glycoprotein (MAG)-MAG is a minor but necessary component of myelin (48). Because MAG is a sialic acid-binding lectin, we originally classified it as a likely specific contaminant (7) in human brain MPR-purified samples; however, we find here that this protein actually appears to contain Man-6-P.
N-Acetyllactosaminide β-1,3-N-Acetylglucosaminyltransferase (B3GNT6)-B3GNT6 is involved in the synthesis of poly-N-acetyllactosamine (49). It is a predicted type II transmembrane protein, although detection here as a soluble Man-6-P glycoprotein suggests that this assignment may not be correct or that B3GNT6 may be proteolytically cleaved into transmembrane and soluble fragments.
Neuroserpin (SERPINI1)-Although neuroserpin was not detected by LC-MS analysis of the total unfractionated mixture of MPR-purified proteins from human brain, two sites of Man-6-phosphorylation were identified for this protein. Missense mutations in SERPINI1 can result in polymeric aggregation of defective protein in cytoplasmic inclusions with a subsequent autosomal dominant familial encephalopathy (50). Identification of neuroserpin as a Man-6-P glycoprotein raises the intriguing possibility that such disease may actually represent a lysosomal storage disease.
Prostaglandin H2 Isomerase (PTGDS)-PTGDS is a multifunctional protein that is responsible for the conversion of prostaglandin H2 to prostaglandin D2 (52) and has also been reported to be a retinoic acid transporter (53). Identification of a single Man-6-phosphorylation site that is conserved between mouse and human suggests a lysosomal localization for this protein, and in support of this, an electron microscopic study (54) detected PTGDS in lysosomes, multivesicular bodies, and endocytic vesicles in some cell types including meningeal macrophages and parenchymal microglia.
Ribonuclease T2 (RNASET2)-RNASET2, given its catabolic function, represents a promising candidate for an enzyme with lysosomal function.
SPARC-like Protein 1 (SPARCL1)-SPARCL1 is a secreted acidic protein thought to play a role in modulating lymphocyte adhesion to basement membranes (55).
Hypothetical Protein DKFZp313G1735, Telethon Sulfatase, Bone Rel Sulfatase (DKFZp313G1735)-DKFZp313G1735 is a predicted protein with homology to sulfatases. The presence of a number of other sulfatases within the lysosome is well documented; thus this protein represents another promising lysosomal candidate.
Sulfatase-modifying Factor 2 Precursor (SUMF2)-SUMF2 is a paralog of SUMF1, an enzyme responsible for generation of the unique amino acid C-formylglycine in the active site of a number of microsomal and lysosomal sulfatases and in which defects are responsible for multiple sulfatase deficiency. SUMF2 is known to be glycosylated at its only potential N-linked glycosylation site (56), which we found also to contain Man-6-P. Although most of the cellular SUMF2 appears to be located within the ER, it is possible that some proportion is transported to the lysosome to play a specific role in lysosomal sulfatase function.
Vasopressin-Neurophysin 2 Copeptin (AVP)-AVP is a glycosylated peptide hormone precursor (57).

DISCUSSION

A principal aim of this study was to develop and validate a mass spectrometric method for the direct demonstration of the Man-6-P modification on glycoproteins and for identification of the sites of modification. Given that the Man-6-P modification is characteristic of most soluble lysosomal proteins, these represent important steps in the characterization of the lysosomal proteome and will allow selection of candidate proteins for further, in-depth investigation of lysosomal localization. Although we identified a number of excellent lysosomal candidates (see "Results"), we also found Man-6-P on a number of proteins for which lysosomal function or localization appears highly unlikely. For example, several Man-6-P glycoproteins identified here have been proposed to function in the ER, including STCH, SUMF2, and POFUT2; thus the significance of the Man-6-P modification is unclear. If these do actually represent authentic ER proteins, one possibility is that they may escape the ER retention apparatus and then become Man-6-phosphorylated in the transitional elements/cis-Golgi apparatus before retrieval back to the ER prior to encountering the MPRs in the trans-Golgi. Although the three proteins identified here lack the ER retrieval signal KDEL, the demonstration that cathepsin D engineered to contain this sequence is both ER-resident and Man-6-phosphorylated (58) provides general support for this possibility.
Sites of Man-6-phosphorylation have been experimentally determined for a number of individual lysosomal proteins, generally by site-directed mutagenesis of potential N-linked glycosylation sites of proteins expressed in vitro and analyses of the effects of these mutations on protein targeting and processing (24-29, 31). These studies have provided useful data on individual lysosomal proteins on a case-by-case basis but have drawbacks. They are limited to a small subset of lysosomal proteins and frequently rely on surrogate markers for Man-6-phosphorylation (e.g. secretion versus retention within the cell) that may not clearly identify sites where either glycosylation or the presence of Man-6-P may be variable modifications. An additional potential problem is that Man-6-phosphorylation and intracellular targeting of lysosomal proteins overexpressed in cell culture systems are not necessarily the same as those seen in vivo (59,60). Therefore, another major aim of this study was to globally map Man-6-phosphorylation sites of lysosomal proteins purified from a source that is more physiologically relevant.
We assigned a total of 85 individual sites of Man-6-phosphorylation to 40 known human lysosomal proteins and 48 sites to 31 mouse lysosomal proteins. Taken together, this represents a total of 99 sites of Man-6-phosphorylation representing 42 individual proteins. As would be predicted, there is a significant incidence of false negatives in terms of overall assignment of Man-6-P-containing glycoproteins from the glycopeptide analysis. We estimate the false negative incidence to be ~8 and 19% for human and mouse brain, respectively, based on the assignment of Endo H-deglycosylated MPR affinity-purified peptides to known lysosomal proteins. Failure to detect any given Man-6-phosphorylation site may simply reflect low peptide abundance. Alternatively the specific properties (e.g. sequence and size) of a given peptide may render it prone to selective loss during sample preparation or cause it to be missed by the mass spectrometer because it is poorly ionized or it exceeds the mass range constraints of the instrument. Such peptide-specific issues may be addressed in part by conducting different proteolytic digests, resulting in peptides with partially overlapping sequences that may increase the probability of detecting a glycosylation site. It is also worth noting that when identifying proteins containing Man-6-P, the relative probability of assigning at least one Man-6-P glycopeptide to a protein increases exponentially in proportion to the number of Man-6-P sites within this protein. For this reason, proteins containing a single site of Man-6-phosphorylation are particularly likely to represent false negatives.
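The dependence of protein-level detection on the number of Man-6-P sites can be illustrated with a simple model. The sketch below assumes, purely for illustration, that each site is detected independently with the same probability; under that assumption the chance of missing a protein entirely falls geometrically with the number of sites. The function name and the per-site probability are hypothetical and are not estimates from this study.

```python
def prob_at_least_one(p_site: float, n_sites: int) -> float:
    """Probability that at least one of n_sites glycopeptides is detected,
    assuming each site is detected independently with probability p_site."""
    if not 0.0 <= p_site <= 1.0:
        raise ValueError("p_site must lie between 0 and 1")
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    # The protein is missed only if every one of its sites is missed.
    return 1.0 - (1.0 - p_site) ** n_sites


# Hypothetical per-site detection probability, chosen only for illustration.
p = 0.5
for n in (1, 2, 3, 4):
    print(n, prob_at_least_one(p, n))
```

With these illustrative numbers, a protein with a single site is missed half of the time, whereas one with four sites is missed only about 6% of the time.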
In this study, we evaluated two different enzymes for the deglycosylation of purified glycopeptides, PNGase F and Endo H, and found that the latter has distinct advantages for our purposes. First, the N+203 mass increment observed after Endo H digestion results from an N-acetylglucosamine residue that remains after the removal of the rest of an oligosaccharide; thus this is a highly specific marker for N-linked glycans that is difficult to incorrectly assign even when data are analyzed in an automated manner. In contrast, the N+1 modification associated with the removal of N-linked glycans by PNGase F can be confused with deamidation, and others have found it necessary to use isotopic labeling to distinguish between N+1 as a result of deamidation versus deglycosylation (14). In addition, automated data analysis can lead to incorrect assignment of N+1 to peptides that may not actually contain this modification (see "Results"). Second, Endo H provides an additional level of specificity in the analysis of Man-6-phosphorylation sites. Previous studies have found that Man-6-phosphorylation blocks further carbohydrate processing to Endo H-resistant structures, and thus the vast majority of Man-6-P-containing glycans are Endo H-sensitive (22,23) (Fig. 2).² PNGase F, in contrast, cleaves all N-linked oligosaccharides including complex type sugars that do not contain Man-6-P.
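The difference in specificity between the two mass increments can be made concrete with a small sketch. It is not the search software used in this study; the function name and tolerance are hypothetical, and the monoisotopic mass shifts are standard reference values rather than figures taken from the paper.

```python
# Monoisotopic mass shifts in Da (standard reference values, an assumption here).
HEXNAC_RESIDUE = 203.0794  # N-acetylglucosamine left on Asn after Endo H
DEAMIDATION = 0.9840       # Asn -> Asp, produced by PNGase F or by deamidation


def classify_asn_shift(delta_mass: float, tol: float = 0.02) -> str:
    """Label an observed mass shift (Da) on an asparagine residue."""
    if abs(delta_mass - HEXNAC_RESIDUE) <= tol:
        return "N+203: residual GlcNAc after Endo H"
    if abs(delta_mass - DEAMIDATION) <= tol:
        # The same +1 shift arises with or without prior glycosylation.
        return "N+1: PNGase F deglycosylation or deamidation (ambiguous)"
    return "unassigned"


for shift in (203.08, 0.98, 57.02):
    print(shift, "->", classify_asn_shift(shift))
```

The N+1 branch cannot separate the two origins of the shift from mass alone, which is the ambiguity described above.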
Manual examination of the spectra derived from glycopeptides deglycosylated with Endo H revealed that in many cases prominent peaks were present that represent peptides after neutral loss of the 203-Da N-acetylglucosamine (Fig. 1). This represents another significant advantage of Endo H as the presence of this neutral loss ion provides a clear mass fingerprint for glycosylation. In addition, it may be possible in the future to exploit the presence of this characteristic neutral loss as a trigger for MS3 scans that should be specific for deglycosylated peptides.
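The expected position of such a neutral loss peak follows directly from the precursor m/z and charge. The sketch below is illustrative only: the function name and the example precursor are hypothetical, and the residue mass is the standard monoisotopic value for N-acetylglucosamine rather than a number quoted in this paper.

```python
HEXNAC_RESIDUE = 203.0794  # monoisotopic GlcNAc residue mass in Da (standard value)


def neutral_loss_mz(precursor_mz: float, charge: int, n_losses: int = 1) -> float:
    """Expected m/z after neutral loss of n_losses GlcNAc residues.

    A neutral loss leaves the charge state unchanged, so the m/z shift is
    the lost mass divided by the charge.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return precursor_mz - n_losses * HEXNAC_RESIDUE / charge


# Hypothetical doubly charged precursor, for illustration only.
print(round(neutral_loss_mz(900.0, 2), 4))
```

A peak at this position, relative to the precursor, is the kind of signal that could serve as the trigger for a further fragmentation scan.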
One potential problem in using Endo H is that it is possible that some peptides containing a Man-6-phosphorylated oligosaccharide may also contain an additional Endo H-resistant glycan. In this case, the glycopeptide would not be detected using our current methods after Endo H treatment. However, in an analysis of MPR affinity-purified glycopeptides that were sequentially deglycosylated with Endo H and then PNGase F, we observed no peptides that contained both N+203 and N+1 in the context of NX(S/T) (data not shown). For our sample at least, the presence of Endo H-resistant glycans does not therefore appear to be a problem.
A number of previous studies have investigated protein determinants for the selection of carbohydrates for Man-6-phosphorylation by site-directed mutagenesis of proteins that acquire Man-6-P in cell culture, including DNase 1 (61,62), cathepsin D (63-66), cathepsin L (67), aspartylglucosaminidase (68), cathepsin D/pepsinogen chimeras (69), and arylsulfatases (70-72). A general consensus of these studies is that spatially distinct amino acids, in particular lysine residues, form part of a broad three-dimensional patch on the surface of protein molecules that interacts with lysosomal protein UDP-N-acetylglucosamine phosphotransferase, orienting this enzyme for the recognition and phosphorylation of specific oligosaccharides (63,69,73). The generation of a global database of Man-6-phosphorylation sites in this study, especially when considered in conjunction with structural information, is likely to prove valuable in future studies of determinants for Man-6-phosphorylation and in validating the models arising from individual case studies.
Glycosylation is often a variable modification; thus an individual gene product may be represented both by molecules that are glycosylated at a given site and others that are not. For the known human lysosomal proteins, we found that 19 different peptides of the 99 that were assigned the N+203 modification were also found without this modification in the unfractionated MPR affinity-purified protein digest, indicating that the presence of glycosylation is heterogeneous in these cases. Although we cannot exclude the possibility that there are cases where we may not sample a non-glycosylated peptide in the analysis of the total peptide mixture, detection of the same peptide in both the glycosylated and unglycosylated form is an unambiguous determination of heterogeneity in glycosylation site utilization. One example is NPC2, which has three NX(S/T) motifs; the first (in order from the N terminus) is never glycosylated, the second always contains high mannose oligosaccharides, and the third is either not glycosylated or may contain high mannose or complex type oligosaccharides (74) (also data not shown).³ In this study, the first N-linked glycosylation site of NPC2 was not found to be glycosylated, the second was only detected with Endo H-cleavable oligosaccharides, and the third was heterogeneous in terms of glycosylation (Fig. 1). These findings are in complete accordance with the previous observations.
The ability to recognize sites of Man-6-phosphorylation on lysosomal proteins may also have therapeutic implications for gene and enzyme replacement therapies for human lysosomal storage diseases. Central to the gene therapy of lysosomal storage diseases is the concept that not all cells will be transduced with a recombinant virus, but surrounding cells will be cross-protected by MPR-mediated uptake of recombinant protein secreted from the transduced cells. In developing gene therapy approaches, it may therefore be possible to optimize protein spread by adjusting the balance between secretion, lysosomal targeting, and uptake. Key to this balance is the level of Man-6-phosphorylation, and lysosomal proteins that contain physiological levels of Man-6-phosphorylated carbohydrates are generally efficiently targeted to the lysosome (75,76). Expression of highly Man-6-phosphorylated recombinant proteins would promote efficient lysosomal targeting, rather than secretion, in transduced cells as well as highly efficient endocytosis in neighboring cells and would thus result in diminished spread. Selective mutagenesis of some (but not all) Man-6-phosphorylation sites, decreasing the efficiency of lysosomal targeting and thus promoting secretion from transduced cells and reduced uptake by adjacent cells, may therefore allow for greater intercellular spread before eventual MPR-mediated endocytosis and delivery to the lysosome. Similar arguments can be made for this sort of protein engineering to optimize enzyme replacement therapies that rely on MPR-mediated uptake and delivery of recombinant proteins. Conversely, an understanding of protein determinants may allow for the increased selection of N-linked glycans for Man-6-phosphorylation; this could also be of benefit, depending on the disease and protein in question, especially for lysosomal enzyme replacement therapies (77,78). Detailed studies of individual lysosomal proteins will be required to thoroughly understand the effects of modulation of Man-6-phosphorylation in terms of protein structure and function and in terms of therapeutic efficacy.