Mass Spectrometric Characterization of Proteins from the SARS Virus

A new coronavirus has been implicated as the causative agent of severe acute respiratory syndrome (SARS). We have used convalescent sera from several SARS patients to detect proteins in the culture supernatants from cells exposed to lavage from another SARS patient. The most prominent protein in the supernatant was identified by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) as a ∼46-kDa species. This was found to be a novel nucleocapsid protein that matched almost exactly one predicted by an open reading frame in the recently published nucleotide sequence of the same virus isolate (>96% coverage). A second viral protein corresponding to the predicted ∼139-kDa spike glycoprotein has also been examined by MALDI-TOF MS (42% coverage). After peptide N-glycosidase F digestion, 12 glycosylation sites in this protein were confirmed. The sugars attached to four of the sites were also identified. These results suggest that the nucleocapsid protein is a major immunogen that may be useful for early diagnostics, and that the spike glycoprotein may present a particularly attractive target for prophylactic intervention in combating SARS.

The recent clinical identification of a novel type of atypical pneumonia without a clearly defined etiology, together with epidemiological evidence of high transmissibility, has prompted the World Health Organization to issue a rare travel advisory. The new entity has been called severe acute respiratory syndrome (SARS); it apparently began in Guangdong province in China in November of 2002 and has since spread to Hong Kong, Singapore, Vietnam, Canada, the U.S., Taiwan, and several European countries.
The outbreak in Canada began in late February 2003 in a traveler returning from Hong Kong whose exposure was to the index case in the Hong Kong epidemic (a physician who had cared for SARS cases in Guangdong province in the People's Republic of China). The Canadian index case died 9 days after the disease onset, and a 43-year-old male relative became ill 2 days after exposure and died of the adult respiratory distress syndrome 15 days after the illness began (1). Subsequently, Canada has faced the largest SARS outbreak outside of Asia, with at least 351 probable or suspected cases and 27 deaths, mostly in the Toronto area (2,3).
Samples from patients with suspected or probable SARS in Canada have been referred to the National Microbiology Laboratory (NML), Health Canada, for laboratory diagnostics. This laboratory, part of the Canadian Science Centre for Human and Animal Health, is Canada's national reference center for infectious diseases and houses the only Class 4 containment facilities in the country. NML has played an active role in an intensive international collaborative effort among 11 laboratories around the world that suggested a distinct coronavirus may be etiologically involved. In particular, the laboratory prepared the nucleotide samples for the first successful effort to determine the genome sequence for the coronavirus (4), a result soon confirmed by several other laboratories (see, for example, Ref. 5).
Nevertheless, the genome sequence merely provides a template for the construction of the viral proteins. Thus, an alternative strategy is to examine the proteins themselves, and mass spectrometry has proved to be an efficient tool for this purpose (6). The University of Manitoba time-of-flight mass spectrometry laboratory has already been active in characterizing viral proteins (7-11), so it was natural for NML to enlist the university laboratory (late in March) as a collaborator in the analysis of the SARS proteins. The first results of this collaboration are described below.

EXPERIMENTAL PROCEDURES
Preparation of the Primary Material at NML-Clinical specimens obtained from the original case cluster were extensively investigated for the presence of bacterial and viral pathogens (1). Nasopharyngeal swab and bronchoalveolar lavage fluids from several of these patients were found to be positive by reverse transcription-PCR for human metapneumovirus and the novel coronavirus (1). Inoculation of the bronchoalveolar lavage fluid from the 43-year-old male patient in Vero E6 cells produced a strong cytopathic effect on day 4 after infection. The second passage of this viral isolate was further used to produce large quantities of the virus. Initially, this virus material was used to assess its antigenicity with convalescent serum samples from SARS patients. The convalescent sera that were previously found to be positive for antibodies to the virus by indirect immunofluorescence assay (footnote 2) strongly reacted in Western blot with a ∼46-kDa protein (Fig. 1A) similar in size to the nucleocapsid protein of coronaviruses (12).
In order to prepare this protein (and perhaps other SARS-related proteins) for proteolytic digestion, the virus was purified on a 20-60% linear sucrose gradient. Western blotting of the gradient fractions showed that fraction 4 (density, 1.18 g/cm³) reacted strongly with a convalescent serum from a SARS patient. This fraction was run on a Novex 4-12% Bis-Tris gel in 4-morpholinepropanesulfonic acid running buffer (Invitrogen), and stained with Coomassie blue (Fig. 1B). Two bands were then excised from the gel (indicated by arrowheads), one containing the prominent ∼46-kDa protein and the other containing a much weaker protein band with an apparent mass of ∼180 kDa. These were transferred to the university laboratories for in-gel digestion with various proteolytic enzymes.
Proteolytic Digestions-The excised protein bands were in-gel digested with one of three different enzymes (sequencing grade-modified trypsin (Promega, Madison, WI), Lys-C, or Asp-N (both from Roche Molecular Biochemicals)). Digestions were performed according to the procedure described by Shevchenko et al. (13). The extracts containing the peptide mixture were lyophilized and resuspended in 5.5 µl of 0.5% trifluoroacetic acid in water, then 0.5 µl of the resulting sample was mixed 1:1 with 2,5-dihydroxybenzoic acid (150 mg/ml in water:acetonitrile 1:1) matrix solution and deposited on the gold surface of a matrix-assisted laser desorption/ionization (MALDI) target. The remaining 5 µl was separated into fractions by micro-high-performance liquid chromatography (HPLC), and the individual fractions were deposited on a target for subsequent mass spectrometric analysis.
Chromatography-Chromatographic separations were performed using an Agilent 1100 Series system (Agilent Technologies, Wilmington, DE). Deionized (18 MΩ) water and HPLC-grade acetonitrile were used for the preparation of eluents. Samples (5 µl) were injected onto a 150 µm × 150 mm column (Vydac 218 TP C18, 5 µm; Grace Vydac, Hesperia, CA) and eluted with a linear gradient of 1-80% acetonitrile (0.1% trifluoroacetic acid) in 60 min. The column effluent (4 µl/min) was mixed on-line with dihydroxybenzoic acid matrix solution (0.5 µl/min) and deposited by a small computer-controlled robot onto a movable gold target at 1-min intervals (17). The vast majority of the tryptic fragments were eluted within 40 min under the HPLC conditions used, so 40 fractions were normally collected.
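To relate fraction numbers to nominal eluent composition, the gradient and collection scheme just described can be sketched in a few lines of Python. This is only an illustration: it assumes an ideal linear gradient with no dwell or dead volume (the text does not give these), and the function names are hypothetical rather than part of any instrument software.

```python
def acetonitrile_percent(t_min, start=1.0, end=80.0, duration=60.0):
    """Nominal % acetonitrile at time t_min for a linear 1-80% gradient in 60 min.

    Assumption: ideal gradient, no dwell or dead volume.
    """
    if not 0.0 <= t_min <= duration:
        raise ValueError("time outside the gradient")
    return start + (end - start) * t_min / duration


def fraction_window(n):
    """Nominal collection window (start, end) in minutes of 1-based fraction n,
    for fractions deposited at 1-min intervals."""
    if n < 1:
        raise ValueError("fractions are numbered from 1")
    return (n - 1, n)


def spot_volume_ul(effluent_ul_min=4.0, matrix_ul_min=0.5, interval_min=1.0):
    """Liquid deposited per target spot: column effluent plus matrix solution."""
    return (effluent_ul_min + matrix_ul_min) * interval_min
```

Under these assumptions each spot receives 4.5 µl of liquid, and a fraction collected between 22 and 23 min corresponds to a nominal composition of roughly 30-31% acetonitrile at the pump.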
Glycoprotein Analysis-Our original intention was to postpone any detailed analysis of the higher mass protein to a subsequent investigation. Later, when we decided to include this effort in the present measurements, the only materials that we had available were two lyophilized samples from digests of the larger protein (∼180-kDa band), one from a tryptic digest and one from a Lys-C digest. The sample from the tryptic digest was separated by HPLC and used for analysis of the glycosylated peptides. The sample from the Lys-C digest was digested twice more, first by peptide N-glycosidase F (PNGase F; Roche Molecular Biochemicals) to remove the asparagine-linked glycosylation (18), then by trypsin to produce smaller fragments (both digestions in ordinary water).
TOF Mass Spectrometry-The spots on the gold targets were analyzed individually, both by single mass spectrometry (MS) and by tandem mass spectrometry (MS/MS) in the Manitoba/Sciex prototype quadrupole/TOF (QqTOF) mass spectrometer (subsequent commercial model sold as QSTAR by Applied Biosystems/MDS Sciex, Foster City, CA) (19). In this instrument, ions are produced by irradiation of the target with photon pulses from a 20-Hz nitrogen laser (VSL 337ND, Spectra-Physics, Mountain View, CA) with 300 µJ energy per pulse. Orthogonal injection of ions from the quadrupole into the TOF section normally produces a mass resolving power of 10,000 (full width at half-maximum) and accuracy within a few millidaltons in the TOF spectra in both MS and MS/MS modes, as long as the ion peak is reasonably intense.
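To put these figures in perspective, resolving power and mass accuracy can be converted to peak width and relative error with the usual definitions (resolving power = m/Δm at half-maximum; error in ppm = 10⁶ × (measured − calculated)/calculated). The short sketch below is illustrative only; the function names are hypothetical.

```python
def fwhm_from_resolving_power(mz, resolving_power=10_000):
    """Peak width (full width at half-maximum) implied by R = m / delta_m."""
    return mz / resolving_power


def mass_error_ppm(measured, calculated):
    """Relative mass error in parts per million."""
    return (measured - calculated) / calculated * 1e6


def mass_error_mda(measured, calculated):
    """Absolute mass error in millidaltons."""
    return (measured - calculated) * 1e3
```

For an ion near m/z 2297, a resolving power of 10,000 corresponds to a peak about 0.23 wide, and an error of 5 mDa at that mass is about 2.2 ppm.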

RESULTS
Mass Spectra from Proteolytic Digests of the ∼46-kDa Protein-Fig. 2A shows the m/z spectrum of the mixture of peptides resulting from tryptic digestion of the ∼46-kDa protein in ordinary water, before HPLC fractionation. Note that Tx-y indicates a tryptic fragment containing amino acid residues x to y in Fig. 2 and in subsequent tables and discussion. A small region of this spectrum is expanded in Fig. 2B, and an HPLC fraction containing some of the same ions is shown in Fig. 2C. Here, the most intense ion in Fig. 2B has moved to a different fraction, but some of the weaker ions are much more prominent. It is clear that individual peptide peaks are considerably easier to distinguish after HPLC separation; spectra of the fractions are dramatically simpler and have a signal-to-noise ratio improved by a factor of ∼10 or more.
FIG. 1. A, Viral particles pelleted from the supernatant of Vero E6 cells exposed to samples derived from patients with SARS (lane 1) or from mock-infected cells processed in a similar fashion (lane 2) were separated by SDS-PAGE and analyzed by Western blot with convalescent sera from SARS patients (Tor 2, Tor 3, Tor 4, and BC 1) or a control serum from a noninfected donor (NML). The sera from the patients, but not the control, reacted with a 44- to 48-kDa species present in the supernatants from the infected but not the mock-treated cultures (indicated by arrowhead). B, A virus sample similar to that described in A was fractionated on a sucrose gradient. The fraction containing immunoreactive material was separated on a 4-12% bis acrylamide gradient gel and stained with colloidal Coomassie blue. A prominent band with an apparent molecular mass of 44-48 kDa was observed along with a much less intense band at ∼180 kDa (indicated by arrowheads). These bands were excised and used for the mass spectrometric studies described in this report.
2 Y. Li, unpublished data.
Initial efforts to identify the protein (based on data base searching against the peptide fingerprint) failed to yield any significant matches, suggesting that it was a novel protein. De novo peptide sequencing was therefore undertaken in order to characterize it. For this purpose, samples were digested in the presence of a 50/50 mixture of ordinary water and H₂¹⁸O, as described above, because the addition of either ¹⁸O or ¹⁶O during enzymatic cleavage yields spectra containing both species and thus distinguishes fragments containing the C terminus from those containing the N terminus by their distinctive isotopic patterns (14-16). In order to determine the amino acid sequence of the proteolytic fragments, each clearly observed peptide ion was selected in turn as a parent by the mass-selecting quadrupole of the QqTOF instrument and subjected to collisionally induced dissociation in the collision cell. For example, the resulting daughter ion spectrum from the m/z = 2297 parent is shown in Fig. 3, where the advantages of the ¹⁶O/¹⁸O addition for distinguishing the C- and N-terminal ions are clearly evident. The y ions, which contain the C terminus, all show the doublet structure superimposed on the usual isotopic pattern, whereas the b ions, containing the N terminus, have a normal pattern. A comparison between the measured m/z values and the masses calculated from the deduced sequence is given in Table I.
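The origin of the doublet can be illustrated with a small calculation of singly charged b and y fragment masses. The sketch below is not the analysis software used in this work; it uses standard monoisotopic residue masses, assumes incorporation of a single ¹⁸O atom at the peptide C terminus, and the function names and the test peptide are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
O18_SHIFT = 2.004245  # mass of 18O minus mass of 16O


def b_ions(seq):
    """Singly charged b1..b(n-1): N-terminal fragments, unaffected by the label."""
    total, out = 0.0, []
    for aa in seq[:-1]:
        total += RESIDUE[aa]
        out.append(total + PROTON)
    return out


def y_ions(seq, o18=False):
    """Singly charged y1..y(n-1): C-terminal fragments.

    With o18=True one C-terminal oxygen is 18O (assumed single incorporation).
    """
    total, out = 0.0, []
    for aa in reversed(seq[1:]):
        total += RESIDUE[aa]
        out.append(total + WATER + PROTON + (O18_SHIFT if o18 else 0.0))
    return out
```

In a 50/50 digest, every y ion therefore appears as a pair of peaks about 2.004 Da apart (for singly charged ions), while the b ions appear only once.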
Further examples are provided in the supplemental material. Fig. S1 shows the daughter ions from dissociation of the 1144-Da N-terminal peptide, indicating deletion of the N-terminal methionine and acetylation of the resulting N-terminal serine. Fig. S2 shows a comparison between HPLC-separated ions from tryptic and Lys-C digestions, respectively, showing alternate cleavages at adjacent lysines. Fig. S3 shows a spectrum of the parent ion containing the C terminus, the one C-terminal peptide that shows no doublet structure.
A comparison of experimental m/z values and masses calculated for the deduced sequences of all the peptides observed in tryptic digests is given in Table II. In both Tables I and II, most observed m/z values and the masses calculated for the deduced amino acid sequences agree within ∼10 mDa, lending credibility to the assignments; the anomalously high values observed for a few ions in Table I correspond to peaks of very low intensity.
The MS and MS/MS measurements just described were applied first to the peptides resulting from tryptic digests of the gel band, listed in Table I, and then to the products of a Lys-C digest. BLAST searching (20,21) of the total GenBank™ protein data base with these peptides was then undertaken in order to search for homology. The most definitive example was provided by the 2297-Da tryptic peptide. In that case, the highest rated results of the BLAST search are shown in Fig. 4; all are coronavirus nucleocapsid proteins, and all yield BLAST scores of 40 to 41, with E values of 0.003. Moreover, the highest rated hit in the BLAST search that is not a coronavirus protein (a bacterial protein, in this case) had a score of only 29 and a high E value of 9.4. Thus, the ∼46-kDa protein is clearly a coronavirus nucleocapsid protein; indeed, there is complete agreement between the first 10 residues and those found by BLAST in a region of the coronaviruses that is highly conserved. On the other hand, only three out of the next nine residues agree with any of the other viruses, so the SARS virus is significantly different from any of the other coronaviruses. BLAST searches with the other peptides led to similar conclusions; in particular, they strengthened the evidence for significant differences between the SARS coronavirus and any other coronavirus in the data base. By April 12, these measurements had been carried out and most of them analyzed, yielding almost complete sequence information on the individual peptides, as summarized in Table II. The task of fitting together the peptides was not yet done, however, because there were still a number of ambiguities in their order. To sort out this problem, an Asp-N digestion had also been carried out (but not yet separated on the HPLC), and Glu-C and perhaps Arg-C digestions were planned as soon as sufficient material was available.
However, these measurements turned out to be unnecessary, because at that stage a nucleotide sequence of infectious material (also prepared by NML) was obtained by a group at the Michael Smith Genome Centre in Vancouver (4) (GenBank™ accession number AY274119), soon followed by similar results from several other laboratories (see, for example, Ref. 5). It soon became clear to us that the open reading frame identified by the Vancouver group as specifying the coronavirus nucleocapsid protein did in fact predict the amino acid sequence of the ∼46-kDa protein that we were analyzing, as might be expected from the BLAST homology reported above. Consequently, we were able to remove the remaining ambiguities in ordering the proteolytic fragments listed in Table II. A comparison of our results with the predicted sequence is shown in Fig. S4A; the mass spectral data cover more than 96% of the predicted sequence and include both C and N termini. The mass spectra also indicate removal of the N-terminal methionine and oxidation of all other methionines, as well as acetylation of the resulting N-terminal serine, as shown in Fig. S1. The N-terminal deletion and acetylation presumably occur as a result of post-translational modifications (22), which of course could not be predicted by the nucleotide data. Otherwise, our results confirm the predicted sequence (GenBank™ accession number AY274119), a result consistent with the samples being derived from the same infectious source at NML.
FIG. 2. A, Single MS MALDI-QqTOF spectrum of the peptide mixture obtained from tryptic digestion of the 46-kDa protein prior to HPLC fractionation. B, Expanded view of a small section of the MALDI mass spectrum in A. C, The same section of the spectrum obtained from fraction 23, after HPLC separation of the mixture. The peak labels indicate the residue numbers corresponding to the intact protein; in one case loss of 64 Da is indicated. The intense peak corresponding to T277-293 in the mixture is absent in fraction 23 (it elutes in fraction 21), but several weaker peaks that are present, like T389-405, are significantly enhanced by the HPLC. The improvement helps to identify them and is essential for high mass accuracy and for subsequent MS/MS analysis. Measured and predicted masses for all the tryptic peptides can be found in Table II; Δm is less than 10 mDa in nearly every case.
FIG. 3. MS/MS spectrum of tryptic fragment of m/z 2297 after digestion in a 50/50 mixture of normal and ¹⁸O-labeled water. The complete spectrum is shown in A, with the amino acid sequence indicated between the y-series fragments. An example of a b-series fragment is shown in B, and of a y-series fragment in C. The signature isotopic pattern of fragments containing the C terminus is visible in C. The measured and predicted masses for all identified peaks are shown in Table I.
Mass Spectra from Proteolytic Digests of the Spike Protein-In addition to the almost completely defined ∼46-kDa protein, we have partially characterized a protein that appeared as a very weak band at an apparent mass of ∼180 kDa in the gel separation (Fig. 1B). Despite the low intensity, 39 peptides in the initial tryptic digest were found to belong to the ∼139-kDa "spike protein" predicted by the nucleotide sequence (GenBank™ accession number AY274119), and 36 of these were sufficiently intense for MS/MS measurements, which confirmed the identification (30% coverage). A summary of the data and the coverage for this protein is given in Table S1.
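Coverage figures of this kind are simply the fraction of residues in the predicted sequence that fall within at least one identified peptide. A minimal sketch of that bookkeeping is given below; it is an illustration with a hypothetical function name, not the procedure used to build Table S1, and it counts overlapping peptides only once.

```python
def sequence_coverage(peptide_ranges, protein_length):
    """Fraction of residues covered by at least one peptide.

    peptide_ranges: iterable of (first_residue, last_residue), 1-based, inclusive.
    Overlapping peptides (for example from missed cleavages) are counted once.
    """
    covered = set()
    for first, last in peptide_ranges:
        if first < 1 or last > protein_length or first > last:
            raise ValueError(f"invalid residue range: {first}-{last}")
        covered.update(range(first, last + 1))
    return len(covered) / protein_length
```

For example, peptides spanning residues 1-10, 5-20, and 30-40 of a hypothetical 100-residue protein cover 31 residues, that is, 31% of the sequence.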
This protein is homologous to spike proteins in other coronaviruses, which contain a large number of potential glycosylation sites (NXT or NXS). Thus, they are usually assumed to be extensively glycosylated and to act as attachment proteins. Indeed, the predicted sequence of the spike protein of the SARS coronavirus contains 23 of these potential N-glycosylation sites, of which 17 are identified as likely sites by the NetNGlyc 1.0 server (available at www.cbs.dtu.dk/services/NetNGlyc/). (O-glycosylation may also be possible, but has not been examined here.) To investigate glycosylation in the spike protein, a tryptic digest was treated with PNGase F to remove the glycans, as described above. This step converts the formerly glycosylated asparagine residues to aspartic acids, so that the corresponding deglycosylated peptides can be recognized by a mass increase of 0.984 Da per deglycosylated site relative to the values calculated from the predicted amino acid sequence. This procedure identified nine glycopeptides from observation of their deglycosylated products (Table III) and raised the sequence coverage to 42%. MS/MS measurements on the deglycosylated peptides confirmed the predicted single N-glycosylation sites and showed that T111-126, T316-333, and T1140-1163 had two glycosylation sites each (Table III). For example, PNGase F digestion produced two distinct deglycosylated peptides for T111-126, with molecular ions at m/z 1758 and 1759. MS/MS measurements on the m/z 1758 ion revealed that the parent was glycosylated on Asn 119 only, but similar measurements on the m/z 1759 ion showed that both Asn 118 and Asn 119 were glycosylated in this parent. Another example is presented in Fig. S5, which shows MS and MS/MS spectra of the deglycosylated peptide T222-232, although this experiment did not produce details on the exact nature and composition of the N-glycans.
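The 0.984-Da shift follows from the elemental difference between asparagine and aspartic acid residues (an NH group replaced by O), and the number of deglycosylated sites in a peptide can be estimated from the difference between observed and predicted masses. The sketch below illustrates this arithmetic for singly charged ions; the function names, the tolerance, and the numerical values used in testing are hypothetical, not values from Table III.

```python
# Monoisotopic mass difference Asp - Asn residue: O (15.994915) - NH (15.010899).
ASN_TO_ASP_SHIFT = 0.984016


def deglycosylated_mz(predicted_mz, n_sites):
    """Expected m/z (z = 1) after PNGase F converts n_sites Asn residues to Asp."""
    return predicted_mz + n_sites * ASN_TO_ASP_SHIFT


def infer_site_count(predicted_mz, observed_mz, tol=0.02):
    """Number of Asn-to-Asp conversions that explains the observed shift.

    Returns None when no whole number of conversions fits within tol (Da).
    """
    delta = observed_mz - predicted_mz
    n = round(delta / ASN_TO_ASP_SHIFT)
    if n < 0 or abs(delta - n * ASN_TO_ASP_SHIFT) > tol:
        return None
    return n
```

This is consistent with the pair of T111-126 ions described above, which differ by about one mass unit: one conversion for the singly glycosylated parent and two for the doubly glycosylated one.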
We note that in these measurements no peptides were observed containing possible sites that were not glycosylated, suggesting that some of the other sites may also be modified by glycosylation.
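Candidate sites of the NXT/NXS type can be located in a predicted sequence with a simple pattern search. The sketch below is an illustration only: it applies the common convention that X may be any residue except proline (a convention not stated explicitly in the text), it does not reproduce the scoring used by prediction servers, and the function name and test sequence are hypothetical.

```python
import re

# Asn followed by any residue except Pro, then Ser or Thr. The lookahead keeps
# the match one residue wide, so adjacent candidate sites are both reported.
SEQUON = re.compile(r"N(?=[^P][ST])")


def find_sequons(sequence):
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]
```

Because the search reports neighboring asparagines separately, it can flag pairs of adjacent candidate sites such as Asn 118 and Asn 119 in T111-126.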
The tryptic digest of the spike protein without PNGase F deglycosylation yielded spectra of four of these glycopeptides that were intense enough (barely) for detailed analysis, as summarized in Table IV. Here the relatively large Δm values for the carbohydrate residues originate from peak distortions due to the low intensity. Problems in T226-287 were especially bothersome; there several glycoforms (indicated by asterisks in Table IV) were detected but could not be measured accurately because of a combination of weak signal and overlap with the ¹⁸O peak resulting from the previous labeling of the C  (23). These compositions each encompass more than one isomeric structure; some examples are given in Fig. 5D. We note that some observed glycoforms may result from in-source fragmentation, which could be reduced by the use of electrospray ionization on the QqTOF instrument (24,25) rather than MALDI, although the results suggest that such fragmentation is not an important factor in the present case (see below). Fig. 5A shows the MALDI mass spectrum of the HPLC fraction containing glycosylated T111-126. This spectrum is interpreted in Table III as containing either one or two glycosylation sites; peaks between m/z 3000 and 3800 show one possible glycosylation site, because compositions allow only one trimannosyl core, but those higher than m/z 3800 possibly correspond to diglycosylated T111-126. Here it is likely that both Asn 119 and Asn 118 are glycosylated with complex structures, with fucosylation on only one of them, as illustrated by an example in Fig. 5D. In Fig. 5B, MS of the glycosylated T222-232 HPLC fraction clearly highlights the presence of high-mannose structures. Here the predominance of (Man)9(GlcNAc)2, the largest form of N-linked high-mannose oligosaccharide, suggests that there is little in-source fragmentation. Fig. 5C is the tandem mass spectrum of T222-232 with a (Man)9(GlcNAc)2 attachment.
This spectrum shows losses of one to five mannose residues, loss of the whole oligosaccharide moiety (m/z 1257.733), loss of the whole moiety minus one GlcNAc (m/z 1460.808), and loss of the carbohydrate residue via a cross-ring cleavage (m/z 1340.768). MS/MS analysis of m/z 3404.572, 3607.682, and 4093.818 ions confirmed the presence of complex glycan structures from observed losses of Gal-GlcNAc moieties (data not shown). MS/MS was also performed on glycoforms of  Table IV (spectra not shown). All MS/MS spectra recorded in this study showed that the preferred fragmentation mode was loss of the entire oligosaccharide rather than loss of one residue at a time, which again argues against extensive in-source fragmentation.
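The assignment of the T222-232 glycoform can be checked with simple arithmetic: the m/z of a singly charged glycopeptide is that of the bare peptide ion plus the sum of the monosaccharide residue masses. The sketch below uses standard monoisotopic residue masses for hexose (mannose, galactose), N-acetylhexosamine (GlcNAc), and deoxyhexose (fucose); the function names are hypothetical and the code is an illustration, not the procedure used to compile Table IV.

```python
# Monoisotopic masses (Da) of monosaccharide residues (sugar minus water).
SUGAR_RESIDUE = {
    "Hex": 162.052824,     # mannose, galactose
    "HexNAc": 203.079373,  # N-acetylglucosamine
    "dHex": 146.057909,    # fucose
}


def glycan_mass(composition):
    """Mass added to a peptide by a glycan, e.g. {"Hex": 9, "HexNAc": 2}."""
    return sum(SUGAR_RESIDUE[name] * count for name, count in composition.items())


def glycopeptide_mz(peptide_mz, composition):
    """m/z (z = 1) of the glycopeptide, given the m/z of the bare peptide ion."""
    return peptide_mz + glycan_mass(composition)
```

Starting from the peptide ion at m/z 1257.733, nine hexose and two HexNAc residues give m/z 3122.367, within about 6 mDa of the 3122.373 parent, and a single retained GlcNAc gives m/z 1460.812, within about 4 mDa of the observed 1460.808 fragment.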
Other candidate peptides were sought in their possible glycosylated forms but were not detected, perhaps because of the low amount of sample. Alternatively, they may not be detectable as positive ions at these low sample levels because of the presence of negatively charged sialic acids; it has been shown that sialylation has a detrimental effect on positive mode ionization, at least in the case of free N-linked oligosaccharides (26-28). Indeed, the several galactosylated complex oligosaccharide compositions found in this study suggest the undetected presence of sialic acid, because the latter compound attaches to terminal galactose in such structures.
The glycosylation study conducted here is only preliminary and will be followed by more detailed structural analyses involving glycan release, labeling, and MALDI and electrospray MS. A complementary experiment could also involve exoglycosidase digestions of HPLC fractions containing the glycopeptides. Stimson et al. (24) have already shown that detailed structural analysis of glycans may be conducted on low femtomole amounts of glycopeptides from murine prion proteins by a combination of exoglycosidases and electrospray MS.

DISCUSSION
The Nucleocapsid Protein N-Comparison of deduced amino acid sequences of different coronavirus N proteins revealed only ∼32% identity between the SARS-related coronavirus and known viruses from the three coronavirus clusters. Correspondingly, the phylogenetic tree (29) of the N protein-deduced amino acid sequences indicated that the SARS-related virus is only distantly related to any of the other clusters (Fig. S4B). The evolutionary distance between the viruses, based on this phylogenetic tree analysis, makes it difficult to speculate about the origin of the SARS virus, although recent reports in the media have implicated various wild animals that are used for food in Guangdong, particularly the civet cat, whose genome was not in the data base.
Despite the striking heterogeneity of the SARS coronavirus N protein when compared with other coronavirus nucleoproteins, certain domains seem functionally conserved (30). The SR-rich region of SARS N protein resembles that of murine and bovine coronaviruses; in a short stretch of 36 residues (amino acids 176-212) it contains 14 serines and 7 arginines. The amino acid sequence in this region is highly variable among coronaviruses except for a core motif SRXX for which double or triple repeats are a distinguishable feature among all coronavirus N proteins. This region has been mapped as the RNA binding domain of the N protein. An intriguing feature of SARS N protein is that it contains five SRXX motifs (see shaded amino acid sequence in Fig. S4A); whether that will translate into much higher RNA binding activity remains to be seen. However, this finding supports the concept of a conserved function within the SR-rich domain.
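Counts of this kind are straightforward to reproduce for any sequence. The sketch below shows one way to locate SRXX motifs and tally serine and arginine residues in a region; it is illustrative only, the function names and the test sequence are hypothetical, and X is taken to mean any residue.

```python
import re


def srxx_positions(sequence):
    """1-based start positions of S-R-X-X motifs (X = any residue)."""
    # The lookahead lets every occurrence be reported, including overlapping ones.
    return [m.start() + 1 for m in re.finditer(r"(?=SR..)", sequence.upper())]


def sr_content(sequence):
    """Return (number of serines, number of arginines) in the sequence."""
    seq = sequence.upper()
    return seq.count("S"), seq.count("R")
```

Applied to the 176-212 region of the predicted N protein, such a tally could be compared directly with the serine, arginine, and motif counts quoted above.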
The appearance of a shorter form of the N protein late in infection has been observed with transmissible gastroenteritis virus, mouse hepatitis virus, feline infectious peritonitis virus, bovine coronavirus, avian infectious bronchitis virus, and turkey coronavirus (∼2 to 5 kDa less) in cell culture. It has been demonstrated that host cell caspases, which are activated during coronavirus infection, are responsible for this cleavage (31). A common caspase cleavage motif is present in all of the mentioned coronavirus N proteins. Furthermore, the accumulation of the shortened N protein form was correlated with a reduction in virus production by a factor of ∼100. These observations suggest that cleavage of viral nucleocapsid protein by host cell caspases could be a general mechanism by which infected cells eliminate coronaviruses. Interestingly, no caspase cleavage motif is present in the SARS-related coronavirus N protein.
FIG. 5. ... Table III). B, MS spectrum of HPLC fraction 25 showing glycosylated forms of T222-232. C, MS/MS spectra of the 3122.373-Da peak from B. D, Suggested high-mannose and complex N-glycan structures, emphasizing possible diglycosylation of T111-126.
The Spike Protein S-The spike protein is a major target of the cellular immune response to coronaviruses and plays an important role in the initial stages of infection. It mediates the attachment of the virus to the cell surface receptors and induces the fusion of the viral and cellular membranes.
The importance of N-glycosylation of the attachment proteins has often been highlighted in virus-receptor interactions in several types of virus:
• In influenza C and A viruses (myxoviruses), Rosenthal et al. showed that N-glycosylation of the hemagglutinin-esterase-fusion proteins can have dramatic effects on immune escape, virulence, and interactions with cellular receptors (32). The hemagglutinin components have been shown to interact with sialic acid moieties on the receptors, and it is known that neuraminidase inhibitors inhibit the replication of influenza viruses A and B (33).
• Hepatitis viruses have been shown to bud into the endoplasmic reticulum and depend on N-glycosylation of coat proteins to form infectious virus particles (34,35). Dwek and his colleagues investigated the effect of deoxynojirimycin, an alpha-glucosidase inhibitor, and found that it blocks oligosaccharide processing after monoglucosylation of Asn sites. The glucosylated proteins were shown to misfold. Even at very low inhibitor concentration, viral titers dropped by nearly 100-fold (36). Studies in animals showed that deoxynojirimycin had a negligible effect on host glycosylation (37), and thus drugs such as this alpha-glucosidase inhibitor are seen as good candidates for treatment of hepatitis B. Hepatitis C virus may also respond to these inhibitors (38). However, similar studies using sugar inhibitors on HIV, which has many N-linked sites, showed less sensitivity to misglycosylation (39).
• Rossen and coworkers modified the N-glycosylation characteristics of coronavirus spike proteins in cultured epithelial cells and found that N-glycans had an important impact on virus formation and behavior. For example, inhibition of spike N-glycosylation by tunicamycin, which inhibits the synthesis of N-glycans, resulted in the synthesis of spikeless virions (40). The same authors also discussed the implications of N-glycosylation of hemagglutinin proteins of epithelial cell coronaviruses (40).
We note, however, that the SARS-associated coronavirus genome sequence does not contain a gene encoding hemagglutinin or large genes derived from another virus or host cell (4), although it is believed that host range, tissue tropism, and virulence of animal coronaviruses can be changed by mutating the S gene, thus modifying the S proteins (12,41).
It has been shown that sialic acid plays important roles in host-receptor interactions. We therefore plan to study the exact compositions of spike N-linked glycans after detachment by PNGase, because the sialic acid content of a glycoprotein can be determined in an isolated oligosaccharide pool. The study of N-linked glycan structures by MS is well documented (see, for example, Ref. 42), and established methods are available to conduct such analyses.
Possible Therapeutic Applications-The present studies provide the first description of the actual proteins derived from the novel coronavirus thought to be the etiologic agent of SARS. Similar to the pattern observed with animal coronaviruses (43), the 46-kDa nucleoprotein appears to be the major immunogenic antigen, as it was the only viral protein recognized by acute and early convalescent sera from several patients recovering from SARS. While the immune response to the nucleoprotein could serve as an early diagnostic marker for infection, it is unlikely that an immune response to this protein offers protection, because it is an internal protein and neutralizing antibodies are more likely to target the surface proteins (12). However, it has been shown for other coronaviruses that some antigenic peptides of the N protein can be recognized on the surface of infected cells by T cells (12).
The spike glycoprotein is certainly a surface protein, so it may offer an attractive target. Although no drugs with proven efficacy against coronaviruses are currently approved, potential targets exist for new drugs. For example, protease inhibitors could prevent processing of the RNA polymerase or cleavage of the viral S glycoprotein. Antibodies against the S glycoprotein or against the unidentified SARS coronavirus receptor are also possible routes to take, and the use of glycosylation inhibitors that have minimal effects on host cells would be an interesting approach (36). Very recently, an important contribution by Hilgenfeld et al. outlined a plan for drug design based on inhibition of the viral main proteinase, called Mpro or 3CLpro, which controls the activities of the coronavirus replication complex (44). Ideas for development of vaccines against SARS also include the use of killed or subunit vaccines containing the spike glycoprotein together with other viral proteins.
Why Analyze the Proteins?-The application of de novo sequencing by MS provides an alternative to the usual genomic approach for protein identification. It has the advantage of distinguishing the actual proteins expressed from those that are simply hypothesized or predicted by the nucleotide sequence. It may also be useful to realize that questions of homology can be investigated by examining protein proteolytic fragments even in the complete absence of genome information. Indeed, the results of the BLAST search (Fig. 4), and the conclusion that the 47-kDa protein that NML had isolated was a nucleocapsid protein belonging to an extensively modified coronavirus, were reported at a meeting of the local participants in this investigation on April 3, 2003, more than a week before the nucleotide sequence became available.
Even when the nucleotide sequence is available, analysis of the proteins (which is much easier in that case) provides significant complementary information, particularly on posttranslational modifications (22). The relatively minor modifications observed in the nucleocapsid protein are not particularly newsworthy, but we nevertheless believe that the result is useful because it probably rules out such modifications as an explanation of the unusual properties of the virus. However, glycosylation in the SARS spike protein, first investigated here, is more exciting; it is likely to play a key role in attachment of the virus to cell surface receptors, and therefore may have important therapeutic applications, as pointed out above.