A Comprehensive Proteomics and Genomics Analysis Reveals Novel Transmembrane Proteins in Human Platelets and Mouse Megakaryocytes Including G6b-B, a Novel Immunoreceptor Tyrosine-based Inhibitory Motif Protein

The platelet surface is poorly characterized due to the low abundance of many membrane proteins and the lack of specialist tools for their investigation. In this study we identified novel human platelet and mouse megakaryocyte membrane proteins using specialist proteomics and genomics approaches. Three separate methods were used to enrich platelet surface proteins prior to identification by liquid chromatography and tandem mass spectrometry: lectin affinity chromatography, biotin/NeutrAvidin affinity chromatography, and free flow electrophoresis. Many known, abundant platelet surface transmembrane proteins and several novel proteins were identified using each receptor enrichment strategy. In total, two or more unique peptides were identified for 46, 68, and 22 surface membrane, intracellular membrane, and membrane proteins of unknown subcellular localization, respectively. The majority of these were single transmembrane proteins. To complement the proteomics studies, we analyzed the transcriptome of a highly purified preparation of mature primary mouse megakaryocytes using serial analysis of gene expression in view of the increasing importance of mutant mouse models in establishing protein function in platelets. This approach identified all of the major classes of platelet transmembrane receptors, including multitransmembrane proteins. Strikingly 17 of the 25 most megakaryocyte-specific genes (relative to 30 other serial analysis of gene expression libraries) were transmembrane proteins, illustrating the unique nature of the megakaryocyte/platelet surface. The list of novel plasma membrane proteins identified using proteomics includes the immunoglobulin superfamily member G6b, which undergoes extensive alternate splicing. Specific antibodies were used to demonstrate expression of the G6b-B isoform, which contains an immunoreceptor tyrosine-based inhibition motif. 
G6b-B undergoes tyrosine phosphorylation and association with the SH2 domain-containing phosphatase, SHP-1, in stimulated platelets suggesting that it may play a novel role in limiting platelet activation.

Molecular & Cellular Proteomics 6:548-564, 2007.
Platelets are small anucleate cells that circulate in the blood in a quiescent state. Their primary physiological function is to stop bleeding from sites of vascular injury by adhering to and forming aggregates on exposed extracellular matrix proteins following blood vessel damage (1,2). The platelet aggregate or "primary hemostatic plug" is consolidated by fibrin polymers produced by thrombin generated on the platelet surface (3).
Platelets express a diverse repertoire of surface receptors that allow them to respond to different stimuli and adhere to a variety of surfaces. The expression levels of platelet surface receptors vary widely with the most abundant being the integrin αIIbβ3, which is essential for platelet aggregation. Quiescent human platelets express 40,000-80,000 copies of αIIbβ3 on their surface, which increases by 30-50% upon platelet activation (4). In contrast, the ADP receptor P2Y1 is among the least abundant with quiescent human platelets expressing ~150 copies on their surface (5).
To fully understand how platelets respond to vessel wall damage we require a comprehensive knowledge of the receptors expressed on their surface. Several novel platelet receptors have been identified in recent years, including the lectin receptor CLEC-2 (6); CD40L (7); Eph kinases and their counter-receptors, ephrins (8,9); cadherins (10); Toll receptors 2, 4, and 9 (11,12); and the single pass transmembrane natriuretic peptide receptor type C (13). These findings suggest that platelets may express additional receptors that have important roles in modulating their function.
Proteomics-based approaches have been used to explore the platelet proteome in its entirety (14-16) as well as subproteomes, including the phosphoproteome of thrombin-activated platelets (17-19) and the platelet releasate (20). One class of proteins conspicuously under-represented in the early platelet proteomics studies was transmembrane proteins. This reflects the relatively low abundance of these proteins and also technical difficulties associated with solubilizing and resolving transmembrane proteins in some of the above techniques, most notably two-dimensional gel electrophoresis. More recently, Sickmann and co-workers (21) have characterized the platelet membrane proteome using a combination of density gradient centrifugation, one-dimensional gel electrophoresis (1-DE), and 16-benzyldimethyl-n-hexadecylammonium chloride (16-BAC)/SDS-PAGE. This group reported the identification of 83 plasma membrane proteins and 48 proteins localized to other membrane compartments.
The application of molecular techniques to analyze expressed genes in platelets is fraught with difficulties because of the lack of a nucleus and the very low levels of mRNA that are carried over from the megakaryocyte. Thus contamination with mRNA from other cell types is a major issue of concern. Furthermore only 11% of platelet mRNA appears to be derived from genomic DNA; the majority is derived from mitochondrial genes as demonstrated by serial analysis of gene expression (SAGE) (22). These problems can be overcome to a large extent by use of a highly purified, mature population of the platelet precursor cell, the megakaryocyte. These cells contain very high levels of mRNA that includes transcripts for all platelet proteins as illustrated by Kim et al. (23) who used SAGE to analyze mRNA in megakaryocytes derived from human cord blood CD34+ cells.
In this study, we used several membrane protein enrichment techniques, namely lectin and biotin/NeutrAvidin (NA) affinity chromatography and free flow electrophoresis in combination with LC-MS/MS to identify novel receptors in human platelets. We also performed LongSAGE on a population of well characterized, highly purified mature murine megakaryocytes (24). The 21-base pair long LongSAGE sequence tags have the advantage over the 14-base pair tags of standard SAGE in providing more reliable detection of greater than 99% of all expressed genes (25). Moreover SAGE provides a quantitative measure of mRNA expression unlike DNA microarrays (26). We chose to use megakaryocytes rather than platelets as the source of RNA to minimize contamination from other cells and to limit the contribution of mitochondrially derived mRNA (see above). A major advantage of using mouse rather than human megakaryocytes is the widespread use of mouse models for functional studies, especially as SAGE analysis of mouse megakaryocytes has not been reported. In this study, >80% of transmembrane proteins identified in human platelets using proteomics were also present in the mouse megakaryocyte LongSAGE library, thereby validating this approach. In total, the present study reports the identification of 136 transmembrane proteins in human platelets based on the identification of two or more unique peptide hits of which just under 100 have yet to be studied in platelets using biochemical or functional means. Determination of the functional roles of these proteins will enable the further understanding of platelet regulation and may identify novel targets for development of new types of antiplatelet agents.
Preparation of Washed Platelets-Washed human platelets were prepared from blood collected from healthy drug-free volunteers as described previously (28). Briefly, 9 volumes of blood were collected into 1 volume of 4% (w/v) sodium citrate solution. One volume of ACD solution (1.5% (w/v) citric acid, 2.5% (w/v) sodium citrate, and 1% (w/v) glucose) was added to the anticoagulated blood before centrifugation at 200 × g for 20 min at room temperature. Platelet-rich plasma was collected, 2 nM prostacyclin was added, and the plasma was centrifuged at 1,000 × g for 10 min. Platelets were washed in 25 ml of modified Tyrode's-HEPES buffer, pH 7.3 (134 mM NaCl, 2.9 mM KCl, 20 mM HEPES, 12 mM NaHCO3, 1 mM MgCl2, 5 mM glucose), containing 3 ml of ACD solution and 1 nM prostacyclin. Platelets were centrifuged at 1,000 × g for 10 min and resuspended at 5 × 10^8/ml in modified Tyrode's-HEPES buffer. Platelets were counted with a Coulter Z2 Particle Count and Size Analyzer (Beckman Coulter Ltd., High Wycombe, UK).
WGA Affinity Chromatography-Washed platelets (10 ml at 5 × 10^8/ml) were lysed with an equal volume of 2× lysis buffer (2% Nonidet P-40, 300 mM NaCl, 20 mM Tris, 10 mM EDTA, pH 7.4) containing protease inhibitors (1 mM 4-(2-aminoethyl)benzenesulfonyl fluoride, 10 µg/ml leupeptin, 10 µg/ml aprotinin, and 1 µg/ml pepstatin A). The platelet lysate was precleared with 2 ml of Sepharose 4B beads for 30 min at 4°C and centrifuged at 10,000 × g for 15 min at 4°C. WGA conjugated to Sepharose 4B (2 ml) was added to the supernatant. The sample was incubated overnight at 4°C with mixing. The WGA resin was transferred to a column and washed three times with 1× lysis buffer. Bound proteins were eluted from the WGA resin with 3 ml of 0.3 M N-acetyl-D-glucosamine and concentrated to 200 µl using an Amicon Centriprep YM-10 and Ultrafree 0.5 centrifugal filter devices. A fifth of the volume of 5× SDS-PAGE sample buffer was added to samples and heated to 100°C for 5 min. Samples were prepared in this way in three separate experiments.
Biotinylation of Surface Proteins and Isolation by NeutrAvidin Affinity Chromatography-Platelet surface proteins were biotinylated according to the manufacturer's instructions with a few minor modifications. Platelets (10 ml at 5 × 10^8/ml) were washed twice with 25 ml of PBS, pH 7.4, containing 1 µM prostacyclin. Platelets were then resuspended in 10 ml of 412 µM EZ-link sulfo-NHS-SS-biotin in PBS, pH 7.4, for 30 min at room temperature. Unreacted biotinylation reagent was quenched by adding Tris, pH 8.0, to a final concentration of 50 mM; platelets were pelleted at 1,000 × g for 10 min at room temperature; washed twice in 10 ml of 0.025 M Tris, 0.15 M NaCl (TBS), pH 7.4, containing 1 µM prostacyclin; and lysed in 500 µl of lysis buffer (proprietary) by sonicating on low power at 10-min intervals for 30 min on ice. Lysates were centrifuged at 10,000 × g for 2 min at 4°C to remove cell debris. Clarified supernatants were incubated with 250 µl of NA beads for 1 h at room temperature and then centrifuged for 1 min at 1,000 × g. The gel was washed with 3 × 500 µl wash buffer (proprietary). Proteins were eluted in 2× sample buffer containing 50 mM DTT and heated to 100°C for 5 min. Samples were prepared in this way in three separate experiments.
Preparation of Platelet Plasma Membranes (PMs) and Intracellular Membranes (IMs) by Free Flow Electrophoresis-Platelet PM and IM were prepared as described in detail previously (29). Briefly, platelets were separated from freshly obtained platelet concentrates (National Blood Service, Tooting, London, UK) and treated with neuraminidase (type X, 0.05 units/ml) for 20 min at 37°C. After two washings, platelets were disrupted by sonication, and the platelet homogenate was layered on a linear (1-3.5 M) sorbitol density gradient followed by centrifugation at 42,000 × g for 90 min to obtain a mixed membrane fraction (free of granular contamination). This membrane fraction was separated into PM and IM by free flow electrophoresis using an Octopus electrophoresis apparatus (Dr. Weber GmbH) running at 750 V, 100 mA. Two discrete peaks comprising PM and IM (more electronegative) were obtained. Tops of peaks were pooled; centrifuged (100,000 × g for 60 min); resuspended in 0.4 M sorbitol, 5% glycerol, and 10 mM triethanolamine, pH 7.2; and kept at −80°C until further analysis. The purity of fractions was checked by SDS-PAGE and Western blotting for the absence of actin in IM and of SERCA2 Ca2+ ATPase in PM fractions as described previously (29). Samples were prepared in this way on two separate occasions.
Protein Preparation for MS/MS-Proteins were resolved on 4-20% Tris-glycine SDS-PAGE gels and stained with Colloidal Coomassie G-250 stain. Twelve to 32 gel slices each with a width of 1-2 mm were manually excised with a razor for subsequent in-gel trypsinization and LC-MS/MS analysis. Bands were excised from three separate WGA affinity purification experiments, three biotin/NA affinity purification experiments, and two free flow electrophoresis (FFE) experiments. Proteins were trypsinized within gel slices, and peptides were extracted using the method described by Shevchenko et al. (30).
LC-MS/MS and Data Analysis-Tryptic peptides were analyzed by LC-MS/MS using a ThermoFinnigan LCQ Deca XP Plus ion trap (Thermo Electron Corp., Hemel Hempstead, UK) coupled to a Dionex/LC Packings nanobore HPLC system (Dionex/LC Packings, Sunnyvale, CA) configured with a 300-µm-inner diameter/1-mm C18 PepMap precolumn (LC Packings, San Francisco, CA) and a 75-µm-inner diameter/15-cm C18 PepMap analytical column (LC Packings). Tryptic peptides were eluted into the ion trap mass spectrometer using a 45-min 5-95% acetonitrile gradient containing 0.1% formic acid at a flow rate of 200 nl/min. Spectra were acquired in an automatic data-dependent fashion using a full MS scan (400-2,000 m/z) to determine the five most abundant ions, which were sequentially subjected to MS/MS analysis. Each precursor ion was analyzed twice before it was placed on an exclusion list for 1 min. MS/MS spectra were converted into dta-format files by Bioworks Browser (3.1) and searched against the National Center for Biotechnology Information non-redundant (NCBInr) database (released April 2004) using the TurboSequest (3.1) search algorithm (ThermoFinnigan). Both the precursor mass tolerance and the fragment mass tolerance were set at 1.4 Da. Two missed tryptic cleavages and carbamidomethylation of cysteine residues as a fixed modification were allowed. Positive peptide hits using TurboSequest had a minimum cross-correlation factor of 2.5, a minimum delta correlation value of 0.25, and a preliminary ranking of one. The same dta-format files generated with the LC-MS/MS ion trap and Bioworks Browser setup were also searched against the NCBInr database using the Mascot 1.8 search algorithm (Matrix Science Ltd., London, UK). Mascot searches were restricted to the human taxonomy allowing carbamidomethyl cysteine as a fixed modification and oxidized methionine as a potential variable modification. Both precursor mass tolerance and MS/MS tolerance were 1.4 Da, allowing for up to two missed cleavages.
Positive identification was only accepted when the data satisfied the following criteria: (i) MS/MS data were obtained for at least 80% y-ion series of a peptide comprising at least eight amino acids and no missed tryptic cleavage sites and (ii) MS/MS data with more than 50% y-ions were obtained for two or more different peptides comprising at least eight amino acids and no more than two missed tryptic cleavage sites. Swiss-Prot/TrEMBL accession numbers were obtained for all proteins identified.
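The TurboSequest acceptance thresholds described above can be pictured as a simple filter. The following Python sketch is illustrative only: the `PeptideHit` record, its field names, and the example values are hypothetical and do not reproduce the Bioworks or TurboSequest output format. Only the three cut-offs (cross-correlation of at least 2.5, delta correlation of at least 0.25, preliminary ranking of one) and the requirement for two or more unique peptides are taken from the text; the y-ion coverage criteria are not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PeptideHit:
    """Hypothetical record for one peptide-spectrum match."""
    protein: str       # accession of the matched protein
    sequence: str      # peptide sequence
    xcorr: float       # Sequest cross-correlation factor
    delta_cn: float    # Sequest delta correlation value
    prelim_rank: int   # Sequest preliminary ranking


# Cut-offs quoted in the text for a positive TurboSequest peptide hit.
MIN_XCORR = 2.5
MIN_DELTA_CN = 0.25
REQUIRED_PRELIM_RANK = 1


def passes_sequest_cutoffs(hit: PeptideHit) -> bool:
    """Return True when a hit meets all three score thresholds."""
    return (
        hit.xcorr >= MIN_XCORR
        and hit.delta_cn >= MIN_DELTA_CN
        and hit.prelim_rank == REQUIRED_PRELIM_RANK
    )


def proteins_with_unique_peptides(hits, min_peptides=2):
    """Group accepted hits by protein; keep proteins with enough distinct peptides."""
    peptides_by_protein = defaultdict(set)
    for hit in hits:
        if passes_sequest_cutoffs(hit):
            peptides_by_protein[hit.protein].add(hit.sequence)
    return {
        protein: peptides
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    }


# Invented example values, for illustration only.
example_hits = [
    PeptideHit("protein_A", "AAAAAAAAK", 3.1, 0.40, 1),
    PeptideHit("protein_A", "CCCCCCCCR", 2.7, 0.30, 1),
    PeptideHit("protein_B", "DDDDDDDDK", 2.9, 0.35, 1),  # only one peptide
    PeptideHit("protein_C", "EEEEEEEEK", 2.1, 0.50, 1),  # XCorr below cut-off
    PeptideHit("protein_C", "FFFFFFFFR", 3.0, 0.10, 1),  # delta Cn below cut-off
]
accepted = proteins_with_unique_peptides(example_hits)
```

In this invented example only `protein_A` is retained, because it is the only protein with two distinct peptides that each pass all three thresholds.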
MS/MS analysis of tryptic fragments was also carried out with a Q-TOF 1 mass spectrometer (Micromass, Manchester, UK) as a means of verifying proteins identified with the ion trap mass spectrometer and of improving both protein and proteome coverage by using complementary instruments for the MS/MS analysis (31). The Q-TOF 1 mass spectrometer was coupled to a CapLC HPLC system (Waters, Milford, MA) configured with a 300-µm-inner diameter/5-mm C18 precolumn (LC Packings) and a 75-µm-inner diameter/25-cm C18 PepMap analytical column (LC Packings). Tryptic peptides were eluted to the mass spectrometer using a 45-min 5-95% acetonitrile gradient containing 0.1% formic acid at a flow rate of 200 nl/min. Spectra were acquired in an automatic data-dependent fashion with a 1-s survey scan followed by three 1-s MS/MS scans of the most intense ions. The selected precursor ions were excluded from further analysis for 2 min. MS/MS spectra were converted into pkl-format files using MassLynx 3.4 and searched against the NCBInr database with the Mascot search algorithm as described above. All proteins identified by both Sequest and Mascot were checked for predicted transmembrane domains (TMDs) with TMHMM version 2.0 (47).

Construction of Decoy Database and Estimation of the False Positive Rate of Protein Identification by LC-MS/MS-A randomized version of the NCBInr database used in this study was generated by a Perl program downloaded from Matrix Science Ltd., decoy.pl. This program was run using the random and append command line switches that appended a random set of sequences, with the same average amino acid composition as those in the original dataset, onto the database. The decoy.pl program was modified to work correctly with the long header format of the NCBInr database. All of the dta-format files generated by LC-MS/MS ion trap and Sequest were searched against the decoy database using the same search parameters described above for the original searches. The percent false positive rate of protein identification was calculated by dividing the number of "random" proteins identified by the sum of random and "real" proteins identified and multiplying by 100. The false positive rate was calculated for random proteins identified by two or more peptide hits and for those identified by one peptide hit.
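The false positive rate calculation stated above, and the idea of appending composition-matched random sequences, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reimplementation of decoy.pl: the function names, the seed, and the toy sequences are hypothetical, and the sampling scheme (drawing residues independently, weighted by their overall frequency) is one simple way to match average amino acid composition.

```python
import random
from collections import Counter


def false_positive_rate(n_random: int, n_real: int) -> float:
    """Percent false positives: random / (random + real) * 100, as defined in the text."""
    total = n_random + n_real
    if total == 0:
        raise ValueError("no proteins were identified")
    return 100.0 * n_random / total


def composition_matched_decoys(sequences, seed=0):
    """Return one random sequence per input sequence, with the same length,
    drawing residues according to the overall composition of the inputs.
    (Assumed scheme for illustration; decoy.pl may differ in detail.)"""
    rng = random.Random(seed)
    counts = Counter("".join(sequences))
    residues = sorted(counts)
    weights = [counts[r] for r in residues]
    return [
        "".join(rng.choices(residues, weights=weights, k=len(seq)))
        for seq in sequences
    ]


# Toy input, invented for illustration only.
real_db = ["MKTLLVAGG", "GSSPEEK", "LLKKAVR"]
decoy_db = composition_matched_decoys(real_db)
search_db = real_db + decoy_db  # original database with random entries appended

# Example: 1 random protein among 100 identifications gives a 1% rate.
example_rate = false_positive_rate(n_random=1, n_real=99)
```

Appending the random entries to the original database, rather than searching them separately, lets real and random sequences compete for the same spectra in a single search.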
Comparison of Proteomics Datasets-To compare which proteins were common to both our proteomics dataset reported in this study and that of Moebius et al. (21), a non-redundant set of peptide sequences was collected from each study. A total of 295 were obtained from the Moebius et al. (21) study, and 136 were obtained from the present study. All sequences were subsequently BLAST (Basic Local Alignment Search Tool) searched against the Reference Sequence Project peptides. Sixty-two proteins were found to be common to both datasets.
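Once both datasets have been mapped onto a common reference, the comparison reduces to a set intersection. The Python sketch below assumes that the BLAST step has already produced a best reference hit for each study accession; the mapping dictionaries and all identifiers are hypothetical placeholders, not entries from either study.

```python
def map_to_reference(accessions, best_hit):
    """Map study accessions to reference identifiers, dropping any without a hit."""
    return {best_hit[acc] for acc in accessions if acc in best_hit}


def shared_proteins(reference_ids_a, reference_ids_b):
    """Proteins common to both datasets after mapping to the same reference."""
    return set(reference_ids_a) & set(reference_ids_b)


# Hypothetical accessions and best reference hits, for illustration only.
study_a = ["a1", "a2", "a3"]
study_b = ["b1", "b2"]
best_hit_a = {"a1": "REF_1", "a2": "REF_2", "a3": "REF_3"}
best_hit_b = {"b1": "REF_2", "b2": "REF_9"}

common = shared_proteins(
    map_to_reference(study_a, best_hit_a),
    map_to_reference(study_b, best_hit_b),
)
```

Mapping to a shared reference first avoids counting the same protein twice when the two studies report it under different database accessions.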
Megakaryocyte Culture and Purification-Bone marrow cells were flushed from femurs and tibias of 3-4-month-old C57Bl6 mice as described previously (24). Mature erythrocytes were lysed with ammonium chloride potassium buffer (0.15 M NH4Cl, 1 mM KHCO3, 0.1 mM Na2EDTA, pH 7.3). CD16/CD32+ Gr1+ B220+ CD11b+ cells were depleted using immunomagnetic sheep anti-rat IgG beads and rat anti-mouse antibodies according to the manufacturer's instructions. The cell-depleted population was then cultured in serum-free medium supplemented with 2 mM L-glutamine, 50 units/ml penicillin, 50 µg/ml streptomycin, and 20 ng/ml murine stem cell factor at 37°C and 5% CO2 for 2 days and then for 5 more days under the same conditions with the addition of 200 ng/ml recombinant human thrombopoietin. High density mature megakaryocytes were then isolated in a 0-3% BSA gradient (4 ml of 3% BSA, PBS in a 15-ml Falcon tube overlaid with 4 ml of 1.5% BSA, PBS and 4 ml of suspension cells in PBS) (32). After standing for 40 min at room temperature, the cells remaining in the lower 2 ml were collected, washed in PBS, and subjected to another 0-3% BSA gradient to obtain a pure population. DNA content of cells was determined by staining with 50 µg/ml propidium iodide and analyzing cells with a FACScan analyzer and CellQuest software (BD Biosciences) as described previously (24).
Serial Analysis of Gene Expression-Primary mouse megakaryocyte RNA was made using the RNeasy Miniprep kit. The LongSAGE library was generated from 20 µg of RNA using the I-SAGE Long kit and sequenced by Agencourt Bioscience Corp. (Beverly, MA). LongSAGE sequence tags were identified using SAGE2000 4.5 Analysis Software with reference to the SAGEmap_tag_ug-rel database (www.ncbi.nlm.nih.gov/SAGE/). To identify megakaryocyte-specific genes, the resulting SAGE library of 53,046 sequence tags was compared with 30 other mouse SAGE libraries from T lymphocyte (14 SAGE libraries), dendritic cells (six SAGE libraries), intraepithelial lymphocytes (two SAGE libraries), embryonic stem cells (two SAGE libraries), brain (two SAGE libraries), B lymphocyte (one SAGE library), heart (one SAGE library), 3T3 fibroblast cell line (one SAGE library), and P19 embryonic carcinoma cell line (one SAGE library) with a combined total of 1,031,389 tags. The data analysis was performed using custom written software (!SAGEClus) as described in Cobbold et al. (33). Genes with predicted TMDs were identified using TMHMM version 2.0 (47).
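Because the megakaryocyte library (53,046 tags) is much smaller than the pooled comparison libraries (1,031,389 tags), raw tag counts must be normalized to library size before they can be compared. The Python sketch below shows one simple way to do this, using tags per million and a fold-enrichment ratio with a pseudocount. It is an assumed illustration only and is not the !SAGEClus method; the function names and the pseudocount value are hypothetical.

```python
MEGAKARYOCYTE_TAGS = 53_046    # library size reported in the text
COMPARISON_TAGS = 1_031_389    # combined size of the 30 other libraries


def tags_per_million(count: int, library_size: int) -> float:
    """Normalize a raw tag count to tags per million sequenced tags."""
    return 1_000_000 * count / library_size


def fold_enrichment(mk_count: int, other_count: int, pseudocount: float = 0.5) -> float:
    """Ratio of normalized tag frequency in megakaryocytes versus the pooled
    comparison libraries. The pseudocount (an assumption of this sketch)
    avoids division by zero for tags absent from the comparison libraries."""
    mk_freq = (mk_count + pseudocount) / MEGAKARYOCYTE_TAGS
    other_freq = (other_count + pseudocount) / COMPARISON_TAGS
    return mk_freq / other_freq


# A tag seen equally often in both sets is still enriched in the smaller library.
equal_counts_ratio = fold_enrichment(mk_count=10, other_count=10)
```

Ranking tags by a ratio of this kind is one way to surface candidates for megakaryocyte-specific genes; a statistical test on the counts would be needed before treating any single tag as significant.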
Platelet Activation, Immunoprecipitations, and Western Blotting-Washed platelets (8 × 10^8/ml) were stimulated with 10 µg/ml CRP or 5 units/ml thrombin for 90 s with constant mixing at 1,200 rpm and 37°C as described previously (28). Platelets were lysed in 2× lysis buffer containing 5 mM sodium vanadate in addition to the protease inhibitors described above. Proteins were immunoprecipitated from platelet lysates with 2 µg of rabbit anti-SHP-1 antibody and 10 µl of rabbit anti-G6b-B serum. Ten microliters of rabbit preimmune serum were used as a negative control for immunoprecipitations. Membranes were immunoblotted with 1 µg/ml anti-phosphotyrosine antibody, 0.2 µg/ml anti-SHP-1 antibody, and 1:1,000 rabbit anti-G6b-B antibody as described previously (28,34).
Transient Transfections-Human embryonic kidney (HEK) 293T cells were transfected with 5 µg of either pCDNA3.1 plasmid or pCDNA3-G6bB plasmid by the calcium phosphate technique. Cells were lysed in 2× lysis buffer containing protease and phosphatase inhibitors, and proteins were resolved on 4-20% SDS-PAGE gels and Western blotted with either 1:1,000 rabbit anti-G6b-B serum or 1:1,000 preimmune serum from the same rabbit in which the anti-G6b-B antibody was raised.

Enrichment of Platelet PM Proteins by Affinity Chromatography and Free Flow Electrophoresis-Three different techniques were used to enrich platelet transmembrane proteins, namely WGA affinity chromatography, biotin/NA affinity chromatography, and FFE. Proteins were subsequently resolved by 1-DE and stained with Colloidal Coomassie Blue, and bands were manually excised and identified by LC-MS/MS. Fragmentation spectra generated by the ion trap and Q-TOF mass spectrometers were searched against the NCBInr database using the Sequest search algorithm and against the NCBInr and Swiss-Prot/TrEMBL databases using the Mascot search algorithm. The use of two different search algorithms and databases increased the number of identified proteins and also helped to safeguard against erroneous identifications (31). All proteins that met the search criteria outlined under "Experimental Procedures," including identification of two or more unique peptides, were investigated for transmembrane domains using TMHMM version 2.0 (47).
Transmembrane proteins localized to the plasma membrane identified by tandem mass spectrometry in human platelets and SAGE analysis in mouse megakaryocytes
Proteins are arranged according to families. Information is given on general function or specific function in platelets where known. Several proteins are predominantly expressed on intracellular membranes and platelet α-granules and are translocated to the plasma membrane on activation. General information was obtained from NCBI, Swiss-Prot/TrEMBL, and PubMed databases. The number of transmembrane domains (No. of predicted TMDs) in each protein was predicted with TMHMM version 2.0 (47). The highest number of unique peptides (No. of unique peptides) identified in a single mass spectrometry experiment is shown. The search algorithm (Mascot and/or Sequest) used to identify each protein is indicated as is the method used to enrich transmembrane proteins (biotin/NA, biotinylation and NeutrAvidin affinity chromatography; FFE-IM, free flow electrophoresis, intracellular membrane fraction; FFE-PM, free flow electrophoresis, plasma membrane fraction; WGA, wheat germ agglutinin affinity chromatography). All proteins were identified by two or more peptide hits with at least one of the search algorithms.

The proteins that were identified in this study are divided into PM proteins, IM proteins, and proteins of unknown subcellular distribution in accordance with data from NCBI, Swiss-Prot/TrEMBL, and PubMed (Table I and Supplemental Tables 1 and 2). The techniques and search algorithms that were used in their identification are also shown in Table I.

Because a large proportion of platelet surface proteins are glycosylated, we initially used the lectin WGA to purify platelet glycoproteins followed by elution with N-acetylglucosamine (Fig. 1A) as illustrated for the platelet glycoproteins GPIbα and PECAM-1 (Fig. 1B). The distinct staining pattern of the WGA-purified sample relative to that of the whole cell lysate confirms that a substantial level of protein purification was achieved, a result that is further supported by comparing the αIIbβ3:actin ratio before and after enrichment (Fig. 1A, WCL versus WGA lanes).
In total, 21 PM proteins and two IM proteins were identified by two or more peptide hits using this approach (Table I and Supplemental Table 1). This approach also identified a similar number of cytosolic and granule proteins possibly because of association with the cytoplasmic regions of transmembrane proteins or because of their glycosylation (data not shown).
As an alternative approach, exposed lysine residues of platelet surface proteins were labeled with biotin prior to affinity purification with NA beads. The membrane-insoluble biotinylating reagent sulfo-NHS-SS-biotin was used to biotinylate surface proteins and thereby limit labeling of intracellular proteins (35). NA beads were used rather than avidin or streptavidin beads to facilitate removal of bound proteins through the reducing agent DTT. An estimate of the amount of enrichment of transmembrane proteins can be obtained by comparing the αIIbβ3:actin and GPIbβ:actin ratios before and after enrichment (Fig. 1A, WCL versus biotin/NA lanes). This approach detected a greater number of proteins than that using WGA chromatography as shown by the increased number of bands in Fig. 1A. This is most likely due to the higher proportion of transmembrane proteins with free lysine residues compared with those that are precipitated by the lectin. Furthermore the high affinity of NA for biotin enables the use of more stringent wash conditions, thereby removing a greater proportion of cytosolic proteins that would interfere with detection of membrane proteins. Thirty-five PM, 14 IM, and five transmembrane proteins of unknown localization were identified by two or more peptide hits using biotin/NA (Table I and Supplemental Tables 1 and 2).

FFE was used to separate PM and IM proteins on the basis of a charge difference generated by treatment of platelets with neuraminidase, which selectively removes sugar residues from the outer plasma membrane (29). The purity of the two FFE fractions was estimated by Western blotting for the absence of actin in IM fractions and of SERCA2 Ca2+ ATPase in PM fractions. The presence of actin in the PM fraction is a consequence of its association with surface glycoproteins, including the GPIb-IX-V complex.
The results demonstrate a level of contamination of less than 5% of PM in the IM fraction, which is consistent with our experience of this technique (29). The purity of the two membrane fractions was further supported by the distinct banding pattern of the PM and IM samples; the banding pattern of the PM samples was similar to that obtained using biotin labeling but with a greater number of bands (Fig. 1A). A total of 35 PM, 30 IM, and 10 transmembrane proteins of unknown location were found in the FFE-generated PM sample by a minimum of two peptide hits (Table I and Supplemental Tables 1 and 2). Significantly, only two of the 44 proteins identified only in the FFE-IM fraction were known PM proteins, further illustrating the successful separation of plasma and intracellular membranes (Tables II and III). The presence of IM proteins in the PM fraction, and vice versa, is therefore most likely due to the presence of proteins in both membrane regions as well as a degree of cross-contamination. The majority of the IM proteins are expressed in the endoplasmic reticulum (Supplemental Table 1).
In total, these three approaches identified 46 PM, 68 IM, and 22 transmembrane proteins of unknown compartmentalization on the basis of identification of two or more unique peptides by MS/MS. A summary of the number of transmembrane proteins identified by each enrichment method and the overlap between the different enrichment methods is provided in Tables II and III. Eighty-three percent of the proteins were identified by both Mascot and Sequest search algorithms, and 60% were identified by more than one enrichment method. Strikingly, the 17 proteins identified by all of the enrichment techniques are well known platelet surface transmembrane proteins that are present at high levels (see Table I). Interestingly, only a small number (17%) of the identified PM proteins had more than one predicted transmembrane domain, including the three tetraspanin proteins CD9, Tspan-9, and Tspan-33. On the other hand, there are no seven-transmembrane G protein-coupled receptors in this list, a result that was also found by Moebius et al. (21) who used a combination of density gradient centrifugation, 1-DE, and 16-BAC/SDS-PAGE to purify platelet membranes. Significantly, a greater proportion of IM proteins (58%) and proteins of an undefined membrane distribution (59%) are predicted to contain more than one transmembrane domain, suggesting that the lack of identification of multispanning proteins in the PM fraction may be due, in part, to their low abundance. We estimate that just under 100 of the identified proteins have not been described previously in platelets on the basis of biochemical and functional data. Of this list, 10 are hypothetical proteins in that they have not been identified in any cell type. Together these results illustrate the power of using all three approaches to identify platelet membrane proteins.
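The counts quoted above can be checked with a short tally. The Python sketch below sums the three localization classes and back-calculates the approximate number of multi-pass proteins implied by the rounded percentages. These approximate counts are derived here for illustration and are not stated in the text; the dictionary keys are labels chosen for this sketch.

```python
# Proteins identified by two or more unique peptides, by localization (from the text).
identified = {
    "plasma membrane": 46,
    "intracellular membrane": 68,
    "unknown localization": 22,
}

# Percentage predicted to have more than one transmembrane domain (from the text).
multi_tmd_percent = {
    "plasma membrane": 17,
    "intracellular membrane": 58,
    "unknown localization": 59,
}

# Total number of transmembrane proteins across the three classes.
total_identified = sum(identified.values())

# Approximate multi-pass counts implied by the rounded percentages
# (a back-calculation for illustration, not reported values).
approx_multi_tmd = {
    name: round(identified[name] * multi_tmd_percent[name] / 100)
    for name in identified
}
```

The total agrees with the 136 transmembrane proteins reported for the study as a whole.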
The false positive rate of protein identification was determined by reanalyzing all of the Sequest dta-format files against a decoy database consisting of the original NCBInr database with a randomized version of the same database appended to the end of it. Scrambled peptides were marked random so that they could be easily distinguished from real proteins. The estimated false positive identification rate was 0.025% for proteins identified by two or more peptide hits, reflecting the stringent settings used in the study and thereby giving increased confidence to the data.
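The arithmetic behind a decoy-based estimate of this kind can be sketched as follows. The text does not state which estimator was applied, so this sketch assumes the commonly used concatenated target-decoy form, in which a random match is equally likely to fall in the real or the randomized half of the database and the number of false positives is therefore taken as twice the number of decoy matches. The function name and the counts are hypothetical and serve only to illustrate the calculation.

```python
def estimate_false_positive_rate(n_decoy_hits: int, n_total_hits: int) -> float:
    """Estimate the false positive rate from a concatenated target+decoy search.

    Assumption (not stated in the text): random matches are split evenly
    between the target and decoy halves, so false positives ~ 2 * decoy hits.
    """
    if n_total_hits <= 0:
        raise ValueError("n_total_hits must be positive")
    if not 0 <= n_decoy_hits <= n_total_hits:
        raise ValueError("n_decoy_hits must lie between 0 and n_total_hits")
    return 2.0 * n_decoy_hits / n_total_hits


# Hypothetical counts, chosen only to illustrate the arithmetic:
# 1 decoy ("random") identification among 8,000 accepted identifications.
rate = estimate_false_positive_rate(n_decoy_hits=1, n_total_hits=8000)
print(f"Estimated false positive rate: {rate:.3%}")
```

Under this assumption the estimate scales linearly with the number of decoy matches that pass the acceptance criteria, which is why stricter criteria such as requiring two or more peptides per protein lower it sharply.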
As part of this study, we also identified 45 proteins on the basis of a single unique peptide using the above techniques. These proteins are listed in Supplemental Table 5. The estimated false positive rate for this group of proteins was 5%, demonstrating the need for supporting biochemical or functional data to confirm their expression in platelets. Nevertheless, several of these proteins are already known to be expressed in platelets, including the α5 integrin subunit and the C-type lectin-like receptor CLEC-2.
Identification of G6b-B in Human Platelets: a Novel Tyrosine Phosphorylated ITIM-bearing Protein
One of the novel platelet PM proteins is the immunoglobulin superfamily member G6b, which is reported to have seven splice variants, G6b-A to G6b-G (36). Two of these splice variants, G6b-A and G6b-B, have transmembrane domains and have been shown to be expressed on the surface of transiently transfected cells (36). The main difference between these two splice variants is in their cytoplasmic tails. The G6b-A isoform lacks any tyrosine residues in this region, whereas the G6b-B isoform contains an ITIM and therefore has the potential to selectively inhibit signaling by the platelet immunoreceptor tyrosine-based activation motif (ITAM) receptors GPVI and FcγRIIA. Three unique peptides were identified for different isoforms of G6b by MS/MS. MS/MS spectra for all three peptides are shown in Fig. 2. One of the peptides (TVLHVLGDR) could have come from any of the seven splice variants. A second peptide (LPPQPIRPLPR) could only have come from G6b-A, whereas the third peptide (IPGDLDQEPSLLYADLDHLALSR) could have come from G6b-B, -C, or -E. However, neither G6b-C nor G6b-E is predicted to contain a transmembrane domain. To clarify the ambiguity of the MS/MS result and determine whether G6b-B is expressed in human platelets, we raised a rabbit polyclonal antibody to peptides found in a portion of the cytosolic tail of G6b-B that is absent from G6b-A and used the antibody to confirm expression of the ITIM-bearing isoform of G6b in platelets by Western blotting (Fig. 3A). Whole cell lysate prepared from HEK 293T cells transiently transfected with G6b-B was used as a positive control (Fig. 3A). The specific antibody identified two bands at 32 and 38 kDa on a 4-20% SDS-PAGE gel in platelets that are most likely to represent differentially glycosylated isoforms of G6b-B because similar bands were also seen in G6b-B-transfected but not mock-transfected HEK 293T cells (Fig. 3A). Multiple forms of G6b-B that can be separated by SDS-PAGE have been described in transfection studies in other cell types (36).
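The peptide-to-isoform reasoning above can be written as a few set operations. This is only an illustrative sketch: the peptide assignments and the two transmembrane isoforms are taken from the text, whereas the variable and function names are hypothetical. Restricting to transmembrane isoforms is an inference rather than proof, because soluble isoforms could in principle also be present, which is why antibody confirmation was needed.

```python
# Isoforms G6b-A to G6b-G, as reported in the text.
ALL_ISOFORMS = {f"G6b-{letter}" for letter in "ABCDEFG"}

# Isoforms stated in the text to contain a transmembrane domain.
TRANSMEMBRANE_ISOFORMS = {"G6b-A", "G6b-B"}

# Which isoforms each identified peptide could have come from (from the text).
PEPTIDE_TO_ISOFORMS = {
    "TVLHVLGDR": set(ALL_ISOFORMS),
    "LPPQPIRPLPR": {"G6b-A"},
    "IPGDLDQEPSLLYADLDHLALSR": {"G6b-B", "G6b-C", "G6b-E"},
}


def membrane_candidates(peptide_map, tm_isoforms):
    """For each peptide, keep only the compatible isoforms that span the membrane."""
    return {pep: isoforms & tm_isoforms for pep, isoforms in peptide_map.items()}


candidates = membrane_candidates(PEPTIDE_TO_ISOFORMS, TRANSMEMBRANE_ISOFORMS)
for peptide, isoforms in candidates.items():
    print(peptide, sorted(isoforms))
```

In a membrane-enriched sample the third peptide is therefore most simply explained by G6b-B, but the MS/MS data alone cannot exclude G6b-C or G6b-E.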
To investigate a possible functional role for G6b-B in platelets, the protein was immunoprecipitated from resting and stimulated platelets and analyzed for tyrosine phosphorylation. Platelets were stimulated with the GPVI-specific peptide CRP and the G protein-coupled receptor agonist thrombin. G6b-B was constitutively phosphorylated on tyrosine residues under resting conditions and underwent a small increase in tyrosine phosphorylation upon stimulation by both agonists (Fig. 3B). The tyrosine phosphatase SHP-1, which is regulated by ITIM receptors, was weakly precipitated with G6b-B under basal conditions and more strongly precipitated following stimulation by the two agonists. Importantly G6b-B was also precipitated by an antibody to SHP-1 with the level of G6b-B in the immunoprecipitate increasing upon stimulation with CRP and thrombin (Fig. 3C). Taken together, these results demonstrate that G6b-B associates with SHP-1 in resting and stimulated platelets, consistent with the idea that the immunoglobulin superfamily protein may function as a novel ITIM receptor in platelets.
FIG. 3. Expression of G6b in human platelets. A, i, whole cell lysates prepared from human platelets and HEK 293T cells transiently transfected with either plasmid alone (mock) or a G6b-B expression plasmid (G6b-B) were Western blotted for G6b-B using a rabbit anti-G6b-B polyclonal antibody raised against two peptides from the cytoplasmic tail of the protein. ii, as a control, the same samples Western blotted in i were blotted with preimmune serum from the same rabbit in which the G6b-B antibody was raised. B, G6b-B undergoes an increase in tyrosine phosphorylation in response to CRP and thrombin stimulation and interacts with SHP-1 in human platelets. G6b-B was immunoprecipitated (IP) from whole cell lysates prepared from resting platelets and platelets stimulated with either 10 μg/ml CRP or 5 units/ml thrombin. Samples were Western blotted for tyrosine phosphorylated proteins, then stripped, and blotted for G6b-B followed by SHP-1. C, G6b-B is tyrosine phosphorylated in resting and CRP- and thrombin-activated platelets and interacts with SHP-1. SHP-1 was immunoprecipitated from whole cell lysates prepared from resting platelets and platelets stimulated with either 10 μg/ml CRP or 5 units/ml thrombin. Samples were Western blotted for tyrosine phosphorylated proteins, then stripped, and blotted for G6b followed by SHP-1. Results are representative of three experiments. pTyr, phosphotyrosine.
Identification of Transmembrane Proteins in Mouse Megakaryocytes by SAGE
To complement the proteomics studies, LongSAGE was performed on a highly enriched population of primary mouse bone marrow-derived megakaryocytes that had been allowed to fully differentiate as indicated by the fact that over 95% of cells had ploidy values of 64N or 128N (Fig. 4). The characteristics of this highly purified preparation have been described previously (24). Sequencing of 53,046 SAGE tags identified 8,316 expressed genes of which ~1,200 contain transmembrane domains as predicted by TM-HMM version 2.0 (47). Strikingly the total number of transmembrane proteins identified by SAGE was greater than 8 times that identified by proteomics on the basis of two or more unique peptides. Importantly, however, 81% of the proteins identified in the proteomics studies in human platelets were also identified in mouse megakaryocytes by SAGE (Table I and Supplemental Tables 1 and 2), suggesting a high degree of similarity in the membrane proteomes of human platelets and mouse megakaryocytes. Furthermore the high purity of the SAGE library was verified by the absence of tags for many well known markers of other hematopoietic lineages, including CD3δ, CD3ε, CD3γ, CD4, and CD8α (T cells); CD19, Igα, and Igβ (B cells); F4/80 (macrophages); and CD16 (macrophages, natural killer cells, neutrophils, and myeloid precursors).
The list of membrane proteins that were identified by SAGE includes nearly all of the known platelet surface proteins, and moreover, for the majority of these, there was a good agreement between the number of SAGE tags and their reported levels of expression (Table I, Supplemental Tables 1 and 2, and data not shown). For example, the major platelet PM protein, integrin αIIb (80,000 copies per platelet), was the most abundant PM protein identified by SAGE (136 SAGE tags). The tetraspanin CD9 (45,000 copies; 34 tags) and the GPIb-IX-V complex (25,000 copies; 21, 31, 11, and nine tags for GPIbα, GPIbβ, GPIX, and GPV, respectively) were intermediate, whereas GPVI (4,000 copies; six tags) and P2Y1 (150 copies; two tags) had relatively few tags. The near comprehensive coverage of the SAGE library is illustrated by the identification of 20 class I G protein-coupled receptors of which 18 have been reported previously in platelets (Supplemental Table 6) and the presence of 15 tetraspanins, each of which was verified in mouse megakaryocytes by RT-PCR.² Moreover the two novel class I G protein-coupled receptors are orphans and so have evaded discovery through functional means. Significantly, however, a small number of platelet proteins were not detected by SAGE, including the α2 and α5 integrin subunits and the P2Y12 G protein-coupled ADP receptor, suggesting that the mRNA levels for these genes are relatively low in megakaryocytes. A list of the top 50 transmembrane proteins with the greatest number of SAGE tags is shown in Table IV.
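The agreement between copy number and tag count described above can be illustrated with a rank correlation over the four single-protein examples quoted in the text. This is a minimal sketch and not the analysis performed in the study: the choice of Spearman correlation and the function names are assumptions, and the GPIb-IX-V complex is omitted because one copy number is reported for four separate tag counts.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Values quoted in the text: (copies per platelet, SAGE tags).
examples = {
    "integrin alphaIIb": (80_000, 136),
    "CD9": (45_000, 34),
    "GPVI": (4_000, 6),
    "P2Y1": (150, 2),
}
copies = [c for c, _ in examples.values()]
tags = [t for _, t in examples.values()]
rho = spearman(copies, tags)
print(f"Spearman rho over {len(examples)} proteins: {rho:.2f}")
```

With only four points the coefficient shows that the two rankings agree and carries no statistical weight; a real assessment would use the full set of proteins with known copy numbers.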
The megakaryocyte SAGE library was compared with 30 other mouse SAGE libraries to identify megakaryocyte-specific expressed genes (Table V). As anticipated, this identified the integrin αIIb subunit as the major megakaryocyte-specific gene. Strikingly, however, 17 of the 25 most megakaryocyte-specific expressed genes encoded transmembrane proteins, emphasizing the unique nature of the megakaryocyte surface. This includes all of the proteins that make up the GPIb-IX-V complex as well as the recently identified type II C-type lectin-like receptor CLEC-2 and the ITIM-containing protein triggering receptor expressed on myeloid cells-like transcript 1 (TLT-1) (6,37,38).
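Because SAGE libraries differ in size, tag counts have to be normalized before libraries are compared. The sketch below uses tags per million for this purpose, with figures quoted in this paper: 136 integrin αIIb tags among 53,046 megakaryocyte tags, and at most five tags per 1,031,389 total tags for the nonmegakaryocyte comparison given in the Table V legend. The specificity measure actually used for Table V is not described in this section, so the fold enrichment computed here is an assumption for illustration, and the function and constant names are hypothetical.

```python
def tags_per_million(tag_count: int, library_size: int) -> float:
    """Normalize a raw SAGE tag count to tags per million sequenced tags."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return tag_count / library_size * 1_000_000


MK_LIBRARY_TAGS = 53_046          # megakaryocyte library size (from the text)
COMPARISON_TAGS = 1_031_389       # nonmegakaryocyte total (Table V legend)
ALPHA_IIB_MK_TAGS = 136           # integrin alphaIIb tags in megakaryocytes
MAX_NON_MK_TAGS = 5               # upper bound per gene (Table V legend)

mk_tpm = tags_per_million(ALPHA_IIB_MK_TAGS, MK_LIBRARY_TAGS)
non_mk_tpm_upper = tags_per_million(MAX_NON_MK_TAGS, COMPARISON_TAGS)

# Using the upper bound for the comparison gives a lower bound on enrichment.
min_fold_enrichment = mk_tpm / non_mk_tpm_upper
print(f"alphaIIb in megakaryocytes: {mk_tpm:.0f} tags per million")
print(f"alphaIIb elsewhere (upper bound): {non_mk_tpm_upper:.1f} tags per million")
print(f"minimum fold enrichment: {min_fold_enrichment:.0f}")
```

A gene with zero tags in the comparison libraries would give an undefined ratio, so any practical ranking needs a pseudocount or a statistical test rather than a plain quotient.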
These findings demonstrate that the mouse megakaryocyte SAGE library represents a powerful bioinformatics source for analysis of expression of transmembrane proteins in mature murine megakaryocytes with clear implications for their expression in platelets. The SAGE data have been deposited in the NCBI SAGEmap database (www.ncbi.nlm.nih.gov/SAGE/).

DISCUSSION
The main objective of this study was to identify novel receptors expressed on the surface of human platelets using proteomics and to determine which of these proteins are likely to be expressed on mouse platelets using a megakaryocyte SAGE library. The latter information is important because the mouse is the model system of choice for functional studies of novel platelet proteins. Megakaryocytes rather than platelets were chosen because they contain a considerably greater level of mRNA, and the application of SAGE to these cells is not hampered by the presence of mitochondrial DNA (22).
In total, 136 transmembrane proteins were identified by proteomics on the basis of identification of two or more unique peptides using three distinct membrane purification procedures compared with over 1,200 identified by SAGE.
Although it is likely that the relatively large and more complex megakaryocyte expresses more transmembrane proteins than platelets express, the reason for the differences in total numbers may be largely due to a fundamental difference between the two techniques in that genomics detects essentially all expressed genes but provides no information on protein expression, whereas proteomics detects protein expression but preferentially identifies the most highly expressed proteins. In addition, the application of proteomics as used in the present study is critically dependent on the presence of suitably spaced trypsin cleavage sites to generate peptides of the appropriate size for identification. Such factors may explain why multispanning proteins, such as G protein-coupled receptors and tetraspanins, were particularly under-represented in the proteomics study as was also reported by Moebius et al. (21) in their analysis of the platelet membrane proteome. This is likely to reflect the low abundance of the majority of these proteins (the tetraspanin CD9, which was detected, is a notable exception with 45,000 copies per platelet) and relatively low number of tryptic cleavage sites as is typical for small, multispan membrane proteins. There was, however, a good correlation between reported expression levels of platelet receptors and the number of SAGE tags for a significant number of proteins. Furthermore the degree of overlap between the genomics and proteomics data was strong: 81% of the transmembrane proteins identified in human platelets using proteomics were present in the mouse megakaryocyte SAGE library. The remaining 19% may be due to a number of factors, including differences in the levels of expression in the two species, the absence of certain genes from the mouse genome (e.g. FcγRIIA), differential gene expression between the two species (e.g. human but not mouse platelets express PAR1) (39,40), or differences in expression in megakaryocytes and platelets.
We conclude that the combined use of proteomics- and genomics-based approaches represents a powerful way of mapping the platelet membrane proteome.
Our study has also shown that the use of SAGE data alone is a good method for identifying platelet-specific transmembrane proteins. Because SAGE is quantitative, different libraries can be directly compared. Comparison of the megakaryocyte SAGE library with 30 other SAGE libraries, the majority of which are hematopoietic in origin, revealed that transmembrane proteins feature strongly in the list of the most megakaryocyte-specific proteins. Indeed the 25 most megakaryocyte-specific genes contained 17 with predicted transmembrane domains, including the known platelet marker integrin αIIb and all four components of the GPIb-IX-V complex. The list also included the recently identified platelet transmembrane proteins CLEC-2 (6), TLT-1 (37,38), and endothelial cell-selective adhesion molecule (41) for which functions remain to be elucidated. The results of this SAGE analysis suggest that cell specificity is governed to a large extent by the receptors expressed on the cell surface. Similar analyses will facilitate the identification of cell-specific transmembrane proteins in other cell types. Moreover given that the NCBI SAGEmap depository now contains over 300 human and 200 mouse SAGE libraries, such experiments can be done entirely in silico.
TABLE V. ... (33). The top 25 most megakaryocyte-specific genes, which had at least eight megakaryocyte tags, are listed in order of tag number. The number of nonmegakaryocyte tags for each gene is not shown but was between zero and five tags per 1,031,389 total tags. Genes encoding known and predicted transmembrane proteins are shown in bold. 5-HT, 5 Eya2 Eyes absent 2 homolog (Drosophila) 9 NM_008148 Gp5 Ltb4dh Leukotriene B4 hydroxydehydrogenase 9 NM_025926 Dnajb4 DnaJ (Hsp40) homolog B4 8 NM_172708 A930013K19Rik Hypothetical protein LOC231134 8
Three different membrane enrichment techniques were used in this study in combination with LC-MS/MS analysis to identify transmembrane proteins expressed in human platelets. A total of 46 PM proteins, 68 IM proteins, and 22 proteins of unknown localization were identified by this approach. Eighty-three percent of these were identified by both Mascot and Sequest search algorithms; this correlates well with the study of Elias et al. (31) who reported a figure of >85% when evaluating mass spectrometry platforms used in large scale proteomics investigations. Reproducibility between experiments using the same enrichment technique was high for abundant, known platelet surface proteins (e.g. αIIb and β3 integrin subunits and all of the subunits of the GPIb-IX-V complex) and much lower for novel platelet transmembrane proteins (<50%). This was not surprising as low reproducibility (~70%) between replicate data acquisitions of the same sample has been reported previously (31). The lower reproducibility in our study compared with the Elias et al. (31) study is probably largely due to interexperimental variation, bearing in mind that each set of samples was only analyzed once per experiment but that either two (FFE) or three (WGA and biotin/NA) purifications were performed.
Additional biochemical and functional studies were performed on one of the novel proteins that was identified in this study, namely G6b, as this is alternatively spliced to seven different isoforms, one of which contains a transmembrane domain and an ITIM and is therefore a potential inhibitor of platelet activation. To date, only one inhibitory ITIM-containing receptor has been identified in platelets, PECAM-1, which selectively inhibits platelet activation by GPVI (42-44). A second platelet ITIM receptor, TLT-1, has been reported to support weak platelet activation (37,38). Biochemical evidence using a G6b-B-specific polyclonal antibody confirmed the presence of G6b-B in human platelets and demonstrated that it is constitutively phosphorylated on tyrosine in platelets and that it undergoes a further increase in tyrosine phosphorylation upon stimulation by the GPVI-specific agonist CRP and thrombin. Furthermore the non-receptor protein-tyrosine phosphatase SHP-1 is constitutively associated with G6b-B in resting platelets and undergoes an increase in association in parallel with tyrosine phosphorylation. Thus, G6b-B may potentially play an important role in regulating platelet activation by the two ITAM receptors, the collagen receptor GPVI and the low affinity immune receptor FcγRIIA, through its association with SHP-1. Further work is necessary to determine which other forms of G6b are expressed in platelets and their functional roles.
The initial proteomics studies in platelets used two-dimensional electrophoresis in combination with LC-MS/MS (14,17-19). These studies reported the presence of a small number of platelet membrane proteins most likely because many are expressed at low levels and because a significant number precipitate during isoelectric focusing. More recently, a combined fractional diagonal chromatography technology, a nongel-based "shotgun" approach developed by Gevaert and co-workers (16), was used in combination with MS/MS to study the platelet proteome. Sixty-nine platelet transmembrane proteins were identified using this approach, only 12 of which had been reported previously in platelet proteomics studies. Furthermore Moebius et al. (21) used a combination of 1-DE and 16-BAC/SDS-PAGE prior to LC-MS/MS to identify 83 PM and 48 IM proteins. However, these investigators report both transmembrane and membrane-associated proteins, such as Gα13 subunit and Rap-1A, which lack transmembrane domains. Taking this into account, the number of proteins predicted to contain transmembrane domains identified by Moebius et al. (21) using proteomics was 124, which is similar to that of 136 identified in the present study. The slightly larger number of proteins identified in the present study can be largely attributed to the number of identified IM proteins, which is likely due to the fact that we used FFE to enrich the IM fraction. A direct comparison of the proteomics dataset reported in the present study with that from the Moebius et al. (21) study showed that 62 proteins were identified in both studies, approximately half of which are known platelet PM proteins. This low level of overlap between the two studies is a reflection of the different techniques but may also be partially inherent to MS/MS studies as pointed out by Elias et al. (31). Together the present study and that of Moebius et al. (21) illustrate the requirement for affinity/membrane purification for the identification of platelet membrane proteins using proteomics.
It is beyond the scope of this study to address the question of the functional roles in platelets of novel receptors identified in the study, but it is noteworthy that a number of the identified proteins have either recently been shown to regulate platelet function or to have characteristics that strongly indicate that they may regulate platelet function. Examples of the former group include the immunoglobulin superfamily protein CD84, which has recently been shown to play an important role in supporting late stage events in platelet aggregation (45); the C-type lectin receptor CLEC-2, which has been shown to mediate platelet activation through a distinct signaling cascade (6); and the immunoglobulin superfamily protein G6f, which has been shown to localize Grb2 to the membrane in GPVI-activated platelets (46).
In summary, the present study has illustrated the power of the combined use of proteomics- and genomics-based approaches in identifying proteins in the platelet membrane. It has also highlighted the high degree of similarity in proteins expressed on the surface of human platelets and mouse megakaryocytes, further validating the use of the mouse model for studying the role of platelets in thrombosis. Future studies need to focus on establishing the biological and biochemical functions of the newly identified proteins in the physiological and pathological regulation of platelets in anticipation that this may lead to the identification of novel targets for antithrombotic agents.
‖ Holds a BHF chair.