From the Department of Proteomics and Signal Transduction, Max Planck Institute for Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany; the Center for Experimental Bioinformatics, University of Southern Denmark, Campusvej 55, DK-5230 Odense M, Denmark; and the Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 101300, China
| ABSTRACT |
|---|
|
|
|---|
|
| EXPERIMENTAL PROCEDURES |
|---|
Subcellular Fractionation and Western Blotting
Differential centrifugation was used to fractionate adipocytes as described previously (28, 29). Briefly, cells were washed twice with ice-cold PBS and suspended in HES buffer (5 mM HEPES, pH 7.4, 0.5 mM EDTA, 250 mM sucrose, protease inhibitors (Complete tablets; Roche Applied Science)). Cells were then sheared by 10 passages through a 25-gauge needle and centrifuged at 1,000 x g for 10 min at 4 °C. The supernatant served as the source of cytosol, mitochondria, and membranes. The nuclear pellet was resuspended in 7 ml of 0.25 M sucrose, TKM buffer (50 mM Tris-HCl, pH 7.4, 25 mM KCl, 5 mM MgCl2, Complete protease inhibitor). The suspension was overlaid with 1 ml of 2.3 M, 3 ml of 2.1 M, and 6 ml of 1.6 M sucrose layer (sucrose, TKM buffer). The sucrose step gradient was centrifuged at 160,000 x g for 1 h (Sorvall Surespin 630 rotor). The purified nuclei settled at the 2.1–2.3 M interface. The nuclear layer was isolated; diluted in 10 ml of 0.25 M sucrose, TKM buffer; and centrifuged at 2,700 x g for 10 min. The nuclear pellet was resuspended in 200 µl of HES buffer. The mitochondria were isolated from the crude cytoplasm by centrifugation at 16,000 x g for 1 h with the resulting pellet suspended in 0.58 M TES buffer (0.58 M sucrose, 10 mM Tris-HCl, pH 7.4, 0.1 mM EDTA, Complete protease inhibitor). The suspension was centrifuged at 16,000 x g for 20 min, and the pellet was suspended in 3 ml of 1.75 M TES buffer while the supernatant served as the microsomal and cytosolic fraction. On the suspension of 1.75 M TES buffer, 3 ml of 1.55 M, 3 ml of 1.29 M, and 7 ml of 0.58 M sucrose layers were overlaid, and the sucrose step gradient was centrifuged at 160,000 x g for 1 h. The mitochondrial layer at the 1.29–1.55 M interface was isolated, diluted in 15 ml of 0.25 M HES buffer, and centrifuged at 16,000 x g for 20 min. The purified mitochondrial pellet was resuspended in 300 µl of HES buffer.
Membrane was isolated by centrifugation of the postmitochondrial cytoplasmic fraction at 160,000 x g for 1 h while the supernatant served as the cytosolic fraction. The membrane pellet was resuspended in HES buffer and centrifuged again at 160,000 x g for 1 h. The purified membrane pellet was resuspended in 200 µl of HES buffer. Proteins in the cytosolic fraction were enriched using Centriprep YM-3 membrane concentrators (Millipore, Billerica, MA). Total protein in all fractions was then quantified using the Coomassie Protein Assay kit (Pierce), and the fractions were stored at −80 °C. Equal protein amounts (7 µg) from each of the subcellular fractions were loaded onto 10 and 15% SDS-polyacrylamide gels, electrophoresed, transferred to nitrocellulose membranes, and immunoblotted with antibodies against histone H3 (Cell Signaling Technology, Beverly, MA), cytochrome c (BD Pharmingen), insulin receptor β chain (Santa Cruz Biotechnology, Santa Cruz, CA), and MEK1 (Upstate Biotechnology Inc., Lake Placid, NY) followed by horseradish peroxidase-conjugated secondary antibodies. The membranes were subjected to chemiluminescence detection (ECL) according to the manufacturer's instructions (GE Healthcare).
One-dimensional SDS-PAGE and In-gel Digest
Proteins (100, 150, 150, and 150 µg) from each of the subcellular fractions (nuclear, mitochondrial, membrane, and cytosol fractions, respectively) were separated by one-dimensional SDS-PAGE using NuPage® Novex bis-Tris gels and NuPage MES-SDS running buffer (Invitrogen) according to the manufacturer's instructions. The gel was stained with Coomassie using Colloidal Blue Staining kit (Invitrogen). Protein bands were excised and subjected to in-gel tryptic digestion essentially as described previously (30). Briefly the gel pieces were destained and washed, and after dithiothreitol reduction and iodoacetamide alkylation, the proteins were digested with porcine trypsin (modified sequencing grade; Promega, Madison, WI) overnight at 37 °C. The resulting tryptic peptides were extracted from the gel pieces with 30% acetonitrile, 0.3% trifluoroacetic acid and with 100% acetonitrile. The extracts were evaporated in a vacuum centrifuge to remove organic solvent and then desalted and concentrated on reversed-phase C18 StageTips as described previously (31).
Nanoflow LC-MS2 or MS3
All nanoflow LC-MS/MS and MS/MS/MS experiments were performed basically as described previously (25, 32). All digested peptide mixtures were separated by on-line reversed-phase nanoscale capillary LC and analyzed by electrospray MS/MS and MS/MS/MS. The experiments were performed on an Agilent 1100 nanoflow system connected to an LTQ-FTICR mass spectrometer (Thermo Electron, Bremen, Germany) equipped with a nanoelectrospray ion source (Proxeon Biosystems, Odense, Denmark). Binding and chromatographic separation of the peptides took place in a 15-cm fused silica emitter (75-µm inner diameter) in-house packed with reversed-phase ReproSil-Pur C18-AQ 3-µm resin (Dr. Maisch GmbH, Ammerbuch-Entringen, Germany). Peptide mixtures were injected onto the column with a flow of 500 nl/min and subsequently eluted with a flow of 250 nl/min from 10 to 64% acetonitrile in 0.5% acetic acid in a 105-min gradient. Data were acquired in data-dependent mode using Xcalibur software. The precursor ion scan MS spectra (m/z 300–1575) were acquired in the FTICR instrument with resolution R = 25,000 at m/z 400 (number of accumulated ions, 5 x 10⁶). The three most intense ions were isolated and fragmented in the linear ion trap by collisionally induced dissociation using 3 x 10⁴ accumulated ions. They were simultaneously scanned by FTICR-selected ion monitoring with 10-Da mass range, R = 50,000, and 5 x 10⁴ accumulated ions for even more accurate molecular mass measurements. For MS3, the most intense ions with m/z >300 in each MS2 spectrum were further isolated and fragmented. In data-dependent LC-MS2 experiments dynamic exclusion was used with 30-s exclusion duration.
Proteomics Data Analysis
Proteins were identified via automated database search (Mascot; Matrix Science, London, UK) of all tandem mass spectra against an in-house curated version of the mouse International Protein Index (IPI) protein sequence database (version 3.07, 43,335 protein sequences; European Bioinformatics Institute, www.ebi.ac.uk/IPI/) containing all mouse protein entries from Swiss-Prot, TrEMBL, RefSeq, and Ensembl as well as frequently observed contaminants (porcine trypsin, Achromobacter lyticus lysyl endopeptidase, and human keratins). A "decoy database" was prepared by reversing the sequence of each entry and appending this database to the forward database. Carbamidomethylcysteine was set as fixed modification, and oxidized methionine, protein N-acetylation, N-pyroglutamate, and deamidation of asparagine and glutamine were searched as variable modifications. Initial mass tolerances for protein identification on MS peaks were 5 ppm and on MS/MS peaks were 0.5 Da. Two "missed cleavages" were allowed. The instrument setting for the Mascot search was specified as "ESI-Trap." Peptide identification information was extracted from the Mascot result file into EPICenter (Proxeon Biosystems) (33). Besides the standard search engine results used for peptide assignment (score, expected versus calculated fragment ions, and delta mass), additional empirical information was computed by the EPICenter peptide validation module to assist in the assignment (33). Peptides satisfying the following four criteria were accepted for identification: 1) peptides for which the MS2 score was above the 99th percentile of significance (Mascot score >32), 2) fully tryptic peptides with sequence length 7 or longer, 3) peptides for which delta scores (the difference in score between the first and second scoring peptide) were at least 5.0, and 4) peptides for which y ion or b ion score was at least 50.0.
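The four peptide acceptance criteria listed above can be sketched as a simple filter. This is a minimal illustration, not the EPICenter implementation: the `PeptideHit` record and its field names are hypothetical, and criterion 4 is read here as "either the y ion score or the b ion score is at least 50.0".

```python
from dataclasses import dataclass


@dataclass
class PeptideHit:
    """Hypothetical record for one peptide-spectrum match (field names are assumptions)."""
    sequence: str
    mascot_score: float   # MS2 (Mascot) score
    delta_score: float    # score of best match minus score of second-best match
    y_ion_score: float
    b_ion_score: float
    fully_tryptic: bool


def accept_peptide(hit, min_score=32.0, min_length=7, min_delta=5.0, min_ion_score=50.0):
    """Apply the four criteria from the text; thresholds default to the published values."""
    return (
        hit.mascot_score > min_score                       # 1) above the 99th percentile (score > 32)
        and hit.fully_tryptic                              # 2) fully tryptic ...
        and len(hit.sequence) >= min_length                #    ... and at least 7 residues long
        and hit.delta_score >= min_delta                   # 3) delta score of at least 5.0
        and max(hit.y_ion_score, hit.b_ion_score) >= min_ion_score  # 4) y or b ion score >= 50.0
    )
```

Keeping the thresholds as keyword arguments makes it easy to see how the accepted set changes if a cutoff is relaxed.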
The Mascot result file was also imported into MSQuant (75), and the MS3 score was calculated automatically. Finally the protein identification list was created by accepting the peptides that passed the criteria described above and consolidating them according to the following method. Proteins with at least two peptides having a cumulative MS2 score (Mascot score) of at least 64 were counted as identified proteins. This protein identification criterion corresponds to the confidence of p = 0.0001 if both peptide identifications are considered independently. For proteins identified by a single peptide, we required the presence of an MS3 spectrum and a combined score for MS2 and MS3 of above 52, which corresponded to a level of false positives of p = 0.0001. EPICenter automatically assigns identified peptides to proteins and organizes all proteins with shared peptides into a single group (protein group). EPICenter selects the protein with most of the peptides as an anchor protein and marks proteins that are identified by at least one distinct and separate peptide as conclusively identified proteins. We counted proteins as identified only when a protein was conclusively identified as described above or a protein group consisted of only isoforms or overlapping database entries.
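The protein-level rules can be sketched in the same spirit. The function name and arguments below are hypothetical, and "cumulative MS2 score" is assumed to mean the sum of the Mascot scores of all accepted peptides of a protein. The last line shows where p = 0.0001 comes from: two independent peptide identifications, each at the 99th percentile (p = 0.01).

```python
def accept_protein(ms2_scores, combined_ms2_ms3_score=None):
    """Decide whether a protein counts as identified.

    ms2_scores: Mascot scores of the accepted peptides assigned to the protein.
    combined_ms2_ms3_score: combined MS2 + MS3 score, only needed for
        single-peptide proteins (None if no MS3 spectrum was acquired).
    """
    if len(ms2_scores) >= 2:
        # Two or more peptides: cumulative MS2 score of at least 64.
        return sum(ms2_scores) >= 64.0
    if len(ms2_scores) == 1:
        # Single peptide: an MS3 spectrum is required and the combined score must exceed 52.
        return combined_ms2_ms3_score is not None and combined_ms2_ms3_score > 52.0
    return False


# Two independent identifications at p = 0.01 each give a joint p of 0.01 * 0.01.
p_two_peptides = 0.01 * 0.01
```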
Enrichment Analysis of Gene Ontology (GO) Categories
Biological Networks Gene Ontology (BiNGO) (34), the Cytoscape (35) plugin for finding statistically over- or under-represented GO categories, was used for the enrichment analysis of our adipocyte proteome dataset. The 3T3-L1 proteome dataset was compared against a reference set of the complete mouse proteome (IPI mouse) GO annotations. According to the instructions for BiNGO (34) the custom GO annotation for the reference set of whole IPI version 3.13 mouse dataset was created by extracting the GO annotations available for mouse IPI identifiers from the EBI Gene Ontology Annotation Mouse 22.0 release (www.ebi.ac.uk/GOA/MOUSE_release.html) at the time of release. The Gene Ontology Annotation Mouse 22.0 release contained annotation for 32,776 proteins compiled from different sources. The analysis was done using "Hypergeometric test," and we selected all GO terms that were significant with p < 0.001 after correcting for multiple term testing by "Benjamini and Hochberg false discovery rate." The analysis was done separately for GO biological process and molecular function categories, and fold enrichment for every over-represented term in the two GO categories was calculated. In the following we discuss the fold enrichment calculation for GO biological process; the same procedure applies for the molecular function category. Suppose the set of over-represented biological process GO terms is called BGO. For each term BGO[i] in set BGO the fold enrichment measure was calculated by the following formula,
[Equation: fold enrichment of BGO[i]; the formula image is not available in this copy.]
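The statistics named in this section (hypergeometric test, Benjamini and Hochberg correction, fold enrichment) can be illustrated with a short standard-library sketch. BiNGO performs these calculations itself; the functions below are an independent illustration, and the fold enrichment is assumed to be the conventional ratio of the annotated fraction in the test set to the annotated fraction in the reference set, which may differ in detail from the authors' formula.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Assumed definition: (k / n) / (K / N).

    k: test-set proteins annotated with the term, n: test-set size,
    K: reference proteins annotated with the term, N: reference size.
    """
    return (k / n) / (K / N)


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n proteins are drawn without replacement from N, K of which carry the term."""
    upper = min(n, K)
    favourable = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return favourable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A term would then be reported as over-represented when its adjusted p-value is below 0.001, the cutoff used in the text.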
InterPro Domain Enrichment for Insights into Protein Function
To gain a better understanding of identified proteome functions we used InterPro (release 13.0) annotation for finding statistically enriched protein domains in our dataset. We used Cytoscape (35) plugin BiNGO (34) for domain enrichment analysis. For domain enrichment we needed three components: 1) the test dataset of identified 3T3-L1 proteome; 2) the InterPro ontology, which was built by parsing the "interpro.xml" file (for release 13.0) available at ftp.ebi.ac.uk/pub/databases/interpro/ using in-house scripts; and 3) the reference set of InterPro annotation for the complete mouse proteome, which was created by parsing the "ipi.MOUSE.IPC" file that contains all the InterPro matches for IPI mouse 3.19 database, available at ftp.ebi.ac.uk/pub/databases/IPI/current/ as part of IPI 3.19 release.
Briefly, we created a custom InterPro ontology according to the ontology format specifications of Cytoscape. The test set of the 3T3-L1 proteome was compared against the InterPro annotations of the IPI mouse reference set using the custom InterPro ontology as the framework. The enrichment analysis was done using the Hypergeometric test, and we selected all InterPro domains that were significant with p < 0.001 after correcting for multiple term testing by the Benjamini and Hochberg false discovery rate. The set of over-represented InterPro terms is called Ienrich. For each term Ienrich[k] in set Ienrich the fold enrichment measure was calculated by the following formula,
[Equation: fold enrichment of Ienrich[k]; the formula image is not available in this copy.]
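As an orientation only, a conventional fold enrichment for an InterPro term can be written as the ratio of two annotated fractions. The symbols are assumptions introduced here: n_k is the number of 3T3-L1 proteins matching Ienrich[k], n is the number of 3T3-L1 proteins with any InterPro match, and N_k and N are the corresponding counts in the IPI mouse reference set. The source does not confirm that this is exactly the authors' formula.

```latex
\[
\mathrm{FoldEnrichment}\bigl(I_{\mathrm{enrich}}[k]\bigr)
  = \frac{n_k / n}{N_k / N}
\]
```

Under this definition a value of 1 means the domain is as frequent in the adipocyte proteome as in the reference, and values above 1 indicate over-representation.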
Subsequently these enriched InterPro domains were grouped in functional categories according to their representative biological functions.
Proteome mRNA Concordance Analysis for 3T3-L1 Adipocytes
To estimate the depth of the proteome we covered in our survey, we compared our identified proteome list with the DGAP microarray dataset for normal 3T3-L1 adipocytes. The available dataset is in triplicate for each of the Affymetrix MG_U74A, B, and C arrays, which contain 12,654, 12,636, and 12,728 probe sets, respectively, and cover known mouse genes and ESTs. The analysis was carried out in two steps. In the first step we estimated the basal expression of the 3T3-L1 adipocyte transcriptome, and in the second step we mapped our 3T3-L1 adipocyte proteome dataset on the transcriptome data. We analyzed the triplicates of each array type separately and calculated the MAS5 expression values using the "mas5" function implemented in the "affy" package of R software (36). The expression values were then converted to log2 scale, and the log2 expression values were z-transformed to facilitate the comparison of mRNA expression across the three array types. The data were further filtered based on the present (P) versus absent (A) call percentage, which is a widely accepted measure of microarray data quality. We used a criterion of 66% P call for accepting a probe set as expressed, i.e. a probe set was accepted if it had a P call in at least two of three samples. Subsequently the data for the MG_U74A, B, and C arrays were combined in one set. Only 7,656 probe sets of a total 37,886 unique probe sets met this criterion, and they were taken as a surrogate for basal 3T3-L1 mRNA expression. Subsequently we mapped our proteome list on the estimated basal expression set. We used the BioMart version 0.5 (76) "Mus musculus genes NCBIM36" dataset to map adipocyte IPI identifiers to the MG_U74A, B, and C probe sets. Thus we could map 3,287 IPI identifiers to 3,329 MG_U74(A,B,C) probe sets. Finally the overlap of the adipocyte basal (7,656) probe sets and our survey (3,329) probe sets was calculated.
This gave a final number of 2,182 probe sets that were found in both datasets. Hence these 2,182 probe sets were regarded as the genes that we could identify, and the remaining 5,474 probe sets were used as the not-identified set. We used this consolidated information to calculate the average mRNA expression for the identified versus non-identified proteome using the expression levels of the 7,656 probe sets.
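The filtering and normalization steps can be sketched as follows. The original analysis used the "mas5" function of the R "affy" package; the Python below only illustrates the downstream arithmetic (present-call filter, log2 conversion, z-transformation, and the size of the not-identified set). The function names are hypothetical, and the use of the population standard deviation for the z-transformation is an assumption.

```python
from math import log2
from statistics import mean, pstdev


def is_expressed(calls, min_fraction=0.66):
    """Accept a probe set if at least 66% of replicate calls are 'P' (2 of 3 passes, 1 of 3 fails)."""
    return sum(call == "P" for call in calls) / len(calls) >= min_fraction


def log2_z(expression_values):
    """Convert expression values of one array type to log2 scale, then z-transform them."""
    logged = [log2(v) for v in expression_values]
    mu = mean(logged)
    sd = pstdev(logged)  # population SD; an assumption, the source does not say which was used
    return [(v - mu) / sd for v in logged]


# Set sizes reported in the text.
expressed_probe_sets = 7656
identified_and_expressed = 2182
not_identified = expressed_probe_sets - identified_and_expressed  # 5474, as stated in the text
```

Z-transforming each array type separately puts the A, B, and C arrays on a common scale before they are pooled.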
Protein Prioritization Analysis
The recently reported software application for the computational prioritization of genes, Endeavour (26), was used for protein prioritization. 3T3-L1 proteins (IPI identifiers) were mapped to human orthologs (Ensembl Gene identifiers) using BioMart version 0.5 (76) and the Mus musculus genes NCBIM36 dataset. In total 2,990 IPI protein identifiers were successfully mapped to human Ensembl gene identifiers. The training set (STraining) was created by choosing 29 genes involved in vesicular trafficking in the insulin signaling pathway as shown in Fig. 6C. The mapped 3T3-L1 proteome Ensembl list was taken as the candidate test set (STest). The following data sources were used for ranking: 1) literature (abstracts in EntrezGene), 2) functional annotation (Gene Ontology), 3) microarray expression (Atlas gene expression), 4) EST expression (EST data from Ensembl), 5) protein domains (InterPro), 6) pathway membership from the Kyoto Encyclopedia of Genes and Genomes, 7) cis-regulatory modules (TOUCAN), and 8) sequence similarity (BLAST) data. A model for "vesicular trafficking" genes was created in Endeavour using the above mentioned data sources. Finally the candidate test set (STest) genes were ranked for their putative role in vesicular trafficking in the insulin pathway by measuring their similarity to genes in the training set (STraining).
Hierarchical Clustering of the Cellular Compartment Profiles of the Adipocyte Proteome
A cellular compartment distribution matrix was created for the 3,287 IPI identifiers that we identified in our analysis. Briefly, suppose C is the 3,287 x 4 cellular compartment matrix corresponding to 3,287 proteins and four compartments (nuclei, mitochondria, membrane, and cytosol). For a particular IPI i, i ∈ [1, 3287], and a particular compartment column j, j ∈ [1, 4], if the IPI was observed with k unique peptides in the compartment then C[i, j] = k. Otherwise if the IPI i was not observed in compartment column j then C[i, j] = 0. This 3,287 x 4 matrix was called the cellular compartment distribution matrix. The matrix was further converted to a probability distribution matrix, Cprob, of the same dimensions (3,287 x 4) with each element calculated by the following formula.
$$C_{\mathrm{prob}}[i, j] = \frac{C[i, j]}{\sum_{j'=1}^{4} C[i, j']}$$
The matrix Cprob was then used for one-dimensional hierarchical clustering using Genesis (38) software. The distance metric used was "euclidean," and the clustering was done using the "average linkage clustering" technique. Subsequently the data from earlier large scale studies (20, 22) were overlaid on the clustered dendrogram to ascertain the proteome concordance and depth of our fractionation and subcellular identification. Also the available GO cellular component terms corresponding to the four compartments were extracted for the 3,287 IPI identifiers and overlaid on the clustered dendrogram.
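The construction of Cprob and the distance used for clustering can be sketched with the standard library. The clustering itself was done in Genesis; the code below only illustrates the ingredients. Row normalization (each protein's peptide counts divided by its total across the four compartments) is an assumption consistent with the description of Cprob as a probability distribution, and all function names are hypothetical.

```python
from math import sqrt


def to_probabilities(counts):
    """Row-normalise a peptide-count matrix so that each protein's row sums to 1."""
    result = []
    for row in counts:
        total = sum(row)
        # A protein with no peptides at all keeps an all-zero row (should not occur in practice).
        result.append([c / total for c in row] if total else [0.0] * len(row))
    return result


def euclidean(a, b):
    """Euclidean distance between two compartment profiles."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(cluster_a, cluster_b):
    """Average-linkage distance: mean pairwise Euclidean distance between two clusters."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)
```

For example, a protein seen with peptide counts (3, 1, 0, 0) across nuclei, mitochondria, membrane, and cytosol gets the profile (0.75, 0.25, 0, 0), so proteins are compared by where they are found rather than by how many peptides were detected.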
Pathway Mapping of Identified Proteins in Subcellular Compartments
To ascertain the coverage of our dataset with respect to the established key pathways and biological processes, we used the recently developed functional mapping tool GenMAPP version 2.1 (39) to map our 3T3-L1 adipocyte dataset on publicly available mouse MAPPs. Because IPI is not currently supported by GenMAPP, we mapped the IPI identifiers to their MGI counterparts using IPI protein cross-reference information as available for IPI mouse version 3.13. Overall 3,124 IPI identifiers (95.0% of the total) could be mapped to their respective MGI identifiers. Subsequently we created a compartment-wide list for the mapped MGI identifiers based on the presence/absence of a particular protein in any of the four compartments, and the data were mapped to the latest available mouse MAPPs.
Adipocyte Proteome Comparison with Previous Human Cell Line and Drosophila Lipid Droplet Proteomics Studies
We used ProteinCenter (Proxeon Bioinformatics, Odense, Denmark), which is proteomics data mining and management software, to compare our datasets with previously published datasets of six human cell lines (40) and the Drosophila lipid droplet proteome (5, 6). Briefly for mapping our dataset to any other dataset we loaded the datasets as two groups in ProteinCenter. The datasets were then clustered based on the sequence similarity, and the optimization criterion used was "most homogeneous groups" wherein protein sequences are clustered to make the individual groups as homogeneous as possible. The similarity threshold of 80% was chosen as the cutoff to define clusters of sequences. The adipocyte proteins belonging to the clusters of at least two proteins with at least one identifier from the compared dataset were deemed overlapping with that dataset. The adipocyte proteins belonging to singleton clusters, which could not be clustered with any sequence of the compared dataset, were chosen as adipocyte-specific.
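The overlap rule described above can be sketched as a small classification over sequence clusters. ProteinCenter performs the actual clustering at the 80% similarity threshold; the function below assumes the clusters are already available as lists of (dataset, identifier) pairs, and its name and data layout are hypothetical. Clusters that contain several adipocyte proteins but no protein from the compared dataset are not covered by the text and are left unclassified here.

```python
def classify_adipocyte_proteins(clusters, own="adipocyte"):
    """Split adipocyte proteins into 'overlapping' and 'adipocyte-specific' sets.

    clusters: iterable of clusters, each a list of (dataset_name, protein_id) tuples.
    """
    overlapping, specific = set(), set()
    for cluster in clusters:
        ours = {protein_id for dataset, protein_id in cluster if dataset == own}
        has_other = any(dataset != own for dataset, _ in cluster)
        if has_other:
            # At least two proteins, one of them from the compared dataset.
            overlapping |= ours
        elif len(cluster) == 1:
            # Singleton cluster: no similar sequence in the compared dataset.
            specific |= ours
    return overlapping, specific
```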
| RESULTS AND DISCUSSION |
|---|
Determining the identities of proteins from sequenced peptides is complicated because the same peptide sequence can be present in multiple different proteins or protein isoforms (45). Standard search engines such as Mascot report proteins even when none of their identified peptides is unique to them. Thus, peptide sharing among the identified proteins was checked by EPICenter (33) and manually verified. EPICenter organizes all entries with shared peptides into a single group (protein group). The protein that contains most of the peptides is selected as an anchor, and all group members that are identified by at least one distinct and separate peptide are marked as conclusively identified. We counted proteins as identified only when a protein had at least one distinct peptide. If a protein group consisted of only isoforms or overlapping database entries indistinguishable by MS then only the anchor protein was counted; thus the number of identified proteins is a lower boundary of the actual value.
From 45 LC-MS/MS/MS runs, 182,271 MS/MS spectra were submitted to Mascot database search, and 52,585 MS/MS spectra satisfied the criteria for peptide validation. Among these, 22,706 represent unique sequences in the four fractions. A total of 3,287 proteins were identified from four fractions with 8,953 unique peptides (Supplemental Tables 1 and 2). Of all proteins, 20.4% were identified with single peptide identification and two stages of peptide fragmentation. Interestingly 71.3% of the total were identified within only one fraction, whereas 16.7, 7.8, and 4.2% were identified within two, three, and four fractions, respectively. This confirms the relatively high quality of organellar separation as already suggested by marker analysis in the Western blotting experiments in Fig. 2. In two previous studies analyzing several mouse tissues, each similarly separated into four subcellular compartments, protein overlap among compartments was deeper, and fewer than 50% of total identified proteins were specific to one compartment (22, 46).
Depth and Coverage of the 3T3-L1 Adipocyte Proteome
We compared our proteome dataset with the recently published mouse liver organelle proteome map (20) and the above mentioned study of six mouse tissues (brain, heart, kidney, liver, lung, and placenta) (22). As shown in Fig. 4A, more than two-thirds of the proteins identified by us in adipocytes overlapped with these other proteomes. These proteins are candidates for the "household proteome," i.e. proteins performing general cellular functions and therefore present in different cell lines and tissues. However, the proportion of proteins specific to adipocytes in our study (28.7%) was also relatively high. In contrast, in a previous study that compared six human cell lines, specific proteins (proteins that were exclusively found in a single cell line) account for only 6–36% of all identified proteins (40). As shown in Fig. 4B, nearly half of our cytoplasm proteins overlapped with the combined cytoplasmic proteins from six human cell lines (40). Secreted proteins were enriched 3.5-fold in identified cytoplasmic proteins from adipocytes compared with cytoplasmic proteins from the six-cell line proteome. Fig. 4 clearly shows that our proteome dataset contains many proteins that were not identified in previous large scale proteome analyses using both tissues and cell lines. Our result may reflect the depth of high confidence analysis now possible and/or the specificity of adipocytes for their critical role in energy balance and whole body homeostasis.
Microarray studies have previously been undertaken to unravel various aspects of 3T3-L1 adipocyte differentiation, development, and function (47–50), and they serve as a useful resource for gaining insights into mRNA expression and cellular dynamics. Moreover microarray studies provide an estimate of the transcriptome in a particular biological state at any given time, and we wished to use these data as a reference for estimating the coverage and depth of our large scale proteome study. We analyzed the gene expression levels of normal 3T3-L1 adipocytes from Affymetrix microarray data generated in the DGAP. The available dataset contains triplicates for each of the MG_U74A, B, and C Affymetrix array types. We combined the three array type datasets for our analysis. In total they contain 37,886 probe sets of which 7,656 were deemed "present" by using the 66% P call criterion (see "Experimental Procedures"). Of these 7,656 probe sets, 2,182 could be mapped to our identified proteome.
We then divided the genes judged to be expressed in 3T3-L1 according to the microarray data into two groups: those identified in our study and those not identified (Fig. 5). If proteomics was biased to detect only high abundance proteins, we would expect a large difference in mRNA signal between the two groups. Remarkably the distribution of mRNA expression levels was less than 2-fold higher for the genes whose products were detected in our proteomics analysis as compared with those that were not identified. To further substantiate this finding, we also compared our proteome data with the Gene Atlas V2 mouse microarray data for adipose tissue (51). Again we observed that the distribution of mRNA expression level was less than 2-fold higher for the genes whose products were detected in our proteomics analysis as compared with those that were not (Supplemental Fig. 1). This suggests that proteomics experiments, despite the remaining limitations in complex mixture analysis (52), have become quite comprehensive and able to detect low abundance proteins in cellular proteomes. Our previous study on mouse tissue mitochondria, in contrast, still showed a substantial tendency of mass spectrometry to preferentially detect products of high abundance messages (21).
In contrast, fewer than half of the known proteins in the insulin pathway map were identified (Fig. 6C). Coverage of kinases and transcription factors was low, whereas we detected more than half of the proteins related to vesicular trafficking. Interestingly, in the analysis of 3T3-L1 microarray data half of the key components of the insulin signaling pathway were also flagged as not expressed (i.e. filtered out by microarray quality measures when using the P calls >66% criterion). This suggests that the limitation in detecting low abundance proteins/genes such as kinases and transcription factors is no worse for MS-based proteomics than for other high throughput "omics" technologies.
Endocrine function is one of the key functions of adipocytes, and several hormones, called adipokines or adipocytokines, are synthesized by adipose tissue. Among them, adiponectin, adipsin, and nicotinamide phosphoribosyltransferase (visfatin) were identified in this study. Furthermore we identified proteins secreted from 3T3-L1 cells (9, 14, 15) including procollagens, complement C3, galectin-3, prohibitin, thioredoxin, peroxiredoxin 1, β2-microglobulin, haptoglobin, vimentin, cathepsin B, and CD14 antigen. We also identified NUCB1 (Nucleobindin-1) and NUCB2 (Nucleobindin-2) proteins, which are homologous and are reported to be secreted, although until recently their precise function was unknown (55, 56). Interestingly, in a very recent study an N-terminal fragment derivative of NUCB2 named nesfatin-1 (NEFA/Nucleobindin-2-encoded satiety- and fat-influencing protein-1) was shown to act as a satiety molecule in the hypothalamus of rat brain (57). However, we did not identify tumor necrosis factor α, interleukin-6, leptin, and plasminogen activator inhibitor 1 precursor (PAI-1). The absence of leptin in our survey may not only be due to its extremely low abundance but may also be explained by earlier observations that leptin expression at the mRNA level in cultured 3T3-L1 adipocytes is markedly down-regulated or absent compared with the leptin expression in mouse adipose tissue (58). Moreover we classified 60 proteins as secreted (GO:0046903), and our dataset was more than 2.5-fold enriched for secreted proteins when compared with the complete IPI mouse proteome dataset version 3.07. Also 482 proteins had a predicted signal peptide, corroborating the endocrine nature of the adipocyte proteome.
Annotation: Subcellular Localization
To compare our subcellular fractions and detection coverage we benchmarked our data against previously reported large scale proteomics analyses of mouse organelles (20, 22) and Gene Ontology annotation. For our proteome, we built a cellular compartment distribution matrix (3,287 proteins x 4 fractions) by first counting peptides of each protein in each fraction. Then we normalized the data to arrive at a probability matrix for the distribution of each protein in the four compartments (see "Experimental Procedures"). Hierarchical clustering of this matrix showed that more than 70% of the proteins localized to four clusters with propensity for one specific fraction (Fig. 7). A fifth cluster contained proteins for which there was no clear pattern of distribution. Additionally we overlaid the data of the two large scale mouse tissue experiments as well as cellular compartment annotations from Gene Ontology on the clustered dendrogram. As seen in the figure, most of the adipocyte cytosolic proteome showed high concordance with the experimental studies. Similarly the membrane cluster showed good agreement. For the mitochondrial and nuclear fractions there was excellent correlation; however, the depth of our study was much greater. (Only the top part of our clusters have detected counterparts in the other studies.) The GO annotations agreed well for the mitochondria- and nucleus-specific clusters. No major enrichment was seen for the membrane and cytoplasmic fractions, which seem to be less well annotated in GO.
Annotation: Novel Proteins
The coverage and level of detail of current GO annotation are among the limitations of analyses using Gene Ontology. As mentioned above only about two-thirds of the identified proteins were mapped to at least one annotation term within the GO molecular function and biological process categories. Furthermore we found that 335 proteins had no specific protein names and gene symbols; thus we used the Blast2GO (37) tool to assign putative GO annotations to these proteins. This allowed us to assign 175 IPI identifiers to GO terms as well as 45 IPI identifiers to Enzyme Classification (EC) identifiers. Hence overall we could assign putative annotation to more than 50% of unannotated proteins in our identified dataset (Supplemental Table 3). The assigned GO biological process terms were mainly related to cellular metabolism, transport, protein biosynthesis, organelle organization and biogenesis, and cellular physiological process (Supplemental Fig. 4). The GO molecular function terms were mainly related to protein binding, nucleic acid binding, RNA binding, catalytic activity, hydrolase activity, oxidoreductase activity, and structural molecule activity (Supplemental Fig. 5). This result aligns well with our enrichment analysis of GO annotations for the whole adipocyte proteome and indicates likely roles of these hypothetical/unknown proteins in adipocyte function and biology.
Protein Domain Enrichment for Insights into Protein Function
Classification of proteins based on their amino acid sequence or three-dimensional structure is one of the most established practices in protein science and has also been adopted by current large scale structural genomics endeavors (59). Moreover knowledge of independently folding protein domains can provide useful pointers into the complex interplay of proteome interactions and regulation by post-translational modifications (60). To obtain an additional perspective of the adipocyte proteome we performed InterPro domain enrichment analysis using our adipocyte proteome dataset and the proteome datasets obtained from six mouse tissues (22) and extracted InterPro domains enriched only in the adipocyte proteome (Fig. 8). The functions of these enriched domains were mainly related to signal transduction, redox system, protein transport, translation, transcription, protein degradation, fatty acid metabolism, and phospholipid biosynthesis in agreement with the enriched GO terms described before. Enrichment of redox-related domains, such as the thioredoxin domain, is interesting because it has been suggested that the reduced redox state encourages triglyceride synthesis, adipocyte differentiation, and the development of adipose tissue (61), whereas an increase in the markers of systemic oxidative stress has been associated with obesity and metabolic syndrome (62). Similarly the domains related to vesicular protein transport, such as Ras small GTPase, Rab type, t-SNARE, and Longin-like, as well as the zinc finger, Tim10/DDP-type domain, which is related to protein import into the mitochondrial inner membrane, were substantially enriched.
Protein Prioritization Analysis on Vesicular Trafficking in Adipocytes
One of the important features of adipocytes is insulin-regulated glucose uptake. In adipocytes, the majority of this glucose uptake results from the translocation of the glucose transporter 4 (GLUT4) to the cell surface membrane. Since the cloning of GLUT4 in 1989 in several laboratories (65–69), numerous studies have attempted to elucidate the molecular basis of the insulin receptor signaling pathway and membrane trafficking processes. One of the unresolved questions is the connection between Akt activation and GLUT4 translocation. GO term enrichment analysis revealed that protein transport was enriched in the adipocyte proteome (see above), and some of the identified vesicular trafficking proteins are known to be involved in the insulin signaling pathway (Fig. 6C). As 26, 35, and 36% of identified proteins in our study were not annotated by Gene Ontology molecular function, biological process, and cellular component categories, respectively, we tried to predict candidate proteins involved in GLUT4 translocation using a bioinformatics approach. Very recently, an algorithm termed Endeavour was developed for gene prioritization to rank genes involved in human diseases and biological processes (26). The concept of prioritization by Endeavour is that candidate test genes are ranked based on their similarity with a set of known training genes. The similarity measure is in turn calculated by integrating functional, process, gene ontology (GO), pathway, and sequence similarity information obtained from diverse data sources. As training genes we chose 29 genes involved in vesicular trafficking that are on the map in Fig. 6C. As test genes we used the 2,990 proteins that were identified in our study and mapped to human Ensembl gene identifiers. For 41 gene products we obtained highly significant values (p < 0.0002) for association with our set of proteins known to be involved in vesicular traffic (Table I).
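The prioritization concept described above (score each test gene against the training set in every data source, rank within each source, then integrate the rankings) can be sketched as follows. This is a simplified stand-in and not Endeavour itself: the fusion step here is a plain average of rank ratios, and the data source names, candidate names, and similarity scores are all hypothetical.

```python
def rank_ratios(scores):
    """Map each candidate to rank / n, where rank 1 is the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    n = len(ordered)
    return {gene: (i + 1) / n for i, gene in enumerate(ordered)}

def prioritize(scores_by_source):
    """Average the rank ratios over all sources; best candidate first."""
    per_source = [rank_ratios(s) for s in scores_by_source.values()]
    genes = per_source[0].keys()
    fused = {g: sum(r[g] for r in per_source) / len(per_source) for g in genes}
    return sorted(fused, key=fused.get)

# Hypothetical similarity of three test genes to the training set,
# one score table per data source.
toy_scores = {
    "go_annotation": {"candA": 0.9, "candB": 0.2, "candC": 0.5},
    "sequence": {"candA": 0.7, "candB": 0.1, "candC": 0.8},
    "pathway": {"candA": 0.6, "candB": 0.3, "candC": 0.4},
}

ranking = prioritize(toy_scores)
print(ranking)  # ['candA', 'candC', 'candB']
```

Working on ranks rather than raw scores keeps sources with different score scales comparable, which is the main reason for ranking within each source before combining.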
Candidate proteins highly ranked by Endeavour include many Ras-related GTP-binding proteins (Rabs) and soluble N-ethylmaleimide-sensitive factor attachment protein receptors (SNAREs). Although it is not surprising that these proteins are involved in vesicular trafficking, they do serve as a positive control of the algorithm. We found that proteins recently shown to be related to insulin signaling or GLUT4 translocation ranked high among the candidates. For example, Rab-10, Rab-14, Rab-2, vesicle transport through interaction with t-SNAREs 1B homolog, vacuolar protein sorting 45, vesicle-associated membrane protein 8, and syntaxin-12 are known to be contained in GLUT4 vesicles (12, 70). Rab-2, Rab-10, and Rab-14 were identified as targets of Akt substrate of 160 kDa (AS160), whereas Rab-4 was reported to be involved in insulin-induced GLUT4 translocation (71). ADP-ribosylation factor 5 (Arf5) was observed to exhibit modest redistribution to the plasma membrane in response to insulin (72), and Cdc42, a Rho GTPase family member, mediates insulin signaling to glucose transport in 3T3-L1 adipocytes (73). The above examples show that the protein prioritization by Endeavour is reasonable. By extension, candidates in Table I with no obvious connection to insulin signaling and GLUT4 are now excellent candidates for further functional study in this context.
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Published, MCP Papers in Press, April 4, 2007, DOI 10.1074/mcp.M600476-MCP200
1 The abbreviations used are: LTQ, linear ion trap; DGAP, Diabetes Genome Anatomy Project; IPI, International Protein Index; GO, gene ontology; DMEM, Dulbecco's modified Eagle's medium; bis-Tris, 2-[bis(2-hydroxyethyl)amino]-2-(hydroxymethyl)propane-1,3-diol; EST, expressed sequence tag; BLAST, Basic Local Alignment Search Tool; MGI, Mouse Genome Informatics; SIM, selected ion monitoring; BiNGO, Biological Networks Gene Ontology; SNARE, soluble N-ethylmaleimide-sensitive factor attachment protein receptor; t-SNARE, target SNARE; GLUT4, glucose transporter 4.
* This study was supported by the DGAP. Work at the Center for Experimental BioInformatics was supported by a generous grant from the Danish National Research Foundation. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
¶ Both authors contributed equally to this work.
|| Present address: Graduate School of Global Environmental Studies, Kyoto University, Yoshida-Honmachi Sakyo-Ku, Kyoto 606-8501, Japan.

To whom correspondence should be addressed. Tel.: 49-89-8578-2557; Fax: 49-89-8578-3209; E-mail: mmann@biochem.mpg.de
| REFERENCES |
|---|
-mediated) starvation. Cell. Mol. Life Sci. 62, 492–503
2 macroglobulin is involved in the adipose conversion of 3T3 L1 preadipocytes. Proteomics 4, 1840–1848
and genes in 3T3-L1 adipocytes and white adipose tissue. J. Biol. Chem. 269, 19041–19047