Originally published In Press as doi:10.1074/mcp.M500344-MCP200 on June 30, 2006.
Molecular & Cellular Proteomics 5:1703-1707, 2006.
© 2006 by The American Society for Biochemistry and Molecular Biology, Inc.
Dataset
A Dataset of Human Fetal Liver Proteome Identified by Subcellular Fractionation and Multiple Protein Separation and Identification Technology*,S
Wantao Ying¶, Ying Jiang¶, Lihai Guo¶, Yunwei Hao¶, Yangjun Zhang¶, Songfeng Wu¶, Fan Zhong¶,||, Jinglan Wang, Rong Shi, Dong Li, Ping Wan, Xiaohai Li, Handong Wei, Jianqi Li, Zhongsheng Wang, Xiaofang Xue, Yun Cai, Yunping Zhu**, Xiaohong Qian, and Fuchu He||
From the Department of Genomics and Proteomics, Beijing Institute of Radiation Medicine, Beijing Proteomics Research Center, 27 Taiping Road, Beijing 100850, China; Beijing Proteome Research Center, 33 Life Garden Road, Beijing 102206, China; and || Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, China
ABSTRACT
A high throughput process combining subcellular fractionation with multiple protein separation and identification technologies allowed us to establish the protein expression profile of human fetal liver, which comprised at least 2,495 distinct proteins and 568 non-isoform groups identified from 64,960 peptides and 24,454 distinct peptides. In addition to this basic protein identification, the MS data were used for complementary identification and novel protein mining. By analyzing the data against integrated protein, expressed sequence tag, and genome datasets, we complementarily identified 223 proteins and 15 peptides with high-quality MS/MS data.
Between weeks 16 and 24 of gestation, the human fetal liver (HFL)1 is a major site of fetal hematopoiesis and is at a critical turning point between the immigration and emigration of the hematopoietic system (1). The unique characteristics of the fetal liver at this stage are therefore worth investigating.
SAMPLE PREPARATION
Chinese volunteers underwent induction of labor by artificial rupture of the membranes at Beijing Northern Taiping Road Hospital. Livers of Homo sapiens fetuses at 16-22 weeks of gestation were used for proteomic analysis after informed consent was obtained. All procedures were conducted in accordance with protocols approved by the local institution's ethics committee. Liver samples were immediately washed thoroughly with ice-cold PBS at 4 °C; a portion of each sample was stored in liquid nitrogen until use, and the rest was used for subcellular fractionation. For the subcellular proteome analysis, nuclei, mitochondria, plasma membrane (PM), and cytosol were fractionated according to the procedure described by Fleischer and Kervina (2) with minor modifications. The purity of the fractionated organelles was assessed by electron microscopy, and their enrichment by marker enzyme assays and specific immunoblotting (Supplemental Fig. S1). (For details see the supplemental materials and methods.)
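Enrichment measured with marker enzyme assays is commonly expressed as a relative specific activity: the marker's specific activity (units per mg protein) in a fraction divided by its specific activity in the starting homogenate. The sketch below assumes that convention; the function names and the numbers in the example are hypothetical and are not data from this study.

```python
def specific_activity(activity_units: float, protein_mg: float) -> float:
    """Marker enzyme activity per mg of protein."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return activity_units / protein_mg


def enrichment_factor(fraction_units: float, fraction_mg: float,
                      homogenate_units: float, homogenate_mg: float) -> float:
    """Relative specific activity (assumed convention): fold enrichment of a
    marker in a fraction compared with the starting homogenate."""
    return (specific_activity(fraction_units, fraction_mg)
            / specific_activity(homogenate_units, homogenate_mg))


# Hypothetical numbers: 30 U in 1.5 mg (20 U/mg) vs. 100 U in 50 mg (2 U/mg).
print(enrichment_factor(30.0, 1.5, 100.0, 50.0))  # 10.0
```

A value well above 1 for the expected marker, and near or below 1 for markers of other compartments, is the usual reading of such assays.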
TECHNOLOGY FOR PROTEIN EXPRESSION PROFILE
SDS-PAGE Separation
Proteins were separated by SDS-PAGE on gels of different acrylamide concentrations (15, 10, and 7.5%) to cover proteins ranging from 5 kDa to more than 300 kDa. After separation, the gels were stained with Colloidal Coomassie Blue R250, and each gel lane was manually excised from the loading position to the bottom of the gel. After in-gel digestion with trypsin, the extracted peptide mixtures were analyzed on nanoscale LC-ESI-Q-TOF MS or micro-LC-ion trap MS systems for protein identification.
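The in-gel tryptic digestion above, and the one-missed-cleavage setting used later in the MASCOT searches, can be illustrated with an in silico digest. This minimal sketch assumes the common trypsin rule (cleave C-terminal to Lys or Arg unless the next residue is Pro); `tryptic_digest` is a hypothetical helper, not part of any search engine.

```python
import re


def tryptic_digest(sequence: str, missed_cleavages: int = 1) -> list[str]:
    """In silico trypsin digest: cleave C-terminal to K or R unless followed by P.

    Returns fully cleaved peptides plus peptides that span up to
    `missed_cleavages` internal uncleaved sites.
    """
    if not sequence:
        return []
    # Positions just after each cleavable K/R (the lookahead skips K/R before P).
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # sequence[bounds[i]:bounds[j]] contains j - i - 1 missed cleavages.
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides


print(tryptic_digest("AAKGGRCCKPDD", missed_cleavages=0))  # ['AAK', 'GGR', 'CCKPDD']
print(tryptic_digest("AAKGGRCCKPDD", missed_cleavages=1))
# ['AAK', 'AAKGGR', 'GGR', 'GGRCCKPDD', 'CCKPDD']
```

Allowing one missed cleavage roughly doubles the candidate peptide list, which is why search engines expose it as an explicit setting.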
IEF Separation
For IEF separation, 1 mg of protein was mixed with 450 µl of rehydration buffer (8 M urea, 4% CHAPS, 65 mM DTT, 0.2% IPG buffer (pH 3-10), and a trace of bromphenol blue) and loaded onto IPG strips (ReadyStrip IPG strips, 18 cm, pH 3-10, Bio-Rad). After IEF, the strip was equilibrated in equilibration buffer I (6 M urea, 2% SDS, 0.375 M Tris-HCl, 20 mM DTT, 20% glycerol, and a trace of bromphenol blue, pH 8.8) for 15 min and then in equilibration buffer II (6 M urea, 2% SDS, 0.375 M Tris-HCl, 2.5 mM iodoacetamide, 20% glycerol, and a trace of bromphenol blue, pH 8.8) for 15 min. The in-gel proteins were then processed in the same manner as the SDS-PAGE-derived bands to obtain peptide mixtures for MS analysis.
2DE Separation
A 500-µg aliquot of the HFL protein mixture was mixed with rehydration solution (8 M urea, 2% CHAPS, 0.5% IPG buffer (pH 4-7), 20 mM DTT, and a trace of bromphenol blue) to a total volume of 350 µl. After in-gel rehydration, 2DE was performed with IEF on an IPGphor (Amersham Biosciences) as the first dimension and SDS-PAGE on a Protean II (Bio-Rad) as the second dimension, following the protocols in the manufacturers' manuals. For the separation of basic proteins with pI values between 6 and 9, a cup-loading method was used, with isopropanol and glycerol added to the rehydration buffer described above. Gels were visualized with a modified silver staining method (3). Protein spots were excised from the 2DE gels and digested in-gel with trypsin.
Two-dimensional LC Separation
A two-dimensional liquid chromatography system (ProteomeLab™ PF 2D, Beckman Coulter), comprising a high pressure chromatofocusing module, a high pressure reversed-phase module, two UV detectors, a fraction collector/injection module, and 32 Karat software, together with a 215 liquid handler (Gilson), was used to separate the intact protein mixture and to collect fractions at set times. A strong anion exchange column (250 mm × 2.1-mm inner diameter; particle size, 1.5 µm; Beckman Coulter) installed in the chromatofocusing module was equilibrated with Start buffer (pH 8.5; Beckman Coulter) for at least 15 min; 1.8 mg of protein extract was then injected onto the column and eluted with Eluent buffer (pH 4.0; Beckman Coulter) for 100 min. Fractions were collected every 0.3 pH unit, or after 5 min if the pH had not changed by 0.3 unit in that time. Each collected fraction was then separated on a reversed-phase column (50 mm × 4.6-mm inner diameter; particle size, 1.5 µm; Beckman Coulter) with a gradient from 100% A (5% ACN, 0.1% TFA) to 100% B (95% ACN, 0.1% TFA) in 60 min. For each run, 20 fractions were collected at 1-min intervals. The proteins in each fraction were lyophilized, reduced, alkylated, and digested with trypsin. The digests were analyzed by micro-LC-ESI-ion trap MS and CapLC-ESI-Q-TOF MS.
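The fraction-collection rule in the first (chromatofocusing) dimension, a new fraction every 0.3 pH unit or every 5 min if that pH change is not reached, can be sketched as follows. The sketch assumes the pH trace is available as (minute, pH) pairs; the function name and the small tolerance for floating-point pH differences are our own choices, not PF 2D or 32 Karat settings.

```python
def chromatofocusing_fraction_times(trace, ph_step=0.3, max_minutes=5.0, tol=1e-9):
    """Return the times (min) at which a new fraction would be started.

    trace: (time_min, pH) pairs in time order. A fraction is closed once the pH
    has moved by at least `ph_step` since the fraction began, or once
    `max_minutes` have elapsed without that change, whichever comes first.
    `tol` absorbs floating-point error in pH differences (e.g. 8.2 - 7.9).
    """
    if not trace:
        return []
    start_t, start_ph = trace[0]
    cut_times = []
    for t, ph in trace[1:]:
        if abs(start_ph - ph) >= ph_step - tol or t - start_t >= max_minutes:
            cut_times.append(t)
            start_t, start_ph = t, ph
    return cut_times


# Hypothetical pH trace: a 0.3-unit drop by minute 3, a plateau, then another drop.
trace = [(0, 8.5), (1, 8.4), (2, 8.3), (3, 8.2), (4, 8.2), (5, 8.2),
         (6, 8.2), (7, 8.2), (8, 8.2), (9, 7.9)]
print(chromatofocusing_fraction_times(trace))  # [3, 8, 9]
```

The time cap keeps fractions from growing indefinitely on pH plateaus, where many proteins can co-elute without a measurable pH change.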
Modified Multidimensional Protein Identification Technology
Pellets of plasma membrane proteins (1.0 mg) were digested with CNBr and trypsin (4). The desalted peptide mixture was then separated on a strong cation exchange column (250 mm × 4.6-mm inner diameter; 5 µm, 300 Å; Thermo Keystone-Hypersil) with a gradient from 100% A (10 mM ammonium acetate, 20% ACN, pH 3.0) to 100% B (100 mM ammonium acetate, 20% ACN) over 30 min at a flow rate of 0.75 ml/min. Fractions were collected every 1 min, lyophilized, and resuspended in 7 µl of 5% ACN plus 0.1% formic acid for micro-RPLC-ESI-ion trap MS analysis.
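Combining CNBr with trypsin adds cleavage sites at methionine residues, which shortens long tryptic peptides from membrane proteins. The sketch below predicts fully cleaved peptides under assumed specificities (CNBr C-terminal to Met; trypsin C-terminal to Lys/Arg unless followed by Pro). It ignores missed cleavages and does not model the Met-to-homoserine/homoserine lactone conversion; the helper name is hypothetical.

```python
import re


def cnbr_trypsin_digest(sequence: str) -> list[str]:
    """Fully cleaved peptides after sequential CNBr and trypsin treatment.

    Assumed specificities: CNBr cleaves C-terminal to Met; trypsin cleaves
    C-terminal to Lys/Arg unless followed by Pro. The CNBr-induced conversion
    of Met to homoserine (lactone) is not represented: the peptide C-terminus
    is still written as 'M'.
    """
    if not sequence:
        return []
    sites = sorted({m.end() for m in re.finditer(r"M|[KR](?!P)", sequence)})
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


print(cnbr_trypsin_digest("GAMLKPRSTMA"))  # ['GAM', 'LKPR', 'STM', 'A']
```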
MALDI-TOF MS and MALDI-TOF-TOF MS Identification of 2DE Gel Protein Spots
The extracted peptide mixture was mixed with a saturated solution of α-cyano-4-hydroxy-trans-cinnamic acid in 0.1% TFA, 50% aqueous ACN and spotted onto a 96-well plate for MALDI-TOF MS (M@LDI-R, Waters) analysis. Spectra were first calibrated with the lock mass of adrenocorticotropic hormone fragment 18-39 (ACTH 18-39; MH+, 2,465.199 Da) and then with a porcine trypsin autolysis peak (MH+, 2,211.105 Da). Some peptide mixtures were analyzed on a 4700 Proteomics Analyzer (Applied Biosystems, Foster City, CA). Data were acquired automatically, and the MS spectra were processed with 4700 GPS Explorer™ software. Spectra were recorded over a mass range of 900 to 3,500 Da with a focus mass of 1,600 Da. Each main spectrum was accumulated from 20 subspectra of 100 shots each (the first 10 shots of each discarded) using a random, uniform search pattern. The five most intense peaks in each MS spectrum (laser intensity, 3,600; minimum signal/noise of 15 within a 200-Da window; cluster area signal/noise of 20) were picked automatically for tandem MS (laser intensity, 4,600; minimum signal/noise of 10 within a 200-Da window).
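The automatic precursor selection on the 4700 Proteomics Analyzer (five most intense peaks with a minimum signal/noise of 15, within the 900 to 3,500 Da acquisition range) can be approximated as below. This is a simplified sketch: it assumes signal/noise values have already been computed for each peak and ignores the cluster area signal/noise criterion, so it does not reproduce the actual 4700 GPS Explorer algorithm. `Peak` and `select_precursors` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    mz: float         # m/z of the (singly charged) MALDI ion
    intensity: float
    snr: float        # signal/noise, assumed precomputed


def select_precursors(peaks, n=5, min_snr=15.0, mz_range=(900.0, 3500.0)):
    """Return up to n of the most intense peaks that meet the S/N cutoff
    and fall inside the acquisition mass range, most intense first."""
    lo, hi = mz_range
    candidates = [p for p in peaks if lo <= p.mz <= hi and p.snr >= min_snr]
    candidates.sort(key=lambda p: p.intensity, reverse=True)
    return candidates[:n]


spectrum = [Peak(1000.5, 500, 30), Peak(1200.6, 900, 10), Peak(1500.7, 800, 20),
            Peak(850.4, 1000, 50), Peak(2000.9, 700, 16), Peak(2500.1, 600, 40),
            Peak(3000.2, 400, 25), Peak(3400.3, 300, 18)]
print([p.mz for p in select_precursors(spectrum)])
# [1500.7, 2000.9, 2500.1, 1000.5, 3000.2]
```

Note that the most intense ion overall (m/z 850.4) is skipped because it lies outside the acquisition range, and m/z 1200.6 fails the signal/noise cutoff despite its intensity.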
CapLC-ESI Q-TOF MS
Nanoscale RP-HPLC of the peptide mixtures was carried out on a CapLC liquid chromatography system (Waters). Peptide mixtures were injected onto a precolumn (300-µm inner diameter × 5-mm PepMap C18, 3-mm length; LC Packings, Amsterdam, The Netherlands) for desalting. Separation was performed on a capillary C18 column (75 µm × 15 cm; LC Packings) with a gradient from 4% B (80% ACN, 0.1% formic acid) to 50% B in 60 min, and peptides were eluted directly into a Q-TOF mass spectrometer (Q-TOF Micro, Waters) at a flow rate of about 200 nl/min. Positive ion mode was used; the spray voltage was set at 3.2 kV, and the spray temperature was 80 °C. MS/MS spectra (maximum 7.7 s) were acquired for the four most intense ions in each full scan, with a dynamic exclusion window of 55 s. Raw data were processed with MassLynx version 4.0 (Savitzky-Golay smoothing 3/2; centering with four channels/80% centroid), and the resulting MS/MS data were exported as pkl files (5).
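The Q-TOF MS/MS data were exported as pkl files for database searching. A pkl file is commonly described as plain text in which each spectrum begins with a line giving the precursor m/z, precursor intensity, and charge, followed by one "m/z intensity" line per fragment ion, with spectra separated by blank lines. The writer below is a sketch under that assumption; the dictionary keys are hypothetical, and the output should be checked against files produced by MassLynx before being relied on.

```python
def to_pkl(spectra) -> str:
    """Serialize MS/MS spectra as pkl-style text (assumed layout).

    Each spectrum is a dict with hypothetical keys 'precursor_mz',
    'precursor_intensity', 'charge', and 'peaks' (a list of (m/z, intensity)).
    """
    blocks = []
    for s in spectra:
        # Header line: precursor m/z, precursor intensity, charge state.
        lines = [f"{s['precursor_mz']:.4f} {s['precursor_intensity']:.1f} {s['charge']}"]
        # One line per fragment ion.
        lines += [f"{mz:.4f} {inten:.1f}" for mz, inten in s["peaks"]]
        blocks.append("\n".join(lines))
    # A blank line separates consecutive spectra.
    return "\n\n".join(blocks) + "\n"


example = [{"precursor_mz": 650.3312, "precursor_intensity": 1200.0, "charge": 2,
            "peaks": [(175.119, 300.0), (276.155, 150.0)]}]
print(to_pkl(example), end="")
```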
Micro-HPLC-ESI Ion Trap MS
Microscale HPLC separation was performed on a Surveyor LC system (Thermo Finnigan); the pump flow rate of 180 µl/min was split to give a flow rate of 1.5 µl/min through the column. Peptide mixtures were separated on a capillary column (BioBasic C18, 180-µm inner diameter × 10 cm, 5 µm, 300 Å, silica; Thermo Keystone-Hypersil) with a gradient from 95% A (0.1% formic acid in water) to 50% B (95% ACN, 0.1% formic acid) in 60 min. The RPLC eluate was directed online to the ESI-ion trap MS system (Thermo Finnigan). Positive ion mode was selected; the spray voltage was set at 3.2 kV, and the spray temperature at 160 °C. Collision energy was set at 35% for MS/MS. After each full-scan mass spectrum, MS/MS spectra were acquired for the three most intense peptide ions, with a dynamic exclusion window of 5.0 min. All database searches were performed with the SEQUEST algorithm in BIOWORKS software against the International Protein Index (IPI) protein database.
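Both mass spectrometers used data-dependent acquisition with dynamic exclusion: the four most intense ions with a 55-s exclusion window on the Q-TOF, and the three most intense ions with a 5.0-min window on the ion trap. The toy simulation below illustrates how exclusion shifts MS/MS events toward less abundant ions. It is a sketch only: matching excluded ions by rounding m/z to 0.1 is our assumption, and instrument control software applies further charge-state, intensity, and mass-tolerance rules that are not modeled here.

```python
def simulate_dda(scans, top_n, exclusion_s, mz_decimals=1):
    """Toy data-dependent acquisition with dynamic exclusion.

    scans: list of (time_s, [(mz, intensity), ...]) full scans in time order.
    Returns (time_s, mz) pairs for the precursors chosen for MS/MS.
    Ions are matched for exclusion by rounding m/z to `mz_decimals` (assumption).
    """
    expires = {}      # rounded m/z -> time at which its exclusion ends
    selected = []
    for t, ions in scans:
        picked = 0
        for mz, _intensity in sorted(ions, key=lambda ion: ion[1], reverse=True):
            if picked == top_n:
                break
            key = round(mz, mz_decimals)
            if expires.get(key, float("-inf")) > t:
                continue          # still on the exclusion list
            selected.append((t, mz))
            expires[key] = t + exclusion_s
            picked += 1
    return selected


scan = [(500.0, 100), (600.0, 90), (700.0, 80), (800.0, 70)]
# Ion trap settings from the text: top 3, 5.0-min (300-s) exclusion.
print(simulate_dda([(0, scan), (10, scan), (320, scan[:1])], top_n=3, exclusion_s=300))
# [(0, 500.0), (0, 600.0), (0, 700.0), (10, 800.0), (320, 500.0)]
```

For the Q-TOF settings the same function would be called with `top_n=4, exclusion_s=55`.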
DATABASE QUERIES AND PROTEIN IDENTIFICATIONS
When the ESI-Q-TOF MS data were searched against the IPI_human_2.33 protein database with MASCOT, the mass tolerance for peptide precursors and their fragment ions was set at 0.2 Da in the peptide sequence tag search, and one missed trypsin cleavage was allowed. Protein identifications were based on the probability-based Mowse scoring algorithm at a confidence level of 95%. For MALDI-TOF MS peptide mass fingerprinting, the mass accuracy was set at 50 ppm and one missed trypsin cleavage was allowed in the MASCOT search against the IPI_human_2.33 protein database; proteins with a confidence level above 95% and at least four matched peptides were considered significant identifications. For tandem analysis of gel-separated protein spots on the 4700 Proteomics Analyzer (Applied Biosystems), 4700 GPS Explorer software with an embedded Mowse scoring algorithm was used to provide identifications at 95% confidence. For ESI-ion trap MS, data were searched against the IPI_human_2.33 protein database with a precursor tolerance of 1.4 Da. The protein identification criteria were ΔCn ≥ 0.1 and Xcorr ≥ 2.0, 2.2, and 3.5 for singly, doubly, and triply charged peptides, respectively. For proteins matched by only one peptide (37.6% of the identifications), the MS/MS spectrum was checked manually and required at least three consecutive high-intensity y or b ions. All of these criteria were used for quality control of the data processing.
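The SEQUEST acceptance criteria and the manual check applied to single-peptide proteins can be expressed as simple filters. In this sketch the thresholds are treated as inclusive (≥), which is an assumption, and the consecutive-ion check assumes that the matched b and y ion numbers passed in have already been restricted to high-intensity peaks; all function names are hypothetical.

```python
# Thresholds from the text; treating them as inclusive (>=) is an assumption.
XCORR_MIN = {1: 2.0, 2: 2.2, 3: 3.5}
DELTA_CN_MIN = 0.1


def passes_sequest_filter(xcorr: float, delta_cn: float, charge: int) -> bool:
    """Charge-dependent Xcorr cutoff plus a delta-Cn cutoff."""
    threshold = XCORR_MIN.get(charge)
    if threshold is None:
        return False  # charge states other than 1+ to 3+ are not covered by the criteria
    return xcorr >= threshold and delta_cn >= DELTA_CN_MIN


def longest_consecutive_run(ion_numbers) -> int:
    """Longest run of consecutive ion indices, e.g. y3, y4, y5 -> 3."""
    best = run = 0
    prev = None
    for n in sorted(set(ion_numbers)):
        run = run + 1 if prev is not None and n == prev + 1 else 1
        best = max(best, run)
        prev = n
    return best


def single_peptide_spectrum_ok(b_ions, y_ions, min_run: int = 3) -> bool:
    """At least `min_run` consecutive b ions or y ions (intensity screening assumed done)."""
    return max(longest_consecutive_run(b_ions), longest_consecutive_run(y_ions)) >= min_run


print(passes_sequest_filter(2.5, 0.15, 2))            # True
print(passes_sequest_filter(3.2, 0.15, 3))            # False (triply charged needs >= 3.5)
print(single_peptide_spectrum_ok([2, 4], [5, 6, 7]))  # True
```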
PROTEIN EXPRESSION PROFILE OF HFL
The protein expression profile of HFL obtained by subcellular fractionation and multiple approaches to protein (or peptide) separation and identification provides a broad survey of the proteins expressed in HFL. The high throughput process is shown in Fig. 1. In addition to the whole tissue lysate, four key subcellular components were analyzed: PM, mitochondria, nucleus, and cytosol. Together, these approaches identified 2,495 distinct proteins and 568 non-isoform groups from 64,960 peptides and 24,454 distinct peptides (Supplemental Tables S1 and S2 and Figs. S2-S10). Of these, 681 distinct proteins were identified in the tissue lysate, 543 in PM, 1,492 in cytosol, 522 in mitochondria, and 512 in nucleus (Table I). Each fraction contained fraction-specific proteins (identified in only one fraction), with the highest proportion in cytosol (47%) and the lowest in PM (still 35.5%).
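Fraction-specific proteins, as defined above, are those identified in only one fraction, so their proportions reduce to set differences over per-fraction accession lists. The sketch below uses made-up accessions; which sets count as "fractions" in this comparison (for example, whether the tissue lysate is included) is left to the caller, since it is an assumption here.

```python
def fraction_specific_percent(fractions: dict[str, set[str]]) -> dict[str, float]:
    """For each fraction, the percentage of its proteins found in no other fraction."""
    percent = {}
    for name, proteins in fractions.items():
        # Union of all proteins seen in the other fractions.
        others = set().union(*(p for other, p in fractions.items() if other != name))
        specific = proteins - others
        percent[name] = 100.0 * len(specific) / len(proteins) if proteins else 0.0
    return percent


# Hypothetical accessions, not data from this study.
fractions = {
    "cytosol": {"IPI_A", "IPI_B", "IPI_C", "IPI_D"},
    "PM": {"IPI_D", "IPI_E", "IPI_F", "IPI_G"},
    "nucleus": {"IPI_A", "IPI_H"},
}
print(fraction_specific_percent(fractions))  # {'cytosol': 50.0, 'PM': 75.0, 'nucleus': 50.0}
print(len(set().union(*fractions.values())))  # 8 distinct proteins overall
```

The same union also explains why per-fraction counts sum to more than the number of distinct proteins: proteins shared between fractions are counted once overall.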

FIG. 1. Flowchart of experimental platforms for the protein expression profiling of human fetal liver. SCX, strong cation exchange.
TABLE I. Distribution of proteins identified in the tissue lysate and subcellular components of the HFL proteome
A detailed description of the grouping process is given in the supplemental materials and methods.
COMPLEMENTARY IDENTIFICATION AND NOVEL PROTEIN MINING
In addition to the basic protein identification described above, the MS data were also used for complementary identification (6) and novel protein mining (7). Our strategy for protein identification and novel protein mining was to search several databases of differing quality and coverage (Fig. 2A). In our dataset, the Q-TOF data were used for this analysis. Three additional databases were searched: our integrated human protein database (comprising the common protein database and some predicted protein sequences from Ensembl (www.ensembl.org/)), the human EST database, and the human genome database. MASCOT peptide score thresholds of 45, 57, and 43 (95% confidence) were applied to searches against these three databases, respectively.
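A MASCOT score S is defined as -10·log10(P), where P is the probability that the match is a random event, so each database-specific threshold above corresponds to a probability cutoff. The sketch below applies the three thresholds to peptide hits. The database labels are hypothetical, and the strict "above threshold" comparison is an assumption about how ties were handled.

```python
# Hypothetical labels for the three additional databases and their 95% thresholds.
PEPTIDE_THRESHOLDS = {"integrated": 45, "EST": 57, "genome": 43}


def mascot_score_to_probability(score: float) -> float:
    """Invert the probability-based Mowse score S = -10 * log10(P)."""
    return 10 ** (-score / 10.0)


def significant_hits(hits):
    """Keep (database, peptide, score) tuples scoring above that database's threshold.
    A strict '>' comparison is assumed here."""
    return [hit for hit in hits if hit[2] > PEPTIDE_THRESHOLDS[hit[0]]]


hits = [("EST", "PEPTIDEA", 60.0), ("EST", "PEPTIDEB", 50.0), ("genome", "PEPTIDEC", 44.0)]
print(significant_hits(hits))  # [('EST', 'PEPTIDEA', 60.0), ('genome', 'PEPTIDEC', 44.0)]
print(mascot_score_to_probability(20.0))  # 0.01
```

A score of 50 would pass the genome and integrated thresholds but not the EST threshold, which is why the same peptide score can be accepted in one search and rejected in another.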

FIG. 2. Complementary identification and novel protein discovery. A, the strategy for complementary identification and novel protein discovery. B, identification of novel peptides from Q-TOF data using the three additional databases, and of novel proteins from searching the IPI database. C, novel proteins from the analysis of identified novel peptides. "Int" indicates integrated protein database. D, newly discovered protein isoforms. E, examples of newly discovered protein isoforms (a nonspecific degradation (NSD) isoform of heat shock protein 70, an alternative splicing (ASP) isoform of β-actin, and a protein variant (VAR) isoform of laminin-binding protein).
Fig. 2B summarizes the final results of the identification and discovery of novel peptides and proteins from the HFL proteome (including proteins identified above that lack detailed functional annotation). Of the 2,495 proteins identified by searching the IPI database, 159 were annotated as unknown proteins with low similarity (<50%) to known proteins in IPI or the National Center for Biotechnology Information non-redundant (NCBI nr) database (Supplemental Tables S1 and S2, marked proteins). Searching the integrated protein database identified 272 novel peptides, comprising 91 distinct peptides (Supplemental Table S3A) that correspond to 89 proteins. Searching the EST database identified 151 novel peptides, corresponding to 87 distinct peptides (Supplemental Table S3B) and 86 ESTs (Fig. 2B). Another 21 peptides (11 distinct peptides) were found in the human genome (Supplemental Table S3C). Supplemental Table S1D shows the mass spectra of the novel peptides identified from the integrated protein and EST databases, most of which are of high quality.
These results were further confirmed by similarity searching against the NCBI nr database and were cataloged into six classes: known proteins, novel isoforms, known protein sequences without functional annotation, unknown protein sequences, novel peptides in the genome, and false positive peptides (known proteins matching a different reading frame of the EST) (Fig. 2C).
Additional analysis of the complementary identification and novel protein mining results revealed 132 novel protein isoforms (Fig. 2C and Supplemental Table S4). These isoforms could be further classified into three categories: nonspecific degradation, alternative splicing, and protein variant (Fig. 2D). Several novel isoforms are presented in Fig. 2E, including a nonspecific degradation isoform of heat shock protein 70, an alternative splicing isoform of β-actin, and a protein variant of laminin-binding protein.
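The counts reported in this section distinguish total peptide identifications (for example, 272 novel peptides from the integrated protein database) from distinct peptide sequences (91) and the proteins or ESTs they map to (89). Reading the first number as one identification per spectrum is our assumption. A minimal sketch of that bookkeeping, with each identification represented as a hypothetical (peptide sequence, accession) pair:

```python
def summarize_identifications(psms):
    """psms: (peptide_sequence, accession) pairs, one per identified spectrum.
    Returns (total identifications, distinct peptides, distinct accessions)."""
    total = len(psms)
    distinct_peptides = len({peptide for peptide, _ in psms})
    distinct_accessions = len({accession for _, accession in psms})
    return total, distinct_peptides, distinct_accessions


# Hypothetical matches: the same peptide seen in several spectra counts once as distinct.
psms = [("LVNELTEFAK", "EST_1"), ("LVNELTEFAK", "EST_1"),
        ("YLYEIAR", "EST_1"), ("AEFVEVTK", "EST_2")]
print(summarize_identifications(psms))  # (4, 3, 2)
```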
ACKNOWLEDGMENTS
We thank Drs. Jian Wang, Gang Cheng, Tinggui Chen, Shuguang Ouyang, Yanzhi Yuan, Tao Li, Yinghua Tian, Na Wang, Qingfang Meng, Liyan Jiao, Xue Gao, and Yun Zhai for valuable dedication to this manuscript.
FOOTNOTES
Received, October 21, 2005, and in revised form, June 16, 2006.
Published, MCP Papers in Press, June 30, 2006, DOI 10.1074/mcp.M500344-MCP200
1 The abbreviations used are: HFL, human fetal liver; 2DE, two-dimensional gel electrophoresis; EST, expressed sequence tag; PM, plasma membrane; CapLC, capillary LC; RP, reverse phase; IPI, International Protein Index; NCBI nr, National Center for Biotechnology Information non-redundant. 
* This work was supported in part by Chinese State Key Projects for Basic Research (Grants 2001CB510204, 2001CB510201, and 2001CB510209) and for High-Tech (Grant 2002BA711A11), by National Natural Science Foundation of China for Creative Research Groups (Grant 30321003) and for general program (Grants 20275046 and 20405017), and by the Beijing Municipal Key Project (Grants H030230280190 and H030230280290). 
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material. 
¶ These authors contributed equally to this work. 
** To whom correspondence may be addressed. E-mail: zhuyp{at}hupo.org.cn

To whom correspondence may be addressed. E-mail: qianxh{at}nic.bmi.ac.cn

To whom correspondence may be addressed. Tel./Fax: 8610-68177417 or 8610-68171208; E-mail: hefc{at}nic.bmi.ac.cn
REFERENCES
1. Yu, Y., Zhang, C., Zhou, G., Wu, S., Qu, X., Wei, H., Xing, G., Dong, C., Zhai, Y., Wan, J., Ouyang, S., Li, L., Zhang, S., Zhou, K., Zhang, Y., Wu, C., and He, F. (2001) Gene expression profiling in human fetal liver and identification of tissue- and developmental-stage-specific genes through compiled expression profiles and efficient cloning of full-length cDNAs. Genome Res. 11, 1392-1403
2. Fleischer, S., and Kervina, M. (1974) Subcellular fractionation of rat liver. Methods Enzymol. 31, 6-41
3. Shevchenko, A., Wilm, M., Vorm, O., and Mann, M. (1996) Mass spectrometric sequencing of protein from silver-stained polyacrylamide gels. Anal. Chem. 68, 850-858
4. Washburn, M. P., Wolters, D., and Yates, J. R., III (2001) Large-scale analysis of the yeast proteome by multidimensional protein identification technology. Nat. Biotechnol. 19, 242-247
5. Perkins, D. N., Pappin, D. J., Creasy, D. M., and Cottrell, J. S. (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20, 3551-3567
6. Kristiansen, T. Z., Bunkenborg, J., Gronborg, M., Molina, H., Thuluvath, P. J., Argani, P., Goggins, M. G., Maitra, A., and Pandey, A. (2004) A proteomic analysis of human bile. Mol. Cell. Proteomics 3, 715-728
7. Omenn, G. S., States, D. J., Adamski, M., Blackwell, T. W., Menon, R., Hermjakob, H., Apweiler, R., Haab, B. B., Simpson, R. J., Eddes, J. S., Kapp, E. A., Moritz, R. L., Chan, D. W., Rai, A. J., Admon, A., Aebersold, R., Eng, J., Hancock, W. S., Hefta, S. A., Meyer, H., Paik, Y. K., Yoo, J. S., Ping, P., Pounds, J., Adkins, J., Qian, X., Wang, R., Wasinger, V., Wu, C. Y., Zhao, X., Zeng, R., Archakov, A., Tsugita, A., Beer, I., Pandey, A., Pisano, M., Andrews, P., Tammen, H., Speicher, D. W., and Hanash, S. M. (2005) Overview of the HUPO Plasma Proteome Project: results from the pilot phase with 35 collaborating laboratories and multiple analytical groups, generating a core dataset of 3020 proteins and a publicly-available database. Proteomics 5, 3226-3245
