Originally published In Press as doi:10.1074/mcp.M700023-MCP200 on April 27, 2007.
Molecular & Cellular Proteomics 6:1183-1187, 2007.
© 2007 by The American Society for Biochemistry and Molecular Biology, Inc.
Research
Investigation of Human Protein Variants and Their Frequency in the General Population*,S
Dobrin Nedelkov¶, David A. Phillips, Kemmons A. Tubbs, and Randall W. Nelson
From Intrinsic Bioprobes Inc. and the Institute for Population Proteomics, Tempe, Arizona 85284
ABSTRACT
Genetic variations and posttranslational modifications give rise to structural diversity in fully expressed human proteins. Structural modifications can also be induced during the life cycle of a protein and can lead to impaired functioning and pathological conditions. Although a large number of protein modifications have been discovered thus far, their incidence among the general population has not been determined. Here we show that human proteins exhibit a wide range of modifications present at various frequencies in the general population. The screening of 1,000 individuals from four geographical regions in the United States for five plasma proteins revealed the existence of 27 protein modifications. Some variants, such as those resulting from oxidation and single amino acid terminal truncations, were observed in the majority of individuals, whereas point mutations and extensive sequence truncations were detected in only a few individuals. Gender correlations were observed for two protein modifications. The data obtained reveal the extent of structural diversity in the general populace and represent the first such catalogue of structural protein modifications. Systematic studies of this kind will help redefine the normal human proteome and reveal the effects of these modifications in pathological processes.
Protein modifications are common for many proteins and occur as a result of genetic variations and postexpression processing. Whereas modifications at the gene level can readily be studied on a large scale, the changes that occur at the protein level have been much harder to assess across larger populations. Recent advances in mass spectrometric methods of detection and sample processing have opened up the realm of population proteomics, wherein human proteins are investigated across and within populations to define and understand protein differences and to facilitate the discovery and validation of disease-specific protein modulations (1, 2). Mass spectrometry measures a unique feature of each fully expressed protein, its molecular mass. Changes in the protein structure resulting from structural modifications are reflected in its molecular mass and can be detected via MS without a priori knowledge of the modification. In regard to sample processing, high throughput immunoaffinity methods can be used for protein retrieval from complex biological samples such as human plasma or serum in preparation for MS detection. The resulting mass spectrometric immunoassays (3) are ideally suited for studying structural diversity in human proteins within large populations. In a recent pilot study using MS immunoassays, 25 plasma proteins from 96 healthy individuals were examined, yielding a total of 53 structural variants with various frequencies in the sample cohort (4). Here we greatly expand the number of samples to get an accurate view of the distribution of some of these protein modifications in the general population. One thousand individuals from four geographical regions in the United States were selected, and the protein modifications for β2-microglobulin, cystatin C, retinol-binding protein (RBP),1 transferrin, and transthyretin were delineated. In the preliminary study of 96 individuals, these five proteins accounted for 19 of the 53 protein variants observed (4).
The plasma concentration of the five proteins is in the medium-to-high range (1 mg/liter to 1 g/liter), and each has been shown to have clinical and diagnostic value for a number of ailments (5, 6).
EXPERIMENTAL PROCEDURES
Rabbit anti-human polyclonal antibodies to β2-microglobulin (A0072), cystatin C (A0451), retinol-binding protein (A0040), transferrin (A0061), and transthyretin (A0002) were obtained from DAKO (Carpinteria, CA). 1,1'-Carbonyldiimidazole-activated affinity pipettes (Intrinsic Bioprobes Inc., Tempe, AZ) were prepared and derivatized with the antibodies as described previously (7). Proteolytic enzyme-derivatized MALDI targets (Intrinsic Bioprobes Inc.) were manufactured and used as described elsewhere (8).
One thousand samples of sodium heparin human plasma (317 females; 683 males; ages, 18–78 years) in 5-ml volumes were obtained through ProMedDX (Norton, MA). The samples were collected at certified blood donor and medical centers in California, Florida, Tennessee, and Texas (250 samples from each state) and designated as normal based on their non-reactivity for common blood infectious agents and the donor information itself. The samples were labeled only with a barcode and an accompanying specification sheet containing information about the gender, age, and geographical origin of each donor, ensuring proper privacy protection. A 100-µl aliquot of each sample was placed into a 2-ml well on a 96-well Masterblock plate (Greiner, Longwood, FL; catalogue number 780271) and diluted with 900 µl of HBS physiological buffer (HEPES-buffered saline, 10 mM HEPES, pH 7.4, 150 mM NaCl) to a 10-fold working sample dilution. A total of eleven 96-well trays containing diluted plasma samples were prepared. Because each MS immunoassay depletes the samples of a specific protein, all five protein assays were performed by sequentially addressing each tray with successive protein extractions. The trays were prepared immediately prior to the assaying, limiting the storage of individual sample trays at 4 °C to 5 days. Samples from the four different states were analyzed interchangeably to limit the geographical bias.
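The tray count and working dilution stated above follow from simple arithmetic. The short Python sketch below reproduces it; the function names are illustrative, and it assumes trays are filled completely before a new one is started, which the text does not state.

```python
import math

SAMPLES = 1000          # plasma samples in the cohort (from the text)
WELLS_PER_TRAY = 96     # wells on one Masterblock plate (from the text)


def trays_needed(n_samples: int, wells: int = WELLS_PER_TRAY) -> int:
    """Smallest number of trays that can hold n_samples, one sample per well."""
    return math.ceil(n_samples / wells)


def dilution_factor(sample_ul: float, diluent_ul: float) -> float:
    """Fold dilution = total volume / sample volume."""
    return (sample_ul + diluent_ul) / sample_ul


n_trays = trays_needed(SAMPLES)
# Assumption: earlier trays are full, so the remainder sits on the last tray.
wells_on_last_tray = SAMPLES - (n_trays - 1) * WELLS_PER_TRAY

print(n_trays)                    # 11
print(wells_on_last_tray)         # 40
print(dilution_factor(100, 900))  # 10.0
```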
Processing of the 96-plasma sample trays was accomplished with a Beckman Multimek automated 96-channel pipettor (Beckman Coulter, Fullerton, CA). The protein extraction/affinity capture process followed previously established protocols (4). Antibody-derivatized affinity pipettes were mounted onto the head of the Multimek pipettor and initially rinsed with 100 µl of HBS physiological buffer (10 cycles, each cycle consisting of a single aspiration and dispensation through the affinity pipettes). Next the pipettes were immersed into the sample tray, and 150 aspiration and dispensation cycles (100-µl volumes each) were performed, allowing for affinity capture of the targeted protein. The pipettes were then rinsed with HBS physiological buffer (10 cycles); water (10 cycles); 2 M ammonium acetate, acetonitrile (3:1, v/v) mixture (10 cycles); and two final water rinses (10 cycles each). The affinity pipettes containing the retrieved protein were then rinsed with 1 mM N-octyl glucoside (single cycle with a 150-µl aliquot) to homogenize the subsequent MALDI matrix draw and elution by completely wetting the porous affinity supports inside the pipettes. For elution of the captured proteins, 6-µl aliquots of MALDI matrix (6 g/liter α-cyano-4-hydroxycinnamic acid in aqueous solution containing 33% (v/v) acetonitrile and 0.4% (v/v) trifluoroacetic acid) were aspirated into the affinity pipettes, and after a 10-s delay (to allow for the dissociation of the protein from the capturing antibody), the eluates from all 96 affinity pipettes containing the targeted protein were dispensed directly onto a 96-well formatted MALDI target. Following air drying and visual inspection of the sample spots, linear mass spectra were acquired on an Autoflex MALDI-TOF mass spectrometer (Bruker Daltonics, Billerica, MA). The resulting mass spectra were evaluated using Zebra software (Intrinsic Bioprobes Inc.), which, when used in conjunction with PAWS (sequence display and manipulation software from Proteometrics, New York, NY), allows for rapid identification of protein sequence modifications via display of mass values and differences between peaks. A peak was assigned to a specific modification if the observed m/z value differed by less than 0.02% from that empirically predicted. For transferrin, which is a higher mass glycosylated protein, a mass value within 0.1% of that empirically calculated was deemed acceptable.
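The peak assignment criterion can be illustrated with a minimal Python sketch. This is not the Zebra or PAWS software: the function names, the closest-match tie-breaking rule, and the m/z values in the example are hypothetical placeholders chosen only to show how a relative mass tolerance of 0.02% (or 0.1% for transferrin) would be applied.

```python
DEFAULT_TOL_PCT = 0.02      # tolerance stated in the text for most proteins
TRANSFERRIN_TOL_PCT = 0.1   # wider tolerance stated for transferrin


def relative_mass_error_pct(observed_mz: float, predicted_mz: float) -> float:
    """Absolute deviation of the observed m/z from the predicted m/z, in percent."""
    return abs(observed_mz - predicted_mz) / predicted_mz * 100.0


def assign_peak(observed_mz, candidates, tol_pct=DEFAULT_TOL_PCT):
    """Return the candidate whose predicted m/z is closest to the observed
    value and within tol_pct, or None if no candidate qualifies.

    candidates: dict mapping a modification name to its predicted m/z.
    Picking the closest match is an assumption of this sketch.
    """
    best_name, best_err = None, None
    for name, predicted_mz in candidates.items():
        err = relative_mass_error_pct(observed_mz, predicted_mz)
        if err < tol_pct and (best_err is None or err < best_err):
            best_name, best_err = name, err
    return best_name


# Hypothetical predicted values for illustration only (not measured data).
example_candidates = {"wild-type": 10000.0, "oxidized": 10016.0}

print(assign_peak(10015.0, example_candidates))  # oxidized
print(assign_peak(10008.0, example_candidates))  # None (outside 0.02%)
```

With the default tolerance, a peak halfway between two predicted masses is left unassigned rather than forced onto the nearer candidate, which matches the conservative spirit of a fixed relative cutoff.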
Peptide mapping validation experiments with trypsin-derivatized mass spectrometer probes were performed as described previously (8, 9). The digest mass spectra were acquired in reflectron mode on an Autoflex II TOF-TOF MS instrument (Bruker Daltonics).
RESULTS AND DISCUSSION
Fig. 1 summarizes the results of the current study, listing the protein modifications observed and the frequency of each in the 1,000-sample cohort. Of the 27 modifications detected, 20 were posttranslational modifications and 7 were point mutations. Wild-type protein signals were observed for each protein in all 1,000 samples (except for two individuals exhibiting homozygous transthyretin and cystatin C point mutations). The modifications ranged widely in frequency. Variants resulting from oxidation were observed most frequently across the 1,000 samples (e.g. cystatin C oxidation and transthyretin cysteinylation were observed in all 1,000 samples). Single amino acid truncations were also common among the individuals, such as the retinol-binding protein missing its C-terminal leucine residue. Least frequent were variants arising from point mutations and extensive sequence truncations (in this group were most of the transthyretin point mutations and the extensive cystatin C and retinol-binding protein truncations). In total, six modifications can be considered to be of high occurrence (observed in >80% of the samples), five are of medium frequency (20–50% of the samples), and 16 are low frequency modifications observed in <7% of the samples. Nine of the low frequency modifications were not observed in the pilot study of the 96 samples (mass spectra showing the modifications are provided in Supplemental Fig. 1). As expected, only by increasing the size of the population studied did it become possible to detect these low occurrence protein modifications. When the frequencies of the modifications in the 1,000 samples were compared with those in the 96-sample cohort study, an excellent correlation was obtained (Supplemental Fig. 2). For example, carbohydrate-deficient transferrin was observed in 66 of the 1,000 samples and in seven of the 96 samples.
Similarly the cystatin C point mutation was observed in only one of the 96 samples, whereas in the 1,000-sample cohort it was detected in 10 samples.
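The frequency bands and the cohort comparison quoted above can be reproduced with a few lines of Python. The sketch below uses only the counts given in the text; the function names are illustrative, and the "unclassified" label for values falling between the stated bands is an assumption of the sketch, not a category used in the study.

```python
def frequency_pct(count: int, cohort_size: int) -> float:
    """Percentage of samples in which a modification was observed."""
    return 100.0 * count / cohort_size


def frequency_class(pct: float) -> str:
    """Bands as stated in the text: high >80%, medium 20-50%, low <7%."""
    if pct > 80.0:
        return "high"
    if 20.0 <= pct <= 50.0:
        return "medium"
    if pct < 7.0:
        return "low"
    return "unclassified"  # falls between the stated bands (sketch-only label)


# (count in 1,000-sample cohort, count in 96-sample pilot cohort), from the text
observations = {
    "carbohydrate-deficient transferrin": (66, 7),
    "cystatin C point mutation": (10, 1),
}

for name, (n_large, n_pilot) in observations.items():
    large_pct = frequency_pct(n_large, 1000)
    pilot_pct = frequency_pct(n_pilot, 96)
    print(f"{name}: {large_pct:.1f}% vs {pilot_pct:.1f}% "
          f"({frequency_class(large_pct)} in the 1,000-sample cohort)")
```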

FIG. 1. Protein modifications observed and their frequency in the 1,000-sample cohort. Yellow, posttranslational modifications; purple, point mutations.
Because each of the 1,000 samples was characterized by its geographical origin, age, and gender, we investigated possible correlation of the observed modifications in regard to these attributes (Supplemental Fig. 3; the graphs were generated from Supplemental Table 1, which contains all the data for the 1,000 samples in a Microsoft Excel format). Several interesting observations were made. To start with, the samples obtained from California contained significantly fewer protein modifications than the samples obtained from Florida, Tennessee, and Texas. The β2-microglobulin des-Lys58 truncation, transferrin deglycosylation, and cystatin C des-SSPGKPPR truncation were not observed in the samples from California at all. Furthermore retinol-binding protein des-LL and cystatin C des-SSPG were represented only at a small percentage in the samples from California compared with the other three states. Differences in sample collection and storage conditions are both plausible explanations for this occurrence. However, the samples from all four states were collected in the same way within a 3-month window in the spring of 2005 and stored under identical conditions until analysis. Furthermore the other transthyretin and cystatin modifications were well represented in the California population. Nonetheless the observed differences in regard to the above mentioned modifications in the California population should be interpreted with reservations.
When adjustments were made for gender and age biases within each geographical group (the California population included more females than the other three states and was slightly older), correlations were discovered in regard to the gender distribution of two protein modifications. Of the 66 individuals with carbohydrate-deficient transferrin, only three were females, i.e. ~1% of the females and ~10% of the males in the entire cohort contained carbohydrate-deficient transferrin (Fig. 2). Furthermore 12 of those 66 individuals contained a carbohydrate-deficient transferrin form that lacked both glycan chains, and all 12 were male (Supplemental Fig. 3). The absence of the glycan chain(s) in transferrin can be a result of carbohydrate-deficient glycoprotein syndrome (10) and chronic alcohol consumption (11), conditions that are generally not gender-dependent. However, due to the higher prevalence of alcohol dependence in males compared with females in the United States (12), the gender bias observed in this study might be a reflection of alcohol drinking habits in the general population (carbohydrate-deficient transferrin is a Food and Drug Administration-approved clinical biomarker for alcoholism). The second gender correlation was related to cystatin C: all 10 of the cystatin C point mutations were found in males. However, at this point the source of this point mutation is unknown, and the significance of this gender bias remains unclear.
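The gender split for carbohydrate-deficient transferrin can be checked from the counts in the text (3 of 317 females, 63 of 683 males). The Python sketch below computes the two percentages and, purely as an illustration, a two-sided Fisher exact test on the resulting 2×2 table. The test is an assumption of this sketch: the text does not say which statistical procedure, if any, was applied, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative slack guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


females, males = 317, 683          # cohort composition (from the text)
cdt_females, cdt_males = 3, 63     # carriers of carbohydrate-deficient transferrin

pct_females = 100.0 * cdt_females / females
pct_males = 100.0 * cdt_males / males
p_value = fisher_exact_two_sided(
    cdt_females, females - cdt_females, cdt_males, males - cdt_males
)

print(f"females: {pct_females:.1f}%  males: {pct_males:.1f}%")
print(f"two-sided Fisher exact p-value: {p_value:.2e}")
```

The same function can be applied to the cystatin C point mutation (0 of 317 females, 10 of 683 males), where the much smaller number of carriers leaves far less room for a strong statistical statement.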
In addition to the qualitative assessments of the protein modifications in the mass spectra, information about their abundance (relative to the wild-type protein) can also be obtained. The signals (peak heights) for each protein isoform can be compared with that of the wild-type protein to determine whether the isoform is more abundant in the sample (13). Using this protein profiling method, we found that the two retinol-binding protein modifications showed the widest range of variations in regard to the wild-type protein (Fig. 3). The des-Leu form was most abundant in 216 samples, whereas the des-LL variant was most abundant in 15 samples (Supplemental Fig. 3) (increased des-Leu and des-LL levels were least abundant in the samples from California in line with the previous observation about the lower number of modifications in the California population). C-terminal truncation of retinol-binding protein can have a profound effect on its binding to transthyretin as its C terminus is nestled in a hydrophobic patch at the RBP-transthyretin complex interface (14). Furthermore elevated levels of the truncated forms of RBP have been found in patients with chronic renal failure (15).
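The peak-height comparison described above amounts to normalizing each isoform signal to the wild-type signal. A minimal Python sketch follows; the function names are illustrative and the peak heights are hypothetical placeholders, not values from the study.

```python
def relative_abundance(peak_heights: dict, wild_type: str = "wild-type") -> dict:
    """Ratio of each isoform's peak height to the wild-type peak height."""
    wt_height = peak_heights[wild_type]
    if wt_height <= 0:
        raise ValueError("wild-type peak height must be positive")
    return {
        name: height / wt_height
        for name, height in peak_heights.items()
        if name != wild_type
    }


def most_abundant_form(peak_heights: dict) -> str:
    """Name of the form (wild-type included) with the tallest peak."""
    return max(peak_heights, key=peak_heights.get)


# Hypothetical peak heights (arbitrary units) for one retinol-binding protein spectrum.
example_spectrum = {"wild-type": 1000.0, "des-Leu": 1250.0, "des-LL": 150.0}

print(relative_abundance(example_spectrum))   # {'des-Leu': 1.25, 'des-LL': 0.15}
print(most_abundant_form(example_spectrum))   # des-Leu
```

A ratio above 1 marks a sample in which the isoform signal exceeds that of the wild-type protein, which is the condition counted in the text for the des-Leu and des-LL forms.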
When all the modifications were considered,2 the number of modifications that each sample contained was in the range of 5–14, with the majority of the samples (98%) containing 11 or fewer modifications per sample (Supplemental Fig. 4). This relatively narrow range of modifications per sample indicates that the modifications were randomly spread among the individuals without significant grouping of multiprotein modifications in any sample. Furthermore gender or age correlations in the number of modifications per sample were not found.
The 1,000 samples examined were obtained from four geographically diverse states in the United States. Thus, we can assume that the data obtained in this study represent the normal distribution of these protein modifications in the general United States population. This systematic study of protein modifications and variants using MS methods of detection is the first of its kind; past large scale studies of variations at the protein level have relied on electrophoretic techniques (16). Recent human diversity studies at the genome level have helped redefine the "normal" human genome (17, 18) and have identified human genes that are mutated in cancer (19). Similar protein-based population studies, with the focus on individuals rather than entire proteome delineation, will help map all the postexpression protein modifications and determine the wild-type protein profiles and the extent of variations across and within populations. With a new definition of the normal human proteome in hand, we will be better prepared to study the effects of these modifications in pathological processes and evaluate their potential as new biomarkers of disease.
FOOTNOTES
Received, January 22, 2007, and in revised form, April 10, 2007.
Published, MCP Papers in Press, April 27, 2007, DOI 10.1074/mcp.M700023-MCP200
1 The abbreviation used is: RBP, retinol-binding protein. 
2 In calculating the modifications per sample, the carbohydrate-deficient transferrin lacking both glycan chains, the elevated amounts of retinol-binding protein isoforms, and the homozygous point mutations in transthyretin and cystatin C were counted as additional modifications, bringing the total number of modifications to 32. 
* This work was supported by grants from the National Institutes of Health (to D. N., K. A. T., and R. W. N.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact. 
S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material. 
¶ To whom correspondence should be addressed: Intrinsic Bioprobes Inc., 2155 E. Conference Dr., Suite 104, Tempe, AZ 85284. Tel.: 480-804-1778; Fax: 480-804-0778; E-mail: dnedelkov{at}intrinsicbio.com or dnedelkov{at}populationproteomics.org
REFERENCES
1. Nedelkov, D. (2005) Population proteomics: addressing protein diversity in humans. Expert Rev. Proteomics 2, 315–324
2. Nedelkov, D., Kiernan, U. A., Niederkofler, E. E., Tubbs, K. A., and Nelson, R. W. (2006) Population proteomics: the concept, attributes, and potential for cancer biomarker research. Mol. Cell. Proteomics 5, 1811–1818
3. Nedelkov, D. (2006) Mass spectrometry-based immunoassays for the next phase of clinical applications. Expert Rev. Proteomics 3, 631–640
4. Nedelkov, D., Kiernan, U. A., Niederkofler, E. E., Tubbs, K. A., and Nelson, R. W. (2005) Investigating diversity in human plasma proteins. Proc. Natl. Acad. Sci. U. S. A. 102, 10852–10857
5. Ritchie, R. F. (ed) (1999) Serum Proteins in Clinical Medicine, Foundation for Blood Research, Scarborough, ME
6. Craig, W. Y., Ledue, T. B., and Ritchie, R. F. (2001) Plasma Proteins: Clinical Utility and Interpretation, Foundation for Blood Research, Scarborough, ME
7. Niederkofler, E. E., Tubbs, K. A., Kiernan, U. A., Nedelkov, D., and Nelson, R. W. (2003) Novel mass spectrometric immunoassays for the rapid structural characterization of plasma apolipoproteins. J. Lipid Res. 44, 630–639
8. Nedelkov, D., Tubbs, K. A., Niederkofler, E. E., Kiernan, U. A., and Nelson, R. W. (2004) High-throughput comprehensive analysis of human plasma proteins: a step toward population proteomics. Anal. Chem. 76, 1733–1737
9. Kiernan, U. A., Black, J. A., Williams, P., and Nelson, R. W. (2002) High-throughput analysis of hemoglobin from neonates using matrix-assisted laser desorption/ionization time-of-flight mass spectrometry. Clin. Chem. 48, 947–949
10. Carchon, H., Van Schaftingen, E., Matthijs, G., and Jaeken, J. (1999) Carbohydrate-deficient glycoprotein syndrome type IA (phosphomannomutase-deficiency). Biochim. Biophys. Acta 1455, 155–165
11. Arndt, T. (2001) Carbohydrate-deficient transferrin as a marker of chronic alcohol abuse: a critical review of preanalysis, analysis, and interpretation. Clin. Chem. 47, 13–27
12. World Health Organization (2004) WHO Global Status Report on Alcohol, World Health Organization, Geneva
13. Kiernan, U. A., Nedelkov, D., Tubbs, K. A., Niederkofler, E. E., and Nelson, R. W. (2004) Selected expression profiling of full-length proteins and their variants in human plasma. Clin. Proteomics. J. 1, 7–16
14. Naylor, H. M., and Newcomer, M. E. (1999) The structure of human retinol-binding protein (RBP) with its carrier protein transthyretin reveals an interaction with the carboxy terminus of RBP. Biochemistry 38, 2647–2653
15. Jaconi, S., Rose, K., Hughes, G. J., Saurat, J. H., and Siegenthaler, G. (1995) Characterization of two post-translationally processed forms of human serum retinol-binding protein: altered ratios in chronic renal failure. J. Lipid Res. 36, 1247–1253
16. Neel, J. V., Satoh, C., Goriki, K., Fujita, M., Takahashi, N., Asakawa, J., and Hazama, R. (1986) The rate with which spontaneous mutation alters the electrophoretic mobility of polypeptides. Proc. Natl. Acad. Sci. U. S. A. 83, 389–393
17. International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437, 1299–1320
18. Redon, R., Ishikawa, S., Fitch, K. R., Feuk, L., Perry, G. H., Andrews, T. D., Fiegler, H., Shapero, M. H., Carson, A. R., Chen, W., Cho, E. K., Dallaire, S., Freeman, J. L., Gonzalez, J. R., Gratacos, M., Huang, J., Kalaitzopoulos, D., Komura, D., MacDonald, J. R., Marshall, C. R., Mei, R., Montgomery, L., Nishimura, K., Okamura, K., Shen, F., Somerville, M. J., Tchinda, J., Valsesia, A., Woodwark, C., Yang, F., Zhang, J., Zerjal, T., Armengol, L., Conrad, D. F., Estivill, X., Tyler-Smith, C., Carter, N. P., Aburatani, H., Lee, C., Jones, K. W., Scherer, S. W., and Hurles, M. E. (2006) Global variation in copy number in the human genome. Nature 444, 444–454
19. Sjoblom, T., Jones, S., Wood, L. D., Parsons, D. W., Lin, J., Barber, T. D., Mandelker, D., Leary, R. J., Ptak, J., Silliman, N., Szabo, S., Buckhaults, P., Farrell, C., Meeh, P., Markowitz, S. D., Willis, J., Dawson, D., Willson, J. K., Gazdar, A. F., Hartigan, J., Wu, L., Liu, C., Parmigiani, G., Park, B. H., Bachman, K. E., Papadopoulos, N., Vogelstein, B., Kinzler, K. W., and Velculescu, V. E. (2006) The consensus coding sequences of human breast and colorectal cancers. Science 314, 268–274
