Abstract
Cerebrospinal fluid (CSF) is a potential source of biomarkers for many disorders of the central nervous system, including Alzheimer disease (AD). Prior to comparing CSF samples between individuals to identify patterns of disease-associated proteins, it is important to examine variation within individuals over a short period of time so that one can better interpret potential changes in CSF between individuals as well as changes within a given individual over a longer time span. In this study, we analyzed 12 CSF samples, composed of pairs of samples from six individuals, obtained 2 weeks apart. Multiaffinity depletion, two-dimensional DIGE, and tandem mass spectrometry were used. A number of proteins whose abundance varied between the two time points were identified for each individual. Some of these proteins were commonly identified in multiple individuals. More importantly, despite the intraindividual variations, hierarchical clustering and multidimensional scaling analysis of the proteomic profiles revealed that two CSF samples from the same individual cluster the closest together and that the between-subject variability is much larger than the within-subject variability. Among the six subjects, comparison between the four cognitively normal and the two very mildly demented subjects also yielded some proteins that have been identified in previous AD biomarker studies. These results validate our method of identifying differences in proteomic profiles of CSF samples and have important implications for the design of CSF biomarker studies for AD and other central nervous system disorders.
Cerebrospinal fluid (CSF) is produced mainly by the choroid plexus within the ventricles of the brain. It circulates through the ventricular system and around the outside of the brain and spinal cord in the subarachnoid space. CSF is clinically accessible through standard lumbar puncture techniques. Because it is in direct contact with the brain interstitial fluid, biochemical changes taking place in the brain are often reflected in CSF. These features make CSF potentially a very useful source of biomarkers for diagnosis and response to treatment as well as for providing information on pathological processes underlying a number of central nervous system disorders, including Alzheimer disease (AD) (for reviews, see Refs. 1 and 2). Based on known pathology and the hypothesized etiology of AD, a few AD biomarkers in CSF have been identified that may differentiate groups of individuals with clinical disease from those who are cognitively normal, including amyloid β (Aβ) 42, total tau, phospho-tau, isoprostane, and sulfatide (3–7). Although some candidate biomarkers have shown promise, each has its limitations. Researchers are still searching for biomarkers that have a higher sensitivity and specificity to differentiate AD from normal aging and other dementias, especially in the early stages. It would also be especially useful to develop antecedent biomarkers for AD, considering its estimated preclinical phase of 10–20 years (8).
New proteomics technologies, notably development of more sensitive and accurate MS techniques, have improved the ability to discover new disease biomarkers. A classical approach involves comparison of the CSF proteomes of AD patients and controls with 2-DE and subsequent identification of differentially expressed proteins/peptides with MS. There are only a few published comparative studies using 2-DE/MS to analyze CSF proteins in AD (9–11); one study (12) utilized SELDI-TOF-MS, and one study used liquid chromatography and ICAT (13, 14) to analyze the CSF proteome of AD and control groups (15). However, a very important issue has not been addressed in previous studies, namely the intraindividual variation in the CSF proteome. Knowledge of fluctuations in protein abundance within an individual over a short period of time (i.e. weeks) will allow one to better interpret potential changes in CSF samples observed between individuals as well as changes within a given individual over a longer time span (i.e. years), an issue especially meaningful for AD because of its estimated preclinical phase lasting 10–20 years and its slow progression (8).
Variability between gels is a serious shortcoming of conventional 2-DE. Significant variation in the presence and patterns of protein spots occurs between different gels even from identical samples. Two-dimensional (2-D) DIGE minimizes this limitation by enabling multiple samples to be analyzed on the same gel (16, 17). In fluorescence 2-D DIGE, each sample is labeled with one of three spectrally distinguishable fluorescent dyes (i.e. Cy2, Cy3, and Cy5) before combining and analyzing on the same 2-D gel. The Cy dyes are covalently attached to proteins via lysine residues prior to electrophoresis. The dyes are matched for molecular weight, and the positive charge originally associated with the free lysine residue in proteins is replaced with the quaternary amino group in the dye molecule. Labeled proteins migrate at similar isoelectric points and vary only slightly in size from their original state. As a result, up to three samples can be analyzed on the same gel, allowing more reliable determination of differential expression. In addition, inclusion of an internal standard of pooled samples on all gels allows for improved intergel alignment of gel features and relative quantification of spot volumes (16). Although there have been numerous examples of the application of 2-D DIGE technology to detect protein differences (18–25), to our knowledge, there are no published reports using this technique to address inter- and intraindividual variability in the CSF proteome.
The main objective of this study was to evaluate variability in the CSF proteome associated with longitudinal collection of CSF from individuals. Utilizing single gel, multiple image analysis, we were able to identify within-subject differences with a high degree of confidence. By including a pooled sample in every gel as an internal standard, we were able to match and perform relative quantification of spots across gels and compare the degree of within-subject variation with that of between-subject variation. This study is an important component in our long term research program to identify biomarkers for preclinical and very mild AD.
EXPERIMENTAL PROCEDURES
CSF Sampling
Twelve human CSF samples were obtained by lumbar puncture (LP) from six subjects enrolled in the Memory and Aging Project at Washington University as part of an ongoing biomarker study. The study protocol was approved by the Human Studies Committee at Washington University, and we obtained written and verbal informed consent from participants at enrollment. Our consent form is available upon request. Two samples were obtained from the same individual 2 weeks apart. We chose 2 weeks for practical reasons. Following the first LP, we wanted the skin, subcutaneous tissue, and meninges to have adequate time for repair prior to a second LP. Also we were interested in examining short term variability (over days or weeks) rather than long term variability (over months or years). All LPs were performed at the same time of the day with no fasting requirement. Samples were taken to preserve any gradient that existed and were collected with a 25-gauge needle. A total of 30 ml of CSF was obtained: the first 20 ml was collected for other purposes, and the last 10 ml was used for this analysis. All CSF samples were free of blood contamination. After collection, CSF samples were briefly centrifuged at 1,000 × g to pellet any cell debris, frozen, and stored in polypropylene tubes at −80 °C in 0.5-ml aliquots until analysis. The age of these six individuals ranged from 64 to 91 years. Four of them had a Clinical Dementia Rating (CDR) score of 0 (cognitively normal) (26), and two of them were rated as CDR 0.5 (very mild dementia, believed to be clinically due to AD). The protein content in each CSF sample was determined with the micro-BCA protein assay kit (Pierce), and it ranged from 570 to 1,000 μg/ml.
Multiaffinity Immunodepletion of CSF Proteins
Because albumin, IgG, α1-antitrypsin, IgA, transferrin, and haptoglobin collectively account for ∼80% of the total CSF protein content (27), we selectively removed these proteins to enrich for proteins of lower abundance. An antibody-based multiaffinity removal system (Agilent Technologies, Palo Alto, CA) was used according to the manufacturer’s instructions. Briefly 1.5–2 ml of CSF was concentrated and buffer-exchanged with Agilent Buffer A to a final volume of ∼50 μl using Amicon Ultra-4 centrifugal filter units (10-kDa cut-off) (Millipore). Samples were then diluted to 200 μl with Buffer A and passed through an Ultra-free MC microcentrifuge filter (0.22 μm) (Millipore) to remove particulates. The filtrate was injected at 0.25 ml/min onto a 4.6 × 50-mm multiple affinity removal column equilibrated at room temperature with Agilent Buffer A on a Microtech (Vista, CA) Ultra-Plus HPLC system. CSF devoid of high abundance proteins (flow-through) was collected between 1.5 and 6 min. After 9 min of elution with Buffer A, the eluant was changed to Agilent Buffer B at 1 ml/min. The six bound proteins were eluted from the column between 10 and 14 min. After 3.5 min, the column was regenerated with Buffer A.
2-D DIGE
Depleted CSF samples were buffer-exchanged and concentrated with lysis buffer (30 mM Tris-Cl, pH 7.8, 7 M urea, 2 M thiourea, 4% CHAPS containing protease inhibitors (catalog number 697498, Roche Diagnostics) and phosphatase inhibitors (catalog numbers 524624 and 524625, EMD Biosciences, Darmstadt, Germany)) using Amicon Ultra-4 centrifugal filter units (10-kDa cut-off) (Millipore). The protein concentration was determined with a modified Lowry method (PlusOne 2D-Quant kit, Amersham Biosciences). Fifty micrograms of protein from each sample was labeled with 400 pmol of one of three N-hydroxysuccinimide cyanine dyes for proteins (Amersham Biosciences), diluted with rehydration buffer (7 M urea, 2 M thiourea, 4% CHAPS, 2.5% DTT, 10% isopropanol, 5% glycerol, and 2% Pharmalyte pH 3–10), combined according to experimental design, and equilibrated with IPG strips (24 cm; pH 3–10, nonlinear). The three samples that were equilibrated with each IPG strip consisted of two depleted CSF samples from the same individual (Cy2 and Cy5) and a pooled sample (pooled using an equal volume aliquot of each of the 12 CSF samples) (Cy3) as the internal standard. First dimension isoelectric focusing was performed at 65.6 kV-h in an Ettan IPGphor system (Amersham Biosciences). The strips were then treated with reducing and alkylating solutions prior to the second dimension (SDS-PAGE). After equilibration with a solution containing 6 M urea, 30% glycerol, 2% SDS, 50 mM Tris-Cl, pH 7.8, 32 mM DTT, the strips were treated with the same solution containing 325 mM iodoacetamide instead of DTT. The strips were overlayered onto a 10% isocratic or gradient SDS-PAGE gel (20 × 24 cm), immobilized to a low fluorescence glass plate and electrophoresed for ∼18 h at 1 watt/gel. The Cy2-, Cy3-, and Cy5-labeled images were acquired on a Typhoon 9400 scanner (Amersham Biosciences) at the excitation/emission values of 488/520, 532/580, 633/670 nm, respectively.
Image Analyses
Intragel spot detection and quantification and intergel matching and quantification were performed using Differential In-gel Analysis (DIA) and Biological Variation Analysis (BVA) modules of DeCyder software version 5.01 (Amersham Biosciences) as described previously (16, 17). Briefly in DIA, the Cy2, Cy3, and Cy5 images for each gel were merged, spot boundaries were automatically detected, and normalized spot volumes (protein abundance) were calculated. During spot detection, the estimated number of spots was set at 3,500, and the exclude filter was set as follows: slope, >1.1; area, <100; peak height, <100; and volume, <10,000. This analysis was used to calculate abundance differences in given proteins between two samplings from the same individual. The resulting spot maps were exported to BVA. Matching of the protein spots across six gels was performed after several rounds of extensive land marking and automatic matching. Dividing each Cy2 or Cy5 spot volume with the corresponding Cy3 (internal standard) spot volume within each gel gave a standard abundance, thereby correcting intergel variations. For each of the CSF samples, a profile was created that consisted of standard abundance for all of the matched spots.
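The internal-standard normalization described above can be illustrated with a minimal Python sketch. It divides each sample spot volume by the Cy3 (pooled standard) volume of the same spot on the same gel. The function name, the dictionary layout, and the spot volumes are hypothetical and are not taken from the DeCyder software.

```python
def standard_abundance(sample_volumes, standard_volumes):
    """Divide each sample spot volume by the internal-standard (Cy3)
    volume of the same spot on the same gel."""
    return {
        spot: volume / standard_volumes[spot]
        for spot, volume in sample_volumes.items()
        # Skip spots that have no usable internal-standard volume.
        if spot in standard_volumes and standard_volumes[spot] > 0
    }


# Hypothetical normalized spot volumes for one gel (arbitrary units).
cy3_pool = {"spot_1": 20000.0, "spot_2": 50000.0}
cy2_t1 = {"spot_1": 30000.0, "spot_2": 25000.0}

profile_t1 = standard_abundance(cy2_t1, cy3_pool)
print(profile_t1)
```

Because every gel carries the same pooled standard, ratios computed this way are comparable across gels even when absolute spot volumes differ from gel to gel.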
Protein Digestion and Mass Spectrometry
Gel features were selected in the DeCyder software and the X and Y coordinates were saved in a file for spot excision. After translation using in-house software (Imagemapper), the central core (1.8 mm) of the selected gel features was excised with a ProPic robot (Genomics Solutions, Ann Arbor, MI) and transferred to a 96-well PCR plate. The gel pieces were then digested in situ with trypsin using a modification of a published method (28). To maximize specificity, sensitivity, and sequence coverage of the digested proteins, the resulting peptide pools were analyzed by tandem MS using both MALDI and ESI. Spectra of the peptide pools were obtained on a MALDI-TOF/TOF instrument (Proteomics 4700, Applied Biosystems, Foster City, CA) (29). The initial spectra were used to determine the molecular weights of the peptides (to within 20 ppm of their theoretical masses). Selected precursor ions were then focused in the instrument using a timed ion selector (29), and peptide fragmentation spectra were produced after high energy (1.5-keV) collision-induced dissociation. ESI-MS was performed using an advanced capillary LC-MS/MS system (Eksigent nano-LC 1D Proteomics, Eksigent Technologies, Livermore, CA). A nanoflow (200 nl/min) pulse-free liquid chromatograph was interfaced to a quadrupole time-of-flight mass spectrometer (Q-STAR XL, Applied Biosystems) using a PicoView system (New Objective, Woburn, MA). Sample injection was performed with an Endurance autosampler (Spark Holland, Plainsboro, NJ). The peptide fragmentation spectra were processed using Data Explorer version 4.5 or Analyst software (Applied Biosystems). After centroiding and background subtraction, the peak lists were used to search databases with MASCOT version 1.9 (Matrix Sciences, Boston, MA). Peptide sequences were qualified by manual interpretation of raw non-centroided spectra.
Statistical Analyses
Threshold Selection—
The DIA software performs a log transformation of the volume ratios and uses them to generate a frequency histogram. A normal distribution is fitted to the main peak of the frequency histogram. After normalization, this fitted distribution curve centers on 0, which represents proteins with unaltered abundance. Model standard deviation (S.D.) is then derived based on the normalized model curve. 2 S.D., the volume ratio for 2 S.D. based on the raw data, is the software-recommended cut-off. In a normally distributed data set, 95% of data points would fall within this value. Based on the observation that 2 S.D. ranged from 1.31 to 1.52 for the six comparisons, gel features changing by >1.5 in spot volume were considered significant.
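The threshold logic can be sketched as follows. This is not the DIA algorithm: as a stand-in for fitting a normal curve to the main histogram peak, the sketch assumes a robust S.D. estimate (1.4826 times the median absolute deviation), and it assumes base-10 logarithms. The function names and the example log ratios are hypothetical.

```python
import statistics


def two_sd_fold_threshold(log10_ratios):
    """Estimate the fold change that corresponds to 2 model S.D.

    Assumption: a robust S.D. (1.4826 * median absolute deviation) stands in
    for the S.D. of the normal curve fitted to the main histogram peak.
    """
    center = statistics.median(log10_ratios)
    mad = statistics.median([abs(r - center) for r in log10_ratios])
    model_sd = 1.4826 * mad
    # Convert 2 S.D. on the log10 scale back to a volume ratio.
    return 10 ** (2 * model_sd)


def is_changed(volume_ratio, threshold=1.5):
    """True when a spot changes by more than `threshold`-fold in either direction."""
    fold = max(volume_ratio, 1.0 / volume_ratio)
    return fold > threshold


# Hypothetical log10 volume ratios (T2 / T1) for a handful of spots.
example_log_ratios = [-0.10, -0.05, 0.0, 0.05, 0.10]
print(two_sd_fold_threshold(example_log_ratios))
print(is_changed(0.5), is_changed(1.2))
```

Using a single cut-off of 1.5 for all six comparisons, rather than a per-gel 2 S.D. value, keeps the selection criterion identical across individuals.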
p Value Determination for Intraindividual Variation—
We estimated the statistical significance of observing different levels of the same protein in multiple intraindividual comparisons by describing the data as a binomial distribution and calculating the probability of the observed events. Our null hypotheses are as follows: 1) all intraindividual comparison experiments are independent from each other, and 2) in any intraindividual comparison, protein levels should not change; therefore any observed change should be random and represent system fluctuation rather than a property of an individual protein. For any given experiment (intraindividual comparison) that follows the null hypotheses, the probability of any protein changing its expression level is pc. This value can be estimated by maximum a posteriori estimation; i.e. based on the observed number of protein spots detected in a given gel and the observed number of spots determined to have altered abundance (i.e. having a >1.5 spot volume ratio) between the two time points in an intraindividual comparison, we calculated pc such that the probability of observing the experimental data given pc is maximized. In N independent trials (in this case six intraindividual comparisons), the probability of observing the same protein having changed abundance in n or more individuals is the upper tail of the binomial distribution:

\[ P(X \ge n) = \sum_{k=n}^{N} \binom{N}{k} \, p_c^{\,k} \, (1 - p_c)^{N-k} \]
pc is determined to be 0.0140, and N is 6. Because not all of the “changed” spots are identified by MS/MS, this p value is likely an underestimation of the significance.
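As a worked check of this calculation, the sketch below evaluates the upper tail of a binomial distribution with the reported values pc = 0.0140 and N = 6. It assumes that the p value is the standard binomial tail probability of a change in n or more of the N comparisons; the function name is hypothetical.

```python
from math import comb


def prob_changed_in_n_or_more(n, trials=6, p_c=0.0140):
    """Upper-tail binomial probability that a given protein appears as
    changed in n or more of `trials` independent comparisons."""
    return sum(
        comb(trials, k) * p_c**k * (1 - p_c) ** (trials - k)
        for k in range(n, trials + 1)
    )


for n in range(1, 7):
    print(n, prob_changed_in_n_or_more(n))
```

Under these assumptions the probability is roughly 0.08 for n = 1 and roughly 0.003 for n = 2, which agrees with a change shared by two or more individuals falling below a 0.05 cut-off while a change in a single individual does not.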
Hierarchical Clustering and Multidimensional Scaling Analysis—
Hierarchical clustering was performed using Spotfire (Spotfire, Somerville, MA) software. Unweighted pair group method with arithmetic mean (UPGMA) was selected as the clustering method, and Euclidean distance was selected as the similarity measure. BRB Arraytools (linus.nci.nih.gov/BRB-ArrayTools.html) were used for multidimensional scaling analysis. Euclidean distance was used to measure similarity.
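For readers without access to Spotfire, the clustering step can be approximated with a small pure-Python sketch of UPGMA (average linkage) on Euclidean distances. This is an illustration of the method, not the Spotfire implementation; the function names, sample labels, and toy profiles are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma(profiles):
    """Agglomerative clustering with average linkage (UPGMA).

    `profiles` maps a sample name to its list of standard abundances.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    def avg_dist(c1, c2):
        # UPGMA distance: mean of all pairwise distances between members.
        total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merges.append((c1, c2, avg_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical toy profiles (three matched spots per sample).
toy_profiles = {
    "S1_T1": [1.0, 2.0, 0.5],
    "S1_T2": [1.1, 1.9, 0.5],
    "S2_T1": [0.4, 0.8, 1.6],
    "S2_T2": [0.5, 0.9, 1.5],
}
merge_history = upgma(toy_profiles)
for left, right, dist in merge_history:
    print(left, right, round(dist, 3))
```

In this toy example the two time points of each hypothetical subject merge before any cross-subject merge occurs, which is the pattern the clustering analysis is designed to test for.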
RESULTS
Within-gel Analyses: Intraindividual Variations—
Because high abundance proteins comprise a large portion of the protein content of CSF and thus dominate 2-D gel images, we removed these proteins to increase our ability to detect and quantify proteins of lower abundance. We used a column-based multiple immunoaffinity system to remove six proteins (albumin, IgG, α1-antitrypsin, IgA, transferrin, and haptoglobin) from human CSF. This depletion technique has been shown in a proteomic study on serum to be superior to three other similar depletion methods and resulted in a 76% increase in the number of protein spots detected (30). When loading the same amount of total protein, our CSF study showed a 99% increase in the number of spots detected (Fig. 1; quantitative data not shown): up to ∼2,100 spots were detected on a gel. Fig. 2 is a representative 2-D DIGE image of a postdepletion CSF sample used in this study.
2-D DIGE analysis of CSF prior to and following depletion of high abundance proteins. The same amount of protein (19 μg) in CSF prior to depletion (A) and following depletion (B) and in the retained proteins (C) was labeled with Cy2 (blue), Cy3 (green), and Cy5 (red), respectively, and analyzed on a single gel (10% isocratic SDS-PAGE gel). D, overlay of all three fluorescent images demonstrates the position of the depleted proteins (pink) with respect to the low abundance proteins revealed by the depletion method.
A representative 2-D DIGE image (Cy2-labeled) of CSF that has been depleted of six high abundance proteins. 50 μg of protein was labeled and resolved first on a pH 3–10 IPG strip and further separated on a 10–20% gradient SDS-PAGE gel.
Two CSF samples were taken from the same individual 2 weeks apart (time point 1 (T1) and time point 2 (T2)) and were analyzed on the same gel together with a pooled sample as an internal standard. Six individuals were analyzed, translating into 12 CSF samples and six gels. The internal standard was a pool that included an equal volume aliquot of all 12 CSF samples. Pairwise comparisons of individual spots were made between T1 and T2 samplings for each individual to identify differences in protein abundance. A number of spots were selected for MS/MS analysis for each individual based on the following two criteria: 1) the -fold change of the spot volume between the two time points was greater than 1.5 (see “Experimental Procedures” for threshold selection), and 2) the protein spot was well resolved and appeared as a symmetrical peak. A total of 104 spots met these criteria, ranging from eight to 34 spots per gel, and 73 of them were identified by MS/MS (Table I; see Supplemental Table 3S for peptide coverage maps).
Proteins identified by MS/MS that vary in abundance within individuals over a 2-week time period
Column 2 is the GenBank™ accession number. “Number” indicates the number of individuals that share a greater than 1.5-fold change. Column 4 is the -fold change in protein abundance comparing time 2 with time 1. Direction of the change was ignored, and only the absolute value of the -fold change was used for calculation. Average ± S.D. were given for changes that are shared among several individuals as well as for proteins for which several isoforms have been identified in the same individual. “p value” indicates the statistical significance of a protein differing in abundance between the two time points in one or more individuals.
We found that there is a certain degree of variability in the CSF proteome detected within an individual over a 2-week period. Representative gel images and 3-D representations of one such protein are shown in Fig. 3. Although many protein constituents in the CSF are derived from plasma, some of these changed proteins (Table I) have been shown to be enriched in CSF (31) or derived from nervous tissue, such as prostaglandin D2 synthase, transthyretin, apolipoprotein E (apoE), chromogranin A, chromogranin B, semaphorin L, and scrapie-responsive protein 1. Interestingly a number of the same proteins are found to vary in multiple individuals. We calculated the p value of a given protein appearing as changed by chance in one or more individuals (Table I) and found that it is statistically significant when a protein is shown as changed in two or more individuals (using a cut-off p value of 0.05). The protein found to vary most often is transthyretin (six of six individuals). Other commonly variable (isoforms of) proteins include chromogranins A and B, complement component 3 and C4A, hemopexin precursor, osteopontin a, β fibrinogen precursor, apoE, semaphorin L, prostaglandin D2 synthase, α2-macroglobulin, fibulin precursor, and ubiquitin. Aprotinin, a synthetic peptide present in the lysis buffer, appeared as a changed protein. Interestingly transthyretin and apoE have been shown to promote the solubility, transport, and clearance of Aβ, a molecule important in the pathogenesis of AD (32). For transthyretin, multiple isoforms were found to vary within individuals (in the same direction for a given individual), whereas only one isoform of apoE was found to vary. In fact, our apoE ELISA data showed that total apoE level does not vary significantly within individuals (data not shown). The direction of the intraindividual changes (increase versus decrease when comparing the initial and subsequent CSF sampling) did not show any trends among individuals.
These findings strongly suggest that intraindividual variations in the CSF proteome reflect the dynamic steady states of those proteins.
Representative gel images and 3-D representations of one of the apoE spots that displayed intraindividual variation. Shown here are the data from Subject 2. There is a 3.1-fold change of the levels of this apoE spot between the two time points.
Across-gel Analyses: Profile Comparison—
An important goal of this study was to quantify the degree of intraindividual variation in relation to variation between individuals. For that purpose, we matched protein spots across all six gels so that a proteomic profile containing matched spots can be created for each sample, allowing for statistical comparisons to be made between individual profiles.
One advantage of 2-D DIGE is the ability to perform intergel matching and comparison through the inclusion of an internal standard on each gel (16). Nevertheless the assumption in using 2-D gel image analysis to measure relative protein concentration is that matched spots (spots that are located at the same position in different gels) correspond to the identical protein. We tested this assumption by first matching all six gels using one of the pooled samples (internal standard) as a master image and were able to match 306 protein spots across all six gels. We selected 16 matched spots that were well resolved and distributed across the entire gel and analyzed them with MS/MS (see Supplemental Table 1S). For 14 spots, we obtained protein identifications from more than two gels. For 11 of the 14 spots, the identified proteins were the same. Examination of the 3-D gel images revealed that, for the three spots that gave different proteins, there was evidence (e.g. a ridge) of another protein underneath the most prominent gel feature or matching/picking aberrations (see Supplemental Fig. 1S). Our conclusion is that matched, well resolved gel features correspond to the same protein (see Supplemental Fig. 1S).
After intergel matching, we were able to generate a proteomic profile that consisted of standard abundance (i.e. -fold change in protein abundance compared with the pooled internal standard) for matched spots for each of the 12 CSF samples. Because standard abundance is derived using the spot volume of the pooled sample as a denominator, it can be considered as the relative abundance of a protein spot. In this dataset, we applied two statistical analyses to assess the relationship among the CSF samples. First, we performed hierarchical clustering analysis on proteomic profiles. Hierarchical clustering orders objects in a treelike structure based on similarity (i.e. in this case, “pattern similarity”). Clustering analysis is used extensively in the mining of gene expression data generated by functional genomic studies (33–35), but its application to protein expression data remains limited. Through the inclusion of an internal standard, we were able to create a dataset that is very similar to datasets derived from GeneChip or microarray experiments and therefore allowed for the application of clustering algorithms to 2-D gel image analysis. The results, including a dendrogram and a heat map, are presented in Fig. 4. The dendrogram reveals that each pair of intraindividual CSF samples (T1 and T2) was clustered the closest together. As can be visualized in the heat map, the proteomic profiles of intraindividual samples are most similar to each other and distinctively different from other individuals’ profiles. The two subjects with very mild AD (i.e. CDR 0.5) did not cluster the closest together, indicating that there are no obvious global changes of protein expression between the CDR 0.5 and CDR 0 samples in this initial study.
To rule out the possibility that the two samplings from the same subject cluster together simply because they were run on the same gel, we analyzed T1 samples from two subjects on one gel and T2 samplings from these two subjects on another gel. Clustering analysis of the proteomic profiles showed that longitudinal samples from the same subject still cluster the closest together (data not shown). This further demonstrates the validity of the intergel comparison.
Hierarchical clustering of the 2-D DIGE profiles of 306 matched protein spots from the 12 CSF samples from six individual subjects at time 1 (T1) and 2 weeks (T2). Each 2-D DIGE profile (column) contains 306 matched protein spots. Lines correspond to individual proteins, and colors represent their standard abundance after a log transformation and Z-score normalization (red = more abundant; green = less abundant). The CDR 0.5 samples are marked with an asterisk. Spotfire software was used to generate the cluster tree and the heat map. Distance in the cluster tree depicted here is not a reflection of similarity or strength of association.
Second, we used multidimensional scaling analysis. Like hierarchical clustering, multidimensional scaling is commonly used in microarray research as an analysis tool. Its purpose is to capture as much variation in the data as possible in a minimal number of dimensions (two or three) so that trends in data are more obvious. In effect, one is attempting to reduce the dimensionality of the data to summarize the most influential (i.e. defining) components while simultaneously filtering out noise. The distance between two samples can be thought to represent the similarity between them; the smaller the distance, the more similar the samples. Fig. 5 is a 2-D projection of the 3-D scatter plot of the proteomes of the 12 CSF samples. X, Y, and Z coordinates of the data points are provided in Supplemental Table 2S. CSF samples from the same subject are represented with the same color. As can be observed from the graph, there is a short distance between two samples from the same individual, and this distance varies among individuals. However, the distance between intraindividual CSF samples is much smaller than between samples from different individuals. These data highlight that intraindividual variation in the CSF proteome is much smaller than the interindividual variation.
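The comparison of within-subject and between-subject distances can also be made numerically, without the dimensionality reduction. The sketch below computes mean Euclidean distances directly from the profiles; it is a simplified illustration rather than the multidimensional scaling procedure itself, and the function names, subject labels, and toy profiles are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def within_and_between(profiles):
    """Return (mean within-subject distance, mean between-subject distance).

    `profiles` maps (subject, time_point) to a list of standard abundances.
    """
    within, between = [], []
    for (key_a, prof_a), (key_b, prof_b) in combinations(profiles.items(), 2):
        distance = euclidean(prof_a, prof_b)
        # Same subject label means the pair is an intraindividual comparison.
        (within if key_a[0] == key_b[0] else between).append(distance)
    return mean(within), mean(between)


# Hypothetical toy profiles (three matched spots per sample).
toy = {
    ("S1", "T1"): [1.0, 2.0, 0.5],
    ("S1", "T2"): [1.1, 1.9, 0.5],
    ("S2", "T1"): [0.4, 0.8, 1.6],
    ("S2", "T2"): [0.5, 0.9, 1.5],
}
mean_within, mean_between = within_and_between(toy)
print(mean_within, mean_between)
```

A ratio of the two means gives a single summary of how much larger interindividual variation is than intraindividual variation for a given set of matched spots.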
Multidimensional scaling analysis of the 2-D DIGE profiles of 12 CSF samples from six individual subjects at time 1 (T1) and then 2 weeks later (T2). A 2-D projection of the 3-D scatter plot is shown. The proteomic profile of each sample is represented by a point. The axes correspond to the first three principal components. A single color has been used to label two intraindividual CSF samples. Coordinates of the spots are provided in the supplemental information.
Comparison of the CDR 0 and CDR 0.5 Groups—
Because our sample set included four CDR 0 subjects (eight CSF samples) and two CDR 0.5 subjects (four CSF samples), we were able to evaluate possible CDR group-associated differences. As mentioned earlier, 306 spots were matched across six gels/12 individual CSF samples. For each spot in a given CSF sample, there is a corresponding standard abundance (representing the relative abundance of this protein spot) that can be compared across the 12 samples. Therefore, we were able to perform a t test (in the BVA module of the DeCyder software) on the matched spots by designating the eight CDR 0 CSF samples as group 1 and the four CDR 0.5 samples as group 2. We selected the top ranking (p < 0.02) and well resolved spots and subjected them to MS/MS analysis. Eleven of the 13 spots selected were found to represent eight different proteins (Table II) (see Supplemental Table 4S for peptide coverage maps). Interestingly four of these proteins have been shown to have altered levels in AD CSF, including α1β-glycoprotein, prostaglandin D2 synthase, cystatin C, and β2-microglobulin (9, 10, 12). Consistent with previous studies, α1β-glycoprotein is decreased in the CDR 0.5 group, whereas cystatin C and β2-microglobulin are increased. Prostaglandin D2 synthase was found to increase in the CDR 0.5 group, which is consistent with a previous study (36), whereas another study reported a decrease of prostaglandin D2 synthase (10). We observed an increase in thioredoxin level in the CDR 0.5 group, whereas Lovell et al. (37) reported a decrease in this protein in AD brain by Western blot. Intriguingly several isoforms of chitinase 3-like 1, also known as GP-39 cartilage protein (38), were found to be increased in the CDR 0.5 group. This protein is primarily produced by human chondrocytes and synovial fibroblasts and has been shown to be a target antigen in patients with rheumatoid arthritis (39).
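The group comparison for a single matched spot can be sketched as a two-sample Student t statistic with pooled variance. This is an assumption about the form of the test: the exact variant used by the BVA module, and whether it operates on log-transformed standard abundances, is not stated here. The function name and the abundance values are hypothetical, and the sketch stops at the t statistic and degrees of freedom because the Python standard library does not provide the t distribution needed for a p value.

```python
import math
from statistics import mean, variance


def student_t(group1, group2):
    """Two-sample Student t statistic (equal variances assumed).

    Returns (t, degrees_of_freedom).
    """
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    # Pooled sample variance across both groups.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error, df


# Hypothetical standard abundances of one spot in the two groups.
cdr0 = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.0]   # eight CDR 0 samples
cdr05 = [1.5, 1.6, 1.4, 1.5]                       # four CDR 0.5 samples

t_stat, dof = student_t(cdr0, cdr05)
print(t_stat, dof)
```

With eight and four samples the test has 10 degrees of freedom. Note that the sketch treats the two samples from each subject as independent observations, as the grouping described above does.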
CSF proteins (identified by MS/MS) found to be differentially expressed between CDR 0.5 and CDR 0 groups
Column 1 is the protein spot ID from 2-D DIGE. For spots 1 and 5, two proteins were identified. Column 2 is the protein ID. The protein that had the higher matching score (when MS/MS results were used to search the protein databases) is listed first. Column 3 is the GenBank™ accession number. Column 4 is the statistical significance (p value) of a group-associated difference in a given protein. Column 5 is the direction of change (increase vs. decrease) when using the CDR 0 group as baseline.
DISCUSSION
In this study, we assessed intraindividual variation in the CSF proteome by analyzing a unique set of CSF samples. The use of proteomic techniques, including multiaffinity depletion, 2-D DIGE, and MS/MS, results in well resolved protein spots that span a large dynamic range, reliable quantification of protein level, and accurate protein identification. We discovered CSF proteins whose abundances fluctuate even within the same individual over a short time span as well as observed a greater similarity in the proteomic profiles between samples from the same individual compared with samples from different individuals. The presence of samples from both cognitively normal (CDR 0) subjects and very mildly demented (CDR 0.5) subjects in our collection enabled us to begin to preliminarily examine group-associated differences, i.e. potential biomarkers for very mild AD.
A number of intraindividual variations were identified. More importantly, some of these variations were common to multiple individuals, a finding that we believe reflects the intrinsic biological properties of those CSF proteins. There have been some studies on the diurnal variation in CSF neurotransmitter concentrations (40); however, to our knowledge, there are no reports that systematically study the normal variability in CSF protein abundance within individuals over a short period of time (e.g. days or weeks). Our data suggest that the levels of some (isoforms of) proteins in CSF tend to fluctuate more significantly than others due to the nature of their metabolism as opposed to standard errors in experiments. Such intraindividual protein abundance variations are a caveat when considering these proteins as potential disease-related biomarkers. Indeed some of these proteins, such as transthyretin and ubiquitin, have been shown to have altered levels in AD CSF samples and thus were suggested as candidate biomarkers. Our results may clarify discrepancies among AD biomarker studies. Some studies have reported reduced levels of transthyretin in AD CSF (41–44), whereas one study found no significant differences between AD and control groups (45), and one study has reported increased levels of transthyretin in the CSF of AD patients (9). A similar inconsistency was found with ubiquitin (increases (46–48), no change (49), and decreases (9)). Our results suggest that intraindividual variation is a significant factor in the biomarker discovery and translational process. 
Our results highlight a number of issues and options in choosing proteins from proteomic studies for biomarker validation studies. When a candidate protein displays substantial intraindividual variation, one can 1) simply discard the candidate, 2) redesign the experiment, for example by expanding the sample size to offset the intraindividual variation, or 3) pursue it further with more confidence if the interindividual variation is significantly larger than the intraindividual variation. The results indicate that longitudinal measurement of biomarker levels from individuals may be critical to the demonstration of clinical utility.
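One way to judge whether interindividual variation exceeds intraindividual variation for a given candidate is to separate the two variance components from paired samples. The following is a minimal sketch under stated assumptions: it uses a simple one-way random-effects decomposition for duplicate measurements, which is not necessarily the analysis used in this study, and the abundance values are hypothetical.

```python
from statistics import variance


def variance_components(pairs):
    """Within- and between-subject variance from (T1, T2) pairs, one pair per subject."""
    n = len(pairs)
    # Within-subject variance for duplicate measurements: sum(d^2) / (2n)
    within = sum((t1 - t2) ** 2 for t1, t2 in pairs) / (2 * n)
    # The variance of subject means estimates (between + within / 2)
    subject_means = [(t1 + t2) / 2 for t1, t2 in pairs]
    between = max(variance(subject_means) - within / 2, 0.0)
    # Fraction of total variance attributable to differences between subjects
    total = between + within
    icc = between / total if total > 0 else 0.0
    return within, between, icc


# Hypothetical standardized abundances of one spot in six subjects at T1 and T2
pairs = [(1.00, 1.10), (1.50, 1.40), (0.80, 0.90),
         (2.00, 2.10), (1.20, 1.10), (1.70, 1.80)]

within, between, icc = variance_components(pairs)
print(f"within={within:.4f} between={between:.4f} icc={icc:.3f}")
```

A candidate whose between-subject component dominates (a ratio near 1) would fall under option 3, whereas one dominated by the within-subject component would argue for option 1 or 2.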
In our collection of six subjects, four were cognitively normal (CDR 0), and two were very mildly impaired (CDR 0.5, likely very mild AD). Comparison of the proteomic profiles revealed CDR group-associated differences (Table II). Although the sample size is very small, five of eight differentially expressed proteins identified by MS/MS have been implicated in previous AD biomarker studies, and for three of them, our results (increase versus decrease) are consistent with previous reports. Importantly only one of these proteins (i.e. prostaglandin D2 synthase) was identified to display intraindividual variation. These preliminary differences need to be validated with a much larger sample set; however, these results suggest that, given appropriate sample selection, protein quantification, and profile comparison, one may not require a large set of samples to identify disease-related biomarkers.
There are only a few comparative proteomic studies on AD biomarkers in CSF. Three of them used a 2-DE approach that couples conventional 2-DE to MS. A total of 14 putative biomarkers were identified with some overlap between studies (for a review, see Ref. 15). One study utilized SELDI-TOF-MS and identified four putative biomarkers (12). A recent study used liquid chromatography and ICAT to resolve and quantitatively analyze the CSF proteome of AD and control groups (15). A long list of proteins was identified that have altered abundance in AD versus control groups. However, the small overlap in proteins identified between repetitive ICAT runs (∼25%) and between sample sets from two institutions (∼30%) questions which of the differences are biological versus experimental. The current study is the first report on applying multiaffinity depletion and 2-D DIGE to the proteomic analysis of CSF samples. As a result of utilizing these new technologies, we can better resolve the CSF proteome and quantify the differences. The ability to carry out more reliable intergel comparison allows the application of statistical tools to extract signature patterns containing diagnostic or functional information.
One limitation in our methodology is that the number of spots that were matched across all gels is low. The number of spots detected in each of the six gels ranged from 1,646 to 2,106; however, only 306 spots were matched across all 18 images. The main reason is the inconsistency of the 2-D gel methods resulting in image artifacts, such as inadequate resolution, vertical and horizontal streaking, and particularly, local geometric distortions. Although 2-D DIGE was developed to minimize the effects of such inconsistencies, intergel image alignment remains difficult. The partial matching of spots likely results in the loss of potential biomarkers. An issue that exists for all 2-DE-based methodologies results from the separation of multiple isoforms for each protein. Thus changes in abundance for a protein spot do not necessarily correlate with the change in total abundance of the corresponding protein. The application of orthogonal methodologies, such as ICAT and the newly developed iTRAQ (i.e. an amine-reactive isobaric tagging reagent-based protein quantification method (50)), to the same sample set will likely provide the most powerful discovery approach for AD biomarkers.
Our results have validated the rationale that interindividual comparison can be used to identify proteomic differences by demonstrating that intraindividual variation is far smaller than interindividual variation. We have also identified proteins whose levels in CSF change significantly within individuals. In addition, our study has shown the strength of our analytic and statistical methodologies in discovering group-associated differences in CSF. Given these results, is it worth developing CSF or other fluid biomarkers for AD? The reasons for developing biomarkers for very mild AD/mild cognitive impairment and AD are numerous. As of 2005, the diagnosis of AD is based completely on clinical criteria. Although the sensitivity and specificity of diagnosis are ∼90% at specialty referral centers (51), they are much lower in other settings (52). Improved diagnostic accuracy would be very useful as better therapies become available that have an impact on delaying, halting progression, or even improving function in AD. Diagnostic accuracy will be particularly important in drug trials in which a drug may be targeting a specific pathology that is present in AD but not in another dementing disorder. In addition to diagnosis, biomarkers may prove useful in assessing disease progression and serving as a surrogate for drug efficacy. The latter is important in AD because cognitive progression occurs over years, there is large interindividual variability in the rate of cognitive change, and multiyear trials with hundreds of subjects are required to determine efficacy when only clinical endpoints are utilized (53). Perhaps most importantly, as biomarkers are found for individuals with cognitive changes due to AD, it is possible that some of these same biomarkers will also prove useful as antecedent biomarkers for AD.
For example, there is strong clinicopathological evidence that Aβ deposition in plaques precedes any sign of dementia caused by AD by many years, perhaps a decade or longer, i.e. “preclinical” AD (8, 54). If sensitive and specific biomarkers for preclinical AD can be developed, new therapies could be tested with the goal of delaying the onset and preventing clinical disease.
Given our current results using an unbiased proteomic approach, how might one develop CSF biomarkers for very mild/mild AD? We outline below an approach to consider. After validating the intra- and interindividual variability of different markers with the particular technique being utilized, we would recommend assessing CSF from a group of fully characterized age- and sex-matched individuals (controls versus very mild/mild AD). Characterization should include informant-based clinical assessment, neuroimaging, and neuropsychological assessment. Exclusion of other significant medical and neurological illness is important. Because some CSF biomarkers for AD such as Aβ42, tau, and phospho-tau have been established (6) and new imaging markers of AD pathology such as amyloid imaging with Pittsburgh compound B may be very accurate (55), segregating groups of controls and AD based on these markers may be helpful in grouping samples. In a very well characterized group, using a technique such as 2-D DIGE followed by mass spectrometry, we believe that utilizing N = 5–10 samples per group in the initial screen is likely to yield several candidate biomarkers. For example, if a particular marker differs in level by 50% between groups, and the standard deviation is 30% of the mean for both groups, one could detect a significant difference (α = 0.05, power of 0.8) with a sample size of N = 6 per group. Prior to comparing all protein differences between groups, we would probably exclude from analysis the proteins whose levels are highly variable within individuals (e.g. see Table I). In addition, assessment of N = 2–3 time points per individual over weeks to months would provide important information regarding intraindividual variability that will assist in determining sample size required for larger validation sets. For markers that show significant differences between groups, validation of results with more quantitative methods such as ELISA is then required.
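The sample size example above (a 50% difference between groups, a standard deviation of 30% of the mean, α = 0.05, power of 0.8) can be reproduced with the usual normal-approximation formula for comparing two means, n = 2(z₁₋α/₂ + z_power)²(σ/δ)² per group. This is a sketch under that assumption; the function name is illustrative, and an exact t-based calculation would typically give a slightly larger number.

```python
from math import ceil
from statistics import NormalDist


def samples_per_group(diff, sd, alpha=0.05, power=0.8):
    """Per-group sample size for a two-sided, two-sample comparison of means
    (normal approximation; diff and sd must be in the same units)."""
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = standard_normal.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / diff) ** 2)


# Difference of 50% of the mean, standard deviation of 30% of the mean
n = samples_per_group(diff=0.5, sd=0.3)
print(n)  # 6
```

Because the required number scales with (σ/δ)², halving the expected difference roughly quadruples the sample size, which is one reason markers with high intraindividual variability call for larger validation sets.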
If a marker is validated in a small, well characterized sample, assessment of larger samples of controls, AD subjects, and subjects with other dementing disorders (e.g. N = 50–100 per group) from several clinical centers would then be needed to truly validate the findings and determine sensitivity and specificity for diagnosis. Determining whether an individual biomarker or groups of biomarkers have value in predicting clinical outcome would then require longitudinal follow-up over several years.
Acknowledgments
We thank Pamela Millsap for coordinating CSF collection. Petra Gilmore, Jim Malone, Alan Davis, and Julia Gross provided excellent expert technical support for the proteomic studies. Ting Wang gave invaluable assistance on statistical analyses.
Footnotes
- Published, MCP Papers in Press, September 30, 2005, DOI 10.1074/mcp.M500207-MCP200
- 1 The abbreviations used are: CSF, cerebrospinal fluid; AD, Alzheimer disease; 2-D, two-dimensional; 3-D, three-dimensional; Aβ, amyloid β; 2-DE, two-dimensional gel electrophoresis; Cy2, 3-[(4-carboxymethyl)-phenylmethyl]-3′-ethyloxacarbocyanine halide N-hydroxysuccinimidyl ester; Cy3, 1-(5-carboxypentyl)-1′-propylindocarbocyanine halide N-hydroxysuccinimidyl ester; Cy5, 1-(5-carboxypentyl)-1′-methylindodicarbocyanine halide N-hydroxysuccinimidyl ester; LP, lumbar puncture; CDR, Clinical Dementia Rating; DIA, Differential In-gel Analysis; BVA, Biological Variation Analysis; apoE, apolipoprotein E; T1, time point 1; T2, time point 2.
- * This work was supported by National Institutes of Health Grants AG05681 and AG03991, the MetLife Foundation, and institutional resources provided by Washington University to the Proteomics Center at Washington University and by National Centers of Research Resources Grant P41RR00954 of the National Institutes of Health. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
- S The on-line version of this article (available at http://www.mcponline.org) contains supplemental material.
- ¶ Supported by National Institutes of Health Postdoctoral Fellowship AG025662.
- Received July 11, 2005.
- Revision received September 30, 2005.
- The American Society for Biochemistry and Molecular Biology