Urinary Protein Profiles in a Rat Model for Diabetic Complications*

Diabetes mellitus is estimated to affect ∼24 million people in the United States and more than 150 million people worldwide. There are numerous end organ complications of diabetes, the onset of which can be delayed by early diagnosis and treatment. Although assays for diabetes are well founded, tests for its complications lack sufficient specificity and sensitivity to adequately guide these treatment options. In our study, we employed a streptozotocin-induced rat model of diabetes to determine changes in urinary protein profiles that occur during the initial response to the attendant hyperglycemia (e.g. the first two months), with the goal of developing a reliable and reproducible method of analyzing multiple urine samples as well as providing clues to early markers of disease progression. After filtration and buffer exchange, urinary proteins were digested with a specific protease, and the relative amounts of several thousand peptides were compared across rat urine samples representing various times after administration of drug or sham control. Extensive data analysis, including imputation of missing values and normalization of all data, was followed by ANOVA to discover peptides that changed significantly as a function of time, treatment, and the interaction of the two variables. The data demonstrated significant differences in protein abundance in urine before observable pathophysiological changes occur in this animal model and as a function of the measured variables. These included decreases in the relative abundance of major urinary protein precursor and increases in pro-alpha collagen, the expression of which is known to be regulated by circulating levels of insulin and/or glucose. Peptides from these proteins represent potential biomarkers, which can be used to stage urogenital complications from diabetes. The expression change of a pro-alpha 1 collagen peptide was also confirmed via selected reaction monitoring.


Molecular & Cellular Proteomics 8: 2145-2158, 2009.
Diabetic nephropathy (DNP) accounts for ∼44% of new cases of end stage renal disease (ESRD) (1). This high morbidity is the result of the impact of a growing population and longer life expectancy. With an increase in the prevalence of diabetes mellitus (DM) and a corresponding reduction in the mortality associated with both type 1 and type 2 DM, patients are living longer and are therefore at higher risk of developing complications such as nephropathy (2). Moreover, type 1 DM patients who progress to ESRD have a substantial risk of mortality, with annual health care costs in the United States estimated at approximately $1.9 billion (3,4). Two key therapies for the prevention and management of ESRD are aggressive glycemic control and blood pressure regulation (5,6). Early intervention is essential in reducing the severity and course of this complication (6), and changes in urine biomarkers have historically been used to diagnose and monitor disease progression. In addition, urine represents a desirable matrix in which to detect biomarkers of nephropathy, as urinary protein excretion profiles are reflective of functional changes within the kidney, such as glomerular filtration rate. Clinical determinations of urinary total protein and urinary albumin excretion are commonly used measurements to monitor and/or determine the onset of diabetic nephropathy. Unfortunately, these measurements often lead to improper diagnoses for at-risk DM patients (7,8). Therefore, new prognostic indicators are required to accurately target these patients for therapeutic intervention earlier in the course of the disease, as well as to identify patients who are unlikely to progress, as therapy may be of little or no benefit to them.
Utilizing experimental models to study the pathophysiological changes that occur as a function of disease progression has provided an approach for biomarker discovery. In diabetes, animal models have been widely used in the investigation of the progression of diabetes complications such as nephropathy. Research conducted on the association between hyperglycemia and microvascular disease in diabetes, as well as the study of the effect of extracellular matrix protein expression on changes in morphology in the diabetic kidney, are two such examples (9). In addition, these models have assisted in developing appropriate clinical trials for the prevention and treatment of these complications. One such example is the use of anti-hypertensive treatment regimes in genetically hypertensive rats; these have examined whether early intervention may be renoprotective and therefore delay or prevent the onset of diabetic nephropathy (10-12).
Streptozotocin (STZ)-induced hyperglycemia in rodents is the most extensively studied model of diabetic nephropathy and associated complications (9,13). Hyperglycemia occurs in this model because of the toxin's destruction of pancreatic beta-islet cells, which are essential to the production of insulin. STZ-induced hyperglycemia is associated with reliable and consistent structural and functional deficits in specific urogenital organ function (i.e. kidney and bladder). Increased glomerular filtration and hypertrophy, as well as increased urgency and morphology changes, are structural and functional abnormalities that have been observed in the kidney and bladder, respectively, in both humans and the STZ rat model (14-19).
Currently, there are two primary methods used to monitor disease progression in diabetes. The measurement of urinary albumin excretion rates and total protein concentration are routinely used to monitor disease progression as they reflect structural and functional changes in the kidney. Measurements of albumin by immunochemical assays and size exclusion high performance liquid chromatography are routinely employed (7). However, urine consists of a multitude of proteins, many of which are also reflective of pathophysiological changes caused by DM urogenital complications (20-23). Proteomics provides a powerful approach for the detection of urinary protein changes as a result of disease, and multiple proteomic techniques are available in large scale protein profiling to discover new biomarkers (24-26). To date, proteomic strategies for biomarker discovery in urine have primarily included top-down approaches, for example two-dimensional gel electrophoresis coupled with mass spectrometry and/or surface enhanced laser desorption/ionization time of flight mass spectrometry (SELDI-TOF-MS) analyses (27-30). In addition, a number of studies using capillary electrophoresis mass spectrometry, which has a number of advantages for analysis of urine, have been successfully carried out (31-33). While these approaches can easily detect and quantify a variety of proteinaceous species, including isoforms, posttranslational modifications, or degradation products, other methods could be used to expand the number of proteins that are both quantified and identified, providing an expanded set of biological targets to understand the complications of disease and its progression. Recent advances in both chromatography and mass spectrometry have enabled bottom-up approaches that identify and quantify at the peptide level (34-36). One advantage of bottom-up proteomics is increased overall proteome coverage.
Moreover, most bottom-up methods provide both qualitative and quantitative data in a single run, and quantifying at the peptide level leads directly into a bottom-up confirmation/validation analysis, thereby avoiding the peptide selection step in this procedure. Approaches to bottom-up proteomics include specific peptide labeling or label-free analysis. Specific labeling approaches such as isobaric tag for relative and absolute quantification (iTRAQ) and 18O employ differential stable isotope labeling strategies that create specific mass tags for different samples, which are mixed and then identified and quantified using mass spectrometry (37). The utility of these techniques is that they accommodate a wide range of pre-fractionation strategies, thereby improving proteome coverage.
The label-free approach capitalizes on the highly reproducible chromatography and high mass accuracy available in current LC/MS systems. This method observes all detectable peptides and, if they are interrogated by MS/MS, their corresponding fragment ions. This approach quantifies a peptide by its intensity and groups each peptide across individual samples based on its accurate mass and retention time (38,39). The intensities associated with specific mass and retention time values are organized into peptide array tables that may be further processed using statistical techniques that accommodate high-dimensional data. As with other bottom-up approaches, this method is also amenable to pre-fractionation strategies, but unlike labeled approaches, the removal of chemical or metabolic labeling steps simplifies the overall approach.
Here we use a comparative label-free LC/MS/MS approach to identify and rank candidate biomarkers of urogenital complications from an STZ rat model of diabetes. We describe further technical validation of our approach by confirming the changes observed with the putative biomarker, pro-alpha (2) with an alternative method: selected reaction monitoring (SRM).

STZ-induced Hyperglycemia
Hyperglycemia was induced in male F-344 rats by a single intraperitoneal injection of STZ (35 mg/kg). All procedures were approved by the Animal Institute Care and Use Committee at Wake Forest University. STZ was dissolved in citrate buffer (1:1 mixture of 0.1 M citric acid and 0.2 M Na2HPO4). Rats were fasted for a period of 24 h prior to STZ injection and presented with a 10% dextrose solution for a 24 h period immediately following injection. The rat becomes hyperglycemic and glycosuric within 24 h. The extent of diabetes was confirmed via serum glucose evaluation from tail vein puncture. Blood was withdrawn from the tail vein while the animal was placed in a custom made restraining device to expose the tail. Blood glucose levels were determined with a glucometer (Ascensia Elite XL by Bayer system). Blood glucose levels were subsequently measured once a week for the first month and, in the absence of signs of distress, monthly thereafter. Biochemical and electrophysiological deficits associated with the induced hyperglycemia are well characterized and generally occur early (within weeks of the onset of glucose elevation) (13,40). Control and DM urine was collected at the 3-day, 1-month, and 2-month time points. To collect urine, rats were placed in a metabolic cage and urine was collected over a 6 h time period. The samples were then frozen in a −80°C freezer until proteomic analysis.

Experimental Design
Factorial Arrangement-The categorical factors under study consist of a treatment factor (two levels) crossed with a time factor (three levels). The two levels of the treatment factor were the drug-induced disease group (T) and the control group (C), which received a sham injection. The three levels of the time factor were the time points of 3 days, 1 month, and 2 months (3 d, 1 m, 2 m) from the start of the treatment (day 0). The experimental groups consisted of the 2 × 3 possible combinations of levels of the factors, further denoted 'C.3d', 'T.3d', 'C.1m', 'T.1m', 'C.2m', 'T.2m'. This is a two-way factorial arrangement, which allows evaluation of the two main effects of the factors and their interaction.
Design Layout, Experimental Units, Pairing, Replications, and Pooling-Supplemental data Table I highlights the experimental units under study (n = 16). Experimental units were 6 and 10 rats, respectively, in the control and treated groups across all time points, and 4, 6, and 6 rats at 3 days, 1 month, and 2 months, respectively, across both control and treated groups. As rats were sacrificed at all time points, all experimental groups were independent (unpaired). In addition, treatments were assigned completely at random to independent experimental units. Therefore, this is an arrangement of treatments within an unbalanced, completely randomized design. Technical replicates were not performed, samples were not pooled, and no common reference sample was used.
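The factorial arrangement described above can be sketched in a few lines of Python. This is only an illustration of how the six group labels arise from crossing the two factors; the function and variable names are hypothetical, and the per-cell rat counts (reported in the supplemental table) are deliberately not reproduced, only the marginal totals stated in the text.

```python
from itertools import product

# Factor levels as named in the text: control (C) vs. STZ-treated (T),
# crossed with three urine collection time points.
TREATMENTS = ["C", "T"]
TIMES = ["3d", "1m", "2m"]


def factorial_groups(treatments, times):
    """Return the labels of a two-way factorial arrangement, ordered by time."""
    return [f"{trt}.{time}" for time, trt in product(times, treatments)]


groups = factorial_groups(TREATMENTS, TIMES)
print(groups)

# Marginal rat counts stated in the text; both margins must sum to n = 16.
rats_per_treatment = {"C": 6, "T": 10}
rats_per_time = {"3d": 4, "1m": 6, "2m": 6}
total_by_treatment = sum(rats_per_treatment.values())
total_by_time = sum(rats_per_time.values())
print(total_by_treatment, total_by_time)
```

Because the margins are unequal, any allocation of the 16 rats to the six cells is necessarily unbalanced, which is why the design is described as unbalanced.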

Urine Sample Preparation
Four hundred microliters of each rat urine sample was concentrated to 100 μl using a Microcon 3,000 molecular weight cut off filter (Millipore, Bedford, MA). Each sample was subsequently buffer-exchanged three times using 300 μl of 50 mM Tris buffer to a final volume of ∼100 μl, and protein concentrations were determined by 2D Quant kit as described by the manufacturer (GE Healthcare). Twenty micrograms of each sample was run on a one-dimensional SDS gel for quality control. Subsequent to one-dimensional SDS gel analysis, aliquots of each sample were adjusted to 40 μg in 57 μl in 50 mM Tris. Twenty microliters of 0.2% Rapigest (Waters, Milford, MA) and dithiothreitol to a final concentration of 5 mM were added. The samples were reduced at 80°C for 15 min, cooled to room temperature, and alkylated with iodoacetamide at a final concentration of 10 mM for 30 min. Proteolytic digestion was performed with Lys-C (Wako Chemicals, Richmond, VA) with a final enzyme to protein ratio of 1:6 (w/w), and the final digestion volume was adjusted to 200 μl with 13 μl of 1.5 M Tris. The samples were then incubated for 18 h at 37°C. A primary stock solution of MassPREP™ Digestion Standard Mix (Waters) peptides containing yeast enolase was prepared to a concentration of 1 pmol/μl each. The standard peptide mixture was added such that the final concentration was 0.4 pmol/μl (final volume of 150 μl).

Label-free Expression
Liquid Chromatography and Mass Spectrometry-Sixty nanograms of each sample were analyzed by LC/MS/MS, and the order of sample injections was randomized over all samples. Separation of peptides via capillary liquid chromatography was performed using a Dionex Ultimate 3000 capillary LC system (Dionex, Sunnyvale, CA). Mobile phase A (aqueous) contained 0.1% formic acid in 5% acetonitrile, and mobile phase B (organic) contained 0.1% formic acid in 85% acetonitrile. Samples were trapped and desalted on-line in mobile phase A at 10 μl/min for 10 min using a Dionex PepMap 100 (300 μm × 5 mm) column. The sample was subsequently loaded onto a Dionex C18 PepMap (75 μm × 15 cm) reversed phase column with 5% mobile phase B. Separation was obtained by employing a gradient of 6% to 28% mobile phase B at 0.300 μl/min over 100 min. The column was washed at 99% mobile phase B for 10 min, followed by a re-equilibration at 100% mobile phase A for 17 min. Mass spectrometry analyses of samples were performed using a hybrid linear ion trap Fourier transform ion cyclotron resonance mass spectrometer (FT-LTQ; Thermo, Waltham, MA). Positive mode electrospray was conducted using a nanospray source, and the mass spectrometer was operated at a resolution of 25,000. Quantitative and qualitative data were acquired using alternating full MS scan and MS/MS scans in normal mode. Survey data were acquired from m/z 400-1600, and up to three precursors, selected on the basis of intensity, were interrogated by MS/MS per switch. Two micro scans were acquired for every precursor interrogated, and MS/MS was acquired as centroid data. The FT and LTQ were mass calibrated immediately before the analysis using the instrument protocol. Raw LC/MS/MS data were processed via Proteomarker software (Infochromics, Toronto, Canada).
Data Processing-Qualitative-The raw data for each run were first extracted to provide MS/MS peak lists for identification and intensity-based profile peak lists for quantification. The MS/MS peak lists were subsequently searched by Mascot version 2.2.0 (Matrix Science, London, UK). The database used was a compilation of both the rat and mouse International Protein Index (IPI) (July 2007; 97811) plus the sequence for a dietary protein from rat chow, GY5, a soybean protein that we expected to be recovered in urine. Search settings were as follows: no enzyme specificity; mass accuracy window for precursor ion, 10 ppm; mass accuracy window for fragment ions, 0.8 Da; variable modifications, including only carbamidomethylation of cysteines and oxidation of methionine. The criteria for peptide identification were a mass accuracy of ≤10 ppm and an expectation value of p ≤ 0.05. Proteins that had more than two peptides matching the above criteria were considered confirmed assignments, whereas proteins identified with one peptide, regardless of the Mascot score, were highlighted as tentative assignments.
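The acceptance criteria above can be illustrated with a short Python sketch. It is not the Mascot or Proteomarker implementation; the record layout, function names, and example values are hypothetical. The text does not say how proteins with exactly two accepted peptides were labeled, so the sketch marks that case as unspecified rather than guessing.

```python
from collections import defaultdict

PPM_TOLERANCE = 10.0    # precursor mass accuracy criterion from the text
MAX_EXPECTATION = 0.05  # expectation value criterion from the text


def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def classify_proteins(peptide_hits):
    """Label proteins by the number of distinct peptides passing both criteria.

    peptide_hits: iterable of dicts with the (hypothetical) keys
    'protein', 'sequence', 'observed_mass', 'theoretical_mass', 'expectation'.
    """
    accepted = defaultdict(set)
    for hit in peptide_hits:
        error = abs(ppm_error(hit["observed_mass"], hit["theoretical_mass"]))
        if error <= PPM_TOLERANCE and hit["expectation"] <= MAX_EXPECTATION:
            accepted[hit["protein"]].add(hit["sequence"])

    labels = {}
    for protein, sequences in accepted.items():
        if len(sequences) > 2:
            labels[protein] = "confirmed"
        elif len(sequences) == 1:
            labels[protein] = "tentative"
        else:
            labels[protein] = "unspecified"  # exactly two peptides: not stated
    return labels


# Made-up hits for illustration only.
hits = [
    {"protein": "P1", "sequence": "AAAK", "observed_mass": 1000.005,
     "theoretical_mass": 1000.0, "expectation": 0.01},
    {"protein": "P1", "sequence": "CCCK", "observed_mass": 1200.002,
     "theoretical_mass": 1200.0, "expectation": 0.02},
    {"protein": "P1", "sequence": "DDDK", "observed_mass": 1500.003,
     "theoretical_mass": 1500.0, "expectation": 0.001},
    {"protein": "P2", "sequence": "EEEK", "observed_mass": 900.001,
     "theoretical_mass": 900.0, "expectation": 0.03},
    {"protein": "P3", "sequence": "FFFK", "observed_mass": 1000.02,
     "theoretical_mass": 1000.0, "expectation": 0.01},  # 20 ppm: rejected
]
labels = classify_proteins(hits)
print(labels)
```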
Data Processing-Quantitative-Automated differential quantification of peptides in a set of samples was accomplished with Proteomarker. For each profile MS spectrum of an LC/MS/MS run, isotopic envelopes were resolved, charge states determined, and monoisotopic masses assigned to generate a monoisotopic peak list. This peak list was further consolidated by grouping masses observed at different charge states and appearing in contiguous elution time points, resulting in a set of chromatographic peaks that corresponded to distinct putative peptides. Mascot protein identity data were integrated at this point by matching peptide sequences to chromatographic peaks on the basis of mass and retention time. Chromatographic peaks were aligned across samples using the attributes of monoisotopic mass, retention time, charge state distribution, and peptide sequence (when available). To effectively align peptides, Proteomarker corrects for retention time drift by warping elution times across samples using endogenous peaks. Alignment between sample datasets resulted in a peptide abundance matrix in which each row (or differential set) represented a putative peptide with sample-specific attributes such as observed mass, starting and ending chromatographic retention times, peptide sequence (if assigned), and calculated chromatographic area used as a measure of intensity for the given peptide.
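As a rough illustration of how a peptide abundance matrix arises from mass and retention time matching, the following Python sketch groups chromatographic peaks across samples. It is a deliberately simplified greedy matcher and not the Proteomarker algorithm: it performs no retention time warping, ignores charge state and sequence, and its tolerances (10 ppm, 1 min) and all names are assumptions chosen for the example.

```python
def align_peaks(peaks, samples, ppm_tol=10.0, rt_tol=1.0):
    """Group peaks across samples into rows of an abundance matrix.

    peaks: list of (sample, monoisotopic_mass, retention_time_min, area).
    Returns (rows, matrix); matrix[i][j] is the area of row i in samples[j],
    or None when the peptide was not observed in that sample.
    """
    rows = []
    for sample, mass, rt, area in sorted(peaks, key=lambda p: (p[1], p[2])):
        for row in rows:
            mass_ok = abs(mass - row["mass"]) / row["mass"] * 1e6 <= ppm_tol
            rt_ok = abs(rt - row["rt"]) <= rt_tol
            if mass_ok and rt_ok and sample not in row["areas"]:
                row["areas"][sample] = area
                break
        else:
            # No compatible row found: start a new putative peptide.
            rows.append({"mass": mass, "rt": rt, "areas": {sample: area}})
    matrix = [[row["areas"].get(s) for s in samples] for row in rows]
    return rows, matrix


# Made-up peaks for illustration only.
samples = ["s1", "s2", "s3"]
peaks = [
    ("s1", 1000.000, 20.0, 5.0e5),
    ("s2", 1000.004, 20.3, 4.0e5),
    ("s3", 1000.002, 20.1, 6.0e5),
    ("s1", 1500.000, 35.0, 1.0e5),
]
rows, matrix = align_peaks(peaks, samples)
print(matrix)
```

The `None` entries in the second row are the missing values that the subsequent quality control and imputation steps have to deal with.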
Data Quality Control-Prefiltering-Subsequent to raw data acquisition and processing, data QC and statistical analysis were performed. Fig. 1 describes the protocol for data QC and statistical analysis that was conducted for this study. Two major peptide pre-filtering steps were performed to remove very poor quality peptide identifications as well as those differential sets (diffsets) in the abundance matrix that did not receive a qualitative identification. A diffset was rejected from the analysis if (i) a peptide sequence was not assigned or (ii) the consensus peptide sequence score was below the 81st percentile (< 9.96), which represents the lowest mode of the distribution of Mascot scores. To further reduce the dimensionality of the data, we carried out peptide-specific intensity summaries for diffsets that contained peptides of identical sequence. When several diffsets could be summarized, the diffset with the fewest missing values was retained with all its annotations, including retention time, m/z ratio, sequence, score, and protein annotation; intensity values from the other filtered diffsets were transferred into the retained diffset. A filtering step was added to retain those diffsets (i) for which the proportion of missing intensity values by experimental group (3 days, 1 month, or 2 months) was less than 50% in at least one experimental group, and (ii) for which a certain threshold of missing intensity values per diffset was not exceeded. Given these criteria, the number of missing events per diffset was bounded to 15, as the control 3-day experimental group contained only one sample (see under "Results").
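The two missing-value criteria can be expressed as a small predicate. The sketch below is one reading of the rule under stated assumptions: missing intensities are represented as `None`, the mapping from sample to experimental group is supplied by the caller, and the function and parameter names are hypothetical.

```python
def keep_diffset(intensities, group_of, max_missing=15, max_group_fraction=0.5):
    """Decide whether a diffset survives the missing-value filter.

    intensities: dict mapping sample name -> intensity, or None if missing.
    group_of:    dict mapping sample name -> experimental group label.
    """
    # Criterion (ii): cap on the total number of missing values.
    total_missing = sum(value is None for value in intensities.values())
    if total_missing > max_missing:
        return False

    # Criterion (i): fewer than 50% missing in at least one group.
    counts = {}  # group -> [number missing, number of samples]
    for sample, value in intensities.items():
        entry = counts.setdefault(group_of[sample], [0, 0])
        entry[0] += value is None
        entry[1] += 1
    return any(missing / n < max_group_fraction for missing, n in counts.values())


# Toy example with two samples per group (not the study's group sizes).
group_of = {"a1": "3d", "a2": "3d", "b1": "1m", "b2": "1m", "c1": "2m", "c2": "2m"}
complete_in_one_group = {"a1": 1.0, "a2": 2.0, "b1": None, "b2": None,
                         "c1": None, "c2": None}
half_missing_everywhere = {"a1": 1.0, "a2": None, "b1": 1.0, "b2": None,
                           "c1": None, "c2": 3.0}
kept = keep_diffset(complete_in_one_group, group_of)
dropped = keep_diffset(half_missing_everywhere, group_of)
print(kept, dropped)
```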
Data Quality Control-Probability Model for Missing Value Imputation-Missing values in LC/MS data arise because of imperfect detection and alignment of peak intensities or by true absence of expression. The probability model and iterative algorithm described in Wang et al. (41) was used to account for the missingness mechanism at play and its extent for these data. This model substitutes a missing measurement of intensity with its expected value of the true intensity given that it is unobservable. Estimation of the imputation parameters was done in a simulation type of study in order to minimize the percentage of remaining missing values.
Data Quality Control-Normalization, Variance Stabilization, and Normality-The purpose of normalization is to identify and remove sources of systematic variation because of experimental artifacts in the measured intensities. In addition, it is important to ensure that the imputed intensities are corrected for variance stabilization and normality. Various transformations were applied on the data features to ensure that the above assumptions were met. We used a common normalization method known as natural cubic smoothing splines as described in Workman et al. (42). This normalization procedure uses sample quantiles from the data features to fit natural cubic smoothing B-splines. The splines are then used as signal-dependent normalization functions on the original data.
Statistical Analysis-Grouping Effect, Linear Model, and Empirical Bayes Estimators for Statistical Inference-Potential groups and outliers among the samples were checked by a Principal Component Analysis (PCA) (43). Subsequent to PCA, we fitted the same statistical model individually to each univariate response variable (single peptide expression). A standard analysis method in modeling of high dimensional data is to fit the same statistical model (usually linear) individually to each outcome variable (diffset or peptide) and test for the contrast or effect of interest using the hypothesis testing framework.

FIG. 1. Workflow for quality control and statistical analysis of label-free expression data.
A drawback of this univariate approach is that the correlation structure or dependence between the variables is ignored. However, some compensating possibilities exist by borrowing information across similar variables, resulting in more stable variance estimates, which in turn assist in inference about each variable individually. Thanks to the parallel nature of the high-throughput data, this can easily be done by application of empirical Bayes methods and estimators derived from them, resulting in greater statistical power (moderated F-, t-, and B-statistics) (44-48). The B-statistic is the empirical Bayes log2 of the posterior odds that the peptide is differentially expressed and represents a measure of statistical significance. In addition, a number of authors have noted in gene expression studies that such an approach of "borrowing strength" across variables is more reliable and does not reduce the power available to detect changes in expression for individual variables (49). In addition, the posterior odds statistics have proven to be a useful means of ranking variables in terms of evidence for differential expression, especially when the sample sizes are small, as is the case here (44-53).
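For orientation, the variance moderation behind the moderated t-statistic is commonly written as below in the limma framework (48). The symbols are introduced here for illustration and do not appear in the original text: s_j^2 is the residual variance of peptide j on d_j degrees of freedom, s_0^2 and d_0 are the prior variance and prior degrees of freedom estimated from all peptides, \hat{\beta}_j is the estimated contrast, and v_j is its unscaled variance factor.

```latex
\tilde{s}_j^{2} = \frac{d_0\, s_0^{2} + d_j\, s_j^{2}}{d_0 + d_j},
\qquad
\tilde{t}_j = \frac{\hat{\beta}_j}{\tilde{s}_j \sqrt{v_j}}
```

The posterior variance shrinks each peptide's own variance toward the common prior value, and under the null hypothesis the moderated statistic is referred to a t distribution with d_0 + d_j degrees of freedom, which is where the gain in power for small samples comes from.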
One common application of this approach is in the linear model setting (48). When multiple sources of variation and correlation are at play, a linear mixed effects model of analysis of variance is usually an appropriate and powerful approach. Fitting a mixed effects model, in which random effects for LC/MS runs or subjects enter the model, would usually require technical replicates, which we do not have. Nevertheless, because in the present study the biological samples were all independent, and the LC/MS runs were randomized, we technically could ignore the corresponding random effect terms and use a simple fixed effects model of analysis of variance (two-way ANOVA). The fixed effects to be estimated are the treatment effect ($T_l$), the time effect ($D_k$), and an interaction effect ($DT_{kl}$). Letting $y_{jkl}$ be the intensity signal on the original scale for the j-th peptide, and letting $z_{jkl} = t(y_{jkl})$ be the outcome on the transformed scale, where $t(\cdot)$ is the appropriate transformation described above (54), a linear fixed ANOVA model for each individual peptide j is fitted as follows:

$$z_{jkl} = t(y_{jkl}) = \mu_j + D_{jk} + T_{jl} + DT_{jkl} + \epsilon_{jkl} \qquad \text{(Eq. 1)}$$

where $\mu_j$ represents the average signal intensity for that peptide across all factors and observations, and the error term $\epsilon_{jkl} \sim N(0, \sigma^2)$ is assumed to be normally distributed with mean 0 and some (unknown) variance component. In this experimental design, contrasts were built for each of the fixed effects of interest, coefficients were estimated, and variables were ranked in order of evidence of differential expression. Corresponding p values were adjusted for multiple testing using a recent extension of the standard Benjamini-Hochberg procedure, which controls the expected False Discovery Rate (FDR) under the usual assumption of general variable dependence (55).
This error rate, called the positive FDR (denoted pFDR) (56-58), results in a procedure much less conservative than the FDR and therefore typically more powerful and more appropriate for large datasets. In our study, the threshold was set arbitrarily to pFDR = 0.05. Whenever available, implementations and algorithms of our methods were from the freely available consortium CRAN (Comprehensive R Archive Network). All other R code written in our group can be provided upon request. For image analysis, exploratory data analysis, normalization procedures, and quality assessment, we used the packages "vsn" (59), "affy" (42,60), and "PreprocessCore" (61). In addition, for linear modeling and supervised inferences, we used the package "limma" (48). Finally, for the control of the positive FDR, we used the package "qvalue" (53).
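To make the multiple testing adjustment concrete, the sketch below implements the standard Benjamini-Hochberg step-up adjustment in plain Python. Note that this is the baseline procedure only; it is not the pFDR/q-value estimator of the "qvalue" package used in this study, which additionally estimates the proportion of true null hypotheses. The function name and the example p values are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Made-up per-peptide p values for one contrast.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in adjusted]
print(adjusted)
print(significant)
```

In this toy case the third and fourth peptides have raw p values below 0.05 but do not survive the adjustment, which is the behavior a per-peptide analysis of more than a thousand diffsets has to guard against.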
Confirmation of Peptide GEPGSVGAQGPPGPSGEEGK Selected Reaction Monitoring Mass Spectrometry-The pro-alpha (2) 1 collagen peptide GEPGSVGAQGPPGPSGEEGK was synthesized (Sigma-Aldrich, St. Louis, MO), and stock solutions were prepared in 10% acetonitrile at 1 nmol/μl. A 1 pmol/μl stock was infused into the mass spectrometer to determine ideal analysis conditions for the peptide as well as to select the best fragment ion to monitor in selected reaction monitoring. Based on the MS/MS of this peptide, the y18 fragment was chosen as it was both the most stable and most intense fragment in the spectrum (see under "Results"). The stock solution was diluted in 10% acetonitrile/0.1% formic acid to various concentrations between 50 amol/μl and 25 fmol/μl. Digested samples were prepared such that their total concentration was 5 ng/μl. Separation of the target peptide via capillary liquid chromatography was performed using a Dionex Ultimate 3000 nanoscale LC system. Mobile phase A (aqueous) contained 0.1% formic acid in 5% acetonitrile, and mobile phase B (organic) contained 0.1% formic acid in 85% acetonitrile. Samples were trapped and desalted on-line in mobile phase A at 10 μl/min for 10 min using a Dionex PepMap 100 (300 μm × 5 mm) column. Twenty microliters of standard as well as individual urine samples were subsequently loaded onto a Dionex C18 PepMap (75 μm × 15 cm) reversed phase column with 5% mobile phase B. Separation was obtained by employing a gradient of 6% to 28% mobile phase B at 0.300 μl/min over 30 min. The column was washed at 99% mobile phase B for 10 min, followed by a re-equilibration at 100% mobile phase A for 10 min. Mass spectrometry analyses of samples were performed using an LTQ mass spectrometer (Thermo). Positive mode electrospray was conducted using a nanoflow sprayer, and quantitative data were acquired via selected reaction monitoring mode.
The transition that was monitored was a normal scan of 897.40 m/z to 804.60 m/z, which corresponded to the 2+ charge state of the intact peptide and its y18 fragment. On the LTQ, the product ion was isolated at 1.6 m/z width, the ion injection time was limited to 150 ms per microscan with one microscan, the AGC setting for the MS/MS was 1 × 10^4, and the data were acquired in centroid mode. Raw chromatograms were subsequently processed and analyzed using Xcalibur Quan View software version 2.0.5 (Thermo). Masses corresponding to the transition of 897.40-804.60 m/z, the retention time of which fell within a window of ± 2.5 min of the target retention time, were extracted, and the peak area was integrated by Quan View. The peak areas generated by the software were also manually inspected to ensure proper integration and adjusted when necessary. Following peak area integration, the software generated concentration curves for the standard samples via linear regression, and the endogenous peptide from each rat urine sample was plotted against this curve to determine concentration per 100 ng of total protein.
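The calibration step described above amounts to an ordinary least squares fit of peak area against standard concentration, followed by inverting the fitted line for each sample. The sketch below illustrates that arithmetic only; it is not the Quan View implementation, the function names are hypothetical, and the standard areas are invented values placed within the stated 50 amol/μl to 25 fmol/μl range.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def concentration_from_area(area, slope, intercept):
    """Invert the calibration line to estimate concentration from peak area."""
    return (area - intercept) / slope


# Invented standards: concentration in fmol/ul, peak area in arbitrary units.
standard_conc = [0.05, 0.5, 5.0, 25.0]
standard_area = [210.0, 1200.0, 11100.0, 55100.0]

slope, intercept = fit_line(standard_conc, standard_area)
sample_conc = concentration_from_area(4500.0, slope, intercept)
print(slope, intercept, sample_conc)
```

In practice a sample whose peak area falls outside the range spanned by the standards should be flagged rather than extrapolated.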

Label-free Expression Analysis-
The sample preparation protocol provided sufficient protein concentrations from 400 μl of urine for label-free comparative analysis. These concentrations ranged from 1 μg/μl to 9.5 μg/μl. These variations are attributed to the wide range of sample collection volumes (1-15 ml). Reproducible protein patterns via one-dimensional SDS-PAGE were observed across time and treatment points (data not shown). These samples were subsequently analyzed by LC/MS/MS. The optimized LC/MS/MS analysis described under the "Experimental Procedures" section provided excellent chromatographic reproducibility across injections. Fig. 2 highlights this reproducibility, with retention times on average deviating on the order of ± 1 min.
Data Quality Control-Prefiltering-As described under the "Experimental Procedures" section, quality control filters were applied to the dataset before statistical analysis. The first filter removed diffsets for which a sequence was not obtained in the label-free LC/MS/MS analysis. A second qualitative filter was applied using the peptide identification probability score. These filters conserved those diffsets that were assigned a peptide sequence and a Mascot score with a good likelihood of validation in an independent assay. Fig. 3 highlights the threshold for Mascot score that was chosen. This threshold separates the lower mode in the distribution of scores from the higher mode. Peptides with scores that fell below this cutoff were removed from the dataset. The prefiltering steps conserved 1931 diffsets, and these were progressed to additional quality control and summarization measures. We then summarized and collapsed those diffsets for which the same peptide sequence was identified. One hundred and forty eight diffsets, sharing identical sequences but separately clustered, were summarized. This left 1783 diffsets with unique and reliable peptide annotations and adjusted intensities that were carried forward for further analysis. At this stage, we removed diffsets that had more than 15 missing intensity values reducing the total number of diffsets to 1429.
Data Quality Control-Missing Values and Imputation-Despite these QC filtering steps, a significant number of missing values remained in the data. Analysis of this dataset showed that the frequency of missing values was not dependent on sequence length or composition, as no correlation was observed with these variables (data not shown). On the other hand, Fig. 4 highlights an inverse correlation between the average peptide intensity and the number of missing values for a peptide. A consistent trend is observed in which an increase in average peptide intensity yields fewer missing data events. This was anticipated, as low intensity peptide ions are closer to the limit of detection and are difficult to quantify above background noise. This type of missingness is referred to as Missing Not At Random (MNAR). The proportion of missing values across the 1429 diffsets was 42.3%. The imputation model described in the "Experimental Procedures" section reduced the missing data to 31.6% (~25% reduction). Because the remaining missing values typically represent undetectable or truly absent expression levels, these were substituted by the minimum of the imputed values. Subsequent to imputation, we carried out a normalization procedure that was applied to the intensities of the entire dataset across all the samples, where each sample represented an independent LC/MS/MS run. Among a variety of transformation methods tested, the cubic smoothing splines normalization procedure achieved the best combination of normalization, variance stabilization, and normality across all samples (Fig. 5). The normalized data were then subjected to statistical analysis.
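The arithmetic behind the reported reduction, and the fallback substitution for values still missing after model-based imputation, can be sketched as follows. The function names are hypothetical. The sketch also assumes that "the minimum of the imputed values" means the smallest value filled in by the imputation model anywhere in the matrix; the text does not say whether the minimum was taken globally or per peptide.

```python
def relative_reduction(before_pct, after_pct):
    """Relative drop, in percent, from one missing-value rate to another."""
    return 100.0 * (before_pct - after_pct) / before_pct


def fill_remaining_with_min(matrix, imputed_mask):
    """Replace values that are still missing with the smallest imputed value.

    matrix       : list of rows; None marks a value still missing after imputation
    imputed_mask : same shape; True where the imputation model supplied the value
    """
    imputed_values = [
        value
        for row, mask_row in zip(matrix, imputed_mask)
        for value, was_imputed in zip(row, mask_row)
        if was_imputed and value is not None
    ]
    if not imputed_values:
        raise ValueError("no imputed values available to derive a minimum")
    floor = min(imputed_values)  # assumption: one global minimum
    return [[floor if value is None else value for value in row] for row in matrix]


# Missing-value rates reported in the text.
reduction = relative_reduction(42.3, 31.6)

# Toy matrix, invented for illustration: two peptides, three samples.
toy_matrix = [[12.0, 8.5, None], [None, 9.1, 7.2]]
toy_mask = [[False, True, False], [False, False, True]]
filled = fill_remaining_with_min(toy_matrix, toy_mask)
```

With the reported rates, the relative reduction comes out at roughly 25%, consistent with the figure quoted above.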
Principal Component Analysis, Statistical Inferences, and FDR Corrections-Potential groups and outliers among the samples were evaluated by a PCA. Based on the PCA, three groups stand out: the control and treatment groups at 3 days ('C.3d', 'T.3d'); the two treatment groups at 1 and 2 months ('T.1m', 'T.2m'); and the two control groups at 1 and 2 months ('C.1m', 'C.2m'). The treatment groups behave similarly at 1 month and 2 months, with little overlap with the control groups (Fig. 6). This indicates that there is probably no treatment effect at 3 days but that there is a clear treatment effect, with a potential interaction effect, at the 1 and 2 month time points. No outliers were observed.
The central box in the plot represents the inter-quartile range, which is defined as the difference between the 75th percentile and 25th percentile, i.e. the upper and lower quartiles (the two "hinges"). The line in the middle of the box represents the median, a measure of central location of the data.
Statistical inference of the dataset was performed using a fixed two-way ANOVA model. Diffsets were ranked by p value. Corrected p values, adjusted for multiple comparisons, were reported with a maximum pFDR threshold of 5%. In this design, treatment and time main effects as well as their interaction effect can be evaluated. Supplemental Tables 2, 3, and 4 highlight the peptides that were determined as having a significant change, including 434 peptides up- or down-regulated across all treatment comparisons, 82 up- or down-regulated in all time comparisons, and 54 up- or down-regulated in all interaction comparisons. Note that a cutoff FDR correction of 5% was applied, meaning that no more than 21, 4, and 2 of these selected peptides, respectively, are expected false positives. Overall, the majority of significant peptides observed across comparisons changed with treatment. These represent peptides/proteins the expression of which is sensitive to STZ-induced hyperglycemia and/or drug treatment. Moreover, the 2 months treatment comparison yielded the largest number of significant changes within the treatment groups. The volcano plots in Fig. 7 highlight the 326 peptides observed as significant at 2 months post-STZ treatment in comparison to 19 at 1 month (see also supplemental Tables 5, 6, and 7). In addition, Table I highlights the proteins that were significant across all treatment comparisons and that met our criteria for confirmation of protein identification as described previously in the "Experimental Procedures" section.
A total of 22 proteins were given a confirmed assignment, with 5 of those up-regulated and 17 down-regulated across the treatment comparisons. Smaller numbers of peptides were significant with time and/or interaction in comparison to treatment. The 82 peptides that changed with time were related to age in this model, as these rats were 12 weeks old at the time of the study, and we anticipated differences in some protein levels because of maturation. The relative abundance of a few of these proteins also changed as a function of treatment, and therefore they are identified as significant in the interaction analysis. The interaction comparison was consistent with observations made in the treatment effect, as the majority of peptides identified as significant in the interaction effect were found at the 2 months comparison.
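The bound on expected false positives quoted for the 5% FDR cutoff follows directly from the number of peptides called significant in each comparison. A minimal sketch of that arithmetic is given below; the function name is hypothetical, and rounding down to a whole peptide is an assumption chosen because it reproduces the figures stated in the text.

```python
import math


def max_expected_false_positives(n_significant, fdr=0.05):
    """Upper bound on expected false positives among n_significant calls."""
    # Round first to absorb floating-point noise, then floor to whole peptides.
    return math.floor(round(n_significant * fdr, 9))


# Numbers of significant peptides reported for each effect.
significant = {"treatment": 434, "time": 82, "interaction": 54}
bounds = {effect: max_expected_false_positives(n)
          for effect, n in significant.items()}
print(bounds)
```

Applied to the 434, 82, and 54 significant peptides, this gives bounds of 21, 4, and 2, matching the values reported for the treatment, time, and interaction comparisons.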
SRM of Pro-Alpha (2) 1 Collagen-A single peptide of rat pro-alpha (2) 1 collagen, GEPGSVGAQGPPGPSGEEGK, was identified in the label-free expression analysis. It was observed as a low-abundance peptide in the matrix with a tentative sequence assignment and had an average retention time of 14.5 min. Because of its biological interest and association with disease progression, as well as the existence of a homologous protein in humans, pro-alpha (2) 1 collagen was progressed to validation in the samples via SRM analysis. The peptide was synthesized and used both to validate the earlier identification and to develop the SRM method. The synthetic peptide was infused into an LTQ mass spectrometer, and MS/MS conditions were optimized. The fragmentation pattern of the synthetic peptide was an exact match to the endogenous peptide MS/MS spectrum acquired in the label-free analysis, thereby confirming the previous sequence assignment. In addition, a transition of 897.40 m/z to 804.60 m/z was chosen, which corresponded to the +2 charge state of the y18 fragment of GEPGSVGAQGPPGPSGEEGK (Fig. 8). This transition was selected for SRM as it was the most reproducible and intense fragment ion, comprising most of the ion signal for the MS/MS. The synthetic peptide was spiked into 0.1% formic acid at various concentrations (50 amol/µl, 125 amol/µl, 250 amol/µl, 500 amol/µl, 1 fmol/µl, and 2.5 fmol/µl) to estimate the limit of detection. The limit of detection was defined as 125 amol/µl because below this level the data were not reliable. The curve was linear for this peptide over the concentration range selected, with an r squared value of 0.996. Potential matrix suppression issues were evaluated by spiking the synthetic peptide standard into rat urine at a concentration of 1 fmol/µl and comparing the intensity response for this concentration to the intensity response of this peptide in standard matrix (0.1% formic acid).
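As a rough cross-check of the selected transition, the theoretical m/z values of the doubly charged precursor and the doubly charged y18 fragment can be computed from standard monoisotopic residue masses. The sketch below assumes an unmodified peptide; the function names are hypothetical, and the mass table covers only the residues present in this sequence.

```python
# Standard monoisotopic residue masses (Da) for the residues in this peptide.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "Q": 128.05858, "K": 128.09496, "E": 129.04259,
}
WATER = 18.01056   # H2O added to a free peptide or to a y-type fragment
PROTON = 1.00728   # mass of one charge-carrying proton


def peptide_mz(sequence, charge):
    """m/z of an intact, unmodified peptide at the given charge state."""
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral + charge * PROTON) / charge


def y_ion_mz(sequence, n_residues, charge):
    """m/z of the y-ion containing the last n_residues residues."""
    # A y-ion is the C-terminal fragment plus water, so the same formula applies.
    return peptide_mz(sequence[-n_residues:], charge)


PEPTIDE = "GEPGSVGAQGPPGPSGEEGK"
precursor = peptide_mz(PEPTIDE, charge=2)
y18 = y_ion_mz(PEPTIDE, n_residues=18, charge=2)
print(round(precursor, 2), round(y18, 2))
```

The computed values fall near 897.4 for the precursor and 804.4 for the y18 fragment, in line with the 897.40 to 804.60 transition reported above given the mass accuracy expected of an ion trap instrument.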
After determining the limit of detection and concentration range, the samples were analyzed. Two curves bracketed the samples, with three quality control standards consisting of a pooled 3 day DM sample interspersed in the analysis to ensure instrument reproducibility. The raw data were processed, and concentrations of the target peptide were calculated for each sample per 100 µg of total urine protein. The targeted SRM for this peptide confirmed the increase in abundance observed in label-free analysis at the 2 months DM time point. In addition, Fig. 9 shows that the targeted assay highlighted changes at the 1 month DM time point that were previously undetected. This is attributed to the low intensity level for this peptide in the label-free analysis, which may have fallen below the signal-to-noise threshold of the peak detection software.
Estimated Log2-Fold Change for Label-free Expression Analysis-Supplemental Tables 2-9 contain the estimated log2-fold change (M log-ratios) for individual peptides across each effect or contrast of interest. The M log-ratios (M = log2(E1/E2)) represent a log2-fold change between two or more experimental conditions in the case of a main effect and a difference in log2-fold change in the case of an interaction effect. For example, the log2-fold changes observed for significant peptides in the 2 months treatment comparison ranged from 4.94 to −9.00. These fold changes are to be considered an approximation of the true abundance change because of the data processing required prior to statistical analysis. Pro-alpha (2) 1 collagen is one such example. The estimated average log2-fold change at the 2 months comparison for this peptide is 1.82, which corresponds to a 3.31-fold change; however, an average increase of 3.07-fold was observed in the absolute quantification for these animals at the same time point.
DISCUSSION
Diabetic nephropathy accounts for the majority of cases of ESRD diagnosed each year. Diabetic patients who progress to ESRD have a high risk of mortality, and therefore early intervention is important in the prevention of this disease. Unfortunately, treatment is initiated only after DNP is observed clinically with consistent proteinuria in diabetic patients. Thus, new biomarkers for diabetic nephropathy that detect molecular and cellular changes within the kidney prior to the onset of nephropathy could significantly impact the management of this disease. To evaluate the utility of label-free comparative analysis as a proteomic tool for biomarker discovery in DNP, we analyzed urine from the STZ experimental model of diabetes. The expected pathophysiology for altered end organ function because of diabetic complications in this model typically begins more than 2 months after STZ treatment (63,64). Interestingly, we observed large expression changes at 2 months post-drug treatment as well as some changes at 1 month post-drug treatment, suggesting that this methodology is a sensitive technique for early detection of DNP.
FIG. 7. Volcano plots of treatment effect at 1 month and 2 months. The t test volcano plots arrange peptides by statistical significance. For each diffset, the horizontal axis represents the estimated log2-fold change, and the vertical axis represents the log odds of differential expression (B). Significant peptides are those that have the largest absolute log-fold change and the highest log odds score. Among them, the most significant peptides are those found by ANOVA (highlighted in red (up) and green (down)) and distributed in the top right or left of the plots.
TABLE I. Significant proteins that were assigned as a confirmed identification in treatment comparisons
Adjusted p value was determined using pFDR as described in the methods section. * denotes that alpha-2u globulin is the same protein as MUP but named differently for mouse. The peptides identified for α-2u globulin are unique to that protein and represent a variant of MUP in the rat; therefore, we have included the α-2u globulin identification in the protein list.
FIG. 9. Box plots illustrating abundance of pro-alpha (2) 1 collagen peptide GEPGSVGAQGPPGPSGEEGK in targeted SRM analysis (A) and label-free expression analysis (B). The maximum value within a treatment group is represented by a blue box, the median value by a red box, and the minimum value by a green box. This peptide was not detected at the C 3 day, DM 3 day, or C 1 month time points in label-free analysis; therefore, the red box for these time points represents baseline imputation values. The means and standard deviations (STDEV) for the SRM analysis were the following: C 3 day (n = 1) mean NA, STDEV NA; DM 3 day (n = 3) mean 2.5, STDEV 1.9; C 1 month (n = 2) mean 3.1, STDEV 1.0; DM 1 month (n = 4) mean 6.2, STDEV 3.6; C 2 months (n = 3) mean 5.8, STDEV 0.96; and DM 2 months (n = 3) mean 17.8, STDEV 11.9.
As mentioned above, of the 1429 peptides that progressed to statistical analysis, 633 had adjusted p values of 0.05 or less for the treatment and/or interaction (treatment + time) comparison at one or two months post-STZ treatment. Of these, 285 down-regulated peptides were observed. The hypoinsulinemic state of the STZ rat may also contribute to the decrease in peptide/protein abundance observed in our study because insulin regulates gene expression for many proteins (65). One example is major urinary protein precursor, which was one of the most significant proteins observed in both the interaction and treatment analyses, with 23 peptides identified and an average adjusted p value of 3.68E-03. Major urinary protein in mature male rats accounts for 30-50% of total protein excreted. The primary synthesis site for this protein is the liver, and it is secreted into the blood. The protein is absorbed in the kidney from plasma. It has been shown to bind a number of different hydrophobic small molecules, and its physiological role is believed to be pheromone transport in urine (66). Insulin deficiency in diabetic rats has been observed to trigger a reduction of urinary output for this protein. The primary cause of this reduction is the regulatory role of insulin in major urinary protein expression (67,68). In addition, our urinary results are consistent with Northern blot analysis of mRNA for this protein, which showed a significant decrease in expression at the mRNA level. In that study, the rate of transcription was decreased 10-fold, with complete recovery after treatment of the streptozotocin rat with insulin (69). Just as decreased insulin concentrations can regulate gene expression and corresponding protein abundance, the resulting high levels of circulating glucose because of hypoinsulinemia can also affect gene expression and protein abundance. Pro-alpha (2) 1 collagen is one such example. Its protein abundance was increased at the 2 months time point and was highlighted as significant in this comparison with an adjusted p value of 0.047. In addition, SRM analysis of the samples analyzed by label-free analysis easily detected this peptide across all samples and confirmed the expression change at 2 months post-STZ treatment as well as a potential expression change at one month post-STZ treatment. Pro-alpha (2) 1 collagen is a precursor to type 1 collagen, which is an extracellular matrix protein. It is of particular biological interest in diabetes, as expansion of the glomerular mesangial matrix results in diffuse intercapillary sclerosis, which is the most important structural lesion in DNP (15,16,70). Studies utilizing animal models suggest that increased synthesis of extracellular matrix components occurs early in the progression of the disease, where high levels of glucose stimulate gene expression of these proteins (71)(72)(73)(74). Interestingly, another protein that was observed to have increased abundance at 2 months post-STZ treatment was Ig kappa light chain. Six peptides from this protein were identified as significant at this time point, with an average adjusted p value of 0.0058.
These results are consistent with observations made in a recent plasma proteomic study of STZ-treated rats, in which a 7.5-fold decrease of this protein in plasma was observed in the STZ-treated rat (75). Moreover, increased urinary excretion of this protein in patients with DM has been observed (76). Increased levels of Ig kappa light chain in plasma and its subsequent increase in urine may be of significant importance in the accumulation of extracellular matrix protein in DNP. Studies of other glomerular-associated diseases suggest a pathophysiological role of immunoglobulins in DNP. One example is myeloma-mediated monoclonal Ig deposition disease (MIDD). Myeloma results in increased plasma levels of Ig kappa, which subsequently lodges in the kidney and leads to MIDD (77). MIDD is a disease in which kidney deposition of Ig subunits induces an accumulation of extracellular matrix protein, resulting in structural lesions that are similar to those found in DNP (78). The similarity of glomerular lesions associated with DNP and those derived from myeloma MIDD suggests a common pathophysiological role of immunoglobulins in the development of DNP.
In this study, we have utilized label-free protein expression to study protein changes in urine in the streptozotocin model of diabetes. To the best of our knowledge, this work represents the first application of label-free expression to detect urinary protein changes from diabetic complications. We have successfully identified a number of protein changes specific to STZ treatment that occur early in the development of DNP. These represent potential markers of DNP that may assist in the diagnosis, treatment, and prevention of this complication. It should be noted that there are limitations of the STZ-rat model of experimental diabetes as no single experimental preparation is an ideal animal model for diabetes in humans.
In addition, some of the proteins identified in this study may not have human orthologs and therefore will not be suitable human targets. However, this analysis can be used to direct additional discovery efforts in human samples to confirm protein observations made in this study as well as identify novel human biomarker candidates as the techniques developed here can be easily translated to a human analysis.