Naturally Occurring Human Urinary Peptides for Use in Diagnosis of Chronic Kidney Disease

Because of its availability, ease of collection, and correlation with physiology and pathology, urine is an attractive source for clinical proteomics/peptidomics. However, the lack of comparable data sets from large cohorts has greatly hindered the development of clinical proteomics. Here, we report the establishment of a reproducible, high resolution method for peptidome analysis of naturally occurring human urinary peptides and proteins, ranging from 800 to 17,000 Da, using samples from 3,600 individuals analyzed by capillary electrophoresis coupled to MS. All processed data were deposited in a Structured Query Language (SQL) database. This database currently contains 5,010 relevant unique urinary peptides that serve as a pool of potential classifiers for diagnosis and monitoring of various diseases. As an example, using this source of information, we were able to define urinary peptide biomarkers for chronic kidney diseases that allow diagnosis of these diseases with high accuracy. Application of the chronic kidney disease-specific biomarker set to an independent test cohort in the subsequent replication phase resulted in 85.5% sensitivity and 100% specificity. These results indicate the potential usefulness of capillary electrophoresis coupled to MS for clinical applications in the analysis of naturally occurring urinary peptides.

Chronic kidney disease (CKD) is often characterized by a slow, progressive loss of renal function, with a decline in glomerular filtration over a period of months or years, that may eventually lead to end stage renal disease (ESRD). Patients with ESRD require renal replacement therapy (dialysis or kidney transplantation). The most common causes of CKD in North America, Europe, and Japan are diabetic nephropathy, hypertension, and glomerulonephritis (1). Together, these diseases account for ~75% of all adult cases of ESRD. Historically, kidney diseases were classified according to the anatomical compartment of the kidney that is involved. On this basis, vascular diseases include large and small vessel diseases, such as hypertensive nephropathy and vasculitis. Glomerular diseases comprise a diverse group of histologically defined primary glomerulopathies (e.g. focal segmental glomerulosclerosis (FSGS), membranous glomerulonephritis (MGN), minimal change disease (MCD), and IgA nephropathy (IgAN)) and secondary glomerulopathies due to diabetes mellitus (diabetic nephropathy), systemic autoimmune disorders (e.g. lupus erythematosus and hemolytic-uremic syndrome), or chronic viral infection (e.g. hepatitis and HIV). Tubular diseases are characterized by low molecular weight proteinuria and multiple transport defects (e.g. De Toni-Debré-Fanconi syndrome).
In clinical practice, renal damage is generally detected by proteinuria/albuminuria on urinalysis or quantitative measurement, by changes in serum creatinine concentration used to estimate glomerular filtration rate, or by both. However, these methods have major limitations: they are nonspecific, and the changes they detect are frequently late manifestations of renal damage. We have therefore sought to define alternative biomarkers of renal damage that may enable earlier and more accurate disease assessment.
Analysis of urine plays a central role in clinical diagnostics because urine can be collected non-invasively. As a body fluid for clinical analysis, urine is relatively stable, probably because it is "stored" for hours in the bladder; hence, proteolytic degradation by endogenous proteases may be essentially complete by the time of voiding (2). This is in sharp contrast to blood, whose collection is inevitably associated with the activation of proteases and, consequently, the generation of an array of proteolytic breakdown products (3). The human urinary peptidome has been extensively investigated to gain insight into disease processes affecting the kidney and the urogenital tract (4-6). Urinary proteins and peptides originate not only from glomerular filtration but also from tubular secretion, epithelial cells shed from the kidney and urinary tract, secreted exosomes, and seminal secretions (7-9). Because its proteome changes in disease-specific ways, urine is a rich source of biomarkers for a wide range of diseases (10-13). To test the feasibility of urinary proteomics as a non-invasive diagnostic tool, large scale studies are needed that analyze urine samples with reliable and quantitative experimental procedures. Various techniques have been applied to this effort, including two-dimensional electrophoresis combined with mass spectrometry (MS) and/or immunochemical identification of proteins (14-16), liquid chromatography coupled to mass spectrometry (LC-MS) (17, 18), and surface-enhanced laser desorption/ionization mass spectrometry (SELDI-MS) (19).
Mostly because of technical challenges, proteomics studies are often restricted to the comparison of two groups of subjects (i.e. healthy controls versus patients with a well defined disease entity) with only a few individuals in each group. The lack of comparability between such studies severely limits the suitability of their data for a meta-analysis approach to define broadly applicable biomarkers. Consequently, findings from several studies cannot be combined to explore the human urinary proteome/peptidome in its entirety. In addition, the health state of patients with kidney disease is often too heterogeneous to be reliably classified by biomarkers identified by such a strictly single disease-oriented approach. Diagnosis of individuals with different stages or types of kidney disease (disease controls) is conceivable by multiplex screening of proteomics data. Realizing such an approach depends critically on a measurement platform that allows analysis of proteomic profiles with high resolution within a reasonably short time and on the generation of a reference database for the human urinary proteome/peptidome.
Capillary electrophoresis coupled to mass spectrometry (CE-MS) enables reproducible and robust high resolution analysis of several thousand low molecular weight urinary proteins/peptides in less than an hour (5). In comparison with other proteomics methods, CE-MS offers several advantages. (i) It provides fast separation with high resolution. (ii) It is robust: capillaries are inexpensive and can be reconditioned efficiently using NaOH. (iii) It is compatible with most volatile buffers and analytes generally required for ESI. (iv) It provides a stable, constant flow, avoiding the need for buffer gradients (for more details, see recent reviews (3, 5, 20)). This approach has recently been used to analyze urine samples from healthy individuals and patients with various chronic kidney diseases in several independent masked studies (21-27), including IgAN (28), diabetic nephropathy (29), and ANCA-associated vasculitis (30). The large number of data sets analyzed under identical conditions on the same technological platform allows comprehensive characterization of the low molecular weight proteome (peptidome), which can then become a primary source of information for the diagnosis, classification, and monitoring of a wide range of diseases. Here, we report the analysis of the human urinary peptidome by CE-MS and the identification of urinary peptide biomarkers for the detection of pathological changes in the kidney during the development of many forms of CKD. Furthermore, we have replicated these findings in an independent cohort.

EXPERIMENTAL PROCEDURES
Sample Collection-Since 2004, urine samples have been collected at >20 clinical centers (Europe, America, and Australia) using consistent standard operating procedures for sample collection, storage, and transport (H. Mischak, unpublished data). All samples were collected as a midstream portion of a spontaneously voided, second morning urine (with the exception of urine samples for the analysis of prostate cancer, for which an initial portion of a second morning urine was collected) and were stored immediately at −20°C until analysis. The accepted diagnostic reference standard of the respective disease was used to characterize disease status (e.g. cystoscopy for the analysis of bladder cancer and renal biopsy for diagnosis of the type of CKD). All healthy controls (HC) were checked for proteinuria by dipstick urinalysis. Clinical data for this study are shown in Table I. Furthermore, clinical parameters of the majority of individual samples have been published previously (25, 28, 30) and are available upon request. The study complied with the guidelines of the Declaration of Helsinki (www.wma.net/en/30publications/10policies/b3/index.html). Informed consent was obtained from all subjects, and ethical approval was obtained from the appropriate Institutional Review Boards.
Sample Preparation-A 0.7-ml aliquot of urine was thawed immediately before use and diluted with 0.7 ml of an aqueous solution supplemented with 2 M urea, 10 mM NH4OH, and 0.02% SDS. To remove proteins of higher molecular mass, the sample was filtered using Centrisart ultrafiltration devices (20-kDa molecular mass cutoff; Sartorius, Goettingen, Germany) at 3,000 × g for 45 min at 4°C. Subsequently, 1.1 ml of filtrate was applied onto a PD-10 desalting column (GE Healthcare) equilibrated in 0.01% NH4OH in HPLC grade water to remove urea, electrolytes, and salts. Finally, all samples were lyophilized, stored at 4°C, and resuspended in HPLC grade water shortly before CE-MS analysis as described previously (27). The resuspension volume was adjusted to yield 0.8 g/l protein as measured by BCA assay (Interchim, Montluçon, France).
CE-MS Analysis-CE-MS analysis was performed with a P/ACE™ MDQ capillary electrophoresis system (Beckman Coulter, Brea, CA) coupled online to a micro-TOF-MS instrument (Bruker Daltonics, Bremen, Germany) (27). The electrospray ionization (ESI) sprayer (Agilent Technologies, Santa Clara, CA) was grounded, and the ion spray interface potential was set between −4.0 and −4.5 kV. Data acquisition and MS acquisition methods were automatically controlled by the CE system via contact closure. Spectra were accumulated every 3 s over a range of m/z 350-3,000.
The analytical characteristics of the CE-MS system have been extensively investigated by Kolch et al. (3) and Theodorescu et al. (27). Briefly, the average recovery of the sample preparation was ~85%, with a detection limit of ~1 fmol. Monoisotopic mass signals were resolved for z ≤ 6. Mass accuracy of the CE-TOF-MS method was determined to be <25 ppm for monoisotopic resolution and <100 ppm for unresolved peaks (z > 6).
CE-FT-ICR-MS Analysis-For CE-FT-ICR-MS, a Bruker Daltonics Apex Qe instrument equipped with a 12-tesla magnet and an Apollo II ion source was used. The P/ACE™ 5510 capillary electrophoresis system (Beckman Coulter) was coupled via the Agilent ESI sprayer as described above. The instrument was tuned with a peptide standard mixture (27) and externally mass-calibrated on arginine clusters (<0.1-ppm calibration errors). Mass spectra were acquired over an m/z range of 300-2,000. Ions were stored in the collision cell for 500 ms, and five spectra were accumulated for each scan, resulting in a scan interval of 5 s.
A set of 20 samples was reanalyzed by CE-FT-ICR-MS. The number of FT-ICR-detectable features was substantially lower (by a factor of ~10). After normalization of the CE migration time, the CE-FT-ICR-MS analysis yielded ~500 features that could be matched to CE-TOF-MS entities in the urinary database. Eighty of these features were selected, either because they could be aligned with high confidence (their sequence had been identified or they were present at relatively high abundance) or because together they provided good coverage of the whole CE migration time and mass range. For mass calibration of the CE-TOF-MS data, the precise masses of these features determined by CE-FT-ICR-MS (see supplemental Table III) were used as references in a linear regression (see also Ref. 31).
Data Processing-Mass spectral ion peaks representing identical molecules at different charge states (m/z with z = 1, 2, 3, ...) were deconvoluted into single masses using MosaiquesVisu software (32) (www.proteomiques.com). Only signals observed in at least three consecutive spectra with a signal-to-noise ratio of at least 4 were considered. Signals with a calculated charge of 1+ were automatically excluded to minimize interference from matrix compounds or drugs. MosaiquesVisu uses a probabilistic clustering algorithm and uses both isotopic distributions and conjugated masses to determine the charge state of the entities. TOF-MS data were calibrated by linear regression using Fourier transform-ion cyclotron resonance mass spectrometry (FT-ICR-MS) data as reference masses. Both CE migration time and ion signal intensity (amplitude) showed high variability, mostly due to different amounts of salt and peptides in the samples, and were consequently normalized. Reference signals of over 1,700 urinary entities were used for CE time calibration by local regression. For normalization of analytical and urine dilution variances, MS signal intensities were normalized relative to 29 "housekeeping" peptides with small relative standard deviations. For calibration, linear regression was performed (27, 33). The resulting peak list characterized each feature by its molecular mass (Da) and normalized CE migration time (min). Normalized signal intensity was used as a measure of relative abundance.
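The charge-state handling can be sketched as follows, assuming each peak already carries an assigned charge z (MosaiquesVisu determines z from isotope distributions and conjugated masses, which is not reproduced here). Neutral masses of protonated ions follow M = z·(m/z) − z·m_p. The acceptance rules (at least 3 consecutive spectra, S/N of at least 4, no 1+ ions) come from the text; the helper names and the 50-ppm merge tolerance are illustrative assumptions.

```python
from dataclasses import dataclass

PROTON_MASS = 1.007276  # Da


@dataclass
class IonSignal:
    mz: float                 # observed m/z
    charge: int               # assigned charge state z (assumed already known)
    signal_to_noise: float
    consecutive_spectra: int  # consecutive spectra in which the signal was seen


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass of a protonated [M + zH]^z+ ion."""
    return charge * mz - charge * PROTON_MASS


def accept_signal(sig: IonSignal) -> bool:
    """Acceptance rules stated in the text."""
    return (
        sig.consecutive_spectra >= 3
        and sig.signal_to_noise >= 4
        and sig.charge != 1  # singly charged ions are excluded
    )


def deconvolute(signals, merge_ppm=50.0):
    """Neutral masses of accepted signals; charge states of one molecule are merged.

    Hypothetical simplification: masses within merge_ppm of the previously kept
    mass are treated as the same molecule seen at another charge state.
    """
    masses = sorted(neutral_mass(s.mz, s.charge) for s in signals if accept_signal(s))
    merged = []
    for m in masses:
        if merged and abs(m - merged[-1]) / merged[-1] * 1e6 <= merge_ppm:
            continue
        merged.append(m)
    return merged
```

For example, a 2,000-Da peptide observed as 2+ and 3+ ions collapses to a single neutral mass, while its 1+ ion is discarded by the charge rule.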
Data sets were accepted only if the following quality control criteria were met. A minimum of 950 chromatographic features (mean number of features minus one standard deviation) must be detected with a minimal MS resolution of 8,000 (the resolution required to resolve ion signals with z = 6) in a minimal migration time interval (the time window in which separated signals can be detected) of 10 min. After calibration, the mean deviation of migration time (compared with reference standards) must be below 0.35 min.
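The quality control rules translate directly into a check on per-run summary values. The sketch below is hypothetical (the RunSummary fields and function names do not come from MosaiquesVisu); it only encodes the four thresholds stated above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RunSummary:
    n_features: int                 # features detected in the run
    ms_resolution: float            # achieved MS resolution
    migration_window_min: float     # window in which separated signals were detected (min)
    mean_time_deviation_min: float  # mean migration time deviation after calibration (min)


def qc_failures(run: RunSummary) -> List[str]:
    """Return the list of violated criteria (empty if the run is accepted)."""
    failures = []
    if run.n_features < 950:
        failures.append("fewer than 950 features")
    if run.ms_resolution < 8000:
        failures.append("MS resolution below 8,000")
    if run.migration_window_min < 10:
        failures.append("migration time window shorter than 10 min")
    if run.mean_time_deviation_min >= 0.35:
        failures.append("mean migration time deviation not below 0.35 min")
    return failures


def passes_qc(run: RunSummary) -> bool:
    return not qc_failures(run)
```

Returning the list of failed criteria, rather than a single boolean, makes it easier to see why a data set was rejected.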
Urinary Peptidome Database-All features detected in the urine samples that passed the quality control criteria (on average, 1,724 features were detected in each urine sample, ranging from 983 to 4,094) were deposited in a Microsoft Structured Query Language (SQL) database and subsequently matched for further analysis and comparison of individual samples. For clustering, features in different samples were considered identical if the mass deviation was lower than ±50 ppm for 800-Da entities, gradually increasing to ±75 ppm for 15-kDa features. Due to analyte diffusion effects, CE peak widths increase with CE migration time. In the data clustering process, this effect was taken into account by linearly increasing cluster widths over the entire electropherogram (19-45 min) from 2 to 5%. A feature detected in a specific cluster was assigned to the respective protein ID together with its amplitude value; samples without a signal in a given cluster were assigned an amplitude of 0 for that cluster. After calibration, deviation of migration time was controlled to be below 0.35 min. This process resulted in the tentative definition of 116,869 different features. Each feature (presumably a peptide) was assigned a unique identification number (protein ID). As described previously (25, 34), several of these features appeared sporadically, being observed in only one or a few samples. To eliminate such entities of apparently low significance from the analysis, only features detected in more than 20% of the urine samples in at least one group (samples from patients with the same disease) were investigated further. This noise-filtering process substantially reduced the number of features available for analysis: applying these limits, 5,010 "relevant" different entities characterized by molecular mass (Da) and normalized CE migration time (min) were detected. The filtered data of all individual samples are available.
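The clustering tolerances and the frequency filter can be sketched as below. Two points are assumptions rather than statements from the text: the ppm tolerance is interpolated linearly between 800 Da and 15 kDa (and clamped outside that range), and the 2-5% cluster width is read as a ± fraction of the cluster's migration time. All function names are hypothetical.

```python
from collections import defaultdict


def _interpolate(x, x0, x1, y0, y1):
    """Linear interpolation between (x0, y0) and (x1, y1), clamped at both ends."""
    frac = min(max((x - x0) / (x1 - x0), 0.0), 1.0)
    return y0 + frac * (y1 - y0)


def mass_tolerance_ppm(mass_da):
    """±50 ppm at 800 Da rising to ±75 ppm at 15 kDa (linear ramp assumed)."""
    return _interpolate(mass_da, 800.0, 15000.0, 50.0, 75.0)


def time_tolerance_fraction(time_min):
    """Cluster width of 2% at 19 min rising to 5% at 45 min."""
    return _interpolate(time_min, 19.0, 45.0, 0.02, 0.05)


def same_feature(mass, time, center_mass, center_time):
    """True if a signal falls inside the cluster centered at (center_mass, center_time)."""
    ppm = abs(mass - center_mass) / center_mass * 1e6
    dt = abs(time - center_time)
    return (ppm <= mass_tolerance_ppm(center_mass)
            and dt <= time_tolerance_fraction(center_time) * center_time)


def relevant_features(amplitudes, sample_groups, min_fraction=0.20):
    """Keep features detected (amplitude > 0) in more than min_fraction of the
    samples of at least one group.

    amplitudes: dict feature_id -> list of amplitudes, aligned with sample_groups
    sample_groups: list of group labels, one per sample
    """
    group_sizes = defaultdict(int)
    for g in sample_groups:
        group_sizes[g] += 1
    keep = []
    for fid, amps in amplitudes.items():
        detected = defaultdict(int)
        for amp, g in zip(amps, sample_groups):
            if amp > 0:  # amplitude 0 means "not detected in this sample"
                detected[g] += 1
        if any(detected[g] / group_sizes[g] > min_fraction for g in detected):
            keep.append(fid)
    return keep
```

Because the filter only requires one group to exceed the threshold, a peptide that is present mainly in a single disease group is retained even if it is rare overall.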
Statistical Analysis-Estimates of sensitivity and specificity were calculated by tabulating the number of correctly classified samples. Confidence intervals (95% CI) were based on binomial calculations performed with MedCalc version 8.1.1.0 (MedCalc Software, Mariakerke, Belgium; www.medcalc.be). The receiver operating characteristic (ROC) plot was evaluated because it provides a single measure of overall accuracy that does not depend on a particular threshold (35).
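The binomial confidence intervals can be approximated with an exact (Clopper-Pearson) interval, sketched here with only the Python standard library by bisection on the binomial tail probabilities. Whether MedCalc 8.1 uses exactly this method is an assumption, so the last digit may not match the reported values; the function names are hypothetical.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def _bisect(f, target, increasing, iters=100):
    """Find p in [0, 1] with f(p) == target for a monotone function f."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # the root lies above mid
        else:
            hi = mid  # the root lies below mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a proportion x/n."""
    # Lower bound: P(X >= x) = alpha/2, increasing in p.
    lower = 0.0 if x == 0 else _bisect(lambda p: binom_sf(x, n, p), alpha / 2, True)
    # Upper bound: P(X <= x) = alpha/2, decreasing in p.
    upper = 1.0 if x == n else _bisect(lambda p: binom_cdf(x, n, p), alpha / 2, False)
    return lower, upper


def sensitivity_specificity(tp, fn, tn, fp):
    """Fractions of correctly classified cases and controls."""
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Test-set counts reported in the Results: 94 of 110 CKD, 34 of 34 controls.
    sens, spec = sensitivity_specificity(tp=94, fn=16, tn=34, fp=0)
    print(f"sensitivity {sens:.1%}, 95% CI {clopper_pearson(94, 110)}")
    print(f"specificity {spec:.1%}, 95% CI {clopper_pearson(34, 34)}")
```

When every control is classified correctly (x = n), the lower bound has the closed form (α/2)^(1/n), which is a convenient check on the bisection.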
For statistical differential analysis, we set a frequency threshold of 30% for a feature to be deemed valid in one of the considered groups and included in downstream analysis. For each entity, q values were estimated with the permutation method implemented in the CRAN package "samr" (36) (available at http://cran.at.r-project.org/web/packages/index.html). All peptides with a q value <0.05 in the samr output were retained for downstream analysis. The q values are p values adjusted by an optimized false discovery rate (FDR) approach, which uses characteristics of the p value distribution to produce a list of q values. A p value threshold of 0.05 implies that 5% of all tests of true null hypotheses will result in false positives, whereas an FDR-adjusted p value (q value) threshold of 0.05 implies that 5% of the tests called significant will be false positives. The latter is typically a far smaller number. To test the results of the samr analysis, we calculated a p value for each feature using the standard Wilcoxon rank sum test. From these p values, q values were obtained using the CRAN package "fdrtool" (available at http://cran.at.r-project.org/web/packages/index.html) and the Bioconductor package "fdrame" (37, 38) (available at http://www.bioconductor.org/packages/release/Software.html). The agreement between the lists of differentially expressed markers (i.e. q value <0.05) obtained with the different packages was higher than 94%. Supervised classification was performed on the training data using MosaCluster software (Biomosaiques Software GmbH) for support vector machine (SVM) classification (23).
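The q value idea can be illustrated with the Benjamini-Hochberg step-up procedure, a standard FDR adjustment. This is a swapped-in technique for illustration only: it is neither the SAM permutation method of samr nor the estimators in fdrtool or fdrame. The p values are taken as input (for example from a Wilcoxon rank sum test), and reading the 30% frequency threshold as "detected in at least 30% of a group's samples" is an assumption. Function names are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values (q values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def frequency_ok(detected_counts, group_sizes, threshold=0.30):
    """True if the feature is detected in at least `threshold` of some group's samples."""
    return any(detected_counts.get(g, 0) / n >= threshold for g, n in group_sizes.items())


def select_markers(pvalues, frequency_pass, alpha=0.05):
    """Apply the frequency filter first, then keep features with q < alpha.

    pvalues: dict feature_id -> p value
    frequency_pass: dict feature_id -> bool (result of frequency_ok)
    """
    candidates = [fid for fid in pvalues if frequency_pass[fid]]
    qvalues = benjamini_hochberg([pvalues[fid] for fid in candidates])
    return sorted(fid for fid, q in zip(candidates, qvalues) if q < alpha)
```

Filtering by detection frequency before the adjustment reduces the number of tests m, which in turn makes the FDR correction less severe for the features that remain.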
Classification-MosaCluster (version 1.6.5) was developed to discriminate between different patient groups. This software tool classifies samples in a high dimensional parameter space using SVM. For this purpose, MosaCluster generates polypeptide models based on peptides that display statistically significant differences when data from patients with a specific disease are compared with those from healthy controls or patients with other diseases. Each of these peptides represents one dimension in the n-dimensional parameter space (25, 39-41).
CE-MS Platform Validation-For statistical evaluation, standard deviations were compared with the F-test. The classification score of the CKD classification model was used for statistical testing because it reflects the overall variability of the system (detection, deconvolution, clustering, and classification). Relative standard deviations were calculated with respect to the entire observed classification range of the CKD model (lowest observed values > −1.4 and highest observed values < +1.6, i.e. a span of 3). Hence, a standard deviation of S.D. = 3 corresponds to 100% (statistical spread).
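The conversion from score standard deviation to relative spread, and the variance ratio used in an F-test, can be written as a short sketch. The span of 3.0 follows from the score range given above (−1.4 to +1.6); the function names are hypothetical. A p value would additionally require the F distribution with the corresponding (n − 1) degrees of freedom, for example via scipy.stats.f.sf, which is left out here to keep the sketch dependency-free.

```python
from statistics import stdev, variance

# Observed span of CKD classification scores: lowest > -1.4, highest < +1.6.
SCORE_RANGE = 3.0


def relative_spread_percent(scores):
    """Standard deviation of classification scores as a percentage of the span."""
    return stdev(scores) / SCORE_RANGE * 100.0


def f_statistic(scores_a, scores_b):
    """Variance ratio (larger over smaller) for comparing two standard deviations."""
    va, vb = variance(scores_a), variance(scores_b)
    return max(va, vb) / min(va, vb)
```

On this scale, a replicate series whose scores have S.D. = 0.33 corresponds to a statistical spread of 11%.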
To assess intra- and intersample variation, reproducibility was based on the analysis of three individual urine samples (healthy controls), each processed in 15 replicates. Each urine sample was prepared 15 times by one operator, and the 15 replicates were measured with the same device on 1 day. On the following days, the second and third batches of 15 replicates were processed as independent sets of experiments.
After variance had been assessed over a short period of time, the dependence on the time point of analysis and on operator/device was assessed for one of the three urine samples (intermediate precision; see reproducibility analysis below). To this end, on 15 different days the urine sample was prepared twice by two different operators; one aliquot was measured on device 1, and the second aliquot was measured on device 2.
To determine the variance of CKD classification results within individuals over the course of a month, classification results were assessed for four healthy volunteers (two males and two females). For each individual, second morning urine samples were collected on Wednesday (reflecting working activities) and on Sunday (reflecting weekend activities) for 4 weeks. The sampling period thus captured variance related to work-life balance and, for females, to the menstrual cycle. All urine samples were stored at −20°C immediately after collection for at least 24 h until preparation. All samples were prepared together as a single batch by one operator and were measured using the same device over a short period of time.
Sequencing of Peptides-Candidate biomarkers and other native peptides from urine were sequenced by CE-MS/MS or LC-MS/MS analysis on different sequencing platforms (42). Sequencing using MALDI-TOF-TOF was performed on an Ultraflex instrument (Bruker Daltonics). To retain the information on migration time, the entire CE run (see "CE-MS Analysis") was spotted onto a MALDI target plate using a Probot microfraction collector (LC Packings, San Francisco, CA), depositing one spot every 15 s. Matrix solution (2 mg/ml α-cyano-4-hydroxycinnamic acid in 50% acetonitrile and 0.1% TFA) was added to the CE eluate during application onto the plate at a flow rate of 4 µl/min. The plate was initially examined in MS mode. Subsequently, peptides were fragmented in tandem mass spectrometry (MS/MS) mode. MALDI-TOF-TOF-MS/MS was carried out using MALDI postsource decay (PSD) in combination with the LIFT cell TOF/TOF setup of the Ultraflex mass spectrometer.
For sequencing using ESI-Q-TOF, a Qq-orthogonal time-of-flight mass spectrometer (micrOTOF-Q, Bruker Daltonics) operated in MS and MS/MS modes and coupled directly to a CE system via a coaxial sheath liquid sprayer (Agilent Technologies) was used. The CE conditions were adapted to those described under "CE-MS Analysis" to obtain comparable peptide migration times. The MS analysis was performed in positive electrospray mode with an ESI-TOF sprayer kit from Agilent Technologies (Palo Alto, CA). The ESI sprayer was grounded, and the sheath liquid, consisting of 30% (v/v) isopropanol (Sigma-Aldrich Chemie GmbH) and 0.5% (v/v) formic acid in HPLC grade water, was applied coaxially at a rate of 2 µl/min. The MS system was operated in data-dependent MS/MS mode with one full-scan MS spectrum (m/z 50-3,000) followed by three MS/MS spectra. The repetition rate of the TOF was set to 1.5 Hz in full-scan mode and 1 Hz in MS/MS mode. The instrument was operated at a resolution of 15,000. Argon was used as the collision gas, and collision energies were set automatically depending on the m/z value and charge state of the peptide. All data were externally calibrated using a sodium formate cluster; this calibration was used for both MS modes.
For sequencing using ESI ion trap, an aliquot corresponding to ~25-50 µg (2.5-5 µl) of a sample was loaded onto a preanalytical column that consisted of a 10-cm-long piece of 360-µm-outer diameter × 100-µm-inner diameter fused silica (Polymicro Technologies, Phoenix, AZ) packed with C18 particles (5-15-µm YMC ODS-A, Waters, Milford, MA). After sample loading, the column was washed extensively with 0.1% acetic acid and then butt-connected using a Teflon sleeve to an analytical column, which consisted of a 10-cm-long piece of 360-µm-outer diameter × 50-µm-inner diameter fused silica with an integrated nanospray tip (1-2-µm orifice) and packed with C18 resin (5-µm YMC ODS-AQ). Peptides were eluted using high performance liquid chromatography (Agilent 1100, Agilent Technologies, Palo Alto, CA) at a flow rate of ~100 nl/min using a splitter. Peptides were eluted directly into an ESI interface and analyzed with a Finnigan LTQ linear quadrupole ion trap mass spectrometer (Thermo Electron Corp., San Jose, CA). The LTQ was operated with a "top 10" data-dependent analysis method consisting of a repeated data acquisition cycle of one full mass spectrum (m/z 400-2,000) followed by 10 MS/MS spectra, which corresponded to fragmentation mass spectra of the 10 most abundant ions from the MS scan.
MS/MS experiments were also performed on an Ultimate 3000 nanoflow system (Dionex/LC Packings, Sunnyvale, CA) connected to an LTQ Orbitrap hybrid mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) equipped with a nanoelectrospray ion source. The mass spectrometer was operated in data-dependent mode to switch automatically between MS and MS/MS acquisition. Survey full-scan MS spectra (from m/z 300 to 2,000) were acquired in the Orbitrap. Ions were sequentially isolated for fragmentation in the linear ion trap using collision-induced dissociation. General mass spectrometric conditions were as follows: electrospray voltage, 1.6 kV; no sheath and auxiliary gas flow; ion transfer tube temperature, 225°C; collision gas pressure, 1.3 mTorr; normalized collision energy, 32% for MS/MS. The ion selection threshold was 500 counts for MS/MS. In addition, samples were analyzed using electron transfer dissociation (43-45). Peptides were separated by nano-reversed phase HPLC (Agilent 1100; flow split by a tee to ~60 nl/min) and introduced into an electron transfer dissociation-capable LTQ quadrupole linear ion trap (Thermo Fisher Scientific, San Jose, CA) via nano-ESI using instrumental parameters described previously (46).
Raw data files were either converted into dta files (RAW files generated by ion traps from Thermo Fisher Scientific) with the use of DTA Generator (47,48) or into mgf files (data derived from MALDI-TOF and Q-TOF analyses) with the use of DataAnalysis (version 4.0; Bruker Daltonics). All resultant MS/MS data were submitted to MASCOT (www.matrixscience.com; release number, 2.3.01) for a search against human entries (20,295 sequences) in the Swiss-Prot database (Swiss-Prot Number 2010.06) without any enzyme specificity and with up to one missed cleavage. No fixed modification was selected, and oxidation products of methionine, proline, and lysine residues were set as variable modifications. The accepted parent ion mass deviation was 0.5 Da (20 ppm for all Orbitrap spectra); the accepted fragment ion mass deviation was 0.7 Da. Only search results with a MASCOT peptide score equal to or higher than the MASCOT score threshold were included (see supplemental Table I).
Additionally, ion coverage was checked to ensure that it corresponded to the main spectral fragment features (b/y or c/z ion series) (see also the supplemental figures). For further validation of the peptide identifications obtained, the strict correlation between peptide charge at pH 2 and CE migration time was used to minimize false positive identification rates (42, 49). As depicted in Fig. 1, the polypeptides are arranged in four to five lines. The members of each line are characterized by the number of basic amino acids (arginine, histidine, and lysine) in the peptide sequence. Specifically, the peptides in the right line contain no basic amino acids; only the N terminus of the peptide is positively charged at pH 2. In contrast, peptides in the other lines (from right to left) contain increasing numbers of basic amino acids in addition to their N-terminal ammonium group (49). The CE migration time of each sequence candidate, calculated from its peptide sequence (number of basic amino acids), was compared with the experimental migration time. A peptide was accepted only if it had a mass deviation below ±25 ppm and a CE migration time deviation of less than ±2 min. The full list of identified peptides and MS/MS spectra is available at Human ProteinPedia (http://www.humanproteinpedia.org; accession number HuPA_00670).
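The acceptance rule for sequence candidates can be sketched as follows. The charge at pH 2 is counted as the N-terminal amino group plus one per arginine, histidine, or lysine residue, as described above. The model that converts charge and mass into a predicted migration time is not given in the text, so predicted_time is simply an input here; all function names are hypothetical.

```python
BASIC_RESIDUES = set("RHK")  # arginine, histidine, lysine


def charge_at_ph2(sequence: str) -> int:
    """N-terminal ammonium group plus one positive charge per basic residue."""
    return 1 + sum(1 for aa in sequence.upper() if aa in BASIC_RESIDUES)


def ppm_deviation(measured: float, theoretical: float) -> float:
    """Mass deviation in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def accept_identification(measured_mass, theoretical_mass,
                          observed_time, predicted_time,
                          max_ppm=25.0, max_time_dev=2.0):
    """Accept a sequence candidate only if both deviations are within limits.

    predicted_time (min) must come from a charge-based migration model,
    which is assumed here and not implemented.
    """
    return (abs(ppm_deviation(measured_mass, theoretical_mass)) < max_ppm
            and abs(observed_time - predicted_time) < max_time_dev)
```

For instance, a collagen-type sequence with a single lysine falls on the second line (charge 2 at pH 2), while a sequence with two arginines falls on the third.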

RESULTS
All urine samples were prepared using a standard protocol and analyzed by CE-MS, resulting in individual data sets that generally contained information on 1,200-2,000 features (presumably peptides) per sample. All information recommended by the "minimum information about proteomics experiments" guidelines (50) for proteome analysis was recorded and is available upon request. Each peptide in the resulting list was defined by mass, migration time, and ion counts, the last of which served as a measure of the relative abundance of the peptide. To improve mass accuracy, 20 samples were reanalyzed using CE coupled online to FT-ICR-MS. Because of the high cost and the lower sensitivity (higher detection limit) of FT-ICR-MS instruments compared with TOF instruments, it was not practical to analyze all samples by CE-FT-ICR-MS. The high FT-ICR-MS resolution also enabled accurate analysis of the first isotope signal of ions with z > 6, which is crucial for determining the exact mass of proteins and high molecular weight peptides. These data were used to refine the TOF-MS masses of the urinary peptides.
During additional calibration steps using "internal standard" peptides (27, 33) for reference, CE migration time and signal intensity were corrected for analytical variances and urine dilution effects. All calibrated data are publicly accessible on the Internet. A subsequent, thorough data calibration (see "Data Processing") allowed digital compilation (matching of data from respective samples) of individual data sets into an averaged "case-specific" data set. This compiled data set can be compared with any compiled "control group" data set, enabling the identification of statistically significant changes for biomarker definition.
The main purpose of this "human urinary peptidome database" is to serve as a universal platform for the definition and validation of biomarkers for a variety of diseases and (patho)physiological changes. For the selection of CKD-specific biomarkers, data from individual samples were compiled (21, 27) and grouped according to patients' clinical profiles (diagnostic group). These groups included healthy subjects (n = 379) and patients with various biopsy-proven kidney diseases (n = 230) (for details, see Tables I and II). All peptides were statistically analyzed with correction for multiple testing (51), which yielded 634 statistically significant peptides. To reduce this large number of variables, we retained only the subset of these peptides for which sequence information was available. This strategy resulted in a panel of 273 peptides, listed in supplemental Table I. Subsequently, an SVM classification model was generated based on these peptides to distinguish between healthy subjects and individuals with biopsy-proven kidney diseases (Fig. 1). In the training set, a sensitivity of 98.7% and a specificity of 100% were observed (for ROC curve analysis, see also Fig. 2, "Training Set"). To avoid any bias introduced by the use of samples from only one clinical center, samples from patients with different diseases or (patho)physiological conditions collected at different centers were included in this training set.
In agreement with the recently published guidelines for clinical proteomics (52), we reproduced this biomarker pattern in a multicenter prospective study using an independent blinded cohort (test set) of 144 individuals, including patients with different kidney diseases (n = 110) and controls (n = 34). Upon unblinding, all controls and 94 patients with CKD were correctly classified, resulting in a sensitivity of 85.5% (95% CI, 77.5-91.4) and a specificity of 100% (95% CI, 89.6-100.0) (see Fig. 2, "Test Set," and supplemental Table IIa, "prospective study").
[Table caption, beginning lost in extraction] … of all patients used in this study. The first part shows the clinical data of patients with different diseases (see column "Group"), who represent the training set. The total number (n) of patients, gender ratio (male/female), age, serum (S) creatinine, and proteinuria status are shown. All patients included in the test set are listed in the second part. Patients representing further controls are listed in the last part of the table.

CKD is frequently a complication of underlying pathologies such as diabetes, hypertension, or chronic viral infection. We therefore examined the specificity of the identified markers using further "disease controls" present in the human urinary peptidome database. This test set consisted of patients with diabetes mellitus type I or II, untreated HIV patients, patients with benign prostatic hyperplasia (BPH), and patients with arterial hypertension. We also included patients with prior MCD who were in complete remission induced by treatment with glucocorticoids. None of these patients showed evidence of renal damage based on clinical history, serum creatinine, or urinary protein levels (see Table I). The results of classification with the CKD model are shown in supplemental Table IIb, "additional controls." None of the 29 patients with HIV, one of the 22 patients with diabetes mellitus type II, one of the 36 patients with diabetes mellitus type I, none of the 34 patients with BPH, one of the 13 patients with arterial hypertension, and none of the five patients with MCD in remission were classified as having CKD, resulting in an overall specificity of 97.8% (136 of 139).
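The overall specificity is the pooled fraction of additional controls that were not classified as having CKD. The check below recomputes it from the per-group counts given above. No data are added, and the dictionary layout is only an illustrative choice.

```python
# (number of patients, number classified as having CKD) per additional control group,
# taken from the counts in the text.
ADDITIONAL_CONTROLS = {
    "HIV": (29, 0),
    "diabetes mellitus type II": (22, 1),
    "diabetes mellitus type I": (36, 1),
    "BPH": (34, 0),
    "arterial hypertension": (13, 1),
    "MCD in remission": (5, 0),
}


def pooled_specificity(groups):
    """True negatives divided by all negatives, pooled over the groups."""
    total = sum(n for n, _ in groups.values())
    false_positives = sum(fp for _, fp in groups.values())
    return (total - false_positives) / total, total, false_positives


spec, total, fp = pooled_specificity(ADDITIONAL_CONTROLS)
print(f"{total - fp}/{total} = {spec:.1%}")  # 136/139 = 97.8%
```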
To gauge the analytical variability of the CE-MS-based CKD platform precisely, a set of platform validation experiments was performed, evaluating temperature stability, postpreparation stability, freeze/thaw stability, reproducibility, intermediate precision, and time course. In all tests, urine samples were processed as described under "Experimental Procedures." The stability analyses assessed sample stability at room temperature, at 4°C, and after repeated freeze/thaw cycles. Additionally, postpreparation stability of the prepared sample was investigated over 24 h at 4°C in an autosampler. Neither prolonged storage (6 h at room temperature or 24 h at 4°C) nor repeated freeze/thaw cycles yielded significant differences in standard deviations (see Fig. 3A). Furthermore, for each patient, the SVM scores obtained in all experiments led to a consistent classification result. All observed statistical spreads were below 11%, and the variances observed for postpreparation stability were in the range of 6%.
Reproducibility and intermediate precision analyses were used to determine the statistical variances of the SVM classification result 1) under the same operating conditions over a short period of time (reproducibility or intra-assay precision) and 2) on different days, by different operators, with different devices, over a long period of time (intermediate precision or within-laboratory precision). As depicted in Fig. 3B, the relative intra-assay standard deviation was below 7% for all tested samples. As expected, the relative interassay standard deviation was higher (p = 0.005) but still below 10%. Intermediate precision testing included different lot numbers of buffers, solvents, and chemicals. Both devices underwent biannual maintenance service.
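The precision figures above are relative standard deviations (coefficients of variation). The sketch assumes they are computed on replicate SVM scores of one sample as the sample standard deviation divided by the absolute mean. The replicate scores are hypothetical; the 7% and 10% limits are the ones quoted in the text.

```python
from statistics import mean, stdev


def relative_sd_percent(values):
    """Sample standard deviation as a percentage of the absolute mean."""
    m = mean(values)
    if m == 0:
        raise ValueError("relative SD is undefined for a zero mean")
    return 100.0 * stdev(values) / abs(m)


# Hypothetical replicate SVM scores for one urine sample.
intra_assay = [1.02, 0.98, 1.00, 1.01, 0.99]  # same operating conditions, short period
inter_assay = [1.05, 0.95, 1.00, 1.08, 0.92]  # different days, operators, devices

print(f"intra-assay RSD: {relative_sd_percent(intra_assay):.1f}% (limit quoted: 7%)")
print(f"inter-assay RSD: {relative_sd_percent(inter_assay):.1f}% (limit quoted: 10%)")
```

One caveat of this choice: if the classification threshold lies at a score of zero, samples scoring close to it have a small mean, so their relative SD inflates even when absolute scatter is small.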
For the time course experiments (see Fig. 3C), statistical variances of the CKD diagnosis were determined in individuals sampled over the course of a month. The variances were in the range of the intermediate precision, suggesting stable expression of the CKD pattern and, thus, small biological variance over time.
We aimed to identify as many peptides of the human urinary peptidome database as possible by applying state-of-the-art top-down MS/MS. Sequence information was obtained for 444 of the 5,010 peptides constituting the database. As described previously (42,49), the migration time in CE depends on peptide mass and on the charge state at pH 2, which equals the number of free amino groups (the N terminus plus basic amino acids). Therefore, CE separation is not a prerequisite for MS/MS sequencing, because the number of basic amino acids together with the calculated mass serves to assign a putative sequence to a signal in the CE-MS spectrum. Identification and annotation carry some uncertainty. This uncertainty appeared to be very low, however, because all data generated by CE-MS/MS were confirmed by LC-MS/MS identification, which in turn confirmed the subsequent annotation of the CE-MS data. Furthermore, 107 of the 273 potential CKD biomarkers were also among the features detected in the FT-ICR data (see supplemental Table I, last two columns). Their masses differed from those calculated from their sequences by 0.5 ppm (±0.9 ppm). The sequenced markers for the diagnosis of CKD were fragments of different collagens, blood proteins (e.g. α1-antitrypsin, serum albumin, hemoglobin α chain, and fibrinogen α chain), and kidney-specific proteins (e.g. uromodulin, sodium/potassium-transporting ATPase γ chain, and membrane-associated progesterone receptor component 1) as well as fragments of various secreted proteins (see Table III and supplemental Table I). As depicted in Fig. 1 and supplemental Table I, fragments of serum proteins increased, whereas those of most collagens decreased in CKD.

DISCUSSION

Here, we report the application of CE-MS for the analysis of the human urinary peptidome and demonstrate its utility for the definition of biomarkers of CKD in general.
These biomarkers apparently reflect primary pathogenetic changes as well as the response to certain disease processes. Hence, their usefulness extends beyond diseases of the urogenital tract, and the approach may be applicable to diseases with systemic manifestations. Whereas genetic analysis can predict the risk of a disease, proteomics, with its potential to monitor dynamic processes, may show more clearly at which point in time the risk manifests itself as disease and may also facilitate monitoring of the response to therapy. Thus, these methods are complementary in the therapeutic approach to an individual patient (29,53).
Recently, several groups have reported sequencing of an array of urinary proteins (15,54). Although these data sets impressively encompass many proteins (and potentially insightful information), clinically applicable information, e.g. for the definition of biomarkers, is completely missing. Moreover, all of these studies used tryptic digests of urinary proteins. The sequences of these peptides allowed the investigators to tentatively assign a particular protein to each obtained sequence with variable confidence. Unfortunately, because of the in vitro manipulation of samples by digestion, it is not possible to determine which of these peptides are natural constituents of human urine. The information essentially required in clinical proteomics, however, is restricted to naturally occurring urinary peptides. Furthermore, where biomarkers must be defined with a high level of confidence, information on their relative abundance is critically required.
We therefore sought to obtain exactly this information. The naturally occurring peptides are defined by molecular mass and migration time, and their relative abundance is defined by ion counting, normalized using "internal standards" (27,33). These internal standards are peptides that are present in almost every sample and whose ion counts do not change significantly across all samples and disease groups investigated to date. This approach does not allow unambiguous identification, which can be achieved only by amino acid sequence analysis, but it does permit tentative identification based on exact mass and migration time. Sequencing was performed in a second step but was hindered by several obstacles associated with MS sequencing of naturally occurring peptides (tryptic digests cannot be used because connectivity to the original identification parameters would be lost (55)). The major obstacles are the suboptimal performance of proteomics search engines (such as MASCOT or the Open Mass Spectrometry Search Algorithm) for naturally occurring peptides (5,34) and the chemical nature of the peptides, which prevents successful sequencing (42). When sequences are assigned to the correct label (protein ID in our database), there is a small chance of error: two different peptides may be assigned to one label, or one peptide to two labels. At the same time, our MS/MS data indicate that not every peptide that may theoretically be present was in fact detected. The pool of identical peptides present in urine is generally limited, which reduces the challenge of correct assignment (because only a limited number of features is present). Furthermore, errors in assignment result in higher variability, so such peptides will likely be excluded as potential biomarkers.
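The two ingredients described above can be sketched together: relative abundance normalized by internal standards, and tentative identification from exact mass plus migration behavior. Several assumptions apply.

- A median-ratio scaling stands in for the normalization of Refs. 27 and 33, whose exact procedure is not given here.
- The charge at pH 2 is counted as the N terminus plus K, R and H. This follows the Results statement that the charge equals the number of free amino groups.
- That charge is assumed to have already been inferred from migration time.
- The 5 ppm tolerance, the peptide names, and the lowercase "p" hydroxyproline notation are hypothetical.

```python
from statistics import median

# Monoisotopic residue masses (Da) of the standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
# Hypothetical notation for this sketch: lowercase "p" = hydroxyproline (P + O),
# which is frequent in collagen fragments.
RESIDUE_MASS["p"] = RESIDUE_MASS["P"] + 15.99491
WATER = 18.01056


def monoisotopic_mass(seq):
    """Neutral monoisotopic mass of a linear peptide (no other modifications)."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def charge_at_ph2(seq):
    """Free amino groups at pH 2: the N terminus plus K, R and H side chains."""
    return 1 + sum(seq.count(aa) for aa in "KRH")


def ppm_error(observed, theoretical):
    return (observed - theoretical) / theoretical * 1e6


def normalization_factor(sample, reference, standards):
    """Median of reference/sample ion-count ratios over detected internal standards."""
    ratios = [reference[p] / sample[p] for p in standards if sample.get(p, 0) > 0]
    if not ratios:
        raise ValueError("no internal standard detected in this sample")
    return median(ratios)


def normalize(sample, factor):
    return {peptide: count * factor for peptide, count in sample.items()}


def candidate_sequences(observed_mass, observed_charge, sequences, tol_ppm=5.0):
    """Sequences whose pH 2 charge matches and whose mass lies within tol_ppm."""
    return [
        s for s in sequences
        if charge_at_ph2(s) == observed_charge
        and abs(ppm_error(observed_mass, monoisotopic_mass(s))) <= tol_ppm
    ]


# Hypothetical ion counts: two internal standards and one feature of interest.
sample = {"std_a": 50.0, "std_b": 100.0, "feature_x": 10.0}
reference = {"std_a": 100.0, "std_b": 200.0}
factor = normalization_factor(sample, reference, ["std_a", "std_b"])
print(normalize(sample, factor)["feature_x"])  # -> 20.0

# Hypothetical feature: 1 ppm above the theoretical mass of "GPpGK", charge 2.
observed = monoisotopic_mass("GPpGK") * (1 + 1e-6)
print(candidate_sequences(observed, 2, ["GPpGK", "GPPGK", "GPpGA"]))  # -> ['GPpGK']
```

The charge filter removes candidates that a mass window alone could not exclude. This mirrors the text's point that basic residue count and calculated mass together narrow the assignment.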
CE-MS analysis of urine enables tentative identification of biomarkers for a variety of diseases of the kidney and the urogenital tract (21,22,24,25,27,56), although the high biological variability of peptides represents a serious methodological impediment. It therefore appears imperative to evaluate clinical disease conditions not on the basis of single peptide markers but on the basis of a biomarker set consisting of distinct, clearly defined discriminating molecules. A panel of biomarkers tolerates changes in individual analytes without jeopardizing diagnostic precision, so variability in a single analyte does not result in gross changes of the diagnostic result.
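The tolerance argument can be made concrete with a deliberately simplified, hypothetical panel score: the mean of per-marker standardized values. The published classifier is an SVM, not a plain average. This sketch only shows that, in an equally weighted linear score, an aberrant value in one of n markers shifts the score by 1/n of the aberration.

```python
def panel_score(z_values):
    """Hypothetical equally weighted panel score: mean of per-marker z-values."""
    return sum(z_values) / len(z_values)


def single_marker_shift(n_markers, aberration):
    """Change in the panel score when one marker deviates and all others are unchanged."""
    baseline = [0.5] * n_markers
    perturbed = baseline.copy()
    perturbed[0] += aberration
    return panel_score(perturbed) - panel_score(baseline)


# A 5-SD outlier in one analyte: decisive for a single-marker test,
# minor for a 273-marker panel (273 is the panel size named in the text).
print(single_marker_shift(1, 5.0))    # -> 5.0
print(single_marker_shift(273, 5.0))  # about 0.018
```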
In comparison with LC-MS, CE-MS appears to have several advantages for the analysis of urine samples. Of high relevance are its reproducibility, its insensitivity toward interfering compounds, and the fact that no "flow-through fraction" is generated. Small, highly charged peptides generally do not bind to the reversed phase columns typically used and are thus lost to LC-MS analysis, whereas they are detected by CE-MS. Large molecules (>5 kDa) frequently do not elute efficiently from the LC column, so their relative abundance cannot be assessed with sufficient accuracy. These molecules are, however, easily and reproducibly detectable by CE-MS. A further advantage is the relative insensitivity of CE-MS to precipitates that interfere substantially with LC separation. Moreover, such precipitates can be removed effectively with NaOH, which is applied routinely after each CE-MS analysis (enabling reproducible reconditioning). A similar approach is impossible in LC-MS. These and additional advantages are outlined in more detail in several recent reviews (3,34,57). Furthermore, CE-MS analysis can be viewed as a highly reproducible proteomics approach for the diagnosis of CKD. The analysis of samples stored under different conditions showed no increased variance, indicating that these urinary peptides are stable biomarkers. This may also reflect an advantage of using multiple biomarkers (a biomarker pattern) instead of a single biomarker. The variances observed for postpreparation stability suggested sufficient resistance of the CKD biomarkers to oxidation, postpreparation precipitation, and degradation. In summary, these data underline the analytical robustness of the CE-MS-based diagnostic platform. They provide the foundation for its routine clinical application in the diagnosis of CKD, as already demonstrated for other indications such as graft-versus-host disease (26).
We have previously shown that several peptides are differentially excreted in the urine of patients with different forms of CKD compared with healthy individuals (22,25). These pilot studies aimed at the differential diagnosis of certain types of CKD. They were compromised by low statistical power for marker definition because of small patient numbers and relatively poor mass resolution. Furthermore, the frequently high protein content of samples from patients with CKD caused precipitation during separation, even with CE-MS. This limited accuracy and made comparison of multiple data sets challenging or even impossible. The sample preparation protocol was therefore optimized by removing proteins above 25 kDa without significant loss of low molecular mass urinary components (23). This improved method adds an ultrafiltration step in the presence of SDS and urea to prevent interaction of proteins and peptides (23). It was used for all samples reported here and yielded more consistent data sets than those of the pilot studies mentioned above. Nevertheless, these changes in preanalytical handling clearly disqualify the older data sets from storage in the urinary database and from proteomic comparison at the highest level of data consistency.
Recently, using the new sample preparation protocol, we showed that urinary biomarkers enable differential diagnosis of specific chronic renal diseases (IgA nephropathy, diabetic nephropathy, and ANCA-associated vasculitis) with sufficient sensitivity and specificity in blinded data sets (28-30). Here, we present an application of the urinary peptidome database to the non-invasive diagnosis of CKD in general.
To assess the value of an optimized human urinary peptidome database, we used the data for the diagnosis of CKD as a representative example of urogenital tract diseases. Using this peptidome database, 273 biomarkers were defined. Given the complexity of the diseases and of the pathological processes involved, it is questionable whether a single biomarker can indicate not only a reliable diagnosis but also the stage of the pathological process and the prognosis. Combining multiple independent biomarkers into a diagnostic or predictive pattern (Fig. 1) may better address this problem. Many algorithms that use the information on multiple biomarkers have emerged (39), among them hierarchical decision tree-based classification methods (58), support vector machines (59), and Gaussian processes (60) (for more detailed information, see Ref. 39).
We hypothesized that combining the 273 CKD-specific biomarkers in a CKD-specific biomarker pattern would discriminate CKD patients from unaffected individuals more accurately. Some of these biomarkers might appear unnecessary for increasing sensitivity and specificity. However, an analysis investigating different algorithms for the identification of biomarkers and the establishment of classifiers demonstrated that increasing the number of biomarkers produced a more robust model. It therefore appears appropriate to include a large number of biomarkers in a multidimensional model, which is why we did not reduce the number of biomarkers in the model. Given the excellent performance of the classifier, it was not optimized further. Further optimization would be quite challenging and would likely require additional data from >1,000 subjects.
Another issue is the heterogeneity of the cohort: CKD is a conglomerate of diseases, and this is also reflected in the character of the biomarkers. Although certain biomarkers may serve well as classifiers on their own, combining them does not necessarily enhance classification performance. Conversely, a moderately well performing individual biomarker may in fact increase the performance of the classification algorithm, e.g. when it reflects disease in only a subset of samples. Therefore, we built an SVM classification model with 273 peptides to distinguish healthy subjects from individuals with biopsy-proven kidney disease in the training set and then applied this model to the independent test set. In this way, overfitting to (or "memorizing") the training set, and thus poor classification of blinded data sets, can be minimized. Classification of a blinded test cohort of 34 healthy controls and 110 patients with CKD resulted in a sensitivity of 85.5% and a specificity of 100.0%. We further examined the specificity of the identified markers using additional disease controls from the human urinary peptidome database. The overall specificity of 97.8% in this group reflects the relatively high worldwide prevalence of CKD (10-20%) (61,62), which is especially manifest in older patients with diabetes mellitus type II and arterial hypertension (see Table I).
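The train-then-apply protocol can be sketched in a few lines: fit a linear SVM on the training set only, then classify a held-out set with the frozen model. This is a pure-Python, Pegasos-style subgradient trainer chosen for self-containment. It is not the authors' implementation; the kernel and parameters of the published model are not stated in this section, and the 2-D points are hypothetical stand-ins for 273-dimensional peptide profiles.

```python
def train_linear_svm(X, y, lam=0.01, epochs=500):
    """Pegasos-style subgradient training of a linear SVM (hinge loss + L2 penalty).

    Labels are +1 (disease) or -1 (control). A constant feature 1.0 is appended
    to every sample so that the last weight acts as a (regularized) bias.
    """
    w = [0.0] * (len(X[0]) + 1)
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):  # deterministic pass over the training set
            t += 1
            x = list(xi) + [1.0]
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 penalty
            if margin < 1.0:  # hinge loss active: step toward this sample
                w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict(w, xi):
    score = sum(wj * xj for wj, xj in zip(w, list(xi) + [1.0]))
    return 1 if score >= 0 else -1


def sensitivity_specificity(predicted, actual):
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    tn = sum(p == -1 and a == -1 for p, a in zip(predicted, actual))
    return tp / sum(a == 1 for a in actual), tn / sum(a == -1 for a in actual)


# Hypothetical 2-D stand-ins for peptide profiles.
X_TRAIN = [(2, 2), (-2, -2), (3, 2), (-3, -2), (2, 3), (-2, -3)]
Y_TRAIN = [1, -1, 1, -1, 1, -1]
X_TEST = [(3, 1.5), (-1.5, -3), (2.5, 2.5), (-2.5, -2.5)]
Y_TEST = [1, -1, 1, -1]

w = train_linear_svm(X_TRAIN, Y_TRAIN)  # fit on the training set only
preds = [predict(w, x) for x in X_TEST]  # apply the frozen model to the test set
print(sensitivity_specificity(preds, Y_TEST))
```

The test samples never influence the weights. That separation is what allows test-set sensitivity and specificity to estimate performance on new patients.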
Most of the biomarker peptides in the urine are a product of proteolytic activity. Extracellular proteases may reflect the presence of the disease and its progression (63). Disease-induced changes in protease activities may be more readily recognized by focusing the analysis on the proteolytic fragments rather than on the specific protease itself (63). CE-MS analysis may be suitable to indicate changes in the tightly regulated activity of proteases and protease inhibitors by displaying relative abundance of their potential cleavage products.
Collagen fragments, especially fragments of the collagen α-1(I) chain, appear to be the major constituents of urinary peptides (see Table III). These peptides likely reflect normal physiological turnover of the extracellular matrix (64). In addition to CKD, collagen fragments are also the source of biomarkers identified for the diagnosis of coronary artery disease (CAD) (65,66). The main difference between the biomarkers for CAD and those for CKD is the direction of their regulation. Most of the collagen-derived biomarkers for CAD showed increased urinary excretion, whereas the collagen-derived biomarkers indicated CKD by their relative paucity (see supplemental Table I). This differential regulation may arise from differences in collagenase activity. Increased levels of circulating collagenases have been reported in patients with stable angiographic coronary atherosclerosis or intermittent claudication (65). Elevated matrix metalloprotease-9 activity has been found in unstable plaques, suggesting a crucial role in plaque rupture. In contrast, decreased collagenase activity was observed in CKD patients (29). Regardless of the primary etiology, severe CKD is characterized by tubular atrophy, interstitial fibrosis, and glomerulosclerosis. Hence, it has been assumed that diminished activity of matrix metalloproteases may be responsible for the accumulation of extracellular matrix proteins and collagens that typifies the fibrotic kidney (67). This histology may be interpreted as indicating increased levels of inhibitors of matrix metalloproteases, i.e. tissue inhibitors of metalloproteinases. Furthermore, accumulation of extracellular matrix, as predominantly observed in diabetic nephropathy, was recently shown to be associated with decreased urinary excretion of several specific collagen fragments (29).
These explanations are speculative but fit at a conceptual level. They are mentioned here to highlight how the human urinary database opens the door to further investigation, in which researchers of all disciplines are invited to participate. α1-Antitrypsin and its fragments have been reported to be upregulated in several types of CKD (57,63,68,69). Thus, increased urinary excretion of a fragment of this highly abundant plasma protein, together with fragments derived from serum albumin and fibrinogen, may reflect chronic renal damage. Uromodulin has been reported to be downregulated in diabetic nephropathy (29,70,71), and reduced excretion of specific uromodulin fragments has also been observed in other forms of CKD (30). Decreased urinary levels of these collagen- and uromodulin-derived peptide fragments may serve as indicators of (patho)physiological changes in CKD.
As we learn to better appreciate individual differences in the responses to therapy, objective methods to measure these changes will become of prime importance. This need will pave the way for proteomics-based personalized diagnosis that could be used to tailor therapy to an individual patient. Non-invasive urinary proteomics allows real-time monitoring, so therapy can be adjusted accordingly. This vision appears within reach, but its realization depends heavily on tools that allow a quick and robust comparison of the patient's proteomic profile with those of healthy controls and patients with other diseases. Thus, we posit that the urinary peptidome and its application in CKD as presented here are a major step forward in this direction. We anticipate that the availability of such tools in the future will significantly improve the options for patients with respect to diagnosis and therapy.
The present study with several hundred patients reveals the potential of distinct biomarker panels in the clinical diagnosis of complex chronic diseases. Thorough platform standardization is needed to address the intrinsic/biological variability of the human peptidome. The human urinary peptidome database presented here is an important step toward proteomics becoming a diagnostic tool in clinical settings, as recently demonstrated for graft-versus-host disease (26). Tables I-III and