A Proteomic Analysis of Human Bile*

We have carried out a comprehensive characterization of human bile to define the bile proteome. Our approach involved fractionation of bile by one-dimensional gel electrophoresis and lectin affinity chromatography followed by liquid chromatography tandem mass spectrometry. Overall, we identified 87 unique proteins, including several novel proteins as well as known proteins whose functions are unknown. A large majority of the identified proteins have not been previously described in bile. Using lectin affinity chromatography and enzymatic labeling of glycan-bearing asparagine residues with ¹⁸O, we identified a total of 33 glycosylation sites. The strategy described in this study should be generally applicable for a detailed proteomic analysis of most body fluids. In combination with “tagging” approaches for differential proteomics, our method could be used for identification of cancer biomarkers from any body fluid.

Approximately 7,500 new patients are diagnosed with biliary tract cancer each year in the United States, and nearly 4,500 patients (~65%) die from their disease (1). Once established, biliary tract cancers are notoriously challenging to diagnose and treat. At present, only those patients with a completely resectable cancer achieve a modest 5-year survival. Those with unresectable cancers have a poor prognosis. In general, the outcome for patients with biliary tract cancer at any site is disappointing, and neither radiation nor pre- or postoperative conventional chemotherapy significantly improves survival or quality of life. Therefore, identifying patients with early, potentially curable, biliary tract cancers offers the best chance for improving survival (2).
Currently, the sensitivity and specificity of laboratory tests for early detection of biliary tract cancers are less than optimal, and there is considerable difficulty in distinguishing malignant from benign causes of bile duct obstruction. For example, cytologic specimens from brush biopsies have a notorious propensity for yielding false-positives and false-negatives, with an unacceptable overall sensitivity in the range of only 33-60% (3-5). Cancer antigen (CA) 19-9 is widely used for serologic detection of cholangiocarcinoma but has a sensitivity of only 50-60% and a specificity of 80% (6,7). Similarly, detection of p53 and K-ras gene mutations in bile has a sensitivity of only 33% and a specificity of 87% (8). There is clearly a need to identify novel, highly sensitive, and specific biomarkers for fluid-based detection of biliary tract cancer. The development of a reliable, sensitive, and specific panel of fluid-based biomarkers will not only enable early diagnosis of cancer in individuals with a recognized risk factor for biliary tract cancer, but also provide a cost-effective alternative for noninvasive screening in populations where biliary tract cancer has a high incidence (e.g. American Indians and Hispanic communities).
Current proteomic technologies allow identification of disease-specific protein profiles. Changes that occur during the transformation of a healthy cell into a neoplastic cell can result in protein alterations including changes in abundance, protein modification, enzymatic activity, or subcellular localization. Identifying and understanding these changes is an underlying theme in cancer proteomics (9,10). One would expect that biomarkers for biliary tract cancers should be more readily identifiable in the bile because of higher local concentrations of proteins derived from the biliary tract. However, no comprehensive study targeted toward defining the baseline proteome of human bile fluid has yet been performed. Here we provide the first comprehensive proteomic study of human bile fluid using a liquid chromatography and tandem mass spectrometric (LC-MS/MS) approach. We have elucidated the proteome of bile fluid using multiple fractionation techniques and affinity enrichment methods to identify several proteins that have not been previously described in bile. In addition, we provide definitive evidence for a large number of N-linked glycosylation sites on these proteins using lectin affinity chromatography followed by ¹⁸O-labeling of the glycan attachment site by peptide N-glycosidase F (PNGaseF) treatment.

MATERIALS AND METHODS
Sample Preparation-Bile fluid was obtained by endoscopic retrograde cholangiopancreatography (ERCP) from a patient with cholangiocarcinoma. One milliliter of the unfractionated bile fluid was centrifuged at 16,000 × g for 10 min at 4°C. The partially cleared supernatant was then mixed with 250 μl of Cleanascite™HC (Ligo-Chem, Inc., Fairfield, NJ) followed by rotation of the sample for 1 h at 4°C. After incubation, the sample was centrifuged at 16,000 × g for 2 min to clear away the formed lipid-micelles, and the supernatant was transferred to a new tube. A pre-rinsed YM-3 centricon filter unit (molecular mass cut-off at 3 kDa) (Millipore, Bedford, MA) was loaded with the entire lipid-cleared sample and centrifuged at 6,500 × g until half of the sample volume had passed through the filter. MilliQ water was then added and the centrifugation repeated in order to reduce the salt concentration of the sample.
Lectin Affinity Chromatography-For the concanavalin A (Con A) affinity purification, 100 μl of concentrated bile fluid was mixed with 2× volume of Tris-buffered saline (TBS) buffer (0.05 M Tris-HCl, pH 7.1, 0.15 M NaCl) and 2× volume of a 50% slurry of Con A-agarose (Amersham Biosciences, Piscataway, NJ) followed by incubation at 4°C overnight under rotation. After overnight incubation, the Con A beads were washed twice in TBS buffer (all buffer was removed between washes to ensure minimal carry-over of albumin) and the bound protein was eluted in 2 × 100 μl of 100 mM methyl α-D-mannopyranoside for 10 min at room temperature. The wheat germ agglutinin (WGA) (Amersham Biosciences) purification was done as described for the Con A affinity purification except that 100 mM N-acetyl-D-glucosamine was used for elution.
Immunoglobulin Depletion-For purification involving Con A followed by protein A and G, the initial procedure was identical to the Con A-only purification, except that four times the amount of starting material was used and all subsequent steps were scaled up fourfold as well (elution was done using 2 × 250 μl). The eluted proteins were then incubated with 30 μl of a 50% slurry of protein A and G (Sigma, St. Louis, MO) for 1 h at 4°C under rotation.
FIG. 1. A schematic of the purification procedure and mass spectrometric analysis of human bile. Bile from a patient with cholangiocarcinoma was subjected to lipid removal followed by a 3-kDa size-exclusion filtration step as shown. The fraction obtained after this step was designated "unfractionated" bile and was the starting material for all subsequent experiments. The bottom part of the diagram shows an outline of the three types of experiments, indicated by A, B, and C, described in this study. Scheme A shows unfractionated bile separated by one-dimensional SDS-PAGE. The entire lane was excised into slices, followed by in-gel digestion with trypsin and PNGaseF, and finally identification of proteins by LC-MS/MS. Scheme B outlines two different types of experiments. Unfractionated bile was purified by lectin affinity chromatography, and the eluted proteins were separated by one-dimensional SDS-PAGE. Two identical affinity purifications are illustrated on the schematic of the gel. Lane 1 underwent the same protocol as described for Scheme A, whereas lane 2 was subjected to in-gel digestion with trypsin alone, extraction of the peptides, followed by labeling of N-linked glycosylation sites with PNGaseF in H₂¹⁸O. Scheme C illustrates a two-step fractionation strategy. The bile was first subjected to lectin affinity chromatography followed by a second step, which consisted of immunoglobulin depletion, by a mixture of protein A and G. The proteins were then separated by one-dimensional SDS-PAGE and analyzed by LC-MS/MS.
Gel-electrophoresis and LC-MS/MS-All fractions were subjected to SDS-PAGE, and the gel was subsequently silver-stained as previously described (11). Gel lanes were excised for each of the samples and divided into 28-32 sections depending on the complexity of the sample. All slices were in-gel digested with sequencing grade trypsin (Promega, Madison, WI) and PNGaseF (0.1 U/section) (Sigma) according to Küster et al. (12). 
Extracted peptides were dried down to ~10 μl and analyzed by LC-MS/MS on a Micromass Q-TOF API-US mass spectrometer (Manchester, United Kingdom).
¹⁸O-labeling of N-linked Glycosylation Sites Using PNGaseF-For the H₂¹⁸O-labeling of the samples by PNGaseF, the in-gel digestion was performed without PNGaseF. Extracted peptides were dried to completeness and rehydrated in 10 μl of H₂¹⁸O containing 0.1 U PNGaseF followed by overnight incubation at 37°C. After incubation, the samples were analyzed by LC-MS/MS.
Liquid Chromatography and Mass Spectrometric Analysis-An Agilent 1100 series system (Agilent Technologies, Palo Alto, CA) was used for the chromatographic separation of the peptides. The peptides were loaded onto a pre-column packed with 10-μm C18 ODS-A (YMC, Ltd., Kyoto, Japan) and washed with 95% mobile phase A (100% H₂O with 0.4% acetic acid and 0.005% heptafluorobutyric acid v/v) and 5% mobile phase B (90% acetonitrile with 10% water, 0.4% acetic acid and 0.005% heptafluorobutyric acid). Subsequently, the peptides were eluted over 34 min with a flow of 300 nl/min using a linear gradient of 10-40% of mobile phase B onto an analytical column packed with 5-μm Vydac C18 material. The eluted peptides were analyzed by a Micromass Q-TOF API-US equipped with an ion source designed at Proxeon Biosystems (Odense, Denmark). The automated data acquisition and generation of peak list files were done using MassLynx (version 4.0; Micromass). Data-dependent acquisition parameters for the scan cycle were set as follows. TOF survey scan: 0.9 s (from 350 to 1,500 m/z); MS/MS scans: 0.9 s (for up to three selected precursor ions) (from 50 to 2,000 m/z). The interscan time is 0.1 s for our instrument. For each survey scan, the three most intense ions in the spectrum were picked for MS/MS analysis, unless they appeared on the dynamic exclusion list (see below). The MS/MS to MS switch criterion was set to intensity falling below 3 counts/s. Precursor ions selected for a given scan cycle were excluded for the next 180 s of the LC-MS/MS run. The collision energy was determined by charge state recognition for +2, +3, and +4 charged precursor ions.
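The data-dependent acquisition settings above can be illustrated with a short Python sketch. This is not the MassLynx acquisition logic; it is a simplified model under stated assumptions. The scan times, the top-three rule, and the 180-s exclusion window are taken from the text, whereas the m/z matching tolerance, the choice to fill the cycle with the next most intense non-excluded ions, the assumption that one interscan delay follows every scan, and all function and class names are hypothetical. Charge-state recognition and the early MS/MS-to-MS switch are not modeled, so the cycle time is an upper bound.

```python
from dataclasses import dataclass, field

SURVEY_SCAN_S = 0.9   # TOF survey scan duration (from the text)
MSMS_SCAN_S = 0.9     # duration of each MS/MS scan (from the text)
INTERSCAN_S = 0.1     # interscan time (from the text)
TOP_N = 3             # precursors picked per survey scan (from the text)
EXCLUSION_S = 180.0   # dynamic exclusion window (from the text)
MZ_TOLERANCE = 0.5    # ASSUMPTION: window used to match excluded m/z values


def max_cycle_time(n_msms=TOP_N):
    """Upper bound on one scan cycle, assuming one interscan delay
    after the survey scan and after each MS/MS scan."""
    return (SURVEY_SCAN_S + INTERSCAN_S) + n_msms * (MSMS_SCAN_S + INTERSCAN_S)


@dataclass
class ExclusionList:
    """Hypothetical dynamic exclusion list of (m/z, expiry time) pairs."""
    entries: list = field(default_factory=list)

    def is_excluded(self, mz, now):
        return any(abs(mz - e_mz) <= MZ_TOLERANCE and now < expiry
                   for e_mz, expiry in self.entries)

    def add(self, mz, now):
        self.entries.append((mz, now + EXCLUSION_S))


def pick_precursors(survey, now, exclusion):
    """Pick up to TOP_N precursors from a survey scan.

    survey is a list of (m/z, intensity) pairs. Ions currently on the
    exclusion list are skipped; picked ions are added to it.
    """
    candidates = [(mz, inten) for mz, inten in survey
                  if not exclusion.is_excluded(mz, now)]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    picked = [mz for mz, _ in candidates[:TOP_N]]
    for mz in picked:
        exclusion.add(mz, now)
    return picked
```

Under these assumptions a full cycle with three MS/MS scans lasts at most about 4 s, so an abundant peptide eluting over tens of seconds would be reselected many times without the exclusion list; excluding it for 180 s frees those cycles for less intense co-eluting ions.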
Settings for Peak List File Generation and Database Searches-The peak list files were generated with MassLynx 4.0 using the following settings. Background subtraction: polynomial order: 0; below curve: 40%. Smoothing: smooth windows (channels): 4.00; number of smooths: 2; smooth mode: Savitzky Golay. Centroid: min. peak width at half height: 2; centroid mode: centroid top, 80%. Mascot version 1.9 installed on a Linux cluster was used to search the data using the MS/MS ion search mode for the generated peak list files. Mascot was used for database searching using the following parameters.

RESULTS AND DISCUSSION
Sample Preparation and Fractionation-Our initial strategy was to identify protein components in human bile by one-dimensional SDS-PAGE of crude bile followed by in-gel trypsin digestion and subsequent identification of the proteins by LC-MS/MS. However, the first attempts at separating crude bile on a one-dimensional gel revealed that the bile fluid, obtained by ERCP from a patient with cholangiocarcinoma, contained high amounts of lipids and bile salts, which interfered with our analysis. To circumvent this, we decided to perform a crude purification of the bile fluid for removal of these impurities (Fig. 1). The bile fluid was first subjected to a lipid removal step using Cleanascite™HC, a nonionic adsorbent used to precipitate lipids, fat droplets, cell debris, and mucinous impurities. Following lipid removal, we subjected the sample to a 3-kDa size-exclusion filtration step to remove salts and other small molecular mass components. As shown in Fig. 2, these two steps provided us with a satisfactory protein mixture that could be separated without smearing of gel bands as previously observed. The crude purified bile is referred to as "unfractionated bile" for the remainder of the article.
Identification of Protein Constituents in Bile by One-dimensional Gel Electrophoresis and LC-MS/MS-The unfractionated bile was first concentrated and purified further using a 3-kDa size-exclusion filtration step and subjected to SDS-PAGE followed by silver staining (Fig. 2). The entire lane was divided into 30 gel slices that were digested with trypsin and PNGaseF followed by LC-MS/MS. PNGaseF was added during the in-gel digestion procedure to remove N-linked glycans, as body fluids such as bile are expected to be rich in glycoproteins. PNGaseF is a glycopeptidase that cleaves N-linked high mannose, hybrid, and complex-type glycans at the linkage between the core structure and the anchoring asparagine, releasing the entire oligosaccharide and resulting in deamidation of the asparagine to an aspartic acid residue. The enzymatic cleavage of N-linked glycans serves two purposes. First, it results in a homogenous peptide population because N-linked glycans display a high degree of heterogeneity with regard to the occupancy of the site of attachment and of the sugar moieties found in individual glycan structures (13,14). Second, electrospray experiments involving glycopeptides are difficult to perform because glycosylated peptides do not ionize as easily as their nonglycosylated counterparts. These properties of glycans and glycopeptides make analysis by electrospray mass spectrometry more tedious, unless the sugar chains are removed prior to analysis. The result of the LC-MS/MS experiment is summarized in Table I. A total of 59 unique proteins was identified, many of which have not previously been reported in bile. Table I is divided into four columns. 
The first column gives the accession number in the reference sequence (RefSeq) database, the second column lists the protein name, the third column classifies the identified proteins according to their primary function, and the last column indicates the gel slice in which the peptide used to unambiguously assign the protein was found. Fig. 3A shows a typical MS/MS spectrum of a peptide from a protein found in Table I from unfractionated bile.
Data Processing Pipeline-The 30 peak list files generated from the LC-MS/MS runs corresponding to the 30 gel slices of unfractionated bile were merged into a single peak list file to obtain better statistics for proteins identified in several gel bands (usually neighboring gel bands) and to simplify the search procedure and data analysis. Database searching was done using the Mascot search engine (15). A combination of computer scoring and "human criteria" was employed in the screening of the data. First, the data were searched against the RefSeq database with tryptic constraints, and a base list of proteins was generated on which further analysis was performed. This initial list of proteins was generated by the following procedure:
1. Only proteins containing at least one unique peptide with a peptide Mascot score greater than 20 were considered (we refer to a peptide as unique to a specific protein if its sequence has not previously been used to assign a different protein).
2. The highest scoring peptide for each of the protein entries generated in step 1 was manually inspected and interpreted to confirm the identity of the peptide. If the spectrum could match a different sequence better than the assigned one, if it had poor ion statistics, or if no other good spectra pointed to the same protein, the hit was discarded. In addition, the inspected peptide match was required to have a length of at least 8 amino acids and a sequence tag of at least three amino acids, preferably from a good y-ion series. Also, the y1 ion and the a2-b2 pair, if present, had to be consistent with the identified peptide sequence. If the sequence tag was not composed of y-ions and if no other peptides matched the same protein, the hit was discarded unless the spectrum was of good quality and most peaks could be explained.
3. If a protein has multiple isoforms or multiple entries in the databases, we specify only the major form of the protein unless a specific peptide points to a region of the protein that exists in only one of the isoforms.
4. If multiple peptides that matched the same protein are not from the same vicinity on the gel (more than two gel bands apart), extra care is taken to confirm those entries.
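The automatable part of this screening procedure can be sketched in Python as follows. This is an illustrative sketch, not the software used in the study: the thresholds (Mascot score greater than 20, at least 8 residues, sequence tag of at least three residues, more than two gel bands apart) come from the text, while the `PeptideHit` record, the function names, and the simplification that a peptide matching more than one protein is treated as non-unique are assumptions. The manual spectrum inspection described in step 2 cannot be reproduced in code; here the sequence-tag length is assumed to be entered by the person inspecting the spectrum.

```python
from dataclasses import dataclass

MIN_MASCOT_SCORE = 20    # peptide score must be greater than this (from the text)
MIN_PEPTIDE_LENGTH = 8   # minimum peptide length in residues (from the text)
MIN_SEQUENCE_TAG = 3     # minimum sequence tag length (from the text)
MAX_SLICE_GAP = 2        # "more than two gel bands" triggers extra care


@dataclass(frozen=True)
class PeptideHit:
    """Hypothetical record for one peptide-spectrum match."""
    protein: str         # protein accession
    sequence: str        # peptide sequence
    mascot_score: float  # peptide Mascot score
    tag_length: int      # sequence tag length noted during manual inspection
    gel_slice: int       # gel slice in which the peptide was found


def candidate_proteins(hits):
    """Return {protein: top-scoring peptide} for proteins passing steps 1 and 2."""
    # ASSUMPTION: a peptide is unique if it matches exactly one protein.
    owners = {}
    for hit in hits:
        owners.setdefault(hit.sequence, set()).add(hit.protein)

    # Step 1: unique peptides scoring above the threshold, best one per protein.
    best = {}
    for hit in hits:
        if len(owners[hit.sequence]) != 1 or hit.mascot_score <= MIN_MASCOT_SCORE:
            continue
        current = best.get(hit.protein)
        if current is None or hit.mascot_score > current.mascot_score:
            best[hit.protein] = hit

    # Step 2 (automatable part): length and sequence-tag checks on the top peptide.
    return {protein: hit for protein, hit in best.items()
            if len(hit.sequence) >= MIN_PEPTIDE_LENGTH
            and hit.tag_length >= MIN_SEQUENCE_TAG}


def needs_extra_care(hits_for_protein):
    """Step 4: flag proteins whose peptides lie more than two gel bands apart."""
    slices = [hit.gel_slice for hit in hits_for_protein]
    return max(slices) - min(slices) > MAX_SLICE_GAP
```

A filter of this kind only narrows the list; in the procedure described above, every surviving entry still depends on manual interpretation of its best spectrum.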
To identify proteins that are not present in the RefSeq database, the peak list file was searched against the nonredundant (nr) database at NCBI and the results were compared with the list generated by searching the RefSeq database. New entries retrieved from the nr database were tested against the same criteria as described above for identification purposes. If a spectrum that had been used to confirm an entry in the RefSeq-derived dataset fitted an entry better in the search against the nr database (e.g. a different splice variant), the original hit was removed. Because many secreted proteins are heavily processed both within the cells (e.g. cleavage of signal peptide) and in the context of body fluids (e.g. protease processing), a high abundance of nontryptic peptides is likely to be present in the tryptic digest of our gel bands. Because restricting the search to tryptic peptides, which minimizes the number of false-positive hits, excludes such peptides, we are potentially missing a large number of peptides with good fragmentation spectra. The peak list was, therefore, subsequently searched against the RefSeq database with semitryptic constraints, which allow the peptides in the database to contain at most one nontryptic end. Analogous to the previous iteration, this list was compared against the combined list derived from the tryptic RefSeq and nr database searches.
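The distinction between tryptic and semitryptic constraints can be made concrete with a small Python sketch. It is illustrative only and does not reproduce Mascot's enzyme definitions: it assumes the common rule that trypsin cleaves C-terminal to lysine or arginine except when the next residue is proline, treats protein termini as valid ends, ignores missed cleavages, and uses a made-up protein sequence and hypothetical function names.

```python
def is_tryptic_site(protein, pos):
    """True if a peptide boundary at index pos is consistent with trypsin.

    pos is the index of the first residue after the boundary. Protein
    termini count as valid ends. ASSUMPTION: cleavage after K or R,
    but not when the following residue is P.
    """
    if pos == 0 or pos == len(protein):
        return True
    return protein[pos - 1] in "KR" and protein[pos] != "P"


def classify_peptide(protein, peptide):
    """Classify a peptide as 'tryptic', 'semitryptic', or 'nontryptic'."""
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein")
    end = start + len(peptide)
    n_term_ok = is_tryptic_site(protein, start)
    c_term_ok = is_tryptic_site(protein, end)
    if n_term_ok and c_term_ok:
        return "tryptic"
    if n_term_ok or c_term_ok:
        return "semitryptic"  # exactly one nontryptic end
    return "nontryptic"


# Made-up sequence, for illustration only.
PROTEIN = "MKTAYIAKQRQISFVKSHFSR"
print(classify_peptide(PROTEIN, "TAYIAK"))  # both ends follow K/R
print(classify_peptide(PROTEIN, "AYIAK"))   # N terminus trimmed by one residue
```

A peptide whose N terminus has been trimmed by an aminopeptidase, like the second example, would be invisible to a strictly tryptic search but is recovered under semitryptic constraints, at the cost of a larger search space.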
Identification of Proteins in Bile After Enrichment by Lectin Affinity Chromatography-A common problem in proteomic approaches toward defining the constituents of human body fluids is the presence of high concentrations of albumin, which due to dynamic range issues prevents the identification of proteins present in low abundance (16). In keeping with this observation, we found albumin to be present in all but one fraction of unfractionated bile. To enable us to identify proteins otherwise undetectable due to the high abundance of albumin in bile, we chose lectin affinity chromatography, which allows one to remove albumin and to enrich for glycosylated proteins. We decided to use enrichment by two lectins as a complementary method for identifying proteins in bile.
Lectin affinity chromatography was performed using two types of lectins with different binding specificities, Con A and WGA. Con A binds to α-D-mannopyranosyl, α-D-glucopyranosyl, and sugars with similar steric properties, implying a preference for high mannose type of N-linked glycans (and to a lesser extent for hybrid type of sugars), with complex types of N-linked glycans binding weakly to this type of lectin (17). WGA binds to oligomers (and with lesser affinity to monomers) of β(1,4)-linked N-acetylglucosamine and to a lesser extent sialic acid residues (18). WGA, therefore, has preference for binding various hybrid and complex sugars, and to a lesser degree, if the glycan is extensively trimmed, it also binds to the core structure of N-linked glycans consisting of β(1,4)-linked N-acetylglucosamine bound to the sugar-carrying asparagine.
The results of the affinity chromatography using these two lectins are shown in Fig. 4. The elution profile from the two lectins shows that less material was bound to WGA when compared with Con A. This is in agreement with the broader specificity and the higher affinity of Con A. Compared with the lane containing unfractionated bile (see Fig. 4), the profiles of the two lectin-based affinity purifications are markedly different, emphasizing the ability of this method to enrich for a subset of proteins that would otherwise be difficult to identify in the presence of more-abundant proteins in bile. Notably, the lectin affinity purification practically eliminated the albumin band found in unfractionated bile, which is consistent with albumin not being N-glycosylated (19). In order to identify eluted proteins from the lectin-based affinity purification, both lanes were excised into smaller gel slices and subjected to in-gel digestion with PNGaseF and trypsin followed by LC-MS/MS as described above for the analysis of unfractionated bile. Also, the same data-processing pipeline used for the unfractionated bile LC-MS/MS runs was applied to the data from the lectin affinity purification. As shown in Tables II and III, a total of 59 and 32 proteins were identified from the Con A and WGA affinity purification, respectively. Fig. 5A presents the overlap of proteins identified by the two different lectin-based affinity purification methods in the form of a Venn diagram, while Fig. 5B displays the overlap between the proteins identified from the unfractionated bile, the Con A purification, and WGA purification. In total, we identified the presence of 87 unique proteins in bile.
Identification of N-linked Glycosylation Sites by PNGaseF Treatment and ¹⁸O-labeling-Glycosylation is the most common posttranslational modification found in mammalian cell systems, and the site of attachment of the glycan, as well as the structure and composition of the carbohydrate moieties, has long been recognized as a biologically important characteristic. Two types of glycosylation events normally occur on proteins: N-linked glycosylation, where the carbohydrate chain is attached to asparagine residues, and O-linked glycosylation, where the carbohydrate chain is attached to serine or threonine residues. Proteins harboring N-linked glycosylation are commonly destined for secretion, and many N-linked glycosylated proteins are thus found in high abundance in extracellular environments.
As mentioned, PNGaseF treatment of the extracted peptides provides a characteristic tag on the peptide in the form of a deamidation event taking place on the glycan-linked asparagine during cleavage of the sugar by the enzyme, resulting in conversion of the asparagine residue to an aspartic acid residue with a concomitant increase of 1 Da in the mass of the amino acid residue. However, deamidation can occur spontaneously under certain conditions (20,21), and because such an event cannot be distinguished from a deamidation event originating from the enzymatic cleavage by PNGaseF, a definitive conclusion about N-linked glycosylation cannot be drawn. To circumvent this problem, we decided to repeat the Con A affinity purification and to perform the PNGaseF cleavage step in the presence of H₂¹⁸O (12, 22, 23). The advantage of doing the enzymatic cleavage in H₂¹⁸O is illustrated in Fig. 6. During the deamidation process, the asparagine is converted into an aspartic acid with an ¹⁸O stably incorporated, which gives rise to a mass increase of 3 Da instead of 1 Da. This eliminates the false-positives that arise from the spontaneously occurring deamidation events. Table IV shows a list of the identified glycosylation sites found in the Con A affinity-purified samples. Although a majority of the sites have previously been identified, some of the sites that we have identified have not been reported previously, emphasizing the continued need for detailed analysis of glycoproteins.
Immunoglobulin Depletion-During the course of analysis of our MS data, it became evident that a large proportion of the best quality MS/MS spectra originated from immunoglobulins. This was not unexpected, as most immunoglobulins found in body fluids are known to be glycosylated. Therefore, we decided to try to deplete our Con A affinity-purified proteins by using a combination of protein A and G, both of which bind heavy chains of various immunoglobulins and their subtypes. 
We anticipated that the automated data-dependent acquisition process, which governs the peak selection destined for sequencing by the mass spectrometer, would enable us to sequence low-abundance species that might be "squelched" in the nondepleted samples due to dynamic range issues. The immunoglobulin depletion experiment was carried out as for the "normal" lectin affinity purification, except for the one added step in which the samples were incubated with the mixture of protein A and G. Table V lists the glycosylation sites found in the depletion experiment. Again, we were able to identify a few novel glycosylation sites, although the large majority of the glycosylation sites have been previously reported.
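The 1-Da versus 3-Da distinction that underlies the glycosylation site assignments in both the Con A and the immunoglobulin-depleted samples can be verified with a short calculation. The Python sketch below is illustrative and not part of the original analysis: it computes monoisotopic residue masses from standard atomic masses and classifies an observed residue mass (for example, the difference between two adjacent y-ions). The 0.05-Da tolerance and the function names are assumptions.

```python
# Standard monoisotopic atomic masses (Da).
H = 1.00782503
C = 12.0
N = 14.0030740
O16 = 15.9949146
O18 = 17.9991596

# Residue masses (amino acid minus water).
ASN = 4 * C + 6 * H + 2 * N + 2 * O16        # asparagine, C4H6N2O2
ASP_16O = 4 * C + 5 * H + N + 3 * O16        # aspartic acid, C4H5NO3
ASP_18O = 4 * C + 5 * H + N + 2 * O16 + O18  # aspartic acid with one 18O

TOLERANCE = 0.05  # ASSUMPTION: mass tolerance in Da for matching a residue


def classify_residue_mass(observed):
    """Match an observed residue mass to Asn or one of its deamidated forms."""
    candidates = {
        "Asn (unmodified)": ASN,
        "Asp (16O; spontaneous deamidation or PNGaseF in normal water)": ASP_16O,
        "Asp (18O; PNGaseF cleavage in 18O-water)": ASP_18O,
    }
    for label, mass in candidates.items():
        if abs(observed - mass) <= TOLERANCE:
            return label
    return None


print(f"Asn residue:       {ASN:.4f} Da")
print(f"Asp residue (16O): {ASP_16O:.4f} Da  (shift {ASP_16O - ASN:+.4f})")
print(f"Asp residue (18O): {ASP_18O:.4f} Da  (shift {ASP_18O - ASN:+.4f})")
```

The shifts are approximately +0.98 Da and +2.99 Da, which correspond to the nominal 1-Da and 3-Da increases described above, and the ¹⁸O-labeled aspartic acid residue has a nominal mass of 117 Da.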
Discussion of Proteins Identified in Human Bile Fluid-A large majority of proteins identified in our analysis include those that are synthesized by hepatocytes and thus would be somewhat expected in bile fluid. Such proteins include transport proteins (ceruloplasmin, transferrin, transthyretin (prealbumin), α2-macroglobulin, and lactoferrin), enzymes (γ-glutamyltransferase and adenosine deaminase), proteins in the coagulation cascade (fibrinogen and antithrombin), and epithelial glycoproteins, such as the carcinoembryonic antigen-related cell adhesion molecule (CEACAM) 1. In fact, CEACAM1 is also known as biliary glycoprotein, as it was first isolated from bile fluid (24). Thus, this subset of proteins could be referred to as the "physiologic proteome" of bile fluid. As the bile sample we analyzed was obtained by ERCP, the presence of multiple pancreatic enzymes (e.g. pancreatic carboxypeptidase, pancreatic amylase, cationic trypsinogen, pancreatic lipase, and pancreatic elastase) was also not unexpected in the list of proteins identified.
In addition to hepatic and pancreatic proteins, we also identified several known "cancer-associated" proteins, perhaps reflecting the fact that the bile fluid was obtained from a patient harboring a cholangiocarcinoma. For example, we identified two epithelial apomucins, mucin 16 (also known as CA125 ovarian cancer antigen) and mucin 2 (MUC2) in the bile specimen. CA125 is a cell-surface glycoprotein that is widely used as a serum tumor marker for gastrointestinal and gynecological cancers, both for diagnosis and for monitoring recurrence (25,26). CA125 levels are markedly elevated in both serum and bile in patients with cholangiocarcinomas (27-29). Along the same lines, the epithelial mucin MUC2 is expressed at minimal levels in the normal biliary epithelium, with MUC1 being the principal biliary mucin during development, switching to "adult-type" MUC3 expression after birth (30). However, expression of MUC2 is elevated in many pathologic conditions of the biliary tree, including chronic inflammatory states and cancer (31,32). Thus, our ability to detect two apomucins previously reported as differentially overexpressed in cholangiocarcinomas affirms the validity of our mass spectrometry-based approach.
We also identified additional cancer-associated proteins that have not been reported previously in the context of either cholangiocarcinomas, or even in bile fluid per se. These included three proteins: Mac-2-binding protein, lipocalin 2 (oncogene 24p3), and deleted in malignant brain tumors 1 (DMBT1). Mac-2-binding protein is a secreted glycoprotein that binds galectins, β1 integrins, collagens, and fibronectin and has some relevance in cell-cell and cell-extracellular matrix adhesion (33,34). Elevated serum levels of Mac-2-binding protein are often observed in patients with different types of solid tumors, including breast, ovarian, lung, and colorectal cancers, and are usually associated with a poor survival and metastatic spread in these malignancies (35-39). Low levels of Mac-2-binding protein are normally present in serum, semen, saliva, urine, tears, and in breast milk (33); this is the first report identifying this protein in bile fluid. Mac-2-binding protein was detectable in all three fractions (unfractionated bile, Con A, and WGA), raising the possibility that this protein may be a potential tumor marker for biliary cancer. Similarly, lipocalin 2, also known as neutrophil gelatinase-associated lipocalin (NGAL), is overexpressed in a variety of human cancers such as breast, colorectal, and pancreatic carcinomas (40-45); NGAL has recently been proposed as a tumor marker in urine for bladder cancer patients (46). Again, this is the first report of NGAL expression in bile fluid and implies that this protein could be a potential tumor marker for cholangiocarcinomas. Finally, DMBT1 is an opsonin receptor encoded by a gene located on chromosome 10q that is frequently deleted in gliomas and other malignant brain tumors (47); the DMBT1 protein is principally expressed in the lung, trachea, salivary gland, small intestine, and stomach (48). 
Curiously, while loss of DMBT1 protein expression has been reported in several tumor types (49,50), a recent study suggests that this protein is overexpressed in pancreatic cancers (51). In fact, using a peptidomic approach to screen conditioned media, the authors of that study identified a 29-residue carboxyl-terminal fragment of DMBT1 that is secreted by pancreatic adenocarcinoma cell lines, but not by cell lines derived from normal pancreatic ductal epithelium (51).
A number of keratins were detected in the different LC-MS/MS experiments. Given the nature of the sample used for the study and the way it was obtained, some of the observed keratins might be due to contaminants introduced during the sampling. However, keratins 1, 2a, 9, and 10 have all been described in the context of hepatobiliary cancers and thus likely constitute real bile components. Notably, we have also identified a large number of proteins whose function is unknown, including some proteins that were only predicted by gene prediction programs. Thus, mass spectrometry-derived data can be used for functional annotation of genomes as well as to verify the existence of predicted gene products.
Evaluation of ¹⁸O-labeling in Determination of Glycosylation Site-Table IV shows a list of the identified glycosylation sites found in the Con A affinity-purified samples. The first column of the table contains the identified peptide sequence harboring the glycosylation site, the second column indicates whether the glycosylation site is annotated in Swiss-Prot (or TrEMBL), the third column lists the name of the protein from which the peptide is derived, and the last column contains the RefSeq (or GenBank) accession number and the Swiss-Prot (or TrEMBL) accession number in parentheses.
Most proteins found contain one or more glycosylation sites, but a few contain none. Serum albumin functions as a carrier in serum and is known to bind nonspecifically to many serum proteins; α- and β-globin are not glycosylated but are noncovalently and tightly bound to the plasma glycoprotein haptoglobin. Thus, identifying a protein in the lectin affinity-purified gel does not necessarily mean that it is glycosylated, which is why our ¹⁸O-labeling approach is necessary for definitive identification of a protein as a glycoprotein. Two instances where we were able to localize the N-glycosylation site (membrane alanine aminopeptidase and Mac-2-binding protein) are shown in Fig. 7. Due to the incorporation of ¹⁸O, the mass of the deglycosylated asparagine residue (now an aspartic acid) is 117 Da, leading to an unambiguous assignment of the glycosylation site. While most of the proteins identified by lectin affinity chromatography were glycoproteins, only 15 glycosylation sites were identified by this method. This could indicate that the method is limited by the complexity of the sample and that it is necessary to further decrease the sample complexity prior to deglycosylation. Because immunoglobulins are glycosylated and present in bile in fairly high amounts, we tried depleting the immunoglobulins by protein A and G chromatography, and, using this strategy, we were able to identify a total of 28 glycosylation sites (Table V). Reassuringly, many of these were also found in the Con A-purified samples listed in Table IV. In all, we definitively identified 33 glycosylation sites. In some cases, several forms of the same glycopeptide were found with different N termini. This could result from in-source fragmentation during analysis or from proteolytic processing by aminopeptidases present in bile. Because proline residues are prone to fragmentation, it is possible that these shorter forms are the result of in-source fragmentation. 
Proline residues are also known to slow the trimming of peptide ends by exopeptidases, which could also explain why several of the shorter versions of peptides begin with a proline residue.

CONCLUSIONS

A limited number of proteomic studies of bile have been performed thus far. Using two-dimensional electrophoresis, He et al. studied the composition of vesicular and micellar proteins of human gall bladder (52). Upon comparison with reference two-dimensional electrophoresis maps of human plasma, red blood cells, and liver cells, the authors identified eight serum proteins in the bile samples. In a different study aimed at isolating and identifying hydrophobic polypeptides in human bile, Stark et al. (53) managed, through chloroform/methanol extraction, specialized reversed-phase chromatography and gel filtration, and MALDI-TOF mass spectrometry, to identify a small subset of five proteins, of which three had not been described in bile previously. Using one-dimensional gel electrophoresis and LC-MS/MS, Jones et al. (54) analyzed bile in rats before and after treatment with 1,1-dichloroethylene or diclofenac. The rat bile samples obtained prior to exposure to 1,1-dichloroethylene or diclofenac allowed the authors to identify a total of 23 proteins that included several immunoglobulins, as well as hemoglobin α-1 and β chains.

FIG. 7. MS/MS spectra showing localization of two novel N-linked glycosylation sites. A shows localization of a glycosylation site from the protein membrane alanine aminopeptidase that was identified from the Con A affinity purification followed by PNGaseF cleavage of the N-linked glycans in the presence of H2 18O. The spectrum shows the fragmentation pattern of a triply charged precursor ion at m/z 823.10. The peptide sequence of the precursor ion is shown and the deamidated asparagine is indicated with red coloring. The mass difference between the y9 and y10 ions corresponds to 117 Da, indicating an aspartic acid with one 18O atom incorporated (see schematic in Fig. 6). B shows the localization of a glycosylation site in the protein Mac-2-binding protein identified from the Con A purification followed by immunoglobulin depletion using protein A and G. The deamidated asparagine residue is indicated with red coloring as in A.
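The 117 Da residue mass cited for the 18O-labeled site in Fig. 7 can be checked from elemental compositions. The following is a minimal sketch and not part of the original analysis; it assumes standard monoisotopic atomic masses, and the names `MASS` and `residue_mass` are illustrative only.

```python
# Monoisotopic atomic masses in Da (standard values; an assumption of this sketch).
MASS = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,    # 16O
    "O18": 17.9991596,  # 18O
}


def residue_mass(composition):
    """Sum the atomic masses for an amino acid residue (the amino acid minus H2O)."""
    return sum(MASS[element] * count for element, count in composition.items())


# Residue compositions within a peptide chain.
ASN = {"C": 4, "H": 6, "N": 2, "O": 2}                # asparagine
ASP = {"C": 4, "H": 5, "N": 1, "O": 3}                # aspartic acid
ASP_18O = {"C": 4, "H": 5, "N": 1, "O": 2, "O18": 1}  # aspartic acid with one 18O

asn = residue_mass(ASN)
asp = residue_mass(ASP)
asp_18o = residue_mass(ASP_18O)
shift = asp_18o - asn  # mass gained when PNGaseF converts Asn to Asp in H2 18O

print(f"Asn residue:       {asn:.4f} Da")
print(f"Asp residue:       {asp:.4f} Da")
print(f"Asp(18O) residue:  {asp_18o:.4f} Da")
print(f"Asn -> Asp(18O):   +{shift:.4f} Da")
```

The labeled residue comes out at about 117.03 Da, roughly 3 Da heavier than unmodified asparagine and 2 Da heavier than an aspartic acid formed in ordinary water, which is what allows the formerly glycosylated position to be assigned from adjacent y ions.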
Whereas the above-mentioned studies targeted specific fractions of bile or compared specific states, our article aims to produce a catalog of the protein components of bile. The 87 unique proteins we have identified constitute the largest catalog of human bile protein components to date. As mentioned earlier, there is a need for better biomarkers to diagnose biliary tract cancers, and we believe that having a reliable catalog of proteins present in this body fluid could ease the difficult task of identifying potential biomarker candidates. We have used multiple fractionation and purification methods to obtain our catalog, and the Venn diagrams presented in Fig. 5 clearly show the need for combining multiple techniques. Failure to use any one of the three methods would have resulted in missing 8-17 proteins, corresponding to 9-20% of our catalog. We believe that the catalog of proteins published in this article is only a starting point. Given the complexity of human serum (55), we hope to define the bile proteome further using additional fractionation techniques, moving closer to the identification of biomarkers for hepatobiliary cancers using differential proteomics.
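The 9-20% range quoted above follows directly from the counts of 8-17 proteins out of 87. The sketch below reproduces that arithmetic and illustrates how the proteins unique to each method could be derived from per-method identification lists. The method names and protein identifiers (`method_a`, `P1`, and so on) are hypothetical placeholders and not data from this study.

```python
TOTAL_PROTEINS = 87     # catalog size reported in the text
MISSED_RANGE = (8, 17)  # proteins lost if one method is omitted (from the text)


def percent_of_catalog(n_missed, total=TOTAL_PROTEINS):
    """Express a protein count as a percentage of the full catalog."""
    return 100.0 * n_missed / total


percentages = [round(percent_of_catalog(n)) for n in MISSED_RANGE]
print(percentages)


def unique_to_each(method_sets):
    """Return, for each method, the proteins that no other method identified."""
    unique = {}
    for name, proteins in method_sets.items():
        others = set().union(
            *(found for other, found in method_sets.items() if other != name)
        )
        unique[name] = proteins - others
    return unique


# Hypothetical placeholder data, for illustration only.
example = {
    "method_a": {"P1", "P2", "P3"},
    "method_b": {"P2", "P4"},
    "method_c": {"P3", "P5", "P6"},
}
unique = unique_to_each(example)
for name in sorted(unique):
    print(name, sorted(unique[name]))
```

The size of each set in `unique` is the number of proteins that would be missed if that method were dropped, which is the quantity the Venn diagrams in Fig. 5 summarize.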