Chemical Strategies for Functional Proteomics

With complete genome sequences now available for several prokaryotic and eukaryotic organisms, biological researchers are charged with the task of assigning molecular and cellular functions to thousands of predicted gene products. To address this problem, the field of proteomics seeks to develop and apply methods for the global analysis of protein expression and protein function. Here we review a promising new class of proteomic strategies that utilizes synthetic chemistry to create tools and assays for the characterization of protein samples of high complexity. These approaches include the development of chemical affinity tags to measure the relative expression level and post-translational modification state of proteins in cell and tissue proteomes. Additionally, we discuss the emerging field of activity-based protein profiling, which aims to synthesize and apply small molecule probes that monitor dynamics in protein function in complex proteomes.


Introduction
In response to the availability of complete genome sequences for numerous organisms, the field of proteomics has emerged with the goals of developing and applying methodologies that accelerate the functional analysis of proteins (1,2). Strategies in proteomics can generally be divided into two categories that have complementary objectives: 1) the global characterization of protein expression, and 2) the global characterization of protein function. Large-scale efforts to measure protein expression have typically relied on a combination of two-dimensional gel electrophoresis (2DE), protein staining, and mass spectrometry (MS) for protein separation, detection, and identification, respectively (3). 2DE-MS methods are capable of simultaneously evaluating the relative abundance and modification state of numerous proteins from endogenous sources, thus permitting the identification of new proteins associated with discrete physiological and/or pathological states [e.g., nucleoside diphosphate kinase A as a marker for reduced metastatic potential in human prostate cancer cell lines (4)].
Additionally, by focusing on measurements of protein abundance, 2DE-MS methods provide only an indirect assessment of protein function and may fail to detect important post-translational forms of protein regulation such as those mediated by protein-protein and/or protein-small molecule interactions (7).
To expedite the functional analysis of proteins, methods have also been introduced to examine protein activity on a global scale. These technologies include large-scale yeast two-hybrid screens (8,9), which aim to construct a comprehensive map of protein-protein interactions that occur in the cell, and protein microarrays (10,11), which offer a platform to rapidly assess the function of recombinantly expressed proteins.
Although capable of attributing specific molecular activities to individual protein products, these methods require that proteins be studied in artificial environments and therefore do not directly assess the functional state of these biomolecules in their native settings.
Recently, a breed of chemical strategies has emerged that utilizes organic synthesis to create new tools and assays to advance the field of proteomics (12,13). In this review, we will describe chemical approaches for both abundance-based and activity-based proteomics, with an emphasis on methods that permit the quantitative comparison of proteins, including low-abundance and membrane-associated proteins, in samples of high complexity.

Chemical approaches for determining the abundance and post-translational modification of proteins in complex proteomes
The most common method currently employed by proteomics researchers to monitor changes in protein abundance is 2DE-MS, in which proteins are typically visualized and quantified by staining. Traditional staining methods, including Coomassie Blue and silver staining, are cost-effective, but offer limited dynamic range and sensitivity (14). To improve these features, fluorescent dyes like SYPRO-ruby have recently been developed (15). Nonetheless, independent of the staining method employed, 2DE methods suffer from a lack of resolving power that hinders the detection of several important classes of proteins, including membrane-associated (5) and low-abundance proteins (3).
The isotope-coded affinity tag (ICAT) method: a chemical approach to quantify protein abundance in complex proteomes. As an alternative to 2DE-MS, a gel-free method for quantitative proteomics has been introduced that relies on chemical labeling reagents referred to as isotope-coded affinity tags (ICAT) (16). These chemical probes consist of three general elements: a reactive group capable of labeling a defined amino acid side chain (e.g., iodoacetamide to modify cysteine residues), an isotopically coded linker, and a tag (e.g., biotin) for the affinity isolation of labeled proteins/peptides (Fig. 1A). For the quantitative comparison of two proteomes, one sample is labeled with the isotopically light (d0) probe and the other with the isotopically heavy (d8) version. To minimize error, both samples are then combined, digested with a protease (e.g., trypsin), and subjected to avidin affinity chromatography to isolate peptides labeled with isotope-coded tagging reagents. These peptides are then analyzed by liquid chromatography-mass spectrometry (LC-MS). The ratios of signal intensities of differentially mass-tagged peptide pairs are quantified to determine the relative levels of proteins in the two samples.
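The pairing and ratio step described above can be illustrated with a minimal Python sketch. It assumes that the heavy tag differs from the light tag only by eight deuterium-for-hydrogen substitutions per labeled cysteine, and it uses an invented peak list; the function names and the matching tolerance are hypothetical and do not come from the cited studies.

```python
# Mass difference between deuterium and protium, in daltons.
DEUTERIUM_SHIFT = 2.014102 - 1.007825
# Assumption: the heavy (d8) tag carries eight deuterium atoms per labeled cysteine.
HEAVY_TAG_SHIFT = 8 * DEUTERIUM_SHIFT


def expected_mz_gap(n_cys, charge):
    """Return the m/z spacing between the light and heavy forms of one peptide."""
    return n_cys * HEAVY_TAG_SHIFT / charge


def pair_ratios(peaks, charge=1, n_cys=1, tol=0.01):
    """Pair light/heavy peaks and return (light_mz, heavy/light ratio) tuples.

    peaks is a list of (mz, intensity) tuples. tol is a hypothetical
    matching tolerance in m/z units.
    """
    gap = expected_mz_gap(n_cys, charge)
    ratios = []
    for light_mz, light_int in peaks:
        for heavy_mz, heavy_int in peaks:
            if abs(heavy_mz - light_mz - gap) <= tol and light_int > 0:
                ratios.append((light_mz, heavy_int / light_int))
    return ratios


# Invented peak list: one doubly charged light/heavy pair plus an unrelated peak.
peaks = [(500.000, 2000.0), (504.025, 1000.0), (612.300, 500.0)]
ratios = pair_ratios(peaks, charge=2)
```

In this toy example the single matched pair yields a heavy/light ratio of 0.5, which would correspond to a two-fold lower level of that peptide in the heavy-labeled sample. A real analysis would also need to handle unknown charge states, multiple cysteines per peptide, and overlapping isotope envelopes.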
ICAT circumvents several of the previously described limitations of gel-based methods, providing improved access to important portions of the proteome, like membrane-associated and low-abundance proteins. For example, ICAT has been used to compare the microsomal fractions of normal and phorbol 12-myristate 13-acetate (PMA)-treated samples of the human HL-60 leukemia cell line (17). In this in vitro model of cellular differentiation, the ICAT method was capable of measuring the relative levels of 491 proteins, many of which were membrane-associated proteins and/or proteins of moderate to low abundance. Notably, this study identified previously unknown isoform-specific changes in protein kinase C that occurred during PMA-induced differentiation.
Recently, the ICAT technology was converted to a format for the solid phase capture and release of chemically tagged peptides (18). In this study, the solid phase isotope-tagging reagent consisted of a thiol-specific reactive group attached via an isotopically modified amino acid (either d0 or d7 leucine) to an o-nitrobenzyl-based photocleavable linker bound to an aminopropyl-coated glass bead (Fig. 1B). Each of the two proteomes under comparison was digested with trypsin and its cysteine-containing peptides captured with either the light or heavy form of the solid phase reagent. The light and heavy beads were then combined, washed, and exposed to UV light to induce photocleavage of the linker. The isotopically labeled peptides, now present in solution, were then analyzed by LC-ESI-MS-MS. Compared to the original solution phase ICAT approach, the solid phase strategy required less sample handling and provided greater sensitivity for quantitative protein analysis. On the other hand, because solid phase ICAT involves proteolysis prior to probe labeling, the solution phase ICAT method may still be preferred in cases where the separation of labeled proteins is desired. For example, solution phase ICAT methods have been used in combination with 2DE to concurrently quantify changes in protein expression and modification state that occur in the yeast proteome in response to a metabolic shift (19).
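Because the solid phase reagent captures only cysteine-containing tryptic peptides, it can be useful to estimate in advance which peptides of a protein are accessible to the method. The Python sketch below is one way to do this, assuming the commonly used rule that trypsin cleaves after lysine or arginine unless the next residue is proline; the sequence and function names are invented for illustration.

```python
def tryptic_peptides(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep the C-terminal peptide if the sequence does not end in K or R.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def cysteine_peptides(sequence):
    """Return the tryptic peptides that a thiol-specific reagent could capture."""
    return [p for p in tryptic_peptides(sequence) if "C" in p]


# Invented sequence used only to exercise the functions.
sequence = "MACKTRPLCGRAAK"
all_peptides = tryptic_peptides(sequence)
captured = cysteine_peptides(sequence)
```

Here the peptide AAK contains no cysteine and would not be captured, which illustrates why thiol-directed tagging reduces sample complexity but cannot report on cysteine-free proteins.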
Chemical methods to measure protein phosphorylation in complex proteomes. Building on the success of ICAT, related chemical proteomics strategies have been introduced to evaluate the post-translational modification state of proteins. In particular, several chemical reagents have been developed to measure the phosphorylation state of proteins in complex proteomes (20). Traditional methods for detecting protein phosphorylation include metabolic radiolabeling with 32P inorganic phosphate (21,22) and affinity enrichment by either immobilized metal-affinity chromatography (IMAC) (23) or phospho-specific antibodies (24). However, each of these techniques exhibits shortcomings for quantitative proteome analysis (Table 1). For example, metabolic labeling with 32P requires a viable cell source and therefore is not applicable to the proteomic analysis of human tissue specimens. Additionally, transitioning from the detection of 32P-labeled proteins on 2DE gels to the molecular identification of these proteins can be challenging without the availability of target-specific enrichment reagents (e.g., antibodies). Affinity procedures like IMAC and phospho-specific antibodies have typically suffered from high levels of background binding by nonphosphorylated peptides and poor quantitation [notably, however, recent advances in IMAC methodologies may help to overcome these deficiencies (25)].
Two chemical tagging strategies for quantitative phosphoproteome analysis have recently been described. The first approach, concurrently put forth by two independent research groups (26,27), involves the sequential base-catalyzed β-elimination of the phosphate group and nucleophilic addition of an affinity tag to the resulting dehydroalanine residue (Fig. 2). In both methods, cysteine residues on proteins were first oxidized with performic acid to prevent cross-reactivity in subsequent steps. Then, treatment with base transformed phosphoserine and phosphothreonine residues into Michael acceptors susceptible to nucleophilic attack with ethanedithiol (EDT). A reactive biotin reagent was then coupled to the free thiol end of EDT-modified sites, permitting the purification of the originally phosphorylated proteins by avidin-affinity chromatography (enriched as either whole proteins or as peptides if preceded by digestion with trypsin). Affinity-isolated biotinylated peptides were then analyzed by ESI-LC-MS/MS, permitting the identification of the corresponding proteins, as well as the specific sites of phosphorylation on these proteins. In the study by Goshe and colleagues, this method was adapted for quantitative analysis of phosphoproteomes by incorporating phosphoprotein-specific isotope-coded affinity tags (PhIAT) (27). These PhIAT reagents consisted of two isotopic derivatives of EDT [a light (d0) and heavy (d4) version], each of which served as the nucleophile for one of the two proteomes under comparison (Fig. 2). The light and heavy PhIAT-modified proteomes were then combined, processed, and analyzed by ESI-LC-MS/MS as previously described for ICAT.
Because the strategy detailed above requires the β-elimination of a phosphate group in order to expose a site for affinity tagging, it is not capable of monitoring the phosphorylation state of tyrosine residues on proteins. In contrast, a second chemical method for phosphoproteome analysis developed by Zhou and colleagues is applicable to phospho-seryl, -threonyl, and -tyrosyl residues (28). In this approach, proteins were first alkylated with iodoacetamide to block cysteine residues and then enzymatically digested with trypsin (Fig. 2). Following protection of the amino groups of the resulting peptide mixture with tBOC chemistry, the carboxyl/phosphoryl groups were modified with ethanolamine in a carbodiimide-catalyzed reaction. Treatment with acid promoted the hydrolysis of the less stable phosphoramidate bonds, regenerating free phosphate groups, which were then reacted with a cystamine disulfide-bonded dimer. Reduction of the cystamine substituent resulted in the exposure of a free thiol group at each site of phosphorylation in the peptide sample.
Thiol-modified phosphopeptides were then captured on the solid phase by reaction with iodoacetyl groups immobilized on glass beads. After stringent washing, phosphopeptides were released from the solid phase by phosphoramidate bond cleavage with trifluoroacetic acid and analyzed by ESI-LC-MS/MS.
A comparison of the phosphate elimination and phosphoramidate modification methods suggests that these approaches offer complementary advantages for phosphoproteome analysis. The phosphate elimination method requires fewer modification steps and results in a chemically modified peptide suitable for tandem MS analysis to identify the specific site of phosphorylation (Table 1). However, this strategy is only applicable to phospho-serine and phospho-threonine peptides. In contrast, the reversible phosphoramidate modification protocol can analyze any type of phosphorylated peptide, but involves numerous derivatization steps and results in the recovery of unmodified phosphate groups, which typically dissociate during tandem MS analysis, confounding efforts to determine sites of phosphorylation. Importantly, however, both methods reduce sample complexity while enriching for phosphorylated proteins, and therefore should provide access to low-abundance constituents of the phosphoproteome. Additionally, because these chemical strategies offer a means to quantify changes in the phosphoproteome by isotope tagging, they should facilitate the discovery of molecular changes in signal transduction cascades associated with particular physiological and/or pathological processes.

Chemical approaches for determining the activity of proteins in complex proteomes
Conventional proteomics methods record variations in protein abundance and therefore provide only an indirect estimate of changes in protein activity. Accordingly, these approaches may fail to detect important post-translational forms of protein regulation such as those mediated by protein-protein and/or protein-small molecule interactions (7).
To address these limitations, chemical strategies have been developed for activity-based protein profiling (ABPP) that employ active site-directed probes to determine the functional state of enzymes in complex proteomes (12). Chemical probes for ABPP consist of at least two molecular elements: 1) a reactive group for binding to and covalently modifying the active sites of many members of a given enzyme class (or classes), and 2) a chemical tag for the rapid detection and isolation of reactive enzymes (Fig. 3A). Because these probes possess moderately reactive electrophilic groups, they are poised to selectively modify enzyme active sites, which are often enriched in nucleophilic amino acid residues important for catalysis.
To date, two general strategies for ABPP have been devised: 1) directed approaches that target specific classes of enzymes, and 2) non-directed approaches that profile enzymes from several different classes. The chemical foundation for each of these methods, as well as examples of their biological application, will be reviewed below.
Although valuable for the affinity purification of probe-reactive proteins, biotin-conjugated ABPP probes displayed several shortcomings for the systematic detection of enzyme activities in complex proteomes. In particular, biotin labeling events must be visualized indirectly, typically with avidin-horseradish peroxidase complexes and chemiluminescent substrates. These assays are limited in sensitivity, throughput, and dynamic range, thus hindering efforts to rapidly and quantitatively compare large numbers of proteomic samples. To address these limitations, the fluorophosphonate (FP) reactive group has been conjugated to fluorescent tags (either rhodamine or fluorescein), permitting the use of direct in-gel fluorescence scanning as a rapid, sensitive, and quantitative screen for activity-based protein labeling events (42). Notably, Patricelli and colleagues have estimated that in-gel fluorescence scanning can detect on the order of 100 amol of FP-rhodamine-labeled enzyme (42), a detection limit nearly two orders of magnitude lower than that of biotin-conjugated probes (29). Thus, a two-tiered strategy for ABPP has since been adopted in which first, fluorescent probes are used for rapid and quantitative comparative proteome analysis, and second, biotin probes are applied to affinity enrich and identify differentially expressed enzyme activities.
Capitalizing on the technical advantages afforded by both fluorescent and biotin-avidin ABPP methods, Jessani and colleagues set out to comparatively profile a panel of human cancer cell lines to determine if a global analysis of serine hydrolase activities would yield proteomic information of sufficient quantity and quality to depict higher-order cellular properties (43). In this study, cancer cell proteomes were split into three fractions prior to characterization: the secreted, membrane, and cytosolic fractions.
Profiling of these proteomic fractions from eleven breast and melanoma cancer lines identified a cluster of serine hydrolase activities that distinguished these lines based on tissue of origin. Interestingly, however, nearly all of these enzymes were downregulated in the most invasive cancer lines examined, which instead upregulated a distinct set of serine hydrolase activities that included the protease urokinase and a novel membrane-associated enzyme, KIAA1363. A more detailed analysis revealed that most of the serine hydrolase activities responsible for classifying cancer cells into subtypes based on tissue of origin and/or state of invasiveness resided in the secreted and membrane proteome, suggesting that these proteomic fractions were particularly enriched in enzyme markers of cellular behavior. Collectively, these studies demonstrate that ABPP can generate molecular profiles that accurately depict higher-order cellular properties, and in the process, identify uncharacterized enzyme activities, like KIAA1363, that may represent new biomarkers and/or therapeutic targets for the diagnosis and treatment of human cancer.
ABPP probes that target tyrosine phosphatases. Lo and colleagues have reported the synthesis and application of first generation activity-based probes to profile members of the protein tyrosine phosphatase (PTP) family (49). These probes consisted of a mechanism-based reactive group [a 4-fluoromethyl-1-phosphophenyl substituent (50)], a diethylene glycol linker, and a biotin or dansyl tag (Fig. 3B). The authors hypothesized that PTP-catalyzed hydrolysis of the phosphate group would promote a 1,6 elimination of the fluorine atom to form a highly reactive quinone methide that might label phosphatase active sites. Consistent with this notion, the tyrosine phosphatase PTP-1B, but not other proteins like phosphorylase b and albumin, was covalently modified by these mechanism-based probes.
Still, high probe concentrations (1 mM) were required to label PTP-1B and further studies will be needed to determine if such conditions are compatible with profiling members of the PTP family in complex proteomes.

Non-directed ABPP: the design and application of libraries of activity-based chemical probes that target multiple classes of enzymes. As described above, the creation of activity-based probes for some enzyme classes, like serine and cysteine hydrolases, has been relatively straightforward. Because active site-directed affinity labels were already known for these enzymes, chemical proteomics researchers needed only to couple these "reactive group" elements to an appropriate linker and detection/isolation tag to generate probes for ABPP. For many enzyme classes, however, cognate affinity labels do not yet exist, thus limiting the scope of such directed ABPP efforts. To expand the number of enzyme classes addressable with ABPP methods, Adam and colleagues have introduced a non-directed or combinatorial strategy in which libraries of candidate probes are synthesized and screened against complex proteomes for activity-dependent protein reactivity (51,52).
To demonstrate the feasibility of non-directed approaches for ABPP, a relatively small library of candidate probes was synthesized that incorporated the following elements: 1) a variable alkyl/aryl binding group, 2) a sulfonate ester reactive group, 3) an aliphatic linker, and 4) a rhodamine or biotin tag (for the detection and affinity isolation of protein targets, respectively) (Fig. 4A). By selecting a carbon electrophile (sulfonate ester) as the reactive group element of the probe library, it was hoped that the probes would label the active sites of enzymes from several different mechanistic classes. In support of this hypothesis, natural products bearing carbon electrophiles have been identified that covalently modify the active sites of a diverse set of enzymes, including wortmannin, which targets kinases (53), microcystin, which targets phosphatases (54), and fumagillin, which targets metalloproteases (55). Rhodamine-tagged members of the sulfonate probe library were applied to tissue and cell line proteomes in a screen for specific protein reactivities, which were defined as those that occurred in native, but not heat-denatured proteomes (conversely, proteins showing heat-insensitive reactivity were considered non-specific targets) (Fig. 4B). The authors hypothesized that proteins reacting with sulfonate probes in a heat-sensitive manner would possess structured sites for small molecule interactions, and that these sites would often determine the biological activity of the protein (e.g., the active site of an enzyme). In these initial studies, several heat-sensitive sulfonate targets were detected in both soluble and membrane proteomes (51,52). Interestingly, most proteins showed preferential labeling with specific members of the sulfonate library, indicating that the varied binding group element was at least in part specifying the proteome reactivity of the probes.
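The heat-sensitivity criterion used in this screen lends itself to a simple decision rule. The Python sketch below is illustrative only: it assumes that each labeled band has been quantified by in-gel fluorescence in both the native and the heat-denatured proteome, and it uses a hypothetical fold-change cutoff (min_fold) that is not taken from the cited studies.

```python
def classify_targets(native, denatured, min_fold=3.0):
    """Call each labeled band 'specific' or 'non-specific'.

    native and denatured map a band name to its fluorescence intensity in
    the native and heat-denatured proteome, respectively. min_fold is a
    hypothetical cutoff chosen for illustration.
    """
    calls = {}
    for band, native_signal in native.items():
        # A band missing from the denatured lane is treated as zero signal.
        denatured_signal = denatured.get(band, 0.0)
        heat_sensitive = native_signal > min_fold * denatured_signal
        calls[band] = "specific" if heat_sensitive else "non-specific"
    return calls


# Invented intensities: band_A loses labeling on heating, band_B does not.
native = {"band_A": 900.0, "band_B": 400.0}
denatured = {"band_A": 50.0, "band_B": 380.0}
calls = classify_targets(native, denatured)
```

With these invented values, band_A is called specific and band_B non-specific. In practice the cutoff would have to be chosen empirically, for example from replicate gels.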
Biotin-conjugated sulfonate probes were used to affinity isolate several protein targets and these proteins were identified by mass spectrometry methods as enzymes from nine mechanistically distinct classes (Table 2). Each enzyme was recombinantly expressed in COS-7 cells to confirm its sensitivity to sulfonate labeling. For several enzymes, additional evidence was obtained that sulfonate probes were modifying the active site.
For example, the addition of cofactors or substrates was found to reduce the labeling of some enzymes (52,56), while the sulfonate reactivity of other proteins was either enhanced or inhibited by known allosteric regulators of catalytic activity (56). For one enzyme, aldehyde dehydrogenase, the sulfonate probes were also shown to act as active site-directed irreversible inhibitors (51). Finally, it is notable that several of the sulfonate targets, including glutathione S-transferase GSTO 1-1 (51), tissue transglutaminase (56), and platelet-type phosphofructokinase (56), were found to be upregulated in invasive breast cancer cells, indicating that non-directed methods for ABPP can identify novel protein markers of discrete pathological states.
In summary, through the development and application of non-directed methods for ABPP, Adam and colleagues have shown that activity-based chemical probes compatible with whole-proteome analysis can be generated for numerous enzyme classes.
Strikingly, none of the enzymes labeled by the sulfonate library represented targets of previously described proteomics probes. This finding suggests that proteomics researchers are still far away from saturating the amount of "active site space" addressable with chemical probes. Nonetheless, successful attempts to further expand the scope of ABPP will likely require probe libraries of considerable chemical and structural diversity, as well as efficient strategies to screen proteomes for targets of these profiling tools. The future design of ABPP probes would also benefit from a deeper understanding of the parameters that drive the probe-enzyme reactions observed to date with the sulfonate library. The absence of a shared catalytic mechanism among the sulfonate targets argues that other features are dictating probe labeling. Efforts to identify the specific sites of sulfonate modification on the targeted enzymes may help to define the molecular properties that support active site-directed labeling events. Finally, it is worth considering how often active site-directed labeling actually equates with an "activity-based" event. For certain enzymes like tissue transglutaminase, sulfonate labeling appears to provide an exquisite readout of catalytic activity, as both properties show a strict requirement for calcium and inhibition by the allosteric regulator GTP (56). On the other hand, for some enzymes, active site modification may occur on non-catalytic residues, akin to the manner in which microcystin labels a non-catalytic cysteine residue in the active sites of serine/threonine-phosphatases (54). Can such labeling events be considered activity-based? From a purely mechanistic standpoint, the answer would be no; however, from a more biological perspective, if, as is often the case, enzyme activity is regulated in vivo by autoinhibitory domains, protein partners, and/or small molecules that sterically obstruct the active site (7), then any probe that is sensitive to such molecular interactions would provide an effective readout of the functional state of the enzyme in the context of the cell biology of the proteome.

Conclusions and future directions
In this review, we have highlighted a promising new class of proteomics methods that has united the fields of synthetic chemistry and protein biochemistry to create powerful tools and assays for the global analysis of protein expression and function. Chemical approaches like the isotope-coded affinity tag (ICAT) method offer proteomics researchers the opportunity to compare the expression level of low abundance proteins in samples of high complexity (16,18). The extension of ICAT methods to chemical probes specific for phosphorylated peptides has engendered assays to monitor changes in the post-translational modification state of proteins in cell and tissue proteomes (26)(27)(28).
Finally, both directed and non-directed strategies for activity-based protein profiling (ABPP) have produced a menu of chemical probes that can be used either separately or in combination to discover enzyme activities associated with discrete physiological and/or pathological states (29,32,43,52,56). The value of ABPP as a method for functional proteome analysis has been further highlighted by its application as a screen to evaluate the potency and selectivity of enzyme inhibitors (30,32). Nonetheless, despite the considerable advances made to date, chemical approaches for proteome analysis still face significant technical challenges. Perhaps most notably, an unsatisfying trade-off seems to exist between the need for high sample throughput and the desire for in-depth analysis of individual proteomes. For example, with a 1D gel format, hundreds of proteomic samples treated with ABPP probes can be readily analyzed in a single day by a given academic lab (43). However, the modest resolution afforded by 1D gels likely will result in some low abundance and/or co-migrating protein targets eluding detection. In contrast, proteomic investigations that utilize LC as a separation method can achieve exceptional resolution of chemically tagged peptides, but with a much lower sample throughput. In the end, the optimal platform with which to analyze probe-labeled proteomes will likely depend on the biological question being addressed. Indeed, if extensive detail is sought on a select number of samples, then one may wish to apply all of the proteomics methods described above, thereby approaching a complete picture of dynamics in protein abundance, modification state, and activity.

Table 2. Representative enzyme activities identified from mouse and human proteomes by non-directed ABPP methods using a sulfonate ester probe library.

Enzyme                          Enzyme Class    Proteome Source
Acetyl CoA acetyltransferase*   Thiolase        Mouse Heart