ANIBAL, Stable Isotope-based Quantitative Proteomics by Aniline and Benzoic Acid Labeling of Amino and Carboxylic Groups*

Identification and relative quantification of hundreds to thousands of proteins within complex biological samples have become realistic with the emergence of stable isotope labeling in combination with high throughput mass spectrometry. However, all current chemical approaches target a single amino acid functionality (most often lysine or cysteine) despite the fact that addressing two or more amino acid side chains would drastically increase quantifiable information as shown by in silico analysis in this study. Although the combination of existing approaches, e.g. ICAT with isotope-coded protein labeling, is analytically feasible, it implies high costs, and the combined application of two different chemistries (kits) may not be straightforward. Therefore, we describe here the development and validation of a new stable isotope-based quantitative proteomics approach, termed aniline benzoic acid labeling (ANIBAL), using a twin chemistry approach targeting two frequent amino acid functionalities, the carboxylic and amino groups. Two simple and inexpensive reagents, aniline and benzoic acid, in their 12C and 13C forms with convenient mass peak spacing (6 Da) and without chromatographic discrimination or modification in fragmentation behavior, are used to modify carboxylic and amino groups at the protein level, resulting in an identical peptide bond-linked benzoyl modification for both reactions. The ANIBAL chemistry is simple and straightforward and is the first method that uses a 13C-reagent for a general stable isotope labeling approach of carboxylic groups. In silico as well as in vitro analyses clearly revealed the increase in available quantifiable information using such a twin approach. ANIBAL was validated by means of model peptides and proteins with regard to the quality of the chemistry as well as the ionization behavior of the derivatized peptides. A milk fraction was used for dynamic range assessment of protein quantification, and a bacterial lysate was used for the evaluation of relative protein quantification in a complex sample in two different biological states. Molecular & Cellular Proteomics 7:800-812, 2008.
Quantitative proteomics aims at globally comparing protein abundance levels between two or more biological conditions, e.g. cellular states or different environmental or health versus disease conditions. The determination of changes in protein expression is a key element in proteome research and widely applied for the functional analysis of biological systems and the detection of diagnostic, prognostic, and clinical markers. In the past 30 years, most of the proteomics experiments have been conducted with two-dimensional (2D) PAGE as the main separation technique using gel image comparison to obtain relative quantitative information (1). 2D PAGE analysis is still widely used and yields excellent results. However, global proteomics by 2D gel electrophoresis/MS has always been limited by the discrimination of analytically challenging protein groups, such as membrane proteins, very large or small proteins, and very acidic or basic proteins (2-4).
In 1999, Gygi et al. (5) introduced ICAT, a gel-free quantitative proteomics technique used in conjunction with gel-free protein/peptide separations. The advantage of stable isotope-based methods lies in their efficiency when coupled to MS. As a consequence, several strategies have been developed based on stable, non-radioactive isotopes like 2H, 13C, 15N, and 18O and can be classified into three stable isotope labeling experiment categories: (i) metabolic stable isotope labeling like stable isotope labeling by amino acids in cell culture (SILAC) using growth medium with stable isotope-labeled amino acids (6) or via isotope-labeled nutrients like [15N]ammonium salt (7,8); (ii) isotope tagging by chemical reaction such as ICAT, isotope-coded protein labeling (ICPL), or isobaric tagging for relative and absolute quantification (iTRAQ) (5,9,10); and (iii) enzyme-catalyzed reactions like 16O/18O exchange in terminal carboxylic groups conferred by trypsin or other proteases (11,12). Typically two biological conditions (control versus case) are defined and compared with regard to their protein expression profile. For this purpose, two chemically identical tags with different isotopic composition and thus different masses (light and heavy) are utilized. After labeling, the samples are combined and fractionated by liquid chromatography. Finally mass spectrometry is used to identify proteins from the derived peptides, and relative quantification is achieved by comparing ion intensities within the light and heavy labeled peptide pairs. Moreover in the case of iTRAQ, multiplexing is also feasible with simultaneous comparison of up to four different conditions (9).
Although all three categories show specific advantages and drawbacks (13,14), many proteomics experiments are nowadays performed using isotope tagging by chemical derivatization. This is certainly due to its universal applicability to any kind of biological sample (cells, tissues, or biological fluids), including ex vivo sampling from higher organisms, and to the wide range of available quantification targets, i.e. amino acids amenable to derivatization (Cys, Lys, Met, Trp, His, Asp, Glu, or N/C termini) (15-17). Almost all approaches are based on a single labeling, but Liu and Regnier (18) described for the first time a strategy using a dual chemistry targeting amino and carboxylic groups at the peptide level.
In two previous studies, we have described the development and application of two tags, namely sulfophenyl isothiocyanate and sulfanilic acid, for quantitative proteomics and shown their performance when combined with MALDI ionization (19,20). Both tags showed efficient ionization in MALDI and simplified fragmentation due to the sulfonate group incorporated by both the amino-directed sulfophenyl isothiocyanate and the carboxyl-directed sulfanilic acid tag into the peptide. However, the tagged peptides revealed decreased ionization efficiency in electrospray mode when compared with their untagged, native counterparts.
Here we describe a new isotope-coded chemical reaction optimized for electrospray ionization called aniline benzoic acid labeling (ANIBAL) that relies on a twin/dual chemistry approach tagging carboxylic and amino groups at the protein level. Both reactions are based on carbodiimide chemistry to activate carboxylic groups for further reaction with primary amino groups. Light and heavy tags have a mass shift of 6 Da and do not show any retention time shift because of the 13C isotope substitution in their phenyl ring. The ANIBAL technique was first applied to protein standards for optimization, then to a milk fraction containing six major abundant proteins for dynamic range assessment, and finally to a bacterial whole cell lysate (in a 1:1 ratio) to confirm the applicability to a complex sample.
In Silico Prediction-The human International Protein Index (IPI) database version 3.26 was downloaded from the European Bioinformatics Institute. In a first analysis, the total number of each amino acid was determined, and the mean and median values per protein were calculated for all 67,665 sequences available. In a second analysis, the number of peptides containing quantifiable information and the number of proteins without quantifiable information were determined for each amino acid by sequentially considering each of them as a possible chemical target for a stable isotope chemical labeling strategy. More precisely, all proteins were digested in silico using trypsin in two ways: R/P for labeled lysine and KR/P for all other labeled amino acids. For each generated peptide, the mass and the amino acid composition were determined, and the peptide was submitted to a quantification test in which each amino acid in turn was considered as chemically labeled. A peptide passed the test if it (i) contained at least one of the selected targets and (ii) had a mass between 400 and 4000 Da. Finally the same analysis was performed using chemical functional groups as targets. Carboxylic, amino, thiol, and indole groups were considered based on their known reactivity in biochemistry. All combinations of groups were also tested.
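The quantification test described above can be sketched in a few lines of Python. This is an illustration only and not the code used in this study: the function names (`digest`, `peptide_mass`, `quantifiable_peptides`) are hypothetical, missed cleavages are not considered, masses are computed for unmodified peptides from standard monoisotopic residue masses, and sequences containing non-standard residue codes would have to be filtered beforehand.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added once per peptide


def digest(sequence, cleave_after="KR"):
    """Cleave C-terminal to the given residues unless the next residue is Pro.

    cleave_after="KR" mimics the KR/P rule, cleave_after="R" the R/P rule.
    No missed cleavages are generated (assumption of this sketch).
    """
    pattern = rf"(?<=[{cleave_after}])(?!P)"
    return [pep for pep in re.split(pattern, sequence) if pep]


def peptide_mass(peptide):
    """Monoisotopic mass of the unmodified peptide in Da."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def quantifiable_peptides(sequence, targets, cleave_after="KR",
                          min_mass=400.0, max_mass=4000.0):
    """Peptides that carry at least one target residue and lie in the mass window."""
    return [
        pep for pep in digest(sequence, cleave_after)
        if any(aa in targets for aa in pep)
        and min_mass <= peptide_mass(pep) <= max_mass
    ]
```

Calling `quantifiable_peptides(seq, "K", cleave_after="R")` would correspond to the lysine case, and `quantifiable_peptides(seq, "DE")` to carboxylic side chains with the standard tryptic rule. A protein counts as not quantifiable for a given target when the returned list is empty.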
NHSS ]benzoic acid were dissolved in 5.0 ml of dry DMF. The reaction mixture was stirred overnight at room temperature and then cooled to 4°C and stirred for another 4 h. The precipitated dicyclohexylurea was removed by filtration and washed with 2 ml of dry DMF. The product was precipitated from solution by adding a 20× excess of ethyl acetate and incubating at 4°C for 2 h. The precipitate was collected by filtration and stored in a desiccator.
Ionization Efficiency-500 pmol of BSA (corresponding to 30 nmol of carboxylic groups) were dissolved once in 20 µl of 1 M pyridine, pH 5.0, containing 375 nmol/µl [12C]aniline and once in 20 µl of 1 M pyridine without [12C]aniline. 5 µl of a 3.5 µmol/µl EDC solution were added to both samples to start the reaction. The final COOH:aniline:EDC ratio was approximately 1:250:500. Both samples were incubated for 2 h at room temperature, and the reaction was stopped by the addition of 5 µl of concentrated acetic acid. Both reactions were pooled, and BSA was precipitated with the ProteoExtract kit as described in the manufacturer's protocol. After precipitation, the pellet was resuspended in 80 µl of 100 mM ammonium bicarbonate, reduced for 30 min at 60°C by adding 10 µl of 45 mM DTT, alkylated for 30 min at room temperature in the dark by adding 10 µl of 100 mM iodoacetamide, and then digested overnight with trypsin (1:50 (w/w) trypsin:protein ratio) at 37°C. Finally MS analysis was conducted in five replicates.
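The reagent excess quoted above follows from simple arithmetic, which the short sketch below makes explicit. It is illustrative only and assumes that volumes are given in microliters, the aniline concentration in nmol/µl, and the EDC concentration in µmol/µl; the function name `molar_ratio` is hypothetical.

```python
def molar_ratio(cooh_nmol, aniline_ul, aniline_nmol_per_ul, edc_ul, edc_umol_per_ul):
    """Return (aniline excess, EDC excess) relative to the amount of COOH groups."""
    aniline_nmol = aniline_ul * aniline_nmol_per_ul
    edc_nmol = edc_ul * edc_umol_per_ul * 1000.0  # convert µmol to nmol
    return aniline_nmol / cooh_nmol, edc_nmol / cooh_nmol


# BSA conditions given in the text: 30 nmol COOH, 20 µl of 375 nmol/µl aniline,
# 5 µl of 3.5 µmol/µl EDC.
aniline_excess, edc_excess = molar_ratio(30, 20, 375, 5, 3.5)
print(f"COOH:aniline:EDC = 1:{aniline_excess:.0f}:{edc_excess:.0f}")
```

With these inputs the aniline excess is 250-fold and the EDC excess roughly 580-fold, which is consistent with the approximate 1:250:500 ratio stated for this experiment.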
Aniline Labeling of Human Milk-The human milk fraction was prepared as described previously (22), and the protein concentration was determined by Bradford test. 50-µg aliquots of fractionated human milk (corresponding to approximately 50 nmol of carboxylic groups) were derivatized in 20 µl of 1 M pyridine, pH 5.0, containing a 625 nmol/µl concentration of either light or heavy aniline and sonicated for 10 min. 10 µl of a 2.5 µmol/µl EDC solution were added to start the reaction. The final COOH:aniline:EDC ratio was approximately 1:250:500. After 2 h of derivatization at room temperature, both aliquots (light and heavy) were precipitated separately (ProteoExtract kit). The precipitated pellets were resuspended in 80 µl of 100 mM ammonium bicarbonate, reduced, alkylated, and digested overnight as described above. Light and heavy derivatized milk fractions were then mixed to produce the following light to heavy ratios: 0.05, 0.10, 0.20, 0.33, 0.50, 1, 2, 3, 5, 10, and 20.
Benzoic Acid Labeling of Human Milk-50 µg of fractionated human milk (corresponding to approximately 25 nmol of amino groups) were derivatized in 40 µl of 200 mM HEPES, pH 8.0, with light and heavy NHSS benzoate (in two different tubes) at an NH2:NHSS benzoate ratio of 1:500. After 4 h of derivatization at room temperature, both aliquots (light and heavy) were precipitated separately and processed (reduction, alkylation, and digestion with trypsin) as described above. Light and heavy derivatized milk fractions were then mixed to produce the following light to heavy ratios: 0.05, 0.10, 0.20, 0.33, 0.50, 1, 2, 3, 5, 10, and 20.
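One possible way to prepare such a ratio series is to mix the two digests in volumes proportional to the target ratio. The sketch below assumes that the light and heavy digests have the same peptide concentration and that a fixed total volume is wanted; it is not necessarily the pipetting scheme used here, and the function name `mixing_volumes` is hypothetical.

```python
LIGHT_TO_HEAVY_RATIOS = [0.05, 0.10, 0.20, 0.33, 0.50, 1, 2, 3, 5, 10, 20]


def mixing_volumes(ratio, total_ul):
    """Volumes (light, heavy) in µl giving the requested light:heavy ratio.

    Assumes equal peptide concentration in both digests.
    """
    light_ul = total_ul * ratio / (1.0 + ratio)
    return light_ul, total_ul - light_ul


for ratio in LIGHT_TO_HEAVY_RATIOS:
    light_ul, heavy_ul = mixing_volumes(ratio, total_ul=20.0)
    print(f"L:H = {ratio:>5}: {light_ul:5.2f} µl light + {heavy_ul:5.2f} µl heavy")
```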
Lactococcus lactis Labeling with ANIBAL-Four 50-µg aliquots of the soluble proteins of a total cell lysate of L. lactis were used (50 µg corresponds to approximately 50 nmol of carboxylic groups and 25 nmol of amino groups). Two aliquots were derivatized in 60 µl of 1 M pyridine, pH 5.0, containing a 200 nmol/µl concentration of either light or heavy aniline and sonicated for 10 min. 10 µl of a 2.5 µmol/µl EDC solution were added to initiate the reaction. The final COOH:aniline:EDC ratio was 1:250:500. Both samples were incubated for 2 h at room temperature, and the reaction was stopped by addition of 5 µl of acetic acid. Samples were then mixed, and proteins were precipitated using the ProteoExtract kit.
In parallel, the remaining two aliquots were derivatized in 40 µl of 200 mM HEPES, pH 8.0, with light and heavy NHSS benzoate at an NH2:NHSS benzoate ratio of 1:500 (approximately 4 mg of NHSS benzoate; molecular weight (light) = 321, molecular weight (heavy) = 327). After 4 h of derivatization at room temperature the two samples (light and heavy) were mixed (1:1) and precipitated. For both derivatizations the pellets were resuspended and processed (reduction, alkylation, and digestion with trypsin) as described above.
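The amount of NHSS benzoate quoted above can be reproduced from the number of amino groups, the molar excess, and the molecular weight. The following lines are a minimal check under those stated values; the function name `reagent_mass_mg` is hypothetical.

```python
def reagent_mass_mg(groups_nmol, molar_excess, mw_g_per_mol):
    """Mass of reagent (mg) needed for a given molar excess over reactive groups."""
    reagent_mol = groups_nmol * 1e-9 * molar_excess
    return reagent_mol * mw_g_per_mol * 1000.0  # g to mg


# 25 nmol of amino groups, 500-fold excess, MW 321 (light) or 327 (heavy)
print(f"light: {reagent_mass_mg(25, 500, 321):.2f} mg")
print(f"heavy: {reagent_mass_mg(25, 500, 327):.2f} mg")
```

Both values come out close to 4 mg, in agreement with the amount given in the text.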
LC-MS Analysis-The LC-MS/MS data were acquired using an HCTultra ion trap mass spectrometer (Bruker Daltonics, Bremen, Germany) coupled on line to an Ultimate 3000 HPLC system (Dionex, Sunnyvale, CA) equipped with an analytical Magic C18 reversed-phase column (100 × 0.075 mm, 5 µm) (Spectronex, Basel, Switzerland). The scan range was set at m/z 400-1600. For each injection, the peptide mixture was loaded and washed for 10 min with 2% (v/v) acetonitrile, 0.1% (v/v) formic acid on a C18 PepMap100 trapping column (5 × 0.3 mm, 5 µm) (LC Packings, Amsterdam, Netherlands) at a flow rate of 20 µl/min prior to elution with a linear gradient of 5-50% (v/v) acetonitrile, 0.1% (v/v) formic acid at a flow rate of 0.3 µl/min over 30 min (BSA), 60 min (milk fraction), and 90 min (L. lactis). Peptides were analyzed using the "peptide scan" option of the HCTultra system consisting of a full-scan MS spectrum acquisition in "standard-enhanced" mode (8,100 m/z/s) for charge state assignment based on the 13C isotope envelope followed by three MS/MS scans in "ultra scan" mode (26,000 m/z/s) on the three most abundant ions as well as exclusion of singly charged ions with preferred charge state set to double for MS/MS selection.
FIG. 1. Work flow of the ANIBAL approach. Two aliquots of the sample, A and B, are derivatized using the two "chemically symmetrical" labeling approaches. After derivatization, light and heavy labeled samples are mixed, and proteins are precipitated and digested with trypsin before mass spectrometric analysis. Data are then extracted and submitted to database search before validation and quantification at the peptide level. Both labeling results at the peptide level are combined into a single result file at the protein level containing quantification information for both labeling approaches.
Data Extraction-Data were extracted using DataAnalysis 3.4 (Bruker Daltonics) generating peak lists in mgf format. Bruker Daltonics yep files were transformed to mzXML using CompassXport 1.3.2. An in-house script called da2tpp was used to convert the mgf files into new mgf or dta files suitable for Mascot (Matrix Science, London, UK) and Sequest (ThermoFisher, San Jose, CA) searches and compatible with further validation using the Trans-Proteomic Pipeline version 2.9.8 (Institute for Systems Biology, Seattle, WA).
Database Search-Data were searched using Mascot and/or Sequest against the Swiss-Prot database using the taxonomy "other Mammalia" for BSA and "Homo sapiens" for the milk fraction. The L. lactis subsp. cremoris SK11 (GenBank accession number NC_009004) database was used for the bacterial samples. New modifications for aniline and benzoic acid were created in the Mascot modification file as described in supplemental Data 1.
Mascot search parameters were set as follows.
• Trypsin (KR/P) was used for aniline labeling and Arg-C (R/P) was used for benzoic acid labeling with in both cases two missed cleavages allowed.
• Peptide tolerance was 100 ppm, and MS/MS tolerance was 0.6 Da.
Sequest search parameters were set as follows.
• Trypsin (KR/P) was used for aniline labeling or "trypsin_R" (R/P) was used for benzoic acid labeling with in both cases two missed cleavages allowed.
• Peptide tolerance was 100 ppm, and fragment ion tolerance was 0.8 Da.
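To give a feeling for what a relative tolerance of 100 ppm means in absolute terms, the sketch below converts a ppm tolerance into a mass window and tests whether an observed value falls inside it. It is a generic illustration with hypothetical function names; how each search engine applies the tolerance internally is not described here.

```python
def ppm_window(theoretical, tol_ppm=100.0):
    """Lower and upper bound of the accepted mass (or m/z) window."""
    delta = theoretical * tol_ppm * 1e-6
    return theoretical - delta, theoretical + delta


def within_ppm(observed, theoretical, tol_ppm=100.0):
    """True if the relative deviation does not exceed tol_ppm."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


low, high = ppm_window(1500.0)
print(f"100 ppm at 1500 Da: {low:.2f} to {high:.2f} Da")
```

At 1500 Da a tolerance of 100 ppm corresponds to ±0.15 Da, which is well below the fixed MS/MS tolerances of 0.6 and 0.8 Da listed above.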
Data Validation and Quantification-The mzXML files, Mascot dat files, and Sequest dta/out directories were transferred to a Linux server running the Trans-Proteomic Pipeline version 2.9.8. Search results were transformed to pepXML using either Mascot2XML or Out2XML. Data were then analyzed using PeptideProphet (23) and filtered using a 0.05 error rate. Quantification was performed using both Xpress (24) and ASAPRatio (25) for the milk sample, whereas only ASAPRatio was used for the bacterial sample. Finally proteins were validated using ProteinProphet (26), again with a 0.05 error rate, and protein quantification was checked manually.

RESULTS
Principle of the ANIBAL Approach-The complete work flow is shown in Fig. 1. The ANIBAL strategy relies on a combined "symmetric" chemistry approach targeting two frequent amino acid side chains resulting in the incorporation of the same peptide-like benzoyl group into the protein as illustrated in Fig. 2. Targeting both amino and carboxylic groups enables a broad proteome coverage for relative quantification experiments. For each pair of biological conditions to be compared, two pairs of samples with equal protein amounts are prepared and immediately derivatized using the two different labels either in their light (12C) or heavy (13C) forms. Light or heavy aniline is used to derivatize all carboxylic groups of the proteins, namely Asp or Glu residues as well as the C terminus. Similarly NHSS benzoate, a carbodiimide-activated form of benzoic acid, is used to derivatize all amino groups (Lys and N terminus). After combining both mixtures for each labeling approach, proteins are digested, and peptides are analyzed by high throughput LC-MS/MS. As peptides with identical sequence derived from the two different states only differ in mass, they appear as doublets in the acquired MS spectra. Relative quantification and peptide ratios are determined by extracting an ion chromatogram for each mass of the doublets. Proteins are identified by CID MS/MS and searching protein sequence databases with different search algorithms. Finally results are integrated into one single result file containing identification as well as quantification information of peptides derived from the two different labeling strategies.
FIG. 2. Illustration of the ANIBAL strategy. A, synthesis steps and chemicals. Aniline (either 12C or 13C) is used as a reagent toward carboxylic groups. Benzoic acid (either 12C or 13C) on the other hand is used to label amino groups. To be reactive, it is first transformed into a stable reactive intermediate called NHSS benzoate. B, reaction scheme for aniline. C, reaction scheme for NHSS benzoate. DCC, N,N′-dicyclohexylcarbodiimide; SA, sulfanilic acid; RT, room temperature.
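The expected spacing of such a doublet follows directly from the number of labeled sites and the charge state. The sketch below is an illustration under stated assumptions: it uses the nominal 6-Da shift per label given in the text, counts Asp/Glu for aniline and Lys for benzoic acid, and, because labeling is performed before digestion, counts a terminus only when the peptide carries the original protein terminus. The function names and the example peptide are chosen for illustration only.

```python
LABEL_SHIFT_DA = 6.0  # nominal light/heavy mass difference per incorporated label


def count_sites(peptide, chemistry, protein_n_term=False, protein_c_term=False):
    """Number of labeled sites in a peptide for the given chemistry."""
    if chemistry == "aniline":  # carboxylic groups: Asp, Glu, protein C terminus
        return peptide.count("D") + peptide.count("E") + int(protein_c_term)
    if chemistry == "benzoic_acid":  # amino groups: Lys, protein N terminus
        return peptide.count("K") + int(protein_n_term)
    raise ValueError(f"unknown chemistry: {chemistry}")


def doublet_spacing_mz(peptide, chemistry, charge, **termini):
    """Expected m/z distance between the light and heavy peak of a pair."""
    return LABEL_SHIFT_DA * count_sites(peptide, chemistry, **termini) / charge


# Example peptide with two Glu and one Lys, observed as a doubly charged ion
print(doublet_spacing_mz("LVNELTEFAK", "aniline", charge=2))
print(doublet_spacing_mz("LVNELTEFAK", "benzoic_acid", charge=2))
```

For the doubly charged example peptide the two chemistries give spacings of 6 and 3 m/z units, respectively, so the spacing itself indicates how many labels a peptide carries.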
In Silico Survey of Amino Acid Distribution among Proteins-To assess the suitability of multiple labeling approaches in terms of proteome coverage, a survey of amino acid distribution as well as protein functional groups was performed using the human IPI database version 3.26 as a model. Table I shows the results obtained for each amino acid plus both termini. The first two columns display the total number and the mean and median number of amino acids per protein. The last two columns show the mean and median number of quantifiable peptides per protein as well as the percentage of proteins without any quantification information available again for each chemically targeted amino acid. Peptides were generated in silico with trypsin (KR/P), and a peptide was considered quantifiable if it (i) bears at least one of the desired amino acids and (ii) has a mass between 400 and 4000 Da corresponding to the analysis range of doubly/triply charged tryptic peptides accessible with modern MS instruments. In the case of labeling lysine residues, a trypsin_R (R/P) cleavage was defined instead of standard tryptic proteolysis. Table II was generated with the same rules applied as for Table I but with a focus on chemical functional groups present on proteins. The four chemically best accessible groups were considered, namely amino (NH2), carboxylic (COOH), thiol, and indole groups. Statistics were performed for each chemical side chain as well as for the combination of two, three, and all four functional groups. As in Table I, the two first columns correspond to the total number of groups and the mean and median number of groups per protein, and the last two columns indicate the numbers of mean and median quantifiable peptides per protein and the percentage of proteins without quantifiable information. Fig. 3 gives a graphical overview of the amino acid distribution (Fig. 3A) and the protein functional groups distribution (Fig. 3B).
Fig. 3A reveals that some amino acids like leucine or serine (less than 2% of proteins escape quantification, and half of the >98% quantifiable proteins have >9 quantifiable peptides) are potentially excellent targets for stable isotope labeling experiments like SILAC in which any of the 20 amino acids can be incorporated in a tagged form. However, Leu and Ser are not amenable to stable isotope labeling by chemical reaction such as ICAT or ICPL. On the other hand, cysteines, lysines, aspartates, glutamates, or tryptophans can be chemically modified and therefore are good targets but offer only partial proteome coverage when targeted individually. For example, >9% of proteins lack cysteines, and half of the remaining proteins have less than four quantifiable peptides per protein. Very similar results were obtained for lysine residues. The cysteines and lysines are the prime targets for the frequently used and established ICAT and ICPL labeling approaches. When each technique is considered separately, the numbers are not appealing especially if one considers that in silico numbers should be at least divided by a factor of 2 to correct for peptides with poor chromatographic and/or ionization properties that could not be predicted by our in silico analysis. By contrast, if amino and sulfhydryl groups are targeted in a combined chemical approach (NH2:thiol combination in Fig. 3), less than 2% of proteins without quantifiable information remain, and an average of more than 16 quantifiable peptides per protein is achieved. The ideal scenario in terms of proteome coverage would be to use all four (COOH:NH2:thiol:indole) functional groups combined, but this comes at the expense of high cost, time, and a complex work flow. Most remarkably, as depicted in Fig. 3, the combined targeting of two frequent functional groups as pursued with our ANIBAL approach (NH2:COOH) reveals a near-to-ideal in silico performance with only 0.77% of proteins remaining not quantifiable and an average of more than 28 quantifiable peptides per protein available. Therefore, the in silico analysis clearly reveals that, although each single chemical approach suffers from certain limitations due to amino acid distribution among proteins, combinations thereof show interesting complementarities and emerge as a very powerful alternative.
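The enumeration of single groups and their combinations can be illustrated as follows. This is a simplified sketch and not the analysis code of this study: the mapping of functional groups to residues (carboxylic to Asp/Glu, amino to Lys, thiol to Cys, indole to Trp) deliberately ignores the protein termini, and all names are hypothetical.

```python
from itertools import combinations

# Side chain residues carrying each functional group (termini ignored here)
GROUP_RESIDUES = {"COOH": "DE", "NH2": "K", "thiol": "C", "indole": "W"}


def group_combinations():
    """All non-empty combinations of the four functional groups."""
    names = list(GROUP_RESIDUES)
    return [combo
            for size in range(1, len(names) + 1)
            for combo in combinations(names, size)]


def targets_for(combo):
    """Residues targeted when the groups in combo are labeled together."""
    return "".join(GROUP_RESIDUES[group] for group in combo)


def has_target(peptide, combo):
    """True if the peptide carries at least one residue targeted by combo."""
    return any(aa in targets_for(combo) for aa in peptide)


for combo in group_combinations():
    print(":".join(combo), "->", targets_for(combo))
```

Four groups give 15 non-empty combinations. A peptide without Asp, Glu, or the C terminus that still contains a lysine is missed by the COOH chemistry alone but recovered by the NH2:COOH combination, which is the complementarity exploited by ANIBAL.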
Labeling of Standard Peptides and Proteins-Any protein labeling strategy has to address three critical issues: first, the completeness of the modification reaction; second, the influence of the labeling on the protein solubility; and third, the influence on the LC-MS/MS behavior (retention, ionization, and fragmentation).
To assess the completeness of the reaction for both aniline and benzoic acid, model peptides were first used for validation as already described (20). Briefly model peptides were derivatized and analyzed in short LC-MS runs to identify the different peptide derivatives after the reaction. Conditions were optimized for both labeling strategies until only a single peak corresponding to the fully derivatized peptide was obtained, indicating completeness of the reaction.
Once a quantitative reaction was achieved at the peptide level, BSA was chosen as the model protein to adapt the reaction to the protein level. To keep similar chemical ratios, the amount of reagent was adapted to the number of carboxylic or amino groups in BSA (100 × COOH and 61 × NH2). BSA was derivatized using light reagents only, digested, and analyzed by means of LC-MS. Completeness of the reaction was first assessed by performing Mascot-driven database searches in two ways: (i) aniline or benzoic acid set as fixed modifications to identify only fully derivatized peptides and (ii) aniline or benzoic acid set as variable modifications to reveal incomplete reaction through the presence of native or partially derivatized peptides. In addition to the directed Mascot searches for identified labeled peptides, corresponding native or partially derivatized peptide masses were extracted from LC-MS/MS chromatograms to check whether low amounts of native or partially labeled peptides were remaining for which no MS/MS had been acquired. For both aniline (ANI) and benzoic acid (BA) labeling approaches, exclusively fully labeled peptides and no native or partially labeled peptides were observed in the extracted ion chromatograms (data not shown).
TABLE II. Four chemically amenable amino acid side chain groups (carboxylic (COOH), amino (NH2), thiol, and indole groups) were considered as chemical targets for stable isotope labeling either alone or in combination of two, three, or four. The total frequency of each functional group is displayed for all 67,687 entries of the IPI human database version 3.26 as well as the mean (median) number of groups per protein in the first two columns. The mean (median) quantifiable peptides per protein as well as percentage of proteins without quantifiable information are shown in the last two columns.
It is important to note that any solvents, reagents, or buffers that contain free amino groups (e.g. Tris and urea) would quench the reaction for both labeling approaches and should therefore be strictly avoided. Similarly solvents, reagents, or buffers containing carboxylic groups would compete for the aniline reaction and should be avoided too. Concerning the solubility of the proteins after labeling, no precipitation of the modified protein was observed in aqueous solution.
Finally the influence of ANIBAL on LC-MS behavior was evaluated by comparing native versus labeled BSA peptides. Identical amounts of BSA were treated in parallel under identical conditions in the presence or absence of the tagging reagent (aniline or benzoic acid). After derivatization, the reaction was stopped, and samples were mixed before protein digestion. Trypsin was used in the case of aniline labeling as only carboxylic groups had been derivatized. In contrast, Arg-C was used for benzoic acid labeling as all lysines had been derivatized. If trypsin had been used in the case of benzoic acid labeling, labeled and unlabeled BSA would have generated different peptides that could not be compared. The digest was then analyzed by LC-MS (five-replicate analysis), and peptides were identified by database search with the tags defined as variable modifications (meaning that labeled and unlabeled peptides could be identified). Lastly, selected analytical parameters (Mascot score, intensity, scan number, and retention time) for 11 BSA peptides in their unlabeled and labeled forms were extracted as shown in Table III for aniline derivatization. Fig. 4B shows that similar Mascot scores were observed for both unlabeled and labeled peptides, indicating that the tagging as such does not impair the fragmentation of the peptides and therefore had no detrimental effect on the database searches. Fig. 4A illustrates the ionization efficiency of the 11 peptides in both unlabeled and labeled forms and indicates that the ionization efficiency of labeled peptides is in general unaltered, and thereby the tags do not compromise peptide/protein identification. Finally the peptide hydrophobicity was increased by the benzoyl group incorporation as observed by the prolonged retention times for ANI- and BA-derivatized peptides compared with their native counterparts (approximately 3.5 min per modified amino acid; Table III). All peptides, derivatized or not, eluted before the gradient had reached 50% acetonitrile. The increase in hydrophobicity conferred by ANI and BA tagging reveals an advantageous property of both tags, namely the retention of very short or hydrophilic peptides that normally do not bind to the column. This latter effect was also demonstrated by Julka and Regnier (27) by means of benzoyl derivatization to improve the retention of hydrophilic peptides in tryptic peptide mapping.
Dynamic Range Assessment of ANIBAL Technique on Abundant Milk Proteins-Besides the three already considered critical issues in protein labeling (reaction completeness, derivative solubility, and LC-MS/MS behavior of derivatives), a fourth quality criterion needs to be addressed for any labeling approach, namely the achievable dynamic range. For this purpose, a human milk fraction generated and analyzed in our laboratory in a previous study (22) was used. This fraction comprises a few very abundant proteins (albumin, lactotransferrin, immunoglobulins, etc.) besides many low abundant species and is thus in this regard similar to blood-derived samples such as fractions of serum or plasma.

We took advantage of these sample properties and the available information on its composition to select it as a "complex protein mixture of known proteins" and, at the same time, as a real life biological sample.

FIG. 3. Statistics of stable isotope labeling targets by in silico analysis using the IPI human database. A, each amino acid was evaluated as a potential target for stable isotope labeling, and the mean amino acid content per protein was calculated. Proteins were digested in silico with trypsin, and each peptide was submitted to a quantification test applying the following rules. A peptide was considered as quantifiable if it (i) contained at least one targeted amino acid and (ii) exhibited a mass between 400 and 4000 Da. Each of the 20 amino acids was considered as a potential target, and thereby each peptide was submitted to the test for all 20 cases. For each protein, the number of amino acids and the number of quantifiable peptides were stored in an array. Finally as shown graphically, the mean amino acid content and the mean number of quantifiable peptides per protein as well as the number of proteins without quantification were calculated. B, functional groups (carboxylic, amino, thiol, and indole groups) instead of amino acids were assessed in this case either alone or in combination of two, three, or even four. Processing was identical to that in A. Nter, N terminus; Cter, C terminus.
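The quantifiability test described in the legend of Fig. 3 can be sketched as follows. This is a minimal illustration and not the software used in the study: the protein sequence is invented, only side chains are considered (the N and C termini are ignored), missed cleavages are not modeled, the rule of no cleavage before proline is a common convention assumed here, and all function names are hypothetical.

```python
# Minimal sketch of the in silico quantifiability test (Fig. 3 legend).
# Assumptions: invented sequence, side chains only, no missed cleavages.

# Monoisotopic amino acid residue masses (Da)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the free termini


def digest(sequence, cleave_after="KR"):
    """Cleave C-terminal to the given residues, except before proline."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in cleave_after and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def quantifiable_peptides(protein, targets, min_mass=400.0, max_mass=4000.0):
    """Peptides that (i) contain a targeted residue and (ii) fall in the mass window."""
    return [p for p in digest(protein)
            if any(aa in targets for aa in p)
            and min_mass <= peptide_mass(p) <= max_mass]


PROTEIN = "ADEKLMRGTLDWERSVLGK"  # invented example sequence
CARBOXYL, AMINO = set("DE"), set("K")

for label, targets in [("carboxyl", CARBOXYL), ("amino", AMINO),
                       ("both", CARBOXYL | AMINO)]:
    print(label, quantifiable_peptides(PROTEIN, targets))
```

For this toy sequence, each functional group alone yields two quantifiable peptides, whereas targeting both groups yields three, which mirrors the gain from combining targets that Fig. 3B reports at database scale.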
For both labeling approaches, the same protein amount was derivatized separately with light and heavy ANI and BA tags, respectively. The reaction conditions were adapted according to an in silico prediction of the human IPI protein database for which the mean molecular mass (approximately 48 kDa) and the number of amino (approximately 25 per protein) and carboxylic (approximately 50 per protein) groups were determined. After labeling, light and heavy labeled fractions were mixed, and proteins were precipitated and digested with trypsin. Finally different dilutions were performed to obtain ratios ranging from 0.05 to 20 (light versus heavy). Dilutions were performed after digestion to avoid precipitation problems due to different protein amounts that would arise if dilution was performed directly after labeling. Peptide mixtures were analyzed by LC-MS, and proteins were quantified using both the Xpress (24) and the ASAPRatio (25) tools from the Trans-Proteomic Pipeline (Institute for Systems Biology). Six of the most abundant proteins were assessed for the quantification, and each peptide spectrum was checked manually to ensure the reliability of both methods regarding dynamic range. Fig. 5 illustrates the results obtained for these six proteins over the assessed dynamic range (1:20 to 20:1 light versus heavy). Fig. 5, A and B, represent results for aniline labeling obtained with Xpress and ASAPRatio, respectively; Fig. 5, C and D, report on the benzoic acid labeling approach. Logarithmic curves clearly show that both the ANI and BA methods are linear over 2 orders of magnitude (error <15% between relative ratios of 0.1 and 10). For ratios below 0.05 or above 20, Xpress and ASAPRatio could no longer calculate the protein ratios precisely because the mass signal of the smaller peak in each peptide pair was difficult to acquire accurately.
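The linearity criterion stated above (error below 15% for ratios between 0.1 and 10) can be checked with a few lines of code. The sketch below is illustrative only: the expected/observed pairs are invented placeholders rather than measured values, and the function names are hypothetical.

```python
import math


def relative_error(expected, observed):
    """Fractional deviation of an observed ratio from its expected value."""
    return abs(observed - expected) / expected


def linear_within(pairs, tolerance=0.15, low=0.1, high=10.0):
    """True if every pair whose expected ratio lies in [low, high]
    deviates from it by less than the tolerance."""
    return all(relative_error(e, o) < tolerance
               for e, o in pairs if low <= e <= high)


# Hypothetical (expected, observed) light/heavy ratios, NOT measured data
pairs = [(0.05, 0.08), (0.1, 0.11), (1.0, 1.03), (10.0, 9.1), (20.0, 14.0)]

# Log-transformed values correspond to the axes of a log-log dilution plot
for expected, observed in pairs:
    print(f"log10 expected {math.log10(expected):+.2f}  "
          f"log10 observed {math.log10(observed):+.2f}  "
          f"error {relative_error(expected, observed):.0%}")

print("linear between 0.1 and 10:", linear_within(pairs))
```

Widening the window to the full 0.05 to 20 range (`low=0.05, high=20.0`) makes the check fail for these placeholder values, which is the kind of behavior described for the extreme ratios.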
Moreover and logically, the quantification accuracy depended on the ionization efficiency. Peptides with high ion intensities are easier to quantify over a broad dynamic range. However, the quantitative information for peptide pairs showing a 20-fold up- or down-regulation in either one of the conditions is not lost because the more abundant labeled peptide is nevertheless detected as a singlet. The difficult task in this latter case, however, is to determine whether the identification is a true positive with a 20-fold up- or down-regulation explaining the singlet or whether it is a false-positive peptide with a sequence containing a labeled amino acid and for which no labeled peptide partner is detectable in this case.

Eleven peptides of BSA are displayed with their respective Mascot score, intensity, and retention time (RT) in their native (white) or aniline-labeled (gray) form. Differences in retention time between unlabeled and labeled peptides as well as number of carboxylic groups (equals number of modifications) are shown. Periods in sequences represent cleavage sites.
Table IV summarizes all peptides considered for identification and quantification of lactotransferrin at a light-to-heavy ratio of 1 using ANIBAL. It shows individual ratios obtained for both tagging reagents as well as the mean ratio with the S.D. for the protein. Two observations can be made. First, the ratios of the 19 identified isotopically tagged peptides (nine for aniline and 10 for benzoic acid) (indicated with an asterisk (*)) are all close to the expected ratio of 1 with a mean Xpress and ASAPRatio of 1.01 ± 0.12 and 1.03 ± 0.09, respectively. Second, the sets of identified peptides obtained by either ANI or BA chemistry undoubtedly reveal the complementarity of both methods because ANI and BA applied in parallel clearly increase the overall protein coverage (>56% (401 amino acids) for a 78-kDa protein) as well as improve the quantification statistics. The advantages are mainly due to the different labeled amino acids and to the different cleavage sites obtained with trypsin, e.g. free lysines with ANI labeling and blocked lysines with BA labeling.
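The protein-level value reported as "mean ratio with S.D." can be illustrated by pooling the peptide ratios from both tagging reactions. This is a sketch under stated assumptions: the peptide ratios are invented, the unweighted mean and sample standard deviation are assumed for simplicity (Xpress and ASAPRatio apply their own procedures), and `protein_ratio` is a hypothetical helper.

```python
from statistics import mean, stdev


def protein_ratio(peptide_ratios):
    """Unweighted mean and sample S.D. of peptide light/heavy ratios."""
    return mean(peptide_ratios), stdev(peptide_ratios)


# Hypothetical peptide ratios for one protein, NOT values from Table IV
aniline_ratios = [0.95, 1.10, 1.02]
benzoic_acid_ratios = [0.90, 1.05, 0.98]

# Pooling both labelings increases the number of peptides behind the estimate
pooled_mean, pooled_sd = protein_ratio(aniline_ratios + benzoic_acid_ratios)
print(f"protein ratio {pooled_mean:.2f} +/- {pooled_sd:.2f} "
      f"from {len(aniline_ratios) + len(benzoic_acid_ratios)} peptides")
```

Pooling in this way is what makes the two chemistries complementary for quantification statistics: a protein with few peptides from one labeling still gains measurements from the other.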
Analysis of the L. lactis Proteome in a 1:1 Ratio-Finally as the main proof-of-principle experiment, the ANIBAL method was applied to the soluble fraction of a total cell lysate of L. lactis. Two aliquots of 100 μg were derivatized with ANIBAL in a 1:1 ratio, that is 50 μg versus 50 μg for light and heavy aniline and 50 μg versus 50 μg for light and heavy benzoic acid. The light and heavy labeled reaction mixtures of each derivatization were subsequently combined, digested, and analyzed by LC-MS/MS running a 90-min gradient. For each branch of the ANIBAL work flow (see Fig. 1), data were extracted, database searches were performed with Mascot and Sequest with their corresponding isotopic tags, and peptides were quantified and validated using Xpress, ASAPRatio, and PeptideProphet (23). Peptide results from both labeling and search procedures were combined, protein ratios were calculated, and protein probabilities were computed with ProteinProphet (26) to yield the final ANIBAL result. A 5% false-positive rate was accepted for protein identification, and quantifiable proteins with this 95% identification confidence were further assessed with regard to the calculated ratio. Applying a single 90-min LC gradient (5-50% acetonitrile, 0.1% formic acid), a total number of 188 proteins (43 with a single peptide hit) were identified of which 101 revealed quantitative information with an average of 3.3 quantifiable peptides per protein. Fig. 6 displays the presence or absence of the 188 identified proteins with regard to identification alone or identification plus quantification achieved by the two labeling strategies of the ANIBAL approach. Fig. 6, A, B, and C, highlight the presence/absence of proteins identified with the combined ANIBAL approach (100% of the identifications corresponding to 188 proteins; Fig. 6A), with aniline labeling alone (Fig. 6B), and with benzoic acid labeling alone (Fig. 6C). Fig. 6, D, E, and F, show the presence/absence of quantifiable proteins with the combined ANIBAL approach (Fig. 6D), with the aniline labeling alone (Fig. 6E), and with the benzoic acid labeling alone (Fig. 6F). The complementarity of both tagging methods is evident with the aniline approach resulting in overall more protein identifications and quantifications. The bias toward the aniline approach can be partly explained by the fact that tryptic cleavage generates more peptides in the case of aniline labeling (cleavage after K and R) than in the case of benzoic acid derivatization (cleavage after R only), thus yielding more peptides amenable to analysis. Regarding quantification, the better yield for ANI compared with BA may be explained by the more frequent carboxylic groups compared with amino groups (on average 2-fold more COOH than NH2 groups per protein). This experimental finding confirms the in silico prediction of achievable quantification coverage as discussed previously (Table II).
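The effect of lysine blocking on tryptic digestion can be made concrete with a small in silico comparison. The sketch assumes a simplified trypsin model (cleavage C-terminal to the listed residues, none before proline, no missed cleavages), an invented sequence, and a hypothetical `digest` helper; it is meant only to show why aniline-labeled proteins give more peptides than benzoylated ones.

```python
def digest(sequence, cleave_after):
    """Cleave C-terminal to the given residues, except before proline."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in cleave_after and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


PROTEIN = "ADEKLMRGTKPWERSVK"  # invented example sequence

# Aniline labeling modifies carboxylic groups: lysines stay free (K and R sites)
ani_peptides = digest(PROTEIN, cleave_after="KR")

# Benzoic acid labeling blocks lysine amino groups: only R sites remain
ba_peptides = digest(PROTEIN, cleave_after="R")

print("ANI:", ani_peptides)
print("BA: ", ba_peptides)
```

With this toy sequence the K/R rule gives four peptides and the R-only rule gives three longer ones, so the two digests differ in both number and identity of peptides.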
As equal amounts of L. lactis protein were derivatized with the light and heavy tag, all labeled peptide pairs, and consequently all protein ratios, should yield a light-to-heavy ratio of 1:1. Fig. 7 plots all quantified proteins with their respective ratios as calculated by ASAPRatio. All determined ratios are close to the expected 1:1 ratio with an average ratio for all proteins of 1.03 ± 0.16 representing a mean difference between expected and observed ratios of less than 17% using automatic data processing tools.

DISCUSSION

We have developed and validated a novel approach based on twin labeling for global identification and quantification of proteins in complex mixtures. The ANIBAL concept is based on separate and complementary labeling of proteins at their carboxylic and amino groups using two very similar tags in a symmetrical chemical reaction. Both reactions are based on carbodiimide chemistry to activate carboxylic groups and result in an amide bond formation with primary amino groups. For the aniline reaction, protein carboxylic groups are activated and react with the amino group of aniline present in excess to avoid any competitive reaction due to protein amino groups. In the benzoic acid method, the tag itself contains the carboxylic group, is activated, and reacts with primary amino groups of the proteins. Both reactions result in peptide bond formation between the benzoyl reagents and the protein carboxylic or amino group, thus avoiding the introduction of any new chemical bond in the protein.
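The mass changes implied by the two amide-forming reactions can be worked out from elemental compositions. The following sketch assumes that each reaction is a simple condensation with loss of one water molecule and that the heavy reagents carry six 13C atoms (inferred from the 6-Da spacing, not stated explicitly here); the atomic masses are standard monoisotopic values, and the helper names are hypothetical.

```python
# Monoisotopic atomic masses (Da)
ATOM = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
C13_MINUS_C12 = 1.0033548  # mass difference between 13C and 12C


def mass(formula):
    """Monoisotopic mass from an element -> count mapping."""
    return sum(ATOM[element] * count for element, count in formula.items())


WATER = mass({"H": 2, "O": 1})
ANILINE = mass({"C": 6, "H": 7, "N": 1})       # C6H5-NH2
BENZOIC_ACID = mass({"C": 7, "H": 6, "O": 2})  # C6H5-COOH

# Condensation: reagent mass is added, one water is lost per amide bond
ani_shift = ANILINE - WATER       # per derivatized carboxylic group
ba_shift = BENZOIC_ACID - WATER   # per derivatized amino group

# Assumption: heavy reagents contain six 13C atoms
heavy_offset = 6 * C13_MINUS_C12


def pair_spacing_mz(n_tags, charge):
    """m/z distance between light and heavy forms of a peptide."""
    return n_tags * heavy_offset / charge


print(f"ANI shift {ani_shift:.4f} Da, BA shift {ba_shift:.4f} Da")
print(f"heavy/light offset per tag {heavy_offset:.4f} Da")
print(f"doubly charged peptide, one tag: {pair_spacing_mz(1, 2):.4f} m/z")
```

Under these assumptions a peptide carrying n tags shows a light/heavy spacing of about 6n Da, which becomes 6n/z on the m/z axis for charge state z.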
The peptide ionization efficiency is preserved if not increased. A retardation of the peptide elution time is observed and may help to retain very hydrophilic peptides on reversed-phase material as already shown by Julka and Regnier (27). Performance assessment with peptide and protein standards has demonstrated a quantitative, robust, and reproducible chemistry. The successful relative quantification experiment performed with a bacterial sample has confirmed the applicability of the ANIBAL technique to a complex sample.
Differential isotope incorporation by chemical labeling was first introduced in 1999 by Gygi et al. (5) with ICAT. Meanwhile other approaches have been developed like ICPL or iTRAQ (9, 10). These approaches rely on a chemical reaction with often harsh conditions (compared with "soft" metabolic labeling like SILAC (6)) for the protein sample. Nevertheless chemical labeling is nowadays the most frequently used technique in quantitative proteomics mainly because of its applicability to any biological sample (cells, tissues, body fluids, etc.), its selectivity toward certain amino acids, and the possible peptide enrichment thanks to affinity tags. However, as demonstrated by our in silico analysis, each of these approaches suffers from limited proteome coverage in terms of availability of accessible groups, resulting in a significant number of proteins with a limited number of quantifiable peptides or even without quantifiable information. This constraint becomes even more important at the peptide level, where LC and MS amenability differ widely between peptides. Therefore, a labeling strategy that would increase the chances of labeling a "proteotypic" peptide (28), i.e. one that exhibits the best LC-MS response and functions as a unique identifier for a given protein, would be of great value at least until species-specific proteotypic peptide databases are established and commercial kits are available for spiking these peptides into samples for absolute quantification at proteomic scale. The ANIBAL approach is a first attempt to fill this gap and take chemical labeling to a new level of proteome coverage and simplicity of procedure especially by also targeting carboxylic groups that have been resistant so far to general derivatization chemistry using a 13C-based reagent. As for phosphopeptide enrichment where laboratories use several strategies to increase proteome coverage (29), the same should be applied to any stable isotope labeling experiment. However, the costs of such experiments are often not considered, although they are a limiting factor in many laboratories. Commercial kits like ICAT or ICPL are expensive and therefore not affordable for many laboratories. One objective of the ANIBAL method was to overcome this limitation by using cheaper reagents, thereby drastically decreasing the costs per sample to approximately $5 per experiment with aniline and $15 for benzoic acid, resulting in a total of approximately $20 for each combined ANIBAL experiment in which two isotope labelings are performed.

FIG. 5. Dynamic range assessment in the RAM human milk fraction for aniline and benzoic acid labeling using Xpress and ASAPRatio as automated quantification tools. Theoretical (black) and experimental curves (dotted) for ANI labeling obtained for ratios spanning from 0.05 to 20 using six abundant proteins in the RAM human milk fraction using Xpress (A) or ASAPRatio (B) are given. Theoretical (black) and experimental curves (dotted) for the BA labeling obtained for ratios spanning from 0.05 to 20 using six abundant proteins in the RAM human milk fraction using Xpress (C) or ASAPRatio (D) are given.
In summary, our ANIBAL approach exhibits the following features: (i) a reaction that is simple, straightforward, and quantitative; (ii) symmetrical twin chemistry targeting protein amino and carboxylic groups with identical peptide bond-linked benzoyl modification; (iii) convenient peak spacing (6 Da) and no chromatographic heavy/light discrimination (13C reagents); (iv) high proteome coverage through targeting of two frequent protein functionalities; (v) identical MS and MS/MS behavior; (vi) increased retention time of modified peptides for binding of very hydrophilic peptides; and (vii) inexpensive reagents.

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

FIG. 6. Coverage of the 188 identified proteins with regard to being identified only or both identified and quantified by the two labeling strategies of the ANIBAL approach. A, all proteins identified using the ANIBAL approach. B, proteins identified using aniline labeling alone. C, proteins identified using benzoic acid labeling alone. D, proteins identified and quantified by ANIBAL. E, proteins identified and quantified with aniline labeling alone. F, proteins identified and quantified using benzoic acid labeling only. Proteins shared by the two labeling strategies are displayed in white boxes; those unique to one strategy are shown in black boxes.
FIG. 7. Ratio distribution for the 101 proteins identified and quantified using ANIBAL on L. lactis. Light-to-heavy (L/H) ratios for each of the 101 proteins identified, for which ratios were manually validated, are shown using a logarithmic scale. The mean ratio among the 101 proteins with S.D. is displayed as well as the mean number of quantifiable peptides per protein.