Mascot File Parsing and Quantification (MFPaQ), a New Software to Parse, Validate, and Quantify Proteomics Data Generated by ICAT and SILAC Mass Spectrometric Analyses

Proteomics strategies based on nanoflow (nano-) LC-MS/MS allow the identification of hundreds to thousands of proteins in complex mixtures. When combined with protein isotopic labeling, quantitative comparison of the proteome from different samples can be achieved using these approaches. However, bioinformatics analysis of the data remains a bottleneck in large scale quantitative proteomics studies. Here we present a new software named Mascot File Parsing and Quantification (MFPaQ) that easily processes the results of the Mascot search engine and performs protein quantification in the case of isotopic labeling experiments using either the ICAT or SILAC (stable isotope labeling with amino acids in cell culture) method. This new tool provides a convenient interface to retrieve Mascot protein lists; sort them according to Mascot scoring or to user-defined criteria based on the number, the score, and the rank of identified peptides; and to validate the results. Moreover the software extracts quantitative data from raw files obtained by nano-LC-MS/MS, calculates peptide ratios, and generates a non-redundant list of proteins identified in a multisearch experiment with their calculated averaged and normalized ratio. Here we apply this software to the proteomics analysis of membrane proteins from primary human endothelial cells (ECs), a cell type involved in many physiological and pathological processes including chronic inflammatory diseases such as rheumatoid arthritis. We analyzed the EC membrane proteome and set up methods for quantitative analysis of this proteome by ICAT labeling. EC microsomal proteins were fractionated and analyzed by nano-LC-MS/MS, and database searches were performed with Mascot. Data validation and clustering of proteins were performed with MFPaQ, which allowed identification of more than 600 unique proteins. 
The software was also successfully used in a quantitative differential proteomics analysis of the EC membrane proteome after stimulation with a combination of proinflammatory mediators (tumor necrosis factor-α, interferon-γ, and lymphotoxin α/β) that resulted in the identification of a full spectrum of EC membrane proteins regulated by inflammation.

Molecular & Cellular Proteomics 6:1621-1637, 2007.
In recent years, nanoflow (nano-) LC-MS/MS has emerged as an efficient alternative to two-dimensional electrophoresis in the field of proteomics. This technology has proved to be a powerful method for the identification of proteins in complex mixtures and has been applied to characterize the proteome of several organisms, organelles, and multiprotein complexes. Moreover many developments have been made to use nano-LC-MS/MS-based strategies in differential proteomics studies to compare the proteome of two or more samples in a quantitative or semiquantitative way. Although recent approaches use direct comparison of the MS peptide signals from independent nano-LC-MS/MS runs (1-5), most studies up to now have used quantitative methods based on isotopic labeling of proteins or peptides combined with nano-LC-MS/MS analyses (6,7). In these approaches, light and heavy isotopic labels are introduced into the proteins from the different samples to be compared. The samples are then mixed together, and a single nano-LC-MS/MS analysis is run. The relative abundance of a given protein can then be deduced from the ion signal intensity ratio calculated for light/heavy peptide pairs from this protein. This leads to a more accurate relative quantification of the proteins from the samples to be compared because the samples are analyzed simultaneously in a single nano-LC-MS/MS run. In the ICAT method, proteins are chemically labeled on cysteines with a biotinylated heavy or light reagent (8,9), whereas in the SILAC (stable isotope labeling with amino acids in cell culture) method, the label is introduced during protein synthesis by growing cells in a medium containing a heavy or light amino acid (10). 
In these strategies, systematic identification of as many proteins as possible is usually performed by nano-LC-MS/MS analysis with shotgun approaches involving prefractionation of the proteins or the peptides, and correctly assigned proteins can be quantified afterward on the basis of the MS signal of the corresponding peptides. This in turn leads to the production of a huge amount of MS/MS and MS spectra that must be handled for identification and quantification, thus necessitating appropriate bioinformatics tools.
Data analysis and validation of the results from MS/MS searches have become major issues of mass spectrometry-based proteomics, and considerable effort is devoted to providing efficient tools for evaluating and organizing data. Although Mascot (11) and Sequest (12) remain the two reference programs that are widely used for protein identification from MS/MS data, the protein matching lists that they return still contain false positives and miss some true identifications (false negatives). To improve the reliability of results, new search engines and scoring techniques were recently developed. These include the S-score (13), the PeptideProphet and ProteinProphet programs (14-16), and the new Phenyx search engine based on the OLAV algorithm (17). Other tools and methods aiming at facilitating the validation and handling of Mascot results include MSQuant (18) and STEM (19). Here we describe a new program named Mascot File Parsing and Quantification (MFPaQ) that allows fast and user-friendly verification of Mascot result files as well as data quantification from an experiment performed by isotopic labeling using either ICAT or SILAC methods.
This software provides an interactive interface with Mascot results. It is based on three modules, the Mascot File Parser module, the quantification module, and a third module designed for differential analysis in which validated protein lists are compared.
The capabilities of the MFPaQ software are illustrated by the analysis of the results from a nano-LC-MS/MS proteomics study of membrane proteins from primary human endothelial cells (ECs). ECs, which form a monolayer lining all blood vessels, play a key role in diverse physiological and pathological processes, including chronic inflammatory diseases such as rheumatoid arthritis, in which they are involved in the regulation of leukocyte extravasation, angiogenesis, cytokine production, protease and extracellular matrix synthesis, antigen presentation, vasodilatation, and blood vessel permeability (20,21). In this study, we sought to better characterize the membrane proteome of human ECs and to set up methods for quantitative analysis of this proteome by ICAT labeling. Microsomes from ECs were fractionated by 1D SDS-PAGE, the resulting gel slices were analyzed by nano-LC-MS/MS, and database searches were performed using Mascot. Data validation and clustering of proteins were performed with MFPaQ, which allowed the identification of more than 600 unique proteins. The software was then successfully used to perform quantification of proteins from a 1:1 heavy/light c-ICAT labeling test experiment. Finally we stimulated human ECs with a combination of key proinflammatory cytokines, TNF-α, IFN-γ, and lymphotoxin α/β, and performed a differential proteomics analysis using the ICAT method. The validated results obtained using MFPaQ software allowed the identification of 44 EC membrane proteins regulated by inflammation.

MFPaQ Details and System Requirements-MFPaQ is a Web-based application that runs on a server on which Mascot Server 2.1 and Perl 5.8 must be installed as well. It functions with an Internet Information Services Web Server under Windows XP Pro edition and Windows 2003 Server. Scripts are written in Perl and use the modules XML-Simple, Spreadsheet-WriteExcel, and GD. The user interface is accessible via a Web browser: Microsoft Internet Explorer and Mozilla Firefox are currently compatible with the application. Proteomics data (protein and peptide identifications, validated protein lists, and quantification results) are stored in the XML file format. To perform quantification, an external module called "Extract Daemon" has been developed for extracting intensity values from raw data. This module was developed in Visual Basic .NET and currently works with ".wiff" files acquired on a QStar XL or QStar Elite instrument (Applied Biosystems, Foster City, CA). It must be installed on the same server as MFPaQ, on which Analyst QS 1.1 or Analyst QS 2.0 must also be installed. Two versions of the application, compatible with these corresponding versions of Analyst QS, are freely available at mfpaq.sourceforge.net. Although Mascot 2.1 and Analyst QS are necessary to process and quantify new data with MFPaQ, the application can be installed alone and is able to display all detailed protein lists and peptide information presented in the results section. MS/MS spectra for all assigned peptide sequences can be viewed if Mascot has been installed on the same computer.
Purification of Microsomes-Cells were washed with PBS and collected with a cell scraper in 0.25 M sucrose, 10 mM Hepes, 2 mM MgCl2, pH 7.6, supplemented with protease inhibitors (Complete, Roche Applied Science). Cell lysis was performed with an Ultra-Turrax homogenizer, and the resulting homogenate was centrifuged for 10 min at 800 × g to remove nuclei and cell debris. The postnuclear supernatant was centrifuged for 10 min at 10,000 × g, resulting in a pellet enriched in mitochondria that was not analyzed. The supernatant was centrifuged at 200,000 × g for 45 min, and the microsomal pellet was washed by resuspension in 100 mM Na2CO3, pH 12, to remove soluble contaminants and centrifuged again at 200,000 × g for 45 min. The washed pellet was solubilized in 50 mM Tris, 6 M urea, 0.5% SDS, pH 8.3. Protein concentration was determined with the reductant compatible-detergent compatible assay (Bio-Rad).
c-ICAT Labeling-Microsomal proteins (96 μg) in 50 mM Tris, 6 M urea, 0.5% SDS, pH 8.3, were reduced with tris(2-carboxyethyl)phosphine HCl (0.1 mmol) for 2 h at room temperature and labeled with one unit of heavy or light c-ICAT (Applied Biosystems) for 3 h at room temperature in the dark. The reaction was stopped by adding Laemmli buffer to the samples, resulting in a 25 mM DTT final concentration. Samples labeled with the heavy or light reagent were then mixed and loaded on a 1D SDS-PAGE gel to fractionate the protein mixture and eliminate excess ICAT reagent.
Analysis by 1D Gel/Nano-LC-MS/MS-Microsomal proteins were fractionated on a 1D SDS-PAGE gel (1.5 mm × 8 cm), the gel was briefly stained with Coomassie Blue, and the entire migration lane was cut into 20 homogeneous gel slices. Gel slices were washed and digested with modified sequencing grade trypsin (Promega, Madison, WI), and resulting peptides were extracted. For unlabeled proteins, extracted peptides were directly analyzed by nano-LC-MS/MS. For the ICAT labeling experiment, labeled peptides were purified by affinity chromatography on a monomeric avidin cartridge according to the manufacturer's protocol (Applied Biosystems, Framingham, MA). Peptides were eluted from the cartridge with 30% ACN, 0.4% TFA in H2O and dried down in a SpeedVac, and the cleavable biotin moiety of the labeling reagent was then submitted to acid hydrolysis according to the manufacturer's protocol. Resulting peptides were analyzed by nano-LC-MS/MS using an LC Packings system (Dionex, Amsterdam, The Netherlands) coupled to a QStar XL mass spectrometer (Applied Biosystems). Dried peptides were reconstituted in 12 μl of solvent A′ (5% ACN, 0.05% TFA in HPLC-grade water), and 6 μl were loaded onto a precolumn (300-μm inner diameter × 5 mm) using the Switchos unit of the LC Packings system, delivering a flow rate of 20 μl/min solvent A′. After desalting for 7 min, the precolumn was switched on line with the analytical column (75-μm inner diameter × 15-cm PepMap C18) equilibrated in 95% solvent A (5% ACN, 0.1% formic acid in HPLC-grade water) and 5% solvent B (95% ACN, 0.1% formic acid in HPLC-grade water). Peptides were eluted from the precolumn to the analytical column and then to the mass spectrometer with a gradient from 5 to 50% solvent B (during either 60 or 80 min) at a flow rate of 200 nl/min delivered by the Ultimate pump. The QStar XL was operated in information-dependent acquisition mode with the Analyst QS 1.1 software. 
MS and MS/MS data were recorded continuously with a 5-s cycle time. Within each cycle, MS data were accumulated for 1 s over the mass range m/z 300-2000 followed by two MS/MS acquisitions of 2 s each on the two most abundant ions over the mass range m/z 80-2000. Dynamic exclusion was used within 60 s to prevent repetitive selection of the same ions. Collision energies were automatically adjusted according to the charge state and mass value of the precursor ions. The MS to MS/MS switch threshold was set to 10 cps.
Database Searching-The Mascot Daemon software (version 2.1.6) was used to automatically extract peak lists from Analyst QS .wiff files and to perform database searches in batch mode with all the .wiff files acquired on each gel slice. For creation of the peak lists, the default charge state was set to 2+, 3+, and 4+. MS and MS/MS centroid parameters were set to 50% height percentage and a merge distance of 0.1 amu. All peaks in MS/MS spectra were conserved (threshold intensity set to 0% of highest peak). For MS/MS grouping, the following averaging parameters were selected: spectra with fewer than five peaks or precursor ions with less than 5 cps or more than 10,000 cps were rejected, the precursor mass tolerance for grouping was set to 0.1 Da, the maximum number of cycles per group was set to 10, and the minimum number of cycles per group was set to 1. MS/MS data were searched against all entries in the public database UniProt version 8.1, which consists of Swiss-Prot Protein Knowledgebase Release 50.1 and TrEMBL Protein Database Release 33.1 (3,192,898 entries in total), using the Mascot search engine (Mascot Daemon, version 2.1.6; Matrix Science, London, UK). To evaluate the false positive rate in these large scale experiments, we repeated the searches using identical search parameters and validation criteria against a random database. The database was the compilation of UniProt Swiss-Prot and UniProt TrEMBL databases (same versions described above) in which the sequences have been reversed. Oxidation of methionine was set as a variable modification for all Mascot searches, and for ICAT labeling experiments, alkylation of cysteine with light 12C c-ICAT and with heavy 13C c-ICAT were also set as variable modifications. Specificity of trypsin digestion was set for cleavage after Lys or Arg, and two missed trypsin cleavage sites were allowed. The peptide MS and MS/MS tolerances were set to 0.15 and 0.25 Da, respectively.
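The reversed-sequence database described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names and the `REV_` header prefix are hypothetical, and the sketch assumes a plain FASTA text input.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def reverse_database(records, prefix="REV_"):
    """Return decoy records: each sequence reversed, each header tagged.

    The prefix is a hypothetical convention so decoy hits can be told apart.
    """
    return [(prefix + header, seq[::-1]) for header, seq in records]


example = ">P1 test protein\nMKT\nAYR\n>P2\nACDE\n"
decoys = reverse_database(parse_fasta(example))
```

Searching such a decoy set with the same parameters and validation criteria gives a count of hits that can only be random matches, which is what the false positive estimates in the results rely on.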

MFPaQ Features
MFPaQ is a software tool that facilitates organization, mining, and validation of Mascot results and offers different functionalities to work on validated protein lists. A schematic overview of the program is given in Fig. 1. The software is organized around a core module, the Mascot File Parser ("MFP") module that extracts data from Mascot result files (.dat) and allows the user to browse, validate, and cluster the results. The MFP module stores protein and peptide lists in .xml files that can be used by the "differential analysis" module to compare the lists of proteins from two or more experiments and by the "quantification" module to compute the ratios of the proteins in an isotopic labeling experiment (ICAT or SILAC). The software is a Web-based application that runs on a server (where Mascot Server is installed and the .dat result files are generated). It can be accessed by different users via a Web browser. Each user can create his own profile by defining several criteria that will be used by the MFP module to validate the proteins extracted from Mascot files. User profiles and criteria can be modified and saved at any time to perform another extraction using different criteria. Each user works under a personal session in which he can create and store experiments.

Description of the MFP Module
A first module, the Mascot File Parser, performs validation and classification of the proteins from a result data file according to Mascot scoring or according to user-defined criteria based on the number, score, and rank of identified peptides. This module offers the user a convenient interface to manually validate or reject ambiguous identifications. It can also group identical or highly homologous proteins from several result data files to eliminate redundancy and to provide a global and relevant list of the proteins present in the sample. The use of the MFP module consists of three main steps, detailed below, corresponding to the extraction of Mascot files in batch mode, protein validation, and generation of protein lists.
Extraction of Mascot Files in Batch Mode-The MFP module offers the possibility to create an "experiment" corresponding to the extraction of one or several Mascot result files (.dat files). Depending on how the shotgun analysis of a protein sample is conducted, it may be relevant to perform either a single Mascot search or several searches for this sample. For example, if a whole complex protein mixture is enzymatically digested and the resulting peptides are fractionated using chromatography (e.g. on a strong cation exchange column), each peptide fraction will then be analyzed by nano-LC-MS/MS, and different peptides belonging to the same protein will be analyzed in several of these nano-LC-MS/MS runs. In this case, making a unique peak list from all the MS/MS scans acquired in all the runs will be necessary to identify efficiently the proteins in a single Mascot database search. Conversely if the protein mixture is fractionated first (e.g. in a series of 1D gel slices) and each protein fraction is digested, then peptides from each fraction will be analyzed by nano-LC-MS/MS, and all the peptides from a protein will be analyzed in the same run. In that case, several Mascot database searches should be performed with the different peak lists obtained from the nano-LC-MS/MS runs, and the different protein lists obtained should be gathered afterward to avoid erroneous assignments of MS/MS spectra acquired in one fraction to a protein present in another fraction. In this way, no information is lost in the identification process, and the physicochemical properties of the proteins that were used to perform fractionation in the first step (e.g. molecular weight in the case of 1D SDS-PAGE separation) may represent an additional parameter of interest for the validation of protein identification. 
For example, in the case of an ambiguous identification by Mascot, a strong discrepancy between the theoretical molecular weight of the predicted protein and the experimental molecular weight corresponding to the gel slice on which the analysis was performed can be used as a criterion by the user to reject the identification. In both cases, MFPaQ provides a clear interface for visualizing, mining, and organizing the results of a multisearch experiment. The software extracts in batch mode the data contained in a series of Mascot .dat files specified by the user under an experiment and displays a table with links to a validation window for each of these searches as illustrated in Fig. 2A.
Validation of Proteins-The MFP module extracts protein entries from Mascot files and can rank them according to either the Mascot "Standard scoring" or "MudPIT scoring." To facilitate manual validation, the software applies to the proteins of the list a two-color code related to filtering rules defined by the user in the user's configuration profile. Proteins that pass the "validation criteria" are displayed in green. They can be considered as confident hits that do not need further verification and will automatically be checked in the validation window. Proteins that meet the "exclusion criteria" are discarded and are not displayed in the list. All other proteins, which are considered as ambiguous identifications, appear in red and can be manually verified by the user. The filtering rules used for the classification of a protein in green and red are based either on the protein score defined in Mascot or on multiple criteria related to the peptide matches (sequence interpretation of an MS/MS spectrum) assigned to this protein. In the first case, the software basically displays in green the "significant hits" list given in the Mascot Peptide summary report. Mascot uses the probability-based Mowse algorithm to calculate ion scores, defined as −10 × log(p) where p is the probability that the observed match for this ion is a random event. Protein scores are derived from ion scores as a non-probabilistic basis for ranking protein hits and are computed differently in Standard scoring and MudPIT scoring. The significant hits list given by Mascot contains the proteins with total scores higher than the significance threshold, which depends on the database size and is calculated by default so that the probability of a match occurring at random is less than 5% (p < 0.05). However, some false positives are clearly present in this list, and some true identifications are missing from it (false negatives). 
Although Mascot still appears to be at the moment one of the most efficient search engines, this lack of specificity has prompted a lot of efforts from several groups to set up more reliable scoring systems. Although promising, these systems will need further validation and are not yet of general use. Therefore, manual validation is still often performed by many users at least for borderline proteins around the Mascot threshold. The MFP module is very helpful for this process because it can classify the proteins according to more or less stringent criteria based on the number, the rank, and the score of the peptide matches assigned to a protein.
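The relation between ion score and probability given above can be made concrete with a short Python sketch. It assumes the logarithm is base 10, as in Mascot's documented scoring, and it does not model how the significance threshold depends on database size; the function names are illustrative.

```python
import math


def ion_score(p):
    """Ion score for a random-match probability p: -10 * log10(p)."""
    return -10.0 * math.log10(p)


def probability(score):
    """Inverse of ion_score: probability corresponding to a given score."""
    return 10.0 ** (-score / 10.0)


def threshold_shift(p_from, p_to):
    """Score units added when tightening the significance level.

    Assumes everything else (database size, number of candidates)
    stays the same, so only the ratio of the two levels matters.
    """
    return 10.0 * math.log10(p_from / p_to)


shift = threshold_shift(0.05, 0.01)  # about 7 score units
```

Under these assumptions, moving from p < 0.05 to p < 0.01 raises a threshold by roughly 7 units, which is consistent with the score thresholds of 34 and 41 quoted in the application to EC membrane proteins.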
The proteins displayed in MFPaQ will still be ranked according to Mascot scoring, but proteins possessing for example at least two bold and red peptide matches in Mascot, with scores higher than 40, will appear in green and will be automatically validated. Proteins that do not fulfill these criteria, although being in the significant list of Mascot, will appear in red and will have to be verified manually. To that aim, all the information given by Mascot for a protein hit is also available in the MFP window: protein mass, pI, total score, list of assigned queries with the corresponding peptide sequence, theoretical and experimental masses of the peptide matches, delta value between these two masses, score and rank of the peptide matches, and E-value for the assignment. Links to the Mascot "Peptide view" window containing MS/MS centroided spectra are available as well and allow rapid verification by the user of ambiguous proteins displayed in red (Fig. 2B). 
FIG. 2. Visualization and parsing of Mascot results in the MFP module. A, an experiment is created in MFP from a series of Mascot .dat files, corresponding for example to several nano-LC-MS/MS runs performed on consecutive gel slices of a 1D gel. Extracted Mascot results are automatically validated according to user-defined criteria. However, the user keeps a trace of the fraction (.dat) manual validation process: before being verified, modified, and saved, they appear in red, and afterward they are displayed in green. B, protein validation is also color-coded: green proteins fulfill user-defined criteria, whereas ambiguous identifications are displayed in red. The proteins are ranked according to Mascot scoring. In the expanded view of the window, peptide information is available, and links to MS/MS spectra allow verification of the peptide sequence assignment.
It has to be noted that in MS/MS strategies identical peptides can often be mapped to different protein sequences present in a database, corresponding either to redundant sequences, amino acid variants, splice isoforms, different protein fragments, or protein homologs. The Mascot software automatically groups together protein sequences matching exactly the same set of peptides. Under the MFP module, it is possible to display a concise list of the proteins identified where only one member of each group appears but also a detailed list containing all the members of each group of proteins sharing the same set of peptides. Moreover MFP is able to detect protein homologs or protein fragments related to another protein ranked higher in the list. These proteins are usually identified with a subset of shared peptides (displayed as red and nonbold peptides in Mascot) but are not grouped together with the previous hit because additional, specific peptide sequences are also assigned to them. The MFP module displays these proteins in italic if these supplemental sequences are low scoring peptide matches (score lower than 30) that do not allow their identification as specific hits (in that case, these proteins or protein groups were not validated in the following results section). However, it displays them as real specific hits if they have at least one high scoring (score higher than 30) red and bold specific peptide match. These features, and the interactive validation window, enable the user to save a lot of time by browsing and easily validating the results.
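The green/red/excluded classification described in this subsection can be sketched as follows. This is an illustration, not MFPaQ's Perl code: the `PeptideMatch` type, the parameter names, and the exclusion rule (all matches below a low score) are assumptions, while the default validation rule mirrors the example in the text (at least two top-ranked matches with scores higher than 40).

```python
from dataclasses import dataclass


@dataclass
class PeptideMatch:
    sequence: str
    score: float
    rank: int


def classify(peptides, min_peptides=2, min_score=40.0, max_rank=1,
             exclusion_score=15.0):
    """Return 'green', 'red', or 'excluded' for one protein hit.

    green:    enough distinct peptides pass the rank and score filter
    excluded: every match scores below exclusion_score (hypothetical rule)
    red:      anything else, left for manual verification
    """
    reliable = {p.sequence for p in peptides
                if p.rank <= max_rank and p.score > min_score}
    if len(reliable) >= min_peptides:
        return "green"
    if all(p.score < exclusion_score for p in peptides):
        return "excluded"
    return "red"


confident = classify([PeptideMatch("AAK", 55.0, 1), PeptideMatch("CCR", 42.0, 1)])
ambiguous = classify([PeptideMatch("AAK", 55.0, 1)])
discarded = classify([PeptideMatch("AAK", 10.0, 1)])
```

Keeping the thresholds as parameters reflects the point made in the text that each user can tune more or less stringent criteria in a profile and rerun the extraction.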
Saving Protein Lists-Once the verification has been performed, validated proteins (or protein groups), including all associated peptide information, are saved in XML files and can be exported into Excel. Another important feature of the MFP module is the possibility of generating exclusion lists for further nano-LC-MS/MS experiments. Such lists can be used to perform a second nano-LC-MS/MS run of the same sample in which intense ions that were already assigned to a validated protein in the first run will not be selected again for MS/MS, potentially giving the mass spectrometer more time to sequence less abundant peptides. Finally the MFP module can also generate a unique, non-redundant list of proteins from all the validated result files of a multisearch experiment. This unique feature from MFPaQ is particularly useful when protein fractionation is performed because the same protein can be identified several times in adjacent gel slices. The software compares proteins or protein groups (composed of all the proteins matching the same set of peptides) and creates clusters from protein groups found in different gel slices if they have one common member. This feature allows the editing of a global list of unique proteins (or clusters) representing the entire sample analyzed in the experiment.
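The clustering rule described above (protein groups from different gel slices are merged when they share a member) can be sketched in Python. The function name and the use of accession strings as identifiers are assumptions made for illustration.

```python
def cluster_protein_groups(groups):
    """Merge protein groups that share at least one member.

    groups: iterable of sets of protein accessions, one set per protein
    group found in a gel slice. Returns a list of disjoint clusters.
    """
    clusters = []
    for group in groups:
        merged = set(group)
        remaining = []
        for cluster in clusters:
            if cluster & merged:
                # Shared member: absorb the existing cluster.
                merged |= cluster
            else:
                remaining.append(cluster)
        remaining.append(merged)
        clusters = remaining
    return clusters


slices = [{"P1", "P2"}, {"P3"}, {"P2", "P4"}, {"P4", "P3"}]
clusters = cluster_protein_groups(slices)
```

Because existing clusters are always disjoint, absorbing every cluster that overlaps the incoming group is enough to keep merging transitive: in the example, the four groups collapse into a single cluster.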

Application of the MFP Module to the Identification of Membrane Proteins from Primary Human ECs
EC microsomes were prepared, washed with sodium carbonate at high pH to enrich the mixture in integral membrane or membrane-anchored proteins, fractionated by 1D SDS-PAGE, and analyzed by nano-LC-MS/MS. Analysis of highly hydrophobic proteins is often difficult because the classical buffers used in many protein separation techniques (two-dimensional electrophoresis and liquid chromatography) and the conditions compatible with enzymatic digestion are often not efficient enough to solubilize them, leading to protein aggregation and precipitation. 1D SDS-PAGE is a well suited approach for the separation of highly hydrophobic proteins because they can be efficiently solubilized in Laemmli buffer and fractionated. The enzymatic digestion step can then be easily performed in gel once the proteins have been fixed in the gel and the SDS has been washed out of the bands. Twenty gel slices were cut all along the migration lane, digested with trypsin, and analyzed by nano-LC-MS/MS with a 60-min-long gradient. Mascot results obtained for all of them were filtered and validated with the MFP module of MFPaQ. Table I presents the number of proteins or protein groups identified in each gel slice when using different criteria for protein validation: either Mascot scoring (Standard or MudPIT) or criteria based on the number, the rank, and the score of the peptide matches. Validation based on the Mascot Standard scoring and a protein score higher than 34 (p < 0.05) resulted in a final non-redundant list of 1477 protein groups (data not shown), whereas validation based on Mascot MudPIT scoring, with the same threshold, gave a final non-redundant list of 855 protein groups (Table I, column 1). Performing a random database search and applying the latter criterion for validation (Mascot MudPIT score higher than 34) led to the identification of 101 protein groups, indicating a false positive rate on the previous list of about 11%. 
Using the same procedure with the Standard scoring we obtained a false positive rate of 16%. When the stringency of filtering was increased by validating only proteins with at least two reliable peptide matches (rank 1 and individual score higher than 34), the number of validated proteins went down to 491. A random database search with the same criteria led to the validation of only two proteins, indicating a false positive rate of 0.4% (Table I, column 2). Thus, although this list appeared to be much more reliable according to the estimated false positive rate, it was also much more restrictive and potentially excluded a large number of true identifications (false negatives). Among them, many proteins were identified on the basis of only one peptide match, and we thus tested several criteria of validation to rescue some of these proteins while maintaining an acceptable false positive rate. In addition to the proteins identified with at least two peptide matches with individual score higher than 34, we allowed automatic validation of proteins identified with a single peptide match. When the minimal score of these single peptide hits was set to 41 (p < 0.01), the list of validated protein groups significantly increased to 706, but the false positive rate went up to 4% (Table I, column 3). Finally intermediate criteria were selected by setting the minimal score for these single peptide match hits to 50, which gave a final non-redundant list of 626 protein groups and a false positive rate of 0.6% (Table I, column 4). We chose to use these criteria for automatic validation with MFP (green proteins; see Fig. 2). A manual check was additionally performed on ambiguous proteins that did not fulfill these criteria (displayed in red in the MFP window) and that potentially still contained false negatives. 
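The false positive estimates quoted above can be approximated with a simple ratio of decoy to target identifications. The text does not spell out the exact formula, so the sketch below is an assumption: it divides the number of protein groups validated in the reversed-database search by the number validated in the normal search.

```python
def false_positive_rate(decoy_hits, target_hits):
    """Estimated false positive rate as decoy hits / target hits.

    Assumed formula; the source only reports the resulting percentages.
    """
    if target_hits <= 0:
        raise ValueError("target_hits must be positive")
    return decoy_hits / target_hits


# Counts taken from the text: 101 decoy vs 855 target groups (MudPIT
# scoring), and 2 decoy vs 491 target groups (two-peptide criterion).
mudpit_rate = false_positive_rate(101, 855)       # close to 0.118
two_peptide_rate = false_positive_rate(2, 491)    # close to 0.004
```

With this formula the two-peptide criterion reproduces the reported 0.4%, while the MudPIT figure comes out slightly above the reported "about 11%", which suggests the authors may have rounded or used a slightly different denominator.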
Manual verification of the MS/MS spectra allowed the rescue of 107 positive hits (Table I, column 4) when the fragmentation data were of high quality and strongly indicative of the peptide sequence (at least four consecutive y ions and a delta mass between measured and theoretical peptide molecular mass lower than 0.1 Da). The list of 626 proteins automatically validated by MFP with the above mentioned criteria is provided in Supplemental Data 1. For clarity, only one member of each protein group (proteins matching the same set of peptides) is displayed. The lists of protein groups identified in each gel slice fraction with all associated peptide information are also provided (Supplemental Data 2) as well as annotated MS/MS spectra in the case of single peptide-based matches (Supplemental Data 9). The complete database search results with detailed protein groups and peptide assignments can be viewed and browsed by downloading the MFPaQ software and associated data files at mfpaq.sourceforge.net. Automatic classification of the protein list according to Gene Ontology annotations was then performed with the GoMiner software (discover.nci.nih.gov/gominer/) and indicated that, of 450 proteins annotated in terms of subcellular localization, 254 proteins are membrane proteins, and 90 proteins appear to be localized at

Description of the MFPaQ Quantification Module
An important feature of MFPaQ is a quantification module, which extracts quantitative data from raw files obtained by nano-LC-MS/MS when using either ICAT or SILAC labeling techniques. The software allows the verification of the calculated ratios and the manual deselection of some peptide pairs or some MS scans in case of aberrant ratio calculation (coelution with other peptides, weak signal, etc.). After validation of the proteins identified, the quantification module uses the peptide lists generated by the MFP module to select the peptides containing an isotopic modification specified by the user (e.g. a cysteine modified by a c-ICAT reagent or a peptide containing an arginine in the case of a SILAC labeling with heavy arginine). To this aim, each validated result file must be associated with the corresponding raw data file (.wiff files). Intensities of peptide pairs are then extracted from the MS Survey scans of a series of raw data files in batch mode, and heavy/light ratios are computed for each peptide pair. The ratios of all validated peptide matches are averaged for each protein in a gel slice, and a coefficient of variation is calculated for the ratio of the proteins that have been quantified with several peptide matches (Fig. 4A). When a protein is identified and quantified several times in consecutive gel slices, a final protein ratio is computed by averaging the different ratios found for this protein in the different fractions, and a global coefficient of variation is calculated. Proteins or protein groups identified and quantified in different fractions are also clustered to generate a final non-redundant list of protein groups, with their normalized protein ratio and the associated global coefficient of variation, presented in the "Quantification report." To check and validate the quantification results, direct links are provided for each protein ratio to a "Quanti Viewer" window showing all data used for quantification of an individual protein. 
These include the list of isotopically labeled peptide pairs identified for this protein with peptide score, mass, and elution time; the list of MS scans used to extract peptide intensities; and the corresponding MS spectra of the peptide pairs (Fig. 4B). The program automatically selects the MS scans of good quality to reconstitute the elution peaks for each member of the peptide pair. Then it computes the elution profile intensities and the corresponding ratio. Another feature of MFPaQ is to manually deselect some MS scans or directly deselect some peptide pairs in the case of aberrant ratio calculation (co-elution with another peptide, weak signal, etc.). The ratios are then automatically recalculated and updated in the quantification report.
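The ratio calculations described in this section can be illustrated with a short Python sketch. It is not the MFPaQ code. Two assumptions are made: the intensity of each member of a peptide pair is taken as the sum of its intensities over the retained MS scans, and the coefficient of variation is the sample standard deviation divided by the mean. All function names and the example scan values are hypothetical.

```python
from statistics import mean, stdev
from typing import Iterable, List, Optional, Tuple

# One MS survey scan: (scan id, light intensity, heavy intensity).
Scan = Tuple[int, float, float]


def pair_ratio(scans: List[Scan],
               deselected: Iterable[int] = ()) -> float:
    """Heavy/light ratio of one peptide pair, skipping deselected scans.
    Assumption: each elution profile is summarized by summing the
    intensities of the retained scans."""
    skip = set(deselected)
    kept = [(light, heavy) for sid, light, heavy in scans if sid not in skip]
    light_total = sum(light for light, _ in kept)
    heavy_total = sum(heavy for _, heavy in kept)
    if light_total == 0:
        raise ValueError("no light signal left after scan selection")
    return heavy_total / light_total


def mean_and_cv(ratios: List[float]) -> Tuple[float, Optional[float]]:
    """Average ratio and coefficient of variation in percent.
    The CV is None when only one ratio is available."""
    avg = mean(ratios)
    cv = 100.0 * stdev(ratios) / avg if len(ratios) > 1 else None
    return avg, cv


# Deselecting an aberrant scan (id 3) changes the recalculated ratio.
scans = [(1, 100.0, 200.0), (2, 200.0, 400.0), (3, 50.0, 500.0)]
print(pair_ratio(scans), pair_ratio(scans, deselected=[3]))

# Averaging the ratios of one protein found in two gel slices.
print(mean_and_cv([1.76, 1.16]))
```

The same `mean_and_cv` helper serves both levels of averaging: peptide pair ratios within one gel slice, and per-slice protein ratios across consecutive slices. With the two slice ratios reported later for MUC18/CD146 (1.76 and 1.16), it gives a mean of 1.46 and a CV of about 29%, which matches the values in the text under the sample standard deviation assumption.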

Validation of the MFPaQ Quantification Module Using a 1:1 Heavy/Light c-ICAT Ratio of Labeled Proteins Extracted from EC Microsomes
To test the efficiency of ICAT labeling of EC microsomal proteins as well as the efficiency of quantification by the MFPaQ software, we performed a 1:1 heavy/light test labeling experiment using 100 μg of microsomes. Equal amounts of material were labeled with either light or heavy c-ICAT and mixed together. Proteins were solubilized in 6 M urea and 0.5% SDS to improve protein denaturation and labeling efficiency (23) and were then fractionated by 1D SDS-PAGE. Twenty gel slices were cut all along the migration lane and digested with trypsin. For each fraction, c-ICAT labeled peptides were enriched by monomeric avidin chromatography, the biotin moiety of the tag was then submitted to acidic cleavage, and the resulting peptides were analyzed by nano-LC-MS/MS with a 60-min-long gradient. Proteins identified by Mascot in each fraction were extracted and validated with the MFP module of MFPaQ using the optimized criteria described above (i.e. at least two peptide matches of rank 1 with score higher than 34 or one peptide match of rank 1 of score higher than 50). The MFPaQ software allowed the validation of 164 unique protein groups (Supplemental Data 3) from which 155 were assigned at least one ICAT labeled peptide match of score higher than 20 (threshold applied on validated peptide matches for MS data intensity extraction). Peptide information associated with each protein group in the different gel slice fractions is shown in Supplemental Data 4, and annotated MS/MS spectra are provided in the case of single peptide-based matches (Supplemental Data 9). Quantification was then performed on the validated protein groups using the MFPaQ quantification module. After calculation of an average ratio for each protein group identified, the software applies to all of them a normalization factor defined as the median ratio of the protein population. 
This compensates for a possible bias introduced during labeling if slightly different total protein amounts of the two samples to be compared are taken. In the test experiment, the software calculated a normalization factor of 0.98. This value was expected because the test experiment compared two aliquots of the same sample. Similarly it is also expected that all the normalized ratios for the identified proteins are very close to 1 because no differential expression of proteins occurs. The histogram in Fig. 5A presents the normalized ratios computed by the software for all quantified proteins (either heavy/light or light/heavy ratios are represented to always obtain a final value >1). Of the 155 proteins possessing at least one ICAT labeled peptide match, 152 were successfully quantified (Supplemental Data 5). The calculated H/L and L/H ratios for this population vary between 1 and 1.22, which is in good agreement with the classical ~20% accuracy attributed to the c-ICAT labeling method associated with mass spectrometry analysis (8,24,25). Standard deviation from the median value of 1 is only 6%. Thus, the results of this test experiment indicate that the ICAT labeling method associated with the analytical mass spectrometry procedure described, database search result filtering using stringent criteria, and quantification with the MFPaQ software may be able to measure changes in ratios in a statistically significant way.
FIG. 4. Visualization of the peptide and protein ratios from the 1:1 ICAT labeling test experiment using EC microsomes in the quantification module. A, for each validated protein list corresponding to one protein fraction (i.e. gel slice), protein ratios are calculated and displayed in an individual window along with the protein scores, numbers of peptides quantified per protein, and coefficient of variation (CV) of the protein ratios for this gel slice. NQP is the number of quantified peptide pairs. The Ratio column refers to the H/L ratios. B, detailed results for quantification of a particular protein in one gel slice can be inspected in a separate window showing m/z, scores, and elution times of the ions used for quantification as well as the different MS scans used for each peptide pair. Peptide pairs and MS scans can be manually selected or deselected for a new calculation of the ratio. Min., minimum; Max., maximum; Exp., experimental; Int., intensity; Temps, time.
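The median normalization applied in the 1:1 test experiment, and the display convention used for the histograms (each ratio shown as H/L or L/H so that the value is at least 1), can be sketched as follows. This is a minimal illustration with hypothetical function names and made-up protein ratios. It assumes that applying the normalization factor means dividing every protein ratio by the median ratio, which the text implies but does not state explicitly.

```python
from statistics import median
from typing import Dict, Tuple


def normalize(ratios: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Return the normalization factor (median H/L ratio of the protein
    population) and the ratios divided by it (assumed operation)."""
    factor = median(ratios.values())
    return factor, {protein: r / factor for protein, r in ratios.items()}


def fold(ratio: float) -> float:
    """Express a ratio as a value >= 1, i.e. H/L if the heavy form
    dominates and L/H otherwise."""
    return ratio if ratio >= 1 else 1 / ratio


# Made-up H/L ratios for three protein groups.
raw = {"protein A": 0.98, "protein B": 1.96, "protein C": 0.49}
factor, normalized = normalize(raw)
print(factor, {p: round(fold(r), 2) for p, r in normalized.items()})
```

After normalization the median protein has a ratio of exactly 1, so a systematic excess of one sample at the labeling step no longer shifts the whole population.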

Quantitative Study of EC Membrane Proteins Regulated by Inflammatory Cytokines
ECs in secondary lymphoid organs and chronically inflamed tissues are found in a microenvironment rich in proinflammatory cytokines (21,26). Therefore, in an effort to mimic the inflammatory microenvironment found in vivo, cultured ECs were pretreated with a combination of potent proinflammatory cytokines before ICAT labeling of microsomal proteins. For this differential proteomics study, about 60 μg of microsomal proteins from untreated ECs were labeled with light c-ICAT reagent, and the same amount of microsomal proteins from ECs stimulated with TNF-α, IFNγ, and lymphotoxin-α/β were labeled with heavy c-ICAT reagent. The samples were mixed and fractionated by 1D SDS-PAGE into 17 gel slices, which were digested with trypsin. After enrichment of c-ICAT peptides on a monomeric avidin cartridge and acidic cleavage of the tag, analysis of the peptides was performed by nano-LC-MS/MS with an 80-min-long gradient. The gradient time was increased to improve MS/MS coverage of the peptidic mixture and to maximize the number of proteins identified. Application of the MFPaQ software using the same database search parameters and protein extraction criteria as described above resulted in a final non-redundant list of 229 identified proteins. To maximize the number of quantified proteins in the experiment we then applied less stringent filtering criteria for automatic validation with the MFP module (at least one peptide match of rank 1 with ion score higher than 35, corresponding to p < 0.05) and manually checked all ambiguous proteins by close inspection of MS/MS spectra. Criteria for the manual validation of proteins were the following: at least one c-ICAT labeled peptide of rank 1 with relevant MS/MS fragmentation pattern (at least four consecutive y ions) and a good correlation between the theoretical molecular weight of the protein hit and the corresponding molecular weight of the gel slice number. 
In that way, we obtained a final list of 475 validated unique protein groups (Supplemental Data 6). Peptide information and annotated MS/MS spectra in the case of single peptide-based matches are shown, respectively, in Supplemental Data 7 and 9. From the 475 validated protein groups, 452 had at least one c-ICAT labeled peptide match of score higher than 20. Of them, the MFPaQ software could successfully quantify 415 protein groups (Supplemental Data 8). The normalization factor applied to all protein ratios for this experiment was 0.911, reflecting a 10% error in protein concentration measurement. In the final quantification report, 44 proteins are overexpressed under cytokine treatment with heavy/light ratios between 1.6 and 24.6 (Fig. 5B, Table II, and Supplemental Data 8), and on the other hand, 39 proteins are underexpressed with light/heavy ratios between 1.6 and 2.2. The most induced proteins are ICAM-1 (ratio of 25), vascular cell adhesion molecule-1 (VCAM-1; ratio of 21), and E-selectin, which represent major EC proteins involved in inflammation and TNF-α response. ICAM-1, which mediates firm adhesion of leukocytes to the vascular endothelium via interaction with lymphocyte function-associated antigen-1, is well known to be up-regulated on endothelium upon inflammation and has been shown to be important for transendothelial migration of lymphocytes (27,28). VCAM-1, which is induced on ECs at sites of inflammation, is one of the most important cell adhesion molecules involved in recruitment of monocytes via interaction with monocyte integrin VLA-4 (29,30). E-selectin mediates leukocyte rolling and is not constitutively expressed on ECs but is up-regulated upon inflammatory stimulation (31,32). Another cell adhesion protein, the activated leukocyte cell adhesion molecule (ALCAM)/CD166, which localizes to EC junctions and plays a role in monocyte transendothelial migration (33), was also shown to be up-regulated although with a lower ratio (2-fold change). 
Many
FIG. 5. Protein relative expression ratios of the final experiment quantification reports. The left part of these histograms displays proteins with heavy/light ratios >1, whereas the right part displays proteins with light/heavy ratios >1. A, quantification results from the ICAT labeling 1:1 test experiment using EC microsomes. B, quantification results from the differential proteomics study following treatment of ECs with proinflammatory cytokines and c-ICAT labeling.
a Data related to the protein in the 1D gel slice where it was identified with the best Mascot protein score (major gel slices): Mascot MudPIT protein score, number of ICAT peptide pairs used by the software to quantify the protein in this particular gel slice (the number in parentheses corresponds to the total number of quantified ICAT peptide pairs), ratio computed by the software, and its associated coefficient of variation (CV) in percent (calculated if the number of ICAT peptide pairs used for quantification is higher than 1).
b Final protein ratio computed for the protein in the whole experiment after averaging the different ratios found for the protein if it was quantified in different consecutive gel slices and correcting by the normalization factor. A global coefficient of variation (CV; percentage) of this ratio is also calculated if the protein was quantified in different gel slices.
c Major histocompatibility complex.
d Mascot MudPIT scoring generates protein scores of 0.

Differential Induction of CD146 Isoforms in ECs Treated with Proinflammatory Cytokines
Very often a protein can be identified in several consecutive gel slices, particularly in the case of very abundant species, which will for example show significant tailing on a 1D gel lane. In this case, the ratios calculated for this protein in the different gel slices should be similar, and a low coefficient of variation on the final global protein ratio will indicate a good accuracy in quantification of the protein. However, different isoforms or fragments of a protein can also be identified in different gel slices, and although they will eventually belong to the same protein group as they will match the same set of peptides, the ratios calculated for each of them may actually differ. In that case, the final global ratio calculated for the protein group will be associated with a high coefficient of variation value indicating a discrepancy between individual gel slice calculated ratios, but this may reflect biologically relevant information. An example of such a case is given in Fig. 6 for the MUC18/CD146 protein, an immunoglobulin superfamily adhesion molecule and component of EC junctions involved in cell-cell cohesion and angiogenesis (41-43). In the 1:1 test labeling experiment, this protein is quantified with ratios close to 1 in each of the three gel slices where it was identified, leading to a final protein ratio of 1.02 with a low global coefficient of variation of about 3%. On the other hand, in the differential proteomics study following cytokine stimulation, the ratio calculated for this protein in gel slice 1 (high molecular weight fraction) is 1.76 (after correction with the normalization factor), whereas it is only 1.16 in gel slice 2 (low molecular weight fraction), leading to a final protein ratio of 1.46 with a high global coefficient of variation of about 29%. 
The coefficients of variation computed for the ratios in each of the two fractions are quite low, reflecting a good correlation between the values obtained for all the peptide pairs used for calculation of these ratios. Thus, the discrepancy between the protein ratios obtained in the two fractions may reflect a real difference in regulation of two distinct protein isoforms following cytokine treatment rather than a bad quantification. It could be assumed for example that a highly glycosylated form of the MUC18/CD146 is specifically induced in response to inflammatory signals.

DISCUSSION
In this study, we developed a new software tool, designated MFPaQ, that proved to be efficient for data validation and quantification after ICAT labeling, protein fractionation, analysis of consecutive fractions by several nano-LC-MS/MS runs, and multisearch with the Mascot engine. First this software greatly facilitated the sorting of protein lists and the verification of Mascot result files. Indeed although several search engines like Mascot, Sequest, or Phenyx are usually considered to be very efficient for protein identification, false protein assignments are clearly not avoided. In the case of the Mascot search engine, improvements were obtained in the 2.0 version with the introduction of the MudPIT scoring mode. Our study of the membrane proteome from ECs shows that application of this scoring yielded much fewer false positive hits than the Standard scoring. However, even with this new scoring, a validation step involving parsing of the results, either manually by the user or by application of automatic filters, still appears to be necessary. Convenient tools are not always available inside the identification softwares themselves to perform this task (Table III). For example, a unique filtering option can be selected in Mascot 2.0 and Mascot 2.1 to retain in the final list of proteins only those that have at least one bold and red peptide match. 
The filtering rules available in MFPaQ are more comprehensive because the number, the rank, and the score of the peptide matches can be specified, and different filtering rules can be applied to validate the proteins. By performing a second Mascot search with the MS/MS data in a reversed database and by applying to the results the same filtering rules, the user can obtain a rough evaluation of the percentage of false positives associated to the automatic validation step. Thus, it is possible to adapt the stringency of the filtering rules to minimize the number of false positives while retaining a maximum of identified proteins. Other bioinformatics tools can perform proteomics data validation (Table III), like the Trans Proteomic Pipeline, which is based on the softwares PeptideProphet and ProteinProphet (16). These two softwares are powerful validation tools that were initially designed to sort, filter, and analyze the results of the Sequest search engine. They first assign measures of confidence to peptide sequences returned by Sequest, via a statistical data modeling algorithm, and then to the proteins from which they were likely derived, thus estimating the accuracy of peptide and protein identifications made. They have been very efficiently applied in large scale shotgun proteomics studies based on peptide fractionation and MS/MS data analysis using Sequest (44), but they do not appear to be suited for validation of Mascot results. Other programs like MSQuant (18) and STEM (19) offer functionalities to validate Mascot data files (Table III). However, one particular advantage of MFPaQ is that it provides a synthetic view of the identifications that can be obtained when using a shotgun strategy based on protein fractionation. Indeed several Mascot result files can be automatically parsed in batch mode with the MFP module and grouped afterward to generate a global concatenated, non-redundant list of identified protein groups.
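The idea of a concatenated, non-redundant list of protein groups can be illustrated with a small Python sketch. The text defines a protein group as proteins matching the same set of peptides, but it does not describe how MFPaQ clusters groups across fractions. The approach below (pooling the peptides of each protein over all fractions, then grouping proteins with identical peptide sets) is therefore an assumption, and the function names and example accessions are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def merge_fractions(fractions: List[Dict[str, Set[str]]]) -> Dict[str, Set[str]]:
    """Pool the peptides matched to each protein over all fractions
    (one dictionary per parsed result file)."""
    pooled: Dict[str, Set[str]] = defaultdict(set)
    for fraction in fractions:
        for protein, peptides in fraction.items():
            pooled[protein].update(peptides)
    return dict(pooled)


def protein_groups(protein_to_peptides: Dict[str, Set[str]]) -> List[List[str]]:
    """Group proteins that match exactly the same set of peptides."""
    groups: Dict[frozenset, List[str]] = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)
    return sorted(sorted(members) for members in groups.values())


# Two gel slices; P1 and P2 always share the same peptides.
slice_1 = {"P1": {"pepA", "pepB"}, "P2": {"pepA", "pepB"}}
slice_2 = {"P1": {"pepC"}, "P2": {"pepC"}, "P3": {"pepD"}}
print(protein_groups(merge_fractions([slice_1, slice_2])))
```

Using a `frozenset` of peptide sequences as the dictionary key makes the grouping independent of the order in which peptides and fractions are read.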
Finally an important feature of MFPaQ is the quantification module, which provides data on protein relative expression following isotopic labeling and identification with Mascot. Some recently released commercial softwares offer the possibility to perform quantification for isotope labeling methods, like ProteinPilot from Applied Biosystems and ProteinScape from Bruker Daltonics. However, they are not always of generic use and run under a specific environment. ProteinPilot, for example, offers new functionalities both for parsing and quantifying the data from .wiff files in ICAT, iTRAQ, and SILAC but is based mainly on the results of the Paragon search engine and not on Mascot results. ProteinScape, for its part, can process the MS/MS data with several search engines, among which is Mascot, but is only designed to perform quantification on Bruker Daltonics raw files. The very latest version of Mascot, Mascot 2.2, now seems able to perform quantification but only based on the data contained in the MS/MS peak lists (e.g. iTRAQ quantification or semiquantitative label-free strategies based on peptide match counting). To perform MS-based quantification (e.g. ICAT or SILAC strategies), the intensity values for the peptides should be extracted from the raw data by another commercial program, Mascot Distiller. Although this application indeed seems promising for handling a wide range of mass spectrometer data file formats, the quantitation features are not yet implemented. They will be provided by the Mascot Distiller Quantitation Toolbox, a program able to perform quantitation based on the relative intensities of extracted precursor ion chromatograms. The open source software MSQuant can perform such MS-based quantification and now works with a variety of MS data file formats but is specifically designed for SILAC analyses. MFPaQ has the advantage of processing either SILAC or ICAT data. 
Moreover the MFP module and the quantification module of MFPaQ are well suited to easily manage an experiment constituted of multiple Mascot search result data files, corresponding for example to several protein fractions. This is an important feature because protein fractionation is very often performed in differential proteomics studies based on isotopic labeling. Indeed such studies usually proceed in two steps: first, as many proteins as possible are identified by MS/MS and database searching; and second, some of these proteins can be quantified if they were identified with a peptide bearing the isotopic modification by extracting the intensities of the peptide pair from raw MS data. This means that in such approaches only proteins that were identified first can potentially be quantified afterward. Thus, although very good quantification may be achieved on major protein components of a complex mixture, variation of expression of minor protein components may well be missed because these species will not be identified. This represents a major drawback, particularly if changes are expected to occur on low abundance species. It is thus critical in these strategies to extensively characterize the sample and to identify as many proteins as possible to track variations on a maximum number of species. A classical way for that is to perform a shotgun analysis of the sample, for example by protein fractionation, which currently seems to constitute the most efficient method to maximize the analytical coverage of a highly complex protein mixture. Thus, it is important that bioinformatics tools for data quantification can process and integrate data obtained after protein fractionation. The MFPaQ software is particularly useful for that in contrast to other quantification programs (Table III). 
Indeed it can generate a global quantification report to integrate and synthesize all the data obtained for a protein in the different fractions in which it could be identified and quantified. Display of coefficients of variation for protein ratios in each fraction makes it possible to track potential errors in quantification due for example to erroneous calculation of a specific peptide pair ratio and to exclude them from quantification to improve the final result. Moreover display of the global coefficient of variation on the final averaged protein ratio allows the evaluation of the statistical significance of the final value calculated by the software.
Here we applied the MFPaQ software to characterize the membrane proteome of human ECs and the variation of protein expression profile in response to cytokine stimulation. The MFP module of the MFPaQ software proved to be very useful for the proteomics analysis of EC membrane proteins. More than 600 proteins were identified after fractionation of the crude microsomal fraction from primary human ECs by 1D SDS-PAGE and nano-LC-MS/MS analysis (Supplemental Data 1). More than 55% of these proteins are membrane proteins according to automatic bioinformatics classification; this represents a relatively good enrichment of the membrane proteome compared with similar studies (45). The list of identified proteins comprises at least 41 known endothelial cell surface markers (Supplemental Data 1), including classical endothelial markers such as CD31/PECAM-1, VE-cadherin, ICAM-2, Tie-2, CD146/MUC18, podocalyxin, endothelial cell-selective adhesion molecule, angiotensin-converting enzyme/CD143, endothelin-converting enzyme, dipeptidyl-peptidase IV/CD26, and ALCAM/CD166. Strikingly although all these classical EC markers were identified in a proteomics study of luminal EC plasma membrane proteins freshly isolated from rat lungs in vivo, they were not found in EC surface proteins purified from cultured rat lung microvascular ECs (46). Therefore, our results indicate that cultured primary human ECs (HUVECs), a widely used in vitro EC model first described in 1973 (47), retain a cell surface phenotype closer to the in vivo EC phenotype than cultured rat lung ECs. In addition, our results suggest that the number of EC membrane proteins that differ between ECs in vivo and in vitro, previously suggested to be ~50% (46), may have been overestimated.
To the best of our knowledge, this is the first proteomics analysis of EC membrane proteins regulated by inflammatory cytokines. We used a combination of key proinflammatory mediators (TNF-α, IFNγ, and lymphotoxin-α/β) to mimic the inflammatory microenvironment and performed a differential quantitative proteomics study using the ICAT method and the quantification module of the MFPaQ software. Our results revealed that ICAM-1, VCAM-1, and E-selectin, three critical cell adhesion molecules for leukocyte-endothelium interactions in inflammation (27), are the major EC membrane proteins up-regulated by inflammatory stimuli and the only ones induced more than 15-fold. These proteomics results are fully consistent with previous microarray data showing that ICAM-1 (-fold change, 111.9), E-selectin (-fold change, 48.0), and VCAM-1 (-fold change, 31.7) mRNAs were the most significantly induced after TNF-α treatment of human primary ECs (48). E-selectin, ICAM-1, and VCAM-1 mediate the initial rolling and arrest steps of leukocyte-EC interactions (27), which are followed by leukocyte transendothelial migration through EC junctions. Interestingly we identified two components of EC junctions that are regulated by proinflammatory mediators in human primary ECs. These two molecules, CD146 high molecular weight isoform and ALCAM/CD166, belong to the same protein family of immunoglobulin cell adhesion molecules, consisting of five extracellular immunoglobulin domains, a single transmembrane domain, and a short cytoplasmic tail, and may function in cell-cell cohesion (33,41). The up-regulation of these proteins in ECs treated with proinflammatory cytokines may therefore play an important role in the response of ECs to inflammation at the level of EC junctions and leukocyte transendothelial migration.
In conclusion, the present work validates the use of a new software tool for fast and efficient parsing of proteomics results obtained from several Mascot files and extraction of quantitative data from raw MS files in isotopic labeling strategies using either the ICAT or SILAC technique. Developments are in progress to adapt this software to additional types of labeling strategies including iTRAQ labeling and 15N metabolic labeling. Moreover a clear perspective of development for the application is to improve its compatibility with different MS platforms. The first module of the software, the Mascot File Parser, runs independently of the MS acquisition software and is thus of general use for proteomics platforms equipped with various instruments. The current quantification module, for its part, is dedicated to processing .wiff data files generated on QStar instruments by the Analyst QS software. Future versions of this module will be compatible with data files acquired with different types of mass spectrometers and with different MS acquisition softwares. The MFPaQ software, as well as all validated proteomics data associated with this study, is freely available at mfpaq.sourceforge.net.